Here’s a quick demo of the finished gesture tracking prototype:
In this post we are going to pick up where we left off and broadcast the skeleton data from Node.js through web sockets for other computers and browsers to use. In this instance we will subscribe to the feed from Chrome on a Mac. We will also look at basic gesture interpretation based on this raw data.
In order to start and run a Node.js server we need to install the Express library. For the real-time data streaming through web sockets we need Socket.io. If you create a new folder and a new file called package.json, and fill out the dependencies section correctly, then running the npm install terminal command inside the project folder will install everything for you:
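A minimal package.json might look like this; the project name is arbitrary and the "*" version ranges are placeholders you would normally pin down:

```json
{
  "name": "kinect-broadcast",
  "version": "1.0.0",
  "dependencies": {
    "express": "*",
    "socket.io": "*",
    "kinect2": "*"
  }
}
```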
Broadcasting skeleton data
Let’s start building our skeleton data broadcasting Node.js application. First we load all the necessary libraries:
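A sketch of the require calls, assuming the kinect2 npm package for talking to the sensor:

```javascript
var express = require('express');
var app = express();
var server = require('http').createServer(app);
var io = require('socket.io')(server);
var Kinect2 = require('kinect2');
```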
We also need an instance of the Kinect 2 library:
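Creating the instance is a single line:

```javascript
var kinect = new Kinect2();
```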
Next, we call kinect.open() and, in the same line, check that it returns true, which means powering up and accessing the Kinect sensor was successful:
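The check might look like this:

```javascript
if (kinect.open()) {
  // the sensor is powered up and accessible;
  // everything that follows goes inside this block
}
```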
Inside this condition we start our server:
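Assuming port 8000, which is an arbitrary choice:

```javascript
server.listen(8000);
console.log('Server listening on port 8000');
```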
The following expression will serve the index.html file inside the public folder. This step isn’t strictly necessary, as you can subscribe to the web socket data from HTML files that are not hosted on this server; a simple local page works just as well.
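A sketch using an Express route; the public folder path is an assumption:

```javascript
app.get('/', function (req, res) {
  res.sendFile(__dirname + '/public/index.html');
});
```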
The final chunk of code is responsible for actually listening for bodyFrames received from the Kinect sensor, each of which is one frame’s worth of skeleton data formatted as JSON. Inside the anonymous callback function we have access to the bodyFrame variable which holds this object. This is what we send through the web socket with the emit call:
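A sketch of the listener; the message name “bodyFrame” on the emit side is what the front end will subscribe to later:

```javascript
kinect.on('bodyFrame', function (bodyFrame) {
  // forward one frame of skeleton data to every connected client
  io.sockets.emit('bodyFrame', bodyFrame);
});
```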
Finally, to kick off the whole process, we call the openBodyReader() function:
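```javascript
kinect.openBodyReader();
```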
Here’s what the final version looks like, all in one:
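A reconstruction of the full file from the steps above; the kinect2 API calls and the port are assumptions described earlier:

```javascript
var express = require('express');
var app = express();
var server = require('http').createServer(app);
var io = require('socket.io')(server);
var Kinect2 = require('kinect2');
var kinect = new Kinect2();

if (kinect.open()) {
  server.listen(8000);
  console.log('Server listening on port 8000');

  app.get('/', function (req, res) {
    res.sendFile(__dirname + '/public/index.html');
  });

  kinect.on('bodyFrame', function (bodyFrame) {
    // one frame of skeleton data, broadcast to every client
    io.sockets.emit('bodyFrame', bodyFrame);
  });

  kinect.openBodyReader();
}
```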
If you save this document as broadcast.js, connect and power up the Kinect, then enter node broadcast.js, data broadcasting should start within a few seconds. The next task is to look at the front-end code that subscribes to this data.
Subscribing to the data
First we connect to the server with the Socket.io client. After this step we attach an event listener onto the socket instance which, in our case, listens for incoming messages named “bodyFrame”. This is the name we specified in the Node.js application when we sent the data. The second argument is the handler function to call when a message is received:
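A sketch, assuming the Socket.io client script is already loaded on the page and the server from above is running locally on port 8000:

```javascript
// connect to the broadcasting server; host and port are assumptions
var socket = io.connect('http://localhost:8000');

// call interpretData every time a "bodyFrame" message arrives
socket.on('bodyFrame', interpretData);
```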
The interpretData function receives a single argument containing the received message, which in our case is the JSON formatted skeleton data:
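At its simplest, the handler can just log what arrives:

```javascript
function interpretData(bodyFrame) {
  // bodyFrame holds one frame's worth of skeleton data
  console.log(bodyFrame);
}
```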
Let’s investigate the structure of this bodyFrame JSON object. At the first level we will find a property called bodies which contains an array of skeleton data for each skeleton tracked within the frame. This means that we can interpret gestures from multiple people standing in front of the Kinect camera at the same time.
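As a sketch, a helper that runs a callback for each actively tracked body might look like this; the tracked flag is part of the kinect2 body data, and the bodies array has a slot for each of the up to six trackable users:

```javascript
function forEachTrackedBody(bodyFrame, callback) {
  bodyFrame.bodies.forEach(function (body) {
    // skip the empty slots; only tracked bodies carry joint data
    if (body.tracked) {
      callback(body);
    }
  });
}
```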
The selected user skeleton has a joints property which, again, contains an array with all the body joints tracked:
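For example, grabbing a single joint by its index; index 11 is the right hand, per the list below:

```javascript
var RIGHT_HAND = 11;

function getRightHand(body) {
  return body.joints[RIGHT_HAND];
}
```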
Here’s a list of all the joints:
Spine base: 0
Spine mid point: 1
Neck: 2
Head: 3
Left shoulder: 4
Left elbow: 5
Left wrist: 6
Left hand: 7
Right shoulder: 8
Right elbow: 9
Right wrist: 10
Right hand: 11
Left hip: 12
Left knee: 13
Left ankle: 14
Left foot: 15
Right hip: 16
Right knee: 17
Right ankle: 18
Right foot: 19
Shoulder spine center: 20
Left hand tip: 21
Left thumb: 22
Right hand tip: 23
Right thumb: 24
Each of these joints has other properties too, like depthX and depthY, which give you the 2D positional coordinates of the joint. You also have other properties like cameraZ to get the distance from the camera. With this we have 3D positional information for 25 joints of the body.
How can we use this to do basic gesture tracking?
One of the simplest gestures I can think of is a swipe in the air, something you would do to swipe through an image gallery or carousel. Let’s analyse this movement in detail. We can say that the horizontal movement of your right hand becomes a “swipe left” gesture when the speed of the hand reaches a certain threshold in the left direction. The first problem with this is that if your whole body moves from right to left then it will still count as an increasing speed in your hand joint, even though your hand actually stands still relative to your body.
One solution to this is, rather than simply checking the X coordinate and speed of the hand, to calculate everything relative to the centre of the body, which is your spine. For this, let’s first get the absolute position of the hand:
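A sketch, assuming a tracked body object taken from bodyFrame.bodies; joint index 11 is the right hand:

```javascript
function getHandX(body) {
  // normalized horizontal position of the right hand in the frame
  return body.joints[11].depthX;
}
```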
Then we get the absolute position of the spine:
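Using the spine mid point, joint index 1, as the body centre:

```javascript
function getSpineX(body) {
  // normalized horizontal position of the spine mid point
  return body.joints[1].depthX;
}
```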
Then from this we get the relative position of the hand:
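The relative position is then just the difference of the two:

```javascript
function getRelativeHandX(body) {
  // hand position measured from the spine instead of the frame edge
  return body.joints[11].depthX - body.joints[1].depthX;
}
```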
Then we calculate the relative speed by checking how much the position has changed between frames:
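One way to sketch this is to keep the previous frame’s value around and diff against it; the very first frame reports zero speed:

```javascript
var previousRelativeX = null;

function getHandSpeedX(relativeX) {
  // change in relative position since the last frame
  var speed = previousRelativeX === null ? 0 : relativeX - previousRelativeX;
  previousRelativeX = relativeX;
  return speed;
}
```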
Once we have this value calculated, we can simply add it to an if condition and trigger the animation:
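A sketch; the threshold value and the sign convention (negative speed meaning movement toward decreasing x) are assumptions you will want to tune for your own setup:

```javascript
var SWIPE_THRESHOLD = 0.05; // chosen by hand; an assumption

function isSwipeLeft(speed) {
  // a fast enough movement toward decreasing x counts as a swipe left
  return speed < -SWIPE_THRESHOLD;
}
```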
If you would like to make your code more reliable, you can add a second condition that tests whether the hand is above the waistline before considering it a gesture.
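That guard could be sketched like this; depthY grows downwards in the image, and joint index 0 (the spine base) stands in for the waistline:

```javascript
function isHandAboveWaist(body) {
  // smaller depthY means higher up in the frame
  return body.joints[11].depthY < body.joints[0].depthY;
}
```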