This week I have been working on two major programming parts of the prototype. The first is the single user's experience in the elevator (which, for those who do not remember, is a pose-mimic game). In doing so I first investigated TensorFlow and went down a big rabbit hole that led me to PoseNet. PoseNet is a machine-learning model (running on TensorFlow.js) that maps out a human body as a set of 17 keypoints. The image below is an example of this.
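For reference, each PoseNet estimate is an object containing a score and an array of 17 keypoints, one per body part. The sketch below lists those part names and shows the shape of a single keypoint (the position and score values are hypothetical, just to illustrate the structure):

```javascript
// The 17 body parts PoseNet detects, in its standard order.
const POSENET_PARTS = [
  'nose', 'leftEye', 'rightEye', 'leftEar', 'rightEar',
  'leftShoulder', 'rightShoulder', 'leftElbow', 'rightElbow',
  'leftWrist', 'rightWrist', 'leftHip', 'rightHip',
  'leftKnee', 'rightKnee', 'leftAnkle', 'rightAnkle'
];

// Shape of one keypoint in a PoseNet result (values here are made up).
const exampleKeypoint = {
  part: 'nose',
  position: { x: 301.4, y: 71.8 }, // pixel coordinates in the input image
  score: 0.99                      // confidence for this keypoint
};
```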
As you can see, PoseNet works on both still images and a live webcam feed. What I've been trying to do with PoseNet is call up specific images from a local folder and compare them to the live webcam feed to check that the poses are similar. I have found that this can be done using cosine similarity (article here). Originally, I intended for the system to use an image classifier to scan the webcam feed and a saved picture and compare the two, but I ran into a few issues with loading in the images and comparing them to the webcam. (This is all done using JS and HTML.)
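The cosine-similarity comparison can be sketched like this: flatten each pose's keypoints into one long vector and measure the angle between the two vectors. This is a minimal sketch, assuming each pose is an array of `{x, y}` points; the function names are my own, not part of PoseNet:

```javascript
// Flatten keypoints into a single vector [x0, y0, x1, y1, ...]
function poseToVector(keypoints) {
  return keypoints.flatMap(kp => [kp.x, kp.y]);
}

// Standard cosine similarity: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a, b) {
  let dot = 0, magA = 0, magB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }
  return dot / (Math.sqrt(magA) * Math.sqrt(magB));
}

// Returns a value up to 1; closer to 1 means the poses are more alike.
function comparePoses(poseA, poseB) {
  return cosineSimilarity(poseToVector(poseA), poseToVector(poseB));
}
```

One nice property of cosine similarity is that it is scale-invariant, so the same pose held closer to or further from the camera still scores as a match.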
Additionally, I have been working on the audio response for the multi-user elevator experience (which is the charade game). Using a speech-recognition API, I detect whether the words being said are correct. At the moment it can hear the words and convert them to text, and I am still working on checking whether the spoken words match the answer. As this isn't the major functionality I am focusing on, I haven't worked on it as much as the posing, although I do still want it to be completed. The image below shows the layout of the audio-to-text system I've made.
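The remaining step, checking the transcript against the answer, could look something like the sketch below. This is my own assumed approach, not the project's actual code: normalise both strings and see whether the target word appears in what was said (in the browser, the transcript itself would come from the speech-recognition result event).

```javascript
// Lowercase and strip punctuation so "Elephant!" matches "elephant".
function normalize(text) {
  return text.toLowerCase().replace(/[^a-z0-9\s]/g, '').trim();
}

// Returns true if the spoken transcript contains the target word.
function saidCorrectWord(transcript, target) {
  const words = normalize(transcript).split(/\s+/);
  return words.includes(normalize(target));
}
```

Checking for containment rather than exact equality means a guess like "is it an elephant?" still counts, which suits a charade game better than requiring the bare word alone.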
Lastly, I have only done a bit of the report that is due soon (whoops), and I should start focusing on that and the video, as the video will take a while to make.