Xbox Kinect: How the voice recognition works

The audio system explained

The problem facing the microphone subsystem is that it needs to be sensitive to voices up to 10 feet away while ignoring ambient noise and any sound other than your voice. To solve this problem, the Microsoft lab took 16 microphones to 250 homes and made a host of recordings from different setups, determining in the end the best mic positioning.

The end result is an array of four downward-facing mics, with one on the left and three on the right. They face down so that the front of Kinect stays clean and grille-free. In fact, this specific microphone placement is the only reason why Kinect is as wide as it is.

This array is the best arrangement for picking up voices at a distance, but it still needs help. The onboard processing unit cancels out noise that it determines is coming from your beefy 5.1 surround system, while a software system called ‘Beam Forming’ uses the camera to work out where you are and create an envelope of sound around you. This homes in on the sound of your voice and ignores friends or family members on either side of you.
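
To give a feel for how beam forming can favour one direction, here is a minimal sketch of the textbook "delay-and-sum" technique in Python. It is not Kinect's actual algorithm, which this article does not detail. The microphone positions in `MIC_X`, the 16 kHz sample rate and the function names are all assumptions made for illustration, and the steering angle stands in for the position the camera would supply.

```python
import math

SPEED_OF_SOUND = 343.0   # metres per second, roughly, at room temperature
SAMPLE_RATE = 16000      # assumed sample rate, not a Kinect specification
# Hypothetical mic positions along the bar, in metres (one left, three right).
MIC_X = [-0.10, 0.04, 0.08, 0.12]


def steering_delays(mic_x, angle_deg, sample_rate=SAMPLE_RATE):
    """Arrival delay of each mic, in whole samples, for a distant talker.

    angle_deg is measured from straight ahead; positive means the talker
    is towards the right-hand end of the bar. Delays are relative to the
    mic that hears the sound first.
    """
    s = math.sin(math.radians(angle_deg))
    raw = [-x * s / SPEED_OF_SOUND * sample_rate for x in mic_x]
    earliest = min(raw)
    return [round(d - earliest) for d in raw]


def delay_and_sum(channels, delays):
    """Undo each mic's delay, then average the aligned channels."""
    n = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[i + d] for ch, d in zip(channels, delays)) / len(channels)
        for i in range(n)
    ]


def simulate_array(source, delays):
    """Fake what each mic records: the source, shifted by that mic's delay."""
    return [[0.0] * d + list(source) for d in delays]


def energy(signal):
    """Mean squared amplitude of a signal."""
    return sum(v * v for v in signal) / len(signal)


# A 2 kHz test tone arriving from 30 degrees to the right.
tone = [math.sin(2 * math.pi * 2000 * i / SAMPLE_RATE) for i in range(800)]
recorded = simulate_array(tone, steering_delays(MIC_X, 30))

# Steering at the talker lines the channels up; steering elsewhere does not.
on_target = delay_and_sum(recorded, steering_delays(MIC_X, 30))
off_target = delay_and_sum(recorded, steering_delays(MIC_X, 0))
print(energy(on_target), energy(off_target))
```

When the beam is steered at the talker, the four channels add in step and the voice comes through at full strength. Steered elsewhere, the same sound arrives out of step and partly cancels, which is the "envelope of sound" effect in miniature.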

Kinect has an ‘acoustical model’ for countries and individual regional dialects, built from hundreds of hours of recordings of actors from around the world talking through various sayings.

Just like the optics, this is happening all the time. The sound recognition works on an open-mic system, meaning the microphones are listening at all times, ready to take commands such as ‘Xbox Pause’ at any point during movie playback.
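
The open-mic idea can be pictured as a loop that watches a constant stream of recognised words and only acts when a command follows the word "Xbox". The Python sketch below is an illustration under that assumption, not Microsoft's implementation. The function name `spot_commands` is hypothetical, and of the commands listed only "pause" comes from the example above; "play" and "stop" are placeholders.

```python
def spot_commands(words, wake_word="xbox", commands=("pause", "play", "stop")):
    """Yield each known command that directly follows the wake word."""
    armed = False  # True only for the word right after the wake word
    for word in words:
        token = word.lower()
        if armed and token in commands:
            yield token
        armed = token == wake_word


# Ordinary chatter containing "pause" is ignored; "Xbox Pause" is acted on.
heard = "we should pause soon Xbox Pause thanks".split()
print(list(spot_commands(heard)))
```

Requiring the wake word first is what lets the microphones stay open all the time without reacting to every stray "pause" in conversation.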