I’m quite tired of seeing the stupid things people try to pass off as facts about how the Kinect works, so instead of complaining about it I decided to make this handy guide.
First of all, what it isn’t: it is not an EyeToy or just a webcam.
That weird (huge) box contains:
- a VGA 640x480 color camera (CMOS) with a Bayer color filter
- an IR 640x480 camera (CMOS)*
- an IR projector
- a motor, an assortment of control chips and 4 microphones
- a fan!
(* it seems the outputted depthmap is 640x480, but the IR camera sensor size is closer to 1600x1200)
The hardware is something researchers have been pining after for a loooong time, because it’s a cheap camera that provides a distance measurement for every pixel. Previously all they could do was guess and hope for ideal conditions so that their algorithms would work.
How it works:
It is NOT a time-of-flight camera, as a lot of websites have incorrectly stated. It is a structured-light camera. You’ve seen those videos of 3D scanners projecting stripes onto things as if imitating venetian blinds? It’s the same principle at work, only there’s nothing venetian about it.
That IR projector mentioned above is an IR laser that passes through a diffraction grating and turns into a lot of IR dots.
They look chaotic because that’s how the Kinect can tell what’s going on. NO, IT WILL NOT GIVE YOU CANCER. It is a Class 1 laser, so it’s safe to be exposed to it indefinitely. Each of those points is unique, and the IR camera sees how the pattern distorts and says “hmm, the distance to that point is x”. Combine that data with the color image from the camera and you have this:
That ladies and gentlemen is a realtime 3D video stream!
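The depth math behind that stream is plain triangulation: the projector and the IR camera sit a fixed distance apart, so a dot on a nearby object appears shifted sideways in the IR image compared to where it would land on a faraway surface. Here is a minimal sketch of that relationship; the baseline and focal-length numbers are illustrative assumptions, not the Kinect’s actual calibration values:

```python
# Hedged sketch: depth from the sideways shift (disparity) of a projected dot.
# baseline_m and focal_px are assumed example values, NOT Kinect calibration data.
def depth_from_disparity(disparity_px, baseline_m=0.075, focal_px=580.0):
    """The dot's shift in the IR image is inversely proportional to distance:
    depth = baseline * focal_length / disparity."""
    if disparity_px <= 0:
        raise ValueError("dot not matched against the reference pattern")
    return baseline_m * focal_px / disparity_px

# A dot shifted 29 pixels from its reference position:
print(depth_from_disparity(29))  # 1.5 (meters)
```

Note the hyperbolic falloff: nearby objects produce big, easy-to-measure shifts, while distant ones produce tiny shifts, which is why structured-light depth gets noisier with range.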
Those odd shadows are where the camera can’t see. Combine a few more, you say? You can’t. Remember how I said the dots are unique? They suddenly become a lot less unique if there are two or more dotfields in the same area. The Kinects would become confused, sad and depressed.
Switching between them so there’s only one dotfield active at a time might work, but that would divide the 30Hz framerate by the number of Kinects, rendering the whole thing useless.
Unfortunately laser light is already polarized to some degree, so a hack using polarizing filters to let Kinects operate simultaneously would prove challenging. In the same vein, wavelength shifting would be insanely complicated and expensive, as narrow-band IR filters are not cheap or easy to come by.
Radu Bogdan Rusu has said that "Judging from a few basic tests with two Kinects, I can't seem to get them to interfere with each other to the point where the data is unusable."
Also here is a great explanation of how the pattern is used to determine depth:
The software is also quite revolutionary. One guy had this to say about it:
“The sophistication and performance of the algorithms rival or exceed anything that I’ve seen in academic research, never mind a consumer product. … We would all love to one day have our own personal holodeck. This is a pretty measurable step in that direction.”
Who’s Johnny Chung Lee? Remember this guy?
What’s he working on? The Kinect.
He’s not exaggerating either. The software is nothing short of revolutionary. Estimates are it took 5 to 10 years to develop! That’s the average lifespan of a ferret.
(missing picture: Ferret computer scientists)
How it works:
The team first built software that could learn, then sent it to school to learn how humans look and move.
And it works! It actually works. Until now computer vision has been this thing that sort of works and breaks if you look at it funny, but this actually works.
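The “school” part boils down to a classifier that labels every depth pixel with a body part, using very cheap features. A sketch of one such feature is below; the probe offsets, the fallback value, and the toy depth map are all made-up illustrations, not Microsoft’s actual training setup:

```python
# Hedged sketch of the kind of per-pixel depth feature a body-part
# classifier can be trained on. Offsets and fallback values are invented
# for illustration, NOT the Kinect software's real parameters.
def depth_feature(depth, x, y, u, v):
    """Depth difference between two probe points around (x, y).

    The offsets are scaled by 1/depth, so the feature gives similar
    answers whether the player stands near the sensor or far away."""
    d = depth[y][x]

    def probe(dx, dy):
        px, py = x + int(dx / d), y + int(dy / d)
        if 0 <= py < len(depth) and 0 <= px < len(depth[0]):
            return depth[py][px]
        return 1e6  # off the image counts as "very far away" (background)

    return probe(*u) - probe(*v)

# Toy 5x5 depth map: a flat wall 2 m away with one point 1 m away.
wall = [[2.0] * 5 for _ in range(5)]
wall[2][3] = 1.0
print(depth_feature(wall, 2, 2, (2, 0), (0, 0)))  # -1.0: right probe is nearer
```

Thousands of these yes/no depth comparisons, stacked into decision trees and trained on huge amounts of labeled pose data, are what turn a raw depthmap into “that blob is a left elbow”.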
As if that wasn’t enough, it also understands your voice and tracks it around the room (those 4 microphones were not just for weight balancing).
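The tracking works because sound reaches each microphone at a slightly different time, and that delay encodes direction. Here is a minimal sketch of the idea for just two microphones; the spacing and delay are illustrative assumptions, not the Kinect array’s real geometry:

```python
# Hedged sketch: direction of a sound source from the arrival-time
# difference between two microphones. The spacing and delay values are
# illustrative, NOT the Kinect's actual array geometry.
import math

SPEED_OF_SOUND = 343.0  # m/s in room-temperature air

def bearing_from_tdoa(delay_s, mic_spacing_m):
    """A sound from angle theta travels an extra spacing*sin(theta) to the
    far microphone, so theta = asin(c * delay / spacing)."""
    ratio = SPEED_OF_SOUND * delay_s / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # clamp measurement noise
    return math.degrees(math.asin(ratio))

# Sound arriving 0.2 ms earlier at one of two mics spaced 0.2 m apart:
print(bearing_from_tdoa(0.0002, 0.2))  # about 20 degrees off-axis
```

With four microphones instead of two, you get several independent delay pairs, which is what lets the Kinect pin down where in the room the voice is coming from rather than just a single angle.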
Until now, researchers would pay thousands for hardware that wasn’t half as good as this, and the community has embraced this thing with open arms. Microsoft has said they’ll release an SDK to work with their wonderful software (hurry up, will you?). Until now computers couldn’t actually see; even the limited info they could receive wasn’t really that easy to interpret, so it wasn’t used. The Kinect has given computers more than eyes, it’s given them vision.
One question is: why living-room gaming as the first target? It seems like an odd first choice. My personal theory is that it’s a trojan horse tactic: first get it into everything, then enable everyone to write software for it. Developers, developers, developers.
One thing is certain. This has changed the face of computing and robotics forever.