Gil-o-topia: How does the Kinect really work?

Tuesday, November 16, 2010

How does the Kinect really work?

I’m quite tired of seeing the stupid things people try to pass off as facts about how the Kinect works, so instead of complaining about it I decided to make this handy guide.

First of all what it isn’t: it is not an EyeToy or just a webcam.

Hardware:

That weird (huge) box contains:

a VGA 640x480 color camera (CMOS) with a Bayer color filter
an IR 640x480 camera (CMOS)*
an IR projector
a motor, an assortment of control chips and 4 microphones
a fan!

(* It seems the output depth map is 640x480, but the IR camera sensor size is closer to 1600x1200.)

The hardware is something researchers have been pining for for a loooong time, because it’s a cheap camera that provides a distance measurement for every pixel. Previously all they could do was guess and hope for ideal conditions so that their algorithms would work.

How it works:

It is NOT a time-of-flight camera, as a lot of websites have incorrectly stated. It is a structured light camera. You’ve seen those videos of 3D scanners projecting stripes onto things as if imitating venetian blinds? It’s the same principle at work, only there’s nothing venetian about it.

That IR projector mentioned above is an IR laser that passes through a diffraction grating and turns into a lot of IR dots.

They look chaotic because that’s how the Kinect can tell what’s going on. NO, IT WILL NOT GIVE YOU CANCER. It is a class 1 laser, so it's safe to be exposed to it indefinitely. Each of those points is unique, and the IR camera sees how it distorts and says "hmm, the distance to that point is x".
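To make the "sees how it distorts" step a bit more concrete, here is a minimal triangulation sketch. It assumes the sensor compares each dot's position with where the same dot landed on a flat reference plane at a known distance. The real algorithm has not been published, and the focal length, projector-to-camera baseline and reference distance below are illustrative guesses, not measured Kinect values.

```python
def depth_from_dot_shift(shift_px, focal_px=580.0, baseline_m=0.075, ref_depth_m=2.0):
    """Estimate the distance (in metres) to a dot from its sideways shift in the IR image.

    shift_px is how far the dot moved, in pixels, compared with its position on
    the reference plane; positive means the surface is closer than that plane.
    All default values are assumptions chosen for illustration only.
    """
    # Triangulation against a reference plane: 1/Z = 1/Z_ref + shift / (f * b)
    inverse_depth = 1.0 / ref_depth_m + shift_px / (focal_px * baseline_m)
    if inverse_depth <= 0:
        raise ValueError("shift is outside the range this simple model can explain")
    return 1.0 / inverse_depth


if __name__ == "__main__":
    for shift in (0.0, 10.0, 21.75):
        print(f"shift {shift:6.2f} px -> {depth_from_dot_shift(shift):.2f} m")
```

A dot that has not moved sits on the reference plane; the further it shifts, the closer the surface is. The same relation explains why depth resolution gets worse with distance: far away, a large change in depth produces only a tiny shift.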

Combine that data with the color image from the camera and you have this:

That, ladies and gentlemen, is a real-time 3D video stream!
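A rough sketch of what "combine that data with the color image" means in practice: every depth pixel is pushed back out into 3D space and given the color found at the same pixel. This assumes a simple pinhole camera model and that the depth and color images are already aligned pixel for pixel. On the real device the two cameras sit a few centimetres apart and need calibration, and the focal length and image centre below are illustrative guesses.

```python
def depth_pixel_to_point(u, v, depth_m, focal_px=580.0, cx=320.0, cy=240.0):
    """Turn one depth-map pixel (u, v) with a depth in metres into an (x, y, z) point."""
    x = (u - cx) * depth_m / focal_px
    y = (v - cy) * depth_m / focal_px
    return (x, y, depth_m)


def colored_point_cloud(depth_map, color_map):
    """Pair every valid depth pixel with the color at the same pixel.

    depth_map: rows of depths in metres, with 0 meaning "no reading" (the shadows).
    color_map: rows of (r, g, b) tuples, assumed already aligned with depth_map.
    """
    cloud = []
    for v, row in enumerate(depth_map):
        for u, depth_m in enumerate(row):
            if depth_m > 0:  # skip pixels where the IR camera saw no dots
                cloud.append((depth_pixel_to_point(u, v, depth_m), color_map[v][u]))
    return cloud
```

Doing this for all 640x480 pixels, 30 times a second, is what produces the 3D video.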
Those odd shadows are where the camera can’t see. Combine a few more you say? You can’t. Remember how I said the dots are unique? They suddenly become a lot less unique if there are two or more dotfields in the same area. The Kinects would become confused, sad and depressed.
Switching between them so there’s only one dotfield active at a time might work but that would divide the 30Hz framerate by the number of Kinects rendering the whole thing useless.
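The arithmetic behind that objection, as a tiny sketch. It assumes the Kinects simply take turns, one full frame each, with no time lost when switching; `per_sensor_rate` is a made-up helper name.

```python
def per_sensor_rate(total_hz=30.0, sensors=1):
    """Frame rate each Kinect gets when only one dotfield may be lit at a time."""
    if sensors < 1:
        raise ValueError("need at least one sensor")
    return total_hz / sensors


if __name__ == "__main__":
    for n in range(1, 5):
        print(f"{n} Kinect(s): {per_sensor_rate(30.0, n):.1f} Hz each")
```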

Unfortunately laser light is already polarized to some degree, so a hack to let Kinects operate simultaneously using polarizing filters would prove challenging. In the same vein, wavelength shifting would be insanely complicated and expensive, as narrow-band IR filters are not cheap or easy to come by.

Radu Bogdan Rusu has said that "Judging from a few basic tests with two Kinects, I can't seem to get them to interfere with each other to the point where the data is unusable."

Also here is a great explanation of how the pattern is used to determine depth:

The software:
The software is also quite revolutionary. One guy had this to say about it:

The sophistication and performance of the algorithms rival or exceed anything that I've seen in academic research, never mind a consumer product.

...

We would all love to one day have our own personal holodeck. This is a pretty measurable step in that direction.

– Johnny Chung Lee

Who’s Johnny Chung Lee? Remember this guy?

Him.

What’s he working on? The Kinect.

He’s not exaggerating either. The software is nothing short of revolutionary. Estimates are it took 5 to 10 years to develop! That’s the average lifespan of a ferret.

(missing picture: Ferret computer scientists)

How it works:

The team first built software that could learn, then sent it to school to learn how humans look and move.

And it works! It actually works. Until now computer vision has been this thing that sort of works and breaks if you look at it funny, but this actually works.

As if that wasn’t enough, it also understands your voice and tracks it in the room (those 4 microphones were not just for weight balancing).

Until now researchers would pay thousands for hardware that wasn’t half as good as this. The community has embraced this thing with open arms. Microsoft has said they’ll release an SDK to work with their wonderful software (hurry up, will you?). Until now computers couldn’t actually see. Even the limited info they could receive wasn’t really that easy to interpret, so it wasn’t used. The Kinect has given computers more than eyes, it’s given them vision.

One question is: why living room gaming as the first target? It seems like an odd first choice. My personal theory is that it’s a Trojan horse tactic. First get it into everything, then enable everyone to write software for it. Developers, developers, developers.

One thing is certain. This has changed the face of computing and robotics forever.

35 comments:

Joe, November 16, 2010 at 15:07
Thanks for this write-up. Regarding the "why the living room" question: well, every household has one. It's more or less the "killer app" for it. Sadly, we geeks are the ones thinking of its applications beyond being a fancy toy.
Anonymous, November 16, 2010 at 15:23
What do you think about the Kinect's class 1 IR laser? Is it dangerous for the eyes at short distance?
What about people who play for a long time?
William, November 16, 2010 at 16:16
A class 1 laser is eye safe under all conditions. No worries.
Mif, November 16, 2010 at 16:50
Great write-up!
hddscan, November 16, 2010 at 17:35
According to research, the depth camera is 640x480, not 320x240.
Anonymous, November 16, 2010 at 18:23
You could use more than one Kinect by polarising the light, or by swapping the IR LEDs for ones emitting a different IR frequency.
Gil, November 16, 2010 at 20:32
There are no IR LEDs. It's a laser diode inside, and lasers are already polarized. The idea might be worth exploring, but it will be no walk in the park.
Anonymous, November 17, 2010 at 02:18
Nice wrap-up. Thank you.
Anonymous, November 18, 2010 at 03:39
Thanks for an interesting read. I have a couple of quibbles. Computer vision researchers don't spend thousands on hardware; quite the opposite. The majority of systems use one or two cameras. Also, I'm not totally convinced about the scale of the revolution you're expecting. This system won't scale to large or outdoor installations easily, nor will it handle the majority of tasks computer vision focuses on (the detection of events, for example). That said, for small-room systems this could be very useful; gesture recognition springs to mind.
miu miu bag, November 18, 2010 at 10:30
This comment has been removed by a blog administrator.
Anonymous, November 19, 2010 at 07:31
Is the second image of the sensor (without the cover) photographed by you? Just curious. I would like to use it on a different webpage and want to make sure that I give the appropriate credits. Thanks.
Gil, November 19, 2010 at 14:21
No, it's from the iFixit teardown. The image links to it.
Unknown, November 19, 2010 at 19:08
My first thought for making the two light fields work with each other is a shutter system over the laser projector and camera, as an external mod. I'd be interested in seeing how much light loss the IR (near-IR) camera can deal with before depth data is lost. I'm pretty sure it could deal with a 1/15th exposure time, allowing at least two to work together.

The next question is then what the jello effect is like on the camera, and how much distortion the shutter causes.

I would have assumed the camera had a 1/60 shutter. 1/30th results in a lot of motion blurring.
Anonymous, November 22, 2010 at 13:43
Yes, hurry up and release the SDK, PLEASE!!!!
Unknown, November 24, 2010 at 19:49
This comment has been removed by the author.
Anonymous, November 24, 2010 at 19:53
You said: "Each of those points is unique and the IR camera sees how it distorts and says 'hmm the distance to that point is x'."

The QUESTION is: nobody really knows how they did that! Please clarify!
Gil, November 24, 2010 at 20:22
My best guess:

Each point has a default position, at the minimum distance that the Kinect measures. The camera recognizes that point based on its position in the grid and measures the difference between the actual position and the default position. It's the same concept as a projector's image getting bigger the farther away it is.
Anonymous, November 26, 2010 at 00:05
Thanks for that article! I have one question: do you know if you can also operate the Kinect without the fan? I have to make my Kinect as small as possible, and it would be helpful if I could simply remove the fan without destroying the Kinect.

Stefan
Gil, November 26, 2010 at 00:12
The fan is there to keep the components within a specific temperature range (specifically the IR camera and laser). Removing it should be interesting :P
My guess is that it would work, but who knows what your distance error will be. It certainly won't be ruined by a few tests.
Benjamin Burns, November 28, 2010 at 11:44
They could prevent crosstalk by using a pseudorandom modulation scheme. The idea is that the laser's intensity can be controlled, and that a series of random intensities is repeatedly applied to the laser to form a "chirp." As the chirp is being projected it could also "simultaneously" be captured. Points that do not match the expected intensity range (intensity would vary with distance, surface reflectivity, etc.) would be discarded. Points that do would be kept for the duration of the chirp. At the end of the capturing sequence, the depth map could be computed as normal from the remaining points.

Unfortunately, this would increase the cost of the hardware, as the laser would need to be controlled by a DAC and the framerate of the camera would need to be increased. Also, unless the modulation frequency is high enough (which hardware costs would probably prohibit), a motion tracking algorithm might need to be added to keep real dots from being falsely filtered out because their position changed during a chirp sequence.
Benjamin Burns, November 28, 2010 at 11:49
Actually, for long enough chirps, a pseudorandom square wave would probably be good enough. That would at least eliminate the need for the DAC.
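The pseudorandom square-wave idea above could be sketched roughly like this. It is only an illustration of the commenter's proposal, not something the Kinect is known to do: each sensor switches its laser on and off following its own pseudorandom code, and a dot is kept only if it was seen on and off in step with that code. The function names, the 32-frame code length and the 0.9 threshold are all made up for the example.

```python
import random


def make_code(seed, length=32):
    """Pseudorandom on/off sequence (1 = laser on) for one sensor, one entry per frame."""
    rng = random.Random(seed)
    return [rng.randint(0, 1) for _ in range(length)]


def agreement(observed, code):
    """Fraction of frames in which a dot's visibility matched the code."""
    matches = sum(1 for seen, expected in zip(observed, code) if seen == expected)
    return matches / len(code)


def keep_own_dots(observations, own_code, threshold=0.9):
    """Return the ids of dots whose on/off history follows our own code.

    observations maps a dot id to its per-frame visibility (1 = seen, 0 = not seen).
    The threshold leaves room for a few misread frames.
    """
    return [dot_id for dot_id, seen in observations.items()
            if agreement(seen, own_code) >= threshold]


if __name__ == "__main__":
    mine = make_code(seed=1)
    theirs = make_code(seed=2)
    print(keep_own_dots({"dot_a": mine, "dot_b": theirs}, mine))
```

The catch is the one already pointed out: every depth frame now costs a whole code's worth of camera frames, so the camera has to run much faster, and dots that move during a chirp need to be tracked.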
Anonymous, December 20, 2010 at 16:52
I just acquired a Kinect and made some incredible measurements. The laser output power is about 40 mW to 60 mW, i.e. it exceeds the class 1 limit a hundred times over. Moreover, this is IR light, so there is no blink reflex that could protect the eye. Although the diffractive optical element, which generates the structured dot pattern for 3D measurement, separates the incident laser beam into thousands of low-power beams, at short distance (a few centimeters) all this power is focused on the retina. I've been involved in laser product certification, and I can't understand how Microsoft got this class 1 rating. For me this is a very dangerous device, and I would recommend never looking at the laser dot pattern from less than 50 cm. Take care with children.
Anonymous, December 21, 2010 at 01:20
This is 100% bullshit.
Do you really think Microsoft would release a device that's not 100% safe? Do you really think any sensible company would take that insane risk? I have a feeling that they understand this subject just a bit more than you do. The device is a certified class 1 laser, and that means that you can do whatever you want with it, from any distance, for any amount of time, and it will still be safe. Unless you have some *REAL* proof, stop scaring people.
Anonymous, December 21, 2010 at 17:51
I confirm that a class 1 laser cannot deliver more than 0.39 mW into a 7 mm diameter pupil at 7 cm from the laser source (ANSI Z136 / IEC 60825). This is not, by far, the case with the Kinect. So the Kinect probably doesn't use a laser source but a high-power IR LED. Anyway, it is not recommended to stare at a 60 mW IR source (coherent or not) at short distance.
Anonymous, December 27, 2010 at 12:30
I did the same with my Handycam, as seen in the video, and I really don't know whether it is safe to look into this laser light for hours and hours, because when you look through the camera you can't see anything other than a bright light.
elllbelll, December 29, 2010 at 13:11
Hi, I came across your website as I was looking for a technical description of the Kinect. I find it very interesting because I am doing some research on avatars, and how we could use them to create "signing avatars" that could interpret into BSL (British Sign Language) rather than having real-life interpreters all the time.
There have been comments that the Kinect avatars are actually half a second behind real-life movement. Does this affect gameplay? Just how much detail can be obtained with the Kinect, and could this software be adapted to take short recordings of different signs and store them in a dictionary to be called up by voice activation?
Sorry, a million questions at once, but this might be a revolutionary technical find that could change so many people's lives! :)
Gil, December 29, 2010 at 18:22
The delay is mostly caused by the processing software. The hardware is perfectly capable of realtime work. The processing power of the computer

Your application is tricky because in its default setting the Kinect doesn't register fingers. It is perfectly capable of doing so from close range, but the software doesn't support it. The good news is that you can write your own software using OpenNI, and soon using the Microsoft SDK. It's certainly possible to do what you want, and the Kinect is perfect for it, but it won't be easy.

If you have any more questions I'd be happy to answer them.
elllbelll, December 29, 2010 at 18:55
Thank you for replying :) Made my day!
Yes, I was thinking the finger recognition didn't really work. I take it that it also doesn't really notice facial expressions?
I was also looking into the PS3 Move. I don't know how much you know about that; would it be better for finger/facial expressions? (Don't worry if you don't know the answer to that!!)
:)
Gil, December 29, 2010 at 19:00
The PS3 Move wouldn't work AT ALL. All the Move does is track the remote through the room. No other 3D data is processed.

For facial expressions you could process the video information from the Kinect. It's doable with OpenCV but not easy.
Anonymous, January 11, 2011 at 00:35
Thank you so much for explaining how the Kinect actually works. This is a pretty good article until you start talking about personal holodecks; the tone changes to that of a salesman after that point. "The Kinect has given computers more than eyes, it’s given them vision." LOL... if you were a computer vision guy you wouldn't say that.
Anonymous, January 11, 2011 at 10:18
I work with computer vision and I think he's right.
Anonymous, February 11, 2011 at 23:30
For getting multiple angles, why double up the entire system? What about just using a second IR camera to decode the one dotfield from a different vantage point? I.e., there'd still be a single unique dotfield; you'd just pull the same distance-inference trick with the video stream from the new, second IR camera.
Anonymous, February 26, 2011 at 05:17
Haha... I liked your last comment about the living room video console, mostly because it brings to attention the following notion:

It took entertainment to bring a novel technology that has so many other applications to actually come to fruition.

I think the Kinect is awesome and I am using it to build a robot, but I think it says a little about human nature and where our priorities are.