The Kinect is a device introduced in 2010 as an accessory for the Xbox 360. The data it acquires are of different and complementary natures, combining geometry with visual attributes. For this reason, the Kinect is a flexible tool that can be used in applications in several areas, such as Computer Graphics, Image Processing, Computer Vision and Human-Machine Interaction. As a result, the Kinect is widely used in industry (games, robotics, theater performances, natural interfaces, etc.) and in research. In this tutorial we will first present the main techniques related to data acquisition: capture, representation and filtering. The data consist of a color image (RGB) and depth information (D). This structure is called an RGBD Image. After that, we will talk about the tools available for developing applications on various platforms. We will also discuss some recently developed projects based on RGBD Images, in particular those related to Object Recognition, 3D Reconstruction, and Interaction. We will show some research developed by the academic community and some projects developed for industry. We intend to show the basic principles needed to begin developing applications using the Kinect, and to present some projects developed at the VISGRAF Lab. Finally, we intend to discuss the new possibilities, challenges and trends raised by the Kinect.
We intend to address Vision and Image Processing techniques using RGBD images, and to talk about application development using the Kinect (games, natural interfaces, robotics, 3D reconstruction, object extraction, etc.). Accordingly, the topics are of interest to students and researchers in the areas of Computer Graphics, Computer Vision, Image Processing and Pattern Recognition.
I - Introduction and Motivations - 30 minutes - Luiz Velho
II - Development with Kinect - 1 hour - Djalma Lúcio
III - Applications - 1 hour - Leandro Cruz
IV - Conclusions - 30 minutes - Luiz Velho
The Kinect appeared in 2010 as an accessory for the Xbox 360 console. It is a device developed by PrimeSense together with Microsoft. Its announcement, in 2009, raised great expectations in the Computer Graphics and Computer Vision academic communities. The product promised a new way to interact with games, based entirely on gestures and voice (without any other type of controller).
3D scanning has become very popular in recent years. Several technologies can be used to capture the geometry of an object (or a scene), such as LIDAR, time of flight, stereo cameras, and structured light. Nevertheless, a scanner is an expensive machine. The Kinect, on the other hand, is a low-cost device capable of capturing the geometry and colors of a scene in real time. Naturally, there is a trade-off. The Kinect data resolution is typically 640x480, which is lower than that of most scanners. However, it is enough for several applications. Furthermore, we can infer better data from the captured data.
Most image processing systems are based only on the color channels of the images. Nevertheless, other image attributes can be used in processing, for instance depth, normals, luminance, etc. These attributes can carry information that facilitates, or even makes possible, procedures that are hard, if not impossible, to implement using only colors. Accordingly, the information acquired by the Kinect (RGB + Depth) has a structure that creates a new way to process images. In this tutorial, we will explore the new possibilities generated by this structure, called an RGBD Image. An example of these possibilities is the real-time tracking of a human skeleton (a structure widely used in gesture-based interaction).
Some courses related to the Kinect were presented in the last two years. In particular, in 2011, IMPA's Image Processing course was presented with the theme "The RGBD Video". The projects using the Kinect developed in this course motivated this tutorial.
Topics to be Presented
I - Introduction and Motivations
Since it was introduced in 2010, the Kinect has become a device widely used in industry (games, robotics, theater performances, natural interfaces, etc.) and in research. This is motivated by the new possibilities raised by combining color and depth.
The Kinect has an RGB camera and an infrared (IR) emitter and camera. Together they capture a color image and the depth of each pixel in the scene. These data contain visual and geometric information about the scene. They are complementary, and they allow us to perform tasks that are difficult, if not impossible, when we use only images.
In this tutorial we intend to discuss the data acquisition process using the Kinect, which consists of capturing, representing and filtering the data. Regarding capture, the Kinect uses an RGB camera to get a color image, and an infrared emitter and camera. It uses a structured light method to obtain the depth information from the infrared data. The emitted light pattern is proprietary (not disclosed). However, there are public tools that know this information and therefore know how to calculate the depth values. The data representation is a combination of an RGB image (three channels per pixel, each an 8-bit integer) and the depth data (a matrix, like an image, where each element is a 16-bit integer), i.e., an RGBD Image.
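The representation described above can be sketched in a few lines of Python with NumPy. This is only an illustrative layout, not the format used by any particular Kinect driver: the function name `make_rgbd`, the dictionary keys, and the assumption that color and depth share the same 640x480 resolution are all hypothetical choices made for this example.

```python
import numpy as np

# Typical Kinect resolution cited in this tutorial (rows, columns).
HEIGHT, WIDTH = 480, 640


def make_rgbd(rgb, depth):
    """Bundle a color image and a depth map into one RGBD structure.

    Hypothetical helper: checks only the layout described in the text
    (3 x 8-bit color channels, one 16-bit depth value per pixel).
    """
    if rgb.dtype != np.uint8 or rgb.ndim != 3 or rgb.shape[2] != 3:
        raise ValueError("rgb must be an (H, W, 3) array of uint8")
    if depth.dtype != np.uint16 or depth.shape != rgb.shape[:2]:
        raise ValueError("depth must be an (H, W) array of uint16")
    return {"rgb": rgb, "depth": depth}


# Placeholder frames; a real application would fill these from a driver.
rgb = np.zeros((HEIGHT, WIDTH, 3), dtype=np.uint8)
depth = np.zeros((HEIGHT, WIDTH), dtype=np.uint16)
rgbd = make_rgbd(rgb, depth)

# Storage per frame: 3 bytes of color plus 2 bytes of depth per pixel.
bytes_per_frame = rgbd["rgb"].nbytes + rgbd["depth"].nbytes
```

Keeping the two arrays separate, rather than packing them into one four-channel array, preserves the different bit depths of color and depth.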
II - Development with Kinect
Besides the Xbox 360 console, we can use the Kinect with computers. It is used in games, but also in several other applications. There are several tools for developing systems using the Kinect, among them OpenKinect, OpenNI and the Microsoft Kinect SDK. In this tutorial we will give a brief introduction to these tools. Moreover, we will present some applications developed at VISGRAF:
- Lambe-lambe 3D
  - A 3D face scanner using the Kinect
- Kinect Stereo Viewer
  - Visualizes the information captured with the Kinect in 3D stereo format
- A method for object extraction in RGBD images (an extension of the GrabCut method)
- Feature tracking and detection
  - Detects and tracks features in an RGBD video
- AC/DC Bot
  - A robot that moves around a floor on motorized wheels and recognizes the surrounding 3D environment by analyzing the images obtained with the Kinect
- INTEGRARTE ENTREGARTE
  - Explores the body and its possible visual and audible extensions
III - Applications
The raw depth data coming from the Kinect has some problems. It is usually very noisy, it can be missing in certain regions (holes), and it is not aligned with the color information. Therefore, we need to preprocess the raw data before using it in most applications. This preprocessing consists of filtering the data (to remove noise and fill holes) and calibrating the cameras (RGB and IR) to align the depth with the color information. We will discuss some techniques for these tasks.
The Kinect has raised interest not only in the game industry; it has also been used as a research tool by the academic community. We can cite works related to Reconstruction of 3D Scenes and Objects, Object Recognition, and Interaction, among others. These research topics are old and fairly well developed. However, in general, the solutions use expensive machines or very complex algorithms to infer geometry from images. One motivation for using the Kinect is its low cost and the availability of several tools that support its use.
It is important to highlight that the Kinect is a low-cost device and that several datasets of RGBD Images are available on the internet, such as the NYU Depth Dataset and the RGB-D Object Dataset, among others. With these data, it is possible to do a lot of research (on RGBD Image and Video Processing, Pattern Recognition, etc.) even without access to a Kinect.
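As a rough illustration of the hole-filling part of this preprocessing, the sketch below replaces each missing depth value with the median of the valid values in a small window around it. It is one simple option among many, not the method used by the tutorial authors. The function name `fill_holes_median`, the window radius, and the convention that a depth of 0 marks a missing measurement are assumptions made for this example. Noise smoothing and camera calibration are separate steps and are not shown.

```python
import numpy as np


def fill_holes_median(depth, invalid=0, radius=1):
    """Fill missing depth pixels with the median of valid neighbours.

    Assumption: pixels equal to `invalid` (0 here) carry no measurement.
    Holes with no valid neighbour inside the window are left unchanged.
    """
    out = depth.copy()
    h, w = depth.shape
    ys, xs = np.nonzero(depth == invalid)
    for y, x in zip(ys, xs):
        # Clip the (2*radius+1)-sized window to the image borders.
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        x0, x1 = max(0, x - radius), min(w, x + radius + 1)
        window = depth[y0:y1, x0:x1]
        valid = window[window != invalid]
        if valid.size > 0:
            out[y, x] = int(np.median(valid))
    return out


# Small synthetic depth map with a single hole in the middle.
depth = np.full((5, 5), 1000, dtype=np.uint16)
depth[2, 2] = 0
filled = fill_holes_median(depth)
```

The median is used rather than the mean so that a hole on the border between a near object and a far background takes a value from one side instead of an average that belongs to neither. Larger holes can be handled by increasing `radius` or by applying the function repeatedly.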
IV - Conclusions
The main purpose of this tutorial is to present some applications built using the Kinect and to point out promising directions for research. We will analyze some research currently being done using RGBD images (in games, robotics, theater performances, natural interfaces, etc.) to show the challenges and trends of the area. Moreover, we will also discuss some possibilities for new applications using this device.