VISGRAF - Computer Graphics Laboratory

Abstract

The Kinect is a device introduced in 2010 as an accessory for the Xbox 360. The data it acquires are of different and complementary natures, combining geometry with visual attributes. For this reason, the Kinect is a flexible tool that can be used in applications in several areas, such as Computer Graphics, Image Processing, Computer Vision and Human-Machine Interaction. As a result, it is widely used in industry (games, robotics, theater performances, natural interfaces, etc.) and in research. In this tutorial we will first present the main techniques related to data acquisition: capture, representation and filtering. The data consist of a color image (RGB) and depth information (D); this structure is called an RGBD Image. After that, we will talk about the tools available for developing applications on various platforms. We will also discuss some recently developed projects based on RGBD Images, in particular those related to Object Recognition, 3D Reconstruction, and Interaction, covering both research from the academic community and projects developed for industry. We intend to show the basic principles needed to begin developing applications using the Kinect, and to present some projects developed at the VISGRAF Lab. Finally, we will discuss the new possibilities, challenges and trends raised by the Kinect.

Target Audience

We intend to address Vision and Image Processing techniques using RGBD images, and to talk about application development using the Kinect (games, natural interfaces, robotics, 3D reconstruction, object extraction, etc.). Accordingly, the topics are of interest to students and researchers in the areas of Computer Graphics, Computer Vision, Image Processing and Pattern Recognition.

Motivation

The Kinect appeared in 2010 as an accessory for the Xbox 360 console. It is a device developed by PrimeSense together with Microsoft. Its announcement, in 2009, raised great expectations in the Computer Graphics and Computer Vision academic community. The product promised a new way to interact in games, based entirely on gestures and voice (without any other type of controller).

3D scanning has become very popular in recent years. Several technologies can be used to capture the geometry of an object (or a scene), such as LIDAR, time of flight, stereo cameras, and structured light. Nevertheless, a scanner is an expensive machine. The Kinect, on the other hand, is a low-cost device capable of capturing the geometry and colors of a scene in real time. Naturally, there is a trade-off: the resolution of Kinect data is typically 640x480, which is lower than that of most scanners. However, it is enough for several applications, and better data can be inferred from the captured data.

Most image processing systems are based only on the color channels of the images. Nevertheless, other image attributes can be used in processing, for instance depth, normals, luminance, etc. These attributes can carry information that facilitates, or even makes possible, procedures that are hard, if not impossible, to implement using only colors. Accordingly, the information acquired by the Kinect (RGB + Depth) has a structure that creates a new way to process images. In this tutorial, we will explore the new possibilities generated by this structure, called an RGBD Image. One example is the real-time tracking of a human skeleton (a structure widely used in gesture-based interaction).

Some courses related to the Kinect were presented in the last two years. In particular, in 2011, IMPA's Image Processing course had RGBD Video as its theme. The Kinect projects developed in that course motivated this tutorial.

Topics to be Presented

I - Introduction and Motivations

Since its release in 2010, the Kinect has become a device widely used in industry (games, robotics, theater performances, natural interfaces, etc.) and in research. This is motivated by the new possibilities raised by combining color and depth.

The Kinect has an RGB camera, and an infrared (IR) emitter and camera. Together they capture a color image and the depth of each pixel in the scene. These data contain visual and geometric information about the scene. They are complementary, and they allow us to perform tasks that are difficult, if not impossible, when we use only color images.
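To make the "geometric information" concrete, the following is a minimal sketch of how a depth pixel can be turned into a 3D point under the standard pinhole camera model. It is only an illustration: the intrinsic parameters `FX`, `FY`, `CX`, `CY` are placeholder values assumed for a 640x480 depth image, the function names are hypothetical, and depth is assumed to be already converted to meters. Real values must come from calibrating the specific device.

```python
from typing import List, Optional, Tuple

Point3D = Tuple[float, float, float]

# Assumed (placeholder) pinhole intrinsics for a 640x480 depth image.
FX, FY = 580.0, 580.0   # focal lengths, in pixels
CX, CY = 320.0, 240.0   # principal point, in pixels


def backproject(u: int, v: int, depth_m: float) -> Optional[Point3D]:
    """Map pixel (u, v) with depth in meters to a 3D point in camera space.

    Returns None when the depth is not positive (treated here as missing).
    """
    if depth_m <= 0.0:
        return None
    x = (u - CX) * depth_m / FX
    y = (v - CY) * depth_m / FY
    return (x, y, depth_m)


def depth_to_points(depth: List[List[float]]) -> List[Point3D]:
    """Back-project every valid pixel of a depth map (rows of depth values)."""
    points: List[Point3D] = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            p = backproject(u, v, d)
            if p is not None:
                points.append(p)
    return points
```

With these assumed intrinsics, the pixel at the principal point maps to a point on the optical axis, and pixels with no depth reading are simply skipped.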

In this tutorial, we intend to discuss the data acquisition process using the Kinect, which consists of capturing, representing and filtering the data. For capture, the Kinect uses an RGB camera to get a color image, together with an infrared emitter and camera. It uses a structured light method to obtain the depth information from the infrared data. The emitted light pattern is proprietary (not disclosed). However, there are public tools that have this information and can therefore calculate the depth values. The data representation is a combination of an RGB image (three channels per pixel, each an 8-bit integer) and the depth data (a matrix, like an image, where each element is a 16-bit integer), i.e., an RGBD Image.
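The representation described above can be sketched as a small container type. This is an illustrative sketch, not the layout used by any particular Kinect tool: the class name `RGBDImage`, the helper `blank_rgbd`, and the assumption that the color and depth arrays share the same resolution are all choices made for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple

Color = Tuple[int, int, int]  # (R, G, B), each channel an 8-bit integer


@dataclass
class RGBDImage:
    """Hypothetical container pairing a color image with a depth matrix."""
    width: int
    height: int
    color: List[List[Color]]  # height rows of width (R, G, B) tuples
    depth: List[List[int]]    # height rows of width 16-bit depth values

    def __post_init__(self) -> None:
        # Both arrays must have the declared shape.
        for name, rows in (("color", self.color), ("depth", self.depth)):
            if len(rows) != self.height or any(len(r) != self.width for r in rows):
                raise ValueError(f"{name} must have {self.height} rows of {self.width}")
        # Color channels must fit in 8 bits, depth values in 16 bits.
        for row in self.color:
            for px in row:
                if len(px) != 3 or any(not 0 <= c <= 0xFF for c in px):
                    raise ValueError("color channels must be 8-bit integers")
        for row in self.depth:
            if any(not 0 <= d <= 0xFFFF for d in row):
                raise ValueError("depth values must be 16-bit integers")

    def pixel(self, x: int, y: int) -> Tuple[int, int, int, int]:
        """Return the (R, G, B, D) sample at column x, row y."""
        r, g, b = self.color[y][x]
        return (r, g, b, self.depth[y][x])


def blank_rgbd(width: int, height: int) -> RGBDImage:
    """Create an all-black image with zero depth everywhere."""
    color = [[(0, 0, 0) for _ in range(width)] for _ in range(height)]
    depth = [[0 for _ in range(width)] for _ in range(height)]
    return RGBDImage(width, height, color, depth)
```

For example, `blank_rgbd(640, 480)` would allocate an image at the typical Kinect resolution, and `pixel(x, y)` reads the four values stored for one location.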

II - Development with Kinect

Besides the Xbox 360 console, the Kinect can also be used with computers. It is used in games, but also in several other applications. There are some tools for developing systems using the Kinect, among them OpenKinect, OpenNI and the Microsoft Kinect SDK. In this tutorial we will give a brief introduction to these tools. Moreover, we will present some applications developed at VISGRAF:

Lambe-lambe 3D

A 3D face scanner using the Kinect

Kinect Stereo Viewer

Visualizes the information captured with the Kinect in 3D stereo format

GrabCut+D

A method for object extraction in RGBD images (an extension of the GrabCut method)

Feature track and detection

Detects and tracks features in an RGBD video

AC/DC Bot

A robot that moves around on motorized wheels and, by analyzing the images obtained with the Kinect, recognizes the 3D environment around it

INTEGRARTE ENTREGARTE

Explores the body and its possible visual and audible extensions

III - Applications

The raw depth data coming from the Kinect has some problems. It is usually very noisy, it can be missing in certain regions (there are holes), and it is not aligned with the color information. Therefore, the raw data must be preprocessed before it can be used in most applications. This preprocessing consists of filtering the data (to remove noise and fill holes) and calibrating the cameras (RGB and IR) to align the depth with the color information. We will discuss some techniques for these tasks.
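As one simple illustration of the filtering step, the sketch below applies a median filter over the valid neighbors of each depth pixel, which smooths noise and can also fill small holes. This is a generic technique chosen for the example, not necessarily one of the methods covered in the tutorial. The convention that a depth value of 0 marks a hole, and the names `HOLE`, `neighborhood` and `median_filter`, are assumptions made here.

```python
from statistics import median
from typing import List

DepthMap = List[List[int]]
HOLE = 0  # assumption: a depth of 0 marks a missing measurement


def neighborhood(depth: DepthMap, x: int, y: int, radius: int) -> List[int]:
    """Valid (non-hole) depth values in the square window around (x, y)."""
    h, w = len(depth), len(depth[0])
    values = []
    for j in range(max(0, y - radius), min(h, y + radius + 1)):
        for i in range(max(0, x - radius), min(w, x + radius + 1)):
            if depth[j][i] != HOLE:
                values.append(depth[j][i])
    return values


def median_filter(depth: DepthMap, radius: int = 1,
                  fill_holes: bool = True) -> DepthMap:
    """Replace each pixel by the median of its valid neighbors.

    Holes are filled only when fill_holes is True; a pixel whose window
    contains no valid value is left unchanged. The input is not modified.
    """
    out = [row[:] for row in depth]
    for y, row in enumerate(depth):
        for x, d in enumerate(row):
            if d == HOLE and not fill_holes:
                continue
            values = neighborhood(depth, x, y, radius)
            if values:
                out[y][x] = int(median(values))
    return out
```

A median is used rather than a mean so that a single outlier in the window does not pull the result toward it. Aligning depth with color is a separate step that depends on the calibration of the two cameras and is not covered by this sketch.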

The Kinect has raised interest not only in the game industry; it has also been used as a research tool by the academic community. We can cite works related to Reconstruction of 3D Scenes and Objects, Object Recognition, and Interaction, among others. These research topics are old and fairly well developed. However, existing solutions generally use expensive machines, or very complex algorithms to infer geometry from images. One motivation for using the Kinect is its low cost and the availability of several tools that make it easier to use.

It is important to highlight that the Kinect is a low-cost device and that there are several data sets of RGBD Images available on the internet, such as the NYU Depth Dataset and the RGB-D Object Dataset, among others. With these data, it is possible to do a lot of research (on RGBD Image and Video Processing, Pattern Recognition, etc.) even without access to a Kinect.

IV - Conclusions

The main purpose of this tutorial is to present some applications built using the Kinect and to point out promising directions for research. We will analyze some current research using RGBD images (in games, robotics, theater performances, natural interfaces, etc.) to show the challenges and trends of the area. Moreover, we will discuss some possibilities for new applications using this device.

Authors Biography

IMPA - Instituto Nacional de Matemática Pura e Aplicada VISGRAF Lab - Laboratório de Computação Gráfica e Visão Computacional Estrada Dona Castorina, 110 CEP: 22460-320, Rio de Janeiro - RJ, Brasil.

Luiz Velho - lvelho@impa.br

Luiz Velho is a Full Researcher / Professor at IMPA - Instituto de Matematica Pura e Aplicada of CNPq, and the leading scientist of VISGRAF Laboratory. He received a BE in Industrial Design from ESDI / UERJ in 1979, an MS in Computer Graphics from the MIT / Media Lab in 1985, and a Ph.D. in Computer Science in 1994 from the University of Toronto under the Graphics and Vision groups. His experience in computer graphics spans the fields of modeling, rendering, imaging and animation. During 1982 he was a visiting researcher at the National Film Board of Canada. From 1985 to 1987 he was a Systems Engineer at the Fantastic Animation Machine in New York, where he developed the company's 3D visualization system. From 1987 to 1991 he was a Principal Engineer at Globo TV Network in Brazil, where he created special effects and visual simulation systems. In 1994 he was a visiting professor at the Courant Institute of Mathematical Sciences of New York University. He was also a visiting scientist at the HP Laboratories in 1995 and at Microsoft Research China in 2002. He has published extensively in conferences and journals in the area. He is the author of several books and has taught many courses on graphics-related topics. He is a member of the editorial board of various technical publications, and was the guest editor of the Special Issue on Computer Graphics of JBCS and of Computer & Graphics. He has also served on numerous conference program committees. His awards include the "Ordem Nacional do Merito Cientifico", an Honors Prize in the II Compaq Award for Computer Science and Prizes for Best Technical Videos and Best Papers at SIBGRAPI. In 1996 he was the Program Chair of the IX SIBGRAPI. He was distinguished as the first researcher in South America to be on the SIGGRAPH Papers Committee, in 1999. He also served on the SIGGRAPH Papers Committee in 2000, 2002 and 2003. He was a member of the Eurographics IPC in 2008. He received the prestigious grant award "Cientista do Nosso Estado" from FAPERJ in 2004, 2007 and 2009. He has been a Keynote Speaker at several conferences, including SGP 2005, CNMAC 2006, the SBPC Congress 2006, SIBGRAPI 2007, ISMM 2007, and WVC 2010.

Leandro Cruz - lcruz@impa.br

Leandro Cruz has been a PhD student at IMPA since March 2011. He received a Licentiate degree in Mathematics from the Universidade Estadual do Norte Fluminense in 2006, a Bachelor's degree in Computer Science from the Universidade Candido Mendes in 2009, and a Master's degree in Computer Graphics from IMPA in 2011. For four years he taught mathematics at CEDERJ, and for a year he worked at the NSI (Center for Information Systems) at the Instituto Federal Fluminense on multimedia web applications. As an undergraduate student, he presented three works at SIBGRAPI-WUW and two works at the Jornada de Iniciação Científica do IMPA. During his master's, he researched Sketch-based Modeling (resulting in a tutorial at SIBGRAPI in 2010) and Terrain Modeling (he presented a paper at SIBGRAPI-WTD in 2011). In 2011, he was a Teaching Assistant for an Image Processing course at IMPA whose theme was RGBD Video. Currently, he is researching interaction and modeling using the Kinect, and continuing his research on terrain modeling.

Djalma Lúcio - dlucio@impa.br

Djalma Lúcio is a computer scientist with over 20 years of experience in medium and large enterprises. Over those years, he has planned, developed and implemented various systems, services and servers, and has taught several courses on operating systems. Currently, he works as a developer and systems administrator at the VISGRAF Lab, at IMPA. He previously worked as a developer at the Tecgraf Lab, at PUC-Rio, and managed projects of NATE (Group of Learning, Work and Entertainment) at the Integrated Systems Lab at POLI-USP. He has worked in research and development of software for educational and recreational areas, including a racing simulator.

Kinect and RGBD Images: Challenges and Application

Tutorial - SIBGRAPI 2012

Luiz Velho, Leandro Cruz, Djalma Lúcio

Schedule

I - Introduction and Motivations - 30 minutes - Luiz Velho

II - Development with Kinect - 1 hour - Djalma Lúcio

III - Applications - 1 hour - Leandro Cruz

IV - Conclusions - 30 minutes - Luiz Velho
