Can Robots Learn to See?
Wednesday, May 5, 2010
Building 3 Auditorium - 11:00 AM
(Coffee at 10:30 AM)
The visual systems of animals and humans learn to locate and recognize objects, to recognize locations, and to navigate the world autonomously and effortlessly. What "learning algorithm" does the visual cortex use to organize itself? Could robots learn to see by just looking at the world and moving around it, the way animals do? A major challenge for machine learning and computer vision is to devise architectures and algorithms that can learn complex visual tasks from raw, unlabeled images and videos. The visual cortex uses a multi-stage hierarchy of representations, from pixels, to edges, to motifs, to parts, to objects, to scenes. Recent research in so-called "deep learning" has produced new algorithms that learn such multi-stage hierarchies of representations in an unsupervised fashion.
I will first describe the convolutional network model, whose architecture is inspired by the visual cortex. Each stage is composed of a series of filters, followed by a non-linear operation, and a spatial pooling operation that builds invariance to small geometric transformations of the input. I will then describe a class of learning algorithms, based on sparse coding, that enables such architectures to learn good internal representations in an unsupervised fashion. Specific visual tasks can then be learned from very small amounts of labeled data.
An application to category-level object recognition with invariance to pose and illumination will be described. By stacking multiple stages of sparse features, and refining the whole system with supervised training, state-the-art accuracy can be achieved on standard datasets with very few labeled samples. A real-time demo will be shown. Another application to vision-based navigation for off-road mobile robots will be described. After a phase of off-line unsupervised learning, the system autonomously learns to discriminate obstacles from traversable areas at long range using labels produced with stereo vision for nearby areas. Other applications of deep learning and convolutional networks by our groups and others will be shown.
Dr. Yann LeCun is Silver Professor of Computer Science and Neural Science at the Courant Institute of Mathematical Sciences and the Center for Neural Science of New York University. He received the Electrical Engineer Diploma from Ecole Supérieure d'Ingénieurs en Electrotechnique et Electronique (ESIEE), Paris in 1983, and the PhD in from Université Pierre et Marie Curie (Paris) in 1987. After a postdoc at the University of Toronto, he joined AT&T Bell Laboratories in Holmdel, NJ, in 1988, and became head of the Image Processing Research Department at AT&T Labs-Research in 1996. He joined NYU in 2003, after a brief period as Fellow at the NEC Research Institute in Princeton. His current interests include machine learning, computer vision, pattern recognition, mobile robotics, and computational neuroscience. He has published over 140 technical papers and book chapters on these topics as well as on neural networks, handwriting recognition, image processing and compression, and VLSI design. His handwriting recognition technology is used by several banks around the world to read checks. His image compression technology, called DjVu, is used by hundreds of web sites and publishers and millions of users to access scanned documents on the Web, and his image recognition methods are used in deployed systems by companies such as Google, Microsoft, NEC, France Telecom and several startup companies for document recognition, human-computer interaction, image indexing, and video analytics. He has been on the editorial board of IJCV, IEEE PAMI, IEEE Trans on Neural Networks, was program chair of CVPR'06, and is chair of the annual Learning Workshop. He is on the science advisory board of Institute for Pure and Applied Mathematics, and is the co-founder of MuseAmi, a music technology company.
IS&T Colloquium Committee Host: Tony Gualtieri
Sign language interpreter upon request: 301-286-8313