To truly understand language, an intelligent system must be able to connect words, phrases, and sentences to its perception of objects and events in the world. Current natural language processing and computer vision systems make extensive use of machine learning to acquire the probabilistic knowledge needed to comprehend linguistic and visual input. However, to date, there has been relatively little work on learning the relationships between the two modalities. In this talk, I will review some of the existing work on learning to connect language and perception, discuss important directions for future research in this area, and argue that the time is now ripe to make a concerted effort to address this important, integrative AI problem.
Download Full PDF Version (Non-Commercial Use)