Figure 7.5: The camera view of a wheel being mounted on an axle through dual-arm manipulation. The blue outline visualizes the estimated state of the manipulation task, i.e. the pose of the arm, hand and wheel relative to the camera. In addition to the visual sensor, the force-torque and haptic sensor readings that can be exploited to achieve this task are marked.
In this research area, we are interested in the question of how a robot can leverage the continuous stream of multi-modal sensor data to act robustly in an uncertain and dynamically changing world. At the Autonomous Motion Department, we see perceptuo-motor skills as units in which motor control and perceptual processing inform each other in a tight loop. Particular motor skills rely on specific sensory feedback, and some perception tasks improve when certain actions are performed. For example, bringing the robot hand to a specific pose relative to an object requires robust and fast simultaneous tracking of hand and object. Conversely, recognizing an object with higher confidence may be facilitated by a head motion that provides an observation from an alternative viewpoint. We want to analyze how this interdependency between action and perception is structured for different tasks and how it can be learned.
Task-relevant and Goal-Directed Much of the current research in robotics focuses on using range sensing to build an internal representation of the environment that is general, ubiquitous and as detailed as possible. The idea is to use this model for motion planning followed by open-loop execution (Perceive-Plan-Act). In our department, we want to enable a robot to act within complex, dynamic and uncertain environments for which it is difficult to build and maintain such a detailed model. We therefore focus on developing closed-loop behaviors that are adaptive due to continuous feedback of only the task-relevant details. We want to analyze what feedback is actually relevant for a particular task and how this weighting of information can be learned autonomously by a robot.
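As a toy illustration of this contrast, the following Python sketch compares one-shot Perceive-Plan-Act execution against a closed-loop behavior that re-senses only a task-relevant error signal at every step. All functions here are hypothetical stand-ins, not our actual perception or control code:

```python
import numpy as np

def sense(x, noise=0.05):
    """Stand-in for perception: return a noisy, task-relevant error signal."""
    return x + noise * np.random.randn()

def open_loop(x, goal, steps=20):
    """Perceive once, plan a fixed command sequence, execute it blindly."""
    plan = np.full(steps, (goal - sense(x)) / steps)  # plan from one snapshot
    for u in plan:
        x = x + u + 0.02 * np.random.randn()          # unmodeled disturbances
    return x

def closed_loop(x, goal, steps=20, gain=0.5):
    """Re-sense at every step; only the current task error is needed."""
    for _ in range(steps):
        error = goal - sense(x)                       # task-relevant feedback
        x = x + gain * error + 0.02 * np.random.randn()
    return x

if __name__ == "__main__":
    np.random.seed(0)
    print("open-loop final error:  ", abs(1.0 - open_loop(0.0, 1.0)))
    print("closed-loop final error:", abs(1.0 - closed_loop(0.0, 1.0)))
```

The closed-loop variant never builds a detailed model; it repeatedly measures a single task-relevant quantity and corrects, which is why disturbances do not accumulate as they do in the open-loop run.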
Continuous and Real-time The ability of a robot to act robustly within complex, dynamic and uncertain environments requires it to react quickly to unforeseen effects. This may mean small, local movement adaptations or re-planning an entire motion. It can only be achieved if the robot continuously monitors all task-relevant variables in real time. We are interested in developing perception routines that can continuously provide task-relevant feedback to low-level controllers and reactive planners.
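One minimal way such a system can be structured, sketched below under assumed rates (a perception thread at roughly 30 Hz feeding a faster control loop), is to let the controller always act on the most recent shared estimate instead of blocking on fresh perception. Everything in this sketch is a schematic placeholder rather than an actual software stack:

```python
import threading, time

stop = threading.Event()
lock = threading.Lock()
latest = {"error": 1.0}          # shared, task-relevant state estimate

def perception_loop(rate_hz=30.0):
    """Slow perception: keep refreshing the shared estimate at ~30 Hz."""
    while not stop.is_set():
        with lock:
            latest["error"] *= 0.9   # stand-in for an actual tracking update
        time.sleep(1.0 / rate_hz)

def control_loop(rate_hz=200.0, duration_s=1.0):
    """Fast control: always act on the newest available estimate."""
    t_end = time.time() + duration_s
    while time.time() < t_end:
        with lock:
            error = latest["error"]
        command = 0.5 * error        # placeholder feedback law
        time.sleep(1.0 / rate_hz)    # here the command would be sent out

if __name__ == "__main__":
    threading.Thread(target=perception_loop, daemon=True).start()
    control_loop()
    stop.set()
```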
Multi-Modal The robots in our department have many actuators, such as arms and hands, legs and oculomotor systems. They are equipped with many sensors that stream proprioceptive, tactile, visual and auditory data. These sensors provide the robot with different kinds of information, depending on the action the robot performs, on whether it is in contact with the environment, and on the environment itself. A robot needs to exploit these numerous sensory channels as feedback to achieve robust action execution. The different sensor modalities are complementary, provide redundancy and are correlated over time during action execution. In our department, we want to study how this kind of high-dimensional, multi-modal, streaming data can be exploited as feedback by robots that interact with the world.
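A simple illustration of how complementary modalities can be combined is inverse-variance weighting, assuming each modality yields an estimate of the same task variable with a known noise level (the variances and values below are made up for the sketch). Here, tactile sensing contributes only while the hand is in contact, mirroring the action-dependent informativeness described above:

```python
import numpy as np

def fuse(estimates):
    """Combine (mean, variance) pairs by inverse-variance weighting."""
    means = np.array([m for m, v in estimates])
    precisions = np.array([1.0 / v for m, v in estimates])
    fused_mean = np.sum(precisions * means) / np.sum(precisions)
    fused_var = 1.0 / np.sum(precisions)
    return fused_mean, fused_var

def multimodal_estimate(visual, proprio, tactile, in_contact):
    """Fuse estimates of one task variable, e.g. a 1-D object position."""
    sensors = [(visual, 0.02), (proprio, 0.05)]   # assumed noise variances
    if in_contact:
        sensors.append((tactile, 0.005))          # tactile: precise in contact
    return fuse(sensors)

if __name__ == "__main__":
    # Free motion: vision and proprioception only.
    print(multimodal_estimate(0.51, 0.48, None, in_contact=False))
    # In contact: the tactile channel dominates the fused estimate.
    print(multimodal_estimate(0.51, 0.48, 0.50, in_contact=True))
```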
Perception-Action-Learning Loops While it is common in robot motion control to close feedback loops around low-dimensional, proprioceptive sensors that measure, for example, joint angles and torques, it is much less common to exploit the kind of high-dimensional, multi-modal data that comes from visual and haptic sensors. In this case, the aforementioned characteristics of continuous, task-relevant and multi-modal feedback cannot simply be engineered. The huge variability in environment structure, sensing conditions, modalities and tasks that a robot may encounter requires the robot to learn the open parameters of such a feedback controller from previous experience. We are interested in studying different models for learning controllers or policies that exploit this kind of sensory data.
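As a schematic example of what learning the open parameters of a feedback controller can mean in its simplest form, the sketch below fits a linear feedback law from logged pairs of high-dimensional sensory features and commands via ridge regression. The feature dimensions, command dimensions and data are invented for illustration; policies in practice use far richer function classes, so only the structure of the learning problem is meant to carry over:

```python
import numpy as np

def fit_policy(features, actions, reg=1e-3):
    """Ridge-regression fit of a linear policy: action = feature @ W."""
    X, U = np.asarray(features), np.asarray(actions)
    W = np.linalg.solve(X.T @ X + reg * np.eye(X.shape[1]), X.T @ U)
    return W

def policy(W, feature):
    """Learned feedback law: map sensory features to a command."""
    return feature @ W

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 32))        # e.g. visual + haptic feature vectors
    W_true = rng.normal(size=(32, 7))     # unknown mapping to 7 joint commands
    U = X @ W_true + 0.01 * rng.normal(size=(500, 7))   # logged commands
    W = fit_policy(X, U)
    x_new = rng.normal(size=32)
    print("held-out command error:",
          np.linalg.norm(policy(W, x_new) - x_new @ W_true))
```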