by Fengyi Art Jiang, Shangqun Simon Yu, Qiuhong Anna Wei
“Hello. I am Baymax, your personal healthcare companion.”
- Baymax in Big Hero 6
In the recent Disney+ series Baymax!, the residents of San Fransokyo show us how great life could be with the companionship of Baymax: an adorable, energetic robot with the power to heal. The fluffy robot runs around San Fransokyo, showing compassion to everyone (and little kittens!) and providing health checks. How far away is humanity from such a future with caring, devoted, and selfless robots?
With recent advancements in artificial intelligence and breakthroughs in the semiconductor industry, are we really that far from our wildest dreams: living in harmony with robots we created and being liberated from mundane daily chores? This is the question we set out to answer in CSCI 2952-O, A Practical Introduction to Advanced 3D Robot Perception.
Computer vision and robotics are two of the fastest-growing fields in technology today. If you're curious about what we can achieve by combining these two fields at Brown, keep reading! In this article, we give you a sneak peek into our experience in the class and what you can expect from this course.
Robotics is an interdisciplinary field: driving a robot, picking and placing objects with a robotic arm, or navigating rush-hour traffic in an autonomous vehicle all require not only sensing the environment and feeding that information to the system, but also planning and actuation. At Brown we like to take a hands-on approach to learning. CSCI 2952-O’s lectures covered the basics of control systems and robotics principles and focused on computer vision and scene representations for robotics. The hands-on robotics lab then guided us in building the tools needed to apply concepts from the literature. Lastly, the final project let us explore what excites us at the intersection of robotics, computer vision (CV), and machine learning (ML), and threw us into the battleground of solving real-world problems instead of following stencil code and step-by-step instructions.
[ Art ]: The first thing that attracted me to 2952-O was the chance to work with a real robotic arm, a smaller version of the 6-degree-of-freedom (DoF) arms typically used on automated production floors. My previous work on a surgical robotic system focused mostly on one sub-component, and I was missing a higher-level overview of the system. So I was fascinated by the idea of having my own robotic arm manipulating objects and interacting with the physical world. For me, it was the perfect starting point for understanding the underlying structure of a complicated system, from vision input to motion planning and error handling.
Literature Deep Dive
The literature deep dive was also a highlight of the class. Over the course of the semester, we reviewed the literature that has shaped perception for robotics over the past two decades, from pose estimation to neural scene representation. As a newcomer to the field, I found the first few sessions intense because there was so much to learn. Little by little, we built up the technical stack that had seemed so intimidating at the beginning of the semester.
The literature review was carried out not only through student presentations, but also through engaging, amicable debate-style discussions and occasional demos. For example, the live demos of NeRF (Neural Radiance Fields) that our UTA Anna Wei (qwei3) and Prof. Sridhar prepared were exciting and mind-blowing: the synthetic reconstruction of the CIT was vivid and high-resolution. That was the first time I witnessed the power of multi-layer perceptrons (MLPs).
Synthetic views of the CIT rendered from a Neural Radiance Field trained by Anna Wei
But that's not all! We also touched on other exciting research areas such as human-robot interaction, simultaneous localization and mapping (SLAM), and pose estimation. By the end of the course, we had a well-rounded understanding of AI and robotics, and the foundation we needed to tackle the final project.
In the second half of the course, we extended a research paper we had discussed in class into a final project working with a real robot arm, with the freedom to choose our own topic in vision and robotics.
For example, one team used one of Brown's Spot robots, among the most advanced legged systems in the industry, to capture 3D scenes in the real world via Instant NGP, a real-time NeRF implementation. The Spot robot was able to capture images from multiple angles and generate high-fidelity 3D scenes within minutes. The team also came up with algorithms that let the robot decide which angles require more image data, enabling it to collect all the data without human intervention. The double NeRF algorithms that the students created also open up new possibilities for state exploration in future research.
Students in the class have diverse backgrounds: some of us are PhD candidates, some are heading to industry, and some have worked in the field for years. But the boundaries between students, engineers, and scholars blur as we not only research and propose technical solutions, but also carry out the implementation. Some of us started with no knowledge of the Robot Operating System (ROS) or any experience with depth cameras, but we all became subject matter experts by the end of the class.
Looking back, it is incredible to see how much we learned in such a short amount of time. We are all looking forward to continuing our work in the field of robotics. We might be far from making caring, devoted robots like Baymax, but we are definitely one step closer to our goal of becoming future robotics scientists and engineers.
For more information, contact Brown CS Communications Manager Jesse C Polhemus.