In January 2024, a fascinating project was released: "OK-Robot".
This robot demo was able to move around unseen New York apartments and retrieve objects given natural language commands. But the coolest part was that this system was a clever combination of existing technology, rather than something entirely new. Everything was open source except the grasping technology, AnyGrasp.
Yet AnyGrasp is a key component of the system, so I did some digging, found the paper, and discovered it contains great tips for anyone wanting to build a world-class grasping library. Here are some cool things I found:
Catching Robot Fish Is Hard
One of the grasping tasks was to catch a robot fish (a Finding Nemo-style fish swimming in water) with a rubberized two-finger gripper. There is an entire pie chart dedicated to all the ways this task failed, which is hilarious, and it's impressive that the system could catch robot fish at all.
Neural Networks Still Need Collision Avoidance
AnyGrasp is another neural-network-based grasping approach. It uses the whole scene to determine valid hand positions, so in principle it has enough information to rule out bad hand poses. Yet the authors still double-check its results with a collision-avoidance system for safety.
It's a nice example of combining modern neural-network methods with classical ones. Learning the classics isn't going to be a waste of time anytime soon.
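To make the idea concrete, here is a minimal sketch of a classical collision filter layered over neural grasp proposals. This is not AnyGrasp's actual pipeline: the gripper is crudely approximated as an axis-aligned box around each grasp centre, and the box size, function names, and inputs are all assumptions for illustration. A real system would use the gripper's oriented geometry and exclude the target object's own points from the scene.

```python
import numpy as np

def box_collision(grasp_center, scene_points, half_extents):
    """Return True if any scene point falls inside an axis-aligned box
    around the grasp centre (a crude stand-in for the gripper body)."""
    d = np.abs(scene_points - grasp_center)
    return bool(np.any(np.all(d <= half_extents, axis=1)))

def filter_grasps(grasps, scene_points, half_extents=np.array([0.02, 0.02, 0.05])):
    """Keep only grasp candidates whose gripper box is free of scene points
    (the points here should be the scene minus the target object)."""
    return [g for g in grasps if not box_collision(g, scene_points, half_extents)]
```

Even a check this simple catches neural proposals that would drive the gripper into a tabletop or a neighbouring object, which is the safety role the classical component plays here.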
Real World Data Wins, And More Labels Is Better
AnyGrasp focuses on real-world data and shows substantial performance improvements over training on synthetic data alone, which is another example of how difficult the sim-to-real gap can be. That said, they still pre-train on synthetic data.
They also discovered that increasing the number of annotated grasps (the labels in the data) is just as important as adding more data, suggesting you can squeeze more signal out of existing datasets.
Grasping Near The Centre Of Gravity Is Important
A new insight is that estimating an object's centre of gravity and grasping near it produces much more reliable grasps.
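A rough sketch of the idea, with my own simplifications rather than the paper's method: use the centroid of the object's point cloud as a centre-of-gravity proxy (which assumes roughly uniform density and decent surface coverage), then prefer grasp candidates closest to it.

```python
import numpy as np

def estimate_cog(object_points):
    """Centroid of the object's point cloud as a rough centre-of-gravity
    proxy (assumes uniform density; a real estimator would do better)."""
    return object_points.mean(axis=0)

def rank_by_cog(grasp_centers, object_points):
    """Order grasp candidates by distance to the estimated centre of
    gravity, nearest first. Returns (indices, distances)."""
    cog = estimate_cog(object_points)
    dists = np.linalg.norm(np.asarray(grasp_centers) - cog, axis=1)
    return np.argsort(dists), dists
```

The intuition is simple physics: a grasp far from the centre of gravity lets the object pivot or torque out of the fingers, so all else being equal the nearest candidate should hold more reliably.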