DeepMind’s New Robots: An AI Revolution!

Time: 0.08

Fellow Scholars, I think you are going to love this. This is a follow-up to an amazing project

Time: 6.56

where AI agents learn to play football/soccer. Initially, they start out like this. Well,

Time: 13.72

learning is a strong word to use here. But now, let’s jump 5 years in time, and look! They are

Time: 21.76

now competent players, but wait a second, was this really 5 years of training? Yes it was,

Time: 29.68

however, not in our time, but in their time. You see, if you have a powerful computer,

Time: 36.56

you can simulate these little AIs much, much faster than real time.
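
To get a feel for what "faster than real time" means, here is a rough back-of-the-envelope sketch. It computes how large the speedup has to be to fit 5 years of simulated experience into a fixed wall-clock budget. The 5 years comes from the video; the 48-hour budget and the function name are assumptions made purely for illustration, not figures from the project.

```python
HOURS_PER_YEAR = 365 * 24  # ignoring leap years for simplicity


def required_speedup(simulated_years: float, wall_clock_hours: float) -> float:
    """Return how many times faster than real time the simulation must run."""
    if wall_clock_hours <= 0:
        raise ValueError("wall_clock_hours must be positive")
    return simulated_years * HOURS_PER_YEAR / wall_clock_hours


# 5 simulated years (from the video) in a hypothetical 48-hour training run.
speedup = required_speedup(simulated_years=5, wall_clock_hours=48)
print(f"Required speedup: {speedup:.1f}x real time")
```

In practice, a factor like this is often reached by running many simulated copies side by side rather than by one very fast simulation.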

Time: 42.32

And now, we are going to look at a similar, really exciting sim2real project,

Time: 48.4

which means that first, the agents learn to play in a simulation, in a computer game if you will,

Time: 55.24

and then come out into the real world as real robots to shoot a real ball. We talked a bit

Time: 62.64

about football robots before too, but the new results are truly something else. However,

Time: 69.32

I am a little worried. I hear you asking, Károly, why are you worried? Well, look

Time: 76.72

at this. In this earlier software project, there was no referee, and therefore there

Time: 82.72

was no penalty for basically completely destroying each other, and look at that.

Time: 90.04

Absolute pandemonium ensued. Now I am worried that these little Robot Scholars are also going to beat

Time: 98.28

each other up really badly. So, do they? I will tell you in a moment why this is not the case.

Time: 106.12

Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér.

Time: 110.64

So, first, they learn in a video game world, and I already see that this is going to be

Time: 117.36

quite tough. Look, 20 degrees of freedom. That means 20 controllable joints. That gives us at

Time: 125.44

least 20 things that can go wrong. Ouch. This is not going to be glorious, not in the slightest!

Time: 132.56

This means that we have to start out from humble beginnings. Controlling the arms

Time: 139.08

and the limbs of this robot will not be easy, so much so that they first need to learn to stand,

Time: 146.16

walk, and of course, get up after falling. Then, basic training in football. Wow,

Time: 155.48

this is not looking great. I wonder if enough learning can happen here so we get to see anything

Time: 162.08

interesting? Not sure. Then, soccer training against an opponent. This is not too impressive

Time: 170.08

at this point, and it will probably get even worse upon uploading this AI into a real robot.

Time: 178.04

Especially since these poor little robots barely see anything, and when they start running,

Time: 184.68

oh my goodness. It gets even worse. So now we see that this is a super difficult task,

Time: 192.2

so let’s see if it can be done at all.

Time: 195.44

First, a penalty kick is simulated in the video game world, and now, hold on to your

Time: 201.36

papers Fellow Scholars, and…yes! That was a good kick! Finally! Learning is happening!

Time: 210.96

But this is when things are going well. So what if things are not going well at all? Well, after

Time: 218.28

a bit more learning, look how robust it became against perturbations. What does that mean? Well,

Time: 225.48

this is a fancy way of saying that there was lots of fun to be had in the lab that day. And

Time: 231.76

it can recover from all this. Wait! Not only that, but it can even get up and score! Bravo!

Time: 240.72

But it gets worse. I know that for a fact. Why do I know that for a fact? Because I had the

Time: 247.96

huge honor of recording this piece of footage myself in person at the Google DeepMind lab.

Time: 254.56

Look at that! Ouch! I did not expect that at all, and it really shows how these robots can

Time: 262.04

fail in the most spectacular manner. Wow. And, a big thank you to Google DeepMind for the trip.

Time: 269.36

So, are they destroying each other? Not quite. Why not? Because even though we don’t have

Time: 276.92

a referee in person, they were told that the rule is that if you get too close to the other robot,

Time: 283.44

you get a penalty. Whew! So we can expect peaceful matches after this insanity from a previous work.
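
A rule like this is typically expressed as a term in the reward the agent is trained on. Here is a minimal sketch of the idea, assuming a flat 2D pitch. The distance threshold, the penalty size, and all names in it are hypothetical and are not taken from the paper.

```python
import math

MIN_DISTANCE_M = 0.5      # hypothetical "too close" threshold, in meters
PROXIMITY_PENALTY = 1.0   # hypothetical penalty size


def shaped_reward(base_reward, own_pos, opponent_pos):
    """Subtract a fixed penalty when the robot is too close to its opponent."""
    distance = math.dist(own_pos, opponent_pos)
    if distance < MIN_DISTANCE_M:
        return base_reward - PROXIMITY_PENALTY
    return base_reward


# Far apart: the reward is left unchanged.
print(shaped_reward(0.2, (0.0, 0.0), (2.0, 0.0)))
# Too close: the penalty is subtracted.
print(shaped_reward(0.2, (0.0, 0.0), (0.3, 0.0)))
```

The agent never sees a referee; it only sees that getting too close makes its score go down, so it learns to keep its distance.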

Time: 292.52

And then, something incredible happened. After playing more and more against themselves,

Time: 299

they learned new things, and now they have 7 absolutely incredible new skills.

Time: 305.96

One, they were also advised to avoid high knee torques during the video game phase. Essentially,

Time: 312.92

they were taught to take it easy or otherwise that knee is not going to

Time: 318.16

make it. I love that as soon as they are locked into a somewhat human-like body,

Time: 324.4

they now have knee pain of sorts. Welcome to the real world, little AI!

Time: 330.16

And, over time, yes! They learned to walk and run a bit softer. So good!

Time: 336.6

Two, they can now kick a moving ball. Three, they learned to block the other robot’s shot

Time: 344.04

with their bodies, that is amazing. Four, they can now turn much better, and look. Five, they can also

Time: 353.32

do something that I did not expect at all. And that is getting up…but from the back. So cool!

Time: 360.6

Six, they can also anticipate what is about to happen and position themselves

Time: 366.68

accordingly. And seven, their ball control skills are also a thing of beauty. Bravo!

Time: 374.48

And now, here comes the absolute crown jewel result. The manufacturer of this robot has

Time: 381.68

little handcrafted scripts, programs that control the robot for these movements. This

Time: 387.36

is what it looks like. These are carefully crafted by engineers who know these models

Time: 393.16

really well. And now, let’s compare it to their learned behavior in this project.

Time: 399.24

Oh my goodness. This is so much better! And they learned all this by themselves.

Time: 406.16

The AI agent now walks and turns 2-3 times faster,

Time: 411.4

takes only close to half as much time to get up, and get this Fellow Scholars,

Time: 417.28

it even kicks a ball 34% faster than this manufacturer baseline. Holy mother of papers!

Time: 425.92

This is an absolutely incredible paper and I feel honored to witness it together with you. Wow.

Time: 434.4

And note that this training could not have been done entirely in the real world,

Time: 439.92

but had to happen first in a video game world because it would have taken too long,

Time: 445.92

and the robots could also have hurt themselves in the process. But,

Time: 450.88

after learning in a video game, we get this absolute miracle. And remember, in the game,

Time: 457.72

they can put in years and years of training in just a few hours. What a time to be alive!