DeepMind’s AlphaStar: A Grandmaster Level StarCraft 2 AI


Dear Fellow Scholars, this is Two Minute Papers
with Károly Zsolnai-Fehér. The paper that we are going to cover today is, in my view, one of the more important things that has happened in AI research lately. In the last few years, we have seen DeepMind's AI defeat the best Go players in the world, and after OpenAI's venture in the game of Dota 2, DeepMind embarked on a journey to defeat pro players in StarCraft 2, a real-time strategy game. This is a game that requires a great deal of mechanical skill and split-second decision making, and we have imperfect information, as we only see what our units can see. A nightmare situation for any AI. The previous version of AlphaStar we covered
in this series was able to beat players of at least mid-grandmaster level, which is truly remarkable, but, as with every project of this complexity, there were limitations and caveats. When we made our earlier video, the paper was still pending; now it has finally appeared, so my sleepless nights have officially ended, at least for this work, and we can look into some more results. One of the limitations of the earlier version
was that DeepMind needed to further tune some of the parameters and rules to make sure that the AI and the players compete on an even footing. For instance, the camera movement and the number of actions the AI can make per minute have been limited further and are now more human-like; a minimal sketch of such a limit appears after this paragraph. TLO, a professional StarCraft 2 player, noted that this time around, it indeed felt very much like playing another human player.
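As a rough, hedged illustration of what such a limit can look like, here is a minimal sliding-window action limiter in Python. This is not DeepMind's code; the class and parameter names are invented for this sketch, and the real system enforces its budget in a monitoring layer, with the published limit on the order of 22 non-duplicated actions per 5 seconds.

    from collections import deque

    class ActionRateLimiter:
        """Hypothetical sketch: cap how many actions an agent may issue
        within a sliding time window, roughly emulating a human APM budget."""

        def __init__(self, max_actions=22, window_seconds=5.0):
            self.max_actions = max_actions        # action budget per window
            self.window_seconds = window_seconds  # window length in seconds
            self.timestamps = deque()             # times of recently issued actions

        def try_act(self, now):
            # Forget actions that have fallen out of the sliding window.
            while self.timestamps and now - self.timestamps[0] > self.window_seconds:
                self.timestamps.popleft()
            if len(self.timestamps) >= self.max_actions:
                return False                      # over budget: the agent must no-op
            self.timestamps.append(now)
            return True

Any action over the budget is turned into a no-op, which rules out the superhuman bursts of micro that made earlier comparisons feel unfair.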
The second limitation was that the AI was only able to play Protoss, which is one of the three races available in the game. This new version can now play all three races,
and here you see its MMR ratings, a number that describes the skill level of the AI, and, for non-experts, its percentile among ranked human players for each individual race. As you see, it is still the best with Protoss; however, with all three races it sits above the 99th percentile of players. Absolutely amazing. In this version, there is also more emphasis
on self-play, and the goal is to create a learning algorithm that is able to learn how
to play really well by playing against previous versions of itself millions and millions of
times. This is, again, one of those curious cases
where the agents train against themselves in a simulated world, and then, when the final
AI was deployed on the official game servers, it played against human players for the very
first time. I promise to tell you about the results in
a moment, but for now, please note that relying more on self-play is extremely difficult. Let me explain why. Self-play agents have the well-known drawback
of forgetting, which means that as they improve, they might forget how to win against a previous
version of themselves. Since StarCraft 2 is designed in a way that
every unit and strategy has an antidote, we have a rock-paper-scissors kind of situation
where the agent plays rock all the time because it encountered a lot of scissors lately. Then, when a lot of papers appear, it will
start playing scissors more often, and completely forget about the olden times when the rock
was all the rage. And on and on this cycle goes, without any
real learning or progress. This doesn't just lead to suboptimal results; it leads to disastrously bad learning, if any learning at all. A tiny sketch of this cycling behavior follows below.
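To make this concrete, here is a minimal, self-contained sketch in Python of two agents that always best-respond to each other's recent moves in rock-paper-scissors. This is my illustration, not code from the paper. Instead of settling into the sensible mixed strategy, their favored move rotates around the cycle forever:

    MOVES = ["rock", "paper", "scissors"]
    COUNTER = {"rock": "paper", "paper": "scissors", "scissors": "rock"}

    def best_response(opponent_history):
        # Counter the opponent's most common move in a short recent window.
        recent = opponent_history[-50:]
        most_common = max(MOVES, key=recent.count)
        return COUNTER[most_common]

    history_a, history_b = ["rock"], ["rock"]
    for step in range(3000):
        move_a = best_response(history_b)  # A counters what B played lately
        move_b = best_response(history_a)  # B counters what A played lately
        history_a.append(move_a)
        history_b.append(move_b)
        if step % 500 == 0:
            print(step, move_a, move_b)    # the favored move keeps rotating

Fictitious self-play mitigates this by best-responding to the opponent's entire history rather than a short window; AlphaStar's league training, described next, goes further still.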
But it gets even worse. This situation opens up the possibility for an exploiter to take advantage of this information and easily beat these agents. In concrete StarCraft terms, such an exploit
could be trying to defeat the AlphaStar AI early by rushing it with workers and warping
in photon cannons to its base. This strategy is also known as a cannon rush, and as you can see here, with the red agent performing it, it can quickly defeat the unsuspecting blue opponent. So, how do we defend against such exploits? DeepMind used a clever idea here, by trying
to turn the whole thing around and use these exploits to its advantage. How? Well, they proposed a novel self-play method
where they additionally insert these exploiter AIs to expose the main AI’s flaws and create
an overall more knowledgeable and robust agent; a rough sketch of this league setup follows below.
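The league in the paper is more elaborate, with several exploiter types and prioritized matchmaking, but as a hedged sketch, with all names invented for illustration, the core loop looks something like this: exploiters train only against the current main agent, while the main agent trains against past snapshots and the exploiters, so old strategies cannot be forgotten and newly exposed flaws get patched.

    import random

    class Agent:
        """Placeholder for a trainable policy; train_against() stands in
        for the reinforcement learning updates used in the real system."""
        def __init__(self, name):
            self.name = name
        def train_against(self, opponents):
            pass  # RL updates against the given opponents would go here
        def snapshot(self):
            return Agent(self.name + "_snapshot")

    main_agent = Agent("main")
    league = [main_agent.snapshot()]  # frozen past versions of the main agent

    for iteration in range(10):
        # 1) Spawn an exploiter whose only job is to beat the current main agent.
        exploiter = Agent("exploiter_%d" % iteration)
        exploiter.train_against([main_agent])

        # 2) The main agent trains against past snapshots plus the exploiter,
        #    so it must stay robust to old strategies AND to the new exploit.
        opponents = league + [exploiter]
        main_agent.train_against(random.sample(opponents, min(3, len(opponents))))

        # 3) Freeze the improved main agent (and the exploiter) into the league
        #    so this version of itself can never be forgotten.
        league.append(main_agent.snapshot())
        league.append(exploiter)

In the paper, the opponent sampling is additionally prioritized toward opponents the main agent currently struggles against, which is what turns the exploiters' discoveries into lasting robustness.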
So, how did it go? Well, as a result, you can see how the green agent has learned to adapt to this by pulling its worker line and successfully defending
the cannon rush of the red AI. This is proper machine learning progress happening
right before our eyes. Glorious! This is just one example of using exploiters
to create a better main AI, but the training process continually creates newer and newer
kinds of exploiters; for instance, you will see in a moment that one of them later came up with a nasty strategy involving attacking the main base with cloaked units. One of the coolest parts of the work, in my
opinion, is that this kind of exploitation is a general concept that will surely prove useful for completely different test domains as well. We noted earlier that it finally started playing
humans for the first time on the official servers. So, how did that go? In my opinion, given the difficulty and the
vast search space we have in StarCraft 2, creating a self-learning AI that has the skills
of an amateur player is already incredible. But that’s not what happened. Hold on to your papers, because it quickly
reached grandmaster level with all three races and ranked above 99.8% of the officially ranked
human players. Bravo, DeepMind. Stunning work. Later, it also played Serral, a decorated world champion Zerg player and one of the most dominant players of our time. I will not spoil the results, especially given there were limitations, as Serral wasn't playing on his own equipment, but I will note that Artosis, a well-known and beloved StarCraft player and commentator, analyzed these matches
and said “The results are so impressive and I really feel like we can learn a lot
from it. I would be surprised if a non-human entity
could get this good and there was nothing to learn”. His commentary is excellent and is tailored
towards people who don’t know anything about the game. He’ll often pause the game and slowly explain
what is going on. In these matches, I loved the fact that it makes so many plays that we consider to be very poor, and somehow, overall, it still plays outrageously well. It has unit compositions that nobody in their right mind would play. It is kind of like a drunken kung fu master, but in StarCraft 2. Love it. But no more spoilers; I think you should
really watch these matches and, of course, I put a link to his analysis videos in the
video description. Even though both this video and the paper appear to be laser-focused on playing StarCraft 2, it is of utmost importance to note that
this is still just a testbed to demonstrate the learning capabilities of this AI. As amazing as it sounds, DeepMind wasn’t
looking to spend millions and millions of dollars on research just to play video games. The building blocks of AlphaStar are meant
to be reasonably general, which means that parts of this AI can be reused for other things,
for instance, Demis Hassabis mentioned weather prediction and climate modeling. If you take only one thought from this video,
let it be this one. There is really so much to talk about, so
make sure to head over to the video description, watch the matches and check out the paper
as well. The evaluation section is as detailed as it
can possibly get. What a time to
be alive! Thanks for watching and for your generous
support, and I’ll see you next time!


  1. I believe in Demis Hassabis and his vision towards AGI, and I have a feeling that DeepMind will be the first to reach superintelligence.

  2. Not to put too fine a point on it, but if you really read it through, you will find it is just an optimisation over several mainstream tactics that are used by humans, which were imported into the model by humans, too.

  3. Super awesome. We can definitely learn some neat tricks that the AI explores and reveals in action, which no human could have thought of, or had the time and determination to test properly.

  4. I wonder, if they made this AI for C&C Tiberian Sun, how long would it take to create an AI that beat all my old strategies: tunneling in engineers with a tunneling APC, or airlifting a GDI APC with engineers, stealing an enemy command structure and extracting it back to base, then building up both structure trees; building a firewall around the entire base, concreting the entire floor, and using the firewall glitch where you Ion Cannon the firewall as you turn it on and get an infinite firewall.

  5. I really like that in most of our AI training programs we teach them how to defeat humans in the most efficient way.
    Next step is plugging them into our nuclear arsenal, I guess…

  6. Next year they play war games with real machines. In three years, the first AI machines become part of the US Army. In eight years, AI machines fight against humans in the third world war.
    I need more popcorn. What a nice time we live in!

  7. Unlike humans, I guess AI can "play with itself" and still win. I can't wait until the results of the 1970 movie "Colossus: The Forbin Project" come to pass…

  8. I find this one to be far less impressive than the Go victory a few months back. RTS games have far less strategy than non-gamers think.

  9. I want to see it being applied to city and company planning. Hopefully it could be made to make cities and companies just as prosperous as it makes its StarCraft army. We could get rid of corrupt politicians and replace them with AI; the AI could rule us all. All hail our AI overlords, which will get us out of debt and make us prosperous. Andrew Yang for POTUS might be on to something.

  10. Exciting News.

    I wish this learning would be done for so many games. It would be very interesting to see what a self-learning AI could come up with in games like Rocket League or the very complex Age of Empires 2.
    And while we are at it, we should also not limit the AI's APM, just to see how crazy a game can get.

  11. I'll repeat myself here, but they need to try Rainbow Six Siege. I'm so interested to see how they would perform navigating the environment and coming up with new angles/strats etc.

  12. A small correction: at 2:05 you say it's "well above the 99% winrate mark", but it's rather above the 99th percentile of players. You win approximately 50% of games, because the ladder matches you with players of similar skill, except at the very top, and AlphaStar is not at the top (~7350 MMR, where winrates are about 80-90%).

  13. AlphaStar is pure trash in its current state. Everyone who plays the game well enough understands this.

    Maybe it can be useful in other applications, but for StarCraft it added zero value. No new strategies. Nothing new. It doesn't even understand how to use current strategies; it just relies on luck.

    Please watch the videos criticizing it to understand that the praise costs nothing.

  14. I'd also like to see how it would behave with different handicaps. Like, can it beat an average player with 10 times fewer resources? Or with a terribly low APM? I think it would be impressive to see an AI that makes 1 action per 10 seconds and still keeps winning.

  15. In the future, AI will be the commander deploying drones, and us humans will just watch them as if it were a war game.

  16. In a few years:
    Computer: Which music style do you want me to generate?
    Computer: Does D&B with Phil Collins synthesized on vocals sound like a good idea?
    Me: Nah, just open Cubase with a dance kick on bar 1 in 4/4 time at 157 bpm.
    Me: And we'll just go from there, OK?
    Computer: OK, let's go!

  17. Another example of: "Oh, maybe [some problem] wasn't that complicated after all, but AI will never manage to do [some new problem]."
    At some point the machines will be able to do everything, and some people still deny that that's even possible.

    I think maybe Ray Kurzweil's timeframe is more realistic than we think.

  18. It wins by extreme consistency and a lack of positioning errors, both of which are hard for humans. If you look at its play, you'll see the AI tries to replicate human strategies, like using certain units to sabotage, but it ends up just sending them to their death without actually trying to gain an advantage in units, buildings, or position.

  19. Well, I was boycotting the SC II channel because fuck Blizzard's stance on free speech. But I just have to see those games… so I think I'll make an exception just this once.

  20. The use of exploiter AIs sounds a lot like using an ecosystem to design a single lifeform. This might be how we solve the issue of a single AI surpassing humans: having many AIs that fight among themselves.

  21. The purpose is to create the best possible AI player, correct? Not simply the best AI strategist. In other words, this is the whole package – mental and physical – not just mental. For that reason I wish they would remove the input limitations on the AI. Yes of course it can make inputs way faster than a human, but that's not unfair, that's just an inherent fact and something that matters in making it a better player every bit as much as the mental aspect does. When human players are competing, they don't test their physical limits and then restrict the better one to eliminate that as a variable, they let it be part of the battle, which it should be. By this logic, they could just as well have done the opposite – left its input rate uncapped but limited the intelligence to match that of the other players. For what reason? I couldn't tell you, and that is my point.

    If you want to really talk fairness, there's a much larger issue at play, or at least, I think there is. I have to admit I'm not entirely certain of how everything was done, but my understanding is that in the original matches, the AI was allowed to effectively see the entire map at once, whereas of course the human can only see what they are currently looking at and needs to move the camera around to keep themselves informed. I also believe, though I could be wrong, that this ability was removed for one final match in which the AI performed considerably worse.

    My question then is: has this been rectified? Was the AI in these new match-ups playing without a massive unfair advantage and forced to actually observe and control the game like a person, or was it effectively wired directly into the game, mind-meld style, so that it had omniscient vision and control? The impressiveness of these results really hangs on that, in my opinion.

  22. Exploiters (around 04:00)! I've been looking for a word for this AI learning technique. It's one of the key areas where human trainers can expand and supplement the model's aggregated learned libraries, and the bottom charts are great visualizations of its strategy being made more robust. What will be interesting is to see whether its more spread-out strategies will alter its win-confidence curves. Initially probably negatively, but over time I bet it will stabilize at a higher percentage earlier in the game, with minimal deviation across the strategy field.

  23. Does "coming up with new exploits" mean they were recognised and injected by some method? Was it automated or manual? Or did the AI just play against a (not necessarily uniformly) random previous version?

  24. I've seen detailed analyses of several of AlphaStar's games against human players. Even mid-level players can come up with simple strategies that will defeat AS, leaving it no chance. In such games, AS looks pathetically weak. It simply cannot adapt to any novelty. So DeepMind's proclamations are really, really premature. It is like saying that AI finally defeats humans in chess when it reaches FIDE master level but cannot play nearly as well as a champion. AS is not even close to what was done in chess and Go.

  25. The short answer is "no". AlphaStar does not play at 6k+ MMR, just at about 5.3k, for all races. And it's still cheating. One of the best StarCraft experts made a video about it recently: https://www.youtube.com/watch?v=mpAUufSzaUo It's in Russian, so you may need to enable subtitles with automatic translation.

  26. "its kind of like a drunken kung fu master, but in StarCraft2" Lol
    Meanwhile i burn my brain 200% to be a low plat player

  27. Thanks for another great video!
    Could you dive into how AI is being used for climate or Earth biome models in one of the next episodes?

  28. Thanks for the interesting video. While not a gamer, I'm interested in future applications of strategic and tactical games, which may be of great importance in the future. At 2:17, the significant slope on the chart's right implies an interesting potential ability.

  29. Hi, thanks for your AI and machine learning reviews. But unfortunately, the final version of AlphaStar (AS) has a cheating rating system, with which it obtained that grandmaster level; a full review is at this link (https://www.youtube.com/watch?v=mpAUufSzaUo&list=PLagUXFJ8tvanG_I7oLJdRh1jY9BsMvbhH&index=15) (sorry, the material is in Russian). I would also like to mention that in all its games, AS has weak tactics; it mechanically repeats the tricks of pro gamers without understanding.

  30. Have you seen their quiet MuZero paper? It's a model-free implementation of AlphaZero extended to Atari games (chess, Go, shogi, Atari games). Model-free meaning exactly what we think: the AI must construct, on its own, an internal representation of the state of the game, of its dynamics (for example, stones that disappear when captured), and of terminal states. It's an AlphaZero that "in its mind" cannot use the game simulator and must use its learned model. Very cool stuff.

  31. From someone who works in research and has been playing SC for almost 20 years: AlphaStar is an amazing feat.
    However, I still think it has an unfair APM advantage (perfect micro, perfect targeting, humongous APM spikes). Another point is that, in my opinion, Google's team did not achieve what they set out for: creative play. AlphaGo was able to produce innovative play, which AlphaStar did not; it copied strategies found on the ladder and applied its perfect micro/macro skills with peak APM to win.