insightful ai facts

Human-level performance milestones

  • 1989: BILL a Bayesian learning-based system for playing the board game Othello developed by Kai-Fu Lee and Sanjoy Mahajan beat the highest ranked U.S. player, Brian Rose by 56—8. Later, in 1997, a program named Logistello won every game in a six game match against the reigning Othello world champion.

  • 1995: Checkers In 1952, Arthur Samuels built a series of programs that played the game of checkers and improved via self-play. However, it was not until 1995 that a checkers-playing program, Chinook, beat the world champion.

  • 1997 IBM’s DeepBlue system beat chess champion Gary Kasparov. 15 years later one of DeepBlue engineers revealed a decisive move made by the machine was actually the result of a bug. Today, chess programs running on smartphones can play at the grandmaster level.

  • 2011 Jeopardy! In 2011, the IBM Watson computer system competed on the popular quiz show Jeopardy! against former winners Brad Rutter and Ken Jennings. Watson won the first place prize of $1 million.

  • 2015 Atari Games In 2015, a team at Google DeepMind used a reinforcement learning system to learn how to play 49 Atari games. The system was able to achieve human-level performance in a majority of the games (e.g., Breakout), though some are still significantly out of reach (e.g., Montezuma’s Revenge).

  • 2016 Object Detection in ImageNet In 2016, the error rate of automatic labeling of ImageNet declined from 28% in 2010 to less than 3%. Human performance is about 5%.

  • 2016: The AlphaGo system developed by the Google DeepMind team beat Lee Sedol, one of the world’s greatest Go players, 4—1. Later DeepMind released AlphaGo Master, which defeated the top ranked player, Ke Jie. A more recent version called AlphaGo Zero beat the original AlphaGo system by 100—0.

  • 2017 Skin Cancer Classification In a 2017 Nature article, Esteva et al. describe an AI system trained on a data set of 129,450 clinical images of 2,032 different diseases and compare its diagnostic performance against 21 board-certified dermatologists. They find the AI system capable of classifying skin cancer at a level of competence comparable to the dermatologists.

  • 2017 Speech Recognition on Switchboard In 2017, Microsoft and IBM both achieved performance within close range of “human-parity” speech recognition in the limited Switchboard domain.

  • 2017 Poker In January 2017, a program from CMU called Libratus defeated four top human players in a tournament of 120,000 games of two-player, heads up, no-limit Texas Hold’em. In February 2017, a program from the University of Alberta called DeepStack played a group of 11 professional players more than 3,000 games each. DeepStack won enough poker games to prove the statistical significance of its skill over the professionals.

  • 2017 Ms. Pac-Man Maluuba, a deep learning team acquired by Microsoft, created an AI system that learned how to reach the game’s maximum point value of 999,900 on Atari 2600.

  • 2018 Chinese - English Translation A Microsoft machine translation system achieved human-level quality and accuracy when translating news stories from Chinese to English. The test was performed on newstest2017, a data set commonly used in machine translation competitions.

  • 2018 Capture the Flag A DeepMind agent reached human-level performance in a modified version of Quake III Arena Capture the Flag (a popular 3D multiplayer first-person video game). The agents showed human-like behaviours such as navigating, following, and defending. The trained agents exceeded the win-rate of strong human players both as teammates and opponents, beating several existing state-of-the art systems. HUMAN-LEVEL PERFORMANCE MILESTONES

  • 2018 Dota 2 OpenAI Five, OpenAI’s team of five neural networks, defeats amateur human teams at Dota 2 (with restrictions). OpenAI Five was trained by playing 180 years worth of games against itself every day, learning via self-play. (OpenAI Five is not yet superhuman, as it failed to beat a professional human team)

  • 2018 Prostate Cancer Grading Google developed a deep learning system that can achieve an overall accuracy of 70% when grading prostate cancer in prostatectomy specimens. The average accuracy of achieved by US board-certified general pathologists in study was 61%. Additionally, of 10 high-performing individual general pathologists who graded every sample in the validation set, the deep learning system was more accurate than 8.

  • 2018 Alphafold DeepMind developed Alphafold that uses vast amount of geometric sequence data to predict the 3D structure of protein at an unparalleled level of accuracy than before.

  • 2019 Alphastar DeepMind developed Alphastar to beat a top professional player in Starcraft II.

  • 2019 Detect diabetic retinopathy (DR) with specialist-level accuracy Recent study shows one of the largest clinical validation of a deep learning algorithm with significantly higher accuracy than specialists. The tradeoff for reduced false negative rate is slightly higher false positive rates with the deep learning approach.

Last updated