AI at the International Mathematical Olympiad: How AlphaProof and AlphaGeometry 2 Achieved Silver-Medal Standard
Mathematical reasoning is an essential part of human cognition and drives progress in science and technology. As we strive to develop artificial general intelligence that matches human cognition, it is essential to equip AI with advanced mathematical reasoning capabilities. Current AI systems can handle basic mathematical problems, but they struggle with the complex reasoning required for advanced mathematical disciplines such as algebra and geometry. This may be changing, as Google DeepMind has taken significant steps in advancing the mathematical reasoning capabilities of AI systems.
The breakthrough came at the International Mathematical Olympiad (IMO) 2024. Founded in 1959, the IMO is the oldest and most prestigious mathematics competition, challenging high school students worldwide with problems in algebra, combinatorics, geometry and number theory. Every year, teams of young mathematicians compete to solve six very challenging problems. This year, Google DeepMind introduced two AI systems: AlphaProof, which focuses on formal mathematical reasoning, and AlphaGeometry 2, which specializes in solving geometric problems. Together, these systems solved four of the six problems and performed at the level of a silver medalist. In this article we will explore how these systems work to solve mathematical problems.
AlphaProof: combination of AI and formal language for proving mathematical theorems
AlphaProof is an AI system designed to prove mathematical statements in the formal language Lean. It integrates Gemini, a pre-trained language model, with AlphaZero, a reinforcement learning algorithm known for mastering chess, shogi and Go.
The Gemini model translates natural language problem statements into formal statements, creating a library of problems of varying difficulty. This serves two purposes: converting imprecise natural language into precise formal language for verifying mathematical proofs and using Gemini’s predictive capabilities to generate a list of possible solutions with formal language precision.
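To make the idea of formalization concrete, here is a minimal sketch in Lean 4 syntax of how an informal statement might look once written formally. The statement and the name add_comm_example are illustrative assumptions chosen for simplicity; this is not an IMO problem and not output from Gemini or AlphaProof.

```lean
-- Illustrative only: a deliberately simple statement, not an IMO problem.
-- Informal version: "for all natural numbers a and b, a + b equals b + a".
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b  -- the proof term is checked mechanically by Lean
```

Once a statement is in this form, a proof is either accepted by the proof checker or rejected, which leaves no room for the ambiguity of natural language.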
When AlphaProof encounters a problem, it generates candidate solutions and searches for proof steps in Lean to verify or refute them. This is essentially a neuro-symbolic approach: the neural network, Gemini, translates natural language statements into the symbolic formal language Lean, in which the statement is then proved or disproved. Similar to AlphaZero’s self-play mechanism, where the system learns by playing games against itself, AlphaProof trains itself by trying to prove mathematical statements. Each proof attempt refines AlphaProof’s language model, with successful proofs strengthening the model’s ability to tackle more challenging problems.
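As a rough illustration of what verifying or refuting a candidate means in Lean, the sketch below checks one correct and one incorrect candidate for a trivial sum. The arithmetic statements are assumptions chosen for brevity, and the decide tactic stands in for the far more elaborate proof search the article describes.

```lean
-- Candidate "2 + 2 = 4" is verified: Lean accepts a proof of the statement.
example : 2 + 2 = 4 := by decide

-- Candidate "2 + 2 = 5" is refuted: Lean accepts a proof of its negation.
example : ¬ (2 + 2 = 5) := by decide
```

Either outcome is informative for training, since both a proof and a disproof are results that the checker can confirm.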
For the International Mathematical Olympiad (IMO), AlphaProof was trained by proving or disproving millions of problems of varying difficulty levels and mathematical topics. This training continued during the competition, where AlphaProof refined its solutions until it found complete answers to the problems.
AlphaGeometry 2: Integrating LLMs and symbolic AI for solving geometry problems
AlphaGeometry 2 is the latest version of the AlphaGeometry series, designed to tackle geometry problems with improved precision and efficiency. Building on the foundation of its predecessor, AlphaGeometry 2 uses a neuro-symbolic approach that combines neural large language models (LLMs) with symbolic AI. This integration combines rule-based logic with the predictive ability of neural networks to identify auxiliary points, essential for solving geometry problems. The LLM in AlphaGeometry predicts new geometric constructions, while the symbolic AI applies formal logic to generate proofs.
When faced with a geometric problem, AlphaGeometry’s LLM evaluates numerous possibilities and predicts structures critical to problem solving. These predictions serve as valuable clues, guiding the symbolic engine to accurate conclusions and closer to a solution. This innovative approach enables AlphaGeometry to tackle complex geometric challenges that extend beyond conventional scenarios.
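The interplay between the two components can be sketched as a toy loop in Python. This is a hedged illustration of the general neuro-symbolic idea, not AlphaGeometry's actual implementation: the names deduce, solve and propose_midpoint are hypothetical, facts are plain strings, the symbolic engine is a simple forward-chaining loop, and a hand-written function stands in for the language model.

```python
from typing import Callable, FrozenSet, List, Optional, Set, Tuple

# A rule says: if all premises are known, the conclusion is known too.
Rule = Tuple[FrozenSet[str], str]


def deduce(facts: Set[str], rules: List[Rule]) -> Set[str]:
    """Symbolic step: apply rules repeatedly until nothing new is derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


def solve(
    facts: Set[str],
    rules: List[Rule],
    goal: str,
    propose: Callable[[Set[str]], Optional[str]],
    max_rounds: int = 5,
) -> Optional[List[str]]:
    """Alternate deduction with proposed auxiliary constructions.

    Returns the list of constructions used, or None if the goal is not reached.
    """
    known = deduce(facts, rules)
    used: List[str] = []
    for _ in range(max_rounds):
        if goal in known:
            return used
        suggestion = propose(known)  # stand-in for the LLM's prediction
        if suggestion is None:
            return None
        used.append(suggestion)
        known = deduce(known | {suggestion}, rules)
    return used if goal in known else None


def propose_midpoint(known: Set[str]) -> Optional[str]:
    """Hypothetical proposer: suggests one fixed auxiliary point, once."""
    aux = "M is the midpoint of BC"
    return None if aux in known else aux


# Toy problem: in triangle ABC with AB = AC, show angle ABC = angle ACB.
rules: List[Rule] = [
    (frozenset({"AB = AC", "M is the midpoint of BC"}),
     "triangle ABM is congruent to triangle ACM"),
    (frozenset({"triangle ABM is congruent to triangle ACM"}),
     "angle ABC = angle ACB"),
]
result = solve({"AB = AC"}, rules, "angle ABC = angle ACB", propose_midpoint)
print(result)
```

In this toy problem the deduction rules alone cannot reach the goal from the single given fact; the proof only goes through after the auxiliary midpoint is added, which is the role the article attributes to the language model.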
An important improvement in AlphaGeometry 2 is the integration of the Gemini LLM. This model is trained from the ground up on significantly more synthetic data than its predecessor. This extensive training allows it to tackle more difficult geometry problems, including problems involving the movement of objects and equations of angles, ratios or distances. In addition, AlphaGeometry 2 features a symbolic engine that runs two orders of magnitude faster, allowing alternative solutions to be explored much more quickly. These improvements make AlphaGeometry 2 a powerful tool for solving complex geometry problems.
AlphaProof and AlphaGeometry 2 at IMO
This year, participants at the International Mathematical Olympiad (IMO) were tested on six different problems: two in algebra, one in number theory, one in geometry and two in combinatorics. Google researchers translated these problems into formal mathematical language for AlphaProof and AlphaGeometry 2. AlphaProof solved the two algebra problems and the number theory problem, including the competition’s most difficult problem, which was solved by just five human participants this year. Meanwhile, AlphaGeometry 2 solved the geometry problem. Neither system solved the two combinatorics problems.
Each problem at the IMO is worth seven points, for a maximum of 42. AlphaProof and AlphaGeometry 2 earned 28 points and achieved perfect scores for the problems they solved. This placed them at the top of the silver medal category. The gold medal threshold this year was 29 points, achieved by 58 of the 609 participants.
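The scoring arithmetic can be checked with a few lines of Python. The figures come from the text above; the variable names are illustrative.

```python
POINTS_PER_PROBLEM = 7   # each IMO problem is worth seven points
TOTAL_PROBLEMS = 6
GOLD_THRESHOLD = 29      # gold medal cutoff this year, as stated in the text

# Problems solved with full marks, by topic, as reported in the article.
solved = {"algebra": 2, "number theory": 1, "geometry": 1, "combinatorics": 0}

max_score = POINTS_PER_PROBLEM * TOTAL_PROBLEMS
score = POINTS_PER_PROBLEM * sum(solved.values())
gap_to_gold = GOLD_THRESHOLD - score

print(f"score {score} of {max_score}, {gap_to_gold} point short of gold")
```

With four full solutions the combined score is 28 of 42, one point below the gold threshold.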
Next leap: natural language for math challenges
AlphaProof and AlphaGeometry 2 have shown impressive advances in AI’s mathematical problem-solving capabilities. However, these systems still rely on human experts to translate mathematical problems into formal language for processing. Furthermore, it is unclear how these specialized mathematical skills might be integrated into other AI systems for tasks such as exploring hypotheses, testing innovative solutions to long-standing problems, and efficiently handling the time-consuming parts of proofs.
To overcome these limitations, Google researchers are developing a natural language reasoning system based on Gemini and their latest research. This new system aims to solve problems without the need for formal language translation, and is designed to integrate smoothly with other AI systems.
The bottom line
The performance of AlphaProof and AlphaGeometry 2 at the International Mathematical Olympiad is a remarkable leap forward in AI’s ability to tackle complex mathematical reasoning. Both systems performed at silver medal level by solving four of the six challenging problems, demonstrating significant progress in formal proof and geometric problem solving. Despite their performance, these AI systems still rely on human input to translate problems into formal language and face challenges in integrating with other AI systems. Future research aims to further improve these systems, possibly integrating natural language reasoning to extend their capabilities across a wider range of mathematical challenges.