Google DeepMind's latest models, AlphaProof and AlphaGeometry 2, solved four out of six problems from this year's International Mathematical Olympiad (IMO), achieving a score equivalent to that of a silver medalist in the competition.
The IMO, the oldest and largest competition for young mathematicians, has become a benchmark for measuring AI's advanced mathematical reasoning capabilities. Each year, participants solve six difficult problems in algebra, combinatorics, geometry, and number theory.
The AI systems' solutions were scored by prominent mathematicians, including Prof Sir Timothy Gowers and Dr Joseph Myers.
AlphaProof, a reinforcement-learning-based system for formal math reasoning, and AlphaGeometry 2, an improved geometry-solving system, solved two algebra problems, one number theory problem, and one geometry problem. The systems scored 28 out of 42 total points, just shy of the 29-point threshold for a gold medal.
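The arithmetic behind the 28-out-of-42 figure can be sketched in a few lines of Python. This is only an illustration: it assumes each IMO problem is marked out of 7 points, and that the four solved problems earned full marks while the other two earned none, which matches the reported total. The function and constant names are hypothetical.

```python
# Assumed constants: every IMO problem is marked out of 7 points.
POINTS_PER_PROBLEM = 7
NUM_PROBLEMS = 6
GOLD_THRESHOLD = 29  # gold-medal cutoff cited in the article


def score_for(problems_fully_solved: int) -> int:
    """Total score, assuming full marks on solved problems and zero on the rest."""
    if not 0 <= problems_fully_solved <= NUM_PROBLEMS:
        raise ValueError("problems_fully_solved must be between 0 and 6")
    return problems_fully_solved * POINTS_PER_PROBLEM


max_score = NUM_PROBLEMS * POINTS_PER_PROBLEM
ai_score = score_for(4)

print(f"{ai_score} / {max_score}")  # prints "28 / 42"
print(GOLD_THRESHOLD - ai_score)    # prints 1
```

Under these assumptions, the systems finished a single point below the gold cutoff.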
The problems were manually translated into formal mathematical language for the AI systems to understand. While human contestants work in two sessions of 4.5 hours each, the AI systems solved one problem within minutes and took up to three days for others.
AlphaProof utilises the AlphaZero reinforcement learning algorithm, previously known for mastering chess, shogi, and Go. It trained itself to prove mathematical statements in the formal language Lean by generating and verifying solutions. AlphaProof trained for the IMO by solving millions of problems across various mathematical topics.
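To give a sense of what proving a statement in Lean looks like, here is a toy Lean 4 sketch. It is far simpler than any IMO problem and is not taken from AlphaProof; the theorem name is made up for illustration. The point is that Lean accepts a proof only if every step checks, which is what makes automatic verification of generated solutions possible.

```lean
-- Toy statements for illustration only; not from AlphaProof.

-- Addition of natural numbers is commutative.
theorem toy_add_comm (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A concrete arithmetic fact, checked by computation.
example : 2 + 2 = 4 := rfl
```

If either proof were wrong, Lean would reject it, so a candidate proof can be checked without human review.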
AlphaGeometry 2 significantly improved on its predecessor, solving 83% of historical IMO geometry problems compared with the previous 53%. It is a neuro-symbolic hybrid system whose language model is based on Gemini and was trained from scratch on an order of magnitude more synthetic data than its predecessor.
It employs a symbolic engine and a knowledge-sharing mechanism to solve complex geometry problems quickly.
The AlphaProof and AlphaGeometry teams continue to explore AI approaches for advancing mathematical reasoning and plan to release more technical details on AlphaProof soon.
According to venture capitalist and former Meta board member Peter Thiel, AI systems will be able to solve all IMO problems within the next three to five years.
NuminaMath 7B TIR, a collaboration between Numina and Hugging Face, solved 29 out of 50 problems in the AI Mathematical Olympiad (AIMO). It was built with a mix of open-source libraries, notably TRL, PyTorch, vLLM, and DeepSpeed.
Last week, Mistral AI released a model for math reasoning called MathΣtral. It is tailored to tackle complex, multi-step logical reasoning challenges in STEM fields.
With more inference-time computation, MathΣtral 7B achieves significantly higher accuracy, scoring 68.37% on MATH with majority voting and 74.59% with a strong reward model selecting among 64 candidates.
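The two selection strategies mentioned above, majority voting and reward-model selection among sampled candidates, can be sketched as follows. This is a minimal illustration, not Mistral's implementation: the candidate answers and the scoring table are made up, and in practice a trained reward model would take the place of `toy_scores`.

```python
from collections import Counter
from typing import Callable, Sequence


def majority_vote(answers: Sequence[str]) -> str:
    """Return the most frequent final answer among the sampled candidates."""
    if not answers:
        raise ValueError("need at least one candidate answer")
    # On a tie, most_common keeps the answer that appeared first.
    return Counter(answers).most_common(1)[0][0]


def best_of_n(answers: Sequence[str], score: Callable[[str], float]) -> str:
    """Return the candidate that the scoring function rates highest."""
    if not answers:
        raise ValueError("need at least one candidate answer")
    return max(answers, key=score)


# Hypothetical final answers from six sampled solutions to one problem.
candidates = ["12", "15", "12", "7", "15", "12"]

# Hypothetical reward-model scores; a real system would use a learned model.
toy_scores = {"12": 0.2, "15": 0.9, "7": 0.1}

print(majority_vote(candidates))                        # prints 12
print(best_of_n(candidates, lambda a: toy_scores[a]))   # prints 15
```

The two strategies can disagree, as they do here: voting picks the most common answer, while a reward model can favour a less frequent answer that it rates as more likely to be correct.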
OpenAI is also working on a new AI technology under the code name "Strawberry." This project aims to significantly enhance the reasoning capabilities of its AI models.
With enhanced problem-solving abilities, AI could solve complex mathematical problems, assist with engineering calculations, and even take part in theoretical research. According to reports, Strawberry scored 90% on the MATH benchmark.