[1]
DeepMind hits milestone in solving maths problems -- AI's next grand challenge
After beating humans at everything from the game of Go to strategy board games, Google DeepMind now says it is on the verge of besting the world's top students at solving mathematics problems. The London-based machine-learning company announced on 25 July that its artificial intelligence (AI) systems had solved four of the six problems that were given to school students at the 2024 International Mathematical Olympiad (IMO) in Bath, UK, this month. The AI produced rigorous, step-by-step proofs that were marked by two top mathematicians and earned a score of 28/42 -- just one point shy of the gold-medal range.

"It's clearly a very substantial advance," says Joseph Myers, a mathematician based in Cambridge, UK, who -- together with Fields Medal-winner Tim Gowers -- vetted the solutions and who had helped select the original problems for this year's IMO.

DeepMind and other companies are in a race to eventually have machines give proofs that would solve substantial research questions in maths. Problems set at the IMO -- the world's premier competition for young mathematicians -- have become a benchmark for progress towards that goal, and have come to be seen as a "grand challenge" for machine learning, the company says.

"This is the first time any AI system has been able to achieve medal-level performance", said Pushmeet Kohli, vice-president of AI for science at DeepMind, in a briefing to reporters. "This is a key milestone in the journey of building advanced theorem provers," said Kohli.

Only months ago, in January, the DeepMind system AlphaGeometry had achieved medallist-level performance at solving one type of IMO problem, those in Euclidean geometry. The first AI that performs at gold-medal level for the overall test -- including questions in algebra, combinatorics and number theory, which are generally considered more challenging than geometry -- will be eligible for a US$5 million award called the AI Mathematical Olympiad (AIMO) Prize. (The prize has strict criteria such as making code open-source and working with limited computing power, which means that DeepMind's current efforts would not qualify.)

In their latest effort, the researchers used AlphaGeometry2 to solve the geometry problem in under 20 seconds; the AI is an improved and faster version of their record-setting system, says DeepMind computer scientist Thang Luong. For the other types of questions, the team developed an entirely new system called AlphaProof. AlphaProof solved the competition's two algebra problems, plus one in number theory, something that took it three days. (Participants in the actual IMO are given two sessions of 4.5 hours each.) It was not able to solve the two problems in combinatorics, another area of mathematics.

Researchers have achieved mixed results when trying to answer mathematical questions with language models -- the type of system that powers chatbots such as ChatGPT. Sometimes, the models give the correct answer but are not able to rationally explain their reasoning, and sometimes they spew out nonsense. Just last week, a team of researchers from software companies Numina and HuggingFace used a language model to win an intermediate AIMO 'progress prize' based on simplified versions of IMO problems. The companies made their entire systems open-source and available for other researchers to download. But the winners told Nature that to solve harder problems, language models alone would probably not be enough.
AlphaProof combines a language model with the technique of reinforcement learning, using the 'AlphaZero' engine the company has employed successfully for attacking games such as Go as well as some specific mathematical problems. In reinforcement learning, a neural network learns by trial and error. This works well when its answers can be evaluated by some objective metric. For that purpose, AlphaProof was trained to read and write proofs in a formal language called Lean, which is used in a 'proof assistant' software package of the same name that is popular with mathematicians. To check its work, AlphaProof tested whether its outputs were correct by running them in the Lean package, which helped to fill in some of the steps in the code.

Training any language model requires massive amounts of data, but few mathematical proofs were available in Lean. To overcome this problem, the team designed an additional network that attempted to translate an existing record of a million problems written in natural language into Lean -- but without including human-written solutions, says Thomas Hubert, a DeepMind machine-learning researcher who co-led the development of AlphaProof. "Our approach was, can we learn to prove, even if we originally didn't train on human-written proofs?" (The company took a similar approach with Go, where its AI learned to play the game by playing against itself rather than by studying how humans play.) Many of the Lean translations were nonsensical, but enough were good enough to get AlphaProof to the point that it could start its reinforcement-learning cycles.

The results were much better than expected, Gowers said at the press briefing. "Many of the problems in the IMO have this magic-key property. The problem looks hard at first until you find a magic key that unlocks it," said Gowers, who is at the Collège de France in Paris. In some cases, AlphaProof seemed able to provide that extra leap of creativity, producing a correct step out of an infinitely large range of possibilities. But it will take further analysis to establish whether the answers were less surprising than they looked, Gowers added. A similar debate ensued following the surprising 'move 37' taken by DeepMind's AlphaGo bot in its famed 2016 defeat of the world's top human Go player -- a watershed for AI.

Whether the techniques can be perfected to the point of doing research-level work in mathematics remains to be seen, Myers said in the press briefing. "Can it extend to other sorts of mathematics where there might not be a million problems to train on?" "We're at the point where they can prove not open research problems, but at least problems that are very challenging to the very best young mathematicians in the world," said DeepMind computer scientist David Silver, who in the mid-2010s was the leading researcher in developing AlphaGo.
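For readers unfamiliar with Lean, the tiny examples below (written for this summary, not taken from AlphaProof's output) illustrate what machine-checkable mathematics looks like: each theorem is a small program, and Lean rejects the file unless every step of the proof checks.

```lean
-- Illustrative only, not AlphaProof's output: two tiny Lean 4 theorems.
-- A proof is a program; if the file compiles, the statements are guaranteed true,
-- which is the property that lets a system score its own attempts.

-- A one-line proof that reuses a core library lemma.
theorem swap_add (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A proof by induction, checked step by step by Lean's kernel.
theorem zero_add' (n : Nat) : 0 + n = n := by
  induction n with
  | zero => rfl
  | succ k ih => rw [Nat.add_succ, ih]
```

That all-or-nothing verification is the objective signal that makes reinforcement learning possible here: a candidate proof either compiles or it does not.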
[2]
Google DeepMind takes step closer to cracking top-level maths
Team of two new AI systems score one point short of gold medal in global maths contest for gifted students

Even though computers were made to do maths faster than any human could manage, the top level of formal mathematics remains an exclusively human domain. But a breakthrough by researchers at Google DeepMind has brought AI systems closer than ever before to beating the best human mathematicians at their own game.

A pair of new systems, called AlphaProof and AlphaGeometry 2, worked together to tackle questions from the International Mathematical Olympiad, a global maths competition for secondary-school students that has been running since 1959. The Olympiad takes the form of just six mind-bogglingly hard questions each year, covering fields including algebra, geometry and number theory. Winning a gold medal places you among the best handful of young mathematicians in the world.

The combined effort of DeepMind's two systems wasn't quite that good. After their answers were marked by Prof Timothy Gowers - a winner of the mathematics equivalent to the Nobel prize, the Fields medal, and a gold medallist in the Olympiad himself - the DeepMind team scored 28 out of 42 - enough for a silver medal, but one point short of gold.

Unlike a human mathematician, the systems were either flawless or hopeless. In each of the questions they solved, they scored perfect marks, but for two out of the six questions, they were unable to even begin working towards an answer.

Also unlike a human mathematician, the quizzing wasn't timed - an advantage, since while students get nine hours to tackle the problems, the DeepMind systems took three days working round the clock to solve just one of the questions, despite blitzing another in seconds.

The two systems that worked together on the challenge were very different to each other. AlphaProof, which solved three of the problems, works by pairing a large language model - of the sort applied in consumer chatbots - with a specialist "reinforcement learning" approach, like that used by DeepMind to tackle the board game Go. The trick is in leveraging a pre-existing approach called "formal mathematics", a set of rules that lets you write a mathematical proof as a program that can only run if it is true.

"What we try to do is to build a bridge between these two spheres," said Thomas Hubert, the lead on AlphaProof, "so that we can take advantage of the guarantees that come with formal mathematics and the data that is available in informal mathematics."

After it was trained on a vast number of maths problems written in English, AlphaProof used its knowledge to try to generate specific proofs in the formal language. Because those proofs can be verifiably true or not, it is possible to teach the system to improve itself, reinforcing it when it gets something right. The approach can solve difficult problems, but isn't always fast at doing so: while it is far better than simple trial and error, it took three days to find the correct formal proof for one of the hardest questions in the challenge.

The other system, AlphaGeometry 2, similarly pairs a language model with a more mathematically inclined approach. But its success at the narrower field of geometry problems was startling: it solved its problem in just 16 seconds. And, Gowers says, it surprised him with the method it took. "There have been some legendary examples of [computer-aided] proofs that are longer than Wikipedia. This was not that: we're talking about a very short human-style output."
The lead on AlphaGeometry 2, Thang Luong, described the output as similar to the famous "move 37" in DeepMind's historic victory at Go, when the AI system made a move no human would have thought of, and went on to win. AlphaGeometry 2's proof involved constructing a circle around another point, and then using that circle to prove the overall answer. "At first, our expert didn't quite understand why it constructed that point at all," said Luong. "But after looking at the solution, it really connects many triangles together, and they thought that the solution was really quite elegant."
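The training loop described above, in which AlphaProof generates candidate formal proofs, has them checked, and is reinforced only when a proof verifies, can be sketched schematically. The toy Python below was written for this summary and is not DeepMind's code: the proposer, the checker and the update rule are all stand-ins.

```python
# Schematic sketch (not DeepMind's code) of a verify-and-reinforce loop:
# a model proposes candidate proofs, a checker accepts or rejects them,
# and only verified attempts are rewarded.
import random

def propose_proof(problem: str, temperature: float) -> str:
    """Stand-in for a language model sampling a candidate formal proof."""
    return f"candidate proof of {problem} (t={temperature:.2f})"

def lean_checks(proof: str) -> bool:
    """Stand-in for the Lean checker: in reality, accept only proofs that compile."""
    return random.random() < 0.05  # pretend roughly 5% of candidates verify

def reinforce(score: float, reward: float, lr: float = 0.1) -> float:
    """Toy update: nudge a single scalar 'policy score' toward the observed reward."""
    return score + lr * (reward - score)

problems = ["P1", "P2", "P3"]
policy_score = 0.0
for step in range(200):
    problem = random.choice(problems)
    proof = propose_proof(problem, temperature=1.0)
    reward = 1.0 if lean_checks(proof) else 0.0  # only verified proofs earn reward
    policy_score = reinforce(policy_score, reward)

print(f"toy policy score after 200 attempts: {policy_score:.3f}")
```

In the system the articles describe, the proposer is a large language model, the checker is Lean, and the update is a full AlphaZero-style reinforcement-learning algorithm rather than a single number, but the feedback signal is the same: only proofs that verify are rewarded.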
[3]
Move Over, Mathematicians, Here Comes AlphaProof
A.I. is getting good at math -- and might soon make a worthy collaborator for humans.

At the headquarters of Google DeepMind, an artificial intelligence laboratory in London, researchers have a longstanding ritual for announcing momentous results: They bang a big ceremonial gong. In 2016, the gong sounded for AlphaGo, an A.I. system that excelled at the game Go. In 2017, the gong reverberated when AlphaZero conquered chess. On each occasion the algorithm had beaten human world champions. Last week the DeepMind researchers got out the gong again to celebrate what Alex Davies, a lead of Google DeepMind's mathematics initiative, described as a "massive breakthrough" in mathematical reasoning by an A.I. system.

A pair of Google DeepMind models tried their luck with the problem set in the 2024 International Mathematical Olympiad, or I.M.O., held from July 11 to July 22 about 100 miles west of London at the University of Bath. The event is said to be the premier math competition for the world's "brightest mathletes," according to a promotional post on social media. The human problem-solvers -- 609 high school students from 108 countries -- won 58 gold medals, 123 silver and 145 bronze. The A.I. performed at the level of a silver medalist, solving four out of six problems for a total of 28 points. It was the first time that A.I. has achieved a medal-worthy performance on an Olympiad's problems.

"It's not perfect, we didn't solve everything," Pushmeet Kohli, Google DeepMind's vice president of research, said in an interview. "We want to be perfect." Nonetheless, Dr. Kohli described the result as a "phase transition" -- a transformative change -- "in the use of A.I. in mathematics and the ability of A.I. systems to do mathematics."

The lab asked two independent experts to adjudicate the A.I.'s performance: Timothy Gowers, a mathematician at the University of Cambridge in England and a Fields medalist, who has been interested in the math-A.I. interplay for 25 years; and Joseph Myers, a software developer in Cambridge. Both won I.M.O. gold in their day. Dr. Myers was chair of this year's problem selection committee and at previous Olympiads served as a coordinator, judging human solutions. "I endeavored to assess the A.I. attempts consistently with how human attempts were judged this year," he said.

Dr. Gowers added in an email: "I was definitely impressed." The lab had discussed its Olympiad ambitions with him a couple of weeks beforehand, so "my expectations were quite high," he said. "But the program met them, and in one or two instances significantly surpassed them." The program found the "magic keys" that unlocked the problems, he said.

Hitting the gong

After months of rigorous training, the students sat for two exams, three problems per day -- in algebra, combinatorics, geometry and number theory. The A.I. counterpart beavered away roughly in tandem at the lab in London. (The students were not aware that Google DeepMind was competing, in part because the researchers did not want to steal the spotlight.) Researchers moved the gong into the room where they had gathered to watch the system work. "Every time the system solved a problem, we hit the gong to celebrate," David Silver, a research scientist, said.

Haojia Shi, a student from China, ranked No. 1 and was the only competitor to earn a perfect score -- 42 points for six problems; each problem is worth seven points for a full solution. The U.S. team won first place with 192 points; China placed second with 190.
The Google system earned its 28 points for fully solving four problems -- two in algebra, one in geometry and one in number theory. (It flopped at two combinatorics problems.) The system was allowed unlimited time; for some problems it took up to three days. The students were allotted only 4.5 hours per exam.

For the Google DeepMind team, speed is secondary to overall success, as it "is really just a matter of how much compute power you're prepared to put into these things," Dr. Silver said. "The fact that we've reached this threshold, where it's even possible to tackle these problems at all, is what represents a step-change in the history of mathematics," he added. "And hopefully it's not just a step-change in the I.M.O., but also represents the point at which we went from computers only being able to prove very, very simple things toward computers being able to prove things that humans can't."

Algorithmic ingredients

Applying A.I. to mathematics has been part of DeepMind's mission for several years, often in collaboration with world-class research mathematicians. "Mathematics requires this interesting combination of abstract, precise and creative reasoning," Dr. Davies said. In part, he noted, this repertoire of abilities is what makes math a good litmus test for the ultimate goal: reaching so-called artificial general intelligence, or A.G.I., a system with capabilities ranging from emerging to competent to virtuoso to superhuman. Companies such as OpenAI, Meta AI and xAI are tracking similar goals.

Olympiad math problems have come to be considered a benchmark. In January, a Google DeepMind system named AlphaGeometry solved a sampling of Olympiad geometry problems at nearly the level of a human gold medalist. "AlphaGeometry 2 has now surpassed the gold medalists in solving I.M.O. problems," Thang Luong, the principal investigator, said in an email.

Riding that momentum, Google DeepMind intensified its multidisciplinary Olympiad effort, with two teams: one led by Thomas Hubert, a research engineer in London, and another led by Dr. Luong and Quoc Le in Mountain View, each with some 20 researchers. For his "superhuman reasoning team," Dr. Luong said he recruited a dozen I.M.O. medalists -- "by far the highest concentration of I.M.O. medalists at Google!"

The lab's strike at this year's Olympiad deployed the improved version of AlphaGeometry. Not surprisingly, the model fared rather well on the geometry problem, polishing it off in 19 seconds. Dr. Hubert's team developed a new model that is comparable but more generalized. Named AlphaProof, it is designed to engage with a broad range of mathematical subjects.

All told, AlphaGeometry and AlphaProof made use of a number of different A.I. technologies. One approach was an informal reasoning system, expressed in natural language. This system leveraged Gemini, Google's large language model. It used the English corpus of published problems and proofs and the like as training data. The informal system excels at identifying patterns and suggesting what comes next; it is creative and talks about ideas in an understandable way. Of course, large language models are inclined to make things up -- which may (or may not) fly for poetry and definitely not for math. But in this context, the L.L.M. seems to have displayed restraint; it wasn't immune to hallucination, but the frequency was reduced. Another approach was a formal reasoning system, based on logic and expressed in code.
It used a theorem prover and proof-assistant software called Lean, which guarantees that if the system says a proof is correct, then it is indeed correct. "We can exactly check that the proof is correct or not," Dr. Hubert said. "Every step is guaranteed to be logically sound."

Another crucial component was a reinforcement learning algorithm in the AlphaGo and AlphaZero lineage. This type of A.I. learns by itself and can scale indefinitely, said Dr. Silver, who is Google DeepMind's vice-president of reinforcement learning. Since the algorithm doesn't require a human teacher, it can "learn and keep learning and keep learning until ultimately it can solve the hardest problems that humans can solve," he said. "And then maybe even one day go beyond those."

Dr. Hubert added, "The system can rediscover knowledge for itself." That's what happened with AlphaZero: It started with zero knowledge, Dr. Hubert said, "and by just playing games, and seeing who wins and who loses, it could rediscover all the knowledge of chess. It took us less than a day to rediscover all the knowledge of chess, and about a week to rediscover all the knowledge of Go. So we thought, Let's apply this to mathematics."

Dr. Gowers doesn't worry -- too much -- about the long-term consequences. "It is possible to imagine a state of affairs where mathematicians are basically left with nothing to do," he said. "That would be the case if computers became better, and far faster, at everything that mathematicians currently do."

"There still seems to be quite a long way to go before computers will be able to do research-level mathematics," he added. "It's a fairly safe bet that if Google DeepMind can solve at least some hard I.M.O. problems, then a useful research tool can't be all that far away." A really adept tool might make mathematics accessible to more people, speed up the research process, nudge mathematicians outside the box. Eventually it might even pose novel ideas that resonate.
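The division of labour described above, with an informal Gemini-based system proposing ideas in natural language and a formal Lean-based system accepting only steps it can verify, can be illustrated in miniature. The Python sketch below is hypothetical and written for this summary; the proposer, the checker and the toy acceptance rule are stand-ins, not Google DeepMind components.

```python
# Hypothetical sketch of a "neuro-symbolic" pipeline: an informal proposer
# suggests candidate steps, and a formal checker keeps only the ones it can
# verify, so an unverified (possibly hallucinated) step never reaches the proof.
from typing import List

def informal_propose(goal: str) -> List[str]:
    """Stand-in for the informal (language-model) system suggesting next steps."""
    return [
        f"rewrite '{goal}' using commutativity",
        f"prove '{goal}' by induction on n",
        f"cite a nonexistent lemma about '{goal}'",  # a hallucinated suggestion
    ]

def formally_verified(step: str) -> bool:
    """Stand-in for the formal system (e.g. a Lean check) accepting a step."""
    return "nonexistent" not in step  # toy rule: reject the bogus suggestion

def proof_outline(goal: str) -> List[str]:
    """Keep only the proposals the checker accepts; discard the rest outright."""
    return [step for step in informal_propose(goal) if formally_verified(step)]

print(proof_outline("0 + n = n"))  # the hallucinated suggestion is filtered out
```

The point of the design is that a hallucinated suggestion can waste search time but can never end up in the final, formally checked proof.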
[4]
Google DeepMind's Game-Playing AI Tackles a Chatbot Blindspot
Researchers at the company have now published research that combines the abilities of a large language model (the AI behind today's chatbots) with those of AlphaZero, a successor to AlphaGo also capable of playing chess, to solve very tricky mathematical proofs. Their new Frankensteinian creation, dubbed AlphaProof, has demonstrated its prowess by tackling several problems from the 2024 International Math Olympiad (IMO), a prestigious competition for high school students.

AlphaProof uses the Gemini large language model to convert naturally phrased math questions into a programming language called Lean. This provides the training fodder for a second algorithm to learn, through trial and error, how to find proofs that can be confirmed as correct.

Earlier this year, Google DeepMind revealed another math algorithm called AlphaGeometry that also combines a language model with a different AI approach. AlphaGeometry uses Gemini to convert geometry problems into a form that can be manipulated and tested by a program that handles geometric elements. Google today also announced a new and improved version of AlphaGeometry.

The researchers found that their two math programs could provide proofs for IMO puzzles as well as a silver medalist could. Between them, the programs solved two algebra problems, one number theory problem and one geometry problem -- four of the six in total. They got one problem in minutes but took up to several days to figure out others. Google DeepMind has not disclosed how much computer power it threw at the problems.

Google DeepMind calls the approach used for both AlphaProof and AlphaGeometry "neuro-symbolic" because they combine the pure machine learning of an artificial neural network, the technology that underpins most progress in AI of late, with the language of conventional programming. "What we've seen here is that you can combine the approach that was so successful in things like AlphaGo with large language models and produce something that is extremely capable," says David Silver, the Google DeepMind researcher who led work on AlphaZero.

Silver says the techniques demonstrated with AlphaProof should, in theory, extend to other areas of mathematics. Indeed, the research raises the prospect of addressing the worst tendencies of large language models by applying logic and reasoning in a more grounded fashion. As miraculous as large language models can be, they often struggle to grasp even basic math or to reason through problems logically. In the future, the neuro-symbolic method could provide a means for AI systems to turn questions or tasks into a form that can be reasoned over in a way that produces reliable results. OpenAI is also rumored to be working on such a system, codenamed "Strawberry."

There is, however, a key limitation with the systems revealed today, as Silver acknowledges. Math solutions are either correct or incorrect, allowing AlphaProof and AlphaGeometry to work their way toward the right answer. Many real-world problems -- coming up with the ideal itinerary for a trip, for instance -- have many possible solutions, and which one is ideal may be unclear. Silver says the solution for more ambiguous questions may be for a language model to try to determine what constitutes a "right" answer during training. "There's a spectrum of different things that can be tried," he says.

Silver is also careful to note that Google DeepMind won't be putting human mathematicians out of jobs. "We are aiming to provide a system that can prove anything, but that's not the end of what mathematicians do," he says.
"A big part of mathematics is to pose problems and find what are the interesting questions to ask. You might think of this as another tool along the lines of a slide rule or calculator or computational tools."
Google DeepMind's new AI systems, AlphaProof and AlphaGeometry 2, solved four of the six problems at the 2024 International Mathematical Olympiad, performing at silver-medal level and marking a milestone for AI in mathematics.
In a significant development, Google DeepMind has introduced AlphaProof, an artificial intelligence system that generates formal, machine-checkable mathematical proofs. Working alongside an improved AlphaGeometry 2, it tackled the problems set at the 2024 International Mathematical Olympiad (IMO) and solved four of the six, earning 28 of 42 points -- silver-medal level, one point short of the gold-medal range 1.

The systems produced rigorous, step-by-step proofs that were marked by two leading mathematicians, Fields medalist Timothy Gowers and Joseph Myers. AlphaProof solved the two algebra problems and one number theory problem, taking up to three days on the hardest of them, while AlphaGeometry 2 dispatched the geometry problem in seconds; neither system cracked the two combinatorics problems 2.

At its core, AlphaProof pairs a language model with reinforcement learning in the AlphaZero lineage. It reads and writes proofs in the formal language Lean, so every candidate proof can be automatically verified, and it was trained by translating roughly a million natural-language problems into Lean rather than on human-written proofs. In some cases its solutions surprised the judges with short, creative steps that were not obvious even to expert mathematicians 3.

The results have implications for both mathematics and AI research. DeepMind describes them as a key milestone on the way to advanced theorem provers, and Gowers suggests that a useful research tool for mathematicians may not be far away, one that could speed up research and nudge mathematicians outside the box 4.

At the same time, the achievement invites caution about how far the approach extends. Gowers notes that there still seems to be quite a long way to go before computers can do research-level mathematics, and Myers asks whether the techniques can carry over to areas of mathematics where there are not a million problems to train on 1.

Researchers at Google DeepMind argue that the techniques should, in theory, extend to other areas of mathematics, and that grounding language models in formal verification could curb their tendency to make things up, although ambiguous real-world problems with no single correct answer remain a challenge. David Silver stresses that posing interesting questions will remain the mathematicians' job 2.
Reference
[1] DeepMind hits milestone in solving maths problems -- AI's next grand challenge
[2] Google DeepMind takes step closer to cracking top-level maths
[3] Move Over, Mathematicians, Here Comes AlphaProof
[4] Google DeepMind's Game-Playing AI Tackles a Chatbot Blindspot