How Facebook and Google's AI agents fare in advanced math

December 26th
Facebook's Palo Alto office. (Office Snapshots)

This post is sponsored by Brilliant, which offers courses in Beautiful Geometry and Algebra Fundamentals, as well as Computer Science Fundamentals and Data Structures. Sign up today and get 20% off when you use this link.

It's often said that AI excels at advanced data analysis but falls short of humans on basic abstract tasks. AIs have, for instance, reliably beaten human players in extraordinarily complex video games, and are routinely used to discover patterns in massive datasets. On the other hand, when an AI was trained to pass a middle school science exam in 2019, it was heralded as a major achievement.

That may be changing. Recent publications from Facebook and Google show progress on more fundamental problem solving.

At Facebook, researchers recently announced that they've trained an AI to solve university-level calculus problems in seconds. Rather than treating the task purely as a math problem, Facebook's scientists borrowed tools from natural language processing (NLP), the subfield devoted to understanding written and spoken language.

"This works because the mathematics in each problem can be thought of as a sentence, with variables, normally denoted x, playing the role of nouns and operations, such as finding the square root, playing the role of verbs," Gege Li, a reporter at NewScientist, wrote. "The AI then 'translates' the problem into a solution."

François Charton and Guillaume Lample, the Facebook researchers behind the agent, said their AI scored 98% accuracy on a set of 500 calculus problems. That success suggests a future agent could solve advanced problems that humans cannot, they told New Scientist.

"These results are surprising given the difficulty of neural models to perform simpler tasks like integer addition or multiplication," they wrote in their paper, echoing the fact that machines seem to do well in areas humans struggle in, but struggle in tasks children learn to do quickly.

At Google, results have been less clearly successful. In an April 2019 paper, researchers announced that their trained AI had failed a high-school-level math test. It turned out that the AI broke down at about the point where a casual human does.

The AI "has a performance of 90% or more on the 'add or subtract several numbers' module and the 'multiply or divide several numbers' module" researchers wrote. "However on the mixed arithmetic module (mixing all four operations together with parentheses), the performance drops to around 50%."

Later in 2019, however, Google researchers said they'd trained an AI to prove 1,200 math theorems. In the paper, they also published benchmark results that they encouraged other teams to beat. The researchers hope the benchmark, called HOList, will do for theorem proving what ImageNet did for image recognition: spur competition and speed up research. As with other benchmark contests, a shared target could help push AI agents' mathematical performance forward.
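
For readers unfamiliar with formal theorem proving, the snippet below shows what a machine-checkable proof looks like. HOList is built on the HOL Light proof assistant; this toy example uses Lean 4 instead, purely as an illustration, and has nothing to do with the benchmark's actual theorem library.

```lean
-- A toy machine-checkable proof, written in Lean 4. (HOList itself targets
-- the HOL Light prover; this is only meant to illustrate the idea.)
-- Claim: 0 + n = n for every natural number n, proved by induction on n.
theorem zero_add_left (n : Nat) : 0 + n = n := by
  induction n with
  | zero => rfl                          -- base case: 0 + 0 reduces to 0
  | succ n ih => rw [Nat.add_succ, ih]   -- step: rewrite 0 + (n + 1) using the hypothesis
```

An automated prover like the one described in the Google paper tries to produce proof steps like these on its own, with the proof assistant checking each one.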

"Given the fundamental nature of mathematics and its importance for most scientific disciplines, the capability for high level formal mathematical reasoning is both an important practical task as well as one of the most challenging case studies in AI," they wrote.