March 4, 2025

Pennsylvania Convention Center, Room 115A | Philadelphia, Pennsylvania, USA

Tuesday, March 4, 2025
09:00-09:05 Opening remarks
Challenger Mishra, Mateja Jamnik, Pietro Lio and Tiansi Dong
Keynote #1 "Machine Learning G2 Geometry?"
Speaker: Elli Heyes, Imperial College London
Chair: Mateja Jamnik
09:05-09:45 Compact Ricci-flat Calabi-Yau and holonomy G2 manifolds appear in string theory and M-theory, respectively, as descriptions of the extra spatial dimensions that arise in these theories. Since 2017, machine-learning techniques have been applied extensively to the study of Calabi-Yau manifolds, but until 2024 no similar work had been carried out on holonomy G2 manifolds. In this talk, I will first show how topological properties of these manifolds can be learnt using simple neural networks. I will then discuss how one may try to learn Ricci-flat metrics on G2 manifolds using machine learning, and why these approximations would be useful for M-theory.
Keynote #2 "Curiosity Styles in the (Natural & Artificial) Wild"
Speaker: Dani Bassett, University of Pennsylvania
Chair: Tiansi Dong
09:45-10:30 What is curiosity? Across disciplines, some scholars offer a range of definitions while others eschew definitions altogether. Is the field of curiosity studies simply too young? Should we, as has been argued in neuroscience, press forward in definition-less characterization? At this juncture in the field's history, we turn to an examination of curiosity styles, and ask: How has curiosity been practiced over the last two millennia and how is it practiced today? We exercise a recent historico-philosophical account to catalogue common styles of curiosity and test for their presence as humans browse Wikipedia. Next, we consider leading theories from psychology and mechanics that could explain curiosity styles, and formalize those theories in the mathematics of network science. Such a formalization allows theories of curiosity to be explicitly tested in human behavioral data and their relative mental affordances to be investigated. Moreover, the formalization allows us to train artificial agents to exhibit human-like curiosity styles through reinforcement learning. Finally, with styles and theories in hand, we expand out to a study of several million users of Wikipedia to understand how curiosity styles might or might not differ around the world and track factors of social inequality. Collectively, our findings support the notion that curiosity is practiced, differently across people, as unique styles of network building, thereby providing a connective counterpoint to the common acquisitional account of curiosity in humans.
Coffee break 10:30-11:00
Keynote #3 "Automating the discovery of mathematical conjectures and identities"
Speaker: Thomas Fink, London Institute for Mathematical Sciences
Chair: Challenger Mishra
11:00-11:45 Given a mathematical object, identities help us understand the behavior of that object. For example, we understand the special functions exp(x) and cos(x) by deducing identities about them, such as exp(2x) = exp(x) · exp(x) and cos(2x) = 2 cos(x) · cos(x) − 1. A lot of research involves searching for identities about new mathematical objects. To what extent can this be automated? To find out, we searched over all possible definitions of special functions with a quadratic form: f(mx) = a f(x) · f(x) + b f(x) + c. Using intelligent automation, we were able to discover conjectures about a variety of new and known special functions.
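For concreteness (a reading of the abstract's formula, not a statement from the talk): the two example identities are instances of the quadratic form with specific coefficients. For exp, exp(2x) = exp(x) · exp(x) corresponds to (m, a, b, c) = (2, 1, 0, 0); for cos, cos(2x) = 2 cos(x) · cos(x) − 1 corresponds to (m, a, b, c) = (2, 2, 0, −1). A minimal sketch of how such a candidate identity could be tested numerically follows; the function name and sampling grid are illustrative assumptions, not the speaker's system.

    import math

    def satisfies_quadratic_form(f, m, a, b, c, xs=None, tol=1e-9):
        # Numerically check f(m*x) = a*f(x)**2 + b*f(x) + c at sample points.
        if xs is None:
            xs = [0.1 * k for k in range(1, 20)]
        return all(abs(f(m * x) - (a * f(x) ** 2 + b * f(x) + c)) < tol for x in xs)

    # The two identities from the abstract:
    assert satisfies_quadratic_form(math.exp, m=2, a=1, b=0, c=0)   # exp(2x) = exp(x)^2
    assert satisfies_quadratic_form(math.cos, m=2, a=2, b=0, c=-1)  # cos(2x) = 2 cos(x)^2 - 1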
Keynote #4 "Evaluating Human-Centric Thought Partners for Mathematics"
Speaker: Katherine Collins, University of Cambridge
Chair: Challenger Mishra
11:45-12:30 There is much excitement about harnessing the power of large language models (LLMs) to move towards not just tools for thought but partners in thought for challenging reasoning tasks. However, the standard methodology of evaluating LLMs on static pairs of inputs and outputs is insufficient for making an informed decision about which LLMs, and under which assistive settings, can be sensibly utilised. Static assessment fails to take into account the essential interactive element of their deployment, and therefore limits how we understand language model capabilities. In this talk, I’ll present our work introducing CheckMate, an adaptable prototype platform for humans to interact with and evaluate LLMs. We’ll discuss our study with CheckMate evaluating three language models (InstructGPT, ChatGPT, and GPT-4) as assistants in proving undergraduate-level mathematics, with a mixed cohort of participants ranging from undergraduate students to professors of mathematics. We release the resulting interaction and rating dataset, MathConverse. By analysing MathConverse, we derive a taxonomy of human behaviours and uncover that, despite a generally positive correlation, there are notable instances of divergence between correctness and perceived helpfulness in LLM generations, amongst other findings. We conclude with actionable takeaways for ML practitioners and mathematicians: models that communicate uncertainty, respond well to user corrections, and are more interpretable and concise may constitute better assistants; interactive evaluation is a promising way to navigate the capabilities of these models; and humans should be aware of language models’ algebraic fallibility and discern where they are appropriate to use. I will then present some early work exploring other styles of interactive evaluation to better understand the human-compatibility of AI thought partners for mathematics in the context of formalization and general gameplay.
Lunch break 12:30-14:00
Keynote #5 "NeuroSymbolic AI and combinatorial competitive programming"
Speaker: Alex Vitvitskyi, Google DeepMind
Chair: Tiansi Dong
14:00-14:45 From AlphaZero and Agent57 to AlphaProof; from Reinforcement Learning-fuelled Chain of Thought to FunSearch. While these are all very different systems, they have one key element in common: they are neurosymbolic.
Neural and symbolic paradigms have different inherent strengths and weaknesses. Hybrid neurosymbolic architectures, such as the ones above, combine both of these approaches in multiple creative ways to maximize systems’ performance.
In this talk, we will look at a few such architectures, their structure, and the commonalities and differences between them. A particular focus will be placed on FunSearch, a neurosymbolic system powered by genetic algorithms, and its application to the challenging domain of combinatorial competitive programming.
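As a rough illustration of the genetic-algorithm pattern behind FunSearch-style systems (a schematic sketch, not DeepMind's implementation; score, mutate, and the selection scheme are placeholders, with mutate standing in for an LLM proposing program variants):

    import random

    def evolve(initial_programs, score, mutate, generations=100, population=20):
        # Schematic neurosymbolic loop: a neural proposer (mutate) suggests
        # program variants; a symbolic evaluator (score) keeps the fittest.
        pool = list(initial_programs)
        for _ in range(generations):
            parent = max(random.sample(pool, min(3, len(pool))), key=score)  # tournament selection
            child = mutate(parent)                # neural step: propose a variant program
            if score(child) >= score(parent):     # symbolic step: objective evaluation
                pool.append(child)
            pool = sorted(pool, key=score, reverse=True)[:population]  # keep best candidates
        return max(pool, key=score)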
Poster session 14:45-15:30
"Active Symbolic Discovery of Ordinary Differential Equations via Phase Portrait Sketching"
Nan Jiang, Md Nasim, Yexiang Xue
"LLM-based SQL Generation with Reinforcement Learning"
Mariia Berdnyk, Marine Collery
"From Black Box to Algorithmic Insight: Explainable AI in Graph Neural Networks for Graph Coloring"
Elad Shoham, Havana Rika, Dan Vilenchik
"Beyond Interpolation: Extrapolative Reasoning with Reinforcement Learning and Graph Neural Networks"
Niccolò Grillo, Andrea Toccaceli, Benjamin Estermann, Joël Mathys, Stefania Fresca, Roger Wattenhofer
"Can Better Solvers Find Better Matches? Assessing Math-LLM Models in Similar Problem Identification"
Savitha Sam Abraham, Pietro Totis, Marjan Alirezaie, Luc De Raedt
"Towards Learning to Reason: Comparing LLMs with Neuro-Symbolic on Arithmetic Relations in Abstract Reasoning"
Michael Hersche, Giacomo Camposampiero, Roger Wattenhofer, Abu Sebastian, Abbas Rahimi
"Enhancing Reasoning through Process Supervision with Monte Carlo Tree Search"
Shuangtao Li, Shuaihao Dong, Kexin Luan, Xinhan Di, Chaofan Ding
"An Evaluation of Approaches to Train Embeddings for Logical Inference"
Yasir White, Jevon Lipsey, Jeff Heflin
Coffee break 15:30-16:00
Panel discussion 16:00-16:55
Interdisciplinary Discussions on Neural Reasoning and Mathematical Discovery
16:55-17:00 Closing remarks
Challenger Mishra, Mateja Jamnik, Pietro Lio and Tiansi Dong