University of Oxford researchers have released a new open dataset and AI model to accelerate drug discovery.
The project is led by the OpenBind consortium, a collaboration between the University of Washington, Columbia University, and the University of Oxford, as well as the European Bioinformatics Institute and several other research groups and industry partners across the world. OpenBind aims to produce large, standardised, open-access datasets.
Fergus Imrie, Associate Professor at the Department of Statistics and OpenBind computational researcher, told Cherwell: “One of the major bottlenecks in AI-enabled drug discovery is the shortage of large, reliable experimental datasets showing how small molecules bind to proteins.”
On 5th May, OpenBind released its first open dataset, which consists of X-ray crystal structures of compounds bound to an EV-A71 virus protein, along with binding strength measurements for many of those structures. Speaking to Cherwell, Imrie described how this data had been generated at the Diamond Light Source in Oxfordshire using “high-throughput X-ray crystallography”.
Charlotte Deane, a senior OpenBind investigator and Executive Chair of the Engineering and Physical Sciences Research Council, described this first release of data as “an important step because it shows we can now generate high-quality, standardised data at scale, specifically designed for AI in drug discovery”.
The Department for Science, Innovation and Technology has invested £8 million in the project, with OpenBind researchers hoping to use increased investment to scale their operations.
Frank von Delft, Professor of Structural Chemical Biology at Oxford and Principal Scientist at Diamond Light Source, described how OpenBind intends to “implement the lessons from this foundation phase to ramp up a long-term operation that links high-volume production of AI data with active discovery projects”.
Open data at scale is key to the expansion of AI-powered drug discovery. Imrie told Cherwell: “AI models are only as good as the data they learn from. The data being generated by OpenBind is surprisingly scarce in the public domain. OpenBind aims to address this by generating and openly releasing high-quality protein–ligand structures and affinity data. This will enable the community to build better AI tools for discovering new medicines and advancing science.”
Imrie also pointed to AlphaFold, the AI system for predicting protein structures whose developers shared the 2024 Nobel Prize in Chemistry, as a perfect example of the advances and benefits that can emerge from open data. Imrie told Cherwell: “AI tools offer real promise to improve both the speed and quality of molecules being developed, for example, by helping us model complex biological systems.”
The OpenBind project hopes to create new opportunities for postdoctoral positions in the area of AI drug discovery.

