Oxford researchers collaborate to release open data to accelerate AI drug discovery

University of Oxford researchers have released a new open dataset and AI model to accelerate drug discovery.

The project is led by the OpenBind consortium, a collaboration between the Universities of Washington, Columbia, and Oxford, as well as the European Bioinformatics Institute and several other research groups and industry partners across the world. OpenBind aims to produce large, standardised, open-access datasets.

Fergus Imrie, Associate Professor at the Department of Statistics and OpenBind computational researcher, told Cherwell: “One of the major bottlenecks in AI-enabled drug discovery is the shortage of large, reliable experimental datasets showing how small molecules bind to proteins.”

On 5th May, OpenBind released its first open dataset, which consists of X-ray crystal structures of compounds bound to a protein from the EV-A71 virus, along with binding-strength measurements for many of those structures. Speaking to Cherwell, Imrie described how this data had been generated at the Diamond Light Source in Oxfordshire using “high-throughput X-ray crystallography”.

Charlotte Deane, a senior OpenBind investigator and executive chair of the Engineering and Physical Sciences Research Council, described this first release of data as “an important step because it shows we can now generate high-quality, standardised data at scale, specifically designed for AI in drug discovery”.

The Department for Science, Innovation and Technology has invested £8 million in the project, and OpenBind researchers hope to use further investment to scale their operations. Frank von Delft, Professor of Structural Chemical Biology at Oxford and Principal Scientist at Diamond Light Source, described how OpenBind intends to “implement the lessons from this foundation phase to ramp up a long-term operation that links high-volume production of AI data with active discovery projects”.

Open data at scale is key to the expansion of AI-powered drug discovery. Imrie told Cherwell: “AI models are only as good as the data they learn from. The data being generated by OpenBind is surprisingly scarce in the public domain. OpenBind aims to address this by generating and openly releasing high-quality protein–ligand structures and affinity data. This will enable the community to build better AI tools for discovering new medicines and advancing science.”

Imrie also referred to AlphaFold, an AI system for predicting protein structures whose developers shared the 2024 Nobel Prize in Chemistry, as a perfect example of the advances and benefits that can emerge from open data. Imrie told Cherwell: “AI tools offer real promise to improve both the speed and quality of molecules being developed, for example, by helping us model complex biological systems.”

The OpenBind project hopes to create new opportunities for postdoctoral positions in the area of AI drug discovery.
