Digital Society

OpenBind Shows How Shared Data Beats AI Hype in Drug Discovery

Jamie Bykov-Brett · 14 May 2026 · 4 min read

Even the most celebrated AI systems in biology, the ones that have changed how we think about protein structure, hit a wall when asked to predict something genuinely new. The algorithms are fine; the experimental data they learn from is patchy, inconsistent and, in pharma especially, locked behind commercial walls.

That is the problem the OpenBind consortium is trying to fix, and the way they are doing it is worth a closer look for anyone in a data-heavy industry, including those outside life sciences.

Researchers from Oxford's Department of Statistics, working with OpenBind, have released a dataset and a predictive model focused on one of the trickiest jobs in early drug design: working out which small molecules will bind to a disease-related protein, and how strongly. The release includes detailed X-ray images of 699 compounds binding to the EV-A71 virus protein, with binding strength measurements for 601 of them, making it one of the largest public datasets for a single protein target. They are giving it away.

That last sentence is the one I would underline.

The data, not the model, is the bottleneck

For years, the public conversation about AI has been about cleverer architectures and bigger compute budgets. In drug discovery, the constraint sits somewhere less glamorous. As Professor Charlotte Deane, one of the senior OpenBind investigators, put it, the release matters because it shows we can now generate "high-quality, standardised data at scale, specifically designed for AI in drug discovery." Models like AlphaFold and Boltz are extraordinary, but they can only confidently model structures that resemble what they have already seen. Step outside that comfort zone and confidence drops fast.

So OpenBind has taken on the unglamorous work: running huge volumes of consistent, reproducible binding experiments through automated pipelines at Diamond Light Source in Oxfordshire, processing the results into formats machines can actually learn from, and then handing it all to the world. The work is expensive and slow.

Why this pattern matters beyond pharma

Pharmaceutical companies have historically hoarded binding data. It is one of the most carefully guarded assets in the sector, because the cost of generating it is enormous and the competitive advantage feels obvious. OpenBind's bet is that the binding data itself is no longer the competitive layer. The competitive layer is what you do with it: the chemistry intuition, the target selection, the clinical pipeline, the manufacturing. Commodifying the data underneath frees everyone to compete higher up the stack.

That logic applies beyond medicine. If you sit in financial services, ask yourself what shared dataset would commodify a cost every firm in your industry currently pays alone. Fraud signals? Customer onboarding identity checks? Climate risk inputs? In professional services, it might be benchmarking data or regulatory interpretation. In higher education, learning outcomes across institutions.

The interesting strategic question is rarely "should we share?" It is "would we rather convene the consortium, or be the last firm to join one someone else built?" Those are very different positions.

What this asks of leaders

This is where the industrial mindset shows up most stubbornly. The instinct to hoard data and treat every byte as proprietary is a leftover from an era when data was scarce and proprietary collection was the only path to insight. In an AI-mediated world, the value of a single firm's private dataset often falls below the value of a shared dataset that is standardised and openly maintained. This holds often enough that the question deserves a serious answer rather than a reflexive no.

Dr Fergus Imrie, the OpenBind computational researcher at Oxford, made a point worth borrowing for any boardroom: "High-quality experimental data is essential for developing new and improved AI models. As AI performance improves, this in turn helps guide future experiments, helping to accelerate discovery." The data and the models pull each other forward. Refuse to share, and you slow both.

Jamie Bykov-Brett

Listed as one of Engatica's World's Top 200 Business and Technology Innovators, Jamie is an AI and automation consultant who helps organisations move from curiosity to confident daily use. As founder of Bykov-Brett Enterprises and co-founder of the Executive AI Institute, he designs AI upskilling programmes that have delivered 86% daily adoption rates and a 9.7/10 NPS. His work sits at the intersection of technology implementation and human development, with a focus on responsible governance, practical tooling, and making AI accessible to every level of an organisation.
