In traditional drug development, companies typically start with a target and a mechanism identified and validated in preclinical studies. This forces them to make a heavy bet that these same genes or proteins are implicated in patients’ pathologies. But a rising generation of startups is applying machine learning (ML) to rich collections of clinical and molecular data without following preconceived hypotheses. “The vast majority of what we’re doing is hypothesis-generating and hypothesis-free,” says Jeanne Magram, CSO of Celsius Therapeutics, an ML-driven drug discovery company.

AI Discoveries

Investment in this AI-powered approach is flowing in. In March, armed with $83 million in newly acquired funding, Celsius launched a clinical program for inflammatory bowel disease (IBD) based on a gene target identified by ML analysis of single-cell data from patient tissue samples. Verge Genomics, which uses AI to discover new targets for neurodegenerative disorders, pulled in $98 million this past December from investors including Eli Lilly and Merck, and aims to launch a clinical trial this year of an amyotrophic lateral sclerosis (ALS) drug against a new target. And in January, London-based BenevolentAI expanded the scope of its three-year-old partnership with AstraZeneca to apply the biotech’s disease-agnostic platform, which has already generated at least three new drug targets.

Early-stage players are shoring up support too. Immunai secured $215 million in series C funding this past October to identify new drug targets by applying AI to a vast collection of patient immunological data. And London-based Relation Therapeutics raised $25 million in June to implement an ML-based platform that combines single-cell analysis with clinical insights to uncover new targets for treating bone diseases.

What distinguishes all these companies is their “human-first” approach. The initial focus is to identify targets in patient-derived data, as opposed to identifying them in animal models or through high-throughput screening and cell-based assays. “At the very core of how we built our platform is the idea that to succeed in humans, we need to be starting in humans,” says Alice Zhang, co-founder and CEO of Verge.

This is not inherently new. Over the past 20 years, numerous drug programs have been spurred by human genetic data, mostly from population-scale surveys known as genome-wide association studies (GWAS) that compare the genetic features of patient cohorts with those of healthy controls. What’s more, the rise of vast research biobanks and national public-private partnerships like Genomics England, which has collected phenotypic and genomic data from more than 150,000 individuals, has given drug companies ample material with which to work.

Large-scale data

Big biopharma companies have embraced the approach. For instance, Amgen acquired the Icelandic company deCODE Genetics in 2012 to benefit from its deep genomic expertise and data resources, including genomic and clinical data from roughly half a million people. Regeneron and AstraZeneca have also each built formidable collections of molecular and clinical data on well over a million individuals through a combination of internal studies, partnerships with academics, and international biobank initiatives.

Data at this scale make it much easier to discover rare gene variants with a powerful effect on health and disease. But as these datasets get larger and larger and incorporate additional omic layers beyond the genome, including transcriptomic, proteomic, or even metabolomic data, they become more challenging to analyse. This is where AI can become a powerful asset — particularly when one is searching for signals in the data that might not be readily obvious. “At some point, we will have done all the low-hanging fruit, and maybe that’s where new methods will be more transformative because AI and ML are pretty good at looking across a broad swath of variables at really subtle nonlinear signals,” says Jeffrey Reid, chief data officer at the Regeneron Genetics Center.

Those signals can come from diverse types of data associated with a disease. For example, Insitro has developed an ML-based platform that can analyse tumor histopathology images, genomic sequences and clinician reports to identify distinctive features associated with a particular pathology. Under a recently announced partnership, Insitro is applying its AI-based target discovery platform to Genomics England’s dataset. At a Genomics England conference in April, Insitro CEO Daphne Koller commented, “Oftentimes, human biology surprises us with things that we didn’t train our clinicians to look for.”

On the flip side, the medical records associated with datasets like the UK Biobank, a repository of medical and genetic data from half a million individuals, can provide essential context for the molecular data. “For the UK Biobank, that means a very broad spectrum: imaging data, even some proteomic assays, medical record data, questionnaire data,” says Reid. “There’s a lot of stuff your medical record doesn’t capture, like did one of your parents have Alzheimer’s, that can be highly relevant for genetic studies.”

ML analysis can also uncover the complex physiological pathways underpinning disease and provide insight into why patients with the same disease vary. “We can also go in and ask questions of the data — for example, you may see in a specific population something interesting around a particular cell type,” says Magram.

Disease resilience found in analytic data

Vast cohorts, such as those assembled by national-scale initiatives like Genomics England or its Finnish equivalent, the FinnGen program, can serve as a starting point for exploring both common and rare diseases. But for some startups, working with smaller cohorts allows them to dive deeper into specific diseases. Verge, for instance, is focusing on neurodegenerative disorders and has assembled genomic, transcriptomic, and proteomic data from the brain and spinal tissue of 7,000 patients. And Hong Kong-based Insilico Medicine has applied AI to identify dysregulated gene expression profiles and altered pathways in ALS for target discovery. In a recent publication, the company describes how it mined post-mortem central nervous system (CNS) samples and iPSC-derived motor neurons from public datasets of ALS patients and controls, resulting in 17 potential drug targets for future drug development, including 11 novel targets.

London-based Alchemab is using ML to analyse what makes cancer survivors resilient to disease. “Our hypothesis is that, at least in some cases, people have protective autoantibodies that are providing them with some disease resilience,” says CSO and co-founder Jane Osbourn. By applying AI analysis to antibody-encoding DNA sequences from tens of millions of B cells from each individual (roughly 1% of their total B cell repertoire), Alchemab aims to uncover those protective antibodies and the cellular proteins that they target.

AI can be implemented at various stages of the analysis, including at the very beginning: essentially combing through the entire biomedical data haystack in search of a crucial sliver of actionable data. For example, Reid says his team at Regeneron occasionally performs hypothesis-agnostic ‘all-by-all’ analyses. “You can just say, show me all of the most significant associations between this genotype and any phenotype, and you get that list.” Alternatively, AI can focus on specific disease phenotypes and narrower subsets of genes and pathways to offer molecular explanations for specific pathologies. This was the case when Verge scientists used ML to analyse spinal tissue from patients with ALS and detected a link between lysosomal function and disease pathology. “That then relies on a large body of understanding of regulatory interactions, of gene–gene interactions, and we use that to essentially create a rank-ordered list of potential targets,” says Zhang. One of the top-ranked targets, a phosphoinositide kinase called PIKfyve, is now the focus of Verge’s lead clinical program, which is on track to submit an Investigational New Drug application to the US Food and Drug Administration later this year.
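To make the idea of an ‘all-by-all’ scan concrete, here is a minimal Python sketch; it is not Regeneron’s method. It assumes a hypothetical cohort in which each genotype is a 0/1 carrier flag and each phenotype a 0/1 diagnosis per individual. For every genotype-phenotype pair it tabulates a 2×2 table, computes a Pearson chi-square test with one degree of freedom, and returns the pairs sorted from most to least significant. The function names and toy data are illustrative assumptions; real biobank analyses typically use regression models that adjust for covariates such as age and ancestry, and must correct for the very large number of tests performed.

```python
import math
from itertools import product


def chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # an empty margin means no association can be measured
    return n * (a * d - b * c) ** 2 / denom


def chi2_pvalue_1df(stat: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def all_by_all(genotypes: dict[str, list[int]],
               phenotypes: dict[str, list[int]]) -> list[tuple[str, str, float]]:
    """Test every genotype against every phenotype; rank pairs by p-value.

    Both inputs (hypothetical format) map a name to a list of 0/1 values,
    one per individual, in the same individual order.
    """
    results = []
    for (g_name, g), (p_name, p) in product(genotypes.items(), phenotypes.items()):
        pairs = list(zip(g, p))
        a = sum(1 for gi, pi in pairs if gi == 1 and pi == 1)  # carrier, case
        b = sum(1 for gi, pi in pairs if gi == 1 and pi == 0)  # carrier, control
        c = sum(1 for gi, pi in pairs if gi == 0 and pi == 1)  # non-carrier, case
        d = sum(1 for gi, pi in pairs if gi == 0 and pi == 0)  # non-carrier, control
        results.append((g_name, p_name, chi2_pvalue_1df(chi2_2x2(a, b, c, d))))
    # "Show me the most significant associations": smallest p-values first.
    return sorted(results, key=lambda r: r[2])


# Hypothetical toy cohort of eight individuals.
genotypes = {"variant_1": [1, 1, 1, 1, 0, 0, 0, 0],
             "variant_2": [1, 0, 1, 0, 1, 0, 1, 0]}
phenotypes = {"disease_X": [1, 1, 1, 0, 0, 0, 0, 1]}
for g_name, p_name, pval in all_by_all(genotypes, phenotypes):
    print(g_name, p_name, round(pval, 4))
# variant_1 disease_X 0.1573
# variant_2 disease_X 1.0
```

Because every genotype is crossed with every phenotype, the number of tests grows multiplicatively, which is why a multiple-testing-adjusted threshold matters more than the raw ranking in real scans.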

To search for new drug targets, BenevolentAI and AstraZeneca comb through experimental and clinical data repositories, as well as the scientific and medical literature. The data harvested in this fashion are then assembled into ‘knowledge graphs’ that capture the relationships between, for example, genes and pathways. Slavé Petrovski, VP and head of the AstraZeneca Centre for Genomics Research, developed an ML tool that uses insights from dozens of biological databases (including the Human Protein Atlas and various GWAS data catalogs) and disease-specific clinical and genomic resources to decipher potential disease-related genes in large human databases. “It can assign a probability of disease relevance to each of the 20,000 human genes for a particular phenotype,” he says. “That’s one way that we can sift through all those highly ranked, well-ranked signals that aren’t yet slam dunks to pull out those that are potentially true biology.”
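A knowledge graph can be pictured as a set of (subject, relation, object) triples. The minimal Python sketch below, which is not BenevolentAI’s or AstraZeneca’s implementation, stores such triples and runs one simple two-hop query: it proposes genes that are not yet linked to a disease but share a pathway with a gene that is. The relation names (`associated_with`, `participates_in`) and all entities are hypothetical placeholders; production graphs hold vastly more edges from many sources and rank candidates with learned models rather than a single rule.

```python
from collections import defaultdict


class KnowledgeGraph:
    """A toy knowledge graph of (subject, relation, object) triples."""

    def __init__(self) -> None:
        # Outgoing edges indexed by subject, then by relation type.
        self.edges: dict[str, dict[str, set[str]]] = defaultdict(lambda: defaultdict(set))

    def add(self, subject: str, relation: str, obj: str) -> None:
        self.edges[subject][relation].add(obj)

    def neighbours(self, node: str, relation: str) -> set[str]:
        # .get() avoids creating empty entries for unknown nodes or relations.
        return set(self.edges.get(node, {}).get(relation, set()))

    def candidate_genes(self, disease: str) -> dict[str, set[str]]:
        """Genes not yet linked to `disease` that share a pathway with one that is.

        Returns each candidate mapped to the shared pathways connecting it.
        """
        known = {s for s, rels in self.edges.items()
                 if disease in rels.get("associated_with", set())}
        disease_pathways: set[str] = set()
        for gene in known:
            disease_pathways |= self.neighbours(gene, "participates_in")
        candidates: dict[str, set[str]] = {}
        for gene, rels in self.edges.items():
            if gene in known:
                continue
            shared = rels.get("participates_in", set()) & disease_pathways
            if shared:
                candidates[gene] = shared
        return candidates


# Hypothetical toy graph.
kg = KnowledgeGraph()
kg.add("GENE_A", "associated_with", "DISEASE_X")
kg.add("GENE_A", "participates_in", "PATHWAY_1")
kg.add("GENE_B", "participates_in", "PATHWAY_1")
kg.add("GENE_C", "participates_in", "PATHWAY_2")
print(kg.candidate_genes("DISEASE_X"))  # {'GENE_B': {'PATHWAY_1'}}
```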

AI can also classify and characterise individual cell subtypes. Celsius’s platform analyses single-cell transcriptomic data from different cohorts of patients to distinguish how certain genes in specific cell types correlate with particular phenotypes. For IBD, says Magram, “one of those cell types is the inflammatory monocyte, which is a key driver of cytokine production, and so we homed in on those cells and asked what receptors might be driving the biology there.” This analysis uncovered a protein called TREM1, a cellular receptor that can be selectively inhibited to block inflammation in IBD without broadly compromising immune function, and this protein is now the company’s lead target.
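The following is a simplified, hypothetical sketch of the general idea of relating a gene’s expression in one cell type to a phenotype; it is not Celsius’s platform. It assumes single-cell records of the form (patient ID, cell type, gene expression values), averages a chosen gene’s expression over each patient’s cells of the chosen type (a ‘pseudobulk’ summary), and computes the Pearson correlation of those averages with a per-patient phenotype score. Gene, cell-type and patient names are placeholders; real analyses normalise counts, handle thousands of cells per patient and use formal statistical models.

```python
from collections import defaultdict
from statistics import mean


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def celltype_gene_correlation(cells, phenotype, gene, cell_type):
    """Correlate a gene's per-patient mean expression in one cell type with a phenotype.

    cells: iterable of (patient_id, cell_type, {gene_name: expression}) records.
    phenotype: dict mapping patient_id to a numeric phenotype score.
    """
    per_patient = defaultdict(list)
    for patient, ctype, expr in cells:
        if ctype == cell_type:
            # A gene missing from a cell's record is treated as zero expression.
            per_patient[patient].append(expr.get(gene, 0.0))
    # Keep only patients with cells of this type and a phenotype score.
    patients = sorted(p for p in per_patient if p in phenotype)
    x = [mean(per_patient[p]) for p in patients]  # pseudobulk mean per patient
    y = [phenotype[p] for p in patients]
    return pearson(x, y)


# Hypothetical toy data: three patients, one gene, two cell types.
cells = [
    ("p1", "monocyte", {"GENE_R": 1.0}),
    ("p1", "monocyte", {"GENE_R": 3.0}),
    ("p1", "T cell", {"GENE_R": 9.0}),  # ignored: different cell type
    ("p2", "monocyte", {"GENE_R": 4.0}),
    ("p3", "monocyte", {"GENE_R": 6.0}),
    ("p3", "monocyte", {"GENE_R": 6.0}),
]
phenotype = {"p1": 1.0, "p2": 2.0, "p3": 3.0}  # e.g. a hypothetical severity score
print(celltype_gene_correlation(cells, phenotype, "GENE_R", "monocyte"))  # 1.0
```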

AI Tackling issues to the side

Even with the most powerful algorithms, the AI’s output is typically only a step along the road to target identification. “Closing the loop is really important,” says Su-In Lee, a computer scientist at the University of Washington who has used AI and ML in biomedical research. “You use neural networks to generate this hypothesis, and then you pass that target candidate to experimentalists and do the experiments, and then that can inform the model learning again.”

Standard preclinical work in cell culture- and mouse-based assays will often follow. But a handful of companies, such as Insitro and Verge, are trying to keep this process as human-oriented as possible by performing target characterisation in patient-derived induced pluripotent stem cells. “That allows us to take skin cells from patients with ALS and Parkinson’s disease and directly convert them into their own brain cells, and then we validate those targets in those human-derived neurons,” says Zhang.

It remains to be seen just how much of a real edge AI and ML confer.

“They’re a screwdriver and a hammer — they’re not going to replace every tool in the toolkit,” Zhang says. “There are some things they’re good at; there are some things they’re really bad at.” The first wave of targets identified with help from AI has yet to be proven in clinical trials.

In addition to Verge and Celsius, Alchemab expects to submit an Investigational New Drug application by late 2023. And AstraZeneca researchers have unearthed loss-of-function variants in a gene called MAP3K15 that reduce a person’s risk of developing diabetes without affecting body mass index. “There’s a long way to go,” says Petrovski, “but this could truly be a disease modifier, not just looking to treat the symptoms by reducing glucose levels.”

Even if AI remains just one tool in drug developers’ belts, Osbourn is enthusiastic about its ability to tackle old problems in new ways. “For me, the key is this combination of machine-learning in silico algorithms with some kind of deep interdisciplinary expertise, just to kind of make sure that we’re learning each time and sort of turning the wheel,” she says. “And I just love the opportunity that AI’s given us to hopefully do something different.”

Also published on Nature.com