Data Sciences Disrupting Life Sciences Industry
The pharma and life sciences industry faces increasing regulatory oversight, declining Research & Development (R&D) productivity, challenges to growth and profitability, and the impact of digitization across the value chain. Regulatory changes led by the far-reaching Patient Protection and Affordable Care Act (PPACA) in the United States are forcing the industry to change its status quo. Besides the increasing cost of regulatory compliance, the industry faces rising R&D costs even as health outcomes deteriorate. Customer demographics are also changing: growth is increasingly driven by the emerging geographies of Asia Pacific (APAC) and Latin America (LATAM), and the industry is compelled to focus on these relatively nascent and evolving markets. Data science and analytics concepts are enabling organizations to rationalize internal costs and to better profile and target clients and medical practitioners.
The Pharma Disruption
Pharmaceutical organizations can leverage data science and analytics to drive insightful decisions across all aspects of their business, from product planning, design, and manufacturing to clinical trials, and thereby enhance ecosystem collaboration, information sharing, process efficiency, and cost optimization while building competitive advantage. Analytics enables data exploration and analysis, along with predictive and prescriptive solutions that help respond to the following key trends in the pharmaceutical industry:
- Drug discovery analytics – Enables scientists to source scientific findings and insights from external labs or internal knowledge to jump-start discovery, reducing product-development cycle time and supporting faster go-to-market
- Reduced clinical trial cycle times – Through better insights driven by improved analytical accuracy
- Supply-disruption predictive analytics – Predictive models built from a combination of internal and external data can reduce unforeseen drug shortages that hurt customer service levels and cause lost sales revenue
- Product failure analytics – Via root cause analysis and predictive analysis of product failures (vendor data)
- Risk analytics – For evaluating the potential risks posed by elemental impurities in a formulated drug product
- Real-time medical device analytics and visualization – Leveraging interconnected data from implanted devices and personal care devices
- Digital channel / social analytics – To better understand customer perceptions of products, helping companies fix product issues proactively and manage communication
- Enhanced reporting systems – To meet changing regulatory compliance needs more effectively
- Visualization – A renewed focus on understanding underlying business data and generating analytical insights using the latest business intelligence (BI) visualization tools
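To make the supply-disruption item above concrete, here is a minimal sketch of a shortage-risk signal. All data, function names, and thresholds are hypothetical illustrations, not any vendor's actual model:

```python
# Hypothetical shortage-risk signal: flag a drug whose recent inventory
# trend falls well below its historical average.

def shortage_risk(inventory, window=3, threshold=0.8):
    """Return True when the moving average of the last `window` readings
    drops below `threshold` times the overall average (a simple risk flag)."""
    if len(inventory) < window:
        return False
    recent = sum(inventory[-window:]) / window
    overall = sum(inventory) / len(inventory)
    return recent < threshold * overall

# Hypothetical weekly stock levels for one drug product
stock = [120, 118, 115, 110, 80, 60, 45]
print(shortage_risk(stock))  # the falling tail trips the risk flag: True
```

A production model would of course combine many internal and external signals; the point here is only the shape of the prediction step.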
The Human Microbiome
Though genomics currently hogs the spotlight, there are plenty of other biotechnology fields wrestling with big data.
In fact, when it comes to human microbes – the bacteria, fungi, and viruses that live on or inside us – we’re talking about astronomical amounts of data. Scientists with the NIH’s Human Microbiome Project have counted more than 10,000 microbial species in the human body, carrying 100 times more genes than the body’s own cells.
To determine which microbes are most important to our well-being, researchers at the Harvard School of Public Health used unique computational methods to identify around 350 of the most important organisms in their microbial communities. With the help of DNA sequencing, they sorted through 3.5 terabytes of genomic data and pinpointed genetic “name tags” – sequences specific to those key bacteria. They could then identify where and how often these markers occurred throughout a healthy population. This gave them the opportunity to catalog over 100 opportunistic pathogens and understand where in the microbiome these organisms occur normally.
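The "name tag" idea can be sketched in a few lines: scan sample reads for taxon-specific marker sequences and count their occurrences. The sequences and taxon names below are toy data invented for illustration, not real HMP markers:

```python
# Toy sketch: count occurrences of taxon-specific marker sequences
# ("name tags") across a set of sample reads.
from collections import Counter

def marker_counts(reads, markers):
    """Count occurrences of each marker subsequence across sample reads."""
    counts = Counter()
    for read in reads:
        for name, seq in markers.items():
            counts[name] += read.count(seq)
    return counts

markers = {"taxonA": "ACGTAC", "taxonB": "TTGGCA"}  # hypothetical tags
reads = ["GGACGTACCT", "TTGGCAACGTAC", "CCCCCC"]
print(marker_counts(reads, markers))  # taxonA appears twice, taxonB once
```

Real marker discovery must also verify that a tag is unique to its taxon across terabytes of sequence; this sketch only shows the counting step.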
As in genomics, there are also plenty of start-ups – Libra Biosciences, Vedanta Biosciences, Seres Health, Onsel – looking to capitalize on new discoveries.
Perhaps the biggest data challenge for biotechnologists is synthesis. How can scientists integrate large quantities and diverse sets of data – genomic, proteomic, phenotypic, clinical, semantic, social etc. – into a coherent whole?
Many teams are busy providing answers:
- Cambridge Semantics has developed semantic web technologies that help pharmaceutical companies sort and select which businesses to acquire and which drug compounds to license.
- Data scientists at the Broad Institute of MIT and Harvard have developed the Integrative Genomics Viewer (IGV), open source software that allows for the interactive exploration of large, integrated genomic datasets.
- GNS Healthcare is using proprietary causal Bayesian network modeling and simulation software to analyze diverse sets of data and create predictive models and biomarker signatures.
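GNS's software is proprietary, so the following is only a generic illustration of the underlying idea: a tiny causal Bayesian network with made-up conditional probability tables, queried by marginalizing over a hidden biomarker:

```python
# Toy causal Bayesian network: treatment (T) and biomarker (B) jointly
# influence response (R). All probabilities are hypothetical.

p_biomarker = {True: 0.3, False: 0.7}          # P(B)
p_response = {                                 # P(R=1 | T, B)
    (True, True): 0.8, (True, False): 0.4,
    (False, True): 0.3, (False, False): 0.1,
}

def p_response_given_treatment(treated):
    """Marginalize over the biomarker: P(R=1 | T) = sum_B P(R=1 | T, B) P(B)."""
    return sum(p_response[(treated, b)] * p_biomarker[b] for b in (True, False))

print(round(p_response_given_treatment(True), 2))   # 0.8*0.3 + 0.4*0.7 = 0.52
print(round(p_response_given_treatment(False), 2))  # 0.3*0.3 + 0.1*0.7 = 0.16
```

Real systems learn both the network structure and the tables from data; the query pattern, though, is exactly this kind of marginalization.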
With data sets multiplying by the minute, data scientists aren’t suffering for lack of raw materials.
Take genomics. Numbers-wise, each human genome comprises 20,000–25,000 genes and about 3 billion base pairs – roughly 3 gigabytes of data at one byte per base. Genomics also illustrates the role of data in personalizing the healthcare experience:
- Sequencing millions of human genomes would add up to hundreds of petabytes of data.
- Analysis of gene interactions multiplies this data even further.
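The arithmetic behind these bullet points is straightforward; a quick back-of-envelope check (assuming one byte per base, as the 3-gigabyte figure implies):

```python
# Back-of-envelope storage estimate for sequenced genomes.

BASES_PER_GENOME = 3_000_000_000        # ~3 billion base pairs
bytes_per_genome = BASES_PER_GENOME * 1  # 1 byte per base -> ~3 GB

genomes = 100_000_000                    # "millions of human genomes"
total_pb = genomes * bytes_per_genome / 1e15
print(f"{total_pb:.0f} PB")              # 100 million genomes -> 300 PB
```

So sequencing on the order of 100 million genomes already lands in the hundreds of petabytes, before any downstream analysis multiplies the data further.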
In addition to sequencing, massive amounts of information on structure/function annotations, disease correlations, population variations – the list goes on – are being entered into databanks. Software companies are furiously developing tools and products to analyze this treasure trove.
For example, using Google frameworks as a starting point, the team at NextBio has created a platform that allows biotechnologists to search life-science information, share data, and collaborate with other researchers.
The computing resources needed to handle genome data will soon exceed those of Twitter and YouTube, says a team of biologists and computer scientists who are worried that their discipline is not geared up to cope with the coming genomics flood.
Other computing experts say that such a comparison with other ‘big data’ areas is not convincing and a little glib. But they agree that the computing needs of genomics will be enormous as sequencing costs drop and ever more genomes are analysed.
By 2025, between 100 million and 2 billion human genomes could have been sequenced, according to an analysis published in the journal PLoS Biology. The data-storage demands for this alone could run to as much as 2–40 exabytes (1 exabyte is 10^18 bytes), because the data that must be stored for a single genome are about 30 times larger than the genome itself, to account for errors incurred during sequencing and preliminary analysis.
Coordinated Approach To Data Processing Will Be The Future
The extensive data generation in pharma, genomics, and the microbiome is a clarion call: these fields are going to pose some severe challenges. Nevertheless, they will have to address the fundamental question of how much data they should generate. The world has a limited capacity for data collection and analysis, and that capacity should be used well. Astronomers and high-energy physicists process much of their raw data soon after collection and then discard it, which simplifies later steps such as distribution and analysis. But fields like genomics do not yet have standards for converting raw sequence data into processed data.
The variety of analyses that biologists want to perform in genomics is also uniquely large, the authors write, and current methods for performing these analyses will not necessarily scale as the volume of data rises. For instance, comparing two genomes requires comparing two sets of genetic variants; with a million genomes, that means on the order of a million-squared pairwise comparisons, and the algorithms for doing that scale badly.
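The quadratic blow-up is easy to see in plain numbers: all-pairs comparisons among n genomes grow as n(n−1)/2.

```python
# All-pairs comparison count: grows quadratically with the number of genomes.

def pairwise_comparisons(n):
    """Number of unordered pairs among n genomes: n*(n-1)/2."""
    return n * (n - 1) // 2

for n in (1_000, 1_000_000):
    print(f"{n:>9,} genomes -> {pairwise_comparisons(n):,} comparisons")
# 1,000 genomes yield ~half a million pairs; a million genomes yield
# ~half a trillion - a thousandfold more genomes, a millionfold more work.
```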
Rather than comparing disciplines, what is needed is a call to arms for big-data problems that span disciplines and could benefit from a coordinated approach – such as the relative dearth of career paths for computational specialists in science, and the need for specialized types of storage and analysis capacity that will not necessarily be met by industrial providers.