Big Data and Apache Hadoop for the Pharmaceutical Industry

Solution Overview

The pharmaceutical industry is experiencing significant growth in the volume and variety of data from several sources, including the R&D process, retailers, patients, and caregivers. Sales and marketing functions in the pharmaceutical industry have been leading adopters of big data technology and other functions are starting to move in that direction, especially Research and Development (R&D). The current stagnant pipeline and decline in success rates for new drugs is forcing R&D organizations to look at new ways to accelerate the pipeline of new drugs.

Broad changes in the healthcare industry including regulatory changes are also driving the pharmaceutical industry to examine various aspects of their business. In fact, The McKinsey Global Institute estimates that applying big data strategies to better inform decision making could generate up to $100 billion in value annually across various facets of the US healthcare and pharmaceutical system. The industry-leading MapR Distribution for Apache™ Hadoop® is ideally suited for a wide variety of use cases in the pharmaceutical industry, some of which are detailed below:

Improving Genomic Research

DNA sequencing has resulted in significant growth in the volume of data that needs to be computed. This has led to a need for a different and more scalable approach in order to improve the overall genomic research process. A leading pharmaceutical organization is using the MapR Distribution for Hadoop to achieve the following benefits:

  • Ability to run existing applications in a parallel environment that leverage streaming capabilities in Hadoop
  • Faster time to market due to improved sequencing analysis
  • Sit alongside existing grid computing environments

Accelerating Drug Discovery

Pharmaceutical R&D needs to innovate on the use of cutting-edge tools in order to accelerate the rate at which new drugs are discovered. These include sophisticated modeling techniques such as systems biology and high-throughput data production technologies that can highly benefit from the parallel computing framework that Hadoop provides.

Incorporating new types of data such as patient genotypes and clinical trials into the drug development process results can also help in making personalized medicine and diagnostics an integral part of the drug-development process rather than an afterthought.

Understanding and Improving Adherence to Prescriptions

Adherence is broadly defined as the degree to which a consumer takes prescribed drugs as directed. A Capgemini study reports that adherence levels drop over the course of the patient journey from 69% to 43% after the first 6 months of treatment.

The impact of adherence is significant as this leads to about $400 billion in avoidable hospitalizations. This has an impact on pharmaceutical companies as well since patients do not fill prescriptions due to lack of adherence. The usage of Hadoop can bring in a new paradigm and approach to the adherence problem by:

  • Detecting patterns of behavior before non-adherence
  • Performing analysis at the individual level and not just at the group or cohort level
  • Making adherence a proactive program that address patient’s behavior through behavior-based analytics


The business benefits of improving adherence include:
  • Reduction in pharmacy-related waste, a $403 billion dollar problem according to the ES 2011 Drug Trend Report
  • Reducing physician behavior that leads to fraud
  • • Increasing generic utilization

About MapR

MapR delivers on the promise of Hadoop with a proven, enterprisegrade platform that supports a broad set of mission-critical and real-time production uses. MapR brings unprecedented dependability, ease-of-use and world-record speed to Hadoop, NoSQL, database and streaming applications in one unified big data platform. MapR is used by more than 500 customers across financial services, retail, media, healthcare, manufacturing, telecommunications and government organizations as well as by leading Fortune 100 and Web 2.0 companies. Amazon, Cisco, Google and HP are part of the broad MapR partner ecosystem. Investors include Lightspeed Venture Partners, Mayfield Fund, NEA, and Redpoint Ventures. MapR is based in San Jose, CA. Connect with MapR on Facebook, LinkedIn, and Twitter.

MapR Distribution for Hadoop Highlights

  • Direct Access NFS. Easy access to data
  • Integrated security. Built-in data access controls
  • Volume support. Disparate user groups and data by logical volumes
  • Job placement control and resource management. Jobs run simultaneously in the same cluster
  • High availability and disaster recovery. Business continuity and higher business-level service level agreements
  • Data protection. Consistent snapshots with point-in-time audits and recovery
  • Support for structured, semistructured, and unstructured data. All data in the enterprise data architecture
  • High performance. Fast, responsive access to data, and higher throughput

Key Benefits

  • Simplified architecture with easy data access of all enterprise data in a single repository
  • Fast, responsive access to data to enable real-time operations
  • Low cost storage along with the benefits of high-end storage platforms
  • High uptime for the reliability to meet stringent SLAs and avoid costly downtime


DOWNLOAD PDF