Since the turn of the century, the technological revolution and the rise of social media have made big data critical to our everyday lives. Big data platforms help us find, store, deliver, and analyze large volumes of distributed data, driving advances in finance, communications, medicine, and scientific research. Perhaps most interesting is big data’s role in genomics, where it helps scientists identify, map, and sequence hundreds of millions of strands of DNA.
April 2003 brought a scientific breakthrough: the completion of The Human Genome Project, the 13-year effort to sequence and map human DNA. For the first time in history, the sequence of the human genome and the tools required to analyze the data were freely available on the Internet. Scientists had once believed that our bodies contained about 100,000 genes; the project revealed that we actually have only 20,000-25,000 of these molecular units. This finding helped scientists analyze the association between genetics and disease more effectively.
Through the efforts of The Human Genome Project and the UK’s ongoing 100,000 Genomes Project, our society is now one step closer to a reality in which we can predict diseases and use genomics to create personalized, data-driven medicine. We are approaching a future in which statistical analysis of large, correlated datasets will allow us to interpret each patient’s unique DNA accurately and effectively, empowering doctors to assign personalized treatment plans to their patients.
“IT is central to testing and research,” Peter Johnson, chief clinician at Cancer Research UK, explains. “Personalized medicine is the most exciting change in cancer treatment since the invention of chemotherapy.”
Indeed, IT is central to testing and research. To achieve this medical breakthrough, massive amounts of data will need to be stored, analyzed, and understood. So we have to ask ourselves: would it be possible to map and sequence the DNA of each and every person on the planet without a robust big data platform?
The UK has committed five years to sequencing the DNA of 75,000-100,000 patients, most of whom suffer from rare diseases or common cancers. The government-funded project began in 2012 and is scheduled for completion in 2017. At that rate, how long would it take to map and store the DNA of one billion people, let alone seven billion?
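A rough calculation puts that question in perspective. The sketch below assumes, for illustration only, that the project's upper figure of 100,000 genomes over five years holds as a constant rate of 20,000 genomes per year; real sequencing throughput is not constant, so treat the result as an order-of-magnitude illustration rather than a forecast. The names `GENOMES_SEQUENCED`, `PROJECT_YEARS`, and `years_to_sequence` are hypothetical.

```python
# Back-of-the-envelope projection. ASSUMPTION: the UK project's upper
# figure (100,000 genomes in five years) is treated as a fixed annual rate.
GENOMES_SEQUENCED = 100_000
PROJECT_YEARS = 5


def years_to_sequence(population: int,
                      genomes: int = GENOMES_SEQUENCED,
                      years: int = PROJECT_YEARS) -> float:
    """Years needed to sequence `population` genomes at a constant rate."""
    rate_per_year = genomes / years  # 20,000 genomes per year
    return population / rate_per_year


for population in (1_000_000_000, 7_000_000_000):
    print(f"{population:,} people: {years_to_sequence(population):,.0f} years")
```

At that pace, one billion genomes would take about 50,000 years and seven billion about 350,000 years, which is why throughput, not just raw storage, is the real bottleneck.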
To expedite this process, we must look to robust distributed database platforms that can process all this information in a holistic and cost-effective way. However, while platforms such as Hadoop have the necessary raw processing power, these distributed systems introduce a great deal of complexity. That complexity extends to monitoring and coordinating every IT component in the Hadoop technology stack, not just Hadoop itself. Without a resilient Hadoop ecosystem, including all the supporting IT systems and infrastructure, high levels of performance and availability cannot be guaranteed.
Ensuring this resilience requires a comprehensive IT monitoring platform that covers Hadoop plus all supporting technologies, so that cross-domain impact and trend analysis can be performed and reported in real time, allowing IT organizations to be proactive rather than reactive. This cross-domain correlation can only be accomplished effectively by a unified IT monitoring platform that collects all the key metrics end to end and presents real-time views of business processes, so IT organizations can act before performance degrades.
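The core idea of cross-domain correlation, normalizing readings from many different components onto one scale and rolling them up into a single service-level status, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Centerity's implementation or API: the component names, thresholds, `Metric` fields, and the "worst status wins" roll-up rule are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from enum import IntEnum


class Status(IntEnum):
    # Ordered by severity so that max() returns the worst status.
    OK = 0
    WARNING = 1
    CRITICAL = 2


@dataclass
class Metric:
    component: str  # hypothetical component name, e.g. "hdfs-namenode"
    domain: str     # e.g. "hadoop", "network", "storage"
    name: str
    value: float
    warn: float     # reading at or above this is WARNING (assumed rule)
    crit: float     # reading at or above this is CRITICAL (assumed rule)


def normalize(metric: Metric) -> Status:
    """Map a raw reading onto a common status scale (higher value = worse)."""
    if metric.value >= metric.crit:
        return Status.CRITICAL
    if metric.value >= metric.warn:
        return Status.WARNING
    return Status.OK


def service_status(metrics: list[Metric]) -> tuple[Status, list[str]]:
    """Roll component metrics up into one business-service status and
    list the components responsible for that status."""
    statuses = [(m, normalize(m)) for m in metrics]
    worst = max((s for _, s in statuses), default=Status.OK)
    culprits = [f"{m.domain}/{m.component}:{m.name}"
                for m, s in statuses if s == worst and s != Status.OK]
    return worst, culprits


# Usage: Hadoop itself looks healthy, but a network problem degrades the service.
metrics = [
    Metric("hdfs-namenode", "hadoop", "heap_used_pct", 72.0, 80.0, 90.0),
    Metric("core-switch", "network", "packet_loss_pct", 3.5, 1.0, 5.0),
    Metric("san-array", "storage", "latency_ms", 4.0, 20.0, 50.0),
]
status, culprits = service_status(metrics)
print(status.name, culprits)  # WARNING ['network/core-switch:packet_loss_pct']
```

The example shows why monitoring Hadoop in isolation is not enough: the NameNode reading is within its thresholds, yet the service is degraded by a component in another domain. "Worst status wins" is the simplest possible roll-up; a production tool might instead weight components by their business impact.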
One example of a platform that can present the appropriate business and service levels is Centerity Systems, whose unified IT monitoring approach is best known for its superior Business Service Management (BSM) features. With Centerity, the key metrics of every contributing component can be collected, normalized, and correlated, then presented in customizable Executive Dashboards. These Executive Dashboards deliver business analytics and business value that point solutions, or tools focused on individual technology silos in isolation, cannot achieve.
“There are many IT monitoring tools that give you pieces to the puzzle, but these are really just isolated islands of data with little business value,” Marty Pejko, COO of Centerity Systems, explains. “What is needed is a way to collect all the data in one system that provides [a] systemic view, in effect, a business intelligence layer across an organization’s entire IT environment. It’s Centerity that maximizes an organization’s investment in Hadoop by optimizing its performance and availability via rapid root cause, trend, and impact analysis. This can only be done thoroughly and cost-effectively with a next-generation, unified IT monitoring platform such as Centerity.”
In January 2015, the United States proposed an effort to analyze the DNA of one million volunteers of all ages, genders, and health backgrounds. If this project comes to fruition, it will mark a significant turning point for genomic research, one that President Barack Obama believes will “lay a foundation for a new era of life-saving discoveries.” With advanced tools like Hadoop, MapR, and Centerity, there is no doubt that new personalized treatment plans and cures will arrive sooner than we ever imagined. By giving doctors access to patients’ personal DNA and using big data to its fullest potential, this technological and medical breakthrough will likely revolutionize healthcare and the world as we know it today.
Editor's Note: Download Next Generation Genome Sequencing with MapR to learn more about genome sequencing using Hadoop.