3 Ways That Big Data Are Used to Study Climate Change – Monitoring, Modeling, and Assimilation

In May 2014, the UN kicked off a new initiative on climate change – the Big Data Climate Challenge. This initiative is associated with the UN Secretary-General’s 2014 Climate Summit. One of the primary aims of the challenge is to use big data to make the case for climate change action, specifically “to bring forward data-driven evidence of the economic dimensions of climate change.” You can read more about it at the Big Data Climate Challenge (BDCC) website: http://unglobalpulse.org/big-data-climate/

According to an article at FierceBigData.com, the UN is looking for “any climate-related data projects that show the economic impact of climate change and ways to manage it.” The deadline for BDCC submissions is June 30, 2014. Winners will be flown to the United Nations 2014 Climate Summit. The FierceBigData.com article lists examples of relevant domains that can join in multidisciplinary big data efforts to study and manage climate risks. These domains include: smart cities, natural resource management, agriculture and food systems, complex systems, green data centers, material sciences, disaster risk reduction and resilience, architecture and design, behavioral science, climate finance, and carbon markets.

Here we review the problem from a data science perspective by discussing three ways that big data are already embedded in climate change studies:

(1) Big data in climate first means that we have sensors everywhere: remote sensing satellites looking down from space, and in situ ("on the ground") sensors, which together monitor and measure weather, land use, vegetation, oceans, cloud cover, ice cover, precipitation, drought, water quality, sea surface temperature, and many more geophysical parameters. These data sets are augmented with correlated data sets: biodiversity changes, invasive species, "at risk" species, and more. These comprehensive data collections give us increasingly deep and broad coverage of climate change, both temporally and geospatially. This impressive array of sensors also delivers a vast increase both in the rate at which we measure, monitor, and track climate-related parameters and in the number of parameters themselves. All combined, these attributes of climate data satisfy all of the criteria of big data: high volume, high variety, and high velocity. The combined power of these data sets gives us deeper insights into changes in the biosphere, hydrosphere, cryosphere, and atmosphere, and into what is driving change in all of those Earth systems. Two examples of large Earth system monitoring projects are NEON (the National Ecological Observatory Network) and OOI (the Ocean Observatories Initiative, a project of the Consortium for Ocean Leadership).
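
As a small, concrete illustration of working with one such observational data product, here is a minimal sketch that opens a gridded data file with the Python xarray library. The file name ("sst_monthly.nc") and the variable name ("sst") are hypothetical placeholders for a sea surface temperature product, not references to any specific data set.

```python
# A minimal sketch of exploring one gridded observational data product with
# the xarray library. The file name "sst_monthly.nc" and the variable name
# "sst" are hypothetical placeholders for a sea surface temperature product.
import xarray as xr

ds = xr.open_dataset("sst_monthly.nc")  # lazy open: reads metadata, not the full arrays
print(ds)  # inspect dimensions (e.g., time, lat, lon), variables, and attributes

sst = ds["sst"]

# Quick-look global mean time series (unweighted, for illustration only)
global_mean = sst.mean(dim=["lat", "lon"])

# Time series at a single location, via nearest-neighbor grid lookup
point_series = sst.sel(lat=0.0, lon=-140.0, method="nearest")
```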

(2) Climate modeling is one of the largest examples of scientific modeling and simulation. These simulations are used to predict climate behavior over the next 100 years, and beyond. Huge climate simulations are now run daily (if not more frequently). The simulations have increasingly high horizontal spatial resolution (hundreds of meters, versus the tens of kilometers of older simulations), higher vertical resolution (a larger number of layers modeled in the atmosphere), and higher temporal resolution (minutes or hours, versus the days or weeks of older simulations). Consequently, we now update the climate model daily with the latest input data (from all of the sensors and measurements mentioned earlier) and re-run the full 100-year simulation at very high spatial and temporal resolution. Note that a climate model is not focused on predicting tomorrow's local weather, but on predicting the Earth's "weather" over periods of decades and centuries. The model output data that come out of these supercomputer simulations are enormous ... petabytes of data from each of these daily simulations! These massive simulation outputs require all of the typical big data tools for storage, processing, analysis, visualization, and mining (for discovery). What makes these data unique for climate studies is that they are computer-generated, thus complementing the massive observational data streams that come from the ubiquitous worldwide network of Earth sensors.
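
To give a flavor of what those "big data tools" look like in practice for model output at this scale, here is a minimal sketch of an out-of-core reduction using xarray with dask-backed chunking. The file pattern and the variable name ("tas", a common name for near-surface air temperature in model output) are assumptions for illustration only.

```python
# A minimal sketch of an out-of-core reduction over large model output,
# using xarray with dask-backed chunking. The file pattern and the variable
# name "tas" (a common name for near-surface air temperature) are assumptions.
import numpy as np
import xarray as xr

# Open many output files as one lazy, chunked dataset; nothing is computed yet.
ds = xr.open_mfdataset("model_output/tas_*.nc", chunks={"time": 120})
tas = ds["tas"]

# Grid cells shrink toward the poles, so weight the global mean by cos(latitude).
weights = np.cos(np.deg2rad(tas["lat"]))
global_mean = tas.weighted(weights).mean(dim=["lat", "lon"])

# One value per simulated year; the work runs chunk by chunk, in parallel.
annual_mean = global_mean.groupby("time.year").mean()
annual_mean.load()  # trigger the deferred dask computation
```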

(3) When working with big data, it is easy to find a model that reproduces the correlations that we discover in our data. But we must always remember that correlation does not imply causation, and so we need to apply systematic scientific methodology in order to discover the causes (the "real model") behind the correlations that we discover. Similarly, we should remember and heed the quote from the great statistician George Box: "All models are wrong, but some are useful." This warning is especially critical when working with results from numerical computer simulations, where so many assumptions and "parameterizations of our ignorance" go into the models, especially those that attempt to simulate enormously complex systems, like the Universe, the human brain, or the Earth's climate system. In that sense, the numerical models have a degree of subjectivity. But what fixes that problem (and also addresses George Box's warning) is data assimilation. Data assimilation is the process by which we incorporate the latest and greatest observational data into the current model of a real system, in order to correct, adjust, and validate the choices that we make in our model assumptions and parameterizations. Climate modeling uses data assimilation probably more than any other simulation science domain, and rightly so, since we have such huge volumes of new incoming data to assimilate and since the problem being tackled is so critical for the future of our planet. Consequently, big data play a vital and essential role in climate prediction science, providing corrective actions on the evolving predictive simulations through ongoing data assimilation.
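
To make the assimilation step concrete, here is a minimal sketch in the spirit of a scalar Kalman filter update, which blends a model forecast with a new observation according to their respective uncertainties. Operational climate assimilation systems use far more sophisticated schemes (ensemble Kalman filters, 4D-Var) over enormous state vectors; the numbers below are purely illustrative.

```python
# A minimal sketch of the data assimilation idea: blend a model forecast
# with a new observation, weighting each by its uncertainty. This is a
# scalar Kalman filter update; real systems assimilate millions of state
# variables at once. All numbers here are illustrative only.

def assimilate(forecast, forecast_var, observation, obs_var):
    """Return the corrected state (the "analysis") and its variance."""
    # Kalman gain: how much to trust the observation relative to the forecast.
    gain = forecast_var / (forecast_var + obs_var)
    analysis = forecast + gain * (observation - forecast)
    analysis_var = (1.0 - gain) * forecast_var
    return analysis, analysis_var

# The model forecasts 15.2 C (variance 0.5); a sensor reports 14.8 C
# (variance 0.2). The analysis is pulled toward the more certain observation,
# and its variance is smaller than either input's.
state, var = assimilate(15.2, 0.5, 14.8, 0.2)
print(f"analysis = {state:.2f} C, variance = {var:.3f}")
```

Note how the update both corrects the state and shrinks its uncertainty; repeating this cycle as each new batch of observations arrives is what keeps the evolving simulation anchored to reality.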

Finally, we recognize the work of the Climate Corporation, which is working with MapR on the use of big data, Hadoop, and machine learning to monitor hyper-local weather, predict future weather with high-resolution simulations, improve farming operations, develop agronomic models, assist farmers in adapting to climate change, and protect the world’s agribusiness. If you have an idea for a data-driven response to climate change, or a big data use case that may be useful in climate studies, then act now to participate in the UN’s Big Data Climate Challenge.
