What is the Internet of Things and Why Does it Matter to Big Data?

This is the first in a series of articles on the Internet of Things and its relevance to big data. We will discuss challenges and opportunities, review real-world use cases, and look at how people benefit from applying big data technologies to the data coming from sensors, constrained devices and smartphones.

The Internet of Things (IoT) is the idea of extending the existing Internet infrastructure (such as IP and UDP/TCP) to devices in order to facilitate communication among the devices themselves and/or between the devices and humans. These are typically constrained devices such as sensors, but fat ones such as your smartphone are also part of the game. Alternative, semantically more or less equivalent terms for IoT are Machine-to-Machine (M2M), Web of Things and Programmable Web. IoT covers a variety of protocols, domains, and applications. Follow-up posts in this series will cover protocols, formats, standardisation and main players in greater detail.

While mainstream adoption of IoT apps is just about to start, pretty much everyone is in wild agreement that it’s going to be huge in the foreseeable future: it is predicted that there will be between 40 and 50 billion connected devices by 2020.

IoT Apps Driven by Big Data Solutions

As previously discussed in Sensors: primary source of Big Data, the data from IoT apps will be a major source of big data going forward. But there's also the other side of the coin: IoT apps being driven by big data solutions. So let's have a look at some real-world examples of IoT applications where a big data platform not only comes in handy, but is in fact at the core:

  • In the agriculture domain, sensors are deployed on farm machines, revealing information not only about the equipment, but also about soil temperature, moisture, etc. The goal is not only proactive equipment maintenance, but also to help farmers optimize their yields based on a variety of environmental factors.
  • Every smartphone has a number of sensors (video, GPS, gyro, etc.) and collectively these sensors produce a huge amount of data: some unstructured (videos), some structured (location), driving apps that range from gaming to the quantified self to augmented reality.
  • We find an array of sensors in commercial buildings, and in the future, we’ll likely find more in homes as well. Smart homes can not only react to the user’s needs (think: Nest), but function as the permanent base for all data related to everyday life, including food, health-related topics, education, entertainment, and so on.

We will expand on each of the three use cases mentioned above in later posts of this series.

IoT Naturally Lends Itself to Big Data

In the following section, I will discuss why IoT naturally lends itself to big data by reviewing IoT data characteristics along the three main dimensions of big data: volume, variety, and velocity.

Volume. To develop full-blown commercial IoT applications, one often needs to be able to capture and store all the sensor data. While a single, simple sensor (say, a temperature sensor) might generate only a few megabytes of data per year, the sheer number of sensors quickly puts us into the high petabyte range.
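
To make the volume argument concrete, here is a back-of-the-envelope calculation in Python. The reading size (16 bytes) and the sampling rate (one reading per minute) are illustrative assumptions, not measurements; the device count is the lower bound of the forecast quoted earlier in this post.

```python
# Back-of-the-envelope estimate of raw IoT data volume.
# All sensor parameters below are assumptions chosen for illustration.

READING_BYTES = 16             # assumed: timestamp + device id + one value
READINGS_PER_DAY = 24 * 60     # assumed: one reading per minute
DEVICES = 40_000_000_000       # lower bound of the forecast quoted above


def yearly_bytes(reading_bytes: int, readings_per_day: int, days: int = 365) -> int:
    """Raw bytes produced by a single sensor in one year."""
    return reading_bytes * readings_per_day * days


per_sensor = yearly_bytes(READING_BYTES, READINGS_PER_DAY)
fleet = per_sensor * DEVICES

print(f"one sensor : {per_sensor / 1e6:.1f} MB per year")
print(f"all devices: {fleet / 1e15:.1f} PB per year")
```

Under these assumptions a single sensor stays below 10 MB per year, while the whole fleet lands in the hundreds of petabytes. Real deployments will differ widely in reading size and sampling rate.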

Variety. There are tons of data formats in use, often binary and/or compressed textual formats (mainly due to the limitations of the devices in terms of power, storage, bandwidth, etc.). None of this data is relational per se, and a platform that can accept all of it as-is is clearly better suited than an RDBMS that first requires an ETL step.
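
As a rough illustration of what this variety means for the ingesting side, the sketch below decodes two hypothetical payload styles into one common record shape: a packed binary reading and a zlib-compressed JSON document. The field layout and the field names are assumptions made up for this example; they do not correspond to any particular device or protocol.

```python
import json
import struct
import zlib

# Hypothetical 10-byte binary layout (little-endian, no padding):
# uint16 device id, uint32 Unix timestamp, float32 value.
BINARY_LAYOUT = struct.Struct("<HIf")


def decode_binary(payload: bytes) -> dict:
    """Decode a packed binary reading into a plain dict."""
    device_id, timestamp, value = BINARY_LAYOUT.unpack(payload)
    return {"device_id": device_id, "timestamp": timestamp, "value": value}


def decode_compressed_json(payload: bytes) -> dict:
    """Decode a zlib-compressed JSON reading into a plain dict."""
    return json.loads(zlib.decompress(payload).decode("utf-8"))


# The same reading as two different devices might send it.
reading = {"device_id": 7, "timestamp": 1400000000, "value": 21.5}
binary_payload = BINARY_LAYOUT.pack(7, 1400000000, 21.5)
json_payload = zlib.compress(json.dumps(reading).encode("utf-8"))

print(len(binary_payload), "bytes:", decode_binary(binary_payload))
print(len(json_payload), "bytes:", decode_compressed_json(json_payload))
```

Both decoders return the same dictionary, which is the point: the storage layer accepts whatever arrives, and the format-specific logic lives in small decoders applied when the data is read.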

Velocity. For many, but not all, devices the transmission rate of the data is comparatively high. In addition, data streams are the norm in IoT. Again, this is an area where RDBMSs do not exactly shine, as we have known for a decade now. Last but not least, for many applications it is essential to combine historical data with the new, incoming data from sensors; this is usually realized with the Lambda Architecture, in which a batch layer periodically recomputes views over all historical data, a speed layer covers the data that has arrived since the last batch run, and queries merge the two.
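
The idea of combining historical and incoming data can be sketched in a few lines of Python. This is a single-process toy, not a reference implementation: the class and method names are hypothetical, and the in-memory list and dictionaries stand in for what would be a distributed store and a stream processor in a real deployment. It also assumes that no readings arrive while a batch run is in progress.

```python
from collections import defaultdict


class LambdaSketch:
    """Toy Lambda Architecture: batch view + speed view, merged at query time."""

    def __init__(self):
        self.master = []                                 # append-only log of all readings
        self.batch_view = {}                             # sensor id -> [count, sum], from batch runs
        self.speed_view = defaultdict(lambda: [0, 0.0])  # readings since the last batch run

    def ingest(self, sensor_id, value):
        """Store the raw reading and update the incremental (speed) view."""
        self.master.append((sensor_id, value))
        acc = self.speed_view[sensor_id]
        acc[0] += 1
        acc[1] += value

    def run_batch(self):
        """Recompute the batch view from the full log, then reset the speed view."""
        view = defaultdict(lambda: [0, 0.0])
        for sensor_id, value in self.master:
            view[sensor_id][0] += 1
            view[sensor_id][1] += value
        self.batch_view = dict(view)
        self.speed_view.clear()

    def query_mean(self, sensor_id):
        """Answer a query by merging the batch and speed views."""
        batch_count, batch_sum = self.batch_view.get(sensor_id, (0, 0.0))
        speed_count, speed_sum = self.speed_view.get(sensor_id, (0, 0.0))
        count = batch_count + speed_count
        return (batch_sum + speed_sum) / count if count else None


store = LambdaSketch()
store.ingest("t1", 20.0)
store.ingest("t1", 22.0)
store.run_batch()            # historical data now lives in the batch view
store.ingest("t1", 24.0)     # fresh data is only in the speed view
print(store.query_mean("t1"))
```

The query sees all three readings even though the last one has not yet been through a batch run, which is the property that makes the pattern attractive for sensor data.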

What's up Next?

I hope to see you around for the rest of the series. The next post is coming up soon: The IoT Key Players, which looks at the main players in the field, including companies, formats and protocols, and (emerging) standards.

For further reading on this topic, check out:

Streaming Data Architecture: New Designs Using Apache Kafka and MapR Streams