The Ultimate 3-Minute Guide to Time Series Data and OpenTSDB

What is a time series? A time series is a sequence of data points ordered in time. Time series data comes in many shapes and shows up in many facets of everyday life, such as rainfall measurements, earthquake activity, or stock prices. With the growth of the Internet of Things, the volume of time series data you can collect is staggering, reaching as much as 100 million data points per second.

Here is an everyday example: the temperature reported by your home thermostat, which affects everyone’s daily routine (and your energy bill). The temperature in your home throughout the day can be treated as time series data. For example:

9:00am - 72°F - AC off

9:10am - 73°F - AC on

9:15am - 72°F - AC off

At 9:10 am, the air conditioner turned on automatically, and the temperature had dropped by the next sensor reading. This example tracks two separate values: the temperature and the state of the air conditioner.

OpenTSDB is an open source time series database that can store and serve massive amounts of single-value time series data without losing granularity. Because each series holds a single value, OpenTSDB stores the thermostat example as two separate series: one for the temperature and one for the state of the air conditioner.

[Image: time series data table]

The on/off value would actually have to be stored as 1 or 0, because OpenTSDB stores only numeric values (integer or floating point).
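To make the split concrete, here is a minimal Python sketch that takes the three thermostat readings from the example and separates them into two single-value series, encoding the AC state as 1 or 0. The function and variable names are made up for illustration and are not part of OpenTSDB.

```python
# Readings from the thermostat example: (time, temperature in °F, AC state).
readings = [
    ("9:00am", 72, "off"),
    ("9:10am", 73, "on"),
    ("9:15am", 72, "off"),
]


def split_series(rows):
    """Split combined readings into two single-value series."""
    temperature = [(t, temp) for t, temp, _ in rows]
    # Only numeric values can be stored, so encode on/off as 1/0.
    ac_status = [(t, 1 if state == "on" else 0) for t, _, state in rows]
    return temperature, ac_status


temperature, ac_status = split_series(readings)
print(temperature)
print(ac_status)
```

Each resulting list holds exactly one value per timestamp, which is the shape a single-value time series store expects.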

It may seem counterintuitive that OpenTSDB stores these values separately, so consider the use case. The two measures are not necessarily taken at the same moment, so their timestamps can differ slightly. Imagine a line chart in which the temperature is the line and the AC status changes appear as flags along it. Another reason for sampling them separately is that their sampling periods are not tied together: the temperature would likely be sampled much more frequently than the AC status.
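The chart described above can be approximated in a few lines of Python. This is only a sketch under assumptions: timestamps are minutes after 9:00am, the extra 9:05 temperature reading is invented to show a higher sampling rate, and `latest_value_at` is a hypothetical helper rather than anything OpenTSDB provides.

```python
from bisect import bisect_right

# Minutes after 9:00am. Temperature is sampled more often than AC status;
# the reading at minute 5 is illustrative, not from the original example.
temperature = [(0, 72), (5, 72), (10, 73), (15, 72)]
ac_status = [(0, 0), (10, 1), (15, 0)]


def latest_value_at(series, ts):
    """Return the most recent value in a time-sorted series at or before ts."""
    times = [t for t, _ in series]
    i = bisect_right(times, ts)
    return series[i - 1][1] if i else None


# Place each AC status flag on the temperature line at the matching height.
markers = [(ts, state, latest_value_at(temperature, ts)) for ts, state in ac_status]
print(markers)
```

Because the two series are independent, neither needs padding or placeholder values when the other has no sample at a given time.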

OpenTSDB is very easy to use. It has an HTTP (REST-style) API, and it can also read data points directly from a socket, one per line, in a format like ‘put metric timestamp value tagk=valk tagj=valj’.
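As a rough sketch of both write paths, the Python below formats one data point as a telnet-style `put` line and as a JSON body of the kind accepted by the HTTP API's `/api/put` endpoint. The metric name `home.temperature`, the tag values, and the helper names are assumptions for illustration, and nothing is actually sent over the network.

```python
import json


def put_line(metric, timestamp, value, tags):
    """Format one data point as a telnet-style `put` line."""
    tag_str = " ".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"put {metric} {timestamp} {value} {tag_str}"


def put_json(metric, timestamp, value, tags):
    """Build a JSON body for one data point, as used with /api/put."""
    body = {"metric": metric, "timestamp": timestamp, "value": value, "tags": tags}
    return json.dumps(body, sort_keys=True)


# Hypothetical metric and tags; the timestamp is Unix epoch seconds.
tags = {"room": "garage", "floor": "1"}
line = put_line("home.temperature", 1356998400, 72, tags)
body = put_json("home.temperature", 1356998400, 72, tags)
print(line)
print(body)
```

To write the point for real, the line would go to the socket of a running OpenTSDB daemon followed by a newline, or the JSON would be sent in an HTTP POST. Host and port depend on your deployment.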

Tags are descriptive key-value details that differentiate items sharing a common metric. Let’s build on our example of the temperature in your home. Imagine that you have a temperature sensor in every room, all reporting back to the thermostat. You could label each room with a number or a name (room=1, room=garage, etc.). You could also label each sensor with the floor it is on (floor=basement, floor=1, floor=2, etc.). This allows you to query and aggregate time series data by room or by floor. For example, you may want to know the average temperature per floor over a period of time. You might suspect that the second floor of your home is always warmer in the summer, which is likely because heat rises. The information you glean may help you optimize your home’s energy usage, for example by closing vents in lower-level rooms, calling in an HVAC specialist, or changing the times of day you run your air conditioning or heat.
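The per-floor average can be sketched in plain Python. The sample points, room names, and the `average_by_tag` helper are hypothetical. OpenTSDB performs this kind of grouping on the server when you query, so the code only illustrates the idea of aggregating by a tag.

```python
from collections import defaultdict

# Hypothetical tagged readings: (timestamp in seconds, temperature, tags).
points = [
    (0, 70.0, {"room": "garage", "floor": "1"}),
    (0, 72.0, {"room": "kitchen", "floor": "1"}),
    (0, 75.0, {"room": "bedroom", "floor": "2"}),
    (600, 77.0, {"room": "bedroom", "floor": "2"}),
]


def average_by_tag(points, tag_key):
    """Average the values of all points, grouped by the value of one tag."""
    totals = defaultdict(lambda: [0.0, 0])  # tag value -> [sum, count]
    for _, value, tags in points:
        if tag_key in tags:
            totals[tags[tag_key]][0] += value
            totals[tags[tag_key]][1] += 1
    return {tag: total / count for tag, (total, count) in totals.items()}


avg_per_floor = average_by_tag(points, "floor")
print(avg_per_floor)
```

Grouping by `room` instead of `floor` needs no change to the stored data, which is the main benefit of tagging each point when it is written.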

If this type of analysis is interesting to you, you may be interested in learning about Operational Analytics. Check back soon for a blog post on that subject. 
