Large Air Quality Data Set Available for Download

During my PhD, we built ten sensor nodes to measure different air pollutants and deployed them in the city of Zurich (Switzerland) on top of public transport vehicles. With this installation we collected a unique pollution data set comprising by far the largest number of measurements at that time.

Balz Maag, a co-researcher, made part of this unique data set publicly available:

Zenodo is a strong supporter of open data in all its forms (meaning data that anyone is free to use, reuse, and redistribute) and takes an incentives approach to encourage depositing under an open license.

The data set contains 11 million samples, covering one year (April 2012 to April 2013) of ultra-fine particle (UFP) concentration measurements. The data was collected by a mobile sensor network whose sensors were mounted on top of 10 streetcars in the city of Zurich, Switzerland. The data has been post-processed by performing a periodic null-offset calibration and by filtering out samples recorded during malfunction.

A small excerpt of the data set:

2012.04.19 14:12,47.373288,8.522049,1.1,5,6400,48.1,16.0
2012.04.19 14:12,47.373272,8.522053,1.1,5,6545,47.4,16.1
2012.04.19 14:12,47.373253,8.522068,1.1,5,6656,47.2,16.3
2012.04.19 14:12,47.373244,8.522065,1.1,5,6731,47.1,16.4
2012.04.19 14:12,47.373233,8.522051,1.1,5,6451,47.9,16.0
2012.04.19 14:12,47.373233,8.522044,1.1,5,6400,48.1,16.0
2012.04.19 14:12,47.373237,8.522035,1.1,5,6178,48.7,15.7
2012.04.19 14:12,47.373248,8.522030,1.1,5,6378,47.2,15.6
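Rows like these can be read with a few lines of Python. The following is a minimal sketch, not the official loader for the data set. It assumes that the first three fields are a timestamp, a latitude, and a longitude (the coordinates match Zurich); the meaning of the remaining columns is not documented in this post, so they are kept as an unnamed list of numbers.

```python
import csv
from datetime import datetime
from io import StringIO

# Two rows copied from the excerpt above.
SAMPLE = """\
2012.04.19 14:12,47.373288,8.522049,1.1,5,6400,48.1,16.0
2012.04.19 14:12,47.373272,8.522053,1.1,5,6545,47.4,16.1
"""


def parse_rows(text):
    """Yield (timestamp, latitude, longitude, remaining_fields) per row.

    Assumption: columns 1-3 are timestamp, latitude, longitude.
    The remaining columns are passed through without interpretation.
    """
    for row in csv.reader(StringIO(text)):
        if not row:
            continue  # skip blank lines
        timestamp = datetime.strptime(row[0], "%Y.%m.%d %H:%M")
        latitude, longitude = float(row[1]), float(row[2])
        rest = [float(value) for value in row[3:]]
        yield timestamp, latitude, longitude, rest


rows = list(parse_rows(SAMPLE))
for timestamp, latitude, longitude, rest in rows:
    print(timestamp.isoformat(), latitude, longitude, rest)
```

For the full data set, the same function could be fed from an open file instead of a string; the publications listed below describe what each column actually contains.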

The data set has been used and is described in more detail in the following publications:

  • David Hasenfratz et al. Pushing the Spatio-Temporal Resolution Limit of Urban Air Pollution Maps. IEEE International Conference on Pervasive Computing and Communications (PerCom). Budapest, Hungary, March 2014. Best Paper Award.
  • David Hasenfratz et al. Deriving High-Resolution Urban Air Pollution Maps Using Mobile Sensor Nodes. Pervasive and Mobile Computing. Elsevier, 2015.
  • David Hasenfratz et al. Demo Abstract: Health-Optimal Routing in Urban Areas. ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN). Seattle, USA, April 2015.
  • Michael Müller et al. Statistical modelling of particle number concentration in Zurich at high spatio-temporal resolution utilizing data from a mobile sensor network. Atmospheric Environment. Elsevier, 2016.

Visualizing MQTT Data

The interest in crowd-sourced applications is steadily increasing. Drastic events like the Fukushima Daiichi nuclear disaster in March 2011 showed the power and value of public initiatives, in this case building Geiger counters to accurately measure radiation. This was particularly valuable in a situation where the general public did not have high confidence in the numbers reported by the government.

The availability of low-cost air quality sensors has driven another application in recent years: do-it-yourself air quality stations. The OK (Open Knowledge) Lab Stuttgart provides a very nice how-to guide to building and operating an air quality station with integrated temperature, humidity, and particulate matter sensors. The data can be shared with their platform using the API provided. Based on the data of thousands of distributed stations, a particulate matter (PM10) pollution map is published:

Components of the air quality station (picture from

Since I am interested in air quality, it was obvious to build a station for myself. The setup with a NodeMCU ESP8266 for data processing and communication (WiFi), an SDS011 fine dust sensor, and a DHT22 temperature and humidity sensor is easy and straightforward. While they also provide a firmware to get started quickly, I was interested in sending the sensor data over MQTT, which is not supported by the original firmware. MQTT is a lightweight messaging protocol for sensor data streams from small sensors and mobile devices, optimized for high-latency or unreliable networks. Hence, I stripped down their full-fledged NodeMCU firmware and extended it with the ability to send all sensor data to an MQTT broker. The code can be found here:

A deployed air quality station.

Much to my surprise, it was not easy to find a simple online platform to visualize sensor data sent over MQTT. First I tried Amazon Web Services (AWS), one of the dominant cloud platform providers. While AWS provides many great services, their setup for this specific scenario is difficult. It requires an additional MQTT broker sitting between the NodeMCU ESP8266 sensor node and AWS’ own broker, and it involves multiple different services, as shown in the high-level architecture diagram below.

High-level architecture overview of the involved services.

Furthermore, Amazon QuickSight only provides rudimentary data visualization capabilities in its current state. Showing real-time plots of the collected sensor data is not yet possible.

I found that this experience was not the exception but rather the rule. It was difficult to find a simple, easy-to-use cloud service for visualizing sensor data sent over MQTT. After having searched for a while, I came across ThingSpeak:

ThingSpeak is an open source “Internet of Things” application and API to store and retrieve data from things using HTTP over the Internet or via a Local Area Network. With ThingSpeak, you can create sensor logging applications, location tracking applications, and a social network of things with status updates.

ThingSpeak has integrated support from the numerical computing software MATLAB from MathWorks, allowing ThingSpeak users to analyze and visualize uploaded data using MATLAB without purchasing a MATLAB license from MathWorks. ThingSpeak provides several paid license options but is also available as a free service for small non-commercial home projects (~8200 messages per day) with limits on capacity and update rates. These limitations are not a problem for an air quality station.
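To see why the free tier is sufficient, the message budget can be converted into a minimum send interval. This is a back-of-the-envelope sketch that assumes the roughly 8200 messages are spread evenly over the day; the exact limits enforced by the service may differ.

```python
SECONDS_PER_DAY = 24 * 60 * 60
MESSAGES_PER_DAY = 8200  # approximate free-tier figure quoted above

# Shortest interval between messages that stays within the daily budget,
# assuming messages are sent at a constant rate.
min_interval_s = SECONDS_PER_DAY / MESSAGES_PER_DAY
print(f"One message every {min_interval_s:.1f} s at most")
```

This works out to roughly one message every ten and a half seconds, which is far more frequent than a stationary air quality station needs to report.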

ThingSpeak’s MQTT broker is reachable under … On ThingSpeak’s platform, several channels can be created; each channel can process and visualize up to eight sensor data types. A channel is fed with data by publishing to the topic channels/CHANNEL-ID/publish/API-KEY, where CHANNEL-ID is the channel’s unique ID and API-KEY is the channel-specific write API key. A detailed description with example code for an Arduino client can be found here: Publish to a Channel Using Arduino Client.
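The topic scheme can be illustrated with a short Python sketch that only builds the strings a client would publish. The helper names, the channel ID, and the key below are hypothetical placeholders. The payload format `field1=...&field2=...` is an assumption not stated in this post, so check it against ThingSpeak's documentation before relying on it. Actually sending the message requires an MQTT client library, which is left out here.

```python
MAX_FIELDS = 8  # a channel handles up to eight sensor data types


def build_publish_topic(channel_id, write_api_key):
    """Return the publish topic for a channel, following the scheme above."""
    return f"channels/{channel_id}/publish/{write_api_key}"


def build_payload(values):
    """Encode sensor values as 'field1=..&field2=..' (assumed format)."""
    if not 1 <= len(values) <= MAX_FIELDS:
        raise ValueError(f"expected 1 to {MAX_FIELDS} values, got {len(values)}")
    return "&".join(
        f"field{index}={value}" for index, value in enumerate(values, start=1)
    )


# Placeholder credentials and made-up readings, for illustration only.
topic = build_publish_topic("123456", "WRITEKEY")
payload = build_payload([12.3, 7.8, 21.5, 48.0])
print(topic)
print(payload)
```

Keeping the string construction separate from the network code makes it easy to check the topic and the field count before anything is sent to the broker.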

A public dashboard can also be defined for every channel. This is shown below for my own channel, which visualizes the air quality station’s sensor data:

Public dashboard of the air quality sensor’s public channel.