This project analyzes PM2.5 data from air quality sensors in Taiwan, establishes a method for screening outliers to improve data quality, and explores the factors that cause variation or bias in the data. To interpret the sensor data, the project compares them with the PM2.5 moving average (0.5 × average of the previous 12 hours + 0.5 × average of the previous 4 hours) of the Taiwan Environmental Protection Administration (TWEPA) standard stations, hereafter referred to as the "real-time AQI". The main results include (1) establishing an outlier screening program, (2) identifying the factors that affect sensor bias, (3) characterizing the difference between the sensor's minute values and the moving average of air quality at the TWEPA standard stations, and (4) establishing the correlation among the sensor data, the factors affecting the sensors, and the TWEPA standard station data.
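The moving-average formula quoted above can be written as a short Python sketch. The function name `realtime_pm25` and the input layout (a list of hourly values, oldest first) are illustrative assumptions. The source does not say how missing hours are handled, so this version simply requires 12 valid values.

```python
from statistics import mean


def realtime_pm25(hourly_values):
    """Weighted PM2.5 moving average: 0.5 * mean(last 12 h) + 0.5 * mean(last 4 h).

    hourly_values: hourly PM2.5 concentrations, oldest first (assumed layout).
    Missing-data handling is not described in the source, so 12 valid
    values are required here.
    """
    if len(hourly_values) < 12:
        raise ValueError("need at least 12 hourly values")
    avg_12h = mean(hourly_values[-12:])  # average of the previous 12 hours
    avg_4h = mean(hourly_values[-4:])    # average of the previous 4 hours
    return 0.5 * avg_12h + 0.5 * avg_4h


# Illustrative input only: eight hours at 10 followed by four hours at 20.
example = realtime_pm25([10.0] * 8 + [20.0] * 4)
```

Because the last 4 hours count in both terms, recent changes move the result faster than a plain 12-hour mean would.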
This project obtained air quality sensor data from the Civil IoT Taiwan Data Service Platform - EPA air quality micro station. Taking Taichung City and Kaohsiung City as demonstration areas, outliers with obvious observation errors were identified and eliminated through temporal and spatial cluster analysis. Furthermore, 17, 2, and 3 sensors close to the Zhongming, Qianjin, and Qiaotou stations, respectively, were selected. The correlations among the sensors in each group were 0.97~0.99, 0.78~0.93, and 0.99, respectively. The correlations between the sensors and the Zhongming, Qianjin, and Qiaotou standard stations were 0.81, 0.70, and 0.68, respectively. Exploring the cause of the bias with a Self-Organizing Map (SOM) showed that relative humidity, temperature, and wind speed are the main factors affecting sensor bias.
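The abstract does not describe the cluster analysis in detail, so the sketch below is not the project's method. It swaps in a much simpler spatial check as a stand-in: within one group of neighboring sensors at the same timestamp, a reading is flagged when its modified z-score (based on the median and the median absolute deviation) exceeds a threshold. The function name, the threshold of 3.5, and the 1 µg/m³ floor on the scale are all assumptions chosen for illustration.

```python
from statistics import median


def flag_spatial_outliers(readings, threshold=3.5, min_scale=1.0):
    """Flag sensors whose reading deviates strongly from their neighbors.

    readings: dict of sensor id -> PM2.5 value, all taken at the same
              timestamp within one neighborhood (assumed layout).
    threshold, min_scale: illustrative defaults, not values from the project.
    """
    values = list(readings.values())
    center = median(values)
    # Median absolute deviation, floored so identical neighbors do not
    # produce a zero scale.
    mad = median([abs(v - center) for v in values])
    scale = max(mad, min_scale)
    flagged = set()
    for sensor_id, value in readings.items():
        modified_z = 0.6745 * (value - center) / scale
        if abs(modified_z) > threshold:
            flagged.add(sensor_id)
    return flagged


# Made-up readings for illustration; sensor "e" is far from its neighbors.
suspects = flag_spatial_outliers({"a": 20, "b": 22, "c": 21, "d": 19, "e": 95})
```

A temporal check could reuse the same function by passing one sensor's recent readings keyed by timestamp instead of neighboring sensors keyed by id.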
To clarify the meaning of the sensor data, this project compares the consistency between sensor PM2.5 data at three time scales (minute values, hourly averages, and moving averages) and the standard station "real-time AQI". The sensor's moving average agrees best and the minute value agrees worst. When air quality is good, the sensor tends to overestimate; when air quality is poor, the sensor tends to underestimate. To help interpret the uncertainty of the sensor's minute-value observations, this project integrates these analyses into a Bayesian network that relates, in an interactive way and under different weather conditions, the sensor PM2.5 data to the TWEPA standard station data. The Bayesian network allows the public to understand the meaning behind the current sensor data.
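As a rough illustration of the idea behind the Bayesian network, the sketch below estimates a single conditional probability table: the probability of each station PM2.5 level given the sensor level and one weather variable. It is only one node of a network, not the project's model. The variable names, the discretized levels, and the example records are all hypothetical.

```python
from collections import Counter, defaultdict


def fit_cpt(records):
    """Estimate P(station_level | sensor_level, humidity_level) from counts.

    records: iterable of (sensor_level, humidity_level, station_level)
             tuples with already-discretized values (assumed layout).
    Returns a dict: (sensor_level, humidity_level) -> {station_level: prob}.
    """
    counts = defaultdict(Counter)
    for sensor_level, humidity_level, station_level in records:
        counts[(sensor_level, humidity_level)][station_level] += 1

    cpt = {}
    for parents, counter in counts.items():
        total = sum(counter.values())
        cpt[parents] = {level: n / total for level, n in counter.items()}
    return cpt


# Hypothetical records for illustration only, not project data.
toy_records = [
    ("low", "humid", "low"),
    ("low", "humid", "low"),
    ("low", "humid", "moderate"),
    ("high", "dry", "high"),
]
toy_cpt = fit_cpt(toy_records)
# toy_cpt[("low", "humid")] gives the distribution over station levels
# when the sensor reads "low" under humid conditions.
```

Looking up the current sensor level and weather condition in such a table returns a distribution over station levels instead of a single corrected value, which is what lets a reader see how uncertain a minute reading is.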