SMART CITY PM2.5 AIR POLLUTION MODELING TECHNIQUES: TRAIN-TEST DATA SPLIT VERSUS K-FOLD CROSS VALIDATION TECHNIQUES
Abstract
Because of the substantial risks it poses to human health and the environment, air pollution is a major issue for urban residents and city managers. Air pollution has contributed to environmental deterioration, respiratory ailments, cardiac health problems, and other challenges, particularly in densely populated cities and metropolises. A variety of research techniques have been used in the literature to assess the concentrations of air pollutants in the ambient environment. One such strategy uses statistical, data-driven methods such as machine learning modeling and prediction tools. Compared with the so-called chemical models approach, which is complex and time-consuming, data-driven approaches are simpler and more cost-effective for estimating the dispersion levels of air pollutants within a given place. Machine learning and deep learning, which are subfields of artificial intelligence (AI), can be trained on historical datasets to detect patterns that can then be used to predict or forecast future occurrences of air pollution in a particular location or city. In this paper, two different air pollution modeling and simulation techniques (the Train-Test Data Split and K-Fold Cross-Validation methods) were used to model and predict particulate matter (PM2.5) emission in Awka Metropolis. Historical datasets comprising air pollutant and meteorological data from 2008 to 2013 for the city of Awka, Anambra State, Nigeria, were used to carry out PM2.5 emission modeling with eight machine learning algorithms: Multiple Linear Regression (MLR), Decision Trees, Multi-Layer Perceptron Artificial Neural Network (MLP-ANN), Support Vector Regressor (SVR), Random Forest, AdaBoost, Extreme Gradient Boosting (XGBoost), and Extra Trees.
Performance metrics, namely the coefficient of determination (R2), Mean Absolute Error (MAE), and Root Mean Square Error (RMSE), were used to evaluate the modeling and prediction performance of the machine learning algorithms on the training and testing datasets. The experimental results show that the MLR, MLP-ANN, Decision Tree, Random Forest (RF), AdaBoost, XGBoost, and Extra Trees models obtained R2 scores of 0.9856 versus 0.9802; 0.9815 versus 0.8825; 0.9782 versus 0.9742; 0.9886 versus 0.9722; 0.9854 versus 0.9503; 0.9870 versus 0.9696; and 0.9886 versus 0.9716 for the Train-Test Data Split method and the 10-Fold Cross-Validation method, respectively. These results show some level of similarity between the two modeling methods in terms of prediction accuracy and prediction error. The prediction results obtained with the Train-Test Data Split method are thus validated by those obtained with the K-Fold Cross-Validation approach, since the two approaches show close correlation in prediction accuracy and residual errors of prediction. Both techniques are therefore adequate and suitable for modeling and estimating PM2.5 pollution levels within Awka Metropolis.
KEYWORDS: PM2.5, air pollution, particulate matter, data-driven models, chemical models, Train-Test Data Split, K-Fold Cross-Validation, meteorological dataset, machine learning algorithms, modeling
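The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' code and does not use the Awka dataset: the data are synthetic, the model is a single-predictor least-squares line standing in for MLR, and every function name, the 80/20 split ratio, and the random seeds are assumptions made for illustration. It scores the same model with R2, MAE, and RMSE, once on a single held-out split and once averaged over 10 folds.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with one predictor."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def metrics(y_true, y_pred):
    """Return R2, MAE and RMSE for paired observations and predictions."""
    n = len(y_true)
    mean = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return {"r2": 1 - ss_res / ss_tot, "mae": mae, "rmse": math.sqrt(ss_res / n)}


def evaluate(train_idx, test_idx, xs, ys):
    """Fit on the training rows, score on the held-out rows."""
    a, b = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
    preds = [a + b * xs[i] for i in test_idx]
    return metrics([ys[i] for i in test_idx], preds)


def train_test_scores(xs, ys, test_fraction=0.2, seed=0):
    """Single shuffled split: one held-out test set (ratio is an assumption)."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(idx) * test_fraction)
    return evaluate(idx[n_test:], idx[:n_test], xs, ys)


def k_fold_scores(xs, ys, k=10, seed=0):
    """K-fold cross-validation: each fold is the test set once; scores are averaged."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    per_fold = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f_i, f in enumerate(folds) if f_i != i for j in f]
        per_fold.append(evaluate(train_idx, test_idx, xs, ys))
    return {name: sum(m[name] for m in per_fold) / k for name in per_fold[0]}


# Synthetic stand-in data (NOT the Awka measurements): a linear signal plus noise.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [5.0 + 3.0 * x + rng.gauss(0, 1.0) for x in xs]

split = train_test_scores(xs, ys)
cv = k_fold_scores(xs, ys, k=10)
for name in ("r2", "mae", "rmse"):
    print(f"{name}: split={split[name]:.4f}  10-fold={cv[name]:.4f}")
```

A small gap between the two columns suggests that the single-split score does not depend on one fortunate partition of the data, which is the kind of agreement the abstract reports for most models. In practice a library such as scikit-learn, with `train_test_split`, `KFold`, and `cross_val_score`, would normally replace the hand-written helpers above.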
Copyright (c) 2023 JOURNAL OF INVENTIVE ENGINEERING AND TECHNOLOGY (JIET)
Copyright 2020-2024. Journal of Inventive Engineering (JIET). All rights reserved. Nigerian Society of Engineers (NSE), Awka Branch. ISSN: 2705-3865