Air Quality in Istanbul During Pandemic

Description

After a broad search for usable data on a social issue, we concluded that low air quality is an underestimated problem. Air pollution, a result of the urbanization brought about by modern life, has local, regional, and global impacts, and it threatens public health in the long term even when people do not notice its effects in daily life. Low air quality can cause serious health problems, including several lung diseases. Our quality of life therefore depends in part on the amount of specific pollutants in the air we breathe, and air quality deserves attention all over the world. That is why we want to analyze the amount of pollutants in İstanbul’s air. We narrowed the project further because we have been living under pandemic conditions for over a year, and Covid-19 has affected almost every aspect of our lives. For this reason, we decided to analyze air quality during the pandemic and compare it with the preceding year, before Covid-19.

Air Quality Index (AQI) and EPA

To address air pollution and plan responses, scientific communities and the relevant authorities monitor and analyze atmospheric pollutant concentrations. Beyond protecting and improving air quality, the authorities are also responsible for giving the public up-to-date information on air pollution, since it directly affects public health. However, while a scientist can interpret measurements of individual pollutants, this is very difficult for the general public and for local authorities. For this reason, air quality is reported to the public through a classification system that is easy to understand.
The Environmental Protection Agency (EPA) is an independent executive agency of the United States federal government responsible for environmental protection. The Air Quality Index (AQI) is a color-coded index developed by the EPA for reporting and forecasting daily air quality. The AQI covers the most common ambient air pollutants, including particle pollution (PM10 and PM2.5). It tells us how clean or polluted the air in our region is and what health effects may occur within a few hours or days of breathing polluted air.
It uses a normalized scale from 0 to 500: the higher the AQI value, the greater the level of pollution and the greater the health concern. An AQI value of 100 generally corresponds to the level of the daily National Ambient Air Quality Standard for the pollutant, and values at or below 100 are generally considered satisfactory. With this classification system, which is widely used around the world, air quality is graded as “Good”, “Moderate”, “Unhealthy for sensitive groups”, “Unhealthy”, “Very unhealthy”, and “Hazardous” according to the concentrations of pollutants in the air. Many countries base the methods and criteria for calculating the index on their own national air quality standards.
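
The grading on the 0 to 500 scale can be sketched in R. This is a minimal illustration and not part of the project code: the function name aqi_category is hypothetical, and the breakpoints (50, 100, 150, 200, 300, 500) are the standard EPA category boundaries.

```r
# Hypothetical helper: map AQI values (0-500) to the EPA category names.
aqi_category <- function(aqi) {
  cut(
    aqi,
    breaks = c(0, 50, 100, 150, 200, 300, 500),
    labels = c("Good", "Moderate", "Unhealthy for sensitive groups",
               "Unhealthy", "Very unhealthy", "Hazardous"),
    include.lowest = TRUE  # so that an AQI of exactly 0 counts as "Good"
  )
}

categories <- as.character(aqi_category(c(42, 100, 151, 301)))
print(categories)
```

Values outside the 0 to 500 range come back as NA, which makes out-of-scale inputs easy to spot.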

What Do We Aim For?

Our aim in this project, called “Air Quality in Istanbul during Pandemic”, is to obtain station data in which pollutants relevant to the Air Quality Index are measured, analyze it, and draw graphs to see whether air quality changed during the pandemic. The results appear in the sections below.

Actions Taken

Web Scraping

To decide which districts to analyze, we wanted to work from population density, so we looked for a dataset with the population density of İstanbul’s districts. When we could not find a suitable one, we scraped two websites and built two data frames: one with the population of every district and one with the area of every district. We then merged the two data frames on district name and computed the density. Lastly, we sorted the data frame, found the districts with the highest and the lowest population density, and chose two from each end for analysis.

library(readxl)
library(rvest)
library(dplyr)

# Get the population of each Istanbul district
url <- "https://www.nufusu.com/il/istanbul-nufusu"
webpage <- read_html(url)
tables <- webpage %>%
  html_nodes("table")
# The fourth table on the page holds the district populations
PopulationNumber <- html_table(tables[4])

PopulationNumber <- as.data.frame(PopulationNumber)
# Columns: year, district, district population, male population, female population, population share
colnames(PopulationNumber) <- c("Yil", "Ilce", "Ilce_Nufusu", "Erkek_Nufusu", "Kadin_Nufusu", "Nufus_Yuzdesi")
PopulationNumber <- PopulationNumber %>% select(Ilce, Ilce_Nufusu)

colnames(PopulationNumber) <- c("Districts","Population")
# Remove the "." thousands separators before converting to numeric
PopulationNumber$Population <- gsub(".", "", PopulationNumber$Population, fixed = TRUE)
# Rename Eyüpsultan to Eyüp so the district names line up when joining
PopulationNumber$Districts <- gsub("Eyüpsultan", "Eyüp", PopulationNumber$Districts, fixed = TRUE)
PopulationNumber$Population <- as.double(PopulationNumber$Population)
# Sort by population, largest first
PopulationNumber <- PopulationNumber[order(-PopulationNumber$Population),]

tail(PopulationNumber)

    Districts Population
36   Beşiktaş     176513
37    Çatalca      74975
38       Şile      37904
29 Bayrampaşa      26995
32     Beykoz      24611
39     Adalar      16033

# Get the area of each Istanbul district
url2 <- "https://www.atlasbig.com/tr/istanbulnun-ilceleri"
webpage <- read_html(url2)
DistrictArea <- webpage %>%
  html_nodes("table") %>%
  html_table()

DistrictArea <- as.data.frame(DistrictArea)
colnames(DistrictArea) <- c("Districts","Population", "DistrictArea")
DistrictArea <- DistrictArea %>% select(Districts, DistrictArea)
DistrictArea$DistrictArea <- gsub("\\.", "", DistrictArea$DistrictArea)
DistrictArea$DistrictArea <- gsub("\\,", ".", DistrictArea$DistrictArea)
DistrictArea$DistrictArea <- as.double(DistrictArea$DistrictArea)

tail(DistrictArea)

   Districts DistrictArea
34  Beşiktaş       17.992
35   Silivri      870.009
36   Çatalca     1136.737
37      Şile      782.227
38    Adalar       11.306
39   Avcılar       52.174
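
The number clean-up used in both scraping blocks can be sketched as one helper. The name parse_tr_number is hypothetical, and the raw string format (a dot as thousands separator, a comma as decimal mark) is an assumption based on the gsub calls above. The second part shows a pitfall: if a value such as "269.950" is converted to a decimal before the dot is stripped, the trailing zero is lost. The populations of 26995 for Bayrampaşa and 24611 for Beykoz shown earlier would be consistent with this effect, but that is an assumption; the scraped page was not re-checked.

```r
# Hypothetical helper: convert strings such as "1.136,737" to numeric.
parse_tr_number <- function(x) {
  x <- gsub(".", "", x, fixed = TRUE)   # drop thousands separators
  x <- gsub(",", ".", x, fixed = TRUE)  # decimal comma -> decimal point
  as.numeric(x)
}

areas <- parse_tr_number(c("17,992", "1.136,737"))

# Pitfall: the same text parsed as a decimal first, then stripped of its dot
from_text    <- parse_tr_number("269.950")
from_decimal <- as.numeric(gsub(".", "", as.character(269.950), fixed = TRUE))
print(c(from_text, from_decimal))
```

Keeping the scraped cells as text until this step avoids the second case.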

Merge the area and population data of Istanbul’s districts on the “Districts” column into a new data frame, Population_Area_Table, and compute the population density

Population_Area_Table <- left_join(PopulationNumber, DistrictArea, by="Districts")
Population_Area_Table$Population <- as.double(Population_Area_Table$Population)
Population_Area_Table$DistrictArea <- as.double(Population_Area_Table$DistrictArea)
Population_Area_Table$PopulationDensity <- (Population_Area_Table$Population)/(Population_Area_Table$DistrictArea)

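Population_Area_Table Ordered by Descending Population Density

With the table in descending order, tail() shows the least densely populated districts: Şile has the lowest population density among Istanbul’s districts.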

Population_Area_Table <- Population_Area_Table[order(-Population_Area_Table$PopulationDensity),]
tail(Population_Area_Table)

    Districts Population DistrictArea PopulationDensity
39     Adalar      16033       11.306        1418.09659
24 Arnavutköy     296709      478.518         620.05818
33    Silivri     200215      870.009         230.12980
38     Beykoz      24611      311.755          78.94340
35    Çatalca      74975     1136.737          65.95633
36       Şile      37904      782.227          48.45652

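Population_Area_Table Ordered by Ascending Population Density

With the table in ascending order, tail() shows the most densely populated districts: Gaziosmanpaşa has the highest population density among Istanbul’s districts.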

Population_Area_Table <- Population_Area_Table[order(Population_Area_Table$PopulationDensity),]
tail(Population_Area_Table)

       Districts Population DistrictArea PopulationDensity
31       Beyoğlu     226396        8.969          25242.06
16     Kağıthane     442415       15.601          28358.12
3       Bağcılar     737206       22.496          32770.54
6   Bahçelievler     592371       16.537          35820.95
26      Güngören     280299        7.305          38370.84
10 Gaziosmanpaşa     487778       11.635          41923.33
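
Sorting the table twice and reading tail() each time can be replaced by a more direct selection. The sketch below is an alternative, not the code used in the project; it uses four rows copied from the tables above and dplyr's slice_max and slice_min.

```r
library(dplyr)

# Values copied from the tables above
districts <- data.frame(
  Districts    = c("Gaziosmanpaşa", "Güngören", "Çatalca", "Şile"),
  Population   = c(487778, 280299, 74975, 37904),
  DistrictArea = c(11.635, 7.305, 1136.737, 782.227)
)

districts <- districts %>%
  mutate(PopulationDensity = Population / DistrictArea)

# Two densest and two least dense districts, best match first
most_dense  <- slice_max(districts, PopulationDensity, n = 2)
least_dense <- slice_min(districts, PopulationDensity, n = 2)

print(most_dense)
print(least_dense)
```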

Dataset Info

The dataset for the project was taken from this website (https://sim.csb.gov.tr/STN/STN_Report/StationDataDownloadNew). The site provides daily and hourly values. We chose four pollutants to observe and different date ranges for different cases, focusing mostly on the pandemic period. We chose Istanbul for the air quality measurements because it is the most populous city in Turkey and the city of our university.

We could not obtain enough data for Gaziosmanpaşa, so we looked at other options such as Güngören and Bahçelievler, but data for these districts was not available either. We were able to obtain data for Bağcılar, so we continued with Bağcılar as one of the most densely populated districts in Istanbul.

Pollutants

It has long been known that the quality of the air we breathe has a direct impact on our health. Dry air normally consists of 78.084% Nitrogen (N2), 20.946% Oxygen (O2), 0.934% Argon (Ar), and 0.035% Carbon Dioxide (CO2). The remaining 0.001% consists of Neon (Ne), Methane (CH4), Helium (He), Hydrogen (H2), and Krypton (Kr). In addition, about 0.25% of the mass of the atmosphere is water vapor. Air pollution is therefore defined as a change in the composition of the air, or the mixing in of substances that should not be present, in a way that harms human health or the environmental balance. Its effects keep growing and its content keeps changing as the population increases, cities grow, and industry develops. Energy consumption, the burning of fossil fuels, and especially the growing number of motor vehicles in urban centers degrade air quality. The effects of air pollutants on the environment and human health are known to depend on time, location, duration of exposure, concentration, and other characteristics.

Below is a brief description of the air pollutants we chose to analyze and the possible health effects of exposure to them.

PM (Particulate Matter)

The term particulate matter (PM) refers to solid particles and liquid droplets found in the air. Some particles are emitted directly into the atmosphere by human activities and natural sources; others form in the atmosphere when other pollutants react. The sizes of solid and liquid particles span a wide range. Fine particles can remain suspended in the atmosphere for days to weeks, which allows them to travel long distances, whereas larger particles soon return to the surface through precipitation and gravity. The chemical and physical composition of PM10 and PM2.5 varies with location, climate, and weather. The difference between PM10 and PM2.5 is a matter of size: PM2.5 denotes fine particles with a diameter of 2.5 micrometers or less, and PM10 denotes particles with a diameter of 10 micrometers or less. Particles larger than these are filtered out in the upper respiratory tract. Particulate matter may contain heavy metals such as mercury, lead, and cadmium, and carcinogenic substances such as soot, fly ash, gasoline/diesel vehicle exhaust particles, and benzo(a)pyrene, so it poses a significant threat to health. Short-term exposure to PM10 has been associated primarily with the worsening of respiratory diseases, leading to hospitalization and emergency department visits. Long-term (months to years) exposure to PM2.5 has been linked to premature death, particularly in people who have chronic heart or lung diseases, and to reduced lung function growth in children. PM also harms ecosystems, including plants, soil, and water: it is deposited and then taken up by plants, or it settles into water, where it can affect water quality and clarity.

NO2 (Nitrogen Dioxide)

Whenever something burns in air, nitrogen oxides are formed. The air we breathe consists mainly of Nitrogen (78%) and Oxygen (21%), and these combine when enough energy (from the burning material) is present. The most common nitrogen oxides (collectively called NOx) are nitric oxide (NO) and nitrogen dioxide (NO2). Nitric oxide (NO) is an odorless, colorless gas formed when fuel is burned at high temperatures, for example in automobiles, other road vehicles, and heaters. In air, NO reacts readily with oxygen to form nitrogen dioxide (NO2). NO2 interacts with water, oxygen, and other chemicals in the atmosphere to form acid rain, which harms ecosystems such as lakes and forests. Nitrogen dioxide has a range of harmful effects on the lungs, including increased inflammation of the airways, worsened cough and wheezing, reduced lung function, and increased asthma attacks.

CO (Carbon Monoxide)

Carbon monoxide is a colorless, odorless gas formed when the carbon in fuels is burned incompletely. Its main source is internal combustion engines (85-95%): the greatest sources of CO in outdoor air are cars, trucks, and other vehicles or machinery that burn fossil fuels. CO concentrations typically peak during the cold season, because low temperatures favor incomplete combustion and keep pollutants trapped near ground level. CO binds to hemoglobin roughly 200 times more strongly than O2, so it hinders O2 transport to the tissues and can cause suffocation. Very high levels of CO are not likely to occur outdoors. However, elevated outdoor CO levels can be of particular concern for people with some types of heart disease.

Preprocessing Data

In this study, the Bağcılar and Şile station data for the PM10, PM2_5, NO2 and CO components were downloaded for 2019, 2020 and 2021. The files were read with the read_excel function from the readxl package, giving one data frame per station and year.

Each data frame then went through the same steps for each component. Three functions were written per component: selecting_data_*, set_average_on_day_data_* and edit_all_data_*, where * stands for the component name. The selecting_data_* function selects the column of one parameter, so that a separate data frame is created for each component.

The downloaded data are hourly, whereas we wanted daily averages. For this purpose, the set_average_on_day_data_* functions take a component-based data frame, compute the daily averages, and return them as a new data frame.

Two functions are shared by all four components: merge_by_date and drop_year_from_date_data. The drop_year_from_date_data function adds a new column named Year to the data frame it receives; this column keeps the year of each value in the Date column. The year of every date in the Date column is then set to 2020, so that the years can be drawn as separate groups on the same time-series chart. The modified data frame is returned.

The merge_by_date function combines the daily averages of the three years into a single data frame. It applies full_join on the Date column and replaces the resulting missing values (NA) with 0.

The output of the edit_all_data_* functions, which call merge_by_date and then drop_year_from_date_data, is ready for plotting.
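
Setting "-" and NA readings to 0 before averaging pulls the daily mean down on days with gaps. The sketch below shows one alternative that treats such readings as missing; it is not the method used in this project. The helper name daily_mean and the toy values are hypothetical, and the column name DateTime follows the functions in this report.

```r
library(dplyr)

# Hypothetical helper: daily mean of one pollutant column, ignoring missing readings.
daily_mean <- function(data, pollutant) {
  data %>%
    mutate(
      Date  = as.Date(DateTime),
      raw   = na_if(as.character(.data[[pollutant]]), "-"),  # "-" marks a missing reading
      value = as.numeric(gsub(",", ".", raw, fixed = TRUE))   # decimal comma -> point
    ) %>%
    group_by(Date) %>%
    summarise(value = mean(value, na.rm = TRUE), .groups = "drop")
}

# Toy hourly readings (illustrative values, not station data)
toy <- data.frame(
  DateTime = c("2020-03-01 00:00:00", "2020-03-01 01:00:00", "2020-03-02 00:00:00"),
  PM10     = c("10,5", "-", "20,0"),
  stringsAsFactors = FALSE
)

res <- daily_mean(toy, "PM10")
print(res)
```

For 1 March the helper averages only the valid reading, whereas replacing "-" with 0 would halve the result. A day with no valid readings at all yields NaN rather than 0, which keeps gaps visible in later steps.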

Libraries for Importing and Plotting the Data

library(readxl)
library(readr)
library(dplyr)
library(ggplot2)
library(gapminder)
library(plotly)
library(tidyverse)
library(rvest)
library(hrbrthemes)
library(viridis)
library(babynames)

Loading the Data for All Parameters (PM10, PM2_5, CO and NO2)

# All Parameters Datasets For Bağcılar

Bagcilar_2019_ByHour <- read_excel("../data/Bagcilar_2019_ByHour.xlsx")
Bagcilar_2020_ByHour <- read_excel("../data/Bagcilar_2020_ByHour.xlsx")
Bagcilar_2021_ByHour <- read_excel("../data/Bagcilar_2021_ByHour.xlsx")

# All Parameters Datasets For Şile

Sile_2019_ByHour <- read_excel("../data/Sile_2019_ByHour.xlsx")
Sile_2020_ByHour <- read_excel("../data/Sile_2020_ByHour.xlsx")
Sile_2021_ByHour <- read_excel("../data/Sile_2021_ByHour.xlsx")

Common functions to shape data Part.1

This function creates a new column named “Year” that holds only the year of each value in the Date column of the data parameter, and then sets the year of every date to 2020.

drop_year_from_date_data <- function(data) {
  # store the original four-digit year of each date in a new Year column
  data$Year <- format(as.Date(data$Date), "%Y")
  # rewrite every date with the year fixed to 2020 so all years share one axis
  data$Date <- format(as.Date(data$Date), "2020-%m-%d")
  return(data)
}

Common functions to shape data Part.2

This function merges the datasets on their shared “Date” column.

merge_by_date <- function(data1, data2, data3) {
  # merge data1, data2 and data3 on the Date column (daily data for 2019, 2020 and 2021)
  joined_data <- full_join(data1, data2, by="Date")
  joined_data <- full_join(joined_data, data3, by="Date")
  # dates missing from a given year produce NA; set them to 0
  joined_data[is.na(joined_data)] <- 0
  return(joined_data)
}
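
Because the three yearly data frames contain different calendar dates, a full join on Date matches no rows: each date ends up with a value in one year's column and 0 in the others. The toy example below, with illustrative values rather than station data, shows this shape for two years.

```r
library(dplyr)

# Toy daily averages for two years (illustrative values, not station data)
avg_2019 <- data.frame(Date = as.Date(c("2019-01-01", "2019-01-02")), PM10 = c(30, 40))
avg_2020 <- data.frame(Date = as.Date(c("2020-01-01", "2020-01-02")), PM10 = c(20, 25))

# Same steps as merge_by_date, for two inputs
joined <- full_join(avg_2019, avg_2020, by = "Date")
joined[is.na(joined)] <- 0

# Summing across the year columns recovers the single real value per date
joined$allPM10 <- joined$PM10.x + joined$PM10.y
print(joined)
```

This is why the yearly columns in the tail() outputs are mostly 0, and why adding them together in the all* column gives one usable value per row.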

Functions to shape PM2_5 Parameter Datasets

selecting_data_PM2_5 <- function(data){
  # drop the first row
  data <- data[-1,]
  # rename the columns in order
  colnames(data) <- c("DateTime","PM10","CO","NO2","PM2_5")
  # keep only the timestamp and the PM2_5 column
  data <- data %>% select(DateTime, PM2_5)
  return(data)
}

set_average_on_day_data_PM2_5 <- function(data) {
  data$Date <- as.Date(data$DateTime)
  # dropped the first column
  data <- data[,2:3]
  # set NA values to 0
  data[is.na(data)] <- 0
  # set string "-" data to 0
  data$PM2_5 <- replace(data$PM2_5, data$PM2_5=="-",0)
  # changed commas with dots to get double values for R
  data$PM2_5 <- gsub("\\,", ".", data$PM2_5)
  data$PM2_5 <- as.double(data$PM2_5)
  # grouped data by Date column and set mean
  data <- aggregate(PM2_5 ~ Date, data, mean)
  return(data)
}

edit_all_data_PM2_5 <- function(data2019, data2020, data2021){
  selected_PM2_5_2019 <- selecting_data_PM2_5(data2019)
  selected_PM2_5_2020 <- selecting_data_PM2_5(data2020)
  selected_PM2_5_2021 <- selecting_data_PM2_5(data2021)
  avg_2019 <- set_average_on_day_data_PM2_5(selected_PM2_5_2019)
  avg_2020 <- set_average_on_day_data_PM2_5(selected_PM2_5_2020)
  avg_2021 <- set_average_on_day_data_PM2_5(selected_PM2_5_2021)
  merged_data <- merge_by_date(avg_2019, avg_2020, avg_2021)
  colnames(merged_data) <- c("Date","PM2_5_2019", "PM2_5_2020", "PM2_5_2021")
  merged_data$allPM2_5 <- merged_data$PM2_5_2019 + merged_data$PM2_5_2020 + merged_data$PM2_5_2021
  dropped_year_data <- drop_year_from_date_data(merged_data)
  return(dropped_year_data)
}

Functions to shape PM10 Parameter Datasets

selecting_data_PM10 <- function(data){
  # drop the first row
  data <- data[-1,]
  # rename the columns in order
  colnames(data) <- c("DateTime","PM10","CO","NO2","PM2_5")
  # keep only the timestamp and the PM10 column
  data <- data %>% select(DateTime, PM10)
  return(data)
}

set_average_on_day_data_PM10 <- function(data) {
  data$Date <- as.Date(data$DateTime)
  # dropped the first column
  data <- data[,2:3]
  # set NA values to 0
  data[is.na(data)] <- 0
  # set string "-" data to 0
  data$PM10 <- replace(data$PM10, data$PM10=="-",0)
  # changed commas with dots to get double values for R
  data$PM10 <- gsub("\\,", ".", data$PM10)
  data$PM10 <- as.double(data$PM10)
  # grouped data by Date column and set mean
  data <- aggregate(PM10 ~ Date, data, mean)
  return(data)
}

edit_all_data_PM10 <- function(data2019, data2020, data2021){
  selected_PM10_2019 <- selecting_data_PM10(data2019)
  selected_PM10_2020 <- selecting_data_PM10(data2020)
  selected_PM10_2021 <- selecting_data_PM10(data2021)
  avg_2019 <- set_average_on_day_data_PM10(selected_PM10_2019)
  avg_2020 <- set_average_on_day_data_PM10(selected_PM10_2020)
  avg_2021 <- set_average_on_day_data_PM10(selected_PM10_2021)
  merged_data <- merge_by_date(avg_2019, avg_2020, avg_2021)
  colnames(merged_data) <- c("Date","PM10_2019", "PM10_2020", "PM10_2021")
  merged_data$allPM10 <- merged_data$PM10_2019 + merged_data$PM10_2020 + merged_data$PM10_2021
  dropped_year_data <- drop_year_from_date_data(merged_data)
  return(dropped_year_data)
}

Functions to shape NO2 Parameter Datasets

selecting_data_NO2 <- function(data){
  # dropped the first row
  data = data[-1,]
  # renamed column names by order
  colnames(data) <- c("DateTime","PM10","CO","NO2","PM2_5")
  data <- data %>% select(DateTime, NO2)
  return(data)
}

set_average_on_day_data_NO2 <- function(data) {
  data$Date <- as.Date(data$DateTime)
  # dropped the first column
  data <- data[,2:3]
  # set NA values to 0
  data[is.na(data)] <- 0
  # set string "-" data to 0
  data$NO2 <- replace(data$NO2, data$NO2=="-",0)
  # changed commas with dots to get double values for R
  data$NO2 <- gsub("\\,", ".", data$NO2)
  data$NO2 <- as.double(data$NO2)
  # grouped data by Date column and set mean
  data <- aggregate(NO2 ~ Date, data, mean)
  return(data)
}

edit_all_data_NO2 <- function(data2019, data2020, data2021){
  selected_NO2_2019 <- selecting_data_NO2(data2019)
  selected_NO2_2020 <- selecting_data_NO2(data2020)
  selected_NO2_2021 <- selecting_data_NO2(data2021)
  avg_2019 <- set_average_on_day_data_NO2(selected_NO2_2019)
  avg_2020 <- set_average_on_day_data_NO2(selected_NO2_2020)
  avg_2021 <- set_average_on_day_data_NO2(selected_NO2_2021)
  merged_data <- merge_by_date(avg_2019, avg_2020, avg_2021)
  colnames(merged_data) <- c("Date","NO2_2019", "NO2_2020", "NO2_2021")
  merged_data$allNO2 <- merged_data$NO2_2019 + merged_data$NO2_2020 + merged_data$NO2_2021
  dropped_year_data <- drop_year_from_date_data(merged_data)
  return(dropped_year_data)
}

Functions to shape CO Parameter Datasets

# CO dataset
selecting_data_CO <- function(data){
  # dropped the first row
  data = data[-1,]
  # renamed column names by order
  colnames(data) <- c("DateTime","PM10","CO","NO2","PM2_5")
  data <- data %>% select(DateTime, CO)
  return(data)
}

set_average_on_day_data_CO <- function(data) {
  data$Date <- as.Date(data$DateTime)
  # dropped the first column
  data <- data[,2:3]
  # set NA values to 0
  data[is.na(data)] <- 0
  # set string "-" data to 0
  data$CO <- replace(data$CO, data$CO=="-",0)
  # changed commas with dots to get double values for R
  data$CO <- gsub("\\,", ".", data$CO)
  data$CO <- as.double(data$CO)
  # grouped data by Date column and set mean
  data <- aggregate(CO ~ Date, data, mean)
  return(data)
}

edit_all_data_CO <- function(data2019, data2020, data2021){
  selected_CO_2019 <- selecting_data_CO(data2019)
  selected_CO_2020 <- selecting_data_CO(data2020)
  selected_CO_2021 <- selecting_data_CO(data2021)
  avg_2019 <- set_average_on_day_data_CO(selected_CO_2019)
  avg_2020 <- set_average_on_day_data_CO(selected_CO_2020)
  avg_2021 <- set_average_on_day_data_CO(selected_CO_2021)
  merged_data <- merge_by_date(avg_2019, avg_2020, avg_2021)
  colnames(merged_data) <- c("Date","CO_2019", "CO_2020", "CO_2021")
  merged_data$allCO <- merged_data$CO_2019 + merged_data$CO_2020 + merged_data$CO_2021
  dropped_year_data <- drop_year_from_date_data(merged_data)
  return(dropped_year_data)
}

Created PM2_5 dataset for Bağcılar

Bagcilar_dataset_PM2_5 <- edit_all_data_PM2_5(Bagcilar_2019_ByHour, Bagcilar_2020_ByHour, Bagcilar_2021_ByHour)
tail(Bagcilar_dataset_PM2_5)

          Date PM2_5_2019 PM2_5_2020 PM2_5_2021 allPM2_5 Year
897 2020-06-15          0          0          0        0 2021
898 2020-06-16          0          0          0        0 2021
899 2020-06-17          0          0          0        0 2021
900 2020-06-18          0          0          0        0 2021
901 2020-06-19          0          0          0        0 2021
902 2020-06-20          0          0          0        0 2021

Created PM10 dataset for Bağcılar

Bagcilar_dataset_PM10 <- edit_all_data_PM10(Bagcilar_2019_ByHour, Bagcilar_2020_ByHour, Bagcilar_2021_ByHour)
tail(Bagcilar_dataset_PM10)

          Date PM10_2019 PM10_2020 PM10_2021  allPM10 Year
897 2020-06-15         0         0  28.30833 28.30833 2021
898 2020-06-16         0         0  14.25417 14.25417 2021
899 2020-06-17         0         0  20.12917 20.12917 2021
900 2020-06-18         0         0   0.00000  0.00000 2021
901 2020-06-19         0         0   0.00000  0.00000 2021
902 2020-06-20         0         0   0.00000  0.00000 2021

Created NO2 dataset for Bağcılar

Bagcilar_dataset_NO2 <- edit_all_data_NO2(Bagcilar_2019_ByHour, Bagcilar_2020_ByHour, Bagcilar_2021_ByHour)
tail(Bagcilar_dataset_NO2)

          Date NO2_2019 NO2_2020 NO2_2021   allNO2 Year
897 2020-06-15        0        0 34.34583 34.34583 2021
898 2020-06-16        0        0 23.02500 23.02500 2021
899 2020-06-17        0        0 23.57917 23.57917 2021
900 2020-06-18        0        0  0.00000  0.00000 2021
901 2020-06-19        0        0  0.00000  0.00000 2021
902 2020-06-20        0        0  0.00000  0.00000 2021

Created CO dataset for Bağcılar

Bagcilar_dataset_CO <- edit_all_data_CO(Bagcilar_2019_ByHour, Bagcilar_2020_ByHour, Bagcilar_2021_ByHour)
tail(Bagcilar_dataset_CO)

          Date CO_2019 CO_2020  CO_2021    allCO Year
849 2020-06-12       0       0 187.6708 187.6708 2021
850 2020-06-13       0       0 202.7333 202.7333 2021
851 2020-06-14       0       0 128.2458 128.2458 2021
852 2020-06-15       0       0 307.0455 307.0455 2021
853 2020-06-16       0       0 308.1833 308.1833 2021
854 2020-06-17       0       0 410.0000 410.0000 2021

NOTE:

The Şile dataset has no data for the CO and PM2_5 parameters.

Created PM10 dataset for Şile

Sile_dataset_PM10 <- edit_all_data_PM10(Sile_2019_ByHour, Sile_2020_ByHour, Sile_2021_ByHour)
tail(Sile_dataset_PM10)

Created NO2 dataset for Şile

Sile_dataset_NO2 <- edit_all_data_NO2(Sile_2019_ByHour, Sile_2020_ByHour, Sile_2021_ByHour)
tail(Sile_dataset_NO2)

Graphs Part.1 for Bağcılar

Here, we examine how PM10, PM2_5, CO and NO2 changed in Bağcılar across 2019, 2020 and 2021.

Plotting PM10 Values For Bagcilar between 2019 January-2021 June

plot_Bagcilar_PM10 <- Bagcilar_dataset_PM10 %>% 
  ggplot( aes(x=Date, y=allPM10, group=Year, fill=Year, text=format(as.Date(Date), "%d-%m"))) +
  geom_area( ) +
  scale_fill_viridis(discrete = TRUE) +
  theme(legend.position="none") +
  ggtitle("Bagcilar PM10 Values between January, 2019 - June, 2021") +
  ylab("PM10")+
  theme_ipsum() +
  theme(
    legend.position = c(.95, .95),
    legend.justification = c("right", "top"),
    legend.box.just = "right",
    legend.margin = margin(6, 6, 6, 6)
  )+
  theme_classic(base_size = 11)
ggplotly(plot_Bagcilar_PM10, tooltip="text")

Plotting PM2_5 Values For Bagcilar between 2019 January-2021 June

plot_Bagcilar_PM2_5 <- Bagcilar_dataset_PM2_5 %>% 
  ggplot( aes(x=Date, y=allPM2_5, group=Year, fill=Year, text=format(as.Date(Date), "%d-%m"))) +
  geom_area( ) +
  scale_fill_viridis(discrete = TRUE) +
  theme(legend.position="none") +
  ggtitle("Bagcilar PM2_5 Values between January, 2019 - June, 2021") +
  ylab("PM2_5")+
  theme_ipsum() +
  theme(
    legend.position = c(.95, .95),
    legend.justification = c("right", "top"),
    legend.box.just = "right",
    legend.margin = margin(6, 6, 6, 6)
  )+
  theme_classic(base_size = 11)
ggplotly(plot_Bagcilar_PM2_5, tooltip="text")

Plotting NO2 Values For Bagcilar between 2019 January-2021 June

plot_Bagcilar_NO2 <- Bagcilar_dataset_NO2 %>% 
  ggplot( aes(x=Date, y=allNO2, group=Year, fill=Year, text=format(as.Date(Date), "%d-%m"))) +
  geom_area( ) +
  scale_fill_viridis(discrete = TRUE) +
  theme(legend.position="none") +
  ggtitle("Bagcilar NO2 Values between January, 2019 - June, 2021") +
  ylab("NO2")+
  theme_ipsum() +
  theme(
    legend.position = c(.95, .95),
    legend.justification = c("right", "top"),
    legend.box.just = "right",
    legend.margin = margin(6, 6, 6, 6)
  )+
  theme_classic(base_size = 11)
ggplotly(plot_Bagcilar_NO2, tooltip="text")

Plotting CO Values For Bagcilar between 2019 January-2021 June

plot_Bagcilar_CO <- Bagcilar_dataset_CO %>% 
  ggplot( aes(x=Date, y=allCO, group=Year, fill=Year, text=format(as.Date(Date), "%d-%m"))) +
  geom_area( ) +
  scale_fill_viridis(discrete = TRUE) +
  theme(legend.position="none") +
  ggtitle("Bagcilar CO Values between January, 2019 - June, 2021") +
  ylab("CO")+
  theme_ipsum() +
  theme(
    legend.position = c(.95, .95),
    legend.justification = c("right", "top"),
    legend.box.just = "right",
    legend.margin = margin(6, 6, 6, 6)
  )+
  theme_classic(base_size = 11)
ggplotly(plot_Bagcilar_CO, tooltip="text")

Graphs Part.1 for Şile

Here, we examine how PM10 and NO2 changed in Şile across 2019, 2020 and 2021.

Plotting PM10 Values For Şile between 2019 January-2021 June

plot_Sile_PM10 <- Sile_dataset_PM10 %>% 
  ggplot( aes(x=Date, y=allPM10, group=Year, fill=Year, text=format(as.Date(Date), "%d-%m"))) +
  geom_area( ) +
  scale_fill_viridis(discrete = TRUE) +
  theme(legend.position="none") +
  ggtitle("Şile PM10 Values between January, 2019 - June, 2021") +
  ylab("PM10")+
  theme_ipsum() +
  theme(
    legend.position = c(.95, .95),
    legend.justification = c("right", "top"),
    legend.box.just = "right",
    legend.margin = margin(6, 6, 6, 6)
  )+
  theme_classic(base_size = 11)
ggplotly(plot_Sile_PM10, tooltip="text")

Plotting NO2 Values For Şile between 2019 January-2021 June

plot_Sile_NO2 <- Sile_dataset_NO2 %>% 
  ggplot( aes(x=Date, y=allNO2, group=Year, fill=Year, text=format(as.Date(Date), "%d-%m"))) +
  geom_area( ) +
  scale_fill_viridis(discrete = TRUE) +
  theme(legend.position="none") +
  ggtitle("Şile NO2 Values between January, 2019 - June, 2021") +
  ylab("NO2")+
  theme_ipsum() +
  theme(
    legend.position = c(.95, .95),
    legend.justification = c("right", "top"),
    legend.box.just = "right",
    legend.margin = margin(6, 6, 6, 6)
  )+
  theme_classic(base_size = 11)
ggplotly(plot_Sile_NO2, tooltip="text")
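
The six plotting blocks above differ only in the data frame, the value column, and the title, so they could share one helper. The sketch below is a hypothetical refactoring, not the code that produced the charts: plot_pollutant and its parameters are invented names, the toy data are illustrative, and the x axis uses real dates instead of the character dates used above. It also applies the complete theme before the legend setting, because a complete theme such as theme_classic() added last replaces any earlier theme() settings.

```r
library(ggplot2)

# Hypothetical helper: stacked area chart of one pollutant, one fill colour per year.
plot_pollutant <- function(data, value_col, title, y_label) {
  ggplot(data, aes(x = as.Date(Date), y = .data[[value_col]], fill = Year)) +
    geom_area() +
    theme_classic(base_size = 11) +    # complete theme first ...
    theme(legend.position = "top") +   # ... so this setting is not overwritten
    labs(title = title, x = "Date", y = y_label)
}

# Toy data in the shape returned by the edit_all_data_* functions (illustrative values)
toy <- data.frame(
  Date    = c("2020-01-01", "2020-01-02", "2020-01-01", "2020-01-02"),
  allPM10 = c(30, 40, 20, 25),
  Year    = c("2019", "2019", "2020", "2020")
)

p <- plot_pollutant(toy, "allPM10", "Toy PM10 values", "PM10")
```

The returned object can be passed to ggplotly() in the same way as the plots above.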

Graphs Part.2

In the Graphs Part.1 section, we have obtained graphs comparing information containing 3 years. In this section, filtering was done on the data for summer and autumn months. Then, new graphs were drawn with the new dataframes obtained.

In this way, two graphs drawn on a seasonal basis were obtained. One of these graphs is a visualization tool comparing the data of PM10, PM2_5 , CO and NO2 components obtained from Bağcılar district during the autumn months of September, October and November. The other is a visualization tool that compares the data of PM10, PM2_5, CO and NO2 components obtained from Bağcılar district during the summer months of June, July and August.

Filtering for the autumn months: September, October and November

# "Sonbahar" is Turkish for autumn. Each block keeps the 2020 rows dated
# 1 September to 30 November, then renames the columns so that the fifth
# one (used as y in the plot) is called "Values".
Bagcilar_2020_Sonbahar_PM10 <- Bagcilar_dataset_PM10[Bagcilar_dataset_PM10[, "Date"] >= '2020-09-01' &
                                                  Bagcilar_dataset_PM10[, "Date"] <= '2020-11-30' &
                                                  Bagcilar_dataset_PM10[, "Year"] == "2020", ]

colnames(Bagcilar_2020_Sonbahar_PM10) <- c("Date","1","2","3","Values","Year")

Bagcilar_2020_Sonbahar_PM2_5 <- Bagcilar_dataset_PM2_5[Bagcilar_dataset_PM2_5[, "Date"] >= '2020-09-01' &
                                                  Bagcilar_dataset_PM2_5[, "Date"] <= '2020-11-30' &
                                                  Bagcilar_dataset_PM2_5[, "Year"] == "2020", ]

colnames(Bagcilar_2020_Sonbahar_PM2_5) <- c("Date","1","2","3","Values","Year")

Bagcilar_2020_Sonbahar_CO <- Bagcilar_dataset_CO[Bagcilar_dataset_CO[, "Date"] >= '2020-09-01' &
                                               Bagcilar_dataset_CO[, "Date"] <= '2020-11-30' &
                                               Bagcilar_dataset_CO[, "Year"] == "2020", ]

colnames(Bagcilar_2020_Sonbahar_CO) <- c("Date","1","2","3","Values","Year")

Bagcilar_2020_Sonbahar_NO2 <- Bagcilar_dataset_NO2[Bagcilar_dataset_NO2[, "Date"] >= '2020-09-01' &
                                                  Bagcilar_dataset_NO2[, "Date"] <= '2020-11-30' &
                                                  Bagcilar_dataset_NO2[, "Year"] == "2020", ]

colnames(Bagcilar_2020_Sonbahar_NO2) <- c("Date","1","2","3","Values","Year")

# Named colours so each legend entry maps to a fixed line colour.
colors <- c("NO2" = "blue", "PM10" = "red", "PM2_5" = "yellow", "CO" = "black")
ggplot(NULL) +
  geom_line(data = Bagcilar_2020_Sonbahar_NO2, aes(x = Date, y = Values, group = Year, color = "NO2"), size = 1.5) +
  geom_line(data = Bagcilar_2020_Sonbahar_PM10, aes(x = Date, y = Values, group = Year, color = "PM10"), size = 1.5) +
  geom_line(data = Bagcilar_2020_Sonbahar_PM2_5, aes(x = Date, y = Values, group = Year, color = "PM2_5"), size = 1.5) +
  # CO is divided by 1000 so that it fits on the same axis as the other pollutants.
  geom_line(data = Bagcilar_2020_Sonbahar_CO, aes(x = Date, y = Values/1000, group = Year, color = "CO"), size = 1.5) +
  ggtitle("PM10, PM2_5, NO2 and CO Values of Autumn Months in 2020 for Bağcılar") +
  labs(x = "Date",
       y = "Values",
       color = "Parameters") +
  scale_color_manual(values = colors) +
  theme_classic(base_size = 11)

Filtering for the summer months: June, July and August

# "Yaz" is Turkish for summer. Each block keeps the 2020 rows dated
# 1 June to 31 August, then renames the columns so that the fifth
# one (used as y in the plot) is called "Values".
Bagcilar_2020_Yaz_PM10 <- Bagcilar_dataset_PM10[Bagcilar_dataset_PM10[, "Date"] >= '2020-06-01' &
                                                  Bagcilar_dataset_PM10[, "Date"] <= '2020-08-31' &
                                                  Bagcilar_dataset_PM10[, "Year"] == "2020", ]

colnames(Bagcilar_2020_Yaz_PM10) <- c("Date","1","2","3","Values","Year")

Bagcilar_2020_Yaz_PM2_5 <- Bagcilar_dataset_PM2_5[Bagcilar_dataset_PM2_5[, "Date"] >= '2020-06-01' &
                                                  Bagcilar_dataset_PM2_5[, "Date"] <= '2020-08-31' &
                                                  Bagcilar_dataset_PM2_5[, "Year"] == "2020", ]

colnames(Bagcilar_2020_Yaz_PM2_5) <- c("Date","1","2","3","Values","Year")

Bagcilar_2020_Yaz_CO <- Bagcilar_dataset_CO[Bagcilar_dataset_CO[, "Date"] >= '2020-06-01' &
                                               Bagcilar_dataset_CO[, "Date"] <= '2020-08-31' &
                                               Bagcilar_dataset_CO[, "Year"] == "2020", ]

colnames(Bagcilar_2020_Yaz_CO) <- c("Date","1","2","3","Values","Year")

Bagcilar_2020_Yaz_NO2 <- Bagcilar_dataset_NO2[Bagcilar_dataset_NO2[, "Date"] >= '2020-06-01' &
                                                  Bagcilar_dataset_NO2[, "Date"] <= '2020-08-31' &
                                                  Bagcilar_dataset_NO2[, "Year"] == "2020", ]

colnames(Bagcilar_2020_Yaz_NO2) <- c("Date","1","2","3","Values","Year")

# Named colours so each legend entry maps to a fixed line colour.
colors <- c("NO2" = "blue", "PM10" = "red", "PM2_5" = "yellow", "CO" = "black")
ggplot(NULL) +
  geom_line(data = Bagcilar_2020_Yaz_NO2, aes(x = Date, y = Values, group = Year, color = "NO2"), size = 1.5) +
  geom_line(data = Bagcilar_2020_Yaz_PM10, aes(x = Date, y = Values, group = Year, color = "PM10"), size = 1.5) +
  geom_line(data = Bagcilar_2020_Yaz_PM2_5, aes(x = Date, y = Values, group = Year, color = "PM2_5"), size = 1.5) +
  # CO is divided by 1000 so that it fits on the same axis as the other pollutants.
  geom_line(data = Bagcilar_2020_Yaz_CO, aes(x = Date, y = Values/1000, group = Year, color = "CO"), size = 1.5) +
  ggtitle("PM10, PM2_5, NO2 and CO Values of Summer Months in 2020 for Bağcılar") +
  labs(x = "Date",
       y = "Values",
       color = "Parameters") +
  scale_color_manual(values = colors) +
  theme_classic(base_size = 11)

PM10 AQI calculation for Bağcılar

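# Classify each daily autumn PM10 value into an EPA AQI category.
# EPA 24-hour PM10 breakpoints (µg/m³) are 0-54, 55-154, 155-254, 255-354,
# 355-424 and 425-604, so each category starts at its lower bound inclusive.
x_date <- Bagcilar_2020_Sonbahar_PM10$Date
x_val <- Bagcilar_2020_Sonbahar_PM10$Values
PM10_Bagcilar_AGI <- as.data.frame(x_date)
Status <- case_when(
  x_val >= 0 & x_val < 55 ~ "Good",
  x_val >= 55 & x_val < 155 ~ "Moderate",
  x_val >= 155 & x_val < 255 ~ "Unhealthy for Sensitive Groups",
  x_val >= 255 & x_val < 355 ~ "Unhealthy",
  x_val >= 355 & x_val < 425 ~ "Very Unhealthy",
  x_val >= 425 & x_val < 605 ~ "Hazardous",
  TRUE ~ as.character(x_val)  # values outside the table stay as raw text
)
PM10_Bagcilar_AGI$Status <- Status
tail(PM10_Bagcilar_AGI)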

       x_date   Status
86 2020-11-25     Good
87 2020-11-26     Good
88 2020-11-27 Moderate
89 2020-11-28 Moderate
90 2020-11-29 Moderate
91 2020-11-30 Moderate

ggplot(PM10_Bagcilar_AGI, aes(x_date, Status, color=Status)) +
  geom_point()+
  theme_classic(base_size = 15)+
  xlab("Date")
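The classification above assigns only a category to each day. As a hedged sketch, the numeric index behind those categories can be computed with the EPA's linear interpolation between breakpoints, AQI = (I_hi - I_lo) / (C_hi - C_lo) * (C - C_lo) + I_lo. The table `pm10_breaks` and the function `pm10_to_aqi` are hypothetical names introduced here, and the breakpoints are the EPA 24-hour PM10 values in µg/m³; this is not part of the original analysis.

```r
# EPA 24-hour PM10 breakpoints (µg/m³) with the matching AQI index ranges.
pm10_breaks <- data.frame(
  c_lo = c(0, 55, 155, 255, 355, 425, 505),
  c_hi = c(54, 154, 254, 354, 424, 504, 604),
  i_lo = c(0, 51, 101, 151, 201, 301, 401),
  i_hi = c(50, 100, 150, 200, 300, 400, 500)
)

# Hypothetical helper: convert PM10 concentrations to numeric AQI values.
pm10_to_aqi <- function(conc) {
  conc <- trunc(conc)  # PM10 is truncated to whole µg/m³ before lookup
  vapply(conc, function(cc) {
    if (is.na(cc)) return(NA_real_)
    row <- which(cc >= pm10_breaks$c_lo & cc <= pm10_breaks$c_hi)
    if (length(row) == 0) return(NA_real_)  # outside the table
    b <- pm10_breaks[row, ]
    round((b$i_hi - b$i_lo) / (b$c_hi - b$c_lo) * (cc - b$c_lo) + b$i_lo)
  }, numeric(1))
}

# Concentrations at the Good/Moderate boundary and one mid-range value.
aqi_values <- pm10_to_aqi(c(54, 55, 100))
print(aqi_values)
```

Values outside the breakpoint table return `NA` rather than a guessed index, which keeps unexpected sensor readings visible.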

Results

We downloaded datasets covering 2019, 2020 and 2021. We chose this range because the pandemic reached Turkey in March 2020, and understanding the changes since then requires data from both before and after that date. Accordingly, the datasets for Bağcılar and Şile run from January 2019 to June 2021.

The main question of this study is how pollutants affect our lives before, during and after the pandemic. We described what pollutants are and to what extent they affect our lives, and on that basis we chose to examine PM10, PM2.5, CO and NO2.

We chose Bağcılar and Şile, both districts of Istanbul, the most populous city of Turkey, because Bağcılar is one of the densest districts of Istanbul in terms of population and surface area, while Şile is the least dense.

We examined the datasets from the pollutant measurement stations in these districts.

These investigations produced two groups of graphs. In the first group, the daily average pollutant values recorded over three years for both Bağcılar and Şile were plotted on the same graph and visualized by year. The second group shows how PM10, PM2.5, CO and NO2 changed in Bağcılar during the autumn and summer seasons.

We believe the graphs support the following conclusions. From the graphs in the first part, we observed that from the beginning of the pandemic until June 2021, pollutant levels were lower than on the same day of the previous year. Pollutant levels decreased overall from 2019 to 2021, but the rate of reduction is measured most accurately by comparing the same days across years.
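A same-day comparison of this kind can be sketched in base R as follows. The data frame `daily` and its numbers are made up for illustration and do not come from the Bağcılar or Şile datasets; the column names are assumptions.

```r
# Toy daily values for the same calendar days in two years (made-up numbers).
daily <- data.frame(
  Date = as.Date(c("2019-03-01", "2019-03-02", "2020-03-01", "2020-03-02")),
  Values = c(60, 50, 45, 40)
)
daily$Year <- format(daily$Date, "%Y")
daily$Day <- format(daily$Date, "%m-%d")

# Put each calendar day's 2019 and 2020 values side by side.
y2019 <- daily[daily$Year == "2019", c("Day", "Values")]
y2020 <- daily[daily$Year == "2020", c("Day", "Values")]
both <- merge(y2019, y2020, by = "Day", suffixes = c("_2019", "_2020"))

# Percentage change from 2019 to 2020 for each calendar day.
both$pct_change <- 100 * (both$Values_2020 - both$Values_2019) / both$Values_2019
print(both)
```

Matching on month and day rather than on position keeps the comparison valid even when one year has missing days.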

From the graphs in the second part, we observed that PM10 and NO2 rose and fell at very similar rates during both the autumn and summer months. PM2.5, however, fluctuated much more in summer than in autumn and changed independently of the other pollutants. CO behaved independently of the other three pollutants in both seasons and showed little change.
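One way to put a number on how similarly two pollutants rise and fall is to correlate their day-to-day changes. The sketch below uses made-up values, not the station data, and the variable names are assumptions; it only illustrates the calculation.

```r
# Made-up daily values for two pollutants over five days.
pm10 <- c(30, 45, 50, 38, 60)
no2 <- c(20, 31, 35, 26, 41)

# diff() gives the day-to-day change; cor() measures how closely the
# two series of changes move together (1 means perfectly in step).
change_cor <- cor(diff(pm10), diff(no2))
print(change_cor)
```

A value close to 1 would support the visual impression that two pollutants move together, while a value near 0 would indicate independent movement.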