After extensive research into social issues with sufficient data, we concluded that low air quality is an underestimated problem. Air pollution, a result of the urbanization brought about by modern life, has local, regional, and global impacts, and it threatens public health in the long term even if people do not notice its effects immediately in daily life. Low air quality can cause serious health problems, including several lung diseases. Our quality of life may therefore depend on the amount of specific pollutants in the air we breathe, and air quality deserves great attention all over the world. That is why we want to analyze the amount of pollutants in İstanbul’s air. We narrowed the project further because we have been living under pandemic conditions for over a year and Covid-19 affects every aspect of our lives. For this reason, we decided to analyze air quality during the pandemic and compare it with the year before Covid-19.
To solve air pollution problems and develop strategies, scientific communities and the relevant authorities have focused on monitoring and analyzing atmospheric pollutant concentrations. Besides protecting and improving air quality, the authorities are also responsible for providing the public with up-to-date information on air pollution, as it directly affects public health. However, even if a scientist can understand the measurements of different pollutants, they are very difficult for the general public and local authorities to interpret. For this reason, a classification system that the public can easily understand is used when reporting air quality.
The Environmental Protection Agency (EPA) is an independent executive agency of the United States federal government responsible for environmental protection. The Air Quality Index (AQI) is a color-coded index developed by the EPA for reporting and forecasting daily air quality. The AQI covers the most common ambient air pollutants, including particle pollution (PM10 and PM2.5). It tells us how clean or dirty the air is in the region we live in and what health effects may occur. The AQI reflects health effects that can occur within a few hours or days after inhaling polluted air.
It uses a normalized scale from 0 to 500: the higher the AQI value, the greater the level of pollution and the greater the health concern. An AQI value of 100 generally corresponds to the level of the daily National Ambient Air Quality Standard for the pollutant. AQI values at and below 100 are generally considered to be satisfactory. With this classification system, which is widely used all over the world, air quality is graded as “Good”, “Moderate”, “Unhealthy for sensitive groups”, “Unhealthy”, “Very unhealthy”, and “Hazardous” according to the concentrations of pollutants in the air. In many countries, the methods and criteria used to calculate the index are based on the air quality standards applied in that country.
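To make the 0 to 500 scale concrete, here is a minimal sketch that maps an AQI value to its category. The cut points (50, 100, 150, 200, 300, 500) are the standard EPA category boundaries; the function name `aqi_category` is an illustrative assumption and is not part of the project code.

```r
# Illustrative helper (hypothetical name): map an AQI value to its EPA category.
aqi_category <- function(aqi) {
  cut(
    aqi,
    breaks = c(0, 50, 100, 150, 200, 300, 500),
    labels = c("Good", "Moderate", "Unhealthy for Sensitive Groups",
               "Unhealthy", "Very Unhealthy", "Hazardous"),
    include.lowest = TRUE,  # an AQI of exactly 0 counts as "Good"
    right = TRUE            # upper bounds are inclusive, e.g. 100 is "Moderate"
  )
}

# Example: classify a few index values
example_categories <- as.character(aqi_category(c(0, 51, 101, 180, 250, 400)))
print(example_categories)
```

Values outside 0 to 500 come back as NA rather than being forced into a category.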
Our aim in this project, called “Air Quality in Istanbul during Pandemic”, is to obtain and analyze data from stations that measure some of the pollutants used for the Air Quality Index, make comparisons, and draw graphs to see whether our air quality changed during the pandemic, as shown in the rest of this report.
When deciding which districts to analyze, we wanted to proceed based on population density, so we looked for a dataset with the population density of İstanbul’s districts. Because we could not find a suitable dataset, we scraped two different websites and built two data frames: one for the population of every district and one for the area of every district. We then merged the two data frames on district name. Lastly, we sorted the merged data frame by population density, found the most and least densely populated districts, and chose two from each group for the analysis.
# Packages needed for scraping and data wrangling
library(rvest)
library(dplyr)
# Get the population of Istanbul's districts
url <- "https://www.nufusu.com/il/istanbul-nufusu"
webpage <- read_html(url)
table <- webpage %>%
html_nodes("table")
PopulationNumber <- html_table(table[4])
PopulationNumber <- as.data.frame(PopulationNumber)
colnames(PopulationNumber) <- c("Yil", "Ilce", "Ilce_Nufusu", "Erkek_Nufusu", "Kadin_Nufusu", "Nufus_Yuzdesi")
PopulationNumber <- PopulationNumber %>% select(Ilce, Ilce_Nufusu)
colnames(PopulationNumber) <- c("Districts","Population")
PopulationNumber$Population <- gsub("\\.", "", PopulationNumber$Population)
# Use the same district name as the area table so the join on Districts matches
PopulationNumber$Districts <- gsub("Eyüpsultan", "Eyüp", PopulationNumber$Districts, fixed = TRUE)
PopulationNumber$Population <- as.double(PopulationNumber$Population)
PopulationNumber <- PopulationNumber[order(-PopulationNumber$Population),]
tail(PopulationNumber)
Districts Population
36 Beşiktaş 176513
37 Çatalca 74975
38 Şile 37904
29 Bayrampaşa 26995
32 Beykoz 24611
39 Adalar 16033
# Get the area of each district of Istanbul
url2 <- "https://www.atlasbig.com/tr/istanbulnun-ilceleri"
webpage <- read_html(url2)
DistrictArea <- webpage %>%
html_nodes("table") %>%
html_table()
DistrictArea <- as.data.frame(DistrictArea)
colnames(DistrictArea) <- c("Districts","Population", "DistrictArea")
DistrictArea <- DistrictArea %>% select(Districts, DistrictArea)
DistrictArea$DistrictArea <- gsub("\\.", "", DistrictArea$DistrictArea)
DistrictArea$DistrictArea <- gsub("\\,", ".", DistrictArea$DistrictArea)
DistrictArea$DistrictArea <- as.double(DistrictArea$DistrictArea)
tail(DistrictArea)
Districts DistrictArea
34 Beşiktaş 17.992
35 Silivri 870.009
36 Çatalca 1136.737
37 Şile 782.227
38 Adalar 11.306
39 Avcılar 52.174
Population_Area_Table <- left_join(PopulationNumber, DistrictArea, by="Districts")
Population_Area_Table$Population <- as.double(Population_Area_Table$Population)
Population_Area_Table$DistrictArea <- as.double(Population_Area_Table$DistrictArea)
Population_Area_Table$PopulationDensity <- (Population_Area_Table$Population)/(Population_Area_Table$DistrictArea)
Şile has the lowest population density among Istanbul's districts.
Districts Population DistrictArea PopulationDensity
39 Adalar 16033 11.306 1418.09659
24 Arnavutköy 296709 478.518 620.05818
33 Silivri 200215 870.009 230.12980
38 Beykoz 24611 311.755 78.94340
35 Çatalca 74975 1136.737 65.95633
36 Şile 37904 782.227 48.45652
Gaziosmanpaşa has the highest population density among Istanbul's districts.
Districts Population DistrictArea PopulationDensity
31 Beyoğlu 226396 8.969 25242.06
16 Kağıthane 442415 15.601 28358.12
3 Bağcılar 737206 22.496 32770.54
6 Bahçelievler 592371 16.537 35820.95
26 Güngören 280299 7.305 38370.84
10 Gaziosmanpaşa 487778 11.635 41923.33
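The sorting step that produces the two tables above is not shown in the report. A minimal sketch of it, using a handful of rows copied from those tables, could look like this. The data frame name `districts` and the choice of `order()` are assumptions for illustration.

```r
# Toy subset: population and area values copied from the tables above
districts <- data.frame(
  Districts    = c("Gaziosmanpaşa", "Güngören", "Bağcılar", "Çatalca", "Şile"),
  Population   = c(487778, 280299, 737206, 74975, 37904),
  DistrictArea = c(11.635, 7.305, 22.496, 1136.737, 782.227)
)

# Density in people per unit of area, as in the report
districts$PopulationDensity <- districts$Population / districts$DistrictArea

# Sort from most to least dense
by_density <- districts[order(-districts$PopulationDensity), ]

most_dense  <- head(by_density$Districts, 2)  # two densest districts
least_dense <- tail(by_density$Districts, 2)  # two least dense districts
print(most_dense)
print(least_dense)
```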
The dataset for the project was taken from this website (https://sim.csb.gov.tr/STN/STN_Report/StationDataDownloadNew), where both daily and hourly values are available. We chose four pollutants to observe and different date ranges for different cases, focusing mostly on the pandemic period. We chose Istanbul for the air quality analysis because it is the most populous city in Turkey and the city of our university.
We could not obtain enough data for the parameters in Gaziosmanpaşa, so we looked at other options such as Güngören and Bahçelievler, but data for these districts were not available either. We were able to obtain data for Bağcılar, so we continued with Bağcılar as the district representing the highest population densities in Istanbul.
It has long been known that the quality of the air we breathe has a direct impact on our health. Normally, air consists of 78.084% nitrogen (N2), 20.946% oxygen (O2), 0.934% argon (Ar), and 0.035% carbon dioxide (CO2). The remaining 0.001% consists of neon (Ne), methane (CH4), helium (He), hydrogen (H2), and krypton (Kr). In addition, about 0.25% of the mass of the atmosphere is water vapor. Air pollution is therefore defined as a change in the composition of the air, or the mixing in of substances that should not be present, in a way that harms human health or the environmental balance. Air pollution continues to grow and change in content as the population increases, cities expand, and industry develops. Energy consumption, the burning of fossil fuels, and especially the increase in motor vehicles in urban centers cause air quality to deteriorate. The effects of air pollutants on the environment and human health are known to depend on time, place, duration of exposure, concentration, and other characteristics.
Below is a brief description of the air pollutants we chose to analyze and the possible health effects of exposure to them.
The term particulate matter (PM) refers to solid particles and liquid droplets found in the air. PM enters the atmosphere directly from human activities and natural sources, and it also forms in the atmosphere when other pollutants react. The sizes of solid and liquid particles span a wide range. Particles can remain suspended in the atmosphere for days to weeks, which allows them to travel over long distances, although larger particles soon return to the surface through precipitation and gravity. The chemical and physical composition of PM10 and PM2.5 varies with location, climate, and weather. The difference between PM10 and PM2.5 is a matter of size: PM2.5 refers to particles with a diameter of 2.5 micrometers or less, and PM10 to particles with a diameter of 10 micrometers or less. Particles larger than these are filtered out in the upper respiratory tract. Particulate matter may contain heavy metals such as mercury, lead, and cadmium, and carcinogenic substances such as soot, fly ash, gasoline and diesel exhaust particles, and benzo(a)pyrene. It therefore poses a significant threat to health. For example, short-term exposure to PM10 has been associated primarily with the worsening of respiratory diseases, leading to hospitalization and emergency department visits. Long-term exposure (months to years) to PM2.5 has been linked to premature death, particularly in people who have chronic heart or lung diseases, and to reduced lung function growth in children. PM also harms ecosystems, including plants, soil, and water, through its deposition and subsequent uptake by plants, or through its deposition into water, where it can affect water quality and clarity.
Whenever something burns in air, nitrogen oxides are formed. This is because the air we breathe consists mainly of nitrogen (78%) and oxygen (21%), which combine when energy from combustion is present. The most common nitrogen oxides (collectively referred to as NOx) are nitric oxide (NO) and nitrogen dioxide (NO2). Nitric oxide (NO) is an odorless, colorless gas produced when fuel is burned at high temperatures, for example in automobiles and other road vehicles and in heaters. When NO comes into contact with air, it combines with oxygen to form nitrogen dioxide (NO2). NO2 interacts with water, oxygen, and other chemicals in the atmosphere to form acid rain, which harms ecosystems such as lakes and forests. Nitrogen dioxide causes a range of harmful effects on the lungs, including increased inflammation of the airways, worsened cough and wheezing, reduced lung function, and increased asthma attacks.
Carbon monoxide (CO) is a colorless, odorless gas formed when the carbon in fuels is burned incompletely. Its main source is internal combustion engines (85-95%). The greatest sources of CO in outdoor air are cars, trucks, and other vehicles or machinery that burn fossil fuels. CO concentrations typically peak during the cold season, because low temperatures lead to incomplete combustion and cause pollutants to settle at ground level. CO binds to hemoglobin about 200 times more strongly than O2. It therefore prevents O2 transport to the tissues and causes suffocation. Very high levels of CO are not likely to occur outdoors. However, when CO levels are elevated outdoors, they can be of particular concern for people with some types of heart disease.
In this study, the data of the Bağcılar and Şile stations for the PM10, PM2_5, NO2 and CO components were downloaded. The files were read with the read_excel function of the readxl package, and a dataframe was created for each file, that is, one per year for each of the two districts.
These dataframes then went through the same steps for each component. Three functions were written for each component: selecting_data_*, set_average_on_day_data_* and edit_all_data_*, where * stands for the component name. The selecting_data_* function selects the column of one component, so that a new dataframe is created for each component.
The downloaded data are hourly, whereas the final data we wanted are daily averages. For this purpose, the set_average_on_day_data_* function of each component takes the component-based dataframe as its parameter, computes the daily averages, and returns them as a new dataframe.
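The hourly-to-daily step described here can be illustrated on a toy table. This is a minimal base R sketch, not the project's own function: the name `daily_mean`, the `column` argument, and the sample values are assumptions. It mirrors the cleaning rules used in the report (missing values and "-" placeholders become 0, decimal commas become dots).

```r
# Toy hourly readings with decimal commas and a "-" placeholder (made-up values)
hourly <- data.frame(
  DateTime = as.POSIXct(c("2020-03-01 00:00:00", "2020-03-01 01:00:00",
                          "2020-03-02 00:00:00", "2020-03-02 01:00:00"),
                        tz = "UTC"),
  PM10 = c("10,5", "20,5", "-", "30,0"),
  stringsAsFactors = FALSE
)

# Hypothetical generic helper: daily mean of one pollutant column
daily_mean <- function(data, column) {
  values <- data[[column]]
  # Treat missing readings and "-" placeholders as 0, as the report does
  values[is.na(values) | values == "-"] <- "0"
  # Convert decimal commas to dots so the text parses as numbers
  values <- as.double(gsub(",", ".", values, fixed = TRUE))
  # Average all readings that fall on the same calendar day
  out <- aggregate(values, by = list(Date = as.Date(data$DateTime)), FUN = mean)
  names(out)[2] <- column
  out
}

daily_pm10 <- daily_mean(hourly, "PM10")
print(daily_pm10)
```

Because gaps count as 0, a day with many missing hours gets a lower mean than was actually measured. Dropping the gaps before averaging would be an alternative design, at the cost of days with no value at all.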
Two functions are shared by all four components: merge_by_date and drop_year_from_date_data. The drop_year_from_date_data function adds a new column named Year to the dataframe it receives; this column keeps the year of each value in the Date column. The year of every value in the Date column is then set to 2020, so that several years can be drawn on the same chart, that is, so that the data can be grouped by year and plotted as a time series over a common axis. The resulting dataframe is returned.
The merge_by_date function combines the yearly daily averages of a component into a single dataframe, using the left_join operation. Year-based values that come back as missing (NA) are set to 0.
The final version of the data obtained as output from the merge_by_date function is ready for graph drawing.
library(readxl)
# All parameter datasets for Bağcılar
Bagcilar_2019_ByHour <- read_excel("../data/Bagcilar_2019_ByHour.xlsx")
Bagcilar_2020_ByHour <- read_excel("../data/Bagcilar_2020_ByHour.xlsx")
Bagcilar_2021_ByHour <- read_excel("../data/Bagcilar_2021_ByHour.xlsx")
# All parameter datasets for Şile
Sile_2019_ByHour <- read_excel("../data/Sile_2019_ByHour.xlsx")
Sile_2020_ByHour <- read_excel("../data/Sile_2020_ByHour.xlsx")
Sile_2021_ByHour <- read_excel("../data/Sile_2021_ByHour.xlsx")
The drop_year_from_date_data function creates a new column named “Year” that holds only the year of each value in the Date column of its data parameter.
The merge_by_date function merges the yearly datasets on their shared “Date” column.
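The bodies of merge_by_date and drop_year_from_date_data do not appear in this report. The following is a minimal sketch of how they could look, based only on the descriptions and printed outputs here. It is an assumption, not the project's actual code: it uses base R `merge(..., all.x = TRUE)` as the equivalent of dplyr's `left_join`, builds a date spine from all three inputs, and uses placeholder column names (`v2019`, `v2020`, `v2021`) that the calling function renames anyway.

```r
# Hypothetical reconstruction; the original function bodies are not shown.
merge_by_date <- function(data2019, data2020, data2021) {
  # Give each yearly value column a distinct placeholder name
  names(data2019) <- c("Date", "v2019")
  names(data2020) <- c("Date", "v2020")
  names(data2021) <- c("Date", "v2021")
  # Spine with every calendar day that occurs in any of the three inputs
  merged <- data.frame(
    Date = sort(unique(c(data2019$Date, data2020$Date, data2021$Date)))
  )
  # all.x = TRUE makes merge() behave like a left join on Date
  for (yearly in list(data2019, data2020, data2021)) {
    merged <- merge(merged, yearly, by = "Date", all.x = TRUE)
  }
  # Days without a value for a given year come back as NA; store 0 instead
  for (col in setdiff(names(merged), "Date")) {
    merged[[col]][is.na(merged[[col]])] <- 0
  }
  merged
}

drop_year_from_date_data <- function(data) {
  # Keep the real year as a text label for grouping in the plots
  data$Year <- format(data$Date, "%Y")
  # Move every date into 2020 (a leap year, so 29 February stays valid)
  data$Date <- as.Date(paste0("2020-", format(data$Date, "%m-%d")))
  data
}

# Toy usage with made-up daily averages
d2019 <- data.frame(Date = as.Date(c("2019-03-01", "2019-03-02")), PM10 = c(40, 42))
d2020 <- data.frame(Date = as.Date("2020-03-01"), PM10 = 30)
d2021 <- data.frame(Date = as.Date("2021-03-01"), PM10 = 25)
result <- drop_year_from_date_data(merge_by_date(d2019, d2020, d2021))
print(result)
```

With this layout each row carries a value in only one of the yearly columns, so summing the three columns gives that day's value, and the Year column tells the plots which series the row belongs to.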
selecting_data_PM2_5 <- function(data){
# dropped the first row
data = data[-1,]
# renamed column names by order
colnames(data) <- c("DateTime","PM10","CO","NO2","PM2_5")
data <- data %>% select(DateTime, PM2_5)
return(data)
}
set_average_on_day_data_PM2_5 <- function(data) {
# set the Date column from the DateTime column, keeping only the date part
data$Date <- as.Date(data$DateTime)
# dropped the first column
data <- data[,2:3]
# set NA values to 0
data[is.na(data)] <- 0
# set string "-" data to 0
data$PM2_5 <- replace(data$PM2_5, data$PM2_5=="-",0)
# changed commas with dots to get double values for R
data$PM2_5 <- gsub("\\,", ".", data$PM2_5)
data$PM2_5 <- as.double(data$PM2_5)
# grouped data by Date column and set mean
data <- aggregate(PM2_5 ~ Date, data, mean)
return(data)
}
edit_all_data_PM2_5 <- function(data2019, data2020, data2021){
selected_PM2_5_2019 <- selecting_data_PM2_5(data2019)
selected_PM2_5_2020 <- selecting_data_PM2_5(data2020)
selected_PM2_5_2021 <- selecting_data_PM2_5(data2021)
avg_2019 <- set_average_on_day_data_PM2_5(selected_PM2_5_2019)
avg_2020 <- set_average_on_day_data_PM2_5(selected_PM2_5_2020)
avg_2021 <- set_average_on_day_data_PM2_5(selected_PM2_5_2021)
merged_data <- merge_by_date(avg_2019, avg_2020, avg_2021)
colnames(merged_data) <- c("Date","PM2_5_2019", "PM2_5_2020", "PM2_5_2021")
merged_data$allPM2_5 <- merged_data$PM2_5_2019 + merged_data$PM2_5_2020 + merged_data$PM2_5_2021
dropped_year_data <- drop_year_from_date_data(merged_data)
return(dropped_year_data)
}
selecting_data_PM10 <- function(data){
# dropped the first row
data = data[-1,]
# renamed column names by order
colnames(data) <- c("DateTime","PM10","CO","NO2","PM2_5")
data <- data %>% select(DateTime, PM10)
return(data)
}
set_average_on_day_data_PM10 <- function(data) {
# set the Date column from the DateTime column, keeping only the date part
data$Date <- as.Date(data$DateTime)
# dropped the first column
data <- data[,2:3]
# set NA values to 0
data[is.na(data)] <- 0
# set string "-" data to 0
data$PM10 <- replace(data$PM10, data$PM10=="-",0)
# changed commas with dots to get double values for R
data$PM10 <- gsub("\\,", ".", data$PM10)
data$PM10 <- as.double(data$PM10)
# grouped data by Date column and set mean
data <- aggregate(PM10 ~ Date, data, mean)
return(data)
}
edit_all_data_PM10 <- function(data2019, data2020, data2021){
selected_PM10_2019 <- selecting_data_PM10(data2019)
selected_PM10_2020 <- selecting_data_PM10(data2020)
selected_PM10_2021 <- selecting_data_PM10(data2021)
avg_2019 <- set_average_on_day_data_PM10(selected_PM10_2019)
avg_2020 <- set_average_on_day_data_PM10(selected_PM10_2020)
avg_2021 <- set_average_on_day_data_PM10(selected_PM10_2021)
merged_data <- merge_by_date(avg_2019, avg_2020, avg_2021)
colnames(merged_data) <- c("Date","PM10_2019", "PM10_2020", "PM10_2021")
merged_data$allPM10 <- merged_data$PM10_2019 + merged_data$PM10_2020 + merged_data$PM10_2021
dropped_year_data <- drop_year_from_date_data(merged_data)
return(dropped_year_data)
}
selecting_data_NO2 <- function(data){
# dropped the first row
data = data[-1,]
# renamed column names by order
colnames(data) <- c("DateTime","PM10","CO","NO2","PM2_5")
data <- data %>% select(DateTime, NO2)
return(data)
}
set_average_on_day_data_NO2 <- function(data) {
data$Date <- as.Date(data$DateTime)
# dropped the first column
data <- data[,2:3]
# set NA values to 0
data[is.na(data)] <- 0
# set string "-" data to 0
data$NO2 <- replace(data$NO2, data$NO2=="-",0)
# changed commas with dots to get double values for R
data$NO2 <- gsub("\\,", ".", data$NO2)
data$NO2 <- as.double(data$NO2)
# grouped data by Date column and set mean
data <- aggregate(NO2 ~ Date, data, mean)
return(data)
}
edit_all_data_NO2 <- function(data2019, data2020, data2021){
selected_NO2_2019 <- selecting_data_NO2(data2019)
selected_NO2_2020 <- selecting_data_NO2(data2020)
selected_NO2_2021 <- selecting_data_NO2(data2021)
avg_2019 <- set_average_on_day_data_NO2(selected_NO2_2019)
avg_2020 <- set_average_on_day_data_NO2(selected_NO2_2020)
avg_2021 <- set_average_on_day_data_NO2(selected_NO2_2021)
merged_data <- merge_by_date(avg_2019, avg_2020, avg_2021)
colnames(merged_data) <- c("Date","NO2_2019", "NO2_2020", "NO2_2021")
merged_data$allNO2 <- merged_data$NO2_2019 + merged_data$NO2_2020 + merged_data$NO2_2021
dropped_year_data <- drop_year_from_date_data(merged_data)
return(dropped_year_data)
}
# CO dataset
selecting_data_CO <- function(data){
# dropped the first row
data = data[-1,]
# renamed column names by order
colnames(data) <- c("DateTime","PM10","CO","NO2","PM2_5")
data <- data %>% select(DateTime, CO)
return(data)
}
set_average_on_day_data_CO <- function(data) {
data$Date <- as.Date(data$DateTime)
# dropped the first column
data <- data[,2:3]
# set NA values to 0
data[is.na(data)] <- 0
# set string "-" data to 0
data$CO <- replace(data$CO, data$CO=="-",0)
# changed commas with dots to get double values for R
data$CO <- gsub("\\,", ".", data$CO)
data$CO <- as.double(data$CO)
# grouped data by Date column and set mean
data <- aggregate(CO ~ Date, data, mean)
return(data)
}
edit_all_data_CO <- function(data2019, data2020, data2021){
selected_CO_2019 <- selecting_data_CO(data2019)
selected_CO_2020 <- selecting_data_CO(data2020)
selected_CO_2021 <- selecting_data_CO(data2021)
avg_2019 <- set_average_on_day_data_CO(selected_CO_2019)
avg_2020 <- set_average_on_day_data_CO(selected_CO_2020)
avg_2021 <- set_average_on_day_data_CO(selected_CO_2021)
merged_data <- merge_by_date(avg_2019, avg_2020, avg_2021)
colnames(merged_data) <- c("Date","CO_2019", "CO_2020", "CO_2021")
merged_data$allCO <- merged_data$CO_2019 + merged_data$CO_2020 + merged_data$CO_2021
dropped_year_data <- drop_year_from_date_data(merged_data)
return(dropped_year_data)
}
Bagcilar_dataset_PM2_5 <- edit_all_data_PM2_5(Bagcilar_2019_ByHour, Bagcilar_2020_ByHour, Bagcilar_2021_ByHour)
tail(Bagcilar_dataset_PM2_5)
Date PM2_5_2019 PM2_5_2020 PM2_5_2021 allPM2_5 Year
897 2020-06-15 0 0 0 0 2021
898 2020-06-16 0 0 0 0 2021
899 2020-06-17 0 0 0 0 2021
900 2020-06-18 0 0 0 0 2021
901 2020-06-19 0 0 0 0 2021
902 2020-06-20 0 0 0 0 2021
Bagcilar_dataset_PM10 <- edit_all_data_PM10(Bagcilar_2019_ByHour, Bagcilar_2020_ByHour, Bagcilar_2021_ByHour)
tail(Bagcilar_dataset_PM10)
Date PM10_2019 PM10_2020 PM10_2021 allPM10 Year
897 2020-06-15 0 0 28.30833 28.30833 2021
898 2020-06-16 0 0 14.25417 14.25417 2021
899 2020-06-17 0 0 20.12917 20.12917 2021
900 2020-06-18 0 0 0.00000 0.00000 2021
901 2020-06-19 0 0 0.00000 0.00000 2021
902 2020-06-20 0 0 0.00000 0.00000 2021
Bagcilar_dataset_NO2 <- edit_all_data_NO2(Bagcilar_2019_ByHour, Bagcilar_2020_ByHour, Bagcilar_2021_ByHour)
tail(Bagcilar_dataset_NO2)
Date NO2_2019 NO2_2020 NO2_2021 allNO2 Year
897 2020-06-15 0 0 34.34583 34.34583 2021
898 2020-06-16 0 0 23.02500 23.02500 2021
899 2020-06-17 0 0 23.57917 23.57917 2021
900 2020-06-18 0 0 0.00000 0.00000 2021
901 2020-06-19 0 0 0.00000 0.00000 2021
902 2020-06-20 0 0 0.00000 0.00000 2021
Bagcilar_dataset_CO <- edit_all_data_CO(Bagcilar_2019_ByHour, Bagcilar_2020_ByHour, Bagcilar_2021_ByHour)
tail(Bagcilar_dataset_CO)
Date CO_2019 CO_2020 CO_2021 allCO Year
849 2020-06-12 0 0 187.6708 187.6708 2021
850 2020-06-13 0 0 202.7333 202.7333 2021
851 2020-06-14 0 0 128.2458 128.2458 2021
852 2020-06-15 0 0 307.0455 307.0455 2021
853 2020-06-16 0 0 308.1833 308.1833 2021
854 2020-06-17 0 0 410.0000 410.0000 2021
The Şile dataset has no data for the CO and PM2_5 parameters, so only PM10 and NO2 are processed for Şile.
Sile_dataset_PM10 <- edit_all_data_PM10(Sile_2019_ByHour, Sile_2020_ByHour, Sile_2021_ByHour)
tail(Sile_dataset_PM10)
Sile_dataset_NO2 <- edit_all_data_NO2(Sile_2019_ByHour, Sile_2020_ByHour, Sile_2021_ByHour)
tail(Sile_dataset_NO2)
Here, we examine how PM10, PM2_5, CO and NO2 changed in Bağcılar across 2019, 2020 and 2021.
library(ggplot2)
library(viridis)
library(hrbrthemes)
library(plotly)
plot_Bagcilar_PM10 <- Bagcilar_dataset_PM10 %>%
ggplot( aes(x=Date, y=allPM10, group=Year, fill=Year, text=format(as.Date(Date), "%d-%m"))) +
geom_area( ) +
scale_fill_viridis(discrete = TRUE) +
theme(legend.position="none") +
ggtitle("Bagcilar PM10 Values between January, 2019 - June, 2021") +
ylab("PM10")+
theme_ipsum() +
theme(
legend.position = c(.95, .95),
legend.justification = c("right", "top"),
legend.box.just = "right",
legend.margin = margin(6, 6, 6, 6)
)+
theme_classic(base_size = 11)
ggplotly(plot_Bagcilar_PM10, tooltip="text")
plot_Bagcilar_PM2_5 <- Bagcilar_dataset_PM2_5 %>%
ggplot( aes(x=Date, y=allPM2_5, group=Year, fill=Year, text=format(as.Date(Date), "%d-%m"))) +
geom_area( ) +
scale_fill_viridis(discrete = TRUE) +
theme(legend.position="none") +
ggtitle("Bagcilar PM2_5 Values between January, 2019 - June, 2021") +
ylab("PM2_5")+
theme_ipsum() +
theme(
legend.position = c(.95, .95),
legend.justification = c("right", "top"),
legend.box.just = "right",
legend.margin = margin(6, 6, 6, 6)
)+
theme_classic(base_size = 11)
ggplotly(plot_Bagcilar_PM2_5, tooltip="text")
plot_Bagcilar_NO2 <- Bagcilar_dataset_NO2 %>%
ggplot( aes(x=Date, y=allNO2, group=Year, fill=Year, text=format(as.Date(Date), "%d-%m"))) +
geom_area( ) +
scale_fill_viridis(discrete = TRUE) +
theme(legend.position="none") +
ggtitle("Bagcilar NO2 Values between January, 2019 - June, 2021") +
ylab("NO2")+
theme_ipsum() +
theme(
legend.position = c(.95, .95),
legend.justification = c("right", "top"),
legend.box.just = "right",
legend.margin = margin(6, 6, 6, 6)
)+
theme_classic(base_size = 11)
ggplotly(plot_Bagcilar_NO2, tooltip="text")
plot_Bagcilar_CO <- Bagcilar_dataset_CO %>%
ggplot( aes(x=Date, y=allCO, group=Year, fill=Year, text=format(as.Date(Date), "%d-%m"))) +
geom_area( ) +
scale_fill_viridis(discrete = TRUE) +
theme(legend.position="none") +
ggtitle("Bagcilar CO Values between January, 2019 - June, 2021") +
ylab("CO")+
theme_ipsum() +
theme(
legend.position = c(.95, .95),
legend.justification = c("right", "top"),
legend.box.just = "right",
legend.margin = margin(6, 6, 6, 6)
)+
theme_classic(base_size = 11)
ggplotly(plot_Bagcilar_CO, tooltip="text")
Here, we examine how PM10 and NO2 changed in Şile across 2019, 2020 and 2021.
plot_Sile_PM10 <- Sile_dataset_PM10 %>%
ggplot( aes(x=Date, y=allPM10, group=Year, fill=Year, text=format(as.Date(Date), "%d-%m"))) +
geom_area( ) +
scale_fill_viridis(discrete = TRUE) +
theme(legend.position="none") +
ggtitle("Şile PM10 Values between January, 2019 - June, 2021") +
ylab("PM10")+
theme_ipsum() +
theme(
legend.position = c(.95, .95),
legend.justification = c("right", "top"),
legend.box.just = "right",
legend.margin = margin(6, 6, 6, 6)
)+
theme_classic(base_size = 11)
ggplotly(plot_Sile_PM10, tooltip="text")
plot_Sile_NO2 <- Sile_dataset_NO2 %>%
ggplot( aes(x=Date, y=allNO2, group=Year, fill=Year, text=format(as.Date(Date), "%d-%m"))) +
geom_area( ) +
scale_fill_viridis(discrete = TRUE) +
theme(legend.position="none") +
ggtitle("Şile NO2 Values between January, 2019 - June, 2021") +
ylab("NO2")+
theme_ipsum() +
theme(
legend.position = c(.95, .95),
legend.justification = c("right", "top"),
legend.box.just = "right",
legend.margin = margin(6, 6, 6, 6)
)+
theme_classic(base_size = 11)
ggplotly(plot_Sile_NO2, tooltip="text")
In the first graphs section, we obtained graphs comparing three years of data. In this section, the data were filtered for the summer and autumn months, and new graphs were drawn from the resulting dataframes.
In this way, two seasonal graphs were obtained. One compares the PM10, PM2_5, CO and NO2 data from Bağcılar for the autumn months of September, October and November 2020. The other compares the same components for Bağcılar during the summer months of June, July and August 2020.
Bagcilar_2020_Sonbahar_PM10 <- Bagcilar_dataset_PM10[Bagcilar_dataset_PM10[, "Date"] >= '2020-09-01' &
Bagcilar_dataset_PM10[, "Date"] <= '2020-11-30' &
Bagcilar_dataset_PM10[, "Year"] == "2020", ]
colnames(Bagcilar_2020_Sonbahar_PM10) <- c("Date","1","2","3","Values","Year")
Bagcilar_2020_Sonbahar_PM2_5 <- Bagcilar_dataset_PM2_5[Bagcilar_dataset_PM2_5[, "Date"] >= '2020-09-01' &
Bagcilar_dataset_PM2_5[, "Date"] <= '2020-11-30' &
Bagcilar_dataset_PM2_5[, "Year"] == "2020", ]
colnames(Bagcilar_2020_Sonbahar_PM2_5) <- c("Date","1","2","3","Values","Year")
Bagcilar_2020_Sonbahar_CO <- Bagcilar_dataset_CO[Bagcilar_dataset_CO[,"Date"] >= '2020-09-01' &
Bagcilar_dataset_CO[, "Date"] <= '2020-11-30' &
Bagcilar_dataset_CO[, "Year"] == "2020", ]
colnames(Bagcilar_2020_Sonbahar_CO) <- c("Date","1","2","3","Values","Year")
Bagcilar_2020_Sonbahar_NO2 <- Bagcilar_dataset_NO2[Bagcilar_dataset_NO2[, "Date"] >= '2020-09-01' &
Bagcilar_dataset_NO2[, "Date"] <= '2020-11-30' &
Bagcilar_dataset_NO2[, "Year"] == "2020", ]
colnames(Bagcilar_2020_Sonbahar_NO2) <- c("Date","1","2","3","Values","Year")
colors <- c("NO2" = "blue", "PM10" = "red", "PM2_5" = "yellow", "CO" = "black")
ggplot(NULL) +
geom_line(data = Bagcilar_2020_Sonbahar_NO2, aes(x = Date, y=Values, group=Year, color = "NO2"), size = 1.5) +
geom_line(data = Bagcilar_2020_Sonbahar_PM10, aes(x = Date, y = Values,group=Year, color = "PM10"), size = 1.5) +
geom_line(data = Bagcilar_2020_Sonbahar_PM2_5, aes(x = Date, y = Values,group=Year, color = "PM2_5"), size = 1.5) +
geom_line(data = Bagcilar_2020_Sonbahar_CO, aes(x = Date, y = Values/1000,group=Year, color = "CO"), size = 1.5) +
ggtitle("PM10, PM2_5, NO2 and CO Values of Autumn Months in 2020 for Bağcılar") +
labs(x = "Date",
y = "Values",
color = "Parameters") +
scale_color_manual(values = colors)+
theme_classic(base_size = 11)
Bagcilar_2020_Yaz_PM10 <- Bagcilar_dataset_PM10[Bagcilar_dataset_PM10[, "Date"] >= '2020-06-01' &
Bagcilar_dataset_PM10[, "Date"] <= '2020-08-31' &
Bagcilar_dataset_PM10[, "Year"] == "2020", ]
colnames(Bagcilar_2020_Yaz_PM10) <- c("Date","1","2","3","Values","Year")
Bagcilar_2020_Yaz_PM2_5 <- Bagcilar_dataset_PM2_5[Bagcilar_dataset_PM2_5[, "Date"] >= '2020-06-01' &
Bagcilar_dataset_PM2_5[, "Date"] <= '2020-08-31' &
Bagcilar_dataset_PM2_5[, "Year"] == "2020", ]
colnames(Bagcilar_2020_Yaz_PM2_5) <- c("Date","1","2","3","Values","Year")
Bagcilar_2020_Yaz_CO <- Bagcilar_dataset_CO[Bagcilar_dataset_CO[,"Date"] >= '2020-06-01' &
Bagcilar_dataset_CO[, "Date"] <= '2020-08-31' &
Bagcilar_dataset_CO[, "Year"] == "2020", ]
colnames(Bagcilar_2020_Yaz_CO) <- c("Date","1","2","3","Values","Year")
Bagcilar_2020_Yaz_NO2 <- Bagcilar_dataset_NO2[Bagcilar_dataset_NO2[, "Date"] >= '2020-06-01' &
Bagcilar_dataset_NO2[, "Date"] <= '2020-08-31' &
Bagcilar_dataset_NO2[, "Year"] == "2020", ]
colnames(Bagcilar_2020_Yaz_NO2) <- c("Date","1","2","3","Values","Year")
colors <- c("NO2" = "blue", "PM10" = "red", "PM2_5" = "yellow", "CO" = "black")
ggplot(NULL) +
geom_line(data = Bagcilar_2020_Yaz_NO2, aes(x = Date, y=Values, group=Year, color = "NO2"), size = 1.5) +
geom_line(data = Bagcilar_2020_Yaz_PM10, aes(x = Date, y = Values,group=Year, color = "PM10"), size = 1.5) +
geom_line(data = Bagcilar_2020_Yaz_PM2_5, aes(x = Date, y = Values,group=Year, color = "PM2_5"), size = 1.5) +
geom_line(data = Bagcilar_2020_Yaz_CO, aes(x = Date, y = Values/1000,group=Year, color = "CO"), size = 1.5) +
ggtitle("PM10, PM2_5, NO2 and CO Values of Summer Months in 2020 for Bağcılar") +
labs(x = "Date",
y = "Values",
color = "Parameters") +
scale_color_manual(values = colors)+
theme_classic(base_size = 11)
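The date-range filters above repeat the same three conditions for every pollutant and season. As a sketch only, they could be wrapped in one helper. The name `filter_season` and the toy data frame are assumptions for illustration and are not part of the project code.

```r
# Hypothetical helper: keep rows of one year that fall inside a date range
filter_season <- function(data, from, to, year) {
  keep <- data$Date >= as.Date(from) &
    data$Date <= as.Date(to) &
    data$Year == year
  data[keep, , drop = FALSE]
}

# Toy data in the same layout as the merged datasets (made-up values)
toy <- data.frame(
  Date   = as.Date(c("2020-08-31", "2020-09-01", "2020-11-30", "2020-09-15")),
  Values = c(1, 2, 3, 4),
  Year   = c("2020", "2020", "2020", "2019"),
  stringsAsFactors = FALSE
)

# Autumn 2020: drops the August row and the row that belongs to 2019
autumn_2020 <- filter_season(toy, "2020-09-01", "2020-11-30", "2020")
print(autumn_2020)
```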
x_date <- Bagcilar_2020_Sonbahar_PM10$Date
x_val <- Bagcilar_2020_Sonbahar_PM10$Values
PM10_Bagcilar_AGI <- as.data.frame(x_date)
Status <- case_when(
x_val >= 0 & x_val <= 55 ~ "Good",
x_val > 55 & x_val <= 155 ~ "Moderate",
x_val > 155 & x_val <= 255 ~ "Unhealthy for Sensitive Groups",
x_val > 255 & x_val <= 355 ~ "Unhealthy",
x_val > 355 & x_val <= 425 ~ "Very Unhealthy",
x_val > 425 & x_val <= 605 ~ "Hazardous",
TRUE ~ as.character(x_val)
)
PM10_Bagcilar_AGI$Status <- Status
tail(PM10_Bagcilar_AGI)
x_date Status
86 2020-11-25 Good
87 2020-11-26 Good
88 2020-11-27 Moderate
89 2020-11-28 Moderate
90 2020-11-29 Moderate
91 2020-11-30 Moderate
ggplot(PM10_Bagcilar_AGI, aes(x_date, Status, color=Status)) +
geom_point()+
theme_classic(base_size = 15)+
xlab("Date")
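The status labels above are assigned from concentration ranges. For reference, the EPA also defines a numeric sub-index for each pollutant, obtained by linear interpolation within a breakpoint range: I = (I_hi - I_lo) / (C_hi - C_lo) * (C - C_lo) + I_lo. The sketch below applies that formula to 24-hour PM10 concentrations in µg/m³. The breakpoint table reflects the EPA PM10 ranges as far as we are aware and should be checked against the current EPA technical documentation; the names `pm10_breakpoints` and `pm10_to_aqi` are illustrative assumptions.

```r
# PM10 breakpoints (24-hour, µg/m³) and the AQI range each one maps to.
# Assumed from EPA guidance; verify against the current documentation.
pm10_breakpoints <- data.frame(
  c_lo = c(0, 55, 155, 255, 355, 425, 505),
  c_hi = c(54, 154, 254, 354, 424, 504, 604),
  i_lo = c(0, 51, 101, 151, 201, 301, 401),
  i_hi = c(50, 100, 150, 200, 300, 400, 500)
)

# Hypothetical helper: PM10 concentration -> AQI sub-index
pm10_to_aqi <- function(conc) {
  conc <- floor(conc)  # PM10 concentrations are truncated to integers first
  vapply(conc, function(x) {
    idx <- which(x >= pm10_breakpoints$c_lo & x <= pm10_breakpoints$c_hi)
    # Missing or out-of-range concentrations have no defined index
    if (is.na(x) || length(idx) == 0) return(NA_real_)
    bp <- pm10_breakpoints[idx, ]
    # Linear interpolation inside the matching breakpoint range
    round((bp$i_hi - bp$i_lo) / (bp$c_hi - bp$c_lo) * (x - bp$c_lo) + bp$i_lo)
  }, numeric(1))
}

example_index <- pm10_to_aqi(c(0, 54, 55, 104.5, 154, 700))
print(example_index)
```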
We downloaded the datasets for 2019, 2020 and 2021. We chose this range because the pandemic started in Turkey in March 2020; to understand the changes since then, we need to look at the period that follows as well as the period before it. Datasets for Bağcılar and Şile were therefore downloaded from January 2019 to June 2021.
The main point examined in this study is how pollutants can affect our lives before, during and after the pandemic. We described what these pollutants are and to what extent they affect our lives, and on that basis we chose to examine PM10, PM2_5, CO and NO2.
We chose Bağcılar and Şile, two districts of Istanbul, the most populous city of Turkey, because Bağcılar is one of the most densely populated districts of Istanbul and Şile is the least densely populated.
Datasets containing the data of the pollutant measurement stations in these districts were examined.
These investigations produced two groups of graphs. In the first group, the daily average pollutant values recorded over three years for both districts, Bağcılar and Şile, were drawn on the same graph and visualized by year. In the second group, we visualized how PM10, PM2_5, CO and NO2 changed in Bağcılar during the autumn and summer seasons.
We believe the following conclusions can be drawn from the graphs. From the graphs in the first part, we observed that from the beginning of the pandemic until June 2021, pollutant values decreased when the same days of each year are compared. Pollutant values declined overall from 2019 to 2021, but the reduction can be judged properly only by comparing the same days of each year.
In the graphs of the second part, PM10 and NO2 rise and fall in a very similar way during the autumn and summer months. However, we observed that PM2_5 fluctuated much more in summer than in autumn and changed independently of the other pollutants. CO behaved independently of the other three components in both seasons and did not change much.