R Programming Course – Assignment 1: Air Pollution, Part 1

I am taking the R Programming course from the Data Science Specialization offered by Johns Hopkins University on Coursera. This blog post is a set of personal notes in which you can follow my reasoning through the exercises.

Today I am working through Part 1 of Assignment 1, “Air Pollution”. We are given a .zip file containing 332 CSV files of monitoring data for fine particulate matter (PM) air pollution at 332 locations in the United States. Each file contains data from a single monitor, and the monitor's ID number is part of the file name. Here is my walkthrough.
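Because each monitor's ID is in its file name, you can also build a monitor's path directly from its ID instead of relying on the order returned by list.files(). Here is a minimal sketch. It assumes the files are named with a zero-padded three-digit ID (001.csv to 332.csv). monitor_file() is a hypothetical helper and is not part of the assignment.

```r
# Hypothetical helper (not part of the assignment): path of a monitor file
# from its ID, assuming zero-padded three-digit names such as "007.csv".
monitor_file <- function(directory, id) {
  file.path(directory, sprintf("%03d.csv", as.integer(id)))
}

monitor_file("specdata", 7)          # "specdata/007.csv"
monitor_file("specdata", c(1, 332))  # "specdata/001.csv" "specdata/332.csv"
```

list.files() sorts names alphabetically, and zero-padded numbers sort in numeric order. So files[id] and monitor_file(directory, id) should point to the same files. The helper simply makes the ID-to-file mapping explicit.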

Part 1: pollutantmean()

Part 1 asks us to write a function pollutantmean(directory, pollutant, id = 1:332). It returns the mean of the specified pollutant ("sulfate" or "nitrate") across one or more CSV files in the given directory, selected by id, ignoring missing (NA) values.
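The pollutant argument is a column name stored in a string, so the column is selected with [, pollutant]. The monitor files also contain missing values, which is why the code in this post calls mean() with na.rm. Here is a toy illustration with made-up values (not the real specdata contents):

```r
# Made-up stand-in for one monitor file (not real data)
monitor <- data.frame(
  sulfate = c(NA, 2.5, 3.5),
  nitrate = c(0.5, NA, 1.5)
)

pollutant <- "sulfate"                    # column name held in a variable
mean(monitor[, pollutant])                # NA: one missing value spoils the mean
mean(monitor[, pollutant], na.rm = TRUE)  # 3: the NA is dropped first
```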

The results should be:

> pollutantmean("specdata", "sulfate", 1:10)
[1] 4.064
> pollutantmean("specdata", "nitrate", 70:72)
[1] 1.706
> pollutantmean("specdata", "nitrate", 23)
[1] 1.281

My first attempt:

I distinguish two cases: id refers to a single monitor, or id refers to several consecutive monitors.

pollutantmean <- function(directory, pollutant, id = 1:332) {
  files <- list.files(directory, full.names = TRUE)

  # Case where id indicates 1 file
  if (length(files[id]) == 1) {
    mean(read.csv(files[id])[, pollutant], na.rm = TRUE)
  }

  # Case where id indicates many files in a row
  else {
    datas <- data.frame()
    for (i in 1:length(files[id])) {
      datas <- rbind(datas, read.csv(files[i]))
    }
    mean(datas[, pollutant], na.rm = TRUE)
  }
}

The results are:

> pollutantmean("specdata", "sulfate", 1:10)
[1] 4.064128
> pollutantmean("specdata", "nitrate", 70:72)
[1] 0.8599547
> pollutantmean("specdata", "nitrate", 23)
[1] 1.280833

The first and third calls work, but the second does not. The mistake is that the loop iterates over 1:length(files[id]), that is, over positions 1 to n, instead of over the requested ids. That is why 1:10 returns the right answer by coincidence, while 70:72 actually returns the mean for monitors 1 to 3. Iterating directly over id fixes the problem, and all the results are now right:

## Fixed loop
for (i in id) {
  datas <- rbind(datas, read.csv(files[i]))
}
> pollutantmean("specdata", "sulfate", 1:10)
[1] 4.064128
> pollutantmean("specdata", "nitrate", 70:72)
[1] 1.706047
> pollutantmean("specdata", "nitrate", 23)
[1] 1.280833
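You don't need the data to see the bug. It is enough to compare which files each loop reads. The sketch below assumes files holds the 332 zero-padded names in the order list.files() returns them, built here with sprintf() as a stand-in:

```r
# Stand-in for list.files("specdata", full.names = TRUE)
files <- sprintf("specdata/%03d.csv", 1:332)
id <- 70:72

# Buggy loop: 1:length(files[id]) is 1:3, so it reads positions 1, 2, 3
buggy <- c()
for (i in 1:length(files[id])) buggy <- c(buggy, files[i])

# Fixed loop: iterate over the id values themselves
fixed <- c()
for (i in id) fixed <- c(fixed, files[i])

buggy  # "specdata/001.csv" "specdata/002.csv" "specdata/003.csv"
fixed  # "specdata/070.csv" "specdata/071.csv" "specdata/072.csv"
```

With id = 1:10, both loops visit the same positions, which is why the first example hid the bug.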

Next, I wanted to make the function work with non-consecutive ids. To test it, I:
– read the list of monitor files into the files vector, then bind files 23 and 26 into bind23_26 with rbind(), which appends the rows of file 26 after those of file 23 in a single data frame, and compute its nitrate mean;
– create a vector v containing ids 23 and 26 and pass it to pollutantmean().

> files <- list.files("specdata", full.names = TRUE)
> bind23_26 <- read.csv(files[23])
> bind23_26 <- rbind(bind23_26, read.csv(files[26]))
> mean(bind23_26[, "nitrate"], na.rm = TRUE)
[1] 4.169054
> v <- c(23,26)
> pollutantmean("specdata", "nitrate", v)
[1] 4.169054

Surprisingly, it works without any further change to the loop: a for loop can iterate over any vector, such as c(1, 4, 5, …), not only over a range like 1:10.
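Here is a small check of this behavior with made-up values. The loop variable takes each element of the vector in turn, whether the vector is a range, a non-consecutive set, or a single number. The last lines show why seq_along() is safer than 1:length() when the vector may be empty:

```r
# Collect the values a for loop visits for a given vector
visited <- function(v) {
  out <- c()
  for (i in v) out <- c(out, i)
  out
}

visited(c(23, 26))  # 23 26 (non-consecutive ids)
visited(23)         # 23    (a single id)

# Pitfall: 1:length() on an empty vector counts down instead of skipping
x <- c()
1:length(x)   # 1 0
seq_along(x)  # integer(0)
```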

Next, I thought I had to display the results to three decimal places (10^-3), as in the examples, but the assignment asks us not to round the values.
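The examples seem to show the same numbers printed with fewer digits, not different values. One way to reproduce that display is print(..., digits = 4), which changes only how the value is shown. This assumes 4 significant digits; I don't know how the course actually produced its output:

```r
# Results as printed above (R's default of 7 significant digits)
results <- c(4.064128, 1.706047, 1.280833)

# Display with 4 significant digits; the stored values are unchanged
for (x in results) print(x, digits = 4)
# [1] 4.064
# [1] 1.706
# [1] 1.281

# round(), by contrast, returns a new, rounded value
round(results[1], 3)  # 4.064
```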

Finally, I can remove the special case for a single id, since a for loop iterates over a vector of length 1 just as well.

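## pollutantmean.R
pollutantmean <- function(directory, pollutant, id = 1:332) {
  # Monitor files, sorted by name (001.csv, 002.csv, ...)
  files <- list.files(directory, full.names = TRUE)
  datas <- data.frame()
  # Append the rows of every requested monitor
  for (i in id) {
    datas <- rbind(datas, read.csv(files[i]))
  }
  # Mean of the pollutant over all pooled observations, ignoring NA
  mean(datas[, pollutant], na.rm = TRUE)
}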
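Growing datas with rbind() inside the loop copies the accumulated data frame at every step, which tends to get slow as the number of files grows. An alternative is to read all requested files first and bind them once with do.call(). The sketch below uses pollutantmean2, a hypothetical name, and runs on a small made-up dataset in a temporary directory rather than the real specdata files:

```r
# Alternative sketch: read all requested files, then bind them once
pollutantmean2 <- function(directory, pollutant, id = 1:332) {
  files <- list.files(directory, full.names = TRUE)
  # One data frame per requested monitor file
  frames <- lapply(files[id], read.csv)
  # Single rbind() over the whole list instead of growing a data frame
  datas <- do.call(rbind, frames)
  mean(datas[, pollutant], na.rm = TRUE)
}

# Made-up demo data: two small monitor files in a temporary directory
dir <- file.path(tempdir(), "specdata_demo")
dir.create(dir, showWarnings = FALSE)
write.csv(data.frame(sulfate = c(1, NA, 3), nitrate = c(2, 4, NA)),
          file.path(dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(sulfate = c(5, 7), nitrate = c(NA, 6)),
          file.path(dir, "002.csv"), row.names = FALSE)

pollutantmean2(dir, "sulfate", 1:2)  # mean of 1, 3, 5, 7: 4
pollutantmean2(dir, "nitrate", 2)    # 6
```

Like the loop version, this pools all observations before averaging rather than averaging per-monitor means. That pooled mean is what produced the expected results above.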

Part 2: complete()
Part 3: corr()
