# R Programming Course – Assignment 1 : Air Pollution Part 1

I am taking the R Programming course from the Data Science Specialization offered by Johns Hopkins University on Coursera. This blog post is a set of personal notes that follows my reasoning through the exercises.

Today I try to complete Part 1 of Assignment 1, “Air Pollution”. We are given a .zip file that contains 332 CSV files with pollution monitoring data for fine particulate matter (PM) air pollution at 332 locations in the United States. Each file contains data from a single monitor, and the ID number of each monitor is contained in the file name. Here is my walkthrough.

## Part 1 : pollutantmean()

Part 1 is about writing the function pollutantmean(directory, pollutant, id = 1:332), which returns the mean of a specified pollutant across one or more CSV files (selected by id) in the specified directory.

The results should be:

```
> pollutantmean("specdata", "sulfate", 1:10)
[1] 4.064
> pollutantmean("specdata", "nitrate", 70:72)
[1] 1.706
> pollutantmean("specdata", "nitrate", 23)
[1] 1.281
```

### My attempt

There are two cases: either id refers to a single monitor, or id refers to several consecutive monitors.

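```
pollutantmean <- function(directory, pollutant, id = 1:332) {
  files <- list.files(directory, full.names = TRUE)

  # Case where id indicates 1 file
  if (length(files[id]) == 1) {
    # Assumed body: read the single file and average the requested column
    datas <- read.csv(files[id])
    mean(datas[, pollutant], na.rm = TRUE)
  }

  # Case where id indicates many files in a row
  else {
    datas <- data.frame()
    # Note: i runs over 1..length(files[id]), not over id itself
    for (i in 1:length(files[id])) {
      # Assumed body: append file i to the accumulated data frame
      datas <- rbind(datas, read.csv(files[i]))
    }
    mean(datas[, pollutant], na.rm = TRUE)
  }
}
```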

Results are:

```
> pollutantmean("specdata", "sulfate", 1:10)
[1] 4.064128
> pollutantmean("specdata", "nitrate", 70:72)
[1] 0.8599547
> pollutantmean("specdata", "nitrate", 23)
[1] 1.280833
```

The first and the third requests work, but not the second one. The mistake is that the loop always starts at i = 1 instead of iterating over the given ids: 1:10 returns the right answer because the positions and the ids coincide, but 70:72 actually reads files 1 to 3, since length(files[70:72]) is 3. With the loop fixed, all the results are right:

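```
## Fixed loop
for (i in id) {
  # Assumed body: append file i to the accumulated data frame
  datas <- rbind(datas, read.csv(files[i]))
}
```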
```
> pollutantmean("specdata", "sulfate", 1:10)
[1] 4.064128
> pollutantmean("specdata", "nitrate", 70:72)
[1] 1.706047
> pollutantmean("specdata", "nitrate", 23)
[1] 1.280833
```
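
The indexing mistake can be reproduced without the data files. The sketch below is only an illustration: the `files` vector is a hypothetical stand-in for what `list.files()` returns, and the zero-padded names such as `001.csv` are an assumption about how the monitor files are named.

```r
# Hypothetical stand-in for the vector returned by list.files()
files <- sprintf("%03d.csv", 1:332)
id <- 70:72

# Buggy loop: i takes the values 1, 2, 3 (positions), not 70, 71, 72
buggy <- character(0)
for (i in 1:length(files[id])) {
  buggy <- c(buggy, files[i])
}

# Fixed loop: i takes the requested ids themselves
fixed <- character(0)
for (i in id) {
  fixed <- c(fixed, files[i])
}

print(buggy)  # "001.csv" "002.csv" "003.csv"
print(fixed)  # "070.csv" "071.csv" "072.csv"
```

With `id = 1:10` both loops visit the same files, which is why the first request looked correct.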

Next, I check whether the function also works with non-consecutive ids. I do the following:
- Read the list of monitor files into the files vector, then bind files 23 and 26 into the bind23_26 data frame (this appends the rows of file 26 right after the rows of file 23 in a single data.frame).
- Create a vector containing id = 23 and id = 26 and pass it to pollutantmean().

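```
> files <- list.files("specdata", full.names = TRUE)
> # Assumed step: bind files 23 and 26 into one data frame
> bind23_26 <- rbind(read.csv(files[23]), read.csv(files[26]))
> mean(bind23_26[, "nitrate"], na.rm = TRUE)
[1] 4.169054
> v <- c(23, 26)
> pollutantmean("specdata", "nitrate", v)
[1] 4.169054
```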

Surprisingly, it works without any further change to the loop. I learned that a for loop can iterate over any vector, for example (i in c(1, 4, 5, …)).
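
A small sketch of that behaviour, independent of the assignment data (the names `ids` and `visited` are made up for the illustration):

```r
# A for loop walks over the elements of a vector, whatever they are
ids <- c(23, 26)
visited <- c()
for (i in ids) {
  visited <- c(visited, i)
}
print(visited)  # 23 26

# A single number is a vector of length one, so the same loop runs once
count <- 0
for (i in 23) {
  count <- count + 1
}
print(count)  # 1
```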

Next, I thought I had to display the results with three decimal places (10^-3), just like the example, but the assignment asks not to round the values…

Finally, I can remove the special case where id is a single element, since a for loop handles a vector of length 1 just as well.
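
The difference between displaying fewer digits and actually rounding can be sketched as follows. This is only an illustration; the expected output in the example was presumably printed with fewer significant digits, which is an assumption on my side.

```r
x <- 4.064128

# Display only: prints 4.064, but x keeps all of its digits
print(x, digits = 4)
shown <- format(x, digits = 4)  # the character string "4.064"

# Rounding: creates a new, different value
rounded <- round(x, 3)

print(x == rounded)  # FALSE, the stored values differ
```

So the function can return the unrounded mean and the display can be adjusted separately if needed.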

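```
## pollutantmean.R
pollutantmean <- function(directory, pollutant, id = 1:332) {
  files <- list.files(directory, full.names = TRUE)
  datas <- data.frame()
  for (i in id) {
    # Assumed body: append file i to the accumulated data frame
    datas <- rbind(datas, read.csv(files[i]))
  }
  mean(datas[, pollutant], na.rm = TRUE)
}
```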
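
As a possible variation (not the version I submitted), the files could be read with `lapply()` and bound once with `do.call(rbind, ...)` instead of growing the data frame inside the loop. The sketch below is self-contained: `pollutantmean2` is a hypothetical name, and the tiny demo directory, its file names and its columns are made up so that the function can be tried without the real data.

```r
# Alternative sketch: read all requested files, then bind them in one step
pollutantmean2 <- function(directory, pollutant, id = 1:332) {
  files <- list.files(directory, full.names = TRUE)
  tables <- lapply(files[id], read.csv)   # one data frame per file
  all_rows <- do.call(rbind, tables)      # stack them row-wise
  mean(all_rows[[pollutant]], na.rm = TRUE)
}

# Tiny synthetic directory (file names and columns are assumptions)
demo_dir <- file.path(tempdir(), "specdata_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(sulfate = c(1, NA, 3), nitrate = c(2, 4, NA)),
          file.path(demo_dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(sulfate = c(5, 7), nitrate = c(NA, 6)),
          file.path(demo_dir, "002.csv"), row.names = FALSE)

res_sulfate <- pollutantmean2(demo_dir, "sulfate", 1:2)  # mean of 1, 3, 5, 7
res_nitrate <- pollutantmean2(demo_dir, "nitrate", 2)    # only non-NA value is 6
print(res_sulfate)  # 4
print(res_nitrate)  # 6
```

Both versions rely on `list.files()` returning the files in an order that matches the monitor ids, so `files[i]` is the file of monitor i only if the names sort that way.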