Learning *apply() Functions

This article briefly summarizes what I’ve learned about the *apply functions.

split() and lapply()

We’ll work on the Air Quality *.csv file from the Data Science Specialization course:

> airquality <- read.csv("rcourse/hw1_data.csv")
> head(airquality)

We want to create a matrix that contains the monthly means of each column:

> s <- split(airquality, airquality$Month) # Creates a list of 5 data frames, one for each of the 5 months covered (May to September).
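
To see what split() returned, it can help to inspect the list directly. This is only a sketch: it assumes that R's built-in airquality data set (from the datasets package) has the same content as the course CSV file, and it uses that data set so the example runs without the file.

```r
# Assumption: the built-in datasets::airquality stands in for hw1_data.csv
library(datasets)
aq <- datasets::airquality

# One data frame per month, named after the values found in Month
s <- split(aq, aq$Month)

class(s)          # "list"
names(s)          # "5" "6" "7" "8" "9"
sapply(s, nrow)   # number of rows (days) in each monthly data frame
```

The names of the list come from the grouping values, so a single month can be pulled out with s[["5"]] or s$`5`.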

Since every element of the list has the same columns, colMeans() returns a vector of the same length for each month, so we can use sapply() to assemble the results into a matrix. sapply() simplifies the result of lapply() whenever it can. For a data frame x, colMeans(x) is a shortcut for apply(x, 2, mean). Because the data set contains some NA values, we drop them by setting na.rm to TRUE.

> sapply(s, colMeans, na.rm = TRUE)
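
The relationship between lapply(), sapply() and colMeans() can be checked side by side. Again a sketch only, assuming the built-in airquality data set matches the course file; the names means_list, means_matrix and may are mine.

```r
# Assumption: the built-in datasets::airquality stands in for hw1_data.csv
library(datasets)
s <- split(datasets::airquality, datasets::airquality$Month)

# lapply() always returns a list: one named numeric vector per month
means_list <- lapply(s, colMeans, na.rm = TRUE)

# sapply() runs the same computation, then simplifies the
# equal-length vectors into a matrix (one column per month)
means_matrix <- sapply(s, colMeans, na.rm = TRUE)
dim(means_matrix)   # 6 rows (data columns) by 5 columns (months)

# For a single data frame, colMeans() and apply(x, 2, mean) agree
may <- s[["5"]]
all.equal(colMeans(may, na.rm = TRUE), apply(may, 2, mean, na.rm = TRUE))
```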

If we only need some of the columns, we can use an anonymous function:

> sapply(s, function(x) colMeans(x[, c("Ozone", "Solar.R", "Wind")], na.rm = TRUE))
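
A stricter member of the same family is vapply(), which takes a FUN.VALUE template and stops with an error if a call returns something of a different type or length. This is a sketch of the same computation written that way, not something from the course; it assumes the built-in airquality data set, and the names cols and m are mine.

```r
# Assumption: the built-in datasets::airquality stands in for hw1_data.csv
library(datasets)
s <- split(datasets::airquality, datasets::airquality$Month)
cols <- c("Ozone", "Solar.R", "Wind")

# FUN.VALUE declares that every call must return 3 numeric values
m <- vapply(s,
            function(x) colMeans(x[, cols], na.rm = TRUE),
            FUN.VALUE = numeric(3))

dim(m)   # 3 rows (selected columns) by 5 columns (months)
```

Unlike sapply(), which quietly falls back to a list when the results cannot be simplified, vapply() always returns the shape that was declared.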

---

Some exercises

> library(datasets)
> data(iris)

“This famous (Fisher’s or Anderson’s) iris data set gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris. The species are Iris setosa, versicolor, and virginica.”

In this dataset, what is the mean of ‘Sepal.Length’ for the species virginica?

> virginica <- subset(iris, Species == "virginica")
> mean(virginica[, "Sepal.Length"]) # Sepal.Length is the first column, so virginica[, 1] also works
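
The same question can also be answered for all three species at once with tapply(), another member of the *apply family. A minimal sketch; the name species_means is mine.

```r
library(datasets)

# Group Sepal.Length by Species and take the mean of each group
species_means <- tapply(iris$Sepal.Length, iris$Species, mean)
species_means["virginica"]   # 6.588

# The same idea written with split() and sapply()
sapply(split(iris$Sepal.Length, iris$Species), mean)
```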

How can one calculate the average miles per gallon (mpg) by number of cylinders in the car (cyl)?

> split(mtcars$mpg, mtcars$cyl)
$`4`
 [1] 22.8 24.4 22.8 32.4 30.4 33.9 21.5 27.3 26.0 30.4 21.4

$`6`
[1] 21.0 21.0 21.4 18.1 19.2 17.8 19.7

$`8`
 [1] 18.7 14.3 16.4 17.3 15.2 10.4 10.4 14.7 15.5 15.2 13.3 19.2 15.8 15.0

> sapply(split(mtcars$mpg, mtcars$cyl), mean)
       4        6        8
26.66364 19.74286 15.10000
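
The split() plus sapply() pattern from the first section carries over when more than one column is of interest. The sketch below averages mpg and hp per cylinder count; choosing hp as the second column is only for illustration, and the names by_cyl and m are mine.

```r
library(datasets)

# Split two columns of mtcars by cylinder count,
# then average each column within each group
by_cyl <- split(mtcars[, c("mpg", "hp")], mtcars$cyl)
m <- sapply(by_cyl, colMeans)

m["mpg", ]   # same mpg means as sapply(split(mtcars$mpg, mtcars$cyl), mean)
```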