Friday, January 13, 2012

R tips - Using For loops in R

Using for loops in R can significantly shorten your scripts. The process is pretty simple, and maybe because of that, tips on the topic are rather scattered: it is more or less assumed that everyone already knows how to write these loops. However, for those of us with no programming background, the philosophy behind them is rather unfamiliar. In this post I'll try to clarify the usage of for loops in R with some simple examples.

The anatomy of a for loop is as follows:

for(for loop parameter in start (number):end (number)){
actual script using for loop parameter
}

You can replace for loop parameter with any valid R name (a name cannot start with a number). People often use i, j, k or l. This is fine, but a, f, g, blurp or fk will work just as well. I like to use single letters, because they are short and require less typing.
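As a minimal sketch of the anatomy above, the loop below uses blurp as the for loop parameter and stores each result in a vector called squares. Both names are made up for illustration; any valid name would do.

```r
# 'blurp' is an arbitrary loop parameter name chosen for illustration
squares <- numeric(5)            # container for the results
for (blurp in 1:5) {
  squares[blurp] <- blurp^2      # use the parameter like any other variable
}
print(squares)
```

Inside the curly brackets the parameter behaves like an ordinary variable that takes a new value on every round.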

Start (number) and end (number) must be replaced with either a number or an expression that returns a single number. For example ncol(x), ncol(x)-1, length(x)/mean(x$y) and grep("column name", colnames(x)) are all fine as long as they make sense for you. Here x is your data frame.
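To illustrate, here is a small sketch where both the start and the end of the loop come from functions instead of typed numbers. It assumes the built-in CO2 data set; the names first_col, last_col and classes are hypothetical helpers introduced only for this example.

```r
data(CO2)
x <- data.frame(CO2)

first_col <- grep("conc", colnames(x))  # position of the column named "conc"
last_col <- ncol(x)                     # number of columns in x

# Collect the class of every column from "conc" to the last column
classes <- character(0)
for (i in first_col:last_col) {
  classes[colnames(x)[i]] <- class(x[, i])[1]
}
print(classes)
```

Printing first_col and last_col before running the loop is an easy way to check that they really are the numbers you expect.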

Essentially, you should first write a script that performs the process you want to loop a single time. In this script you have to refer to columns, rows or vector elements by number instead of by name. Then you just replace the number with the for loop parameter. Let's get started:

data(CO2)


df <- data.frame(CO2)


str(df)


# I want to change the class of all columns to "factor". Using column numbers, I'd do it as follows:


df[,1] <- as.factor(df[,1])


# Then just replace "1" with a for loop parameter:



for(i in grep("conc", colnames(df)):ncol(df)){
  df[,i] <- as.factor(df[,i])
}


# Here I used the grep command to start the loop from the 4th column, since columns 1-3 are already factors.


str(df)



# You can do a lot of things with these loops. For example, make multiple figures:


df <- data.frame(CO2)
x <- levels(df$Plant)


for(i in seq_along(x)){ # seq_along(x) is 1:length(x), but safe if x is empty
  y <- df[df$Plant == x[i],] # rows for one plant
  png(paste(x[i], "_plot", ".png", sep = ""), width = 450, height = 450)
  plot(y$conc, y$uptake, type = "b", col = "red", main = paste(x[i], "plot and blaa", sep = " - "))
  dev.off() # close the png file
}


# Write files


df <- data.frame(CO2)
x <- levels(df$Plant)


for(i in seq_along(x)){
  y <- df[df$Plant == x[i],] # rows for one plant
  write.table(y, paste(x[i], "_data", ".txt", sep = ""), sep = "\t", row.names = FALSE)
}


# Read several files and append them to a data frame (the same can be done for a vector, but use the append() command)


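path <- getwd() # You can change this as you like, remember to use / instead of \ in the file path
files <- dir(path, pattern = "_data\\.txt$", full.names = TRUE) # full.names = TRUE keeps the folder in the file name, so reading works even if path is not the working directory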


df <- data.frame()
for(i in seq_along(files)){ # seq_along() skips the loop if no files were found
  x <- read.delim(files[i])
  df <- rbind(df, x)
}
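The vector version mentioned above could look roughly like the sketch below. The names values and new_values are hypothetical, and the numbers are a stand-in for whatever you would read or calculate on each round.

```r
values <- c()                          # start with an empty vector
for (i in 1:3) {
  new_values <- i * c(1, 10)           # stand-in for data produced on round i
  values <- append(values, new_values) # add the new elements to the end
}
print(values)
```

Growing an object inside a loop like this is fine for small jobs; for thousands of rounds it gets slow, because the whole vector is copied every time.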

And a lot more. You can also put loops inside loops. When writing these loops, a good tip is to start with

i <- 1

for instance. Then you can write the script with "i" already in its place. Test it. If it works, add the for(){} command. If not, you still have time to fix it without crashing your computer (this can happen if you make a mistake, so save your files frequently).
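A loop inside a loop could look something like the sketch below, which calculates the mean uptake for every combination of Type and Treatment in the CO2 data. The names types, treatments and means are made up for this example. Following the tip above, you could first set i <- 1 and j <- 1 and run only the two innermost lines to check that they work.

```r
data(CO2)
df <- data.frame(CO2)
types <- levels(df$Type)
treatments <- levels(df$Treatment)

# Result matrix: one row per Type, one column per Treatment
means <- matrix(NA_real_, nrow = length(types), ncol = length(treatments),
                dimnames = list(types, treatments))

for (i in seq_along(types)) {          # outer loop goes through the types
  for (j in seq_along(treatments)) {   # inner loop goes through the treatments
    y <- df[df$Type == types[i] & df$Treatment == treatments[j], ]
    means[i, j] <- mean(y$uptake)
  }
}
print(means)
```

Note that the two loops need different parameter names (here i and j), otherwise the inner loop would overwrite the parameter of the outer one.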
