Wednesday, November 3, 2010

R tips - How to calculate diet indices from time-series data?

Ok, since I am supposed to write also about my work here, I'll start a new series called "R tips". R is an open source statistical program useful especially for nature scientists. This might be very nerdy and useless "shait" for the most, but I hope that some lucky Googlers find these tips useful. At least I have found this kind of blog entries very helpful when doing my work.

Lately I have been working with a seabird diet time-series. I have learned that there are tens of different indices that can be used to describe the dataset. Best fitting index has to be chosen individually depending on the questions and data. Swanson & Krapu (1974) made a review of the issue. Here I am going to demonstrate how to calculate "frequency of occurrence", "aggregate mass", "aggregate percentage" and mean mass for a time-series.

Shortly, frequency of occurrence tells how frequently a diet item occurs in the dataset. The aggregate mass tells the percentage for a diet item of summed total volume of all samples. Finally, the mean of mass percentages, or aggregate percentage, tells the average over the dataset of how many percentages a diet item constitutes of a stomach sample. This is much better explained in Swanson & Krapu 1974.

There are of course thousands of ways to do this, but simplest I have found so far is by using "reshape" package. Here is the R code:

#Assume a dataset where "a", "b" and "c" are diet items, "x1", "x2" ... are diet samples from individual birds and "y1" and "y2" are different years. Values for diet items are given in grams:

a <- c(1, 1, 2, 0, 3, 0, 5, 6, 1, 2)
b <- c(2, 1, 3, 2, 5, 2, 3, 5, 1, 0)
c <- c(0, 0, 1, 2, 4, 0, 20, 0, 0, 0)
year <- c(rep("y1", 5), rep("y2", 5))
bird <- paste(rep("x", 10), seq(1,10,1), sep="")

data <- cbind.data.frame(bird, year, a, b, c)

library(reshape)  # start the reshape package

data$total.mass <- rowSums(data[3:5]) # create total.mass column for later use
melt.data <- melt(data, id=1:2, measured=3:6) # melt the data so that we can cast it in desired format

# start from easiest, mean mass and standard deviation.

t(cast(melt.data, formula = year ~ variable, mean)) #Transposing t() because it is better to list diet items as rows when there are a lot of prey items
t(cast(melt.data, formula = year ~ variable, sd))

# aggregate mass needs column sum and total mass for each year

mass.sum <- cast(melt.data, formula = year ~ variable, sum)
aggregate.mass <- (mass.sum[2:5]/mass.sum$total.mass)*100

#if you want to print the dataframe, you can do following:

rownames(aggregate.mass) <- levels(mass.sum$year)
print(t(aggregate.mass), digits = 1)

#aggregate percentage

data.per <- cbind.data.frame(data[1:2], (data[3:6]/data$total.mass)*100) # calculate how big percentage each diet item consitutes in each sample
data.per <- data.per[-6] # drop off useless total.mass column
melt.data.per <- melt(data.per, id=1:2, measured=3:5)
aggregate.percentage <- cast(melt.data.per, formula = year ~ variable, mean)

#frequency of occurrence

data.na <- data
data.na[data.na==0] <- NA
melt.data.na <- melt(data.na, id=1:2, measured=3:6, na.rm = T)
occur <- cast(melt.data.na, formula = year ~ variable, length)
t(cbind.data.frame(occur[2:4]/occur$total.mass, row.names = levels(occur$year)))

#comparison between aggregate percentage and aggregate mass

rownames(aggregate.percentage) <- levels(aggregate.percentage$year)
aggregate.percentage <- aggregate.percentage[-1]
aggregate.mass <- aggregate.mass[-4]

aggregate.percentage-aggregate.mass

# differences are mainly because of 20 grams of item c in bird x7. The aggregate volume method gives equal weight to each unit of food consumed by any bird while the aggregate percent method gives equal weight in the analysis to each bird as Swanson & Krapu (1974) says.

Please tell, if there are mistakes in scripts.

References

Swanson, G. A., G. L. Krapu, et al. (1974). "Advantages in Mathematically Weighting Waterfowl Food-Habits Data." Journal of Wildlife Management 38(2): 302-307.

Read more...