Simple models for abundance
Ecology
is fundamentally the study of the distribution and abundance of living things,
and a core goal is to understand the factors that influence the spatial and
temporal variation in distribution and abundance patterns. Over the last few
months, I have had a few people asking me how to model abundance, so I decided
to make a quick post on the topic.
Abundance is frequently measured as a count, such as the number of individuals of a target species seen in a plot or transect. Commonly, you will also record some features of the site that you think may influence the abundance of the target species – vegetation cover, for example. The Poisson distribution is the fundamental probability distribution to model abundance data, as it supports discrete positive values (e.g., counts). The Poisson-log Generalised Linear Model is the most common regression model for studying the relationship between abundance and the factors of interest. In a Poisson-log GLM, the relationship between abundance and the factors of interest is linear on the log-scale, and the abundance follows a Poisson distribution with mean abundance λi:
log(λi) = intercept + slope * xi; or equivalently λi = exp(intercept +
slope * xi)
Abundance ~ Poisson(λi)
where xi is the factor of interest. Let’s do an
example in R, where we are interested in how the vegetation cover in a plot,
measured as a proportion of the plot covered by vegetation, influences the
abundance of Australian huntsman spiders.
![]() |
A cute and adorable huntsman spider on Newland Head Conservation Park, South Australia |
First of all, let’s simulate some data in R:
#### sample.size: number of plots surveyed for huntsman, n = 30
set.seed(1000)
sample.size<-30
#### Simulate vegetation cover data for the survey plots from a
Beta distribution
veg.cov<-rbeta(sample.size,
0.5, 0.5)
####
Now, let’s define the parameters of the Poisson-log GLM for the relationship between
#### the huntsman abundance and the vegetation cover
#### the huntsman abundance and the vegetation cover
intercept.sim=-2 #### Intercept
slope.sim=5 #### Slope
slope.sim=5 #### Slope
####
Define the mean huntsman abundance λ as a function of the vegetation cover
lambda<-exp(intercept.sim+slope.sim*veg.cov)
####
Finally, obtain the huntsman abundance in each and every survey plot by drawing values
#### from a Poisson distribution with mean λ
abundance<-rpois(sample.size,
lambda)
#####
Let’s have a look at the relationship between the huntsman abundance and the
####
vegetation cover
plot(abundance~veg.cov, xlab="Vegetation cover (%)", ylab="Huntsman abundance", pch=19, cex=3)
Now, it is time for the formal modelling to answer the
question, does vegetation cover influences the huntsman abundance? We respond to the question by doing a Poisson-log GLM on the simulated data.
####
Fit a Poisson-log GLM to the huntsman abundance (dependent variable) and
####
vegetation cover data (independent).
fit1<-glm(abundance~veg.cov,
family=poisson())
####
Have a look at the summary Table.
summary(fit1)
The
‘Estimate’ column shows that ‘veg.cov’ indeed has a strong positive effect, and
the column 'Pr(>|z|)’ indicates that the effect is
statistically significant (p <0.05). Therefore, we conclude that
the huntsman abundance increases with increasing vegetation cover.
We have simulated the data and that means that we know exactly the values of the intercept
(intercept.sim = -2) and the slope (slope.sim = 5) of the Poisson-log GLM. Is the algorithm
able to recover these values? That is, how similar are the values for the
intercept and the slope estimated by the fitted model to the
simulated values? Obviously, the better the best. Having a look at the column ‘Estimate’,
we can see that the values compare well. To further confirm the reliability of
the fitted model, we can calculate the 95% Confidence Intervals for the
intercept and the slope for the fitted model:
####
Calculate the 95% Confidence Intervals
conf.int<-confint(fit1)
conf.int
Again, we see that our simulated values fall well within
the 95% Confidence Intervals drawn from the fitted model. We conclude that our
fitted model is good, and that it recovers the coefficients adequately.
This was a very simple, quick, example. Modelling abundance can be way more difficult and complicated. For example, here I have assumed that the huntsman abundance is perfectly observed, that means that we have counted all the spiders present in each and every survey plot. The reality is that this is rarely the case and that we commonly do not detect all the individuals present in a place. There are statistical tools to deal with this kind of biases, but those are an entirely different world.
An annotated R script for conducting the Poisson-log GLM can be found in my GitHub repository: https://github.com/pablogarciadiaz/Code-Blog/blob/master/Poisson-Regression.R
No comments:
Post a Comment