Thursday, 17 December 2015

Modelling tips II

A few tips for modelling and analysing ecological data. 

Episode II

Let's continue with the second part of my personal tips for analysing ecological data. Most of you will read through this post and think that it amounts to no more than a collection of motherhood statements and very basic recommendations. That might be true, but it strikes me how important and simple these tips are, and yet I keep finding work that would have benefited from applying them to the analyses presented.

I am currently in Spain. It is a bit colder here than it was in Adelaide when I flew out. 

4. Your model must abide by the fundamental laws of the universe. This may sound like a joke to mess with you, but people seem to forget about this rule far too often. If your model structure, model settings and assumptions, or model results violate the fundamental laws of the universe, then your model is inadequate. One example: you are trying to model the relationship between the number of cockroaches counted in a set of plots and the soil characteristics in those plots. You are using a linear regression, which is not the best choice. A linear regression models a continuous response, so under the model it is possible to have fractions of individuals in a plot (e.g., 2.34 cockroaches), or even a negative number of them. Obviously, it is impossible to have a fractional or negative number of cockroaches in a plot. A better choice would be a Poisson generalised linear model, which is more suitable for counts.
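To make the cockroach example concrete, here is a minimal sketch in plain Python (my own script for the regression further down is in R). Everything in it is made up for illustration: the `moisture` and `counts` values, and the helper names `fit_linear` and `fit_poisson`. The Poisson model is fitted with a simple Newton-Raphson loop, which is only one of several ways to do it and not a substitute for a proper statistical package.

```python
import math

# Hypothetical data (not real field data): soil moisture (%) and the
# number of cockroaches counted in ten plots.
moisture = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50]
counts = [0, 0, 1, 0, 2, 3, 2, 6, 9, 14]


def fit_linear(x, y):
    """Ordinary least squares for y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def fit_poisson(x, y, max_iter=50, tol=1e-10):
    """Poisson GLM with log link, log(mu) = a + b * x, by Newton-Raphson."""
    mx = sum(x) / len(x)
    xc = [xi - mx for xi in x]  # centre the predictor for stability
    a, b = math.log(sum(y) / len(y)), 0.0
    for _ in range(max_iter):
        mu = [math.exp(a + b * xi) for xi in xc]
        # Gradient of the log-likelihood with respect to a and b.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, xc))
        # Information matrix (2 x 2), inverted by hand.
        h00 = sum(mu)
        h01 = sum(mi * xi for mi, xi in zip(mu, xc))
        h11 = sum(mi * xi * xi for mi, xi in zip(mu, xc))
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    # Convert the intercept back to the uncentred predictor scale.
    return a - b * mx, b


lin_a, lin_b = fit_linear(moisture, counts)
pois_a, pois_b = fit_poisson(moisture, counts)

linear_pred = [lin_a + lin_b * m for m in moisture]
poisson_pred = [math.exp(pois_a + pois_b * m) for m in moisture]

for m, lp, pp in zip(moisture, linear_pred, poisson_pred):
    print(f"moisture={m:2d}  linear={lp:6.2f}  poisson={pp:6.2f}")
```

With these made-up numbers the straight line predicts a negative number of cockroaches in the driest plots. The fitted Poisson means are not whole numbers either, but they are expected values of a distribution defined only on 0, 1, 2, and so on, and they can never drop below zero.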

Violating the laws of the universe: 1.5 individuals in a single photo, although one of them is not having a very good day. This is a Brown falcon having an Eastern brown snake for lunch (Roseworthy, SA).

5. Check your model results against your background knowledge of the system that you are modelling. Use common sense to evaluate the outputs from your model. Do the results make sense to you? It does not matter that metrics such as R² or p-values suggest that the model is good. If the results do not make sense, go and check your model, your computer code, and your data. You may need to change your modelling approach.

6. Be very wary of any result from your model that contradicts what you know about the system or the existing knowledge of similar situations. I can assure you that 99.9% of the time when you get a very surprising and striking result, there is a problem with the model, the code used for the model, or something along those lines. It is highly unlikely that you have discovered something groundbreaking and unexpected. Carefully check your model, the code, and your data. Run the analyses several times to check the consistency of your results.
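One possible way to "run the analyses several times" is to refit the model while leaving out one observation at a time, and see whether the striking result survives. The sketch below is in plain Python and is only an illustration: the data are invented (nine unremarkable plots plus one extreme record, such as a data-entry typo), and the helper name `slope_and_r2` is hypothetical. It is not the script behind the figure in this post.

```python
def slope_and_r2(x, y):
    """Slope and R-squared of a simple linear regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx, sxy ** 2 / (sxx * syy)


# Hypothetical data: the last record is suspiciously extreme.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
y = [5, 3, 6, 4, 5, 3, 6, 4, 5, 210]

full_slope, full_r2 = slope_and_r2(x, y)
print(f"all data: slope={full_slope:.3f}  R2={full_r2:.3f}")

# Refit with each observation left out in turn.
leave_one_out = []
for i in range(len(x)):
    xs = x[:i] + x[i + 1:]
    ys = y[:i] + y[i + 1:]
    slope, r2 = slope_and_r2(xs, ys)
    leave_one_out.append((i, slope, r2))
    print(f"without record {i}: slope={slope:.3f}  R2={r2:.3f}")
```

With these invented numbers the R² is above 0.99 when all records are used and collapses to nearly zero once the last record is dropped. A result that hinges on a single observation is a good reason to go back and check the data.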

Wow! This linear regression is awesome! Look at that r-squared value and the p-value for the slope - this would ensure that you get a paper in Nature! Too good to be true, unfortunately... better go and check the model, the code, and the data carefully. Moreover, you must re-run the analysis to check that the results are consistent. The code is available here.   

Again, remember that a model is a representation of how you think your system works. It is your choice and your responsibility to propose an adequate model for researching your ecological system. A statistical analysis is not a black box where you input some data and obtain an unbiased, reliable solution to your question. Rather, you control the choices and, hence, the robustness of the results depends on those choices. I know that this sounds a bit overwhelming for beginners, but trust me when I say that with some time and dedication the task becomes easier (and even entertaining for some of us).

An annotated R script for producing the linear regression can be found in my GitHub repository:
