I have recently been playing with spatial data as a learning exercise, and I have been particularly interested in point pattern analysis, which is often used in epidemiology and is developing in ecology. While some packages already have canned functions for this (see the excellent inlabru or the very well-known spatstat), I prefer not to rely on them. I wanted to improve my understanding of point pattern models, so I decided to use rstan, where I need to code my model from scratch. Stan is a probabilistic programming language for Bayesian inference using the Hamiltonian Monte Carlo algorithm.

Point pattern and point process

There are many resources that introduce point patterns and point processes, but I will quickly explain these two notions here.

Point pattern

A point pattern represents the distribution of a set of points in time, space or higher dimensions. For instance, the locations of trees in a forest can be thought of as a point pattern. The locations of crimes are another example of a point pattern. There are three general patterns:

  • Random : any point is equally likely to occur at any location, and the position of a point is not affected by the positions of other points. For example, if I throw a bag of marbles on the floor, the resulting pattern will likely be random.
  • Uniform : every point is as far from its neighbors as possible. For example, we can think of a human-made forest where trees are regularly spaced.
  • Clustered : many points are concentrated close together, possibly due to a covariate. Take the example of bee locations in a field: the locations will likely cluster around flowers. The point pattern that we simulate in this post represents a clustered point pattern.
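To make the three patterns concrete, here is a minimal base R sketch that simulates one of each on the unit square. The point counts, jitter size, cluster centres and the helper `mean_nn` are all illustrative assumptions, not part of the analysis in this post. The mean nearest-neighbour distance gives a rough numeric summary: it should be largest for the uniform pattern and smallest for the clustered one.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible
n <- 100      # number of points per pattern (illustrative choice)

# Random: x and y are drawn independently and uniformly on the unit square
random_pts <- data.frame(x = runif(n), y = runif(n))

# Uniform (regular): centres of a 10 x 10 grid, each with a small jitter
grid <- expand.grid(x = (1:10 - 0.5) / 10, y = (1:10 - 0.5) / 10)
uniform_pts <- data.frame(
  x = grid$x + runif(n, -0.02, 0.02),
  y = grid$y + runif(n, -0.02, 0.02)
)

# Clustered: 5 hypothetical "flower" centres, each with 20 points nearby
centres <- data.frame(x = runif(5, 0.1, 0.9), y = runif(5, 0.1, 0.9))
parent <- rep(1:5, each = 20)
clustered_pts <- data.frame(
  x = centres$x[parent] + rnorm(n, sd = 0.03),
  y = centres$y[parent] + rnorm(n, sd = 0.03)
)

# Hypothetical helper: mean distance from each point to its nearest neighbour
mean_nn <- function(pts) {
  d <- as.matrix(dist(pts))
  diag(d) <- Inf          # ignore the zero distance from a point to itself
  mean(apply(d, 1, min))
}

nn_summary <- c(
  random    = mean_nn(random_pts),
  uniform   = mean_nn(uniform_pts),
  clustered = mean_nn(clustered_pts)
)
print(nn_summary)
```

Calling `plot(clustered_pts)` (and likewise for the other two data frames) shows the visual difference between the patterns.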

Point process

A spatial point process is a description of the point pattern. We can think of it as the model that generated the point pattern. The points arise from a random process, described by the local intensity λ(s), which measures the expected density of points at a given location, s, in space. If points arise independently and at random with a constant intensity, the pattern is referred to as a homogeneous Poisson point process. If event locations are independent but the intensity varies spatially (i.e. λ(s) varies), the pattern arises from an inhomogeneous point process, also called an inhomogeneous Poisson process.
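As a rough illustration of the homogeneous case, the sketch below simulates a Poisson point process on the unit square in base R. The intensity value and the 4 x 4 quadrat grid are assumptions chosen for the example. The number of points is drawn from a Poisson distribution with mean λ times the area, and the locations are then placed uniformly at random.

```r
set.seed(1)    # arbitrary seed for reproducibility
lambda <- 200  # assumed constant intensity (expected points per unit area)
area <- 1      # the window is the unit square

# Step 1: the total number of points is Poisson with mean lambda * area
n <- rpois(1, lambda * area)

# Step 2: given n, the locations are independent and uniform over the window
pts <- data.frame(x = runif(n), y = runif(n))

# Quadrat counts on a 4 x 4 grid: under a homogeneous Poisson process each
# cell count is Poisson with mean lambda / 16, so the mean and variance of
# the counts should be of similar size
breaks <- seq(0, 1, by = 0.25)
counts <- table(
  cut(pts$x, breaks, include.lowest = TRUE),
  cut(pts$y, breaks, include.lowest = TRUE)
)
print(c(mean = mean(counts), variance = var(as.vector(counts))))
```

A variance far above the mean would suggest clustering, and a variance far below it would suggest regularity.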

We can model the intensity of the inhomogeneous point process as a function of covariates. We describe this type of model as follows:

λ(s) = exp(α + β X(s))

Where X(s) is a spatial covariate evaluated at location s, and α and β are parameters to be estimated. X(s) can represent the pH of the soil, for instance, or the air temperature.
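One common way to simulate from a log-linear intensity like this is thinning: generate candidate points from a homogeneous process at the maximum intensity, then keep each candidate with probability equal to its local intensity divided by that maximum. The base R sketch below does this on the unit square. The parameter values and the covariate (a simple west-east gradient) are hypothetical and chosen only for illustration.

```r
set.seed(123)  # arbitrary seed for reproducibility
alpha <- 5     # illustrative intercept
beta <- 2      # illustrative covariate effect

# Hypothetical covariate: a gradient that increases from west (0) to east (1)
covariate <- function(x, y) x

# Log-linear intensity: lambda(s) = exp(alpha + beta * X(s))
intensity <- function(x, y) exp(alpha + beta * covariate(x, y))

# Upper bound of the intensity on the unit square
# (valid here because beta > 0 and the covariate is at most 1)
lambda_max <- exp(alpha + beta * 1)

# Step 1: candidates from a homogeneous Poisson process with rate lambda_max
n_cand <- rpois(1, lambda_max)
cand_x <- runif(n_cand)
cand_y <- runif(n_cand)

# Step 2: keep each candidate with probability lambda(s) / lambda_max
keep_prob <- intensity(cand_x, cand_y) / lambda_max
keep <- runif(n_cand) < keep_prob
pts <- data.frame(x = cand_x[keep], y = cand_y[keep])

# With beta > 0, retained points should concentrate where the covariate is high
print(c(n_points = nrow(pts), mean_x = mean(pts$x)))
```

Because β is positive in this sketch, more points survive on the eastern side of the window, where the covariate is largest.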

R libraries

To replicate this tutorial you will need to load the following libraries:

library(spatstat)
library(sf)
library(sp)
library(maptools)
library(raster)
library(rstan)
library(tidyverse)
library(cowplot)

