Chapter 3 Estimators

3.1 Simple Random Sample (design-based)

The simplest approach is to treat the data as if all points were collected from a SRS design, and to use the usual SRS variance estimator,

\[\begin{equation}\label{eq:srs_var} \widehat{\rm var}_{srs}(\bar{y}) = \frac{s^2_y}{n} = \frac{1}{n(n - 1)}\sum\limits_{i=1}^n \left(y_i - \bar{y}\right)^2 \ . \end{equation}\]

Several simulation studies have demonstrated that this estimator tends to over-estimate the true variance, and the over-estimation can be substantial when positive autocorrelation exists (R. M. Fewster 2011; Dunn and Harrison 1993).

3.2 Post-stratification (design-based)

This class of estimators was proposed by Wolter (1984) and has been extended to systematic spatial surveys e.g., Dunn and Harrison (1993) R. M. Fewster (2011), Millar and Olsen (1995). Each stratum contains a fixed number of neighbouring sampled units from the systematic design, and the stratified random sample variance estimator of \(\bar{y}\) is used (Rachel M. Fewster et al. 2009) (this is ad-hoc because stratified designs assume simple random sampling within strata). (Millar and Olsen 1995) provided a variation on this approach, by showing that it could be applied using strata that overlapped.

Consider \(H\) overlapping strata where \(h = (1,2,\dots,H)\) index the strata, and let stratum \(h\) contain \(n_h\) sampling units
(in this case \(n_h = 4\) in Figure~). %Delete or add Figure Let the set of sampling units in strata \(h\) be denoted by the set \(S_h\). Then the post-stratified estimate of variance is

\[\begin{equation}\label{strata_var} \widehat{\rm var}_{str}(\bar{y}) = \sum\limits_{h = 1}^H w_h^2 \frac{s^2_h}{n_h} \ , \end{equation}\]

where \[ s^2_h = \frac{1}{n_h-1} \sum\limits_{i \in S_h} (y_i - \bar{y_h})^2, \text{ and } w_h = \frac{n_h}{\sum_h n_h} \ . \] This formulation follows from Millar and Olsen (1995) and assumes overlapping strata. However, there are alternative overlapping and non-overlapping post-stratified variance estimators (see D’Orazio (2003)). These alternative estimators are due in part to their original application for one dimensional systematically sampled data (Wolter 1984). As discussed by D’Orazio (2003), extending these one-dimensional estimators to two-dimensions is not straightforward.

Example of systematic survey. Blue rectangles are sampling units

3.3 Correlation adjustment (hybrid)

Defined in Wolter (1984) and investigated for spatial systematic samples (Ambrosio Flores, Iglesias Martı́nez, and Marı́n Ferrer 2003; Strand 2017; Brus and Saby 2016; McGarvey, Burch, and Matthews 2016), this estimator makes an adjustment to \(Var_{srs}(\bar{y})\) (Equation~) based on the amount of spatial auto-correlation that exists in the systematic sample.

\[\begin{align}\label{eq:adjust_corr} \widehat{\rm var}_{adj}(\bar{y}) =& \widehat{\rm var}_{srs}(\bar{y}) \bigg[1 + \frac{2}{ln({\rho})} + \frac{2}{({\rho}^{-1} - 1)}\bigg]& \text{if }{\rho} > 0\\ =& \widehat{\rm var}_{srs}(\bar{y})& \text{if }{\rho} \leq 0 \ .\nonumber \end{align}\] %, Here, \({\rho}\) is often substituted for by an estimate of global spatial-autocorrelation such as the commonly used Morans I .

3.4 Boxlet estimator (hybrid)

The boxlet method proposed by R. M. Fewster (2011)} fits a density surface to the spatial domain \(\mathcal{D}\) using a two-dimensional generalised additive model (GAM) (Wood 2017). For any possible location, \(b\), of the first quadrat, this enables calculation of the expected total count within the sampled quadrats, \(\eta_b\) and within \(\mathcal{D}\), \(\eta_\mathcal{D}\). This calculation is done using finite approximation over a fine lattice within \(\mathcal{D}\) (these lattice points correspond to the centroids of the so-called boxlets). Let \(p_b=\eta_b/\eta_\mathcal{D}\) denote the proportion of the expected total count that is expected within the sampled quadrats.

Let \(y^+=\sum_{i=1}^n y_i\) be the number of individuals counted in the systematic sample. The boxlet method partitions the variance of \(y^+\) using the law of total variance (Equation (3.1)). The first term (\({\rm var}_b(E[y^+|b])\)) is the variance in the expected value of \(y^+\) due to the possible starting position, \(b\), of the first quadrat. The second component (\(E_b[{\rm var}(y^+|b)]\)) quantifies the expected (over \(b\)) magnitude of the variability in \(y^+\) for a given \(b\),

\[\begin{equation} {\rm var}(y^+) = {\rm var}_{b}(E[y^+|b]) + E_{b}[{\rm var}(y^+|b)] \tag{3.1} \end{equation}\]

Under the assumption that the counts in each quadrat are Poisson distributed, then given \(b\) it follows that \(y^+\) is binomially distributed according to a \({\rm Bin}(N,p_b)\) distribution. This gives \[\begin{align} {\rm var}(y^+) &= {\rm var}_{b}(N p_b) + E_{b}[Np_b(1 - p_b)] \nonumber \\ &= E_{b}[N^2 p_b^2] + E_{b}[N p_b]^2 + E_{b}[Np_b(1 - p_b)] \label{eq:BoxletVar} \end{align}\]

The boxlet method estimates \({\rm var}(y^+)\) by substituting \(N\) with \(\widehat{N}\), and numerically evaluating the \(E_b\) and \({\rm var}_b\) terms (that is, with respect to variability in \(p_b\)) over all boxlet centroids within the scope of placement of the first quadrat. This leads to the boxlet variance estimator \(y^+\) and hence for \(\bar{y}\), \[\begin{equation} \widehat{\rm var}_{box}(\bar{y}) = \frac{\widehat{\rm var}_{box}(y^+)}{n^2} \end{equation}\]

3.5 Geostatistical model-based estimator (model-based)

The geostatistical model-based approach partitions the spatial domain \(\mathcal{D}\) into \(m >> n\) quadrats, with abundances (at the time of the survey) of \(\boldsymbol{y}=\{y_i, i= 1, \dots, m\}\), and hence \(N=\sum_i^m y_i\). The abundances are assumed to be a realisation from a model, \[\begin{equation*} \boldsymbol{y} \sim f\left(\boldsymbol{X}, \boldsymbol{\theta}\right) \ , \end{equation*}\]

with auxiliary covariates denoted by \(\boldsymbol{X}\) and model parameters denoted by \(\boldsymbol{\theta}\). This natural incorporation of auxiliary covariates is often viewed as an advantage for the model-based approach (Johnson, Laake, and Ver Hoef 2010; Chambers and Clark 2012; Ståhl et al. 2016; Shelton et al. 2014).

Let \(\Omega\) be the index of the set of \(n\) sampling units selected by the systematic survey. The estimate of total population can be decomposed as \[\begin{equation} \widehat{N} = \sum\limits_{i \in \Omega} y_i + \sum\limits_{j \notin \Omega} E[y_j| y_i, i \in \Omega] \ .\tag{3.2} \end{equation}\] The \(y_i, i \in \Omega\) are known and so inference on un-sampled units \(y_j, j \notin \Omega\) is the goal.

Geostatistical models have been used with systematic surveys primarily in the acoustic literature Walline (2007); Simmonds and Fryer (1996) but also in land cover use Aune-Lundberg and Strand (2014). These studies used kriging equations for model inference, which differs from the SPDE approach Lindgren, Rue, and Lindström (2011) that is used for geostatistical models presented here (see Chatper 7 Gómez-Rubio (2020) and Chang, Guillas, and Fioletov (2015) for more detail on the difference between these two approaches). The SPDE approach was chosen for its flexibility in response variable distributions, flexibility in incorporating spatio-temporal GFs, ease of including covariates and availability of convenient software for its implementation, i.e. INLA Lindgren and Rue (2015).

The software chosen to implement the spatial model, and estimator for the total population (Equation~(3.2)) and corresponding variance was Template Model Builder (TMB) (Kristensen et al. 2016). TMB was chosen for its ease of implementing bespoke models (Equation~(3.2) only involves using a subset of fitted values) and computational speed (Muff, Signer, and Fieberg 2020) required for the purpose of performing 1 000’s of simulations. TMB also has many inbuilt features that enhance its applicability. This includes automatic calculation of standard errors for functions of parameters using Taylor series expansion, which is commonly known as the generalised delta-method Millar (2011);Fournier et al. (2012)). TMB also includes the “epsilon” method of bias correction (Tierney, Kass, and Kadane 1989; Thorson and Kristensen 2016), which approximates the integral that equates to the expectation of non-linear functions of random-effect variables. Thorson and Kristensen (2016) showed the “epsilon” method to be superior to other bias correction methods commonly used within fisheries models for estimating abundance. Using these estimators of variance, TMB will automatically report estimates of \({SE}\left[\widehat{N}\right]\).

For spatially variable populations it is often better to estimate and provide standard errors for \(\log \widehat{N}\), which has generally been found to be more quadratic Bolker et al. (2013). When \(\log \widehat{N}\) is of focus, confidence intervals are

\[\begin{equation} CI_{95\%} = exp \left(\log \widehat{N} \pm 1.96 \ \widehat{SE}\left[\log \widehat{N}\right]\right) \ . \end{equation}\]

Considerations when using model-based estimators include; assessing sensitivity to starting values, model convergence, model selection, goodness of fit, etc. As in all regression analysis, these are expected for any valid inferences.

References

Ambrosio Flores, Luis, Luis Iglesias Martı́nez, and Carmen Marı́n Ferrer. 2003. “Systematic Sample Design for the Estimation of Spatial Means.” Environmetrics: The Official Journal of the International Environmetrics Society 14 (1): 45–61.

Aune-Lundberg, Linda, and Geir-Harald Strand. 2014. “Comparison of Variance Estimation Methods for Use with Two-Dimensional Systematic Sampling of Land Use/Land Cover Data.” Environmental Modelling & Software 61: 87–97.

Bolker, Benjamin M, Beth Gardner, Mark Maunder, Casper W Berg, Mollie Brooks, Liza Comita, Elizabeth Crone, et al. 2013. “Strategies for fitting nonlinear ecological models in R, AD Model Builder, and BUGS.” Methods in Ecology and Evolution 4 (6): 501–12.

Brus, DJ, and NPA Saby. 2016. “Approximating the Variance of Estimated Means for Systematic Random Sampling, Illustrated with Data of the French Soil Monitoring Network.” Geoderma 279: 77–86.

Chambers, Ray, and Robert Clark. 2012. An Introduction to Model-Based Survey Sampling with Applications. Vol. 37. Oxford University Press.

Chang, K-L, S Guillas, and VE Fioletov. 2015. “Spatial Mapping of Ground-Based Observations of Total Ozone.” Atmospheric Measurement Techniques 8 (10): 4487–4505.

D’Orazio, Marcello. 2003. “Estimating the Variance of the Sample Mean in Two-Dimensional Systematic Sampling.” Journal of Agricultural, Biological, and Environmental Statistics 8 (3): 280.

Dunn, R, and A R Harrison. 1993. “Two-Dimensional Systematic Sampling of Land Use.” Journal of the Royal Statistical Society: Series C (Applied Statistics) 42 (4): 585–601.

Fewster, R M. 2011. “Variance Estimation for Systematic Designs in Spatial Surveys.” Biometrics 67 (4): 1518–31.

Fewster, Rachel M, Stephen T Buckland, Kenneth P Burnham, David L Borchers, Peter E Jupp, Jeffrey L Laake, and Len Thomas. 2009. “Estimating the Encounter Rate Variance in Distance Sampling.” Biometrics 65 (1): 225–36.

Fournier, David A, Hans J Skaug, Johnoel Ancheta, James Ianelli, Arni Magnusson, Mark N Maunder, Anders Nielsen, and John Sibert. 2012. “AD Model Builder: Using Automatic Differentiation for Statistical Inference of Highly Parameterized Complex Nonlinear Models.” Optimization Methods and Software 27 (2): 233–49.

Gómez-Rubio, Virgilio. 2020. Bayesian Inference with INLA. CRC Press.

Johnson, Devin S, Jeffrey L Laake, and Jay M Ver Hoef. 2010. “A Model-Based Approach for Making Ecological Inference from Distance Sampling Data.” Biometrics 66 (1): 310–18.

Kristensen, Kasper, Anders Nielsen, Casper W. Berg, Hans Skaug, and Bradley M. Bell. 2016. “TMB: Automatic Differentiation and Laplace Approximation.” Journal of Statistical Software 70 (5): 1–21. https://doi.org/10.18637/jss.v070.i05.

Lindgren, Finn, and Håvard Rue. 2015. “Bayesian Spatial Modelling with R-INLA.” Journal of Statistical Software 63 (19): 1–25. http://www.jstatsoft.org/v63/i19/.

Lindgren, Finn, Håvard Rue, and Johan Lindström. 2011. “An Explicit Link Between Gaussian Fields and Gaussian Markov Random Fields: The Stochastic Partial Differential Equation Approach.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 73 (4): 423–98.

McGarvey, Richard, Paul Burch, and Janet M Matthews. 2016. “Precision of Systematic and Random Sampling in Clustered Populations: Habitat Patches and Aggregating Organisms.” Ecological Applications 26 (1): 233–48.

Millar, Russell B. 2011. Maximum likelihood estimation and inference: with examples in R, SAS and ADMB. Vol. 111. John Wiley & Sons.

Millar, Russell B, and Danette Olsen. 1995. “Abundance of Large Toheroa (Paphies Ventricosa Gray) at Oreti Beach, 1971–90, Estimated from Two-Dimensional Systematic Samples.” New Zealand Journal of Marine and Freshwater Research 29 (1): 93–99.

Muff, Stefanie, Johannes Signer, and John Fieberg. 2020. “Accounting for Individual-Specific Variation in Habitat-Selection Studies: Efficient Estimation of Mixed-Effects Models Using Bayesian or Frequentist Computation.” Journal of Animal Ecology 89 (1): 80–92.

Shelton, Andrew Olaf, James T Thorson, Eric J Ward, and Blake E Feist. 2014. “Spatial Semiparametric Models Improve Estimates of Species Abundance and Distribution.” Canadian Journal of Fisheries and Aquatic Sciences 71 (11): 1655–66.

Simmonds, E John, and Robert J Fryer. 1996. “Which Are Better, Random or Systematic Acoustic Surveys? A Simulation Using North Sea Herring as an Example.” ICES Journal of Marine Science 53 (1): 39–50.

Ståhl, Göran, Svetlana Saarela, Sebastian Schnell, Sören Holm, Johannes Breidenbach, Sean P Healey, Paul L Patterson, et al. 2016. “Use of Models in Large-Area Forest Surveys: Comparing Model-Assisted, Model-Based and Hybrid Estimation.” Forest Ecosystems 3 (1): 1–11.

Strand, Geir-Harald. 2017. “A Study of Variance Estimation Methods for Systematic Spatial Sampling.” Spatial Statistics 21: 226–40.

Thorson, James T, and Kasper Kristensen. 2016. “Implementing a Generic Method for Bias Correction in Statistical Models Using Random Effects, with Spatial and Population Dynamics Examples.” Fisheries Research 175: 66–74.

Tierney, Luke, Robert E Kass, and Joseph B Kadane. 1989. “Fully Exponential Laplace Approximations to Expectations and Variances of Nonpositive Functions.” Journal of the American Statistical Association 84 (407): 710–16.

Walline, Paul D. 2007. “Geostatistical simulations of eastern Bering Sea walleye pollock spatial distributions, to estimate sampling precision.” ICES Journal of Marine Science 64 (3): 559–69. https://doi.org/10.1093/icesjms/fsl045.

Wolter, Kirk M. 1984. “An Investigation of Some Estimators of Variance for Systematic Sampling.” Journal of the American Statistical Association 79 (388): 781–90.

Wood, Simon N. 2017. Generalized Additive Models: An Introduction with R. CRC press. https://doi.org/10.1201/9781315370279.