10 Jonah crab

10.1 Introduction (obviously this will be much longer)

The Jonah crab (Cancer borealis) fishery is managed in part by a minimum legal size, intended to protect females and enough mature males to sustain the population. However, there are many unresolved questions about the growth of Jonah crabs, including spatial variation in size at maturity (Truesdale, Dalton, and McManus 2019). Here, I will combine the insights from the systematic review and simulation testing with cutting-edge ecological modeling tools in a novel approach to spatially-varying parametrization of size at maturity models that can be generalized to other crustacean fisheries.

10.2 Methods

Based on the superior performance of mclust during the preliminary simulation testing, my current plan for analyzing the Jonah crab data is to use Gaussian mixture models to classify crabs as mature or immature, followed by spatial logistic regression to estimate SM50 (the size at which 50% of crabs are mature). This plan may change as I find new modeling approaches through the systematic review and conduct more comprehensive simulation testing.

10.2.1 Jonah crab data

Male Jonah crab morphological data used in this study were collected across the Northwest Atlantic in 2015-2019 and 2021. Most data were obtained from the annual Northeast Fisheries Science Center bottom trawl surveys, with supplemental measurements provided by the Massachusetts Division of Marine Fisheries and the University of Maryland Center for Environmental Science in 2015, 2016, and 2017. I only considered records that included measurements of both carapace width (CW) and chela height (CH) and had spatial coordinates. Although some data are also available for female Jonah crabs, the analysis is restricted to males because females are uncommon in the Jonah crab fishery: they are typically smaller than the minimum legal size and the associated trap vent sizes.

10.2.2 Mixture model clustering

I will use the package mclust to perform model-based clustering using finite Gaussian mixture models (Scrucca et al. 2023). In brief, this clustering method works by assuming that the groups (in this case, mature and immature crabs) correspond to different probability density functions, called mixture components. The overall mixture distribution is a weighted sum (formally, a convex linear combination) of these components. The parameters for the component distributions and the mixture weights are estimated using the Expectation-Maximization algorithm, an iterative approach to finding maximum-likelihood estimates that is commonly used for latent variable and missing data problems (Dempster, Laird, and Rubin 1977).
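To make the EM iteration described above concrete, here is a minimal sketch that fits a two-component univariate Gaussian mixture to simulated values. It is written in Python using only the standard library and is not the mclust implementation; the function name `fit_two_gaussians`, the starting values, and the simulated data are all assumptions made for illustration. The planned analysis clusters bivariate CW and CH measurements rather than a single variable.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(values, n_iter=200):
    """Fit a two-component univariate Gaussian mixture with EM.

    Returns (weights, means, sds, resp), where resp[i] is the posterior
    probability that values[i] belongs to the second component.
    """
    n = len(values)
    ordered = sorted(values)
    half = n // 2
    # Crude starting values (an assumption): split the sorted data in half.
    means = [sum(ordered[:half]) / half, sum(ordered[half:]) / (n - half)]
    overall_mean = sum(values) / n
    overall_sd = math.sqrt(sum((v - overall_mean) ** 2 for v in values) / n)
    sds = [overall_sd, overall_sd]
    weights = [0.5, 0.5]
    resp = [0.5] * n

    for _ in range(n_iter):
        # E-step: posterior probability of the second component per observation.
        resp = []
        for v in values:
            d0 = weights[0] * normal_pdf(v, means[0], sds[0])
            d1 = weights[1] * normal_pdf(v, means[1], sds[1])
            resp.append(d1 / (d0 + d1))

        # M-step: responsibility-weighted updates of weights, means, and SDs.
        n1 = sum(resp)
        n0 = n - n1
        weights = [n0 / n, n1 / n]
        means = [
            sum((1 - r) * v for r, v in zip(resp, values)) / n0,
            sum(r * v for r, v in zip(resp, values)) / n1,
        ]
        sds = [
            math.sqrt(sum((1 - r) * (v - means[0]) ** 2 for r, v in zip(resp, values)) / n0),
            math.sqrt(sum(r * (v - means[1]) ** 2 for r, v in zip(resp, values)) / n1),
        ]
    return weights, means, sds, resp


# Simulated example with two well-separated groups (hypothetical values).
rng = random.Random(1)
sample = [rng.gauss(10.0, 1.0) for _ in range(300)] + [rng.gauss(20.0, 1.5) for _ in range(200)]
weights, means, sds, resp = fit_two_gaussians(sample)
# Hard classification: assign each observation to its more probable component.
labels = [1 if r > 0.5 else 0 for r in resp]
```

The final line mirrors how the clustering output becomes a binary maturity label: each observation is assigned to the component with the larger posterior probability.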

The data will not be manually transformed before clustering because the mclust default is to transform the data using singular value decomposition (SVD) before initializing the EM algorithm (Scrucca and Raftery 2015). The number of clusters will be pre-specified \((G=2)\). The “EVV” type will be chosen to specify the geometric characteristics of the covariance matrices, meaning that the mixture components have equal volume but varying shape and orientation (Scrucca et al. 2023).
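For reference, the “EVV” label can be read against the eigen-decomposition of the component covariance matrices that underlies the mclust model names. The symbols below are labels chosen for this sketch rather than notation defined elsewhere in this chapter:

\[
f(x) = \sum_{k=1}^{G} \pi_k \, \phi(x \mid \mu_k, \Sigma_k), \qquad \Sigma_k = \lambda_k D_k A_k D_k^{\top},
\]

where \(\pi_k\) are the mixture weights, \(\phi\) is the multivariate normal density, \(\lambda_k\) is a scalar controlling volume, \(A_k\) is a diagonal matrix with determinant 1 controlling shape, and \(D_k\) is an orthogonal matrix controlling orientation. Under “EVV”, \(\lambda_k = \lambda\) for both components, while \(A_k\) and \(D_k\) are allowed to differ between them.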

10.2.3 Spatial logistic regression

The clustering-derived maturity labels will then be used to fit a logistic regression model using the R package sdmTMB version 0.6.0.9013, which provides a flexible interface for fitting spatial and spatiotemporal generalized linear mixed-effects models (GLMMs) (Anderson et al., n.d.). The package uses Template Model Builder (TMB) to find fixed-effect values that maximize the marginal log likelihood, integrating across random effects with the Laplace approximation, and represents spatial random fields on a mesh using the approach developed for integrated nested Laplace approximations (Kristensen et al. 2016; Rue, Martino, and Chopin 2009). A Delaunay triangulation mesh will be constructed over the observed coordinates using the make_mesh function in sdmTMB, with a minimum allowed distance of 10 km between mesh vertices to avoid overfitting. The mesh serves as an input for an sdmTMB logistic model (binomial family with a logit link) of the form maturity ~ CW, with a spatially varying slope coefficient for carapace width added as a random effect. I will use the model to generate predictions over a grid of the target region, extract the value of the coefficient at each location, and use the extracted coefficient to recalculate SM50 at each point. I will also model spatial variation in the probability of maturity at legal size by generating model predictions over a grid that includes CW as a constant attribute held at the minimum legal size.
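The SM50 recalculation follows from setting the linear predictor to zero. If the local linear predictor is logit(p) = b0 + (b1 + zeta) * CW, where zeta is the local deviation of the CW slope, then SM50 = -b0 / (b1 + zeta). The sketch below illustrates that arithmetic in Python; it does not call sdmTMB, and the function names, coefficient values, and the legal size used here are hypothetical. The optional `intercept_deviation` argument is an assumption covering the case where the fitted model also includes a spatial random field on the intercept.

```python
import math


def sm50_at_location(intercept, slope, slope_deviation, intercept_deviation=0.0):
    """CW at which the predicted probability of maturity equals 0.5.

    Solves (intercept + intercept_deviation) + (slope + slope_deviation) * CW = 0.
    """
    local_slope = slope + slope_deviation
    if local_slope <= 0:
        raise ValueError("local slope must be positive for SM50 to be defined")
    return -(intercept + intercept_deviation) / local_slope


def prob_mature(cw, intercept, slope, slope_deviation, intercept_deviation=0.0):
    """Predicted probability of maturity at carapace width cw (inverse logit)."""
    eta = (intercept + intercept_deviation) + (slope + slope_deviation) * cw
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical fixed effects and slope deviations at three grid cells.
b0, b1 = -20.0, 0.2
zetas = [-0.02, 0.0, 0.02]
legal_size = 110.0  # placeholder value, not the actual minimum legal size

sm50_values = [sm50_at_location(b0, b1, z) for z in zetas]
p_legal = [prob_mature(legal_size, b0, b1, z) for z in zetas]
```

In this toy example a steeper local slope gives a smaller SM50 and a higher probability of maturity at the placeholder legal size, which is the kind of spatial contrast the prediction grids are meant to show.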

For comparison with the spatial model, I will fit a non-spatial logistic model using the base R function glm(). The Fieller method will be used to extract confidence intervals for SM50 from the GLM coefficients and variance-covariance matrix (Mainguy et al. 2024). Comparison between models will be based on differences between their Akaike information criterion (AIC) values (Burnham and Anderson 2004). Analytical randomized-quantile residuals will be checked for normality (Dunn and Smyth 1996) and simulation-based residuals from the fitted models will be checked for uniformity, overdispersion, and the presence of significant outliers using the R package DHARMa (Hartig, n.d.).
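As a sketch of how the Fieller interval is obtained from the GLM output, the code below solves the quadratic that defines the interval for the ratio -b0 / b1. It is a generic Python illustration under stated assumptions, not the code from Mainguy et al. (2024): the function name `fieller_interval`, the normal critical value of 1.96, and the coefficient and covariance values are all hypothetical.

```python
import math


def fieller_interval(b0, b1, var_b0, var_b1, cov_b01, z=1.96):
    """Fieller confidence interval for the ratio -b0 / b1.

    The bounds are the values of theta that satisfy
    (b0 + theta * b1)^2 = z^2 * Var(b0 + theta * b1).
    Returns (estimate, lower, upper).
    """
    estimate = -b0 / b1
    # Coefficients of a*theta^2 + b*theta + c = 0 after expanding both sides.
    a = b1 ** 2 - z ** 2 * var_b1
    b = 2.0 * (b0 * b1 - z ** 2 * cov_b01)
    c = b0 ** 2 - z ** 2 * var_b0
    if a <= 0:
        # The slope is not distinguishable from zero at this level,
        # so the confidence set is not a bounded interval.
        raise ValueError("slope not significantly different from zero")
    disc = max(b ** 2 - 4.0 * a * c, 0.0)
    root = math.sqrt(disc)
    lower = (-b - root) / (2.0 * a)
    upper = (-b + root) / (2.0 * a)
    return estimate, lower, upper


# Hypothetical intercept, slope, and variance-covariance entries.
sm50, sm50_lower, sm50_upper = fieller_interval(
    b0=-20.0, b1=0.2, var_b0=4.0, var_b1=0.0004, cov_b01=-0.0396
)
```

Unlike a delta-method interval, the Fieller interval need not be symmetric around the point estimate, and it becomes unbounded when the slope is poorly estimated, which the sketch reports as an error.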

References

Anderson, Sean C., Eric J. Ward, Philina A. English, Lewis A. K. Barnett, and James T. Thorson. n.d. “sdmTMB: An R Package for Fast, Flexible, and User-Friendly Generalized Linear Mixed Effects Models with Spatial and Spatiotemporal Random Fields.” https://doi.org/10.1101/2022.03.24.485545.

Burnham, Kenneth P., and David R. Anderson. 2004. “Multimodel Inference: Understanding AIC and BIC in Model Selection.” Sociological Methods & Research 33 (2): 261–304. https://doi.org/10.1177/0049124104268644.

Dempster, A. P., N. M. Laird, and D. B. Rubin. 1977. “Maximum Likelihood from Incomplete Data via the EM Algorithm.” Journal of the Royal Statistical Society: Series B (Methodological) 39 (1): 1–38. https://doi.org/10.1111/j.2517-6161.1977.tb01600.x.

Dunn, Peter K., and Gordon K. Smyth. 1996. “Randomized Quantile Residuals.” Journal of Computational and Graphical Statistics 5 (3): 236–44. https://doi.org/10.2307/1390802.

Hartig, Florian. n.d. “DHARMa: Residual Diagnostics for Hierarchical (Multi-Level / Mixed) Regression Models.” https://doi.org/10.32614/CRAN.package.DHARMa.

Kristensen, Kasper, Anders Nielsen, Casper W. Berg, Hans Skaug, and Bradley M. Bell. 2016. “TMB: Automatic Differentiation and Laplace Approximation.” Journal of Statistical Software 70 (April): 1–21. https://doi.org/10.18637/jss.v070.i05.

Mainguy, Julien, Martin Bélanger, Geneviève Ouellet-Cauchon, and Rafael de Andrade Moral. 2024. “Monitoring Reproduction in Fish: Assessing the Adequacy of Ogives and the Predicted Uncertainty of Their L50 Estimates for More Reliable Biological Inferences.” Fisheries Research 269 (January): 106863. https://doi.org/10.1016/j.fishres.2023.106863.

Rue, Håvard, Sara Martino, and Nicolas Chopin. 2009. “Approximate Bayesian Inference for Latent Gaussian Models by Using Integrated Nested Laplace Approximations.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 71 (2): 319–92. https://doi.org/10.1111/j.1467-9868.2008.00700.x.

Scrucca, Luca, Chris Fraley, T. Brendan Murphy, and Adrian E. Raftery. 2023. Model-Based Clustering, Classification, and Density Estimation Using mclust in R. First edition. Boca Raton, FL: CRC Press. https://mclust-org.github.io/book/.

Scrucca, Luca, and Adrian E. Raftery. 2015. “Improved Initialisation of Model-Based Clustering Using Gaussian Hierarchical Partitions.” Advances in Data Analysis and Classification 9 (4): 447–60. https://doi.org/10.1007/s11634-015-0220-z.

Truesdale, Corinne L., Tracey M. Dalton, and M. Conor McManus. 2019. “Fishers’ Knowledge and Perceptions of the Emerging Southern New England Jonah Crab Fishery.” North American Journal of Fisheries Management 39 (5): 951–63. https://doi.org/10.1002/nafm.10327.