Habitat hotspots were mapped for migratory bird ‘guilds’ across the LCD region using species presence/absence data from citizen-science datasets and modeled habitat conditions from the LANDFIRE program (Rollins, 2009). For presence/absence data, we used the eBird Reference Dataset (ERD, accessed October 1, 2016; summarized in Sullivan et al., 2009) to model guild-level responses to prevailing vegetation structure (e.g., percent cover of grass, trees, and shrubs; vegetation height), topography, and water availability for the priority migratory bird species outlined in the Rio Mora NWR Land Protection Plan. We parsed eBird species “checklists” for species observed within a ~500-kilometer radius of the Rio Mora NWR. Each checklist contains spatially explicit count estimates (which we reclassified as presence/absence) for 1766 bird species observed across North America. We grouped priority migratory birds for Rio Mora NWR into six species ‘guilds’ (Tbl N), based on how each species resolves along habitat and trophic gradients (Fig. N). The number of checklists considered varied by guild, ranging from 1646 to 5154 (Tbl N). To account for count inflation within each guild (which can bias predictions in random forests), we re-binned continuous guild counts into 30 bins and then randomly downsampled each bin to the median number of records observed across all bins. Lastly, to account for over-sampling bias near urban areas in the ERD, we censored species observations using estimates of human population density from the NASA SEDAC Gridded Population of the World (GPW v.4, 2015): all eBird observations collected in areas with a population density of 150 people/km² or greater were excluded from analysis. We then re-gridded all checklist observations to a grid-cell resolution consistent with the mean parcel size (16 km²) observed across the tri-county region.
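The censoring and downsampling steps described above can be illustrated with a minimal Python sketch. It is not the code used in the analysis. The record fields (`id`, `count`, `pop_density`), the function names, the use of equal-width bins, the choice to take the median over non-empty bins only, and the choice to leave bins smaller than the median untouched are all assumptions made for illustration.

```python
import random
from statistics import median


def filter_by_population(records, threshold=150.0):
    """Drop records from cells at or above the density threshold (people/km^2)."""
    return [r for r in records if r["pop_density"] < threshold]


def downsample_by_bins(records, n_bins=30, seed=0):
    """Bin records by guild count, then randomly thin each bin to the median bin size.

    Assumptions (hypothetical, not from the source): bins are equal-width,
    the median is taken over non-empty bins, and bins already at or below
    the median size are kept whole.
    """
    rng = random.Random(seed)
    counts = [r["count"] for r in records]
    lo, hi = min(counts), max(counts)
    # Fall back to a width of 1.0 when all counts are identical.
    width = (hi - lo) / n_bins or 1.0

    bins = {}
    for r in records:
        # Clip so that the maximum count lands in the last bin.
        idx = min(int((r["count"] - lo) / width), n_bins - 1)
        bins.setdefault(idx, []).append(r)

    target = int(median(len(members) for members in bins.values()))

    kept = []
    for idx in sorted(bins):
        members = bins[idx]
        if len(members) > target:
            members = rng.sample(members, target)
        kept.extend(members)
    return kept
```

In this sketch the population filter would be applied first, and the filtered records then passed to `downsample_by_bins`; the source does not state the order in which the two steps were run.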
Species presence/absence for each guild was modeled using random forests, a non-parametric, tree-based algorithm used for high-dimensional classification and regression problems in ecology (Cutler et al., 2007; Liaw & Wiener, 2002). We used random forests to ‘learn’ suitable habitat conditions for each guild by treating presence/absence observations as a classification problem, such that: Pr(Prop. Votes) = f(y ∈ {1, 0} | p1, p2, …). Here the proportion of votes for presence (1) or absence (0) across all trees in a random forest is treated as conditional on the habitat parameters (p) and interpreted probabilistically, using log-scale distance normalization (Evans et al., 2011; as implemented in the R package rfUtilities). After fitting models and optimizing variable selection for each guild, we projected each model across the geographic extent of the tri-county study region to produce our hotspot maps. The guild-level suitability projections were considered both individually and in a GIS-based “stacked” suitability analysis showing the additive benefit of habitat for all priority species within landowner parcel units.
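The vote-proportion and “stacked” suitability ideas can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the dictionary layout (guild name mapped to parcel ID mapped to suitability) are hypothetical, the raw vote proportion is used directly without the log-scale normalization applied in rfUtilities, and stacking is taken to be a simple per-parcel sum.

```python
def vote_proportion(tree_votes):
    """Fraction of trees in a forest that voted for presence (1) rather than absence (0)."""
    return sum(tree_votes) / len(tree_votes)


def stack_suitability(guild_maps):
    """Sum suitability across guilds for each parcel.

    guild_maps: dict of guild name -> dict of parcel ID -> suitability in [0, 1].
    Parcels missing from a guild's map contribute nothing for that guild
    (an assumption made for this sketch).
    """
    stacked = {}
    for parcels in guild_maps.values():
        for parcel_id, suitability in parcels.items():
            stacked[parcel_id] = stacked.get(parcel_id, 0.0) + suitability
    return stacked
```

For example, a parcel with suitability 0.75 for one guild and 0.25 for another would receive a stacked score of 1.0, so parcels that benefit several guilds rank above parcels that benefit only one.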