Chapter 6 Quality Control

The following scripts demonstrate one possible workflow for quality control of vegetation plot data.

6.1 Loading Data

6.1.1 Loading Package Example Data

#remotes::install_github("ncss-tech/soilDB", dependencies = FALSE) #install latest version of soilDB package
#remotes::install_github("phytoclast/vegnasis", dependencies = FALSE) #install latest version of vegnasis package
library(soilDB)
library(vegnasis)

#relationship between sites
siteass <- vegnasis::siteass20250414
#site table
sites <- vegnasis::sites20250414
#load veg plot main table
vegplot <- vegnasis::vegplot20250414
#load species composition
veg.raw <- vegnasis::veg.raw20250414

6.1.2 Loading Data From NASIS

#remotes::install_github("ncss-tech/soilDB", dependencies = FALSE) #install latest version of soilDB package
#remotes::install_github("phytoclast/vegnasis", dependencies = FALSE) #install latest version of vegnasis package
library(soilDB)
library(vegnasis)

#relationship between sites
siteass <- soilDB::get_site_association_from_NASIS(SS=F)
#site table
sites <- soilDB::get_site_data_from_NASIS_db(SS=F)
#load veg plot main table
vegplot <- soilDB::get_vegplot_from_NASIS_db(SS=F)
#load species composition
veg.raw <- soilDB::get_vegplot_species_from_NASIS_db(SS=F)

6.2 Evaluate Species Composition

First, filter the records to a project of interest, in this case a Dynamic Soil Properties project in northern Lower Michigan. The site association record identifies which plots belong together. We also process the species composition table with clean.veg(), which consolidates the various plant cover and height columns.

thesesites <- siteass |> subset(usiteassocid %in% 'DSP-F094AB019MI-2024')
linktoplot <- vegplot |> subset(select=c(siteiid, usiteid, vegplotid))

plotfilter <- linktoplot |> subset(usiteid %in% thesesites$usiteid)

veg <- clean.veg(veg.raw) |> subset(plot %in% plotfilter$vegplotid)
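
Before moving on, it can help to confirm that the filter returned the plots you expect. The following is a minimal base R sketch of that idea on made-up data; the table names `siteass_toy` and `vegplot_toy` and all IDs in them are hypothetical stand-ins, not records from NASIS or the vegnasis example data.

```r
# Hypothetical stand-ins for the site association and vegplot tables
siteass_toy <- data.frame(
  usiteassocid = c("PROJ-A", "PROJ-A", "PROJ-A", "PROJ-B"),
  usiteid      = c("S001", "S002", "S003", "S004"),
  stringsAsFactors = FALSE
)
vegplot_toy <- data.frame(
  usiteid   = c("S001", "S002", "S004"),
  vegplotid = c("S001", "S002", "S004"),
  stringsAsFactors = FALSE
)

# Sites that belong to the project of interest
project_sites <- subset(siteass_toy, usiteassocid %in% "PROJ-A")

# Plots linked to those sites
project_plots <- subset(vegplot_toy, usiteid %in% project_sites$usiteid)

# Sites in the association that have no vegetation plot record
missing_plots <- setdiff(project_sites$usiteid, project_plots$usiteid)

nrow(project_plots)  # number of plots retained by the filter
missing_plots        # site IDs to follow up on
```

A non-empty `missing_plots` does not necessarily mean an error, since a site may legitimately lack a vegetation plot, but it is worth a look before interpreting the checks that follow.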

6.2.0.1 Check Phytogeography

Potentially the greatest source of error in data entry is incorrect species identification. This includes accepting a partial autocomplete for a similarly spelled species name or common name, mistyping a USDA PLANTS database symbol, or typing only the four-letter abbreviation and omitting the numeric tie-breaker. A simple check of whether the plant has ever been documented on an official species checklist in the vicinity of the project area will catch many such errors. Use the check.phytogeography() function to screen the taxa against a list of species found in the state of Michigan (use the postal code). The output is a column of text in which “pass” means the record is probably fine and “check” means you should consult a floristic manual or atlas for your state. In RStudio you would click on the “veg” data frame in the Environment tab to view the whole table, but in this demo I render just a portion of the table using the kableExtra package for better display in an HTML document.

library(dplyr) #provides mutate(), left_join() and case_when() used in this chapter
veg <- veg |> mutate(taxon2 = fill.taxon.from.symbols(symbol))
veg <- veg |> mutate(docgeo = check.phytogeography(taxon2, 'MI'))
library(knitr) #package used to make nice tables
library(kableExtra) #package used to make nice tables
options(knitr.kable.NA = '-')
quicklook <- veg[1:15,c("plot", "taxon", "docgeo")]
quicklook |> 
  knitr::kable(row.names = FALSE) |>
  remove_column(1) |> column_spec(1,italic=T) |>
  kableExtra::group_rows(index = table(quicklook$plot)) |>
  kable_classic(full_width = F, html_font = "Cambria")
taxon                      docgeo
S2024MI135001
Pinus banksiana            pass
Pinus banksiana            pass
Polytrichum                unknown
Prunus pumila              pass
Prunus virginiana          pass
Prunus serotina            pass
Pteridium aquilinum        pass
Gaultheria procumbens      pass
Malaxis unifolia           pass
Maianthemum canadense      pass
Melampyrum lineare         pass
Asclepias amplexicaulis    pass
Arctostaphylos uva-ursi    pass
Apocynum androsaemifolium  pass
Andropogon gerardii        pass
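
With more than a handful of plots, it is easier to review only the records that did not pass. The sketch below shows one way to do that in base R. The data frame `veg_toy` is hypothetical, and its `docgeo` values were typed by hand for illustration; they are not output from check.phytogeography().

```r
# Hypothetical species records with hand-entered screening results
veg_toy <- data.frame(
  plot   = c("P1", "P1", "P2", "P2", "P2"),
  taxon  = c("Pinus banksiana", "Pinus ponderosa", "Prunus pumila",
             "Pinus ponderosa", "Polytrichum"),
  docgeo = c("pass", "check", "pass", "check", "unknown"),
  stringsAsFactors = FALSE
)

# Keep every record that is anything other than "pass", one row per plot and taxon
flagged <- unique(subset(veg_toy, docgeo != "pass",
                         select = c(plot, taxon, docgeo)))

# Count how many plots each flagged taxon appears in
flag_counts <- table(flagged$taxon)

flagged
flag_counts
```

A taxon that is flagged in several plots is more likely a systematic entry problem (for example, a wrong symbol reused across plots) than a one-off typo.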

6.2.0.2 Check Habit

Another source of error is entering the wrong plant type (growth habit). Sometimes the plant type is correct and the error is actually in the species: a typo in the name produced the wrong species. Other times the plant type is confused with a stratum, as when someone enters “forb” for a tree seedling in the lowest stratum. A common source of inconsistency, rather than error, is labeling a tall shrub as a “tree” or a small tree as a “shrub/vine”, which can be a matter of opinion. In general, though, the same species should have the same plant type in every stratum in which it occurs, while stratum is designated strictly by height (except that all herbs are usually considered one stratum regardless of height). The function fill.type() looks up a standardized list of North American plants and provides a default recommended plant type.

veg <- veg |> mutate(lookuptype = fill.type(taxon), checktype = ifelse(lookuptype == type, 'pass','check'))

quicklook <- veg[1:15,c("plot", "taxon","type","lookuptype", "checktype")]
quicklook |> 
  knitr::kable(row.names = FALSE) |>
  remove_column(1) |> column_spec(1,italic=T) |>
  kableExtra::group_rows(index = table(quicklook$plot)) |>
  kable_classic(full_width = F, html_font = "Cambria")
taxon                      type             lookuptype       checktype
S2024MI135001
Pinus banksiana            tree             tree             pass
Pinus banksiana            tree             tree             pass
Polytrichum                moss             moss             pass
Prunus pumila              shrub/vine       shrub/vine       pass
Prunus virginiana          shrub/vine       shrub/vine       pass
Prunus serotina            tree             tree             pass
Pteridium aquilinum        forb             forb             pass
Gaultheria procumbens      shrub/vine       shrub/vine       pass
Malaxis unifolia           forb             forb             pass
Maianthemum canadense      forb             forb             pass
Melampyrum lineare         forb             forb             pass
Asclepias amplexicaulis    forb             forb             pass
Arctostaphylos uva-ursi    shrub/vine       shrub/vine       pass
Apocynum androsaemifolium  forb             forb             pass
Andropogon gerardii        grass/grasslike  grass/grasslike  pass
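
One detail of the comparison is worth knowing: in R, `ifelse(a == b, ...)` returns NA whenever either side is NA. If the lookup finds no match for a taxon and returns NA (an assumption about fill.type() that is not confirmed here), that record would be neither “pass” nor “check”. The base R sketch below, on hypothetical data, shows the behavior and one possible NA-safe variant.

```r
# Hypothetical records; the third has no lookup match
veg_toy <- data.frame(
  taxon      = c("Pinus banksiana", "Acer rubrum", "Unknown sedge"),
  type       = c("tree", "forb", "grass/grasslike"),
  lookuptype = c("tree", "tree", NA),
  stringsAsFactors = FALSE
)

# Plain comparison: a missing lookup value propagates as NA
veg_toy$checktype <- ifelse(veg_toy$lookuptype == veg_toy$type,
                            "pass", "check")

# NA-safe variant: a missing lookup value is flagged for review
veg_toy$checktype2 <- ifelse(!is.na(veg_toy$lookuptype) &
                               veg_toy$lookuptype == veg_toy$type,
                             "pass", "check")

veg_toy
```

Whether an unmatched taxon should be flagged or left blank is a design choice; flagging it keeps such records from being silently overlooked.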

6.2.0.3 Check Height

One can also check whether a plant is too tall for a plant type not normally considered a tree. However, a “shrub/vine” that reaches into the tree canopy may be a vine, and a “forb” at that height may be an epiphyte rather than an extra tall herb. For better context, we can use get.habit.code() and get.habit.name() to describe the plant habit more specifically. If the initial pass suggests that a “check” on height is needed, the extended habit name will show whether the plant is an epiphyte or a vine. The rule below passes a record when both heights are missing, when crown.max is at most 5, when stratum.min is below 5, or when the plant type is “tree”.

veg <- veg |> mutate(heightcheck = ifelse((is.na(crown.max) & is.na(stratum.min)) |
                                            (!is.na(crown.max) & crown.max <= 5) |
                                            (!is.na(stratum.min) & stratum.min < 5) |
                                            type %in% c('tree'), 'pass','check'))

veg <- veg |> mutate(habit = get.habit.name(get.habit.code(taxon)))

quicklook <- veg[1:15,c("plot", "taxon","type","crown.max", "stratum.min", "heightcheck","habit")]
quicklook |> 
  knitr::kable(row.names = FALSE) |>
  remove_column(1) |> column_spec(1,italic=T) |>
  kableExtra::group_rows(index = table(quicklook$plot)) |>
  kable_classic(full_width = F, html_font = "Cambria")
taxon                      type             crown.max  stratum.min  heightcheck  habit
S2024MI135001
Pinus banksiana            tree             4.0        2.0          pass         tall needleleaf evergreen tree
Pinus banksiana            tree             -          0.5          pass         tall needleleaf evergreen tree
Polytrichum                moss             -          0.0          pass         Bryophyte
Prunus pumila              shrub/vine       -          0.0          pass         broadleaf deciduous shrub
Prunus virginiana          shrub/vine       1.2        0.5          pass         broadleaf deciduous shrub
Prunus serotina            tree             -          0.0          pass         tall broadleaf deciduous tree
Pteridium aquilinum        forb             1.0        0.0          pass         seedless forb
Gaultheria procumbens      shrub/vine       -          0.0          pass         broadleaf evergreen subshrub
Malaxis unifolia           forb             -          0.0          pass         perennial forb
Maianthemum canadense      forb             -          0.0          pass         perennial forb
Melampyrum lineare         forb             -          0.0          pass         annual parasitic forb
Asclepias amplexicaulis    forb             -          0.0          pass         perennial forb
Arctostaphylos uva-ursi    shrub/vine       -          0.0          pass         sclerophyllous subshrub
Apocynum androsaemifolium  forb             -          0.0          pass         perennial forb
Andropogon gerardii        grass/grasslike  -          0.0          pass         perennial warm season graminoid
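
Every record in the sample above passes, so the effect of the rule is easier to see on a small made-up example. The base R sketch below applies the same logical conditions to a hypothetical data frame `veg_toy`; the taxa and heights are invented for illustration only.

```r
# Hypothetical records: a tall tree, a tall vine, a low fern, and a
# shrub with no heights recorded
veg_toy <- data.frame(
  taxon       = c("Pinus banksiana", "Vitis riparia",
                  "Pteridium aquilinum", "Prunus pumila"),
  type        = c("tree", "shrub/vine", "forb", "shrub/vine"),
  crown.max   = c(18, 12, 1, NA),
  stratum.min = c(5, 5, 0, NA),
  stringsAsFactors = FALSE
)

# Same conditions as the heightcheck rule in this section
veg_toy$heightcheck <- with(veg_toy, ifelse(
  (is.na(crown.max) & is.na(stratum.min)) |
    (!is.na(crown.max) & crown.max <= 5) |
    (!is.na(stratum.min) & stratum.min < 5) |
    type %in% "tree",
  "pass", "check"))

veg_toy
```

Only the tall “shrub/vine” record is flagged. In the real workflow, the habit column would then indicate whether such a record is a vine, in which case the height is plausible.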

6.3 Check Structure

The total structure aggregated by stratum from the species records should be more or less compatible with the reported total overstory cover (assuming that the overstory threshold is set at 5 m). The get.structure() function estimates cover by stratum and assigns an overall label for the vegetation structure (e.g. forest, woodland, savanna). Before that, fill.hts.df() estimates plant height from the reported stratum levels and/or live crown heights and fills in missing data based partly on plant type (this is why it is important to record the maximum height of every species in the plot). To compare with the whole-plot estimate of canopy cover, we also need the main vegplot table, where that value is stored. Where only a cover class was recorded, the script substitutes an approximate class midpoint, and plots whose overstory value differs from the aggregated tree cover by more than 20 percentage points are flagged “check”.

veg.str <- veg |> fill.hts.df() |> get.structure() |>
  left_join(vegplot |>
              subset(select=c(vegplotid, overstorycancontotalpct, overstorycancovtotalclass)),
            by=join_by(plot==vegplotid)) |>
  mutate(overstory = case_when(!is.na(overstorycancontotalpct) ~ overstorycancontotalpct,
                               overstorycancovtotalclass %in% "trace" ~ (0.1)/2,
                               overstorycancovtotalclass %in% "0.1 to 1%" ~ (0.1+1)/2,
                               overstorycancovtotalclass %in% "1.1 to 2%" ~ (1+2)/2,
                               overstorycancovtotalclass %in% "2 to 5%" ~ (2+5)/2,
                               overstorycancovtotalclass %in% "6 to 10%" ~ (5+10)/2,
                               overstorycancovtotalclass %in% "11 to 25%" ~ (10+25)/2,
                               overstorycancovtotalclass %in% "26 to 50%" ~ (25+50)/2,
                               overstorycancovtotalclass %in% "51 to 75" ~ (50+75)/2,
                               overstorycancovtotalclass %in% "76 to 95%" ~ (75+95)/2,
                               overstorycancovtotalclass %in% "> 95%" ~ (95+100)/2,
                               TRUE ~ NA),
         check = ifelse(abs(overstory - tree) > 20, 'check','pass'))

quicklook <- veg.str[,c("plot", "tree","shrub","herb","structure","overstory", "check")]
quicklook |> 
  knitr::kable(row.names = FALSE) |>
  kable_classic(full_width = F, html_font = "Cambria")
plot           tree  shrub  herb  structure  overstory  check
S2024MI039001   5.0   95.9  36.3  shrubland          5  pass
S2024MI039002  41.3   75.2   7.6  woodland          41  pass
S2024MI039003  65.0   56.5  30.8  forest            65  pass
S2024MI069001   0.0   89.9  68.0  shrubland          0  pass
S2024MI069002  47.5   24.9   6.9  woodland          47  pass
S2024MI069003  25.0   25.4  20.1  woodland          25  pass
S2024MI135001   0.0   94.1  36.6  shrubland          0  pass
S2024MI135002  37.6   58.9   8.6  woodland          38  pass
S2024MI135003  50.0   57.8  13.4  woodland          50  pass
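
The class-to-midpoint conversion can also be written as a lookup table, which some readers find easier to audit than a long case_when(). The following base R sketch illustrates the idea on hypothetical plots. It uses only three of the class labels from the script above, with the same midpoints, and `plots_toy` and its values are made up.

```r
# Midpoints for a few cover classes, copied from the script in this section
class_midpoint <- c(
  "trace"     = 0.1 / 2,
  "26 to 50%" = (25 + 50) / 2,
  "76 to 95%" = (75 + 95) / 2
)

# Hypothetical plots: one with a percent value, two with only a class
plots_toy <- data.frame(
  plot = c("P1", "P2", "P3"),
  tree = c(41.3, 5.0, 80.0),
  overstorycancontotalpct   = c(41, NA, NA),
  overstorycancovtotalclass = c(NA, "26 to 50%", "76 to 95%"),
  stringsAsFactors = FALSE
)

# Prefer the recorded percent; otherwise fall back to the class midpoint
midpoint <- unname(class_midpoint[as.character(plots_toy$overstorycancovtotalclass)])
plots_toy$overstory <- ifelse(!is.na(plots_toy$overstorycancontotalpct),
                              plots_toy$overstorycancontotalpct,
                              midpoint)

# Flag plots where the two estimates differ by more than 20 percentage points
plots_toy$check <- ifelse(abs(plots_toy$overstory - plots_toy$tree) > 20,
                          "check", "pass")

plots_toy
```

Here the second plot is flagged because the recorded class implies roughly 37.5% overstory while the species records sum to only 5% tree cover.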

6.4 Check Location

Finally, a common error is entering incorrect latitude and longitude coordinates. Most such errors are egregious and can land your points in the wrong county or even the wrong state. The script below lets you confirm that these points all landed in the national forest and that none landed in the lake. It requires the site table, the sf package to convert the coordinates to spatial features, and the mapview package to display an interactive map.

library(sf)
library(mapview)
s <- sites |> subset(usiteid %in% plotfilter$usiteid) |>
  mutate(lat=latstddecimaldegrees, lon = longstddecimaldegrees) |>
  subset(!is.na(lon), select=c(site_id, obsdate, lat, lon, elev, slope, aspect,
                               site_mlra, site_state, site_county,
                               ecositeid, ecositenm, ecostatename, commphasename))
s <- s |> st_as_sf(coords = c(x='lon', y='lat'), crs=st_crs('EPSG:4326'))

mapview(s, col.regions=c('red', 'green', 'yellow'), zcol='commphasename')
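
A map is the most reliable check, but a coarse numeric screen can catch the worst errors without any spatial packages. The base R sketch below flags coordinates that fall outside a rough bounding box. The box limits are an approximate, hand-picked envelope around Michigan and should be treated as an assumption; `sites_toy` and its coordinates are hypothetical.

```r
# Hypothetical site coordinates: one plausible, one with the longitude
# sign dropped, one with transposed digits
sites_toy <- data.frame(
  site_id = c("S001", "S002", "S003"),
  lat     = c(44.6, 44.7, 44.6),
  lon     = c(-84.3, 84.3, -48.3),
  stringsAsFactors = FALSE
)

# Approximate bounding box for Michigan (assumed values, deliberately generous)
box <- list(lat = c(41.6, 48.4), lon = c(-90.5, -82.0))

inside <- sites_toy$lat >= box$lat[1] & sites_toy$lat <= box$lat[2] &
  sites_toy$lon >= box$lon[1] & sites_toy$lon <= box$lon[2]

sites_toy$loccheck <- ifelse(inside, "pass", "check")

sites_toy
```

A bounding box cannot tell a forest from a lake, so this screen complements the interactive map rather than replacing it.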