Chapter 6 Quality Control
The following scripts demonstrate a possible workflow for quality control of vegetation plot data.
6.1 Loading Data
6.1.1 Loading Package Example Data
#remotes::install_github("ncss-tech/soilDB", dependencies = FALSE) #install latest version of soilDB package
#remotes::install_github("phytoclast/vegnasis", dependencies = FALSE) #install latest version of vegnasis package
library(soilDB)
library(vegnasis)
library(dplyr) #provides mutate(), left_join(), case_when() and join_by() used below
#relationship between sites
siteass <- vegnasis::siteass20250414
#site table
sites <- vegnasis::sites20250414
#load veg plot main table
vegplot <- vegnasis::vegplot20250414
#load species composition
veg.raw <- vegnasis::veg.raw20250414
6.1.2 Loading Data From NASIS
#remotes::install_github("ncss-tech/soilDB", dependencies = FALSE) #install latest version of soilDB package
#remotes::install_github("phytoclast/vegnasis", dependencies = FALSE) #install latest version of vegnasis package
library(soilDB)
library(vegnasis)
library(dplyr) #provides mutate(), left_join(), case_when() and join_by() used below
#relationship between sites
siteass <- soilDB::get_site_association_from_NASIS(SS=F)
#site table
sites <- soilDB::get_site_data_from_NASIS_db(SS=F)
#load veg plot main table
vegplot <- soilDB::get_vegplot_from_NASIS_db(SS=F)
#load species composition
veg.raw <- soilDB::get_vegplot_species_from_NASIS_db(SS=F)
6.2 Evaluate Species Composition
First, we must filter the records to a project of interest: in this case, a Dynamic Soil Properties (DSP) project in northern Lower Michigan. We use the site association record to identify which plots belong together. We also process the species composition table with clean.veg(), which pulls values from the various plant cover and height columns and consolidates them.
thesesites <- siteass |> subset(usiteassocid %in% 'DSP-F094AB019MI-2024')
linktoplot <- vegplot |> subset(select=c(siteiid, usiteid, vegplotid))
plotfilter <- linktoplot |> subset(usiteid %in% thesesites$usiteid)
veg <- clean.veg(veg.raw) |> subset(plot %in% plotfilter$vegplotid)
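Before running the checks, it can help to confirm that the project filter caught every plot you expect. The sketch below repeats the same subset chain on small hypothetical tables (the `_toy` table names, site IDs and plot IDs are invented for illustration) and then uses base R setdiff() to list associated sites that have no vegetation plot and matched plots that have no species rows.

```r
# Hypothetical toy tables with the same key columns as siteass, vegplot and veg.
siteass_toy <- data.frame(
  usiteassocid = c("DSP-A", "DSP-A", "DSP-A", "DSP-B"),
  usiteid      = c("S1", "S2", "S5", "S3")
)
vegplot_toy <- data.frame(
  siteiid   = 1:4,
  usiteid   = c("S1", "S2", "S3", "S4"),
  vegplotid = c("P1", "P2", "P3", "P4")
)
veg_toy <- data.frame(
  plot  = c("P1", "P1", "P2", "P3", "P4"),
  taxon = c("Pinus banksiana", "Prunus pumila", "Pinus resinosa",
            "Acer rubrum", "Quercus rubra")
)

# Same chain as the project filter: association -> sites -> plots -> species rows
these_sites <- subset(siteass_toy, usiteassocid %in% "DSP-A")
plot_filter <- subset(vegplot_toy, usiteid %in% these_sites$usiteid)
veg_sub     <- subset(veg_toy, plot %in% plot_filter$vegplotid)

# Associated sites that have no vegetation plot record
missing_sites <- setdiff(these_sites$usiteid, plot_filter$usiteid)
# Plots that were matched but have no species composition rows
plots_without_species <- setdiff(plot_filter$vegplotid, veg_sub$plot)

print(missing_sites)          # "S5"
print(plots_without_species)  # character(0)
```

On the real data, the same two setdiff() calls can be run on thesesites, plotfilter and veg; ideally both return zero-length results.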
6.2.0.1 Check Phytogeography
Potentially the greatest source of error in data entry is an outright incorrect species identification. Other common errors include typing part of a name and accepting an autocompleted, similarly spelled species or common name, mistyping a USDA PLANTS database symbol, or entering only the four-letter abbreviation and omitting the numeric tie-breaker. A simple check of whether the plant has ever been documented on an official species checklist in the vicinity of the project area will catch many such errors.

Use the check.phytogeography() function to screen the taxa against a list of species found in the state of Michigan (specified by its postal code, 'MI'). The code first uses fill.taxon.from.symbols() to derive a taxon name (taxon2) from each PLANTS symbol, so the screen tests the symbol that was actually entered. The output is a column in which "pass" means the record is probably OK and "check" means you should consult a floristic manual or atlas for your state. Some records, such as the genus-level Polytrichum entry below, return "unknown" and also deserve a manual look. In RStudio you would click on the veg data frame in the Environment tab to view the whole table, but in this demo only a portion of the table is rendered, using the kableExtra package for better display in an HTML document.
veg <- veg |> mutate(taxon2 = fill.taxon.from.symbols(symbol))
veg <- veg |> mutate(docgeo = check.phytogeography(taxon2, 'MI'))
library(knitr)#package used to make nice tables
library(kableExtra)#package used to make nice tables
options(knitr.kable.NA = '-')
quicklook <- veg[1:15,c("plot", "taxon", "docgeo")]
quicklook |>
knitr::kable(row.names = FALSE) |>
remove_column(1) |> column_spec(1,italic=T) |>
kableExtra::group_rows(index = table(quicklook$plot)) |>
kable_classic(full_width = F, html_font = "Cambria")
taxon | docgeo |
---|---|
S2024MI135001 | |
Pinus banksiana | pass |
Pinus banksiana | pass |
Polytrichum | unknown |
Prunus pumila | pass |
Prunus virginiana | pass |
Prunus serotina | pass |
Pteridium aquilinum | pass |
Gaultheria procumbens | pass |
Malaxis unifolia | pass |
Maianthemum canadense | pass |
Melampyrum lineare | pass |
Asclepias amplexicaulis | pass |
Arctostaphylos uva-ursi | pass |
Apocynum androsaemifolium | pass |
Andropogon gerardii | pass |
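The idea behind this kind of screen can be illustrated with a simple set-membership test. The sketch below is not the vegnasis implementation: the checklist vector, the helper name screen_taxa(), and the rule that a missing name returns "unknown" are all hypothetical, chosen only to show how a misspelling falls out of the lookup.

```r
# Hypothetical stand-in for a state checklist; check.phytogeography() uses its
# own reference data, which this sketch does not reproduce.
mi_checklist <- c("Pinus banksiana", "Prunus pumila", "Prunus serotina",
                  "Pteridium aquilinum", "Gaultheria procumbens")

# Hypothetical helper: "pass" if the name is on the checklist, "check" if not,
# and "unknown" if no name is available to test.
screen_taxa <- function(taxa, checklist) {
  out <- ifelse(taxa %in% checklist, "pass", "check")
  out[is.na(taxa) | taxa == ""] <- "unknown"
  out
}

# A misspelled name ("banksianna") fails the lookup and is flagged
taxa   <- c("Pinus banksiana", "Pinus banksianna", "Prunus serotina", NA)
result <- screen_taxa(taxa, mi_checklist)
print(result)  # "pass" "check" "pass" "unknown"
```

On the real data, `subset(veg, docgeo != "pass")` lists only the records that need attention.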
6.2.0.2 Check Habit
Another source of error is entering the wrong plant type (growth habit). Sometimes the plant type is correct and the error is actually in the species: a typo in the name produced the wrong species. Other times plant type is confused with stratum, as when someone enters "forb" for a tree seedling because it occurs in the lowest stratum. A common source of inconsistency, rather than outright error, is labeling a tall shrub as a "tree" or a small tree as a "shrub/vine", which can be a matter of opinion. In general, however, the same species should have the same plant type in every stratum in which it occurs, while stratum is assigned strictly by height (except that all herbs are usually treated as one stratum regardless of height). The function fill.type() looks up a standardized list of North American plants and provides a default recommended plant type; records whose entered type differs from it are flagged "check".
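# compare the entered type with the default from fill.type(); missing values are flagged 'check' instead of NA
veg <- veg |> mutate(lookuptype = fill.type(taxon),
                     checktype = ifelse(!is.na(lookuptype) & !is.na(type) & lookuptype == type, 'pass', 'check'))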
quicklook <- veg[1:15,c("plot", "taxon","type","lookuptype", "checktype")]
quicklook |>
knitr::kable(row.names = FALSE) |>
remove_column(1) |> column_spec(1,italic=T) |>
kableExtra::group_rows(index = table(quicklook$plot)) |>
kable_classic(full_width = F, html_font = "Cambria")
taxon | type | lookuptype | checktype |
---|---|---|---|
S2024MI135001 | |||
Pinus banksiana | tree | tree | pass |
Pinus banksiana | tree | tree | pass |
Polytrichum | moss | moss | pass |
Prunus pumila | shrub/vine | shrub/vine | pass |
Prunus virginiana | shrub/vine | shrub/vine | pass |
Prunus serotina | tree | tree | pass |
Pteridium aquilinum | forb | forb | pass |
Gaultheria procumbens | shrub/vine | shrub/vine | pass |
Malaxis unifolia | forb | forb | pass |
Maianthemum canadense | forb | forb | pass |
Melampyrum lineare | forb | forb | pass |
Asclepias amplexicaulis | forb | forb | pass |
Arctostaphylos uva-ursi | shrub/vine | shrub/vine | pass |
Apocynum androsaemifolium | forb | forb | pass |
Andropogon gerardii | grass/grasslike | grass/grasslike | pass |
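fill.type() compares each record with a reference list. A complementary check, not part of the original workflow, is whether the same species was given different plant types within your own data. The sketch below does this in base R on hypothetical records; the column names mirror veg, but the data and the typecheck column are invented for illustration.

```r
# Hypothetical records: Acer rubrum entered as "tree" in one stratum and
# "forb" in another, the kind of inconsistency described above.
veg_toy <- data.frame(
  plot  = c("P1", "P1", "P1", "P2", "P2"),
  taxon = c("Acer rubrum", "Acer rubrum", "Prunus pumila",
            "Acer rubrum", "Prunus pumila"),
  type  = c("tree", "forb", "shrub/vine", "tree", "shrub/vine")
)

# Number of distinct plant types recorded for each taxon across all records
n_types <- tapply(veg_toy$type, veg_toy$taxon, function(x) length(unique(x)))
inconsistent <- names(n_types)[n_types > 1]

# Flag every record of a taxon that was entered with more than one plant type
veg_toy$typecheck <- ifelse(veg_toy$taxon %in% inconsistent, "check", "pass")
print(veg_toy)
```

This version looks across the whole project; grouping by plot instead would only catch inconsistencies within a single plot. Because tall shrub versus small tree can be a matter of opinion, a flag here signals a record to review rather than a confirmed error.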
6.2.0.3 Check Height
One can also check whether a plant is too tall for a plant type not normally considered a tree. In the code below, a record passes if no height was recorded, if its maximum crown height (crown.max) is 5 m or less, if the lower limit of its stratum (stratum.min) is below 5 m, or if its type is "tree"; all other records are flagged "check". A flagged record is not necessarily an error: a "shrub/vine" that reaches into the tree canopy may be a vine, and a "forb" might be an epiphyte rather than an extra-tall herb. For better context, use get.habit.code() and get.habit.name() to obtain a more specific plant habit. If the initial pass suggests that a height "check" is needed, the extended habit name will show whether the plant is an epiphyte or a vine.
veg <- veg |> mutate(heightcheck = ifelse((is.na(crown.max)&is.na(stratum.min)) |
(!is.na(crown.max) & crown.max <= 5) |
(!is.na(stratum.min) & stratum.min < 5) |
type %in% c('tree'), 'pass','check'))
veg <- veg |> mutate(habit = get.habit.name(get.habit.code(taxon)))
quicklook <- veg[1:15,c("plot", "taxon","type","crown.max", "stratum.min", "heightcheck","habit")]
quicklook |>
knitr::kable(row.names = FALSE) |>
remove_column(1) |> column_spec(1,italic=T) |>
kableExtra::group_rows(index = table(quicklook$plot)) |>
kable_classic(full_width = F, html_font = "Cambria")
taxon | type | crown.max | stratum.min | heightcheck | habit |
---|---|---|---|---|---|
S2024MI135001 | |||||
Pinus banksiana | tree | 4.0 | 2.0 | pass | tall needleleaf evergreen tree |
Pinus banksiana | tree | - | 0.5 | pass | tall needleleaf evergreen tree |
Polytrichum | moss | - | 0.0 | pass | Bryophyte |
Prunus pumila | shrub/vine | - | 0.0 | pass | broadleaf deciduous shrub |
Prunus virginiana | shrub/vine | 1.2 | 0.5 | pass | broadleaf deciduous shrub |
Prunus serotina | tree | - | 0.0 | pass | tall broadleaf deciduous tree |
Pteridium aquilinum | forb | 1.0 | 0.0 | pass | seedless forb |
Gaultheria procumbens | shrub/vine | - | 0.0 | pass | broadleaf evergreen subshrub |
Malaxis unifolia | forb | - | 0.0 | pass | perennial forb |
Maianthemum canadense | forb | - | 0.0 | pass | perennial forb |
Melampyrum lineare | forb | - | 0.0 | pass | annual parasitic forb |
Asclepias amplexicaulis | forb | - | 0.0 | pass | perennial forb |
Arctostaphylos uva-ursi | shrub/vine | - | 0.0 | pass | sclerophyllous subshrub |
Apocynum androsaemifolium | forb | - | 0.0 | pass | perennial forb |
Andropogon gerardii | grass/grasslike | - | 0.0 | pass | perennial warm season graminoid |
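If you want to apply the same height rule with a different break, or test it outside the veg table, it can be wrapped in a small function. The sketch below restates the rule from the mutate() call above; the function name height_check(), its limit argument, and the example records are hypothetical.

```r
# Hypothetical helper that restates the height rule used above.
# `limit` is the overstory break in metres (5 m in this chapter).
height_check <- function(type, crown.max, stratum.min, limit = 5) {
  no_height <- is.na(crown.max) & is.na(stratum.min)      # nothing to test
  low_crown <- !is.na(crown.max) & crown.max <= limit     # crown stays at or below the limit
  low_strat <- !is.na(stratum.min) & stratum.min < limit  # stratum starts below the limit
  is_tree   <- type %in% "tree"                           # trees may be any height
  ifelse(no_height | low_crown | low_strat | is_tree, "pass", "check")
}

# Hypothetical records: a vine climbing into the canopy is flagged for review
toy <- data.frame(
  taxon       = c("Vitis riparia", "Pinus banksiana", "Prunus pumila", "Maianthemum canadense"),
  type        = c("shrub/vine", "tree", "shrub/vine", "forb"),
  crown.max   = c(12, 20, 1, NA),
  stratum.min = c(8, 15, 0, NA)
)
toy$heightcheck <- height_check(toy$type, toy$crown.max, toy$stratum.min)
print(toy[, c("taxon", "type", "heightcheck")])
```

As in the chapter's rule, the flagged vine is a candidate for review with get.habit.name(), not necessarily an error.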
6.3 Check Structure
The total cover aggregated by stratum from the species records should be roughly compatible with the reported total overstory cover (assuming that the overstory break is set at 5 m). The get.structure() function estimates the cover by stratum and assigns an overall structure label (e.g. forest, woodland, savanna). Before that, fill.hts.df() estimates plant heights from the reported stratum limits and/or live crown heights, filling in missing data based partly on plant type (which is why it is important to record the maximum height of every species in the plot). To compare with the whole-plot estimate of canopy cover, we also need the main vegplot table, where that value is stored. The code uses the recorded overstory percent when available, otherwise the midpoint of the recorded cover class, and flags a plot "check" when that value differs from the estimated tree cover by more than 20 percentage points.
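Species covers overlap, so a stratum total is usually less than the sum of its species covers. One common way to combine them assumes species are distributed independently, so the uncovered fraction of the stratum is the product of each species' uncovered fraction. The sketch below implements that assumption in base R; it illustrates the idea and is not necessarily the exact formula get.structure() uses. The helper name aggregate_cover() and the example records are hypothetical.

```r
# Combine overlapping percent covers into one total, assuming independent overlap:
# total = 100 * (1 - prod(1 - cover / 100))
aggregate_cover <- function(cover) {
  cover <- cover[!is.na(cover)]        # ignore missing covers
  100 * (1 - prod(1 - cover / 100))    # prod(numeric(0)) is 1, so no cover gives 0
}

aggregate_cover(c(50, 50))  # 75, not 100

# Per-stratum totals on hypothetical species records
toy <- data.frame(stratum = c("tree", "tree", "shrub"),
                  cover   = c(40, 30, 60))
stratum_cover <- tapply(toy$cover, toy$stratum, aggregate_cover)
print(stratum_cover)  # shrub 60, tree 58
```

Under this assumption a total can approach but never exceed 100%, which is consistent with shrub totals such as 95.9 in the table below.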
veg.str <- veg |> fill.hts.df() |> get.structure() |>
  # attach the whole-plot overstory cover recorded in the main vegplot table
  left_join(vegplot |>
              subset(select=c(vegplotid, overstorycancontotalpct, overstorycancovtotalclass)),
            by=join_by(plot==vegplotid)) |>
  # use the recorded percent when present; otherwise use the midpoint of the cover class
  mutate(overstory = case_when(!is.na(overstorycancontotalpct) ~ overstorycancontotalpct,
                               overstorycancovtotalclass %in% "trace" ~ (0.1)/2,
                               overstorycancovtotalclass %in% "0.1 to 1%" ~ (0.1+1)/2,
                               overstorycancovtotalclass %in% "1.1 to 2%" ~ (1+2)/2,
                               overstorycancovtotalclass %in% "2 to 5%" ~ (2+5)/2,
                               overstorycancovtotalclass %in% "6 to 10%" ~ (5+10)/2,
                               overstorycancovtotalclass %in% "11 to 25%" ~ (10+25)/2,
                               overstorycancovtotalclass %in% "26 to 50%" ~ (25+50)/2,
                               # accept this label with or without the trailing "%"
                               overstorycancovtotalclass %in% c("51 to 75%", "51 to 75") ~ (50+75)/2,
                               overstorycancovtotalclass %in% "76 to 95%" ~ (75+95)/2,
                               overstorycancovtotalclass %in% "> 95%" ~ (95+100)/2,
                               TRUE ~ NA_real_),
         # flag plots where recorded overstory and species-based tree cover differ by > 20 points
         check = ifelse(abs(overstory - tree) > 20, 'check', 'pass'))
quicklook <- veg.str[,c("plot", "tree","shrub","herb","structure","overstory", "check")]
quicklook |>
knitr::kable(row.names = FALSE) |>
kable_classic(full_width = F, html_font = "Cambria")
plot | tree | shrub | herb | structure | overstory | check |
---|---|---|---|---|---|---|
S2024MI039001 | 5.0 | 95.9 | 36.3 | shrubland | 5 | pass |
S2024MI039002 | 41.3 | 75.2 | 7.6 | woodland | 41 | pass |
S2024MI039003 | 65.0 | 56.5 | 30.8 | forest | 65 | pass |
S2024MI069001 | 0.0 | 89.9 | 68.0 | shrubland | 0 | pass |
S2024MI069002 | 47.5 | 24.9 | 6.9 | woodland | 47 | pass |
S2024MI069003 | 25.0 | 25.4 | 20.1 | woodland | 25 | pass |
S2024MI135001 | 0.0 | 94.1 | 36.6 | shrubland | 0 | pass |
S2024MI135002 | 37.6 | 58.9 | 8.6 | woodland | 38 | pass |
S2024MI135003 | 50.0 | 57.8 | 13.4 | woodland | 50 | pass |
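The long case_when() above can also be expressed as a named lookup vector, which keeps the class midpoints in one place and makes unmatched labels easy to spot. The sketch below is an alternative written for illustration, not part of vegnasis; the names class_mid, overstory_estimate() and structure_check(), and the example plot values, are hypothetical. The midpoints match those in the case_when() call.

```r
# Midpoints of the overstory cover classes, matching the case_when() values above
class_mid <- c("trace" = 0.05, "0.1 to 1%" = 0.55, "1.1 to 2%" = 1.5,
               "2 to 5%" = 3.5, "6 to 10%" = 7.5, "11 to 25%" = 17.5,
               "26 to 50%" = 37.5, "51 to 75%" = 62.5, "76 to 95%" = 85,
               "> 95%" = 97.5)

# Recorded percent if present, otherwise the class midpoint (NA if the label is not in the table)
overstory_estimate <- function(pct, cls) {
  ifelse(!is.na(pct), pct, unname(class_mid[as.character(cls)]))
}

# Flag plots whose recorded and species-based tree cover differ by more than `tol` points
structure_check <- function(overstory, tree, tol = 20) {
  ifelse(abs(overstory - tree) > tol, "check", "pass")
}

# Hypothetical plot values
pct <- c(41, NA, NA, NA)
cls <- c(NA, "26 to 50%", "51 to 75%", "bogus label")
overstory <- overstory_estimate(pct, cls)
print(overstory)                                        # 41.0 37.5 62.5 NA
print(structure_check(overstory, c(41.3, 65, 60, 10)))  # "pass" "check" "pass" NA
```

To find class labels in your own data that such a lookup would miss, `setdiff(unique(vegplot$overstorycancovtotalclass), c(names(class_mid), NA))` lists them.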
6.4 Check Location
Finally, a common error that can undermine the purpose of a survey is entering incorrect latitude and longitude coordinates. Most such errors are egregious and can place points in the wrong county or even the wrong state. Running the script below lets you confirm that these points all fall within the national forest and that none fall in the lake. This script requires the site table, the sf package to convert the coordinates to spatial features, and the mapview package to display an interactive map.
library(sf)
library(mapview)
s <- sites |> subset(usiteid %in% plotfilter$usiteid) |>
mutate(lat=latstddecimaldegrees, lon = longstddecimaldegrees) |> subset(!is.na(lon), select=c(site_id, obsdate, lat, lon, elev, slope, aspect, site_mlra, site_state, site_county, ecositeid, ecositenm, ecostatename, commphasename))
s <- s |> st_as_sf(coords = c(x='lon', y='lat'), crs=st_crs('EPSG:4326'))
mapview(s, col.regions=c('red', 'green', 'yellow'), zcol='commphasename')
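A map is the best visual check, but a coarse numeric screen can catch the most egregious coordinate errors before mapping, such as a missing value or a longitude entered without its minus sign. The sketch below assumes an approximate bounding box for Michigan (rough values, for illustration only); the helper name coord_check() and the example coordinates are hypothetical.

```r
# Rough bounding box for Michigan in decimal degrees (approximate values,
# for illustration only; substitute bounds for your own project area)
bbox <- c(lon_min = -90.5, lon_max = -82.1, lat_min = 41.6, lat_max = 48.4)

# Hypothetical helper: classify each coordinate pair
coord_check <- function(lat, lon, box) {
  inside <- lat >= box[["lat_min"]] & lat <= box[["lat_max"]] &
            lon >= box[["lon_min"]] & lon <= box[["lon_max"]]
  ifelse(is.na(lat) | is.na(lon), "missing",
         ifelse(lon > 0, "check sign",          # western-hemisphere longitudes are negative
                ifelse(inside, "pass", "check")))
}

# Hypothetical coordinates: one good point and four typical mistakes
lat <- c(44.6, 44.6, 44.6, NA, 34.0)
lon <- c(-84.2, 84.2, -95.0, -84.2, -84.2)
print(coord_check(lat, lon, bbox))
# "pass" "check sign" "check" "missing" "check"
```

In the workflow above, the function could be applied to s$lat and s$lon after the mutate()/subset() step and before st_as_sf() converts s. A bounding box is coarse: it also covers parts of neighboring states, Ontario, and the Great Lakes, so the interactive map remains the better check for points that land in a lake.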