Overview

This document demonstrates how to calculate common biological community metrics, such as richness, relative abundance, and diversity indices using the mmir R-package.

Preparation

The package, devtools, must be installed to use the devtools::install_github() function. You will also need to install the dplyr package to aid in data manipulation.

install.packages("devtools",
                 "dplyr")

Install the Multi-Metric Index (MMI) package, mmir, that I am developing from GitHub.

devtools::install_github("zsmith27/mmir")

Once mmir is installed, load the packages with library().

library(mmir)

Calculate Metrics

One of the major goals of the mmir package was to simplify the metric calculation procedure. Five base taxa functions (i.e., taxa_rich(), taxa_pct_rich(), taxa_div(), taxa_pct(), and taxa_abund()) are capable of calculating the majority of metrics that are used in Multi-Metric Indices of biotic integrity.

Preparation

As an example data set, mmir contains benthic macroinvertebrates data collected in the Northern Appalachians as part of the 2008-2009 National Rivers and Streams Assessment conducted by the United States Environmental Protection Agency.

data("nrsa_nap_0809", package = "mmir")

Use taxa_fill to replace NAs in the taxonomic hierarchy with the nearest identified taxonomic rank. Note that the order the columns are presented is used to create the hierarchy that identifies the previous taxonomic rank. For example, you want your columns to be ordered as: Phylum, Class, Order, Family, Genus.

nrsa_nap_fill <- nrsa_nap_0809 %>% 
  taxa_fill(.final_id = target_taxon,
            .prefix = "unidentified",
            phylum:genus)

Create a data frame that will store the calculated metrics. This should include a unique site ID and possibly a few other important columns, such as stream gradient classification column (e.g., Reference and Degraded). I recommend not carrying all of the site information through the metric calculation process because it can easily be appended at the end of the calculation process or at another later step. This will reduce clutter, allowing you to focus on the metric values, and will help you manage memory allocation. Additionally, do NOT alter the order of the unique identifier. The output from each metric function is a vector in the order of the supplied data frame of taxonomic counts (in this case onondaga). If the unique identifier is rearranged, no error will appear but the metric values will be associated with incorrect unique identifier.

nest.df <- nrsa_nap_fill %>% 
  dplyr::group_nest(uid,  rt_nrsa_cat, .key = "data") 

Input Variable Definitions

Standard input variable definitions for functions in the mmir package.

  • .dataframe = A data frame in a long data format, where each row represents a unique sampling event and taxon.
  • key.col = A single column that represents a unique ID or key (i.e., site ID) that can be used to group the data. This variable should be input using NSE syntax (i.e., not quoted).
  • .group_col = A single column that represents a lower resolution taxonomic rank than the taxonomic rank specified for taxa.col. This variable should be input using NSE syntax (i.e., not quoted).
  • taxa.col = A single column that represents a higher resolution taxonomic rank than the taxonomic rank specified for .group_col. This variable should be input using NSE syntax (i.e., not quoted).
  • taxon = A object or vector, generally a character string or a character vector, that is used subsets the data frame to only include rows that match the string(s) in taxa.col.
  • .counts_col = A single numeric column that represents taxonomic counts. This variable should be input using NSE syntax (i.e., not quoted).

Richness Metrics

Richness refers to the number of unique taxa. Typically, richness is used to refer to the number of unique species found in a sample but richness can be calculated for any taxonomic rank (e.g., order, family, genus).

Community Richness

taxa_rich() is used to calculate richness. A long format data frame containing taxonomic counts is specified as the .dataframe variable (onondaga). The name of a unique ID column is specified for the .key_col variable (uid). Note that this is done with NSE syntax (i.e., no quotes). The .group_col and .filter_col refer to low resolution and high resolution taxonomic rank columns of interest. .filter_col is required but .group_col is only necessary when calculating subgroup richness. Furthermore, the taxon variable is only necessary for specifying which taxon or taxa will be used to calculate subgroup richness; therefore, when calculating community richness, taxon should be set to NULL. In this example family-level and genus-level community richness are calculated and appended to metrics.key. DT::datatable() is only used to present interactive tables in this document.

rich.df <- nest.df %>% 
  dplyr::mutate(
    rich_family = taxa_rich(.dataframe = .,
                            .key_col = uid,
                            .group_col = family,
                            .unnest_col = data),
    rich_genus = taxa_rich(.dataframe = .,
                            .key_col = uid,
                            .group_col = genus,
                            .unnest_col = data)
  )


rich.df %>% 
  dplyr::mutate(data = "nested dataframe") %>% 
  head() %>% 
  knitr::kable()
uid rt_nrsa_cat data rich_family rich_genus
11134 least disturbed nested dataframe 25 47
11135 least disturbed nested dataframe 26 54
11136 least disturbed nested dataframe 28 48
11137 least disturbed nested dataframe 23 36
11138 least disturbed nested dataframe 26 37
11564 most disturbed nested dataframe 19 36

Subgroup Richness

taxa_rich() can also be used to calculate subgroup richness, which refers to calculating the richness of only select set of taxa. In the example below Ephemeroptera (Mayfly) richness is calculated at the family- and genus-levels. More than one taxonomic group can be specified, as shown for rich_ept_gen (taxon = c("ephemeroptera", "plecoptera", "trichoptera")). The taxon variable is used to filter the data frame based on character string matches found in specified .group_col. In the example, the onondaga data frame is subset to only include rows where the order column specifies “ephemeroptera”. The .filter_col (e.g., family or genus) is then used to find the number of unique ephemeroptera taxa at this specified taxonomic level. If taxon was set to “baetidae” (a family of ephmeroptera) in the example below, then taxa_rich() would return all zeros because the character string “baetidae” would never be found in the order-level column.

sub_rich.df <- nest.df %>% 
  dplyr::mutate(
    rich_ephemeroptera_fam = taxa_rich(.dataframe = .,
                            .key_col = uid,
                            .group_col = family,
                            .filter = order %in% "ephemeroptera",
                            .unnest_col = data),
    rich_ephemeroptera_gen = taxa_rich(.dataframe = .,
                            .key_col = uid,
                            .group_col = genus,
                            .filter = order %in% "ephemeroptera",
                            .unnest_col = data),
    rich_ept_gen = taxa_rich(.dataframe = .,
                            .key_col = uid,
                            .group_col = genus,
                            .filter = order %in% c("ephemeroptera",
                                                       "plecoptera",
                                                       "trichoptera"),
                            .unnest_col = data)
  )

sub_rich.df %>% 
  dplyr::mutate(data = "nested dataframe") %>% 
  head() %>% 
  knitr::kable() 
uid rt_nrsa_cat data rich_ephemeroptera_fam rich_ephemeroptera_gen rich_ept_gen
11134 least disturbed nested dataframe 2 7 21
11135 least disturbed nested dataframe 4 9 25
11136 least disturbed nested dataframe 4 8 22
11137 least disturbed nested dataframe 5 8 21
11138 least disturbed nested dataframe 4 4 16
11564 most disturbed nested dataframe 1 1 3

Diversity Metrics

Diversity metric in general combine the measure of taxonomic evenness and taxonomic richness. ADD EVENNESS DESCRIPTION. A number of diversity metrics can be calculated using taxa_div(): Shannon-Wiener Diversity (“shannon”), Simpson’s Diversity (“simpson”), Margalef’s Diversity (“margalef”), Menhinick’s Diversity (“menhinick”), and Pielou Evenness (“pielou”).

Community Diversity

div.df <- nest.df %>% 
  dplyr::mutate(
    shannon_genus = taxa_div(.dataframe = .,
                            .key_col = uid,
                            .counts_col = total,
                            .group_col = genus,
                            .job = "shannon",
                             .base_log = 2,
                            .unnest_col = data),
    simpson_genus = taxa_div(.dataframe = .,
                            .key_col = uid,
                            .counts_col = total,
                            .group_col = genus,
                            .job = "simpson",
                            .unnest_col = data),
    margalef_genus = taxa_div(.dataframe = .,
                            .key_col = uid,
                            .counts_col = total,
                            .group_col = genus,
                            .job = "margalef",
                            .unnest_col = data),
    menhinick_genus = taxa_div(.dataframe = .,
                            .key_col = uid,
                            .counts_col = total,
                            .group_col = genus,
                            .job = "menhinick",
                            .unnest_col = data),
    pielou_genus = taxa_div(.dataframe = .,
                            .key_col = uid,
                            .counts_col = total,
                            .group_col = genus,
                            .job = "pielou",
                            .unnest_col = data)
  )

div.df %>% 
  dplyr::mutate(data = "nested dataframe") %>% 
  head() %>% 
  knitr::kable() 
uid rt_nrsa_cat data shannon_genus simpson_genus margalef_genus menhinick_genus pielou_genus
11134 least disturbed nested dataframe 4.074962 0.1018417 17.43422 2.253477 1.672098
11135 least disturbed nested dataframe 4.584539 0.0694405 20.37699 2.703381 1.732394
11136 least disturbed nested dataframe 4.202079 0.1019651 18.66180 2.642313 1.681241
11137 least disturbed nested dataframe 3.723167 0.1819546 15.93880 2.873113 1.556302
11138 least disturbed nested dataframe 2.869174 0.3291442 14.06301 1.941996 1.568202
11564 most disturbed nested dataframe 2.678898 0.2865829 12.91861 1.590990 1.556302

Subgroup Diversity

sub_div.df <- nest.df %>% 
  dplyr::mutate(
    gini_simpson_ept = taxa_div(.dataframe = .,
                            .key_col = uid,
                            .counts_col = total,
                            .group_col = order,
                            .filter = genus %in% c("ephemeroptera",
                                                       "plecoptera",
                                                       "trichoptera"),
                            .job = "gini_simpson",
                            .unnest_col = data),
    simpson_ept = taxa_div(.dataframe = .,
                            .key_col = uid,
                            .counts_col = total,
                            .group_col = order,
                            .filter = genus %in% c("ephemeroptera",
                                                       "plecoptera",
                                                       "trichoptera"),
                            .job = "simpson",
                            .unnest_col = data),
    shannon_ept = taxa_div(.dataframe = .,
                            .key_col = uid,
                            .counts_col = total,
                            .group_col = order,
                            .filter = genus %in% c("ephemeroptera",
                                                       "plecoptera",
                                                       "trichoptera"),
                            .job = "shannon",
                            .base_log = 2,
                            .unnest_col = data)
  )

sub_div.df %>% 
  dplyr::mutate(data = "nested dataframe") %>% 
  head() %>% 
  knitr::kable() 

Dominance

dom.df <- nest.df %>% 
  dplyr::mutate(
    dom_1_target_taxon = taxa_dom(.dataframe = .,
                                  .key_col = uid,
                                  .counts_col = total,
                                  .group_col = target_taxon,
                                  .dom_level = 1,
                                  .unnest_col = data),
    dom_5_target_taxon = taxa_dom(.dataframe = .,
                                  .key_col = uid,
                                  .counts_col = total,
                                  .group_col = target_taxon,
                                  .dom_level = 5,
                                  .unnest_col = data)
  )

dom.df %>% 
  dplyr::mutate(data = "nested dataframe") %>% 
  head() %>% 
  knitr::kable() 
uid rt_nrsa_cat data dom_1_target_taxon dom_5_target_taxon
11134 least disturbed nested dataframe 21.37931 63.67816
11135 least disturbed nested dataframe 18.04511 48.37093
11136 least disturbed nested dataframe 23.33333 58.48485
11137 least disturbed nested dataframe 40.12739 63.05732
11138 least disturbed nested dataframe 55.64738 76.30854
11564 most disturbed nested dataframe 42.77344 83.78906

Community Metrics

Percentages

taxa_pct() calculates the percentage of the sample represented by a taxon or taxa. The example below shows how to calculate the percentage of a single taxon, Ephemeroptera (pct_ephemeroptera), and multiple taxa, EPT (pct_ept). Remember that .filter_col refers to the name of the column that contains the taxon or taxa of interest. The .filter_vec variable is then used to specify these taxon or taxa.

pct.df <- nest.df %>% 
  dplyr::mutate(
    pct_ephemeroptera = taxa_pct(.dataframe = .,
                                  .key_col = uid,
                                  .counts_col = total,
                                  .filter = order %in% "ephemeroptera",
                                 .unnest_col = data),
    pct_ept = taxa_pct(.dataframe = .,
                        .key_col = uid,
                        .counts_col = total,
                        .filter = order %in% c("ephemeroptera",
                                                   "plecoptera",
                                                   "trichoptera"),
                       .unnest_col = data)
  )

pct.df %>% 
  dplyr::mutate(data = "nested dataframe") %>% 
  head() %>% 
  knitr::kable()
uid rt_nrsa_cat data pct_ephemeroptera pct_ept
11134 least disturbed nested dataframe 5.287356 17.93103
11135 least disturbed nested dataframe 26.817043 60.40100
11136 least disturbed nested dataframe 12.424242 41.51515
11137 least disturbed nested dataframe 15.923567 77.07006
11138 least disturbed nested dataframe 3.581267 26.44628
11564 most disturbed nested dataframe 1.953125 2.34375

Abundances

Abundance refers to reported taxonomic count but may represent counts aggregated at a lower resolution taxonomic rank. For example, taxa may be reported at the genus-level but abundance could be calculated at the order-level. Generally, I have not found abundance metrics used in macorinvertebrate indices but they are common in fish indices. taxa_abund() has the same input as taxa_pct() (See Percentages).

abund.df <- nest.df %>% 
  dplyr::mutate(
    abund_ephemeroptera = taxa_abund(.,
                                  .key_col = uid,
                                  .counts_col = total,
                                  .filter = order %in% "ephemeroptera",
                                  .unnest_col = data),
    abund_ept = taxa_abund(.,
                        .key_col = uid,
                        .counts_col = total,
                        .filter = order %in% c("ephemeroptera",
                                                   "plecoptera",
                                                   "trichoptera"),
                        .unnest_col = data)
  )

abund.df %>% 
  dplyr::mutate(data = "nested dataframe") %>% 
  head() %>% 
  knitr::kable()
uid rt_nrsa_cat data abund_ephemeroptera abund_ept
11134 least disturbed nested dataframe 23 78
11135 least disturbed nested dataframe 107 241
11136 least disturbed nested dataframe 41 137
11137 least disturbed nested dataframe 25 121
11138 least disturbed nested dataframe 13 96
11564 most disturbed nested dataframe 10 12

Sequencing Metric Calculations

When developing a new index, the developer usually tries to evaluate as many metrics as possible to obtain the most sensitive metrics. Writing individual lines of code for each taxon using taxa_rich(), taxa_pct_rich(), taxa_pct(), or taxa_abund() would be very time consuming and prone to typos. taxa_seq() was developed to make these calculations simple and safe.

This is a wrapper function around the previously discussed taxa metrics (taxa_rich(), taxa_pct_rich(), taxa_pct(), or taxa_abund()), which loops through each taxon in the specified taxonomic rank or taxonomic attribute column(s) calculating taxon richness, taxon percent richness, taxon percentages, or taxon abundance. The input variable, job, is used to specify which type of metrics to calculate (i.e., “rich”, “pct_rich”, “pct”, and “abund”). The character string used to specify the job will be used as a prefix for the column names (e.g., “rich_ephemeroptera” or “pct_plecoptera”). The remaining inputs are the same as the base taxa functions (taxa_rich(), taxa_pct_rich(), taxa_pct(), or taxa_abund()) that taxa_seq() wraps around. This also means that there will frequently be many input variables that you do not need to use. For example, if job = "pct", then hich.taxa.col, base.log, and q do not need to be specified because these variables are only used for richness and/or diversity related calculations. It is often helpful to refer back to the base taxa function to identify the necessary inputs for the specified job. Additionally, each metric only represents a single taxon. The output will therefore not include metrics, such as the percentage of EPT taxa or the richness of Gathers’ and Filter Feeder taxa. These metrics will need to be calculated individually using the base taxa functions.

The example below calculates taxa richness, percent taxa richness, taxa percentage, and taxa abundance for all of the taxa under the order and family columns. The taxa.cols input variable could be reduced to just one column (taxa.cols = "family") or could be expanded to include as many taxonomic rank and/or taxonomic attribute columns as you want (taxa.cols = c("class", order", "suborder", "family", "tolerance_values", "functional_feeding_groups", "habits")). I suggest specifying all of your taxonomic rank and taxonomic attribute columns. This will create a lot of output, could potentially take awhile to calculate, and the majority of the metrics will have very poor metric sensitivity; however, the sensitivity() function (See [Calculate Metric Sensitivity]) will allow you to quickly filter out poorly performing metrics. Calculating a large amount of metrics will give you more opptertunities to identify patterns that reflect your defined disturbance gradient. I used this function during the development of the Chessie BIBI and I found several metrics that are not commonly used to have high metric sensitivity values. For example, the percent of Systellognatha (a suborder of Plecoptera) was more sensitive than the commonly used percent of Plecoptera metric. This extensive exploration of your data also comes with the danger of overfitting your index. To prevent this issue please use index validation procedure (e.g., Hold-Out validation or Cross Validation) during your index development.

seq.df <- nest.df %>% 
  dplyr::bind_cols(
    taxa_seq(.dataframe = .,
             .key_col = uid,
             .counts_col = total,
             .filter_cols_vec = c("class", "order"),
             .group_col = genus,
             .job = "rich",
             .unnest_col = data)
    )
#> New names:
#> * rich_unidentified_nemata -> rich_unidentified_nemata...9
#> * rich_unidentified_cnidaria -> rich_unidentified_cnidaria...11
#> * rich_unidentified_nemata -> rich_unidentified_nemata...36
#> * rich_unidentified_cnidaria -> rich_unidentified_cnidaria...41
#> New names:
#> * rich_unidentified_nemata...9 -> rich_unidentified_nemata...12
#> * rich_unidentified_cnidaria...11 -> rich_unidentified_cnidaria...14
#> * rich_unidentified_nemata...36 -> rich_unidentified_nemata...39
#> * rich_unidentified_cnidaria...41 -> rich_unidentified_cnidaria...44

dplyr::tibble(
  "List of Column Names" = names(seq.df)
) %>% 
knitr::kable()
List of Column Names
uid
rt_nrsa_cat
data
rich_insecta
rich_arachnida
rich_gastropoda
rich_oligochaeta
rich_bivalvia
rich_malacostraca
rich_turbellaria
rich_enopla
rich_unidentified_nemata…12
rich_hirudinea
rich_unidentified_cnidaria…14
rich_polychaeta
rich_ostracoda
rich_trichoptera
rich_diptera
rich_trombidiformes
rich_coleoptera
rich_ephemeroptera
rich_basommatophora
rich_plecoptera
rich_lumbriculida
rich_megaloptera
rich_odonata
rich_haplotaxida
rich_veneroida
rich_isopoda
rich_amphipoda
rich_unidentified_turbellaria
rich_enchytraeida
rich_neotaenioglossa
rich_unidentified_arachnida
rich_lepidoptera
rich_decapoda
rich_hemiptera
rich_hoplonemertea
rich_unidentified_nemata…39
rich_architaenioglossa
rich_unidentified_gastropoda
rich_sarcoptiformes
rich_rhynchobdellida
rich_unidentified_cnidaria…44
rich_arhynchobdellida
rich_sabellida
rich_cumacea
rich_unidentified_hirudinea
rich_unionoida
rich_heterostropha
rich_neuroptera
rich_branchiobdellida
rich_mysida
rich_unidentified_ostracoda
rich_unidentified_bivalvia
rich_unidentified_oligochaeta

# seq.df %>% 
#   dplyr::mutate(data = "nested dataframe") %>% 
#   head() %>% 
#   dplyr::select(uid:rich_genus_bivalvia) %>% 
#   knitr::kable()

Putting it All Together

The following code-chunk will produce more than 1,000 biological metrics.

metrics.df <- nest.df %>% 
  dplyr::mutate(
    rich_family = taxa_rich(.dataframe = .,
                            .key_col = uid,
                            .group_col = family,
                            .unnest_col = data),
    rich_genus = taxa_rich(.dataframe = .,
                           .key_col = uid,
                           .group_col = genus,
                           .unnest_col = data),
    rich_target_taxon = taxa_rich(.dataframe = .,
                                  .key_col = uid,
                                  .group_col = target_taxon,
                                  .unnest_col = data),
    gini_simpson_ept = taxa_div(.dataframe = .,
                                .key_col = uid,
                                .counts_col = total,
                                .group_col = target_taxon,
                                .filter = order %in% c("ephemeroptera",
                                                       "plecoptera",
                                                       "trichoptera"),
                                .job = "gini_simpson",
                                .unnest_col = data),
    simpson_ept = taxa_div(.dataframe = .,
                           .key_col = uid,
                           .counts_col = total,
                           .group_col = target_taxon,
                           .filter = order %in% c("ephemeroptera",
                                                  "plecoptera",
                                                  "trichoptera"),
                           .job = "simpson",
                           .unnest_col = data),
    shannon_ept = taxa_div(.dataframe = .,
                           .key_col = uid,
                           .counts_col = total,
                           .group_col = target_taxon,
                           .filter = order %in% c("ephemeroptera",
                                                  "plecoptera",
                                                  "trichoptera"),
                           .job = "shannon",
                           .base_log = 2,
                           .unnest_col = data),
    pct_ept = taxa_pct(.dataframe = .,
                       .key_col = uid,
                       .counts_col = total,
                       .filter = order %in% c("ephemeroptera",
                                              "plecoptera",
                                              "trichoptera"),
                       .unnest_col = data),
    pct_cote = taxa_pct(.dataframe = .,
                        .key_col = uid,
                        .counts_col = total,
                        .filter = order %in% c("coleoptera",
                                               "odonata",
                                               "trichoptera",
                                               "ephemeroptera"),
                        .unnest_col = data),
    dom_1_target_taxon = taxa_dom(.dataframe = .,
                                  .key_col = uid,
                                  .counts_col = total,
                                  .group_col = target_taxon,
                                  .dom_level = 1,
                                  .unnest_col = data),
    dom_5_target_taxon = taxa_dom(.dataframe = .,
                                  .key_col = uid,
                                  .counts_col = total,
                                  .group_col = target_taxon,
                                  .dom_level = 5,
                                  .unnest_col = data),
    tol_index = taxa_tol_index(.dataframe = .,
                               .key_col = uid,
                               .counts_col = total,
                               .tol_col = ptv,
                               .unnest_col = data)
  ) %>% 
  dplyr::bind_cols(
    taxa_seq(.dataframe = .,
             .key_col = uid,
             .counts_col = total,
             .filter_cols_vec = c("class", "order", "family"),
             .group_col = target_taxon,
             .job = "rich",
             .unnest_col = data),
    taxa_seq(.dataframe = .,
             .key_col = uid,
             .counts_col = total,
             .filter_cols_vec = c("class", "order", "family"),
             .group_col = target_taxon,
             .job = "pct_rich",
             .unnest_col = data),
    taxa_seq(.dataframe = .,
             .key_col = uid,
             .counts_col = total,
             .filter_cols_vec = c("class", "order", "family", "genus"),
             .job = "pct",
             .unnest_col = data),
    taxa_seq(.dataframe = .,
             .key_col = uid,
             .counts_col = total,
             .filter_cols_vec = c("class", "order", "family"),
             .group_col = target_taxon,
             .job = "simpson",
             .unnest_col = data),
  )
#> New names:
#> * rich_unidentified_nemata -> rich_unidentified_nemata...9
#> * rich_unidentified_cnidaria -> rich_unidentified_cnidaria...11
#> * rich_unidentified_turbellaria -> rich_unidentified_turbellaria...28
#> * rich_unidentified_arachnida -> rich_unidentified_arachnida...31
#> * rich_unidentified_nemata -> rich_unidentified_nemata...36
#> * ...
#> New names:
#> * pct_rich_unidentified_nemata -> pct_rich_unidentified_nemata...9
#> * pct_rich_unidentified_cnidaria -> pct_rich_unidentified_cnidaria...11
#> * pct_rich_unidentified_turbellaria -> pct_rich_unidentified_turbellaria...28
#> * pct_rich_unidentified_arachnida -> pct_rich_unidentified_arachnida...31
#> * pct_rich_unidentified_nemata -> pct_rich_unidentified_nemata...36
#> * ...
#> New names:
#> * pct_NULL_unidentified_nemata -> pct_NULL_unidentified_nemata...9
#> * pct_NULL_unidentified_cnidaria -> pct_NULL_unidentified_cnidaria...11
#> * pct_NULL_unidentified_turbellaria -> pct_NULL_unidentified_turbellaria...28
#> * pct_NULL_unidentified_arachnida -> pct_NULL_unidentified_arachnida...31
#> * pct_NULL_unidentified_nemata -> pct_NULL_unidentified_nemata...36
#> * ...
#> New names:
#> * simpson_unidentified_nemata -> simpson_unidentified_nemata...9
#> * simpson_unidentified_cnidaria -> simpson_unidentified_cnidaria...11
#> * simpson_unidentified_turbellaria -> simpson_unidentified_turbellaria...28
#> * simpson_unidentified_arachnida -> simpson_unidentified_arachnida...31
#> * simpson_unidentified_nemata -> simpson_unidentified_nemata...36
#> * ...
#> New names:
#> * rich_unidentified_nemata...9 -> rich_unidentified_nemata...23
#> * rich_unidentified_cnidaria...11 -> rich_unidentified_cnidaria...25
#> * rich_unidentified_turbellaria...28 -> rich_unidentified_turbellaria...42
#> * rich_unidentified_arachnida...31 -> rich_unidentified_arachnida...45
#> * rich_unidentified_nemata...36 -> rich_unidentified_nemata...50
#> * ...
nrsa_nap_metrics.df <- metrics.df %>% 
  dplyr::select(-data)

usethis::use_data(nrsa_nap_metrics.df,
                  overwrite = TRUE)
#> √ Setting active project to 'C:/Users/zmsmith.000/OneDrive - New York State Office of Information Technology Services/projects/mmir'
#> √ Saving 'nrsa_nap_metrics.df' to 'data/nrsa_nap_metrics.df.rda'
#> * Document your data (see 'https://r-pkgs.org/data.html')