Examples of VegVault usage

Published

January 24, 2025

Modified

April 12, 2025

To make the VegVault database as easy to use as possible, we have developed an R package called {vaultkeepr}, which provides a suite of functions for interacting with the database directly from the R programming language. These functions cover opening the database, extracting datasets, filtering samples, and accessing specific taxa and traits. The package is well tested (>95% code coverage).

The {vaultkeepr} package can be installed from GitHub with:

# install.packages("remotes")
remotes::install_github("OndrejMottl/vaultkeepr")

All functions then become available once the package is attached:

library(vaultkeepr)

Here is a schematic workflow of accessing and extracting data from the VegVault database using the {vaultkeepr} R package:

Schematic workflow of accessing and extracting data from VegVault database using {vaultkeepr} R package.

Download VegVault database
Download {vaultkeepr} R package from GitHub (see above)
In the R programming environment, the user selects which type(s) of datasets they would like to extract. At least one Dataset Type (see the Data Records section for more details) must be specified when using the vaultkeepr::select_dataset_by_type() function.
When the user is interested in data from a specific region, they can specify the geographical coordinates of a rectangular extent using the vaultkeepr::select_dataset_by_geo() function, so that only Samples from the individual Datasets recorded within the region of interest are returned. Additionally, the user can set a temporal focus by filtering data to a specific period using the vaultkeepr::select_samples_by_age() function.
The user may also specify any further attributes to be added to the data compilation. Specifically:
- get_taxa(): When extracting contemporary or past vegetation (fossil pollen) records, the user most likely wishes to add information about the abundance of individual Taxa in each Sample. To do so, the user can use the vaultkeepr::get_taxa() function. In addition, the user can standardise the taxonomy, so that the extracted Taxa can be compared. The parameter classify_to within the vaultkeepr::get_taxa() function allows the user to specify the taxonomic level (species, genus or family) at which the data should be comparable (see the Methods section on Taxa classification). Furthermore, the user can select specific Taxa based on taxonomy by using the vaultkeepr::select_taxa_by_name() function.
- get_traits(): To link Trait data with the vegetation records, the user can use the vaultkeepr::get_traits() function to extract all Trait Samples within the previously specified spatio-temporal extent. As with the vaultkeepr::get_taxa() function, the user can specify the taxonomic level to which the data should be standardised by using the classify_to parameter. Please note that the user has to decide in their further analysis how to aggregate the measured traits per taxonomic group, e.g. by taking the mean. Further, the user can select a specific Trait Domain (of the six available; see Methods for details) by using the vaultkeepr::select_traits_by_domain_name() function.
- get_abiotic_data(): To link vegetation or trait records with abiotic data, the abiotic data can be obtained with the vaultkeepr::get_abiotic_data() function. The user can specify the mode in which the abiotic data should be linked, which can be either “nearest” (i.e. the geographically closest records), “mean” or “median” (summarising all abiotic records within a set geographical and/or temporal distance). This can be further adjusted with the parameters limit_by_distance_km and limit_by_age_years. Furthermore, specific abiotic variables can be chosen with the vaultkeepr::select_abiotic_var_by_name() function.
Once all specifications for data extraction are defined, the user can execute the extraction using vaultkeepr::extract_data(). This results in a “ready-for-analyses” data compilation. Moreover, the user can use the vaultkeepr::get_references() function to obtain all references required (and/or suggested) for such a compilation (see the Internal Database Structure section for more details about the hierarchical structure of references). See the Usage Notes section for examples of data processing and extraction using the {vaultkeepr} R package.
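The note above about aggregating measured traits per taxonomic group can be illustrated with a minimal base R sketch. It does not use {vaultkeepr}: the data frame is a toy stand-in, and the column names genus and trait_value are hypothetical, not the names returned by vaultkeepr::extract_data().

```r
# Toy stand-in for extracted Trait Samples.
# Assumption: the column names `genus` and `trait_value` are hypothetical.
trait_samples <- data.frame(
  genus = c("Picea", "Picea", "Pinus", "Pinus", "Pinus"),
  trait_value = c(30, 40, 20, 25, 45)
)

# Aggregate to one value per genus by taking the mean
trait_mean_by_genus <- aggregate(
  trait_value ~ genus,
  data = trait_samples,
  FUN = mean
)

print(trait_mean_by_genus)
```

Swapping FUN = mean for FUN = median gives a summary that is less sensitive to extreme measurements; which one is appropriate depends on the analysis.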

Below, we present three examples of potential projects and show how to obtain data for them from VegVault using the {vaultkeepr} R package. Note that we deliberately do not perform any analysis; we only present a way to obtain the data that such projects could use.

Example 1: Spatiotemporal patterns of the Picea genus across North America since the LGM

The first example demonstrates how to retrieve data for the genus Picea across North America by selecting both modern and fossil pollen plot datasets, filtering samples by geographic boundaries and temporal range (0 to 15,000 yr BP), and harmonizing taxa to the genus level. The resulting dataset allows users to study spatiotemporal patterns of Picea distribution over millennia. This can be accomplished by running the following code:

data_na_plots_picea <-
  # Access the VegVault
  vaultkeepr::open_vault(path = "<path_to_VegVault>") %>%
  # Start by adding dataset information
  vaultkeepr::get_datasets() %>%
  # Select both modern and paleo plot data
  vaultkeepr::select_dataset_by_type(
    sel_dataset_type = c(
      "vegetation_plot",
      "fossil_pollen_archive"
    )
  ) %>%
  # Limit data to North America
  vaultkeepr::select_dataset_by_geo(
    lat_lim = c(22, 60),
    long_lim = c(-135, -60)
  ) %>%
  # Add samples
  vaultkeepr::get_samples() %>%
  # Limit the samples by age
  vaultkeepr::select_samples_by_age(
    age_lim = c(0, 15e3)
  ) %>%
  # Add taxa & classify all data to a genus level
  vaultkeepr::get_taxa(classify_to = "genus") %>%
  # Extract only Picea data
  vaultkeepr::select_taxa_by_name(sel_taxa = "Picea") %>%
  vaultkeepr::extract_data()

Now, we plot the presence of Picea in each 2500-year bin.
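As a rough illustration of the binning step, the sketch below assigns sample ages to 2500-year bins in base R. It runs on a toy data frame rather than on the extracted data; the column names sample_id and age are hypothetical and would need to be matched to the actual columns of data_na_plots_picea.

```r
# Toy stand-in for the extracted Picea records.
# Assumption: the column names `sample_id` and `age` are hypothetical.
picea_samples <- data.frame(
  sample_id = 1:6,
  age = c(0, 1200, 2500, 7400, 12600, 15000)
)

bin_width <- 2500
bin_breaks <- seq(0, 15000, by = bin_width)

# Left-closed bins: [0, 2500), [2500, 5000), ...
# With right = FALSE, include.lowest = TRUE closes the last bin,
# so a sample aged exactly 15000 is kept.
picea_samples$age_bin <- cut(
  picea_samples$age,
  breaks = bin_breaks,
  right = FALSE,
  include.lowest = TRUE,
  dig.lab = 5
)

# Number of samples falling into each bin (empty bins are kept)
samples_per_bin <- table(picea_samples$age_bin)
print(samples_per_bin)
```

Each bin can then be mapped or summarised separately, for example by plotting the coordinates of samples within one bin at a time.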

Example 2: Joint species distribution model for all vegetation within Czechia

In the second example, the project aims to do species distribution modelling for plant taxa in the Czech Republic based on contemporary vegetation plot data and mean annual temperature. The code selects the datasets and extracts the relevant abiotic data as follows:

data_cz_modern <-
  # Access the VegVault file
  vaultkeepr::open_vault(path = "<path_to_VegVault>") %>%
  # Add the dataset information
  vaultkeepr::get_datasets() %>%
  # Select modern plot data and climate
  vaultkeepr::select_dataset_by_type(
    sel_dataset_type = c(
      "vegetation_plot",
      "gridpoints"
    )
  ) %>%
  # Limit data to Czech Republic
  vaultkeepr::select_dataset_by_geo(
    lat_lim = c(48.5, 51.1),
    long_lim = c(12, 18.9)
  ) %>%
  # Add samples
  vaultkeepr::get_samples() %>%
  # select only modern data
  vaultkeepr::select_samples_by_age(
    age_lim = c(0, 0)
  ) %>%
  # Add abiotic data
  vaultkeepr::get_abiotic_data() %>%
  # Select only Mean Annual Temperature (bio1)
  vaultkeepr::select_abiotic_var_by_name(sel_var_name = "bio1") %>%
  # add taxa
  vaultkeepr::get_taxa() %>%
  vaultkeepr::extract_data()

Now we can simply plot both the climatic data and the plot vegetation data:

Example 3: Patterns of plant height (community-weighted mean, CWM) for South and Central America between 6 and 12 ka

The third example focuses on obtaining the data needed to reconstruct plant height for South and Central America between 6 and 12 cal ka BP (thousand calibrated years before present). This example project showcases the integration of trait data with paleo-vegetation records to subsequently study historical vegetation dynamics and the functional composition of plant communities:

data_la_traits <-
  # Access the VegVault file
  vaultkeepr::open_vault(path = "<path_to_VegVault>") %>%
  # Add the dataset information
  vaultkeepr::get_datasets() %>%
  # Select fossil pollen and trait data
  vaultkeepr::select_dataset_by_type(
    sel_dataset_type = c(
      "fossil_pollen_archive",
      "traits"
    )
  ) %>%
  # Limit data to South and Central America
  vaultkeepr::select_dataset_by_geo(
    lat_lim = c(-53, 28),
    long_lim = c(-110, -38),
    sel_dataset_type = c(
      "fossil_pollen_archive",
      "traits"
    )
  ) %>%
  # Add samples
  vaultkeepr::get_samples() %>%
  # Limit to 6-12 ka yr BP
  vaultkeepr::select_samples_by_age(
    age_lim = c(6e3, 12e3)
  ) %>%
  # Add taxa & classify all data to a genus level
  vaultkeepr::get_taxa(classify_to = "genus") %>%
  # Add trait information & classify all data to a genus level
  vaultkeepr::get_traits(classify_to = "genus") %>%
  # Only select the plant height
  vaultkeepr::select_traits_by_domain_name(sel_domain = "Plant heigh") %>%
  vaultkeepr::extract_data()

Now let’s plot the overview of the data
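For readers who want to go from such data to the community-weighted mean (CWM) named in the example title, the following is a minimal base R sketch, not part of {vaultkeepr}. The CWM of a sample is the abundance-weighted average of the trait values of its taxa. The data frame is a toy stand-in, and the column names sample_id, genus, abundance and plant_height are hypothetical; it also assumes that traits have already been aggregated to one value per genus.

```r
# Toy stand-in for pollen abundances joined with genus-level trait values.
# Assumption: all column names here are hypothetical.
community <- data.frame(
  sample_id = c("s1", "s1", "s1", "s2", "s2"),
  genus = c("A", "B", "C", "A", "B"),
  abundance = c(50, 30, 20, 10, 90),
  plant_height = c(10, 20, 5, 10, 20)
)

# CWM per sample: sum(abundance * trait) / sum(abundance)
cwm_by_sample <- sapply(
  split(community, community$sample_id),
  function(sample_data) {
    weighted.mean(sample_data$plant_height, w = sample_data$abundance)
  }
)

print(cwm_by_sample)
```

Taxa without a trait value would need to be handled explicitly (for example, dropped before weighting), since weighted.mean() returns NA when a value is missing unless na.rm = TRUE is set.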