Examples of VegVault usage
To make the VegVault database as easy to use as possible, we have developed an R package called {vaultkeepr}, which provides a suite of functions for interacting with the database directly from the R programming language. Functions include opening the database, extracting datasets, filtering samples, and accessing specific taxa and traits. The package is well tested (>95% code coverage).
The {vaultkeepr} package can be installed from GitHub with:
# install.packages("remotes")
remotes::install_github("OndrejMottl/vaultkeepr")
All functions are then made available by attaching the package:
library(vaultkeepr)
Here is a schematic workflow of accessing and extracting data from the VegVault database using the {vaultkeepr} R package:
Schematic workflow of accessing and extracting data from VegVault database using {vaultkeepr} R package.
- Download VegVault database
- Download {vaultkeepr} R package from GitHub (see above)
- In the R programming environment [39], the user first selects which type(s) of datasets to extract. At least one Dataset Type (see the Data Records section for more details) must be specified using the vaultkeepr::select_dataset_by_type() function.
- When the user is interested in data from a specific region, they can specify the geographical coordinates of a rectangular extent using the vaultkeepr::select_dataset_by_geo() function, so that only Samples recorded within the region of interest are retained from the individual Datasets. Additionally, the user can specify the temporal focus by filtering data within a specific period using the vaultkeepr::select_samples_by_age() function.
- The user may also specify any further attributes to be added to the data compilation. Specifically:
  - get_taxa(): When extracting contemporary or past vegetation (fossil pollen) records, the user most likely wishes to add information about the abundance of individual Taxa in each Sample. To do so, the user can use the vaultkeepr::get_taxa() function. In addition, the user can standardise the taxonomy so that the extracted Taxa can be compared. The classify_to parameter of the vaultkeepr::get_taxa() function allows the user to specify the taxonomic level (species, genus or family) at which the data should be comparable (see Methods about Taxa classification). Furthermore, the user can select specific Taxa based on taxonomy by using the vaultkeepr::select_taxa_by_name() function.
  - get_traits(): When wishing to link the Trait data with the vegetation records, the user can use the vaultkeepr::get_traits() function to extract all Trait Samples within the previously specified spatio-temporal extent. As with the vaultkeepr::get_taxa() function, the user can specify the taxonomic level to which the data should be standardised by using the classify_to parameter. Please note that the user has to decide in their further analysis how to aggregate the measured traits per taxonomic group, e.g. by taking the mean. Further, the user can select a specific Trait Domain (of the six available; see Methods for details) by using the vaultkeepr::select_traits_by_domain_name() function.
  - get_abiotic_data(): When wishing to link vegetation or trait records with abiotic data, the abiotic data can be obtained with the vaultkeepr::get_abiotic_data() function. The user can specify the mode in which the abiotic data should be linked: “nearest” (i.e. the geographically closest records), or “mean” or “median” (summarising all abiotic records within a set geographical and/or temporal distance). The linking can be further adjusted with the limit_by_distance_km and limit_by_age_years parameters. Furthermore, specific abiotic variables can be chosen with the vaultkeepr::select_abiotic_var_by_name() function.
- Once all specifications for data extraction have been defined, the user can execute the extraction using vaultkeepr::extract_data(). This results in a “ready-for-analyses” data compilation. Moreover, the user can use the vaultkeepr::get_references() function to obtain all references required (and/or suggested) for such a compilation (see the Internal Database Structure section for more details about the hierarchical structure of references). Finally, see the Usage Notes section for examples of data processing and extraction using the {vaultkeepr} R package.
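The three linking modes for abiotic data described in the list above (“nearest”, “mean” and “median”) can be illustrated with a small, self-contained sketch in base R. This is not the {vaultkeepr} implementation: the coordinates and temperature values are invented, and the names `haversine_km`, `link_abiotic` and `max_dist_km` are hypothetical helpers introduced only for this illustration. It only shows the general idea of summarising gridpoint values around one sample within a distance limit.

```r
# Hypothetical sample location and abiotic gridpoints (invented values)
sample_site <- c(long = 15.0, lat = 50.0)

gridpoints <- data.frame(
  long = c(15.1, 15.5, 15.8, 18.0),
  lat = c(50.0, 50.2, 49.7, 48.0),
  bio1 = c(8.2, 7.9, 8.8, 10.1) # mean annual temperature, degrees C
)

# Great-circle distance (haversine formula), in kilometres
haversine_km <- function(long1, lat1, long2, lat2) {
  to_rad <- pi / 180
  d_lat <- (lat2 - lat1) * to_rad
  d_long <- (long2 - long1) * to_rad
  a <- sin(d_lat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(d_long / 2)^2
  2 * 6371 * asin(sqrt(a))
}

gridpoints$dist_km <- haversine_km(
  sample_site[["long"]], sample_site[["lat"]],
  gridpoints$long, gridpoints$lat
)

# Summarise the abiotic values for one sample under a given linking mode
# (hypothetical helper; `max_dist_km` is an assumed name, not a package argument)
link_abiotic <- function(grid,
                         mode = c("nearest", "mean", "median"),
                         max_dist_km = 100) {
  mode <- match.arg(mode)
  # Keep only gridpoints within the distance limit
  grid_near <- grid[grid$dist_km <= max_dist_km, , drop = FALSE]
  if (nrow(grid_near) == 0) {
    return(NA_real_)
  }
  switch(mode,
    nearest = grid_near$bio1[which.min(grid_near$dist_km)],
    mean = mean(grid_near$bio1),
    median = stats::median(grid_near$bio1)
  )
}

value_nearest <- link_abiotic(gridpoints, "nearest")
value_mean <- link_abiotic(gridpoints, "mean")
value_median <- link_abiotic(gridpoints, "median")
```

In this toy example the most distant gridpoint lies outside the 100 km limit, so it does not contribute to any of the three values. Which mode is appropriate depends on the spatial resolution of the abiotic data and on the question being asked.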
Below, we present three examples of potential projects and show how to obtain the corresponding data from VegVault using the {vaultkeepr} R package. Note that we deliberately do not carry out any analysis; we only show how to obtain data that could be used in such projects.
Example 1: Spatiotemporal patterns of the Picea genus across North America since the LGM
The first example demonstrates how to retrieve data for the genus Picea across North America by selecting both contemporary vegetation plot and fossil pollen datasets, filtering samples by geographic boundaries and temporal range (0 to 15,000 yr BP), and harmonizing taxa to the genus level. The resulting dataset allows users to study spatiotemporal patterns of Picea distribution over millennia. This can be accomplished by running the following code:
library(magrittr) # provides the %>% pipe used in the examples

data_na_plots_picea <-
  # Access the VegVault
  vaultkeepr::open_vault(path = "<path_to_VegVault>") %>%
  # Start by adding dataset information
  vaultkeepr::get_datasets() %>%
  # Select both modern and paleo plot data
  vaultkeepr::select_dataset_by_type(
    sel_dataset_type = c(
      "vegetation_plot",
      "fossil_pollen_archive"
    )
  ) %>%
  # Limit data to North America
  vaultkeepr::select_dataset_by_geo(
    lat_lim = c(22, 60),
    long_lim = c(-135, -60)
  ) %>%
  # Add samples
  vaultkeepr::get_samples() %>%
  # Limit the samples by age
  vaultkeepr::select_samples_by_age(
    age_lim = c(0, 15e3)
  ) %>%
  # Add taxa & classify all data to a genus level
  vaultkeepr::get_taxa(classify_to = "genus") %>%
  # Extract only Picea data
  vaultkeepr::select_taxa_by_name(sel_taxa = "Picea") %>%
  vaultkeepr::extract_data()
Now, we plot the presence of Picea in each 2500-year bin.
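A minimal sketch of the binning step, in base R, might look as follows. It does not use the real VegVault output: the `samples` data frame and its column names (`sample_name`, `age`, `picea_count`) are invented for illustration, and the structure of the compilation returned by vaultkeepr::extract_data() may differ.

```r
# Hypothetical extracted samples (invented values, for illustration only)
samples <- data.frame(
  sample_name = paste0("sample_", 1:6),
  age = c(0, 1200, 2500, 4999, 7600, 14900), # years BP
  picea_count = c(12, 0, 5, 3, 0, 8) # pollen grains of Picea
)

bin_width <- 2500
bin_breaks <- seq(0, 15000, by = bin_width)

# Assign each sample to a left-closed bin: [0, 2500), [2500, 5000), ...
# (the last bin also includes its upper limit)
samples$age_bin <- cut(
  samples$age,
  breaks = bin_breaks,
  right = FALSE,
  include.lowest = TRUE,
  dig.lab = 5
)

# Picea is considered present when at least one grain was counted
samples$picea_present <- samples$picea_count > 0

# Proportion of samples per bin in which Picea is present
presence_by_bin <- aggregate(
  picea_present ~ age_bin,
  data = samples,
  FUN = mean
)
```

Bins that contain no samples are dropped by `aggregate()`, so `presence_by_bin` holds one row per occupied bin. With real data, the same bin labels could be used to facet a map of sample locations.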
Example 2: Joint Species Distribution model for all vegetation within Czechia
In the second example, the project aims to do species distribution modelling for plant taxa in the Czech Republic based on contemporary vegetation plot data and mean annual temperature. The code includes selecting datasets and extracting relevant abiotic data as follows:
data_cz_modern <-
  # Access the VegVault file
  vaultkeepr::open_vault(path = "<path_to_VegVault>") %>%
  # Add the dataset information
  vaultkeepr::get_datasets() %>%
  # Select modern plot data and climate
  vaultkeepr::select_dataset_by_type(
    sel_dataset_type = c(
      "vegetation_plot",
      "gridpoints"
    )
  ) %>%
  # Limit data to Czech Republic
  vaultkeepr::select_dataset_by_geo(
    lat_lim = c(48.5, 51.1),
    long_lim = c(12, 18.9)
  ) %>%
  # Add samples
  vaultkeepr::get_samples() %>%
  # Select only modern data
  vaultkeepr::select_samples_by_age(
    age_lim = c(0, 0)
  ) %>%
  # Add abiotic data
  vaultkeepr::get_abiotic_data() %>%
  # Select only Mean Annual Temperature (bio1)
  vaultkeepr::select_abiotic_var_by_name(sel_var_name = "bio1") %>%
  # Add taxa
  vaultkeepr::get_taxa() %>%
  vaultkeepr::extract_data()
Now we can plot both the climatic data and the vegetation plot data:
Example 3: Patterns of plant height (CWM) for South and Central America between 6-12 ka
The third example focuses on obtaining the data needed to reconstruct plant height for South and Central America between 6 and 12 cal ka BP (thousand calibrated years before present). This example project showcases the integration of trait data with paleo-vegetation records to subsequently study historical vegetation dynamics and the functional composition of plant communities:
data_la_traits <-
  # Access the VegVault file
  vaultkeepr::open_vault(path = "<path_to_VegVault>") %>%
  # Add the dataset information
  vaultkeepr::get_datasets() %>%
  # Select fossil pollen and trait data
  vaultkeepr::select_dataset_by_type(
    sel_dataset_type = c(
      "fossil_pollen_archive",
      "traits"
    )
  ) %>%
  # Limit data to South and Central America
  vaultkeepr::select_dataset_by_geo(
    lat_lim = c(-53, 28),
    long_lim = c(-110, -38),
    sel_dataset_type = c(
      "fossil_pollen_archive",
      "traits"
    )
  ) %>%
  # Add samples
  vaultkeepr::get_samples() %>%
  # Limit to 6-12 ka yr BP
  vaultkeepr::select_samples_by_age(
    age_lim = c(6e3, 12e3)
  ) %>%
  # Add taxa & classify all data to a genus level
  vaultkeepr::get_taxa(classify_to = "genus") %>%
  # Add trait information & classify all data to a genus level
  vaultkeepr::get_traits(classify_to = "genus") %>%
  # Only select the plant height
  vaultkeepr::select_traits_by_domain_name(sel_domain = "Plant heigh") %>%
  vaultkeepr::extract_data()
Now let’s plot an overview of the data.
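As a possible next step beyond data extraction, the community-weighted mean (CWM) of plant height named in the example title could be computed along the following lines. This is only a sketch in base R under stated assumptions: the `abundance` and `traits` tables, their column names, and all values are invented, and the mean is just one of several ways to aggregate trait measurements per genus.

```r
# Hypothetical long-format tables (invented values, for illustration only)
abundance <- data.frame(
  sample_name = c("s1", "s1", "s2", "s2"),
  genus = c("Alnus", "Podocarpus", "Alnus", "Podocarpus"),
  pollen_count = c(30, 10, 5, 15)
)

traits <- data.frame(
  genus = c("Alnus", "Alnus", "Podocarpus", "Podocarpus"),
  plant_height_m = c(18, 22, 28, 32)
)

# Step 1: aggregate trait measurements to one value per genus (here: mean)
genus_height <- aggregate(plant_height_m ~ genus, data = traits, FUN = mean)

# Step 2: attach the genus-level trait value to each abundance record
merged <- merge(abundance, genus_height, by = "genus")

# Step 3: community-weighted mean per sample, weighted by pollen counts
cwm_list <- lapply(
  split(merged, merged$sample_name),
  function(x) {
    data.frame(
      sample_name = x$sample_name[1],
      cwm_height_m = stats::weighted.mean(
        x$plant_height_m,
        w = x$pollen_count
      )
    )
  }
)
cwm <- do.call(rbind, cwm_list)
```

Because the weights are raw pollen counts, taxa with high pollen production dominate the result; whether to correct for this is an analytical decision left to the user.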