VegVault

— press START to begin —

Ondřej Mottl

2025-03-23

🖼️Set the scene

Most biodiversity models rely on contemporary data

❌but ecosystems evolve over millennia!


There are many biodiversity databases

❌but they are often not connected!


What if we want to combine data from multiple databases and analyze them together? 🤔

🧙About me

Ondra Mottl

Assistant Professor at Charles University

🕹️Example project

Spatiotemporal patterns of the Picea genus across North America since the Last Glacial Maximum




We need:

  • Contemporary vegetation data
    • Access BIEN & sPlotOpen & …
    • Download & process only North America
  • Fossil pollen data
    • Access Neotoma
    • Download & process only North America between 0-15 ka BP
  • Harmonize everything to genus level
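
To give a feel for what "harmonize to genus level" involves, here is a minimal sketch in base R. The taxon names are made up for illustration, and the rule (keep the first word of the name) is a deliberate simplification: it is an assumption for this example, not how VegVault classifies taxa, and it ignores synonyms and a proper taxonomic backbone.

```r
# Hypothetical taxon names as they might arrive from different sources
raw_taxa <- c(
  "Picea abies",
  "Picea glauca",
  "Picea",
  "Pinus sylvestris",
  "Picea mariana-type"
)

# Crude rule: treat the first whitespace-delimited word as the genus
genus <- sub("\\s.*$", "", trimws(raw_taxa))

# Count how many records fall into each genus
genus_counts <- table(genus)
print(genus_counts)
```

Even this toy case shows why a shared taxonomic level is needed: pollen types such as "Picea mariana-type" and species-level plot records only become comparable once both are reduced to the genus.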

🕹️Example project

Goal: Retrieve data for the genus Picea by selecting both contemporary vegetation-plot and fossil pollen datasets, filtering by geographic boundaries (North America) and temporal range (0 to 15,000 yr BP), and harmonizing taxa to the genus level.😱

Not so difficult with VegVault!

data_na_plots_picea <-
  # Access the VegVault
  vaultkeepr::open_vault(path = "<path_to_VegVault>") %>%
  # Start by adding dataset information
  vaultkeepr::get_datasets() %>%
  # Select both contemporary and paleo plot data
  vaultkeepr::select_dataset_by_type(
    sel_dataset_type = c(
      "vegetation_plot",
      "fossil_pollen_archive"
    )
  ) %>%
  # Limit data to North America
  vaultkeepr::select_dataset_by_geo(
    lat_lim = c(22, 60),
    long_lim = c(-135, -60)
  ) %>%
  # Add samples
  vaultkeepr::get_samples() %>%
  # Limit the samples by age
  vaultkeepr::select_samples_by_age(
    age_lim = c(0, 15e3)
  ) %>%
  # Add taxa & classify all data to a genus level
  vaultkeepr::get_taxa(classify_to = "genus") %>%
  # Extract only Picea data
  vaultkeepr::select_taxa_by_name(sel_taxa = "Picea") %>%
  vaultkeepr::extract_data()
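
Once the data are extracted, a typical next step is to summarize them through time. The sketch below uses a small toy table in place of the real output: the column names (`sample_id`, `age`, `abundance`) and all values are assumptions for illustration, not the actual structure returned by `extract_data()`.

```r
# Toy stand-in for extracted Picea records (hypothetical columns and values)
toy_picea <- data.frame(
  sample_id = c("s1", "s2", "s3", "s4", "s5"),
  age = c(0, 2500, 2900, 9100, 14800), # yr BP
  abundance = c(12, 30, 9, 4, 1)
)

# Assign each sample to a 5,000-year bin:
# [0, 5000), [5000, 10000), [10000, 15000]
toy_picea$age_bin <- cut(
  toy_picea$age,
  breaks = seq(0, 15000, by = 5000),
  labels = c("0-5 ka", "5-10 ka", "10-15 ka"),
  include.lowest = TRUE,
  right = FALSE
)

# Mean Picea abundance per time bin
mean_by_bin <- tapply(toy_picea$abundance, toy_picea$age_bin, mean)
print(mean_by_bin)
```

With real data the same binning idea applies, but the column names would need to match whatever the extracted table actually contains.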

🏛️VegVault

  • A global database linking interdisciplinary vegetation data!
  • A powerful tool for researchers exploring biodiversity dynamics across time & space
  • Open Source & Reproducible
  • 🔗 Website: bit.ly/VegVault

🏛️VegVault

  • 🌿 Paleoecological records (Neotoma, FOSSILPOL)
  • 🌱 Contemporary vegetation (BIEN, sPlotOpen)
  • 🌾 Functional traits (TRY, BIEN)
  • 🌍 Climate & soil data (CHELSA, WoSIS)

🏛️VegVault

🔢In numbers: 110 GB of SQLite data | 29 tables & 87 variables | 480,000+ datasets | 13M+ samples | 110,000+ taxa | 11M+ trait values | 8 abiotic variables

🏛️VegVault

Key innovations:

  • Plot based data (both contemporary and paleo)
  • Abiotic data linked to each plot
  • Taxonomic classification up to family level


🔑{vaultkeepr}

An Open Source R package📦–your key🔑 to VegVault🌿🔒🏛️



  • ✅ No SQL required – just load & go!
  • ✅ Filter by taxa, traits, time & space
  • ✅ Directly usable in R for analysis
  • 🔗 Learn more: bit.ly/vaultkeepr

🕹️Another example


Patterns of community-weighted mean (CWM) plant height for South and Central America between 6 and 12 ka BP


Goal: Obtain the data needed to reconstruct the CWM of plant height for South and Central America between 6 and 12 cal ka BP (thousand calibrated years before present).

🕹️Another example

data_la_traits <-
  # Access the VegVault file
  vaultkeepr::open_vault(path = "<path_to_VegVault>") %>%
  # Add the dataset information
  vaultkeepr::get_datasets() %>%
  # Select fossil pollen and trait data
  vaultkeepr::select_dataset_by_type(
    sel_dataset_type = c(
      "fossil_pollen_archive",
      "traits"
    )
  ) %>%
  # Limit data to South and Central America
  vaultkeepr::select_dataset_by_geo(
    lat_lim = c(-53, 28),
    long_lim = c(-110, -38),
    sel_dataset_type = c(
      "fossil_pollen_archive",
      "traits"
    )
  ) %>%
  # Add samples
  vaultkeepr::get_samples() %>%
  # Limit to 6-12 ka BP
  vaultkeepr::select_samples_by_age(
    age_lim = c(6e3, 12e3)
  ) %>%
  # Add taxa & classify all data to a genus level
  vaultkeepr::get_taxa(classify_to = "genus") %>%
  # Add trait information & classify all data to a genus level
  vaultkeepr::get_traits(classify_to = "genus") %>%
  # Only select the plant height
  vaultkeepr::select_traits_by_domain_name(
    sel_domain = "Plant height"
  ) %>%
  vaultkeepr::extract_data()
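
With abundances and genus-level trait values side by side, the community-weighted mean of a sample is the abundance-weighted average of the trait. A minimal sketch in base R follows. The table is toy data with made-up values and hypothetical column names, not the structure returned by `extract_data()`.

```r
# Toy community table: one row per genus per sample (values are made up)
community <- data.frame(
  sample_id = c("s1", "s1", "s1", "s2", "s2"),
  genus = c("Podocarpus", "Alnus", "Quercus", "Podocarpus", "Alnus"),
  abundance = c(10, 30, 60, 50, 50), # e.g. pollen counts
  height_m = c(20, 10, 15, 20, 10)   # genus-level mean plant height
)

# CWM per sample: sum(abundance * trait) / sum(abundance)
cwm_height <- sapply(
  split(community, community$sample_id),
  function(d) weighted.mean(d$height_m, w = d$abundance)
)

print(cwm_height)
# s1 = 14, s2 = 15
```

In a real analysis the trait value per genus would itself be an aggregate of many trait observations, so how that aggregate is built (mean, median, regional subset) is a design choice worth stating explicitly.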

🕹️Another example

📄 Preprint available!


🧑‍💻Laboratory of Quantitative Ecology

  • Vegetation biodiversity, including taxonomic, functional, and phylogenetic dimensions

  • State-of-the-Art Tools and Methods

  • Interdisciplinary approach

  • 🔗 Website: bit.ly/CUNI_QuantitativeEcology

📖Open Science



  • Open Science Champion at CUNI
  • Open Data, Open Source, Open Access
  • Advocating for reproducibility and transparency

🪄SPROuT

Science Powered through Reproducibility, Openness, and Teamwork

  • How to manage your files and code?
  • How to make your research reproducible?
  • How to never lose your changes and data?
  • How to collaborate with others?
  • How to make a cool presentation like this one?

Data Science, Version Control, R, Quarto, … all specifically tailored for ecologists!

Highly recommended for all Geobotany students‼️

🪄SPROuT

What do the students say?

I truly enjoyed using SPROuT! It’s an invaluable tool and definitely a must-have for anyone working in academia.


It was a very useful course that everyone doing or planning to do research should attend.


Every lecture provided valuable skills essential for my development as a researcher

🚀SSoQE 2025

Science School of Quantitative Ecology

Joint international course by Charles University and the University of Bayreuth

  • 📅Date: 15.09.2025 - 20.09.2025 (Monday to Saturday)
  • 🗺️Location: Mariánské Lázně, Czech Republic
  • 🔬Engaging Lectures, 🤝Hands-On Practice, 🍻Social Events,…
  • 🔗 Website: bit.ly/SSoQE

🚀SSoQE 2025

This Presentation

Author: Ondřej Mottl