filter_by_n_samples()

filter_by_n_samples R Documentation

Filter Taxa by Minimum Number of Spatio-Temporal Samples

Description

Filters out taxa that are not present in a sufficient number of spatio-temporal samples (distinct dataset-age combinations). Only taxa occurring in at least ‘min_n_samples’ distinct ‘(dataset_name, age)’ combinations are retained. This removes taxa that are present in too few interpolated time steps to provide a reliable co-occurrence signal.

Usage

filter_by_n_samples(data = NULL, min_n_samples = 1)

Arguments

data

A data frame containing community data in long format. Must include columns ‘taxon’, ‘dataset_name’, and ‘age’.

min_n_samples

A single positive integer specifying the minimum number of distinct spatio-temporal samples (dataset-age combinations) a taxon must appear in to be retained. The default is 1, which applies no filtering.

Details

The function counts distinct ‘(dataset_name, age)’ combinations per ‘taxon’. Taxa with fewer combinations than ‘min_n_samples’ are removed. An error is raised if no taxa remain after filtering, which may indicate that ‘min_n_samples’ is set too high.
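The counting rule described above can be illustrated with a minimal base R sketch. This is not the package's implementation: the helper name ‘sketch_filter_by_n_samples’, the example data, the ‘pollen_prop’ column, and the error message wording are all hypothetical and chosen only for illustration.

```r
# Hypothetical stand-in for the documented behaviour; not the package source.
sketch_filter_by_n_samples <- function(data, min_n_samples = 1) {
  # Keep one row per distinct (taxon, dataset_name, age) combination,
  # so repeated rows for the same sample are counted only once
  samples <- unique(data[, c("taxon", "dataset_name", "age")])

  # Count distinct spatio-temporal samples per taxon
  n_samples <- table(samples$taxon)

  # Taxa that meet the threshold
  keep <- names(n_samples)[n_samples >= min_n_samples]

  # Assumed error wording; the real function's message may differ
  if (length(keep) == 0) {
    stop("No taxa remain after filtering; `min_n_samples` may be too high.")
  }

  # Return the original rows (all columns) for the retained taxa
  data[data$taxon %in% keep, , drop = FALSE]
}

# Made-up long-format community data
example_data <- data.frame(
  taxon = c("Pinus", "Pinus", "Pinus", "Betula", "Betula", "Alnus"),
  dataset_name = c("site_a", "site_a", "site_b", "site_a", "site_a", "site_b"),
  age = c(0, 500, 0, 0, 0, 500),
  pollen_prop = c(0.4, 0.3, 0.5, 0.2, 0.1, 0.05)
)

filtered <- sketch_filter_by_n_samples(example_data, min_n_samples = 2)
unique(filtered$taxon)
```

In this example ‘Pinus’ occurs in three distinct ‘(dataset_name, age)’ combinations and is kept. ‘Betula’ has two rows but both belong to the same combination, so it counts as one sample and is dropped along with ‘Alnus’.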

Value

A filtered data frame containing only taxa that appear in at least ‘min_n_samples’ distinct spatio-temporal samples. All original columns are preserved.

See Also

[filter_community_by_n_cores()], [filter_rare_taxa()], [select_n_taxa()]
