filter_by_n_samples()
| filter_by_n_samples | R Documentation |
Filter Taxa by Minimum Number of Spatio-Temporal Samples
Description
Filters out taxa that are not present in a sufficient number of spatio-temporal samples (distinct dataset-age combinations). Only taxa occurring in at least ‘min_n_samples’ distinct ‘(dataset_name, age)’ combinations are retained. This removes taxa that are present in too few interpolated time steps to provide a reliable co-occurrence signal.
Usage
filter_by_n_samples(data = NULL, min_n_samples = 1)
Arguments
data
A data frame containing community data in long format. Must include columns ‘taxon’, ‘dataset_name’, and ‘age’.
min_n_samples
A single positive integer specifying the minimum number of distinct spatio-temporal samples (dataset-age combinations) a taxon must appear in to be retained. Default is 1 (no filtering).
Details
The function counts distinct ‘(dataset_name, age)’ combinations per ‘taxon’. Taxa with fewer combinations than ‘min_n_samples’ are removed. An error is raised if no taxa remain after filtering, which may indicate that ‘min_n_samples’ is set too high.
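The counting rule described above can be illustrated with a minimal base R sketch. This is not the package's implementation: ‘sketch_filter_by_n_samples’ is a hypothetical stand-in written only for illustration, and the toy data (including the extra ‘pollen_prop’ column and the error message wording) are assumptions.

```r
# Hypothetical stand-in for illustration only; not the package's source code.
sketch_filter_by_n_samples <- function(data, min_n_samples = 1) {
  # One row per distinct (taxon, dataset_name, age) combination
  samples <- unique(data[, c("taxon", "dataset_name", "age")])

  # Number of distinct spatio-temporal samples per taxon
  n_samples <- table(samples$taxon)
  keep <- names(n_samples)[as.vector(n_samples) >= min_n_samples]

  # Mirror the documented behaviour: fail if nothing is left
  if (length(keep) == 0) {
    stop("No taxa remain after filtering; 'min_n_samples' may be too high.")
  }

  # Return the original rows (all columns) for the retained taxa
  data[data$taxon %in% keep, , drop = FALSE]
}

# Assumed toy data: 'pollen_prop' is an arbitrary extra column
toy <- data.frame(
  taxon        = c("Pinus", "Pinus", "Pinus", "Betula", "Betula", "Alnus"),
  dataset_name = c("site_a", "site_a", "site_b", "site_a", "site_a", "site_b"),
  age          = c(0, 500, 0, 0, 0, 500),
  pollen_prop  = c(0.4, 0.3, 0.5, 0.2, 0.1, 0.05),
  stringsAsFactors = FALSE
)

res <- sketch_filter_by_n_samples(toy, min_n_samples = 2)
```

In this toy data, ‘Pinus’ occurs in three distinct ‘(dataset_name, age)’ combinations, whereas ‘Betula’ occurs twice in the same combination and therefore counts as one sample. With ‘min_n_samples = 2’ only the ‘Pinus’ rows are kept, and the ‘pollen_prop’ column is carried through unchanged.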
Value
A filtered data frame containing only taxa that appear in at least ‘min_n_samples’ distinct spatio-temporal samples. All original columns are preserved.
See Also
[filter_community_by_n_cores()], [filter_rare_taxa()], [select_n_taxa()]