filter_df_by_prefixes
- filter_df_by_prefixes(df: pd.DataFrame, *, column: str | int, prefixes: str | Collection[str], method: PrefixIndexMethod | None = None, validate: bool = False, converter: Converter | None = None) -> pd.DataFrame
Filter a dataframe based on CURIEs in a given column having a given prefix or set of prefixes.
- Parameters:
df – A dataframe
column – The integer index or column name of a column containing CURIEs
prefixes – The prefix (given as a string) or collection of prefixes (given as a list, set, etc.) to keep
method – The implementation for getting the prefix index
validate – Should the prefixes be validated against the converter?
converter – A converter for validating CURIEs
- Returns:
A new dataframe containing only the rows whose CURIE in the given column has one of the given prefixes.
Example usage:
import pandas as pd
from curies.dataframe import filter_df_by_prefixes

rows = [
    ("DOID:0080795", "skos:exactMatch", "EFO:0003029", "semapv:ManualMappingCuration"),
    ("DOID:0080795", "skos:exactMatch", "mesh:D015471", "semapv:ManualMappingCuration"),
    ("DOID:0080799", "skos:exactMatch", "EFO:1000527", "semapv:ManualMappingCuration"),
    (
        "DOID:0080808",
        "skos:exactMatch",
        "mesh:D000069295",
        "semapv:ManualMappingCuration",
    ),
]
df = pd.DataFrame(
    rows, columns=["subject_id", "predicate_id", "object_id", "mapping_justification"]
)
filtered_df = filter_df_by_prefixes(df, column="object_id", prefixes=["EFO"])
This results in the following dataframe:
subject_id    predicate_id     object_id    mapping_justification
------------  ---------------  -----------  ----------------------------
DOID:0080795  skos:exactMatch  EFO:0003029  semapv:ManualMappingCuration
DOID:0080799  skos:exactMatch  EFO:1000527  semapv:ManualMappingCuration
Internally, this function uses get_filter_df_by_prefixes_index().
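The idea of filtering through a prefix index can be illustrated with plain pandas. The following is a minimal sketch, not the library's implementation: the helper name sketch_prefix_mask is hypothetical, and it assumes that the index is a boolean mask over rows and that a CURIE belongs to a prefix when it starts with that prefix followed by a colon. It does not model the method, validate, or converter parameters.

```python
import pandas as pd


def sketch_prefix_mask(df, column, prefixes):
    """Hypothetical helper: boolean mask of rows whose CURIE has a kept prefix."""
    # Accept a single prefix string or any collection of prefixes
    if isinstance(prefixes, str):
        prefixes = [prefixes]
    # str.startswith accepts a tuple of candidates; the colon avoids partial matches
    targets = tuple(f"{prefix}:" for prefix in prefixes)
    # Resolve the column by integer position or by name
    series = df.iloc[:, column] if isinstance(column, int) else df[column]
    return series.map(lambda curie: curie.startswith(targets))


df = pd.DataFrame(
    [
        ("DOID:0080795", "EFO:0003029"),
        ("DOID:0080795", "mesh:D015471"),
        ("DOID:0080799", "EFO:1000527"),
    ],
    columns=["subject_id", "object_id"],
)
mask = sketch_prefix_mask(df, "object_id", ["EFO"])
filtered_df = df[mask]  # boolean indexing returns a new dataframe
```

Keeping the mask separate from the filtering step means the same mask can be inverted or combined with other conditions before indexing.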