filter_df_by_curies
- filter_df_by_curies(df: pd.DataFrame, *, column: str | int, curies: str | Collection[str]) → pd.DataFrame
Filter a dataframe to the rows whose value in a given column is a given CURIE or one of a given collection of CURIEs.
- Parameters:
df – A dataframe
column – The integer index or column name of a column containing CURIEs
curies – The CURIE (given as a string) or collection of CURIEs (given as a list, set, etc.) to keep
- Returns:
A new dataframe containing only the rows whose CURIE in the given column is among the given CURIEs.
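The filtering described above can be sketched with plain pandas. This is a hypothetical re-implementation for illustration, not the curies source. It assumes the function behaves like a membership test via `Series.isin`, that a string passed as `curies` is treated as a single CURIE, and that an integer `column` is a positional index. The name `filter_by_curies_sketch` is invented here.

```python
from __future__ import annotations

from collections.abc import Collection

import pandas as pd


def filter_by_curies_sketch(
    df: pd.DataFrame, *, column: str | int, curies: str | Collection[str]
) -> pd.DataFrame:
    """Hypothetical sketch: keep rows whose CURIE in ``column`` is in ``curies``."""
    # Assumption: a single CURIE given as a string acts as a one-element collection
    if isinstance(curies, str):
        curies = {curies}
    # Assumption: an int selects the column by position, a str by name
    values = df.iloc[:, column] if isinstance(column, int) else df[column]
    # Boolean indexing returns a new dataframe; the input is left unchanged
    return df[values.isin(set(curies))]
```

Converting `curies` to a set makes the membership test explicit; `Series.isin` accepts any list-like, so a list would work equally well.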
Example usage:
```python
import pandas as pd

from curies.dataframe import filter_df_by_curies

rows = [
    ("DOID:0080795", "skos:exactMatch", "EFO:0003029", "semapv:ManualMappingCuration"),
    ("DOID:0080795", "skos:exactMatch", "mesh:D015471", "semapv:ManualMappingCuration"),
    ("DOID:0080799", "skos:exactMatch", "EFO:1000527", "semapv:ManualMappingCuration"),
    (
        "DOID:0080808",
        "skos:exactMatch",
        "mesh:D000069295",
        "semapv:ManualMappingCuration",
    ),
]
df = pd.DataFrame(
    rows, columns=["subject_id", "predicate_id", "object_id", "mapping_justification"]
)
# Keep only the rows whose subject_id is one of the given CURIEs
filtered_df = filter_df_by_curies(df, column="subject_id", curies=["DOID:0080795"])
```
This results in the following dataframe:
| subject_id | predicate_id | object_id | mapping_justification |
|---|---|---|---|
| DOID:0080795 | skos:exactMatch | EFO:0003029 | semapv:ManualMappingCuration |
| DOID:0080795 | skos:exactMatch | mesh:D015471 | semapv:ManualMappingCuration |
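As a quick check of the expected output without installing curies, the same rows can be filtered with pandas `Series.isin` directly. This assumes `filter_df_by_curies` keeps exactly the rows whose `subject_id` is in the given collection. The second filter, with two CURIEs, is an illustrative extra case and not part of the original example.

```python
import pandas as pd

rows = [
    ("DOID:0080795", "skos:exactMatch", "EFO:0003029", "semapv:ManualMappingCuration"),
    ("DOID:0080795", "skos:exactMatch", "mesh:D015471", "semapv:ManualMappingCuration"),
    ("DOID:0080799", "skos:exactMatch", "EFO:1000527", "semapv:ManualMappingCuration"),
    ("DOID:0080808", "skos:exactMatch", "mesh:D000069295", "semapv:ManualMappingCuration"),
]
df = pd.DataFrame(
    rows, columns=["subject_id", "predicate_id", "object_id", "mapping_justification"]
)

# One CURIE: should match the two DOID:0080795 rows listed above
single = df[df["subject_id"].isin(["DOID:0080795"])]

# Several CURIEs: a row is kept if its subject matches any of them
several = df[df["subject_id"].isin(["DOID:0080795", "DOID:0080808"])]

print(single)
print(several)
```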