filter_df_by_curies

filter_df_by_curies(df: pd.DataFrame, *, column: str | int, curies: str | Collection[str]) → pd.DataFrame

Filter a dataframe, keeping only the rows whose value in a given column is a given CURIE or one of a given set of CURIEs.

Parameters:
  • df – A dataframe

  • column – The integer index or column name of a column containing CURIEs

  • curies – The CURIE (given as a string) or collection of CURIEs (given as a list, set, etc.) to keep

Returns:

A new dataframe containing only the rows whose CURIE in the given column is among those to keep.

Example usage:

import pandas as pd
from curies.dataframe import filter_df_by_curies

rows = [
    ("DOID:0080795", "skos:exactMatch", "EFO:0003029", "semapv:ManualMappingCuration"),
    ("DOID:0080795", "skos:exactMatch", "mesh:D015471", "semapv:ManualMappingCuration"),
    ("DOID:0080799", "skos:exactMatch", "EFO:1000527", "semapv:ManualMappingCuration"),
    (
        "DOID:0080808",
        "skos:exactMatch",
        "mesh:D000069295",
        "semapv:ManualMappingCuration",
    ),
]
df = pd.DataFrame(
    rows, columns=["subject_id", "predicate_id", "object_id", "mapping_justification"]
)
filtered_df = filter_df_by_curies(df, column="subject_id", curies=["DOID:0080795"])

This results in the following dataframe:

subject_id    predicate_id     object_id     mapping_justification
DOID:0080795  skos:exactMatch  EFO:0003029   semapv:ManualMappingCuration
DOID:0080795  skos:exactMatch  mesh:D015471  semapv:ManualMappingCuration
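
The parameter descriptions say that column may be a name or an integer index, and that curies may be a single string or a collection. The sketch below illustrates that behavior with plain pandas. It is not the library's implementation: filter_by_curies_sketch is a hypothetical stand-in, and it assumes that filtering is an exact string match on the column's values.

```python
import pandas as pd


def filter_by_curies_sketch(df, *, column, curies):
    """Hypothetical stand-in for illustration only (not the curies API).

    Assumes filtering is an exact string match against the column's values.
    """
    # Treat a single CURIE string as a one-element collection.
    if isinstance(curies, str):
        curies = [curies]
    # An integer selects the column by position, a string by name.
    series = df.iloc[:, column] if isinstance(column, int) else df[column]
    # Boolean indexing returns a new dataframe; the input is not modified.
    return df[series.isin(set(curies))]


df = pd.DataFrame(
    [
        ("DOID:0080795", "EFO:0003029"),
        ("DOID:0080795", "mesh:D015471"),
        ("DOID:0080799", "EFO:1000527"),
    ],
    columns=["subject_id", "object_id"],
)

# A single CURIE string with a column name...
by_name = filter_by_curies_sketch(df, column="subject_id", curies="DOID:0080795")
# ...selects the same rows as a set of CURIEs with a positional column index.
by_index = filter_by_curies_sketch(df, column=0, curies={"DOID:0080795"})
```

Under these assumptions both calls keep the two DOID:0080795 rows and leave df itself unchanged.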