Working with Dataframes

Filtering

In the following examples, we’ll use a dataframe representing semantic mappings between disease ontologies in the SSSOM format:

subject_id    predicate_id     object_id        mapping_justification
------------  ---------------  ---------------  ----------------------------
DOID:0080795  skos:exactMatch  EFO:0003029      semapv:ManualMappingCuration
DOID:0080795  skos:exactMatch  mesh:D015471     semapv:ManualMappingCuration
DOID:0080799  skos:exactMatch  EFO:1000527      semapv:ManualMappingCuration
DOID:0080808  skos:exactMatch  mesh:D000069295  semapv:ManualMappingCuration
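To follow along, the example table can be built as a pandas DataFrame. This is a minimal sketch that assumes pandas is installed; the variable name `df` is only for illustration.

```python
import pandas as pd

# Rows copied from the example SSSOM mapping table
rows = [
    ("DOID:0080795", "skos:exactMatch", "EFO:0003029", "semapv:ManualMappingCuration"),
    ("DOID:0080795", "skos:exactMatch", "mesh:D015471", "semapv:ManualMappingCuration"),
    ("DOID:0080799", "skos:exactMatch", "EFO:1000527", "semapv:ManualMappingCuration"),
    ("DOID:0080808", "skos:exactMatch", "mesh:D000069295", "semapv:ManualMappingCuration"),
]

df = pd.DataFrame(
    rows,
    columns=["subject_id", "predicate_id", "object_id", "mapping_justification"],
)
print(df.shape)  # (4, 4)
```

Passing this `df` in place of the `...` placeholders should make the snippets in this section runnable, assuming `curies` is installed.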

To get the set of unique prefixes appearing in a column, use curies.dataframe.get_df_unique_prefixes():

from curies.dataframe import get_df_unique_prefixes

df = ...  # the mapping dataframe shown above
prefixes = get_df_unique_prefixes(df, column="object_id")
assert prefixes == {"EFO", "mesh"}
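A CURIE's prefix is the text before its first colon, so the same set can be computed with plain pandas string methods. The sketch below illustrates that idea on the example data; it is not the implementation used by `curies`, and the helper name `unique_prefixes` is hypothetical.

```python
import pandas as pd

# Example data from the table above, trimmed to the two CURIE columns
df = pd.DataFrame(
    {
        "subject_id": ["DOID:0080795", "DOID:0080795", "DOID:0080799", "DOID:0080808"],
        "object_id": ["EFO:0003029", "mesh:D015471", "EFO:1000527", "mesh:D000069295"],
    }
)


def unique_prefixes(df: pd.DataFrame, column: str) -> set:
    """Hypothetical helper: the set of CURIE prefixes appearing in ``column``."""
    # Split each CURIE at its first colon only and keep the part before it
    return set(df[column].str.split(":", n=1).str[0])


prefixes = unique_prefixes(df, "object_id")
print(sorted(prefixes))  # ['EFO', 'mesh']
```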

To keep only the rows whose objects use the EFO prefix, use curies.dataframe.filter_df_by_prefixes():

from curies.dataframe import filter_df_by_prefixes

df = ...  # the mapping dataframe shown above
df = filter_df_by_prefixes(df, column="object_id", prefixes=["EFO"])

subject_id    predicate_id     object_id        mapping_justification
------------  ---------------  ---------------  ----------------------------
DOID:0080795  skos:exactMatch  EFO:0003029      semapv:ManualMappingCuration
DOID:0080799  skos:exactMatch  EFO:1000527      semapv:ManualMappingCuration
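Prefix filtering amounts to building a boolean mask from each CURIE's prefix. The following is a minimal pandas sketch of that idea, assuming prefixes are compared as exact, case-sensitive strings; the helper name `filter_by_prefixes` is hypothetical and this is not the `curies` implementation.

```python
import pandas as pd

# Example data from the table above, trimmed to the two CURIE columns
df = pd.DataFrame(
    {
        "subject_id": ["DOID:0080795", "DOID:0080795", "DOID:0080799", "DOID:0080808"],
        "object_id": ["EFO:0003029", "mesh:D015471", "EFO:1000527", "mesh:D000069295"],
    }
)


def filter_by_prefixes(df: pd.DataFrame, column: str, prefixes) -> pd.DataFrame:
    """Hypothetical helper: keep rows whose CURIE in ``column`` uses one of ``prefixes``."""
    # Compare the part before the first colon against the wanted prefixes.
    # This comparison is case-sensitive, so "EFO" and "efo" do not match.
    mask = df[column].str.split(":", n=1).str[0].isin(set(prefixes))
    return df[mask]


efo_rows = filter_by_prefixes(df, "object_id", ["EFO"])
print(efo_rows["object_id"].tolist())  # ['EFO:0003029', 'EFO:1000527']
```

In this sketch the filtered frame keeps its original index labels (0 and 2); call `reset_index(drop=True)` if a fresh index is needed.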

To filter to rows that have the subject DOID:0080795, use curies.dataframe.filter_df_by_curies():

from curies.dataframe import filter_df_by_curies

df = ...  # the mapping dataframe shown above
df = filter_df_by_curies(df, column="subject_id", curies=["DOID:0080795"])

subject_id    predicate_id     object_id        mapping_justification
------------  ---------------  ---------------  ----------------------------
DOID:0080795  skos:exactMatch  EFO:0003029      semapv:ManualMappingCuration
DOID:0080795  skos:exactMatch  mesh:D015471     semapv:ManualMappingCuration
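Unlike prefix filtering, filtering by CURIEs matches the full identifier. A minimal pandas sketch of that behavior, assuming exact string equality, uses `Series.isin`; the helper name `filter_by_curies` is hypothetical and not part of `curies`.

```python
import pandas as pd

# Example data from the table above, trimmed to the two CURIE columns
df = pd.DataFrame(
    {
        "subject_id": ["DOID:0080795", "DOID:0080795", "DOID:0080799", "DOID:0080808"],
        "object_id": ["EFO:0003029", "mesh:D015471", "EFO:1000527", "mesh:D000069295"],
    }
)


def filter_by_curies(df: pd.DataFrame, column: str, curies) -> pd.DataFrame:
    """Hypothetical helper: keep rows whose value in ``column`` is one of ``curies``."""
    # Exact string match on the whole CURIE, not just its prefix
    return df[df[column].isin(set(curies))]


subject_rows = filter_by_curies(df, "subject_id", ["DOID:0080795"])
print(subject_rows["object_id"].tolist())  # ['EFO:0003029', 'mesh:D015471']
```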