get_df_unique_prefixes

get_df_unique_prefixes(df: DataframeOrSeries, *, column: str | int | None = None, validate: bool = False, converter: Converter | None = None) → set[str]

Get the unique prefixes of the CURIEs in a dataframe column or a series.

Parameters:
  • df – A dataframe or series. If a dataframe is given, column must not be None.

  • column – The column to check, if a dataframe was passed. If a series was passed, this can be left as None.

  • validate – Whether to validate the prefixes against the converter.

  • converter – A converter for validating CURIEs.

Returns:

The set of prefixes appearing in the CURIEs in the given column.

import pandas as pd
from curies.dataframe import get_df_unique_prefixes

rows = [
    ("DOID:0080795", "skos:exactMatch", "EFO:0003029", "semapv:ManualMappingCuration"),
    ("DOID:0080795", "skos:exactMatch", "mesh:D015471", "semapv:ManualMappingCuration"),
    ("DOID:0080799", "skos:exactMatch", "EFO:1000527", "semapv:ManualMappingCuration"),
    (
        "DOID:0080808",
        "skos:exactMatch",
        "mesh:D000069295",
        "semapv:ManualMappingCuration",
    ),
]
df = pd.DataFrame(
    rows, columns=["subject_id", "predicate_id", "object_id", "mapping_justification"]
)
assert get_df_unique_prefixes(df, column="object_id") == {"EFO", "mesh"}
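
To make the return value concrete, here is a minimal pure-Python sketch of what "prefixes appearing in CURIEs" means, assuming the prefix is everything before the first colon. The helper name unique_prefixes_sketch is hypothetical and is not part of curies; it is not the library's implementation, and it does not model the validate and converter parameters.

```python
from collections.abc import Iterable


def unique_prefixes_sketch(curies: Iterable[str]) -> set[str]:
    """Collect the text before the first colon of each CURIE (hypothetical helper)."""
    prefixes: set[str] = set()
    for curie in curies:
        # partition splits on the first colon only, so "a:b:c" yields prefix "a"
        prefix, sep, _identifier = curie.partition(":")
        if not sep:
            # Assumption: a string without a colon is not a CURIE and is skipped
            continue
        prefixes.add(prefix)
    return prefixes


# Same values as the object_id column in the example above
object_ids = ["EFO:0003029", "mesh:D015471", "EFO:1000527", "mesh:D000069295"]
assert unique_prefixes_sketch(object_ids) == {"EFO", "mesh"}
```

Duplicates collapse because the result is a set, which is why four object identifiers produce only two prefixes.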