Converter

class Converter(records: List[Record], *, delimiter: str = ':', strict: bool = True)

Bases: object

A cached prefix map data structure.

# Construct a prefix map:
>>> converter = Converter.from_prefix_map({
...    "CHEBI": "http://purl.obolibrary.org/obo/CHEBI_",
...    "MONDO": "http://purl.obolibrary.org/obo/MONDO_",
...    "GO": "http://purl.obolibrary.org/obo/GO_",
...    "OBO": "http://purl.obolibrary.org/obo/",
... })

# Compression and Expansion:
>>> converter.compress("http://purl.obolibrary.org/obo/CHEBI_1")
'CHEBI:1'
>>> converter.expand("CHEBI:1")
'http://purl.obolibrary.org/obo/CHEBI_1'

# Example with unparsable URI:
>>> converter.compress("http://example.com/missing:0000000")

# Example with missing prefix:
>>> converter.expand("missing:0000000")

Instantiate a converter.

Parameters:
  • records – A list of records. If you plan to build a converter incrementally, pass an empty list.

  • strict – If true, raises an error on duplicate URI prefixes.

  • delimiter – The delimiter used for CURIEs. Defaults to a colon.

Raises:

Attributes Summary

bimap

Get the bijective mapping between CURIE prefixes and URI prefixes.

Methods Summary

add_prefix(prefix, uri_prefix[, ...])

Append a prefix to the converter.

add_record(record[, case_sensitive, merge])

Append a record to the converter.

compress(uri, *[, strict, passthrough])

Compress a URI to a CURIE, if possible.

compress_or_standardize(uri_or_curie, *[, ...])

Compress a URI or standardize a CURIE.

compress_strict(uri)

Compress a URI to a CURIE, and raise an error if not possible.

expand(curie, *[, strict, passthrough])

Expand a CURIE to a URI, if possible.

expand_all(curie)

Expand a CURIE to all possible URIs.

expand_or_standardize(curie_or_uri, *[, ...])

Expand a CURIE or standardize a URI.

expand_pair(prefix, identifier)

Expand a CURIE pair to the standard URI.

expand_pair_all(prefix, identifier)

Expand a CURIE pair to all possible URIs.

expand_strict(curie)

Expand a CURIE to a URI, and raise an error if not possible.

file_compress(path, column, *[, sep, ...])

Convert all URIs in the given column of a CSV file to CURIEs.

file_expand(path, column, *[, sep, header, ...])

Convert all CURIEs in the given column of a CSV file to URIs.

format_curie(prefix, identifier)

Format a prefix and identifier into a CURIE string.

from_extended_prefix_map(records, **kwargs)

Get a converter from a list of dictionaries by creating records out of them.

from_jsonld(data, **kwargs)

Get a converter from a JSON-LD object, which contains a prefix map in its @context key.

from_jsonld_github(owner, repo, *path[, branch])

Construct a remote JSON-LD URL on GitHub then parse with Converter.from_jsonld().

from_prefix_map(prefix_map, **kwargs)

Get a converter from a simple prefix map.

from_priority_prefix_map(data, **kwargs)

Get a converter from a priority prefix map.

from_rdflib(graph_or_manager, **kwargs)

Get a converter from an RDFLib graph or namespace manager.

from_reverse_prefix_map(reverse_prefix_map, ...)

Get a converter from a reverse prefix map.

from_shacl(graph[, format])

Get a converter from SHACL, given as an RDFLib graph or as a file path or URL (e.g., to a Turtle file).

get_prefixes(*[, include_synonyms])

Get the set of prefixes covered by this converter.

get_record(prefix)

Get the record for the prefix.

get_subconverter(prefixes)

Get a converter with a subset of prefixes.

get_uri_prefixes(*[, include_synonyms])

Get the set of URI prefixes covered by this converter.

is_curie(s)

Check if the string can be parsed as a CURIE by this converter.

is_uri(s)

Check if the string can be parsed as a URI by this converter.

parse_curie(curie)

Parse a CURIE.

parse_uri(uri)

Compress a URI to a CURIE pair.

pd_compress(df, column[, target_column, ...])

Convert all URIs in the given column to CURIEs.

pd_expand(df, column[, target_column, ...])

Convert all CURIEs in the given column to URIs.

pd_standardize_curie(df, *, column[, ...])

Standardize all CURIEs in the given column.

pd_standardize_prefix(df, *, column[, ...])

Standardize all prefixes in the given column.

pd_standardize_uri(df, *, column[, ...])

Standardize all URIs in the given column.

standardize_curie()

Standardize a CURIE.

standardize_prefix()

Standardize a prefix.

standardize_uri()

Standardize a URI.

Attributes Documentation

bimap

Get the bijective mapping between CURIE prefixes and URI prefixes.

Methods Documentation

add_prefix(prefix: str, uri_prefix: str, prefix_synonyms: Collection[str] | None = None, uri_prefix_synonyms: Collection[str] | None = None, *, case_sensitive: bool = True, merge: bool = False) → None

Append a prefix to the converter.

Parameters:
  • prefix – The prefix to append, e.g., go

  • uri_prefix – The URI prefix to append, e.g., http://purl.obolibrary.org/obo/GO_

  • prefix_synonyms – An optional collection of synonyms for the prefix such as gomf, gocc, etc.

  • uri_prefix_synonyms – An optional collection of synonyms for the URI prefix such as https://bioregistry.io/go:, http://www.informatics.jax.org/searches/GO.cgi?id=GO:, etc.

  • case_sensitive – Should prefixes and URI prefixes be compared in a case-sensitive manner when checking for uniqueness? Defaults to True.

  • merge – Should this record be merged into an existing record if it uniquely maps to a single existing record? When false, will raise an error if one or more existing records can be mapped. Defaults to false.

This can be used to add missing namespaces on-the-fly to an existing converter:

>>> import curies
>>> converter = curies.get_obo_converter()
>>> converter.add_prefix("hgnc", "https://bioregistry.io/hgnc:")
>>> converter.expand("hgnc:1234")
'https://bioregistry.io/hgnc:1234'
>>> converter.expand("GO:0032571")
'http://purl.obolibrary.org/obo/GO_0032571'

This can also be used to incrementally build up a converter from scratch:

>>> import curies
>>> converter = curies.Converter(records=[])
>>> converter.add_prefix("hgnc", "https://bioregistry.io/hgnc:")
>>> converter.expand("hgnc:1234")
'https://bioregistry.io/hgnc:1234'

add_record(record: Record, case_sensitive: bool = True, merge: bool = False) → None

Append a record to the converter.

compress(uri: str, *, strict: Literal[True] = True, passthrough: bool = False) → str
compress(uri: str, *, strict: Literal[False] = False, passthrough: Literal[True] = True) → str
compress(uri: str, *, strict: Literal[False] = False, passthrough: Literal[False] = False) → str | None

Compress a URI to a CURIE, if possible.

Parameters:
  • uri – A string representing a valid uniform resource identifier (URI)

  • strict – If true and the URI can’t be compressed, raises an error. Defaults to false.

  • passthrough – If true, strict is false, and the URI can’t be compressed, return the input. Defaults to false.

Returns:

A compact URI if this converter could find an appropriate URI prefix, otherwise None.

Raises:

CompressionError – If strict is set to true and the URI can’t be compressed

>>> from curies import Converter
>>> converter = Converter.from_prefix_map({
...    "CHEBI": "http://purl.obolibrary.org/obo/CHEBI_",
...    "MONDO": "http://purl.obolibrary.org/obo/MONDO_",
...    "GO": "http://purl.obolibrary.org/obo/GO_",
...    "OBO": "http://purl.obolibrary.org/obo/",
... })
>>> converter.compress("http://purl.obolibrary.org/obo/GO_0032571")
'GO:0032571'
>>> converter.compress("http://purl.obolibrary.org/obo/go.owl")
'OBO:go.owl'
>>> converter.compress("http://example.org/missing:0000000")

Note

If there are partially overlapping URI prefixes in this converter (e.g., http://purl.obolibrary.org/obo/GO_ for the prefix GO and http://purl.obolibrary.org/obo/ for the prefix OBO), the longest URI prefix will always be matched. For example, parsing http://purl.obolibrary.org/obo/GO_0032571 will return GO:0032571 instead of OBO:GO_0032571.
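The longest-match rule in this note can be pictured with a small stdlib-only sketch. It is a hypothetical illustration, not the curies implementation (which may index URI prefixes differently); the helper `compress_longest` and the plain-dict prefix map are assumptions made for the example.

```python
from typing import Mapping, Optional


def compress_longest(uri: str, prefix_map: Mapping[str, str], delimiter: str = ":") -> Optional[str]:
    """Hypothetical helper: compress a URI using the longest matching URI prefix."""
    best_prefix: Optional[str] = None
    best_uri_prefix = ""
    for prefix, uri_prefix in prefix_map.items():
        # A longer matching URI prefix always beats a shorter one
        if uri.startswith(uri_prefix) and len(uri_prefix) > len(best_uri_prefix):
            best_prefix, best_uri_prefix = prefix, uri_prefix
    if best_prefix is None:
        return None  # no registered URI prefix matches
    return f"{best_prefix}{delimiter}{uri[len(best_uri_prefix):]}"


prefix_map = {
    "GO": "http://purl.obolibrary.org/obo/GO_",
    "OBO": "http://purl.obolibrary.org/obo/",
}
print(compress_longest("http://purl.obolibrary.org/obo/GO_0032571", prefix_map))  # GO:0032571
print(compress_longest("http://purl.obolibrary.org/obo/go.owl", prefix_map))  # OBO:go.owl
```

A linear scan keeps the idea visible; with many thousands of URI prefixes, an index such as a trie would likely be a better fit.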

compress_or_standardize(uri_or_curie: str, *, strict: Literal[True] = True, passthrough: bool = False) → str
compress_or_standardize(uri_or_curie: str, *, strict: Literal[False] = False, passthrough: Literal[True] = True) → str
compress_or_standardize(uri_or_curie: str, *, strict: Literal[False] = False, passthrough: Literal[False] = False) → str | None

Compress a URI or standardize a CURIE.

Parameters:
  • uri_or_curie – A string representing a compact URI (CURIE) or a URI.

  • strict – If true and the string is neither a URI that can be compressed nor a CURIE that can be standardized, raises an error. Defaults to false.

  • passthrough – If true, strict is false, and the string is neither a URI that can be compressed nor a CURIE that can be standardized, return the input. Defaults to false.

Returns:

If the string is a URI, and it can be compressed, returns the corresponding CURIE. If the string is a CURIE, and it can be standardized, returns the standard CURIE.

Raises:

CompressionError – If strict is true and the URI can’t be compressed

>>> from curies import Converter, Record
>>> converter = Converter.from_extended_prefix_map([
...     Record(
...          prefix="CHEBI",
...          prefix_synonyms=["chebi"],
...          uri_prefix="http://purl.obolibrary.org/obo/CHEBI_",
...          uri_prefix_synonyms=["https://identifiers.org/chebi:"],
...     ),
... ])
>>> converter.compress_or_standardize("http://purl.obolibrary.org/obo/CHEBI_138488")
'CHEBI:138488'
>>> converter.compress_or_standardize("https://identifiers.org/chebi:138488")
'CHEBI:138488'
>>> converter.compress_or_standardize("CHEBI:138488")
'CHEBI:138488'
>>> converter.compress_or_standardize("chebi:138488")
'CHEBI:138488'
>>> converter.compress_or_standardize("missing:0000000")
>>> converter.compress_or_standardize("https://example.com/missing:0000000")
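One way to picture the dispatch this method performs: try the string as a URI first, and only if no URI prefix (canonical or synonym) matches, try it as a CURIE whose prefix is canonical or a synonym. The sketch below is a hypothetical, stdlib-only approximation; `SimpleRecord` and `compress_or_standardize` here are illustrative stand-ins rather than curies objects, and the order of the two checks is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SimpleRecord:
    """Hypothetical stand-in for curies.Record: canonical values plus synonyms."""

    prefix: str
    uri_prefix: str
    prefix_synonyms: List[str] = field(default_factory=list)
    uri_prefix_synonyms: List[str] = field(default_factory=list)


def compress_or_standardize(s: str, records: List[SimpleRecord], delimiter: str = ":") -> Optional[str]:
    """Hypothetical helper: compress a known URI, else standardize a known CURIE."""
    # Step 1: treat the string as a URI; the longest matching URI prefix wins
    best: Optional[Tuple[SimpleRecord, str]] = None
    for record in records:
        for uri_prefix in [record.uri_prefix, *record.uri_prefix_synonyms]:
            if s.startswith(uri_prefix) and (best is None or len(uri_prefix) > len(best[1])):
                best = (record, uri_prefix)
    if best is not None:
        record, uri_prefix = best
        return f"{record.prefix}{delimiter}{s[len(uri_prefix):]}"
    # Step 2: treat it as a CURIE whose prefix is canonical or a synonym
    prefix, sep, identifier = s.partition(delimiter)
    if sep:
        for record in records:
            if prefix == record.prefix or prefix in record.prefix_synonyms:
                return f"{record.prefix}{delimiter}{identifier}"
    return None  # neither a known URI nor a known CURIE


records = [
    SimpleRecord(
        prefix="CHEBI",
        uri_prefix="http://purl.obolibrary.org/obo/CHEBI_",
        prefix_synonyms=["chebi"],
        uri_prefix_synonyms=["https://identifiers.org/chebi:"],
    ),
]
print(compress_or_standardize("https://identifiers.org/chebi:138488", records))  # CHEBI:138488
print(compress_or_standardize("chebi:138488", records))  # CHEBI:138488
```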

compress_strict(uri: str) → str

Compress a URI to a CURIE, and raise an error if not possible.

expand(curie: str, *, strict: Literal[True] = True, passthrough: bool = False) → str
expand(curie: str, *, strict: Literal[False] = False, passthrough: Literal[True] = True) → str
expand(curie: str, *, strict: Literal[False] = False, passthrough: Literal[False] = False) → str | None

Expand a CURIE to a URI, if possible.

Parameters:
  • curie – A string representing a compact URI (CURIE)

  • strict – If true and the CURIE can’t be expanded, raises an error. Defaults to false.

  • passthrough – If true, strict is false, and the CURIE can’t be expanded, return the input. Defaults to false. If your strings can either be a CURIE _or_ a URI, consider using Converter.expand_or_standardize() instead.

Returns:

A URI if this converter contains a URI prefix for the prefix in this CURIE

Raises:

ExpansionError – If strict is true and the CURIE can’t be expanded

>>> from curies import Converter
>>> converter = Converter.from_prefix_map({
...    "CHEBI": "http://purl.obolibrary.org/obo/CHEBI_",
...    "MONDO": "http://purl.obolibrary.org/obo/MONDO_",
...    "GO": "http://purl.obolibrary.org/obo/GO_",
... })
>>> converter.expand("CHEBI:138488")
'http://purl.obolibrary.org/obo/CHEBI_138488'
>>> converter.expand("missing:0000000")
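The interplay of strict and passthrough can be summarized in a few lines. This hypothetical sketch uses a plain dict instead of a converter and an illustrative `ExpansionFailed` exception in place of `ExpansionError`; splitting the CURIE on the first delimiter is an assumption about parsing.

```python
from typing import Mapping, Optional


class ExpansionFailed(ValueError):
    """Hypothetical error type standing in for curies' ExpansionError."""


def expand(
    curie: str,
    prefix_map: Mapping[str, str],
    *,
    strict: bool = False,
    passthrough: bool = False,
    delimiter: str = ":",
) -> Optional[str]:
    """Hypothetical helper mirroring the documented strict/passthrough rules."""
    prefix, sep, identifier = curie.partition(delimiter)
    uri_prefix = prefix_map.get(prefix) if sep else None
    if uri_prefix is not None:
        return uri_prefix + identifier
    if strict:
        raise ExpansionFailed(curie)  # strict takes precedence over passthrough
    if passthrough:
        return curie  # hand the unexpandable input back unchanged
    return None


prefix_map = {"CHEBI": "http://purl.obolibrary.org/obo/CHEBI_"}
print(expand("CHEBI:138488", prefix_map))  # http://purl.obolibrary.org/obo/CHEBI_138488
print(expand("missing:0000000", prefix_map))  # None
print(expand("missing:0000000", prefix_map, passthrough=True))  # missing:0000000
```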

expand_all(curie: str) → Collection[str] | None

Expand a CURIE to all possible URIs.

Parameters:

curie – A string representing a compact URI

Returns:

A list of URIs that this converter can create for the given CURIE. The first entry is the “standard” URI, and the others are based on URI prefix synonyms. If the prefix is not registered with this converter, None is returned.

>>> priority_prefix_map = {
...     "CHEBI": [
...         "http://purl.obolibrary.org/obo/CHEBI_",
...         "https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:",
...     ],
... }
>>> converter = Converter.from_priority_prefix_map(priority_prefix_map)
>>> converter.expand_all("CHEBI:138488")
['http://purl.obolibrary.org/obo/CHEBI_138488', 'https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:138488']
>>> converter.expand_all("NOPE:NOPE") is None
True
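As a rough mental model, expand_all() pairs the identifier with every URI prefix registered for the CURIE's prefix, preserving priority order. The stdlib-only sketch below assumes a priority prefix map (a dict of lists) as its data source; `expand_all` here is a hypothetical free function, not the converter method.

```python
from typing import List, Mapping, Optional


def expand_all(
    curie: str, priority_prefix_map: Mapping[str, List[str]], delimiter: str = ":"
) -> Optional[List[str]]:
    """Hypothetical helper: build one URI per registered URI prefix, in priority order."""
    prefix, sep, identifier = curie.partition(delimiter)
    uri_prefixes = priority_prefix_map.get(prefix) if sep else None
    if not uri_prefixes:
        return None  # unregistered prefix (or not a CURIE at all)
    # The first URI prefix gives the "standard" URI; the rest come from synonyms
    return [uri_prefix + identifier for uri_prefix in uri_prefixes]


priority_prefix_map = {
    "CHEBI": [
        "http://purl.obolibrary.org/obo/CHEBI_",
        "https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:",
    ],
}
print(expand_all("CHEBI:138488", priority_prefix_map))
```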

expand_or_standardize(curie_or_uri: str, *, strict: Literal[True] = True, passthrough: bool = False) → str
expand_or_standardize(curie_or_uri: str, *, strict: Literal[False] = False, passthrough: Literal[True] = True) → str
expand_or_standardize(curie_or_uri: str, *, strict: Literal[False] = False, passthrough: Literal[False] = False) → str | None

Expand a CURIE or standardize a URI.

Parameters:
  • curie_or_uri – A string representing a compact URI (CURIE) or a URI.

  • strict – If true and the string is neither a CURIE that can be expanded nor a URI that can be standardized, raises an error. Defaults to false.

  • passthrough – If true, strict is false, and the string is neither a CURIE that can be expanded nor a URI that can be standardized, return the input. Defaults to false.

Returns:

If the string is a CURIE, and it can be expanded, returns the corresponding URI. If the string is a URI, and it can be standardized, returns the standard URI.

Raises:

ExpansionError – If strict is true and the CURIE can’t be expanded

>>> from curies import Converter, Record
>>> converter = Converter.from_extended_prefix_map([
...     Record(
...          prefix="CHEBI",
...          prefix_synonyms=["chebi"],
...          uri_prefix="http://purl.obolibrary.org/obo/CHEBI_",
...          uri_prefix_synonyms=["https://identifiers.org/chebi:"],
...     ),
... ])
>>> converter.expand_or_standardize("CHEBI:138488")
'http://purl.obolibrary.org/obo/CHEBI_138488'
>>> converter.expand_or_standardize("chebi:138488")
'http://purl.obolibrary.org/obo/CHEBI_138488'
>>> converter.expand_or_standardize("http://purl.obolibrary.org/obo/CHEBI_138488")
'http://purl.obolibrary.org/obo/CHEBI_138488'
>>> converter.expand_or_standardize("https://identifiers.org/chebi:138488")
'http://purl.obolibrary.org/obo/CHEBI_138488'
>>> converter.expand_or_standardize("missing:0000000")
>>> converter.expand_or_standardize("https://example.com/missing:0000000")

expand_pair(prefix: str, identifier: str) → str | None

Expand a CURIE pair to the standard URI.

Parameters:
  • prefix – The prefix of the CURIE

  • identifier – The local unique identifier of the CURIE

Returns:

A URI if this converter contains a URI prefix for the prefix in this CURIE

>>> from curies import Converter
>>> converter = Converter.from_prefix_map({
...    "CHEBI": "http://purl.obolibrary.org/obo/CHEBI_",
...    "MONDO": "http://purl.obolibrary.org/obo/MONDO_",
...    "GO": "http://purl.obolibrary.org/obo/GO_",
... })
>>> converter.expand_pair("CHEBI", "138488")
'http://purl.obolibrary.org/obo/CHEBI_138488'
>>> converter.expand_pair("missing", "0000000")

expand_pair_all(prefix: str, identifier: str) → Collection[str] | None

Expand a CURIE pair to all possible URIs.

Parameters:
  • prefix – The prefix of the CURIE

  • identifier – The local unique identifier of the CURIE

Returns:

A list of URIs that this converter can create for the given CURIE. The first entry is the “standard” URI, and the others are based on URI prefix synonyms. If the prefix is not registered with this converter, None is returned.

>>> priority_prefix_map = {
...     "CHEBI": [
...         "http://purl.obolibrary.org/obo/CHEBI_",
...         "https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:",
...     ],
... }
>>> converter = Converter.from_priority_prefix_map(priority_prefix_map)
>>> converter.expand_pair_all("CHEBI", "138488")
['http://purl.obolibrary.org/obo/CHEBI_138488', 'https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:138488']
>>> converter.expand_pair_all("NOPE", "NOPE") is None
True

expand_strict(curie: str) → str

Expand a CURIE to a URI, and raise an error if not possible.

file_compress(path: str | Path, column: int, *, sep: str | None = None, header: bool = True, strict: bool = False, passthrough: bool = False, ambiguous: bool = False) → None

Convert all URIs in the given column of a CSV file to CURIEs.

Parameters:
  • path – The path to a delimited (e.g., CSV or TSV) file

  • column – The index of the column in the file containing URIs to convert to CURIEs.

  • sep – The delimiter of the CSV file, defaults to tab

  • header – Does the file have a header row?

  • strict – If true and the URI can’t be compressed, raises an error. Defaults to false.

  • passthrough – If true, strict is false, and the URI can’t be compressed, return the input. Defaults to false.

  • ambiguous – If true, consider the column as containing either CURIEs or URIs.
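The behavior of file_compress() can be approximated with the csv module. This sketch is hypothetical: the helper name, the choice to rewrite the file in place, and blanking out values that cannot be compressed (when passthrough is false) are assumptions, and strict and ambiguous handling are omitted. The toy `compress_go` function stands in for a converter's compress().

```python
import csv
import tempfile
from pathlib import Path
from typing import Callable, Optional


def compress_file_column(
    path: Path,
    column: int,
    compress: Callable[[str], Optional[str]],
    *,
    sep: str = "\t",
    header: bool = True,
    passthrough: bool = False,
) -> None:
    """Hypothetical helper: rewrite one column of a delimited file in place."""
    with path.open(newline="") as file:
        rows = list(csv.reader(file, delimiter=sep))
    body = rows[1:] if header else rows  # leave the header row untouched
    for row in body:
        curie = compress(row[column])
        if curie is not None:
            row[column] = curie
        elif not passthrough:
            row[column] = ""  # assumption: unconvertible values become empty cells
    with path.open("w", newline="") as file:
        csv.writer(file, delimiter=sep).writerows(rows)


GO = "http://purl.obolibrary.org/obo/GO_"


def compress_go(uri: str) -> Optional[str]:
    # Toy compressor that only knows the GO URI prefix
    return "GO:" + uri[len(GO):] if uri.startswith(GO) else None


with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.tsv"
    demo.write_text(f"id\tlabel\n{GO}0032571\tx\nhttp://example.org/1\ty\n")
    compress_file_column(demo, 0, compress_go)
    result = demo.read_text()
print(result)
```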

file_expand(path: str | Path, column: int, *, sep: str | None = None, header: bool = True, strict: bool = False, passthrough: bool = False, ambiguous: bool = False) → None

Convert all CURIEs in the given column of a CSV file to URIs.

Parameters:
  • path – The path to a delimited (e.g., CSV or TSV) file

  • column – The index of the column in the file containing CURIEs to convert to URIs.

  • sep – The delimiter of the CSV file, defaults to tab

  • header – Does the file have a header row?

  • strict – If true and the CURIE can’t be expanded, raises an error. Defaults to false.

  • passthrough – If true, strict is false, and the CURIE can’t be expanded, return the input. Defaults to false.

  • ambiguous – If true, consider the column as containing either CURIEs or URIs.

format_curie(prefix: str, identifier: str) → str

Format a prefix and identifier into a CURIE string.
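Formatting and parsing are inverses around the converter's delimiter (a colon by default, per the constructor's delimiter parameter). The sketch below is a hypothetical illustration: `Pair` stands in for ReferenceTuple, and splitting only on the first delimiter is an assumption chosen so that identifiers containing the delimiter survive a round trip.

```python
from typing import NamedTuple


class Pair(NamedTuple):
    """Hypothetical stand-in for curies' ReferenceTuple."""

    prefix: str
    identifier: str


def format_curie(prefix: str, identifier: str, delimiter: str = ":") -> str:
    """Join a prefix and identifier with the delimiter."""
    return f"{prefix}{delimiter}{identifier}"


def parse_curie(curie: str, delimiter: str = ":") -> Pair:
    """Split on the first delimiter only, so identifiers may contain it too."""
    prefix, sep, identifier = curie.partition(delimiter)
    if not sep:
        raise ValueError(f"no {delimiter!r} delimiter in {curie!r}")
    return Pair(prefix, identifier)


print(format_curie("GO", "0032571"))  # GO:0032571
print(format_curie("GO", "0032571", delimiter="_"))  # GO_0032571
print(parse_curie("GO:0032571"))  # Pair(prefix='GO', identifier='0032571')
```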

classmethod from_extended_prefix_map(records: str | Path | Iterable[Record | Dict[str, Any]], **kwargs: Any) → Converter

Get a converter from a list of dictionaries by creating records out of them.

Parameters:
  • records

    One of the following:

    • An iterable of curies.Record objects or dictionaries that will get converted into record objects that together constitute an extended prefix map

    • A string containing a remote location of a JSON file containing an extended prefix map

    • A string or pathlib.Path object corresponding to a local file path to a JSON file containing an extended prefix map

  • kwargs – Keyword arguments to pass to curies.Converter.__init__()

Returns:

A converter

An extended prefix map is a list of dictionaries with up to four keys:

  1. A prefix string

  2. A uri_prefix string

  3. An optional list of strings prefix_synonyms

  4. An optional list of strings uri_prefix_synonyms

Across the whole list of dictionaries, each value in the union of all prefix and prefix_synonyms entries should appear only once, and likewise each value in the union of all uri_prefix and uri_prefix_synonyms entries should appear only once.

>>> epm = [
...     {
...         "prefix": "CHEBI",
...         "prefix_synonyms": ["chebi", "ChEBI"],
...         "uri_prefix": "http://purl.obolibrary.org/obo/CHEBI_",
...         "uri_prefix_synonyms": ["https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:"],
...     },
...     {
...         "prefix": "GO",
...         "uri_prefix": "http://purl.obolibrary.org/obo/GO_",
...     },
... ]
>>> converter = Converter.from_extended_prefix_map(epm)

Expand using the preferred/canonical prefix:

>>> converter.expand("CHEBI:138488")
'http://purl.obolibrary.org/obo/CHEBI_138488'

Expand using a prefix synonym:

>>> converter.expand("chebi:138488")
'http://purl.obolibrary.org/obo/CHEBI_138488'

Compress using the preferred/canonical URI prefix:

>>> converter.compress("http://purl.obolibrary.org/obo/CHEBI_138488")
'CHEBI:138488'

Compress using a URI prefix synonym:

>>> converter.compress("https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:138488")
'CHEBI:138488'

Example from a remote source:

>>> url = "https://github.com/biopragmatics/bioregistry/raw/main/exports/contexts/bioregistry.epm.json"
>>> converter = Converter.from_extended_prefix_map(url)
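The uniqueness rule for extended prefix maps can be checked before building a converter. The following hypothetical sketch pools each record's canonical value with its synonyms and reports anything seen more than once; it compares strings case-sensitively, matching the case_sensitive=True default documented for add_prefix(). The function name and report format are illustrative only.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List


def find_duplicates(records: Iterable[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Hypothetical helper: list values that break the uniqueness rules."""
    prefixes: "Counter[str]" = Counter()
    uri_prefixes: "Counter[str]" = Counter()
    for record in records:
        # Pool canonical values and synonyms, since both must be unique together
        prefixes.update([record["prefix"], *record.get("prefix_synonyms", [])])
        uri_prefixes.update([record["uri_prefix"], *record.get("uri_prefix_synonyms", [])])
    return {
        "prefixes": sorted(value for value, count in prefixes.items() if count > 1),
        "uri_prefixes": sorted(value for value, count in uri_prefixes.items() if count > 1),
    }


epm = [
    {
        "prefix": "CHEBI",
        "prefix_synonyms": ["chebi", "ChEBI"],
        "uri_prefix": "http://purl.obolibrary.org/obo/CHEBI_",
        "uri_prefix_synonyms": ["https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:"],
    },
    {"prefix": "GO", "uri_prefix": "http://purl.obolibrary.org/obo/GO_"},
]
print(find_duplicates(epm))  # {'prefixes': [], 'uri_prefixes': []}
clash = epm + [{"prefix": "chebi", "uri_prefix": "https://example.org/chebi/"}]
print(find_duplicates(clash))  # {'prefixes': ['chebi'], 'uri_prefixes': []}
```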

classmethod from_jsonld(data: str | Path | Dict[str, Any], **kwargs: Any) → Converter

Get a converter from a JSON-LD object, which contains a prefix map in its @context key.

Parameters:
  • data – A JSON-LD object, or a string or pathlib.Path pointing to a JSON-LD file (a local path or a remote URL)

  • kwargs – Keyword arguments to pass to curies.Converter.__init__()

Returns:

A converter

Example from a remote context file:

>>> base = "https://raw.githubusercontent.com"
>>> url = f"{base}/biopragmatics/bioregistry/main/exports/contexts/semweb.context.jsonld"
>>> converter = Converter.from_jsonld(url)
>>> "rdf" in converter.prefix_map
True

See also

https://www.w3.org/TR/json-ld11/#the-context defines the @context aspect of JSON-LD
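To see what from_jsonld() reads, the sketch below pulls plain string entries out of a JSON-LD @context. It is a simplified, hypothetical helper: it ignores JSON-LD keywords (keys starting with @), expanded term definitions, and contexts given as lists, any of which the library may treat differently.

```python
import json
from typing import Any, Dict


def prefix_map_from_jsonld(data: Dict[str, Any]) -> Dict[str, str]:
    """Hypothetical helper: pull simple prefix definitions out of a JSON-LD @context."""
    context = data.get("@context", {})
    return {
        key: value
        for key, value in context.items()
        # Keep plain string mappings; skip keywords such as @vocab and @base
        if isinstance(value, str) and not key.startswith("@")
    }


document = json.loads("""
{
  "@context": {
    "@vocab": "http://example.org/vocab/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "CHEBI": "http://purl.obolibrary.org/obo/CHEBI_"
  }
}
""")
print(prefix_map_from_jsonld(document))
```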

classmethod from_jsonld_github(owner: str, repo: str, *path: str, branch: str = 'main', **kwargs: Any) → Converter

Construct a remote JSON-LD URL on GitHub then parse with Converter.from_jsonld().

Parameters:
  • owner – A GitHub repository owner or organization (e.g., biopragmatics)

  • repo – The name of the repository (e.g., bioregistry)

  • path – The file path in the GitHub repository to a JSON-LD context file.

  • branch – The branch from which the file should be downloaded. Defaults to main, for old repositories this might need to be changed to master.

  • kwargs – Keyword arguments to pass to curies.Converter.__init__()

Returns:

A converter

Raises:

ValueError – If the given path doesn’t end in a .jsonld file name

>>> converter = Converter.from_jsonld_github(
...     "biopragmatics", "bioregistry", "exports",
...     "contexts", "semweb.context.jsonld",
... )
>>> "rdf" in converter.prefix_map
True
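The URL construction can be sketched as follows. The raw.githubusercontent.com layout (owner/repo/branch/path) is an assumption based on the URL used in the from_jsonld() example; `github_jsonld_url` is a hypothetical helper that mirrors the documented ValueError for paths not ending in a .jsonld file name.

```python
def github_jsonld_url(owner: str, repo: str, *path: str, branch: str = "main") -> str:
    """Hypothetical helper: build a raw GitHub URL for a JSON-LD context file."""
    if not path or not path[-1].endswith(".jsonld"):
        raise ValueError("the path must end in a .jsonld file name")
    file_path = "/".join(path)
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{file_path}"


print(github_jsonld_url("biopragmatics", "bioregistry", "exports", "contexts", "semweb.context.jsonld"))
```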

classmethod from_prefix_map(prefix_map: str | Path | Mapping[str, str], **kwargs: Any) → Converter

Get a converter from a simple prefix map.

Parameters:
  • prefix_map

    One of the following:

    • A mapping whose keys represent CURIE prefixes and values represent URI prefixes

    • A string containing a remote location of a JSON file containing a prefix map

    • A string or pathlib.Path object corresponding to a local file path to a JSON file containing a prefix map

  • kwargs – Keyword arguments to pass to curies.Converter.__init__()

Returns:

A converter

>>> converter = Converter.from_prefix_map({
...     "CHEBI": "http://purl.obolibrary.org/obo/CHEBI_",
...     "MONDO": "http://purl.obolibrary.org/obo/MONDO_",
...     "GO": "http://purl.obolibrary.org/obo/GO_",
...     "OBO": "http://purl.obolibrary.org/obo/",
... })
>>> converter.expand("CHEBI:138488")
'http://purl.obolibrary.org/obo/CHEBI_138488'
>>> converter.compress("http://purl.obolibrary.org/obo/CHEBI_138488")
'CHEBI:138488'

classmethod from_priority_prefix_map(data: str | Path | Mapping[str, List[str]], **kwargs: Any) → Converter

Get a converter from a priority prefix map.

Parameters:
  • data – A prefix map where the keys are prefixes (e.g., chebi) and the values are lists of URI prefixes (e.g., http://purl.obolibrary.org/obo/CHEBI_) with the first element of the list being the priority URI prefix for expansions.

  • kwargs – Keyword arguments to pass to curies.Converter.__init__()

Returns:

A converter

>>> priority_prefix_map = {
...     "CHEBI": [
...         "http://purl.obolibrary.org/obo/CHEBI_",
...         "https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:",
...     ],
...     "GO": ["http://purl.obolibrary.org/obo/GO_"],
...     "obo": ["http://purl.obolibrary.org/obo/"],
... }
>>> converter = Converter.from_priority_prefix_map(priority_prefix_map)
>>> converter.expand("CHEBI:138488")
'http://purl.obolibrary.org/obo/CHEBI_138488'
>>> converter.compress("http://purl.obolibrary.org/obo/CHEBI_138488")
'CHEBI:138488'
>>> converter.compress("https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:138488")
'CHEBI:138488'
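A priority prefix map can be viewed as an extended prefix map without prefix synonyms: the first URI prefix becomes uri_prefix and the remainder become uri_prefix_synonyms. The hypothetical sketch below performs that translation on plain dicts; whether curies builds its records exactly this way is an assumption.

```python
from typing import Any, Dict, List, Mapping


def priority_to_extended(data: Mapping[str, List[str]]) -> List[Dict[str, Any]]:
    """Hypothetical helper: turn a priority prefix map into extended prefix map records."""
    records: List[Dict[str, Any]] = []
    for prefix, uri_prefixes in data.items():
        if not uri_prefixes:
            continue  # nothing to expand to, so this sketch skips the prefix
        # The first URI prefix is used for expansion; the rest only help compression
        primary, *synonyms = uri_prefixes
        records.append({"prefix": prefix, "uri_prefix": primary, "uri_prefix_synonyms": synonyms})
    return records


priority_prefix_map = {
    "CHEBI": [
        "http://purl.obolibrary.org/obo/CHEBI_",
        "https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:",
    ],
    "GO": ["http://purl.obolibrary.org/obo/GO_"],
}
extended = priority_to_extended(priority_prefix_map)
print(extended)
```

Dictionaries in this shape are the kind of input accepted by Converter.from_extended_prefix_map().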

classmethod from_rdflib(graph_or_manager: rdflib.Graph | rdflib.namespace.NamespaceManager, **kwargs: Any) → Converter

Get a converter from an RDFLib graph or namespace manager.

Parameters:
  • graph_or_manager – An RDFLib graph or namespace manager object

  • kwargs – Keyword arguments to pass to from_prefix_map()

Returns:

A converter

In the following example, an rdflib.Graph is created, a namespace is bound to it, and then a converter is made:

>>> import rdflib, curies
>>> graph = rdflib.Graph()
>>> graph.bind("hgnc", "https://bioregistry.io/hgnc:")
>>> converter = curies.Converter.from_rdflib(graph)
>>> converter.expand("hgnc:1234")
'https://bioregistry.io/hgnc:1234'

This also works if you start directly with an rdflib.namespace.NamespaceManager:

>>> converter = curies.Converter.from_rdflib(graph.namespace_manager)
>>> converter.expand("hgnc:1234")
'https://bioregistry.io/hgnc:1234'

classmethod from_reverse_prefix_map(reverse_prefix_map: str | Path | Mapping[str, str], **kwargs: Any) → Converter

Get a converter from a reverse prefix map.

Parameters:
  • reverse_prefix_map – A mapping whose keys are URI prefixes and whose values are the corresponding prefixes. This data structure allows multiple URI formats to point to the same prefix.

  • kwargs – Keyword arguments to pass to curies.Converter.__init__()

Returns:

A converter

>>> converter = Converter.from_reverse_prefix_map({
...     "http://purl.obolibrary.org/obo/CHEBI_": "CHEBI",
...     "https://www.ebi.ac.uk/chebi/searchId.do?chebiId=": "CHEBI",
...     "http://purl.obolibrary.org/obo/MONDO_": "MONDO",
... })
>>> converter.expand("CHEBI:138488")
'http://purl.obolibrary.org/obo/CHEBI_138488'
>>> converter.compress("http://purl.obolibrary.org/obo/CHEBI_138488")
'CHEBI:138488'
>>> converter.compress("https://www.ebi.ac.uk/chebi/searchId.do?chebiId=138488")
'CHEBI:138488'

Alternatively, load a reverse prefix map from a remote URL:

>>> url = "https://github.com/biopragmatics/bioregistry/raw/main/exports/contexts/bioregistry.rpm.json"
>>> converter = Converter.from_reverse_prefix_map(url)
>>> "chebi" in converter.prefix_map
True

classmethod from_shacl(graph: str | Path | rdflib.Graph, format: str | None = None, **kwargs: Any) → Converter

Get a converter from SHACL, given as an RDFLib graph or as a file path or URL (e.g., to a Turtle file).

Parameters:
  • graph – An RDFLib graph, a Path, a string representing a file path, or a string URL

  • format – The RDF format, if a file path is given

  • kwargs – Keyword arguments to pass to Converter.__init__()

Returns:

A converter

get_prefixes(*, include_synonyms: bool = False) → Set[str]

Get the set of prefixes covered by this converter.

Parameters:

include_synonyms – If true, include secondary prefixes.

Returns:

A set of primary prefixes covered by the converter. If include_synonyms is set to True, secondary prefixes (i.e., ones in Record.prefix_synonyms) are also included.

get_record(prefix: str) → Record | None

Get the record for the prefix.

get_subconverter(prefixes: Iterable[str]) → Converter

Get a converter with a subset of prefixes.

Parameters:

prefixes – A list of prefixes to keep from this converter. These can correspond either to preferred CURIE prefixes or CURIE prefix synonyms.

Returns:

A new, slimmed down converter

This functionality is useful for downstream applications like the following:

  1. You load a comprehensive extended prefix map, e.g., from the Bioregistry using curies.get_bioregistry_converter().

  2. You load some data that conforms to this prefix map by convention. This is often the case for semantic mappings stored in the SSSOM format.

  3. You extract the list of prefixes actually used within your data.

  4. You subset the detailed extended prefix map to only include prefixes relevant for your data.

  5. You export the subsetted extended prefix map alongside your data. Effectively, this is a way of reconciling data, and it is especially effective when using the Bioregistry or other comprehensive extended prefix maps.

Here’s a concrete example of doing this (with a bit of data science) on the SSSOM mappings from the Disease Ontology project.

>>> import curies
>>> import pandas as pd
>>> commit = "faca4fc335f9a61902b9c47a1facd52a0d3d2f8b"
>>> url = f"https://raw.githubusercontent.com/mapping-commons/disease-mappings/{commit}/mappings/doid.sssom.tsv"
>>> df = pd.read_csv(url, sep="\t", comment='#')
>>> prefixes = {
...     curies.Reference.from_curie(curie).prefix
...     for column in ["subject_id", "predicate_id", "object_id"]
...     for curie in df[column]
... }
>>> converter = curies.get_bioregistry_converter()
>>> slim_converter = converter.get_subconverter(prefixes)
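Step 4 of the workflow above, subsetting by prefix, reduces to a filter over records. In this hypothetical sketch, records are plain dicts in extended prefix map form, and a record is kept when either its preferred prefix or one of its prefix synonyms was requested, mirroring the description of the prefixes parameter. The helper name is illustrative, not part of curies.

```python
from typing import Any, Dict, Iterable, List


def subset_records(records: Iterable[Dict[str, Any]], prefixes: Iterable[str]) -> List[Dict[str, Any]]:
    """Hypothetical helper: keep records whose prefix or a prefix synonym was requested."""
    wanted = set(prefixes)
    return [
        record
        for record in records
        if record["prefix"] in wanted or wanted.intersection(record.get("prefix_synonyms", []))
    ]


records = [
    {"prefix": "CHEBI", "prefix_synonyms": ["chebi"], "uri_prefix": "http://purl.obolibrary.org/obo/CHEBI_"},
    {"prefix": "GO", "uri_prefix": "http://purl.obolibrary.org/obo/GO_"},
    {"prefix": "MONDO", "uri_prefix": "http://purl.obolibrary.org/obo/MONDO_"},
]
kept = subset_records(records, ["chebi", "GO"])
print([record["prefix"] for record in kept])  # ['CHEBI', 'GO']
```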

get_uri_prefixes(*, include_synonyms: bool = False) → Set[str]

Get the set of URI prefixes covered by this converter.

Parameters:

include_synonyms – If true, include secondary URI prefixes.

Returns:

A set of primary URI prefixes covered by the converter. If include_synonyms is set to True, secondary URI prefixes (i.e., ones in Record.uri_prefix_synonyms) are also included.

is_curie(s: str) → bool

Check if the string can be parsed as a CURIE by this converter.

Parameters:

s – A string that might be a CURIE

Returns:

True if the string can be parsed as a CURIE by this converter. Note that some valid CURIEs, when passed to this function, will result in False if their prefixes are not registered with this converter.

>>> import curies
>>> converter = curies.get_obo_converter()
>>> converter.is_curie("GO:1234567")
True
>>> converter.is_curie("http://purl.obolibrary.org/obo/GO_1234567")
False

The following is a valid CURIE, but the prefix is not registered with the converter based on the OBO Foundry prefix map, so it returns False.

>>> converter.is_curie("pdb:2gc4")
False

is_uri(s: str) → bool

Check if the string can be parsed as a URI by this converter.

Parameters:

s – A string that might be a URI

Returns:

True if the string can be parsed as a URI by this converter. Note that some valid URIs, when passed to this function, will result in False if their URI prefixes are not registered with this converter.

>>> import curies
>>> converter = curies.get_obo_converter()
>>> converter.is_uri("http://purl.obolibrary.org/obo/GO_1234567")
True
>>> converter.is_uri("GO:1234567")
False

The following is a valid URI, but its URI prefix is not registered with the converter based on the OBO Foundry prefix map, so it returns False.

>>> converter.is_uri("http://proteopedia.org/wiki/index.php/2gc4")
False
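The registration caveat for is_curie() and is_uri() can be made concrete with two small checks. These are hypothetical, stdlib-only approximations over a plain prefix map, not the converter's methods: a string counts as a CURIE here only if the text before the first delimiter is a registered prefix, and as a URI only if it starts with a registered URI prefix.

```python
from typing import Mapping


def is_curie(s: str, prefix_map: Mapping[str, str], delimiter: str = ":") -> bool:
    """Hypothetical check: the text before the first delimiter must be a registered prefix."""
    prefix, sep, _ = s.partition(delimiter)
    return bool(sep) and prefix in prefix_map


def is_uri(s: str, prefix_map: Mapping[str, str]) -> bool:
    """Hypothetical check: the string must start with a registered URI prefix."""
    return any(s.startswith(uri_prefix) for uri_prefix in prefix_map.values())


prefix_map = {"GO": "http://purl.obolibrary.org/obo/GO_"}
print(is_curie("GO:1234567", prefix_map))  # True
print(is_curie("pdb:2gc4", prefix_map))  # False: syntactically fine, but unregistered
print(is_uri("http://purl.obolibrary.org/obo/GO_1234567", prefix_map))  # True
print(is_uri("GO:1234567", prefix_map))  # False
```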

parse_curie(curie: str) → ReferenceTuple

Parse a CURIE.

parse_uri(uri: str) → ReferenceTuple | Tuple[None, None]

Compress a URI to a CURIE pair.

Parameters:

uri – A string representing a valid uniform resource identifier (URI)

Returns:

A CURIE pair if the URI could be parsed, otherwise (None, None)

>>> from curies import Converter
>>> converter = Converter.from_prefix_map({
...    "CHEBI": "http://purl.obolibrary.org/obo/CHEBI_",
...    "MONDO": "http://purl.obolibrary.org/obo/MONDO_",
...    "GO": "http://purl.obolibrary.org/obo/GO_",
... })
>>> converter.parse_uri("http://purl.obolibrary.org/obo/CHEBI_138488")
ReferenceTuple(prefix='CHEBI', identifier='138488')
>>> converter.parse_uri("http://example.org/missing:0000000")
(None, None)

pd_compress(df: pandas.DataFrame, column: str | int, target_column: None | str | int = None, strict: bool = False, passthrough: bool = False, ambiguous: bool = False) → None

Convert all URIs in the given column to CURIEs.

Parameters:
  • df – A pandas DataFrame

  • column – The column in the dataframe containing URIs to convert to CURIEs.

  • target_column – The column to put the results in. Defaults to input column.

  • strict – If true and the URI can’t be compressed, raises an error. Defaults to false.

  • passthrough – If true, strict is false, and the URI can’t be compressed, return the input. Defaults to false.

  • ambiguous – If true, consider the column as containing either CURIEs or URIs.

pd_expand(df: pandas.DataFrame, column: str | int, target_column: None | str | int = None, strict: bool = False, passthrough: bool = False, ambiguous: bool = False) → None

Convert all CURIEs in the given column to URIs.

Parameters:
  • df – A pandas DataFrame

  • column – The column in the dataframe containing CURIEs to convert to URIs.

  • target_column – The column to put the results in. Defaults to input column.

  • strict – If true and the CURIE can’t be expanded, raises an error. Defaults to false.

  • passthrough – If true, strict is false, and the CURIE can’t be expanded, return the input. Defaults to false.

  • ambiguous – If true, consider the column as containing either CURIEs or URIs.

pd_standardize_curie(df: pandas.DataFrame, *, column: str | int, target_column: None | str | int = None, strict: bool = False, passthrough: bool = False) → None

Standardize all CURIEs in the given column.

Parameters:
  • df – A pandas DataFrame

  • column – The column in the dataframe containing CURIEs to standardize.

  • target_column – The column to put the results in. Defaults to input column.

  • strict – If true and any CURIE can’t be standardized, raises an error. Defaults to false.

  • passthrough – If true, strict is false, and any CURIE can’t be standardized, the input is returned unchanged. Defaults to false.

The Disease Ontology curates mappings to other semantic spaces and distributes them in the tabular SSSOM format. However, these mappings use a wide variety of non-standard prefixes to refer to external vocabularies such as SNOMED-CT. The Bioregistry records these prefix synonyms to support reconciliation. The following example loads the SSSOM mappings into a dataframe and standardizes the CURIEs in its object_id column in place.

>>> import curies
>>> import pandas as pd
>>> commit = "faca4fc335f9a61902b9c47a1facd52a0d3d2f8b"
>>> url = f"https://raw.githubusercontent.com/mapping-commons/disease-mappings/{commit}/mappings/doid.sssom.tsv"
>>> df = pd.read_csv(url, sep="\t", comment='#')
>>> converter = curies.get_bioregistry_converter()
>>> converter.pd_standardize_curie(df, column="object_id")

pd_standardize_prefix(df: pandas.DataFrame, *, column: str | int, target_column: None | str | int = None, strict: bool = False, passthrough: bool = False) → None

Standardize all prefixes in the given column.

Parameters:
  • df – A pandas DataFrame

  • column – The column in the dataframe containing prefixes to standardize.

  • target_column – The column to put the results in. Defaults to input column.

  • strict – If true and any prefix can’t be standardized, raises an error. Defaults to false.

  • passthrough – If true, strict is false, and any prefix can’t be standardized, the input is returned unchanged. Defaults to false.
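Prefix standardization can be modeled as a lookup table in which each canonical prefix and each of its synonyms point to the canonical prefix. Standardizing a column then costs one lookup per cell. Below is a minimal sketch under that assumption. `RecordSketch` and `build_prefix_index` are hypothetical names that only mimic the prefix and prefix_synonyms fields that Record accepts in this section's examples. Raising on a synonym claimed by two records is this sketch's own choice.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RecordSketch:
    """Hypothetical, trimmed-down stand-in for curies.Record."""

    prefix: str
    prefix_synonyms: List[str] = field(default_factory=list)


def build_prefix_index(records: List[RecordSketch]) -> Dict[str, str]:
    """Map every canonical prefix and synonym to its canonical prefix."""
    index: Dict[str, str] = {}
    for record in records:
        for key in [record.prefix, *record.prefix_synonyms]:
            # Refuse a key that another record already claimed
            if index.get(key, record.prefix) != record.prefix:
                raise ValueError(f"{key!r} is claimed by two records")
            index[key] = record.prefix
    return index


index = build_prefix_index([RecordSketch("CHEBI", ["chebi"]), RecordSketch("GO", ["go"])])
column = ["chebi", "CHEBI", "go", "NOPE"]
# One dict lookup per cell; unknown prefixes become None (no passthrough)
print([index.get(prefix) for prefix in column])  # ['CHEBI', 'CHEBI', 'GO', None]
```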

pd_standardize_uri(df: pandas.DataFrame, *, column: str | int, target_column: None | str | int = None, strict: bool = False, passthrough: bool = False) → None

Standardize all URIs in the given column.

Parameters:
  • df – A pandas DataFrame

  • column – The column in the dataframe containing URIs to standardize.

  • target_column – The column to put the results in. Defaults to input column.

  • strict – If true and any URI can’t be standardized, raises an error. Defaults to false.

  • passthrough – If true, strict is false, and any URI can’t be standardized, the input is returned unchanged. Defaults to false.

standardize_curie(curie: str, *, strict: Literal[True] = True, passthrough: bool = False) → str
standardize_curie(curie: str, *, strict: Literal[False] = False, passthrough: Literal[True] = True) → str
standardize_curie(curie: str, *, strict: Literal[False] = False, passthrough: Literal[False] = False) → str | None

Standardize a CURIE.

Parameters:
  • curie – A string representing a compact URI (CURIE)

  • strict – If true and the CURIE can’t be standardized, raises an error. Defaults to false.

  • passthrough – If true, strict is false, and the CURIE can’t be standardized, the input is returned unchanged. Defaults to false.

Returns:

A standardized version of the CURIE, in which a prefix synonym (if one was used) is replaced by the canonical prefix. This function is idempotent: an already-standard CURIE is returned as is. If the CURIE can’t be parsed with respect to the records in the converter, None is returned (or the input, if passthrough is true).

Raises:

CURIEStandardizationError – If strict is true and the CURIE can’t be standardized

>>> from curies import Converter, Record
>>> converter = Converter.from_extended_prefix_map([
...     Record(prefix="CHEBI", prefix_synonyms=["chebi"], uri_prefix="http://purl.obolibrary.org/obo/CHEBI_"),
... ])
>>> converter.standardize_curie("chebi:138488")
'CHEBI:138488'
>>> converter.standardize_curie("CHEBI:138488")
'CHEBI:138488'
>>> converter.standardize_curie("NOPE:NOPE") is None
True
>>> converter.standardize_curie("NOPE:NOPE", passthrough=True)
'NOPE:NOPE'
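The documented behavior follows a fixed decision order. A recognized prefix (canonical or synonym) is rewritten to the canonical prefix. Otherwise, strict raises, passthrough returns the input, and the default returns None. Below is a minimal sketch of that order, assuming a plain synonym-to-canonical dict and a split at the first delimiter. `standardize_curie_sketch` and `StandardizationSketchError` are hypothetical names, not part of curies.

```python
from typing import Dict, Optional


class StandardizationSketchError(ValueError):
    """Hypothetical stand-in for curies' CURIEStandardizationError."""


def standardize_curie_sketch(
    synonyms: Dict[str, str],
    curie: str,
    *,
    strict: bool = False,
    passthrough: bool = False,
    delimiter: str = ":",
) -> Optional[str]:
    """Rewrite a CURIE's prefix to its canonical form, if the prefix is known."""
    prefix, sep, identifier = curie.partition(delimiter)
    canonical = synonyms.get(prefix) if sep else None
    if canonical is not None:
        return f"{canonical}{delimiter}{identifier}"
    if strict:
        raise StandardizationSketchError(curie)
    # Not strict: passthrough chooses between echoing the input and None
    return curie if passthrough else None


# Canonical prefixes map to themselves, which makes the function idempotent
synonyms = {"CHEBI": "CHEBI", "chebi": "CHEBI"}
print(standardize_curie_sketch(synonyms, "chebi:138488"))  # CHEBI:138488
print(standardize_curie_sketch(synonyms, "NOPE:NOPE"))  # None
print(standardize_curie_sketch(synonyms, "NOPE:NOPE", passthrough=True))  # NOPE:NOPE
```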

standardize_prefix(prefix: str, *, strict: Literal[True] = True, passthrough: bool = False) → str
standardize_prefix(prefix: str, *, strict: Literal[False] = False, passthrough: Literal[True] = True) → str
standardize_prefix(prefix: str, *, strict: Literal[False] = False, passthrough: Literal[False] = False) → str | None

Standardize a prefix.

Parameters:
  • prefix – The prefix of the CURIE

  • strict – If true and the prefix can’t be standardized, raises an error. Defaults to false.

  • passthrough – If true, strict is false, and the prefix can’t be standardized, the input is returned unchanged. Defaults to false.

Returns:

The standardized version of this prefix with respect to this converter. If the prefix is not registered in this converter, None is returned (or the input, if passthrough is true).

Raises:

PrefixStandardizationError – If strict is true and the prefix can’t be standardized

>>> from curies import Converter, Record
>>> converter = Converter.from_extended_prefix_map([
...     Record(prefix="CHEBI", prefix_synonyms=["chebi"], uri_prefix="..."),
... ])
>>> converter.standardize_prefix("chebi")
'CHEBI'
>>> converter.standardize_prefix("CHEBI")
'CHEBI'
>>> converter.standardize_prefix("NOPE") is None
True
>>> converter.standardize_prefix("NOPE", passthrough=True)
'NOPE'
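The three overloads let a static type checker such as mypy narrow the return type from the literal arguments at each call site. It infers str when strict=True or passthrough=True, and str | None otherwise. Below is a minimal sketch of the same typing.overload plus Literal pattern on a hypothetical `lookup` function. Only the typing technique mirrors the signatures above; the body is illustrative.

```python
from typing import Dict, Literal, Optional, overload


@overload
def lookup(
    table: Dict[str, str], key: str, *, strict: Literal[True], passthrough: bool = ...
) -> str: ...
@overload
def lookup(
    table: Dict[str, str], key: str, *, strict: Literal[False] = ..., passthrough: Literal[True]
) -> str: ...
@overload
def lookup(
    table: Dict[str, str], key: str, *, strict: Literal[False] = ..., passthrough: Literal[False] = ...
) -> Optional[str]: ...
def lookup(
    table: Dict[str, str], key: str, *, strict: bool = False, passthrough: bool = False
) -> Optional[str]:
    """Return table[key]; on a miss, raise (strict), echo the key (passthrough), or return None."""
    value = table.get(key)
    if value is not None:
        return value
    if strict:
        raise KeyError(key)
    return key if passthrough else None


prefixes = {"chebi": "CHEBI", "CHEBI": "CHEBI"}
canonical: str = lookup(prefixes, "chebi", strict=True)  # first overload: str
echoed: str = lookup(prefixes, "NOPE", passthrough=True)  # second overload: str
missing: Optional[str] = lookup(prefixes, "NOPE")  # third overload: Optional[str]
```

At runtime only the final, undecorated definition executes. The @overload stubs exist purely for the type checker.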

standardize_uri(uri: str, *, strict: Literal[True] = True, passthrough: bool = False) → str
standardize_uri(uri: str, *, strict: Literal[False] = False, passthrough: Literal[True] = True) → str
standardize_uri(uri: str, *, strict: Literal[False] = False, passthrough: Literal[False] = False) → str | None

Standardize a URI.

Parameters:
  • uri – A string representing a valid uniform resource identifier (URI)

  • strict – If true and the URI can’t be standardized, raises an error. Defaults to false.

  • passthrough – If true, strict is false, and the URI can’t be standardized, the input is returned unchanged. Defaults to false.

Returns:

A standardized version of the URI, in which a URI prefix synonym (if one was used) is replaced by the canonical URI prefix. This function is idempotent: an already-standard URI is returned as is. If the URI can’t be parsed with respect to the records in the converter, None is returned (or the input, if passthrough is true).

Raises:

URIStandardizationError – If strict is true and the URI can’t be standardized

>>> from curies import Converter, Record
>>> converter = Converter.from_extended_prefix_map([
...     Record(
...         prefix="CHEBI",
...         uri_prefix="http://purl.obolibrary.org/obo/CHEBI_",
...         uri_prefix_synonyms=[
...             "https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:",
...         ],
...     ),
... ])
>>> converter.standardize_uri("https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:138488")
'http://purl.obolibrary.org/obo/CHEBI_138488'
>>> converter.standardize_uri("http://purl.obolibrary.org/obo/CHEBI_138488")
'http://purl.obolibrary.org/obo/CHEBI_138488'
>>> converter.standardize_uri("http://example.org/NOPE") is None
True
>>> converter.standardize_uri("http://example.org/NOPE", passthrough=True)
'http://example.org/NOPE'
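Standardizing a URI can be modeled as a prefix swap. Whichever known URI prefix matches (the canonical one or a synonym, longest first) is replaced by the record's canonical uri_prefix, and the local identifier is kept. Below is a minimal sketch under that assumption, using the CHEBI record from the example above. `build_uri_index` and `standardize_uri_sketch` are hypothetical helpers. Records are simplified to (uri_prefix, uri_prefix_synonyms) tuples, and strict/passthrough handling is left out.

```python
from typing import Dict, List, Optional, Tuple


def build_uri_index(records: List[Tuple[str, List[str]]]) -> Dict[str, str]:
    """Map each canonical URI prefix and each URI prefix synonym to the canonical one."""
    index: Dict[str, str] = {}
    for canonical, synonyms in records:
        for uri_prefix in [canonical, *synonyms]:
            index[uri_prefix] = canonical
    return index


def standardize_uri_sketch(index: Dict[str, str], uri: str) -> Optional[str]:
    """Replace whichever known URI prefix the URI starts with by its canonical form."""
    matches = [uri_prefix for uri_prefix in index if uri.startswith(uri_prefix)]
    if not matches:
        return None  # no record covers this URI
    best = max(matches, key=len)  # the most specific URI prefix wins
    return index[best] + uri[len(best):]


index = build_uri_index([
    (
        "http://purl.obolibrary.org/obo/CHEBI_",
        ["https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:"],
    ),
])
print(standardize_uri_sketch(index, "https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:138488"))
# http://purl.obolibrary.org/obo/CHEBI_138488
```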