Converter

class Converter(records: Iterable[Record] | None = None, *, delimiter: str = ':', strict: bool = True)[source]

Bases: object

A cached prefix map data structure.

# Construct a prefix map:
>>> converter = Converter.from_prefix_map(
...     {
...         "CHEBI": "http://purl.obolibrary.org/obo/CHEBI_",
...         "MONDO": "http://purl.obolibrary.org/obo/MONDO_",
...         "GO": "http://purl.obolibrary.org/obo/GO_",
...         "OBO": "http://purl.obolibrary.org/obo/",
...     }
... )

# Compression and Expansion:
>>> converter.compress("http://purl.obolibrary.org/obo/CHEBI_1")
'CHEBI:1'
>>> converter.expand("CHEBI:1")
'http://purl.obolibrary.org/obo/CHEBI_1'

# Example with unparsable URI:
>>> converter.compress("http://example.com/missing:0000000")

# Example with missing prefix:
>>> converter.expand("missing:0000000")

Instantiate a converter.

Parameters:

records – A list of records. If you plan to build a converter incrementally, pass an empty list or leave this blank.
strict – If true, raises an error on duplicate URI prefixes. Defaults to true.
delimiter – The delimiter used for CURIEs. Defaults to a colon.

Raises:

DuplicatePrefixes – if any records share any synonyms
DuplicateURIPrefixes – if any records share any URI prefixes

Attributes Summary

`bimap`	Get the bijective mapping between CURIE prefixes and URI prefixes.
`prefix_map`	Get the non-URI-prefix-unique prefix map.
`reverse_bimap`	Get the bijective mapping between URI prefixes and CURIE prefixes.
`reverse_prefix_map`	Get the reverse prefix map, in which multiple URI prefixes can point to the same prefix.

Methods Summary

`add_prefix`(prefix, uri_prefix[, ...])	Append a prefix to the converter.
`add_prefix_synonym`(prefix, prefix_synonym, *)	Add a prefix synonym to the record with the given prefix.
`add_record`(record, *[, case_sensitive, merge])	Append a record to the converter.
`add_uri_prefix_synonym`(prefix, ...[, ...])	Add a URI synonym to the record with the given prefix.
`bind_rdflib`(graph_or_manager[, synonyms])	Add the prefix map from this converter to an RDFLib graph or manager.
`compress`(uri, *[, strict, passthrough])	Compress a URI to a CURIE, if possible.
`compress_or_standardize`(uri_or_curie, *[, ...])	Compress a URI or standardize a CURIE.
`compress_strict`(uri)	Compress a URI to a CURIE, and raise an error if not possible.
`expand`(curie, *[, strict, passthrough])	Expand a CURIE to a URI, if possible.
`expand_all`(curie, *[, strict])	Expand a CURIE to all possible URIs.
`expand_or_standardize`(curie_or_uri, *[, ...])	Expand a CURIE or standardize a URI.
`expand_pair`(prefix, identifier, *[, ...])	Expand a CURIE pair to the standard URI.
`expand_pair_all`(prefix, identifier, *[, strict])	Expand a CURIE pair to all possible URIs.
`expand_reference`(reference, *[, strict, ...])	Expand a reference.
`expand_strict`(curie)	Expand a CURIE to a URI, and raise an error if not possible.
`file_compress`(path, column, *[, sep, ...])	Convert all URIs in the given column of a CSV file to CURIEs.
`file_expand`(path, column, *[, sep, header, ...])	Convert all CURIEs in the given column of a CSV file to URIs.
`format_curie`(prefix, identifier)	Format a prefix and identifier into a CURIE string.
`from_extended_prefix_map`(records, **kwargs)	Get a converter from a list of dictionaries by creating records out of them.
`from_jsonld`(data, **kwargs)	Get a converter from a JSON-LD object, which contains a prefix map in its `@context` key.
`from_jsonld_github`(owner, repo, *path[, branch])	Construct a remote JSON-LD URL on GitHub then parse with `Converter.from_jsonld()`.
`from_prefix_map`(prefix_map, **kwargs)	Get a converter from a simple prefix map.
`from_priority_prefix_map`(data, **kwargs)	Get a converter from a priority prefix map.
`from_rdflib`(graph_or_manager, **kwargs)	Get a converter from an RDFLib graph or namespace manager.
`from_reverse_prefix_map`(reverse_prefix_map, ...)	Get a converter from a reverse prefix map.
`from_shacl`(graph[, format])	Get a converter from SHACL, given as an RDFLib graph, a file path, or a URL.
`get_prefixes`(*[, include_synonyms])	Get the set of prefixes covered by this converter.
`get_record`(prefix, *[, strict])	Get the record for the prefix.
`get_subconverter`(prefixes)	Get a converter with a subset of prefixes.
`get_uri_prefixes`(*[, include_synonyms])	Get the set of URI prefixes covered by this converter.
`has_prefix`(prefix)	Check if the converter has the prefix (either as a primary or secondary).
`hash_triple`(triple, *[, negate])	Hash a triple using `curies.triples.hash_triple()`, implementing https://ts4nfdi.github.io/mapping-sameness-identifier.
`is_curie`(s)	Check if the string can be parsed as a CURIE by this converter.
`is_uri`(s)	Check if the string can be parsed as a URI by this converter.
`parse`(-> ~curies.api.ReferenceTuple)	Parse a string, URI, or CURIE.
`parse_curie`(-> ~curies.api.ReferenceTuple \| None)	Parse and standardize a CURIE.
`parse_uri`(-> ~curies.api.ReferenceTuple \| None)	Compress a URI to a CURIE pair.
`pd_compress`(df, column[, target_column, ...])	Convert all URIs in the given column to CURIEs.
`pd_expand`(df, column[, target_column, ...])	Convert all CURIEs in the given column to URIs.
`pd_standardize_curie`(df, *, column[, ...])	Standardize all CURIEs in the given column.
`pd_standardize_prefix`(df, *, column[, ...])	Standardize all prefixes in the given column.
`pd_standardize_uri`(df, *, column[, ...])	Standardize all URIs in the given column.
`standardize_curie`(-> str -> str)	Standardize a CURIE.
`standardize_identifier`(-> str)	Standardize an identifier.
`standardize_prefix`(-> str -> str)	Standardize a prefix.
`standardize_reference`(-> ~curies.api.Reference)	Standardizes a reference.
`standardize_uri`(-> str -> str)	Standardize a URI.

Attributes Documentation

bimap: Get the bijective mapping between CURIE prefixes and URI prefixes.

prefix_map: Get the non-URI-prefix-unique prefix map..

reverse_bimap: Get the bijective mapping between URI CURIE prefixes and CURIE prefixes.

reverse_prefix_map: Get the non-URI-prefix-unique prefix map.

Methods Documentation

add_prefix(prefix: str, uri_prefix: str, prefix_synonyms: Collection[str] | None = None, uri_prefix_synonyms: Collection[str] | None = None, *, pattern: str | None = None, case_sensitive: bool = True, merge: bool = False) → None[source]

Append a prefix to the converter.

Parameters:

prefix – The prefix to append, e.g., go
uri_prefix – The URI prefix to append, e.g., http://purl.obolibrary.org/obo/GO_
prefix_synonyms – An optional collection of synonyms for the prefix such as gomf, gocc, etc.
uri_prefix_synonyms – An optional collection of synonyms for the URI prefix such as https://bioregistry.io/go:, http://www.informatics.jax.org/searches/GO.cgi?id=GO:, etc.
pattern – An optional pattern
case_sensitive – Should prefixes and URI prefixes be compared in a case-sensitive manner when checking for uniqueness? Defaults to True.
merge – Should this record be merged into an existing record if it uniquely maps to a single existing record? When false, will raise an error if one or more existing records can be mapped. Defaults to false.

This can be used to add missing namespaces on-the-fly to an existing converter:

>>> import curies
>>> converter = curies.get_obo_converter()
>>> converter.add_prefix("hgnc", "https://bioregistry.io/hgnc:")
>>> converter.expand("hgnc:1234")
'https://bioregistry.io/hgnc:1234'
>>> converter.expand("GO:0032571")
'http://purl.obolibrary.org/obo/GO_0032571'

This can also be used to incrementally build up a converter from scratch:

>>> import curies
>>> converter = curies.Converter(records=[])
>>> converter.add_prefix("hgnc", "https://bioregistry.io/hgnc:")
>>> converter.expand("hgnc:1234")
'https://bioregistry.io/hgnc:1234'

add_prefix_synonym(prefix: str, prefix_synonym: str, *, case_sensitive: bool = True) → None[source]: Add a prefix synonym to the record with the given prefix.

add_record(record: Record, *, case_sensitive: bool = True, merge: bool = False) → None[source]: Append a record to the converter.

add_uri_prefix_synonym(prefix: str, uri_prefix_synonym: str, *, case_sensitive: bool = True) → None[source]: Add a URI synonym to the record with the given prefix.

bind_rdflib(graph_or_manager: rdflib.Graph | rdflib.namespace.NamespaceManager, synonyms: bool = False) → None[source]

Add the prefix map from this converter to an RDFLib graph or manager.

Parameters:

graph_or_manager – An RDFLib graph or manager object
synonyms – Should CURIE prefix synonyms be bound?

Binding a graph:

>>> import curies, rdflib
>>> converter = curies.get_obo_converter()
>>> graph = rdflib.Graph()
>>> converter.bind_rdflib(graph)

Binding a manager:

>>> import curies, rdflib, rdflib.namespace
>>> converter = curies.get_obo_converter()
>>> manager = rdflib.namespace.NamespaceManager(graph=rdflib.Graph())
>>> converter.bind_rdflib(manager)

compress(uri: str, *, strict: Literal[True] = True, passthrough: bool = False) → str[source]

compress(uri: str, *, strict: Literal[False] = False, passthrough: Literal[True] = True) → str

compress(uri: str, *, strict: Literal[False] = False, passthrough: Literal[False] = False) → str | None

Compress a URI to a CURIE, if possible.

Parameters:

uri – A string representing a valid uniform resource identifier (URI)
strict – If true and the URI can’t be compressed, raises an error. Defaults to false.
passthrough – If true, strict is false, and the URI can’t be compressed, return the input. Defaults to false.

Returns:

A compact URI (CURIE) if this converter could find an appropriate URI prefix, otherwise None.

Raises:

CompressionError – If strict is set to true and the URI can’t be compressed

>>> from curies import Converter
>>> converter = Converter.from_prefix_map(
...     {
...         "CHEBI": "http://purl.obolibrary.org/obo/CHEBI_",
...         "MONDO": "http://purl.obolibrary.org/obo/MONDO_",
...         "GO": "http://purl.obolibrary.org/obo/GO_",
...         "OBO": "http://purl.obolibrary.org/obo/",
...     }
... )
>>> converter.compress("http://purl.obolibrary.org/obo/GO_0032571")
'GO:0032571'
>>> converter.compress("http://purl.obolibrary.org/obo/go.owl")
'OBO:go.owl'
>>> converter.compress("http://example.org/missing:0000000")

Note

If there are partially overlapping URI prefixes in this converter (e.g., http://purl.obolibrary.org/obo/GO_ for the prefix GO and http://purl.obolibrary.org/obo/ for the prefix OBO), the longest URI prefix will always be matched. For example, parsing http://purl.obolibrary.org/obo/GO_0032571 will return GO:0032571 instead of OBO:GO_0032571.
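
The longest-match rule in the note can be illustrated without the library. The following is a minimal sketch in plain Python, not the converter's actual implementation (which may rely on a different data structure); the function name `compress_longest` and the linear scan over all prefixes are assumptions made for illustration.

```python
from __future__ import annotations

# Same prefix map as in the example above.
PREFIX_MAP = {
    "CHEBI": "http://purl.obolibrary.org/obo/CHEBI_",
    "MONDO": "http://purl.obolibrary.org/obo/MONDO_",
    "GO": "http://purl.obolibrary.org/obo/GO_",
    "OBO": "http://purl.obolibrary.org/obo/",
}


def compress_longest(uri: str, prefix_map: dict[str, str], delimiter: str = ":") -> str | None:
    """Return a CURIE built from the longest matching URI prefix, or None."""
    matches = [
        (uri_prefix, prefix)
        for prefix, uri_prefix in prefix_map.items()
        if uri.startswith(uri_prefix)
    ]
    if not matches:
        return None
    # The longest URI prefix is the most specific one, so it wins.
    uri_prefix, prefix = max(matches, key=lambda pair: len(pair[0]))
    return f"{prefix}{delimiter}{uri[len(uri_prefix):]}"


print(compress_longest("http://purl.obolibrary.org/obo/GO_0032571", PREFIX_MAP))
print(compress_longest("http://purl.obolibrary.org/obo/go.owl", PREFIX_MAP))
print(compress_longest("http://example.org/missing:0000000", PREFIX_MAP))
```

Both the GO and OBO URI prefixes match the first URI, and the GO one is selected because it is longer. Only the OBO prefix matches the second URI.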

compress_or_standardize(uri_or_curie: str, *, strict: Literal[True] = True, passthrough: bool = False) → str[source]

compress_or_standardize(uri_or_curie: str, *, strict: Literal[False] = False, passthrough: Literal[True] = True) → str

compress_or_standardize(uri_or_curie: str, *, strict: Literal[False] = False, passthrough: Literal[False] = False) → str | None

Compress a URI or standardize a CURIE.

Parameters:

uri_or_curie – A string representing a compact URI (CURIE) or a URI.
strict – If true and the string is neither a URI that can be compressed nor a CURIE that can be standardized, raises an error. Defaults to false.
passthrough – If true, strict is false, and the string is neither a URI that can be compressed nor a CURIE that can be standardized, return the input. Defaults to false.

Returns:

If the string is a URI, and it can be compressed, returns the corresponding CURIE. If the string is a CURIE, and it can be standardized, returns the standard CURIE.

Raises:

CompressionError – If strict is true and the URI can’t be compressed

>>> from curies import Converter, Record
>>> converter = Converter.from_extended_prefix_map(
...     [
...         Record(
...             prefix="CHEBI",
...             prefix_synonyms=["chebi"],
...             uri_prefix="http://purl.obolibrary.org/obo/CHEBI_",
...             uri_prefix_synonyms=["https://identifiers.org/chebi:"],
...         ),
...     ]
... )
>>> converter.compress_or_standardize("http://purl.obolibrary.org/obo/CHEBI_138488")
'CHEBI:138488'
>>> converter.compress_or_standardize("https://identifiers.org/chebi:138488")
'CHEBI:138488'

>>> converter.compress_or_standardize("CHEBI:138488")
'CHEBI:138488'
>>> converter.compress_or_standardize("chebi:138488")
'CHEBI:138488'

>>> converter.compress_or_standardize("missing:0000000")
>>> converter.compress_or_standardize("https://example.com/missing:0000000")
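
The behavior shown in the example can be approximated in plain Python. This is a hedged sketch rather than the library's code: the function name `compress_or_standardize_sketch`, the single-record dictionary, and the order of the two checks (URI first, then CURIE) are assumptions.

```python
from __future__ import annotations

# One record, mirroring the example above.
RECORD = {
    "prefix": "CHEBI",
    "prefix_synonyms": ["chebi"],
    "uri_prefix": "http://purl.obolibrary.org/obo/CHEBI_",
    "uri_prefix_synonyms": ["https://identifiers.org/chebi:"],
}


def compress_or_standardize_sketch(s: str, record: dict) -> str | None:
    """Treat the input as a URI first, then as a CURIE; return None if neither applies."""
    # URI case: strip the primary URI prefix or one of its synonyms.
    for uri_prefix in [record["uri_prefix"], *record["uri_prefix_synonyms"]]:
        if s.startswith(uri_prefix):
            return f"{record['prefix']}:{s[len(uri_prefix):]}"
    # CURIE case: swap a prefix synonym for the primary prefix.
    prefix, sep, identifier = s.partition(":")
    if sep and prefix in [record["prefix"], *record["prefix_synonyms"]]:
        return f"{record['prefix']}:{identifier}"
    return None


for value in [
    "https://identifiers.org/chebi:138488",
    "chebi:138488",
    "missing:0000000",
]:
    print(value, "->", compress_or_standardize_sketch(value, RECORD))
```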

compress_strict(uri: str) → str[source]: Compress a URI to a CURIE, and raise an error if not possible.

expand(curie: str, *, strict: Literal[True] = True, passthrough: bool = False) → str[source]

expand(curie: str, *, strict: Literal[False] = False, passthrough: Literal[True] = True) → str

expand(curie: str, *, strict: Literal[False] = False, passthrough: Literal[False] = False) → str | None

Expand a CURIE to a URI, if possible.

Parameters:

curie – A string representing a compact URI (CURIE)
strict – If true and the CURIE can’t be expanded, raises an error. Defaults to false.
passthrough – If true, strict is false, and the CURIE can’t be expanded, return the input. Defaults to false. If your strings can either be a CURIE or a URI, consider using Converter.expand_or_standardize() instead.

Returns:

A URI if this converter contains a URI prefix for the prefix in this CURIE

Raises:

ExpansionError – If strict is true and the CURIE can’t be expanded

>>> from curies import Converter
>>> converter = Converter.from_prefix_map(
...     {
...         "CHEBI": "http://purl.obolibrary.org/obo/CHEBI_",
...         "MONDO": "http://purl.obolibrary.org/obo/MONDO_",
...         "GO": "http://purl.obolibrary.org/obo/GO_",
...     }
... )
>>> converter.expand("CHEBI:138488")
'http://purl.obolibrary.org/obo/CHEBI_138488'
>>> converter.expand("missing:0000000")
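
The interplay of strict and passthrough described above can be summarized as a three-way fallback. The sketch below is plain Python written for illustration only; `expand_sketch` and `ExpansionErrorSketch` are hypothetical stand-ins, not the library's function or exception.

```python
from __future__ import annotations


class ExpansionErrorSketch(ValueError):
    """Hypothetical stand-in for the library's ExpansionError."""


def expand_sketch(
    curie: str,
    prefix_map: dict[str, str],
    *,
    strict: bool = False,
    passthrough: bool = False,
) -> str | None:
    """Expand a CURIE, falling back according to strict and passthrough."""
    prefix, sep, identifier = curie.partition(":")
    uri_prefix = prefix_map.get(prefix) if sep else None
    if uri_prefix is not None:
        return uri_prefix + identifier
    if strict:
        # strict takes precedence over passthrough
        raise ExpansionErrorSketch(curie)
    if passthrough:
        return curie
    return None


prefix_map = {"CHEBI": "http://purl.obolibrary.org/obo/CHEBI_"}
print(expand_sketch("CHEBI:138488", prefix_map))
print(expand_sketch("missing:0000000", prefix_map))
print(expand_sketch("missing:0000000", prefix_map, passthrough=True))
```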

expand_all(curie: str, *, strict: Literal[False] = False) → Collection[str] | None[source]

expand_all(curie: str, *, strict: Literal[True] = True) → Collection[str]

Expand a CURIE to all possible URIs.

Parameters:

curie – A string representing a compact URI
strict – If true and the prefix can’t be expanded, raises an error. Defaults to false.

Returns:

A list of URIs that this converter can create for the given CURIE. The first entry is the “standard” URI; the others are based on URI prefix synonyms. If the prefix is not registered to this converter, None is returned.

Raises:

PrefixStandardizationError – if the prefix in the CURIE cannot be looked up

>>> from curies import Converter
>>> priority_prefix_map = {
...     "CHEBI": [
...         "http://purl.obolibrary.org/obo/CHEBI_",
...         "https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:",
...     ],
... }
>>> converter = Converter.from_priority_prefix_map(priority_prefix_map)
>>> converter.expand_all("CHEBI:138488")
['http://purl.obolibrary.org/obo/CHEBI_138488', 'https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:138488']
>>> converter.expand_all("NOPE:NOPE") is None
True
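
The ordering of the returned list (standard URI first, then synonyms) follows directly from the priority prefix map. A minimal plain-Python sketch, assuming the hypothetical helper name `expand_all_sketch` and a dictionary in place of a converter:

```python
from __future__ import annotations


def expand_all_sketch(curie: str, priority_prefix_map: dict[str, list[str]]) -> list[str] | None:
    """Return one URI per registered URI prefix, keeping the priority order."""
    prefix, sep, identifier = curie.partition(":")
    uri_prefixes = priority_prefix_map.get(prefix) if sep else None
    if uri_prefixes is None:
        return None
    return [uri_prefix + identifier for uri_prefix in uri_prefixes]


priority_prefix_map = {
    "CHEBI": [
        "http://purl.obolibrary.org/obo/CHEBI_",
        "https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:",
    ],
}
print(expand_all_sketch("CHEBI:138488", priority_prefix_map))
print(expand_all_sketch("NOPE:NOPE", priority_prefix_map))
```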

expand_or_standardize(curie_or_uri: str, *, strict: Literal[True] = True, passthrough: bool = False) → str[source]

expand_or_standardize(curie_or_uri: str, *, strict: Literal[False] = False, passthrough: Literal[True] = True) → str

expand_or_standardize(curie_or_uri: str, *, strict: Literal[False] = False, passthrough: Literal[False] = False) → str | None

Expand a CURIE or standardize a URI.

Parameters:

curie_or_uri – A string representing a compact URI (CURIE) or a URI.
strict – If true and the string is neither a CURIE that can be expanded nor a URI that can be standardized, raises an error. Defaults to false.
passthrough – If true, strict is false, and the string is neither a CURIE that can be expanded nor a URI that can be standardized, return the input. Defaults to false.

Returns:

If the string is a CURIE, and it can be expanded, returns the corresponding URI. If the string is a URI, and it can be standardized, returns the standard URI.

Raises:

ExpansionError – If strict is true and the CURIE can’t be expanded

>>> from curies import Converter, Record
>>> converter = Converter.from_extended_prefix_map(
...     [
...         Record(
...             prefix="CHEBI",
...             prefix_synonyms=["chebi"],
...             uri_prefix="http://purl.obolibrary.org/obo/CHEBI_",
...             uri_prefix_synonyms=["https://identifiers.org/chebi:"],
...         ),
...     ]
... )
>>> converter.expand_or_standardize("CHEBI:138488")
'http://purl.obolibrary.org/obo/CHEBI_138488'
>>> converter.expand_or_standardize("chebi:138488")
'http://purl.obolibrary.org/obo/CHEBI_138488'

>>> converter.expand_or_standardize("http://purl.obolibrary.org/obo/CHEBI_138488")
'http://purl.obolibrary.org/obo/CHEBI_138488'
>>> converter.expand_or_standardize("https://identifiers.org/chebi:138488")
'http://purl.obolibrary.org/obo/CHEBI_138488'

>>> converter.expand_or_standardize("missing:0000000")
>>> converter.expand_or_standardize("https://example.com/missing:0000000")

expand_pair(prefix: str, identifier: str, *, strict: Literal[True] = True, passthrough: bool = False) → str[source]

expand_pair(prefix: str, identifier: str, *, strict: Literal[False] = False, passthrough: bool = False) → str | None

Expand a CURIE pair to the standard URI.

Parameters:

prefix – The prefix of the CURIE
identifier – The local unique identifier of the CURIE
strict – If true and the prefix can’t be expanded, raises an error. Defaults to false.
passthrough – If true, strict is false, and the prefix can’t be expanded, return the input. Defaults to false.

Returns:

A URI if this converter contains a URI prefix for the prefix in this CURIE

>>> from curies import Converter
>>> converter = Converter.from_prefix_map(
...     {
...         "CHEBI": "http://purl.obolibrary.org/obo/CHEBI_",
...         "MONDO": "http://purl.obolibrary.org/obo/MONDO_",
...         "GO": "http://purl.obolibrary.org/obo/GO_",
...     }
... )
>>> converter.expand_pair("CHEBI", "138488")
'http://purl.obolibrary.org/obo/CHEBI_138488'
>>> converter.expand_pair("missing", "0000000")

expand_pair_all(prefix: str, identifier: str, *, strict: Literal[True] = True) → Collection[str][source]

expand_pair_all(prefix: str, identifier: str, *, strict: Literal[False] = False) → Collection[str] | None

Expand a CURIE pair to all possible URIs.

Parameters:

prefix – The prefix of the CURIE
identifier – The local unique identifier of the CURIE
strict – If true and the prefix can’t be expanded, raises an error. Defaults to false.

Returns:

A list of URIs that this converter can create for the given CURIE. The first entry is the “standard” URI; the others are based on URI prefix synonyms. If the prefix is not registered to this converter, None is returned.

Raises:

ExpansionError – if the prefix in the CURIE cannot be looked up

>>> from curies import Converter
>>> priority_prefix_map = {
...     "CHEBI": [
...         "http://purl.obolibrary.org/obo/CHEBI_",
...         "https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:",
...     ],
... }
>>> converter = Converter.from_priority_prefix_map(priority_prefix_map)
>>> converter.expand_pair_all("CHEBI", "138488")
['http://purl.obolibrary.org/obo/CHEBI_138488', 'https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:138488']
>>> converter.expand_pair_all("NOPE", "NOPE") is None
True

expand_reference(reference: ReferenceTuple | Reference, *, strict: Literal[True] = True, passthrough: bool = False) → str[source]
expand_reference(reference: ReferenceTuple | Reference, *, strict: Literal[False] = False, passthrough: bool = False) → str | None: Expand a reference.

expand_strict(curie: str) → str[source]: Expand a CURIE to a URI, and raise an error if not possible.

file_compress(path: str | Path, column: int, *, sep: str | None = None, header: bool = True, strict: bool = False, passthrough: bool = False, ambiguous: bool = False) → None[source]

Convert all URIs in the given column of a CSV file to CURIEs.

Parameters:

path – The path to a CSV file
column – The index of the column in the file containing URIs to convert to CURIEs.
sep – The delimiter of the CSV file, defaults to tab
header – Does the file have a header row?
strict – If true and the URI can’t be compressed, raises an error. Defaults to false.
passthrough – If true, strict is false, and the URI can’t be compressed, return the input. Defaults to false.
ambiguous – If true, consider the column as containing either CURIEs or URIs.

file_expand(path: str | Path, column: int, *, sep: str | None = None, header: bool = True, strict: bool = False, passthrough: bool = False, ambiguous: bool = False) → None[source]

Convert all CURIEs in the given column of a CSV file to URIs.

Parameters:

path – The path to a CSV file
column – The index of the column in the file containing CURIEs to convert to URIs.
sep – The delimiter of the CSV file, defaults to tab
header – Does the file have a header row?
strict – If true and the CURIE can’t be expanded, raises an error. Defaults to false.
passthrough – If true, strict is false, and the CURIE can’t be expanded, return the input. Defaults to false.
ambiguous – If true, consider the column as containing either CURIEs or URIs.
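
To make the file-based workflow concrete, here is a rough sketch using only the standard library `csv` module. It is not the library's implementation: the name `file_expand_sketch`, the in-place rewrite of the file, and leaving unexpandable cells untouched are all assumptions.

```python
from __future__ import annotations

import csv
from pathlib import Path


def file_expand_sketch(
    path: str | Path,
    column: int,
    prefix_map: dict[str, str],
    *,
    sep: str = "\t",
    header: bool = True,
) -> None:
    """Rewrite the file so that CURIEs in the given column index become URIs."""
    path = Path(path)
    with path.open(newline="") as file:
        rows = list(csv.reader(file, delimiter=sep))
    start = 1 if header else 0  # skip the header row, if any
    for row in rows[start:]:
        if len(row) <= column:
            continue  # blank or short line
        prefix, found, identifier = row[column].partition(":")
        if found and prefix in prefix_map:
            row[column] = prefix_map[prefix] + identifier
        # Assumption: cells that cannot be expanded are left as they are.
    with path.open("w", newline="") as file:
        csv.writer(file, delimiter=sep).writerows(rows)
```

A call such as `file_expand_sketch("data.tsv", 0, {"CHEBI": "http://purl.obolibrary.org/obo/CHEBI_"})` would rewrite the first column of a hypothetical `data.tsv`.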

format_curie(prefix: str, identifier: str) → str[source]: Format a prefix and identifier into a CURIE string.

classmethod from_extended_prefix_map(records: str | Path | Iterable[Record | dict[str, Any]], **kwargs: Any) → Converter[source]

Get a converter from a list of dictionaries by creating records out of them.

Parameters:

records –
One of the following:
- An iterable of curies.Record objects or dictionaries that will get converted into record objects that together constitute an extended prefix map
- A string containing a remote location of a JSON file containing an extended prefix map
- A string or pathlib.Path object corresponding to a local file path to a JSON file containing an extended prefix map
kwargs – Keyword arguments to pass to curies.Converter.__init__()

Returns:

A converter

An extended prefix map is a list of dictionaries containing four keys:

- A prefix string
- A uri_prefix string
- An optional list of strings prefix_synonyms
- An optional list of strings uri_prefix_synonyms

Across the whole list of dictionaries, there should be uniqueness within the union of all prefix and prefix_synonyms as well as uniqueness within the union of all uri_prefix and uri_prefix_synonyms.
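
The uniqueness requirement can be checked with a few lines of plain Python. This is only a sketch of the constraint as stated here, not the validation the library performs; `find_duplicates` is a hypothetical helper name.

```python
from __future__ import annotations

from collections import Counter


def find_duplicates(records: list[dict]) -> tuple[set[str], set[str]]:
    """Return (duplicated prefixes, duplicated URI prefixes) across all records."""
    prefixes: Counter = Counter()
    uri_prefixes: Counter = Counter()
    for record in records:
        # Primary values and synonyms share one namespace each.
        prefixes.update([record["prefix"], *record.get("prefix_synonyms", [])])
        uri_prefixes.update([record["uri_prefix"], *record.get("uri_prefix_synonyms", [])])
    return (
        {prefix for prefix, count in prefixes.items() if count > 1},
        {uri_prefix for uri_prefix, count in uri_prefixes.items() if count > 1},
    )


bad = [
    {"prefix": "CHEBI", "prefix_synonyms": ["chebi"], "uri_prefix": "http://purl.obolibrary.org/obo/CHEBI_"},
    {"prefix": "chebi", "uri_prefix": "https://identifiers.org/chebi:"},
]
print(find_duplicates(bad))
```

Here `chebi` appears both as a synonym of the first record and as the primary prefix of the second, so it is reported as a duplicate prefix.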

>>> from curies import Converter
>>> epm = [
...     {
...         "prefix": "CHEBI",
...         "prefix_synonyms": ["chebi", "ChEBI"],
...         "uri_prefix": "http://purl.obolibrary.org/obo/CHEBI_",
...         "uri_prefix_synonyms": [
...             "https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:"
...         ],
...     },
...     {
...         "prefix": "GO",
...         "uri_prefix": "http://purl.obolibrary.org/obo/GO_",
...     },
... ]
>>> converter = Converter.from_extended_prefix_map(epm)

Expand using the preferred/canonical prefix:

>>> converter.expand("CHEBI:138488")
'http://purl.obolibrary.org/obo/CHEBI_138488'

Expand using a prefix synonym:

>>> converter.expand("chebi:138488")
'http://purl.obolibrary.org/obo/CHEBI_138488'

Compress using the preferred/canonical URI prefix:

>>> converter.compress("http://purl.obolibrary.org/obo/CHEBI_138488")
'CHEBI:138488'

Compressing using a URI prefix synonym:

>>> converter.compress("https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:138488")
'CHEBI:138488'

Example from a remote source:

>>> url = "https://github.com/biopragmatics/bioregistry/raw/main/exports/contexts/bioregistry.epm.json"
>>> converter = Converter.from_extended_prefix_map(url)

classmethod from_jsonld(data: str | Path | dict[str, Any], **kwargs: Any) → Converter[source]

Get a converter from a JSON-LD object, which contains a prefix map in its @context key.

Parameters:

data – A JSON-LD object
kwargs – Keyword arguments to pass to curies.Converter.__init__()

Returns:

A converter

Example from a remote context file:

>>> from curies import Converter
>>> base = "https://raw.githubusercontent.com"
>>> url = f"{base}/biopragmatics/bioregistry/main/exports/contexts/semweb.context.jsonld"
>>> converter = Converter.from_jsonld(url)
>>> "rdf" in converter.prefix_map
True

See also

https://www.w3.org/TR/json-ld11/#the-context defines the @context aspect of JSON-LD
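
As a rough illustration of how a prefix map sits inside a JSON-LD document, the sketch below pulls simple string-valued terms out of the `@context` key. It is a simplification under stated assumptions: `prefix_map_from_jsonld` is a hypothetical name, and keyword entries (keys starting with `@`) and expanded term definitions (dictionary values) are simply skipped, which may differ from what the library does.

```python
from __future__ import annotations


def prefix_map_from_jsonld(data: dict) -> dict[str, str]:
    """Extract a simple prefix map from the @context of a JSON-LD object."""
    context = data.get("@context", {})
    return {
        key: value
        for key, value in context.items()
        # Keep plain "prefix": "uri_prefix" pairs only.
        if not key.startswith("@") and isinstance(value, str)
    }


document = {
    "@context": {
        "@vocab": "http://example.org/vocab/",
        "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
        "name": {"@id": "http://example.org/name"},
    }
}
print(prefix_map_from_jsonld(document))
```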

classmethod from_jsonld_github(owner: str, repo: str, *path: str, branch: str = 'main', **kwargs: Any) → Converter[source]

Construct a remote JSON-LD URL on GitHub then parse with Converter.from_jsonld().

Parameters:

owner – A github repository owner or organization (e.g., biopragmatics)
repo – The name of the repository (e.g., bioregistry)
path – The file path in the GitHub repository to a JSON-LD context file.
branch – The branch from which the file should be downloaded. Defaults to main; for older repositories, this might need to be changed to master.
kwargs – Keyword arguments to pass to curies.Converter.__init__()

Returns:

A converter

Raises:

ValueError – If the given path doesn’t end in a .jsonld file name

>>> from curies import Converter
>>> converter = Converter.from_jsonld_github(
...     "biopragmatics",
...     "bioregistry",
...     "exports",
...     "contexts",
...     "semweb.context.jsonld",
... )
>>> "rdf" in converter.prefix_map
True
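
The URL construction can be pictured as follows. This sketch assumes the raw.githubusercontent.com layout seen in the from_jsonld() example (owner, repository, branch, then path); `jsonld_github_url` is a hypothetical helper, not part of the library.

```python
def jsonld_github_url(owner: str, repo: str, *path: str, branch: str = "main") -> str:
    """Build a raw GitHub URL for a JSON-LD file (layout is an assumption)."""
    if not path or not path[-1].endswith(".jsonld"):
        raise ValueError("the final path element should end with .jsonld")
    return "https://raw.githubusercontent.com/" + "/".join([owner, repo, branch, *path])


print(
    jsonld_github_url(
        "biopragmatics",
        "bioregistry",
        "exports",
        "contexts",
        "semweb.context.jsonld",
    )
)
```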

classmethod from_prefix_map(prefix_map: str | Path | Mapping[str, str], **kwargs: Any) → Converter[source]

Get a converter from a simple prefix map.

Parameters:

prefix_map –
One of the following:
- A mapping whose keys represent CURIE prefixes and values represent URI prefixes
- A string containing a remote location of a JSON file containing a prefix map
- A string or pathlib.Path object corresponding to a local file path to a JSON file containing a prefix map
kwargs – Keyword arguments to pass to curies.Converter.__init__()

Returns:

A converter

>>> from curies import Converter
>>> converter = Converter.from_prefix_map(
...     {
...         "CHEBI": "http://purl.obolibrary.org/obo/CHEBI_",
...         "MONDO": "http://purl.obolibrary.org/obo/MONDO_",
...         "GO": "http://purl.obolibrary.org/obo/GO_",
...         "OBO": "http://purl.obolibrary.org/obo/",
...     }
... )
>>> converter.expand("CHEBI:138488")
'http://purl.obolibrary.org/obo/CHEBI_138488'
>>> converter.compress("http://purl.obolibrary.org/obo/CHEBI_138488")
'CHEBI:138488'

classmethod from_priority_prefix_map(data: str | Path | Mapping[str, list[str]], **kwargs: Any) → Converter[source]

Get a converter from a priority prefix map.

Parameters:

data – A prefix map where the keys are prefixes (e.g., chebi) and the values are lists of URI prefixes (e.g., http://purl.obolibrary.org/obo/CHEBI_) with the first element of the list being the priority URI prefix for expansions.
kwargs – Keyword arguments to pass to the parent class’s init

Returns:

A converter

>>> from curies import Converter
>>> priority_prefix_map = {
...     "CHEBI": [
...         "http://purl.obolibrary.org/obo/CHEBI_",
...         "https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:",
...     ],
...     "GO": ["http://purl.obolibrary.org/obo/GO_"],
...     "obo": ["http://purl.obolibrary.org/obo/"],
... }
>>> converter = Converter.from_priority_prefix_map(priority_prefix_map)
>>> converter.expand("CHEBI:138488")
'http://purl.obolibrary.org/obo/CHEBI_138488'
>>> converter.compress("http://purl.obolibrary.org/obo/CHEBI_138488")
'CHEBI:138488'
>>> converter.compress("https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:138488")
'CHEBI:138488'
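
A priority prefix map carries the same information as an extended prefix map without prefix synonyms. The following plain-Python sketch shows one way to picture the correspondence; the function name `records_from_priority_prefix_map` and the use of plain dictionaries are assumptions.

```python
from __future__ import annotations


def records_from_priority_prefix_map(data: dict[str, list[str]]) -> list[dict]:
    """Turn {prefix: [uri_prefix, *synonyms]} into extended-prefix-map style records."""
    records = []
    # Each list must be non-empty: its first element is the priority URI prefix.
    for prefix, (uri_prefix, *uri_prefix_synonyms) in data.items():
        records.append(
            {
                "prefix": prefix,
                "uri_prefix": uri_prefix,
                "uri_prefix_synonyms": uri_prefix_synonyms,
            }
        )
    return records


priority_prefix_map = {
    "CHEBI": [
        "http://purl.obolibrary.org/obo/CHEBI_",
        "https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:",
    ],
    "GO": ["http://purl.obolibrary.org/obo/GO_"],
}
for record in records_from_priority_prefix_map(priority_prefix_map):
    print(record)
```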

classmethod from_rdflib(graph_or_manager: rdflib.Graph | rdflib.namespace.NamespaceManager, **kwargs: Any) → Converter[source]

Get a converter from an RDFLib graph or namespace manager.

Parameters:

graph_or_manager – An RDFLib graph or manager object
kwargs – Keyword arguments to pass to from_prefix_map()

Returns:

A converter

In the following example, an rdflib.Graph is created, a namespace is bound to it, then a converter is made:

>>> import rdflib, curies
>>> graph = rdflib.Graph()
>>> graph.bind("hgnc", "https://bioregistry.io/hgnc:")
>>> converter = curies.Converter.from_rdflib(graph)
>>> converter.expand("hgnc:1234")
'https://bioregistry.io/hgnc:1234'

This also works if you start directly with an rdflib.namespace.NamespaceManager:

>>> converter = curies.Converter.from_rdflib(graph.namespace_manager)
>>> converter.expand("hgnc:1234")
'https://bioregistry.io/hgnc:1234'

classmethod from_reverse_prefix_map(reverse_prefix_map: str | Path | Mapping[str, str], **kwargs: Any) → Converter[source]

Get a converter from a reverse prefix map.

Parameters:

reverse_prefix_map – A mapping whose keys are URI prefixes and whose values are the corresponding prefixes. This data structure allows for multiple different URI formats to point to the same prefix.
kwargs – Keyword arguments to pass to curies.Converter.__init__()

Returns:

A converter

>>> from curies import Converter
>>> converter = Converter.from_reverse_prefix_map(
...     {
...         "http://purl.obolibrary.org/obo/CHEBI_": "CHEBI",
...         "https://www.ebi.ac.uk/chebi/searchId.do?chebiId=": "CHEBI",
...         "http://purl.obolibrary.org/obo/MONDO_": "MONDO",
...     }
... )
>>> converter.expand("CHEBI:138488")
'http://purl.obolibrary.org/obo/CHEBI_138488'
>>> converter.compress("http://purl.obolibrary.org/obo/CHEBI_138488")
'CHEBI:138488'
>>> converter.compress("https://www.ebi.ac.uk/chebi/searchId.do?chebiId=138488")
'CHEBI:138488'

Alternatively, load a reverse prefix map from a remote URL:

>>> url = "https://github.com/biopragmatics/bioregistry/raw/main/exports/contexts/bioregistry.rpm.json"
>>> converter = Converter.from_reverse_prefix_map(url)
>>> "chebi" in converter.prefix_map
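
Because several URI prefixes may point to the same prefix, a reverse prefix map has to be grouped before it can act like a list of records. The sketch below shows one plausible grouping in plain Python. Choosing the shortest URI prefix as the primary one is an assumption: it is consistent with the expand() result in the example above, but it is not confirmed as the library's rule. `records_from_reverse_prefix_map` is a hypothetical name.

```python
from __future__ import annotations

from collections import defaultdict


def records_from_reverse_prefix_map(reverse_prefix_map: dict[str, str]) -> list[dict]:
    """Group URI prefixes by prefix and pick one primary URI prefix per group."""
    grouped: defaultdict[str, list[str]] = defaultdict(list)
    for uri_prefix, prefix in reverse_prefix_map.items():
        grouped[prefix].append(uri_prefix)
    records = []
    for prefix, uri_prefixes in sorted(grouped.items()):
        # Assumption: the shortest URI prefix becomes the primary one.
        primary, *synonyms = sorted(uri_prefixes, key=len)
        records.append({"prefix": prefix, "uri_prefix": primary, "uri_prefix_synonyms": synonyms})
    return records


reverse_prefix_map = {
    "http://purl.obolibrary.org/obo/CHEBI_": "CHEBI",
    "https://www.ebi.ac.uk/chebi/searchId.do?chebiId=": "CHEBI",
    "http://purl.obolibrary.org/obo/MONDO_": "MONDO",
}
for record in records_from_reverse_prefix_map(reverse_prefix_map):
    print(record)
```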

classmethod from_shacl(graph: str | Path | rdflib.Graph, format: str | None = None, **kwargs: Any) → Converter[source]

Get a converter from SHACL, given as an RDFLib graph, a file path, or a URL.

Parameters:

graph – An RDFLib graph, a Path, a string representing a file path, or a string URL
format – The RDF format, if a file path is given
kwargs – Keyword arguments to pass to Converter.__init__()

Returns:

A converter

get_prefixes(*, include_synonyms: bool = False) → set[str][source]

Get the set of prefixes covered by this converter.

Parameters:

include_synonyms – If true, include secondary prefixes.

Returns:

A set of primary prefixes covered by the converter. If include_synonyms is set to True, secondary prefixes (i.e., ones in Record.prefix_synonyms) are also included.

get_record(prefix: str, *, strict: Literal[True] = True) → Record[source]
get_record(prefix: str, *, strict: Literal[False] = False) → Record | None: Get the record for the prefix.

get_subconverter(prefixes: Iterable[str]) → Converter[source]

Get a converter with a subset of prefixes.

Parameters:

prefixes – A list of prefixes to keep from this converter. These can correspond either to preferred CURIE prefixes or CURIE prefix synonyms.

Returns:

A new, slimmed down converter

This functionality is useful for downstream applications like the following:

1. You load a comprehensive extended prefix map, e.g., from the Bioregistry using curies.get_bioregistry_converter().
2. You load some data that conforms to this prefix map by convention. This is often the case for semantic mappings stored in the SSSOM format.
3. You extract the list of prefixes actually used within your data.
4. You subset the detailed extended prefix map to only include prefixes relevant for your data.
5. You write out the subsetted extended prefix map to go with your data. Effectively, this is a way of reconciling data. This is especially effective when using the Bioregistry or other comprehensive extended prefix maps.

Here’s a concrete example (which also includes a bit of data science) that does this for the SSSOM mappings from the Disease Ontology project.

>>> import curies
>>> import pandas as pd
>>> import itertools as itt
>>> commit = "faca4fc335f9a61902b9c47a1facd52a0d3d2f8b"
>>> url = f"https://raw.githubusercontent.com/mapping-commons/disease-mappings/{commit}/mappings/doid.sssom.tsv"
>>> df = pd.read_csv(url, sep="\t", comment="#")
>>> prefixes = {
...     curies.Reference.from_curie(curie).prefix
...     for column in ["subject_id", "predicate_id", "object_id"]
...     for curie in df[column]
... }
>>> converter = curies.get_bioregistry_converter()
>>> slim_converter = converter.get_subconverter(prefixes)
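
The subsetting step itself amounts to filtering records by prefix, where a match on a synonym also counts. A minimal sketch in plain Python, assuming dictionary records and the hypothetical helper name `subset_records` (the library's own logic may differ):

```python
from __future__ import annotations

from typing import Iterable


def subset_records(records: list[dict], prefixes: Iterable[str]) -> list[dict]:
    """Keep records whose primary prefix or any prefix synonym is requested."""
    keep = set(prefixes)
    return [
        record
        for record in records
        if keep & {record["prefix"], *record.get("prefix_synonyms", [])}
    ]


records = [
    {"prefix": "CHEBI", "prefix_synonyms": ["chebi"], "uri_prefix": "http://purl.obolibrary.org/obo/CHEBI_"},
    {"prefix": "GO", "uri_prefix": "http://purl.obolibrary.org/obo/GO_"},
    {"prefix": "MONDO", "uri_prefix": "http://purl.obolibrary.org/obo/MONDO_"},
]
slim = subset_records(records, ["chebi", "GO"])
print([record["prefix"] for record in slim])
```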

get_uri_prefixes(*, include_synonyms: bool = False) → set[str][source]

Get the set of URI prefixes covered by this converter.

Parameters:: include_synonyms – If true, include secondary prefixes.
Returns:: A set of primary URI prefixes covered by the converter. If include_synonyms is set to True, secondary URI prefixes (i.e., ones in Record.uri_prefix_synonyms) are also included.

has_prefix(prefix: str) → bool[source]: Check if the converter has the prefix (either as a primary or secondary).

hash_triple(triple: Triple, *, negate: bool = False) → str[source]

Hash a triple using curies.triples.hash_triple(), implementing https://ts4nfdi.github.io/mapping-sameness-identifier.

Parameters:

triple – A subject-predicate-object triple
negate – If true, considers the triple as “negative” and appends a ~ to the hash

Returns:

A hexadecimal digest of the SHA-256 hash of the space-joined expanded URI triple

>>> import curies
>>> from curies import Triple, Converter
>>> converter = curies.load_prefix_map(
...     {
...         "mesh": "http://id.nlm.nih.gov/mesh/",
...         "skos": "http://www.w3.org/2004/02/skos/core#",
...         "CHEBI": "http://purl.obolibrary.org/obo/CHEBI_",
...     }
... )
>>> triple = Triple(
...     subject="mesh:C000089",
...     predicate="skos:exactMatch",
...     object="CHEBI:28646",
... )
>>> converter.hash_triple(triple)
'36a1f9244ea7641a90987c82f33c25c0c13712ee8f48207b2a0825f8a4e4e26a'
>>> converter.hash_triple(triple, negate=True)
'36a1f9244ea7641a90987c82f33c25c0c13712ee8f48207b2a0825f8a4e4e26a~'
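
The description of the return value (a SHA-256 hex digest of the space-joined expanded URIs, with a trailing ~ for negated triples) can be sketched with the standard library alone. This is a rough approximation under stated assumptions: `expand` and `sketch_hash_triple` are hypothetical helpers, UTF-8 encoding is assumed, and the sketch is not guaranteed to reproduce the digest shown above.

```python
import hashlib

# Prefix map copied from the example above.
prefix_map = {
    "mesh": "http://id.nlm.nih.gov/mesh/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "CHEBI": "http://purl.obolibrary.org/obo/CHEBI_",
}


def expand(curie: str) -> str:
    """Expand a CURIE to a URI using the prefix map (assumes a colon delimiter)."""
    prefix, identifier = curie.split(":", 1)
    return prefix_map[prefix] + identifier


def sketch_hash_triple(subject: str, predicate: str, obj: str, *, negate: bool = False) -> str:
    """Hash the space-joined expanded URIs with SHA-256 (UTF-8 encoding is an assumption)."""
    joined = " ".join(expand(curie) for curie in (subject, predicate, obj))
    digest = hashlib.sha256(joined.encode("utf-8")).hexdigest()
    # A negated triple gets a trailing tilde after the digest.
    return digest + "~" if negate else digest


positive = sketch_hash_triple("mesh:C000089", "skos:exactMatch", "CHEBI:28646")
negative = sketch_hash_triple("mesh:C000089", "skos:exactMatch", "CHEBI:28646", negate=True)
print(len(positive), negative == positive + "~")
```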

is_curie(s: str) → bool[source]

Check if the string can be parsed as a CURIE by this converter.

Parameters:: s – A string that might be a CURIE
Returns:: Whether the string can be parsed as a CURIE by this converter. Note that some valid CURIEs will result in False if their prefixes are not registered with this converter.

>>> import curies
>>> converter = curies.get_obo_converter()
>>> converter.is_curie("GO:1234567")
True
>>> converter.is_curie("http://purl.obolibrary.org/obo/GO_1234567")
False

The following is a valid CURIE, but the prefix is not registered with the converter based on the OBO Foundry prefix map, so it returns False.

>>> converter.is_curie("pdb:2gc4")
False

is_uri(s: str) → bool[source]

Check if the string can be parsed as a URI by this converter.

Parameters:: s – A string that might be a URI
Returns:: Whether the string can be parsed as a URI by this converter. Note that some valid URIs will result in False if their URI prefixes are not registered with this converter.

>>> import curies
>>> converter = curies.get_obo_converter()
>>> converter.is_uri("http://purl.obolibrary.org/obo/GO_1234567")
True
>>> converter.is_uri("GO:1234567")
False

The following is a valid URI, but the prefix is not registered with the converter based on the OBO Foundry prefix map, so it returns False.

>>> converter.is_uri("http://proteopedia.org/wiki/index.php/2gc4")
False
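
Both is_curie() and is_uri() answer "is this string registered with the converter?" rather than "is this string syntactically valid?". A minimal sketch of that distinction follows, assuming a CURIE check that splits on the delimiter and a URI check that looks for a registered URI prefix. The names `sketch_is_curie` and `sketch_is_uri` and the two-entry registry are hypothetical, and the library may implement these checks differently.

```python
# Hypothetical registered prefixes; a real converter would hold many more.
curie_prefixes = {"GO", "CHEBI"}
uri_prefixes = {
    "http://purl.obolibrary.org/obo/GO_",
    "http://purl.obolibrary.org/obo/CHEBI_",
}


def sketch_is_curie(s: str, delimiter: str = ":") -> bool:
    """Return True if ``s`` splits on the delimiter into a registered prefix."""
    prefix, sep, _identifier = s.partition(delimiter)
    return bool(sep) and prefix in curie_prefixes


def sketch_is_uri(s: str) -> bool:
    """Return True if ``s`` starts with any registered URI prefix."""
    return any(s.startswith(uri_prefix) for uri_prefix in uri_prefixes)


# A registered CURIE passes the CURIE check but not the URI check.
print(sketch_is_curie("GO:1234567"), sketch_is_uri("GO:1234567"))
# Well-formed but unregistered inputs fail both checks.
print(sketch_is_curie("pdb:2gc4"), sketch_is_uri("http://proteopedia.org/wiki/index.php/2gc4"))
```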

parse(str_or_uri_or_curie: str, *, strict: Literal[True] = True) → ReferenceTuple[source]
parse(str_or_uri_or_curie: str, *, strict: Literal[False] = False) → ReferenceTuple | None: Parse a string, URI, or CURIE.

parse_curie(curie: str, *, strict: Literal[False] = False) → ReferenceTuple | None[source]
parse_curie(curie: str, *, strict: Literal[True] = True) → ReferenceTuple: Parse and standardize a CURIE.

parse_uri(uri: str, *, strict: Literal[False] = False, return_none: None = None) → ReferenceTuple | None[source]

parse_uri(uri: str, *, strict: Literal[True] = True, return_none: None = None) → ReferenceTuple

Compress a URI to a CURIE pair.

Parameters:

uri – A string representing a valid uniform resource identifier (URI)
strict – If true and the URI can’t be parsed, raises an error. Defaults to false.
return_none – Opt into future type returning of a single None instead of a pair of Nones

Returns:

A CURIE pair if the URI could be parsed, otherwise None

Raises:

CompressionError – if strict is set to true and the URI can’t be parsed
NotImplementedError – If you pass False to return_none

>>> from curies import Converter
>>> converter = Converter.from_prefix_map(
...     {
...         "CHEBI": "http://purl.obolibrary.org/obo/CHEBI_",
...         "MONDO": "http://purl.obolibrary.org/obo/MONDO_",
...         "GO": "http://purl.obolibrary.org/obo/GO_",
...     }
... )
>>> converter.parse_uri("http://purl.obolibrary.org/obo/CHEBI_138488")
ReferenceTuple(prefix='CHEBI', identifier='138488')
>>> converter.parse_uri("http://example.org/missing:0000000")
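
One way to picture URI parsing is as a longest-prefix match over the registered URI prefixes. The sketch below assumes that the longest matching URI prefix wins, which is consistent with the class-level example where CHEBI is chosen over the shorter OBO prefix. `SketchReference` and `sketch_parse_uri` are hypothetical names, `ValueError` stands in for CompressionError, and the linear scan is only for illustration.

```python
from typing import NamedTuple, Optional


class SketchReference(NamedTuple):
    """Hypothetical stand-in for a (prefix, identifier) pair."""

    prefix: str
    identifier: str


# The "OBO" entry is borrowed from the class-level example to show overlapping prefixes.
uri_prefix_to_prefix = {
    "http://purl.obolibrary.org/obo/CHEBI_": "CHEBI",
    "http://purl.obolibrary.org/obo/MONDO_": "MONDO",
    "http://purl.obolibrary.org/obo/GO_": "GO",
    "http://purl.obolibrary.org/obo/": "OBO",
}


def sketch_parse_uri(uri: str, *, strict: bool = False) -> Optional[SketchReference]:
    """Split a URI into a prefix and identifier using the longest matching URI prefix."""
    matches = [uri_prefix for uri_prefix in uri_prefix_to_prefix if uri.startswith(uri_prefix)]
    if not matches:
        if strict:
            # ValueError is a stand-in for the library's own exception type.
            raise ValueError(f"could not parse URI: {uri}")
        return None
    longest = max(matches, key=len)
    return SketchReference(uri_prefix_to_prefix[longest], uri[len(longest):])


print(sketch_parse_uri("http://purl.obolibrary.org/obo/CHEBI_138488"))
print(sketch_parse_uri("http://example.org/missing:0000000"))
```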

pd_compress(df: pandas.DataFrame, column: str | int, target_column: None | str | int = None, strict: bool = False, passthrough: bool = False, ambiguous: bool = False) → None[source]

Convert all URIs in the given column to CURIEs.

Parameters:

df – A pandas DataFrame
column – The column in the dataframe containing URIs to convert to CURIEs.
target_column – The column to put the results in. Defaults to input column.
strict – If true and the URI can’t be compressed, raises an error. Defaults to false.
passthrough – If true, strict is false, and the URI can’t be compressed, return the input. Defaults to false.
ambiguous – If true, consider the column as containing either CURIEs or URIs.
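
How column, target_column, and passthrough interact can be sketched with a plain dictionary of lists standing in for a DataFrame. This is an assumption-laden illustration rather than the pandas-based implementation: `sketch_compress` and `sketch_column_compress` are hypothetical names, `ValueError` stands in for the library's exception, and the `ambiguous` option is left out.

```python
from typing import Optional

# Hypothetical single-entry registry of URI prefixes.
uri_prefix_to_prefix = {"http://purl.obolibrary.org/obo/GO_": "GO"}


def sketch_compress(uri: str, *, strict: bool = False, passthrough: bool = False) -> Optional[str]:
    """Compress one URI to a CURIE, honoring the strict and passthrough options."""
    for uri_prefix, prefix in uri_prefix_to_prefix.items():
        if uri.startswith(uri_prefix):
            return f"{prefix}:{uri[len(uri_prefix):]}"
    if strict:
        raise ValueError(f"could not compress URI: {uri}")
    return uri if passthrough else None


def sketch_column_compress(table, column, target_column=None, *, strict=False, passthrough=False) -> None:
    """Compress a whole column in place; the result overwrites the input unless a target is given."""
    target = column if target_column is None else target_column
    table[target] = [
        sketch_compress(value, strict=strict, passthrough=passthrough) for value in table[column]
    ]


table = {"uri": ["http://purl.obolibrary.org/obo/GO_0000001", "http://example.org/x"]}
sketch_column_compress(table, "uri", "curie", passthrough=True)
print(table["curie"])
```

With passthrough enabled, the unrecognized URI is carried over unchanged; without it, that cell would hold None.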

pd_expand(df: pandas.DataFrame, column: str | int, target_column: None | str | int = None, strict: bool = False, passthrough: bool = False, ambiguous: bool = False) → None[source]

Convert all CURIEs in the given column to URIs.

Parameters:

df – A pandas DataFrame
column – The column in the dataframe containing CURIEs to convert to URIs.
target_column – The column to put the results in. Defaults to input column.
strict – If true and the CURIE can’t be expanded, raises an error. Defaults to false.
passthrough – If true, strict is false, and the CURIE can’t be expanded, return the input. Defaults to false.
ambiguous – If true, consider the column as containing either CURIEs or URIs.

pd_standardize_curie(df: pandas.DataFrame, *, column: str | int, target_column: None | str | int = None, strict: bool = False, passthrough: bool = False) → None[source]

Standardize all CURIEs in the given column.

Parameters:

df – A pandas DataFrame
column – The column in the dataframe containing CURIEs to standardize.
target_column – The column to put the results in. Defaults to input column.
strict – If true and any CURIE can’t be standardized, raises an error. Defaults to false.
passthrough – If true, strict is false, and any CURIE can’t be standardized, return the input. Defaults to false.

The Disease Ontology curates mappings to other semantic spaces and distributes them in the tabular SSSOM format. However, they use a wide variety of non-standard prefixes for referring to external vocabularies like SNOMED-CT. The Bioregistry contains these synonyms to support reconciliation. The following example shows how the SSSOM mappings dataframe can be loaded and this function applied to the mapping object_id column (in place).

>>> import curies
>>> import pandas as pd
>>> commit = "faca4fc335f9a61902b9c47a1facd52a0d3d2f8b"
>>> url = f"https://raw.githubusercontent.com/mapping-commons/disease-mappings/{commit}/mappings/doid.sssom.tsv"
>>> df = pd.read_csv(url, sep="\t", comment="#")
>>> converter = curies.get_bioregistry_converter()
>>> converter.pd_standardize_curie(df, column="object_id")

pd_standardize_prefix(df: pandas.DataFrame, *, column: str | int, target_column: None | str | int = None, strict: bool = False, passthrough: bool = False) → None[source]

Standardize all prefixes in the given column.

Parameters:

df – A pandas DataFrame
column – The column in the dataframe containing prefixes to standardize.
target_column – The column to put the results in. Defaults to input column.
strict – If true and any prefix can’t be standardized, raises an error. Defaults to false.
passthrough – If true, strict is false, and any prefix can’t be standardized, return the input. Defaults to false.

pd_standardize_uri(df: pandas.DataFrame, *, column: str | int, target_column: None | str | int = None, strict: bool = False, passthrough: bool = False) → None[source]

Standardize all URIs in the given column.

Parameters:

df – A pandas DataFrame
column – The column in the dataframe containing URIs to standardize.
target_column – The column to put the results in. Defaults to input column.
strict – If true and any URI can’t be standardized, raises an error. Defaults to false.
passthrough – If true, strict is false, and any URI can’t be standardized, return the input. Defaults to false.

standardize_curie(curie: str, *, strict: Literal[True] = True, passthrough: bool = False) → str[source]

standardize_curie(curie: str, *, strict: Literal[False] = False, passthrough: Literal[True] = True) → str

standardize_curie(curie: str, *, strict: Literal[False] = False, passthrough: Literal[False] = False) → str | None

Standardize a CURIE.

Parameters:

curie – A string representing a compact URI (CURIE)
strict – If true and the CURIE can’t be standardized, raises an error. Defaults to false.
passthrough – If true, strict is false, and the CURIE can’t be standardized, return the input. Defaults to false.

Returns:

A standardized version of the CURIE in case a prefix synonym was used. Note that this function is idempotent, i.e., if you give an already standard CURIE, it will just return it as is. If the CURIE can’t be parsed with respect to the records in the converter, None is returned.

Raises:

CURIEStandardizationError – If strict is true and the CURIE can’t be standardized

>>> from curies import Converter, Record
>>> converter = Converter.from_extended_prefix_map(
...     [
...         Record(
...             prefix="CHEBI",
...             prefix_synonyms=["chebi"],
...             uri_prefix="http://purl.obolibrary.org/obo/CHEBI_",
...         ),
...     ]
... )
>>> converter.standardize_curie("chebi:138488")
'CHEBI:138488'
>>> converter.standardize_curie("CHEBI:138488")
'CHEBI:138488'
>>> converter.standardize_curie("NOPE:NOPE") is None
True
>>> converter.standardize_curie("NOPE:NOPE", passthrough=True)
'NOPE:NOPE'
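
The three possible outcomes for an unrecognized CURIE (raise, pass through, or return None) can be sketched as a small function over a synonym table. This is a simplified illustration, not the library's code: `synonym_to_prefix` and `sketch_standardize_curie` are hypothetical names, a colon delimiter is assumed, and `ValueError` stands in for CURIEStandardizationError.

```python
from typing import Optional

# Maps every known prefix (preferred or synonym) to its preferred prefix.
synonym_to_prefix = {"CHEBI": "CHEBI", "chebi": "CHEBI"}


def sketch_standardize_curie(
    curie: str, *, strict: bool = False, passthrough: bool = False
) -> Optional[str]:
    """Rewrite a CURIE so that it uses the preferred prefix."""
    prefix, sep, identifier = curie.partition(":")
    preferred = synonym_to_prefix.get(prefix) if sep else None
    if preferred is not None:
        return f"{preferred}:{identifier}"
    if strict:
        # Strict mode takes precedence over passthrough.
        raise ValueError(f"could not standardize CURIE: {curie}")
    # Non-strict mode: hand back the input or signal failure with None.
    return curie if passthrough else None


print(sketch_standardize_curie("chebi:138488"))
print(sketch_standardize_curie("NOPE:NOPE"))
print(sketch_standardize_curie("NOPE:NOPE", passthrough=True))
```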

standardize_identifier(standard_prefix: str, identifier: str, *, strict: Literal[True] = False) → str[source]

standardize_identifier(standard_prefix: str, identifier: str, *, strict: Literal[False] = False) → str | None

Standardize an identifier.

Parameters:

standard_prefix – This is a prefix that has already been standardized using standardize_prefix() in this converter
identifier – An unstandardized identifier
strict – If true, requires standardization to succeed or throws an error

Returns:

A standardized identifier.

By default, this function is a no-op, meaning that it just returns the identifier as is. You can override the Converter class to implement this method to do standardization (e.g., removing redundant prefixes) and to do validation (e.g., checking against a regular expression).
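
As a rough idea of what such an override might contain, the sketch below strips a redundant prefix and validates the result against a regular expression. It is written as a standalone function to stay self-contained, so in practice the body would live in a subclass method. `PATTERNS` and `sketch_standardize_identifier` are hypothetical, the pattern for CHEBI is an illustrative assumption, and `ValueError` stands in for whatever exception a real override would raise.

```python
import re
from typing import Optional

# Hypothetical per-prefix validation patterns (not taken from the library).
PATTERNS = {"CHEBI": re.compile(r"^\d+$")}


def sketch_standardize_identifier(
    standard_prefix: str, identifier: str, *, strict: bool = False
) -> Optional[str]:
    """Strip a redundant prefix from the identifier, then validate it."""
    # Standardization: remove a redundant prefix such as the "CHEBI:" in "CHEBI:138488".
    redundant = f"{standard_prefix}:"
    if identifier.startswith(redundant):
        identifier = identifier[len(redundant):]
    # Validation: prefixes without a registered pattern are accepted as is.
    pattern = PATTERNS.get(standard_prefix)
    if pattern is None or pattern.match(identifier):
        return identifier
    if strict:
        raise ValueError(f"invalid identifier for {standard_prefix}: {identifier}")
    return None


print(sketch_standardize_identifier("CHEBI", "CHEBI:138488"))
print(sketch_standardize_identifier("CHEBI", "not-a-number"))
```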

standardize_prefix(prefix: str, *, strict: Literal[True] = True, passthrough: bool = False) → str[source]

standardize_prefix(prefix: str, *, strict: Literal[False] = False, passthrough: Literal[True] = True) → str

standardize_prefix(prefix: str, *, strict: Literal[False] = False, passthrough: Literal[False] = False) → str | None

Standardize a prefix.

Parameters:

prefix – The prefix of the CURIE
strict – If true and the prefix can’t be standardized, raises an error. Defaults to false.
passthrough – If true, strict is false, and the prefix can’t be standardized, return the input. Defaults to false.

Returns:

The standardized version of this prefix with respect to this converter. If the prefix is not registered in this converter, returns None.

Raises:

PrefixStandardizationError – If strict is true and the prefix can’t be standardized

>>> from curies import Converter, Record
>>> converter = Converter.from_extended_prefix_map(
...     [
...         Record(prefix="CHEBI", prefix_synonyms=["chebi"], uri_prefix="..."),
...     ]
... )
>>> converter.standardize_prefix("chebi")
'CHEBI'
>>> converter.standardize_prefix("CHEBI")
'CHEBI'
>>> converter.standardize_prefix("NOPE") is None
True
>>> converter.standardize_prefix("NOPE", passthrough=True)
'NOPE'

standardize_reference(reference: Reference, *, strict: Literal[True] = False) → Reference[source]
standardize_reference(reference: Reference, *, strict: Literal[False] = False) → Reference | None: Standardize a reference.

standardize_uri(uri: str, *, strict: Literal[True] = True, passthrough: bool = False) → str[source]

standardize_uri(uri: str, *, strict: Literal[False] = False, passthrough: Literal[True] = True) → str

standardize_uri(uri: str, *, strict: Literal[False] = False, passthrough: Literal[False] = False) → str | None

Standardize a URI.

Parameters:

uri – A string representing a valid uniform resource identifier (URI)
strict – If true and the URI can’t be standardized, raises an error. Defaults to false.
passthrough – If true, strict is false, and the URI can’t be standardized, return the input. Defaults to false.

Returns:

A standardized version of the URI in case a URI prefix synonym was used. Note that this function is idempotent, i.e., if you give an already standard URI, it will just return it as is. If the URI can’t be parsed with respect to the records in the converter, None is returned.

Raises:

URIStandardizationError – If strict is true and the URI can’t be standardized

>>> from curies import Converter, Record
>>> converter = Converter.from_extended_prefix_map(
...     [
...         Record(
...             prefix="CHEBI",
...             uri_prefix="http://purl.obolibrary.org/obo/CHEBI_",
...             uri_prefix_synonyms=[
...                 "https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:",
...             ],
...         ),
...     ]
... )
>>> converter.standardize_uri(
...     "https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:138488"
... )
'http://purl.obolibrary.org/obo/CHEBI_138488'
>>> converter.standardize_uri("http://purl.obolibrary.org/obo/CHEBI_138488")
'http://purl.obolibrary.org/obo/CHEBI_138488'
>>> converter.standardize_uri("http://example.org/NOPE") is None
True
>>> converter.standardize_uri("http://example.org/NOPE", passthrough=True)
'http://example.org/NOPE'
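
The idempotence noted above follows naturally if standardization is viewed as swapping a matched URI prefix for the preferred one. A minimal sketch under that assumption is shown below. `uri_prefix_to_preferred` and `sketch_standardize_uri` are hypothetical names, `ValueError` stands in for URIStandardizationError, and the library may implement the lookup differently.

```python
from typing import Optional

PREFERRED = "http://purl.obolibrary.org/obo/CHEBI_"

# Maps every known URI prefix (preferred or synonym) to the preferred URI prefix.
uri_prefix_to_preferred = {
    PREFERRED: PREFERRED,
    "https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:": PREFERRED,
}


def sketch_standardize_uri(
    uri: str, *, strict: bool = False, passthrough: bool = False
) -> Optional[str]:
    """Replace a matched URI prefix with the preferred URI prefix."""
    matches = [uri_prefix for uri_prefix in uri_prefix_to_preferred if uri.startswith(uri_prefix)]
    if matches:
        longest = max(matches, key=len)
        return uri_prefix_to_preferred[longest] + uri[len(longest):]
    if strict:
        raise ValueError(f"could not standardize URI: {uri}")
    return uri if passthrough else None


once = sketch_standardize_uri("https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:138488")
twice = sketch_standardize_uri(once)
# Standardizing an already standard URI leaves it unchanged.
print(once == twice)
```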