Typing

This package comes with types that can be used to annotate and validate prefixes and CURIEs in other resources.

Prefix Parsing

Let’s say you have a table like this:

prefix	identifier	name	smiles
CHEBI	16236	ethanol	CCO
CHEBI	28831	propanol	CCCO
CHOBI	44884	pentanol	CCCCCO

Note that the prefix in the last row contains a typo: it uses CHOBI instead of CHEBI. In the following code, we simulate reading that file and show where the error shows up:

from pydantic import BaseModel, ValidationError
from curies import Converter, Prefix

converter = Converter.from_prefix_map(
    {
        "CHEBI": "http://purl.obolibrary.org/obo/CHEBI_",
    }
)


class Row(BaseModel):
    prefix: Prefix
    identifier: str
    name: str
    smiles: str


records = [
    {"prefix": "CHEBI", "identifier": "16236", "name": "ethanol", "smiles": "CCO"},
    {"prefix": "CHEBI", "identifier": "28831", "name": "propanol", "smiles": "CCCO"},
    {"prefix": "CHOBI", "identifier": "44884", "name": "pentanol", "smiles": "CCCCCO"},
]

for record in records:
    try:
        model = Row.model_validate(record, context=converter)
    except ValidationError as e:
        print(f"Issue parsing record {record}: {e}")
        continue
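
The records list above stands in for the file. As a minimal sketch, assuming the table is tab-separated and has the header row shown earlier, the same records could be produced with the standard library's csv.DictReader. The TABLE string is a hypothetical inline copy of the file, used here so the sketch is self-contained:

```python
import csv
import io

# Hypothetical inline copy of the table above, as tab-separated text.
TABLE = (
    "prefix\tidentifier\tname\tsmiles\n"
    "CHEBI\t16236\tethanol\tCCO\n"
    "CHEBI\t28831\tpropanol\tCCCO\n"
    "CHOBI\t44884\tpentanol\tCCCCCO\n"
)

# DictReader takes the column names from the header row and
# yields one dictionary per data row.
records = list(csv.DictReader(io.StringIO(TABLE), delimiter="\t"))
```

For a real file, the io.StringIO object would be replaced by an open file handle.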

Note that pydantic.BaseModel.model_validate() accepts a “context” argument. The curies.Prefix class implements custom context handling, so if you pass a converter as the context, each value is checked against the prefixes in that converter.
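
The context mechanism itself is plain pydantic. The following is a minimal sketch of how a validator can read the context, assuming pydantic 2. The PrefixRow model, the check_prefix validator, and the known_prefixes set are hypothetical stand-ins for illustration, and this is not how curies.Prefix is implemented:

```python
from pydantic import BaseModel, ValidationError, ValidationInfo, field_validator


class PrefixRow(BaseModel):
    prefix: str

    @field_validator("prefix")
    @classmethod
    def check_prefix(cls, value: str, info: ValidationInfo) -> str:
        # info.context holds whatever was passed as context= to
        # model_validate(), or None if nothing was passed.
        known = info.context
        if known is not None and value not in known:
            raise ValueError(f"unknown prefix: {value}")
        return value


known_prefixes = {"CHEBI"}  # hypothetical stand-in for a converter

ok = PrefixRow.model_validate({"prefix": "CHEBI"}, context=known_prefixes)

try:
    PrefixRow.model_validate({"prefix": "CHOBI"}, context=known_prefixes)
    failed = False
except ValidationError:
    failed = True
```

In this sketch, validation without a context skips the prefix check entirely, so the same model can be used both with and without a set of known prefixes.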

CURIE Parsing

Let’s use a similar table, now with the prefix and identifier combined into CURIEs.

curie	name	smiles
CHEBI:16236	ethanol	CCO
CHEBI:28831	propanol	CCCO
CHOBI:44884	pentanol	CCCCCO

Note that the CURIE in the last row contains a typo in its prefix: it uses CHOBI instead of CHEBI. In the following code, we simulate reading that file and show where the error shows up:

from pydantic import BaseModel, ValidationError
from curies import Converter, Reference

converter = Converter.from_prefix_map(
    {
        "CHEBI": "http://purl.obolibrary.org/obo/CHEBI_",
    }
)


class Row(BaseModel):
    curie: Reference
    name: str
    smiles: str


records = [
    {"curie": "CHEBI:16236", "name": "ethanol", "smiles": "CCO"},
    {"curie": "CHEBI:28831", "name": "propanol", "smiles": "CCCO"},
    {"curie": "CHOBI:44884", "name": "pentanol", "smiles": "CCCCCO"},
]

for record in records:
    try:
        model = Row.model_validate(record, context=converter)
    except ValidationError as e:
        print(f"Issue parsing record {record}: {e}")
        continue
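
To make the idea of checking a CURIE concrete, here is a simplified sketch that uses only the standard library. It assumes a CURIE is a prefix and an identifier separated by the first colon. The split_curie and check_curie functions and the known_prefixes set are hypothetical and do not reflect how curies.Reference parses or validates internally:

```python
known_prefixes = {"CHEBI"}  # hypothetical stand-in for the converter's prefixes


def split_curie(curie: str) -> tuple[str, str]:
    """Split a CURIE into (prefix, identifier) at the first colon."""
    prefix, sep, identifier = curie.partition(":")
    if not sep or not prefix or not identifier:
        raise ValueError(f"not a CURIE: {curie!r}")
    return prefix, identifier


def check_curie(curie: str) -> tuple[str, str]:
    """Split a CURIE and reject it if its prefix is not known."""
    prefix, identifier = split_curie(curie)
    if prefix not in known_prefixes:
        raise ValueError(f"unknown prefix in {curie!r}: {prefix}")
    return prefix, identifier


good = check_curie("CHEBI:16236")

try:
    check_curie("CHOBI:44884")
    rejected = False
except ValueError:
    rejected = True
```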