Typing
This package comes with utilities for more precisely typing other resources, such as Pydantic models.
Prefix Parsing
Let’s say you have a table like this:
| prefix | identifier | name | smiles |
|---|---|---|---|
| CHEBI | 16236 | ethanol | CCO |
| CHEBI | 28831 | propanol | CCCO |
| CHOBI | 44884 | pentanol | CCCCCO |
Note that the prefix in the third row contains a typo: it uses
CHOBI instead of CHEBI. The following code simulates reading that file and
shows where the error surfaces:
```python
from pydantic import BaseModel, ValidationError

from curies import Converter, Prefix

# A converter that only knows about the CHEBI prefix
converter = Converter.from_prefix_map(
    {
        "CHEBI": "http://purl.obolibrary.org/obo/CHEBI_",
    }
)


class Row(BaseModel):
    """A row of the table, with prefix and identifier in separate columns."""

    prefix: Prefix
    identifier: str
    name: str
    smiles: str


# These records stand in for the rows read from the file
# (e.g., with csv.DictReader)
records = [
    {"prefix": "CHEBI", "identifier": "16236", "name": "ethanol", "smiles": "CCO"},
    {"prefix": "CHEBI", "identifier": "28831", "name": "propanol", "smiles": "CCCO"},
    {"prefix": "CHOBI", "identifier": "44884", "name": "pentanol", "smiles": "CCCCCO"},
]

for record in records:
    try:
        # Passing the converter as context lets Prefix check it
        model = Row.model_validate(record, context=converter)
    except ValidationError as e:
        # The record with the CHOBI typo ends up here
        print(f"Issue parsing record {record}: {e}")
        continue
```
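The `records` list above only simulates reading the file. Below is a minimal sketch of the actual reading step using the standard library's `csv.DictReader`. It assumes that the table is stored as tab-separated values. The in-memory `TABLE` string is a hypothetical stand-in for a real file, which you would instead open with `open(path, newline="")`.

```python
import csv
import io

# Hypothetical tab-separated copy of the table; replace with a real file handle
TABLE = """prefix\tidentifier\tname\tsmiles
CHEBI\t16236\tethanol\tCCO
CHEBI\t28831\tpropanol\tCCCO
CHOBI\t44884\tpentanol\tCCCCCO
"""

# DictReader uses the header line as keys, so each row becomes a dict
# with the same shape as the hand-written records
records = list(csv.DictReader(io.StringIO(TABLE), delimiter="\t"))

for record in records:
    print(record)
```

Because every value comes back as a string, the rows can be passed directly to `Row.model_validate` as before.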
Note that pydantic.BaseModel.model_validate() accepts a `context` argument. The
curies.Prefix class implements custom context handling: if you pass a
converter as the context, it checks each prefix against the prefixes in that converter,
so the record with CHOBI fails validation.
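To illustrate the general Pydantic mechanism behind this, here is a hedged sketch of a field validator that reads the validation context. The model `HypotheticalRow` and its set-of-prefixes context are illustrative assumptions. This is not how curies.Prefix is actually implemented. It only shows that whatever is passed as `context=` to `model_validate()` is available to validators through `ValidationInfo.context`.

```python
from pydantic import BaseModel, ValidationError, ValidationInfo, field_validator


class HypotheticalRow(BaseModel):
    """Illustrative model only; not the curies implementation."""

    prefix: str
    identifier: str

    @field_validator("prefix")
    @classmethod
    def check_prefix(cls, value: str, info: ValidationInfo) -> str:
        # info.context holds whatever was passed as model_validate(..., context=...)
        known_prefixes = info.context
        # Without a context, no prefix check is done
        if known_prefixes is not None and value not in known_prefixes:
            raise ValueError(f"unknown prefix: {value}")
        return value


known = {"CHEBI"}
for record in [
    {"prefix": "CHEBI", "identifier": "16236"},
    {"prefix": "CHOBI", "identifier": "44884"},
]:
    try:
        print(HypotheticalRow.model_validate(record, context=known))
    except ValidationError as e:
        print(f"Issue parsing record {record}: {e}")
```

The `ValueError` raised inside the validator is wrapped by Pydantic into a `ValidationError`, which is why the loop catches that type.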
CURIE Parsing
Let’s use a similar table, now with the prefix and identifier combined into CURIEs.
| curie | name | smiles |
|---|---|---|
| CHEBI:16236 | ethanol | CCO |
| CHEBI:28831 | propanol | CCCO |
| CHOBI:44884 | pentanol | CCCCCO |
Note that the prefix in the third row contains a typo: it uses
CHOBI instead of CHEBI. The following code simulates reading that file and
shows where the error surfaces:
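```python
from pydantic import BaseModel, ValidationError

from curies import Converter, Reference

# A converter that only knows about the CHEBI prefix
converter = Converter.from_prefix_map(
    {
        "CHEBI": "http://purl.obolibrary.org/obo/CHEBI_",
    }
)


class Row(BaseModel):
    """A row of the table, with prefix and identifier combined in a CURIE."""

    curie: Reference
    name: str
    smiles: str


# These records stand in for the rows read from the file
# (e.g., with csv.DictReader)
records = [
    {"curie": "CHEBI:16236", "name": "ethanol", "smiles": "CCO"},
    {"curie": "CHEBI:28831", "name": "propanol", "smiles": "CCCO"},
    {"curie": "CHOBI:44884", "name": "pentanol", "smiles": "CCCCCO"},
]

for record in records:
    try:
        # Passing the converter as context lets Reference check the CURIE's prefix
        model = Row.model_validate(record, context=converter)
    except ValidationError as e:
        # The record with the CHOBI typo ends up here
        print(f"Issue parsing record {record}: {e}")
        continue
```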
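Conceptually, validating a CURIE like this amounts to splitting it into a prefix and an identifier on the first colon, then checking the prefix against the known prefixes. The standard-library sketch below illustrates that idea under simplifying assumptions. `split_curie` and `check_curie` are hypothetical helpers, not part of curies. In practice curies.Reference and the converter do this for you and may handle edge cases differently.

```python
def split_curie(curie: str) -> tuple[str, str]:
    """Hypothetical helper: split a CURIE into (prefix, identifier) on the first colon."""
    prefix, delimiter, identifier = curie.partition(":")
    if not delimiter or not prefix or not identifier:
        raise ValueError(f"not a valid CURIE: {curie!r}")
    return prefix, identifier


def check_curie(curie: str, known_prefixes: set[str]) -> tuple[str, str]:
    """Hypothetical helper: split a CURIE and reject unknown prefixes."""
    prefix, identifier = split_curie(curie)
    if prefix not in known_prefixes:
        raise ValueError(f"unknown prefix {prefix!r} in CURIE {curie!r}")
    return prefix, identifier


known_prefixes = {"CHEBI"}
for curie in ["CHEBI:16236", "CHEBI:28831", "CHOBI:44884"]:
    try:
        print(check_curie(curie, known_prefixes))
    except ValueError as e:
        print(f"Issue parsing {curie}: {e}")
```

Splitting only on the first colon keeps identifiers that themselves contain colons intact.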