Typing
This package comes with utilities for more strictly typing data that comes from other resources.
Prefix Parsing
Let’s say you have a table like this:
| prefix | identifier | name | smiles |
|---|---|---|---|
| CHEBI | 16236 | ethanol | CCO |
| CHEBI | 28831 | propanol | CCCO |
| CHOBI | 44884 | pentanol | CCCCCO |
Note that the prefix on the last row has a typo: it uses CHOBI instead of CHEBI. In the following code, we simulate reading that file and show where the error shows up:
from pydantic import BaseModel, ValidationError
from curies import Converter, Prefix

# A converter that only knows the CHEBI prefix
converter = Converter.from_prefix_map({
    "CHEBI": "http://purl.obolibrary.org/obo/CHEBI_",
})

class Row(BaseModel):
    prefix: Prefix
    identifier: str
    name: str
    smiles: str

# Simulated file contents; the last record has the misspelled prefix CHOBI
records = [
    {"prefix": "CHEBI", "identifier": "16236", "name": "ethanol", "smiles": "CCO"},
    {"prefix": "CHEBI", "identifier": "28831", "name": "propanol", "smiles": "CCCO"},
    {"prefix": "CHOBI", "identifier": "44884", "name": "pentanol", "smiles": "CCCCCO"},
]

for record in records:
    try:
        # Pass the converter as the validation context
        model = Row.model_validate(record, context=converter)
    except ValidationError as e:
        print(f"Issue parsing record {record}: {e}")
        continue
Note that pydantic.BaseModel.model_validate() accepts a “context” argument. The curies.Prefix class implements custom context handling, so if you pass a converter as the context, it checks the value against the prefixes in that converter.
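The `records` list above stands in for rows read from a file. As a minimal sketch of where such a list might come from, the following uses only the standard library's `csv.DictReader`, assuming the table is stored as comma-separated text. The names `CSV_TEXT` and `read_records` are illustrative and not part of this package, and the in-memory `io.StringIO` is a stand-in for an open file.

```python
import csv
import io

# Illustrative stand-in for a file on disk; the contents mirror the table above.
CSV_TEXT = """\
prefix,identifier,name,smiles
CHEBI,16236,ethanol,CCO
CHEBI,28831,propanol,CCCO
CHOBI,44884,pentanol,CCCCCO
"""


def read_records(text: str) -> list[dict[str, str]]:
    """Parse comma-separated text into one dict per data row, keyed by the header."""
    reader = csv.DictReader(io.StringIO(text))
    return [dict(row) for row in reader]


records = read_records(CSV_TEXT)
print(records[2]["prefix"])
```

Each resulting dict has the same keys as the fields of `Row`, so it could be passed to `Row.model_validate()` in the same loop as before.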
CURIE Parsing
Let’s use a similar table, now with the prefix and identifier combined into CURIEs.
| curie | name | smiles |
|---|---|---|
| CHEBI:16236 | ethanol | CCO |
| CHEBI:28831 | propanol | CCCO |
| CHOBI:44884 | pentanol | CCCCCO |
Note that the prefix on the last row has a typo: it uses CHOBI instead of CHEBI. In the following code, we simulate reading that file and show where the error shows up:
from pydantic import BaseModel, ValidationError
from curies import Converter, Reference

# A converter that only knows the CHEBI prefix
converter = Converter.from_prefix_map({
    "CHEBI": "http://purl.obolibrary.org/obo/CHEBI_",
})

class Row(BaseModel):
    curie: Reference
    name: str
    smiles: str

# Simulated file contents; the last record has the misspelled prefix CHOBI
records = [
    {"curie": "CHEBI:16236", "name": "ethanol", "smiles": "CCO"},
    {"curie": "CHEBI:28831", "name": "propanol", "smiles": "CCCO"},
    {"curie": "CHOBI:44884", "name": "pentanol", "smiles": "CCCCCO"},
]

for record in records:
    try:
        # Pass the converter as the validation context
        model = Row.model_validate(record, context=converter)
    except ValidationError as e:
        print(f"Issue parsing record {record}: {e}")
        continue
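To make the idea behind the check concrete, here is a plain-Python sketch of what validating a CURIE against a set of known prefixes involves: split the string at the first colon, then look the prefix up. This is not how the package implements it, and the real validation may do more. `PREFIX_MAP` and `split_and_check` are hypothetical names used only for illustration.

```python
# Hypothetical stand-in for a converter's prefix map; not this package's API.
PREFIX_MAP = {"CHEBI": "http://purl.obolibrary.org/obo/CHEBI_"}


def split_and_check(curie: str, prefix_map: dict[str, str]) -> tuple[str, str]:
    """Split a CURIE at the first colon and reject prefixes missing from the map."""
    prefix, sep, identifier = curie.partition(":")
    if not sep:
        raise ValueError(f"not a CURIE (no colon): {curie!r}")
    if prefix not in prefix_map:
        raise ValueError(f"unknown prefix {prefix!r} in {curie!r}")
    return prefix, identifier


for curie in ["CHEBI:16236", "CHOBI:44884"]:
    try:
        print(split_and_check(curie, PREFIX_MAP))
    except ValueError as e:
        print(f"Issue parsing {curie}: {e}")
```

In this sketch the first CURIE is split into its prefix and identifier, while the misspelled CHOBI prefix is reported as an issue, mirroring the role of the `ValidationError` in the example above.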