API
predict
PyBoltz.predict - Function
predict(input, [output_type]; options...)
Run Boltz-1 prediction with the given input, output type, and options.
Input types
- AbstractString: Path to a FASTA/YAML file or directory (for batching).
- BoltzInput: A single PyBoltz.Schema.BoltzInput object.
- Vector{BoltzInput}: A vector of PyBoltz.Schema.BoltzInput objects for batching.
Output types
By default, raw results are written to disk in the out_dir directory (see options).
For convenience, output_type can be provided as a second argument to reduce manual file I/O.
If output_type is provided, the function returns a single object when the input is a BoltzInput, and a vector when the input is an AbstractString or Vector{BoltzInput}.
The following output types are supported:
- BioStructures.MolecularStructure: a rich and robust representation of molecular structures.
- ProteinChains.ProteinStructure: a flat and specialized representation of protein structures for convenience.
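The return shapes described above can be sketched as follows. This is a minimal illustration, not a tested recipe: it assumes BioStructures is installed alongside PyBoltz, uses a made-up five-residue sequence, and passes a single-sequence msa so that no MSA server is needed. Running it performs real predictions, so the model is downloaded on first use.

```julia
using PyBoltz, PyBoltz.Schema
using BioStructures: MolecularStructure

# Placeholder sequence for illustration only, not a real protein.
seq = "RHKDE"

# Distinct names keep the two inputs from writing to the same output file.
input_a = BoltzInput(name = "a", sequences = [protein(id = "A", sequence = seq, msa = [seq])])
input_b = BoltzInput(name = "b", sequences = [protein(id = "A", sequence = seq, msa = [seq])])

# A single BoltzInput yields a single structure.
structure = PyBoltz.predict(input_a, MolecularStructure)

# A Vector{BoltzInput} yields a vector of structures.
structures = PyBoltz.predict([input_a, input_b], MolecularStructure)
```

Swapping MolecularStructure for ProteinChains.ProteinStructure (with ProteinChains loaded) should follow the same pattern.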
Options
Numeric Options
- devices::Integer: Number of devices to use. Default: 1.
- recycling_steps::Integer: Number of recycling steps. Default: 3.
- sampling_steps::Integer: Number of sampling steps. Default: 200.
- diffusion_samples::Integer: Number of diffusion samples. Default: 1.
- step_scale::Float64: Step size related to temperature. Default: 1.638.
- num_workers::Integer: Number of dataloader workers. Default: 2.
- seed::Integer: RNG seed. Default: none.
String Options
- out_dir::String: The path where to save the predictions.
- cache::String: The directory where to download the data and model. Defaults to a Scratch.jl-backed directory created at module init; call clear_cache() to reset it.
- checkpoint::String: Optional checkpoint path; defaults to Boltz-1 model.
- accelerator::String: 'gpu', 'cpu', or 'tpu'. Default: 'gpu'.
- output_format::String: 'pdb' or 'mmcif'. Default: 'mmcif'.
- msa_server_url::String: MSA server URL; requires use_msa_server=true.
- msa_pairing_strategy::String: 'greedy' or 'complete'; requires use_msa_server=true.
Boolean Flags
- write_full_pae::Bool: Dump PAE to a npz file. Default: true.
- write_full_pde::Bool: Dump PDE to a npz file. Default: false.
- override::Bool: Override existing predictions. Default: false.
- use_msa_server::Bool: Use MMSeqs2 server for MSA generation. Default: false.
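The options above are passed as keyword arguments. The sketch below combines a few of them in one call without an output_type, so results are only written to disk. The input name, the sequence, and the out_dir value are made up for illustration, and use_msa_server=true requires network access.

```julia
using PyBoltz, PyBoltz.Schema

input = BoltzInput(
    name = "options_demo",  # hypothetical name
    sequences = [protein(id = "A", sequence = "RHKDE")],  # placeholder sequence
)

# No output_type is given, so raw results are written to out_dir.
PyBoltz.predict(input;
    out_dir = "boltz_results",  # hypothetical output directory
    accelerator = "cpu",        # run without a GPU
    diffusion_samples = 2,      # two samples instead of the default 1
    seed = 0,                   # fix the RNG seed
    output_format = "pdb",      # write PDB instead of the default mmCIF
    use_msa_server = true,      # generate the MSA with the MMSeqs2 server
)
```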
Schema submodule
PyBoltz.Schema.BoltzInput - Type
BoltzInput
A dictionary object that can be written to a YAML file.
Implemented according to the schema definition in the boltz documentation, allowing for easy in-memory construction of the schema.
Additions
- name is an optional argument that changes the name of the output file/structure.
- Sequences passed to protein, dna, and rna get automatically converted to strings, so any type (e.g. BioSequences.BioSequence) that has a sensible Base.string conversion defined will work.
- msa can be provided as a vector of sequences.
Examples
using PyBoltz.Schema

# seq and other are placeholders for sequences defined by the caller
input1 = BoltzInput(
    name = "example1", # optional name of the YAML file (and thus of the output pdb/cif file)
    sequences = [
        protein(
            id = ["A", "B"],
            sequence = seq,
            msa = [seq, other...] # or path to a3m file
        ),
        ligand(
            id = ["C", "D"],
            ccd = "SAH"
        ),
        ligand(
            id = ["E", "F"],
            smiles = "N[C@@H](Cc1ccc(O)cc1)C(=O)O"
        )
    ]
)

input2 = BoltzInput(
    sequences = [
        protein(
            id = ["A1"],
            sequence = seq
        ),
        ligand(
            id = ["B1"],
            ccd = "EKY"
        )
    ],
    constraints = [
        pocket(
            binder = "B1",
            contacts = [ ("B1", 1), ("A1", 138) ]
        )
    ]
)
Sequences
The following sequence types go into the sequences vector keyword argument of BoltzInput.
PyBoltz.Schema.protein - Function
protein(; id, sequence, msa=nothing, modifications=nothing, cyclic=nothing)
using PyBoltz.Schema: protein
protein(id="A", sequence="RHKDE")
protein(id=["A", "B"], sequence="RHKDE")
protein(id="A", sequence="RHKDE", msa="path/to/msa.a3m")
protein(id="A", sequence="RHKDE", msa=["RHKDE", "RHKDE"])
protein(id="A", sequence="RHKDE", modifications=[(position=1, ccd="MSE"), (position=5, ccd="MSE")])
protein(id="A", sequence="RHKDE", cyclic=true)
PyBoltz.Schema.dna - Function
dna(; id, sequence)
using PyBoltz.Schema: dna
dna(id="A", sequence="GATTACA")
dna(id=["A", "B"], sequence="GATTACA")
dna(id="A", sequence="GATTACA", modifications=[(position=2, ccd="6MA"), (position=6, ccd="5MC")]) # untested
dna(id="A", sequence="GATTACA", cyclic=true)
PyBoltz.Schema.rna - Function
rna(; id, sequence)
using PyBoltz.Schema: rna
rna(id="A", sequence="GAUUACA")
rna(id=["A", "B"], sequence="GAUUACA")
rna(id="A", sequence="GAUUACA", modifications=[(position=2, ccd="I"), (position=3, ccd="PSU")]) # untested
rna(id="A", sequence="GAUUACA", cyclic=true)
PyBoltz.Schema.ligand - Function
ligand(; id, smiles=nothing, ccd=nothing)
using PyBoltz.Schema: ligand
ligand(id="C", smiles="C1=CC=CC=C1")
ligand(id=["D", "E"], ccd="SAH")
Constraints
The following constraint types go into the constraints vector keyword argument of BoltzInput.
PyBoltz.Schema.bond - Function
bond(; atom1, atom2)
using PyBoltz.Schema: bond
# atom1 and atom2 are tuples of (chain_id, residue_index, atom_name)
bond(atom1=("A", 1, "CA"), atom2=("B", 2, "CA"))
PyBoltz.Schema.pocket - Function
pocket(; binder, contacts, max_distance=nothing)
using PyBoltz.Schema: pocket
# binder is a chain_id
# contacts is a vector of (chain_id, residue_index) tuples
pocket(binder="A", contacts=[("B", 1), ("C", 2)])
PyBoltz.Schema.contact - Function
contact(; token1, token2, max_distance=nothing)
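The source gives no example for contact. The sketch below is a guess at typical usage: it assumes that token1 and token2 take (chain_id, residue_index) tuples, by analogy with the contacts of pocket, and that max_distance is a numeric distance. Neither assumption is confirmed by the docstring.

```julia
using PyBoltz.Schema: contact

# Assumption: tokens are (chain_id, residue_index) tuples, as in pocket contacts.
contact(token1 = ("A", 10), token2 = ("B", 25))

# Assumption: max_distance is a number; the value here is arbitrary.
contact(token1 = ("A", 10), token2 = ("B", 25), max_distance = 6.0)
```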
Templates
PyBoltz.Schema.template - Function
template(; cif, chain_id=nothing, template_id=nothing)
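The source gives no example for template. A minimal sketch based on the signature alone might look like the following. The file path is a placeholder, and treating chain_id and template_id as chain identifier strings is an assumption. How templates are attached to a BoltzInput is not shown in the source and is left out here.

```julia
using PyBoltz.Schema: template

# "path/to/template.cif" is a placeholder path, not a real file.
template(cif = "path/to/template.cif")

# Assumption: chain_id and template_id are chain identifier strings.
template(cif = "path/to/template.cif", chain_id = "A", template_id = "B")
```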
Properties
PyBoltz.Schema.affinity - Function
affinity(; binder)
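The source gives no example for affinity. Going by the signature, and assuming that binder is a chain_id as it is for pocket, a call might look like this. The chain name "B1" is made up, and the way the result is passed to BoltzInput is not shown in the source.

```julia
using PyBoltz.Schema: affinity

# Assumption: binder is the chain_id of the ligand of interest.
affinity(binder = "B1")
```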