API
predict
PyBoltz.predict — Function
predict(input, [output_type]; options...)
Run Boltz-1 prediction with the given input, output type, and options.
Input types
- AbstractString: Path to a FASTA/YAML file or directory (for batching).
- BoltzInput: A single PyBoltz.Schema.BoltzInput object.
- Vector{BoltzInput}: A vector of PyBoltz.Schema.BoltzInput objects for batching.
Output types
By default, raw results will be written to disk in the out_dir directory (see options).
For convenience, output_type can be provided as a second argument to reduce manual file I/O.
If output_type is provided, the function returns a single object when the input is a BoltzInput, and a vector when the input is an AbstractString or a Vector{BoltzInput}.
The following output types are supported:
- BioStructures.MolecularStructure: a rich and robust representation of molecular structures.
- ProteinChains.ProteinStructure: a flat and specialized representation of protein structures for convenience.
Options
Numeric Options
- devices::Integer: Number of devices to use. Default: 1.
- recycling_steps::Integer: Number of recycling steps. Default: 3.
- sampling_steps::Integer: Number of sampling steps. Default: 200.
- diffusion_samples::Integer: Number of diffusion samples. Default: 1.
- step_scale::Float64: Step size related to temperature. Default: 1.638.
- num_workers::Integer: Number of dataloader workers. Default: 2.
- seed::Integer: RNG seed. Default: none.
String Options
- out_dir::String: Path where the predictions are saved.
- cache::String: Directory where the data and model are downloaded. Defaults to a Scratch.jl-backed directory created at module init; call clear_cache() to reset it.
- checkpoint::String: Optional checkpoint path; defaults to the Boltz-1 model.
- accelerator::String: 'gpu', 'cpu', or 'tpu'. Default: 'gpu'.
- output_format::String: 'pdb' or 'mmcif'. Default: 'mmcif'.
- msa_server_url::String: MSA server URL; requires use_msa_server=true.
- msa_pairing_strategy::String: 'greedy' or 'complete'; requires use_msa_server=true.
Boolean Flags
- write_full_pae::Bool: Dump PAE to an npz file. Default: true.
- write_full_pde::Bool: Dump PDE to an npz file. Default: false.
- override::Bool: Override existing predictions. Default: false.
- use_msa_server::Bool: Use MMSeqs2 server for MSA generation. Default: false.
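As a rough illustration of how the input, output type, and options fit together, the sketch below builds one BoltzInput and passes it to predict with a few of the options listed above. The sequence, the input name, the "predictions" directory, and the helper name run_demo are placeholders chosen for this example. The call is wrapped in a function so that nothing heavy (model download, inference) runs until it is invoked. It assumes BioStructures is installed and exports MolecularStructure.

```julia
using PyBoltz
using PyBoltz.Schema: BoltzInput, protein
using BioStructures: MolecularStructure

# Placeholder single-chain input; "RHKDE" is not a real protein.
demo_input = BoltzInput(
    name = "demo",
    sequences = [protein(id="A", sequence="RHKDE")],
)

# Hypothetical helper: runs one prediction and returns the parsed structure.
function run_demo(input)
    return PyBoltz.predict(
        input, MolecularStructure;   # output type as the second argument
        out_dir = "predictions",     # raw results are still written here
        accelerator = "cpu",         # one of the documented accelerator values
        use_msa_server = true,       # generate the MSA via the MMSeqs2 server
        seed = 0,
    )
end
```

Calling run_demo(demo_input) should return a single structure, since a single BoltzInput was passed; passing a vector of inputs to predict would return a vector instead.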
Schema submodule
PyBoltz.Schema.BoltzInput — Type
BoltzInput
A dictionary object that can be written to a YAML file.
Implemented according to the schema definition in the boltz documentation, allowing for easy in-memory construction of the schema.
Additions
- name is an optional argument that changes the name of the output file/structure.
- Sequences passed to protein, dna, and rna get automatically converted to strings, so any type (e.g. BioSequences.BioSequence) that has sensible Base.string-conversion defined will work.
- msa can be provided as a vector of sequences.
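A small sketch of the string conversion described above, assuming BioSequences is installed. The variable names are illustrative, and the claim that both calls describe the same chain rests only on the note about automatic conversion.

```julia
using BioSequences
using PyBoltz.Schema: protein

aa_seq = aa"RHKDE"            # a BioSequences amino-acid sequence
seq_string = string(aa_seq)   # its plain-string form

# Assumption: per the conversion note, these two calls describe the same chain.
chain_from_bioseq = protein(id="A", sequence=aa_seq)
chain_from_string = protein(id="A", sequence=seq_string)
```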
Examples
using PyBoltz.Schema
input1 = BoltzInput(
    name = "example1", # optional name of the YAML file (and thus of the output pdb/cif file)
    sequences = [
        protein(
            id = ["A", "B"],
            sequence = seq,
            msa = [seq, other...] # or path to a3m file
        ),
        ligand(
            id = ["C", "D"],
            ccd = "SAH"
        ),
        ligand(
            id = ["E", "F"],
            smiles = "N[C@@H](Cc1ccc(O)cc1)C(=O)O"
        )
    ]
)
input2 = BoltzInput(
    sequences = [
        protein(
            id = ["A1"],
            sequence = seq
        ),
        ligand(
            id = ["B1"],
            ccd = "EKY"
        )
    ],
    constraints = [
        pocket(
            binder = "B1",
            contacts = [ ("B1", 1), ("A1", 138) ]
        )
    ]
)
Sequences
The following sequence types go into the sequences vector keyword argument of BoltzInput.
PyBoltz.Schema.protein — Function
protein(; id, sequence, msa=nothing, modifications=nothing, cyclic=nothing)
using PyBoltz.Schema: protein
protein(id="A", sequence="RHKDE")
protein(id=["A", "B"], sequence="RHKDE")
protein(id="A", sequence="RHKDE", msa="path/to/msa.a3m")
protein(id="A", sequence="RHKDE", msa=["RHKDE", "RHKDE"])
protein(id="A", sequence="RHKDE", modifications=[(position=1, ccd="MSE"), (position=5, ccd="MSE")])
protein(id="A", sequence="RHKDE", cyclic=true)
PyBoltz.Schema.dna — Function
dna(; id, sequence)
using PyBoltz.Schema: dna
dna(id="A", sequence="GATTACA")
dna(id=["A", "B"], sequence="GATTACA")
dna(id="A", sequence="GATTACA", modifications=[(position=2, ccd="6MA"), (position=6, ccd="5MC")]) # untested
dna(id="A", sequence="GATTACA", cyclic=true)
PyBoltz.Schema.rna — Function
rna(; id, sequence)
using PyBoltz.Schema: rna
rna(id="A", sequence="GAUUACA")
rna(id=["A", "B"], sequence="GAUUACA")
rna(id="A", sequence="GAUUACA", modifications=[(position=2, ccd="I"), (position=3, ccd="PSU")]) # untested
rna(id="A", sequence="GAUUACA", cyclic=true)
PyBoltz.Schema.ligand — Function
ligand(; id, smiles=nothing, ccd=nothing)
using PyBoltz.Schema: ligand
ligand(id="C", smiles="C1=CC=CC=C1")
ligand(id=["D", "E"], ccd="SAH")
Constraints
The following constraint types go into the constraints vector keyword argument of BoltzInput.
PyBoltz.Schema.bond — Function
bond(; atom1, atom2)
using PyBoltz.Schema: bond
# atom1 and atom2 are tuples of (chain_id, residue_index, atom_name)
bond(atom1=("A", 1, "CA"), atom2=("B", 2, "CA"))
PyBoltz.Schema.pocket — Function
pocket(; binder, contacts, max_distance=nothing)
using PyBoltz.Schema: pocket
# binder is a chain_id
# contacts is a vector of (chain_id, residue_index) tuples
pocket(binder="A", contacts=[("B", 1), ("C", 2)])
PyBoltz.Schema.contact — Function
contact(; token1, token2, max_distance=nothing)
Templates
PyBoltz.Schema.template — Function
template(; cif, chain_id=nothing, template_id=nothing)
Properties
PyBoltz.Schema.affinity — Function
affinity(; binder)
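The contact, template, and affinity entries above list only signatures, so the following is a tentative sketch of how they might be called. The keyword names come from those signatures. The argument values are assumptions: tokens are written as (chain_id, residue_index) tuples by analogy with pocket contacts, the CIF path is a placeholder, and the chain IDs are invented for illustration.

```julia
using PyBoltz.Schema: contact, template, affinity

# Assumption: tokens follow the (chain_id, residue_index) form used by pocket.
contact_constraint = contact(token1=("A", 10), token2=("B", 25), max_distance=6.0)

# Placeholder path; chain_id is assumed to name a chain of the input.
structural_template = template(cif="path/to/template.cif", chain_id="A")

# Assumption: binder is the chain ID of a ligand, as in pocket.
affinity_property = affinity(binder="C")
```

Since contact is listed under Constraints, it presumably goes into the constraints vector of BoltzInput alongside bond and pocket.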