API

`predict`

PyBoltz.predict — Function

predict(input, [output_type]; options...)

Run Boltz-1 prediction with the given input, output type, and options.

Input types

AbstractString: Path to a FASTA/YAML file or directory (for batching).
BoltzInput: A single PyBoltz.Schema.BoltzInput object.
Vector{BoltzInput}: A vector of PyBoltz.Schema.BoltzInput objects for batching.

Output types

By default, raw results will be written to disk in the out_dir directory (see options).

For convenience, output_type can be provided as a second argument to reduce manual file I/O.

If output_type is provided, the function returns a single object when the input is a BoltzInput, and a vector when the input is an AbstractString or a Vector{BoltzInput}.

The following output types are supported:

BioStructures.MolecularStructure: a rich and robust representation of molecular structures.
ProteinChains.ProteinStructure: a flat and specialized representation of protein structures for convenience.
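
The return-shape rule above can be sketched as follows. This is a minimal sketch, assuming PyBoltz and BioStructures are installed and a working Boltz backend is available; the sequence is a placeholder, and use_msa_server=true is set here only so that no local MSA has to be supplied.

```julia
using PyBoltz, PyBoltz.Schema
using BioStructures: MolecularStructure

# Placeholder input: a single short protein chain.
input = BoltzInput(
    name = "example",
    sequences = [protein(id = "A", sequence = "RHKDE")],
)

# A single BoltzInput should yield a single structure.
structure = PyBoltz.predict(input, MolecularStructure; use_msa_server = true)

# A vector of inputs should yield a vector of structures.
structures = PyBoltz.predict([input], MolecularStructure; use_msa_server = true)
```

Passing ProteinChains.ProteinStructure as the second argument should work the same way, returning the flat protein-specific representation instead.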

Options

Numeric Options

devices::Integer: Number of devices to use. Default: 1.
recycling_steps::Integer: Number of recycling steps. Default: 3.
sampling_steps::Integer: Number of sampling steps. Default: 200.
diffusion_samples::Integer: Number of diffusion samples. Default: 1.
step_scale::Float64: Step size related to temperature. Default: 1.638.
num_workers::Integer: Number of dataloader workers. Default: 2.
seed::Integer: RNG seed. Default: none.

String Options

out_dir::String: Directory in which to save the predictions.
cache::String: Directory to which the data and model are downloaded. Defaults to a Scratch.jl-backed directory created at module init; call clear_cache() to reset it.
checkpoint::String: Optional checkpoint path. Defaults to the Boltz-1 model.
accelerator::String: "gpu", "cpu", or "tpu". Default: "gpu".
output_format::String: "pdb" or "mmcif". Default: "mmcif".
msa_server_url::String: MSA server URL; requires use_msa_server=true.
msa_pairing_strategy::String: "greedy" or "complete"; requires use_msa_server=true.

Boolean Flags

write_full_pae::Bool: Dump PAE to an npz file. Default: true.
write_full_pde::Bool: Dump PDE to an npz file. Default: false.
override::Bool: Override existing predictions. Default: false.
use_msa_server::Bool: Use MMseqs2 server for MSA generation. Default: false.
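
The options listed above are passed as keyword arguments. The following is a minimal sketch, assuming PyBoltz is installed; the input sequence, the output directory name, and the specific option values are placeholders chosen for illustration.

```julia
using PyBoltz, PyBoltz.Schema

# Placeholder input: a single short protein chain.
input = BoltzInput(sequences = [protein(id = "A", sequence = "RHKDE")])

# Options are passed as keyword arguments after the semicolon.
PyBoltz.predict(input;
    out_dir = "boltz_results",  # hypothetical output directory
    accelerator = "cpu",
    sampling_steps = 50,
    diffusion_samples = 2,
    seed = 0,
    use_msa_server = true,
    override = true,
)
```

Because no output_type is given, the raw results are written to out_dir rather than returned as structure objects.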

`Schema` submodule

PyBoltz.Schema.BoltzInput — Type

BoltzInput

A dictionary object that can be written to a YAML file.

Implemented according to the schema definition in the boltz documentation, allowing for easy in-memory construction of the schema.

Additions

name is an optional argument that changes the name of the output file/structure.
Sequences passed to protein, dna, and rna are automatically converted to strings, so any type with a sensible Base.string conversion (e.g. BioSequences.BioSequence) will work.
msa can be provided as a vector of sequences.
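
The string-conversion rule can be illustrated with a small stand-in type. ToySequence below is hypothetical and exists only for this sketch; the point is that defining Base.string for a type is what makes it usable as a sequence argument.

```julia
# Hypothetical sequence type, used only for illustration.
struct ToySequence
    residues::Vector{Char}
end

# Define the conversion that the note above relies on.
Base.string(s::ToySequence) = join(s.residues)

seq = ToySequence(collect("RHKDE"))
as_text = string(seq)  # plain string form of the sequence

# With PyBoltz loaded, the same object could then be passed directly:
# protein(id = "A", sequence = seq)
```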

Examples

using PyBoltz.Schema

input1 = BoltzInput(
    name = "example1", # optional name for the YAML file (and thus the output pdb/cif file)
    sequences = [
        protein(
            id = ["A", "B"],
            sequence = seq,
            msa = [seq, other...] # or path to a3m file
        ),
        ligand(
            id = ["C", "D"],
            ccd = "SAH"
        ),
        ligand(
            id = ["E", "F"],
            smiles = "N[C@@H](Cc1ccc(O)cc1)C(=O)O"
        )
    ]
)

input2 = BoltzInput(
    sequences = [
        protein(
            id = ["A1"],
            sequence = seq
        ),
        ligand(
            id = ["B1"],
            ccd = "EKY"
        )
    ],
    constraints = [
        pocket(
            binder = "B1",
            contacts = [ ("B1", 1), ("A1", 138) ]
        )
    ]
)

Sequences

The following sequence types go into the sequences vector keyword argument of BoltzInput.

PyBoltz.Schema.protein — Function

protein(; id, sequence, msa=nothing, modifications=nothing, cyclic=nothing)

using PyBoltz.Schema: protein
protein(id="A", sequence="RHKDE")
protein(id=["A", "B"], sequence="RHKDE")
protein(id="A", sequence="RHKDE", msa="path/to/msa.a3m")
protein(id="A", sequence="RHKDE", msa=["RHKDE", "RHKDE"])
protein(id="A", sequence="RHKDE", modifications=[(position=1, ccd="MSE"), (position=5, ccd="MSE")])
protein(id="A", sequence="RHKDE", cyclic=true)

PyBoltz.Schema.dna — Function

dna(; id, sequence)

using PyBoltz.Schema: dna
dna(id="A", sequence="GATTACA")
dna(id=["A", "B"], sequence="GATTACA")
dna(id="A", sequence="GATTACA", modifications=[(position=2, ccd="6MA"), (position=6, ccd="5MC")]) # untested
dna(id="A", sequence="GATTACA", cyclic=true)

PyBoltz.Schema.rna — Function

rna(; id, sequence)

using PyBoltz.Schema: rna
rna(id="A", sequence="GAUUACA")
rna(id=["A", "B"], sequence="GAUUACA")
rna(id="A", sequence="GAUUACA", modifications=[(position=2, ccd="I"), (position=3, ccd="PSU")]) # untested
rna(id="A", sequence="GAUUACA", cyclic=true)

PyBoltz.Schema.ligand — Function

ligand(; id, smiles=nothing, ccd=nothing)

using PyBoltz.Schema: ligand
ligand(id="C", smiles="C1=CC=CC=C1")
ligand(id=["D", "E"], ccd="SAH")

Constraints

The following constraint types go into the constraints vector keyword argument of BoltzInput.

PyBoltz.Schema.bond — Function

bond(; atom1, atom2)

using PyBoltz.Schema: bond
# atom1 and atom2 are tuples of (chain_id, residue_index, atom_name)
bond(atom1=("A", 1, "CA"), atom2=("B", 2, "CA"))

PyBoltz.Schema.pocket — Function

pocket(; binder, contacts, max_distance=nothing)

using PyBoltz.Schema: pocket
# binder is a chain_id
# contacts is a vector of (chain_id, residue_index) tuples
pocket(binder="A", contacts=[("B", 1), ("C", 2)])

PyBoltz.Schema.contact — Function

contact(; token1, token2, max_distance=nothing)
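
No example is given for contact. By analogy with pocket, a call might look like the sketch below. The token format, (chain_id, residue_index) tuples, and the distance value are assumptions rather than something this page confirms.

```julia
using PyBoltz.Schema: contact

# Assumption: tokens are (chain_id, residue_index) tuples, as in `pocket` contacts.
c1 = contact(token1 = ("A", 1), token2 = ("B", 2))

# max_distance is optional; 6.0 is an arbitrary illustrative value.
c2 = contact(token1 = ("A", 1), token2 = ("B", 2), max_distance = 6.0)
```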

Templates

PyBoltz.Schema.template — Function

template(; cif, chain_id=nothing, template_id=nothing)
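
A call matching the signature above might look like the following sketch. The file path is a placeholder, and passing chain_id as a single chain identifier string is an assumption.

```julia
using PyBoltz.Schema: template

# The path is a placeholder for an existing mmCIF file.
t1 = template(cif = "path/to/template.cif")

# Assumption: chain_id takes the identifier of a chain defined in the input.
t2 = template(cif = "path/to/template.cif", chain_id = "A")
```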

Properties

PyBoltz.Schema.affinity — Function

affinity(; binder)
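
A call matching the signature above might look like the following sketch. That binder refers to the chain_id of a ligand defined in the sequences of the same input is an assumption, made by analogy with the binder argument of pocket.

```julia
using PyBoltz.Schema: affinity

# Assumption: binder is the chain_id of a ligand defined in `sequences`.
a = affinity(binder = "B")
```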
