API

predict

PyBoltz.predict (Function)
predict(input, [output_type]; options...)

Run Boltz-1 prediction with the given input, output type, and options.

Input types

The input may be a single BoltzInput, a Vector{BoltzInput}, or an AbstractString.

Output types

By default, raw results will be written to disk in the out_dir directory (see options).

For convenience, output_type can be provided as a second argument to reduce manual file I/O.

If output_type is provided, the function returns a single object when the input is a BoltzInput, and a vector when the input is an AbstractString or a Vector{BoltzInput}.

The following output types are supported:

  • BioStructures.MolecularStructure: a rich and robust representation of molecular structures.
  • ProteinChains.ProteinStructure: a flat and specialized representation of protein structures for convenience.
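
The return-value rule above can be sketched as follows. This is a minimal illustration, not a definitive recipe: it assumes PyBoltz, BioStructures, and ProteinChains are installed, that a working Boltz backend is available, and that MSAs are fetched through use_msa_server=true. The sequences and names are made up for the example.

```julia
using PyBoltz, PyBoltz.Schema
using BioStructures: MolecularStructure
using ProteinChains: ProteinStructure

# Two toy inputs; the names and sequences are hypothetical.
input_a = BoltzInput(name = "toy_a", sequences = [protein(id = "A", sequence = "RHKDE")])
input_b = BoltzInput(name = "toy_b", sequences = [protein(id = "A", sequence = "RHKDD")])

# A single BoltzInput in, a single object out.
structure = predict(input_a, MolecularStructure; use_msa_server = true)

# A Vector{BoltzInput} in, a vector out (one element per input).
structures = predict([input_a, input_b], ProteinStructure; use_msa_server = true)
```

Without the second argument, the same calls only write raw results to out_dir.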

Options

Numeric Options

  • devices::Integer: Number of devices to use. Default: 1.
  • recycling_steps::Integer: Number of recycling steps. Default: 3.
  • sampling_steps::Integer: Number of sampling steps. Default: 200.
  • diffusion_samples::Integer: Number of diffusion samples. Default: 1.
  • step_scale::Float64: Step size related to temperature. Default: 1.638.
  • num_workers::Integer: Number of dataloader workers. Default: 2.
  • seed::Integer: Random number generator seed. Default: none.

String Options

  • out_dir::String: The directory in which to save the predictions.
  • cache::String: The directory into which the data and model are downloaded. Defaults to a Scratch.jl-backed directory created at module init; call clear_cache() to reset it.

  • checkpoint::String: Optional checkpoint path; defaults to Boltz-1 model.
  • accelerator::String: "gpu", "cpu", or "tpu". Default: "gpu".
  • output_format::String: "pdb" or "mmcif". Default: "mmcif".
  • msa_server_url::String: MSA server URL; requires use_msa_server=true.
  • msa_pairing_strategy::String: "greedy" or "complete"; requires use_msa_server=true.

Boolean Flags

  • write_full_pae::Bool: Dump the predicted aligned error (PAE) to an npz file. Default: true.
  • write_full_pde::Bool: Dump the predicted distance error (PDE) to an npz file. Default: false.
  • override::Bool: Overwrite existing predictions. Default: false.
  • use_msa_server::Bool: Use MMSeqs2 server for MSA generation. Default: false.
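
As a rough sketch of how the options above are passed, the call below supplies a few of them as keyword arguments. The directory name, seed, and sequence are hypothetical values chosen for illustration, and the example assumes a working Boltz backend and access to the MSA server.

```julia
using PyBoltz, PyBoltz.Schema

input = BoltzInput(name = "options_demo", sequences = [protein(id = "A", sequence = "RHKDE")])

out_dir = "boltz_results"        # hypothetical output directory

predict(input;
    out_dir = out_dir,
    accelerator = "cpu",         # run without a GPU
    diffusion_samples = 2,       # draw two samples instead of one
    seed = 42,                   # fix the RNG seed
    output_format = "pdb",
    use_msa_server = true,       # generate the MSA remotely
    override = true,             # replace any earlier predictions in out_dir
)
```

Because no output_type is given, the results are only written to out_dir.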

Schema submodule

PyBoltz.Schema.BoltzInput (Type)
BoltzInput

A dictionary object that can be written to a YAML file.

Implemented according to the schema definition in the boltz documentation, allowing for easy in-memory construction of the schema.

Additions

  • name is an optional argument that changes the name of the output file/structure.
  • Sequences passed to protein, dna, and rna are automatically converted to strings, so any type with a sensible Base.string method (e.g. BioSequences.BioSequence) will work.
  • msa can be provided as a vector of sequences.
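
The additions above can be illustrated with a short sketch. It assumes BioSequences is installed next to PyBoltz; the sequences and the input name are invented. The MSA entries are converted with string explicitly here as a precaution, since only the sequence argument is documented to be converted automatically.

```julia
using PyBoltz.Schema
using BioSequences               # assumed to be available in the environment

seq = aa"RHKDE"                  # amino acid sequence as a BioSequence
homolog = aa"RHKDD"              # hypothetical aligned homolog of the same length

chain = protein(
    id = "A",
    sequence = seq,                      # converted to a String via Base.string
    msa = string.([seq, homolog]),       # vector of sequences instead of an a3m path
)

input = BoltzInput(name = "bioseq_demo", sequences = [chain])
```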

Examples

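using PyBoltz.Schema

# `seq` is a protein sequence and `other` a collection of aligned sequences, both defined elsewhere.
input1 = BoltzInput(
    name = "example1", # optional name of the YAML file (and thus of the output pdb/cif file)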
    sequences = [
        protein(
            id = ["A", "B"],
            sequence = seq,
            msa = [seq, other...] # or path to a3m file
        ),
        ligand(
            id = ["C", "D"],
            ccd = "SAH"
        ),
        ligand(
            id = ["E", "F"],
            smiles = "N[C@@H](Cc1ccc(O)cc1)C(=O)O"
        )
    ]
)

input2 = BoltzInput(
    sequences = [
        protein(
            id = ["A1"],
            sequence = seq
        ),
        ligand(
            id = ["B1"],
            ccd = "EKY"
        )
    ],
    constraints = [
        pocket(
            binder = "B1",
            contacts = [ ("B1", 1), ("A1", 138) ]
        )
    ]
)

Sequences

The following sequence types go into the sequences vector keyword argument of BoltzInput.

PyBoltz.Schema.protein (Function)
protein(; id, sequence, msa=nothing, modifications=nothing, cyclic=nothing)
using PyBoltz.Schema: protein
protein(id="A", sequence="RHKDE")
protein(id=["A", "B"], sequence="RHKDE")
protein(id="A", sequence="RHKDE", msa="path/to/msa.a3m")
protein(id="A", sequence="RHKDE", msa=["RHKDE", "RHKDE"])
protein(id="A", sequence="RHKDE", modifications=[(position=1, ccd="MSE"), (position=5, ccd="MSE")])
protein(id="A", sequence="RHKDE", cyclic=true)

PyBoltz.Schema.dna (Function)
dna(; id, sequence, modifications=nothing, cyclic=nothing)
using PyBoltz.Schema: dna
dna(id="A", sequence="GATTACA")
dna(id=["A", "B"], sequence="GATTACA")
dna(id="A", sequence="GATTACA", modifications=[(position=2, ccd="6MA"), (position=6, ccd="5MC")]) # untested
dna(id="A", sequence="GATTACA", cyclic=true)

PyBoltz.Schema.rna (Function)
rna(; id, sequence, modifications=nothing, cyclic=nothing)
using PyBoltz.Schema: rna
rna(id="A", sequence="GAUUACA")
rna(id=["A", "B"], sequence="GAUUACA")
rna(id="A", sequence="GAUUACA", modifications=[(position=2, ccd="I"), (position=3, ccd="PSU")]) # untested
rna(id="A", sequence="GAUUACA", cyclic=true)

PyBoltz.Schema.ligand (Function)
ligand(; id, smiles=nothing, ccd=nothing)
using PyBoltz.Schema: ligand
ligand(id="C", smiles="C1=CC=CC=C1")
ligand(id=["D", "E"], ccd="SAH")
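
To show how the sequence constructors combine, here is a small sketch that places one entry of each kind in a single BoltzInput. The input name, chain identifiers, and sequences are invented for illustration; the only assumption beyond the text above is that chain identifiers should be unique across entries.

```julia
using PyBoltz.Schema

assembly = BoltzInput(
    name = "mixed_assembly",                           # hypothetical name
    sequences = [
        protein(id = "A", sequence = "RHKDE"),
        dna(id = ["B", "C"], sequence = "GATTACA"),    # two identical DNA chains
        rna(id = "D", sequence = "GAUUACA"),
        ligand(id = "E", ccd = "SAH"),                 # ligand given by CCD code
    ],
)
```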

Constraints

The following constraint types go into the constraints vector keyword argument of BoltzInput.

PyBoltz.Schema.bond (Function)
bond(; atom1, atom2)
using PyBoltz.Schema: bond
# atom1 and atom2 are tuples of (chain_id, residue_index, atom_name)
bond(atom1=("A", 1, "CA"), atom2=("B", 2, "CA"))

PyBoltz.Schema.pocket (Function)
pocket(; binder, contacts, max_distance=nothing)
using PyBoltz.Schema: pocket
# binder is a chain_id
# contacts is a vector of (chain_id, residue_index) tuples
pocket(binder="A", contacts=[("B", 1), ("C", 2)])
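
The sketch below combines both constraint types in one input. It is illustrative only: the chains, residue indices, and atoms are chemically arbitrary, and the max_distance value assumes the unit is ångström, as in the upstream Boltz schema, which should be checked against the Boltz documentation.

```julia
using PyBoltz.Schema

input = BoltzInput(
    sequences = [
        protein(id = "A", sequence = "RHKDE"),
        protein(id = "B", sequence = "RHKDE"),
        ligand(id = "C", ccd = "SAH"),
    ],
    constraints = [
        # Covalent bond between two atoms, each given as (chain_id, residue_index, atom_name).
        bond(atom1 = ("A", 1, "CA"), atom2 = ("B", 2, "CA")),
        # Ligand chain "C" is placed near the listed residues.
        # max_distance = 6.0 is a hypothetical value, assumed to be in ångström.
        pocket(binder = "C", contacts = [("A", 2), ("B", 4)], max_distance = 6.0),
    ],
)
```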

Templates

Properties