DLProteinFormats

DLProteinFormats.flatten
DLProteinFormats.sample_batched_inds
DLProteinFormats.unflatten
ProteinChains.writepdb

DLProteinFormats.flatten — Method

flatten(rec::ProteinStructure; T = Float32)

Takes a ProteinStructure and returns a tuple of the translations, rotations, residue indices, and features for each chain.
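
As a rough illustration of what a flattened layout can look like, the sketch below builds toy per-residue arrays for two chains by hand. It does not call `flatten`; the variable names and the concatenated layout are assumptions chosen to match the array shapes documented for `unflatten` (3×1×L translations, 3×3×L rotations), not a description of the package's internals.

```julia
# Illustrative only: toy "flattened" arrays for two chains of 3 and 2 residues.
# Names and layout are assumptions, not DLProteinFormats internals.
chain_lengths = [3, 2]
L = sum(chain_lengths)

# One translation (3×1) and one rotation (3×3) per residue, stored as Float32.
locs = zeros(Float32, 3, 1, L)
rots = zeros(Float32, 3, 3, L)
for i in 1:L
    locs[:, 1, i] .= Float32(i)                      # placeholder coordinates
    rots[:, :, i] .= Float32[1 0 0; 0 1 0; 0 0 1]    # identity frames
end

# Chain id of every residue, concatenated across chains.
chainids = reduce(vcat, [fill(c, n) for (c, n) in enumerate(chain_lengths)])

# Residue numbering restarts within each chain.
resinds = reduce(vcat, [collect(1:n) for n in chain_lengths])
```

Storing every chain in one set of arrays, with `chainids` marking the boundaries, keeps a whole structure in a fixed number of dense arrays regardless of how many chains it has.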


DLProteinFormats.sample_batched_inds — Method

sample_batched_inds(flatrecs; l2b = length2batch(1000, 1.9))

Takes a vector of (flattened) protein structures and returns a vector of index batches into the original vector, with each batch containing a random sample of one protein from each cluster.
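
The idea of sampling one protein per cluster and grouping the picks into batches can be sketched as follows. This is a simplified stand-in, not the package's implementation: `one_per_cluster_batches` is a hypothetical helper, it takes cluster labels directly rather than flattened records, and it uses a fixed `batchsize` instead of modeling the `l2b` keyword.

```julia
using Random

# Hypothetical helper: pick one random index from each cluster, then chunk the
# picks into batches. `clusters[i]` is the cluster label of record i.
# A fixed `batchsize` is an assumption; the `l2b` keyword is not modeled here.
function one_per_cluster_batches(clusters::AbstractVector, batchsize::Integer;
                                 rng = Random.default_rng())
    # Group record indices by cluster label.
    groups = Dict{eltype(clusters), Vector{Int}}()
    for (i, c) in enumerate(clusters)
        push!(get!(groups, c, Int[]), i)
    end
    # Draw one member per cluster and shuffle the picks.
    picked = shuffle!(rng, [rand(rng, inds) for inds in values(groups)])
    # Split the picked indices into consecutive batches.
    return [picked[i:min(i + batchsize - 1, end)] for i in 1:batchsize:length(picked)]
end

clusters = [1, 1, 2, 3, 3, 3, 4]
batches = one_per_cluster_batches(clusters, 2)
```

Each call draws a fresh sample, so repeated calls visit different members of large clusters while never placing two proteins from the same cluster in one pass.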


DLProteinFormats.unflatten — Method

unflatten(locs, rots, seqints, chainids, resnums)
unflatten(locs, rots, seqhots, chainids, resnums)
unflatten(locs, rots, seq, chainids, resnums)

Converts flattened protein structure data back into ProteinChain objects.

Arguments

locs: Array of translations/locations (3×1×L or 3×1×L×B for batched)
rots: Array of rotations (3×3×L or 3×3×L×B for batched)
seqints/seqhots/seq: Sequence data as integers, one-hot encoding, or generic sequence
chainids: Chain identifiers for each residue
resnums: Residue numbers for each position

Returns

Vector of ProteinChain objects (or vector of vectors for batched input)

The function reconstructs protein chains from flattened representations, applying unit scaling to locations and converting sequence integers back to amino acid strings.
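
The reconstruction step can be pictured with a small stand-alone sketch that splits per-residue arrays by chain id and decodes integer sequence codes into strings. `split_by_chain` and the default `alphabet` ordering are hypothetical, introduced only for illustration; the real function returns ProteinChain objects, also handles rotations, residue numbers and unit scaling, and may use a different integer-to-residue mapping.

```julia
# Hypothetical helper: split flat per-residue data into per-chain pieces and
# decode integer sequence codes. The alphabet ordering is an assumption.
function split_by_chain(locs::AbstractArray{<:Real,3},
                        seqints::AbstractVector{<:Integer},
                        chainids::AbstractVector;
                        alphabet = collect("ACDEFGHIKLMNPQRSTVWY"))
    map(unique(chainids)) do c
        idx = findall(==(c), chainids)              # residues belonging to chain c
        (id = c,
         coords = locs[:, 1, idx],                  # 3×n matrix of positions
         sequence = String(alphabet[seqints[idx]])) # integer codes -> string
    end
end

locs = rand(Float32, 3, 1, 5)
chains = split_by_chain(locs, [1, 2, 3, 4, 5], [1, 1, 1, 2, 2])
```

With the assumed alphabet, the two toy chains decode to "ACD" and "EF".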


ProteinChains.writepdb — Function

writepdb(path, chains::AbstractVector{<:ProteinChains.ProteinChain})

Writes a vector of ProteinChain objects to a PDB file at path.

Examples

using DLProteinFormats

data = DLProteinFormats.load(PDBSimpleFlat500);

flat_chains = data[1];

chains = DLProteinFormats.unflatten(
    flat_chains.locs,
    flat_chains.rots,
    flat_chains.AAs,
    flat_chains.chainids,
    flat_chains.resinds) # unflatten the flat data

writepdb("chains-1.pdb", chains) # view in e.g. ChimeraX or the VS Code protein viewer extension
