ProteinChains

ProteinChains.STANDARD_RESIDUE_TEMPLATE
ProteinChains.BackboneGeometry
ProteinChains.IdealResidue
ProteinChains.ProteinChain
ProteinChains.ProteinStructure
ProteinChains.ProteinStructureStore
ProteinChains.ProteinStructureStore
ProteinChains.all_atom_coords
ProteinChains.append_residue
ProteinChains.atom_contact_order
ProteinChains.cysteine_features
ProteinChains.deserialize
ProteinChains.interchain_atom_contact_proportion
ProteinChains.interchain_contact_counts
ProteinChains.interchain_residue_contact_proportion
ProteinChains.mapmmcif
ProteinChains.pdbentry
ProteinChains.prepend_residue
ProteinChains.residue_contact_order
ProteinChains.residue_pair_min_distances
ProteinChains.serialize
ProteinChains.shape_features
ProteinChains.structure_atom_coords

ProteinChains.STANDARD_RESIDUE_TEMPLATE — Constant

STANDARD_RESIDUE_TEMPLATE

This is a template of a "standard residue", with a very specific and distinct shape, size, and orientation. which needs to be consistent if we want to represent protein structures as sets of residue rotations and translations.

Thus, we can use this residue as a template for aligning other residues with very precise geometry to it.

julia> IdealResidue{Float64}(BackboneGeometry(N_Ca_C_angle = 1.93); template=ProteinChains.STANDARD_RESIDUE_TEMPLATE)
3×3 IdealResidue{Float64}:
 -1.06447   -0.199174   1.26364
  0.646303  -0.529648  -0.116655
  0.0        0.0        0.0

source

ProteinChains.BackboneGeometry — Type

BackboneGeometry(;
    N_Ca_length = 1.46,
    Ca_C_length = 1.52,
    C_N_length = 1.33,

    N_Ca_C_angle = 1.94,
    Ca_C_N_angle = 2.03,
    C_N_Ca_angle = 2.13,
)

Define the idealized bond lengths and bond angles of a protein backbone.

source

ProteinChains.IdealResidue — Type

IdealResidue{T<:AbstractFloat} <: AbstractMatrix{T}

IdealResidue{T}(backbone_geometry=DEFAULT_BACKBONE_GEOMETRY; template=nothing) where T

A 3x3 matrix representing the idealized geometry of a protein residue, with columns representing the N, Ca, and C atom positions of a residue positioned at the origin.

source

ProteinChains.ProteinChain — Type

ProteinChain{T<:Real}

Represents a protein chain with a basic set of fields from which some other properties might be derived.

source

ProteinChains.ProteinStructure — Type

ProteinStructure{T} <: AbstractVector{ProteinChain{T}}

Fields

name::String: Usually just the base name of the original file.
chains::Vector{ProteinChain{T}}: a collection of ProteinChains.
atoms::Vector{Atom{T}}: free atoms from the structure that were not part of any protein residue.

source

ProteinChains.ProteinStructureStore — Type

ProteinStructureStore <: AbstractDict{InlineStrings.String31,ProteinStructure}

A JLD2-based store for protein structures implementing the AbstractDict interface, allowing for dictionary operations on the stored structures.

Keys are stored as InlineStrings.String31 objects to reduce references. This means keys are limited to 31 bytes.

A ProteinStructureStore gets closed automatically when there no longer exists a program-accessible reference to it.

Examples

julia> store = ProteinStructureStore("store.pss")
ProteinStructureStore with 0 entries

julia> store["3HFM"] = pdb"3HFM"
[ Info: Downloading file from PDB: 3HFM
3-element ProteinStructure "3HFM.cif":
 215-residue ProteinChain{Float64} (H)
 214-residue ProteinChain{Float64} (L)
 129-residue ProteinChain{Float64} (Y)

julia> store
ProteinStructureStore with 1 entry

julia> keys(store)
Set{InlineStrings.String31} with 1 element:
  InlineStrings.String31("3HFM")

julia> delete!(store, "3HFM")
ProteinStructureStore with 0 entries

source

ProteinChains.ProteinStructureStore — Method

ProteinStructureStore(f::Function, filename, mode="a+")

source

ProteinChains.all_atom_coords — Method

all_atom_coords(chain)

Return a 3 × N matrix of all-atom Cartesian coordinates for a ProteinChain. This is written to be type-stable and allocation-friendly.

source

ProteinChains.append_residue — Method

append_residue(Backbone::Backbone, torsion_angles::Vector{<:Real}; ideal::BackboneGeometry=DEFAULT_BACKBONE_GEOMETRY)

Create a new backbone by appending 3 new torsion angles (ψ, ω, ϕ) at the end, using bond lengths and bond angles specified in BackboneGeometry.

source

ProteinChains.atom_contact_order — Method

atom_contact_order(chain; cutoff = 6.0)

Atom-level contact order (ALCO) matching the definition in Eric Alm's contactOrder.pl script.

For a single chain:

consider all non-hydrogen atoms,
for each contacting atom pair within cutoff Å, with positive sequence separation in chain.numbering, accumulate that separation;
the absolute CO is order / counts;
the relative CO is absolute_CO / (max(numbering) - min(numbering) + 1).

Results should match Plaxco et al. 1998, Table 1.

source

ProteinChains.cysteine_features — Method

cysteine_features(structure; distance_cutoff = 2.5)

For each chain, count:

num_paired::Int: number of cysteines that participate in a disulfide bond (including those paired with cysteines in other chains),
num_cys::Int: total number of cysteines in the chain.

Pairing is defined by SG–SG distances < distance_cutoff Å, including pairs between different chains. Returns a vector of named tuples, one per chain.

source

ProteinChains.deserialize — Method

deserialize(filename::AbstractString)

Deserialize ProteinStructure objects from a JLD2 file. Returns a Vector{ProteinStructure} of all structures stored in the file.

source

ProteinChains.interchain_atom_contact_proportion — Method

interchain_atom_contact_proportion(structure; distance_cutoff = 5.0)

Atom-level inter-chain contact proportion: for each chain, returns the fraction of atoms that are within distance_cutoff Å of at least one atom in another chain.

Results should match Plaxco et al. 1998, Table 1.

source

ProteinChains.interchain_contact_counts — Method

interchain_contact_counts(structure; distance_cutoff = 5.0)

For each chain, count how many residues make contact with at least one residue in any other chain, based on all-atom contacts with a distance cutoff distance_cutoff (Å).

Returns a vector of integers, one per chain.

source

ProteinChains.interchain_residue_contact_proportion — Method

interchain_residue_contact_proportion(structure; distance_cutoff = 5.0)

Residue-level inter-chain contact proportion: for each chain, returns the fraction of residues that have at least one atom within distance_cutoff Å of any atom in another chain.

source

ProteinChains.mapmmcif — Method

mapmmcif(mmcifdict, field1 => field2, field3 => field4, ...)

julia> import BioStructures

julia> filename = BioStructures.downloadpdb("3HFM", format=BioStructures.MMCIFFormat);
[ Info: Downloading file from PDB: 3HFM

julia> mmcifdict = BioStructures.MMCIFDict(filename);

julia> mapmmcif(mmcifdict,
           "_atom_site.auth_asym_id"   => "_atom_site.label_entity_id",
           "_entity_src_gen.entity_id" => "_entity_src_gen.pdbx_gene_src_ncbi_taxonomy_id")
Dict{String, String} with 3 entries:
  "Y" => "9031"
  "L" => "10090"
  "H" => "10090"

source

ProteinChains.pdbentry — Method

pdbentry(pdbid::AbstractString; format=MMCIFFormat, kws...)

Keyword arguments get propagated to BioStructures.downloadpdb

Downloads are cached in a temporary directory.

Examples

julia> pdbentry("1EYE")
[ Info: Downloading file from PDB: 1EYE
1-element ProteinStructure "1EYE.cif":
 256-residue ProteinChain{Float64} (A)

julia> pdb"1EYE" # string macro for convenience
[ Info: File exists: 1EYE
1-element ProteinStructure "1EYE.cif":
 256-residue ProteinChain{Float64} (A)

julia> pdb"1EYE"A # string suffix to get a specific chain
[ Info: File exists: 1EYE
256-residue ProteinChain{Float64} (A)

julia> pdb"1EYE"1 # integer suffix to specify "ba_number" keyword
[ Info: Downloading file from PDB: 1EYE
2-element ProteinStructure "1EYE_ba1.cif":
 256-residue ProteinChain{Float64} (A)
 256-residue ProteinChain{Float64} (A-2)

source

ProteinChains.prepend_residue — Method

append_residue(Backbone::Backbone, torsion_angles::Vector{<:Real}; ideal::BackboneGeometry=DEFAULT_BACKBONE_GEOMETRY)

Create a new backbone by prepending 3 new torsion angles (ψ, ω, ϕ) at the beginning, using bond lengths and bond angles specified in the BackboneGeometry.

Note

The torsion angle order is the same as it would be when appending. The order is not reversed.

source

ProteinChains.residue_contact_order — Method

residue_contact_order(chain; distance_thresholds = (6.0, 8.0, 10.0))

Compute length-normalized contact order for a single chain, using all-atom contacts.

For each distance threshold d, contacts are residue pairs with any inter-atomic distance < d Å, excluding pairs with sequence-number separation ≤ 1 based on chain.numbering.

Contact order is defined as

CO = (1 / (N * Leff)) * sum{(i,j) in C} |numbering[j] - numbering[i]|

where L_eff is maximum(numbering) - minimum(numbering) + 1 and N is the number of contacts at that threshold.

source

ProteinChains.residue_pair_min_distances — Method

residue_pair_min_distances(chain; max_distance; use_numbering = false)

Compute the minimum all-atom distance for each residue pair in chain whose minimum inter-atomic distance is ≤ max_distance, using a KDTree over all atoms. Returns a Dict{Tuple{Int,Int},Float64} mapping an ordered residue identifier pair (i, j) with i < j to the minimum distance in Å.

If use_numbering == false (default), the residue identifiers are 1-based indices into the chain (i.e. positions in chain.atoms). If use_numbering == true, the identifiers are taken from chain.numbering.

source

ProteinChains.serialize — Method

serialize(filename::AbstractString, structures::AbstractVector{<:ProteinStructure})

Serialize a vector of ProteinStructure objects to a JLD2 file. This function creates a new ProteinStructureStore and writes each structure in the input vector to it. Each structure is stored using its name as the key.

source

ProteinChains.shape_features — Method

shape_features(chain)

For a single ProteinChain, return a named tuple with:

radius_of_gyration: standard R_g based on all atoms;
mean_pairwise_distance: mean distance between all atom pairs;
max_pairwise_distance: maximum distance between any two atoms.

source

ProteinChains.structure_atom_coords — Method

structure_atom_coords(structure)

Flatten all atoms from all chains in structure into a single coordinate matrix and parallel vectors giving the chain index and residue index for each atom.

source