ProteinChains

Documentation for ProteinChains.

ProteinChains.STANDARD_RESIDUE_TEMPLATEConstant
STANDARD_RESIDUE_TEMPLATE

This is a template of a "standard residue", with a very specific and distinct shape, size, and orientation. which needs to be consistent if we want to represent protein structures as sets of residue rotations and translations.

Thus, we can use this residue as a template for aligning other residues with very precise geometry to it.

julia> IdealResidue{Float64}(BackboneGeometry(N_Ca_C_angle = 1.93); template=ProteinChains.STANDARD_RESIDUE_TEMPLATE)
3×3 IdealResidue{Float64}:
 -1.06447   -0.199174   1.26364
  0.646303  -0.529648  -0.116655
  0.0        0.0        0.0
source
ProteinChains.BackboneGeometryType
BackboneGeometry(;
    N_Ca_length = 1.46,
    Ca_C_length = 1.52,
    C_N_length = 1.33,

    N_Ca_C_angle = 1.94,
    Ca_C_N_angle = 2.03,
    C_N_Ca_angle = 2.13,
)

Define the idealized bond lengths and bond angles of a protein backbone.

source
ProteinChains.IdealResidueType
IdealResidue{T<:AbstractFloat} <: AbstractMatrix{T}

IdealResidue{T}(backbone_geometry=DEFAULT_BACKBONE_GEOMETRY; template=nothing) where T

A 3x3 matrix representing the idealized geometry of a protein residue, with columns representing the N, Ca, and C atom positions of a residue positioned at the origin.

source
ProteinChains.ProteinChainType
ProteinChain{T<:Real}

Represents a protein chain with a basic set of fields from which some other properties might be derived.

source
ProteinChains.ProteinStructureType
ProteinStructure{T} <: AbstractVector{ProteinChain{T}}

Fields

  • name::String: Usually just the base name of the original file.
  • chains::Vector{ProteinChain{T}}: a collection of ProteinChains.
  • atoms::Vector{Atom{T}}: free atoms from the structure that were not part of any protein residue.
source
ProteinChains.ProteinStructureStoreType
ProteinStructureStore <: AbstractDict{InlineStrings.String31,ProteinStructure}

A JLD2-based store for protein structures implementing the AbstractDict interface, allowing for dictionary operations on the stored structures.

Keys are stored as InlineStrings.String31 objects to reduce references. This means keys are limited to 31 bytes.

A ProteinStructureStore gets closed automatically when there no longer exists a program-accessible reference to it.

Examples

julia> store = ProteinStructureStore("store.pss")
ProteinStructureStore with 0 entries

julia> store["3HFM"] = pdb"3HFM"
[ Info: Downloading file from PDB: 3HFM
3-element ProteinStructure "3HFM.cif":
 215-residue ProteinChain{Float64} (H)
 214-residue ProteinChain{Float64} (L)
 129-residue ProteinChain{Float64} (Y)

julia> store
ProteinStructureStore with 1 entry

julia> keys(store)
Set{InlineStrings.String31} with 1 element:
  InlineStrings.String31("3HFM")

julia> delete!(store, "3HFM")
ProteinStructureStore with 0 entries
source
ProteinChains.all_atom_coordsMethod
all_atom_coords(chain)

Return a 3 × N matrix of all-atom Cartesian coordinates for a ProteinChain. This is written to be type-stable and allocation-friendly.

source
ProteinChains.append_residueMethod
append_residue(Backbone::Backbone, torsion_angles::Vector{<:Real}; ideal::BackboneGeometry=DEFAULT_BACKBONE_GEOMETRY)

Create a new backbone by appending 3 new torsion angles (ψ, ω, ϕ) at the end, using bond lengths and bond angles specified in BackboneGeometry.

source
ProteinChains.atom_contact_orderMethod
atom_contact_order(chain; cutoff = 6.0)

Atom-level contact order (ALCO) matching the definition in Eric Alm's contactOrder.pl script.

For a single chain:

  • consider all non-hydrogen atoms,
  • for each contacting atom pair within cutoff Å, with positive sequence separation in chain.numbering, accumulate that separation;
  • the absolute CO is order / counts;
  • the relative CO is absolute_CO / (max(numbering) - min(numbering) + 1).

Results should match Plaxco et al. 1998, Table 1.

source
ProteinChains.cysteine_featuresMethod
cysteine_features(structure; distance_cutoff = 2.5)

For each chain, count:

  • num_paired::Int: number of cysteines that participate in a disulfide bond (including those paired with cysteines in other chains),
  • num_cys::Int: total number of cysteines in the chain.

Pairing is defined by SG–SG distances < distance_cutoff Å, including pairs between different chains. Returns a vector of named tuples, one per chain.

source
ProteinChains.deserializeMethod
deserialize(filename::AbstractString)

Deserialize ProteinStructure objects from a JLD2 file. Returns a Vector{ProteinStructure} of all structures stored in the file.

source
ProteinChains.interchain_atom_contact_proportionMethod
interchain_atom_contact_proportion(structure; distance_cutoff = 5.0)

Atom-level inter-chain contact proportion: for each chain, returns the fraction of atoms that are within distance_cutoff Å of at least one atom in another chain.

Results should match Plaxco et al. 1998, Table 1.

source
ProteinChains.interchain_contact_countsMethod
interchain_contact_counts(structure; distance_cutoff = 5.0)

For each chain, count how many residues make contact with at least one residue in any other chain, based on all-atom contacts with a distance cutoff distance_cutoff (Å).

Returns a vector of integers, one per chain.

source
ProteinChains.interchain_residue_contact_proportionMethod
interchain_residue_contact_proportion(structure; distance_cutoff = 5.0)

Residue-level inter-chain contact proportion: for each chain, returns the fraction of residues that have at least one atom within distance_cutoff Å of any atom in another chain.

source
ProteinChains.mapmmcifMethod
mapmmcif(mmcifdict, field1 => field2, field3 => field4, ...)
julia> import BioStructures

julia> filename = BioStructures.downloadpdb("3HFM", format=BioStructures.MMCIFFormat);
[ Info: Downloading file from PDB: 3HFM

julia> mmcifdict = BioStructures.MMCIFDict(filename);

julia> mapmmcif(mmcifdict,
           "_atom_site.auth_asym_id"   => "_atom_site.label_entity_id",
           "_entity_src_gen.entity_id" => "_entity_src_gen.pdbx_gene_src_ncbi_taxonomy_id")
Dict{String, String} with 3 entries:
  "Y" => "9031"
  "L" => "10090"
  "H" => "10090"
source
ProteinChains.pdbentryMethod
pdbentry(pdbid::AbstractString; format=MMCIFFormat, kws...)

Keyword arguments get propagated to BioStructures.downloadpdb

Downloads are cached in a temporary directory.

Examples

julia> pdbentry("1EYE")
[ Info: Downloading file from PDB: 1EYE
1-element ProteinStructure "1EYE.cif":
 256-residue ProteinChain{Float64} (A)

julia> pdb"1EYE" # string macro for convenience
[ Info: File exists: 1EYE
1-element ProteinStructure "1EYE.cif":
 256-residue ProteinChain{Float64} (A)

julia> pdb"1EYE"A # string suffix to get a specific chain
[ Info: File exists: 1EYE
256-residue ProteinChain{Float64} (A)

julia> pdb"1EYE"1 # integer suffix to specify "ba_number" keyword
[ Info: Downloading file from PDB: 1EYE
2-element ProteinStructure "1EYE_ba1.cif":
 256-residue ProteinChain{Float64} (A)
 256-residue ProteinChain{Float64} (A-2)
source
ProteinChains.prepend_residueMethod
append_residue(Backbone::Backbone, torsion_angles::Vector{<:Real}; ideal::BackboneGeometry=DEFAULT_BACKBONE_GEOMETRY)

Create a new backbone by prepending 3 new torsion angles (ψ, ω, ϕ) at the beginning, using bond lengths and bond angles specified in the BackboneGeometry.

Note

The torsion angle order is the same as it would be when appending. The order is not reversed.

source
ProteinChains.residue_contact_orderMethod
residue_contact_order(chain; distance_thresholds = (6.0, 8.0, 10.0))

Compute length-normalized contact order for a single chain, using all-atom contacts.

For each distance threshold d, contacts are residue pairs with any inter-atomic distance < d Å, excluding pairs with sequence-number separation ≤ 1 based on chain.numbering.

Contact order is defined as

CO = (1 / (N * Leff)) * sum{(i,j) in C} |numbering[j] - numbering[i]|

where L_eff is maximum(numbering) - minimum(numbering) + 1 and N is the number of contacts at that threshold.

source
ProteinChains.residue_pair_min_distancesMethod
residue_pair_min_distances(chain; max_distance; use_numbering = false)

Compute the minimum all-atom distance for each residue pair in chain whose minimum inter-atomic distance is ≤ max_distance, using a KDTree over all atoms. Returns a Dict{Tuple{Int,Int},Float64} mapping an ordered residue identifier pair (i, j) with i < j to the minimum distance in Å.

If use_numbering == false (default), the residue identifiers are 1-based indices into the chain (i.e. positions in chain.atoms). If use_numbering == true, the identifiers are taken from chain.numbering.

source
ProteinChains.serializeMethod
serialize(filename::AbstractString, structures::AbstractVector{<:ProteinStructure})

Serialize a vector of ProteinStructure objects to a JLD2 file. This function creates a new ProteinStructureStore and writes each structure in the input vector to it. Each structure is stored using its name as the key.

source
ProteinChains.shape_featuresMethod
shape_features(chain)

For a single ProteinChain, return a named tuple with:

  • radius_of_gyration: standard R_g based on all atoms;
  • mean_pairwise_distance: mean distance between all atom pairs;
  • max_pairwise_distance: maximum distance between any two atoms.
source
ProteinChains.structure_atom_coordsMethod
structure_atom_coords(structure)

Flatten all atoms from all chains in structure into a single coordinate matrix and parallel vectors giving the chain index and residue index for each atom.

source