ProteinChains
Documentation for ProteinChains.
ProteinChains.STANDARD_RESIDUE_TEMPLATEProteinChains.BackboneGeometryProteinChains.IdealResidueProteinChains.ProteinChainProteinChains.ProteinStructureProteinChains.ProteinStructureStoreProteinChains.ProteinStructureStoreProteinChains.all_atom_coordsProteinChains.append_residueProteinChains.atom_contact_orderProteinChains.cysteine_featuresProteinChains.deserializeProteinChains.interchain_atom_contact_proportionProteinChains.interchain_contact_countsProteinChains.interchain_residue_contact_proportionProteinChains.mapmmcifProteinChains.pdbentryProteinChains.prepend_residueProteinChains.residue_contact_orderProteinChains.residue_pair_min_distancesProteinChains.serializeProteinChains.shape_featuresProteinChains.structure_atom_coords
ProteinChains.STANDARD_RESIDUE_TEMPLATE — Constant
STANDARD_RESIDUE_TEMPLATEThis is a template of a "standard residue", with a very specific and distinct shape, size, and orientation. which needs to be consistent if we want to represent protein structures as sets of residue rotations and translations.
Thus, we can use this residue as a template for aligning other residues with very precise geometry to it.
julia> IdealResidue{Float64}(BackboneGeometry(N_Ca_C_angle = 1.93); template=ProteinChains.STANDARD_RESIDUE_TEMPLATE)
3×3 IdealResidue{Float64}:
-1.06447 -0.199174 1.26364
0.646303 -0.529648 -0.116655
0.0 0.0 0.0ProteinChains.BackboneGeometry — Type
BackboneGeometry(;
N_Ca_length = 1.46,
Ca_C_length = 1.52,
C_N_length = 1.33,
N_Ca_C_angle = 1.94,
Ca_C_N_angle = 2.03,
C_N_Ca_angle = 2.13,
)Define the idealized bond lengths and bond angles of a protein backbone.
ProteinChains.IdealResidue — Type
IdealResidue{T<:AbstractFloat} <: AbstractMatrix{T}
IdealResidue{T}(backbone_geometry=DEFAULT_BACKBONE_GEOMETRY; template=nothing) where TA 3x3 matrix representing the idealized geometry of a protein residue, with columns representing the N, Ca, and C atom positions of a residue positioned at the origin.
ProteinChains.ProteinChain — Type
ProteinChain{T<:Real}Represents a protein chain with a basic set of fields from which some other properties might be derived.
ProteinChains.ProteinStructure — Type
ProteinStructure{T} <: AbstractVector{ProteinChain{T}}Fields
name::String: Usually just the base name of the original file.chains::Vector{ProteinChain{T}}: a collection ofProteinChains.atoms::Vector{Atom{T}}: free atoms from the structure that were not part of any protein residue.
ProteinChains.ProteinStructureStore — Type
ProteinStructureStore <: AbstractDict{InlineStrings.String31,ProteinStructure}A JLD2-based store for protein structures implementing the AbstractDict interface, allowing for dictionary operations on the stored structures.
Keys are stored as InlineStrings.String31 objects to reduce references. This means keys are limited to 31 bytes.
A ProteinStructureStore gets closed automatically when there no longer exists a program-accessible reference to it.
Examples
julia> store = ProteinStructureStore("store.pss")
ProteinStructureStore with 0 entries
julia> store["3HFM"] = pdb"3HFM"
[ Info: Downloading file from PDB: 3HFM
3-element ProteinStructure "3HFM.cif":
215-residue ProteinChain{Float64} (H)
214-residue ProteinChain{Float64} (L)
129-residue ProteinChain{Float64} (Y)
julia> store
ProteinStructureStore with 1 entry
julia> keys(store)
Set{InlineStrings.String31} with 1 element:
InlineStrings.String31("3HFM")
julia> delete!(store, "3HFM")
ProteinStructureStore with 0 entriesProteinChains.ProteinStructureStore — Method
ProteinStructureStore(f::Function, filename, mode="a+")ProteinChains.all_atom_coords — Method
all_atom_coords(chain)Return a 3 × N matrix of all-atom Cartesian coordinates for a ProteinChain. This is written to be type-stable and allocation-friendly.
ProteinChains.append_residue — Method
append_residue(Backbone::Backbone, torsion_angles::Vector{<:Real}; ideal::BackboneGeometry=DEFAULT_BACKBONE_GEOMETRY)Create a new backbone by appending 3 new torsion angles (ψ, ω, ϕ) at the end, using bond lengths and bond angles specified in BackboneGeometry.
ProteinChains.atom_contact_order — Method
atom_contact_order(chain; cutoff = 6.0)Atom-level contact order (ALCO) matching the definition in Eric Alm's contactOrder.pl script.
For a single chain:
- consider all non-hydrogen atoms,
- for each contacting atom pair within
cutoffÅ, with positive sequence separation inchain.numbering, accumulate that separation; - the absolute CO is
order / counts; - the relative CO is
absolute_CO / (max(numbering) - min(numbering) + 1).
Results should match Plaxco et al. 1998, Table 1.
ProteinChains.cysteine_features — Method
cysteine_features(structure; distance_cutoff = 2.5)For each chain, count:
num_paired::Int: number of cysteines that participate in a disulfide bond (including those paired with cysteines in other chains),num_cys::Int: total number of cysteines in the chain.
Pairing is defined by SG–SG distances < distance_cutoff Å, including pairs between different chains. Returns a vector of named tuples, one per chain.
ProteinChains.deserialize — Method
deserialize(filename::AbstractString)Deserialize ProteinStructure objects from a JLD2 file. Returns a Vector{ProteinStructure} of all structures stored in the file.
ProteinChains.interchain_atom_contact_proportion — Method
interchain_atom_contact_proportion(structure; distance_cutoff = 5.0)Atom-level inter-chain contact proportion: for each chain, returns the fraction of atoms that are within distance_cutoff Å of at least one atom in another chain.
Results should match Plaxco et al. 1998, Table 1.
ProteinChains.interchain_contact_counts — Method
interchain_contact_counts(structure; distance_cutoff = 5.0)For each chain, count how many residues make contact with at least one residue in any other chain, based on all-atom contacts with a distance cutoff distance_cutoff (Å).
Returns a vector of integers, one per chain.
ProteinChains.interchain_residue_contact_proportion — Method
interchain_residue_contact_proportion(structure; distance_cutoff = 5.0)Residue-level inter-chain contact proportion: for each chain, returns the fraction of residues that have at least one atom within distance_cutoff Å of any atom in another chain.
ProteinChains.mapmmcif — Method
mapmmcif(mmcifdict, field1 => field2, field3 => field4, ...)julia> import BioStructures
julia> filename = BioStructures.downloadpdb("3HFM", format=BioStructures.MMCIFFormat);
[ Info: Downloading file from PDB: 3HFM
julia> mmcifdict = BioStructures.MMCIFDict(filename);
julia> mapmmcif(mmcifdict,
"_atom_site.auth_asym_id" => "_atom_site.label_entity_id",
"_entity_src_gen.entity_id" => "_entity_src_gen.pdbx_gene_src_ncbi_taxonomy_id")
Dict{String, String} with 3 entries:
"Y" => "9031"
"L" => "10090"
"H" => "10090"ProteinChains.pdbentry — Method
pdbentry(pdbid::AbstractString; format=MMCIFFormat, kws...)Keyword arguments get propagated to BioStructures.downloadpdb
Downloads are cached in a temporary directory.
Examples
julia> pdbentry("1EYE")
[ Info: Downloading file from PDB: 1EYE
1-element ProteinStructure "1EYE.cif":
256-residue ProteinChain{Float64} (A)
julia> pdb"1EYE" # string macro for convenience
[ Info: File exists: 1EYE
1-element ProteinStructure "1EYE.cif":
256-residue ProteinChain{Float64} (A)
julia> pdb"1EYE"A # string suffix to get a specific chain
[ Info: File exists: 1EYE
256-residue ProteinChain{Float64} (A)
julia> pdb"1EYE"1 # integer suffix to specify "ba_number" keyword
[ Info: Downloading file from PDB: 1EYE
2-element ProteinStructure "1EYE_ba1.cif":
256-residue ProteinChain{Float64} (A)
256-residue ProteinChain{Float64} (A-2)ProteinChains.prepend_residue — Method
append_residue(Backbone::Backbone, torsion_angles::Vector{<:Real}; ideal::BackboneGeometry=DEFAULT_BACKBONE_GEOMETRY)Create a new backbone by prepending 3 new torsion angles (ψ, ω, ϕ) at the beginning, using bond lengths and bond angles specified in the BackboneGeometry.
ProteinChains.residue_contact_order — Method
residue_contact_order(chain; distance_thresholds = (6.0, 8.0, 10.0))Compute length-normalized contact order for a single chain, using all-atom contacts.
For each distance threshold d, contacts are residue pairs with any inter-atomic distance < d Å, excluding pairs with sequence-number separation ≤ 1 based on chain.numbering.
Contact order is defined as
CO = (1 / (N * Leff)) * sum{(i,j) in C} |numbering[j] - numbering[i]|
where L_eff is maximum(numbering) - minimum(numbering) + 1 and N is the number of contacts at that threshold.
ProteinChains.residue_pair_min_distances — Method
residue_pair_min_distances(chain; max_distance; use_numbering = false)Compute the minimum all-atom distance for each residue pair in chain whose minimum inter-atomic distance is ≤ max_distance, using a KDTree over all atoms. Returns a Dict{Tuple{Int,Int},Float64} mapping an ordered residue identifier pair (i, j) with i < j to the minimum distance in Å.
If use_numbering == false (default), the residue identifiers are 1-based indices into the chain (i.e. positions in chain.atoms). If use_numbering == true, the identifiers are taken from chain.numbering.
ProteinChains.serialize — Method
serialize(filename::AbstractString, structures::AbstractVector{<:ProteinStructure})Serialize a vector of ProteinStructure objects to a JLD2 file. This function creates a new ProteinStructureStore and writes each structure in the input vector to it. Each structure is stored using its name as the key.
ProteinChains.shape_features — Method
shape_features(chain)For a single ProteinChain, return a named tuple with:
radius_of_gyration: standard R_g based on all atoms;mean_pairwise_distance: mean distance between all atom pairs;max_pairwise_distance: maximum distance between any two atoms.
ProteinChains.structure_atom_coords — Method
structure_atom_coords(structure)Flatten all atoms from all chains in structure into a single coordinate matrix and parallel vectors giving the chain index and residue index for each atom.