Examples

ProteinStructure and ProteinChain

using ProteinChains

Fetch a structure from the PDB:

structure = pdb"3NIR"
1-element ProteinStructure "3NIR.cif":
 46-residue ProteinChain{Float64} (A)
propertynames(structure)
(:name, :chains, :atoms)

We can index by chain ID or by index:

chain = structure["A"]
46-residue ProteinChain{Float64} (A)
chain == structure[1]
true

A chain has a few basic fields:

propertynames(chain)
(:id, :atoms, :sequence, :numbering, :ins_codes)
chain.id
"A"
typeof(chain.atoms)
Vector{Vector{Atom{Float64}}} (alias for Array{Array{Atom{Float64}, 1}, 1})
chain.sequence
"TTCCPSIVARSNFNVCRLPGTPEALCATYTGCIIIPGATCPGDYAN"
chain.numbering
46-element Vector{Int64}:
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
  ⋮
 38
 39
 40
 41
 42
 43
 44
 45
 46
chain.ins_codes # (sometimes just a string of spaces)
Indexable wrapping a Array (get value with `unwrap(x)`)

The atoms field is a residue-wise vector of vectors of Atoms, but often we just want the backbone:

get_backbone(chain)
3×3×46 Array{Float64, 3}:
[:, :, 1] =
   3.265    4.047    4.817
 -14.107  -12.839  -12.834
  16.877   16.901   15.587

[:, :, 2] =
   4.935    5.757    7.07
 -11.632  -11.521  -10.839
  15.046   13.85    14.21

[:, :, 3] =
   8.12     9.465   10.033
 -11.181  -10.668  -10.398
  13.484   13.672   12.281

;;; … 

[:, :, 44] =
  16.11    14.796  14.319
 -11.166  -10.726  -9.649
  13.823   13.383  14.376

[:, :, 45] =
 15.122  15.022  14.041
 -8.598  -7.65   -6.508
 14.466  15.547  15.291

[:, :, 46] =
 13.513  12.645  11.201
 -6.432  -5.347  -5.794
 14.07   13.627  13.38

Chains can be indexed/reordered with a vector of residue indices:

chain[1:10].numbering
10-element Vector{Int64}:
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
chain[10:-1:1].numbering
10-element Vector{Int64}:
 10
  9
  8
  7
  6
  5
  4
  3
  2
  1

Dynamic properties

The chain type is dynamic, so we can add new properties dynamically at runtime:

chain.taxonomy_id = 3721;
chain.taxonomy_id
3721
delete!(chain, :taxonomy_id);
hasproperty(chain, :taxonomy_id)
false

For a property whose last dimension is tied to the number of residues, we can wrap it with Indexable to automatically index it when we index the chain:

chain.confidence = Indexable(rand(length(chain)));
chain[1:10].confidence # still wrapped by Indexable
Indexable wrapping a SubArray (get value with `unwrap(x)`)
unwrap(chain[1:10].confidence)
10-element view(::Vector{Float64}, 1:10) with eltype Float64:
 0.8180990362562345
 0.39801589277850535
 0.9927451976236543
 0.44244119271743776
 0.5895964593825309
 0.691535329405928
 0.26485228134786587
 0.7902944102892216
 0.18418458407578486
 0.39944528181384575

Backbone geometry

There are utility functions for getting the backbone geometry:

get_bond_lengths(chain) # N₁-Ca₁, Ca₁-C₁, C₁-N₂, N₂-Ca₂, ... Caₙ-Cₙ
137-element Vector{Float64}:
 1.4899409384267541
 1.5229973736024631
 1.3234081003227993
 1.455479646027384
 1.5227255169596399
 1.321567251410233
 1.4517362019320184
 1.5265664086439212
 1.338382979569003
 1.4438819203799176
 ⋮
 1.3412609738600474
 1.453889954570153
 1.540619031428602
 1.3257111299223503
 1.441272007637699
 1.5271087060193196
 1.3324417435670501
 1.4583888370390115
 1.5316507434790727
get_bond_angles(chain) # N₁-Ca₁-C₁, Ca₁-C₁-N₂, C₁-N₂-Ca₂, ... Cₙ-Caₙ-Cₙ
136-element Vector{Float64}:
 1.827871186457387
 1.983138955181121
 2.0437725856964972
 1.9037800202736577
 2.0256187520350775
 2.1815795899803243
 1.8642086249679886
 2.0639413829709
 2.0929840239513604
 1.9240162398704397
 ⋮
 2.048312974109768
 2.121393102341094
 1.8716399948584543
 1.993720427949135
 2.1298052192012555
 1.9940368854073829
 2.0384883352986076
 2.16112840954969
 1.9746673510926809
get_torsion_angles(chain) # N₁-Ca₁-C₁-N₂, Ca₁-C₁-N₂-Ca₂, C₁-N₂-Ca₂-C₂, ... Cₙ₋₁-Nₙ-Caₙ-Cₙ
135-element Vector{Float64}:
  2.5026669335531033
  3.0499947519787094
 -1.8577639592574344
  2.621638694380875
 -3.124624702060941
 -2.4119446090661154
  2.401530981697558
 -3.069632443794041
 -2.1575164774083353
  2.58165021045778
  ⋮
 -0.022063000258269294
  2.9673306083304207
 -2.251107492520742
  1.028259099659418
 -2.8813694570643014
 -1.5378658168594752
 -0.125461599869042
 -3.053494884008746
 -1.9544887246950824

There are also functions for getting the residue-wise rigid transformation "frames" of the chain:

frames = Frames(chain);
size(frames.rotations)
(3, 3, 46)
size(frames.translations)
(3, 46)

MMCIF utilities

MMCIF files contain a lot of information that is not present in PDB files. The mapmmcif function can be used to map one MMCIF field to another.

mmcifdict = mmcifdict"3HFM";
[ Info: File exists: 3HFM
mapmmcif(mmcifdict, "_atom_site.auth_asym_id"   => "_atom_site.label_entity_id")
Dict{String, String} with 3 entries:
  "Y" => "3"
  "L" => "1"
  "H" => "2"
mapmmcif(mmcifdict, "_entity_src_gen.entity_id" => "_entity_src_gen.pdbx_gene_src_ncbi_taxonomy_id")
Dict{String, String} with 3 entries:
  "1" => "10090"
  "2" => "10090"
  "3" => "9031"
chainid_to_taxonomyid = mapmmcif(mmcifdict,
    "_atom_site.auth_asym_id"   => "_atom_site.label_entity_id",
    "_entity_src_gen.entity_id" => "_entity_src_gen.pdbx_gene_src_ncbi_taxonomy_id")
Dict{String, String} with 3 entries:
  "Y" => "9031"
  "L" => "10090"
  "H" => "10090"
structure = pdb"3HFM"
for chain in structure
    chain.taxonomy_id = chainid_to_taxonomyid[chain.id]
    println("Set taxonomy_id of chain $(chain.id) to $(chain.taxonomy_id)")
end
[ Info: File exists: 3HFM
Set taxonomy_id of chain H to 10090
Set taxonomy_id of chain L to 10090
Set taxonomy_id of chain Y to 9031

ProteinStructureStore

The ProteinStructureStore <: AbstractDict{String, ProteinStructure} type is a lazy wrapper for a file-based storage of ProteinStructures.

dir = mktempdir();
store = ProteinStructureStore(joinpath(dir, "structures.pss"));
store["3NIR"] = pdb"3NIR";
[ Info: File exists: 3NIR
store["3NIR"]
1-element ProteinStructure "3NIR.cif":
 46-residue ProteinChain{Float64} (A)
store["3NIR"]["A"]
46-residue ProteinChain{Float64} (A)

This page was generated using Literate.jl.