Examples
ProteinStructure and ProteinChain
using ProteinChainsFetch a structure from the PDB:
structure = pdb"3NIR"1-element ProteinStructure "3NIR.cif":
46-residue ProteinChain{Float64} (A)propertynames(structure)(:name, :chains, :atoms)We can index by chain ID or by index:
chain = structure["A"]46-residue ProteinChain{Float64} (A)chain == structure[1]trueA chain has a few basic fields:
propertynames(chain)(:id, :atoms, :sequence, :numbering, :ins_codes)chain.id"A"typeof(chain.atoms)Vector{Vector{Atom{Float64}}} (alias for Array{Array{Atom{Float64}, 1}, 1})chain.sequence"TTCCPSIVARSNFNVCRLPGTPEALCATYTGCIIIPGATCPGDYAN"chain.numbering46-element Vector{Int64}:
1
2
3
4
5
6
7
8
9
10
⋮
38
39
40
41
42
43
44
45
46chain.ins_codes # (sometimes just a string of spaces)Indexable wrapping a Array (get value with `unwrap(x)`)The atoms field is a residue-wise vector of vectors of Atoms, but often we just want the backbone:
get_backbone(chain)3×3×46 Array{Float64, 3}:
[:, :, 1] =
3.265 4.047 4.817
-14.107 -12.839 -12.834
16.877 16.901 15.587
[:, :, 2] =
4.935 5.757 7.07
-11.632 -11.521 -10.839
15.046 13.85 14.21
[:, :, 3] =
8.12 9.465 10.033
-11.181 -10.668 -10.398
13.484 13.672 12.281
;;; …
[:, :, 44] =
16.11 14.796 14.319
-11.166 -10.726 -9.649
13.823 13.383 14.376
[:, :, 45] =
15.122 15.022 14.041
-8.598 -7.65 -6.508
14.466 15.547 15.291
[:, :, 46] =
13.513 12.645 11.201
-6.432 -5.347 -5.794
14.07 13.627 13.38Chains can be indexed/reordered with a vector of residue indices:
chain[1:10].numbering10-element Vector{Int64}:
1
2
3
4
5
6
7
8
9
10chain[10:-1:1].numbering10-element Vector{Int64}:
10
9
8
7
6
5
4
3
2
1Dynamic properties
The chain type is dynamic, so we can add new properties dynamically at runtime:
chain.taxonomy_id = 3721;chain.taxonomy_id3721delete!(chain, :taxonomy_id);hasproperty(chain, :taxonomy_id)falseFor a property whose last dimension is tied to the number of residues, we can wrap it with Indexable to automatically index it when we index the chain:
chain.confidence = Indexable(rand(length(chain)));chain[1:10].confidence # still wrapped by IndexableIndexable wrapping a SubArray (get value with `unwrap(x)`)unwrap(chain[1:10].confidence)10-element view(::Vector{Float64}, 1:10) with eltype Float64:
0.35485393443246527
0.7166481724474872
0.6675048052293057
0.28363549746885486
0.6699220908914133
0.5047620273257207
0.17221398922209796
0.4151747366066296
0.927226756731327
0.31536608362592544Backbone geometry
There are utility functions for getting the backbone geometry:
get_bond_lengths(chain) # N₁-Ca₁, Ca₁-C₁, C₁-N₂, N₂-Ca₂, ... Caₙ-Cₙ137-element Vector{Float64}:
1.4899409384267541
1.5229973736024631
1.3234081003227993
1.455479646027384
1.5227255169596399
1.321567251410233
1.4517362019320184
1.5265664086439212
1.338382979569003
1.4438819203799176
⋮
1.3412609738600474
1.453889954570153
1.540619031428602
1.3257111299223503
1.441272007637699
1.5271087060193196
1.3324417435670501
1.4583888370390115
1.5316507434790727get_bond_angles(chain) # N₁-Ca₁-C₁, Ca₁-C₁-N₂, C₁-N₂-Ca₂, ... Cₙ-Caₙ-Cₙ136-element Vector{Float64}:
1.827871186457387
1.983138955181121
2.0437725856964972
1.9037800202736577
2.0256187520350775
2.1815795899803243
1.8642086249679886
2.0639413829709
2.0929840239513604
1.9240162398704397
⋮
2.048312974109768
2.121393102341094
1.8716399948584543
1.993720427949135
2.1298052192012555
1.9940368854073829
2.0384883352986076
2.16112840954969
1.9746673510926809get_torsion_angles(chain) # N₁-Ca₁-C₁-N₂, Ca₁-C₁-N₂-Ca₂, C₁-N₂-Ca₂-C₂, ... Cₙ₋₁-Nₙ-Caₙ-Cₙ135-element Vector{Float64}:
2.5026669335531033
3.0499947519787094
-1.8577639592574344
2.621638694380875
-3.124624702060941
-2.4119446090661154
2.401530981697558
-3.069632443794041
-2.1575164774083353
2.58165021045778
⋮
-0.022063000258269294
2.9673306083304207
-2.251107492520742
1.028259099659418
-2.8813694570643014
-1.5378658168594752
-0.125461599869042
-3.053494884008746
-1.9544887246950824There are also functions for getting the residue-wise rigid transformation "frames" of the chain:
frames = Frames(chain);size(frames.rotations)(3, 3, 46)size(frames.translations)(3, 46)MMCIF utilities
MMCIF files contain a lot of information that is not present in PDB files. The mapmmcif function can be used to map one MMCIF field to another.
mmcifdict = mmcifdict"3HFM";[ Info: File exists: 3HFMmapmmcif(mmcifdict, "_atom_site.auth_asym_id" => "_atom_site.label_entity_id")Dict{String, String} with 3 entries:
"Y" => "3"
"L" => "1"
"H" => "2"mapmmcif(mmcifdict, "_entity_src_gen.entity_id" => "_entity_src_gen.pdbx_gene_src_ncbi_taxonomy_id")Dict{String, String} with 3 entries:
"1" => "10090"
"2" => "10090"
"3" => "9031"chainid_to_taxonomyid = mapmmcif(mmcifdict,
"_atom_site.auth_asym_id" => "_atom_site.label_entity_id",
"_entity_src_gen.entity_id" => "_entity_src_gen.pdbx_gene_src_ncbi_taxonomy_id")Dict{String, String} with 3 entries:
"Y" => "9031"
"L" => "10090"
"H" => "10090"structure = pdb"3HFM"
for chain in structure
chain.taxonomy_id = chainid_to_taxonomyid[chain.id]
println("Set taxonomy_id of chain $(chain.id) to $(chain.taxonomy_id)")
end[ Info: File exists: 3HFM
Set taxonomy_id of chain H to 10090
Set taxonomy_id of chain L to 10090
Set taxonomy_id of chain Y to 9031ProteinStructureStore
The ProteinStructureStore <: AbstractDict{String, ProteinStructure} type is a lazy wrapper for a file-based storage of ProteinStructures.
dir = mktempdir();
store = ProteinStructureStore(joinpath(dir, "structures.pss"));store["3NIR"] = pdb"3NIR";[ Info: File exists: 3NIRstore["3NIR"]1-element ProteinStructure "3NIR.cif":
46-residue ProteinChain{Float64} (A)store["3NIR"]["A"]46-residue ProteinChain{Float64} (A)This page was generated using Literate.jl.