Examples
ProteinStructure and ProteinChain
using ProteinChains
Fetch a structure from the PDB:
structure = pdb"3NIR"
1-element ProteinStructure "3NIR.cif":
46-residue ProteinChain{Float64} (A)
propertynames(structure)
(:name, :chains, :atoms)
We can index by chain ID or by index:
chain = structure["A"]
46-residue ProteinChain{Float64} (A)
chain == structure[1]
true
A chain has a few basic fields:
propertynames(chain)
(:id, :atoms, :sequence, :numbering, :ins_codes)
chain.id
"A"
typeof(chain.atoms)
Vector{Vector{Atom{Float64}}} (alias for Array{Array{Atom{Float64}, 1}, 1})
chain.sequence
"TTCCPSIVARSNFNVCRLPGTPEALCATYTGCIIIPGATCPGDYAN"
chain.numbering
46-element Vector{Int64}:
1
2
3
4
5
6
7
8
9
10
⋮
38
39
40
41
42
43
44
45
46
chain.ins_codes # (sometimes just a string of spaces)
Indexable wrapping a Array (get value with `unwrap(x)`)
The atoms
field is a residue-wise vector of vectors of Atom
s, but often we just want the backbone:
get_backbone(chain)
3×3×46 Array{Float64, 3}:
[:, :, 1] =
3.265 4.047 4.817
-14.107 -12.839 -12.834
16.877 16.901 15.587
[:, :, 2] =
4.935 5.757 7.07
-11.632 -11.521 -10.839
15.046 13.85 14.21
[:, :, 3] =
8.12 9.465 10.033
-11.181 -10.668 -10.398
13.484 13.672 12.281
;;; …
[:, :, 44] =
16.11 14.796 14.319
-11.166 -10.726 -9.649
13.823 13.383 14.376
[:, :, 45] =
15.122 15.022 14.041
-8.598 -7.65 -6.508
14.466 15.547 15.291
[:, :, 46] =
13.513 12.645 11.201
-6.432 -5.347 -5.794
14.07 13.627 13.38
Chains can be indexed/reordered with a vector of residue indices:
chain[1:10].numbering
10-element Vector{Int64}:
1
2
3
4
5
6
7
8
9
10
chain[10:-1:1].numbering
10-element Vector{Int64}:
10
9
8
7
6
5
4
3
2
1
Dynamic properties
The chain type is dynamic, so we can add new properties dynamically at runtime:
chain.taxonomy_id = 3721;
chain.taxonomy_id
3721
delete!(chain, :taxonomy_id);
hasproperty(chain, :taxonomy_id)
false
For a property whose last dimension is tied to the number of residues, we can wrap it with Indexable
to automatically index it when we index the chain:
chain.confidence = Indexable(rand(length(chain)));
chain[1:10].confidence # still wrapped by Indexable
Indexable wrapping a SubArray (get value with `unwrap(x)`)
unwrap(chain[1:10].confidence)
10-element view(::Vector{Float64}, 1:10) with eltype Float64:
0.5298702568182996
0.3941423490777185
0.6048411782874423
0.565806724765865
0.7114206455622968
0.5315602713277932
0.5541407205889294
0.00484427195822934
0.10709619696055783
0.17282996946872087
Backbone geometry
There are utility functions for getting the backbone geometry:
get_bond_lengths(chain) # N₁-Ca₁, Ca₁-C₁, C₁-N₂, N₂-Ca₂, ... Caₙ-Cₙ
137-element Vector{Float64}:
1.4899409384267541
1.5229973736024631
1.3234081003227993
1.455479646027384
1.5227255169596399
1.321567251410233
1.4517362019320184
1.5265664086439212
1.338382979569003
1.4438819203799176
⋮
1.3412609738600474
1.453889954570153
1.540619031428602
1.3257111299223503
1.441272007637699
1.5271087060193196
1.3324417435670501
1.4583888370390115
1.5316507434790727
get_bond_angles(chain) # N₁-Ca₁-C₁, Ca₁-C₁-N₂, C₁-N₂-Ca₂, ... Cₙ-Caₙ-Cₙ
136-element Vector{Float64}:
1.827871186457387
1.983138955181121
2.0437725856964972
1.9037800202736577
2.0256187520350775
2.1815795899803243
1.8642086249679886
2.0639413829709
2.0929840239513604
1.9240162398704397
⋮
2.048312974109768
2.121393102341094
1.8716399948584543
1.993720427949135
2.1298052192012555
1.9940368854073829
2.0384883352986076
2.16112840954969
1.9746673510926809
get_torsion_angles(chain) # N₁-Ca₁-C₁-N₂, Ca₁-C₁-N₂-Ca₂, C₁-N₂-Ca₂-C₂, ... Cₙ₋₁-Nₙ-Caₙ-Cₙ
135-element Vector{Float64}:
2.5026669335531033
3.0499947519787094
-1.8577639592574344
2.621638694380875
-3.124624702060941
-2.4119446090661154
2.401530981697558
-3.069632443794041
-2.1575164774083353
2.58165021045778
⋮
-0.022063000258269294
2.9673306083304207
-2.251107492520742
1.028259099659418
-2.8813694570643014
-1.5378658168594752
-0.125461599869042
-3.053494884008746
-1.9544887246950824
There are also functions for getting the residue-wise rigid transformation "frames" of the chain:
frames = Frames(chain);
size(frames.rotations)
(3, 3, 46)
size(frames.translations)
(3, 46)
MMCIF utilities
MMCIF files contain a lot of information that is not present in PDB files. The mapmmcif
function can be used to map one MMCIF field to another.
mmcifdict = mmcifdict"3HFM";
[ Info: File exists: 3HFM
mapmmcif(mmcifdict, "_atom_site.auth_asym_id" => "_atom_site.label_entity_id")
Dict{String, String} with 3 entries:
"Y" => "3"
"L" => "1"
"H" => "2"
mapmmcif(mmcifdict, "_entity_src_gen.entity_id" => "_entity_src_gen.pdbx_gene_src_ncbi_taxonomy_id")
Dict{String, String} with 3 entries:
"1" => "10090"
"2" => "10090"
"3" => "9031"
chainid_to_taxonomyid = mapmmcif(mmcifdict,
"_atom_site.auth_asym_id" => "_atom_site.label_entity_id",
"_entity_src_gen.entity_id" => "_entity_src_gen.pdbx_gene_src_ncbi_taxonomy_id")
Dict{String, String} with 3 entries:
"Y" => "9031"
"L" => "10090"
"H" => "10090"
structure = pdb"3HFM"
for chain in structure
chain.taxonomy_id = chainid_to_taxonomyid[chain.id]
println("Set taxonomy_id of chain $(chain.id) to $(chain.taxonomy_id)")
end
[ Info: File exists: 3HFM
Set taxonomy_id of chain H to 10090
Set taxonomy_id of chain L to 10090
Set taxonomy_id of chain Y to 9031
ProteinStructureStore
The ProteinStructureStore <: AbstractDict{String, ProteinStructure}
type is a lazy wrapper for a file-based storage of ProteinStructure
s.
dir = mktempdir();
store = ProteinStructureStore(joinpath(dir, "structures.pss"));
store["3NIR"] = pdb"3NIR";
[ Info: File exists: 3NIR
store["3NIR"]
1-element ProteinStructure "3NIR.cif":
46-residue ProteinChain{Float64} (A)
store["3NIR"]["A"]
46-residue ProteinChain{Float64} (A)
This page was generated using Literate.jl.