Predicting Intrinsically Disordered Regions of Protein Chains
A support document for
FirstGlance in Jmol.
Purpose I: Crystallized residues are missing or have a high
temperature. Quite often some part of the protein sequence
that was crystallized is missing in the 3D model, or has a higher
temperature (crystallographic B factor) than the rest of the structure
(see Local Uncertainty in the
Views tab).
Empty basket at missing residues.
Missing
Residues are reported under the
Regions with missing residues will be marked with "empty baskets".
Quite often, segments that are missing or that have a high temperature
are predicted to be intrinsically disordered. When missing, such
segments are less likely to be resolved when the same protein
sequence is crystallized under other conditions.
Purpose II: Residues were deleted before crystallization.
Crystallization success is often improved by deleting flexible
portions of a protein chain, especially intrinsically disordered
portions. (Another reason for deletion is when a region is labile
to degradation.)
You may like to know whether portions of a chain that
were deleted for a crystallization experiment are predicted to be
intrinsically disordered. On the other hand, it is sometimes possible
to obtain diffraction-quality crystals that include
intrinsically disordered portions, in which case those disordered
portions will likely be missing from the crystallographic model,
unless they are stabilized by
crystal contacts. (Example: in
3b0z, flexibility of the N terminus appears to be
functionally important, and it is predicted to be intrinsically
disordered. Nevertheless, it formed a helix in the crystal, stabilized
by crystal contacts.)
For an introduction to intrinsic disorder in proteins, see the article
in Proteopedia
Intrinsically Disordered Protein.
There you will also find a list of five or more prediction servers.
Below are instructions for using one of these servers, FoldIndex.
If the results are important to you, try other servers to see
how well they agree.
Before you begin the procedure below, take a careful look at the
Missing Residues report by FirstGlance in Jmol under
the
Copy the sequence of the chain of interest. In FirstGlance,
under the leftmost tab (usually labeled with the PDB code), click on Sequences.
(These links will take you directly to sequences. In contrast, the links in the
Resources Tab will take you to a general page on the PDB entry, where
you will have to find the sequences.)
Note that the sequences available from OCA, PDB-Europe, and
PDB-USA (RCSB) span only the sequence range used for structure determination,
which is quite often not the full-length sequence of the molecule
in question. These crystallized sequences are suitable for Purpose I
above. The FASTA sequences available from OCA are most convenient for copying.
Full-length sequences are available from UniProt. These are suitable for
Purpose II above.
Paste the sequence into the box. (FoldIndex will ignore the FASTA
description line, beginning ">".)
Click the Process button. Now you have your results.
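If you prefer to tidy a copied FASTA record before pasting it anywhere, the following minimal Python sketch may help. It drops the description line (the one beginning ">") and joins the remaining lines into one string of residues. The function name and the toy record are made up for illustration, and the sketch assumes the text holds a single chain.

```python
def fasta_to_sequence(fasta_text):
    """Return the residues of a single-record FASTA text as one string."""
    residues = []
    for line in fasta_text.splitlines():
        line = line.strip()
        # Skip blank lines and the description line that begins with ">".
        if not line or line.startswith(">"):
            continue
        # Remove any spaces or tabs inside the sequence line.
        residues.append("".join(line.split()))
    return "".join(residues)


# Toy record (hypothetical, not a real PDB or UniProt entry).
example = ">toy_chain_A hypothetical description\nMKTAYIAKQR\nQISFVKSHFS RQ\n"
sequence = fasta_to_sequence(example)
print(sequence, len(sequence))
```

The printed length is handy later, when the crystallized and full-length sequences are compared.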
Caution! Don't look for predicted disorder by sequence
number! Why? FoldIndex numbers the sequence starting at one.
However, the first residue in a chain is often not numbered one
in PDB files. Worse, the sequence numbering in PDB files is sometimes
not consecutive because of numbering according to an ancestral
sequence, leading to insertions and deletions with bizarre numbering.
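The numbering mismatch can be illustrated with a small Python sketch. It assumes you already have, in order, the residue identifiers used in the PDB file for every residue of the sequence you pasted into FoldIndex. The list below is invented for illustration: it starts at 4, contains insertion codes (6A, 6B) and skips 8 and 9. The function name is hypothetical.

```python
def foldindex_range_to_pdb_ids(pdb_residue_ids, start, end):
    """Translate 1-based positions start..end (as FoldIndex counts them)
    into the residue identifiers used in the PDB file."""
    if start < 1 or end > len(pdb_residue_ids) or start > end:
        raise ValueError("range lies outside the sequence")
    # Position 1 is the first element of the list, so shift by one.
    return pdb_residue_ids[start - 1:end]


# Invented numbering: first residue is 4, with insertions and a gap.
toy_ids = ["4", "5", "6", "6A", "6B", "7", "10", "11"]
print(foldindex_range_to_pdb_ids(toy_ids, 3, 6))
```

In this toy case, positions 3 to 6 in FoldIndex correspond to residues 6, 6A, 6B and 7 in the file, which is why looking up "residue 3" in the model would mislead you.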
Keep reading ...
Purpose I: Crystallized residues are missing.
To find a segment that FoldIndex predicts to be disordered, use
the sequence capability of the Find.. tool (under the Tools Tab).
Quick check: Copy the segment predicted to be disordered and
see if it is found. If it is, then none of it is missing in the model!
As an example, we'll use
2ace. FoldIndex marks predicted
disorder in red.
Here we have copied the first red segment:
Be sure to remove the spaces when you paste the sequence into the slot
in the Find.. tool.
If the red sequence is found, then none of the residues in that sequence
are missing in the 3D model, and it was not disordered in the crystal.
If the red sequence is not found,
some or all of the red sequence
is missing from the 3D model. If even one residue in the sequence
is missing, the sequence will not be found.
To locate the missing residues in the 3D model,
copy a
short sequence of 4-5 residues that precedes or follows the
putatively disordered segment. Use this sequence in the Find..
tool.
Be sure to remove the spaces when you paste the sequence into the slot
in the Find.. tool:
Incorrect query: sequence=RIM HY
Correct query: sequence=RIMHY
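If you copy many segments, removing the spaces by hand gets tedious. Here is a minimal Python sketch that strips all whitespace from a copied segment and builds the query in the form shown above. The function name is made up for illustration.

```python
def find_query(copied_segment):
    """Build a Find.. sequence query from a segment copied with spaces."""
    # split() with no arguments breaks on any whitespace, including line breaks.
    residues = "".join(copied_segment.split())
    return "sequence=" + residues


print(find_query("RIM HY"))
```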
Once you locate the general area with a preceding or following sequence
fragment, you will see an "empty basket" (marking missing residues)
nearby. Touch or click on the end(s) of the chain near that basket to
report the residues at the "broken end(s)".
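Picking the short sequences that precede and follow a putatively disordered segment can also be sketched in Python. The sketch assumes you have the crystallized sequence as a plain string and the first and last positions of the segment as FoldIndex counts them (starting at one). The function name and the toy sequence are hypothetical.

```python
def flanking_fragments(sequence, start, end, width=5):
    """Return the fragments just before and just after positions start..end.

    start and end are 1-based and inclusive. A fragment is shorter than
    width (possibly empty) when the segment lies near an end of the chain.
    """
    before = sequence[max(0, start - 1 - width):start - 1]
    after = sequence[end:end + width]
    return before, after


toy_sequence = "MKTAYIAKQRQISFVKSHFSRQ"  # invented, for illustration only
before, after = flanking_fragments(toy_sequence, 8, 12)
print(before, after)
```

Either fragment can then be used in the Find.. tool to reach the neighborhood of the empty basket.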
Purpose II: Residues were deleted before crystallization.
To find out whether residues were deleted from the full-length
sequence before crystallization, check the lengths of the sequences.
The length of the crystallized sequence is given in the FASTA title line
at the OCA server. Click on Sequences under the
and then click on the desired chain under OCA. Note the sequence length.
The full length is given in UniProt. Go to the
and click on the chain of interest under UniProt.
This will take you to the Sequences section, where you will find
the full length (red arrow below in snapshot):
In our example, 2ace, a protein chain of length 537 was crystallized,
but the full-length sequence is 586 amino acids. Note that in some
cases, an expression tag may have been added to the crystallized
sequence.
What portions were deleted (or added?) before crystallization?
This can best be answered by aligning the crystallized sequence
with the full-length sequence.
While you are at UniProt, click the FASTA link (to the left
of the red arrow in the snapshot above) and
copy the full-length FASTA sequence.
At
UniProt, click on the Align tab
at the top of the page. Use the Clear button to clear the form box.
Paste the two FASTA sequences into the box. (You can enlarge the box
by dragging the lower right corner to help see what you are doing.)
The box should look similar to this. (A blank line between the
FASTA sequences is OK but not necessary).
After clicking the Align button, the resulting alignment shows
(in the example of 2ace) that the first 21 residues and the last 28
residues were deleted in the crystallized chain.
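For the simplest case, the same answer can be checked without an alignment. The Python sketch below assumes that the crystallized sequence is an exact, uninterrupted piece of the full-length sequence (no expression tag, no mutations, no internal deletions). When that assumption fails it returns None, and a real alignment such as the one above is needed. The function name and toy sequences are hypothetical.

```python
def deleted_ends(full_length, crystallized):
    """Count residues deleted from the start and the end of full_length.

    Returns (deleted_at_start, deleted_at_end), or None when crystallized
    is not an exact substring of full_length.
    """
    offset = full_length.find(crystallized)
    if offset < 0:
        return None
    deleted_at_end = len(full_length) - offset - len(crystallized)
    return offset, deleted_at_end


toy_full = "MKTAYIAKQRQISFVKSHFSRQ"  # invented, for illustration only
toy_crystallized = toy_full[3:18]    # pretend 3 + 4 residues were removed
print(deleted_ends(toy_full, toy_crystallized))

# Consistency check on the 2ace numbers quoted in the text:
# 586 full-length residues minus 537 crystallized residues leaves 49,
# which matches the 21 + 28 residues reported by the alignment.
print(586 - 537, 21 + 28)
```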
FoldIndex does not make predictions for the first and last 25 residues
due to the size of its scanning interval (51 residues by default). In a
case with larger
deletions, the predictions of FoldIndex for the full-length
sequence would be more useful.
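The effect of the scanning interval can be worked out with a few lines of Python. The sketch assumes a window centered on each residue, so that a 51-residue window leaves 25 residues at each end without a prediction, as stated above. How the server itself treats the ends has not been checked here, and the function name is made up for illustration.

```python
def predicted_range(length, window=51):
    """Return the first and last 1-based positions that receive a
    prediction, assuming a centered window; None if the chain is too short."""
    half = window // 2          # 25 for the default window of 51
    first = half + 1
    last = length - half
    if first > last:
        return None
    return first, last


print(predicted_range(586))  # full-length sequence in the 2ace example
print(predicted_range(537))  # crystallized sequence in the 2ace example
```

Under this assumption, the full-length sequence of 586 residues gets predictions for positions 26 to 561 only. The 21 residues deleted from the start of 2ace fall entirely in the unpredicted zone, and so do most of the 28 deleted from the end.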