THE CONSURF DATABASE
SERVER FOR THE IDENTIFICATION OF FUNCTIONAL REGIONS IN PROTEINS
1
2
3
4
5
6
7
8
9
Variable
Average
Conserved
Insufficient Data
1 SILATPILIP ENQRPPFPRS VGKVIRSEGT EGAKFRLSGK GVDQDPKGIF
51 RINEISGDVS VTRPLDREAI ANYQLEVEVT DLSGKIIDGP VRLDISVIDQ
101 NDNRPMFKEG PYVGHVMEGS PTGTTVMRMT AFDADDPSTD NALLRYNILK
151 QTPTKPSPNM FYIDPEKGDI VTVVSPVLLD RETMETPKYE LVIEAKDMGG
201 HDVGLTGTAT ATILIDD
Homologues, Alignment and Phylogeny
- 67103 homologues were collected from the UNIPROT database using HMMER.
- Of these, 12429 homologues passed the thresholds (min/max similarity, coverage, etc.), of which 6017 are unique after CD-HIT clustering.
- The calculations were conducted on 300 hits (query included), sampled from the unique hits. A list of sequences that produced significant alignments but were not chosen as hits is also available.
- Average pairwise distance : 1.24
- Lower bound : 0.05
- Upper bound : 2.23
- Residue variety per position in the MSA (CSV table, best viewed in a spreadsheet or other CSV-aware editor)
- View MSA and phylogenetic tree using WASABI
- Download Phylogenetic Tree (Newick format)
- The best-fitting evolutionary model was WAG.
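The counts above describe a selection funnel, and a short calculation makes the proportions easier to see. The following is a minimal sketch that uses only the four counts reported on this page; the variable names and the `fraction` helper are illustrative assumptions, not part of ConSurf.

```python
# Counts reported on this page for the homologue-selection funnel.
collected = 67103  # hits returned by the HMMER search
passed = 12429     # hits that passed the similarity/coverage thresholds
unique = 6017      # hits remaining after CD-HIT clustering
used = 300         # sequences used in the calculation (query included)


def fraction(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Share of sequences surviving each stage, relative to the previous stage.
funnel = {
    "passed / collected": fraction(passed, collected),
    "unique / passed": fraction(unique, passed),
    "used / unique": fraction(used, unique),
}

for stage, pct in funnel.items():
    print(f"{stage}: {pct}%")
```

Roughly one in five collected hits passes the thresholds, about half of those remain after clustering, and only a small sample of the unique hits enters the final calculation.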
Running Parameters
- Homologues Search:
  - Homologues were collected from the UNIREF90 database, a clustered version of the UniProt database.
  - The homologue search algorithm is HMMER.
  - The HMMER E-value cutoff is 0.0001.
  - The number of HMMER iterations is 1.
- Homologues Thresholds:
  - The CD-HIT cutoff is 95% (this is the maximal sequence identity between homologues).
  - The maximal number of final homologues is 300. These are sampled from the list of unique homologues.
  - The maximal overlap between homologues is 10% (if the overlap between two homologues exceeds 10%, the highest-scoring homologue is chosen).
  - Coverage is 60% (this is the minimal percentage of the query sequence covered by the homologue).
- Alignment, Phylogeny and Conservation Scores:
  - The Multiple Sequence Alignment was built using MAFFT.
  - The phylogenetic tree was built using Neighbor Joining with ML distance.
  - Conservation scores were calculated with the Bayesian method.
  - The amino acid substitution model was chosen by best fit.
Related Links
- ConSurf server - for the identification of functional regions in proteins
- PDBsum - a pictorial database that provides an at-a-glance overview
- Proteopedia - an interactive encyclopedia of proteins