
E1DS (Enzyme 1-dimensional Signature) is a web server that predicts enzyme catalytic sites based on a collection of 1-dimensional signatures of protein sequences. The sequence signatures are derived by a novel pattern mining approach that aims at discovering long motifs consisting of several sequential blocks (ungapped polypeptides). E1DS uses 5421 signatures that in total cover 932 4-digit EC numbers for catalytic site prediction. Compared with some existing pattern databases, E1DS provides more complete sequence signatures, which benefit enzyme sequence/structure analyses and function inference.
Behind E1DS is a signature database built from 61829 enzymes collected from Swiss-Prot. These signatures are considerably conserved among the protein members of the associated EC group. When a query sequence is given, E1DS first identifies the potential EC number and then finds the signature that best matches the query sequence. Finally, the matched residues are highlighted as the predicted catalytic residues. According to our experimental results on a test set, 69.9% of catalytic sites can be successfully predicted by E1DS.
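The matching step can be illustrated with a small sketch. It is not the E1DS implementation: the signature notation (a list of blocks, with a lowercase "x" standing for any residue), the toy query sequence, and the function names are all assumptions made for illustration. The sketch joins the blocks with arbitrary gaps, searches the query, and reports the 1-based positions of the conserved residues in each block.

```python
import re

def signature_to_regex(blocks, max_gap=None):
    """Join blocks with a lazy gap; a lowercase 'x' in a block matches any residue."""
    gap = ".*?" if max_gap is None else ".{0,%d}?" % max_gap
    parts = ["(" + block.replace("x", ".") + ")" for block in blocks]
    return re.compile(gap.join(parts))

def matched_positions(sequence, blocks):
    """Return 1-based positions of the non-wildcard residues, grouped by block."""
    match = signature_to_regex(blocks).search(sequence)
    if match is None:
        return None
    result = []
    for i, block in enumerate(blocks, start=1):
        start = match.start(i)  # 0-based start of the i-th block in the sequence
        result.append([start + k + 1 for k, ch in enumerate(block) if ch != "x"])
    return result

# Hypothetical toy data, not a real enzyme or a real E1DS signature.
query = "MKTAYHGSCLAAGGDPRWSTVKHEL"
signature = ["HxSC", "DP", "KH"]
print(matched_positions(query, signature))  # [[6, 8, 9], [15, 16], [22, 23]]
```

Each inner list corresponds to one block, which mirrors the per-block coloring used in the sequence view.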
The sequence signatures are derived by a novel pattern mining approach, WildSpan, that aims at discovering long motifs consisting of several sequential blocks. WildSpan is a constraint-based sequential pattern mining algorithm that extracts frequent patterns satisfying user-specified constraints from unaligned sequences, where pattern components maintain their order in the sequential data [1,2]. The pattern mining procedure of WildSpan can be divided into two phases. In the first phase, WildSpan generates the complete set of closed pattern blocks satisfying the block constraint and the intra-block gap constraint. A pattern or block is closed if none of its super-patterns has exactly the same support (i.e. occurrence frequency). In the second phase, WildSpan discovers the complete set of closed long patterns satisfying the inter-block gap constraint by connecting the frequent blocks found in the first phase with large irregular gaps.
[1] Hsu, C.-M., Chen, C.-Y., and Liu, B.-J. (2006) MAGIIC-PRO: Detecting functional signatures by efficient discovery of long patterns in protein sequences. Nucleic Acids Res, 34, W356-W361.
[2] Hsu, C.-M. (2007) WildSpan: Discovery of Discontinuous Functional Motifs from Biological Sequences Using Constraint-based Sequential Pattern Mining. PhD thesis. Taoyuan, Yuan Ze University.
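The notion of a closed block can be made concrete with a naive sketch. This is not WildSpan: it enumerates only ungapped substrings up to a fixed length by brute force and ignores the gap constraints. The function names, the `max_len` cap, and the toy sequences are assumptions for illustration. Support is counted as the number of sequences containing a block.

```python
from collections import defaultdict

def frequent_blocks(sequences, min_support, max_len=6):
    """Map each ungapped substring to the number of sequences that contain it."""
    support = defaultdict(set)
    for idx, seq in enumerate(sequences):
        for i in range(len(seq)):
            for j in range(i + 1, min(len(seq), i + max_len) + 1):
                support[seq[i:j]].add(idx)
    return {b: len(ids) for b, ids in support.items() if len(ids) >= min_support}

def closed_blocks(blocks):
    """Keep a block only if no longer block containing it has the same support."""
    return {
        b: s for b, s in blocks.items()
        if not any(b != other and b in other and s == blocks[other] for other in blocks)
    }

# Hypothetical toy sequences.
toy = ["ACDEF", "GACDH", "ACDEK"]
print(closed_blocks(frequent_blocks(toy, min_support=2)))
# {'ACD': 3, 'ACDE': 2}
```

Here "AC" is frequent but not closed, because its super-pattern "ACD" occurs in the same three sequences; "ACD" is closed, because extending it to "ACDE" lowers the support to two.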
This is the home page of E1DS:

There are three ways on this page to submit a query protein whose catalytic site you want to locate. In the first way, you specify the query protein using a Swiss-Prot accession number, a Swiss-Prot entry name, or a PDB chain ID. For example, if you are interested in the catalytic site of a cysteine desulfurase, fill in the "Protein ID" field with your protein ID:

and then click the "Predict with the protein ID" link.

In this case, we use the default protein ID P77444. If you do not know the protein ID (or protein sequence) of the protein of interest, this section could be helpful to you. Submitting a protein in the other two ways is similar to the above process, except that it requires a protein sequence (in FASTA format). It is up to you which way to use: a protein ID, a protein sequence, or a FASTA file.
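For readers unfamiliar with FASTA, the format is a header line starting with ">" followed by one or more lines of sequence. The sketch below is a minimal reader written for illustration; it says nothing about how E1DS itself parses input, and the record shown is a made-up toy.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line.upper())  # sequence lines are concatenated
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

# Hypothetical toy record, not a real protein.
example = ">example_protein toy record\nMKTAYHGSCL\nAAGGDPRW\n"
print(parse_fasta(example))
```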
After submitting a query protein, you will see the prediction page of E1DS like this:

For normal queries, the prediction results consist of 4 parts. The 1st part lists basic information about your input sequence. Users are encouraged to check this information carefully to make sure the server has handled the query correctly. The 2nd part provides a sequence view of the predicted catalytic site. The 3rd and 4th parts provide a 3-dimensional view of the predicted catalytic site. The 3rd part shows the predicted catalytic site mapped onto a selected protein (not necessarily the query protein) for which a PDB structure is available. The user interface in the 4th part lets you view the predicted catalytic site mapped onto other existing protein structures.
Sometimes you only know the protein name (e.g. cysteine desulfurase) and have no appropriate protein ID or protein sequence at hand. You may also be unfamiliar with the ID types accepted by E1DS (Swiss-Prot accession number, entry name, and PDB chain ID) or with the FASTA format. Here we provide a simple method to get the protein ID or FASTA. Note that a protein name might correspond to many protein IDs/sequences, and you must choose the most appropriate one yourself.
First, go to the home page of Swiss-Prot (shown below), which offers a keyword search across the whole Swiss-Prot database.

All we need to do is to input the keyword. Here we use "cysteine desulfurase" as an example. Then, click the "Go" button.

This is the results page. In this example, O32975 is the Swiss-Prot accession number and CSD1_MYCLE is the entry name. Both are valid inputs to E1DS.
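The three ID types can be told apart by their shape. The sketch below is an assumption-laden illustration, not the validation E1DS performs: the accession pattern follows the format published by UniProt, the entry-name pattern assumes the Swiss-Prot "NAME_SPECIES" convention, and the PDB chain pattern assumes the "1KMK:A" style used on this page. The function name `classify_protein_id` is hypothetical.

```python
import re

# Accession number format as documented by UniProt.
ACCESSION = re.compile(r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}")
# Assumed Swiss-Prot entry name shape: protein mnemonic, underscore, species code.
ENTRY_NAME = re.compile(r"[A-Z0-9]{1,5}_[A-Z0-9]{1,5}")
# Assumed PDB chain ID shape: 4-character PDB ID, colon, chain letter.
PDB_CHAIN = re.compile(r"[0-9][A-Za-z0-9]{3}:[A-Za-z0-9]")

def classify_protein_id(text):
    """Guess the ID type from its shape; return None if nothing matches."""
    text = text.strip()
    if ACCESSION.fullmatch(text):
        return "accession number"
    if ENTRY_NAME.fullmatch(text):
        return "entry name"
    if PDB_CHAIN.fullmatch(text):
        return "PDB chain ID"
    return None

for candidate in ["O32975", "CSD1_MYCLE", "1KMK:A", "P77444"]:
    print(candidate, "->", classify_protein_id(candidate))
```

A shape check like this only tells you which kind of ID a string looks like; it does not confirm that the entry exists in Swiss-Prot or PDB.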

Furthermore, if you want the FASTA of this sequence, please click the entry name,

then you will see an entry page like this. Click the "FASTA format" link indicated by the arrow to retrieve the FASTA.

This panel (the 1st part of the prediction page) provides some basic information parsed from your input. If you entered a protein ID, this panel will show the following information.

Remember to examine the "Description of the enzyme class" field to make sure that the prediction results are what you are interested in. If you entered a protein sequence, this panel will look somewhat different.

It also contains the "Description of the enzyme class" field. In addition, some characters might be pruned (shown in gray); please make sure the black sequence is exactly what you want to use for prediction.
This panel (the 2nd part of the prediction page) highlights the predicted catalytic residues on the query sequence.

The letters in the center are the query sequence in FASTA format, while the numbers beside the sequence help you find the position of a specific amino acid. In addition, when your cursor hovers over an amino acid for a moment, a tooltip displays the index of that amino acid.
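A numbered sequence view of this kind is easy to reproduce offline. The sketch below is only an approximation for illustration: the line width, the placement of the numbers, and the function name `numbered_view` are assumptions, not the layout E1DS uses.

```python
def numbered_view(sequence, width=10):
    """Split a sequence into fixed-width lines with 1-based start and end positions."""
    lines = []
    for start in range(0, len(sequence), width):
        chunk = sequence[start:start + width]
        lines.append(f"{start + 1:>5} {chunk} {start + len(chunk)}")
    return lines

# Hypothetical toy sequence.
for line in numbered_view("MKTAYHGSCLAAGGDPRWSTVKHEL"):
    print(line)
#     1 MKTAYHGSCL 10
#    11 AAGGDPRWST 20
#    21 VKHEL 25
```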
The underlined amino acids are the predicted catalytic residues. Different colors represent different blocks of the signature used. For more information about E1DS blocks, you may check the references in this section. Generally speaking, residues in the same block are close together in the primary structure, while residues from different blocks cluster together in the tertiary structure. The colors also help you map the residues to those shown in the structure panel (explained below).
This panel consists of two regions (the 3rd and the 4th parts of the prediction page). The left region embeds a Java viewer for chemical structures in 3D, while the right region provides an interface to control which PDB structure is shown in the left region.

In the left region, the blocks of predicted catalytic residues are drawn as sticks, each in the same color as the corresponding block in the sequence panel. Ligands are displayed in spacefill and colored in CPK mode.
E1DS collects PDB structures that are similar to your query sequence and lists their information in the right region of this panel. In this case, there are 5 structures similar to the query sequence. The marked PDB ID (here 1KMK:A) is the one currently shown in the left region. The e-value comes from a sequence alignment (BLAST) between the query sequence and the sequence of the PDB entry. You can see how the predicted catalytic site looks on another PDB structure by clicking its ID.

Swiss-Prot is a curated protein sequence database which strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy and a high level of integration with other databases.
E1DS is a signature-based predictor of catalytic sites. The underlying signatures are mined from all sequences available in the current release of Swiss-Prot. That is, the E1DS signatures depend heavily on the sequence distribution of Swiss-Prot. If you are interested in how E1DS generates these signatures, this section would be helpful.
The Protein Data Bank (PDB) is the single worldwide depository of information about the three-dimensional structures of large biological molecules, including proteins and nucleic acids. These are the molecules of life that are found in all organisms including bacteria, yeast, plants, flies, and mice, and in healthy as well as diseased humans. Understanding the shape of a molecule helps to understand how it works.
One of the remarkable features of E1DS is that it maintains a comprehensive signature-structure mapping, so that users can get a 3D view of the predicted residues on selected structures through the interface. All structures used by E1DS come from PDB.
PROSITE consists of documentation entries describing protein domains, families and functional sites as well as associated patterns and profiles to identify them.
PROSITE, like E1DS, provides signatures for enzymes; in PROSITE they are called "patterns". Although the two kinds of signatures have much in common, they differ fundamentally in purpose. E1DS signatures are constructed to characterize the functional regions as completely as possible, while PROSITE patterns are designed for function inference and aim at both high sensitivity and high specificity in functional prediction. That is, E1DS signatures are suitable for catalytic residue detection, while PROSITE patterns might sacrifice some catalytic residues for better sensitivity and specificity. Moreover, E1DS focuses only on enzymes, while PROSITE provides more comprehensive analyses of proteins in general.
The structure panel in the prediction page provides an interactive 3D interface to the enzyme structure. To provide a better experience, this panel uses web technologies such as Java applets and AJAX. We provide some checkpoints to help people who are unfamiliar with these technologies.