Problem Solving With the Research Collaboratory for Structural Bioinformatics Protein Data Bank

Overall Searching and Browsing Capabilities

An interesting way of exploring structures relevant to medicinal chemistry is to enter the RCSB PDB via the Molecule of the Month column, linked to from the home page of the PDB website. This feature illustrates important biological molecules and how they function through descriptive text and pictures, with links to specific PDB entries and other resources. To dig deeper, various search interfaces are available on the RCSB PDB website. From the home page, users can search by PDB ID (if known, such as from a scientific publication), author name, or a keyword search using general terms such as 'diabetes' or 'insulin.' A number of different reports can be generated to compare various features of the selected structures. Selecting any of the PDB IDs listed on the results page will bring up a structure summary page for that individual structure. This latter page provides summary information about the entry, an illustration, links to coordinates, and links to detailed reports (such as biology and chemistry, or materials and methods) available within the RCSB PDB, as well as links to external web resources.

A number of additional search options, referred to as Query-by-Example, are available from the structure summary page. They offer simple ways of retrieving search results that share particular attributes. For example, all structures associated with a particular author can be quickly retrieved by simply choosing a single author name from the page.

Derived features, including SCOP14 and/or CATH13 structure classifications and Gene Ontology (GO) terms24 describing molecular function, biological process, and cellular component, can also be searched in this way. For example, choosing 'hormone activity' under GO terms and 'molecular function' on the structure summary page for the insulin structure 1APH25 retrieves all structures that have been classified under the same GO branch.

From this page, it is also possible to retrieve the PubMed abstract, and, using terms from it, to search MEDLINE for all structures with abstracts that contain the same terms.

There is also a more advanced and specialized search interface that allows a user to search for structures that have common characteristics in their experimental details, geometry, biology, chemistry, SCOP and CATH classifications, and citations.

Search Sequence provides several ways of finding structures that contain similarities to a given amino acid or nucleic acid sequence. Search Unreleased looks for structures that have been submitted to the PDB but have not yet been released. Search Ligands provides an interface for looking at all the small molecules covalently or noncovalently associated with macromolecules.

Using the 'PDB ID or keyword' option on the home page also queries the static web pages on the RCSB PDB site. For example, a search for insulin finds a number of pages, including the corresponding Molecule of the Month edition.

An alternative to searching is browsing, which is particularly useful in situations where queries cannot be quantitatively defined. A number of browsers, such as Biological Process, Molecular Function, or Disease, are available. Each browser (e.g., the Disease browser under Browse Database) offers a hierarchical classification that can be expanded (e.g., from Cancers to Colon Cancer) to show all structures associated with a particular disease.

Finally, a number of tools are available for evaluating and refining search results. Tabular reports (e.g., Structure Summary, Ligands, or Primary Citation) can be produced from the results obtained from various searches. These reports can also be customized from a large number of available attributes describing various aspects of the structures, to aid the user in evaluating the relevance of their search results.

No single search strategy can be recommended as the best practice for all possible scenarios. In general, however, users might be advised to start with simple keyword searches for one or more search terms of interest. Based on the number and relevance of the results returned by these simple searches, the queries can then be refined based on additional search terms, or more specific searches can be composed (e.g., using the Advanced Search option). Alternatively, browsing can provide some good first impressions of the content of the PDB.

Finding Ligands and Exploring Protein-Ligand Interactions

Particularly relevant to medicinal chemistry is the study of chemical entities bound to proteins, DNA, and RNA. These entities include commercial drugs, known inhibitors, toxic agents, and small molecules found in the cell such as ATP and GTP. The PDB website represents these entities by their common names, a three-letter code, a two-dimensional (2D) chemical diagram, and a SMILES string (actually used to construct the 2D diagram).26 Entities may be downloaded as a MOL file (defined by Molecular Design Ltd as a file that describes chemical structure) or as part of the PDB structure file.
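To make the MOL file mentioned above more concrete, the following is a minimal sketch of reading the atom and bond blocks of a V2000-style MOL file with fixed-column slicing. The function name `parse_molfile`, the `Atom` and `Bond` types, and the hand-written two-atom sample are illustrative assumptions, not files or tools provided by the RCSB PDB, and the sketch ignores charges, property lines, and the V3000 variant.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    symbol: str
    x: float
    y: float
    z: float


class Bond(NamedTuple):
    begin: int  # 1-based atom index, as stored in the file
    end: int
    order: int


def parse_molfile(text):
    """Return (title, atoms, bonds) from a V2000-style MOL file."""
    lines = text.splitlines()
    counts = lines[3]               # 4th line: atom and bond counts
    n_atoms = int(counts[0:3])
    n_bonds = int(counts[3:6])
    atoms = []
    for line in lines[4:4 + n_atoms]:
        atoms.append(Atom(
            symbol=line[31:34].strip(),
            x=float(line[0:10]),
            y=float(line[10:20]),
            z=float(line[20:30]),
        ))
    bonds = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        bonds.append(Bond(int(line[0:3]), int(line[3:6]), int(line[6:9])))
    return lines[0].strip(), atoms, bonds


# Hand-written sample (assumption): the two heavy atoms of methanol.
SAMPLE_MOL = "\n".join([
    "methanol, heavy atoms only",
    "  hand-written example",
    "",
    "  2  1  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.4300    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0  0  0  0",
    "M  END",
])

title, atoms, bonds = parse_molfile(SAMPLE_MOL)
print(title, [a.symbol for a in atoms], bonds)
```

A cheminformatics toolkit would normally be used for real work; the point here is only that a MOL file is a small, fixed-layout text table of atoms and bonds.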

There are several ways to search for and display ligands. If a particular structure has ligands, they are displayed in the Chemical Component section on the Structure Summary web page (Figure 3). From there, users can retrieve a list of PDB structures that contain a given ligand, view the ligand structure itself, or view the ligand's interactions with the macromolecule. The ligand structure view provides a chemical structure for the ligand, a 2D MarvinView sketch, a SMILES string, and a link to the MOL file. Selecting the SMILES string opens the Search Ligands form, where the user can modify the ligand and run a similarity or substructure search.
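Viewing ligand interactions with the macromolecule comes down, at its simplest, to a distance calculation between ligand atoms and protein atoms in the downloaded PDB structure file. The sketch below shows one way this could be done. It is not the algorithm used by any RCSB PDB tool: the function names, the 3.3 Å cutoff, and the three demo records (with made-up coordinates and a hypothetical ligand code `LIG`) are all assumptions, and no donor/acceptor chemistry is checked.

```python
import math
from typing import NamedTuple


class AtomRecord(NamedTuple):
    hetero: bool   # True for HETATM records (ligands, ions, waters)
    name: str
    resname: str
    chain: str
    resseq: int
    xyz: tuple


def parse_atoms(pdb_text):
    """Read ATOM/HETATM records using the fixed PDB column layout."""
    atoms = []
    for line in pdb_text.splitlines():
        record = line[0:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue
        atoms.append(AtomRecord(
            hetero=(record == "HETATM"),
            name=line[12:16].strip(),
            resname=line[17:20].strip(),
            chain=line[21],
            resseq=int(line[22:26]),
            xyz=(float(line[30:38]), float(line[38:46]), float(line[46:54])),
        ))
    return atoms


def ligand_contacts(atoms, ligand_resname, cutoff=3.3):
    """List (ligand atom, residue, protein atom, distance) within cutoff."""
    ligand = [a for a in atoms if a.hetero and a.resname == ligand_resname]
    protein = [a for a in atoms if not a.hetero]
    contacts = []
    for lig in ligand:
        for prot in protein:
            distance = math.dist(lig.xyz, prot.xyz)
            if distance <= cutoff:
                contacts.append((lig.name, f"{prot.resname}{prot.resseq}",
                                 prot.name, round(distance, 2)))
    return sorted(contacts, key=lambda contact: contact[3])


def demo_line(record, serial, name, resname, chain, resseq, x, y, z):
    """Format one made-up record in PDB columns (demo helper only)."""
    return (f"{record:<6}{serial:>5} {name:^4} {resname:>3} {chain}"
            f"{resseq:>4}    {x:8.3f}{y:8.3f}{z:8.3f}")


# Made-up coordinates, not taken from any PDB entry.
DEMO_PDB = "\n".join([
    demo_line("ATOM", 1, "OG", "SER", "A", 1, 0.0, 0.0, 0.0),
    demo_line("ATOM", 2, "NZ", "LYS", "A", 2, 10.0, 0.0, 0.0),
    demo_line("HETATM", 3, "O1", "LIG", "A", 3, 2.8, 0.0, 0.0),
])

contacts = ligand_contacts(parse_atoms(DEMO_PDB), "LIG")
print(contacts)
```

Raising or lowering `cutoff` plays the role of a user-specified interaction range: a larger value pulls in more distant, weaker contacts.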

The ligand interaction view launches the LigPro Ligand Explorer,27 a three-dimensional (3D) interactive tool that is specifically designed for inspecting protein-ligand interactions, such as hydrophilic, hydrophobic, and other van der Waals interactions. Ligand Explorer dynamically generates the associated ligand-macromolecule contact list, centers the view at the user-selected ligand, calculates interactions within a user-specified range, and provides one-click inspections of different types of interactions. Ligand Explorer provides two-way communication between the macromolecular sequence and the structure viewers - a click on any residue in the sequence viewer will highlight that residue in the structure viewer; a click on any fragment/residue in the structure viewer will highlight the corresponding sequence in the sequence viewer. The ATP-binding site of a protein kinase (PDB ID 1ATP28) is examined with Ligand Explorer in Figure 4a.

Finding Disease-Related Structures

The disease browser offers an ideal starting point for a user interested in structures that have been solved for proteins implicated in human disease. This browser is accessible from the search menu tab under Browse Database — Disease.


Figure 3 The Structure Summary page for 1ATP.28 The Chemical Component section in the middle gives information about ligands and provides links to view the ligand structure and interaction.

Figure 4 Ligand Explorer. (a) The view for 1ATP. When the user selects 'ATP_1' from the left side bar and 'Hydrophilic Interactions,' and clicks on the apply button, LigPro computes the view centered at ligand ATP_1, displays the protein residues that have hydrophilic (H bond) interactions with ATP in the structure viewer, and highlights these residues in red in the sequence viewer at the top. The green dashed lines connect putative H bond donors and acceptors, with distances (in angstroms) displayed at the midpoint. Selecting an interacting protein residue in the sequence viewer highlights this residue in yellow in the structure viewer (SER53 here). The number of calculated interactions is displayed in the status bar at the bottom. Clicking on a noninteracting protein residue in the sequence viewer turns on the all-atom display for this residue. Through the analysis menu on the top, the user can measure the distance between any two atoms, the angle between any three atoms, and the dihedral angles between the planes made by any four atoms. The image or selected interactions can be saved (under the File menu) for further analysis or publication. (b) View of the hydrogen bond interaction of the drug indinavir with the critical ASP25 residue in each of the two chains of the HIV-1 protease 1HSG.38


The hierarchy in the browser is based on the chapters and sections in the e-book Genes and Disease.29 The focus of the e-book is the set of inherited diseases caused by a mutation on a single gene. Identification of mutations in multiple genes whose products interact, disease-causing mutations whose phenotype is influenced by environmental conditions, and mutations in regulatory elements causing inherited diseases are challenges that will be addressed in the future.

Structures of proteins associated with any given disease are identified following a mapping to one or more OMIM (Online Mendelian Inheritance in Man) numbers.

A user interested in retrieving structures linked to a disease (e.g., colon cancer) can enter the term and perform a search across the hierarchy in the disease browser or run a keyword search: 'colon' and 'cancer.' At the time of writing, the disease browser retrieves 61 structures of proteins associated with colon cancer. Structures of proteins with similar sequences can be eliminated using the homology reduction function available from the query results page, which lets the user focus on just the proteins that differ from each other at the sequence level. Upon removing homologous sequences (menu item Narrow Query, then Remove Similar Structures, then 90% identity), five structures are returned. A number of tabular reports can be generated for the structures listed on the results page. The Biological Details report (under Summary Reports) lists the EC (Enzyme Commission) numbers and the GO terms and IDs associated with each of the structures. Depending on the user's focus, any of these structures may be further explored. For example, the GO term definitions for structure 1CTQ29a suggest that this protein participates in molecular signaling processes within the cell and plays a role in modulating the cell cycle, both of which are generally implicated in the process of transformation of healthy tissues to carcinomas.
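The homology reduction step described above can be pictured as a greedy filter: keep a structure only if its sequence is less than 90% identical to every structure already kept. The sketch below illustrates that idea only. How the RCSB PDB actually clusters sequences is not described here, so the greedy strategy, the function names, the entry labels, and the use of Python's `difflib.SequenceMatcher` as a crude stand-in for a real sequence alignment are all assumptions, and the sequences are toy fragments.

```python
from difflib import SequenceMatcher


def approx_identity(seq_a, seq_b):
    """Fraction of matched residues relative to the longer sequence.

    A rough proxy for alignment-based sequence identity.
    """
    matcher = SequenceMatcher(None, seq_a, seq_b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / max(len(seq_a), len(seq_b))


def remove_similar(sequences, threshold=0.90):
    """Keep the first entry of each group of sequences at or above threshold."""
    representatives = {}
    for entry_id, sequence in sequences.items():
        if all(approx_identity(sequence, kept) < threshold
               for kept in representatives.values()):
            representatives[entry_id] = sequence
    return list(representatives)


# Toy input (assumption): hypothetical entry labels and short fragments.
toy_sequences = {
    "entry_a": "MTEYKLVVVGAGGVGKSALT",
    "entry_b": "MTEYKLVVVGADGVGKSALT",  # one substitution relative to entry_a
    "entry_c": "ACDEFGHIKLMNPQRSTVWY",  # unrelated fragment
}

kept_entries = remove_similar(toy_sequences)
print(kept_entries)
```

With a greedy filter the result depends on input order: `entry_b` is dropped only because `entry_a` was seen first, which is one reason production tools use proper clustering.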

The structure summary page for 1CTQ provides the user with further insight into the structure. An interesting report to view from this page would be the biology and chemistry report (found under the Structural Reports menu item). This report lists the OMIM numbers and OMIM clinical synopses associated with 1CTQ. Clicking on the OMIM 190020 link leads to the OMIM summary for the gene HRAS, coding for p21. The summary lists research on this gene and mutations leading to its transition to a transforming gene. A point mutation at codon 12 replaces the glycine residue at that position, drastically impeding GTP hydrolysis to GDP by p21. The decrease in GTP to GDP hydrolysis results in p21 remaining in its active state, leading to uncontrolled cell proliferation and transformation. An interesting question arising from this information could be: what structural differences exist between the mutant and the wild type and could those differences explain the functional differences? The user may now be able to look for structures of mutants of p21 and compare them with the structure of 1CTQ (Figure 5), as described in our next section.

Exploring Genetic and Induced Mutations

A user interested in looking for the effects of mutations on 3D structure can start with a known structure of a nonmutant protein. We can look for structures with a similar sequence by using the Search Database menu (Search Database — Sequence), then entering '1CTQ' in the PDB ID box and running the sequence similarity search using either BLAST30 or FASTA.31 At the time of writing, 203 structures are retrieved by this search. We can refine the results by looking for the text word 'mutant' (menu item Refine this Search; then entering 'mutant' as the keyword in the text box for Keyword - advanced). At the time of writing, this refinement returns 46 structures.

Figure 5 Structures of (a) p21 (1CTQ)29a and two transforming mutants of p21: (b) 1AGP32 and (c) 2Q21.68 The images show the location of the transforming mutation at position 12 in the polypeptide chain. The image was created with Chimera.69

Many of these structures have point mutations at codon 12. Looking at each of these structures provides insights into the phenotype of the mutation. For example, a G12D transforming mutant of HRAS (1AGP)32 was found to crystallize in a space group different from that of the wild type. Also, the structure of 1AGP around the active site was found to differ from that of the wild type. On the other hand, a G12P non-transforming mutant had a structure very similar to that of the wild type in the active site. Structure 2Q2133 is a G12V transforming mutant in which the valine side chain interferes with GTP hydrolysis to GDP. The user can also explore a number of structures of mutants with point mutations at other positions in the sequence. The structure summary pages for all structures have links to the PubMed abstracts, where available. By following this link, the user can access an abstract of the analysis of the structure, along with the information necessary to retrieve the complete article.
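Once the sequences of a wild-type entry and a mutant entry are in hand (for example, from their FASTA files), point mutations such as G12D or G12V can be listed by comparing the two position by position. The following is a minimal sketch under two assumptions: the sequences have equal length, with no insertions or deletions, and the 20-residue fragment used here is only an illustrative stand-in for the start of the p21 sequence that should be checked against the actual entries. The function name `point_mutations` is hypothetical.

```python
def point_mutations(wild_type, mutant):
    """Return substitutions in 'G12V' style (wild type, 1-based position, mutant)."""
    if len(wild_type) != len(mutant):
        raise ValueError("sequences must have equal length (no indels)")
    return [
        f"{wt_residue}{position}{mut_residue}"
        for position, (wt_residue, mut_residue)
        in enumerate(zip(wild_type, mutant), start=1)
        if wt_residue != mut_residue
    ]


# Illustrative fragment (assumption); glycine sits at position 12.
WILD_TYPE_FRAGMENT = "MTEYKLVVVGAGGVGKSALT"

# Build a mutant by replacing residue 12 (index 11) with valine.
G12V_FRAGMENT = WILD_TYPE_FRAGMENT[:11] + "V" + WILD_TYPE_FRAGMENT[12:]

mutations = point_mutations(WILD_TYPE_FRAGMENT, G12V_FRAGMENT)
print(mutations)
```

Sequences of different lengths would first need a proper alignment, which is outside the scope of this sketch.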
