
Results and Future Plans

Protein Research Group
From Elucidation of Protein Structure and Function to an Understanding of Life Systems

Shigeyuki Yokoyama

Director, RIKEN System and Structural Biology Center

Shigeyuki Yokoyama was Deputy Director of the RIKEN Structural Genomics/Proteomics Initiative (RSGI), and also acted as a representative for RSGI researchers. Under his leadership, the group has opened up new frontiers in structural proteomics.

The human genome functions as a highly organized network that is built around tens of thousands of proteins produced from genes. These proteins exhibit their functions by interacting with ions and electrons, as well as with molecules such as other proteins, nucleic acids, and sugars. The nature of these interactions is governed by the three-dimensional structures of the proteins involved. Understanding and overcoming disease is a major issue that underpins life science, and many therapeutic drugs target proteins. Investigation based on three-dimensional structure is therefore expected to support drug development by targeting the proteins in the network that are involved in disease. For this reason, research into the structure and function of proteins (structural biology/structural proteomics) has in recent years become indispensable not only in life science but also in the elucidation of disease mechanisms.

Organization for Comprehensively Investigating Fundamental Structure

For the last 10 years, the Protein Research Group has carried out research with two long-term aims. The first is to understand life systems on the basis of research into the structure and function of proteins. The second is to use research results to benefit industry. At the time of its foundation in 1998, GSC proposed a project to elucidate fundamental protein structures and began preparing for this project at a very early stage. As the countdown to completion of the human genome project had begun, we predicted an imminent rush into protein research.
We first considered how we could systematically comprehend the vast number of protein varieties (estimated to be around 100,000 in humans). We devised a strategy whereby we focused on the fact that many proteins are composed of multiple functional domains, then we classified domains with similar sequences into "families". By investigating the representative structure of these families, we intended to collect information on the corresponding (molecular) function. This was the real essence of the "Protein Folds Project".
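The family-based strategy described above can be illustrated with a small sketch. This is not the method used in the Protein Folds Project. It is a minimal greedy clustering, assuming a crude string-similarity measure from Python's standard library in place of a real sequence alignment score. The domain names, sequences, and the 0.7 threshold are all hypothetical and chosen only for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Crude similarity in [0, 1]; a stand-in for a real alignment score."""
    return SequenceMatcher(None, a, b).ratio()


def group_into_families(domains: dict[str, str], threshold: float = 0.7) -> list[list[str]]:
    """Greedy clustering of domains into families.

    Each domain joins the first family whose representative (its first
    member) is similar enough; otherwise it starts a new family.
    """
    families: list[list[str]] = []
    for name, seq in domains.items():
        for family in families:
            representative = family[0]
            if similarity(seq, domains[representative]) >= threshold:
                family.append(name)
                break
        else:
            families.append([name])
    return families


# Hypothetical toy sequences, invented for illustration only.
domains = {
    "domA1": "MKVLAAGIVGLLLAQ",
    "domA2": "MKVLAAGIVGLLLSQ",
    "domB1": "GSHPEWTRRCDNYFQ",
    "domB2": "GSHPEWTKRCDNYFQ",
}

families = group_into_families(domains)
# One representative per family would be selected for structure analysis.
representatives = [family[0] for family in families]
```

With one representative per family, the number of structures to determine scales with the number of families rather than with the number of individual proteins, which is the point of the strategy.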

GSC Establishes a World-Class NMR Facility

GSC's large-scale NMR facility is equipped with more than 40 high-performance NMR systems, making it the largest such facility in the world. Researchers use this facility to investigate the three-dimensional structure and function of proteins.

We decided at that point to combine X-ray crystallography and NMR spectroscopy for structural analyses of proteins. For our X-ray crystallography, we planned to use SPring-8, the world's highest energy large-scale synchrotron radiation facility, which had begun operation in 1997. In order to carry out NMR research on a comparable level, we put forward a plan to build the world's first and largest NMR facility (see photograph). We joined the group headed by Professor Seiki Kuramitsu at the Harima Research Institute in 1999, and began working on a project entitled "Structural-Biological Whole Cell Project of Thermus thermophilus HB8". This project aims to systematically determine the three-dimensional structures of proteins from a model organism, the extremely thermophilic bacterium Thermus thermophilus.
Around the same time, the need for a systematic analysis of protein structure was being proposed in America, and an opportunity for international collaboration quickly came to the fore. As it turned out, America began its own Protein Structure Initiative (PSI) in 2000, while we organized the RIKEN Structural Genomics/Proteomics Initiative (RSGI) with the Harima Research Institute in 2001, creating a base organization for structure analysis in Japan.

Leading the National Project on Protein Structural and Functional Analysis (Protein 3000) Programs

The importance of protein structure analysis on a national scale had been comprehensively discussed by the Crystallographic Society of Japan, the newly founded Protein Science Society of Japan and other societies. In 2002, it was decided that the "Protein 3000 Project" would go forward for a 5-year period, ending in March 2007 under the guidance of the Japanese Ministry of Education, Culture, Sports, Science and Technology. The aim was to carry out large-scale analysis of proteins important to medical and biological science. We participated through RSGI, and were responsible for the comprehensive analysis program.
The top priority of this program was to perform a comprehensive search for mouse, human, and bacterial proteins that were considered highly important from a biological or medical perspective, were highly likely to be involved in major diseases, or were thought to be potentially useful for industrial applications. We then selected candidates from a wide range of approximately 10,000 domains and pared this number down while continuing analysis. At that time, we sought to optimize our research process by comparing proteins that showed structural or functional similarity.
The genome of Thermus thermophilus, which we chose as our research model, encodes a condensed set of approximately 2000 proteins that are necessary for sustaining life. We reasoned that, by using this organism, we would be able to elucidate mechanisms of fundamental life processes that are conserved in humans, and to obtain information useful for the development of antibacterial drugs. Another benefit of using Thermus thermophilus is that its proteins are extremely resistant to heat, which makes protein purification and crystallization comparatively simple.
Mainly through the work of the Kuramitsu Group at the Harima Research Institute, RSGI successfully cloned most of the genes of Thermus thermophilus, following which we began protein expression, purification, crystallization, and structure analysis. To date, the three-dimensional structures of over 300 proteins have been determined. When our results are combined with results from other groups around the world, structures are available for around 20% of the proteins encoded by the Thermus thermophilus genome. The prokaryote Thermus thermophilus is now the organism for which protein structure research is most advanced. We also worked on archaea, which are said to be the prototypes for eukaryotes.

Three-dimensional Structure Analysis Pipeline and its core technologies established during the Protein 3000 Project
This pipeline automates and accelerates the series of processes from protein sample preparation to structure analysis, and is being further developed.

The Next Challenge: Human and Mouse Proteins

X-ray Diffractometer and Protein Crystals

Three-dimensional protein structures are obtained from crystallized proteins by measuring X-ray diffraction data.

For human and mouse proteins, we first established a framework using the full-length cDNA library, and then analyzed functional domains of proteins involved in disease, drug design targets, and other proteins. This work resulted in determination of the structures of approximately 1500 domains. While carrying out this research, we also semi-automated each step in the research process, from sample preparation to three-dimensional structure determination, and turned the whole process into a research pipeline (see figure above). The use of cell-free protein synthesis systems as well as in vivo expression systems has made high-throughput sample preparation a reality. Prepared samples can then be screened with high efficiency for favorable expression level and desirable properties. Samples can then be labeled with stable isotopes*1 for NMR analysis and with selenomethionine*2 for the MAD phasing method of X-ray crystallography.
By the end of the 5-year project, we had succeeded in elucidating the structures of 1333 domains through X-ray crystallography, and 1342 through NMR analysis. This gave a total of 2675 structures, exceeding the RSGI target of 2500. In particular, the combined use of the cell-free protein synthesis system and NMR analysis to successfully analyze the structures of over 1300 domains set a new standard for structure analysis, and so deserves a special mention. To give a concrete example, we built up systematic structural information regarding the families of functional domains responsible for important functions such as signal transduction and transcription control in human cells (see figure below).
Through further X-ray crystallography, we succeeded in obtaining a wealth of information regarding the mechanisms by which multiple functional domains work together to express their function. We obtained particularly remarkable results regarding the structure and function of protein groups that support the signal transduction pathways in human and mouse cells, and regarding the structure and function of the RNA polymerases, ribosome, and various protein factors and enzymes that regulate genetic information (see bottom figure).
In the future, we would like to investigate fundamental life phenomena and the protein systems involved in various diseases. The systematic strategies that we have used so far will serve as the basis for analyses that take into account the interaction networks of proteins with multiple domains (multidomain proteins). Through this approach, we hope to gain an understanding of the mechanisms of action of entire systems, based on knowledge of three-dimensional structure.

*1 Stable Isotope Labeling
When proteins are analyzed using NMR, the samples are prepared with the NMR-observable stable isotopes 13C and 15N in place of the naturally prevalent 12C and 14N.

*2 The MAD Phasing Method
Multi-wavelength Anomalous Dispersion (MAD) is a method of crystal structure analysis that employs synchrotron radiation of modifiable wavelength to analyze proteins labelled with selenium or other elements.

Elucidated Domain Structures
We analyzed domains that have various intra- and extracellular functions.

Development of Ground Breaking Technologies

It is extremely important to develop and refine technology to break through the "research wall". We have developed many ground breaking techniques, including techniques for automation of protein expression and purification, as well as for preparation of troublesome proteins, particularly membrane proteins and other proteins considered to be important targets for drug design. Other successes include technological development in the field of X-ray crystallography, such as the automated robotic system for crystallization and observation. We have also developed NMR technologies such as high-field magnets, sensitive probes, and software to automate the analysis and structure determination.
In all these technological developments, we have placed major emphasis on testing during the actual structure analysis process, thereby gaining feedback that can be used to take development to even higher levels. By producing an analysis pipeline that incorporates advanced and optimized technology, it will be possible to respond to the structure analysis needs of systematic protein structure-based research, which is predicted to expand explosively in coming years. In the future, researchers who are not structure analysis experts may also be able to obtain structural information by using this pipeline. Use of this pipeline will lead to new standards in life science research. Since 2007 we have invited researchers from outside RIKEN to make use of our NMR-based structural analysis pipeline as a large-scale communal facility.
We have also aimed to apply the results of our protein structure analysis to industrial applications. This occurs via the following process. Protein structure data is used to perform a virtual screening, whereby data for low molecular weight compounds is input into a computer, and a search is performed for compounds that might control that protein's activity. The efficacy of the screened lead compounds is confirmed experimentally. Our group has aimed for application of our research data to the production of antiviral and anticancer agents by targeting proteins that are indispensable for proliferation of viruses and cancer.
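The virtual screening step described above can be sketched in a few lines. This is only a toy illustration and not the group's actual screening software. The scoring function, the numeric "descriptors", the compound names, and the pocket values are all hypothetical. A real screen would use docking or a comparable structure-based score derived from the protein's three-dimensional structure.

```python
from typing import Callable, NamedTuple


class Compound(NamedTuple):
    name: str
    descriptors: tuple[float, ...]  # hypothetical numeric features


def toy_score(pocket: tuple[float, ...], compound: Compound) -> float:
    """Placeholder score: negative squared distance between feature vectors.

    Higher is better. This stands in for a real structure-based score.
    """
    return -sum((p - d) ** 2 for p, d in zip(pocket, compound.descriptors))


def virtual_screen(
    pocket: tuple[float, ...],
    library: list[Compound],
    score: Callable[[tuple[float, ...], Compound], float] = toy_score,
    top_n: int = 2,
) -> list[str]:
    """Rank a compound library against a target and keep the best candidates."""
    ranked = sorted(library, key=lambda c: score(pocket, c), reverse=True)
    return [c.name for c in ranked[:top_n]]


# Hypothetical target features and compound library, for illustration only.
pocket = (1.0, 0.0, 2.0)
library = [
    Compound("cmpd-1", (1.0, 0.0, 2.0)),
    Compound("cmpd-2", (0.0, 0.0, 0.0)),
    Compound("cmpd-3", (1.0, 1.0, 2.0)),
    Compound("cmpd-4", (3.0, 0.0, 2.0)),
]

# Candidate lead compounds to be confirmed experimentally.
hits = virtual_screen(pocket, library)
```

The computational ranking only narrows the search. As the text notes, the efficacy of the selected compounds still has to be confirmed by experiment.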
At the time when GSC was established, structure-based drug development was the stuff of dreams; however, we have worked hard over the last 10 years to make it a reality. In order to make new compounds optimal and practical for use as drugs it is necessary to have a system of close collaboration with drug companies and other contacts in industry. We therefore established a new program, the "Novel Proteome Drug Discovery Joint Research Program (Partnership Program)" to make use of the research results of the Protein 3000 Project, which we carried out in addition to collaborative research based on the needs of businesses.
Aside from application to drug design, we would also like to use the newly obtained lead compounds as chemical inhibitors to carry out protein network analysis (chemical proteomics) that will then feed back into further drug design.

Proteins Involved in the Protein Life Cycle

Comprehensive analysis of the structure and function of proteins coded by the genome leads to a deeper understanding of the biofunctions of protein network systems, such as systems for expressing genetic information.

[PDB ID in the figure]
  • 1J1V → DnaA
  • 1UFI → CENP-B
  • 1HLV → CENP-B/DNA complex
  • 1IU3 → SeqA/DNA complex
  • 1V5W → Dmc1
  • 1KN0 → Rad52
  • 1UI0 → Uracil DNA glycosylase / Uracil complex
  • 1IW7 → RNA polymerase holoenzyme
  • 1H38 → T7 RNA polymerase
  • 1SMY → RNA polymerase / ppGpp complex
  • 1TJL → DksA
  • 1VS3 → tRNA pseudouridine synthase
  • 1UDN → RNasePH
  • 1VDX → 2'-5' RNA ligase
  • 1UEU → CCA lyase
  • 1J2B → ArcTGT
  • 1V2X → SpoU
  • 1WW1 → tRNase Z
  • 1WWR → TadA
  • 2CX5 → transcription factor
  • 1UDZ → IleRS CP1 domain
  • 1WKA → ValRS CP1 domain
  • 1ULH → TrpRS
  • 2CXI → PheRS
  • 1V4P → AlaRS
  • 1X56 → AsnRS
  • 1WWT → ThrRS TGS domain
  • 1X59 → HisRS WHEP-TRS domain
  • 2CYA → TyrRS
  • 2CYC → TyrRS
  • 1VBM → TyrRS/Tyr-AMS complex
  • 1J1U → TyrRS/tRNATyr/L-Tyr complex
  • 1VBQ → CysRS
  • 1WZ2 → LeuRS/tRNALeu complex
  • 2BTE → LeuRS/tRNALeu complex
  • 1IYW → ValRS
  • 1IVS → ValRS/tRNAVal/Val-AMS complex
  • 1A8H → MetRS
  • 2CSX → MetRS/tRNAMet complex
  • 2CT8 → MetRS/tRNAMet/MetSA complex
  • 2CV0 → GluRS/tRNAGlu/ATP/L-Glu complex
  • 2DB3 → Vasa/RNA/ATP analog complex
  • 2DYI → RimM
  • 1WF3 → ERA
  • 1V8Q → L27
  • 1UEB → EF-P
  • 1UFK → PrmA
  • 1WDT → EF-G-2
  • 1HZD → AUH
  • 2CSL → YabJ
  • 2CWJ → Endoribonuclease
  • 2CZJ → smpB/RNA complex
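
For readers who want to look up the structures in this legend, the entries can be turned into a simple lookup table. The sketch below is a minimal example that assumes the "ID → name" line format used above and the classic four-character PDB identifier format (a digit from 1 to 9 followed by three alphanumeric characters). The helper names are hypothetical, and the URL pattern is the RCSB PDB structure page address, included as an assumption about where a reader might look the entry up.

```python
import re

# Classic four-character PDB identifier: a digit 1-9, then three alphanumerics.
PDB_ID = re.compile(r"^[1-9][A-Za-z0-9]{3}$")


def parse_legend(lines: list[str]) -> dict[str, str]:
    """Map PDB IDs to descriptions from lines of the form '• ID → name'."""
    entries: dict[str, str] = {}
    for line in lines:
        text = line.strip().lstrip("•").strip()
        if "→" not in text:
            continue  # skip lines that are not legend entries
        pdb_id, description = (part.strip() for part in text.split("→", 1))
        if PDB_ID.match(pdb_id):
            entries[pdb_id.upper()] = description
    return entries


def rcsb_url(pdb_id: str) -> str:
    """Build the RCSB PDB structure page address for an ID (assumed pattern)."""
    return f"https://www.rcsb.org/structure/{pdb_id.upper()}"


# A few entries copied from the legend above.
legend = [
    "  • 1J1V → DnaA",
    "  • 1IW7 → RNA polymerase holoenzyme",
    "  • 2CZJ → smpB/RNA complex",
]

entries = parse_legend(legend)
urls = {pdb_id: rcsb_url(pdb_id) for pdb_id in entries}
```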