
Results and Future Plans

Genome Exploration Research Group
cDNA collection and discovery of the RNA continent leads us ultimately to network analysis

Yoshihide Hayashizaki

Director, RIKEN Omics Sciences Center

Dr. Hayashizaki is the Director of the Omics Sciences Center and led the Genome Exploration Research Group at GSC until 2008. He also actively leads the technology development and data production team. Many staff members in Dr. Hayashizaki’s group visit him frequently, and you’ll find his laboratory always filled with laughter accompanied by his lively Kansai dialect. His research motto is "Try what no one else can do", and he spends his time matching his words with his actions.

In 1995 the Genome Exploration Research Group took over and further developed the Mouse Genome Encyclopedia project being undertaken at GSC. In response to the proposal for a Human Genome Project by the United States in 1995, our group launched a trial project called the "Full-Length cDNA Project", a genome project designed to complement and respond to the American proposal. Our project set out to develop the basic technology essential for cDNA analysis, as one of the ways in which we could compete with the United States, which had invested huge funds in its work.

I still have a vivid memory of the time when GSC founder Dr. Akiyoshi Wada, during a visit to the site where the RISA high-speed sequencing system was being developed, said, "When this is complete, my dream will have come true." The group has been engaged in technical development ever since, and is proud to be working in the spirit of Dr. Wada, developing technology while advancing science. At the beginning, when we were struggling to get our projects up and running, Dr. Wada was kind enough to give me a book entitled "Koken Long-Range Mono-Plane" (Kiyoshi Tomizuka, Miki Press).
The book describes the development of the Koken long-range research plane, which was built to set a new world record for long-distance flight and ultimately succeeded in doing so.
At first I just leafed casually through the book, but what it taught me was that when developing systems, it is not always necessary to develop the most advanced technology possible. It is often better to use existing, reliable technology.
Thanks to this well-timed advice, our systems development leapt forward, producing a string of successes that included a series of full-length cDNA technologies and the development of automated sequencing. Putting these new technologies to work, we managed to build the RIKEN cDNA clone collection and obtain full-length sequences for all the clones. The collection now sets the global standard. In 2000, we hosted the first international Functional Annotation of Mouse cDNA (FANTOM) meeting as a means to annotate the huge volume of sequence data that we had obtained. The conference was attended by 65 researchers from Japan and overseas, and its results were published in Nature in 2001.

Exploiting Novel Technologies to Discover a New RNA Continent

Further analysis revealed that around half the mouse gene complement is subject to alternative splicing, and that in around 80% of these alternatively spliced genes, splicing changes the codons that specify amino acids in the resulting proteins. This discovery led to the understanding that the number of protein and mRNA varieties is considerably greater than the number of genes in the genome. These findings were published in a special feature of Nature along with the first draft of the mouse genome sequence, completed in America and Britain at that time. Thanks to these results, the mouse became the first organism in scientific history to have its genome and transcriptome deciphered simultaneously.
The technologies that we produced for creating full-length cDNA libraries, sequencing, and analysis were applied not only to analysis of full-length cDNA from mouse, but also to that of humans and other organisms, including rice, an important food resource here in Japan, the widely used model organism Arabidopsis thaliana, and the honeybee, which is well known as a social insect.
Full-length cDNA contains an overwhelming amount of information; however, obtaining it is very costly. To address the cost issue, we developed Cap Analysis of Gene Expression (CAGE), a technique that allows rapid identification of transcription start sites. In the CAGE method, a combination of heat resistant reverse transcriptase*1 and the Cap-trapper method*2 is used to generate products that can be cut with a restriction enzyme, yielding tag sequences that represent the first 20 or so bases at the 5′ ends of transcripts. The sequence of each tag can then be determined. This technique made it possible to identify transcription start sites on a genome-wide basis. When we added the data obtained through this technique to the full-length cDNA data and analyzed the combined set, it became clear that over half of the transcription products (transcripts) do not code for protein. Furthermore, we discovered that these non-coding RNAs carry out various functions in vivo. These results were published in Nature in 2005, and were picked up by several newspapers, which reported the "Discovery of a new RNA Continent". The results showed that over half the transcript complement had been overlooked throughout the history of science, and they opened up a vast new territory for research.
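To make the idea of tag-based start site identification concrete, here is a minimal sketch in Python. It is not our production pipeline: the function names, the exact-match search, and the rule of keeping only uniquely mapping tags are all assumptions made for illustration. It takes the first bases of each transcript as a tag, looks for that tag in a reference sequence, and counts how many tags point to each position.

```python
from collections import Counter

# "The first 20 or so bases": the exact tag length is an assumption.
TAG_LENGTH = 20


def extract_tag(transcript, tag_length=TAG_LENGTH):
    """Return the first tag_length bases from the 5' end of a transcript."""
    return transcript[:tag_length].upper()


def find_all(reference, tag):
    """Return every 0-based position where tag occurs exactly in reference."""
    positions = []
    start = reference.find(tag)
    while start != -1:
        positions.append(start)
        start = reference.find(tag, start + 1)
    return positions


def count_start_sites(reference, transcripts, tag_length=TAG_LENGTH):
    """Count tags per reference position (hypothetical helper).

    Tags shorter than tag_length, and tags that match the reference
    zero times or more than once, are ignored in this toy version.
    """
    reference = reference.upper()
    counts = Counter()
    for transcript in transcripts:
        tag = extract_tag(transcript, tag_length)
        if len(tag) < tag_length:
            continue
        hits = find_all(reference, tag)
        if len(hits) == 1:
            counts[hits[0]] += 1
    return counts


if __name__ == "__main__":
    # Toy reference and transcripts, invented purely for illustration.
    ref = "ACGTTGCAAGGCTTAACCGGATCGATTACGGCATGCCTAGGTACCA"
    reads = [ref[5:30], ref[5:25], ref[20:40]]
    print(sorted(count_start_sites(ref, reads, tag_length=8).items()))
```

Positions supported by many tags are candidate transcription start sites. Real data would also require handling of both strands, sequencing errors, and tags that map to several places in the genome, none of which is attempted here.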

Discovery of a New RNA Continent

A research field comparable in size to that of protein science has opened up.

*1 Heat resistant reverse transcriptase
Heat resistant reverse transcriptase is an enzyme that synthesizes DNA from single-stranded RNA in a process known as reverse transcription. Its enzymatic activity does not decrease even after high-temperature treatment, and in fact its reverse transcriptase activity is optimal at 65℃ and above. Because this enzyme can carry out reverse transcription and other reactions at high temperature, long RNA strands can be used as templates for reverse transcription. This was previously difficult, since long RNA molecules tend to fold into three-dimensional structures.

*2 Cap-trapper method
In the Cap-trapper method, the cap structures that mark the 5′ ends of full-length transcripts are chemically biotinylated and then recovered using avidin, which selects specifically for full-length cDNA.

Developing Technologies to Advance New Research

FANTOM1 (top) and FANTOM2 (bottom)
A gathering of over 100 top scientists from all over the world.

Looking back over the last 10 years at GSC, it was like climbing a seriously high mountain, although, having no real mountaineering experience, I am only guessing. The mountain that we climbed, while constantly developing new technologies, was the formidable peak known as the transcriptome. We came up against many problems and expended a lot of energy on the way, but we managed to make sound judgements with much help and advice from many scientists. Upon reaching the summit, an unknown RNA continent unfolded before our eyes. That is, of course, not to say that we had conquered the transcriptome, but rather that we had finally reached a vantage point from which we could see a broad outline of the challenge ahead.
Our group also carried out research that was somewhat closer to the lives of the public. One aspect of this research was the development of the Smart Amplification Process (SMAP) technique. The SMAP method allows for extremely rapid and totally specific amplification of a target gene. We have already developed a detection kit that is a first step toward individualized medical care. Using this kit, it is possible to carry out rapid and highly sensitive detection of mutations in the epidermal growth factor receptor (EGFR) gene, a gene involved in sensitivity to Iressa (gefitinib), a drug used in the treatment of lung cancer. Clinical application has just begun.
The SMAP method has made extremely accurate diagnosis a reality by completely repressing background noise. It has another favourable characteristic in that the amplification reaction proceeds at a constant temperature, and thus requires little energy. We are currently using this characteristic to develop a device that is powered by heat energy from a mobile telephone battery and with which individuals can perform a self-health check.

Accelerating Analytic Capability using the LSA Pipeline

In order to understand the myriad phenomena of life, we first need to uncover intracellular networks (linking genes and phenotypes) at the molecular level, using vast amounts of data and a range of techniques. It goes without saying that these networks will include RNA-related components.
To meet these requirements, we are building the Life Science Accelerator (LSA). LSA is a system that uses genome-level research methods to gather information about biomolecules systematically, and then to elucidate the biomolecular interactions that underlie life phenomena. We are hoping to develop a large-scale analysis system that we can use to elucidate transcription networks. To develop such a system, it would first be necessary to fix a transcription network as the analysis target, then analyze it using CAGE, cell-based assays*3 and other techniques. We would then add manifold data obtained through other analysis methods to carry out a huge integrated analysis that would unravel transcription networks. This kind of research strategy is on a completely different level from that of traditional individualized research, and may well be the kind of innovative approach that will completely renovate the existing approaches to life science research.
In the near future, it may be possible to receive analysis data from individual researchers working on individual genes and conduct systematic genome-wide analysis using LSA. We predict that the LSA analysis pipeline will greatly accelerate life science research, and that is the very reason for including the word "Accelerator" in its title.
In the course of developing LSA, it may also be possible to elucidate molecular mechanisms and molecular networks that traditional methodologies failed to reveal. I hope to show results that will play a leading role in the further development of life science research. I am also certain that LSA will become a foundation not only for RIKEN but also for life science in Japan as a whole.
On a final note, a commemorative cherry tree was planted at the entrance of the Yokohama Research Institute during the 2nd FANTOM international conference in 2002. I hope that this tree grows strong and that its blossoms will accompany the FANTOM project next spring.

*3 Cell-based Assay
The cell-based assay is a means of screening the responses of cultured cells to various stimuli.


Through analysis carried out through FANTOM and the Genome Network Project, a new RNA continent has emerged along with many new facts.