
Results and Future Plans

Former Human Genome Research Group (Oct. 1998-Mar. 2004)
Leading the Human Genome Project to a successful conclusion; applying the techniques learned to genomic analysis

Yoshiyuki Sakaki

President, Toyohashi University of Technology

Dr. Sakaki, the former Director of GSC, also served as Project Director of the Computational and Experimental Systems Biology Group and as Director of the Genome Core Technology Facilities. He formerly represented Japan in the Human Genome Project and served as President of the International Human Genome Organization (HUGO). He contributed to the further development of genome science in Japan by promoting the fundamental concept of "Omic Space" advocated by Dr. Wada, the former Director of GSC. He assumed his present position in 2008.

After the elucidation of the DNA double helix, decoding the information (the genome) written in the helix became the central theme of life science. Technologies making full use of molecular biology were developed along the way; recombinant DNA and DNA sequencing technologies, in particular, brought revolutionary changes to genome sequencing. This trend later led to the development of an automated DNA sequencer by the "Wada project" and gave rise to the Human Genome Project. The United States and the UK established large-scale sequencing centers one after another in the 1990s, because completing the sequence of the human genome was believed to require powerful and highly accurate sequencing capability. In Japan, sequencing of the human genome started in 1995 under the Advanced Human Genome Database program of the Japan Information Center for Science and Technology (JICST), but the project was conducted on a far smaller scale than those in the United States and the UK. A large-scale sequencing center was therefore necessary for Japan to maintain an influential voice on the international stage of genomic research. It was against this background that RIKEN GSC was established and the former Human Genome Research Group began its activities.

The Past 10 Years of the Group

The foremost achievement of the former Human Genome Research Group over the past 10 years was undoubtedly the complete sequencing of the human genome. GSC marked its presence on the world stage by representing Japan in the six-nation team (Japan, the United States, the UK, France, Germany, and China), by serving as the core center for sequencing chromosomes 21 and 11, and by acting as a sub-center for sequencing chromosome 18.
The sequencing of the human genome was not only of great historical significance in giving us our gene map; it also completely changed the way we investigate disease mechanisms and human evolution and diversity. Until then, life science had been limited to an inductive approach, inferring the whole picture from partial results obtained by studying individual genes. Only with the complete sequence could it establish a deductive approach that clarifies parts and details on the basis of complete information, such as all genes or all SNPs.
In addition, comparative genome analysis of human and chimpanzee is not only an essential step in elucidating human evolution but also a step toward elucidating the genetic basis of what makes us human. In 2000, GSC held an international workshop with the National Institute of Genetics, and building on this activity, GSC produced a comparative map of the human and chimpanzee genomes (Science, 2002) and a detailed comparative analysis of chimpanzee chromosome 22 and human chromosome 21 (Nature, 2004), leading the way in international genomic research. It was this kind of activity that prompted the United States to start the chimpanzee genome draft sequencing project. With the genome as a solid foundation, research aimed at elucidating human genetic characteristics has begun to move steadily forward.
In 2004, the former Human Genome Research Group was reorganized into the Genome Core Technology Facilities, with part of it joining the Computational and Experimental Systems Biology Group. Here I would like to look back on the activities of the original group over these past 10 years, as well as those that followed the separation.

The Human Genome Project started in 1990 and was led by the United States in cooperation and coordination with Britain, Japan, France, and Germany. Once the initial step of constructing the human genome map was completed in the mid-1990s, the project entered the sequencing phase. At the Bermuda meeting in 1996, it was decided that each country should focus on specific regions and aim for complete sequencing. Japan was assigned chromosome 21 (and, at a later meeting, main responsibility for chromosome 11 and sub-responsibility for chromosome 18). Work on chromosome 21 was spearheaded by the international "Chromosome 21 Consortium", which I organized as project leader. The cover shown above is from the May 18, 2000 issue of Nature magazine, in which the complete sequence of chromosome 21 was published.

1. Establishment of a World-Class Sequence Data Production Line

The first and most important target of the group was the establishment of a pipeline for genome analysis.
This required not only installing sequencers but also integrating three teams: "resources (BAC library, BAC clone alignment map, etc.)", the "sequence data production line", and the "informatics base, which edits and integrates the produced data into completed sequences and provides their essential meaning (annotation)".
Fortunately, building on experience from the days of the JICST project, by the summer of 1999 we had established a world-class pipeline capable of reading 100 million bases per year, thanks to three excellent team leaders, Asao Fujiyama (currently a professor at the National Institute of Informatics), Masahira Hattori (currently a professor at the University of Tokyo), and Tetsushi Yata (currently an associate professor at Kyoto University), and the efforts of approximately 40 enthusiastic staff members.
After the human genome sequence was completed, the sequencing pipeline was integrated with the cDNA sequencing line of the Hayashizaki group. The sequencing technology team of the Genome Core Technology Facilities, led by Senior Scientist Atsushi Toyoda, now sequences the genomes and cDNAs of various species in response to requests from inside and outside the center.

2. Complete Sequencing of Chromosome 21

The complete sequencing of chromosome 21 was the first major accomplishment that drew the world's attention to RIKEN GSC. The international Chromosome 21 Consortium, proposed at a meeting that David Patterson (U.S.) and I called in 1994, began full-scale research led by Japan and Germany following the 1996 Bermuda meeting. However, progress was slow because both Japan and Germany had limited sequencing capacity.
It was the establishment of GSC that substantially increased the research capability of the RIKEN group; under GSC's leadership the project made considerable progress, and data production was nearly complete by the end of 1999. The complete results were published in Nature as a full paper on May 18, 2000. Society at large took great interest in chromosome 21 because it is the chromosome responsible for Down's syndrome, the most frequent congenital disorder in newborns, and carries a causative gene of Alzheimer's disease.
In this paper, we identified 225 protein-coding genes, including 98 new genes. We also showed that genes are very unevenly distributed along the chromosome; in fact, there are "gene deserts" that contain almost no protein-coding genes. Taking the data for chromosome 22 into account as well, we suggested for the first time that the human genome contains not the 100,000 genes originally believed, but 40,000 or fewer. This prompted serious discussion of the total number of human genes in the academic community.

The critical region of human chromosome 21 in Down's syndrome, published in Nature (May 18, 2000 issue), is shown below together with a map of the corresponding region of the mouse genome.

Human chromosome 21 is the causative chromosome of Down's syndrome, the most frequent neonatal disorder. Sequencing chromosome 21 revealed 11 genes within the critical region of Down's syndrome (upper panel). Overexpression of these genes is thought to be related to the symptoms of Down's syndrome, such as mental retardation. In addition, we determined the sequence of the corresponding region of the mouse genome (bottom panel) and conducted a comparative study. Although 10 of the genes were well conserved in the mouse genome, one gene, designated DSCR9, was found only in the human genome.

3. Determination of the Human Genome Draft Sequence

The draft sequence of the human genome is an important achievement in the history of life science, revealing for the first time the whole picture of the human genome. The international team for decoding the human genome set out to determine the human genome sequence to the highest possible accuracy, as the most important foundation for medical and life science research, and released its data to the public free of charge and without restriction. By building this new research foundation, we aimed to accelerate the growth of medicine and life science.
However, in 1999, Celera Genomics Co., which was established following the arrival of multi-capillary sequencers on the market, adopted a strategy of rapidly reading the entire human genome to obtain a rough draft sequence using the whole-genome shotgun method*, with the intention of filing patent applications. This strategy brought it into direct competition with the international team. In May of that year, therefore, the international team temporarily set aside high-accuracy sequencing and decided to pursue a draft sequence of the human genome as an interim goal. Just over a year later, GSC's great efforts toward the draft sequence came to fruition.
In June 2000, the international team finished the draft sequence, and the results were published in Nature magazine in February 2001. Although questions remained about the accuracy of some of the data, this historic article revealed the overall picture of the human genome for the first time, and Nature later selected it as one of its 25 most historic articles, alongside the discovery of X-rays. Of the 20 centers participating in the Human Genome Project, GSC's genome sequencing team was the sixth-largest contributor, after The Sanger Center in the UK and four centers in the United States. The article predicted that the human genome contains 30,000-40,000 genes and showed for the first time that 45% of the genome consists of repeat sequences such as Alu and L1.

*A procedure in which genomic DNA is broken into fragments of suitable length, each fragment is sequenced directly on a sequencer, and a computer then connects the fragments like the pieces of a puzzle.
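To make the puzzle analogy in the footnote concrete, the sketch below repeatedly merges the pair of fragments with the longest suffix/prefix overlap until no overlaps remain. It is a toy illustration only, assuming error-free reads and no repeats; it is not the assembler used by Celera or by the international team, and the function names `overlap` and `greedy_assemble` and the `min_len` threshold are hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if no such overlap of at least min_len bases exists.
    """
    start = 0
    while True:
        # Find the next place where b's first min_len bases occur in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # Check that everything from there to the end of a matches b.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Toy greedy shotgun assembly (hypothetical helper, error-free reads assumed)."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that lie entirely inside another read; they add no information.
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return reads  # remaining contigs cannot be joined any further
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)


# Four overlapping fragments of a short made-up sequence.
fragments = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
print(greedy_assemble(fragments))  # ['ATTAGACCTGCCGGAATAC']
```

Repeats such as Alu and L1 are exactly what breaks this kind of greedy merging: a fragment from one copy of a repeat can overlap fragments from another copy, so the computer may join pieces that do not belong together.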

4. Completion of the Deciphering of the Entire Human Genome

Complete sequencing of the human genome will be recorded in scientific history as the definitive achievement of the international Human Genome Project. After the draft sequence, the international team continued working toward its original goal of determining the human genome sequence at the highest possible accuracy, and announced completion of the sequence in April 2003. The joint declaration by the leaders of the six countries participating in the project paid tribute to this immense undertaking: "Today, we have taken an important step toward building a future in which all the people on earth will be healthier. The human genome is a valuable common heritage of humankind, and we all owe thanks to the creativity and dedication of the participants involved. Their remarkable results are epoch-making achievements not only in science and technology, but also in human history."
On April 14, 2003, upon the announcement of the completion, I visited Prime Minister Junichiro Koizumi as the representative of GSC, together with Nobuyoshi Shimizu of Keio University, Hidetoshi Inoko of Tokai University, and Hideaki Sugawara of the National Institute of Genetics. We presented the Prime Minister with a set of 24 CD-ROMs containing all of the human genome data obtained. The scientific results of the complete sequencing were reported in a general collaborative article, published in the October 2004 issue of Nature magazine, and in separate articles on each chromosome written by its lead center.
The collaborative article resolved points that the draft analysis had left inadequate; for example, it put the total number of human protein-coding genes at approximately 22,000. It also reported the discovery of many duplicated regions. Following chromosome 21, GSC served as the core center for the complete sequencing of chromosome 11 and as a sub-center for sequencing chromosome 18, among the 24 chromosomes sequenced. The article on chromosome 11 was written under the direction of Team Leader Todd Taylor and was published in the March 2006 issue of Nature magazine.

5. Comparative Genome Analysis of Chimpanzee

The key task after decoding the human genome is to give meaning to the information written in it. Although all the genetic information that characterizes humans is written in the human genome, reading and interpreting it is not easy. In our efforts to clarify the characteristics of the human genome, we are presently analyzing the genomes of a variety of organisms and performing comparative analyses.
In particular, we reasoned that the target sequences could be narrowed down by comparing the genomes of humans and their closest relative, the chimpanzee, and in 2000 we began determining the sequence of the chimpanzee genome. First, we constructed a clone library by subcloning fragments of chimpanzee genomic DNA into bacterial artificial chromosomes (BACs). We then created a comparative map by determining the end sequences of each cloned fragment and matching them to the homologous regions of the human genome. As a result, we were able to map chimpanzee genomic DNA fragments precisely to more than 70% of the human genome. Allowing for regions where matching was technically impossible because of repeat sequences, we estimated that the genomic sequences of the two species share high homology in more than 90% of regions. We thus completed the world's first DNA comparative map of the human and chimpanzee genomes and showed that, within the regions available for precise comparison, the genomic sequences of the two species differ by 1.23%. This work was published in Science magazine in January 2002.
Additionally, we determined a highly accurate sequence of chimpanzee chromosome 22, which corresponds to human chromosome 21, and compared the two. GSC invited the groups from Germany, Korea, China, and Taiwan that had cooperated in sequencing human chromosome 21, and conducted the analysis as an international consortium. Hidemi Watanabe (currently a professor at Hokkaido University) was primarily responsible for writing up the results, which were published in the May 2004 issue of Nature magazine. In this article, we showed that insertions and deletions were far more numerous than estimated in both human and chimpanzee, that retrotransposons have been highly active, and that 2-3% of the genes have evolved markedly fast.
A further publication in Nature Genetics in 2006 reported the sequence of the chimpanzee Y chromosome, which is inherited only through males. This was mainly the work of Yoko Kuroki, a research scientist in our group. The Y chromosome is unique in providing information not given by the other chromosomes, and this information helps us understand biological evolution. Comparing the human and chimpanzee Y chromosomes confirmed our hypothesis about the structure of the ancestral-type Y chromosome common to both species.
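The 1.23% figure is a count of differing bases over the regions that could be compared precisely. The sketch below shows only that final counting step, assuming the human and chimpanzee sequences have already been aligned column by column; the alignment itself, the gap symbol `-`, the unknown-base symbol `N`, and the function name `percent_difference` are illustrative assumptions, not the actual GSC pipeline.

```python
def percent_difference(human: str, chimp: str) -> float:
    """Percent of comparable aligned positions at which the bases differ.

    Assumes both strings are already aligned (same length). Columns with a
    gap ('-') or an unknown base ('N') in either sequence are skipped, so
    the estimate covers only positions available for precise comparison.
    """
    if len(human) != len(chimp):
        raise ValueError("aligned sequences must have equal length")
    compared = differences = 0
    for h, c in zip(human.upper(), chimp.upper()):
        if h in "-N" or c in "-N":
            continue  # not a precisely comparable position
        compared += 1
        if h != c:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * differences / compared


# Made-up aligned fragments: one gap column is skipped, one substitution counted.
h, c = "ACGTTGCA-A", "ACGTTGCATG"
print(round(percent_difference(h, c), 2))  # 11.11
```

Skipping gap columns keeps this a measure of base substitutions only; insertions and deletions would have to be counted separately, which matters because they turned out to be far more frequent than estimated in the chromosome 21/22 comparison.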

Human and Chimpanzee: Comparison of Y-chromosomes

(a) Ancestral-type sex chromosome (human X chromosome); (b) ancestral-type Y chromosome; (c) human Y chromosome; (d) chimpanzee Y chromosome

The mammalian X and Y chromosomes are thought to have originally been a pair of homologous chromosomes. We compared the human and chimpanzee Y chromosomes (c, d) and, using the present-day human X chromosome (a), which conserves more of the ancestral genome structure than the Y chromosome, reconstructed the ancestral-type Y chromosome (b) as it was before the two species diverged. The analysis showed that the Y chromosome has evolved faster in the chimpanzee than in humans. Triangles with the same number and color in the figure indicate homologous regions in each chromosome.

6. Metagenome Analysis: A New Challenge

Genome analysis is difficult for microorganisms that are hard to culture, because enough DNA cannot be obtained for the analysis. However, in 2000, in collaborative research with Hideji Shigenobu and Subaru Ishikawa of the Faculty of Science, the University of Tokyo, we succeeded in analyzing the genome of the symbiotic bacterium Buchnera, which cannot be cultured in the laboratory because it can live only inside the intestine of aphids. This was made possible by isolating and collecting the bacteria under a microscope from inside the aphid intestine (Nature, 2000).
DNA sequencers have advanced remarkably, and genome information processing technology is also gaining ground. Moreover, "metagenome analysis", which analyzes DNA extracted directly from natural environments without first cloning the genomic DNA of each bacterium in the community, has recently shown promise. Applying this technique to intestinal bacteria, we have identified slightly fewer than 600 organisms, including many novel species, in the human intestine. We plan to extend our work on this intestinal bacterial community by investigating its relationship to health status, such as immune response and obesity. The key to developing such studies is information analysis, so further developments in computational biology are keenly anticipated.

The Vast Biological Diversity Motivating GSC's Comparative Analyses