A Vision for the Future of Genomics Research: A Blueprint for the Genomic Era

Part III

Grand Challenge I-4: Understand evolutionary variation across species and the mechanisms underlying it

The genome is a dynamic structure, continually subjected to modification by the forces of evolution. The genomic variation seen in humans represents only a small glimpse through the larger window of evolution, where hundreds of millions of years of trial-and-error efforts have created today's biosphere of animal, plant, and microbial species. A complete elucidation of genome function requires a parallel understanding of the sequence differences across species and the fundamental processes that have sculpted their genomes into the modern-day forms.

Inter-species sequence comparison is important for identifying functional elements in the genome (see Grand Challenge I-1). Beyond this illuminating role, determining the sequence differences between species will provide insight into the distinct anatomical, physiological, and developmental features of different organisms, will help to define the genetic basis for speciation, and will facilitate the characterization of mutational processes. This last point deserves particular attention, because mutation both drives long-term evolutionary change and is the underlying cause of inherited disease. The recent finding that mutation rates vary widely across the mammalian genome¹¹ raises numerous questions about the molecular basis for these evolutionary changes. At present, our understanding of DNA mutation and repair, including the important role of environmental factors, is limited.

Genomics will make it possible to substantially advance our understanding of evolutionary variation, which will, in turn, shed new light on the dynamic nature of genomes within a broader evolutionary framework.

Grand Challenge I-5: Develop policy options that facilitate the widespread use of genome information in both research and clinical settings

Realization of the opportunities provided by genomics depends on effective access to the information (such as data about genes, gene variants, haplotypes, protein structures, small molecules, and computational models) by a wide range of potential users, including researchers, commercial enterprises, healthcare providers, patients, and the public. Researchers themselves need maximum access to the data as soon as possible (see "Data release," below). Use of the information for the development of therapeutic and other products necessarily entails consideration of the complex issues of intellectual property (for example, patenting and licensing) and commercialization. The intellectual property practices, laws, and regulations that affect genomics must adhere to the principle of maximizing public benefit, but must also be consistent with more general and longer established intellectual property principles. Further, because genome research is global, international treaties, laws, regulations, practices, belief systems, and cultures also come into play.

Without commercialization, most diagnostic and therapeutic advances will not reach the clinical setting, where they can benefit patients. Thus, we need to develop policy options for data access and for patenting, licensing, and other intellectual property issues to facilitate the dissemination of genomics data.

II. Genomics to Health — Translating Genome-Based Knowledge into Health Benefits

The sequencing of the human genome, along with other recent and expected achievements in genomics, provides an unparalleled opportunity to advance our understanding of the role of genetic factors in human health and disease, to allow more precise definition of the non-genetic factors involved, and to apply this insight rapidly to the prevention, diagnosis, and treatment of disease. The report by the US National Research Council that originally envisioned the HGP was explicit in its expectation that the human genome sequence would lead to improvements in human health, and subsequent five-year plans reaffirmed this view¹⁵⁻¹⁷. But how this will happen has been less clearly articulated. With the completion of the original goals of the HGP, the time is right to develop and apply large-scale genomic strategies to empower improvements in human health, while anticipating and avoiding potential harm.

Such strategies should enable the research community to achieve the following:

Identify genes and pathways with a role in health and disease, and determine how they interact with environmental factors.
Develop, evaluate, and apply genome-based diagnostic methods for the prediction of susceptibility to disease, the prediction of drug response, the early detection of illness, and the accurate molecular classification of disease.
Develop and deploy methods that catalyse the translation of genomic information into therapeutic advances.

Grand Challenge II-1: Develop robust strategies for identifying the genetic contributions to disease and drug response

For common diseases, the interplay of multiple genes and multiple non-genetic factors, not a single allele, usually dictates disease susceptibility and response to treatments. Deciphering the role of genes in human health and disease is a formidable problem for many reasons, including impediments to defining biologically valid phenotypes, challenges in identifying and quantifying environmental exposures, technological obstacles to generating sufficient and useful genotypic information, and the difficulties of studying humans. Yet this problem can be solved. Vigorous development of crosscutting genomic tools to catalyse advances in understanding the genetics of common disease and in pharmacogenomics is needed. Prominent among these will be a detailed haplotype map of the human genome (see Grand Challenge I-3) that can be used for whole-genome association studies of all diseases in all populations, as well as further advances in sequencing and genotyping technology to make such studies feasible (see "Quantum leaps," which will appear in the final part of this report).

More efficient strategies for detecting rare alleles involved in common disease are also needed, because the hypothesis that alleles that increase risk for common diseases are themselves common³⁰ will probably not hold universally. Computational and experimental methods to detect gene-gene and gene-environment interactions, as well as methods for integrating a variety of relevant databases, are also required (Box 3). A large longitudinal population-based cohort study, with collection of extensive clinical information and ongoing follow-up, would be profoundly valuable to the study of all common diseases, because it would provide unbiased assessments of the relative disease risk that particular gene variants contribute (Box 1). Projects such as the UK Biobank, the Marshfield Clinic's Personalized Medicine Research Project, and the Estonian Genome Project already seek to provide such resources. But if the multiple population groups in the United States and elsewhere in the world are to benefit fully and fairly from such research (see Grand Challenge II-6), a large population-based cohort study that includes full representation of minority populations is also needed.
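To make "relative disease risk" concrete: in a prospective cohort, the relative risk (RR) for carriers of a variant is the disease incidence among carriers divided by the incidence among non-carriers over the same follow-up. The sketch below computes it from a 2x2 table of outcomes. The `CohortTable` type, the `relative_risk` function, and the counts are hypothetical illustrations, and the 95% confidence interval uses the standard log-scale (Katz) approximation, which is our choice rather than a method specified in this report.

```python
import math
from dataclasses import dataclass


@dataclass
class CohortTable:
    """Hypothetical 2x2 follow-up table from a prospective cohort."""
    carriers_cases: int        # carriers of the variant who developed disease
    carriers_noncases: int     # carriers who remained disease-free
    noncarriers_cases: int     # non-carriers who developed disease
    noncarriers_noncases: int  # non-carriers who remained disease-free


def relative_risk(t: CohortTable, z: float = 1.96) -> tuple[float, float, float]:
    """Return (RR, lower, upper) with a log-scale (Katz) confidence interval."""
    a, b = t.carriers_cases, t.carriers_noncases
    c, d = t.noncarriers_cases, t.noncarriers_noncases
    if a == 0 or c == 0:
        # The log-scale interval is undefined when either group has no cases.
        raise ValueError("each group needs at least one case")
    risk_carriers = a / (a + b)        # incidence among carriers
    risk_noncarriers = c / (c + d)     # incidence among non-carriers
    rr = risk_carriers / risk_noncarriers
    # Standard error of ln(RR) under the usual large-sample approximation.
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical cohort: 2,000 carriers (120 cases), 8,000 non-carriers (240 cases).
rr, lo, hi = relative_risk(CohortTable(120, 1880, 240, 7760))
print(f"RR = {rr:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

With these made-up counts, incidence is 6% in carriers and 3% in non-carriers, so the RR is 2.0, with an interval of roughly 1.6 to 2.5. A longitudinal cohort is what makes this ratio directly estimable: a case-control design fixes the number of cases by sampling, so it yields an odds ratio rather than incidences.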

Grand Challenge II-2: Develop strategies to identify gene variants that contribute to good health and resistance to disease

Most human genetic research has traditionally focused on identifying genes that predispose to illness. A relatively unexplored, but important, area of research focuses on the role of genetic factors in maintaining good health. Genomics will facilitate further understanding of this aspect of human biology and allow the identification of gene variants that are important for the maintenance of health, particularly in the presence of known environmental risk factors. One useful research resource would be a "healthy cohort," a large, epidemiologically robust group of individuals (Box 1) with unusually good health, who could be compared with cohorts of individuals with diseases and who could also be intensively studied to reveal alleles protective for conditions such as diabetes, cancer, heart disease, and Alzheimer's disease. Another promising approach would be rigorous examination of genetic variants in individuals at high risk for specific diseases who do not develop them, such as sedentary, obese smokers without heart disease or individuals with HNPCC (hereditary non-polyposis colorectal cancer) mutations who do not develop colon cancer.

References

1. Mendel, G. Versuche über Pflanzen-Hybriden. Verhandlungen des naturforschenden Vereines, Abhandlungen, Brünn 4, 3-47 (1866).
2. Avery, O. T., MacLeod, C. M. & McCarty, M. Studies of the chemical nature of the substance inducing transformation of pneumococcal types. Induction of transformation by a desoxyribonucleic acid fraction isolated from Pneumococcus Type III. J. Exp. Med. 79, 137-158 (1944).
3. Watson, J. D. & Crick, F. H. C. Molecular structure of nucleic acids: A structure for deoxyribose nucleic acid. Nature 171, 737 (1953).
4. Nirenberg, M. W. The genetic code: II. Sci. Am. 208, 80-94 (1963).
5. Jackson, D. A., Symons, R. H. & Berg, P. Biochemical method for inserting new genetic information into DNA of Simian Virus 40: circular SV40 DNA molecules containing lambda phage genes and the galactose operon of Escherichia coli. Proc. Natl Acad. Sci. USA 69, 2904-2909 (1972).
6. Cohen, S. N., Chang, A. C., Boyer, H. W. & Helling, R. B. Construction of biologically functional bacterial plasmids in vitro. Proc. Natl Acad. Sci. USA 70, 3240-3244 (1973).
7. The International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature 409, 860-921 (2001).
8. Sanger, F. & Coulson, A. R. A rapid method for determining sequences in DNA by primed synthesis with DNA polymerase. J. Mol. Biol. 94, 441-448 (1975).
9. Maxam, A. M. & Gilbert, W. A new method for sequencing DNA. Proc. Natl Acad. Sci. USA 74, 560-564 (1977).
10. Smith, L. M. et al. Fluorescence detection in automated DNA sequence analysis. Nature 321, 674-679 (1986).
11. The Mouse Genome Sequencing Consortium. Initial sequencing and comparative analysis of the mouse genome. Nature 420, 520-562 (2002).
12. The Chipping Forecast II. Nature Genet. 32, 461-552 (2002).
13. Guttmacher, A. E. & Collins, F. S. Genomic medicine — A primer. N. Engl. J. Med. 347, 1512-1520 (2002).
14. National Research Council. Mapping and Sequencing the Human Genome (National Academy Press, Washington DC, 1988).
15. US Department of Health and Human Services, US DOE. Understanding Our Genetic Inheritance. The US Human Genome Project: The First Five Years. NIH Publication No. 90-1590 (National Institutes of Health, Bethesda, MD, 1990).
16. Collins, F. & Galas, D. A new five-year plan for the US Human Genome Project. Science 262, 43-46 (1993).
17. Collins, F. S. et al. New goals for the US Human Genome Project: 1998-2003. Science 282, 682-689 (1998).
18. Hilbert, D. Mathematical problems. Bull. Am. Math. Soc. 8, 437-479 (1902).
19. Aparicio, S. et al. Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes. Science 297, 1301-1310 (2002).
20. Sidow, A. Sequence first. Ask questions later. Cell 111, 13-16 (2002).
21. Zhang, M. Q. Computational prediction of eukaryotic protein-coding genes. Nature Rev. Genet. 3, 698-709 (2002).
22. Banerjee, N. & Zhang, M. X. Functional genomics as applied to mapping transcription regulatory networks. Curr. Opin. Microbiol. 5, 313-317 (2002).
23. Van der Weyden, L., Adams, D. J. & Bradley, A. Tools for targeted manipulation of the mouse genome. Physiol. Genomics 11, 133-164 (2002).
24. Hannon, G. J. RNA interference. Nature 418, 244-251 (2002).
25. Stockwell, B. R. Chemical genetics: Ligand-based discovery of gene function. Nature Rev. Genet. 1, 116-125 (2000).
26. Gavin, A. C. et al. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature 415, 141-147 (2002).
27. Tyson, J. J., Chen, K. & Novak, B. Network dynamics and cell physiology. Nature Rev. Mol. Cell Biol. 2, 908-916 (2001).
28. Sachidanandam, R. et al. A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 409, 928-933 (2001).
29. Gabriel, S. B. et al. The structure of haplotype blocks in the human genome. Science 296, 2225-2229 (2002).
30. Reich, D. E. & Lander, E. S. On the allelic spectrum of human disease. Trends Genet. 17, 502-510 (2001).
31. Hirschhorn, J. N., Lohmueller, K., Byrne, E. & Hirschhorn, K. A comprehensive review of genetic association studies. Genet. Med. 4, 45-61 (2002).
32. Wagner, K. R. Genetic diseases of muscle. Neurol. Clin. 20, 645-678 (2002).
33. Golub, T. R. Genomic approaches to the pathogenesis of hematologic malignancy. Curr. Opin. Hematol. 8, 252-261 (2001).
34. Drews, J. & Ryser, S. The role of innovation in drug development. Nature Biotechnol. 15, 1318-1319 (1997).
35. Druker, B. J. Imatinib alone and in combination for chronic myeloid leukemia. Semin. Hematol. 40, 50-58 (2003).
36. Selkoe, D. J. Alzheimer's disease: genes, proteins, and therapy. Physiol. Rev. 81, 741-766 (2001).
37. Lynch, H. T. & de la Chapelle, A. Genomic medicine: hereditary colorectal cancer. N. Engl. J. Med. 348, 919-932 (2003).
38. Gardner, M. J. et al. Genome sequence of the human malaria parasite Plasmodium falciparum. Nature 419, 498-511 (2002).
39. Holt, R. A. et al. The genome sequence of the malaria mosquito Anopheles gambiae. Science 298, 129-149 (2002).
40. Anderlik, M. R. & Rothstein, M. A. Privacy and confidentiality of genetic information: What rules for the new science? Annu. Rev. Genom. Hum. Genet. 2, 401-433 (2001).
41. Hudson, K. L., Rothenberg, K. H., Andrews, L. B., Kahn, M. J. E. & Collins, F. S. Genetic discrimination and health-insurance — An urgent need for reform. Science 270, 391-393 (1995).
42. Rothenberg, K. et al. Genetic information and the workplace: Legislative approaches and policy challenges. Science 275, 1755-1757 (1997).
43. Fuller, B. P. et al. Policy forum: Ethics — privacy in genetics research. Science 285, 1359-1361 (1999).
44. Miller, P. S. Is there a pink slip in my genes? J. Health Care Law Policy 3, 225-265 (2000).

Courtesy: National Human Genome Research Institute
