Endocrine Reviews, The University of Texas Medical Branch, Galveston, Texas 77555-0628
Along with all of biology and medicine, the field of endocrinology will be affected profoundly by the "genomics" era. This new era will bring changes for all of us, from basic bench scientists to practicing clinicians. The power and quantity of the information created by the closely related technologies of genomics and proteomics will require us to adopt new ways of thinking about our patients and our research.
The science of endocrinology has long been based on understanding the mechanisms of hormone actions. We add to this knowledge each year by discovering new hormones and hormone-like substances and unraveling the ways in which they carry out their functions. From the accumulated studies of many laboratories, it has become increasingly obvious that the action of any hormone is much more than a simple, single linear sequence of causes and effects. Rather, hormones and the regulatory pathways they control form interlocking networks. The molecules that function in these networks vary in quantity and identity from cell to cell. The interactive nature of the networks means, therefore, that the concentrations of each network molecule and the affinity of its molecular interactions determine the outcome of each hormone's effect at a given time in a particular cell type. Though in principle the classic study of single cause-and-effect relationships, isolated from all other circumstances, could work out these interactive networks, in fact, rigid adherence to the strictly classic approach would make it a difficult and very lengthy task. Now, the discoveries and approaches globally grouped under the terms "genomics" and "proteomics" offer the opportunity to address this problem of interactive systems in a new way.
By genomics, I refer to the knowledge of the complete genome sequences of more and more species, the use of gene arrays to study gene expression, and the development and use of gene expression profiles to identify propensity to illness, response to drugs, and many other clinically important behaviors. I use proteomics to refer to studies of expressed proteins in arrays, of actual and predicted protein structures, and other protein-based analyses. Both genomics and proteomics require analysis of very large, complex bodies of data by processes termed collectively "bioinformatics." The use of these three powerful new tools will require some changes in our thinking. We will need to know more about the specific kinetics of interactions between the molecules in the networks that we study. We will have to be prepared to look upon an experiment not as a one-to-one cause and effect but as sets of causes and effects reverberating through a network. We will need to train ourselves in new methods and new mathematical and computer-based bioinformatic approaches to data analysis. The idea of mathematical models for endocrine systems is not new. For example, as long ago as the 1960s, Berman (1) was attempting to build kinetic models of hormone action; Urquhart et al. (2) and Yates et al. (3) combined experimental tests and mathematical models to study adrenocortical secretory function. These and other early studies showed the potential for dynamic models of complex endocrine systems. We will be increasingly required to use mathematical approaches combined with our traditional methods, as we attempt to apply the valuable tools laid at our feet by the technical and informational advances that genomics and proteomics bring.
One of the great technical advances within genomics is the invention of the gene chip, physical arrays of probes that can identify up to tens of thousands of unique mRNAs in a heterogeneous sample. By using such chips, one can address the question of how many and which genes change their expression at the mRNA level after any given treatment. Although in some quarters the use of these chips has been scorned as mere data gathering, in fact, getting an overall idea of the changes in gene expression caused by a hormone is a necessary step if we are to begin to define and understand the interactive networks controlled by hormones. We will have to change the concept that all good science must study only specific individual mechanisms. Although such studies are valuable and important, science progresses in stages, and we are at the stage when new data must be obtained and new discoveries must be made to lay the groundwork for future mechanistic studies. The history of science shows the truth in this. For example, if they were to attempt to publish nowadays, the great naturalists of the 19th century might have their papers rejected as mere collections of data, because much of what they did was identifications and classifications of plants and animals. Yet, in many ways that work provided the necessary basis for much that has followed in biology. With the advent of genomics/proteomics, once more a time has come when it will be necessary for us to make discoveries about the extent and intensity of changes in gene expression levels due to hormones, under a great variety of circumstances. While we should guard against repetitious and trivial work in this endeavor, we should encourage important studies. These can provide databases that will point the way toward critical new experiments and testable hypotheses directed at the mechanisms by which hormones bring about their far-reaching effects. 
Once this new paradigm is understood, what issues should be addressed to develop the databases about these matters as quickly and as rigorously as possible?
Several issues arise when one begins to employ gene arrays carrying thousands of genes in an effort to see the extent of change that occurs after a given treatment. These issues concern validity of the results, the statistical probability of their being correct, their accuracy and precision, and what the changes mean in terms of biological function. Work in the field is currently underway to establish a consensus as to how one should assess the validity of gene array analysis data. We are in the development phase of appropriate methods. From the many working systems being developed, there is no doubt that over time there will follow a selection of those found satisfactory. The usual validity and statistical issues of endocrine studies are compounded in gene chip studies because of the high expense (for the average laboratory) of the existing available commercial chips. This cost precludes most labs from carrying out large numbers of replicate experiments. Of course, it may be expected that as greater use is made of the technology, costs will be lowered. Also, as it becomes increasingly easy for institutions, if not individual labs, to set up cores in which specialized chips can be made cheaply, costs will further diminish. At this time and for the near future, however, based on the number of replicates currently practicable, there is still active discussion as to what are proper validations and statistical methods to be applied in gene chip array analyses. Also relevant are the various software packages available for the analysis of the data from the chips. Results obtained are not always identical when differing analytical methods embedded in the software are employed to analyze the same primary data, because the various programs use differing methods to solve issues of background, normalization, and standardization. One must try to examine the raw data by more than one approach before concluding what genes have actually changed expression. 
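The dependence of the gene list on the normalization method can be illustrated with a toy calculation. The sketch below is only an illustration under stated assumptions: the five genes and their intensities are invented, and the two scaling rules (total-intensity scaling and median scaling) are simple stand-ins, not the methods of any particular software package. It applies a 2-fold cutoff to the same primary data after each normalization.

```python
from statistics import median

# Hypothetical raw intensities for five genes on two arrays (invented values).
control = {"A": 100.0, "B": 200.0, "C": 400.0, "D": 800.0, "E": 1000.0}
treated = {"A": 210.0, "B": 200.0, "C": 400.0, "D": 800.0, "E": 5000.0}


def scale_by_total(ctrl, trt):
    """Rescale the treated array so its total intensity matches the control."""
    factor = sum(ctrl.values()) / sum(trt.values())
    return {gene: value * factor for gene, value in trt.items()}


def scale_by_median(ctrl, trt):
    """Rescale the treated array so its median intensity matches the control."""
    factor = median(ctrl.values()) / median(trt.values())
    return {gene: value * factor for gene, value in trt.items()}


def changed_genes(ctrl, trt_normalized, min_fold=2.0):
    """Return genes whose fold change is at least min_fold up or down."""
    called = set()
    for gene, baseline in ctrl.items():
        fold = trt_normalized[gene] / baseline
        if fold >= min_fold or fold <= 1.0 / min_fold:
            called.add(gene)
    return called


called_total = changed_genes(control, scale_by_total(control, treated))
called_median = changed_genes(control, scale_by_median(control, treated))

print("total-intensity scaling:", sorted(called_total))
print("median scaling:", sorted(called_median))
```

With these invented numbers, one strongly induced gene (E) inflates the total intensity of the treated array. Total-intensity scaling therefore reports B, C, and D as decreased, whereas median scaling reports A and E as increased. The two lists do not overlap at all, which is why the raw data should be examined by more than one approach.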
Secondly, because of these normalization, background, and related practical problems, most analyses take mRNA changes of no smaller than 2- to 3-fold as the minimum acceptable to be considered different from random. (This kind of cutoff is not new to studies of RNA expression. A 2-fold change has been the rough working differential minimum taken in interpreting results obtained by the established methods of Northern blots, quantitative PCR, and other popular techniques for measuring individual RNAs.) While, for practical purposes, these limits may presently be the best one can do, important quantitative changes in interactive molecular species may show far less than 2-fold alterations. As has long been known, for a given increase in synthetic rate while degradation rate remains constant, a molecular pool of small size will show a larger fold change than a pool initially of large size (4, 5). It is therefore important to keep in mind that the fold change in a particular mRNA or protein is not a direct representation of its quantity. In terms of cellular concentration, a relatively small fold change of a large pool may represent a very large change in amount. In the future, our technology must be good enough to unequivocally identify differences in gene expression and levels of gene products, even though they fall below the present cutoffs imposed by the state of the technology. Obviously, signaling systems may respond to minor changes in amount of a particular gene product, and we must be able to detect and study such changes if we hope to understand fully the pathways through which hormones work.
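The relation between pool size and fold change can be made concrete with a small calculation. The sketch below assumes the simplest possible model, which is an assumption of this example and not a statement about any particular gene product: synthesis at a constant rate, first-order degradation, and a pool at steady state, so that pool size equals synthesis rate divided by the degradation constant. All rates and units are hypothetical.

```python
def steady_state_pool(synthesis_rate, degradation_constant):
    """Steady-state pool size assuming zero-order synthesis and first-order decay."""
    return synthesis_rate / degradation_constant


DEGRADATION = 0.5        # hypothetical degradation constant, same for both pools
EXTRA_SYNTHESIS = 10.0   # the same absolute increase in synthetic rate for both

# Hypothetical basal synthesis rates giving a small and a large pool.
basal_synthesis = {"small pool": 5.0, "large pool": 5000.0}

results = {}
for name, synthesis in basal_synthesis.items():
    before = steady_state_pool(synthesis, DEGRADATION)
    after = steady_state_pool(synthesis + EXTRA_SYNTHESIS, DEGRADATION)
    results[name] = {
        "before": before,
        "after": after,
        "fold": after / before,   # what a fold-change cutoff would see
        "gain": after - before,   # the actual change in amount
    }
    print(f"{name}: {before:g} -> {after:g} units, "
          f"fold change {after / before:.3f}, gain {after - before:g} units")
```

Under these assumptions, the small pool rises from 10 to 30 units (3-fold), while the large pool rises from 10,000 to 10,020 units (1.002-fold), although both gain the same 20 units. A 2-fold cutoff would report only the first change, even though the change in amount is identical.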
Once it has been determined from array expression studies which particular genes consistently change expression under the chosen experimental conditions, one can begin to study the system. Given some prior knowledge from studies of the expression of specific genes, one can check the data from the chip by manually testing the samples of RNA for those genes to see how closely the chip data and single-RNA tests correlate. In addition, one can select some fresh unknowns from the pool of genes shown to alter in their expressed level according to the chip, obtain probes for these genes, and test their regulation one by one. Thus, one can obtain a strong sense of the general validity of the changes seen according to the chip data. By use of repeated experiments, statistical data analysis algorithms, and manual verification experiments, one can obtain a sense of the validity, precision, and accuracy of the gene expression data obtained by use of gene chips. As methods improve, one hopes that these data will become increasingly quantitative and that smaller quantitative changes in expressed levels of RNA will be accurately and precisely shown.
As for obtaining statistically valid data for the changes seen, the cost of chips may make it impossible to do enough repeat experiments to formally satisfy statistical rules. Many statistically minded scientists are presently developing models that would at least allow a semiquantitative evaluation of the statistical probability of a given change being safely acceptable as authentic. One widely recommended approach is to follow up a limited number of analyses on expensive commercial chips with much more detailed studies on selected individual genes obtained from the chips. This can be done by preparing and using less expensive "custom" chips containing the genes initially identified out of the global array chips. Using the custom arrays, one can pursue a problem in more detail. Some commercial vendors offer premade custom arrays according to certain topics, e.g., apoptosis or cancer. One should be cautious in the use of such chips because they can only contain genes chosen by the chip preparers as relevant to the topic. The presumption that all the important genes are known and placed on the chip is unsafe in this time of discovery. The points above by no means discuss all the solutions to these problems; however, they give some idea of the issues under investigation currently, and some ways of thinking about how they may be addressed.
It will be important as we go forward to keep in mind the issues of validation, quantification, and limits to the new technologies. With time, use, and new technical and bioinformatic developments, we will reach consensus on these issues. Use of gene chips and protein array technologies provides an entirely new perspective on how hormones work and what they really do at the molecular level.
Another powerful new tool, complementary to gene chip array analysis, is the genomic sequence data now available for increasing numbers of species. A great deal has been written about this, with special focus after the completion of the first draft of the human genome (6, 7). The information these sequence databases provide will allow rapid identification and localization of candidate genes for health and disease. They will allow cross-referencing of regulatory networks between species. Human polymorphism profiles will be prepared for genes with probable relevance to endocrine and other diseases. These profiles will allow correlations to be made between particular molecular sequences and responses to hormones, to propensity for endocrine diseases, and to therapeutic drug-hormone interactions. Eventually proteomic, structural, and biochemical studies will bring understanding of how the polymorphisms affect function of relevant molecules. Gene polymorphism profiles will be applied to populations and in the future, perhaps, to individuals. We may hope for the day of individualized patient care and disease prevention, based on one's personal molecular makeup. The ability to obtain such detailed information brings concerns about its ethical use. It will be essential for us to enter the dialog that will lead to establishment of appropriate and humane use of the information made available to us as physicians and scientists.
Proteomics approaches complement and extend the possibilities of genomics. Proteomics can be defined as the characterization of the entire protein complement of a genome that is expressed in a cell line, tissue, or organism, including posttranslational modification and epigenetic processing. Developments in two-dimensional gel electrophoresis, coupled with mass spectroscopy, make possible the rapid separation, identification, and quantification of thousands of proteins. Parallel studies concerning the effects of a particular hormone, therefore, will produce information at both the mRNA and protein levels. These effects often include posttranslational alterations in proteins, e.g., phosphorylations; and now these can be detected and quantified on a scale hitherto impossible. Another goal of proteomics is the discovery of all possible protein domain structures. It is estimated that there are 3000 to 5000 unique three-dimensional forms into which independent domains of proteins can fold. Though efforts are underway to create mathematical algorithms for predicting protein structure from primary amino acid sequence, the rules of protein folding are still not well enough worked out for this to succeed de novo with certainty from a primary sequence. It has been shown in several instances that differing primary sequences can result in the same final fold. Why this is so is not as yet fully understood. Therefore, efforts are underway to express each full protein-coding DNA sequence and to determine the structure of the expressed protein. The results will create a library of protein structures. Use of this library will allow comparisons of proteins with unknown function to those of known function. Discovery of a shape likely to carry out a particular function will allow rapid testing and positioning of the new protein in its biological context.
The quantities of data forthcoming from this work will be prodigious. Even a simple gene array analysis gives rise to tens or hundreds of thousands of data items, but only a tiny fraction of these can be presented and discussed in a standard research paper. The "unused" mass of data may nevertheless be of importance to many other workers, perhaps with different interests from those of the primary authors. We must develop ways of storing and maintaining these data in accessible reserves, to make the widest possible use of them and to avoid redundant, expensive experiments. Journal editors, librarians, academic societies, businesses, and government agencies are developing policies and practical implementations concerning this issue.
This issue of Endocrine Reviews makes a modest entry into the genomics/proteomics era by including a special section reviewing applications of genomics to several endocrine or endocrine-like systems. We hope it provides an intriguing portal of entry for those just beginning to think in such terms, as well as appropriate reviews of the topics concerned.