Division of Reproductive Biology, Department of Gynecology and Obstetrics, Stanford University School of Medicine, Stanford, California 94305-5317
Correspondence: Address all correspondence and requests for reprints to: Aaron J. W. Hsueh, Ph.D., Division of Reproductive Biology, Department of Gynecology and Obstetrics, Stanford University Medical Center, 300 Pasteur Drive, Room A344, Stanford, California 94305-5317. E-mail: aaron.hsueh@forsythe.stanford.edu
| Abstract |
|---|
I. Introduction
II. Gene Identification
III. Sequence Homology and Phylogenetic Relationships
IV. Chromosomal Location of Genes
V. Gene Polymorphism
VI. Alternative Splicing
VII. Identification of the Regulatory Elements of Genes
VIII. mRNA Expression Profiling
IX. The Postgenomic Challenges in Hormonal Research
| I. Introduction |
|---|
A major impact of hormonal genomics that is evident today is the method used for identifying new ligands and receptors. In the classic endocrine approach, we first defined a biological endpoint (such as body growth induced by pituitary extracts in hypophysectomized animals) to allow the isolation of the potential ligand (in this case, GH) based on bioassays. Only after the cloning of ligand genes could cognate receptors be identified. Theoretically, in the postgenomic era, all receptors could be predicted based on their unique sequence features (e.g., the seven-transmembrane stretches of hydrophobic amino acids characteristic for G protein-coupled plasma membrane receptors). This reverse endocrinology approach starts with orphan receptors to eventually identify cognate ligands and physiological functions. Taking advantage of the sequence relatedness among family genes, we can predict the functions of novel uncharacterized hormonal ligands, receptors, and intracellular signaling molecules.
Another important conceptual change for traditional endocrinologists is the realization that the global perspective of biological systems will continue to blur established boundaries currently separating endocrinology, growth factor research, immunology, extracellular matrix research, and developmental biology. With the availability of a handful of animal genomes for comparison, we can trace the evolutionary roots of all human endocrine genes. Also, increasing evidence indicates that the endocrine mechanism is merely a special case of long-range intercellular communication mechanisms that developed during the course of evolution. Indeed, one can find many cases in which endocrine hormones and receptors have paracrine roots; for example, homologs for insulin-signaling genes or glycoprotein hormone receptors are already present in nematodes that are lacking an enclosed circulatory system (3, 4).
Over the past decade, whole-genome sequencing projects for numerous organisms have been completed, beginning with viruses and prokaryotes such as Escherichia coli (5) and continuing with the eukaryotic model organisms baker's yeast, Saccharomyces cerevisiae (6), the nematode Caenorhabditis elegans (7), and the fruit fly Drosophila melanogaster (8). Recently, working draft versions of the human genome have been published (1, 2). Moreover, a private draft sequence of the mouse genome (Mus musculus) and public draft sequences for two species of puffer fish (Fugu rubripes and Tetraodon nigroviridis) have been completed. Additional sequencing projects for rat (Rattus norvegicus), zebra fish (Danio rerio), and a number of other organisms are in progress.
Given the explosive growth of the genomic literature, we have deliberately chosen to limit the scope of the present review, highlighting basic principles in genomics and providing selected examples of their application to the field of hormonal research. In contrast, the evolving field of hormonal proteomics and the impact of genomics on clinical endocrinology will not be covered here.
| II. Gene Identification |
|---|
The transcription units, consisting of exons, introns, and the regulatory region, are thought to constitute more than 20% of the entire human genome. To increase the sensitivity and specificity of gene predictions in the draft sequence of the human genome, both public and private groups are using a combination of approaches based on: 1) identification of matching mRNA-derived sequences, 2) searches for similarity to previously known genes and proteins, and 3) prediction based on features common to all genes.
The first strategy relies on identifying regions of similarity between genomic sequences and sequences that are known to be transcribed, such as expressed sequence tags (ESTs) derived from fragments of cDNA, and full-length cDNA sequences (11). Even though this approach is based on direct experimental evidence, it is still subject to errors resulting from spurious EST data, e.g., from unspliced mRNAs, or genomic DNA contamination. More importantly, genes expressed at low levels or those transcribed selectively in tissues or cell types underrepresented in EST databases may be overlooked. Likewise, single-exon genes encoding small proteins may not be detected because the ESTs representing them are sometimes indistinguishable from genomic DNA contamination (12).
The second strategy identifies similarities between the genomic sequence investigated and known gene or protein sequences in human or other species. Although this approach can correctly predict genes belonging to larger gene families or having orthologs in different species, it is not capable of identifying genes having no sequence similarity to known genes and is prone to picking up pseudogenes. Moreover, it is difficult to detect homologs of genes with small ORFs because sequence similarities over short stretches may not achieve the necessary level of statistical significance (12).
The third strategy, ab initio gene prediction, uses software algorithms combining statistical information on splicing sites, coding bias, and exon/intron lengths to detect putative exons and genes (13). In a recent study involving the fruit fly genome, such algorithms correctly predicted all exons of a gene in about 40% of the cases, but entirely missed 5-10% of the known genes (14). Due to the limitations discussed above, the application of ab initio gene prediction methods to the human genome is expected to yield an even lower sensitivity and specificity (15).
Combining all three strategies, the total number of human genes was estimated to be between 30,000 and 40,000. It is important to note, however, that we are still some way from having a complete human gene index. Indeed, initially, a complete high-stringency alignment with the draft sequence could only be achieved for half of the more than 10,000 known human genes stored in the curated RefSeq database (1). This discrepancy is due to the remaining errors and gaps in the draft sequence that are being addressed during the ongoing process.
With the completion of the sequencing of several vertebrate genomes, analytical approaches to identify functionally similar homologous genes have become feasible. The best examples for hormonal research are the discoveries of more than 400 G protein-coupled receptors in the human genome with a potential hormonal ligand (16), and of diverse ligands belonging to the cytokine (17), IL (18, 19), and cysteine-knot gene families. New ligand/receptor systems have also been discovered with the aid of genomic information. For example, a family of more than 10 Toll-like receptors has been identified (20), and the crucial roles of these receptors in the recognition of microbial components and innate immunity have been elucidated (21).
As discussed above, the task of gene identification is not trivial. Moreover, multiple gene messages can be derived from a single stretch of DNA based on alternative uses of promoters, exons, and termination sites. Adding to these overlapping transcription units, somatic recombination events such as those found in some of the immune recognition loci, and the existence of highly similar gene families and pseudogenes, render it difficult to clearly identify and categorize the genes. Based on the classic definition that genes are distinct transcription units that are translated to generate one or a set of related proteins, many "genes" in the genome are pseudogenes. The human genome is estimated to encode approximately 900 olfactory receptor genes; however, 60% of them appear to be pseudogenes with disrupted ORFs (22, 23). Likewise, the entire mouse olfactory receptor gene repertoire comprising about 1300 genes may include 20% pseudogenes (24). Because not all genes with disruptions are pseudogenes, due to the possible splicing out of regions with a termination codon, the exact number of functional genes remains to be determined using more advanced experimental procedures.
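The notion of a "disrupted ORF" mentioned above can be made concrete with a minimal Python sketch. The function name and the toy sequences are hypothetical and serve only as illustration; actual pseudogene annotation also has to consider frameshifts, splicing, and comparison with functional paralogs, none of which is modeled here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def premature_stops(cds):
    """Return 0-based codon indices of in-frame stop codons before the final codon.

    Assumes `cds` is a candidate coding sequence read in frame from its first base.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3          # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    # The last codon is expected to be the normal stop, so it is excluded.
    return [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]

# Toy sequences (invented for illustration).
intact = "ATGGCTTGTCCATAA"      # ATG GCT TGT CCA TAA
disrupted = "ATGGCTTGACCATAA"   # ATG GCT TGA CCA TAA

print(premature_stops(intact))     # no internal stop codons
print(premature_stops(disrupted))  # internal stop at codon index 2
```

As the text notes, an internal stop codon alone does not prove that a gene is nonfunctional, because the affected region may be spliced out of the mature transcript.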
Comparative genomic analyses of gene families across different species can provide insights into the evolution and associated adaptation of specific hormonal regulatory circuits among closely related species. For example, multiple GnRHs and GnRH receptors have been identified in vertebrates from teleosts to primates (25); however, humans appear to have only one functional GnRH receptor due to the apparent degeneration of the type II GnRH receptor gene into a pseudogene (26, 27, 28). Because the ortholog for the human type II GnRH receptor is functional in nonhuman primates (29), future investigation on the role of two types of GnRH receptors in primates will have to address the evolutionary adaptation of these GnRH signaling systems.
| III. Sequence Homology and Phylogenetic Relationships |
|---|
Despite the lack of a convenient global approach to understand genes involved in ligand signaling, one can define subgroups of these genes based on their evolutionary relationships. Two types of relationships between genes deserve specific attention. The first type is the orthologous gene that did not significantly change in either structure or function during speciation. For instance, it has been estimated that 40% and 60% of human genes responsible for genetic diseases have orthologs in yeast and fly, respectively. The second group of human genes derived as the result of gene-duplication events during evolution are called paralogs. Although these genes belong to the same family and usually have similar structural motifs, they can serve related but nonidentical physiological functions. It has been hypothesized that during early chordate evolution, before the Cambrian explosion and during the early Devonian period, two entire genome duplications took place (31, 32), leading to the generation of sets of multiple mammalian paralogs, concomitant with the opportunity to develop complex regulatory mechanisms and functions. Thus, most human genes belong to a family as the result of gene duplication and domain shuffling. These evolutionary mechanisms allow for the amplification and diversification of existing domains, the recruitment of existing domains for new functions, and the development of new domains through domain combination and shuffling.
Comparative genomic analysis reveals the evolution of polypeptide ligands, receptors, and intracellular signaling molecules. Only four orthologs of human insulin-like genes have been found in D. melanogaster, whereas C. elegans possesses a large number of insulin gene homologs, including one that was shown to determine the life span of the worm (33). In contrast, three insulin paralogs are present in the human genome, including insulin controlling carbohydrate metabolism, and IGF-I and IGF-II regulating organ and body growth. A related group of hormonal ligands that also consist of B and A subunits connected by a long-C-domain peptide is constituted by the relaxin-related proteins. The human paralogs of both insulin and relaxin all have highly conserved cysteine residues essential for maintaining a similar secondary structure. In the relaxin group, at least seven human paralogs have been identified, including three relaxin genes, Leydig cell relaxin (INSL3), INSL4, INSL5/RIF2, and INSL6/RIF1 involved in reproductive tract maintenance and other uncharacterized functions (34, 35).
Genomic studies on the presence of genes with sequence similarities to the human gonadotropin and TSH receptors also enhanced our understanding of their evolution. All of these genes encode proteins with a large N-terminal extracellular region important for ligand binding followed by a seven-transmembrane region known to be essential for G protein coupling. Based on the evolutionary conservation of these genes, five orphan receptors were identified in humans and named LGRs (leucine-rich repeat-containing, G protein-coupled receptors) (36, 37). Based on the structural comparison of LGRs from diverse species, and the evolutionary relationship of these animals, a putative evolutionary tree for the LGR family of proteins can be proposed. The primitive LGR gene likely replicated before the emergence of the cnidarian (sea anemone) to form three LGRs (named LGRA, LGRB, and LGRC), each with homologs in modern vertebrates. Based on their structural similarity and putative evolutionary origins, mammalian LGRs can be divided into three subgroups: LGRA (LH, FSH, and TSH receptors), LGRB (LGR4, LGR5, and LGR6), and LGRC (LGR7 and LGR8) (38). In fly, there are at least three LGRs, each belonging to one of these subgroups. Because only one LGR with similarities to the LGRA subgroup could be identified in C. elegans (4), it is likely that a gene loss occurred during evolution. Interestingly, recent studies indicated that LGR7 and LGR8 are receptors for relaxin (39).
Initially discovered based on their sequence similarity to a fly ortholog, cyclic nucleotide phosphodiesterases represent a large group of genes encoding enzymes that hydrolyze and inactivate cAMP and cGMP (40). Animal cyclic nucleotide phosphodiesterases comprise at least seven subtypes as well as orthologs corresponding to four subtypes that can be found even in freshwater sponges. Accordingly, phylogenetic analysis revealed that most gene duplications and domain shuffling that gave rise to different subtypes had been completed in the early evolution of animals before the separation of sponges and eumetazoans (41).
Another example of the use of comparative genomics for hormone discovery is the recent identification of CRH-related peptides (42, 43, 44). Using phylogenetic and structure profile analyses, two novel peptides, stresscopin/urocortin II and stresscopin-related peptide/urocortin III, with limited sequence relatedness to frog sauvagine and mammalian CRH, were isolated from the human and puffer fish genomic databases and shown to function as selective type 2 CRH receptor agonists. These studies suggest that the two novel peptides are important in stress-coping responses in both central and peripheral tissues for maintaining homeostasis after the initial fight-or-flight responses induced by stress.
Evolutionary genomic approaches have facilitated the analysis of large families of polypeptide ligands. A recent review summarizes the origin and function of the pituitary adenylate cyclase-activating polypeptide/glucagon superfamily in vertebrates and shows that the ancestral superfamily members can be traced to a common origin with the invertebrates. It also suggests that the superfamily began with gene or exon duplication and then continued to diverge with some gene duplications in vertebrates (45). Likewise, recent duplication of the LHβ gene in ancestral primates led to the derivation of multiple human chorionic gonadotropin-β genes (46). Furthermore, the coevolution of GH and its receptor in primates has been analyzed. These studies demonstrated that GH evolution is very conservative among most mammals but not in primates. The human GH receptor displays species specificity and interacts only with human or rhesus monkey GH, but not with nonprimate GH. Sequence analysis revealed that species specificity of the human GH receptor emerged in the common ancestor of Old World primates in a relatively short period (47). Phylogenetic analysis also suggested the evolution of vertebrate steroid receptors from an ancestral estrogen receptor, followed by the progesterone receptor and other related proteins (48).
As these examples illustrate, the availability of genomic sequences from an increasing number of organisms will enable hormonal researchers to decipher the evolutionary origin of diverse human genes important for ligand signaling. In turn, these findings will significantly facilitate the experimental verification of functions for novel genes based on their relationship to characterized genes as well as studies in model organisms.
| IV. Chromosomal Location of Genes |
|---|
Investigations of gene order in chromosomes of closely related species indicate that chromosomes underwent duplication, fusion, translocation, and inversion events during evolution. In syntenic regions of related species, a partial or complete conservation of gene order during chromosome evolution is evident. With the availability of complete genomes of increasing numbers of vertebrate species, the genomic analysis of syntenic regions becomes a valuable tool for understanding the function of human orthologs based on the conservation of gene locations. Recently, syntenic mapping of human and mouse chromosomes was performed by the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/Homology/). With the well established technology of gene engineering in the mouse, comparisons of human and mouse phenotypes and the elucidation of syntenic genes will become an increasingly useful approach in the future. Similar syntenic mapping projects are being performed between human and rat (49) as well as between other species.
One example of the impact of synteny studies on endocrine research is the role of certain members of the TGFβ family in ovarian follicle development. Mutant mice lacking the oocyte hormone growth differentiation factor (GDF)-9 showed an arrest of ovarian follicles at the secondary stage (50). Furthermore, another oocyte gene with high homology to GDF-9, named GDF-9B or bone morphogenetic protein-15 (51, 52), was also found to stimulate granulosa cell proliferation (53) and mapped to human chromosome Xp11.2-11.4. In sheep, a naturally occurring X-linked mutation was known to cause an increased ovulation rate and twin and triplet births in heterozygotes but primary ovarian failure in homozygotes. Further studies identified the genetic locus of this Inverdale sheep mutation as syntenic to the human Xp region containing the GDF-9B/bone morphogenetic protein-15 gene. Thus, this oocyte gene is essential for female fertility, and its natural mutations can cause both an increased ovulation rate and infertility phenotypes in a dosage-sensitive manner (54).
In addition to studies of syntenic chromosome regions, analyses of unique phenotypes in patients have also confirmed the disruption of linked genes in the human genome. Kallmann syndrome is characterized by hypogonadotropic hypogonadism and an inability to smell as the result of a defect in the migration of olfactory and GnRH neurons (55). Although this disease has been associated with both X-linked and autosomal inheritance, some infertile patients also showed skin lesions of scalp, ears, and neck, representing a contiguous gene syndrome involving two adjacent genes in chromosome Xp22.3 (56). In this region, the KAL1 gene responsible for Kallmann syndrome is situated adjacent to the steroid sulfatase gene responsible for X-linked ichthyosis. Therefore, the analysis of patients with microdeletions of neighboring chromosomal regions containing linked genes can reveal the location of disease genes.
| V. Gene Polymorphism |
|---|
Polymorphisms of individual genes can account for differences in an individual's responsiveness to hormonal and other ligands. For example, polymorphisms in the coding region of the C-C chemokine receptor (CCR)-5 gene, and in the promoter region of its ligand RANTES, are associated with aberrant ligand-receptor interactions. Patients with an altered CCR-5 receptor, the coreceptor for the viral infection of immune cells, or patients who exhibit overproduction of RANTES, can be exposed to HIV with reduced infection risk or delayed disease progression (61, 62). In both cases, the viral particles fail to infect cells either because they cannot enter using the defective receptor or because the functional receptor is already occupied by high levels of the endogenous ligand. Likewise, a polymorphism of the β-adrenergic receptor is associated with altered sensitivity to β-agonists in asthmatics (63). Moreover, a single-nucleotide polymorphism of the IL-1B promoter has been found to be associated with a chronic hypochlorhydric response to Helicobacter pylori infection and the risk of gastric cancer, presumably by altering IL-1 levels in the stomach (64). Recently, two linked polymorphisms of the FSH receptor gene were found to alter FSH responsiveness of gonadal cells as the amount of FSH required for ovarian stimulation in patients with different polymorphic alleles differed (65). Estrogen receptor variants in normal and neoplastic mammary tissues and other target cells have also been identified (66, 67). These transcripts could be responsible for aberrant ligand-signaling events. A recent study analyzed polymorphisms and genetic variations of multiple G protein-coupled receptors and documented the functional importance of specific residues for the function of these receptors (68).
Studies on gene polymorphisms could also facilitate the mapping of disease genes based on their chromosomal location. Studies on polymorphic differences in coding and noncoding regions in large populations with a particular phenotype facilitate genome-wide linkage scans of polygenic diseases such as type 1 and type 2 diabetes. This approach is complicated by the dependence of such complex diseases on a combination of genetic variations that may be common in the general population, and environmental factors. However, increasing documentation of single-nucleotide polymorphism variation in humans could allow genome-wide linkage and association scans for complex endocrine diseases (69). Recently, advanced analytical tools have been applied to delineate the relationship between a common polymorphism in the peroxisome proliferator-activated receptor-γ gene and type 2 diabetes (70). Because the chromosomal locations of human genes in the known ligand-signaling pathways are clear, efforts to identify genetic loci predisposing for hypertension have also focused on candidate gene strategies. Genetic linkage and association methods have correlated hypertension with polymorphisms in the genes of the renin-angiotensin-aldosterone system, the catecholaminergic/adrenergic receptors, and other hormonal factors. These studies have yielded promising leads by suggesting a number of candidate genes for hypertension susceptibility (71).
| VI. Alternative Splicing |
|---|
Sequence alignment of multiple ESTs with the corresponding genomic sequence is the most common method to predict the presence of different mRNA species (76, 79). Intronic sequences at splice junctions are highly conserved, and 99% of introns have a GT-AG at their 5'- and 3'-ends, respectively. These characteristic patterns can be used to verify candidate splices. Other approaches for the prediction of alternative splicing variants include interspecies comparisons of genomic sequences (80) and the computational analysis of mRNA expression data measured with oligonucleotide microarrays (81).
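The use of the GT-AG rule to verify a candidate splice can be sketched in a few lines of Python. The function name, the coordinate convention (0-based, half-open), and the toy genomic sequence are assumptions made for illustration only; real pipelines also handle the minority of noncanonical introns and score the wider splice-site context.

```python
def is_canonical_intron(genomic, start, end):
    """Check whether genomic[start:end] begins with GT and ends with AG.

    `start` and `end` are 0-based, half-open coordinates of a candidate
    intron, e.g. the gap left when an EST is aligned to genomic DNA.
    """
    intron = genomic[start:end].upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")

# Toy genomic sequence (invented): exon "CCATG", intron "GTAAGTCTTTCAG", exon "GACC".
genomic = "CCATGGTAAGTCTTTCAGGACC"

print(is_canonical_intron(genomic, 5, 18))  # candidate with GT...AG boundaries
print(is_canonical_intron(genomic, 6, 18))  # boundary shifted by one base
```

A candidate whose boundaries are off by a single base fails the check, which is why this simple pattern is useful for filtering alignment artifacts.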
The phenomenon of alternative splicing affects hormonal systems in two ways. In one case, the components of the machinery (polypeptide ligands, receptors [plasma membrane and nuclear], intracellular mediators of signaling pathways, etc.) are themselves subject to alternative splicing. On the other hand, the processing of pre-mRNAs in the nucleus may be subject to hormonal regulation (82). For example, hypophysectomy in the rat results in the expression of alternative splice variants of a calcium-dependent potassium channel, an effect that can be prevented by injection of ACTH (83).
A number of hormone and growth factor isoforms are derived from alternative gene splicing (84). One prominent example is vascular endothelial growth factor, a factor that induces microvascular permeability and plays a central role in both angiogenesis and vasculogenesis. Through alternative mRNA splicing, the vascular endothelial growth factor gene gives rise to several distinct isoforms differing in their expression patterns as well as their biochemical and biological properties (85, 86, 87).
There are also many examples of receptor gene splicing. For instance, inclusion or exclusion of exon 11 of the insulin receptor gene results in two different mRNA isoforms encoding receptor proteins with differing biological properties (88). Similarly, there are at least eight human CRH-receptor type 1 mRNA variants, and the splicing pattern varies in response to environmental stress. Functional analysis indicated that some of these CRH-receptor type 1 variants differ in their ability to transduce ligand signals in the cAMP-mediated pathways, whereas others could function as binding proteins for CRH and related ligands (89). The alternative splicing mechanism also underlies the generation of diverse decoy receptors belonging to the TNF receptor, the chemokine receptor, and cytokine receptor families. These soluble or membrane-anchored decoy receptors bind and sequester their respective ligands with high affinity and specificity but are incapable of signaling or presenting the agonist to the functional receptor complexes, thus acting as a molecular trap for endogenous agonists (90).
For intracellular signaling proteins, gene splicing can also lead to multiple gene products with diverse functions. For genes in the apoptosis pathway, alternative splicing can affect both the functional activity and the intracellular distribution of protein isoforms (91). For example, a splicing variant of the antiapoptotic Mcl-1 protein has proapoptotic activity and antagonizes the prosurvival action of the known Mcl-1 protein by forming noncovalent dimers (92). In this example, proteins with diametrically opposed functions are generated under different physiological conditions through alternative splicing mechanisms.
Splicing of mRNA constitutes an essential molecular mechanism in the ongoing evolution of eukaryotes and is pivotal to the understanding of hormonal genomics. Currently available information on alternative splice variants for numerous genes can be found in databases such as ASDB (93), AsMamDB (94), or PALS db (95). Such resources will allow us to gain a better understanding of the complexity of alternative splicing and its role in hormonal research. Because approximately 15% of the point mutations that cause diseases in humans are thought to alter the normal splicing pattern, studies on gene splicing mechanisms could also provide important insights into human pathophysiology.
| VII. Identification of the Regulatory Elements of Genes |
|---|
Traditionally, regulatory DNA sequences have been investigated on a gene-by-gene basis, using a variety of approaches including deletion constructs, DNA footprinting, and gel shift assays. With the availability of genome sequences for human, rodent, and other species, novel strategies for the large-scale, computer-based in silico identification of regulatory sites are becoming feasible (98).
cis-Regulatory elements in the promoters and enhancers of genes can be defined based on their DNA sequences, position relative to coding sequences, and the potential to interact with transcription factors. Because an exhaustive experimental characterization of all potential regulatory sites in a given genome is difficult to generate, a variety of computational approaches to this problem have been developed. These methods include 1) analyses using databases of experimentally defined transcription factor binding sites, 2) approaches based on comparative genomics, and 3) sequence analyses of genes with similar expression profiles.
Experimentally derived data on transcription factors, their consensus DNA binding sites, and DNA-binding profiles have been compiled in a number of publicly accessible databases (99, 100, 101, 102). By comparing a given genomic DNA sequence to the consensus target sites for transcription factors stored in the database, potential regulatory sites can be identified. Because many of these sites are short in length and contain degenerate consensus sequences, a relatively large number of false-positive predictions are generated.
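The consensus-matching step, and the reason it produces many false positives, can be sketched as follows. This is a minimal illustration in Python, not the method of any particular database: the consensus `RGGTCA` and the promoter fragment are toy inputs, only the forward strand is scanned, and the chance-hit estimate assumes that all four bases are equally frequent and independent.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def find_sites(sequence, consensus):
    """Return 0-based start positions of forward-strand matches to a consensus."""
    pattern = "".join(IUPAC[base] for base in consensus.upper())
    # A lookahead is used so that overlapping matches are also reported.
    return [m.start() for m in re.finditer("(?=" + pattern + ")", sequence.upper())]

def chance_hits(consensus, length):
    """Expected number of matches in a random sequence of the given length."""
    p = 1.0
    for base in consensus.upper():
        p *= len(IUPAC[base].strip("[]")) / 4   # fraction of bases allowed here
    return p * (length - len(consensus) + 1)

promoter = "TTAGGTCAGCGGGTCATT"          # toy sequence, invented
print(find_sites(promoter, "RGGTCA"))    # positions 2 and 10
print(chance_hits("RGGTCA", 2053))       # about one chance match per ~2 kb
```

Under these simplifying assumptions, a six-base consensus with one degenerate position is expected to match roughly once every 2 kb by chance alone, so a scan across a whole genome yields many sites that are unlikely to be functional.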
Cross-species comparisons, also known as phylogenetic footprinting, provide another approach for identifying regulatory DNA sequences. These methods align noncoding sequences of orthologous genes from different species, taking advantage of the high degree of conservation for regulatory sequences. On a functional level, this finding is matched by the observation that genomic transgenes largely retain their natural expression patterns even when introduced into mice from other mammalian species (103). With the availability of working drafts of the human and mouse genome, large-scale interspecies comparisons to identify conserved noncoding sequences have become possible (104, 105). Moreover, alignments of large genomic regions can be easily performed and visualized using Internet-based tools such as PipMaker (106). The last common ancestor of humans and mice lived about 80-100 million years ago, and it appears that in many regions of the human genome, this evolutionary distance will allow the identification of conserved regulatory sites through comparative genomics (98, 107).
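The idea of locating conserved noncoding segments in a pairwise alignment can be illustrated with a sliding-window identity calculation. The sketch below is written in Python under stated assumptions: the two sequences are already aligned (with `-` for gaps), the function names, window size, and threshold are hypothetical, and the toy alignment is invented. It is not a reimplementation of PipMaker.

```python
def window_identity(aln_a, aln_b, window):
    """Fraction of identical, non-gap columns in each window of an alignment."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    scores = []
    for start in range(len(aln_a) - window + 1):
        columns = zip(aln_a[start:start + window], aln_b[start:start + window])
        matches = sum(1 for x, y in columns if x == y and x != "-")
        scores.append(matches / window)
    return scores

def conserved_windows(aln_a, aln_b, window, threshold):
    """Start positions of windows whose identity meets the threshold."""
    scores = window_identity(aln_a, aln_b, window)
    return [i for i, score in enumerate(scores) if score >= threshold]

# Toy alignment (invented): the first eight columns are identical.
human_aln = "ACGTACGTAAAA"
mouse_aln = "ACGTACGTCCCC"

print(conserved_windows(human_aln, mouse_aln, window=4, threshold=1.0))
```

In practice the window length and identity threshold would be chosen to suit the evolutionary distance between the species being compared.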
Another strategy to predict regulatory elements of genes is based on the use of expression profiling to identify coregulated genes (cf. Section VIII). Transcription factors often coordinate the expression of whole sets of genes such as those induced by a given hormonal ligand. This coordinated regulation results from the presence of common transcription factor binding sites in these genes and, hence, similar or identical sequence motifs in their noncoding regions. For hormonally regulated genes, large-scale identification of coexpressed mRNAs after hormonal treatment can be used to identify genes that are coregulated by the same transcription factors. A detailed analysis of the upstream regions of these functionally clustered genes for shared sequence motifs can then serve as a starting point for the characterization of novel regulatory sequences. This approach has already been successfully applied to infer regulatory sites in yeast from microarray-derived expression data (108, 109, 110). Because the expression of a single gene can be controlled by various synergistic and/or antagonistic factors acting on different binding sites in its regulatory region, additional algorithms are needed to model relationships between multiple transcription factors and their binding sites to unravel complex transcriptional networks (111, 112, 113).
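The search for shared sequence motifs in the upstream regions of coexpressed genes can be sketched in its simplest form as exact k-mer counting. The Python example below is an illustration only: the function name, parameters, and upstream sequences are invented, and published motif-discovery methods use probabilistic models that tolerate mismatches and correct for background composition.

```python
from collections import Counter

def shared_kmers(upstream_regions, k, min_fraction=1.0):
    """Return k-mers present in at least `min_fraction` of the upstream regions."""
    counts = Counter()
    for seq in upstream_regions:
        seq = seq.upper()
        # A set is used so that each region contributes at most once per k-mer.
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    needed = min_fraction * len(upstream_regions)
    return sorted(kmer for kmer, n in counts.items() if n >= needed)

# Toy upstream regions of three hypothetical coexpressed genes.
regions = ["TTGACGTCAAT", "CCTGACGTCAGG", "AATGACGTCATT"]

print(shared_kmers(regions, k=8))  # the only 8-mer common to all three
```

A motif recovered in this way is only a candidate; as the text emphasizes, it is a starting point for experimental characterization.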
The combination of bioinformatics and novel experimental approaches allows new insights into the function of regulatory elements. For example, nuclear receptor signaling is transduced by homodimeric or heterodimeric receptor complexes that bind to specific sequences of DNA, termed response elements, in the promoter regions of target genes. Because considerable flexibility exists in the types of binding sites the nuclear receptors are capable of recognizing, the precision of predicting regulatory elements based on a bioinformatic approach is limited. Screening methods involving immunoselection and PCR amplification or microarray binding assays in combination with bioinformatic analyses now can be used to examine genomic binding sites for different nuclear receptors. The integrated approach for isolating functional binding sites for nuclear receptors from genomic DNA should aid in the discovery of genes regulated by steroid hormones (114). The large-scale prediction of regulatory regions in mammalian genomes will likely require a combination of all the approaches mentioned above. Importantly, in silico prediction of regulatory DNA sequences will need to be verified by functional data from high-throughput experimental methods. Although the area of regulatory element analysis remains a major challenge for hormonal researchers, recent studies on the promoter of the human RANTES/CCR5 ligand gene have allowed the identification of promoter "modules" that are important for cell type-specific expression of this gene (115, 116) and will serve as a model for future bioinformatic approaches in this field.
| VIII. mRNA Expression Profiling |
|---|
Traditional experimental methods, such as Northern blot analysis, differential display, and subtractive hybridization, were the standard tools that researchers used to identify differentially expressed genes. These approaches produced either lists of differentially expressed genes or, at most, semiquantitative estimates of transcript frequencies in different biological samples. This situation changed with high-throughput sequencing of cDNAs, including studies of ESTs and the serial analysis of gene expression (SAGE). These methods allowed scientists to quantify the occurrence of thousands of transcripts in libraries prepared from different cells or tissues. Based on the resulting data, the relative expression of different genes within a biological sample could be assessed. Databases containing information on ESTs derived from different tissues or cell lines can be analyzed to estimate the relative abundance of transcripts (117, 118). Using digital differential display or electronic subtraction, a computational assessment of transcript frequencies in different samples (e.g., with or without nerve growth factor treatment) can be used to explore differential gene expression (119). Digital differential display has been used to identify genes differentially expressed in diverse hormone-producing cells and tissues, including murine oocytes (120), the human prostate (121, 122), and the human hypothalamus-pituitary-adrenal axis (79). By comparing the genomic localizations of the genes encoding testis-specific transcripts with known translocation breakpoints from infertile males, a set of candidate genes for male infertility was identified (123).
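The comparison of transcript frequencies between two EST libraries can be sketched as a test on a 2 x 2 table of counts. The example below assumes that Fisher's exact test is an acceptable statistic for this purpose; the text above does not specify one. The function names and the EST counts are hypothetical.

```python
from math import comb

def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x gene-specific ESTs in library 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more likely than the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))

def compare_libraries(gene_ests_1, total_1, gene_ests_2, total_2):
    """p-value for a difference in the frequency of one gene's ESTs."""
    return fisher_two_sided(gene_ests_1, total_1 - gene_ests_1,
                            gene_ests_2, total_2 - gene_ests_2)

# Hypothetical counts: 12 of 5000 ESTs in library 1 vs. 2 of 5000 in library 2.
p_value = compare_libraries(12, 5000, 2, 5000)
print(f"p = {p_value:.4f}")
```

Because thousands of genes are tested in a real comparison, the resulting p-values would need a correction for multiple testing.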
In SAGE, short diagnostic sequence tags are isolated from multiple cDNAs in the sample of interest, concatenated, and cloned. In this manner, the sequencing of thousands of so-called SAGE tags can be performed with relative ease. By calculating the frequencies of tags in a given sample, the expression patterns of the corresponding genes can be determined (124, 125). A SAGE analysis of the molecular phenotype of the human oocyte identified a variety of surface receptors, components of second messenger systems, and secreted proteins, many of which were not previously known to be expressed in mammalian oocytes (126). Other applications of SAGE technology to the field of endocrinology include the identification of corticosteroid-responsive genes in the rat hippocampus (127), the discovery of human thyroid-specific transcripts (128, 129), and the characterization of gene expression in PRL receptor knockout mice (130).
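The step of calculating tag frequencies can be sketched as follows. The tags shown are hypothetical 10-base sequences, the function `tag_frequencies` is introduced here for illustration only, and scaling to tags per million is an assumed normalization that allows libraries of different sizes to be compared.

```python
from collections import Counter

def tag_frequencies(tags, per=1_000_000):
    """Convert a list of sequenced SAGE tags to counts per `per` tags."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: n * per / total for tag, n in counts.items()}

# Hypothetical tags read from one library (toy data).
library = ["AAAAAAAAAA"] * 3 + ["CCCCCCCCCC"]
for tag, freq in tag_frequencies(library).items():
    print(tag, freq)
```

Each tag would subsequently be matched to its transcript of origin, a step omitted here because it depends on the reference database used.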
With the introduction of DNA array technologies, powerful tools for global studies of gene expression have become available to individual laboratories (131). In this approach, gene-specific sequences (probes) are immobilized on solid substrates in ordered arrays. Subsequently, mRNA from the cells or tissues to be studied is used to generate labeled samples (targets) that are hybridized to the arrays. By quantifying the amount of labeled target hybridized to the array, large-scale analyses of gene expression can be performed. The available DNA array platforms differ in a number of ways including the substrate on which the array is produced (glass slides or silicon wafers), the type of arrayed material (cDNAs or oligonucleotides), the way in which the probe is fixed onto the array (printing or in situ synthesis), and the type of labeling used [fluorescent dyes or radioactive tracers (132)].
As a single DNA array experiment yields thousands of data points, new computational tools have been developed for their management and analysis (133). In particular, a variety of clustering algorithms have been designed to classify and group genes on the basis of similarities or differences in their expression across different samples and thus help users to recognize global gene expression patterns (134). The popular hierarchical clustering approach uses data from a set of array experiments to place genes on a single hierarchical tree. Genes with similar expression profiles become neighbors on the tree, whereas genes with larger differences in their expression patterns appear on more distant branches (135).
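A minimal sketch of hierarchical clustering is given below. It assumes average linkage and a distance of 1 minus the Pearson correlation between expression profiles; these are common choices but are not specified in the text above. The gene names and expression values are hypothetical, and the tree is returned as nested tuples.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two profiles (undefined for constant profiles)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def hierarchical_cluster(profiles):
    """Average-linkage agglomerative clustering; returns a nested-tuple tree."""
    # Each cluster is a pair: (subtree, list of member gene names).
    clusters = [(name, [name]) for name in profiles]

    def distance(c1, c2):
        pairs = [(g, h) for g in c1[1] for h in c2[1]]
        return sum(1 - pearson(profiles[g], profiles[h]) for g, h in pairs) / len(pairs)

    while len(clusters) > 1:
        # Find the closest pair of clusters and merge them into one node.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: distance(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]

# Hypothetical expression values across four array experiments (toy data).
profiles = {
    "gene1": [1.0, 2.0, 3.0, 4.0],
    "gene2": [1.1, 2.1, 2.9, 4.2],
    "gene3": [4.0, 3.0, 2.0, 1.0],
    "gene4": [3.9, 3.2, 1.8, 1.1],
}
print(hierarchical_cluster(profiles))
# (('gene1', 'gene2'), ('gene3', 'gene4'))
```

Genes with rising profiles and genes with falling profiles end up on separate branches, which corresponds to the neighbor relationships on the tree described above.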
One basic assumption in interpreting gene expression profiles is that genes with similar transcription patterns are likely to share aspects of their regulation and/or biological functions. As a consequence, previously uncharacterized genes have been attributed putative functions based on the characteristics of known genes that are found in the same cluster. This "guilt-by-association" approach has been successfully used to predict the function of yeast genes (131) and is now being applied to the study of mammalian genomes, e.g., the identification of prostate cancer marker genes (136).
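The "guilt-by-association" reasoning can be sketched as a simple majority vote within each expression cluster. The clusters, gene names, annotations, and the `min_support` threshold are hypothetical, and a putative function assigned in this way is a hypothesis for experimental testing rather than a result.

```python
from collections import Counter

def predict_functions(clusters, annotations, min_support=0.5):
    """Assign each unannotated gene the majority annotation of its cluster."""
    predictions = {}
    for members in clusters:
        known = [annotations[g] for g in members if g in annotations]
        if not known:
            continue  # No characterized gene in this cluster to borrow from.
        label, votes = Counter(known).most_common(1)[0]
        if votes / len(known) < min_support:
            continue  # Annotations within the cluster are too heterogeneous.
        for gene in members:
            if gene not in annotations:
                predictions[gene] = label
    return predictions

# Hypothetical clusters and annotations (toy data).
clusters = [["geneA", "geneB", "orf1"], ["geneC", "orf2"], ["orf3"]]
annotations = {
    "geneA": "steroidogenesis",
    "geneB": "steroidogenesis",
    "geneC": "cell cycle",
}
print(predict_functions(clusters, annotations))
# {'orf1': 'steroidogenesis', 'orf2': 'cell cycle'}
```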
Microarray technologies have also been applied to investigate a number of endocrine tissues and hormone-regulated processes. Studies performed to date have addressed, among others, questions involving gene expression in the human adrenal (137), the effect of hypophysectomy in rats (138), development of the mouse pituitary gland (139), adipose tissue in obese and diabetic mice (140), preovulatory gene expression in the rat ovary (141), implantation in the mouse (142), and gene expression in the murine placenta and embryo (143). Another important area of DNA array studies is the classification of disease phenotypes based on expression profiling of genes in normal and pathological samples (144, 145, 146, 147). In the future, transcript profiling of patient specimens could be used to subclassify hormonal disorders allowing for more precise therapeutic interventions.
As gene expression patterns may vary widely between adjacent but biologically distinct tissues and cell types, a meaningful microarray analysis often depends on the precise selection of defined cell populations from a sample. The investigation of gene expression patterns can be refined by laser capture microdissection, often in combination with methods for target amplification (148, 149). In the future, global analyses of gene expression may be performed even in single cells (150).
So far, most of these DNA array studies have taken a local approach to the study of gene expression and have focused on identifying individual genes connected with certain biological phenomena. Because gene products in a given cell usually function in a concerted manner and belong to specific pathways for carrying out unique cellular functions, future studies will focus on a global analysis of gene expression. In a recent study, the results of more than 500 DNA microarray experiments were analyzed to group coregulated genes in C. elegans. To visualize the relationships between more than 17,000 genes in the nematode genome, clusters of coexpressed transcripts were displayed as mountains in a three-dimensional expression map that corresponds to established, as well as previously unknown, biological relationships (151). In addition, a number of projects focus on the analysis and presentation of gene pathway maps. For example, GenMapp provides a graphic representation of the relative expression of selected groups of genes categorized by their known positions in physiological pathways (152) (http://www.genmapp.org). Thus, RNA transcripts for thousands of genes from defined cell populations can be probed for their relative abundance based on prearranged maps representing functional pathways.
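Probing relative transcript abundance against prearranged pathway maps can be sketched as a grouping of expression values by pathway membership. The example below does not use the file formats or interface of any existing pathway tool; the pathway names, gene names, log2 expression ratios, and the function `summarize_pathways` are all hypothetical.

```python
def summarize_pathways(log_ratios, pathways):
    """Average the log2 expression ratios of the measured genes per pathway."""
    summary = {}
    for pathway, genes in pathways.items():
        measured = [log_ratios[g] for g in genes if g in log_ratios]
        if measured:  # Skip pathways with no gene represented on the array.
            summary[pathway] = sum(measured) / len(measured)
    return summary

# Hypothetical log2 ratios (treated vs. control) and pathway memberships.
log_ratios = {"g1": 2.0, "g2": 1.0, "g3": -1.0}
pathways = {
    "pathway_X": ["g1", "g2", "g9"],
    "pathway_Y": ["g3"],
    "pathway_Z": ["g8"],
}
print(summarize_pathways(log_ratios, pathways))
# {'pathway_X': 1.5, 'pathway_Y': -1.0}
```

A single average hides opposing changes within a pathway, so a graphic display of each gene at its position in the map remains more informative than this summary.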
DNA array experiments generate data sets of sizes that are unprecedented in most areas of biological research (153). In the future, it will become increasingly useful for these data to be available in electronic form to the scientific community at large. On the one hand, these data cannot be published by traditional methods because of their unwieldy quantity and because they can be more easily visualized and manipulated using a computer. On the other hand, the constant improvement of algorithms for the analysis of microarray data means that data sets will be frequently revisited to reinterpret their meaning or to formulate new hypotheses and relationships. Most importantly, however, the true power of gene expression data can only be unleashed when the expression of a given set of transcripts can be compared over as many different tissues and cell types, developmental and physiological states, and diseases and experimental conditions as possible. Therefore, public repositories and common data formats for gene expression data have been proposed (154).
| IX. The Postgenomic Challenges in Hormonal Research |
|---|
In the next phase, the large data sets created by the application of a single high-throughput method will be analyzed and interpreted with regard to the complete underlying set of biological entities (e.g., genes). The characterization of gene function will also expand from the single-gene knockout approach to random or targeted gene-trapping approaches (155, 156) (http://socrates.berkeley.edu/skarnes/resource.html; http://www.lexgen.com/). A secretory trap method has allowed the generation of mice with a deletion of genes encoding membrane and secreted proteins, including those for polypeptide ligands and plasma membrane receptors. After the validation of functions for individual genes, comprehensive gene functional maps will be integrated into a biological atlas (157). As a result, relationships between separate entities in this set are recognized, patterns are discovered, and new testable hypotheses regarding gene pathways and networks are created. This global phase will be reached when a large number of data sets become available in a compatible electronic format and new computational methods for their analysis, visualization, and interpretation become available (e.g., the use of microarray data in the yeast research community). For understanding hormonal signaling, data on all ligands and receptors will be deposited and categorized in searchable databases with links to biomedical literature and gene sequences, protein structures, and other resources. These resources will be complemented by databases for specific endocrine organs [e.g., the Ovarian Kaleidoscope database, (158)] to provide an integrated view of organ physiology and pathophysiology.
We are moving from a gene-by-gene approach to a reconstructionistic one. Genes in the hormonal signaling system can be analyzed from the combined perspectives of traditional endocrinologists, molecular biologists, developmental biologists, cell biologists, evolution biologists, structural biologists, and geneticists. In the postgenomic era, the scientific community as a whole, as opposed to an individual laboratory, plays an increasingly important role in biomedical research. Eventually, the massive genomic data gathered with different methods (e.g., DNA arrays and proteomics) will be integrated to allow a global view of the system (157). Recently, steroid receptors and related orphan receptors have been grouped in the context of all transcription factors (159), whereas diverse plasma membrane receptors have been classified based on their evolutionary origin and relationship to other plasma membrane signaling molecules (I. Ben-Shlomo, S. Y. Hsu, and A. J. W. Hsueh, submitted). In this way, new metapatterns can be discovered, and pathways spanning the scope of different methods can be understood ranging from regulatory DNA sequences to gene transcript expression to protein function. This phase relies heavily on collaboration within the biomedical community because no single laboratory is likely to have the expertise, time, and resources to generate such large datasets using different high-throughput methods. This transition of hormonal research from a single laboratory approach to an integrated community effort will pose an important challenge for the next generation of endocrine researchers.
| Acknowledgments |
|---|
| Footnotes |
|---|
* Current address for C.P.L.: MBA Programme, Institute Europeen d'Administration des Affaires, Boulevard de Constance, 77305 Fontainebleau Cedex, France.
Abbreviations: CCR, C-C chemokine receptor; EST, expressed sequence tag; GDF, growth differentiation factor; LGR, leucine-rich repeat-containing G protein-coupled receptor; ORF, open reading frame; SAGE, serial analysis of gene expression.
| References |
|---|
and IL-1ε, function as an antagonist and agonist of NF-κB activation through the orphan IL-1 receptor-related protein 2. J Immunol 167:1440–1446