Endocrine Reviews 23 (3): 369-381
Copyright © 2002 by The Endocrine Society

Hormonal Genomics

Chandra P. Leo, Sheau Yu Hsu and Aaron J. W. Hsueh

Division of Reproductive Biology, Department of Gynecology and Obstetrics, Stanford University School of Medicine, Stanford, California 94305-5317

Correspondence: Address all correspondence and requests for reprints to: Aaron J. W. Hsueh, Ph.D., Division of Reproductive Biology, Department of Gynecology and Obstetrics, Stanford University Medical Center, 300 Pasteur Drive, Room A344, Stanford, California 94305-5317. E-mail: aaron.hsueh@forsythe.stanford.edu


    Abstract
 
The availability of the human genomic sequence is changing the way in which biological questions are addressed. Based on the prediction of genes from nucleotide sequences, homologies among their encoded amino acids can be analyzed and used to place them in distinct families. This serves as a first step in building hypotheses for testing the structural and functional properties of previously uncharacterized paralogous genes. As genomic information from more organisms becomes available, these hypotheses can be refined through comparative genomics and phylogenetic studies. Instead of the traditional single-gene approach in endocrine research, we are beginning to gain an understanding of entire mammalian genomes, thus providing the basis to reveal subfamilies and pathways for genes involved in ligand signaling. The present review provides selective examples of postgenomic approaches in the analysis of novel genes involved in hormonal signaling and their chromosomal locations, polymorphisms, splicing variants, differential expression, and physiological function. In the postgenomic era, scientists will be able to move from a gene-by-gene approach to a reconstructionistic one by reading the encyclopedia of life from a global perspective. Eventually, a community-based approach will yield new insights into the complexity of intercellular communications, thereby offering us an understanding of hormonal physiology and pathophysiology.

I. Introduction

II. Gene Identification

III. Sequence Homology and Phylogenetic Relationships

IV. Chromosomal Location of Genes

V. Gene Polymorphism

VI. Alternative Splicing

VII. Identification of the Regulatory Elements of Genes

VIII. mRNA Expression Profiling

IX. The Postgenomic Challenges in Hormonal Research


    I. Introduction
 
THE SEQUENCING OF the human genome has triggered a revolution encompassing all fields of biomedicine (1, 2). Whereas traditional methods in genetics and molecular biology focused on the characterization of individual genes and their products, the new genomic approach takes advantage of knowledge concerning the totality of genes in an organism. In the area of hormonal genomics, the functional subgenomes of secreted extracellular signaling molecules (peptide and protein ligands), transmembrane receptors, intracellular signaling molecules, and transcriptional factors (including steroid receptors and related genes) can now be analyzed in an integrated manner. With a global perspective on the limited "parts list" of our body, we are able to predict the structure of genes based on their sequence relationships, to understand the functional redundancy of genes based on their common position in gene networks, and to trace the evolutionary roots of gene families based on comparative genomic analyses of sequence conservation and chromosomal gene order. Although experimental evidence is needed for all functional assignments, it is becoming clear that the emerging global perspective of all genes, as well as the pathways and networks in which they are arranged, will provide insights of a quality that was formerly unattainable using the previous local view of a limited number of genes. In the postgenomic era, the availability of all human gene sequences allows us to study transcription profiles of genes for specific cell types and developmental stages in response to hormonal signaling. The introduction of new tools for the parallel analysis of all gene transcripts and, in the future, the expression of all proteins heralds the dawn of the new fields of transcriptomics and proteomics, respectively.

A major impact of hormonal genomics that is evident today is the method used for identifying new ligands and receptors. In the classic endocrine approach, we first defined a biological endpoint (such as body growth induced by pituitary extracts in hypophysectomized animals) to allow the isolation of the potential ligand (in this case, GH) based on bioassays. Only after the cloning of ligand genes could cognate receptors be identified. Theoretically, in the postgenomic era, all receptors could be predicted based on their unique sequence features (e.g., the seven-transmembrane stretches of hydrophobic amino acids characteristic for G protein-coupled plasma membrane receptors). This reverse endocrinology approach starts with orphan receptors to eventually identify cognate ligands and physiological functions. Taking advantage of the sequence relatedness among family genes, we can predict the functions of novel uncharacterized hormonal ligands, receptors, and intracellular signaling molecules.
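The notion of recognizing receptors by their seven hydrophobic transmembrane stretches can be illustrated with a simple sliding-window hydropathy scan. The following is a minimal sketch under stated assumptions, not the method used by any study cited here: the Kyte-Doolittle hydropathy scale, the 19-residue window, the 1.6 cutoff, and the function name count_hydrophobic_segments are all choices made for illustration, and the test sequence is artificial.

```python
# Kyte-Doolittle hydropathy values (assumed scale; not specified in the review).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def count_hydrophobic_segments(protein, window=19, threshold=1.6):
    """Count separate runs of windows whose mean hydropathy reaches the threshold.

    Each run is treated as one candidate transmembrane stretch.
    The window size and threshold are illustrative assumptions.
    """
    segments = 0
    previous_above = False
    for start in range(len(protein) - window + 1):
        mean = sum(KD[aa] for aa in protein[start:start + window]) / window
        above = mean >= threshold
        if above and not previous_above:
            segments += 1  # a new hydrophobic run begins here
        previous_above = above
    return segments


# Artificial sequence: seven leucine stretches separated by lysine spacers.
toy_receptor = ("K" * 15 + "L" * 20) * 7 + "K" * 15
toy_soluble = "K" * 100

print(count_hydrophobic_segments(toy_receptor))
print(count_hydrophobic_segments(toy_soluble))
```

A candidate with exactly seven such segments would merely be flagged for closer inspection; experimental evidence is still needed before a sequence can be called a receptor.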

Another important conceptual change for traditional endocrinologists is the realization that the global perspective of biological systems will continue to blur established boundaries currently separating endocrinology, growth factor research, immunology, extracellular matrix research, and developmental biology. With the availability of a handful of animal genomes for comparison, we can trace the evolutionary roots of all human endocrine genes. Also, increasing evidence indicates that the endocrine mechanism is merely a special case of long-range intercellular communication mechanisms that developed during the course of evolution. Indeed, one can find many cases in which endocrine hormones and receptors have paracrine roots; for example, homologs for insulin-signaling genes or glycoprotein hormone receptors are already present in nematodes that are lacking an enclosed circulatory system (3, 4).

Over the past decade, whole-genome sequencing projects for numerous organisms have been completed, beginning with viruses and prokaryotes such as Escherichia coli (5) and continuing with the eukaryotic model organisms baker’s yeast, Saccharomyces cerevisiae (6), the nematode Caenorhabditis elegans (7), and the fruit fly Drosophila melanogaster (8). Recently, working draft versions of the human genome have been published (1, 2). Moreover, a private draft sequence of the mouse genome (Mus musculus) and public draft sequences for two species of puffer fish (Fugu rubripes and Tetraodon nigroviridis) have been completed. Additional sequencing projects for rat (Rattus norvegicus), zebra fish (Danio rerio), and a number of other organisms are in progress.

Given the explosive growth of the genomic literature, we have deliberately chosen to limit the scope of the present review, highlighting basic principles in genomics and providing selected examples of their application to the field of hormonal research. In contrast, the evolving field of hormonal proteomics and the impact of genomics on clinical endocrinology will not be covered here.


    II. Gene Identification
 
In small microbial genomes, the identification of genes is relatively straightforward, as the gene-coding regions are not interrupted by introns. However, as organisms and their genomes increase in complexity, the task of gene finding becomes more difficult. One main reason for the difficulty is that the signal-to-noise ratio, and thus the accuracy of gene predictions, drops as the proportion of noncoding DNA increases. Whereas in prokaryotes, open reading frames (ORFs) that could encode a stretch of uninterrupted amino acids represent approximately 85% of the whole genome, this number decreases to around 70% in yeast and to less than 25% in the worm and fly (9). The human genome is estimated to have a total size of about 3.2 gigabases (i.e., 3.2 × 10^9 bp) of DNA, of which less than 1.5% is thought to encode proteins (10). Gene prediction in the human genome is further complicated by the presence of comparatively long and variable intron sequences (1).
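For an intron-free sequence, the search for ORFs reduces to scanning each reading frame for a start codon followed by an in-frame stop codon. The sketch below illustrates this under simplifying assumptions that are ours rather than the review's: only the forward strand is scanned, ATG is the only start codon considered, and the function name find_orfs and the min_codons cutoff are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (start, end) pairs of forward-strand ORFs, end exclusive.

    An ORF here runs from the first ATG in a frame to the next in-frame
    stop codon (stop included). min_codons is an illustrative cutoff.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # close it and keep scanning this frame
    return sorted(orfs)


toy_dna = "CCATGAAATTTGGGTAACC"
for start, end in find_orfs(toy_dna):
    print(start, end, toy_dna[start:end])
```

In a genome with long introns, such a scan recovers only fragments of genes, which is one reason the combined strategies described below are needed.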

The transcription units, consisting of exons, introns, and the regulatory region, are thought to constitute more than 20% of the entire human genome. To increase the sensitivity and specificity of gene predictions in the draft sequence of the human genome, both public and private groups are using a combination of approaches based on: 1) identification of matching mRNA-derived sequences, 2) searches for similarity to previously known genes and proteins, and 3) prediction based on features common to all genes.

The first strategy relies on identifying regions of similarity between genomic sequences and sequences that are known to be transcribed, such as expressed sequence tags (ESTs) derived from fragments of cDNA, and full-length cDNA sequences (11). Even though this approach is based on direct experimental evidence, it is still subject to errors resulting from spurious EST data, e.g., from unspliced mRNAs, or genomic DNA contamination. More importantly, genes expressed at low levels or those transcribed selectively in tissues or cell types underrepresented in EST databases may be overlooked. Likewise, single-exon genes encoding small proteins may not be detected because the ESTs representing them are sometimes indistinguishable from genomic DNA contamination (12).

The second strategy identifies similarities between the genomic sequence investigated and known gene or protein sequences in human or other species. Although this approach can correctly predict genes belonging to larger gene families or having orthologs in different species, it is not capable of identifying genes having no sequence similarity to known genes and is prone to picking up pseudogenes. Moreover, it is difficult to detect homologs of genes with small ORFs because sequence similarities over short stretches may not achieve the necessary level of statistical significance (12).

The third strategy, ab initio gene prediction, uses software algorithms combining statistical information on splicing sites, coding bias, and exon/intron lengths to detect putative exons and genes (13). In a recent study involving the fruit fly genome, such algorithms correctly predicted all exons of a gene in about 40% of the cases, but entirely missed 5–10% of the known genes (14). Due to the limitations discussed above, the application of ab initio gene prediction methods to the human genome is expected to yield an even lower sensitivity and specificity (15).

Combining all three strategies, the total number of human genes was estimated to be between 30,000 and 40,000. It is important to note, however, that we are still some way from having a complete human gene index. Indeed, initially, a complete high-stringency alignment with the draft sequence could only be achieved for half of the more than 10,000 known human genes stored in the curated RefSeq database (1). This discrepancy is due to the remaining errors and gaps in the draft sequence that are being addressed during the ongoing process.

With the completion of the sequencing of several vertebrate genomes, analytical approaches to identify functionally similar homologous genes have become feasible. The best examples for hormonal research are the discoveries of more than 400 G protein-coupled receptors in the human genome with a potential hormonal ligand (16), and of diverse ligands belonging to the cytokine (17), IL (18, 19), and cysteine-knot gene families. New ligand/receptor systems have also been discovered with the aid of genomic information. For example, a family of more than 10 Toll-like receptors has been identified (20), and the crucial roles of these receptors in the recognition of microbial components and innate immunity have been elucidated (21).

As discussed above, the task of gene identification is not trivial. Moreover, multiple gene messages can be derived from a single stretch of DNA based on alternative uses of promoters, exons, and termination sites. Adding to these overlapping transcription units, somatic recombination events such as those found in some of the immune recognition loci, and the existence of highly similar gene families and pseudogenes, render it difficult to clearly identify and categorize the genes. Based on the classic definition that genes are distinct transcription units that are translated to generate one or a set of related proteins, many "genes" in the genome are pseudogenes. The human genome is estimated to encode approximately 900 olfactory receptor genes; however, 60% of them appear to be pseudogenes with disrupted ORFs (22, 23). Likewise, the entire mouse olfactory receptor gene repertoire comprising about 1300 genes may include 20% pseudogenes (24). Because not all genes with disruptions are pseudogenes, due to the possible splicing out of regions with a termination codon, the exact number of functional genes remains to be determined using more advanced experimental procedures.

Comparative genomic analyses of gene families across different species can provide insights into the evolution and associated adaptation of specific hormonal regulatory circuits among closely related species. For example, multiple GnRHs and GnRH receptors have been identified in vertebrates from teleosts to primates (25); however, humans appear to have only one functional GnRH receptor due to the apparent degeneration of the type II GnRH receptor gene into a pseudogene (26, 27, 28). Because the ortholog for the human type II GnRH receptor is functional in nonhuman primates (29), future investigation on the role of two types of GnRH receptors in primates will have to address the evolutionary adaptation of these GnRH signaling systems.


    III. Sequence Homology and Phylogenetic Relationships
 
Although the human genome has been sequenced, only a small fraction of its genes have been studied experimentally to reveal their physiological roles. Consequently, a major challenge for functional genomics is to predict and then elucidate the function of all genes in the human body. For hormonal genomics, we can take advantage of the unique sequence features of known ligands, receptors, and intracellular signaling molecules. Using sequence relatedness as a criterion, one can define subgenomes encompassing specific groups of ligand-signaling genes. Although we do not have knowledge of all hormones and receptors in the human genome, most of the encoded G protein-coupled receptors have been identified. Likewise, attempts were made to predict all cystine knot-containing ligands and related extracellular signaling molecules among the human ORFs by searching for their unique pattern of cysteine signatures (30). Although a large number of secreted polypeptide hormones can potentially be identified based on the presence of a signal peptide for secretion and the lack of transmembrane regions, this group of predicted genes also includes adhesion molecules, secreted enzymes, and other uncharacterized extracellular proteins.
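Searching predicted ORFs for a characteristic spacing of cysteine residues can be sketched with a regular expression. The pattern below is a made-up placeholder chosen purely for illustration; it is not the signature used in the cited study (30), and the names TOY_CYS_SIGNATURE and has_cysteine_signature, as well as both test sequences, are hypothetical.

```python
import re

# Placeholder pattern: cysteines at a fixed relative spacing around a glycine.
# This is NOT a validated cystine-knot signature; it only shows the technique.
TOY_CYS_SIGNATURE = re.compile(r"C.{2,6}C.G.C.{5,30}C.C")


def has_cysteine_signature(protein):
    """Return True if the protein contains the placeholder cysteine pattern."""
    return TOY_CYS_SIGNATURE.search(protein) is not None


candidate = "MKT" + "CAAACAGAC" + "A" * 8 + "CAC" + "LL"
unrelated = "MKTAAACAGAAAAALL"

print(has_cysteine_signature(candidate))
print(has_cysteine_signature(unrelated))
```

Any hit from such a scan is only a hypothesis about family membership and would still require structural and functional confirmation.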

Despite the lack of a convenient global approach to understand genes involved in ligand signaling, one can define subgroups of these genes based on their evolutionary relationships. Two types of relationships between genes deserve specific attention. The first type is the orthologous gene that did not significantly change in either structure or function during speciation. For instance, it has been estimated that 40% and 60% of human genes responsible for genetic diseases have orthologs in yeast and fly, respectively. The second group of human genes derived as the result of gene-duplication events during evolution are called paralogs. Although these genes belong to the same family and usually have similar structural motifs, they can serve related but nonidentical physiological functions. It has been hypothesized that during early chordate evolution, before the Cambrian explosion and during the early Devonian period, two entire genome duplications took place (31, 32), leading to the generation of sets of multiple mammalian paralogs, concomitant with the opportunity to develop complex regulatory mechanisms and functions. Thus, most human genes belong to a family as the result of gene duplication and domain shuffling. These evolutionary mechanisms allow for the amplification and diversification of existing domains, the recruitment of existing domains for new functions, and the development of new domains through domain combination and shuffling.

Comparative genomic analysis reveals the evolution of polypeptide ligands, receptors, and intracellular signaling molecules. Only four orthologs of human insulin-like genes have been found in D. melanogaster, whereas C. elegans possesses a large number of insulin gene homologs, including one that was shown to determine the life span of the worm (33). In contrast, three insulin paralogs are present in the human genome, including insulin controlling carbohydrate metabolism, and IGF-I and IGF-II regulating organ and body growth. A related group of hormonal ligands that also consist of B and A subunits connected by a long-C-domain peptide is constituted by the relaxin-related proteins. The human paralogs of both insulin and relaxin all have highly conserved cysteine residues essential for maintaining a similar secondary structure. In the relaxin group, at least seven human paralogs have been identified, including three relaxin genes, Leydig cell relaxin (INSL3), INSL4, INSL5/RIF2, and INSL6/RIF1 involved in reproductive tract maintenance and other uncharacterized functions (34, 35).

Genomic studies on the presence of genes with sequence similarities to the human gonadotropin and TSH receptors also enhanced our understanding of their evolution. All of these genes encode proteins with a large N-terminal extracellular region important for ligand binding followed by a seven-transmembrane region known to be essential for G protein coupling. Based on the evolutionary conservation of these genes, five orphan receptors were identified in humans and named LGRs (leucine-rich repeat-containing, G protein-coupled receptors) (36, 37). Based on the structural comparison of LGRs from diverse species, and the evolutionary relationship of these animals, a putative evolutionary tree for the LGR family of proteins can be proposed. The primitive LGR gene likely replicated before the emergence of the cnidarian (sea anemone) to form three LGRs (named LGRA, LGRB, and LGRC), each with homologs in modern vertebrates. Based on their structural similarity and putative evolutionary origins, mammalian LGRs can be divided into three subgroups: LGRA (LH, FSH, and TSH receptors), LGRB (LGR4, LGR5, and LGR6), and LGRC (LGR7 and LGR8) (38). In fly, there are at least three LGRs, each belonging to one of these subgroups. Because only one LGR with similarities to the LGRA subgroup could be identified in C. elegans (4), it is likely that a gene loss occurred during evolution. Interestingly, recent studies indicated that LGR7 and LGR8 are receptors for relaxin (39).

Initially discovered based on their sequence similarity to a fly ortholog, cyclic nucleotide phosphodiesterases represent a large group of genes encoding enzymes that hydrolyze and inactivate cAMP and cGMP (40). Animal cyclic nucleotide phosphodiesterases comprise at least seven subtypes as well as orthologs corresponding to four subtypes that can be found even in freshwater sponges. Accordingly, phylogenetic analysis revealed that most gene duplications and domain shuffling that gave rise to different subtypes had been completed in the early evolution of animals before the separation of sponges and eumetazoans (41).

Another example of the use of comparative genomics for hormone discovery is the recent identification of CRH-related peptides (42, 43, 44). Using phylogenetic and structure profile analyses, two novel peptides, stresscopin/urocortin II and stresscopin-related peptide/urocortin III, with limited sequence relatedness to frog sauvagine and mammalian CRH, were isolated from the human and puffer fish genomic databases and shown to function as selective type 2 CRH receptor agonists. These studies suggest that the two novel peptides are important in stress-coping responses in both central and peripheral tissues for maintaining homeostasis after the initial fight-or-flight responses induced by stress.

Evolutionary genomic approaches have facilitated the analysis of large families of polypeptide ligands. A recent review summarizes the origin and function of the pituitary adenylate cyclase-activating polypeptide/glucagon superfamily in vertebrates and shows that the ancestral superfamily members can be traced to a common origin with the invertebrates. It also suggests that the superfamily began with gene or exon duplication and then continued to diverge with some gene duplications in vertebrates (45). Likewise, recent duplication of the LHβ gene in ancestral primates led to the derivation of multiple human chorionic gonadotropin-β genes (46). Furthermore, the coevolution of GH and its receptor in primates has been analyzed. These studies demonstrated that GH evolution is very conservative among most mammals but not in primates. The human GH receptor displays species specificity and interacts only with human or rhesus monkey GH, but not with nonprimate GH. Sequence analysis revealed that species specificity of the human GH receptor emerged in the common ancestor of Old World primates in a relatively short period (47). Phylogenetic analysis also suggested the evolution of vertebrate steroid receptors from an ancestral estrogen receptor, followed by the progesterone receptor and other related proteins (48).

As these examples illustrate, the availability of genomic sequences from an increasing number of organisms will enable hormonal researchers to decipher the evolutionary origin of diverse human genes important for ligand signaling. In turn, these findings will significantly facilitate the experimental verification of functions for novel genes based on their relationship to characterized genes as well as studies in model organisms.


    IV. Chromosomal Location of Genes
 
For decades, genetic mapping of the chromosomal positions of genes has been performed based on patients’ family histories or cross-breeding in animals. Linkage analyses utilize the frequency of meiotic recombination between genetic markers such as microsatellite DNAs or known genes to estimate the position of genes with a unique phenotype. The determination of gene positions in chromosomes has also been assisted by physical mapping based on fluorescence in situ hybridization, restriction fragment length polymorphism, and sequence-tagged sites. In the postgenomic era, chromosomal location is known for all human genes with known sequences.
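The way linkage analysis converts recombination frequency into an estimate of genetic distance can be shown with a short calculation. The sketch below is illustrative only: the function names and the offspring counts are hypothetical, and the Haldane mapping function is one common choice (it assumes no crossover interference) rather than a method named in this review. For small recombination fractions, 1% recombination corresponds to roughly 1 centimorgan (cM).

```python
import math


def recombination_fraction(recombinants, total_offspring):
    """Fraction of offspring in which two markers were separated by recombination."""
    if total_offspring <= 0:
        raise ValueError("total_offspring must be positive")
    return recombinants / total_offspring


def haldane_distance_cm(r):
    """Map distance in centimorgans from recombination fraction r (Haldane function).

    Valid for 0 <= r < 0.5; assumes crossovers occur independently.
    """
    if not 0 <= r < 0.5:
        raise ValueError("r must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


# Hypothetical cross: 12 recombinant offspring out of 200 scored.
r = recombination_fraction(12, 200)
print(round(r, 3), round(haldane_distance_cm(r), 2))
```

The closer two loci lie on a chromosome, the smaller the recombination fraction, which is why a marker that rarely separates from a phenotype points to the neighborhood of the responsible gene.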

Investigations of gene order in chromosomes of closely related species indicate that chromosomes underwent duplication, fusion, translocation, and inversion events during evolution. In syntenic regions of related species, a partial or complete conservation of gene order during chromosome evolution is evident. With the availability of complete genomes of increasing numbers of vertebrate species, the genomic analysis of syntenic regions becomes a valuable tool for understanding the function of human orthologs based on the conservation of gene locations. Recently, syntenic mapping of human and mouse chromosomes was performed by the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/Homology/). With the well established technology of gene engineering in the mouse, comparisons of human and mouse phenotypes and the elucidation of syntenic genes will become an increasingly useful approach in the future. Similar syntenic mapping projects are being performed between human and rat (49) as well as between other species.

One example of the impact of synteny studies on endocrine research is the role of certain members of the TGFβ family in ovarian follicle development. Mutant mice lacking the oocyte hormone growth differentiation factor (GDF)-9 showed an arrest of ovarian follicles at the secondary stage (50). Furthermore, another oocyte gene with high homology to GDF-9, named GDF-9B or bone morphogenetic protein-15 (51, 52), was also found to stimulate granulosa cell proliferation (53) and mapped to human chromosome Xp11.2–11.4. In sheep, a naturally occurring X-linked mutation was known to cause an increased ovulation rate and twin and triplet births in heterozygotes but primary ovarian failure in homozygotes. Further studies identified the genetic locus of this Inverdale sheep mutation as syntenic to the human Xp region containing the GDF-9B/bone morphogenetic protein-15 gene. Thus, this oocyte gene is essential for female fertility, and its natural mutations can cause both an increased ovulation rate and infertility phenotypes in a dosage-sensitive manner (54).

In addition to studies of syntenic chromosome regions, analyses of unique phenotypes in patients have also confirmed the disruption of linked genes in the human genome. Kallmann syndrome is characterized by hypogonadotropic hypogonadism and an inability to smell as the result of a defect in the migration of olfactory and GnRH neurons (55). Although this disease has been associated with both X-linked and autosomal inheritance, some infertile patients also showed skin lesions of scalp, ears, and neck, representing a contiguous gene syndrome involving two adjacent genes in chromosome Xp22.3 (56). In this region, the KAL1 gene responsible for Kallmann syndrome is situated adjacent to the steroid sulfatase gene responsible for X-linked ichthyosis. Therefore, the analysis of patients with microdeletions of neighboring chromosomal regions containing linked genes can reveal the location of disease genes.


    V. Gene Polymorphism
 
The out-of-Africa hypothesis regards all modern human populations as descended from a small group of approximately 100,000 individuals that migrated from Africa less than 150,000 yr ago and replaced archaic populations (57). Assuming the common origin of modern humans, nucleotide variations are shared at specific positions in the genome. Base pair changes that lead to minimal alterations in the property of encoded amino acids usually do not significantly affect the overall function of the protein. Consequently, variations in nucleotide sequences that are associated with silent mutations or minimal phenotypic changes are tolerated during evolution and inherited over generations. These single nucleotide polymorphisms represent minor alterations in DNA that occur with varying frequencies in different ethnic populations and are a central focus of pharmacogenomic studies (58). A random pair of human haploid genomes differed at an average rate of 1 per 1250 bp, but there was marked heterogeneity in the occurrence of polymorphisms across the genome. It has been estimated that there are 300,000 potential single nucleotide polymorphisms in the human genome (59, 60).

Polymorphisms of individual genes can account for differences in an individual’s responsiveness to hormonal and other ligands. For example, polymorphisms in the coding region of the C-C chemokine receptor (CCR)-5 gene, and in the promoter region of its ligand RANTES, are associated with aberrant ligand-receptor interactions. Patients with an altered CCR-5 receptor, the coreceptor for the viral infection of immune cells, or patients who exhibit overproduction of RANTES, can be exposed to HIV with reduced infection risk or delayed disease progression (61, 62). In both cases, the viral particles fail to infect cells either because they cannot enter using the defective receptor or because the functional receptor is already occupied by high levels of the endogenous ligand. Likewise, a polymorphism of the β-adrenergic receptor is associated with altered sensitivity to β-agonists in asthmatics (63). Moreover, a single-nucleotide polymorphism of the IL-1B promoter has been found to be associated with a chronic hypochlorhydric response to Helicobacter pylori infection and the risk of gastric cancer, presumably by altering IL-1 levels in the stomach (64). Recently, two linked polymorphisms of the FSH receptor gene were found to alter FSH responsiveness of gonadal cells as the amount of FSH required for ovarian stimulation in patients with different polymorphic alleles differed (65). Estrogen receptor variants in normal and neoplastic mammary tissues and other target cells have also been identified (66, 67). These transcripts could be responsible for aberrant ligand-signaling events. A recent study analyzed polymorphisms and genetic variations of multiple G protein-coupled receptors and documented the functional importance of specific residues for the function of these receptors (68).

Studies on gene polymorphisms could also facilitate the mapping of disease genes based on their chromosomal location. Studies on polymorphic differences in coding and noncoding regions in large populations with a particular phenotype facilitate genome-wide linkage scans of polygenic diseases such as type 1 and type 2 diabetes. This approach is complicated by the dependence of such complex diseases on a combination of genetic variations that may be common in the general population, and environmental factors. However, increasing documentation of single-nucleotide polymorphism variation in humans could allow genome-wide linkage and association scans for complex endocrine diseases (69). Recently, advanced analytical tools have been applied to delineate the relationship between a common polymorphism in the peroxisome proliferator-activated receptor-γ gene and type 2 diabetes (70). Because the chromosomal locations of human genes in the known ligand-signaling pathways are clear, efforts to identify genetic loci predisposing for hypertension have also focused on candidate gene strategies. Genetic linkage and association methods have correlated hypertension with polymorphisms in the genes of the renin-angiotensin-aldosterone system, the catecholaminergic/adrenergic receptors, and other hormonal factors. These studies have yielded promising leads by suggesting a number of candidate genes for hypertension susceptibility (71).


    VI. Alternative Splicing
 
To deduce the complete set of proteins encoded by a genome, more information than the sequence of all its genes is required. In higher eukaryotes, the primary transcripts of individual genes are processed in the nucleus before being transported to the cytoplasm as mRNAs that direct ribosomal protein synthesis. Alternative mRNA processing is a mechanism for creating protein diversity through selective inclusion or exclusion of nucleotide sequences during posttranscriptional processing. Although human, fruit fly, and worm appear to have similar numbers of alternative splicing forms per gene, it is clear that alternative splicing results in a significant increase in coding capacity and protein diversity (72). Different studies imply that alternative splicing occurs in about 35–50% of human genes (73, 74, 75, 76) and may be more frequent in certain types of genes, e.g., those encoding cell surface receptors (77). Although it is not unusual for individual genes to have as many as 12 different mRNA species, a small subset of genes (such as adhesion molecules or ion channels) have been found to possess more than 100 splicing variants (78).

Sequence alignment of multiple ESTs with the corresponding genomic sequence is the most common method to predict the presence of different mRNA species (76, 79). Intronic sequences at splice junctions are highly conserved, and 99% of introns have a GT-AG at their 5'- and 3'-ends, respectively. These characteristic patterns can be used to verify candidate splices. Other approaches for the prediction of alternative splicing variants include interspecies comparisons of genomic sequences (80) and the computational analysis of mRNA expression data measured with oligonucleotide microarrays (81).
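
The verification step described above can be sketched in a few lines of Python. This is only a minimal illustration, assuming that candidate introns are supplied as 0-based, half-open coordinates on the genomic sequence; the function names and the toy sequence are hypothetical and are not taken from the cited studies.

```python
def has_canonical_splice_sites(genomic, intron_start, intron_end):
    """Return True if the candidate intron begins with GT and ends with AG.

    Coordinates are 0-based and half-open (an assumption of this sketch).
    """
    intron = genomic[intron_start:intron_end].upper()
    # At least four bases are needed to hold both dinucleotides.
    if len(intron) < 4:
        return False
    return intron.startswith("GT") and intron.endswith("AG")


def filter_candidate_introns(genomic, candidates):
    """Keep only the candidate (start, end) introns with GT-AG boundaries."""
    return [(s, e) for s, e in candidates
            if has_canonical_splice_sites(genomic, s, e)]


# Toy example: exon + intron + exon (sequence invented for illustration).
toy_genomic = "ATGGCC" + "GTAAGTCCTTTTCAG" + "GGATAA"
# The second candidate is shifted by one base and should be rejected.
toy_candidates = [(6, 21), (5, 20)]
canonical = filter_candidate_introns(toy_genomic, toy_candidates)
```

A filter of this kind can only discard candidates that violate the GT-AG pattern; it does not by itself confirm that a surviving candidate is a genuine splice.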

The phenomenon of alternative splicing affects hormonal systems in two ways. In one case, the components of the machinery—polypeptide ligands, receptors (plasma membrane and nuclear), intracellular mediators of signaling pathways, etc.—are themselves subject to alternative splicing. On the other hand, the processing of pre-mRNAs in the nucleus may be subject to hormonal regulation (82). For example, hypophysectomy in the rat results in the expression of alternative splice variants of a calcium-dependent potassium channel, an effect that can be prevented by injection of ACTH (83).

A number of hormone and growth factor isoforms are derived from alternative gene splicing (84). One prominent example is vascular endothelial growth factor, a factor that induces microvascular permeability and plays a central role in both angiogenesis and vasculogenesis. Through alternative mRNA splicing, the vascular endothelial growth factor gene gives rise to several distinct isoforms differing in their expression patterns as well as their biochemical and biological properties (85, 86, 87).

There are also many examples of receptor gene splicing. For instance, inclusion or exclusion of exon 11 of the insulin receptor gene results in two different mRNA isoforms encoding receptor proteins with differing biological properties (88). Similarly, there are at least eight human CRH-receptor type 1 mRNA variants, and the splicing pattern varies in response to environmental stress. Functional analysis indicated that some of these CRH-receptor type 1 variants differ in their ability to transduce ligand signals in the cAMP-mediated pathways, whereas others could function as binding proteins for CRH and related ligands (89). The alternative splicing mechanism also underlies the generation of diverse decoy receptors belonging to the TNF receptor, chemokine receptor, and cytokine receptor families. These soluble or membrane-anchored decoy receptors bind and sequester their respective ligands with high affinity and specificity but are incapable of signaling or presenting the agonist to the functional receptor complexes, thus acting as a molecular trap for endogenous agonists (90).

For intracellular signaling proteins, gene splicing can also lead to multiple gene products with diverse functions. For genes in the apoptosis pathway, alternative splicing can affect both the functional activity and the intracellular distribution of protein isoforms (91). For example, a splicing variant of the antiapoptotic Mcl-1 protein has proapoptotic activity and antagonizes the prosurvival action of the known Mcl-1 protein by forming noncovalent dimers (92). In this example, proteins with diametrically opposed functions are generated under different physiological conditions through alternative splicing mechanisms.

Splicing of mRNA constitutes an essential molecular mechanism in the ongoing evolution of eukaryotes and is pivotal to the understanding of hormonal genomics. Currently available information on alternative splice variants for numerous genes can be found in databases such as ASDB (93), AsMamDB (94), or PALS db (95). Such resources will allow us to gain a better understanding of the complexity of alternative splicing and its role in hormonal research. Because approximately 15% of the point mutations that cause diseases in humans are thought to alter the normal splicing pattern, studies on gene splicing mechanisms could also provide important insights into human pathophysiology.


    VII. Identification of the Regulatory Elements of Genes
 
Hormones often effect changes in their target tissues by altering the expression of specific sets of genes. At the DNA level, these temporal, spatial, and quantitative changes in gene expression are controlled by various regulatory DNA sequences such as promoters, enhancers, insulators, and silencers (96, 97). The identification and characterization of regulatory DNA sequences that mediate the actions of hormones and their signaling pathway genes therefore constitute an important focus of hormonal research.

Traditionally, regulatory DNA sequences have been investigated on a gene-by-gene basis, using a variety of approaches including deletion constructs, DNA footprinting, and gel shift assays. With the availability of genome sequences for human, rodent, and other species, novel strategies for the large-scale, computer-based in silico identification of regulatory sites are becoming feasible (98).

cis-Regulatory elements in the promoters and enhancers of genes can be defined based on their DNA sequences, position relative to coding sequences, and the potential to interact with transcription factors. Because an exhaustive experimental characterization of all potential regulatory sites in a given genome is difficult to generate, a variety of computational approaches to this problem have been developed. These methods include 1) analyses using databases of experimentally defined transcription factor binding sites, 2) approaches based on comparative genomics, and 3) sequence analyses of genes with similar expression profiles.

Experimentally derived data on transcription factors, their consensus DNA binding sites, and DNA-binding profiles have been compiled in a number of publicly accessible databases (99, 100, 101, 102). By comparing a given genomic DNA sequence to the consensus target sites for transcription factors stored in the database, potential regulatory sites can be identified. Because many of these sites are short in length and contain degenerate consensus sequences, a relatively large number of false-positive predictions are generated.
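
The database-matching strategy, and the reason it tends to overpredict, can be illustrated with a short Python sketch. It assumes a consensus written in IUPAC nucleotide codes, scans one strand only, and estimates chance matches under equal base frequencies; the function names, the consensus RGGTCA, and the test sequence are illustrative choices rather than entries from any of the cited databases.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a regular-expression string."""
    return "".join(IUPAC[base] for base in consensus.upper())


def find_sites(sequence, consensus):
    """Return 0-based start positions of all matches on the given strand."""
    # A lookahead lets overlapping matches be reported as well.
    pattern = re.compile("(?=" + consensus_to_regex(consensus) + ")")
    return [m.start() for m in pattern.finditer(sequence.upper())]


def expected_random_hits(consensus, length):
    """Expected matches on one strand of random sequence (equal base frequencies)."""
    p = 1.0
    for base in consensus.upper():
        n_allowed = len(IUPAC[base].strip("[]"))
        p *= n_allowed / 4
    return max(0, length - len(consensus) + 1) * p


hits = find_sites("CCAGGTCATTGGGTCAGG", "RGGTCA")
chance_hits = expected_random_hits("RGGTCA", 1_000_000)
```

Under these assumptions, a six-base site with one twofold-degenerate position is expected by chance about once every 2 kb on a single strand, which is why matches to short, degenerate consensus sequences need independent support.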

Cross-species comparisons, also known as phylogenetic footprinting, provide another approach for identifying regulatory DNA sequences. These methods align noncoding sequences of orthologous genes from different species, taking advantage of the high degree of conservation of regulatory sequences. On a functional level, this conservation is matched by the observation that genomic transgenes largely retain their natural expression patterns even when introduced into mice from other mammalian species (103). With the availability of working drafts of the human and mouse genomes, large-scale interspecies comparisons to identify conserved noncoding sequences have become possible (104, 105). Moreover, alignments of large genomic regions can be easily performed and visualized using Internet-based tools such as PipMaker (106). The last common ancestor of humans and mice lived about 80–100 million years ago, and it appears that in many regions of the human genome, this evolutionary distance will allow the identification of conserved regulatory sites through comparative genomics (98, 107).
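
As a rough illustration of how conserved noncoding segments might be flagged once two orthologous sequences have been aligned, the following Python sketch computes the fraction of identical positions in sliding windows. It assumes a precomputed pairwise alignment of equal-length uppercase strings with '-' for gaps; the function names, window size, and identity threshold are arbitrary illustrative choices and do not reproduce the behavior of PipMaker or any other cited tool.

```python
def window_identities(aligned_a, aligned_b, window, step=1):
    """Return (start, fraction_identical) for sliding windows over an alignment.

    Gap positions ('-') never count as identical.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    results = []
    for start in range(0, len(aligned_a) - window + 1, step):
        pairs = zip(aligned_a[start:start + window],
                    aligned_b[start:start + window])
        matches = sum(1 for x, y in pairs if x == y and x != "-")
        results.append((start, matches / window))
    return results


def conserved_windows(aligned_a, aligned_b, window, min_identity, step=1):
    """Keep only the windows whose identity reaches the chosen threshold."""
    return [(start, frac)
            for start, frac in window_identities(aligned_a, aligned_b, window, step)
            if frac >= min_identity]


# Toy alignment: a perfectly conserved block followed by a diverged block.
species_a = "ACGTACGTAC" + "TTTTTTTTTT"
species_b = "ACGTACGTAC" + "GGGGGGGGGG"
conserved = conserved_windows(species_a, species_b,
                              window=10, min_identity=0.8, step=10)
```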

Another strategy to predict regulatory elements of genes is based on the use of expression profiling to identify coregulated genes (cf. Section VIII). Transcription factors often coordinate the expression of whole sets of genes such as those induced by a given hormonal ligand. This coordinated regulation results from the presence of common transcription factor binding sites in these genes and, hence, similar or identical sequence motifs in their noncoding regions. For hormonally regulated genes, large-scale identification of coexpressed mRNAs after hormonal treatment can be used to identify genes that are coregulated by the same transcription factors. A detailed analysis of the upstream regions of these functionally clustered genes for shared sequence motifs can then serve as a starting point for the characterization of novel regulatory sequences. This approach has already been successfully applied to infer regulatory sites in yeast from microarray-derived expression data (108, 109, 110). Because the expression of a single gene can be controlled by various synergistic and/or antagonistic factors acting on different binding sites in its regulatory region, additional algorithms are needed to model relationships between multiple transcription factors and their binding sites to unravel complex transcriptional networks (111, 112, 113).
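
The search for shared sequence motifs in the upstream regions of coexpressed genes can be sketched, in its simplest form, as a count of exact k-mers that occur in most or all of the sequences. The Python example below makes that simplifying assumption (exact words, one strand, no background model), whereas the published motif-finding methods are considerably more elaborate; the function names and toy sequences are hypothetical.

```python
from collections import Counter


def kmer_support(upstream_regions, k):
    """Count, for every k-mer, the number of sequences that contain it."""
    support = Counter()
    for seq in upstream_regions:
        seq = seq.upper()
        # A set ensures each sequence contributes at most once per k-mer.
        seen = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        support.update(seen)
    return support


def shared_motifs(upstream_regions, k, min_fraction=1.0):
    """Return the k-mers present in at least min_fraction of the sequences."""
    support = kmer_support(upstream_regions, k)
    needed = min_fraction * len(upstream_regions)
    return sorted(kmer for kmer, n in support.items() if n >= needed)


# Toy upstream regions of three hypothetical coregulated genes.
toy_regions = ["AATGACTCATT", "CCTGACTCAGG", "GTGACTCACA"]
candidates = shared_motifs(toy_regions, k=7)
```

Words recovered in this way are only candidates: a k-mer may be shared simply because it is common in the genome, so a comparison against upstream regions of unrelated genes would be a natural next step.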

The combination of bioinformatics and novel experimental approaches allows new insights into the function of regulatory elements. For example, nuclear receptor signaling is transduced by homodimeric or heterodimeric receptor complexes that bind to specific sequences of DNA, termed response elements, in the promoter regions of target genes. Because considerable flexibility exists in the types of binding sites the nuclear receptors are capable of recognizing, the precision of predicting regulatory elements based on a bioinformatic approach is limited. Screening methods involving immunoselection and PCR amplification or microarray binding assays in combination with bioinformatic analyses now can be used to examine genomic binding sites for different nuclear receptors. The integrated approach for isolating functional binding sites for nuclear receptors from genomic DNA should aid in the discovery of genes regulated by steroid hormones (114). The large-scale prediction of regulatory regions in mammalian genomes will likely require a combination of all the approaches mentioned above. Importantly, in silico prediction of regulatory DNA sequences will need to be verified by functional data from high-throughput experimental methods. Although the area of regulatory element analysis remains a major challenge for hormonal researchers, recent studies on the promoter of the human RANTES/CCR5 ligand gene have allowed the identification of promoter "modules" that are important for cell type-specific expression of this gene (115, 116) and will serve as a model for future bioinformatic approaches in this field.


    VIII. mRNA Expression Profiling
 
The majority of cells in an individual organism (with exceptions such as haploid germ cells or neoplastic cells) share an identical set of genes. Nevertheless, these cells display an astonishing phenotypic diversity that is further enhanced by a multiplicity of developmental stages and physiological or pathological states. This diversity of cellular phenotypes encountered within an individual is almost entirely due to differences in the patterns of gene expression. Consequently, the study of changes in gene expression over time and space—as well as in response to hormones—represents a challenging area of hormonal research.

Traditional experimental methods, such as Northern blot analysis, differential display, and subtractive hybridization, were the standard tools that researchers used to identify differentially expressed genes. These approaches produced either lists of differentially expressed genes or, at most, semiquantitative estimates of transcript frequencies in different biological samples. This situation changed with high-throughput sequencing of cDNAs, including studies of ESTs and the serial analysis of gene expression (SAGE). These methods allowed scientists to quantify the occurrence of thousands of transcripts in libraries prepared from different cells or tissues. Based on the resulting data, the relative expression of different genes within a biological sample could be assessed. Databases containing information on ESTs derived from different tissues or cell lines can be analyzed to estimate the relative abundance of transcripts (117, 118). Using digital differential display or electronic subtraction, a computational assessment of transcript frequencies in different samples (e.g., with or without nerve growth factor treatment) can be used to explore differential gene expression (119). Digital differential display has been used to identify genes differentially expressed in diverse hormone-producing cells and tissues, including murine oocytes (120), the human prostate (121, 122), and the human hypothalamus-pituitary-adrenal axis (79). By comparing the genomic localizations of the genes encoding testis-specific transcripts with known translocation breakpoints from infertile males, a set of candidate genes for male infertility was identified (123).
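
One way to make the comparison of transcript frequencies between two EST libraries concrete is a Fisher exact test on a 2 x 2 table of counts. The Python sketch below is an illustration under that assumption and should not be read as the procedure used in the cited studies; the function name and the EST counts are invented for the example.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2 x 2 table [[a, b], [c, d]].

    a: ESTs of the gene in library 1,  b: all other ESTs in library 1
    c: ESTs of the gene in library 2,  d: all other ESTs in library 2
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of seeing x gene ESTs in library 1.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))


# Invented counts: 12 of 5,000 ESTs in one library versus 1 of 5,000 in the other.
p_value = fisher_exact_two_sided(12, 4988, 1, 4999)
```

Because thousands of transcripts are typically compared at once, any per-gene P value obtained in this way would still need correction for multiple testing.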

In SAGE, short diagnostic sequence tags are isolated from multiple cDNAs in the sample of interest, concatenated, and cloned. In this manner, the sequencing of thousands of so-called SAGE tags can be performed with relative ease. By calculating the frequencies of tags in a given sample, the expression patterns of the corresponding genes can be determined (124, 125). A SAGE analysis of the molecular phenotype of the human oocyte identified a variety of surface receptors, components of second messenger systems, and secreted proteins, many of which were not previously known to be expressed in mammalian oocytes (126). Other applications of SAGE technology to the field of endocrinology include the identification of corticosteroid-responsive genes in the rat hippocampus (127), the discovery of human thyroid-specific transcripts (128, 129), and the characterization of gene expression in PRL receptor knockout mice (130).
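
The tag-counting step can be expressed compactly in Python. The sketch below assumes that the tags have already been extracted from the sequenced concatemers and are supplied as a simple list of strings; the function name, the normalization to tags per million, and the toy library are illustrative assumptions.

```python
from collections import Counter


def tag_counts_per_million(tags):
    """Convert raw SAGE tag occurrences into tags per million sequenced tags."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: 1_000_000 * n / total for tag, n in counts.items()}


# Toy library of four sequenced tags (sequences invented for illustration).
toy_library = ["AAAAAAAAAA"] * 3 + ["CCCCCCCCCC"]
abundance = tag_counts_per_million(toy_library)
```

Normalizing to a common library size is what allows tag frequencies from libraries of different depth to be compared with one another.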

With the introduction of DNA array technologies, powerful tools for global studies of gene expression have become available to individual laboratories (131). In this approach, gene-specific sequences (probes) are immobilized on solid substrates in ordered arrays. Subsequently, mRNA from the cells or tissues to be studied is used to generate labeled samples (targets) that are hybridized to the arrays. By quantifying the amount of labeled target hybridized to the array, large-scale analyses of gene expression can be performed. The available DNA array platforms differ in a number of ways including the substrate on which the array is produced (glass slides or silicon wafers), the type of arrayed material (cDNAs or oligonucleotides), the way in which the probe is fixed onto the array (printing or in situ synthesis), and the type of labeling used [fluorescent dyes or radioactive tracers (132)].

As a single DNA array experiment yields thousands of data points, new computational tools have been developed for their management and analysis (133). In particular, a variety of clustering algorithms have been designed to classify and group genes on the basis of similarities or differences in their expression across different samples and thus help users to recognize global gene expression patterns (134). The popular hierarchical clustering approach uses data from a set of array experiments to place genes on a single hierarchical tree. Genes with similar expression profiles become neighbors on the tree, whereas genes with larger differences in their expression patterns appear on more distant branches (135).
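
The idea of placing genes on a single tree according to the similarity of their expression profiles can be sketched in plain Python. The example assumes average-linkage agglomerative clustering with one minus the Pearson correlation as the distance, which is one common choice and not necessarily that of the cited work; gene names and expression values are invented, and profiles with zero variance are not handled.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def hierarchical_clustering(profiles):
    """Build a tree (nested tuples) from a dict mapping gene -> expression values."""
    # Each cluster is a pair: (subtree, list of member genes).
    clusters = [(gene, [gene]) for gene in sorted(profiles)]

    def distance(members_1, members_2):
        # Average linkage: mean pairwise distance between the two clusters.
        dists = [1 - pearson(profiles[a], profiles[b])
                 for a in members_1 for b in members_2]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        index_pairs = [(i, j) for i in range(len(clusters))
                       for j in range(i + 1, len(clusters))]
        i, j = min(index_pairs,
                   key=lambda ij: distance(clusters[ij[0]][1], clusters[ij[1]][1]))
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


# Toy profiles across four conditions: two rising genes and two falling genes.
toy_profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.5],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.0, 6.0, 4.5, 2.0],
}
tree = hierarchical_clustering(toy_profiles)
```

In the resulting tree, the two rising genes are joined first on one branch and the two falling genes on another, mirroring the neighbor relationships described above.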

One basic assumption in interpreting gene expression profiles is that genes with similar transcription patterns are likely to share aspects of their regulation and/or biological functions. As a consequence, previously uncharacterized genes have been attributed putative functions based on the characteristics of known genes that are found in the same cluster. This "guilt-by-association" approach has been successfully used to predict the function of yeast genes (131) and is now being applied to the study of mammalian genomes, e.g., the identification of prostate cancer marker genes (136).
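
The logic of this approach can be illustrated with a minimal Python sketch that assigns to an uncharacterized gene the most frequent annotation among the characterized genes in its cluster. The function name, gene identifiers, and annotations are hypothetical, and a simple majority vote is only one of several conceivable ways of transferring annotations.

```python
from collections import Counter


def predict_function(cluster_members, annotations):
    """Return (most common annotation, supporting fraction) for a cluster.

    Genes absent from `annotations` are treated as uncharacterized.
    Returns None if no member of the cluster is annotated.
    """
    labels = [annotations[g] for g in cluster_members if g in annotations]
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


# Toy cluster: three annotated genes and one uncharacterized ORF.
toy_cluster = ["gene1", "gene2", "gene3", "orf_x"]
toy_annotations = {
    "gene1": "ribosome biogenesis",
    "gene2": "ribosome biogenesis",
    "gene3": "glycolysis",
}
prediction = predict_function(toy_cluster, toy_annotations)
```

The supporting fraction gives a crude indication of how homogeneous the cluster is; a putative function inferred in this way remains a hypothesis to be tested experimentally.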

Microarray technologies have also been applied to investigate a number of endocrine tissues and hormone-regulated processes. Studies performed to date have addressed—among others—questions involving gene expression in the human adrenal (137), the effect of hypophysectomy in rats (138), development of the mouse pituitary gland (139), adipose tissue in obese and diabetic mice (140), preovulatory gene expression in the rat ovary (141), implantation in the mouse (142), and gene expression in the murine placenta and embryo (143). Another important area of DNA array studies is the classification of disease phenotypes based on expression profiling of genes in normal and pathological samples (144, 145, 146, 147). In the future, transcript profiling of patient specimens could be used to subclassify hormonal disorders allowing for more precise therapeutic interventions.

As gene expression patterns may vary widely between adjacent but biologically distinct tissues and cell types, a meaningful microarray analysis often depends on the precise selection of defined cell populations from a sample. The investigation of gene expression patterns can be refined by laser capture microdissection, often in combination with methods for target amplification (148, 149). In the future, global analyses of gene expression may be performed even in single cells (150).

So far, most of these DNA array studies have taken a local approach to the study of gene expression and have focused on identifying individual genes connected with certain biological phenomena. Because gene products in a given cell usually function in a concerted manner and belong to specific pathways for carrying out unique cellular functions, future studies will focus on a global analysis of gene expression. In a recent study, the results of more than 500 DNA microarray experiments were analyzed to group coregulated genes in C. elegans. To visualize the relationships between more than 17,000 genes in the nematode genome, clusters of coexpressed transcripts were displayed as mountains in a three-dimensional expression map that corresponds to established, as well as previously unknown, biological relationships (151). In addition, a number of projects focus on the analysis and presentation of gene pathway maps. For example, GenMAPP provides a graphic representation of the relative expression of selected groups of genes categorized by their known positions in physiological pathways (152) (http://www.genmapp.org). Thus, RNA transcripts for thousands of genes from defined cell populations can be probed for their relative abundance based on prearranged maps representing functional pathways.

DNA array experiments generate data sets of sizes that are unprecedented in most areas of biological research (153). In the future, it will become increasingly useful for these data to be available in electronic form to the scientific community at large. On the one hand, these data cannot be published by traditional methods because of their unwieldy quantity and because they can be more easily visualized and manipulated using a computer. On the other hand, the constant improvement of algorithms for the analysis of microarray data means that data sets will be frequently revisited to reinterpret their meaning or to formulate new hypotheses and relationships. Most importantly, however, the true power of gene expression data can only be unleashed when the expression of a given set of transcripts can be compared over as many different tissues and cell types, developmental and physiological states, and diseases and experimental conditions as possible. Therefore, public repositories and common data formats for gene expression data have been proposed (154).


    IX. The Postgenomic Challenges in Hormonal Research
 
The application of postgenomic approaches to the field of hormone research will likely result in major changes in the way endocrinologists perform their work. In the traditional endocrine laboratory, the single-gene/single-protein approach dominates, providing an integrated view of sequence, mRNA, and protein expression, as well as cellular and organismal function for a limited number of genes. This type of research often starts from functional studies where individual laboratories function independently with a limited degree of collaboration. In the current early phase of the postgenomic era, new methods provide massive amounts of data, but these are still largely interpreted using a classic paradigm. For example, DNA microarrays are often used as a sophisticated substitute for differential display or subtractive hybridization. Consequently, out of thousands analyzed, only a few genes with the most pronounced changes are selected and studied using the single-gene approach. In spite of the accelerated pace of discovery, the independent laboratory approach is still most common.

In the next phase, the large data sets created by the application of a single high-throughput method will be analyzed and interpreted with regard to the complete underlying set of biological entities (e.g., genes). The characterization of gene function will also expand from the single-gene knockout approach to random or targeted gene-trapping approaches (155, 156) (http://socrates.berkeley.edu/~skarnes/resource.html; http://www.lexgen.com/). A secretory trap method has allowed the generation of mice with a deletion of genes encoding membrane and secreted proteins, including those for polypeptide ligands and plasma membrane receptors. After the validation of functions for individual genes, comprehensive gene functional maps will be integrated into a biological atlas (157). As a result, relationships between separate entities in this set will be recognized, patterns will be discovered, and new testable hypotheses regarding gene pathways and networks will be created. This global phase will be reached when a large number of data sets become available in a compatible electronic format and new computational methods for their analysis, visualization, and interpretation become available (e.g., the use of microarray data in the yeast research community). For understanding hormonal signaling, data on all ligands and receptors will be deposited and categorized in searchable databases with links to biomedical literature and gene sequences, protein structures, and other resources. These resources will be complemented by databases for specific endocrine organs [e.g., the Ovarian Kaleidoscope database (158)] to provide an integrated view of organ physiology and pathophysiology.

We are moving from a gene-by-gene approach to a reconstructionistic one. Genes in the hormonal signaling system can be analyzed from the combined perspectives of traditional endocrinologists, molecular biologists, developmental biologists, cell biologists, evolutionary biologists, structural biologists, and geneticists. In the postgenomic era, the scientific community as a whole, as opposed to an individual laboratory, plays an increasingly important role in biomedical research. Eventually, the massive genomic data gathered with different methods (e.g., DNA arrays and proteomics) will be integrated to allow a global view of the system (157). Recently, steroid receptors and related orphan receptors have been grouped in the context of all transcription factors (159), whereas diverse plasma membrane receptors have been classified based on their evolutionary origin and relationship to other plasma membrane signaling molecules (I. Ben-Shlomo, S. Y. Hsu, and A. J. W. Hsueh, submitted). In this way, new metapatterns can be discovered, and pathways spanning the scope of different methods, ranging from regulatory DNA sequences to gene transcript expression to protein function, can be understood. This phase relies heavily on collaboration within the biomedical community because no single laboratory is likely to have the expertise, time, and resources to generate such large data sets using different high-throughput methods. This transition of hormonal research from a single-laboratory approach to an integrated community effort will pose an important challenge for the next generation of endocrine researchers.


    Acknowledgments
 


    Footnotes
 
This work was supported by NIH Grants HD-23273, HD-31398, and DK-58534.

* Current address for C.P.L.: MBA Programme, Institut Européen d’Administration des Affaires, Boulevard de Constance, 77305 Fontainebleau Cedex, France.

Abbreviations: CCR, C-C chemokine receptor; EST, expressed sequence tag; GDF, growth differentiation factor; LGR, leucine-rich repeat-containing G protein-coupled receptor; ORF, open reading frame; SAGE, serial analysis of gene expression.


    References
 

  1. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, Funke R, Gage D, Harris K, Heaford A, Howland J, Kann L, Lehoczky J, LeVine R, McEwan P, McKernan K, et al. 2001 Initial sequencing and analysis of the human genome. Nature 409:860–921[CrossRef][Medline]
  2. Venter JC, Adams MD, Myers EW, Li PW, Mural, RJ, Sutton GG, Smith HO, Yandell M, Evans CA, Holt RA, Gocayne JD, Amanatides, P, Ballew RM, Huson DH, Wortman JR, Zhang Q, Kodira CD, Zheng XH, Chen L, Skupski M, et al. 2001 The sequence of the human genome. Science 291:1304–1351[Abstract/Free Full Text]
  3. Gerisch B, Weitzel C, Kober-Eisermann C, Rottiers V, Antebi A 2001 A hormonal signaling pathway influencing C. elegans metabolism, reproductive development, and life span. Dev Cell 1:841–851[CrossRef][Medline]
  4. Kudo M, Chen T, Nakabayashi K, Hsu SY, Hsueh AJ 2000 The nematode leucine-rich repeat-containing, G protein-coupled receptor (LGR) protein homologous to vertebrate gonadotropin and thyrotropin receptors is constitutively active in mammalian cells. Mol Endocrinol 14:272–284[Abstract/Free Full Text]
  5. Blattner FR, Plunkett III G, Bloch CA, Perna NT, Burland V, Riley M, Collado-Vides J, Glasner JD, Rode CK, Mayhew GF, Gregor J, Davis NW, Kirkpatrick HA, Goeden MA, Rose DJ, Mau B, Shao Y 1997 The complete genome sequence of Escherichia coli K-12. Science 277:1453–1474[Abstract/Free Full Text]
  6. Goffeau A, Barrell BG, Bussey H, Davis RW, Dujon B, Feldmann H, Galibert F, Hoheisel JD, Jacq C, Johnston M, Louis EJ, Mewes HW, Murakami Y, Philippsen P, Tettelin H, Oliver SG 1996 Life with 6000 genes. Science 274:546, 563–547[Abstract/Free Full Text]
  7. The C. elegans Sequencing Consortium 1998 Genome sequence of the nematode C. elegans: a platform for investigating biology. Science 282:2012–2018[Abstract/Free Full Text]
  8. Adams MD, Celniker SE, Holt RA, Evans CA, Gocayne JD, Amanatides PG, Scherer SE, Li PW, Hoskins RA, Galle RF, George RA, Lewis SE, Richards S, Ashburner M, Henderson SN, Sutton GG, Wortman JR, Yandell MD, Zhang Q, Chen LX, et al. 2000 The genome sequence of Drosophila melanogaster. Science 287:2185–2195[Abstract/Free Full Text]
  9. Stein L 2001 Genome annotation: from sequence to biology. Nat Rev Genet 2:493–503[Medline]
  10. Baltimore D 2001 Our genome unveiled. Nature 409:814–816[CrossRef][Medline]
  11. Bailey Jr LC, Searls DB, Overton GC 1998 Analysis of EST-driven gene annotation in human genomic sequence. Genome Res 8: 362–376
  12. Basrai MA, Hieter P, Boeke JD 1997 Small open reading frames: beautiful needles in the haystack. Genome Res 7:768–771[Free Full Text]
  13. Burge C, Karlin S 1997 Prediction of complete gene structures in human genomic DNA. J Mol Biol 268:78–94[CrossRef][Medline]
  14. Reese MG, Kulp D, Tammana H, Haussler D 2000 Genie–gene finding in Drosophila melanogaster. Genome Res 10:529–538[Abstract/Free Full Text]
  15. Guigo R, Agarwal P, Abril JF, Burset M, Fickett JW 2000 An assessment of gene prediction accuracy in large DNA sequences. Genome Res 10:1631–1642[Abstract/Free Full Text]
  16. Howard AD, McAllister G, Feighner SD, Liu Q, Nargund RP, Van der Ploeg LH, Patchett AA 2001 Orphan G-protein-coupled receptors and natural ligand discovery. Trends Pharmacol Sci 22:132–140[Medline]
  17. Cascieri MA, Springer MS 2000 The chemokine/chemokine- receptor family: potential and progress for therapeutic intervention. Curr Opin Chem Biol 4:420–427[CrossRef][Medline]
  18. Busfield SJ, Comrack CA, Yu G, Chickering TW, Smutko JS, Zhou H, Leiby KR, Holmgren LM, Gearing DP, Pan Y 2000 Identification and gene organization of three novel members of the IL-1 family on human chromosome 2. Genomics 66:213–216[CrossRef][Medline]
  19. Debets R, Timans JC, Homey B, Zurawski S, Sana TR, Lo S, Wagner J, Edwards G, Clifford T, Menon S, Bazan JF, Kastelein RA 2001 Two novel IL-1 family members, IL-1{delta} and IL-1{epsilon}, function as an antagonist and agonist of NF-{kappa}B activation through the orphan IL-1 receptor-related protein 2. J Immunol 167:1440–1446[Abstract/Free Full Text]
  20. Rock FL, Hardiman G, Timans JC, Kastelein RA, Bazan JF 1998 A family of human receptors structurally related to Drosophila Toll. Proc Natl Acad Sci USA 95:588–593[Abstract/Free Full Text]
  21. O’Neill LA 2000 The interleukin-1 receptor/Toll-like receptor superfamily: signal transduction during inflammation and host defense. Sci STKE 44:RE1; 1–11 (http://stke.sciencemag.org)
  22. Glusman G, Yanai I, Rubin I, Lancet D 2001 The complete human olfactory subgenome. Genome Res 11:685–702[Abstract/Free Full Text]
  23. Zozulya S, Echeverri F, Nguyen T 2001 The human olfactory receptor repertoire. Genome Biol 2(6):research0018.1–0018.12 (http://genomebiology.com)
  24. Zhang X, Firestein S 2002 The olfactory receptor gene superfamily of the mouse. Nat Neurosci 5:124–133
  25. Neill JD 2002 GnRH and GnRH receptor genes in the human genome. Endocrinology 143:737–743
  26. Okubo K, Nagata S, Ko R, Kataoka H, Yoshiura Y, Mitani H, Kondo M, Naruse K, Shima A, Aida K 2001 Identification and characterization of two distinct GnRH receptor subtypes in a teleost, the medaka Oryzias latipes. Endocrinology 142:4729–4739
  27. Millar R, Lowe S, Conklin D, Pawson A, Maudsley S, Troskie B, Ott T, Millar M, Lincoln G, Sellar R, Faurholm B, Scobie G, Kuestner R, Terasawa E, Katz A 2001 A novel mammalian receptor for the evolutionarily conserved type II GnRH. Proc Natl Acad Sci USA 98:9636–9641
  28. Faurholm B, Millar RP, Katz AA 2001 The genes encoding the type II gonadotropin-releasing hormone receptor and the ribonucleoprotein RBM8A in humans overlap in two genomic loci. Genomics 78:15–18
  29. Neill JD, Duck LW, Sellers JC, Musgrove LC 2001 A gonadotropin-releasing hormone (GnRH) receptor specific for GnRH II in primates. Biochem Biophys Res Commun 282:1012–1018
  30. Vitt UA, Hsu SY, Hsueh AJ 2001 Evolution and classification of cystine knot-containing hormones and related extracellular signaling molecules. Mol Endocrinol 15:681–694
  31. Meyer A, Schartl M 1999 Gene and genome duplications in vertebrates: the one-to-four (-to-eight in fish) rule and the evolution of novel gene functions. Curr Opin Cell Biol 11:699–704
  32. Pebusque MJ, Coulier F, Birnbaum D, Pontarotti P 1998 Ancient large-scale genome duplications: phylogenetic and linkage analyses shed light on chordate genome evolution. Mol Biol Evol 15:1145–1159
  33. Pierce SB, Costa M, Wisotzkey R, Devadhar S, Homburger SA, Buchman AR, Ferguson KC, Heller J, Platt DM, Pasquinelli AA, Liu LX, Doberstein SK, Ruvkun G 2001 Regulation of DAF-2 receptor signaling by human insulin and ins-1, a member of the unusually large and diverse C. elegans insulin gene family. Genes Dev 15:672–686
  34. Hsu SY 1999 Cloning of two novel mammalian paralogs of relaxin/insulin family proteins and their expression in testis and kidney. Mol Endocrinol 13:2163–2174
  35. Bathgate RA, Samuel CS, Burazin TC, Layfield S, Claasz AA, Reytomas IG, Dawson NF, Zhao C, Bond C, Summers RJ, Parry LJ, Wade JD, Tregear GW 2002 Human relaxin gene 3 (H3) and the equivalent mouse relaxin (M3) gene. Novel members of the relaxin peptide family. J Biol Chem 277:1148–1157
  36. Hsu SY, Kudo M, Chen T, Nakabayashi K, Bhalla A, van der Spek PJ, van Duin M, Hsueh AJ 2000 The three subfamilies of leucine-rich repeat-containing G protein-coupled receptors (LGR): identification of LGR6 and LGR7 and the signaling mechanism for LGR7. Mol Endocrinol 14:1257–1271
  37. Hsu SY, Liang SG, Hsueh AJ 1998 Characterization of two LGR genes homologous to gonadotropin and thyrotropin receptors with extracellular leucine-rich repeats and a G protein-coupled, seven-transmembrane region. Mol Endocrinol 12:1830–1845
  38. Nishi S, Hsu SY, Zell K, Hsueh AJ 2000 Characterization of two fly LGR (leucine-rich repeat-containing, G protein-coupled receptor) proteins homologous to vertebrate glycoprotein hormone receptors: constitutive activation of wild-type fly LGR1 but not LGR2 in transfected mammalian cells. Endocrinology 141:4081–4090
  39. Hsu SY, Nakabayashi K, Nishi S, Kumagai J, Kudo M, Sherwood OD, Hsueh AJ 2002 Activation of orphan receptors by the hormone relaxin. Science 295:671–674
  40. Conti M, Jin SL 1999 The molecular biology of cyclic nucleotide phosphodiesterases. Prog Nucleic Acid Res Mol Biol 63:1–38
  41. Koyanagi M, Suga H, Hoshiyama D, Ono K, Iwabe N, Kuma K, Miyata T 1998 Ancient gene duplication and domain shuffling in the animal cyclic nucleotide phosphodiesterase family. FEBS Lett 436:323–328
  42. Hsu SY, Hsueh AJ 2001 Human stresscopin and stresscopin-related peptide are selective ligands for the type 2 corticotropin-releasing hormone receptor. Nat Med 7:605–611