Human Genome - Keeping Genes in Order

Araxi Urrutia-Odabachian

15 October 2003

Each year the Royal Institution, in conjunction with L'Oréal, awards a bursary to the Science Graduate of the Year. In 2003 the award went to Araxi Urrutia, a research graduate at the University of Bath. This talk summarized the work for which she won it, and was given on the eve of submitting her doctoral thesis.

The structure of the DNA molecule was cracked long ago, and the complete sequence of the human genome has now been deciphered. However, little is known about the processes that have shaped, and still shape, the organization of genes in genomes, and in the human genome in particular. Over the last three years my collaborators and I set about tracking down how our genes and genome are adapted to optimize protein synthesis.
At the time we started our studies, neither a draft genome nor any large-scale transcription data had been published. The common view was that, in humans, gene characteristics were determined almost exclusively by mutational processes on the one hand, and by the optimization of the function of the encoded protein on the other. The results obtained so far by us and by other research groups point to a very different view, in which genome structure is an active and important player in regulating large-scale patterns of gene activity, and genes are under pressure not only to produce proteins with optimum enzymatic function but also to produce them cheaply.
Initially we set out to study codon usage bias (CUB), that is, the unequal use of the different codons (triplets of nucleotides) that encode the same amino acid. In other species, CUB is closely related to expression patterns: highly expressed genes (those encoding proteins produced in large quantities) are more biased. In mammals, however, no such relation had been established. The human genome is highly heterogeneous in its base composition (the relative proportions in which the four nucleotides A, T, G and C are present), which had hampered previous efforts to examine patterns of codon usage. To overcome this difficulty, we developed a new method to calculate CUB that could account for background nucleotide bias. Contrary to previous expectations, we found significant biases in codon usage in human genes that could not be accounted for by background nucleotide distributions.
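The idea of measuring codon bias against a background nucleotide composition can be illustrated with a short Python sketch. This is not the method used in the published work; it is a simplified, chi-square-style illustration under stated assumptions. The function name `codon_bias`, the `background` dictionary of base frequencies, and the scoring formula are all hypothetical choices made for this example. Expected codon counts within each group of synonymous codons are taken to be proportional to the product of the background frequencies of the codon's three bases.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Group codons by the amino acid they encode (stop codons excluded).
FAMILIES = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        FAMILIES[aa].append(codon)


def codon_bias(coding_seq, background):
    """Illustrative bias score (hypothetical, not the published method).

    coding_seq: in-frame DNA coding sequence.
    background: dict of base -> frequency, all strictly positive (assumption).
    Returns the summed (observed - expected)^2 / expected over synonymous
    codons, divided by the number of codons scored. Zero means codon use
    matches what the background composition alone would predict.
    """
    seq = coding_seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    score, scored = 0.0, 0
    for family in FAMILIES.values():
        if len(family) < 2:          # Met and Trp have no synonymous choice
            continue
        n = sum(counts[c] for c in family)
        if n == 0:
            continue
        # Expected share of each codon from background base frequencies.
        weights = {c: background[c[0]] * background[c[1]] * background[c[2]]
                   for c in family}
        norm = sum(weights.values())
        for c in family:
            expected = n * weights[c] / norm
            score += (counts[c] - expected) ** 2 / expected
        scored += n
    return score / scored if scored else 0.0
```

With this sketch, a sequence that uses only the codon GCC for alanine scores lower against a GC-rich background than against a uniform one, because part of its apparent bias is explained by base composition alone.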
With the publication of the human genome draft sequence and the greater availability of gene expression data, large-scale analyses became possible. We were able to show that highly expressed genes encode shorter proteins, have reduced introns (stretches of DNA within a gene that are not translated into protein), and have greater biases in their codon distributions and amino acid use. All these patterns are precisely what we would expect if selective pressures act to reduce the costs of protein synthesis.
These were unexpected results given the relatively small effective population sizes (the part of the population that contributes offspring to future generations) of mammalian species. However, the human genome had more surprising features, which make its structure itself a major determining factor in the regulation of gene expression. We observed that levels of gene expression varied significantly from one chromosome to another. Previously, with few exceptions, genes were generally believed to be randomly located in the genome. We examined more closely the distribution of genes with respect to their expression patterns and found that genes with greater expression are, on average, found near other genes of broad expression. This observation was among the first to describe a general pattern of gene sorting along the genome. Moreover, we found that this tight coupling between gene expression and chromosomal location is related to the puzzling heterogeneity in base composition (isochore structure). We observed that the base composition of the non-transcribed regions flanking a gene is strongly correlated with its expression level: the higher the guanine and cytosine (GC) content, the higher the expression.
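The relationship between GC content and expression described above is the kind of pattern that a rank correlation can test. The following Python sketch is an illustration only, not the analysis used in the published work. The function names `gc_content`, `ranks` and `spearman` are hypothetical, and the choice of Spearman's rank correlation is an assumption made for this example.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long lists with at least 2 items")
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5
```

Given a list of flanking sequences and a matching list of expression levels, a call such as `spearman([gc_content(s) for s in flanks], expression)` would return a value near +1 if higher GC content goes with higher expression. Here `flanks` and `expression` are placeholder names for data the reader would supply.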
Although many issues remain to be resolved in the study of our genome, we are now able to draw a quite different picture of the general processes that shape genes and the structure of the human genome. We now know that protein function is not the only source of selective pressure acting upon genes: lowering production costs by modifying gene length and base composition is also an important factor. Most impressive of all is that the maximization of transcription is a decisive force in determining where genes are located along the genome. The pressure to be in the tiny regions most suitable for expression is such that genes frequently overlap with their neighbours, while most of the genome is largely inhospitable to genes.
Araxi Urrutia Odabachian is funded by CONACyT, Mexico, and an Overseas Research Award, UK.

The following are the original manuscripts where the above results were presented.
Urrutia AO, Hurst LD. 2001. Codon usage bias covaries with expression breadth and the rate of synonymous evolution in humans, but this is not evidence for selection. Genetics. Nov;159(3):1191-9.
Lercher MJ, Urrutia AO, Hurst LD. 2002. Clustering of housekeeping genes provides a unified model of gene order in the human genome. Nature Genetics. Jun;31(2):180-3.
Lercher MJ, Urrutia AO, Pavlicek A, Hurst LD. 2003. A unification of mosaic structures in the human genome. Human Molecular Genetics. Oct 1;12(19):2411-5.
Urrutia AO, Hurst LD. 2003. The signature of selection mediated by expression on human genes. Genome Research. Oct;13(10):2260-4.

The following are books of popular science related to the study of genes and genomes.

Richard Dawkins. The selfish gene. Oxford University Press 1990. (A nice explanation of evolution through natural selection.)

John Maynard Smith and Eörs Szathmáry. The origins of life: from the birth of life to the origin of language. Oxford University Press 2000. (A journey through the different paradoxes and big leaps during the evolution of life.)

Spencer Wells. The journey of man: a genetic odyssey. Random House Trade Paperbacks 2004. (A good account of human migrations through the study of genes.)