Chromosome 1: the biography

By Graeme O'Neill
Friday, 21 July, 2006

Sixteen years after it began, the vast jigsaw of the Human Genome Project is effectively complete. A few tiny pieces continue to defy cloning and analysis, but the DNA sequence, gene catalogue and chromosomal map of the genome are now inscribed in digital granite, for all time.

In the May 18 issue of the international journal Nature, researchers from Britain's Wellcome Trust Sanger Institute and their collaborators in the UK and US published a revised and annotated DNA sequence for chromosome 1, the largest human chromosome.

It marked the end of an epic process that began in 1990 and which produced the historic first draft of the DNA sequence and chromosomal map of the human genome, published in Nature and Science in 2001.

And, of historical note, the completion of the revision project came 50 years after a mortifying moment in human genetics.

When Watson and Crick published their celebrated model of the structure of DNA in April 1953, cytologists believed that humans had 48 chromosomes - the same number as their great ape relatives: chimpanzees, bonobos, gorillas and Asia's two orang-utan species.

In 1956, a keen-eyed cytologist discovered that humans actually have 46 chromosomes (and the importance of basic arithmetic was highlighted for those taking a BSc). It is now known that some time after humans diverged from chimps and bonobos six to seven million years ago, great ape chromosomes 2a and 2b fused to form human chromosome 2.

Among the denizens of the human nuclear landscape, chromosome 1 is king: over evolutionary time it has acquired the veritable lion's share of the genes in the primate genome.

According to the Sanger Institute, it took 10 years to sequence. It is not only large but gene-dense, bearing 3141 of the 25,000-odd genes in the human genome. Between the first and second drafts, molecular geneticists added 173 more genes to the first-pass tally of 2968 genes.

Inside the chromosome

Oddly, around 50 per cent of the world's population is a gene short on chromosome 1, having only 3140 genes. They are missing the glutathione S-transferase M1 gene, whose encoded enzyme is involved in detoxifying metabolites of the 3000-odd carcinogenic chemicals in cigarette smoke.

Those with the GSTM1-null genotype are more susceptible to lung cancer and other epithelial cancers - breast, bladder, bowel, esophagus and larynx.

Half a century ago, cytologists numbered the 22 pairs of human autosomes in order of their apparent physical size. Although chromosome 1 appears slightly larger than chromosome 2, the latest, high-precision estimate of the number of DNA base pairs in its sequence - just over 224 million - ranks it second to the composite chromosome 2, with 237 million.

Yet chromosome 2 has 1795 fewer genes than chromosome 1, because of extensive, gene-sparse regions along its length. Other chromosomes have similar 'gene deserts' - most notably, chromosome 13, home of the breast-cancer gene BRCA2.

Chromosome 13's 95.5 million base pairs put it in the middle rank of autosomes. It is a study in contrast with chromosome 1, bearing only 633 genes, fewer even than tiny chromosome 22.

When Sanger Institute researchers published the chromosome 13 sequence in Nature on April 1, 2004, they noted a huge region in the centre of the chromosome containing only 47 genes; even at the average gene density of the entire genome, there should have been around 180. These gene deserts are thought to have important regulatory functions, but their role remains enigmatic.

Among the densely crowded genes on chromosome 1 are genes that, in mutant form, have been linked to more than 350 diseases, including numerous cancers, neurodegenerative disorders such as Parkinson's and Alzheimer's diseases, developmental disorders and high serum cholesterol. The function of many of its genes remains unknown.

The average gene density for chromosome 1 is 14.2 genes per megabase (Mb), almost twice the average 7.8/Mb density of the entire genome. Chromosome 1 is also home to 991 pseudogenes - genes that have either reached their evolutionary use-by date or have relinquished their original protein-coding duties for regulatory roles; they now code only for RNA molecules. Of these, 840 are still processed, presumably yielding anti-sense RNAs that regulate the expression of other, related genes.
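The density figures can be checked against the gene counts and chromosome sizes quoted earlier in this article. The short Python sketch below does that arithmetic; the dictionary layout and the function name genes_per_mb are illustrative choices, and the chromosome 2 gene count is derived here from the statement that it has 1795 fewer genes than chromosome 1. Because the inputs are rounded, the result for chromosome 1 comes out near 14 rather than exactly the quoted 14.2.

```python
# Gene counts and sizes (in base pairs) as quoted in the article.
# The chromosome 2 gene count is derived: 1795 fewer than chromosome 1.
CHROMOSOMES = {
    "1": {"genes": 3141, "bp": 224_000_000},
    "2": {"genes": 3141 - 1795, "bp": 237_000_000},
    "13": {"genes": 633, "bp": 95_500_000},
}


def genes_per_mb(genes, bp):
    """Return gene density in genes per megabase (1 Mb = 1,000,000 bp)."""
    return genes / (bp / 1_000_000)


densities = {
    name: genes_per_mb(c["genes"], c["bp"]) for name, c in CHROMOSOMES.items()
}

for name, density in densities.items():
    print(f"chromosome {name}: {density:.1f} genes/Mb")
```

On these rounded figures, chromosomes 2 and 13 both fall below the genome-wide average of 7.8 genes/Mb, consistent with the gene deserts described above.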

The sequence also includes 22 elements coding for micro-RNAs (miRNAs), tiny non-coding RNAs with a mature length of 22-23 nucleotides. MiRNAs are known to be involved in modifying the activity of messenger RNAs from genes - miRNAs have RNA sequences that allow them to bind promiscuously to a variety of messenger RNAs, suggesting they are involved in coordinating gene activity post-transcriptionally.

Pressure of selection

The Sanger Institute researchers say chromosome 1 harbours a number of genes that appear to have been under selection pressure as contributors to human fitness in relatively recent times.

These regions are characterised by a relative paucity of variation in single-nucleotide polymorphisms (SNPs) within the genes in question, and in the genes flanking them.

Genes tend to time-travel between generations en bloc, so strong positive selection for an advantageous mutation in a particular gene will capture whatever alleles of adjacent genes happened to be present at the time it occurred.

This "tag-along" effect fixes whatever SNPs were present in these alleles, leaving the imprint of a selective sweep on the chromosome by 'bleaching' that region of its original genetic variation, a phenomenon known as purifying selection.

Over millions of years, mutation gradually restores variability to the region at a fixed rate; by applying statistical models that factor in generation times and the effective breeding size of the population, molecular geneticists can estimate the approximate time of the mutation from the amount of de novo variation found in the adjacent genes - strong selection pressure then tends to keep the target gene 'pure', and free of mutation.
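The dating logic can be illustrated with a deliberately simplified calculation. The sketch below assumes a complete sweep followed by neutral mutation at a constant rate, so that two copies of the region differ on average at 2 x mutation rate x generations sites per base. It ignores the effective population size and the fuller statistical models mentioned above, and the function name and all input values are hypothetical, chosen only to show the arithmetic.

```python
def sweep_age_years(pairwise_diversity, mutation_rate, generation_years):
    """Rough age of a selective sweep, in years.

    pairwise_diversity: average per-site differences between two copies
        of the swept region (variation that has arisen since the sweep).
    mutation_rate: mutations per site per generation.
    generation_years: assumed length of one generation in years.

    Simplifying assumption: every copy descends from the single swept
    haplotype, so diversity is roughly 2 * mutation_rate * generations.
    """
    if mutation_rate <= 0 or generation_years <= 0:
        raise ValueError("mutation_rate and generation_years must be positive")
    generations = pairwise_diversity / (2.0 * mutation_rate)
    return generations * generation_years


# Hypothetical inputs, for illustration only.
age = sweep_age_years(
    pairwise_diversity=1e-4,   # one difference per 10,000 bases
    mutation_rate=2.5e-8,      # per site per generation (assumed)
    generation_years=25,       # assumed generation time
)
print(f"estimated sweep age: about {age:,.0f} years")
```

With these made-up inputs the estimate is about 50,000 years; halving the assumed mutation rate would double it, which is why such dates are only approximate.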

The most dramatic example of this effect is the so-called 'Chomsky gene', FOXP2, on chromosome 7, named after US language expert Professor Noam Chomsky, who first proposed the idea of a 'language engine' in the primate brain.

FOXP2 encodes a transcription factor involved in development of the brain's speech-processing centres, and fine motor control of the muscles of the face and tongue. The human and mouse genes differ by just three nucleotides.

The first mutation occurred tens of millions of years ago, while the second appeared five to seven million years ago, around the time of the human-chimp divergence.

The third occurred 100,000 to 200,000 years ago, around the time of the Upper Paleolithic Transition, when modern humans began fashioning high-precision stone tools, and also began moving out of Africa to colonise the globe.

Affected members of a UK family with a FOXP2 mutation cannot coordinate their facial muscles to speak intelligibly, and have defective grammar.

Differences in the frequency of specific alleles of genes may also reflect regional selection pressures. Using data on SNP allele frequencies, the researchers found significant differences between Western and Northern European, West African and East Asian (Japanese and Han Chinese) populations for 69 SNPs, which they believe provide evidence for geographically restricted selection.

The best known of these was in the Duffy blood group gene; the mutation that created the Duffy-negative genotype in West African populations protects against malaria caused by Plasmodium vivax.
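One common way to quantify this kind of allele-frequency difference between populations is the fixation index, Fst. The article does not say which statistic the researchers used, so the Python sketch below is only an illustration of the idea: it compares one SNP between two equally weighted populations, and the function names and the example frequencies are hypothetical.

```python
def heterozygosity(p):
    """Expected heterozygosity for a two-allele SNP with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst_two_populations(p1, p2):
    """Fst for one SNP between two equally weighted populations.

    Fst = (H_total - H_within) / H_total, where H_total uses the mean
    allele frequency and H_within is the mean of the two populations'
    own heterozygosities. Returns 0.0 when the SNP does not vary at all.
    """
    h_total = heterozygosity((p1 + p2) / 2.0)
    if h_total == 0.0:
        return 0.0
    h_within = (heterozygosity(p1) + heterozygosity(p2)) / 2.0
    return (h_total - h_within) / h_total


# Hypothetical frequencies: an allele fixed in one population and absent
# in the other, versus an allele equally common in both.
print(fst_two_populations(1.0, 0.0))
print(fst_two_populations(0.3, 0.3))
```

Values near 1 indicate an allele that is close to fixed in one population and rare in the other, the pattern described for the Duffy-negative mutation; values near 0 indicate similar frequencies everywhere.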

Purifying selection

SNP frequency for another gene, ACOT11, coding for a cold-induced thioesterase enzyme, expressed in fatty tissue and implicated in obesity in mice, also differs significantly between African and Asian populations.

The ancestral African genotype is effectively fixed in West African populations, pointing to strong purifying selection. However, the gene is more variable in Japanese and Han Chinese, suggesting that the strong selection pressure that maintained the ancestral form in the hot, humid tropical environment of West Africa was relaxed in more cold-adapted Asian populations.

Genes for several olfactory receptors also differed significantly in frequency between ethnic groups. Could these reflect selection pressure of scent cues associated with regional foods, body odours or animal scents?

The researchers also identified significant allelic differentiation in genes of unknown function, as well as two extended haplotypes - conserved chromosomal 'blocks' of 17 megabases and 43 megabases peculiar to European and Asian populations, but not Africans.

Recombination rates within these haplotype blocks are the lowest for any region of chromosome 1, suggesting European and Asian populations derive some evolutionary benefit from the fixed allelic combinations within them.

Studies of genes that have been under purifying selection are putting flesh onto the bare bones of the human fossil record, and illuminating the evolutionary trajectory of humans since they diverged from their African higher-primate cousins.

Methylation

Chromosome 1 contains many hotspots for meiotic recombination, variously scattered throughout its length, but clustered within regions around genes - predictably so, given recent research indicating that recombination hotspots tend to be located in heavily methylated chromosomal segments.

Methylation is a secondary mechanism of gene regulation that creates heritable patterns of coordinated gene suppression in either sex. Because of the sheer number of genes it carries, chromosome 1 is likely to contain many more methylated genes, and thus, more recombination sites, than any other chromosome.

Despite exhaustive screening of bacterial and yeast-derived clonal libraries, the Sanger researchers and their colleagues report that 26 gaps remain in the chromosome 1 sequence. Sixteen of these occur in GC-rich regions, which often form heavily methylated 'CpG islands' that are problematic to clone.

The researchers identified 10,971 regions of chromosome 1 that are evolutionarily conserved between humans, mouse, rat, zebrafish and pufferfish.

Chromosome 13 was the first chromosome whose sequencing effort searched so-called "junk" DNA, in gene introns and intergenic regions, for sequences coding for micro-RNAs.

These small RNA molecules, around 22 nucleotides in length, regulate gene activity post-transcriptionally, by binding to complementary sequences in messenger RNAs from genes. The chromosome 1 sequence turned up 22 microRNAs; the miRNA tally for the human genome now stands at 1,345.
