Thursday, May 27, 2010

The Ultimate Database

I've been digging around in my genome, using the data collected by 23andMe.  They collected about a half million SNPs, or "single-nucleotide polymorphisms".  Each SNP is a single-letter change in DNA that researchers have teased out by comparing genomes between humans, and even between humans and other animals.  You could say that our SNPs are part of what makes each of us a unique individual.

For example, the SNP rs12913832 has been found to be a strong predictor of blue eyes.  I have "GG" at that location, which is spot on.  But wouldn't it be cool to look at that bit of DNA?
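
Before heading to the browser, here is a rough sketch of how you might pull that genotype out of a 23andMe raw data download yourself.  It assumes the usual raw-data layout (comment lines starting with "#", then tab-separated rsid, chromosome, position and genotype).  The sample rows, the placeholder positions and the find_genotype helper are made up for illustration; only the rsid and my "GG" come from above.

```python
import csv
import io

# Illustrative excerpt in the assumed 23andMe raw-data layout:
# '#' comment lines, then tab-separated rsid, chromosome, position, genotype.
# Positions are placeholders (0), not real coordinates, and the second
# row is an entirely made-up SNP.
SAMPLE = (
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs12913832\t15\t0\tGG\n"
    "rs0000001\t1\t0\tAA\n"
)


def find_genotype(lines, rsid):
    """Return the genotype string for rsid, or None if it is not present."""
    data_lines = (ln for ln in lines if not ln.startswith("#"))
    for row in csv.reader(data_lines, delimiter="\t"):
        if len(row) >= 4 and row[0] == rsid:
            return row[3]
    return None


genotype = find_genotype(io.StringIO(SAMPLE), "rs12913832")
print(genotype)
```

To run it against a real download, pass an open file handle instead of the io.StringIO sample.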

That's where the UCSC Genome Browser comes in.  This has got to be the coolest database browsing tool I've ever used.  Yeah, it's a little clunky, but when you think about what you're looking at, you are gobsmacked.

Here is the zoomed-in view of the SNP for blue eyes.

In the center of the image, lined up with the SNP, you will see a series of A's.  This shows that all of those animals have an A at that position in the HERC2 gene.  In other words, it's a "highly conserved" position.  The graph in the center portion shows how conserved that area of the gene is across the animal kingdom.  This stretch of HERC2 helps control the neighboring OCA2 (oculocutaneous albinism) gene.  Having a G at that location on both copies of the chromosome is strongly associated with blue eyes.  (Yes, I'm simplifying a bit.)

Ok, now to explain a bit more of what's going on in the browser.  Above the main image is a smaller image.  That's a picture of the actual chromosome, in this case Chromosome 15.

You can navigate through the chromosome by hitting the left and right arrow buttons, and zoom in and out.  The image changes radically depending on the level of zoom, showing different levels of detail.  The number of SNPs gets very dense as you zoom out.

Here's another interesting gene, TTN.  It encodes the muscle protein titin and has one of the longest coding sequences of any human gene.  Titin is the largest protein known to man, with the chemical formula C132,983H211,861N36,149O40,883S693.  It's just one springy piece of the complex mechanism that muscles are made of.
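
To get a feel for how big that formula is, here is a small sketch that counts the atoms and estimates the mass.  The atomic masses are rounded standard values, and parse_formula is just a helper written for this post, so treat the result as a ballpark figure.

```python
import re

# Rounded standard atomic masses in daltons (an approximation).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}

# The titin formula exactly as quoted above, thousands separators included.
FORMULA = "C132,983H211,861N36,149O40,883S693"


def parse_formula(formula):
    """Split a flat formula such as 'C2H6O1' into a dict of {element: count}."""
    counts = {}
    for element, digits in re.findall(r"([A-Z][a-z]?)([\d,]+)", formula):
        counts[element] = counts.get(element, 0) + int(digits.replace(",", ""))
    return counts


counts = parse_formula(FORMULA)
total_atoms = sum(counts.values())
mass_da = sum(ATOMIC_MASS[element] * n for element, n in counts.items())
print(f"{total_atoms:,} atoms, roughly {mass_da / 1e6:.2f} million daltons")
```

That works out to a few hundred thousand atoms and a mass of around three million daltons, for a single molecule.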

Looking at the section in the middle you can see that the TTN gene is highly conserved, forming that dark band down the middle of the image.  Even the stickleback has a very similar TTN gene.

The dense graph at the bottom shows the SNPs in this region of the chromosome.  Each of those SNPs is a potential area for research.  It's staggering how much is known already, but even more amazing is the sheer size of this project, the database, and the animal genomes.

If you scoot over a couple of pages to the left from TTN, you'll find the Homeobox D (or HOXD) genes.  These affect things like limb development.  If you type 'HOXD13' in the gene box and hit 'jump', you'll zoom in on that single gene.  Somewhere in that gene I have a mutation that gives me strange thumbs, just like Megan Fox.
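
If you would rather jump straight to a gene than type it in, the browser can also be opened with the search term in the URL.  This is only a sketch: it assumes the hgTracks page accepts db and position parameters and that a gene name works as a position, and the "hg19" assembly is my guess rather than anything shown in the screenshots.

```python
from urllib.parse import urlencode

# Assumed entry point for the UCSC Genome Browser track display.
UCSC_TRACKS = "https://genome.ucsc.edu/cgi-bin/hgTracks"


def browser_url(position, assembly="hg19"):
    """Build a browser link for a search term such as a gene name.

    The assembly default ("hg19") is an assumption; change it to match
    whichever genome build you are actually browsing.
    """
    query = urlencode({"db": assembly, "position": position})
    return f"{UCSC_TRACKS}?{query}"


for gene in ("HOXD13", "TTN"):
    print(browser_url(gene))
```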