Metagenomics and mixology are similar in their use of blenders.
If metagenomics were a cocktail recipe, it would go something like this:
1. In a blender, add:
-A number of different types and brands of alcohol (but don’t tell me how many or exactly what you picked!)
-A variety of fruits, juices, creams, bitters, etc. (but again, I don’t want to know what you just poured in!)
3. Drink – and try not to throw up.
In actual metagenomics, our normal lab recipe goes as follows:
1. Put dirt in a blender
3. Sequence fragments of all the DNA you found
4. Try to figure out how many species, and which species, you just sequenced.
A “metagenome” is whatever random bits of DNA you can sequence from a handful of dirt. Because we now have awesome instruments that can quickly sequence hundreds of millions of DNA fragments at once, metagenomics has emerged as a new way of studying environments. Instead of spending weeks or months looking under a microscope or trying to culture a single species of bacteria, scientists can now *literally* throw everything into a blender to look at communities of organisms from a different (and more comprehensive) perspective.
The perfect cocktail has just the right ingredients, mixed in just the right proportions. This ideal recipe applies equally to cocktails and metagenomes. In a chic tipple, the taste of every ingredient comes through on your tongue, each one complementing the others and coalescing into a surreal experience. In a manageable metagenome, you can isolate and identify the DNA signature of every species in your sample, piecing together whole genome sequences and identifying the ecological role of each species based on its genes. Unfortunately, this rarely happens in metagenomics…natural communities just aren’t that simple. One exception is acid mine drainage ecosystems, a habitat so toxic that only a handful of species can survive there. The cocktail equivalent would be vodka with lemon – there are only two ingredients, so of course you can tell them apart.
Marine metagenomes are like the most disgusting cocktail ever. They have SO many ingredients (thousands of genomes representing different species) that the end product is extremely messy and leaves a bad taste in your mouth. Think of that drinking game, King’s Cup (and the not-so-lucky person who gets to imbibe from the communal vessel after the 4th King). Scientists don’t want to drink bad cocktails, of course, and the focus of metagenomics research is coming up with ways to distill this messy genomic mixture into something more palatable.
If you were presented with the results of our metagenomic cocktail recipe, you could think up a bunch of ways to determine what’s in your drink: a fibrous texture would hint at fresh fruit, color might indicate the blend of juices, and a heavy aftertaste of alcohol means that someone doused the mixture with moonshine from a plastic jug. We can’t use sight or taste to tease apart an actual metagenome (although I’ve never tried licking a computer…). All we have is the 4-letter alphabet of DNA, so we must use properties of the sequences themselves to separate and group our data in meaningful ways (also known as “binning” the sequences).
We can search for short “words” of DNA letters (known as k-mers – for example, “TTGACC” is a 6-letter k-mer), and measure their occurrence and frequency in our data. We could calculate the percentage of Guanine and Cytosine nucleotides in each sequence (abbreviated as %GC) and group together sequences with similar values. Or we could look at coverage – after grouping identical DNA sequences together, how many times did you observe DNA sequences Y and Z? If you had 100 observations of both DNA sequences Y and Z, and if those two sequences had similar %GC and k-mer values, they might have come from the same genome.
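For the curious, here is roughly what those three measurements look like in code. This is only a toy sketch in plain Python, not what real binning software does: the function names (gc_percent, kmer_counts, coverage) and the example reads are made up for illustration, and it assumes your reads are simple strings of A, C, G and T.

```python
from collections import Counter


def gc_percent(seq):
    """Percentage of G and C letters in one DNA sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def kmer_counts(seq, k=6):
    """Count every overlapping k-letter 'word' in one DNA sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def coverage(reads):
    """Group identical reads together and count how often each was seen."""
    return Counter(read.upper() for read in reads)


# Hypothetical reads, invented purely for this example.
reads = ["TTGACCTTGACC", "TTGACCTTGACC", "ATATATATATAT"]

for read, times_seen in coverage(reads).items():
    top_kmer, top_count = kmer_counts(read).most_common(1)[0]
    print(read, times_seen, round(gc_percent(read), 1), top_kmer, top_count)
```

Reads that show up about equally often, with similar %GC and similar favorite k-mers, are candidates for the same bin. Real tools compare full k-mer frequency profiles across far longer sequences, so treat this as a napkin version of the idea.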
Analyzing metagenomes is a lot more complicated than these superficial calculations. I could go on about HMMs, COGs, and KEGGs, and we’d really have a BLAST (genomics joke!). But then you’d really want to start drinking….
As for cocktails, I definitely wouldn’t recommend the recipe that opened this post. But for environmental sequence data, metagenomics researchers really don’t have any other choice.