By Dr. Jimmy Breen (ACAD/School of Agriculture, Food and Wine)
Whether your DNA samples are ancient or modern, next-generation sequencing (NGS) is an incredibly important tool for extracting biologically relevant information. While running your favourite material through a genome sequencer is extremely useful, it can also be ridiculously expensive.
Before I became a joint postdoc here at ACAD and the School of Agriculture, Food and Wine (Waite Campus, University of Adelaide), I was a bioinformatics officer at a genome sequencing company, so I know about costing. As a result, usually around the pointy end of the year (i.e. grant season), my colleagues ask me lots of questions about potential sequencing projects. The main query is: how many samples can I adequately sequence, given my budget of $X? It’s a good question. While it’s fairly straightforward to estimate the cost of the sequencing itself (once you work out the number and type of sequencing runs you need), it’s often difficult to estimate the cost of preparing your samples for the machine.
The general workflow for a sequencing run is to extract your desired DNA sample and then produce a sequencing library (an amplified version of your extracted DNA with sequencing adapters attached). This library can then be used as an immortal resource: whenever you need more DNA to work with, you just amplify what you need from your library.
When sequencing ancient DNA, library cost isn’t that much of a consideration. We deal with relatively few NGS libraries in ACAD, each developed from rare and fragile samples. However, sometimes our projects require modern datasets for comparison with the ancient samples. In those cases, many different samples are loaded together on one sequencing run. This cuts costs and gets us the most bang for our sequencing buck. Running multiple samples at once is called multiplexing. Each library is tagged with a short, known DNA sequence called a barcode (or, in the case of Illumina sequencing, an index), so the reads can be separated by sample afterwards.
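To make the idea of separating multiplexed samples concrete, here is a toy sketch in Python. It is not how any real demultiplexing software works: the index sequences, sample names and reads are all made up for illustration, and it assumes every index is read perfectly (real tools also have to tolerate sequencing errors in the index).

```python
from collections import defaultdict

# Hypothetical index-to-sample table. Real indexes come from your
# library prep kit or your own adapter design.
INDEX_TO_SAMPLE = {
    "ACGTAC": "wheat_01",
    "TGCATG": "barley_01",
}


def demultiplex(reads, index_to_sample):
    """Group (index, sequence) pairs by the sample that owns the index.

    Reads whose index is not in the table go into an 'undetermined' bin.
    """
    by_sample = defaultdict(list)
    for index, sequence in reads:
        sample = index_to_sample.get(index, "undetermined")
        by_sample[sample].append(sequence)
    return dict(by_sample)


# Made-up reads from one pooled run: (index, read sequence).
reads = [
    ("ACGTAC", "GATTACA"),
    ("TGCATG", "CCGGTTA"),
    ("ACGTAC", "TTGACCA"),
    ("NNNNNN", "AAAAAAA"),  # unreadable index
]

result = demultiplex(reads, INDEX_TO_SAMPLE)
for sample, sequences in sorted(result.items()):
    print(sample, len(sequences))
```

The point is simply that the index travels with each read, so one pooled run can be split back into per-sample piles afterwards.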
As I said, the cost of the actual sequencing run is the least of your worries. With the Illumina sequencing platform, if you prepared sequencing libraries for 96 samples using standard Illumina library prep kits, you could be looking at a cost of up to $9,000. That’s 6 to 7 times more than the cost of the sequencing run itself! It’s no wonder that labs are now developing protocols to produce cost-effective, highly multiplexed sequencing libraries! This is especially true for projects that deal with large numbers of samples at once, such as population genetics and plant genotyping experiments.
As part of our ARC Discovery project studying ancient cereals from the Areni-1 cave complex (for more info visit my profile page), we are looking to produce sequence information from a large series of global wheat and barley cultivars (plants produced by selective breeding). To do that, we’re expanding on a shotgun library protocol developed in Jay Shendure’s lab at the University of Washington by including elements of other high-throughput library creation protocols such as genotyping-by-sequencing (GBS) or restriction-site associated DNA sequencing (RAD-seq). In GBS/RAD-seq, restriction enzymes are used to cleave the DNA at specific recognition sites, and the resulting fragments are sequenced. But this is not appropriate for creating a shotgun library, because the DNA needs to be cut randomly to get even coverage of your target genome or sample.
A less costly procedure is to mechanically shear your DNA using something like a Covaris focused-ultrasonicator (which literally shakes the DNA into fragments). However, if you don’t have a 96-well Covaris sonicator available for DNA shearing, this step can be both time-consuming and still somewhat expensive. Using a fragmentase (an enzyme which fragments the DNA randomly) and then adding sequencing adapters means you can do a size selection on pooled samples rather than continuing to work in 96-well plates. This is a cheaper and much simpler treatment. Protocols such as these have also been developed for RNA sequencing libraries, meaning that a large range of sequencing studies can be carried out cheaply. By cutting costs with a fragmentase, and by designing your own Illumina adapters and oligos, you can bring your library prep consumables and reagents for 96 samples down to below $2,000.
*Info-mercial presenter voice* That’s around $20 per sample and a saving of over $7,000!
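For anyone who wants to plug in their own numbers, here is a rough back-of-the-envelope sketch in Python. The two library prep totals are the figures quoted above; the budget and the cost of a sequencing run are placeholder assumptions (the run cost is just a guess consistent with "6 to 7 times"), and the function ignores how many samples actually fit on one run.

```python
N_SAMPLES = 96
KIT_TOTAL = 9000  # "up to $9,000" for 96 samples with standard kits
DIY_TOTAL = 2000  # "below $2,000" with fragmentase and your own adapters

# Assumptions for illustration only, not figures from this post.
RUN_COST = 1400
BUDGET = 5000


def per_sample(total, n_samples):
    """Library prep cost per sample."""
    return total / n_samples


def samples_within_budget(budget, run_cost, cost_per_sample):
    """Whole libraries affordable once one sequencing run is paid for."""
    remaining = budget - run_cost
    if remaining <= 0:
        return 0
    return int(remaining // cost_per_sample)


kit_each = per_sample(KIT_TOTAL, N_SAMPLES)
diy_each = per_sample(DIY_TOTAL, N_SAMPLES)
saving = KIT_TOTAL - DIY_TOTAL

print(f"Kit: ${kit_each:.2f} per sample")
print(f"DIY: ${diy_each:.2f} per sample")
print(f"Saving on 96 samples: ${saving}")
print("Kit libraries in budget:", samples_within_budget(BUDGET, RUN_COST, kit_each))
print("DIY libraries in budget:", samples_within_budget(BUDGET, RUN_COST, diy_each))
```

Swap in your own quotes for `RUN_COST` and `BUDGET` and the answer to "how many samples for $X?" falls out directly, at least as a first estimate.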
Reducing sequencing costs as a whole can have a massive effect on how many samples you can sequence. More efficient library techniques and their potential biases are investigated in Adey et al. 2010 (including exome library preparation), so I encourage anyone who’s thinking about large-scale sequencing work to check out the paper.
Of course, most of these protocols are not going to work as well as Illumina’s out-of-the-box library prep protocols, but initial investigations into the new protocols’ effectiveness are proving very encouraging, giving hope to the funding-starved researcher who has big ideas!