Humpath.com - Human pathology

next-generation sequencing

Sunday 17 April 2011

Next-generation sequencing (NGS) is a high-throughput DNA-sequencing technology that runs many sequencing reactions in parallel to deliver large volumes of accurate sequence data quickly and inexpensively: up to 600 GB in two weeks (the whole human genome is approximately 3 GB).
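The figures above can be turned into a rough coverage estimate. The sketch below assumes the quoted sizes count bases (1 GB read as 10^9 bases) and that reads are spread evenly over the genome; the function names and the 30-fold target depth are illustrative choices, not values from the text.

```python
# Rough coverage arithmetic for the run output and genome size quoted above.
# Assumption: sizes are counted in bases, and reads cover the genome evenly.

def mean_coverage(total_bases, genome_size):
    """Average number of times each genome position is sequenced."""
    return total_bases / genome_size

def genomes_per_run(total_bases, genome_size, target_depth):
    """Whole genomes that fit in one run at a chosen target depth."""
    return int(mean_coverage(total_bases, genome_size) // target_depth)

run_output = 600e9    # bases per run, from the text
human_genome = 3e9    # approximate human genome size, from the text
target_depth = 30     # hypothetical per-genome depth, chosen for illustration

print(f"Mean coverage of one genome: {mean_coverage(run_output, human_genome):.0f}x")
print(f"Genomes per run at {target_depth}x: "
      f"{genomes_per_run(run_output, human_genome, target_depth)}")
```

In practice usable coverage is lower than this ideal figure, because some reads fail quality filters or do not map uniquely.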

The NGS technology landscape

DNA sequencing is one of the primary methods of genetic analysis. Knowledge of DNA sequences is now indispensable in basic biological research and in numerous applied fields such as biotechnology, diagnostics, forensic biology and systems biology.

The advent of DNA sequencing has significantly accelerated biological research and discovery. Initially, all major sequencing projects (including the Human Genome Project) were carried out on Sanger (first-generation) sequencers, but their limitations in throughput and cost propelled the development of next (second) generation sequencing technologies.

Currently, large numbers of biologists have adopted NGS for sequencing complete genomes (de novo) of various plants and animals of scientific or commercial relevance (for example rice, maize, buffalo, camel, goat).

Apart from de novo sequencing, NGS is being widely used for RNA-Seq (transcriptome) analysis, ChIP-Seq analysis, CNV-Seq, genomic structural variations and SNP studies.

NGS is driving down sequencing costs considerably, thus encouraging researchers to undertake more and more sequencing projects (for example the 1000 Genomes Project and the Cancer Genome Project).

Reports are already available showcasing the applicability of NGS in the cytogenetic diagnosis of Mendelian disorders, the diagnosis and prognosis of cancer, and genetic testing for common diseases such as hypertension and diabetes, among others.

NGS technologies

•Illumina/Solexa (Genome Analyzer II, HiSeq, MiSeq)

Illumina sequencers use cluster generation by bridge amplification followed by sequencing by synthesis (a base-space mechanism), and generate reads 35-150 bp long.

Illumina sequencers are available in multiple versions. HiSeq is the latest, high-end platform; it can produce up to 600 GB of data per run and is suited to big labs or core sequencing facilities. MiSeq is a smaller table-top version that generates up to one GB of data and is aimed at smaller labs.

Genome Analyzer II is an intermediate version. The primary error type for Illumina sequencers is “substitution” and the reported error rate is approximately 0.1 percent.

•SOLiD

Life Technologies’ SOLiD system uses a unique color-space mechanism. Briefly, the DNA fragments to be sequenced are first immobilized on beads and amplified by emulsion PCR.

The beads (with the amplified fragments) are then arrayed on a glass slide for sequencing by ligation, in which DNA ligase joins fluorescently labeled octamers to the fragment being read.

The SOLiD system is unique in using DNA ligase instead of DNA polymerase to perform the sequencing reaction. The sequence generated is in color space and each base is read twice, which reduces sequencing error (error rate < 0.1 percent). The major shortcoming of SOLiD is its short read length (35-75 bp); the data generated is up to 15 GB.
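The color-space idea can be illustrated with a small sketch. It assumes the usual four-color two-base scheme, in which each color describes a pair of adjacent bases and equals the XOR of their 2-bit codes, and it assumes the first base of the read is known. The function names are illustrative and are not part of any SOLiD software.

```python
# Two-base ("color space") encoding sketch.
# Assumption: four colors (0-3), each the XOR of the 2-bit codes of two
# adjacent bases, with A=0, C=1, G=2, T=3.
from typing import List, Tuple

BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = {code: base for base, code in BASE_CODE.items()}

def encode(seq: str) -> Tuple[str, List[int]]:
    """Return the first base plus one color per adjacent base pair."""
    colors = [BASE_CODE[a] ^ BASE_CODE[b] for a, b in zip(seq, seq[1:])]
    return seq[0], colors

def decode(first_base: str, colors: List[int]) -> str:
    """Rebuild the base sequence from a known first base and the colors."""
    bases = [first_base]
    for color in colors:
        previous = BASE_CODE[bases[-1]]
        bases.append(CODE_BASE[previous ^ color])
    return "".join(bases)

first, colors = encode("ATGGC")
print(first, colors)                  # A [3, 1, 0, 3]
print(decode(first, colors))          # ATGGC

# A real single-base change (G -> C at the third base) alters two adjacent colors:
print(encode("ATCGC")[1])             # [3, 2, 3, 3]

# A single miscalled color instead changes every base after it when decoded:
print(decode("A", [3, 2, 0, 3]))      # ATCCG
```

Because every base contributes to two neighboring colors, an isolated color change is more likely a measurement error than a true variant, which is how reading each base twice helps reduce the error rate.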

•Ion Torrent

Ion Torrent’s (now Life Technologies) Personal Genome Machine (PGM) uses a unique semiconductor-based technology for sequencing.

In the PGM, sequencing takes place in a high-density array of microwells. Beneath the wells lies an ion-sensitive layer and, below that, a proprietary ion sensor that detects the change in pH caused by the hydrogen ion released when a nucleotide is incorporated.

Upon release of hydrogen ion, the voltage change is detected by the ion sensor.

Thus in the PGM, incorporation of a base is detected from the hydrogen ions released rather than from fluorescently labeled nucleotides or light.

Since the detection system involves no imaging, there is significant reduction in lag and thus the run time.

PGM is a small, fast sequencer with a current maximum output of 10 MB per run and a read length of 100–200 bp. The primary error type is “indel” (insertion/deletion) and the error rate is about one percent.
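A simplified model helps explain why indels dominate in this kind of detection. The sketch below assumes that nucleotides are supplied to the wells one type at a time in a repeating order, and that the signal in each flow is proportional to the number of bases incorporated. The flow order "TACG", the function names and the noise-free signal are assumptions made for illustration.

```python
# Idealised flow-based sequencing sketch (no noise, no phasing effects).
# Assumption: one nucleotide type is flowed at a time, in a repeating order,
# and the signal equals the number of bases incorporated in that flow.

def flowgram(read, flow_order="TACG", max_flows=400):
    """Return (nucleotide, incorporated_count) for each flow."""
    signals = []
    pos = 0
    flow = 0
    while pos < len(read) and flow < max_flows:
        nucleotide = flow_order[flow % len(flow_order)]
        run = 0
        # A homopolymer stretch is incorporated within a single flow.
        while pos < len(read) and read[pos] == nucleotide:
            run += 1
            pos += 1
        signals.append((nucleotide, run))
        flow += 1
    return signals

def call_bases(signals):
    """Convert per-flow counts back into a base sequence."""
    return "".join(nucleotide * count for nucleotide, count in signals)

signals = flowgram("TTAGGGC")
print(signals)
# [('T', 2), ('A', 1), ('C', 0), ('G', 3), ('T', 0), ('A', 0), ('C', 1)]
print(call_bases(signals))   # TTAGGGC
```

The run of three G bases produces one larger signal rather than three separate ones. If that signal is read as 2 or 4 instead of 3, the called sequence gains or loses a base, which appears as an insertion or deletion rather than a substitution.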

•Roche-454

GS FLX and GS Junior are NGS systems from Roche-454, based on fragment amplification by emulsion PCR followed by pyrosequencing.

The major advantage of 454 systems is their long read length (up to 400 bp) compared with other second-generation systems; the reported error rate is about one percent.

GS FLX can generate up to 450 MB of data and is suitable for core sequencing facilities, while GS Junior is a smaller table-top version of GS FLX that generates about 35 MB of data per run. The primary error observed is “indel”.

The major drawback with 454 systems is their inability to handle large sample numbers and thus they are not cost effective for high throughput usage.

•PacBio RS

PacBio RS is the first third-generation Single Molecule Real Time (SMRT) DNA sequencer, enabling direct measurement of individual molecules.

No PCR amplification is required, which gives more uniform sequence coverage across genomic regions irrespective of GC content and thus facilitates the detection of minor variants. The data generated per SMRT flow cell is up to 150 MB.

The biggest advantage of PacBio RS is its unprecedented read length of 2500-3000 bp, while its biggest shortcoming is the high error rate (about 15 percent).
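The error rates and read lengths quoted for the platforms above can be put on a common scale. The sketch below converts each per-base error rate to a Phred quality score, Q = -10 log10(p), and multiplies it by the maximum read length quoted in the text to estimate errors per read. It assumes errors are independent and evenly distributed along a read, which is a simplification; the function names are illustrative.

```python
import math

# (platform, per-base error rate, read length in bp), taken from the text above.
# SOLiD is quoted as "< 0.1 percent"; 0.001 is used here as an upper bound.
PLATFORMS = [
    ("Illumina", 0.001, 150),
    ("SOLiD", 0.001, 75),
    ("Ion Torrent PGM", 0.01, 200),
    ("Roche-454", 0.01, 400),
    ("PacBio RS", 0.15, 3000),
]

def phred(error_rate):
    """Phred quality score for a per-base error probability."""
    return -10.0 * math.log10(error_rate)

def expected_errors(error_rate, read_length):
    """Expected number of erroneous bases in one read (uniform-error assumption)."""
    return error_rate * read_length

for name, rate, length in PLATFORMS:
    print(f"{name:16s} Q{phred(rate):4.1f}  "
          f"~{expected_errors(rate, length):g} errors per {length} bp read")
```

On this scale an error rate of 0.1 percent corresponds to Q30 and one percent to Q20, while a 15 percent error rate over a 3000 bp read implies several hundred erroneous bases per read.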

Evolution

Amongst all the platforms, Illumina is undoubtedly the current market leader with more than 50 percent market share.

More recently, Ion Torrent has also started gaining popularity due to its low cost and longer read length as compared to Illumina.

Roche-454 has its own niche market share, and many researchers use it strategically in combination with Illumina, especially in de novo sequencing projects, to fill gaps in the sequences.

SOLiD is losing ground while PacBio has been introduced into the market very recently.

Advantages

1. Fishing out novel genetic variations

NGS provides an unbiased view of the genome/transcriptome; hence the ability to discover novel genetic variations is very high. Novel gene fusion events have already been reported using NGS.

2. Greater sensitivity

High-depth sequencing using NGS can help identify rare genetic variants that may contribute to disorders such as schizophrenia, Alzheimer’s disease, cardiac disorders and cancer, among others. Similarly, in transcriptome analysis, low-expression transcripts or rare transcript isoforms can be detected.
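The link between depth and sensitivity can be sketched with a binomial model. It assumes each read covering a position carries the variant independently with probability equal to the variant's allele fraction, and it ignores sequencing error. The one percent allele fraction, the three-read threshold and the depths are hypothetical values chosen for illustration.

```python
from math import comb

def detection_probability(depth, allele_fraction, min_reads=3):
    """Probability that at least `min_reads` of `depth` reads carry the variant.

    Assumes a binomial model: reads are independent and sequencing
    errors are ignored.
    """
    p_fewer = sum(
        comb(depth, i) * allele_fraction**i * (1 - allele_fraction) ** (depth - i)
        for i in range(min_reads)
    )
    return 1.0 - p_fewer

# Hypothetical rare variant present in 1 percent of the molecules sequenced.
for depth in (30, 300, 3000):
    prob = detection_probability(depth, allele_fraction=0.01)
    print(f"depth {depth:5d}x: P(at least 3 variant reads) = {prob:.3f}")
```

Under these assumptions the chance of seeing the variant in enough reads rises sharply with depth. Real variant callers must also separate true variants from sequencing errors, which this sketch does not model.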

Bottlenecks and challenges

Today, NGS is not only replacing the older sequencing technologies but is also a method of choice for studying gene expression, protein binding and copy number variations. NGS has undoubtedly reduced the cost of sequencing per base significantly by substantially improving throughput; however, NGS (second-generation sequencing) technologies also suffer from some key limitations, as indicated below:

•NGS data analysis:

In a majority of sequencing projects, the quantum of sequencing data has outpaced computational capabilities, making NGS data analysis and management the biggest bottleneck and a still-evolving field of research in itself. Although the cost of actual sequencing is falling drastically, the associated bioinformatics cost of NGS data storage and analysis has grown exponentially. Unlike microarrays, which have multiple robust analysis solutions, NGS data analysis largely relies on non-standard open source tools and requires highly trained bioinformaticians. Although a few commercial solutions are available, they are extremely expensive and not very reliable.

•Computing infrastructure:

NGS data demands sophisticated, high-end computing infrastructure. For example, de novo assembly and annotation of a mammalian genome requires a system with at least eight quad-core processors and 512 GB of RAM, along with 10 terabytes (TB) of disk space. Additionally, highly skilled IT and bioinformatics staff are required to set up, maintain and run NGS data analysis tools.

•Commercially unviable:

The NGS instruments currently available cannot sequence complete human genomes on a large scale at a low price, and researchers cannot afford the resulting costs.
