The Human Genome Project is one of the largest and most ambitious scientific initiatives ever undertaken. It was made possible by DNA sequencing technologies, which allowed researchers to map the human genome fully. The discoveries stemming from this project have greatly aided basic and biomedical research.
Next-generation sequencing (NGS), also known as short-read sequencing, is a powerful and highly effective method for studying DNA and RNA. By analyzing nucleic acid fragments typically ranging from 50 to 600 base pairs, this technique provides valuable insight into DNA and RNA structure, modifications, variations and gene expression profiles. Since its inception, NGS has been used extensively in human genome research and has undergone significant advancements in sequencing depth, accuracy, specificity and throughput.
One of the most significant benefits of NGS is its ability to analyze massive amounts of genomic data, which was previously impossible with traditional sequencing methods. With its high throughput, NGS has enabled researchers to study multiple genomes simultaneously, accelerating scientific discovery. Additionally, NGS has played a critical role in personalized medicine, allowing for the identification of genetic variations that can lead to the development of targeted therapies. Overall, NGS has revolutionized the field of genomics and continues to pave the way for discoveries in molecular biology and personalized medicine.
Next-Generation Sequencing Workflow
Next-generation sequencing workflows vary based on the selected sample type or library preparation kit. However, common steps are followed when going from sample to answer.
- Sample acquisition and processing: Samples must be prepared correctly for library preparation to succeed. Samples can be of different types, such as cells, bone marrow, liquid biopsy or tissue. Solid tissue samples require mechanical dissociation or enzymatic digestion steps, which liquid biopsy samples such as blood, saliva, urine or cerebrospinal fluid do not. All samples should be acquired and processed promptly to avoid cellular stress, which can cause cell death or alter gene expression profiles.
- Library Preparation: The target DNA or RNA is obtained and enriched using techniques such as Polymerase Chain Reaction (PCR) or hybridization probe capture. In the case of RNA, a reverse transcription step is required to convert it to complementary DNA (cDNA) before sequencing. To minimize data anomalies, impurities are removed from the libraries through a clean-up process.
- Sequencing: Once prepared, the libraries are loaded onto the sequencer and read in parallel. The recommended read length for most sequencers is 350 base pairs, but the final length may vary depending on the library preparation process and the research goals. Once the sequencing is complete, a raw data file containing the reads is generated, which can then be analyzed using bioinformatics software.
- Sequence Alignment and Bioinformatics: Specialized bioinformatics software aligns the reads, mapping each sequence to its location within a reference genome. Subsequently, genetic variations and mutations are identified through variant calling, gene annotation and differential expression analyses. These analyses are crucial for understanding various biological processes.
- Actionable Insights: After the bioinformatics analysis, the researchers or clinical investigators interpret the sequencing data to derive actionable insights based on genomic (DNA) or transcriptomic (RNA) findings. Possible insights include identifying drug targets or optimizing drug candidates.
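The sequencing, alignment and variant-calling steps above can be sketched in a few lines of Python. This is a toy illustration only, not a production pipeline: it assumes the raw data file is in FASTQ format (four lines per read), uses a brute-force mismatch count in place of a real aligner, and calls a variant wherever the majority base in the aligned reads differs from the reference. The reference sequence, the reads and all function names are made up for the example.

```python
from collections import Counter

def parse_fastq(text):
    """Yield (read_id, sequence, quality) tuples from FASTQ text (4 lines per record)."""
    lines = [line.strip() for line in text.strip().splitlines()]
    for i in range(0, len(lines), 4):
        yield lines[i][1:], lines[i + 1], lines[i + 3]

def count_mismatches(a, b):
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def align(read, reference, max_mismatches=1):
    """Return the 0-based offset with the fewest mismatches, or None if
    no offset has at most max_mismatches (brute force, toy aligner)."""
    best_pos, best_mm = None, max_mismatches + 1
    for pos in range(len(reference) - len(read) + 1):
        mm = count_mismatches(read, reference[pos:pos + len(read)])
        if mm < best_mm:
            best_pos, best_mm = pos, mm
    return best_pos

def call_variants(reads, reference, min_depth=2):
    """Pile up aligned reads and report (position, ref_base, alt_base)
    where the majority base differs from the reference."""
    pileup = [Counter() for _ in reference]
    for read in reads:
        pos = align(read, reference)
        if pos is None:
            continue  # unaligned read is ignored
        for offset, base in enumerate(read):
            pileup[pos + offset][base] += 1
    variants = []
    for i, counts in enumerate(pileup):
        if sum(counts.values()) < min_depth:
            continue  # not enough reads cover this position
        base, _ = counts.most_common(1)[0]
        if base != reference[i]:
            variants.append((i, reference[i], base))
    return variants

# Hypothetical 16-base reference and three overlapping 8-base reads from a
# sample that carries a T>A substitution at position 8.
REFERENCE = "ACGTACGGTCAATGCC"
FASTQ = """
@read1
ACGTACGG
+
IIIIIIII
@read2
ACGGACAA
+
IIIIIIII
@read3
ACAATGCC
+
IIIIIIII
"""

reads = [seq for _, seq, _ in parse_fastq(FASTQ)]
variants = call_variants(reads, REFERENCE)
print(variants)  # [(8, 'T', 'A')]
```

Real workflows replace each function with dedicated, validated tools and also take base quality scores into account, which this sketch parses but does not use.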
Biomedical and Industrial Applications of Next-Generation Sequencing
NGS (also called short-read sequencing) is a cost-effective method used for genomic or transcriptomic sequencing with a wide range of applications. Here are a few examples of how NGS has benefited biomedical and industrial users.
- Drug discovery and development: Short-read sequencing has become indispensable for drug discovery and development. It enables scientists to accurately identify genetic or transcriptomic variations that drugs can target, leading to the development of more effective and personalized treatments for diseases.
- Agriculture: Short-read sequencing can detect genetic mutations that may impact crop yield, quality or disease susceptibility. By identifying these mutations, agricultural scientists can develop improved crop varieties that are more productive, sustainable, and resilient to stresses such as drought and disease.
- Food security: Ensuring food safety and preventing outbreaks is a crucial concern worldwide. Short-read sequencing can play a significant role in identifying potential human pathogens in the food supply, enabling authorities to take prompt action to prevent outbreaks. Additionally, it can help food distributors correctly identify species, which helps prevent unscrupulous actors from distributing and profiting from endangered species.
- Manufacturing and textiles: Short-read sequencing is extensively used by synthetic biology firms to identify and modify microbial strains for biomanufacturing. The aim is to produce materials for textiles and manufacturing industries more efficiently and sustainably. These approaches are also being adopted in the pharmaceutical industry to produce biologics.
Next-Generation Sequencing Approaches
NGS approaches can be classified as follows, based on the chemistry involved:
- Pyrosequencing: The technology was introduced in 1993. It works on the principle of detecting free pyrophosphate (PPi) when the enzyme DNA polymerase adds new nucleotides. The released pyrophosphate is detected with a chemiluminescent technique by using the following enzymes:
  - Luciferase isolated from Photinus pyralis (commonly known as American firefly)
  - Recombinant ATP sulfurylase isolated from Saccharomyces cerevisiae (commonly known as Brewer's Yeast)
- Sequencing-by-hybridization: This technique works on the principle of hybridizing the target DNA fragments to a biochip or DNA chip containing short, single-stranded DNA probes. The DNA fragments bind to the probes with complementary sequences, which are used to create a spectrum (a set of all DNA fragments bound to respective probes) to analyze DNA sequences.
- Sequencing-by-synthesis: This technique uses fluorescently labeled dNTPs to identify the sequences in a sample. During the process, the DNA to be sequenced is immobilized on a solid support and a complementary sequence is formed using labeled dNTPs, primers, and an engineered polymerase. The signals generated due to the fluorescent emission of the incorporated nucleotide help determine the sequence of the target DNA fragment.
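The spectrum idea behind sequencing-by-hybridization can be illustrated with a small sketch. It assumes an idealized chip carrying every possible probe of length k, error-free hybridization, and a target in which no (k-1)-base overlap occurs twice; the function names and the example sequence are hypothetical. Under those assumptions the target can be rebuilt by chaining probes that overlap by k-1 bases.

```python
def spectrum(sequence, k):
    """Return the set of all length-k substrings (bound probes) of the sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}

def reconstruct(kmers):
    """Rebuild a sequence by chaining k-mers that overlap by k-1 bases.

    Raises ValueError when the spectrum does not determine a single chain,
    for example when the target contains repeats.
    """
    kmers = set(kmers)
    suffixes = {kmer[1:] for kmer in kmers}
    # The first k-mer is the one whose prefix is not the suffix of any k-mer.
    starts = [kmer for kmer in kmers if kmer[:-1] not in suffixes]
    if len(starts) != 1:
        raise ValueError("no unique starting k-mer")
    by_prefix = {}
    for kmer in kmers:
        by_prefix.setdefault(kmer[:-1], []).append(kmer)
    current = starts[0]
    sequence = current
    used = {current}
    while True:
        candidates = [kmer for kmer in by_prefix.get(current[1:], [])
                      if kmer not in used]
        if not candidates:
            break
        if len(candidates) > 1:
            raise ValueError("ambiguous overlap")
        current = candidates[0]
        used.add(current)
        sequence += current[-1]  # each step extends the sequence by one base
    if used != kmers:
        raise ValueError("some k-mers could not be placed")
    return sequence

target = "ATGCGTAC"
probes = spectrum(target, 3)
print(sorted(probes))       # ['ATG', 'CGT', 'GCG', 'GTA', 'TAC', 'TGC']
print(reconstruct(probes))  # ATGCGTAC
```

A repetitive target such as "ATATAT" yields a spectrum of only two 3-mers and cannot be reconstructed this way, which hints at why repeats are difficult for approaches built on short fragments.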
Challenges and Limitations
Short-read sequencing has become increasingly popular in research and industrial applications because it is fast and cost-effective. However, the main limitation of this technology is the read length. Third-generation sequencing (also called long-read sequencing) has been developed to overcome this limitation. Long-read sequencing can generate reads up to megabase pairs in length.
Shorter reads can lead to uneven coverage of genomic regions, particularly those with repeat sequences or skewed base composition, such as GC- or AT-rich sequences. Furthermore, short-read sequencing can have difficulty accessing certain parts of the genome, such as telomeric regions, making de novo genome assembly challenging. This limitation prompted researchers, notably the Telomere-to-Telomere (T2T) Consortium, to adopt long-read sequencing to finish sequencing the human genome.
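A small sketch can show why repeats are a problem for short reads. It uses exact string matching as a stand-in for alignment, and the reference, the repeat unit and both reads are invented for the example. A read that lies entirely inside a tandem repeat matches at several positions, while a longer read that spans the repeat and reaches unique flanking sequence matches at only one.

```python
def find_all(reference, read):
    """Return every 0-based position where the read matches the reference exactly."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        start = reference.find(read, start + 1)  # allow overlapping matches
    return positions

# Hypothetical reference: an 18-base tandem repeat between unique flanks.
REPEAT = "CAG" * 6
REFERENCE_WITH_REPEAT = "TTGACC" + REPEAT + "GGATCA"

short_read = "CAGCAGCAG"               # 9 bases, entirely inside the repeat
long_read = "GACC" + REPEAT + "GGAT"   # spans the repeat plus both flanks

print(find_all(REFERENCE_WITH_REPEAT, short_read))  # [6, 9, 12, 15]
print(find_all(REFERENCE_WITH_REPEAT, long_read))   # [2]
```

Because the short read fits equally well at four positions, an aligner cannot tell where it came from or how many repeat units the sample actually contains.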
Recent Advancements
Short-read sequencing is a widely used technique in molecular biology and genomics, and developers continue to improve its accuracy, sensitivity, coverage and throughput while lowering the cost of the process. Recent improvements have focused on data acquisition and analysis, with cloud computing and artificial intelligence shortening the time required to analyze data while improving variant calling and annotation through trained models. The increased access to greater parts of the genome through improved library preparation methods has also led to increased data integrity.
In addition, labs that have historically found short-read sequencing cost-prohibitive now have access to these technologies. Nanopore-based sequencing and single-molecule real-time sequencing, which were previously associated with long-read sequencing, are now being applied directly to shorter fragments. These technologies provide accuracy and specificity while allowing longer read lengths and better coverage of complex genomic regions.
FAQs
What is short-read sequencing?
Short-read sequencing is a next-generation sequencing technology that sequences nucleic acid fragments of 50-600 base pairs.
What is the difference between short-read and long-read sequencing?
Short-read and long-read sequencing differ mainly in the length of nucleic acid fragments they can accommodate. Short-read sequencing focuses on fragments that are 50-600 bases long, while long-read sequencing can handle fragments ranging from 1 kilobase (kb) to several megabases. Long-read sequencing provides better access to complex genomic regions, such as telomeric or acrocentric regions, and better coverage of tandem repeat sequences than short-read sequencing can achieve. However, short-read sequencing is more accurate and sensitive, has a higher throughput, and is cost-efficient.
What is short-read RNA sequencing?
Short-read RNA sequencing (RNA-seq) is a next-generation sequencing approach to profile the transcriptome (mRNA). RNA-seq can detect, identify and quantify transcripts in biological samples.
What are sequencing reads?
A sequencing read is the sequence of bases obtained from sequencing a single nucleic acid fragment of interest from a biological sample. A sequencing run produces many such reads.
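The number of reads, together with read length, determines the average sequencing depth (coverage). A common back-of-the-envelope estimate is coverage = number of reads x read length / genome size. The sketch below applies that formula; the function names are hypothetical, and the genome size of about 3.1 billion base pairs and the 150-base read length are illustrative assumptions rather than values from this article.

```python
import math

def mean_coverage(num_reads, read_length, genome_size):
    """Average number of times each base is covered: N * L / G."""
    return num_reads * read_length / genome_size

def reads_needed(target_coverage, read_length, genome_size):
    """Smallest number of reads that reaches the target average coverage."""
    return math.ceil(target_coverage * genome_size / read_length)

GENOME_SIZE = 3_100_000_000  # assumed approximate human genome size, in base pairs
READ_LENGTH = 150            # assumed short-read length, in bases

n = reads_needed(30, READ_LENGTH, GENOME_SIZE)
print(n)                                           # 620000000
print(mean_coverage(n, READ_LENGTH, GENOME_SIZE))  # 30.0
```

This estimate is an average only. As noted in the limitations above, actual coverage is uneven, so some regions receive far fewer reads than the mean suggests.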