Long-read sequencing – Historical perspective and overview

Genome sequencing has transformed the diagnosis and treatment of serious diseases by making it easier to detect and understand their underlying genetic causes. Sanger sequencing and next-generation sequencing (NGS, also known as short-read sequencing), including whole-genome sequencing, have been widely used by researchers and clinicians.

The advent of next-generation sequencing technologies proved revolutionary for human genetics and genomics. These technologies have enabled researchers to explore genetic variation and to detect and diagnose clinically significant mutations (germline or somatic). However, they read only short DNA or RNA fragments, typically a few hundred bases long.

While these approaches provide valuable insights, they do not always capture the whole picture, and long-read sequencing has helped fill in the gaps they leave. Long-read sequencing is considered a third-generation sequencing technology. The idea of nanopore sequencing was proposed in the late 1980s, although commercial long-read instruments did not become available until the 2010s. Long-read sequencing can analyze DNA fragments from tens of kilobases up to several megabases in length, which short-read approaches cannot sequence in one piece. Its advantages over short-read sequencing include access to telomeric and acrocentric regions of the genome, identification of full-length transcript isoforms and more complete coverage of repetitive regions.

In addition to nanopore sequencing, commercialized by Oxford Nanopore, long reads can be generated with single-molecule real-time (SMRT) sequencing (Pacific Biosciences) or synthetically, using computational methods that stitch together short reads derived from the same long DNA molecule. Because synthetic long reads are built from short reads, they share some of the limitations of short-read data.

Advantages of long-read sequencing

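Researchers and clinicians value long-read sequencing because it can span repetitive and structurally complex regions such as telomeres, resolve structural variants and full-length transcript isoforms, and detect base modifications such as methylation.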

Long-read sequencing applications

Long-read sequencing is one of the most advanced sequencing approaches available today, and its adoption is growing in many fields because of the information it provides.

De novo genome assembly and reference genome construction were among the first applications of long-read sequencing. These applications produce the reference genomes that researchers use to compare and study an organism’s genomic landscape. Reconstructing genomes from short reads is difficult because reads shorter than a repeat cannot be placed unambiguously, which leaves gaps and misassemblies in repetitive sequence. Long reads that span these repeats addressed this challenge and enabled the Telomere-to-Telomere (T2T) Consortium to publish the first gap-free human genome sequence in 2022, completing regions that the Human Genome Project had left unresolved.
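Why read length matters for assembly can be illustrated with a toy check: a repeat can only be resolved unambiguously if at least one read covers the whole repeat plus some unique flanking sequence on each side. The minimal sketch below counts such spanning reads; the repeat coordinates, read intervals and the `min_anchor` value are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), 0-based, half-open, in reference coordinates


def spanning_reads(reads: List[Interval], repeat_start: int, repeat_end: int,
                   min_anchor: int = 500) -> List[Interval]:
    """Return the reads that cover the repeat [repeat_start, repeat_end) plus at
    least `min_anchor` bases of unique flanking sequence on both sides.
    `min_anchor` is a hypothetical parameter, not a standard assembler setting."""
    return [
        (start, end)
        for start, end in reads
        if start <= repeat_start - min_anchor and end >= repeat_end + min_anchor
    ]


# A hypothetical 6 kb repeat at positions 10,000-16,000.
repeat = (10_000, 16_000)
short_reads = [(9_800, 9_950), (10_100, 10_250), (15_900, 16_050)]  # ~150 bp reads
long_reads = [(8_000, 20_000), (9_000, 15_000)]                      # ~12 kb and ~6 kb reads

print(len(spanning_reads(short_reads, *repeat)))
print(spanning_reads(long_reads, *repeat))
```

None of the short reads can span the repeat, and only the 12 kb read is long enough to anchor in unique sequence on both sides; the 6 kb read, although as long as the repeat itself, ends inside it.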

Long-read sequencing has become essential for detecting DNA structural variants (SVs) and has significantly advanced knowledge of deletions, duplications, inversions and translocations, and of their association with disease. Identifying SVs is especially valuable in cancer and autism research. Rare disease diagnosis can also benefit, because diagnostic delays are often attributed to a lack of genetic information or limited access to testing.
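One simple signal of a structural variant in long-read data is a large gap or insertion inside a single read's alignment. The sketch below scans a CIGAR string, as defined in the SAM format specification, and reports insertions and deletions of at least 50 bases, a size cutoff commonly used to separate SVs from small indels. The CIGAR string and coordinates are made up, and dedicated SV callers also use split alignments and evidence pooled across many reads, so treat this as an illustration rather than a variant caller.

```python
import re
from typing import List, Tuple

# SAM CIGAR operations: M, I, D, N, S, H, P, = and X.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference sequence.
REF_CONSUMING = set("MDN=X")


def large_indels(cigar: str, ref_start: int, min_len: int = 50) -> List[Tuple[str, int, int]]:
    """Report insertions ("INS") and deletions ("DEL") of at least `min_len`
    bases in one alignment as (type, reference_position, length)."""
    events = []
    ref_pos = ref_start
    for length_str, op in CIGAR_RE.findall(cigar):
        length = int(length_str)
        if op == "D" and length >= min_len:
            events.append(("DEL", ref_pos, length))
        elif op == "I" and length >= min_len:
            events.append(("INS", ref_pos, length))
        if op in REF_CONSUMING:
            ref_pos += length
    return events


# Hypothetical long-read alignment starting at reference position 100,000.
print(large_indels("5000M2300D4000M120I3000M", ref_start=100_000))
```

A short read of 150 bases could never contain the 2.3 kb deletion in this example within its own alignment, which is one reason long reads make SVs easier to see.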

Cell line authentication is another application of long-read sequencing. Biopharmaceutical companies rely on it to confirm the identity of the cell lines they use for biological production. Sequencing can also help identify unknown species, including those that may cause human infections.

See how Danaher Life Sciences can help


Long-read sequencing challenges

Despite its benefits, long-read sequencing has several limitations that can hinder its wider use. For example, long reads have historically had lower per-read accuracy than short reads, although newer approaches such as PacBio HiFi (circular consensus) reads have narrowed the gap.
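Per-read accuracy is usually expressed on the Phred scale, Q = -10 log10(error rate), so Q20 corresponds to 99% accuracy and Q30 to 99.9%. The sketch below converts an accuracy into a Q score and estimates how many erroneous bases a 10 kb read would contain; the accuracy values are illustrative and are not measurements of any particular platform.

```python
import math


def phred_from_accuracy(accuracy: float) -> float:
    """Convert per-base accuracy (strictly between 0 and 1) to a Phred score,
    Q = -10 * log10(error_rate)."""
    if not 0 < accuracy < 1:
        raise ValueError("accuracy must be strictly between 0 and 1")
    return -10 * math.log10(1 - accuracy)


def expected_errors(read_length: int, accuracy: float) -> float:
    """Expected number of erroneous bases in a read of the given length."""
    return read_length * (1 - accuracy)


# Illustrative accuracies only.
for acc in (0.95, 0.99, 0.999):
    q = phred_from_accuracy(acc)
    errs = expected_errors(10_000, acc)
    print(f"accuracy {acc:.3f} -> Q{q:.0f}, ~{errs:.0f} errors per 10 kb read")
```

Because errors scale with read length, even a small gap in per-base accuracy adds up to many more errors per read for long reads, which is why consensus-based methods matter.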

One major challenge is accurately resolving haplotypes, that is, separating the maternal and paternal copies of each chromosome, when assembling complete end-to-end genomes of diploid organisms such as humans.
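Long reads help with haplotype resolution because a single read often covers several heterozygous sites and therefore shows which alleles occur together on the same chromosome copy. The toy sketch below, which is not how production phasing tools work, alternates between assigning each read to the haplotype it agrees with and re-estimating that haplotype by majority vote. It assumes error-free alleles at known heterozygous sites (0 = reference, 1 = alternate) and a strictly diploid genome in which the second haplotype is the complement of the first; the positions are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Heterozygous SNP position -> allele observed in one read (0 = ref, 1 = alt).
Read = Dict[int, int]


def phase_reads(reads: List[Read], n_iter: int = 5) -> Dict[int, int]:
    """Toy read-based phasing. Returns haplotype 1 as position -> allele;
    haplotype 2 is assumed to carry the opposite allele at every site."""
    if not reads:
        raise ValueError("need at least one read")
    # Seed haplotype 1 with the alleles observed in the first read.
    hap1: Dict[int, int] = dict(reads[0])
    for _ in range(n_iter):
        votes: Dict[int, List[int]] = defaultdict(lambda: [0, 0])
        for read in reads:
            # Positive score: the read agrees with haplotype 1 at more shared sites.
            score = sum(1 if hap1[pos] == allele else -1
                        for pos, allele in read.items() if pos in hap1)
            for pos, allele in read.items():
                # Reads on haplotype 2 vote for the complement of what they carry.
                hap1_allele = allele if score >= 0 else 1 - allele
                votes[pos][hap1_allele] += 1
        # Majority vote per site (ties go to the reference allele).
        hap1 = {pos: int(counts[1] > counts[0]) for pos, counts in votes.items()}
    return hap1


# Hypothetical reads drawn from two haplotypes: 0101 and its complement 1010.
reads = [{100: 0, 200: 1, 300: 0}, {200: 0, 300: 1, 400: 0},
         {300: 0, 400: 1}, {100: 1, 200: 0}]
print(phase_reads(reads))
```

With short reads, most reads would cover at most one heterozygous site, giving the scoring step nothing to link; that is the intuition behind the advantage of long reads here.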

Preparing samples for long-read sequencing is also challenging. High-molecular-weight nucleic acids are delicate and prone to fragmentation during extraction and handling, and poor sample preparation directly reduces the quality of the resulting data. Depending on the chosen approach, unintended consequences can include longer preparation times and lower recovery rates.

Although long-read sequencing may offer advantages over short-read sequencing, such as generating more complete and contiguous genome assemblies, it is also essential to consider the potential cost implications. It's worth noting that the cost of long-read sequencing has decreased considerably since its inception, and many researchers are finding that the additional insights gained from long-read sequencing are well worth the investment. However, short-read sequencing may still be the more feasible option for some projects with limited budgets. Ultimately, the choice of sequencing technology will depend on factors such as the research question, the available resources and the desired level of accuracy and resolution.

Future directions and advances in long-read sequencing

The future of long-read sequencing looks promising. As technology providers continue to improve its accuracy and efficiency, the technology will continue to be adopted in fields such as medicine, agriculture and environmental science. New applications are also being explored, such as sequencing entire chromosomes end to end, which could one day reduce the need for conventional karyotyping.

Users and technology developers aim to bring the per-genome cost on par with short-read sequencing, and recent advances in sequencing and data analysis technologies are moving toward that goal. Automating more of the workflow can improve reproducibility and read accuracy while reducing hands-on staff time and training requirements. As the cost of long-read sequencing continues to fall, it should become more accessible to researchers worldwide.

With the increasing use of long-read sequencing technology, there is a growing demand for advanced bioinformatics tools to analyze complex data. Robust analytic pipelines will deliver more comprehensive and precise genomic and transcriptomic data. However, to take full advantage of their capabilities, it is necessary to continually refine and benchmark tools specifically tailored to the challenges and opportunities presented by this technique.

FAQs

What is long-read sequencing?

Long-read sequencing is a third-generation sequencing technology that produces reads from roughly 10 kilobases (kb) to several megabases (Mb) in length.

What is the difference between long-read and short-read sequencing?

Long-read sequencing can generate reads several megabases in length. By contrast, short-read sequencing generates reads of only about 50 to 1,000 bases.

What is read length in DNA sequencing?

Read length is the number of bases (or base pairs) determined from a single DNA fragment in one sequencing read.
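Because a single run yields reads of many different lengths, read length is commonly summarized with the N50: the length of the shortest read among the longest reads that together contain at least half of all sequenced bases. A minimal sketch of the calculation, using hypothetical read lengths:

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of read (or contig) lengths."""
    sorted_lengths = sorted(lengths, reverse=True)
    if not sorted_lengths:
        raise ValueError("no lengths given")
    half = sum(sorted_lengths) / 2
    running = 0
    # Accumulate from the longest read down until half of all bases are covered.
    for length in sorted_lengths:
        running += length
        if running >= half:
            return length
    return sorted_lengths[-1]  # not reached for non-empty input


# Hypothetical read lengths in bases from one run.
read_lengths = [1_200, 8_000, 15_000, 22_000, 45_000, 3_500]
print(n50(read_lengths))
```

The same function applies to assembled contigs, where a higher contig N50 indicates a more contiguous assembly.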

What are the advantages of long-read sequencing over short-read sequencing?

Long-read sequencing has several advantages over short-read sequencing, such as:

  • Production of longer reads, often stretching to megabases

  • Accessibility of complex genomic structures, such as telomeric regions

  • More uniform coverage of regions with high GC content

  • Detection of base modifications, such as methylation

What are the advantages of long reads for structural variation analysis?

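Long reads allow the detection of complex structural variations, such as large deletions, duplications, inversions and other chromosomal rearrangements, because a single read can span an entire variant, or at least its breakpoints, along with unique flanking sequence.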
