What is Protein Sequencing?
Protein sequencing is an essential workflow in biology and medicine, and it enables the analysis of amino acid sequences, confirmation along with protein structure and function, the identification of new protein biomarkers and drug targets, the construction of phylogenetic trees to understand evolutionary relationships, and the identification of orthologous and paralogous proteins. It aids in extrapolating the functions of unknown orthologous proteins, facilitating cDNA library screening, and supporting protein engineering studies.
Protein sequencing is the process of examining the covalent structure and amino acid sequence of a fully developed polypeptide, which is considered a component of post-translational modifications. The structure of a protein is intricately linked to the sequence of amino acids in that protein.
Basics of Protein Sequencing and Analysis
Protein sequencing can be categorized into three main methods: investigating the N-terminus, exploring the C-terminus (limited methods, typically involving carboxypeptidases), and breaking down polypeptides into peptides.
Studying the structure and function of proteins is crucial as they are the fundamental actors in the molecular drama of life, orchestrating biological functions at a nanoscale level and responsible for maintaining life, replication, defense, and reproduction. These versatile molecules, composed of twenty natural amino acids, underpin all cellular processes and hold the key to understanding diseases, evolution, and the intricacies of life on our planet.
Protein Sequencing Methods
Classical protein sequencing methods like the Edman degradation technique systematically eliminate amino acid residues from a peptide's amino end, preserving protein integrity. In contrast, next-generation protein sequencing (NGPS) utilizes advanced technologies to comprehensively identify all protein species in a biological organism, offering unprecedented precision and potential applications in disease research and immunotherapy.
Edman Degradation Method
Edman degradation is a technique for purifying proteins by systematically eliminating individual amino acid residues from the peptide's amino end. Pehr Edman devised a solution to the problem of potential protein damage during hydrolysis. Instead of using harsh conditions, Edman introduced a novel labeling and cleaving approach by adding Phenyl isothiocyanate, a phenylthiocarbamoyl derivative form with the N-terminal. The cleavage of the N-terminal occurs under milder acidic conditions, resulting in a cyclic compound known as phenylthiohydantoin (PTH)-amino acid. This process preserves the protein integrity, leaving behind two peptide sequence components. The procedure can be iterated for each residue, allowing for the sequential separation of individual residues without compromising the overall protein structure.
Protein sequencing via Edman’s method faces challenges in sequencing larger proteins due to its less-than-optimal efficiency. To overcome this limitation, a divide-and-conquer strategy is employed, breaking down the larger protein into more manageable amino acid segments. This involves the use of specific chemicals or enzymes capable of cleaving the protein at predetermined amino acid residues. The resulting smaller peptides can be isolated through chromatography and subsequently subjected to Edman sequencing, as their reduced size allows for a more effective and accurate determination of their amino acid sequence.
Mass Spectrometry-based Techniques
Mass spectrometry is now preferred for determining protein mass and structure. It involves ionization, acceleration of charged particles into a mass analyzer, and detection based on deflection proportional to the mass-to-charge ratio (m/z). Sample introduction methods include diffusion, liquid injection, or laser desorption for large proteins. LCMS, coupling HPLC with mass spectrometry, is used for analyzing complex mixtures.
LC-MS, a common technique now, employs electrospray ionization (ESI) for a reliable interface. It's versatile for various biological molecules, and tandem MS and stable isotopes enhance sensitivity and accuracy. Despite the need for method optimization to reduce ion suppression, fast scanning enables multiplexing, measuring numerous compounds in one run. As instruments become more affordable, LC-MS is gaining importance in clinical biochemistry, competing with traditional methods like liquid chromatography and immunoassay.
Quadrupole mass analyzers, commonly used in tandem mass spectrometry (MS/MS), select ions based on varying m/z ratios and employ collision-induced dissociation (CID) to fragment selected ions in the ion trap, allowing the determination of peptide/protein sequences. This technique typically involves two mass analyzers separated by a collision cell, but it can also be accomplished in a single mass analyzer using a quadrupole ion trap.
Mass spectrometry provides clinical laboratories with benefits, such as specificity, sensitivity, high throughput, and cost-effective testing. However, challenges arise, particularly due to the absence of standardization in therapeutic drug monitoring assays and concerns about potential additional FDA regulations on laboratory-developed tests (LDTs).
Featured Product

Echo® MS+ Mass Spectrometer System
A pioneering combination of acoustic droplet ejection (ADE) and the open port interface (OPI), along with the quantitative power of the SCIEX Triple Quad 6500+ mass spectrometer, this ultra-high sample throughput system delivers speed, scale and reproducibility with unyielding data quality.
Next-generation Sequencing Techniques
Next-generation protein sequencing (NGPS) revolutionizes the comprehensive identification of all protein species within a single biological organism. It represents a technological initiative aimed at extracting proteomic data with precision. NGPS enables the identification of proteins present in low abundance, including those undergoing post-translational modifications.
NGS has demonstrated notable benefits in terms of speed, throughput, and accuracy compared to conventional methods. Its utilization in clinical microbiology has expanded the detection and characterization of microbes by circumventing reliance on traditional culture techniques and effectively identifying non-culturable microorganisms.
Next-generation protein sequencing (NGPS) holds promise for revolutionizing disease research, immunotherapy, and vaccine development by precisely identifying protein sequences and enabling the study of antibody variability. Despite technological challenges, ongoing efforts to integrate diverse techniques, such as mass spectrometry and nanopore technologies, aim to establish NGPS as a comprehensive tool for advancing personalized medicine and biotechnological applications.
See how Danaher Life Sciences can help
Challenges and Limitations in Protein Sequencing
Post-translational modifications (PTMs) involve the alteration of amino acid side chains in certain proteins after their synthesis, serving as essential molecular regulatory mechanisms that govern a variety of cellular processes. These modifications, which include proteolytic cleavage and the addition of modifying groups like acetyl, phosphoryl, glycosyl, and methyl to one or more amino acids, play a pivotal role in numerous biological processes, substantially influencing the structure and dynamics of proteins.
The intricacy of protein sequences, particularly those with significant disorder or featuring segments of low complexity, presents challenges in ensuring a consistent charge for efficient sequencing via mass spectrometry. The heterogeneity and intricacy of protein sequences also create computational challenges, making the development and maintenance of protein sequence alignment algorithms more demanding, particularly given the heightened complexity in addressing long-range interactions and combinatorial challenges in sampling β‐sheet proteins.
Sequencing membrane proteins poses numerous challenges attributable to their hydrophobic characteristics, complicating processes such as overexpression, purification, and crystallization. Conventional protein sequencing techniques like NMR or x-ray diffraction face constraints in elucidating the structure of membrane proteins.
Recent Advancements and Future Prospects
Bioinformatics plays a significant role in proteomics, as well as genomics. RFdiffusion, within the RoseTTAFold framework, employs guided diffusion for the generation of novel protein structures, while Protein MPNN serves as a robust tool for designing protein sequences. RoseTTAFold utilizes deep learning to rapidly and precisely predict protein structures solely based on amino acid sequences. The Rosetta software suite encompasses computational modeling and analysis algorithms for protein structures. Foldit, a free desktop game, empowers individuals to contribute to protein design by leveraging their creativity and ingenuity, offering a collaborative platform where both computers and researchers may overlook important aspects.
Long-read sequencing enables the measurement of gene network interactions across chromosomes in unprecedented ways. The application of long reads has rectified numerous errors in the genome assemblies of various animal species, particularly in addressing false gene gains and losses that were prevalent in assemblies based on short-read sequencing.High-throughput automated imaging systems have played a crucial role in supporting clonally-derived cell line development for the production of therapeutic proteins, ensuring a standardized and consistent manufacturing process.
Utilizing sequencing methods for patients undergoing treatment enables the detection of emerging mutations, while genomic characterization of the primary disease identifies molecular pathways that can propose personalized biomarkers for monitoring the progression of the patient's condition.
FAQs
What is protein sequencing?
Protein sequencing is the process of determining the amino acid order in a protein chain, revealing its primary structure. Single-molecule protein sequencing is an advanced technique that involves analyzing single amino acid or protein molecules, enabling high-resolution insights into complex biological systems. Protein sequencing has applications in understanding disease mechanisms, drug development, and personalized medicine.
What is the difference between DNA sequencing and protein sequencing?
DNA sequencing involves determining the nucleotide sequence in a DNA molecule, while protein sequencing involves identifying the amino acid sequence in a protein chain.
How is protein sequence analysis performed?
Protein sequence analysis involves techniques such as mass spectrometry, Edman degradation, and more recently, advanced protein sequencing technologies like single-molecule protein sequencing, which precisely determine the order of amino acids in a protein chain.
What is the sequence of events in protein synthesis?
Protein synthesis unfolds in two main stages: transcription and translation. In transcription, the DNA code is transcribed into mRNA in the cell nucleus. This mRNA then migrates to the cytoplasm, where translation takes place. During translation, the ribosome reads the mRNA code and with the help of tRNA molecules, assembles a corresponding sequence of amino acids, ultimately forming a functional protein.
Understanding this sequence is fundamental to comprehending how genetic information is transformed into the intricate structures and functions of proteins within living cells.
Why is protein sequence analysis important?
Protein sequence analysis is crucial for understanding the structure, function, and evolutionary relationships of proteins, providing insights into biological processes and facilitating advancements in fields such as medicine and biotechnology.
See how Danaher Life Sciences can help
recent-articles