De Novo Antibody Sequencing NISTmAb with Post-Translational Modification Detection

Jun 26, 2025 | Science, Services

Protein sequencing has been possible for 75 years with Edman degradation, but over the past decade, mass spectrometry (MS)-based sequencing has become the dominant approach for antibodies.

The general strategy is to digest antibody proteins into short peptides using several distinct proteases. Each protease is selected for its unique cleavage specificity, so that the resulting short peptides can be analyzed and interpreted by MS, and then assembled back into full-length antibody heavy and light chains. In this blog post we show the results of de novo antibody sequencing of NISTmAb, and how the primary sequence information can be enriched by using complementary algorithms to identify post-translational modifications.
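To make the idea of complementary cleavage specificity concrete, here is a minimal in silico digestion sketch. The cleavage rules are deliberately simplified assumptions (cleave C-terminal to the listed residues unless the next residue is proline, with no missed cleavages), and the `digest` helper is a hypothetical name for illustration, not part of any sequencing pipeline.

```python
# Simplified cleavage rules, assumed for illustration only.
# Real proteases show missed cleavages and sequence-context effects.
RULES = {
    "trypsin": {"after": set("KR"), "not_before": {"P"}},
    "chymotrypsin": {"after": set("FWYL"), "not_before": {"P"}},
}


def digest(sequence, enzyme):
    """Split a protein sequence into peptides using one simplified rule."""
    rule = RULES[enzyme]
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else None
        if aa in rule["after"] and next_aa not in rule["not_before"]:
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # trailing peptide without a site
    return peptides


# N-terminal stretch of the NISTmAb light chain shown later in this post.
fragment = "DIQMTQSPSTLSASVGDRVTITCSASSR"
print(digest(fragment, "trypsin"))
print(digest(fragment, "chymotrypsin"))
```

The two enzymes cut the same stretch at different positions, so their peptides overlap, and those overlaps are what make reassembly possible.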

The figure below highlights the key steps in the de novo antibody sequencing process, with the de novo sequencing step involving two informatic components. The first component uses an algorithm to predict a peptide sequence for each mass spectrum. The peptide sequences are fed into a second algorithm to assemble full-length heavy and light chain sequences. We’ve been sequencing monoclonal antibodies for over a decade with Valens. You can read more about our Valens services here.
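The second informatics component, assembly, can be pictured with a toy greedy overlap merge. This is only a sketch of the general idea under simple assumptions (error-free peptides, exact suffix-to-prefix overlaps, a hypothetical `min_overlap` threshold). It is not the Valens assembly algorithm, which is not described in this post.

```python
def overlap(a, b, min_overlap):
    """Length of the longest suffix of a that equals a prefix of b."""
    for n in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(peptides, min_overlap=3):
    """Repeatedly merge the pair of contigs with the longest overlap."""
    contigs = list(dict.fromkeys(peptides))  # drop exact duplicates
    # Drop peptides fully contained in another peptide.
    contigs = [p for p in contigs
               if not any(p != q and p in q for q in contigs)]
    while True:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_overlap)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]


# Three overlapping peptides, as might come from different digests.
peptides = ["DIQMTQSPSTL", "SPSTLSASVGDR", "SVGDRVTITC"]
print(greedy_assemble(peptides))
```

Real data adds sequencing errors, isobaric residues such as I/L, and repeated regions, so production assemblers need scoring and error tolerance well beyond this sketch.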

De novo monoclonal antibody sequencing process

Database Search vs De Novo Peptide Identification

Most MS-based proteomics analysis is performed with software that scans a protein database for peptides matching mass spectra, an approach called database search. In contrast, antibody sequencing requires an alternative approach called de novo peptide sequencing, which determines peptide sequences from mass spectra without the aid of a database. Database search has been preferred over de novo sequencing for its speed and sensitivity. However, because a reliable antibody database is unavailable, good de novo peptide sequencing of mass spectra is required to assemble an accurate antibody sequence. In this post, we will show a common limitation of de novo peptide sequencing, and how database search can be used to help recover antibody sequence coverage.

To help understand the difference between de novo and database search peptide ids, let’s start with an analogy. Suppose we play a game of identifying an animal from descriptor words. For example, I could say “has fur” and “purrs”, and you would guess “tiger”, “lion”, “leopard”, etc. Any of the guesses could be correct, since the description is not specific enough. If I gave a context clue, like “lives in my house”, you would confidently narrow the guess to the right animal, a “house cat”. This is essentially the difference between peptide id by de novo and by database search. Fragmentation within a mass spectrum (akin to descriptors) needs to be rich enough to support each amino acid residue in a de novo peptide interpretation. On the other hand, peptides sourced from a database (akin to context) can be confidently matched to a mass spectrum with weaker fragmentation (akin to only a few descriptors).
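The same point can be shown with fragment masses. In the sketch below, a toy spectrum is missing one fragment peak, which leaves two residue orderings equally consistent with the evidence. A database that contains only one of them settles the question. The spectrum, the tolerance, and the helper names are all assumptions for illustration. Only singly charged b-ions are modeled, and the one-entry "database" is hypothetical.

```python
# Monoisotopic residue masses (Da) for the residues used in this example.
MONO = {"G": 57.02146, "V": 99.06841, "D": 115.02694, "R": 156.10111}
PROTON = 1.007276


def b_ions(peptide):
    """Singly charged b-ion m/z values (b1 .. b(n-1))."""
    total, ions = 0.0, []
    for aa in peptide[:-1]:
        total += MONO[aa]
        ions.append(total + PROTON)
    return ions


def explains_all_peaks(peptide, peaks, tol=0.01):
    """True if every observed peak matches some theoretical b-ion."""
    theoretical = b_ions(peptide)
    return all(any(abs(p - t) <= tol for t in theoretical) for p in peaks)


# Toy spectrum: b-ions of VGDR with the b1 peak missing (weak fragmentation).
observed = b_ions("VGDR")[1:]

# De novo view: both residue orders fit the remaining peaks.
candidates = ["VGDR", "GVDR"]
print([c for c in candidates if explains_all_peaks(c, observed)])

# Database view: only sequences present in the database are considered.
database = ["VGDR"]
print([c for c in database if explains_all_peaks(c, observed)])
```

The missing b1 peak is the "descriptor" that would have distinguished V-G from G-V; the database supplies that context instead.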

De Novo and Database Search for NISTmAb Analysis

We can demonstrate the difference using the standard control antibody material NISTmAb RM 8671 (NIST Monoclonal Antibody Reference Material 8671) with our Valens process. The antibody was digested with multiple enzymes (chymotrypsin, elastase, pepsin, and trypsin), and analyzed on a ThermoFisher Eclipse Orbitrap instrument. Our proprietary algorithms were used for de novo peptide id, and MSGFPlus was used for database search id.

The chart below highlights the sensitivity gains in peptide-spectrum matches of database search over de novo across the four digests. De novo peptide id enables near full coverage of both heavy and light chains, with 99% of residues covered by 11 de novo peptide-spectrum matches. Database search recruits additional ids, giving 2.9x and 4.0x higher peptide-spectrum match coverage per residue for the heavy and light chains, respectively. The sensitivity gains will depend on the MS data acquisition method selected and, more importantly, on the quality of the de novo id software.
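For readers who want to see what "peptide-spectrum match coverage per residue" means in practice, here is a small sketch that counts how many PSM peptides cover each residue of a chain. The peptides below are toy inputs, not NISTmAb results, and the helper names are hypothetical. Only exact string matches are counted, so I/L ambiguity and modified residues are ignored.

```python
def per_residue_depth(chain, psm_peptides):
    """Number of PSM peptides covering each residue of the chain."""
    depth = [0] * len(chain)
    for peptide in psm_peptides:
        start = chain.find(peptide)
        while start != -1:  # count every exact occurrence
            for i in range(start, start + len(peptide)):
                depth[i] += 1
            start = chain.find(peptide, start + 1)
    return depth


def fraction_covered(depth):
    """Fraction of residues covered by at least one PSM."""
    return sum(d > 0 for d in depth) / len(depth)


chain = "DIQMTQSPSTL"
de_novo_psms = ["DIQMTQ", "TQSPS"]                 # toy de novo ids
db_search_psms = de_novo_psms + ["SPSTL", "QMTQSP"]  # toy database ids

de_novo_depth = per_residue_depth(chain, de_novo_psms)
db_depth = per_residue_depth(chain, db_search_psms)

print(de_novo_depth, fraction_covered(de_novo_depth))
print(db_depth, fraction_covered(db_depth))
# Ratio of mean per-residue depth, database search over de novo.
print(sum(db_depth) / sum(de_novo_depth))
```

In this toy case the extra database ids both close the gap at the end of the chain and deepen the support everywhere else, which is the pattern the chart summarizes.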

Comparison of de novo peptide identification versus database search

De novo peptide ids supporting NISTmAb CDR-H3. The top 10 de novo peptide spectrum matches for pepsin are shown in the top track. Subsequent tracks show total peptide spectral coverage per residue for multiple enzymes identified by de novo and database search.

Detecting Post-Translational Modifications: NISTmAb Glycosylation

Standard controls are essential for research and development, and NISTmAb is derived from the therapeutic anti-RSV humanized mouse antibody, motavizumab. While highly pure, the material is not a single molecule! The intact monoclonal antibody tetramer has many isotopologues. Additionally, human IgG1-class antibodies have a single N-linked glycosylation site in the constant region, which introduces considerable variability.

The sequences of the heavy and light chains are shown below, with the canonical N-linked glycosylation site in the heavy chain marked as <N>.

>NISTmAb HC (motavizumab)
QVTLRESGPALVKPTQTLTLTCTFSGFSLSTAGMSVGWIRQPPGKALEWLADIWWDDKKHYNPSLKDRLTISKDTSKNQVVLKVTNMDPADTATYYCARDMIFNFYFDVWGQGTTVTVSSASTKGPSVFPLAPSSKSTSGGTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQSSGLYSLSSVVTVPSSSLGTQTYICNVNHKPSNTKVDKRVEPKSCDKTHTCPPCPAPELLGGPSVFLFPPKPKDTLMISRTPEVTCVVVDVSHEDPEVKFNWYVDGVEVHNAKTKPREEQY<N>STYRVVSVLTVLHQDWLNGKEYKCKVSNKALPAPIEKTISKAKGQPREPQVYTLPPSREEMTKNQVSLTCLVKGFYPSDIAVEWESNGQPENNYKTTPPVLDSDGSFFLYSKLTVDKSRWQQGNVFSCSVMHEALHNHYTQKSLSLSPGK

>NISTmAb LC (motavizumab)
DIQMTQSPSTLSASVGDRVTITCSASSRVGYMHWYQQKPGKAPKLLIYDTSKLASGVPSRFSGSGSGTEFTLTISSLQPDDFATYYCFQGSGYPFTFGGGTKVEIKRTVAAPSVFIFPPSDEQLKSGTASVVCLLNNFYPREAKVQWKVDNALQSGNSQESVTEQDSKDSTYSLSSTLTLSKADYEKHKVYACEVTHQGLSSPVTKSFNRGEC

Glycosylation is a biological post-translational modification, and a number of specialized computational tools have been developed to identify glycans and glycopeptides via mass spectrometry. De novo peptide id tools limit sequencing to the 20 natural amino acid residues and a few variable modifications, such as oxidation, to ensure high accuracy.

The top 4 N-linked glycan structures represent >85% of NISTmAb glycosylation. Below each structure are its monoisotopic mass in daltons and the chemical formula that is added to asparagine.
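The mass a glycan adds to asparagine can be computed from its monosaccharide composition, since each sugar contributes a fixed residue mass. The sketch below does this for three common IgG1 glycoforms (G0F, G1F, G2F). Using these three as examples is our assumption, because the text above does not name the four structures in the figure. The function and dictionary names are hypothetical.

```python
# Monoisotopic residue masses (Da) of monosaccharides, i.e. the mass each
# sugar adds once linked (free sugar minus one water).
SUGAR_MASS = {
    "HexNAc": 203.079373,  # N-acetylhexosamine, e.g. GlcNAc
    "Hex": 162.052824,     # hexose, e.g. mannose or galactose
    "Fuc": 146.057909,     # fucose (deoxyhexose)
}

# Compositions of common IgG1 N-glycans (assumed examples, see lead-in).
GLYCANS = {
    "G0F": {"HexNAc": 4, "Hex": 3, "Fuc": 1},
    "G1F": {"HexNAc": 4, "Hex": 4, "Fuc": 1},
    "G2F": {"HexNAc": 4, "Hex": 5, "Fuc": 1},
}


def glycan_mass(composition):
    """Monoisotopic mass added to Asn by a glycan of this composition."""
    return sum(SUGAR_MASS[sugar] * n for sugar, n in composition.items())


for name, composition in GLYCANS.items():
    print(f"{name}: {glycan_mass(composition):.4f} Da")
```

Masses computed this way are the kind of value a database search tool needs when a glycan is supplied as a variable modification on asparagine.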

When variable modifications such as glycosylation are not permitted in de novo peptide sequencing, sequence coverage is lost for any residues in the sample that are modified. Using the NISTmAb dataset from above, we can see that de novo peptide coverage is lost around the canonical N-linked glycosylation site in the heavy chain constant region (residue N84.4 of CH2 in IMGT nomenclature).

However, an accurate database search tool is sufficient to provide peptide evidence for the sequence in such cases. With MSGFPlus, we can add common glycosylation masses for asparagine residues (N/Asn) to the list of variable modifications. On the NISTmAb dataset, 4 glycan variants were observed at the canonical glycosylation site, each with 23 to 66 glycopeptide-spectrum matches. Importantly, only 3 out of 23 other Asn sites were reported to have glycopeptide-spectrum matches. These additional ids could all be filtered out because they lack the Asn-X-Ser/Thr sequon, which is necessary for oligosaccharyltransferase to attach an oligosaccharide to the protein. The three sites can also be filtered out based on mass spectrometry criteria alone. Two of the false glycosylation sites had only a single supporting glycopeptide-spectrum match, which is very weak support. The third false glycosylation site is adjacent to the true glycosylation site, and its ten or fewer false glycopeptide-spectrum matches came from peptides that encompassed the true glycosylation site. Localizing the exact modification site is a known challenge in post-translational modification search; however, MSGFPlus did report the correct site for a majority of the glycopeptide-spectrum matches (169 correct ids).
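The sequon filter described above is easy to sketch. The snippet below finds Asn-X-Ser/Thr positions with a regular expression and keeps only reported sites that fall on one. Excluding proline at the X position is the commonly stated form of the rule and is an assumption here, since the text above only says Asn-X-Ser/Thr. The helper names and the reported site list are hypothetical.

```python
import re

# N-glycosylation sequon: Asn, any residue except Pro (assumed), Ser or Thr.
# The lookahead lets overlapping sequons be found.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_sites(sequence):
    """0-based positions of Asn residues that start a sequon."""
    return [m.start() for m in SEQUON.finditer(sequence)]


def filter_reported_sites(sequence, reported_sites):
    """Keep only reported glycosylation sites that sit on a sequon."""
    valid = set(sequon_sites(sequence))
    return [site for site in reported_sites if site in valid]


# Stretch of the NISTmAb heavy chain around the canonical site
# (the <N> marker from the sequence listing is written as a plain N here).
segment = "TKPREEQYNSTYRVVSVLTVLHQDWLNGKEYK"

# Hypothetical search output: both Asn residues reported as glycosylated.
reported = [i for i, aa in enumerate(segment) if aa == "N"]

print(sequon_sites(segment))
print(filter_reported_sites(segment, reported))
```

Only the Asn in N-S-T survives the filter; the Asn in N-G-K does not, which mirrors how the off-sequon ids were discarded in the NISTmAb analysis.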

De novo and database search results for the canonical N-glycan site on NISTmAb

Low de novo peptide id coverage over N-linked glycosylation site of NISTmAb that is recovered by database search using variable glycosylation mods.

See this paper for a much deeper dive into the glycan variation of NISTmAb, based on a more comprehensive interlaboratory study.

Conclusion

Monoclonal antibody sequencing by mass spectrometry is achievable by de novo peptide id and assembly, as shown here on the standard NISTmAb material. Antibody purity is important, and unknown sequence modifications can be a challenge for de novo sequencing. We show that database search methods can be used to increase sensitivity and to fully utilize mass spectrometry data, recovering full antibody sequence support in the case of N-linked glycosylation.
