De Novo Antibody Sequencing: Which Enzymes are Best?
De novo antibody sequencing is the process of determining the primary amino acid sequence of an antibody directly from the protein. Determining the sequence of an antibody is a requirement for many downstream activities.
- Intellectual Property Protection: In order to patent an antibody, the primary sequence is needed.
- Antibody Optimization & Engineering: In order to optimize the antibody, the primary sequence acts as a starting point for affinity maturation, humanization, or isotype switching.
- Recombinant Production: Antibody genes can be synthesized and expressed heterologously in cells. Producing antibodies recombinantly ensures batch-to-batch consistency and an endless supply.
We summarized the different methods for sequencing an antibody in our resource page.
For de novo antibody sequencing, mass spectrometry is the method of choice. Mass spectrometry is high-throughput, highly accurate, and can accommodate post-translational modifications that may occur on antibodies. At Abterra Bio, we’ve been sequencing antibodies using mass spectrometry for over a decade, and we summarized the different approaches (including our Valens technology) in a blog post.
The most robust method for sequencing antibodies with mass spectrometry is to digest the antibody into shorter peptides that are more amenable to analysis. A key to successful de novo antibody sequencing is obtaining complete peptide coverage of the sequence, particularly the CDR3. Otherwise, the recovered sequence will be incomplete or inaccurate. Selecting enzymes for digestion is therefore an important step, but because of differences in sequence composition, some enzymes are better suited to certain antibodies than others. In this blog post, we outline how enzyme choice can impact de novo antibody sequencing, with a focus on the heavy chain.
Traditional enzymes for de novo antibody sequencing
Trypsin
Trypsin is the workhorse of mass spectrometry-based proteomics. It cleaves C-terminal to arginine (R) and lysine (K). Every tryptic peptide, therefore, contains an R or a K, which results in good ionization of the peptides and easier interpretation of the mass spectra. Trypsin is also ideal for targeting the CDR3 of many IgGs. 93% of human germline V genes and 95% of mouse germline V genes end with an R or K, making the start of the CDR3 a likely N-terminus of a tryptic peptide.
Provided that there is no R or K in the D gene or untemplated insertions/deletions, then a suitable C-terminus for the peptide could be found in the J gene or CH1 of the C gene. In mouse, no germline J gene contains a tryptic cleavage site and in human only two allelic variants of the IGHJ6 gene contain lysines.
Most IgG antibodies from human and mouse have a tryptic cleavage site early in the CH1 that makes de novo antibody sequencing simpler.
Examples of human tryptic peptides covering the H-CDR3:
Four example human V-J-C gene junctions are shown. The V gene is colored blue, the J gene is colored orange, and the C gene is colored green. The D gene and any untemplated insertions that make up the remainder of the CDR3 are denoted with an X. In the 4th example antibody, the IGHJ6 allelic variant is shown with a tryptic terminus before the constant region. The IgGs all contain a tryptic cleavage site early in the CH1.
Examples of mouse tryptic peptides covering the H-CDR3:
IGHV3-6/IGHJ2/IGHG1 …KLNSVTTEDTATYYCARXXXXXYFDYWGQGTTLTVSSAKTTPPSVYPLAPG…
IGHV1-72/IGHJ3/IGHG2A …QLSSLTSEDSAVYYCARXXXXXWFAYWGQGTLVTVSAAKTTAPSVYPLAPV…
IGHV1-81/IGHJ4/IGHG2B …ELRSLTSEDSAVYFCARXXXYYAMDYWGQGTSVTVSSAKTTPPSVYPLAPG…
IGHV1-7/IGHJ4/IGHG2C …QLSSLTYEDSAVYYCARXXXYYAMDYWGQGTSVTVSSAKTTAPSVYPLAPV…
<—tryptic peptide–>
IGHV1-80/IGHJ1/IGHG3 …QLSSLTSEDSAVYFCARXXXXXYFDYWGQGTTLTVSSATTTAPSVYPLVPGCSDTSGSSVTLGCLVK…
<—————–tryptic peptide—————->
Five example mouse V-J-C gene junctions are shown. The V gene is colored blue, the J gene is colored orange, and the C gene is colored green. The D gene and any untemplated insertions that make up the remainder of the CDR3 are denoted with an X. The IgGs all contain a tryptic cleavage site early in the CH1, except IgG3.
Note that the IgG3 constant region does not have a tryptic cleavage site until much further into the CH1. This can result in a peptide that is too long for most analysis pipelines. At Abterra Bio, we’ve developed specialized informatics tools for sequencing these long peptides.
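To make the IgG1 versus IgG3 contrast concrete, here is a minimal Python sketch of an in silico tryptic digest applied to two of the mouse junctions shown above. It assumes the simplest possible rule (cleave after every K or R, no missed cleavages, no special handling of proline), and the helper names `tryptic_digest` and `cdr3_peptide` are ours for illustration only; this is not the tooling we use in production.

```python
def tryptic_digest(sequence):
    """Split a sequence after every K or R (simplified rule, no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        if residue in "KR":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # trailing piece without a K/R
    return peptides


def cdr3_peptide(sequence):
    """Return the digest product containing the X placeholders (the CDR3 core)."""
    return next(p for p in tryptic_digest(sequence) if "X" in p)


# Mouse junctions copied from the examples above.
# X marks the D gene and untemplated residues, so real peptides will differ.
junctions = {
    "IGHV3-6/IGHJ2/IGHG1": "KLNSVTTEDTATYYCARXXXXXYFDYWGQGTTLTVSSAKTTPPSVYPLAPG",
    "IGHV1-80/IGHJ1/IGHG3": "QLSSLTSEDSAVYFCARXXXXXYFDYWGQGTTLTVSSATTTAPSVYPLVPGCSDTSGSSVTLGCLVK",
}

for name, seq in junctions.items():
    peptide = cdr3_peptide(seq)
    print(f"{name}: {len(peptide)} residues  {peptide}")
```

With these placeholder sequences, the IgG1 peptide ends at the K shortly after the J gene, while the IgG3 peptide runs on until the K at the end of the displayed CH1 fragment. A real CDR3 with an internal R or K would of course be split further.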
Chymotrypsin
Chymotrypsin is another popular enzyme for mass spectrometry-based proteomics and de novo antibody sequencing. This enzyme cleaves C-terminal to phenylalanine (F), tyrosine (Y), leucine (L), and tryptophan (W). Due to the frequency of cleavage sites, chymotrypsin often results in short peptides. It can be complementary to trypsin since it cleaves just upstream of trypsin at a Y or F, and just inside the J gene where there’s a buffet of Y, F, and W to cleave (provided there are no cleavage sites within the CDR3).
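As a rough illustration of how short chymotryptic peptides can get, the Python sketch below digests one of the mouse junctions from the trypsin section. It assumes an idealized rule (cleave after every F, Y, L, or W, with no missed cleavages), which real digests rarely follow exactly, and `chymotryptic_digest` is a hypothetical helper name.

```python
CHYMOTRYPSIN_SITES = set("FYLW")  # residues named above; idealized, no missed cleavages


def chymotryptic_digest(sequence):
    """Split a sequence after every F, Y, L, or W."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        if residue in CHYMOTRYPSIN_SITES:
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # trailing piece without a cleavage site
    return peptides


# Mouse IGHV3-6/IGHJ2/IGHG1 junction; X marks D gene and untemplated residues.
junction = "KLNSVTTEDTATYYCARXXXXXYFDYWGQGTTLTVSSAKTTPPSVYPLAPG"

peptides = chymotryptic_digest(junction)
print(peptides)
print("CDR3 peptide:", next(p for p in peptides if "X" in p))
```

Under this idealized rule the CDR3 peptide starts right after the YY of the V gene and stops at the first Y of the J gene, which is the complementary behavior described above. In practice, missed cleavages produce longer, overlapping variants of these peptides.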
Sequencing non-IgG antibodies
In human, isotypes beyond IgG may be of interest to study allergic reactions (Sutten et al), immunology of the gut or mucous membranes (de Sousa-Pereira and Woof), or naive antibody repertoires (Keyt et al). Below we show the CH1 of the major allelic variants of common antibody isotypes. For trypsin, the cleavage sites for IgA and IgE are slightly further into the constant region compared to IgGs. However, for IgM, the first tryptic cleavage site is 43 amino acids into the CH1. When combined with the CDR3 and framework 4, the size of this peptide can often exceed 70 amino acids (~7.7 kDa).
IGHG1 CH1 …ASTKGPSVFPLAPSSKSTSGGTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQSS…
IGHG2 CH1 …ASTKGPSVFPLAPCSRSTSESTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQSS…
IGHG3 CH1 …ASTKGPSVFPLAPCSRSTSGGTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQSS…
IGHG4 CH1 …ASTKGPSVFPLAPCSRSTSESTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQSS…
^
IGHA1 CH1 …ASPTSPKVFPLSLCSTQPDGNVVIACLVQGFFPQEPLSVTWSESGQGVTARNFPPSQDAS…
IGHA2 CH1 …ASPTSPKVFPLSLDSTPQDGNVVVACLVQGFFPQEPLSVTWSESGQNVTARNFPPSQDAS…
^
IGHE CH1 …ASTQSPSVFPLTRCCKNIPSNATSVTLGCLATGYFPEPVMVTCDTGSLNGTTMTLPATTL…
^
IGHM CH1 …GSASAPTLFPLVSCENSPSDTSSVAVGCLAQDFLPDSITLSWKYKNNSDISSTRGFPSVL…
^
The CH1 regions of the major human isotypes are shown, with the first tryptic cleavage sites marked with ‘^’.
Some methods that use top-down proteomics can directly analyze fragments of this size or larger (Stork et al, Bondt et al). However, they are not commonly used because of the complexity of the sample preparation and instrumentation, and because sequence recovery is often incomplete.
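The positions discussed above can be checked with a few lines of Python. The sketch below finds the first K or R in one representative CH1 fragment per isotype (copied from the alignment above) and estimates peptide mass using a rule-of-thumb average of about 110 Da per residue. That average is our assumption for a back-of-the-envelope figure, not a measured value, and the function names are hypothetical.

```python
AVG_RESIDUE_MASS_DA = 110  # rule-of-thumb average residue mass (an assumption)

# CH1 fragments copied from the alignment above, one per isotype.
ch1 = {
    "IgG1": "ASTKGPSVFPLAPSSKSTSGGTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQSS",
    "IgA1": "ASPTSPKVFPLSLCSTQPDGNVVIACLVQGFFPQEPLSVTWSESGQGVTARNFPPSQDAS",
    "IgE": "ASTQSPSVFPLTRCCKNIPSNATSVTLGCLATGYFPEPVMVTCDTGSLNGTTMTLPATTL",
    "IgM": "GSASAPTLFPLVSCENSPSDTSSVAVGCLAQDFLPDSITLSWKYKNNSDISSTRGFPSVL",
}


def first_tryptic_site(sequence):
    """1-based position of the first K or R, or None if there is none."""
    for position, residue in enumerate(sequence, start=1):
        if residue in "KR":
            return position
    return None


def approx_mass_kda(n_residues):
    """Back-of-the-envelope peptide mass from residue count."""
    return n_residues * AVG_RESIDUE_MASS_DA / 1000


for isotype, seq in ch1.items():
    print(f"{isotype}: first tryptic site at CH1 residue {first_tryptic_site(seq)}")

print(f"70-residue peptide: ~{approx_mass_kda(70):.1f} kDa")
```

For IgM, the first site lands at residue 43 of the CH1, so the tryptic peptide spanning the CDR3 has to carry the CDR3, framework 4, and those 43 constant-region residues, which is how it can exceed 70 amino acids.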
While complete recovery of the H-CDR3 in a single peptide is ideal, assembling shorter peptides can be successful. This type of assembly, where overlapping peptides are stitched together in silico to form a larger peptide sequence, is a core capability of our polyclonal antibody sequencing platform, Griffin.
Full Sequence QVIVESGGGIVQPGGSIRISCXXXXXXXXXXXXXWYRQISGGERESVAWIGPXXXXXXXXXXFTISRDNAKNTIYIQMDNIKPEDTAVYYCXXXXXXXXXXXWGQGTQVTVSS
Peptide 1 QVIVESGGGIVQPGGSIR———————————————————————————————–
Peptide 2 –IVESGGGIVQPGGSIRISCXXXXXXXXXXXXX——————————————————————————-
Peptide 3 ———————XXXXXXXXXXXXXWYRQISGGERE——————————————————————–
Peptide 4 —————————–XXXXXWYRQISGGERESV——————————————————————
Peptide 5 ———————————-WYRQISGGERESVAWIGPXXXXXXXXXXFTISR———————————————-
Peptide 6 ————————————-QISGGERESVAWIGPXXXXXXXXXXFTISRDNAK——————————————
Peptide 7 ———————————————SVAWIGPXXXXXXXXXXFTISRDNAKNTIY————————————–
Peptide 8 ————————————————————XXFTISRDNAKNTIYIQMDNIKPEDTA————————–
Peptide 9 ————————————————————–FTISRDNAKNTIYIQMDNIKPEDTAVYYCPA——————–
Peptide 10 —————————————————————TISRDNAKNTIYIQMDNIKPEDTAVYYCXXXXXXXXXXXWGQGTQVTVSS
Peptide 11 ——————————————————————————————CXXXXXXXXXXXWGQGTQVTVSS
Above, a full-length heavy chain sequence is shown, assembled from the overlapping peptides aligned beneath it.
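The idea of stitching overlapping peptides together can be sketched with a simple greedy suffix-prefix merge in Python. To be clear, this is a toy illustration and not how Griffin works: it assumes error-free peptides, exact string overlaps, and an arbitrary minimum overlap of 5 residues, none of which hold for real de novo data (where, for example, I and L are indistinguishable by mass). The three input fragments are trimmed from the framework 3 region of the sequence above.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of `a` that is also a prefix of `b` (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def assemble(peptides, min_len=5):
    """Greedily merge the pair with the largest overlap until none remain."""
    contigs = list(dict.fromkeys(peptides))  # drop exact duplicates, keep order
    # Drop peptides fully contained in another peptide.
    contigs = [p for p in contigs if not any(p != q and p in q for q in contigs)]
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)


# Toy fragments trimmed from the framework 3 region of the example above.
fragments = ["FTISRDNAKNTIY", "DNAKNTIYIQMDNIKPEDTA", "IQMDNIKPEDTAVYYC"]
print(assemble(fragments))
```

A greedy merge like this is easy to follow but can be misled by repeated motifs or sequencing errors, which is why a production assembler needs scoring and error tolerance on top of the basic overlap idea.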
De novo antibody sequencing for less traditional species
Comprehensive germline gene references and favorable enzymatic cleavage sites around the CDRs can simplify the task of de novo antibody sequencing. However, some species that are less commonly used for reagents and therapeutics have unexpected pitfalls to watch out for.


