FASTA2AIRR: comparing human and humanized antibodies
Fasta2airr is a free web app for converting antibody sequences into annotated AIRR-Seq files. Converting sequences into this open format is an essential step in antibody repertoire analysis. Among other antibody metadata, the AIRR-Seq file contains:
- Germline gene calls for V, D, and J genes
- CDR sequences
- Mutations relative to germline
Fasta2airr not only converts your sequences into this versatile, tab-delimited format but also generates visualizations of key antibody features. In this blog post we use fasta2airr to identify key features of two sets of therapeutic antibodies.
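Because the format is tab-delimited, an AIRR-Seq file can be read with nothing more than a standard library. The sketch below is a minimal example; the column names follow the AIRR Rearrangement schema, but exactly which columns fasta2airr emits is an assumption here, and the two records are made up for illustration.

```python
import csv
import io

# Hypothetical two-record AIRR-style table; sequences and gene calls are
# invented for illustration and are not fasta2airr output.
EXAMPLE_TSV = (
    "sequence_id\tv_call\td_call\tj_call\tjunction_aa\n"
    "mab_1\tIGHV3-23*01\tIGHD3-10*01\tIGHJ4*02\tCAKDRGYFDYW\n"
    "mab_2\tIGHV1-46*01\tIGHD2-2*01\tIGHJ6*02\tCARGGYYYGMDVW\n"
)


def read_airr(handle):
    """Return one dict per row of a tab-delimited AIRR-style table."""
    return list(csv.DictReader(handle, delimiter="\t"))


records = read_airr(io.StringIO(EXAMPLE_TSV))
for rec in records:
    print(rec["sequence_id"], rec["v_call"], rec["j_call"], rec["junction_aa"])
```

For a real file, pass an open file handle (opened with newline="") instead of the in-memory string.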
What is a human antibody?
A ‘human’ antibody heavy chain is the product of a genomic rearrangement of human germline V, D, and J genes. There are several ways fully human antibodies can be generated.
- From human donors. Ansuvimab, for example, was derived directly from a patient who had recovered from the Ebola virus. Human donors can also provide natural human sequences for screening as ‘naive’ libraries.
- From transgenic animals. When human antibody germline genes are inserted into the genome of another animal (e.g. a mouse), that animal becomes capable of generating human antibodies.
- From in vitro libraries. Both synthetic libraries and semi-synthetic libraries can be designed in silico to mimic human antibody sequences.
What is a humanized antibody?
A humanized antibody is generated from non-human germline genes and then mutated to make the sequence look more human, and thus less immunogenic in humans. Early therapeutic antibodies, like rituximab, contained fully mouse variable regions and resulted in anti-drug immune responses. Learning from those early years, scientists have created humanization methods to go from almost any species to human. The key challenge is retaining the antigen-binding properties while reducing immunogenicity.
Results
Both human and humanized antibody heavy chains use a diverse repertoire of germline genes.
The antibodies were aligned to the closest human germline V and J genes. The dot plots below show the frequency of each V-J combination. Both sets of antibodies made broad use of germline V and J genes.
The human antibodies were generated from 17 total V genes, with the most common V gene being IGHV3-23.
The humanized antibodies were generated from 16 germline V genes, with IGHV1-46 and IGHV3-23 overrepresented in the set.
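The V-J frequencies behind a dot plot like this can be tallied directly from the gene calls. The sketch below assumes each record carries `v_call` and `j_call` strings in the usual gene*allele form; collapsing alleles to the gene level and taking the first call when several are listed are choices made for this illustration, not necessarily what fasta2airr does. The toy records are invented.

```python
from collections import Counter


def gene_name(call):
    """Reduce a call such as 'IGHV3-23*01,IGHV3-23D*01' to 'IGHV3-23'."""
    return call.split(",")[0].split("*")[0].strip()


def vj_frequencies(records):
    """Map each (V gene, J gene) pair to its fraction of all records."""
    pairs = Counter(
        (gene_name(r["v_call"]), gene_name(r["j_call"])) for r in records
    )
    total = sum(pairs.values())
    return {pair: n / total for pair, n in pairs.items()}


# Invented example records
toy = [
    {"v_call": "IGHV3-23*01", "j_call": "IGHJ4*02"},
    {"v_call": "IGHV3-23*04", "j_call": "IGHJ4*02"},
    {"v_call": "IGHV1-46*01", "j_call": "IGHJ6*02"},
    {"v_call": "IGHV1-69*01", "j_call": "IGHJ4*02"},
]
freqs = vj_frequencies(toy)
for pair, frac in sorted(freqs.items()):
    print(pair, frac)
```

The number of distinct V genes in a set is then `len({v for v, _ in freqs})`.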
Human antibodies appear to be less mutated than humanized.
Fasta2airr finds the closest human germline gene and aligns the submitted antibody sequence to the germline. From the alignment, the number of amino acid mutations can be calculated. Since humanized antibodies are derived from non-human sequences to begin with, it is not surprising that they differ more from human germlines than human antibodies do. The fasta2airr plots provide a sanity check for the mutation distribution of each set.
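Counting mutations from an alignment comes down to comparing the two aligned sequences position by position. The sketch below is one way to do it, assuming the query and germline are already aligned to equal length with '-' for gaps; skipping gapped positions is an assumption of this example, and how fasta2airr itself treats gaps or the CDR3 is not stated here. The sequences are invented.

```python
def count_aa_mutations(query_aln, germline_aln, gap="-"):
    """Count mismatched residues between two aligned amino acid sequences.

    Positions where either sequence has a gap are skipped (an assumption
    of this sketch, not a documented fasta2airr rule).
    """
    if len(query_aln) != len(germline_aln):
        raise ValueError("aligned sequences must have equal length")
    return sum(
        1
        for q, g in zip(query_aln, germline_aln)
        if q != g and q != gap and g != gap
    )


# Invented fragments: two substitutions relative to the germline
germline = "EVQLLESGGG"
query = "EVQLVESGAG"
print(count_aa_mutations(query, germline))
```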
Most of the human heavy chains analyzed had fewer than 10 amino acid mutations when aligned to germline.
The human light chains also show fewer mutations when aligned to germline.
Almost two thirds of the humanized heavy chains had 15 or more mutations when aligned to germline.
The humanized light chains were more mutated than their human counterparts.
6 of the 24 (25%) light chains from the human therapeutic set were from the lambda locus.
2 of the 24 (8%) light chains from the humanized therapeutic set were from the lambda locus.
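The lambda fraction can be recovered from the V gene calls alone, since light chain V gene names begin with IGK (kappa) or IGL (lambda). The sketch below assumes a plain list of `v_call` strings; the gene names are placeholders chosen only to reproduce the 6-of-24 and 2-of-24 counts reported above.

```python
def lambda_fraction(v_calls):
    """Fraction of light chain calls (IGK or IGL) that are lambda."""
    light = [c for c in v_calls if c.startswith(("IGK", "IGL"))]
    if not light:
        return 0.0
    return sum(c.startswith("IGL") for c in light) / len(light)


# Placeholder calls matching the counts in the text, not the real gene usage
human_set = ["IGLV1-44*01"] * 6 + ["IGKV1-39*01"] * 18
humanized_set = ["IGLV1-44*01"] * 2 + ["IGKV1-39*01"] * 22

print(round(100 * lambda_fraction(human_set)))
print(round(100 * lambda_fraction(humanized_set)))
```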
What else can Reptor analysis do for you?
The output of the fasta2airr web app is a small piece of what the Reptor repertoire analysis service can provide.
Clone sharing across samples
CDR3 sharing across samples reveals pair-wise sample similarity at a clone level.
The diagonal shows the number of amino acid CDR3s present in each sample, while the off-diagonal heat map shows the Jaccard similarity of clones (intersection/union).
In the plot to the right, T4 and T5 are 7 days apart, as are T1 and T2. T3 is 20 days after T2 and 30 days before T4. As expected, proximal time points show more sharing than distal ones.
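The sharing matrix described here can be sketched in a few lines: sample sizes on the diagonal and Jaccard similarity (intersection over union of CDR3 sets) elsewhere. The sample names and CDR3 strings below are invented, and treating a clone as an exact amino acid CDR3 match is an assumption of this example.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections: |intersection| / |union|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def sharing_matrix(samples):
    """Diagonal: number of unique CDR3s; off-diagonal: Jaccard similarity."""
    names = sorted(samples)
    return {
        (x, y): len(samples[x]) if x == y else jaccard(samples[x], samples[y])
        for x in names
        for y in names
    }


# Invented CDR3 sets for two time points
samples = {
    "T1": {"CARDYW", "CAKGGW", "CARSSW"},
    "T2": {"CARDYW", "CAKGGW", "CTRLLW"},
}
matrix = sharing_matrix(samples)
print(matrix[("T1", "T1")], matrix[("T1", "T2")])
```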
CDR motif analysis
Reptor generates CDR plots that show amino acid frequency and entropy at each position.
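Per-position frequency and entropy can be computed by walking the columns of a set of equal-length CDR sequences. The sketch below uses Shannon entropy in bits; whether Reptor uses this exact measure, and how it handles CDRs of different lengths, are assumptions here. The sequences are invented.

```python
import math
from collections import Counter


def position_profiles(seqs):
    """Return (frequencies, entropy_in_bits) for each position.

    All sequences must have the same length.
    """
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(seqs)
    profiles = []
    for column in zip(*seqs):
        freqs = {aa: c / n for aa, c in Counter(column).items()}
        entropy = -sum(p * math.log2(p) for p in freqs.values())
        profiles.append((freqs, entropy))
    return profiles


# Invented four-residue CDR fragments
profiles = position_profiles(["CARD", "CAKD", "CARE", "CAKE"])
for i, (freqs, entropy) in enumerate(profiles, start=1):
    print(i, freqs, round(entropy, 3))
```

A fully conserved position has an entropy of zero, and a position split evenly between two residues has an entropy of one bit.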
Clone Clustering and Hit Expansion
The AIRRSeqViewer, a standalone software tool available as part of the Reptor service, clusters CDR3 sequences on user-defined clone definitions to identify relatives of hits.
In the network to the right, the triangle node represents a clone identified via single B cell sequencing, and the edges identify other clonal lineages that are 90% identical.
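Hit expansion of this kind can be illustrated with a simple identity threshold. The sketch below assumes a clone definition of equal-length amino acid CDR3s with at least 90% identical positions; the actual definitions in the AIRRSeqViewer are user-defined, so this is only one possible choice, and the sequences are invented.

```python
def identity(a, b):
    """Fraction of identical positions; 0.0 if lengths differ or are zero."""
    if not a or len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def expand_hit(hit, repertoire, threshold=0.9):
    """Return repertoire CDR3s that meet the identity threshold to the hit."""
    return [s for s in repertoire if s != hit and identity(hit, s) >= threshold]


# Invented CDR3s: one relative, one too divergent, one of different length
hit = "CARDGYFDYW"
repertoire = ["CARDGYFDVW", "CARDAYFDVW", "CARDGYW", "CARDGYFDYW"]
relatives = expand_hit(hit, repertoire)
print(relatives)
```

Each returned sequence corresponds to an edge from the hit node in a network like the one described above.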