Introduction

An epitope is any molecular structure that can be recognised by the immune system or by another biological system. Epitopes, or the antigens from which they are derived, can be composed of protein, carbohydrate, lipid, nucleotide, or a combination thereof. It is through recognition of foreign, or non-self, epitopes that the immune system can identify and, hopefully, destroy pathogens. Hitherto, peptide epitopes have been the best studied and have traditionally been categorized as either T cell or B cell epitopes. T cell epitopes are peptides presented to the cellular arm of the immune system via the MHC-peptide-TCR complex. B cell epitopes are surface regions of an antigen that are bound by soluble or membrane-bound antibodies. If such a region of a protein antigen is composed of residues that are distant in the primary structure but brought into local proximity by protein folding, it is termed a discontinuous or conformational B cell epitope. The residues of a linear or continuous B cell epitope are sequential in the primary structure and therefore also form a contiguous region on the protein's surface. Such epitopes are predominantly identified by the cross-reactivity of antigen-specific antibodies with peptides.

Database development

Database content

Table 1 AntiJen sub-databases and content.
DATABASE | CONTENT
T Cell Epitopes | Contains T Cell epitope peptides (known binders).
B Cell Epitopes | Contains B Cell linear and conformational epitope peptides.
MHC-Peptide | Binding data relating to antigenic peptides and MHC interactions.
TCR | Binding data relating to antigenic peptides – TCR – MHC interactions.
TAP | Binding data relating to antigenic peptides and TAP interactions.
Kinetics | Kinetic binding data for MHC peptide interactions.
IPPI | Binding data for a collection of immunological protein interactions.
Diffusion Coefficient | Collection of Diffusion and Friction coefficients for surface peptides.
Copy Number | Number/Abundance of cell surface molecules.
Peptide Libraries | Relative binding data for antigenic peptide amino acid substitutions.
Antibody-Peptide | A variety of antibodies known to bind proteins.

Table 2 Size of AntiJen relative to JenPep. The number of peptides for each category in the AntiJen database is given, distinguishing between class I and class II categories where appropriate. Growth relative to JenPep 1 and 2, the progenitors of AntiJen, is included. For certain data categories, most obviously TAP binding data, re-evaluation of the quality of the data within JenPep has caused the count to decrease rather than increase; overall, however, the expansion of the data is clear.
DATABASE | JenPep v1.0 (Class 1 / Class 2 / Total) | JenPep v2.0 (Class 1 / Class 2 / Total) | AntiJen v1.0 (Class 1 / Class 2 / TOTAL) | AntiJen v2.0 (Class 1 / Class 2 / TOTAL)
T cell epitopes | 1266 / 795 / 2061 | 2060 / 1158 / 3218 | 2247 / 1578 / 3825 | 2402 / 1585 / 4158
MHC peptide binding | 3196 / 2652 / 5848 | 6411 / 5925 / 12336 | 6853 / 7772 / 14625 | 7304 / 8114 / 15454
Categories without a full class I / class II breakdown (values in the order given):
TAP peptide binding: 432, 441, 408, 1106
B cell epitopes: 816, 1295, 3541
TCR – peptide-MHC: 49, 375, 124, 594, 527, 253, 782
MHC peptide kinetics: 704, 243, 947, 897, 294, 1150
IPPI: 805, 2675
Copy Number: 161, 243, 414
Diffusion coefficients: 759
Peptide Libraries: 897
Antibody: 395

YTSDYFISY: SWISS-PROT code P41156; Journal of Immunol 1994, volume 152, pages 3913–3924, PUBMED ID 8144960; human class I A*0101.

AntiJen is, where possible, a quantitative database archiving continuous measures of binding. This is a fundamental feature of several sub-databases, such as the MHC ligand and pMHC-TCR databases. The binding of an immunological macromolecule to a peptide or other biomacromolecule is quantified in the same way as other receptor-ligand interactions:

$$R + L \rightleftharpoons RL$$

where $R$ is the receptor, $L$ the ligand, and $RL$ the receptor-ligand complex, with forward and reverse rate constants $k_{on}$ and $k_{off}$. At equilibrium:

$$k_{on}[R][L] = k_{off}[RL]$$

Rearranging:

$$K_D = \frac{1}{K_A} = \frac{k_{off}}{k_{on}} = \frac{[R][L]}{[RL]}$$

where $K_D$ is the equilibrium dissociation constant and $K_A$ the equilibrium association constant.

Table 3 AntiJen Thermodynamic and Kinetic Data. An overview of the 6 AntiJen databases that provide binding data.
It must also be noted that several of the databases contain additional data not present in any of the other databanks.
MEASURE | MHC-Peptide | Kinetics | IPPI | TAP | pMHC-TCR | Antibody | TOTAL
IC50 | 8562 | 0 | 247 | 1000 | 0 | 4 | 9813
k_on | 0 | 188 | 563 | 0 | 157 | 87 | 995
k_off | 0 | 146 | 610 | 0 | 150 | 101 | 1007
K_D | 359 | 156 | 1143 | 16 | 227 | 70 | 1971
K_a | 65 | 0 | 37 | 0 | 28 | 132 | 262
t_1/2 | 0 | 207 | 72 | 0 | 148 | 0 | 427

Table 4
Columns: DATABASE, TOTAL, pH, Temperature, [standard], Stand. peptide seq., [competitor], Method, [peptide]. The first value in each row is the TOTAL; blank cells are not marked, so the remaining values are listed in the order given.
MHC Binding: 15454, 6679, 9831, 10893, 12796, 5007, 1251
MHC Kinetics: 1150, 677, 1101, 1149, 606
TAP Binding: 1106, 22, 243, 1092, 1101, 86, 981
TCR-pMHC: 782, 426, 632, 668
IPPI: 2675, 726, 1371, 2600
Copy Number: 414, 183, 278, 414
Peptide Libraries: 897, 897, 897
Diffusion Coefficient: 759, 321, 668, 736
Antibody: 395, 119, 115, 372

Figure 1 The distribution of experimental methods applied within each database

Subsidiary Databases in AntiJen

The AntiJen database contains a number of sub-databases. Each of these contains data on different aspects of the biological function and/or biophysical properties of different classes of immunomacromolecule. We describe the nature and content of each sub-database below.

B Cell Epitopes

T Cell Epitopes

T cell epitopes are short peptides bound by major histocompatibility complexes (MHC) and subsequently recognized by T cells. Epitopes recognized by both CD4+ and CD8+ T cells are included in the database. Such epitopes can be identified in many different ways. This diversity of measurement creates a need for consistency, so a range of different experimental methods must be recorded. The archive has expanded to include 4,158 entries. Each entry contains the epitope, ranging in length from 4 to 38 amino acids; peptide information detailing the source, with links to Swiss-Prot; and the corresponding MHC restriction data such as serotype, allele and class.
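The entry fields just described can be pictured as a small record type. The following is only a minimal sketch: the class name, field names and validation rules are hypothetical and are not AntiJen's actual schema. The 4 to 38 residue bound is the range of lengths reported for the current archive, treated here as a validation limit purely as an assumption. The example values are borrowed from the peptide YTSDYFISY quoted earlier in this article, for illustration only.

```python
from dataclasses import dataclass
from typing import Optional

# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")

# Assumption: the length range reported for the archive is used as a sanity bound.
MIN_LEN, MAX_LEN = 4, 38


@dataclass(frozen=True)
class TCellEpitopeEntry:
    """Hypothetical record layout for one T cell epitope entry."""

    sequence: str                      # epitope peptide, one-letter code
    mhc_class: str                     # MHC restriction class: "I" or "II"
    swiss_prot: Optional[str] = None   # Swiss-Prot code of the source protein
    allele: Optional[str] = None       # restricting allele, e.g. "A*0101"
    serotype: Optional[str] = None     # restricting serotype, if recorded

    def __post_init__(self) -> None:
        # Reject characters that are not standard amino acid codes.
        if not set(self.sequence) <= AMINO_ACIDS:
            raise ValueError(f"non-standard residue in {self.sequence!r}")
        # Reject peptides outside the assumed length range.
        if not MIN_LEN <= len(self.sequence) <= MAX_LEN:
            raise ValueError(f"length {len(self.sequence)} outside {MIN_LEN}-{MAX_LEN}")
        if self.mhc_class not in ("I", "II"):
            raise ValueError(f"unknown MHC class {self.mhc_class!r}")


example = TCellEpitopeEntry(
    sequence="YTSDYFISY",
    mhc_class="I",
    swiss_prot="P41156",
    allele="A*0101",
)
print(example)
```

Making the record immutable and validating it on construction keeps malformed peptides out of any downstream search or statistics code; a real schema would also need fields for the literature reference and the experimental method.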
Additionally, the peptides are categorized into groups such as Allergens, Bacterial, Cancer, Human, Viral and Self peptides.

MHC – Peptide binding

pMHC-T Cell Receptor interaction

TAP Binding

Peptide-MHC Kinetics

Immunological Protein-Protein Interactions

Antibody – Protein Binding

Peptide Libraries

Diffusion Co-efficients

Copy Numbers

Searching the database

Figure 2 Overview of the different search methods within AntiJen

Figure 3 Searchable database types within AntiJen

Figure 4 Sparse peptide sequence search

Discussion

Future work

Future tasks in the development of AntiJen fall into two principal categories: eliminating deficiencies, errors, and inconsistencies within the database, and simultaneously reinforcing it by expanding its depth, breadth, and scope. We also need to monitor updates within external databases, so that any alterations are mirrored within the archive. Like all other such repositories, AntiJen is prone to both systematic and random errors within the data accumulation process. User feedback and our interactions with immunologists will hopefully address persisting errors. Deficiencies in our database include our current inability to encode chemically or post-translationally modified peptides, non-natural MHC mutants and non-amino acid peptidomimetic MHC ligands. Additionally, it would be interesting to complement our existing data on TAP binding with information on antigen presentation pathways, such as proteasomal and cathepsin cleavage patterns. Moreover, the compilation of B cell or antibody epitope data is an area ripe for robust development. The number of known linear and conformational B cell epitopes is very much larger than our current compilation, leaving considerable scope to increase the number of recorded epitopes.
Conclusion

The development of a database is always a work in progress, not simply because the easily accessible literature is always increasing, but also because of the desire to capture as much as possible of the existing, but hidden, literature. In the post-genomic era, the database has formed the bedrock and language of bioinformatics; increasingly, databases are coming to underpin our modern understanding of biology as a whole. Traditionally, databases have arisen as a response to need, answering the individual and idiosyncratic questions posed by biologists. However, the history of bioinformatics databases has shown the extraordinarily diverse ways in which archived data can be used.