miHA-Match: Computational detection of tissue-specific minor histocompatibility antigens


Journal of Immunological Methods 386 (2012) 94–100

Contents lists available at SciVerse ScienceDirect

Journal of Immunological Methods journal homepage: www.elsevier.com/locate/jim

Computational Modeling

miHA-Match: Computational detection of tissue-specific minor histocompatibility antigens☆

Magdalena Feldhahn a,⁎, Pierre Dönnes a,b, Benjamin Schubert a, Karin Schilbach c, Hans-Georg Rammensee d, Oliver Kohlbacher a,e

a University of Tübingen, Center for Bioinformatics, Applied Bioinformatics, Sand 14, 72076 Tübingen, Germany
b Systems Biology Research Center, University of Skövde, Sweden
c University Children's Hospital, Department of Hematology/Oncology, Hoppe-Seyler Str. 1, 72076 Tübingen, Germany
d University of Tübingen, Department of Immunology, Auf der Morgenstelle 15, 72076 Tuebingen, Germany
e University of Tübingen, Quantitative Biology Center, Tübingen, Germany

Article info

Article history: Received 29 August 2012; Accepted 10 September 2012; Available online 14 September 2012

Keywords: Minor histocompatibility antigens; Immunoinformatics; Transplantation

Abstract

Allogenic stem cell transplantation has shown considerable success in a number of hematological malignancies, in particular in leukemia. The beneficial effect is mediated by donor T cells recognizing patient-specific HLA-binding peptides. These peptides are called minor histocompatibility antigens (miHAs) and are typically caused by single nucleotide polymorphisms. Tissue-specific miHAs have successfully been used in anti-tumor therapy without causing unspecific graft-versus-host reactions. However, only a small number of miHAs have been identified to date, limiting the clinical use. Here we present an immunoinformatics pipeline for the identification of miHAs. The pipeline can be applied to large-scale miHA screening, for example, in the development of diagnostic tests. Another interesting application is the design of personalized miHA-based cancer therapies based on patient–donor pair-specific miHAs detected by this pipeline. The suggested method covers various aspects of genetic variant detection, effects of alternative transcripts, and HLA-peptide binding. A comparison of our computational pipeline and experimentally derived datasets shows excellent agreement and coverage of the computationally predicted miHAs.

© 2012 Elsevier B.V. All rights reserved.

☆ This paper was presented at the Third Immunoinformatics and Computational Immunology Workshop (ICIW 2012), October 7, 2012, Orlando, FL, USA.
⁎ Corresponding author. Tel.: +49 7071 29 70 460; fax: +49 7071 29 5152.
0022-1759/$ – see front matter © 2012 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.jim.2012.09.004

1. Introduction

Peptide presentation by MHC (HLA) is a key step in T-cell-mediated immunity. This has major implications for stem cell transplantation, where an HLA-mismatch between a donor and a patient leads to graft-versus-host disease (GvHD) with severe adverse effects (Shlomchik, 2007). Even in HLA-matched donor–patient pairs GvHD is observed as an effect of minor histocompatibility antigens (miHAs). miHAs also play a role in general transplantation settings where they are responsible for graft rejection by the patient's immune system. Here we focus on stem cell transplantation, where the clinical relevance is related to the reaction of cytotoxic donor T cells against the tissue of the patient. miHAs arise from differences in HLA class I-presented peptides between the donor and the patient, often caused by single nucleotide polymorphisms (SNPs) affecting the amino acid sequence or altering the expression of proteins (Bleakley and Riddell, 2004). The expression of a miHA depends on the expression of the respective gene. Tissue-specific genes lead to tissue-specific miHAs while ubiquitously expressed genes lead to ubiquitously presented miHAs.

For certain hematologic malignancies, allogenic hematopoietic stem cell transplantation (alloHCT) is a well-established therapy (Appelbaum, 2003). Initially alloHCT was used to strengthen the hematopoiesis after chemo-/radiotherapy, but nowadays the beneficial effects of graft-versus-leukemia (GvL) or graft-versus-tumor (GvT) caused by donor T cells are the main aim (Feng et al., 2008). In a setting of HLA-matched donor and patient, the effects observed in terms of GvL and GvT are caused by residual T cells of the stem cell graft. Increasing the number of T cells increases the risk of inducing GvHD, yet alloreactive T cells also mediate the beneficial GvL effect. The unwanted GvHD effect can be separated from the desired GvL effect when the targeted miHA is specifically expressed in the tumor tissue. Presence exclusively on malignant cells is more likely to give the desired and localized GvL effects, whereas miHAs expressed in many different tissues may lead to GvHD (Ferrara et al., 2009).

Several strategies have been developed in recent years for using miHAs to elicit GvL or GvT effects (Goulmy, 2006), for example miHA-specific T cells and miHA vaccines containing peptides, proteins, mRNA, or DNA. One of the most feasible and efficient current approaches is vaccination with defined miHA peptides and longer peptides/whole proteins pulsed onto dendritic cells after alloHCT treatment (Feng et al., 2008). A major challenge in this field of immunotherapy is the relatively low number of characterized miHAs in relation to the number of HLA alleles. There is a substantial need for fast and accurate identification of novel miHAs to enable immunotherapy for a large number of patients. While next-generation sequencing (NGS) has made the required patient-specific SNP data readily available, it is rather complex to derive patient-specific miHAs from this data.
We present a computational pipeline that facilitates the identification of novel miHAs. The pipeline addresses several of the current problems in the identification of miHAs that are likely to be effective in GvL and GvT treatment. We address the analysis of polymorphisms given a set of genes, the analysis of protein and peptide effects of these polymorphisms, and finally the prediction of HLA class I binding. The pipeline is well suited both for large-scale screening for novel miHAs based on existing SNP data and for personalized settings with recently sequenced donors and patients. Here we give a brief outline of the two scenarios and compare our results to experimentally verified results.

2. Theory

2.1. Prediction of miHAs

In order to design a computational pipeline for the large-scale identification of candidate miHAs we first need to specify our criteria for candidate miHAs. We define a peptide to be a candidate miHA if (1) the peptide sequence is changed by a SNP and (2) the peptide is predicted to bind to at least one of the HLA alleles under consideration. Strictly speaking, a miHA can only be defined in a specific patient–donor pair. By definition, a miHA in the context of alloHCT is a peptide that is presented by HLA in the patient, but not in the donor. In the context of graft rejection, a miHA is a peptide that is presented in the graft but not in the patient. Since we assume the patient and the donor to be HLA-matched, the difference in presentation has to be caused by a genetic variation (usually a SNP) between the patient and the donor. This variation needs to change the peptide sequence (i.e., it has to be a non-synonymous SNP) and thus leads to a presented peptide not previously seen by donor T cells.

There are two main settings for the computational identification of miHAs. The first one is personalized therapy, where information on genetic differences between a donor and a patient is at hand. The second setting is a large-scale screening for potential miHAs for a larger cohort of patients that can be turned into a rapid diagnostic test without the need for individualized genome sequencing. In the latter we computationally identify all potential miHAs for a suitable set of genes, for example genes that are specifically expressed in hematopoietic cells. This strategy can be used to design diagnostic SNP-genotyping tests for the quick identification of relevant miHAs for specific donor–patient pairs. The screening identifies a feasible number of SNPs for a diagnostic test with a high population coverage with respect to HLA alleles. The results of the genotyping for a donor–patient pair can then be analyzed in the individualized setting. Such combined approaches can increase the chance of finding miHAs also for patients with infrequent HLA alleles while restricting experimental effort to the most promising candidates. We first describe our pipeline for the computational detection of miHAs and then present an example of how the pipeline can be applied to clinical settings.

2.2. Pipeline

Our pipeline for in silico detection of potential miHAs is depicted in Fig. 1.
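The patient–donor definition from Section 2.1 can be made concrete with a small sketch. It is only an illustration under stated assumptions, not part of the published pipeline: the function names are hypothetical, genotypes are given as strings of nucleotide alleles (for example "GA" for a heterozygous individual), and HLA binding is not considered here.

```python
def patient_specific_alleles(donor_genotype, patient_genotype):
    """Return the alleles carried by the patient but absent from the donor.

    Genotypes are strings of allele characters, e.g. "GG", "GA" or "AA"
    (an assumed encoding chosen for this illustration).
    """
    return set(patient_genotype) - set(donor_genotype)


def is_relevant_for_allohct(donor_genotype, patient_genotype):
    """A SNP can only give rise to a miHA in alloHCT if the patient
    expresses at least one allele that the donor does not carry."""
    return bool(patient_specific_alleles(donor_genotype, patient_genotype))


if __name__ == "__main__":
    # Enumerate all donor/patient genotype combinations for a G/A SNP.
    for donor in ("GA", "GG", "AA"):
        for patient in ("GA", "GG", "AA"):
            verdict = is_relevant_for_allohct(donor, patient)
            print(f"donor={donor} patient={patient} relevant={verdict}")
```

For the graft-rejection direction described above, the same check would be applied with the roles of the two genotypes swapped.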
Fig. 1. Immunoinformatics pipeline for the identification of potential miHAs. (1) Obtain a set of polymorphisms of interest. In the general setting, SNPs can be retrieved from databases (e.g., dbSNP). In a personalized setting, a list of SNPs with genotyping information for patient–donor pairs is required. In the latter case only SNPs with disparity between patient and donor are considered. Gene expression data can be included to select tissue-specific SNPs. (2) The polymorphisms selected are mapped to the respective transcripts. All peptides containing the polymorphic position are generated from the transcript sequences. (3) Prediction of HLA binding for a set of HLA alleles (based on frequency information in the general case or on patient/donor HLA types) is performed on the polymorphic peptides.

The workflow is divided into three main steps: (1) obtain polymorphisms of interest, (2) map polymorphisms in silico to the respective transcripts and translate into peptides, and (3) predict HLA binding peptides for a given set of HLA alleles. A more detailed description of the three steps, along with a discussion of the different options and issues associated with each step, is given in the following.

2.2.1. Step 1: obtain polymorphisms of interest

As the input for the miHA analysis pipeline we need a set of polymorphisms that are of interest for the respective clinical setting. We introduce two settings: personalized therapy and large-scale screening for miHAs. We demonstrate how to obtain relevant polymorphisms in the context of leukemia. Restricting the search for miHAs to genes that are only expressed in the malignant tissue reduces the risk of GvHD. As a starting point we therefore take a set of genes that are specifically expressed in the hematopoietic system.

In the personalized setting, genomic information for donor and patient is available. This information can be obtained from NGS data or from array technologies for SNP genotyping. From this data we can derive a set of relevant SNPs for the donor–patient pair, i.e. a set of positions in which the patient and the donor differ and one version of the

SNP is uniquely present in the patient. Only SNPs in the given set of tissue-specific genes are selected.

A large-scale screening aims to identify potential miHAs from known SNPs without prior knowledge of the patient and donor genotypes. The donor and the patient can be tested for disparities in the relevant SNP positions in a later step. For the given set of genes we retrieve all known non-synonymous SNPs from dbSNP (Sherry et al., 2001). To increase the number of potential candidates one can additionally retrieve synonymous SNPs and in the next step generate peptides around these SNPs in alternative reading frames. In Table 1 we illustrate the influence of genotype combinations of donor and patient on the miHA relevance of SNPs.

Table 1
Using SNP rs142901306 as an example, we show how the combination of donor and patient genotypes decides whether a SNP is relevant for miHA analysis for a specific donor–patient pair. For each genotype the corresponding amino acids are listed in brackets. In alloHCT settings a SNP is relevant if the patient expresses one peptide version that is not present in the donor. Rows: donor genotype; columns: patient genotype.

Donor \ Patient    G/A (E/K)    G (E)    A (K)
G/A (E/K)          No           No       No
G (E)              Yes          No       Yes
A (K)              Yes          Yes      No

2.2.2. Step 2: generating polymorphic peptides

In the second step of the pipeline we generate all polymorphic peptides that result from the polymorphisms selected in the first step. For the genes affected by the polymorphisms we retrieve all transcript sequences from RefSeq (Pruitt et al., 2009). Different transcript or protein isoforms can lead to different peptides. We therefore take all transcripts that are available for each gene into account. We then map the polymorphisms to the transcripts. If more than one polymorphism occurs in the same transcript we take into account all possible combinations. If available, information on genetic linkage of neighboring SNPs can be included in order to reduce the number of combinations. However, a complete linkage might not be proven for all SNPs. We therefore use all combinations of neighboring SNPs in the screening setting. By default we only translate in the normal (annotated) reading frame. Alternative reading frame translation and the inclusion of synonymous SNPs are also possible. Duplicate peptides generated from different transcripts are removed before submitting the peptides to the prediction step. The length of the produced peptides depends on the methods used

for HLA binding prediction. Since the majority of all HLA binding peptides are ninemers, the default peptide length is nine. For personalized studies we can directly account for the patient–donor genotypes. The zygosity and the direction of the transplant process have to be taken into account. Only peptides that occur uniquely in the patient are of interest.

2.2.3. Step 3: predicting candidate miHAs

In the last step of the pipeline we select polymorphic peptides that are likely to be able to elicit an immune response. Binding to HLA is only a prerequisite for a peptide to function as a T-cell epitope. Additional factors are antigen processing and the availability of a T-cell receptor that matches the peptide:HLA complex. While good prediction methods for HLA binding are available, the prediction of antigen processing and T-cell reactivity leaves room for improvement (Tung et al., 2011). We therefore restrict the epitope prediction to the prediction of HLA binding in the default setting. Incorporation of antigen processing and T-cell reactivity is technically possible if appropriate prediction methods are available. By default we apply the MHC binding prediction method netMHCpan (Nielsen et al., 2007) using the low-binder threshold to all polymorphic peptides generated in the previous step. We choose netMHCpan since it provides predictions for a wide range of HLA alleles. Our pipeline is based on FRED, a framework for epitope prediction (Feldhahn et al., 2009), and is therefore very flexible with respect to the prediction method to use.

The set of HLA alleles for which epitope prediction should be applied depends on the application. In individualized studies, HLA typing of the patient and the donor is available. In such cases all HLA alleles shared by donor and patient are relevant. In the screening setting HLA alleles can be selected according to HLA allele frequencies in a population based on, e.g., dbMHC (Sayers et al., 2011) or the allele frequency net database (Gonzalez-Galarza et al., 2011). The whole pipeline has been parallelized to support efficient screening of large numbers of SNPs and HLA alleles on distributed high-performance computing infrastructures.

2.3. Prioritizing miHAs

Computational pipelines as proposed here are able to identify miHA candidates. The number of miHA candidates can be large, depending on the genes and the number of SNPs involved. A clinical application of miHAs, for example for the transfer of miHA-specific donor T cells after alloHCT, requires large experimental efforts. Among other experiments, the candidates have to be synthesized and tested for reactive T cells. In the screening setting described above, the donor and the patient have to be genotyped for the SNPs contributing to the miHA candidates. To reduce this experimental effort we propose criteria to select the most promising candidates.

2.3.1. Prioritization on the SNP level

The number of SNPs available in dbSNP for a set of genes can be large. In the screening setting, the aim is to identify a small set of SNPs with a high chance of producing a miHA. The donor and the patient are then genotyped for all selected SNPs before the clinical application. SNPs can be chosen to maximize the chance of observing a disparity in two individuals (donor and patient) based on known SNP allele frequencies (many SNPs in dbSNP are annotated with allele frequencies). To address the second requirement for miHAs, HLA binding, the number of peptides around a SNP that are predicted to bind to one of the HLA alleles under consideration is a useful criterion.

2.3.2. Prioritization on the peptide level

If the number of donor–patient-specific miHAs is too high to be experimentally validated, the respective peptides can be ranked by the predicted binding affinity to the respective HLA allele. Promiscuous peptides that are predicted to bind to more than one of the patient's alleles have a higher potential for a T-cell response than peptides that bind to just one allele with low affinity. The results presented in Section 3.1.1 show the existence of promiscuous HLA binding peptides. Another aspect to consider is the uniqueness of candidate miHA peptides. If a peptide generated by a SNP is not unique, i.e. it is also present in other proteins, T-cell clones recognizing this peptide might not be present in the T-cell repertoire (self-tolerance). A check against all existing proteins of the human proteome can be done in order to determine similarity to other peptides (Toussaint et al., 2011).

3. Results

3.1. Clinical applications

3.1.1. Large-scale screening for novel miHAs

In this scenario we describe how our pipeline can be applied to screen for novel miHAs on a large scale. The aim is to find novel potential miHAs in a set of hematopoiesis-restricted genes. These results can be used to design a genotyping assay for the quick identification of hematopoiesis-restricted miHAs in donor–patient pairs. In order to compare our pipeline to previously published results we base our analysis on a set of genes proposed by Hombrink et al. (2011) for a similar screening study. Our setting differs from the one in Hombrink et al. (2011) in the following parameters: We restrict the analysis to peptides of length nine, to non-synonymous SNPs, and to normal reading frame translation. While Hombrink et al. (2011) only consider HLA-A*02:01, we use the five most frequent HLA-A and HLA-B alleles (based on the allele frequency net database (Gonzalez-Galarza et al., 2011) and dbMHC (Sayers et al., 2011)): HLA-A*02:01, HLA-A*01:01, HLA-A*03:01, HLA-A*24:02, HLA-A*11:01, HLA-B*07:02, HLA-B*08:01, HLA-B*44:02, HLA-B*35:01, HLA-B*15:01. We use netMHCpan 2.4 with a binding threshold of IC50 ≤ 500 nM for the HLA binding predictions. We use and compare versions 132 and 135 of dbSNP (Sherry et al., 2001), while Hombrink et al. (2011) based their analysis on version 128 of dbSNP.

To show the validity of our pipeline we first compare the results to those presented in Hombrink et al. (2011). For this comparison we only consider peptides of length nine that are translated in the normal reading frame and predictions for HLA-A*02:01. We can reproduce 160 of the 177 (90.4%) predicted miHAs (Table S3 in Hombrink et al. (2011)). Hombrink et al. (2011) also analyzed the predicted miHAs with respect to genotyping information for selected donor–patient pairs (Table 1 in Hombrink et al. (2011)). We again exclude peptides that come from alternative reading frame translation or are not of length nine. We can reproduce four out of the remaining five peptides. The one peptide we miss is due to discrepancies between the prediction methods used. These results show not only the validity of our pipeline but also the influence of the parameters and prediction methods used.

A short summary of the numbers of predicted miHAs (10 HLA-A and HLA-B alleles, versions 132 and 135 of dbSNP) is listed in Table 2. In Section 2.3 we presented the criteria to prioritize miHAs on the SNP and on the peptide level. These filtering criteria can be adapted to the constraints of the application of the screening, for example the maximal number of SNPs or miHAs that can be tested experimentally. In the following we demonstrate the effects of some of the filtering criteria. All numbers presented below refer to the prediction for the ten alleles and for SNPs retrieved from version 135 of dbSNP.
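The SNP-level criterion from Section 2.3.1, the chance of observing a relevant disparity between donor and patient, can be estimated from allele frequencies. The following is a minimal sketch and not the computation used in the paper: it assumes a biallelic SNP, Hardy-Weinberg genotype proportions, and an unrelated donor and patient whose genotypes are independent (this does not hold for sibling donors). The function names and the allele labels "A" and "B" are hypothetical.

```python
def genotype_frequencies(allele_freq):
    """Hardy-Weinberg genotype frequencies for a biallelic SNP.

    `allele_freq` is the frequency p of allele "A"; allele "B" has 1 - p.
    (Assumption for this sketch; real populations may deviate.)
    """
    p = allele_freq
    q = 1.0 - p
    return {"AA": p * p, "AB": 2.0 * p * q, "BB": q * q}


def mismatch_probability(allele_freq):
    """Probability that the patient carries an allele the donor lacks,
    assuming donor and patient genotypes are drawn independently."""
    genotypes = genotype_frequencies(allele_freq)
    total = 0.0
    for donor, donor_freq in genotypes.items():
        for patient, patient_freq in genotypes.items():
            # Relevant for alloHCT only if the patient has a donor-absent allele.
            if set(patient) - set(donor):
                total += donor_freq * patient_freq
    return total


if __name__ == "__main__":
    for maf in (0.01, 0.1, 0.3, 0.5):
        print(f"MAF={maf:.2f} -> P(mismatch)={mismatch_probability(maf):.4f}")
```

Under these assumptions the probability is largest for balanced allele frequencies, which is in line with the reasoning behind a minor allele frequency threshold.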

Table 2
Number of predicted miHAs for versions 132 and 135 of dbSNP for the 79 genes proposed in Hombrink et al. (2011). Genes and SNPs are counted if they lead to at least one peptide that is predicted to bind to at least one of the HLA alleles. For the 79 genes in our analysis 654 non-synonymous SNPs were found in version 132 of dbSNP; in version 135 of dbSNP we found 2036 SNPs. If we consider the ten alleles listed above we predict miHAs for almost all genes in the list. With an increasing number of SNPs that go into the analysis we also obtain a larger number of predicted miHAs. The number of predicted miHAs is already too large for experimental validation and the number of known SNPs will keep on increasing in the future as a result of large genome sequencing projects. Many of the new SNPs will however show low validated population frequencies. Prioritizing SNPs to include only relevant ones in the prediction of miHA candidates gains importance with increasing numbers of known SNPs.

                                             dbSNPv132    dbSNPv135
SNPs found                                   654          2036
9mers produced                               14,562       62,655
SNPs with predicted miHAs (10 alleles)       493          1598
SNPs with predicted miHAs (A*02:01)          201          651
Genes with predicted miHAs (10 alleles)      75           77
Genes with predicted miHAs (A*02:01)         61           73

3.1.1.1. SNP allele frequencies

A prerequisite for a miHA being present in a donor–patient pair is a disparity in a SNP between the two individuals. The chance of observing a disparity is higher if the allele frequencies of a SNP are balanced. Many SNPs in dbSNP are annotated with allele frequencies. SNPs can be filtered or prioritized based on the

minor allele frequency. By requiring a minor allele frequency of at least 0.3 we can reduce the number of SNPs to consider from 1598 to 55. This simple threshold can be refined by computing the probability of a mismatch, i.e. the probability of drawing a disparate SNP in two individuals based on allele and haplotype frequencies.

3.1.1.2. Number of predicted miHAs per SNP

The number of different peptides that are affected by a SNP and are predicted to bind to at least one of the HLA alleles under consideration varies between 1 and 40. A single SNP can lead to at most 18 different peptides of length nine (nine overlapping ninemers for each of the two alleles). A larger number of peptides per SNP is observed if two SNPs lie within a window of nine amino acids. We then consider all possible peptides that arise from the combination of the two SNPs. By requiring a SNP to produce at least five predicted miHAs we can reduce the number of SNPs from 1598 to 423.

3.1.1.3. Number of HLA alleles covered by a SNP

We define an HLA allele to be covered by a SNP if at least one of the peptides derived from the SNP is predicted to bind to this HLA allele. We count the number of different HLA alleles that are covered by a SNP following this definition. We find SNPs that cover nine out of the ten alleles, while other SNPs cover only one allele. For some SNPs the number of covered alleles notably exceeds the number of predicted miHAs. This indicates that we observe promiscuous miHAs that are predicted to be relevant for more than just one HLA allele. These miHAs have a higher potential to be used in therapies in a large number of patients with different HLA alleles.

3.1.2. Finding miHAs in an individualized setting

The second scenario is a personalized setting where disparate SNPs for a donor–patient pair are known from the start. To demonstrate the validity of our pipeline in an individualized setting we apply our miHA identification pipeline to experimental data provided by van Bergen et al. (2010).
For two patient–donor pairs we are given a list of SNP disparities and the HLA typing of the patients. We use the SNPs that were shown to be associated with a miHA (Table 3 in van Bergen et al. (2010)). Polymorphic peptides of length nine are generated around the SNPs taking into account the genotypes of the respective patient and donor. We compare our predictions to the results reported by van Bergen et al. (2010). The results of the comparison are summarized in Table 3.

Table 3
Identification of the miHAs identified by van Bergen. The first six columns list the information taken from Table 3 in van Bergen et al. (2010). The last column lists the peptides identified as potential miHA by our pipeline. We can reproduce all miHAs that match our requirements.

Clone type   Gene       SNP          Polymorphism   HLA       Predicted HLA binding peptide⁎   miHA predicted by our pipeline
H1           WNK1       rs12828016   Ile/Met        A⁎02:01   TLSPEIITV                        TLSPEIITV
H2           SSR1       rs10004      Ser/Leu        A⁎02:01   SLAVAQDLT ⁎⁎                     –
H3           PRCP       rs2298668    Asp/Glu        A⁎02:01   FMWDVAEDL                        FMWDVAEDL
H8           ERAP1      rs26653      Arg/Pro        B⁎07:02   HPRQEQIAL                        HPRQEQIAL
H9           BCAT2      rs11548193   Arg/Thr        B⁎07:02   QPRRALLFV                        QPRRALLFV
H9           BCAT2      rs11548193   Arg/Thr        B⁎07:02   GVSQPRRAL                        GVSQPRRAL
H10          ARHGDIB    rs4703       Arg/Pro        B⁎07:02   LPRACWREA ⁎⁎⁎                    –
H11          PDCD11     rs2986014    Phe/Leu        B⁎07:02   GPDSSKTFL                        GPDSSKTFL
Z1           GEMIN4     rs4968104    Val/Glu        B⁎07:02   FPALRFVEV                        FPALRFVEV
Z2           EBI3       rs4740       Ile/Val        B⁎07:02   RPRARYYIQ ⁎⁎                     RARYYIQVA
Z3           APOBEC3B   rs2076109    Lys/Glu        B⁎07:02   KPQYHAEMC ⁎⁎                     –

⁎ Peptides listed in the column “Predicted HLA binding peptides” from Table 3 in van Bergen et al. (2010). We only included peptides of length 9.
⁎⁎ Peptide not predicted to bind by netMHC, although they are listed in the predicted binder column by van Bergen.
⁎⁎⁎ Peptide generated from alternative reading frame.

van Bergen et al. (2010) found seven miHAs for patient H and three miHAs for patient Z. We excluded one of the miHAs for patient H (clone H2) from the analysis since van Bergen et al. (2010) could not find a peptide derived from the respective SNP that is predicted to bind to one of the patient's HLA alleles. (The peptide reported in the column “Predicted HLA binding peptides” in Table 3 in van Bergen et al. (2010) for this clone is actually not predicted to bind and was manually selected for testing due to anchor positions that match a binding motif for HLA-A*02:01. The peptide could however not be confirmed as a miHA. Personal communication with C. van Bergen.) A second miHA for patient H (clone H10) was excluded since it is derived from an alternative reading frame translation and cannot be detected with our settings. The remaining five miHAs for patient H were correctly identified by our pipeline.

We could directly reproduce one of the miHAs for patient Z (Z1). We could not reproduce the peptides of length nine for Z2 and Z3. The nonameric peptide for Z2 is not predicted as a binder. These peptides therefore cannot be detected by our pipeline. We could however identify a predicted binder of length nine for the same allele that is contained in the peptides of lengths 10 and 11 reported by van Bergen et al. (2010). For miHA Z3 we did not find a binding ninemer but could reproduce the peptides of lengths 10 and 11. These results show that we are able to reproduce all miHAs identified by van Bergen et al. (2010) that match our criteria.

The number of disparate SNPs provided by van Bergen et al. (2010) is rather small. The general availability of information on genetic variation will increase in the future. With advances in sequencing technologies, sequencing of complete exomes or even genomes for donor–patient pairs will rapidly become a standard technique. The broad application of the above-described individualized setting will therefore gain importance.
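The peptide-generation part of the individualized analysis described in this section can be sketched as follows. This is an illustration under stated assumptions rather than the FRED-based implementation: the function names are hypothetical, the input is a protein sequence with a single amino acid position affected by the SNP, translation from transcripts is skipped, and the subsequent HLA binding prediction is omitted.

```python
def windows_covering(sequence, position, length=9):
    """All peptides of `length` in `sequence` that contain the residue
    at 0-based index `position`."""
    first_start = max(0, position - length + 1)
    last_start = min(position, len(sequence) - length)
    return {sequence[s:s + length] for s in range(first_start, last_start + 1)}


def apply_variant(sequence, position, amino_acid):
    """Return `sequence` with the residue at `position` replaced."""
    return sequence[:position] + amino_acid + sequence[position + 1:]


def patient_unique_peptides(protein, position, donor_aas, patient_aas, length=9):
    """Polymorphic peptides present in the patient but not in the donor.

    `donor_aas` and `patient_aas` are the amino acids encoded at `position`
    by the alleles of donor and patient (one entry if homozygous, two if
    heterozygous).
    """
    donor_peptides = set()
    for aa in donor_aas:
        donor_peptides |= windows_covering(apply_variant(protein, position, aa), position, length)
    patient_peptides = set()
    for aa in patient_aas:
        patient_peptides |= windows_covering(apply_variant(protein, position, aa), position, length)
    return patient_peptides - donor_peptides


if __name__ == "__main__":
    toy_protein = "ACDEFGHIKLMNPQRSTVWY"  # made-up sequence, not a real protein
    # Donor homozygous for M at index 10, patient heterozygous M/K.
    for peptide in sorted(patient_unique_peptides(toy_protein, 10, {"M"}, {"M", "K"})):
        print(peptide)
```

Only the peptides returned by such a step would be passed on to HLA binding prediction for the alleles shared by donor and patient. Swapping the donor and patient arguments yields an empty set in the toy example, which mirrors the direction dependence of the transplant setting.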

4. Conclusions and discussion

Here we have presented an efficient and flexible immunoinformatics pipeline for analysis of miHAs, with the aim to facilitate both large-scale discovery of miHAs and selection of donor–patient pairs in the clinic. The application of computational prediction methods offers new approaches for therapies by the quick identification of targets for large groups of patients irrespective of the frequency of their HLA alleles. The validity of the proposed in silico approach was shown by direct comparison with data obtained from experimental studies and yielded nearly perfect agreement.

Selecting a set of genes to start with is often based on a comparison between two states, for example normal/cancer or hematopoietic/other tissue. The origin of such data might be microarray or RNA-Seq experiments. A careful selection at this stage might reduce the risk of GvHD in later stages. Extracting SNPs from dbSNP for a set of genes also poses interesting questions about allele frequency and population distribution. Due to the amount of data generated by NGS methods, the number of reported SNPs also increases constantly. Ten years ago it might have made sense to utilize all SNPs one could obtain information about, but in the future the number of reported low-frequency SNPs will be too large. An interesting approach is to do a SNP comparison of donors and patients based on actual sequencing. This is becoming feasible as the price and time needed for next-generation sequencing decrease. In such cases it is likely that very specific differences between the donor and the patient can be identified.

Determining SNP effects in terms of amino acid changes requires knowledge about transcript structure, but in many cases it is a rather straightforward process to calculate amino acid effects. A remaining challenge is the reported occurrence of miHAs generated by frame-shift mutations and proteasomal splicing of peptides. Prediction of HLA binding requires good algorithms and careful selection of cutoff values and allele coverage. It would also be possible to extend prediction to events such as proteasomal cleavage and TAP transport (Donnes and Kohlbacher, 2005). Furthermore, one can include prediction of T-cell reactivity and similarity to self-proteins, in order to increase the specificity of the selected miHAs.

Our clinical examples focus on the identification of a patient-specific and a donor-specific set of miHA variants. The computational analysis leads to the identification of patient–donor-specific miHA disparities that might be potential targets for GvL therapy. In vitro and in vivo studies have shown potent anti-tumor T-cell-mediated effects, capable of inducing GvL activity without causing GvHD.
miHA-specific T cells that target epitopes restricted to a relevant (tumor) tissue entity can be induced in vivo by vaccine cocktails that comprise sets of immunogenic miHA variants present in the patient but not in the donor. T cells of these specificities, generated in the healthy immune system of the respective donors, can be isolated for use in adoptive immunotherapy of a significant proportion of patients with relapse of AML or ALL after alloHCT. This approach not only offers a maximally individualized therapeutic option but also significantly reduces the adverse side effects

of conventional immune therapeutic treatment, and will significantly improve survival.

Acknowledgments

This study was funded in part by Deutsche Forschungsgemeinschaft (SFB685), BMBF (iVacALL, grant 01GU1106) and Deutsche Kinderkrebsstiftung (grant DKS2010.10).

References

Appelbaum, F.R., 2003. The current status of hematopoietic cell transplantation. Annu. Rev. Med. 54, 491.
Bleakley, M., Riddell, S.R., 2004. Molecules and mechanisms of the graft-versus-leukaemia effect. Nat. Rev. Cancer 4, 371.
Donnes, P., Kohlbacher, O., 2005. Integrated modeling of the major events in the MHC class I antigen processing pathway. Protein Sci. 14, 2132.
Feldhahn, M., Donnes, P., Thiel, P., Kohlbacher, O., 2009. FRED—a framework for T-cell epitope detection. Bioinformatics 25, 2758.
Feng, X., Hui, K.M., Younes, H.M., Brickner, A.G., 2008. Targeting minor histocompatibility antigens in graft versus tumor or graft versus leukemia responses. Trends Immunol. 29, 624.
Ferrara, J.L., Levine, J.E., Reddy, P., Holler, E., 2009. Graft-versus-host disease. Lancet 373, 1550.
Gonzalez-Galarza, F.F., Christmas, S., Middleton, D., Jones, A.R., 2011. Allele frequency net: a database and online repository for immune gene frequencies in worldwide populations. Nucleic Acids Res. 39, D913.
Goulmy, E., 2006. Minor histocompatibility antigens: from transplantation problems to therapy of cancer. Hum. Immunol. 67, 433.
Hombrink, P., Hadrup, S.R., Bakker, A., Kester, M.G., Falkenburg, J.H., von dem Borne, P.A., Schumacher, T.N., Heemskerk, M.H., 2011. High-throughput identification of potential minor histocompatibility antigens by MHC tetramer-based screening: feasibility and limitations. PLoS One 6, e22523.
Nielsen, M., Lundegaard, C., Blicher, T., Lamberth, K., Harndahl, M., Justesen, S., Roder, G., Peters, B., Sette, A., Lund, O., Buus, S., 2007. NetMHCpan, a method for quantitative predictions of peptide binding to any HLA-A and -B locus protein of known sequence. PLoS One 2, e796.
Pruitt, K.D., Tatusova, T., Klimke, W., Maglott, D.R., 2009. NCBI Reference Sequences: current status, policy and new initiatives. Nucleic Acids Res. 37, D32.
Sayers, E.W., Barrett, T., Benson, D.A., Bolton, E., Bryant, S.H., Canese, K., Chetvernin, V., Church, D.M., DiCuccio, M., Federhen, S., Feolo, M., Fingerman, I.M., Geer, L.Y., Helmberg, W., Kapustin, Y., Landsman, D., Lipman, D.J., Lu, Z., Madden, T.L., Madej, T., Maglott, D.R., Marchler-Bauer, A., Miller, V., Mizrachi, I., Ostell, J., Panchenko, A., Phan, L., Pruitt, K.D., Schuler, G.D., Sequeira, E., Sherry, S.T., Shumway, M., Sirotkin, K., Slotta, D., Souvorov, A., Starchenko, G., Tatusova, T.A., Wagner, L., Wang, Y., Wilbur, W.J., Yaschenko, E., Ye, J., 2011. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 39, D38.
Sherry, S.T., Ward, M.H., Kholodov, M., Baker, J., Phan, L., Smigielski, E.M., Sirotkin, K., 2001. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 29, 308.
Shlomchik, W.D., 2007. Graft-versus-host disease. Nat. Rev. Immunol. 7, 340.
Toussaint, N.C., Feldhahn, M., Ziehm, M., Stevanovic, S., Kohlbacher, O., 2011. T-cell epitope prediction based on self-tolerance. Proc. ICIW.
Tung, C.W., Ziehm, M., Kamper, A., Kohlbacher, O., Ho, S.Y., 2011. POPISK: T-cell reactivity prediction using support vector machines and string kernels. BMC Bioinformatics 12, 446.
van Bergen, C.A., Rutten, C.E., van der Meijden, E.D., van Luxemburg-Heijs, S.A., Lurvink, E.G., Houwing-Duistermaat, J.J., Kester, M.G., Mulder, A., Willemze, R., Falkenburg, J.H., Griffioen, M., 2010. High-throughput characterization of 10 new minor histocompatibility antigens by whole genome association scanning. Cancer Res. 70, 9073–9083.