Resolving MiSeq-Generated Ambiguities in HLA-DPB1 Typing by Using the Oxford Nanopore Technology

Resolving MiSeq-Generated Ambiguities in HLA-DPB1 Typing by Using the Oxford Nanopore Technology

The Journal of Molecular Diagnostics, Vol. 21, No. 5, September 2019 jmd.amjpathol.org Resolving MiSeq-Generated Ambiguities in HLA-DPB1 Typing by U...

1MB Sizes 0 Downloads 34 Views

The Journal of Molecular Diagnostics, Vol. 21, No. 5, September 2019

jmd.amjpathol.org

Resolving MiSeq-Generated Ambiguities in HLA-DPB1 Typing by Using the Oxford Nanopore Technology Jamie L. Duke,* Timothy L. Mosbruger,* Deborah Ferriola,* Nilesh Chitnis,* Taishan Hu,* Nikolaos Tairis,* David J. Margolis,yz and Dimitri S. Monos*x From the Immunogenetics Laboratory,* Department of Pathology and Laboratory Medicine, Children’s Hospital of Philadelphia, Philadelphia; and the Departments of Dermatologyy and Biostatistics and Epidemiologyz and Pathology and Laboratory Medicine,x Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania Accepted for publication April 30, 2019. Address correspondence to Dimitri S. Monos, Ph.D., Immunogenetics Laboratory, Department of Pathology and Laboratory Medicine, 3615 Civic Center Blvd., Children’s Hospital of Philadelphia, Philadelphia, PA 19104. E-mail: [email protected].

The technical limitations of current next-generation sequencing technologies, combined with an everincreasing number of human leukocyte antigen (HLA) alleles, form the basis for the additional ambiguities encountered at an increasing rate in clinical practice. HLA-DPB1 characterization, particularly, generates a significant percentage of ambiguities (25.5%), posing a challenge for accurate and unambiguous HLA-DPB1 genotyping. Phasing of exonic heterozygous positions between exon 2 and all other downstream exons has been the major cause of ambiguities. In this study, the Oxford Nanopore MinION, a third-generation sequencing technology, was used to resolve the phasing. The accurate MiSeq sequencing data, combined with the long reads obtained from the MinION platform, allow for the resolution of the tested ambiguities. (J Mol Diagn 2019, 21: 852e861; https://doi.org/10.1016/ j.jmoldx.2019.04.009)

The introduction of high-throughput single DNA molecule sequencing technologies [namely, next-generation sequencing (NGS)] in the field of immunogenetics has significantly affected the practice of human leukocyte antigen (HLA) typing.1 One of the major consequences is the elimination of almost all ambiguities observed when legacy methods are used for typing. Nevertheless, as our experience expands regarding the use of NGS for HLA typing, it becomes apparent that the ever-increasing number of alleles, combined with the technical limitation of Illumina (San Diego, CA) platforms to sequence and provide credible phase information for distances >800 bases,1e3 generates circumstances of new ambiguities that interfere with the unequivocal reporting of HLA genotyping. One locus that is particularly notorious for generating ambiguities is the HLA-DPB1 locus. This locus is sequenced in our laboratory as it is clinically relevant to both allogeneic hematopoietic stem cell transplantation and solid organ transplantation.4e13 However, in the course of sequencing the HLA-DPB1 gene in the clinical setting, often

it is not possible to report the genotype unambiguously. Over the past 5 years, >5700 samples have been sequenced for DPB1 and an average of 25.5% of all samples have at least one allele reported ambiguously. There are two types of ambiguity possible for the DPB1 locus: the first is because the current assay does not characterize exon 1 and alleles whose only difference is in exon 1 are not distinguished, and the second type of ambiguity is due to the inability to set phase between the heterozygous positions within the exons sequenced (exons 2 to 5). The rate of reported ambiguity for the DPB1 gene in our laboratory has dramatically increased in that time frame from an overall ambiguity rate of 17.8% in 2014 (type 1, 6.1%; type 2, 11.7%) to 34.0% in 2019 (type 1, 7.9%; type 2, 26.1%). The increase is directly related to the increase in the number of Supported by NIH grant R01AR070873 (D.J.M. and D.S.M.) and Children’s Hospital of Philadelphia (D.S.M.). Disclosures: D.S.M. is a consultant to, and owns options in, Omixon; D.S.M., D.F., and J.L.D. receive royalties from Omixon.

Copyright ª 2019 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved. https://doi.org/10.1016/j.jmoldx.2019.04.009

Oxford Nanopore Resolves DPB1 Ambiguity alleles that have been reported and characterized beyond exon 2 to the international ImMunoGeneTics information system HLA(IMGT/HLA) database14 in that time frame. Herein, we focus on the second type of ambiguity, where initially an unphased set of two alleles only had one combination of alleles that satisfied the set of heterozygous positions in the data. However, as time has progressed, new alleles have been described that also satisfy the same set of heterozygous positions but in different cis/trans combinations. For example, a sample in 2014 would have been typed as DPB1*01:01:01 þ DPB1*11:01:01 unambiguously as no other pair of alleles could explain the set of heterozygous positions present in the data; however, in 2017, the DPB1*654:01 allele was described, and when this newer allele is combined with the DPB1*417:01 allele, they also have the same set of heterozygous positions as the DPB1*01:01:01 þ DPB1*11:01:01 combination (Figure 1D). The difference between the two allele combinations is the arrangement of a polymorphic position located in exon 4, the transmembrane domain, with the polymorphic positions in exon 2 that are 4.8 kb apart, and phase is broken by a 1.6-kb homozygous region in intron 2 (Figure 1D). Therefore, unless all positions across the sequenced region are phased, the two possible

combinations of alleles cannot be discerned from each other. Within the new alleles that are being published for the DPB1 locus, it is not uncommon to find the same elements from other, often common, alleles rearranged to form these new alleles. This can be accomplished through multiple mechanisms, including recombination and gene conversion, both of which have been well-described mechanisms at play within the major histocompatibility complex as a way to increase allelic diversity, allowing for the possibility of millions of alleles per locus to exist.15e20 In the case of DPB1, gene conversion has been proposed as a way in which the six hypervariable regions found within exon 2 are shuffled between alleles to generate diversity.21e23 However, this is only used to explain the exchange of small regions of DNA, subsegments of exons, and does not explain larger rearrangements, which have been generated, presumably, through larger recombination events. Recent NGS sequencing characterization of the complete genomic sequence of DPB1 alleles shows that in the region between exon 3 and the 30 untranslated region (UTR) of the DPB1 alleles are clustered into two distinct clades of alleles that are independent of the clustering observed from the same set of alleles for exon 2, the antigen recognition domain.24,25 The two clusters of alleles described segregate with the single-nucleotide

Figure 1

AeF: Schematic of six different ambiguous allele combinations examined. The amplicon sequenced starts in intron 1, just upstream of exon 2, and continues into the 30 untranslated region (UTR). Exons 2 to 5 are shown as tall boxes with the exon number found in A (E2, E3, E4, and E5), introns are shown as horizontal lines, and the 30 UTR extends out of E5 as a shorter box. Heterozygous positions are shown as vertical lines (black, red, and blue), and gray regions indicate there are many heterozygous positions, not represented individually, and the region is phased. Dashed lines in the introns represent breaks in phase and the distance to the next heterozygous position labeled with the respective length. The red (known heterozygous position) and blue (novel position) vertical bars correspond to the heterozygous positions used in Table 1.

The Journal of Molecular Diagnostics

-

jmd.amjpathol.org

853

Duke et al polymorphism (SNP) rs9277534 that serves as a marker for DPB1 expression.26,27 Phase is often broken in intron 2, which is approximately 4 kb in length, and can be either sparsely or densely populated with heterozygous positions, depending on the combination of alleles, and falls in line with the observations regarding the evolution of the DPB1 locus.24,25 To reduce the rate at which ambiguities are reported for DPB1, phasing of intron 2 is of utmost importance and short-read technologies cannot completely address the problem. Therefore, we explored long-read sequencing methods to combat the ambiguity problem, keeping in mind the needs of clinical HLA laboratories for methods that are accurate, are fast, allow multiplexing, are easy to use, and permit reading long fragments of DNA. Therefore, it was evaluated whether the combination of MiSeq sequencing together with Oxford Nanopore Technology (ONT; Oxford, UK) MinION sequencing offers a way to accurately resolve ambiguities in HLA-DPB1 genotyping. In general, nanopore sequencing is a technology in which DNA is transported through nanoscale-sized pores. Nanopore holes can be proteinaceous (biological nanopores) or solid (solid-state nanopores).28,29 The R9 chemistry of ONT flow cells uses Escherichia coli mutant curli production assembly/transport protein CsgG lipoprotein that has been engineered for DNA translocation. The protein nanopore is placed in electrically resistant polymer membrane, and a current passes through the nanopore. A disturbance in electric current due to passing of DNA molecule through nanopore is used to identify the different bases. Each of the combination of bases produces a different pattern of disturbance in the current flow, and this allows identification of nucleotides as DNA moves through the single pore. A major difference between ONT and Illumina is read length. ONT allows large amplicon fragments to be sequenced in their entirety, whereas Illumina platforms have limitations with respect to sequencing long DNA fragments, as previously mentioned (Introduction), and the read length does not exceed 500 bases, whether as a continuum or paired ends of a fragment. ONT sequencing is also relatively fast. However, the major limitation of sequencing with the R9 chemistry from ONT is that it is highly error prone, and the instrument makes systematic errors.30 Others have shown the strength of combining ONT data with Illumina data to produce high-quality data.31,32 For our purposes of phasing, using Illumina MiSeq data to phase between distal heterozygous positions requires phasing both intronic and exonic heterozygous positions, often including regions with low-quality sequencing data (including homopolymers and short tandem repeats); and because the Illumina fragment sizes are small (300 to 800 bases), many fragments are needed to tile from one heterozygous position to the next. However, with the ONT MinION data, in which a single read encompasses the entire region of interest, there is freedom to choose the appropriate heterozygous positions and bypass regions that are difficult to sequence, in particular low-complexity regions often found in the intronic

Initial HLA-DPB1 genotyping for each sample was performed as part of a pool with other HLA genes using Omixon Holotype V2 HLA kits (Omixon, Budapest, Hungary). Briefly, reagents from Qiagen LR PCR kits (Qiagen, Valencia, CA) were combined with kit primers for amplification; and PCR was performed on a ThermoFisher Veriti thermal cycler (ThermoFisher, Waltham, MA), according to the Holotype V2 protocol. The amplicon excludes exon 1 containing the leader peptide, starts in intron 1, continues through to the 30 UTR of the gene, containing exon 2 (antigen presentation domain), exon 3 (extracellular domain proximal to the membrane), exon 4 (transmembrane domain), exon 5 (cytoplasmic tail), and the 30 UTR, which includes the SNP rs9277534 that has been a identified as a marker for HLA-DPB1 high/low expression. Library preparation for NGS was performed using the standard library preparation method and reagents from Holotype V2 HLA kits. Amplicons were quantitated with the QuantiFluor dsDNA system (Promega, Madison, WI) and diluted to approximately 150 ng/mL and then equal volumes were pooled to 35 mL. After cleaning the diluted 35 mL amplicon pools with 4 ml ExoSAP Express (ThermoFisher), the diluted amplicons were then enzymatically fragmented, end repaired, ligated to indexed adaptors, pooled, concentrated with 1X AMPure XP beads (Beckman Coulter, Indianapolis, IN), and size selected on a Blue Pippin (Sage Science, Beverly, MA). The size-selected library concentration was measured by quantitative PCR with KAPA library quantification kits (KAPA, Wilmington, MA) diluted to 9 pM (picomoles) and sequenced on an Illumina MiSeq using paired-end 2  250 V2 sequencing chemistry. The DPB1 loci for these samples were then reamplified and sequenced as individual locus libraries, as

854

jmd.amjpathol.org

sequences. Therefore, we have explored a combination of Illumina MiSeq, high-quality data with short fragment sizes, and ONT MinION sequencing, error prone but long reads, to resolve the ambiguities present within the HLA-DPB1 gene and aid in novel allele discovery.

Materials and Methods Sample Selection HLA-DPB1 ambiguity is a common problem observed through routine HLA genotyping. Fifteen samples, in which a common ambiguity or novelty at HLA-DPB1 was observed in routine testing, were chosen to undergo ambiguity resolution with Oxford Nanopore sequencing. Thirteen samples were from clinical specimens, whereas two samples were from a research study regarding atopic dermatitis. Inclusion of all deidentified subjects for this study has received institutional review board approval.

Amplicon Preparation and Sequencing on Illumina MiSeq

-

The Journal of Molecular Diagnostics

Oxford Nanopore Resolves DPB1 Ambiguity described above, using paired-end, 2  150 V2 sequencing chemistry to obtain higher depth and, therefore, increase the chances for phasing distant polymorphisms (Results). NGS data were demultiplexed and FASTQ files were generated by MiSeq Reporter on the platform. FASTQ files were analyzed with Omixon’s Twin software version 2.5.1 (IMGT/HLA version 3.29.0.1_5) using 7000 pairs of reads for the DPB1 locus and GenDx’s NGSengine software version 2.9.1 (IMGT/HLA version 3.31; GenDx, Utrecht, the Netherlands) using 100,000 pairs of reads per sample.

ONT MinION Sequencing DPB1 amplicons from 14 samples with ambiguous DPB1 allele combinations and one sample with a novelty that could not be phased were then sequenced on an ONT MinION to obtain a fully phased sequence. Twelve of these samples were sequenced in a single run using an ONT 1D Native barcoding kit for genomic DNA with the manufacturer’s protocol version NBE_9006_v103_revQ_21Dec2016 beginning at the end repair step. The three other DPB1 amplicons were processed in subsequent MinION runs. In short, 2 mg of each amplicon was end repaired and dA tailed with New England Biolabs (NEB; Ipswich, MA) Ultra II End-prep enzyme and buffer, ligated to ONT native barcodes with NEB Blunt/TA Ligase Master Mix, pooled equimolar, and then ligated to ONT Barcode Adaptor Mix with NEB NEBNext5X Quick Ligation Reaction Buffer and Quick T4 DNA Ligase. Between each step in the ONT protocol, reactions were cleaned with AMPure XP beads. The final adaptor-ligated, barcoded pool was loaded onto an ONT SpotON SQK-LSK108 flow cell on the MinION, and amplicon strands were processed through the nanopores on the flow cell for 2 hours using MinKNOW software version 1.13.1 (Oxford Nanopore Technologies). Local base calling was performed using MinKNOW, and the resulting FASTQ files were demultiplexed and adapter trimmed using Porechop version 0.2.3 (R. Wick, Melbourne, VIC, Australia; https://github.com/rrwick/Porechop, last accessed May 1, 2018). ONT reads with lengths between 6500 and 7400 bp were retained for further analysis.

ONT MinION Sequence Analysis and Consensus Generation Illumina FASTQ files were analyzed for DPB1 with Omixon Twin version 2.5.1 (IMGT 3.29.0.1_5) using 7000 paired reads per sample. The sequence and quality information for reads assigned to DPB1 were extracted from the Omixion HTR file and written out in FASTQ format. Hybrid correction of the ONT reads with Illumina reads was performed with FMLRC version 0.1.2 using default settings.33 MAFFT version 7.394 was used to generate a multiple sequence alignment (MSA) on 100 randomly selected corrected ONT reads per sample.34 The

The Journal of Molecular Diagnostics

-

jmd.amjpathol.org

adjustdirection argument was used to account for unstranded reads, and the first read in the input file was selected to be in the sense orientation to keep the direction of the MSA consistent. The function msaConsensusSequence from the Bioconductor package msa version 1.10.0 was used to generate a consensus sequence from the MAFFT MSA.35 The resulting consensus sequence was cleaned by removing gaps and replacing question marks with N. A total of 1000 corrected ONT reads were aligned to the consensus sequence using minimap236 version 2.10, and the alignments were left aligned using the function LeftAlignIndels from the Genome Analysis Toolkit version 3.3.0.37 Two variants were selected for each ambiguous pair, the last variant in exon 2 and the next exonic variant in exon 3, 4, or 5. Intronic variants were ignored. A custom Python script using the pysam module version 0.14 (https://pypi.org/ project/pysam, last accessed February 1, 2018) was used to iterate over each aligned read and record the observed bases at the two variant positions. The counts of the base combinations representing the ambiguous pairs were used to calculate a F (Phi) coefficient.

Results Given the high ambiguity rate in the reported typing for the HLA-DPB1 gene using short-read technology, five different common ambiguities across 14 samples and an unphased novel allele using the Oxford Nanopore MinION were examined. The five allele combinations that could not be phased with the Illumina platform were ambiguous and chosen for further study: DPB1*02:01:02 þ DPB1*04:02:01 versus DPB1*105:01:01 þ DPB1*416:01:01 (Figure 1A); DPB1*03:01:01 þ DPB1*04:01:01 versus DPB1*124:01:01 þ DPB1*350:01 (Figure 1B); DPB1*04:01:01 þ DPB1*04:02:01 versus DPB1*126:01:01 þ DPB1*105:01:01 (Figure 1C); DPB1*01:01:01 þ DPB1*11:01:01 versus DPB1*417:01 þ DPB1*654:01 (Figure 1D); and DPB1*04:01:01 þ DPB1*104:01:01 versus DPB1*124:01:01 þ DPB1*702:01 (Figure 1E). The final sample included in this study has a novel mutation found in exon 5 in a sample that otherwise types as DPB1*02:01:02 þ DPB1*04:01:01:13 (Figure 1F). To determine the allele to which the novelty belongs, it must be phased to heterozygous positions in exon 2. In the allele combinations tested, the heterozygous positions in exon 2 (Figure 1) could not be phased to heterozygous positions in exon 3, 4, or 5 (Figure 1) because of long distances between polymorphic positions, >650 bases in the clinical setting. Technical limitations of the Illumina sequencing method make it difficult to sequence DNA fragments >650 bases with enough depth to determine phasing. All samples selected for this study were initially sequenced on the Illumina MiSeq with paired-end sequencing through routine testing, and specific polymorphisms were

855

Duke et al unphased. The method used to sequence HLA genes specifically aims to provide a wide range of DNA fragment sizes to be able to phase heterozygous positions not necessarily located on the same physical read but located on the same fragment, providing phasing of positions that are <700 bases apart and with adequate depth of coverage. To maximize the potential of the current HLA typing protocol, the DPB1 locus from each sample was resequenced as a single amplicon, obtaining a higher depth of coverage that can aid in the phasing of longer distances. Figure 2A shows the distribution of the length of DNA fragments for each sample at HLADPB1 from the resequencing effort. Some DNA fragments that were sequenced are >650 bases in length, ranging from 11.98% to 20.13% of the total fragments sequenced (Figure 2A). Only one sample, S14 (Figure 1E), was phased with the higher depth of coverage achieved through the resequencing effort, whereby the polymorphic positions that were 670 bases apart were now phased. This sample had the highest number of fragments used for analysis (approximately 69,000), where the median fragment size was 464 bases and 15.87% of fragments were 650 bases. The overall depth of coverage coupled with the profile of fragment sizes allowed phasing of this locus when all possible fragments were accounted for during the analysis. For all other samples, which had larger distances to phase and fewer fragments available for analysis, the reanalysis did not improve the phasing. Taken together, it is difficult, if not impossible, to phase the polymorphic positions observed in the ambiguous allele combinations using the Illumina data alone. To secure phase between exon 2 and all other distal exons in the HLA-DPB1 gene, the samples that presented us with phasing challenges on the Illumina MiSeq were then sequenced on the ONT MinION. The benefit of the MinION technology allows for sequencing the DPB1 amplicon in its entirety, a total of approximately 7150 bases, without fragmentation, guaranteeing that the heterozygous positions will be on phase. The length of DNA fragments that were sequenced on the MinION can be seen in Figure 2B. All samples had most reads at the expected length of the amplicon. Five samples have two distinct peaks, indicating the presence of two different species in the sample, affecting samples S5, S6, S7, S8, and S14. All the alleles in these groups have members of the DPB1*03:01:01G group (DPB1*03:01:01, DPB1*104:01:01, and DPB1*124:01:01) and DPB1*04:01:01G group (DPB1*04:01:01, DPB1*350:01, and DPB1*702:01). Examination of the known DPB1*03:01:01:01 allele compared with the known DPB1*04:01:01:01 allele shows that there is a noticeable difference in the length of the alleles in the targeted region, whereby the DPB1*03:01:01:01 allele is approximately 65 bases shorter than the DPB1*04:01:01:01 allele because of multiple deletions within the intronic and untranslated regions, accounting for the difference in amplicon lengths. To resolve the phase between exon 2 and other distal exons, a custom pipeline was designed to maximize the strengths of the ONT-derived data. Given the high error rate

inherent to the MinION sequencing, the data produced from the MiSeq sequencing were used to perform a hybrid correction of the MinION reads, which were then used to generate a consensus for alignment and phasing of the heterozygous positions (Materials and Methods). Although all heterozygous positions could be used, the highest interest lies in phasing the heterozygous positions in the exonic regions, as these positions are what cause the ambiguities with the short-read (Illumina) sequencing data. The last heterozygous position in exon 2 was phased to the next heterozygous position found in exon 3, 4, or 5, depending on the allele combination (Figure 1), which ranged from 3978 to 5360 bases. Table 1 shows the number of Nanopore reads that support each combination of nucleotides at the exonic heterozygous positions to secure phase. The percentage of reads supporting the most frequent pair of nucleotides for phasing ranged from 63.95% to 83.79%. To evaluate the strength of the phasing from the Nanopore data, a F (Phi) coefficient was calculated for these pairs of heterozygous positions. This score is based on a 2  2 contingency table, which ranges from -1.0 to 1.0, indicating both the direction of the phasing and the strength, where a score of 0 indicates no relationship and 1 indicates a perfect relationship. The 4 coefficient for phasing ranged from 0.314 (moderate positive relationship) to 0.674 (strong positive relationship). The first set of allele combinations examined, found in samples S1 to S4, found that the MinION data supported the DPB1*02:01:02 þ DPB1*04:02:01 allele combination. To phase exon 2 to exon 3, heterozygous positions between 4325 and 4345 bases apart had to be phased. The different lengths indicate that there were minor differences in the length of intron 2, depending on the sample. Between 74.70% and 83.79% of the reads support this combination of alleles in the four different samples. The corresponding 4 scores ranged from 0.506 to 0.674 and rule out the DPB1*416:01:01 þ DPB1*105:01:01 allele combination. The trends are similar for the two other combinations of alleles that were sequenced multiple times. In these cases, all samples in each group support the same phasing combination DPB1*03:01:01 þ DPB1*04:01:01 allele pair (S5 to S8) and DPB1*04:01:01 þ DPB1*04:02:01 (S9 to S12). The two allele pairs that were sequenced a single time, S13 and S14, support DPB1*01:01:01 þ DPB1*11:01:01 and DPB1*04:01:01 þ DPB1*104:01:01 allele combinations. A single sample, S15, had a novelty that was unable to be phased to known heterozygous positions in other exons with the short-read sequencing of the Illumina system. The novelty, found in exon 5, converts codon 225 from CAA to CAG (synonymous substitution for glutamine) and is 5360 bases downstream from the final heterozygous position in exon 2. On the basis of MinION data, the novelty is phased to the DPB1*02:01:02 allele with 69.29% of the reads [F (Phi) coefficient Z 0.359]. The distance between heterozygous positions in this sample is the longest phasing

856

jmd.amjpathol.org

-

The Journal of Molecular Diagnostics

Oxford Nanopore Resolves DPB1 Ambiguity

Figure 2

Distribution of read lengths in the sequencing experiments. Samples are grouped according to the HLA-DPB1 ambiguous allele combinations in the header. A: Illumina MiSeq sequencing: the vertical dotted lines correspond to 650 bases, with the percentage of reads that are 650 bases labeled in the top right of each panel. The vertical dashed lines represent the median for each sample and are labeled to the left of the median line. B: Read length distribution for each sample using the Oxford Nanopore MinION.

The Journal of Molecular Diagnostics

-

jmd.amjpathol.org

857

Duke et al Table 1

Phasing Statistics from the Oxford Nanopore Sequencing Results

Sample S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12 S13 S14 S15

Allele combination 1, n 02:01:02 (G-C ) 373 396 358 393 03:01:01 (G-A) 472 477 442 398 04:01:01 (C-C ) 343 403 473 366 01:01:01 (A-A) 369 04:01:01 (A-G) 326 02:01:NEW (G-G) 220

04:02:01 (A-A) 454 351 407 348 04:01:01 (A-G) 306 325 324 360 04:02:01 (A-A) 459 377 337 419 11:01:01 (G-G) 211 104:01:01 (G-A) 383 04:01:01 (A-A) 405

Allele combination 2, n 416:01:01 (G-A) 76 178 94 171 124:01:01 (G-G) 120 98 121 102 126:01:01 (C-A) 83 89 75 101 417:01 (A-G) 71 702:01 (A-A) 85 02:01:02 (G-A) 131

105:01:01 (A-C ) 84 62 130 80 350:01 (A-A) 87 89 99 126 105:01:01 (A-C ) 106 120 104 105 654:01 (G-A) 256 124:01:01 (G-G) 114 04:01:NEW (A-G) 146

Other, n

Phasing distance, bp

F (Phi) Coefficient

Reads supporting allele combination 1, %

13 13 11 8

4345 4341 4345 4325

0.674 0.534 0.548 0.506

83.79 75.68 77.35 74.70

15 11 14 14

3978 3978 4020 3978

0.569 0.613 0.548 0.538

78.98 81.09 77.69 76.88

9 11 11 9

4379 4379 4379 4379

0.614 0.579 0.632 0.583

80.93 78.87 81.90 79.21

93

4851

0.314

63.95

92

4435

0.562

78.08

98

5360

0.359

69.29

One of the ultimate goals for laboratories performing HLA genotyping is to unambiguously characterize the set of HLA alleles present in a given sample, such that once the allele has been fully interrogated, the typing will not change, independent of the version of the IMGT/HLA database of alleles. Achieving this goal enables more precise matching of patients and donors in pretransplant scenarios, aids in monitoring patients after transplantation for the development of donor-specific antibodies, and eliminates confounding factors for retrospective analyses examining the roles of HLA genes in transplantation and disease association studies. Therefore, all heterozygous positions within the full gene definition must be identified and phased, enabling the generation of a consensus sequence for each allele present for each HLA gene sequenced. Unless all positions are phased, a reported genotype that is currently unambiguous may in the future become ambiguous as more alleles are defined in the IMGT/HLA database. The introduction of NGS into clinical practice has dramatically reduced the level of ambiguity

reported across all HLA alleles and is due, in part, to the ability of the haplotypic nature of the sequencing reads to phase heterozygous positions coupled with interrogating most, if not all, of the gene. One of the pitfalls of NGS methods is that the second-generation instruments (Illumina and Ion Torrent) require short fragments for sequencing, typically <700 bases, but the HLA genes are between 3 and 16 kb in total length. Depending on the combination of alleles for each gene, the density and spacing of heterozygous positions are highly variable; and if the length of the fragments that are sequenced is shorter than the largest distance between consecutive heterozygous positions, phasing is not empirically possible. To guarantee phase across all heterozygous positions in the gene, a method that sequences the amplicons from each gene in their entirety, without fragmentation, is necessary; this is where the third-generation sequencing technologies, like Oxford Nanopore, become highly relevant. One of the challenges of using the ONT system is the high error rate, approximately 15%, which has been demonstrated to have a nonrandom error model, which cannot completely be overcome with a high depth of coverage from sequencing.30,38 To diminish this error, Illumina reads were used to correct the raw 1D MinION reads. In doing so, the noise level in each read was significantly reduced, which also reduced the probability for false identification of heterozygous positions between the consensus sequences for the two alleles and yielded clean sequences to examine the phase structure of the amplicon. After error correction, the alternative allele combination in the phasing statistics for the distal heterozygous positions ranged from 16.21% to 36.05% of the overall reads, which corresponded to a decrease in the

858

jmd.amjpathol.org

distance attempted in this study and shows one of the weakest associations and F (Phi) coefficient. The novel DPB1*02:01 allele described herein has since been characterized in the IMGT/HLA database to be DPB1*02:01:19 and no longer types as ambiguous with the DPB1*04:01:01 allele, but this demonstrates how the long reads from the MinION can be used to sequence and resolve unphased novel alleles.

Discussion

-

The Journal of Molecular Diagnostics

Oxford Nanopore Resolves DPB1 Ambiguity F (Phi) coefficient. Some evidence of PCR-mediated recombination was expected as it is observed at low levels in the shorter fragments from the Illumina sequencing data, in which a F (Phi) coefficient is generally observed to be >0.8, but the long reads generated by the MinION exemplified the extent of crossover during the initial amplification of the gene region. This concept was explored by Potapov and Ong39 in 2017, specifically looking at the extent of PCR-mediated recombination, as observed through single-molecule sequencing on the Pacific Biosciences (Menlo Park, CA) Single Molecule Real-Time sequencing technology. There, they found that between 23% and 28% of reads showed evidence of a crossover event occurring when using Taq polymerase,39 which falls in line with the proportion of reads that were observed that have the alternative allele combination. Interestingly, they also show that PCR-mediated recombination is evenly distributed across the length of the amplicon, which again is consistent with the two observations that had longer distances to phase, 4.8 and 5.3 kb, and had lower F (Phi) coefficient. Although the use of the F (Phi) coefficient does not provide a clear threshold to determine when true phasing is achieved versus noise inherent to preparing and sequencing amplified material, it is useful in assessing the strength and directionality of the phasing. To reduce this undesirable effect of PCR-mediated crossover, a reliable method capable of capturing the HLA region of interest directly and sequencing it without amplification is necessary to achieve this goal. Current capture methods do not produce enough material when only a few small regions are targeted to proceed with the library preparation steps, which are necessary to secure high enough depth of coverage to overcome the error rate inherent to the ONT technology. In this study, we have presented a method that is able to supplement routine clinical HLA typing for samples and loci that require phasing to rule out ambiguous allele combinations. The starting material for this work, the unfragmented HLA amplicons from the clinical workflow, can be directly prepared for sequencing on the ONT MinION device. Using barcodes, a maximum of 12 samples could be sequenced together in a single experiment with the Native barcoding kit, and the samples included herein were sequenced in two separate experiments. For each experiment, a sequencing time of 2 hours on the MinION was more than sufficient to generate enough reads for each of the 12 samples in the experiment to proceed with analysis shown herein. Currently, a new barcoding kit capable of sequencing 96 samples per experiment has since been released, although it relies on a slightly different protocol for introducing the barcodes to the amplicons. However, the ability to barcode and sequence a higher number of samples in one experiment is highly beneficial as batching more samples into a single sequencing experiment dramatically reduces the cost per sample for ruling out ambiguities or phasing novel alleles. Although this approach was applied to the DPB1 locus, it is applicable for any HLA amplicon that requires full phasing. It is also possible to combine

The Journal of Molecular Diagnostics

-

jmd.amjpathol.org

amplicons from multiple HLA genes of a single sample using a single barcode for sequencing on the MinION, but this requires slightly more work during the downstream analysis to separate reads from different loci before consensus generation for ambiguity resolution. The real challenge of this work lies in the downstream analysis, in which no single piece of software is able to handle all parts of the analysis for easy introduction to the clinical laboratory for ambiguity resolution. Although we have established a sequencing and analysis pipeline that is capable of phasing the DPB1 locus, it is not currently automated and relies on the use of multiple pieces of thirdparty software, especially when attempting to correct the ONT-derived reads with the higher-quality Illumina-derived reads. Furthermore, the product line for Oxford ONT is still undergoing development and is subject to changes in sequencing chemistry and base calling. As such, the best practices for analysis are also continually evolving to adapt to the changes of the sequencing technology and incorporate new tools and algorithms that are developed alongside of the product. However, even with known challenges to the entire system, this technology has proved its utility for phasing distal heterozygous positions within the HLA-DPB1 gene for ambiguity resolution and full assessment of novel alleles, and it is extensible to other genes, both HLA and nonHLA, in which similar problems are encountered. Resolving the DPB1 ambiguities by using a combination of sequencing and bioinformatic approaches addresses a technical problem in HLA typing, but is there any evidence that the unambiguous reporting is of any clinical relevance? The clinical utility that the DP locus plays in transplantation has grown significantly in the past decade. For the purpose of hematopoietic stem cell transplants, the T-cell epitope (TCE) model has become a preferred method for choosing unrelated donors that have mismatched alleles at DPB1 by determining the permissive or nonpermissive nature of the mismatches.40 This model is based on the sequence content of the antigen recognition domain, encoded by exon 2 of the gene. Considering that the type of ambiguities encountered are a combination of alleles that as pairs share the same exon 2, the TCE assessment is not influenced by the ambiguity. The DPB1 TCE model, however, does not account for the level of expression of the DPB1 alleles. A preliminary study addressing the possible association between the TCE mismatching algorithm and the presence of the rs9277534 SNP at the 30 UTR of DPB1 has shown there is a separate risk for acute graft-versus-host disease that is associated with the permissive/nonpermissive mismatched DPB1 alleles and a risk associated with the level of expression of DPB1 alleles.24 If level of DPB1 expression is relevant, it is appropriate to identify the means to assess the level of expression based on the particular DPB1 alleles involved and, therefore, the need for resolving ambiguities; by doing so, we define the exact alleles present. Another study by Schöne et al41 claims to be able to predict the 30 UTR SNP polymorphism reflecting expression

859

Duke et al of DPB1 using sequence information from only exons 2 and 3 of the gene and provide guidance for preferred donor selection based on the presence of SNP polymorphism at the 30 UTR in the absence of a donor matched at DPB1. A major limitation of this study is that it does not physically phase exon 2 to either exon 3 or the expression SNP and infers the linkage based on a limited set of full-length DPB1 alleles available in the IMGT/HLA database at the time of the study (IMGT/HLA version 3.23). Although this is not problematic if both alleles in a sample are within the same expression clade, it is problematic when you have alleles that are from different clades in the same sample as you cannot definitively assign the 30 UTR SNP to the exon 2 sequence because of the possibility for cis/trans ambiguities. For example, the allele combination of DPB1*03:01:01:01 with HLA-DPB1*105:01:01:01 is ambiguous with the allele combination of DPB1*124:01:01:01 with DPB1*463:01:01:01. The DPB1*03:01:01:01 is in TCE group 2 with high expression, whereas DPB1*105:01:01:01 is in TCE group 3 with low expression, whereas DPB1*124:01:01:01 is in TCE group 2 with low expression and DPB1*463:01:01:01 is in TCE group 3 with high expression. What exactly is the nonambiguous combination of alleles that influences relationships of TCEs and levels of expression? To eliminate this confounding factor and to properly model the risk for acute graft-versus-host disease, it is necessary to phase polymorphisms in exon 2, for which the TCE model is based, with polymorphisms of exon 3, reflecting the rs9277534 SNP polymorphism in the 30 UTR. This can only be achieved through the long-read sequencing methods, as shown in this report, for all combinations of DPB1 alleles. The approach presented herein enables the accurate and definitive typing results that are the necessary prerequisite to properly conduct these studies not only for the DPB1 locus examining the interplay between the DPB1 TCE mismatch and DPB1 expression to acute graft-versus-host disease in unrelated hematopoietic cell transplantation, but for any HLA locus whereby it is necessary to phase long distances as we assess the role of both polymorphisms and levels of expression of HLA genes.

Acknowledgment We thank the staff of the Immunogenetics Clinical Laboratory (Children’s Hospital of Philadelphia) who performed the initial Illumina sequencing and analysis of the samples we have been using in this report.

References 1. Duke JL, Lind C, Mackiewicz K, Ferriola D, Papazoglou A, Gasiewski A, Heron S, Huynh A, McLaughlin L, Rogers M, Slavich L, Walker R, Monos DS: Determining performance characteristics of an NGS-based HLA typing method for clinical applications. HLA 2016, 87:141e152

860

2. Duke JL, Lind C, Mackiewicz K, Ferriola D, Papazoglou A, Derbeneva O, Wallace D, Monos DS: Towards allele-level human leucocyte antigens genotyping - assessing two next-generation sequencing platforms: ion Torrent Personal Genome Machine and Illumina MiSeq. Int J Immunogenet 2015, 42:346e358 3. Gandhi MJ, Ferriola D, Huang Y, Duke JL, Monos D: Targeted nextgeneration sequencing for human leukocyte antigen typing in a clinical laboratory: metrics of relevance and considerations for its successful implementation. Arch Pathol Lab Med 2017, 141: 806e812 4. Fleischhauer K, Shaw BE, Gooley T, Malkki M, Bardy P, Bignon J-D, Dubois V, Horowitz MM, Madrigal JA, Morishima Y, Oudshoorn M, Ringden O, Spellman S, Velardi A, Zino E, Petersdorf EW; International Histocompatibility Working Group in Hematopoietic Cell Transplantation: Effect of T-cell-epitope matching at HLA-DPB1 in recipients of unrelated-donor haemopoietic-cell transplantation: a retrospective study. Lancet Oncol 2012, 13:366e374 5. Pidala J, Lee SJ, Ahn KW, Spellman S, Wang H-L, Aljurf M, Askar M, Dehn J, Fernandez Viña M, Gratwohl A, Gupta V, Hanna R, Horowitz MM, Hurley CK, Inamoto Y, Kassim AA, Nishihori T, Mueller C, Oudshoorn M, Petersdorf EW, Prasad V, Robinson J, Saber W, Schultz KR, Shaw B, Storek J, Wood WA, Woolfrey AE, Anasetti C: Nonpermissive HLA-DPB1 mismatch increases mortality after myeloablative unrelated allogeneic hematopoietic cell transplantation. Blood 2014, 124:2596e2606 6. Thaunat O, Hanf W, Dubois V, McGregor B, Perrat G, Chauvet C, Touraine J-L, Morelon E: Chronic humoral rejection mediated by anti-HLA-DP alloantibodies: insights into the role of epitope sharing in donor-specific and non-donor specific alloantibodies generation. Transpl Immunol 2009, 20:209e211 7. Singh P, Colombe BW, Francos GC, Martinez Cantarin MP, Frank AM: Acute humoral rejection in a zero mismatch deceased donor renal transplant due to an antibody to an HLA-DP alpha. Transplantation 2010, 90:220e221 8. Mytilineos J, Deufel A, Opelz G: Clinical relevance of HLA-DPB locus matching for cadaver kidney retransplants: a report of the Collaborative Transplant Study. Transplantation 1997, 63:1351e1354 9. Qiu J, Cai J, Terasaki PI, El-Awar N, Lee J-H: Detection of antibodies to HLA-DP in renal transplant recipients using single antigen beads. Transplantation 2005, 80:1511e1513 10. Laux G, Mansmann U, Deufel A, Opelz G, Mytilineos J: A new epitope-based HLA-DPB matching approach for cadaver kidney retransplants. Transplantation 2003, 75:1527e1532 11. Goral S, Prak EL, Kearns J, Bloom RD, Pierce E, Doyle A, Grossman R, Naji A, Kamoun M: Preformed donor-directed antiHLA-DP antibodies may be an impediment to successful kidney transplantation. Nephrol Dial Transplant 2008, 23:390e392 12. Vazirabad I, Chhabra S, Nytes J, Mehra V, Narra RK, Szabo A, Jerkins JH, Dhakal B, Hari P, Anderson MW: Direct HLA genetic comparisons identify highly matched unrelated donor-recipient pairs with improved transplantation outcome. Biol Blood Marrow Transplant 2019, 25:921e931 13. Nowak J, Nestorowicz K, Graczyk-Pol E, Mika-Witkowska R, Rogatko-Koros M, Jaskula E, et al: HLA-inferred extended haplotype disparity level is more relevant than the level of HLA mismatch alone for the patients survival and GvHD in T cell-replate hematopoietic stem cell transplantation from unrelated donor. Hum Immunol 2018, 79:403e412 14. Robinson J, Halliwell JA, Hayhurst JD, Flicek P, Parham P, Marsh SGE: The IPD and IMGT/HLA database: allele variant databases. Nucleic Acids Res 2015, 43:D423eD431 15. Carrington M: Recombination within the human MHC. Immunol Rev 1999, 167:245e256 16. Kotsch K, Blasczyk R: The noncoding regions of HLA-DRB uncover interlineage recombinations as a mechanism of HLA diversification. J Immunol 2000, 165:5664e5670

jmd.amjpathol.org

-

The Journal of Molecular Diagnostics

Oxford Nanopore Resolves DPB1 Ambiguity 17. Högstrand K, Böhme J: Gene conversion can create new MHC alleles. Immunol Rev 1999, 167:305e317 18. Seemann GH, Rein RS, Brown CS, Ploegh HL: Gene conversion-like mechanisms may generate polymorphism in human class I genes. EMBO J 1986, 5:547e552 19. Pease LR, Schulze DH, Pfaffenbach GM, Nathenson SG: Spontaneous H-2 mutants provide evidence that a copy mechanism analogous to gene conversion generates polymorphism in the major histocompatibility complex. Proc Natl Acad Sci U S A 1983, 80:242e246 20. Klitz W, Hedrick P, Louis EJ: New reservoirs of HLA alleles: pools of rare variants enhance immune defense. Trends Genet 2012, 28:480e486 21. Bugawan TL, Horn GT, Long CM, Mickelson E, Hansen JA, Ferrara GB, Angelini G, Erlich HA: Analysis of HLA-DP allelic sequence polymorphism using the in vitro enzymatic DNA amplification of DP-alpha and DP-beta loci. J Immunol 1988, 141:4024e4030 22. Zangenberg G, Huang M-M, Arnheim N, Erlich H: New HLAeDPB1 alleles generated by interallelic gene conversion detected by analysis of sperm. Nat Genet 1995, 10:407 23. Huang MM, Erlich HA, Goodman MF, Arnheim N: Analysis of mutational changes at the HLA locus in single human sperm. Hum Mutat 1995, 6:303e310 24. Morishima S, Shiina T, Suzuki S, Ogawa S, Sato-Otsubo A, Kashiwase K, Azuma F, Yabe T, Satake M, Kato S, Kodera Y, Sasazuki T, Morishima Y: Evolutionary basis of HLA-DPB1 alleles affects acute GVHD in unrelated donor stem cell transplantation. Blood 2018, 131:808e817 25. Klasberg S, Lang K, Günther M, Schober G, Massalski C, Schmidt AH, Lange V, Schöfl G: Patterns of non-ARD variation in more than 300 fulllength HLA-DPB1 alleles. Hum Immunol 2019, 80:44e52 26. Thomas R, Thio CL, Apps R, Qi Y, Gao X, Marti D, Stein JL, Soderberg KA, Moody MA, Goedert JJ, Kirk GD, Hoots WK, Wolinsky S, Carrington M: A novel variant marking HLA-DP expression levels predicts recovery from hepatitis B virus infection. J Virol 2012, 86:6979e6985 27. Petersdorf EW, Malkki M, O’hUigin C, Carrington M, Gooley T, Haagenson MD, Horowitz MM, Spellman SR, Wang T, Stevenson P: High HLA-DP expression and graft-versus-host disease. N Engl J Med 2015, 373:599e609 28. Deamer D, Akeson M, Branton D: Three decades of nanopore sequencing. Nat Biotechnol 2016, 34:518 29. Sadki ES, Garaj S, Vlassarev D, Golovchenko JA, Branton D: Embedding a carbon nanotube across the diameter of a solid state nanopore. J Vac Sci Technol B 2011, 29. 053001

The Journal of Molecular Diagnostics

-

jmd.amjpathol.org

30. Magi A, Giusti B, Tattini L: Characterization of MinION nanopore data for resequencing analyses. Brief Bioinformatics 2017, 18: 940e953 31. Goodwin S, Gurtowski J, Ethe-Sayers S, Deshpande P, Schatz MC, McCombie WR: Oxford Nanopore sequencing, hybrid error correction, and de novo assembly of a eukaryotic genome. Genome Res 2015, 25:1750e1756 32. Morisse P, Lecroq T, Lefebvre A: Hybrid correction of highly noisy long reads using a variable-order de Bruijn graph. Bioinformatics 2018, 34:4213e4222 33. Wang JR, Holt J, McMillan L, Jones CD: FMLRC: hybrid long read error correction using an FM-index. BMC Bioinformatics 2018, 19:50 34. Katoh K, Standley DM: MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol 2013, 30:772e780 35. Bodenhofer U, Bonatesta E, Horejs-Kainrath C, Hochreiter S: msa: An R package for multiple sequence alignment. Bioinformatics 2015, 31:3997e3999 36. Li H: Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 2018, 34:3094e3100 37. DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C, Philippakis AA, del Angel G, Rivas MA, Hanna M, McKenna A, Fennell TJ, Kernytsky AM, Sivachenko AY, Cibulskis K, Gabriel SB, Altshuler D, Daly MJ: A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet 2011, 43:491 38. Jain M, Tyson JR, Loose M, Ip CLC, Eccles DA, O’Grady J, Malla S, Leggett RM, Wallerman O, Jansen HJ, Zalunin V, Birney E, Brown BL, Snutch TP, Olsen HE: MinION Analysis and Reference Consortium: phase 2 data release and analysis of R9.0 chemistry. F1000Research 2017, 6:760 39. Potapov V, Ong JL: Examining sources of error in PCR by singlemolecule sequencing. PLoS One 2017, 12:e0169774 40. Crivello P, Zito L, Sizzano F, Zino E, Maiers M, Mulder A, Toffalori C, Naldini L, Ciceri F, Vago L, Fleischhauer K: The impact of amino acid variability on alloreactivity defines a functional distance predictive of permissive HLA-DPB1 mismatches in hematopoietic stem cell transplantation. Biol Blood Marrow Transplant 2015, 21:233e241 41. Schöne B, Bergmann S, Lang K, Wagner I, Schmidt AH, Petersdorf EW, Lange V: Predicting an HLA-DPB1 expression marker based on standard DPB1 genotyping: linkage analysis of over 32,000 samples. Hum Immunol 2018, 79:20e27

861