Abstracts / Human Immunology 78 (2017) 51–254
P146
NANOPORE SEQUENCING AS A MATURING PLATFORM FOR FULL-LENGTH HLA GENOTYPING

Vineeth Surendranath a, Gerhard Schoefl a, Kathrin Lang a, Hanneke W. van Deutekom b, Erik Rozemuller b, Alexander Schmidt c, Vinzenz Lange a.
a DKMS Life Science Lab, Dresden, Germany; b GenDx, Baltimore, MD, United States; c DKMS Gemeinnuetzige GmbH, Tuebingen, BWB, Germany.

Aim: While short-read sequencing technologies have proven highly successful for HLA genotyping, longer reads would be desirable to resolve phasing ambiguities. The emerging nanopore sequencing platform from Oxford Nanopore Technologies (ONT) is a promising technology capable of sequencing unfragmented molecules on the order of tens of kilobases. We demonstrate that nanopore sequencing is an increasingly viable platform for HLA genotyping by addressing questions of throughput, mappability and genotyping accuracy.

Methods: We sequenced a benchmarking set of 30 samples with established full-length sequence information, 10 each for HLA-A, HLA-B and HLA-C, on the MinION, a nanopore sequencing device from ONT, using three successive chemistries: R7.3, R9 and R9.4. We mapped the resulting reads, both template-only (1D) and template-complement (2D), to the reference sequences using ‘bwa’ in its ‘ont’ mode, a set of parameters tailored for MinION sequencing reads. Consensus sequences were then derived from the SAM mapping data using a routine leveraging the pysam Python library.

Results: Read numbers increased with each successive chemistry, from approximately 3000 to 8000 reads per sample. Mappability also increased successively, from 90% to 98% for 1D reads and from 97% to 99.8% for 2D reads. Comparing the consensus sequences with the reference sequences, we observed reproducibility of 98% for 1D reads and 99.8% for 2D reads. We note that the ONT basecaller systematically undercalls polynucleotide (homopolymer) stretches; the resulting gaps explain the consensus sequence errors. We expect to report on this issue being ameliorated by a basecaller update scheduled for May 2017 that is specifically aimed at accurately calling homopolymer stretches. We also evaluate whether improvements in basecalling are achievable by training the neural network that underlies the ONT basecaller on curated data derived from HLA reference sequences.

Conclusions: The sequencing yield and mappability of full-length HLA alleles using nanopore sequencing could render it a suitable platform for HLA genotyping. The platform’s major disadvantage, severe undercalling of polynucleotide stretches, is expected to improve with pending updates to the basecalling software.
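The mappability and consensus steps described in this abstract can be illustrated with a minimal sketch. It is not the authors' routine: the abstract states that pysam was used, whereas this sketch reads plain SAM text and assumes the reads have already been laid out as equal-length rows aligned to the reference, with `-` marking a deleted base. The function names `mapped_fraction` and `majority_consensus` are hypothetical.

```python
from collections import Counter

UNMAPPED = 0x4         # SAM FLAG bit: read is unmapped
NON_PRIMARY = 0x900    # SAM FLAG bits: secondary (0x100) or supplementary (0x800)


def mapped_fraction(sam_lines):
    """Return the fraction of reads whose unmapped FLAG bit is not set.

    Header lines (starting with '@') are skipped, and secondary or
    supplementary records are ignored so each read is counted once.
    """
    total = 0
    mapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if flag & NON_PRIMARY:
            continue
        total += 1
        if not flag & UNMAPPED:
            mapped += 1
    return mapped / total if total else 0.0


def majority_consensus(aligned_rows):
    """Column-wise majority vote over equal-length, reference-aligned rows.

    Assumption of this sketch: every row covers the same reference span
    and uses '-' where the read lacks a base. A column where most reads
    carry '-' therefore yields a gap in the consensus.
    """
    consensus = []
    for column in zip(*aligned_rows):
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)
```

On a toy reference `ACGGGGT`, the rows `ACGGG-T`, `ACGGG-T` and `ACGGGGT` give the consensus `ACGGG-T`: when most reads undercall the run of four Gs, the majority vote inherits the gap, which is the kind of consensus error the Results attribute to the basecaller.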
P147
A NEW APPROACH FOR ACCURATE, FULLY-PHASED HLA CLASS I SEQUENCING USING PACBIO TECHNOLOGY

Daniela C. Monaco, Eric Hunter, Dario A. Dilernia.
Emory University, Atlanta, GA, United States.

Aim: To test the accuracy and sensitivity of a workflow that combines PacBio sequencing and a novel bioinformatics pipeline for multiplexed, fully-phased sequencing of full-length HLA-A, -B and -C alleles. This approach involved a new barcoding system and was compared against the LAA approach, which is commonly used for PacBio-based HLA typing.

Methods: Full-length HLA-A, -B and -C loci were PCR-amplified from 50 samples (IHWG reference panel) using in-house barcoded primers. PCR amplicons were either pooled together and then ligated to regular SMRTbell adapters (LibraryPrep1), or pooled by sample (A+B+C), with each pool then independently ligated to a different barcoded SMRTbell adapter (LibraryPrep2). PacBio sequencing data (RSII system) was analyzed using the MDPSeq tool. Data from LibraryPrep2 was also analyzed using the LAA tool (Pacific Biosciences).

Results: Sequences of 113 unique alleles were obtained, including 7 with incomplete reference sequences and 14 novel alleles. The high accuracy of the approach was demonstrated by a complete match between allele sequences obtained from different samples, by a complete match with the available IMGT reference sequences, and by Sanger sequencing confirmation of the mismatches found in the 14 novel alleles. Sequencing results from LibraryPrep1 and LibraryPrep2 were identical when data was analyzed with the MDPSeq tool, whereas the LAA approach exhibited an estimated 10% rate of in silico cross-contamination when compared against the MDPSeq tool on data from LibraryPrep2.

Conclusions: The MDPSeq tool achieves accurate multiplexed sequencing of HLA-I alleles, whereas the LAA tool exhibits a significant rate of in silico cross-contamination, identifying homozygotes as heterozygotes and reporting more than two alleles for heterozygotes. The MDPSeq tool, which performs simultaneous step-wise phasing and correction of both barcode and target sequence, is able to prevent cross-contamination and provide unequivocally accurate results.
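To make the notion of in silico cross-contamination concrete, the following is a generic sketch of barcode demultiplexing with a mismatch tolerance. It is not the MDPSeq algorithm, whose details the abstract does not give. The function names, the barcode sequences in the usage note and the `max_mismatches` threshold are all hypothetical, and the sketch assumes that barcodes sit at the start of each read and that sequencing errors are substitutions only.

```python
def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(read, barcodes, max_mismatches=1):
    """Assign a read to the sample whose barcode best matches its prefix.

    Returns None when no barcode is within max_mismatches, or when two
    barcodes tie for the best match. Rejecting ambiguous reads rather
    than guessing is one simple way to limit reads leaking between
    samples.
    """
    scored = sorted(
        (hamming(read[:len(barcode)], barcode), sample)
        for sample, barcode in barcodes.items()
        if len(read) >= len(barcode)
    )
    if not scored:
        return None
    best_distance, best_sample = scored[0]
    if best_distance > max_mismatches:
        return None
    if len(scored) > 1 and scored[1][0] == best_distance:
        return None  # ambiguous: a second barcode matches equally well
    return best_sample
```

With the hypothetical barcodes `{"S1": "ACGTAC", "S2": "TGCATG"}`, a read starting `ACGAAC` is one substitution away from the S1 barcode and is assigned to S1, while a read starting `GGGGGG` matches neither barcode closely enough and is discarded.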