Discrimination and quantification of homologous keratins from goat and sheep with dual protease digestion and PRM assays


Journal of Proteomics xxx (xxxx) xxx–xxx

Contents lists available at ScienceDirect

Journal of Proteomics journal homepage: www.elsevier.com/locate/jprot


Chen Miao a,b, Yunfei Yang a, Shanshan Li c, Yufeng Guo a, Wenqing Shui c,d,⁎⁎, Qichen Cao a,⁎



a Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
b University of Chinese Academy of Sciences, Beijing 100049, China
c iHuman Institute, Shanghai Tech University, Shanghai 201210, China
d School of Life Science and Technology, Shanghai Tech University, Shanghai 201210, China

ARTICLE INFO

Keywords: Mass spectrometry; Keratin; Keratin-associated proteins (KAPs); Species identification; Homology; Hair fiber

ABSTRACT

Mass spectrometry (MS) offers a particular advantage for determining the species of origin of protein-rich samples, which requires the identification of species-specific peptides. However, for highly homologous proteins, it remains challenging to select species-unique peptides with routine proteomics approaches. In this work, we chose keratins and keratin-associated proteins (KAPs) present in cashmere fibers from goat and wool fibers from sheep as targets to develop a dual-protease digestion workflow based on in-silico and experimental analysis. The combined use of the Glu-C and trypsin proteases gave the best digestion performance for MS identification of keratins and KAPs from the two species. The parallel reaction monitoring (PRM) technique was implemented to validate and quantify the selected species-discriminating peptides. The fiber composition of both blended animal hair fibers and industrial textile fabrics was successfully determined with the PRM assay. Furthermore, we identified over 360 peptides from the cashmere fiber that are absent from the current UniProt goat proteome database. We expect our new workflow to improve the identification and quantification of keratins and KAPs and to provide inspiration for distinguishing other highly homologous proteins. We also anticipate that the set of species-specific keratin and KAP peptides validated in this work will benefit the quality assessment of industrial fiber materials and textile products.

Significance: Discriminating species on the basis of highly homologous proteins is challenging for MS-based shotgun proteomics, because the large proportion of shared protein sequence hinders the identification of species-unique peptides. In this work, we aimed to discriminate goat and sheep samples through their keratins and keratin-associated proteins (KAPs). A dedicated workflow was developed to increase the detectability and quantification of species-discriminating peptides. The dual-protease digestion approach was optimized on the basis of amino acid sequence analysis and in-silico protein digestion analysis. PRM assays were established to validate and quantify the selected species-unique peptides. Additionally, we identified about 360 novel candidate peptides that complement the current goat protein sequence database. We expect our workflow to improve species discrimination for highly homologous proteins and to benefit proteomic studies of keratins and KAPs in the human proteome.

1. Introduction

Mass spectrometry (MS) has become a leading technology for the identification and quantification of proteins and is increasingly used for species discrimination of protein-rich samples [1–7]. MS-based methods distinguish one species from another by detecting species-specific proteins or by identifying sequence differences between the corresponding proteins of different species. Two basic MS approaches are commonly applied to protein identification, depending on whether intact proteins or the peptides produced by protein digestion are introduced into the MS instrument: the former is termed "top-down" proteomics and the latter "bottom-up" or "shotgun" proteomics. The bottom-up approach is widely applied in a high-throughput manner, whereas the top-down approach is less frequently adopted [8, 9]. Because the bottom-up approach detects only peptides, it is important to identify unique peptides that confirm the presence of specific proteins and species [10]. However, for proteins from closely related or evolutionarily conserved species, which often share a high percentage of amino acid sequence homology, the number of unique peptides is limited. It is therefore very challenging to distinguish between species whose proteomes contain large numbers of homologous proteins.

Keratins, a superfamily of ubiquitous structural proteins, have been found in structures ranging from hair to cell nuclei [5, 11, 12]. Because keratin materials are lightweight and sturdy, they have been made into various luxury textiles and artware, so species identification of keratins is important for the quality assessment of keratinous products. A notable feature of keratins is the high degree of homology within each family, which can reach 92% between some members of the type I keratins and 85% among the type II keratins [13]. In hair, the keratins are embedded in a matrix of keratin-associated proteins (KAPs) [14], a large group comprising more than 20 families, most of whose members have multiple isoforms. Similar degrees of homology have also been observed for the KAPs [15]. Because this substantial sequence homology hampers MS identification, the coverage of keratins and KAPs is below average in the reported draft human proteome datasets [16], and quantifying the expression of individual keratin and KAP isoforms is even more challenging [17, 18].

To address the challenges of identifying and quantifying highly homologous keratins and KAPs, in this work we took goat cashmere and sheep wool fibers as benchmark samples and developed a dedicated shotgun proteomics workflow to distinguish the species of origin of the fibers. We first selected suitable proteases for protein digestion by carrying out amino acid sequence analysis and in-silico protein digestion analysis. The best protease combination was confirmed experimentally and then used to identify the unique keratin and KAP peptides of each species. To maximize sensitivity and reproducibility when quantifying these unique peptides against a background of highly abundant shared peptides, we developed an MS assay based on parallel reaction monitoring (PRM) [19, 20]. The robustness of the PRM assay was evaluated with blended samples composed of animal fibers from distinct species. In total, two species-discriminating peptides were successfully quantified in industrial textile samples. We anticipate that the workflow presented in this article could be further applied to the study of highly homologous proteins in the human proteome. We also expect the PRM assays for keratin-discriminating peptides developed in this work to benefit the quality control (QC) of industrial textiles, which are traditionally examined manually and depend heavily on the expertise of the operator [6].

⁎ Corresponding author.
⁎⁎ Corresponding author at: iHuman Institute, Shanghai Tech University, Shanghai 201210, China.
E-mail addresses: [email protected] (W. Shui), [email protected] (Q. Cao).

https://doi.org/10.1016/j.jprot.2018.07.010
Received 23 March 2018; Received in revised form 3 July 2018; Accepted 13 July 2018
1874-3919/ © 2018 Elsevier B.V. All rights reserved.

Please cite this article as: Miao, C., Journal of Proteomics (2018), https://doi.org/10.1016/j.jprot.2018.07.010

2. Materials and methods

2.1. Protein extraction from fiber samples

Inner Mongolia white cashmere (C) and fine wool (F) fibers were used for untargeted marker screening, and three commercial textile fibers were used for targeted marker verification. Fiber samples were pretreated before protein extraction as described in detail previously [21]. Briefly, about 5 mg of pretreated fiber sample was cut up with scissors and added to extraction buffer (25 mM Tris-HCl, 5 M urea, 2.4 M thiourea, 5% DTT, pH 8.5), then shaken gently for 16 h at 55 °C. Residual fiber was removed by centrifugation at 12,000 × g for 5 min. Proteins in the supernatant were precipitated with acetone (supernatant-to-acetone ratio 1:6, v/v) for at least 2 h at −20 °C and pelleted by centrifugation at 3000 rpm for 5 min at 4 °C. The protein pellets were washed three times with cold acetone. The supernatant was removed, and the residual acetone was evaporated under nitrogen until about 10 μL remained. The protein pellets were resuspended in redissolving buffer (20 mM Tris-HCl, 8 M urea). Protein concentration was determined by the Bradford assay, and the samples were stored at −80 °C.

2.2. Protein digestion

Protein solution containing about 100 μg of protein was reduced with 10 mM dithiothreitol (DTT) at 37 °C for 4 h in a thermo-shaker at 550 rpm. Alkylation was performed with 40 mM iodoacetamide (IAA) at room temperature in the dark for 40 min. An additional 30 mM DTT was added to consume the excess IAA, followed by vortexing and incubation at 37 °C for 40 min. Protein digestion was then carried out under the following conditions:

(i) Trypsin digestion. Urea was diluted with 50 mM ammonium bicarbonate to a final concentration below 1 M, and proteins were digested with sequencing-grade modified trypsin (Promega, Madison, USA) at an enzyme-to-protein ratio of 1:100 (w/w) at 37 °C for 4 h, after which fresh trypsin was added at 1:50 (w/w) and incubation continued at 37 °C overnight.

(ii) Trypsin-chymotrypsin digestion. The urea concentration was adjusted to below 1 M with 50 mM ammonium bicarbonate. Chymotrypsin (Promega, Madison, USA) was added first at an enzyme-to-protein ratio of 1:150 (w/w) for 2 h at 25 °C, fresh chymotrypsin was then added at 1:150 (w/w) for 12 h at 25 °C, and trypsin was then added at 1:150 (w/w) for 3 h at 37 °C, followed by the same amount of additional trypsin and overnight incubation.

(iii) Trypsin-Glu-C digestion. Glu-C and trypsin were added either simultaneously or sequentially. For simultaneous digestion, urea was diluted with 50 mM ammonium bicarbonate to a concentration below 1 M, and trypsin and Glu-C were added at enzyme-to-protein ratios of 1:100 and 1:40 (w/w), respectively, for 4 h at 37 °C, followed by fresh trypsin (1:50, w/w) and Glu-C (1:40, w/w) and incubation at 37 °C overnight. For sequential digestion, urea was diluted with phosphate-buffered solution (pH 7.5) to a final concentration below 1 M; Glu-C was added first at an enzyme-to-protein ratio of 1:60 (w/w) for 4 h at 25 °C, and the same amount of fresh Glu-C was then added for a further 13 h; trypsin was then added at 1:100 (w/w) for 4 h at 37 °C, followed by additional trypsin at 1:50 (w/w) and incubation at 37 °C overnight.

All digestions were terminated by adding formic acid to pH 3. The digests were desalted with C18 tip columns (Nest, MA, USA) and lyophilized under vacuum.

2.3. Nano LC-MS/MS analysis

The peptides were dissolved in LC mobile phase A (2% acetonitrile, 0.1% formic acid), and about 1 μg of peptide was subjected to nanoLC-MS/MS analysis on an Eksigent Nano LC coupled to a Triple-TOF™ 5600 mass spectrometer (SCIEX, USA) equipped with a nano-electrospray ionization source. The peptides were first loaded onto a C18 (5 μm) trap column and then switched to an in-house packed 150 mm × 75 μm analytical column with Reprosil-Pur Basic C18 (3 μm) sorbent. Mobile phase B consisted of 2% water and 0.1% formic acid in acetonitrile. Peptides were separated with a 120 min discontinuous gradient of 5–95% buffer B at a flow rate of 300 nL/min. For untargeted proteomics analysis, the instrument was operated in data-dependent acquisition (DDA) mode. The precursor MS scan range was set to m/z 350–1500 with charge states of 2–5. The 40 most intense precursor ions were fragmented, with a dynamic exclusion time of 22 s. Targeted proteomic analysis was conducted in parallel reaction monitoring (PRM) mode [22]. The acquisition list was generated from candidate marker peptides selected on the basis of the DDA results. The collision energy was optimized for each peptide precursor to obtain high-quality MS/MS spectra. The three to six most intense product ions were used to construct the transitions for quantifying each peptide precursor. All transitions were validated using the mProphet algorithm [23].
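Section 2.3 builds each PRM transition group from the three to six most intense product ions of a precursor. The Python sketch below illustrates only that ranking step. It assumes the library spectrum has already been annotated into (label, m/z, relative intensity) records; `FragmentIon` and `select_transitions` are hypothetical helpers, not part of Skyline or Mascot, and the example values are invented.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FragmentIon:
    label: str        # fragment annotation, e.g. "y7" (hypothetical example)
    mz: float         # product-ion m/z
    intensity: float  # relative intensity in the library spectrum


def select_transitions(
    ions: List[FragmentIon], min_n: int = 3, max_n: int = 6
) -> List[FragmentIon]:
    """Keep the max_n most intense product ions; return [] if fewer than min_n exist."""
    ranked = sorted(ions, key=lambda ion: ion.intensity, reverse=True)
    if len(ranked) < min_n:
        # Too few fragments to form a reliable transition group for this precursor.
        return []
    return ranked[:max_n]


if __name__ == "__main__":
    # Invented values for illustration only.
    spectrum = [FragmentIon(f"y{i}", 150.0 + 100.0 * i, float(i)) for i in range(1, 9)]
    print([ion.label for ion in select_transitions(spectrum)])
    # prints ['y8', 'y7', 'y6', 'y5', 'y4', 'y3']
```

Interference checks and the mProphet validation mentioned above are not modelled here; the sketch only shows how an intensity-ranked subset of three to six fragments would be chosen.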

Fig. 1. Theoretical analysis of keratins and KAPs of goat and sheep. (A) Homology analysis of 329 keratins and KAPs from goat (59) and sheep (270), BLASTed against the UniProtKB entries of the organisms Ovis aries and Capra hircus. Ident_value is the sequence identity between a query protein and its best hit; the higher the ident_value, the higher the homology. (B) Amino acid composition of the proteins of goat and sheep.

2.4. DDA data processing for peptide marker screening

The DDA data were searched with the Mascot search engine against a target-decoy database combining the UniProt isoform databases of goat (Capra hircus, Nov-2017, 2844 entries) and sheep (Ovis aries, Nov-2017, 27,544 entries). Carbamidomethylation of cysteine was set as a fixed modification, and oxidation and protein N-terminal carbamylation were set as dynamic modifications. The precursor ion mass tolerance was 20 ppm, and the fragment ion mass tolerance was 0.05 Da. Up to four missed cleavages were allowed for both single- and dual-enzyme digestion. For single-enzyme digestion, Lys and Arg were set as the cleavage sites for trypsin, and Asp and Glu for Glu-C. For dual-enzyme digestion, three cleavage sites (Lys, Arg, Glu) were set for trypsin-Glu-C (simultaneous digestion); four cleavage sites (Lys, Arg, Glu, Asp) for trypsin-Glu-C (sequential digestion), consistent with Glu-C cleaving after Glu only in ammonium bicarbonate but after both Glu and Asp in phosphate buffer (Section 2.2); and six cleavage sites (Phe, Tyr, Trp, Leu, Lys, Arg) for trypsin-chymotrypsin. For identification, the peptide false discovery rate (FDR) was set at 1%, and the Mascot ion score cutoff was set to 20.
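The cleavage-site sets above also drive the in-silico digestion reported in Section 3.2 (at most two missed cleavages, 6–30 residues, 700–6000 Da). A minimal Python sketch of that calculation follows; it is not the authors' script. It assumes C-terminal cleavage that is skipped before Pro (a common convention the text does not state), unmodified residues for the monoisotopic mass, and it merges Ile and Leu when comparing species because the two are isobaric. The keys of `ENZYMES` and all function names are illustrative.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056

# Cleavage-site sets as listed in Section 2.4 (keys are illustrative names).
ENZYMES: Dict[str, FrozenSet[str]] = {
    "trypsin": frozenset("KR"),
    "glu-c": frozenset("DE"),
    "trypsin+glu-c (simultaneous)": frozenset("KRE"),
    "trypsin+glu-c (sequential)": frozenset("KRED"),
    "trypsin+chymotrypsin": frozenset("FYWLKR"),
}


def cut_positions(seq: str, sites: FrozenSet[str], skip_before_pro: bool = True) -> List[int]:
    """Fragment boundaries; cleavage is C-terminal to a site residue (assumed not before Pro)."""
    cuts = [0]
    for i, aa in enumerate(seq[:-1]):
        if aa in sites and not (skip_before_pro and seq[i + 1] == "P"):
            cuts.append(i + 1)
    cuts.append(len(seq))
    return cuts


def peptide_mass(pep: str) -> float:
    """Monoisotopic neutral mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in pep) + WATER


def digest(seq: str, sites: FrozenSet[str], max_missed: int = 2,
           min_len: int = 6, max_len: int = 30,
           min_mass: float = 700.0, max_mass: float = 6000.0) -> Set[str]:
    """Peptides with <= max_missed missed cleavages that pass the length and mass filters."""
    cuts = cut_positions(seq, sites)
    peptides: Set[str] = set()
    for i in range(len(cuts) - 1):
        # j - i - 1 is the number of missed cleavages inside seq[cuts[i]:cuts[j]].
        for j in range(i + 1, min(i + 2 + max_missed, len(cuts))):
            pep = seq[cuts[i]:cuts[j]]
            if not all(aa in RESIDUE_MASS for aa in pep):
                continue  # skip peptides with ambiguous residues such as X
            if min_len <= len(pep) <= max_len and min_mass <= peptide_mass(pep) <= max_mass:
                peptides.add(pep)
    return peptides


def species_unique(goat: Iterable[str], sheep: Iterable[str],
                   sites: FrozenSet[str]) -> Tuple[Set[str], Set[str]]:
    """Peptides (I written as L) predicted for only one of the two proteomes."""
    def pool(proteins: Iterable[str]) -> Set[str]:
        return {p.replace("I", "L") for s in proteins for p in digest(s, sites)}
    goat_peps, sheep_peps = pool(goat), pool(sheep)
    return goat_peps - sheep_peps, sheep_peps - goat_peps
```

Theoretical sequence coverage, the other quantity in Section 3.2, could be added by marking the residues spanned by the retained peptides; it is omitted to keep the sketch short.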

2.5. PRM data processing for peptide marker quantitation

The PRM data were processed using Skyline (version 3.6.0) with software settings following the online tutorial (https://skyline.ms/wiki/home/software/Skyline/page.view?name=tutorial_targeted_msms) [24]. The Mascot search results (DAT files) of the DDA data were used to build the spectral library. The spectra in the imported PRM data were matched against the spectral library to confirm that peptides were correctly identified and quantified. All extracted ion chromatograms (XICs) of the selected fragments were inspected manually, and the mProphet algorithm was used to ensure proper peak picking and peak integration. A peptide of bovine serum albumin (BSA) was used as the internal standard (IS). The ratio of the XIC intensity of a targeted peptide in a given fiber sample to that of the IS was defined as the normalized peptide response. A calibration curve was constructed from the normalized peptide responses measured in cashmere and fine wool mixtures, using at least five data points in which the percentage of cashmere or fine wool varied from 5% to 100%. The proportions of cashmere and fine wool in mixed fibers and textiles were then obtained from the calibration curve.

2.6. Scanning electron microscopy of fiber samples

The standard animal hair fibers and fabric fibers were observed with a scanning electron microscope (Hitachi SU8000, Japan). Multiple high-resolution images of each fiber sample were acquired at a magnification of at least ×2500, revealing differences in morphology among the fiber samples.

2.7. Analyzing protein amino acid composition

The count of each amino acid (Ala, Ile, Pro, Val, Arg, Leu, Ser, Thr, Gly, Met, His, Phe, Tyr, Lys, Asp, Asn, Gln, Glu, Cys and Trp) in a given protein was divided by the total length of the protein. This amino acid composition analysis was performed with a custom Python script implementing the calculation described above.
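Section 2.7 describes the composition calculation only in words, so a minimal Python version may help; it is a sketch, not the authors' script. How per-protein compositions were combined into the goat and sheep profiles of Fig. 1B is not stated, so the unweighted mean in the hypothetical `mean_composition` helper is an assumption.

```python
from collections import Counter
from typing import Dict, Iterable

# One-letter codes of the 20 amino acids listed in Section 2.7.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq: str) -> Dict[str, float]:
    """Count of each amino acid divided by the total protein length.

    Non-standard letters (e.g. X) still count toward the length, as the
    text divides by the total length of the protein.
    """
    seq = seq.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    return {aa: counts[aa] / len(seq) for aa in AMINO_ACIDS}


def mean_composition(seqs: Iterable[str]) -> Dict[str, float]:
    """Unweighted mean of per-protein compositions (aggregation rule assumed)."""
    comps = [aa_composition(s) for s in seqs]
    if not comps:
        raise ValueError("no sequences")
    return {aa: sum(c[aa] for c in comps) / len(comps) for aa in AMINO_ACIDS}


if __name__ == "__main__":
    # Toy input: the C-1 marker peptide from Table 1, used only as a short sequence.
    comp = aa_composition("CGPCSSYVR")
    print(sorted(comp.items(), key=lambda kv: kv[1], reverse=True)[:3])
```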

3. Results

3.1. Homology analysis of keratins and KAPs in the goat and sheep proteomes

As documented in previous literature [13, 15], keratins and keratin-associated proteins (KAPs) are both superfamilies with highly homologous sequences. To further understand the protein sequence similarity between goat and sheep, we collected all keratin and KAP sequences in the UniProtKB protein database for goat (59 entries) and sheep (270 entries) and BLASTed (NCBI-BLAST-2.2.29+) every distinct protein in this collection against all the others. The top identity value returned for each query is plotted in Fig. 1A (an ident-value of 1, i.e. 100%, means that the two sequences are identical over the alignment). As a result, 65.65% of the keratins and KAPs have an ident-value greater than 0.8, and about 79% of the ident-values are greater than 0.6. These high ident-values reflect the pronounced homology of keratins and KAPs both within and between species, which poses a great challenge for bottom-up proteomics approaches to distinguish goat from sheep. It should also be noted that, owing to the limited genomic and proteomic studies of goat, the number of goat keratins and KAPs recorded in UniProtKB is much lower than that of sheep (59 vs 270). Since the existing keratins and KAPs of goat and sheep are highly similar, it is reasonable to infer that the unsequenced goat proteins are also very similar to their sheep counterparts. The experimental shotgun proteomics screening results supported this hypothesis (Section 3.4); hence, we combined the goat and sheep protein databases for the following proteomics study.

3.2. In-silico digestion of keratins and KAPs

In bottom-up proteomics, proteins are first cleaved by a protease into short peptides, which are then analyzed by LC-MS, and the proteins are inferred from the identified peptides [9]. Although many algorithms have been introduced for protein inference, identifying unique peptides is the most critical step [25–27]. Because keratins and KAPs are highly homologous, it is crucial to expose the sequence regions in which they differ to the MS instrument so that these regions can be detected. Conventionally, trypsin is used because its cleavage sites (lysine and arginine) are evenly distributed across the proteome. Notably, however, cysteine is highly enriched in keratin and KAP sequences, which prompted us to investigate whether trypsin is still a good choice for digesting the fiber proteome. We therefore analyzed the amino acid (AA) composition of keratins and KAPs from goat and sheep (Fig. 1B). Both glycine and serine showed markedly high abundance, similar to cysteine. We speculate that the high content of Cys, Gly and Ser underlies the highly helical and highly cross-linked character of keratins and KAPs [11, 28, 29]. Among the remaining AAs, Tyr, Arg, Glu, Phe, Leu, Pro and Val showed relatively higher abundance than the others (Fig. 1B). On the basis of cleavage specificity, we selected trypsin (cleaving after Lys/Arg), Glu-C (after Asp/Glu) and chymotrypsin (after Phe/Tyr/Trp/Leu) for in-silico digestion analysis of keratins and KAPs. The theoretical protein sequence coverage and the number of species-unique peptides, present only in the goat or only in the sheep proteome, were calculated for trypsin, Glu-C and chymotrypsin used separately or in combination (trypsin-Glu-C and trypsin-chymotrypsin). The number of missed cleavage sites was limited to two, the peptide length to 6–30 aa, and the peptide mass to 700–6000 Da. As illustrated in Fig. 2A, Glu-C alone showed no significant improvement over trypsin, whereas chymotrypsin and either the trypsin-Glu-C or the trypsin-chymotrypsin combination increased the number of unique peptides both within species (Fig. 2A, Table S1) and between species (Fig. 2B, Table S2).

3.3. Promoting keratin and KAP digestion with dual proteases

On the basis of the in-silico digestion results, we evaluated the performance of trypsin-Glu-C and trypsin-chymotrypsin for goat and sheep hair fiber sample preparation; the protein digests were analyzed by LC-MS/MS in data-dependent acquisition (DDA) mode. Single digestion with trypsin was conducted as a control. As predicted by the in-silico analysis, dual-protease digestion improved the results for goat keratins and KAPs. In contrast to the in-silico analysis, however, trypsin-Glu-C yielded the most protein and peptide identifications, outperforming the trypsin-chymotrypsin combination (Fig. 3A–C). When the keratin and KAP peptides of the two species were overlapped, dual-enzyme digestion yielded more species-unique peptides (Fig. 3D). Detailed MS identification results are summarized in Tables S3 and S4. These shotgun proteomics results demonstrate the advantage of the trypsin-Glu-C dual-protease digestion strategy for keratin and KAP identification. To simplify sample preparation and reduce the protease incubation time, we compared adding trypsin and Glu-C sequentially (about 32 h of incubation) or simultaneously (about 16 h). No significant difference was observed between these two approaches (Fig. 4, Table S5), so we used simultaneous trypsin-Glu-C digestion for the subsequent proteomics analysis.

3.4. Selecting species-discriminating unique peptides

Using the optimized dual-protease digestion approach, we carried out comprehensive proteomic profiling of the cashmere fiber from goat and the fine wool fiber from sheep, each in triplicate. As mentioned above, the combined goat-sheep protein sequence database was used for MS/MS spectrum annotation. In total, 67 and 64 keratins and KAPs were identified from the cashmere and fine wool fibers, respectively (Table S6). Altogether, 140 keratin and KAP peptides were identified exclusively from the cashmere fiber, and 261 from the fine wool fiber. Because of their high sequence similarity, the keratins and KAPs identified from goat and sheep overlap extensively (Fig. 5A). Peptides identified from only goat or only sheep were taken as candidates for further analysis. To select species-discriminating peptides, we filtered the candidates with four criteria: (i) no modifications other than carbamidomethylation of Cys; (ii) no missed cleavages; (iii) a length of no more than 25 amino acids; and (iv) presence only in the corresponding species. After this filtration, most candidates were ruled out, leaving only 5 peptides from goat and 7 from sheep (Fig. 5B). Finally, after further validation with three quantitative criteria (Fig. 5B), 4 goat cashmere marker peptides and 3 sheep wool marker peptides remained (Table 1). Note that almost every protein that generates these marker peptides has at least one homologous counterpart (with very high BLAST ident-values; Table 1), and differences of only one or two amino acids within the marker peptides allow keratins and KAPs to be discriminated within or between species (Fig. S1).

For comprehensive profiling of the cashmere and fine wool proteomes, the combined two-species database was used in this work. Ideally, the MS spectra acquired from cashmere fiber should all be assigned to goat peptides and proteins, and vice versa. Surprisingly, however, after peptide filtration we found that 75 peptides from the cashmere fiber were assigned only to sheep proteins, and 3 peptides from the wool fiber were assigned only to goat proteins. After careful manual inspection of the MS data to confirm the peptide sequence assignments, we speculated that this observation arises from the incompleteness of the protein databases for both goat and sheep, and that this set of novel peptides may lead to the discovery of new isoforms or sequence variants of goat or sheep keratins and KAPs. Prompted by this finding, we re-examined all the peptides identified in Fig. 5A and found a total of 364 peptides from cashmere that were present only in the sheep proteome database and 18 peptides from wool that were present only in the goat proteome database (Table S8).
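The four filtering criteria of Section 3.4 can be expressed as a single predicate. The Python sketch below is illustrative only: `PeptideHit` and `passes_filters` are hypothetical names, the modification label follows Mascot-style naming as an assumption, and a site residue followed by Pro is assumed not to count as a missed cleavage. The cleavage sites are those of simultaneous trypsin-Glu-C digestion (Lys, Arg, Glu). Under the Pro assumption, marker F-2 (TCCEPTVCQSTCYQPTPCVSSPVR), whose only internal Glu precedes a Pro, has zero missed cleavages, which would be consistent with its retention in Table 1.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

# Cleavage residues for simultaneous trypsin-Glu-C digestion (Section 2.4).
SITES: FrozenSet[str] = frozenset("KRE")
# Only carbamidomethylated Cys is tolerated (criterion i); label format assumed.
ALLOWED_MODS: FrozenSet[str] = frozenset({"Carbamidomethyl (C)"})


@dataclass(frozen=True)
class PeptideHit:
    sequence: str
    modifications: Tuple[str, ...] = ()
    # Species whose proteins the peptide was assigned to in the combined search.
    species: FrozenSet[str] = frozenset()


def missed_cleavages(seq: str, sites: FrozenSet[str] = SITES) -> int:
    """Internal site residues not followed by Pro (the C-terminal residue is not counted)."""
    return sum(
        1 for i, aa in enumerate(seq[:-1]) if aa in sites and seq[i + 1] != "P"
    )


def passes_filters(hit: PeptideHit, target_species: str, max_len: int = 25) -> bool:
    return (
        all(mod in ALLOWED_MODS for mod in hit.modifications)  # (i) only CAM-Cys
        and missed_cleavages(hit.sequence) == 0                 # (ii) no missed cleavages
        and len(hit.sequence) <= max_len                        # (iii) at most 25 residues
        and hit.species == frozenset({target_species})          # (iv) one species only
    )


if __name__ == "__main__":
    f2 = PeptideHit("TCCEPTVCQSTCYQPTPCVSSPVR", ("Carbamidomethyl (C)",) * 5, frozenset({"sheep"}))
    print(passes_filters(f2, "sheep"))  # prints True: the E-P bond is not counted as missed
```

The three quantitative criteria applied afterwards (Fig. 5B) are not described in enough detail to sketch and are left out.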

Fig. 2. Theoretical analysis of keratins and KAPs of goat and sheep. (A) Comparison of the numbers of keratin and KAP peptides of goat and sheep among different digestion methods (allowing one or two missed cleavages). (B) Overlap of total keratin and KAP peptides between goat and sheep for three digestion methods.

Fig. 3. Comparison of four digestion methods for cashmere and fine wool fibers. (A) Numbers of total proteins and of keratin and KAP proteins identified in cashmere fiber with the different digestion methods. (B) Numbers of total peptides and of keratin and KAP peptides in cashmere fiber with three digestion methods. (C) Numbers of unique peptides and of unique keratin and KAP peptides in cashmere fiber with these digestion methods. (D) Overlap of total keratin and KAP peptides of cashmere and wool with single or multiple digestion.

Fig. 4. Optimization of multi-enzyme digestion. (A) Difference between the two multi-enzyme digestion methods for cashmere fiber. "Simultaneous" denotes adding trypsin and Glu-C together; "Sequential" denotes adding Glu-C and then trypsin. (B) Overlap of keratin and KAP proteins and peptides between the two multi-enzyme digestion conditions.

Fig. 5. Marker peptide screening. (A) Numbers of keratin and KAP proteins and peptides identified in three replicates of cashmere and fine wool fibers. (B) Screening process for marker peptides in cashmere and fine wool.

Table 1
The marker peptides selected for discriminating the fiber species.

Marker | Peptide | Species specificity | Protein (a) | Homologous protein (b) | Identity (c)
C-1 | CGPCSSYVR | Goat | Q6R651_goat | B0LKP0_sheep | 98%
C-2 | IVYVIPSCQSSR | Goat | Q6R647_goat | W5PKZ6_sheep | 87%
C-3 | YAVPVVTVSSPE | Goat | Q6R648_goat | I3V647_goat | 99%
C-4 | LPCNPCATTNAYGK | Goat | Q6R651_goat | B0LKP0_sheep | 98%
F-1 | SHSAWSILPR | Sheep | W5Q209_sheep | W5Q5X4_sheep | 54%
F-2 | TCCEPTVCQSTCYQPTPCVSSPVR | Sheep | F5AY98_sheep | Q6R648_goat | 97%
F-3 | TGCGIGGSTGYGQVGSSGAVSSR | Sheep | Q5MD00_sheep | Q7JFW9_sheep | 99%

(a) The protein (denoted with its UniProt ID and species) containing the marker peptide.
(b) The protein with the highest amino acid sequence similarity to (a).
(c) The identity value obtained by BLASTing (b) against (a).

3.5. Quantification of species-discriminating peptides

To validate the selected species-discriminating peptides of keratins and KAPs, we designed a quantitative test on cashmere-wool fiber mixtures. We used the MS parallel reaction monitoring (PRM) technique to develop a quantitative assay for the 4 goat cashmere-specific peptides and 3 sheep wool-specific peptides, which derive from several highly homologous keratins and KAPs. A BSA peptide was spiked in as the internal standard (IS) to correct for bias from sample preparation and LC-MS injection. Owing to its high sensitivity, specificity and reproducibility, PRM has been used in many quantitative proteomics studies [21, 30, 31]. The transitions employed in this work are listed in Table S7. In PRM, the target peptides are monitored exclusively and continuously, yielding highly reliable results. Example extracted ion chromatogram (XIC) transition groups are shown in Fig. 6. A set of blended samples premixed from goat cashmere and sheep wool standard fibers at different ratios was analyzed as quality control (QC) samples. The quantitation results are shown in Table 2: the relative errors (accuracy) were all below 11% and the CVs (precision) below 20%, demonstrating the satisfactory performance of our PRM-based method for quantifying species-discriminating keratin and KAP peptides. We then tested these marker peptides in a more practical scenario, using real textile samples produced through a series of industrial processes including intense staining (Fig. S2). The fiber samples were analyzed in triplicate with the PRM assay developed above. All seven marker peptides were correctly identified in the fabric samples. More importantly, the fiber composition of the three test fabrics determined with our approach, based on MS quantification of marker peptides C-3 and F-1, agrees well with the estimates of a highly experienced QC inspector using the traditional microscope-based method (Table 3). We speculate that the intense staining of the fibers may affect protein structure and digestion efficiency, thus interfering with the quantitative results for the other marker peptides (although optical and electron microscopy examinations are also susceptible to the industrial manufacturing process and rely extensively on the analyst's experience). Further in-depth work is therefore needed to address this challenge.

Fig. 6. Two peptide markers discovered in this study for discriminating and quantifying two species of fibers: one cashmere marker peptide (YAVPVVTVSSPE) and one fine wool marker peptide (SHSAWSILPR). (A) PRM transitions of the two markers extracted with Skyline. (B) Standard curves established for the two marker peptides by PRM analysis; mixtures of C and M fibers were prepared with varying C and M percentages.

4. Discussion

Keratins and KAPs are both highly homologous and evolutionarily conserved proteins; as shown in Fig. S1, only a few amino acids differ between keratins, either within or between species. In this work, a dual-enzyme digestion workflow was developed and optimized to improve the species identification of keratins and KAPs. Because proteases have complementary cleavage specificities, multiple-enzyme digestion is often applied to increase protein identification, and it is important to choose an enzyme combination suited to the characteristics of the target protein sequences. Since the amino acid composition analysis in this work reveals a high abundance of Cys, Gly, Ser, Tyr, Arg, Glu, Phe, Leu, Pro and Val in

5. Conclusion

In this work, we established a workflow to discriminate fiber species through the highly homologous keratin and KAP proteins. Guided by the amino acid composition and assisted by in-silico digestion analysis, an appropriate protease or protease combination is selected. Unique peptides are then selected by DDA-based bottom-up proteomic profiling, and targeted PRM-based assays are developed to confirm and quantify the preselected unique peptides. Using keratins and KAPs as an example, we selected species-specific peptides following this workflow and demonstrated their specificity and quantitation accuracy with standard blended fiber samples and real textile products made from goat cashmere and sheep wool. We anticipate that the species-unique peptides selected in this work could help the assessment of industrial textile fabrics, and that the workflow would also benefit the identification of homologous proteoforms or alternatively spliced isoforms.

Table 2
Cashmere and fine wool percentages in QC samples determined using peptide markers. Percentages refer to cashmere for markers C-1 to C-4 and to fine wool for markers F-1 to F-3.

Marker peptide | Sample | Theoretical (%) | Measured (%) | CV | Relative error
C-1 | QC-1 | 20 | 19.10 | 7.61% | 4.52%
C-1 | QC-2 | 60 | 59.46 | 15.58% | 0.90%
C-2 | QC-1 | 20 | 19.29 | 11.52% | 3.53%
C-2 | QC-2 | 60 | 56.30 | 18.01% | 6.17%
C-3 | QC-1 | 20 | 20.57 | 13.65% | 2.85%
C-3 | QC-2 | 60 | 62.65 | 14.76% | 4.42%
C-4 | QC-1 | 20 | 20.05 | 11.50% | 0.23%
C-4 | QC-2 | 60 | 60.02 | 17.81% | 0.03%
F-1 | QC-1 | 80 | 85.53 | 12.31% | 6.91%
F-1 | QC-2 | 40 | 43.41 | 14.80% | 8.52%
F-2 | QC-1 | 80 | 88.63 | 17.44% | 10.78%
F-2 | QC-2 | 40 | 41.13 | 18.55% | 2.83%
F-3 | QC-1 | 80 | 71.70 | 7.64% | 10.37%
F-3 | QC-2 | 40 | 43.56 | 19.45% | 8.91%

QC-1, a mixture of cashmere and fine wool (1:4, w/w). QC-2, a mixture of cashmere and fine wool (3:2, w/w).

Acknowledgments Table 3 Quantification results of cashmere and fine wool proportions in three textile fibers. Textile fiber

T1 T2 T3

Marker peptide

C-3 C-3 F-1

Cashmere (%)

Fine wool (%)

PRM

Microscopy

PRM

Microscopy

73 ± 4 77 ± 3 N

68 ± 4 77 ± 5 N

N N 27 ± 3

N N 29 ± 5

We would like to thank the MS facility at Tianjin Institution of Industrial Biotechnology, Chinese Academy of Science for assistance in instrument usage. This work was supported by grants from National Natural Science Foundation of China [grant numbers 21505151, 31401150] and by the Key Projects in Tianjin Science & Technology Pillar Program [grant number 14ZCZDSY00062]. Conflict of interest The authors declare no competing financial interests.

N, undetected.

keratins and KAPs. Therefore, trypsin, Glu-C, and Chymotrypsin were selected as the candidate proteases. Benefited from more cleavage sites, the highest protein coverage and the number of unique peptide are obtained with trypsin-Chymotrypsin combination in in-silico digestion analysis (Fig. 2). While, the real experimental data disagreed with the in-silico analysis result, the combination of trypsin and Glu-C was demonstrated to be most suitable for the digestion of keratins and KAPs samples (Fig. 3). The number of missed cleavage could be a good indicator for validating the digestion performance of a specific protease. From the shotgun proteomics test, trypsin-chymotrypsin combination generated far more missed cleavages than trypsin-Glu-C (Fig. S3). Similar data is also documented by a previous literature from E. coli sample digestion [32]. We suspect that the relatively low proteolytic efficiency of chymotrypsin, which was not considered in in-silico digestion analysis, arises the performance difference between real experiment and theoretical analysis. Furthermore, a secondary hydrolysis from chymotrypsin will also occur on methionine, isoleucine, serine, threonine, valine, histidine, glycine, and alanine [33]. This drawback is most likely to further reduce the performance of chymotrypsin. Currently, proteomics identification relies on comprehensive and well annotated proteins sequence database. However, in this work, due to the lack of sequencing study, the sequence entry of goat species in UniProt deposit is very limited, hence, the annotation of MS spectra from goat cashmere sample is compromised. Based on the high homology between goat and sheep species, we used a combined protein sequence database of these two species to increase the identification depth of goat keratins and KAPs. As a consequence, 364 peptides corresponding to 130 proteins (Table S8) were additionally identified from goat due to the incorporation of sheep protein database. 
These possible new protein isoforms or variants would require further experiments to verify. We hope our in-depth profiling of the animal fiber proteomes will extend the current goat and sheep proteome database and provide specific molecular markers for fiber composition determination valuable for textile quality assessment.

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.jprot.2018.07.010.

References

[1] E. Shitikov, E. Ilina, L. Chernousova, A. Borovskaya, I. Rukin, M. Afanas'ev, T. Smirnova, A. Vorobyeva, E. Larionova, S. Andreevskaya, M. Kostrzewa, V. Govorun, Mass spectrometry based methods for the discrimination and typing of mycobacteria, Infect. Genet. Evol. 12 (4) (2012) 838–845.
[2] E. Durighello, L. Bellanger, E. Ezan, J. Armengaud, Proteogenomic biomarkers for identification of Francisella species and subspecies by matrix-assisted laser desorption ionization-time-of-flight mass spectrometry, Anal. Chem. 86 (19) (2014) 9394–9398.
[3] A. Stahl, U. Schroder, Development of a MALDI-TOF MS-based protein fingerprint database of common food fish allowing fast and reliable identification of fraud and substitution, J. Agric. Food Chem. 65 (34) (2017) 7519–7527.
[4] H. Kajiwara, N. Hinomoto, T. Gotoh, Mass fingerprint analysis of spider mites (Acari) by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry for rapid discrimination, Rapid Commun. Mass Spectrom. 30 (8) (2016) 1037–1042.
[5] C. Solazzo, M. Wadsley, J.M. Dyer, S. Clerens, M.J. Collins, J. Plowman, Characterisation of novel alpha-keratin peptide markers for species identification in keratinous tissues using mass spectrometry, Rapid Commun. Mass Spectrom. 27 (23) (2013) 2685–2698.
[6] S. Paolella, M. Bencivenni, F. Lambertini, B. Prandi, A. Faccini, C. Tonetti, C. Vineis, S. Sforza, Identification and quantification of different species in animal fibres by LC/ESI-MS analysis of keratin-derived proteolytic peptides, J. Mass Spectrom. 48 (8) (2013) 919–926.
[7] C. Solazzo, Follow-up on the characterization of peptidic markers in hair and fur for the identification of common North American species, Rapid Commun. Mass Spectrom. 31 (17) (2017) 1375–1384.
[8] T.K. Toby, L. Fornelli, N.L. Kelleher, Progress in top-down proteomics and the analysis of proteoforms, Annu. Rev. Anal. Chem. 9 (1) (2016) 499–519.
[9] L.C. Gillet, A. Leitner, R. Aebersold, Mass spectrometry applied to bottom-up proteomics: entering the high-throughput era for hypothesis testing, Annu. Rev. Anal. Chem. 9 (1) (2016) 449–472.
[10] T. Huang, J. Wang, W. Yu, Z. He, Protein inference: a review, Brief. Bioinform. 13 (5) (2012) 586–614.
[11] C.H. Lee, M.S. Kim, B.M. Chung, D.J. Leahy, P.A. Coulombe, Structural basis for heteromeric assembly and perinuclear organization of keratin filaments, Nat. Struct. Mol. Biol. 19 (7) (2012) 707–715.


[12] J.E. Plowman, The proteomics of keratin proteins, J. Chromatogr. B Anal. Technol. Biomed. Life Sci. 849 (1–2) (2007) 181–189.
[13] S. Deb-Choudhury, J.E. Plowman, A. Thomas, G.L. Krsinic, J.M. Dyer, S. Clerens, Electrophoretic mapping of highly homologous keratins: a novel marker peptide approach, Electrophoresis 31 (17) (2010) 2894–2902.
[14] M.A. Rogers, L. Langbein, S. Praetzel-Wunder, H. Winter, J. Schweizer, Human hair keratin-associated proteins (KAPs), Int. Rev. Cytol. 251 (2006) 209–263.
[15] L.M. Flanagan, J.E. Plowman, W.G. Bryson, The high sulphur proteins of wool: towards an understanding of sheep breed diversity, Proteomics 2 (9) (2002) 1240–1246.
[16] M. Wilhelm, J. Schlegl, H. Hahne, A.M. Gholami, M. Lieberenz, M.M. Savitski, E. Ziegler, L. Butzmann, S. Gessulat, H. Marx, T. Mathieson, S. Lemeer, K. Schnatbaum, U. Reimer, H. Wenschuh, M. Mollenhauer, J. Slotta-Huspenina, J.H. Boese, M. Bantscheff, A. Gerstmair, F. Faerber, B. Kuster, Mass-spectrometry-based draft of the human proteome, Nature 509 (7502) (2014) 582–587.
[17] J.E. Plowman, S. Deb-Choudhury, A. Thomas, S. Clerens, C.D. Cornellison, A.J. Grosvenor, J.M. Dyer, Characterisation of low abundance wool proteins through novel differential extraction techniques, Electrophoresis 31 (12) (2010) 1937–1946.
[18] J.E. Plowman, S. Deb-Choudhury, S. Clerens, A. Thomas, C.D. Cornellison, J.M. Dyer, Unravelling the proteome of wool: towards markers of wool quality traits, J. Proteomics 75 (14) (2012) 4315–4324.
[19] A.C. Peterson, J.D. Russell, D.J. Bailey, M.S. Westphall, J.J. Coon, Parallel reaction monitoring for high resolution and high mass accuracy quantitative, targeted proteomics, Mol. Cell. Proteomics 11 (11) (2012) 1475–1488.
[20] J.H. Baek, H. Kim, B. Shin, M.H. Yu, Multiple products monitoring as a robust approach for peptide quantification, J. Proteome Res. 8 (7) (2009) 3625–3632.
[21] S. Li, Y. Zhang, J. Wang, Y. Yang, C. Miao, Y. Guo, Z. Zhang, Q. Cao, W. Shui, Combining untargeted and targeted proteomic strategies for discrimination and quantification of cashmere fibers, PLoS One 11 (1) (2016) e0147044.
[22] B. Schilling, B. MacLean, J.M. Held, A.K. Sahu, M.J. Rardin, D.J. Sorensen, T. Peters, A.J. Wolfe, C.L. Hunter, M.J. MacCoss, B.W. Gibson, Multiplexed, scheduled, high-resolution parallel reaction monitoring on a full scan QqTOF instrument with integrated data-dependent and targeted mass spectrometric workflows, Anal. Chem. 87 (20) (2015) 10222–10229.
[23] L. Reiter, O. Rinner, P. Picotti, R. Huttenhain, M. Beck, M.Y. Brusniak, M.O. Hengartner, R. Aebersold, mProphet: automated data processing and statistical validation for large-scale SRM experiments, Nat. Methods 8 (5) (2011) 430–435.
[24] B. MacLean, D.M. Tomazela, N. Shulman, M. Chambers, G.L. Finney, B. Frewen, R. Kern, D.L. Tabb, D.C. Liebler, M.J. MacCoss, Skyline: an open source document editor for creating and analyzing targeted proteomics experiments, Bioinformatics 26 (7) (2010) 966–968.
[25] A.I. Nesvizhskii, A survey of computational methods and error rate estimation procedures for peptide and protein identification in shotgun proteomics, J. Proteomics 73 (11) (2010) 2092–2123.
[26] O. Serang, W. Noble, A review of statistical methods for protein identification using tandem mass spectrometry, Stat. Interface 5 (1) (2012) 3–20.
[27] Y.F. Li, P. Radivojac, Computational approaches to protein inference in shotgun proteomics, BMC Bioinformatics 13 (Suppl 16) (2012) S4.
[28] S.D. Bringans, J.E. Plowman, J.M. Dyer, S. Clerens, J.A. Vernon, W.G. Bryson, Characterization of the exocuticle a-layer proteins of wool, Exp. Dermatol. 16 (11) (2007) 951–960.
[29] I. Hanukoglu, E. Fuchs, The cDNA sequence of a Type II cytoskeletal keratin reveals constant and variable structural domains among keratins, Cell 33 (3) (1983) 915–924.
[30] J. Zhou, Y. Yin, Strategies for large-scale targeted metabolomics quantification by liquid chromatography-mass spectrometry, Analyst 141 (23) (2016) 6362–6373.
[31] A. Lesur, B. Domon, Advances in high-resolution accurate mass spectrometry application to targeted proteomics, Proteomics 15 (5–6) (2015) 880–890.
[32] P. Giansanti, L. Tsiatsiani, T.Y. Low, A.J. Heck, Six alternative proteases for mass spectrometry-based proteomics beyond trypsin, Nat. Protoc. 11 (5) (2016) 993–1006.
[33] M.M. Burrell, Enzymes of Molecular Biology, Humana Press, 1993.