Accepted Manuscript
The CPCFC cuticular protein family: anatomical and cuticular locations in Anopheles gambiae and distribution throughout Pancrustacea
Laura Vannini, John Hunter Bowen, Tyler W. Reed, Judith H. Willis
PII: S0965-1748(15)30021-7
DOI: 10.1016/j.ibmb.2015.07.002
Reference: IB 2733
To appear in: Insect Biochemistry and Molecular Biology
Received Date: 20 May 2015
Revised Date: 2 July 2015
Accepted Date: 3 July 2015
Please cite this article as: Vannini, L., Bowen, J.H., Reed, T.W, Willis, J.H, The CPCFC cuticular protein family: anatomical and cuticular locations in Anopheles gambiae and distribution throughout Pancrustacea, Insect Biochemistry and Molecular Biology (2015), doi: 10.1016/j.ibmb.2015.07.002. This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
The CPCFC cuticular protein family: anatomical and cuticular locations in Anopheles gambiae and distribution throughout Pancrustacea.
Laura Vannini1, John Hunter Bowen1, Tyler W Reed1, Judith H Willis1*
1 Department of Cellular Biology, University of Georgia, Athens, GA, USA
*Correspondence: [email protected]
Abstract
Arthropod cuticles have, in addition to chitin, many structural proteins belonging to diverse families. Information is sparse about how these different cuticular proteins contribute to the cuticle. Most cuticular proteins lack cysteine; the exceptions are two recently described families (CPAP1 and CPAP3) and the family we report on here, which has a motif of 16 amino acids first identified in a protein, Bc-NCP1, from the cuticle of nymphs of the cockroach, Blaberus craniifer (Jensen et al., 1997). This motif turns out to be present as two or three copies in one or two proteins in species from many orders of Hexapoda. We have named the family of cuticular proteins with this motif CPCFC, based on its unique feature of having two cysteines interrupted by five amino acids (C-X(5)-C). Analysis of the single member of the family in Anopheles gambiae (AgamCPCFC1) revealed that its mRNA is most abundant immediately following ecdysis in larvae, pupae and adults. The mRNA is localized primarily in epidermis that secretes hard cuticle, sclerites, setae, head capsules, appendages and spermatheca. EM immunolocalization revealed the presence of the protein, generally in endocuticle of legs and antennae. A phylogenetic analysis found proteins bearing this motif in 14 orders of Hexapoda, but not in some species for which there are complete genomic data. Proteins were much longer in Coleoptera and Diptera than in other orders. In contrast to the 1 and occasionally 2 copies in other species, a dragonfly, Ladona fulva, has at least 14 genes coding for family members. CPCFC proteins were present in four classes of Crustacea with 5 repeats in one species, and motifs that ended C-X(7)-C in Malacostraca. They were not detected, except as obvious contaminants, in any other arthropod subphyla or in any other phylum. The conservation of CPCFC proteins throughout the Pancrustacea and the small number of copies in individual species indicate that, when present, these proteins are serving important functions worthy of further study.
Keywords: Cuticle, EM immunolocalization, in situ hybridization, arthropod phylogeny, RT-qPCR
1. Introduction
Over a dozen families of cuticular proteins (CPs) have been described. One (CPR) has well over 100 genes in several species (Cornman et al., 2008; Futahashi et al., 2008; Cornman, 2009; Willis, 2010; Willis et al., 2012; Ioannidou et al., 2014; Neafsey et al., 2015). Additional data on temporal and spatial expression (both in terms of tissue distribution and location within the cuticle) have also been published. Early papers are reviewed in Willis et al. (2012); more recent ones are Nor et al. (2014; 2015), Pesch et al. (2015) and Vannini et al. (2014a,b). An unusual family that generally has only one member in a species (and very rarely more than two) was named CPCFC by Willis et al. (2012) because of a motif of C-X(5)-C (two cysteines interrupted by five amino acids). The “type specimen” for CPCFC is Bc-NCP1, isolated from nymphal cuticle of the cockroach, Blaberus craniifer (Jensen et al., 1997) [GenBank: P80674]. The paper describing that sequence established the fundamental property of the family: a 16 amino acid motif, here repeated 3 times, that ends C-X(5)-C. The final motif is at the carboxy-terminus of the protein. In addition, Jensen et al. (1997) speculate, after ruling out a role in cross-linking via quinones: “It is more likely that the three cysteine-containing loops in Bc-NCP1 are involved in some sort of specific interaction or binding, either to metal ions or to other proteins.”
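The defining pattern can be made concrete with a short search routine. The following is a minimal sketch, not a procedure used in this study: it assumes the motif is modelled loosely as any 16-residue window whose last seven residues are C-X(5)-C, and it ignores the other conserved residues. The function name, its parameters and the toy sequence are hypothetical.

```python
import re

def find_motif_windows(protein: str, gap: int = 5, prefix: int = 9) -> list[tuple[int, str]]:
    """Return (0-based start, window) for every window of `prefix` residues
    followed by C, `gap` residues, and C. With the defaults the window is
    16 residues long and ends C-X(5)-C."""
    # A lookahead is used so that overlapping candidate windows are all reported.
    pattern = re.compile(rf"(?=(.{{{prefix}}}C.{{{gap}}}C))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(protein.upper())]

# Toy sequence invented for illustration only (not a real protein).
toy = "MKT" + "APAGAPSGYCPAGYTC" + "GG" + "QPSAPGAPYCTSGAPC"

for start, window in find_motif_windows(toy):
    print(start, window)
```

Because the pattern is deliberately permissive, a hit is only a candidate; a real search would also weigh the conserved residues within the motif.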
Now we describe, in detail, expression and localization of one member of that family, AgamCPCFC1, in Anopheles gambiae. We conclude with an analysis of the phylogenetic distribution of members of that family in many orders of Pancrustacea (Hexapoda + Crustacea). Our analysis revealed consistent variants of CPCFC proteins in different orders. The widespread distribution of this family represents the second time a motif identified in a few cuticular protein sequences (5 in the case of the R&R Consensus in the CPR family (Rebers and Riddiford, 1988), one sequence here (Jensen et al., 1997)) turns out to have been conserved in CPs found throughout arthropods (reviewed in Willis, 2010; Willis et al., 2012).
2. Materials and methods
2.1. Anopheles rearing
An. gambiae (G3 strain) were obtained as newly hatched first instar larvae from the breeding facility at the University of Georgia Entomology Department. They were raised at 27 °C under a 12:12 photoperiod and fed ground Koi Food Staple Diet (Foster and Smith Aquatics, Rhinelander, WI, USA).
2.2. RT-qPCR
An. gambiae larvae, pupae and adults were carefully timed relative to a molt, placed in TRIzol® and immediately frozen. RNA was prepared following the manufacturer’s instructions. Superscript III First Strand Synthesis Kit (Invitrogen) with oligo(dT)20 primers was used for cDNA production, and RT-qPCR was carried out with Bio-Rad’s CFX Connect Real Time system. Additional details are in Supplementary File 1, which provides MIQE information in a format recommended by Bustin et al. (2013). Calculations were carried out with LinRegPCR software (Ruijter et al., 2009).
The primers used were located near the end of the coding region and extended into the 3’UTR, with an amplification product of 103 nt (Supplementary Files 2,3). Before use, the primers were checked on genomic DNA for amplification kinetics against two single-copy genes, RpS7 [GenBank:AGAP010592] and the epidermal chitin synthase [GenBank:AGAP001748], to ensure that they were amplifying only a single gene. RpS7 was run on every plate with every cDNA preparation, but was not used to normalize values. Rather, we calculated N0, described as R0 in Togawa et al. (2008), basing values on concentrations of RNA determined with a NanoDrop N-1000 (Thermo Scientific). This was necessary because we have failed to find housekeeping genes with consistent expression across the range of developmental stages we studied. Figures showing the variable values obtained with the RpS7 primers and CPCFC1 data normalized to RpS7 are in Supplementary File 4.
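The N0 calculation described above can be illustrated with a short example. This is a minimal sketch under stated assumptions: it uses the relation N0 = Nq / E^Cq, in which Nq is the fluorescence threshold, E the mean amplification efficiency for the amplicon (between 1 and 2) and Cq the quantification cycle, and it assumes that values are then expressed per ng of input RNA. The function names and the numbers are hypothetical and are not data from this study.

```python
def starting_concentration(threshold: float, efficiency: float, cq: float) -> float:
    """N0 = Nq / E**Cq, in arbitrary fluorescence units."""
    if not 1.0 < efficiency <= 2.0:
        raise ValueError("efficiency is expected to lie between 1 and 2")
    return threshold / efficiency ** cq

def n0_per_ng_rna(threshold: float, efficiency: float, cq: float, rna_ng: float) -> float:
    """Express N0 relative to the amount of RNA in the reaction (assumed unit: ng)."""
    if rna_ng <= 0:
        raise ValueError("RNA input must be positive")
    return starting_concentration(threshold, efficiency, cq) / rna_ng

# Illustrative numbers only: a lower Cq corresponds to a higher starting amount.
early = n0_per_ng_rna(threshold=1.0, efficiency=2.0, cq=10, rna_ng=4.0)
late = n0_per_ng_rna(threshold=1.0, efficiency=2.0, cq=14, rna_ng=4.0)
print(early / late)
```

Expressing N0 per unit of input RNA avoids dividing by a reference gene, which matters when, as here, no reference gene is stable across developmental stages.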
2.3. In situ hybridization
In situ hybridization was carried out on 4 µm paraffin sections of paraformaldehyde-fixed An. gambiae of different developmental stages prepared by the Histology Laboratory of the University of Georgia College of Veterinary Medicine. DIG-labeled antisense probe preparation and hybridization followed the methods described in earlier publications from our laboratory (Vannini et al., 2014a,b). The primers used and resulting probes are shown in Supplementary Files 2 and 3, respectively. We used one probe directed against the coding region and another against the 3’UTR. Identical patterns of hybridization were found (Supplementary File 5). Sense probes corresponding to both antisense probes were also prepared; they validated the specificity of the technique (Supplementary File 6). Anatomical nomenclature is based on Harbach and Knight (1980).
2.4. Cloning and expression of AgamCPCFC1
The coding sequence for almost all of the mature form of AgamCPCFC1 was cloned into the Lucigen Expresso™ T7 Cloning and Expression System with an N-His tag. Primers are given in Supplementary File 2. They cover the entire coding sequence of the mature protein except for the regions coding for the first four and last three amino acids (Supplementary File 3B).
The expressed protein was solubilized in 3 M urea, 10 mM DTT (dithiothreitol), purified with Talon IMAC Metal Affinity Resin packed into a Bio-Rad column, eluted with 1 M imidazole and sent to Harlan Bioproducts for antibody production in rabbits, using their 112-day protocol.
2.5. EM immunocytochemistry
Legs and antennae with Johnston’s organs were dissected from precisely aged pharate and post-eclosion adults and fixed in 4% paraformaldehyde, 0.3% glutaraldehyde + 4% sucrose in phosphate buffer (pH 7.4). Further details about embedding in LR White resin (Electron Microscopy Sciences) and subsequent processing are given in Vannini et al. (2014a,b). Anti-AgamCPCFC1 and secondary antibodies (goat-anti-rabbit, conjugated to 5 nm gold particles, Sigma) were diluted 1:5,000 and 1:50, respectively. We found only an occasional gold particle on sections incubated with hybridization buffer rather than the primary antibody. We used a JEM-1210 transmission electron microscope (JEOL USA) at 120 kV. The images were captured with an XR41C Bottom-Mount CCD Camera (Advanced Microscopy Techniques).
2.6. Phylogenetic analysis via BLAST searches
BLAST searches (tblastn) for CPCFC family members were carried out at http://blast.ncbi.nlm.nih.gov/Blast.cgi using either the first motif from Blaberus craniifer Bc-NCP1 [GenBank:P80674.1] or its entire sequence. We used default settings except for turning off filtering and masking of low complexity regions. We searched EST and TSA databases. We only included in our analyses (with one exception) sequences that had a signal peptide and a stop codon and at least two occurrences of the 16-amino-acid CPCFC motif. We omitted all sequences that came from the 1KITE (1K Insect Transcriptome Evolution) project submitted in January 2014, because we found a small number of cases with identical sequences in two or more orders. At the time of writing this paper these data were under review and revision, which may resolve the inconsistencies that we observed. We used the phylogenetic nomenclature of von Reumont et al. (2012) and Misof et al. (2014) as well as many of the sequences produced in their analyses.
Additional searches were done with wgs (whole-genome shotgun contigs) using Odonata (taxid:6961) as the search term. These could not produce complete sequences unless the region coding for the entire protein was in a single exon, something we have not yet seen for CPCFC genes. Nonetheless, we got provocative results for Ladona fulva.
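The inclusion criteria listed above amount to a simple filter over candidate translations. The sketch below is one hypothetical way to express it, not the workflow actually used: it assumes each hit has already been conceptually translated (with `*` marking a stop codon), that signal peptide prediction comes from an external tool and is supplied as a flag, and that the motif is modelled loosely as nine residues followed by C-X(5)-C. The class, field and function names and the toy sequences are invented for illustration.

```python
import re
from dataclasses import dataclass

# Loose model of the motif (an assumption): 9 residues, then C-X(5)-C.
MOTIF = re.compile(r".{9}C.{5}C")

@dataclass
class Candidate:
    name: str
    translation: str          # conceptual translation; '*' marks a stop codon
    has_signal_peptide: bool  # result of an external predictor
    from_1kite: bool = False  # True if the record comes from the omitted project

def motif_count(protein: str) -> int:
    """Count non-overlapping 16-residue windows that end C-X(5)-C."""
    return len(MOTIF.findall(protein))

def passes_criteria(c: Candidate) -> bool:
    """Signal peptide, terminal stop codon, at least two motifs, not from 1KITE."""
    complete = c.has_signal_peptide and c.translation.endswith("*")
    enough_motifs = motif_count(c.translation.rstrip("*")) >= 2
    return complete and enough_motifs and not c.from_1kite

# Toy sequences invented for illustration only (not real proteins).
one = "MKT" + "APAGAPSGYCPAGYTC"
two = one + "GG" + "QPSAPGAPYCTSGAPC"

print(passes_criteria(Candidate("toy_complete", two + "*", True)))
```

The single exception mentioned in the text (an incomplete sequence that was nevertheless reported) would have to be handled outside such a filter.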
3. Results and discussion
3.1. Genomic structure
AgamCPCFC1 [GenBank:AGAP007980] is coded by a gene with three exons, the first of which codes for only 5 amino acids (Supplementary File 3A). Such a short first exon is a common feature of CPs in other families (Willis et al., 2010). The sequence is certain to be correct, for there are 4 ESTs with 100% sequence identity and an additional 50 with 99% identity, all covering the entire coding sequence. These ESTs came from the Celera Anopheles gambiae EST project with directional cloning on mixed-sex adults, using strain RSP-ST (Reduced susc. to Permethrin).
The ortholog in Drosophila melanogaster has only two exons, and the first also codes for only 5 amino acids (Supplementary File 3D).
3.2. Temporal expression of transcripts
RT-qPCR was used to learn when mRNA from AgamCPCFC1 was present. Highest levels were found immediately after ecdysis to fourth instar larvae, to pupae and to adults. Far lower levels of transcripts were detected in intermolt and pharate periods (Fig. 1).
3.3. Anatomical location of transcripts for AgamCPCFC1
We carried out in situ hybridization to learn where the mRNA for AgamCPCFC1 was localized. We used two different antisense probes, one designed in the coding region, the other in the 3’UTR (Supplementary File 3A). In successive sections, hybridization patterns were identical with the two probes (Supplementary File 5). We selected animals at developmental stages where our RT-qPCR data indicated that mRNA was likely to be present, namely pharate and newly eclosed animals. Sense controls for both probes showed no specific hybridization (Supplementary File 6).
Transcripts were found in epidermis of larvae, pupae and adults underlying cuticle destined to be highly sclerotized, i.e. hard cuticle. Thus in sections of larvae (Fig. 2), probe was found in the head capsule (Fig. 2B), in cells that secrete lateral setae (arrows in Fig. 2A-C) and in the cells that form the grid and brush at the posterior end (Fig. 2D). Our slides of larvae had animals at different developmental ages; thus it was not unexpected that we found many sections without labeled cells in the head capsule.
In sections of pupae that were less than one hour after pupation (Fig. 3), label was present in cells that form bristles on the pupal abdomen (Fig. 3B); it was also present in the developing antennae (Fig. 3C) and adult scales that surprisingly are already forming (Fig. 3D). Label was found in epidermis underlying abdominal sclerites but not intersegmental membranes (Fig. 3A), with the exception of places where muscle is inserting into the intersegmental membrane (Mus in Fig. 3A).
In pharate adults (Fig. 4), hybridization of the probe was found in sclerites (Fig. 4A), in muscle attachment zones (Fig. 4B), and in epidermis of Johnston’s organ (JO) both beneath the basal plate and under the pedicel that surrounds the organ (Fig. 4D). It was also present in the epidermis of the flagellum (Fig. 4D), spermatheca (Fig. 4C) and the cervical sclerite (Fig. 4E). Just as in the pupa, CPCFC1 transcript was not found in intersegmental membranes (Fig. 4A).
In recently eclosed adults (Fig. 5), CPCFC1 transcript was once again detected in JO and the flagellum of the antennae (Fig. 5A), the male cerci (Fig. 5B), and other appendages (Fig. 5C,D).
3.4. Localization of AgamCPCFC1 protein within the cuticle
We used EM immunolocalization in order to learn where CPCFC1 was within the cuticle. EM sections were treated with a polyclonal antibody (Ab) that had been raised against most of the mature form of CPCFC1 (Supplementary File 3B). The specificity of the antibody is shown in a Western blot of proteins isolated from adult legs (Supplementary File 3C). Ab binding to EM sections was visualized with a colloidal-gold-labeled secondary antibody against rabbit IgG. We examined structures where the transcript, as visualized with in situ hybridization, was abundant: legs and the antenna. We use the term exocuticle for cuticle formed prior to ecdysis, with endocuticle secretion beginning after ecdysis. In adult legs fixed within a day of eclosion or on Day 8 of the adult stage, labeling for AgamCPCFC1 was strong and found exclusively in the endocuticle of both the leg and its apodemes (Fig. 6A-C). In most regions of the legs of pharate adults (P24), when, by definition, no endocuticle is present, no trace of AgamCPCFC1 was found (Fig. 6D). But in other regions of the pharate adult leg, we did find evidence for AgamCPCFC1 in exocuticle, both in regions with well-formed lamellae and in not yet organized regions next to the epidermal cells. This was most noticeable at the base of the leg and near a joint (Fig. 7A). We also saw label in the pedicel of pharate adults (Fig. 7B) and flagellum of newly emerged adults (Fig. 7C), once again, where endocuticle should not yet be present (Fig. 7B). Absence of an antigen in the cuticle might just mean that it has been masked during the sclerotization process. Hence it would be premature to conclude that, except for an occasional region, AgamCPCFC1 is confined to the endocuticle. The higher levels of transcript right after a molt rather than immediately before (Fig. 1), however, are consistent with the endocuticle being the primary destination of the protein.
3.5. Phylogenetic distribution of CPCFC genes in Hexapoda
RNAseq technology has provided a plethora of sequences from diverse arthropods, available as TSA (Transcriptome Shotgun Assembly), that greatly expanded the number of sequences available from ESTs or genomic data. These new data provided a rich source of CPCFCs including some from minor orders. Searches were carried out with blastp and tblastn (see Methods) and we found 72 complete sequences distributed across the Hexapoda (Table 1; Supplementary File 7). We required that a sequence be complete with a signal peptide and a stop codon in order to be included in the analysis, a stringent criterion especially for sequences obtained with pyrosequencing (454), where we found occasional frame shifts, recognized because parts of the protein resided in two different reading frames. No attempt was made to reconcile these. Further details on search strategies are described in Section 2.6.
The complete sequences identified were sufficient to gain insight about the CPCFC family. With but two exceptions, the original Blaberus protein (Bc-NCP1) and AgamCPCFC1, the proteins discussed are only putative cuticular proteins. Bc-NCP1 was isolated from clean nymphal cuticle, and we presented immunological evidence for the presence of AgamCPCFC1 in the cuticle. All of the sequences we report have signal peptides, establishing that they are secreted. One incomplete sequence from Pediculus humanus is presented (in different or red type) in Table 1 and Supplementary File 7, but data from it were not used in the numerical analyses.
The diagnostic feature of this family is the presence of a 16 amino acid motif, first identified by Jensen et al. (1997). WebLogos (Crooks et al., 2004) based on motifs from holo- and non-holometabolous hexapods and diverse Crustacea are given in Fig. 8. They show that in addition to the two cysteines that provided the name for this family, there are three prolines, in positions 2, 11 and 14, that are universal across the Hexapoda. Several other residues are highly conserved, making this an easily recognized motif.
Additional consistent features are evident, but we acknowledge that these conclusions are preliminary and may well be revised as more sequences become available. The most common protein structure of the CPCFC family had three copies of the motif, but sequences from three orders, Collembola, Coleoptera and Lepidoptera, had only two. One of the two sequences from the Odonata also had only two motifs (Table 1). Most species have only a single copy of the gene. The presence of two genes in the coleopteran Tribolium castaneum led to the speculation that where only two motifs were present, there would be two genes. Yet we have identified only 2/10 species of Coleoptera and 2/14 species of Lepidoptera with two copies of CPCFC. There was one dipteran and one odonate with two CPCFC genes (Table 1, Supplementary File 7). An intriguing exception in another odonate, Ladona fulva, is discussed below.
The most surprising phylogenetic finding was that the family was almost completely absent from Hymenoptera, with only one complete sequence identified, from Cephus cinctus, a sawfly. This is despite the abundance of sequence information for this order, with data from many species and complete genomes for three species of Nasonia as well as Apis dorsata and Apis mellifera, the latter with a recently updated proteome (Elsik et al., 2014).
SignalP (Petersen et al., 2011) was used to predict the signal peptides shown in Supplementary Files 7 and 9. The first amino acid in Bc-NCP1 is glutamine (Q), which was present as a pyroglutamate residue (Jensen et al., 1997). An initial Q was present, after the signal peptide was removed, in many of the sequences. In addition, we noticed that many of the retrieved sequences had a Q close to the end of the signal peptide. In most cases, the SignalP result showed that this could follow an alternative cleavage site. The signal for these sequences was modified (bold in Supplementary File 7) to move the Q into the mature protein, resulting in 6/12 non-holometabola sequences beginning in this manner, providing further evidence for the conservation of the entire protein sequence. In the Holometabola, Q was less common. Instead, in the Lepidoptera, arginine (R) was the first amino acid in 13/16 sequences, and in the Diptera it was lysine (K) in 22/28. Except for the Coleoptera, there are fewer than 10 amino acids from the start of the mature protein to the start of the first motif. Generally there are zero or one amino acids after the final cysteine at the carboxy-terminus, but occasionally more (Table 1).
Another generalization is that the mature protein, with one exception, does not exceed 130 amino acids except in the Coleoptera and Diptera, in which all family members exceed that length. The lepidopteran sequences are more comparable in length to members of the non-holometabolous orders (Table 1). There also appear to be amino acids immediately adjacent to the 16-amino-acid motifs that differ between the different motifs within a sequence and among different orders. For example, almost all of the lepidopteran sequences have arginine-glutamic acid (RE) immediately upstream of the first motif, while this was not seen in any of the dipteran sequences, all of which have longer stretches before the first motif, most frequently with alanine-glutamine (AQ) immediately upstream of the first motif (Supplementary File 7). Whether these differences represent something functional or result from a chance event in evolution remains to be learned.
While we have focused our discussion on the number and placement of the CPCFC 16-amino-acid motif within the protein, it is apparent that the rest of the protein must be conferring important functional properties. This is clearest in the three major Holometabola orders, Coleoptera, Lepidoptera and Diptera. Extensions of the amino-terminus and the regions between motifs are populated by the acidic amino acids, glutamine (Q) or asparagine (N), with fairly evenly spaced aromatic residues tyrosine (Y), tryptophan (W), or phenylalanine (F) (Supplementary File 7).
In addition to the presence of only two copies of the CPCFC motif in Coleoptera and Lepidoptera, there are other features of the long sequences from these groups and from the Diptera that enable one to assign a sequence to the correct order.
The generalizations presented here are certain to change as data on more species become available. For example, a tblastn search for whole genome sequences (WGS) in just the Odonata revealed evidence for 14 distinct CPCFC genes in Ladona fulva. None were complete, for the start of the signal peptides was missing, something not unexpected since the first exon is generally very short and would not have been continuous with the presumed second exon, which in these genes had the rest of the coding region. All ended with stop codons. These 14 genes were distributed across 10 contigs. Ten sequences had three motifs, and 4 had two (Supplementary File 8). Three with two motifs were unusual because the final motif was not near the C-terminus, but from 63-84 amino acids away. Possibly as whole genome sequences become available for other species, more examples will be found with more than two CPCFC genes. Ladona CPCFCs also upset the generalization about length: the length of the proteins from the first motif to the end exceeds 131 amino acids in 7 of the sequences, excluding the two with unusual carboxy-termini. Hence, unless an intron interrupts what we have interpreted as a continuous second exon, the Coleoptera and Diptera will not be the only orders with long proteins. The one exception noted above, a non-Holometabola sequence with greater than 140 amino acids, interestingly is one of the two sequences from another odonate, Enallagma hageni (Table 1).
3.6. Phylogenetic distribution of CPCFC genes in Crustacea
While the available data are far more limited in the Crustacea, we found representatives of CPCFC in four of the six classes: Ostracoda, Malacostraca, Maxillopoda, and Remipedia (Table 2, Supplementary File 9). Variation among groups was informative. A large number of hits that were not examined further were to sequences that had only one of the motifs. The barnacle (Amphibalanus amphitrite) had five motifs, and that was the only sequence in Crustacea that was longer than 100 amino acids. Remipedia, the group reported by von Reumont et al. (2012) to be most closely related to the hexapods, had two sequences from one species, Speleonectes, one with two motifs, one with three. The more basal group (Ostracoda) had two sequences, both with two motifs. Most intriguing were the 6 members of this family in Malacostraca. All had a variant on the basic motif, namely C-X(7)-C, present twice in each sequence. This variant was not found in any other group of arthropods. Since Jensen et al. (1997) suggested that the motif may function to bind metal ions, it would be interesting to learn if some unusual metal is used by members of this class.
The conservation of CPCFC proteins across the arthropods and the somewhat consistent differences among members of different orders suggest that these proteins must be playing a significant role in the cuticle. Their absence in some Hymenoptera indicates that whatever that role is, it is not irreplaceable.
3.7. Is CPCFC1 found outside Arthropoda?
We wondered if the CPCFC motif so highly conserved in Crustacea and Hexapoda could be found in other groups. It was, and while details are in Supplementary File 10, a summary is given below:
BLAST searches (tblastn, against EST or TSA entries, excluding Arthropoda) turned up five hits. One hit was to a sequence from a Homo sapiens brain cDNA library [GenBank:HY131203.1]. The sequence is not present in the database of Homo sapiens proteins, not surprisingly, because it has a 100% match to a protein from the cockroach, Blattella germanica [GenBank:GBID01001268.1].
We also got hits to two plants, Karelinia caspia (Asteraceae, a daisy, [GenBank:GANI01023091.1]) and Humulus lupulus (common hop, [GenBank:GAAW01027316.1]). TSA entries from another animal, Hynobius chinensis (Chinese salamander, [GenBank:GAQK01079415.1]), also had a CPCFC sequence.
The daisy sequence was a perfect match to, and indeed completed, an abbreviated sequence from the silverleaf (sweet potato, tobacco) whitefly, Bemisia tabaci. The hop was clearly contaminated by a fruit fly, probably in the genus Bactrocera, and the salamander sequence was very close to that of a chironomid.
A final case of contamination was in Daphnia pulex, the only sequence identified for the crustacean class Branchiopoda. Searches of ESTs for CPCFC in Crustacea result in top hits to Daphnia pulex, but exclusively to library 12, the one where the Daphnia had been exposed to Chaoborus americanus in order to monitor the transcriptional response to this predatory midge (Table S10 in Colbourne et al., 2011). Thus it is not surprising that when the complete Daphnia sequence [GenBank:FE342003.1] is itself used in a BLAST search against ESTs, instead of linking to other Crustacea, the top match is to a different midge, Corethrella appendiculata [GenBank:GANO01004087.1], followed by various mosquitoes.
4. Conclusions
A new family of cuticular proteins, CPCFC, has members widely dispersed among the Pancrustacea. Members are generally present in 1-2 copies per species, with a protein having two to three copies of the 16 amino acid CPCFC motif that ends C-X(5)-C. A notable exception was seen in the dragonfly, Ladona fulva, where 14 genes, each with 2 or 3 CPCFC motifs, were found.
Experimental work with the An. gambiae family member, AgamCPCFC1, revealed that the mRNA is most abundant immediately following a molt; transcripts are found predominantly in epidermis secreting hard cuticle, and the protein has been localized mainly in endocuticle. Available information on phylogenetic distribution and protein characteristics revealed that CPCFC is distributed throughout the Hexapoda and in several classes of Crustacea. Amino acid sequences in two Holometabola orders, Coleoptera and Diptera, were longer than in the other orders. All sequences found in the Malacostraca had a motif that ended C-X(7)-C, rather than C-X(5)-C.
Figure legends
Fig. 1. RT-qPCR analysis of AgamCPCFC1 transcripts in Anopheles gambiae. L48 and P24 are actually pharates of the next stage. See text and Supplementary File 1 for methods.
Fig. 2. In situ hybridization of AgamCPCFC1 on sections of 4th instar larvae. A. Photograph of larva with arrows showing location of lateral setae on thorax and abdomen and a double arrow indicating the grid and fringe at the posterior end. B. Head capsule and bit of prothoracic segment. Note the presence of hybridization in the small cells that form setae at the anterior edge of the prothorax. C. Section of the abdomen showing cells that are forming setae. D. Grid and accompanying fringe at posterior end of a larva. E. Section showing cells secreting large and small setae. (B,D 3’ probe; C,E coding region probe.)
Fig. 3. In situ hybridization of AgamCPCFC1 on sections of pupae less than 1 hour after pupation. A. Section of abdomen showing epidermal hybridization in sclerites (Scl) and, in the intersegmental membrane (IsM), only where muscles (Mus) are inserting into the cuticle. B. Lateral surface of pupal abdomen with setae-forming cells. C. Developing antenna in pupa. Structure was recognized because it is similar to that shown in Fig. 76a of Harbach and Knight (1980). D. Limb with developing scales showing hybridization. E. Muscle insertion zone with strong hybridization. (B,D 3’ probe; A,C,E coding region probe.)
Fig. 4. In situ hybridization of AgamCPCFC1 on sections of pharate adults. Animals were fixed 24 hours after pupation, which is a few hours before ecdysis to the adult. A. Hybridization to epidermis of sclerites (Scl), but not intersegmental membranes (IsM). B. Hybridization in muscle attachment region. C. Hybridization in spermatheca (Sp). D. Hybridization under basal plate (BP) of Johnston’s organ, the surrounding pedicel (Ped) and the flagellum (Fl). E. Hybridization to part of cervical sclerite. (D,E 3’ probe; A,B,C coding region probe.)
Fig. 5. In situ hybridization of AgamCPCFC1 on adults less than 12 hours after eclosion. A. Antenna with Johnston’s organ (JO) and flagellum (Fl) showing strong hybridization. B. Cerci at the terminal end of the male abdomen. C and D. Hybridization in appendages. (All coding region probe.)
Fig. 6. EM immunolocalization of AgamCPCFC1 on legs from adults of various ages. In these sections label is restricted to endocuticle. A. Leg from adult one day after eclosion. B. Apodeme from same animal. Exocuticle is interior in the apodemes. C. Section of leg from animal 8 days after eclosion. D. Pharate adult with only exocuticle and no labeling visible. ex, exocuticle; en, endocuticle; ep, epidermis. Scale bars are 500 nm.
Fig. 7. EM immunolocalization of AgamCPCFC1 in both exo- and endo-cuticle. A. Leg of a pharate adult (P24) showing areas of lamellar exocuticle with labeling near a joint. Insert lower power of relevant region. B. Labeling in exocuticle of P24 pedicel. C. Both exo- and endo-cuticle labeled in flagellum of adult <12 h after eclosion. Abbreviations as in Fig. 6. Scale bars are 500 nm.
Fig. 8. WebLogos constructed for CPCFC motifs highlighted in Supplementary Files 7 and 9.
Acknowledgements
We thank Drs. Reben Rhaman and Sheng-Cheng Wu for producing the AgamCPCFC1 protein used for antibody generation. We also thank Dr. Mark R. Brown and Anne Robertson for maintaining the mosquito facility from which the animals were obtained, MR Brown for help interpreting mosquito structures, and Dr. Michael Strand for access to his Leica photomicroscope and Jena Johnson for training in its use. Dr. Neal Dittmer alerted us to the presence of two CPCFC genes in Tribolium; Dr. Hugh Robertson found the Cephus sequence; Dr. Michael Pfrender supplied information about Daphnia and Drs. Bernhard Misof and Karen Meusemann provided guidance about the 1KITE sequences. We thank Mary B. Ard of the Electron Microscopy Laboratory at the University of Georgia College of Veterinary Medicine for technical support. Drs. Yihong Zhou and John S. Willis and three anonymous reviewers provided helpful comments on the MS. This research was funded by a grant from the U.S. National Institutes of Health R01AI055624.
Competing interests
The authors declare that they have no competing interests.
Appendix A. Supplementary data
Supplementary File 1. Conditions used for RT-qPCR following MIQE guidelines (Bustin et al., 2013).
Supplementary File 2. Primers used for RT-qPCR, in situ probe construction, and protein expression.
Supplementary File 3. Genomic regions (A) and protein sequence (B) for AgamCPCFC1 and Western blot (C) for antibody used for EM immunolocalization. D. CPCFC ortholog in Drosophila melanogaster.
Supplementary File 4. Illustration of why RT-qPCR data were not normalized to RpS7.
Supplementary File 5. In situ hybridization showing comparable hybridization with AgamCPCFC1 probes in protein coding and 3’UTR. A. Label in muscle insertion zones of pharate adult (P24) comparable to Figure 4B. B, C. Hybridization to epidermis in head capsules and cells forming tiny setae in prothorax from adjacent sections of larvae 97-120 hours after feeding began. D. Hybridization to grid and fringe in section adjacent to Figure 2D. (A,B 3’ probe; C,D coding region probe.)
Supplementary File 6. In situ hybridization of adjacent sections, processed at the same time, using antisense and sense probes. A, B. Treatment of sections of larvae with antisense and sense probes in the protein coding region. Background is high due to low hybridization temperature (55 °C) relative to high melting point of probe (89 °C) as calculated with basic setting of OligoCalc (http://www.basic.northwestern.edu/biotools/OligoCalc.html). Acellular head capsule of previous instar has some stained cuticle, a common occurrence with RNA probes. C-F. Treatment of sections of animals fixed within 30 min of pupation. Probes against 3’UTR have calculated melting temperature 77.6 °C.
Supplementary File 7. Sequences of CPCFC proteins in Hexapoda.
Supplementary File 8. Sequences of CPCFC proteins in Ladona fulva.
Supplementary File 9. Sequences of CPCFC proteins in Crustacea.
Supplementary File 10. Non-arthropod TSA hits and their most closely related arthropod match.
References:
Bustin, S.A., Benes, V., Garson, J., Hellemans, J., Huggett, J., Kubista, M., Mueller, R., Nolan, T., Pfaffl, M.W., Shipley, G., Wittwer, C.T., Schjerling, P., Day, P.J., Abreu, M., Aguado, B., Beaulieu, J.F., Beckers, A., Bogaert, S., Browne, J.A., Carrasco-Ramiro, F., Ceelen, L., Ciborowski, K., Cornillie, P., Coulon, S., Cuypers, A., De Brouwer, S., De Ceuninck, L., De Craene, J., De Naeyer, H., De Spiegelaere, W., Deckers, K., Dheedene, A., Durinck, K., Ferreira-Teixeira, M., Fieuw, A., Gallup, J.M., Gonzalo-Flores, S., Goossens, K., Heindryckx, F., Herring, E., Hoenicka, H., Icardi, L., Jaggi, R., Javad, F., Karampelias, M., Kibenge, F., Kibenge, M., Kumps, C., Lambertz, I., Lammens, T., Markey, A., Messiaen, P., Mets, E., Morais, S., Mudarra-Rubio, A., Nakiwala, J., Nelis, H., Olsvik, P.A., Perez-Novo, C., Plusquin, M., Remans, T., Rihani, A., Rodrigues-Santos, P., Rondou, P., Sanders, R., Schmidt-Bleek, K., Skovgaard, K., Smeets, K., Tabera, L., Toegel, S., Van Acker, T., Van den Broeck, W., Van der Meulen, J., Van Gele, M., Van Peer, G., Van Poucke, M., Van Roy, N., Vergult, S., Wauman, J., Tshuikina-Wiklander, M., Willems, E., Zaccara, S., Zeka, F., Vandesompele, J., 2013. The need for transparency and good practices in the qPCR literature. Nat. Methods 10, 1063-1067.
Colbourne, J.K., Pfrender, M.E., Gilbert, D., Thomas, W.K., Tucker, A., Oakley, T.H., Tokishita, S., Aerts, A., Arnold, G.J., Basu, M.K., Bauer, D.J., Caceres, C.E., Carmel, L., Casola, C., Choi, J.H., Detter, J.C., Dong, Q., Dusheyko, S., Eads, B.D., Frohlich, T., Geiler-Samerotte, K.A., Gerlach, D., Hatcher, P., Jogdeo, S., Krijgsveld, J., Kriventseva, E.V., Kultz, D., Laforsch, C., Lindquist, E., Lopez, J., Manak, J.R., Muller, J., Pangilinan, J., Patwardhan, R.P., Pitluck, S., Pritham, E.J., Rechtsteiner, A., Rho, M., Rogozin, I.B., Sakarya, O., Salamov, A., Schaack, S., Shapiro, H., Shiga, Y., Skalitzky, C., Smith, Z., Souvorov, A., Sung, W., Tang, Z., Tsuchiya, D., Tu, H., Vos, H., Wang, M., Wolf, Y.I., Yamagata, H., Yamada, T., Ye, Y., Shaw, J.R., Andrews, J., Crease, T.J., Tang, H., Lucas, S.M., Robertson, H.M., Bork, P., Koonin, E.V., Zdobnov, E.M., Grigoriev, I.V., Lynch, M., Boore, J.L., 2011. The ecoresponsive genome of Daphnia pulex. Science 331, 555-561.
Cornman, R.S., Togawa, T., Dunn, W.A., He, N., Emmons, A.C., Willis, J.H., 2008. Annotation and analysis of a large cuticular protein family with the R&R Consensus in Anopheles gambiae. BMC Genomics 9, 22.
Cornman, R.S., 2009. Molecular evolution of Drosophila cuticular protein genes. PLoS ONE 4, e8345.
Crooks, G.E., Hon, G., Chandonia, J.M., Brenner, S.E., 2004. WebLogo: a sequence logo generator. Genome Res. 14, 1188-1190.
Elsik, C.G., Worley, K.C., Bennett, A.K., Beye, M., Camara, F., Childers, C.P., de Graaf, D.C., Debyser, G., Deng, J., Devreese, B., Elhaik, E., Evans, J.D., Foster, L.J., Graur, D., Guigo, R., Hoff, K.J., Holder, M.E., Hudson, M.E., Hunt, G.J., Jiang, H., Joshi, V., Khetani, R.S., Kosarev, P., Kovar, C.L., Ma, J., Maleszka, R., Moritz, R.F., Munoz-Torres, M.C., Murphy, T.D., Muzny, D.M., Newsham, I.F., Reese, J.T., Robertson, H.M., Robinson, G.E., Rueppell, O., Solovyev, V., Stanke, M., Stolle, E., Tsuruda, J.M., Vaerenbergh, M.V., Waterhouse, R.M., Weaver, D.B., Whitfield, C.W., Wu, Y., Zdobnov, E.M., Zhang, L., Zhu, D., Gibbs, R.A., 2014. Finding the missing honey bee genes: lessons learned from a genome upgrade. BMC Genomics 15, 86.
Futahashi, R., Okamoto, S., Kawasaki, H., Zhong, Y.S., Iwanaga, M., Mita, K., Fujiwara, H., 2008. Genome-wide identification of cuticular protein genes in the silkworm, Bombyx mori. Insect Biochem. Mol. Biol. 38, 1138-1146.
Harbach, R.E., Knight, K.L., 1980. Taxonomist's glossary of mosquito anatomy, first ed. Plexus Publishing, Inc. Marlton, New Jersey.
Ioannidou, Z.S., Theodoropoulou, M.C., Papandreou, N.C., Willis, J.H., Hamodrakas, S.J., 2014. CutProtFam-Pred: detection and classification of putative structural cuticular proteins from sequence alone, based on profile hidden Markov models. Insect Biochem. Mol. Biol. 52, 51-59.
Jensen, U.G., Rothmann, A., Skou, L., Andersen, S.O., Roepstorff, P., Hojrup, P., 1997. Cuticular proteins from the giant cockroach, Blaberus craniifer. Insect Biochem. Mol. Biol. 27, 109-120.
Misof, B., Liu, S., Meusemann, K., Peters, R.S., Donath, A., Mayer, C., Frandsen, P.B., Ware, J., Flouri, T., Beutel, R.G., Niehuis, O., Petersen, M., Izquierdo-Carrasco, F., Wappler, T., Rust, J., Aberer, A.J., Aspöck, U., Aspöck, H., Bartel, D., Blanke, A., Berger, S., Böhm, A., Buckley, T.R., Calcott, B., Chen, J., Friedrich, F., Fukui, M., Fujita, M., Greve, C., Grobe, P., Gu, S., Huang, Y., Jermiin, L.S., Kawahara, A.Y., Krogmann, L., Kubiak, M., Lanfear, R., Letsch, H., Li, Y., Li, Z., Li, J., Lu, H., Machida, R., Mashimo, Y., Kapli, P., McKenna, D.D., Meng, G., Nakagaki, Y., Navarrete-Heredia, J.L., Ott, M., Ou, Y., Pass, G., Podsiadlowski, L., Pohl, H., von Reumont, B.M., Schütte, K., Sekiya, K., Shimizu, S., Slipinski, A., Stamatakis, A., Song, W., Su, X., Szucsich, N.U., Tan, M., Tan, X., Tang, M., Tang, J., Timelthaler, G., Tomizuka, S., Trautwein, M., Tong, X., Uchifune, T., Walzl, M.G., Wiegmann, B.M., Wilbrandt, J., Wipfler, B., Wong, T.K., Wu, Q., Wu, G., Xie, Y., Yang, S., Yang, Q., Yeates, D.K., Yoshizawa, K., Zhang, Q., Zhang, R., Zhang, W., Zhang, Y., Zhao, J., Zhou, C., Zhou, L., Ziesmann, T., Zou, S., Li, Y., Xu, X., Zhang, Y., Yang, H., Wang, J., Wang, J., Kjer, K.M., Zhou, X., 2014. Phylogenomics resolves the timing and pattern of insect evolution. Science 346, 763-767.
Neafsey, D.E., Waterhouse, R.M., Abai, M.R., Aganezov, S.S., Alekseyev, M.A., Allen, J.E., Amon, J., Arca, B., Arensburger, P., Artemov, G., Assour, L.A., Basseri, H., Berlin, A., Birren, B.W., Blandin, S.A., Brockman, A.I., Burkot, T.R., Burt, A., Chan, C.S., Chauve, C., Chiu, J.C., Christensen, M., Costantini, C., Davidson, V.L., Deligianni, E., Dottorini, T., Dritsou, V., Gabriel, S.B., Guelbeogo, W.M., Hall, A.B., Han, M.V., Hlaing, T., Hughes, D.S., Jenkins, A.M., Jiang, X., Jungreis, I., Kakani, E.G., Kamali, M., Kemppainen, P., Kennedy, R.C., Kirmitzoglou, I.K., Koekemoer, L.L., Laban, N., Langridge, N., Lawniczak, M.K., Lirakis, M., Lobo, N.F., Lowy, E., MacCallum, R.M., Mao, C., Maslen, G., Mbogo, C., McCarthy, J., Michel, K., Mitchell, S.N., Moore, W., Murphy, K.A., Naumenko, A.N., Nolan, T., Novoa, E.M., O'Loughlin, S., Oringanje, C., Oshaghi, M.A., Pakpour, N., Papathanos, P.A., Peery, A.N., Povelones, M., Prakash, A., Price, D.P., Rajaraman, A., Reimer, L.J., Rinker, D.C., Rokas, A., Russell, T.L., Sagnon, N., Sharakhova, M.V., Shea, T., Simao, F.A., Simard, F., Slotman, M.A., Somboon, P., Stegniy, V., Struchiner, C.J., Thomas, G.W., Tojo, M., Topalis, P., Tubio, J.M., Unger, M.F., Vontas, J., Walton, C., Wilding, C.S., Willis, J.H., Wu, Y.C., Yan, G., Zdobnov, E.M., Zhou, X., Catteruccia, F., Christophides, G.K., Collins, F.H., Cornman, R.S., Crisanti, A., Donnelly, M.J., Emrich, S.J., Fontaine, M.C., Gelbart, W., Hahn, M.W., Hansen, I.A., Howell, P.I., Kafatos, F.C., Kellis, M., Lawson, D., Louis, C., Luckhart, S., Muskavitch, M.A., Ribeiro, J.M., Riehle, M.A., Sharakhov, I.V., Tu, Z., Zwiebel, L.J., Besansky, N.J., 2015. Mosquito genomics. Highly evolvable malaria vectors: the genomes of 16 Anopheles mosquitoes. Science 347, 1258522.
Noh, M.Y., Kramer, K.J., Muthukrishnan, S., Kanost, M.R., Beeman, R.W., Arakane, Y., 2014. Two major cuticular proteins are required for assembly of horizontal laminae and vertical pore canals in rigid cuticle of Tribolium castaneum. Insect Biochem. Mol. Biol. 53C, 22-29.
Noh, M.Y., Muthukrishnan, S., Kramer, K.J., Arakane, Y., 2015. Tribolium castaneum RR-1 cuticular protein TcCPR4 is required for formation of pore canals in rigid cuticle. PLoS Genet. 11, e1004963.
Pesch, Y.Y., Riedel, D., Behr, M., 2015. Obstructor-A organizes matrix assembly at the apical cell surface to promote enzymatic cuticle maturation in Drosophila. J. Biol. Chem. 290, 10071-10082.
Petersen, T.N., Brunak, S., von Heijne, G., Nielsen, H., 2011. SignalP 4.0: discriminating signal peptides from transmembrane regions. Nat. Methods 8, 785-786.
Rebers, J.E., Riddiford, L.M., 1988. Structure and expression of a Manduca sexta larval cuticle gene homologous to Drosophila cuticle genes. J. Mol. Biol. 203, 411-423.
Ruijter, J.M., Ramakers, C., Hoogaars, W.M., Karlen, Y., Bakker, O., van den Hoff, M.J., Moorman, A.F., 2009. Amplification efficiency: linking baseline and bias in the analysis of quantitative PCR data. Nucleic Acids Res. 37, e45.
Togawa, T., Dunn, W.A., Emmons, A.C., Nagao, J., Willis, J.H., 2008. Developmental expression patterns of cuticular protein genes with the R&R Consensus from Anopheles gambiae. Insect Biochem. Mol. Biol. 38, 508-519.
Vannini, L., Augustine Dunn, W., Reed, T.W., Willis, J.H., 2014a. Changes in transcript abundance for cuticular proteins and other genes three hours after a blood meal in Anopheles gambiae. Insect Biochem. Mol. Biol. 44, 33-43.
Vannini, L., Reed, T.W., Willis, J.H., 2014b. Temporal and spatial expression of cuticular proteins of Anopheles gambiae implicated in insecticide resistance or differentiation of M/S incipient species. Parasit. Vectors 7, 24.
von Reumont, B.M., Jenner, R.A., Wills, M.A., Dell'ampio, E., Pass, G., Ebersberger, I., Meyer, B., Koenemann, S., Iliffe, T.M., Stamatakis, A., Niehuis, O., Meusemann, K., Misof, B., 2012. Pancrustacean phylogeny in the light of new phylogenomic data: support for Remipedia as the possible sister group of Hexapoda. Mol. Biol. Evol. 29, 1031-1045.
Willis, J.H., 2010. Structural cuticular proteins from arthropods: annotation, nomenclature, and sequence characteristics in the genomics era. Insect Biochem. Mol. Biol. 40, 189-204.
Willis, J.H., Papandreou, N.C., Iconomidou, V.A., Hamodrakas, S.J., 2012. Cuticular Proteins, in: Gilbert L.I. (Ed.), Insect Molecular Biology and Biochemistry. Academic Press, San Diego, pp. 134-166.
TABLE 1 CHARACTERISTICS OF CPCFC FAMILY MEMBERS IN HEXAPODA (signals removed)
All values are numbers of amino acids. A dash marks a column that does not apply because the protein has only two motifs.
Order/Species | total mature length | to start of motif 1 | between motif 1-2 | between motif 2-3 | final C to end
Collembola
Orchesella cincta | 53 | 3 | 18 | - | 0
Onychiurus arcticus | 55 | 4 | 19 | - | 0
Archaeognatha
Lepismachilis y-signata | 121 | 4 | 26 | 42 | 1
Odonata
Enallagma hageni | 96 | 5 | 29 | 14 | 0
Enallagma hageni | 145 | 6 | 64 | - | 43
Orthoptera
Teleogryllus commodus | 90 | 6 | 20 | 16 | 0
Gryllotalpa sp. | 92 | 6 | 22 | 16 | 0
Blattodea
Blaberus craniifer | 87 | 4 | 22 | 12 | 1
Blattella germanica | 87 | 4 | 22 | 12 | 1
Phthiraptera
Pediculus humanus corporis | 154+ | 6 | 82 | 21 | end missing
Hemiptera
Macrosiphum euphorbiae | 128 | 4 | 60 | 15 | 1
Acyrthosiphon pisum | 128 | 4 | 60 | 15 | 1
Kerria lacca | 119 | 5 | 23 | 42 | 1
HOLOMETABOLA
Hymenoptera
Cephus cinctus | 151 | 6 | 54 | 43 | 0
Megaloptera
Corydalinae sp. | 94 | 2 | 22 | 21 | 1
Neuroptera
Chrysopa pallens | 94 | 4 | 24 | 18 | 0
Coleoptera
Tribolium castaneum | 158 | 14 | 111 | - | 1
Tribolium castaneum | 184 | 14 | 137 | - | 1
Dendroctonus frontalis | 180 | 14 | 129 | - | 4
Dendroctonus ponderosae | 178 | 9 | 129 | - | 3
Pissodes strobi | 195 | 17 | 144 | - | 2
Pissodes strobi | 302 | 47 | 222 | - | 1
Rhynchophorus ferrugineus | 188 | 18 | 137 | - | 1
Anthonomus grandis | 162 | 14 | 114 | - | 2
Agrilus planipennis | 165 | 14 | 115 | - | 1
Onthophagus taurus | 206 | 14 | 159 | - | 1
Diaprepes abbreviatus | 158 | 14 | 111 | - | 1
Colaphellus bowringi | 171 | 14 | 124 | - | 1
Lepidoptera
Bombyx mori | 72 | 2 | 37 | - | 1
Spodoptera litura | 72 | 2 | 37 | - | 1
Ostrinia furnacalis | 74 | 2 | 39 | - | 1
Ostrinia nubilalis | 74 | 2 | 39 | - | 1
Antheraea assama | 76 | 2 | 42 | - | 0
Antheraea assama | 77 | 1 | 44 | - | 0
Antheraea yamamai | 76 | 2 | 42 | - | 0
Athetis lepigone | 70 | 2 | 35 | - | 1
Agrotis segetum | 72 | 2 | 37 | - | 1
Papilio polytes | 74 | 2 | 39 | - | 1
Papilio xuthus | 74 | 2 | 39 | - | 1
Danaus plexippus | 74 | 2 | 39 | - | 1
Heliconius melpomene | 74 | 2 | 39 | - | 1
Heliconius melpomene | 74 | 2 | 39 | - | 1
Heliconius erato | 74 | 2 | 39 | - | 1
Mamestra brassicae | 72 | 2 | 37 | - | 1
Siphonaptera
Oropsylla silantiewi | 110 | 6 | 35 | 20 | 1
Diptera
Anopheles gambiae | 150 | 9 | 72 | 21 | 0
Anopheles darlingi | 149 | 9 | 71 | 21 | 0
Anopheles sinensis | 159 | 9 | 77 | 21 | 0
Anopheles funestus | 148 | 9 | 70 | 21 | 0
Anopheles quadrimaculatus | 152 | 9 | 74 | 20 | 0
Aedes aegypti | 190 | 9 | 111 | 21 | 1
Chironomus riparius | 144 | 7 | 66 | 23 | 0
Sitodiplosis mosellana | 148 | 9 | 68 | 22 | 0
Sitodiplosis mosellana | 131 | 6 | 62 | 15 | 0
Culicoides sonorensis | 165 | 5 | 86 | 22 | 4
Drosophila ananassae | 146 | 9 | 65 | 23 | 1
Drosophila yakuba | 147 | 9 | 66 | 23 | 1
Drosophila grimshawi | 147 | 9 | 66 | 23 | 1
Drosophila melanogaster | 147 | 9 | 66 | 23 | 1
Drosophila erecta | 149 | 9 | 68 | 23 | 1
Drosophila persimilis | 152 | 9 | 71 | 23 | 1
Drosophila simulans | 147 | 9 | 66 | 23 | 1
Drosophila sechellia | 147 | 9 | 66 | 23 | 1
Drosophila mojavensis | 146 | 9 | 65 | 23 | 1
Drosophila willistoni | 146 | 9 | 65 | 23 | 1
Drosophila virilis | 147 | 9 | 66 | 23 | 1
Ceratitis capitata | 137 | 9 | 58 | 21 | 1
Teleopsis dalmanni | 193 | 9 | 113 | 21 | 1
Corethrella appendiculata | 175 | 9 | 95 | 20 | 1
Glossina morsitans morsitans | 133 | 9 | 54 | 21 | 1
Musca domestica | 139 | 5 | 63 | 21 | 1
Bactrocera dorsalis | 145 | 9 | 65 | 22 | 1
Bactrocera cucurbitae | 136 | 9 | 57 | 21 | 1
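The columns of Table 1 are related by simple arithmetic: the mature length should equal the residues before motif 1, plus the motifs themselves, plus the spacers between motifs, plus the residues after the final cysteine. The sketch below illustrates that relation for four rows; it assumes every motif is exactly 16 residues long, and the function name `expected_total` is hypothetical. It is offered as a reading aid, not as a validation of every row.

```python
MOTIF_LENGTH = 16  # length of one CPCFC motif in Hexapoda (assumed constant)


def expected_total(to_first_motif, spacers, final_c_to_end):
    """Mature length implied by the other Table 1 columns.

    `spacers` lists the residues between consecutive motifs, so a protein
    with n spacers is taken to contain n + 1 motifs.
    """
    n_motifs = len(spacers) + 1
    return (to_first_motif + n_motifs * MOTIF_LENGTH
            + sum(spacers) + final_c_to_end)


# (species, reported total, to start of motif 1, spacers, final C to end)
ROWS = [
    ("Orchesella cincta", 53, 3, [18], 0),
    ("Blaberus craniifer", 87, 4, [22, 12], 1),
    ("Anopheles gambiae", 150, 9, [72, 21], 0),
    ("Drosophila melanogaster", 147, 9, [66, 23], 1),
]

for species, reported, start, spacers, tail in ROWS:
    implied = expected_total(start, spacers, tail)
    print(f"{species}: reported {reported}, implied {implied}")
```

For these four rows the reported and implied totals agree; a row where they differ would point either to a motif that is not 16 residues long or to a transcription problem in the table.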
TABLE 2 CHARACTERISTICS OF CPCFC FAMILY MEMBERS IN CRUSTACEA (signals removed)
All values are numbers of amino acids. A dash marks a spacer that does not exist because the protein has fewer motifs.
Class/Species | total length | to start of motif 1 | between motif 1-2 | between motif 2-3 | between motif 3-4 | between motif 4-5 | final C to end
Ostracoda
Cypridininae sp. | 91 | 3 | 27 | - | - | - | 29
Cypridininae sp. | 92 | 4 | 27 | - | - | - | 29
Malacostraca (ALL Malacostraca are C-X(7)-C)
Melita plumulosa mira | 65 | 2 | 28 | - | - | - | 2
Hyalella azteca | 73 | 9 | 29 | - | - | - | 2
Hyalella azteca | 72 | 9 | 28 | - | - | - | 2
Procambarus clarkii | 47 | 4 | 10 | - | - | - | 0
Petrolisthes cinctipes | 48 | 4 | 10 | - | - | - | 1
Petrolisthes cinctipes | 48 | 4 | 10 | - | - | - | 1
Maxillopoda
Amphibalanus amphitrite | 156 | 4 | 21 | 17 | 17 | 17 | 0
Calanus finmarchicus | 81 | 4 | 13 | 15 | - | - | 1
Eucyclops serrulatus | 79 | 3 | 13 | 15 | - | - | 0
Remipedia
Speleonectes cf. tulumensis | 56 | 1 | 14 | - | - | - | 9
Speleonectes cf. tulumensis | 76 | 2 | 10 | 15 | - | - | 1
FIGURE 1
[Bar chart: CPCFC1 transcript levels (N0 × 10^7, axis 0 to 6000) plotted against age for 4th instar larvae (0, 12, 24, 36 and 48 hr), pupae (0, 12 and 24 hr) and adults (< 10 min and < 12 hr).]
Figure 8
[WebLogo panels: NON-HOLOMETABOLA (11 species, 33 motifs); HOLOMETABOLA (55 species, 152 motifs); CRUSTACEA without Malacostraca (5 species, 20 motifs); MALACOSTRACA (4 species, 12 motifs).]
CPCFC HIGHLIGHTS
• New cuticular protein family described, characterized by a 16 amino acid motif ending C-X(5)-C.
• In Anopheles gambiae, transcripts localized primarily in epidermis underlying hard cuticle.
• Proteins localized primarily in endocuticle.
• Family members identified in 14 orders of Hexapoda and 4 classes of Crustacea.
SUPPLEMENTARY FILE 1 MIQE
Experimental design and sample collection
Sample description: Fourth instar larvae and pupae were collected immediately after ecdysis. Animals were either placed in TRIzol® and frozen immediately, or kept until the desired time after eclosion. One group of adults was processed within 10 min of eclosion and another within 12 hrs of eclosion.
Number per sample: Larval samples: n = 3-5 for each age. Pupal samples: n = 2-5 for each age. Adult samples: n = 3 for each age.
Technical replicate number: Three technical replicates were run.
Nucleic acid extraction
Procedure: Immediate placement in TRIzol® (Ambion), freezing at -80 °C; RNA extraction followed TRIzol’s protocol.
Quantification: NanoDrop 1000 Spectrophotometer
Purity: 260/280 analysis (value 1.95-2.04)
Reverse transcription
Procedure/kit: Life Technologies SuperScript III First-Strand Synthesis System with Oligo-dT20 primer
Amount of RNA: 1.0 µg of total RNA
Reaction volume: 15 µl
Temperature and time: 65°C for 5min; 55°C for 50min; 85°C for 5min
RT-qPCR target information
Sequence accession numbers: RpS7 (AGAP010592); CPCFC1 (AGAP007980)
RT-qPCR oligonucleotides
Primer sequences: See Additional file 8.
RT-qPCR protocol
Complete reaction conditions: Reaction volume: 15 µl. Primer: 2.5 µM. cDNA: 5 µl of 1/100 diluted cDNAs. Polymerase and reactants: SsoAdvanced™ Universal SYBR® Green Supermix (Bio-Rad)
Thermocycling parameters: 95°C for 2min; 39 cycles of 95°C for 10s and 57°C for 30s
RT-qPCR instrument: Bio-Rad CFX Connect™ Real Time PCR Detection System
Data analysis
R0 determination: R0 values were calculated with amplicon mean efficiency per run using LinRegPCR software (Ruijter et al., Nucleic Acids Res. 2009;37(6):e45). Reference genes were not used. rpS7 (Togawa et al., Insect Biochem Molec Biol. 2008;38:508-519) was validated as a stable reference gene only for larvae of all ages. It could only be used for a single age of pupae and pharate or eclosed adults.
Table based on recommendation of Bustin et al. (Nat Methods. 2013:10:1063-1067). Table design from Isolani et al. (Eur J Pharmacol. 2012:688:1-7).
SUPPLEMENTARY FILE 2 -- PRIMER SEQUENCES
AgamCPCFC1 Primers
Purpose | Probe Name | Sequence 5’-3’
RT-qPCR | CPCFC1 qPCR-2 (in coding region) | CCACTGCCAGGATACACCTC
RT-qPCR | CPCFC1 qPCR-2 (in 3’UTR) | GTCAGGAAATGGGAAGGCGA
RT-qPCR | RpS7-UC | GTGAGGTCGAGTTCAACAACAAGAA
RT-qPCR | RpS7-DD | GGCACCGGCACGTAGATGA
in situ hybridization – antisense probe in coding region | CPCFC1-UA | CTCAGCCCAGCTGGAACGCC
in situ hybridization – antisense probe in coding region | T7CPCFC1-DA | TAATACGACTCACTATAGGGCAGGTGTGCGGGGACACTC
in situ hybridization – antisense probe in 3’UTR | CPCFC1-UB-3’ | TCCCACGTTTGCCATGGTTGTGT
in situ hybridization – antisense probe in 3’UTR | T7CPCFC1-DB-3’ | TAATACGACTCACTATAGGGTGTGTGCGATTGCACGCTGA
in situ hybridization – sense probe in coding region | T7CPCFC1-UA | TAATACGACTCACTATAGGGCTCAGCCCAGCTGGAACGCC
in situ hybridization – sense probe in coding region | CPCFC1-DA | CAGGTGTGCGGGGACACTC
in situ hybridization – sense probe in 3’UTR | T7CPCFC1-UB | TAATACGACTCACTATAGGGTCCCACGTTTGCCATGGTTGTGT
in situ hybridization – sense probe in 3’UTR | CPCFC1-DB | TGTGTGCGATTGCACGCTGA
Protein production | PP-CPCFC1-F | CATCATCACCACCATCACCAGCCAGCCGCCCAGTATCC
Protein production | PP-CPCFC1-R | GTGGCGGCCGCTCTATTAGAAGTTCGGGCAGGTGTGCG
Sequences in bold are not part of AgamCPCFC1 gene. They are the T7 primer used for probe construction, the sequence that added the 6 His residues to the protein, and the adaptor for the plasmid.
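The relation between the plain and T7-tagged primers, and the placement of the reverse qPCR primer, can be checked with a few lines of code. This is a minimal sketch under stated assumptions: the helper names `strip_t7` and `reverse_complement` are hypothetical, the T7 promoter is taken to be the 20-nt prefix shared by the tagged primers listed above, and `UTR_FRAGMENT` is a short stretch copied from the genomic sequence in Supplementary File 3.

```python
T7_PROMOTER = "TAATACGACTCACTATAGGG"  # prefix shared by the T7-tagged primers
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def strip_t7(primer: str) -> str:
    """Remove the T7 promoter prefix, if present, leaving the gene-specific part."""
    if primer.startswith(T7_PROMOTER):
        return primer[len(T7_PROMOTER):]
    return primer


PRIMERS = {
    "CPCFC1-UA": "CTCAGCCCAGCTGGAACGCC",
    "T7CPCFC1-UA": "TAATACGACTCACTATAGGGCTCAGCCCAGCTGGAACGCC",
    "CPCFC1-DA": "CAGGTGTGCGGGGACACTC",
    "T7CPCFC1-DA": "TAATACGACTCACTATAGGGCAGGTGTGCGGGGACACTC",
}

# Sense and antisense probes should use the same gene-specific sequences.
print(strip_t7(PRIMERS["T7CPCFC1-UA"]) == PRIMERS["CPCFC1-UA"])  # True
print(strip_t7(PRIMERS["T7CPCFC1-DA"]) == PRIMERS["CPCFC1-DA"])  # True

# The reverse qPCR primer should anneal to the 3'UTR on the opposite strand.
UTR_FRAGMENT = "TACTGCTAAGACATTCGCCTTCCCATTTCCTGACCCTCCCCACAC"
QPCR_REVERSE = "GTCAGGAAATGGGAAGGCGA"
print(reverse_complement(QPCR_REVERSE) in UTR_FRAGMENT)  # True
```

Swapping which primer of a pair carries the T7 promoter is what switches the transcribed strand, which is why each sense/antisense pair shares the same gene-specific sequences.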
SUPPLEMENTARY FILE 3 Genomic regions and protein sequences for AgamCPCFC1 and ortholog in Drosophila melanogaster.
AgamCPCFC1 – [GenBank:AGAP007980]
A. The genomic region (from VectorBase) that included AgamCPCFC1 showing probes used for in situ hybridization.
............taattgtacgctccgcaaacggacggactacaaccggcactggcactgctgatcgatcaccaccc tcttctagccctgccgtggtggaggggggggggaaggggtttactcgcccaggtagtactgtaactgtaaccgccga ctactcgagccgatcgttcgggggagaatgttcgggctctcggcgttacacgggtacggtcgagtggtccaatggat cgatttcggcgcggagaaagaaatcgtcggtggcgctcgtgggcacccttccctttaatggttcgccgttggtgttt ggctaggtggagctaaggtagggggctgctcgaacctcggctgctcggtggagccgcagggaggggatacctagaac gcggctccacgaggctcacgagagagcgccccgaacgcgagcgacgaacctcgtggccgggcccgatcggctgggtg tggtgtggtttttactataaaagctcggtattgtctttcggagatcagtaTTCGGTACTCGGTTTGGAAGTTTTGTG TTAAGCGAACAACAGTCCGTGTTTCGGTAACATAAAAGTCCAACTTGCCTGTGTGCATACGGAAGATTGAACGCAAG AGATACTCTGCATCCCAAAATGTTCTCCAAAGTGgtaagcaatttagagggtgtcacgaagggtatgggggatcaaa ccctttgaagggttgagcctgatctgtgtgtgcgtgagtgcgaggatagacagccccaacagagaacggctgtcaga attgtggaattgtgtggaagaggatcgtgtgcaatcagtgtgcagggggcgtgatgaatcggatgtgcaactgtgtt taatccaaagtctggtgatgctaatcgttgcttcctgtacttgtgcgcttgcttgtagATCGCTGTTTTGGCCTTTG CCGCCGTGGTAGCCGCTAAGCCCCAACATCAGCCAGCCGCCCAGTATCCGGCCGGAGTCGATCCGTCCCGCTGCCCG TCGTACCCGAACTGCGATAACGCGGCCCTGCACAGCCCGAACCCGTACAACAACCATGCCGCCAACCACTGGAACCC GAACTGGAACGCTCAGCCCAGCTGGAACGCCGCCCCTGCCCCTGCTCCGGCCCCGGCCGCCTACTACCACGGAGCTC CCCACTCGTACCAGGCCCTGACTGGCCCGAGCCACAACTACATTGGAGCTCCCAGCCCCTCGGCCGGTGGTGACCGg taggttcaaggttccccgagacctcttcctcaccctcacacaagttcaatattgctacaacgtccgcttccattcgc tcttttccccctcacagTTACCCCGCTGGAGTGAACCCGCAGTCGTGCCCGAACTATCCGTACTGTGATAACACCGT TCAGGCTGGCGTTCCTCAGGTTGCCCCACTGCCAGGATACACCTCCCGCCAGTACCCGGCCGGAGTGTCCCCGCACA CCTGCCCGAACTTCCCGTACTGCTAAGACATTCGCCTTCCCATTTCCTGACCCTCCCCACACTACCCTTCATCGTAC TATTGATCTGTGACCACACGTTTCTTCCTTCCTTGCGCAATTTTCATCCGTTCTACCCCTACCCTGTACATCTTCCC ACGTTTGCCATGGTTGTGTAAATAACTGGTACTGGTTTGTTTTCGTTAGTGTTTTGTGCATGAAATACTGCCTCTTT AGTAGGAGAGATATTAGCTGTGTTCCTTAGAGTGATTGGTTCAACAAGCAAGAGCGATAAGAGCGCAACAATCAAAA GCATTGGAGAACTTATGGGAAATATCATTGTATAAAAAAACAAAAAGAATCTGTGAAACTACAACTGCAACAACAAC AGCAATATCTTCATCGACTATGTTATCTCAGCGTGCAATCGCACACATCGCAAATGAAATTGAAATGCAATTGATTT 
Aagaatcaaacggattatgttgtagtgcgtttaaatgaattcgatgttacaccgaacattcatgttgtgggttgtgg ggggaatggtaacgcttgtactggtaacagctaagtgatcaaaaatgtttactgttgatccaaagattctagctcct gcttcttcttattcttatttggcgccacaaccttaatcggttcagcgcaagcgaagcttgtaatgagcttgtctact tattggtatt
Probe used for in situ hybridization that came from coding region (gray highlighted) is shown in dark orange, the one in the 3’UTR (interrupting purple type) is in light orange. Probes were 284 and 282 nt respectively. Introns are in blue type; upstream and downstream regions in green.
B. Protein sequence of AgamCPCFC1. Gray highlighting shows 16 aa motifs, blue type indicates region used for antibody production.
MFSKVIAVLAFAAVVAA KPQHQPAAQYPAGVDPSRCPSYPNCDNAALHSPNPYNNHAANHWNPNWNAQPSWNAAPAPAPAPAAYYHGAPHSYQALTGPSHNYIGAPSPSAGGDRYPAGVNPQSCPNYPYCDNTVQAGVPQVAPLPGYTSRQYPAGVSPHTCPNFPYC
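The positions of the three 16 aa motifs in the mature AgamCPCFC1 sequence can be located programmatically and compared with the Anopheles gambiae row of Table 1. The sketch below is illustrative only: the regular expression is an assumption derived from the three motifs of this one protein (Y or F, P, any residue, G, five residues, C, five residues, C), not a family-wide definition, and `motif_layout` is a hypothetical helper.

```python
import re

# Mature AgamCPCFC1 (signal peptide MFSKVIAVLAFAAVVAA removed), from panel B.
MATURE = (
    "KPQHQPAAQYPAGVDPSRCPSYPNCDNAALHSPNPYNNHAANHWNPNWNAQPSWNAAPAPAPAPAAYYHGAPHSYQA"
    "LTGPSHNYIGAPSPSAGGDRYPAGVNPQSCPNYPYCDNTVQAGVPQVAPLPGYTSRQYPAGVSPHTCPNFPYC"
)

# Assumed 16-residue pattern ending in C-X(5)-C; illustrative, not authoritative.
MOTIF = re.compile(r"[YF]P.G.{5}C.{5}C")


def motif_layout(protein: str) -> dict:
    """Return motif strings and the spacer lengths reported in Table 1."""
    spans = [m.span() for m in MOTIF.finditer(protein)]
    if not spans:
        raise ValueError("no motif found")
    return {
        "total": len(protein),
        "to_first_motif": spans[0][0],
        "spacers": [nxt[0] - cur[1] for cur, nxt in zip(spans, spans[1:])],
        "final_c_to_end": len(protein) - spans[-1][1],
        "motifs": [protein[start:end] for start, end in spans],
    }


layout = motif_layout(MATURE)
print(layout["motifs"])
print(layout["total"], layout["to_first_motif"],
      layout["spacers"], layout["final_c_to_end"])  # 150 9 [72, 21] 0
```

The values printed on the last line match the Table 1 entry for Anopheles gambiae (150, 9, 72, 21, 0).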
C. Western blots of crude protein extract of An. gambiae legs treated with anti-CPCFC1 (diluted 5,000-fold). Molecular weight marker is in daltons.
D. The AgamCPCFC1 ortholog in Drosophila melanogaster from FlyBase [GenBank: CG8736]. Colored type added to conform to Anopheles.
>CG8736 2R:8594031..8595001 (reverse complement) atctgttgggccaatcaagtaaaatatgcgcgagatcagtcaactacagaaacaaaagcaaaagtaaagcaaagcca ctgcagcagcagcaatagcaaaagcaacaagcacagcagccgcagtaatgaaagtgaaaccgagtctggccgagaga ctctggctgagattgagacccggccaagagtcggttctagccagcaccgctatataagcttgatggccgggctcggc AGCAGCAGTGCAGCGCCGACCAGGAACCCAATTGGAAGTTTGAGCTACGACTCCATAGTCCAATTCGGCAAGGATTA CCATAAGCCCCACACCAGAACCAACTCCACAACTACCAACCACCCACTCACCTCAGCCAACATGTTCTGCAAGCTGg taagtgccctttgagccaagtttctgcccacaaggatagcgtctgaaaaagttcctttaactactaagtggagctgg aatccaaatctgcaacattttagttgaagttcttaagatccgaggatcctaagttccagatatttttcaaactacag atgtatcttattacatttaaaaattccatatttttttaaaatcttttcaaagCTTTTCGCTACCTTCGTGGCCCTGG CGGTGGCCAAGCCACAACACCAACCTGCTGCCCAGTATCCGGCTGGCGTGAATCCGCAGGACTGCCCCAACTTCCCC ATCTGTGATAATGCGCGCCTGCACAATCCGCAGCCGCAGTGGGGTGCCCCGCAGCCACAGTGGAACCCCCAGCCGCA GCCACAGTGGAACCCGCAGCCACAGTGGCAGCAACCTCAACCCCAGTGGAACCCCCAGCCGCAGCCACAGTGGCAGG CACAGCCCTCGTGGAACGCAGCCCCTGCTGCCGCACCCGGTGGCGATAAGTATCCAGCTGGCGTCAATCCGCAGACC TGCCCCAACTATCCCTACTGCGACGTGAACGCCGGACACGCTGGTGCTCCCGTGGCAGCTCCTCCTCTACCTGGCTG GACGGAGCGTCTGTATCCCGCCGGAGTTTCGCCGCACCAGTGCCCCAACTTCCCGTACTGCAACTAGGGCGGCCTAG GGCTCACTTGCGGCCAGCCGCAGCTTCCTTTAACGCTTCGCCTTTCCCCAGTTCTCAATTAGTGGACATTAATCTGA AATTCTTTGTTGTTGGCGCCGAAATAAATGCAAAATGTTGGTCAAAGaaatgggactttctatggttgatagctgca gatacatgggggtatacaaatcgttctattcgcaactacaacttcttactatattacaaaatgtaattgttggttgg ttcatagaactcttactaatgaagaaaagattaattgaccaaagtaagcatttataattaaataaagtaatttgaac tgtctacaatcagaaactcattcagtcaagtgcgctaaatgaggtatgaaagaagactgttaagatcctgctccatg MFCKLLFATFVALAVA KPQHQPAAQYPAGVNPQDCPNFPICDNARLHNPQPQWGAPQPQWNPQPQPQWNPQPQWQQPQPQWNPQPQPQWQAQP SWNAAPAAAPGGDKYPAGVNPQTCPNYPYCDVNAGHAGAPVAAPPLPGWTERLYPAGVSPHQCPNFPYCN-
SUPPLEMENTARY FILE 4 ILLUSTRATION OF WHY RT-qPCR DATA WERE NOT NORMALIZED TO RpS7. Data are from same cDNA preparations used for Figure 1.
[Two bar charts plotted against age (L4 0, 12, 24, 36 and 48 hr; P 0, 12 and 24 hr; A < 10 min and < 12 hr): S7 transcript levels (N0 × 10^7, axis 0 to 6000) and the CPCFC1/S7 ratio (axis 0 to 3).]
SUPPLEMENTARY FILE 7 CPCFC FAMILY MEMBERS IN HEXAPODA
Collembola (2 species, 2 sequences)
>gi|570575880|gb[GenBank:GAMM01011142.1]TSA: Orchesella cincta OC11152
MMKQIVALLVLAIVAYAYA QERYPAGVSPASCPNYPFCNVNGIGAPPGYHFDRSLGYPAGIHPSTCPNYPYC
>gi|164439313|gb[GenBank:EW760097.1]|EW760097 sb_009_07P09_Onychiurus arcticus
MSKVILVLMVLAVFATVCFA QADRYPAGVSPASCPNYPFCNNVGHNVPIGCRFDSANHRYPPGVDASTCPFFPYC-
Archaeognatha (1 species, 1 sequence)
>gi|283497943|gb [GenBank:N223383.1]|FN223383 FN223383 dmp031cm Lepismachilis y-signata
MMKLVVLAALVALAAA QADRYPAGLNPAACPNFPLCDSNAIAAFQHNPNTYFSPPSAPTGARYPEGINPVTCPNYPYCGASAPAGAPAGYASAPASYAPAPSNYAQAPAGYASSPNNNIAYPAGVNPSSCPNYPYCH-
Odonata (1 species, 2 sequences)
>gi|459260966|gb[GenBank:GAEQ01007162.1]|TSA: Enallagma hageni contig09773
MFAKLFVFAACVAVALC AVADKYPAGLNPALCPNYPDCDNTLIALHSSNPSAVLPYAAAPLYHYGREYPAGVHPAACPNYPYCNTLAYPYAAHYAREYPAGVHPAACPNYPYC
>gi|459229756|gb|[GenBank:GAEQ01017608.1]| TSA: Enallagma hageni contig07609
MIAKSVAIILCTVAVACTA APQAARFPAGIDPQVCPNYPDCDNVALAASISAQVQQQQYAAAPYSAPYSAPYSAPQPAPYNPPPQQYSYPTYAQPAPAAPRAAEGYPAGVDARVCPNYPYCGPTPAHVPAAPQNYAAPQNYAAPPPANNWAAPQNQYNAPIPEP-
Orthoptera (2 species, 2 sequences)
>gi|701837185|gb|[GenBank:GBHB01030180.1]| TSA: Teleogryllus commodus MblContig30181
MALKLVLALCLVAVALA APQADRYPAGLNPALCPNYPLCDNNVIATYGPAAAAVPRAREYPAGVPAAACPNYPFCNVNLHAPPLPGFSARLYPAGVPAAACPAYPYC
>gi|714206773|gb|[GenBank:GAWZ01160366.1]| TSA: Gryllotalpa sp. AD-2013 C634000
MIAKLMVVAVALLAAVYA APQADRYPAGLNPALCPGYPVCDNALIATYGPSGAPVHNVYARQYPAGVNPAACPNYPYCNTAVSAAPLPGFSARLYPAGVSPAACPGYPYC-
Blattodea (2 species, 2 sequences)
>Original sequence for family: Blaberus craniifer gi|3023587|sp|P80674.1| Name:Bc-NCP1 QADKYPAGLNPALCPNYPNCDNALIALYSNVAPAIPYAAAYNYPAGVSPAACPNYPFCGAIAPLGYHVREYPAGVHPAACPNYPYCV>gi|698758469|gb|[GenBank:GBID01001268.1]| TSA: Blattella germanica Contig1280 MYCKLVVLAAIVAVAVA QADKYPAGLSPALCPNYPHCDNALIALYSNNAPAVPYASAYNYPAGVSPAACPNYPFCGALAPLGYHVREYPAGVSPAACPNYPYCH-
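The cysteine pairs that characterize these sequences can be located with a short pattern search. The sketch below is illustrative only: it assumes the spacing is C-X(5)-C, as seen in the CPNYPNC stretch of Bc-NCP1, and the function name `find_cys_pairs` is hypothetical rather than part of any published pipeline.

```python
import re

# Mature Bc-NCP1 sequence (Blaberus craniifer) as listed in this file.
BC_NCP1 = ("QADKYPAGLNPALCPNYPNCDNALIALYSNVAPAIPYAAAYNYPAGVSPAACPNYPFC"
           "GAIAPLGYHVREYPAGVHPAACPNYPYCV")


def find_cys_pairs(seq, spacing=5):
    """Return (start_index, matched_text) for each C-X(spacing)-C occurrence.

    The spacing of 5 is an assumption read off the listed sequence.
    Matches are non-overlapping and indices are 0-based.
    """
    pattern = re.compile(r"C[A-Z]{%d}C" % spacing)
    return [(m.start(), m.group()) for m in pattern.finditer(seq)]


hits = find_cys_pairs(BC_NCP1)
for start, text in hits:
    print(start, text)
```

The same call can be applied to any mature sequence in this file once the header and signal peptide have been stripped.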
Phthiraptera (1 species, 1 incomplete sequence)
>modified from gi|242023445|ref|[GenBank:XP_002432144.1]| Galectin-3, [Pediculus humanus corporis] MIVQLFFFASIVISAFA GPAGDKYPAGLDPNLCPNYPNCDNVLLAAAQTQPGVYTSGTYNGAYNGAYNGAYNGAYNGAYNDAYNSGAYVPEAYSGYPNTGFA SGAYTGFGNPAGGAHAVPGYPAGVNPASCPNYPYCTNYAPNAYHQVAPLPGFTHREYPEGVNPTTCPNYxxxxx
Hemiptera (3 species, 3 sequences)
>gi|659433823|gb|[GenBank:GAOM01006399.1]| TSA: Macrosiphum euphorbiae Me_WB07486 MIGVLAIVFAVQATAVLA GGARYPSGLNPALCPNYPHCDNVLLAAYAQPAAGNDYNDHYTNNNNAGQYYGSGYQHHPSEASNYHNNQPLVPAPYTEPGYPAGLSSSN CPNYPYCSHQVPAEALRYAHKRYPSGVSPQNCPNYPYCH>gi|193636711|ref|[GenBank:XP_001949693.1]|PREDICTED:cuticleprotein1[Acyrthosiphon pisum] MIGVLAIVFAVQATAVLA GGARYPSGLNPALCPNYPHCDNVLLAAYAQPAAGNDYNDHYTNNNNAGQYYGSGYQHHPSEASNYHNNQPLVPAPYTEPGYPAGLSSSN CPNYPYCSHQVPAEALRYAHKRYPSGVSPQNCPNYPYCH>gi|656473778|gb|[GenBank:GBDP01042177.1]| TSA: Kerria lacca L_17239_T_1/1_C_1.000_L_675 MISKTIFICTVLLISVTCQS QSYQSNKYPAGIHPNLCPHYPYCDNTVLAGFAQGVAAFHTGAAAPGYPASLSPQACPNYPYCSHQIPPEAIHYRRSAALHQYPTVAEST NYAYPIPSAIDLRTKYPSGVNPASCPNYPYCH-
HOLOMETABOLA
Hymenoptera (1 species, 1 sequence)
>Cephus cinctus [Ccin1_scaffold0997]_contig12, whole genome shotgun sequence MCTLILNYLIQIQVVLCILALATTLLA KPNGDRYPAGVNPQSCPNYPNCDNAALHSGRASTPSWSPQGGAWAPAGAPAAPWAQPASPWNAPHSAGNPASAGAQYPAGVNPQSCPNY PQCDNAALHGGAPANNDWNEPSSNSWDSWDSWSDPSTAQPAAVAPRYPAGVSQQSCPNYPYC-
Megaloptera (1 species, 1 sequence)
>gi|661056549|gb|[GenBank:GADH01013481.1]| TSA: Corydalinae sp. KMRSPBM-2012 contig14585 MFKPVVVLIAVLVACVSS QADRYPAGLNPALCPGYPRCDNSLLALHSARTEPVADYTATRYPAGVPAAACPNYPFCNTGEAYGYSAARPLPGFTRRLYPDGVPAAAC PNYPFCH-
Neuroptera (1 species, 1 sequence)
>gi|459415814|gb|[| TSA: Chrysopa pallens Unigene29704_dacaoling MNQLVILTVVAFIACAYG QADRYPAGLNPALCPGYPNCDNALLALYSTGAIPAPPLQAPAARYPAGVPAAACPNYPYCNVGAPESALPLPGYAQRLYPAGVPAAACP NYPYC-
Coleoptera (10 species, 12 sequences)
>gi|91087673|ref|XP_976428.1| [Tribolium castaneum] MFVKLTVLACSIAAVCG VWNGPLAGGVPAHQYPAGVSPQACPNFPNCANPAVAANPNAPAPYNPVPQYNHYNPAPQYNGYNPAPVPQYNPGLQSALDRGEYIGDGD YHGEGLAESGAYGNNGQHGGYNGGYNGGYNPAPAYNPAPAYNHGLPAGVPAQVPAGVDARSCPNYPFCH>gi|91087671|ref|XP_976426.1| [Tribolium castaneum] MFVKLAVLACSLAVSAA VYSGPLAGGVPAAQFPAGVSPQACPNYPNCANPSVAVNQAPVSQYNAAPQYTPQQYQPAPQYAPQQYQPAPQYTPQQYQPAPQFAPQQY NAAPARPQYTPEVQNALDRGEYIGDGDYHGEGLAEALAPGYQGQAQAYNAAPAYNPAAYAPQPQAHHQLPAGVGQPAQIPAGVDARSCP NYPFCH>gi|452930847|gb|GAFI01012246.1| Dendroctonus frontalis MFVKLVTLALCLTSAWA VYNGPLAGGLPADLYPAGVSPQACPNFPNCANPAVAVSSGAPQNNWGAPQPQPAWNQAPQSQWNAPQPQWNAPQPQWNNYNPQPVPQWN PSGQNALEKGGYTGDGDYHGEGLAEALAPGYENAGGWNKWNNNDNQAAAWNQAPAWNAGPQAGLPNGAGARIPAGVDPNACPNYPFCGG GH>gi|459324431|gb|GAFX01014541.1| Dendroctonus ponderosae MFVKLVTLALCLTSAWA VYNGPLAGGLPADLYPAGVSPQACPNFPNCANPAVAVSSGAPQNNWGAPQPQPAWNQAPQPQWNAPQPQWNAPQPQWNNYNPQPVPQWN PSGQNALDKGGYTGDGDYHGEGLAEALAPGYENAGGWNKWNNDNQAPAWNQAPAWNAGPPAGLPNGAGARIPAGVDPNACPNYPFCGGH >gi|452925844|gb|GAEO01000512.1| TSA: Pissodes strobi Pissodes_strobi_Contig512 MFVKLVVFVCFAGSALG QHQQYQGPLAGGQPAALYPAGVNPQSCPNYPDCTNPLVAISQNAAPQYAQSAPQYQQPAQYQQPSQYQQYQAPAVTPAPVSQYNPVYPQ QYAPASQRQYSSDVQQRLDRGEYIGDGDYRGEGLAEALAPGYAGQAQAAPQYNPAPQYNPAPAFQPAPQQYNAAPAYPQAAPSAQPAQI PAGVNAQACPNYPFCHA>gi|452924214|gb|GAEO01001329.1| TSA: Pissodes strobi Pissodes_strobi_Contig1329 MFHKLAILLCFMSVTIA QYHQHPQYQQAQYQQQPQYQQQPQYQQPQYQQAQYQQQPEPVPTAAQFPAGVDAQSCPNYPECLNPLLAVQAVAKASDPRYLAQNAP AQRESQYSPDVQQRLDRGEYIGDGDYHGEGLDEALAPELAVRGHYDGQTAAAQYAAQPAAAQYASQPAAAQYAAQPNIGAVQYPAQQAA SQYVAQPAAVQYPAQRAASQYAAQPGAVQYPAQQAASQYNAAPQYITAPRQATAPQYYQARPSPHGSQHAQASLFTPQYAQPAPVSEAR SYSPVANAIASGSEPVPSVQLPAGVDANACPNYPFCH>gi|372374709|gb|JR483044.1| TSA: Rhynchophorus ferrugineus contig15581.Rhfeelpa MFVKLAVFSCALALAFA QYHQPYNGPLAGGQPASLYPAGVSPQSCPNYPDCSNPLVAVQNSAPQYAPSAPQYPQPAQYSQYQAPAPVTPAPVSQYNPVYPQQYAPA SRSQYSPDVQQRLDRGEYIGDGDYHGEGLAEALAPGYAGQAPRQYAPAPAPYQPAPAPYQPAQAYSQPAYPQAAPAGPQPAQIPAGVNA 
NACPNYPFCH>gi|562764064|gb|GABY01013552.1| TSA: Anthonomus grandis A_grandis_454_rep_c207 MFVKLVTLALCLTSTLA VYNGPLAGGLPASLYPAGVSPQACPNFPNCNNPAVAANPNSPTQQQWGSPQPQAWGQQQPAWTQQPQNQWNNAQAVPQWNGNNNDVLLK GGYTGDGDYRGEGLAEALAPGYENSDVWRNWATGGQQQANQWNQAPQGPTHGVGQIPAGVDAGACPNYPFCGH>gi|429236657|gb|GAAB01001063.1| TSA: Agrilus planipennis 000793_EAB-5_isotig01124 MFVKLVVLACISSVALA AYNGPLAGGEPAHRYPAGVDPSACPNFPHCNNPAVAVNQQPAHAWNAQPQWNAAPQNQWNPAPQNHWNAAPQNQWNAAPQQWNAQPSWN GNQNALDSGAYTGDGDWHGEGLAEAGAFGDISHNFNDPAPGHPIPQAAHHVPVPGLPAQLPAGVDAHACPNYPYCH>gi|211331260|gb|FG540609.1|FG540609 OtL019A08_021607c Onthophagus taurus MFTKLTTLACVLAVANC AWNGPLAGGAPASSVPAGISEAACPNYPHCTNPSVAVEPNSPAQPQSQYQQYQPQYQQPQYQSHQPQYQPQQQYQSQPQQQYQPQPQYQ SQPQQYQPQPQQYQQQPQQNQYNSGNHNENVLLSGEYTGDGDYRGEGLAESGAFGPVDDPKSYDATPAPQMTYQPTPAYNPAGYQQQGY NNAPQHAQPNNVPAGLDPRYCPYYPFCH-
>gi|46496193|gb|CN475749.1|CN475749 USDA-FP_124839 Diaprepes abbreviatus MFVKLVTLAICLASARA VYNGPLAGGKPADLYPAGVSPQACPNFPNCANPAVAANPNAPGGYPWGSQPQNAWAQTGNNWNSAPQNNWNAPAAEWSPYRQNALDRGE YTGDGDWHGERLAEALAPGYENRGGGWNNNGGQYDGSQGWAGVNQPPAGLGAIPAGVNPGSCPNYPFCK-
>gi|749107276|gb|GBHN01000010.1| TSA: Colaphellus boyringi Contig11_AA MFVKLAVIACSLAAANA VYNGPLAGGQPAALYPAGVSPEACPNFPNCNNPAVAANPQQAAPHQYGAPQPQYNAQPQNQYNAQPQNQYNQGQQYNPAPVPQQYGNDA NNRLNRGEYIGDGDYHGEGLAEALAPGYSQPNYNDANQYKNQGNNYNQGPQGVPQNIHQTHGVGQIPAGVDAHACPNYPFCS-
Lepidoptera (14 species, 16 sequences)
>Bombyx mori gi|698765134|gb|GBJR01010300.1| TSA MYGKLFAILTLAAVALA REYPAGLHPAICPNYPFCDADALAKYTPQGMPIPEWVRNPAILPIARAASNSVPKYPADFPAALCPNYPYCW>Spodoptera litura gi|612350358|gb|GBBY01010560.1| TSA MFGKMFVFFAVLVVALA REYPAGVHPAVCPNYPYCDADALARHTPDGMPIPQWGYHPGVAPAAPGPVPAAPRYPADFPPALCPNYPYCW>Ostrinia furnacalis gi|572957986|gb|GAQJ01033381.1|TSA MFAKLFALLALAAVALC REYPAGVHPAVCPNYPYCDTTAFARHTPDGQPIPEWVYNPSILPVAPVDPAHNAAPRYPADFPAALCPNYPYCW>gi|597687333|gb|GAVD01010517.1| TSA: Ostrinia nubilalis comp17384_c0_seq1 MFAKLFALLALAAVALC REYPAGVHPAVCPNYPYCDTTAFARHTPDGQPIPEWVYNPSILPVAPVDPAHNAAPRYPADFPAALCPNYPYCW >gi|189554912|gb|FG209476.1|FG209476 Aace00901 Antheraea assama MYGKLLIVFALVVVALG QKYPAGVHPAVCPNYPFCDAQALARHTPDGTPIPEWVRNPSILPAPVPNHYAAGSFAAPRYPADFPAALCPNYPYC>gi|189567448|gb|FG208832.1|FG208832 Aace00257 Antheraea assama MFGKLFFLCAVAVAIAE QYPAGVHPAICPNYPFCDAETLARFTPDGMPIPEWYRNPALIPAPVPVPVVRAFEAPVAAKYPADFDASKCPNYPYC>gi|755820069|gb|GBZJ01043295.1| TSA: Antheraea yamamai Unigene46686_Ayam MYGKLFIVFALVVVALG QKYPAGVHPAVCPNYPFCDTQALARHTPDGTPIPEWVRNPSILPAPVPNHYAAGSFAAPRYPADFPAALCPNYPYC>Athetis lepigone 1-4gi|576213777|gb|GARG01025927.1| TSA MIGKMLFFFALAAVALA REYPAGVHPAVCPDYPFCAPDALARHTPSGIPIPQWGYNPGVAPGHPGPAALKYPADFPAALCPNYPYCW>Agrotis segetum gi|617808548|gb|GBCW01017681.1| TSA MFGKMLVFFALAALALA REYPAGVHPAVCPDYPFCAADALARHTPDGMPIPQWGYNPGVAPAHAGAVPAAPRYPAGLPPALCPNYPYCW>Papilio polytes gi|389611347|dbj|BAM19285.1| PpolCPH1 MYAKLFVLCVLAGVALA REYPAGLHPAVCPNYPYCDTNTFARFTPDGMPIPEWVYNPSILPVAPADPHANAAPKYPANFNAAACPNYPYCW>Papilio xuthus gi|389608733|dbj|BAM17976.1| PxutCPH1 MYAKLFVLCVLAGVALA REYPAGLHPAVCPNYPYCDTNTFARFTPDGMPIPEWVYNPSILPVAPADPNANVAAKYPANFNAAACPNYPYCW>Danaus plexippus gi|357617832|gb|EHJ71016.1| MYAKLFIVCAVAVVALA REYPAGLHPAVCPNYPYCDATAFQRFTPEGQPIPEWVYNPSILPQAPVDPNANLAARYPANFNAAACPNYPYCH>Heliconius melpomene cDNA clone Hm_pwAE2_33G10 MYAKLFIVCVVAVVALA REYPAGLHPAVCPNYPFCDTNAFARFTPEGMPIPEWVYNPSILPVAPADPNANIAAKYPANLNPAECPNYPYCW>Heliconius melpomene cDNA clone Hm_pwAE2_14F09 
MYAKLFIVCIVAVVALA REYPAGLYPALCPNYPFCDSNTLARFTPDGMPIPEWVYNPSILPVAPADPNANIAAKYPANLNPAECPNYPYCW-
>gi|74326249|gb|DT662070.1|DT662070 Heliconius erato MYAKLFIVCVVAAVALA REYPAGLHPAVCPNYPFCDSNAFARFTPEGMPIPEWVYNPSILPVAPADPNANIAAKYPANLNPAECPNYPYCW>gi|308155535|gb|FS940564.1|FS940564 FS940564 Mamestra brassicae MYGKLLMFFALAAVALA REYPAGVHPAVCPNYPYCDADALARHTPDGMPIPQWGYNPAVAPAHPGPVPAAPRYPADFPAALCPNYPYCW-
Siphonaptera (1 species, 1 sequence)
>gi|604977437|gb|GAWY01009901.1| TSA: Oropsylla silantiewi comp14334_c0_seq1 MFVKIVLSVSALCLLASA APQAARYPAGLNPSLCPGYPYCDNLLLAKYAPSAAGGAYVAPATSHTHDYNGVGGDKYPAGVDPSTCPNYPFCDNNVGAGYYAPPLPGF KQRLYPDGVSAHNCPNYPFCH-
Diptera (27 species, 28 sequences)
>gi|118789538|ref|XP_317486.3| AGAP007980-PA [Anopheles gambiae str. PEST] MFSKVIAVLAFAAVVAA KPQHQPAAQYPAGVDPSRCPSYPNCDNAALHSPNPYNNHAANHWNPNWNAQPSWNAAPAPAPAPAAYYHGAPHSYQALTGPSHNYIGAP SPSAGGDRYPAGVNPQSCPNYPYCDNTVQAGVPQVAPLPGYTSRQYPAGVSPHTCPNFPYC>gi|568252033|gb|ETN61428.1| hypothetical protein AND_006914 [Anopheles darlingi] MFSKVIAVLAFAAVVAA KPQHQQAAQYPAGVDPSRCPSYPNCDNAALHSPNPYNNHAANHWNPNWNVQPSWNHAPAAPAPASYYHGAPHSYQALTGPSHNYLGAPA PTAGGDRYPAGVNPQSCPNYPYCDNSAPAGVPHVAPLPGYTARQYPAGVSPHACPNFPYC>gi|668457631|gb|KFB45634.1| AGAP007980-PA-like protein [Anopheles sinensis] MFSKVIAVLAFAAVVAA KPQHQPAAQYPAGVDPSRCPSYPNCDNAALHNPNHAYNNHAANHAANHWNPNWNAQPSWNAAPAPVPAPAPYYHGAPHSYQALTGPSHN YLGAPAPTAGGDRYPAGVDPQACPNYPYCNNLAPAGAPQAAPLPGFTSRQYPAGVSPHTCPNFPYC>gi|302221267|gb|EZ977345.1| TSA: Anopheles funestus Afun011392 MFSKVIAVLAFAAVVAA KPQHQQAAQYPAGVDPSRCPSYPNCDNAALHSPNPYNNHAANHWNPNWNAQPSWNPAPAPAPASYYHGAPHSYQALTGPSHNYLGAPAP TAGGDRYPAGVNPQSCPNYPYCDNTVQAGVPQAAPLPGYTSRQYPAGVSPHTCPNFPYC>gi|704848324|gb|GBTE01001587.1| TSA: Anopheles quadrimaculatus m.2806 MFSKVIAVLAFAAVVAA KPQHQPAAQYPAGVDPSRCPSYPNCDNAALHNPNHQTYNNHAANHWNPNWNAQPSWNAAPAHAPAPAPYYHGAPQSYQALTGPSHNYLG APASTAGGDRYPAGVDPQACPNYPYCDNLAPAGVPQAAPLPGYHARQYPAGVSGHTCPNYPYC>gi|157105379|ref|XP_001648842.1| hypothetical protein AaeL_AAEL004292 [Aedes aegypti] MYSKMIAVLALAAVAIA APQHQEAARFPAGVNPNACPSYPNCDNAALHNQNPPANHANNHWNPNWNAQPAAPAWNAQPAAPSWNPQPAAHSWNQQPAAPSWNPQPA AHSWNQQPAAPAWNAQPQPHWNSFPAVTGPANNHLAAAPAPSAGGDKYPAGVNPQTCPNYPFCDHAATAGAPQVAPLPGYTERLYPAGV SPHSCPNFPYCN>gi|401007996|gb|KA191234.1| TSA: Chironomus riparius CripIT16530 MFKLVTFVTLFAVAFS APQHAAKYPAGVDPSKCPNFPICDNAALHAKAPAYNHWDQPAAHWNQPAQAYNHWDHQPAAPQWAPAPQWNNAAQYNHVAPAAPKAAAK YPAGVDPRSCPDFPYCPTPILPGHHAHHVAPLPGFTERLYPAGVSAHTCPNYPEC>gi|511203332|gb|GAKJ01009672.1| TSA: Sitodiplosis mosellana Unigene17991_S_mosellanaA MNSKVICFFVLIAAVHS APQHDQPARYPAGVNPALCPGFPICDNSLLHGTPPVPAAPHAAYTGSPAWNHGAQSYAYHSAPAYNQWNQPQHYDAHDYDYSTNDINGP 
GGDKYPAGVNPSACPNYPYCDNGAASHYAPVATPLAGYASRQYPAGISPAACPNYPYCA>gi|697393690|gb|GBRL01010158.1| TSA: Sitodiplosis mosellana CL166.Contig1_S_mosellanaA MNSKVISFFVLIAAVYS APHATQWPAGVHPSVCPNYPYCDTGAIAASVAPLEGFSTRLYPAGISPAACPGYPICDNTVVHNTPLVNTVNPAWNQPSTVVDKYPAGV HPSACPNYPYCSTGPATPLEGFSTRLYPAGIVAASCPSYPYC-
>gi|590125442|gb|GAWM01017815.1| TSA: Culicoides sonorensis m.7430 MFSKVFVVLATIAYVAA QEAARYPAGVDPSRCPGFPICDNAALHNVNPVPYSAPSYHQPQYYSAPAPTNYDDTGAYDPRYNDPNFQGNNGGYYQAPAPVQHYQPAP VQYAAPVSNHIAAPAADKYPAGVSPNSCPNYPYCDVNAGHNGPARAAPLPGFTERLYPAGVNPSACPNFPDCPIGQ>gi|194753029|ref|XP_001958821.1| GF12575 [Drosophila ananassae] MFFKLLFASCLALALA KPQHPPAAQYPAGVNPQDCPGFPICDNARLHNPQPQWGAPQPQWQQPQPQWQPQPQPQWQPQPQWQQPQPQWQPQPSWNAAPAPSAGGD KYPAGVNPQTCPNYPYCDVNAGHGGAPVAAPPLPGWTERLYPAGVSPHQCPNFPYCH>gi|195474759|ref|XP_002089657.1| GE22901 [Drosophila yakuba] MFCKLLFATFVALAVA KPQHQPAAQYPAGVNPQDCPNFPICDNARLHNPQPQWGAPQPQWNPQPQPQWNPQPQWQQPQPQWNPQPQPQWQPQPSWNAAPAAAPGG DKYPAGVNPQTCPNYPYCDVNAGHAGAPVAAPPLPGWTERLYPAGVSPHQCPNFPYCN>gi|195030001|ref|XP_001987860.1| GH22145 [Drosophila grimshawi] MFYKLLLASCLALALA KPQHQPAAQYPAGVNPQDCPNFPICDNARLHNPHSQWQQPQPQWQQPQPQWQPQPQQHWQQPQSQWQQPQPQWQPQPAWNAPPAASAGG DKYPAGVNPQTCPNYPYCDVNAGHGGAPVAAPPLPGWTERLYPAGVSPHQCPNFPFCN>gi|221330063|ref|NP_610394.2| CG8736 [Drosophila melanogaster] MFCKLLFATFVALAVA KPQHQPAAQYPAGVNPQDCPNFPICDNARLHNPQPQWGAPQPQWNPQPQPQWNPQPQWQQPQPQWNPQPQPQWQAQPSWNAAPAAAPGG DKYPAGVNPQTCPNYPYCDVNAGHAGAPVAAPPLPGWTERLYPAGVSPHQCPNFPYCN>gi|194863443|ref|XP_001970443.1| GG10631 [Drosophila erecta] MFCKLLFATFVALAVA KPQHQPAAQYPAGVNPQDCPNFPICDNARLHNPQPQWGAPQPQWNAQPQPQWNPQPQPQWQQPQPQWNPQPQPQWQPQPSWNAAPAAAP GGDKYPAGVNPQTCPNYPYCDVNAGHAGAPVAAPPLPGWTERLYPAGVSPHQCPNFPYCN>gi|195148990|ref|XP_002015442.1| GL11020 [Drosophila persimilis] MFFKLLFASCLALALA KPQHPPAAQYPAGVNPQDCPGFPICDNERLHSPKSQWGAPQNQWQPQPQWQQQPQTWQPQPQWQPQPQTWQPQPQWQPQPQPSWNSAPA PAAGGDKYPAGVNPQTCPNYPYCDVNAGHAGGPVAAPPLPGWTERLYPAGVSPHECPNFPYCH>gi|195581589|ref|XP_002080616.1| GD10155 [Drosophila simulans] MFCKLLFATFVALAVA KPQHQPAAQYPAGVNPQDCPNFPICDNARLHNPQPQWGAPQPQWNPQPQPQWNPQPQWQQPQPQWNPQPQPQWQPQPSWNAAPAAAPGG DKYPAGVNPQTCPNYPYCDVNAGHAGAPVAAPPLPGWTERLYPAGVSPHQCPNFPYCN>gi|195332522|ref|XP_002032946.1| GM20677 [Drosophila sechellia] MFCKLLFATFVALAVA 
KPQHQPAAQYPAGVNPQDCPNFPICDNARLHNPQPQWGAPQPQWNPQPQPQWNPQPQWQQPQPQWNPQPQPQWQPQPSWNAAPAGAPGG DKYPAGVNPQTCPNYPYCDVNAGHAGAPVAAPPLPGWTERLYPAGVSPHQCPNFPYCN>gi|195120670|ref|XP_002004844.1| GI19375 [Drosophila mojavensis] MFFKLLFVSCVGVALA KPQLQPAAQYPAGVNPQDCPNFPLCDNARLHNPQSQWQQPQPQWQPQPQWQPQPQPQWQPQPQWQQPQPQWQPQPSWNPAPAPAPGGGD KYPAGINPQTCPNYPYCDVNAGHAAAPVAAPPLPGWTERLYPAGVSPQQCPNFPYCH>gi|195455414|ref|XP_002074713.1| GK23212 [Drosophila willistoni] MFFKLLLFVSCLALTLA KPQHQQAAQYPAGVNPQDCPGFPICDNARLHNPQAHNQWQPQPQWQQPQPQWQQPQQQWQPQPQWQQPQQQWQPQPSWNAAPAPSAGGD KYPAGINPQTCPNYPYCDVNAGHAGAPVAAPPLPGWTERLYPAGVSPHQCPNFPYCQ>gi|195384435|ref|XP_002050923.1| GJ19933 [Drosophila virilis] MFFKLLLASCLALALA KPQHPPAAQYPAGVNPQDCPGFPICDNARLHNPQSQWQQPQPQWQQPQPQWQPQPQQHWQQPQPQWQQPQPQWQPQPSWNAAPAPAAGG DKYPAGINPQTCPNYPYCDVNAGHAAAPVAAPPLPGWTERLYPAGVSPHQCPNFPFCN>gi|499013673|ref|XP_004537594.1| PREDICTED: cuticle protein 1-like [Ceratitis capitata] MFCKLAFISLFAVALA KPQHLPAAQYPAGVNPQDCPGFPICDNARLHNPQAHWGAPAPAWQPQPQWGAPAPAWQPQPHWGAPAAPSWQPAPAPSAGGDKFPAGVN PHTCPNYPFCDVNAGHGAVAAPPLPGWTERQYPAGVSPHQCPNFPYCN>gi|615215794|gb|GBBP01054551.1| TSA: Teleopsis dalmanni Td_comp152510_c0_seq2 MFFKLCLLSVIALTFA KPQHAPAAQFPAGVNPQDCPGFPICDNARLHNPQSNWGAPQPSWNSQPSWNNGQSQWNNNGQWNNNNDDGQWNGGHDQWNNNNDQWNNG GQSSWNNGGQSSWNNGGQSSWNNGGQSSWNNGGQSWNGAPTGGNAGSGQFPAGVNPHSCPNYPFCNINGGGSAPVAAPPLPGWSERQYP AGVSPHQCPNFPYCK>gi|545914203|gb|GANO01004087.1| TSA: Corethrella appendiculata CorSigP-3899 MFSKLIAILATVAAVSA APQHLEAARFPAGVNPAACPGYPNCDNAALHNPQPQWNQWNAPQPQWNAAPQPQWNPAPQPQWNAAPQPQWNQHAEAAPQWDPNTKNNN PLWNVPAAAQQYNYPALTGPATNHLGSGGDKYPAGVNPHTCPNYPYCDTNAGHAGAVRAAPLPGFTERQYPAGVNPHQCPNFPYCS-
>gi|289743532|gb|EZ424238.1| TSA: Glossina morsitans morsitans GM-8604 MFTKLVFFGLMSLALA KPQHLPAAQYPAGVNPQDCPGFPICDNARLHNPAARWQSAPAWKPQPQWSAPHWQAPAPAWNAPAPAWGAPAPAAGGDKFPAGVSPHTC PNYPFCDLHAGAAGAPAPPLPGWTERQYPAGVSPHTCPNFPYCH>gi|557778123|ref|XP_005188694.1| PREDICTED: cuticle protein 1-like [Musca domestica] MFTKLVILSLAAVACA KPQAAQYPAGVNPQDCPGFPICDNARLHNPASRWQQPQPAWQPQPSWQQPQPSWQPQPSWQQPQPSWQPQPQWNAPPAAPGAADKFPAG VSPHTCPNYPFCDVNAGGAAAPAPPLPGFTERQYPAGVSPHTCPNFPYCN>gi|751797740|ref|XM_011210452.1| PREDICTED: Bactrocera dorsalis cuticle protein 1 MFCKLAFISSLIALALA KPQHLPAAQYPAGVNPQDCPGFPICDNARLHNPQAHWGAPAPAWQPQPQWGAPAPSWQQQQQWGAPAPSWQGAPAPSWQGAPAASAGGD KFPAGVNPHTCPNYPFCDVNAGQHGAVAAPPLPGWTERQYPAGVSAHQCPNFPYCN>gi|751475075|ref|XM_011194661.1| PREDICTED: Bactrocera cucurbitae cuticle protein 1 MFCKLAFISSLIALALA KPQHLPAAQYPAGVNPQDCPGFPICDNARLHNPQAHWGAPAPAWQPQPQWGAPAPSWQPQPQWGAPAPSWQGAPAASAGGDKFPAGVNP HTCPNYPFCDVNAGHGAVAAPPLPGWTERQYPAGVSPHSCPNFPYCN-
SUPPLEMENTARY FILE 8
MULTIPLE CPCFC GENES IN LADONA FULVA (Odonata) from WGS BioProject: PRJNA194433
>gi|481388480|gb|APVN01034708.1| Ladona fulva Contig34725, whole genome shotgun sequence
Range 1: 3552 to 3983
QNKRGYVGMLEEAVFIIESGGRQVEGRLCTESVLFFLQMLLLVAAAVAVADKYPAGLNPALCPNYPNCDNALIALHSANPSAVTPYWAA PIAKEYPAGVHPAACPNYPYCGTAVPLAYAAWAPYTREWPAGVHPAACPNYPYCH>gi|481308109|gb|APVN01075567.1| Ladona fulva Contig75636, whole genome shotgun sequence
Range 1: 3296 to 3733
MGSAVAPAAKYPAGVNPHACPNYPYCDNVALAAHSAGAVHAPYAAPAYAHYGVHGPVAAAYPAGVDPHTCPNYPYCDNVAVHTARSAHG WAAPAWTAHGAWAAAPHAAWTGVAHGGWAGAHGVAHGAARYPAGVNPHTCPNYPYCH-
>gi|481308101|gb|APVN01075571.1| Ladona fulva Contig75640, whole genome shotgun sequence
Range 1: 660 to 944
MCVAAVLGGAVFPAARYPAGVNPLACPNYPYCDNVALAANPAGWARSAWNVPAWTAYSAPWNGVWSGVIPGVTPVAARYPPGVDPVACP NYPYCH >gi|481308097|gb|APVN01075573.1| Ladona fulva Contig75642, whole genome shotgun sequence
Range 1: 2724 to 3119
FQIILALCATAVLSTGIPAAKYPAGVSPHTCPNYPYCDNVALAAHAVAPYAAPAYAHYGVHGPVAAAYPAGVDPHTCPNYPYCDNVALA AHVTGAHGVYGAPWAAHGAWAGAAAHYPAGVSPHTCPNYPFCH>gi|481308097|gb|APVN01075573.1| Ladona fulva Contig75642, whole genome shotgun sequence
Range 1: 7900 to 8487
MPELTKEIIKYYCFRSQIALVLCVAIALGCANGQAAKYPAGVSPHLCPNYPHCDNAVLGAAAADSAAHAYSAPVYGGYAAPGYAAPGYA APGYAAPGYAASGHAAVAAVGYPAHVNPHSCPNYPYCGPTPVHVPSKVWAGAAAHGYGATAHNAWAAPAAYAHGGAASHGFSSLAKGGD RYPAGVSPHACPNYPYCH>gi|481308097|gb|APVN01075573.1| Ladona fulva Contig75642, whole genome shotgun sequence
Range 1: 14655 to 15188
METARSNIPHLLFQIILALSAVAFIESVHSQAAKYPAGVDPHLCPNYPHCDNAALGAAASAAGAVAHAYAVPTYEEPSY HSYGAHASSAYASYAPAVPVSEGYPAGVNPHACPNYPFCGPTPTHVPSKIWAGPSAHGYPTPAYSPYGASAAYASPAKGGDRYPDGVDP HSCPNYPYCH>gi|481308089|gb|APVN01075577.1| Ladona fulva Contig75646, whole genome shotgun sequence
Range 1: 4957 to 5610
MSAGASGIMSEKSIIFFFFPQIVVALCIVGALGAAVPEAARYPAGVDPHVCPNYPNCDNVALAARASVPYAPSAPTHYAAPAYGHHAA APVPASLPAGVDARACPNYPYCGPTPVQPQAPSGYSSWSAPAPAPQAPSAYSSWSAPAPQPSWTQPAPQPAWSNPSSQNFWSNPSAPRA AVPNFNWNTPAQSPAPSSEQGGALFPAGVDPSSCPNYPYCH>gi|481308087|gb|APVN01075578.1| Ladona fulva Contig75647, whole genome shotgun sequence
Range 1: 98 to 856
HIGYNLFTHSLSSLISASTILFQFAVALCLLSCTVAQYVNYSPQVVQRPCYDYPNCGNIHRSPQVSNDGQDNIIWPDDGSYPGDTKES QVTEAPGEPGYPAGVNSNLCPNYPFCGPTPVYVKGKASDQAIWSVQATKSQPLSSPLAVHQQTSYKAPAAPSHTQYFVPSRAAPHNDWN GPSVQQVQQVSWTAPAVGKPHPNSLPAVAHNKWSAASIHSPATIPVAPFDYSAEGGVRYPADVDPNSCPNYPYCRV>gi|481308083|gb|APVN01075580.1| Ladona fulva Contig75649, whole genome shotgun sequence
Range 1: 1448 to 1903 LCFKVLVLCVVSAFVNAAPQAARYPAGVDPHTCPNYPNCDNVALAAHATGAAHAPYAAPTYSAYGHSSGVPGAAATLPAGVDARACPNY PYCGPTPVAVPRPHNTWSAPAAYNQWSAPAAPQQWSAPAPAQGGAHYPAGVDPHACPNYPYCS-
>gi|481308083|gb|APVN01075580.1| Ladona fulva Contig75649, whole genome shotgun sequence
Range 1: 10447 to 10989 IVFALCVVSALAAPQAAKYPAGVDPHTCPNYPNCDNVALAAHATGAPYAAPAYSAYGHAAGVPGAAAAYPAGVDPHACPNYPYCGPTPA HVPGAAPHNSWAAPAAHNNWAAPAAHNNWAAPAAHNNWAAPAAHNNWAAPAAHNNWAAPAAHNAWAAPAPAAAAVYPAGVSPHSCPNYP YCS>gi|481308083|gb|APVN01075580.1| Ladona fulva Contig75649, whole genome shotgun sequence
Range 1: 16205 to 16690
IILAVCVVGTFGNPIPLAAKYPAGVNPHACPNYPYCDNALTAHAPYAAPAYAAYGHHGGVPGAAAKYPAGVDPHLCPNYPFCGPAVAHV PGVHGGWEGAGAWAGAHHGWDDGSYHGDDEGTYYGGDDDGSYNHWDDGSYNEWDDGSYNHWDDGSYHGDGHHW>gi|481308079|gb|APVN01075582.1| Ladona fulva Contig75651, whole genome shotgun sequence
Range 1: 488 to 1003
MPCFRQFILVLSAIATASAAISAAKYPAGVDPHTCPNYPNCDNVALAAHATGAPHAPYAAPAYSAYGHAAGVPGAAAA YPAGVDPHACPNYPYCGPTPAHVPGAAPHNSWAAPAAHNPWVAPAVNNAWAAHNGWAAAPATGHDGNHYPAGVSPHSCPNYPYCH-
>gi|481308075|gb|APVN01075584.1| Ladona fulva Contig75653, whole genome shotgun sequence
Range 1: 2969 to 3481
RSTLIRDSKIPIFFFHCSQQLSLILQIVLALCVAGALGGLVPHAAKYPAGVSPHTCPNYPFCDVSAHAVAPYAAHAYAAYGHHAGVPG AAAAYPAGVDPHICPNYPFCGPTPAHHGWAGAGAGAWDDGSYHPWYDNSAHYDDGSYKPWLDNAGHNDDGSYRPWQYGGHHHW>gi|481265194|gb|APVN01098002.1| Ladona fulva Contig98099, whole genome shotgun sequence
Range 1: 4925 to 5497
MTNGSRHFRLLLQIILALCVAGALGSAIPAAAKYPAGVNPHTCPNYPYCDNVAVAAHAAHGAYGAHAAAPYAAHAYATYGHHAGVPGAA AKYPAGVDPHACPNYPFCGPTPAHVPGAHGAWAGHSASAGAAHNAWAGAHGGWAGAHGGWAGAHHGGWDDGSYHGEDDGQYHHWDDGSH WDDGSYHGDHHHW-
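The "Range 1: start to end" lines above can be related to the length of each listed translation with simple arithmetic. The sketch below assumes that each range gives inclusive nucleotide coordinates on the contig and that the region is read in a single frame without introns; neither assumption is stated in the file, and the helper name `codons_in_range` is hypothetical.

```python
def codons_in_range(start, end):
    """Return (whole_codons, leftover_nucleotides) for an inclusive range.

    Assumes inclusive nucleotide coordinates read in one frame; this is an
    illustrative assumption, not something the file itself specifies.
    """
    span = abs(end - start) + 1
    return span // 3, span % 3


# Ranges copied from the first three Ladona fulva records above.
ranges = [(3552, 3983), (3296, 3733), (660, 944)]
results = [codons_in_range(s, e) for s, e in ranges]
for (s, e), (codons, extra) in zip(ranges, results):
    print(s, e, codons, extra)
```

A non-zero remainder would suggest that a range does not cover a whole number of codons under these assumptions.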
SUPPLEMENTARY FILE 9
CPCFC FAMILY MEMBERS IN CRUSTACEA
Ostracoda (1 species?; 2 sequences)
>gi|333210927|gb|JL247049.1| TSA: Cypridininae sp. BMR-2011 mRNA sequence MMSFRLLVASILFTCALS KVIFPAGVNPAACPNFPFCDALIDPVTGNQVAPAENYPGYVPYNLKYPAGLIPAACPDFPYCTGRADRLLRFVNTGRQEVPAGINPAGW YA>gi|333231688|gb|JL267722.1| TSA: Cypridininae sp. BMR-2011 mRNA sequence MSLILLVASCLLATSMA LPRRLPPGVSLVGCPQWPICDPLIDPLTGASRGDPKDFPGYVPLKLKNPIGLSVLSCPDYPFCRGRAERQLQFLVTGRQQVPADVDPAL WYS-
Malacostraca (4 species, 6 sequences) motif in all ends C-X(7)-C
>gi|510192250|gb|GAKD01008615.1| TSA: Melita plumulosa mira_rep_c8871 MSFIRATCLVLLVAVSLSTALP QELPAGVTAAECPNFPFFNCSPLLKAVAPAGQPAPSAAAVAAGAPAPKNPAGVKCFNFPFFPCNP>gi|510079550|gb|GAJQ01012904.1| TSA: Hyalella azteca contig16944 MAGLKIFTFCSALLVTLA ATNTVATPVLPAGVTAAECPNFPFFNCSPLLKAVVPESLAPAPAVPSPNAPPQQPANPAGVRCFNFPFFPCSP>gi|510070149|gb|GAJP01004516.1| TSA: Hyalella azteca contig06745 transcribed RNA sequence MAGLKIFAFCSALLVTLA ATNTVATPVLPAGVTAAECPNFPFFNCPLLKAVVPESLAPAPAVTSPNAPPQQPANPAGVRCFNFPFFPCSP>gi|742949070|gb|GARH01025610.1| TSA: Procambarus clarkii Prcla_ES_994_0 transcribed RNA sequence MRSLVVVLVVVVVMVVVVVVG YPSQLPAGVTAADCPTYPFFPCRVPVQPAQPANPASVTCFNFPFYSC>gi|170194750|gb|FE752277.1|FE752277 CAYF2211.g3 CAYF Petrolisthes cinctipes MRFNIAMVMMVVVVVVMVGVAMA LPANLPASVSAADCPGYPFYSCRQPAHPPQPANPAGVTCYNFPFYHCS>gi|170215384|gb|FE772151.1|FE772151 CCAG6495.b3 CCAG Petrolisthes cinctipes MRFNIAMVVVVVMMGVAMA LPANLPASVSAADCPGYPFYSCRQPAHPPQPANPAGVTCYNFPFYHCS-
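The note that the Malacostraca motif ends in C-X(7)-C can be checked by counting cysteine pairs at both spacings. This is a minimal sketch: the C-X(5)-C comparison spacing is an assumption read off the insect sequences in Supplementary File 7, the two mature sequences are copied from the listings, and `count_cys_pairs` is a hypothetical helper name.

```python
import re

# Mature sequences copied from the supplementary listings (signal peptides omitted).
SEQUENCES = {
    "Petrolisthes cinctipes": "LPANLPASVSAADCPGYPFYSCRQPAHPPQPANPAGVTCYNFPFYHCS",
    "Blaberus craniifer": ("QADKYPAGLNPALCPNYPNCDNALIALYSNVAPAIPYAAAYNYPAGVSPAACPNYPFC"
                           "GAIAPLGYHVREYPAGVHPAACPNYPYCV"),
}


def count_cys_pairs(seq, spacing):
    """Count non-overlapping C-X(spacing)-C occurrences in a sequence."""
    return len(re.findall(r"C[A-Z]{%d}C" % spacing, seq))


# Tabulate both spacings for a crustacean and an insect sequence.
counts = {
    name: {n: count_cys_pairs(seq, n) for n in (5, 7)}
    for name, seq in SEQUENCES.items()
}
for name, by_spacing in counts.items():
    print(name, by_spacing)
```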
Maxillopoda (3 species, 3 sequences)
>gi|218457166|gb|FM882955.1|FM882955 FM882955 BA23840_2 Amphibalanus amphitrite cDNA clone 08_F12, mRNA sequence MLSLSVLVALVAVCSA QPVEYPEGVSPAACPNYPYCGTDANTLAAIQLASLAPSVRQYPAGVSAAACPNYPDCGSNSAIVSPAGVPLTRQYPAGVSAAACPNYPD CGSNSAIVSPAGVPLTRQYPAGVSAAACPNFPDCGSNSAIVSPAGVPLTRQYPAGVSAAACPNYPHC>gi|592916158|gb|GAXK01042217.1| TSA: Calanus finmarchicus comp299568_c0_seq1 MLAKIVSLCLMSTLVSG QAAQWPAGVSPAACPNYPDCSLTPAGYAGAVSAYPAGVLPAACPDYPYCTAAPAAPAGYVNTAGYPAGVAAAACPNFPYCY>gi|597435325|gb|GARW01001542.1| TSA: Eucyclops serrulatus MFTKIAVFAALFCLAAG QILLPAGVDPAVCPNYPYCDGVTVAPNLPAAAYPADVAPAACPDYPFCSSRAAAPIGYANTAGWPAGVAPAACPNFPYC-
Remipedia (1 species, 2 sequences)
>gi|333322291|gb|JL185402.1| TSA: Speleonectes cf. tulumensis BMR-2011 DMPC15238678 MLKFVVLLVLMATHLAMSHPV QYPAGVSPHECPNYPFCIRHPSHATANIERNFPSNILPAVCPNYPFCDNELLVQYL>gi|333269853|gb|JL132964.1| TSA: Speleonectes cf. tulumensis BMR-2011 DMPC15226770 MYKLITILVVVAVALAKP QEYPSGVNPATCPNYPFCTYEGVPSMLTHPAGVHHTVCPNYPFCTNTPQAYASVLPSFKYPAGVNPAVCSNYPYCG-
SUPPLEMENTARY FILE 10
NON-ARTHROPOD TSA HITS and their most closely related Arthropod source.
Initiator methionines are highlighted in green, final aa in red. The CPCFC consensus, marked only in the Arthropod sequence, except for the daisy, is in gray.
Homo sapiens: Note that there is 100% identity in the protein coding sequence (between green and red) and extensive identity in both the 5’ and 3’ UTRs.
>gi|389142709|gb|HY131203.1|HY131203 HY131203 RIKEN full-length enriched human cDNA library, brain Homo sapiens cDNA clone H06D096C22, mRNA sequence
Blattella germanica 1: gi|698758469|gb|GBID01001268.1| TSA: Blattella germanica Contig1280
Blattella germanica 2: gi|698757539|gb|GBID01002198.1| TSA: Blattella germanica Contig2218

Blatellagermanica1    ---------LYSFI-HFRSKSSTITMYCKLVVLAAIVAVAVAQADKYPAGLSPALCPNYP
Blatellagermanica2    ---------LYSFI-HFRSKSSTITMYCKLVVLAAIVAVAVAQADKYPAGLSPALCPNYP
humanbrainHY131203.1  HCRPVSIELALSFI-HFRSKSSTITMYCKLVVLAAIVAVAVAQADKYPAGLSPALCPNYP
                                 *** *********************************************

Blatellagermanica1    HCDNALIALYSNNAPAVPYASAYNYPAGVSPAACPNYPFCGALAPLGYHVREYPAGVSPA
Blatellagermanica2    HCDNALIALYSNNAPAVPYASAYNYPAGVSPAACPNYPFCGALAPLGYHVREYPAGVSPA
humanbrainHY131203.1  HCDNALIALYSNNAPAVPYASAYNYPAGVSPAACPNYPFCGALAPLGYHVREYPAGVSPA
                      ************************************************************

Blatellagermanica1    ACPNYPYCH-IDLGI-SLGVA-CRKDSVLRVEMLLEQCIYTSICGEKFSYECYLSKIITR
Blatellagermanica2    ACPNYPYCH-IDLGI-SLGVA-CRKDSVLRVEMLLEQCIYTSIRGEKFSYECYLSKIITR
humanbrainHY131203.1  ACPNYPYCH-IDLGI-SLGFA-CRKDSVLRVEILLEQCIYTSISGEKLVFSAI-LXS---
                      ********* ***** ***.* **********:********** ***: :..
The daisy, Karelinia caspia:
Perfect match to TSA: Bemisia tabaci BT_B_ZJU_Singletons82302 (Sequence ID: gb|HP653972.1), which covers a truncated coding sequence of the whitefly and extends into the 5’UTR. The daisy sequence completes the whitefly sequence, and there is identity in the 5’UTR.
>gi|675980905|gb|GANI01023091.1| TSA: Karelinia caspia comp33299_c0_seq1
>gi|319670146|gb|HP653972.1| TSA: Bemisia tabaci BT_B_ZJU_Singletons82302 mRNA sequence

Bemisia    ---KKKKFRGFQPTVQHCSV-FRFPVFSVTTTNHKMIGKLVVLSALVAVVLAQAQQWPAG
Karelinia  LAQAQQWLRGFQPTVQHCSV-FRFPVFSVTTTNHKMIGKLVVLSALVAVVLAQAQQWPAG
               :: :************ ***************************************

Bemisia    LNPAACPNYPNCDNTVVALYGGLPYAPAASVGRSYPAGVPAAA-----------------
Karelinia  LNPAACPNYPNCDNTVVALYGGLPYAPAASVGRSYPAGVPAAACPNYPFCGSAAAPAGYV
           *******************************************

Bemisia    --------------------
Karelinia  AREYPAGVPAAACPNYPYC-
The hop, Humulus lupulus
>gi|422196604|gb|GAAW01027316.1| TSA: Humulus lupulus comp42311_c0_seq1 used as query.
Best match (Sbjct) was to Bactrocera oleae, the olive fruit fly. >gi|510288382|gb|GAKB01003870.1| TSA: Bactrocera oleae contig03870 transcribed RNA sequence. Only the mature protein is shown. The top four matches were all to members of the genus Bactrocera.

Score          Expect  Method                        Identities    Positives     Gaps         Frame
106 bits(264)  4e-25   Compositional matrix adjust.  75/149(50%)   89/149(59%)   19/149(12%)  +2

Query  1    AQYPAGVNPHLCPNYPHCDNALLGLHAQNAAAAAAAPAHNAYPYANPNPYGNPNPYGNPN  60
            AQYPAGVNP  CP +P CDNA L  H   A   A APA    P      +G P P   P
Sbjct  230  AQYPAGVNPQDCPGFPICDNARL--HNPQAHWGAPAPAWQPQPQ-----WGAPAPSWQPQ  388

Query  61   PYAVPAVPSYVPNHLGVPA---HGQPAAA----QYPAGVSPHECPNYPYCSNHPGAGGPV  113
            P      PS+     G PA   +G PAA+    ++PAGV+PH CPNYP+C  + G G  V
Sbjct  389  PQWGAPAPSW----QGAPASSWNGAPAASAGGDKFPAGVNPHTCPNYPFCDVNAGHGA-V  553

Query  114  AAPPLPGFSSRQYPDGVSPHACPNYPYCH  142
            AAPPLPG++ RQYP GVSPH CPN+PYC+
Sbjct  554  AAPPLPGWTERQYPAGVSPHQCPNFPYCN  640
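The identity and gap counts reported for the hop alignment can be recomputed directly from the aligned rows. The sketch below concatenates the three query and subject blocks copied from this alignment and counts matching columns; the function name `alignment_stats` is hypothetical, and the "Positives" figure is not reproduced because it depends on the scoring matrix.

```python
# Aligned rows copied from the Humulus lupulus (query) vs
# Bactrocera oleae (subject) alignment, block by block.
QUERY = ("AQYPAGVNPHLCPNYPHCDNALLGLHAQNAAAAAAAPAHNAYPYANPNPYGNPNPYGNPN"
         "PYAVPAVPSYVPNHLGVPA---HGQPAAA----QYPAGVSPHECPNYPYCSNHPGAGGPV"
         "AAPPLPGFSSRQYPDGVSPHACPNYPYCH")
SBJCT = ("AQYPAGVNPQDCPGFPICDNARL--HNPQAHWGAPAPAWQPQPQ-----WGAPAPSWQPQ"
         "PQWGAPAPSW----QGAPASSWNGAPAASAGGDKFPAGVNPHTCPNYPFCDVNAGHGA-V"
         "AAPPLPGWTERQYPAGVSPHQCPNFPYCN")


def alignment_stats(query, sbjct):
    """Return (identities, gaps, alignment_length) for two aligned rows."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have equal length")
    pairs = list(zip(query, sbjct))
    # An identity is a column where both rows carry the same residue.
    identities = sum(1 for q, s in pairs if q == s and q != "-")
    # A gap column is one where either row carries a dash.
    gaps = sum(1 for q, s in pairs if q == "-" or s == "-")
    return identities, gaps, len(pairs)


stats = alignment_stats(QUERY, SBJCT)
print(stats)
```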
Chinese salamander, Hynobius chinensis:
gi|570852487|gb|GAQK01079415.1| TSA: Hynobius chinensis comp4708_c0_seq1 The best match (Sbjct) indicates that the sequence that had been attributed to the salamander (used as the query against TSA) actually comes from a chironomid related to gi|401007996|gb|KA191234.1| TSA: Chironomus riparius CripIT16530 mRNA sequence.

Score          Expect  Method                        Identities     Positives      Gaps       Frame
229 bits(583)  6e-73   Compositional matrix adjust.  142/162(88%)   146/162(90%)   5/162(3%)  +1

Query  1    MFKLVTFVTLFAVAFSAPQHAAKYPAGVDPAKCPGFPICDNAALHAPAHAPAYNHWDQPA  60
            MFKLVTFVTLFAVAFSAPQHAAKYPAGVDP+KCP FPICDNAALHA A  PAYNHWDQPA
Sbjct  85   MFKLVTFVTLFAVAFSAPQHAAKYPAGVDPSKCPNFPICDNAALHAKA--PAYNHWDQPA  258

Query  61   NHWSPPAPAYNHWDHQPAAHWNQPAPQWN---QYNHVAPAAPKAAAKYPAGVDPSKCPNF  117
             HW+ PA AYNHWDHQPAA    PAPQWN   QYNHVAPAAPKAAAKYPAGVDP  CP+F
Sbjct  259  AHWNQPAQAYNHWDHQPAAPQWAPAPQWNNAAQYNHVAPAAPKAAAKYPAGVDPRSCPDF  438

Query  118  PYCPTPVLPGHHAHHVAPLPGFTERLYPAGVSAHTCPNYPEC  159
            PYCPTP+LPGHHAHHVAPLPGFTERLYPAGVSAHTCPNYPEC
Sbjct  439  PYCPTPILPGHHAHHVAPLPGFTERLYPAGVSAHTCPNYPEC  564
Daphnia pulex (library 12)
There are extensive EST data for Daphnia, from libraries in which Daphnia had been subjected to challenging environments. Interestingly, the only matches to CPCFC queries were from library 12, which came from animals exposed to the midge Corethrella appendiculata.
Subject: TSA: Corethrella appendiculata CorSigP-3899 mRNA sequence
>Daphnia pulex FE342003.1 -- query MFSKVIVLCATLAVSFA APQHQEAARYPAGVNPAACPSYPNCDNAALHNPQPQNHQQNHWNPSWNAAPAPYTAPNHYSPPAPAYNHEQNQWNPSWNAPALHGPAHN YLGNPAPASAPTGGDKYPAGVNPQSCPNYPFCDNSAPAGHQQ VAPLPRFTERQYPAGVNPHTCPNFPYCN

Score          Expect  Method                        Identities     Positives      Gaps         Frame
186 bits(472)  2e-56   Compositional matrix adjust.  113/200(57%)   129/200(64%)   32/200(16%)  +1

Query  1    MFSKVIVLCATLAVSFAAPQHQEAARYPAGVNPAACPSYPNCDNAALHNPQPQNHQQNHW  60
            MFSK+I + AT+A   AAPQH EAAR+PAGVNPAACP YPNCDNAALHNPQPQ +Q N
Sbjct  1    MFSKLIAILATVAAVSAAPQHLEAARFPAGVNPAACPGYPNCDNAALHNPQPQWNQWNAP  180

Query  61   NPSWNAAPAPYTAP--------------NHYSPPAPAYNHEQNQWNPSWNA---------  97
             P WNAAP P   P              N ++  AP ++      NP WN
Sbjct  181  QPQWNAAPQPQWNPAPQPQWNAAPQPQWNQHAEAAPQWDPNTKNNNPLWNVPAAAQQYNY  360

Query  98   PALHGPAHNYLGNPAPASAPTGGDKYPAGVNPQSCPNYPFCD-NSAPAGHQQVAPLPRFT  156
            PAL GPA N+LG        +GGDKYPAGVNP +CPNYP+CD N+  AG  + APLP FT
Sbjct  361  PALTGPATNHLG--------SGGDKYPAGVNPHTCPNYPYCDTNAGHAGAVRAAPLPGFT  516

Query  157  ERQYPAGVNPHTCPNFPYCN  176
            ERQYPAGVNPH CPNFPYC+
Sbjct  517  ERQYPAGVNPHQCPNFPYCS  576