The CPCFC cuticular protein family: Anatomical and cuticular locations in Anopheles gambiae and distribution throughout Pancrustacea


Accepted Manuscript

Laura Vannini, John Hunter Bowen, Tyler W. Reed, Judith H. Willis

PII: S0965-1748(15)30021-7
DOI: 10.1016/j.ibmb.2015.07.002
Reference: IB 2733
To appear in: Insect Biochemistry and Molecular Biology
Received Date: 20 May 2015
Revised Date: 2 July 2015
Accepted Date: 3 July 2015

Please cite this article as: Vannini, L., Bowen, J.H., Reed, T.W, Willis, J.H, The CPCFC cuticular protein family: anatomical and cuticular locations in Anopheles gambiae and distribution throughout Pancrustacea, Insect Biochemistry and Molecular Biology (2015), doi: 10.1016/j.ibmb.2015.07.002. This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Laura Vannini¹, John Hunter Bowen¹, Tyler W Reed¹, Judith H Willis¹*

¹Department of Cellular Biology, University of Georgia, Athens, GA, USA

*Correspondence: [email protected]

Abstract

Arthropod cuticles have, in addition to chitin, many structural proteins belonging to diverse families. Information is sparse about how these different cuticular proteins contribute to the cuticle. Most cuticular proteins lack cysteine with the exception of two families (CPAP1 and CPAP3), recently described, and the one other that we now report on that has a motif of 16 amino acids first identified in a protein, Bc-NCP1, from the cuticle of nymphs of the cockroach, Blaberus craniifer (Jensen et al., 1997). This motif turns out to be present as two or three copies in one or two proteins in species from many orders of Hexapoda. We have named the family of cuticular proteins with this motif CPCFC, based on its unique feature of having two cysteines interrupted by five amino acids (C-X(5)-C). Analysis of the single member of the family in Anopheles gambiae (AgamCPCFC1) revealed that its mRNA is most abundant immediately following ecdysis in larvae, pupae and adults. The mRNA is localized primarily in epidermis that secretes hard cuticle, sclerites, setae, head capsules, appendages and spermatheca. EM immunolocalization revealed the presence of the protein, generally in endocuticle of legs and antennae. A phylogenetic analysis found proteins bearing this motif in 14 orders of Hexapoda, but not in some species for which there are complete genomic data. Proteins were much longer in Coleoptera and Diptera than in other orders. In contrast to the 1 and occasionally 2 copies in other species, a dragonfly, Ladona fulva, has at least 14 genes coding for family members. CPCFC proteins were present in four classes of Crustacea with 5 repeats in one species, and motifs that ended C-X(7)-C in Malacostraca. They were not detected, except as obvious contaminants, in any other arthropod subphyla or in any other phylum.

The conservation of CPCFC proteins throughout the Pancrustacea and the small number of copies in individual species indicate that, when present, these proteins are serving important functions worthy of further study.

Keywords: Cuticle, EM immunolocalization, in situ hybridization, arthropod phylogeny, RT-qPCR

1. Introduction

Over a dozen families of cuticular proteins (CPs) have been described. One (CPR) has well over 100 genes in several species (Cornman et al., 2008; Futahashi et al., 2008; Cornman, 2009; Willis, 2010; Willis et al., 2012; Ioannidou et al., 2014; Neafsey et al., 2015). Additional data on temporal and spatial expression (both in terms of tissue distribution and location within the cuticle) have also been published. Early papers are reviewed in Willis et al. (2012); more recent ones are Nor et al. (2014; 2015), Pesch et al. (2015) and Vannini et al. (2014a,b). An unusual family that generally has only one member in a species (and very rarely more than two) was named CPCFC by Willis et al. (2012) because of a motif of C-X(5)-C (two cysteines interrupted by five amino acids). The “type specimen” for CPCFC is Bc-NCP1, isolated from nymphal cuticle of the cockroach, Blaberus craniifer (Jensen et al., 1997) [GenBank: P80674]. The paper describing that sequence established the fundamental property of the family: a 16 amino acid motif, here repeated 3 times, that ends C-X(5)-C. The final motif is at the carboxy-terminus of the protein. In addition, Jensen et al. (1997) speculate, after ruling out a role in cross-linking via quinones: “It is more likely that the three cysteine-containing loops in Bc-NCP1 are involved in some sort of specific interaction or binding, either to metal ions or to other proteins.”

Now we describe, in detail, expression and localization of one member of that family, AgamCPCFC1, in Anopheles gambiae. We conclude with an analysis of the phylogenetic distribution of members of that family in many orders of Pancrustacea (Hexapoda + Crustacea). Our analysis revealed consistent variants of CPCFC proteins in different orders. The widespread distribution of this family represents the second time that a motif identified in a few cuticular protein sequences (five in the case of the R&R Consensus in the CPR family (Rebers and Riddiford, 1988), one sequence here (Jensen et al., 1997)) turns out to have been conserved in CPs found throughout arthropods (reviewed in Willis 2010; Willis et al., 2012).

2. Materials and methods

2.1. Anopheles rearing

An. gambiae (G3 strain) were obtained as newly hatched first instar larvae from the breeding facility at the University of Georgia Entomology Department. They were raised at 27 °C under a 12:12 photoperiod and fed ground Koi Food Staple Diet (Foster and Smith Aquatics, Rhinelander, WI USA).


2.2. RT-qPCR

An. gambiae larvae, pupae and adults were carefully timed relative to a molt, placed in TRIzol® and immediately frozen. RNA was prepared following the manufacturer’s instructions. Superscript III First Strand Synthesis Kit (Invitrogen) with oligo (dT)20 primers was used for cDNA production, and RT-qPCR was carried out with Bio-Rad’s CFX Connect Real Time system. Additional details are in Supplementary File 1 that provides MIQE information in a format recommended by Bustin et al. (2013). Calculations were carried out with LinRegPCR software (Ruijter et al., 2009).

The primers used were located near the end of the coding region and extended into the 3’UTR with an amplification product of 103 nt (Supplementary Files 2,3). Before use, the primers were checked on genomic DNA for amplification kinetics against two single copy genes, RpS7 [GenBank:AGAP010592] and the epidermal chitin synthase [GenBank:AGAP001748], to assure that they were only amplifying a single gene. RpS7 was run on every plate with every cDNA preparation, but was not used to normalize values. Rather, we calculate N0, described as R0 in Togawa et al. (2008), basing values on concentrations of RNA determined with NanoDrop N-1000 (Thermo Scientific). This was necessary because we have failed to find housekeeping genes with consistent expression across the range of developmental stages we studied. Figures showing the variable values obtained with the RpS7 primers and CPCFC1 data normalized to RpS7 are in Supplementary File 4.
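The N0 calculation mentioned in this section can be sketched in a few lines. This is only an illustration, assuming the relation used by LinRegPCR (Ruijter et al., 2009), N0 = Nq / E^Cq, where Nq is the fluorescence threshold, E the mean amplification efficiency (a value between 1 and 2) and Cq the quantification cycle. The function names, the simple division by RNA input and the example numbers are hypothetical and are not taken from Supplementary File 1.

```python
def starting_concentration(threshold: float, efficiency: float, cq: float) -> float:
    """Return N0 = Nq / E**Cq, in arbitrary fluorescence units.

    threshold  -- fluorescence threshold Nq
    efficiency -- mean amplification efficiency E (1 < E <= 2)
    cq         -- quantification cycle at which the threshold is reached
    """
    if not 1.0 < efficiency <= 2.0:
        raise ValueError("efficiency is expected to lie in (1, 2]")
    return threshold / efficiency ** cq


def n0_per_ng_rna(n0: float, rna_ng: float) -> float:
    """Express N0 relative to the RNA input (assumed normalization, see lead-in)."""
    if rna_ng <= 0:
        raise ValueError("RNA input must be positive")
    return n0 / rna_ng


if __name__ == "__main__":
    # Hypothetical well: threshold 1.0, efficiency 1.9, Cq 22.5, 50 ng RNA input.
    n0 = starting_concentration(1.0, 1.9, 22.5)
    print(n0_per_ng_rna(n0, 50.0))
```

Expressing values per unit of input RNA, rather than relative to a reference gene, mirrors the reasoning given above: no housekeeping gene was found to be stable across the developmental stages studied.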

2.3. In situ hybridization

In situ hybridization was carried out on 4 µm paraffin sections of paraformaldehyde fixed An. gambiae of different developmental stages prepared by the Histology Laboratory of the University of Georgia College of Veterinary Medicine. DIG-labeled anti-sense probe preparation and hybridization followed the methods described in earlier publications from our laboratory (Vannini et al., 2014a,b). The primers used and resulting probes are shown in Supplementary Files 2 and 3, respectively. We used one probe directed against the coding region and another against the 3’UTR. Identical patterns of hybridization were found (Supplementary File 5). Probes were also designed based on the sense strands of both antisense probes. They validated the specificity of the technique (Supplementary File 6). Anatomical nomenclature is based on Harbach and Knight (1980).

2.4. Cloning and expression of AgamCPCFC1

The coding sequence for almost all of the mature form of AgamCPCFC1 was cloned into Lucigen Expresso™ T7 Cloning and Expression System with an N-His tag. Primers are given in Supplementary File 2. They cover the entire coding sequence of the mature protein except for the regions coding for the first four and last three amino acids (Supplementary File 3B).

The expressed protein was solubilized in 3 M urea, 10 mM DTT (dithiothreitol), purified with a Talon Imac Metal Affinity Resin packed into a BioRad column, eluted with 1 M imidazole and sent to Harlan Bioproducts for antibody production in rabbits, using their 112 day protocol.

2.5. EM immunocytochemistry

Legs and antennae with Johnston’s organs were dissected from precisely aged pharate and post-eclosion adults and fixed in 4% paraformaldehyde, 0.3% glutaraldehyde + 4% sucrose in phosphate buffer (pH 7.4). Further details about processing and embedding in LR White resin (Electron Microscopy Sciences) and subsequent processing are given in Vannini et al. (2014a,b). Anti-AgamCPCFC1 and secondary antibodies (goat-anti-rabbit, conjugated to 5 nm gold particles, Sigma) were diluted 1:5,000 and 1:50, respectively. We found only an occasional gold particle on sections incubated with hybridization buffer rather than the primary antibody. We used a JEM-1210 transmission electron microscope (JEOL USA) at 120 kV. The images were captured with an XR41C Bottom-Mount CCD Camera (Advanced Microscopy Techniques).

2.6. Phylogenetic analysis via BLAST searches

BLAST searches (tblastn) for CPCFC family members were carried out at http://blast.ncbi.nlm.nih.gov/Blast.cgi using either the first motif from Blaberus craniifer Bc-NCP1 [GenBank:P80674.1] or its entire sequence. We used default settings except for turning off filtering and masking of low complexity regions. We searched EST and TSA databases. We only included in our analyses (with one exception) sequences that had a signal peptide and a stop codon and at least two occurrences of the 16-amino-acid CPCFC motif. We omitted all sequences that came from the 1KITE (1K Insect Transcriptome Evolution) project submitted in January, 2014, because we found a small number of cases with identical sequences in two or more orders. At the time of writing this paper these data were under review and revision, which may resolve the inconsistencies that we observed. We used the phylogenetic nomenclature of von Reumont et al. (2012) and Misof et al. (2014) as well as many of the sequences produced in their analyses.
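The inclusion criteria above can be expressed as a simple filter. The sketch below is an illustration under stated assumptions, not the procedure used to build Table 1: it counts non-overlapping 16-residue windows that end in C-X(5)-C with a regular expression, and it takes the signal-peptide and stop-codon calls as inputs, since in this study those came from SignalP and from the translated hits. The Candidate class, its field names and the toy sequence are hypothetical.

```python
import re
from dataclasses import dataclass

# A 16-residue window whose last seven residues are C-X(5)-C:
# 9 arbitrary residues, Cys, 5 arbitrary residues, Cys.
CPCFC_WINDOW = re.compile(r"[A-Z]{9}C[A-Z]{5}C")


@dataclass
class Candidate:
    name: str
    protein: str              # translated sequence, one-letter code
    has_signal_peptide: bool  # e.g. taken from a SignalP prediction
    has_stop_codon: bool      # True if the reading frame ends in a stop codon


def count_motifs(protein: str) -> int:
    """Count non-overlapping 16-residue windows ending in C-X(5)-C."""
    return len(CPCFC_WINDOW.findall(protein.upper()))


def passes_inclusion(candidate: Candidate, min_motifs: int = 2) -> bool:
    """Apply the three criteria: signal peptide, stop codon, at least two motifs."""
    return (
        candidate.has_signal_peptide
        and candidate.has_stop_codon
        and count_motifs(candidate.protein) >= min_motifs
    )


if __name__ == "__main__":
    # Synthetic sequence with two windows, not a real protein.
    window = "A" * 9 + "C" + "G" * 5 + "C"
    toy = Candidate("toy", window + "SS" + window, True, True)
    print(count_motifs(toy.protein), passes_inclusion(toy))
```

A spacing-only pattern like this will also accept chance C-X(5)-C pairs, so in practice hits would still need to be inspected against the conserved residues of the motif.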

Additional searches were done with wgs (whole-genome shotgun contigs) using Odonata (taxid:6961) as the search term. These could not produce complete sequences unless the region coding for the entire protein was in a single exon, something we have not yet seen for CPCFC genes. Nonetheless, we got provocative results for Ladona fulva.

3. Results and discussion

3.1. Genomic structure

AgamCPCFC1 [GenBank:AGAP007980] is coded by a gene with three exons, the first of which codes for only 5 amino acids (Supplementary File 3A). Such a short first exon is a common feature of CPs in other families (Willis et al., 2010). The sequence is certain to be correct, for there are 4 ESTs with 100% sequence identity and an additional 50 with 99% identity, all covering the entire coding sequence. These ESTs came from the Celera Anopheles gambiae EST project with directional cloning on mixed sex adults, using strain RSP-ST (Reduced susc. to Permethrin).

The ortholog in Drosophila melanogaster has only two exons, and the first also codes for only 5 amino acids (Supplementary File 3D).

3.2. Temporal expression of transcripts

RT-qPCR was used to learn when mRNA from AgamCPCFC1 was present. Highest levels were found immediately after ecdysis to fourth instar larvae, to pupae and to adults. Far lower levels of transcripts were detected in intermolt and pharate periods (Fig. 1).

3.3. Anatomical location of transcripts for AgamCPCFC1

We carried out in situ hybridization to learn where the mRNA for AgamCPCFC1 was localized. We used two different antisense probes, one designed in the coding region, the other in the 3’UTR (Supplementary File 3A). In successive sections, hybridization patterns were identical with the two probes (Supplementary File 5). We selected animals at developmental stages where our RT-qPCR data indicated that mRNA was likely to be present, namely pharate and newly eclosed animals. Sense controls for both probes showed no specific hybridization (Supplementary File 6).

Transcripts were found in epidermis of larvae, pupae and adults underlying cuticle destined to be highly sclerotized, i.e. hard cuticle. Thus in sections of larvae (Fig. 2), probe was found in the head capsule (Fig. 2B), in cells that secrete lateral setae (arrows in Fig. 2A-C) and in the cells that form the grid and brush at the posterior end (Fig. 2D). Our slides of larvae had animals at different developmental ages, thus it was not unexpected that we found many sections without labeled cells in the head capsule.

In sections of pupae that were less than one hour after pupation (Fig. 3), label was present in cells that form bristles on the pupal abdomen (Fig. 3B); it was also present in the developing antennae (Fig. 3C) and adult scales that surprisingly are already forming (Fig. 3D). Label was found in epidermis underlying abdominal sclerites but not intersegmental membranes (Fig. 3A) with the exception of places where muscle is inserting into the intersegmental membrane (Mus in Fig. 3A).


In pharate adults (Fig. 4), hybridization of the probe was found in sclerites (Fig. 4A), in muscle attachment zones (Fig. 4B), and in epidermis of Johnston’s organ (JO) both beneath the basal plate and under the pedicel that surrounds the organ (Fig. 4D). It was also present in the epidermis of the flagellum (Fig. 4D), spermatheca (Fig. 4C) and the cervical sclerite (Fig. 4E). Just as in the pupa, CPCFC1 transcript was not found in intersegmental membranes (Fig. 4A).

In recently eclosed adults (Fig. 5), CPCFC1 transcript was once again detected in JO and the flagellum of the antennae (Fig. 5A), the male cerci (Fig. 5B), and other appendages (Fig. 5C,D).

3.4. Localization of AgamCPCFC1 protein within the cuticle

We used EM immunolocalization in order to learn where CPCFC1 was within the cuticle. EM sections were treated with a polyclonal antibody (Ab) that had been raised against most of the mature form of CPCFC1 (Supplementary File 3B). The specificity of the antibody is shown in a Western blot of proteins isolated from adult legs (Supplementary File 3C). Ab binding to EM sections was visualized with a colloidal-gold-labeled secondary antibody against rabbit IgG. We examined structures where the transcript, as visualized with in situ hybridization, was abundant: legs and the antenna. We use the term exocuticle for cuticle formed prior to ecdysis, with endocuticle secretion beginning after ecdysis. In adult legs fixed within a day of eclosion or on Day 8 of the adult stage, the presence of AgamCPCFC1 was strong, exclusively in the endocuticle of both the leg and its apodemes (Fig. 6A-C). In most regions of the legs of pharate adults (P24), when, by definition, no endocuticle is present, no trace of AgamCPCFC1 was found (Fig. 6D). But in other regions of the pharate adult leg, we did find evidence for AgamCPCFC1 in exocuticle, both in regions with well-formed lamellae and in not yet organized regions next to the epidermal cells. This was most noticeable at the base of the leg and near a joint (Fig. 7A). We also saw label in the pedicel of pharate adults (Fig. 7B) and flagellum of newly emerged adults (Fig. 7C), once again, where endocuticle should not yet be present (Fig. 7B). Absence of an antigen in the cuticle might just mean that it has been masked during the sclerotization process. Hence it would be premature to conclude that except for an occasional region, AgamCPCFC1 is confined to the endocuticle. The higher levels of transcript right after a molt rather than immediately before (Fig. 1), however, are consistent with the endocuticle being the primary destination of the protein.

3.5. Phylogenetic distribution of CPCFC genes in Hexapoda

RNAseq technology has provided a plethora of sequences from diverse arthropods, available as TSA (Transcriptome Shotgun Assembly) that greatly expanded the number of sequences available from ESTs or genomic data. These new data provided a rich source of CPCFCs including some from minor orders. Searches were carried out with blastp and tblastn (see Methods) and we found 72 complete sequences distributed across the Hexapoda (Table 1; Supplementary File 7). We required that a sequence be complete with a signal peptide and a stop codon in order to be included in the analysis, a stringent criterion especially for sequences obtained with Pyrosequencing (454), where we found occasional frame shifts recognized because parts of the protein resided in two different reading frames. No attempt was made to reconcile these. Further details on search strategies are described in Section 2.6.


The complete sequences identified were sufficient to gain insight about the CPCFC family. With but two exceptions, the original Blaberus protein (Bc-NCP1) and AgamCPCFC1, the proteins discussed are only putative cuticular proteins. Bc-NCP1 was isolated from clean nymphal cuticle, and we presented immunological evidence for the presence of AgamCPCFC1 in the cuticle. All of the sequences we report have signal peptides, establishing that they are secreted. One incomplete sequence from Pediculus humanus is presented (in different or red type) in Table 1 and Supplementary File 7, but data from it were not used in the numerical analyses.

The diagnostic feature of this family is the presence of a 16 amino acid motif, first identified by Jensen et al. (1997). WebLogos (Crooks et al., 2004) based on motifs from holo- and non-holometabolous hexapods and diverse Crustacea are given in Fig. 8. They show that in addition to the two cysteines that provided the name for this family, there are three prolines, in positions 2, 11, 14, that are universal across the Hexapoda. Several other residues are highly conserved, making this an easily recognized and highly conserved motif.
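The conserved positions named in this paragraph translate into a stricter search pattern. As a sketch only, assuming that the two cysteines occupy positions 10 and 16 of the 16-residue motif (which follows from a motif that ends C-X(5)-C) and that prolines sit at positions 2, 11 and 14, a candidate motif can be located as shown below. The pattern and the helper name are illustrative and were not derived from the WebLogos in Fig. 8; the test strings are synthetic, not real sequences.

```python
import re

# Positions: 1 any, 2 Pro, 3-9 any, 10 Cys, 11 Pro, 12-13 any, 14 Pro, 15 any, 16 Cys.
HEXAPOD_MOTIF = re.compile(r"[A-Z]P[A-Z]{7}CP[A-Z]{2}P[A-Z]C")


def find_strict_motifs(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched 16-mer) for each non-overlapping match."""
    return [
        (match.start() + 1, match.group())
        for match in HEXAPOD_MOTIF.finditer(protein.upper())
    ]


if __name__ == "__main__":
    synthetic = "APAAAAAAACPAAPAC"  # satisfies every constrained position
    print(find_strict_motifs("GG" + synthetic))
```

Because only the cysteines and prolines are constrained, this pattern is deliberately permissive at the other positions; it would not by itself separate the orders discussed below.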

Additional consistent features are evident, but we acknowledge that these conclusions are preliminary and may well be revised as more sequences become available. The most common protein structure of the CPCFC family had three copies of the motif, but sequences from three orders, Collembola, Coleoptera and Lepidoptera, had only two. One of the two sequences from the Odonata also had only two motifs (Table 1). Most species have only a single copy of the gene. The presence of two genes in the coleopteran Tribolium castaneum led to the speculation that where only two motifs were present, there would be two genes. Yet we have identified only 2/10 species of Coleoptera and 2/14 species of Lepidoptera with two copies of CPCFC. There was one dipteran and one odonate with two CPCFC genes (Table 1, Supplementary File 7). An intriguing exception in another odonate, Ladona fulva, is discussed below.

The most surprising phylogenetic finding was that the family was almost completely absent from Hymenoptera with only one complete sequence identified from Cephus cinctus, a sawfly. This is despite the abundance of sequence information for this order, with data from many species and complete genomes for three species of Nasonia and Apis dorsata and Apis mellifera, the latter with a recently updated proteome (Elsik et al. 2014).

SignalP (Petersen et al., 2011) was used to predict the signal peptides shown in Supplementary Files 7 and 9. The first amino acid in Bc-NCP1 is glutamine (Q), which was present as a pyroglutamate residue (Jensen et al., 1997). An initial Q was present, after the signal peptide was removed, in many of the sequences. In addition, we noticed that many of the retrieved sequences had a Q close to the end of the signal peptide. In most cases, the SignalP result showed that this could follow an alternative splice site. The signal for these sequences was modified (bold in Supplementary File 7) to move the Q into the mature protein resulting in 6/12 non-holometabola sequences beginning in this manner, providing further evidence for the conservation of the entire protein sequence. In the Holometabola, Q was less common. Instead, in the Lepidoptera, arginine (R) was the first amino acid in 13/16 sequences, and in the Diptera it was lysine (K) in 22/28. Except for the Coleoptera, there are fewer than 10 amino acids from the start of the mature protein to the start of the first motif. Generally there are zero or one amino acids after the final cysteine at the carboxy-terminus, but occasionally more (Table 1).

Another generalization is that the mature protein, with one exception, does not exceed 130 amino acids except in the Coleoptera and Diptera that have all family members over that length. The lepidopteran sequences are more comparable in length to members of the non-holometabolous orders (Table 1). There also appear to be amino acids immediately adjacent to the 16-amino-acid motifs that differ between the different motifs within a sequence and among different orders. For example, almost all of the lepidopteran sequences have arginine-glutamic acid (RE) immediately upstream of the first motif, while this was not seen in any of the dipteran sequences, all with longer stretches before the first motif and alanine-glutamine (AQ) most frequently immediately upstream from the first motif (Supplementary File 7). Whether these differences represent something functional or result from a chance event in evolution remains to be learned.

While we have focused our discussion on the number and placement of the CPCFC 16-amino-acid motif within the protein, it is apparent that the rest of the protein must be conferring important functional properties. This is clearest in the three major Holometabola orders, Coleoptera, Lepidoptera and Diptera. Extensions of the amino-terminus and the regions between motifs are populated by the polar amide amino acids, glutamine (Q) or asparagine (N), with fairly evenly spaced aromatic residues tyrosine (Y), tryptophan (W), or phenylalanine (F) (Supplementary File 7).


In addition to the presence of only two copies of the CPCFC motif in Coleoptera and Lepidoptera, there are other features of the long sequences from these groups and from the Diptera that enable one to assign a sequence to the correct order.

The generalizations presented here are certain to change as data on more species become available. For example, a tblastn search for whole genome sequences (WGS) in just the Odonata revealed evidence for 14 distinct CPCFC genes in Ladona fulva. None were complete, for the start of the signal peptides was missing, something not unexpected since the first exon is generally very short and would not have been continuous with the presumed second exon, which in these genes had the rest of the coding region. All ended with stop codons. These 14 genes were distributed across 10 contigs. Ten sequences had three motifs, and 4 had two (Supplementary File 8). Three with two motifs were unusual because the final motif was not near the C-terminus, but from 63-84 amino acids away. Possibly as whole genome sequences become available for other species, more examples will be found with more than two CPCFC genes. Another generalization that is upset by Ladona CPCFCs is that the length of the proteins from the first motif to the end exceeds 131 amino acids in 7 of the sequences, excluding the two with unusual carboxy-termini. Hence, unless an intron interrupts what we have interpreted as a continuous second exon, the Coleoptera and Diptera will not be the only orders with long proteins. The one exception noted above to a non-Holometabola sequence with greater than 140 amino acids interestingly is one of the two sequences from another odonate, Enallagma hageni (Table 1).

3.6. Phylogenetic distribution of CPCFC genes in Crustacea

While the available data are far more limited in the Crustacea, we found representatives of CPCFC in four of the six classes: Ostracoda, Malacostraca, Maxillopoda, and Remipedia (Table 2, Supplementary File 9). Variation among groups was informative. A large number of hits that were not examined further were to sequences that had only one of the motifs. The barnacle (Amphibalanus amphitrite) had five motifs, and that was the only sequence in Crustacea that was longer than 100 amino acids. Remipedia, the group reported by von Reumont et al. (2012) to be most closely related to the hexapods, had two sequences from one species, Speleonectes, one with two motifs, one with three. The more basal group (Ostracoda) had two sequences, both with two motifs. Most intriguing were the 6 members of this family in Malacostraca. All had a variant on the basic motif, namely C-X(7)-C, present twice in each sequence. This variant was not found in any other group of arthropods. Since Jensen et al. (1997) suggested that the motif functions to bind metals, it would be interesting to learn if some unusual metal is used by members of this order.

The conservation of CPCFC proteins across the arthropods and the somewhat consistent differences among members of different orders suggest that these proteins must be playing a significant role in the cuticle. Their absence in some Hymenoptera indicates that whatever that role is, it is not irreplaceable.

3.7. Is CPCFC1 found outside Arthropoda?

We wondered if the CPCFC motif so highly conserved in Crustacea and Hexapoda could be found in other groups. It was, and while details are in Supplementary File 10, a summary is given below:


BLAST searches (tblastn, against EST or TSA entries, excluding Arthropoda) turned up five hits. One hit was to a sequence from a Homo sapiens brain cDNA library [GenBank:HY131203.1]. The sequence is not present in the database of Homo sapiens proteins, not surprisingly, because it has a 100% match to a protein from the cockroach, Blattella germanica [GenBank:GBID01001268.1].

We also got hits to two plants, Karelinia caspia (Asteraceae, a daisy, [GenBank:GANI01023091.1]) and Humulus lupulus (common hop, [GenBank:GAAW01027316.1]). TSA entries from another animal, Hynobius chinensis (Chinese salamander, [GenBank:GAQK01079415.1]), also had a CPCFC sequence.

We found a perfect match for the daisy; indeed, the daisy sequence completed an abbreviated sequence for the silverleaf (sweet potato, tobacco) whitefly Bemisia tabaci. The hop was clearly contaminated by a fruit fly, probably in the genus Bactrocera, and the salamander sequence was very close to a chironomid.

A final case of contamination was in Daphnia pulex, the only sequence identified for the crustacean class Branchiopoda. Searches of ESTs for CPCFC in Crustacea result in top hits to Daphnia pulex, but exclusively to library 12, the one where the Daphnia had been exposed to Chaoborus americanus in order to monitor the transcriptional response to this predatory midge (Table S10 in Colbourne et al., 2011). Thus it is not surprising that when the complete Daphnia sequence [GenBank:FE342003.1] is itself used in a BLAST search against ESTs, instead of linking to other Crustacea, the top match is to a different midge, Corethrella appendiculata [GenBank:GANO01004087.1], followed by various mosquitoes.


4. Conclusions

A new family of cuticular proteins, CPCFC, has members widely dispersed among the Pancrustacea. Members are generally present in 1-2 copies per species, with a protein having two to three copies of the 16 amino acid CPCFC motif that ends C-X(5)-C. A notable exception was seen in the dragonfly, Ladona fulva, where 14 genes, each with 2 or 3 CPCFC motifs, were found.

Experimental work with the An. gambiae family member, AgamCPCFC1, revealed that the mRNA is most abundant immediately following a molt; transcripts are found predominantly in epidermis secreting hard cuticle, and the protein has been localized mainly in endocuticle. Available information on phylogenetic distribution and protein characteristics revealed that CPCFC is distributed throughout the Hexapoda and in several classes of Crustacea. Amino acid sequences in two Holometabola orders, Coleoptera and Diptera, were longer than in the other orders. All sequences found in the Malacostraca had a motif that ended C-X(7)-C, rather than C-X(5)-C.

366

Figure legends

Fig. 1. RT-qPCR analysis of AgamCPCFC1 transcripts in Anopheles gambiae. L48 and P24 are actually pharates of the next stage. See Text and Supplementary File 1 for methods.

Fig. 2. In situ hybridization of AgamCPCFC1 on sections of 4th instar larvae. A. Photograph of larva with arrows showing location of lateral setae on thorax and abdomen and a double arrow indicating the grid and fringe at the posterior end. B. Head capsule and bit of prothoracic segment. Note the presence of hybridization in the small cells that form setae at the anterior edge of the prothorax. C. Section of the abdomen showing cells that are forming setae. D. Grid and accompanying fringe at posterior end of a larva. E. Section showing cells secreting large and small setae. (B,D 3’ probe; C,E coding region probe).

Fig. 3. In situ hybridization of AgamCPCFC1 on sections of pupae less than 1 hour after pupation. A. Section of abdomen showing epidermal hybridization in sclerites (Scl) and, within the intersegmental membrane (IsM), only where muscles (Mus) are inserting into the cuticle. B. Lateral surface of pupal abdomen with setae-forming cells. C. Developing antenna in pupa. Structure was recognized because it is similar to that shown in Fig. 76a of Harbach and Knight (1980). D. Limb with developing scales showing hybridization. E. Muscle insertion zone with strong hybridization. (B,D 3’ probe; A,C,E coding region probe.)

Fig. 4. In situ hybridization of AgamCPCFC1 on sections of pharate adults. Animals were fixed 24 hours after pupation, which is a few hours before ecdysis to the adult. A. Hybridization to epidermis of sclerites (Scl), but not intersegmental membranes (IsM). B. Hybridization in muscle attachment region. C. Hybridization in spermatheca (Sp). D. Hybridization under basal plate (BP) of Johnston’s organ, the surrounding pedicel (Ped) and the flagellum (Fl). E. Hybridization to part of cervical sclerite. (D,E 3’ probe; A,B,C coding region probe.)

Fig. 5. In situ hybridization of AgamCPCFC1 on adults less than 12 hours after eclosion. A. Antenna with Johnston’s organ (JO) and flagellum (Fl) showing strong hybridization. B. Cerci at the terminal end of the male abdomen. C and D. Hybridization in appendages. (All coding region probe.)

Fig. 6. EM immunolocalization of AgamCPCFC1 on legs from adults of various ages. In these sections label is restricted to endocuticle. A. Leg from adult one day after eclosion. B. Apodeme from same animal. Exocuticle is interior in the apodemes. C. Section of leg from animal 8 days after eclosion. D. Pharate adult with only exocuticle and no labeling visible. ex, exocuticle; en, endocuticle; ep, epidermis. Scale bars are 500 nm.

Fig. 7. EM immunolocalization of AgamCPCFC1 in both exo- and endo-cuticle. A. Leg of a pharate adult (P24) showing areas of lamellar exocuticle with labeling near a joint. Inset: lower power view of relevant region. B. Labeling in exocuticle of P24 pedicel. C. Both exo- and endo-cuticle labeled in flagellum of adult <12 h after eclosion. Abbreviations as in Fig. 6. Scale bars are 500 nm.

Fig. 8. WebLogos constructed for CPCFC motifs highlighted in Supplementary Files 7 and 9.

Acknowledgements

We thank Drs. Reben Rhaman and Sheng-Cheng Wu for producing the AgamCPCFC1 protein used for antibody generation. We also thank Dr. Mark R. Brown and Anne Robertson for maintaining the mosquito facility from which the animals were obtained, MR Brown for help interpreting mosquito structures, and Dr. Michael Strand for access to his Leica photomicroscope and Jena Johnson for training in its use. Dr. Neal Dittmer alerted us to the presence of two CPCFC genes in Tribolium; Dr. Hugh Robertson found the Cephus sequence; Dr. Michael Pfrender supplied information about Daphnia and Drs. Bernhard Misof and Karen Meusemann provided guidance about the 1KITE sequences. We thank Mary B. Ard of the Electron Microscopy Laboratory at the University of Georgia College of Veterinary Medicine for technical support. Drs. Yihong Zhou and John S. Willis and three anonymous reviewers provided helpful comments on the MS. This research was funded by a grant from the U.S. National Institutes of Health R01AI055624.

Competing interests

The authors declare that they have no competing interests.

Appendix A. Supplementary data

Supplementary File 1. Conditions used for RT-qPCR following MIQE guidelines (Bustin et al. 2013).

Supplementary File 2. Primers used for RT-qPCR, in situ probe construction, and protein expression.

Supplementary File 3. Genomic regions (A) and protein sequence (B) for AgamCPCFC1 and Western blot (C) for antibody used for EM immunolocalization. D. CPCFC ortholog in Drosophila melanogaster.

Supplementary File 4. Illustration of why RT-qPCR data were not normalized to RpS7.

Supplementary File 5. In situ hybridization showing comparable hybridization with AgamCPCFC1 probes from the protein coding region and the 3’UTR. A. Label in muscle insertion zones of pharate adult (P24) comparable to Figure 4B. B, C. Hybridization to epidermis in head capsules and cells forming tiny setae in prothorax from adjacent sections of larvae 97-120 hours after feeding began. D. Hybridization to grid and fringe in section adjacent to Figure 2D. (A,B 3’ probe; C,D coding region probe).

Supplementary File 6. In situ hybridization of adjacent sections, processed at the same time, using antisense and sense probes. A, B. Treatment of sections of larvae with antisense and sense probes in the protein coding region. Background is high due to low hybridization temperature (55 °C) relative to high melting point of probe (89 °C) as calculated with basic setting of OligoCalc (http://www.basic.northwestern.edu/biotools/OligoCalc.html). Acellular head capsule of previous instar has some stained cuticle, a common occurrence with RNA probes. C-F. Treatment of sections of animals fixed within 30 min of pupation. Probes against 3’UTR have calculated melting temperature 77.6 °C.

Supplementary File 7. Sequences of CPCFC proteins in Hexapoda.

Supplementary File 8. Sequences of CPCFC proteins in Ladona fulva.

Supplementary File 9. Sequences of CPCFC proteins in Crustacea.

Supplementary File 10. Non-arthropod TSA hits and their most closely related arthropod match.

References:

Bustin, S.A., Benes, V., Garson, J., Hellemans, J., Huggett, J., Kubista, M., Mueller, R., Nolan, T., Pfaffl, M.W., Shipley, G., Wittwer, C.T., Schjerling, P., Day, P.J., Abreu, M., Aguado, B., Beaulieu, J.F., Beckers, A., Bogaert, S., Browne, J.A., Carrasco-Ramiro, F., Ceelen, L., Ciborowski, K., Cornillie, P., Coulon, S., Cuypers, A., De Brouwer, S., De Ceuninck, L., De Craene, J., De Naeyer, H., De Spiegelaere, W., Deckers, K., Dheedene, A., Durinck, K., Ferreira-Teixeira, M., Fieuw, A., Gallup, J.M., Gonzalo-Flores, S., Goossens, K., Heindryckx, F., Herring, E., Hoenicka, H., Icardi, L., Jaggi, R., Javad, F., Karampelias, M., Kibenge, F., Kibenge, M., Kumps, C., Lambertz, I., Lammens, T., Markey, A., Messiaen, P., Mets, E., Morais, S., Mudarra-Rubio, A., Nakiwala, J., Nelis, H., Olsvik, P.A., Perez-Novo, C., Plusquin, M., Remans, T., Rihani, A., Rodrigues-Santos, P., Rondou, P., Sanders, R., Schmidt-Bleek, K., Skovgaard, K., Smeets, K., Tabera, L., Toegel, S., Van Acker, T., Van den Broeck, W., Van der Meulen, J., Van Gele, M., Van Peer, G., Van Poucke, M., Van Roy, N., Vergult, S., Wauman, J., Tshuikina-Wiklander, M., Willems, E., Zaccara, S., Zeka, F., Vandesompele, J., 2013. The need for transparency and good practices in the qPCR literature. Nat. Methods 10, 1063-1067.

Colbourne, J.K., Pfrender, M.E., Gilbert, D., Thomas, W.K., Tucker, A., Oakley, T.H., Tokishita, S., Aerts, A., Arnold, G.J., Basu, M.K., Bauer, D.J., Caceres, C.E., Carmel, L., Casola, C., Choi, J.H., Detter, J.C., Dong, Q., Dusheyko, S., Eads, B.D., Frohlich, T., Geiler-Samerotte, K.A., Gerlach, D., Hatcher, P., Jogdeo, S., Krijgsveld, J., Kriventseva, E.V., Kultz, D., Laforsch, C., Lindquist, E., Lopez, J., Manak, J.R., Muller, J., Pangilinan, J., Patwardhan, R.P., Pitluck, S., Pritham, E.J., Rechtsteiner, A., Rho, M., Rogozin, I.B., Sakarya, O., Salamov, A., Schaack, S., Shapiro, H., Shiga, Y., Skalitzky, C., Smith, Z., Souvorov, A., Sung, W., Tang, Z., Tsuchiya, D., Tu, H., Vos, H., Wang, M., Wolf, Y.I., Yamagata, H., Yamada, T., Ye, Y., Shaw, J.R., Andrews, J., Crease, T.J., Tang, H., Lucas, S.M., Robertson, H.M., Bork, P., Koonin, E.V., Zdobnov, E.M., Grigoriev, I.V., Lynch, M., Boore, J.L., 2011. The ecoresponsive genome of Daphnia pulex. Science 331, 555-561.

Cornman, R.S., Togawa, T., Dunn, W.A., He, N., Emmons, A.C., Willis, J.H., 2008. Annotation and analysis of a large cuticular protein family with the R&R Consensus in Anopheles gambiae. BMC Genomics 9, 22.

Cornman, R.S., 2009. Molecular evolution of Drosophila cuticular protein genes. PLoS ONE 4, e8345.

Crooks, G.E., Hon, G., Chandonia, J.M., Brenner, S.E., 2004. WebLogo: a sequence logo generator. Genome Res. 14, 1188-1190.

Elsik, C.G., Worley, K.C., Bennett, A.K., Beye, M., Camara, F., Childers, C.P., de Graaf, D.C., Debyser, G., Deng, J., Devreese, B., Elhaik, E., Evans, J.D., Foster, L.J., Graur, D., Guigo, R., Hoff, K.J., Holder, M.E., Hudson, M.E., Hunt, G.J., Jiang, H., Joshi, V., Khetani, R.S., Kosarev, P., Kovar, C.L., Ma, J., Maleszka, R., Moritz, R.F., Munoz-Torres, M.C., Murphy, T.D., Muzny, D.M., Newsham, I.F., Reese, J.T., Robertson, H.M., Robinson, G.E., Rueppell, O., Solovyev, V., Stanke, M., Stolle, E., Tsuruda, J.M., Vaerenbergh, M.V., Waterhouse, R.M., Weaver, D.B., Whitfield, C.W., Wu, Y., Zdobnov, E.M., Zhang, L., Zhu, D., Gibbs, R.A., 2014. Finding the missing honey bee genes: lessons learned from a genome upgrade. BMC Genomics 15, 86.

Futahashi, R., Okamoto, S., Kawasaki, H., Zhong, Y.S., Iwanaga, M., Mita, K., Fujiwara, H., 2008. Genome-wide identification of cuticular protein genes in the silkworm, Bombyx mori. Insect Biochem. Mol. Biol. 38, 1138-1146.

Harbach, R.E., Knight, K.L., 1980. Taxonomist's glossary of mosquito anatomy, first ed. Plexus Publishing, Inc., Marlton, New Jersey.

Ioannidou, Z.S., Theodoropoulou, M.C., Papandreou, N.C., Willis, J.H., Hamodrakas, S.J., 2014. CutProtFam-Pred: detection and classification of putative structural cuticular proteins from sequence alone, based on profile hidden Markov models. Insect Biochem. Mol. Biol. 52, 51-59.

Jensen, U.G., Rothmann, A., Skou, L., Andersen, S.O., Roepstorff, P., Hojrup, P., 1997. Cuticular proteins from the giant cockroach, Blaberus craniifer. Insect Biochem. Mol. Biol. 27, 109-120.

Misof, B., Liu, S., Meusemann, K., Peters, R.S., Donath, A., Mayer, C., Frandsen, P.B., Ware, J., Flouri, T., Beutel, R.G., Niehuis, O., Petersen, M., Izquierdo-Carrasco, F., Wappler, T., Rust, J., Aberer, A.J., Aspöck, U., Aspöck, H., Bartel, D., Blanke, A., Berger, S., Böhm, A., Buckley, T.R., Calcott, B., Chen, J., Friedrich, F., Fukui, M., Fujita, M., Greve, C., Grobe, P., Gu, S., Huang, Y., Jermiin, L.S., Kawahara, A.Y., Krogmann, L., Kubiak, M., Lanfear, R., Letsch, H., Li, Y., Li, Z., Li, J., Lu, H., Machida, R., Mashimo, Y., Kapli, P., McKenna, D.D., Meng, G., Nakagaki, Y., Navarrete-Heredia, J.L., Ott, M., Ou, Y., Pass, G., Podsiadlowski, L., Pohl, H., von Reumont, B.M., Schütte, K., Sekiya, K., Shimizu, S., Slipinski, A., Stamatakis, A., Song, W., Su, X., Szucsich, N.U., Tan, M., Tan, X., Tang, M., Tang, J., Timelthaler, G., Tomizuka, S., Trautwein, M., Tong, X., Uchifune, T., Walzl, M.G., Wiegmann, B.M., Wilbrandt, J., Wipfler, B., Wong, T.K., Wu, Q., Wu, G., Xie, Y., Yang, S., Yang, Q., Yeates, D.K., Yoshizawa, K., Zhang, Q., Zhang, R., Zhang, W., Zhang, Y., Zhao, J., Zhou, C., Zhou, L., Ziesmann, T., Zou, S., Li, Y., Xu, X., Zhang, Y., Yang, H., Wang, J., Wang, J., Kjer, K.M., Zhou, X., 2014. Phylogenomics resolves the timing and pattern of insect evolution. Science 346, 763-767.

Neafsey, D.E., Waterhouse, R.M., Abai, M.R., Aganezov, S.S., Alekseyev, M.A., Allen, J.E., Amon, J., Arca, B., Arensburger, P., Artemov, G., Assour, L.A., Basseri, H., Berlin, A., Birren, B.W., Blandin, S.A., Brockman, A.I., Burkot, T.R., Burt, A., Chan, C.S., Chauve, C., Chiu, J.C., Christensen, M., Costantini, C., Davidson, V.L., Deligianni, E., Dottorini, T., Dritsou, V., Gabriel, S.B., Guelbeogo, W.M., Hall, A.B., Han, M.V., Hlaing, T., Hughes, D.S., Jenkins, A.M., Jiang, X., Jungreis, I., Kakani, E.G., Kamali, M., Kemppainen, P., Kennedy, R.C., Kirmitzoglou, I.K., Koekemoer, L.L., Laban, N., Langridge, N., Lawniczak, M.K., Lirakis, M., Lobo, N.F., Lowy, E., MacCallum, R.M., Mao, C., Maslen, G., Mbogo, C., McCarthy, J., Michel, K., Mitchell, S.N., Moore, W., Murphy, K.A., Naumenko, A.N., Nolan, T., Novoa, E.M., O'Loughlin, S., Oringanje, C., Oshaghi, M.A., Pakpour, N., Papathanos, P.A., Peery, A.N., Povelones, M., Prakash, A., Price, D.P., Rajaraman, A., Reimer, L.J., Rinker, D.C., Rokas, A., Russell, T.L., Sagnon, N., Sharakhova, M.V., Shea, T., Simao, F.A., Simard, F., Slotman, M.A., Somboon, P., Stegniy, V., Struchiner, C.J., Thomas, G.W., Tojo, M., Topalis, P., Tubio, J.M., Unger, M.F., Vontas, J., Walton, C., Wilding, C.S., Willis, J.H., Wu, Y.C., Yan, G., Zdobnov, E.M., Zhou, X., Catteruccia, F., Christophides, G.K., Collins, F.H., Cornman, R.S., Crisanti, A., Donnelly, M.J., Emrich, S.J., Fontaine, M.C., Gelbart, W., Hahn, M.W., Hansen, I.A., Howell, P.I., Kafatos, F.C., Kellis, M., Lawson, D., Louis, C., Luckhart, S., Muskavitch, M.A., Ribeiro, J.M., Riehle, M.A., Sharakhov, I.V., Tu, Z., Zwiebel, L.J., Besansky, N.J., 2015. Mosquito genomics. Highly evolvable malaria vectors: the genomes of 16 Anopheles mosquitoes. Science 347, 1258522.

Noh, M.Y., Kramer, K.J., Muthukrishnan, S., Kanost, M.R., Beeman, R.W., Arakane, Y., 2014. Two major cuticular proteins are required for assembly of horizontal laminae and vertical pore canals in rigid cuticle of Tribolium castaneum. Insect Biochem. Mol. Biol. 53C, 22-29.

Noh, M.Y., Muthukrishnan, S., Kramer, K.J., Arakane, Y., 2015. Tribolium castaneum RR-1 cuticular protein TcCPR4 is required for formation of pore canals in rigid cuticle. PLoS Genet. 11, e1004963.

Pesch, Y.Y., Riedel, D., Behr, M., 2015. Obstructor-A organizes matrix assembly at the apical cell surface to promote enzymatic cuticle maturation in Drosophila. J. Biol. Chem. 290, 10071-10082.

Petersen, T.N., Brunak, S., von Heijne, G., Nielsen, H., 2011. SignalP 4.0: discriminating signal peptides from transmembrane regions. Nat. Methods 8, 785-786.

Rebers, J.E., Riddiford, L.M., 1988. Structure and expression of a Manduca sexta larval cuticle gene homologous to Drosophila cuticle genes. J. Mol. Biol. 203, 411-423.

Ruijter, J.M., Ramakers, C., Hoogaars, W.M., Karlen, Y., Bakker, O., van den Hoff, M.J., Moorman, A.F., 2009. Amplification efficiency: linking baseline and bias in the analysis of quantitative PCR data. Nucleic Acids Res. 37, e45.

Togawa, T., Dunn, W.A., Emmons, A.C., Nagao, J., Willis, J.H., 2008. Developmental expression patterns of cuticular protein genes with the R&R Consensus from Anopheles gambiae. Insect Biochem. Mol. Biol. 38, 508-519.

Vannini, L., Augustine Dunn, W., Reed, T.W., Willis, J.H., 2014a. Changes in transcript abundance for cuticular proteins and other genes three hours after a blood meal in Anopheles gambiae. Insect Biochem. Mol. Biol. 44, 33-43.

Vannini, L., Reed, T.W., Willis, J.H., 2014b. Temporal and spatial expression of cuticular proteins of Anopheles gambiae implicated in insecticide resistance or differentiation of M/S incipient species. Parasit. Vectors 7, 24.

von Reumont, B.M., Jenner, R.A., Wills, M.A., Dell'ampio, E., Pass, G., Ebersberger, I., Meyer, B., Koenemann, S., Iliffe, T.M., Stamatakis, A., Niehuis, O., Meusemann, K., Misof, B., 2012. Pancrustacean phylogeny in the light of new phylogenomic data: support for Remipedia as the possible sister group of Hexapoda. Mol. Biol. Evol. 29, 1031-1045.

Willis, J.H., 2010. Structural cuticular proteins from arthropods: annotation, nomenclature, and sequence characteristics in the genomics era. Insect Biochem. Mol. Biol. 40, 189-204.

Willis, J.H., Papandreou, N.C., Iconomidou, V.A., Hamodrakas, S.J., 2012. Cuticular Proteins, in: Gilbert L.I. (Ed.), Insect Molecular Biology and Biochemistry. Academic Press, San Diego, pp. 134-166.

TABLE 1 CHARACTERISTICS OF CPCFC FAMILY MEMBERS IN HEXAPODA (signals removed)

Order/Species | total mature length | amino acids to start of motif 1 | amino acids between motif 1-2 | amino acids between motif 2-3 | amino acids final C to end
Collembola
Orchesella cincta | 53 | 3 | 18 | | 0
Onychiurus arcticus | 55 | 4 | 19 | | 0
Archaeognatha
Lepismachilis y-signata | 121 | 4 | 26 | 42 | 1
Odonata
Enallagma hageni | 96 | 5 | 29 | 14 | 0
Enallagma hageni | 145 | 6 | 64 | | 43
Orthoptera
Teleogryllus commodus | 90 | 6 | 20 | 16 | 0
Gryllotalpa sp. | 92 | 6 | 22 | 16 | 0
Blattodea
Blaberus craniifer | 87 | 4 | 22 | 12 | 1
Blattella germanica | 87 | 4 | 22 | 12 | 1
Phthiraptera
Pediculus humanus corporis | 154+ | 6 | 82 | 21 | end missing
Hemiptera
Macrosiphum euphorbiae | 128 | 4 | 60 | 15 | 1
Acyrthosiphon pisum | 128 | 4 | 60 | 15 | 1
Kerria lacca | 119 | 5 | 23 | 42 | 1
HOLOMETABOLA
Hymenoptera
Cephus cinctus | 151 | 6 | 54 | 43 | 0
Megaloptera
Corydalinae sp. | 94 | 2 | 22 | 21 | 1
Neuroptera
Chrysopa pallens | 94 | 4 | 24 | 18 | 0
Coleoptera
Tribolium castaneum | 158 | 14 | 111 | | 1
Tribolium castaneum | 184 | 14 | 137 | | 1
Dendroctonus frontalis | 180 | 14 | 129 | | 4
Dendroctonus ponderosae | 178 | 9 | 129 | | 3
Pissodes strobi | 195 | 17 | 144 | | 2
Pissodes strobi | 302 | 47 | 222 | | 1
Rhynchophorus ferrugineus | 188 | 18 | 137 | | 1
Anthonomus grandis | 162 | 14 | 114 | | 2
Agrilus planipennis | 165 | 14 | 115 | | 1
Onthophagus taurus | 206 | 14 | 159 | | 1
Diaprepes abbreviatus | 158 | 14 | 111 | | 1
Colaphellus boyringi | 171 | 14 | 124 | | 1
Lepidoptera
Bombyx mori | 72 | 2 | 37 | | 1
Spodoptera litura | 72 | 2 | 37 | | 1
Ostrinia furnacalis | 74 | 2 | 39 | | 1
Ostrinia nubilalis | 74 | 2 | 39 | | 1
Antheraea assama | 76 | 2 | 42 | | 0
Antheraea assama | 77 | 1 | 44 | | 0
Antheraea yamamai | 76 | 2 | 42 | | 0
Athetis lepigone | 70 | 2 | 35 | | 1
Agrotis segetum | 72 | 2 | 37 | | 1
Papilio polytes | 74 | 2 | 39 | | 1
Papilio xuthus | 74 | 2 | 39 | | 1
Danaus plexippus | 74 | 2 | 39 | | 1
Heliconius melpomene | 74 | 2 | 39 | | 1
Heliconius melpomene | 74 | 2 | 39 | | 1
Heliconius erato | 74 | 2 | 39 | | 1
Mamestra brassicae | 72 | 2 | 37 | | 1
Siphonaptera
Oropsylla silantiewi | 110 | 6 | 35 | 20 | 1
Diptera
Anopheles gambiae | 150 | 9 | 72 | 21 | 0
Anopheles darlingi | 149 | 9 | 71 | 21 | 0
Anopheles sinensis | 159 | 9 | 77 | 21 | 0
Anopheles funestus | 148 | 9 | 70 | 21 | 0
Anopheles quadrimaculatus | 152 | 9 | 74 | 20 | 0
Aedes aegypti | 190 | 9 | 111 | 21 | 1
Chironomus riparius | 144 | 7 | 66 | 23 | 0
Sitodiplosis mosellana | 148 | 9 | 68 | 22 | 0
Sitodiplosis mosellana | 131 | 6 | 62 | 15 | 0
Culicoides sonorensis | 165 | 5 | 86 | 22 | 4
Drosophila ananassae | 146 | 9 | 65 | 23 | 1
Drosophila yakuba | 147 | 9 | 66 | 23 | 1
Drosophila grimshawi | 147 | 9 | 66 | 23 | 1
Drosophila melanogaster | 147 | 9 | 66 | 23 | 1
Drosophila erecta | 149 | 9 | 68 | 23 | 1
Drosophila persimilis | 152 | 9 | 71 | 23 | 1
Drosophila simulans | 147 | 9 | 66 | 23 | 1
Drosophila sechellia | 147 | 9 | 66 | 23 | 1
Drosophila mojavensis | 146 | 9 | 65 | 23 | 1
Drosophila willistoni | 146 | 9 | 65 | 23 | 1
Drosophila virilis | 147 | 9 | 66 | 23 | 1
Ceratitis capitata | 137 | 9 | 58 | 21 | 1
Teleopsis dalmanni | 193 | 9 | 113 | 21 | 1
Corethrella appendiculata | 175 | 9 | 95 | 20 | 1
Glossina morsitans morsitans | 133 | 9 | 54 | 21 | 1
Musca domestica | 139 | 5 | 63 | 21 | 1
Bactrocera dorsalis | 145 | 9 | 65 | 22 | 1
Bactrocera cucurbitae | 136 | 9 | 57 | 21 | 1

TABLE 2 CHARACTERISTICS OF CPCFC FAMILY MEMBERS IN CRUSTACEA (signals removed)

Class/Species | total length | amino acids to start of motif 1 | amino acids between motif 1-2 | amino acids between motif 2-3 | amino acids between motif 3-4 | amino acids between motif 4-5 | amino acids final C to end
Ostracoda
Cypridininae sp. | 91 | 3 | 27 | | | | 29
Cypridininae sp. | 92 | 4 | 27 | | | | 29
Malacostraca (ALL Malacostraca are C-X(7)-C)
Melita plumulosa mira | 65 | 2 | 28 | | | | 2
Hyalella azteca | 73 | 9 | 29 | | | | 2
Hyalella azteca | 72 | 9 | 28 | | | | 2
Procambarus clarkii | 47 | 4 | 10 | | | | 0
Petrolisthes cinctipes | 48 | 4 | 10 | | | | 1
Petrolisthes cinctipes | 48 | 4 | 10 | | | | 1
Maxillopoda
Amphibalanus amphitrite | 156 | 4 | 21 | 17 | 17 | 17 | 0
Calanus finmarchicus | 81 | 4 | 13 | 15 | | | 1
Eucyclops serrulatus | 79 | 3 | 13 | 15 | | | 0
Remipedia
Speleonectes cf. tulumensis | 56 | 1 | 14 | | | | 9
Speleonectes cf. tulumensis | 76 | 2 | 10 | 15 | | | 1

FIGURE 1. CPCFC1 transcript levels. Y-axis: N0 × 10^7 (scale 0 to 6000). X-axis: Age, grouped as 4th instar larvae (0, 12, 24, 36 and 48 hr), pupae (0, 12 and 24 hr) and adults (< 10 min and < 12 hr).


Figure 8. WebLogo panels: NON-HOLOMETABOLA (11 species, 33 motifs); HOLOMETABOLA (55 species, 152 motifs); CRUSTACEA without Malacostraca (5 species, 20 motifs); MALACOSTRACA (4 species, 12 motifs).

CPCFC HIGHLIGHTS

New cuticular protein family described, characterized by a 16 amino acid motif ending C-X(5)-C.



In Anopheles gambiae, transcripts localized primarily in epidermis underlying hard cuticle.



Proteins localized primarily in endocuticle.



Family members identified in 14 orders of Hexapoda and 4 classes of Crustacea.


SUPPLEMENTARY FILE 1 MIQE

Experimental design and sample collection
Sample description: Fourth instar larvae and pupae were collected immediately after ecdysis. Animals were either placed in TRIzol® and frozen immediately, or kept until the desired time after eclosion. One group of adults was processed within 10 min of eclosion and another within 12 hrs of eclosion.
Number per sample: Larval samples: n = 3-5 for each age. Pupal samples: n = 2-5 for each age. Adult samples: n = 3 for each age.
Technical replicate number: Three technical replicates were run.

Nucleic acid extraction
Procedure: Immediate placement in TRIzol® (Ambion), freezing at -80 °C; RNA extraction followed TRIzol’s protocol.
Quantification: NanoDrop 1000 Spectrophotometer
Purity: 260/280 analysis (value 1.95-2.04)

Reverse transcription
Procedure/kit: Life Technologies SuperScript III First-Strand Synthesis System with Oligo-dT20 primer
Amount of RNA: 1.0 µg of total RNA
Reaction volume: 15 µl
Temperature and time: 65 °C for 5 min; 55 °C for 50 min; 85 °C for 5 min

RT-qPCR target information
Sequence accession numbers: RpS7 (AGAP010592); CPCFC1 (AGAP007980)

RT-qPCR oligonucleotides
Primer sequences: See Additional file 8.

RT-qPCR protocol
Complete reaction conditions: Reaction volume: 15 µl. Primer: 2.5 µM. cDNA: 5 µl of 1/100 diluted cDNAs. Polymerase and reactants: SsoAdvanced™ Universal SYBR® Green Supermix (Bio-Rad)
Thermocycling parameters: 95 °C for 2 min; 39 cycles of 95 °C for 10 s and 57 °C for 30 s
RT-qPCR instrument: Bio-Rad CFX Connect™ Real Time PCR Detection System

Data analysis
R0 determination: R0 values were calculated with amplicon mean efficiency per run using LinRegPCR software (Ruijter et al., Nucleic Acids Res. 2009;37(6):e45).
Reference genes were not used. rpS7 (Togawa et al., Insect Biochem Molec Biol. 2008;38:508-519) was validated as a stable reference gene only for larvae of all ages. It could only be used for a single age of pupae and pharate or eclosed adults.

Table based on recommendation of Bustin et al. (Nat Methods. 2013:10:1063-1067). Table design from Isolani et al. (Eur J Pharmacol. 2012:688:1-7).
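The R0 determination above can be illustrated with a small calculation. This is a sketch only, assuming the relation described by Ruijter et al. (2009), in which the starting concentration equals the fluorescence threshold divided by the amplification efficiency raised to the quantification cycle (N0 = Nq / E^Cq), with E taken as the mean efficiency for the amplicon in that run. The function names and all numeric values are hypothetical and are not data from this study.

```python
from statistics import mean


def mean_efficiency(per_well_efficiencies: list[float]) -> float:
    """Average the per-well PCR efficiencies for one amplicon in one run."""
    return mean(per_well_efficiencies)


def starting_concentration(nq: float, efficiency: float, cq: float) -> float:
    """Return N0 = Nq / E**Cq (arbitrary fluorescence units)."""
    if efficiency <= 1.0:
        raise ValueError("efficiency must exceed 1 (2.0 means perfect doubling)")
    return nq / efficiency ** cq


# Hypothetical example: three wells of one amplicon in a single run.
run_efficiency = mean_efficiency([1.88, 1.90, 1.92])
n0 = starting_concentration(nq=0.5, efficiency=run_efficiency, cq=22.0)
print(f"E = {run_efficiency:.2f}, N0 = {n0:.3e}")
```

Using one mean efficiency per amplicon, rather than a separate value per well, keeps well-to-well noise in the efficiency estimate from being amplified exponentially through Cq.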

SUPPLEMENTARY FILE 2 -- PRIMER SEQUENCES

AgamCPCFC1 Primers

Purpose | Probe Name | Sequence 5’-3’
RT-qPCR | CPCFC1 qPCR-2 | CCACTGCCAGGATACACCTC (in coding region)
RT-qPCR | CPCFC1 qPCR-2 | GTCAGGAAATGGGAAGGCGA (in 3’UTR)
RT-qPCR | RpS7-UC | GTGAGGTCGAGTTCAACAACAAGAA
RT-qPCR | RpS7-DD | GGCACCGGCACGTAGATGA
in situ hybridization – antisense probe in coding region | CPCFC1-UA | CTCAGCCCAGCTGGAACGCC
in situ hybridization – antisense probe in coding region | T7CPCFC1-DA | TAATACGACTCACTATAGGGCAGGTGTGCGGGGACACTC
in situ hybridization – antisense probe in 3’UTR | CPCFC1-UB-3’ | TCCCACGTTTGCCATGGTTGTGT
in situ hybridization – antisense probe in 3’UTR | T7CPCFC1-DB-3’ | TAATACGACTCACTATAGGGTGTGTGCGATTGCACGCTGA
in situ hybridization – sense probe in coding region | T7CPCFC1-UA | TAATACGACTCACTATAGGGCTCAGCCCAGCTGGAACGCC
in situ hybridization – sense probe in coding region | CPCFC1-DA | CAGGTGTGCGGGGACACTC
in situ hybridization – sense probe in 3’UTR | T7CPCFC1-UB | TAATACGACTCACTATAGGGTCCCACGTTTGCCATGGTTGTGT
in situ hybridization – sense probe in 3’UTR | CPCFC1-DB | TGTGTGCGATTGCACGCTGA
Protein production | PP-CPCFC1-F | CATCATCACCACCATCACCAGCCAGCCGCCCAGTATCC
Protein production | PP-CPCFC1-R | GTGGCGGCCGCTCTATTAGAAGTTCGGGCAGGTGTGCG

Sequences in bold are not part of AgamCPCFC1 gene. They are the T7 primer used for probe construction, the sequence that added the 6 His residues to the protein, and the adaptor for the plasmid.
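As a rough cross-check of the CPCFC1 RT-qPCR primer pair, the two primers can be located on the 3’ end of the AgamCPCFC1 genomic region given in Supplementary File 3. The sketch below is illustrative only: `TEMPLATE` is a short excerpt copied from that region (the end of the last coding exon and the start of the 3’UTR), and the helper names `reverse_complement` and `find_amplicon` are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_amplicon(template: str, forward: str, reverse: str) -> str:
    """Return the template segment bounded by the forward and reverse primers."""
    start = template.find(forward)
    if start == -1:
        raise ValueError("forward primer not found on template")
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start)
    if end == -1:
        raise ValueError("reverse primer site not found downstream of forward primer")
    return template[start:end + len(reverse_site)]


# Excerpt of the AgamCPCFC1 genomic region (Supplementary File 3, plus strand).
TEMPLATE = (
    "GTTGCCCCACTGCCAGGATACACCTCCCGCCAGTACCCGGCCGGAGTGTCCCCGCACA"
    "CCTGCCCGAACTTCCCGTACTGCTAAGACATTCGCCTTCCCATTTCCTGACCCTCC"
)
FORWARD = "CCACTGCCAGGATACACCTC"  # listed as "in coding region"
REVERSE = "GTCAGGAAATGGGAAGGCGA"  # listed as "in 3'UTR"

amplicon = find_amplicon(TEMPLATE, FORWARD, REVERSE)
print(len(amplicon), amplicon)
```

On this excerpt the forward primer sits upstream of the stop codon and the reverse primer binds just downstream of it, consistent with the "in coding region" and "in 3’UTR" annotations in the primer list.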

SUPPLEMENTARY FILE 3 Genomic regions and protein sequences for AgamCPCFC1 and ortholog in Drosophila melanogaster.

AgamCPCFC1 – [GenBank:AGAP007980]

A. The genomic region (from VectorBase) that included AgamCPCFC1 showing probes used for in situ hybridization.

............taattgtacgctccgcaaacggacggactacaaccggcactggcactgctgatcgatcaccaccc tcttctagccctgccgtggtggaggggggggggaaggggtttactcgcccaggtagtactgtaactgtaaccgccga ctactcgagccgatcgttcgggggagaatgttcgggctctcggcgttacacgggtacggtcgagtggtccaatggat cgatttcggcgcggagaaagaaatcgtcggtggcgctcgtgggcacccttccctttaatggttcgccgttggtgttt ggctaggtggagctaaggtagggggctgctcgaacctcggctgctcggtggagccgcagggaggggatacctagaac gcggctccacgaggctcacgagagagcgccccgaacgcgagcgacgaacctcgtggccgggcccgatcggctgggtg tggtgtggtttttactataaaagctcggtattgtctttcggagatcagtaTTCGGTACTCGGTTTGGAAGTTTTGTG TTAAGCGAACAACAGTCCGTGTTTCGGTAACATAAAAGTCCAACTTGCCTGTGTGCATACGGAAGATTGAACGCAAG AGATACTCTGCATCCCAAAATGTTCTCCAAAGTGgtaagcaatttagagggtgtcacgaagggtatgggggatcaaa ccctttgaagggttgagcctgatctgtgtgtgcgtgagtgcgaggatagacagccccaacagagaacggctgtcaga attgtggaattgtgtggaagaggatcgtgtgcaatcagtgtgcagggggcgtgatgaatcggatgtgcaactgtgtt taatccaaagtctggtgatgctaatcgttgcttcctgtacttgtgcgcttgcttgtagATCGCTGTTTTGGCCTTTG CCGCCGTGGTAGCCGCTAAGCCCCAACATCAGCCAGCCGCCCAGTATCCGGCCGGAGTCGATCCGTCCCGCTGCCCG TCGTACCCGAACTGCGATAACGCGGCCCTGCACAGCCCGAACCCGTACAACAACCATGCCGCCAACCACTGGAACCC GAACTGGAACGCTCAGCCCAGCTGGAACGCCGCCCCTGCCCCTGCTCCGGCCCCGGCCGCCTACTACCACGGAGCTC CCCACTCGTACCAGGCCCTGACTGGCCCGAGCCACAACTACATTGGAGCTCCCAGCCCCTCGGCCGGTGGTGACCGg taggttcaaggttccccgagacctcttcctcaccctcacacaagttcaatattgctacaacgtccgcttccattcgc tcttttccccctcacagTTACCCCGCTGGAGTGAACCCGCAGTCGTGCCCGAACTATCCGTACTGTGATAACACCGT TCAGGCTGGCGTTCCTCAGGTTGCCCCACTGCCAGGATACACCTCCCGCCAGTACCCGGCCGGAGTGTCCCCGCACA CCTGCCCGAACTTCCCGTACTGCTAAGACATTCGCCTTCCCATTTCCTGACCCTCCCCACACTACCCTTCATCGTAC TATTGATCTGTGACCACACGTTTCTTCCTTCCTTGCGCAATTTTCATCCGTTCTACCCCTACCCTGTACATCTTCCC ACGTTTGCCATGGTTGTGTAAATAACTGGTACTGGTTTGTTTTCGTTAGTGTTTTGTGCATGAAATACTGCCTCTTT AGTAGGAGAGATATTAGCTGTGTTCCTTAGAGTGATTGGTTCAACAAGCAAGAGCGATAAGAGCGCAACAATCAAAA GCATTGGAGAACTTATGGGAAATATCATTGTATAAAAAAACAAAAAGAATCTGTGAAACTACAACTGCAACAACAAC AGCAATATCTTCATCGACTATGTTATCTCAGCGTGCAATCGCACACATCGCAAATGAAATTGAAATGCAATTGATTT 
Aagaatcaaacggattatgttgtagtgcgtttaaatgaattcgatgttacaccgaacattcatgttgtgggttgtgg ggggaatggtaacgcttgtactggtaacagctaagtgatcaaaaatgtttactgttgatccaaagattctagctcct gcttcttcttattcttatttggcgccacaaccttaatcggttcagcgcaagcgaagcttgtaatgagcttgtctact tattggtatt

Probe used for in situ hybridization that came from coding region (gray highlighted) is shown in dark orange, the one in the 3’UTR (interrupting purple type) is in light orange. Probes were 284 and 282 nt respectively. Introns are in blue type; upstream and downstream regions in green.

B. Protein sequence of AgamCPCFC1. Gray highlighting shows 16 aa motifs, blue type indicates region used for antibody production.

MFSKVIAVLAFAAVVAA
KPQHQPAAQYPAGVDPSRCPSYPNCDNAALHSPNPYNNHAANHWNPNWNAQPSWNAAPAPAPAPAAYYHGAPHSYQALTGPSHNYIGAPSPSAGGDRYPAGVNPQSCPNYPYCDNTVQAGVPQVAPLPGYTSRQYPAGVSPHTCPNFPYC
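The spacing values tabulated for each protein in Table 1 can be recomputed from a mature sequence. The sketch below is one possible way to do so, not the authors' procedure: it assumes every motif is exactly 16 residues with cysteines at positions 10 and 16 (the C-X(5)-C ending), and the name `motif_layout` is hypothetical. The input is the mature AgamCPCFC1 sequence given above, with the signal peptide removed.

```python
import re

# 16-residue window: 9 residues, then C, 5 residues, C (the C-X(5)-C ending).
MOTIF = re.compile(r".{9}C.{5}C")


def motif_layout(mature: str) -> dict:
    """Summarize motif positions in a mature protein as Table 1-style counts."""
    spans = [m.span() for m in MOTIF.finditer(mature)]
    if not spans:
        raise ValueError("no C-X(5)-C motif found")
    gaps = [nxt[0] - cur[1] for cur, nxt in zip(spans, spans[1:])]
    return {
        "total": len(mature),
        "to_first_motif": spans[0][0],
        "between_motifs": gaps,
        "after_final_c": len(mature) - spans[-1][1],
    }


AGAM_CPCFC1_MATURE = (
    "KPQHQPAAQYPAGVDPSRCPSYPNCDNAALHSPNPYNNHAANHWNPNWNAQPSWNAAPAPAPAPAAYYHGAPHSYQA"
    "LTGPSHNYIGAPSPSAGGDRYPAGVNPQSCPNYPYCDNTVQAGVPQVAPLPGYTSRQYPAGVSPHTCPNFPYC"
)

layout = motif_layout(AGAM_CPCFC1_MATURE)
print(layout)
```

For this sequence the computed counts agree with the Anopheles gambiae entry in Table 1. Sequences with the Malacostraca-type C-X(7)-C ending would need a different pattern.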

C. Western blots of crude protein extract of An. gambiae legs treated with anti-CPCFC1 (diluted 5,000-fold). Molecular weight marker is in daltons.

D. The AgamCPCFC1 ortholog in Drosophila melanogaster from FlyBase [GenBank: CG8736]. Colored type added to conform to Anopheles.


>CG8736 2R:8594031..8595001 (reverse complement) atctgttgggccaatcaagtaaaatatgcgcgagatcagtcaactacagaaacaaaagcaaaagtaaagcaaagcca ctgcagcagcagcaatagcaaaagcaacaagcacagcagccgcagtaatgaaagtgaaaccgagtctggccgagaga ctctggctgagattgagacccggccaagagtcggttctagccagcaccgctatataagcttgatggccgggctcggc AGCAGCAGTGCAGCGCCGACCAGGAACCCAATTGGAAGTTTGAGCTACGACTCCATAGTCCAATTCGGCAAGGATTA CCATAAGCCCCACACCAGAACCAACTCCACAACTACCAACCACCCACTCACCTCAGCCAACATGTTCTGCAAGCTGg taagtgccctttgagccaagtttctgcccacaaggatagcgtctgaaaaagttcctttaactactaagtggagctgg aatccaaatctgcaacattttagttgaagttcttaagatccgaggatcctaagttccagatatttttcaaactacag atgtatcttattacatttaaaaattccatatttttttaaaatcttttcaaagCTTTTCGCTACCTTCGTGGCCCTGG CGGTGGCCAAGCCACAACACCAACCTGCTGCCCAGTATCCGGCTGGCGTGAATCCGCAGGACTGCCCCAACTTCCCC ATCTGTGATAATGCGCGCCTGCACAATCCGCAGCCGCAGTGGGGTGCCCCGCAGCCACAGTGGAACCCCCAGCCGCA GCCACAGTGGAACCCGCAGCCACAGTGGCAGCAACCTCAACCCCAGTGGAACCCCCAGCCGCAGCCACAGTGGCAGG CACAGCCCTCGTGGAACGCAGCCCCTGCTGCCGCACCCGGTGGCGATAAGTATCCAGCTGGCGTCAATCCGCAGACC TGCCCCAACTATCCCTACTGCGACGTGAACGCCGGACACGCTGGTGCTCCCGTGGCAGCTCCTCCTCTACCTGGCTG GACGGAGCGTCTGTATCCCGCCGGAGTTTCGCCGCACCAGTGCCCCAACTTCCCGTACTGCAACTAGGGCGGCCTAG GGCTCACTTGCGGCCAGCCGCAGCTTCCTTTAACGCTTCGCCTTTCCCCAGTTCTCAATTAGTGGACATTAATCTGA AATTCTTTGTTGTTGGCGCCGAAATAAATGCAAAATGTTGGTCAAAGaaatgggactttctatggttgatagctgca gatacatgggggtatacaaatcgttctattcgcaactacaacttcttactatattacaaaatgtaattgttggttgg ttcatagaactcttactaatgaagaaaagattaattgaccaaagtaagcatttataattaaataaagtaatttgaac tgtctacaatcagaaactcattcagtcaagtgcgctaaatgaggtatgaaagaagactgttaagatcctgctccatg MFCKLLFATFVALAVA KPQHQPAAQYPAGVNPQDCPNFPICDNARLHNPQPQWGAPQPQWNPQPQPQWNPQPQWQQPQPQWNPQPQPQWQAQP SWNAAPAAAPGGDKYPAGVNPQTCPNYPYCDVNAGHAGAPVAAPPLPGWTERLYPAGVSPHQCPNFPYCN-

SUPPLEMENTARY FILE 4 ILLUSTRATION OF WHY RT-qPCR DATA WERE NOT NORMALIZED TO RpS7. Data are from the same cDNA preparations used for Figure 1.

[Figure, two panels, both plotted against Age (L4 0 hr, L4 12 hr, L4 24 hr, L4 36 hr, L4 48 hr, P 0 hr, P 12 hr, P 24 hr, A < 10 min, A < 12 hr). Upper panel: S7 Transcript Levels; y-axis N0*10^7, 0 to 6000. Lower panel: CPCFC1/S7; y-axis CPCFC1/S7, 0 to 3.]
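The reasoning behind this file can be sketched numerically: if the reference transcript itself changes with age, dividing by it imposes that change on the target. The Python sketch below uses made-up numbers, not the values plotted in this file, and the helper names `normalize` and `fold_range` are hypothetical.

```python
def normalize(target, reference):
    """Divide each target value by the reference value from the same sample."""
    return [t / r for t, r in zip(target, reference)]


def fold_range(values):
    """Ratio of the largest to the smallest value in a series."""
    return max(values) / min(values)


# Hypothetical N0 values for four ages; these are NOT data from this study.
reference = [5000.0, 4000.0, 2000.0, 1000.0]  # reference transcript declining with age
target = [100.0, 100.0, 100.0, 100.0]         # target transcript that stays constant

ratios = normalize(target, reference)

# The raw target is flat, but the target/reference ratio varies as much
# as the reference does, so the apparent change comes from the reference alone.
print("raw target fold range:", fold_range(target))
print("reference fold range:", fold_range(reference))
print("ratio fold range:", round(fold_range(ratios), 6))
```

Under these assumed values the ratio inherits the full variation of the reference even though the target never changes, which is the kind of distortion that argues against normalizing to an unstable reference.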


SUPPLEMENTARY FILE 7 CPCFC FAMILY MEMBERS IN HEXAPODA

Collembola (2 species, 2 sequences)

>gi|570575880|gb[GenBank:GAMM01011142.1]TSA: Orchesella cincta OC11152 MMKQIVALLVLAIVAYAYA QERYPAGVSPASCPNYPFCNVNGIGAPPGYHFDRSLGYPAGIHPSTCPNYPYC>gi|164439313|gb[GenBank:EW760097.1]|EW760097 sb_009_07P09_Onychiurus arcticus MSKVILVLMVLAVFATVCFA QADRYPAGVSPASCPNYPFCNNVGHNVPIGCRFDSANHRYPPGVDASTCPFFPYC-

Archaeognatha (1 species, 1 sequence)

>gi|283497943|gb [GenBank:N223383.1]|FN223383 FN223383 dmp031cm Lepismachilis y-signata MMKLVVLAALVALAAA QADRYPAGLNPAACPNFPLCDSNAIAAFQHNPNTYFSPPSAPTGARYPEGINPVTCPNYPYCGASAPAGAPAGYASAPASYAPAPSNYA QAPAGYASSPNNNIAYPAGVNPSSCPNYPYCH-

Odonata (1 species, 2 sequences)

>gi|459260966|gb[GenBank:GAEQ01007162.1]|TSA: Enallagma hageni contig09773 MFAKLFVFAACVAVALC AVADKYPAGLNPALCPNYPDCDNTLIALHSSNPSAVLPYAAAPLYHYGREYPAGVHPAACPNYPYCNTLAYPYAAHYAREYPAGVHPAA CPNYPYC>gi|459229756|gb|[GenBank:GAEQ01017608.1]| TSA: Enallagma hageni contig07609 MIAKSVAIILCTVAVACTA APQAARFPAGIDPQVCPNYPDCDNVALAASISAQVQQQQYAAAPYSAPYSAPYSAPQPAPYNPPPQQYSYPTYAQPAPAAPRAAEGYPA GVDARVCPNYPYCGPTPAHVPAAPQNYAAPQNYAAPPPANNWAAPQNQYNAPIPEP-

Orthoptera (2 species, 2 sequences)


>gi|701837185|gb|[GenBank:GBHB01030180.1]| TSA: Teleogryllus commodus MblContig30181 MALKLVLALCLVAVALA APQADRYPAGLNPALCPNYPLCDNNVIATYGPAAAAVPRAREYPAGVPAAACPNYPFCNVNLHAPPLPGFSARLYPAGVPAAACPAYPY C>gi|714206773|gb|[GenBank:GAWZ01160366.1]| TSA: Gryllotalpa sp. AD-2013 C634000 MIAKLMVVAVALLAAVYA APQADRYPAGLNPALCPGYPVCDNALIATYGPSGAPVHNVYARQYPAGVNPAACPNYPYCNTAVSAAPLPGFSARLYPAGVSPAACPGY PYC-

Blattodea (2 species, 2 sequences)

>Original sequence for family: Blaberus craniifer gi|3023587|sp|P80674.1| Name:Bc-NCP1 QADKYPAGLNPALCPNYPNCDNALIALYSNVAPAIPYAAAYNYPAGVSPAACPNYPFCGAIAPLGYHVREYPAGVHPAACPNYPYCV>gi|698758469|gb|[GenBank:GBID01001268.1]| TSA: Blattella germanica Contig1280 MYCKLVVLAAIVAVAVA QADKYPAGLSPALCPNYPHCDNALIALYSNNAPAVPYASAYNYPAGVSPAACPNYPFCGALAPLGYHVREYPAGVSPAACPNYPYCH-


Phthiraptera (1 species, 1 incomplete sequence)

>modified from gi|242023445|ref|[GenBank:XP_002432144.1]| Galectin-3, [Pediculus humanus corporis] MIVQLFFFASIVISAFA GPAGDKYPAGLDPNLCPNYPNCDNVLLAAAQTQPGVYTSGTYNGAYNGAYNGAYNGAYNGAYNDAYNSGAYVPEAYSGYPNTGFA SGAYTGFGNPAGGAHAVPGYPAGVNPASCPNYPYCTNYAPNAYHQVAPLPGFTHREYPEGVNPTTCPNYxxxxx


Hemiptera (3 species, 3 sequences)


>gi|659433823|gb|[GenBank:GAOM01006399.1]| TSA: Macrosiphum euphorbiae Me_WB07486 MIGVLAIVFAVQATAVLA GGARYPSGLNPALCPNYPHCDNVLLAAYAQPAAGNDYNDHYTNNNNAGQYYGSGYQHHPSEASNYHNNQPLVPAPYTEPGYPAGLSSSN CPNYPYCSHQVPAEALRYAHKRYPSGVSPQNCPNYPYCH>gi|193636711|ref|[GenBank:XP_001949693.1]|PREDICTED:cuticleprotein1[Acyrthosiphon pisum] MIGVLAIVFAVQATAVLA GGARYPSGLNPALCPNYPHCDNVLLAAYAQPAAGNDYNDHYTNNNNAGQYYGSGYQHHPSEASNYHNNQPLVPAPYTEPGYPAGLSSSN CPNYPYCSHQVPAEALRYAHKRYPSGVSPQNCPNYPYCH>gi|656473778|gb|[GenBank:GBDP01042177.1]| TSA: Kerria lacca L_17239_T_1/1_C_1.000_L_675 MISKTIFICTVLLISVTCQS QSYQSNKYPAGIHPNLCPHYPYCDNTVLAGFAQGVAAFHTGAAAPGYPASLSPQACPNYPYCSHQIPPEAIHYRRSAALHQYPTVAEST NYAYPIPSAIDLRTKYPSGVNPASCPNYPYCH-

HOLOMETABOLA

Hymenoptera (1 species, 1 sequence)

>Cephus cinctus [Ccin1_scaffold0997]_contig12, whole genome shotgun sequence MCTLILNYLIQIQVVLCILALATTLLA KPNGDRYPAGVNPQSCPNYPNCDNAALHSGRASTPSWSPQGGAWAPAGAPAAPWAQPASPWNAPHSAGNPASAGAQYPAGVNPQSCPNY PQCDNAALHGGAPANNDWNEPSSNSWDSWDSWSDPSTAQPAAVAPRYPAGVSQQSCPNYPYC-

Megaloptera (1 species, 1 sequence)

>gi|661056549|gb|[GenBank:GADH01013481.1]| TSA: Corydalinae sp. KMRSPBM-2012 contig14585 MFKPVVVLIAVLVACVSS QADRYPAGLNPALCPGYPRCDNSLLALHSARTEPVADYTATRYPAGVPAAACPNYPFCNTGEAYGYSAARPLPGFTRRLYPDGVPAAAC PNYPFCH-

Neuroptera (1 species, 1 sequence)

>gi|459415814|gb|[| TSA: Chrysopa pallens Unigene29704_dacaoling MNQLVILTVVAFIACAYG QADRYPAGLNPALCPGYPNCDNALLALYSTGAIPAPPLQAPAARYPAGVPAAACPNYPYCNVGAPESALPLPGYAQRLYPAGVPAAACP NYPYC-


Coleoptera (10 species, 12 sequences)

>gi|91087673|ref|XP_976428.1| [Tribolium castaneum] MFVKLTVLACSIAAVCG VWNGPLAGGVPAHQYPAGVSPQACPNFPNCANPAVAANPNAPAPYNPVPQYNHYNPAPQYNGYNPAPVPQYNPGLQSALDRGEYIGDGD YHGEGLAESGAYGNNGQHGGYNGGYNGGYNPAPAYNPAPAYNHGLPAGVPAQVPAGVDARSCPNYPFCH>gi|91087671|ref|XP_976426.1| [Tribolium castaneum] MFVKLAVLACSLAVSAA VYSGPLAGGVPAAQFPAGVSPQACPNYPNCANPSVAVNQAPVSQYNAAPQYTPQQYQPAPQYAPQQYQPAPQYTPQQYQPAPQFAPQQY NAAPARPQYTPEVQNALDRGEYIGDGDYHGEGLAEALAPGYQGQAQAYNAAPAYNPAAYAPQPQAHHQLPAGVGQPAQIPAGVDARSCP NYPFCH>gi|452930847|gb|GAFI01012246.1| Dendroctonus frontalis MFVKLVTLALCLTSAWA VYNGPLAGGLPADLYPAGVSPQACPNFPNCANPAVAVSSGAPQNNWGAPQPQPAWNQAPQSQWNAPQPQWNAPQPQWNNYNPQPVPQWN PSGQNALEKGGYTGDGDYHGEGLAEALAPGYENAGGWNKWNNNDNQAAAWNQAPAWNAGPQAGLPNGAGARIPAGVDPNACPNYPFCGG GH>gi|459324431|gb|GAFX01014541.1| Dendroctonus ponderosae MFVKLVTLALCLTSAWA VYNGPLAGGLPADLYPAGVSPQACPNFPNCANPAVAVSSGAPQNNWGAPQPQPAWNQAPQPQWNAPQPQWNAPQPQWNNYNPQPVPQWN PSGQNALDKGGYTGDGDYHGEGLAEALAPGYENAGGWNKWNNDNQAPAWNQAPAWNAGPPAGLPNGAGARIPAGVDPNACPNYPFCGGH >gi|452925844|gb|GAEO01000512.1| TSA: Pissodes strobi Pissodes_strobi_Contig512 MFVKLVVFVCFAGSALG QHQQYQGPLAGGQPAALYPAGVNPQSCPNYPDCTNPLVAISQNAAPQYAQSAPQYQQPAQYQQPSQYQQYQAPAVTPAPVSQYNPVYPQ QYAPASQRQYSSDVQQRLDRGEYIGDGDYRGEGLAEALAPGYAGQAQAAPQYNPAPQYNPAPAFQPAPQQYNAAPAYPQAAPSAQPAQI PAGVNAQACPNYPFCHA>gi|452924214|gb|GAEO01001329.1| TSA: Pissodes strobi Pissodes_strobi_Contig1329 MFHKLAILLCFMSVTIA QYHQHPQYQQAQYQQQPQYQQQPQYQQPQYQQAQYQQQPEPVPTAAQFPAGVDAQSCPNYPECLNPLLAVQAVAKASDPRYLAQNAP AQRESQYSPDVQQRLDRGEYIGDGDYHGEGLDEALAPELAVRGHYDGQTAAAQYAAQPAAAQYASQPAAAQYAAQPNIGAVQYPAQQAA SQYVAQPAAVQYPAQRAASQYAAQPGAVQYPAQQAASQYNAAPQYITAPRQATAPQYYQARPSPHGSQHAQASLFTPQYAQPAPVSEAR SYSPVANAIASGSEPVPSVQLPAGVDANACPNYPFCH>gi|372374709|gb|JR483044.1| TSA: Rhynchophorus ferrugineus contig15581.Rhfeelpa MFVKLAVFSCALALAFA QYHQPYNGPLAGGQPASLYPAGVSPQSCPNYPDCSNPLVAVQNSAPQYAPSAPQYPQPAQYSQYQAPAPVTPAPVSQYNPVYPQQYAPA SRSQYSPDVQQRLDRGEYIGDGDYHGEGLAEALAPGYAGQAPRQYAPAPAPYQPAPAPYQPAQAYSQPAYPQAAPAGPQPAQIPAGVNA 
NACPNYPFCH>gi|562764064|gb|GABY01013552.1| TSA: Anthonomus grandis A_grandis_454_rep_c207 MFVKLVTLALCLTSTLA VYNGPLAGGLPASLYPAGVSPQACPNFPNCNNPAVAANPNSPTQQQWGSPQPQAWGQQQPAWTQQPQNQWNNAQAVPQWNGNNNDVLLK GGYTGDGDYRGEGLAEALAPGYENSDVWRNWATGGQQQANQWNQAPQGPTHGVGQIPAGVDAGACPNYPFCGH>gi|429236657|gb|GAAB01001063.1| TSA: Agrilus planipennis 000793_EAB-5_isotig01124 MFVKLVVLACISSVALA AYNGPLAGGEPAHRYPAGVDPSACPNFPHCNNPAVAVNQQPAHAWNAQPQWNAAPQNQWNPAPQNHWNAAPQNQWNAAPQQWNAQPSWN GNQNALDSGAYTGDGDWHGEGLAEAGAFGDISHNFNDPAPGHPIPQAAHHVPVPGLPAQLPAGVDAHACPNYPYCH>gi|211331260|gb|FG540609.1|FG540609 OtL019A08_021607c Onthophagus taurus MFTKLTTLACVLAVANC AWNGPLAGGAPASSVPAGISEAACPNYPHCTNPSVAVEPNSPAQPQSQYQQYQPQYQQPQYQSHQPQYQPQQQYQSQPQQQYQPQPQYQ SQPQQYQPQPQQYQQQPQQNQYNSGNHNENVLLSGEYTGDGDYRGEGLAESGAFGPVDDPKSYDATPAPQMTYQPTPAYNPAGYQQQGY NNAPQHAQPNNVPAGLDPRYCPYYPFCH-

>gi|46496193|gb|CN475749.1|CN475749 USDA-FP_124839 Diaprepes abbreviatus MFVKLVTLAICLASARA VYNGPLAGGKPADLYPAGVSPQACPNFPNCANPAVAANPNAPGGYPWGSQPQNAWAQTGNNWNSAPQNNWNAPAAEWSPYRQNALDRGE YTGDGDWHGERLAEALAPGYENRGGGWNNNGGQYDGSQGWAGVNQPPAGLGAIPAGVNPGSCPNYPFCK-

>gi|749107276|gb|GBHN01000010.1| TSA: Colaphellus bowringi Contig11_AA MFVKLAVIACSLAAANA VYNGPLAGGQPAALYPAGVSPEACPNFPNCNNPAVAANPQQAAPHQYGAPQPQYNAQPQNQYNAQPQNQYNQGQQYNPAPVPQQYGNDA NNRLNRGEYIGDGDYHGEGLAEALAPGYSQPNYNDANQYKNQGNNYNQGPQGVPQNIHQTHGVGQIPAGVDAHACPNYPFCS-

Lepidoptera (14 species, 16 sequences)

>Bombyx mori gi|698765134|gb|GBJR01010300.1| TSA MYGKLFAILTLAAVALA REYPAGLHPAICPNYPFCDADALAKYTPQGMPIPEWVRNPAILPIARAASNSVPKYPADFPAALCPNYPYCW>Spodoptera litura gi|612350358|gb|GBBY01010560.1| TSA MFGKMFVFFAVLVVALA REYPAGVHPAVCPNYPYCDADALARHTPDGMPIPQWGYHPGVAPAAPGPVPAAPRYPADFPPALCPNYPYCW>Ostrinia furnacalis gi|572957986|gb|GAQJ01033381.1|TSA MFAKLFALLALAAVALC REYPAGVHPAVCPNYPYCDTTAFARHTPDGQPIPEWVYNPSILPVAPVDPAHNAAPRYPADFPAALCPNYPYCW>gi|597687333|gb|GAVD01010517.1| TSA: Ostrinia nubilalis comp17384_c0_seq1 MFAKLFALLALAAVALC REYPAGVHPAVCPNYPYCDTTAFARHTPDGQPIPEWVYNPSILPVAPVDPAHNAAPRYPADFPAALCPNYPYCW >gi|189554912|gb|FG209476.1|FG209476 Aace00901 Antheraea assama MYGKLLIVFALVVVALG QKYPAGVHPAVCPNYPFCDAQALARHTPDGTPIPEWVRNPSILPAPVPNHYAAGSFAAPRYPADFPAALCPNYPYC>gi|189567448|gb|FG208832.1|FG208832 Aace00257 Antheraea assama MFGKLFFLCAVAVAIAE QYPAGVHPAICPNYPFCDAETLARFTPDGMPIPEWYRNPALIPAPVPVPVVRAFEAPVAAKYPADFDASKCPNYPYC>gi|755820069|gb|GBZJ01043295.1| TSA: Antheraea yamamai Unigene46686_Ayam MYGKLFIVFALVVVALG QKYPAGVHPAVCPNYPFCDTQALARHTPDGTPIPEWVRNPSILPAPVPNHYAAGSFAAPRYPADFPAALCPNYPYC>Athetis lepigone 1-4gi|576213777|gb|GARG01025927.1| TSA MIGKMLFFFALAAVALA REYPAGVHPAVCPDYPFCAPDALARHTPSGIPIPQWGYNPGVAPGHPGPAALKYPADFPAALCPNYPYCW>Agrotis segetum gi|617808548|gb|GBCW01017681.1| TSA MFGKMLVFFALAALALA REYPAGVHPAVCPDYPFCAADALARHTPDGMPIPQWGYNPGVAPAHAGAVPAAPRYPAGLPPALCPNYPYCW>Papilio polytes gi|389611347|dbj|BAM19285.1| PpolCPH1 MYAKLFVLCVLAGVALA REYPAGLHPAVCPNYPYCDTNTFARFTPDGMPIPEWVYNPSILPVAPADPHANAAPKYPANFNAAACPNYPYCW>Papilio xuthus gi|389608733|dbj|BAM17976.1| PxutCPH1 MYAKLFVLCVLAGVALA REYPAGLHPAVCPNYPYCDTNTFARFTPDGMPIPEWVYNPSILPVAPADPNANVAAKYPANFNAAACPNYPYCW>Danaus plexippus gi|357617832|gb|EHJ71016.1| MYAKLFIVCAVAVVALA REYPAGLHPAVCPNYPYCDATAFQRFTPEGQPIPEWVYNPSILPQAPVDPNANLAARYPANFNAAACPNYPYCH>Heliconius melpomene cDNA clone Hm_pwAE2_33G10 MYAKLFIVCVVAVVALA REYPAGLHPAVCPNYPFCDTNAFARFTPEGMPIPEWVYNPSILPVAPADPNANIAAKYPANLNPAECPNYPYCW>Heliconius melpomene cDNA clone Hm_pwAE2_14F09 
MYAKLFIVCIVAVVALA REYPAGLYPALCPNYPFCDSNTLARFTPDGMPIPEWVYNPSILPVAPADPNANIAAKYPANLNPAECPNYPYCW-

>gi|74326249|gb|DT662070.1|DT662070 Heliconius erato MYAKLFIVCVVAAVALA REYPAGLHPAVCPNYPFCDSNAFARFTPEGMPIPEWVYNPSILPVAPADPNANIAAKYPANLNPAECPNYPYCW>gi|308155535|gb|FS940564.1|FS940564 FS940564 Mamestra brassicae MYGKLLMFFALAAVALA REYPAGVHPAVCPNYPYCDADALARHTPDGMPIPQWGYNPAVAPAHPGPVPAAPRYPADFPAALCPNYPYCW-

Siphonaptera (1 species, 1 sequence)

>gi|604977437|gb|GAWY01009901.1| TSA: Oropsylla silantiewi comp14334_c0_seq1 MFVKIVLSVSALCLLASA APQAARYPAGLNPSLCPGYPYCDNLLLAKYAPSAAGGAYVAPATSHTHDYNGVGGDKYPAGVDPSTCPNYPFCDNNVGAGYYAPPLPGF KQRLYPDGVSAHNCPNYPFCH-

Diptera (27 species, 28 sequences)

>gi|118789538|ref|XP_317486.3| AGAP007980-PA [Anopheles gambiae str. PEST] MFSKVIAVLAFAAVVAA KPQHQPAAQYPAGVDPSRCPSYPNCDNAALHSPNPYNNHAANHWNPNWNAQPSWNAAPAPAPAPAAYYHGAPHSYQALTGPSHNYIGAP SPSAGGDRYPAGVNPQSCPNYPYCDNTVQAGVPQVAPLPGYTSRQYPAGVSPHTCPNFPYC>gi|568252033|gb|ETN61428.1| hypothetical protein AND_006914 [Anopheles darlingi] MFSKVIAVLAFAAVVAA KPQHQQAAQYPAGVDPSRCPSYPNCDNAALHSPNPYNNHAANHWNPNWNVQPSWNHAPAAPAPASYYHGAPHSYQALTGPSHNYLGAPA PTAGGDRYPAGVNPQSCPNYPYCDNSAPAGVPHVAPLPGYTARQYPAGVSPHACPNFPYC>gi|668457631|gb|KFB45634.1| AGAP007980-PA-like protein [Anopheles sinensis] MFSKVIAVLAFAAVVAA KPQHQPAAQYPAGVDPSRCPSYPNCDNAALHNPNHAYNNHAANHAANHWNPNWNAQPSWNAAPAPVPAPAPYYHGAPHSYQALTGPSHN YLGAPAPTAGGDRYPAGVDPQACPNYPYCNNLAPAGAPQAAPLPGFTSRQYPAGVSPHTCPNFPYC>gi|302221267|gb|EZ977345.1| TSA: Anopheles funestus Afun011392 MFSKVIAVLAFAAVVAA KPQHQQAAQYPAGVDPSRCPSYPNCDNAALHSPNPYNNHAANHWNPNWNAQPSWNPAPAPAPASYYHGAPHSYQALTGPSHNYLGAPAP TAGGDRYPAGVNPQSCPNYPYCDNTVQAGVPQAAPLPGYTSRQYPAGVSPHTCPNFPYC>gi|704848324|gb|GBTE01001587.1| TSA: Anopheles quadrimaculatus m.2806 MFSKVIAVLAFAAVVAA KPQHQPAAQYPAGVDPSRCPSYPNCDNAALHNPNHQTYNNHAANHWNPNWNAQPSWNAAPAHAPAPAPYYHGAPQSYQALTGPSHNYLG APASTAGGDRYPAGVDPQACPNYPYCDNLAPAGVPQAAPLPGYHARQYPAGVSGHTCPNYPYC>gi|157105379|ref|XP_001648842.1| hypothetical protein AaeL_AAEL004292 [Aedes aegypti] MYSKMIAVLALAAVAIA APQHQEAARFPAGVNPNACPSYPNCDNAALHNQNPPANHANNHWNPNWNAQPAAPAWNAQPAAPSWNPQPAAHSWNQQPAAPSWNPQPA AHSWNQQPAAPAWNAQPQPHWNSFPAVTGPANNHLAAAPAPSAGGDKYPAGVNPQTCPNYPFCDHAATAGAPQVAPLPGYTERLYPAGV SPHSCPNFPYCN>gi|401007996|gb|KA191234.1| TSA: Chironomus riparius CripIT16530 MFKLVTFVTLFAVAFS APQHAAKYPAGVDPSKCPNFPICDNAALHAKAPAYNHWDQPAAHWNQPAQAYNHWDHQPAAPQWAPAPQWNNAAQYNHVAPAAPKAAAK YPAGVDPRSCPDFPYCPTPILPGHHAHHVAPLPGFTERLYPAGVSAHTCPNYPEC>gi|511203332|gb|GAKJ01009672.1| TSA: Sitodiplosis mosellana Unigene17991_S_mosellanaA MNSKVICFFVLIAAVHS APQHDQPARYPAGVNPALCPGFPICDNSLLHGTPPVPAAPHAAYTGSPAWNHGAQSYAYHSAPAYNQWNQPQHYDAHDYDYSTNDINGP 
GGDKYPAGVNPSACPNYPYCDNGAASHYAPVATPLAGYASRQYPAGISPAACPNYPYCA>gi|697393690|gb|GBRL01010158.1| TSA: Sitodiplosis mosellana CL166.Contig1_S_mosellanaA MNSKVISFFVLIAAVYS APHATQWPAGVHPSVCPNYPYCDTGAIAASVAPLEGFSTRLYPAGISPAACPGYPICDNTVVHNTPLVNTVNPAWNQPSTVVDKYPAGV HPSACPNYPYCSTGPATPLEGFSTRLYPAGIVAASCPSYPYC-


>gi|590125442|gb|GAWM01017815.1| TSA: Culicoides sonorensis m.7430 MFSKVFVVLATIAYVAA QEAARYPAGVDPSRCPGFPICDNAALHNVNPVPYSAPSYHQPQYYSAPAPTNYDDTGAYDPRYNDPNFQGNNGGYYQAPAPVQHYQPAP VQYAAPVSNHIAAPAADKYPAGVSPNSCPNYPYCDVNAGHNGPARAAPLPGFTERLYPAGVNPSACPNFPDCPIGQ>gi|194753029|ref|XP_001958821.1| GF12575 [Drosophila ananassae] MFFKLLFASCLALALA KPQHPPAAQYPAGVNPQDCPGFPICDNARLHNPQPQWGAPQPQWQQPQPQWQPQPQPQWQPQPQWQQPQPQWQPQPSWNAAPAPSAGGD KYPAGVNPQTCPNYPYCDVNAGHGGAPVAAPPLPGWTERLYPAGVSPHQCPNFPYCH>gi|195474759|ref|XP_002089657.1| GE22901 [Drosophila yakuba] MFCKLLFATFVALAVA KPQHQPAAQYPAGVNPQDCPNFPICDNARLHNPQPQWGAPQPQWNPQPQPQWNPQPQWQQPQPQWNPQPQPQWQPQPSWNAAPAAAPGG DKYPAGVNPQTCPNYPYCDVNAGHAGAPVAAPPLPGWTERLYPAGVSPHQCPNFPYCN>gi|195030001|ref|XP_001987860.1| GH22145 [Drosophila grimshawi] MFYKLLLASCLALALA KPQHQPAAQYPAGVNPQDCPNFPICDNARLHNPHSQWQQPQPQWQQPQPQWQPQPQQHWQQPQSQWQQPQPQWQPQPAWNAPPAASAGG DKYPAGVNPQTCPNYPYCDVNAGHGGAPVAAPPLPGWTERLYPAGVSPHQCPNFPFCN>gi|221330063|ref|NP_610394.2| CG8736 [Drosophila melanogaster] MFCKLLFATFVALAVA KPQHQPAAQYPAGVNPQDCPNFPICDNARLHNPQPQWGAPQPQWNPQPQPQWNPQPQWQQPQPQWNPQPQPQWQAQPSWNAAPAAAPGG DKYPAGVNPQTCPNYPYCDVNAGHAGAPVAAPPLPGWTERLYPAGVSPHQCPNFPYCN>gi|194863443|ref|XP_001970443.1| GG10631 [Drosophila erecta] MFCKLLFATFVALAVA KPQHQPAAQYPAGVNPQDCPNFPICDNARLHNPQPQWGAPQPQWNAQPQPQWNPQPQPQWQQPQPQWNPQPQPQWQPQPSWNAAPAAAP GGDKYPAGVNPQTCPNYPYCDVNAGHAGAPVAAPPLPGWTERLYPAGVSPHQCPNFPYCN>gi|195148990|ref|XP_002015442.1| GL11020 [Drosophila persimilis] MFFKLLFASCLALALA KPQHPPAAQYPAGVNPQDCPGFPICDNERLHSPKSQWGAPQNQWQPQPQWQQQPQTWQPQPQWQPQPQTWQPQPQWQPQPQPSWNSAPA PAAGGDKYPAGVNPQTCPNYPYCDVNAGHAGGPVAAPPLPGWTERLYPAGVSPHECPNFPYCH>gi|195581589|ref|XP_002080616.1| GD10155 [Drosophila simulans] MFCKLLFATFVALAVA KPQHQPAAQYPAGVNPQDCPNFPICDNARLHNPQPQWGAPQPQWNPQPQPQWNPQPQWQQPQPQWNPQPQPQWQPQPSWNAAPAAAPGG DKYPAGVNPQTCPNYPYCDVNAGHAGAPVAAPPLPGWTERLYPAGVSPHQCPNFPYCN>gi|195332522|ref|XP_002032946.1| GM20677 [Drosophila sechellia] MFCKLLFATFVALAVA 
KPQHQPAAQYPAGVNPQDCPNFPICDNARLHNPQPQWGAPQPQWNPQPQPQWNPQPQWQQPQPQWNPQPQPQWQPQPSWNAAPAGAPGG DKYPAGVNPQTCPNYPYCDVNAGHAGAPVAAPPLPGWTERLYPAGVSPHQCPNFPYCN>gi|195120670|ref|XP_002004844.1| GI19375 [Drosophila mojavensis] MFFKLLFVSCVGVALA KPQLQPAAQYPAGVNPQDCPNFPLCDNARLHNPQSQWQQPQPQWQPQPQWQPQPQPQWQPQPQWQQPQPQWQPQPSWNPAPAPAPGGGD KYPAGINPQTCPNYPYCDVNAGHAAAPVAAPPLPGWTERLYPAGVSPQQCPNFPYCH>gi|195455414|ref|XP_002074713.1| GK23212 [Drosophila willistoni] MFFKLLLFVSCLALTLA KPQHQQAAQYPAGVNPQDCPGFPICDNARLHNPQAHNQWQPQPQWQQPQPQWQQPQQQWQPQPQWQQPQQQWQPQPSWNAAPAPSAGGD KYPAGINPQTCPNYPYCDVNAGHAGAPVAAPPLPGWTERLYPAGVSPHQCPNFPYCQ>gi|195384435|ref|XP_002050923.1| GJ19933 [Drosophila virilis] MFFKLLLASCLALALA KPQHPPAAQYPAGVNPQDCPGFPICDNARLHNPQSQWQQPQPQWQQPQPQWQPQPQQHWQQPQPQWQQPQPQWQPQPSWNAAPAPAAGG DKYPAGINPQTCPNYPYCDVNAGHAAAPVAAPPLPGWTERLYPAGVSPHQCPNFPFCN>gi|499013673|ref|XP_004537594.1| PREDICTED: cuticle protein 1-like [Ceratitis capitata] MFCKLAFISLFAVALA KPQHLPAAQYPAGVNPQDCPGFPICDNARLHNPQAHWGAPAPAWQPQPQWGAPAPAWQPQPHWGAPAAPSWQPAPAPSAGGDKFPAGVN PHTCPNYPFCDVNAGHGAVAAPPLPGWTERQYPAGVSPHQCPNFPYCN>gi|615215794|gb|GBBP01054551.1| TSA: Teleopsis dalmanni Td_comp152510_c0_seq2 MFFKLCLLSVIALTFA KPQHAPAAQFPAGVNPQDCPGFPICDNARLHNPQSNWGAPQPSWNSQPSWNNGQSQWNNNGQWNNNNDDGQWNGGHDQWNNNNDQWNNG GQSSWNNGGQSSWNNGGQSSWNNGGQSSWNNGGQSWNGAPTGGNAGSGQFPAGVNPHSCPNYPFCNINGGGSAPVAAPPLPGWSERQYP AGVSPHQCPNFPYCK>gi|545914203|gb|GANO01004087.1| TSA: Corethrella appendiculata CorSigP-3899 MFSKLIAILATVAAVSA APQHLEAARFPAGVNPAACPGYPNCDNAALHNPQPQWNQWNAPQPQWNAAPQPQWNPAPQPQWNAAPQPQWNQHAEAAPQWDPNTKNNN PLWNVPAAAQQYNYPALTGPATNHLGSGGDKYPAGVNPHTCPNYPYCDTNAGHAGAVRAAPLPGFTERQYPAGVNPHQCPNFPYCS-


>gi|289743532|gb|EZ424238.1| TSA: Glossina morsitans morsitans GM-8604 MFTKLVFFGLMSLALA KPQHLPAAQYPAGVNPQDCPGFPICDNARLHNPAARWQSAPAWKPQPQWSAPHWQAPAPAWNAPAPAWGAPAPAAGGDKFPAGVSPHTC PNYPFCDLHAGAAGAPAPPLPGWTERQYPAGVSPHTCPNFPYCH>gi|557778123|ref|XP_005188694.1| PREDICTED: cuticle protein 1-like [Musca domestica] MFTKLVILSLAAVACA KPQAAQYPAGVNPQDCPGFPICDNARLHNPASRWQQPQPAWQPQPSWQQPQPSWQPQPSWQQPQPSWQPQPQWNAPPAAPGAADKFPAG VSPHTCPNYPFCDVNAGGAAAPAPPLPGFTERQYPAGVSPHTCPNFPYCN>gi|751797740|ref|XM_011210452.1| PREDICTED: Bactrocera dorsalis cuticle protein 1 MFCKLAFISSLIALALA KPQHLPAAQYPAGVNPQDCPGFPICDNARLHNPQAHWGAPAPAWQPQPQWGAPAPSWQQQQQWGAPAPSWQGAPAPSWQGAPAASAGGD KFPAGVNPHTCPNYPFCDVNAGQHGAVAAPPLPGWTERQYPAGVSAHQCPNFPYCN>gi|751475075|ref|XM_011194661.1| PREDICTED: Bactrocera cucurbitae cuticle protein 1 MFCKLAFISSLIALALA KPQHLPAAQYPAGVNPQDCPGFPICDNARLHNPQAHWGAPAPAWQPQPQWGAPAPSWQPQPQWGAPAPSWQGAPAASAGGDKFPAGVNP HTCPNYPFCDVNAGHGAVAAPPLPGWTERQYPAGVSPHSCPNFPYCN-


SUPPLEMENTARY FILE 8 MULTIPLE CPCFC GENES IN LADONA FULVA (Odonata) from WGS BioProject: PRJNA194433

>gi|481388480|gb|APVN01034708.1| Ladona fulva Contig34725, whole genome shotgun sequence

Range 1: 3552 to 3983


QNKRGYVGMLEEAVFIIESGGRQVEGRLCTESVLFFLQMLLLVAAAVAVADKYPAGLNPALCPNYPNCDNALIALHSANPSAVTPYWAA PIAKEYPAGVHPAACPNYPYCGTAVPLAYAAWAPYTREWPAGVHPAACPNYPYCH>gi|481308109|gb|APVN01075567.1| Ladona fulva Contig75636, whole genome shotgun sequence

Range 1: 3296 to 3733

MGSAVAPAAKYPAGVNPHACPNYPYCDNVALAAHSAGAVHAPYAAPAYAHYGVHGPVAAAYPAGVDPHTCPNYPYCDNVAVHTARSAHG WAAPAWTAHGAWAAAPHAAWTGVAHGGWAGAHGVAHGAARYPAGVNPHTCPNYPYCH-


>gi|481308101|gb|APVN01075571.1| Ladona fulva Contig75640, whole genome shotgun sequence

Range 1: 660 to 944


MCVAAVLGGAVFPAARYPAGVNPLACPNYPYCDNVALAANPAGWARSAWNVPAWTAYSAPWNGVWSGVIPGVTPVAARYPPGVDPVACP NYPYCH >gi|481308097|gb|APVN01075573.1| Ladona fulva Contig75642, whole genome shotgun sequence

Range 1: 2724 to 3119

FQIILALCATAVLSTGIPAAKYPAGVSPHTCPNYPYCDNVALAAHAVAPYAAPAYAHYGVHGPVAAAYPAGVDPHTCPNYPYCDNVALA AHVTGAHGVYGAPWAAHGAWAGAAAHYPAGVSPHTCPNYPFCH>gi|481308097|gb|APVN01075573.1| Ladona fulva Contig75642, whole genome shotgun sequence

Range 1: 7900 to 8487


MPELTKEIIKYYCFRSQIALVLCVAIALGCANGQAAKYPAGVSPHLCPNYPHCDNAVLGAAAADSAAHAYSAPVYGGYAAPGYAAPGYA APGYAAPGYAASGHAAVAAVGYPAHVNPHSCPNYPYCGPTPVHVPSKVWAGAAAHGYGATAHNAWAAPAAYAHGGAASHGFSSLAKGGD RYPAGVSPHACPNYPYCH>gi|481308097|gb|APVN01075573.1| Ladona fulva Contig75642, whole genome shotgun sequence

Range 1: 14655 to 15188


METARSNIPHLLFQIILALSAVAFIESVHSQAAKYPAGVDPHLCPNYPHCDNAALGAAASAAGAVAHAYAVPTYEEPSY HSYGAHASSAYASYAPAVPVSEGYPAGVNPHACPNYPFCGPTPTHVPSKIWAGPSAHGYPTPAYSPYGASAAYASPAKGGDRYPDGVDP HSCPNYPYCH>gi|481308089|gb|APVN01075577.1| Ladona fulva Contig75646, whole genome shotgun sequence

Range 1: 4957 to 5610


MSAGASGIMSEKSIIFFFFPQIVVALCIVGALGAAVPEAARYPAGVDPHVCPNYPNCDNVALAARASVPYAPSAPTHYAAPAYGHHAA APVPASLPAGVDARACPNYPYCGPTPVQPQAPSGYSSWSAPAPAPQAPSAYSSWSAPAPQPSWTQPAPQPAWSNPSSQNFWSNPSAPRA AVPNFNWNTPAQSPAPSSEQGGALFPAGVDPSSCPNYPYCH>gi|481308087|gb|APVN01075578.1| Ladona fulva Contig75647, whole genome shotgun sequence

Range 1: 98 to 856

HIGYNLFTHSLSSLISASTILFQFAVALCLLSCTVAQYVNYSPQVVQRPCYDYPNCGNIHRSPQVSNDGQDNIIWPDDGSYPGDTKES QVTEAPGEPGYPAGVNSNLCPNYPFCGPTPVYVKGKASDQAIWSVQATKSQPLSSPLAVHQQTSYKAPAAPSHTQYFVPSRAAPHNDWN GPSVQQVQQVSWTAPAVGKPHPNSLPAVAHNKWSAASIHSPATIPVAPFDYSAEGGVRYPADVDPNSCPNYPYCRV>gi|481308083|gb|APVN01075580.1| Ladona fulva Contig75649, whole genome shotgun sequence

Range 1: 1448 to 1903 LCFKVLVLCVVSAFVNAAPQAARYPAGVDPHTCPNYPNCDNVALAAHATGAAHAPYAAPTYSAYGHSSGVPGAAATLPAGVDARACPNY PYCGPTPVAVPRPHNTWSAPAAYNQWSAPAAPQQWSAPAPAQGGAHYPAGVDPHACPNYPYCS-

>gi|481308083|gb|APVN01075580.1| Ladona fulva Contig75649, whole genome shotgun sequence

Range 1: 10447 to 10989 IVFALCVVSALAAPQAAKYPAGVDPHTCPNYPNCDNVALAAHATGAPYAAPAYSAYGHAAGVPGAAAAYPAGVDPHACPNYPYCGPTPA HVPGAAPHNSWAAPAAHNNWAAPAAHNNWAAPAAHNNWAAPAAHNNWAAPAAHNNWAAPAAHNAWAAPAPAAAAVYPAGVSPHSCPNYP YCS>gi|481308083|gb|APVN01075580.1| Ladona fulva Contig75649, whole genome shotgun sequence

Range 1: 16205 to 16690


IILAVCVVGTFGNPIPLAAKYPAGVNPHACPNYPYCDNALTAHAPYAAPAYAAYGHHGGVPGAAAKYPAGVDPHLCPNYPFCGPAVAHV PGVHGGWEGAGAWAGAHHGWDDGSYHGDDEGTYYGGDDDGSYNHWDDGSYNEWDDGSYNHWDDGSYHGDGHHW>gi|481308079|gb|APVN01075582.1| Ladona fulva Contig75651, whole genome shotgun sequence

Range 1: 488 to 1003

MPCFRQFILVLSAIATASAAISAAKYPAGVDPHTCPNYPNCDNVALAAHATGAPHAPYAAPAYSAYGHAAGVPGAAAA YPAGVDPHACPNYPYCGPTPAHVPGAAPHNSWAAPAAHNPWVAPAVNNAWAAHNGWAAAPATGHDGNHYPAGVSPHSCPNYPYCH-


>gi|481308075|gb|APVN01075584.1| Ladona fulva Contig75653, whole genome shotgun sequence

Range 1: 2969 to 3481


RSTLIRDSKIPIFFFHCSQQLSLILQIVLALCVAGALGGLVPHAAKYPAGVSPHTCPNYPFCDVSAHAVAPYAAHAYAAYGHHAGVPG AAAAYPAGVDPHICPNYPFCGPTPAHHGWAGAGAGAWDDGSYHPWYDNSAHYDDGSYKPWLDNAGHNDDGSYRPWQYGGHHHW>gi|481265194|gb|APVN01098002.1| Ladona fulva Contig98099, whole genome shotgun sequence

Range 1: 4925 to 5497


MTNGSRHFRLLLQIILALCVAGALGSAIPAAAKYPAGVNPHTCPNYPYCDNVAVAAHAAHGAYGAHAAAPYAAHAYATYGHHAGVPGAA AKYPAGVDPHACPNYPFCGPTPAHVPGAHGAWAGHSASAGAAHNAWAGAHGGWAGAHGGWAGAHHGGWDDGSYHGEDDGQYHHWDDGSH WDDGSYHGDHHHW-


SUPPLEMENTARY FILE 9 CPCFC FAMILY MEMBERS IN CRUSTACEA

Ostracoda (1 species?; 2 sequences)

>gi|333210927|gb|JL247049.1| TSA: Cypridininae sp. BMR-2011 mRNA sequence MMSFRLLVASILFTCALS KVIFPAGVNPAACPNFPFCDALIDPVTGNQVAPAENYPGYVPYNLKYPAGLIPAACPDFPYCTGRADRLLRFVNTGRQEVPAGINPAGW YA>gi|333231688|gb|JL267722.1| TSA: Cypridininae sp. BMR-2011 mRNA sequence MSLILLVASCLLATSMA LPRRLPPGVSLVGCPQWPICDPLIDPLTGASRGDPKDFPGYVPLKLKNPIGLSVLSCPDYPFCRGRAERQLQFLVTGRQQVPADVDPAL WYS-

Malacostraca (4 species, 6 sequences) motif in all ends C-X(7)-C

>gi|510192250|gb|GAKD01008615.1| TSA: Melita plumulosa mira_rep_c8871 MSFIRATCLVLLVAVSLSTALP QELPAGVTAAECPNFPFFNCSPLLKAVAPAGQPAPSAAAVAAGAPAPKNPAGVKCFNFPFFPCNP>gi|510079550|gb|GAJQ01012904.1| TSA: Hyalella azteca contig16944 MAGLKIFTFCSALLVTLA ATNTVATPVLPAGVTAAECPNFPFFNCSPLLKAVVPESLAPAPAVPSPNAPPQQPANPAGVRCFNFPFFPCSP>gi|510070149|gb|GAJP01004516.1| TSA: Hyalella azteca contig06745 transcribed RNA sequence MAGLKIFAFCSALLVTLA ATNTVATPVLPAGVTAAECPNFPFFNCPLLKAVVPESLAPAPAVTSPNAPPQQPANPAGVRCFNFPFFPCSP>gi|742949070|gb|GARH01025610.1| TSA: Procambarus clarkii Prcla_ES_994_0 transcribed RNA sequence MRSLVVVLVVVVVMVVVVVVG YPSQLPAGVTAADCPTYPFFPCRVPVQPAQPANPASVTCFNFPFYSC>gi|170194750|gb|FE752277.1|FE752277 CAYF2211.g3 CAYF Petrolisthes cinctipes MRFNIAMVMMVVVVVVMVGVAMA LPANLPASVSAADCPGYPFYSCRQPAHPPQPANPAGVTCYNFPFYHCS>gi|170215384|gb|FE772151.1|FE772151 CCAG6495.b3 CCAG Petrolisthes cinctipes MRFNIAMVVVVVMMGVAMA LPANLPASVSAADCPGYPFYSCRQPAHPPQPANPAGVTCYNFPFYHCS-
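The spacing noted for Malacostraca, C-X(7)-C, can be contrasted with the C-X(5)-C spacing seen in the hexapod sequences (e.g., CPNYPYC) by a simple pattern count. The Python sketch below is illustrative only: `count_spaced_cysteines` is a hypothetical helper, and the two sequences are copied from the Blaberus craniifer entry in Supplementary File 7 and from the Procambarus clarkii entry above (signal peptide omitted).

```python
import re


def count_spaced_cysteines(sequence, spacing):
    """Count C-X(spacing)-C occurrences, including overlapping ones."""
    # The lookahead matches at every start position without consuming
    # characters, so overlapping motifs are all counted.
    pattern = re.compile(r"(?=C[A-Z]{%d}C)" % spacing)
    return len(pattern.findall(sequence))


# Sequences copied from the listings in this document.
BLABERUS = (
    "QADKYPAGLNPALCPNYPNCDNALIALYSNVAPAIPYAAAYNYPAGVSPAACPNYPFC"
    "GAIAPLGYHVREYPAGVHPAACPNYPYCV"
)
PROCAMBARUS = "YPSQLPAGVTAADCPTYPFFPCRVPVQPAQPANPASVTCFNFPFYSC"

for name, seq in (("Blaberus craniifer", BLABERUS),
                  ("Procambarus clarkii", PROCAMBARUS)):
    print(name,
          "C-X(5)-C:", count_spaced_cysteines(seq, 5),
          "C-X(7)-C:", count_spaced_cysteines(seq, 7))
```

A regular expression is used here only because the motif is a fixed-spacing pattern; a search over profile or alignment data would need a different tool.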

Maxillopoda (3 species, 3 sequences) >gi|218457166|gb|FM882955.1|FM882955 FM882955 BA23840_2 Amphibalanus amphitrite cDNA clone 08_F12, mRNA sequence MLSLSVLVALVAVCSA QPVEYPEGVSPAACPNYPYCGTDANTLAAIQLASLAPSVRQYPAGVSAAACPNYPDCGSNSAIVSPAGVPLTRQYPAGVSAAACPNYPD CGSNSAIVSPAGVPLTRQYPAGVSAAACPNFPDCGSNSAIVSPAGVPLTRQYPAGVSAAACPNYPHC>gi|592916158|gb|GAXK01042217.1| TSA: Calanus finmarchicus comp299568_c0_seq1 MLAKIVSLCLMSTLVSG QAAQWPAGVSPAACPNYPDCSLTPAGYAGAVSAYPAGVLPAACPDYPYCTAAPAAPAGYVNTAGYPAGVAAAACPNFPYCY>gi|597435325|gb|GARW01001542.1| TSA: Eucyclops serrulatus MFTKIAVFAALFCLAAG QILLPAGVDPAVCPNYPYCDGVTVAPNLPAAAYPADVAPAACPDYPFCSSRAAAPIGYANTAGWPAGVAPAACPNFPYC-


Remipedia (1 species, 2 sequences)

>gi|333322291|gb|JL185402.1| TSA: Speleonectes cf. tulumensis BMR-2011 DMPC15238678 MLKFVVLLVLMATHLAMSHPV QYPAGVSPHECPNYPFCIRHPSHATANIERNFPSNILPAVCPNYPFCDNELLVQYL>gi|333269853|gb|JL132964.1| TSA: Speleonectes cf. tulumensis BMR-2011 DMPC15226770 MYKLITILVVVAVALAKP QEYPSGVNPATCPNYPFCTYEGVPSMLTHPAGVHHTVCPNYPFCTNTPQAYASVLPSFKYPAGVNPAVCSNYPYCG-


SUPPLEMENTARY FILE 10 NON-ARTHROPOD TSA HITS and their most closely related Arthropod source. Initiator methionines are highlighted in green, final aa in red. The CPCFC consensus, marked only in the Arthropod sequence, except for the daisy, is in gray.


Homo sapiens: Note that there is 100% identity in the protein coding sequence (between green and red) and extensive identity in both the 5’ and 3’ UTRs.


>gi|389142709|gb|HY131203.1|HY131203 HY131203 RIKEN full-length enriched human cDNA library, brain Homo sapiens cDNA clone H06D096C22, mRNA sequence
Blattella germanica 1: gi|698758469|gb|GBID01001268.1| TSA: Blattella germanica Contig1280
Blattella germanica 2: gi|698757539|gb|GBID01002198.1| TSA: Blattella germanica Contig2218

Blatellagermanica1    ---------LYSFI-HFRSKSSTITMYCKLVVLAAIVAVAVAQADKYPAGLSPALCPNYP
Blatellagermanica2    ---------LYSFI-HFRSKSSTITMYCKLVVLAAIVAVAVAQADKYPAGLSPALCPNYP
humanbrainHY131203.1  HCRPVSIELALSFI-HFRSKSSTITMYCKLVVLAAIVAVAVAQADKYPAGLSPALCPNYP
                                 *** *********************************************

Blatellagermanica1    HCDNALIALYSNNAPAVPYASAYNYPAGVSPAACPNYPFCGALAPLGYHVREYPAGVSPA
Blatellagermanica2    HCDNALIALYSNNAPAVPYASAYNYPAGVSPAACPNYPFCGALAPLGYHVREYPAGVSPA
humanbrainHY131203.1  HCDNALIALYSNNAPAVPYASAYNYPAGVSPAACPNYPFCGALAPLGYHVREYPAGVSPA
                      ************************************************************

Blatellagermanica1    ACPNYPYCH-IDLGI-SLGVA-CRKDSVLRVEMLLEQCIYTSICGEKFSYECYLSKIITR
Blatellagermanica2    ACPNYPYCH-IDLGI-SLGVA-CRKDSVLRVEMLLEQCIYTSIRGEKFSYECYLSKIITR
humanbrainHY131203.1  ACPNYPYCH-IDLGI-SLGFA-CRKDSVLRVEILLEQCIYTSISGEKLVFSAI-LXS---
                      ********* ***** ***.* **********:********** ***: :..

The daisy, Karelinia caspia:


Perfect match to TSA: Bemisia tabaci BT_B_ZJU_Singletons82302 Sequence ID: gb|HP653972.1 that covers truncated coding sequence of the whitefly and extends into the 5’UTR. Daisy sequence completes the whitefly sequence, and there is identity in the 5’UTR.

>gi|675980905|gb|GANI01023091.1| TSA: Karelinia caspia comp33299_c0_seq1
>gi|319670146|gb|HP653972.1| TSA: Bemisia tabaci BT_B_ZJU_Singletons82302 mRNA sequence

Bemisia    ---KKKKFRGFQPTVQHCSV-FRFPVFSVTTTNHKMIGKLVVLSALVAVVLAQAQQWPAG
Karelinia  LAQAQQWLRGFQPTVQHCSV-FRFPVFSVTTTNHKMIGKLVVLSALVAVVLAQAQQWPAG
               :: :************ ***************************************

Bemisia    LNPAACPNYPNCDNTVVALYGGLPYAPAASVGRSYPAGVPAAA-----------------
Karelinia  LNPAACPNYPNCDNTVVALYGGLPYAPAASVGRSYPAGVPAAACPNYPFCGSAAAPAGYV
           *******************************************

Bemisia    --------------------
Karelinia  AREYPAGVPAAACPNYPYC-


The hop, Humulus lupulus >gi|422196604|gb|GAAW01027316.1| TSA: Humulus lupulus comp42311_c0_seq1 used as query.


Best match (Sbjct) was to Bactrocera oleae, the olive fruit fly. >gi|510288382|gb|GAKB01003870.1| TSA: Bactrocera oleae contig03870 transcribed RNA sequence. Only the mature protein is shown. The top four matches were all to members of the genus Bactrocera.

Score: 106 bits(264); Expect: 4e-25; Method: Compositional matrix adjust.; Identities: 75/149(50%); Positives: 89/149(59%); Gaps: 19/149(12%); Frame: +2

Query  1    AQYPAGVNPHLCPNYPHCDNALLGLHAQNAAAAAAAPAHNAYPYANPNPYGNPNPYGNPN  60
            AQYPAGVNP  CP +P CDNA L  H   A   A APA    P      +G P P   P
Sbjct  230  AQYPAGVNPQDCPGFPICDNARL--HNPQAHWGAPAPAWQPQPQ-----WGAPAPSWQPQ  388

Query  61   PYAVPAVPSYVPNHLGVPA---HGQPAAA----QYPAGVSPHECPNYPYCSNHPGAGGPV  113
            P      PS+     G PA   +G PAA+    ++PAGV+PH CPNYP+C  + G G  V
Sbjct  389  PQWGAPAPSW----QGAPASSWNGAPAASAGGDKFPAGVNPHTCPNYPFCDVNAGHGA-V  553

Query  114  AAPPLPGFSSRQYPDGVSPHACPNYPYCH  142
            AAPPLPG++ RQYP GVSPH CPN+PYC+
Sbjct  554  AAPPLPGWTERQYPAGVSPHQCPNFPYCN  640

Chinese salamander, Hynobius chinensis:


gi|570852487|gb|GAQK01079415.1| TSA: Hynobius chinensis comp4708_c0_seq1 The best match (Sbjct) indicates that the sequence that had been attributed to the salamander (used as the query against TSA) actually comes from a chironomid related to gi|401007996|gb|KA191234.1| TSA: Chironomus riparius CripIT16530 mRNA sequence.

Score: 229 bits(583); Expect: 6e-73; Method: Compositional matrix adjust.; Identities: 142/162(88%); Positives: 146/162(90%); Gaps: 5/162(3%); Frame: +1

Query  1    MFKLVTFVTLFAVAFSAPQHAAKYPAGVDPAKCPGFPICDNAALHAPAHAPAYNHWDQPA  60
            MFKLVTFVTLFAVAFSAPQHAAKYPAGVDP+KCP FPICDNAALHA A  PAYNHWDQPA
Sbjct  85   MFKLVTFVTLFAVAFSAPQHAAKYPAGVDPSKCPNFPICDNAALHAKA--PAYNHWDQPA  258

Query  61   NHWSPPAPAYNHWDHQPAAHWNQPAPQWN---QYNHVAPAAPKAAAKYPAGVDPSKCPNF  117
             HW+ PA AYNHWDHQPAA    PAPQWN   QYNHVAPAAPKAAAKYPAGVDP  CP+F
Sbjct  259  AHWNQPAQAYNHWDHQPAAPQWAPAPQWNNAAQYNHVAPAAPKAAAKYPAGVDPRSCPDF  438

Query  118  PYCPTPVLPGHHAHHVAPLPGFTERLYPAGVSAHTCPNYPEC  159
            PYCPTP+LPGHHAHHVAPLPGFTERLYPAGVSAHTCPNYPEC
Sbjct  439  PYCPTPILPGHHAHHVAPLPGFTERLYPAGVSAHTCPNYPEC  564
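The identity figures reported with these alignments are column counts over the aligned rows. As a minimal sketch, the Python function below (a hypothetical helper, not part of BLAST) counts identical and gapped columns for the final block of the salamander alignment, Query 118-159 against Sbjct 439-564. The "Positives" value is not reproduced because it depends on a substitution matrix.

```python
def alignment_stats(query, subject, gap="-"):
    """Return (identities, gap_columns, aligned_length) for two aligned rows."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have the same length")
    # A column is an identity when both rows carry the same residue.
    identities = sum(1 for q, s in zip(query, subject) if q == s and q != gap)
    # A column is a gap column when either row carries the gap symbol.
    gap_columns = sum(1 for q, s in zip(query, subject) if q == gap or s == gap)
    return identities, gap_columns, len(query)


# Rows copied from the last block of the salamander/chironomid alignment.
query = "PYCPTPVLPGHHAHHVAPLPGFTERLYPAGVSAHTCPNYPEC"
subject = "PYCPTPILPGHHAHHVAPLPGFTERLYPAGVSAHTCPNYPEC"

identities, gap_columns, length = alignment_stats(query, subject)
print(f"{identities}/{length} identical columns, {gap_columns} gap columns")
```

Applying the same count to every block and summing the results is how a whole-alignment figure such as 142/162 would be assembled.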


Daphnia pulex (library 12)

There are extensive EST data for Daphnia, from libraries where Daphnia had been subjected to challenging environments. Interestingly, the only matches to CPCFC queries were from library 12 that came from animals exposed to the midge, Corethrella appendiculata.

Subject: TSA: Corethrella appendiculata CorSigP-3899 mRNA sequence


>Daphnia pulex FE342003.1 -- query MFSKVIVLCATLAVSFA APQHQEAARYPAGVNPAACPSYPNCDNAALHNPQPQNHQQNHWNPSWNAAPAPYTAPNHYSPPAPAYNHEQNQWNPSWNAPALHGPAHN YLGNPAPASAPTGGDKYPAGVNPQSCPNYPFCDNSAPAGHQQ VAPLPRFTERQYPAGVNPHTCPNFPYCN

Score: 186 bits(472); Expect: 2e-56; Method: Compositional matrix adjust.; Identities: 113/200(57%); Positives: 129/200(64%); Gaps: 32/200(16%); Frame: +1

Query  1    MFSKVIVLCATLAVSFAAPQHQEAARYPAGVNPAACPSYPNCDNAALHNPQPQNHQQNHW  60
            MFSK+I + AT+A   AAPQH EAAR+PAGVNPAACP YPNCDNAALHNPQPQ +Q N
Sbjct  1    MFSKLIAILATVAAVSAAPQHLEAARFPAGVNPAACPGYPNCDNAALHNPQPQWNQWNAP  180

Query  61   NPSWNAAPAPYTAP--------------NHYSPPAPAYNHEQNQWNPSWNA---------  97
             P WNAAP P   P              N++   AP ++      NP WN
Sbjct  181  QPQWNAAPQPQWNPAPQPQWNAAPQPQWNQHAEAAPQWDPNTKNNNPLWNVPAAAQQYNY  360

Query  98   PALHGPAHNYLGNPAPASAPTGGDKYPAGVNPQSCPNYPFCD-NSAPAGHQQVAPLPRFT  156
            PAL GPA N+LG        +GGDKYPAGVNP +CPNYP+CD N+  AG  + APLP FT
Sbjct  361  PALTGPATNHLG--------SGGDKYPAGVNPHTCPNYPYCDTNAGHAGAVRAAPLPGFT  516

Query  157  ERQYPAGVNPHTCPNFPYCN  176
            ERQYPAGVNPH CPNFPYC+
Sbjct  517  ERQYPAGVNPHQCPNFPYCS  576