Development of expressed sequenced tags (EST) to identify some pathogen resistance genes expressed in Gossypium arboreum


Accepted Manuscript Development of expressed sequenced tags (EST) to identify some pathogen resistance genes expressed in Gossypium arboreum

Rakhshanda Mushtaq, Khurram Shahzad, Zahid Hussain Shah, Hameed Alsamadany, Tahir Mujtaba, Yahya Al-Zahrani, Hind A.S. Alzahrani, Zaheer Ahmed, Shahid Mansoor, Aftab Bashir

PII: S2452-0144(19)30039-1
DOI: https://doi.org/10.1016/j.genrep.2019.100397
Article Number: 100397
Reference: GENREP 100397

To appear in: Gene Reports

Received date: 6 March 2019
Accepted date: 22 March 2019

Please cite this article as: R. Mushtaq, K. Shahzad, Z.H. Shah, et al., Development of expressed sequenced tags (EST) to identify some pathogen resistance genes expressed in Gossypium arboreum, Gene Reports, https://doi.org/10.1016/j.genrep.2019.100397

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Title: Development of expressed sequenced tags (EST) to identify some pathogen resistance genes expressed in Gossypium arboreum.

Rakhshanda Mushtaq1, Khurram Shahzad2*, Zahid Hussain Shah3, Hameed Alsamadany4, Tahir Mujtaba5, Yahya Al-Zahrani4, Hind A. S. Alzahrani6, Zaheer Ahmed7, Shahid Mansoor8, Aftab Bashir8

1 Department of Biotechnology, Pakistan Institute of Engineering and Applied Sciences, Nilore, Islamabad, Pakistan.
2 Department of Plant Breeding and Genetics, The University of Haripur, Pakistan.
3 Department of Plant Breeding and Genetics, Pir Mehr Ali Shah Arid Agriculture University, Rawalpindi.
4 Department of Biological Sciences, King Abdulaziz University, Jeddah, Saudi Arabia.
5 Plant and Forest Biotechnology Umea, Plant Science Centre (UPSC), Swedish University of Agriculture Sciences (SLU), Umea, Sweden.
6 College of Science, Imam Abdulrahman bin Faisal University, Dammam, Saudi Arabia.
7 Department of Plant Breeding and Genetics, University of Agriculture Faisalabad, Pakistan.
8 National Institute of Biology and Genetic Engineering (NIBGE), Faisalabad, Pakistan.


*Corresponding authors: [email protected]; [email protected]

1. Abstract

Cotton, the most important fibre crop, faces a major threat from a viral disease caused by cotton leaf curl virus (CLCuV). The cotton species Gossypium arboreum is resistant to this disease, and cotton scientists are working to find the key genes in G. arboreum that confer resistance against cotton leaf curl disease (CLCuD). The current work is an effort to find potential biotic stress-related resistance genes in G. arboreum and to evaluate them against CLCuV infection using functional genomics approaches. A leaf cDNA library was constructed from field-grown G. arboreum and used to identify and isolate clones involved in resistance against CLCuD. The clone sequences were used to establish expressed sequence tags (EST). The EST represented some important biotic stress resistance genes, such as lipoxygenase, cytochrome P450, a CPMMV-like coat protein, serine/threonine kinase, a resistance gene analog (RGA), lipid transfer protein and ubiquitin-conjugating enzyme E2. As cotton is a fiber crop, some trichome development genes, such as aquaporin, arabinogalactans and cellulose synthase, were also found. Lipoxygenases (LOX) are known to be involved in apoptosis and in biotic and abiotic stress responses in plants. Here, members of the LOX family are identified in biotic stress-resistant G. arboreum, whose genome encodes 13 LOX proteins. The G. arboreum LOXs are validated based on protein alignment studies. This is the first report in which a number of LOXs are identified in cotton, which may help to better understand apoptosis and responses to biotic and abiotic stresses in naturally resistant G. arboreum.

Key words: Resistant Cotton, EST, TM-Pred, LOX genes, VecScreen, CLCuV

2. Introduction

Messenger RNAs (mRNAs) are the gene transcripts expressed in all types of cells. The information coded in single-stranded mRNA can be read by sequencing the double-stranded complementary DNA (cDNA), produced by reverse transcription, that is present in cDNA libraries. EST are sequence reads from cDNA library clones and represent the information encoded in mRNA (Rahman et al., 2017); cDNA libraries of cells, tissues and organisms are therefore made to generate the respective EST. Clones from cDNA libraries are randomly selected and sequenced from either the 5' or the 3' end to yield an EST sequence. An EST sequence consists of 5' and 3' end vector sequences flanking a highly informative middle sequence, usually 200-800 bp in length (Nagaraj et al., 2006).

EST are an important resource for transcriptome exploration and a tool for gene discovery (Xu et al., 2011), molecular marker identification (Choudhary et al., 2012), microarray development (Sousounis et al., 2012) and comparative genomics (Shangguan et al., 2013). EST studies are especially important in organisms whose full genomes have not been sequenced. In 1991, EST data were used for human gene discovery before sequencing of the human genome was complete. Van der Hoeven et al. (2002) inferred the organization of the tomato genome with the help of EST analysis and predicted 35,000 genes concentrated in the euchromatic region of the genome. EST and genome survey sequences are also used for the computational identification and functional characterization of microRNAs (Devi et al., 2016; Farooq et al., 2017).

An EST faithfully reflects the mRNA it derives from, but it is highly error prone and requires preprocessing. In particular, the 5' and 3' ends of an EST contain vector sequence and are less readable, whereas the sequence quality of the middle portion is generally better (Aaronson et al., 1996).
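The preprocessing requirement can be illustrated with a minimal Python sketch. It assumes the flanking vector sequences are known and occur exactly in the read (the `VECTOR_5P`/`VECTOR_3P` strings and both thresholds are hypothetical placeholders, not values used in this study), and it covers only vector trimming and removal of short or ambiguous reads; tools such as VecScreen instead detect vector by alignment and tolerate mismatches.

```python
# Minimal EST preprocessing sketch: trim known vector flanks, then drop reads
# that are too short or too ambiguous. VECTOR_5P / VECTOR_3P are hypothetical
# placeholders, not the vector actually used for the G. arboreum library.
VECTOR_5P = "GAATTCGGCACGAG"
VECTOR_3P = "CTCGTGCCGAATTC"

MIN_LENGTH = 100        # assumed minimum usable insert length (bp)
MAX_N_FRACTION = 0.05   # assumed maximum fraction of ambiguous (N) calls


def trim_vector(read: str) -> str:
    """Drop everything up to the end of the 5' flank and from the 3' flank onward."""
    seq = read.upper()
    start = seq.find(VECTOR_5P)
    if start != -1:
        seq = seq[start + len(VECTOR_5P):]
    end = seq.find(VECTOR_3P)
    if end != -1:
        seq = seq[:end]
    return seq


def passes_quality(seq: str) -> bool:
    """Keep inserts that are long enough and not dominated by N calls."""
    if len(seq) < MIN_LENGTH:
        return False
    return seq.count("N") / len(seq) <= MAX_N_FRACTION


def preprocess(reads: dict[str, str]) -> dict[str, str]:
    """Return {read_id: trimmed insert} for reads that survive filtering."""
    cleaned = {}
    for read_id, read in reads.items():
        insert = trim_vector(read)
        if passes_quality(insert):
            cleaned[read_id] = insert
    return cleaned


if __name__ == "__main__":
    demo = {
        "est_long": VECTOR_5P + "ATG" * 50 + VECTOR_3P,   # 150 bp insert
        "est_short": VECTOR_5P + "ATGC" + VECTOR_3P,      # 4 bp insert
    }
    print(sorted(preprocess(demo)))  # ['est_long']
```

Exact string matching keeps the sketch short; a partial or mismatched vector flank would be missed, which is why alignment-based screening is preferred in practice.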

Raw EST sequences are analyzed in the following steps: (1) the 5' and 3' end vector sequences are trimmed; (2) low-quality and very short sequences are deleted; (3) only high-quality EST are clustered to generate consensus sequences; (4) EST are BLAST searched against a DNA database to identify similarities and assign putative functions; and (5) protein translations are searched against a protein database to relate putative functions. The analysis is finalized by functional annotation, followed by visualization and interpretation of the results (Nagaraj et al., 2006).

The largest repository of EST data is dbEST (76,034,178 EST from 28,952 bio-samples as of October 2015). dbEST contains 337,811 EST of G. hirsutum, 64,798 EST of G. arboreum, 63,577 EST of G. raimondii and 39,115 EST of G. barbadense. Recently, the draft genomes of G. raimondii (D genome), G. arboreum (A genome) and G. hirsutum have been sequenced and assembled (Wang et al., 2012; Li et al., 2014; Tan and Wu et al., 2012). G. hirsutum (an allotetraploid with the AD genome), the major cotton species cultivated in Pakistan owing to its better fiber quality, is vulnerable to many biotic and abiotic stresses, including CLCuV (Rahman et al., 2017). In contrast, G. arboreum (a diploid A-genome species) is immune to many stresses, including cotton leaf curl virus disease (Akhtar et al., 2013). It is postulated that some genes encoded by the A genome are suppressed in tetraploid cotton, or alternatively that some genes from the D genome are overexpressed, making the species susceptible to viral and other diseases. There has been no attempt yet to look for the

differentially expressed genes in the two cotton species and to identify the key genes responsible for resistance to CLCuD. R genes conferring resistance to some cotton pathogens have been isolated from the cotton genome using degenerate primers designed on R genes from other plant genera (Tan et al., 2003), but there is no report on the identification of R genes for CLCuV resistance. The current research aimed to explore the resistance mechanisms of the naturally resistant cotton species G. arboreum by developing expressed sequence tags (EST) to identify some pathogen resistance genes expressed in G. arboreum. The EST generated through this work were further screened to identify potential genes involved in host-mediated resistance.

3. Materials and Methods

3.1 Sequencing of Clones from G. arboreum Leaf cDNA Library

A high-quality leaf cDNA library of G. arboreum was available at the Gene Isolation Lab, NIBGE, Faisalabad, Pakistan. The library aliquot was spread … - … C. The clones were streaked on LB agar plates containing antibiotic and incubated overnight to obtain bacterial colonies. A single colony was then carefully picked and cultured in LB liquid medium for sixteen hours. The next day, the culture was harvested to purify plasmid by the miniprep method using a Fermentas plasmid isolation kit. The plasmid concentration was checked by gel electrophoresis of 2 µl of plasmid. The plasmids were digested with the restriction enzymes EcoRI and HindIII and run on an agarose gel. Only clones with insert sizes greater than 750 bp were selected and sent for DNA sequencing by the Sanger method.

3.2 Removal of Vector Sequences

Vector sequences contained in the EST sequences were identified using the NCBI program VecScreen (described at http://ncbi.nlm.nih.gov/VecScreen/VecScreen.html). EST regions identified by VecScreen as strong or moderate matches to vector sequences were removed. Regions that VecScreen classified as weak matches to vector, or as segments of suspect origin, were removed only if the EST showed additional evidence of vector contamination.

3.3 Categorization of EST

The sequences were BLAST searched at NCBI for homology-based categorization. About 80 % of the sequences belonged to constitutively expressed genes that carry out basic cellular functions; these housekeeping genes were not selected for further analysis.

3.4 Creation of Translated Protein Sequences for New Gene Candidates

The nucleotide sequence of each EST was translated into the corresponding protein sequence using the Translate tool at www.expasy.ch.
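Translation of an EST can be reproduced offline with a short Python sketch of the standard genetic code (NCBI translation table 1). The helper names are hypothetical and this is not the ExPASy implementation: it scans only the three forward frames (ExPASy Translate also reports the three reverse frames) and returns the longest Met-initiated stretch, allowing it to run off the end of the read, as happens with 3'-truncated EST.

```python
from itertools import product

# Standard genetic code (translation table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq: str, frame: int = 0) -> str:
    """Translate one forward frame; '*' marks stops, 'X' ambiguous codons."""
    seq = seq.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def longest_orf(seq: str) -> tuple[int, str]:
    """Return (frame, peptide) for the longest ATG-initiated ORF in frames 0-2.

    The peptide excludes the stop codon and may be open at the 3' end.
    """
    best_frame, best_peptide = 0, ""
    for frame in range(3):
        for segment in translate(seq, frame).split("*"):
            start = segment.find("M")
            if start != -1 and len(segment) - start > len(best_peptide):
                best_frame, best_peptide = frame, segment[start:]
    return best_frame, best_peptide


if __name__ == "__main__":
    print(longest_orf("CCATGAAATTTTAGG"))  # (2, 'MKF')
```

Building the codon table from the 64-letter string avoids typing every codon by hand; adding the reverse frames would only require translating the reverse complement as well.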

3.5 Predictions of Organelle Targeting and Transmembrane Topology for Putative Genes

The Predotar tool at Expasy.ch was used to determine organelle targeting, and transmembrane topology was predicted with the TM-Pred tool at Expasy.ch.
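TM-Pred scores candidate helices with statistical weight matrices, which are not reproduced here. As a simpler, swapped-in illustration of the same idea, the hypothetical sketch below flags windows whose mean Kyte-Doolittle hydropathy exceeds a threshold; the 19-residue window and 1.6 cutoff are values commonly cited for this method and are assumptions, not the settings used in this study.

```python
# Kyte-Doolittle hydropathy scale (higher = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full window; unknown residues (e.g. X) score 0."""
    scores = [KYTE_DOOLITTLE.get(aa, 0.0) for aa in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_tm_segments(protein: str, window: int = 19,
                          threshold: float = 1.6) -> list[tuple[int, int]]:
    """Merged (start, end) residue ranges, 0-based and end-exclusive, covered
    by windows whose mean hydropathy exceeds the threshold."""
    segments: list[tuple[int, int]] = []
    for i, score in enumerate(hydropathy_profile(protein, window)):
        if score > threshold:
            start, end = i, i + window
            if segments and start <= segments[-1][1]:
                segments[-1] = (segments[-1][0], end)  # extend overlapping run
            else:
                segments.append((start, end))
    return segments


if __name__ == "__main__":
    # Toy sequence: a 20-residue hydrophobic stretch between charged flanks.
    print(candidate_tm_segments("D" * 10 + "L" * 20 + "D" * 10))  # [(5, 35)]
```

Because the reported range is the union of passing windows, it extends a few residues beyond the hydrophobic core; dedicated predictors refine the boundaries and also estimate helix orientation.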

3.6 Alignment Construction

Protein sequence alignments were constructed using CLC Genomics Workbench (https://www.qiagenbioinformatics.com/products/clc-genomics-workbench/).
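CLC Genomics Workbench is a commercial package, so its exact alignment algorithm and scoring are not reproduced here. To illustrate what a pairwise alignment and a percent-identity value mean, the following is a textbook Needleman-Wunsch global alignment with a simple match/mismatch/linear-gap scheme; the scores are assumed for illustration and are not the substitution matrices used by BLAST or CLC, and percent identity is computed over the full alignment length (definitions vary between tools).

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[str, str]:
    """Global alignment with linear gap penalties; returns two gapped strings."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            pair = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + pair:
                top.append(a[i - 1])
                bottom.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical non-gap positions divided by alignment length, times 100."""
    identical = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * identical / len(aligned_a)


if __name__ == "__main__":
    top, bottom = needleman_wunsch("ACGT", "AGT")
    print(top, bottom, percent_identity(top, bottom))  # ACGT A-GT 75.0
```

Multiple alignments such as those built for the LOX proteins extend this pairwise dynamic programming idea, typically by progressively aligning sequences along a guide tree.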

3.7 EST Accessibility

The EST for putative genes have been submitted to GenBank under the accession numbers given in the Results section.

4. Results

EST analysis identified some important EST, as shown in Table 1. Almost 80 % of the gene sequences were from plant housekeeping genes. Genes in several important categories were found while screening the G. arboreum leaf cDNA library; these are discussed as follows.

4.1 Plant Defense Related Genes

The EST named MP31, ARB65, MP14, ARB288, ARB32, ARB72, ARB124, ARB230, MP49, ARB150 and RM4 encode important genes that might play a role in plant resistance against various biotic stresses. The characterization of these EST is described below.
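The E-values quoted in this section are BLAST expectation values. Under the Karlin-Altschul statistics used by BLAST, the number of alignments with score at least S expected by chance when a query of length m is searched against a database of total length n is

$$E = K\,m\,n\,e^{-\lambda S}$$

where K and λ are constants determined by the scoring system. An E-value of 2e-100, for example, means that about 2 × 10^-100 alignments scoring this well would be expected by chance, so smaller E-values indicate more significant matches.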

Sequence Analysis of an EST for Lipoxygenase

MP31 is a 3' end EST, 731 bp long after removal of the 5' and 3' end vector sequences. Translation with the ExPASy Translate tool gave an ORF of 231 amino acids spanning 698 bp. BLASTn and BLASTX showed strong sequence similarity with the lipoxygenases of other plant species. BLASTn showed 84 % identity with Theobroma cacao LOX (XM_007049521.1) with an E-value of 2e-100. BLASTX against the non-redundant protein database (nr) at NCBI showed that this sequence contains a lipoxygenase domain and has 56 % sequence similarity with Ricinus communis lipoxygenase (XP_002527266.1) with an E-value of 3e-75. MP31 has 68 % identity with Arabidopsis thaliana LOX-5 (NM_113137.3) with an E-value of 3e-17, and 29 % identity with the human lipoxygenase 3 gene (CAC12843.1) with an E-value of 3e-20. No transmembrane region was found in the MP31 protein sequence.

MP31 was used here for the identification of LOX genes in cotton. The cotton genome encodes thirteen LOX proteins (Table 2). Most of them are 855-982 aa long, with the exception of locus LOC_108468402 (516 aa). Five LOX are located on chromosome 3, two on each of chromosomes 5 and 7, and one on each of chromosomes 1, 4 and 13. Four of the LOX are annotated to contain chloroplast targeting sequences on the basis of SignalP results (Table 2). Table 3 gives the percent identity/similarity of the cotton LOX against Arabidopsis LOX proteins; only the best hit in the Arabidopsis protein database was considered when preparing Table 3. Interestingly, all the cotton LOX showed a high level of protein similarity with known Arabidopsis LOXs, supporting their identification as LOX genes newly annotated at the genomic level in cotton. The protein alignment (Figure 4) shows that the LOX sequences are relatively well conserved in cotton at the C terminus.
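One simple way to quantify the observation that the LOX sequences are better conserved towards the C terminus is to score each column of the multiple alignment by the fraction of sequences sharing its most common residue, then compare mean scores between regions. The sketch below is a hypothetical illustration, not the conservation measure of CLC Genomics Workbench; it assumes the alignment has been exported as equal-length gapped strings, and the toy sequences are invented, not cotton LOX data.

```python
from collections import Counter


def column_conservation(alignment: list[str]) -> list[float]:
    """Per column: count of the most frequent non-gap residue divided by the
    number of sequences (gaps count as non-conserved). Alignment must be non-empty."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    scores = []
    for col in range(length):
        residues = [seq[col] for seq in alignment if seq[col] != "-"]
        if residues:
            top_count = Counter(residues).most_common(1)[0][1]
            scores.append(top_count / len(alignment))
        else:
            scores.append(0.0)  # all-gap column
    return scores


def mean_conservation(scores: list[float], start: int, end: int) -> float:
    """Mean conservation over alignment columns start..end-1."""
    region = scores[start:end]
    return sum(region) / len(region)


if __name__ == "__main__":
    # Invented toy alignment: variable N-terminal half, identical C-terminal half.
    toy = ["MKT-LLVEGH", "MRSAVLVEGH", "M-QSILVEGH"]
    cons = column_conservation(toy)
    half = len(cons) // 2
    print(round(mean_conservation(cons, 0, half), 2),
          mean_conservation(cons, half, len(cons)))  # 0.47 1.0
```

Splitting the alignment at its midpoint is only a convenience here; in practice the regions compared would follow domain boundaries, such as the N-terminal PLAT domain and the C-terminal catalytic domain of LOX proteins.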

In the phylogenetic tree (Figure 1), the grouping of MP31 and all its homologs in other Gossypium species with AtLOX1, AtLOX5 and GhLOX1 (9-LOX) shows that MP31 and its homologs are possibly 9-LOX. However, MP31 showed closest similarity to its homologous EST of G. raimondii rather than to its homologous EST of G. arboreum.

Sequence Analysis of an EST for Cytochrome P450

The cDNA clone ARB-65, an EST truncated at both ends, encodes a cytochrome P450 and consists of 755 bp after removal of vector sequences. This 755 bp sequence contains an ORF of 267 amino acids. BLASTn showed 73 % sequence identity with T. cacao cytochrome P450 (XM_007017023) with an E-value of 5e-115. BLASTX results showed the presence of a P450 domain and maximum homology with T. cacao cytochrome P450 (XP_007017085.1), assigned as CYP79A2, with an E-value of 1e-100; 58 % similarity with Populus trichocarpa CYP79B2 (XP_002305081); 57 % similarity with CYP79A68 of Prunus mume (BAP15883); and 57 % similarity with CYP79D15 of Trifolium montanum (AHY21762). These results indicate that ARB-65 is a P450 belonging to the CYP79 family of enzymes. ARB65 showed 71 % identity with A. thaliana cytochrome P450 CYP79A2 mRNA (CP002688.1) with an E-value of 2e-14, and 28 % identity with Homo sapiens P450 (AAA19567.1).

Sequence Analysis of MP14

An interesting EST, MP14, 746 bp long and encoding 248 amino acids, was found in our sequencing data and shows the conserved domains Flexi-CP and Flexi-CP-N. The MP14 protein starts with methionine and ends at a stop codon. BLASTn results indicated 76 % identity (E-value 2e-146) with the cowpea mild mottle virus (CPMMV) coat protein (JX020701.1). BLASTX results showed 91 % identity with the coat protein of cowpea mild mottle virus (AAB94082.1) with an E-value of 2e-169, as well as 67 % identity with the coat protein of cucumber vein clearing virus (AEP83730.1), 63 % with the coat protein of Hippeastrum latent virus (YP_002308451.1), 64 % with the coat protein of Phlox virus M (ABP68910.1), 62 % with the coat protein of Potato virus M (ACF05255.1) and 62 % with the Hop mosaic virus coat protein (ACS45220.1). When BLAST searched against the Arabidopsis genome using TAIR BLAST (www.arabidopsis.org/Blast/index.jsp), it did not show any significant identity. No BLAST hits were found for this sequence in the Gossypium EST and HTGS databases in GenBank. To assess the homology of MP14 with begomoviruses, it was BLAST searched against begomovirus sequences in the database, but no match was found with any begomovirus nucleotide or protein sequence.

EST Encoding Serine/Threonine Kinases

Two G. arboreum EST, ARB-32 (KT223831) and ARB-150 (KT223832), show homology with the serine/threonine kinase (STK) class of proteins. ARB-32 showed 90 % identity with T. cacao STK (XM_007014092), 99 % similarity with the putative STK-drdK protein of G. arboreum (KHF98176) and 77 % homology with an A. lyrata kinase family protein (XP_002864520). ARB150 was found to be 95 % and 55 % similar, respectively, to the G. arboreum (KHG13005) and A. thaliana (NP_199829) STK proteins in BLASTP results. The protein sequences of both EST contain a PKc domain. STK proteins are a subclass of membrane receptor protein kinases. ARB32 also exhibited a transmembrane region, as predicted by the TM-Pred tool at Expasy.ch (Figure 2).

EST Encoding an RGA

One EST, named RM4 (Accession No. KT250636), showed 71 and 72 % homology with the TMV resistance protein N gene of Citrus sinensis and Morus notabilis, respectively, in BLASTn results. BLASTP indicated the presence of a TIR domain, characteristic of plant R genes, and 50 % homology with the TIR-NBS resistance protein of Medicago truncatula (XP_003613396).

EST Encoding Lipid Transfer Proteins

Four EST encoding lipid transfer proteins (LTPs) were also found in the screening of cDNA clones, all showing significant homology with G. hirsutum LTPs (Table 1). ARB72, ARB124 and MP49 encode full-length LTP proteins that contain N-terminal signal peptides (Figure 3), as predicted using neural networks (NN) trained on eukaryotes. ARB230, in contrast, was a partial EST and therefore does not contain a signal peptide sequence in its protein translation.

EST Encoding Ubiquitin-Conjugating Enzyme E2

Ubiquitin-conjugating enzymes (UBCs) are enzymes of the proteasomal degradation pathway that mediate cellular protein degradation. One important EST, ARB-288, showed 99 % homology with UBC-E2 from Zostera marina, Vitis vinifera, Nicotiana sylvestris, Malus domestica and many other plant UBCs. The BLASTX results showed the presence of a UBC superfamily domain in the 447 bp sequence. ARB288 was also BLASTn searched in the GhEST database to find its homolog in G. hirsutum; the GhUBC EST showed 99 % similarity with the GaUBC nucleotide sequence.

4.2 EST Related to Cell Morphogenesis Genes

Genes involved in fiber or trichome development were also found among the leaf cDNA library clones, including aquaporin (ARB 29, ARB 46 and MP 50), LTPs (ARB 72, ARB 124, ARB 230, MP 49), expansin (ARB137), arabinogalactans (ARB212) and cellulose synthase (ARB 286). The BLASTX results for ARB137 indicated signatures of pollen allergen-I, which is known from pollen allergy-causing proteins. The details and BLAST similarity search results of these EST are shown in Table 1. All these genes are important in fiber development and are being used in another study to improve fiber length in G. hirsutum by overexpressing them.

4.3 EST Related to Plant Hormones

Three EST related to plant hormones were also found. BLAST results indicated that ARB27 is a putative auxin-binding protein, ARB162 is a putative auxin-responsive protein and ARB61 is a putative ethylene-responsive transcription factor. All three EST showed maximum homology with the respective G. hirsutum proteins (Table 1).

5. Discussion

Screening of cDNA libraries is a powerful way to find genes expressed in a tissue or an organism. In this study, the main goal was to find biotic stress-related genes in a biotic stress-resistant cotton species (G. arboreum). Genes expressed in resistant species are important candidates for studying how resistance might be incorporated into susceptible genotypes. Lipoxygenase (LOX) is an enzyme of the oxylipin pathway leading to the production of jasmonic acid and other compounds that play a role in plant defense against various biotic and abiotic stresses (Babenko et al., 2017; Raya-Gonzalez et al., 2017). Lipoxygenases are expressed in seed (Long et al., 2013) and play roles in plant vegetative growth (Yang et al., 2012) and in responses to wounding (Yan et al., 2013), herbivore attack (Christensen et al., 2013) and pathogen attack (Vicente et al., 2012).

Lipoxygenases are widespread in both plants and animals. They catalyze the peroxidation of polyunsaturated fatty acids, a reaction pivotal to the enzymatic cascade that produces numerous metabolic regulators called oxylipins (Ogorodnikova et al., 2015; Pokotylo et al., 2015). LOX genes are also involved in resistance mechanisms against Fusarium blight in Arabidopsis and wheat (Nalam et al., 2015). Given the importance of lipoxygenases in plant defense, the putative LOX EST (MP31) seems to be a novel candidate, as it is expressed in G. arboreum, which is known to be resistant to many biotic and abiotic stresses, including CLCuV. There are 6 LOX genes in Arabidopsis, and 14 LOX genes have been identified in rice (Umate, 2011). In this study, we found 13 LOX in the G. arboreum genome. In the phylogenetic analysis (Figure 1), MP31 grouped with AtLOX1 and AtLOX5, indicating that MP31 is possibly a 9-LOX. The Arabidopsis 9-LOX exhibit comparable oxygenase activity for either linoleic acid or linolenic acid, whereas the 13-LOX (AtLOX2, AtLOX3, AtLOX4 and AtLOX6) specifically oxygenate linolenic acid only. In G. hirsutum, two LOX genes are known. G. hirsutum LOX1, encoding a 9-LOX, is reported to cause cell death during the hypersensitive response to Xanthomonas campestris (Marmey et al., 2007). GhLOX2 is a 13-LOX with a chloroplast transit peptide at its N terminus and is expressed early during the hypersensitive response in Xanthomonas campestris-infected cotton plants (Sanier et al., 2012). MP31 has strong homology with GhLOX1, so it might play a role in conferring resistance against pathogens.

Disease prevalence might be due to the ability of pathogens to hijack the plant defense system. Dubey et al. (2013) observed a decrease in the expression of GhLOX1 and GhLOX2 during the first hour of infestation of G. hirsutum plants with aphids and whiteflies, showing insect-mediated suppression of plant defense. It is hypothesized that G. hirsutum becomes susceptible to most pathogens because pathogens are able to hijack plant defense machinery, such as the LOX pathway, in this species.

The discovery of an EST named ARB65, a putative P450 CYP79A2, is important because CYP79A2 is known to be involved in glucosinolate production.

Cytochrome P450 (CYP) enzymes carry out critical steps in plant cells, most notably in producing compounds with anticancer and antimicrobial activity (Nelson, 2018; Sun et al., 2018; Yang et al., 2018; Ye et al., 2018; Zhan et al., 2018). Owing to the great diversity of CYPs, 59 families are known in plants (Nelson et al., 2008). CYP families are also known in animals, fungi, protists, bacteria, archaea and even viruses. Members of the CYP79 family catalyze the conversion of amino acids to oximes in glucosinolate production. Glucosinolates are important as anticancer agents and as plant protectants against insect and fungal attack. Glucosinolates are produced by members of the order Capparales, which includes crops of the Brassicaceae family (Wittstock, 2000). In Arabidopsis, seven members of the CYP79 family are known, and they produce different types of glucosinolates (Mikkelsen et al., 2003): CYP79A2 metabolizes phenylalanine (Wittstock, 2000), CYP79B2 and CYP79B3 act on tryptophan (Mikkelsen et al., 2000), CYP79F1 and CYP79F2 act on methionine derivatives (Hansen, 2001), and CYP79C1 and CYP79C2 have not yet been assigned a function.

One of the former doctrines of virology was that plant viruses do not integrate into the host genome, but about two decades ago evidence such as the presence of viral sequences in plant genomes was reported. Bejarano et al. (1996) first reported the integration of geminivirus DNA, in the form of multiple repeats, into the Nicotiana tabacum genome. Viral genome integration events have since been reported in banana (Gayral et al., 2008), petunia (Richert-Poggeler, 2003), grapevine (Bertsch et al., 2009), tomato (Staginnus et al., 2007), rice (Kunii et al., 2004) and tobacco (Gregor, 2004).

MP14 showed strong homology with the CPMMV coat protein. CPMMV is a pathogenic plant virus of the family Betaflexiviridae that is transmitted by whitefly. This virus is known to cause disease in yard-long beans, soybeans and peanuts. As whitefly also feeds on cotton, it is possible that CPMMV was transmitted to G. arboreum by whitefly vectors and that part of its genome then became integrated into the G. arboreum genome (Khan et al., 2016). Cotton is immune to CPMMV (Muniyappa, 1983). According to Bertsch et al. (2009), endogenous viruses integrated in plant chromosomes can either become pathogenic or confer resistance against invading viruses, and there are reports of integrated pararetroviruses (PRVs) conferring resistance against infectious viruses. Bertsch et al. (2009) hypothesized that grapevine is infected by diverse positive-strand RNA viruses but not by PRVs because multiple integration events of PRV genomes in grapevine make it resistant to infection by other PRVs. The identification of a CPMMV coat protein-like sequence in the G. arboreum genome is hypothesized to support resistance against related pararetroviruses, which might be a kind of homology-dependent virus resistance. However, MP14 seems not to confer this kind of

resistance against begomoviruses, as MP14 did not show any homology with begomovirus sequences in GenBank. The discovery of a CPMMV coat protein-like sequence in the transcriptome of biotic stress-resistant G. arboreum is important for determining the resistance mechanisms present in this species, especially against CLCuV.

ARB32 and ARB150 are serine/threonine kinases with a PKc domain in their protein sequences. PKc is a member of the AGC group of the protein kinase superfamily. These enzymes phosphorylate serine and threonine residues in proteins. STK proteins form one of the three subclasses of receptor protein kinases among R genes involved in elicitor recognition and signal transduction (Morillo and Tax, 2006). Their network in the cell is known as the central processing unit that accepts input from environmental stimuli, whether pathogens, phytohormones or others, and gives output in the form of responses such as triggering gene expression and changes in metabolism, growth and development (Hardie, 1999; Afzal et al., 2008). Both these kinases are therefore important candidates for studying their role against various biotic stresses.

One putative R gene homolog showing homology with the TIR-NBS-LRR class of R genes was also found in the sequence data. R genes are an important part of the plant defense system and work in a cascade in which pathogen effectors are recognized by specific receptors; this accurate recognition signals the respective R gene to activate the plant defense system to combat pathogens. In cotton, 63 resistance gene analog (RGA) clusters and 355 NBS-encoding genes have been reported in the diploid D genome of G. raimondii (Wang et al., 2013). However, no R gene conferring resistance against CLCuV infection in cotton has been reported. The RGA RM4 was identified in CLCuV-resistant G. arboreum and may be a potential candidate for evaluating its role against biotic stresses.

Four important LTPs were found among the G. arboreum EST. LTPs are small, abundant cellular proteins known to play roles in defense against pathogens, in cuticle synthesis and in plant growth and development (Yeats and Rose, 2008). A signal peptide is characteristic of plant LTPs, some of which are known to target the cell wall (Thoma et al., 1993; García-Olmedo et al., 1995). Some LTPs are specifically upregulated in response to infection and exhibit antimicrobial activity (Kader, 1997). The identified LTPs can therefore be novel candidates for studying their role against different biotic stresses in plants.

ARB288, encoding the GaUBC-E2 enzyme, is an important EST with respect to its possible role in CLCuV resistance, because UBC is one of the host proteins found to interact with βC1 of cotton leaf curl betasatellite (Eini et al., 2009). UBCs are the

components of the ubiquitin proteasomal pathway (UPP), which plays a significant role in protein degradation and modification in eukaryotic cells. E1, E2 and E3 are the enzymes involved in ubiquitination of the target protein in three separate steps. E2 enzymes are known to be involved in immune responses to viral and bacterial infection (Chen et al., 2013). In plants, ubiquitination has mostly been studied in A. thaliana, where 1400 genes code for UPP components, of which 37 code for E2s (Smalle and Vierstra, 2004). Eini et al. (2009) showed that in tomato plants infected with CLCuMV (a geminivirus), the symptoms induced by the DNA β-encoded βC1 protein were due to its interaction with E2. The study showed that βC1 hijacks the UPP of tomato by binding to E2 through a myristoylation-like motif. The virus thus uses the E2 protein to escape from the UPP and exploits a host biological pathway for successful disease induction. The maximum homology of ARB288 with the UBC-E2 of Zostera marina (eelgrass, a seagrass) is interesting; it suggests that UBC enzymes have remained conserved during the evolution of organisms.

As cotton is a fiber-producing plant, genes related to fiber development were also found in the sequenced EST data. Aquaporins are proteins responsible for maintaining the water and nutrient status of the whole plant (Maurel, 2009). Naoumkina et al. (2015) studied the expression patterns of different genes in short-lint cotton mutants and wild-type plants and observed that, in short-fiber mutant plants, various genes, including three subfamilies of aquaporins, are downregulated. Accordingly, soluble sugar concentration and osmotic concentration were lower in these plants. Aquaporins may therefore be good candidates for increasing fiber length.

Similarly, arabinogalactans are another class of cell proteins that play a role in fiber development. Arabinogalactans are a class of glycoproteins reported to play roles in plant growth, development and signal transduction. Huang et al. (2013) reported that arabinogalactans are abundantly expressed in developing fiber and play a role in fiber elongation. They observed that transgenic cotton plants overexpressing an arabinogalactan protein produced longer fiber, whereas in cotton plants in which arabinogalactans were suppressed by RNA interference, fiber initiation and elongation slowed down.

Expansins are a group of non-enzymatic plant cell wall proteins that play roles in cell growth, abscission, fruit softening and other developmental processes involving cell wall loosening. Harmer et al. (2002) reported six expansin genes in G. hirsutum, of which GhExp1 transcripts were abundantly expressed in cotton fiber and may play an important role in fiber elongation. Ruan et al. (2001) reported an EXP1 gene in cotton that is specifically expressed in the early phase of cotton fiber elongation. Expansin expression has also been studied in other fiber-producing plants such as Calotropis procera (Cheema et al., 2010; Aslam et al., 2013).

Analysis of the G. arboreum ESTs helped identify important genes in its genome. The lipoxygenase EST was further used to identify 13 LOX genes in cotton. Because lipoxygenases are important components of plant defense pathways, all 13 cotton LOXs can be evaluated for their roles in biotic stress resistance, specifically against CLCuV infection. These LOXs may be potential candidates for introducing resistance into susceptible species through genetic engineering.

Author Contributions:

SM and AB conceived the idea and supervised the study, while RM and KS performed the experiments and wrote the manuscript. ZHS, HA, YA, and HASA critically reviewed and proofread the manuscript. ZA and TM reviewed the bioinformatics-related tasks.

References

Aaronson, J., Eckman, B., Blevins, R., Borkowski, J., Myerson, J., Imran, S., and Elliston, K. (1996). Toward the development of a gene index to the human genome: an assessment of the nature of high-throughput EST sequence data. Genome Res. 6 (9), 829-845.

Afzal, A., Wood, A., and Lightfoot, D. (2008). Plant Receptor-Like Serine Threonine Kinases: Roles in Signaling and Plant Defense. MPMI 21 (5), 507-517. doi: 10.1094/MPMI-21-5-0507

Akhtar, K.P., Ullah, R., Khan, I.A., Saeed, M., Sarwar, N., and Mansoor, S. (2013). First symptomatic evidence of infection of Gossypium arboreum with cotton leaf curl burewala virus through grafting. Int. J. Agric. Biol. , 7‒ 6

Aslam, U., Khatoon, A., Cheema, H.M., and Bashir, A. (2013). Identification and characterization of plasma membrane aquaporins isolated from fiber cells of Calotropis procera. J. Zhejiang Univ. Sci. B. 14 (7), 586-595. doi: 10.1631/jzus.B1200233.

Babenko, L.M., Shcherbatiuk, M.M., Skaterna, T.D., and Kosakivska, I.V. (2017). Lipoxygenases and their metabolites in formation of plant stress tolerance. Ukrainian Biochemical Journal 89, 5-21.

Bejarano, E., Khashoggi, A., Witty, M., and Lichtenstein, C. (1996). Integration of multiple repeats of geminiviral DNA into the nuclear genome of tobacco during evolution. Proc. Nat. Acad. Sci. 93 (2), 759-764. doi:10.1073/pnas.93.2.759

Bertsch, C., Beuve, M., Dolja, V., Wirth, M., Pelsy, F. et al. (2009). Retention of the virus-derived sequences in the nuclear genome of grapevine as a potential pathway to virus resistance. Biology Direct 4(1), 21. doi: 10.1186/1745-6150-4-21.

Chen, L., Cheng, C., Zhang, C., Yao, Q., and Zhao, E. (2013). Ubiquitin-conjugating enzyme involved in the immune response caused by pathogens invasion. Open J. Immunol. 3 (3), 93-97. doi: 10.4236/oji.2013.33013.

Cheema, H.M.N., Bashir, A., Khatoon, A., Iqbal, N., Zafar, Y., and Malik, K.A. (2010). Molecular characterization and transcriptome profiling of expansin genes isolated from Calotropis procera fibers. Electron. J. Biotechn. 13 (6), 10-11. doi: 10.2225/vol13-issue6fulltext-7.

Choudhary, S., Gaur, R., and Gupta, S. (2012). EST-derived genic molecular markers: development and utilization for generating an advanced transcript map of chickpea. Theor. Appl. Genet. 124 (8), 1449-1462. doi: 10.1007/s00122-012-1800-3

Christensen, S., Nemchenko, A., Borrego, E., Murray, I., Sobhy, I., Bosak, L., DeBlasio, S. et al. (2013). The maize lipoxygenase, ZmLOX10, mediates green leaf volatile, jasmonate and herbivore-induced plant volatile production for defense against insect attack. Plant J. 74 (1), 59-73. doi: 10.1111/tpj.12101.

Devi, K., Chakraborty, S., Deb, B., and Rajwanshi, R. (2016). Computational identification and functional annotation of microRNAs and their targets from expressed sequence tags (EST) and genome survey sequences (GSSs) of coffee (Coffea arabica L.). Plant Gene 6, 30-42. doi: 10.1016/j.plgene.2016.03.001.

Dubey, N., Goel, R., Ranjan, A., Idris, A., Singh, S., Bag, S. et al. (2013). Comparative transcriptome analysis of Gossypium hirsutum L. in response to sap sucking insects: aphid and whitefly. BMC Genomics 14(1), 241. doi: 10.1186/1471-2164-14-241

Eini, O., Dogra, S., Selth, L., Dry, I., Randles, J., and Rezaian, M. (2009). Interaction with a Host Ubiquitin-Conjugating Enzyme Is Required for the Pathogenicity of a Geminiviral DNA β Satellite. MPMI 22 (6), 737-746.

Farooq, M., Mansoor, S., Guo, H., Amin, I., Chee, P. W., Azeem, M. K., and Paterson, A. H. (2017). Identification and characterization of miRNA transcriptome in asiatic cotton (Gossypium arboreum) using high throughput sequencing. Frontiers in Plant Science 8, 969. doi: 10.3389/fpls.2017.00969

García-Olmedo, F., Molina, A., Segura, A., and Moreno, M. (1995). The defensive role of nonspecific lipid-transfer proteins in plants. Trends in Microbiology 3(2), 72-74.

Gayral, P., Noa-Carrazana, J., Lescot, M., Lheureux, F., Lockhart, B. et al. (2008). A Single Banana Streak Virus Integration Event in the Banana Genome as the Origin of Infectious Endogenous Pararetrovirus. J. Virol. 82 (13), 6697-6710. doi: 10.1128/JVI.00212-08.

Gregor, W. (2004). A Distinct Endogenous Pararetrovirus Family in Nicotiana tomentosiformis, a Diploid Progenitor of Polyploid Tobacco. Plant Physiol. 134 (3), 1191-1199. doi: 10.1104/pp.103.031112

Harmer, S.E., Orford, S.J., and Timmis, J.N. (2002). Characterisation of six α-expansin genes in Gossypium hirsutum (upland cotton). Mol. Genet. Genomics 268, 1-9. doi: 10.1007/s00438-002-0721-2.

Hansen, C. (2001). CYP83B1 Is the Oxime-metabolizing Enzyme in the Glucosinolate Pathway in Arabidopsis. J. Biol. Chem. 276 (27), 24790-24796. doi: 10.1074/jbc.

Hardie, D. (1999). Plant Protein Serine/Threonine Kinases: Classification and Functions. Annu. Rev. Plant Physiol. Plant Mol. Biol. 50(1), 97-131. doi: 10.1146/annurev.arplant.50.1.97.

Huang, G., Gong, S., Xu, W., Li, W., Li, P. et al. (2013). A Fasciclin-Like Arabinogalactan Protein, GhFLA1, Is Involved in Fiber Initiation and Elongation of Cotton. Plant Physiol. 161 (3), 1278-1290. doi: 10.1104/pp.112.203760

Kader, J. (1997). Lipid-transfer proteins: a puzzling family of plant proteins. Trends Plant Sci. 2(2), 66-70.

Khan, A. M., Khan, A. A., Azhar, M. T., Amrao, L., and Cheema, H. M. N. (2016). Comparative analysis of resistance gene analogues encoding NBS-LRR domains in cotton. J. Sci. Food Agric. 96, 530–538.

Kunii, M., Kanda, M., Nagano, H., Uyeda, I., Kishima, Y., and Sano, Y. (2004). Reconstruction of putative DNA virus from endogenous rice tungro bacilliform virus-like sequences in the rice genome: implications for integration and evolution. BMC Genomics 5(1), 80. doi: 10.1186/1471-2164-5-80

Li, F., Fan, G., Wang, K., Sun, F., Yuan, Y., Song, G., Li, Q. et al. (2014). Genome sequence of the cultivated cotton Gossypium arboreum. Nat. Genet. 46(6), 567-572. doi: 10.1038/ng.2987.

Long, Q., Zhang, W., Wang, P., Shen, W., Zhou, T., et al. (2013). Molecular genetic characterization of rice seed lipoxygenase 3 and assessment of its effects on seed longevity. J. Plant Biol. 56(4), 232-242. doi: 10.1007/s12374-013-0085-7.

Marmey, P., Jalloul, A., Alhamdia, M., Assigbetse, K., Cacas, J., Voloudakis, A., Champion, A., Clerivet, A., Montillet, J., and Nicole, M. (2007). The 9-lipoxygenase GhLOX1 gene is associated with the hypersensitive reaction of cotton Gossypium hirsutum to Xanthomonas campestris pv. malvacearum. Plant Physiol. Biochem. 45(8), 596-606. doi: 10.1016/j.plaphy.2007.05.002

Maurel, C., Santoni, V., Luu, D., Wudick, M., and Verdoucq, L. (2009). The cellular dynamics of plant aquaporin expression and functions. Curr. Opin. Plant Biol. 12 (6), 690-698. doi: 10.1016/j.pbi.2009.09.002.

Mikkelsen, M., Hansen, M., Wittstock, U., and Halkier, B. (2000). Cytochrome P450 CYP79B2 from Arabidopsis Catalyzes the Conversion of Tryptophan to Indole-3-acetaldoxime, a Precursor of Indole Glucosinolates and Indole-3-acetic Acid. J. Biol. Chem. 275 (43), 33712-33717. doi: 10.1074/jbc.M001667200

Mikkelsen, M.D., Petersen, B.L., Glawischnig, E., Jensen, A.B., Andreasson, E., and Halkier, B.A. (2003). Modulation of CYP79 genes and glucosinolate profiles in Arabidopsis by defense signaling pathways. Plant Physiol. 131, 298-308. doi: 10.1104/pp.011015

Morillo, S., and Tax, F. (2006). Functional analysis of receptor-like kinases in monocots and dicots. Curr. Opin. Plant Biol. 9 (5), 460-469. doi: 10.1016/j.pbi.2006.07.009

Muniyappa, V. (1983). Transmission of Cowpea Mild Mottle Virus by Bemisia tabaci in a Nonpersistent Manner. Plant Disease 67(4), 391. doi: 10.1094/PD-67-391

Nagaraj, S.H., Gasser, R.B., and Ranganathan, S. (2006). A hitchhiker's guide to expressed sequence tag (EST) analysis. Brief. Bioinform. 8 (1), 6-21. doi: 10.1093/bib/bbl015.

Nalam, V.J., Alam, S., Keereetaweep, J., Venables, B., Burdan, D., Lee, H., Trick, H.N., Sarowar, S., Makandar, R., and Shah, J. (2015). Facilitation of Fusarium graminearum Infection by 9-Lipoxygenases in Arabidopsis and Wheat. MPMI 28, 1142-1152.

Naoumkina, M., Thyssen, G., and Fang, D. (2015). RNA-seq analysis of short fiber mutants Ligon-lintless-1 (Li1) and -2 (Li2) revealed important role of aquaporins in cotton (Gossypium hirsutum L.) fiber elongation. BMC Plant Biol. 15(1), 65. doi: 10.1186/s12870-015-0454-0

Nelson, D.R. (2018). Cytochrome P450 diversity in the tree of life. Biochim. Biophys. Acta 1866, 141-154.

Nelson, D., Ming, R., Alam, M., and Schuler, M. (2008). Comparison of Cytochrome P450 Genes from Six Plant Genomes. Tropical Plant Biol. 1 (3-4), 216-235. doi: 10.1007/s12042-008-9022-1

Ogorodnikova, A.V., Mukhitova, F.K., and Grechkin, A.N. (2015). Oxylipins in the spikemoss Selaginella martensii: Detection of divinyl ethers, 12-oxophytodienoic acid and related cyclopentenones. Phytochemistry 118, 42-50.

Pokotylo, I.V., Kolesnikov, Y.S., Derevyanchuk, M.V., Kharitonenko, A.I., and Kravets, V.S. (2015). Lipoxygenases and plant cell metabolism regulation. Ukrainian Biochemical Journal 87, 41-55.

Rahman, M.U., Khan, A.Q., Rahmat, Z., Iqbal, M.A., and Zafar, Y. (2017). Genetics and Genomics of Cotton Leaf Curl Disease, Its Viral Causal Agents and Whitefly Vector: A Way Forward to Sustain Cotton Fiber Security. Front. Plant Sci. 8, 1157. doi: 10.3389/fpls.2017.01157

Raya-Gonzalez, J., Velazquez-Becerra, C., Barrera-Ortiz, S., Lopez-Bucio, J., and Valencia-Cantero, E. (2017). N,N-dimethyl hexadecylamine and related amines regulate root morphogenesis via jasmonic acid signaling in Arabidopsis thaliana. Protoplasma 254, 1399-1410.

Richert-Poggeler, K. (2003). Induction of infectious petunia vein clearing (pararetro) virus from endogenous provirus in petunia. EMBO J. 22 (18), 4836-4845. doi: 10.1093/emboj/cdg443

Ruan, Y. (2001). The Control of Single-Celled Cotton Fiber Elongation by Developmentally Reversible Gating of Plasmodesmata and Coordinated Expression of Sucrose and K+ Transporters and Expansin. Plant Cell 13(1), 47-60. doi: 10.1105/tpc.13.1.47

Sanier, C., Sayegh-Alhamdia, M., Jalloul, A., Clerivet, A., Nicole, M., and Marmey, P. (2012). A 13-lipoxygenase is Expressed Early in the Hypersensitive Reaction of Cotton Plants to Xanthomonas campestris pv. malvacearum. J. Phytopathol. 160(6), 286-293. doi: 10.1111/j.1439-0434.2012.01900.x.

Shangguan, L., Han, J., Kayesh, E., Sun, X., Zhang, S., Pervaiz, T., Wen, X., and Fang, J. (2013). Evaluation of Genome Sequencing Quality in Selected Plant Species Using Expressed Sequence Tags. PLoS ONE 8 (7), e69890. doi: 10.1371/journal.pone.0069890.

Smalle, J., and Vierstra, R. (2004). The ubiquitin 26S proteasome proteolytic pathway. Annu. Rev. Plant Biol. 55(1), 555-590. doi: 10.1146/annurev.arplant.55.031903.141801

Sousounis, K., and Tsonis, P.A. (2012). Patterns of gene expression in microarrays and expressed sequence tags from normal and cataractous lenses. Human Genomics 6, 1-13. doi: 10.1186/1479-7364-6-14.

Staginnus, C., Gregor, W., Mette, M., Teo, C., Borroto-Fernández, E. et al. (2007). Endogenous pararetroviral sequences in tomato (Solanum lycopersicum) and related species. BMC Plant Biol. 7(1), 24. doi: 10.1186/1471-2229-7-24

Sun, Y., Xin, X., Zhang, K., Cui, T., Peng, Y., and Zheng, J. (2018). Cytochrome P450 mediated metabolic activation of chrysophanol. Chem. Biol. Interact. 289, 57-67.

Tan, H., Callahan, F.E., Zhang, X.D., Karaca, M., Saha, S., Jenkins, J.N., Creech, R.G., and Ma, D.P. (2003). Identification of resistance gene analogs in cotton. Euphytica 134, 1-7. doi: 10.1023/A:1026114327168

Tan, S., and Wu, S. (2012). Genome Wide Analysis of Nucleotide-Binding Site Disease Resistance Genes in Brachypodium distachyon. Comparative and Functional Genomics 2012, 1-12. doi: 10.1155/2012/418208

Thoma, S., Kaneko, Y., and Somerville, C. (1993). A non-specific lipid transfer protein from Arabidopsis is a cell wall protein. Plant J. 3(3), 427-436.

Umate, P. (2011). Genome-wide analysis of lipoxygenase gene family in Arabidopsis and rice. Plant Signaling & Behavior 6(3), 335-338. doi: 10.4161/psb.6.3.13546

Van der Hoeven, R., Ronning, C., Giovannoni, J., Martin, G., and Tanksley, S. (2002). Deductions about the Number, Organization, and Evolution of Genes in the Tomato Genome Based on Analysis of a Large Expressed Sequence Tag Collection and Selective Genomic Sequencing. Plant Cell 14, 1441–1456.

Vicente, J., Cascón, T., Vicedo, B., García-Agustín, P., Hamberg, M., and Castresana, C. (2012). Role of 9x α-Dioxygenase Oxylipin Pathways as Modulators of Local and Systemic Defense. Mol. Plant 5(4), 914-928. doi: 10.1093/mp/ssr105.

Wang, Z., Zhang, D., Wang, X., Tan, X., Guo, H., and Paterson, A. (2013). A Whole-Genome DNA Marker Map for Cotton Based on the D-Genome Sequence of Gossypium raimondii L. G3: Genes|Genomes|Genetics 3(10), 1759-1767. doi: 10.1534/g3.113.006890.

Wittstock, U. (2000). Cytochrome P450 CYP79A2 from Arabidopsis thaliana L. Catalyzes the Conversion of L-Phenylalanine to Phenylacetaldoxime in the Biosynthesis of Benzylglucosinolate. J. Biol. Chem. 275(19), 14659-14666.

Xu, J., Linning, R., Fellers, J., Dickinson, M., Zhu, W., et al. (2011). Gene discovery in EST sequences from the wheat leaf rust fungus Puccinia triticina sexual spores, asexual spores and haustoria, compared to other rust and corn smut fungi. BMC Genomics 12, 161. doi: 10.1186/1471-2164-12-161.

Yan, L., Zhai, Q., Wei, J., Li, S., Wang, B., Huang, T., Du, M. et al. (2013). Role of Tomato Lipoxygenase D in Wound-Induced Jasmonate Biosynthesis and Plant Immunity to Insect Herbivores. PLoS Genet. 9 (12), e1003964. doi: 10.1371/journal.pgen.1003964.

Yang, Q., Li, J., Shen, J., Xu, Y., Liu, H., Deng, W., Li, X., and Zheng, M. (2018). Metabolic Resistance to Acetolactate Synthase Inhibiting Herbicide Tribenuron-Methyl in Descurainia sophia L. Mediated by Cytochrome P450 Enzymes. J. Agric. Food Chem. 66, 4319-4327.

Yang, X., Jiang, W., and Yu, H. (2012). The Expression Profiling of the Lipoxygenase (LOX) Family Genes During Fruit Development, Abiotic Stress and Hormonal Treatments in Cucumber (Cucumis sativus L.). IJMS 13 (2), 2481-2500. doi: 10.3390/ijms13022481.

Ye, Z., Yamazaki, K., Minoda, H., Miyamoto, K., Miyazaki, S., Kawaide, H., Yajima, A., Nojiri, H., Yamane, H., and Okada, K. (2018). In planta functions of cytochrome P450 monooxygenase genes in the phytocassane biosynthetic gene cluster on rice chromosome 2. Bioscience, Biotechnology, and Biochemistry 82, 1021-1030.

Yeats, T., and Rose, J. (2008). The biochemistry and biology of extracellular plant lipid-transfer proteins (LTPs). Protein Sci. 17(2), 191-198. doi: 10.1110/ps.073300108

Wang, K., Wang, Z., Li, F., Ye, W., Wang, J., Song, G., Yue, Z., Cong, L., Shang, H., Zhu, S., Zou, C., et al. (2012). The draft genome of a diploid cotton Gossypium raimondii. Nat. Genet. 44(10), 1098-1103. doi: 10.1038/ng.2371.

Zhan, C., Ahmed, S., Hu, S., Dong, S., Cai, Q., Yang, T., Wang, X., Li, X., and Hu, X. (2018). Cytochrome P450 CYP716A254 catalyzes the formation of oleanolic acid from beta-amyrin during oleanane-type triterpenoid saponins biosynthesis in Anemone flaccida. Biochem. Biophys. Res. Commun. 495, 1271-1277.

Table 1: Important ESTs found in the G. arboreum leaf cDNA library

| EST clone no. | Accession no. | Putative protein | Length | % Homology |
|---|---|---|---|---|
| ARB72 | KR002859 | LTP | 363 | 97% with G. herbaceum LTP (AFR43276) |
| ARB124 | KR007847 | LTP | 591 | 99% with G. hirsutum LTP11 (KC342642) |
| ARB230 | KR007849 | LTP | 470 | 78% with T. cacao LTP (XM00704390) |
| MP49 | KR007850 | LTP | 363 | 99% with G. hirsutum LTP3 (FJ200519) |
| ARB40 | KR002860 | Proline rich protein | 755 | 71% with Proline Rich Protein of G. hirsutum (ABM05955) |
| ARB212 | KR007848 | Arabinogalactans | 753 | 100% with G. arboreum fasciclin-like arabinogalactan (KHG09433) |
| ARB27 | JZ773803 | Auxin binding protein | 565 | 98% with G. hirsutum (AA092740) |
| ARB162 | JZ773805 | Auxin responsive protein | 204 | 100% with G. hirsutum (AEE25654) |
| ARB61 | JZ773804 | Ethylene responsive transcription factor | 336 | 52% with G. hirsutum (AGN90957) |
| MP14 | - | CPMMV coat protein | 746 | 76% with Cowpea mild mottle virus coat protein (JX020710) |
| MP31 | KR259800 | Lipoxygenase | 698 | 54% with LOX gene of T. cacao (XP007049578) |
| ARB65 | KR259801 | CYP450 | 924 | 73% with T. cacao P450 (XM-007017023) |
| ARB32 | KT223831 | Ser/Thr protein kinase | 734 | 90% with T. cacao STK protein (XM007014092) |
| ARB150 | KT223832 | Ser/Thr protein kinase | 453 | 95% with G. arboreum STK protein (KHG13005) |
| ARB286 | KT340054 | Cellulose synthase | 754 | 95% with G. hirsutum cellulose synthase (GQ200734) |
| RM4 | KT250636 | TIR-2 superfamily disease resistance gene | 279 | 62% with Citrus sinensis TMV resistance protein N (XP006494027) |
| ARB288 | KT424099 | Ubiquitin conjugating enzyme E2 | 447 | 99% with Zostera marina UBC-E2 (KMZ63943) |
| ARB137 | KT424098 | Expansin | 737 | 86% with G. hirsutum expansin (ABB59694) |
| ARB29 | JZ717573 | MIP superfamily aquaporin | 233 | 99% with G. hirsutum aquaporin (ADE34299) |
| ARB46 | JZ773801 | Aquaporin TIP-2 like protein | 315 | 95% with G. hirsutum aquaporin (AAB04557) |
| MP50 | JZ773802 | Aquaporin | 739 | 100% with G. arboreum PIP-2 (KHG19287) |

Table 2: Genes encoding lipoxygenases in cotton

Chromosome

Locus Id

Target result

Protein domain

Amino acid

Molecular weight

3

LOC_108467586

-

Lipoxygenase PLAT/LH2

982

114148.62

3

LOC_108467580

-

Lipoxygenase PLAT/LH2

925

107542.32

1

LOC_108483796

-

Lipoxygenase PLAT/LH2

855

97778.43

5

LOC_108474786

-

Lipoxygenase PLAT/LH2

877

99800.28

3

LOC_108468954

-

Lipoxygenase PLAT/LH2

871

98519.10

3

LOC_108470002

-

Lipoxygenase PLAT/LH2

872

3

LOC_108468402

-

Lipoxygenase PLAT/LH2

516

7

LOC_108481129

Lipoxygenase PLAT/LH2

13

LOC_108464242

4

LOC_108473970

11

LOC_108457463

5

LOC_108474384

Chloroplast

7

LOC_108481128

Chloroplast

Chloroplast

-


8.42 6.37 5.23

59369.11

7.91

886

100923.93

6.28

865

98822.62

5.9

914

104218.75

5.82

Lipoxygenase PLAT/LH2

873

99661.71

5.97

Lipoxygenase PLAT/LH2

913

103431.37

7.40

Lipoxygenase PLAT/LH2

886

101250.22

6.34


6.20

5.63


Lipoxygenase PLAT/LH2

-

6.33

98930.78

Lipoxygenase PLAT/LH2

Possibly plastid

Isoelectric point
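Table 2 reports a molecular weight for each predicted LOX protein, but the source does not say which tool produced these values. A common approach, shown here only as a sketch under that assumption, sums the average mass of every amino acid residue and adds one water molecule for the free N- and C-termini. The residue masses are commonly tabulated average values, and the function name `average_molecular_weight` is hypothetical; dedicated tools may use slightly different constants, and post-translational modifications are ignored.

```python
# Hypothetical helper: average molecular weight of an unmodified protein.
# Average residue masses (Da), i.e. free amino acid mass minus one water.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01524  # average mass of one H2O, restoring the free termini


def average_molecular_weight(protein: str) -> float:
    """Return the average molecular weight (Da) of a protein sequence.

    Assumes one-letter codes for the 20 standard amino acids; a trailing
    stop symbol '*' is dropped. Non-standard codes raise KeyError.
    """
    sequence = protein.strip().upper().rstrip("*")
    if not sequence:
        return 0.0
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence) + WATER


if __name__ == "__main__":
    # Glycine alone: 57.0519 + 18.01524 = 75.06714 Da
    print(round(average_molecular_weight("G"), 2))  # 75.07
```

Applied to a full predicted LOX sequence, such a calculation gives a quick consistency check on the order of magnitude of the tabulated weights; it is not a reproduction of the authors' pipeline.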

Table 3: Percent homology of cotton lipoxygenases against their Arabidopsis counterparts

| Loci (Ga) | Homology (At) | Identity/Similarity (%) | No. of aa compared |
|---|---|---|---|
| LOC108467586 | LOX3 | 38 | 562 |
| LOC108467580 | LOX3 | 39 | 538 |
| LOC108483796 | LOX1 | 48 | 798 |
| LOC108474786 | LOX5 | 72 | 854 |
| LOC108468954 | LOX1 | 62 | 852 |
| LOC108470002 | LOX1 | 61 | 853 |
| LOC108468402 | LOX3 | 36 | 140 |
| LOC108481129 | LOX3 | 59 | 816 |
| LOC108464242 | LOX1 | 72 | 850 |
| LOC108473970 | LOX2 | 61 | 750 |
| LOC108457463 | LOX5 | 71 | 854 |
| LOC108474384 | LOX6 | 62 | 504 |
| LOC108481128 | LOX3 | 67 | 503 |

Ga, Gossypium arboreum L.; At, Arabidopsis thaliana L.; aa, amino acids
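Table 3 pairs each percentage with the number of amino acids compared, but the source does not name the alignment tool or say how gap positions were treated. The sketch below assumes two sequences that have already been aligned (gaps written as `-`) and counts only columns where both sequences carry a residue; the function `percent_identity` is hypothetical. Similarity scores, which also credit conservative substitutions, would additionally require a substitution matrix and are not modeled here.

```python
from typing import Tuple


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> Tuple[float, int]:
    """Return (percent identity, number of residues compared) for two aligned sequences.

    Assumed convention (hypothetical): columns with a gap in either sequence
    are skipped, so the denominator counts only gap-free columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0, 0
    return 100.0 * identical / compared, compared


if __name__ == "__main__":
    # 5 gap-free columns, 4 identical (M, T, L, V) -> 80.0 % over 5 residues
    print(percent_identity("MKT-LV", "MRTALV"))  # (80.0, 5)
```

Other conventions (for example, dividing by the full alignment length including gaps) give lower percentages for gapped alignments, which is one reason values from different tools are not always directly comparable.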

Figure 1: Phylogenetic tree showing the grouping of MP31 with A. thaliana and G. hirsutum LOX genes (clades labeled 13LOX and 9LOX). The accession numbers of the genes included in this phylogenetic tree are MP31: KR259800, AtLOX1: AT1G55020, AtLOX2: AT3G45140, AtLOX3: AT1G17420, AtLOX4: AT1G67560, AtLOX5: AT3G22400, AtLOX6: AT1G72520, G. raimondii EST: CO128352.1, G. arboreum EST: JG854328, GhLOX1: AF361893, GhLOX2: JF967645, G. hirsutum EST: ES818622 and H. sapiens: AAA19567.1.


Figure 2: Transmembrane topology prediction for ARB32

Figure 3: Signal peptide prediction in lipid transfer proteins. (a) Signal peptide of amino acids 1-26 predicted in ARB72, (b) signal peptide of amino acids 1-22 predicted in ARB124, (c) signal peptide of amino acids 1-29 predicted in MP49.


Figure 4: Protein alignment of cotton LOXs. The 13 cotton LOXs are indicated on left and sequence position on right. Gaps are included to improve alignment accuracy. Alignment was generated using CLC Genomics Workbench 11.

List of Abbreviations: cotton leaf curl virus (CLCuV); cotton leaf curl disease (CLCuD); expressed sequence tags (EST); resistance gene analog (RGA)