A novel exon generates ubiquitously expressed alternatively spliced new transcript of mouse Abcc4 gene


Sayeed Ur Rehman, Hassan Mubarak Ishqi, Mohammed Amir Husain, Tarique Sarwar, Mohammad Tabish

PII: S0378-1119(16)30704-1
DOI: doi:10.1016/j.gene.2016.08.058
Reference: GENE 41562

To appear in: Gene

Received date: 19 April 2016
Revised date: 30 July 2016
Accepted date: 31 August 2016

Please cite this article as: Rehman, Sayeed Ur, Ishqi, Hassan Mubarak, Husain, Mohammed Amir, Sarwar, Tarique, Tabish, Mohammad, A novel exon generates ubiquitously expressed alternatively spliced new transcript of mouse Abcc4 gene, Gene (2016), doi:10.1016/j.gene.2016.08.058

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

ACCEPTED MANUSCRIPT

A novel exon generates ubiquitously expressed alternatively spliced new transcript of mouse Abcc4 gene

Sayeed Ur Rehman, Hassan Mubarak Ishqi, Mohammed Amir Husain, Tarique Sarwar and Mohammad Tabish*

Department of Biochemistry, Faculty of Life Sciences, A.M. University, Aligarh, U.P. 202002, India

*Corresponding author: Department of Biochemistry, Faculty of Life Sciences, A.M. University, Aligarh, U.P. 202002, India

Email: [email protected]; Tel: +91-9634780818

Running Title: A novel transcript isoform of mouse Abcc4 gene

No. of Figures: 6

No. of Tables: 3

Conflict of Interest: The authors declare that there is no conflict of interest in this work.

Abstract

The Abcc4 gene codes for a protein (ABCC4) that transports different classes of drugs out of cells. Substrates of ABCC4 include antiviral and anticancer drugs as well as endogenous molecules such as bile acids, cyclic nucleotides, folates, prostaglandins and steroids. Alternative splicing generates multiple mRNAs that encode protein isoforms with diverse functions. In this study, we identified a novel transcript of the mouse Abcc4 gene using a combination of bioinformatics and molecular biology techniques. This transcript differs from the reported transcript in having a different first exon, which is located in the previously identified first intron. The newly identified transcript was expressed in all the tissues we studied and at different developmental stages. Expression levels of the novel and reported transcripts were studied using quantitative real-time PCR. After conceptual translation of the novel transcript, various post-translational modifications were studied. Translation efficiency and the predicted half-life of the encoded protein isoforms were analysed in silico. Molecular modelling was performed to compare the structural differences between the two isoforms. The diversity at the N-termini of these protein isoforms explains the diverse function of ABCC4 in mouse.

Keywords: Alternative splicing; mouse; Abcc4 gene; bioinformatics; molecular biology

Introduction

Multidrug resistance protein 4 (MRP4/ABCC4) is an ATP-binding cassette (ABC) transporter belonging to subfamily C of the ABC transporter superfamily. MRP4 is encoded by the Abcc4 gene and is involved in the transport of diverse sets of drugs out of the cell (Zhao et al., 2014). Anticancer drugs transported by ABCC4 include 6-mercaptopurine, 6-thioguanine, camptothecins and others (Lin et al., 2013; Tian et al., 2005; Leggas et al., 2004). ABCC4 is reported to be overexpressed in retinoblastomas, neuroblastomas, gliomas, melanomas, colon cancer and colorectal cancer cells (Hendig et al., 2009; Holla et al., 2008). Moreover, ABCC4 expression is associated with drug resistance in leukaemia and ovarian cancer cells (Beretta et al., 2010). Recently, a study showed that downregulation of ABCC4 increased apoptosis in drug-resistant human gastric cancer cells, thereby restoring their sensitivity to 5-FU (Zhang et al., 2015), while another study showed a protective effect of ABCC4 against cytarabine-mediated damage in leukemic and host myeloid cells (Drenberg et al., 2016). ABCC4 is also reported to play an important role in cAMP homeostasis and related pathways (Belleville-Rolland et al., 2016). Such diverse roles make ABCC4 an interesting target for research.

ABC transporters can be classified by the presence of three sequence motifs located in their cytoplasmic ATP-binding domains, also called nucleotide binding domains (NBDs). These are the Walker A motif, the Walker B motif and the ABC signature motif (Haimeur et al., 2004). The core functional structure of most ABC transporters includes two NBDs and two sets of membrane spanning domains (MSDs), each typically containing six membrane-spanning alpha helices (Haimeur et al., 2004).

Alternative splicing plays an important role in creating diversity and regulating cellular functions (Kelemen et al., 2013; Nilsen and Graveley, 2010; Stamm et al., 2005). Recently, alternative splicing was reported to play an important role in creating functional diversity in ABCC1 (MRP1) of sea urchin (Gökirmak et al., 2016). In the case of ABCC4, no study in recent years has reported alternatively spliced isoforms. In this study, we analysed alternative splicing of the mouse Abcc4 gene because of its important and diverse roles in the cell. The mouse Abcc4 gene is located on chromosome 14 and contains 31 exons that code for a protein of 1325 amino acids. According to the Consensus CDS project, two CCDS are reported, CCDS27335.1 and CCDS49565.1, which differ in the exclusion of exon 4 from one of them. To study alternative splicing of the Abcc4 gene, we followed a novel methodology that combines various bioinformatics tools and molecular biology techniques to predict and then confirm the existence of novel isoforms, as described earlier (Banday et al., 2012). Using this methodology, we predicted and confirmed one exon located in the intronic region between exon 1 and exon 2 of the Abcc4 gene that could participate in alternative splicing with other internal exons. Using molecular biology techniques, we confirmed the expression of one new transcript in different tissues of mouse. Further, several post-translational modifications of the novel isoform were predicted using bioinformatics tools.

Materials and Methods

Computational analysis and prediction of novel exons

Our novel methodology involves the extensive use of various databases and gene/exon finding tools. The mouse Abcc4 gene was studied to predict exons that could participate in alternative splicing with existing exons, leading to the generation of novel isoforms. Genomic DNA, cDNA and protein sequences of Abcc4 were downloaded from Mouse Genome Informatics (MGI). Published exons and introns were first located on the genomic sequence. The upstream region of the gene was also analysed for putative novel exons using gene and exon finding tools such as GENSCAN (Burge and Karlin, 1997), FGENESH (Solovyev et al., 2006) and FEX (Solovyev et al., 1994). These tools identified the published exons with high scores, which served as a positive control. Several false positives arising in the prediction process were filtered out using knowledge of comparative genomics and alternative splicing. Redundant data were removed carefully by manual curation. The new exons identified were then subjected to TestCode analysis, available at the Sequence Manipulation Suite (SMS). TestCode recognises the protein coding nature of DNA sequences and is based on a statistical measure that classifies nucleotide sequences as coding or non-coding (Fickett, 1982). Based on the results of the TestCode analysis, exons were chosen for confirmation by wet lab experiments involving RT-PCR and sequencing.

Different online tools used in our study are GENSCAN (http://genes.mit.edu/GENSCAN.html), FGENESH (http://linux1.softberry.com/berry.phtml?topic=fgenesh&group=programs&subgroup=gfind), FEX (http://linux1.softberry.com/berry.phtml?topic=fex&group=programs&subgroup=gfind), ClustalOmega (http://www.ebi.ac.uk/Tools/msa/clustalo/), BLAST (http://blast.ncbi.nlm.nih.gov/Blast.cgi), Sequence Manipulation Suite (http://www.bioinformatics.org/sms/), ExPASy translate tools (http://web.expasy.org/translate), ExPASy (http://www.expasy.org/proteomics), TermiNator (http://www.isv.cnrs-gif.fr/terminator3/index.html) and TFBIND (http://tfbind.hgc.jp/).

Exon specific primers were designed and synthesized (Sigma Aldrich, Bangalore, India) and were as follows.

Primer   Sequence (5'-3')                     Direction   Primer location
FC       CTG GTC ATA AGC GGA GAC TGG AA       Forward     Exon E2
FE1A     GTG TTT TCA CCA ACG TCC AGG AAG      Forward     Exon E1A
FE1B     ATG CTG CCG AGT GAG GTG GTG AA       Forward     Exon E1B
R1       GCA GAG GCA GAA GAA TAA CCA GAA C    Reverse     Exon E6
R2       CTG CCC ACA GAA AGT GCA AGA AG       Reverse     Exon E6
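As an illustrative aside, basic properties of the primers listed above, such as length and GC content, can be checked with a few lines of Python. This is a minimal sketch and not part of the original workflow; the helper name gc_content is hypothetical, and the sequences are copied from the primer table with the spaces removed.

```python
# Primer sequences copied from the primer table (spaces removed).
PRIMERS = {
    "FC": "CTGGTCATAAGCGGAGACTGGAA",
    "FE1A": "GTGTTTTCACCAACGTCCAGGAAG",
    "FE1B": "ATGCTGCCGAGTGAGGTGGTGAA",
    "R1": "GCAGAGGCAGAAGAATAACCAGAAC",
    "R2": "CTGCCCACAGAAAGTGCAAGAAG",
}


def gc_content(seq: str) -> float:
    """Return the percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


if __name__ == "__main__":
    # Print length and GC content for each primer.
    for name, seq in PRIMERS.items():
        print(f"{name}: {len(seq)} nt, GC = {gc_content(seq):.1f}%")
```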

Preparation of RNA from different tissues of mouse

Total cellular RNA was isolated from different mouse tissues using an RNA extraction kit (Intron Biotechnology, Gyeonggi-do, Korea) according to the manufacturer's instructions. Isolated RNA was dissolved in diethyl pyrocarbonate treated water and quantitated spectrophotometrically. Integrity of the RNA was confirmed by denaturing agarose gel electrophoresis. The RNA was either used immediately or stored at -80 °C. Animal experimentation was permitted by the Ministry of Environment and Forests, Government of India, under registration no. 714/02/a/CPCSEA, issued by the Committee for the Purpose of Control and Supervision of Experiments on Animals (CPCSEA) and dated 25th October, 2012, and was approved by the Institutional Animal Ethic Committee (IAEC) of the Department of Biochemistry, Faculty of Life Sciences, Aligarh Muslim University, Aligarh, India (Order no: D.No. 4165).

Single strand cDNA synthesis

Splicing patterns of the newly predicted exons with internal exons were characterized by synthesis of cDNA by reverse transcriptase and PCR amplification, followed by cloning and sequencing. Abcc4 mRNA was studied in mouse tissues such as brain, heart, kidney and liver. Tissue specific total RNA (2 μg) was reverse transcribed into cDNA using oligo(dT)18 primer in a total volume of 20 µL with RevertAid H Minus Reverse Transcriptase (Fermentas Life Sciences, USA) at 42°C for 1 h.

Touchdown PCR

To amplify the single stranded cDNA prepared from different tissues of mouse, touchdown PCR was performed using PCR master mix (Fermentas Life Sciences, USA) in a total volume of 50 μL. Each PCR tube contained a fixed amount of single stranded cDNA as template, a common reverse primer (R1) and one of the forward primers. PCR amplification was performed under the following conditions: denaturation for 30 s at 93°C; annealing for 30 s at 70°C with a decrease of 0.2°C per cycle; and extension for 40 s at 72°C. This was repeated for 35 cycles, followed by a final extension at 72°C for 8 min.
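The annealing schedule implied by these touchdown conditions can be sketched as below. This is only an illustration: it assumes that the first cycle anneals at 70°C and that the 0.2°C decrement is applied before each subsequent cycle, which the text does not state explicitly. The function name touchdown_annealing is hypothetical.

```python
def touchdown_annealing(start_c=70.0, step_c=0.2, cycles=35):
    """Annealing temperature (deg C) for each cycle.

    Assumption: cycle 1 uses start_c and every later cycle is
    step_c lower than the one before it.
    """
    return [round(start_c - step_c * i, 1) for i in range(cycles)]


schedule = touchdown_annealing()
# First and last annealing temperatures of the 35-cycle program.
print(schedule[0], schedule[-1])
```

Under this assumption the final (35th) cycle anneals at 63.2°C.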

Semi-nested PCR

To further verify that the PCR product was genuine, 1 μL of the product obtained in touchdown PCR was used as a template for further amplification by PCR for 30 cycles using the same upstream forward primer but a different common reverse primer (R2), which is located internal to R1. The resulting semi-nested PCR product (8 μL) was then subjected to electrophoresis on a 1.2 % (w/v) agarose gel, stained with ethidium bromide and photographed on a UV transilluminator.

Sub cloning and sequencing

The identity of the PCR products obtained after semi-nested PCR was confirmed by DNA sequencing. The PCR amplified products were electrophoresed on 1.2 % (w/v) agarose gels. The expected ethidium bromide stained bands were cut out from the gels and DNA was eluted using a Qiaquick PCR gel purification kit (Qiagen, Santa Clarita, CA). The purified DNA was sub cloned into TOPO vector (Invitrogen, Carlsbad, CA) and transformed into competent E. coli JM109 cells. Transformed colonies were grown overnight at 37°C for miniprep plasmid DNA purification. The purified plasmid containing the insert was sequenced on an automatic sequencer using either M13 forward or reverse primers.

Quantitative real time PCR (qRT-PCR)

qRT-PCR was used to study the expression of the control and novel transcript variants of the Abcc4 gene (Ishqi et al., 2016). Total RNA from mouse kidney was isolated as described above. After determining the RNA concentration using a Nanodrop UV spectrophotometer (Thermo Scientific, USA), 2.5 μg of RNA was converted to cDNA (as described above). Different preparations of cDNA samples were quantified in duplicate for the Abcc4 gene, while the actin gene was used as an internal control. All experiments were performed according to the protocol described in the manual (Thermo Scientific, USA) using an Eppendorf realplex4 Mastercycler. In brief, 20 µl PCR reaction mixtures included primers (300 nM each), 1 µl of 1:5 diluted template cDNA and 10 µl of 2X PCR master mix (SYBR Green). Initial denaturation was performed for 7 min at 95°C, followed by 40 cycles of 15 s at 95°C and 60 s at 60°C. Formation of multiple products by a set of primers was also ruled out. Real-time detection was carried out at the annealing stage. A common reverse primer (A4R) was used to amplify the transcripts along with an isoform specific forward primer. The primers were designed from the unique regions of the two isoforms. The following primer sequences were used for amplification of the transcripts:

Primer   Sequence (5'-3')               Direction   Primer location
A4CF     GTG TTC TTC TGG TGG CTC AAC    Forward     Junction of exon 1 and exon 2 of control transcript
A4NF     GAG AGG AAC GGT GGC TCA AC     Forward     Junction of exon 1 and exon 2 of novel transcript
A4R      CTT CCT CGA GTC CTT CTT GG     Reverse     3rd exon, common to both transcripts
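The manuscript does not state how relative expression was calculated from the qRT-PCR data. One widely used option is the 2^-ΔΔCt method, sketched below purely as an assumption: it presumes equal amplification efficiency for both amplicons and normalisation to the actin internal control. The function name and all Ct values are hypothetical and are not measurements from this study.

```python
def relative_expression(ct_target, ct_ref_target, ct_control, ct_ref_control):
    """Relative expression by the 2^-ddCt method (assumed, not from the paper).

    ct_target / ct_ref_target: Ct of the transcript of interest and of the
    reference gene (e.g. actin) in the same sample.
    ct_control / ct_ref_control: Ct of the comparator transcript and of the
    reference gene.
    """
    delta_target = ct_target - ct_ref_target
    delta_control = ct_control - ct_ref_control
    return 2.0 ** -(delta_target - delta_control)


# Hypothetical Ct values: the target crosses threshold one cycle later
# than the comparator, so it is expressed at half the comparator level.
print(relative_expression(26.0, 18.0, 25.0, 18.0))
```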

In silico analysis for promoter regions and post translational modifications

Sequences upstream of the existing first exon and the novel first exon were analyzed and compared for possible promoter regions using tools such as TFSEARCH and TFBIND (Tsunoda and Takagi, 1999). The putative transcription start site was predicted using the Promoter 2.0 prediction server (Knudsen, 1999). The sequences of the novel transcript isoforms were conceptually translated using the ExPASy Translate tool. The protein sequences were then analysed with the post-translational modification prediction tools available at the ExPASy resource portal. The amino acid sequences of the N-termini were compared using a multiple sequence alignment tool (ClustalW). With the help of the TermiNator tool, N-terminal methionine excision, translation efficiency and the predicted half-life of the N-termini of the different isoforms were studied (Martinez et al., 2008; Frottin et al., 2006).
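Conceptual translation itself is simple to reproduce. The sketch below is not the ExPASy Translate tool; it is a minimal stand-in that uses the standard genetic code. It translates the FE1B forward primer from the primer table in frame and checks that the result lies within the E1B-encoded peptide (MEMLPSEVVKPREER) reported in the Results. The function name translate is hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA sequence in frame 1, ignoring any trailing partial codon."""
    dna = dna.upper().replace(" ", "")
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


fe1b_primer = "ATG CTG CCG AGT GAG GTG GTG AA"  # from the primer table
e1b_peptide = "MEMLPSEVVKPREER"                 # from the Results section

fragment = translate(fe1b_primer)
print(fragment, fragment in e1b_peptide)
```

The translated primer fragment, MLPSEVV, matches residues 3 to 9 of the E1B-encoded peptide, consistent with the primer having been designed inside exon E1B.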

Results and Discussion

Two new coding exons of mouse Abcc4 predicted within first intron

With the help of bioinformatics tools, we predicted two new exons of the Abcc4 gene in mouse (Figure 1A). The new exons, E1A and E1B, are located between E1 and E2 and are capable of participating in alternative splicing with known exons. Each new exon can potentially replace E1, or both E1 and E2, through alternative splicing and generate different transcript isoforms differing at their 5' end (Figure 1B). Both E1A and E1B have high FEX scores, as does the published exon E1, which served as a positive control (Table 1). Hits with low scores and several false positives were filtered out using splicing rule information and comparative genomics. The presence of the start codon 'ATG' in the novel exons shows that these exons are capable of coding for novel N-termini of the encoded proteins. These exons were further subjected to TestCode analysis, available at the Sequence Manipulation Suite, which predicts the coding or non-coding nature of exons. As seen in Table 1, E1 (positive control) and E1B were found to be "coding" while E1A was "May be coding" in nature. In addition, the nucleotide and encoded amino acid sequences of these new exons were used to search for matching ESTs in various databases with the help of BLAST, and none of them yielded any positive hits.

Confirmation of predicted exon by RT-PCR and Semi-nested PCR

To confirm the existence of the predicted exons, RT-PCR followed by semi-nested PCR was performed. The PCR products obtained were fractionated by agarose gel electrophoresis and are shown in Figure 2. Lane "a" in Figure 2 represents the control band (550 bp) in kidney, while lanes "b-e" represent a product (391 bp) corresponding to the E1B exon in kidney, brain, heart and liver, respectively. In the case of the positive control, a band corresponding to CCDS27335.1 was obtained, which was further confirmed by sequencing. After sequencing of the PCR products, the existence of one novel transcript containing exon E1B was confirmed. The sequence of the Abcc4-N1 transcript was submitted to GenBank with accession number KX013261.

The novel transcript isoform containing the new exon E1B at the 5' end, replacing the known exon E1, was named Abcc4-N1; the amplicon included exons E1B, E2, E3, E5 and E6. Exon 4 (E4) was found to be missing from Abcc4-N1. Since the isoform corresponding to CCDS49565.1 also lacks exon 4, we have compared the novel isoform with this known isoform. We will refer to this transcript isoform as Abcc4 and the corresponding protein as ABCC4.

Expression of Abcc4 transcripts in different tissues during the development

Expression of the Abcc4-N1 transcript was studied in different tissues of mouse across three different developmental stages. The postnatal stages studied were PN3 (postnatal 3 days), PN15 (postnatal 15 days) and PN60 (postnatal 60 days). The published transcript, Abcc4, as well as the novel transcript, Abcc4-N1, was found to be ubiquitously expressed in all four tissues across all developmental stages (Table 2).

Quantitative real time RT-PCR

Expression levels of the control (Abcc4) and novel (Abcc4-N1) transcripts were studied by real-time RT-PCR using appropriate primers. For both variants, the forward primer was designed to include the exon junction region, while the reverse primer was the same. Expression levels of these transcripts in kidney are shown in Figure 3. Under normal conditions, the expression of Abcc4-N1 was 36% of that of Abcc4. Expression of Abcc4-N1 was found to be significantly lower, and this might be one of the reasons why it was not detected earlier.

Promoter region analysis

With the help of the web based tool TFBIND (Tsunoda and Takagi, 1999), various transcription factor (TF) binding sites were predicted in the region upstream of the first exon of each transcript. Putative TF binding sites with high scores are presented in Figure 4. Compared with Abcc4, the promoter region of Abcc4-N1 showed some similarities (GATA-1 and MZF-1) as well as differences in the potential TF binding sites. TF binding sites (high prediction score) present in the promoter region of Abcc4 are GATA-1, HSF2, AML-1a, c-Rel, Sp1, USF, MZF-1, STATx and c-ETS, and those of Abcc4-N1 are CdxA, MyoD, HNF-1, CP2, GATA-1, Nkx-2, GATA-2, deltaE and MZF-1. The existence of these highly diverse sets of TF binding sites may fulfil the need for differential expression of these isoforms in different cell types and/or across different developmental stages.
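The overlap between the two predicted sets of TF binding sites can be made explicit with a short set comparison. The sketch below simply restates the lists given in the text; it assumes that "MZF1" and "MZF-1" denote the same factor, and the variable names are hypothetical.

```python
# High-scoring predicted TF binding sites, as listed in the text.
abcc4_sites = {"GATA-1", "HSF2", "AML-1a", "c-Rel", "Sp1",
               "USF", "MZF-1", "STATx", "c-ETS"}
abcc4_n1_sites = {"CdxA", "MyoD", "HNF-1", "CP2", "GATA-1",
                  "Nkx-2", "GATA-2", "deltaE", "MZF-1"}

shared = abcc4_sites & abcc4_n1_sites       # sites predicted for both promoters
only_abcc4 = abcc4_sites - abcc4_n1_sites   # unique to the published transcript
only_n1 = abcc4_n1_sites - abcc4_sites      # unique to the novel transcript

print("shared:", sorted(shared))
print("Abcc4 only:", sorted(only_abcc4))
print("Abcc4-N1 only:", sorted(only_n1))
```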

Post translational modifications

In order to understand the function of the different isoforms of the Abcc4 gene, we performed comparative post-translational studies in silico with the help of ExPASy tools. The in silico study of the mouse Abcc4 gene revealed that the encoded N-termini of the new and published isoforms have unique properties and can undergo co-translational or post-translational modifications, which is important for imparting different functions to these protein isoforms. The nucleotide sequences of the first exon of each transcript were conceptually translated and studied for various post-translational modifications. The amino acid sequence obtained was compared with that of the published isoform using the multiple sequence alignment tool ClustalW (Figure 5). These isoforms were found to differ only at their N-termini, in the region encoded by their first exon. All the properties mentioned hereafter are for the amino acid sequence encoded by the first exon only and not for the complete protein.

The molecular weight (MW) and isoelectric point (pI) of the sequences encoded by the first exon were calculated using tools available at the ExPASy site (Bjellqvist et al., 1994). In the published Abcc4 isoform, the first exon E1 codes for a sequence of 2.9 kDa (MLPVHTEVKPNPLQDANLCSRVFFW), whereas the sequence encoded by the first exon (E1B) of Abcc4-N1 is 1.8 kDa (MEMLPSEVVKPREER). There was also a considerable difference in the isoelectric point (pI) of these sequences: for ABCC4 the pI is close to 6.59, while for ABCC4-N1 the pI was calculated to be 4.94. Like the published isoform ABCC4, the novel isoform ABCC4-N1 did not have any signal peptide. The first exon encoded sequence of the published ABCC4 isoform was found to have one predicted O-glycosylation site (Steentoft et al., 2013) in the region studied, whereas ABCC4-N1 had no such site. N-glycosylation, phosphorylation, PKA dependent phosphorylation (Blom et al., 2004) and sulfation sites (Chang et al., 2009) were found to be absent in both these sequences (results not shown).
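The molecular weights quoted for the two first-exon peptides can be approximately reproduced by summing average residue masses and adding one water molecule. The sketch below is an independent check, not the ExPASy tool used in the study; the residue mass table holds standard average values and is an assumption of this example, as is the function name molecular_weight.

```python
# Average amino acid residue masses in daltons (standard values, assumed here).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water per peptide (free N- and C-termini)


def molecular_weight(peptide: str) -> float:
    """Average molecular weight of an unmodified linear peptide, in daltons."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


e1_peptide = "MLPVHTEVKPNPLQDANLCSRVFFW"  # encoded by E1 (published isoform)
e1b_peptide = "MEMLPSEVVKPREER"           # encoded by E1B (Abcc4-N1)

for name, pep in (("E1", e1_peptide), ("E1B", e1b_peptide)):
    print(f"{name}: {len(pep)} aa, {molecular_weight(pep) / 1000:.1f} kDa")
```

With these masses the two peptides come out at about 2.9 kDa and 1.8 kDa, in line with the values reported in the text.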

The fate and stability of a protein depend upon the N-terminal sequence of the polypeptide, which therefore influences the half-life of the protein (Giglione and Meinnel, 2001). The N-termini of these isoforms were studied for various characteristic properties, including N-terminal methionine excision (NME), translation efficiency and the predicted half-life of the protein sequence. These studies were conducted with the help of the TermiNator tool, which also predicts N-terminal acetylation, N-terminal myristoylation and S-palmitoylation. The isoforms ABCC4 and ABCC4-N1 were predicted to retain the N-terminal methionine and not to undergo N-terminal excision. ABCC4-N1 was predicted to be acetylated at the N-terminus. Further, ABCC4-N1 was predicted to be translated more efficiently than ABCC4. The predicted half-lives of these protein isoforms are shown in Table 3.

Functional significance of alternatively spliced isoform

Alternative splicing generates different proteins from the same gene. In the mouse Abcc4 gene, we identified an isoform that contains a novel first exon, which splices with the internal exon 2 to form the mature mRNA. In order to understand the consequence of alternative splicing of the first exon, we performed BLAST-P using the 25 amino acids encoded by exon 1. The 25 amino acid residues present at the N-terminus of ABCC4 were found to be highly conserved across different organisms. The N-terminal amino acid sequences of ABCC4 from four different organisms, namely Mus musculus, Rattus norvegicus, Canis lupus familiaris and Homo sapiens, were aligned using ClustalW and are shown in Figure 6A. The conserved 25 amino acid sequence of ABCC4 is replaced by a novel 15 amino acid sequence at the N-terminus of the ABCC4-N1 isoform. This is expected to alter the functional aspect of ABCC4. Further, BLAST-P was performed using the novel 15 amino acid sequence of ABCC4-N1. As seen in Figure 6B, a few hits were obtained that showed some level of sequence similarity to the middle or C-terminal region of bacterial ABC transporter ATP-binding proteins (identity: 69%-75%). The change in the ABCC4-N1 sequence may assign a different role to this ABC transporter. Further downstream experiments can be performed to evaluate the functioning of the novel isoform.

Conclusion

Using a combination of bioinformatics and molecular biology techniques, we have identified a novel transcript of the Abcc4 gene in mouse. The novel transcript was found to be expressed in all four tissues studied across different developmental stages. In silico analysis was performed with the help of different tools available online. Various post-translational modifications in the novel as well as the published isoform are reported. ABCC4 and ABCC4-N1 were predicted to show no N-terminal methionine cleavage. Promoter analysis showed diverse sets of transcription factor binding sites. The amino acid sequence encoded by exon 1 was found to be highly conserved in different organisms and was replaced by a novel 15 amino acid sequence which shows some homology with ABC transporters of lower organisms. However, further studies are required to characterise the functional aspects of the novel isoform at the protein level and its expression in the presence of different drugs, since various drugs have been reported to modulate alternative splicing (Rehman et al., 2015).

Acknowledgements

The authors are thankful to C.S.I.R, New Delhi, for the award of CSIR-SRF fellowship to SUR (File no - 09/112(0470)/2011-EMR1). We are thankful to the Department of Biotechnology, New Delhi, India for providing generous funding to MT (Grant No: BT/PR5271/BID/7/395/2012). The necessary facilities provided by the Department of Biochemistry, A.M.U., Aligarh, India are also acknowledged.

References

Banday, A.R., Azim, S., Rehman, S.U., Tabish, M., 2012. Two novel N-terminal coding exons of Prkar1b gene of mouse: identified using a novel approach of in silico and molecular biology techniques. Gene 500, 73-79.

Belleville-Rolland, T., Sassi, Y., Decouture, B., Dreano, E., Hulot, J.S., Gaussem, P., Bachelot-Loza, C., 2016. MRP4 (ABCC4) as a potential pharmacologic target for cardiovascular disease. Pharmacol. Res. 107, 381-389.

Beretta, G.L., Benedetti, V., Cossa, G., Assaraf, Y.G., Bram, E., Gatti, L., Corna, E., Carenini, N., Colangelo, D., Howell, S.B., Zunino, F., Perego, P., 2010. Increased levels and defective glycosylation of MRPs in ovarian carcinoma cells resistant to oxaliplatin. Biochem. Pharmacol. 79, 1108-1117.

Bjellqvist, B., Basse, B., Olsen, E., Celis, J.E., 1994. Reference points for comparisons of two-dimensional maps of proteins from different human cell types defined in a pH scale where isoelectric points correlate with polypeptide compositions. Electrophoresis 15, 529-539.

Blom, N., Sicheritz-Pontén, T., Gupta, R., Gammeltoft, S., Brunak, S., 2004. Prediction of post-translational glycosylation and phosphorylation of proteins from the amino acid sequence. Proteomics 4, 1633-1649.

Burge, C., Karlin, S., 1997. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268, 78-94.

Chang, W.C., Lee, T.Y., Shien, D.M., Hsu, J.B., Horng, J.T., Hsu, P.C., Wang, T.Y., Huang, H.D., Pan, R.L., 2009. Incorporating support vector machine for identifying protein tyrosine sulfation sites. J. Comput. Chem. 30, 2526-2537.

Drenberg, C.D., Hu, S., Li, L., Buelow, D.R., Orwick, S.J., Gibson, A.A., Schuetz, J.D., Sparreboom, A., Baker, S.D., 2016. ABCC4 Is a Determinant of Cytarabine-Induced Cytotoxicity and Myelosuppression. Clin. Transl. Sci. 9, 51-59.

Fickett, J.W., 1982. Recognition of protein coding regions in DNA sequences. Nucleic Acids Res. 10, 5303-5318.

Frottin, F., Martinez, A., Peynot, P., Mitra, S., Holz, R.C., Giglione, C., Meinnel, T., 2006. The proteomics of N-terminal methionine cleavage. Mol. Cell Proteomics 5, 2336-2349.

Giglione, C., Meinnel, T., 2001. Organellar peptide deformylases: universality of the N-terminal methionine cleavage mechanism. Trends Plant Sci. 6, 566-72.

Gökirmak, T., Campanale, J.P., Reitzel, A.M., Shipp, L.E., Moy, G.W., Hamdoun, A., 2016. Functional diversification of sea urchin ABCC1 (MRP1) by alternative splicing. Am. J. Physiol. Cell Physiol. 310, C911-C920.

Haimeur, A., Conseil, G., Deeley, R.G., Cole, S.P., 2004. The MRP-related and BCRP/ABCG2 multidrug resistance proteins: biology, substrate specificity and regulation. Current Drug Metab. 5, 21-53.

Hendig, D., Langmann, T., Zarbock, R., Schmitz, G., Kleesiek, K., Götting, C., 2009. Characterization of the ATP-binding cassette transporter gene expression profile in Y79: a retinoblastoma cell line. Mol. Cell. Biochem. 328, 85-92.

Holla, V.R., Backlund, M.G., Yang, P., Newman, R.A., DuBois, R.N., 2008. Regulation of prostaglandin transporters in colorectal neoplasia. Cancer Prev. Res. 1, 93-99.

Ishqi, H.M., Rehman, S.U., Sarwar, T., Husain, M.A., Tabish, M., 2016. Identification of differentially expressed three novel transcript variants of mouse ARNT gene. IUBMB Life 68, 122-135.

Kelemen, O., Convertini, P., Zhang, Z., Wen, Y., Shen, M., Falaleeva, M., Stamm, S., 2013. Function of alternative splicing. Gene 514, 1-30.

Knudsen, S., 1999. Promoter 2.0: for the recognition of PolII promoter sequences. Bioinformatics 15, 356-361.

Leggas, M., Adachi, M., Scheffer, G.L., Sun, D., Wielinga, P., Du, G., Mercer, K.E., Zhuang, Y., Panetta, J.C., Johnston, B., Scheper, R.J., Stewart, C.F., Schuetz, J.D., 2004. Mrp4 confers resistance to topotecan and protects the brain from chemotherapy. Mol. Cell. Biol. 24, 7612-7621.

Lin, F., Marchetti, S., Pluim, D., Iusuf, D., Mazzanti, R., Schellens, J.H., Beijnen, J.H., van Tellingen, O., 2013. Abcc4 together with abcb1 and abcg2 form a robust cooperative drug efflux system that restricts the brain entry of camptothecin analogues. Clin. Cancer Res. 19, 2084-95.

Martinez, A., Traverso, J.A., Valot, B., Ferro, M., Espagne, C., Ephritikhine, G., Zivy, M., Giglione, C., Meinnel, T., 2008. Extent of N-terminal modifications in cytosolic proteins from eukaryotes. Proteomics 8, 2809-31.

Nilsen, T.W., Graveley, B.R., 2010. Expansion of the eukaryotic proteome by alternative splicing. Nature 463, 457-463.

Rehman, S.U., Husain, M.A., Sarwar, T., Ishqi, H.M., Tabish, M., 2015. Modulation of alternative splicing by anticancer drugs. Wiley Interdiscip. Rev. RNA 6, 369-79.

Solovyev, V., Kosarev, P., Seledsov, I., Vorobyev, D., 2006. Automatic annotation of eukaryotic genes, pseudogenes and promoters. Genome Biol. 7, 10.1-10.12.

Solovyev, V.V., Salamov, A.A., Lawrence, C.B., 1994. Predicting internal exons by oligonucleotide composition and discriminant analysis of spliceable open reading frames. Nucleic Acids Res. 22, 5156-5163.

Stamm, S., Ben-Ari, S., Rafalska, I., Tang, Y., Zhang, Z., Toiber, D., Thanaraj, T.A., Soreq, H., 2005. Function of alternative splicing. Gene 344, 1-20.

Steentoft, C., Vakhrushev, S.Y., Joshi, H.J., Kong, Y., Vester-Christensen, M.B., Schjoldager, K.T., Lavrsen, K., Dabelsteen, S., Pedersen, N.B., Marcos-Silva, L., Gupta, R., Bennett, E.P., Mandel, U., Brunak, S., Wandall, H.H., Levery, S.B., Clausen, H., 2013. Precision mapping of the human O-GalNAc glycoproteome through SimpleCell technology. The EMBO Journal 32, 1478-1488.

Tian, Q., Zhang, J., Tan, T.M., Chan, E., Duan, W., Chan, S.Y., Boelsterli, U.A., Ho, P.C., Yang, H., Bian, J.S., Huang, M., Zhu, Y.Z., Xiong, W., Li, X., Zhou, S., 2005. Human multidrug resistance associated protein 4 confers resistance to camptothecins. Pharm. Res. 22, 1837-1853.

Tsunoda, T., Takagi, T., 1999. Estimating transcription factor bindability on DNA. Bioinformatics 15, 622-30.

Zhang, G., Wang, Z., Qian, F., Zhao, C., Sun, C., 2015. Silencing of the ABCC4 gene by RNA interference reverses multidrug resistance in human gastric cancer. Oncol. Rep. 33, 1147-1154.

Zhao, X., Guo, Y., Yue, W., Zhang, L., Gu, M., Wang, Y., 2014. ABCC4 is required for cell proliferation and tumorigenesis in non-small cell lung cancer. Onco. Targets Ther. 21, 343-51.

Figure Legends

Figure 1. Exon-intron organisation and alternative splicing pattern of published and newly predicted Abcc4 transcripts of mouse. (A) Exons are represented by rectangular boxes and the interconnecting lines represent introns. Known exons (E1 to E31) are shown as black boxes, whereas coloured boxes represent newly predicted exons. The sizes of the exons and introns are not to scale. Dashed lines show the splicing patterns predicted to generate two new transcripts having either exon E1A or exon E1B as the first exon at the 5' end, along with the known transcript having exon E1 as the first exon at the 5' end. TSS is the putative transcription start site for the known variant containing E1, while TSS-N and TSS-N’ are the putative transcription start sites for the novel variants containing E1A and E1B, respectively. (B) Design of forward and reverse primers. The control forward primer (FC) was designed from the published exon 2 region. Reverse primer R1 was from exon 6, while reverse primer R2 was located internally to R1 (also in exon 6). The R2 primer was designed for semi-nested PCR. The forward primers FE1A and FE1B were designed from the newly predicted exons E1A and E1B, respectively.

Figure 2. RT-PCR followed by semi-nested PCR products fractionated by agarose gel electrophoresis. Total RNA isolated from kidney was reverse transcribed and amplified by RT-PCR followed by semi-nested PCR using a common reverse primer R2. Lane ‘a’ corresponds to the control product of 550 bp obtained using forward primer FC, while lanes ‘b-e’ represent a product of the anticipated size (391 bp) obtained using forward primer FE1B in kidney, brain, heart and liver, respectively. No PCR product was obtained in the tissues tested when forward primer FE1A was used. The sizes of the PCR ladder bands are shown on the left. The expected sizes of the PCR products using forward primer FE1B and reverse primer R2 depend on the splicing pattern as follows. (1) 616 bp when exon E1B splices with E2 and E4 is included in the transcript, and 391 bp when E4 is also spliced out of the transcript. (2) 505 bp when E1B splices directly with E3 and E4 is included in the transcript; if E4 is spliced out, a product of 280 bp is expected.
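The four expected product sizes given in the Figure 2 legend are mutually consistent and can be reproduced with simple arithmetic. The following is only an illustrative sketch: the lengths attributed to E2 (111 bp) and E4 (225 bp), and the 280 bp portion of the amplicon common to all products, are not stated in the legend but are back-calculated here from the differences between the reported sizes, and the function name is hypothetical.

```python
# Lengths inferred from the four product sizes reported in the legend
# (assumption: back-calculated values, not read from the gene sequence).
E4_LEN = 616 - 391  # 225 bp: E4-included minus E4-skipped product
E2_LEN = 616 - 505  # 111 bp: E1B-E2 splicing minus E1B-E3 splicing
CONSTANT_LEN = 616 - E2_LEN - E4_LEN  # 280 bp: portion present in every product


def expected_product_size(includes_e2: bool, includes_e4: bool) -> int:
    """Return the expected FE1B/R2 semi-nested PCR product size in bp."""
    size = CONSTANT_LEN
    if includes_e2:
        size += E2_LEN
    if includes_e4:
        size += E4_LEN
    return size


if __name__ == "__main__":
    for e2 in (True, False):
        for e4 in (True, False):
            size = expected_product_size(e2, e4)
            print(f"E2 included={e2}, E4 included={e4}: {size} bp")
```

Under these assumptions, the observed 391 bp band corresponds to the case in which E1B splices with E2 and E4 is skipped.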

Figure 3. Relative expression of Abcc4 transcript variants. Oligo(dT)18-primed cDNA was synthesized from mRNA obtained from adult mouse kidney. Quantitative real-time RT-PCR was used to study the expression level of the isoforms using exon-specific primers. Forward primers were designed to include the exon junction region, while a common reverse primer was used. The fold change with respect to the control is reported.

Figure 4. Comparative in silico prediction of different transcription factor binding sites. Regions located up to 600 bp upstream of the first exon of each isoform were analyzed for possible transcription factor binding sites. Only results with high probabilities are reported here.

Figure 5. Multiple sequence alignment (Clustal W) of the amino acid sequences encoded by the first two exons of the published ABCC4 and the new isoform ABCC4-N1 in mouse. A marked difference is seen at the N-terminus of these isoforms. The arrow shows the sequence corresponding to the start of exon 2. Upstream of this are the sequences of the alternatively spliced first exons. Asterisks show residues that are identical in both isoforms.

Figure 6. Amino acid sequence comparison with other organisms using BLAST-P. (A) The amino acid sequence encoded by exon 1 of mouse Abcc4 shows high similarity with sequences from higher organisms. (B) The amino acid sequence encoded by the first exon of Abcc4-N1 (query) shows (1) 75% and (2) 69% identity with ABC transporters of some lower organisms.

Figures

Figure 1

Figure 2

Figure 3

Figure 4

Figure 5

Figure 6

Table 1: List of published and predicted exons with their FEX and TestCode scores.

Exons                 FEX score   TestCode score   Coding/Non-coding
Known exon E1         23.09       1.324            Coding
Predicted exon E1A    4.38        0.785            May be coding
Predicted exon E1B    6.33        1.241            Coding

Exon 1 (E1) was taken as a positive control for both FEX and TestCode analysis.
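The coding calls in Table 1 agree with the cut-offs conventionally used for Fickett's TestCode statistic (scores below 0.74 read as non-coding, 0.74 to 0.95 as "no opinion", and higher scores as coding). The sketch below assumes those conventional thresholds, which are not stated in the table itself; the function name is hypothetical, and the intermediate band is labelled "May be coding" only to match the wording of Table 1.

```python
# Conventional TestCode cut-offs (assumption: not stated in Table 1 itself).
NONCODING_BELOW = 0.74
CODING_FROM = 0.95


def classify_testcode(score: float) -> str:
    """Map a TestCode score to the three-way call used in Table 1."""
    if score < NONCODING_BELOW:
        return "Non-coding"
    if score < CODING_FROM:
        return "May be coding"  # the 'no opinion' band of the statistic
    return "Coding"


# Scores as listed in Table 1.
table1_scores = {
    "Known exon E1": 1.324,
    "Predicted exon E1A": 0.785,
    "Predicted exon E1B": 1.241,
}

if __name__ == "__main__":
    for exon, score in table1_scores.items():
        print(f"{exon}: {score} -> {classify_testcode(score)}")
```

With these thresholds, E1 and E1B fall in the coding range while E1A falls in the intermediate band, as reported in the table.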

Table 2. Expression profile of novel isoforms in different tissues across different developmental stages.

Tissue    PN3   PN15   PN60
Brain     +     +      +
Heart     +     +      +
Kidney    +     +      +
Liver     +     +      +

Three different postnatal (PN) stages were studied: PN3, PN15 and PN60. The appearance of RT-PCR products of the anticipated size is represented by a (+) sign and denotes expression of the control and novel isoforms. cDNA was prepared independently from three different mice of each group mentioned.

Table 3. Various properties of the N-terminal amino acid sequences of the published and novel protein isoforms as predicted by the TermiNator tool.

*M(1)

ABCC4-N1

MEMLPSEVVKPREERWLNPL

Ac-M(1)

MLPVHTEVKPNPLQDANLCS

Translation efficiency (b)

ABCC4

Likelihood %

Predicted half life (h)

100

1

?

100

5

5-31

Predicted N-terminus of mature protein (a)

Amino acid sequence (20 aa at N-terminal)

(a) (1) or (2) indicates the position of the amino acid in the protein sequence starting with M(1).

(b) Translation efficiency refers to the capacity of the translation machinery to translate the mRNA into protein. A higher value denotes more efficient translation on a scale of 1-5.

Abbreviations:

RT-PCR, reverse transcriptase polymerase chain reaction

EST, expressed sequence tag

PN, postnatal

SMS, Sequence Manipulation Suite

qRT-PCR, quantitative real-time PCR

Highlights

The mouse Abcc4 gene was studied to identify novel alternatively spliced transcripts.

Expression of a novel transcript was confirmed using molecular biology techniques.

The novel transcript has a different promoter region and transcription start site.

The conceptually translated transcript shows different post-translational modifications.
