Unraveling the nuclear and chloroplast genomes of an agar producing red macroalga, Gracilaria changii (Rhodophyta, Gracilariales)


Genomics xxx (xxxx) xxx–xxx

Contents lists available at ScienceDirect

Genomics journal homepage: www.elsevier.com/locate/ygeno

Chai-Ling Ho⁎, Wei-Kang Lee, Ee-Leen Lim

Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia, 43400 Serdang, Selangor, Malaysia

ARTICLE INFO

ABSTRACT

Keywords: Agarophyte; Gracilaria changii; Nuclear genome; Plastid genome

Agar and agarose have wide applications in the food and pharmaceutical industries, yet knowledge of the genomes of the red seaweeds that produce them is still lacking. To fill this gap in genome analyses of these red algae, we have sequenced the nuclear and organellar genomes of an agarophyte, Gracilaria changii. The partial nuclear genome sequence of G. changii has a total length of 35.8 Mb with 10,912 predicted protein coding sequences. Only 39.4% of the predicted proteins had significant matches to protein sequences in SwissProt. The chloroplast genome of G. changii is 183,855 bp, with a total of 201 open reading frames (ORFs), 29 tRNAs and 3 rRNAs predicted. Five genes, ssrA, leuC, leuD, CP76_p173 (orf139) and pbsA, were absent from the chloroplast genome of G. changii. The genome information is valuable for accelerating functional studies of individual genes and for resolving the evolutionary relationships of red seaweeds.

1. Introduction

Macroalgae are macroscopic, multicellular algae [1]. Many of them are natural sources of phycocolloids, such as agar, agarose and carrageenan, which are produced by red macroalgal species, and alginate, which is produced by brown macroalgal species. These phycocolloids have gelling, water-retention and emulsifying properties that have wide applications in the food and pharmaceutical industries. Agar is also extensively used as a culture medium in microbiology research, while agarose is used for DNA separation by gel electrophoresis [2]. Whole genome DNA sequencing has revolutionized biology by enabling the determination of the complete DNA sequence of an organism, which provides a full description of its genes and other important biological information stored in the genome. A whole genome sequence also provides information on the complete set of proteins, revealing how genetic information determines the development, structure and function of a living organism and its interaction with the environment; it also enables comparisons of proteins between species and exploration of the evolutionary relationships among them. Unlike higher terrestrial plants and microalgae, little was known about macroalgal genomes until recently, when the brown macroalga Ectocarpus siliculosus [3] was sequenced, followed by the red macroalga Chondrus crispus [4]. To date, five red algal genomes have been completely sequenced, i.e., Cyanidioschyzon merolae [5], Galdieria sulphuraria [6], Porphyridium cruentum [7] and Pyropia yezoensis [8], in addition to Ch. crispus. Among these, only two are from red macroalgae, i.e., Ch. crispus and Py. yezoensis. While the genomes of Ch. crispus and Ec. siliculosus serve as reference genomes for carrageenophytes and alginophytes, which produce carrageenan and alginates, respectively, a reference genome for macroalgae that produce agar and agarose is lacking. Analysis of the Ectocarpus genome revealed genes for alginate biosynthesis and modifying enzymes, including mannuronan C5 epimerases, that are likely to modulate the physicochemical properties of alginates in the cell wall [3]. The genome of Ch. crispus provides insights into the metabolism of a carrageenophyte, including genes encoding numerous glycoside hydrolases (GH) and glycosyltransferases (GT) that are involved in cell-wall metabolism, polysaccharide biosynthesis and glycosylation (of proteins and lipids). In addition, twelve genes encoding galactose-6-sulfurylases (involved in the final step of carrageenan biosynthesis), three genes for κ-carrageenases (involved in cell-wall expansion and recycling), and homologous genes for chondroitin synthase and carbohydrate sulfotransferases (CSTs) that are involved in the biosynthesis of glycosaminoglycans were reported [4]. Although the mapping of these genes to the carrageenan biosynthetic pathway is still far from complete, genome sequencing is a key step towards achieving this. To fill the gap in genome analyses of phycocolloid-producing algae, we have sequenced the nuclear and organellar genomes of an agarophyte, Gracilaria changii, collected from a tropical mangrove

⁎ Corresponding author at: Department of Cell and Molecular Biology, Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia, 43400 Serdang, Selangor, Malaysia. E-mail address: [email protected] (C.-L. Ho).

http://dx.doi.org/10.1016/j.ygeno.2017.09.003 Received 14 July 2017; Received in revised form 5 September 2017; Accepted 6 September 2017 0888-7543/ © 2017 Elsevier Inc. All rights reserved.

Please cite this article as: Ho, C.-L., Genomics (2017), http://dx.doi.org/10.1016/j.ygeno.2017.09.003


C.-L. Ho et al.

Gracilaria changii (Xia et Abbott) Abbott, Zhang et Xia was collected from the mangrove swamp at Morib, Selangor, Malaysia (02° 45.808′ N; 101° 26.143′ E). The seaweeds were washed free of mud, and visible epiphytes and epibionts were removed before the samples were frozen in liquid nitrogen and stored at −80 °C.

to produce a minimum of 10 Gb of data. The data were deposited in the European Nucleotide Archive (ENA) under accession number PRJEB20769. The paired-end raw reads (100 bases each) were trimmed and filtered with CLC Genomics Workbench 4.9 before being assembled with an optimized word size of 47. Gene prediction was performed with AUGUSTUS [21], using a red algal genome (Galdieria sulphuraria) as the model. The predicted coding sequences were annotated against the SwissProt database at the National Center for Biotechnology Information (NCBI) using the BlastP algorithm [22] with an E-value cutoff of 10⁻⁵. Assembled contigs with an average coverage below 100 were removed because contigs with low average coverage are more likely to be contaminated by sequences of bacterial origin. Only contigs with a minimum average coverage of 100 were used for subsequent genome analyses. These contigs were matched against expressed sequence tags (ESTs) [10] and de novo assembled transcripts from G. changii [13] using the BlastN algorithm [22] with an E-value cutoff of 10⁻²⁰. A less stringent cutoff (10⁻⁵) was used for cross-species comparison with protein sequences in the SwissProt database, whereas a more stringent cutoff (10⁻²⁰) was set for DNA sequence comparisons within G. changii. The completeness of the predicted gene set (proteins) from the genome assembly was assessed with Benchmarking Universal Single-Copy Orthologs (BUSCO) v2.0 using default settings and the Eukaryota odb9 dataset [23]. Gene ontology (GO) terms for these sequences were retrieved by using the matching SwissProt IDs as input for Argot2 (Annotation Retrieval of Gene Ontology Terms) [24]. Kyoto Encyclopedia of Genes and Genomes (KEGG) Orthology (KO) numbers for these sequences were retrieved with BlastKOALA at the KOALA (KEGG Orthology And Links Annotation) webpage (www.kegg.jp/blastkoala/). Multiple sequence alignments of red algal sequences were performed with Clustal W [25].
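The E-value filtering described above can be illustrated with a short Python sketch. It assumes the hits are available as BLAST+ tabular output (`-outfmt 6`, standard 12 columns). The function name `best_hits` and the example records are hypothetical. The same function is simply called with the looser cutoff (10⁻⁵) used for cross-species protein searches and with the stricter one (10⁻²⁰) used for within-species nucleotide searches.

```python
import csv

# Standard 12 columns of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(lines, max_evalue):
    """Return {query: (subject, bitscore, evalue)} for the highest-scoring
    hit of each query whose E-value does not exceed max_evalue."""
    best = {}
    for row in csv.reader(lines, delimiter="\t"):
        hit = dict(zip(FIELDS, row))
        evalue = float(hit["evalue"])
        bits = float(hit["bitscore"])
        if evalue > max_evalue:
            continue  # not significant at this cutoff
        query = hit["qseqid"]
        if query not in best or bits > best[query][1]:
            best[query] = (hit["sseqid"], bits, evalue)
    return best


# Hypothetical hits for three predicted proteins.
example = [
    "gene1\tP12345\t45.2\t300\t150\t5\t1\t300\t10\t310\t1e-40\t250",
    "gene1\tQ67890\t30.1\t200\t130\t8\t5\t205\t1\t200\t1e-08\t90",
    "gene2\tP11111\t25.0\t80\t55\t4\t1\t80\t1\t80\t0.003\t35",
    "gene3\tP22222\t60.0\t150\t60\t2\t1\t150\t1\t150\t1e-10\t120",
]

print(best_hits(example, max_evalue=1e-5))   # cross-species protein cutoff
print(best_hits(example, max_evalue=1e-20))  # stricter within-species cutoff
```

With the looser cutoff, gene1 and gene3 keep a best hit. With the stricter cutoff, only gene1 survives, which illustrates why the choice of threshold matters for the reported match percentages.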
Phylogenetic analyses were conducted in MEGA4 [26] using the Neighbor-Joining method [27], with bootstrap support estimated from 1000 resampled replicates of the sequence alignment.
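For readers unfamiliar with the Neighbor-Joining method [27], its core can be sketched in a few lines of Python. This is a textbook illustration only and not the MEGA4 implementation. It takes a precomputed distance matrix, whereas in practice distances would first be estimated from the alignment. It also omits the bootstrap, which would repeat the procedure on alignments resampled column-wise with replacement. The function name and the five-taxon matrix, a standard textbook example, are illustrative.

```python
def neighbor_joining(names, dist):
    """Build an unrooted Neighbor-Joining tree and return it in Newick format.

    names: list of taxon labels; dist: symmetric distance matrix (list of lists).
    """
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    # Distances keyed by node label so that new internal nodes can be added.
    d = {a: {b: float(dist[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Pick the pair minimising Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        best = None
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from each joined node to the new internal node.
        la = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        new = f"({a}:{la:g},{b}:{lb:g})"
        d[new] = {}
        for c in nodes:
            if c not in (a, b):
                d[new][c] = d[c][new] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [new]
    # Join the last three nodes at a single central node (unrooted tree).
    x, y, z = nodes
    lx = 0.5 * (d[x][y] + d[x][z] - d[y][z])
    ly = 0.5 * (d[x][y] + d[y][z] - d[x][z])
    lz = 0.5 * (d[x][z] + d[y][z] - d[x][y])
    return f"({x}:{lx:g},{y}:{ly:g},{z}:{lz:g});"


taxa = ["a", "b", "c", "d", "e"]
matrix = [[0, 5, 9, 9, 8],
          [5, 0, 10, 10, 9],
          [9, 10, 0, 8, 7],
          [9, 10, 8, 0, 3],
          [8, 9, 7, 3, 0]]
print(neighbor_joining(taxa, matrix))  # → (d:2,e:1,(c:4,(a:2,b:3):3):2);
```

The result is unrooted, which matches the "unrooted phylogenetic tree" captions used for the figures. With non-additive distances, NJ can produce small negative branch lengths, which tools usually report or clamp.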

2.2. DNA extraction

Frozen G. changii sample (approximately 3.5 g) was ground to a fine powder in liquid nitrogen with sand, using a mortar and pestle. The powder was distributed evenly into 15 ml of extraction buffer containing 25 mM Tris-Cl, pH 8; 50 nM LiCl; 35 mM EDTA, pH 8; 35 mM EGTA, pH 8; and 5% (w/v) SDS, and 7.5 ml of phenol. The mixture was mixed thoroughly before 7.5 ml of chloroform/isoamyl alcohol (24:1) was added, and the mixture was then mixed again and centrifuged at 10,000g for 15 min. The top aqueous layer was transferred to a new tube and extracted twice with an equal volume of phenol/chloroform/isoamyl alcohol (PCI; 25:24:1). The DNA was then precipitated with 0.7 vol of isopropanol and incubated overnight at −20 °C. The tube was centrifuged at 10,000g at 4 °C for 30 min. The DNA pellet was rinsed with 70% (v/v) ethanol, centrifuged at 10,000g at 4 °C for 15 min, and air dried. The pellet was dissolved in 400 μl of TE buffer (100 mM Tris-Cl, pH 8; 10 mM EDTA, pH 8). The DNA was treated with 10 μg of RNase at 37 °C for 30 min, followed by extraction with an equal volume of PCI. The DNA was then precipitated with 2.5 vol of ice-cold absolute ethanol and 0.1 vol of 3 M sodium acetate, and incubated overnight at −80 °C. Subsequently, the DNA was centrifuged at 12,000g at 4 °C for 30 min. After rinsing with 70% (v/v) ethanol and air drying, the pellet was re-suspended in 35 μl of TE buffer. The absorbance of the DNA was measured at 260, 280 and 230 nm with a spectrophotometer (Implen NanoPhotometer, Alpha Technologies Ltd., Ireland). The quality of the DNA was further examined on a 1% (w/v) agarose gel in 1 × TAE buffer (Tris base, glacial acetic acid, 0.5 M EDTA) stained with ethidium bromide.

2.4. Ortholog analyses

The protein sequences predicted in this study from contigs with an average coverage of at least 100 were submitted to OrthoMCL (http://www.orthomcl.org/orthomcl/) for the assignment of ortholog groups. They were submitted together with those from five red algae that have been completely sequenced: Chondrus crispus Stackhouse (http://protists.ensembl.org/Chondrus_crispus/Info/Index); Cyanidioschyzon merolae P. De Luca, R. Taddei & L. Varano (http://merolae.biol.s.u-tokyo.ac.jp/blast/blast.html); Galdieria sulphuraria (Galdieri) Merola (http://protists.ensembl.org/Galdieria_sulphuraria/Info/Index); Porphyridium cruentum (S.F. Gray) Nägeli (http://cyanophora.rutgers.edu/porphyridium/); and Pyropia yezoensis (Ueda) M.S. Hwang & H.G. Choi (http://nrifs.fra.affrc.go.jp/ResearchCenter/5_AG/genomes/nori/index_j.html). Protein sequences whose BlastP results met the thresholds (E-value cutoff of 10⁻⁵ and a minimum of 50% match) were assigned to the group of the best-matching OrthoMCL protein. For the remaining proteins, which did not match any OrthoMCL protein, the InParalog algorithm was used to find potential paralog pairs, which were then clustered with the Markov Clustering algorithm (www.micans.org/mcl) [28].
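The Markov Clustering step applied to the remaining proteins alternates matrix "expansion" (matrix squaring) and "inflation" (element-wise powering followed by renormalization) until the flow matrix converges. The pure-Python sketch below illustrates the idea on a small unweighted similarity graph. It is not the OrthoMCL pipeline, which, unlike this toy, operates on weighted BLAST-derived similarities. The function names, the inflation value of 2.0 and the toy graph are assumptions made for illustration.

```python
def normalize_columns(m):
    """Scale each column of a square matrix so that it sums to 1."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain square matrix product (adequate for small toy graphs)."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(adjacency, inflation=2.0, max_iter=100, tol=1e-9):
    """Markov clustering of an undirected graph given as a 0/1 matrix.

    Returns a list of clusters (frozensets of node indices).
    """
    n = len(adjacency)
    # Add self-loops, then turn the graph into a column-stochastic matrix.
    m = normalize_columns([[adjacency[i][j] + (1.0 if i == j else 0.0)
                            for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: spread flow along paths
        # Inflation: strengthen strong flows and weaken weak ones.
        inflated = [[x ** inflation for x in row] for row in expanded]
        new = normalize_columns(inflated)
        delta = max(abs(new[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = new
        if delta < tol:
            break
    # Rows that keep non-negligible mass are attractors; the columns they
    # attract form one cluster.
    clusters = []
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members and members not in clusters:
            clusters.append(members)
    return clusters


# Two hypothetical groups of mutually similar proteins (0-2 and 3-5).
adj = [[0, 1, 1, 0, 0, 0],
       [1, 0, 1, 0, 0, 0],
       [1, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 1, 1],
       [0, 0, 0, 1, 0, 1],
       [0, 0, 0, 1, 1, 0]]
print(mcl(adj))
```

Higher inflation values generally yield smaller, tighter clusters. The production `mcl` program from www.micans.org/mcl also prunes small entries for speed, which this sketch does not do.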

swamp. Since Gracilaria species provide about 91% of the world's agar [9], establishing a reference genome from this genus for other agarophytes is justified. In addition, valuable transcriptome data exist for G. changii, including expressed sequence tags (ESTs) [10], cDNA microarray and RNA-seq data on the macroalgal responses to abiotic and biotic stresses [11–14]. Thus, the availability of the G. changii genome could help promote this macroalga as a model species for agarophytes and shed light on agar biosynthesis. The chloroplast genome sequences of G. salicornia [15] and G. tenuistipitata [16] have benefited the evolutionary biology of seaweeds. We hope that the genome information presented here will fuel future omics research on agarophytes, which is still limited, while the chloroplast genome can contribute to the elucidation of the evolutionary relationships of red seaweeds. The plastid genome is also important for crop improvement, as several important agronomic traits are associated with the plastid genome, which is usually inherited uniparentally [17,18]. The importance of algal-bacterial interactions to the biology of both interacting partners is well recognized [19]; however, the biodiversity of the microbial communities closely associated with seaweeds is largely uncharacterized [20]. In the present study, the sequence data from field samples also provided a glimpse of the core microbiome associated with G. changii.

2. Materials and methods

2.1. Sample collection

2.5. Analyses of plastid genome

Contig 30 (GenBank accession No. KY018922) was identified as the scaffold for the chloroplast genome of G. changii by Blast searches using red algal chloroplast sequences as queries. GeneMarkS [29] and Blast (BlastN and BlastX) were used for open reading frame (ORF) finding and gene prediction, respectively. In addition, the Dual Organellar GenoMe Annotator (DOGMA) [30] was used to annotate the chloroplast genome of G. changii. The large and small subunit ribosomal RNAs (LSU and SSU rRNAs) were identified by pairwise alignment with the red algal rRNAs from G. salicornia (NC_023785).
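GeneMarkS uses a self-trained statistical gene model, so it cannot be reproduced in a few lines. As a much simpler illustration of what ORF finding means, the hypothetical sketch below scans both strands of a sequence for ATG-initiated open reading frames that end at a TAA, TAG or TGA stop codon, the stop codons of the bacterial/plastid genetic code. It ignores alternative start codons such as GTG. The default minimum length of 100 codons is an arbitrary assumption.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def scan_strand(seq, min_codons):
    """Yield (frame, start, end) for ATG...stop ORFs on one strand.

    Coordinates are 0-based and end-exclusive, and include the stop codon;
    min_codons counts codons before the stop (including the ATG).
    """
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                for j in range(i, len(seq) - 2, 3):
                    if seq[j:j + 3] in STOP_CODONS:
                        if (j - i) // 3 >= min_codons:
                            yield frame, i, j + 3
                        i = j  # resume scanning after this stop codon
                        break
                else:
                    break  # no in-frame stop downstream: end this frame
            i += 3


def find_orfs(seq, min_codons=100):
    """Return ORFs on both strands as (strand, frame, start, end) tuples.

    Minus-strand coordinates refer to the reverse-complemented sequence.
    """
    seq = seq.upper()
    orfs = [("+",) + orf for orf in scan_strand(seq, min_codons)]
    orfs += [("-",) + orf
             for orf in scan_strand(reverse_complement(seq), min_codons)]
    return orfs


toy = "ATG" + "AAA" * 5 + "TAA"  # hypothetical 7-codon ORF
print(find_orfs(toy, min_codons=3))
```

Real annotation additionally needs statistical models or homology (as in the GeneMarkS and Blast searches above), because long ORFs also occur by chance in non-coding DNA.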

2.3. Genome sequence analysis

Genomic DNA fragments with an average size of around 290 bp were selected and sequenced on an Illumina Genome Analyzer IIx in two lanes


the majority of them (92%) are below 10 kb, indicating that the genome sequence obtained is likely to be partial and fragmented. These sequences matched 71.9% of the ESTs and 84% of the de novo assembled transcripts from the RNA-seq data of G. changii obtained in our previous studies [10,13] at an E-value cutoff of 10⁻²⁰. Based on the coverage of these contigs on the EST and transcriptome data, the genome size of Gracilaria species is predicted to range from 40 to 50 Mb, which is close to the genome size of Pyropia yezoensis, i.e., 43 Mb [8]. The completeness of the draft genome was assessed by comparing the predicted proteins from G. changii with a total of 303 BUSCOs in the Eukaryota odb9 dataset. The draft genome contains 70% complete BUSCOs and 5.6% fragmented BUSCOs, while 24.4% of the BUSCOs were missing. The completeness of the draft genome estimated by BUSCO is consistent with that estimated by comparison with the ESTs and de novo transcriptome. The GC content of the partial genome was estimated to be 50.6%, which is lower than the predicted GC content of Py. yezoensis (63.6%) [8].
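The 40–50 Mb range can be reproduced with simple arithmetic under one assumption, which is our reading of the text rather than a stated method. The assumption is that the assembled 35.8 Mb represents roughly the same fraction of the genome as the fraction of ESTs (71.9%) or de novo transcripts (84%) it captures. The sketch below makes that assumption explicit. It also shows how a GC percentage is computed from a sequence. The function names and the toy sequence are illustrative only.

```python
def estimate_genome_size(assembled_mb, fraction_represented):
    """Scale the assembly length by the fraction of known transcripts it
    captures (assumes genes are spread evenly across the genome)."""
    if not 0 < fraction_represented <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return assembled_mb / fraction_represented


def gc_content(seq):
    """Percentage of G + C among unambiguous A/C/G/T bases."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / acgt if acgt else 0.0


low = estimate_genome_size(35.8, 0.84)    # based on de novo transcripts
high = estimate_genome_size(35.8, 0.719)  # based on ESTs
print(f"estimated genome size: {low:.1f}-{high:.1f} Mb")
print(gc_content("ATGCGCNNAT"))  # ambiguous N bases are ignored
```

The two fractions give roughly 42.6 and 49.8 Mb, consistent with the 40–50 Mb range quoted above.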

The tRNAscan-SE Search Server [31] was used to identify transfer RNAs (tRNAs), and the results were manually curated. The physical map of the chloroplast DNA was generated with OrganellarGenomeDRAW [32].

2.6. Microbiome analysis

The 16S ribosomal DNA sequences among the contigs with an average coverage below 100 were identified by similarity searches (BlastN) against three reference sets: bacterial 16S ribosomal DNA sequences, archaeal 16S ribosomal DNA sequences and fungal 28S ribosomal DNA sequences. These reference sequences were taken from the Ribosomal Database Project (RDP; https://rdp.cme.msu.edu) and the Greengenes database (http://greengenes.secondgenome.com/). The identified sequences were submitted to the RDP Classifier for rapid assignment of rRNA sequences to the new bacterial taxonomy [33] at an 80% confidence level, which is regarded as 89% accurate for classification at the genus level [34]. To annotate the predicted coding sequences from contigs with an average coverage below 100, these contigs were translated into amino acid sequences and matched against the SwissProt database at NCBI using the BlastP algorithm [22] with an E-value cutoff of 10⁻⁵. To detect bacterial transcripts in field samples, transcripts with low expression levels (RPKM less than 15) from the RNA-seq data of G. changii samples (European Nucleotide Archive accession number PRJEB13899) [14] were matched against the SwissProt database at NCBI using the BlastX algorithm [22] with an E-value cutoff of 10⁻⁵. These sequences were classified to genus or species based on the identity of the top match.
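RPKM (reads per kilobase of transcript per million mapped reads) normalizes a transcript's read count by its length and by sequencing depth. The "RPKM less than 15" criterion above is therefore a length- and depth-adjusted expression cutoff. A minimal sketch of the standard formula and the filter follows, using hypothetical transcript IDs and counts.

```python
def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (length_bp * total_mapped_reads)


def low_expression(transcripts, total_mapped_reads, threshold=15.0):
    """IDs of transcripts whose RPKM is strictly below threshold.

    transcripts: {transcript_id: (mapped_read_count, length_bp)}
    """
    return [tid for tid, (count, length) in transcripts.items()
            if rpkm(count, length, total_mapped_reads) < threshold]


# Hypothetical counts from a library with 10 million mapped reads.
counts = {"t1": (150, 1000), "t2": (50, 1000), "t3": (900, 2000)}
print(low_expression(counts, total_mapped_reads=10_000_000))
```

Here t1 sits exactly at RPKM 15 and is therefore kept out of the low-expression set, t2 (RPKM 5) is selected, and t3 (RPKM 45) is excluded.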

3.2. Protein prediction and annotation of the nuclear genome

A total of 10,912 protein coding sequences were predicted from the filtered contigs with a minimum average coverage of 100. This is comparable to the number in Py. yezoensis (10,327) and slightly higher than that in Ch. crispus (9606). Based on the distribution of proteins on these contigs, one protein-coding gene was found on average every 2.8 kb. The average coding sequence size is 1239 bp, with an untranslated intergenic region of 1.5 kb on average. The number of introns per gene in G. changii is 0.64, and about 60.5% of the predicted genes have no introns. The numbers of introns per gene for Py. yezoensis, Ch. crispus and Cy. merolae were reported to be 0.7, 0.32 and 0.005, respectively [4,5,8]. Of the 4311 genes with introns, 65% have a single intron. The average intron size is 90 bp. This agrees with findings from other red algal genome analyses that red algal genomes are compact [4,5]. This compactness was possibly driven by early ecological forces that led to gene loss and to a reduction in the number and/or length of introns and intergenic DNA [4] after the divergence of the green and red lineages about 1.5 billion years ago [36]. The average distance between genes and the average intron size in G. changii were estimated to be smaller than those in Ch. crispus, consistent with the smaller estimated genome size of this macroalga. This is supported by the findings of Nakamura et al. [8], which showed a correlation between average intron length and algal genome size. More than 81% of the 10,912 predicted proteins were assigned by OrthoMCL to ortholog groups (67.4%, including 21.2% red algal-specific orthologs) or paralogs (14.3%). The remaining 19% of protein sequences, which have no orthologs, could be unique to Gracilaria species, suggesting large gene diversity within this lineage. Most ortholog (97%) and paralog (63%) groups have fewer than four members from G. changii, suggesting that most gene families in G. changii are small, with few paralogs. Only 4302 of the 10,912 predicted proteins (39.4%) had significant matches to sequences in SwissProt, while 52% of the genes in Ch. crispus were reported to have no counterpart in GenBank [4]. Together, these findings suggest that red seaweeds harbor a large unexplored gene diversity. The deduced amino acid sequences encoded by the G. changii coding sequences share the highest identities with those from eukaryotic species, i.e., Arabidopsis thaliana, Homo sapiens, Mus musculus, Dictyostelium discoideum, Drosophila melanogaster, Schizosaccharomyces pombe, Saccharomyces cerevisiae, Rattus norvegicus, Oryza sativa and Danio rerio. A total of 92 GO terms were assigned to the protein coding sequences. The numbers of GO terms in the biological process (BP), molecular function (MF) and cellular component (CC) categories were 64, 16 and 14, respectively. The distribution of GO terms in the BP category is shown in Fig. 1A. Of these terms, 10% fall in primary metabolic process and 8% in macromolecule metabolic process, while single-organism cellular process and nitrogen compound metabolic process account for 7% each. KEGG Orthologies

3. Results and discussion

3.1. Genome sequence analysis

Sequencing of the total DNA extracted from G. changii produced more than 14 Gb of 100-nucleotide paired-end raw reads (1.6 × 10⁸ reads in total). The Sanger/Phred quality values of the reads were between 28 and 40, corresponding to per-base error probabilities of about 0.16% to 0.01%. The filtered reads were assembled with a word size of 47. In total, 10,853 contigs with a minimum average coverage of 100 were obtained. Contigs with an average coverage below 100 were excluded from further analyses because most of them were short. In addition, their deduced amino acid sequences shared high identities with those from prokaryotes such as Escherichia spp., Bacillus spp., Rhizobium spp., Pseudomonas spp. and others. Field-grown seaweeds have been reported to be contaminated by many marine bacteria [8,35], and analysis of the whole genome sequences of marine algae has been complicated by DNA contamination from symbiotic bacteria [8]. By applying this conservative selection criterion, we may have excluded some “real” seaweed sequences that share high identities with bacterial sequences. Conversely, the selection criterion may not exclude contaminating sequences completely. However, our data show that the filtering process reduced the numbers of predicted proteins matching Escherichia spp., Bacillus spp., Rhizobium spp. and Pseudomonas spp. from 2165, 1661, 1279 and 905 to 31, 70, 18 and 15, respectively. We presumed that a substantial amount of DNA from contaminating organisms would have to be present to reach an average coverage of 100. This is unlikely because the DNA sample was expected to be enriched in seaweed genomic DNA. The filtered contigs, which account for 92.2% of the total sequencing data generated, are thus highly likely (more than 99%) to be G. changii sequences. The exact genome size of Gracilaria species is unknown.
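The quoted quality range is linked to per-base error rates by the Phred relationship P = 10^(−Q/10), which gives Q28 ≈ 0.16%, Q30 = 0.1% and Q40 = 0.01%. A small Python helper is sketched below. The ASCII offset of 33 is an assumption. Older Illumina pipelines (versions 1.3–1.7), which may have been used with the Genome Analyzer IIx, encoded qualities with an offset of 64.

```python
import math


def phred_to_error(q):
    """Per-base error probability for a Phred quality score Q."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """Phred quality score for a per-base error probability p."""
    return -10 * math.log10(p)


def mean_error(quality_string, offset=33):
    """Mean error probability of a read from its FASTQ quality string.

    offset=33 assumes Sanger/Illumina 1.8+ encoding; use 64 for
    Illumina 1.3-1.7 files.
    """
    scores = [ord(ch) - offset for ch in quality_string]
    return sum(phred_to_error(q) for q in scores) / len(scores)


for q in (28, 30, 40):
    print(q, f"{100 * phred_to_error(q):.3f}%")
```

Averaging error probabilities, rather than Q scores, is the more meaningful summary, because the Phred scale is logarithmic.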
The genome sizes of red algae range from 13.7 Mb (Galdieria sulphuraria) [6] to 105 Mb (Chondrus crispus) [4]. The filtered contigs have a total size of 35.8 Mb (excluding the chloroplast and mitochondrial genomes), which falls within the range of red algal genome sizes. These contigs ranged from 0.2 to 20.6 kb in length, with an average size of 3314 bp, and


Fig. 1. The distribution of Gene Ontologies (GO) in biological process (BP) (A) and KEGG Orthologies (KO) (B) for the annotated genes in the Gracilaria changii genome.

sequences could be UDP-Glc-specific enzymes that initiate and elongate floridean starch, respectively. Floridean starch (an α-linked glucose polymer) is a storage glucan found as grains or granules in the cytoplasm, and is usually the primary sink for carbon fixed by photosynthesis in rhodophytes. It has a higher degree of branching than amylopectin and is therefore sometimes referred to as “semi-amylopectin” [37]. Floridean starch shares structural similarities with higher plant starch granules but lacks amylose [38]. Floridean starch has been proposed as a dynamic carbon source of the glucose (Glc) and galactose (Gal) used for agar biosynthesis in the dark [39–41]. Previously, a UDP-Glc:α-glucan synthase purified from G. tenuistipitata [42] was proposed to be the floridean starch synthase in the cytoplasm of this rhodophyte. Meanwhile, a second α-glucan biosynthesis pathway in the plastid was hypothesized to co-exist with the cytoplasmic pathway in red algal cells. Sesma and Iglesias [43] demonstrated this by purifying and characterizing an ADP-Glc

(KOs) with the highest percentages of annotated genes in the Gracilaria changii genome include gene information (23%), carbohydrate metabolism (9%) and amino acid metabolism (6%) (Fig. 1B). We focused our sequence analysis on KOs related to agar and agarose biosynthesis, i.e., carbohydrate metabolism and nucleotide metabolism. The present study shows the presence of genes for amylopectin/glycogen branching enzyme (EC 2.4.1.18) and amylopectin/glycogen phosphorylase (EC 2.4.1.1) in G. changii. The enzyme UTP-glucose-1-phosphate uridylyltransferase (EC 2.7.7.9), which converts α-D-Glc 1-phosphate and UTP to UDP-Glc and diphosphate, was present in G. changii. However, the enzyme that synthesizes floridean starch from UDP-Glc, i.e., UDP-Glc:1,4-α-D-glucan 4-α-D-glucosyltransferase, was not annotated. Instead, genes for glycogenin-1 (UDP-α-D-Glc:glycogenin α-D-glucosyltransferase; EC 2.4.1.186) and glycogen synthase (UDP-Glc:glycogen 4-α-D-glucosyltransferase; EC 2.4.1.11) were found among the predicted proteins of G. changii. Both of these


pyrophosphorylase (AGPase) from G. gracilis. The presence of an α-glucan lyase, a novel enzyme with starch-hydrolyzing activity, in the rhodoplasts of G. lemaneiformis [44] further substantiated the presence of an α-glucan biosynthesis pathway in the plastid. Genes for pullulanase, α-glucan water dikinase and isoamylase, which probably recycle floridean starch, are also present in the G. changii genome. Similar to Ch. crispus, G. changii has a remarkably low number of genes needed to synthesize and recycle starch. In addition, the proteins of G. changii include genes for the enzymes that incorporate UDP-Glc into trehalose via trehalose-6-phosphate. These are trehalose 6-phosphate synthase (EC 2.4.1.15) and trehalose 6-phosphate phosphatase (EC 3.1.3.12), together with trehalase (EC 3.2.1.28), which hydrolyzes trehalose to D-Glc. Because algal (iso)floridoside phosphate synthases/phosphatases are highly similar to trehalose phosphate synthases/phosphatases, one or more of the G. changii genes for trehalose phosphate synthase/phosphatase-like enzymes may actually encode (iso)floridoside phosphate synthases/phosphatases [45]. Indeed, two of these protein sequences (Bg1593 and Bg5654) clustered closely with the identified (iso)floridoside phosphate synthases/phosphatases from Gal. sulphuraria (Gasu_10960 and Gasu_26940) (Fig. 2). They could therefore participate in the biosynthesis of (iso)floridosides from UDP-Gal and glycerol-3-phosphate. (Iso)floridosides are low molecular weight compatible solutes that are involved in the osmo-acclimation of seaweeds [46]. In addition, (iso)floridosides can be degraded by α-galactosidase to UDP-Gal, which fuels agar biosynthesis, probably during osmotic stress [47,48]. Agar and agarose are the two major galactans in the cell wall matrix of agarophytes such as G. changii. The agar backbone consists of a chain of alternating D-Gal and L-Gal, and the agarose backbone consists of alternating D-Gal and 3,6-anhydro-L-Gal [49,50].
The biosynthesis pathway of agar and agarose is not well characterized. However, it is generally accepted that the galactan backbone of agar is synthesized in the Golgi apparatus by galactosyltransferases that catalyze the polymerization of UDP-D-β-Gal and GDP-L-α-Gal residues. Agar is then sulfated by sulfotransferases, probably with 3′-phosphoadenosine 5′-phosphosulfate (PAPS) as the sulfate donor. During the conversion of agar to agarose, sulfurylases may remove the C6-sulfate from the L-Gal unit of agar to form 3,6-anhydro rings [49]. Genes for a complete set of enzymes involved in the biosynthesis of the two precursors, UDP- and GDP-Gal, starting from fructose-6-phosphate, were identified among the predicted genes. For the biosynthesis of UDP-Gal, these are glucose-6-phosphate isomerase (Bg4720), phosphofructokinase (Bg5039, Bg5070), UTP-glucose-1-phosphate uridylyltransferase (Cg967, Bg1209) and galactose-1-phosphate uridylyltransferase (Bg4032). For the biosynthesis of GDP-Gal, they are mannose-6-phosphate isomerase (Bg1166, Bg4605), phosphomannose mutase (Bg1446), mannose-1-phosphate guanylyltransferase or GDP-mannose pyrophosphorylase (Bg6239) and GDP-mannose 3′,5′-epimerase (Bg2095). UDP-Gal and GDP-Gal are believed to be added to the galactan chain alternately by different galactosyltransferases, while galactosidases may play a role in recycling the precursors from galactans. Similar to the genome of Ch. crispus, the G. changii genome contains genes encoding numerous glycoside hydrolases (GH) and glycosyltransferases (GT) belonging to diverse families, especially galactosyltransferases and galactosidases. Although the biosynthetic pathways of UDP-Gal and GDP-Gal are shared by many organisms, the galactosyltransferases involved in agar biosynthesis could be unique to agarophytes. This is because agar and agarose are produced only in the cell walls of these seaweeds, which makes the discovery of these genes difficult. Five sequences for chondroitin sulfate synthases (Bg4280, Bg4389, Bg5200, Bg6271, Cg374) and one for chondroitin sulfate N-acetylgalactosaminyl transferase (Bg2711) were found among the predicted protein sequences of G. changii. Chondroitin sulfate synthase and chondroitin sulfate N-acetylgalactosaminyl transferase are involved in the biosynthesis of sulfated glycosaminoglycans in the extracellular matrix of animals [51].
The functional roles of their homologs in agar biosynthesis in G. changii await further investigation. In addition, two sequences encoding carbohydrate sulfotransferases (Bg2166 and Bg2456) were found among the predicted protein sequences of G. changii. Bg2166 and Bg2456 share 60% and 50% identity with two of the seven proteins that have been identified as

Fig. 2. Unrooted phylogenetic tree of amino acid sequences of trehalose and (iso)floridoside phosphate synthases/phosphatases from G. changii using the neighbor-joining algorithm. The distance scale is shown in the left-hand lower corner and bootstrap values (1000 replicates) are added to nodes. The amino acid sequences used are in Supplementary File 1.


Fig. 3. Unrooted phylogenetic tree of amino acid sequences of carbohydrate sulfotransferase in Sulfotransferase Family 2 from G. changii using the neighbor-joining algorithm. The distance scale is shown in the left-hand lower corner and bootstrap values (1000 replicates) are added to nodes. The amino acid sequences used are in Supplementary File 1.

Fig. 4. Unrooted phylogenetic tree of amino acid sequences of D-α-Gal-2,6‑sulfurylase I (A) and D-α-Gal-2,6‑sulfurylase II (B) from G. changii using the neighbor-joining algorithm. The distance scale is shown in the left-hand lower corner and bootstrap values (1000 replicates) are added to nodes. The amino acid sequences used are in Supplementary File 1.

genuine carrageenan sulfotransferases, i.e., CHC_T00009100001 and CHC_T00009100001 reported by Collén et al. [4], respectively (Fig. 3). Together, they belong to sulfotransferase subfamily 2, which is composed of carbohydrate sulfotransferases that specifically transfer a sulfate group to position 6 of Gal, N-acetylglucosamine and N-acetylgalactosamine residues [52–54]. The homologous sequences from G. changii are therefore potential candidate genes for agar sulfotransferases.


Fig. 5. The gene map of Gracilaria changii chloroplast. The inner circle shows the GC content of the sequence while the outer circle shows the distribution of genes.

In this study, three genes from G. changii were found to have significant matches to genes encoding D-α-Gal-2,6-sulfurylases I and II from Ch. crispus. Previously, twelve genes for D-α-Gal-2,6-sulfurylases were identified in the genome of Ch. crispus [4]. They were proposed to act in the final step of carrageenan biosynthesis, by desulfating carbon 6 in D-α-Gal and cyclizing the 3,6-anhydro bridge, which contributes to the gelling properties of carrageenans. Two protein sequences (Bg1861 and Ag1964) showed high identities (45–51%) to CHC_T00008516001 (XP_005713538.1), a D-α-Gal-2,6-sulfurylase I from Ch. crispus (Fig. 4). All three of these protein sequences share around 23–25% identity with the L-amino acid oxidase sequences from Chlamydomonas reinhardtii (XP_001700756.1 and XP_001694130.1). Bg1861 and Ag1964 are, respectively, 202 and 364 residues shorter at the N-terminus than CHC_T00008516001. As a result, they lack the NAD binding 8/NADB_Rossmann family signature (pfam 13450) that is conserved in the L-amino acid oxidase sequences and in CHC_T00008516001. The percentage of identity shared by Bg692 and CHC_T00009416001 (D-α-Gal-2,6-sulfurylase II) from Ch. crispus was 29% (Fig. 4). The red algal sequences encoding D-α-Gal-2,6-sulfurylase II do not have significant matches to other protein sequences in the databases. It is intriguing that the D-α-Gal-2,6-sulfurylase I and II protein sequences, which are supposed to have similar enzyme activities, are highly diverse. Since agar contains 3,6-anhydro-L-α-Gal instead of 3,6-anhydro-D-α-Gal, it is unknown whether Bg1861, Ag1964 and Bg692 have enzyme activities identical or similar to those of their counterparts in Ch. crispus. The discovery of these candidate gene sequences from G. changii enables future exploration of the stereoselectivity and stereospecificity of the enzyme-catalyzed reactions. Similar to the findings of Collén et al. [4] on Ch. crispus, genes for sulfatases are absent from the G. changii genome. This suggests that sulfatases are not involved in modifying the structure of agar or carrageenan. Nevertheless, bacterial arylsulfatases have been reported to desulfate carbon 6 in D-α-Gal or L-α-Gal and to cyclize the 3,6-anhydro bridge in carrageenan and agar, respectively [55,56]. Red algae could have lost the genes for these enzymes or have never

Genomics xxx (xxxx) xxx–xxx

C.-L. Ho et al.

slightly less than that in G. salicornia (202) or G. tenuistipitata (204). The ORFs predicted in the chloroplast genome of G. changii either matched with those in the chloroplast genomes of G. salicornia or G. tenuistipitata with highly conserved synteny. The type and number of tRNAs predicted in the chloroplast genome of G. changii are the same as in the other two Gracilaria chloroplast genomes (Table 1). G. tenuistipitata was reported to contain a group II intron in an essential gene for elongator tRNA-Met (trnMe) [57]. Similarly, a group II intron is also found in the plastid genome of G. changii. The group II intron was previously reported to be evolutionary stable upon acquisition in terrestrial plants and charophytes [58,59]. It is the first red algal intron which was found to encode a reverse transcriptase/maturase [57]. Similar to the chloroplast genome of G. salicornia, ssrA, leuC and leuD are absent in the plastid genome of G. changii although these genes are present in the plastid genome of G. tenuistipitata. leuC and leuD that encode the large and small subunits of 3-isopropylmalate dehydratase which catalyzes the isomerization step in leucine biosynthesis and analogous reactions, are also absent in other red algal plastid genomes [60]. While G. changii and G. salicornia may use the canonical plastidtargeted leuC/D for leucine biosynthesis [57], these proteobacteriallike leuC and leuD genes were hypothesized to be acquired by the plastid of G. tenuistipitata through horizontal gene transfer. In addition, pbsA (which was present in G. tenuistipitata and G. salicornia) was not found in G. changii while ycf23 (which was present in G. tenuistipitata and G. changii) was not found in G. salicornia. The red algal plastid genes are not only promising as markers for reconstruction of evolutionary relationships, the plastid genomes are also useful for species barcoding [61]. The scaffold for the mitochondrial genome of G. 
changii (Contigs 206; GenBank accession No. KY009863) was also identified. However, we will not discuss the mitochondrial genome here since it has been published by another group recently [62].

Table 1 Number and recognition pattern of tRNA in the chloroplast of Gracilaria changii. Amino acid

Arginine Leucine Glycine Methionine Serine Threonine Valine Alanine Asparagine Aspartate Cysteine Glutamate Glutamine Histidine Ileucine Lysine Phenylalanine Proline Tryptophan Tyrosine

Chloroplast Number of tRNA

Recognition pattern of tRNA

3 3 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1

ACG, CCG, UCU CAA, UAA, UAG GCC, UCC CAU GCU, UGA GGU,TGT GAC, UAC UGC GUU GUC GCA UUC UUG GUG GAU UUU GAA GGG CCA GUA

acquired these enzymes through horizontal lateral gene transfer. In the present study, candidate genes involved in the agar biosynthetic pathway have been annotated, genome sequencing is a key step towards better understanding of the agar biosynthesis.

3.3. Plastid genome The scaffold for chloroplast genome of G. changii was identified in Contig 30 which is 183,855 bp with an average coverage of 3938. The alignment indicated an average of 85% and 82% nucleotide identity between G. changii and the other two genomes, G. salicornia and G. tenuistipitata, respectively. The GC content of chloroplast genome of G. changii is 28.1% which is slightly lower compared those to in G. salicornia (29.15%; NC_023785) [15] and G. tenuistipitata (28.8%; NC_006137) [16], which is typical for Gracilaria chloroplast genomes. The chloroplast genomes from Gracilaria species remain highly conserved in both gene content and synteny, indicating a slow evolving genomic architecture in this genus. A total of 201 ORFs, 29 tRNAs and 3 rRNAs were predicted (Fig. 5). The number of ORF predicted was

3.4. Microbiome associated with G. changii
Although algal-bacterial interactions have been reported to be important to the biology of both partners [19,63,64], the biodiversity of the bacterial communities closely associated with seaweeds remains largely unexplored [20]. The interactions between macroalgae and bacteria can be symbiotic, pathogenic or opportunistic, depending on the biodiversity of the microbial communities [19]. Recent investigations of the microbiomes associated with a few algae, including Ulva spp., Fucus vesiculosus, Saccharina latissima, Delisea pulchra and G. vermiculophylla, have attempted to address the variation of microbial communities in relation to the geographical location, season and taxonomy of the associated hosts [65,66–69]. Some of these questions require additional investigations on more algae. In the present study, the sequence data from a field sample provided a glimpse of the core microbiome associated with G. changii. The contigs with an average coverage below 100 are numerous (318,947 contigs with a total length of 168 Mb) but have low coverage (8.5 on average), and 92% of them are shorter than 1 kb. They account for approximately 7.8% of the sequencing data generated in this study. Among these contigs, 386 were mapped to the RDP and Greengenes databases with an E-value cutoff of 10⁻⁵. Of these, 202 and 3 were classified as bacterial and archaeal 16S ribosomal DNA sequences, respectively, while the remaining 181 sequences could not be classified. A total of 124 bacterial and all archaeal 16S ribosomal DNA sequences could not be further classified into lower taxa, while 78 bacterial sequences were further classified as 16S ribosomal DNA sequences from Proteobacteria (56.41%), Bacteroidetes (21.79%), Planctomycetes (8.97%), Cyanobacteria/Chloroplast (8.97%) and Actinobacteria (3.85%). Most of the proteobacterial 16S ribosomal DNA sequences were from α-proteobacteria (21), γ-proteobacteria (13), ε-proteobacteria (3) and δ-proteobacteria (1) (Table 2). Similar to the findings of Miranda et al. [20] on the microbiome of P. umbilicalis, bacteria from the Bacteroidetes, Planctomycetes and Proteobacteria, which are known to digest the sulfated galactans of red algal cell walls, were well represented in the microbiome associated with G. changii. This is not surprising, as both Gracilaria and Porphyra have structurally similar sulfated galactans, i.e., agar and porphyran, in their cell walls [70]. In addition, the microbial community of G. vermiculophylla was also reported to be enriched in Bacteroidetes and Proteobacteria [68]. α-Proteobacteria have also previously been reported to be abundant in marine samples, including Porphyra and Gracilaria species [71]. The 16S ribosomal DNA sequences that could be classified to the genus level with high confidence were Jannaschia, Peredibacter, Arcobacter, Aestuariibacter, Marinomonas, Oceanospirillum, Pseudoalteromonas, Thalassomonas, Vibrio, Acidimicrobineae, Robiginitalea and Blastopirellula. Among these, only Vibrio was reported in marine samples, including Gracilaria, by Aravindraja et al. [71]. The predicted protein sequences from the contigs with an average coverage below 100 matched those from Escherichia (10.77%), Bacillus (8.03%), Rhizobium (6.37%), Pseudomonas (4.49%), Erythrobacter (3.31%), Haemophilus (2.67%), Vibrio (2.47%), Mycobacterium (2.29%), Caulobacter (2.25%), Shigella (2.15%), Salmonella (2.05%), Rhodobacter (1.77%), Pseudoalteromonas (1.67%), Alteromonas (1.64%) and Rickettsia (1.61%). Sequencing of G. changii transcriptomes showed that 7.69% of the annotated transcripts had the highest identities with transcripts encoding proteins from taxa such as Bacillus (0.76%), Pseudomonas (0.31%), Escherichia (0.23%), Haemophilus (0.23%), Candidatus (0.19%), Streptomyces (0.18%) and Leptospira (0.18%) [13]. Although the distribution (percentage) of taxa differs, the types of bacteria detected in the genomic and transcriptomic data are generally in agreement. Although a substantial number of protein-coding sequences from various microorganisms may coexist with the seaweed, they are not necessarily present in transcribed or translated form in the seaweed sample. For example, although several gene sequences encoding sulfatases were found in the microbiome associated with G. changii, they could not be detected in the RNA-seq data of G. changii [13,14].
The ways in which algal-bacterial interactions affect the metabolic and developmental features of their hosts are largely uncharacterized. RNA-seq data from seaweed samples may provide clues to the types of bacterial proteins being translated in these samples, and perhaps shed light on how bacteria affect the metabolic features of their hosts.

Table 2 The genus and number (in brackets) of bacterial sequences found among the genome sequence of Gracilaria changii, listed by taxonomic rank.
Phylum: Proteobacteria (44); Bacteroidetes (17); Planctomycetes (7); Cyanobacteria/Chloroplast (7); Actinobacteria (3)
Class: Alphaproteobacteria (21); Gammaproteobacteria (13); Epsilonproteobacteria (3); Deltaproteobacteria (1); Flavobacteriia (10); Sphingobacteriia (3); Planctomycetia (7); Cyanobacteria (3); Chloroplast (3); Actinobacteria (3)
Order: Rhodobacterales (6); Rhizobiales (3); Sphingomonadales (2); Eilatimonas (1); Caulobacterales (1); Alteromonadales (6); Oceanospirillales (3); Vibrionales (1); Campylobacterales (3); Bdellovibrionales (1); Flavobacteriales (10); Sphingobacteriales (1); Planctomycetales (7); Family IV (1); Family VIII (2); Chloroplast (3); Acidimicrobidae (1)
Family: Rhodobacteraceae (6); Hyphomicrobiaceae (1); Erythrobacteraceae (2); Hyphomonadaceae (1); Alteromonadaceae (1); Colwelliaceae (1); Pseudoalteromonadaceae (4); Oceanospirillaceae (3); Vibrionaceae (1); Campylobacteraceae (3); Bacteriovoracaceae (1); Flavobacteriaceae (10); Planctomycetaceae (7); GpIV (1); GpVIII (2); Bacillariophyta (3); Acidimicrobiales (1)
Genus: Jannaschia (1); Aestuariibacter (1); Thalassomonas (1); Pseudoalteromonas (4); Marinomonas (1); Oceanospirillum (1); Vibrio (1); Arcobacter (3); Peredibacter (1); Robiginitalea (2); Blastopirellula (1); Acidimicrobineae (1)

4. Conclusions

In conclusion, we have successfully elucidated the complete chloroplast genome of G. changii. The G. changii draft genome obtained may be fragmented and incomplete, but it provides extensive coverage of the genes. The draft nuclear genome is valuable for accelerating studies of candidate genes related to agar biosynthesis, while the organellar genomes can facilitate future studies of algal evolution. Sequencing of field samples also revealed the microbiome that is closely associated with this agarophyte. Exploring the microbiomes of macroalgae is critical to elucidating the dependence of algal hosts on their microbiomes.

Supplementary data to this article can be found online at http://dx.doi.org/10.1016/j.ygeno.2017.09.003.

Acknowledgements
The project was partially funded by the Great Illumina 10G Challenge Award and the Fundamental Research Grant Scheme (No: 0104-10-769FR) from the Ministry of Higher Education, Malaysia (MOHE), awarded to CL Ho. EL Lim was supported by the Graduate Research Fellowship of Universiti Putra Malaysia, and WK Lee was supported by the MyPhD Fellowship of the Ministry of Education, Malaysia. We acknowledge the technical assistance provided by Science Vision Sdn. Bhd., BioEasy Sdn. Bhd. and Codon Genomics Sdn. Bhd.

References
[1] G.M. Smith, Marine Algae of the Monterey Peninsula, 2nd ed., Stanford Univ., California, 1944.
[2] D.W. Renn, Agar and agarose: indispensable partners in biotechnology, Ind. Eng. Chem. Prod. Res. Dev. 23 (1984) 17–21.
[3] J.M. Cock, L. Sterck, P. Rouzé, D. Scornet, A.E. Allen, G. Amoutzias, V. Anthouard, F. Artiguenave, J.M. Aury, J.H. Badger, B. Beszteri, K. Billiau, E. Bonnet, J.H. Bothwell, C. Bowler, C. Boyen, C. Brownlee, C.J. Carrano, B. Charrier, G.Y. Cho, S.M. Coelho, J. Collén, E. Corre, C. Da Silva, L. Delage, N. Delaroque, S.M. Dittami, S. Doulbeau, M. Elias, G. Farnham, C.M. Gachon, B. Gschloessl, S. Heesch, K. Jabbari, C. Jubin, H. Kawai, K. Kimura, B. Kloareg, F.C. Küpper, D. Lang, A. Le Bail, C. Leblanc, P. Lerouge, M. Lohr, P.J. Lopez, C. Martens, F. Maumus, G. Michel, D. Miranda-Saavedra, J. Morales, H. Moreau, T. Motomura, C. Nagasato, C.A. Napoli, D.R. Nelson, P. Nyvall-Collén, A.F. Peters, C. Pommier, P. Potin, J. Poulain, H. Quesneville, B. Read, S.A. Rensing, A. Ritter, S. Rousvoal, M. Samanta, G. Samson, D.C. Schroeder, B. Ségurens, M. Strittmatter, T. Tonon, J.W. Tregear, K. Valentin, P. von Dassow, T. Yamagishi, Y. Van de Peer, P. Wincker, The Ectocarpus genome and the independent evolution of multicellularity in brown algae, Nature 465 (2010) 617–621.
[4] J. Collén, B. Porcel, W. Carré, S.G. Ball, C. Chaparro, T. Tonon, T. Barbeyron, G. Michel, B. Noel, K. Valentin, M. Elias, F. Artiguenave, A. Arun, J.M. Aury, J.F. Barbosa-Neto, J.H. Bothwell, F.Y. Bouget, L. Brillet, F. Cabello-Hurtado, S. Capella-Gutiérrez, B. Charrier, L. Cladière, J.M. Cock, S.M. Coelho, C. Colleoni, M. Czjzek, C. Da Silva, L. Delage, F. Denoeud, P. Deschamps, S.M. Dittami, T. Gabaldón, C.M. Gachon, A. Groisillier, C. Hervé, K. Jabbari, M. Katinka, B. Kloareg, N. Kowalczyk, K. Labadie, C. Leblanc, P.J. Lopez, D.H. McLachlan, L. Meslet-Cladiere, A. Moustafa, Z. Nehr, P. Nyvall Collén, O. Panaud, F. Partensky, J. Poulain, S.A. Rensing, S. Rousvoal, G. Samson, A. Symeonidi, J. Weissenbach, A. Zambounis, P. Wincker, C. Boyen, Genome structure and metabolic features in the red seaweed Chondrus crispus shed light on evolution of the Archaeplastida, Proc. Natl. Acad. Sci. U. S. A. 110 (2013) 5247–5252.
[5] M. Matsuzaki, O. Misumi, T. Shin-I, S. Maruyama, M. Takahara, S.Y. Miyagishima, T. Mori, K. Nishida, F. Yagisawa, K. Nishida, Y. Yoshida, Y. Nishimura, S. Nakao, T. Kobayashi, Y. Momoyama, T. Higashiyama, A. Minoda, M. Sano, H. Nomoto, K. Oishi, H. Hayashi, F. Ohta, S. Nishizaka, S. Haga, S. Miura, T. Morishita, Y. Kabeya, K. Terasawa, Y. Suzuki, Y. Ishii, S. Asakawa, H. Takano, N. Ohta, H. Kuroiwa, K. Tanaka, N. Shimizu, S. Sugano, N. Sato, H. Nozaki, N. Ogasawara, Y. Kohara, T. Kuroiwa, Genome sequence of the ultrasmall unicellular red alga Cyanidioschyzon merolae 10D, Nature 428 (2004) 653–657.
[6] G. Schönknecht, W.H. Chen, C.M. Ternes, G.G. Barbier, R.P. Shrestha, M. Stanke, A. Bräutigam, B.J. Baker, J.F. Banfield, R.M. Garavito, K. Carr, C. Wilkerson, S.A. Rensing, D. Gagneul, N.E. Dickenson, C. Oesterhelt, M.J. Lercher, A.P. Weber, Gene transfer from bacteria and archaea facilitated evolution of an extremophilic eukaryote, Science 339 (2013) 1207–1210.
[7] D. Bhattacharya, D.C. Price, C.X. Chan, H. Qiu, N. Rose, S. Ball, A.P. Weber, M.C. Arias, B. Henrissat, P.M. Coutinho, A. Krishnan, S. Zäuner, S. Morath, F. Hilliou, A. Egizi, M.M. Perrineau, H.S. Yoon, Genome of the red alga Porphyridium purpureum, Nat. Commun. 4 (2013) 1941.
[8] Y. Nakamura, N. Sasaki, M. Kobayashi, N. Ojima, M. Yasuike, Y. Shigenobu, M. Satomi, Y. Fukuma, K. Shiwaku, A. Tsujimoto, T. Kobayashi, I. Nakayama, F. Ito, K. Nakajima, M. Sano, T. Wada, S. Kuhara, K. Inouye, T. Gojobori, K. Ikeo, The first symbiont-free genome sequence of marine red alga, susabi-nori (Pyropia yezoensis), PLoS One 8 (2013) e57122.
[9] H. Porse, B. Rudolph, The seaweed hydrocolloid industry: 2016 updates, requirements, and outlook, J. Appl. Phycol. (2017), http://dx.doi.org/10.1007/s10811-017-1144-0.
[10] S.-S. Teo, C.-L. Ho, S. Teoh, W.-W. Lee, J.-M. Tee, A.R. Raha, S.-M. Phang, Analyses of expressed sequence tags (ESTs) from an agarophyte, Gracilaria changii (Gracilariales, Rhodophyta), Eur. J. Phycol. 42 (2007) 41–46.
[11] S.-S. Teo, C.-L. Ho, S. Teoh, A.R. Raha, S.-M. Phang, Transcriptomic analysis of Gracilaria changii (Rhodophyta) in response to hyper- and hypo-osmotic stresses, J. Phycol. 45 (2009) 1093–1099.
[12] C.L. Ho, S. Teoh, S.S. Teo, A.R. Raha, S.M. Phang, Profiling the transcriptome of Gracilaria changii (Rhodophyta) in response to light deprivation, Mar. Biotechnol. 11 (2009) 1436–2236.
[13] E.L. Lim, R.S. Siow, R. Abdul Rahim, C.L. Ho, Global transcriptome analysis of Gracilaria changii (Rhodophyta) in response to agarolytic enzyme and bacterium, Mar. Biotechnol. 18 (2016) 189–200.
[14] W.K. Lee, P. Namasivayam, J.O. Abdullah, C.L. Ho, Transcriptome profiling of sulfate deprivation responses in two agarophytes Gracilaria changii and Gracilaria salicornia (Rhodophyta), Sci Rep 7 (2017) 46563.
[15] M.A. Campbell, G.G. Presting, M.S. Bennett, A.R. Sherwood, Highly conserved organellar genomes in the Gracilariales as inferred using new data from the Hawaiian invasive red alga Gracilaria salicornia (Rhodophyta), Phycologia 53 (2014) 109–116.
[16] J.C. Hagopian, M. Reis, J.P. Kitajima, D. Bhattacharya, M.C. de Oliveira, Comparative analysis of the complete plastid genome sequence of the red alga Gracilaria tenuistipitata var liui provides insights into the evolution of rhodoplasts and their relationship to other plastids, J. Mol. Evol. 59 (2004) 464–477.
[17] S.-M. Chung, V.S. Gordon, J.E. Staub, Sequencing cucumber (Cucumis sativus L.) chloroplast genomes identifies differences between chilling-tolerant and -susceptible cucumber lines, Genome 50 (2007) 215–225.
[18] V.S. Gordon, J.E. Staub, Comparative analysis of chilling response in cucumber through plastidic and nuclear genetic effects component analysis, J. Am. Soc. Hortic. Sci. 136 (2011) 256–264.
[19] F. Goecke, A. Labes, J. Wiese, J.F. Imhoff, Chemical interactions between marine macroalgae and bacteria, Mar. Ecol. Prog. Ser. 409 (2010) 267–300.
[20] L.N. Miranda, K. Hutchison, A.R. Grossman, S.H. Brawley, Diversity and abundance of the bacterial community of the red macroalga Porphyra umbilicalis: did bacterial farmers produce macroalgae? PLoS One 8 (2013) e58269.
[21] M. Stanke, O. Schöffmann, B. Morgenstern, S. Waack, Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources, BMC Bioinf. 7 (2006) 62.
[22] S.F. Altschul, W. Gish, W. Miller, E.W. Myers, D.J. Lipman, Basic local alignment search tool, J. Mol. Biol. 215 (1990) 403–410.
[23] F.A. Simao, R.M. Waterhouse, P. Ioannidis, E.V. Kriventseva, E.M. Zdobnov, BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs, Bioinformatics 31 (2015) 3210–3212.
[24] M. Falda, S. Toppo, A. Pescarolo, E. Lavezzo, B. Di Camillo, A. Facchinetti, E. Cilia, R. Velasco, P. Fontana, Argot2: a large scale function prediction tool relying on semantic similarity of weighted gene ontology terms, BMC Bioinf. 13 (2000) 2012.
[25] R. Chenna, H. Sugawara, T. Koike, R. Lopez, T.J. Gibson, D.G. Higgins, J.D. Thompson, Multiple sequence alignment with the Clustal series of programs, Nucleic Acids Res. 31 (2003) 3497–3500.
[26] K. Tamura, J. Dudley, M. Nei, S. Kumar, MEGA4: Molecular evolutionary genetics analysis (MEGA) software version 4.0, Mol. Biol. Evol. 24 (2007) 1596–1599.
[27] N. Saitou, M. Nei, The neighbor-joining method: a new method for reconstructing phylogenetic trees, Mol. Biol. Evol. 4 (1987) 406–425.
[28] S. van Dongen, Graph Clustering by Flow Simulation (PhD thesis), University of Utrecht, 2000.
[29] J. Besemer, A. Lomsadze, M. Borodovsky, GeneMarkS: a self-training method for prediction of gene starts in microbial genomes implications for finding sequence motifs in regulatory regions, Nucleic Acids Res. 29 (2001) 2607–2618.
[30] S.K. Wyman, R.K. Jansen, J.L. Boore, Automatic annotation of organellar genomes with DOGMA, Bioinformatics 20 (2004) 3252–3255.
[31] P. Schattner, A.N. Brooks, T.M. Lowe, The tRNAscan-SE, snoscan and snoGPS web servers for the detection of tRNAs and snoRNAs, Nucleic Acids Res. 33 (2005) W686–W689.
[32] M. Lohse, O. Drechsel, R. Bock, OrganellarGenomeDRAW (OGDRAW) - a tool for the easy generation of high-quality custom graphical maps of plastid and mitochondrial genomes, Curr. Genet. 52 (2007) 267–274.
[33] Q. Wang, G.M. Garrity, J.M. Tiedje, J.R. Cole, Naïve Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy, Appl. Environ. Microbiol. 73 (2007) 5261–5267.
[34] J.R. Cole, Q. Wang, E. Cardenas, J. Fish, B. Chai, R.J. Farris, A.S. Kulam-Syed-Mohideen, D.M. McGarrell, T. Marsh, G.M. Garrity, J.M. Tiedje, The ribosomal database project: improved alignments and new tools for rRNA analysis, Nucleic Acids Res. 37 (2009) D141–D145.
[35] L.S. de Oliveira, G.B. Gregoracci, G.G. Silva, L.T. Salgado, G.A. Filho, M. Alves-Ferreira, R.C. Pereira, F.L. Thompson, Transcriptomic analysis of the red seaweed Laurencia dendroidea (Florideophyceae, Rhodophyta) and its microbiome, BMC Genomics 13 (2012) 487.
[36] H.S. Yoon, J.D. Hackett, C. Ciniglia, G. Pinto, D. Bhattacharya, A molecular timeline for the origin of photosynthetic eukaryotes, Mol. Biol. Evol. 21 (2004) 809–818.
[37] B.J.D. Meeuse, M. Andries, J.A. Wood, Floridean starch, J. Exp. Bot. 11 (1960) 129–140.
[38] D.A. McCracken, J.R. Cain, Amylose in floridean starch, New Phytol. 88 (1981) 67–71.
[39] P. Ekman, S. Yu, M. Pedersen, Effects of altered salinity, darkness and algal nutrient status on floridoside and starch content, α-galactosidase activity and agar yield of cultivated Gracilaria sordida, Br. Phycol. J. 126 (1991) 123–131.
[40] C.S. Lobban, P.J. Harrison, M.J. Duncan, The Physiological Ecology of Seaweeds, Cambridge University Press, New York, 1985.
[41] R.E. Rincones, S. Yu, M. Pedersen, Effect of dark treatment on the starch degradation and the agar quality of cultivated Gracilariopsis lemaneiformis (Rhodophyta, Gracilariales) from Venezuela, Hydrobiologia 260/261 (1993) 633–640.
[42] P. Nyvall, J. Pelloux, H.V. Davies, M. Pedersén, R. Viola, Purification and characterisation of a novel starch synthase selective for uridine 5′-diphosphate glucose from the red alga Gracilaria tenuistipitata, Planta 209 (1999) 143–152.
[43] J.I. Sesma, A.A. Iglesias, Synthesis of floridean starch in the red alga Gracilaria gracilis occurs via ADP-glucose, in: G. Garag (Ed.), Photosynthesis: Mechanisms and Effects, vol. V, Kluwer Publishers, Dordrecht, 1998, pp. 3537–3540.
[44] S. Yu, L. Kenne, M. Pedersén, α-1,4-Glucan lyase, a new class of starch/glycogen degrading enzyme I: efficient purification and characterization from red seaweeds, Biochim. Biophys. Acta 1156 (1993) 313–320.
[45] N. Pade, N. Linka, W. Ruth, A.P. Weber, M. Hagemann, Floridoside and isofloridoside are synthesized by trehalose 6-phosphate synthase-like enzymes in the red alga Galdieria sulphuraria, New Phytol. 205 (2014) 1227–1238.
[46] G.O. Kirst, Salinity tolerance of eukaryotic marine algae, Annu. Rev. Plant Physiol. Plant Mol. Biol. 41 (1990) 21–53.
[47] S. Yu, M. Pedersen, The effect of salinity changes on the activity of α-galactosidase of the red algae Gracilaria sordida and G. tenuistipitata, Bot. Mar. 33 (1990) 385–392.
[48] S. Yu, M. Pedersen, The α-galactosidase of Gracilaria tenuistipitata and G. sordida (Gracilariales, Rhodophyta), Phycologia 29 (1990) 454–460.
[49] J. Craigie, Cell walls, in: K.M. Cole, R.G. Sheath (Eds.), Biology of the Red Algae, Cambridge University Press, New York, 1990, pp. 221–258.
[50] M. Lahaye, C. Rochas, Chemical structure and physico-chemical properties of agar, Hydrobiologia 221 (1991) 137–148.
[51] H. Kitagawa, T. Uyama, K. Sugahara, Molecular cloning and expression of a human chondroitin synthase, J. Biol. Chem. 276 (2001) 38721–38726.
[52] M. Fukuta, K. Uchimura, K. Nakashima, M. Kato, K. Kimata, T. Shinomura, O. Habuchi, Molecular cloning and expression of chick chondrocyte chondroitin 6-sulfotransferase, J. Biol. Chem. 270 (1995) 18575–18580.
[53] M. Fukuta, J. Inazawa, T. Torii, K. Tsuzuki, E. Shimada, O. Habuchi, Molecular cloning and characterization of human keratan sulfate Gal-6-sulfotransferase, J. Biol. Chem. 272 (1997) 32321–32328.
[54] K.D. Mazany, T. Peng, C.E. Watson, I. Tabas, K.J. Williams, Human chondroitin 6-sulfotransferase: cloning, gene structure, and chromosomal localization, Biochim. Biophys. Acta 1407 (1998) 92–97.
[55] W.J. Chi, Y.K. Chang, S.K. Hong, Agar degradation by microorganisms and agar-degrading enzymes, Appl. Microbiol. Biotechnol. 94 (2012) 917–930.
[56] A. Préchoux, S. Genicot, H. Rogniaux, W. Helbert, Controlling carrageenan structure using a novel formylglycine-dependent sulfatase, an endo-4S-iota-carrageenan sulfatase, Mar. Biotechnol. 15 (2013) 265–274.
[57] J. Janouškovec, S.L. Liu, P.T. Martone, W. Carré, C. Leblanc, J. Collén, P.J. Keeling, Evolution of red algal plastid genomes: ancient architectures, introns, horizontal gene transfer, and taxonomic utility of plastid markers, PLoS One 8 (2013) e59001.
[58] J.R. Manhart, J.D. Palmer, The gain of two chloroplast tRNA introns marks the green algal ancestors of land plants, Nature 345 (1990) 268–270.
[59] M. Turmel, C. Otis, C. Lemieux, The chloroplast genome sequence of Chara vulgaris sheds new light into the closest green algal relatives of land plants, Mol. Biol. Evol. 23 (2006) 1324–1338.
[60] S. Binder, T. Knill, J. Schuster, Branched-chain amino acid metabolism in higher plants, Physiol. Plant. 129 (2007) 68–78.
[61] J. Hughey, P. Silva, M. Hommersand, Solving taxonomic and nomenclatural problems in Pacific Gigartinaceae (Rhodophyta) using DNA from type material, J. Phycol. 1109 (2001) 1091–1109.
[62] S.-L. Song, H.-S. Yong, P.-E. Lim, S.-M. Phang, Complete mitochondrial genome of Gracilaria changii (Rhodophyta, Gracilariaceae), J. Appl. Phycol. (2017), http://dx.doi.org/10.1007/s10811-017-1100-z.
[63] C. Vairappan, S. Anangdan, K. Tan, S. Matsunaga, Role of secondary metabolites as defense chemicals against ice-ice disease bacteria in biofouler at carrageenophyte farms, J. Appl. Phycol. 22 (2010) 305–311.
[64] A. Nasrolahi, S.B. Stratil, K.J. Jacob, M. Wahl, A protective coat of microorganisms of macroalgae: inhibitory effects of bacterial microfilms and epibiotic microbial assemblages on barnacle attachment, FEMS Microbiol. Ecol. 81 (2012) 583–595.
[65] S.R. Longford, N.A. Tujula, G.R. Crocetti, A.J. Holmes, C. Holmström, S. Kjelleberg, P.D. Steinberg, M.W. Taylor, Comparisons of diversity of bacterial communities associated with three sessile marine eukaryotes, Aquat. Microb. Ecol. 48 (2007) 217–229.
[66] N.A. Tujula, G.R. Crocetti, C. Burke, T. Thomas, C. Holmström, S. Kjelleberg, Variability and abundance of the epiphytic bacterial community associated with a green marine Ulvacean alga, ISME J. 4 (2010) 301–311.
[67] C. Burke, T. Thomas, M. Lewis, P. Steinberg, S. Kjelleberg, Composition, uniqueness and variability of the epiphytic bacterial community of the green alga Ulva australis, ISME J. 5 (2011) 590–600.
[68] T. Lachnit, D. Meske, M. Wahl, T. Harder, R. Schmitz, Epibacterial community patterns on marine macroalgae are host-specific but temporally variable, Environ. Microbiol. 13 (2011) 655–665.
[69] T. Staufenberger, V. Thiel, J. Wiese, J.F. Imhoff, Phylogenetic analysis of bacteria associated with Laminaria saccharina, FEMS Microbiol. Ecol. 64 (2008) 65–77.
[70] G. Correc, J.H. Hehemann, M. Czjzek, W. Helbert, Structural analysis of the degradation products of porphyran digested by Zobellia galactanivorans β-porphyranase A, Carbohydr. Polym. 83 (2011) 277–283.
[71] C. Aravindraja, D. Viszwapriya, S. Karutha Pandian, Ultradeep 16S rRNA sequencing analysis of geographically similar but diverse unexplored marine samples reveal varied bacterial community composition, PLoS One 8 (10) (2013) e76724.
