Transcriptomic Data of Utilization Processes for Nitrogen and Phosphorus in Prorocentrum donghaiense

Prorocentrum donghaiense is one of the species that most frequently form harmful algal blooms in the East China Sea. To reveal how P. donghaiense responds to different nutrient conditions, de novo transcriptome sequencing was used to examine transcriptomic differences among cultures grown under nutrient-replete, nitrogen-limited, or phosphorus-limited conditions. Transcripts down-regulated by phosphate limitation included those encoding proteins involved in RNA transport, oxidative phosphorylation, photosynthesis, endocytosis, pyrimidine metabolism, glycolysis/gluconeogenesis, biosynthesis of amino acids, vitamin digestion and absorption, and protein processing in the endoplasmic reticulum, whereas genes involved in ribosomal protein metabolism were significantly up-regulated. The abundance of 896 transcripts was elevated or reduced by nitrogen limitation, and these transcripts were involved in metabolic processes similar to those affected by P depletion. Here, we present the experimental procedures and analytical processes in detail.


Algal Strain and Culture Conditions
Prorocentrum donghaiense (MEL203) was originally isolated from Zhu Jiang, China, in 2009 and is maintained at the Research Center of Harmful Algae and Marine Biology, Jinan University. Before the formal experiments, P. donghaiense was revived in natural seawater medium supplemented with f/2 nutrients. Subsequently, the cultures were inoculated into f/2 medium prepared with artificial seawater and incubated at 20 ± 1 °C under a 12 h:12 h light-dark cycle at a light intensity of 100 μmol/(m²·s) provided by cool fluorescent tubes. Two antibiotics were added to the medium to inhibit bacterial growth (final concentrations: penicillin G, 30 mg L⁻¹; streptomycin sulphate, 50 mg L⁻¹).

Experimental Design
For the nitrogen and phosphate limitation experiment, cultures in the exponential growth phase were centrifuged at 1400 g for 5 min, and the resulting pellets were transferred into nutrient-replete (882 μmol L⁻¹ NO₃⁻ and 36 μmol L⁻¹ PO₄³⁻), N-free (0 μmol L⁻¹ NO₃⁻ and 36 μmol L⁻¹ PO₄³⁻), or P-free (882 μmol L⁻¹ NO₃⁻ and 0 μmol L⁻¹ PO₄³⁻) medium. The cultures were maintained at 20 ± 1 °C under a 12 h:12 h light-dark cycle. Cultures were harvested 12 h after inoculation, and the pellets were covered with RNAlater solution (Sigma) and stored at -80 °C for further analysis.

Total RNA Extraction and Library Preparation
Total RNA was prepared from frozen cells using a total RNA extraction kit (Magen, Shanghai, China). The extracted RNA was eluted in RNase-free water, and its concentration was determined using a spectrophotometer (Agilent Technologies, CA, USA). The mRNA was purified from total RNA using poly-T oligo-attached magnetic beads and then fragmented using divalent cations at high temperature. First-strand cDNA was synthesized from these RNA fragments using random hexamer primers and RNase H. The second strand of cDNA was subsequently synthesized using the first-strand buffer, dNTPs, DNA polymerase I and RNase H. The cDNA fragments were purified with QiaQuick PCR kits and washed with EB buffer. These fragments were then end-repaired, and poly(A) tails and adapters were added. The target products were separated by agarose gel electrophoresis, and the fragments were amplified by PCR to create a cDNA library. Clustering of the index-coded samples was performed on a cBot cluster generation system using the HiSeq PE Cluster Kit v4-cBot-HS (Illumina). The library preparations were then sequenced on an Illumina HiSeq 4000 sequencer, and 150 bp paired-end reads were generated. Raw data files have been deposited in NCBI's Gene Expression Omnibus (GEO) (SRX3437735).

Quality Control, Assembly of Reads, Coding Region Prediction and Annotation
To obtain high-quality reads, the raw data were processed with Perl scripts to remove reads containing adaptor sequence, low-quality reads, and reads in which unknown bases (N) accounted for more than 5%. High-quality reads were assembled with Trinity (version 20140710) [1]. The clean data were mapped to the assembled transcripts with Bowtie [2] for post-assembly evaluation [4]. TransDecoder (version 20140710) was used to identify candidate coding regions in the assembled transcripts. Functional annotation of unigenes and Open Reading Frames (ORFs) was performed using Trinotate (version 20140717). Trinotate is a widely used tool for annotating de novo assembled transcriptomes. The software carries out multiple functions, including homology search, identification of protein domains (HMMER/PFAM), and protein signal prediction (SignalP/TmHMM).
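The three read-filtering criteria described above can be sketched in Python as follows. This is a minimal illustration, not the Perl scripts actually used: only the 5% threshold for N comes from the text, whereas the adaptor string, the Phred cut-off (min_q) and the tolerated fraction of low-quality bases (max_low_q_frac) are hypothetical values chosen for the example.

```python
def n_fraction(seq):
    """Fraction of bases in seq that are unknown (N)."""
    if not seq:
        return 1.0  # treat an empty read as entirely unknown
    return seq.upper().count("N") / len(seq)


def keep_read(seq, quals, adaptor,
              max_n_frac=0.05,      # from the text: discard reads with N > 5%
              min_q=20,             # assumption: Phred score below 20 is "low"
              max_low_q_frac=0.5):  # assumption: discard if > 50% low-quality bases
    """Return True if a read passes all three filters.

    seq     -- read sequence (string)
    quals   -- per-base Phred scores (list of ints, same length as seq)
    adaptor -- adaptor sequence to screen for (hypothetical here)
    """
    # 1. Discard reads that contain adaptor sequence.
    if adaptor and adaptor in seq.upper():
        return False
    # 2. Discard reads in which N accounts for more than max_n_frac.
    if n_fraction(seq) > max_n_frac:
        return False
    # 3. Discard low-quality reads (too many bases below min_q).
    if not quals:
        return False
    low = sum(1 for q in quals if q < min_q)
    if low / len(quals) > max_low_q_frac:
        return False
    return True


if __name__ == "__main__":
    adaptor = "AGATCGGAAG"  # hypothetical adaptor fragment
    print(keep_read("ACGTACGTAC", [30] * 10, adaptor))  # clean read
    print(keep_read("ACGTNCGTAC", [30] * 10, adaptor))  # 10% N
```

For paired-end data, a pair would normally be kept only if both mates pass; that rule is likewise an assumption and is not stated in the text.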

Analysis of Differentially Expressed Genes
Read counts for each gene in each sample were obtained with HTSeq v0.6.0, and RPKM (Reads Per Kilobase per Million mapped reads) was then used to estimate gene expression levels in each sample. The final set of genes was used for differential gene expression analysis [3]. DEGseq was used to identify genes that were up-regulated or down-regulated between two samples using a model based on the negative binomial distribution. A P-value was assigned to each gene and adjusted using the Benjamini-Hochberg correction to control the false discovery rate. Genes with q ≤ 0.05 and |log2 ratio| ≥ 1 were identified as Differentially Expressed Genes (DEGs) [4].
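The quantification and filtering steps above can be illustrated with a short Python sketch. It shows the standard RPKM formula, a plain implementation of the Benjamini-Hochberg adjustment, and the DEG thresholds given in the text. It does not reproduce DEGseq (an R package) or its statistical model; the function names and the example numbers are hypothetical.

```python
import math


def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads Per Kilobase per Million mapped reads for one gene."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values (q-values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity of q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def is_deg(q_value, log2_ratio, q_cut=0.05, lfc_cut=1.0):
    """Apply the thresholds from the text: q <= 0.05 and |log2 ratio| >= 1."""
    return q_value <= q_cut and abs(log2_ratio) >= lfc_cut


if __name__ == "__main__":
    # Hypothetical gene: 2 kb long, in libraries of one million mapped reads.
    control = rpkm(100, 2000, 1_000_000)
    treated = rpkm(400, 2000, 1_000_000)
    log2_ratio = math.log2(treated / control)

    # Hypothetical raw P-values for four genes; the first is the gene above.
    q_values = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
    print(control, treated, log2_ratio, is_deg(q_values[0], log2_ratio))
```

Genes with zero reads in one sample would need a pseudocount or separate handling before the log2 ratio is taken; the text does not say how such cases were treated.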