1. Introduction to OncoSplicing


2. Splicing types and quantification


3. Database construction pipeline


4. Function in OncoSplicing


5. Description of columns


6. Data summary


7. Abbreviation


8. Reference



1. Introduction to OncoSplicing

Percent spliced in (PSI), a ratio ranging from 0 to 1, is commonly used to quantify the inclusion of an alternative exon. We downloaded PSI values of 122,423 alternative splicing (AS) events across 10,699 samples in 33 cancer types from the TCGA SpliceSeq database [1]. The SplAdder [2] software differs from the SpliceSeq [3] software in how it detects and quantifies alternative splicing, which may uncover different AS events of potential value in cancers. We collected raw count data of 4,491,482 AS events in 9,437 TCGA and 3,323 GTEx samples from the SplAdder project [4] and then recalculated the PSI values of 238,558 confirmed and filtered AS events using a modified pipeline. After integrating sample types, survival data and clinical indicators at different levels, we performed differential, survival and cancer-specific analyses on these data to identify potentially clinically relevant AS events in each cancer type (see below). To help users quickly identify the transcripts related to each AS event, 101,877 (83.2%) AS events in the SpliceSeq project and 169,426 (71%) in the SplAdder project were annotated with at least one transcript. In addition, explanatory diagrams of the detected AS events and the annotated structure of transcripts are presented in the UCSC genome browser as custom tracks. In OncoSplicing, users can easily browse and search alternative splicing in cancers and visualize its clinical relevance in a single-cancer or pan-cancer view.



2. Splicing types and quantification

(1) The five splice types detected in the SplAdder project, alternative 3' site (A3), alternative 5' site (A5), exon skip (ES), mutually exclusive exons (ME) and intron retention (IR), correspond respectively to five splice types detected in the SpliceSeq project: alternative acceptor site (AA), alternative donor site (AD), exon skipping (ES), mutually exclusive exons (ME) and retained intron (RI). The SpliceSeq project additionally includes two splice types, alternate promoter (AP) and alternate terminator (AT).
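When cross-referencing events between the two projects, the correspondence above can be kept as a simple lookup table. The sketch below is a hypothetical Python helper (the names `SPLADDER_TO_SPLICESEQ`, `SPLICESEQ_ONLY` and `spliceseq_to_spladder` are illustrative, not part of OncoSplicing); it encodes only the mapping stated in the text.

```python
from typing import Dict, Optional

# Correspondence between splice-type codes, as listed in the text.
SPLADDER_TO_SPLICESEQ: Dict[str, str] = {
    "A3": "AA",  # alternative 3' site -> alternative acceptor site
    "A5": "AD",  # alternative 5' site -> alternative donor site
    "ES": "ES",  # exon skip -> exon skipping
    "ME": "ME",  # mutually exclusive exons (same code in both projects)
    "IR": "RI",  # intron retention -> retained intron
}
SPLICESEQ_TO_SPLADDER: Dict[str, str] = {v: k for k, v in SPLADDER_TO_SPLICESEQ.items()}

# Splice types reported only by SpliceSeq.
SPLICESEQ_ONLY = {"AP", "AT"}  # alternate promoter, alternate terminator


def spliceseq_to_spladder(code: str) -> Optional[str]:
    """Return the SplAdder code, or None for SpliceSeq-only types (AP, AT)."""
    if code in SPLICESEQ_ONLY:
        return None
    return SPLICESEQ_TO_SPLADDER[code]  # raises KeyError for unknown codes
```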


(2) SpliceSeq and SplAdder differ in AS detection and PSI calculation. SpliceSeq uses all reads covering the alternatively spliced exon to calculate PSI, whereas SplAdder considers only reads spanning the splice junctions. In this database, we modified the PSI calculation for the SplAdder project by normalizing the read counts by the number of splice junctions, which may to some extent affect the quantification of read counts and PSI for the ME and ES splice types.
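The effect of junction normalization can be seen with a small worked example. The sketch below assumes that the modification divides the summed junction reads of each isoform by that isoform's number of junctions before forming the ratio; the function name and this exact formula are assumptions for illustration, not the database's code. For an ES event the inclusion isoform has two junctions and the exclusion isoform one, so without normalization the inclusion reads are effectively counted twice.

```python
from typing import Optional


def normalized_psi(inclusion_reads: float, n_inclusion_junctions: int,
                   exclusion_reads: float, n_exclusion_junctions: int) -> Optional[float]:
    """Hypothetical PSI with each isoform's junction reads averaged over its junctions."""
    if n_inclusion_junctions <= 0 or n_exclusion_junctions <= 0:
        raise ValueError("each isoform needs at least one junction")
    inc = inclusion_reads / n_inclusion_junctions   # mean reads per inclusion junction
    exc = exclusion_reads / n_exclusion_junctions   # mean reads per exclusion junction
    total = inc + exc
    if total == 0:
        return None  # no junction reads: PSI is undefined
    return inc / total


# Exon skip: 20 and 30 reads on the two inclusion junctions, 25 on the skipping junction.
print(normalized_psi(20 + 30, 2, 25, 1))  # 0.5
```

Under this assumption the normalized PSI is 0.5, whereas the unnormalized ratio 50 / (50 + 25) would be about 0.67.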



3. Database construction pipeline

The primary data-processing pipelines were taken from the papers of the SpliceSeq [1] and SplAdder [4] projects, respectively. For the SplAdder project, the modified quantification of read counts and PSI is described in section 2, and the pipeline for confirming novel AS events was modified by setting the minimum read count for splice junctions to three for the ES and ME splice types and by applying a sample-size normalization with a percentage cut-off of 0.5% within each cancer type population.
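One possible reading of the confirmation step for novel ES and ME events is sketched below. The text states only the two thresholds (three reads per junction, 0.5% of samples); the interpretation that every junction of the event must reach three reads in a sample, and that 0.5% is the minimum fraction of supporting samples within one cancer type, is an assumption, and the function names are hypothetical.

```python
from typing import Sequence

MIN_JUNCTION_READS = 3        # minimum reads per splice junction for ES and ME (from the text)
MIN_SAMPLE_FRACTION = 0.005   # 0.5% of the samples in one cancer type (from the text)


def sample_supports_event(junction_reads: Sequence[int]) -> bool:
    """Assumption: every junction of the event must reach the minimum read count."""
    return len(junction_reads) > 0 and all(r >= MIN_JUNCTION_READS for r in junction_reads)


def confirm_novel_event(per_sample_junction_reads: Sequence[Sequence[int]]) -> bool:
    """Keep a novel ES/ME event if enough samples of one cancer type support it."""
    n_samples = len(per_sample_junction_reads)
    if n_samples == 0:
        return False
    n_supported = sum(sample_supports_event(r) for r in per_sample_junction_reads)
    return n_supported / n_samples >= MIN_SAMPLE_FRACTION
```

Under this reading, an event in a cohort of 1,000 samples would need at least five supporting samples.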

(1) Survival analysis. Survival analyses were performed based on overall survival (OS), progression-free interval (PFI), disease-free interval (DFI) and disease-specific survival (DSS) data. For each AS event, samples were dichotomized by PSI value, Cox proportional hazards (Cox PH) regression was used to estimate the hazard ratio between the two groups, and the log-rank test was used to assess significance. For each type of survival data, survival analysis of an AS event was performed only if it had an effective sample size > 30, more than 5 survival events and a minimum group size > 10. AS events with a log-rank p-value less than 0.05 were considered significant survival-associated alternative splicing events (SASEs). All significant SASEs can be found on the "ClinicalAS" page, and all results of the survival analyses are stored in the info table on the "Download" page. Survival analyses on the "SpliceSeq" and "SplAdder" pages were based on overall survival data for most cancer types, or on progression-free interval (PFI) data for PCPG, PRAD, TGCT, THCA and THYM owing to a lack of overall survival events.
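The eligibility rules and the significance test can be sketched in plain Python. This is an illustrative re-implementation, not the OncoSplicing code: it splits samples at the median PSI (the default cut-off used by KM-plot; the text does not say which cut-off the ClinicalAS analysis uses), applies the three thresholds above and computes a standard two-group log-rank test. The Cox PH hazard ratio is omitted; in practice a survival library would be used for both. The function names are hypothetical.

```python
import math
from statistics import median
from typing import Optional, Sequence


def logrank_p(times: Sequence[float], events: Sequence[int], group: Sequence[int]) -> float:
    """Two-group log-rank test (group: 1 = high PSI, 0 = low PSI; event: 1 = event, 0 = censored)."""
    o_minus_e = 0.0  # observed minus expected events in group 1
    variance = 0.0
    for t in sorted({ti for ti, ei in zip(times, events) if ei == 1}):
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n1 = sum(group[i] for i in at_risk)
        d = sum(1 for i in at_risk if times[i] == t and events[i] == 1)
        d1 = sum(1 for i in at_risk if times[i] == t and events[i] == 1 and group[i] == 1)
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = o_minus_e ** 2 / variance
    return math.erfc(math.sqrt(chi2 / 2))  # upper tail of a chi-square with 1 df


def survival_p_value(psi, times, events) -> Optional[float]:
    """Log-rank p-value for a median-PSI split, or None if the event fails the eligibility rules."""
    data = [(p, t, e) for p, t, e in zip(psi, times, events)
            if p is not None and t is not None and e is not None]
    if len(data) <= 30:                      # effective sample size > 30
        return None
    if sum(e for _, _, e in data) <= 5:      # more than 5 survival events
        return None
    cut = median(p for p, _, _ in data)
    group = [1 if p > cut else 0 for p, _, _ in data]
    if min(sum(group), len(group) - sum(group)) <= 10:  # smallest group > 10
        return None
    return logrank_p([t for _, t, _ in data], [e for _, _, e in data], group)
```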


(2) Differential analysis. Differential AS analyses were performed for cancer types (see data summary) with at least 30 TCGA tumour samples and 10 adjacent normal samples. The Wilcoxon rank-sum test was used to evaluate the significance of differences. AS events with an absolute delta PSI greater than 0.1 and a Benjamini–Hochberg (BH) adjusted p-value less than 0.05 were considered significant differential alternative splicing events (DASEs).
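The DASE filter combines an effect-size threshold with Benjamini–Hochberg adjustment. The sketch below takes raw p-values that are assumed to have been computed already (for example with a Wilcoxon rank-sum test such as `scipy.stats.mannwhitneyu`) and implements the standard BH step-up adjustment, the same procedure as R's `p.adjust(method = "BH")`. How delta PSI is defined (difference of means or of medians) is not stated, so it is taken as an input; the function names and event IDs are hypothetical.

```python
from typing import List, Sequence, Tuple


def bh_adjust(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (step-up, capped at 1)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)  # largest p first
    adjusted = [0.0] * m
    running_min = 1.0
    for k, i in enumerate(order):
        rank = m - k  # 1-based rank of pvalues[i] in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_events(results: Sequence[Tuple[str, float, float]],
                       min_abs_delta: float = 0.1, max_fdr: float = 0.05) -> List[str]:
    """results: (event_id, delta_psi, raw_p). Keep |delta PSI| > 0.1 and BH-adjusted p < 0.05."""
    adjusted = bh_adjust([p for _, _, p in results])
    return [eid for (eid, delta, _), q in zip(results, adjusted)
            if abs(delta) > min_abs_delta and q < max_fdr]


results = [("event_1", 0.25, 0.001), ("event_2", -0.15, 0.004),
           ("event_3", 0.05, 0.0001), ("event_4", 0.30, 0.6)]
print(significant_events(results))  # ['event_1', 'event_2']
```

Here `event_3` is highly significant but fails the effect-size threshold, and `event_4` has a large delta PSI but does not survive the FDR cut-off.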


(3) Identification of clinical indicator-relevant AS events. Basic patient information (age, sex and race) and non-redundant, variable clinical indicators were manually collected, and for each cancer type the patients were separated into two groups by each indicator. A clinical indicator in a cancer type was retained for further analysis only if each group had more than 20 records. Differential AS analysis was then performed between the two groups of each indicator, and only AS events with a delta PSI greater than 0.1 and a BH-adjusted p-value less than 0.05 were considered significant clinically relevant AS events. The explanation of each clinical indicator can be found here.
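The retention rule for clinical indicators can be expressed as a small filter. In this hypothetical sketch each indicator is given as one group label per patient (missing values as `None`); requiring exactly two groups follows from the two-group split described above, and the function name and example labels are illustrative.

```python
from collections import Counter
from typing import Dict, List, Optional

MIN_RECORDS_PER_GROUP = 20  # an indicator needs more than 20 records in each group


def retained_indicators(indicators: Dict[str, List[Optional[str]]]) -> List[str]:
    """Names of indicators with exactly two groups, each holding more than 20 records."""
    kept = []
    for name, labels in indicators.items():
        counts = Counter(label for label in labels if label is not None)
        if len(counts) == 2 and min(counts.values()) > MIN_RECORDS_PER_GROUP:
            kept.append(name)
    return kept


example = {
    "sex": ["male"] * 30 + ["female"] * 25,
    "stage": ["I-II"] * 40 + ["III-IV"] * 15 + [None] * 5,
}
print(retained_indicators(example))  # ['sex']
```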


(4) Identification of cancer-specific AS. An AS event was considered cancer-specific only if it met one of the following criteria: 1) PSI > 0.99 in more than 90% of GTEx samples and PSI < 0.95 in more than 10% of tumour samples for at least one TCGA cancer type; or 2) PSI < 0.01 in more than 90% of GTEx samples and PSI > 0.05 in more than 10% of tumour samples for at least one TCGA cancer type.
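The two criteria translate directly into a predicate over PSI values. In this sketch, which makes assumptions about the data layout rather than reproducing the database's code, GTEx PSI values are pooled into one list, tumour PSI values are grouped by TCGA cancer type, and missing values are assumed to have been removed beforehand; the function names are hypothetical.

```python
from typing import Callable, Dict, Sequence


def _fraction(values: Sequence[float], predicate: Callable[[float], bool]) -> float:
    """Share of values satisfying predicate (0.0 for an empty list)."""
    return sum(1 for v in values if predicate(v)) / len(values) if values else 0.0


def is_cancer_specific(gtex_psi: Sequence[float],
                       tumour_psi_by_cancer: Dict[str, Sequence[float]]) -> bool:
    tumours = list(tumour_psi_by_cancer.values())
    # Criterion 1: almost always included in normal tissue, reduced in some tumours.
    if _fraction(gtex_psi, lambda v: v > 0.99) > 0.90 and any(
            _fraction(t, lambda v: v < 0.95) > 0.10 for t in tumours):
        return True
    # Criterion 2: almost always skipped in normal tissue, gained in some tumours.
    if _fraction(gtex_psi, lambda v: v < 0.01) > 0.90 and any(
            _fraction(t, lambda v: v > 0.05) > 0.10 for t in tumours):
        return True
    return False
```

For instance, an exon included in every GTEx sample (PSI = 1.0) but with PSI = 0.5 in 20% of the samples of one cancer type meets criterion 1.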


(5) Annotation of AS-associated transcripts. An AS-associated transcript is a transcript that contains the splice junctions of an AS event, supporting either exon splice-in or splice-out. The exons of all transcripts in the genome annotation file (GRCh37, GENCODE v19) were ordered from 5' to 3'. The chromosomal locations of the splice junctions of each AS event were then obtained and mapped to the transcripts of the spliced gene to identify AS-associated transcripts.
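The junction-matching idea can be sketched as follows. The assumptions here are that a splice junction is represented by the end of its upstream exon and the start of its downstream exon in genomic coordinates, and that a transcript is AS-associated when it contains all junctions of either the splice-in or the splice-out path; the function and type names are hypothetical.

```python
from typing import List, Optional, Set, Tuple

Exon = Tuple[int, int]      # (start, end) in genomic coordinates
Junction = Tuple[int, int]  # (end of upstream exon, start of downstream exon)


def transcript_junctions(exons: List[Exon]) -> Set[Junction]:
    """Junctions between consecutive exons; sorting by position works for either strand."""
    ordered = sorted(exons)
    return {(ordered[i][1], ordered[i + 1][0]) for i in range(len(ordered) - 1)}


def as_association(exons: List[Exon], splice_in: Set[Junction],
                   splice_out: Set[Junction]) -> Optional[str]:
    """'splice-in' or 'splice-out' if the transcript carries all junctions of that path, else None."""
    junctions = transcript_junctions(exons)
    if splice_in <= junctions:
        return "splice-in"
    if splice_out <= junctions:
        return "splice-out"
    return None


# Exon-skip event: upstream (100, 200), alternate (300, 400), downstream (500, 600).
inc = {(200, 300), (400, 500)}
exc = {(200, 500)}
print(as_association([(100, 200), (300, 400), (500, 600)], inc, exc))  # splice-in
print(as_association([(100, 200), (500, 600)], inc, exc))              # splice-out
print(as_association([(100, 200), (300, 400)], inc, exc))              # None
```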



4. Function in OncoSplicing

(1) After choosing a cancer type and searching for a gene symbol, users can browse integrated information on the queried splicing events, including gene information ①, event information ② and statistical results ③. Clicking on a gene symbol, which is embedded with a hyperlink, directs users to the Ensembl database. In the button region ④, clicking on the "UCSC" button directs users to the UCSC genome browser, which displays customized tracks with the annotated structure of transcripts ⑦ and explanatory diagrams ⑤ ⑥ of the AS events detected in the SplAdder and SpliceSeq projects. Clicking on a plot button displays the corresponding plot for the query within a few seconds. ① and ② are presented consistently across pages such as "SpliceSeq", "SplAdder", "PanCancer" and "ClinAS", while ③ and ④ may differ among them. For the SplAdder project, the event region in ② lists three or four exons (0-based start to end), ordered 5' to 3' by chromosome location and joined by "-"; the alternate region gives the 0-based start to end of the longer intron (A3 and A5), the longer exon (IR) or the alternate exon (ES and ME).


(2) Three customized tracks are provided in the UCSC genome browser, presenting explanatory diagrams of the AS events in the SplAdder and SpliceSeq projects and the annotated structure of transcripts. In the explanatory diagram tracks for SplAdder ⑤ and SpliceSeq ⑥, AS events in blue or green are annotated (known) AS events, while AS events in red or orange are novel AS events. In the transcript structure track for GENCODE v19 ⑦, protein-coding transcripts are shown in dark red and non-protein-coding transcripts in dark blue. Thick blocks mark the alternate exon(s) of an AS event in the explanatory diagram tracks and the CDS region of a transcript in the transcript structure track.
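The track conventions above map naturally onto the UCSC BED12 format, in which `thickStart`/`thickEnd` define the thick region and `itemRgb` sets the item colour when the track line includes `itemRgb="On"`. The sketch below builds one BED12 line for an explanatory ES diagram; the track name, the event name and the exact RGB values chosen for blue (known) and red (novel) are assumptions, not the files OncoSplicing actually serves.

```python
from typing import List, Tuple

KNOWN_RGB = "0,0,255"  # blue for annotated (known) events; exact shade is an assumption
NOVEL_RGB = "255,0,0"  # red for novel events; exact shade is an assumption


def event_bed12(chrom: str, strand: str, name: str, exons: List[Tuple[int, int]],
                alt_exon: Tuple[int, int], novel: bool) -> str:
    """One BED12 line for an AS diagram; exons are 0-based, half-open (start, end) pairs."""
    blocks = sorted(exons)
    chrom_start, chrom_end = blocks[0][0], blocks[-1][1]
    fields = [
        chrom, chrom_start, chrom_end, name,
        0,                                    # score (unused)
        strand,
        alt_exon[0], alt_exon[1],             # thickStart/thickEnd: alternate exon drawn thick
        NOVEL_RGB if novel else KNOWN_RGB,    # itemRgb
        len(blocks),                          # blockCount
        ",".join(str(e - s) for s, e in blocks),            # blockSizes
        ",".join(str(s - chrom_start) for s, _ in blocks),  # blockStarts, relative to chromStart
    ]
    return "\t".join(str(f) for f in fields)


print('track name="AS_diagram_example" itemRgb="On"')
print(event_bed12("chr1", "+", "ES_example", [(100, 200), (300, 400), (500, 600)],
                  (300, 400), novel=False))
```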


(3) OncoSplicing provides six different plotting functions to visualize each AS event.
KM-plot produces two plots for most AS events based on the median PSI cut-off (left) and the predicted optimal PSI cut-off (right, if applicable). The optimal cut-off was predicted using survival data by the “surv_cutpoint” function in the R package “survminer”.

TN-plot provides boxplots showing the distribution of PSI in tumor samples and its comparison with adjacent normal samples (if applicable) and/or GTEx samples (if applicable). The Wilcoxon rank-sum test was used to evaluate the significance of differences.

PanDiff plot provides a pan-cancer view of the PSI differences of the queried AS event (detected in at least 3 cancers) between tumor samples and adjacent normal samples (left, if applicable) and/or GTEx normal samples (right, if applicable). The red dashed line indicates 0.05 on the Y-axis. Red labels on the axes indicate transformed breaks. The colors inside the circles represent different cancer types.

PanCox (PanOS or PanPFI) plot provides a pan-cancer view of the hazard ratio of the queried AS event (detected in at least 3 cancers) based on the median PSI cut-off (left) and the predicted optimal PSI cut-off (right, if applicable). The survival data (OS or PFI) used in the plot are indicated in the X-axis title.

PanPlot produces boxplots showing the PSI distribution of the queried AS event across cancer types and GTEx tissues (SplAdder), and the distributions of read counts supporting exon splice-in or splice-out (SplAdder). In Figure (3)C, colored and black labels on the X-axis represent TCGA cancers and GTEx tissues, respectively. The upper and middle panels, labeled "Reads-In" and "Reads-Out" on the Y-axis, represent read counts supporting exon splice-in and splice-out, respectively.

CIplot visualizes significant PSI or survival differences between the two subgroups of a selected clinical indicator, similar to TN-plot and KM-plot, respectively.

If no data for the queried AS event can be displayed by a plot function, an empty plot with a warning message is shown.
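The optimal cut-off in KM-plot comes from `surv_cutpoint` in the R package survminer, which, to our understanding, relies on maximally selected rank statistics (the maxstat package) and by default requires each group to hold at least 10% of samples (`minprop = 0.1`). The Python sketch below only illustrates that idea: it scans candidate PSI cut-offs and keeps the one with the largest log-rank chi-square statistic. It is not survminer's implementation, the function names are hypothetical, and it does not correct for the many cut-offs tried.

```python
from typing import Sequence, Tuple


def logrank_chi2(times: Sequence[float], events: Sequence[int], group: Sequence[int]) -> float:
    """Log-rank chi-square statistic for two groups (group 1 vs group 0)."""
    o_minus_e, variance = 0.0, 0.0
    for t in sorted({ti for ti, ei in zip(times, events) if ei == 1}):
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n1 = sum(group[i] for i in at_risk)
        d = sum(1 for i in at_risk if times[i] == t and events[i] == 1)
        d1 = sum(1 for i in at_risk if times[i] == t and events[i] == 1 and group[i] == 1)
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e ** 2 / variance if variance > 0 else 0.0


def optimal_psi_cutoff(psi: Sequence[float], times: Sequence[float],
                       events: Sequence[int], min_prop: float = 0.1) -> Tuple[float, float]:
    """Return (cut-off, chi-square) maximizing the log-rank statistic; high group = PSI > cut-off."""
    n = len(psi)
    best_cut, best_chi2 = float("nan"), -1.0
    for cut in sorted(set(psi))[:-1]:  # the largest value would leave the high group empty
        group = [1 if p > cut else 0 for p in psi]
        n_high = sum(group)
        if min(n_high, n - n_high) < min_prop * n:  # both groups must be large enough
            continue
        chi2 = logrank_chi2(times, events, group)
        if chi2 > best_chi2:
            best_cut, best_chi2 = cut, chi2
    return best_cut, best_chi2
```

Because the cut-off is chosen to maximize the statistic, a naive log-rank p-value at that cut-off is optimistic, which is one reason to read it alongside the median-based plot.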




5. Description of columns



6. Data summary



7. Abbreviation

Cancer type   Full name
ACC           Adrenocortical Carcinoma
BLCA          Bladder Urothelial Carcinoma
BRCA          Breast Invasive Carcinoma
CESC          Cervical Squamous Cell Carcinoma
CHOL          Cholangiocarcinoma
COAD          Colon Adenocarcinoma
DLBC          Diffuse Large B-cell Lymphoma
ESCA          Esophageal Carcinoma
GBM           Glioblastoma Multiforme
HNSC          Head and Neck Squamous Cell Carcinoma
KICH          Kidney Chromophobe
KIRC          Kidney Renal Clear Cell Carcinoma
KIRP          Kidney Renal Papillary Cell Carcinoma
LAML          Acute Myeloid Leukemia
LGG           Lower Grade Glioma
LIHC          Liver Hepatocellular Carcinoma
LUAD          Lung Adenocarcinoma
LUSC          Lung Squamous Cell Carcinoma
MESO          Mesothelioma
OV            Ovarian Serous Cystadenocarcinoma
PAAD          Pancreatic Adenocarcinoma
PCPG          Pheochromocytoma and Paraganglioma
PRAD          Prostate Adenocarcinoma
READ          Rectum Adenocarcinoma
SARC          Sarcoma
SKCM          Skin Cutaneous Melanoma
STAD          Stomach Adenocarcinoma
TGCT          Testicular Germ Cell Tumors
THCA          Thyroid Carcinoma
THYM          Thymoma
UCEC          Uterine Corpus Endometrial Carcinoma
UCS           Uterine Carcinosarcoma
UVM           Uveal Melanoma


8. Reference

1. Ryan M, Wong WC, Brown R, Akbani R, Su X, Broom B, Melott J, Weinstein J. TCGASpliceSeq a compendium of alternative mRNA splicing in cancer. Nucleic Acids Res. 2016 Jan 4; 44(D1):D1018-22.


2. Kahles A, Ong CS, Zhong Y, Rätsch G. SplAdder: identification, quantification and testing of alternative splicing events from RNA-Seq data. Bioinformatics. 2016 Jun 15; 32(12):1840-7.


3. Ryan MC, Cleland J, Kim R, Wong WC, Weinstein JN. SpliceSeq: a resource for analysis and visualization of RNA-Seq data on alternative splicing and its functional impacts. Bioinformatics. 2012 Sep 15; 28(18):2385-7.


4. Kahles A, Lehmann KV, Toussaint NC, Hüser M, Stark SG, Sachsenberg T, Stegle O, Kohlbacher O, Sander C; Cancer Genome Atlas Research Network, Rätsch G. Comprehensive Analysis of Alternative Splicing Across Tumors from 8,705 Patients. Cancer Cell. 2018 Aug 13; 34(2):211-224.e6.