Dataset information
This report has been verified by Polly as per framework v1.0.
| Dataset information | Value |
|---|---|
| Dataset ID | GSE68086_GPL16791_raw |
| Title | RNA-Seq of Tumor-Educated Platelets Enables Blood-Based Pan-Cancer, Multiclass, and Molecular Pathway Cancer Diagnostics |
| Summary | We report RNA-sequencing data of 283 blood platelet samples, including 228 tumor-educated platelet (TEP) samples collected from patients with six different malignant tumors (non-small cell lung cancer, colorectal cancer, pancreatic cancer, glioblastoma, breast cancer and hepatobiliary carcinomas). In addition, we report RNA-sequencing data of blood platelets isolated from 55 healthy individuals. This dataset highlights the potential of TEP RNA-based 'liquid biopsies' in patients with several types of cancer, including for pan-cancer, multiclass cancer and companion diagnostics. |
| Overall Design | Blood platelets were isolated from whole blood in purple-cap BD Vacutainers containing EDTA anti-coagulant by standard centrifugation. Total RNA was extracted from the platelet pellet, subjected to cDNA synthesis and SMARTer amplification, fragmented by Covaris shearing, and prepared for sequencing using the TruSeq Nano DNA Sample Preparation Kit. Pooled sample libraries were then sequenced on the Illumina HiSeq 2500 platform. All steps were quality-controlled using Bioanalyzer 2100 measurements with RNA 6000 Pico, DNA 7500 and DNA High Sensitivity chips. For further downstream analyses, reads were quality-controlled using Trimmomatic, mapped to the human reference genome using STAR, and intron-spanning reads were summarized using HTSeq. The processed data include 285 samples (columns) and 57736 Ensembl gene IDs (rows). The supplementary data file (TEP_data_matrix.txt) contains the intron-spanning read counts after data summarization by HTSeq. |
| Number of samples | 285 |
| Publication Link | Link |
| Abstract | Tumor-educated blood platelets (TEPs) are implicated as central players in the systemic and local responses to tumor growth, thereby altering their RNA profile. We determined the diagnostic potential of TEPs by mRNA sequencing of 283 platelet samples. We distinguished 228 patients with localized and metastasized tumors from 55 healthy individuals with 96% accuracy. Across six different tumor types, the location of the primary tumor was correctly identified with 71% accuracy. Also, MET or HER2-positive, and mutant KRAS, EGFR, or PIK3CA tumors were accurately distinguished using surrogate TEP mRNA profiles. Our results indicate that blood platelets provide a valuable platform for pan-cancer, multiclass cancer, and companion diagnostics, possibly enabling clinical advances in blood-based "liquid biopsies". |
| Disease | Breast Neoplasms, Triple Negative Breast Neoplasms, Digestive System Neoplasms, Colorectal Neoplasms, Glioblastoma, Normal, Carcinoma, Non-Small-Cell Lung, Pancreatic Neoplasms |
| Tissue | Blood |
| Drug | None |
| Cell Lines | None |
| Cell Type | Platelet |
| Organism | Homo sapiens |
| Custom Curation | N/A |
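The Overall Design describes the processed data as a gene-by-sample matrix of HTSeq read counts. A minimal sketch for loading such a matrix follows, assuming (hypothetically) that the file is tab-delimited, that its first row holds sample names, and that its first column holds gene IDs; the real layout of TEP_data_matrix.txt may differ. `read_count_matrix` is an illustrative helper, not part of Polly or HTSeq.

```python
import csv
import io
from typing import List, TextIO, Tuple


def read_count_matrix(handle: TextIO) -> Tuple[List[str], List[str], List[List[int]]]:
    """Parse a tab-delimited gene-by-sample count matrix.

    Assumed layout (hypothetical, not confirmed for TEP_data_matrix.txt):
    the first row holds sample names, the first column holds gene IDs,
    and every other cell is an integer read count.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    rows = [row for row in reader if row]  # skip blank lines
    # R's write.table() omits the top-left cell, so the header can be one field short.
    if rows and len(rows[0]) == len(header) + 1:
        samples = header
    else:
        samples = header[1:]
    genes: List[str] = []
    counts: List[List[int]] = []
    for row in rows:
        values = [int(v) for v in row[1:]]
        if len(values) != len(samples):
            raise ValueError(f"{row[0]}: {len(values)} values, expected {len(samples)}")
        genes.append(row[0])
        counts.append(values)
    return genes, samples, counts


demo = "gene_id\tS1\tS2\nENSG00000000003\t5\t0\nENSG00000000005\t0\t2\n"
genes, samples, counts = read_count_matrix(io.StringIO(demo))
print(len(genes), "genes x", len(samples), "samples")  # 2 genes x 2 samples
```

For the full file, the description above reports 57736 rows and 285 sample columns, which is a quick sanity check on the parse. If the matrix was assembled from htseq-count output, summary rows such as `__no_feature` or `__ambiguous` may also be present; they are not genes and would need to be filtered out.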
Processing information
This section describes how the data obtained from the source were processed.
| Processing information | Value |
|---|---|
| Data Processing | SRA files are converted to FASTQ files using fasterq-dump and quality-checked using FastQC with a short-read threshold of 20. A MinION adapter search (adapter threshold 2) is performed on the FASTQ file(s), followed by quality trimming with skewer (minimum read length 18, Phred quality threshold 10). Read counts are obtained by kallisto quantification with an average fragment length of 100 and a standard deviation of 20. These parameters are intended to support robust analysis and reliable interpretation of bulk RNA-seq data. |
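The trimming parameters above (Phred quality threshold 10, minimum read length 18) can be illustrated with a simplified sketch. It assumes Phred+33 quality encoding and uses a plain 3'-end trim: trailing bases scoring below 10 are removed, then reads shorter than 18 bases are discarded. skewer's actual algorithm is more elaborate, and `trim_read` is a hypothetical helper rather than part of any tool.

```python
from typing import Optional, Tuple

PHRED_OFFSET = 33  # assumption: Phred+33 (Sanger / Illumina 1.8+) quality encoding


def trim_read(seq: str, qual: str, min_quality: int = 10,
              min_length: int = 18) -> Optional[Tuple[str, str]]:
    """Simplified 3'-end quality trimming followed by a minimum-length filter.

    Illustration only: skewer's own trimming algorithm differs.
    Returns the trimmed (sequence, quality) pair, or None if the read is too short.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    end = len(seq)
    # Drop bases from the 3' end while their Phred score is below the threshold.
    while end > 0 and ord(qual[end - 1]) - PHRED_OFFSET < min_quality:
        end -= 1
    if end < min_length:
        return None  # discarded: shorter than the minimum read length
    return seq[:end], qual[:end]


# In Phred+33, 'I' encodes Q40 and '#' encodes Q2, so the last two bases are trimmed.
print(trim_read("A" * 20 + "CG", "I" * 20 + "##"))
```

A base scoring exactly 10 (encoded as `+`) is kept, because the loop only trims bases strictly below the threshold.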
1. Metadata information
| Metadata information | Value |
|---|---|
| Polly curated metadata fields are present at dataset level | Pass |
| Polly curated metadata fields are present at sample level | Pass |
| Polly curated metadata fields are present in gct file | Pass |
| Publication Link is provided | Pass |
| Publication Link is valid | Fail |
| Dataset-Level vs. Sample-Level Metadata: concordance check | Pass |
| Custom fields are present and valid | N/A |
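The exact rule behind the dataset-level vs. sample-level concordance check is not described here. One plausible interpretation, sketched below as an assumption, is that every value declared at dataset level (for example in the Disease field) should occur in at least one sample, and every sample-level value should be declared at dataset level. `concordance_report` and `is_concordant` are hypothetical helpers.

```python
from typing import Dict, Iterable, Set


def concordance_report(dataset_values: Iterable[str],
                       sample_values: Iterable[str]) -> Dict[str, Set[str]]:
    """Compare a dataset-level annotation with the values found across samples.

    Hypothetical rule: every dataset-level value should occur in at least one
    sample, and every sample-level value should be declared at dataset level.
    """
    declared = set(dataset_values)
    observed = set(sample_values)
    return {
        "declared_but_unused": declared - observed,
        "used_but_undeclared": observed - declared,
    }


def is_concordant(dataset_values: Iterable[str], sample_values: Iterable[str]) -> bool:
    """True when neither direction of the comparison reports a mismatch."""
    report = concordance_report(dataset_values, sample_values)
    return not any(report.values())


dataset_disease = ["Glioblastoma", "Normal", "Pancreatic Neoplasms"]
sample_disease = ["Glioblastoma", "Normal", "Normal", "Pancreatic Neoplasms"]
print(is_concordant(dataset_disease, sample_disease))  # True
```

Comparing sets rather than lists means repeated sample values (many "Normal" samples) do not affect the result, which matches the idea of checking vocabulary agreement rather than sample counts.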
2. Feature identifier
| Feature Identifier Check | Value |
|---|---|
| Ensembl gene IDs are present | Pass |
| Ensembl gene IDs are valid | Pass |
| Gene symbols are present | Fail |
| Gene symbols are valid | Fail |
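A format-level check for human Ensembl gene IDs can be sketched with a regular expression. This is an assumption about what such a check might involve: it only verifies the `ENSG` prefix, the 11 digits and an optional version suffix, not whether the ID exists in a given Ensembl release (that requires a lookup against release annotations). The example also shows why a gene symbol such as TP53 fails an ID check. `invalid_gene_ids` is a hypothetical helper.

```python
import re
from typing import Iterable, List

# Human Ensembl gene IDs: "ENSG" followed by 11 digits, with an optional
# version suffix such as ".17". Simplified: GENCODE "_PAR_Y" IDs would be rejected.
HUMAN_ENSEMBL_GENE_ID = re.compile(r"ENSG\d{11}(?:\.\d+)?")


def invalid_gene_ids(ids: Iterable[str]) -> List[str]:
    """Return the identifiers that do not match the human Ensembl gene ID pattern."""
    return [gene_id for gene_id in ids if not HUMAN_ENSEMBL_GENE_ID.fullmatch(gene_id)]


print(invalid_gene_ids(["ENSG00000141510", "ENSG00000141510.17", "TP53"]))  # ['TP53']
```

Using `fullmatch` rather than `match` ensures trailing characters after a valid-looking prefix are rejected.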
3. Data Matrix
| Data Matrix | Value |
|---|---|
| Data matrix values are valid | Pass |
| Data matrix range | 0.00 to 5168550.00 |
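How Polly defines a "valid" value is not stated. A minimal sketch, assuming that valid means finite and non-negative (appropriate for count data), checks every cell and reports the range in the same format as the table. Floats are allowed because kallisto's estimated counts can be fractional. `validate_and_range` is a hypothetical helper.

```python
import math
from typing import Sequence, Tuple


def validate_and_range(matrix: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Check that every value is a finite, non-negative number and return (min, max).

    Assumption: "valid" means finite and non-negative, which suits count data.
    """
    lo, hi = math.inf, -math.inf
    for r, row in enumerate(matrix):
        for c, value in enumerate(row):
            if not math.isfinite(value) or value < 0:
                raise ValueError(f"invalid value {value!r} at row {r}, column {c}")
            lo, hi = min(lo, value), max(hi, value)
    if lo > hi:
        raise ValueError("matrix is empty")
    return lo, hi


lo, hi = validate_and_range([[0, 12], [3.5, 5168550]])
print(f"{lo:.2f} to {hi:.2f}")  # 0.00 to 5168550.00
```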
4. Histogram for expression distribution
Figure 1: Histogram showing frequency and distribution of TPM normalised expression values across all samples.
| The histogram displays the distribution of values in the counts matrix. Raw counts are TPM-normalized and log2(x+1)-transformed for visualization. |
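The TPM normalization and log2(x+1) transform mentioned above can be sketched for a single sample. TPM divides each count by its feature length and rescales the resulting rates so they sum to one million; the +1 keeps zero values defined under the logarithm. The feature lengths here are hypothetical (kallisto itself reports TPM using effective transcript lengths), and `tpm` and `log2_plus_one` are illustrative helpers.

```python
import math
from typing import List, Sequence


def tpm(counts: Sequence[float], lengths: Sequence[float]) -> List[float]:
    """Transcripts per million for a single sample.

    Each count is divided by its feature length, and the resulting rates are
    rescaled so that they sum to one million.
    """
    if len(counts) != len(lengths):
        raise ValueError("counts and lengths must have the same length")
    rates = [c / length for c, length in zip(counts, lengths)]
    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)  # a sample with no reads stays all-zero
    return [rate / total * 1e6 for rate in rates]


def log2_plus_one(values: Sequence[float]) -> List[float]:
    """The log2(x + 1) transform; the +1 keeps zero values defined."""
    return [math.log2(v + 1) for v in values]


# Hypothetical counts and feature lengths (in bases) for three genes in one sample.
sample_tpm = tpm([10, 20, 0], [1000, 2000, 500])
print(sample_tpm)  # [500000.0, 500000.0, 0.0]
print(log2_plus_one(sample_tpm))
```

Because the rates are rescaled to a fixed total, the length unit (bases or kilobases) cancels out and does not change the result.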
5. Sample-wise distribution of expression values using a boxplot.
Figure 2: Boxplot showing TPM expression values across all samples.
| The boxplot displays the distribution of values in the counts matrix for each sample. Raw counts are TPM-normalized and log2(x+1)-transformed for visualization. |
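Each box in a sample-wise boxplot summarizes one sample's transformed values. A minimal sketch of that per-sample summary uses the standard library's `statistics.quantiles`; the choice of the "inclusive" method is an assumption, and plotting libraries may compute quartiles and whiskers differently. `box_summary` is a hypothetical helper.

```python
import statistics
from typing import Dict, Sequence


def box_summary(values: Sequence[float]) -> Dict[str, float]:
    """Five-number summary for one sample's log2(TPM + 1) values.

    Uses statistics.quantiles with method="inclusive", which treats the minimum
    and maximum as the 0th and 100th percentiles. Pass at least two values.
    Whisker rules (e.g. 1.5 x IQR) vary between plotting libraries and are omitted.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median, "q3": q3, "max": max(values)}


print(box_summary([5.0, 1.0, 4.0, 2.0, 3.0]))
# {'min': 1.0, 'q1': 2.0, 'median': 3.0, 'q3': 4.0, 'max': 5.0}
```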
6. Sample-wise distribution of the number of expressed genes using a barplot.
Figure 3: Barplot showing the number of genes with an expression value equal to 0 in each sample.
| This barplot helps identify samples with an unusually high number of genes with zero or low expression, which may indicate poor mapping of reads to the genome. |
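The quantity behind this barplot can be sketched as a per-sample count of genes whose value is exactly 0. The optional flagging step uses a hypothetical zero-fraction cutoff (default 0.9) chosen only for illustration; the report does not state any threshold. `zero_counts_per_sample` and `flag_high_zero_samples` are hypothetical helpers.

```python
from typing import List, Sequence


def zero_counts_per_sample(matrix: Sequence[Sequence[float]]) -> List[int]:
    """For a genes x samples matrix, count the genes with value exactly 0 in each sample."""
    if not matrix:
        return []
    zeros = [0] * len(matrix[0])
    for row in matrix:
        for j, value in enumerate(row):
            if value == 0:
                zeros[j] += 1
    return zeros


def flag_high_zero_samples(zeros: Sequence[int], n_genes: int,
                           max_fraction: float = 0.9) -> List[int]:
    """Return indices of samples whose zero fraction exceeds a hypothetical cutoff."""
    return [j for j, z in enumerate(zeros) if z / n_genes > max_fraction]


matrix = [[0, 4, 0], [0, 0, 7], [3, 1, 0], [0, 2, 5]]  # 4 genes x 3 samples
zeros = zero_counts_per_sample(matrix)
print(zeros)  # [3, 1, 2]
print(flag_high_zero_samples(zeros, n_genes=len(matrix), max_fraction=0.6))  # [0]
```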
1. Polly's curated metadata field distribution
Figure 1: The UMAP plot(s) represent different samples in a reduced-dimensional space, with colors indicating the Polly standard and custom curated fields.
| The plot(s) help in understanding the biological differences between samples as described by different metadata fields. Note: UMAP plots of raw counts will not reflect the correct distribution, because the data require normalisation. |
Figure 2: The sunburst plot(s) show the number of samples in each category, with colors representing values from the Polly standard and custom curated fields.
| The plot(s) help in understanding how samples are distributed across the categorical metadata variables of the Polly standard curated fields. |
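A sunburst plot nests categorical fields in rings, and each segment's size is the number of samples under that path. A minimal sketch of the underlying counts follows; the metadata records and field names are hypothetical, and missing values are grouped as "unknown" by assumption. `sunburst_counts` is an illustrative helper, not a Polly function.

```python
from collections import Counter
from typing import Mapping, Sequence, Tuple


def sunburst_counts(samples: Sequence[Mapping[str, str]],
                    fields: Sequence[str]) -> "Counter[Tuple[str, ...]]":
    """Count samples at every level of a field hierarchy (inner ring first).

    Each key is a path such as ("Blood",) or ("Blood", "Glioblastoma"); its value
    is the number of samples under that segment. Missing values become "unknown".
    """
    counts: Counter = Counter()
    for sample in samples:
        path = tuple(sample.get(field, "unknown") for field in fields)
        for depth in range(1, len(path) + 1):
            counts[path[:depth]] += 1
    return counts


# Hypothetical sample-level metadata records.
samples = [
    {"tissue": "Blood", "disease": "Glioblastoma"},
    {"tissue": "Blood", "disease": "Normal"},
    {"tissue": "Blood", "disease": "Glioblastoma"},
]
for path, n in sorted(sunburst_counts(samples, ["tissue", "disease"]).items()):
    print(" > ".join(path), n)
```

Counting every prefix of the path gives both the inner-ring totals and the outer-ring subdivisions in one pass, which is the shape of input most sunburst plotting tools expect.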
2. Source metadata field distribution
Figure 3: The UMAP plot(s) represent different samples in a reduced-dimensional space, with colors indicating the source metadata fields.
| The plot(s) help in understanding the biological differences between samples as described by different metadata fields. Note: UMAP plots of raw counts will not reflect the correct distribution, because the data require normalisation. |
Figure 4: The sunburst plot shows the number of samples in each category, with colors representing values from the source.
| The plot(s) help in understanding how samples are distributed across the categorical metadata variables of the source fields. |