Dataset information
This report has been verified by Polly as per framework v1.0.
| Dataset information | Value |
|---|---|
| Dataset ID | GSE68086_GPL16791_raw |
| Title | RNA-Seq of Tumor-Educated Platelets Enables Blood-Based Pan-Cancer, Multiclass, and Molecular Pathway Cancer Diagnostics |
| Summary | We report RNA-sequencing data of 283 blood platelet samples, including 228 tumor-educated platelet (TEP) samples collected from patients with six different malignant tumors (non-small cell lung cancer, colorectal cancer, pancreatic cancer, glioblastoma, breast cancer and hepatobiliary carcinomas). In addition, we report RNA-sequencing data of blood platelets isolated from 55 healthy individuals. This dataset highlights the potential of TEP RNA-based 'liquid biopsies' in patients with several types of cancer, including for pan-cancer, multiclass cancer and companion diagnostics. |
| Overall Design | Blood platelets were isolated by standard centrifugation from whole blood collected in purple-cap BD Vacutainers containing EDTA anticoagulant. Total RNA was extracted from the platelet pellet, subjected to cDNA synthesis and SMARTer amplification, fragmented by Covaris shearing, and prepared for sequencing using the TruSeq Nano DNA Sample Preparation Kit. Pooled sample libraries were then sequenced on the Illumina HiSeq 2500 platform. All steps were quality-controlled with Bioanalyzer 2100 measurements using RNA 6000 Pico, DNA 7500 and DNA High Sensitivity chips. For downstream analyses, reads were quality-controlled using Trimmomatic, mapped to the human reference genome using STAR, and intron-spanning reads were summarized using HTSeq. The processed data include 285 samples (columns) and 57736 Ensembl gene IDs (rows). The supplementary data file (TEP_data_matrix.txt) contains the intron-spanning read counts after summarization by HTSeq. |
| Number of samples | 285 |
| Publication Link | Link |
| Abstract | Tumor-educated blood platelets (TEPs) are implicated as central players in the systemic and local responses to tumor growth, thereby altering their RNA profile. We determined the diagnostic potential of TEPs by mRNA sequencing of 283 platelet samples. We distinguished 228 patients with localized and metastasized tumors from 55 healthy individuals with 96% accuracy. Across six different tumor types, the location of the primary tumor was correctly identified with 71% accuracy. Also, MET or HER2-positive, and mutant KRAS, EGFR, or PIK3CA tumors were accurately distinguished using surrogate TEP mRNA profiles. Our results indicate that blood platelets provide a valuable platform for pan-cancer, multiclass cancer, and companion diagnostics, possibly enabling clinical advances in blood-based "liquid biopsies". |
| Disease | Breast Neoplasms, Triple Negative Breast Neoplasms, Digestive System Neoplasms, Colorectal Neoplasms, Glioblastoma, Normal, Carcinoma, Non-Small-Cell Lung, Pancreatic Neoplasms |
| Tissue | Blood |
| Drug | None |
| Cell Lines | None |
| Cell Type | Platelet |
| Organism | Homo sapiens |
| Custom Curation | N/A |
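The Overall Design entry describes the supplementary file `TEP_data_matrix.txt` as a matrix of HTSeq read counts with genes in rows and samples in columns. As a minimal sketch for loading such a file, the code below assumes (the source does not specify this) that it is tab-delimited, that the first row holds sample names, and that each following row starts with an Ensembl gene ID followed by one count per sample. `load_count_matrix` is a hypothetical helper, not part of Polly or GEO tooling.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple


def load_count_matrix(handle: TextIO) -> Tuple[List[str], Dict[str, List[int]]]:
    """Parse a tab-delimited gene-by-sample count matrix (assumed layout).

    The header row may or may not include a leading cell for the gene-ID
    column (R's write.table, for example, omits it by default); both are accepted.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples: List[str] = []
    counts: Dict[str, List[int]] = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        if not samples:
            # Decide once, from the first data row, whether the header
            # has a cell for the gene-ID column.
            samples = header if len(header) == len(row) - 1 else header[1:]
        if len(row) - 1 != len(samples):
            raise ValueError(
                f"row {row[0]} has {len(row) - 1} values, expected {len(samples)}"
            )
        # Counts may be written as "5" or "5.0"; store them as integers.
        counts[row[0]] = [int(float(v)) for v in row[1:]]
    return samples, counts


demo = "gene\tS1\tS2\nENSG00000000003\t5\t0\nENSG00000000005\t1\t7\n"
samples, counts = load_count_matrix(io.StringIO(demo))
print(len(counts), "genes x", len(samples), "samples")  # 2 genes x 2 samples
```

If the layout assumptions hold for the real file, the result should match the dimensions stated above (57736 genes by 285 samples).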
Processing information
This section describes how the data obtained from the source were processed.
| Processing information | Value |
|---|---|
| Data Processing | SRA files are converted to FASTQ files using fasterq-dump, then quality-checked using FastQC with a short-read threshold of 20. MinION adapter search with an adapter threshold of 2 is performed on the FASTQ file(s), followed by skewer quality trimming with a minimum read length of 18 and a Phred quality threshold of 10. Read counts are obtained by kallisto quantification with a fragment length of 100 and a standard deviation of 20. These parameters are intended to support robust analysis and reliable interpretation of bulk RNA-seq data. |
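To make the two skewer thresholds concrete, the sketch below shows a simplified 3'-end quality trim: bases are removed from the 3' end while their Phred score is below 10, and the read is discarded if fewer than 18 bases remain. It illustrates the parameters only and is not skewer's own implementation; the Phred+33 quality encoding and the name `trim_read` are assumptions.

```python
from typing import Optional, Tuple

PHRED_OFFSET = 33       # assumed Sanger / Illumina 1.8+ quality encoding
QUALITY_THRESHOLD = 10  # Phred quality threshold from the processing notes
MIN_READ_LENGTH = 18    # minimum read length from the processing notes


def trim_read(seq: str, qual: str,
              threshold: int = QUALITY_THRESHOLD,
              min_len: int = MIN_READ_LENGTH) -> Optional[Tuple[str, str]]:
    """Trim low-quality 3' bases; return None if the read becomes too short."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    end = len(seq)
    # Walk back from the 3' end until a base meets the quality threshold.
    while end > 0 and ord(qual[end - 1]) - PHRED_OFFSET < threshold:
        end -= 1
    if end < min_len:
        return None  # read is discarded
    return seq[:end], qual[:end]


read = "ACGT" * 5            # 20 bases
qual = "I" * 18 + "##"       # 'I' is Phred 40, '#' is Phred 2
print(trim_read(read, qual))  # ('ACGTACGTACGTACGTAC', 'IIIIIIIIIIIIIIIIII')
```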
QUALITY ASSURANCE CONTENT
1. Metadata information
| Metadata information | Value |
|---|---|
| Polly-curated metadata fields are present at dataset level | Pass |
| Polly-curated metadata fields are present at sample level | Pass |
| Polly-curated metadata fields are present in GCT file | Pass |
| Publication link is provided | Pass |
| Publication link is valid | Fail |
| Dataset-level vs. sample-level metadata: concordance check | Pass |
| Custom fields are present and valid | N/A |
2. Feature identifier
| Feature Identifier Check | Value |
|---|---|
| Ensembl gene IDs are present | Pass |
| Ensembl gene IDs are valid | Pass |
| Gene symbols are present | Fail |
| Gene symbols are valid | Fail |
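A format-level version of the Ensembl ID check can be sketched with a regular expression: human Ensembl gene stable IDs consist of `ENSG` followed by 11 digits, optionally with a `.version` suffix. This only checks the format; whether Polly also confirms that each ID exists in a particular Ensembl release is not stated, and `invalid_gene_ids` is an illustrative name.

```python
import re
from typing import Iterable, List

# Human Ensembl gene stable ID: "ENSG", 11 digits, optional ".<version>".
ENSEMBL_HUMAN_GENE = re.compile(r"ENSG\d{11}(?:\.\d+)?")


def invalid_gene_ids(ids: Iterable[str]) -> List[str]:
    """Return the IDs that do not match the human Ensembl gene ID format."""
    return [gene_id for gene_id in ids if not ENSEMBL_HUMAN_GENE.fullmatch(gene_id)]


print(invalid_gene_ids(["ENSG00000141510", "ENSG00000141510.16", "TP53"]))  # ['TP53']
```

Under this check, a matrix whose rows carry only Ensembl IDs passes the ID checks while failing the gene-symbol checks, which is consistent with the results in the table above.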
3. Data Matrix
| Data Matrix | Value |
|---|---|
| Data matrix values are valid | Pass |
| Data matrix range | 0.00 to 5168550.00 |
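The report does not define what makes a matrix value "valid". Assuming it means every entry is a finite, non-negative number (as expected for read counts), the validity flag and the reported range could be computed together in a single pass. `check_matrix` is a hypothetical helper, and the numbers in the usage line are toy values.

```python
import math
from typing import Iterable, Sequence, Tuple


def check_matrix(rows: Iterable[Sequence[float]]) -> Tuple[bool, float, float]:
    """Return (all_valid, minimum, maximum) over the valid values in the matrix."""
    valid = True
    lo, hi = math.inf, -math.inf
    for row in rows:
        for value in row:
            if not math.isfinite(value) or value < 0:
                valid = False  # NaN, infinity or a negative count
                continue
            lo = min(lo, value)
            hi = max(hi, value)
    return valid, lo, hi


valid, lo, hi = check_matrix([[0, 12], [340, 3]])
print(f"{valid}: {lo:.2f} to {hi:.2f}")  # True: 0.00 to 340.00
```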
4. Histogram of expression distribution
Figure 1: Histogram showing the frequency and distribution of TPM-normalized expression values across all samples.
The histogram displays the distribution of values in the counts matrix. Raw counts are TPM-normalized and log2(x + 1)-transformed for visualization.
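TPM normalization divides each gene's count by its length in kilobases and then rescales each sample so that its values sum to one million; the log2(x + 1) transform is applied afterwards for plotting. Gene lengths are not part of the count matrix, so the sketch below assumes they are supplied separately in bases. `tpm` and `log2p1` are illustrative names, not Polly functions.

```python
import math
from typing import List, Sequence


def tpm(counts: Sequence[float], lengths_bp: Sequence[float]) -> List[float]:
    """Convert one sample's raw counts to transcripts per million (TPM)."""
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths must cover the same genes")
    # Reads per kilobase: normalize each count by gene length first.
    rpk = [c / (length / 1000.0) for c, length in zip(counts, lengths_bp)]
    total = sum(rpk)
    if total == 0:
        return [0.0] * len(rpk)  # sample with no reads: keep all zeros
    # Then scale so the sample's values sum to one million.
    return [r / total * 1e6 for r in rpk]


def log2p1(values: Sequence[float]) -> List[float]:
    """Apply the log2(x + 1) transform used for plotting."""
    return [math.log2(v + 1.0) for v in values]


print(tpm([100, 0, 300], [1000, 500, 3000]))  # [500000.0, 0.0, 500000.0]
```

Dividing by length before rescaling is what distinguishes TPM from counts per million: the first and third genes above receive equal TPM because their counts are proportional to their lengths.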
5. Sample-wise distribution of expression values (boxplot)
Figure 2: Boxplot showing the distribution of TPM-normalized expression values for each sample.
The boxplot displays the sample-wise distribution of values in the counts matrix. Raw counts are TPM-normalized and log2(x + 1)-transformed for visualization.
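Each box in the sample-wise boxplot summarizes one sample's transformed expression values. The sketch below computes those summaries with the standard library. The report does not state which plotting tool or whisker rule it uses, so this assumes quartiles by linear interpolation (`method="inclusive"`, which matches NumPy's default percentile and therefore Matplotlib's boxplot) and Tukey's 1.5 × IQR whisker rule. `box_stats` is a hypothetical name, and each sample needs at least two values.

```python
import statistics
from typing import Dict, Sequence


def box_stats(values: Sequence[float], whis: float = 1.5) -> Dict[str, float]:
    """Quartiles and Tukey whisker ends for one sample's expression values."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    # Whiskers reach the most extreme data points inside the fences;
    # anything beyond them would be drawn as an outlier.
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
    }


print(box_stats([0.0, 1.0, 2.0, 3.0, 100.0]))
# {'q1': 1.0, 'median': 2.0, 'q3': 3.0, 'whisker_low': 0.0, 'whisker_high': 3.0}
```

In the toy input, 100.0 lies beyond the upper fence (3.0 + 1.5 × 2.0 = 6.0), so the upper whisker stops at 3.0.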