GSE68086_GPL16791_raw GEO Bulk RNAseq report

Dataset information

This report has been verified by Polly as per framework v1.0.

Dataset information	Value
Dataset ID	GSE68086_GPL16791_raw
Title	RNA-Seq of Tumor-Educated Platelets Enables Blood-Based Pan-Cancer, Multiclass, and Molecular Pathway Cancer Diagnostics
Summary	We report RNA-sequencing data of 283 blood platelet samples, including 228 tumor-educated platelet (TEP) samples collected from patients with six different malignant tumors (non-small cell lung cancer, colorectal cancer, pancreatic cancer, glioblastoma, breast cancer and hepatobiliary carcinomas). In addition, we report RNA-sequencing data of blood platelets isolated from 55 healthy individuals. This dataset highlights the ability of TEP RNA-based 'liquid biopsies' in patients with several types of cancer, including the ability for pan-cancer, multiclass cancer and companion diagnostics.
Overall Design	Blood platelets were isolated from whole blood in purple-cap BD Vacutainers containing EDTA anti-coagulant by standard centrifugation. Total RNA was extracted from the platelet pellet, subjected to cDNA synthesis and SMARTer amplification, fragmented by Covaris shearing, and prepared for sequencing using the TruSeq Nano DNA Sample Preparation Kit. Subsequently, pooled sample libraries were sequenced on the Illumina HiSeq 2500 platform. All steps were quality-controlled using Bioanalyzer 2100 with RNA 6000 Picochip, DNA 7500 and DNA High Sensitivity chip measurements. For further downstream analyses, reads were quality-controlled using Trimmomatic, mapped to the human reference genome using STAR, and intron-spanning reads were summarized using HTSeq. The processed data includes 285 samples (columns) and 57736 Ensembl gene IDs (rows). The supplementary data file (TEP_data_matrix.txt) contains the intron-spanning read counts, after data summarization by HTSeq.
Number of samples	285
Publication Link	Link
Abstract	Tumor-educated blood platelets (TEPs) are implicated as central players in the systemic and local responses to tumor growth, thereby altering their RNA profile. We determined the diagnostic potential of TEPs by mRNA sequencing of 283 platelet samples. We distinguished 228 patients with localized and metastasized tumors from 55 healthy individuals with 96% accuracy. Across six different tumor types, the location of the primary tumor was correctly identified with 71% accuracy. Also, MET or HER2-positive, and mutant KRAS, EGFR, or PIK3CA tumors were accurately distinguished using surrogate TEP mRNA profiles. Our results indicate that blood platelets provide a valuable platform for pan-cancer, multiclass cancer, and companion diagnostics, possibly enabling clinical advances in blood-based "liquid biopsies".
Disease	Breast Neoplasms, Triple Negative Breast Neoplasms, Digestive System Neoplasms, Colorectal Neoplasms, Glioblastoma, Normal, Carcinoma, Non-Small-Cell Lung, Pancreatic Neoplasms
Tissue	Blood
Drug	None
Cell Lines	None
Cell Type	Platelet
Organism	Homo sapiens
Custom Curation	N/A

Processing information

This section provides processing details for the data obtained from the source.

Data Processing	SRA files are converted to FASTQ files using fasterq-dump, then quality-checked using FastQC with a short read threshold of 20. A MinION adapter search with an adapter threshold of 2 is performed on the FASTQ file(s), followed by skewer quality trimming with a minimum read length of 18 and a Phred quality threshold of 10. Kallisto quantification with a fragment length of 100 and a standard deviation of 20 is used to obtain read counts. These parameters are intended to support robust analysis and reliable interpretation of bulk RNA-seq data.
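
As a rough illustration of the quantification step only, the sketch below assembles a `kallisto quant` command line with the fragment length (100) and standard deviation (20) stated above. It assumes single-end mode (`--single`), which the report does not state explicitly; the index, FASTQ, and output paths are hypothetical placeholders, and the actual Polly pipeline invocation may differ.

```python
import shlex

def kallisto_quant_cmd(index_path, fastq_path, out_dir,
                       fragment_length=100, fragment_sd=20):
    """Build an argument list for a single-end `kallisto quant` run.

    Assumption: single-end reads, so -l (mean fragment length) and
    -s (standard deviation) are passed together with --single.
    """
    return [
        "kallisto", "quant",
        "-i", index_path,            # hypothetical transcriptome index
        "-o", out_dir,               # hypothetical output directory
        "--single",
        "-l", str(fragment_length),
        "-s", str(fragment_sd),
        fastq_path,                  # hypothetical trimmed FASTQ file
    ]

cmd = kallisto_quant_cmd("transcripts.idx", "sample_trimmed.fastq", "kallisto_out")
print(shlex.join(cmd))
```

The command is only printed here, not executed, so the sketch runs without kallisto installed.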


QUALITY ASSURANCE CONTENT

1. Metadata information

Metadata information	Value
Polly curated metadata fields are present at dataset level	Pass
Polly curated metadata fields are present at sample level	Pass
Polly curated metadata fields are present in gct file	Pass
Publication Link is provided	Pass
Publication Link is valid	Fail
Dataset-Level vs. Sample-Level Metadata: concordance check	Pass
Custom fields are present and valid	N/A

2. Feature identifier

Feature Identifier Check	Value
Ensembl Gene IDs present	Pass
Ensembl Gene IDs are valid	Pass
Gene Symbol present	Fail
Gene Symbols are valid	Fail
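
The report does not describe how identifier validity is checked. As a minimal sketch of what a format-level check could look like, the snippet below tests whether a string has the shape of a human Ensembl gene stable ID (`ENSG` followed by 11 digits, with an optional version suffix). The function name is hypothetical, and a format check alone does not confirm that an ID exists in any Ensembl release.

```python
import re

# Human Ensembl gene stable ID: "ENSG" + 11 digits, optional ".version"
ENSG_PATTERN = re.compile(r"ENSG\d{11}(\.\d+)?")

def looks_like_ensembl_gene_id(identifier):
    """Return True if the string matches the human Ensembl gene ID format."""
    return ENSG_PATTERN.fullmatch(identifier) is not None

examples = ["ENSG00000141510", "ENSG00000141510.17", "TP53", "ENSG123"]
results = {x: looks_like_ensembl_gene_id(x) for x in examples}
print(results)
```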

3. Data Matrix

Data Matrix	Value
Data Matrix Values Valid	Pass
Data Matrix Range	0.00 to 5168550.00
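
The criteria behind "Data Matrix Values Valid" are not spelled out in the report. One plausible reading, sketched below under that assumption, is that every value must be a finite, non-negative number; the same pass over the matrix also yields the reported range. The toy matrix is illustrative only and is not taken from the dataset.

```python
import math

def validate_and_range(matrix):
    """Check that all values are finite and non-negative; return (valid, min, max).

    Assumption: this validity rule is a guess at what such a check covers.
    """
    lowest, highest = math.inf, -math.inf
    valid = True
    for row in matrix:
        for value in row:
            if math.isnan(value) or math.isinf(value) or value < 0:
                valid = False
                continue
            lowest = min(lowest, value)
            highest = max(highest, value)
    return valid, lowest, highest

toy_matrix = [[0.0, 12.5], [5168550.0, 3.0]]  # toy values, not real data
valid, lowest, highest = validate_and_range(toy_matrix)
print(f"valid={valid}, range={lowest:.2f} to {highest:.2f}")
```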

4. Histogram for expression distribution

Figure 1: Histogram showing frequency and distribution of TPM normalised expression values across all samples.

The histogram displays the distribution of values from the counts matrix. The raw count values are TPM-normalized and log2(x+1)-transformed for clarity.
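
The transformation described above can be sketched as follows for a single sample. This is a minimal illustration, assuming per-gene lengths in kilobases are available (the report does not say which lengths are used); the counts and lengths are made-up toy values.

```python
import math

def counts_to_tpm(counts, lengths_kb):
    """Convert one sample's raw counts to TPM using gene lengths in kilobases."""
    rates = [c / length for c, length in zip(counts, lengths_kb)]
    total = sum(rates)
    if total == 0:
        return [0.0] * len(counts)
    # Scale length-normalized rates so they sum to one million
    return [rate / total * 1e6 for rate in rates]

def log2_plus_one(values):
    """Apply the log2(x + 1) transform; zeros stay at zero."""
    return [math.log2(v + 1.0) for v in values]

counts = [100, 0, 300]        # toy raw counts for three genes
lengths_kb = [1.0, 2.0, 3.0]  # hypothetical gene lengths in kb
tpm_values = counts_to_tpm(counts, lengths_kb)
log_values = log2_plus_one(tpm_values)
print(tpm_values, log_values)
```

The pseudocount of 1 keeps zero-expression genes defined under the logarithm, which is why the histogram can include them.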

5. Sample-wise distribution of expression values using a boxplot.

Figure 2: Boxplot showing TPM expression values across all samples.

The boxplot displays the sample-wise distribution of values from the counts matrix. The raw count values are TPM-normalized and log2(x+1)-transformed for clarity.
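
For readers who want the numbers behind such a boxplot, the sketch below computes a five-number summary (minimum, first quartile, median, third quartile, maximum) for each sample. The sample IDs and values are toy placeholders, and the quartile method ("inclusive") is an assumption; plotting libraries may use slightly different conventions.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for one sample's expression values."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))

# Toy log2(TPM + 1) values per sample (hypothetical sample IDs)
samples = {
    "sample_A": [0.0, 1.0, 2.0, 3.0, 4.0],
    "sample_B": [0.0, 0.0, 0.0, 2.0, 8.0],
}
summaries = {name: five_number_summary(vals) for name, vals in samples.items()}
for name, summary in summaries.items():
    print(name, summary)
```

Samples whose medians or interquartile ranges sit far from the rest are the ones worth inspecting first.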

6. Sample-wise distribution of the number of genes with zero expression using a barplot.

Figure 3: Barplot showing the distribution of the number of genes with expression value equal to 0 per sample.

This barplot helps identify samples with an unusually large number of genes that have zero expression, which may indicate low mapping of reads to the genome.
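
A minimal sketch of the underlying count: for each sample, tally the genes whose value is exactly 0, then flag samples whose zero fraction exceeds a cutoff. The cutoff of 0.75 and the toy matrix are hypothetical; the report does not define a threshold.

```python
def zero_genes_per_sample(matrix, sample_ids):
    """Count genes with value 0 per sample; matrix rows are genes, columns are samples."""
    zeros = {sample: 0 for sample in sample_ids}
    for row in matrix:
        for sample, value in zip(sample_ids, row):
            if value == 0:
                zeros[sample] += 1
    return zeros

def flag_high_zero_samples(zeros, n_genes, max_zero_fraction=0.75):
    """Return samples whose fraction of zero-valued genes exceeds the (hypothetical) cutoff."""
    return [s for s, z in zeros.items() if z / n_genes > max_zero_fraction]

sample_ids = ["S1", "S2", "S3"]   # hypothetical sample IDs
matrix = [
    [5, 0, 0],
    [0, 0, 3],
    [2, 0, 1],
    [7, 0, 0],
]
zeros = zero_genes_per_sample(matrix, sample_ids)
flagged = flag_high_zero_samples(zeros, n_genes=len(matrix))
print(zeros, flagged)
```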

DATA EXPLORATION CONTENT

1. Polly's curated metadata field distribution

Figure 1: The UMAP plot(s) represent different samples in a reduced dimensional space, with colors indicating the Polly standard and custom curated fields.

The plot(s) aid in understanding the biological differences between samples as described by the different metadata fields. Note: a UMAP plot of the raw counts will not reflect the true distribution, because the data require normalisation.

Figure 2: The sunburst plot(s) represent counts of different samples, with colors representing values from the Polly standard and custom curated fields.

The plot(s) aid in understanding the distribution of samples across the categorical metadata variables of the Polly standard curated fields.
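
The counts behind a sunburst plot can be sketched as a tally over nested metadata paths: each ring of the plot corresponds to one more field in the path. The field names (`disease`, `cell_type`) and the three sample records below are hypothetical stand-ins, not the actual curated metadata.

```python
from collections import Counter

def sunburst_counts(samples, fields):
    """Count samples for every prefix of the metadata path defined by `fields`."""
    counts = Counter()
    for sample in samples:
        path = tuple(sample.get(field, "N/A") for field in fields)
        # Each prefix of the path is one sector in one ring of the sunburst
        for depth in range(1, len(path) + 1):
            counts[path[:depth]] += 1
    return counts

# Toy sample-level metadata (hypothetical records)
samples = [
    {"disease": "Glioblastoma", "cell_type": "Platelet"},
    {"disease": "Glioblastoma", "cell_type": "Platelet"},
    {"disease": "Normal", "cell_type": "Platelet"},
]
counts = sunburst_counts(samples, ["disease", "cell_type"])
for path, n in sorted(counts.items()):
    print(" > ".join(path), n)
```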

2. Source metadata field distribution

Figure 3: The UMAP plot(s) represent different samples in a reduced dimensional space, with colors indicating the source metadata fields.

Figure 4: The sunburst plot represents counts of different samples, with colors representing values from the source.

The plot(s) aid in understanding the distribution of samples across the categorical metadata variables of the source fields.