Hello,
I am working with the Allen SMART-seq data posted here (Mouse Whole Cortex and Hippocampus SMART-seq - brain-map.org). I’m hoping to use Seurat to integrate this SMART-seq dataset with the 10x dataset here (Mouse Whole Cortex and Hippocampus 10x - brain-map.org) for use as a reference dataset. The 10x HDF5 expression matrix file does seem to have introns and exons combined. However, although the SMART-seq HDF5 expression matrix file is also described as having introns and exons combined, the group names inside the file suggest that only exon counts are included. Is this a mistake in the description, or does the HDF5 file really contain only exon counts?
What I see in R for the SMART-seq dataset:
library(rhdf5)
h5ls("expression_matrix.hdf5")
    group         name               otype        dclass   dim
0   /             data               H5I_GROUP
1   /data         exon               H5I_GROUP
2   /data/exon    dims               H5I_DATASET  INTEGER  2
3   /data/exon    i                  H5I_DATASET  FLOAT    703397415
4   /data/exon    p                  H5I_DATASET  INTEGER  45769
5   /data/exon    x                  H5I_DATASET  FLOAT    703397415
6   /data         t_exon             H5I_GROUP
7   /data/t_exon  dims               H5I_DATASET  INTEGER  2
8   /data/t_exon  i                  H5I_DATASET  FLOAT    703397415
9   /data/t_exon  p                  H5I_DATASET  INTEGER  73364
10  /data/t_exon  x                  H5I_DATASET  FLOAT    703397415
11  /data         total_exon_counts  H5I_DATASET  FLOAT    73363
12  /             gene_names         H5I_DATASET  STRING   45768
13  /             sample_names       H5I_DATASET  STRING   73363
Thank you for any help!