Hello, thank you for creating and maintaining this excellent resource. I am following up on this thread (insert). I would like to analyze these data for a specific pair of isoforms (PPARG1 and PPARG2). I expected the counts files (here) to contain transcript-level counts, but they appear to be gene-level counts, since I only find counts for a single “PPARG” annotation. Is this due to the annotation features file that was used in the summarizeOverlaps quantification? If so, is it advisable to work from the BAM files and re-run summarizeOverlaps?
A few other clarifying questions on the human_cortex_SS4_Open_GRU_raw-file-manifest-2025-7-30.tsv:
- The Tar Filename field entries do not specify R1 or R2 for paired-end reads, yet the SMART-seq methods section of the paper cited in the thread above (Jorstad, 2024) states “After clipping, the paired-end reads were mapped using Spliced Transcripts Alignment to a Reference (STAR)”.
- Could you please provide a key to each underscore-separated element of the filenames, e.g. in F1S4_191008_307_A01.fastq.tar? Does every unique filename correspond to a single cell? I’m wondering why each substring like F1S4_191008_307 has 8 files, A01:H01.
Thank you for your patience, as this is my first time working with SS data. I appreciate any guidance you can provide.