Isoforms in SMART-Seq v4 DFC M1C Dataset

Hello, thank you for creating and maintaining this excellent resource. I am following up on this thread (insert). I would like to analyze these data for a specific set of isoforms (PPARG1 and PPARG2). I had thought that the counts files (here) might contain transcript-level counts, but they appear to be gene-level counts, since I find counts for only a single “PPARG” annotation. Is this due to the annotation features file that was used in the summarizeOverlaps quantification? If so, is it advisable to work from the BAM files and re-run summarizeOverlaps?

A few other clarifying questions on the human_cortex_SS4_Open_GRU_raw-file-manifest-2025-7-30.tsv:

  • The Tar Filename field entries do not specify R1 or R2 for paired-end reads, yet the SMART-seq methods section of the paper cited in the preceding thread (Jorstad, 2024) states: “After clipping, the paired-end reads were mapped using Spliced Transcripts Alignment to a Reference (STAR)”.
  • Could you please provide a key to each underscore-separated element of the filenames, e.g. in F1S4_191008_307_A01.fastq.tar? Does every unique filename correspond to a single cell? I’m wondering why each prefix like F1S4_191008_307 has 8 files, A01:H01.

Thank you for your patience, as this is my first time working with SMART-seq data. I appreciate any guidance you can provide.

Hi,

Thank you for your question. I asked our technical experts how to address your questions. Here is the response:

  1. Isoforms. Our published expression matrix files are all at the gene level. For isoform analysis, the BAM files will be needed to generate a new isoform-level expression matrix.

  2. Filenames. Our single-cell SMARTerV4 data are FACS-sorted into 8-well strips. Example: F1S4_191008_307_A01

    1. F1S4 = the FACS machine used and the SMARTerV4 (S4) method.

    2. 191008 = the date of sort.

    3. 307 = Number of sorted cell.

    4. A01 = Sort well.
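
For anyone scripting against the manifest, the key above can be sketched in Python as follows. This is only an illustration under stated assumptions: the regular expression is inferred from the single example filename in this thread, the split of F1S4 into "F1" (FACS machine) and "S4" (method) is a guess at how that field is composed, and the function and field names (parse_tar_filename, group_wells_by_strip, sorter, sort_number) are made up here, not part of any official tooling.

```python
import re
from collections import defaultdict

# Pattern inferred from one example (F1S4_191008_307_A01.fastq.tar).
# ASSUMPTION: the first field is "F<digits>" (FACS machine) followed by
# "S<digits>" (method), and wells are a letter A-H plus two digits.
FILENAME_RE = re.compile(
    r"^(?P<sorter>F\d+)(?P<method>S\d+)"
    r"_(?P<sort_date>\d{6})"
    r"_(?P<sort_number>\d+)"
    r"_(?P<well>[A-H]\d{2})"
    r"(?:\.fastq\.tar)?$"
)


def parse_tar_filename(name):
    """Split a Tar Filename entry into its underscore-separated parts."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected filename layout: {name!r}")
    return match.groupdict()


def group_wells_by_strip(names):
    """Map each (prefix, date, number) strip to its sorted list of wells."""
    strips = defaultdict(list)
    for name in names:
        parts = parse_tar_filename(name)
        key = (
            parts["sorter"] + parts["method"],
            parts["sort_date"],
            parts["sort_number"],
        )
        strips[key].append(parts["well"])
    return {key: sorted(wells) for key, wells in strips.items()}


example = parse_tar_filename("F1S4_191008_307_A01.fastq.tar")
print(example)
```

Passing the Tar Filename column of the manifest to group_wells_by_strip gives a quick way to check whether each prefix really carries the eight wells of one strip. Filenames that do not fit the assumed layout raise a ValueError rather than being parsed silently.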

Thank you

Got it! Thank you so much!