Data Download: Different Sample Numbers in Files

I downloaded the folder ‘RNA-seq Gencode v3c summarized to genes’ and loaded the data frames into R. As I understand it, the columns in ‘expression_matrix.csv’ are the samples listed in ‘columns_metadata.csv’, so the number of columns in ‘expression_matrix.csv’ should equal the number of rows in ‘columns_metadata.csv’. However, when I check, ‘expression_matrix.csv’ has 579 columns whereas ‘columns_metadata.csv’ has 524 rows.

> expressionMatrix <- read.csv(file='expression_matrix.csv',sep=',',header=TRUE)
> columnsMeta <- read.csv(file='columns_metadata.csv',sep=',',header=TRUE)
> ncol(expressionMatrix)
[1] 579
> nrow(columnsMeta)
[1] 524

Hi there, thanks for your interest in the Allen Institute’s data. Can you please describe what type of data you are interested in and where you are downloading it from? This will help us to find the right person to answer your question.

Hi, the data was downloaded from here: Data Download :: BrainSpan: Atlas of the Developing Human Brain

Hi,

I just downloaded the same file, and “columns_metadata.csv” includes 578 rows.

This aligns with “expression_matrix.csv”: its 579 columns would be the 578 samples plus one leading column corresponding to the genes.

I’m not sure why you are seeing a different number. Could you try downloading again? Maybe the download was incomplete.
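
After re-downloading, a quick check along these lines may help confirm whether the two files line up. This is only a sketch: `check_sample_counts` is a hypothetical helper, the toy data frames and their column names are made up for illustration, and it assumes the first column of the expression matrix is a gene index rather than a sample.

```r
# Hypothetical helper: compare the sample count implied by the expression
# matrix with the number of rows in the sample metadata.
# Assumption: the first column of `expr` is a gene/row index, not a sample.
check_sample_counts <- function(expr, meta) {
  n_expr <- ncol(expr) - 1
  n_meta <- nrow(meta)
  list(expr = n_expr, meta = n_meta, match = n_expr == n_meta)
}

# Toy data standing in for the real files (3 genes, 2 samples).
expr <- data.frame(V1 = 1:3, V2 = c(0.1, 0.2, 0.3), V3 = c(1.5, 0, 2.2))
meta <- data.frame(column_num = 1:2, donor_id = c("A", "B"))

res <- check_sample_counts(expr, meta)
print(res$match)

# With the real download (assumption: expression_matrix.csv has no header row):
# expr <- read.csv("expression_matrix.csv", header = FALSE)
# meta <- read.csv("columns_metadata.csv", header = TRUE)
# check_sample_counts(expr, meta)
```

If `match` comes back `FALSE` on a fresh download, comparing the file sizes on disk against the sizes inside the downloaded archive is another way to spot a truncated file.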

Jeremy