Hello, I am an undergraduate student at UC Davis with no experience using the API or SDK of the Allen Brain Institute.
I am trying to create a Python module that will match patient microarray expression data to disease and anatomical structure. Using my module and downloaded microarray data, I want to be able to run queries like: “Given the gene classification ‘autism’, which genes are expressed?” or “Given the anatomical structure ‘prefrontal cortex’, which diseases have their genes significantly expressed there?”
Now, I have the Expression.csv file, and I understand that the rows are the probes and each column contains z-scores. However, I don’t know which column corresponds to which structure or which patient. So my question is: how is the expression file from the downloaded microarray data organized?
To reiterate, I downloaded all the data for the gene classification “autism”. The expression file has no headers indicating which anatomical structure or patient the z-scores come from. Is there a key for the CSV file?
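As a rough sketch of the two queries described in this post, assume the downloaded data has already been reshaped into one row per (probe, sample) carrying a gene symbol, a disease label, a structure name and a z-score. The column names `gene_symbol`, `disease`, `structure` and `zscore` are hypothetical, and the `min_z=2.0` cutoff is an arbitrary stand-in for “significantly expressed”, not a statistical test.

```python
import pandas as pd


def expressed_genes(table: pd.DataFrame, disease: str, min_z: float = 2.0) -> list:
    """Genes in a disease classification with z >= min_z in at least one sample."""
    mask = (table["disease"] == disease) & (table["zscore"] >= min_z)
    return sorted(table.loc[mask, "gene_symbol"].unique())


def diseases_in_structure(table: pd.DataFrame, structure: str, min_z: float = 2.0) -> list:
    """Diseases with at least one gene at z >= min_z in the given structure."""
    mask = (table["structure"] == structure) & (table["zscore"] >= min_z)
    return sorted(table.loc[mask, "disease"].unique())


# Hypothetical long-format table; real column names depend on your own reshaping.
demo = pd.DataFrame({
    "gene_symbol": ["GENE_A", "GENE_A", "GENE_B", "GENE_C"],
    "disease": ["autism", "autism", "autism", "schizophrenia"],
    "structure": ["prefrontal cortex", "hippocampus", "prefrontal cortex", "prefrontal cortex"],
    "zscore": [2.4, -0.3, 0.1, 2.1],
})

print(expressed_genes(demo, "autism"))                   # ['GENE_A']
print(diseases_in_structure(demo, "prefrontal cortex"))  # ['autism', 'schizophrenia']
```

Keeping the data in this long shape means each question becomes a simple filter; the disease label would have to come from whatever gene classification you downloaded.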
Hi @GerrikLabra. When you download expression data using a “Download this data” button, you should get a zip file containing four files: “Contents.txt”, “Expression.csv”, “Columns.csv”, and “Probes.csv”. Contents.txt provides more details, but in short, Probes.csv describes the probes/genes and Columns.csv describes the samples, including the structure and patient information you are asking about.
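A minimal sketch of opening such a download in Python. It assumes the four member names listed above (matched case-insensitively, in case they sit in a sub-folder), that Expression.csv has no header row and starts each row with a probe ID, and that Columns.csv and Probes.csv each begin with a header row. The function name `read_download` is hypothetical.

```python
import io
import zipfile

import pandas as pd


def read_download(zip_source):
    """Load Expression.csv, Columns.csv, Probes.csv and Contents.txt from a download zip.

    zip_source may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as zf:
        # Map lower-cased bare file names to their full paths inside the archive.
        members = {name.rsplit("/", 1)[-1].lower(): name for name in zf.namelist()}

        def open_member(filename):
            return zf.open(members[filename.lower()])

        with open_member("Expression.csv") as fh:
            # No header row; column A (the probe ID) becomes the index.
            expression = pd.read_csv(fh, header=None, index_col=0)
        with open_member("Columns.csv") as fh:
            columns = pd.read_csv(fh)  # one row per sample
        with open_member("Probes.csv") as fh:
            probes = pd.read_csv(fh)   # one row per probe
        with open_member("Contents.txt") as fh:
            contents = fh.read().decode("utf-8", errors="replace")
    return expression, columns, probes, contents


# Tiny synthetic archive standing in for a real download (made-up values).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("Contents.txt", "demo archive")
    zf.writestr("Expression.csv", "101,0.5,-1.2\n102,1.1,0.3\n")
    zf.writestr("Columns.csv", "structure\nPFC\nHIP\n")
    zf.writestr("Probes.csv", "probe_id,gene_symbol\n101,GENE_A\n102,GENE_B\n")
buffer.seek(0)

expression, columns, probes, contents = read_download(buffer)
# One Expression.csv column per Columns.csv row, one row per Probes.csv row.
print(expression.shape, len(columns), len(probes))
```

With a real download you would pass the path of the zip file instead of the in-memory buffer.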
Hi Jeremy,
I read the documentation and the help files, but they only tell me the following:
"The file Expression.csv contains expression values, calculated using zscore normalization. Each row begins with the ID of a probe.
The file Columns.csv contains metadata for each column in Expression.csv, arranged in the same order."
So, does column B of Expression.csv (in row 1, row 2, row 3, and so on) correspond to the first data row of Columns.csv, i.e. row 2 after the header? And in general, do columns B onward in Expression.csv represent, in order, all the rows of Columns.csv?
Correct! Columns B, C, D, … of Expression.csv correspond to rows 2, 3, 4, … of Columns.csv. Rows 1, 2, 3, … of Expression.csv correspond to rows 2, 3, 4, … of Probes.csv, and can also be matched using the IDs in column A.
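Following that mapping, here is a sketch that attaches the sample metadata (Columns.csv) and the probe metadata (Probes.csv) to every z-score, producing one row per probe and sample. It assumes, as described in this thread, that Expression.csv has no header and holds probe IDs in column A and that Columns.csv and Probes.csv have one header row; it additionally assumes that the first column of Probes.csv is the probe ID. `load_expression` is a hypothetical name, and the metadata column names come from your own files.

```python
import io

import pandas as pd


def load_expression(expression_csv, columns_csv, probes_csv):
    """Return a long table: one row per (probe, sample) with metadata attached."""
    expr = pd.read_csv(expression_csv, header=None, index_col=0)
    samples = pd.read_csv(columns_csv).reset_index(drop=True)
    probes = pd.read_csv(probes_csv)

    # Columns B, C, D, ... of Expression.csv match the data rows of Columns.csv
    # (file rows 2, 3, 4, ... after the header), in the same order.
    if expr.shape[1] != len(samples):
        raise ValueError("Expression.csv column count does not match Columns.csv rows")
    expr.columns = range(len(samples))
    expr.index.name = "probe_id"

    # Reshape to long format with columns probe_id, sample_idx, zscore.
    long = expr.reset_index().melt(id_vars="probe_id", var_name="sample_idx", value_name="zscore")
    long["sample_idx"] = long["sample_idx"].astype("int64")

    # Attach sample metadata by position, then probe metadata by probe ID
    # (assumed to be the first column of Probes.csv).
    long = long.merge(samples, left_on="sample_idx", right_index=True, how="left")
    probe_key = probes.columns[0]
    return long.merge(probes, left_on="probe_id", right_on=probe_key, how="left")


# Made-up example; the Columns.csv and Probes.csv column names are hypothetical.
table = load_expression(
    io.StringIO("101,0.5,-1.2\n102,1.1,0.3\n"),
    io.StringIO("donor,structure\nD1,PFC\nD1,HIP\n"),
    io.StringIO("probe_id,gene_symbol\n101,GENE_A\n102,GENE_B\n"),
)
print(table[["probe_id", "gene_symbol", "structure", "zscore"]])
```

Checking that the column count equals the number of Columns.csv rows catches a mismatched file pair early instead of silently mislabeling samples.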
What do these values even mean?
Does -0.5 mean that this gene is expressed very little?
These are most likely z-scores. See this post and the related post it references for more detail.
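To make a value such as -0.5 concrete, here is a minimal z-score sketch. It assumes the normalization is done per probe (row) across the downloaded samples using the population standard deviation; check Contents.txt or the posts above for the exact convention. Under that assumption a z-score is relative: -0.5 means half a standard deviation below that probe's own average across the samples, which says nothing about whether the gene is barely expressed in absolute terms.

```python
import numpy as np


def zscore_rows(values):
    """Z-score each row (probe) across its columns (samples).

    Rows with zero variance would produce NaN (0 / 0).
    """
    values = np.asarray(values, dtype=float)
    mean = values.mean(axis=1, keepdims=True)
    std = values.std(axis=1, keepdims=True)  # population SD (ddof=0)
    return (values - mean) / std


# Hypothetical expression levels of one probe in four samples.
raw = [[6.0, 7.0, 8.0, 9.0]]
z = zscore_rows(raw)
print(np.round(z, 3))
```

Here the sample at 7.0 gets a z-score of roughly -0.45, only slightly below this probe's average of 7.5.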
So, in BrainSpan, what does this fold change mean?
In all the atlases (including BrainSpan), the fold change is the ratio of expression in the target structures to expression in the contrast structures.
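A minimal sketch of that ratio, assuming expression values on a linear scale and taking the mean over the samples in each group. How BrainSpan actually aggregates samples (and whether it works in log space) is not stated here, so treat `fold_change` as a hypothetical illustration. A fold change above 1 means higher expression in the target than in the contrast.

```python
import math
from statistics import mean


def fold_change(target, contrast):
    """Mean expression in target samples divided by mean in contrast samples."""
    return mean(target) / mean(contrast)


# Hypothetical linear-scale expression values.
target = [8.0, 12.0]        # samples from the target structure(s)
contrast = [2.0, 4.0, 6.0]  # samples from the contrast structure(s)

fc = fold_change(target, contrast)
print(fc)                       # 2.5
print(round(math.log2(fc), 2))  # 1.32, the same change as a log2 fold change
```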
I get this mess when I open the file. Why?
You need to import the files into Excel rather than opening them directly, or use a text editor or a programming language instead. Instructions for importing a CSV into Excel are easy to find with a quick Google search.
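For the programming-language option, here is a small sketch that returns the first few rows and columns of a CSV using Python's standard `csv` module instead of opening the whole file in a spreadsheet. `preview_csv` is a hypothetical helper, and the demo uses made-up text in place of Expression.csv.

```python
import csv
import io
from itertools import islice


def preview_csv(handle, n_rows=5, n_cols=6):
    """Return the first n_rows rows of a CSV, each cut to n_cols fields."""
    return [row[:n_cols] for row in islice(csv.reader(handle), n_rows)]


# For a real file: with open("Expression.csv", newline="") as handle: preview_csv(handle)
sample = io.StringIO("101,0.5,-1.2,0.3\n102,1.1,0.3,-0.7\n103,0.0,0.2,0.9\n")
for row in preview_csv(sample, n_rows=2, n_cols=3):
    print(row)
```

Because `csv.reader` is lazy and `islice` stops early, only the first few lines are read, which keeps this fast even on a large expression file.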