Hi, @jenniferhu04. Deciding which probe(s) best represent each gene is something that we have spent a lot of time considering. You have a few choices.
- Use the probe with the highest average expression level. Usually this probe best represents the underlying gene expression.
- We wrote a paper comparing gene expression from the Allen Human Brain Atlas using microarray and RNA-seq. Additional File 8 shows statistics for each probe, and probes with the lowest q-value best represent true gene expression values as measured with RNA-seq.
- If you want to aggregate probes using the average or another metric (e.g., taking the most highly expressed probe per gene, as in the first option above), you can do that easily with the collapseRows R function from the WGCNA package.
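
To make the first option concrete, here is a rough sketch of the selection logic in plain Python. It is not collapseRows itself and not our exact pipeline, just an illustration of "keep the probe with the highest mean expression per gene". The function name `select_max_mean_probe`, the probe IDs, the gene symbols, and the values are all made up for the example.

```python
from statistics import mean


def select_max_mean_probe(expr, probe_to_gene):
    """Keep, for each gene, the probe with the highest mean expression.

    expr: dict mapping probe ID -> list of expression values across samples.
    probe_to_gene: dict mapping probe ID -> gene symbol.
    Returns a dict mapping gene symbol -> (probe ID, expression values).
    """
    best = {}
    for probe_id, values in expr.items():
        gene = probe_to_gene[probe_id]
        # Replace the current pick only if this probe has a higher mean.
        if gene not in best or mean(values) > mean(best[gene][1]):
            best[gene] = (probe_id, values)
    return best


# Hypothetical toy data: three probes, two genes, three samples.
expr = {
    "probe_a": [2.0, 2.5, 3.0],
    "probe_b": [5.0, 5.5, 6.0],
    "probe_c": [1.0, 1.5, 1.2],
}
probe_to_gene = {"probe_a": "GENE1", "probe_b": "GENE1", "probe_c": "GENE2"}

selected = select_max_mean_probe(expr, probe_to_gene)
print({gene: probe for gene, (probe, _) in selected.items()})
# {'GENE1': 'probe_b', 'GENE2': 'probe_c'}
```

On real data you would want to decide how to handle probes with no gene annotation and ties between probes; this sketch simply keeps the first probe it sees in a tie.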
After calculating your gene expression matrix using one of the three options above, I would suggest defining your R score between genes.
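
As a minimal sketch of that last step, and assuming that "R score" here means a Pearson correlation coefficient between the expression profiles of two genes (that reading is my assumption, so adjust if you define it differently), something like the following Python would work on a gene-level matrix. The helper `pearson_r`, the gene names, and the values are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of values.

    Assumes both inputs have non-zero variance; a constant profile
    would cause a division by zero here.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical gene-level matrix: one row per gene, one value per sample.
gene_expr = {
    "GENE1": [5.0, 5.5, 6.0, 6.5],
    "GENE2": [1.0, 1.2, 1.4, 1.6],
    "GENE3": [4.0, 3.0, 2.0, 1.0],
}

# Pairwise correlation for every ordered pair of genes.
genes = sorted(gene_expr)
r = {(g, h): pearson_r(gene_expr[g], gene_expr[h]) for g in genes for h in genes}

for g, h in [("GENE1", "GENE2"), ("GENE1", "GENE3")]:
    print(g, h, round(r[(g, h)], 3))
```

In this toy example GENE1 and GENE2 rise together across samples while GENE3 falls, so the first pair is strongly positively correlated and the second strongly negatively correlated.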