I am analyzing ADNP expression in the Allen Human Brain Atlas (microarray dataset) and encountered a probe-level inconsistency that I would appreciate some guidance on.
Specifically, ADNP is represented by multiple probes in the AHBA. When I examined regional expression patterns (z-scored across brain regions, averaged across ~6 donors per region), I found that two probes show opposite trends within the same brain regions.
This discrepancy persists across multiple regions.
I am therefore trying to clarify:
What are the exact targets of these two probes?
Do they map to different transcripts, exons, or UTRs of ADNP?
Is either probe known to have lower reliability, cross-hybridization, or weaker specificity for ADNP?
Is there an Allen-recommended probe for ADNP in the Human Brain Atlas (e.g. based on expression level, differential stability, or QC metrics)?
More generally, I would like to understand whether this divergence reflects biological transcript heterogeneity or is more likely a technical/annotation issue, and how best to handle this in downstream regional analyses.
Any guidance on probe selection or relevant documentation would be greatly appreciated.
I can partially address your questions from the perspective of someone who has studied the best probes to select on microarrays in general, although I have not looked at ADNP expression specifically. I wrote a paper several years ago on different ways to select probes (and an associated R function). In short, usually you can just take the probe with the highest average expression.
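As a minimal sketch of the "highest average expression" rule, written in Python rather than the R function mentioned above: the function name, probe names, and intensities below are made up for illustration and are not real AHBA values.

```python
from statistics import mean


def pick_probe_by_mean(expression):
    """Return the probe ID with the highest mean expression across samples.

    `expression` maps a probe ID to a list of normalized intensities,
    one value per sample. (Hypothetical data layout, for illustration.)
    """
    return max(expression, key=lambda probe: mean(expression[probe]))


# Toy intensities only; these are NOT real AHBA measurements.
toy_expression = {
    "probe_A": [7.9, 8.3, 8.1, 7.7],
    "probe_B": [3.1, 2.8, 3.4, 3.0],
}

best_by_mean = pick_probe_by_mean(toy_expression)
print(best_by_mean)
```

With real data, the dictionary would hold one entry per probe of the gene, with values taken from the normalized microarray expression matrix.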
In this case, however, we have ground truth (RNA-seq) data from the Allen Human Brain Atlas, so a better option would be to choose the probe with higher correlation to gene expression levels based on RNA-seq. I also wrote a paper (here) describing how you can select the best probes and potentially scale the microarray data to make the resulting values more representative of absolute expression levels. Additional file 8 from this study suggests that “A_23_P254179” is better correlated with RNA-seq data and may be a better choice.
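The correlation-based choice can be sketched roughly as follows. This is an illustration under stated assumptions, not the procedure from the paper: it assumes microarray and RNA-seq values have already been matched to the same samples or regions, it uses a plain Pearson correlation, and the function names, probe names, and numbers are all invented.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def pick_probe_by_rnaseq(probe_values, rnaseq_values):
    """Return (best probe ID, per-probe correlations with RNA-seq)."""
    scores = {p: pearson(v, rnaseq_values) for p, v in probe_values.items()}
    return max(scores, key=scores.get), scores


# Toy values only; matched sample order is assumed. NOT real AHBA data.
toy_rnaseq = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_probes = {
    "probe_A": [5.1, 5.9, 7.2, 8.0, 9.1],
    "probe_B": [6.0, 5.5, 5.8, 5.1, 5.3],
}

best_by_rnaseq, correlations = pick_probe_by_rnaseq(toy_probes, toy_rnaseq)
print(best_by_rnaseq, correlations)
```

A rank-based correlation could be swapped in for Pearson if the relationship between platforms is monotonic but not linear.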
A few more thoughts:
I would not recommend using Z-scores in your analysis, but instead sticking with the (normalized) intensity values provided. A value of 0 is meaningful for expression, whereas Z-scoring artificially pushes some values below 0 and leaves the mean and minimum biologically arbitrary.
Depending on the specific brain regions you care about, it might be worthwhile to use the ‘RNA-sequencing datasets’ available for download here.
If you really want to investigate the specific targets of these probes, you can find the probe sequences for these (or any) probes on the gene view for ADNP (or any gene). A_23_P254179 → TTAGACCTATTCAAGTGATGCTCATGATCCTGTTACTGTGTGCCCATCATAGATTTCTTT and CUST_10358_PI416261804 → TTGAAAAACACTACATGGGAGGATGTAGGACTGTGGGACCCATCACTTACGAAAAACCAG. Tools like BLAST can pinpoint where these sequences fall within the genome.
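To illustrate the point about Z-scores in the first thought above, here is a small Python sketch with invented numbers. It shows two hypothetical probes with the same regional pattern but very different intensity levels: after Z-scoring they become indistinguishable, and some values fall below 0 even though every raw intensity is positive.

```python
from statistics import mean, pstdev


def zscore(values):
    """Z-score a list of values using the population standard deviation."""
    m = mean(values)
    s = pstdev(values)
    if s == 0:
        raise ValueError("Z-score is undefined for constant input")
    return [(v - m) / s for v in values]


# Toy intensities across three regions; NOT real AHBA data.
high_probe = [10.0, 11.0, 12.0]  # strongly expressed
low_probe = [1.0, 2.0, 3.0]      # weakly expressed, same regional pattern

z_high = zscore(high_probe)
z_low = zscore(low_probe)

print(z_high)
print(z_low)
```

Both lists print the same Z-scores, so the information about which probe is actually expressed at a higher level is lost.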
Hopefully this is enough, but feel free to reply if you need more information.
Thank you so much for your helpful and detailed guidance — this really clarifies the probe selection strategy for me and will directly inform my analysis. I truly appreciate your time and insight.