Dear Allen,
Today I attended the talk about MapMyCells.
I would like to apply it to my own dataset, but I could not figure out how to navigate it. Could you give me some instructions? (The tutorial notebook link was also broken.)
Thanks,
Best
Hi @hansol
Two questions first:
- What form is your data currently in?
- Can you please post the link you tried to go to that was broken (we need to fix that)?
(this might take a little back-and-forth before we get you unstuck; thank you for your patience)
Thanks,
Scott
Thanks for the quick reply.
The link was to "MapMyCells Use Case: Single Cell Genomics in Mouse - brain-map.org", but I think the error has now been fixed.
My data is in H5 format (I have finished preprocessing and basic clustering with the Seurat pipeline). Thanks!
(sorry for the late reply; I somehow missed the email notification that you had responded to my question)
Here is a Python script I put together for a different user. It converts an HDF5 file, in which the cell-by-gene data was stored with genes as rows and cells as columns, into an h5ad file:
```python
import anndata
import h5py
import pandas as pd
import scipy.sparse

with h5py.File('cell_by_gene_matrix_file.h5', 'r') as src:
    # h5py returns string datasets as bytes; decode them to str so they
    # can be used as the obs/var index of the AnnData object
    barcodes = [b.decode('utf-8') for b in src['matrix/barcodes'][()]]
    gene_ids = [g.decode('utf-8') for g in src['matrix/features/id'][()]]

    # The cell-by-gene data appears to be stored as a sparse CSC matrix
    # in which genes are rows and cells are columns.
    # See:
    # https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csc_matrix.html
    cell_by_gene = scipy.sparse.csc_matrix(
        (src['matrix/data'][()],
         src['matrix/indices'][()],
         src['matrix/indptr'][()]),
        shape=tuple(src['matrix/shape'][()]))

# MapMyCells expects rows to be *cells* and columns to be genes
cell_by_gene = cell_by_gene.transpose()

# Now save the data as an h5ad file
# https://anndata.readthedocs.io/en/latest/

# Store the cell metadata in the 'obs' dataframe;
# really we just need a dataframe whose index is the unique identifier
# of each cell.
obs = pd.DataFrame(
    [{'barcode': b} for b in barcodes]).set_index('barcode')

# Store the gene metadata in the 'var' dataframe;
# really we just need a dataframe whose index is the unique identifier
# of each gene.
var = pd.DataFrame(
    [{'gene_id': g} for g in gene_ids]).set_index('gene_id')

a_data = anndata.AnnData(
    X=cell_by_gene,
    obs=obs,
    var=var)

a_data.write_h5ad(
    'output_as_anndata.h5ad',
    compression='gzip',
    compression_opts=4)
```
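To make the orientation logic in the script a bit more concrete, here is a toy example showing how the `data`/`indices`/`indptr` arrays of a CSC matrix map onto a genes-by-cells layout, and what the transpose does. The 3-gene by 2-cell matrix is made up purely for illustration, and, like the script, the example assumes the file stores genes as rows and cells as columns.

```python
import numpy as np
import scipy.sparse

# Made-up genes-by-cells count matrix (3 genes x 2 cells):
#            cell_0  cell_1
# gene_0        1       0
# gene_1        0       2
# gene_2        3       4
dense_genes_by_cells = np.array([[1, 0],
                                 [0, 2],
                                 [3, 4]])

# In CSC format, indptr has one entry per column (here: per cell) plus one,
# and indices holds the row (here: gene) index of each stored value.
data = np.array([1, 3, 2, 4])
indices = np.array([0, 2, 1, 2])   # gene (row) index of each stored value
indptr = np.array([0, 2, 4])       # cell_0 -> data[0:2], cell_1 -> data[2:4]

genes_by_cells = scipy.sparse.csc_matrix((data, indices, indptr), shape=(3, 2))
assert (genes_by_cells.toarray() == dense_genes_by_cells).all()

# Transposing puts cells on the rows and genes on the columns,
# which is the orientation MapMyCells expects.
cells_by_genes = genes_by_cells.transpose()
print(cells_by_genes.shape)  # (2, 3)
print(cells_by_genes.toarray())
```

A quick way to sanity-check a real conversion is the same idea: after the transpose, the number of rows should equal the number of barcodes and the number of columns should equal the number of gene IDs.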
Does this get you unstuck?
If not, can you give me a little more information about the contents of your H5 file?
(I suppose I should also ask whether you are comfortable in Python or R.)
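If it helps with describing your file, the sketch below lists every dataset in an HDF5 file along with its shape and dtype, using h5py's `visititems`. The function name `list_h5_contents` is hypothetical, made up for this example. An H5 file written from a Seurat workflow may not follow the layout the script above assumes, so this is just a way to see what is actually in yours.

```python
import h5py


def list_h5_contents(path):
    """Return a list of (name, shape, dtype) for every dataset in an HDF5 file."""
    contents = []

    def visitor(name, obj):
        # visititems calls this for every group and dataset below the root;
        # only datasets have a shape and dtype worth reporting.
        # Returning None tells h5py to keep visiting.
        if isinstance(obj, h5py.Dataset):
            contents.append((name, obj.shape, str(obj.dtype)))

    with h5py.File(path, 'r') as src:
        src.visititems(visitor)
    return contents
```

Calling `list_h5_contents('your_file.h5')` and printing each tuple should show whether the file contains the datasets the conversion script reads (`matrix/barcodes`, `matrix/features/id`, `matrix/data`, `matrix/indices`, `matrix/indptr` and `matrix/shape`); if it does not, posting that listing here would tell us how to adapt the script.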