Mapmycell pipeline for user

hansol · April 24, 2024, 5:54pm

Dear Allen,
Today I joined the talk about MapMyCells.
I would like to use it with my own dataset, but I could not work out how to navigate it. Could you give me some instructions? (The tutorial notebook link was broken as well.)
Thanks,
Best

danielsf · April 24, 2024, 6:49pm

Hi @hansol

Two questions first:

What form is your data currently in?
Can you please post the link you tried to go to that was broken (we need to fix that)?

(this might take a little back-and-forth before we get you unstuck; thank you for your patience)

Thanks,

Scott

hansol · April 25, 2024, 6:51pm

Thanks for the quick reply.

The link was MapMyCells Use Case: Single Cell Genomics in Mouse - brain-map.org,
but I think the error has now been resolved.
My data is in H5 format (I finished preprocessing and basic clustering with the Seurat pipeline). Thanks!

danielsf · May 1, 2024, 9:10pm

(sorry for the late reply; I somehow missed the email notification that you had responded to my question)

Here is a Python script I put together for a different user. It converts an HDF5 file, in which the cell-by-gene data is stored with genes as rows and cells as columns, into an h5ad file:

import anndata
import h5py
import pandas as pd
import scipy.sparse


with h5py.File('cell_by_gene_matrix_file.h5', 'r') as src:
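    # h5py typically returns string datasets as bytes; decode them so
    # the cell and gene identifiers are stored as plain str
    barcodes = [b.decode('utf-8') if isinstance(b, bytes) else b
                for b in src['matrix/barcodes'][()]]
    gene_ids = [g.decode('utf-8') if isinstance(g, bytes) else g
                for g in src['matrix/features/id'][()]]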

    # The cell-by-gene data appears to be stored as a sparse CSC matrix
    # in which genes are rows and cells are columns.
    # See:
    # https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csc_matrix.html
    cell_by_gene = scipy.sparse.csc_matrix(
        (src['matrix/data'][()],
         src['matrix/indices'][()],
         src['matrix/indptr'][()]),
        shape=src['matrix/shape'][()])
# MapMyCells expects rows to be *cells* and columns to be genes
cell_by_gene = cell_by_gene.transpose()

# Now save the data as an h5ad file
# https://anndata.readthedocs.io/en/latest/

# store the cell metadata in the 'obs' dataframe;
# really we just need a dataframe whose index is the unique identifier
# of each cell.
obs = pd.DataFrame(
    [{'barcode': b} for b in barcodes]).set_index('barcode')

# store the gene metadata in the 'var' dataframe;
# really we just need a dataframe whose index is the unique identifier
# of each gene
var = pd.DataFrame(
    [{'gene_id': g} for g in gene_ids]).set_index('gene_id')

a_data = anndata.AnnData(
    X=cell_by_gene,
    obs=obs,
    var=var)

a_data.write_h5ad(
    'output_as_anndata.h5ad',
    compression='gzip',
    compression_opts=4)

Does this get you unstuck?

If not, can you give me a little more information about the contents of your H5 file?
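
One way to gather that information is to list every group and dataset in the file. This is just a rough sketch using h5py; `describe_h5` is a made-up helper name, and the file path you pass in would be your own.

```python
import h5py


def describe_h5(path):
    """Return one summary line per group or dataset in an HDF5 file."""
    lines = []

    def visit(name, obj):
        # visititems calls this once for every object below the root
        if isinstance(obj, h5py.Dataset):
            lines.append(
                f"{name}: dataset, shape={obj.shape}, dtype={obj.dtype}")
        else:
            lines.append(f"{name}: group")

    with h5py.File(path, "r") as src:
        src.visititems(visit)
    return lines
```

Running `print("\n".join(describe_h5("cell_by_gene_matrix_file.h5")))` and pasting the output here would show whether your file has the `matrix/...` layout that the script above assumes.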

(I suppose I should also ask if you are comfortable in Python or R?)
