MapMyCells pipeline for users

Dear Allen,
Today I joined the talk about MapMyCells.
I would like to use it with my dataset, but I could not navigate it well. May I ask for your instructions? (The tutorial notebook link was broken as well.)
Thanks,
Best

Hi @hansol

Two questions first:

  1. What form is your data currently in?

  2. Can you please post the link you tried to go to that was broken (we need to fix that)?

(this might take a little back-and-forth before we get you unstuck; thank you for your patience)

Thanks,

Scott

Thanks for the quick reply.

It was MapMyCells Use Case: Single Cell Genomics in Mouse - brain-map.org
but I think the error is solved now.
My data is in H5 format (I finished preprocessing and basic clustering with the Seurat pipeline). Thanks!

(sorry for the late reply; I somehow missed the email notification that you had responded to my question)

Here is a Python script I put together for a different user. It converts an HDF5 file, in which the cell-by-gene data was stored with genes as rows and cells as columns, into an h5ad file:

import anndata
import h5py
import pandas as pd
import scipy.sparse


with h5py.File('cell_by_gene_matrix_file.h5', 'r') as src:
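    # h5py may return string datasets as bytes; decode them so the
    # obs and var indexes below hold str values
    barcodes = [b.decode('utf-8') if isinstance(b, bytes) else b
                for b in src['matrix/barcodes'][()]]
    gene_ids = [g.decode('utf-8') if isinstance(g, bytes) else g
                for g in src['matrix/features/id'][()]]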

    # The cell-by-gene data appears to be stored as a sparse CSC matrix
    # in which genes are rows and cells are columns.
    # See:
    # https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csc_matrix.html
    cell_by_gene = scipy.sparse.csc_matrix(
        (src['matrix/data'][()],
         src['matrix/indices'][()],
         src['matrix/indptr'][()]),
        shape=src['matrix/shape'][()])

# MapMyCells expects rows to be *cells* and columns to be genes
cell_by_gene = cell_by_gene.transpose()

# Now save the data as an h5ad file
# https://anndata.readthedocs.io/en/latest/

# store the cell metadata in the 'obs' dataframe;
# really we just need a dataframe whose index is the unique identifier
# of each cell.
obs = pd.DataFrame(
    [{'barcode': b} for b in barcodes]).set_index('barcode')

# store the gene metadata in the 'var' dataframe;
# really we just need a dataframe whose index is the unique identifier
# of each gene
var = pd.DataFrame(
    [{'gene_id': g} for g in gene_ids]).set_index('gene_id')

a_data = anndata.AnnData(
    X=cell_by_gene,
    obs=obs,
    var=var)

a_data.write_h5ad(
    'output_as_anndata.h5ad',
    compression='gzip',
    compression_opts=4)

Does this get you unstuck?

If not, can you give me a little more information about the contents of your H5 file?
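
One way to gather that information is to list every dataset in the file along with its shape and dtype. This is only a sketch: `describe_h5` is a hypothetical helper name, and it relies on nothing beyond h5py's `visititems`.

```python
import h5py


def describe_h5(path):
    """Return a list of (name, shape, dtype) for every dataset in the file."""
    found = []

    def visit(name, obj):
        # visititems calls this for every group and dataset;
        # only datasets carry a shape and a dtype
        if isinstance(obj, h5py.Dataset):
            found.append((name, obj.shape, str(obj.dtype)))

    with h5py.File(path, 'r') as src:
        src.visititems(visit)
    return found


# Example (replace the placeholder with the path to your own file):
#     for name, shape, dtype in describe_h5('cell_by_gene_matrix_file.h5'):
#         print(name, shape, dtype)
```

Pasting the printed output here would tell me whether your file uses the same `matrix/...` layout the script above assumes.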

(I suppose I should also ask if you are comfortable in Python or R?)