{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# End-to-end gene network tutorial\n", "\n", "Build a context-conditional gene correlation network from your own scRNA-seq data:\n", "\n", "```\n", "input directory (one h5ad per cell type)\n", "  │  build_metacells_dir       → dict[cell_type, metacell AnnData]\n", "  │  build_corr_db             → on-disk corr DB\n", "  │  build_multi_goi_features  → feature matrix across (GOI × cell type)\n", "  │  build_gene_network        → kNN gene network\n", "```\n", "\n", "Replace the placeholder paths and gene list with your own." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import warnings\n", "\n", "# Silence library warnings to keep the tutorial output readable.\n", "warnings.filterwarnings('ignore')\n", "\n", "from pathlib import Path\n", "\n", "import matplotlib.pyplot as plt\n", "import networkx as nx\n", "\n", "from sceleto.network import (\n", "    build_metacells_dir,\n", "    build_corr_db,\n", "    list_cell_types,\n", "    build_gene_network,\n", "    plot_network,\n", "    plot_clustermap,\n", "    build_multi_goi_features,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Prepare the input directory\n", "\n", "`build_metacells_dir` reads every `*.h5ad` in a directory and treats each file as one cell type (the file stem becomes the cell-type name). Point it at a clean directory containing only the files you want." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "INPUT_DIR = Path('path/to/input_dir')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Build metacells per cell type\n", "\n", "`build_metacells_dir` applies `build_metacells` to every h5ad in `INPUT_DIR`, building per-sample metacells within each cell type. Set `sample_key` to the `.obs` column that identifies samples; kNN graphs are built within each sample only. When `output_dir` is provided, results are cached and read back on re-run, so repeated calls are idempotent." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "metacells = build_metacells_dir(\n", "    INPUT_DIR,\n", "    'path/to/metacells_cache',  # cache directory (output_dir)\n", "    sample_key='sample',\n", "    counts='raw',\n", "    min_cells_per_sample=100,\n", "    prop=0.1,\n", "    seed=42,\n", "    verbose=True,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2.5 (Optional) Unify `var_names` across cell types\n", "\n", "`build_corr_db` requires every cell type's metacell AnnData to share the same `var_names`. If your input files have different gene sets (e.g. from compartment-specific QC), intersect them first. You can also rename genes to symbols at this step for readability." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Keep only the genes present in every cell type, in a fixed (sorted) order.\n", "common = sorted(set.intersection(*(set(mc.var_names) for mc in metacells.values())))\n", "metacells = {ct: mc[:, common].copy() for ct, mc in metacells.items()}\n", "\n", "# Rename genes to symbols, taken from the `feature_name` column of `.var`.\n", "ref = next(iter(metacells.values())).copy()\n", "ref.var_names = ref.var['feature_name'].astype(str).values\n", "ref.var_names_make_unique()\n", "unified_names = list(ref.var_names)\n", "\n", "# Apply the same names to every cell type so all `var_names` match.\n", "for mc in metacells.values():\n", "    mc.var_names = unified_names" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Build the correlation DB\n", "\n", "`build_corr_db` computes a per-cell-type (gene × gene) float16 correlation matrix from the metacell expression and writes it to disk using the `{name}_..._{version}` naming convention. With `overwrite=False`, repeated calls are idempotent; a `gene_names` mismatch raises an error instead of silently reusing stale results." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "CORR_DB_DIR = 'path/to/corr_db'\n", "\n", "build_corr_db(\n", "    metacells,\n", "    out_dir=CORR_DB_DIR,\n", "    name='myproject',\n", "    version='v01',\n", "    overwrite=False,\n", "    verbose=True,\n", ")\n", "\n", "list_cell_types(CORR_DB_DIR, name='myproject', version='v01')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Multi-GOI integrated gene network\n", "\n", "For multiple genes of interest (GOIs), `build_multi_goi_features` returns a single feature matrix spanning all `(GOI, cell type)` conditions. Feeding it to `build_gene_network` yields one kNN graph; per-condition visualizations share the same node layout for easy comparison." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "gois = []  # your genes of interest, e.g. ['GENE1', 'GENE2']\n", "\n", "feat_multi = build_multi_goi_features(\n", "    gois,\n", "    data_dir=CORR_DB_DIR,\n", "    top_n=10,\n", "    name='myproject',\n", "    version='v01',\n", ")\n", "G_multi = build_gene_network(feat_multi, k=5)\n", "# Compute the layout once so every per-condition plot shares node positions.\n", "pos_multi = nx.spring_layout(G_multi, weight='weight', seed=3)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plot_clustermap(feat_multi);" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for goi in gois:\n", "    # Columns of the feature matrix that belong to this GOI, one per condition.\n", "    cols = [c for c in feat_multi.columns if c.startswith(f'{goi}_')]\n", "    if not cols:\n", "        continue  # nothing to plot; plt.subplots needs at least one column\n", "    fig, axes = plt.subplots(1, len(cols), figsize=(7 * len(cols), 7), squeeze=False)\n", "    for ax, cond in zip(axes.ravel(), cols):\n", "        plot_network(G_multi, feature_matrix=feat_multi, condition=cond, pos=pos_multi, ax=ax)\n", "    fig.suptitle(goi, fontsize=14, y=1.02)\n", "    plt.tight_layout()\n", "    plt.show()\n", "    plt.close(fig)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "name": "python" } }, "nbformat": 4, 
"nbformat_minor": 5 }