{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# End-to-end gene network tutorial\n",
    "\n",
    "Build a context-conditional gene correlation network from your own scRNA-seq data:\n",
    "\n",
    "```\n",
    "input directory (one h5ad per cell type)\n",
    "    │  build_metacells_dir       → dict[cell_type, metacell AnnData]\n",
    "    │  build_corr_db             → on-disk corr DB\n",
    "    │  build_multi_goi_features  → feature matrix across (GOI × cell type)\n",
    "    │  build_gene_network        → kNN gene network\n",
    "```\n",
    "\n",
    "Replace the placeholder paths and gene list with your own."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import warnings\n",
    "from pathlib import Path\n",
    "\n",
    "warnings.filterwarnings('ignore')  # silence library warnings to keep the tutorial output readable\n",
    "\n",
    "import matplotlib.pyplot as plt\n",
    "import networkx as nx\n",
    "\n",
    "from sceleto.network import (\n",
    "    build_metacells_dir,\n",
    "    build_corr_db,\n",
    "    list_cell_types,\n",
    "    build_gene_network,\n",
    "    plot_network,\n",
    "    plot_clustermap,\n",
    "    build_multi_goi_features,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Prepare the input directory\n",
    "\n",
    "`build_metacells_dir` reads every `*.h5ad` in a directory and treats each file as one cell type (file stem becomes the cell-type name). Point it at a clean directory containing only the files you want."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "INPUT_DIR = Path('path/to/input_dir')"
   ]
  },
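  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before running the pipeline, it can help to preview which cell-type names a directory would produce. The sketch below is a minimal illustration using only `pathlib`. `preview_cell_types` is a hypothetical helper, not part of `sceleto`, and it assumes that only `*.h5ad` files directly inside the directory (not in subdirectories) are picked up.\n",
    "\n",
    "```python\n",
    "from pathlib import Path\n",
    "import tempfile\n",
    "\n",
    "def preview_cell_types(input_dir):\n",
    "    # Map file stem -> path for every .h5ad file directly inside input_dir.\n",
    "    return {p.stem: p for p in sorted(Path(input_dir).glob('*.h5ad'))}\n",
    "\n",
    "# Demo on a throwaway directory holding empty placeholder files.\n",
    "with tempfile.TemporaryDirectory() as tmp:\n",
    "    for fname in ('T_cell.h5ad', 'B_cell.h5ad', 'notes.txt'):\n",
    "        (Path(tmp) / fname).touch()\n",
    "    found = preview_cell_types(tmp)\n",
    "\n",
    "print(list(found))  # ['B_cell', 'T_cell']\n",
    "```\n",
    "\n",
    "Running `preview_cell_types(INPUT_DIR)` on your own directory should list exactly the cell types you expect; stray files with an `.h5ad` extension would show up here as extra names."
   ]
  },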
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Build metacells per cell type\n",
    "\n",
    "`build_metacells_dir` applies `build_metacells` to every h5ad in `INPUT_DIR`, building per-sample metacells within each cell type. Set `sample_key` to the `.obs` column that identifies samples; kNN graphs are built within each sample only, so cells from different samples are never pooled into one metacell. When `output_dir` is provided, results are cached and read back on re-run (idempotent)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "metacells = build_metacells_dir(\n",
    "    INPUT_DIR,\n",
    "    'path/to/metacells_cache',\n",
    "    sample_key='sample',\n",
    "    counts='raw',\n",
    "    min_cells_per_sample=100,\n",
    "    prop=0.1,\n",
    "    seed=42,\n",
    "    verbose=True,\n",
    ")"
   ]
  },
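  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To get a feel for how many samples a cell-count threshold would leave, the sketch below counts cells per sample on toy labels. It assumes that `min_cells_per_sample` acts as an inclusive lower bound on the number of cells in a sample, which the parameter name suggests but this tutorial does not confirm. `samples_passing` is a hypothetical helper, not a `sceleto` function.\n",
    "\n",
    "```python\n",
    "from collections import Counter\n",
    "\n",
    "def samples_passing(sample_labels, min_cells):\n",
    "    # Count cells per sample and keep the samples that reach the threshold.\n",
    "    counts = Counter(sample_labels)\n",
    "    return {s: n for s, n in counts.items() if n >= min_cells}\n",
    "\n",
    "# Toy labels standing in for one file's adata.obs['sample'] column.\n",
    "labels = ['s1'] * 150 + ['s2'] * 40 + ['s3'] * 100\n",
    "kept = samples_passing(labels, min_cells=100)\n",
    "print(kept)  # {'s1': 150, 's3': 100}\n",
    "```\n",
    "\n",
    "If most of your samples fall below the threshold for a given cell type, that cell type may contribute few or no metacells, so it is worth checking the counts before a long run."
   ]
  },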
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.5 (Optional) Unify `var_names` across cell types\n",
    "\n",
    "`build_corr_db` requires every cell type's metacell AnnData to share the same `var_names`. If your input files have differing gene sets (e.g. compartment-specific QC), intersect them first. You can also rename to gene symbols here for readability; the cell below takes the symbols from a `feature_name` column in `.var`, so adjust that column name to match your data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
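    "# Keep only genes present in every cell type, in one shared (sorted) order.\n",
    "common = sorted(set.intersection(*(set(mc.var_names) for mc in metacells.values())))\n",
    "metacells = {ct: mc[:, common].copy() for ct, mc in metacells.items()}\n",
    "\n",
    "# Map IDs to gene symbols once, on a reference copy, de-duplicating repeated symbols.\n",
    "# Assumes `.var['feature_name']` holds the symbols.\n",
    "ref = next(iter(metacells.values())).copy()\n",
    "ref.var_names = ref.var['feature_name'].astype(str).values\n",
    "ref.var_names_make_unique()\n",
    "unified_names = list(ref.var_names)\n",
    "\n",
    "# Every AnnData has the same gene order after the subset above, so one name list fits all.\n",
    "for mc in metacells.values():\n",
    "    mc.var_names = unified_names"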
   ]
  },
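  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A quick sanity check after unifying is to confirm that every cell type now lists the same genes in the same order. The sketch below shows the idea on toy gene lists; `same_gene_order` is a hypothetical helper introduced here for illustration, not part of `sceleto`.\n",
    "\n",
    "```python\n",
    "def same_gene_order(names_by_cell_type):\n",
    "    # True when every cell type lists the same genes in the same order.\n",
    "    lists = [list(v) for v in names_by_cell_type.values()]\n",
    "    return all(names == lists[0] for names in lists[1:])\n",
    "\n",
    "# Toy gene lists standing in for each metacell AnnData's var_names.\n",
    "toy = {\n",
    "    'T_cell': ['CD3E', 'ACTB', 'GAPDH'],\n",
    "    'B_cell': ['ACTB', 'MS4A1', 'GAPDH'],\n",
    "}\n",
    "shared = sorted(set.intersection(*(set(v) for v in toy.values())))\n",
    "unified = {ct: shared for ct in toy}\n",
    "\n",
    "print(same_gene_order(toy))      # False\n",
    "print(same_gene_order(unified))  # True\n",
    "```\n",
    "\n",
    "On real data the same check would read `same_gene_order({ct: mc.var_names for ct, mc in metacells.items()})`."
   ]
  },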
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Build the correlation DB\n",
    "\n",
    "`build_corr_db` computes a (gene × gene) float16 correlation matrix for each cell type from its metacell expression and writes it to disk under the `{name}_..._{version}` naming convention. With `overwrite=False`, repeated calls are idempotent, and a `gene_names` mismatch raises an error instead of silently reusing stale results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "CORR_DB_DIR = 'path/to/corr_db'\n",
    "\n",
    "build_corr_db(\n",
    "    metacells,\n",
    "    out_dir=CORR_DB_DIR,\n",
    "    name='myproject',\n",
    "    version='v01',\n",
    "    overwrite=False,\n",
    "    verbose=True,\n",
    ")\n",
    "\n",
    "list_cell_types(CORR_DB_DIR, name='myproject', version='v01')"
   ]
  },
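  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the shape and precision of one DB entry concrete, the sketch below computes a gene × gene correlation matrix from a toy (metacells × genes) array and casts it to float16. It uses Pearson correlation via `numpy.corrcoef` as an assumption; the correlation measure and any preprocessing that `build_corr_db` applies are not stated in this tutorial, so treat this as an illustration rather than the library's implementation.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "# Toy expression: 50 metacells (rows) by 4 genes (columns).\n",
    "expr = rng.poisson(5.0, size=(50, 4)).astype(float)\n",
    "\n",
    "# rowvar=False treats columns (genes) as the variables to correlate.\n",
    "corr = np.corrcoef(expr, rowvar=False).astype(np.float16)\n",
    "\n",
    "print(corr.shape, corr.dtype)  # (4, 4) float16\n",
    "```\n",
    "\n",
    "float16 keeps roughly three significant decimal digits, which halves storage relative to float32 at the cost of coarser correlation values."
   ]
  },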
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Multi-GOI integrated gene network\n",
    "\n",
    "For multiple genes of interest, `build_multi_goi_features` returns a single feature matrix spanning all `(GOI, cell type)` conditions. Feeding it to `build_gene_network` yields one kNN graph; per-condition visualizations share the same node layout for easy comparison."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "gois = []  # your genes of interest, e.g. ['GENE1', 'GENE2']\n",
    "\n",
    "feat_multi = build_multi_goi_features(\n",
    "    gois,\n",
    "    data_dir=CORR_DB_DIR,\n",
    "    top_n=10,\n",
    "    name='myproject',\n",
    "    version='v01',\n",
    ")\n",
    "G_multi   = build_gene_network(feat_multi, k=5)\n",
    "pos_multi = nx.spring_layout(G_multi, weight='weight', seed=3)"
   ]
  },
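  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For intuition about what a kNN gene graph is, the sketch below links each row of a small feature matrix to its `k` nearest rows. It assumes Euclidean distance and unweighted, undirected edges; the metric and edge weighting that `build_gene_network` uses are not specified here, and `knn_edges` is a hypothetical helper.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def knn_edges(features, k):\n",
    "    # Pairwise Euclidean distances between rows (one row per gene).\n",
    "    diff = features[:, None, :] - features[None, :, :]\n",
    "    dist = np.sqrt((diff ** 2).sum(axis=-1))\n",
    "    np.fill_diagonal(dist, np.inf)  # a gene is not its own neighbour\n",
    "    nearest = np.argsort(dist, axis=1)[:, :k]\n",
    "    # Undirected edges: store each pair once as (low index, high index).\n",
    "    edges = set()\n",
    "    for i in range(len(features)):\n",
    "        for j in nearest[i]:\n",
    "            edges.add((min(i, int(j)), max(i, int(j))))\n",
    "    return edges\n",
    "\n",
    "# Two tight pairs of genes, far apart from each other.\n",
    "toy = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])\n",
    "edges = knn_edges(toy, k=1)\n",
    "print(sorted(edges))  # [(0, 1), (2, 3)]\n",
    "```\n",
    "\n",
    "Genes with similar feature profiles across conditions end up connected, which is why a single graph and layout can be reused for every per-condition plot."
   ]
  },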
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plot_clustermap(feat_multi);"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for goi in gois:\n",
    "    # Columns of the feature matrix that belong to this GOI.\n",
    "    cols = [c for c in feat_multi.columns if c.startswith(f'{goi}_')]\n",
    "    if not cols:\n",
    "        continue  # no matching conditions; plt.subplots cannot draw zero panels\n",
    "    fig, axes = plt.subplots(1, len(cols), figsize=(7 * len(cols), 7), squeeze=False)\n",
    "    for ax, cond in zip(axes.ravel(), cols):\n",
    "        plot_network(G_multi, feature_matrix=feat_multi, condition=cond, pos=pos_multi, ax=ax)\n",
    "    fig.suptitle(goi, fontsize=14, y=1.02)\n",
    "    fig.tight_layout()\n",
    "    plt.show()\n",
    "    plt.close(fig)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}