End-to-end gene network tutorial

Build a context-conditional gene correlation network from your own scRNA-seq data:

input directory (one h5ad per cell type)
    │  build_metacells_dir       → dict[cell_type, metacell AnnData]
    │  build_corr_db             → on-disk corr DB
    │  build_multi_goi_features  → feature matrix across (GOI × cell type)
    │  build_gene_network        → kNN gene network

Replace the placeholder paths and gene list with your own.

[ ]:
import warnings; warnings.filterwarnings('ignore')
from pathlib import Path

import matplotlib.pyplot as plt
import networkx as nx

from sceleto.network import (
    build_metacells_dir,
    build_corr_db,
    list_cell_types,
    build_gene_network,
    plot_network,
    plot_clustermap,
    build_multi_goi_features,
)

1. Prepare the input directory

build_metacells_dir reads every *.h5ad in a directory and treats each file as one cell type (file stem becomes the cell-type name). Point it at a clean directory containing only the files you want.

[ ]:
INPUT_DIR = Path('path/to/input_dir')
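
Before running the pipeline, it can help to preview which files will be picked up and what cell-type names they will produce. The sketch below is not part of sceleto: `preview_cell_types` is a hypothetical helper, and it assumes a non-recursive `*.h5ad` glob with the file stem used as the name, as described above. The demo uses empty placeholder files in a temporary directory.

```python
from pathlib import Path
import tempfile


def preview_cell_types(input_dir):
    """Map file stem -> path for every *.h5ad directly inside input_dir."""
    return {p.stem: p for p in sorted(Path(input_dir).glob('*.h5ad'))}


# Demo on a throwaway directory; the file names are made up for illustration.
with tempfile.TemporaryDirectory() as tmp:
    for fname in ('T_cell.h5ad', 'B_cell.h5ad', 'notes.txt'):
        (Path(tmp) / fname).touch()
    found = preview_cell_types(tmp)
    cell_type_names = list(found)  # non-h5ad files are ignored
    print(cell_type_names)
```

On your own data, `preview_cell_types(INPUT_DIR)` lists the stems that would become cell-type names, which makes stray or misnamed files easy to spot.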

2. Build metacells per cell type

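build_metacells_dir applies build_metacells to every h5ad in INPUT_DIR, building per-sample metacells within each cell type. Set sample_key to the .obs column that identifies samples; kNN graphs are built within each sample only, so cells from different samples are never pooled into one metacell. When output_dir is provided (the cache path passed as the second argument in the call below), results are cached and read back on re-run, so repeated calls are idempotent.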

[ ]:
metacells = build_metacells_dir(
    INPUT_DIR,
    'path/to/metacells_cache',
    sample_key='sample',
    counts='raw',
    min_cells_per_sample=100,
    prop=0.1,
    seed=42,
    verbose=True,
)
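
Because metacells are built per sample, small samples matter. The sketch below counts cells per sample and splits them by a size threshold. It rests on an assumption the text above does not confirm: that `min_cells_per_sample` skips samples with fewer cells than the threshold, with the boundary treated as inclusive. `samples_passing` is a hypothetical helper and the labels are toy data standing in for `adata.obs['sample']`.

```python
from collections import Counter


def samples_passing(sample_labels, min_cells=100):
    """Split samples into those with at least min_cells cells and the rest."""
    counts = Counter(sample_labels)
    kept = {s: n for s, n in counts.items() if n >= min_cells}  # inclusive: an assumption
    dropped = {s: n for s, n in counts.items() if n < min_cells}
    return kept, dropped


# Toy per-cell sample labels, one entry per cell.
labels = ['donor1'] * 250 + ['donor2'] * 40 + ['donor3'] * 100
kept, dropped = samples_passing(labels, min_cells=100)
print('kept:', kept)
print('dropped:', dropped)
```

Running a check like this on each input file before step 2 shows whether a cell type would be left with too few samples to be useful.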

2.5 (Optional) Unify var_names across cell types

build_corr_db requires every cell type’s metacell AnnData to share the same var_names. If your input files have differing gene sets (e.g. compartment-specific QC), intersect them first; you can also rename to gene symbols here for readability.

[ ]:
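# Keep only genes present in every cell type, in one shared (sorted) order.
common = sorted(set.intersection(*(set(mc.var_names) for mc in metacells.values())))
metacells = {ct: mc[:, common].copy() for ct, mc in metacells.items()}

# Optional: rename to gene symbols. Assumes .var has a 'feature_name' column
# holding symbols that agree across cell types; adjust the column to your data.
ref = next(iter(metacells.values())).copy()
ref.var_names = ref.var['feature_name'].astype(str).values
ref.var_names_make_unique()
unified_names = list(ref.var_names)

# Every AnnData now has the same genes in the same order, so one name list fits all.
for mc in metacells.values():
    mc.var_names = unified_names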
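
A quick sanity check before building the DB is to confirm that every cell type really carries the same gene list. The sketch below compares ordered name lists against the first entry. `check_shared_genes` is a hypothetical helper, and whether build_corr_db also requires identical ordering is an assumption; the strict comparison is simply the safer test.

```python
def check_shared_genes(gene_lists):
    """Return the cell types whose ordered gene list differs from the first entry.

    gene_lists: dict mapping cell type -> sequence of gene names.
    """
    items = iter(gene_lists.items())
    _, ref = next(items)
    ref = list(ref)
    return [ct for ct, genes in items if list(genes) != ref]


# Toy input: 'NK' has the same genes but in a different order.
toy = {
    'T_cell': ['A', 'B', 'C'],
    'B_cell': ['A', 'B', 'C'],
    'NK': ['A', 'C', 'B'],
}
mismatched = check_shared_genes(toy)
print(mismatched)
```

With real data the call would look like `check_shared_genes({ct: mc.var_names for ct, mc in metacells.items()})`; an empty list means the gene lists agree.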

3. Build the correlation DB

build_corr_db computes a per-cell-type (gene × gene) float16 correlation matrix from the metacell expression and writes it to disk under the {name}_..._{version} convention. With overwrite=False, repeated calls are idempotent; a gene_names mismatch raises an error rather than silently leaving stale state in place.

[ ]:
CORR_DB_DIR = 'path/to/corr_db'

build_corr_db(
    metacells,
    out_dir=CORR_DB_DIR,
    name='myproject',
    version='v01',
    overwrite=False,
    verbose=True,
)

list_cell_types(CORR_DB_DIR, name='myproject', version='v01')
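
To make the stored object concrete, here is a minimal NumPy sketch of a (gene × gene) correlation matrix computed from a metacell-by-gene matrix and cast to float16. It assumes Pearson correlation, which the text above does not specify, and it uses random toy data; build_corr_db's actual computation and on-disk layout may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy expression matrix: 50 metacells (rows) x 4 genes (columns).
X = rng.normal(size=(50, 4))
# Make gene 1 track gene 0 closely so one strong correlation is visible.
X[:, 1] = 2.0 * X[:, 0] + 0.01 * rng.normal(size=50)

# rowvar=False treats columns (genes) as the variables being correlated.
corr = np.corrcoef(X, rowvar=False).astype(np.float16)

print(corr.shape, corr.dtype)
print(corr[0, 1])
```

float16 uses 2 bytes per entry, so a dense matrix over 20,000 genes takes about 800 MB per cell type, half of what float32 would need, at the cost of roughly three decimal digits of precision.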

4. Multi-GOI integrated gene network

For multiple genes of interest, build_multi_goi_features returns a single feature matrix spanning all (GOI, cell type) conditions. Feeding it to build_gene_network yields one kNN graph; per-condition visualizations share the same node layout for easy comparison.

[ ]:
gois = []  # your genes of interest, e.g. ['GENE1', 'GENE2']

feat_multi = build_multi_goi_features(
    gois,
    data_dir=CORR_DB_DIR,
    top_n=10,
    name='myproject',
    version='v01',
)
G_multi   = build_gene_network(feat_multi, k=5)
pos_multi = nx.spring_layout(G_multi, weight='weight', seed=3)
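
For intuition about the kNN step, the sketch below builds an undirected k-nearest-neighbour edge list from a small gene-by-condition matrix. It is a generic illustration, not build_gene_network itself: the Euclidean metric, the union-style symmetrization, and the unweighted edges are all assumptions, and `knn_edges` is a hypothetical helper.

```python
import numpy as np


def knn_edges(features, k):
    """Link each row (gene) to its k nearest rows by Euclidean distance.

    Returns sorted undirected edges (i, j) with i < j.
    """
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a gene is not its own neighbour
    edges = set()
    for i in range(features.shape[0]):
        for j in np.argsort(dist[i])[:k]:
            j = int(j)
            edges.add((min(i, j), max(i, j)))  # keep the edge if either side picks it
    return sorted(edges)


# Toy features: genes 0 and 1 are close together, as are genes 2 and 3.
feats = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
edges = knn_edges(feats, k=1)
print(edges)
```

Genes with similar profiles across the (GOI, cell type) conditions end up connected, which is why a single graph can be reused for every per-condition plot.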
[ ]:
plot_clustermap(feat_multi);
[ ]:
for goi in gois:
    # Select this GOI's conditions by the '{goi}_' column prefix.
    cols = [c for c in feat_multi.columns if c.startswith(f'{goi}_')]
    if not cols:
        continue  # nothing to plot; plt.subplots fails with zero columns
    fig, axes = plt.subplots(1, len(cols), figsize=(7*len(cols), 7), squeeze=False)
    for ax, cond in zip(axes.ravel(), cols):
        plot_network(G_multi, feature_matrix=feat_multi, condition=cond, pos=pos_multi, ax=ax)
    fig.suptitle(goi, fontsize=14, y=1.02)
    fig.tight_layout()
    plt.show()
    plt.close(fig)