End-to-end gene network tutorial¶
Build a context-conditional gene correlation network from your own scRNA-seq data:
input directory (one h5ad per cell type)
│ build_metacells_dir → dict[cell_type, metacell AnnData]
│ build_corr_db → on-disk corr DB
│ build_multi_goi_features → feature matrix across (GOI × cell type)
│ build_gene_network → kNN gene network
Replace the placeholder paths and gene list with your own.
[ ]:
import warnings; warnings.filterwarnings('ignore')
from pathlib import Path
import matplotlib.pyplot as plt
import networkx as nx
from sceleto.network import (
    build_metacells_dir,
    build_corr_db,
    list_cell_types,
    build_gene_network,
    plot_network,
    plot_clustermap,
    build_multi_goi_features,
)
1. Prepare the input directory¶
build_metacells_dir reads every *.h5ad in a directory and treats each file as one cell type (file stem becomes the cell-type name). Point it at a clean directory containing only the files you want.
[ ]:
INPUT_DIR = Path('path/to/input_dir')
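Before building metacells, it can help to preview which files will be picked up and what cell-type names they will produce. The sketch below uses only `pathlib`; `preview_cell_types` is a hypothetical helper, not part of sceleto, and it assumes that only files directly inside the directory (no subfolders) are matched.

```python
import tempfile
from pathlib import Path

def preview_cell_types(input_dir):
    """Return {file stem: path} for every *.h5ad directly inside input_dir."""
    return {p.stem: p for p in sorted(Path(input_dir).glob('*.h5ad'))}

# Demo on a throwaway directory with empty placeholder files (made-up names).
with tempfile.TemporaryDirectory() as tmp:
    for fname in ('T_cell.h5ad', 'B_cell.h5ad', 'notes.txt'):
        (Path(tmp) / fname).touch()
    found = preview_cell_types(tmp)
    cell_type_names = list(found)
    print(cell_type_names)  # non-h5ad files such as notes.txt are skipped
```

On real data, call `preview_cell_types(INPUT_DIR)` and check that every listed stem is a cell type you intend to include.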
2. Build metacells per cell type¶
build_metacells_dir applies build_metacells to every h5ad in INPUT_DIR, building per-sample metacells within each cell type. Set sample_key to the .obs column that identifies samples; kNN graphs are built within each sample only. When output_dir is provided, results are cached and read back on re-run, so repeated calls are idempotent.
[ ]:
metacells = build_metacells_dir(
    INPUT_DIR,
    'path/to/metacells_cache',
    sample_key='sample',
    counts='raw',
    min_cells_per_sample=100,
    prop=0.1,
    seed=42,
    verbose=True,
)
2.5 (Optional) unify var_names across cell types¶
build_corr_db requires every cell type’s metacell AnnData to share the same var_names. If your input files have differing gene sets (e.g. compartment-specific QC), intersect them first; you can also rename to gene symbols here for readability.
[ ]:
# Keep only genes present in every cell type, in one shared order.
common = sorted(set.intersection(*(set(mc.var_names) for mc in metacells.values())))
metacells = {ct: mc[:, common].copy() for ct, mc in metacells.items()}

# Derive unique gene symbols once from a reference copy, then apply them everywhere.
ref = next(iter(metacells.values())).copy()
ref.var_names = ref.var['feature_name'].astype(str).values
ref.var_names_make_unique()
unified_names = list(ref.var_names)
for mc in metacells.values():
    mc.var_names = unified_names
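A quick sanity check after unifying can catch a mismatch before the correlation step does. The following is a minimal sketch with a hypothetical helper, `check_shared_gene_names`, that works on plain lists; it assumes that gene order matters as well as gene membership, which is the stricter reading of "share the same var_names".

```python
def check_shared_gene_names(gene_names_by_cell_type):
    """Raise ValueError unless every cell type has the same genes in the same order."""
    items = iter(gene_names_by_cell_type.items())
    ref_ct, ref_names = next(items)
    ref_names = list(ref_names)
    for ct, names in items:
        if list(names) != ref_names:
            raise ValueError(f'{ct!r} gene names differ from {ref_ct!r}')
    return ref_names

# Toy input with made-up gene names; with real data, pass
# {ct: mc.var_names for ct, mc in metacells.items()} instead.
shared = check_shared_gene_names({
    'T_cell': ['GENE_A', 'GENE_B'],
    'B_cell': ['GENE_A', 'GENE_B'],
})
print(len(shared), 'shared genes')
```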
3. Build the correlation DB¶
build_corr_db computes a per-cell-type (gene × gene) float16 correlation matrix from the metacell expression and writes it to disk under the {name}_..._{version} convention. With overwrite=False, calls are idempotent; a gene_names mismatch raises an error, which prevents silently reusing stale state.
[ ]:
CORR_DB_DIR = 'path/to/corr_db'
build_corr_db(
    metacells,
    out_dir=CORR_DB_DIR,
    name='myproject',
    version='v01',
    overwrite=False,
    verbose=True,
)
list_cell_types(CORR_DB_DIR, name='myproject', version='v01')
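To make the stored object concrete, here is a rough sketch of a (gene × gene) float16 correlation matrix computed from a toy metacell-by-gene matrix. It uses Pearson correlation via `np.corrcoef`, which is an assumption: the tutorial does not say which correlation measure or preprocessing build_corr_db applies, and the data below is random.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy metacell-by-gene expression matrix: 50 metacells, 4 genes (made-up data).
expr = rng.normal(size=(50, 4))
expr[:, 1] = expr[:, 0] + 0.1 * rng.normal(size=50)  # gene 1 closely tracks gene 0

# Pearson correlation between genes (columns), then downcast for compact storage.
corr = np.corrcoef(expr, rowvar=False).astype(np.float16)

print(corr.shape, corr.dtype)
```

float16 keeps roughly three significant decimal digits, which is usually enough to rank correlations while halving the footprint relative to float32.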
4. Multi-GOI integrated gene network¶
For multiple genes of interest, build_multi_goi_features returns a single feature matrix spanning all (GOI, cell type) conditions. Feeding it to build_gene_network yields one kNN graph; per-condition visualizations share the same node layout for easy comparison.
[ ]:
gois = []  # your genes of interest, e.g. ['GENE1', 'GENE2']
feat_multi = build_multi_goi_features(
    gois,
    data_dir=CORR_DB_DIR,
    top_n=10,
    name='myproject',
    version='v01',
)
G_multi = build_gene_network(feat_multi, k=5)
pos_multi = nx.spring_layout(G_multi, weight='weight', seed=3)
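For intuition about what a kNN gene network over a feature matrix looks like, the sketch below links each row (gene) to its k most similar rows. It is not build_gene_network's implementation: cosine similarity, the union-style symmetrization, and the `knn_edges` helper are all assumptions made for illustration.

```python
import numpy as np

def knn_edges(features, k):
    """Return sorted undirected edges (i, j, similarity) linking each row
    to its k most cosine-similar other rows."""
    X = np.asarray(features, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X = X / np.where(norms == 0, 1.0, norms)  # avoid dividing by zero
    sim = X @ X.T
    np.fill_diagonal(sim, -np.inf)  # exclude self-matches
    edges = {}
    for i in range(sim.shape[0]):
        for j in np.argsort(sim[i])[::-1][:k]:
            a, b = sorted((i, int(j)))
            edges[(a, b)] = float(sim[i, j])  # keep one edge per unordered pair
    return sorted((a, b, w) for (a, b), w in edges.items())

# Four toy "genes" described by two conditions (made-up values).
toy = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
edges = knn_edges(toy, k=1)
print([(a, b) for a, b, _ in edges])
```

Rows with similar profiles across conditions end up connected, which is why a single graph built from the multi-GOI matrix can be reused for every per-condition plot.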
[ ]:
plot_clustermap(feat_multi);
[ ]:
for goi in gois:
    # One panel per (GOI, cell type) condition column belonging to this GOI.
    cols = [c for c in feat_multi.columns if c.startswith(f'{goi}_')]
    if not cols:
        continue  # plt.subplots cannot create a figure with zero columns
    fig, axes = plt.subplots(1, len(cols), figsize=(7*len(cols), 7), squeeze=False)
    for ax, cond in zip(axes.ravel(), cols):
        plot_network(G_multi, feature_matrix=feat_multi, condition=cond, pos=pos_multi, ax=ax)
    fig.suptitle(goi, fontsize=14, y=1.02)
    plt.tight_layout(); plt.show(); plt.close(fig)