sceleto.network
Correlation-based Gene Network
- sceleto.network.compute_corr(adata: AnnData, gene: str, label: str | None = None, layer: str | None = None, chunk_size: int = 4096) -> DataFrame
Pearson correlation of gene against all genes in adata.
Parameters
- adata
Input AnnData.
- gene
Gene of interest; must be in adata.var_names.
- label
Column prefix for output. Falls back to adata.uns["label"], then "sample".
- layer
Layer to use instead of adata.X.
- chunk_size
Number of genes processed per chunk (controls memory use).
Returns
- pd.DataFrame
Columns: gene, {label}_corr, {label}_pval.
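The chunk_size parameter suggests that correlations are computed over blocks of genes rather than all at once. A minimal NumPy sketch of that idea follows; it is an illustration under stated assumptions, not the sceleto implementation. The function name chunked_pearson is hypothetical, the input is assumed to be a dense cells × genes array, and p-values are omitted.

```python
import numpy as np


def chunked_pearson(X, gene_idx, chunk_size=4096):
    """Pearson r between column `gene_idx` and every column of X (cells x genes).

    Hypothetical helper: processes `chunk_size` genes at a time so that only
    one centered block is held in memory.
    """
    X = np.asarray(X, dtype=np.float64)
    n_genes = X.shape[1]

    # Center the gene of interest once.
    target = X[:, gene_idx] - X[:, gene_idx].mean()
    target_norm = np.sqrt((target ** 2).sum())

    corr = np.full(n_genes, np.nan)
    for start in range(0, n_genes, chunk_size):
        stop = min(start + chunk_size, n_genes)
        block = X[:, start:stop]
        block = block - block.mean(axis=0)
        block_norm = np.sqrt((block ** 2).sum(axis=0))

        num = target @ block
        denom = target_norm * block_norm
        r = np.full(stop - start, np.nan)
        # Constant genes (zero variance) are left as NaN.
        np.divide(num, denom, out=r, where=denom > 0)
        corr[start:stop] = r
    return corr


# Toy data: 50 cells, 10 genes, processed in chunks of 4 genes.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
r = chunked_pearson(X, gene_idx=3, chunk_size=4)
```

A smaller chunk_size lowers peak memory at the cost of more loop iterations; the result does not depend on the chunk size.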
- sceleto.network.build_corr_matrix(adatas: dict[str, AnnData], gene: str, layer: str | None = None, chunk_size: int = 4096) -> DataFrame
Compute per-condition correlation for gene across multiple AnnData objects.
Parameters
- adatas
{label: AnnData} mapping. The key is used as the column prefix.
- gene
Gene of interest.
- layer
Layer to use instead of adata.X.
- chunk_size
Passed to compute_corr().
Returns
- pd.DataFrame
Wide table: a gene column plus {label}_corr and {label}_pval columns for each condition.
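To make the wide layout concrete, the sketch below joins per-condition tables on the gene column with pandas. It is only an illustration: merge_conditions is a hypothetical helper, the toy values are invented, and the outer join (keeping genes present in only some conditions) is an assumption about how missing genes are handled.

```python
from functools import reduce

import pandas as pd


def merge_conditions(per_condition):
    """Join {label: DataFrame} tables on 'gene' into one wide table.

    Each input table is assumed to have the columns
    gene, {label}_corr, {label}_pval.
    """
    tables = list(per_condition.values())
    # Assumption: an outer join, so genes missing in a condition become NaN.
    return reduce(lambda left, right: left.merge(right, on="gene", how="outer"), tables)


# Toy per-condition tables (values are made up for illustration).
per_condition = {
    "ctrl": pd.DataFrame(
        {"gene": ["A", "B"], "ctrl_corr": [0.8, 0.1], "ctrl_pval": [0.001, 0.6]}
    ),
    "stim": pd.DataFrame(
        {"gene": ["B", "C"], "stim_corr": [0.7, 0.3], "stim_pval": [0.003, 0.1]}
    ),
}
wide = merge_conditions(per_condition)
```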
- sceleto.network.select_top_genes(corr_df: DataFrame, top_n: int = 10, conditions: list[str] | None = None, exclude_gene: str | None = None) -> DataFrame
Select the top_n most positively correlated genes per condition.
Parameters
- corr_df
Wide table from build_corr_matrix() or load_corr_db().
- top_n
Number of top genes to keep per condition.
- conditions
Subset of condition labels (column prefixes, i.e. without the _corr suffix). If None, all *_corr columns are used.
- exclude_gene
Gene name to exclude (typically the gene of interest itself). Removes rows where gene == exclude_gene or corr >= 1.0.
Returns
- pd.DataFrame
Long-form: condition, gene, corr, pval.
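The wide-to-long selection described above can be sketched with pandas as follows. This is an illustrative approximation of the documented behaviour, not the library's code; top_genes_long is a hypothetical name and the toy values are invented.

```python
import pandas as pd


def top_genes_long(corr_df, top_n=10, exclude_gene=None):
    """Return the top_n most positively correlated genes per condition (long form)."""
    parts = []
    for col in [c for c in corr_df.columns if c.endswith("_corr")]:
        label = col[: -len("_corr")]
        sub = corr_df[["gene", col, f"{label}_pval"]].rename(
            columns={col: "corr", f"{label}_pval": "pval"}
        )
        if exclude_gene is not None:
            # Mirrors the documented rule: drop the gene itself and any corr >= 1.0.
            sub = sub[(sub["gene"] != exclude_gene) & (sub["corr"] < 1.0)]
        sub = sub.nlargest(top_n, "corr")
        sub.insert(0, "condition", label)
        parts.append(sub)
    return pd.concat(parts, ignore_index=True)


# Toy wide table with two conditions (values are made up).
corr_df = pd.DataFrame(
    {
        "gene": ["CD55", "A", "B", "C"],
        "ctrl_corr": [1.0, 0.8, 0.1, 0.5],
        "ctrl_pval": [0.0, 0.001, 0.6, 0.02],
        "stim_corr": [1.0, -0.2, 0.7, 0.3],
        "stim_pval": [0.0, 0.4, 0.003, 0.1],
    }
)
top = top_genes_long(corr_df, top_n=2, exclude_gene="CD55")
```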
- sceleto.network.build_feature_matrix(top_genes_df: DataFrame, corr_df: DataFrame) -> DataFrame
Build a gene × conditions correlation matrix for network construction.
Parameters
- top_genes_df
Long-form output of select_top_genes().
- corr_df
Wide table from build_corr_matrix().
Returns
- pd.DataFrame
Index = unique genes, columns = condition labels, values = corr (NaN filled with 0.0).
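As a rough picture of the output shape, the following pandas sketch builds such a matrix from toy inputs. It assumes the condition labels are recovered by stripping the _corr suffix from the wide-table columns; feature_matrix_from is a hypothetical name, and the values are invented.

```python
import pandas as pd


def feature_matrix_from(top_genes_df, corr_df):
    """Gene x conditions matrix of correlations for the selected genes."""
    genes = pd.unique(top_genes_df["gene"])
    corr_cols = [c for c in corr_df.columns if c.endswith("_corr")]
    fm = corr_df.set_index("gene").loc[genes, corr_cols]
    # Column labels become the condition names (prefix without "_corr").
    fm.columns = [c[: -len("_corr")] for c in corr_cols]
    return fm.fillna(0.0)


# Toy inputs (values are made up); gene "A" has no correlation for "stim".
corr_df = pd.DataFrame(
    {
        "gene": ["CD55", "A", "B", "C"],
        "ctrl_corr": [1.0, 0.8, 0.1, 0.5],
        "ctrl_pval": [0.0, 0.001, 0.6, 0.02],
        "stim_corr": [1.0, float("nan"), 0.7, 0.3],
        "stim_pval": [0.0, float("nan"), 0.003, 0.1],
    }
)
top_genes_df = pd.DataFrame(
    {
        "condition": ["ctrl", "ctrl", "stim", "stim"],
        "gene": ["A", "C", "B", "C"],
        "corr": [0.8, 0.5, 0.7, 0.3],
        "pval": [0.001, 0.02, 0.003, 0.1],
    }
)
fm = feature_matrix_from(top_genes_df, corr_df)
```

Each selected gene gets a value for every condition, not only the condition in which it ranked among the top genes.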
- sceleto.network.build_gene_network(feature_matrix: DataFrame, k: int = 5, metric: str = 'euclidean') -> Graph
Build a k-NN gene network from a feature matrix.
Parameters
- feature_matrix
Gene × conditions matrix (output of build_feature_matrix()).
- k
Number of nearest neighbours per gene.
- metric
Distance metric passed to scipy.spatial.distance.pdist.
Returns
- networkx.Graph
Nodes = gene names; edge attributes: dist, weight.
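The k-NN construction can be illustrated without the library. The sketch below uses plain NumPy with Euclidean distance and returns an edge dictionary rather than a networkx.Graph. The name knn_edges is hypothetical, and the weight formula 1 / (1 + dist) is an assumption made for illustration; the documentation does not state how weight is derived from dist.

```python
import numpy as np


def knn_edges(features, names, k=5):
    """Undirected k-NN edges between rows of `features` (genes x conditions)."""
    features = np.asarray(features, dtype=float)
    # Pairwise Euclidean distances via broadcasting.
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a gene is never its own neighbour

    k = min(k, len(names) - 1)
    edges = {}
    for i, row in enumerate(dist):
        for j in np.argsort(row)[:k]:
            key = tuple(sorted((names[i], names[j])))
            d = float(row[j])
            # Assumed weight: closer genes get larger weights.
            edges[key] = {"dist": d, "weight": 1.0 / (1.0 + d)}
    return edges


# Toy feature matrix: two tight pairs of genes.
names = ["A", "B", "C", "D"]
features = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
edges = knn_edges(features, names, k=1)
```

Because the neighbour relation is not symmetric, a gene can end up with more than k edges once the graph is made undirected.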
- sceleto.network.plot_network(G: Graph, feature_matrix: DataFrame | None = None, condition: str | None = None, pos: dict | None = None, seed: int = 3, figsize: tuple[int, int] = (15, 15), node_size_range: tuple[int, int] = (50, 600), cmap: str = 'coolwarm', ax: Axes | None = None) -> Figure
Draw a gene network with optional per-condition node coloring.
Parameters
- G
networkx Graph from build_gene_network().
- feature_matrix
Gene × conditions matrix. Required when condition is set.
- condition
Column in feature_matrix to use for node color/size.
- pos
Pre-computed layout positions. If None, a spring layout is computed.
- seed
Random seed for the spring layout.
- figsize
Figure size as (width, height).
- node_size_range
(min_size, max_size) when coloring by condition.
- cmap
Colormap name for condition coloring.
- ax
Existing Axes to draw on.
Returns
- matplotlib Figure
- sceleto.network.plot_clustermap(feature_matrix: DataFrame, figsize: tuple[int, int] = (15, 35), cmap: str = 'coolwarm', max_genes: int = 96) -> ClusterGrid
Hierarchically clustered heatmap of the feature matrix.
Parameters
- feature_matrix
Gene × conditions matrix.
- figsize
Figure size as (width, height).
- cmap
Colormap name.
- max_genes
If the matrix has more genes than this, keep the top max_genes by mean |corr|.
Returns
- seaborn ClusterGrid
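The max_genes rule (keep the genes with the largest mean absolute correlation) can be sketched in pandas as below. This illustrates the filtering step only and does not draw the heatmap; keep_top_genes is a hypothetical name and the toy values are invented.

```python
import pandas as pd


def keep_top_genes(feature_matrix, max_genes=96):
    """Keep at most max_genes rows, ranked by mean |corr| across conditions."""
    if len(feature_matrix) <= max_genes:
        return feature_matrix
    score = feature_matrix.abs().mean(axis=1)
    keep = score.nlargest(max_genes).index
    return feature_matrix.loc[keep]


# Toy feature matrix (values are made up).
fm = pd.DataFrame(
    {"ctrl": [0.9, 0.1, -0.5], "stim": [-0.8, 0.0, 0.4]},
    index=["A", "B", "C"],
)
filtered = keep_top_genes(fm, max_genes=2)
```

Using the absolute value means strongly negative correlations count as much as strongly positive ones when ranking genes.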
- sceleto.network.corr_pangea(gene: str, data_dir: str, cell_types: list[str] | None = None, top_n: int = 10, k: int = 5) -> tuple[DataFrame, DataFrame, Graph]
One-shot gene network from the PANGEA pre-computed correlation DB.
Parameters
- gene
Gene of interest (e.g. "CD55").
- data_dir
Directory containing pangea_corr_{CT}_v03.csv.gz files.
- cell_types
Subset of cell types. None = all 6.
- top_n
Number of top correlated genes per cell type.
- k
Number of nearest neighbours for the kNN gene network.
Returns
- corr_df : pd.DataFrame
Wide table (gene + per-cell-type corr/pval).
- feature_matrix : pd.DataFrame
Gene × conditions correlation matrix.
- G : networkx.Graph
kNN gene network.
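A possible way to combine corr_pangea() with plot_network(), based only on the signatures documented on this page, is sketched below. The wrapper name pangea_network is hypothetical, the choice of the first feature-matrix column as the coloring condition is arbitrary, and the call requires sceleto plus a local copy of the PANGEA files, neither of which is assumed to be present when the helper is defined.

```python
def pangea_network(gene, data_dir, top_n=10, k=5):
    """Build and draw a PANGEA gene network for one gene (hypothetical wrapper)."""
    # Imported lazily so that defining this helper does not require sceleto.
    import sceleto.network as net

    corr_df, feature_matrix, G = net.corr_pangea(gene, data_dir, top_n=top_n, k=k)

    # Color nodes by one condition; here simply the first column.
    condition = feature_matrix.columns[0]
    fig = net.plot_network(G, feature_matrix=feature_matrix, condition=condition)
    return corr_df, feature_matrix, G, fig


# Example call (the path is a placeholder, not a real location):
# corr_df, feature_matrix, G, fig = pangea_network("CD55", "/path/to/pangea_db")
```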
Correlation Database
- sceleto.network.list_cell_types(data_dir: str | Path | None = None, name: str = 'pangea', version: str = 'v03') -> list[str]
Return available cell types in a corr database.
Parameters
- data_dir
Directory containing the DB. If None, returns the PANGEA defaults without touching disk.
- name, version
DB identifiers (only used when data_dir is given).
Returns
- list[str]
Cell type keys (read from {name}_n_obs_{version}.json).
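Reading cell type keys from the n_obs file might look like the standard-library sketch below. It assumes the JSON file holds a single object whose keys are the cell types (the values are presumably observation counts); that layout, the helper name cell_types_from_json, and the toy file are all assumptions for illustration.

```python
import json
import tempfile
from pathlib import Path


def cell_types_from_json(data_dir, name="pangea", version="v03"):
    """Return the top-level keys of {name}_n_obs_{version}.json."""
    path = Path(data_dir) / f"{name}_n_obs_{version}.json"
    with path.open() as fh:
        n_obs = json.load(fh)
    return list(n_obs)


# Toy database directory with a made-up n_obs file.
with tempfile.TemporaryDirectory() as tmp:
    toy = Path(tmp) / "toy_n_obs_v00.json"
    toy.write_text(json.dumps({"T": 1200, "B": 800}))
    found = cell_types_from_json(tmp, name="toy", version="v00")
```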
- sceleto.network.load_corr_db(gene: str, data_dir: str | Path, cell_types: list[str] | None = None, name: str = 'pangea', version: str = 'v03') -> DataFrame
Load pre-computed correlations for a gene of interest.
Uses memory-mapped npy files for fast random row access.
Parameters
- gene
Gene name (must exist in the corr database).
- data_dir
Directory containing {name}_corr_{CT}_{version}.npy, {name}_gene_names_{version}.npy, and {name}_n_obs_{version}.json.
- cell_types
Subset of cell types to load. None = all (auto-discovered from {name}_n_obs_{version}.json).
- name, version
DB identifiers. Defaults are PANGEA ("pangea" / "v03").
Returns
- pd.DataFrame
Wide table: a gene column plus {CT}_corr and {CT}_pval columns for each cell type. Compatible with select_top_genes().
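Memory-mapped row access, as mentioned in the description above, can be illustrated with NumPy alone. The sketch assumes each correlation file stores a square genes × genes matrix whose row order matches the gene-names file; that layout is a guess, and load_gene_row and the toy file names are hypothetical.

```python
import tempfile
from pathlib import Path

import numpy as np


def load_gene_row(corr_path, names_path, gene):
    """Read one gene's correlation row without loading the full matrix."""
    names = np.load(names_path)
    idx = np.flatnonzero(names == gene)
    if idx.size == 0:
        raise KeyError(f"{gene!r} not found in gene names")
    # mmap_mode="r" maps the file; only the requested row is read from disk.
    corr = np.load(corr_path, mmap_mode="r")
    return names, np.array(corr[idx[0]])


# Toy database: 3 genes, one cell type (values are made up).
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    matrix = np.array([[1.0, 0.2, -0.4], [0.2, 1.0, 0.6], [-0.4, 0.6, 1.0]])
    np.save(tmp / "toy_corr_T_v00.npy", matrix)
    np.save(tmp / "toy_gene_names_v00.npy", np.array(["A", "B", "C"]))
    names, row = load_gene_row(
        tmp / "toy_corr_T_v00.npy", tmp / "toy_gene_names_v00.npy", "B"
    )
```

The row is copied into a regular array before returning, so the memory map can be released as soon as the function exits.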
Legacy
- sceleto.network.get_grid(bdata, scale=1, border=2, expand=3, select_per_grid=5, min_count=2, n_neighbor=10)