Utilities¶

Dotplot¶

sceleto.dotplot(adata, var_names: Sequence[str] | Mapping[str, Sequence], groupby: str, *, max_scale: bool = True, groups: Sequence[str] | None = None, swap_axes: bool = False, use_raw: bool = True, dendrogram: bool = False, cmap: str = 'OrRd', figsize: Tuple[float, float] | None = None, save: str | None = None, show: bool = True, **kwargs)[source]¶

Dotplot built on scanpy.pl.dotplot, with optional per-gene max-scaling.

Size encodes fraction of cells expressing the gene (scanpy default). Color depends on max_scale:

max_scale=True (default): group_mean(gene) / max_group(group_mean(gene)) per gene, so vmax=1 always corresponds to the highest-expressing group.
max_scale=False: plain group mean of (log1p) expression with scanpy’s automatic color scaling — i.e. exactly what scanpy.pl.dotplot shows by default.
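
The max_scale=True formula can be checked by hand with a small sketch in plain Python. The gene names and group means below are made-up values, and returning zeros for a gene that is not expressed in any group is an assumption of this sketch, not documented sceleto behavior.

```python
# Hypothetical group means of log1p expression: gene -> {group: mean}.
group_means = {
    "CD3E": {"T cell": 2.0, "B cell": 0.5, "NK": 1.0},
    "MS4A1": {"T cell": 0.0, "B cell": 3.0, "NK": 0.3},
}


def max_scale(means_by_group):
    """Divide each group mean by the largest group mean for that gene."""
    top = max(means_by_group.values())
    if top == 0:
        # Gene not expressed in any group: keep everything at 0 (assumed handling).
        return {group: 0.0 for group in means_by_group}
    return {group: mean / top for group, mean in means_by_group.items()}


scaled = {gene: max_scale(means) for gene, means in group_means.items()}
for gene, values in scaled.items():
    print(gene, values)
```

After scaling, each gene's highest-expressing group sits at 1, so colors compare groups within a gene and not absolute expression between genes.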

Follows scanpy’s default axis orientation: genes on x-axis, groups on y-axis. Pass swap_axes=True to put genes on y-axis, groups on x-axis.

Parameters¶

adata: AnnData with log1p-normalized expression.
var_names: Gene list or {bracket_name: [gene, ...]} / {bracket_name: [(gene, score), ...]} mapping. Mappings render as bracket-grouped labels via scanpy.
groupby: Column in adata.obs to group cells by.
max_scale: If True (default), color = per-gene max-normalized group mean with vmin=0, vmax=1. If False, color = raw group mean (log1p) with scanpy’s automatic color scaling, reproducing scanpy.pl.dotplot’s default; vmin/vmax/standard_scale may then be passed through.
groups: Subset of groups to display. None shows all.
swap_axes: If True, genes on y-axis, groups on x-axis (swaps scanpy default).
use_raw: If True (default), read from adata.raw.X. If False, read from adata.X. Both sources are checked for log1p normalization.
cmap: Matplotlib colormap for color scale (default OrRd).
figsize: Manual (width, height) in inches.
save: Path to save figure (PDF, dpi=300).
show: Whether to call plt.show().
**kwargs: Forwarded to scanpy.pl.dotplot.
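
The three accepted shapes of var_names can be compared with a small sketch that reduces each one to plain gene names. The helper to_gene_names, the gene names, and the scores are hypothetical. Whether sceleto drops the scores in exactly this way internally is an assumption.

```python
def to_gene_names(var_names):
    """Reduce var_names to plain gene names, keeping bracket grouping if present."""

    def gene_of(item):
        # A (gene, score) tuple contributes only its gene name.
        return item[0] if isinstance(item, tuple) else item

    if isinstance(var_names, dict):
        return {
            bracket: [gene_of(item) for item in items]
            for bracket, items in var_names.items()
        }
    return [gene_of(item) for item in var_names]


# Made-up inputs covering the three documented shapes.
flat = ["CD3E", "MS4A1"]
grouped = {"T cell": ["CD3E", "CD8A"], "B cell": ["MS4A1"]}
scored = {"T cell": [("CD3E", 0.92), ("CD8A", 0.81)], "B cell": [("MS4A1", 0.95)]}

print(to_gene_names(flat))
print(to_gene_names(grouped))
print(to_gene_names(scored))
```

In this sketch the scored mapping reduces to the same structure as the plain grouped mapping, which is the shape scanpy draws as bracket-grouped labels.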

Annotator¶

Build cell-type annotations incrementally by mapping cluster IDs to labels.

import sceleto as scl

# start a new 'celltype' column (all cells begin as 'unknown')
ann = scl.Annotator(adata, 'celltype')

# one call = one cluster -> one label (exact string match)
ann.annotate('leiden', '0', 'T cell')
ann.annotate('leiden', '1', 'B cell')

# label several clusters at once by looping a dict
for cluster, label in {'2': 'Monocyte', '3': 'Monocyte', '4': 'NK'}.items():
    ann.annotate('leiden', cluster, label)

# only fill in cells still left as 'unknown'
ann.annotate('leiden', '5', 'other', unknown_only=True)

ann.summary()   # value counts of the current labels

class sceleto.Annotator(adata, label_key: str, copy_from: str | None = None)[source]¶

Bases: object

Build cell-type annotations incrementally on an AnnData object.

Parameters¶

adata: AnnData object.
label_key: Name of the new column in adata.obs.
copy_from: If given, initialize from an existing adata.obs column.

annotate(obs_key: str, select: str, label: str, unknown_only: bool = False)[source]¶

Assign label to cells whose obs_key value equals select.

One call = one decision. select is a single string matched exactly against adata.obs[obs_key]; no list or comma splitting. To label multiple groups with the same label, call repeatedly (e.g. in a dict loop).

Parameters¶

obs_key: Column in adata.obs to match against (e.g. ‘leiden’, ‘leiden_R’).
select: Exact value to match in adata.obs[obs_key].
label: The annotation label to assign.
unknown_only: If True, only update cells still labeled ‘unknown’.
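
The matching rules above can be illustrated without an AnnData object. The function below is a hypothetical stand-in that works on plain lists, not the actual Annotator implementation. It only mirrors the documented behavior: exact string equality and the unknown_only guard.

```python
def annotate(labels, obs_values, select, label, unknown_only=False):
    """Return a new label list; cells whose obs value equals `select` get `label`."""
    updated = []
    for current, value in zip(labels, obs_values):
        matches = value == select  # exact string equality, no comma splitting
        allowed = (current == "unknown") or not unknown_only
        updated.append(label if matches and allowed else current)
    return updated


leiden = ["0", "1", "0", "5", "5"]
celltype = ["unknown"] * len(leiden)

celltype = annotate(celltype, leiden, "0", "T cell")
# "0,1" is compared as one string, so it matches no cluster and changes nothing.
celltype = annotate(celltype, leiden, "0,1", "B cell")
# Pretend one cell of cluster 5 was already labelled in an earlier step.
celltype[3] = "NK"
celltype = annotate(celltype, leiden, "5", "other", unknown_only=True)
print(celltype)
```

In this sketch the already-labelled cell keeps its label, and only the remaining 'unknown' cell in cluster 5 becomes 'other'.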

annotate_mask(mask, label: str)[source]¶: Assign label to cells matching a boolean mask directly.

summary()[source]¶: Print value counts of current annotations.

UMAP¶

sceleto.us(adata, gene, groups=None, show=False, exclude=None, figsize=None, **kwargs)[source]¶

03/10/2022

Create a UMAP using a list of genes.

Parameters¶

adata: AnnData object. Required.
gene: List of genes to use for the UMAP. A comma-separated string can be used instead of a list. Required.
groups: String restricting the plot to a few categories of a categorical observation annotation. Optional.
show: Whether to show the plot. Default False.
exclude: List of genes to exclude. Optional.
figsize: Figure size (float). Optional.
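
The two accepted forms of gene can be sketched as a small normalization step. The helper name normalize_genes is hypothetical, and trimming whitespace around each name is an assumption of this sketch. The source states only that a comma-separated string may replace a list.

```python
def normalize_genes(gene):
    """Accept a list of genes or a comma-separated string; return a list of names."""
    if isinstance(gene, str):
        # Split on commas and drop empty pieces such as a trailing comma.
        return [name.strip() for name in gene.split(",") if name.strip()]
    return list(gene)


print(normalize_genes("CD3E, CD8A,GZMB"))
print(normalize_genes(["CD3E", "GZMB"]))
```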

Preprocessing¶

sceleto.sc_process(adata, steps: str = 'fspkuc', n_pcs: int = 50)[source]¶

Scanpy preprocessing pipeline controlled by a step string.

Each letter in steps triggers one preprocessing step, executed in order:

n	normalize_total (1e4)
l	log1p + store .raw
f	highly_variable_genes + filter
r	remove cell-cycle genes
s	scale (max_value=10)
p	PCA
k	kNN neighbors
u	UMAP
c	leiden clustering

Parameters¶

adata: AnnData object.
steps: Letters selecting which steps to run. Default "fspkuc".
n_pcs: Number of PCs for neighbor search. Default 50.
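
The letter-to-step mapping can be sketched as a lookup that expands a step string into an ordered plan. This sketch assumes the steps run in the order the letters appear in the string. The source says only "executed in order", so the real function may follow the fixed table order instead. Rejecting unknown letters is also an assumption, and plan_steps is a hypothetical name.

```python
# Letter -> step description, copied from the sc_process step table.
STEP_NAMES = {
    "n": "normalize_total (1e4)",
    "l": "log1p + store .raw",
    "f": "highly_variable_genes + filter",
    "r": "remove cell-cycle genes",
    "s": "scale (max_value=10)",
    "p": "PCA",
    "k": "kNN neighbors",
    "u": "UMAP",
    "c": "leiden clustering",
}


def plan_steps(steps="fspkuc"):
    """Expand a step string into the list of step descriptions, in string order."""
    unknown = [letter for letter in steps if letter not in STEP_NAMES]
    if unknown:
        # Assumed behavior: fail loudly; the real function may ignore these.
        raise ValueError(f"unknown step letters: {unknown}")
    return [STEP_NAMES[letter] for letter in steps]


for name in plan_steps("nlfspkuc"):
    print(name)
```

The default "fspkuc" omits n and l, which suits input that has already been normalized and log-transformed.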

sceleto.read_process(adata, version: str, *, species: str = 'human', sample: str | None = None, define_var: bool = True, call_doublet: bool = True, write: bool = True, min_n_counts: int = 1000, min_n_genes: int = 500, max_n_genes: int = 7000, max_pct_mito: float = 0.5)[source]¶

QC filtering + optional doublet detection + write.

Parameters¶

adata (AnnData): Raw count matrix.
version (str): Version tag for the output filename.
species (str): "human" or "mouse" (determines mito gene prefix).
sample (str, optional): Sample name stored in adata.obs["Sample"].
define_var (bool): If True, copy gene names / Ensembl IDs into adata.var.
call_doublet (bool): If True, run scrublet for doublet detection (lazy import).
write (bool): If True, save filtered adata as h5ad.
min_n_counts, min_n_genes, max_n_genes (int): Cell-level count / gene number thresholds.
max_pct_mito (float): Maximum mitochondrial fraction (0–1).
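
How the four thresholds combine can be sketched with a per-cell predicate that uses the documented defaults. The function passes_qc and the cell metrics are hypothetical, and treating every bound as inclusive is an assumption. The source does not say whether the comparisons are strict.

```python
def passes_qc(n_counts, n_genes, pct_mito,
              min_n_counts=1000, min_n_genes=500,
              max_n_genes=7000, max_pct_mito=0.5):
    """Return True if a cell satisfies all four thresholds (inclusive bounds assumed)."""
    return (
        n_counts >= min_n_counts
        and min_n_genes <= n_genes <= max_n_genes
        and pct_mito <= max_pct_mito
    )


# Made-up per-cell metrics: (total counts, detected genes, mitochondrial fraction).
cells = {
    "cell_a": (5200, 1800, 0.04),
    "cell_b": (800, 420, 0.10),     # too few counts and too few genes
    "cell_c": (30000, 8100, 0.08),  # more genes than max_n_genes
    "cell_d": (2500, 900, 0.62),    # mitochondrial fraction above max_pct_mito
}

kept = [name for name, metrics in cells.items() if passes_qc(*metrics)]
print(kept)
```

Note that max_pct_mito is a fraction between 0 and 1, so the default 0.5 means 50 % mitochondrial counts, not 0.5 %.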

sceleto.remove_geneset(adata, geneset)[source]¶: Remove genes in geneset from adata and return a copy.