adata_science_tools

Python data science toolkit for differential analysis and visualization on AnnData objects.

Overview

adata_science_tools operates on AnnData objects and provides config-driven workflows for differential analysis, custom plotting, and simulated data generation. It includes column plots, volcano plots, correlation dotplots with marginal histograms, and a simulation framework that generates reproducible test datasets with configurable covariates.

Key Features

  • Differential analysis plotting – bar/dotplot combinations showing log2 fold-change with FDR-corrected p-values and per-feature data distributions
  • Volcano plots – configurable significance thresholds with multi-level alpha shading and gene labeling
  • Correlation dotplots – scatter plots with per-group regression fits, Pearson correlations, and marginal histograms/KDEs
  • Data simulation – config-driven AnnData generation with tunable betas, covariates (e.g. age, case/control), and residual variance
  • AnnData-native – all tools operate directly on AnnData objects for seamless integration with scanpy workflows
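To make the data simulation feature more concrete, here is a minimal, dependency-free sketch of drawing one feature from a linear model with a case/control effect, an age covariate, and Gaussian residual noise. Every name in it (`simulate_feature`, `beta_group`, `beta_age`, `resid_sd`) and the 20 to 80 age range are assumptions made for illustration. This is not the toolkit's API; the scripts in example_simulated_data/ are config-driven and produce AnnData objects.

```python
import random

def simulate_feature(n_per_group, beta0, beta_group, beta_age, resid_sd, seed=0):
    """Draw one simulated feature: y = beta0 + beta_group*case + beta_age*age + noise."""
    rng = random.Random(seed)  # seeded so repeated runs give identical data
    rows = []
    for case in (0, 1):  # 0 = control, 1 = case
        for _ in range(n_per_group):
            age = rng.uniform(20, 80)          # assumed age range in years
            noise = rng.gauss(0.0, resid_sd)   # residual variation
            y = beta0 + beta_group * case + beta_age * age + noise
            rows.append({"case": case, "age": age, "y": y})
    return rows

# Hypothetical parameter values: a group effect of 2.0 and a small age effect.
rows = simulate_feature(n_per_group=50, beta0=1.0, beta_group=2.0,
                        beta_age=0.05, resid_sd=0.5)
```

Raising `resid_sd` relative to `beta_group` makes the group effect harder to detect, which is the kind of knob a simulated test dataset is meant to expose.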

Example Plots

Correlation Dotplot (Simulated Data)

Scatter plot with per-group linear regression fits, Pearson correlation statistics, and marginal histograms. Generated from the simulated data workflow in example_simulated_data/.
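The statistics behind such a plot can be sketched without any plotting code. The example below computes a Pearson correlation and a least-squares line separately for each group, using only the standard library. The function names (`pearson_and_fit`, `fit_per_group`) and the toy data are hypothetical and are not taken from the toolkit, which may compute these quantities differently.

```python
import math

def pearson_and_fit(x, y):
    """Return (r, slope, intercept) for paired samples x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)      # Pearson correlation coefficient
    slope = sxy / sxx                   # ordinary least-squares slope
    intercept = mean_y - slope * mean_x
    return r, slope, intercept

def fit_per_group(x, y, groups):
    """Apply pearson_and_fit separately to the points of each group label."""
    results = {}
    for label in sorted(set(groups)):
        xs = [a for a, g in zip(x, groups) if g == label]
        ys = [b for b, g in zip(y, groups) if g == label]
        results[label] = pearson_and_fit(xs, ys)
    return results

# Toy data: "case" rises with x, "control" falls with x.
x = [1, 2, 3, 4, 1, 2, 3, 4]
y = [2, 4, 6, 8, 8, 6, 4, 2]
groups = ["case"] * 4 + ["control"] * 4
fits = fit_per_group(x, y, groups)
```

Fitting per group matters here: pooled together, the two toy groups cancel out and show no correlation, while each group alone is perfectly linear.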

Differential Analysis Bar/Dotplot (PMID 33969320)

Combined bar and dotplot showing per-feature OLINK distributions alongside log2 fold-change with FDR-scaled dot sizes. Data from PMID 33969320.

Volcano Plot

Volcano plot with multi-level significance coloring (alpha 0.2, 0.1, 0.05) and log2FC threshold lines. P-values are from t-tests with FDR correction.
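The multi-level coloring can be thought of as assigning each feature the strictest alpha level it passes. Below is a minimal sketch that applies the Benjamini-Hochberg procedure (one common FDR correction; the source does not say which method the toolkit uses) and then bins features into the 0.05, 0.1, and 0.2 tiers. The function names, the log2FC cutoff of 1.0, and the choice to leave features below that cutoff untiered are all assumptions for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def significance_tier(fdr, log2fc, alphas=(0.05, 0.1, 0.2), log2fc_threshold=1.0):
    """Return the strictest alpha level passed, or None if not highlighted."""
    if abs(log2fc) < log2fc_threshold:  # assumed: small effects are not tiered
        return None
    for alpha in sorted(alphas):        # check the strictest level first
        if fdr <= alpha:
            return alpha
    return None

# Hypothetical raw p-values and log2 fold-changes for four features.
pvals = [0.001, 0.02, 0.04, 0.5]
log2fcs = [2.0, -1.5, 0.3, 1.2]
fdrs = benjamini_hochberg(pvals)
tiers = [significance_tier(q, fc) for q, fc in zip(fdrs, log2fcs)]
```

In a plot, each tier would map to a color or opacity level, with untiered points drawn in a neutral shade.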

Setup

git clone https://github.com/gitbenlewis/adata_science_tools.git
cd adata_science_tools
conda env create -f config/env_not_base.yaml -n not_base
conda activate not_base

Running the Examples

# OLINK proteomics differential analysis (PMID 33969320)
bash example_PMID_33969320/scripts/000_run_everything.bash

# Simulated data with configurable covariates
python example_simulated_data/scripts/simulate_1_var_covar_age.py
python example_simulated_data/scripts/plot_dotplot_simulate_1_var_covar_age.py

Repository

Source code and documentation: https://github.com/gitbenlewis/adata_science_tools