Epistasis Simulation
Overview
The epistasis simulation utilities generate synthetic genotype and phenotype arrays with configurable additive and pairwise interaction effects. A companion helper builds a hierarchical SNP→Gene→System ontology aligned to the simulated causal structure, which can be exported as TSV files for downstream pipelines.
Usage and examples
Simulate epistatic data
from src.utils.analysis.epistasis_simulation import simulate_epistasis

sim = simulate_epistasis(
    n_samples=2000,
    n_snps=50000,
    n_additive=200,
    n_pairs=75,
    h2_additive=0.4,
    h2_epistatic=0.2,
    n_ld_blocks=200,
    ld_rho=0.8,
)
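The h2_additive and h2_epistatic arguments are described as heritability budgets. One common way to read such budgets is as the shares of phenotypic variance assigned to the additive and interaction components, with the remainder left to noise. The sketch below illustrates that reading on a toy model with one additive SNP and one interacting pair. It is an assumption about the general technique, not the implementation behind simulate_epistasis; toy_phenotype, standardize, and variance_share are hypothetical helpers that use only the standard library.

```python
import math
import random
import statistics


def standardize(values):
    """Rescale a list of numbers to mean 0 and population variance 1."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def toy_phenotype(n_samples=20000, h2_additive=0.4, h2_epistatic=0.2, maf=0.3, seed=0):
    """Toy phenotype from one additive SNP and one interacting SNP pair.

    Illustrative assumption only; not the library's actual model.
    """
    rng = random.Random(seed)

    def draw_snp():
        # Genotype dosage 0/1/2: two independent allele draws at frequency `maf`.
        return [(rng.random() < maf) + (rng.random() < maf) for _ in range(n_samples)]

    snp_a, snp_b, snp_c = draw_snp(), draw_snp(), draw_snp()

    additive = standardize(snp_a)
    # Pairwise interaction: product of the two standardized genotypes.
    product = [b * c for b, c in zip(standardize(snp_b), standardize(snp_c))]
    epistatic = standardize(product)
    noise = standardize([rng.gauss(0.0, 1.0) for _ in range(n_samples)])

    # Each unit-variance component is weighted by the square root of its budget,
    # so its variance share is (approximately) the budget itself.
    w_a = math.sqrt(h2_additive)
    w_e = math.sqrt(h2_epistatic)
    w_n = math.sqrt(1.0 - h2_additive - h2_epistatic)
    phenotype = [w_a * a + w_e * e + w_n * n for a, e, n in zip(additive, epistatic, noise)]
    return additive, epistatic, phenotype


def variance_share(component, phenotype):
    """Squared Pearson correlation between a component and the phenotype."""
    zc, zp = standardize(component), standardize(phenotype)
    r = statistics.fmean(c * p for c, p in zip(zc, zp))
    return r * r


additive, epistatic, phenotype = toy_phenotype()
share_additive = variance_share(additive, phenotype)    # close to h2_additive
share_epistatic = variance_share(epistatic, phenotype)  # close to h2_epistatic
```

Under this reading, h2_additive + h2_epistatic must stay below 1 so that the noise weight remains real-valued.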
Build a hierarchy aligned to the simulation
from src.utils.analysis.epistasis_simulation import build_hierarchical_ontology

snp_df, gene_df, system_df = build_hierarchical_ontology(
    sim,
    n_genes=800,
    n_systems=60,
    ontology_coherence=0.5,
    overlap_prob=0.25,
    n_causal_systems=20,
)
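The three returned tables can be written out as TSV files with pandas. The following is a minimal sketch, assuming the tables are ordinary pandas DataFrames (as the documented return type indicates). The helper name export_ontology_tsv, the file names, and the column names of the stand-in frames are all hypothetical and chosen only so the example runs without the simulation.

```python
import tempfile
from pathlib import Path

import pandas as pd


def export_ontology_tsv(snp_df, gene_df, system_df, out_dir):
    """Write the three ontology tables as tab-separated files and return their paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # File names are hypothetical; use whatever the downstream pipeline expects.
    tables = {
        "snp2gene.tsv": snp_df,
        "gene2system.tsv": gene_df,
        "system_hierarchy.tsv": system_df,
    }
    paths = {}
    for name, frame in tables.items():
        path = out_dir / name
        frame.to_csv(path, sep="\t", index=False)  # tab separator, no index column
        paths[name] = path
    return paths


# Stand-in frames with made-up columns; in practice these come from
# build_hierarchical_ontology().
snp_df = pd.DataFrame({"snp": ["snp_0", "snp_1"], "gene": ["gene_0", "gene_0"]})
gene_df = pd.DataFrame({"gene": ["gene_0"], "system": ["system_0"]})
system_df = pd.DataFrame({"parent": ["root"], "child": ["system_0"]})

with tempfile.TemporaryDirectory() as tmp:
    paths = export_ontology_tsv(snp_df, gene_df, system_df, tmp)
    # Read one file back to confirm the round trip.
    round_trip = pd.read_csv(paths["snp2gene.tsv"], sep="\t")
```

Passing index=False keeps the pandas row index out of the files, which is usually what a downstream TSV reader expects.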
API documentation
- simulate_epistasis(n_samples=1000, n_snps=20000, n_additive=100, n_pairs=50, h2_additive=0.5, h2_epistatic=0.1, min_additive_p_value=1e-4, min_epistatic_p_value=1e-3, maf_range=(0.05, 0.5), n_ld_blocks=100, ld_rho=0.8, epistasis_bias=10.0, seed=42)
Simulate genotype data with additive and epistatic effects.
- Parameters:
n_samples (int) – Number of individuals.
n_snps (int) – Total number of SNPs.
n_additive (int) – Number of causal additive SNPs.
n_pairs (int) – Number of causal epistatic SNP pairs.
h2_additive (float) – Additive heritability budget.
h2_epistatic (float) – Epistatic heritability budget.
min_additive_p_value (float, optional) – Minimum marginal p-value enforced for causal additive SNPs.
min_epistatic_p_value (float, optional) – Minimum interaction p-value enforced for causal pairs.
maf_range (tuple[float, float]) – Minor-allele frequency range for SNPs.
n_ld_blocks (int) – Number of LD blocks to simulate.
ld_rho (float) – Correlation coefficient for adjacent SNPs in a block.
epistasis_bias (float) – Bias factor to steer epistatic pairs toward SNPs with smaller additive effects.
seed (int) – Random seed.
- Returns:
Dictionary with genotype matrix, phenotype vector, causal SNP indices, epistatic pairs, effect sizes, and LD block assignments.
- Return type:
dict
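To make the n_ld_blocks and ld_rho parameters concrete, the sketch below shows one simple way to generate block-structured LD: a latent Gaussian value per SNP follows an AR(1) chain within each block, restarts at block boundaries, and is thresholded into alleles at the target allele frequency. This is an assumed technique for illustration, not necessarily what simulate_epistasis does. In this sketch ld_rho is applied on the latent scale, so the correlation between adjacent genotypes comes out lower than ld_rho. The names simulate_ld_genotypes, pearson, and column are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean


def simulate_ld_genotypes(n_samples=2000, n_snps=12, n_ld_blocks=3, ld_rho=0.8, maf=0.3, seed=0):
    """Return (genotypes, block_of_snp); genotypes[i][j] is the 0/1/2 dosage of SNP j."""
    rng = random.Random(seed)
    block_size = math.ceil(n_snps / n_ld_blocks)
    block_of_snp = [j // block_size for j in range(n_snps)]
    threshold = NormalDist().inv_cdf(maf)  # P(z < threshold) = maf for standard normal z
    innovation_sd = math.sqrt(1.0 - ld_rho ** 2)  # keeps the latent variance at 1

    def draw_haplotype():
        alleles, z = [], 0.0
        for j in range(n_snps):
            if j == 0 or block_of_snp[j] != block_of_snp[j - 1]:
                z = rng.gauss(0.0, 1.0)  # new block: restart the chain independently
            else:
                z = ld_rho * z + innovation_sd * rng.gauss(0.0, 1.0)  # AR(1) step
            alleles.append(1 if z < threshold else 0)
        return alleles

    genotypes = []
    for _ in range(n_samples):
        h1, h2 = draw_haplotype(), draw_haplotype()
        genotypes.append([a + b for a, b in zip(h1, h2)])  # diploid dosage
    return genotypes, block_of_snp


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    mx, my = fmean(xs), fmean(ys)
    cov = fmean((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = fmean((x - mx) ** 2 for x in xs)
    vy = fmean((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def column(genotypes, j):
    """Extract the dosage vector of SNP j across all samples."""
    return [row[j] for row in genotypes]


genotypes, block_of_snp = simulate_ld_genotypes()
r_within = pearson(column(genotypes, 0), column(genotypes, 1))  # same block: clearly positive
r_across = pearson(column(genotypes, 3), column(genotypes, 4))  # block boundary: near zero
```

With the defaults above, SNPs 0 to 3 share a block and SNP 4 starts the next one, so r_within reflects LD while r_across should hover around zero.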
- build_hierarchical_ontology(sim, n_genes=800, n_systems=60, ontology_coherence=0.5, overlap_prob=0.25, n_causal_systems=20, causal_system_enrichment=5.0, seed=123)
Build a hierarchical SNP→Gene→System ontology aligned to the simulation.
- Parameters:
sim (dict) – Output dictionary from simulate_epistasis().
n_genes (int) – Number of genes to simulate.
n_systems (int) – Number of leaf systems in the ontology.
ontology_coherence (float) – Controls whether epistatic pairs map to the same, related, or distant systems.
overlap_prob (float) – Probability of overlapping genes across systems.
n_causal_systems (int) – Number of systems to enrich for causal genes.
causal_system_enrichment (float) – Weight applied when sampling enriched systems.
seed (int) – Random seed.
- Returns:
Tuple of DataFrames: SNP-to-gene, gene-to-system, and system hierarchy.
- Return type:
tuple[pandas.DataFrame, pandas.DataFrame, pandas.DataFrame]