Epistasis Simulation

Overview

The epistasis simulation utilities generate synthetic genotype and phenotype arrays with configurable additive and pairwise interaction effects. A companion helper builds a hierarchical SNP→Gene→System ontology aligned to the simulated causal structure; the resulting tables can be exported as TSV files for downstream pipelines.

Usage and examples

Simulate epistatic data

from src.utils.analysis.epistasis_simulation import simulate_epistasis

sim = simulate_epistasis(
    n_samples=2000,
    n_snps=50000,
    n_additive=200,
    n_pairs=75,
    h2_additive=0.4,
    h2_epistatic=0.2,
    n_ld_blocks=200,
    ld_rho=0.8,
)

Build a hierarchy aligned to the simulation

from src.utils.analysis.epistasis_simulation import build_hierarchical_ontology

# `sim` is the dictionary returned by simulate_epistasis() in the previous example.
snp_df, gene_df, system_df = build_hierarchical_ontology(
    sim,
    n_genes=800,
    n_systems=60,
    ontology_coherence=0.5,
    overlap_prob=0.25,
    n_causal_systems=20,
)
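
The overview mentions exporting the ontology as TSV files. A minimal sketch of that step with pandas follows. The helper name export_ontology_tsv, the output file names, and the toy column names are all assumptions made for illustration; the actual columns produced by build_hierarchical_ontology() may differ.

```python
import tempfile
from pathlib import Path

import pandas as pd


def export_ontology_tsv(snp_df, gene_df, system_df, out_dir):
    """Write the three ontology tables as tab-separated files and return their paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # File names are illustrative, not names defined by the package.
    tables = {
        "snp_to_gene.tsv": snp_df,
        "gene_to_system.tsv": gene_df,
        "system_hierarchy.tsv": system_df,
    }
    paths = {}
    for name, df in tables.items():
        path = out_dir / name
        df.to_csv(path, sep="\t", index=False)  # tab separator, no index column
        paths[name] = path
    return paths


# Toy stand-ins with hypothetical column names, used only to keep the sketch runnable.
toy_snp_df = pd.DataFrame({"snp": ["snp_0", "snp_1"], "gene": ["gene_0", "gene_0"]})
toy_gene_df = pd.DataFrame({"gene": ["gene_0"], "system": ["system_0"]})
toy_system_df = pd.DataFrame({"system": ["system_0"], "parent": ["root"]})

with tempfile.TemporaryDirectory() as tmp:
    written = export_ontology_tsv(toy_snp_df, toy_gene_df, toy_system_df, tmp)
    # Read one table back to confirm the file parses as TSV.
    round_trip = pd.read_csv(written["snp_to_gene.tsv"], sep="\t")
```

In real use, the three DataFrames returned by build_hierarchical_ontology() would be passed in place of the toy tables.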

API documentation

simulate_epistasis(n_samples=1000, n_snps=20000, n_additive=100, n_pairs=50, h2_additive=0.5, h2_epistatic=0.1, min_additive_p_value=1e-4, min_epistatic_p_value=1e-3, maf_range=(0.05, 0.5), n_ld_blocks=100, ld_rho=0.8, epistasis_bias=10.0, seed=42)

Simulate genotype data with additive and epistatic effects.
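
To make the idea of separate additive and epistatic heritability budgets concrete, here is a minimal sketch of one common way to compose such a phenotype: build each component, rescale it to its target variance, and add noise for the remainder. This is an assumption about the general approach, not the implementation inside simulate_epistasis(); the function compose_phenotype and its arguments are hypothetical.

```python
import numpy as np


def compose_phenotype(G, additive_idx, pairs, h2_additive, h2_epistatic, seed=0):
    """Combine additive and pairwise terms so each has a target variance share."""
    if h2_additive < 0 or h2_epistatic < 0 or h2_additive + h2_epistatic > 1:
        raise ValueError("heritability budgets must be non-negative and sum to at most 1")
    rng = np.random.default_rng(seed)

    # Standardize genotype columns (assumes no monomorphic SNPs, i.e. std > 0).
    Z = (G - G.mean(axis=0)) / G.std(axis=0)

    # Additive component: weighted sum of the causal SNPs.
    beta = rng.normal(size=len(additive_idx))
    additive = Z[:, additive_idx] @ beta

    # Epistatic component: weighted sum of products of paired SNPs.
    i, j = np.asarray(pairs).T
    gamma = rng.normal(size=len(pairs))
    epistatic = (Z[:, i] * Z[:, j]) @ gamma

    def scale_to(component, target_var):
        # Center, then rescale so the sample variance equals target_var.
        centered = component - component.mean()
        return centered * np.sqrt(target_var / centered.var())

    additive = scale_to(additive, h2_additive)
    epistatic = scale_to(epistatic, h2_epistatic)
    noise = scale_to(rng.normal(size=G.shape[0]), 1.0 - h2_additive - h2_epistatic)
    return additive + epistatic + noise, additive, epistatic


# Toy genotypes: 500 individuals, 20 SNPs coded 0/1/2.
toy_rng = np.random.default_rng(1)
G = toy_rng.binomial(2, 0.3, size=(500, 20)).astype(float)
y, additive, epistatic = compose_phenotype(G, [4, 5, 6], [(0, 1), (2, 3)], 0.4, 0.2)
```

In this sketch each component hits its variance target exactly, but the total phenotypic variance is only approximately 1 because sample covariances between the components are not removed.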

Parameters:
  • n_samples (int) – Number of individuals.

  • n_snps (int) – Total number of SNPs.

  • n_additive (int) – Number of causal additive SNPs.

  • n_pairs (int) – Number of causal epistatic SNP pairs.

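  • h2_additive (float) – Additive heritability budget: the fraction of phenotypic variance allotted to additive effects.

  • h2_epistatic (float) – Epistatic heritability budget: the fraction of phenotypic variance allotted to pairwise interaction effects.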

  • min_additive_p_value (float, optional) – Lower bound enforced on the marginal p-value of each causal additive SNP.

  • min_epistatic_p_value (float, optional) – Lower bound enforced on the interaction p-value of each causal pair.

  • maf_range (tuple[float, float]) – Minor-allele frequency range for SNPs.

  • n_ld_blocks (int) – Number of linkage disequilibrium (LD) blocks to simulate.

  • ld_rho (float) – Correlation coefficient between adjacent SNPs within an LD block.

  • epistasis_bias (float) – Bias factor to steer epistatic pairs toward SNPs with smaller additive effects.

  • seed (int) – Random seed.
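
The interplay of n_ld_blocks, ld_rho, and maf_range can be illustrated with a small sketch. It assumes a latent Gaussian AR(1) process within each block that is thresholded into alleles, which is one standard way to induce LD; whether simulate_epistasis() works this way is not stated here, and the function simulate_ld_genotypes is hypothetical.

```python
from statistics import NormalDist

import numpy as np


def simulate_ld_genotypes(n_samples, n_snps, n_ld_blocks, ld_rho,
                          maf_range=(0.05, 0.5), seed=0):
    """Draw 0/1/2 genotypes with block-wise correlation between adjacent SNPs."""
    rng = np.random.default_rng(seed)

    # Assign SNPs to contiguous, roughly equal-sized blocks (ids 0..n_ld_blocks-1).
    block_ids = np.arange(n_snps) * n_ld_blocks // n_snps

    # One minor-allele frequency per SNP, and the matching normal-scale threshold
    # so that P(latent < threshold) equals that frequency.
    maf = rng.uniform(*maf_range, size=n_snps)
    thresholds = np.array([NormalDist().inv_cdf(p) for p in maf])

    def haplotypes():
        latent = rng.standard_normal((n_samples, n_snps))
        for k in range(1, n_snps):
            if block_ids[k] == block_ids[k - 1]:
                # AR(1) step: correlation ld_rho with the previous SNP, unit variance.
                latent[:, k] = (ld_rho * latent[:, k - 1]
                                + np.sqrt(1.0 - ld_rho**2) * latent[:, k])
        return (latent < thresholds).astype(np.int8)

    # A diploid genotype is the sum of two independent haplotypes.
    genotypes = haplotypes() + haplotypes()
    return genotypes, block_ids, maf


genotypes, block_ids, maf = simulate_ld_genotypes(
    n_samples=400, n_snps=60, n_ld_blocks=6, ld_rho=0.8, seed=3
)
```

In this sketch ld_rho is the correlation on the latent scale; the correlation between the resulting discrete genotypes is typically somewhat weaker, and SNPs in different blocks are independent.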

Returns:

Dictionary with genotype matrix, phenotype vector, causal SNP indices, epistatic pairs, effect sizes, and LD block assignments.

Return type:

dict

build_hierarchical_ontology(sim, n_genes=800, n_systems=60, ontology_coherence=0.5, overlap_prob=0.25, n_causal_systems=20, causal_system_enrichment=5.0, seed=123)

Build a hierarchical SNP→Gene→System ontology aligned to the simulation.

Parameters:
  • sim (dict) – Output dictionary from simulate_epistasis().

  • n_genes (int) – Number of genes to simulate.

  • n_systems (int) – Number of leaf systems in the ontology.

  • ontology_coherence (float) – Controls whether epistatic pairs map to the same, related, or distant systems.

  • overlap_prob (float) – Probability of overlapping genes across systems.

  • n_causal_systems (int) – Number of systems to enrich for causal genes.

  • causal_system_enrichment (float) – Weight applied when sampling enriched systems.

  • seed (int) – Random seed.
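
As an illustration of how an enrichment weight such as causal_system_enrichment can bias sampling toward a subset of systems, the sketch below gives enriched systems a larger weight and normalizes the weights into probabilities. The helper sample_systems and the choice of which systems are enriched are assumptions for this example, not the documented behavior of build_hierarchical_ontology().

```python
import numpy as np


def sample_systems(n_systems, causal_systems, causal_system_enrichment, size, seed=0):
    """Sample system indices, up-weighting the enriched (causal) systems."""
    rng = np.random.default_rng(seed)
    weights = np.ones(n_systems)
    # Enriched systems get the enrichment weight; all others keep weight 1.
    weights[list(causal_systems)] = causal_system_enrichment
    probs = weights / weights.sum()
    return rng.choice(n_systems, size=size, p=probs), probs


# 60 systems, the first 20 treated as enriched with weight 5.0 (the defaults above).
draws, probs = sample_systems(
    n_systems=60, causal_systems=range(20), causal_system_enrichment=5.0, size=1000
)
```

With these values each enriched system is five times as likely to be drawn as a non-enriched one, so the 20 enriched systems together receive 100/140 (about 71%) of the probability mass.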

Returns:

Tuple of three DataFrames, in order: SNP-to-gene mapping, gene-to-system mapping, and system hierarchy.

Return type:

tuple[pandas.DataFrame, pandas.DataFrame, pandas.DataFrame]