Epistasis Retrieval Evaluation

Overview

These utilities evaluate epistasis retrieval by comparing discovered SNP pairs against known causal interactions. They coordinate loading attention scores, genotypes, and ontology mappings, then compute retrieval metrics for top-ranked systems.

Usage and examples

Example: configure and run evaluation

from src.utils.analysis.epistasis_retrieval_evaluation import (
    EvaluationConfig,
    EpistasisRetrievalEvaluator,
)

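# Paths and settings mirror the EvaluationConfig parameters documented below.
config = EvaluationConfig(
    # Known causal interactions and score files
    causal_info="data/causal.json",
    system_importance="outputs/system_importance.csv",
    attention_results="outputs/attention_scores.csv",
    # Genotype, phenotype, and covariate inputs
    tsv="data/genotypes.tsv",
    pheno="data/phenotypes.tsv",
    cov="data/covariates.tsv",
    # Ontology and SNP-to-gene mapping
    onto="ontology.tsv",
    snp2gene="snp2gene.tsv",
    # Evaluation scope and output location
    top_n_systems=50,
    output_prefix="outputs/epistasis_eval",
    # Parallel execution
    num_workers=4,
    executor_type="process",
    # Attention-score filtering and optional skipping of large systems
    quantiles=(0.9, 0.95),
    snp_threshold=50,
)

# Run the end-to-end evaluation.
evaluator = EpistasisRetrievalEvaluator(config)
evaluator.evaluate()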

API documentation

class EvaluationConfig

Configuration container for epistasis retrieval evaluation inputs and settings.

Parameters:
  • causal_info (str) – Path to the JSON file containing causal SNP and epistasis information.

  • system_importance (str) – Path to the system importance CSV file.

  • attention_results (str) – Path to the attention results CSV file.

  • tsv (str) – Path to the genotype TSV file.

  • pheno (str) – Path to the phenotype file.

  • cov (str) – Path to the covariate file.

  • onto (str) – Path to the ontology file.

  • snp2gene (str) – Path to the SNP-to-gene mapping file.

  • top_n_systems (int) – Number of top-ranked systems to evaluate.

  • output_prefix (str) – Prefix used when writing summary and p-value outputs.

  • num_workers (int) – Number of workers for parallel processing.

  • executor_type (str) – Execution backend, either "process" or "thread".

  • quantiles (Sequence[float]) – Quantiles used when filtering attention scores.

  • snp_threshold (int) – Optional SNP-count threshold above which large systems are skipped.
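
The quantiles parameter is described only as "used when filtering attention scores". As a minimal sketch of one way such filtering could work, the code below keeps the entries whose score is at or above each requested quantile. The function names quantile and filter_by_quantiles, the linear-interpolation rule, and the "keep scores at or above the cutoff" direction are all assumptions made for illustration; they are not taken from the module.

```python
from typing import Dict, Mapping, Sequence


def quantile(values: Sequence[float], q: float) -> float:
    """Linear-interpolation quantile of a non-empty sequence (hypothetical helper)."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = int(position)  # floor, since position is non-negative
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def filter_by_quantiles(
    scores: Mapping[str, float], quantiles: Sequence[float]
) -> Dict[float, Dict[str, float]]:
    """For each quantile, keep the entries scoring at or above that cutoff.

    Assumption: higher attention scores are the ones of interest.
    """
    values = list(scores.values())
    filtered: Dict[float, Dict[str, float]] = {}
    for q in quantiles:
        cutoff = quantile(values, q)
        filtered[q] = {key: s for key, s in scores.items() if s >= cutoff}
    return filtered


# Toy attention scores keyed by a made-up identifier.
toy_scores = {"a": 0.0, "b": 1.0, "c": 2.0, "d": 3.0, "e": 4.0}
kept = filter_by_quantiles(toy_scores, (0.5, 0.9))
```

Under this reading, a higher quantile keeps fewer, higher-scoring entries, so the tuple (0.9, 0.95) in the usage example would produce two progressively stricter subsets.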

class EpistasisRetrievalEvaluator

Coordinates input loading, diagnostic checks, parallel epistasis searches, and the export of evaluation metrics.
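
How the evaluator dispatches its parallel searches is not shown here. The sketch below illustrates one plausible way to map executor_type and num_workers onto the standard concurrent.futures executors, and to skip systems larger than snp_threshold before dispatch. The helpers make_executor, count_pairs, and run_parallel are hypothetical, the per-system work is a stand-in (counting candidate SNP pairs), and the "skip when the SNP count exceeds the threshold" rule is an assumption.

```python
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor
from typing import Dict, Mapping, Optional


def make_executor(executor_type: str, num_workers: int) -> Executor:
    """Return a standard-library executor for the requested backend."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    if executor_type == "process":
        return ProcessPoolExecutor(max_workers=num_workers)
    if executor_type == "thread":
        return ThreadPoolExecutor(max_workers=num_workers)
    raise ValueError(f"unknown executor_type: {executor_type!r}")


def count_pairs(n_snps: int) -> int:
    """Stand-in for a per-system search: number of unordered SNP pairs."""
    return n_snps * (n_snps - 1) // 2


def run_parallel(
    system_sizes: Mapping[str, int],
    executor_type: str = "thread",
    num_workers: int = 4,
    snp_threshold: Optional[int] = None,
) -> Dict[str, int]:
    """Run the stand-in search per system, skipping systems above the threshold."""
    selected = {
        name: n
        for name, n in system_sizes.items()
        if snp_threshold is None or n <= snp_threshold
    }
    with make_executor(executor_type, num_workers) as pool:
        # Executor.map preserves input order, so results align with the keys.
        results = list(pool.map(count_pairs, selected.values()))
    return dict(zip(selected, results))


# Toy system sizes; sysB exceeds the threshold and is skipped.
toy_result = run_parallel(
    {"sysA": 4, "sysB": 100, "sysC": 3},
    executor_type="thread",
    num_workers=2,
    snp_threshold=50,
)
```

The pair count grows quadratically with system size, which is one reason a SNP-count cutoff can be useful. With the "process" backend, the worker function must be importable at module level so it can be pickled.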

__init__(config)

Stores the evaluation configuration.

Parameters:
  • config (EvaluationConfig) – Parsed evaluation configuration.

evaluate()

Executes the end-to-end evaluation, including metrics and curve output.
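
Which retrieval metrics evaluate() reports is not specified here. As a rough illustration of comparing ranked SNP pairs against known causal interactions, the sketch below computes precision and recall over the top k discovered pairs. The function name precision_recall_at_k, the tuple representation of a pair, and the choice of metric are assumptions for illustration only, not the module's API.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

SnpPair = Tuple[str, str]


def precision_recall_at_k(
    ranked_pairs: Iterable[SnpPair],
    causal_pairs: Iterable[SnpPair],
    k: int,
) -> Tuple[float, float]:
    """Precision and recall of the top-k ranked pairs against causal pairs.

    Pairs are treated as unordered, so ("rs1", "rs2") equals ("rs2", "rs1").
    Duplicate pairs in the ranking are counted once, at their first position.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    truth: Set[FrozenSet[str]] = {frozenset(pair) for pair in causal_pairs}

    top_k: List[FrozenSet[str]] = []
    seen: Set[FrozenSet[str]] = set()
    for pair in ranked_pairs:
        key = frozenset(pair)
        if key in seen:
            continue
        seen.add(key)
        top_k.append(key)
        if len(top_k) == k:
            break

    hits = sum(1 for key in top_k if key in truth)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall


# Toy ranking (best first) and toy causal interactions with made-up SNP IDs.
ranked = [("rs1", "rs2"), ("rs3", "rs4"), ("rs2", "rs1"), ("rs5", "rs6")]
causal = [("rs2", "rs1"), ("rs6", "rs5")]
p_at_2, r_at_2 = precision_recall_at_k(ranked, causal, k=2)
```

Normalizing each pair to a frozenset avoids missing a match only because the two SNPs were listed in a different order.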

build_arg_parser()

Builds the CLI argument parser for the evaluation workflow.

build_config_from_args(args)

Converts CLI arguments into an EvaluationConfig.

Parameters:
  • args (argparse.Namespace) – Parsed command-line arguments.
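
The actual command-line flags are not listed here. The sketch below shows the general pattern of building an argparse parser and converting the parsed namespace into a configuration object. Everything in it is hypothetical: SketchConfig is a stand-in dataclass covering only a few of the documented fields, and the flag names and defaults are invented for illustration rather than taken from build_arg_parser().

```python
import argparse
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SketchConfig:
    """Stand-in for EvaluationConfig with a subset of its fields."""

    causal_info: str
    attention_results: str
    top_n_systems: int
    executor_type: str
    quantiles: Tuple[float, ...]
    snp_threshold: Optional[int]


def build_sketch_parser() -> argparse.ArgumentParser:
    """Build a parser with hypothetical flag names and defaults."""
    parser = argparse.ArgumentParser(description="Epistasis retrieval evaluation (sketch)")
    parser.add_argument("--causal-info", required=True)
    parser.add_argument("--attention-results", required=True)
    parser.add_argument("--top-n-systems", type=int, default=50)
    parser.add_argument("--executor-type", choices=("process", "thread"), default="process")
    parser.add_argument("--quantiles", type=float, nargs="+", default=[0.9, 0.95])
    parser.add_argument("--snp-threshold", type=int, default=None)
    return parser


def sketch_config_from_args(args: argparse.Namespace) -> SketchConfig:
    """Copy parsed arguments into the immutable config stand-in."""
    return SketchConfig(
        causal_info=args.causal_info,
        attention_results=args.attention_results,
        top_n_systems=args.top_n_systems,
        executor_type=args.executor_type,
        quantiles=tuple(args.quantiles),  # list from argparse -> hashable tuple
        snp_threshold=args.snp_threshold,
    )


# Parse an explicit argument list instead of sys.argv so the sketch is testable.
sketch_args = build_sketch_parser().parse_args(
    ["--causal-info", "c.json", "--attention-results", "a.csv", "--quantiles", "0.8", "0.99"]
)
sketch_config = sketch_config_from_args(sketch_args)
```

Keeping parser construction separate from config construction, as the two documented functions do, lets the same configuration be built either from the command line or directly in Python.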

main()

CLI entry point that constructs the configuration and runs the evaluator.