Epistasis Retrieval Evaluation

Overview

These utilities evaluate epistasis retrieval by comparing discovered SNP pairs against known causal interactions. They load attention scores, genotypes, and ontology mappings, then compute retrieval metrics for the top-ranked systems.
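
The comparison between discovered and causal SNP pairs can be sketched as set-based precision and recall over unordered pairs. This is a minimal illustration under stated assumptions, not the module's implementation: the helpers `normalize_pair` and `pair_retrieval_metrics` are hypothetical, and the sketch assumes a discovered pair counts as a hit when it matches a causal pair regardless of SNP order.

```python
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]


def normalize_pair(a: str, b: str) -> Pair:
    # Assumption: (a, b) and (b, a) describe the same interaction.
    return (a, b) if a <= b else (b, a)


def pair_retrieval_metrics(discovered: Iterable[Pair], causal: Iterable[Pair]) -> Dict[str, float]:
    """Hypothetical helper: precision, recall, and F1 of discovered vs. causal SNP pairs."""
    found: Set[Pair] = {normalize_pair(a, b) for a, b in discovered}
    truth: Set[Pair] = {normalize_pair(a, b) for a, b in causal}
    hits = found & truth
    precision = len(hits) / len(found) if found else 0.0
    recall = len(hits) / len(truth) if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"hits": len(hits), "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    discovered = [("rs1", "rs2"), ("rs5", "rs3"), ("rs9", "rs8")]
    causal = [("rs2", "rs1"), ("rs3", "rs5"), ("rs7", "rs4")]
    print(pair_retrieval_metrics(discovered, causal))
```

Here two of the three discovered pairs match causal pairs once order is ignored, so precision, recall, and F1 are all 2/3.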

Usage and examples

Example: configure and run evaluation

from src.utils.analysis.epistasis_retrieval_evaluation import (
    EvaluationConfig,
    EpistasisRetrievalEvaluator,
)

config = EvaluationConfig(
    causal_info="data/causal.json",
    system_importance="outputs/system_importance.csv",
    attention_results="outputs/attention_scores.csv",
    tsv="data/genotypes.tsv",
    pheno="data/phenotypes.tsv",
    cov="data/covariates.tsv",
    onto="ontology.tsv",
    snp2gene="snp2gene.tsv",
    top_n_systems=50,
    output_prefix="outputs/epistasis_eval",
    num_workers=4,
    executor_type="process",
    quantiles=(0.9, 0.95),
    snp_threshold=50,
)

evaluator = EpistasisRetrievalEvaluator(config)
evaluator.evaluate()

API documentation

class EvaluationConfig

Configuration container for epistasis retrieval evaluation inputs and settings.

Parameters:

causal_info (str) – Path to the JSON file containing causal SNP and epistasis information.
system_importance (str) – Path to the system importance CSV file.
attention_results (str) – Path to the attention results CSV file.
tsv (str) – Path to the genotype TSV file.
pheno (str) – Path to the phenotype file.
cov (str) – Path to the covariate file.
onto (str) – Path to the ontology file.
snp2gene (str) – Path to the SNP-to-gene mapping file.
top_n_systems (int) – Number of top-ranked systems to evaluate.
output_prefix (str) – Prefix used when writing summary and p-value outputs.
num_workers (int) – Number of workers for parallel processing.
executor_type (str) – Execution backend, either "process" or "thread".
quantiles (Sequence[float]) – Quantiles used when filtering attention scores.
snp_threshold (int) – Optional SNP count threshold for skipping large systems.
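
The `quantiles` and `snp_threshold` settings can be illustrated with a short sketch. It assumes, without confirmation from the source, that each quantile acts as a cutoff on attention scores (keeping SNPs scoring at or above it) and that a system is skipped when its SNP count exceeds `snp_threshold`, with `None` disabling the check. The helpers `quantile`, `filter_attention`, and `should_skip` are hypothetical; `quantile` uses linear interpolation, the same rule as NumPy's default.

```python
from typing import Dict, List, Optional, Sequence


def quantile(values: Sequence[float], q: float) -> float:
    """Linear-interpolation quantile (matches numpy.quantile's default method)."""
    xs = sorted(values)
    if not xs:
        raise ValueError("cannot take a quantile of an empty sequence")
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def filter_attention(scores: Dict[str, float], quantiles: Sequence[float]) -> Dict[float, List[str]]:
    """Hypothetical: for each quantile, keep SNPs whose score is >= that quantile's cutoff."""
    values = list(scores.values())
    kept: Dict[float, List[str]] = {}
    for q in quantiles:
        cutoff = quantile(values, q)
        kept[q] = sorted(snp for snp, score in scores.items() if score >= cutoff)
    return kept


def should_skip(n_snps: int, snp_threshold: Optional[int]) -> bool:
    # Assumption: only systems strictly larger than the threshold are skipped.
    return snp_threshold is not None and n_snps > snp_threshold
```

With scores 1 through 10, the 0.5 quantile is 5.5 and keeps the five highest-scoring SNPs, while the 0.9 quantile is about 9.1 and keeps only the top one.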

class EpistasisRetrievalEvaluator

Coordinates the evaluation workflow: loading inputs, running diagnostic checks and parallel epistasis searches, and exporting evaluation metrics.
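
The `executor_type` and `num_workers` settings suggest a standard `concurrent.futures` pattern for the parallel searches. The sketch below is an assumption about how such a mapping could look, not the evaluator's code; `make_executor` and `run_searches` are hypothetical names, and the per-system `search` callable stands in for whatever search the evaluator actually runs.

```python
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def make_executor(executor_type: str, num_workers: int) -> Executor:
    """Map the executor_type setting onto a concurrent.futures pool (assumed mapping)."""
    if executor_type == "process":
        return ProcessPoolExecutor(max_workers=num_workers)
    if executor_type == "thread":
        return ThreadPoolExecutor(max_workers=num_workers)
    raise ValueError(f"executor_type must be 'process' or 'thread', got {executor_type!r}")


def run_searches(
    search: Callable[[T], R],
    systems: Iterable[T],
    executor_type: str = "thread",
    num_workers: int = 4,
) -> List[R]:
    """Run one (hypothetical) epistasis search per system; results keep input order."""
    with make_executor(executor_type, num_workers) as pool:
        return list(pool.map(search, systems))
```

With "process", the search function and its arguments must be picklable (for example, module-level functions); "thread" avoids that requirement but CPU-bound pure-Python work is then limited by the GIL.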

__init__(config)

Stores the evaluation configuration.

Parameters: config (EvaluationConfig) – Parsed evaluation configuration.

evaluate(): Executes the end-to-end evaluation, including metrics and curve output.

build_arg_parser(): Builds the CLI argument parser for the evaluation workflow.

build_config_from_args(args)

Converts CLI arguments into an EvaluationConfig.

Parameters: args (argparse.Namespace) – Parsed command-line arguments.
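
Converting a parsed `Namespace` into a configuration object can be done generically with `dataclasses.fields`. The sketch assumes the configuration behaves like a dataclass; `EvaluationConfigSketch` and `config_from_namespace` are hypothetical stand-ins, and the validation in `__post_init__` is illustrative rather than taken from the source.

```python
import argparse
from dataclasses import dataclass, fields
from typing import Optional, Tuple


@dataclass(frozen=True)
class EvaluationConfigSketch:
    causal_info: str
    system_importance: str
    attention_results: str
    tsv: str
    pheno: str
    cov: str
    onto: str
    snp2gene: str
    top_n_systems: int
    output_prefix: str
    num_workers: int
    executor_type: str
    quantiles: Tuple[float, ...]
    snp_threshold: Optional[int] = None

    def __post_init__(self) -> None:
        # Illustrative checks; the real class may validate differently or not at all.
        if self.executor_type not in ("process", "thread"):
            raise ValueError(f"unknown executor_type: {self.executor_type!r}")
        if not all(0.0 <= q <= 1.0 for q in self.quantiles):
            raise ValueError("quantiles must lie in [0, 1]")


def config_from_namespace(args: argparse.Namespace) -> EvaluationConfigSketch:
    """Keep only attributes that match config fields, then build the config."""
    known = {f.name for f in fields(EvaluationConfigSketch)}
    values = {k: v for k, v in vars(args).items() if k in known}
    if "quantiles" in values:
        values["quantiles"] = tuple(values["quantiles"])  # argparse yields a list
    return EvaluationConfigSketch(**values)
```

Filtering on the dataclass fields lets the CLI carry extra flags (logging, verbosity) without breaking config construction.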

main(): CLI entry point that constructs the configuration and runs the evaluator.