Overview

G2PT is a hierarchical Genotype-to-Phenotype Transformer that models information flow from SNPs to genes, systems, and phenotypes. The Read the Docs pages focus on how to run experiments and navigate the API reference.

Quickstart

  1. Create the environment and make the repo importable:

conda env create -f environment.yml
conda activate G2PT_github
export PYTHONPATH=.
  1. Prepare inputs:

    • Genotypes: PLINK .bed/.bim/.fam or TSV matrices.

    • Covariates/phenotypes: tab-delimited files with FID and IID columns.

    • Ontology (--onto) and SNP-to-gene mapping (--snp2gene).

  2. Train a model (PLINK example):

python train_snp2p_model.py \
  --onto samples/ontology.txt \
  --snp2gene samples/snp2gene.txt \
  --train-bfile /path/to/train \
  --train-cov /path/to/train.cov --train-pheno /path/to/train.pheno \
  --val-bfile /path/to/val \
  --val-cov /path/to/val.cov --val-pheno /path/to/val.pheno \
  --bt PHENOTYPE \
  --out outputs/run1
  1. Generate predictions and attention summaries:

python predict_attention.py \
  --onto samples/ontology.txt \
  --snp2gene samples/snp2gene.txt \
  --bfile /path/to/test \
  --cov /path/to/test.cov --pheno /path/to/test.pheno \
  --model outputs/run1/model_best.pth \
  --out outputs/run1/test

Where to go next

  • API reference pages for model, dataset, and trainer components are under the API Documentation section.

  • For full CLI usage details and additional examples, see the repository README.