# Overview G2PT is a hierarchical Genotype-to-Phenotype Transformer that models information flow from SNPs to genes, systems, and phenotypes. The Read the Docs pages focus on how to run experiments and navigate the API reference. ## Quickstart 1. Create the environment and make the repo importable: ```bash conda env create -f environment.yml conda activate G2PT_github export PYTHONPATH=. ``` 2. Prepare inputs: - Genotypes: PLINK `.bed/.bim/.fam` or TSV matrices. - Covariates/phenotypes: tab-delimited files with `FID` and `IID` columns. - Ontology (`--onto`) and SNP-to-gene mapping (`--snp2gene`). 3. Train a model (PLINK example): ```bash python train_snp2p_model.py \ --onto samples/ontology.txt \ --snp2gene samples/snp2gene.txt \ --train-bfile /path/to/train \ --train-cov /path/to/train.cov --train-pheno /path/to/train.pheno \ --val-bfile /path/to/val \ --val-cov /path/to/val.cov --val-pheno /path/to/val.pheno \ --bt PHENOTYPE \ --out outputs/run1 ``` 4. Generate predictions and attention summaries: ```bash python predict_attention.py \ --onto samples/ontology.txt \ --snp2gene samples/snp2gene.txt \ --bfile /path/to/test \ --cov /path/to/test.cov --pheno /path/to/test.pheno \ --model outputs/run1/model_best.pth \ --out outputs/run1/test ``` ## Where to go next - API reference pages for model, dataset, and trainer components are under the **API Documentation** section. - For full CLI usage details and additional examples, see the repository [README](../README.md).