Tree
Overview
The Tree utilities parse hierarchical ontologies of systems (terms) and genes, and optionally link SNPs to genes. They are used throughout the project to build attention masks and define structured inputs for epistasis analysis and SNP2P datasets.
Usage and examples
TreeParser
The ontology input should provide parent/child relationships between systems
(terms) and genes. It can be supplied as a pandas DataFrame or as a path to a
tabular file with parent, child, and interaction columns (for
example is_a or gene). Once loaded, you can inspect the structure or
collapse small terms.
import pandas as pd
from g2pt.tree import TreeParser
# Minimal parent/child ontology with interaction types.
ontology_df = pd.DataFrame(
{
"parent": ["immune_system", "immune_system", "adaptive_immunity"],
"child": ["innate_immunity", "adaptive_immunity", "IL7R"],
"interaction": ["is_a", "is_a", "gene"],
}
)
tree = TreeParser(ontology_df)
tree.summary()
collapsed_tree = tree.collapse(min_term_size=2)
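This page does not document how collapse measures a term's size. The pandas-only sketch below assumes that size means the number of genes annotated to a term or to any of its descendant systems. Under that assumption it computes the sizes and lists the terms that would fall under min_term_size. genes_per_term is a hypothetical helper, not part of g2pt. The toy ontology is the example above with two extra gene rows (IL2RA, TLR4).

```python
from collections import defaultdict

import pandas as pd


def genes_per_term(ontology_df: pd.DataFrame) -> dict:
    """Hypothetical helper: map each system to the genes annotated to it or to any descendant."""
    children = defaultdict(set)  # system -> child systems (every non-"gene" row)
    direct = defaultdict(set)    # system -> genes annotated directly (interaction == "gene")
    for parent, child, kind in ontology_df[["parent", "child", "interaction"]].itertuples(index=False):
        if kind == "gene":
            direct[parent].add(child)
        else:
            children[parent].add(child)
    systems = set(children) | set(direct) | {c for cs in children.values() for c in cs}
    memo = {}

    def collect(term):
        # Union of the term's own genes and the genes of every descendant system.
        if term not in memo:
            genes = set(direct[term])
            for child in children[term]:
                genes |= collect(child)
            memo[term] = genes
        return memo[term]

    return {term: collect(term) for term in systems}


# Toy ontology: the example above plus two extra gene annotations.
ontology_df = pd.DataFrame(
    {
        "parent": ["immune_system", "immune_system", "adaptive_immunity",
                   "adaptive_immunity", "innate_immunity"],
        "child": ["innate_immunity", "adaptive_immunity", "IL7R", "IL2RA", "TLR4"],
        "interaction": ["is_a", "is_a", "gene", "gene", "gene"],
    }
)
sizes = {term: len(genes) for term, genes in genes_per_term(ontology_df).items()}
min_term_size = 2
small_terms = sorted(term for term, n in sizes.items() if n < min_term_size)
print(sizes)
print(small_terms)
```

Here innate_immunity carries a single gene, so it is the only term below min_term_size=2 under this size definition.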
SNPTreeParser
The SNP mapping file is expected to include at least snp and gene
columns, plus a chr column if you plan to use by_chr=True. You can pass
either file paths or pandas DataFrames.
from g2pt.tree import SNPTreeParser
tree_parser = SNPTreeParser(
ontology="ontology.tsv",
snp2gene="snp2gene.tsv",
by_chr=True,
)
tree_parser.summary()
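Before you pass a mapping table to SNPTreeParser, you can check its columns and see how its SNPs group by gene and by chromosome. The sketch below uses plain pandas and does not call g2pt. check_snp2gene is a hypothetical helper, and the SNP IDs are placeholders.

```python
import pandas as pd


def check_snp2gene(snp2gene: pd.DataFrame, by_chr: bool = False) -> pd.DataFrame:
    """Hypothetical validator: snp and gene are required, plus chr when by_chr=True."""
    required = {"snp", "gene"} | ({"chr"} if by_chr else set())
    missing = required - set(snp2gene.columns)
    if missing:
        raise ValueError(f"snp2gene is missing columns: {sorted(missing)}")
    return snp2gene


# Toy SNP->gene table with placeholder SNP IDs.
snp2gene = check_snp2gene(
    pd.DataFrame(
        {
            "snp": ["snp1", "snp2", "snp3"],
            "gene": ["IL7R", "IL7R", "TLR4"],
            "chr": [5, 5, 9],
        }
    ),
    by_chr=True,
)

# SNPs grouped by gene and by chromosome (groupby sorts the keys).
snps_per_gene = snp2gene.groupby("gene")["snp"].apply(list).to_dict()
snps_per_chr = snp2gene.groupby("chr")["snp"].apply(list).to_dict()
print(snps_per_gene)
print(snps_per_chr)
```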
API documentation
- class TreeParser
Parses and represents a hierarchical ontology of systems and genes.
This class loads an ontology from a file or DataFrame, builds a graph representation, and provides methods for manipulating and analyzing the ontology.
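This page does not describe the graph structure that TreeParser builds internally. As an illustration only, the sketch below uses the standard-library graphlib module to order systems bottom-up, with children before parents, and to find the root systems. bottom_up_systems is a hypothetical helper. If the ontology contains a cycle, graphlib raises CycleError.

```python
from collections import defaultdict
from graphlib import TopologicalSorter

import pandas as pd


def bottom_up_systems(ontology_df: pd.DataFrame) -> list:
    """Hypothetical helper: order systems so every child system comes before its parents."""
    deps = defaultdict(set)  # parent system -> child systems that must come first
    for parent, child, kind in ontology_df[["parent", "child", "interaction"]].itertuples(index=False):
        child_systems = deps[parent]  # registers the parent even if it only has genes
        if kind != "gene":  # gene rows are annotations, not system-to-system edges
            child_systems.add(child)
    # static_order() yields predecessors (here: children) first and raises
    # graphlib.CycleError if the hierarchy is not acyclic.
    return list(TopologicalSorter(deps).static_order())


ontology_df = pd.DataFrame(
    {
        "parent": ["immune_system", "immune_system", "adaptive_immunity"],
        "child": ["innate_immunity", "adaptive_immunity", "IL7R"],
        "interaction": ["is_a", "is_a", "gene"],
    }
)
order = bottom_up_systems(ontology_df)
non_roots = {c for c, k in zip(ontology_df["child"], ontology_df["interaction"]) if k != "gene"}
roots = [s for s in order if s not in non_roots]
print(order)
print(roots)
```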
- __init__(ontology, dense_attention=False, sys_annot_file=None)
Initializes the TreeParser from an ontology given as a file path or a pandas DataFrame.
- from_obo(obo_path, dense_attention=False)
Creates a TreeParser instance from an OBO file.
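This entry does not say how OBO records become parent/child rows. The sketch below assumes that only is_a relations from [Term] stanzas are kept, each labeled with the interaction is_a. obo_to_ontology_df is hypothetical. It ignores other relationship types and obsolete flags, and the input is a GO-style toy snippet. OBO files describe terms rather than gene annotations, so the result holds system-to-system edges and no gene rows.

```python
import pandas as pd

EXAMPLE_OBO = """\
format-version: 1.2

[Term]
id: GO:0002376
name: immune system process

[Term]
id: GO:0006955
name: immune response
is_a: GO:0002376 ! immune system process

[Term]
id: GO:0045087
name: innate immune response
is_a: GO:0006955 ! immune response

[Typedef]
id: part_of
name: part of
"""


def obo_to_ontology_df(obo_text: str) -> pd.DataFrame:
    """Hypothetical minimal reader: keep is_a edges from [Term] stanzas only."""
    rows = []
    in_term = False
    term_id = None
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):  # a new stanza begins
            in_term = line == "[Term]"
            term_id = None
        elif in_term and line.startswith("id:"):
            term_id = line[len("id:"):].strip()
        elif in_term and term_id and line.startswith("is_a:"):
            # "is_a: GO:0006955 ! immune response" -> "GO:0006955"
            parent = line[len("is_a:"):].split("!")[0].strip()
            rows.append({"parent": parent, "child": term_id, "interaction": "is_a"})
    return pd.DataFrame(rows, columns=["parent", "child", "interaction"])


ontology_df = obo_to_ontology_df(EXAMPLE_OBO)
print(ontology_df)
```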
- init_ontology(ontology_df, inplace=True, verbose=True)
Initializes the ontology from a DataFrame.
- build_mask(ordered_query, ordered_key, query2key_dict, interaction_value=0, mask_value=-10**4)
Builds an attention mask between the ordered queries and the ordered keys.
- Parameters:
- Returns:
A tuple containing the query-to-index mapping, the index-to-query mapping, the key-to-index mapping, the index-to-key mapping, and the mask.
- Return type:
tuple
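The Returns entry names five outputs but does not say what the mask values mean. The defaults (interaction_value=0, mask_value=-10**4) fit the common additive-mask convention. Under that convention the mask is added to attention logits, so blocked query/key pairs get near-zero weight after softmax. The sketch below assumes this convention and rebuilds the documented return tuple with NumPy. build_mask_sketch is hypothetical and is not the library's code.

```python
import numpy as np


def build_mask_sketch(ordered_query, ordered_key, query2key_dict,
                      interaction_value=0, mask_value=-10**4):
    """Hypothetical re-creation of build_mask's return tuple (additive-mask assumption)."""
    query2idx = {q: i for i, q in enumerate(ordered_query)}
    idx2query = dict(enumerate(ordered_query))
    key2idx = {k: i for i, k in enumerate(ordered_key)}
    idx2key = dict(enumerate(ordered_key))
    # Start with every query/key pair blocked ...
    mask = np.full((len(ordered_query), len(ordered_key)), mask_value, dtype=np.float32)
    # ... then open the pairs listed in query2key_dict.
    for query, keys in query2key_dict.items():
        for key in keys:
            mask[query2idx[query], key2idx[key]] = interaction_value
    return query2idx, idx2query, key2idx, idx2key, mask


q2i, i2q, k2i, i2k, mask = build_mask_sketch(
    ["adaptive_immunity", "innate_immunity"],
    ["IL7R", "IL2RA", "TLR4"],
    {"adaptive_immunity": ["IL7R", "IL2RA"], "innate_immunity": ["TLR4"]},
)
print(mask)

# With uniform raw scores, adding the mask leaves weight only on the open pairs.
raw_scores = np.zeros_like(mask)
scores = raw_scores + mask
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)
```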
- summary(system=True, gene=True)
Prints a summary of the systems and genes in the ontology.
- collapse(to_keep=None, min_term_size=2, verbose=True, inplace=False)
Collapses the ontology by removing small terms.
- Parameters:
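This entry does not say what collapse does with the children of a removed term. One common convention is to splice the term out and reattach its child systems and genes to each of its parents. The sketch below assumes that convention and never removes terms listed in to_keep. Edges are (parent, child, interaction) tuples, and splice_out and collapse_sketch are hypothetical helpers.

```python
def splice_out(edges, term):
    """Hypothetical: drop `term` and reattach its children (systems and genes) to its parents."""
    parents = sorted({p for p, c, _ in edges if c == term})
    kids = [(c, kind) for p, c, kind in edges if p == term]
    kept = [e for e in edges if term not in (e[0], e[1])]
    # A root term has no parents, so its children simply lose that edge.
    kept += [(parent, child, kind) for parent in parents for child, kind in kids]
    return sorted(set(kept))


def collapse_sketch(edges, small_terms, to_keep=()):
    """Hypothetical: splice out every small term unless it is listed in to_keep."""
    for term in small_terms:
        if term not in to_keep:
            edges = splice_out(edges, term)
    return edges


edges = [
    ("immune_system", "innate_immunity", "is_a"),
    ("immune_system", "adaptive_immunity", "is_a"),
    ("adaptive_immunity", "IL7R", "gene"),
    ("adaptive_immunity", "IL2RA", "gene"),
    ("innate_immunity", "TLR4", "gene"),
]
collapsed = collapse_sketch(edges, ["innate_immunity"])
kept = collapse_sketch(edges, ["innate_immunity"], to_keep={"innate_immunity"})
print(collapsed)
```

After the splice, TLR4 is annotated directly to immune_system. Passing the same term in to_keep leaves the edge list unchanged.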
- class SNPTreeParser
Parses SNP→gene mappings alongside the system ontology.
This class extends TreeParser by wiring SNPs into the ontology so downstream datasets can emit SNP, gene, and system indices. Provide the same parent/child ontology used for TreeParser plus a SNP→gene mapping table.
- __init__(ontology, snp2gene, dense_attention=False, sys_annot_file=None, by_chr=False, multiple_phenotypes=False, block_bias=False)
- Parameters:
ontology (str or pandas.DataFrame) – Path or DataFrame for the parent–child ontology.
snp2gene (str or pandas.DataFrame) – Path or DataFrame for the SNP→gene mapping.
dense_attention (bool, optional) – Whether to use dense attention.
sys_annot_file (str, optional) – Path to a file containing system annotations.
by_chr (bool, optional) – Whether to process by chromosome.
multiple_phenotypes (bool, optional) – Whether to handle multiple phenotypes.
block_bias (bool, optional) – Whether to use block bias.
- init_ontology_with_snp(ontology_df, snp2gene, inplace=True, multiple_phenotypes=False, verbose=True)
Extends TreeParser.init_ontology by also loading and wiring the SNP→gene table (snp2gene).
- Parameters:
ontology_df (pandas.DataFrame) – A pandas DataFrame containing the ontology.
snp2gene (str or pandas.DataFrame) – Path or DataFrame for the SNP→gene mapping.
inplace (bool, optional) – Whether to modify the object in place.
multiple_phenotypes (bool, optional) – Whether to handle multiple phenotypes.
verbose (bool, optional) – Whether to print progress messages.
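This entry does not show how SNPs are wired into the ontology. The pandas sketch below assumes one plausible approach. It joins snp2gene with the ontology's gene rows (interaction == "gene") to get SNP→system pairs, and it reports SNPs whose gene is missing from the ontology. This page does not say whether SNPTreeParser drops such SNPs, keeps them, or warns about them. GENE_X is a placeholder for a gene absent from the ontology.

```python
import pandas as pd

ontology_df = pd.DataFrame(
    {
        "parent": ["immune_system", "immune_system", "adaptive_immunity",
                   "adaptive_immunity", "innate_immunity"],
        "child": ["innate_immunity", "adaptive_immunity", "IL7R", "IL2RA", "TLR4"],
        "interaction": ["is_a", "is_a", "gene", "gene", "gene"],
    }
)
snp2gene = pd.DataFrame(
    {"snp": ["snp1", "snp2", "snp3", "snp4"], "gene": ["IL7R", "IL7R", "TLR4", "GENE_X"]}
)

# Gene annotations are the ontology rows whose interaction is "gene".
gene2sys = (
    ontology_df.loc[ontology_df["interaction"] == "gene", ["parent", "child"]]
    .rename(columns={"parent": "system", "child": "gene"})
)
# A left join keeps every SNP; indicator=True marks SNPs whose gene has no system.
merged = snp2gene.merge(gene2sys, on="gene", how="left", indicator=True)
unmapped_snps = merged.loc[merged["_merge"] == "left_only", "snp"].tolist()
snp2sys = merged.loc[merged["_merge"] == "both", ["snp", "gene", "system"]].reset_index(drop=True)
print(snp2sys)
print(unmapped_snps)
```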