Query

Query interface

PyHGNC provides a powerful query interface for the stored data. It can be accessed from a Python shell:

import pyhgnc
query = pyhgnc.query()

You can use the query interface instance to issue a query to any model defined in pyhgnc.manager.models:

# Issue query on hgnc table:
query.hgnc()

# Issue query on pubmed table:
query.pubmed()
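As a minimal sketch of how keyword filters might be passed to a model query, the helper below wraps `query.hgnc()` and collects the symbols of the matching entries. The helper name `approved_symbols` and the `StubQuery` class are hypothetical and exist only so the example runs without a populated database; that result objects expose a `symbol` attribute is also an assumption. The `symbol` and `limit` parameters are the ones listed in the Query Manager Reference.

```python
from types import SimpleNamespace


def approved_symbols(query, symbols, limit=None):
    """Return the symbols of all HGNC entries matching `symbols`.

    `query` is expected to behave like the object returned by pyhgnc.query().
    Assumption: each returned entry has a `symbol` attribute.
    """
    results = query.hgnc(symbol=tuple(symbols), limit=limit)
    return [entry.symbol for entry in results]


class StubQuery:
    """Hypothetical stand-in so the sketch runs without a PyHGNC database."""

    _rows = [
        SimpleNamespace(symbol="APP"),
        SimpleNamespace(symbol="TP53"),
        SimpleNamespace(symbol="BRCA1"),
    ]

    def hgnc(self, symbol=None, limit=None, as_df=False):
        # Keep only rows whose symbol was requested, then apply an int limit.
        hits = [r for r in self._rows if symbol is None or r.symbol in symbol]
        return hits if limit is None else hits[:limit]


found = approved_symbols(StubQuery(), ["APP", "TP53"])
print(found)
```

With a populated database, passing `pyhgnc.query()` instead of `StubQuery()` would be the corresponding call.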

Hint

See Query functions for more examples and check out pyhgnc.manager.query.QueryManager (below) for all possible parameters for the different models.

Query Manager Reference

class pyhgnc.manager.query.QueryManager(connection=None, echo=False)[source]

Query interface to database.

alias_name(alias_name=None, is_previous_name=None, hgnc_symbol=None, hgnc_identifier=None, limit=None, as_df=False)[source]

Method to query models.AliasName objects in database

Parameters:
Returns:

Return type:

list(models.AliasName) or pandas.DataFrame

alias_symbol(alias_symbol=None, is_previous_symbol=None, hgnc_symbol=None, hgnc_identifier=None, limit=None, as_df=False)[source]

Method to query models.AliasSymbol objects in database

Parameters:
Returns:

Return type:

list(models.AliasSymbol) or pandas.DataFrame

ccds(ccdsid=None, hgnc_symbol=None, hgnc_identifier=None, limit=None, as_df=False)[source]

Method to query models.CCDS objects in database

Parameters:
  • ccdsid (str or tuple(str) or None) – Consensus CDS ID(s)
  • hgnc_symbol (str or tuple(str) or None) – HGNC symbol(s)
  • hgnc_identifier (int or tuple(int) or None) – identifier(s) in models.HGNC
  • limit (int or tuple(int) or None) –
    • if isinstance(limit,int)==True -> limit
    • if isinstance(limit,tuple)==True -> format:= tuple(page_number, results_per_page)
    • if limit == None -> all results
  • as_df (bool) – if True results are returned as pandas.DataFrame
Returns:

Return type:

list(models.CCDS) or pandas.DataFrame
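The three forms of the `limit` parameter described above can be sketched as a small translation into a row count and an offset. This is an illustration of the documented behaviour, not PyHGNC's own code; the function name `limit_to_window` is hypothetical, and zero-based page numbering is an assumption that the documentation does not state.

```python
def limit_to_window(limit):
    """Translate `limit` into a (max_rows, offset) pair.

    Returns (None, 0) when all results are requested.
    Assumption: page numbers are zero-based, so page 0 starts at offset 0.
    """
    if limit is None:
        # No limit given: every result is returned.
        return None, 0
    if isinstance(limit, int):
        # Plain integer: at most that many results, starting from the first.
        return limit, 0
    if isinstance(limit, tuple) and len(limit) == 2:
        # Tuple form: (page_number, results_per_page).
        page_number, results_per_page = limit
        return results_per_page, page_number * results_per_page
    raise TypeError(
        "limit must be None, an int, or a (page_number, results_per_page) tuple"
    )


print(limit_to_window(None))
print(limit_to_window(10))
print(limit_to_window((2, 25)))
```

Under the zero-based assumption, `limit=(2, 25)` would skip the first 50 results and return the next 25.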

ena(enaid=None, hgnc_symbol=None, hgnc_identifier=None, limit=None, as_df=False)[source]

Method to query models.ENA objects in database

Parameters:
  • enaid (str or tuple(str) or None) – European Nucleotide Archive (ENA) identifier(s)
  • hgnc_symbol (str or tuple(str) or None) – HGNC symbol(s)
  • hgnc_identifier (int or tuple(int) or None) – identifier(s) in models.HGNC
  • limit (int or tuple(int) or None) –
    • if isinstance(limit,int)==True -> limit
    • if isinstance(limit,tuple)==True -> format:= tuple(page_number, results_per_page)
    • if limit == None -> all results
  • as_df (bool) – if True results are returned as pandas.DataFrame
Returns:

Return type:

list(models.ENA) or pandas.DataFrame

enzyme(ec_number=None, hgnc_symbol=None, hgnc_identifier=None, limit=None, as_df=False)[source]

Method to query models.Enzyme objects in database

Parameters:
  • ec_number (str or tuple(str) or None) – Enzyme Commission (EC) number(s)
  • hgnc_symbol (str or tuple(str) or None) – HGNC symbol(s)
  • hgnc_identifier (int or tuple(int) or None) – identifier(s) in models.HGNC
  • limit (int or tuple(int) or None) –
    • if isinstance(limit,int)==True -> limit
    • if isinstance(limit,tuple)==True -> format:= tuple(page_number, results_per_page)
    • if limit == None -> all results
  • as_df (bool) – if True results are returned as pandas.DataFrame
Returns:

Return type:

list(models.Enzyme) or pandas.DataFrame

gene_family(family_identifier=None, family_name=None, hgnc_symbol=None, hgnc_identifier=None, limit=None, as_df=False)[source]

Method to query models.GeneFamily objects in database

Parameters:
  • family_identifier (int or tuple(int) or None) – gene family identifier(s)
  • family_name (str or tuple(str) or None) – gene family name(s)
  • hgnc_symbol (str or tuple(str) or None) – HGNC symbol(s)
  • hgnc_identifier (int or tuple(int) or None) – identifier(s) in models.HGNC
  • limit (int or tuple(int) or None) –
    • if isinstance(limit,int)==True -> limit
    • if isinstance(limit,tuple)==True -> format:= tuple(page_number, results_per_page)
    • if limit == None -> all results
  • as_df (bool) – if True results are returned as pandas.DataFrame
Returns:

Return type:

list(models.GeneFamily) or pandas.DataFrame

get_model_queries(query_obj, model_queries_config)[source]

Use this if you are searching for a field in the same model.

hgnc(name=None, symbol=None, identifier=None, status=None, uuid=None, locus_group=None, orphanet=None, locus_type=None, date_name_changed=None, date_modified=None, date_symbol_changed=None, pubmedid=None, date_approved_reserved=None, ensembl_gene=None, horde=None, vega=None, lncrnadb=None, uniprotid=None, entrez=None, mirbase=None, iuphar=None, ucsc=None, snornabase=None, gene_family_name=None, mgdid=None, pseudogeneorg=None, bioparadigmsslc=None, locationsortable=None, ec_number=None, refseq_accession=None, merops=None, location=None, cosmic=None, imgt=None, enaid=None, alias_symbol=None, alias_name=None, rgdid=None, omimid=None, ccdsid=None, lsdbs=None, ortholog_species=None, gene_family_identifier=None, limit=None, as_df=False)[source]

Method to query pyhgnc.manager.models.HGNC

Parameters:
  • name (str or tuple(str) or None) – HGNC approved name for the gene
  • symbol (str or tuple(str) or None) – HGNC approved gene symbol
  • identifier (int or tuple(int) or None) – HGNC ID. A unique ID created by the HGNC for every approved symbol
  • status (str or tuple(str) or None) – Status of the symbol report, which can be either “Approved” or “Entry Withdrawn”
  • uuid (str or tuple(str) or None) – universally unique identifier
  • locus_group (str or tuple(str) or None) – group name for a set of related locus types as defined by the HGNC
  • orphanet (int or tuple(int) or None) – Orphanet database identifier (related to rare diseases and orphan drugs)
  • locus_type (str or tuple(str) or None) – locus type as defined by the HGNC (e.g. RNA, transfer)
  • date_name_changed (str or tuple(str) or None) – date the gene name was last changed (format: YYYY-mm-dd, e.g. 2017-09-29)
  • date_modified (str or tuple(str) or None) – date the entry was last modified (format: YYYY-mm-dd, e.g. 2017-09-29)
  • date_symbol_changed (str or tuple(str) or None) – date the gene symbol was last changed (format: YYYY-mm-dd, e.g. 2017-09-29)
  • date_approved_reserved (str or tuple(str) or None) – date the entry was first approved (format: YYYY-mm-dd, e.g. 2017-09-29)
  • pubmedid (int or tuple(int) or None) – PubMed identifier
  • ensembl_gene (str or tuple(str) or None) – Ensembl gene ID. Found within the “GENE RESOURCES” section of the gene symbol report
  • horde (str or tuple(str) or None) – symbol used within HORDE for the gene (not available in JSON)
  • vega (str or tuple(str) or None) – Vega gene ID. Found within the “GENE RESOURCES” section of the gene symbol report
  • lncrnadb (str or tuple(str) or None) – Noncoding RNA Database identifier
  • uniprotid (str or tuple(str) or None) – UniProt identifier
  • entrez (str or tuple(str) or None) – Entrez gene ID. Found within the “GENE RESOURCES” section of the gene symbol report
  • mirbase (str or tuple(str) or None) – miRBase ID
  • iuphar (str or tuple(str) or None) – The objectId used to link to the IUPHAR/BPS Guide to PHARMACOLOGY database
  • ucsc (str or tuple(str) or None) – UCSC gene ID. Found within the “GENE RESOURCES” section of the gene symbol report
  • snornabase (str or tuple(str) or None) – snoRNABase ID
  • gene_family_name (str or tuple(str) or None) – Gene family name
  • gene_family_identifier (int or tuple(int) or None) – Gene family identifier
  • mgdid (int or tuple(int) or None) – Mouse Genome Database identifier
  • imgt (str or tuple(str) or None) – Symbol used within international ImMunoGeneTics information system
  • enaid (str or tuple(str) or None) – European Nucleotide Archive (ENA) identifier
  • alias_symbol (str or tuple(str) or None) – Other symbols used to refer to a gene
  • alias_name (str or tuple(str) or None) – Other names used to refer to a gene
  • pseudogeneorg (str or tuple(str) or None) – Pseudogene.org ID
  • bioparadigmsslc (str or tuple(str) or None) – Symbol used to link to the SLC tables database at bioparadigms.org for the gene
  • locationsortable (str or tuple(str) or None) – locations sortable
  • ec_number (str or tuple(str) or None) – Enzyme Commission number (EC number)
  • refseq_accession (str or tuple(str) or None) – RefSeq nucleotide accession(s)
  • merops (str or tuple(str) or None) – ID used to link to the MEROPS peptidase database
  • location (str or tuple(str) or None) – Cytogenetic location of the gene (e.g. 2q34).
  • cosmic (str or tuple(str) or None) – Symbol used within the Catalogue of somatic mutations in cancer for the gene
  • rgdid (int or tuple(int) or None) – Rat genome database gene ID
  • omimid (int or tuple(int) or None) – Online Mendelian Inheritance in Man (OMIM) ID
  • ccdsid (str or tuple(str) or None) – Consensus CDS ID
  • lsdbs (str or tuple(str) or None) – Locus Specific Mutation Database Name
  • ortholog_species (int or tuple(int) or None) – Ortholog species NCBI taxonomy identifier
  • limit (int or tuple(int) or None) –
    • if isinstance(limit,int)==True -> limit
    • if isinstance(limit,tuple)==True -> format:= tuple(page_number, results_per_page)
    • if limit == None -> all results
  • as_df (bool) – if True results are returned as pandas.DataFrame
Returns:

  • if as_df == False -> list(models.HGNC)
  • if as_df == True -> pandas.DataFrame

Return type:

list(models.HGNC) or pandas.DataFrame
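Since the date parameters of `hgnc()` expect strings in the format YYYY-mm-dd, one way to prepare several filters for a single call is sketched below. The helper name `hgnc_filters` is hypothetical; `status`, `date_modified` and `locus_group` are parameters from the list above. How PyHGNC combines multiple filters is not stated in this reference, so the sketch only builds the keyword arguments.

```python
from datetime import date


def hgnc_filters(modified_on, status="Approved", locus_group=None):
    """Build keyword arguments for query.hgnc().

    `modified_on` is a datetime.date; it is rendered as YYYY-mm-dd,
    the format documented for the date parameters.
    """
    filters = {
        "status": status,
        "date_modified": modified_on.strftime("%Y-%m-%d"),
    }
    if locus_group is not None:
        filters["locus_group"] = locus_group
    return filters


filters = hgnc_filters(date(2017, 9, 29))
print(filters)
```

With a populated database, the dictionary could then be unpacked into the call, for example `query.hgnc(as_df=True, **filters)`.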

lsdb(lsdb=None, url=None, hgnc_symbol=None, hgnc_identifier=None, limit=None, as_df=False)[source]

Method to query models.LSDB objects in database

Parameters:
  • lsdb (str or tuple(str) or None) – name(s) of the Locus Specific Mutation Database
  • url (str or tuple(str) or None) – URL of the Locus Specific Mutation Database
  • hgnc_symbol (str or tuple(str) or None) – HGNC symbol(s)
  • hgnc_identifier (int or tuple(int) or None) – identifier(s) in models.HGNC
  • limit (int or tuple(int) or None) –
    • if isinstance(limit,int)==True -> limit
    • if isinstance(limit,tuple)==True -> format:= tuple(page_number, results_per_page)
    • if limit == None -> all results
  • as_df (bool) – if True results are returned as pandas.DataFrame
Returns:

Return type:

list(models.LSDB) or pandas.DataFrame

mgd(mgdid=None, hgnc_symbol=None, hgnc_identifier=None, limit=None, as_df=False)[source]

Method to query models.MGD objects in database

Parameters:
  • mgdid (str or tuple(str) or None) – Mouse genome informatics database ID(s)
  • hgnc_symbol (str or tuple(str) or None) – HGNC symbol(s)
  • hgnc_identifier (int or tuple(int) or None) – identifier(s) in models.HGNC
  • limit (int or tuple(int) or None) –
    • if isinstance(limit,int)==True -> limit
    • if isinstance(limit,tuple)==True -> format:= tuple(page_number, results_per_page)
    • if limit == None -> all results
  • as_df (bool) – if True results are returned as pandas.DataFrame
Returns:

Return type:

list(models.MGD) or pandas.DataFrame

omim(omimid=None, hgnc_symbol=None, hgnc_identifier=None, limit=None, as_df=False)[source]

Method to query models.OMIM objects in database

Parameters:
  • omimid (str or tuple(str) or None) – Online Mendelian Inheritance in Man (OMIM) ID(s)
  • hgnc_symbol (str or tuple(str) or None) – HGNC symbol(s)
  • hgnc_identifier (int or tuple(int) or None) – identifier(s) in models.HGNC
  • limit (int or tuple(int) or None) –
    • if isinstance(limit,int)==True -> limit
    • if isinstance(limit,tuple)==True -> format:= tuple(page_number, results_per_page)
    • if limit == None -> all results
  • as_df (bool) – if True results are returned as pandas.DataFrame
Returns:

Return type:

list(models.OMIM) or pandas.DataFrame

orthology_prediction(ortholog_species=None, human_entrez_gene=None, human_ensembl_gene=None, human_name=None, human_symbol=None, human_chr=None, human_assert_ids=None, ortholog_species_entrez_gene=None, ortholog_species_ensembl_gene=None, ortholog_species_db_id=None, ortholog_species_name=None, ortholog_species_symbol=None, ortholog_species_chr=None, ortholog_species_assert_ids=None, support=None, hgnc_identifier=None, hgnc_symbol=None, limit=None, as_df=False)[source]

Method to query pyhgnc.manager.models.OrthologyPrediction

Parameters:
  • ortholog_species (int) – NCBI taxonomy identifier
  • human_entrez_gene (str) – Entrez gene identifier
  • human_ensembl_gene (str) – Ensembl identifier
  • human_name (str) – human gene name
  • human_symbol (str) – human gene symbol
  • human_chr (str) – human chromosome
  • human_assert_ids (str) –
  • ortholog_species_entrez_gene (str) – Entrez gene identifier for ortholog
  • ortholog_species_ensembl_gene (str) – Ensembl gene identifier for ortholog
  • ortholog_species_db_id (str) – Species specific database identifier (e.g. MGI:1920453)
  • ortholog_species_name (str) – gene name of ortholog
  • ortholog_species_symbol (str) – gene symbol of ortholog
  • ortholog_species_chr (str) – chromosome identifier (ortholog)
  • ortholog_species_assert_ids (str) –
  • support (str) –
  • hgnc_identifier (int) – HGNC identifier
  • hgnc_symbol (str) – HGNC symbol
  • limit (int or tuple(int) or None) –
    • if isinstance(limit,int)==True -> limit
    • if isinstance(limit,tuple)==True -> format:= tuple(page_number, results_per_page)
    • if limit == None -> all results
  • as_df (bool) – if True results are returned as pandas.DataFrame
Returns:

  • if as_df == False -> list(models.OrthologyPrediction)
  • if as_df == True -> pandas.DataFrame

Return type:

list(models.OrthologyPrediction) or pandas.DataFrame
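Orthology queries can return many rows, so the tuple form of `limit` may be useful for fetching them page by page. The generator below is a sketch under stated assumptions: `iter_pages` and `fake_orthology_prediction` are hypothetical names, page numbers are assumed to start at 0, and the loop assumes list results (`as_df=False`). The value 10090 is the NCBI taxonomy identifier for mouse.

```python
def iter_pages(query_method, page_size, **filters):
    """Yield result lists page by page until an empty page is returned.

    Assumptions: page numbers start at 0 and `query_method` returns a list.
    """
    page = 0
    while True:
        rows = query_method(limit=(page, page_size), **filters)
        if not rows:
            break
        yield rows
        page += 1


def fake_orthology_prediction(ortholog_species=None, limit=None, as_df=False):
    """Hypothetical stand-in for query.orthology_prediction()."""
    rows = [f"prediction-{i}" for i in range(5)]
    page, size = limit
    # Return the slice of rows that belongs to the requested page.
    return rows[page * size:(page + 1) * size]


pages = list(iter_pages(fake_orthology_prediction, 2, ortholog_species=10090))
print(pages)
```

With a populated database, `query.orthology_prediction` would take the place of the stand-in function.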

pubmed(pubmedid=None, hgnc_symbol=None, hgnc_identifier=None, limit=None, as_df=False)[source]

Method to query models.PubMed objects in database

Parameters:
Returns:

Return type:

list(models.PubMed) or pandas.DataFrame

ref_seq(accession=None, hgnc_symbol=None, hgnc_identifier=None, limit=None, as_df=False)[source]

Method to query models.RefSeq objects in database

Parameters:
  • accession (str or tuple(str) or None) – RefSeq accession(s)
  • hgnc_symbol (str or tuple(str) or None) – HGNC symbol(s)
  • hgnc_identifier (int or tuple(int) or None) – identifier(s) in models.HGNC
  • limit (int or tuple(int) or None) –
    • if isinstance(limit,int)==True -> limit
    • if isinstance(limit,tuple)==True -> format:= tuple(page_number, results_per_page)
    • if limit == None -> all results
  • as_df (bool) – if True results are returned as pandas.DataFrame
Returns:

Return type:

list(models.RefSeq) or pandas.DataFrame

rgd(rgdid=None, hgnc_symbol=None, hgnc_identifier=None, limit=None, as_df=False)[source]

Method to query models.RGD objects in database

Parameters:
  • rgdid (str or tuple(str) or None) – Rat genome database gene ID(s)
  • hgnc_symbol (str or tuple(str) or None) – HGNC symbol(s)
  • hgnc_identifier (int or tuple(int) or None) – identifier(s) in models.HGNC
  • limit (int or tuple(int) or None) –
    • if isinstance(limit,int)==True -> limit
    • if isinstance(limit,tuple)==True -> format:= tuple(page_number, results_per_page)
    • if limit == None -> all results
  • as_df (bool) – if True results are returned as pandas.DataFrame
Returns:

Return type:

list(models.RGD) or pandas.DataFrame

uniprot(uniprotid=None, hgnc_symbol=None, hgnc_identifier=None, limit=None, as_df=False)[source]

Method to query models.UniProt objects in database

Parameters:
  • uniprotid (str or tuple(str) or None) – UniProt identifier(s)
  • hgnc_symbol (str or tuple(str) or None) – HGNC symbol(s)
  • hgnc_identifier (int or tuple(int) or None) – identifier(s) in models.HGNC
  • limit (int or tuple(int) or None) –
    • if isinstance(limit,int)==True -> limit
    • if isinstance(limit,tuple)==True -> format:= tuple(page_number, results_per_page)
    • if limit == None -> all results
  • as_df (bool) – if True results are returned as pandas.DataFrame
Returns:

Return type:

list(models.UniProt) or pandas.DataFrame