Query
Query interface
PyHGNC provides a powerful query interface for the stored data. It can be accessed from the Python shell:
import pyhgnc
query = pyhgnc.query()
You can use the query interface instance to issue a query to any model defined in pyhgnc.manager.models:
# Issue query on hgnc table:
query.hgnc()
# Issue query on pubmed table:
query.pubmed()
Hint
See Query functions for more examples and check out pyhgnc.manager.query.QueryManager (below) for all possible parameters for the different models.
Query Manager Reference
-
class pyhgnc.manager.query.QueryManager(connection=None, echo=False)
Query interface to database.
-
alias_name(alias_name=None, is_previous_name=None, hgnc_symbol=None, hgnc_identifier=None, limit=None, as_df=False)
Method to query models.AliasName objects in database.
Parameters:
- alias_name (str or tuple(str) or None) – alias name(s)
- is_previous_name (bool or tuple(bool) or None) – flag for ‘is previous’
- hgnc_symbol (str or tuple(str) or None) – HGNC symbol(s)
- hgnc_identifier (int or tuple(int) or None) – identifier(s) in models.HGNC
- limit (int or tuple(int) or None) –
  - if isinstance(limit,int)==True -> limit
  - if isinstance(limit,tuple)==True -> format:= tuple(page_number, results_per_page)
  - if limit == None -> all results
- as_df (bool) – if True results are returned as pandas.DataFrame
Returns:
- if as_df == False -> list(models.AliasName)
- if as_df == True -> pandas.DataFrame
Return type: list(models.AliasName) or pandas.DataFrame
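The three forms of the limit parameter described above can be illustrated on a plain Python list. This is a minimal sketch, not PyHGNC code: apply_limit is a hypothetical helper, and it assumes page numbers are counted from 0, which the reference does not state.

```python
from typing import Optional, Sequence, Tuple, Union

Limit = Optional[Union[int, Tuple[int, int]]]


def apply_limit(rows: Sequence, limit: Limit = None) -> list:
    """Mimic the documented ``limit`` semantics on an in-memory sequence.

    Hypothetical helper for illustration only, not part of PyHGNC.
    """
    if limit is None:
        return list(rows)                    # None -> all results
    if isinstance(limit, int):
        return list(rows[:limit])            # int -> at most `limit` results
    page_number, results_per_page = limit    # tuple -> one page of results
    # ASSUMPTION: page_number is zero-based.
    start = page_number * results_per_page
    return list(rows[start:start + results_per_page])


symbols = ["A1BG", "A1CF", "A2M", "A2ML1", "A4GALT"]
first_two = apply_limit(symbols, 2)          # plain integer limit
second_page = apply_limit(symbols, (1, 2))   # (page_number, results_per_page)
everything = apply_limit(symbols)            # no limit
```

Under the zero-based assumption, (1, 2) selects the third and fourth entries.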
-
alias_symbol(alias_symbol=None, is_previous_symbol=None, hgnc_symbol=None, hgnc_identifier=None, limit=None, as_df=False)
Method to query models.AliasSymbol objects in database.
Parameters:
- alias_symbol (str or tuple(str) or None) – alias symbol(s)
- is_previous_symbol (bool or tuple(bool) or None) – flag for ‘is previous’
- hgnc_symbol (str or tuple(str) or None) – HGNC symbol(s)
- hgnc_identifier (int or tuple(int) or None) – identifier(s) in models.HGNC
- limit (int or tuple(int) or None) –
  - if isinstance(limit,int)==True -> limit
  - if isinstance(limit,tuple)==True -> format:= tuple(page_number, results_per_page)
  - if limit == None -> all results
- as_df (bool) – if True results are returned as pandas.DataFrame
Returns:
- if as_df == False -> list(models.AliasSymbol)
- if as_df == True -> pandas.DataFrame
Return type: list(models.AliasSymbol) or pandas.DataFrame
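Most parameters in this reference accept a single value, a tuple of values, or None. One plausible reading is sketched below on in-memory dictionaries: None means "no filter", a tuple means "any of these values", and a single value means "exactly this value". The helpers matches and filter_records are hypothetical, the records are made-up placeholders, and the exact-match behaviour is an assumption (the real query manager may match strings differently).

```python
def matches(value, criterion):
    """Return True if `value` satisfies a None / single / tuple criterion.

    Hypothetical helper for illustration only, not part of PyHGNC.
    """
    if criterion is None:
        return True                      # None -> parameter is ignored
    if isinstance(criterion, tuple):
        return value in criterion        # tuple -> any of the listed values
    return value == criterion            # ASSUMPTION: exact match for single values


def filter_records(records, **criteria):
    """Keep the records that satisfy every given criterion."""
    return [
        record for record in records
        if all(matches(record[key], crit) for key, crit in criteria.items())
    ]


# Placeholder records, not real HGNC data.
records = [
    {"alias_symbol": "ALIAS1", "is_previous_symbol": False, "hgnc_symbol": "GENE1"},
    {"alias_symbol": "ALIAS2", "is_previous_symbol": True, "hgnc_symbol": "GENE1"},
    {"alias_symbol": "ALIAS3", "is_previous_symbol": False, "hgnc_symbol": "GENE2"},
    {"alias_symbol": "ALIAS4", "is_previous_symbol": True, "hgnc_symbol": "GENE3"},
]

one_gene = filter_records(records, hgnc_symbol="GENE1")
two_genes = filter_records(records, hgnc_symbol=("GENE1", "GENE2"))
previous_only = filter_records(records, is_previous_symbol=True, hgnc_symbol=None)
```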
-
ccds(ccdsid=None, hgnc_symbol=None, hgnc_identifier=None, limit=None, as_df=False)
Method to query models.CCDS objects in database.
Parameters:
- ccdsid (str or tuple(str) or None) – Consensus CDS ID(s)
- hgnc_symbol (str or tuple(str) or None) – HGNC symbol(s)
- hgnc_identifier (int or tuple(int) or None) – identifier(s) in models.HGNC
- limit (int or tuple(int) or None) –
  - if isinstance(limit,int)==True -> limit
  - if isinstance(limit,tuple)==True -> format:= tuple(page_number, results_per_page)
  - if limit == None -> all results
- as_df (bool) – if True results are returned as pandas.DataFrame
Returns:
- if as_df == False -> list(models.CCDS)
- if as_df == True -> pandas.DataFrame
Return type: list(models.CCDS) or pandas.DataFrame
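The tuple form of limit lends itself to walking through a large result set page by page. The sketch below shows one way to do that with any query method that accepts limit=(page_number, results_per_page). iter_pages and fake_ccds are hypothetical names, the identifiers are placeholders, and zero-based page numbering is an assumption; with a real QueryManager instance the stand-in would be replaced by a bound method such as query.ccds.

```python
from typing import Callable, Iterator, List


def iter_pages(query_method: Callable[..., List],
               results_per_page: int,
               **filters) -> Iterator[List]:
    """Yield successive pages until the query method returns an empty page.

    Hypothetical helper for illustration only, not part of PyHGNC.
    """
    page_number = 0  # ASSUMPTION: the first page is page 0
    while True:
        page = query_method(limit=(page_number, results_per_page), **filters)
        if not page:
            return
        yield page
        page_number += 1


def fake_ccds(limit=None, **filters):
    """Stand-in for a query method, backed by placeholder identifiers."""
    data = ["CCDS1", "CCDS2", "CCDS3", "CCDS4", "CCDS5"]
    page_number, results_per_page = limit
    start = page_number * results_per_page
    return data[start:start + results_per_page]


pages = list(iter_pages(fake_ccds, 2))
```

Fetching fixed-size pages keeps memory use bounded, at the cost of one extra call that returns the empty page ending the loop.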
-
ena(enaid=None, hgnc_symbol=None, hgnc_identifier=None, limit=None, as_df=False)
Method to query models.ENA objects in database.
Parameters:
- enaid (str or tuple(str) or None) – European Nucleotide Archive (ENA) identifier(s)
- hgnc_symbol (str or tuple(str) or None) – HGNC symbol(s)
- hgnc_identifier (int or tuple(int) or None) – identifier(s) in models.HGNC
- limit (int or tuple(int) or None) –
  - if isinstance(limit,int)==True -> limit
  - if isinstance(limit,tuple)==True -> format:= tuple(page_number, results_per_page)
  - if limit == None -> all results
- as_df (bool) – if True results are returned as pandas.DataFrame
Returns:
- if as_df == False -> list(models.ENA)
- if as_df == True -> pandas.DataFrame
Return type: list(models.ENA) or pandas.DataFrame
-
enzyme(ec_number=None, hgnc_symbol=None, hgnc_identifier=None, limit=None, as_df=False)
Method to query models.Enzyme objects in database.
Parameters:
- ec_number (str or tuple(str) or None) – Enzyme Commission (EC) number(s)
- hgnc_symbol (str or tuple(str) or None) – HGNC symbol(s)
- hgnc_identifier (int or tuple(int) or None) – identifier(s) in models.HGNC
- limit (int or tuple(int) or None) –
  - if isinstance(limit,int)==True -> limit
  - if isinstance(limit,tuple)==True -> format:= tuple(page_number, results_per_page)
  - if limit == None -> all results
- as_df (bool) – if True results are returned as pandas.DataFrame
Returns:
- if as_df == False -> list(models.Enzyme)
- if as_df == True -> pandas.DataFrame
Return type: list(models.Enzyme) or pandas.DataFrame
-
gene_family(family_identifier=None, family_name=None, hgnc_symbol=None, hgnc_identifier=None, limit=None, as_df=False)
Method to query models.GeneFamily objects in database.
Parameters:
- family_identifier (int or tuple(int) or None) – gene family identifier(s)
- family_name (str or tuple(str) or None) – gene family name(s)
- hgnc_symbol (str or tuple(str) or None) – HGNC symbol(s)
- hgnc_identifier (int or tuple(int) or None) – identifier(s) in models.HGNC
- limit (int or tuple(int) or None) –
  - if isinstance(limit,int)==True -> limit
  - if isinstance(limit,tuple)==True -> format:= tuple(page_number, results_per_page)
  - if limit == None -> all results
- as_df (bool) – if True results are returned as pandas.DataFrame
Returns:
- if as_df == False -> list(models.GeneFamily)
- if as_df == True -> pandas.DataFrame
Return type: list(models.GeneFamily) or pandas.DataFrame
-
get_model_queries(query_obj, model_queries_config)
Use this if you are searching for a field in the same model.
-
hgnc(name=None, symbol=None, identifier=None, status=None, uuid=None, locus_group=None, orphanet=None, locus_type=None, date_name_changed=None, date_modified=None, date_symbol_changed=None, pubmedid=None, date_approved_reserved=None, ensembl_gene=None, horde=None, vega=None, lncrnadb=None, uniprotid=None, entrez=None, mirbase=None, iuphar=None, ucsc=None, snornabase=None, gene_family_name=None, mgdid=None, pseudogeneorg=None, bioparadigmsslc=None, locationsortable=None, ec_number=None, refseq_accession=None, merops=None, location=None, cosmic=None, imgt=None, enaid=None, alias_symbol=None, alias_name=None, rgdid=None, omimid=None, ccdsid=None, lsdbs=None, ortholog_species=None, gene_family_identifier=None, limit=None, as_df=False)
Method to query models.HGNC objects in database.
Parameters:
- name (str or tuple(str) or None) – HGNC approved name for the gene
- symbol (str or tuple(str) or None) – HGNC approved gene symbol
- identifier (int or tuple(int) or None) – HGNC ID. A unique ID created by the HGNC for every approved symbol
- status (str or tuple(str) or None) – Status of the symbol report, which can be either “Approved” or “Entry Withdrawn”
- uuid (str or tuple(str) or None) – universally unique identifier
- locus_group (str or tuple(str) or None) – group name for a set of related locus types as defined by the HGNC
- orphanet (int or tuple(int) or None) – Orphanet database identifier (related to rare diseases and orphan drugs)
- locus_type (str or tuple(str) or None) – locus type as defined by the HGNC (e.g. RNA, transfer)
- date_name_changed (str or tuple(str) or None) – date the gene name was last changed (format: YYYY-mm-dd, e.g. 2017-09-29)
- date_modified (str or tuple(str) or None) – date the entry was last modified (format: YYYY-mm-dd, e.g. 2017-09-29)
- date_symbol_changed (str or tuple(str) or None) – date the gene symbol was last changed (format: YYYY-mm-dd, e.g. 2017-09-29)
- date_approved_reserved (str or tuple(str) or None) – date the entry was first approved (format: YYYY-mm-dd, e.g. 2017-09-29)
- pubmedid (int or tuple(int) or None) – PubMed identifier
- ensembl_gene (str or tuple(str) or None) – Ensembl gene ID. Found within the “GENE RESOURCES” section of the gene symbol report
- horde (str or tuple(str) or None) – symbol used within HORDE for the gene (not available in JSON)
- vega (str or tuple(str) or None) – Vega gene ID. Found within the “GENE RESOURCES” section of the gene symbol report
- lncrnadb (str or tuple(str) or None) – Noncoding RNA Database identifier
- uniprotid (str or tuple(str) or None) – UniProt identifier
- entrez (str or tuple(str) or None) – Entrez gene ID. Found within the “GENE RESOURCES” section of the gene symbol report
- mirbase (str or tuple(str) or None) – miRBase ID
- iuphar (str or tuple(str) or None) – The objectId used to link to the IUPHAR/BPS Guide to PHARMACOLOGY database
- ucsc (str or tuple(str) or None) – UCSC gene ID. Found within the “GENE RESOURCES” section of the gene symbol report
- snornabase (str or tuple(str) or None) – snoRNABase ID
- gene_family_name (int or tuple(int) or None) – Gene family name
- gene_family_identifier – Gene family identifier
- mgdid (int or tuple(int) or None) – Mouse Genome Database identifier
- imgt (str or tuple(str) or None) – Symbol used within the international ImMunoGeneTics information system
- enaid (str or tuple(str) or None) – European Nucleotide Archive (ENA) identifier
- alias_symbol (str or tuple(str) or None) – Other symbols used to refer to a gene
- alias_name (str or tuple(str) or None) – Other names used to refer to a gene
- pseudogeneorg (str or tuple(str) or None) – Pseudogene.org ID
- bioparadigmsslc (str or tuple(str) or None) – Symbol used to link to the SLC tables database at bioparadigms.org for the gene
- locationsortable (str or tuple(str) or None) – locations sortable
- ec_number (str or tuple(str) or None) – Enzyme Commission number (EC number)
- refseq_accession (str or tuple(str) or None) – RefSeq nucleotide accession(s)
- merops (str or tuple(str) or None) – ID used to link to the MEROPS peptidase database
- location (str or tuple(str) or None) – Cytogenetic location of the gene (e.g. 2q34)
- cosmic (str or tuple(str) or None) – Symbol used within the Catalogue of Somatic Mutations in Cancer for the gene
- rgdid (int or tuple(int) or None) – Rat genome database gene ID
- omimid (int or tuple(int) or None) – Online Mendelian Inheritance in Man (OMIM) ID
- ccdsid (str or tuple(str) or None) – Consensus CDS ID
- lsdbs (str or tuple(str) or None) – Locus Specific Mutation Database Name
- ortholog_species (int or tuple(int) or None) – Ortholog species NCBI taxonomy identifier
- limit (int or tuple(int) or None) –
  - if isinstance(limit,int)==True -> limit
  - if isinstance(limit,tuple)==True -> format:= tuple(page_number, results_per_page)
  - if limit == None -> all results
- as_df (bool) – if True results are returned as pandas.DataFrame
Returns:
- if as_df == False -> list(models.HGNC)
- if as_df == True -> pandas.DataFrame
Return type: list(models.HGNC) or pandas.DataFrame
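The date parameters of hgnc expect strings in the form YYYY-mm-dd (e.g. 2017-09-29). The following sketch uses only the Python standard library to produce and check such strings before passing them to a query. HGNC_DATE_FORMAT, to_hgnc_date and is_hgnc_date are hypothetical names introduced for illustration and are not part of PyHGNC.

```python
from datetime import date, datetime

# strftime/strptime pattern corresponding to YYYY-mm-dd
HGNC_DATE_FORMAT = "%Y-%m-%d"


def to_hgnc_date(value: date) -> str:
    """Format a date object as a YYYY-mm-dd string."""
    return value.strftime(HGNC_DATE_FORMAT)


def is_hgnc_date(text: str) -> bool:
    """Check that `text` is a valid, zero-padded YYYY-mm-dd date."""
    try:
        parsed = datetime.strptime(text, HGNC_DATE_FORMAT)
    except ValueError:
        return False
    # strptime tolerates missing zero padding, so compare the round trip.
    return parsed.strftime(HGNC_DATE_FORMAT) == text


formatted = to_hgnc_date(date(2017, 9, 29))
valid = is_hgnc_date("2017-09-29")
invalid = is_hgnc_date("29.09.2017")
```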
-
lsdb(lsdb=None, url=None, hgnc_symbol=None, hgnc_identifier=None, limit=None, as_df=False)
Method to query models.LSDB objects in database.
Parameters:
- lsdb (str or tuple(str) or None) – name(s) of the Locus Specific Mutation Database
- url (str or tuple(str) or None) – URL of the Locus Specific Mutation Database
- hgnc_symbol (str or tuple(str) or None) – HGNC symbol(s)
- hgnc_identifier (int or tuple(int) or None) – identifier(s) in models.HGNC
- limit (int or tuple(int) or None) –
  - if isinstance(limit,int)==True -> limit
  - if isinstance(limit,tuple)==True -> format:= tuple(page_number, results_per_page)
  - if limit == None -> all results
- as_df (bool) – if True results are returned as pandas.DataFrame
Returns:
- if as_df == False -> list(models.LSDB)
- if as_df == True -> pandas.DataFrame
Return type: list(models.LSDB) or pandas.DataFrame
-
mgd(mgdid=None, hgnc_symbol=None, hgnc_identifier=None, limit=None, as_df=False)
Method to query models.MGD objects in database.
Parameters:
- mgdid (str or tuple(str) or None) – Mouse genome informatics database ID(s)
- hgnc_symbol (str or tuple(str) or None) – HGNC symbol(s)
- hgnc_identifier (int or tuple(int) or None) – identifier(s) in models.HGNC
- limit (int or tuple(int) or None) –
  - if isinstance(limit,int)==True -> limit
  - if isinstance(limit,tuple)==True -> format:= tuple(page_number, results_per_page)
  - if limit == None -> all results
- as_df (bool) – if True results are returned as pandas.DataFrame
Returns:
- if as_df == False -> list(models.MGD)
- if as_df == True -> pandas.DataFrame
Return type: list(models.MGD) or pandas.DataFrame
-
omim(omimid=None, hgnc_symbol=None, hgnc_identifier=None, limit=None, as_df=False)
Method to query models.OMIM objects in database.
Parameters:
- omimid (str or tuple(str) or None) – Online Mendelian Inheritance in Man (OMIM) ID(s)
- hgnc_symbol (str or tuple(str) or None) – HGNC symbol(s)
- hgnc_identifier (int or tuple(int) or None) – identifier(s) in models.HGNC
- limit (int or tuple(int) or None) –
  - if isinstance(limit,int)==True -> limit
  - if isinstance(limit,tuple)==True -> format:= tuple(page_number, results_per_page)
  - if limit == None -> all results
- as_df (bool) – if True results are returned as pandas.DataFrame
Returns:
- if as_df == False -> list(models.OMIM)
- if as_df == True -> pandas.DataFrame
Return type: list(models.OMIM) or pandas.DataFrame
-
orthology_prediction(ortholog_species=None, human_entrez_gene=None, human_ensembl_gene=None, human_name=None, human_symbol=None, human_chr=None, human_assert_ids=None, ortholog_species_entrez_gene=None, ortholog_species_ensembl_gene=None, ortholog_species_db_id=None, ortholog_species_name=None, ortholog_species_symbol=None, ortholog_species_chr=None, ortholog_species_assert_ids=None, support=None, hgnc_identifier=None, hgnc_symbol=None, limit=None, as_df=False)
Method to query pyhgnc.manager.models.OrthologyPrediction objects in database.
Parameters:
- ortholog_species (int) – NCBI taxonomy identifier
- human_entrez_gene (str) – Entrez gene identifier
- human_ensembl_gene (str) – Ensembl identifier
- human_name (str) – human gene name
- human_symbol (str) – human gene symbol
- human_chr (str) – human chromosome
- human_assert_ids (str) –
- ortholog_species_entrez_gene (str) – Entrez gene identifier for ortholog
- ortholog_species_ensembl_gene (str) – Ensembl gene identifier for ortholog
- ortholog_species_db_id (str) – Species specific database identifier (e.g. MGI:1920453)
- ortholog_species_name (str) – gene name of ortholog
- ortholog_species_symbol (str) – gene symbol of ortholog
- ortholog_species_chr (str) – chromosome identifier (ortholog)
- ortholog_species_assert_ids (str) –
- support (str) –
- hgnc_identifier (int) – HGNC identifier
- hgnc_symbol (str) – HGNC symbol
- limit (int or tuple(int) or None) –
  - if isinstance(limit,int)==True -> limit
  - if isinstance(limit,tuple)==True -> format:= tuple(page_number, results_per_page)
  - if limit == None -> all results
- as_df (bool) – if True results are returned as pandas.DataFrame
Returns:
- if as_df == False -> list(models.OrthologyPrediction)
- if as_df == True -> pandas.DataFrame
Return type: list(models.OrthologyPrediction) or pandas.DataFrame
-
pubmed(pubmedid=None, hgnc_symbol=None, hgnc_identifier=None, limit=None, as_df=False)
Method to query models.PubMed objects in database.
Parameters:
- pubmedid (str or tuple(str) or None) – PubMed identifier(s)
- hgnc_symbol (str or tuple(str) or None) – HGNC symbol(s)
- hgnc_identifier (int or tuple(int) or None) – identifier(s) in models.HGNC
- limit (int or tuple(int) or None) –
  - if isinstance(limit,int)==True -> limit
  - if isinstance(limit,tuple)==True -> format:= tuple(page_number, results_per_page)
  - if limit == None -> all results
- as_df (bool) – if True results are returned as pandas.DataFrame
Returns:
- if as_df == False -> list(models.PubMed)
- if as_df == True -> pandas.DataFrame
Return type: list(models.PubMed) or pandas.DataFrame
-
ref_seq(accession=None, hgnc_symbol=None, hgnc_identifier=None, limit=None, as_df=False)
Method to query models.RefSeq objects in database.
Parameters:
- accession (str or tuple(str) or None) – RefSeq accession(s)
- hgnc_symbol (str or tuple(str) or None) – HGNC symbol(s)
- hgnc_identifier (int or tuple(int) or None) – identifier(s) in models.HGNC
- limit (int or tuple(int) or None) –
  - if isinstance(limit,int)==True -> limit
  - if isinstance(limit,tuple)==True -> format:= tuple(page_number, results_per_page)
  - if limit == None -> all results
- as_df (bool) – if True results are returned as pandas.DataFrame
Returns:
- if as_df == False -> list(models.RefSeq)
- if as_df == True -> pandas.DataFrame
Return type: list(models.RefSeq) or pandas.DataFrame
-
rgd(rgdid=None, hgnc_symbol=None, hgnc_identifier=None, limit=None, as_df=False)
Method to query models.RGD objects in database.
Parameters:
- rgdid (str or tuple(str) or None) – Rat genome database gene ID(s)
- hgnc_symbol (str or tuple(str) or None) – HGNC symbol(s)
- hgnc_identifier (int or tuple(int) or None) – identifier(s) in models.HGNC
- limit (int or tuple(int) or None) –
  - if isinstance(limit,int)==True -> limit
  - if isinstance(limit,tuple)==True -> format:= tuple(page_number, results_per_page)
  - if limit == None -> all results
- as_df (bool) – if True results are returned as pandas.DataFrame
Returns:
- if as_df == False -> list(models.RGD)
- if as_df == True -> pandas.DataFrame
Return type: list(models.RGD) or pandas.DataFrame
-
uniprot(uniprotid=None, hgnc_symbol=None, hgnc_identifier=None, limit=None, as_df=False)
Method to query models.UniProt objects in database.
Parameters:
- uniprotid (str or tuple(str) or None) – UniProt identifier(s)
- hgnc_symbol (str or tuple(str) or None) – HGNC symbol(s)
- hgnc_identifier (int or tuple(int) or None) – identifier(s) in models.HGNC
- limit (int or tuple(int) or None) –
  - if isinstance(limit,int)==True -> limit
  - if isinstance(limit,tuple)==True -> format:= tuple(page_number, results_per_page)
  - if limit == None -> all results
- as_df (bool) – if True results are returned as pandas.DataFrame
Returns:
- if as_df == False -> list(models.UniProt)
- if as_df == True -> pandas.DataFrame
Return type: list(models.UniProt) or pandas.DataFrame
-