Data Models ¶

PyHGNC uses SQLAlchemy to store the data in the database. You can use an instance of pyhgnc.manager.query.QueryManager to query the content of the database.

Entity–relationship model:

ToDo: Add ER figure here!

Contents

Data Models
- HGNC
- AliasSymbol
- AliasName
- GeneFamily
- RefSeq
- RGD
- OMIM
- MGD
- UniProt
- CCDS
- PubMed
- ENA
- Enzyme
- LSDB
- OrthologyPrediction
Database functions

HGNC ¶

class pyhgnc.manager.models.HGNC(**kwargs)[source]¶

Root class (table, model) for all other classes (tables, models) in PyHGNC. Basic information with a 1:1 relationship to the identifier is stored here.

Warning

homeodb (Homeobox Database ID)
horde_id (Symbol used within HORDE for the gene)

These two fields are described in the README but are not present in the HGNC JSON file.

Hint

To link to the IUPHAR/BPS Guide to PHARMACOLOGY database, use only the number from the stored value (for example, use 1 from the result objectId:1).
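The hint above can be sketched as a small helper. This is only an illustration: the function name `iuphar_number` is hypothetical and not part of PyHGNC, and it assumes the stored value always has the form `objectId:<number>` shown in the hint.

```python
def iuphar_number(object_id):
    """Return the numeric part of a value such as 'objectId:1'.

    Hypothetical helper, not part of PyHGNC. Assumes the format
    'objectId:<number>' described in the hint.
    """
    prefix, sep, number = object_id.partition(":")
    if not sep or not number.isdigit():
        raise ValueError(f"unexpected IUPHAR value: {object_id!r}")
    return number


# Only the number after the colon is used for linking.
link_id = iuphar_number("objectId:1")
```

Raising on unexpected input, rather than returning the value unchanged, makes a format change in the source data visible early.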

Variables:

name (str) – HGNC approved name for the gene. Equates to the “APPROVED NAME” field within the gene symbol report
symbol (str) – The HGNC approved gene symbol. Equates to the “APPROVED SYMBOL” field within the gene symbol report
orphanet (int) – Orphanet ID
identifier (str) – Unique ID created by the HGNC for every approved symbol (HGNC ID)
status (str) – Status of the symbol report, which can be either “Approved” or “Entry Withdrawn”
uuid (str) – universally unique identifier
locus_group (str) – Group name for a set of related locus types as defined by the HGNC (e.g. non-coding RNA)
locus_type (str) – Locus type as defined by the HGNC (e.g. RNA, transfer)
date_name_changed (date) – date the gene name was last changed
date_modified (date) – date the entry was last modified
date_symbol_changed (date) – date the gene symbol was last changed
date_approved_reserved (date) – date the entry was first approved
ensembl_gene (str) – Ensembl gene ID. Found within the “GENE RESOURCES” section of the gene symbol report
horde (str) – symbol used within HORDE for the gene (not available in JSON)
vega (str) – Vega gene ID. Found within the “GENE RESOURCES” section of the gene symbol report
lncrnadb (str) – Long Noncoding RNA Database identifier
entrez (str) – Entrez gene ID. Found within the “GENE RESOURCES” section of the gene symbol report
mirbase (str) – miRBase ID
iuphar (str) – The objectId used to link to the IUPHAR/BPS Guide to PHARMACOLOGY database
ucsc (str) – UCSC gene ID. Found within the “GENE RESOURCES” section of the gene symbol report
snornabase (str) – snoRNABase ID
imgt (str) – Symbol used within international ImMunoGeneTics information system
pseudogeneorg (str) – Pseudogene.org ID
bioparadigmsslc (str) – Symbol used to link to the SLC tables database at bioparadigms.org for the gene
locationsortable (str) – sortable form of the cytogenetic location
merops (str) – ID used to link to the MEROPS peptidase database
location (str) – Cytogenetic location of the gene (e.g. 2q34).
cosmic (str) – Symbol used within the Catalogue of somatic mutations in cancer for the gene
rgds (list) – relationship to RGD
omims (list) – relationship to OMIM
ccdss (list) – relationship to CCDS
lsdbs (list) – relationship to LSDB
orthology_predictions (list) – relationship to OrthologyPrediction
enzymes (list) – relationship to Enzyme
gene_families (list) – relationship to GeneFamily
refseq_accessions (list) – relationship to RefSeq
mgds (list) – relationship to MGD
uniprots (list) – relationship to UniProt
pubmeds (list) – relationship to PubMed
enas (list) – relationship to ENA
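The relationship attributes listed above suggest two shapes: models that point back to a single gene (`hgnc`, for example OMIM) and models shared between genes (`hgncs`, for example GeneFamily). The following sketch shows both shapes with the standard-library `sqlite3` module. All table names, column names, and rows are assumptions made for illustration; the schema PyHGNC actually creates through SQLAlchemy may differ.

```python
import sqlite3

# Hypothetical schema; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hgnc (
    id INTEGER PRIMARY KEY,
    identifier TEXT UNIQUE,  -- HGNC ID
    symbol TEXT
);
-- Each OMIM row points back to exactly one hgnc row.
CREATE TABLE omim (
    id INTEGER PRIMARY KEY,
    omimid TEXT,
    hgnc_id INTEGER REFERENCES hgnc(id)
);
-- A gene family can be shared by many genes, so a link table is used.
CREATE TABLE gene_family (
    id INTEGER PRIMARY KEY,
    familyid INTEGER,
    familyname TEXT
);
CREATE TABLE hgnc_gene_family (
    hgnc_id INTEGER REFERENCES hgnc(id),
    gene_family_id INTEGER REFERENCES gene_family(id)
);
""")

# Placeholder rows, not real HGNC data.
conn.execute("INSERT INTO hgnc (id, identifier, symbol) VALUES (1, 'HGNC:0', 'GENE1')")
conn.executemany("INSERT INTO omim (omimid, hgnc_id) VALUES (?, ?)",
                 [("000001", 1), ("000002", 1)])
conn.executemany("INSERT INTO gene_family (id, familyid, familyname) VALUES (?, ?, ?)",
                 [(1, 10, "Family A"), (2, 20, "Family B")])
conn.executemany("INSERT INTO hgnc_gene_family VALUES (?, ?)", [(1, 1), (1, 2)])


def omim_ids(conn, symbol):
    """OMIM IDs attached to one gene symbol (rows pointing back to it)."""
    rows = conn.execute(
        "SELECT o.omimid FROM omim AS o JOIN hgnc AS h ON h.id = o.hgnc_id "
        "WHERE h.symbol = ? ORDER BY o.omimid",
        (symbol,),
    ).fetchall()
    return [r[0] for r in rows]


def family_names(conn, symbol):
    """Gene family names for one gene symbol, resolved via the link table."""
    rows = conn.execute(
        "SELECT gf.familyname FROM hgnc AS h "
        "JOIN hgnc_gene_family AS link ON link.hgnc_id = h.id "
        "JOIN gene_family AS gf ON gf.id = link.gene_family_id "
        "WHERE h.symbol = ? ORDER BY gf.familyname",
        (symbol,),
    ).fetchall()
    return [r[0] for r in rows]
```

With the models themselves, the same lookups would presumably be attribute accesses such as `omims` and `gene_families` on an HGNC object, as listed in the variables above.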

AliasSymbol ¶

class pyhgnc.manager.models.AliasSymbol(**kwargs)[source]¶

Other symbols used to refer to this gene as seen in the “SYNONYMS” field in the symbol report.

Attention

Symbols previously approved by the HGNC for this gene are tagged with is_previous_symbol==True. Equates to the “PREVIOUS SYMBOLS & NAMES” field within the gene symbol report.

Variables:

alias_symbol (str) – other symbol
is_previous_symbol (bool) – previously approved
hgnc – back populates to HGNC
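As a rough illustration of the `is_previous_symbol` flag, the sketch below separates synonyms from previously approved symbols. `AliasSymbolRow` is a hypothetical stand-in that mirrors only the two documented attributes; it is not the PyHGNC model, and the sample values are placeholders.

```python
from dataclasses import dataclass


@dataclass
class AliasSymbolRow:
    """Hypothetical stand-in for two documented AliasSymbol attributes."""
    alias_symbol: str
    is_previous_symbol: bool


def split_aliases(rows):
    """Return (synonyms, previous_symbols) based on is_previous_symbol."""
    synonyms = [r.alias_symbol for r in rows if not r.is_previous_symbol]
    previous = [r.alias_symbol for r in rows if r.is_previous_symbol]
    return synonyms, previous


# Placeholder values, not real gene symbols.
rows = [
    AliasSymbolRow("SYN1", False),
    AliasSymbolRow("OLD1", True),
    AliasSymbolRow("SYN2", False),
]
synonyms, previous = split_aliases(rows)
```

The same split applies to AliasName through its `is_previous_name` flag.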

AliasName ¶

class pyhgnc.manager.models.AliasName(**kwargs)[source]¶

Other names used to refer to this gene as seen in the “SYNONYMS” field in the gene symbol report.

Attention

Gene names previously approved by the HGNC for this gene are tagged with is_previous_name==True. Equates to the “PREVIOUS SYMBOLS & NAMES” field within the gene symbol report.

Variables:

alias_name (str) – other name
is_previous_name (bool) – previously approved
hgnc – back populates to HGNC

GeneFamily ¶

class pyhgnc.manager.models.GeneFamily(**kwargs)[source]¶

Name and identifier given to a gene family or group the gene has been assigned to. Equates to the “GENE FAMILY” field within the gene symbol report.

Variables:

familyid (int) – family identifier
familyname (str) – family name
hgncs (list) – back populates to HGNC

RefSeq ¶

class pyhgnc.manager.models.RefSeq(**kwargs)[source]¶

RefSeq nucleotide accession(s). Found within the “NUCLEOTIDE SEQUENCES” section of the gene symbol report.

See also RefSeq database for more information.

Variables:

accession (str) – RefSeq accession number
hgncs (list) – back populates to HGNC

RGD ¶

class pyhgnc.manager.models.RGD(**kwargs)[source]¶

Rat genome database gene ID. Found within the “HOMOLOGS” section of the gene symbol report

Variables:

rgdid (str) – Rat genome database gene ID
hgncs – back populates to HGNC

OMIM ¶

class pyhgnc.manager.models.OMIM(**kwargs)[source]¶

Online Mendelian Inheritance in Man (OMIM) ID

Variables:

omimid (str) – OMIM ID
hgnc – back populates to pyhgnc.manager.models.HGNC

MGD ¶

class pyhgnc.manager.models.MGD(**kwargs)[source]¶

Mouse genome informatics database ID. Found within the “HOMOLOGS” section of the gene symbol report

Variables:

mgdid (str) – Mouse genome informatics database ID
hgncs (list) – back populates to HGNC

UniProt ¶

class pyhgnc.manager.models.UniProt(**kwargs)[source]¶

Universal Protein Resource (UniProt) protein accession. Found within the “PROTEIN RESOURCES” section of the gene symbol report.

See also UniProt webpage for more information.

Variables:

uniprotid (str) – UniProt identifier
hgncs (list) – back populates to HGNC

CCDS ¶

class pyhgnc.manager.models.CCDS(**kwargs)[source]¶

Consensus CDS ID. Found within the “NUCLEOTIDE SEQUENCES” section of the gene symbol report.

PubMed ¶

class pyhgnc.manager.models.PubMed(**kwargs)[source]¶

PubMed and Europe PubMed Central PMID

Variables:

pubmedid (str) – PubMed identifier
hgncs (list) – back populates to HGNC

ENA ¶

class pyhgnc.manager.models.ENA(**kwargs)[source]¶

International Nucleotide Sequence Database Collaboration (GenBank, ENA and DDBJ) accession number(s). Found within the “NUCLEOTIDE SEQUENCES” section of the gene symbol report.

Variables:

enaid (str) – European Nucleotide Archive (ENA) identifier
hgncs (list) – back populates to HGNC

Enzyme ¶

class pyhgnc.manager.models.Enzyme(**kwargs)[source]¶

Enzyme Commission number (EC number)

Variables:

ec_number (str) – EC number
hgncs (list) – back populates to HGNC

LSDB ¶

class pyhgnc.manager.models.LSDB(**kwargs)[source]¶

The name of the Locus Specific Mutation Database and URL

Variables:

lsdb (str) – name of the Locus Specific Mutation Database
url (str) – URL to database
hgnc – back populates to HGNC

OrthologyPrediction ¶

class pyhgnc.manager.models.OrthologyPrediction(**kwargs)[source]¶

Orthology Predictions

Warning

OrthologyPrediction is still not correctly normalized and documented.

Variables:

ortholog_species (int) – NCBI taxonomy identifier
human_entrez_gene (int) – Human Entrez gene identifier
human_ensembl_gene (str) – Human Ensembl gene identifier
human_name (str) – Human gene name
human_symbol (str) – Human gene symbol
human_chr (str) – Human gene chromosome location
human_assert_ids (str) –
ortholog_species_entrez_gene (str) – Ortholog species Entrez gene identifier
ortholog_species_ensembl_gene (str) – Ortholog species Ensembl gene identifier
ortholog_species_db_id (str) – Ortholog species database identifier
ortholog_species_name (str) – Ortholog species gene name
ortholog_species_symbol (str) – Ortholog species gene symbol
ortholog_species_chr (str) – Ortholog species gene chromosome location
ortholog_species_assert_ids (str) –
support (str) –
hgnc – back populates to HGNC

Database functions ¶

set_connection ¶

pyhgnc.manager.database.set_connection(connection='sqlite:////home/docs/.pyhgnc/data/pyhgnc.db')[source]¶

Set the connection string for sqlalchemy and write it to the config file.

import pyhgnc
pyhgnc.set_connection('mysql+pymysql://{user}:{passwd}@{host}/{db}?charset={charset}')

Hint

valid connection strings

mysql+pymysql://user:passwd@localhost/database?charset=utf8
postgresql://scott:tiger@localhost/mydatabase
mssql+pyodbc://user:passwd@database
oracle://user:passwd@127.0.0.1:1521/database
Linux: sqlite:////absolute/path/to/database.db
Windows: sqlite:///C:\path\to\database.db

Parameters:	connection (str) – sqlalchemy connection string
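The two SQLite forms in the hint differ only in the file path that follows the `sqlite:///` prefix. The sketch below shows that pattern. `sqlite_url` is a hypothetical helper, not part of PyHGNC. On Windows, write the path as a raw string so that backslash sequences such as `\t` are not interpreted as escape characters.

```python
def sqlite_url(db_path):
    """Prefix a database file path with the SQLite scheme.

    Hypothetical helper for illustration, not part of PyHGNC.
    """
    return "sqlite:///" + db_path


# An absolute POSIX path starts with '/', which yields four slashes in total.
linux_url = sqlite_url("/absolute/path/to/database.db")

# Raw string: keeps the backslashes in the Windows path intact.
windows_url = sqlite_url(r"C:\path\to\database.db")
```

Either result could then be passed to `set_connection` as the `connection` argument.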

update ¶

pyhgnc.manager.database.update(connection=None, silent=False, hgnc_file_path=None, hcop_file_path=None, low_memory=False)[source]¶

Update the database with the current version of HGNC.

Parameters:

connection (str) – connection string
silent (bool) – stay silent during import
hgnc_file_path (str) – path to import HGNC data from
hcop_file_path (str) – path to import HCOP (orthologs) data from
low_memory (bool) – set to True if memory is limited
Returns:

set_mysql_connection ¶

pyhgnc.manager.database.set_mysql_connection(host='localhost', user='pyhgnc_user', passwd='pyhgnc_passwd', db='pyhgnc', charset='utf8')[source]¶

Method to set a MySQL connection

Parameters:

host (str) – MySQL database host
user (str) – MySQL database user
passwd (str) – MySQL database password
db (str) – MySQL database name
charset (str) – MySQL database character set
Returns:	connection string
Return type:	str
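As a minimal sketch of how such a connection string could be assembled, the function below fills the `mysql+pymysql://{user}:{passwd}@{host}/{db}?charset={charset}` template shown for `set_connection`. `build_mysql_url` is a hypothetical name and this is not the library's implementation. Percent-encoding the user and password with `urllib.parse.quote_plus` is an added assumption, so that characters such as `@` do not break the URL.

```python
from urllib.parse import quote_plus

# Template taken from the set_connection example in this document.
MYSQL_TEMPLATE = "mysql+pymysql://{user}:{passwd}@{host}/{db}?charset={charset}"


def build_mysql_url(host="localhost", user="pyhgnc_user",
                    passwd="pyhgnc_passwd", db="pyhgnc", charset="utf8"):
    """Assemble a MySQL connection string (hypothetical helper)."""
    return MYSQL_TEMPLATE.format(
        user=quote_plus(user),      # assumption: escape special characters
        passwd=quote_plus(passwd),  # assumption: escape special characters
        host=host,
        db=db,
        charset=charset,
    )


default_url = build_mysql_url()
```

The defaults mirror those in the `set_mysql_connection` signature above.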