Data Models
PyHGNC uses SQLAlchemy to store the data in the database.
You can use an instance of pyhgnc.manager.query.QueryManager
to query the content of the database.
Entity–relationship model:
ToDo: Add ER figure here!
HGNC
class pyhgnc.manager.models.HGNC(**kwargs)
Root class (table, model) for all other classes (tables, models) in PyHGNC. Basic information that has a 1:1 relationship to the identifier is stored here.
Warning
The following fields are described in the README but were not found in the HGNC JSON file:
- homeodb (Homeobox Database ID)
- horde_id (Symbol used within HORDE for the gene)
Hint
To link to the IUPHAR/BPS Guide to PHARMACOLOGY database, use only the number from the objectId (for example, use 1 from the result objectId:1).
Variables:
- name (str) – HGNC approved name for the gene. Equates to the “APPROVED NAME” field within the gene symbol report
- symbol (str) – The HGNC approved gene symbol. Equates to the “APPROVED SYMBOL” field within the gene symbol report
- orphanet (int) – Orphanet ID
- identifier (str) – Unique ID created by the HGNC for every approved symbol (HGNC ID)
- status (str) – Status of the symbol report, which can be either “Approved” or “Entry Withdrawn”
- uuid (str) – universally unique identifier
- locus_group (str) – Group name for a set of related locus types as defined by the HGNC (e.g. non-coding RNA)
- locus_type (str) – Locus type as defined by the HGNC (e.g. RNA, transfer)
- date_name_changed (date) – date the gene name was last changed
- date_modified (date) – date the entry was last modified
- date_symbol_changed (date) – date the gene symbol was last changed
- date_approved_reserved (date) – date the entry was first approved
- ensembl_gene (str) – Ensembl gene ID. Found within the “GENE RESOURCES” section of the gene symbol report
- horde (str) – symbol used within HORDE for the gene (not available in JSON)
- vega (str) – Vega gene ID. Found within the “GENE RESOURCES” section of the gene symbol report
- lncrnadb (str) – Long Noncoding RNA Database identifier
- entrez (str) – Entrez gene ID. Found within the “GENE RESOURCES” section of the gene symbol report
- mirbase (str) – miRBase ID
- iuphar (str) – The objectId used to link to the IUPHAR/BPS Guide to PHARMACOLOGY database
- ucsc (str) – UCSC gene ID. Found within the “GENE RESOURCES” section of the gene symbol report
- snornabase (str) – snoRNABase ID
- imgt (str) – Symbol used within international ImMunoGeneTics information system
- pseudogeneorg (str) – Pseudogene.org ID
- bioparadigmsslc (str) – Symbol used to link to the SLC tables database at bioparadigms.org for the gene
- locationsortable (str) – locations sortable
- merops (str) – ID used to link to the MEROPS peptidase database
- location (str) – Cytogenetic location of the gene (e.g. 2q34).
- cosmic (str) – Symbol used within the Catalogue of somatic mutations in cancer for the gene
- rgds (list) – relationship to RGD
- omims (list) – relationship to OMIM
- ccdss (list) – relationship to CCDS
- lsdbs (list) – relationship to LSDB
- orthology_predictions (list) – relationship to OrthologyPrediction
- enzymes (list) – relationship to Enzyme
- gene_families (list) – relationship to GeneFamily
- refseq_accessions (list) – relationship to RefSeq
- mgds (list) – relationship to MGD
- uniprots (list) – relationship to UniProt
- pubmeds (list) – relationship to PubMed
- enas (list) – relationship to ENA
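The hint on the iuphar field comes down to dropping the prefix and keeping the number. A minimal sketch, assuming the stored value always has the form objectId:<number>; the helper name iuphar_number is hypothetical and not part of PyHGNC:

```python
def iuphar_number(object_id: str) -> str:
    """Return the numeric part of an IUPHAR value such as 'objectId:1'."""
    # Assumption: the value looks like 'objectId:<number>'.
    prefix, sep, number = object_id.partition(":")
    if not sep or not number.isdigit():
        raise ValueError(f"unexpected IUPHAR value: {object_id!r}")
    return number


print(iuphar_number("objectId:1"))  # prints 1
```

Raising on an unexpected format, rather than returning the input unchanged, makes malformed values visible early.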
AliasSymbol
class pyhgnc.manager.models.AliasSymbol(**kwargs)
Other symbols used to refer to this gene as seen in the “SYNONYMS” field in the symbol report.
Attention
Symbols previously approved by the HGNC for this gene are tagged with is_previous_symbol==True. Equates to the “PREVIOUS SYMBOLS & NAMES” field within the gene symbol report.
Variables:
- alias_symbol (str) – other symbol
- is_previous_symbol (bool) – previously approved
- hgnc – back populates to HGNC
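Because synonyms and previously approved symbols share one model, the is_previous_symbol flag is what tells them apart. The sketch below illustrates that split with a plain dataclass; AliasSymbolRow and split_symbols are hypothetical stand-ins, not the PyHGNC model or API, and the symbol values are made up:

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class AliasSymbolRow:
    """Hypothetical stand-in for one row of the AliasSymbol model."""
    alias_symbol: str
    is_previous_symbol: bool


def split_symbols(rows: Iterable[AliasSymbolRow]) -> Tuple[List[str], List[str]]:
    """Return (synonyms, previous_symbols) based on is_previous_symbol."""
    synonyms: List[str] = []
    previous: List[str] = []
    for row in rows:
        target = previous if row.is_previous_symbol else synonyms
        target.append(row.alias_symbol)
    return synonyms, previous


# Made-up example values, not taken from the HGNC data.
rows = [
    AliasSymbolRow("SYM1", False),
    AliasSymbolRow("OLD1", True),
    AliasSymbolRow("SYM2", False),
]
synonyms, previous = split_symbols(rows)
print(synonyms)  # ['SYM1', 'SYM2']
print(previous)  # ['OLD1']
```

The same pattern would apply to AliasName and its is_previous_name flag.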
AliasName
class pyhgnc.manager.models.AliasName(**kwargs)
Other names used to refer to this gene as seen in the “SYNONYMS” field in the gene symbol report.
Attention
Gene names previously approved by the HGNC for this gene are tagged with is_previous_name==True. Equates to the “PREVIOUS SYMBOLS & NAMES” field within the gene symbol report.
Variables:
- alias_name (str) – other name
- is_previous_name (bool) – previously approved
- hgnc – back populates to HGNC
GeneFamily
UniProt
class pyhgnc.manager.models.UniProt(**kwargs)
Universal Protein Resource (UniProt) protein accession. Found within the “PROTEIN RESOURCES” section of the gene symbol report.
See also the UniProt webpage for more information.
Variables:
ENA
OrthologyPrediction
class pyhgnc.manager.models.OrthologyPrediction(**kwargs)
Orthology predictions.
Warning
OrthologyPrediction is not yet correctly normalized or documented.
Variables:
- ortholog_species (int) – NCBI taxonomy identifier
- human_entrez_gene (int) – Human Entrez gene identifier
- human_ensembl_gene (str) – Human Ensembl gene identifier
- human_name (str) – Human gene name
- human_symbol (str) – Human gene symbol
- human_chr (str) – Human gene chromosome location
- human_assert_ids (str) –
- ortholog_species_entrez_gene (str) – Ortholog species Entrez gene identifier
- ortholog_species_ensembl_gene (str) – Ortholog species Ensembl gene identifier
- ortholog_species_db_id (str) – Ortholog species database identifier
- ortholog_species_name (str) – Ortholog species gene name
- ortholog_species_symbol (str) – Ortholog species gene symbol
- ortholog_species_chr (str) – Ortholog species gene chromosome location
- ortholog_species_assert_ids (str) –
- support (str) –
- hgnc – back populates to HGNC
Database functions
set_connection
pyhgnc.manager.database.set_connection(connection='sqlite:////home/docs/.pyhgnc/data/pyhgnc.db')
Set the connection string for SQLAlchemy and write it to the config file.
    import pyhgnc
    pyhgnc.set_connection('mysql+pymysql://{user}:{passwd}@{host}/{db}?charset={charset}')
Hint
Valid connection strings:
- mysql+pymysql://user:passwd@localhost/database?charset=utf8
- postgresql://scott:tiger@localhost/mydatabase
- mssql+pyodbc://user:passwd@database
- oracle://user:passwd@127.0.0.1:1521/database
- Linux: sqlite:////absolute/path/to/database.db
- Windows: sqlite:///C:\path\to\database.db
Parameters: connection (str) – SQLAlchemy connection string
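The placeholders in the MySQL template have to be filled in before the string is usable. A minimal sketch of one way to do that with the standard library; the helper names mysql_connection_string and sqlite_connection_string are hypothetical and not part of PyHGNC. Credentials are URL-escaped because characters such as @ or / would otherwise be misread as URL delimiters:

```python
from urllib.parse import quote_plus

# Template taken from the set_connection example above.
MYSQL_TEMPLATE = "mysql+pymysql://{user}:{passwd}@{host}/{db}?charset={charset}"


def mysql_connection_string(user, passwd, host, db, charset="utf8"):
    """Fill the MySQL template, URL-escaping user name and password."""
    return MYSQL_TEMPLATE.format(
        user=quote_plus(user),
        passwd=quote_plus(passwd),
        host=host,
        db=db,
        charset=charset,
    )


def sqlite_connection_string(path):
    """Prefix a database file path with the sqlite:/// scheme."""
    # An absolute POSIX path starts with '/', giving four slashes in total.
    return "sqlite:///" + path


print(mysql_connection_string("user", "p@ss/word", "localhost", "database"))
# mysql+pymysql://user:p%40ss%2Fword@localhost/database?charset=utf8
print(sqlite_connection_string("/absolute/path/to/database.db"))
# sqlite:////absolute/path/to/database.db
```

The resulting string can then be passed to pyhgnc.set_connection as in the example above.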