Applications Of Gene Network And Pathway Tools

Pathway and gene network tools have found numerous applications in understanding gene and protein expression under various circumstances, whether during disease or after treatment with a particular molecule. A recent review has described the tools for building biological networks that can be used for the analysis of experimental data in drug discovery [40]. The putative applications include target identification, validation, and prioritization. The methods available can be used to define toxicity biomarkers and for lead optimization or candidate selection. Clinical data can also be analyzed and may be useful to provide new indications for marketed drugs or as a means to perform postmarketing studies.

TABLE 6.2 Pharmaceutical Companies and Their Systems Biology Portfolios of Commercial Software and Collaborations Based on Press Releases, Posters at Scientific Meetings, and Information on Vendor Websites

Company

Network Tools

Disease Models

Ontologies

Systems Biology Collaborations

AstraZeneca

Bayer

Bristol-Myers Squibb

Eli Lilly

GlaxoSmithKline

Johnson & Johnson

Merck

Novartis

Organon

Pfizer

P&G

Roche

Sanofi-Aventis

Wyeth

GGMC

IPA, GGMC, JPA

IPA, GGMC

GGMC

GGMC

IPA, BSS

GGMC

EOM EAM EOM

ERAM EAM

TBAG EAM

BG, MIT

Lilly Systems Biology (Singapore) BG

ISB BG

GGMC

Companies may have either accessed these technologies in the past or continue to use them, standalone or integrated with proprietary technologies. These portfolios may be in addition to internal software efforts that are likely to be ongoing.

Abbreviations: BG, BG Medicine; BSS, BIOSoftware Systems; EAM, Entelos Asthma Model; ECM, Entelos Cardiac Model; EG, Electric Genetics eVoke; EOM, Entelos Obesity Model; ERAM, Entelos Rheumatoid Arthritis Model; GGMC, GeneGo MetaCore; GNS, Gene Network Systems Oncology Model; IPA, Ingenuity Pathway Analysis; ISB, Institute for Systems Biology; JPA, Jubilant PathArt; MIT, Massachusetts Institute of Technology's Computational and Systems Biology Initiative; TBAG, The Bio-Analytics Group.

One of the few published instances of systems biology research from a major commercial concern (namely Procter & Gamble) is a study that used gene expression data to identify stress response networks in Mycobacterium tuberculosis before and after treatment with different drugs [41]. The research combined the KEGG and BioCyc protein interaction databases with previously published expression data and a k-shortest path algorithm. The networks for isoniazid and hydrogen peroxide indicated a generic stress response but also highlighted unique features. The authors suggested that differential network expression can be used to assess drug mode of action, with similar networks indicating similar mechanisms [41]. A second recently published study combined microarray expression data from HeLa cells with Ingenuity pathways software to understand the expression of DBC2. The authors found two networks in which at least 50% of the genes were affected by DBC2 expression. These corresponded to cell cycle control, apoptosis, and cytoskeleton and membrane trafficking [12]. Several other applications of this software have also been published (Table 6.1).
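The idea behind a k-shortest path search on an interaction network can be illustrated with a minimal sketch. It is not the implementation used in [41]: the graph is unweighted, the node names (`geneA` to `geneD`) and the function name `k_shortest_paths` are hypothetical, and a simple best-first enumeration of loop-free paths stands in for whatever algorithm and edge weighting the authors applied.

```python
import heapq
from itertools import count


def k_shortest_paths(graph, source, target, k):
    """Return up to k loop-free paths from source to target, shortest first.

    graph maps each node to an iterable of neighbours (unweighted edges).
    Partial paths are expanded best-first, so complete paths are returned
    in order of increasing length. Suitable for small illustrative graphs only.
    """
    tie = count()  # tie-breaker so heapq never has to compare path lists
    heap = [(0, next(tie), [source])]
    found = []
    while heap and len(found) < k:
        length, _, path = heapq.heappop(heap)
        node = path[-1]
        if node == target:
            found.append(path)
            continue
        for nxt in graph.get(node, ()):
            if nxt not in path:  # keep paths simple (no repeated nodes)
                heapq.heappush(heap, (length + 1, next(tie), path + [nxt]))
    return found


# Hypothetical toy interaction graph; the node names are placeholders, not real data.
toy_graph = {
    "geneA": ["geneB", "geneC"],
    "geneB": ["geneD"],
    "geneC": ["geneB", "geneD"],
    "geneD": [],
}

paths = k_shortest_paths(toy_graph, "geneA", "geneD", 3)
for p in paths:
    print(" -> ".join(p))
```

In a real analysis the edges would come from a curated interaction database and could be weighted by expression data, so that the k best paths between two genes of interest form a candidate response network.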

A growing number of studies, to date presented as meeting abstracts, have used MetaCore software for genomic and proteomic data analysis. Yang et al. used a proteomic analysis to examine the targets of oxidative stress in brain tissue from the PS1/APP mouse model for Alzheimer disease and visualized these targets as a network and highlighted the proteins that are oxidatively modified [42]. Waters et al. integrated microarray and proteomic data studies with pathway analysis and network modeling of epidermal growth factor signaling in human mammary epithelial cells and identified new cross talk mediators Src and matrix metalloproteinases as responsible for modification of the extracellular matrix [43]. Lantz et al. studied protein expression in rats exposed to arsenic in utero. Twelve proteins involved in signal transduction, cytoskeleton, nuclear organization, and DNA repair were differentially expressed and could be readily connected as a network to identify the potential involvement of RAC1, Pyk2, CDC42, JNK, and occludins as sites of action for arsenic [44]. Nie et al. produced a gene signature for nongenotoxic carcinogens after establishing a database of more than 100 hepatotoxicants and used a stepwise exhaustive search algorithm. Ultimately, six genes were selected to differentiate nongenotoxic carcinogens from noncarcinogens [45]. A mouse emphysema model treated with elastase was used to show 95 genes that were differentially expressed after 1 week [46]. These data were analyzed with pathway maps and gene networks to show that the principal nodes of gene regulation were around the vitamin D receptor, Ca2+, MMP13, and the transcription factors c-myc and SP1. The myometrial events in guinea pigs during pregnancy were studied, using gene expression, signaling and metabolic maps, and gene networks to provide a global and comprehensive analysis for visualizing and understanding the dynamics of myometrial activation [47].
Further work from the same group has focused on G proteins, showing increased GTPase activity during pregnancy in guinea pigs, an effect also seen with estradiol [48]. The data in this study were visualized on metabolic maps and gene networks.

An algorithm for the reconstruction of accurate cellular networks (ARACNe) was recently described and used to reconstruct a network from expression profiles of human B cells. ARACNe identifies statistically significant gene-gene coregulation and eliminates indirect interactions. A network was inferred from 336 expression profiles obtained after perturbing B cell phenotypes. MYC appeared in the top 5% of cellular hubs, and the network included 40% of previously identified target genes [49]. In a separate study, HCN-1A cells treated with different drugs were used to produce a compendium of gene signatures, which was used to generate "sampling over gene space" models with random forests, linear discriminant analysis, and support vector machines. This approach was then used to classify drug classes. It potentially represents a novel method for drug discovery, as it discriminates physiologically active from inactive molecules and could identify drugs with off-target effects and assign confidence in their further assessment [38].
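ARACNe is generally described as scoring gene pairs by mutual information and then removing the weakest edge of each fully connected gene triplet (the data processing inequality) to discard indirect interactions. The sketch below illustrates only that pruning step under simplifying assumptions: expression values are already discretized, mutual information is estimated from raw counts rather than with a kernel estimator, no permutation-based significance threshold is applied, and all names (`mutual_information`, `prune_indirect_edges`, `geneA` to `geneC`) and data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log


def mutual_information(x, y):
    """Mutual information (in nats) between two equally long discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * log(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


def prune_indirect_edges(profiles, mi_threshold=0.0, tolerance=0.0):
    """Build a mutual-information network, then drop the weakest edge of each triangle.

    profiles maps a gene name to its discretized expression values.
    Returns a set of frozenset({gene1, gene2}) edges.
    """
    genes = sorted(profiles)
    mi = {
        frozenset(pair): mutual_information(profiles[pair[0]], profiles[pair[1]])
        for pair in combinations(genes, 2)
    }
    edges = {pair for pair, value in mi.items() if value > mi_threshold}
    to_remove = set()
    for a, b, c in combinations(genes, 3):
        triangle = [frozenset((a, b)), frozenset((a, c)), frozenset((b, c))]
        if all(edge in edges for edge in triangle):
            weakest = min(triangle, key=lambda edge: mi[edge])
            others = [edge for edge in triangle if edge != weakest]
            # Remove the weakest edge only if it is clearly below both others.
            if all(mi[weakest] < mi[edge] * (1 - tolerance) for edge in others):
                to_remove.add(weakest)
    return edges - to_remove


# Hypothetical discretized profiles: geneB tracks geneA, geneC tracks geneB,
# so the geneA-geneC association is indirect and should be pruned.
toy_profiles = {
    "geneA": [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
    "geneB": [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0],
    "geneC": [0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0],
}

network = prune_indirect_edges(toy_profiles)
for edge in sorted(sorted(e) for e in network):
    print(edge)
```

On these toy profiles the direct geneA-geneB and geneB-geneC edges are kept and the weaker geneA-geneC edge is removed, which is the behavior the text summarizes as eliminating indirect interactions.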

With a similar compendium-based comparative approach, the oxidative stress-inducing potential of over 50 new proprietary compounds under investigation at Johnson & Johnson was predicted from their matching gene expression signatures [50]. This study is particularly informative in that it was able to distinguish distinct mechanisms of action for diverse hepatotoxicants, all of which similarly resulted in oxidative stress, an adverse cellular condition. Initial successes such as this example suggest that gene expression signatures have potential utility in the detection of presymptomatic clinical conditions and in the molecular diagnosis of disease states. The ability to group patients who share a common disease phenotype or set of clinical symptoms by their gene expression signature is a critical milestone in achieving the goal of personalized and predictive medicine [51].

Numerous mechanisms have been proposed for hypertension, and consequently there are many microarray studies with large amounts of data but, to date, little new information on mechanism. More complete data sets and better integration are therefore needed if they are to contribute to improved therapeutic outcomes and disease prevention [52]. Ninety-two genes associated with atherosclerosis were previously used to generate a network with KEGG and BioCarta. Thirty-nine of these genes are in pathways containing at least three atherosclerosis genes; these represented 16 biological and signaling pathways with 353 unique genes. Numerous genes not previously associated with atherosclerosis were indicated on the network [53]. In contrast, the use of the commercially available tool MetaCore with this gene list enabled the mapping of 89 genes on networks, with only three missing from this mapping, and 68 of these genes were on maps. This set of genes was then used with the analyze networks algorithm to generate multiple networks. The network with the largest G-score (Fig. 6.1A, G-score 35.72, p = 6.1e-61) was different from that with the best (lowest) p-value (Fig. 6.1B, G-score 13.44, p = 2.7e-77). The former contained APOE and APOA1 as central hubs and also mapped onto the GO processes for cholesterol homeostasis (p = 10e-14) and cholesterol metabolism (p = 6.4e-13), whereas the latter had NF-kB as a hub gene and mapped to the inflammatory response (p = 1.6e-16) and the immune response (p = 3.4e-10). Several genes that were absent from the initial gene list identified in the original publication [53] appear on one or the other network, including C/EBPα, EDNRP, C/EBP, CRP, Brca1, CYP27B1, CYP2C8, PSAP, Calreticulin, Serglycin, MAPK7, MAPK1/3, α2M, APP, Amyloid p, and Matrilysin. These may represent future genes to be assessed for their importance in hypertension.
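The p-values attached to GO processes above are over-representation statistics. The text does not spell out how they were computed, so the following is a generic sketch of a one-sided hypergeometric enrichment test, offered as an assumption rather than a description of the MetaCore procedure. The function name `enrichment_p_value` and the toy counts are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    population: number of genes in the background set
    annotated:  number of background genes annotated to the process
    sample:     number of genes in the network or input list
    overlap:    number of sampled genes annotated to the process
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical toy counts: 20 background genes, 5 annotated to a process,
# a 4-gene network of which 2 carry the annotation.
p_toy = enrichment_p_value(population=20, annotated=5, sample=4, overlap=2)
print(f"p = {p_toy:.4f}")
```

With genome-scale backgrounds and networks of tens of genes, a test of this kind readily yields very small p-values when most network members share one annotation.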

Understanding the gene networks that can be generated in cells or whole organisms by single compounds enables the generation of signature networks [11]. Numerous recent studies have generated microarray data after treatment with xenobiotics (Table 6.3) that can be used with network and pathway database tools. Many other examples that have been recently summarized could also be used in this way [54]. For example, the anticancer activity of tanshinone IIA was evaluated against human breast cancer MCF-7 cells, and the changes in gene expression were evaluated over 72 h with a microarray containing over 3000 genes [55]. The resultant data for 65 genes that were either significantly up- or downregulated were used as an input for MetaCore, and 48 of these genes could be used for network generation. The analyze networks algorithm was then used to generate multiple networks. The best G-score network had a G-score of 31.29 and p = 6.24e-40 (Fig. 6.2A), whereas the best p-value network had a G-score of 13.17 and p = 3.41e-47 (Fig. 6.2B). The Gene Ontology processes were mapped to these networks. For the best G-score network, cell adhesion (p = 6.29e-07) was the most significant process, although the majority of genes were involved in the cell cycle (p = 1.32e-05) or apoptosis (p = 3.44e-05). The best p-value network indicated a role in the cell cycle (p = 1.33e-12), as over 30% of the genes were involved in this process. Both networks contained numerous genes that were not significantly up- or downregulated but are nonetheless present on these statistically significant networks. This type of approach has been taken for another anticancer drug, Tipifarnib, a non-peptidomimetic competitive farnesyltransferase inhibitor used for treatment in acute myeloid leukemia [56]. Gene expression analysis in three cell lines and blast cells from patients indicated a common set of 72 genes that were mapped onto cell signaling, cytoskeletal, immunity, and apoptosis pathways with Ingenuity Pathways Analysis [56].
Another published method has previously used GO annotations and correspondence analysis to generate a map of genes in human pancreatic cancer [57]. It is likely that an approach like this combining high-content data with curated databases and gene networks may be applicable to analysis of other diseases and available therapeutic treatments.


Figure 6.1 Gene interaction networks for atherosclerosis generated with the gene list from Ghazalpour et al. [53] with MetaCore™ (GeneGo, St. Joseph, MI). A. best G-score. B. best p value. The interaction types between nodes are shown as small colored hexagons, e.g., unspecified, allosteric regulation, binding, cleavage, competition, covalent modification, dephosphorylation, phosphorylation, transcription regulation, transformation. When applicable, interactions also have a positive or negative effect and direction. Ligands (purple) linked to other proteins (blue), transfactors (red), enzymes (orange). Genes with red dots represent the members of the original input gene list. See color plate.


TABLE 6.3 Literature Data That Could Be Used to Create Compound Signature Networks

Compounds | Tissue Source | Microarray Type | Treatment | Data Availability | Reference
Docetaxel (Taxotere), estramustine | Human prostate cancer cells PC3 and LNCaP | Affymetrix U133A | 2 nmol/l docetaxel; 4 µmol/l estramustine; combination of 1 nmol/l docetaxel and 2 µmol/l estramustine; for 6, 36, and 72 h | Gene name, accession number, and fold change data in a manuscript table | 111
Taxotere, capecitabine (Furtulon) | Human prostate cancer cells PC3 and LNCaP | Affymetrix U133A | 2 nM Taxotere; 110 µM capecitabine; combination of 1 nM Taxotere and 50 µM capecitabine; for 6, 36, and 72 h | Gene name, accession number, and fold change data in a manuscript table | 112
Nobiletin | Human HepG2 cells | Acegene human oligo chip subset A | 10⁻³ M | Gene name | 113
Allopregnanolone | Rat hippocampal neurons | Cell Cycle GEArray Q series, version 1 | 500 nM | Gene name and fold change on a bar chart | 114
Palmitate | Human hepatic Huh-7 | Custom | 150 nM for 24 and 48 h | Gene name, fold change as accession number, tables | 115
Letrozole, anastrozole, tamoxifen | MCF-7aro | Affymetrix U133A | 200 nmol/l letrozole; 1 µmol/l anastrozole; 1 µmol/l tamoxifen | Gene symbol, GenBank identifier, gene name, ratio as a table | 13

TABLE 6.3 Continued

Compounds | Tissue Source | Microarray Type | Treatment | Data Availability | Reference
Tanshinone IIA | MCF-7 | H04 ExpressChip | 0.25 µg/ml tanshinone IIA | Gene name, Unigene symbol and Unigene identifier, fold change | 55
N-Hydroxy-4-acetylaminobiphenyl, benzo[a]pyrene diol epoxide | Human TK6 lymphoblastoid | Custom | N-Hydroxy-4-acetylaminobiphenyl 10 µM for 27 h; benzo[a]pyrene diol epoxide 10 nM for 1 h | Gene name, gene symbol, GenBank accession, fold change in a table; all data available online | 116
Estradiol, medroxyprogesterone acetate | Human microvascular endometrial endothelial cells | Affymetrix U133A | E2 10⁻⁸ M, MDA 10⁻⁷ M for 48 h | Gene name, fold change | 117
Genistein, daidzein, glycitein, 17β-estradiol | MCF-7 | Custom | Genistein 10 µM; daidzein 10 µM; glycitein 10 µM; 17β-estradiol 10 nM | Gene name, accession number, fold change in table | 118
17β-Estradiol | Human endometrium | Affymetrix U133A | 17β-Estradiol 1 nM | Gene name, gene symbol, accession number, fold change in table | 119
Δ9-Tetrahydrocannabinol | 4T1-stimulated lymph node cells from mice | GEArray Q series mouse Th1, Th2, Th3 array membranes | 50 mg/kg Δ9-tetrahydrocannabinol | Gene name, accession number, and fold change in table | 120

Figure 6.2 Gene interaction networks for MCF-7 cells treated with tanshinone IIA for 72 h [55], generated with MetaCore™ (GeneGo). A. best G-score. B. best p value. The interaction types between nodes are shown as small colored hexagons, e.g., unspecified, allosteric regulation, binding, cleavage, competition, covalent modification, dephosphorylation, phosphorylation, transcription regulation, transformation. When applicable, interactions also have a positive or negative effect and direction. Ligands (purple) linked to other proteins (blue), transfactors (red), enzymes (orange). Genes with red dots represent the members of the original input gene list that were upregulated, whereas blue dots represent downregulated genes. See color plate.
