When:
13/08/2013 @ 3:25 pm – 3:35 pm
2013-08-13T15:25:00+05:30
2013-08-13T15:35:00+05:30
Where:
Sathyam Hall
Amrita University
Amritapuri, Vallikavu, Kerala 690525
India

Bhadrachalam Chitturi, Balaji Raghavachari and Donghyun Kim


Efficient gene prioritization

The gene prioritization, GP, problem seeks to identify the most promising genes among several candidate genes. In genetics, gene related conditions are typically associated with chromosomal regions, say with GWAS. These associations yield lists of candidate genes. A priori, some genes i.e. seed genes, are associated with a specific disease D; additional genes that are implicated via associations constitute the potential candidates. Thus, most promising novel candidates for D are sought. In network based approach, a protein protein interaction network, i.e. NP , and a set S of seed genes constitute the prior knowledge. We treat a gene and the protein that it encodes identically. Various GP algorithms based on guilt by association are run on the NP to predict novel candidates [1–6]. They rank a new candidate gene by its estimated association to D.

Distance between a pair of genes is the shortest path measured in the number of edges. Diameter of a set of genes is the longest distance between any pair of genes in terms of the number of edges. The density of a set X of genes is defined as e(X)/|X| where e(X) denotes the number of edges among genes of X and |X| denotes the number of genes of X. The set S: (i) can be of minimal size (say one), (ii) is tightly coupled in NP , i.e. has low-diameter/high-density, or (iii) is loosely coupled, i.e. has high-diameter/low-density. Similarly, the GP algorithms can be partitioned into: Type-1 that ignore the edge weights and Type-2 that employ the edge weights. However, currently, the prioritization process neither exploits the character of S nor the type of GP algorithm that is run. Given S, we compute two core networks of NP which we call NC1 and NC2 that are subnetworks of NP . The idea is to execute GP algorithms of Type-1 and Type-2 on NC1 and NC2 respectively instead of NP . Typically, NC1 and NC2 are much smaller than NP . Also, one runs several algorithms of Type-1 and Type-2 [2–4, 6] and takes consensus [6].

In general, the time to run a GP algorithm say AP on NP i.e. t1 or to compute NC1 and NC2 i.e. t2 is proportional to e(NP ) where e(NP ) e(NC1) and e(NP ) e(NC2). However, executing AP on NC1/NC2 (a much smaller network) is much more efficient than executing AP on NP . We run several GP algorithms onNC1/NC2 [6] but computeNC1/NC2 only once. So, overall our method is more efficient. Preliminary implementation results show that for several GP algorithms, the candidates identified by our method match the topmost prioritized candidates identified by the direct execution of the algorithm on NP . Overall, our method was more efficient. Based on the number of candidates that we seek and the nature of S, we can generate variants of NCx, x ∈ {1, 2}. In some cases, AP determines the appropriate variant of NCx.

Delegate Talk: Efficient gene prioritization