Tue, Aug 13, 2013
Invited Talk: Interpretation of Genomic Variation – Identifying Rare Variations Leading to Disease @ Sathyam Hall
Aug 13 @ 10:20 am – 10:40 am

Rajgopal Srinivasan, Ph.D.
Principal Scientist & Head Bio IT R&D, TCS Innovation Labs, India


Interpretation of Genomic Variation – Identifying Rare Variations Leading to Disease

Genome sequencing technologies are generating an abundance of data on human genetic variations. A big challenge lies in interpreting the functional relevance of such variations, especially in clinical settings. A first step in understanding the clinical relevance of genetic variations is to annotate the variants for region of occurrence, degree of conservation both within and across species, pattern of variation across related individuals, novelty of the variation, and known effects of related variations. Several tools already exist for this purpose, each with its own strengths and weaknesses. A second issue is the development of algorithms that, given a rich annotation of variants, can prioritize the variants by their relevance to the phenotype under investigation.

In my talk I will detail work done in our labs to address both of these problems. I will also illustrate how these tools helped identify a rare mutation in the ATM gene, leading to a diagnosis of ataxia-telangiectasia (AT) in two infants.
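
The two steps named in the abstract, annotating variants and then prioritizing them against a phenotype, can be illustrated with a minimal sketch. This is not the method presented in the talk: the `Variant` fields, the weights, and the 1% frequency cutoff are all assumptions chosen for illustration, and the example values are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Variant:
    """One annotated variant. Every field name here is hypothetical."""
    gene: str
    region: str                 # e.g. "exonic", "intronic", "intergenic"
    conservation: float         # 0.0 (not conserved) to 1.0 (highly conserved)
    population_freq: float      # allele frequency in a reference population
    segregates: bool            # pattern across relatives fits the disease
    known_related_effect: bool  # related variations have known effects


def priority_score(v: Variant, max_freq: float = 0.01) -> float:
    """Toy score (assumed weights): higher means more worth a closer look."""
    if v.population_freq > max_freq:
        return 0.0  # a common variant is unlikely to explain a rare disease
    score = 2.0 if v.region == "exonic" else 0.5
    score += 2.0 * v.conservation
    score += 2.0 if v.segregates else 0.0
    score += 1.0 if v.known_related_effect else 0.0
    return score


def prioritize(variants: List[Variant]) -> List[Variant]:
    """Drop zero-score variants and return the rest, best candidate first."""
    scored = [(priority_score(v), v) for v in variants]
    kept = [pair for pair in scored if pair[0] > 0.0]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [v for _, v in kept]


# Made-up example variants, for illustration only.
example = [
    Variant("GENE_A", "intronic", 0.2, 0.0005, False, False),
    Variant("GENE_B", "exonic", 0.9, 0.0001, True, True),
    Variant("GENE_C", "exonic", 0.9, 0.2, True, True),  # too common, dropped
]
ranked = prioritize(example)
```

A hand-weighted score like this only shows the shape of the problem; the weights would normally be tuned or learned, and real annotation would come from dedicated tools rather than hand-entered fields.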

Invited Talk: Applying Machine learning for Automated Identification of Patient Cohorts @ Sathyam Hall
Aug 13 @ 2:40 pm – 3:05 pm

Srisairam Achuthan, Ph.D.
Senior Scientific Programmer, Research Informatics Division, Department of Information Sciences, City of Hope, CA, USA


Applying Machine learning for Automated Identification of Patient Cohorts

Srisairam Achuthan, Mike Chang, Ajay Shah, Joyce Niland

Patient cohorts for a clinical study are typically identified based on specific selection criteria. In most cases, considerable time and effort are spent finding the most relevant criteria that could lead to a successful study. For complex diseases, this process can be more difficult and error-prone, since relevant features may not be easily identifiable. Additionally, the information captured in clinical notes is in non-coded text format. Our goal is to discover patterns within the coded and non-coded fields and thereby reveal complex relationships between clinical characteristics across patients that would be difficult to find manually. To this end, we have applied machine learning techniques such as artificial neural networks and decision trees to identify patients sharing similar characteristics in available medical records. For this proof-of-concept study, we used coded and non-coded (i.e., clinical notes) patient data from a clinical database. Coded clinical information recorded in the database, such as diagnoses, labs, medications, and demographics, was pooled with non-coded information derived from clinical notes, including smoking status and lifestyle (active/inactive) status. The non-coded textual information was identified and interpreted using I2E, a Natural Language Processing (NLP) tool from Linguamatics.
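
As a rough illustration of pooling coded fields with note-derived fields and applying a decision-tree style split, the sketch below picks the single yes/no feature that best separates a labeled group by Gini impurity (a one-level tree, or "decision stump"). It is not the authors' pipeline and does not use I2E: the feature names, the `note_` prefix, the labels, and the patient records are all hypothetical, and the note-derived values are assumed to be already extracted.

```python
from typing import Dict, List, Tuple

Record = Dict[str, bool]


def pool_features(coded: Record, from_notes: Record) -> Record:
    """Merge coded fields with fields extracted from clinical notes."""
    pooled = dict(coded)
    # Prefix note-derived fields so their origin stays visible (an assumption).
    pooled.update({f"note_{name}": value for name, value in from_notes.items()})
    return pooled


def gini(labels: List[bool]) -> float:
    """Gini impurity of binary labels; 0.0 means the group is pure."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(records: List[Record], labels: List[bool]) -> Tuple[str, float]:
    """Return the feature whose yes/no split has the lowest weighted Gini."""
    best_feature, best_impurity = "", float("inf")
    for feature in sorted(records[0]):
        yes = [lab for rec, lab in zip(records, labels) if rec[feature]]
        no = [lab for rec, lab in zip(records, labels) if not rec[feature]]
        weighted = (len(yes) * gini(yes) + len(no) * gini(no)) / len(labels)
        if weighted < best_impurity:
            best_feature, best_impurity = feature, weighted
    return best_feature, best_impurity


# Made-up patients: coded fields plus one field assumed to come from notes.
records = [
    pool_features({"on_statin": True, "diabetes": False}, {"smoker": True}),
    pool_features({"on_statin": False, "diabetes": True}, {"smoker": True}),
    pool_features({"on_statin": True, "diabetes": True}, {"smoker": False}),
    pool_features({"on_statin": False, "diabetes": False}, {"smoker": False}),
]
labels = [True, True, False, False]  # hypothetical cohort membership
feature, impurity = best_split(records, labels)
```

A full decision tree repeats this split recursively on each resulting subgroup; a library implementation would normally be used in practice, and the stump is shown only because it makes the splitting criterion explicit.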