13/08/2013 @ 2:40 pm – 3:05 pm
Sathyam Hall
Amrita University
Amritapuri, Vallikavu, Kerala 690525

Srisairam Achuthan, Ph.D.
Senior Scientific Programmer, Research Informatics Division, Department of Information Sciences, City of Hope, CA, USA

Applying Machine Learning for Automated Identification of Patient Cohorts

Srisairam Achuthan, Mike Chang, Ajay Shah, Joyce Niland

Patient cohorts for a clinical study are typically identified based on specific selection criteria. In most cases, considerable time and effort are spent finding the most relevant criteria that could lead to a successful study. For complex diseases, this process can be more difficult and error-prone, since relevant features may not be easily identifiable. Additionally, the information captured in clinical notes is in non-coded text format. Our goal is to discover patterns within the coded and non-coded fields and thereby reveal complex relationships between clinical characteristics across different patients, relationships that would be difficult to find manually. To this end, we have applied machine learning techniques such as artificial neural networks and decision trees to identify patients who share similar characteristics in the available medical records. For this proof-of-concept study, we used coded and non-coded (i.e., clinical notes) patient data from a clinical database. Coded clinical information recorded in the database, such as diagnoses, labs, medications and demographics, was pooled with non-coded information derived from clinical notes, including smoking status and lifestyle (active / inactive) status. The non-coded textual information was identified and interpreted using I2E, a Natural Language Processing (NLP) tool from Linguamatics.
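
The pooling step and a single decision-tree split described above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the method used in the study: all patient records, field names (diabetes_dx, on_statin, smoker, active) and cohort labels are hypothetical toy values, the note-derived fields are assumed to have already been extracted by an NLP step, and the split criterion (Gini impurity) is an assumption chosen for simplicity.

```python
from collections import Counter

# Hypothetical coded fields, as they might be stored in a clinical database.
coded = {
    "p1": {"diabetes_dx": 1, "on_statin": 1},
    "p2": {"diabetes_dx": 0, "on_statin": 0},
    "p3": {"diabetes_dx": 0, "on_statin": 0},
    "p4": {"diabetes_dx": 0, "on_statin": 1},
}

# Hypothetical fields assumed to be already extracted from clinical notes.
from_notes = {
    "p1": {"smoker": 1, "active": 0},
    "p2": {"smoker": 1, "active": 0},
    "p3": {"smoker": 1, "active": 1},
    "p4": {"smoker": 0, "active": 1},
}

# Hypothetical cohort membership (1 = belongs to the cohort of interest).
labels = {"p1": 1, "p2": 1, "p3": 0, "p4": 0}


def pool(coded, from_notes):
    """Merge coded and note-derived fields into one feature dict per patient."""
    return {pid: {**coded[pid], **from_notes.get(pid, {})} for pid in coded}


def gini(ys):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(ys)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(ys).values())


def best_split(rows, labels):
    """Return (feature, weighted Gini) for the best single binary split."""
    pids = sorted(rows)
    features = sorted({name for row in rows.values() for name in row})
    best = None
    for name in features:
        left = [labels[p] for p in pids if rows[p].get(name, 0) == 0]
        right = [labels[p] for p in pids if rows[p].get(name, 0) == 1]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pids)
        if best is None or score < best[1]:
            best = (name, score)
    return best


rows = pool(coded, from_notes)
feature, impurity = best_split(rows, labels)
print(feature, impurity)
```

In this toy data the feature that best separates the two groups comes from the note-derived fields rather than the coded ones, which is the kind of relationship that pooling both sources is meant to expose. A full decision tree would apply the same split search recursively to each resulting group.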

Invited Talk: Applying Machine Learning for Automated Identification of Patient Cohorts