Tue, Aug 13, 2013
Invited Talk: Applying Machine learning for Automated Identification of Patient Cohorts @ Sathyam Hall
Aug 13 @ 2:40 pm – 3:05 pm

Srisairam Achuthan, Ph.D.
Senior Scientific Programmer, Research Informatics Division, Department of Information Sciences, City of Hope, CA, USA


Applying Machine learning for Automated Identification of Patient Cohorts

Srisairam Achuthan, Mike Chang, Ajay Shah, Joyce Niland

Patient cohorts for a clinical study are typically identified based on specific selection criteria. In most cases, considerable time and effort are spent finding the most relevant criteria that could lead to a successful study. For complex diseases, this process can be more difficult and error-prone, since relevant features may not be easily identifiable. Additionally, the information captured in clinical notes is in non-coded text format. Our goal is to discover patterns within the coded and non-coded fields and thereby reveal complex relationships between clinical characteristics across different patients, relationships that would be difficult to uncover manually. Towards this, we have applied machine learning techniques such as artificial neural networks and decision trees to identify patients sharing similar characteristics from available medical records. For this proof-of-concept study, we used coded and non-coded (i.e., clinical notes) patient data from a clinical database. Coded clinical information such as diagnoses, labs, medications and demographics recorded within the database was pooled with non-coded information derived from clinical notes, including smoking status and lifestyle (active / inactive) status. The non-coded textual information was identified and interpreted using I2E, a Natural Language Processing (NLP) tool from Linguamatics.
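
As a rough illustration of the idea in the abstract, the sketch below pools coded fields with note-derived fields into one record per patient and then picks the single best decision-tree split by Gini impurity. It is not the authors' pipeline: the records, the field names, and the `in_cohort` label are all invented for illustration, and a one-level split stands in for a full decision tree.

```python
from collections import Counter

# Hypothetical toy records. "diagnosis" stands in for a coded field;
# "smoker" and "lifestyle" stand in for fields derived from clinical notes.
# "in_cohort" is an invented label used only to drive the split.
patients = [
    {"diagnosis": "T2D", "smoker": "yes", "lifestyle": "inactive", "in_cohort": 1},
    {"diagnosis": "T2D", "smoker": "yes", "lifestyle": "active", "in_cohort": 1},
    {"diagnosis": "T2D", "smoker": "no", "lifestyle": "inactive", "in_cohort": 0},
    {"diagnosis": "HTN", "smoker": "yes", "lifestyle": "inactive", "in_cohort": 1},
    {"diagnosis": "HTN", "smoker": "no", "lifestyle": "active", "in_cohort": 0},
    {"diagnosis": "HTN", "smoker": "no", "lifestyle": "inactive", "in_cohort": 0},
]


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(records, features, target):
    """Return (feature, value, weighted_gini) for the best equality split.

    Each candidate split separates records where feature == value from
    the rest; the split with the lowest weighted impurity wins.
    """
    best = None
    for feature in features:
        for value in sorted({r[feature] for r in records}):
            left = [r[target] for r in records if r[feature] == value]
            right = [r[target] for r in records if r[feature] != value]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(records)
            if best is None or score < best[2]:
                best = (feature, value, score)
    return best


feature, value, score = best_split(
    patients, ["diagnosis", "smoker", "lifestyle"], "in_cohort"
)
print(f"Best split: {feature} == {value!r} (weighted Gini {score:.3f})")
```

On this toy data the note-derived smoking field separates the two groups cleanly, which is the kind of signal that would be missed if only coded fields were considered.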

Wed, Aug 14, 2013
Invited Talk: A draft map of the human proteome @ Amriteshwari Hall
Aug 14 @ 10:42 am – 11:30 am

Akhilesh Pandey, Ph.D.
Professor, Johns Hopkins University School of Medicine, Baltimore, USA


A draft map of the human proteome

We have generated a draft map of the human proteome through a systematic and comprehensive analysis of normal human adult tissues, fetal tissues and hematopoietic cells as an India-US initiative. This unique dataset was generated from 30 histologically normal adult tissues, fetal tissues and purified primary hematopoietic cells that were analyzed at high resolution in the MS mode and by HCD fragmentation in the MS/MS mode on LTQ-Orbitrap Velos/Elite mass spectrometers. This dataset was searched against a 6-frame translation of the human genome and RNA-Seq transcripts in addition to standard protein databases. In addition to confirming a large majority (>16,000) of the annotated protein-coding genes in humans, we obtained novel information at multiple levels: novel protein-coding genes, unannotated exons, novel splice sites, proof of translation of pseudogenes (i.e. genes incorrectly annotated as pseudogenes), fused genes, SNPs encoded in proteins and novel N-termini to name a few. Many proteins identified in this study were identified by proteomic methods for the first time (e.g. hypothetical proteins or proteins annotated based solely on their chromosomal location). We have generated a catalog of proteins that show a more tissue-restricted pattern of expression, which should serve as the basis for pursuing biomarkers for diseases pertaining to specific organs. This study also provides one of the largest sets of proteotypic peptides for use in developing MRM assays for human proteins. Identification of several novel protein-coding regions in the human genome underscores the importance of systematic characterization of the human proteome and accurate annotation of protein-coding genes. This comprehensive dataset will complement other global HUPO initiatives using antibody-based as well as MRM mass spectrometry-based strategies. 
Finally, we believe that this dataset will become a reference set, both for use as a spectral library and for addressing biomedical and bioinformatics questions.
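
The abstract mentions searching spectra against a 6-frame translation of the genome. As a minimal sketch of what a six-frame translation is (not the study's actual search pipeline), the code below translates a DNA string in the three forward and three reverse-complement reading frames using the standard genetic code. The function names and the example sequence are illustrative assumptions.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq):
    """Translate complete codons from the start of seq.

    '*' marks a stop codon; 'X' marks a codon with an unrecognized base.
    Trailing bases that do not form a full codon are ignored.
    """
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame(seq):
    """Return translations for frames +1..+3 and -1..-3 of a DNA string."""
    seq = seq.upper()
    reverse_complement = seq.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(reverse_complement[offset:])
    return frames


frames = six_frame("ATGGCCTAA")  # illustrative sequence, not from the study
for name in sorted(frames):
    print(name, frames[name])
```

Searching against all six frames lets peptides be matched to regions that have no annotated gene model, which is how unannotated exons and novel coding regions can surface.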
