Amritapuri, Vallikavu, Kerala 690525
Kunal Kundu, Sushma Motamarri, Uma Sunderam, Steven E. Brenner and Rajgopal Srinivasan.
VARANT: The Variant Annotation Tool
Genome sequencing technologies are generating an abundance of data on human genetic variations. A big challenge lies in interpreting the functional relevance of such variations, especially in clinical settings. A first step in understanding the clinical relevance of genetic variations is to annotate the variants for region of occurrence, degree of conservation both within and across species, pattern of variation across related individuals, novelty of the variation and known effects of related variations. Several tools already exist for this purpose, each with its own strengths and weaknesses. We will present an open-source tool, VARANT, written in the Python programming language, that is easily extended to incorporate newer annotations.
A detailed variant annotation places variants in context, highlights significant findings and prioritizes candidates for further analysis. With this outlook we developed VARANT to annotate, prioritize and visualize variants. VARANT has five levels of annotation: genomic position based, gene based, untranslated region (UTR) based, mutation effect prediction and gene-level disease association. The databases used for annotation have been compiled from several sources. The genomic position based annotation comprises tagging variants present in dbSNP and the 1000 Genomes Project, GWAS variants, variants in functionally constrained regions and variants overlapping epigenetic signals. The gene-based annotation includes the distance from splice sites for intronic variants, and the gene, transcript, amino acid change, and splicing silencer and enhancer information for exonic variants. UTR-based annotations comprise UTR functional sites such as miRNA binding sites and internal ribosomal entry sites, as well as variations and deletions at the UTR5-coding sequence (CDS) boundary, exon-intron boundary and CDS-UTR3 boundary. Mutation effect predictions are incorporated from PolyPhen2 and SIFT. Thus, a detailed annotation with VARANT captures multiple biological aspects of a variant and helps in filtering variants based on disease context. The input and output format of VARANT is the universal Variant Call Format (VCF), with facilities to export the annotations to popular formats such as comma/tab separated values and MS Excel. On a desktop computer with a single core and 4 GB of RAM, VARANT annotates over 50,000 variants per minute and can be readily parallelized. Being an exhaustive annotator with good performance on modest computational hardware, VARANT is a useful annotation tool for analyzing genomic variants. Furthermore, the tool includes facilities to update the underlying data sources in an automated fashion, and is easily extended with additional annotations.
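To make the idea of genomic position based annotation concrete, the following is a minimal sketch of tagging a VCF record against a table of known variants. It is not VARANT's code or API: the function name `annotate_vcf_line`, the lookup table `KNOWN_VARIANTS`, the identifier in it and the INFO key `KNOWN_ID` are all hypothetical, chosen only for illustration.

```python
# Hypothetical stand-in for a resource of known variants (e.g. a dbSNP-like table).
# Keys are (chromosome, position, reference allele, alternate allele).
KNOWN_VARIANTS = {
    ("1", 12345, "A", "G"): "rs_example_1",  # made-up identifier
}


def annotate_vcf_line(line, known=KNOWN_VARIANTS):
    """Append KNOWN_ID=<ids> to the INFO column when the variant is in `known`.

    Header lines (starting with '#') are returned unchanged. Data lines are
    returned without a trailing newline.
    """
    if line.startswith("#"):
        return line
    fields = line.rstrip("\n").split("\t")
    # Fixed VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO
    chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
    # A record may list several comma-separated alternate alleles.
    hits = [
        known[(chrom, pos, ref, allele)]
        for allele in alt.split(",")
        if (chrom, pos, ref, allele) in known
    ]
    if hits:
        tag = "KNOWN_ID=" + ",".join(hits)
        # '.' marks an empty INFO column in VCF.
        fields[7] = tag if fields[7] in ("", ".") else fields[7] + ";" + tag
    return "\t".join(fields)
```

Because each record is annotated independently of the others, a function of this shape can be applied to chunks of a VCF file in separate processes, which is one plausible way the parallelization mentioned above could be realized.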
VARANT also provides an interface to visualize variants in an annotated VCF file and to filter variants interactively based on annotation features such as region and mutation effect, and on inheritance models. In addition to annotation, there are ongoing efforts to incorporate a variant prioritization module that uses the annotated features as well as inheritance information.
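Filtering on an annotation feature combined with an inheritance model can be sketched as follows. This is an illustrative assumption, not VARANT's implementation: the INFO key `REGION`, the function names and the simple autosomal recessive rule (child homozygous for the alternate allele, both parents heterozygous carriers) are all hypothetical, and each genotype argument is assumed to be only the GT subfield of a VCF sample column.

```python
def parse_info(info):
    """Turn a VCF INFO string such as 'DP=10;REGION=exonic;DB' into a dict.

    Flag entries without '=' (like 'DB') map to True.
    """
    out = {}
    if info in ("", "."):
        return out
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out


def _alleles(gt):
    # Treat phased ('|') and unphased ('/') genotypes alike.
    return gt.replace("|", "/").split("/")


def is_hom_alt(gt):
    """True for a diploid genotype with two identical non-reference alleles."""
    a = _alleles(gt)
    return len(a) == 2 and a[0] == a[1] and a[0] not in ("0", ".")


def is_het_carrier(gt):
    """True for a diploid genotype with one reference and one alternate allele."""
    a = _alleles(gt)
    return len(a) == 2 and "." not in a and "0" in a and a[0] != a[1]


def keep_variant(info, child_gt, mother_gt, father_gt, region="exonic"):
    """Keep variants in `region` that fit a simple autosomal recessive model."""
    in_region = parse_info(info).get("REGION") == region
    recessive = (
        is_hom_alt(child_gt)
        and is_het_carrier(mother_gt)
        and is_het_carrier(father_gt)
    )
    return in_region and recessive
```

For example, `keep_variant("REGION=exonic", "1/1", "0/1", "0|1")` keeps the variant, whereas the same genotypes with `REGION=intronic` are filtered out.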