Brown CS Blog

Sorin Istrail Receives NSF Grant For Haplotype Reconstruction Algorithms


    Brown CS community members continue to win noteworthy grants and awards.

    Sorin Istrail has received NSF funding for the project “Genome-Wide Algorithms for Haplotype Reconstruction and Beyond: A Combined Haplotype Assembly and Identical-by-Descent Tracts Approach”. Human genomes are diploid, which means that each human has two haplotypes, one inherited from the mother and one from the father; each haplotype is a set of sequences (chromosomes) totaling about 3.2 billion nucleotides, each an A, C, G, or T. These haplotypes are mosaics of haplotype regions inherited from ancestors as a result of two major forces of evolution: recombination and mutation. When two or more individuals inherit the same haplotype region from a common ancestor, the shared region is called a “tract” and is said to be inherited “identical-by-descent” (IBD). A tract has the same start and end coordinates on every genome that shares it.
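
    To make the definition of a tract concrete, here is a minimal Python sketch that scans pairs of aligned haplotypes for maximal intervals on which they are identical. It is only an illustration under simplifying assumptions: it detects identity-by-state (matching letters), which is at best a crude proxy for identity-by-descent; the function names, the `min_len` threshold, and the toy sequences are hypothetical; and it compares every pair of haplotypes, so it is quadratic in the number of haplotypes. It is not the Tractatus algorithm.

```python
from itertools import combinations


def shared_runs(hap_a, hap_b, min_len=3):
    """Return maximal half-open intervals (start, end) on which two
    equal-length haplotype strings agree for at least min_len sites."""
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must have equal length")
    runs = []
    start = None  # start of the current run of matching sites, if any
    for i, (a, b) in enumerate(zip(hap_a, hap_b)):
        if a == b:
            if start is None:
                start = i
        else:
            # A mismatch closes the current run; keep it if long enough.
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None
    # Close a run that extends to the end of the sequences.
    if start is not None and len(hap_a) - start >= min_len:
        runs.append((start, len(hap_a)))
    return runs


def pairwise_tract_candidates(haplotypes, min_len=3):
    """Map each pair of haplotype names to its shared intervals.

    Naive all-pairs scan, shown only to illustrate the shared
    start/end coordinates that define a tract.
    """
    result = {}
    for (name_a, seq_a), (name_b, seq_b) in combinations(haplotypes.items(), 2):
        runs = shared_runs(seq_a, seq_b, min_len)
        if runs:
            result[(name_a, name_b)] = runs
    return result


# Toy data (made up for illustration).
toy = {"h1": "ACGTACGT", "h2": "ACGTTCGA", "h3": "TCGTACGA"}
candidates = pairwise_tract_candidates(toy)
for pair, runs in candidates.items():
    print(pair, runs)
```

    Each reported interval has the same start and end coordinates on both haplotypes, mirroring the definition above. A real IBD method must also account for mutation, genotyping error, and shared ancestry, and must scale to far larger inputs than an all-pairs comparison allows.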

    The “logic” of detecting disease associations is rooted in the inference of tracts. For example, if a set of autistic patients is found to share a tract, and a certain gene is found to be part of this tract, this gene becomes a candidate gene for an ancestral model of autism inheritance. In preliminary work, Istrail and his PhD student Derek Aguiar solved a major open problem in the influential Li-Stephens statistical framework (2003) for modeling linkage disequilibrium, recombination hotspots, and haplotype phasing; this framework enabled some of the most practical genome-wide association study (GWAS) software tools to date.

    A major bottleneck in that framework was the failure of “exchangeability” of the statistical process: the output of the algorithm depended on the order in which the input was processed. The combinatorial solution that achieved exchangeability led to Tractatus, the first exact, sub-quadratic (close to linear), and practical algorithm for detecting the complete set of multi-shared identical-by-descent tracts in a GWAS sample of individuals (a current GWAS input is a matrix with several billion entries). The name of the algorithm was inspired by Ludwig Wittgenstein’s Tractatus Logico-Philosophicus. The grant proposes a comprehensive algorithmic framework for haplotype reconstruction that uses haplotype assembly (the HapCompass framework), haplotype phasing, and generalizations of Tractatus to address haplotype reconstruction in polyploid organisms and medical aneuploidy.