Sorin Istrail Reflects On The History Of His ESA Test-Of-Time Award-Winning Research
- Posted by Jesse Polhemus
- on Nov. 16, 2022
For a Brown CS news item about the award below, click here, and for the slides from Giuseppe Lancia's invited talk for the award in PDF format, click here.
The 30th European Symposium on Algorithms (ESA) last month awarded the ESA 2021 Test-of-Time Award to Brown CS Professor Sorin Istrail and his collaborators for their ESA 2001 paper called “SNPs Problems, Complexity, and Algorithms.” The award recognizes “excellent papers in algorithm research that were published … 19-21 years ago and which are still influential and stimulating for the field today,” according to the ESA.
Laudation, by the ESA 2021 Test-of-Time Award selection committee
Giuseppe Lancia, Vineet Bafna, Sorin Istrail, Ross Lippert, and Russell Schwartz
“SNPs Problems, Complexity and Algorithms”
Proceedings of the 9th European Symposium on Algorithms, Arhus, Denmark, August 28-31, 2001, Lecture Notes in Computer Science, 2161, Springer-Verlag, 2001
“The paper contributed to foundational work of understanding emerging problems in genetics through the computational lens. Single nucleotide polymorphisms (SNPs) are the most frequent form of human genetic variation. They are of fundamental importance in medical diagnostic, drug design, and are a fingerprint of disease genes. This work studied problems related to computational SNPs validation based on genome assemblies of diploid organisms (those with two copies of each chromosome) and presented both hardness results and efficient algorithms, using graph modeling.”
Giuseppe Lancia of the University of Udine, Vineet Bafna of UC San Diego, Ross A. Lippert of D.E. Shaw Research, and Russell Schwartz of Carnegie Mellon University co-authored the paper along with Sorin. The research presented in the paper was done while the authors worked in private industry at Celera Genomics, where, as Senior Director of Informatics Research, Sorin’s responsibilities included the research area of SNPs and Haplotypes/Genomic Variation Project.
In the above picture:
Top: Craig Venter
Front row: Clark Mobarry, Michael Flanigan, Brian Walenz, Nathan Edwards, Merissa Henry, Randall Bolanos, Ross Lippert
Second row: Shibu Yooseph, Vinnet Bafna, Aaron Halpern, Giuseppe Lancia, Granger Sutton, Sridhar Hannenhalli, Laurent Mouchard (please confirm), Karin Remington, Liliana Florea, Gene Myers
Third row: Daniel Huson, Russell Schwartz, Arthur Delcher, Knut Reinert, Sorin, Ian Dew, Saul Kravitz
Sorin’s paper dealt with the problem of haplotype phasing, i.e., constructing diploid (maternal and paternal) haplotypes of a single individual from genome sequencing data, a record of about 4 million SNPs / genetic variations. Genomic variation of SNPs and haplotypes is fundamental in the search of genetic determinants of health and disease.
Celera Genomics built a powerful supercomputer to construct the company’s seminal achievement, “The Sequence of the Human Genome” (Science, 2001), and create from the genome assembly the first big database of human SNPs/genomic variations. The genomics world today is the world of SNPs and haplotypes: from Genome-Wide Association Studies (GWAS) to the whole-genomes sequencing and assembling of hundreds of thousands of individuals, SNPs and haplotypes analysis is at the center of it all.
“Improving data quality is crucial, because if a human genome cannot be independently assembled then the sequence data cannot be sorted into the two sets of parental chromosomes, or haplotypes.
This process haplotype phasing will become one of the most useful tools in genomic medicine. Establishing the complete set of genetic information that we received from each parent is crucial to understanding the links between heritability, gene function, regulatory sequences and our predisposition to disease.” Craig Venter, “Multiple personal genomes await,” Nature, April 2010
“This is a story," says Sorin, "about algorithms and computational biology, the threads that connect the three intellectual and life-changing Camelots I had the fortune to work for: Sandia, Celera, and Brown."
"In February 2000, I was a senior scientist at Sandia National Labs, in Albuquerque, New Mexico, one of the national labs of the US Department of Energy (DOE). I was the project leader of Sandia’s Computational Biology Project, which we started in 1992. The project was founded by the DOE Applied Mathematics Program, a program started at DOE by John von Neumann. At that time, Sandia Labs was the world leader in supercomputing, having the first 5 TOP 100 supercomputers in the world. In San Francisco, at the Symposium on Discrete Algorithms (SODA 2000), Gene Myers, VP of Informatics Research at Celera Genomics, met with me to tell me that company president Craig Venter wanted to meet me, and to come for an interview."
"My interview with Craig went like a fairytale (a story for another time). We talked about supercomputing. I asked Craig about the role of algorithms at Celera. He answered with his famous quick wit: “Algorithms are the make-or-break of Celera.” He made me an offer I couldn’t refuse. In fact, it was the best job offer I’d had to date. Two weeks later, I was Celera’s Senior Director of Informatics Research, and I had to hire my group with the highest speed possible — after all, the name Celera is part of the verb accelerate, and the company motto was “Speed Matters! Discovery Can’t Wait!” As I joined Celera Genomics, it was clear to many of us that we were privileged to be part of a once-in-a-lifetime research effort. The brilliance of the Information Research Group was remarkable. I was thrilled to work closely with Craig, the maverick genomics scientist and our inspirational leader, renowned for the art of making the impossible possible."
"In setting the research agenda of the SNPs and Haplotypes team, we collaborated with Andy Clark (Cornell University), a Brown University alumnus and one of the leading researchers in human genetics and population genomics. The haplotype phasing problem was from the start one of the main research challenges we focused on. Collaborartion with Michael Waterman added strength and rigor to our algorithmic strategies. In time, a broad research agenda emerged with computational biology problems related to population genomics, pharmacogenomics and SNP assays. Our ESA 2001 Award paper, and follow-up papers, birthed the field of computational haplotyping, which justifies today’s massive use of SNPs, for the discovery of genetic determinants of disease, and of SNP patterns associated with diseased individuals, inferred by algorithms applied to huge data sets."
"The authors of the paper, together with Bjarni Halldorsson (now Head of Sequence Analysis at deCODE Genetics), were the major contributors of the Celera SNPs and Haplotype research group. They continued their algorithms research on SNPs and haplotypes after Celera, working together with their students and collaborators. At Brown, I continued my research on SNPs and haplotypes with my former and current graduate students, postdoc, and honor theses students now in industry or in graduate schools: Ryan Tarpine (Google), Derek Aguiar (UCONN), Pinar Demetci (Brown), Fumei Lam (UC Davis), and Doug McEarlean (Google), Young Kim (MIT), Daniel Seidman (Cornell), Sam Crisanto (Microsoft Research), Sud Perera (Brown Medical School), Daniel Ben-Isvy (Harvard Medical School)."
"Pictured in the 2003 photo are a group of the authors of our comparative analysis paper done through genome-to-genome alignments of all the human genomes to date. Their smiles show 'the make' of our Celera algorithms which stood the test of time!"
"Sorin Istrail, Granger G. Sutton, Liliana Florea, Aaron L. Halpern, Clark M. Mobarry, Ross Lippert, Brian Walenz, Hagit Shatkay, Ian Dew, Jason R. Miller, Michael J. Flanigan, Nathan J. Edwards, Randall Bolanos, Daniel Fasulo, Bjarni V. Halldorsson, Sridhar Hannenhalli, Russell Turner, Shibu Yooseph, Fu Lu, Deborah R. Nusskern, Bixiong Chris Shue, Xiangqun Holly Zheng, Fei Zhong, Arthur L. Delcher, Daniel H. Huson, Saul A. Kravitz, Laurent Mouchard, Knut Reinert, Karin A. Remington, Andrew G. Clark, Michael S. Waterman, Evan E. Eichler, Mark D. Adams, Michael W. Hunkapiller, Eugene W. Myers, and J. Craig Venter, 'Whole-genome shotgun assembly and comparison of human genome assemblies' Proceedings National Academy of Sciences, vol. 101 no. 7, 1916–1921, February 17, 2004"
"Upon the paper’s publication, the Celera genome assemblies were deposited to GenBank/NIH to the delight of the Director of National Center for Biotechnology Information/NIH, Brown alumnus David Lipman. Sadly, two colleagues in the picture are no longer with us: Karin Remington, a senior member of the genome assembly team, and Hagit Shatkay, the Brown Ph.D. computer science alumna, who led the Celera team to winning, in collaboration with the company ClearForrest, the 2002 ACM Knowledge Discovery and Data Mining Competition (KDD Cup 2002; The challenge: automatic annotation of the Drosophila genome), passed away during the pandemic years. We miss them very much.”
In the above 2003 picture: From left to right, upper row: Bjarni Halldorsson, Aaron Halpern, Hagit Shatkay, Liliana Florea, Ross Lippert, Nathan Edwards, Granger Sutton, Russell Turner, J. Craig Venter, Sorin, Karin Remington, Ian Dew; lower row: Clark Mobarry, Brian Walenz, the-father-of-all-dot-plots, Jason Miller, and Merissa Henry.
Sorin’s research at Brown focuses on SNPs and haplotypes algorithms, the regulatory genome, and protein folding algorithms, and on theoretical computer science, algorithms and computational complexity, and statistical physics. He’s the former Director of the Center for Computational Molecular Biology at Brown University. Before joining Brown, he served as Senior Director and then Head of Informatics Research at Celera Genomics, where his group played a central role in the construction of the sequence of the human genome. In 2000, at Sandia Labs, Sorin obtained the negative solution (computational intractability) of the Three-Dimensional Ising Model Problem.
See: Sorin Istrail, "Statistical Mechanics, Three-Dimensionality and NP-Completeness: I. Universality of Intractability of the Partition Functions of the Ising Model Across Non-Planar Lattices", In 32nd ACM Symposium on the Theory of Computing (STOC00), ACM Press, Portland, Oregon, pp. 87-96, 2000
Media coverage: Science, vol. 288, pp. 5471, 2000 “MATHEMATICS: Statistical Physicists Phase Out a Dream”, by Barry Cipra.
In 1995, at Sandia Labs, Sorin, in collaboration with Bill Hart, constructed the first protein folding approximation algorithm with mathematically guaranteed error-bounds.
See: William E. Hart, Sorin Istrail, "Fast Protein Folding in the Hydrophobic-Hydrophilic Model Within Three-eighths of Optimal", In 27th Annual ACM Symposium on Theory of Computing (STOC95), ACM Press, pp. 157-168, 1995
Media coverage: Science, vol. 269, pp. 1821, 1995 "Folding Proteins Fast"
For a list of previous ESA Test-of-Time awards see http://esa-symposium.org/test-of-time-award.php
For more information, click the link that follows to contact Brown CS Communications Manager Jesse C. Polhemus.