Signatures Seminar Series
The Signatures Community of Interest at the Pacific Northwest National Laboratory is a multidisciplinary community interested in the exploration of signatures across all science domains where collaboration can lead to creative solutions for signature identification and discovery. Scientists and engineers gather to discuss signatures as a tool to address both national and global challenges, frequently teaming with other communities or research divisions to provide guest lectures on the PNNL campus.
The Hearts and Minds of Data Science
Dr. Cecilia Aragon, University of Washington
Thanks in part to the recent popularity of the buzzword "big data," it is now generally understood that many important scientific breakthroughs are made by interdisciplinary collaborations of scientists working in geographically distributed locations, producing and analyzing vast and complex data sets. The extraordinary advances in our ability to acquire and generate data in physical, biological, and social sciences are transforming the fundamental nature of scientific discovery across domains. Much of the research in this area, which has become known as data science, has focused on automated methods of analyzing data such as machine learning and new database techniques. Less attention has been directed to the human aspects of data science, including how to build interactive tools that maximize scientific creativity and human insight, and how to train, support, motivate, and retain the individuals with the necessary skills to produce the next generation of scientific discoveries. In this talk, I will argue for the importance of a human-centered approach to data science as necessary for the success of 21st century scientific discovery. Further, I attest that we need to go beyond well-designed user interfaces for data science software tools to consider the entire ecosystem of software development and use: we need to study scientific collaborations interacting with technology as socio-technical systems, where both computer science and sociological approaches are interwoven. I will discuss promising research in this area, describe the current status of the Moore/Sloan Data Science Environment at UW, and speculate upon future directions for data science.
Cecilia Aragon is an associate professor in the Department of Human Centered Design & Engineering at the University of Washington, where she directs the Scientific Collaboration and Creativity Lab. She holds a faculty position with the UW eScience Institute and courtesy appointments in Computer Science and Engineering, Electrical Engineering, and the Information School, and leads UW’s Ethnography and Evaluation Working Group as one of the PIs of the $37.8M Moore/Sloan Data Science Environment. Before arriving at UW in 2010, she held an appointment in the Computational Research Division at Lawrence Berkeley National Laboratory for six years after earning her Ph.D. in computer science from UC Berkeley in 2004. She received her B.S. in mathematics from the California Institute of Technology. Her current research focuses on human centered data science and computer-supported cooperative work (CSCW), visual analytics, emotion in informal text communication, and how social media and new methods of computer-mediated communication are changing data-intensive scientific practice. She has authored or co-authored over 70 refereed and 100 non-refereed publications in HCI, CSCW, visual analytics, machine learning, and astrophysics. Her research has been recognized with six Best Paper awards since 2004. She won the Distinguished Alumni Award in Computer Science from UC Berkeley in 2013, the Faculty Innovator in Teaching Award from her department at UW that same year, and was named one of the Top 25 Women of 2009 by Hispanic Business Magazine. In 2008, she received the Presidential Early Career Award for Scientists and Engineers (PECASE) for her work in data-intensive science. Aragon has an interdisciplinary background, including over 15 years of software development experience in industry and NASA, and a three-year stint as the founder and CEO of a small company.
Geometric and Topological Signatures of Functions with Applications
Dr. Chandrajit Bajaj, The University of Texas at Austin
In this talk, Dr. Chandrajit L. Bajaj surveys the construction and application of several geometric and topological signatures associated with the critical point sets of multi-dimensional functions. These signatures include contour trees, Reeb graphs, and Morse-Smale complexes. Their construction utilizes efficient and robust techniques from computational differential geometry and topology. Applications of these signatures are numerous, including shape reconstruction, segmentation, motif detection, skeletonization, and similarity analysis, and they are applicable to a variety of functional data, spanning microscopy to functional simulations of models.
Dr. Bajaj's research interests span the algorithmic and computational mathematics underpinnings of Image Processing, Geometric Modeling, Computer Graphics, Visualization, Structural Biology and Bioinformatics. These algorithms are applied to:
- Structure elucidation and reconstruction of spatially realistic models of molecules, organelles, cells, tissues, and organs, from electron microscopy, and bio-imaging,
- Fast high-dimensional search engine for identifying energetically favorable molecular binding conformations (e.g. virtual screening for anti-viral drugs), and
- Integrated approaches to computational modeling, mathematical analysis and interrogative visualization of the dynamics of electrical signaling and oscillations (3–10 Hz) amongst neurons in the hippocampus (the central area of learning and memory in the human brain).
Signal Detection From an Informatics Perspective
Dr. Steven Bedrick, Oregon Health & Science University
Signal detection rarely happens in a vacuum. The data that produce signals must necessarily come from somewhere, and those signals (once detected) are ultimately destined to be used in some larger context: to help inform a decision, trigger an activity, or to serve as input to a downstream process. The field of informatics can in many ways be thought of as the systematic study of how this transition takes place. This transition is fraught with difficulties, both technical and, for lack of a better term, organizational in nature. This talk will discuss informatics challenges inherent to the intelligence-gathering process (including knowledge representation, data interchange, and workflow) and will compare and contrast with challenges facing the medical world. The talk will also include discussion of a real-world medical signal detection scenario: early diagnosis of children with autism spectrum disorders.
Steven Bedrick is an assistant professor in Oregon Health & Science University's Center for Spoken Language Understanding. His work focuses on finding biomedical applications for natural language processing and other human language technologies, and includes projects ranging from early detection of autism in children to computer-assisted systematic literature reviews.
Imperfection is Beautiful and Efficient: Approximate Computing from Language to Hardware, and Beyond
Professor Luis Ceze, University of Washington
A significant proportion of computer system resources are devoted to applications that can inherently tolerate inaccuracies in their data, execution and communication. Hence, “approximate computing” is promising for performance and energy efficiency. However, taking advantage of approximate computing needs: language support to specify where and how to apply approximation; analysis and mechanisms that ensure good output quality; and hardware/system support that take advantage of approximation. In this talk I will describe our effort on co-designing language, hardware and system support to take advantage of approximate computing across the system stack (compute, storage and communication) in a safe and efficient way. I will end with some thoughts on how effective approximate computing techniques can not only improve computer systems in the short and medium term but can also enable the use of new substrates and applications.
Luis Ceze is an Associate Professor in the Computer Science and Engineering Department at the University of Washington. His research focuses on computer architecture, programming languages, and operating systems to improve the programmability, reliability, and energy efficiency of multiprocessor systems. He is a recipient of an NSF CAREER Award, a Sloan Research Fellowship, a Microsoft Research Faculty Fellowship, and the 2013 IEEE TCCA Young Computer Architect Award. He was born and raised in Sao Paulo, Brazil, where it drizzles all the time; he now (mostly) lives in the similarly drizzly Seattle. When he is not working, he can be found either eating or cooking.
Ocean Ecosystems in a Warmer World: Review and Observing Needs
Dr. Francisco Chavez, Monterey Bay Aquarium Research Institute
Ocean ecosystem time series are few and far between but those available, together with remote sensing and modeling, show that these ecosystems can fluctuate widely and provide a glimpse of the processes at work. The presentation first reviews 25 years of ocean variability in the Monterey Bay region within the context of global climate variability and change. Large-scale changes in climate are important drivers of the local changes in Monterey Bay but those changes are not presently predictable. The observations highlight the need for new approaches to observing ocean ecosystems to: 1) improve predictive models by better characterizing the processes; 2) improve the characterization of the entire food chain; and 3) make the time series sustainable into the future. Some examples of how we might achieve this are provided.
Francisco Chavez is a biological oceanographer with interests in how climate variability and change regulate ocean ecosystems on local and basin scales. He was born and raised in Peru, and has a BS from Humboldt State and a PhD from Duke University. He was one of the founding members of the Monterey Bay Aquarium Research Institute (MBARI), where he has pioneered time series research and the development of new instruments and systems to make this type of research sustainable. Chavez has authored or co-authored over 200 peer-reviewed papers, with 10 in Nature and Science. He is a past member of the National Science Foundation Geosciences Advisory Committee, has been involved in the development of the US Integrated Ocean Observing System (IOOS), and is a member of the Governing Board of the Central and Northern California Coastal Ocean Observing System (CeNCOOS) and the Science Advisory Team for the California Ocean Protection Council. Chavez is a Fellow of the American Association for the Advancement of Science, honored for distinguished research on the impact of climate variability on oceanic ecosystems and global carbon cycling. Chavez is also a Fellow of the American Geophysical Union, honored for advancing fundamental knowledge of the physical-biological coupling between Pacific Decadal Oscillations, productivity, and fisheries. He was awarded a Doctor Honoris Causa by the Universidad Pedro Ruiz Gallo in Peru in recognition of his distinguished scientific career and for contributing to elevating the academic and cultural levels of university communities in particular and society in general. Chavez is the 2014 recipient of the Ed Ricketts Memorial Award.
Advances in Anomaly Detection with Applications to Insider Threat Detection
Professor Thomas Dietterich, Oregon State University
Our team at Oregon State University has developed several new algorithms for anomaly detection. These are based on two main principles: “anomaly detection by underfitting” and “anomaly detection by overfitting”. In the underfitting approach, a model is fit to the data and points that do not fit well (e.g., that have low estimated probability density) are flagged as anomalies. In the overfitting approach, we transform the data to create learning problems in which there should be no signal or pattern and then apply machine learning algorithms to fit this data. If the algorithm finds a pattern, this is due to overfitting, and points belonging to the (false) pattern are likely to be anomalies. This talk will present these algorithms and also describe our benchmarking framework, which allows us to measure and compare the performance of different anomaly detection algorithms. I will also describe the results of a red-team experiment conducted under the DARPA ADAMS program in which our anomaly detection methods are showing excellent performance.
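As illustration, the "anomaly detection by underfitting" principle can be sketched in a few lines: fit a deliberately simple density model to the data and flag the points it explains worst. This is only a hedged sketch, not one of the OSU algorithms; the single-Gaussian model, the function name, and the fraction threshold are all illustrative.

```python
import numpy as np

def underfit_anomalies(X, frac=0.05):
    """Fit one Gaussian to X and flag the points with lowest density."""
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])
    inv = np.linalg.inv(cov)
    diff = X - mu
    # Squared Mahalanobis distance: large distance = low estimated density
    scores = np.einsum('ij,jk,ik->i', diff, inv, diff)
    k = max(1, int(frac * len(X)))
    return np.argsort(scores)[-k:]  # indices of the k worst-fitting points

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
X[:3] += 8.0                        # plant three obvious outliers
print(sorted(underfit_anomalies(X, frac=0.015)))
```

The complementary "overfitting" approach inverts this idea: construct a task that should contain no pattern, and treat whatever structure a learner nonetheless fits as pointing at the anomalies.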
Thomas G. Dietterich is Distinguished Professor of Computer Science at Oregon State University. He has contributed to many aspects of machine learning including multiclass classification, learning from weakly-labeled (multiple instance) data, ensemble methods, cost-sensitive learning, hierarchical reinforcement learning, and integrating learning into user interfaces. Dietterich is a Fellow of the ACM, AAAI, and AAAS and President-Elect of the AAAI.
Statistical Analysis of High-dimensional Manifold Data
Dr. Tom Fletcher, University of Utah
Manifold representations are useful for many different types of data, including directional data, transformation matrices, tensors, and shape. Statistical analysis of these data is an important problem in a wide range of image analysis and computer vision applications. However, defining statistics on a manifold is not a straightforward process. Even the simplest statistics, such as the mean, depend on the vector space structure of Euclidean space. This structure is not available for a general manifold. In this talk I will discuss how many common statistics can be defined for manifold-valued data by utilizing the geodesic distance on the manifold. After explaining the algorithms for computing statistics on manifolds, I will demonstrate their application in several image analysis problems.
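For intuition, the simplest such statistic, the Fréchet (geodesic) mean, can be sketched on the circle, where the arithmetic mean of raw angle values fails but the geodesic-distance formulation works. The function and step-size choices below are illustrative, not drawn from the talk.

```python
import numpy as np

def frechet_mean_circle(angles, steps=100, lr=0.5):
    """Minimize the sum of squared geodesic (arc-length) distances on S^1."""
    mu = angles[0]
    for _ in range(steps):
        # Wrap differences into (-pi, pi]: the geodesic "log map" on the circle
        d = np.angle(np.exp(1j * (angles - mu)))
        mu = mu + lr * d.mean()      # gradient step toward the geodesic mean
    return np.angle(np.exp(1j * mu))

# Angles clustered near +/- pi: averaging raw values gives a point near 0,
# on the opposite side of the circle, while the Fréchet mean lands near pi.
angles = np.array([3.0, -3.0, 3.1, -3.1])
print(frechet_mean_circle(angles))
```

The same fixed-point iteration generalizes to other manifolds once the exponential and log maps (here, complex exponentiation and angle wrapping) are replaced by their geodesic counterparts.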
Tom is an Assistant Professor in the School of Computing at the University of Utah, and works within the Scientific Computing and Imaging Institute. His research focuses on solving problems in medical image analysis and computer vision through the combination of statistics and differential geometry. Tom earned his Ph.D. in Computer Science at the University of North Carolina at Chapel Hill with the dissertation Statistical Variability in Nonlinear Spaces: Application to Shape Analysis and DT-MRI. Prior to that, Tom completed an M.S. in Computer Science at the University of North Carolina at Chapel Hill, and a B.A. in Mathematics at the University of Virginia.
Modeling the Structure and Dynamics of Insurgent and Political Networks
Dr. Michael Gabbay, University of Washington
Dr. Gabbay presents two different components of his research on quantitatively representing network structure and behavior among insurgent groups and political elites. The goal of this research is to better understand and anticipate their factional dynamics, cooperative behavior, and strategic decision making. The first part of this talk will focus on the quantitative construction of networks from political actor rhetoric, focusing on the Iraqi and Afghan insurgencies. Three main features of insurgent rhetoric are employed: (1) ideologies as represented by the social identities groups seek to project; (2) the types of targets insurgents publicly claim to attack; and (3) group cooperative relationships. Ideologies are quantified using the concept of a "conflict frame" which consists of the in-groups and out-groups in insurgent discourse; it is implemented via a computational algorithm based on frequencies of key words and phrases. The second part of my talk will present a mathematical model of decision making in political networks. Group decision making is modeled as an opinion change process in which an individual's opinion concerning a policy under debate is taken to evolve as the result of the interplay between his ideological predispositions and the influence of the other group members with which he communicates. The model exhibits sharp transitions between equilibrium solutions arising from its nonlinear nature and I present results regarding the interaction of network density, network topology, and initial disagreement level upon decision outcome and consensus formation. Applications of the model to elite decision making and an experimental effort to test its predictions will be briefly discussed.
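The model in the talk is nonlinear, which is what produces the sharp transitions between equilibria. As a hedged linear point of reference, a Friedkin-Johnsen-style sketch (the network, weights, and anchoring strength below are invented for illustration) shows the basic interplay between ideological predisposition and network influence:

```python
import numpy as np

def opinion_dynamics(W, x0, anchor=0.5, steps=200):
    """Each actor's opinion moves toward a weighted average of neighbors,
    pulled back toward their ideological predisposition x0 by `anchor`."""
    x = x0.copy()
    for _ in range(steps):
        x = anchor * x0 + (1 - anchor) * (W @ x)
    return x

# Fully connected three-actor network with equal influence weights
W = np.full((3, 3), 1 / 3)
x0 = np.array([-1.0, 0.0, 1.0])  # initial policy positions
print(np.round(opinion_dynamics(W, x0), 3))
```

With equal anchoring to predispositions, the actors settle halfway between their initial positions and the group average rather than reaching consensus; in the nonlinear model described above, varying density, topology, and initial disagreement can instead flip the group between qualitatively different outcomes.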
Dr. Gabbay's current research involves the development of mathematical models and computational simulations of network dynamics, focusing on social and political systems. He has also conducted research in the areas of nonequilibrium pattern formation, coupled oscillator dynamics, sensor development, and data analysis algorithms. His work has appeared in physics, engineering, biology, and political science publications. Dr. Gabbay received a B.S. in physics from Cornell University and a Ph.D. in physics from the University of Chicago with a specialization in nonlinear dynamics.
Epidemic Modeling, Biosurveillance, and Signatures of Infectious Disease
Dr. David Hartley, Georgetown University Medical Center
Recent work may provide the foundation for anticipating signatures of infectious diseases as they invade new, susceptible populations. Dr. Hartley describes recent work modeling the arrival of West Nile virus (WNV) in California and the spatio-temporal patterns of disease observed since. Clear environmental indicators emerge from this research to identify areas and times at risk for WNV emergence and transmission. Given a forecast, however, what are the surveillance signatures of the arrival of a new infection? A potential scheme for applying epidemic models to event-based, Internet biosurveillance will be discussed. Modeling tools provide an objective and often overlooked framework for prospectively estimating biosurveillance signatures of diseases.
Dr. David Hartley is a Research Associate Professor in the Department of Microbiology and Immunology at the Georgetown University Medical Center. His research interests include the ecology of infectious disease, public health surveillance, hospital infection control, and biological defense. His research applies diverse analytic methodologies to understand the dynamics of disease in human and animal populations, as well as to discover and analyze potential strategies to control infection.
Green’s Functions and Signal Propagation for Influence Prediction
Dr. Vikram Jandhyala, University of Washington
Predicting influence in large-scale social and organizational networks is a challenging big data problem. In this talk, we present recent work on graph Green's functions, sociological models of opinion formation, and information theoretic measures to analyze, model, and predict behavior and influence in such networks. We present PhySense, a prototype agent-based scalable simulator for graph-based social signal propagation and influence estimation. This work is complementary to and benefits from synergies with recent advances in statistical inference methods and automated topic modeling. We will present examples from both real and synthesized data.
Dr. Vikram Jandhyala is Professor and Chair of the Department of Electrical Engineering at the University of Washington and Director of the Applied Computational Engineering Lab at UW EE. He is also the UW director of the UW-PNNL Northwest Institute for Advanced Computing. He is a recipient of an NSF CAREER award, a NASA inventor award, an outstanding research advisor award from UW EE, and graduate research awards from the IEEE Microwave Society and the University of Illinois. He has published more than 150 papers and is the founder of Nimbic, a startup that provides electromagnetic simulation and cloud-based electronic design automation tools for the semiconductor industry. He is also a UW Presidential Entrepreneurial Faculty Fellow. His research has been funded by DARPA, NSF, SRC, WRF, NASA, LLNL, DoD, SBIRs, and several industrial sponsors. His research interests include fast electromagnetic simulation, physics-based simulation, high-dimensional design space exploration, synthesis and optimization, multicore and distributed parallel algorithms, secure and scalable scientific computing on public clouds, and graph techniques and scalable simulation methods for big data network applications.
Multi-omics Assessment of the Human Gut Microbiome
Dr. Janet Jansson, Pacific Northwest National Laboratory
Humans are colonized soon after birth with microorganisms that collectively comprise the human microbiome, the composition of which becomes relatively stable after the first 2 years of life. The studies that I will describe in this talk include analysis of two types of impacts on the gut microbiome: 1) inflammatory bowel disease (IBD) and 2) a resistant starch diet. Examples include shifts in a suite of lipid molecules according to diet. Together these studies exemplify the use of advanced omics measurements to detect novel molecular signatures of disease and diet that can inform personalized therapies as well as provide potential bioindicators of specific physiological states of importance to human health.
Janet Jansson obtained her Ph.D. in Microbial Ecology in 1988 from Michigan State University under the supervision of James Tiedje. From there she moved to Sweden for her postdoctoral research at Stockholm University. She lived in Sweden for 20 years, at the end of which she held a position as Professor (Chair) of Environmental Microbiology at the Swedish University of Agricultural Sciences and as Vice Dean of the Natural Science Faculty. In 2007 she obtained a position as Senior Staff Scientist in the Earth Sciences Division at Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA, where she headed the Ecosystems Biology Program. Since 2014, she has served as the Division Director of Biological Sciences at the Pacific Northwest National Laboratory in Richland, Washington. In addition, she holds Adjunct Professor positions at the University of California, Berkeley and the University of Copenhagen, Denmark. She is also currently serving as President of the International Society of Microbial Ecology (ISME) and is a Senior Editor of the ISME Journal. Her current research interests are in the application of molecular “omics” tools to gain an understanding of the function of microbial communities in complex environments, ranging from soil to sediment to the human gut.
Scatter Matrix Sharing for Distributed Supervised Learning and Big Data and One Health
Dr. Michael Kane, Yale Center for Analytical Sciences
Dr. Peter Rabinowitz, University of Washington
In the paper "Large Complex Data: Divide and Recombine (D&R) with RHIPE" (Guha et al. 2012), a computationally efficient and powerful approach is proposed for supervised learning problems on large data sets, where the number of rows far exceeds the number of columns. In particular, the authors show that for least squares regression the data can be partitioned row-wise, a model can be fit for each partition, and the slope coefficients can be averaged. This averaged model converges to a model trained on the entire data set and has the advantage that the operation can be performed in a single map-reduce step. This talk will explore an approach where this constraint is relaxed. The procedure first calculates the scatter matrix and then distributes it to the partition-wise models. The scatter matrix is then incorporated into each model. The talk will show that for OLS, GLM, and RLS the exact fit can be recovered, and for SVMs a stable approximation can be found in a constant number of map-reduce steps. Real-world results will be shown for gigabyte-size data sets.
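The partition-and-average step from Guha et al. can be sketched in a few lines of numpy (the data, dimensions, and noise level are invented for illustration). Note that the averaged coefficients approximate, but do not exactly equal, the full-data fit; that gap is what the scatter-matrix approach described in the talk aims to close.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, parts = 9000, 3, 3
X = rng.normal(size=(n, p))
beta = np.array([2.0, -1.0, 0.5])
y = X @ beta + rng.normal(scale=0.1, size=n)

# Full-data OLS fit, for reference
full = np.linalg.lstsq(X, y, rcond=None)[0]

# Divide and Recombine: fit OLS on each row-wise partition, average coefficients
per_part = [np.linalg.lstsq(Xi, yi, rcond=None)[0]
            for Xi, yi in zip(np.split(X, parts), np.split(y, parts))]
averaged = np.mean(per_part, axis=0)

print(np.allclose(full, averaged, atol=1e-2))  # close, but not identical
```

In a map-reduce setting each per-partition fit would run in a mapper, with the reducer performing only the final average, which is why a single pass suffices.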
Recent emerging disease events have underscored the important linkages between human, animal, and ecosystem health. There has been increasing interest in developing integrated, transdisciplinary "One Health" approaches to address complex disease problems including emerging zoonotic disease, food security and bioterrorism threats as well as the use of animals as "sentinels" for environmental health hazards. Such integration could benefit from the application of Big Data approaches. This talk will discuss some of the potential applications of Big Data concepts for One Health challenges.
The human intestinal microbiome: gut health, energy extraction, and possible gut-brain interactions
Dr. Rosa Krajmalnik-Brown, Arizona State University
Ongoing studies are examining gut microbiome variation in patients undergoing bariatric surgeries and in children with autism to assess contributions of fermentation products to human gut health, energy extraction, and possible gut-brain interactions. To better understand microbiome effects on host metabolism, and environmental effects on the microbiome, this research analyzes microbial community structures and measures fecal pH, undigested calories, short-chain fatty acids (SCFAs), and the production of other important metabolites. Key microbial interactions in the gut involve partnerships between microorganisms to ferment carbohydrates and amino acids. The products of these interactions consequently contribute to the production of microbial metabolites, which can have beneficial or deleterious health effects. Data collected from a current NIH-funded study will be presented, and Dr. Krajmalnik-Brown will give an overview of the use of PNNL’s Signature Discovery Tools to analyze disparate data types, including metabolomics, 16S rRNA genes, and patient metadata.
Dr. Rosa Krajmalnik-Brown is an Associate Professor at the School of Sustainable Engineering and The Built Environment and the Swette Center for Environmental Biotechnology at Arizona State University. She joined the SSEBE faculty in 2007. She has a Ph.D. in Environmental Engineering from Georgia Tech. In 2011, she received an NSF CAREER award. She is the author of 5 patents and more than 60 peer-reviewed publications. She specializes in molecular microbial ecology for bioremediation, the use of microbial systems for bioenergy production, and human intestinal microbial ecology and its relationship to obesity, bariatric surgery, and autism.
Tracking Responses of Arid Land Soil Microbial Communities to Land Use and Climate Changes Using Molecular and Metagenomic Approaches
Dr. Cheryl Kuske, Los Alamos National Laboratory
Arid land ecosystems face combinations of physical and physiological disturbances from increased anthropogenic use and shifting climate patterns. The patchy distribution of plants and biocrusts, and the complex composition of biota in arid land soils create special challenges for tracking and predicting landscape responses to change. First, we have mapped the distribution of soil fungi and bacteria in arid landscapes at multiple scales using molecular and metagenomic approaches. Second, through multi-year field experiments, we have monitored the impacts of physical disturbance and climate change scenarios on the biomass, composition and functional profiles of biological soil crusts. Physical disturbance (repeated foot traffic), increases in soil temperature (3°C), an altered pattern of summer precipitation, and long-term elevated atmospheric CO2 each affected the biomass and composition of biocrust bacterial communities. Impacts of physical disturbance and altered precipitation pattern were the most noticeable and were visibly similar. However, the impacted soil communities were different in structure, suggesting legacy effects specific to the type of disturbance. Combined soil warming and altered precipitation resulted in biocrust compositional changes that differed from either perturbation alone, highlighting the importance of considering the combined influences of multiple disturbances. Development of fungal rRNA sequence databases to facilitate classification of taxa in complex soil fungal communities, and an approach to design quantitative PCR assays for soil taxa found to be ‘responsive’ to perturbations will be discussed.
Dr. Cheryl Kuske has 32 years of research experience in microbial ecology, plant-microbe interactions, pathogen epidemiology, (meta)genomics, and terrestrial ecosystem science. Her professional experience has included academic, industry, and national laboratory positions. Over the past 23 years at Los Alamos National Laboratory, she has developed and applied molecular and ‘omic methods to study bacterial and fungal communities and their functions in the environment. Her research programs are relevant to DOE missions in climate change, carbon cycling, and actinide transport, and to detection of target pathogens and functional groups in environmental samples. Cheryl has published over 80 peer-reviewed manuscripts, fourteen Los Alamos unclassified reports, and two book chapters, and she holds four patents.
IKE: Identification and Characterization of Threats Through Quantitative Integration of Indications and Signatures Derived from Multiple Heterogeneous Data Sources
Dr. Deborah Leishman, Los Alamos National Laboratory
Identifying and characterizing Nuclear, Biological, Chemical and other threats in a timely and useful manner requires the ability to optimally obtain and integrate discriminating indications and signatures from multiple heterogeneous data sources in a quantitative framework. We present a set of methods and tools in IKE (Integrated Knowledge Engine) that satisfy this need. IKE supports analysts by integrating indications and signatures derived from multiple heterogeneous data sources to quantitatively characterize threats. IKE also supports collection managers by optimizing which indications and signatures to collect and the assets to collect them with. IKE utilizes Bayesian networks as the mathematical framework for integration and optimization. We discuss problems IKE has addressed, using them to illustrate the IKE methods and to motivate discussion of Big Data and signature generation, including cases where we actually have very little data.
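IKE's Bayesian networks are far richer than this, but the core quantitative-integration idea, combining discriminating indicators into a posterior threat assessment, can be sketched for independent evidence in odds form (the prior and likelihood ratios below are invented for illustration):

```python
def fuse(prior, likelihood_ratios):
    """Update a prior threat probability with independent indicator evidence."""
    odds = prior / (1 - prior)
    for lr in likelihood_ratios:
        odds *= lr  # Bayes' rule in odds form: posterior odds = prior odds * LR
    return odds / (1 + odds)

# A 1% prior; two indicators, each 10x more likely given a real threat
posterior = fuse(0.01, [10, 10])
print(round(posterior, 4))  # -> 0.5025
```

A full Bayesian network additionally models dependencies among indicators, which is essential when signatures share underlying sources and the independence assumption above breaks down.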
Dr. Deborah Leishman holds a Ph.D. in Computer Science, specializing in Machine Learning and Artificial Intelligence. Dr. Leishman joined Los Alamos National Laboratory (LANL) in 2001 from IBM, where she worked on defining service-oriented architectures and led research in business intelligence and knowledge management. Prior to IBM, Dr. Leishman managed research and development teams at several companies and research organizations, leading research in software reuse, spatial data systems, and artificial intelligence. Currently, Dr. Leishman leads the Integrated Knowledge Engine (IKE) effort at LANL. IKE defines a set of mathematically based methods and tools that integrate multiple heterogeneous data sources to support analysts and collection managers in their decision making. IKE has been used in many problem areas related to Nuclear, Biological, Chemical and other threats.
How Do Cells Know Where to Go? Signal Detection and Processing at the Microorganism Level
Dr. Herb Levine, Rice University (PNNL Laboratory Director's Distinguished Lecture Series)
Many biological cells are able to bias their motion to track chemical or mechanical gradients in their environments. From a systems perspective, these cells must measure the stimulus field, must process the sensor data so as to decide which way to move, and then must engage their "muscles" to actually implement that decision. We have studied this capability in a simple microorganism, Dictyostelium discoideum, which is a tractable model system utilizing much of the same molecular machinery as more complex human cells that undertake the same task. Our research combines theoretical ideas from information theory, dynamical systems, and fluid mechanics with detailed experimental measurements carried out with the use of specialized microfluidics technology. Thus, it serves as an exemplar of how to begin to transform biology to a more quantitative science.
Herbert Levine is a professor at the University of California San Diego (UCSD) and co-director of UCSD's Center for Theoretical Biological Physics, a National Science Foundation Physics Frontier Center devoted to applying new concepts from physics to biological processes. He is a fellow of the American Physical Society and chair of the APS Division of Biological Physics. He has just completed a six-year stint as associate editor of the Biophysical Journal and has also served on the editorial boards of Chaos and Physical Biology. He has worked as a consultant for JASON, an independent group of scientists that conducts reviews of government scientific programs, mostly in the context of national defense. Dr. Levine received his bachelor's degree in Physics from the Massachusetts Institute of Technology and both his master's and Ph.D. in Physics from Princeton. In July 2012, Dr. Levine will move to Rice University as the Hasselman Professor of Bioengineering while continuing his role as co-director of the Center for Theoretical Biological Physics.
The CommonGround Visual Paradigm for Biosurveillance
Dr. Yarden Livnat, University of Utah
Biosurveillance is a critical area in the intelligence community for real-time detection of disease outbreaks. Identifying epidemics enables researchers to detect and monitor disease outbreaks that might spread from natural causes or from possible biological warfare attacks. The importance of early detection of disease outbreaks, coupled with an exponential growth in our ability to collect and analyze vast amounts of data, has led to the development of a multitude of modern disease surveillance systems. Such systems provide the user with unparalleled access to a wide range of analytic algorithms and numerous views of the data. Nevertheless, access to the data through these tools is constrained by a minimal user interface, making it difficult to infer and correlate heterogeneous data from disparate sources.
In this talk, Dr. Livnat will present a novel visual paradigm that aims to improve situational awareness in biosurveillance using the concept of an Infectious Disease Weather Map. His approach emphasizes the discourse between the user and the surveillance system and leverages human perception and cognitive processes in order to facilitate and enhance comprehension. The proposed visualization system provides a visual common ground in which users can view and explore emerging concepts and correlations related to heterogeneous data, including symptom reports, existing syndromes, pathogens, age groups, and geographic locations.
Dr. Livnat received his Ph.D. in Computer Science from the University of Utah in 1999. He received his M.Sc. in Computer Science from the Hebrew University, Israel, in 1991 and his B.Sc. in Computer Science from Ben-Gurion University, Israel, in 1982. In between, Dr. Livnat served as a Captain in the Israel Defense Forces (IDF), where he was the software team leader of a real-time system. In the past 25 years, he has been involved in diverse research and development activities, including real-time systems; scientific visualization, with an emphasis on accelerated isosurface extraction algorithms; the Common Component Architecture (CCA); high-order finite element methods (h-p FEM); CAGD; computer-generated holograms; and wavelets. His research interests include information visualization, scientific visualization, software architecture, and computational geometry.
Algorithmic, Architectural, and Employment Concept Challenges Presented by the Hard + Soft Data Fusion Problem
Dr. James Llinas, State University of New York
The combined effects of new concepts of military operations (irregular and asymmetric warfare, counter-insurgency, etc.) and new modes of observation and communication of information (UAVs, human observers, social media feeds, open-source intelligence) have created a revolutionary challenge for the design and development of Data and Information Fusion (DIF) processes and systems. These impacts affect the complete system spectrum, including the architectural framework, the core algorithmic methods, and the concepts of employment of these new technologies. The terminology for DIF systems has now come to use the terms "hard" and "soft" data/information in these applications, where "hard" designates data and information from conventional and evolving electro-mechanical, "physics-based" sensors, and "soft" refers to linguistically couched data/information from these various new sources.
This presentation will provide an overview of this new DIF process/system development domain, along with some current methods being used by our research center (the Center for Multisource Information Fusion at the University at Buffalo) to address these challenges under a large Army-funded research program.
Dr. James Llinas is an Emeritus Full Professor in the Department of Industrial and Systems Engineering, dual-appointed in the Department of Electrical Engineering, at the State University of New York. He is an internationally recognized expert in sensor, data, and information fusion; co-authored the first integrated book on multisensor data fusion; and has lectured internationally on this topic for over 30 years. Dr. Llinas was a Technical Advisor to the Defense Department's Joint Directors of Laboratories Data Fusion Group, the only US DoD technology oversight group for data fusion.
Pattern Recognition via Linear Subspace Models and the Flag Mean
Tim Marrinan, Colorado State University
Linear subspace models have gained popularity over the past two decades as representations that span variation (in contrast to methods that normalize for variation). An example of such a model is that a linear subspace can be used to accurately represent images of an object under all possible illumination conditions. In other words, a combination of images of an object taken with different lighting conditions can be used to create an image of that object under a previously unseen lighting condition. This talk explores applications of these linear subspace models that exploit an average called the flag mean to tasks in pattern recognition such as action recognition in videos and gas detection in long-wave infrared hyperspectral images.
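A minimal sketch of the flag mean construction described above (assuming the common SVD-based formulation; this is an illustration, not the speaker's code): stack orthonormal bases of the input subspaces side by side and take the leading left singular vectors as the ordered mean directions.

```python
import numpy as np

def flag_mean(bases, k):
    """Flag mean of the subspaces spanned by the orthonormal columns of each
    matrix in `bases`; returns the first k mean directions as columns."""
    stacked = np.hstack(bases)                        # all basis vectors together
    u, s, vt = np.linalg.svd(stacked, full_matrices=False)
    return u[:, :k]                                   # ordered mean directions

# Three noisy 1-D subspaces of R^3, all clustered around the x-axis
rng = np.random.default_rng(0)
subspaces = []
for _ in range(3):
    v = np.array([1.0, 0.0, 0.0]) + 0.1 * rng.standard_normal(3)
    subspaces.append((v / np.linalg.norm(v)).reshape(3, 1))

mean_dir = flag_mean(subspaces, 1)[:, 0]
print(abs(mean_dir[0]))  # close to 1: the mean direction is near the x-axis
```

The singular values order the directions by how much of the collection they explain, which is what lets the flag mean average subspaces of unequal dimension.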
Tim Marrinan is a fifth year PhD student in mathematics at Colorado State University as part of the Pattern Analysis Lab. His research focuses on geometric data analysis with applications to computer vision and pattern recognition. He is advised by professors Michael Kirby and Chris Peterson in the math department, and collaborates frequently with professors Bruce Draper and J. Ross Beveridge from the department of computer science. Tim received his M.S. in mathematics in 2013 from Colorado State, and earned his B.A. in applied mathematics and geology from Whitman College. Tim's other research interests are in topological data analysis, optimization, machine learning, semi-definite programming, dimensionality reduction, domain adaptation, and statistics on Riemannian manifolds.
Current Challenges in Shotgun Metagenome Analysis
Dr. Folker Meyer, Argonne National Laboratory
While next-generation sequencing has enabled unique insights into microbial populations, its application is not without significant problems. Meyer will highlight the current state of the art, some open issues, and the general computational and scientific challenges faced by groups using shotgun metagenomics. Using a real-world example, he will also highlight some of the pitfalls.
Folker Meyer is a computational biologist at Argonne National Laboratory and a senior fellow at the Computation Institute at the University of Chicago. He is also associate division director of the Institute of Genomics and Systems Biology.
He trained as a computer scientist and started to work with biologists early in his career. It was that exposure to interesting biological problems that sparked his interest in building software systems to tackle them, mostly in the field of genomics or post-genomics. In the past he has been best known for his leadership role in the development of the GenDB genome annotation system; he has also played an active role in the design and implementation of several high-performance computing platforms. His current work focuses on the analysis of shotgun metagenomics data sets and on the MG-RAST community resource for metagenomics. Shotgun metagenomics is benefiting directly from the current advances in sequencing technology, leading to dramatic growth in the number of scientists using this approach and in the number and size of the data sets being produced. He also has an interest in microbial genomics and the analysis of complete microbial genomes and is a member of the RAST project. He is a founding member of the Earth Microbiome Project (EMP) and a member of the Genomics Standards Consortium (GSC).
Ensemble Clustering of Phosphorylation Dynamics Reveals Novel Interactions in the ERBB Network
Dr. Kristen Naegle, Washington University in St. Louis
Receptor tyrosine kinase networks, such as the ERBB family of receptors, rely heavily on tyrosine phosphorylation for propagation of cellular signals. Mass spectrometry techniques have led to a rapid increase in the discovery and observation of phosphorylation dynamics within the cell, which has outpaced our ability to understand the role of individual phosphorylation sites within the network. Dr. Naegle shows how ensemble clustering of dynamic phosphorylation data has been useful in identifying novel network interactions. The combination of machine learning and molecular measurements has produced insight regarding transient, protein-protein interactions, which no other traditional molecular screens would likely have captured.
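The ensemble clustering idea can be sketched as follows (a toy illustration with 1-D data and a naive k-means, not Dr. Naegle's pipeline): repeated clustering runs are summarized in a co-association matrix whose entries estimate how reliably two items cluster together.

```python
import random

def kmeans_labels(points, k, seed):
    """One run of naive k-means on 1-D data; returns a label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(20):
        labels = [min(range(k), key=lambda j: abs(p - centers[j])) for p in points]
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centers[j] = sum(members) / len(members)
    return labels

def coassociation(points, k, runs):
    """Fraction of runs in which each pair of points shares a cluster."""
    n = len(points)
    counts = [[0] * n for _ in range(n)]
    for seed in range(runs):
        labels = kmeans_labels(points, k, seed)
        for a in range(n):
            for b in range(n):
                if labels[a] == labels[b]:
                    counts[a][b] += 1
    return [[c / runs for c in row] for row in counts]

# Two well-separated "dynamic profiles", summarized here as scalars
points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
C = coassociation(points, 2, 10)
print(C[0][1], C[0][3])  # high within a group, low across groups
```

Pairs with high co-association across many randomized runs are the robust groupings, which is how ensemble clustering separates stable structure from artifacts of any single run.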
Kristen Naegle joined Washington University in St. Louis in spring 2012. She was previously a postdoctoral associate at the Koch Institute for Integrative Cancer Research and Department of Biological Engineering at the Massachusetts Institute of Technology.
Professor Naegle's research interests include computational molecular systems biology, post-translational modifications, signal transduction and proteomics. She combines computational mining and modeling techniques with experimental molecular biology approaches to understand the function of post-translational modifications in regulatory networks of the cell. The specific focus of her work is on those regulatory events that are involved in the complex development and propagation of human disease with the possibility of discovering new therapeutic interventions in diseases like cancer, diabetes and neurodegenerative disorders.
Dr. Mark Oxley, Air Force Institute of Technology
Mark Oxley discusses fusion from an abstract point of view concluding with its mathematical definition. Sensor-exploitation systems are presented in order to discuss various kinds of fusion, e.g., sensor fusion, data fusion, classifier fusion, and decision fusion. The generalization of Receiver Operator Characteristic (ROC) curves, called ROC manifolds, is presented along with results and some examples to demonstrate its usefulness. Current fusion ideas will be shared to generate discussion with the audience.
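As background for the ROC manifolds discussed above (which generalize the two-class ROC curve), an empirical ROC curve can be computed from classifier scores as follows; the scores and labels below are invented toy data.

```python
def roc_points(scores, labels):
    """Empirical ROC: (false-positive rate, true-positive rate) at every
    threshold taken from the observed scores, highest threshold first."""
    pos = sum(labels)
    neg = len(labels) - pos
    pts = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        pts.append((fp / neg, tp / pos))
    return pts

scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1,   1,   0,   1,   0,   0]
print(roc_points(scores, labels))
```

Sweeping the decision threshold traces the curve from (0, 0) toward (1, 1); a ROC manifold extends this sweep to systems with more than two labels or decisions.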
Mark Oxley is a Professor of Mathematics in the Department of Mathematics and Statistics, Graduate School of Engineering and Management, Air Force Institute of Technology (AFIT), located on Wright-Patterson Air Force Base, Ohio. Dr. Oxley earned the B.S. degree in mathematics from Cumberland College in 1978 (renamed the University of the Cumberlands in 2005), the M.S. degree in applied mathematics from Purdue University in 1980, and the Ph.D. degree in mathematics from North Carolina State University in 1987. He joined the AFIT faculty in July 1987. He has advised the research of several M.S. and Ph.D. students in artificial neural network theory, applied mathematics, and information fusion. He has received research funding from AFOSR, AFRL, ACC, DARPA, and NASIC. He has published over 70 journal articles in mathematics, applied mathematics, and engineering. His research began in nonlinear partial differential equations, and he has developed other areas of expertise in pattern recognition, signal and image processing, category theory, information fusion, and Receiver Operating Characteristic (ROC) curve and manifold analysis. He has served as an independent reviewer on four PNNL projects (beginning in 2006). He serves the fusion community as an associate editor of the Journal of Information Fusion (Elsevier).
Acquisition and Analysis of Functional Brain Signals
Dr. Dianne Patterson, University of Arizona
Functional Magnetic Resonance Imaging (fMRI) has provided significant insight into the function of the human brain over the last two decades; as such, the presentation discusses brain imaging with an emphasis on fMRI. In the last five years, computational power has improved to the point where we can move away from the traditional but limited general linear model analyses to more robust and sensitive techniques like independent component analysis, network analyses, and machine learning. These new analytical techniques promise to provide a much more sophisticated picture of human brain function.
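As a toy illustration of the network-analysis idea mentioned above (the data are invented, and real fMRI pipelines involve far more preprocessing), a functional connectivity network can be built by thresholding correlations between regional time courses:

```python
def pearson(x, y):
    """Pearson correlation between two equal-length time series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def functional_network(timeseries, threshold=0.5):
    """Adjacency matrix: regions whose time courses correlate above threshold."""
    n = len(timeseries)
    return [[1 if i != j and pearson(timeseries[i], timeseries[j]) > threshold
             else 0 for j in range(n)] for i in range(n)]

# Three toy "regions": the first two fluctuate in sync, the third does not
ts = [[0, 1, 0, 1, 0, 1],
      [0, 1, 0, 1, 0, 1],
      [1, 0, 0, 1, 1, 0]]
print(functional_network(ts))
```

Graph measures (degree, clustering, path length) computed on such adjacency matrices are the raw material for the network analyses the talk describes.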
Dr. Dianne Patterson is a Research Scientist at the University of Arizona. Dr. Patterson completed her early graduate work in the Mazatec highlands studying traditional medical and biological terminology. She earned her Ph.D. from the University of Arizona in 1999 (in ethology and evolutionary psychology) focusing on the acoustic and articulatory features of speech sounds in African Grey parrots. For the past 15 years she has been researching neuroimaging with a special interest in diffusion tensor and functional MRI, network analysis, and data visualization.
Five Habits of the Master Thinker: Techniques for instilling rigor, saving time, and overcoming mindsets
Randolph Pherson, Pherson Associates
Intelligence analysis is as much an art as a science. Many senior analysts tout their intuitive grasp of complex situations, but others say they are just lucky. A better approach is to employ structured analytic techniques that help protect the analyst against known cognitive pitfalls and reduce the frequency and severity of error. A good analyst will instinctively:
1. Know when to challenge key assumptions—usually far more often than you think
2. Generate alternative explanations or hypotheses for all events
3. Look for inconsistent data that challenges their mindsets
4. Focus on the key drivers that best explain what has happened or will soon occur
5. Consider the overarching context and the client’s true needs
This presentation provides an overview of the techniques associated with each of these habits, drawing compelling examples from the worlds of intelligence, law enforcement, banking, and risk analysis.
Randolph H. Pherson, President, Pherson Associates, LLC, helps businesses and governments develop robust analytic capabilities within their organizations. He has authored seven books on structured analytic techniques, critical thinking and writing skills, strategic foresight analysis, and communicating analysis in the digital age.
In 2000, he completed a 28-year career in the Intelligence Community, last serving as National Intelligence Officer (NIO) for Latin America. Previously at the CIA, Mr. Pherson managed the production of intelligence analysis on topics ranging from global instability to Latin America, served on the Inspector General’s staff, and developed and implemented a strategic planning process for the CIA. Mr. Pherson received his B.A. from Dartmouth College and his M.A. in International Relations from Yale University.
Signal Processing and Automatic Classification Research at APL-UW
Dr. James Pitton, University of Washington, APL
This talk presents an overview of signal processing and automatic classification research at the Applied Physics Laboratory, University of Washington (APL-UW). Researchers at APL-UW have worked on a number of projects involving signal classification for a variety of sources, from gamma-ray spectra to underwater acoustics.
Dr. James W. Pitton is a Principal Engineer at APL-UW, and holds an affiliate faculty position with the Department of Electrical Engineering at the University of Washington. Dr. Pitton received his Ph.D. in Electrical Engineering from the University of Washington in Seattle in 1994. He has also held research positions at AT&T Bell Laboratories in Murray Hill, NJ, and the Statistical Sciences Division of MathSoft in Seattle, WA. His ongoing research interests are focused on algorithms for information processing and autonomous systems, with an emphasis on sonar, automatic classification, nonstationary signal processing, and array processing.
Informatics for Discovering the Materials Genome
Dr. Krishna Rajan, Iowa State University
The concept of a "materials genome" provides an enticing paradigm for materials scientists. However, unlike the biological definition of a "gene," the physical interpretation of a "materials gene" has no clear definition. This presentation discusses how that definition can be formalized and quantified through informatics. Particular focus will be given to the use of the tools of statistical learning and data mining and to the integration of these mathematical methods into experimental and computational materials science. It is shown how informatics can be used to discover and amplify structure-property correlations that would otherwise not be detected easily or rapidly. Hence it is truly a "knowledge discovery" tool for materials scientists that can serve to catalyze new scientific discoveries. Examples of the use of informatics as a formalism for "materials genomics" will be highlighted through a discussion of a broad spectrum of materials science applications, ranging from designing drug delivery materials to new multifunctional ceramics.
Professor Krishna Rajan is the Wilkinson Professor of Interdisciplinary Engineering at Iowa State University. He is on the faculty of the Department of Materials Science and Engineering and holds an appointment in the Bioinformatics and Computational Biology Program. Professor Rajan is also the Director of the Institute for Combinatorial Discovery at Iowa State University, an interdepartmental facility engaged in supporting various aspects of combinatorial materials science and informatics research. Krishna Rajan specializes in advancing the science of informatics and data driven discovery to the field of materials science and engineering. He is one of the first in the materials science community to systematically advance the tools of statistical learning and data mining in the field of materials modeling and characterization.
Odor Processing in Biological and Artificial Olfactory Systems
Dr. Barani Raman, Washington University
Barani Raman explains how odor signals are processed in the relatively simple olfactory system of the locust. Using electrophysiological recordings, odor representations are characterized in the first three olfactory processing centers: antenna, antennal lobe and mushroom body. To clarify the contributions of these olfactory circuits to the odor encoding process, well-constrained computational models of these circuits are used to demonstrate the transformations that occur as information is transmitted from one circuit to the next.
Barani Raman is an Assistant Professor in the Department of Biomedical Engineering at Washington University. He received his Bachelor of Engineering in Computer Science with distinction from the University of Madras and the M.S. and Ph.D. degrees in Computer Science from Texas A&M University. He was a joint postdoctoral fellow at the National Institutes of Health and the National Institute of Standards and Technology. He is the recipient of the 2011 Wolfgang Göpel Award from the International Society for Olfaction and Chemical Sensing. His research interests include sensory and systems neuroscience, sensor-based machine olfaction, machine learning, biomedical intelligent systems, and dynamical systems.
A Unified Framework for Multi-INT Signal Processing
Dr. Michael Robinson, American University
Analysts must understand the behaviors and intent of targets from persistent but sporadic observations. This is a signal processing problem in which data sources could be traditional sensors or other analytics. The available data sources have different capabilities and back-end data processing chains, making their resulting data streams inconsistent. This frustrates their use by automated inference algorithms because a consistent mathematical theory of fusion has not been developed. It is therefore imperative that we develop a unified mathematical theory of data fusion that permits assembly of local sensor coverage regions into a global picture. Such a theory must also inspire effective, practical algorithms that exploit whatever sensor resources are available, and suggest ways to redeploy sensors for more effective collection.
The mathematical theory of sheaves appears to meet these needs, and has recently become useful in signal processing, providing new insight into such traditional topics as sampling theory, filter design, and detection processing. However, sheaves based on partially ordered sets are poised to make inroads into more complex problems involving target identification, track disambiguation, sensor handoff, and false alarm suppression. Since sheaves are not commonplace in signal processing, the talk will explain why they are uniquely valuable and effective. The discussion will center around critical illustrative examples that are particularly challenging for traditional methods, but are treated effectively by sheaves.
Michael Robinson is an applied mathematician working as an assistant professor at American University. He is interested in signal processing, dynamics, and applications of topology. He earned a Bachelor's degree in Electrical Engineering (2002) and a Master's degree in Mathematics (2003) from Rensselaer Polytechnic Institute. Since then, he has worked on projects involving radio propagation and network planning, bistatic radar processing, and advanced radar simulation. In 2008, he earned a Ph.D. in Applied Mathematics at Cornell University, where he developed topological methods for studying the dynamics of parabolic equations. His more recent efforts follow an emerging trend, started during a postdoc at the University of Pennsylvania, of topologically motivated signal processing techniques.
Real-Time Anomaly Detection for Wide Area Surveillance
Dr. Katherine Simonson, Sandia National Laboratories
A new method is introduced for the real-time, causal, detection of small transient changes in large scenes under surveillance. The technique is designed for fast-framing, staring, remote sensors that are subject to platform jitter, pixel defects, variable focus, pointing drift, and other real-world challenges. The approach uses flexible statistical models for the scene background and its variability, which are continually updated to track gradual drift in the sensor’s performance and the scene under observation. Application to several different video sequences is illustrated. This is joint research with Tian Ma.
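A drastically simplified sketch of the background-model idea (a generic exponentially weighted update, not Sandia's algorithm; the pixel values and parameters below are invented): track a per-pixel mean and variance, and flag pixels that deviate by more than k standard deviations.

```python
def update_background(mean, var, frame, alpha=0.05):
    """Exponentially weighted per-pixel background mean and variance."""
    new_mean = [(1 - alpha) * m + alpha * x for m, x in zip(mean, frame)]
    new_var = [(1 - alpha) * v + alpha * (x - m) ** 2
               for v, m, x in zip(var, mean, frame)]
    return new_mean, new_var

def detect(mean, var, frame, k=4.0):
    """Flag pixels deviating more than k standard deviations from background."""
    return [abs(x - m) > k * max(v, 1e-6) ** 0.5
            for x, m, v in zip(frame, mean, var)]

# Simulated 4-pixel sensor: steady background, then a transient in pixel 2
mean, var = [10.0] * 4, [1.0] * 4
for _ in range(50):                       # quiet frames adapt the model
    mean, var = update_background(mean, var, [10.0, 10.0, 10.0, 10.0])
hits = detect(mean, var, [10.0, 10.0, 25.0, 10.0])
print(hits)  # only the transient pixel is flagged
```

Because the mean and variance are continually updated, slow drifts in pointing or focus are absorbed into the background model rather than reported as detections, which is the causal, real-time behavior the talk emphasizes.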
Dr. Katherine (Hansen) Simonson graduated from Middlebury College in 1984 with a B.A. in Mathematics. She attended graduate school at Princeton University (Ph.D., Statistics, 1989), where her research focused on the development of statistical methods for analyzing spatial data arising in the fields of geophysics and structural geology. She has been at Sandia National Laboratories since 1989, where her title is Distinguished Member of the Technical Staff. At Sandia, Dr. Simonson has worked in a wide range of technical areas, including the development of algorithms for pattern classification in airborne radar data, statistical methods for combining multi-source information, detection of dim transient signals in frame-rate video data that is subject to high jitter, and automated registration of challenging imagery.
Adventures in Personal Genomics and Whole Omics Profiling
Dr. Michael Snyder, Stanford University
Personalized medicine is expected to benefit from the combination of genomic information with the global monitoring of molecular components and physiological states. To ascertain whether this can be achieved, we determined the whole genome sequence of an individual at high accuracy and performed an integrated Personal Omics Profiling (iPOP) analysis, combining genomic, transcriptomic, proteomic, metabolomic, and autoantibodyomic information, over a 21-month period that included healthy and two virally infected states. Our iPOP analysis of blood components revealed extensive, dynamic and broad changes in diverse molecular components and biological pathways across healthy and disease conditions. Importantly, genomic information was also used to estimate medical risks, including Type 2 Diabetes, whose onset was observed during the course of our study. Our study demonstrates that longitudinal personal omics profiling can relate genomic information to global functional omics activity for physiological and medical interpretation of healthy and disease states.
Dr. Snyder has run an independent lab for twenty-six years and trained approximately 90 postdoctoral fellows and 52 graduate students. Nearly all (>95%) of these individuals have gone on to successful research careers in academia or industry. The remainder have used their experience successfully in related fields (law, medicine, or the pharmaceutical industry). He has also invented a number of technologies that have been patented and/or formed the basis of biotechnology companies, including protein microarrays (Protometrix, Inc., now part of Life Technologies), high-throughput antibody production (Affomix, Inc., now part of Illumina), and genome interpretation tools (Personalis, Inc.). Dr. Snyder also consults for numerous biotechnology companies.
Statistical Challenges in the Practice of Firearm/Toolmark Analysis
Dr. Cliff Spiegelman, Texas A&M Transportation Institute
Three recent NAS reports have found that forensic techniques are oversold to the justice system. Oversold techniques include compositional bullet lead analysis (CBLA), fingerprints, firearm toolmarks, and almost all forensic technologies except nuclear DNA. This talk will focus on firearm/toolmarks. After demonstrating the problems, some paths forward are presented. While firearm/toolmark evidence is oversold in courts, the underlying concept is far from junk, and the speaker believes that it can, in time and with a lot of work and modification, be made into evidence acceptable to the true scientific community.
Dr. Cliff Spiegelman is a senior research scientist at the Texas A&M Transportation Institute, the state of Texas' transportation research agency. He joined the Texas A&M Department of Statistics in 1987 as an associate professor and became a distinguished professor in 2009. He is one of the founders within the statistical sciences of the field of chemometrics, the science of using data to extract information from chemical systems. Professor Spiegelman applies this expertise to the forensic sciences, specifically making a difference in the courtroom by helping to provide justice for those wrongly convicted of crimes on the basis of flawed forensic science. Professor Spiegelman's interest in statistical forensics was sparked in 2002 when, because of his expertise in statistics in chemistry, he was appointed to serve on a National Research Council (NRC) panel to study bullet lead evidence. Spiegelman became an ardent opponent of a method called Comparative Bullet-Lead Analysis (CBLA), which the FBI discredited in 2007, partly as a result of his work. Among his other accomplishments, Spiegelman was a co-recipient of the American Statistical Association's 2008 Statistics in Chemistry Award for publishing findings that the forensic evidence used to rule out the presence of a second shooter in President Kennedy's slaying was fundamentally flawed. He currently testifies in several cases a year where he believes the forensic science is faulty. He often works with the Innocence Project, the national non-profit legal clinic dedicated to exonerating wrongfully convicted people through DNA testing and other post-verdict methods.
Soil Organisms and Response to Global Change
Dr. Diana Wall, Colorado State University
Soil biodiversity is key to the maintenance of soil resources and plant production, but is often overlooked in global policies addressing desertification, food security, climate change and loss of biodiversity. This is despite the diversity of roles played by the soil biota that benefit society including: erosion control, carbon and nutrient cycling, clean water and air, fertile soils, reduced greenhouse gases and control of pests and pathogens. Research from Antarctic extreme ecosystems provides evidence for the factors that determine diversity, abundance, survival, geographic range and function of soil biodiversity under climate change. Our challenge is to incorporate these and other findings to determine how soil biota will respond to global changes at larger scales and the implications for ecosystem services.
Dr. Diana H. Wall is a University Distinguished Professor. She is also a Professor of Biology and Senior Research Scientist in the Natural Resource Ecology Laboratory at Colorado State University. As a soil ecologist and environmental scientist, she is actively engaged in research on sustaining soils and has spent 24 seasons in the Antarctic McMurdo Dry Valleys examining how global changes impact soil biodiversity, ecosystem processes, and ecosystem services. Dr. Wall chaired the DIVERSITAS International Biodiversity Observation Year (2001-2002) and the Global Litter Invertebrate Decomposition Experiment, and co-chaired the Millennium Development Goals Committee of the Millennium Ecosystem Assessment. Dr. Wall is a member of the Working Group of the President's Council of Advisors on Science and Technology (PCAST) and of the UNESCO International Hydrological Program US National Committee. She is a Board Member of the World Resources Institute and Island Press, and she has served as President of the Ecological Society of America, the American Institute of Biological Sciences, and other scientific societies. Dr. Wall holds an Honorary Doctorate from Utrecht University, The Netherlands, and is a Fellow of the American Association for the Advancement of Science. She was awarded the Tyler Prize in 2013 and was recently inducted into the Colorado Women's Hall of Fame. She received a B.A. and Ph.D. at the University of Kentucky, Lexington.
Topological Data Analysis and Visualization: From Vector Fields to High-Dimensional Data
Dr. Bei Wang, University of Utah
Topological data analysis and visualization is a new field of study that focuses on the following questions: (a) Given point cloud samples, can we infer the topological or geometric structure of the underlying data? (b) How can we present or communicate the inferred structure? I will discuss some of my recent research activities that draw inspiration from topology, geometry, and machine learning in studying protein docking, vector fields, and high-dimensional data. In particular, I will focus on (a) structural extraction and visualization with persistent homology, (b) geometric inference using kernel density estimates, and (c) topology-inspired adaptive sampling.
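As background for the geometric-inference ingredient mentioned above, a one-dimensional Gaussian kernel density estimate can be sketched as follows (a toy version with invented samples, not Dr. Wang's implementation):

```python
import math

def kde(samples, x, bandwidth=0.5):
    """Gaussian kernel density estimate at point x: a smooth density built
    by placing a Gaussian bump on each sample."""
    total = sum(math.exp(-((x - s) / bandwidth) ** 2 / 2) for s in samples)
    return total / (len(samples) * bandwidth * math.sqrt(2 * math.pi))

# Density is high near the sample cloud and low far from it
samples = [0.0, 0.2, -0.1, 0.1, 5.0]
print(kde(samples, 0.0) > kde(samples, 2.5))
```

Superlevel sets of such an estimate give a smoothed proxy for where the data lives, and topological summaries of those sets are one route to the structural inference the talk describes.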
Bei Wang is a research computer scientist at the Scientific Computing and Imaging (SCI) Institute of the University of Utah. She is part of the Center for Extreme Data Management Analysis and Visualization (CEDMAV). She did her Ph.D. in Computer Science at Duke University in 2010. She also obtained a certificate in Computational Biology and Bioinformatics at Duke University in 2010. Dr. Wang's research is focused on creating novel techniques at the intersection of topological data analysis and visualization. Her research interests span both theoretical computer science (in particular, computational topology) and data-driven applications. Her research interests include: theoretical and algorithmic aspects in computational topology and computational geometry; foundations, techniques and applications for scientific data analysis and visualization; computational biology and bioinformatics; machine learning; and data mining.
Sampling Strategies and Reconstruction Guarantees for Compressive Sensing Imaging
Dr. Rachel Ward, University of Texas at Austin
We survey several recent developments in compressive sensing theory in the areas of imaging and total variation minimization. We will start by discussing theoretical guarantees in the 'ideal' setting of low noise and incoherent measurements, and follow by discussing strategies for applying compressive sensing methodology when these conditions are not met.
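A standard workhorse for the sparse-recovery problems underlying compressive sensing is iterative soft-thresholding (ISTA) for the l1-regularized least-squares objective. The sketch below is illustrative only (a tiny invented problem in plain Python lists), not a method specific to this talk.

```python
def ista(A, y, lam=0.01, step=0.1, iters=2000):
    """Iterative soft-thresholding for min ||Ax - y||^2 + lam*||x||_1,
    a basic sparse-recovery solver (illustrative, dense Python lists)."""
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        # residual r = A x - y
        r = [sum(A[i][j] * x[j] for j in range(n)) - y[i] for i in range(m)]
        # gradient g = 2 A^T r
        g = [2 * sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        # gradient step followed by soft-thresholding (the l1 prox)
        for j in range(n):
            v = x[j] - step * g[j]
            t = step * lam
            x[j] = max(v - t, 0.0) if v > 0 else min(v + t, 0.0)
    return x

# Recover a 1-sparse 3-vector from only 2 linear measurements.
A = [[1.0, 0.5, 0.2],
     [0.3, 1.0, 0.4]]
x_true = [0.0, 1.0, 0.0]
y = [sum(A[i][j] * x_true[j] for j in range(3)) for i in range(2)]
x_hat = ista(A, y)
```

Even though the system is underdetermined, the l1 penalty drives the estimate toward the sparse solution, the central phenomenon the talk's theory quantifies.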
Professor Ward is an assistant professor in the Department of Mathematics and ICES at the University of Texas at Austin. Prior to that, she was an NSF postdoctoral fellow at the Courant Institute. She received her Ph.D. in Applied and Computational Mathematics from Princeton University.
New HMM-based Methods for Ultra-large Alignment and Phylogeny Estimation
Dr. Tandy Warnow, The University of Illinois at Urbana-Champaign
Multiple sequence alignment of datasets containing many thousands of sequences is a challenging problem with applications in phylogeny estimation, protein structure and function prediction, taxon identification of metagenomic data, and more. However, few methods can analyze large datasets, and none have been shown to have good accuracy on datasets with more than about 10,000 sequences, especially when the sequences have evolved under high rates of evolution.
In this talk, Tandy will present a new method for obtaining highly accurate estimations of large-scale multiple sequence alignments and phylogenies. The basic idea is to use a family of Hidden Markov Models (HMMs) to represent a "seed alignment", and then align all the remaining sequences to that seed alignment. The UPP methodology is fast and scalable and returns accurate alignments. This technique can also be used for other machine learning problems, including taxon identification of metagenomic data.
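The family-of-models idea can be caricatured with position-weight profiles built from several seed alignments, assigning each query to the best-scoring profile. This is a toy stand-in, not the actual UPP implementation (real profile HMMs also model insertions and deletions); all sequences and clade names are invented.

```python
import math

def profile_from_alignment(seqs):
    """Column-wise letter frequencies of a seed alignment, with
    add-one smoothing (a simplified stand-in for a profile HMM)."""
    alphabet = "ACGT"
    prof = []
    for c in range(len(seqs[0])):
        counts = {a: 1.0 for a in alphabet}  # pseudocounts
        for s in seqs:
            if s[c] in counts:
                counts[s[c]] += 1.0
        total = sum(counts.values())
        prof.append({a: counts[a] / total for a in alphabet})
    return prof

def log_score(profile, query):
    """Log-probability of the query under the profile."""
    return sum(math.log(col[ch]) for col, ch in zip(profile, query))

# A "family" of two profiles, each built from a different seed alignment.
family = {
    "clade1": profile_from_alignment(["ACGT", "ACGA", "ACGT"]),
    "clade2": profile_from_alignment(["TTGA", "TTGG", "TTGA"]),
}
query = "ACGT"
best = max(family, key=lambda k: log_score(family[k], query))
```

The query is routed to the member of the family that explains it best, which is the intuition behind aligning remaining sequences against a family of HMMs rather than a single model.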
Tandy Warnow’s research combines mathematics, computer science, and statistics to develop improved models and algorithms for reconstructing complex and large-scale evolutionary histories in both biology and historical linguistics.
Tandy received her PhD in Mathematics at UC Berkeley under the direction of Gene Lawler, and did postdoctoral training with Simon Tavare and Michael Waterman at USC. She received the National Science Foundation Young Investigator Award in 1994, the David and Lucile Packard Foundation Award in Science and Engineering in 1996, a Radcliffe Institute Fellowship in 2006, and a Guggenheim Foundation Fellowship for 2011. Her current research focuses on phylogeny and alignment estimation for very large datasets (10,000 to 500,000 sequences), estimating species trees from collections of gene trees, and metagenomics.
A Hierarchical Nonparametric Bayesian Model That Integrates Multiple Sources of Lifetime Information to Model System Reliability
Dr. Richard Warr, Central Washington University
The need for new large-scale reliability models is becoming apparent as the amount of available data expands at a dramatic rate. It is not uncommon for complex systems to have thousands of components, each of which may be accompanied by extensive test data. This can amount to a large-scale estimation project that challenges the computational feasibility of traditional reliability models. The solution presented in this work is a hierarchical nonparametric Bayesian framework, using beta-Stacy processes, in which time-to-event distributions are estimated from sample data (which may be randomly right censored) and, where available, expert opinion.
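One concrete connection worth noting: with a vanishing (noninformative) prior, the beta-Stacy posterior mean for a right-censored sample reduces to the classical Kaplan-Meier estimator. The sketch below computes only that limiting case; the full hierarchical beta-Stacy model in the talk is far richer, and the failure-time data here are invented.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate for right-censored data
    (the noninformative limit of a beta-Stacy posterior mean).
    events: 1 = observed failure, 0 = right-censored."""
    data = sorted(zip(times, events))
    n = len(data)
    surv, s, at_risk, i = [], 1.0, n, 0
    while i < n:
        t = data[i][0]
        deaths = sum(1 for tt, e in data if tt == t and e == 1)
        if deaths > 0:
            s *= (at_risk - deaths) / at_risk  # multiply survival factor
            surv.append((t, s))
        removed = sum(1 for tt, _ in data if tt == t)
        at_risk -= removed
        i += removed
    return surv

times  = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]   # the times 3 and 8 include censored units
km = kaplan_meier(times, events)
```

Censored units leave the risk set without forcing a drop in the survival curve, which is exactly the information a purely parametric fit on failures alone would discard.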
Dr. Richard Warr is a professor of Aerospace Studies at Central Washington University and a Lieutenant Colonel in the United States Air Force. He received his Ph.D. in statistics from the University of New Mexico in 2010. After his schooling, he served as an Assistant Professor of Statistics at the Air Force Institute of Technology (AFIT). Dr. Warr's research interests include applied stochastic processes, Bayesian inference, and reliability. His most recent publication, "Numerical Approximation of Probability Mass Functions via the Inverse Discrete Fourier Transform," was published in Methodology and Computing in Applied Probability. He is currently collaborating on several projects with researchers from AFIT and Los Alamos National Laboratory.
PDE Transform — A Unified Paradigm for Image Analysis and Multiscale Modeling
Dr. Guowei Wei, Michigan State University
The past two decades have witnessed increasing interest in geometric partial differential equations (PDEs). However, most attention has been paid to the use of second-order geometric PDEs as low-pass filters in signal, image, and data analysis. This talk focuses on some non-conventional aspects of geometric PDEs. First, I describe the construction of arbitrarily high-order geometric PDEs and their utility for image and surface analysis. I then illustrate the design of nonlinear high-pass filters from a coupled PDE system. An appropriate combination of geometric PDEs gives rise to the PDE transform. Like the wavelet transform, the PDE transform is able to decompose signal, image, and data into functional modes with controllable time-frequency localizations, and the inverse PDE transform leads to a perfect reconstruction. Finally, I analyze the geometric features of the PDE transform, which offer a powerful means for the multiscale modeling of biomolecular systems. The resulting differential-geometry-based multiscale models encompass discrete atomistic descriptions of macromolecules and continuum macroscopic descriptions of solvent. Applications are discussed to biomedical images, molecular solvation, virus surface formation, protein-protein interactions, multiscale molecular dynamics, and ion channel transport.
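For intuition about why second-order geometric PDEs act as low-pass filters, here is a minimal sketch: explicit Euler time-stepping of the 1-D heat equation u_t = u_xx applied to a signal with a high-frequency spike. This is the textbook prototype only, not the high-order PDE transform itself, and the signal is invented.

```python
def heat_filter(u, dt=0.2, steps=50):
    """Explicit Euler for the 1-D heat equation u_t = u_xx,
    the prototypical second-order low-pass PDE filter.
    dt <= 0.5 keeps the scheme stable on a unit grid."""
    u = [float(v) for v in u]
    for _ in range(steps):
        lap = [0.0] * len(u)
        for i in range(1, len(u) - 1):
            lap[i] = u[i - 1] - 2 * u[i] + u[i + 1]  # discrete Laplacian
        u = [ui + dt * li for ui, li in zip(u, lap)]
    return u

# A step signal with a sharp spike: diffusion suppresses the spike
# (high frequencies) while the broad step survives much longer.
signal = [0, 0, 0, 5, 0, 0, 1, 1, 1, 1]
smoothed = heat_filter(signal)
```

Higher-order PDEs sharpen this frequency selectivity, which is what allows the PDE transform to isolate functional modes rather than merely blur.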
Dr. Wei has accumulated research experience in many disciplines, including mathematics, physics, chemistry, biology, computer science, and engineering. His current research interests include mathematical molecular biosciences, mathematical biophysics, nano-bio device analysis, molecular imaging and image analysis, quantum kinetic theory, interface and wavelet local spectral methods for partial differential equations, and nonlinear dynamics. Apart from research and education, Dr. Wei has served the academic community extensively. He is an honorary editor, editor, associate editor, or editorial board member for a number of international journals, has served as a panelist or reviewer for many funding agencies in various countries, and has organized many international conferences and workshops. He has also been invited to present his research at numerous conferences, workshops, colloquia, and seminars around the world.
Metagenomics for Pathogen Detection and Discovery: Challenges, Solutions, and Successes
Dr. Ryan Weil, SRA
Culture-based methods of pathogen detection and discovery have long been the standard for microbiologists and public health practitioners. These methods are not without their drawbacks: they are low-throughput, costly, and do not guarantee successful identification. The metagenomics approach made possible by the advent of next-generation sequencers attempts to use brute force to detect and profile the pathogen(s) directly from a complex sample. While the metagenomics approach circumvents the need for culture, the identification of the pathogen(s) in a sample is greatly complicated by the presence of contaminating or commensal sequences. Overcoming these and other issues requires significant computational and scientific resources. A flexible framework to automate the end-to-end analysis of metagenomics data was developed, leveraging open-source and COTS products where possible. Using this pipeline, significant success has been achieved in supporting the analysis and interpretation of both simulated and real-world samples.
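The read-classification step of such a pipeline can be caricatured as shared k-mer counting against reference sequences: a read is attributed to the reference with which it shares the most k-mers. This is a toy stand-in for the real alignment and profiling tools in the pipeline; the sequences and names below are invented.

```python
def kmers(seq, k):
    """All length-k substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def classify_read(read, references, k=4):
    """Assign a read to the reference sharing the most k-mers,
    or None when nothing matches (a toy metagenomic classifier)."""
    scores = {name: len(kmers(read, k) & kmers(ref, k))
              for name, ref in references.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

refs = {
    "pathogen_X":  "ATGGCGTACGTTAGC",
    "commensal_Y": "TTTTAACCGGAATTC",
}
read = "GCGTACGTTA"   # a short fragment drawn from pathogen_X
hit = classify_read(read, refs)
```

Real pipelines face exactly the failure mode the abstract names: reads shared between commensal and pathogen references score ambiguously, which is why contaminant filtering and richer statistics are needed.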
Dr. Weil received his BS in Microbiology from Texas A&M in College Station and his Ph.D. in Molecular Biophysics from the University of Texas Southwestern Medical Center in Dallas. His notable roles include working with Roche/454 supporting bioinformatics for three generations of their genome sequencer platform, managing the solutions for the bioinformatics and cheminformatics platforms at Strand Life Sciences, and serving as bioinformatics and platforms manager at Emory under Dr. Tim Read. Dr. Weil is currently a contractor with the Centers for Disease Control where he is the program manager for the Core Bioinformatics effort in the Office of Infectious Disease and the Metagenomic Rapid Pathogen Identification project.
Data Cartography: Using Maps to Navigate Knowledge Networks
Dr. Jevin West, University of Washington
As De Solla Price noted in 1965, the scholarly literature forms a vast network in which the nodes are the millions of papers published in scholarly journals and the links are the hundreds of millions of citations connecting these papers. These kinds of knowledge networks are not unique to the scholarly literature. One finds them in the patent literature, the World Wide Web, legal documents, social media, and Wikipedia. New approaches to clustering and visualizing these kinds of networks make it possible to explore these systems much as we explore new cities geographically with Google Maps. In this presentation, Dr. West will talk about methods for mapping knowledge networks and provide examples of how these maps can be used to identify innovative and influential ideas, people, and institutions.
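Maps of this kind typically rest on eigenvector centrality; Eigenfactor, for instance, uses a PageRank-style score over the citation network. The following is a minimal power-iteration sketch on an invented three-paper citation graph, not the Eigenfactor implementation itself.

```python
def pagerank(links, damping=0.85, iters=100):
    """Power iteration for PageRank-style importance scores
    (the flavor of metric behind Eigenfactor-like literature maps).
    links: dict mapping each node to the nodes it cites."""
    nodes = sorted(set(links) | {v for vs in links.values() for v in vs})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        new = {u: (1 - damping) / n for u in nodes}
        for u in nodes:
            out = links.get(u, [])
            if out:
                share = damping * rank[u] / len(out)
                for v in out:
                    new[v] += share
            else:  # dangling node: redistribute evenly
                for v in nodes:
                    new[v] += damping * rank[u] / n
        rank = new
    return rank

# Tiny citation graph: papers A and B both cite C.
citations = {"A": ["C"], "B": ["C"], "C": []}
scores = pagerank(citations)
```

The cited paper C accumulates the most importance, the basic signal such maps use to surface influential work.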
Dr. Jevin West is an Assistant Professor at the University of Washington Information School and a Data Science Fellow at the eScience Institute. His research lies at the cross section of network science, knowledge organization, and information visualization. He co-runs the DataLab at UW and is the co-founder of Eigenfactor.org — a free website that ranks and maps the scholarly literature in order to better navigate and understand scientific knowledge. Prior to joining the faculty at UW, he was a post-doc in the Department of Physics at Umea University in Sweden and received his PhD in Biology from the University of Washington.
Statistics, Learning, and Optimization for Data Analysis and Visualization
Dr. Ross Whitaker, University of Utah, SCI Institute
A variety of technologies developed in diverse areas such as medical imaging, industrial inspection, and oil and gas have a common set of underlying goals and challenges. Many of these problems lend themselves to statistical methods that entail estimation, learning, or regression. This talk motivates these ideas from some very traditional problems in image processing and then develops the concepts of nonparametric modeling with applications to data analysis and visualization more generally. Applications will be presented for images from a variety of different sources as well as other high-dimensional data sets from simulation, demographics, and industrial processes.
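One of the simplest nonparametric estimators of the kind alluded to above is Nadaraya-Watson kernel regression, which estimates E[y | x] as a locally weighted average. The sketch below is illustrative only (Gaussian kernel, invented data), not a method from the talk.

```python
import math

def nw_regress(xs, ys, x, h=1.0):
    """Nadaraya-Watson kernel regression: a basic nonparametric
    estimate of E[y | x] using a Gaussian kernel with bandwidth h."""
    w = [math.exp(-((x - xi) ** 2) / (2 * h * h)) for xi in xs]
    return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)

# Noisy samples of y = x^2; no parametric form is assumed.
xs = [0, 1, 2, 3, 4]
ys = [0.1, 0.9, 4.2, 8.8, 16.1]
yhat = nw_regress(xs, ys, 2.0, h=0.5)
```

The bandwidth h plays the role the talk's statistical framing makes precise: small h trusts local data (low bias, high variance), large h smooths aggressively (the reverse).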
Ross Whitaker graduated summa cum laude with a B.S. degree in Electrical Engineering and Computer Science from Princeton University in 1986. From 1986 to 1988 he worked for the Boston Consulting Group, entering the University of North Carolina at Chapel Hill in 1989. At UNC he received the Alumni Scholarship Award and completed his Ph.D. in Computer Science in 1994. From 1994 to 1996 he worked at the European Computer-Industry Research Centre in Munich, Germany, as a research scientist in the User Interaction and Visualization Group. From 1996 to 2000 he was an Assistant Professor in the Department of Electrical Engineering at the University of Tennessee, where he received an NSF CAREER Award. Since 2000 he has been at the University of Utah, where he is a Professor in the School of Computing and a faculty member of the Scientific Computing and Imaging Institute. He teaches discrete math, scientific visualization, and image processing. He has led a graduate-level research group in image analysis, geometry processing, and scientific computing, with a variety of projects supported by both federal agencies and industrial contracts.
Machine Learning for Improving the Quality of Citizen Science Data
Dr. Weng-Keen Wong, Oregon State University
Crowdsourcing combines the efforts of a large population of online users to perform tasks or services. This approach has been applied to scientific research, resulting in a paradigm known as citizen science, in which volunteers from the general public participate in scientific studies. This participation is often in the form of data collection, in which citizen scientists act as a large global network of human sensors. Although these "sensors" can collect large quantities of data, data quality is often a concern due to variability in the skills of volunteers. In this talk, I will describe how machine learning can be used to improve the quality of data submitted by a vast network of human sensors with different levels of reliability. In particular, I will describe how machine learning techniques can identify observer variability in detecting bird species and how these differences can be leveraged to improve models learned from the data collected.
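A classic way to model observer reliability, in the spirit of (but far simpler than) Dawid-Skene-style EM, is to alternate between a weighted consensus label and per-observer agreement weights. The sketch below uses invented bird-sighting reports and is only an illustration of the general idea, not the methods from the talk.

```python
def weighted_consensus(labels, iters=10):
    """Iteratively estimate item labels and observer reliabilities
    (a simplified EM-style scheme in the spirit of Dawid & Skene).
    labels[observer][item] -> reported label."""
    observers = list(labels)
    items = sorted({i for obs in labels.values() for i in obs})
    weight = {o: 1.0 for o in observers}
    consensus = {}
    for _ in range(iters):
        # E-step-like: weighted vote for each item's label
        for it in items:
            votes = {}
            for o in observers:
                if it in labels[o]:
                    lab = labels[o][it]
                    votes[lab] = votes.get(lab, 0.0) + weight[o]
            consensus[it] = max(votes, key=votes.get)
        # M-step-like: reliability = smoothed agreement with consensus
        for o in observers:
            seen = [it for it in items if it in labels[o]]
            agree = sum(labels[o][it] == consensus[it] for it in seen)
            weight[o] = (agree + 1.0) / (len(seen) + 2.0)
    return consensus, weight

# The "novice" observer disagrees with two reliable observers on item 2.
reports = {
    "expert1": {1: "robin", 2: "crow", 3: "jay"},
    "expert2": {1: "robin", 2: "crow", 3: "jay"},
    "novice":  {1: "robin", 2: "dove", 3: "jay"},
}
labels, weights = weighted_consensus(reports)
```

The scheme both corrects the disputed label and quantifies observer skill, which is the leverage the abstract describes: variability in the human "sensors" becomes a modeled quantity rather than unexplained noise.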
Weng-Keen Wong is an Associate Professor of Computer Science at Oregon State University. He received his Ph.D. (2004) and M.S. (2001) in Computer Science at Carnegie Mellon University and his B.Sc. (1997) from the University of British Columbia. After completing his Ph.D., he was a Postdoctoral Associate at the Center for Biomedical Informatics at the University of Pittsburgh. In 2005, he joined Oregon State University as an Assistant Professor. His current research areas are in data mining and machine learning, with specific interests in anomaly detection, mining crowdsourced data, and human-in-the-loop learning.
Signature Discovery for Personalized Medicine
Dr. Ka Yee Yeung, University of Washington
The development of genetic predictors of clinical outcomes contributes to risk assessment in personalized medicine. Selecting a small number of signature genes for accurate classification of samples using gene expression data is essential for the development of diagnostic tests. However, many genes are highly correlated in gene expression data, and hence many possible sets of genes are potential classifiers. We present multivariate variable selection methods built upon Bayesian Model Averaging (BMA) that account for model uncertainty. We aim to select robust signature genes that are both predictive and biologically relevant to the mechanism underlying the disease of interest.
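To make the model-averaging idea concrete, here is a minimal sketch: averaging over single-predictor linear models using BIC-approximated posterior weights. Real BMA for signature genes explores far larger model spaces and richer likelihoods; the data and gene names below are invented.

```python
import math

def bma_model_probs(y, X, names):
    """Approximate posterior probabilities over single-predictor
    linear models via BIC weights (an illustrative BMA sketch)."""
    n = len(y)
    ybar = sum(y) / n

    def rss_single(x):
        # Ordinary least squares for y = a + b*x, returning the RSS.
        xbar = sum(x) / n
        sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
        sxx = sum((xi - xbar) ** 2 for xi in x)
        b = sxy / sxx
        a = ybar - b * xbar
        return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

    # BIC = n*log(RSS/n) + k*log(n); model weight is prop. to exp(-BIC/2).
    bics = {g: n * math.log(rss_single(x) / n + 1e-12) + 2 * math.log(n)
            for g, x in zip(names, X)}
    m = min(bics.values())
    w = {g: math.exp(-(b - m) / 2) for g, b in bics.items()}
    z = sum(w.values())
    return {g: wi / z for g, wi in w.items()}

# Gene g1 drives the outcome; g2 is noise.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
X = [[1.1, 2.0, 2.9, 4.2, 5.0],   # g1: tracks the outcome
     [0.3, 0.1, 0.4, 0.2, 0.5]]   # g2: unrelated
probs = bma_model_probs(y, X, ["g1", "g2"])
```

Rather than committing to one classifier, predictions and gene rankings are averaged with these weights, which is how BMA propagates model uncertainty into the final signature.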
Ka Yee Yeung is a Research Associate Professor in the Department of Microbiology at the University of Washington. Her research focuses on the development of data mining tools and their application to computational biology, with a particular interest in methods that effectively integrate heterogeneous high-throughput data sources for the construction of regulatory networks and the identification of biologically meaningful biomarkers. A computer scientist by training (she received her Ph.D. in Computer Science from the University of Washington under the supervision of Larry Ruzzo), her research spans multiple fields, including computational biology, statistics, and machine learning.