Courtney Corley

Richland, Washington, United States

About

Dr. Courtney D. Corley is a nationally recognized leader in the field of data science and…

Experience & Education

  • Pacific Northwest National Laboratory

Publications

  • Fuzzy Simplicial Networks: A Topology-Inspired Model to Improve Task Generalization in Few-shot Learning

    AAAI Workshop on Meta-Learning and MetaDL

    Deep learning has shown great success in settings with massive amounts of data but has struggled when data is limited. Few-shot learning algorithms, which seek to address this limitation, are designed to generalize well to new tasks with limited data. Typically, models are evaluated on unseen classes and datasets that are defined by the same fundamental task as they are trained for (e.g. category membership). One can also ask how well a model can generalize to fundamentally different tasks within a fixed dataset (for example: moving from category membership to tasks that involve detecting object orientation or quantity). To formalize this kind of shift we define a notion of “independence of tasks” and identify three new sets of labels for established computer vision datasets that test a model’s ability to generalize to tasks which draw on orthogonal attributes in the data. We use these datasets to investigate the failure modes of metric-based few-shot models. Based on our findings, we introduce a new few-shot model called Fuzzy Simplicial Networks (FSN) which leverages a construction from topology to more flexibly represent each class from limited data. In particular, FSN models can not only form multiple representations for a given class but can also begin to capture the low-dimensional structure which characterizes class manifolds in the encoded space of deep networks. We show that FSN outperforms state-of-the-art models on the challenging tasks we introduce in this paper while remaining competitive on standard few-shot benchmarks.

    Other authors
    See publication
  • Artificial Intelligence and Machine Learning: Designing for Safety and Security

    AAAS 2020

    Artificial intelligence and autonomous systems increasingly pervade every aspect of society, from transportation, energy, and medicine to how we interact with one another. While they offer great promise for more efficient and effective operations in all of these areas, they can also be exploited by malicious actors to disrupt critical systems. With the growth of machine intelligence come new challenges in designing systems that are resistant to intentional or unintentional manipulation, developing methods to assess AI vulnerability and dependability, and quantifying uncertainty in AI-driven results.

    This session will review recent scientific research in the emerging field of trustworthy AI systems and will discuss the current state of defensive techniques to validate and verify the safe performance of AI. The session will focus on technological challenges and opportunities in AI safety as a basis for policy and legal frameworks to address the implications of increased autonomy. Attendees will leave with a better appreciation for how taking appropriate technical and policy actions can lead to maximizing the positive benefits of AI while mitigating risks. The session will encourage the global research community to apply its scientific and technical expertise to ensure safe AI.

    Speakers
    Security Implications of Artificial Intelligence Systems
    Dawn Song, University of California, Berkeley, Berkeley, CA
    Human-AI Teams: Understanding Tradeoffs
    Dan Weld, University of Washington, Seattle, WA
    Challenges in Test and Evaluation of AI
    Jane Pinelis, Johns Hopkins University, Laurel, MD

    Other authors
    See publication
  • Deep learning to generate in silico chemical property libraries and candidate molecules for small molecule identification in complex samples

    Analytical Chemistry

    Comprehensive and unambiguous identification of small molecules in complex samples will revolutionize our understanding of the role of metabolites in biological systems. Existing and emerging technologies have enabled measurement of chemical properties of molecules in complex mixtures and, in concert, are sensitive enough to resolve even stereoisomers. Despite these experimental advances, small molecule identification is inhibited by (i) chemical reference libraries (e.g., mass spectra, collision cross section, and other measurable property libraries) representing <1% of known molecules, limiting the number of possible identifications, and (ii) the lack of a method to generate candidate matches directly from experimental features (i.e., without a library). To this end, we developed a variational autoencoder (VAE) to learn a continuous numerical, or latent, representation of molecular structure to expand reference libraries for small molecule identification. We extended the VAE to include a chemical property decoder, trained as a multitask network, in order to shape the latent representation such that it assembles according to desired chemical properties. The approach is unique in its application to metabolomics and small molecule identification, with its focus on properties that can be obtained from experimental measurements (m/z, CCS) paired with its training paradigm, which involved a cascade of transfer learning iterations. Our approach is orders of magnitude faster than first-principles simulation for CCS property prediction. Additionally, the ability to generate novel molecules along manifolds, defined by chemical property analogues, positions DarkChem as highly useful in a number of application areas, including metabolomics and small molecule identification, drug discovery and design, chemical forensics, and beyond.
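The "generate novel molecules along manifolds" idea above can be pictured as latent-space interpolation: encode two molecules, walk between their latent codes, and decode each intermediate point into a candidate structure. The sketch below is illustrative only; the two-dimensional vectors and variable names are hypothetical stand-ins, not DarkChem's actual representation or API.

```python
# Illustrative sketch of generating candidates along a latent manifold:
# walk a straight line between two latent codes and decode each point.
# The 2-D vectors and names below are hypothetical, not DarkChem's API.

def interpolate(z_a, z_b, steps=5):
    """Evenly spaced latent points from z_a to z_b, endpoints included."""
    return [[a + (b - a) * t / (steps - 1) for a, b in zip(z_a, z_b)]
            for t in range(steps)]

z_known = [0.0, 1.0]   # latent code of a library molecule (made up)
z_target = [1.0, 3.0]  # latent point matching measured m/z and CCS (made up)
path = interpolate(z_known, z_target, steps=5)
print(path[2])  # midpoint of the walk -> [0.5, 2.0]
# in the real system, each point on `path` would be decoded into a
# candidate molecular structure and screened against the measurements
```

Because the property decoder shapes the latent space, nearby points should decode to molecules with similar measurable properties, which is what makes such a walk useful for proposing candidates.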

    See publication
  • Forecasting influenza-like illness dynamics for military populations using neural networks and social media

    PLOS ONE

    This work is the first to take advantage of recurrent neural networks to predict influenza-like illness (ILI) dynamics from various linguistic signals extracted from social media data. Unlike other approaches that rely on time-series analysis of historical ILI data and state-of-the-art machine learning models, we build and evaluate the predictive power of neural network architectures based on Long Short-Term Memory (LSTM) units capable of nowcasting (predicting in “real time”) and forecasting (predicting the future) ILI dynamics in the 2011–2014 influenza seasons. To build our models we integrate information people post in social media, e.g., topics, embeddings, word n-grams, stylistic patterns, and communication behavior using hashtags and mentions. We then quantitatively evaluate the predictive power of different social media signals and contrast the performance of state-of-the-art regression models with neural networks using a diverse set of evaluation metrics. Finally, we combine ILI and social media signals to build a joint neural network model for ILI dynamics prediction. Unlike the majority of existing work, we focus on developing models for local rather than national ILI surveillance, specifically for military rather than general populations in 26 U.S. and six international locations, and analyze how model performance depends on the amount of social media data available per location.
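The nowcasting/forecasting distinction above comes down to how the supervised examples are framed: the same lagged features, but a different prediction horizon. A minimal stdlib-only sketch, with invented toy series standing in for the paper's ILI and social media signals:

```python
# Illustrative sketch (not the paper's code): framing ILI nowcasting vs.
# forecasting as supervised learning over lagged weekly signals.
# The series `ili` and `social` are hypothetical toy data.

def make_examples(ili, social, lags=3, horizon=0):
    """Build (features, target) pairs.

    horizon=0 -> nowcasting: predict this week's ILI from this week's
    social-media signal plus the previous `lags` ILI values.
    horizon=k -> forecasting: predict ILI k weeks ahead.
    """
    examples = []
    for t in range(lags, len(ili) - horizon):
        features = ili[t - lags:t] + [social[t]]
        target = ili[t + horizon]
        examples.append((features, target))
    return examples

# toy weekly series
ili = [1.0, 1.2, 1.5, 2.1, 2.8, 2.4, 1.9]
social = [10, 12, 18, 25, 30, 27, 20]

nowcast = make_examples(ili, social, lags=3, horizon=0)
forecast = make_examples(ili, social, lags=3, horizon=2)
print(len(nowcast), len(forecast))  # forecasting loses `horizon` weeks
```

An LSTM, as in the paper, would consume these windows as sequences instead of flat vectors, but the nowcast-versus-forecast framing is the same.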

    Other authors
    See publication
  • Uncovering the relationships between military community health and affects expressed in social media

    EPJ Data Science

    Military populations present a small, unique community whose mental and physical health impacts the security of the nation. Recent literature has explored social media’s ability to enhance disease surveillance and characterize distinct communities, with encouraging results. We present a novel analysis of the relationships between influenza-like illness (ILI) clinical data and affects (i.e., emotions and sentiments) extracted from social media around military facilities. Our analyses examine (1) differences in affects expressed by military and control populations, (2) affect changes over time by users, (3) differences in affects expressed during high and low ILI seasons, and (4) correlations and cross-correlations between ILI clinical visits and affects at an unprecedented scale: 171M geo-tagged tweets across 31 global geolocations. Key findings include: military and control populations differ in the way they express affects in social media over space and time. Control populations express more positive and less negative sentiments and less sadness, fear, disgust, and anger than military populations. However, affects expressed in social media by both populations within the same area correlate similarly with ILI visits to military health facilities. We have identified potential cofactors leading to location variability, e.g., region or state locale, military service type, and/or the ratio of military to civilian populations. For most locations, ILI proportions positively correlate with sadness and neutral sentiment, which are the affects most often expressed during high ILI season. ILI proportions negatively correlate with fear, disgust, surprise, and positive sentiment. These results are similar to the low ILI season, where anger, surprise, and positive sentiment are highest. Finally, cross-correlation analysis shows that most affects lead ILI clinical visits, i.e., are predictive of ILI data.
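The lead/lag finding above boils down to checking, for each candidate lag, how well the affect series shifted forward lines up with the clinical series. A stdlib-only sketch on synthetic data (the series and the two-week lead are invented for illustration, not the study's results):

```python
# Hedged sketch of cross-correlation lead analysis: find the lag at which
# an affect time series best correlates with ILI visits. A positive best
# lag means the affect signal leads (is predictive of) the clinical data.
# The toy series below are synthetic, not the 171M-tweet dataset.

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x) ** 0.5
    vy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (vx * vy)

def best_lead(affect, ili, max_lag=4):
    """Lag (in weeks) at which the affect series best correlates with ILI."""
    scores = {}
    for lag in range(max_lag + 1):
        a = affect[:len(affect) - lag] if lag else affect
        scores[lag] = pearson(a, ili[lag:])
    return max(scores, key=scores.get)

affect = [0, 1, 3, 7, 9, 6, 3, 1, 0, 0, 1, 2]  # weekly affect signal (toy)
ili = [0, 0] + affect[:-2]   # ILI trails the affect signal by two weeks
print(best_lead(affect, ili))  # -> 2
```

In practice one would also test significance and negative lags, but the peak of the lagged correlation is the core of the "affects lead ILI" claim.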

    Other authors
    See publication
  • Sharkzor: Interactive Deep Learning for Image Triage, Sort, and Summary

    Human in the Loop Machine Learning Workshop at ICML 2017

    Sharkzor is a web application for machine-learning assisted image sort and summary. Deep learning algorithms are leveraged to infer, augment, and automate the user's mental model. Initially, images uploaded by the user are spread out on a canvas. The user then interacts with the images to impute their mental model into the application’s algorithmic underpinnings. Methods of interaction within Sharkzor's user interface and user experience support three primary user tasks: triage, organize, and automate. The user triages the large pile of overlapping images by moving images of interest into proximity. The user then organizes said images into meaningful groups. After interacting with the images and groups, deep learning helps to automate the user's interactions. The loop of interaction, automation, and response by the user allows the system to quickly make sense of large amounts of data.

    Other authors
  • Beyond Fine Tuning: A Modular Approach to Learning on Small Data

    arXiv

    In this paper we present a technique to train neural network models on small amounts of data. Current methods for training neural networks on small amounts of rich data typically rely on strategies such as fine-tuning a pre-trained neural network or the use of domain-specific hand-engineered features. Here we take the approach of treating network layers, or entire networks, as modules and combine pre-trained modules with untrained modules, to learn the shift in distributions between data sets. The central impact of using a modular approach comes from adding new representations to a network, as opposed to replacing representations via fine-tuning. Using this technique, we are able to surpass results using standard fine-tuning transfer learning approaches, and we are also able to significantly increase performance over such approaches when using smaller amounts of data.

    Other authors
    See publication
  • Discourse, Health and Well-being of Military Populations Through the Social Media Lens

    In Proceedings of the 3rd International Workshop on the World Wide Web and Population Health Intelligence at AAAI 2016

    Social media can provide a resource for characterizing communities and small populations through activities and content shared online. For instance, studying the language use in social media within military populations may provide insights into their health and well-being. In this paper, we address three research questions: (1) How do military populations use social media? (2) What do military users discuss in social media? And (3) Do military users talk about health and well-being differently than civilians? Military Twitter users were identified through keywords in the profile description of users who posted geo-tagged tweets at military installations. The data was anonymized for the analysis. Tweets from users belonging to military populations were compared to those from non-military populations. Our results indicate that military users talk more about events in their military life, whereas non-military users talk more about school, work, and leisure activities. Additionally, we identified significant differences in communication behavior between the two populations, including health-related language.

    Other authors
  • Disentangling the Lexicons of Disaster Response in Twitter

    In Proceedings of the Social Web for Disaster Management Workshop 2015, held at the 24th International World Wide Web Conference, Florence, Italy, May 2015.

    People around the world use social media platforms such as Twitter to express their opinion and share activities about various aspects of daily life. In the same way social media changes communication in daily life, it also is transforming the way individuals communicate during disasters and emergencies. Because emergency officials have come to rely on social media to communicate alerts and updates, they must learn how users communicate disaster-related content on social media. We used a novel information-theoretic unsupervised learning tool, CorEx, to extract and characterize highly relevant content used by the public on Twitter during known emergencies, such as fires, explosions, and hurricanes. Using the resulting analysis, authorities may be able to score social media content and prioritize their attention toward those messages most likely to be related to the disaster.

    Other authors
  • The Heroes' Problems: Exploring the Potentials of Google Glass for Biohazard Handling Professionals.

    In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (CHI '15).

    In "white powder incidents" or other suspicious and risky situations relating to deadly diseases or chemicals (e.g., Ebola investigation), those who handle the potentially hazardous materials are the heroes who spearhead the first responder's operations. Although well trained, these heroes need to manage complex problems and make life-or-death decisions while handling the unknown and dangerous. We are motivated to explore how Google Glass can facilitate those heroes' missions. To this end, we conducted contextual inquiry on six biohazard-handling, Personal Protective Equipment (PPE)-wearing professionals. With an inductive thematic analysis, we summarized the heroes' workflow and four groups of "Heroes' Problems". A unique "A3 Model" (Awareness, Analysis, Action) was generated to encapsulate our qualitative findings and proposed Glass features. The findings serv…

    Other authors
    See publication
  • Assessment of User Home Location Geoinference Methods

    IEEE International Conference on Intelligence and Security Informatics

    This study presents an assessment of multiple approaches to determine the home and/or other important locations to a Twitter user. In this study, we present a unique approach to the problem of geotagged data sparsity in social media when performing geoinferencing tasks. Given the sparsity of explicitly geotagged Twitter data, the ability to perform accurate and reliable user geolocation from a limited number of geotagged posts has proven to be quite useful. In our survey, we have achieved accuracy rates of over 86% in matching Twitter user profile locations with their inferred home locations derived from geotagged posts.
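One simple baseline for the home-location inference described above (a sketch under stated assumptions, not the paper's actual methods) is to bucket a user's geotagged posts into coarse grid cells and take the most frequent cell as the inferred home:

```python
# Minimal home-location heuristic (illustrative only, not the paper's
# method): round each geotagged post to a coarse lat/lon grid and pick
# the most frequently visited cell. Coordinates below are made up.
from collections import Counter

def infer_home(points, precision=1):
    """points: (lat, lon) pairs from a user's geotagged posts.
    Returns the most frequently visited grid cell (rounded lat, lon)."""
    cells = Counter((round(lat, precision), round(lon, precision))
                    for lat, lon in points)
    (lat, lon), _ = cells.most_common(1)[0]
    return lat, lon

posts = [(46.28, -119.28), (46.27, -119.31), (46.29, -119.29),  # near home
         (47.61, -122.33)]                                      # one trip
print(infer_home(posts))  # -> (46.3, -119.3)
```

The inferred cell can then be compared against the location string in the user's profile, which is essentially the matching exercise whose 86% accuracy the abstract reports.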

    Other authors
  • Medical Vocabulary and Transmission Vector Alignment with Schema.org

    International Conference on Biomedical Ontology 2015

    Available biomedical ontologies and knowledge bases currently lack formal and standards-based interconnections between disease, disease vector, and drug treatment vocabularies. The PNNL Medical Linked Dataset (PNNL-MLD) addresses this gap. This paper describes the PNNL-MLD, which provides a unified vocabulary and dataset of drug, disease, side effect, and vector transmission background information. Currently, the PNNL-MLD combines and curates data from the following research projects: DrugBank, DailyMed, Diseasome, DisGeNet, Wikipedia Infobox, Sider, and PharmGKB. The main outcomes of this effort are a dataset aligned to Schema.org, including a parsing framework, and extensible hooks ready for integration with selected medical ontologies. The PNNL-MLD enables researchers more quickly and easily to query distinct datasets. Future extensions to the PNNL-MLD will include Traditional Chinese Medicine, broader interlinks across genetic structures, a larger thesaurus of synonyms and hypernyms, explicit coding of diseases and drugs across research systems, and incorporating vector-borne transmission vocabularies.

    Other authors
  • Forensic Signature Detection of Yersinia pestis Culturing Practices across Institutions Using a Bayesian Network

    Journal of Forensic Investigation

    To participate in the Biosurveillance Ecosystem (BSVE) program and to develop Internet-based analytics that integrate transformational capabilities to radically improve U.S. Department of Defense (DoD) and interagency efficiency and effectiveness in support of threat surveillance, including the detection, investigation, and response to infectious disease events. Pacific Northwest National Laboratory (PNNL) will also deliver three mobile phone applications developed through a teamed student competition. The successful creation of the advanced analytics and the mobile applications will (1) improve the understanding of disease baseline and event prediction related to human social, cultural, and behavioral data; environmental/climatological data; and disease risk mapping; and (2) enable users to predict, alert, forecast, and manage a biothreat event, whether emerging, endemic, or intentionally introduced, within 24 hours to minimize harmful impact to the warfighter and society.

    Other authors
    See publication
  • Disease prediction models and operational readiness.

    PLoS One

    The objective of this manuscript is to present a systematic review of biosurveillance models that operate on select agents and can forecast the occurrence of a disease event. We define a disease event to be a biological event with focus on the One Health paradigm. These events are characterized by evidence of infection and/or disease condition. We reviewed models that attempted to predict a disease event, not merely its transmission dynamics, and we considered models involving pathogens of concern as determined by the US National Select Agent Registry (as of June 2011). We searched commercial and government databases and harvested Google search results for eligible models, using terms and phrases provided by public health analysts relating to biosurveillance, remote sensing, risk assessments, spatial epidemiology, and ecological niche modeling. After removal of duplications and extraneous material, a core collection of 6,524 items was established, and these publications along with their abstracts are presented in a semantic wiki at http://BioCat.pnnl.gov. As a result, we systematically reviewed 44 papers, and the results are presented in this analysis. We identified 44 models, classified as one or more of the following: event prediction, spatial, ecological niche, diagnostic or clinical, spread or response, and reviews. The model parameters (e.g., etiology, climatic, spatial, cultural) and data sources (e.g., remote sensing, non-governmental organizations, expert opinion, epidemiological) were recorded and reviewed. A component of this review is the identification of verification and validation (V&V) methods applied to each model, if any V&V method was reported. All models were classified as either having undergone Some Verification or Validation, or No Verification or Validation. We close by outlining an initial set of operational readiness level guidelines for disease prediction models based upon established Technology Readiness Level definitions.

    Other authors
    See publication
  • BioCat 2.0

    PNNL-22767

    The U.S. Department of Homeland Security (DHS) National Biosurveillance Integration Center (NBIC) was established in 2008 with a primary mission to “(1) enhance the capability of the Federal Government to (A) rapidly identify, characterize, localize, and track a biological event of national concern by integrating and analyzing data relating to human health, animal, plant, food, and environmental monitoring systems (both national and international); and (B) disseminate alerts and other information to Member Agencies and, in coordination with (and where possible through) Member Agencies, to agencies of State, local, and tribal governments, as appropriate, to enhance the ability of such agencies to respond to a biological event of national concern; and (2) oversee development and operation of the National Biosurveillance Integration System (NBIS).” Inherent in its mission then and the broader NBIS, NBIC is concerned with the identification, understanding, and use of a variety of biosurveillance models and systems. The goal of this project is to characterize, evaluate, classify, and catalog existing disease forecast and prediction models that could provide operational decision support for recognizing a biological event having a potentially significant impact. Additionally, gaps should be identified and recommendations made on using disease models in an operational environment to support real-time decision making.

    Other authors
    See publication
  • Current Trends in the Detection of Sociocultural Signatures: Data-Driven Models.

    Sociocultural Behavior Sensemaking: State of the Art in Understanding the Operational Environment

    The harvesting of behavioral data and their analysis through evidence-based reasoning enable the detection of sociocultural signatures in their context to support situation awareness and decision making. Harvested data are used as training materials from which to infer computational models of sociocultural behaviors or to calibrate parameters for such models. Harvested data also serve as evidence input that the models use to provide insights about observed and future behaviors for targets of interest. The harvested data are often the result of assembling diverse data types and aggregating them into a form suitable for analysis. Data need to be analyzed to bring out the categories of content that are relevant to the domain being addressed in order to train or run a model. If, for example, we are modeling the intent of a group to engage in violent behavior using messages that the group has broadcast, then these messages need to be processed to extract and measure indicators of violent intent. The extracted indicators and the associated measurements (e.g. rates or counts of occurrence) can then be used to train/calibrate and run computational models that assess the propensity for violence expressed in the source messages. Ubiquitous access to the Internet, mobile telephony, and technologies such as digital photography and digital video have enabled social media application platforms such as Facebook, YouTube, and Twitter that are altering the nature of human social interaction. The fast-increasing pace of online social interaction introduces new challenges and opportunities for gathering sociocultural data. Challenges include the development of harvesting and processing techniques tailored to new data environments and formats (e.g. Twitter, Facebook), the integration of social media content with traditional media content, and the protection of personal privacy...

    Other authors
  • Fusion of laboratory and textual data for investigative bioforensics

    Forensic Science International

    Chemical and biological forensic programs focus on the identification of a threat and acquisition of laboratory measurements to determine how a threat agent may have been produced. However, to generate investigative leads, it might also be useful to identify institutions where the same agent has been produced by the same or a very similar process, since the producer of the agent may have learned methods at a university or similar institution. We have developed a Bayesian network framework that fuses hard and soft data sources to assign probability to production practices. It combines the results of laboratory measurements with an automatic text reader to scan scientific literature and rank institutions that had published papers on the agent of interest in order of the probability that the institution has the capability to generate the sample of interest based on laboratory data. We demonstrate the Bayesian network on an example case from microbial forensics, predicting the methods used to produce Bacillus anthracis spores based on mass spectrometric measurements and identifying institutions that have a history of growing Bacillus spores using the same or highly similar methods. We illustrate that the network model can assign a higher posterior probability than expected by random chance to appropriate institutions when trained using only a small set of manually analyzed documents. This is the first example of an automated methodology to integrate experimental and textual data for the purpose of investigative forensics.

    Other authors
    See publication
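    The hard/soft fusion described above can be illustrated as sequential Bayesian updates: a posterior over production methods from laboratory ("hard") evidence is refined with text-mined ("soft") evidence. The sketch below is a minimal, hypothetical illustration; the method names and probabilities are invented, and the paper's actual Bayesian network is far richer.

    ```python
    # Hypothetical sketch of hard/soft data fusion as sequential Bayesian
    # updates. Method names and probabilities are invented for illustration.

    def bayes_update(prior, likelihood):
        """Return the normalized posterior P(method | evidence)."""
        unnorm = {m: prior[m] * likelihood[m] for m in prior}
        total = sum(unnorm.values())
        return {m: p / total for m, p in unnorm.items()}

    # Uniform prior over three hypothetical spore-production methods.
    prior = {"method_A": 1 / 3, "method_B": 1 / 3, "method_C": 1 / 3}

    # "Hard" evidence: likelihood of the mass-spec result under each method.
    lab_likelihood = {"method_A": 0.7, "method_B": 0.2, "method_C": 0.1}

    # "Soft" evidence: likelihood from text-mined publication histories.
    text_likelihood = {"method_A": 0.5, "method_B": 0.4, "method_C": 0.1}

    posterior = bayes_update(prior, lab_likelihood)       # fuse laboratory data
    posterior = bayes_update(posterior, text_likelihood)  # then textual data

    best = max(posterior, key=posterior.get)              # top-ranked method
    ```

    In the same spirit as the abstract, institutions whose published protocols match the top-ranked method would then be ranked by the corresponding posterior mass.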
  • Assessing the Quality of Bioforensic Signatures

    Proceedings of the 2013 IEEE Intelligence and Security Informatics Conference

    We present a mathematical framework for assessing the quality of signature systems in terms of fidelity, risk, cost, other attributes, and utility—a method we call Signature Quality Metrics (SQM). We demonstrate the SQM approach by assessing the quality of a signature system designed to predict the culture medium used to grow a microorganism. The system consists of four chemical assays and a Bayesian network that estimates the probabilities the microorganism was grown using one of eleven culture media. We evaluated fifteen combinations of the signature system by removing one or more of the assays from the Bayes net. We show how SQM can be used to compare the various combinations while accounting for the tradeoffs among three attributes of interest: fidelity, cost, and the amount of sample material consumed by the assays.

    Other authors
  • Jargon and Graph Modularity on Twitter

    IEEE Int'l Conference on Intelligence and Security Informatics

    The language of conversation is just as dependent upon word choice as it is on who is taking part. Twitter provides an excellent test-bed in which to conduct experiments not only on language usage but on who is using what language with whom. To find communities, we combine large-scale graph analytical techniques with known socio-linguistic methods. In this article we leverage both curated vocabularies and naive mathematical graph analyses to determine whether community structure on Twitter corroborates modern socio-linguistic theory. The results indicate that, based on networks constructed from user-to-user communication and communities identified using the Clauset-Newman greedy modularity algorithm, the more prolific users of these curated vocabularies are concentrated in distinct network communities.

    Other authors
    • Chase Dowling
    • Bill Reynolds
    • Rob Farber
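    The community detection step above hinges on modularity maximization. As a hedged, pure-Python sketch (a toy graph, not the paper's Twitter data), the score Q that the Clauset-Newman greedy algorithm optimizes can be computed directly:

    ```python
    # Toy "user-to-user communication" graph: two triangles and one bridge.
    from collections import defaultdict

    edges = [("a", "b"), ("b", "c"), ("a", "c"),   # conversation cluster 1
             ("x", "y"), ("y", "z"), ("x", "z"),   # conversation cluster 2
             ("c", "x")]                           # weak bridge between them

    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    m = len(edges)  # total number of edges

    def modularity(partition):
        """Q = sum over communities of intra_edges/m - (degree_sum/(2m))^2."""
        q = 0.0
        for comm in partition:
            intra = sum(1 for u, v in edges if u in comm and v in comm)
            deg_sum = sum(degree[n] for n in comm)
            q += intra / m - (deg_sum / (2 * m)) ** 2
        return q

    split = [{"a", "b", "c"}, {"x", "y", "z"}]  # the two clusters
    lump = [{"a", "b", "c", "x", "y", "z"}]     # everything together
    ```

    A greedy agglomerative pass repeatedly merges whichever pair of communities most increases Q; here `modularity(split)` exceeds `modularity(lump)`, matching the intuition that users of a shared vocabulary concentrate in distinct communities.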
  • SociAL Sensor Analytics: Measuring Phenomenology at Scale

    IEEE Int'l Conference on Intelligence and Security Informatics

    The objective of this paper is to present a system for interrogating immense social media streams through analytical methodologies that characterize topics and events critical to tactical and strategic planning. First, we propose a conceptual framework for interpreting social media as a sensor network. Time-series models and topic clustering algorithms are used to implement this concept into a functioning analytical system. Next, we address two scientific challenges: 1) to understand, quantify, and baseline phenomenology of social media at scale, and 2) to develop analytical methodologies to detect and investigate events of interest. This paper then documents computational methods and reports experimental findings that address these challenges. Ultimately, the ability to process billions of social media posts per week over a period of years enables the identification of patterns and predictors of tactical and strategic concerns at an unprecedented rate through SociAL Sensor Analytics (SALSA).

    Other authors
    • Chase Dowling
    • Stuart J Rose
    • Taylor McKenzie
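    One ingredient of treating social media as a sensor network is baselining post volume and flagging deviations. The sketch below is an assumed, minimal stand-in for the paper's time-series models: a trailing-window z-score detector applied to invented daily counts.

    ```python
    # Minimal burst detector on a post-volume time series (toy data; the
    # window and threshold are illustrative, not SALSA's actual parameters).
    import statistics

    def burst_days(counts, window=7, z_thresh=3.0):
        """Return indices whose count exceeds the trailing-window mean
        by more than z_thresh standard deviations."""
        flagged = []
        for t in range(window, len(counts)):
            base = counts[t - window:t]
            mu = statistics.mean(base)
            sigma = statistics.pstdev(base) or 1.0  # avoid divide-by-zero
            if (counts[t] - mu) / sigma > z_thresh:
                flagged.append(t)
        return flagged

    # Steady chatter with a spike on day 10 (e.g. an event of interest).
    volume = [100, 104, 98, 101, 99, 103, 100, 102, 97, 101, 400, 99]
    ```

    At the scale the paper describes (billions of posts per week), the same baseline-and-deviation idea would be applied per topic cluster rather than to a single global series.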
  • Sociolect-based Community Detection

    IEEE Int'l Conference on Intelligence and Security Informatics

    “Sociolects” are specialized vocabularies used by social subgroups defined by common interests or origins. We applied methods to retrieve large quantities of Twitter data based on expert-identified sociolects and then applied and developed network-analysis methods to relate sociolect use to network (sub-) structure. We show that novel methods including consideration of node populations, as well as edge counts, provide substantially enhanced performance compared to standard assortativity. We explain these methods, show their utility in analyzing large corpora of social media data, and discuss their further extensions and potential applications.

    Other authors
    • Bill Reynolds
    • William Salter
    • Rob Farber
    • Chase Dowling
  • Disease Models for Event Prediction

    International Society for Disease Surveillance Annual Conference

    A rich and diverse field of infectious disease modeling has emerged over the past 60 years and has advanced our understanding of population- and individual-level disease transmission dynamics, including risk factors, virulence and spatio-temporal patterns of disease spread. Recent modeling advances include biostatistical methods, and massive agent-based population, biophysical, ordinary differential equation, and ecological-niche models. Diverse data sources are being integrated into these models as well, such as demographics, remotely-sensed measurements and imaging, environmental measurements, and surrogate data such as news alerts and social media. Yet, there remains a gap in the sensitivity and specificity of these models not only in tracking infectious disease events but also predicting their occurrence. One of the primary goals of this research was to characterize the viability of biosurveillance models to provide operationally relevant information to decision makers, in order to identify areas for future research. Two critical characteristics differentiate this work from other infectious disease modeling reviews. First, we reviewed models that attempted to predict the disease event, not merely its transmission dynamics. Second, we considered models involving pathogens of concern as determined by the US National Select Agent Registry. We close with recommended operational readiness level guidelines, based on established Technology Readiness Level definitions.

    Other authors
  • Outside the Continental United States International Travel and Contagion Impact Quick Look Tool

    ACM SIGSPATIAL HealthGIS'12, Redondo Beach, CA, USA

    This paper describes a tool that will allow public health analysts to estimate infectious disease risk at the country level as a function of different international transportation modes. The prototype focuses on a cholera epidemic originating within Latin America or the Caribbean, but it can be expanded to consider other pathogens as well. This effort leverages previous work in collaboration with the Centers for Disease Control and Prevention on the International Travel to Community Impact (IT-CI) model, which analyzes and assesses potential international disease outbreaks and then estimates the associated impacts to U.S. communities and the nation as a whole, and orients that model for use Outside the Continental United States (OCONUS). For brevity, we refer to this refined model as OIT-CI. First, we developed an operationalized meta-population spatial cholera model for Latin America and the Caribbean at the secondary administrative-level boundary. Second, we developed a robust function of human airline travel critical to approximating mixing patterns in the meta-population model. In the prototype version presented here, OIT-CI models a cholera epidemic originating in a Latin American or Caribbean country and spreading via airline transportation routes. Disease spread is modeled at the country level using a patch model with a connectivity function based on demographic, geospatial, and human transportation data. We have also identified data to estimate the water and health-related infrastructure capabilities of each country to include this potential impact on disease transmission.

    Other authors
    See publication
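    A patch model of the kind described above can be sketched as coupled SIR compartments. The two-patch toy below uses invented rates, with a single `travel` parameter standing in for the airline-mixing connectivity function; it is an illustration of the modeling pattern, not the operational OIT-CI parameterization.

    ```python
    # Two-patch SIR sketch with travel coupling (all parameters invented).
    beta, gamma = 0.4, 0.2   # transmission and recovery rates (per day)
    travel = 0.01            # fraction of contacts made with the other patch
    dt, days = 0.1, 100      # Euler step size and simulation horizon

    S = [0.99, 1.0]          # susceptible fraction per patch
    I = [0.01, 0.0]          # infected fraction (outbreak seeded in patch 0)
    R = [0.0, 0.0]           # recovered fraction

    for _ in range(int(days / dt)):
        S2, I2, R2 = S[:], I[:], R[:]
        for p in (0, 1):
            q = 1 - p
            # Force of infection mixes local and travel-coupled prevalence.
            lam = beta * ((1 - travel) * I[p] + travel * I[q])
            S2[p] = S[p] - lam * S[p] * dt
            I2[p] = I[p] + (lam * S[p] - gamma * I[p]) * dt
            R2[p] = R[p] + gamma * I[p] * dt
        S, I, R = S2, I2, R2
    ```

    Because R0 = beta/gamma = 2 > 1, the epidemic seeded in patch 0 spills into patch 1 through the coupling term; a country-level model replaces the scalar `travel` with a connectivity function over demographic, geospatial, and transportation data, as the abstract describes.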
  • Assessing the Continuum of Event-Based Biosurveillance Through an Operational Lens.

    Biosecurity and Bioterrorism: Biodefense Strategy, Practice, and Science. March 2012.

    This research follows the Updated Guidelines for Evaluating Public Health Surveillance Systems, Recommendations from the Guidelines Working Group, published by the Centers for Disease Control and Prevention nearly a decade ago. Since then, models have been developed and complex systems have evolved with a breadth of disparate data to detect or forecast chemical, biological, and radiological events that have a significant impact on the One Health landscape. How the attributes identified in 2001 relate to the new range of event-based biosurveillance technologies is unclear. This article frames the continuum of event-based biosurveillance systems (that fuse media reports from the internet), models (ie, computational models that forecast disease occurrence), and constructs (ie, descriptive analytical reports) through an operational lens (ie, aspects and attributes associated with operational considerations in the development, testing, and validation of the event-based biosurveillance methods and models and their use in an operational environment). A workshop was held in 2010 to scientifically identify, develop, and vet a set of attributes for event-based biosurveillance. Subject matter experts were invited from 7 federal government agencies and 6 different academic institutions pursuing research in biosurveillance event detection. We describe 8 attribute families for the characterization of event-based biosurveillance: event, readiness, operational aspects, geographic coverage, population coverage, input data, output, and cost. Ultimately, the analyses provide a framework from which the broad scope, complexity, and relevant issues germane to event-based biosurveillance useful in an operational environment can be characterized.

    Other authors
    See publication
  • Thought Leaders during Crises in Massive Social Networks

    Statistical Analysis and Data Mining. Wiley, ed.

    Making vast amounts of online social media data comprehensible to an analyst is a key question in operational analytics. This paper focuses on methods to assemble social network graphs from online social media to reveal nodes that are ‘interesting’ in the context of operational analysis—meaning that the computational results can be interpreted by a human analyst wishing to answer some operational questions.

    The reported results demonstrate that nodes with a high impact or disproportionately large agency on the whole network (e.g., online community) can be found in a variety of online communities. Validation of the importance of these high-agency nodes by human and computational methods is discussed, and the efficacy of our approach is reported using both quantitative methods and tests against the null hypothesis.

    Other authors
    • Rob Farber
    See publication
  • Use of Social Media to Target Information-Driven Arms Control and Nonproliferation Verification.

    to appear in the 53rd annual meeting of the Institute of Nuclear Materials Management (INMM 2012)

    There has been considerable discussion within the national security community, including a recent workshop sponsored by the U.S. State Department, about the use of social media for extracting patterns of collective behavior and influencing public perception in areas relevant to arms control and nonproliferation. This paper seeks to explore how social media can be used to supplement nonproliferation and arms control inspection and monitoring activities on states and sites of greatest proliferation relevance. In this paper, we set the stage for how social media can be applied in this problem space and describe some of the foreseen challenges, including data validation, sources and attributes, verification, and security. Using information analytics and data visualization capabilities available at Pacific Northwest National Laboratory (PNNL), we provide examples of some social media “signatures” of potential relevance for nonproliferation and arms control purposes. We conclude by offering recommendations for further research.

    Other authors
    See publication
  • Massive Social Network Analysis: Mining Twitter for Social Good

    International Conference on Parallel Processing

    Social networks produce an enormous quantity of data. Facebook consists of over 400 million active users sharing over 5 billion pieces of information each month. Analyzing this vast quantity of unstructured data presents challenges for software and hardware. We present GraphCT, a Graph Characterization Toolkit for massive graphs representing social network data. On a 128-processor Cray XMT, GraphCT estimates the betweenness centrality of an artificially generated (R-MAT) 537 million vertex, 8.6 billion edge graph in 55 minutes and a real-world graph (Kwak, et al.) with 61.6 million vertices and 1.47 billion edges in 105 minutes. We use GraphCT to analyze public data from Twitter, a microblogging network. Twitter’s message connections appear primarily tree-structured as a news dissemination system. Within the public data, however, are clusters of conversations. Using GraphCT, we can rank actors within these conversations and help analysts focus attention on a much smaller data subset. San Diego, CA, 2010. PNNL-SA-71335.

    Other authors
    See publication
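    GraphCT itself targets billion-edge graphs on a Cray XMT, but the quantity it estimates, betweenness centrality, can be illustrated at toy scale. The sketch below is a standard Brandes-style computation on an invented six-user mention graph; it is not GraphCT code.

    ```python
    from collections import deque

    def betweenness(adj):
        """Brandes' algorithm for unweighted betweenness centrality.

        Returns directed-pair counts; on an undirected graph each pair is
        counted in both directions (halve the values for the usual score).
        """
        bc = dict.fromkeys(adj, 0.0)
        for s in adj:
            stack, queue = [], deque([s])
            preds = {v: [] for v in adj}
            sigma = dict.fromkeys(adj, 0.0)   # shortest-path counts
            sigma[s] = 1.0
            dist = dict.fromkeys(adj, -1)
            dist[s] = 0
            while queue:                      # BFS from s
                v = queue.popleft()
                stack.append(v)
                for w in adj[v]:
                    if dist[w] < 0:
                        dist[w] = dist[v] + 1
                        queue.append(w)
                    if dist[w] == dist[v] + 1:
                        sigma[w] += sigma[v]
                        preds[w].append(v)
            delta = dict.fromkeys(adj, 0.0)   # dependency accumulation
            while stack:
                w = stack.pop()
                for v in preds[w]:
                    delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
                if w != s:
                    bc[w] += delta[w]
        return bc

    # Toy mention graph: "b" and "d" bridge two small conversations.
    adj = {
        "a": ["b"], "c": ["b"],
        "b": ["a", "c", "d"],
        "d": ["b", "e", "f"],
        "e": ["d"], "f": ["d"],
    }
    bc = betweenness(adj)  # "b" and "d" score highest; leaves score zero
    ```

    Ranking actors by `bc`, as in the conversation-analysis use case above, surfaces the bridging users ("b" and "d") while the peripheral users score zero; GraphCT applies an approximate, parallel version of the same idea.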

Courses

  • Socio-Behavioral Computing

    CptS 483 & 580

Honors & Awards

  • Recent Graduate Alumni Award

    Computer Science and Engineering Department

  • Project Team of the Year

    PNNL National Security Directorate

  • Ronald L. Brodzinski Early Career Exceptional Achievement Award

    Pacific Northwest National Laboratory

    Each year PNNL recognizes excellence in science and engineering with the Laboratory Director’s Science and Engineering Achievement Awards. This year, National Security Directorate’s Courtney Corley is among the honorees. Court is the recipient of the Ronald L. Brodzinski Early Career Exceptional Achievement Award for his research in biosecurity, which is actively contributing to solving some of the nation’s toughest challenges.

  • Project Team of the Year: BEOWulf

    PNNL National Security Directorate

    Project manager of the Pacific Northwest National Laboratory Nuclear, Chemical, and Biological Surety and Signatures project management office team of the year (2013).

  • Best Overall Paper Award

    IEEE International Conference on Intelligence and Security Informatics

    "SociAL Sensor Analytics: Measuring Phenomenology at Scale" Courtney D Corley, Chase P Dowling, Stuart J Rose, and Taylor McKenzie.

Languages

  • English

    Native or bilingual proficiency

  • Spanish

    Limited working proficiency

Organizations

  • International Society for Disease Surveillance

    Social Media Working Group Lead

    -
