-
Reward-Free Attacks in Multi-Agent Reinforcement Learning
Authors:
Ted Fujimoto,
Timothy Doster,
Adam Attarian,
Jill Brandenberger,
Nathan Hodas
Abstract:
We investigate how effective an attacker can be when it only learns from its victim's actions, without access to the victim's reward. In this work, we are motivated by the scenario where the attacker wants to behave strategically when the victim's motivations are unknown. We argue that one heuristic approach an attacker can use is to maximize the entropy of the victim's policy. The policy is generally not obfuscated, which implies it may be extracted simply by passively observing the victim. We provide such a strategy in the form of a reward-free exploration algorithm that maximizes the attacker's entropy during the exploration phase, and then maximizes the victim's empirical entropy during the planning phase. In our experiments, the victim agents are subverted through policy entropy maximization, implying an attacker might not need access to the victim's reward to succeed. Hence, reward-free attacks, which are based only on observing behavior, show the feasibility of an attacker to act strategically without knowledge of the victim's motives even if the victim's reward information is protected.
Submitted 1 December, 2021;
originally announced December 2021.
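The quantity at the center of the abstract above, the entropy of the victim's policy as estimated from passively observed behavior, can be illustrated with a small sketch. This is not the paper's reward-free exploration algorithm, which the abstract does not specify. It only shows one way an observer might estimate policy entropy from (state, action) pairs. The function name, the unweighted average over states, and the use of natural-log units are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def empirical_policy_entropy(observations):
    """Mean Shannon entropy (in nats) of the observed action distribution.

    observations: iterable of (state, action) pairs, both hashable.
    Assumption: states are weighted equally when averaging; a visit-weighted
    average would be an equally reasonable choice.
    """
    # Count how often each action was taken in each observed state.
    counts = defaultdict(Counter)
    for state, action in observations:
        counts[state][action] += 1

    entropies = []
    for action_counts in counts.values():
        total = sum(action_counts.values())
        # Shannon entropy of the empirical action distribution in this state.
        h = -sum((c / total) * math.log(c / total) for c in action_counts.values())
        entropies.append(h)

    return sum(entropies) / len(entropies) if entropies else 0.0


# Hypothetical observations: the victim is uncertain in s0, deterministic in s1.
observed = [("s0", "left"), ("s0", "right"), ("s1", "left"), ("s1", "left")]
print(empirical_policy_entropy(observed))
```

In this toy data the victim splits evenly between two actions in one state and always picks the same action in the other, so the estimate is half of ln 2. An attacker following the heuristic in the abstract would prefer situations that push this number up.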
-
Adaptive Transfer Learning: a simple but effective transfer learning
Authors:
Jung H Lee,
Henry J Kvinge,
Scott Howland,
Zachary New,
John Buckheit,
Lauren A. Phillips,
Elliott Skomski,
Jessica Hibler,
Courtney D. Corley,
Nathan O. Hodas
Abstract:
Transfer learning (TL) leverages previously obtained knowledge to learn new tasks efficiently and has been used to train deep learning (DL) models with limited amounts of data. When TL is applied to DL, pretrained (teacher) models are fine-tuned to build domain-specific (student) models. This fine-tuning relies on the fact that DL models can be decomposed into classifiers and feature extractors, and a line of studies showed that the same feature extractors can be used to train classifiers on multiple tasks. Furthermore, recent studies proposed multiple algorithms that can fine-tune teacher models' feature extractors to train student models more efficiently. We note that regardless of the fine-tuning of feature extractors, the classifiers of student models are trained with the final outputs of feature extractors (i.e., the outputs of penultimate layers). However, a recent study suggested that feature maps in ResNets across layers could be functionally equivalent, raising the possibility that feature maps inside the feature extractors can also be used to train student models' classifiers. Inspired by this study, we tested whether feature maps in the hidden layers of the teacher models can be used to improve the student models' accuracy (i.e., TL's efficiency). Specifically, we developed 'adaptive transfer learning (ATL)', which can choose an optimal set of feature maps for TL, and tested it in the few-shot learning setting. Our empirical evaluations suggest that ATL can help DL models learn more efficiently, especially when available examples are limited.
Submitted 21 November, 2021;
originally announced November 2021.
-
One Representation to Rule Them All: Identifying Out-of-Support Examples in Few-shot Learning with Generic Representations
Authors:
Henry Kvinge,
Scott Howland,
Nico Courts,
Lauren A. Phillips,
John Buckheit,
Zachary New,
Elliott Skomski,
Jung H. Lee,
Sandeep Tiwari,
Jessica Hibler,
Courtney D. Corley,
Nathan O. Hodas
Abstract:
The field of few-shot learning has made remarkable strides in developing powerful models that can operate in the small data regime. Nearly all of these methods assume every unlabeled instance encountered will belong to a handful of known classes for which one has examples. This can be problematic for real-world use cases where one routinely finds 'none-of-the-above' examples. In this paper we describe this challenge of identifying what we term 'out-of-support' (OOS) examples. We describe how this problem is subtly different from out-of-distribution detection and describe a new method of identifying OOS examples within the Prototypical Networks framework using a fixed point which we call the generic representation. We show that our method outperforms other existing approaches in the literature as well as other approaches that we propose in this paper. Finally, we investigate how the use of such a generic point affects the geometry of a model's feature space.
Submitted 2 June, 2021;
originally announced June 2021.
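As a rough illustration of the setting described in the abstract above, the sketch below classifies a query by its nearest class prototype (the mean of the support embeddings, as in Prototypical Networks) and flags it as out-of-support when a fixed extra point is closer than every prototype. The abstract does not say how the generic representation is actually used, so the decision rule, the function names, and the toy two-dimensional embeddings are all assumptions and should not be read as the authors' method.

```python
def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify_with_generic(query, support, generic):
    """Return the nearest class label, or None for 'none of the above'.

    support: dict mapping label -> list of embedded support examples.
    generic: a fixed point in embedding space (hypothetical stand-in for
             the 'generic representation' named in the abstract).
    """
    # Class prototypes are the means of the support embeddings.
    prototypes = {label: mean_vector(vecs) for label, vecs in support.items()}
    best = min(prototypes, key=lambda label: squared_distance(query, prototypes[label]))

    # Assumed rule: out-of-support if the generic point beats every prototype.
    if squared_distance(query, generic) < squared_distance(query, prototypes[best]):
        return None
    return best


# Hypothetical 2-way, 2-shot episode with hand-picked embeddings.
support = {
    "cat": [[1.0, 0.0], [1.0, 0.2]],
    "dog": [[0.0, 1.0], [0.2, 1.0]],
}
generic = [0.0, 0.0]
print(classify_with_generic([0.9, 0.1], support, generic))
print(classify_with_generic([0.1, 0.05], support, generic))
```

The first query sits next to the "cat" prototype and is labeled accordingly. The second sits nearer the fixed point than either prototype, so under the assumed rule it is reported as out-of-support.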
-
Prototypical Region Proposal Networks for Few-Shot Localization and Classification
Authors:
Elliott Skomski,
Aaron Tuor,
Andrew Avila,
Lauren Phillips,
Zachary New,
Henry Kvinge,
Courtney D. Corley,
Nathan Hodas
Abstract:
Recently proposed few-shot image classification methods have generally focused on use cases where the objects to be classified are the central subject of images. Despite success on benchmark vision datasets aligned with this use case, these methods typically fail on use cases involving densely-annotated, busy images: images common in the wild where objects of relevance are not the central subject, instead appearing potentially occluded, small, or among other incidental objects belonging to other classes of potential interest. To localize relevant objects, we employ a prototype-based few-shot segmentation model which compares the encoded features of unlabeled query images with support class centroids to produce region proposals indicating the presence and location of support set classes in a query image. These region proposals are then used as additional conditioning input to few-shot image classifiers. We develop a framework to unify the two stages (segmentation and classification) into an end-to-end classification model -- PRoPnet -- and empirically demonstrate that our methods improve accuracy on image datasets with natural scenes containing multiple object classes.
Submitted 8 April, 2021;
originally announced April 2021.
-
Fuzzy Simplicial Networks: A Topology-Inspired Model to Improve Task Generalization in Few-shot Learning
Authors:
Henry Kvinge,
Zachary New,
Nico Courts,
Jung H. Lee,
Lauren A. Phillips,
Courtney D. Corley,
Aaron Tuor,
Andrew Avila,
Nathan O. Hodas
Abstract:
Deep learning has shown great success in settings with massive amounts of data but has struggled when data is limited. Few-shot learning algorithms, which seek to address this limitation, are designed to generalize well to new tasks with limited data. Typically, models are evaluated on unseen classes and datasets that are defined by the same fundamental task as they are trained for (e.g. category membership). One can also ask how well a model can generalize to fundamentally different tasks within a fixed dataset (for example: moving from category membership to tasks that involve detecting object orientation or quantity). To formalize this kind of shift we define a notion of "independence of tasks" and identify three new sets of labels for established computer vision datasets that test a model's ability to generalize to tasks which draw on orthogonal attributes in the data. We use these datasets to investigate the failure modes of metric-based few-shot models. Based on our findings, we introduce a new few-shot model called Fuzzy Simplicial Networks (FSN) which leverages a construction from topology to more flexibly represent each class from limited data. In particular, FSN models can not only form multiple representations for a given class but can also begin to capture the low-dimensional structure which characterizes class manifolds in the encoded space of deep networks. We show that FSN outperforms state-of-the-art models on the challenging tasks we introduce in this paper while remaining competitive on standard few-shot benchmarks.
Submitted 23 September, 2020;
originally announced September 2020.
-
Explanatory Masks for Neural Network Interpretability
Authors:
Lawrence Phillips,
Garrett Goh,
Nathan Hodas
Abstract:
Neural network interpretability is a vital component for applications across a wide variety of domains. In such cases it is often useful to analyze a network which has already been trained for its specific purpose. In this work, we develop a method to produce explanation masks for pre-trained networks. The mask localizes the most important aspects of each input for prediction of the original network. Masks are created by a secondary network whose goal is to create as small an explanation as possible while still preserving the predictive accuracy of the original network. We demonstrate the applicability of our method for image classification with CNNs, sentiment analysis with RNNs, and chemical property prediction with mixed CNN/RNN architectures.
Submitted 15 November, 2019;
originally announced November 2019.
-
Metric-Based Few-Shot Learning for Video Action Recognition
Authors:
Chris Careaga,
Brian Hutchinson,
Nathan Hodas,
Lawrence Phillips
Abstract:
In the few-shot scenario, a learner must effectively generalize to unseen classes given a small support set of labeled examples. While a relatively large amount of research has gone into few-shot learning for image classification, little work has been done on few-shot video classification. In this work, we address the task of few-shot video action recognition with a set of two-stream models. We evaluate the performance of a set of convolutional and recurrent neural network video encoder architectures used in conjunction with three popular metric-based few-shot algorithms. We train and evaluate using a few-shot split of the Kinetics 600 dataset. Our experiments confirm the importance of the two-stream setup, and find prototypical networks and pooled long short-term memory network embeddings to give the best performance as few-shot method and video encoder, respectively. For a 5-shot 5-way task, this setup obtains 84.2% accuracy on the test set and 59.4% on a special "challenge" test set, composed of highly confusable classes.
Submitted 14 September, 2019;
originally announced September 2019.
-
The Outer Product Structure of Neural Network Derivatives
Authors:
Craig Bakker,
Michael J. Henry,
Nathan O. Hodas
Abstract:
In this paper, we show that feedforward and recurrent neural networks exhibit an outer product derivative structure but that convolutional neural networks do not. This structure makes it possible to use higher-order information without needing approximations or infeasibly large amounts of memory, and it may also provide insights into the geometry of neural network optima. The ability to easily access these derivatives also suggests a new, geometric approach to regularization. We then discuss how this structure could be used to improve training methods, increase network robustness and generalizability, and inform network compression methods.
Submitted 8 October, 2018;
originally announced October 2018.
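The simplest instance of the outer product structure mentioned in the abstract above is the standard first-order result for one dense layer: with z = W x and loss 0.5 * ||z - t||^2, the gradient with respect to W is the outer product of the error (z - t) with the input x. The sketch below checks that identity against finite differences. It covers only this textbook single-layer case, not the higher-order structure the paper analyzes, and all names and values are illustrative assumptions.

```python
def matvec(W, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def loss(W, x, t):
    """Squared-error loss 0.5 * ||W x - t||^2 for a single linear layer."""
    z = matvec(W, x)
    return 0.5 * sum((zi - ti) ** 2 for zi, ti in zip(z, t))


def outer_product_grad(W, x, t):
    """Analytic gradient dL/dW = (W x - t) x^T, an outer product."""
    delta = [zi - ti for zi, ti in zip(matvec(W, x), t)]
    return [[d * xi for xi in x] for d in delta]


def finite_difference_grad(W, x, t, eps=1e-6):
    """Central-difference estimate of dL/dW, one weight at a time."""
    grad = []
    for i, row in enumerate(W):
        grad_row = []
        for j in range(len(row)):
            W_plus = [r[:] for r in W]
            W_minus = [r[:] for r in W]
            W_plus[i][j] += eps
            W_minus[i][j] -= eps
            grad_row.append((loss(W_plus, x, t) - loss(W_minus, x, t)) / (2 * eps))
        grad.append(grad_row)
    return grad


# Hypothetical 2x3 layer, input, and target.
W = [[0.5, -1.0, 2.0], [1.5, 0.0, -0.5]]
x = [1.0, 2.0, 3.0]
t = [0.0, 1.0]

analytic = outer_product_grad(W, x, t)
numeric = finite_difference_grad(W, x, t)
max_gap = max(abs(a - n) for ra, rn in zip(analytic, numeric) for a, n in zip(ra, rn))
print(analytic, max_gap)
```

Storing the two factors (an error vector and an input vector) takes far less memory than storing the full gradient matrix, which is the kind of saving the abstract alludes to for higher-order information.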
-
Model of Cognitive Dynamics Predicts Performance on Standardized Tests
Authors:
Nathan O. Hodas,
Jacob Hunter,
Stephen J. Young,
Kristina Lerman
Abstract:
In the modern knowledge economy, success demands sustained focus and high cognitive performance. Research suggests that human cognition is linked to a finite resource, and upon its depletion, cognitive functions such as self-control and decision-making may decline. While fatigue, among other factors, affects human activity, how cognitive performance evolves during extended periods of focus remains poorly understood. By analyzing performance of a large cohort answering practice standardized test questions online, we show that accuracy and learning decline as the test session progresses and recover following prolonged breaks. To explain these findings, we hypothesize that answering questions consumes some finite cognitive resources on which performance depends, but these resources recover during breaks between test questions. We propose a dynamic mechanism of the consumption and recovery of these resources and show that it explains empirical findings and predicts performance better than alternative hypotheses. While further controlled experiments are needed to identify the physiological origin of these phenomena, our work highlights the potential of empirical analysis of large-scale human behavior data to explore cognitive behavior.
Submitted 7 September, 2018;
originally announced September 2018.
-
Doing the impossible: Why neural networks can be trained at all
Authors:
Nathan O. Hodas,
Panos Stinis
Abstract:
As deep neural networks grow in size, from thousands to millions to billions of weights, the performance of those networks becomes limited by our ability to accurately train them. A common naive question arises: if we have a system with billions of degrees of freedom, don't we also need billions of samples to train it? Of course, the success of deep learning indicates that reliable models can be learned with reasonable amounts of data. Similar questions arise in protein folding, spin glasses and biological neural networks. With effectively infinite potential folding/spin/wiring configurations, how does the system find the precise arrangement that leads to useful and robust results? Simple sampling of the possible configurations until an optimal one is reached is not a viable option even if one waited for the age of the universe. On the contrary, there appears to be a mechanism in the above phenomena that forces them to achieve configurations that live on a low-dimensional manifold, avoiding the curse of dimensionality. In the current work we use the concept of mutual information between successive layers of a deep neural network to elucidate this mechanism and suggest possible ways of exploiting it to accelerate training. We show that adding structure to the neural network that enforces higher mutual information between layers speeds training and leads to more accurate results. High mutual information between layers implies that the effective number of free parameters is exponentially smaller than the raw number of tunable weights.
Submitted 27 May, 2018; v1 submitted 13 May, 2018;
originally announced May 2018.
-
Sharkzor: Interactive Deep Learning for Image Triage, Sort and Summary
Authors:
Meg Pirrung,
Nathan Hilliard,
Artëm Yankov,
Nancy O'Brien,
Paul Weidert,
Courtney D Corley,
Nathan O Hodas
Abstract:
Sharkzor is a web application for machine-learning-assisted image sort and summary. Deep learning algorithms are leveraged to infer, augment, and automate the user's mental model. Initially, images uploaded by the user are spread out on a canvas. The user then interacts with the images to impute their mental model into the application's algorithmic underpinnings. Methods of interaction within Sharkzor's user interface and user experience support three primary user tasks: triage, organize, and automate. The user triages the large pile of overlapping images by moving images of interest into proximity. The user then organizes said images into meaningful groups. After interacting with the images and groups, deep learning helps to automate the user's interactions. The loop of interaction, automation, and response by the user allows the system to quickly make sense of large amounts of data.
Submitted 14 February, 2018;
originally announced February 2018.
-
Few-Shot Learning with Metric-Agnostic Conditional Embeddings
Authors:
Nathan Hilliard,
Lawrence Phillips,
Scott Howland,
Artëm Yankov,
Courtney D. Corley,
Nathan O. Hodas
Abstract:
Learning high quality class representations from few examples is a key problem in metric-learning approaches to few-shot learning. To accomplish this, we introduce a novel architecture where class representations are conditioned for each few-shot trial based on a target image. We also deviate from traditional metric-learning approaches by training a network to perform comparisons between classes rather than relying on a static metric comparison. This allows the network to decide what aspects of each class are important for the comparison at hand. We find that this flexible architecture works well in practice, achieving state-of-the-art performance on the Caltech-UCSD birds fine-grained classification task.
Submitted 12 February, 2018;
originally announced February 2018.
-
Using Rule-Based Labels for Weak Supervised Learning: A ChemNet for Transferable Chemical Property Prediction
Authors:
Garrett B. Goh,
Charles Siegel,
Abhinav Vishnu,
Nathan O. Hodas
Abstract:
With access to large datasets, deep neural networks (DNN) have achieved human-level accuracy in image and speech recognition tasks. However, in chemistry, data is inherently small and fragmented. In this work, we develop an approach of using rule-based knowledge for training ChemNet, a transferable and generalizable deep neural network for chemical property prediction that learns in a weakly supervised manner from large unlabeled chemical databases. When coupled with transfer learning approaches to predict chemical properties on other, smaller datasets that it was not originally trained on, we show that ChemNet's accuracy outperforms contemporary DNN models that were trained using conventional supervised learning. Furthermore, we demonstrate that the ChemNet pre-training approach is equally effective on both CNN (Chemception) and RNN (SMILES2vec) models, indicating that this approach is network-architecture agnostic and is effective across multiple data modalities. Our results indicate that a pre-trained ChemNet that incorporates chemistry domain knowledge enables the development of generalizable neural networks for more accurate prediction of novel chemical properties.
Submitted 18 March, 2018; v1 submitted 7 December, 2017;
originally announced December 2017.
-
SMILES2Vec: An Interpretable General-Purpose Deep Neural Network for Predicting Chemical Properties
Authors:
Garrett B. Goh,
Nathan O. Hodas,
Charles Siegel,
Abhinav Vishnu
Abstract:
Chemical databases store information in text representations, and the SMILES format is a universal standard used in many cheminformatics software packages. Encoded in each SMILES string is structural information that can be used to predict complex chemical properties. In this work, we develop SMILES2vec, a deep RNN that automatically learns features from SMILES to predict chemical properties, without the need for additional explicit feature engineering. Using Bayesian optimization methods to tune the network architecture, we show that an optimized SMILES2vec model can serve as a general-purpose neural network for predicting distinct chemical properties including toxicity, activity, solubility and solvation energy, while also outperforming contemporary MLP neural networks that use engineered features. Furthermore, we demonstrate proof-of-concept of interpretability by developing an explanation mask that localizes on the most important characters used in making a prediction. When tested on the solubility dataset, it identified specific parts of a chemical that are consistent with established first-principles knowledge with an accuracy of 88%. Our work demonstrates that neural networks can learn technically accurate chemical concepts and provide state-of-the-art accuracy, making interpretable deep neural networks a useful tool of relevance to the chemical industry.
Submitted 18 March, 2018; v1 submitted 5 December, 2017;
originally announced December 2017.
-
How Much Chemistry Does a Deep Neural Network Need to Know to Make Accurate Predictions?
Authors:
Garrett B. Goh,
Charles Siegel,
Abhinav Vishnu,
Nathan O. Hodas,
Nathan Baker
Abstract:
The meteoric rise of deep learning models in computer vision research, having achieved human-level accuracy in image recognition tasks, is firm evidence of the impact of representation learning of deep neural networks. In the chemistry domain, recent advances have also led to the development of similar CNN models, such as Chemception, that are trained to predict chemical properties using images of molecular drawings. In this work, we investigate the effects of systematically removing and adding localized domain-specific information to the image channels of the training data. By augmenting images with only three additional pieces of basic information, and without introducing any architectural changes, we demonstrate that an augmented Chemception (AugChemception) outperforms the original model in the prediction of toxicity, activity, and solvation free energy. Then, by altering the information content in the images, and examining the resulting model's performance, we also identify two distinct learning patterns in predicting toxicity/activity as compared to solvation free energy. These patterns suggest that Chemception is learning about its tasks in a manner that is consistent with established knowledge. Thus, our work demonstrates that advanced chemical knowledge is not a prerequisite for deep learning models to accurately predict complex chemical properties.
Submitted 18 March, 2018; v1 submitted 5 October, 2017;
originally announced October 2017.
-
Bounded Rationality in Scholarly Knowledge Discovery
Authors:
Kristina Lerman,
Nathan Hodas,
Hao Wu
Abstract:
In an information-rich world, people's time and attention must be divided among rapidly changing information sources and the diverse tasks demanded of them. How people decide which of the many sources, such as scientific articles or patents, to read and use in their own work affects dissemination of scholarly knowledge and adoption of innovation. We analyze the choices people make about what information to propagate on the citation networks of Physical Review journals, US patents and legal opinions. We observe regularities in behavior consistent with human bounded rationality: rather than evaluate all available choices, people rely on simple cognitive heuristics to decide what information to attend to. We demonstrate that these heuristics bias choices, so that people preferentially propagate information that is easier to discover, often because it is newer or more popular. However, we do not find evidence that popular sources help to amplify the spread of information beyond making it more salient. Our paper provides novel evidence of the critical role that bounded rationality plays in the decisions to allocate attention in social communication.
Submitted 30 September, 2017;
originally announced October 2017.
-
Learning Deep Neural Network Representations for Koopman Operators of Nonlinear Dynamical Systems
Authors:
Enoch Yeung,
Soumya Kundu,
Nathan Hodas
Abstract:
The Koopman operator has recently garnered much attention for its value in dynamical systems analysis and data-driven model discovery. However, its application has been hindered by the computational complexity of extended dynamic mode decomposition; this requires a combinatorially large basis set to adequately describe many nonlinear systems of interest, e.g. cyber-physical infrastructure systems, biological networks, social systems, and fluid dynamics. Often the dictionaries generated for these problems are manually curated, requiring domain-specific knowledge and painstaking tuning. In this paper we introduce a deep learning framework for learning Koopman operators of nonlinear dynamical systems. We show that this novel method automatically selects efficient deep dictionaries, outperforming state-of-the-art methods. We benchmark this method on partially observed nonlinear systems, including the glycolytic oscillator, and show that it is able to quantitatively predict 100 steps into the future, using only a single timepoint, and to capture qualitative oscillatory behavior 400 steps into the future.
Submitted 17 November, 2017; v1 submitted 22 August, 2017;
originally announced August 2017.
-
Dynamic Input Structure and Network Assembly for Few-Shot Learning
Authors:
Nathan Hilliard,
Nathan O. Hodas,
Courtney D. Corley
Abstract:
The ability to learn from a small number of examples has been a difficult problem in machine learning since its inception. While methods have succeeded with large amounts of training data, research has been underway in how to accomplish similar performance with fewer examples, known as one-shot or more generally few-shot learning. This technique has been shown to have promising performance, but in practice requires fixed-size inputs making it impractical for production systems where class sizes can vary. This impedes training and the final utility of few-shot learning systems. This paper describes an approach to constructing and training a network that can handle arbitrary example sizes dynamically as the system is used.
Submitted 22 August, 2017;
originally announced August 2017.
-
Chemception: A Deep Neural Network with Minimal Chemistry Knowledge Matches the Performance of Expert-developed QSAR/QSPR Models
Authors:
Garrett B. Goh,
Charles Siegel,
Abhinav Vishnu,
Nathan O. Hodas,
Nathan Baker
Abstract:
In the last few years, we have seen the transformative impact of deep learning in many applications, particularly in speech recognition and computer vision. Inspired by Google's Inception-ResNet deep convolutional neural network (CNN) for image classification, we have developed "Chemception", a deep CNN for the prediction of chemical properties, using just the images of 2D drawings of molecules. We develop Chemception without providing any additional explicit chemistry knowledge, such as basic concepts like periodicity, or advanced features like molecular descriptors and fingerprints. We then show how Chemception can serve as a general-purpose neural network architecture for predicting toxicity, activity, and solvation properties when trained on a modest database of 600 to 40,000 compounds. When compared to multi-layer perceptron (MLP) deep neural networks trained with ECFP fingerprints, Chemception slightly outperforms in activity and solvation prediction and slightly underperforms in toxicity prediction. Having matched the performance of expert-developed QSAR/QSPR deep learning models, our work demonstrates the plausibility of using deep neural networks to assist in computational chemistry research, where the feature engineering process is performed primarily by a deep learning algorithm.
Submitted 20 June, 2017;
originally announced June 2017.
-
Using Social Media to Predict the Future: A Systematic Literature Review
Authors:
Lawrence Phillips,
Chase Dowling,
Kyle Shaffer,
Nathan Hodas,
Svitlana Volkova
Abstract:
Social media (SM) data provides a vast record of humanity's everyday thoughts, feelings, and actions at a resolution previously unimaginable. Because user behavior on SM is a reflection of events in the real world, researchers have realized they can use SM in order to forecast, making predictions about the future. The advantage of SM data is its relative ease of acquisition, large quantity, and ability to capture socially relevant information, which may be difficult to gather from other data sources. Promising results exist across a wide variety of domains, but one will find little consensus regarding best practices in either methodology or evaluation. In this systematic review, we examine relevant literature over the past decade, tabulate mixed results across a number of scientific disciplines, and identify common pitfalls and best practices. We find that SM forecasting is limited by data biases, noisy data, lack of generalizable results, a lack of domain-specific theory, and underlying complexity in many prediction tasks. But despite these shortcomings, recurring findings and promising results continue to galvanize researchers and demand continued investigation. Based on the existing literature, we identify research practices which lead to success, citing specific examples in each case and making recommendations for best practices. These recommendations will help researchers take advantage of the exciting possibilities offered by SM platforms.
Submitted 19 June, 2017;
originally announced June 2017.
-
Understanding Cognitive Depletion in Novice NMR Analysts
Authors:
Lyndsey Franklin,
Kyungsik Han,
Zhuanyi Huang,
Dustin Arendt,
Nathan Hodas
Abstract:
We present the results of a user study with novice NMR analysts (N=19) involving a gamified simulation of the NMR analysis process. Participants solved randomly generated spectrum puzzles for up to three hours. We used eye tracking, event logging, and observations to record symptoms of cognitive depletion while participants worked. Analysis of the results indicates that we can detect both signs of learning and signs of cognitive depletion in participants over the course of the three hours. Participants' break strategies did not predict or reflect game scores, but certain symptoms appear predictive of breaks.
Submitted 6 June, 2017;
originally announced June 2017.
-
Assessing the Linguistic Productivity of Unsupervised Deep Neural Networks
Authors:
Lawrence Phillips,
Nathan Hodas
Abstract:
Increasingly, cognitive scientists have demonstrated interest in applying tools from deep learning. One use for deep learning is in language acquisition, where it is useful to know if a linguistic phenomenon can be learned through domain-general means. To assess whether unsupervised deep learning is appropriate, we first pose a smaller question: Can unsupervised neural networks apply linguistic rules productively, using them in novel situations? We draw from the literature on determiner/noun productivity by training an unsupervised autoencoder network and measuring its ability to combine nouns with determiners. Our simple autoencoder creates combinations it has not previously encountered and produces a degree of overlap matching adults. While this preliminary work does not provide conclusive evidence for productivity, it warrants further investigation with more complex models. Further, this work helps lay the foundations for future collaboration between the deep learning and cognitive science communities.
Submitted 6 June, 2017;
originally announced June 2017.
-
Cognitive Depletion in the Wild: a Case Study of NMR Spectroscopy Analysis
Authors:
Lyndsey Franklin,
Nathan Hodas
Abstract:
NMR spectroscopy analysis is a detail-oriented analytic feat that typically requires specific domain expertise and hours of concentration. This work presents an ethnographic-style study of this analysis process in the context of evaluating the symptoms of cognitive depletion. The repeated, non-trivial decisions required by and the time-consuming nature of NMR spectroscopy analysis make it an ideal, real-world scenario to study the symptoms of cognitive depletion, its effect on workflow and performance, and potential strategies for mitigating its deleterious effects.
Submitted 5 June, 2017;
originally announced June 2017.
-
Will Break for Productivity: Generalized Symptoms of Cognitive Depletion
Authors:
Lyndsey Franklin,
Kristina Lerman,
Nathan Hodas
Abstract:
In this work, we address the symptoms of cognitive depletion as they relate to generalized knowledge workers. We unify previous findings within a single analytical model of cognitive depletion. Our purpose is to develop a model that will help us predict when a person has reached a sufficient state of cognitive depletion such that taking a break or some other restorative action will benefit both his or her own wellbeing and the quality of his or her performance. We provide a definition of each symptom in our model as well as the effect it would have on a knowledge worker's ability to work productively. We discuss methods to detect each symptom that do not require self assessment. Understanding symptoms of cognitive depletion provides the ability to support human knowledge workers by reducing the stress involved with cognitive and work overload while maintaining or improving the quality of their performance.
Submitted 5 June, 2017;
originally announced June 2017.
-
Deep Learning for Computational Chemistry
Authors:
Garrett B. Goh,
Nathan O. Hodas,
Abhinav Vishnu
Abstract:
The rise and fall of artificial neural networks is well documented in the scientific literature of both computer science and computational chemistry. Yet almost two decades later, we are now seeing a resurgence of interest in deep learning, a machine learning algorithm based on multilayer neural networks. Within the last few years, we have seen the transformative impact of deep learning in many domains, particularly in speech recognition and computer vision, to the extent that the majority of expert practitioners in those fields are now regularly eschewing prior established models in favor of deep learning models. In this review, we provide an introductory overview of the theory of deep neural networks and their unique properties that distinguish them from traditional machine learning algorithms used in cheminformatics. By providing an overview of the variety of emerging applications of deep neural networks, we highlight their ubiquity and broad applicability to a wide range of challenges in the field, including QSAR, virtual screening, protein structure prediction, quantum chemistry, materials design and property prediction. In reviewing the performance of deep neural networks, we observed consistent outperformance against state-of-the-art non-neural-network models across disparate research topics, and deep neural network based models often exceeded the "glass ceiling" expectations of their respective tasks. Coupled with the maturity of GPU-accelerated computing for training deep neural networks and the exponential growth of chemical data on which to train these networks, we anticipate that deep learning algorithms will be a valuable tool for computational chemistry.
Submitted 16 January, 2017;
originally announced January 2017.
-
Mutual information for fitting deep nonlinear models
Authors:
Jacob S. Hunter,
Nathan O. Hodas
Abstract:
Deep nonlinear models pose a challenge for fitting parameters due to lack of knowledge of the hidden layer and the potentially non-affine relation of the initial and observed layers. In the present work we investigate the use of information theoretic measures such as mutual information and Kullback-Leibler (KL) divergence as objective functions for fitting such models without knowledge of the hidden layer. We investigate one model as a proof of concept and one application of cognitive performance. We further investigate the use of optimizers with these methods. Mutual information is largely successful as an objective, depending on the parameters. KL divergence is found to be similarly successful, given some knowledge of the statistics of the hidden layer.
Submitted 17 December, 2016;
originally announced December 2016.
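As a point of reference for the two quantities named in the abstract above, the following is a minimal sketch of their standard discrete definitions (in nats). It is not the authors' fitting procedure; the function names `kl_divergence` and `mutual_information` and the toy distributions are assumptions introduced only for illustration.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as equal-length lists.

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute nothing.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mutual_information(joint):
    """I(X; Y) for a joint distribution given as a list of rows joint[x][y]."""
    px = [sum(row) for row in joint]          # marginal of X
    py = [sum(col) for col in zip(*joint)]    # marginal of Y
    return sum(
        pxy * math.log(pxy / (px[i] * py[j]))
        for i, row in enumerate(joint)
        for j, pxy in enumerate(row)
        if pxy > 0
    )

# Hypothetical toy distributions: X and Y independent, then perfectly coupled.
independent = [[0.25, 0.25], [0.25, 0.25]]
coupled = [[0.5, 0.0], [0.0, 0.5]]

mi_independent = mutual_information(independent)  # zero for independent variables
mi_coupled = mutual_information(coupled)          # log(2) for a shared fair bit
```

Mutual information is itself the KL divergence between the joint distribution and the product of its marginals, which is why both functions share the same log-ratio form.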
-
Beyond Fine Tuning: A Modular Approach to Learning on Small Data
Authors:
Ark Anderson,
Kyle Shaffer,
Artem Yankov,
Court D. Corley,
Nathan O. Hodas
Abstract:
In this paper we present a technique to train neural network models on small amounts of data. Current methods for training neural networks on small amounts of rich data typically rely on strategies such as fine-tuning a pre-trained neural network or the use of domain-specific hand-engineered features. Here we take the approach of treating network layers, or entire networks, as modules and combine pre-trained modules with untrained modules, to learn the shift in distributions between data sets. The central impact of using a modular approach comes from adding new representations to a network, as opposed to replacing representations via fine-tuning. Using this technique, we are able to surpass results using standard fine-tuning transfer learning approaches, and we are also able to significantly increase performance over such approaches when using smaller amounts of data.
Submitted 5 November, 2016;
originally announced November 2016.
-
How a user's personality influences content engagement in social media
Authors:
Nathan O. Hodas,
Ryan Butner,
Court Corley
Abstract:
Social media presents an opportunity for people to share content that they find to be significant, funny, or notable. No single piece of content will appeal to all users, but are there systematic variations between users that can help us better understand information propagation? We conducted an experiment exploring social media usage during disaster scenarios. Combining electroencephalogram (EEG), personality surveys, and prompts to share social media, we show how personality not only drives willingness to engage with social media but also helps to determine what type of content users find compelling. As expected, extroverts are more likely to share content. In contrast, one of our central results is that individuals with depressive personalities are the most likely cohort to share informative content, like news or alerts. Because personality and mood will generally be highly correlated between friends via homophily, our results may be an important factor in understanding social contagion.
Submitted 1 September, 2016;
originally announced September 2016.
-
Adding Semantic Information into Data Models by Learning Domain Expertise from User Interaction
Authors:
Nathan Oken Hodas,
Alex Endert
Abstract:
Interactive visual analytic systems enable users to discover insights from complex data. Users can express and test hypotheses via user interaction, leveraging their domain expertise and prior knowledge to guide and steer the analytic models in the system. For example, semantic interaction techniques enable systems to learn from the user's interactions and steer the underlying analytic models based on the user's analytical reasoning. However, an open challenge is how to not only steer models based on the dimensions or features of the data, but also add dimensions or attributes to the data based on the domain expertise of the user. In this paper, we present a technique for inferring and appending dimensions onto the dataset based on the prior expertise of the user expressed via user interactions. Our technique enables users to directly manipulate a spatial organization of data, from which the dimensions of the data are weighted and new dimensions are created to represent the prior knowledge the user brings to the system. We describe this technique and demonstrate its utility via a use case.
Submitted 6 April, 2016;
originally announced April 2016.
-
Network Weirdness: Exploring the Origins of Network Paradoxes
Authors:
Farshad Kooti,
Nathan O. Hodas,
Kristina Lerman
Abstract:
Social networks have many counter-intuitive properties, including the "friendship paradox" that states, on average, your friends have more friends than you do. Recently, a variety of other paradoxes were demonstrated in online social networks. This paper explores the origins of these network paradoxes. Specifically, we ask whether they arise from mathematical properties of the networks or whether they have a behavioral origin. We show that sampling from heavy-tailed distributions always gives rise to a paradox in the mean, but not the median. We propose a strong form of network paradoxes, based on utilizing the median, and validate it empirically using data from two online social networks. Specifically, we show that for any user the majority of the user's friends and followers have more friends, followers, etc. than the user, and that this cannot be explained by statistical properties of sampling. Next, we explore the behavioral origins of the paradoxes by using the shuffle test to remove correlations between node degrees and attributes. We find that paradoxes for the mean persist in the shuffled network, but not for the median. We demonstrate that strong paradoxes arise due to the assortativity of user attributes, including degree, and correlation between degree and attribute.
Submitted 27 March, 2014;
originally announced March 2014.
-
The Simple Rules of Social Contagion
Authors:
Nathan O. Hodas,
Kristina Lerman
Abstract:
It is commonly believed that information spreads between individuals like a pathogen, with each exposure by an informed friend potentially resulting in a naive individual becoming infected. However, empirical studies of social media suggest that individual response to repeated exposure to information is significantly more complex than the prediction of the pathogen model. As a proxy for intervention experiments, we compare user responses to multiple exposures on two different social media sites, Twitter and Digg. We show that the position of the exposing messages on the user-interface strongly affects social contagion. Accounting for this visibility significantly simplifies the dynamics of social contagion. The likelihood an individual will spread information increases monotonically with exposure, while explicit feedback about how many friends have previously spread it increases the likelihood of a response. We apply our model to real-time forecasting of user behavior.
Submitted 22 August, 2013;
originally announced August 2013.
-
Attention and Visibility in an Information Rich World
Authors:
Nathan O. Hodas,
Kristina Lerman
Abstract:
As the rate of content production grows, we must make a staggering number of daily decisions about what information is worth acting on. For any flourishing online social media system, users can barely keep up with the new content shared by friends. How does the user-interface design help or hinder users' ability to find interesting content? We analyze the choices people make about which information to propagate on the social media sites Twitter and Digg. We observe regularities in behavior which can be attributed directly to cognitive limitations of humans, resulting from the different visibility policies of each site. We quantify how people divide their limited attention among competing sources of information, and we show how the user-interface design can mediate information spread.
Submitted 17 July, 2013;
originally announced July 2013.
-
Friendship Paradox Redux: Your Friends Are More Interesting Than You
Authors:
Nathan O. Hodas,
Farshad Kooti,
Kristina Lerman
Abstract:
Feld's friendship paradox states that "your friends have more friends than you, on average." This paradox arises because extremely popular people, despite being rare, are overrepresented when averaging over friends. Using a sample of the Twitter firehose, we confirm that the friendship paradox holds for >98% of Twitter users. Because of the directed nature of the follower graph on Twitter, we are further able to confirm more detailed forms of the friendship paradox: everyone you follow or who follows you has more friends and followers than you. This is likely caused by a correlation we demonstrate between Twitter activity, number of friends, and number of followers. In addition, we discover two new paradoxes: the virality paradox that states "your friends receive more viral content than you, on average," and the activity paradox, which states "your friends are more active than you, on average." The latter paradox is important in regulating online communication. It may result in users having difficulty maintaining optimal incoming information rates, because following additional users causes the volume of incoming tweets to increase super-linearly. While users may compensate for increased information flow by increasing their own activity, users become information overloaded when they receive more information than they are able or willing to process. We compare the average size of cascades that are sent and received by overloaded and underloaded users, and we show that overloaded users post and receive larger cascades and are poor detectors of small cascades.
Submitted 11 April, 2013;
originally announced April 2013.
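The averaging effect behind the friendship paradox in the abstract above can be seen on a very small example. The sketch below assumes an undirected toy graph (a hub with five neighbors plus one extra edge); it is not the Twitter follower graph, and the function name `friendship_paradox` and the edge list are hypothetical.

```python
from collections import defaultdict

def friendship_paradox(edges):
    """Return (mean degree, mean degree of a friend, fraction of nodes in paradox).

    `edges` is a list of undirected (u, v) pairs with no isolated nodes.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    degree = {node: len(nbrs) for node, nbrs in neighbors.items()}

    mean_degree = sum(degree.values()) / len(degree)
    # Averaging over (node, friend) pairs counts each node once per friend,
    # so a node of degree d contributes d * d to the numerator.
    mean_friend_degree = sum(d * d for d in degree.values()) / sum(degree.values())
    # Nodes whose friends have, on average, more friends than they do.
    in_paradox = sum(
        1
        for node, nbrs in neighbors.items()
        if sum(degree[m] for m in nbrs) / len(nbrs) > degree[node]
    )
    return mean_degree, mean_friend_degree, in_paradox / len(degree)

# Hypothetical toy graph: node 0 is a hub; nodes 1 and 2 also know each other.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (1, 2)]
mean_degree, mean_friend_degree, fraction = friendship_paradox(edges)
```

On this graph the mean degree is 2, while the mean degree of a friend is 3, and every node except the hub has friends who are on average better connected than it is. The hub is rare but is counted once for each of its five friends, which is the overrepresentation the abstract describes.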
-
How Visibility and Divided Attention Constrain Social Contagion
Authors:
Nathan Oken Hodas,
Kristina Lerman
Abstract:
How far and how fast does information spread in social media? Researchers have recently examined a number of factors that affect information diffusion in online social networks, including: the novelty of information, users' activity levels, who they pay attention to, and how they respond to friends' recommendations. Using URLs as markers of information, we carry out a detailed study of retweeting, the primary mechanism by which information spreads on the Twitter follower graph. Our empirical study examines how users respond to an incoming stimulus, i.e., a tweet (message) from a friend, and reveals that the "principle of least effort" combined with limited attention plays a dominant role in retweeting behavior. Specifically, we observe that users retweet information when it is most visible, such as when it is near the top of their Twitter stream. Moreover, our measurements quantify how a user's limited attention is divided among incoming tweets, providing novel evidence that highly connected individuals are less likely to propagate an arbitrary tweet. Our study indicates that the finite ability to process incoming information constrains social contagion, and we conclude that rapid decay of visibility is the primary barrier to information propagation online.
Submitted 10 September, 2012; v1 submitted 11 May, 2012;
originally announced May 2012.
-
The Quality of Oscillations in Overdamped Networks
Authors:
Nathan O. Hodas
Abstract:
The second law of thermodynamics implies that no macroscopic system may oscillate indefinitely without consuming energy. The questions of the number of possible oscillations and the coherent quality of these oscillations remain unanswered. This paper proves upper bounds on the number and quality of such oscillations when the system in question is homogeneously driven and has a discrete network of states. In a closed system, the maximum number of oscillations is bounded by the number of states in the network. In open systems, the size of the network bounds the quality factor of oscillation. This work also explores how the quality factor of macrostate oscillations, such as would be observed in chemical reactions, is bounded by the smallest equivalent loop of the network, not the size of the entire system. The consequences of this limit are explored in the context of chemical clocks and limit cycles.
Submitted 8 August, 2011; v1 submitted 1 June, 2010;
originally announced June 2010.