-
Accelerated Mirror Descent for Non-Euclidean Star-convex Functions
Authors:
Clement Lezane,
Sophie Langer,
Wouter M Koolen
Abstract:
Acceleration for non-convex functions has been an important problem in optimisation. We revisit star-convex functions, which are strictly unimodal on all lines through a minimizer. In [1], the authors accelerate gradient descent for star-convex functions with gradients that are Lipschitz with respect to the Euclidean norm in an unconstrained domain. In this paper, we introduce a new assumption about the regularity of the derivative of a general norm and we accelerate mirror descent for this class of normed spaces. We show that, under this assumption, our algorithms achieve sharp convergence rates for star-convex functions with -Hölder continuous gradients. We also prove that our convergence rate is near optimal for -norms.
[1] Near-Optimal Methods for Minimizing Star-Convex Functions and Beyond, Oliver Hinder, Aaron Sidford, and Nimit Sohoni
Submitted 29 May, 2024;
originally announced May 2024.
-
FhGenie: A Custom, Confidentiality-preserving Chat AI for Corporate and Scientific Use
Authors:
Ingo Weber,
Hendrik Linka,
Daniel Mertens,
Tamara Muryshkin,
Heinrich Opgenoorth,
Stefan Langer
Abstract:
Since OpenAI's release of ChatGPT, generative AI has received significant attention across various domains. These AI-based chat systems have the potential to enhance the productivity of knowledge workers in diverse tasks. However, the use of free public services poses a risk of data leakage, as service providers may exploit user input for additional training and optimization without clear boundaries. Even subscription-based alternatives sometimes lack transparency in handling user data. To address these concerns and enable Fraunhofer staff to leverage this technology while ensuring confidentiality, we have designed and developed a customized chat AI called FhGenie (genie being a reference to a helpful spirit). Within a few days of its release, thousands of Fraunhofer employees started using this service. We were pioneers in implementing such a system, and many other organizations have since followed suit. Our solution builds upon commercial large language models (LLMs), which we have carefully integrated into our system to meet our specific requirements and compliance constraints, including confidentiality and GDPR. In this paper, we share detailed insights into the architectural considerations, design, implementation, and subsequent updates of FhGenie. Additionally, we discuss challenges, observations, and the core lessons learned from its productive usage.
Submitted 29 February, 2024;
originally announced March 2024.
-
RIDGE: Reproducibility, Integrity, Dependability, Generalizability, and Efficiency Assessment of Medical Image Segmentation Models
Authors:
Farhad Maleki,
Linda Moy,
Reza Forghani,
Tapotosh Ghosh,
Katie Ovens,
Steve Langer,
Pouria Rouzrokh,
Bardia Khosravi,
Ali Ganjizadeh,
Daniel Warren,
Roxana Daneshjou,
Mana Moassefi,
Atlas Haddadi Avval,
Susan Sotardi,
Neil Tenenholtz,
Felipe Kitamura,
Timothy Kline
Abstract:
Deep learning techniques hold immense promise for advancing medical image analysis, particularly in tasks like image segmentation, where precise annotation of regions or volumes of interest within medical images is crucial but manually laborious and prone to interobserver and intraobserver biases. As such, deep learning approaches could provide automated solutions for such applications. However, the potential of these techniques is often undermined by challenges in reproducibility and generalizability, which are key barriers to their clinical adoption. This paper introduces the RIDGE checklist, a comprehensive framework designed to assess the Reproducibility, Integrity, Dependability, Generalizability, and Efficiency of deep learning-based medical image segmentation models. The RIDGE checklist is not just a tool for evaluation but also a guideline for researchers striving to improve the quality and transparency of their work. By adhering to the principles outlined in the RIDGE checklist, researchers can ensure that their developed segmentation models are robust, scientifically valid, and applicable in a clinical setting.
Submitted 3 July, 2024; v1 submitted 16 January, 2024;
originally announced January 2024.
-
Learning to Predict Structural Vibrations
Authors:
Jan van Delden,
Julius Schultz,
Christopher Blech,
Sabine C. Langer,
Timo Lüddecke
Abstract:
In mechanical structures like airplanes, cars and houses, noise is generated and transmitted through vibrations. To take measures to reduce this noise, vibrations need to be simulated with expensive numerical computations. Surrogate deep learning models present a promising alternative to classical numerical simulations, as they can be evaluated orders of magnitude faster at the cost of some accuracy. To quantify such trade-offs systematically and foster the development of methods, we present a benchmark on the task of predicting the vibration of harmonically excited plates. The benchmark features a total of 12000 plate geometries with varying forms of beadings, material and sizes with associated numerical solutions. To address the benchmark task, we propose a new network architecture, named Frequency-Query Operator, which is trained to map plate geometries to their vibration pattern given a specific excitation frequency. Applying principles from operator learning and implicit models for shape encoding, our approach effectively addresses the prediction of highly variable frequency response functions occurring in dynamic systems. To quantify the prediction quality, we introduce a set of evaluation metrics and evaluate the method on our vibrating-plates benchmark. Our method outperforms DeepONets, Fourier Neural Operators and more traditional neural network architectures. Code, dataset and visualizations: https://eckerlab.org/code/delden2023_plate
Submitted 22 March, 2024; v1 submitted 9 October, 2023;
originally announced October 2023.
-
Efficient solution strategies for cabin noise assessment of a wave resolving aircraft fuselage model
Authors:
Christopher Blech,
Harikrishnan K. Sreekumar,
Yannik Hüpel,
Sabine C. Langer
Abstract:
For the purpose of high-fidelity aircraft cabin noise simulations during early design phases, we study three efficient solving approaches for the fully coupled finite element model of an aircraft fuselage segment. Obtaining an efficient solution with respect to computational time and resources is challenging within a conventional simulation pipeline, as large-scale and complex vibroacoustic models incur very high computational costs with increasing frequency. In this contribution, we adopt (1) frequency and domain-adaptive discretisation, (2) domain-decomposition techniques, and (3) model order reduction with rational Arnoldi Krylov subspace methods for an aircraft fuselage model. The three approaches show a remarkable advantage, reducing both the solving time and the memory requirement, which are essential when solving large-scale models. While the discretisation and the model order reduction approaches accelerate the solving process by efficiently handling the complexity of the system to be solved, domain-decomposition techniques further reduce the overall memory consumption. Finally, with the help of active research aircraft models, we implement and showcase the achieved efficiency.
Submitted 7 October, 2023;
originally announced October 2023.
-
Learning Green's Function Efficiently Using Low-Rank Approximations
Authors:
Kishan Wimalawarne,
Taiji Suzuki,
Sophie Langer
Abstract:
Learning the Green's function using deep learning models makes it possible to solve different classes of partial differential equations. A practical limitation of using deep learning for the Green's function is the repeated, computationally expensive Monte-Carlo integral approximation. We propose to learn the Green's function by low-rank decomposition, which results in a novel architecture that removes redundant computations by learning separately with domain data for evaluation and Monte-Carlo samples for integral approximation. Using experiments, we show that the proposed method improves computational time compared to MOD-Net while achieving accuracy comparable to both PINNs and MOD-Net.
Submitted 1 August, 2023;
originally announced August 2023.
-
Physics-Informed Neural Networks for Parametric Compressible Euler Equations
Authors:
Simon Wassing,
Stefan Langer,
Philipp Bekemeyer
Abstract:
The numerical approximation of solutions to the compressible Euler and Navier-Stokes equations is a crucial but challenging task with relevance in various fields of science and engineering. Recently, methods from deep learning have been successfully employed for solving partial differential equations by incorporating the equations into a loss function that is minimized during the training of a neural network. This approach yields a so-called physics-informed neural network. It is not based upon classical discretizations, such as finite-volume or finite-element schemes, and can even address parametric problems in a straightforward manner. This has raised the question of whether physics-informed neural networks may be a viable alternative to conventional methods for computational fluid dynamics. In this article, we introduce an adaptive artificial viscosity reduction procedure for physics-informed neural networks, enabling approximate parametric solutions for forward problems governed by the stationary two-dimensional Euler equations in sub- and supersonic conditions. To the best of our knowledge, this is the first time that the concept of artificial viscosity in physics-informed neural networks has been successfully applied to a complex system of conservation laws in more than one dimension. Moreover, we highlight the unique ability of this method to solve forward problems in a continuous parameter space. The presented methodology takes the next step in bringing physics-informed neural networks closer to realistic compressible flow applications.
Submitted 29 January, 2024; v1 submitted 26 July, 2023;
originally announced July 2023.
-
Dropout Regularization Versus $\ell_2$-Penalization in the Linear Model
Authors:
Gabriel Clara,
Sophie Langer,
Johannes Schmidt-Hieber
Abstract:
We investigate the statistical behavior of gradient descent iterates with dropout in the linear regression model. In particular, non-asymptotic bounds for the convergence of expectations and covariance matrices of the iterates are derived. The results shed more light on the widely cited connection between dropout and $\ell_2$-regularization in the linear model. We indicate a more subtle relationship, owing to interactions between the gradient descent dynamics and the additional randomness induced by dropout. Further, we study a simplified variant of dropout which does not have a regularizing effect and converges to the least squares estimator.
Submitted 25 April, 2024; v1 submitted 18 June, 2023;
originally announced June 2023.
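For the dropout entry above, a minimal NumPy sketch may help make the setting concrete. It is an illustration under assumptions, not the authors' code: the function names, the Bernoulli retention probability `p`, the step size, and the zero initialisation are all hypothetical choices. The first function runs gradient descent on the squared loss with a fresh dropout mask per step; the second computes the minimizer of the dropout loss averaged over masks, which is the $\ell_2$-type estimator the "widely cited connection" usually refers to.

```python
import numpy as np


def dropout_gd(X, y, p, step, n_iter, rng):
    """Gradient descent on 0.5 * ||y - X D beta||^2 with a new random mask D each step."""
    d = X.shape[1]
    beta = np.zeros(d)  # zero initialisation is an assumption of this sketch
    for _ in range(n_iter):
        # Keep each coordinate independently with probability p.
        D = (rng.random(d) < p).astype(float)
        residual = y - X @ (D * beta)
        # Negative gradient of the masked squared loss with respect to beta.
        beta = beta + step * D * (X.T @ residual)
    return beta


def marginalized_minimizer(X, y, p):
    """Minimizer of the dropout loss after averaging over the Bernoulli(p) masks."""
    G = X.T @ X
    # Averaging over masks rescales off-diagonal Gram entries by p^2 and
    # diagonal entries by p, which yields a diagonal, ridge-like penalty.
    A = p * G + (1.0 - p) * np.diag(np.diag(G))
    return np.linalg.solve(A, X.T @ y)


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)
step = 1.0 / np.linalg.norm(X, 2) ** 2  # conservative step size

print("dropout GD iterate:    ", dropout_gd(X, y, 0.8, step, 2000, rng))
print("marginalized minimizer:", marginalized_minimizer(X, y, 0.8))
```

With `p = 1` no coordinate is dropped and both functions reduce to ordinary least squares. For `p < 1` a single run of the iterates keeps fluctuating around the marginalized minimizer, which is the kind of extra randomness the abstract points to.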
-
SEMPAI: a Self-Enhancing Multi-Photon Artificial Intelligence for prior-informed assessment of muscle function and pathology
Authors:
Alexander Mühlberg,
Paul Ritter,
Simon Langer,
Chloë Goossens,
Stefanie Nübler,
Dominik Schneidereit,
Oliver Taubmann,
Felix Denzinger,
Dominik Nörenberg,
Michael Haug,
Wolfgang H. Goldmann,
Andreas K. Maier,
Oliver Friedrich,
Lucas Kreiss
Abstract:
Deep learning (DL) shows notable success in biomedical studies. However, most DL algorithms work as a black box, exclude biomedical experts, and need extensive data. We introduce the Self-Enhancing Multi-Photon Artificial Intelligence (SEMPAI), which integrates hypothesis-driven priors in a data-driven DL approach for research on multiphoton microscopy (MPM) of muscle fibers. SEMPAI utilizes meta-learning to optimize prior integration, data representation, and neural network architecture simultaneously. This allows hypothesis testing and provides interpretable feedback about the origin of biological information in MPM images. SEMPAI performs joint learning of several tasks to enable prediction for small datasets. The method is applied to an extensive multi-study dataset, resulting in the largest joint analysis of pathologies and function for single muscle fibers. SEMPAI outperforms state-of-the-art biomarkers in six of seven predictive tasks, including those with scarce data. SEMPAI's DL models with integrated priors are superior to those without priors and to prior-only machine learning approaches.
Submitted 28 October, 2022;
originally announced October 2022.
-
Domain Adaptive Pretraining for Multilingual Acronym Extraction
Authors:
Usama Yaseen,
Stefan Langer
Abstract:
This paper presents our findings from participating in the multilingual acronym extraction shared task SDU@AAAI-22. The task consists of acronym extraction from documents in 6 languages within scientific and legal domains. To address multilingual acronym extraction we employed BiLSTM-CRF with multilingual XLM-RoBERTa embeddings. We pretrained the XLM-RoBERTa model on the shared task corpus to further adapt XLM-RoBERTa embeddings to the shared task domain(s). Our system (team: SMR-NLP) achieved competitive performance for acronym extraction across all the languages.
Submitted 30 June, 2022;
originally announced June 2022.
-
A statistical analysis of an image classification problem
Authors:
Sophie Langer,
Johannes Schmidt-Hieber
Abstract:
The availability of massive image databases has resulted in the development of scalable machine learning methods, such as convolutional neural networks (CNNs), for filtering and processing these data. While the very recent theoretical work on CNNs focuses on standard nonparametric denoising problems, the variability in image classification datasets does not originate from additive noise but from variation of the shape and other characteristics of the same object across different images. To address this problem, we consider a simple supervised classification problem for object detection on grayscale images. From the function estimation point of view, every pixel is a variable, and large images lead to high-dimensional function recovery tasks suffering from the curse of dimensionality. In our image deformation model, by contrast, increasing the number of pixels enhances the image resolution and makes the object classification problem easier. We propose and theoretically analyze two different procedures. The first method estimates the image deformation by support alignment. Under a minimal separation condition, it is shown that perfect classification is possible. The second method fits a CNN to the data. We derive a rate for the misclassification error depending on the sample size and the number of pixels. Both classifiers are empirically compared on images generated from the MNIST handwritten digit database. The obtained results corroborate the theoretical findings.
Submitted 5 June, 2022;
originally announced June 2022.
-
DeepTechnome: Mitigating Unknown Bias in Deep Learning Based Assessment of CT Images
Authors:
Simon Langer,
Oliver Taubmann,
Felix Denzinger,
Andreas Maier,
Alexander Mühlberg
Abstract:
Reliably detecting diseases using relevant biological information is crucial for real-world applicability of deep learning techniques in medical imaging. We debias deep learning models during training against unknown bias, without preprocessing or filtering the input beforehand and without assuming specific knowledge about the distribution or precise nature of the bias in the dataset. We use control regions as surrogates that carry information regarding the bias, employ the classifier model to extract features, and suppress biased intermediate features with our custom, modular DecorreLayer. We evaluate our method on a dataset of 952 lung computed tomography scans by introducing simulated biases w.r.t. reconstruction kernel and noise level, and we propose including an adversarial test set in evaluations of bias reduction techniques. In a moderately sized model architecture, the proposed method, when learning from data exhibiting a strong bias, near-perfectly recovers the classification performance observed when training with corresponding unbiased data.
Submitted 26 May, 2022;
originally announced May 2022.
-
Best Practices and Scoring System on Reviewing A.I. based Medical Imaging Papers: Part 1 Classification
Authors:
Timothy L. Kline,
Felipe Kitamura,
Ian Pan,
Amine M. Korchi,
Neil Tenenholtz,
Linda Moy,
Judy Wawira Gichoya,
Igor Santos,
Steven Blumer,
Misha Ysabel Hwang,
Kim-Ann Git,
Abishek Shroff,
Elad Walach,
George Shih,
Steve Langer
Abstract:
With the recent advances in A.I. methodologies and their application to medical imaging, there has been an explosion of related research programs utilizing these techniques to produce state-of-the-art classification performance. Ultimately, these research programs culminate in submission of their work for consideration in peer-reviewed journals. To date, the criteria for acceptance vs. rejection are often subjective; however, reproducible science requires reproducible review. The Machine Learning Education Sub-Committee of SIIM has identified a knowledge gap and a serious need to establish guidelines for reviewing these studies. Although there have been several recent papers with this goal, the present work is written from the machine learning practitioner's standpoint. In this series, the committee will address the best practices to be followed in an A.I.-based study and present the required sections in terms of examples and discussion of what should be included to make the studies cohesive, reproducible, accurate, and self-contained. This first entry in the series focuses on the task of image classification. Elements such as dataset curation, data pre-processing steps, defining an appropriate reference standard, data partitioning, model architecture and training are discussed. The sections are presented as they would be detailed in a typical manuscript, with content describing the necessary information that should be included to make sure the study is of sufficient quality to be considered for publication. The goal of this series is to provide resources not only to help improve the review process for A.I.-based medical imaging papers, but also to facilitate a standard for the information that is presented within all components of the research study. We hope to provide quantitative metrics in what otherwise may be a qualitative review process.
Submitted 3 February, 2022;
originally announced February 2022.
-
NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation
Authors:
Kaustubh D. Dhole,
Varun Gangal,
Sebastian Gehrmann,
Aadesh Gupta,
Zhenhao Li,
Saad Mahamood,
Abinaya Mahendiran,
Simon Mille,
Ashish Shrivastava,
Samson Tan,
Tongshuang Wu,
Jascha Sohl-Dickstein,
Jinho D. Choi,
Eduard Hovy,
Ondrej Dusek,
Sebastian Ruder,
Sajant Anand,
Nagender Aneja,
Rabin Banjade,
Lisa Barthe,
Hanna Behnke,
Ian Berlot-Attwell,
Connor Boyle,
Caroline Brun,
Marco Antonio Sobrevilla Cabezudo
, et al. (101 additional authors not shown)
Abstract:
Data augmentation is an important component in the robustness evaluation of models in natural language processing (NLP) and in enhancing the diversity of the data they are trained on. In this paper, we present NL-Augmenter, a new participatory Python-based natural language augmentation framework which supports the creation of both transformations (modifications to the data) and filters (data splits according to specific features). We describe the framework and an initial set of 117 transformations and 23 filters for a variety of natural language tasks. We demonstrate the efficacy of NL-Augmenter by using several of its transformations to analyze the robustness of popular natural language models. The infrastructure, datacards and robustness analysis results are available publicly on the NL-Augmenter repository (https://github.com/GEM-benchmark/NL-Augmenter).
Submitted 11 October, 2022; v1 submitted 5 December, 2021;
originally announced December 2021.
-
Simulated LiDAR Repositioning: a novel point cloud data augmentation method
Authors:
Xavier Morin-Duchesne,
Michael S Langer
Abstract:
We address a data augmentation problem for LiDAR. Given a LiDAR scan of a scene from some position, how can one simulate new scans of that scene from different, secondary positions? The method defines criteria for selecting valid secondary positions, and then estimates which points from the original point cloud would be acquired by a scanner from these positions. We validate the method using synthetic scenes, and examine how the similarity of generated point clouds depends on scanner distance, occlusion, and angular resolution. We show that the method is more accurate at short distances, and that having a high scanner resolution for the original point clouds has a strong impact on the similarity of generated point clouds. We also demonstrate how the method can be applied to natural scene statistics: in particular, we apply our method to reposition the scanner horizontally and vertically, separately consider points belonging to the ground and to non-ground objects, and describe the impact on the distributions of distances to these two classes of points.
Submitted 20 November, 2021;
originally announced November 2021.
-
Experiments conducted in the burning plasma regime with inertial fusion implosions
Authors:
J. S. Ross,
J. E. Ralph,
A. B. Zylstra,
A. L. Kritcher,
H. F. Robey,
C. V. Young,
O. A. Hurricane,
D. A. Callahan,
K. L. Baker,
D. T. Casey,
T. Doeppner,
L. Divol,
M. Hohenberger,
S. Le Pape,
A. Pak,
P. K. Patel,
R. Tommasini,
S. J. Ali,
P. A. Amendt,
L. J. Atherton,
B. Bachmann,
D. Bailey,
L. R. Benedetti,
L. Berzak Hopkins,
R. Betti
, et al. (127 additional authors not shown)
Abstract:
An experimental program is currently underway at the National Ignition Facility (NIF) to compress deuterium and tritium (DT) fuel to densities and temperatures sufficient to achieve fusion and energy gain. The primary approach being investigated is indirect drive inertial confinement fusion (ICF), where a high-Z radiation cavity (a hohlraum) is heated by lasers, converting the incident energy into x-ray radiation which in turn drives the DT fuel filled capsule causing it to implode. Previous experiments reported DT fuel gain exceeding unity [O.A. Hurricane et al., Nature 506, 343 (2014)] and then exceeding the kinetic energy of the imploding fuel [S. Le Pape et al., Phys. Rev. Lett. 120, 245003 (2018)]. We report on recent experiments that have achieved record fusion neutron yields on NIF, greater than 100 kJ with momentary fusion powers exceeding 1PW, and have for the first time entered the burning plasma regime where fusion alpha-heating of the fuel exceeds the energy delivered to the fuel via compression. This was accomplished by increasing the size of the high-density carbon (HDC) capsule, increasing energy coupling, while controlling symmetry and implosion design parameters. Two tactics were successful in controlling the radiation flux symmetry and therefore the implosion symmetry: transferring energy between laser cones via plasma waves, and changing the shape of the hohlraum. In conducting these experiments, we controlled for known sources of degradation. Herein we show how these experiments were performed to produce record performance, and demonstrate the data fidelity leading us to conclude that these shots have entered the burning plasma regime.
Submitted 8 November, 2021;
originally announced November 2021.
-
Data Augmentation for Low-Resource Named Entity Recognition Using Backtranslation
Authors:
Usama Yaseen,
Stefan Langer
Abstract:
State-of-the-art natural language processing systems rely on sizable training datasets to achieve high performance. Lack of such datasets in specialized low-resource domains leads to suboptimal performance. In this work, we adapt backtranslation to generate high-quality and linguistically diverse synthetic data for low-resource named entity recognition. We perform experiments on two datasets from the materials science (MaSciP) and biomedical (S800) domains. The empirical results demonstrate the effectiveness of our proposed augmentation strategy, particularly in the low-resource scenario.
Submitted 26 August, 2021;
originally announced August 2021.
-
Convergence rates for shallow neural networks learned by gradient descent
Authors:
Alina Braun,
Michael Kohler,
Sophie Langer,
Harro Walk
Abstract:
In this paper we analyze the $L_2$ error of neural network regression estimates with one hidden layer. Under the assumption that the Fourier transform of the regression function decays suitably fast, we show that an estimate, where all initial weights are chosen according to proper uniform distributions and where the weights are learned by gradient descent, achieves a rate of convergence of $1/\sqrt{n}$ (up to a logarithmic factor). Our statistical analysis implies that the key aspect behind this result is the proper choice of the initial inner weights and the adjustment of the outer weights via gradient descent. This indicates that we can also simply use linear least squares to choose the outer weights. We prove a corresponding theoretical result and compare our new linear least squares neural network estimate with standard neural network estimates via simulated data. Our simulations show that our theoretical considerations lead to an estimate with an improved performance in many cases.
Submitted 18 August, 2023; v1 submitted 20 July, 2021;
originally announced July 2021.
-
Estimation of a regression function on a manifold by fully connected deep neural networks
Authors:
Michael Kohler,
Sophie Langer,
Ulrich Reif
Abstract:
Estimation of a regression function from independent and identically distributed data is considered. The $L_2$ error with integration with respect to the distribution of the predictor variable is used as the error criterion. The rate of convergence of least squares estimates based on fully connected spaces of deep neural networks with ReLU activation function is analyzed for smooth regression functions. It is shown that in case that the distribution of the predictor variable is concentrated on a manifold, these estimates achieve a rate of convergence which depends on the dimension of the manifold and not on the number of components of the predictor variable.
Submitted 20 July, 2021;
originally announced July 2021.
-
Neural Text Classification and Stacked Heterogeneous Embeddings for Named Entity Recognition in SMM4H 2021
Authors:
Usama Yaseen,
Stefan Langer
Abstract:
This paper presents our findings from participating in the SMM4H Shared Task 2021. We addressed Named Entity Recognition (NER) and Text Classification. To address NER, we explored BiLSTM-CRF with Stacked Heterogeneous Embeddings and linguistic features. We investigated various machine learning algorithms (logistic regression, Support Vector Machine (SVM) and Neural Networks) to address text classification. Our proposed approaches can be generalized to different languages, and we have shown their effectiveness for English and Spanish. Our text classification submissions (team: MIC-NLP) achieved competitive performance, with F1-scores of $0.46$ and $0.90$ on ADE Classification (Task 1a) and Profession Classification (Task 7a), respectively. In the case of NER, our submissions achieved F1-scores of $0.50$ and $0.82$ on ADE Span Detection (Task 1b) and Profession Span Detection (Task 7b), respectively.
Submitted 11 June, 2021; v1 submitted 10 June, 2021;
originally announced June 2021.
-
Fracture Toughness of Crystalline Solids
Authors:
J. S. Langer
Abstract:
This paper describes an attempt to construct a first-principles theory of the fracture toughness of crystalline solids. It is based on the thermodynamic dislocation theory (TDT), which starts with the assertion that dislocations in solids must obey the second law of thermodynamics. A second starting assumption is that fracture is initiated when the tip of a notch is driven to undergo a sharpening instability. The results of this analysis are developed in comparison with measurements by Gumbsch and colleagues of the notch toughness of both predeformed and non-predeformed tungsten crystals. The theory includes a mathematical conjecture regarding tip dynamics at small dislocation densities. Nevertheless, its predictions agree quantitatively with the experimental data, including both brittle and ductile fracture, over a wide range of temperatures, loading rates, and initial conditions.
Submitted 29 March, 2021;
originally announced March 2021.
-
Statistical theory for image classification using deep convolutional neural networks with cross-entropy loss under the hierarchical max-pooling model
Authors:
Michael Kohler,
Sophie Langer
Abstract:
Convolutional neural networks (CNNs) trained with cross-entropy loss have proven to be extremely successful in classifying images. In recent years, much work has been done to also improve the theoretical understanding of neural networks. Nevertheless, it seems limited when these networks are trained with cross-entropy loss, mainly because of the unboundedness of the target function. In this paper, we aim to fill this gap by analyzing the rate of the excess risk of a CNN classifier trained by cross-entropy loss. Under suitable assumptions on the smoothness and structure of the a posteriori probability, it is shown that these classifiers achieve a rate of convergence which is independent of the dimension of the image. These rates are in line with the practical observations about CNNs.
Submitted 29 April, 2024; v1 submitted 27 November, 2020;
originally announced November 2020.
-
Combining Gesture and Voice Control for Mid-Air Manipulation of CAD Models in VR Environments
Authors:
Markus Friedrich,
Stefan Langer,
Fabian Frey
Abstract:
Modeling 3D objects in domains like Computer Aided Design (CAD) is time-consuming and comes with a steep learning curve needed to master the design process as well as tool complexities. In order to simplify the modeling process, we designed and implemented a prototypical system that leverages the strengths of Virtual Reality (VR) hand gesture recognition in combination with the expressiveness of a voice-based interface for the task of 3D modeling. Furthermore, we use the Constructive Solid Geometry (CSG) tree representation for 3D models within the VR environment to let the user manipulate objects from the ground up, giving an intuitive understanding of how the underlying basic shapes connect. The system uses standard mid-air 3D object manipulation techniques and adds a set of voice commands to help mitigate the deficiencies of current hand gesture recognition techniques. A user study was conducted to evaluate the proposed prototype. Our hybrid input paradigm proves to be a promising step towards easier-to-use CAD modeling.
Submitted 18 November, 2020;
originally announced November 2020.
-
Analysis of the rate of convergence of fully connected deep neural network regression estimates with smooth activation function
Authors:
Sophie Langer
Abstract:
This article contributes to the current statistical theory of deep neural networks (DNNs). It was shown that DNNs are able to circumvent the so-called curse of dimensionality in case that suitable restrictions on the structure of the regression function hold. In most of those results the tuning parameter is the sparsity of the network, which describes the number of non-zero weights in the network. This constraint seemed to be the key factor for the good rate of convergence results. Recently, this assumption was disproved. In particular, it was shown that simple fully connected DNNs can achieve the same rate of convergence. Those fully connected DNNs are based on the unbounded ReLU activation function. In this article we extend the results to smooth activation functions, i.e., to the sigmoid activation function. It is shown that estimators based on fully connected DNNs with sigmoid activation function also achieve the minimax rates of convergence (up to $\ln n$-factors). In our result the number of hidden layers is fixed, the number of neurons per layer tends to infinity for sample size tending to infinity, and a bound for the weights in the network is given.
Submitted 12 October, 2020;
originally announced October 2020.
-
Approximating smooth functions by deep neural networks with sigmoid activation function
Authors:
Sophie Langer
Abstract:
We study the power of deep neural networks (DNNs) with sigmoid activation function. Recently, it was shown that DNNs approximate any $d$-dimensional, smooth function on a compact set with a rate of order $W^{-p/d}$, where $W$ is the number of nonzero weights in the network and $p$ is the smoothness of the function. Unfortunately, these rates only hold for a special class of sparsely connected DNNs. We ask ourselves if we can show the same approximation rate for a simpler and more general class, i.e., DNNs which are only defined by their width and depth. In this article we show that DNNs with fixed depth and a width of order $M^d$ achieve an approximation rate of $M^{-2p}$. As a conclusion we quantitatively characterize the approximation power of DNNs in terms of the overall weights $W_0$ in the network and show an approximation rate of $W_0^{-p/d}$. This more general result finally helps us to understand which network topology guarantees a special target accuracy.
Submitted 8 October, 2020;
originally announced October 2020.
-
A Patient-Centric Dataset of Images and Metadata for Identifying Melanomas Using Clinical Context
Authors:
Veronica Rotemberg,
Nicholas Kurtansky,
Brigid Betz-Stablein,
Liam Caffery,
Emmanouil Chousakos,
Noel Codella,
Marc Combalia,
Stephen Dusza,
Pascale Guitera,
David Gutman,
Allan Halpern,
Harald Kittler,
Kivanc Kose,
Steve Langer,
Konstantinos Lioprys,
Josep Malvehy,
Shenara Musthaq,
Jabpani Nanda,
Ofer Reiter,
George Shih,
Alexander Stratigos,
Philipp Tschandl,
Jochen Weber,
H. Peter Soyer
Abstract:
Prior skin image datasets have not addressed patient-level information obtained from multiple skin lesions from the same patient. Though artificial intelligence classification algorithms have achieved expert-level performance in controlled studies examining single images, in practice dermatologists base their judgment holistically from multiple lesions on the same patient. The 2020 SIIM-ISIC Melanoma Classification challenge dataset described herein was constructed to address this discrepancy between prior challenges and clinical practice, providing for each image in the dataset an identifier allowing lesions from the same patient to be mapped to one another. This patient-level contextual information is frequently used by clinicians to diagnose melanoma and is especially useful in ruling out false positives in patients with many atypical nevi. The dataset represents 2,056 patients from three continents with an average of 16 lesions per patient, consisting of 33,126 dermoscopic images and 584 histopathologically confirmed melanomas compared with benign melanoma mimickers.
Submitted 7 August, 2020;
originally announced August 2020.
-
Content-based Recommendations for Radio Stations with Deep Learned Audio Fingerprints
Authors:
Stefan Langer,
Liza Obermeier,
André Ebert,
Markus Friedrich,
Emma Munisamy,
Claudia Linnhoff-Popien
Abstract:
The world of linear radio broadcasting is characterized by a wide variety of stations and played content. That is why finding stations playing the preferred content is a tough task for a potential listener, especially due to the overwhelming number of offered choices. Here, recommender systems usually step in, but existing content-based approaches rely on metadata and thus are constrained by the available data quality. Other approaches leverage user behavior data and thus do not exploit any domain-specific knowledge and are furthermore disadvantageous regarding privacy concerns. Therefore, we propose a new pipeline for the generation of audio-based radio station fingerprints relying on audio stream crawling and a Deep Autoencoder. We show that the proposed fingerprints are especially useful for characterizing radio stations by their audio content and thus are an excellent representation for meaningful and reliable radio station recommendations. Furthermore, the proposed modules are part of the HRADIO Communication Platform, which enables hybrid radio features for radio stations. It is released under a flexible open source license and enables small- and medium-sized businesses in particular to provide customized, high-quality radio services to potential listeners.
Submitted 15 July, 2020;
originally announced July 2020.
-
Scaling Confirmation of the Thermodynamic Dislocation Theory
Authors:
J. S. Langer,
K. C. Le
Abstract:
We show that the thermodynamic dislocation theory (TDT) predicts a scaling relation between stresses, strain rates, and temperatures for steady-state deformations of crystalline solids, and that this relation is accurately obeyed by a wide range of experimental data for both aluminum and copper. Unlike conventional phenomenological dislocation theories, the TDT is based on the second law of thermodynamics. Its success implies that descriptions of solid deformation that are not based on the statistical mechanics of nonequilibrium processes cannot be relied upon to be predictive. Thus there is an urgent need -- and a new opportunity -- to revitalize this central part of materials physics.
Submitted 26 March, 2020;
originally announced March 2020.
-
Statistical Thermodynamics of Dislocations in Solids
Authors:
J. S. Langer
Abstract:
This review is a simplified summary of the thermodynamic dislocation theory, with special emphasis on the role of an effective temperature. Materials scientists, for decades, have asserted that statistical thermodynamics is not applicable to dislocations. By use of simple, first-principles analyses and comparisons with experimental data, I argue that these scientists have been wrong, and that this venerable field urgently needs to be revitalized because of its wide-ranging fundamental and technological importance. In addition to describing recent progress in understanding strain hardening, yielding, shear banding, and the like, I argue that the thermodynamic dislocation theory can lead to a much needed, first-principles understanding of brittle and ductile fracture in crystalline solids.
Submitted 10 January, 2022; v1 submitted 4 March, 2020;
originally announced March 2020.
-
Enabling Machine Learning-Ready HPC Ensembles with Merlin
Authors:
J. Luc Peterson,
Ben Bay,
Joe Koning,
Peter Robinson,
Jessica Semler,
Jeremy White,
Rushil Anirudh,
Kevin Athey,
Peer-Timo Bremer,
Francesco Di Natale,
David Fox,
Jim A. Gaffney,
Sam A. Jacobs,
Bhavya Kailkhura,
Bogdan Kustowski,
Steven Langer,
Brian Spears,
Jayaraman Thiagarajan,
Brian Van Essen,
Jae-Seung Yeom
Abstract:
With the growing complexity of computational and experimental facilities, many scientific researchers are turning to machine learning (ML) techniques to analyze large scale ensemble data. With complexities such as multi-component workflows, heterogeneous machine architectures, parallel file systems, and batch scheduling, care must be taken to facilitate this analysis in a high performance computing (HPC) environment. In this paper, we present Merlin, a workflow framework to enable large ML-friendly ensembles of scientific HPC simulations. By augmenting traditional HPC with distributed compute technologies, Merlin aims to lower the barrier for scientific subject matter experts to incorporate ML into their analysis. In addition to its design, we describe some example applications that Merlin has enabled on leadership-class HPC resources, such as the ML-augmented optimization of nuclear fusion experiments and the calibration of infectious disease models to study the progression of and possible mitigation strategies for COVID-19.
Submitted 1 July, 2021; v1 submitted 5 December, 2019;
originally announced December 2019.
-
Estimation of a function of low local dimensionality by deep neural networks
Authors:
Michael Kohler,
Adam Krzyzak,
Sophie Langer
Abstract:
Deep neural networks (DNNs) achieve impressive results for complicated tasks like object detection on images and speech recognition. Motivated by this practical success, there is now a strong interest in showing good theoretical properties of DNNs. To describe for which tasks DNNs perform well and when they fail, it is a key challenge to understand their performance. The aim of this paper is to contribute to the current statistical theory of DNNs. We apply DNNs on high dimensional data and we show that the least squares regression estimates using DNNs are able to achieve dimensionality reduction in case that the regression function has locally low dimensionality. Consequently, the rate of convergence of the estimate does not depend on its input dimension $d$, but on its local dimension $d^*$ and the DNNs are able to circumvent the curse of dimensionality in case that $d^*$ is much smaller than $d$. In our simulation study we provide numerical experiments to support our theoretical result and we compare our estimate with other conventional nonparametric regression estimates. The performance of our estimates is also validated in experiments with real data.
Submitted 15 June, 2020; v1 submitted 29 August, 2019;
originally announced August 2019.
-
On the rate of convergence of fully connected very deep neural network regression estimates
Authors:
Michael Kohler,
Sophie Langer
Abstract:
Recent results in nonparametric regression show that deep learning, i.e., neural network estimates with many hidden layers, are able to circumvent the so-called curse of dimensionality in case that suitable restrictions on the structure of the regression function hold. One key feature of the neural networks used in these results is that their network architecture has a further constraint, namely the network sparsity. In this paper we show that we can get similar results also for least squares estimates based on simple fully connected neural networks with ReLU activation functions. Here either the number of neurons per hidden layer is fixed and the number of hidden layers tends to infinity suitably fast for sample size tending to infinity, or the number of hidden layers is bounded by some logarithmic factor in the sample size and the number of neurons per hidden layer tends to infinity suitably fast for sample size tending to infinity. The proof is based on new approximation results concerning deep neural networks.
Submitted 29 September, 2020; v1 submitted 29 August, 2019;
originally announced August 2019.
-
Difficulty Classification of Mountainbike Downhill Trails utilizing Deep Neural Networks
Authors:
Stefan Langer,
Robert Müller,
Kyrill Schmid,
Claudia Linnhoff-Popien
Abstract:
The difficulty of mountainbike downhill trails is a subjective perception. However, sports associations and mountainbike park operators attempt to group trails into different levels of difficulty with scales like the Singletrail-Skala (S0-S5) or colored scales (blue, red, black, ...) as proposed by The International Mountain Bicycling Association. Inconsistencies in difficulty grading occur due to the various scales, different people grading the trails, differences in topography, and more. We propose an end-to-end deep learning approach to classify trails into three difficulty levels (easy, medium, and hard) using sensor data. With mbientlab Meta Motion r0.2 sensor units, we record accelerometer and gyroscope data of one rider on multiple trail segments. A 2D convolutional neural network is trained with a stacked and concatenated representation of the aforementioned data as its input. We run experiments with five different sample sizes and five different kernel sizes and achieve a maximum Sparse Categorical Accuracy of 0.9097. To the best of our knowledge, this is the first work targeting computational difficulty classification of mountainbike downhill trails.
Submitted 5 August, 2019;
originally announced August 2019.
-
Soccer Team Vectors
Authors:
Robert Müller,
Stefan Langer,
Fabian Ritz,
Christoph Roch,
Steffen Illium,
Claudia Linnhoff-Popien
Abstract:
In this work we present STEVE - Soccer TEam VEctors, a principled approach for learning real valued vectors for soccer teams where similar teams are close to each other in the resulting vector space. STEVE only relies on freely available information about the matches teams played in the past. These vectors can serve as input to various machine learning tasks. Evaluating on the task of team market value estimation, STEVE outperforms all its competitors. Moreover, we use STEVE for similarity search and to rank soccer teams.
Submitted 31 March, 2020; v1 submitted 30 July, 2019;
originally announced August 2019.
-
Deep Neural Baselines for Computational Paralinguistics
Authors:
Daniel Elsner,
Stefan Langer,
Fabian Ritz,
Robert Müller,
Steffen Illium
Abstract:
Detecting sleepiness from spoken language is an ambitious task, which is addressed by the Interspeech 2019 Computational Paralinguistics Challenge (ComParE). We propose an end-to-end deep learning approach to detect and classify patterns reflecting sleepiness in the human voice. Our approach is based solely on a moderately complex deep neural network architecture. It may be applied directly to the audio data without requiring any specific feature engineering, thus remaining transferable to other audio classification tasks. Nevertheless, our approach performs similarly to state-of-the-art machine learning models.
Submitted 5 July, 2019;
originally announced July 2019.
-
Brittle-Ductile Transitions in a Metallic Glass
Authors:
J. S. Langer
Abstract:
Recent computational and laboratory experiments have shown that the brittle-ductile transitions in metallic glasses such as Vitreloy1 are strongly sensitive to the initial effective disorder (or "fictive") temperature. Glasses with lower effective temperatures are weak and brittle; those with higher effective temperatures are strong and ductile. The analysis of this phenomenon presented here examines the onset of fracture at the tip of a slightly rounded notch as predicted by the shear-transformation-zone (STZ) theory of spatially varying plastic deformation. The central ingredient of this analysis is an approximation for the dynamics of the plastic zone formed by stress concentration at the notch tip. This zone first shields the tip but then breaks down suddenly producing a discontinuous transition between brittle and ductile failure, in agreement with the numerical and experimental observations.
Submitted 3 March, 2020; v1 submitted 8 June, 2019;
originally announced June 2019.
-
Statistical Thermodynamics of Crystal Plasticity
Authors:
J. S. Langer
Abstract:
This article is written in memory of Pierre Hohenberg with appreciation for his deep commitment to the basic principles of theoretical physics. I summarize recent developments in the theory of dislocation-enabled deformation of crystalline solids. This topic is especially appropriate for the Journal of Statistical Physics because materials scientists, for decades, have asserted that statistical thermodynamics is inapplicable to dislocations. By use of simple, first-principles analyses and comparisons with experimental data, I argue that these materials scientists have been wrong, and that this field should now be revisited because of its broad-ranging intellectual and technological importance.
Submitted 5 November, 2018; v1 submitted 29 September, 2018;
originally announced October 2018.
-
Thermodynamic analysis of the Livermore molecular-dynamics simulations of dislocation-mediated plasticity
Authors:
J. S. Langer
Abstract:
Results of recent large-scale molecular dynamics simulations of dislocation-mediated solid plasticity are compared with predictions of the statistical thermodynamic theory of these phenomena. These computational and theoretical analyses are in substantial agreement with each other in their descriptions both of strain-rate dependent steady plastic flow and of a transient stress peak associated with initially small densities of dislocations. The comparisons between the numerical simulations and basic theory reveal inconsistencies in some conventional phenomenological descriptions of solid plasticity.
Submitted 9 May, 2018;
originally announced May 2018.
-
Thermodynamic dislocation theory of adiabatic shear banding in steel
Authors:
Khanh Chau Le,
Tuan Minh Tran,
James S. Langer
Abstract:
The statistical-thermodynamic dislocation theory developed in our earlier studies is used here in an analysis of the experimental observations of adiabatic shear banding in steel by Marchand and Duffy (1988). Employing a small set of physics-based parameters, which we expect to be approximately independent of strain rate and temperature, we are able to explain experimental stress-strain curves at six different temperatures and four different strain rates. We make a simple model of a weak notch-like disturbance that, when driven hard enough, triggers shear banding instabilities that are quantitatively comparable with those seen in the experiments.
Submitted 16 October, 2017;
originally announced October 2017.
-
Learning crystal plasticity using digital image correlation: Examples from discrete dislocation dynamics
Authors:
Stefanos Papanikolaou,
Michail Tzimas,
Andrew C. E. Reid,
Stephen A. Langer
Abstract:
Digital image correlation (DIC) is a well-established, non-invasive technique for tracking and quantifying the deformation of mechanical samples under strain. While it provides an obvious way to observe incremental and aggregate displacement information, it seems likely that DIC data sets, which after all reflect the spatially-resolved response of a microstructure to loads, contain much richer information than has generally been extracted from them. In this paper, we demonstrate a machine-learning approach to quantifying the prior deformation history of a crystalline sample based on its response to a subsequent DIC test. This prior deformation history is encoded in the microstructure through the inhomogeneity of the dislocation microstructure, and in the spatial correlations of the dislocation patterns, which mediate the system's response to the DIC test load. Our domain consists of deformed crystalline thin films generated by a discrete dislocation plasticity simulation. We explore the range of applicability of machine learning (ML) for typical experimental protocols, and as a function of possible size effects and stochasticity. Plasticity size effects may directly influence the data, rendering unsupervised techniques unable to distinguish different plasticity regimes.
Submitted 13 April, 2019; v1 submitted 24 September, 2017;
originally announced September 2017.
-
Thermodynamic theory of dislocation-enabled plasticity
Authors:
J. S. Langer
Abstract:
The thermodynamic theory of dislocation-enabled plasticity is based on two unconventional hypotheses. The first of these is that a system of dislocations, driven by external forces and irreversibly exchanging heat with its environment, must be characterized by a thermodynamically defined effective temperature that is not the same as the ordinary temperature. The second hypothesis is that the overwhelmingly dominant mechanism controlling plastic deformation is thermally activated depinning of entangled pairs of dislocations. This paper consists of a systematic reformulation of this theory followed by examples of its use in analyses of experimentally observed phenomena including strain hardening, grain-size (Hall-Petch) effects, yielding transitions, and adiabatic shear banding.
Submitted 17 July, 2017;
originally announced July 2017.
-
Thermodynamic dislocation theory of high-temperature deformation in aluminum and steel
Authors:
K. C. Le,
T. M. Tran,
J. S. Langer
Abstract:
The statistical-thermodynamic dislocation theory developed in previous papers is used here in an analysis of high-temperature deformation of aluminum and steel. Using physics-based parameters that we expect theoretically to be independent of strain rate and temperature, we are able to fit experimental stress-strain curves for three different strain rates and three different temperatures for each of these two materials. Our theoretical curves include yielding transitions at zero strain in agreement with experiment. We find that thermal softening effects are important even at the lowest temperatures and smallest strain rates.
Submitted 25 April, 2017;
originally announced April 2017.
-
Apache Lucene as Content-Based-Filtering Recommender System: 3 Lessons Learned
Authors:
Stefan Langer,
Joeran Beel
Abstract:
For the past few years, we have used Apache Lucene as the recommendation framework in our scholarly-literature recommender system of the reference-management software Docear. In this paper, we share three lessons learned from our work with Lucene. First, recommendations with relevance scores below 0.025 tend to have significantly lower click-through rates than recommendations with relevance scores above 0.025. Second, picking ten recommendations randomly from Lucene's top 50 search results decreased the click-through rate by 15% compared to recommending the top 10 results. Third, the number of returned search results tends to predict how high click-through rates will be: when Lucene returns fewer than 1,000 search results, click-through rates tend to be around half as high as when 1,000 or more results are returned.
Submitted 26 March, 2017;
originally announced March 2017.
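The three lessons in this abstract can be illustrated with a minimal Python sketch. It does not use the Lucene API: results are assumed to be plain `(doc_id, score)` tuples, and every function and constant name below is hypothetical. Only the numbers 0.025, 50, 10 and 1,000 come from the abstract, which does not say how a score of exactly 0.025 should be treated, so keeping it is an assumption.

```python
import random

# Values quoted in the abstract; all names are hypothetical.
RELEVANCE_THRESHOLD = 0.025
MIN_RESULT_COUNT = 1000


def filter_by_score(results, threshold=RELEVANCE_THRESHOLD):
    """Keep (doc_id, score) pairs scoring at least the threshold (assumed inclusive)."""
    return [(doc, score) for doc, score in results if score >= threshold]


def top_n(results, n=10):
    """Return the n highest-scoring results, best first."""
    return sorted(results, key=lambda r: r[1], reverse=True)[:n]


def random_from_top(results, pool_size=50, n=10, rng=None):
    """Pick n results uniformly at random from the pool_size best results."""
    rng = rng or random.Random()
    pool = top_n(results, pool_size)
    return rng.sample(pool, min(n, len(pool)))


def has_enough_results(results, minimum=MIN_RESULT_COUNT):
    """True when the result list is at least as long as the reported cut-off."""
    return len(results) >= minimum
```

Under these assumptions, `top_n(results)` corresponds to the top 10 strategy and `random_from_top(results)` to the random-from-top-50 strategy that the abstract reports as lowering click-through rate.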
-
Yielding Transitions and Grain-Size Effects in Dislocation Theory
Authors:
J. S. Langer
Abstract:
The statistical-thermodynamic dislocation theory developed in previous papers is used here in an analysis of yielding transitions and grain-size effects in polycrystalline solids. Calculations are based on the 1995 experimental results of Meyers et al. for polycrystalline copper under strain-hardening conditions. The main assertion is that the well-known Hall-Petch effects are caused by enhanced strengths of dislocation sources at the edges of grains instead of the commonly assumed resistance to dislocation flow across grain boundaries. The theory describes rapid transitions between elastic and plastic deformation at yield points; thus it can be used to predict grain-size dependence of both yield stresses and flow stresses.
Submitted 28 January, 2017;
originally announced January 2017.
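For reference, the Hall-Petch effect named in this abstract is conventionally summarized by the empirical relation below. The symbols follow common textbook usage and are not taken from the paper itself: $\sigma_y$ is the yield stress, $\sigma_0$ a friction stress, $k_y$ a material-dependent coefficient, and $d$ the mean grain diameter.

$$\sigma_y = \sigma_0 + \frac{k_y}{\sqrt{d}}$$

Smaller grains therefore give a higher yield stress; the abstract's assertion concerns the physical cause of this grain-size dependence.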
-
Thermal Effects in Dislocation Theory II: Shear Banding and Yielding Transitions
Authors:
J. S. Langer
Abstract:
The thermodynamic dislocation theory presented in preceding papers is used here to describe shear-banding instabilities. Central ingredients of the theory are a thermodynamically defined effective configurational temperature, and a formula for the plastic deformation rate determined by thermally activated depinning of entangled dislocations. An important feature of this paper is an interpretation of yielding transitions in polycrystalline solids.
Submitted 1 December, 2016;
originally announced December 2016.
-
Dynamical disentangling and cooling of atoms in bilayer optical lattices
Authors:
A. Kantian,
S. Langer,
A. J. Daley
Abstract:
We show how experimentally available bilayer lattice systems can be used to prepare quantum many-body states with exceptionally low entropy in one layer, by dynamically disentangling the two layers. This disentangling operation moves one layer - subsystem $A$ - into a regime where excitations in $A$ develop a single-particle gap. As a result, this operation maps directly to cooling for subsystem $A$, with entropy being shuttled to the other layer. For both bosonic and fermionic atoms, we study the dynamics of this process, and show that disentangling can be realised cleanly in ongoing optical lattice experiments. The corresponding entanglement entropies are directly measurable with quantum gas microscopes, and as a tool for producing lower-entropy states, this technique opens a range of applications beginning with simplifying production of anti-ferromagnetically ordered states of fermions.
Submitted 12 September, 2016;
originally announced September 2016.
-
Thermal Effects in Dislocation Theory
Authors:
J. S. Langer
Abstract:
The mechanical behaviors of polycrystalline solids are determined by the interplay between phenomena governed by two different thermodynamic temperatures: the configurational effective temperature that controls the density of dislocations, and the ordinary kinetic-vibrational temperature that controls activated depinning mechanisms and thus deformation rates. This paper contains a review of the effective-temperature theory and its relation to conventional dislocation theories. It includes a simple illustration of how these two thermal effects can combine to produce a predictive theory of spatial heterogeneities such as shear-banding instabilities. Its main message is a plea that conventional dislocation theories be reformulated in a thermodynamically consistent way so that the vast array of observed behaviors can be understood systematically.
Submitted 24 July, 2016; v1 submitted 1 July, 2016;
originally announced July 2016.
-
Stick-slip instabilities in sheared granular flow: the role of friction and acoustic vibrations
Authors:
Charles K. C. Lieou,
Ahmed E. Elbanna,
J. S. Langer,
J. M. Carlson
Abstract:
We propose a theory of shear flow in dense granular materials. A key ingredient of the theory is an effective temperature that determines how the material responds to external driving forces such as shear stresses and vibrations. We show that, within our model, friction between grains produces stick-slip behavior at intermediate shear rates, even if the material is rate-strengthening at larger rates. In addition, externally generated acoustic vibrations alter the stick-slip amplitude, or suppress stick-slip altogether, depending on the pressure and shear rate. We construct a phase diagram that indicates the parameter regimes for which stick-slip occurs in the presence and absence of acoustic vibrations of a fixed amplitude and frequency. These results connect the microscopic physics to macroscopic dynamics, and thus produce useful information about a variety of granular phenomena including rupture and slip along earthquake faults, the remote triggering of instabilities, and the control of friction in material processing.
Submitted 14 July, 2015; v1 submitted 31 May, 2015;
originally announced June 2015.
-
Statistical Thermodynamics of Strain Hardening in Polycrystalline Solids
Authors:
J. S. Langer
Abstract:
This paper starts with a systematic rederivation of the statistical thermodynamic equations of motion for dislocation-mediated plasticity proposed in 2010 by Langer, Bouchbinder and Lookman. It then uses that theory to explain the anomalous rate-hardening behavior reported in 1988 by Follansbee and Kocks, and to explore the relation between hardening rate and grain size reported in 1995 by Meyers et al. A central theme is the need for physics-based, nonequilibrium analyses in developing predictive theories of the strength of polycrystalline materials.
Submitted 26 June, 2015; v1 submitted 25 May, 2015;
originally announced May 2015.
-
Dynamical Quasicondensation of Hard-Core Bosons at Finite Momenta
Authors:
L. Vidmar,
J. P. Ronzheimer,
M. Schreiber,
S. Braun,
S. S. Hodgman,
S. Langer,
F. Heidrich-Meisner,
I. Bloch,
U. Schneider
Abstract:
Long-range order in quantum many-body systems is usually associated with equilibrium situations. Here, we experimentally investigate the quasicondensation of strongly-interacting bosons at finite momenta in a far-from-equilibrium case. We prepare an inhomogeneous initial state consisting of one-dimensional Mott insulators in the center of otherwise empty one-dimensional chains in an optical lattice with a lattice constant $d$. After suddenly quenching the trapping potential to zero, we observe the onset of coherence in spontaneously forming quasicondensates in the lattice. Remarkably, the emerging phase order differs from the ground-state order and is characterized by peaks at finite momenta $\pm (\pi/2) (\hbar / d)$ in the momentum distribution function.
△ Less
Submitted 19 October, 2015; v1 submitted 19 May, 2015;
originally announced May 2015.