
CESAR: Automatic Induction of Compositional Instructions for Multi-turn Dialogs

Authors: Taha Aksu, Devamanyu Hazarika, Shikib Mehri, Seokhwan Kim, Dilek Hakkani-Tür, Yang Liu, Mahdi Namazifar

Abstract: Instruction-based multitasking has played a critical role in the success of large language models (LLMs) in multi-turn dialog applications. While publicly available LLMs have shown promising performance, when exposed to complex instructions with multiple constraints, they lag behind state-of-the-art models like ChatGPT. In this work, we hypothesize that the availability of large-scale complex demonstrations is crucial in bridging this gap. Focusing on dialog applications, we propose a novel framework, CESAR, that unifies a large number of dialog tasks in the same format and allows programmatic induction of complex instructions without any manual effort. We apply CESAR to InstructDial, a benchmark for instruction-based dialog tasks. We further enhance InstructDial with new datasets and tasks and utilize CESAR to induce complex tasks with compositional instructions. This results in a new benchmark called InstructDial++, which includes 63 datasets with 86 basic tasks and 68 composite tasks. Through rigorous experiments, we demonstrate the scalability of CESAR in providing rich instructions. Models trained on InstructDial++ can follow compositional prompts, such as prompts that ask for multiple stylistic constraints.

Submitted 29 November, 2023; originally announced November 2023.

Comments: EMNLP 2023

arXiv:2311.14543

Data-Efficient Alignment of Large Language Models with Human Feedback Through Natural Language

Authors: Di Jin, Shikib Mehri, Devamanyu Hazarika, Aishwarya Padmakumar, Sungjin Lee, Yang Liu, Mahdi Namazifar

Abstract: Learning from human feedback is a prominent technique to align the output of large language models (LLMs) with human expectations. Reinforcement learning from human feedback (RLHF) leverages human preference signals in the form of rankings of response pairs to perform this alignment. However, human preference on LLM outputs can come in much richer forms, including natural language, which may provide detailed feedback on the strengths and weaknesses of a given response. In this work we investigate the data efficiency of modeling human feedback that is in natural language. Specifically, we fine-tune an open-source LLM, e.g., Falcon-40B-Instruct, on a relatively small amount (1000 records or even less) of human feedback in natural language in the form of critiques and revisions of responses. We show that this model is able to improve the quality of responses from even some of the strongest LLMs such as ChatGPT, BARD, and Vicuna, through critique and revision of those responses. For instance, after one iteration of revision of ChatGPT responses, the revised responses have a 56.6% win rate over the original ones, and this win rate can be further improved to 65.9% after applying the revision for five iterations.

Submitted 24 November, 2023; originally announced November 2023.

Comments: Accepted by Workshop on Instruction Tuning and Instruction Following at NeurIPS 2023, Submitted to AAAI 2024
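The win rates quoted above compare revised responses against the originals in pairwise judgements. Below is a minimal sketch of how such a win rate might be tallied. It assumes each comparison is labeled 'win', 'loss', or 'tie' from the revised response's perspective. It also assumes ties count as half a win; the abstract does not state how ties are handled. The function name `win_rate` is hypothetical.

```python
from collections import Counter
from typing import Iterable


def win_rate(outcomes: Iterable[str], tie_weight: float = 0.5) -> float:
    """Fraction of pairwise comparisons won by the revised response.

    Each outcome is 'win', 'loss', or 'tie' from the revised response's
    perspective. Counting a tie as half a win (tie_weight=0.5) is an
    assumption, not a protocol stated in the abstract.
    """
    counts = Counter(outcomes)
    unknown = set(counts) - {"win", "loss", "tie"}
    if unknown:
        raise ValueError(f"unexpected outcome labels: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no comparisons given")
    # Counter returns 0 for labels that never occurred.
    return (counts["win"] + tie_weight * counts["tie"]) / total


print(win_rate(["win", "loss", "tie", "win"]))  # 0.625
```

Setting `tie_weight=0.0` gives the stricter "wins only" reading.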

arXiv:2310.20072

Automatic Evaluation of Generative Models with Instruction Tuning

Authors: Shuhaib Mehri, Vered Shwartz

Abstract: Automatic evaluation of natural language generation has long been an elusive goal in NLP. A recent paradigm fine-tunes pre-trained language models to emulate human judgements for a particular task and evaluation criterion. Inspired by the generalization ability of instruction-tuned models, we propose a learned metric based on instruction tuning. To test our approach, we collected HEAP, a dataset of human judgements across various NLG tasks and evaluation criteria. Our findings demonstrate that instruction tuning language models on HEAP yields good performance on many evaluation tasks, though some criteria are less trivial to learn than others. Further, jointly training on multiple tasks can yield additional performance improvements, which can be beneficial for future tasks with little to no human-annotated data.

Submitted 30 October, 2023; originally announced October 2023.

Comments: 11 pages, 1 figure

arXiv:2304.10162

Poly-Exp Bounds in Tandem Queues

Authors: Florin Ciucu, Sima Mehri

Abstract: When the arrival processes are Poisson, queueing networks are well understood in terms of the product-form structure of the number of jobs $N_i$ at the individual queues; much less is known about the waiting time $W$ across the whole network. In turn, for non-Poisson arrivals, little is known about either the $N_i$'s or $W$. This paper considers a tandem network $$GI/G/1\rightarrow \cdot/G/1\rightarrow\dots\rightarrow\cdot/G/1$$ with general arrivals and light-tailed service times. The main result, obtained by constructing upper bounds of the form $$(a_I x^I+\dots+a_1 x+a_0)\,e^{-\theta x},$$ is that the tail $\mathbb{P}(W>x)$ has a polynomial-exponential (Poly-Exp) structure. The degree $I$ of the polynomial depends on the number of bottleneck queues, their positions in the tandem, and also on the 'light-tailedness' of the service times. The bounds hold in non-asymptotic regimes (i.e., for finite $x$), are shown to be sharp, and improve upon alternative results based on large deviations by (many) orders of magnitude. The overall technique is also particularly robust, as it immediately extends, for instance, to non-renewal arrivals.

Submitted 20 April, 2023; originally announced April 2023.
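The snippet below makes the shape of the Poly-Exp bound concrete. It evaluates an expression of the form $(a_I x^I+\dots+a_1 x+a_0)\,e^{-\theta x}$ for given coefficients using Horner's rule. The coefficient values and θ in the usage lines are arbitrary placeholders, not constants derived in the paper. Computing the actual $a_i$ and θ for a given tandem network is the paper's contribution and is not reproduced here. The function name `poly_exp_bound` is hypothetical.

```python
import math
from typing import Sequence


def poly_exp_bound(x: float, coeffs: Sequence[float], theta: float) -> float:
    """Evaluate (a_I x^I + ... + a_1 x + a_0) * exp(-theta * x).

    `coeffs` lists a_0, a_1, ..., a_I in increasing degree.
    """
    if x < 0:
        raise ValueError("the tail bound is evaluated for x >= 0")
    poly = 0.0
    # Horner's rule, starting from the highest-degree coefficient a_I.
    for a in reversed(coeffs):
        poly = poly * x + a
    return poly * math.exp(-theta * x)


# Placeholder coefficients for a degree-2 polynomial (I = 2) and decay rate.
for x in (1.0, 5.0, 10.0):
    print(x, poly_exp_bound(x, coeffs=[1.0, 0.5, 0.1], theta=0.8))
```

As x grows, the logarithmic decay rate of such a bound approaches θ. The polynomial degree I controls a subexponential correction on top of it.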

arXiv:2303.03069

doi 10.1103/PhysRevE.107.044702

Hidden scale invariance in the Gay-Berne model. II. Smectic B phase

Authors: Saeed Mehri, Jeppe C. Dyre, Trond S. Ingebrigtsen

Abstract: This paper complements a previous study of the isotropic and nematic phases of the Gay-Berne liquid-crystal model [Mehri et al., Phys. Rev. E 105, 064703 (2022)] with a study of its smectic B phase found at high density and low temperatures. In this phase, too, we find strong correlations between the virial and potential-energy thermal fluctuations, reflecting hidden scale invariance and implying the existence of isomorphs. The predicted approximate isomorph invariance of the physics is confirmed by simulations of the standard and orientational radial distribution functions, the mean-square displacement as a function of time, as well as the force, torque, velocity, angular velocity, and orientational time-autocorrelation functions. The regions of the Gay-Berne model that are relevant for liquid-crystal experiments can thus be fully simplified via the isomorph theory.

Submitted 6 April, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

Journal ref: Phys. Rev. E 107, 044702 (2023)

arXiv:2301.12004

Understanding the Effectiveness of Very Large Language Models on Dialog Evaluation

Authors: Jessica Huynh, Cathy Jiao, Prakhar Gupta, Shikib Mehri, Payal Bajaj, Vishrav Chaudhary, Maxine Eskenazi

Abstract: Language models have steadily increased in size over the past few years. They achieve a high level of performance on various natural language processing (NLP) tasks such as question answering and summarization. Large language models (LLMs) have been used for generation and can now output human-like text. As a result, other downstream tasks in the realm of dialog can now harness the LLMs' language understanding capabilities. This paper explores one such task, dialog evaluation, concentrating on prompting the LLMs BLOOM, OPT, GPT-3, Flan-T5, InstructDial and TNLGv2. The paper shows that the choice of datasets used for training a model affects both how well it performs on a task and how the prompt should be structured. Specifically, the more diverse and relevant the group of datasets that a model is trained on, the better it performs on dialog evaluation. This paper also investigates how the number of examples in the prompt and the type of example selection used affect the model's performance.

Submitted 27 January, 2023; originally announced January 2023.

Comments: Accepted for publication at IWSDS 2023

arXiv:2208.10918

The DialPort tools

Authors: Jessica Huynh, Shikib Mehri, Cathy Jiao, Maxine Eskenazi

Abstract: The DialPort project http://dialport.org/, funded by the National Science Foundation (NSF), covers a group of tools and services that aim at fulfilling the needs of the dialog research community. Over the course of six years, several offerings have been created, including the DialPort Portal and DialCrowd. This paper describes these contributions, which will be demoed at SIGDIAL, including implementation, prior studies, corresponding discoveries, and the locations at which the tools will remain freely available to the community going forward.

Submitted 18 August, 2022; originally announced August 2022.

Comments: Accepted to SIGDIAL 2022

arXiv:2207.14403

Interactive Evaluation of Dialog Track at DSTC9

Authors: Shikib Mehri, Yulan Feng, Carla Gordon, Seyed Hossein Alavi, David Traum, Maxine Eskenazi

Abstract: The ultimate goal of dialog research is to develop systems that can be effectively used in interactive settings by real users. To this end, we introduced the Interactive Evaluation of Dialog Track at the 9th Dialog System Technology Challenge. This track consisted of two sub-tasks. The first sub-task involved building knowledge-grounded response generation models. The second sub-task aimed to extend dialog models beyond static datasets by assessing them in an interactive setting with real users. Our track challenges participants to develop strong response generation models and explore strategies that extend them to back-and-forth interactions with real users. The progression from static corpora to interactive evaluation introduces unique challenges and facilitates a more thorough assessment of open-domain dialog systems. This paper provides an overview of the track, including the methodology and results. Furthermore, it provides insights into how to best evaluate open-domain dialog models.

Submitted 28 July, 2022; originally announced July 2022.

Comments: Presented at LREC 2022 and DSTC9 Workshop at AAAI 2021

arXiv:2207.14393

LAD: Language Models as Data for Zero-Shot Dialog

Authors: Shikib Mehri, Yasemin Altun, Maxine Eskenazi

Abstract: To facilitate zero-shot generalization in task-oriented dialog, this paper proposes Language Models as Data (LAD). LAD is a paradigm for creating diverse and accurate synthetic data which conveys the necessary structural constraints and can be used to train a downstream neural dialog model. LAD leverages GPT-3 to induce linguistic diversity. LAD achieves significant performance gains in zero-shot settings on intent prediction (+15%), slot filling (+31.4 F1) and next action prediction (+11 F1). Furthermore, an interactive human evaluation shows that training with LAD is competitive with training on human dialogs. LAD is open-sourced, with the code and data available at https://github.com/Shikib/lad.

Submitted 28 July, 2022; originally announced July 2022.

Comments: Accepted as a long paper to SIGDial 2022

arXiv:2206.05131

doi 10.3390/thermo2030013

Single-parameter aging in the weakly nonlinear limit

Authors: Saeed Mehri, Lorenzo Costigliola, Jeppe C. Dyre

Abstract: Physical aging deals with slow property changes over time caused by molecular rearrangements. This is relevant for non-crystalline materials like polymers and inorganic glasses, both in production and during subsequent use. The Narayanaswamy theory from 1971 describes physical aging, an inherently nonlinear phenomenon, in terms of a linear convolution integral over the so-called material time $\xi$. The resulting "Tool-Narayanaswamy (TN) formalism" is generally recognized to provide an excellent description of physical aging for small but still highly nonlinear temperature variations. The simplest version of the TN formalism is single-parameter aging, according to which the clock rate $d\xi/dt$ is an exponential function of the property monitored [T. Hecksher et al., J. Chem. Phys. 142, 241103 (2015)]. For temperature jumps starting from thermal equilibrium, this leads to a first-order differential equation for the property monitored, involving a system-specific function. The present paper shows analytically that the solution to this equation to first order in the temperature variation has a universal expression in terms of the zeroth-order solution, $R_0(t)$. Numerical data for a binary Lennard-Jones glass former probing the potential energy confirm that, in the weakly nonlinear limit, the theory predicts aging correctly from $R_0(t)$ (which by the fluctuation-dissipation theorem is the normalized equilibrium potential-energy time-autocorrelation function).

Submitted 6 July, 2022; v1 submitted 10 June, 2022; originally announced June 2022.

Journal ref: Thermo 2, 160 (2022) [Open access]
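The sketch below illustrates numerically how single-parameter aging yields a first-order differential equation after a temperature jump. It is not the paper's derivation and rests on three assumptions:

- The normalized relaxing part of the monitored property is R(t), with R(0) = 1.
- The relaxation function in material time is a simple exponential, φ(ξ) = exp(-ξ). Real glass formers are typically non-exponential; φ plays the role of the system-specific function left open in the abstract.
- The clock rate is dξ/dt = γ exp(-λR(t)), where λ is a hypothetical lumped nonlinearity parameter whose sign depends on the jump direction.

With R = φ(ξ), these give dR/dt = -γ R exp(-λR). The snippet integrates this with a classical fourth-order Runge-Kutta scheme. The function name `single_parameter_aging` is hypothetical.

```python
import math
from typing import List


def single_parameter_aging(gamma: float, lam: float, t_max: float,
                           n_steps: int = 1000) -> List[float]:
    """Integrate dR/dt = -gamma * R * exp(-lam * R) from R(0) = 1.

    Illustrative assumptions only: phi(xi) = exp(-xi) and a clock rate
    dxi/dt = gamma * exp(-lam * R). `lam` is a hypothetical lumped
    nonlinearity parameter; lam = 0 is the linear limit R(t) = exp(-gamma t).
    Returns R at n_steps + 1 equally spaced times in [0, t_max].
    """
    def rhs(r: float) -> float:
        return -gamma * r * math.exp(-lam * r)

    dt = t_max / n_steps
    r = 1.0
    values = [r]
    for _ in range(n_steps):
        # Classical fourth-order Runge-Kutta step (the ODE is autonomous).
        k1 = rhs(r)
        k2 = rhs(r + 0.5 * dt * k1)
        k3 = rhs(r + 0.5 * dt * k2)
        k4 = rhs(r + dt * k3)
        r += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        values.append(r)
    return values


linear = single_parameter_aging(gamma=1.0, lam=0.0, t_max=5.0)
nonlinear = single_parameter_aging(gamma=1.0, lam=0.5, t_max=5.0)
print(linear[-1], math.exp(-5.0))  # the linear limit tracks exp(-gamma * t)
print(nonlinear[-1])  # lam > 0 slows the clock, so R decays more slowly
```

The asymmetry between up and down jumps enters here only through the sign of `lam`. A negative value speeds up the clock and gives faster relaxation.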

arXiv:2205.12673

InstructDial: Improving Zero and Few-shot Generalization in Dialogue through Instruction Tuning

Authors: Prakhar Gupta, Cathy Jiao, Yi-Ting Yeh, Shikib Mehri, Maxine Eskenazi, Jeffrey P. Bigham

Abstract: Instruction tuning is an emergent paradigm in NLP wherein natural language instructions are leveraged with language models to induce zero-shot performance on unseen tasks. Instructions have been shown to enable good performance on unseen tasks and datasets in both large and small language models. Dialogue is an especially interesting area to explore instruction tuning because dialogue systems perform multiple kinds of tasks related to language (e.g., natural language understanding and generation, domain-specific interaction), yet instruction tuning has not been systematically explored for dialogue-related tasks. We introduce InstructDial, an instruction tuning framework for dialogue, which consists of a repository of 48 diverse dialogue tasks in a unified text-to-text format created from 59 openly available dialogue datasets. Next, we explore the cross-task generalization ability of models tuned on InstructDial across diverse dialogue tasks. Our analysis reveals that InstructDial enables good zero-shot performance on unseen datasets and tasks such as dialogue evaluation and intent detection, and even better performance in a few-shot setting. To ensure that models adhere to instructions, we introduce novel meta-tasks. We establish benchmark zero-shot and few-shot performance of models trained using the proposed framework on multiple dialogue tasks.

Submitted 26 October, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

Comments: EMNLP 2022

arXiv:2205.10306

doi 10.1103/PhysRevE.105.064703

Hidden scale invariance in the Gay-Berne model

Authors: Saeed Mehri, Jeppe C. Dyre, Trond S. Ingebrigtsen

Abstract: This paper presents a numerical study of the Gay-Berne liquid crystal model with parameters corresponding to calamitic (rod-shaped) molecules. The focus is on the isotropic and nematic phases at temperatures above unity. There we find strong correlations between the virial and potential-energy thermal fluctuations, reflecting the hidden scale invariance symmetry. This implies the existence of isomorphs, which are curves in the thermodynamic phase diagram of approximately invariant physics. We study numerically one isomorph in the isotropic phase and one in the nematic phase. In both cases, good invariance of the dynamics is demonstrated via data for the reduced-unit time-autocorrelation functions of the mean-square displacement, angular velocity, force, torque, and first- and second-order Legendre polynomial orientational order parameters. Deviations from isomorph invariance are observed at short times for the orientational time-autocorrelation functions, which reflects the fact that the moment of inertia is assumed to be constant and thus not isomorph invariant in reduced units. Structural isomorph invariance is demonstrated from data for the radial distribution functions of the particles and their orientations. For comparison, all quantities were also simulated along an isochore with similar temperature variation, in which case invariance is not observed.
We conclude that the thermodynamic phase diagram of the calamitic Gay-Berne model is essentially one-dimensional in the studied regions, as predicted by isomorph theory, a fact that potentially allows for simplifications of future theories and numerical studies.

Submitted 19 June, 2022; v1 submitted 20 May, 2022; originally announced May 2022.

Journal ref: Phys. Rev. E 105, 064703 (2022)
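Isomorph theory conventionally quantifies the virial/potential-energy correlation with two quantities:

- the Pearson coefficient R = <ΔW ΔU> / sqrt(<(ΔW)^2> <(ΔU)^2>);
- the density-scaling exponent γ = <ΔW ΔU> / <(ΔU)^2>.

R > 0.9 is commonly used as the threshold for "strong" correlation. The sketch below computes both from equilibrium time series of W and U. The synthetic data in the usage lines are invented stand-ins for simulation output. The function name `wu_correlation` is hypothetical.

```python
from typing import Tuple

import numpy as np


def wu_correlation(virial, potential_energy) -> Tuple[float, float]:
    """Return (R, gamma) from equilibrium time series of W and U.

    R     = <dW dU> / sqrt(<dW^2> <dU^2>)   (Pearson correlation coefficient)
    gamma = <dW dU> / <dU^2>                (density-scaling exponent estimate)
    where d denotes the deviation from the time average.
    """
    w = np.asarray(virial, dtype=float)
    u = np.asarray(potential_energy, dtype=float)
    dw = w - w.mean()
    du = u - u.mean()
    cov = np.mean(dw * du)
    r = cov / np.sqrt(np.mean(dw ** 2) * np.mean(du ** 2))
    gamma = cov / np.mean(du ** 2)
    return float(r), float(gamma)


# Synthetic stand-in data: W fluctuates roughly as 6 * U plus small noise.
rng = np.random.default_rng(0)
u = rng.normal(size=10_000)
w = 6.0 * u + 0.5 * rng.normal(size=10_000)
r, gamma = wu_correlation(w, u)
print(f"R = {r:.3f}, gamma = {gamma:.2f}, strongly correlating: {r > 0.9}")
```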

arXiv:2203.10012

Report from the NSF Future Directions Workshop on Automatic Evaluation of Dialog: Research Directions and Challenges

Authors: Shikib Mehri, Jinho Choi, Luis Fernando D'Haro, Jan Deriu, Maxine Eskenazi, Milica Gasic, Kallirroi Georgila, Dilek Hakkani-Tur, Zekang Li, Verena Rieser, Samira Shaikh, David Traum, Yi-Ting Yeh, Zhou Yu, Yizhe Zhang, Chen Zhang

Abstract: This is a report on the NSF Future Directions Workshop on Automatic Evaluation of Dialog. The workshop explored the current state of the art along with its limitations and suggested promising directions for future work in this important and very rapidly changing area of research.

Submitted 18 March, 2022; originally announced March 2022.

Comments: Report from the NSF AED Workshop (http://dialrc.org/AED/)

arXiv:2109.11832

doi 10.1126/sciadv.abl9809

Predicting nonlinear physical aging of glasses from equilibrium relaxation via the material time

Authors: Birte Riechers, Lisa A. Roed, Saeed Mehri, Trond S. Ingebrigtsen, Tina Hecksher, Jeppe C. Dyre, Kristine Niss

Abstract: The noncrystalline glassy state of matter plays a role in virtually all fields of materials science and offers complementary properties to those of the crystalline counterpart. The caveat of the glassy state is that it is out of equilibrium and therefore exhibits physical aging, i.e., material properties change over time. For half a century, the physical aging of glasses has been known to be described well by the material-time concept, although the existence of a material time has never been directly validated. We do this here by successfully predicting the aging of the molecular glass 4-vinyl-1,3-dioxolan-2-one from its linear relaxation behavior. This establishes the defining property of the material time. Via the fluctuation-dissipation theorem, our results imply that physical aging can be predicted from thermal-equilibrium fluctuation data, which is confirmed by computer simulations of a binary liquid mixture.

Submitted 21 March, 2022; v1 submitted 24 September, 2021; originally announced September 2021.

Comments: Published in Science Advances

Journal ref: Sci. Adv. 8, eabl9809 (2022) [Open access]

arXiv:2106.07056

Schema-Guided Paradigm for Zero-Shot Dialog

Authors: Shikib Mehri, Maxine Eskenazi

Abstract: Developing mechanisms that flexibly adapt dialog systems to unseen tasks and domains is a major challenge in dialog research. Neural models implicitly memorize task-specific dialog policies from the training data. We posit that this implicit memorization has precluded zero-shot transfer learning. To this end, we leverage the schema-guided paradigm, wherein the task-specific dialog policy is explicitly provided to the model. We introduce the Schema Attention Model (SAM) and improved schema representations for the STAR corpus. SAM obtains significant improvement in zero-shot settings, with a +22 F1 score improvement over prior work. These results validate the feasibility of zero-shot generalizability in dialog. Ablation experiments are also presented to demonstrate the efficacy of SAM.

Submitted 13 June, 2021; originally announced June 2021.

Comments: Accepted at SIGDial 2021

arXiv:2106.07055

GenSF: Simultaneous Adaptation of Generative Pre-trained Models and Slot Filling

Authors: Shikib Mehri, Maxine Eskenazi

Abstract: In transfer learning, it is imperative to achieve strong alignment between a pre-trained model and a downstream task. Prior work has done this by proposing task-specific pre-training objectives, which sacrifices the inherent scalability of the transfer learning paradigm. We instead achieve strong alignment by simultaneously modifying both the pre-trained model and the formulation of the downstream task, which is more efficient and preserves the scalability of transfer learning. We present GenSF (Generative Slot Filling), which leverages a generative pre-trained open-domain dialog model for slot filling. GenSF (1) adapts the pre-trained model by incorporating inductive biases about the task and (2) adapts the downstream task by reformulating slot filling to better leverage the pre-trained model's capabilities. GenSF achieves state-of-the-art results on two slot filling datasets with strong gains in few-shot and zero-shot settings. We achieve a 9 F1 score improvement in zero-shot slot filling. This highlights the value of strong alignment between the pre-trained model and the downstream task.

Submitted 13 June, 2021; originally announced June 2021.

Comments: Accepted at SIGDial 2021

arXiv:2106.03706

A Comprehensive Assessment of Dialog Evaluation Metrics

Authors: Yi-Ting Yeh, Maxine Eskenazi, Shikib Mehri

Abstract: Automatic evaluation metrics are a crucial component of dialog systems research. Standard language evaluation metrics are known to be ineffective for evaluating dialog. As such, recent research has proposed a number of novel, dialog-specific metrics that correlate better with human judgements. Due to the fast pace of research, many of these metrics have been assessed on different datasets and there has as yet been no time for a systematic comparison between them. To this end, this paper provides a comprehensive assessment of recently proposed dialog evaluation metrics on a number of datasets. In this paper, 23 different automatic evaluation metrics are evaluated on 10 different datasets. Furthermore, the metrics are assessed in different settings, to better qualify their respective strengths and weaknesses. Metrics are assessed (1) on both the turn level and the dialog level, (2) for different dialog lengths, (3) for different dialog qualities (e.g., coherence, engaging), (4) for different types of response generation models (i.e., generative, retrieval, simple models and state-of-the-art models), (5) taking into account the similarity of different metrics and (6) exploring combinations of different metrics. This comprehensive assessment offers several takeaways pertaining to dialog evaluation metrics in general. It also suggests how to best assess evaluation metrics and indicates promising directions for future work.

Submitted 7 July, 2021; v1 submitted 7 June, 2021; originally announced June 2021.
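Assessing how well a dialog metric "correlates with human judgements" usually comes down to a correlation coefficient between metric scores and human ratings on the same turns or dialogs. Below is a minimal sketch of one common choice: Spearman rank correlation, computed as the Pearson correlation of average (tie-aware) ranks. This abstract does not say which coefficient each comparison uses. The scores below are invented for illustration, and the helper names are hypothetical.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j map to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(metric_scores: Sequence[float], human_scores: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the average ranks."""
    if len(metric_scores) != len(human_scores) or len(metric_scores) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    return pearson(average_ranks(metric_scores), average_ranks(human_scores))


metric = [0.12, 0.40, 0.35, 0.80, 0.55]  # invented metric scores per turn
human = [1, 3, 2, 5, 4]                  # invented human ratings (1-5)
print(spearman(metric, human))  # 1.0: the two orderings agree exactly
```

Rank correlation is insensitive to a metric's scale, which matters when comparing metrics whose raw scores live on very different ranges.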

arXiv:2103.02650

Successor Feature Sets: Generalizing Successor Representations Across Policies

Authors: Kianté Brantley, Soroush Mehri, Geoffrey J. Gordon

Abstract: Successor-style representations have many advantages for reinforcement learning: for example, they can help an agent generalize from past experience to new goals, and they have been proposed as explanations of behavioral and neural data from human and animal learners. They also form a natural bridge between model-based and model-free RL methods: like the former they make predictions about future experiences, and like the latter they allow efficient prediction of total discounted rewards. However, successor-style representations are not optimized to generalize across policies: typically, we maintain a limited-length list of policies, and share information among them by representation learning or generalized policy improvement (GPI). Successor-style representations also typically make no provision for gathering information or reasoning about latent variables. To address these limitations, we bring together ideas from predictive state representations, belief space value iteration, successor features, and convex analysis: we develop a new, general successor-style representation, together with a Bellman equation that connects multiple sources of information within this representation, including different latent states, policies, and reward functions. The new representation is highly expressive: for example, it lets us efficiently read off an optimal policy for a new reward function, or a policy that imitates a new demonstration.
For this paper, we focus on exact computation of the new representation in small, known environments, since even this restricted setting offers plenty of interesting questions. Our implementation does not scale to large, unknown environments. Nor would we expect it to, since it generalizes POMDP value iteration, which is difficult to scale. However, we believe that future work will allow us to extend our ideas to approximate reasoning in large, unknown environments.

Submitted 15 March, 2021; v1 submitted 3 March, 2021; originally announced March 2021.

arXiv:2012.00358 [pdf, other]

doi 10.1063/5.0039250

Single-parameter aging in a binary Lennard-Jones system

Authors: Saeed Mehri, Trond S. Ingebrigtsen, Jeppe C. Dyre

Abstract: This paper studies physical aging by computer simulations of a 2:1 Kob-Andersen binary Lennard-Jones mixture, a system that is less prone to crystallization than the standard 4:1 composition. Starting from thermal-equilibrium states, the time evolution of the following four quantities is monitored following up and down jumps in temperature: the potential energy, the virial, the average squared force, and the Laplacian of the potential energy. Despite the fact that significantly larger temperature jumps are studied here than in previous experiments, to a good approximation all four quantities conform to the single-parameter-aging scenario derived and validated for small jumps in experiments [Hecksher et al., J. Chem. Phys. 142, 241103 (2015)]. As a further confirmation of single-parameter aging with a common material time for the different quantities monitored, their relaxing parts are found to be almost identical for all temperature jumps.

Submitted 22 January, 2021; v1 submitted 1 December, 2020; originally announced December 2020.

Journal ref: J. Chem. Phys. 154, 094504 (2021)

arXiv:2011.06486 [pdf, ps, other]

Overview of the Ninth Dialog System Technology Challenge: DSTC9

Authors: Chulaka Gunasekara, Seokhwan Kim, Luis Fernando D'Haro, Abhinav Rastogi, Yun-Nung Chen, Mihail Eric, Behnam Hedayatnia, Karthik Gopalakrishnan, Yang Liu, Chao-Wei Huang, Dilek Hakkani-Tür, Jinchao Li, Qi Zhu, Lingxiao Luo, Lars Liden, Kaili Huang, Shahin Shayandeh, Runze Liang, Baolin Peng, Zheng Zhang, Swadheen Shukla, Minlie Huang, Jianfeng Gao, Shikib Mehri, Yulan Feng, et al. (14 additional authors not shown)

Abstract: This paper introduces the Ninth Dialog System Technology Challenge (DSTC-9). This edition of the DSTC focuses on applying end-to-end dialog technologies to four distinct tasks in dialog systems, namely: (1) task-oriented dialog modeling with unstructured knowledge access, (2) multi-domain task-oriented dialog, (3) interactive evaluation of dialog, and (4) situated interactive multi-modal dialog. This paper describes the task definition, provided datasets, baselines and evaluation set-up for each track. We also summarize the results of the submitted systems to highlight the overall trends of the state-of-the-art technologies for the tasks.

Submitted 12 November, 2020; originally announced November 2020.

arXiv:2011.00669 [pdf, other]

Reasoning Over History: Context Aware Visual Dialog

Authors: Muhammad A. Shah, Shikib Mehri, Tejas Srinivasan

Abstract: While neural models have been shown to exhibit strong performance on single-turn visual question answering (VQA) tasks, extending VQA to a multi-turn, conversational setting remains a challenge. One way to address this challenge is to augment existing strong neural VQA models with mechanisms that allow them to retain information from previous dialog turns. One strong VQA model is the MAC network, which decomposes a task into a series of attention-based reasoning steps. However, since the MAC network is designed for single-turn question answering, it is not capable of referring to past dialog turns. More specifically, it struggles with tasks that require reasoning over the dialog history, particularly coreference resolution. We extend the MAC network architecture with Context-aware Attention and Memory (CAM), which attends over control states in past dialog turns to determine the necessary reasoning operations for the current question. MAC nets with CAM achieve up to 98.25% accuracy on the CLEVR-Dialog dataset, beating the existing state-of-the-art by 30% (absolute). Our error analysis indicates that with CAM, the model's performance particularly improved on questions that required coreference resolution.

Submitted 1 November, 2020; originally announced November 2020.

Comments: Accepted to NLP Beyond Text workshop, EMNLP 2020

arXiv:2010.11853 [pdf, other]

STAR: A Schema-Guided Dialog Dataset for Transfer Learning

Authors: Johannes E. M. Mosig, Shikib Mehri, Thomas Kober

Abstract: We present STAR, a schema-guided task-oriented dialog dataset consisting of 127,833 utterances and knowledge base queries across 5,820 task-oriented dialogs in 13 domains that is especially designed to facilitate task and domain transfer learning in task-oriented dialog. Furthermore, we propose a scalable crowd-sourcing paradigm to collect arbitrarily large datasets of the same quality as STAR. Moreover, we introduce novel schema-guided dialog models that use an explicit description of the task(s) to generalize from known to unknown tasks. We demonstrate the effectiveness of these models, particularly for zero-shot generalization across tasks and domains.

Submitted 22 October, 2020; originally announced October 2020.

Comments: Equal contribution: Johannes E. M. Mosig, Shikib Mehri

arXiv:2010.08684 [pdf, other]

Example-Driven Intent Prediction with Observers

Authors: Shikib Mehri, Mihail Eric

Abstract: A key challenge of dialog systems research is to effectively and efficiently adapt to new domains. A scalable paradigm for adaptation necessitates the development of generalizable models that perform well in few-shot settings. In this paper, we focus on the intent classification problem, which aims to identify user intents given utterances addressed to the dialog system. We propose two approaches for improving the generalizability of utterance classification models: (1) observers and (2) example-driven training. Prior work has shown that BERT-like models tend to attribute a significant amount of attention to the [CLS] token, which we hypothesize results in diluted representations. Observers are tokens that are not attended to, and are an alternative to the [CLS] token as a semantic representation of utterances. Example-driven training learns to classify utterances by comparing to examples, thereby using the underlying encoder as a sentence similarity model. These methods are complementary; improving the representation through observers allows the example-driven model to better measure sentence similarities. When combined, the proposed methods attain state-of-the-art results on three intent prediction datasets (BANKING77, CLINC150, HWU64) in both the full data and few-shot (10 examples per intent) settings. Furthermore, we demonstrate that the proposed approach can transfer to new intents and across datasets without any additional training.
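The idea of classifying "by comparing to examples" can be sketched as a nearest-example vote. This is a hedged illustration, not the paper's model: the trained BERT-like encoder is replaced by a hypothetical bag-of-words `encode`, and we assume the intent distribution is a softmax over cosine similarities to the labeled examples, summed per label (`temperature` is likewise an assumed parameter). It does show why a new intent can be handled without retraining: you only add examples.

```python
import math
from collections import Counter


def encode(text):
    """Hypothetical stand-in encoder: bag-of-words counts (the paper uses a trained encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)  # Counter returns 0 for missing tokens
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_intent(query, examples, temperature=0.1):
    """Softmax over similarities to each (text, label) example, then sum weights per label."""
    q = encode(query)
    logits = [cosine(q, encode(text)) / temperature for text, _ in examples]
    top = max(logits)
    weights = [math.exp(s - top) for s in logits]  # subtract max for numerical stability
    total = sum(weights)
    scores = {}
    for w, (_, label) in zip(weights, examples):
        scores[label] = scores.get(label, 0.0) + w / total
    return max(scores, key=scores.get), scores


examples = [
    ("what is my account balance", "balance"),
    ("how much money do i have", "balance"),
    ("transfer money to savings", "transfer"),
    ("send money to my friend", "transfer"),
]
print(predict_intent("what is the balance on my account", examples)[0])  # balance
# A new intent is supported by adding an example, with no parameter updates.
print(predict_intent("please cancel my card", examples + [("cancel my card", "cancel_card")])[0])  # cancel_card
```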

Submitted 24 May, 2021; v1 submitted 16 October, 2020; originally announced October 2020.

Comments: NAACL 2021

arXiv:2009.13570 [pdf, ps, other]

DialoGLUE: A Natural Language Understanding Benchmark for Task-Oriented Dialogue

Authors: Shikib Mehri, Mihail Eric, Dilek Hakkani-Tur

Abstract: A long-standing goal of task-oriented dialogue research is the ability to flexibly adapt dialogue models to new domains. To progress research in this direction, we introduce DialoGLUE (Dialogue Language Understanding Evaluation), a public benchmark consisting of 7 task-oriented dialogue datasets covering 4 distinct natural language understanding tasks, designed to encourage dialogue research in representation-based transfer, domain adaptation, and sample-efficient task learning. We release several strong baseline models, demonstrating performance improvements over a vanilla BERT architecture and state-of-the-art results on 5 out of 7 tasks, by pre-training on a large open-domain dialogue corpus and task-adaptive self-supervised training. Through the DialoGLUE benchmark, the baseline methods, and our evaluation scripts, we hope to facilitate progress towards the goal of developing more general task-oriented dialogue models.

Submitted 30 September, 2020; v1 submitted 28 September, 2020; originally announced September 2020.

Comments: Benchmark hosted on: https://evalai.cloudcv.org/web/challenges/challenge-page/708/

arXiv:2006.12719 [pdf, ps, other]

Unsupervised Evaluation of Interactive Dialog with DialoGPT

Authors: Shikib Mehri, Maxine Eskenazi

Abstract: It is important to define meaningful and interpretable automatic evaluation metrics for open-domain dialog research. Standard language generation metrics have been shown to be ineffective for dialog. This paper introduces the FED metric (fine-grained evaluation of dialog), an automatic evaluation metric which uses DialoGPT, without any fine-tuning or supervision. It also introduces the FED dataset which is constructed by annotating a set of human-system and human-human conversations with eighteen fine-grained dialog qualities. The FED metric (1) does not rely on a ground-truth response, (2) does not require training data and (3) measures fine-grained dialog qualities at both the turn and whole dialog levels. FED attains moderate to strong correlation with human judgement at both levels.

Submitted 22 June, 2020; originally announced June 2020.

Comments: Published at SIGdial 2020

arXiv:2005.00456 [pdf, other]

USR: An Unsupervised and Reference Free Evaluation Metric for Dialog Generation

Authors: Shikib Mehri, Maxine Eskenazi

Abstract: The lack of meaningful automatic evaluation metrics for dialog has impeded open-domain dialog research. Standard language generation metrics have been shown to be ineffective for evaluating dialog models. To this end, this paper presents USR, an UnSupervised and Reference-free evaluation metric for dialog. USR is a reference-free metric that trains unsupervised models to measure several desirable qualities of dialog. USR is shown to strongly correlate with human judgment on both Topical-Chat (turn-level: 0.42, system-level: 1.0) and PersonaChat (turn-level: 0.48 and system-level: 1.0). USR additionally produces interpretable measures for several desirable properties of dialog.

Submitted 1 May, 2020; originally announced May 2020.

Comments: Accepted to ACL 2020 as long paper

arXiv:2004.01926 [pdf, other]

"None of the Above": Measure Uncertainty in Dialog Response Retrieval

Authors: Yulan Feng, Shikib Mehri, Maxine Eskenazi, Tiancheng Zhao

Abstract: This paper discusses the importance of uncovering uncertainty in end-to-end dialog tasks, and presents our experimental results on uncertainty classification on the Ubuntu Dialog Corpus. We show that, instead of retraining models for this specific purpose, the original retrieval model's underlying confidence concerning the best prediction can be captured with trivial additional computation.

Submitted 14 May, 2020; v1 submitted 4 April, 2020; originally announced April 2020.

Comments: Accepted to ACL 2020 as short paper

arXiv:1912.09863 [pdf, other]

Discretizations of Stochastic Evolution Equations in Variational Approach Driven by Jump-Diffusion

Authors: Sima Mehri, Erfan Salavati, Bijan Z. Zangeneh

Abstract: Stochastic evolution equations with compensated Poisson noise are considered in the variational approach with monotone and coercive coefficients. Here the Poisson noise is assumed to be time-homogeneous with σ-finite intensity measure on a metric space. By using finite element methods and Galerkin approximations, some explicit and implicit discretizations for this equation are presented and their convergence is proved. Polynomial growth condition and linear growth condition are assumed on the drift operator, respectively for the implicit and explicit schemes.

Submitted 19 April, 2022; v1 submitted 20 December, 2019; originally announced December 2019.

MSC Class: 60H15; 65M60; 60G51; 47H05; 47J35

arXiv:1911.03861 [pdf, other]

Increasing Robustness to Spurious Correlations using Forgettable Examples

Authors: Yadollah Yaghoobzadeh, Soroush Mehri, Remi Tachet, T. J. Hazen, Alessandro Sordoni

Abstract: Neural NLP models tend to rely on spurious correlations between labels and input features to perform their tasks. Minority examples, i.e., examples that contradict the spurious correlations present in the majority of data points, have been shown to increase the out-of-distribution generalization of pre-trained language models. In this paper, we first propose using example forgetting to find minority examples without prior knowledge of the spurious correlations present in the dataset. Forgettable examples are instances either learned and then forgotten during training or never learned. We empirically show how these examples are related to minorities in our training sets. Then, we introduce a new approach to robustify models by fine-tuning our models twice, first on the full training data and second on the minorities only. We obtain substantial improvements in out-of-distribution generalization when applying our approach to the MNLI, QQP, and FEVER datasets.
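The definition of forgettable examples ("learned and then forgotten during training or never learned") translates directly into a filter over per-example correctness histories. A minimal sketch, assuming correctness is recorded once per epoch as a list of booleans; the recording granularity and the function name are assumptions, not taken from the paper.

```python
def forgettable_examples(correct_history):
    """Return indices of forgettable examples.

    correct_history[i] lists, per recorded step (assumed: per epoch), whether
    example i was classified correctly. An example is forgettable if it was
    never learned, or if it was correct at some step and wrong at the next one.
    """
    forgettable = set()
    for idx, history in enumerate(correct_history):
        never_learned = not any(history)
        forgotten = any(prev and not curr for prev, curr in zip(history, history[1:]))
        if never_learned or forgotten:
            forgettable.add(idx)
    return forgettable


history = [
    [False, True, True, True],    # learned once, never forgotten: kept out
    [True, False, True, True],    # forgotten at step 2: forgettable
    [False, False, False, False],  # never learned: forgettable
    [True, True, True, True],     # always correct: kept out
]
print(forgettable_examples(history))  # {1, 2}
```

Under the approach described above, the selected subset would then serve as the "minorities" for the second fine-tuning stage.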

Submitted 1 February, 2021; v1 submitted 10 November, 2019; originally announced November 2019.

Comments: 14 pages, Accepted at EACL2021

arXiv:1909.01322 [pdf, other]

CMU GetGoing: An Understandable and Memorable Dialog System for Seniors

Authors: Shikib Mehri, Alan W Black, Maxine Eskenazi

Abstract: Voice-based technologies are typically developed for the average user, and thus generally not tailored to the specific needs of any subgroup of the population, like seniors. This paper presents CMU GetGoing, an accessible trip planning dialog system designed for senior users. The GetGoing system design is described in detail, with particular attention to the senior-tailored features. A user study is presented, demonstrating that the senior-tailored features significantly improve comprehension and retention of information.

Submitted 3 September, 2019; originally announced September 2019.

Comments: Accepted to the Dialog for Good (DiGo) workshop (http://dialogforgood.org) at SIGDial 2019

arXiv:1908.10646 [pdf, ps, other]

A Stochastic Gronwall Lemma and Well-Posedness of Path-Dependent SDEs Driven by Martingale Noise

Authors: Sima Mehri, Michael Scheutzow

Abstract: We show existence and uniqueness of solutions of stochastic path-dependent differential equations driven by cadlag martingale noise under joint local monotonicity and coercivity assumptions on the coefficients with a bound in terms of the supremum norm. In this set-up, the usual proof using the ordinary Gronwall lemma together with the Burkholder-Davis-Gundy inequality seems impossible. In order to solve this problem, we prove a new and quite general stochastic Gronwall lemma for cadlag martingales using Lenglart's inequality.

Submitted 28 August, 2019; originally announced August 2019.

Comments: 18 pages

MSC Class: 34K50; 60H10; 60G57; 34K28; 60G44

arXiv:1908.09890 [pdf, ps, other]

Multi-Granularity Representations of Dialog

Authors: Shikib Mehri, Maxine Eskenazi

Abstract: Neural models of dialog rely on generalized latent representations of language. This paper introduces a novel training procedure which explicitly learns multiple representations of language at several levels of granularity. The multi-granularity training algorithm modifies the mechanism by which negative candidate responses are sampled in order to control the granularity of learned latent representations. Strong performance gains are observed on the next utterance retrieval task using both the MultiWOZ dataset and the Ubuntu dialog corpus. Analysis significantly demonstrates that multiple granularities of representation are being learned, and that multi-granularity training facilitates better transfer to downstream tasks.

Submitted 26 August, 2019; originally announced August 2019.

Comments: Accepted as a long paper at EMNLP 2019

arXiv:1907.10568 [pdf, other]

Investigating Evaluation of Open-Domain Dialogue Systems With Human Generated Multiple References

Authors: Prakhar Gupta, Shikib Mehri, Tiancheng Zhao, Amy Pavel, Maxine Eskenazi, Jeffrey P. Bigham

Abstract: The aim of this paper is to mitigate the shortcomings of automatic evaluation of open-domain dialog systems through multi-reference evaluation. Existing metrics have been shown to correlate poorly with human judgement, particularly in open-domain dialog. One alternative is to collect human annotations for evaluation, which can be expensive and time-consuming. To demonstrate the effectiveness of multi-reference evaluation, we augment the test set of DailyDialog with multiple references. A series of experiments shows that the use of multiple references results in improved correlation between several automatic metrics and human judgement for both the quality and the diversity of system output.
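To make "multi-reference evaluation" concrete, here is a hedged sketch of one common way to score a response against several references: compute a single-reference metric per reference and keep the best match. Both the metric (unigram F1) and the max aggregation are assumptions chosen for illustration; the paper evaluates several existing metrics and may aggregate differently.

```python
from collections import Counter


def unigram_f1(hypothesis, reference):
    """Token-overlap F1 between a hypothesis and a single reference (illustrative metric)."""
    h = hypothesis.lower().split()
    r = reference.lower().split()
    overlap = sum((Counter(h) & Counter(r)).values())  # clipped token matches
    if overlap == 0:
        return 0.0
    precision = overlap / len(h)
    recall = overlap / len(r)
    return 2 * precision * recall / (precision + recall)


def multi_reference_score(hypothesis, references, metric=unigram_f1):
    """Assumed aggregation: score against every reference and keep the best match."""
    return max(metric(hypothesis, ref) for ref in references)


hyp = "i love pizza"
refs = ["i am not hungry", "i love pizza too"]
print(unigram_f1(hyp, refs[0]))          # 2/7 ≈ 0.286 with only the first reference
print(multi_reference_score(hyp, refs))  # 6/7 ≈ 0.857 once a second valid reference is added
```

The example shows the motivating failure case for open-domain dialog: a reasonable response scores poorly against one arbitrary reference, and adding more references widens the set of responses the metric can recognize as good.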

Submitted 8 September, 2019; v1 submitted 24 July, 2019; originally announced July 2019.

Comments: SIGDIAL 2019

arXiv:1907.10016 [pdf, other]

Structured Fusion Networks for Dialog

Authors: Shikib Mehri, Tejas Srinivasan, Maxine Eskenazi

Abstract: Neural dialog models have exhibited strong performance; however, their end-to-end nature lacks a representation of the explicit structure of dialog. This results in reduced generalizability and controllability, and makes the models data-hungry. Conversely, more traditional dialog systems do have strong models of explicit structure. This paper introduces several approaches for explicitly incorporating structure into neural models of dialog. Structured Fusion Networks first learn neural dialog modules corresponding to the structured components of traditional dialog systems and then incorporate these modules in a higher-level generative model. Structured Fusion Networks obtain strong results on the MultiWOZ dataset, both with and without reinforcement learning. Structured Fusion Networks are shown to have several valuable properties, including better domain generalizability, improved performance in reduced data scenarios and robustness to divergence during reinforcement learning.

Submitted 23 July, 2019; originally announced July 2019.

Comments: Accepted to SIGDial 2019

arXiv:1906.00414 [pdf, other]

Pretraining Methods for Dialog Context Representation Learning

Authors: Shikib Mehri, Evgeniia Razumovskaia, Tiancheng Zhao, Maxine Eskenazi

Abstract: This paper examines various unsupervised pretraining objectives for learning dialog context representations. Two novel methods of pretraining dialog context encoders are proposed, and a total of four methods are examined. Each pretraining objective is fine-tuned and evaluated on a set of downstream dialog tasks using the MultiWOZ dataset, and strong performance improvement is observed. Further evaluation shows that our pretraining objectives result in not only better performance, but also better convergence, models that are less data hungry and have better domain generalizability.

Submitted 3 June, 2019; v1 submitted 2 June, 2019; originally announced June 2019.

Comments: Accepted to ACL 2019

arXiv:1901.07778 [pdf, ps, other]

doi 10.1142/S0219493719500424

Weak Solutions to Vlasov-McKean Equations under Lyapunov-Type Conditions

Authors: Sima Mehri, Wilhelm Stannat

Abstract: We present a Lyapunov type approach to the problem of existence and uniqueness of general law-dependent stochastic differential equations. In the existing literature most results concerning existence and uniqueness are obtained under regularity assumptions of the coefficients w.r.t. the Wasserstein distance. Some existence and uniqueness results for irregular coefficients have been obtained by considering the total variation distance. Here we extend this approach to the control of the solution in some weighted total variation distance, which allows us now to derive a rather general weak uniqueness result, merely assuming measurability and certain integrability on the drift coefficient and some non-degeneracy on the dispersion coefficient. We also present an abstract weak existence result for the solution of law-dependent stochastic differential equations with merely measurable coefficients, based on an approximation with law-dependent stochastic differential equations with regular coefficients under Lyapunov type assumptions.

Submitted 16 November, 2019; v1 submitted 23 January, 2019; originally announced January 2019.

MSC Class: 60J60; 60H30; 93D30; 35Q83

Journal ref: Stochastics and Dynamics 2019

arXiv:1901.06613 [pdf, other]

Beyond Turing: Intelligent Agents Centered on the User

Authors: Maxine Eskenazi, Shikib Mehri, Evgeniia Razumovskaia, Tiancheng Zhao

Abstract: Most research on intelligent agents centers on the agent and not on the user. We look at the origins of agent-centric research for slot-filling, gaming and chatbot agents. We then argue that it is important to concentrate more on the user. After reviewing relevant literature, some approaches for creating and assessing user-centric systems are proposed.

Submitted 18 March, 2019; v1 submitted 19 January, 2019; originally announced January 2019.

Comments: 13 pages

arXiv:1810.11735 [pdf, other]

Middle-Out Decoding

Authors: Shikib Mehri, Leonid Sigal

Abstract: Despite being virtually ubiquitous, sequence-to-sequence models are challenged by their lack of diversity and inability to be externally controlled. In this paper, we speculate that a fundamental shortcoming of sequence generation models is that the decoding is done strictly from left to right, meaning that output values generated earlier have a profound effect on those generated later. To address this issue, we propose a novel middle-out decoder architecture that begins from an initial middle-word and simultaneously expands the sequence in both directions. To facilitate information flow and maintain consistent decoding, we introduce a dual self-attention mechanism that allows us to model complex dependencies between the outputs. We illustrate the performance of our model on the task of video captioning, as well as a synthetic sequence de-noising task. Our middle-out decoder achieves significant improvements on de-noising and competitive performance in the task of video captioning, while quantifiably improving the caption diversity. Furthermore, we perform a qualitative analysis that demonstrates our ability to effectively control the generation process of our decoder.

Submitted 27 October, 2018; originally announced October 2018.

Comments: Published as a conference paper at NIPS 2018

arXiv:1805.01654 [pdf, ps, other]

doi 10.1214/19-AAP1499

Propagation of Chaos for Stochastic Spatially Structured Neuronal Networks with Delay driven by Jump Diffusions

Authors: Sima Mehri, Michael Scheutzow, Wilhelm Stannat, Bijan Z. Zangeneh

Abstract: Spatially structured neural networks driven by jump diffusion noise with monotone coefficients, fully path dependent delay and with a disorder parameter are considered. Well-posedness for the associated McKean-Vlasov equation and a corresponding propagation of chaos result in the infinite population limit are proven. Our existence result for the McKean-Vlasov equation is based on the Euler approximation, which is applied to this type of equation for the first time.

Submitted 27 May, 2019; v1 submitted 4 May, 2018; originally announced May 2018.

Comments: In this version, a shorter title has been chosen. The manuscript has been accepted for publication in Annals of Applied Probability

MSC Class: primary: 60K35; 92B20 secondary: 65C20; 60F99; 82C80

Journal ref: Ann. Appl. Probab. 30 (2020), no. 1, 175-207
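The existence result above rests on an Euler approximation. As a heavily simplified illustration, and not the paper's model (no spatial structure, no delay, no disorder parameter), the sketch below applies an Euler-Maruyama step to an N-particle system whose drift depends on the empirical mean, with compound Poisson jumps. The drift `-(x - mean)`, the noise level `sigma`, the jump rate `jump_rate` and the jump size are all assumptions chosen for illustration. Propagation of chaos is the statement that, as N grows, each particle behaves like an independent copy of the McKean-Vlasov limit.

```python
import math
import random
from typing import List


def euler_mean_field(n_particles: int, t_end: float, n_steps: int,
                     sigma: float = 0.5, jump_rate: float = 1.0,
                     jump_size: float = -0.2, seed: int = 0) -> List[float]:
    """Euler-Maruyama scheme for a toy (assumed) mean-field SDE with jumps:

        dX_i = -(X_i - mean(X)) dt + sigma dW_i + jump_size dN_i,

    where the N_i are independent Poisson processes with rate jump_rate.
    Returns the particle positions at time t_end.
    """
    rng = random.Random(seed)
    dt = t_end / n_steps
    x = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]  # i.i.d. initial data
    for _ in range(n_steps):
        m = sum(x) / n_particles  # empirical mean stands in for the law of X_t
        new_x = []
        for xi in x:
            drift = -(xi - m) * dt
            diffusion = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            # The number of jumps in [t, t + dt) is Poisson(jump_rate * dt);
            # for small dt it is approximated by a single Bernoulli trial.
            jump = jump_size if rng.random() < jump_rate * dt else 0.0
            new_x.append(xi + drift + diffusion + jump)
        x = new_x
    return x


if __name__ == "__main__":
    final = euler_mean_field(n_particles=200, t_end=1.0, n_steps=100)
    print(sum(final) / len(final))
```

Replacing the law of the solution by the empirical measure of the particles is what links the finite system to the McKean-Vlasov equation in the limit.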

arXiv:1712.09926 [pdf, other]

Rapid Adaptation with Conditionally Shifted Neurons

Authors: Tsendsuren Munkhdalai, Xingdi Yuan, Soroush Mehri, Adam Trischler

Abstract: We describe a mechanism, which we call conditionally shifted neurons, by which artificial neural networks can learn rapid adaptation: the ability to adapt on the fly, with little data, to new tasks. We apply this mechanism in the framework of metalearning, where the aim is to replicate some of the flexibility of human learning in machines. Conditionally shifted neurons modify their activation values with task-specific shifts retrieved from a memory module, which is populated rapidly based on limited task experience. On metalearning benchmarks from the vision and language domains, models augmented with conditionally shifted neurons achieve state-of-the-art results.

Submitted 3 July, 2018; v1 submitted 28 December, 2017; originally announced December 2017.

Comments: ICML 2018; Added: additional ablation and speed comparison with MetaNet
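The abstract states that conditionally shifted neurons add task-specific shifts, retrieved from a memory module, to their activation values. The sketch below is a minimal version that assumes soft-attention retrieval (cosine similarity followed by softmax) over stored key/shift pairs, an additive shift on the pre-activation before a ReLU, and the layer input as the query. The `ShiftMemory` class and how it is filled are hypothetical and do not reproduce the paper's exact formulation.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class ShiftMemory:
    """Hypothetical memory of (key, shift) pairs written from a few task examples."""

    def __init__(self) -> None:
        self.keys: List[List[float]] = []
        self.shifts: List[List[float]] = []

    def write(self, key: List[float], shift: List[float]) -> None:
        self.keys.append(key)
        self.shifts.append(shift)

    def read(self, query: List[float]) -> List[float]:
        # Assumed retrieval: softmax over cosine similarities, weighted sum of shifts.
        weights = softmax([cosine(query, k) for k in self.keys])
        dim = len(self.shifts[0])
        return [sum(w * s[j] for w, s in zip(weights, self.shifts)) for j in range(dim)]


def shifted_layer(x: List[float], weights: List[List[float]],
                  memory: ShiftMemory) -> List[float]:
    """ReLU layer whose pre-activations receive a shift retrieved from memory."""
    pre = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    shift = memory.read(x)  # querying with the layer input is an assumption
    return [max(0.0, p + s) for p, s in zip(pre, shift)]


if __name__ == "__main__":
    mem = ShiftMemory()
    mem.write([1.0, 0.0], [0.5, -2.0])
    print(shifted_layer([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], mem))  # [1.5, 0.0]
```

Because the shift is retrieved rather than learned by gradient descent on the new task, adapting to a task only requires writing a few entries into memory.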

arXiv:1705.09792 [pdf, other]

Deep Complex Networks

Authors: Chiheb Trabelsi, Olexa Bilaniuk, Ying Zhang, Dmitriy Serdyuk, Sandeep Subramanian, João Felipe Santos, Soroush Mehri, Negar Rostamzadeh, Yoshua Bengio, Christopher J Pal

Abstract: At present, the vast majority of building blocks, techniques, and architectures for deep learning are based on real-valued operations and representations. However, recent work on recurrent neural networks and older fundamental theoretical analysis suggest that complex numbers could have a richer representational capacity and could also facilitate noise-robust memory retrieval mechanisms. Despite their attractive properties and potential for opening up entirely new neural architectures, complex-valued deep neural networks have been marginalized due to the absence of the building blocks required to design such models. In this work, we provide the key atomic components for complex-valued deep neural networks and apply them to convolutional feed-forward networks and convolutional LSTMs. More precisely, we rely on complex convolutions and present algorithms for complex batch normalization and complex weight initialization strategies for complex-valued neural nets, and we use them in experiments with end-to-end training schemes. We demonstrate that such complex-valued models are competitive with their real-valued counterparts. We test deep complex models on several computer vision tasks, on music transcription using the MusicNet dataset, and on speech spectrum prediction using the TIMIT dataset. We achieve state-of-the-art performance on these audio-related tasks.

Submitted 25 February, 2018; v1 submitted 27 May, 2017; originally announced May 2017.
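The abstract relies on complex convolutions built from real-valued operations. Writing a complex filter as W = A + iB and a complex input as h = x + iy, the convolution expands to W * h = (A * x - B * y) + i(B * x + A * y), so one complex convolution costs four real convolutions. The sketch below checks that identity in 1-D against Python's native complex arithmetic; the helper names are illustrative and not taken from the paper's code.

```python
from typing import List, Tuple


def conv1d(signal: List[float], kernel: List[float]) -> List[float]:
    """'Valid' 1-D cross-correlation, the operation deep learning libraries call convolution."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def complex_conv1d(x: List[float], y: List[float],
                   a: List[float], b: List[float]) -> Tuple[List[float], List[float]]:
    """Convolve h = x + iy with W = a + ib using only real convolutions.

    Real part: a*x - b*y; imaginary part: b*x + a*y.
    """
    ax, by = conv1d(x, a), conv1d(y, b)
    bx, ay = conv1d(x, b), conv1d(y, a)
    real = [p - q for p, q in zip(ax, by)]
    imag = [p + q for p, q in zip(bx, ay)]
    return real, imag


def complex_conv1d_reference(h: List[complex], w: List[complex]) -> List[complex]:
    """Same operation with native complex numbers, used as a check."""
    k = len(w)
    return [sum(h[i + j] * w[j] for j in range(k)) for i in range(len(h) - k + 1)]


if __name__ == "__main__":
    x, y = [1.0, 2.0, 3.0], [0.5, -1.0, 0.0]
    a, b = [1.0, -1.0], [0.0, 2.0]
    print(complex_conv1d(x, y, a, b))  # ([1.0, -1.0], [5.5, 5.0])
```

Keeping real and imaginary parts as separate real tensors is what lets complex layers reuse existing real-valued convolution kernels.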

arXiv:1612.07837 [pdf, other]

SampleRNN: An Unconditional End-to-End Neural Audio Generation Model

Authors: Soroush Mehri, Kundan Kumar, Ishaan Gulrajani, Rithesh Kumar, Shubham Jain, Jose Sotelo, Aaron Courville, Yoshua Bengio

Abstract: In this paper we propose a novel model for unconditional audio generation based on generating one audio sample at a time. We show that our model, which profits from combining memory-less modules, namely autoregressive multilayer perceptrons, and stateful recurrent neural networks in a hierarchical structure, is able to capture underlying sources of variation in the temporal sequences over very long time spans, on three datasets of different nature. Human evaluation of the generated samples indicates that our model is preferred over competing models. We also show how each component of the model contributes to the exhibited performance.

Submitted 11 February, 2017; v1 submitted 22 December, 2016; originally announced December 2016.

Comments: Published as a conference paper at ICLR 2017
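The abstract describes a hierarchy in which a stateful recurrent tier runs at a coarse time scale and a memory-less autoregressive MLP generates one sample at a time. The sketch below shows only the control flow of a two-tier version, assuming a frame size `FRAME` and toy fixed-weight functions (`frame_rnn_step`, `sample_mlp`, both hypothetical) in place of trained networks. It is deterministic for reproducibility, whereas a real generator would sample from a predicted distribution.

```python
import math
from typing import List

FRAME = 4  # samples per frame for the upper tier (assumed value)


def frame_rnn_step(state: float, frame: List[float]) -> float:
    """Toy stateful tier (hypothetical weights): updates a scalar state once per frame."""
    return math.tanh(0.5 * state + 0.1 * sum(frame) + 0.3)


def sample_mlp(prev_samples: List[float], cond: float) -> float:
    """Toy memory-less tier (hypothetical weights): predicts the next sample from
    the last two samples and the frame-level conditioning, with no own state."""
    return math.tanh(0.6 * prev_samples[-1] - 0.2 * prev_samples[-2] + cond)


def generate(n_frames: int) -> List[float]:
    samples = [0.0] * FRAME  # silent context so the first frame has history
    state = 0.0
    for _ in range(n_frames):
        # Upper tier runs once per frame, reading the previous frame of samples.
        state = frame_rnn_step(state, samples[-FRAME:])
        # Lower tier runs once per sample inside the frame, autoregressively.
        for _ in range(FRAME):
            samples.append(sample_mlp(samples, state))
    return samples[FRAME:]  # drop the initial silent context


if __name__ == "__main__":
    print(generate(2))
```

The point of the split is cost: the expensive recurrent update happens once per frame, while the cheap MLP handles the per-sample rate.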

arXiv:1509.03891 [pdf, other]

On Binary Classification with Single-Layer Convolutional Neural Networks

Authors: Soroush Mehri

Abstract: Convolutional neural networks are becoming standard tools for solving object recognition and other visual tasks. However, most of the design and implementation of these complex models is based on trial and error. This report focuses on some of the important factors in designing convolutional networks that perform better. Specifically, classification with wide single-layer networks with large kernels is considered as a general framework. In particular, we show that pre-training using unsupervised schemes is vital, that reasonable regularization is beneficial, and that applying strong regularizers such as dropout can be devastating. Pool size can also be as important as the learning procedure itself. In addition, we show that such a simple and relatively fast model, applied to classifying cats and dogs, achieves performance close to the state of the art reached by a combination of SVM models on color and texture features.

Submitted 13 September, 2015; originally announced September 2015.
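The report studies wide single-layer convolutional networks with large kernels, pooling and a binary output. The sketch below is a toy forward pass of one such network in pure Python: a bank of filters, ReLU, non-overlapping max pooling with a configurable pool size, and a logistic readout. The filter values, pool size and readout weights are hypothetical inputs; unsupervised pre-training, regularization and training are not shown.

```python
import math
from typing import List

Matrix = List[List[float]]


def conv2d_valid(img: Matrix, kernel: Matrix) -> Matrix:
    """'Valid' 2-D cross-correlation of one image with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(img[i + u][j + v] * kernel[u][v] for u in range(kh) for v in range(kw))
             for j in range(out_w)] for i in range(out_h)]


def max_pool(fmap: Matrix, size: int) -> Matrix:
    """Non-overlapping max pooling; leftover rows and columns are dropped."""
    return [[max(fmap[i + u][j + v] for u in range(size) for v in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]


def single_layer_cnn(img: Matrix, filters: List[Matrix], pool: int,
                     readout: List[float], bias: float) -> float:
    """Return P(class 1) for a binary task such as cats vs. dogs."""
    features: List[float] = []
    for k in filters:
        fmap = [[max(0.0, v) for v in row] for row in conv2d_valid(img, k)]  # ReLU
        features.extend(v for row in max_pool(fmap, pool) for v in row)
    assert len(readout) == len(features), "one readout weight per pooled feature"
    z = sum(w * f for w, f in zip(readout, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # logistic output


if __name__ == "__main__":
    image = [[1.0] * 4 for _ in range(4)]
    kernel = [[1.0, 1.0], [1.0, 1.0]]
    print(single_layer_cnn(image, [kernel], pool=2, readout=[1.0], bias=-4.0))  # 0.5
```

The `pool` argument changes how many features reach the readout, which is one concrete way pool size interacts with the rest of the design.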

arXiv:1404.5106 [pdf, other]

The Hockey Stick Theorems in Pascal and Trinomial Triangles

Authors: Sima Mehri

Abstract: We have found some patterns in some triangles.

Submitted 30 May, 2016; v1 submitted 21 April, 2014; originally announced April 2014.
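For Pascal's triangle, the standard hockey stick identity reads C(r, r) + C(r+1, r) + ... + C(n, r) = C(n+1, r+1): summing entries down a diagonal gives the entry diagonally below the last one, which traces the "hockey stick" shape. The trinomial-triangle analogues named in the title are not reproduced here. A short numeric check using `math.comb` (Python 3.8+):

```python
from math import comb


def hockey_stick_holds(n: int, r: int) -> bool:
    """Check sum_{i=r}^{n} C(i, r) == C(n + 1, r + 1) for 0 <= r <= n."""
    return sum(comb(i, r) for i in range(r, n + 1)) == comb(n + 1, r + 1)


if __name__ == "__main__":
    # C(2,2) + C(3,2) + C(4,2) + C(5,2) = 1 + 3 + 6 + 10 = 20 = C(6,3)
    print(sum(comb(i, 2) for i in range(2, 6)), comb(6, 3))  # 20 20
    print(all(hockey_stick_holds(n, r) for n in range(30) for r in range(n + 1)))  # True
```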

Showing 1–44 of 44 results for author: Mehri, S