-
InsightNet: Structured Insight Mining from Customer Feedback
Authors:
Sandeep Sricharan Mukku,
Manan Soni,
Jitenkumar Rana,
Chetan Aggarwal,
Promod Yenigalla,
Rashmi Patange,
Shyam Mohan
Abstract:
We propose InsightNet, a novel approach for the automated extraction of structured insights from customer reviews. Our end-to-end machine learning framework is designed to overcome the limitations of current solutions, including the absence of structure for identified topics, non-standard aspect names, and lack of abundant training data. The proposed solution builds a semi-supervised multi-level taxonomy from raw reviews, a semantic similarity heuristic approach to generate labelled data and employs a multi-task insight extraction architecture by fine-tuning an LLM. InsightNet identifies granular actionable topics with customer sentiments and verbatim for each topic. Evaluations on real-world customer review data show that InsightNet performs better than existing solutions in terms of structure, hierarchy and completeness. We empirically demonstrate that InsightNet outperforms the current state-of-the-art methods in multi-label topic classification, achieving an F1 score of 0.85, which is an improvement of 11% F1-score over the previous best results. Additionally, InsightNet generalises well for unseen aspects and suggests new topics to be added to the taxonomy.
Submitted 12 May, 2024;
originally announced May 2024.
-
Combined Pre-Supernova Alert System with KamLAND and Super-Kamiokande
Authors:
KamLAND,
Super-Kamiokande Collaborations,
Seisho Abe,
Minori Eizuka,
Sawako Futagi,
Azusa Gando,
Yoshihito Gando,
Shun Goto,
Takahiko Hachiya,
Kazumi Hata,
Koichi Ichimura,
Sei Ieki,
Haruo Ikeda,
Kunio Inoue,
Koji Ishidoshiro,
Yuto Kamei,
Nanami Kawada,
Yasuhiro Kishimoto,
Masayuki Koga,
Maho Kurasawa,
Tadao Mitsui,
Haruhiko Miyake,
Daisuke Morita,
Takeshi Nakahata
, et al. (290 additional authors not shown)
Abstract:
Preceding a core-collapse supernova, various processes produce an increasing amount of neutrinos of all flavors characterized by mounting energies from the interior of massive stars. Among them, the electron antineutrinos are potentially detectable by terrestrial neutrino experiments such as KamLAND and Super-Kamiokande via inverse beta decay interactions. Once these pre-supernova neutrinos are observed, an early warning of the upcoming core-collapse supernova can be provided. In light of this, KamLAND and Super-Kamiokande, both located in the Kamioka mine in Japan, have been monitoring pre-supernova neutrinos since 2015 and 2021, respectively. Recently, we performed a joint study between KamLAND and Super-Kamiokande on pre-supernova neutrino detection. A pre-supernova alert system combining the KamLAND detector and the Super-Kamiokande detector was developed and put into operation, which can provide a supernova alert to the astrophysics community. Fully leveraging the complementary properties of these two detectors, the combined alert is expected to resolve a pre-supernova neutrino signal from a 15 M$_{\odot}$ star within 510 pc of the Earth, at a significance level corresponding to a false alarm rate of no more than 1 per century. For a Betelgeuse-like model with optimistic parameters, it can provide early warnings up to 12 hours in advance.
Submitted 1 July, 2024; v1 submitted 15 April, 2024;
originally announced April 2024.
-
Dialogue with Robots: Proposals for Broadening Participation and Research in the SLIVAR Community
Authors:
Casey Kennington,
Malihe Alikhani,
Heather Pon-Barry,
Katherine Atwell,
Yonatan Bisk,
Daniel Fried,
Felix Gervits,
Zhao Han,
Mert Inan,
Michael Johnston,
Raj Korpan,
Diane Litman,
Matthew Marge,
Cynthia Matuszek,
Ross Mead,
Shiwali Mohan,
Raymond Mooney,
Natalie Parde,
Jivko Sinapov,
Angela Stewart,
Matthew Stone,
Stefanie Tellex,
Tom Williams
Abstract:
The ability to interact with machines using natural human language is becoming not just commonplace, but expected. The next step is not just text interfaces, but speech interfaces and not just with computers, but with all machines including robots. In this paper, we chronicle the recent history of this growing field of spoken dialogue with robots and offer the community three proposals, the first focused on education, the second on benchmarks, and the third on the modeling of language when it comes to spoken interaction with robots. The three proposals should act as white papers for any researcher to take and build upon.
Submitted 1 April, 2024;
originally announced April 2024.
-
Enhancing Bandwidth Efficiency for Video Motion Transfer Applications using Deep Learning Based Keypoint Prediction
Authors:
Xue Bai,
Tasmiah Haque,
Sumit Mohan,
Yuliang Cai,
Byungheon Jeong,
Adam Halasz,
Srinjoy Das
Abstract:
We propose a novel deep learning based prediction framework for enhanced bandwidth reduction in motion transfer enabled video applications such as video conferencing, virtual reality gaming and privacy preservation for patient health monitoring. To model complex motion, we use the First Order Motion Model (FOMM) that represents dynamic objects using learned keypoints along with their local affine transformations. Keypoints are extracted by a self-supervised keypoint detector and organized in a time series corresponding to the video frames. Prediction of keypoints, to enable transmission using lower frames per second on the source device, is performed using a Variational Recurrent Neural Network (VRNN). The predicted keypoints are then synthesized to video frames using an optical flow estimator and a generator network. The efficacy of leveraging keypoint based representations in conjunction with VRNN based prediction for both video animation and reconstruction is demonstrated on three diverse datasets. For real-time applications, our results show the effectiveness of our proposed architecture by enabling up to 2x additional bandwidth reduction over existing keypoint based video motion transfer frameworks without significantly compromising video quality.
Submitted 17 March, 2024;
originally announced March 2024.
-
Graph Regularized Encoder Training for Extreme Classification
Authors:
Anshul Mittal,
Shikhar Mohan,
Deepak Saini,
Suchith C. Prabhu,
Jian Jiao,
Sumeet Agarwal,
Soumen Chakrabarti,
Purushottam Kar,
Manik Varma
Abstract:
Deep extreme classification (XC) aims to train an encoder architecture and an accompanying classifier architecture to tag a data point with the most relevant subset of labels from a very large universe of labels. XC applications in ranking, recommendation and tagging routinely encounter tail labels for which the amount of training data is exceedingly small. Graph convolutional networks (GCN) present a convenient but computationally expensive way to leverage task metadata and enhance model accuracies in these settings. This paper formally establishes that in several use cases, the steep computational cost of GCNs is entirely avoidable by replacing GCNs with non-GCN architectures. The paper notices that in these settings, it is much more effective to use graph data to regularize encoder training than to implement a GCN. Based on these insights, an alternative paradigm RAMEN is presented to utilize graph metadata in XC settings that offers significant performance boosts with zero increase in inference computational costs. RAMEN scales to datasets with up to 1M labels and offers prediction accuracy up to 15% higher on benchmark datasets than state of the art methods, including those that use graph metadata to train GCNs. RAMEN also offers 10% higher accuracy over the best baseline on a proprietary recommendation dataset sourced from click logs of a popular search engine. Code for RAMEN will be released publicly.
Submitted 28 February, 2024;
originally announced February 2024.
-
NLP for Knowledge Discovery and Information Extraction from Energetics Corpora
Authors:
Francis G. VanGessel,
Efrem Perry,
Salil Mohan,
Oliver M. Barham,
Mark Cavolowsky
Abstract:
We present a demonstration of the utility of NLP for aiding research into energetic materials and associated systems. The NLP method enables machine understanding of textual data, offering an automated route to knowledge discovery and information extraction from energetics text. We apply three established unsupervised NLP models: Latent Dirichlet Allocation, Word2Vec, and the Transformer to a large curated dataset of energetics-related scientific articles. We demonstrate that each NLP algorithm is capable of identifying energetic topics and concepts, generating a language model which aligns with Subject Matter Expert knowledge. Furthermore, we present a document classification pipeline for energetics text. Our classification pipeline achieves 59-76% accuracy depending on the NLP model used, with the highest performing Transformer model rivaling inter-annotator agreement metrics. The NLP approaches studied in this work can identify concepts germane to energetics and therefore hold promise as a tool for accelerating energetics research efforts and energetics material development.
Submitted 10 February, 2024;
originally announced February 2024.
-
Are Generative AI systems Capable of Supporting Information Needs of Patients?
Authors:
Shreya Rajagopal,
Subhashis Hazarika,
Sookyung Kim,
Yan-ming Chiou,
Jae Ho Sohn,
Hari Subramonyam,
Shiwali Mohan
Abstract:
Patients managing a complex illness such as cancer face a complex information challenge where they not only must learn about their illness but also how to manage it. Close interaction with healthcare experts (radiologists, oncologists) can improve patient learning and thereby, their disease outcome. However, this approach is resource intensive and takes expert time away from other critical tasks. Given the recent advancements in Generative AI models aimed at improving the healthcare system, our work investigates whether and how generative visual question answering systems can responsibly support patient information needs in the context of radiology imaging data. We conducted a formative need-finding study in which participants discussed chest computed tomography (CT) scans and associated radiology reports of a fictitious close relative with a cardiothoracic radiologist. Using thematic analysis of the conversation between participants and medical experts, we identified commonly occurring themes across interactions, including clarifying medical terminology, locating the problems mentioned in the report in the scanned image, understanding disease prognosis, discussing the next diagnostic steps, and comparing treatment options. Based on these themes, we evaluated two state-of-the-art generative visual language models against the radiologist's responses. Our results reveal variability in the quality of responses generated by the models across various themes. We highlight the importance of patient-facing generative AI systems to accommodate a diverse range of conversational themes, catering to the real-world informational needs of patients.
Submitted 31 January, 2024;
originally announced February 2024.
-
Taming Mode Collapse in Score Distillation for Text-to-3D Generation
Authors:
Peihao Wang,
Dejia Xu,
Zhiwen Fan,
Dilin Wang,
Sreyas Mohan,
Forrest Iandola,
Rakesh Ranjan,
Yilei Li,
Qiang Liu,
Zhangyang Wang,
Vikas Chandra
Abstract:
Despite the remarkable performance of score distillation in text-to-3D generation, such techniques notoriously suffer from view inconsistency issues, also known as the "Janus" artifact, where the generated objects fake each view with multiple front faces. Although empirically effective methods have approached this problem via score debiasing or prompt engineering, a more rigorous perspective to explain and tackle this problem remains elusive. In this paper, we reveal that the existing score distillation-based text-to-3D generation frameworks degenerate to maximum likelihood seeking on each view independently and thus suffer from the mode collapse problem, manifesting as the Janus artifact in practice. To tame mode collapse, we improve score distillation by re-establishing the entropy term in the corresponding variational objective, which is applied to the distribution of rendered images. Maximizing the entropy encourages diversity among different views in generated 3D assets, thereby mitigating the Janus problem. Based on this new objective, we derive a new update rule for 3D score distillation, dubbed Entropic Score Distillation (ESD). We theoretically reveal that ESD can be simplified and implemented by just adopting the classifier-free guidance trick upon variational score distillation. Although embarrassingly straightforward, our extensive experiments successfully demonstrate that ESD can be an effective treatment for Janus artifacts in score distillation.
Submitted 29 March, 2024; v1 submitted 31 December, 2023;
originally announced January 2024.
-
SteinDreamer: Variance Reduction for Text-to-3D Score Distillation via Stein Identity
Authors:
Peihao Wang,
Zhiwen Fan,
Dejia Xu,
Dilin Wang,
Sreyas Mohan,
Forrest Iandola,
Rakesh Ranjan,
Yilei Li,
Qiang Liu,
Zhangyang Wang,
Vikas Chandra
Abstract:
Score distillation has emerged as one of the most prevalent approaches for text-to-3D asset synthesis. Essentially, score distillation updates 3D parameters by lifting and back-propagating scores averaged over different views. In this paper, we reveal that the gradient estimation in score distillation inherently suffers from high variance. Through the lens of variance reduction, the effectiveness of SDS and VSD can be interpreted as applications of various control variates to the Monte Carlo estimator of the distilled score. Motivated by this rethinking and based on Stein's identity, we propose a more general solution to reduce variance for score distillation, termed Stein Score Distillation (SSD). SSD incorporates control variates constructed by Stein's identity, allowing for arbitrary baseline functions. This enables us to include flexible guidance priors and network architectures to explicitly optimize for variance reduction. In our experiments, the overall pipeline, dubbed SteinDreamer, is implemented by instantiating the control variate with a monocular depth estimator. The results suggest that SSD can effectively reduce the distillation variance and consistently improve visual quality for both object- and scene-level generation. Moreover, we demonstrate that SteinDreamer achieves faster convergence than existing methods due to more stable gradient updates.
Submitted 29 March, 2024; v1 submitted 31 December, 2023;
originally announced January 2024.
-
Healthcare Policy Compliance: A Blockchain Smart Contract-Based Approach
Authors:
Md Al Amin,
Hemanth Tummala,
Seshamalini Mohan,
Indrajit Ray
Abstract:
This paper addresses the critical challenge of ensuring healthcare policy compliance in the context of Electronic Health Records (EHRs). Despite stringent regulations like HIPAA, significant gaps in policy compliance often remain undetected until a data breach occurs. To bridge this gap, we propose a novel blockchain-powered, smart contract-based access control model. This model is specifically designed to enforce patient-provider agreements (PPAs) and other relevant policies, thereby ensuring both policy compliance and provenance. Our approach integrates components of informed consent into PPAs, employing blockchain smart contracts to automate and secure policy enforcement. The authorization module utilizes these contracts to make informed access decisions, recording all actions in a transparent, immutable blockchain ledger. This system not only ensures that policies are rigorously applied but also maintains a verifiable record of all actions taken, thus facilitating an easy audit and proving compliance. We implement this model in a private Ethereum blockchain setup, focusing on maintaining the integrity and lineage of policies and ensuring that audit trails are accurately and securely recorded. The Proof of Compliance (PoC) consensus mechanism enables decentralized, independent auditor nodes to verify compliance status based on the audit trails recorded. Experimental evaluation demonstrates the effectiveness of the proposed model in a simulated healthcare environment. The results show that our approach not only strengthens policy compliance and provenance but also enhances the transparency and accountability of the entire process. In summary, this paper presents a comprehensive, blockchain-based solution to a longstanding problem in healthcare data management, offering a robust framework for ensuring policy compliance and provenance through smart contracts and blockchain technology.
Submitted 15 December, 2023;
originally announced December 2023.
-
Multiple exciton generation in VO2
Authors:
S. R. Sahu,
S. Khan,
A. Tripathy,
K. Dey,
N. Bano,
S. Raj Mohan,
M. P. Joshi,
S. Verma,
B. T. Rao,
V. G. Sathe,
D. K. Shukla
Abstract:
Multiple exciton generation (MEG) is a widely studied phenomenon in semiconductor nanocrystals and quantum dots, aimed at improving the energy conversion efficiency of solar cells. MEG is the process wherein incident photon energy is significantly larger than the band gap, and the resulting photoexcited carriers relax by generating additional electron-hole pairs, rather than decaying by heat dissipation. Here, we present an experimental demonstration of MEG in a prototype strongly correlated material, VO2, through photocurrent spectroscopy and ultrafast transient reflectivity measurements, both of which are considered the most prominent ways for detecting MEG in working devices. The key result of this paper is the observation of MEG at room temperature (in a correlated insulating phase of VO2), and the estimated threshold for MEG is 3Eg. We demonstrate an escalated photocurrent due to MEG in VO2, and quantum efficiency is found to exceed 100%. Our studies suggest that this phenomenon is a manifestation of expeditious impact ionization due to stronger electron correlations and could be exploited in a large number of strongly correlated materials.
Submitted 23 October, 2023;
originally announced October 2023.
-
Autonomous Mapping and Navigation using Fiducial Markers and Pan-Tilt Camera for Assisting Indoor Mobility of Blind and Visually Impaired People
Authors:
Dharmateja Adapa,
Virendra Singh Shekhawat,
Avinash Gautam,
Sudeept Mohan
Abstract:
Large indoor spaces have complex layouts, making them difficult to navigate. Indoor spaces in hospitals, universities, shopping complexes, etc., carry multi-modal information in the form of text and symbols. Hence, it is difficult for Blind and Visually Impaired (BVI) people to independently navigate such spaces. Indoor environments are usually GPS-denied; therefore, Bluetooth-based, WiFi-based, or Range-based methods are used for localization. These methods have high setup costs, lower accuracy, and sometimes need special sensing equipment. We propose a Visual Assist (VA) system for the indoor navigation of BVI individuals using visual Fiducial markers for localization. State-of-the-art (SOTA) approaches for visual localization using Fiducial markers use fixed cameras having a narrow field of view. These approaches stop tracking the markers when they are out of sight. We employ a Pan-Tilt turret-mounted camera which enhances the field of view to 360° for enhanced marker tracking. We, therefore, need fewer markers for mapping and navigation. The efficacy of the proposed VA system is measured on three metrics, i.e., RMSE (Root Mean Square Error), ADNN (Average Distance to Nearest Neighbours), and ATE (Absolute Trajectory Error). Our system outperforms Hector-SLAM, ORB-SLAM3, and UcoSLAM. The proposed system achieves localization accuracy within $\pm 8$ cm compared to $\pm 12$ cm and $\pm 10$ cm for ORB-SLAM3 and UcoSLAM, respectively.
Submitted 16 October, 2023;
originally announced October 2023.
-
TOPr: Enhanced Static Code Pruning for Fast and Precise Directed Fuzzing
Authors:
Chaitra Niddodi,
Stefan Nagy,
Darko Marinov,
Sibin Mohan
Abstract:
Directed fuzzing is a dynamic testing technique that focuses exploration on specific, pre-targeted program locations. Like other types of fuzzers, directed fuzzers are most effective when maximizing testing speed and precision. To this end, recent directed fuzzers have begun leveraging path pruning: preventing the wasteful testing of program paths deemed irrelevant to reaching a desired target location. Yet, despite code pruning's substantial speedup, current approaches are imprecise (failing to capture indirect control flow), requiring additional dynamic analyses that diminish directed fuzzers' speeds. Thus, without code pruning that is both fast and precise, directed fuzzers' effectiveness will remain limited. This paper aims to tackle the challenge of upholding both speed and precision in pruning-based directed fuzzing. We show that existing pruning approaches fail to recover common-case indirect control flow, and identify opportunities to enhance them with lightweight heuristics (namely, function signature matching), enabling them to maximize precision without the burden of dynamic analysis. We implement our enhanced pruning as a prototype, TOPr (Target Oriented Pruning), and evaluate it against the leading pruning-based and pruning-agnostic directed fuzzers SieveFuzz and AFLGo. We show that TOPr's enhanced pruning outperforms these fuzzers in (1) speed (achieving 222% and 73% higher test case throughput, respectively); (2) reachability (achieving 149% and 9% more target-relevant coverage, respectively); and (3) bug discovery time (triggering bugs 85% and 8% faster, respectively). Furthermore, TOPr's balance of speed and precision enables it to find 24 new bugs in 5 open-source applications, with 18 confirmed by developers, 12 bugs labelled as "Priority - 1. High", and 12 bugs fixed, underscoring the effectiveness of our framework.
Submitted 18 September, 2023;
originally announced September 2023.
-
New radio lobes at parsec scale from the East-West protostellar jet RAFGL2591
Authors:
A. G. Cheriyan,
S. Vig,
Sreelekshmi Mohan
Abstract:
RAFGL2591 is a massive star-forming complex in the Cygnus-X region comprising a cluster of embedded protostars and young stellar objects located at a distance of 3.33 kpc. We investigate low-frequency radio emission from the protostellar jet associated with RAFGL2591 using the Giant Metrewave Radio Telescope (GMRT) at 325, 610 and 1280 MHz. For the first time, we have detected radio jet lobes in the E-W direction, labelled as GMRT-1 and GMRT-2. While GMRT-1 displays a flat radio spectral index of $\alpha$ = -0.10, GMRT-2 shows a steeply negative value $\alpha$ = -0.62 suggestive of non-thermal emission. H$_2$ emission maps show the presence of numerous knots, arcs and extended emission towards the East-West jet, excited by the protostar VLA 3. In addition, we report a few H$_2$ knots in the North-East and South-West for the first time. The radio lobes (GMRT-1, GMRT-2) and H$_2$ emission towards this region are understood in the context of the prominent East-West jet as well as its lesser-known sibling jet in the North-East and South-West direction. To model the radio emission from the lobes, we have employed a numerical model including both thermal and non-thermal emission and found number densities towards these lobes in the range 100 - 1000 cm$^{-3}$. The misalignment of the East-West jet lobes exhibits a reflection symmetry with a bending of $\sim$ 20$^\circ$. We attempt to understand this misalignment through precession caused by a binary partner and/or a supersonic side wind from source(s) in the vicinity.
Submitted 7 August, 2023;
originally announced August 2023.
-
Pareto-Secure Machine Learning (PSML): Fingerprinting and Securing Inference Serving Systems
Authors:
Debopam Sanyal,
Jui-Tse Hung,
Manav Agrawal,
Prahlad Jasti,
Shahab Nikkhoo,
Somesh Jha,
Tianhao Wang,
Sibin Mohan,
Alexey Tumanov
Abstract:
Model-serving systems have become increasingly popular, especially in real-time web applications. In such systems, users send queries to the server and specify the desired performance metrics (e.g., desired accuracy, latency). The server maintains a set of models (model zoo) in the back-end and serves the queries based on the specified metrics. This paper examines the security, specifically robustness against model extraction attacks, of such systems. Existing black-box attacks assume a single model can be repeatedly selected for serving inference requests. Modern inference serving systems break this assumption. Thus, these attacks cannot be directly applied to extract a victim model, as models are hidden behind a layer of abstraction exposed by the serving system. An attacker can no longer identify which model she is interacting with. To this end, we first propose a query-efficient fingerprinting algorithm to enable the attacker to trigger any desired model consistently. We show that by using our fingerprinting algorithm, model extraction can have fidelity and accuracy scores within $1\%$ of the scores obtained when attacking a single, explicitly specified model, as well as up to $14.6\%$ gain in accuracy and up to $7.7\%$ gain in fidelity compared to the naive attack. Second, we counter the proposed attack with a noise-based defense mechanism that thwarts fingerprinting by adding noise to the specified performance metrics. The proposed defense strategy reduces the attack's accuracy and fidelity by up to $9.8\%$ and $4.8\%$, respectively (on medium-sized model extraction). Third, we show that the proposed defense induces a fundamental trade-off between the level of protection and system goodput, achieving configurable and significant victim model extraction protection while maintaining acceptable goodput ($>80\%$). We implement the proposed defense in a real system with plans to open source.
Submitted 6 August, 2023; v1 submitted 3 July, 2023;
originally announced July 2023.
-
A Domain-Independent Agent Architecture for Adaptive Operation in Evolving Open Worlds
Authors:
Shiwali Mohan,
Wiktor Piotrowski,
Roni Stern,
Sachin Grover,
Sookyung Kim,
Jacob Le,
Johan De Kleer
Abstract:
Model-based reasoning agents are ill-equipped to act in novel situations in which their model of the environment no longer sufficiently represents the world. We propose HYDRA - a framework for designing model-based agents operating in mixed discrete-continuous worlds, that can autonomously detect when the environment has evolved from its canonical setup, understand how it has evolved, and adapt the agents' models to perform effectively. HYDRA is based upon PDDL+, a rich modeling language for planning in mixed, discrete-continuous environments. It augments the planning module with visual reasoning, task selection, and action execution modules for closed-loop interaction with complex environments. HYDRA implements a novel meta-reasoning process that enables the agent to monitor its own behavior from a variety of aspects. The process employs a diverse set of computational methods to maintain expectations about the agent's own behavior in an environment. Divergences from those expectations are useful in detecting when the environment has evolved and identifying opportunities to adapt the underlying models. HYDRA builds upon ideas from diagnosis and repair and uses a heuristics-guided search over model changes such that they become competent in novel conditions. The HYDRA framework has been used to implement novelty-aware agents for three diverse domains - CartPole++ (a higher dimension variant of a classic control problem), Science Birds (an IJCAI competition problem), and PogoStick (a specific problem domain in Minecraft). We report empirical observations from these domains to demonstrate the efficacy of various components in the novelty meta-reasoning process.
Submitted 9 June, 2023;
originally announced June 2023.
-
The Brain Tumor Segmentation (BraTS) Challenge 2023: Brain MR Image Synthesis for Tumor Segmentation (BraSyn)
Authors:
Hongwei Bran Li,
Gian Marco Conte,
Syed Muhammad Anwar,
Florian Kofler,
Ivan Ezhov,
Koen van Leemput,
Marie Piraud,
Maria Diaz,
Byrone Cole,
Evan Calabrese,
Jeff Rudie,
Felix Meissen,
Maruf Adewole,
Anastasia Janas,
Anahita Fathi Kazerooni,
Dominic LaBella,
Ahmed W. Moawad,
Keyvan Farahani,
James Eddy,
Timothy Bergquist,
Verena Chung,
Russell Takeshi Shinohara,
Farouk Dako,
Walter Wiggins,
Zachary Reitman
, et al. (43 additional authors not shown)
Abstract:
Automated brain tumor segmentation methods have become well-established and reached performance levels offering clear clinical utility. These methods typically rely on four input magnetic resonance imaging (MRI) modalities: T1-weighted images with and without contrast enhancement, T2-weighted images, and FLAIR images. However, some sequences are often missing in clinical practice due to time constraints or image artifacts, such as patient motion. Consequently, the ability to substitute missing modalities and gain segmentation performance is highly desirable and necessary for the broader adoption of these algorithms in the clinical routine. In this work, we present the establishment of the Brain MR Image Synthesis Benchmark (BraSyn) in conjunction with the Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2023. The primary objective of this challenge is to evaluate image synthesis methods that can realistically generate missing MRI modalities when multiple available images are provided. The ultimate aim is to facilitate automated brain tumor segmentation pipelines. The image dataset used in the benchmark is diverse and multi-modal, created through collaboration with various hospitals and research institutions.
Submitted 28 June, 2023; v1 submitted 15 May, 2023;
originally announced May 2023.
-
The Brain Tumor Segmentation (BraTS) Challenge 2023: Local Synthesis of Healthy Brain Tissue via Inpainting
Authors:
Florian Kofler,
Felix Meissen,
Felix Steinbauer,
Robert Graf,
Eva Oswald,
Ezequiel de da Rosa,
Hongwei Bran Li,
Ujjwal Baid,
Florian Hoelzl,
Oezguen Turgut,
Izabela Horvath,
Diana Waldmannstetter,
Christina Bukas,
Maruf Adewole,
Syed Muhammad Anwar,
Anastasia Janas,
Anahita Fathi Kazerooni,
Dominic LaBella,
Ahmed W Moawad,
Keyvan Farahani,
James Eddy,
Timothy Bergquist,
Verena Chung,
Russell Takeshi Shinohara,
Farouk Dako
, et al. (43 additional authors not shown)
Abstract:
A myriad of algorithms for the automatic analysis of brain MR images is available to support clinicians in their decision-making. For brain tumor patients, the image acquisition time series typically starts with a scan that is already pathological. This poses problems, as many algorithms are designed to analyze healthy brains and provide no guarantees for images featuring lesions. Examples include but are not limited to algorithms for brain anatomy parcellation, tissue segmentation, and brain extraction. To solve this dilemma, we introduce the BraTS 2023 inpainting challenge. Here, the participants' task is to explore inpainting techniques to synthesize healthy brain scans from lesioned ones. The following manuscript contains the task formulation, dataset, and submission procedure. Later it will be updated to summarize the findings of the challenge. The challenge is organized as part of the BraTS 2023 challenge hosted at the MICCAI 2023 conference in Vancouver, Canada.
Submitted 9 August, 2023; v1 submitted 15 May, 2023;
originally announced May 2023.
-
Impact of errors in the magnetic field measurement on the precision determination of neutrino oscillation parameters at the proposed ICAL detector at INO
Authors:
Honey Khindri,
D. Indumathi,
Lakshmi S. Mohan
Abstract:
The magnetised iron calorimeter (ICAL) detector proposed at the India-based Neutrino Observatory will be a 51 kton detector made up of 151 layers of 56 mm thick soft iron with a 40 mm air gap in between where the RPCs, the active detectors, will be placed. The main goal of ICAL is to make precision measurements of the neutrino oscillation parameters using atmospheric neutrinos as the source. The charged current interactions of the atmospheric muon neutrinos and anti-neutrinos in the detector produce charged muons. The magnetic field, with a maximum value of $\sim$ 1.5 T in the central region of ICAL, is a critical component since it will be used to distinguish the charges and determine the momentum and direction of these muons. It is difficult to measure the magnetic field inside the iron. The existing methods can only estimate the internal field and hence will be prone to error. This paper presents the first simulation study of the effect of errors in the measurement of the magnetic field in ICAL on its physics potential, especially the neutrino mass ordering and precision measurement of oscillation parameters in the 2--3 sector. The study is a GEANT4-based analysis, using measurements of the magnetic field at the prototype ICAL detector. We find that there is only a small effect on the determination of the mass ordering. While local fluctuations in the magnetic field measurement are well-tolerated, calibration errors must remain well within 5\% to retain good precision determination of the parameters $\sin^2θ_{23}$ and $Δm^2_{32}$.
Submitted 15 May, 2023; v1 submitted 12 May, 2023;
originally announced May 2023.
-
MMG-Ego4D: Multi-Modal Generalization in Egocentric Action Recognition
Authors:
Xinyu Gong,
Sreyas Mohan,
Naina Dhingra,
Jean-Charles Bazin,
Yilei Li,
Zhangyang Wang,
Rakesh Ranjan
Abstract:
In this paper, we study a novel problem in egocentric action recognition, which we term "Multimodal Generalization" (MMG). MMG aims to study how systems can generalize when data from certain modalities is limited or even completely missing. We thoroughly investigate MMG in the context of standard supervised action recognition and the more challenging few-shot setting for learning new action categories. MMG consists of two novel scenarios, designed to support security and efficiency considerations in real-world applications: (1) missing modality generalization, where some modalities that were present during the train time are missing during the inference time, and (2) cross-modal zero-shot generalization, where the modalities present during the inference time and the training time are disjoint. To enable this investigation, we construct a new dataset, MMG-Ego4D, containing data points with video, audio, and inertial motion sensor (IMU) modalities. Our dataset is derived from the Ego4D dataset, but processed and thoroughly re-annotated by human experts to facilitate research in the MMG problem. We evaluate a diverse array of models on MMG-Ego4D and propose new methods with improved generalization ability. In particular, we introduce a new fusion module with modality dropout training, contrastive-based alignment training, and a novel cross-modal prototypical loss for better few-shot performance. We hope this study will serve as a benchmark and guide future research in multimodal generalization problems. The benchmark and code will be available at https://github.com/facebookresearch/MMG_Ego4D.
Submitted 11 May, 2023;
originally announced May 2023.
-
You Can't Always Check What You Wanted: Selective Checking and Trusted Execution to Prevent False Actuations in Cyber-Physical Systems
Authors:
Monowar Hasan,
Sibin Mohan
Abstract:
Cyber-physical systems (CPS) are vulnerable to attacks targeting outgoing actuation commands that modify their physical behaviors. The limited resources in such systems, coupled with their stringent timing constraints, often prevent the checking of every outgoing command. We present a "selective checking" mechanism that uses game-theoretic modeling to identify the right subset of commands to be checked in order to deter an adversary. This mechanism is coupled with a "delay-aware" trusted execution environment (TEE) to ensure that only verified actuation commands are ever sent to the physical system, thus maintaining their safety and integrity. The selective checking and trusted execution (SCATE) framework is implemented on an off-the-shelf ARM platform running standard embedded Linux. We demonstrate the effectiveness of SCATE using four realistic cyber-physical systems (a ground rover, a flight controller, a robotic arm and an automated syringe pump) and study design trade-offs. Not only does SCATE provide a high level of security and high performance, it also incurs significantly lower overheads (30.48%-47.32% less) in the process. In fact, SCATE can work with more systems without negatively affecting the safety of the system. Considering that most CPS do not have any such checking mechanisms, and SCATE is guaranteed to meet all the timing requirements (i.e., ensure the safety/integrity of the system), our methods can significantly improve the security (and, hence, safety) of the system.
Submitted 27 April, 2023;
originally announced April 2023.
-
Ensemble prosody prediction for expressive speech synthesis
Authors:
Tian Huey Teh,
Vivian Hu,
Devang S Ram Mohan,
Zack Hodari,
Christopher G. R. Wallis,
Tomás Gomez Ibarrondo,
Alexandra Torresquintero,
James Leoni,
Mark Gales,
Simon King
Abstract:
Generating expressive speech with rich and varied prosody continues to be a challenge for Text-to-Speech. Most efforts have focused on sophisticated neural architectures intended to better model the data distribution. Yet, in evaluations it is generally found that no single model is preferred for all input texts. This suggests an approach that has rarely been used before for Text-to-Speech: an ensemble of models. We apply ensemble learning to prosody prediction. We construct simple ensembles of prosody predictors by varying either model architecture or model parameter values. To automatically select amongst the models in the ensemble when performing Text-to-Speech, we propose a novel, and computationally trivial, variance-based criterion. We demonstrate that even a small ensemble of prosody predictors yields useful diversity, which, combined with the proposed selection criterion, outperforms any individual model from the ensemble.
Submitted 3 April, 2023;
originally announced April 2023.
-
Heuristic Search For Physics-Based Problems: Angry Birds in PDDL+
Authors:
Wiktor Piotrowski,
Yoni Sher,
Sachin Grover,
Roni Stern,
Shiwali Mohan
Abstract:
This paper studies how a domain-independent planner and combinatorial search can be employed to play Angry Birds, a well-established AI challenge problem. To model the game, we use PDDL+, a planning language for mixed discrete/continuous domains that supports durative processes and exogenous events. The paper describes the model and identifies key design decisions that reduce the problem complexity. In addition, we propose several domain-specific enhancements including heuristics and a search technique similar to preferred operators. Together, they alleviate the complexity of combinatorial search. We evaluate our approach by comparing its performance with dedicated domain-specific solvers on a range of Angry Birds levels. The results show that our performance is on par with these domain-specific approaches in most levels, even without using our domain-specific search enhancements.
Submitted 29 March, 2023;
originally announced March 2023.
-
Learning to Operate in Open Worlds by Adapting Planning Models
Authors:
Wiktor Piotrowski,
Roni Stern,
Yoni Sher,
Jacob Le,
Matthew Klenk,
Johan deKleer,
Shiwali Mohan
Abstract:
Planning agents are ill-equipped to act in novel situations in which their domain model no longer accurately represents the world. We introduce an approach for such agents operating in open worlds that detects the presence of novelties and effectively adapts their domain models and consequent action selection. It uses observations of action execution and measures their divergence from what is expected, according to the environment model, to infer the existence of a novelty. Then, it revises the model through a heuristics-guided search over model changes. We report empirical evaluations on the CartPole problem, a standard Reinforcement Learning (RL) benchmark. The results show that our approach can deal with a class of novelties very quickly and in an interpretable fashion.
Submitted 24 March, 2023;
originally announced March 2023.
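The divergence check described in this abstract can be illustrated with a toy sketch. It is not the authors' implementation: the function names, the use of mean Euclidean distance as the divergence measure, and the threshold value are all assumptions made for this example.

```python
import math

def mean_divergence(expected, observed):
    """Mean Euclidean distance between paired expected and observed states."""
    if len(expected) != len(observed) or not expected:
        raise ValueError("trajectories must be non-empty and of equal length")
    total = sum(math.dist(e, o) for e, o in zip(expected, observed))
    return total / len(expected)

def novelty_detected(expected, observed, threshold=0.1):
    """Flag a novelty when observations drift too far from the model's predictions.

    The threshold is a hypothetical tuning parameter, not a value from the paper.
    """
    return mean_divergence(expected, observed) > threshold

# States predicted by a (hypothetical) domain model for three time steps.
expected = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0)]
# Observations that stay close to the prediction (sensor-level noise only).
nominal = [(0.0, 0.0), (0.1, 0.01), (0.2, 0.0)]
# Observations from an environment whose dynamics have changed.
shifted = [(0.0, 0.0), (0.3, 0.0), (0.6, 0.0)]

print(novelty_detected(expected, nominal))
print(novelty_detected(expected, shifted))
```

A detected novelty would then trigger the model-revision search mentioned in the abstract, which this sketch does not attempt to cover.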
-
Modeling of thermal and non-thermal radio emission from HH80-81 jet
Authors:
Sreelekshmi Mohan,
Sarita Vig,
Samir Mandal
Abstract:
Protostellar jets are one of the primary signposts of star formation. A handful of protostellar objects exhibit radio emission from ionized jets, of which a few display negative spectral indices, indicating the presence of synchrotron emission. In this study, we characterize the radio spectra of the HH80-81 jet with the help of a numerical model that we developed earlier, which takes into account both thermal free-free and non-thermal synchrotron emission mechanisms. For modeling the HH80-81 jet, we consider jet emission towards the central region close to the driving source along with two Herbig-Haro objects, HH80 and HH81. We have obtained the best-fit parameters for each of these sources by fitting the model to radio observational data corresponding to two frequency windows taken across two epochs. Considering an electron number density in the range $10^3 - 10^5$ cm$^{-3}$, we obtained the thickness of the jet edges and fraction of relativistic electrons that contribute to non-thermal emission in the range $0.01^{\circ} - 0.1^{\circ}$ and $10^{-7} - 10^{-4}$, respectively. For the best-fit parameter sets, the model spectral indices lie in the range of -0.15 to +0.11 within the observed frequency windows.
Submitted 24 March, 2023; v1 submitted 23 March, 2023;
originally announced March 2023.
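For readers unfamiliar with the term, a spectral index is commonly defined through the power law $S_ν \propto ν^{α}$, so a two-point estimate follows from the flux densities at two frequencies. The sketch below assumes that convention. The function name and the numbers are illustrative only and are not measurements of HH80-81.

```python
import math

def spectral_index(freq1, flux1, freq2, flux2):
    """Two-point spectral index alpha, assuming flux density S ~ nu**alpha.

    Frequencies must share one unit and flux densities another; only the
    ratios enter the result.
    """
    return math.log(flux2 / flux1) / math.log(freq2 / freq1)

# Made-up values: flux density halves over a decade in frequency.
alpha = spectral_index(1.0, 10.0, 10.0, 5.0)
print(round(alpha, 3))  # negative, i.e. a falling spectrum
```

Under this convention a clearly negative index is the signature of non-thermal (synchrotron) emission referred to in the abstract.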
-
Controllable Prosody Generation With Partial Inputs
Authors:
Dan Andrei Iliescu,
Devang Savita Ram Mohan,
Tian Huey Teh,
Zack Hodari
Abstract:
We address the problem of human-in-the-loop control for generating prosody in the context of text-to-speech synthesis. Controlling prosody is challenging because existing generative models lack an efficient interface through which users can modify the output quickly and precisely. To solve this, we introduce a novel framework whereby the user provides partial inputs and the generative model generates the missing features. We propose a model that is specifically designed to encode partial prosodic features and output complete audio. We show empirically that our model displays two essential qualities of a human-in-the-loop control mechanism: efficiency and robustness. With even a very small number of input values (~4), our model enables users to improve the quality of the output significantly in terms of listener preference (4:1).
Submitted 15 April, 2024; v1 submitted 14 March, 2023;
originally announced March 2023.
-
A note on the Bures-Wasserstein metric
Authors:
Shravan Mohan
Abstract:
In this brief note, it is shown that the Bures-Wasserstein (BW) metric on the space of positive definite matrices lends itself to convex optimization. In other words, the computation of the BW metric can be posed as a convex optimization problem. In turn, this leads to efficient computations of (i) the BW distance between convex subsets of positive definite matrices, (ii) the BW barycenter, and (iii) incorporating BW distance from a given matrix as a convex constraint. Computations are provided for corroboration.
Submitted 7 March, 2023;
originally announced March 2023.
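As background, the BW distance between positive definite matrices $A$ and $B$ has the standard closed form $d(A,B)^2 = \mathrm{tr}(A) + \mathrm{tr}(B) - 2\,\mathrm{tr}\big((A^{1/2} B A^{1/2})^{1/2}\big)$. The sketch below evaluates it for the $2 \times 2$ case only, where the last trace equals $\sqrt{\mathrm{tr}(AB) + 2\sqrt{\det A \det B}}$. It uses the closed form directly, not the convex-optimization formulation of the note, and the function name is an assumption of this example.

```python
import math

def bw_distance_2x2(a, b):
    """Bures-Wasserstein distance between two 2x2 symmetric positive definite
    matrices, each given as nested lists [[m11, m12], [m12, m22]]."""
    tr_a = a[0][0] + a[1][1]
    tr_b = b[0][0] + b[1][1]
    det_a = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    det_b = b[0][0] * b[1][1] - b[0][1] * b[1][0]
    # tr(AB) = sum over i, j of A_ij * B_ji
    tr_ab = sum(a[i][j] * b[j][i] for i in range(2) for j in range(2))
    # A^(1/2) B A^(1/2) has the same (positive) eigenvalues l1, l2 as AB, and
    # sqrt(l1) + sqrt(l2) = sqrt(l1 + l2 + 2 * sqrt(l1 * l2)).
    cross = math.sqrt(tr_ab + 2.0 * math.sqrt(det_a * det_b))
    squared = tr_a + tr_b - 2.0 * cross
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(squared, 0.0))

identity = [[1.0, 0.0], [0.0, 1.0]]
scaled = [[4.0, 0.0], [0.0, 4.0]]
print(bw_distance_2x2(identity, scaled))
```

For commuting matrices such as the two above, the result reduces to the Euclidean distance between the square roots of the eigenvalues, which gives a quick sanity check.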
-
Imaging of HH80-81 jet in the NIR shock tracers H$_2$ and [Fe II]
Authors:
Sreelekshmi Mohan,
Sarita Vig,
Watson P. Varricatt,
Anandmayee Tej
Abstract:
The HH80-81 system is one of the most powerful jets driven by a massive protostar. We present new near-infrared (NIR) line imaging observations of the HH80-81 jet in the H$_2$ (2.122 $μ$m) and [Fe II] (1.644 $μ$m) lines. These lines trace not only the jet close to the exciting source but also the knots located farther away. We have detected nine groups of knot-like structures in the jet including HH80 and HH81 spaced $0.2-0.9$ pc apart. The knots in the northern arm of the jet show only [Fe II] emission closer to the exciting source, a combination of [Fe II] and H$_2$ at intermediate distances, and solely H$_2$ emission farther outwards. Towards the southern arm, all the knots exhibit both H$_2$ and [Fe II] emission. The nature of the shocks is inferred by assimilating the NIR observations with radio and X-ray observations from the literature. In the northern arm, we infer the presence of strong dissociative shocks in the knots located close to the exciting source. The knots in the southern arm that include HH80 and HH81 are explicable as a combination of strong and weak shocks. The mass-loss rates of the knots determined from [Fe II] luminosities are in the range $\sim 3.0\times 10^{-7}-5.2\times 10^{-5}$ M$_{\odot}$ yr$^{-1}$, consistent with those from massive protostars. Towards the central region, close to the driving source of the jet, we have observed various arcs in H$_2$ emission which resemble bow shocks, and strings of H$_2$ knots which reveal traces of multiple outflows.
Submitted 15 November, 2022;
originally announced November 2022.
-
A note on power allocation for optimal capacity
Authors:
Shravan Mohan
Abstract:
The problems of determining the optimal power allocation, within maximum power bounds, to (i) maximize the minimum Shannon capacity, and (ii) minimize the weighted latency are considered. In the first case, the global optimum can be achieved in polynomial time by solving a sequence of linear programs (LP). In the second case, the original non-convex problem is replaced by a convex surrogate (a geometric program), using a functional approximation. Since the approximation error is relatively low, the optimum of the surrogate is close to the global optimal point of the original problem. In either case, there is no assumption on the SINR range. The use of LPs and geometric programming makes the proposed algorithms numerically efficient. Computations are provided for corroboration.
Submitted 13 November, 2022;
originally announced November 2022.
-
Analogical Concept Memory for Architectures Implementing the Common Model of Cognition
Authors:
Shiwali Mohan,
Matthew Klenk
Abstract:
Architectures that implement the Common Model of Cognition - Soar, ACT-R, and Sigma - have a prominent place in research on cognitive modeling as well as on designing complex intelligent agents. In this paper, we explore how computational models of analogical processing can be brought into these architectures to enable concept acquisition from examples obtained interactively. We propose a new analogical concept memory for Soar that augments its current system of declarative long-term memories. We frame the problem of concept learning as embedded within the larger context of interactive task learning (ITL) and embodied language processing (ELP). We demonstrate that the analogical learning methods implemented in the proposed memory can quickly learn diverse types of novel concepts that are useful not only in recognition of a concept in the environment but also in action selection. Our approach has been instantiated in an implemented cognitive system AILEEN and evaluated on a simulated robotic domain.
Submitted 21 October, 2022;
originally announced October 2022.
-
Evaluating Unsupervised Denoising Requires Unsupervised Metrics
Authors:
Adria Marcos-Morales,
Matan Leibovich,
Sreyas Mohan,
Joshua Lawrence Vincent,
Piyush Haluai,
Mai Tan,
Peter Crozier,
Carlos Fernandez-Granda
Abstract:
Unsupervised denoising is a crucial challenge in real-world imaging applications. Unsupervised deep-learning methods have demonstrated impressive performance on benchmarks based on synthetic noise. However, no metrics are available to evaluate these methods in an unsupervised fashion. This is highly problematic for the many practical applications where ground-truth clean images are not available. In this work, we propose two novel metrics: the unsupervised mean squared error (MSE) and the unsupervised peak signal-to-noise ratio (PSNR), which are computed using only noisy data. We provide a theoretical analysis of these metrics, showing that they are asymptotically consistent estimators of the supervised MSE and PSNR. Controlled numerical experiments with synthetic noise confirm that they provide accurate approximations in practice. We validate our approach on real-world data from two imaging modalities: videos in raw format and transmission electron microscopy. Our results demonstrate that the proposed metrics enable unsupervised evaluation of denoising methods based exclusively on noisy data.
Submitted 30 May, 2023; v1 submitted 11 October, 2022;
originally announced October 2022.
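One way to see how an MSE can be estimated from noisy data alone is the toy sketch below. It is an illustration in the spirit of the abstract and not necessarily the estimator defined in the paper. It assumes two extra noisy copies of the signal whose noise is zero-mean, identically distributed, and independent of each other and of the denoiser's input. All names and the synthetic data are hypothetical.

```python
import random

def supervised_mse(denoised, clean):
    """Ordinary MSE against the clean signal (needs ground truth)."""
    return sum((d - x) ** 2 for d, x in zip(denoised, clean)) / len(clean)

def noisy_reference_mse(denoised, noisy_b, noisy_c):
    """Estimate the MSE without clean data, using two extra noisy copies.

    Under the stated assumptions, E[(d - b)^2] = MSE + sigma^2 and
    E[(b - c)^2] / 2 = sigma^2, so their difference estimates the MSE.
    """
    n = len(denoised)
    fit = sum((d - b) ** 2 for d, b in zip(denoised, noisy_b)) / n
    noise_var = sum((b - c) ** 2 for b, c in zip(noisy_b, noisy_c)) / (2 * n)
    return fit - noise_var

rng = random.Random(0)
n = 200_000
clean = [(i % 100) / 100.0 for i in range(n)]
# Stand-in for a denoiser's output: the clean signal plus a known bias of 0.1,
# so the true MSE is 0.01.
denoised = [x + 0.1 for x in clean]
noisy_b = [x + rng.gauss(0.0, 0.5) for x in clean]
noisy_c = [x + rng.gauss(0.0, 0.5) for x in clean]

print(supervised_mse(denoised, clean))
print(noisy_reference_mse(denoised, noisy_b, noisy_c))
```

The two printed values should agree closely for large `n`, which mirrors the consistency property claimed in the abstract without reproducing its analysis.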
-
HeartSpot: Privatized and Explainable Data Compression for Cardiomegaly Detection
Authors:
Elvin Johnson,
Shreshta Mohan,
Alex Gaudio,
Asim Smailagic,
Christos Faloutsos,
Aurélio Campilho
Abstract:
Advances in data-driven deep learning for chest X-ray image analysis underscore the need for explainability, privacy, large datasets and significant computational resources. We frame privacy and explainability as a lossy single-image compression problem to reduce both computational and data requirements without training. For Cardiomegaly detection in chest X-ray images, we propose HeartSpot and four spatial bias priors. HeartSpot priors define how to sample pixels based on domain knowledge from medical literature and from machines. HeartSpot privatizes chest X-ray images by discarding up to 97% of pixels, such as those that reveal the shape of the thoracic cage, bones, small lesions and other sensitive features. HeartSpot priors are ante-hoc explainable and give a human-interpretable image of the preserved spatial features that clearly outlines the heart. HeartSpot offers strong compression, with up to 32x fewer pixels and 11x smaller filesize. Cardiomegaly detectors using HeartSpot are up to 9x faster to train or at least as accurate (up to +.01 AUC ROC) when compared to a baseline DenseNet121. HeartSpot is post-hoc explainable by re-using existing attribution methods without requiring access to the original non-privatized image. In summary, HeartSpot improves speed and accuracy, reduces image size, improves privacy and ensures explainability.
Source code: https://www.github.com/adgaudio/HeartSpot
Submitted 5 October, 2022;
originally announced October 2022.
-
Ellipsis: Towards Efficient System Auditing for Real-Time Systems
Authors:
Ayoosh Bansal,
Anant Kandikuppa,
Chien-Ying Chen,
Monowar Hasan,
Adam Bates,
Sibin Mohan
Abstract:
System auditing is a powerful tool that provides insight into the nature of suspicious events in computing systems, allowing machine operators to detect and subsequently investigate security incidents. While auditing has proven invaluable to the security of traditional computers, existing audit frameworks are rarely designed with consideration for Real-Time Systems (RTS). The transparency provided by system auditing would be of tremendous benefit in a variety of security-critical RTS domains (e.g., autonomous vehicles); however, if audit mechanisms are not carefully integrated into RTS, auditing can be rendered ineffectual and violate the real-world temporal requirements of the RTS.
In this paper, we demonstrate how to adapt commodity audit frameworks to RTS. Using Linux Audit as a case study, we first demonstrate that the volume of audit events generated by commodity frameworks is unsustainable within the temporal and resource constraints of real-time (RT) applications. To address this, we present Ellipsis, a set of kernel-based reduction techniques that leverage the periodic repetitive nature of RT applications to aggressively reduce the costs of system-level auditing. Ellipsis generates succinct descriptions of RT applications' expected activity while retaining a detailed record of unexpected activities, enabling analysis of suspicious activity while meeting temporal constraints. Our evaluation of Ellipsis, using ArduPilot (an open-source autopilot application suite), demonstrates up to 93% reduction in audit log generation.
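The reduction idea in the abstract above (summarise expected, repetitive activity; keep full detail for anything unexpected) can be illustrated with a small user-space toy. This is only a sketch under stated assumptions, not the kernel-based techniques of the paper: the function name `reduce_audit_log`, the event names and the exact-match rule against a single template are all hypothetical.

```python
def reduce_audit_log(iterations, template):
    """Collapse each iteration that matches the expected template into one record.

    iterations: list of event-name lists, one list per periodic task iteration.
    template:   the expected event sequence for one iteration (hypothetical).
    """
    expected = tuple(template)
    reduced = []
    for index, events in enumerate(iterations):
        if tuple(events) == expected:
            # Expected activity: a single succinct summary record.
            reduced.append(f"iteration {index}: matched template")
        else:
            # Unexpected activity: keep every event in full detail.
            reduced.extend(f"iteration {index}: {event}" for event in events)
    return reduced


# Hypothetical event names for a periodic real-time task.
template = ["read", "ioctl", "write"]
iterations = [
    ["read", "ioctl", "write"],
    ["read", "ioctl", "write"],
    ["read", "open", "write"],   # deviates from the template
    ["read", "ioctl", "write"],
]

reduced = reduce_audit_log(iterations, template)
original_count = sum(len(events) for events in iterations)
print(f"{original_count} events reduced to {len(reduced)} records")
```

In this toy run the three matching iterations shrink to one record each, while the deviating iteration keeps all of its events, so a later investigation still sees the unexpected `open` call.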
Submitted 4 August, 2022;
originally announced August 2022.
-
Investigating star-formation activity towards the southern HII region RCW 42
Authors:
Vipin Kumar,
S. Vig,
V. S. Veena,
S. Mohan,
S. K. Ghosh,
A. Tej,
D. K. Ojha
Abstract:
The star-forming activity in the HII region RCW 42 is investigated using multiple wavebands, from near-infrared to radio wavelengths. Located at a distance of 5.8 kpc, this southern region has a bolometric luminosity of 1.8 $\times$ 10$^6$ L$_{\odot}$. The ionized gas emission has been imaged at low radio frequencies of 610 and 1280 MHz using the Giant Metrewave Radio Telescope, India, and shows a large expanse of the HII region, spanning $20\times 15$ pc$^2$. The average electron number density in the region is estimated to be $\sim70$ cm$^{-3}$, which suggests an average ionization fraction of $11\%$ for the cloud. An extended green object, EGO G274.0649-01.1460, and several young stellar objects have been identified in the region using data from the 2MASS and Spitzer surveys. The dust emission from the associated molecular cloud is probed using the Herschel Space Telescope, which reveals the presence of 5 clumps, C1-C5, in this region. Two millimetre emission cores of masses 380 and 390 M$_{\odot}$ towards the radio emission peak have been identified towards C1 from the ALMA map at 1.4 mm. The clumps are investigated for their evolutionary stages based on association with various star-formation tracers, and we find that all the clumps are in an active/evolved stage.
Submitted 29 July, 2022; v1 submitted 28 July, 2022;
originally announced July 2022.
-
Simulation analysis with rock muons from atmospheric neutrino interactions in the ICAL detector at INO
Authors:
R. Kanishka,
D. Indumathi,
Lakshmi S. Mohan,
V. Bhatnagar
Abstract:
The proposed magnetized Iron CALorimeter detector (ICAL) to be built in the India-based Neutrino Observatory (INO) laboratory aims to study atmospheric neutrinos and their properties, such as precision measurements of oscillation parameters and the neutrino mass hierarchy. High energy charged current (CC) interactions of atmospheric neutrinos with the rock surrounding the detector produce so-called "rock muons" along with hadrons. While the hadron component of these events is absorbed in the rock itself, the rock muons traverse the rock and are detected in the detector. These rock muon events can be distinguished from cosmic muons only in the upward direction and can provide an independent measurement of the oscillation parameters. A simulation study of these events at the ICAL detector shows that, although reduced in significance compared to muons produced in direct CC neutrino interactions with the detector, these events are indeed sensitive to the oscillation parameters, achieving a possible $1\sigma$ precision of 10\% and 27\% in determining $\Delta m_{32}^2$ and $\sin^2\theta_{23}$, respectively. Hence a combination of the standard atmospheric neutrino analysis, which is the main goal of ICAL, with these rock muon events will improve the precision reach of ICAL for these parameters.
Submitted 21 November, 2023; v1 submitted 17 July, 2022;
originally announced July 2022.
-
Radio spectra of protostellar jets: Thermal and non-thermal emission
Authors:
Sreelekshmi Mohan,
Sarita Vig,
Samir Mandal
Abstract:
Protostellar jets and outflows are pointers to star formation and serve as important sources of momentum and energy transfer to the interstellar medium. Radio emission from ionized jets has been detected towards a number of protostellar objects. In a few cases, negative spectral indices and polarized emission have also been observed, suggesting the presence of synchrotron emission from relativistic electrons. In this work, we develop a numerical model that incorporates both thermal free-free and non-thermal synchrotron emission mechanisms in the jet geometry. The flux densities include contributions from an inner thermal jet, and a combination of emission from thermal and non-thermal distributions along the edges and extremities, where the jet interacts with the interstellar medium. We also include the effect of varying ionization fraction laterally across the jet. An investigation of radio emission and spectra along the jet shows the dependence of the emission process and optical depth along the line of sight. We explore the effect of various parameters on the turnover frequencies and the radio spectral indices (between 10 MHz and 300 GHz) associated with them.
Submitted 26 April, 2022;
originally announced April 2022.
-
A note on load balancing in DC microgrids
Authors:
Shravan Mohan,
Bharath Bhikkaji
Abstract:
A problem of load balancing in isolated DC microgrids is considered in this paper. Here, a DC load is fed by multiple heterogeneous DC sources, each of which is connected to the load via a boost converter. The gains of the DCCs provide a means to control the division of load current amongst the DC sources. The primary objective of the control scheme is to minimise the total losses in the network, while maintaining the output voltage within a desired range, serving the load current demand and adhering to the VI-characteristics of the power sources. Under assumptions of concavity/monotonicity/piece-wise-linearity of the VI-characteristics, the problem is solved using a convex relaxation. It is shown that the solution to the relaxed problem is tight. Thus, the resulting algorithm is guaranteed to reach global optimality in a numerically efficient manner. Simulations are provided for corroboration.
Submitted 26 April, 2022;
originally announced April 2022.
-
Federated Learning Enables Big Data for Rare Cancer Boundary Detection
Authors:
Sarthak Pati,
Ujjwal Baid,
Brandon Edwards,
Micah Sheller,
Shih-Han Wang,
G Anthony Reina,
Patrick Foley,
Alexey Gruzdev,
Deepthi Karkada,
Christos Davatzikos,
Chiharu Sako,
Satyam Ghodasara,
Michel Bilello,
Suyash Mohan,
Philipp Vollmuth,
Gianluca Brugnara,
Chandrakanth J Preetha,
Felix Sahm,
Klaus Maier-Hein,
Maximilian Zenk,
Martin Bendszus,
Wolfgang Wick,
Evan Calabrese,
Jeffrey Rudie,
Javier Villanueva-Meyer
, et al. (254 additional authors not shown)
Abstract:
Although machine learning (ML) has shown promise in numerous domains, there are concerns about generalizability to out-of-sample data. This is currently addressed by centrally sharing ample, and importantly diverse, data from multiple sites. However, such centralization is challenging to scale (or even infeasible) due to various limitations. Federated ML (FL) provides an alternative to train accurate and generalizable ML models, by only sharing numerical model updates. Here we present findings from the largest FL study to date, involving data from 71 healthcare institutions across 6 continents, to generate an automatic tumor boundary detector for the rare disease of glioblastoma, utilizing the largest dataset of such patients ever used in the literature (25,256 MRI scans from 6,314 patients). We demonstrate a 33% improvement over a publicly trained model to delineate the surgically targetable tumor, and 23% improvement over the tumor's entire extent. We anticipate our study to: 1) enable more studies in healthcare informed by large and diverse data, ensuring meaningful results for rare diseases and underrepresented populations, 2) facilitate further quantitative analyses for glioblastoma via performance optimization of our consensus model for eventual public release, and 3) demonstrate the effectiveness of FL at such scale and task complexity as a paradigm shift for multi-site collaborations, alleviating the need for data sharing.
Submitted 25 April, 2022; v1 submitted 22 April, 2022;
originally announced April 2022.
-
LSTM-RASA Based Agri Farm Assistant for Farmers
Authors:
Narayana Darapaneni,
Selvakumar Raj,
Raghul V,
Venkatesh Sivaraman,
Sunil Mohan,
Anwesh Reddy Paduri
Abstract:
The application of Deep Learning and Natural Language based ChatBots has grown rapidly in recent years. They are used in many fields, such as customer support, reservation systems and personal assistants. Enterprises use such ChatBots to serve their customers better and more efficiently. Even after such technological advancement, expert advice does not reach farmers in a timely manner. Farmers still largely depend on their peers' knowledge to solve the problems they face in their fields. These technologies have not been used effectively to give farmers the information they need in a timely manner. This project aims to implement a closed-domain ChatBot, Farmers Assistant, for the field of agriculture. Farmers can converse with the ChatBot and get expert advice in their field. Farmers Assistant is based on the RASA Open Source Framework. The ChatBot identifies the intent and entities in user utterances, retrieves the remedy from the database and shares it with the user. We tested the Bot with existing data and it showed promising results.
Submitted 7 April, 2022;
originally announced April 2022.
-
A Distant Supervision Corpus for Extracting Biomedical Relationships Between Chemicals, Diseases and Genes
Authors:
Dongxu Zhang,
Sunil Mohan,
Michaela Torkar,
Andrew McCallum
Abstract:
We introduce ChemDisGene, a new dataset for training and evaluating multi-class multi-label document-level biomedical relation extraction models. Our dataset contains 80k biomedical research abstracts labeled with mentions of chemicals, diseases, and genes, portions of which human experts labeled with 18 types of biomedical relationships between these entities (intended for evaluation), and the remainder of which (intended for training) has been distantly labeled via the CTD database with approximately 78\% accuracy. In comparison to similar preexisting datasets, ours is both substantially larger and cleaner; it also includes annotations linking mentions to their entities. We also provide three baseline deep neural network relation extraction models trained and evaluated on our new dataset.
Submitted 13 April, 2022;
originally announced April 2022.
-
Deep Probability Estimation
Authors:
Sheng Liu,
Aakash Kaku,
Weicheng Zhu,
Matan Leibovich,
Sreyas Mohan,
Boyang Yu,
Haoxiang Huang,
Laure Zanna,
Narges Razavian,
Jonathan Niles-Weed,
Carlos Fernandez-Granda
Abstract:
Reliable probability estimation is of crucial importance in many real-world applications where there is inherent (aleatoric) uncertainty. Probability-estimation models are trained on observed outcomes (e.g. whether it has rained or not, or whether a patient has died or not), because the ground-truth probabilities of the events of interest are typically unknown. The problem is therefore analogous to binary classification, with the difference that the objective is to estimate probabilities rather than predicting the specific outcome. This work investigates probability estimation from high-dimensional data using deep neural networks. There exist several methods to improve the probabilities generated by these models but they mostly focus on model (epistemic) uncertainty. For problems with inherent uncertainty, it is challenging to evaluate performance without access to ground-truth probabilities. To address this, we build a synthetic dataset to study and compare different computable metrics. We evaluate existing methods on the synthetic data as well as on three real-world probability estimation tasks, all of which involve inherent uncertainty: precipitation forecasting from radar images, predicting cancer patient survival from histopathology images, and predicting car crashes from dashcam videos. We also give a theoretical analysis of a model for high-dimensional probability estimation which reproduces several of the phenomena evinced in our experiments. Finally, we propose a new method for probability estimation using neural networks, which modifies the training process to promote output probabilities that are consistent with empirical probabilities computed from the data. The method outperforms existing approaches on most metrics on the simulated as well as real-world data.
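As a rough illustration of what "output probabilities that are consistent with empirical probabilities computed from the data" can mean in practice, the sketch below computes two standard computable metrics from predicted probabilities and observed binary outcomes: the Brier score and a binned expected calibration error (ECE). The abstract does not say which metrics the paper compares, so the choice of these two, the function names and the toy data are assumptions made for illustration only.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def expected_calibration_error(probs, outcomes, n_bins=10):
    """Weighted gap between mean prediction and empirical frequency per bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[index].append((p, y))

    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_prob = sum(p for p, _ in members) / len(members)
        empirical = sum(y for _, y in members) / len(members)
        ece += len(members) / total * abs(mean_prob - empirical)
    return ece


# Toy outcomes: 1 of the first 5 events occurs, 4 of the last 5 occur.
outcomes = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]

calibrated = [0.2] * 5 + [0.8] * 5   # matches the empirical frequencies
overconfident = [0.9] * 10           # same outcomes, poorly calibrated

print(expected_calibration_error(calibrated, outcomes))
print(expected_calibration_error(overconfident, outcomes))
print(brier_score(calibrated, outcomes))
```

The first predictor's probabilities agree with the observed frequencies in each bin, so its ECE is essentially zero; the second predicts 0.9 everywhere although only half of the events occur, which the ECE exposes without needing ground-truth probabilities.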
Submitted 11 October, 2022; v1 submitted 20 November, 2021;
originally announced November 2021.
-
Attention W-Net: Improved Skip Connections for better Representations
Authors:
Shikhar Mohan,
Saumik Bhattacharya,
Sayantari Ghosh
Abstract:
Segmentation of macro and microvascular structures in fundoscopic retinal images plays a crucial role in the detection of multiple retinal and systemic diseases, yet it is a difficult problem to solve. Most neural network approaches face several issues, such as insufficient parameters, overfitting and/or incompatibility between internal feature-spaces. We propose Attention W-Net, a new U-Net based architecture for retinal vessel segmentation to address these problems. In this architecture, we have two main contributions: Attention Block and regularisation measures. Our Attention Block uses attention between encoder and decoder features, resulting in higher compatibility upon addition. Our regularisation measures include augmentation and modifications to the ResNet Block used, which greatly reduce overfitting. We observe an F1 and AUC of 0.8407 and 0.9833 on the DRIVE dataset and 0.8174 and 0.9865 on the CHASE-DB1 dataset, respectively. This is a sizeable improvement over the backbone as well as competitive performance among contemporary state-of-the-art methods.
Submitted 29 June, 2022; v1 submitted 17 October, 2021;
originally announced October 2021.
-
Making Document-Level Information Extraction Right for the Right Reasons
Authors:
Liyan Tang,
Dhruv Rajan,
Suyash Mohan,
Abhijeet Pradhan,
R. Nick Bryan,
Greg Durrett
Abstract:
Document-level models for information extraction tasks like slot-filling are flexible: they can be applied to settings where information is not necessarily localized in a single sentence. For example, key features of a diagnosis in a radiology report may not be explicitly stated in one place, but nevertheless can be inferred from parts of the report's text. However, these models can easily learn spurious correlations between labels and irrelevant information. This work studies how to ensure that these models make correct inferences from complex text and make those inferences in an auditable way: beyond just being right, are these models "right for the right reasons?" We experiment with post-hoc evidence extraction in a predict-select-verify framework using feature attribution techniques. We show that regularization with small amounts of evidence supervision during training can substantially improve the quality of extracted evidence. We evaluate on two domains: a small-scale labeled dataset of brain MRI reports and a large-scale modified version of DocRED (Yao et al., 2019) and show that models' plausibility can be improved with no loss in accuracy.
Submitted 18 May, 2022; v1 submitted 14 October, 2021;
originally announced October 2021.
-
Perturbation CheckLists for Evaluating NLG Evaluation Metrics
Authors:
Ananya B. Sai,
Tanay Dixit,
Dev Yashpal Sheth,
Sreyas Mohan,
Mitesh M. Khapra
Abstract:
Natural Language Generation (NLG) evaluation is a multifaceted task requiring assessment of multiple desirable criteria, e.g., fluency, coherency, coverage, relevance, adequacy, overall quality, etc. Across existing datasets for 6 NLG tasks, we observe that the human evaluation scores on these multiple criteria are often not correlated. For example, there is a very low correlation between human scores on fluency and data coverage for the task of structured data to text generation. This suggests that the current recipe of proposing new automatic evaluation metrics for NLG by showing that they correlate well with scores assigned by humans for a single criterion (overall quality) alone is inadequate. Indeed, our extensive study involving 25 automatic evaluation metrics across 6 different tasks and 18 different evaluation criteria shows that there is no single metric which correlates well with human scores on all desirable criteria, for most NLG tasks. Given this situation, we propose CheckLists for better design and evaluation of automatic metrics. We design templates which target a specific criterion (e.g., coverage) and perturb the output such that the quality gets affected only along this specific criterion (e.g., the coverage drops). We show that existing evaluation metrics are not robust against even such simple perturbations and disagree with scores assigned by humans to the perturbed output. The proposed templates thus allow for a fine-grained assessment of automatic evaluation metrics, exposing their limitations, and will facilitate better design, analysis and evaluation of such metrics.
Submitted 13 September, 2021;
originally announced September 2021.
-
Adaptive Denoising via GainTuning
Authors:
Sreyas Mohan,
Joshua L. Vincent,
Ramon Manzorro,
Peter A. Crozier,
Eero P. Simoncelli,
Carlos Fernandez-Granda
Abstract:
Deep convolutional neural networks (CNNs) for image denoising are usually trained on large datasets. These models achieve the current state of the art, but they have difficulties generalizing when applied to data that deviate from the training distribution. Recent work has shown that it is possible to train denoisers on a single noisy image. These models adapt to the features of the test image, but their performance is limited by the small amount of information used to train them. Here we propose "GainTuning", in which CNN models pre-trained on large datasets are adaptively and selectively adjusted for individual test images. To avoid overfitting, GainTuning optimizes a single multiplicative scaling parameter (the "Gain") of each channel in the convolutional layers of the CNN. We show that GainTuning improves state-of-the-art CNNs on standard image-denoising benchmarks, boosting their denoising performance on nearly every image in a held-out test set. These adaptive improvements are even more substantial for test images differing systematically from the training data, either in noise level or image type. We illustrate the potential of adaptive denoising in a scientific application, in which a CNN is trained on synthetic data, and tested on real transmission-electron-microscope images. In contrast to the existing methodology, GainTuning is able to faithfully reconstruct the structure of catalytic nanoparticles from these data at extremely low signal-to-noise ratios.
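The parametrisation described above, one multiplicative gain per channel with all filter weights left untouched, can be sketched on a toy one-layer, 1-D example. Everything here is an assumption made for illustration: the kernels stand in for "pre-trained" filters, the data are made up, and the mean-squared-error objective against a stand-in target is a placeholder, since the abstract does not state which objective GainTuning optimises.

```python
def correlate(signal, kernel):
    """Same-length 1-D correlation with zero padding at the borders."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(signal):
                acc += weight * signal[j]
        out.append(acc)
    return out


def layer_output(responses, gains):
    """Sum the frozen channel responses, each scaled by its own gain."""
    length = len(responses[0])
    return [sum(g * r[i] for g, r in zip(gains, responses)) for i in range(length)]


def mse(output, target):
    return sum((o - t) ** 2 for o, t in zip(output, target)) / len(target)


def tune_gains(responses, target, steps=200, lr=0.05):
    """Gradient descent on the per-channel gains only; kernels stay fixed."""
    gains = [1.0] * len(responses)
    n = len(target)
    for _ in range(steps):
        output = layer_output(responses, gains)
        residual = [o - t for o, t in zip(output, target)]
        # d(MSE)/d(gain_c) = 2/n * sum_i residual_i * response_c[i]
        grads = [2.0 * sum(e * r[i] for i, e in enumerate(residual)) / n
                 for r in responses]
        gains = [g - lr * d for g, d in zip(gains, grads)]
    return gains


# Hypothetical data: a test signal and a stand-in target (placeholder objective).
signal = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
target = [0.5] * len(signal)
kernels = [[0.0, 1.0, 0.0], [1 / 3, 1 / 3, 1 / 3]]  # frozen stand-in filters

responses = [correlate(signal, k) for k in kernels]
initial_gains = [1.0, 1.0]
tuned_gains = tune_gains(responses, target)

loss_before = mse(layer_output(responses, initial_gains), target)
loss_after = mse(layer_output(responses, tuned_gains), target)
print(loss_before, loss_after, tuned_gains)
```

Only two numbers are adapted here, one per channel, which mirrors the abstract's point that restricting adaptation to a single scaling parameter per channel limits the room for overfitting to one test image.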
Submitted 27 July, 2021;
originally announced July 2021.
-
Playing Angry Birds with a Domain-Independent PDDL+ Planner
Authors:
Wiktor Piotrowski,
Roni Stern,
Matthew Klenk,
Alexandre Perez,
Shiwali Mohan,
Johan de Kleer,
Jacob Le
Abstract:
This demo paper presents the first system for playing the popular Angry Birds game using a domain-independent planner. Our system models Angry Birds levels using PDDL+, a planning language for mixed discrete/continuous domains. It uses a domain-independent PDDL+ planner to generate plans and executes them. In this demo paper, we present the system's PDDL+ model for this domain, identify key design decisions that reduce the problem complexity, and compare the performance of our system to model-specific methods for this domain. The results show that our system's performance is on par with other domain-specific systems for Angry Birds, suggesting the applicability of domain-independent planning to this benchmark AI challenge.
Submitted 9 July, 2021;
originally announced July 2021.
-
The RSNA-ASNR-MICCAI BraTS 2021 Benchmark on Brain Tumor Segmentation and Radiogenomic Classification
Authors:
Ujjwal Baid,
Satyam Ghodasara,
Suyash Mohan,
Michel Bilello,
Evan Calabrese,
Errol Colak,
Keyvan Farahani,
Jayashree Kalpathy-Cramer,
Felipe C. Kitamura,
Sarthak Pati,
Luciano M. Prevedello,
Jeffrey D. Rudie,
Chiharu Sako,
Russell T. Shinohara,
Timothy Bergquist,
Rong Chai,
James Eddy,
Julia Elliott,
Walter Reade,
Thomas Schaffter,
Thomas Yu,
Jiaxin Zheng,
Ahmed W. Moawad,
Luiz Otavio Coelho,
Olivia McDonnell
, et al. (78 additional authors not shown)
Abstract:
The BraTS 2021 challenge celebrates its 10th anniversary and is jointly organized by the Radiological Society of North America (RSNA), the American Society of Neuroradiology (ASNR), and the Medical Image Computing and Computer Assisted Interventions (MICCAI) society. Since its inception, BraTS has been focusing on being a common benchmarking venue for brain glioma segmentation algorithms, with well-curated multi-institutional multi-parametric magnetic resonance imaging (mpMRI) data. Gliomas are the most common primary malignancies of the central nervous system, with varying degrees of aggressiveness and prognosis. The RSNA-ASNR-MICCAI BraTS 2021 challenge targets the evaluation of computational algorithms assessing the same tumor compartmentalization, as well as the underlying tumor's molecular characterization, in pre-operative baseline mpMRI data from 2,040 patients. Specifically, the two tasks that BraTS 2021 focuses on are: a) the segmentation of the histologically distinct brain tumor sub-regions, and b) the classification of the tumor's O[6]-methylguanine-DNA methyltransferase (MGMT) promoter methylation status. The performance evaluation of all participating algorithms in BraTS 2021 will be conducted through the Sage Bionetworks Synapse platform (Task 1) and Kaggle (Task 2), concluding in distributing to the top ranked participants monetary awards of $60,000 collectively.
Submitted 12 September, 2021; v1 submitted 5 July, 2021;
originally announced July 2021.
-
Comparative assessment of typical control realizations of grid forming converters based on their voltage source behaviour
Authors:
Kanakesh Vatta Kkuni,
Sibin Mohan,
Guangya Yang,
Wilsun Xu
Abstract:
Converter control functions that provide capabilities similar to those of synchronous generators are referred to as grid forming converters (GFC). Like a synchronous machine, a grid forming converter is expected to behave as a voltage source behind an impedance beyond the control bandwidth. However, GFC realizations differ, with some utilizing inner current and voltage controllers while others do not. This paper studies the impact of the inner loop on the grid forming converter's ability to behave as a voltage source behind an impedance. Three of the most popular GFC structures, 1) GFC with cascaded voltage and current control, 2) with inner current control, 3) with no inner loop, are chosen for the comparison. The analysis reveals that MW-level GFCs with inner loops could potentially become unstable in weak power systems. Additionally, the GFC with cascaded control can only operate stably within a narrow range of network impedances. Furthermore, it is shown that the slow response of the cascaded inner loop can affect dynamic reactive and active power-sharing.
Submitted 18 June, 2021;
originally announced June 2021.
-
Ctrl-P: Temporal Control of Prosodic Variation for Speech Synthesis
Authors:
Devang S Ram Mohan,
Vivian Hu,
Tian Huey Teh,
Alexandra Torresquintero,
Christopher G. R. Wallis,
Marlene Staib,
Lorenzo Foglianti,
Jiameng Gao,
Simon King
Abstract:
Text does not fully specify the spoken form, so text-to-speech models must be able to learn from speech data that vary in ways not explained by the corresponding text. One way to reduce the amount of unexplained variation in training data is to provide acoustic information as an additional learning signal. When generating speech, modifying this acoustic information enables multiple distinct renditions of a text to be produced.
Since much of the unexplained variation is in the prosody, we propose a model that generates speech explicitly conditioned on the three primary acoustic correlates of prosody: $F_{0}$, energy and duration. The model is flexible about how the values of these features are specified: they can be externally provided, or predicted from text, or predicted then subsequently modified.
Compared to a model that employs a variational auto-encoder to learn unsupervised latent features, our model provides more interpretable, temporally-precise, and disentangled control. When automatically predicting the acoustic features from text, it generates speech that is more natural than that from a Tacotron 2 model with reference encoder. Subsequent human-in-the-loop modification of the predicted acoustic features can significantly further increase naturalness.
Submitted 15 June, 2021;
originally announced June 2021.
-
ADEPT: A Dataset for Evaluating Prosody Transfer
Authors:
Alexandra Torresquintero,
Tian Huey Teh,
Christopher G. R. Wallis,
Marlene Staib,
Devang S Ram Mohan,
Vivian Hu,
Lorenzo Foglianti,
Jiameng Gao,
Simon King
Abstract:
Text-to-speech is now able to achieve near-human naturalness and research focus has shifted to increasing expressivity. One popular method is to transfer the prosody from a reference speech sample. There have been considerable advances in using prosody transfer to generate more expressive speech, but the field lacks a clear definition of what successful prosody transfer means and a method for measuring it.
We introduce a dataset of prosodically-varied reference natural speech samples for evaluating prosody transfer. The samples include global variations reflecting emotion and interpersonal attitude, and local variations reflecting topical emphasis, propositional attitude, syntactic phrasing and marked tonicity. The corpus only includes prosodic variations that listeners are able to distinguish with reasonable accuracy, and we report these figures as a benchmark against which text-to-speech prosody transfer can be compared.
We conclude the paper with a demonstration of our proposed evaluation methodology, using the corpus to evaluate two text-to-speech models that perform prosody transfer.
Submitted 21 July, 2021; v1 submitted 15 June, 2021;
originally announced June 2021.