-
FedMRL: Data Heterogeneity Aware Federated Multi-agent Deep Reinforcement Learning for Medical Imaging
Authors:
Pranab Sahoo,
Ashutosh Tripathi,
Sriparna Saha,
Samrat Mondal
Abstract:
Despite recent advancements in federated learning (FL) for medical image diagnosis, addressing data heterogeneity among clients remains a significant challenge for practical implementation. A primary hurdle in FL arises from the non-IID nature of data samples across clients, which typically results in a decline in the performance of the aggregated global model. In this study, we introduce FedMRL, a novel federated multi-agent deep reinforcement learning framework designed to address data heterogeneity. FedMRL incorporates a novel loss function to facilitate fairness among clients, preventing bias in the final global model. Additionally, it employs a multi-agent reinforcement learning (MARL) approach to calculate the proximal term $(μ)$ for the personalized local objective function, ensuring convergence to the global optimum. Furthermore, FedMRL integrates an adaptive weight adjustment method using a Self-organizing map (SOM) on the server side to counteract distribution shifts among clients' local data distributions. We assess our approach using two publicly available real-world medical datasets, and the results demonstrate that FedMRL significantly outperforms state-of-the-art techniques, showing its efficacy in addressing data heterogeneity in federated learning. The code is available at https://github.com/Pranabiitp/FedMRL.
Submitted 8 July, 2024;
originally announced July 2024.
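The proximal term mentioned in the FedMRL abstract can be illustrated with a minimal sketch. It assumes a FedProx-style local objective, that is, the local loss plus (μ/2)·||w − w_global||², which the abstract does not spell out. In FedMRL the value of μ is chosen by the MARL agents, whereas here it is simply passed in. The function name and all numbers are hypothetical.

```python
def proximal_local_objective(local_loss, weights, global_weights, mu):
    """Assumed FedProx-style objective: local_loss + (mu / 2) * ||w - w_global||^2."""
    # Squared Euclidean distance between the client weights and the global weights.
    sq_dist = sum((w - g) ** 2 for w, g in zip(weights, global_weights))
    # A larger mu pulls the client model more strongly towards the global model.
    return local_loss + 0.5 * mu * sq_dist


# Illustrative values only; none of these come from the paper.
value = proximal_local_objective(
    local_loss=0.8,
    weights=[1.0, 2.0],
    global_weights=[0.5, 2.5],
    mu=0.1,
)
print(value)
```

Setting `mu` to zero recovers the plain local loss, so the coefficient controls how far a client may drift from the global model on heterogeneous data.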
-
Fast, Scalable, Energy-Efficient Non-element-wise Matrix Multiplication on FPGA
Authors:
Xuqi Zhu,
Huaizhi Zhang,
JunKyu Lee,
Jiacheng Zhu,
Chandrajit Pal,
Sangeet Saha,
Klaus D. McDonald-Maier,
Xiaojun Zhai
Abstract:
Modern Neural Network (NN) architectures heavily rely on vast numbers of multiply-accumulate arithmetic operations, constituting the predominant computational cost. Therefore, this paper proposes a high-throughput, scalable and energy-efficient non-element-wise matrix multiplication unit on FPGAs as a basic component of the NNs. We first streamline inter-layer and intra-layer redundancies of the MADDNESS algorithm, a LUT-based approximate matrix multiplication method, to design a fast, efficient, and scalable approximate matrix multiplication module termed the "Approximate Multiplication Unit (AMU)". The AMU optimizes LUT-based matrix multiplications further through dedicated memory management and access design, decoupling computational overhead from input resolution and boosting FPGA-based NN accelerator efficiency significantly. The experimental results show that our AMU achieves up to 9x higher throughput and 112x higher energy efficiency over the state-of-the-art solutions for the FPGA-based Quantised Neural Network (QNN) accelerators.
Submitted 7 July, 2024; v1 submitted 2 July, 2024;
originally announced July 2024.
-
Magnetic critical phenomena and low temperature re-entrant spin-glass features of Al$_2$MnFe Heusler alloy
Authors:
Abhinav Kumar Khorwal,
Sujoy Saha,
Mukesh Verma,
Lalita Saini,
Suvigya Kaushik,
Yugandhar Bitla,
Alexey V. Lukoyanov,
Ajit K. Patra
Abstract:
A detailed investigation of the structural and magnetic properties, including magnetocaloric effect, re-entrant spin-glass behavior at low temperature, and critical behavior in polycrystalline Al$_2$MnFe Heusler alloy is reported. The prepared alloy crystallizes in a cubic CsCl-type crystal structure with Pm-3m space group. The temperature-dependent magnetization data reveals a second-order paramagnetic to ferromagnetic phase transition ($\sim$ 122.9 K), which is further supported by the analysis of the magnetocaloric effect. The isothermal magnetization loops show a soft ferromagnetic behavior of the studied alloy and also reveal an itinerant character of the underlying exchange interactions. In order to understand the nature of magnetic interactions, the critical exponents for spontaneous magnetization, initial magnetic susceptibility, and critical MH isotherm are determined using Modified Arrott plots, Kouvel-Fisher plots, and critical isotherm analysis. The derived critical exponents $β$ = 0.363(2), $γ$ = 1.384(3), and $δ$ = 4.81(3) confirm the critical behavior similar to that of a 3D-Heisenberg-type ferromagnet with short-range exchange interactions that are found to decay with distance as J(r) $\approx$ r$^{-4.936}$. Moreover, the detailed analysis of the AC susceptibility data suggests that the frequency-dependent shifting of the peak temperatures is well explained using standard dynamic scaling laws such as the critical slowing down model and Vogel-Fulcher law, and confirms the signature of re-entrant spin-glass features in Al$_2$MnFe Heusler alloy. Furthermore, maximum magnetic entropy change of $\sim$ 1.92 J/kg-K and relative cooling power of $\sim$ 496 J/kg at 50 kOe applied magnetic field are determined from magnetocaloric studies that are comparable to those of other Mn-Fe-Al systems.
Submitted 2 July, 2024;
originally announced July 2024.
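The critical exponents quoted in the Al$_2$MnFe abstract can be cross-checked against the Widom scaling relation δ = 1 + γ/β. This is a standard identity that the abstract itself does not cite, so the check below is an illustration rather than the authors' analysis. It uses only the reported central values.

```python
# Central values reported in the abstract (uncertainties omitted).
beta, gamma, delta_reported = 0.363, 1.384, 4.81

# Widom scaling relation: delta = 1 + gamma / beta.
delta_widom = 1.0 + gamma / beta

print(f"delta from Widom relation: {delta_widom:.3f}")
print(f"difference from reported:  {abs(delta_widom - delta_reported):.3f}")
```

The predicted value is about 4.81, which lies within the quoted uncertainty on δ.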
-
Development of an interactive GUI using MATLAB for the detection of type and stage of Breast Tumor
Authors:
Poulmi Banerjee,
Satadal Saha
Abstract:
Breast cancer is one of the most common types of cancer and is diagnosed mainly in women. Comparing males and females, breast cancer is found to be more likely in females than in males. Breast lumps are classified into two main groups: cancerous and non-cancerous. A cancerous breast lump is one that can spread via the lobules, ducts, areola, and stroma to various organs of the body. Non-cancerous breast lumps, on the other hand, are less harmful but should be monitored under proper diagnosis so that they do not transform into cancerous lumps. Breast lumps are diagnosed using mammograms, ultrasound images, and MRI images. For better diagnosis, doctors sometimes also recommend a biopsy, and any unforeseen anomalies occurring there may give rise to an inaccurate test report. To avoid these discrepancies, processing the mammogram images is considered to be one of the most reliable methods. In the proposed method, a MATLAB GUI is developed and sample images of breast lumps are placed in the respective axes. With the help of sliders, the actual breast lump image is compared with the stored sample images, and the history of the breast lump is then generated in real time in the form of a test report.
Submitted 29 June, 2024;
originally announced July 2024.
-
UltraGelBot: Autonomous Gel Dispenser for Robotic Ultrasound
Authors:
Deepak Raina,
Ziming Zhao,
Richard Voyles,
Juan Wachs,
Subir K. Saha,
S. H. Chandrashekhara
Abstract:
Telerobotic and Autonomous Robotic Ultrasound Systems (RUS) help alleviate operator dependency in free-hand ultrasound examinations. However, the state-of-the-art RUSs still rely on a human operator to apply the ultrasound gel. The lack of standardization in this process often leads to poor imaging of the scanned region, caused by air gaps between the probe and the human body. In this paper, we developed an end-of-arm tool for RUS, referred to as UltraGelBot. This bot can autonomously detect and dispense the gel. It uses a deep learning model to detect the gel from images acquired using an on-board camera. A motorized mechanism is also developed, which uses this feedback to dispense the gel. Experiments on a phantom revealed that UltraGelBot increases the acquired image quality by $18.6\%$ and reduces the procedure time by $37.2\%$.
Submitted 28 June, 2024;
originally announced June 2024.
-
Pressure-induced exciton formation and superconductivity in platinum-based mineral Sperrylite
Authors:
Limin Wang,
Rongwei Hu,
Yash Anand,
Shanta R. Saha,
Jason R. Jeffries,
Johnpierre Paglione
Abstract:
We report a comprehensive study of Sperrylite (PtAs2), the main platinum source in natural minerals, as a function of applied pressures up to 150 GPa. While no structural phase transition was detected from pressure-dependent X-ray measurements, the unit cell volume shrinks monotonically with pressure following the third-order Birch-Murnaghan equation of state. The mildly semiconducting behavior found in pure synthesized crystals at ambient pressures becomes more insulating upon increasing applied pressure before metalizing at higher pressures, giving way to the appearance of an abrupt decrease in resistance near 3 K at pressures above 92 GPa consistent with the onset of a superconducting phase. The pressure evolution of the calculated electronic band structure reveals the same physical trend as our transport measurements, with a non-monotonic evolution explained by a hole band that is pushed below the Fermi energy and an electron band that approaches it as a function of pressure, both reaching a touching point suggestive of an excitonic state. A topological Lifshitz transition of the electronic structure and an increase in the density of states may naturally explain the onset of superconductivity in this material.
Submitted 24 June, 2024;
originally announced June 2024.
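The third-order Birch-Murnaghan equation of state named in the Sperrylite abstract relates pressure to unit cell volume through the zero-pressure volume, the bulk modulus, and its pressure derivative. A minimal sketch of the standard form follows. The function name is hypothetical and the parameter values are placeholders chosen for illustration, not fitted values from the paper.

```python
def birch_murnaghan_pressure(volume, v0, b0, b0_prime):
    """Third-order Birch-Murnaghan pressure, in the same units as b0.

    volume   : unit cell volume at pressure
    v0       : unit cell volume at zero pressure
    b0       : bulk modulus at zero pressure
    b0_prime : pressure derivative of the bulk modulus
    """
    x = (v0 / volume) ** (1.0 / 3.0)  # linear compression ratio
    # x**7 - x**5 equals (V0/V)^(7/3) - (V0/V)^(5/3).
    second_order = 1.5 * b0 * (x**7 - x**5)
    # Third-order correction; it vanishes when b0_prime == 4.
    correction = 1.0 + 0.75 * (b0_prime - 4.0) * (x**2 - 1.0)
    return second_order * correction


# Placeholder parameters (normalized volume, b0 in GPa); not from the paper.
for ratio in (1.0, 0.9, 0.8):
    p = birch_murnaghan_pressure(ratio, v0=1.0, b0=200.0, b0_prime=4.0)
    print(f"V/V0 = {ratio:.1f} -> P = {p:.1f} GPa")
```

With these inputs the pressure is zero at V = V0 and rises as the cell is compressed, which matches the monotonic shrinkage of volume with pressure described in the abstract.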
-
Absence of a Bulk Thermodynamic Phase Transition to a Density Wave Phase in UTe2
Authors:
Florian Theuss,
Avi Shragai,
Gael Grissonnanche,
Luciano Peralta,
Gregorio de la Fuente Simarro,
Ian M Hayes,
Shanta R Saha,
Yun Suk Eo,
Alonso Suarez,
Andrea Capa Salinas,
Ganesh Pokharel,
Stephen D. Wilson,
Nicholas P Butch,
Johnpierre Paglione,
B. J. Ramshaw
Abstract:
Competing and intertwined orders are ubiquitous in strongly correlated electron systems, such as the charge, spin, and superconducting orders in the high-Tc cuprates. Recent scanning tunneling microscopy (STM) measurements provide evidence for a charge density wave (CDW) that coexists with superconductivity in the heavy-fermion metal UTe2. This CDW persists up to at least 7.5 K and, as a CDW breaks the translational symmetry of the lattice, its disappearance is necessarily accompanied by a thermodynamic phase transition. Here, we report high-precision thermodynamic measurements of the elastic moduli of UTe2. We observe no signature of a phase transition in the elastic moduli down to a level of 1 part in 10^7, strongly implying the absence of bulk CDW order in UTe2. We suggest that the CDW and associated pair density wave (PDW) observed by STM may be confined to the surface of UTe2.
Submitted 20 June, 2024;
originally announced June 2024.
-
Absence of a bulk charge density wave signature in x-ray measurements of UTe$_2$
Authors:
Caitlin S. Kengle,
Dipanjan Chaudhuri,
Xuefei Guo,
Thomas A. Johnson,
Simon Bettler,
Wolfgang Simeth,
Matthew J. Krogstad,
Zahir Islam,
Sheng Ran,
Shanta R. Saha,
Johnpierre Paglione,
Nicholas P. Butch,
Eduardo Fradkin,
Vidya Madhavan,
Peter Abbamonte
Abstract:
The long-sought pair density wave (PDW) is an exotic phase of matter in which charge density wave (CDW) order is intertwined with the amplitude or phase of coexisting, superconducting order \cite{Berg2009,Berg2009b}. Originally predicted to exist in copper-oxides, circumstantial evidence for PDW order now exists in a variety of materials. Recently, scanning tunneling microscopy (STM) studies have reported evidence for a three-component charge density wave (CDW) at the surface of the heavy-fermion superconductor, UTe$_2$, persisting below its superconducting transition temperature. Here, we use hard x-ray diffraction measurements on crystals of UTe$_2$ at $T = 1.9$ K and $12$ K to search for a bulk signature of this CDW. Using STM measurements as a constraint, we calculate the expected locations of CDW superlattice peaks, and sweep a large volume of reciprocal space in search of a signature. We failed to find any evidence for a CDW near any of the expected superlattice positions in many Brillouin zones. We estimate an upper bound on the CDW lattice distortion of $u_{max} \lesssim 4 \times 10^{-3}\ \mathrm{Å}$. Our results suggest that the CDW observed in STM is either purely electronic, somehow lacking a signature in the structural lattice, or is restricted to the material surface.
Submitted 24 June, 2024; v1 submitted 20 June, 2024;
originally announced June 2024.
-
AniFaceDiff: High-Fidelity Face Reenactment via Facial Parametric Conditioned Diffusion Models
Authors:
Ken Chen,
Sachith Seneviratne,
Wei Wang,
Dongting Hu,
Sanjay Saha,
Md. Tarek Hasan,
Sanka Rasnayaka,
Tamasha Malepathirana,
Mingming Gong,
Saman Halgamuge
Abstract:
Face reenactment refers to the process of transferring the pose and facial expressions from a reference (driving) video onto a static facial (source) image while maintaining the original identity of the source image. Previous research in this domain has made significant progress by training controllable deep generative models to generate faces based on specific identity, pose and expression conditions. However, the mechanisms used in these methods to control pose and expression often inadvertently introduce identity information from the driving video, while also causing a loss of expression-related details. This paper proposes a new method based on Stable Diffusion, called AniFaceDiff, incorporating a new conditioning module for high-fidelity face reenactment. First, we propose an enhanced 2D facial snapshot conditioning approach by facial shape alignment to prevent the inclusion of identity information from the driving video. Then, we introduce an expression adapter conditioning mechanism to address the potential loss of expression-related information. Our approach effectively preserves pose and expression fidelity from the driving video while retaining the identity and fine details of the source image. Through experiments on the VoxCeleb dataset, we demonstrate that our method achieves state-of-the-art results in face reenactment, showcasing superior image quality, identity preservation, and expression accuracy, especially for cross-identity scenarios. Considering the ethical concerns surrounding potential misuse, we analyze the implications of our method, evaluate current state-of-the-art deepfake detectors, and identify their shortcomings to guide future research.
Submitted 19 June, 2024;
originally announced June 2024.
-
Automatic Speech Recognition for Biomedical Data in Bengali Language
Authors:
Shariar Kabir,
Nazmun Nahar,
Shyamasree Saha,
Mamunur Rashid
Abstract:
This paper presents the development of a prototype Automatic Speech Recognition (ASR) system specifically designed for Bengali biomedical data. Recent advancements in Bengali ASR are encouraging, but a lack of domain-specific data limits the creation of practical healthcare ASR models. This project bridges this gap by developing an ASR system tailored for Bengali medical terms like symptoms, severity levels, and diseases, encompassing two major dialects: Bengali and Sylheti. We train and evaluate two popular ASR frameworks on a comprehensive 46-hour Bengali medical corpus. Our core objective is to create deployable health-domain ASR systems for digital health applications, ultimately increasing accessibility for non-technical users in the healthcare sector.
Submitted 16 June, 2024;
originally announced June 2024.
-
Site-percolation transition of run-and-tumble particles
Authors:
Soumya K. Saha,
Aikya Banerjee,
P. K. Mohanty
Abstract:
We study percolation transition of run and tumble particles (RTPs) on a two dimensional square lattice. RTPs in these models run to the nearest neighbour along their internal orientation with unit rate, and to other nearest neighbours with rates $p$. In addition, they tumble to change their internal orientation with rate $ω$. We show that for small tumble rates, RTP-clusters created by joining occupied nearest neighbours irrespective of their orientation form a phase separated state when the rate of positional diffusion $p$ crosses a threshold; with further increase of $p$ the clusters disintegrate and another transition to a mixed phase occurs. The critical exponents of this re-entrant site-percolation transition of RTPs vary continuously along the critical line in the $ω$-$p$ plane, but a scaling function remains invariant. This function is identical to the corresponding universal scaling function of percolation transition observed in the Ising model. We also show that the critical exponents of the underlying motility induced phase separation transition are related to corresponding percolation-critical-exponents by constant multiplicative factors known from the correspondence of magnetic and percolation critical exponents of Ising model.
Submitted 17 June, 2024;
originally announced June 2024.
-
Nonlocal cooperative behaviour, psychological effects, and collective decision-making: an exemplification with predator-prey models
Authors:
Sangeeta Saha,
Swadesh Pal,
Roderick Melnik
Abstract:
In bio-social models, cooperative behaviour has evolved as an adaptive strategy, playing multi-functional roles. One of such roles in populations is to increase the success of survival and reproduction of individuals and their families or social groups. Moreover, collective decision-making in cooperative behaviour is an aspect that is used to study the dynamic behaviour of individuals within a social group. In this paper, we have focused on population dynamics by considering a predator-prey model as our main exemplification, where the generalist predator has adopted a cooperative hunting strategy while consuming their prey. In particular, we have analyzed the dynamic nature of the system when a nonlocal term is introduced in the cooperation. First, the Turing instability condition has been studied for the local model around the coexisting steady-state, followed by the Turing and non-Turing patterns in the presence of the nonlocal interaction term. This work is also concerned with the existence of travelling wave solutions for predator-prey interaction with the nonlocal cooperative hunting strategy. Such solutions are reported for local as well as for nonlocal models. We have characterized the invading speed of the predator with the help of the minimal wave speed of travelling wave solutions connecting the predator-free state to the co-existence state. The travelling waves are found to be non-monotonic in this system. The formation of wave trains has been demonstrated for an extended range of nonlocal interactions. Finally, the importance of psychological effects in shaping the dynamics of nonlocal collective behaviour is demonstrated with several representative examples.
Submitted 15 June, 2024;
originally announced June 2024.
-
Emergence of superradiance in dissipative dipolar-coupled spin systems
Authors:
Saptarshi Saha,
Yeshma Ibrahim,
Rangeet Bhattacharyya
Abstract:
In the superradiance phenomenon, a collection of non-interacting atoms exhibits collective dissipation due to interaction with a common radiation field, resulting in a non-monotonic decay profile. This work shows that dissipative dipolar-coupled systems exhibit an identical collective dissipation aided by the nonsecular part of the dipolar coupling. We consider a simplified dipolar network where the dipolar interaction between the spin-pairs is assumed to be identical. Hence the dynamics remain confined in the block diagonal Hilbert spaces. For a suitable choice of the initial condition, the resulting dynamics require dealing with a smaller subspace which helps extend the analysis to a larger spin network. To include the nonsecular dipolar relaxation, we use a fluctuation-regulated quantum master equation. We note that a successful observation of superradiance in this system requires a weak system-bath coupling. Moreover, we find that for an ensemble of N spins, the maximum intensity of the radiation exhibits a nearly quadratic scaling (N^2), and the dipolar relaxation time follows an inverse square proportionality (1/N^2); these two observations help characterize the emergence of superradiance. Our results agree well with the standard results of pure spin superradiance observed experimentally in various systems.
Submitted 13 June, 2024;
originally announced June 2024.
-
Language Models are Crossword Solvers
Authors:
Soumadeep Saha,
Sutanoya Chakraborty,
Saptarshi Saha,
Utpal Garain
Abstract:
Crosswords are a form of word puzzle that require a solver to demonstrate a high degree of proficiency in natural language understanding, wordplay, reasoning, and world knowledge, along with adherence to character and length constraints. In this paper we tackle the challenge of solving crosswords with Large Language Models (LLMs). We demonstrate that the current generation of state-of-the-art (SoTA) language models show significant competence at deciphering cryptic crossword clues, and outperform previously reported SoTA results by a factor of 2-3 in relevant benchmarks. We also develop a search algorithm that builds off this performance to tackle the problem of solving full crossword grids with LLMs for the very first time, achieving an accuracy of 93\% on New York Times crossword puzzles. Contrary to previous work in this area which concluded that LLMs lag human expert performance significantly, our research suggests this gap is a lot narrower.
Submitted 14 June, 2024; v1 submitted 13 June, 2024;
originally announced June 2024.
-
Understanding Inhibition Through Maximally Tense Images
Authors:
Chris Hamblin,
Srijani Saha,
Talia Konkle,
George Alvarez
Abstract:
We address the functional role of 'feature inhibition' in vision models; that is, what are the mechanisms by which a neural network ensures images do not express a given feature? We observe that standard interpretability tools in the literature are not immediately suited to the inhibitory case, given the asymmetry introduced by the ReLU activation function. Given this, we propose inhibition be understood through a study of 'maximally tense images' (MTIs), i.e. those images that excite and inhibit a given feature simultaneously. We show how MTIs can be studied with two novel visualization techniques; +/- attribution inversions, which split single images into excitatory and inhibitory components, and the attribution atlas, which provides a global visualization of the various ways images can excite/inhibit a feature. Finally, we explore the difficulties introduced by superposition, as such interfering features induce the same attribution motif as MTIs.
Submitted 8 June, 2024;
originally announced June 2024.
-
MemeGuard: An LLM and VLM-based Framework for Advancing Content Moderation via Meme Intervention
Authors:
Prince Jha,
Raghav Jain,
Konika Mandal,
Aman Chadha,
Sriparna Saha,
Pushpak Bhattacharyya
Abstract:
In the digital world, memes present a unique challenge for content moderation due to their potential to spread harmful content. Although detection methods have improved, proactive solutions such as intervention are still limited, with current research focusing mostly on text-based content, neglecting the widespread influence of multimodal content like memes. Addressing this gap, we present MemeGuard, a comprehensive framework leveraging Large Language Models (LLMs) and Visual Language Models (VLMs) for meme intervention. MemeGuard harnesses a specially fine-tuned VLM, VLMeme, for meme interpretation, and a multimodal knowledge selection and ranking mechanism (MKS) for distilling relevant knowledge. This knowledge is then employed by a general-purpose LLM to generate contextually appropriate interventions. Another key contribution of this work is the Intervening Cyberbullying in Multimodal Memes (ICMM) dataset, a high-quality, labeled dataset featuring toxic memes and their corresponding human-annotated interventions. We leverage ICMM to test MemeGuard, demonstrating its proficiency in generating relevant and effective responses to toxic memes.
Submitted 8 June, 2024;
originally announced June 2024.
-
Npix2Cpix: A GAN-based Image-to-Image Translation Network with Retrieval-Classification Integration for Watermark Retrieval from Historical Document Images
Authors:
Utsab Saha,
Sawradip Saha,
Shaikh Anowarul Fattah,
Mohammad Saquib
Abstract:
The identification and restoration of ancient watermarks have long been a major topic in codicology and history. Classifying historical documents based on watermarks can be difficult due to the diversity of watermarks, crowded and noisy samples, multiple modes of representation, and minor distinctions between classes and intra-class changes. This paper proposes a U-net-based conditional generative adversarial network (GAN) to translate noisy raw historical watermarked images into clean, handwriting-free images with just watermarks. Considering its ability to perform image translation from degraded (noisy) pixels to clean pixels, the proposed network is termed as Npix2Cpix. Instead of employing directly degraded watermarked images, the proposed network uses image-to-image translation using adversarial learning to create clutter and handwriting-free images for restoring and categorizing the watermarks for the first time. In order to learn the mapping from input noisy image to output clean image, the generator and discriminator of the proposed U-net-based GAN are trained using two separate loss functions, each of which is based on the distance between images. After using the proposed GAN to pre-process noisy watermarked images, Siamese-based one-shot learning is used to classify watermarks. According to experimental results on a large-scale historical watermark dataset, extracting watermarks from tainted images can result in high one-shot classification accuracy. The qualitative and quantitative evaluation of the retrieved watermarks illustrates the effectiveness of the proposed approach.
Submitted 5 June, 2024;
originally announced June 2024.
-
A Generalized Apprenticeship Learning Framework for Modeling Heterogeneous Student Pedagogical Strategies
Authors:
Md Mirajul Islam,
Xi Yang,
John Hostetter,
Adittya Soukarjya Saha,
Min Chi
Abstract:
A key challenge in e-learning environments like Intelligent Tutoring Systems (ITSs) is to induce effective pedagogical policies efficiently. While Deep Reinforcement Learning (DRL) often suffers from sample inefficiency and reward function design difficulty, Apprenticeship Learning (AL) algorithms can overcome them. However, most AL algorithms cannot handle heterogeneity as they assume all demonstrations are generated with a homogeneous policy driven by a single reward function. Still, some AL algorithms that consider heterogeneity often cannot generalize to large continuous state spaces and only work with discrete states. In this paper, we propose an expectation-maximization (EM)-EDM, a general AL framework to induce effective pedagogical policies from given optimal or near-optimal demonstrations, which are assumed to be driven by heterogeneous reward functions. We compare the effectiveness of the policies induced by our proposed EM-EDM against four AL-based baselines and two policies induced by DRL on two different but related tasks that involve pedagogical action prediction. Our overall results showed that, for both tasks, EM-EDM outperforms the four AL baselines across all performance metrics and the two DRL baselines. This suggests that EM-EDM can effectively model complex student pedagogical decision-making processes through the ability to manage a large, continuous state space and adapt to handle diverse and heterogeneous reward functions with very few given demonstrations.
Submitted 4 June, 2024;
originally announced June 2024.
-
High-fidelity remote entanglement of trapped atoms mediated by time-bin photons
Authors:
Sagnik Saha,
Mikhail Shalaev,
Jameson O'Reilly,
Isabella Goetting,
George Toh,
Ashish Kalakuntla,
Yichao Yu,
Christopher Monroe
Abstract:
Photonic interconnects between quantum processing nodes are likely the only way to achieve large-scale quantum computers and networks. The bottleneck in such an architecture is the interface between well-isolated quantum memories and flying photons. We establish high-fidelity entanglement between remotely separated trapped atomic qubit memories, mediated by photonic qubits stored in the timing of their pulses. Such time-bin encoding removes sensitivity to polarization errors, enables long-distance quantum communication, and is extensible to quantum memories with more than two states. Using a measurement-based error detection process and suppressing a fundamental source of error due to atomic recoil, we achieve an entanglement fidelity of 97% and show that fidelities beyond 99.9% are feasible.
Submitted 3 June, 2024;
originally announced June 2024.
-
Bayesian compositional regression with flexible microbiome feature aggregation and selection
Authors:
Satabdi Saha,
Liangliang Zhang,
Kim-Anh Do,
Christine B. Peterson
Abstract:
Ongoing advances in microbiome profiling have allowed unprecedented insights into the molecular activities of microbial communities. This has fueled a strong scientific interest in understanding the critical role the microbiome plays in governing human health, by identifying microbial features associated with clinical outcomes of interest. Several aspects of microbiome data limit the applicability of existing variable selection approaches. In particular, microbiome data are high-dimensional, extremely sparse, and compositional. Importantly, many of the observed features, although categorized as different taxa, may play related functional roles. To address these challenges, we propose a novel compositional regression approach that leverages the data-adaptive clustering and variable selection properties of the spiked Dirichlet process to identify taxa that exhibit similar functional roles. Our proposed method, Bayesian Regression with Agglomerated Compositional Effects using a dirichLET process (BRACElet), enables the identification of a sparse set of features with shared impacts on the outcome, facilitating dimension reduction and model interpretation. We demonstrate that BRACElet outperforms existing approaches for microbiome variable selection through simulation studies and an application elucidating the impact of oral microbiome composition on insulin resistance.
Submitted 3 June, 2024;
originally announced June 2024.
-
Thermodynamics of the most generalized form of Holographic Dark Energy and some particular cases with Corrected Entropies
Authors:
Sanghati Saha,
Ertan Güdekli,
Surajit Chattopadhyay
Abstract:
The holographic cut-off in generalized dark energy (HDE) formalism depends on its cut-off. Following this, a four-parameter generalized entropy has recently been developed. It reduces to various known entropies for appropriate parameter limits in the study of Odintsov, S. D., S. D'Onofrio, and T. Paul (2023), Physics of the Dark Universe, 42, 101277. In the current work, we investigate the evolution of the universe in its early and late phases within the framework of entropic cosmology, where the entropic energy density functions are reconstructed from the equivalence of holographic dark energy and the four-parameter generalized entropy (Sg). Along with the aforementioned reconstruction scheme, we demonstrate that an extensive variety of dark energy (DE) models can be considered distinct and particular candidates for the most generalized four-parameter entropic HDE family, each having its own cut-off. We examined several entropic dark energy models in this regard, including the generalized holographic dark energy with Nojiri-Odintsov (NO) cut-off, the Barrow entropic HDE (BHDE) with particle horizon as IR cut-off, and the Tsallis entropic HDE (THDE) with future event horizon as IR cut-off; all three are particular cases of the most generalized four-parameter entropic holographic dark energy. Inspired by S. Nojiri and S. D. Odintsov (2006, General Relativity and Gravitation, 38, pp. 1285-1304) and S. Nojiri and S. D. Odintsov (2017, European Physical Journal C, 77, pp. 1-8), our current work reports a study on cosmological parameters and thermodynamics with entropy corrections (logarithmic and power-law) to the cosmological horizon entropy as well as the black hole entropy, for a highly generalized viscous coupled holographic dark fluid along with its particular cases.
Submitted 29 May, 2024;
originally announced May 2024.
-
ToxVidLLM: A Multimodal LLM-based Framework for Toxicity Detection in Code-Mixed Videos
Authors:
Krishanu Maity,
A. S. Poornash,
Sriparna Saha,
Pushpak Bhattacharyya
Abstract:
In an era of rapidly evolving internet technology, the surge in multimodal content, including videos, has expanded the horizons of online communication. However, the detection of toxic content in this diverse landscape, particularly in low-resource code-mixed languages, remains a critical challenge. While substantial research has addressed toxic content detection in textual data, the realm of video content, especially in non-English languages, has been relatively underexplored. This paper addresses this research gap by introducing a benchmark dataset, the first of its kind, consisting of 931 videos with 4021 code-mixed Hindi-English utterances collected from YouTube. Each utterance within this dataset has been meticulously annotated for toxicity, severity, and sentiment labels. We have developed an advanced Multimodal Multitask framework built for Toxicity detection in Video Content by leveraging Large Language Models (LLMs), crafted for the primary objective along with the additional tasks of conducting sentiment and severity analysis. ToxVidLLM incorporates three key modules (the Encoder module, the Cross-Modal Synchronization module, and the Multitask module), crafting a generic multimodal LLM customized for intricate video classification tasks. Our experiments reveal that incorporating multiple modalities from the videos substantially enhances the performance of toxic content detection by achieving an Accuracy and Weighted F1 score of 94.29% and 94.35%, respectively.
Submitted 31 May, 2024;
originally announced May 2024.
-
Astrophysical aspects of $^{12}$C$(p,γ)^{13}$N reaction
Authors:
Soumya Saha
Abstract:
The Carbon-Nitrogen-Oxygen (CNO) cycle is fundamental to the process of hydrogen burning in stars, serving as a pivotal mechanism. At its core, the primary reaction involves the radiative capture of a proton by $^{12}$C, crucially influencing the isotopic ratio of $^{12}$C to $^{13}$C observed in celestial bodies, including our Solar System. We have addressed this reaction mechanism by extrapolating to low-energy cross sections and S-factors with the aid of astrophysical R-matrix. Our investigation aims to shed light on its implications for nuclear reaction rates, thus influencing the abundance ratio of $^{12}$C to $^{13}$C in the cosmic environment.
Submitted 30 May, 2024;
originally announced May 2024.
-
Hydrodynamics of a hard-core non-polar active lattice gas
Authors:
Ritwik Mukherjee,
Soumyabrata Saha,
Tridib Sadhu,
Abhishek Dhar,
Sanjib Sabhapandit
Abstract:
We present a fluctuating hydrodynamic description of a non-polar active lattice gas model with excluded volume interactions that exhibits motility-induced phase separation under appropriate conditions. For quasi-one dimension and higher, stability analysis of the noiseless hydrodynamics gives quantitative bounds on the phase boundary of the motility-induced phase separation in terms of spinodal and binodal. Inclusion of the multiplicative noise in the fluctuating hydrodynamics describes the exponentially decaying two-point correlations in the stationary-state homogeneous phase. Our hydrodynamic description and theoretical predictions based on it are in excellent agreement with our Monte-Carlo simulations and pseudo-spectral iteration of the hydrodynamics equations. Our construction of hydrodynamics for this model is not suitable in strictly one-dimension with single-file constraints, and we argue that this breakdown is associated with micro-phase separation.
Submitted 30 May, 2024;
originally announced May 2024.
-
An Algorithm for the Decomposition of Complete Graph into Minimum Number of Edge-disjoint Trees
Authors:
Antika Sinha,
Sanjoy Kumar Saha,
Partha Basuchowdhuri
Abstract:
In this work, we study the methodical decomposition of an undirected, unweighted complete graph ($K_n$ of order $n$, size $m$) into the minimum number of edge-disjoint trees. We find that $x$, a positive integer, is minimum and $x=\lceil\frac{n}{2}\rceil$ as the edge set of $K_n$ is decomposed into edge-disjoint trees of size sequence $M = \{m_1,m_2,...,m_x\}$ where $m_i\le(n-1)$ and $\sum_{i=1}^{x} m_i = \frac{n(n-1)}{2}$. For decomposing the edge set of $K_n$ into the minimum number of edge-disjoint trees, our proposed algorithm takes $O(m)$ time in total.
Submitted 28 May, 2024;
originally announced May 2024.
-
The communication power of a noisy qubit
Authors:
Saptarshi Roy,
Tamal Guha,
Sutapa Saha,
Giulio Chiribella
Abstract:
A fundamental property of quantum mechanics is that a single qubit can carry at most 1 bit of classical information. For an important class of quantum communication channels, known as entanglement-breaking, this limitation remains valid even if the sender and receiver share entangled particles before the start of the communication: for every entanglement-breaking channel, the rate at which classical messages can be reliably communicated cannot exceed 1 bit per transmitted qubit even with the assistance of quantum entanglement. But does this mean that, for the purpose of communicating classical messages, a noisy entanglement-breaking qubit channel can be replaced by a noisy bit channel? Here we answer the question in the negative. We introduce a game where a player (the sender) assists another player (the receiver) in finding a prize hidden into one of four possible boxes, while avoiding a bomb hidden in one of the three remaining boxes. In this game, the bomb cannot be avoided with certainty if the players communicate through a noisy bit channel. In contrast, the players can deterministically avoid the bomb and find the prize with a guaranteed 1/3 probability if they communicate through an entanglement-breaking qubit channel known as the universal NOT channel. We show that the features of the quantum strategy can be simulated with a noiseless bit channel, but this simulation requires the transmission to be assisted by shared randomness: without shared randomness, even the noiseless transmission of a three-level classical system cannot match the transmission of a single noisy qubit.
Submitted 28 May, 2024;
originally announced May 2024.
-
Enhancing Adverse Drug Event Detection with Multimodal Dataset: Corpus Creation and Model Development
Authors:
Pranab Sahoo,
Ayush Kumar Singh,
Sriparna Saha,
Aman Chadha,
Samrat Mondal
Abstract:
The mining of adverse drug events (ADEs) is pivotal in pharmacovigilance, enhancing patient safety by identifying potential risks associated with medications, facilitating early detection of adverse events, and guiding regulatory decision-making. Traditional ADE detection methods are reliable but slow, not easily adaptable to large-scale operations, and offer limited information. With the exponential increase in data sources like social media content, biomedical literature, and Electronic Medical Records (EMR), extracting relevant ADE-related information from these unstructured texts is imperative. Previous ADE mining studies have focused on text-based methodologies, overlooking visual cues, limiting contextual comprehension, and hindering accurate interpretation. To address this gap, we present a MultiModal Adverse Drug Event (MMADE) detection dataset, merging ADE-related textual information with visual aids. Additionally, we introduce a framework that leverages the capabilities of LLMs and VLMs for ADE detection by generating detailed descriptions of medical images depicting ADEs, aiding healthcare professionals in visually identifying adverse events. Using our MMADE dataset, we showcase the significance of integrating visual cues from images to enhance overall performance. This approach holds promise for patient safety, ADE awareness, and healthcare accessibility, paving the way for further exploration in personalized healthcare.
Submitted 26 May, 2024; v1 submitted 24 May, 2024;
originally announced May 2024.
-
Photo-dynamical characterisation of the TOI-178 resonant chain
Authors:
A. Leleu,
J. -B. Delisle,
L. Delrez,
E. M. Bryant,
A. Brandeker,
H. P. Osborn,
N. Hara,
T. G. Wilson,
N. Billot,
M. Lendl,
D. Ehrenreich,
H. Chakraborty,
M. N. Günther,
M. J. Hooton,
Y. Alibert,
R. Alonso,
D. R. Alves,
D. R. Anderson,
I. Apergis,
D. Armstrong,
T. Bárczy,
D. Barrado Navascues,
S. C. C. Barros,
M. P. Battley,
W. Baumjohann
, et al. (82 additional authors not shown)
Abstract:
The TOI-178 system consists of a nearby late K-dwarf transited by six planets in the super-Earth to mini-Neptune regime, with radii ranging from 1.2 to 2.9 earth radius and orbital periods between 1.9 and 20.7 days. All planets but the innermost one form a chain of Laplace resonances. The fine-tuning and fragility of such orbital configurations ensure that no significant scattering or collision event has taken place since the formation and migration of the planets in the protoplanetary disc, hence providing important anchors for planet formation models. We aim to improve the characterisation of the architecture of this key system, and in particular the masses and radii of its planets. In addition, since this system is one of the few resonant chains that can be characterised by both photometry and radial velocities, we aim to use it as a test bench for the robustness of the planetary mass determination with each technique. We perform a global analysis of all available photometry and radial velocity. We also try different sets of priors on the masses and eccentricity, as well as different stellar activity models, to study their effects on the masses estimated by each method. We show how stellar activity is preventing us from obtaining a robust mass estimation for the three outer planets using radial velocity data alone. We also show that our joint photo-dynamical and radial velocity analysis resulted in a robust mass determination for planets c to g, with precision of 12% for the mass of planet c, and better than 10% for planets d to g. The new precisions on the radii range from 2 to 3%. The understanding of this synergy between photometric and radial velocity measurements will be valuable during the PLATO mission. We also show that TOI-178 is indeed currently locked in the resonant configuration, librating around an equilibrium of the chain.
Submitted 22 May, 2024;
originally announced May 2024.
-
Quantile Activation: departing from single point estimation for better generalization across distortions
Authors:
Aditya Challa,
Sravan Danda,
Laurent Najman,
Snehanshu Saha
Abstract:
A classifier is, in its essence, a function which takes an input and returns the class of the input and implicitly assumes an underlying distribution. We argue in this article that one has to move away from this basic tenet to obtain generalisation across distributions. Specifically, the class of the sample should depend on the points from its context distribution for better generalisation across distributions.
How does one achieve this? The key idea is to adapt the outputs of each neuron of the network to its context distribution. We propose quantile activation, QACT, which, in simple terms, outputs the relative quantile of the sample in its context distribution, instead of the actual values in traditional networks.
The scope of this article is to validate the proposed activation across several experimental settings, and compare it with conventional techniques. For this, we use the datasets developed to test robustness against distortions CIFAR10C, CIFAR100C, MNISTC, TinyImagenetC, and show that we achieve a significantly higher generalisation across distortions than the conventional classifiers, across different architectures. Although this paper is only a proof of concept, we surprisingly find that this approach outperforms DINOv2(small) at large distortions, even though DINOv2 is trained with a far bigger network on a considerably larger dataset.
Submitted 19 May, 2024;
originally announced May 2024.
-
Towards Knowledge-Infused Automated Disease Diagnosis Assistant
Authors:
Mohit Tomar,
Abhisek Tiwari,
Sriparna Saha
Abstract:
With the advancement of internet communication and telemedicine, people are increasingly turning to the web for various healthcare activities. With an ever-increasing number of diseases and symptoms, diagnosing patients becomes challenging. In this work, we build a diagnosis assistant to assist doctors, which identifies diseases based on patient-doctor interaction. During diagnosis, doctors utilize both symptomatology knowledge and diagnostic experience to identify diseases accurately and efficiently. Inspired by this, we investigate the role of medical knowledge in disease diagnosis through doctor-patient interaction. We propose a two-channel, knowledge-infused, discourse-aware disease diagnosis model (KI-DDI), where the first channel encodes patient-doctor communication using a transformer-based encoder, while the other creates an embedding of symptom-disease using a graph attention network (GAT). In the next stage, the conversation and knowledge graph embeddings are infused together and fed to a deep neural network for disease identification. Furthermore, we first develop an empathetic conversational medical corpus comprising conversations between patients and doctors, annotated with intent and symptoms information. The proposed model demonstrates a significant improvement over the existing state-of-the-art models, establishing the crucial roles of (a) a doctor's effort for additional symptom extraction (in addition to patient self-report) and (b) infusing medical knowledge in identifying diseases effectively. Many times, patients also show their medical conditions, which acts as crucial evidence in diagnosis. Therefore, integrating visual sensory information would represent an effective avenue for enhancing the capabilities of diagnostic assistants.
Submitted 18 May, 2024;
originally announced May 2024.
-
Confidence Estimation in Unsupervised Deep Change Vector Analysis
Authors:
Sudipan Saha
Abstract:
Unsupervised transfer learning-based change detection methods exploit the feature extraction capability of pre-trained networks to distinguish changed pixels from the unchanged ones. However, their performance may vary significantly depending on several geographical and model-related aspects. In many applications, it is of utmost importance to provide trustworthy or confident results, even if over a subset of pixels. The core challenge in this problem is to identify changed pixels and confident pixels in an unsupervised manner. To address this, we propose a two-network model - one tasked with mere change detection and the other with confidence estimation. While the change detection network can be used in conjunction with popular transfer learning-based change detection methods such as Deep Change Vector Analysis, the confidence estimation network operates similarly to a randomized smoothing model. By ingesting ensembles of inputs perturbed by noise, it creates a distribution over the output and assigns confidence to each pixel's outcome. We tested the proposed method on three different Earth observation sensors: optical, Synthetic Aperture Radar, and hyperspectral sensors.
Submitted 16 May, 2024;
originally announced May 2024.
-
Neutron and $\boldsymbolγ$-ray Discrimination by a Pressurized Helium-4 Based Scintillation Detector
Authors:
Shubham Dutta,
Sayan Ghosh,
Satyajit Saha
Abstract:
A pressurized helium-4 based fast neutron scintillation detector offers a useful alternative to organic liquid-based scintillators due to its relatively low response to $γ$-rays compared to the latter type of scintillator. In the present work, we have investigated the capabilities of a pressurized $^4$He (PHe) detector for the detection of fast neutrons in a mixed radiation field where both neutrons and $γ$-rays are present. Discrimination between neutrons and $γ$-rays is achieved by using the fast-slow charge integration method. We have also conducted systematic studies of the attenuation of fast neutrons and $γ$-rays by high-density polyethylene (HDPE). These studies are further corroborated by simulation analyses conducted using GEANT4, which show qualitative agreement with the experimental results. Additionally, the simulation provides detailed insights into the interactions of the radiation quanta with the PHe detector. Estimates of the scintillation signal yield are made based on our GEANT4 simulation results by considering the scintillation mechanism in the PHe gas.
Submitted 1 July, 2024; v1 submitted 15 May, 2024;
originally announced May 2024.
-
Unveiling Hallucination in Text, Image, Video, and Audio Foundation Models: A Comprehensive Survey
Authors:
Pranab Sahoo,
Prabhash Meharia,
Akash Ghosh,
Sriparna Saha,
Vinija Jain,
Aman Chadha
Abstract:
The rapid advancement of foundation models (FMs) across language, image, audio, and video domains has shown remarkable capabilities in diverse tasks. However, the proliferation of FMs brings forth a critical challenge: the potential to generate hallucinated outputs, particularly in high-stakes applications. The tendency of foundation models to produce hallucinated content arguably represents the biggest hindrance to their widespread adoption in real-world scenarios, especially in domains where reliability and accuracy are paramount. This survey paper presents a comprehensive overview of recent developments that aim to identify and mitigate the problem of hallucination in FMs, spanning text, image, video, and audio modalities. By synthesizing recent advancements in detecting and mitigating hallucination across various modalities, the paper aims to provide valuable insights for researchers, developers, and practitioners. Essentially, it establishes a clear framework encompassing definition, taxonomy, and detection strategies for addressing hallucination in multimodal foundation models, laying the foundation for future research in this pivotal area.
Submitted 20 May, 2024; v1 submitted 15 May, 2024;
originally announced May 2024.
-
Weighted past and paired dynamic varentropy measures, their properties and usefulness
Authors:
Shital Saha,
Suchandan Kayal
Abstract:
We introduce two uncertainty measures: weighted past varentropy (WPVE) and weighted paired dynamic varentropy (WPDVE). Several properties have been studied for these proposed measures. The effect of monotone transformations on these measures has been discussed. We have obtained an upper bound of the WPVE using the weighted past Shannon entropy. A lower bound of the WPVE is also obtained. The WPVE has been studied for proportional reversed hazard rate (PRHR) models. Upper and lower bounds of the WPDVE have been derived. We propose non-parametric kernel estimates of the WPVE and WPDVE. Further, maximum likelihood estimation has been employed to estimate the WPVE and WPDVE for an exponential population. A numerical simulation is provided to observe the behaviour of the proposed estimates. Finally, we have analysed a real data set and obtained the estimated values of the WPVE.
Submitted 10 May, 2024;
originally announced May 2024.
-
Demystifying Behavior-Based Malware Detection at Endpoints
Authors:
Yigitcan Kaya,
Yizheng Chen,
Shoumik Saha,
Fabio Pierazzi,
Lorenzo Cavallaro,
David Wagner,
Tudor Dumitras
Abstract:
Machine learning is widely used for malware detection in practice. Prior behavior-based detectors most commonly rely on traces of programs executed in controlled sandboxes. However, sandbox traces are unavailable to the last line of defense offered by security vendors: malware detection at endpoints. A detector at endpoints consumes the traces of programs running on real-world hosts, as sandbox analysis might introduce intolerable delays. Despite their success in the sandboxes, research hints at potential challenges for ML methods at endpoints, e.g., highly variable malware behaviors. Nonetheless, the impact of these challenges on existing approaches and how their excellent sandbox performance translates to the endpoint scenario remain unquantified.
We present the first measurement study of the performance of ML-based malware detectors at real-world endpoints. Leveraging a dataset of sandbox traces and a dataset of in-the-wild program traces, we evaluate two scenarios where the endpoint detector was trained on (i) sandbox traces (convenient and accessible); and (ii) endpoint traces (less accessible due to needing to collect telemetry data). This allows us to identify a wide gap between prior methods' sandbox-based detection performance--over 90%--and endpoint performance--below 20% and 50% in (i) and (ii), respectively. We pinpoint and characterize the challenges contributing to this gap, such as label noise, behavior variability, or sandbox evasion. To close this gap, we propose techniques that yield a relative improvement of 5-30% over the baselines. Our evidence suggests that applying detectors trained on sandbox data to endpoint detection -- scenario (i) -- is challenging. The most promising direction is training detectors on endpoint data -- scenario (ii) -- which marks a departure from widespread practice. We implement a leaderboard for realistic detector evaluations to promote research.
Submitted 9 May, 2024;
originally announced May 2024.
-
Revitalising Stagecraft: NLP-Driven Sentiment Analysis for Traditional Theater Revival
Authors:
Saikat Samanta,
Saptarshi Karmakar,
Satayajay Behuria,
Shibam Dutta,
Soujit Das,
Soumik Saha
Abstract:
This paper explores the application of FilmFrenzy, a Python-based ticket booking web application, in the revival of traditional Indian theatres. Additionally, it explores how NLP can be implemented to improve user experience. By clarifying audience views and pinpointing opportunities for development, FilmFrenzy aims to promote involvement and rejuvenation in India's conventional theatre scene. The platform seeks to maintain the relevance and vitality of conventional theatres by bridging the gap between them and their audiences through the incorporation of contemporary technologies, especially NLP. This research envisions a future in which technology plays a crucial part in maintaining India's rich theatrical traditions, thereby contributing to the preservation and development of cultural heritage. With sentiment analysis and natural language processing (NLP) as essential instruments for improving stagecraft, the research envisions a period when traditional theatre will still be vibrant.
Submitted 9 May, 2024;
originally announced May 2024.
-
Third Harmonic Enhancement Harnessing Photoexcitation Unveils New Nonlinearities in Zinc Oxide
Authors:
Soham Saha,
Sudip Gurung,
Benjamin T. Diroll,
Suman Chakraborty,
Ohad Segal,
Mordechai Segev,
Vladimir M. Shalaev,
Alexander V. Kildishev,
Alexandra Boltasseva,
Richard D. Schaller
Abstract:
Nonlinear optical phenomena are at the heart of various technological domains such as high-speed data transfer, optical logic applications, and emerging fields such as non-reciprocal optics and photonic time crystal design. However, conventional nonlinear materials exhibit inherent limitations in the post-fabrication tailoring of their nonlinear optical properties. Achieving real-time control over optical nonlinearities remains a challenge. In this work, we demonstrate a method to switch third harmonic generation (THG), a commonly occurring nonlinear optical response. Third harmonic generation enhancements up to 50 times are demonstrated in zinc oxide films via the photoexcited state generation and tunable electric field enhancement. More interestingly, the enhanced third harmonic generation follows a quadratic scaling with incident power, as opposed to the conventional cubic scaling, which demonstrates a previously unreported mechanism of third harmonic generation. The THG can also be suppressed by modulating the optical losses in the film. This work shows that the photoexcitation of states can not only enhance nonlinearities, but can create new processes for third harmonic generation. Importantly, the proposed method enables real-time manipulation of the nonlinear response of a medium. The process is switchable and reversible, with the modulations occurring at picosecond timescale. Our study paves the way to boost or suppress the nonlinearities of solid-state media, enabling robust, switchable sources for nonlinear optical applications.
Submitted 8 May, 2024;
originally announced May 2024.
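The contrast between quadratic and cubic power scaling mentioned in the abstract above is usually checked by fitting the slope of a log-log plot. The following is a minimal sketch of that check, assuming NumPy is available; the function name `scaling_exponent` and the synthetic, noise-free data are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def scaling_exponent(power, signal):
    """Return the least-squares slope of log(signal) versus log(power)."""
    slope, _intercept = np.polyfit(np.log(power), np.log(signal), 1)
    return slope


# Synthetic, noise-free illustration only (hypothetical values, arbitrary units).
power = np.linspace(1.0, 10.0, 20)
cubic_signal = 2.0 * power**3       # conventional third-harmonic scaling
quadratic_signal = 5.0 * power**2   # scaling reported for the enhanced signal

cubic_exponent = scaling_exponent(power, cubic_signal)
quadratic_exponent = scaling_exponent(power, quadratic_signal)
print(round(cubic_exponent, 3), round(quadratic_exponent, 3))
```

With measured data, the fitted slope would carry an uncertainty that depends on noise and on the power range covered, so a real analysis would also report a confidence interval.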
-
Exploring Explainable AI Techniques for Improved Interpretability in Lung and Colon Cancer Classification
Authors:
Mukaffi Bin Moin,
Fatema Tuj Johora Faria,
Swarnajit Saha,
Busra Kamal Rafa,
Mohammad Shafiul Alam
Abstract:
Lung and colon cancer are serious worldwide health challenges that require early and precise identification to reduce mortality risks. However, diagnosis, which is mostly dependent on histopathologists' competence, presents difficulties and hazards when expertise is insufficient. While diagnostic methods like imaging and blood markers contribute to early detection, histopathology remains the gold standard, although time-consuming and vulnerable to inter-observer mistakes. Limited access to high-end technology further limits patients' ability to receive immediate medical care and diagnosis. Recent advances in deep learning have generated interest in its application to medical imaging analysis, specifically the use of histopathological images to diagnose lung and colon cancer. The goal of this investigation is to use and adapt existing pre-trained CNN-based models, such as Xception, DenseNet201, ResNet101, InceptionV3, DenseNet121, DenseNet169, ResNet152, and InceptionResNetV2, to enhance classification through better augmentation strategies. The results show tremendous progress, with all eight models reaching impressive accuracy ranging from 97% to 99%. Furthermore, attention visualization techniques such as GradCAM, GradCAM++, ScoreCAM, Faster Score-CAM, and LayerCAM, as well as Vanilla Saliency and SmoothGrad, are used to provide insights into the models' classification decisions, thereby improving interpretability and understanding of malignant and benign image classification.
Submitted 14 May, 2024; v1 submitted 7 May, 2024;
originally announced May 2024.
-
Study of the $^7$Be($d$,$^3$He)$^6$Li* reaction at 5 MeV/u
Authors:
Sk M. Ali,
D. Gupta,
K. Kundalia,
S. Maity,
Swapan K Saha,
O. Tengblad,
J. D. Ovejas,
A. Perea,
I. Martel,
J. Cederkall,
J. Park,
A. M. Moro
Abstract:
The measurement of the $^7$Be($d$,$^3$He)$^6$Li* transfer cross section at 5 MeV/u is carried out. The population of the 2.186 MeV excited state of $^6$Li in this reaction channel is observed for the first time. The experimental angular distributions have been analyzed in the finite range DWBA and coupled-channel frameworks. The effect of the $^7$Be($d$,$^3$He)$^6$Li reaction on both the $^6$Li and $^7$Li abundances is investigated at the relevant big-bang nucleosynthesis energies. The excitation function is calculated by TALYS and normalized to the experimental data. The $S$ factor of the ($d$,$^3$He) channel from the present work is about 50$\%$ lower than existing data at nearby energies. At big-bang energies, the $S$ factor is about three orders of magnitude smaller than that of the ($d,p$) channel. The ($d$,$^3$He) reaction rate is found to have a less than 0.1$\%$ effect on the $^{6,7}$Li abundances.
Submitted 5 May, 2024;
originally announced May 2024.
-
Large deviations of current for the symmetric simple exclusion process on a semi-infinite line and on an infinite line with a slow bond
Authors:
Kapil Sharma,
Soumyabrata Saha,
Sandeep Jangid,
Tridib Sadhu
Abstract:
Two of the most influential exact results in classical one-dimensional diffusive transport are about current statistics for the symmetric simple exclusion process in the stationary state on a finite line coupled with two unequal reservoirs at the boundary, and in the non-stationary state on an infinite line. We present the corresponding result for the intermediate geometry of a semi-infinite line coupled with a single reservoir. This result is obtained using the fluctuating hydrodynamics approach of macroscopic fluctuation theory and confirmed by rare event simulations using a cloning algorithm. Our exact result enables us to address the corresponding problem on an infinite line in the presence of a slow bond and several related problems.
Submitted 12 May, 2024; v1 submitted 1 May, 2024;
originally announced May 2024.
-
Comprehensive Forecasting-Based Analysis of Hybrid and Stacked Stateful/Stateless Models
Authors:
Swayamjit Saha
Abstract:
Wind speed is a powerful source of renewable energy, which can be used as an alternative to non-renewable resources for the production of electricity. Renewable sources are clean, infinite and do not impact the environment negatively during production of electrical energy. However, extracting electrical energy from renewable resources such as solar irradiance, wind speed, and hydro requires special planning, failing which setting up the system may result in a huge loss of labour and money. In this paper, we discuss four deep recurrent neural networks, viz. Stacked Stateless LSTM, Stacked Stateless GRU, Stacked Stateful LSTM and Stacked Stateful GRU, which are used to predict wind speed on a short-term basis for the airport sites beside two campuses of Mississippi State University. The paper gives a comprehensive analysis of the performance of the models used, describing their architectures and how efficiently they produce results, as measured by RMSE values. A detailed description of the time and space complexities of the above models is also given.
Submitted 30 April, 2024;
originally announced April 2024.
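The abstract above compares forecasting models by their RMSE values. As a reminder of what that metric computes, here is a minimal sketch, assuming NumPy is available; the helper name `rmse` and the wind-speed numbers are made up for illustration and do not come from the paper.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Hypothetical wind speeds in m/s, for illustration only.
observed = [3.0, 5.0, 4.0, 6.0]
forecast = [2.0, 5.0, 6.0, 6.0]

error = rmse(observed, forecast)
print(error)
```

Because the errors are squared before averaging, a single large miss raises RMSE more than several small ones, which is one reason it is a common choice for short-term forecasting comparisons.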
-
ir_explain: a Python Library of Explainable IR Methods
Authors:
Sourav Saha,
Harsh Agarwal,
Swastik Mohanty,
Mandar Mitra,
Debapriyo Majumdar
Abstract:
While recent advancements in Neural Ranking Models have resulted in significant improvements over traditional statistical retrieval models, it is generally acknowledged that the use of large neural architectures and the application of complex language models in Information Retrieval (IR) have reduced the transparency of retrieval methods. Consequently, Explainability and Interpretability have emerged as important research topics in IR. Several axiomatic and post-hoc explanation methods, as well as approaches that attempt to be interpretable-by-design, have been proposed. This article presents ir_explain, an open-source Python library that implements a variety of well-known techniques for Explainable IR (ExIR) within a common, extensible framework. ir_explain supports the three standard categories of post-hoc explanations, namely pointwise, pairwise, and listwise explanations. The library is designed to make it easy to reproduce state-of-the-art ExIR baselines on standard test collections, as well as to explore new approaches to explaining IR models and methods. To facilitate adoption, ir_explain is well-integrated with widely-used toolkits such as Pyserini and ir_datasets.
Submitted 29 April, 2024;
originally announced April 2024.
-
Quantification of 2D Interfaces: Quality of heterostructures, and what is inside a nanobubble
Authors:
Mainak Mondal,
Pawni Manchanda,
Soumadeep Saha,
Abhishek Jangid,
Akshay Singh
Abstract:
Trapped materials at the interfaces of two-dimensional heterostructures (HS) lead to reduced coupling between the layers, resulting in degraded optoelectronic performance and device variability. Further, nanobubbles can form at the interface during transfer or after annealing. The question of what is inside a nanobubble, i.e. the trapped material, remains unanswered, limiting the studies and applications of these nanobubble systems. In this work, we report two key advances. Firstly, we quantify the interface quality using RAW-format optical imaging, and distinguish between ideal and non-ideal interfaces. The HS-substrate ratio value is calculated using a transfer matrix model, and is able to detect the presence of trapped layers. The second key advance is identification of water as the trapped material inside a nanobubble. To the best of our knowledge, this is the first study to show that optical imaging alone can quantify interface quality, and find the type of trapped material inside spontaneously formed nanobubbles. We also define a quality index parameter to quantify the interface quality of HS. Quantitative measurement of the interface will help answer the question whether annealing is necessary during HS preparation, and will enable creation of complex HS with small twist angles. Identification of the trapped materials will pave the way towards using nanobubbles for novel optical and engineering applications.
Submitted 25 April, 2024;
originally announced April 2024.
-
Fast photon-mediated entanglement of continuously-cooled trapped ions for quantum networking
Authors:
Jameson O'Reilly,
George Toh,
Isabella Goetting,
Sagnik Saha,
Mikhail Shalaev,
Allison Carter,
Andrew Risinger,
Ashish Kalakuntla,
Tingguang Li,
Ashrit Verma,
Christopher Monroe
Abstract:
We entangle two co-trapped atomic barium ion qubits by collecting single visible photons from each ion through in-vacuo 0.8 NA objectives, interfering them through an integrated fiber-beamsplitter and detecting them in coincidence. This projects the qubits into an entangled Bell state with an observed fidelity lower bound of F > 94%. We also introduce an ytterbium ion for sympathetic cooling to remove the need for recooling interruptions and achieve a continuous entanglement rate of 250 1/s.
Submitted 2 July, 2024; v1 submitted 24 April, 2024;
originally announced April 2024.
-
Microstructural features governing fracture of a two-dimensional amorphous solid identified by machine learning
Authors:
Max Huisman,
Axel Huerre,
Saikat Saha,
John C. Crocker,
Valeria Garbin
Abstract:
Brittle fracturing of materials is common in natural and industrial processes over a variety of length scales. Knowledge of individual particle dynamics is vital to obtain deeper insight into the atomistic processes governing crack propagation in such materials, yet it is challenging to obtain these details in experiments. We propose an experimental approach where isotropic dilational strain is applied to a densely packed monolayer of attractive colloidal microspheres, resulting in fracture. Using brightfield microscopy and particle tracking, we examine the microstructural evolution of the monolayer during fracturing. Furthermore, using a quantified representation of the microstructure in combination with a machine learning algorithm, we calculate the likelihood of regions of the monolayer to be on a crack line, which we term Weakness. From this analysis, we identify the most important contributions to crack propagation and find that local density is more important than orientational order. Our methodology and results provide a basis for further research on microscopic processes during the fracturing process.
Submitted 24 April, 2024;
originally announced April 2024.
-
Planet Hunters NGTS: New Planet Candidates from a Citizen Science Search of the Next Generation Transit Survey Public Data
Authors:
Sean M. O'Brien,
Megan E. Schwamb,
Samuel Gill,
Christopher A. Watson,
Matthew R. Burleigh,
Alicia Kendall,
David R. Anderson,
José I. Vines,
James S. Jenkins,
Douglas R. Alves,
Laura Trouille,
Solène Ulmer-Moll,
Edward M. Bryant,
Ioannis Apergis,
Matthew P. Battley,
Daniel Bayliss,
Nora L. Eisner,
Edward Gillen,
Michael R. Goad,
Maximilian N. Günther,
Beth A. Henderson,
Jeong-Eun Heo,
David G. Jackson,
Chris Lintott,
James McCormac, et al. (13 additional authors not shown)
Abstract:
We present the results from the first two years of the Planet Hunters NGTS citizen science project, which searches for transiting planet candidates in data from the Next Generation Transit Survey (NGTS) by enlisting the help of members of the general public. Over 8,000 registered volunteers reviewed 138,198 light curves from the NGTS Public Data Releases 1 and 2. We utilize a user weighting scheme to combine the classifications of multiple users to identify the most promising planet candidates not initially discovered by the NGTS team. We highlight the five most interesting planet candidates detected through this search, which are all candidate short-period giant planets. This includes the TIC-165227846 system that, if confirmed, would be the lowest-mass star to host a close-in giant planet. We assess the detection efficiency of the project by determining the number of confirmed planets from the NASA Exoplanet Archive and TESS Objects of Interest (TOIs) successfully recovered by this search and find that 74% of confirmed planets and 63% of TOIs detected by NGTS are recovered by the Planet Hunters NGTS project. The identification of new planet candidates shows that the citizen science approach can provide a complementary method to the detection of exoplanets with ground-based surveys such as NGTS.
Submitted 23 April, 2024;
originally announced April 2024.
-
A Griffith description of fracture for non-monotonic loading with application to fatigue
Authors:
Subhrangsu Saha,
John E. Dolbow,
Oscar Lopez-Pamies
Abstract:
With the fundamental objective of establishing the universality of the Griffith energy competition to describe the growth of large cracks in solids \emph{not} just under monotonic but under general loading conditions, this paper puts forth a generalization of the classical Griffith energy competition in nominally elastic brittle materials to arbitrary \emph{non-monotonic} quasistatic loading conditions, which include monotonic and cyclic loadings as special cases. Centered around experimental observations, the idea consists in: $i$) viewing the critical energy release rate $\mathcal{G}_c$ \emph{not} as a material constant but rather as a material function of both space $\textbf{X}$ and time $t$, $ii$) one that decreases in value as the loading progresses, this solely within a small region $Ω_\ell(t)$ around crack fronts, with the characteristic size $\ell$ of such a region being material specific, and $iii$) with the decrease in value of $\mathcal{G}_c$ being dependent on the history of the elastic fields in $Ω_\ell(t)$. By construction, the proposed Griffith formulation is able to describe any Paris-law behavior of the growth of large cracks in nominally elastic brittle materials for the limiting case when the loading is cyclic. For the opposite limiting case when the loading is monotonic, the formulation reduces to the classical Griffith formulation. Additional properties of the proposed formulation are illustrated via a parametric analysis and direct comparisons with representative fatigue fracture experiments on a ceramic, mortar, and PMMA.
Submitted 20 April, 2024;
originally announced April 2024.
-
Engineering software 2.0 by interpolating neural networks: unifying training, solving, and calibration
Authors:
Chanwook Park,
Sourav Saha,
Jiachen Guo,
Xiaoyu Xie,
Satyajit Mojumder,
Miguel A. Bessa,
Dong Qian,
Wei Chen,
Gregory J. Wagner,
Jian Cao,
Wing Kam Liu
Abstract:
The evolution of artificial intelligence (AI) and neural network theories has revolutionized the way software is programmed, shifting from a hard-coded series of codes to a vast neural network. However, this transition in engineering software has faced challenges such as data scarcity, multi-modality of data, low model accuracy, and slow inference. Here, we propose a new network based on interpolation theories and tensor decomposition, the interpolating neural network (INN). Instead of interpolating training data, a common notion in computer science, INN interpolates interpolation points in the physical space whose coordinates and values are trainable. It can also extrapolate if the interpolation points reside outside of the range of training data and the interpolation functions have a larger support domain. INN features orders of magnitude fewer trainable parameters, faster training, a smaller memory footprint, and higher model accuracy compared to feed-forward neural networks (FFNN) or physics-informed neural networks (PINN). INN is poised to usher in Engineering Software 2.0, a unified neural network that spans various domains of space, time, parameters, and initial/boundary conditions. This has previously been computationally prohibitive due to the exponentially growing number of trainable parameters, easily exceeding the parameter size of ChatGPT, which is over 1 trillion. INN addresses this challenge by leveraging tensor decomposition and tensor product, with adaptable network architecture.
Submitted 22 April, 2024; v1 submitted 16 April, 2024;
originally announced April 2024.
-
Improving Shift Invariance in Convolutional Neural Networks with Translation Invariant Polyphase Sampling
Authors:
Sourajit Saha,
Tejas Gokhale
Abstract:
Downsampling operators break the shift invariance of convolutional neural networks (CNNs) and this affects the robustness of features learned by CNNs when dealing with even small pixel-level shift. Through a large-scale correlation analysis framework, we study shift invariance of CNNs by inspecting existing downsampling operators in terms of their maximum-sampling bias (MSB), and find that MSB is negatively correlated with shift invariance. Based on this crucial insight, we propose a learnable pooling operator called Translation Invariant Polyphase Sampling (TIPS) and two regularizations on the intermediate feature maps of TIPS to reduce MSB and learn translation-invariant representations. TIPS can be integrated into any CNN and can be trained end-to-end with marginal computational overhead. Our experiments demonstrate that TIPS results in consistent performance gains in terms of accuracy, shift consistency, and shift fidelity on multiple benchmarks for image classification and semantic segmentation compared to previous methods and also leads to improvements in adversarial and distributional robustness. TIPS results in the lowest MSB compared to all previous methods, thus explaining our strong empirical results.
Submitted 10 April, 2024;
originally announced April 2024.
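The claim in the abstract above that downsampling breaks shift invariance can be seen on a one-dimensional toy signal. The sketch below, assuming NumPy is available, compares plain stride-2 subsampling with a simple rule that keeps the polyphase component of largest norm. That selection rule is a stand-in chosen for illustration; it is not the TIPS operator proposed in the paper, and all function names here are hypothetical.

```python
import numpy as np


def downsample(x, stride=2, phase=0):
    """Keep every `stride`-th sample, starting at index `phase`."""
    return x[phase::stride]


def polyphase_components(x, stride=2):
    """Split x into its `stride` polyphase components."""
    return [x[p::stride] for p in range(stride)]


def max_norm_component(x, stride=2):
    """Illustrative selection rule: keep the component with the largest L2 norm."""
    return max(polyphase_components(x, stride), key=np.linalg.norm)


x = np.array([0, 9, 1, 8, 2, 7, 3, 6])
shifted = np.roll(x, 1)  # circular shift by one sample

# Plain subsampling picks entirely different samples after the shift.
plain = downsample(x)
plain_shifted = downsample(shifted)

# Selecting by norm picks the same set of samples, up to a circular shift.
selected = max_norm_component(x)
selected_shifted = max_norm_component(shifted)
print(plain, plain_shifted, selected, selected_shifted)
```

The point of the toy example is only that a one-sample shift swaps which samples a fixed-phase subsampler sees, while a content-dependent choice among polyphase components can follow the shift.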
-
Exploring the Frontier of Vision-Language Models: A Survey of Current Methodologies and Future Directions
Authors:
Akash Ghosh,
Arkadeep Acharya,
Sriparna Saha,
Vinija Jain,
Aman Chadha
Abstract:
The advent of Large Language Models (LLMs) has significantly reshaped the trajectory of the AI revolution. Nevertheless, these LLMs exhibit a notable limitation, as they are primarily adept at processing textual information. To address this constraint, researchers have endeavored to integrate visual capabilities with LLMs, resulting in the emergence of Vision-Language Models (VLMs). These advanced models are instrumental in tackling more intricate tasks such as image captioning and visual question answering. In our comprehensive survey paper, we delve into the key advancements within the realm of VLMs. Our classification organizes VLMs into three distinct categories: models dedicated to vision-language understanding, models that process multimodal inputs to generate unimodal (textual) outputs, and models that both accept and produce multimodal inputs and outputs. This classification is based on their respective capabilities and functionalities in processing and generating various modalities of data. We meticulously dissect each model, offering an extensive analysis of its foundational architecture, training data sources, as well as its strengths and limitations wherever possible, providing readers with a comprehensive understanding of its essential components. We also analyze the performance of VLMs on various benchmark datasets. By doing so, we aim to offer a nuanced understanding of the diverse landscape of VLMs. Additionally, we underscore potential avenues for future research in this dynamic domain, anticipating further breakthroughs and advancements.
Submitted 12 April, 2024; v1 submitted 20 February, 2024;
originally announced April 2024.