Elasticity affects the shock-induced aerobreakup of a polymeric droplet

Authors: Navin Kumar Chandra, Shubham Sharma, Saptarshi Basu, Aloke Kumar

Abstract: Boger fluids are viscoelastic liquids having constant viscosity for a broad range of shear rates. They are commonly used to separate the effects of liquid elasticity from viscosity in any experiment. We present an experimental study on the shock-induced aerobreakup of a Boger fluid droplet in the Shear-induced entrainment (SIE) and catastrophic breakup regime (Weber number ranging from ~ 800 to 5000). The results are compared with the aerobreakup of a Newtonian droplet having similar viscosity, and with shear-thinning droplets. The study aims to identify the role of liquid elasticity without the added complexity of simultaneous shear-thinning behavior. It is observed that at the early stages of droplet breakup, liquid elasticity plays an insignificant role, and all the fluids show similar behavior. However, during the late stages, the impact of liquid elasticity becomes dominant, which results in a markedly different morphology of the fragmenting liquid mass compared to a Newtonian droplet.
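The Weber number quoted in the abstract is conventionally the ratio of aerodynamic (inertial) forces to surface tension, We = rho * U^2 * D / sigma. The following is a minimal sketch of that textbook definition; the function name, the choice of gas density and relative gas velocity as inputs, and the numerical values are illustrative assumptions and are not taken from the paper.

```python
def weber_number(rho_gas, velocity, diameter, surface_tension):
    """Textbook Weber number: We = rho * U^2 * D / sigma (SI units).

    rho_gas         -- density of the surrounding gas [kg/m^3] (assumed input)
    velocity        -- gas velocity relative to the droplet [m/s]
    diameter        -- initial droplet diameter [m]
    surface_tension -- liquid-gas surface tension [N/m]
    """
    if surface_tension <= 0:
        raise ValueError("surface tension must be positive")
    return rho_gas * velocity ** 2 * diameter / surface_tension


# Illustrative values only (air-like gas, 2 mm droplet, water-like surface tension).
we = weber_number(rho_gas=1.2, velocity=100.0, diameter=0.002, surface_tension=0.072)
```

Because We scales with the square of the velocity, moving from We of about 800 to about 5000 at fixed fluid properties and droplet size corresponds to roughly a 2.5-fold increase in relative gas velocity.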

Submitted 11 March, 2024; originally announced March 2024.

arXiv:2403.06237 [pdf, other]

doi 10.1109/SAUS61785.2024.10563527

On the Echogenicity of Natural Starch-Based Blood Mimicking Fluids for Contrast Enhanced Ultrasound Imaging: Preliminary In-vitro Experiments

Authors: V. Arun Kumar, A. N. Madhavanunni, S. Nivetha, Mahesh Raveendranatha Panicker

Abstract: Natural starch-based blood-mimicking fluid (BMF) has been used as an alternative to commercially available BMFs for in-vitro Doppler investigations in low-resource settings. Most reported works in the literature have used corn starch-based BMF. Evaluation of other natural starches for potential BMF and their characterization have been relatively unexplored in the literature. To this end, this work investigates the echogenicity of corn-, potato-, tapioca-, and wheat starch-based BMFs prepared using a liquid base of pure water-glycerol mixture with three different starch concentrations (1%, 3%, and 5%). The experiments were performed by manually pumping the BMFs to a polyvinyl alcohol (PVA) based flow phantom using a syringe, and raw datasets were acquired using a Verasonics Vantage 128 Research Ultrasound System. Echogenicity was measured as the mean pixel intensity in a selected region of interest (ROI) in the beamformed image. Among the four natural starch-based BMFs, potato starch-based BMF showed the highest echogenicity and contrast with almost 13%, 14%, and 10% higher pixel intensities (dB) than that of the least echoic BMF with 1%, 3%, and 5% starch concentrations respectively. Moreover, the echogenicity of corn, tapioca, and wheat starch-based BMF was observed to be similar, and the results suggest that these BMFs with higher starch concentrations shall be employed for in-vitro contrast-enhanced ultrasound imaging.
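The echogenicity measure described in the abstract (mean pixel intensity in a selected ROI of the beamformed image) can be sketched as follows. This is an illustrative sketch only: the function name, the rectangular ROI, and the representation of the image as a list of rows of dB values are assumptions, not details from the paper.

```python
def roi_mean_intensity(image_db, top, left, height, width):
    """Mean pixel intensity inside a rectangular ROI.

    image_db is assumed to be a list of rows, each a list of pixel
    intensities (e.g. in dB) from an already beamformed image.
    """
    rows = image_db[top:top + height]
    values = [v for row in rows for v in row[left:left + width]]
    if not values:
        raise ValueError("ROI is empty or lies outside the image")
    return sum(values) / len(values)


# Toy 3x3 "image"; the ROI is the bottom-right 2x2 block.
toy = [[1.0, 2.0, 3.0],
       [4.0, 5.0, 6.0],
       [7.0, 8.0, 9.0]]
mean_roi = roi_mean_intensity(toy, top=1, left=1, height=2, width=2)
```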

Submitted 10 March, 2024; originally announced March 2024.

Comments: 5 pages, 4 figures. Accepted in the IEEE South Asian Ultrasonics Symposium 2024 (IEEE SAUS 2024)

Journal ref: 2024 IEEE South Asian Ultrasonics Symposium (SAUS), 2024, pp. 1-4

arXiv:2403.05612 [pdf, other]

Unfamiliar Finetuning Examples Control How Language Models Hallucinate

Authors: Katie Kang, Eric Wallace, Claire Tomlin, Aviral Kumar, Sergey Levine

Abstract: Large language models are known to hallucinate when faced with unfamiliar queries, but the underlying mechanisms that govern how models hallucinate are not yet fully understood. In this work, we find that unfamiliar examples in the models' finetuning data -- those that introduce concepts beyond the base model's scope of knowledge -- are crucial in shaping these errors. In particular, we find that an LLM's hallucinated predictions tend to mirror the responses associated with its unfamiliar finetuning examples. This suggests that by modifying how unfamiliar finetuning examples are supervised, we can influence a model's responses to unfamiliar queries (e.g., say "I don't know"). We empirically validate this observation in a series of controlled experiments involving SFT, RL, and reward model finetuning on TriviaQA and MMLU. Our work further investigates RL finetuning strategies for improving the factuality of long-form model generations. We find that, while hallucinations from the reward model can significantly undermine the effectiveness of RL factuality finetuning, strategically controlling how reward models hallucinate can minimize these negative effects. Leveraging our previous observations on controlling hallucinations, we propose an approach for learning more reliable reward models, and show that they improve the efficacy of RL factuality finetuning in long-form biography and book/movie plot generation tasks.

Submitted 28 May, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2403.05530 [pdf, other]

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.

Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2403.05383 [pdf]

Thermal cycling induced evolution and colossal exchange bias in MnPS3/Fe3GeTe2 van der Waals heterostructures

Authors: Aravind Puthirath Balan, Aditya Kumar, Patrick Reiser, Joseph Vas, Thibaud Denneulin, Khoa Dang Lee, Tom G Saunderson, Märta Tschudin, Clement Pellet-Mary, Debarghya Dutta, Carolin Schrader, Tanja Scholz, Jaco Geuchies, Shuai Fu, Hai Wang, Alberta Bonanni, Bettina V. Lotsch, Ulrich Nowak, Gerhard Jakob, Jacob Gayles, Andras Kovacs, Rafal E. Dunin-Borkowski, Patrick Maletinsky, Mathias Kläui

Abstract: The exchange bias phenomenon, inherent in exchange-coupled ferromagnetic and antiferromagnetic systems, has intrigued researchers for decades. Van der Waals materials, with their layered structure, provide an optimal platform for probing such physical phenomena. However, achieving a facile and effective means to manipulate exchange bias in pristine van der Waals heterostructures remains challenging. In this study, we investigate the origin of exchange bias in MnPS3/Fe3GeTe2 van der Waals heterostructures. Our work demonstrates a method to modulate unidirectional exchange anisotropy, achieving an unprecedented nearly 1000% variation through simple thermal cycling. Despite the compensated interfacial spin configuration of MnPS3, magneto-transport measurements reveal a huge 170 mT exchange bias at 5 K, the largest observed in pristine van der Waals antiferromagnet-ferromagnet interfaces. This substantial magnitude of the exchange bias is linked to an anomalous weak ferromagnetic ordering in MnPS3 below 40 K. On the other hand, the tunability of exchange bias during thermal cycling is ascribed to the modified arrangement of interfacial atoms and changes in the vdW gap during field cooling. Our findings highlight a robust and easily adjustable exchange bias in van der Waals antiferromagnetic/ferromagnetic heterostructures, presenting a straightforward approach to enhance other interface related spintronic phenomena for practical applications.

Submitted 8 March, 2024; originally announced March 2024.

arXiv:2403.04781 [pdf]

Selective Encryption using Segmentation Mask with Chaotic Henon Map for Multidimensional Medical Images

Authors: S Arut Prakash, Aditya Ganesh Kumar, Prabhu Shankar K. C., Lithicka Anandavel, Aditya Lakshmi Narayanan

Abstract: A user-centric design and resource optimization should be at the center of any technology or innovation. The user-centric perspective gives the developer the opportunity to develop with task-based optimization. The user in the medical image field is a medical professional who analyzes the medical images and gives their diagnosis results to the patient. This scheme, taking the medical professional user's perspective, innovates in the area of medical image storage and security. The architecture is designed with three main segments, namely: Segmentation, Storage, and Retrieval. This architecture was designed because the number of retrieval operations performed by medical professionals is far higher than the number of storage operations, which are performed only a handful of times for a particular medical image. This gives room for our innovation to segment out the medically indispensable part of the medical image, encrypt it, and store it. By encrypting the vital parts of the image using a strong encryption algorithm like the chaotic Henon map, we are able to keep the security intact. Retrieving the medical image then demands only the computationally less demanding decryption of the segmented region of interest. The decryption of the segmented region of interest results in the full recovery of the medical image, which can be viewed on demand by the medical professionals for various diagnosis purposes. In this scheme, we were able to achieve a retrieval speed improvement of around 47% when compared to a full image encryption of brain medical CT images.
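The idea of encrypting only a segmented region of interest with a Henon-map keystream can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the authors' scheme: the classic map x' = 1 - a*x^2 + y, y' = b*x with a = 1.4 and b = 0.3, the byte quantisation, the use of the initial state (x0, y0) as the key, and the hypothetical function names. It is not a vetted cipher.

```python
def henon_keystream(n, x0=0.1, y0=0.3, a=1.4, b=0.3, burn_in=100):
    """Return n bytes derived from iterating the classic Henon map."""
    x, y = x0, y0
    stream = []
    for i in range(burn_in + n):
        # One Henon step: x' = 1 - a*x^2 + y, y' = b*x
        x, y = 1.0 - a * x * x + y, b * x
        if i >= burn_in:
            # Quantise the state to a byte (illustrative choice only).
            stream.append(int(abs(x) * 1e6) % 256)
    return stream


def xor_roi(image, mask, key):
    """XOR only the masked pixels with the keystream.

    image -- list of rows of 8-bit pixel values
    mask  -- same shape; truthy entries mark the region of interest
    key   -- (x0, y0) initial state of the map (assumed key format)
    Applying the function twice with the same key restores the image.
    """
    n = sum(1 for row in mask for m in row if m)
    stream = iter(henon_keystream(n, *key))
    return [[p ^ next(stream) if m else p for p, m in zip(prow, mrow)]
            for prow, mrow in zip(image, mask)]


image = [[10, 20, 30], [40, 50, 60]]
mask = [[0, 1, 1], [0, 1, 0]]
key = (0.1, 0.3)
encrypted = xor_roi(image, mask, key)
restored = xor_roi(encrypted, mask, key)
```

Since only the masked pixels are processed, the decryption cost in this sketch grows with the size of the ROI instead of the whole image, which is the kind of saving the abstract attributes to selective encryption.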

Submitted 2 March, 2024; originally announced March 2024.

arXiv:2403.04333 [pdf, other]

A Survey of Application of Machine Learning in Wireless Indoor Positioning Systems

Authors: Amala Sonny, Abhinav Kumar, Linga Reddy Cenkeramaddi

Abstract: Indoor human positioning has become increasingly important for applications such as health monitoring, breath monitoring, human identification, safety and rescue operations, and security surveillance. However, achieving robust indoor human positioning remains challenging due to various constraints. Numerous attempts have been made in the literature to develop efficient indoor positioning systems (IPSs), with a growing focus on machine learning (ML) based techniques. This paper aims to compare and analyze current ML-based wireless techniques and approaches for indoor positioning, providing a comprehensive review of enabling technologies for human detection, positioning, and activity recognition. The study explores different input measurement data, including RSSI, TDOA, etc., for various IPSs. Key positioning techniques such as RSSI-based fingerprinting, Angle-based, and Time-based approaches are examined in conjunction with various ML methods. The survey compares the positioning accuracy, scalability, and algorithm complexity, with the goal of determining the suitable technology in various services. Finally, the paper compares distinct datasets focused on indoor localization, which have been published using diverse technologies. Overall, the paper presents a comprehensive comparison of existing techniques and localization models.

Submitted 7 March, 2024; originally announced March 2024.

arXiv:2403.03950 [pdf, other]

Stop Regressing: Training Value Functions via Classification for Scalable Deep RL

Authors: Jesse Farebrother, Jordi Orbay, Quan Vuong, Adrien Ali Taïga, Yevgen Chebotar, Ted Xiao, Alex Irpan, Sergey Levine, Pablo Samuel Castro, Aleksandra Faust, Aviral Kumar, Rishabh Agarwal

Abstract: Value functions are a central component of deep reinforcement learning (RL). These functions, parameterized by neural networks, are trained using a mean squared error regression objective to match bootstrapped target values. However, scaling value-based RL methods that use regression to large networks, such as high-capacity Transformers, has proven challenging. This difficulty is in stark contrast to supervised learning: by leveraging a cross-entropy classification loss, supervised methods have scaled reliably to massive networks. Observing this discrepancy, in this paper, we investigate whether the scalability of deep RL can also be improved simply by using classification in place of regression for training value functions. We demonstrate that value functions trained with categorical cross-entropy significantly improve performance and scalability in a variety of domains. These include: single-task RL on Atari 2600 games with SoftMoEs, multi-task RL on Atari with large-scale ResNets, robotic manipulation with Q-transformers, playing Chess without search, and a language-agent Wordle task with high-capacity Transformers, achieving state-of-the-art results on these domains. Through careful analysis, we show that the benefits of categorical cross-entropy primarily stem from its ability to mitigate issues inherent to value-based RL, such as noisy targets and non-stationarity. Overall, we argue that a simple shift to training value functions with categorical cross-entropy can yield substantial improvements in the scalability of deep RL at little-to-no cost.
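To make "classification in place of regression" concrete, the sketch below turns a scalar value target into a categorical distribution over fixed bins and scores predicted logits with cross-entropy. The abstract does not say how targets are encoded; the two-hot encoding used here is one common choice and is an assumption, as are the function names, bin range, and bin count.

```python
import math


def two_hot(target, v_min, v_max, num_bins):
    """Spread a scalar target over its two nearest bin centres.

    The resulting distribution has expected value equal to the
    (clipped) target. Returns (probabilities, bin centres).
    """
    step = (v_max - v_min) / (num_bins - 1)
    centres = [v_min + i * step for i in range(num_bins)]
    t = min(max(target, v_min), v_max)      # clip into the supported range
    pos = (t - v_min) / step                # fractional bin index
    lo = min(int(pos), num_bins - 2)        # index of the lower neighbour
    w_hi = pos - lo                         # weight on the upper neighbour
    probs = [0.0] * num_bins
    probs[lo] = 1.0 - w_hi
    probs[lo + 1] = w_hi
    return probs, centres


def cross_entropy(logits, target_probs):
    """Categorical cross-entropy between softmax(logits) and target_probs."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return -sum(p * (l - log_z) for l, p in zip(logits, target_probs))


probs, centres = two_hot(0.3, v_min=0.0, v_max=1.0, num_bins=5)
expected_value = sum(p * c for p, c in zip(probs, centres))
loss = cross_entropy([0.0] * 5, probs)  # uniform prediction
```

At inference time a scalar value can be recovered as the expectation of the predicted distribution over the bin centres, which is how `expected_value` is computed above for the target distribution.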

Submitted 6 March, 2024; originally announced March 2024.

arXiv:2403.03744 [pdf, other]

MedSafetyBench: Evaluating and Improving the Medical Safety of Large Language Models

Authors: Tessa Han, Aounon Kumar, Chirag Agarwal, Himabindu Lakkaraju

Abstract: As large language models (LLMs) develop increasingly sophisticated capabilities and find applications in medical settings, it becomes important to assess their medical safety due to their far-reaching implications for personal and public health, patient safety, and human rights. However, there is little to no understanding of the notion of medical safety in the context of LLMs, let alone how to evaluate and improve it. To address this gap, we first define the notion of medical safety in LLMs based on the Principles of Medical Ethics set forth by the American Medical Association. We then leverage this understanding to introduce MedSafetyBench, the first benchmark dataset specifically designed to measure the medical safety of LLMs. We demonstrate the utility of MedSafetyBench by using it to evaluate and improve the medical safety of LLMs. Our results show that publicly-available medical LLMs do not meet standards of medical safety and that fine-tuning them using MedSafetyBench improves their medical safety. By introducing this new benchmark dataset, our work enables a systematic study of the state of medical safety in LLMs and motivates future work in this area, thereby mitigating the safety risks of LLMs in medicine.

Submitted 13 June, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

arXiv:2403.03004 [pdf, other]

Ultralight vector dark matter search using data from the KAGRA O3GK run

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration, A. G. Abac, R. Abbott, H. Abe, I. Abouelfettouh, F. Acernese, K. Ackley, C. Adamcewicz, S. Adhicary, N. Adhikari, R. X. Adhikari, V. K. Adkins, V. B. Adya, C. Affeldt, D. Agarwal, M. Agathos, O. D. Aguiar, I. Aguilar, L. Aiello, A. Ain, P. Ajith, T. Akutsu, S. Albanesi, et al. (1778 additional authors not shown)

Abstract: Among the various candidates for dark matter (DM), ultralight vector DM can be probed by laser interferometric gravitational wave detectors through the measurement of oscillating length changes in the arm cavities. In this context, KAGRA has a unique feature due to differing compositions of its mirrors, enhancing the signal of vector DM in the length change in the auxiliary channels. Here we present the result of a search for $U(1)_{B-L}$ gauge boson DM using the KAGRA data from auxiliary length channels during the first joint observation run together with GEO600. By applying our search pipeline, which takes into account the stochastic nature of ultralight DM, upper bounds on the coupling strength between the $U(1)_{B-L}$ gauge boson and ordinary matter are obtained for a range of DM masses. While our constraints are less stringent than those derived from previous experiments, this study demonstrates the applicability of our method to the lower-mass vector DM search, which is made difficult in this measurement by the short observation time compared to the auto-correlation time scale of DM.

Submitted 5 March, 2024; originally announced March 2024.

Comments: 20 pages, 5 figures

Report number: LIGO-P2300250

arXiv:2403.02516 [pdf, other]

doi 10.1103/PhysRevLett.132.151001

Observation of Seven Astrophysical Tau Neutrino Candidates with IceCube

Authors: IceCube Collaboration, R. Abbasi, M. Ackermann, J. Adams, S. K. Agarwalla, J. A. Aguilar, M. Ahlers, J. M. Alameddine, N. M. Amin, K. Andeen, G. Anton, C. Argüelles, Y. Ashida, S. Athanasiadou, S. N. Axani, X. Bai, A. Balagopal V., M. Baricevic, S. W. Barwick, V. Basu, R. Bay, J. J. Beatty, J. Becker Tjus, J. Beise, C. Bellenghi, et al. (380 additional authors not shown)

Abstract: We report on a measurement of astrophysical tau neutrinos with 9.7 years of IceCube data. Using convolutional neural networks trained on images derived from simulated events, seven candidate $ν_τ$ events were found with visible energies ranging from roughly 20 TeV to 1 PeV and a median expected parent $ν_τ$ energy of about 200 TeV. Considering backgrounds from astrophysical and atmospheric neutrinos, and muons from $π^\pm/K^\pm$ decays in atmospheric air showers, we obtain a total estimated background of about 0.5 events, dominated by non-$ν_τ$ astrophysical neutrinos. Thus, we rule out the absence of astrophysical $ν_τ$ at the $5σ$ level. The measured astrophysical $ν_τ$ flux is consistent with expectations based on previously published IceCube astrophysical neutrino flux measurements and neutrino oscillations.

Submitted 26 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

Comments: Accepted for publication in Physical Review Letters. This version includes full author list metadata

Journal ref: Phys.Rev.Lett. 132 (2024) 15, 151001

arXiv:2403.02470 [pdf, other]

doi 10.1088/1748-0221/19/06/P06026

Improved modeling of in-ice particle showers for IceCube event reconstruction

Authors: R. Abbasi, M. Ackermann, J. Adams, S. K. Agarwalla, J. A. Aguilar, M. Ahlers, J. M. Alameddine, N. M. Amin, K. Andeen, G. Anton, C. Argüelles, Y. Ashida, S. Athanasiadou, L. Ausborm, S. N. Axani, X. Bai, A. Balagopal V., M. Baricevic, S. W. Barwick, S. Bash, V. Basu, R. Bay, J. J. Beatty, J. Becker Tjus, J. Beise, et al. (394 additional authors not shown)

Abstract: The IceCube Neutrino Observatory relies on an array of photomultiplier tubes to detect Cherenkov light produced by charged particles in the South Pole ice. IceCube data analyses depend on an in-depth characterization of the glacial ice, and on novel approaches in event reconstruction that utilize fast approximations of photoelectron yields. Here, a more accurate model is derived for event reconstruction that better captures our current knowledge of ice optical properties. When evaluated on a Monte Carlo simulation set, the median angular resolution for in-ice particle showers improves by over a factor of three compared to a reconstruction based on a simplified model of the ice. The most substantial improvement is obtained when including effects of birefringence due to the polycrystalline structure of the ice. When evaluated on data classified as particle showers in the high-energy starting events sample, a significantly improved description of the events is observed.

Submitted 22 April, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

Comments: 28 pages, 18 figures, 1 table, submitted to JINST, updated to account for comments received

Journal ref: 2024 JINST 19 P06026

arXiv:2403.01794 [pdf, other]

Twisted hyperbolic van der Waals crystals for full Stokes mid-infrared polarization detection

Authors: Nihar Ranjan Sahoo, S. S. Jatin Prasath, Brijesh Kumar, Anshuman Kumar

Abstract: Investigating the polarization properties of light in the mid-infrared (mid-IR) spectrum is crucial for molecular sensing, biomedical diagnostics, and IR imaging system technologies. Traditional methods, limited by bulky size and intricate fabrication, utilize large rotating optics for full Stokes polarization detection, impeding miniaturization and accuracy. Devices based on van der Waals (vdW) materials can address these challenges due to their lithography-free fabrication, ease of integration with chip-scale platforms and room-temperature operation. This study introduces a chip-integrated polarimeter device leveraging the in-plane biaxial hyperbolic vdW crystal properties for mid-infrared light manipulation. The spatial division measurement scheme incorporates six meticulously designed linear and circular polarization filters, achieving high extinction ratios exceeding 30 dB and transmittance surpassing 50%, with fabrication tolerance of film thickness up to 100 nm. The proposed device represents a significant advancement in polarimetric detection, providing a compact, cost-effective solution and opening new avenues for on-chip mid-IR polarimetric detection in next-generation ultra-compact optical systems.

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2403.01369 [pdf, other]

A Closer Look at Wav2Vec2 Embeddings for On-Device Single-Channel Speech Enhancement

Authors: Ravi Shankar, Ke Tan, Buye Xu, Anurag Kumar

Abstract: Self-supervised learned models have been found to be very effective for certain speech tasks such as automatic speech recognition, speaker identification, keyword spotting and others. While the features are undeniably useful in speech recognition and associated tasks, their utility in speech enhancement systems is yet to be firmly established, and perhaps not properly understood. In this paper, we investigate the uses of SSL representations for single-channel speech enhancement in challenging conditions and find that they add very little value for the enhancement task. Our constraints are designed around on-device real-time speech enhancement -- the model is causal and the compute footprint is small. Additionally, we focus on low SNR conditions where such models struggle to provide good enhancement. In order to systematically examine how SSL representations impact performance of such enhancement models, we propose a variety of techniques to utilize these embeddings which include different forms of knowledge-distillation and pre-training.

Submitted 2 March, 2024; originally announced March 2024.

Comments: 8 pages; Shorter form accepted in ICASSP 2024

arXiv:2403.00975 [pdf, other]

Equipment Health Assessment: Time Series Analysis for Wind Turbine Performance

Authors: Jana Backhus, Aniruddha Rajendra Rao, Chandrasekar Venkatraman, Abhishek Padmanabhan, A. Vinoth Kumar, Chetan Gupta

Abstract: In this study, we leverage SCADA data from diverse wind turbines to predict power output, employing advanced time series methods, specifically Functional Neural Networks (FNN) and Long Short-Term Memory (LSTM) networks. A key innovation lies in the ensemble of FNN and LSTM models, capitalizing on their collective learning. This ensemble approach outperforms individual models, ensuring stable and accurate power output predictions. Additionally, machine learning techniques are applied to detect wind turbine performance deterioration, enabling proactive maintenance strategies and health assessment. Crucially, our analysis reveals the uniqueness of each wind turbine, necessitating tailored models for optimal predictions. This insight underscores the importance of providing automated customization for different turbines to keep human modeling effort low. Importantly, the methodologies developed in this analysis are not limited to wind turbines; they can be extended to predict and optimize performance in various machinery, highlighting the versatility and applicability of our research across diverse industrial contexts.

Submitted 1 March, 2024; originally announced March 2024.

Comments: 19 Pages, 17 Figures, 3 Tables, Submitted at Applied Sciences (MDPI)

arXiv:2403.00927 [pdf]

doi 10.1051/0004-6361/202348642

SN 2019nyk: A rapidly declining Type II supernova with early interaction signatures

Authors: Raya Dastidar, Giuliano Pignata, Naveen Dukiya, Kuntal Misra, Daichi Hiramatsu, Javier Silva-Farfán, D. Andrew Howell, K. Azalee Bostroem, Mridweeka Singh, Anjasha Gangopadhyay, Amit Kumar, Curtis McCully

Abstract: We present an optical photometric and spectroscopic analysis of the fast-declining hydrogen-rich Type II supernova (SN) 2019nyk. The light curve properties of SN 2019nyk align well with those of other fast-declining Type II SNe, such as SNe 2013by and 2014G. SN 2019nyk exhibits a peak absolute magnitude of -18.09 $\pm$ 0.17 mag in the V band, followed by a rapid decline at 2.84 $\pm$ 0.03 mag (100 d)$^{-1}$ during the recombination phase. The early spectra of SN 2019nyk exhibit high-ionisation emission features as well as narrow H Balmer lines, persisting until 4.1 d since explosion, indicating the presence of circumstellar material (CSM) in close proximity. A comparison of these features with other Type II SNe displaying an early interaction reveals similarities between these features and those observed in SNe 2014G and 2023ixf. We also compared the early spectra to literature models, estimating a mass-loss rate of the order of 10$^{-3}$ M$_\odot$ yr$^{-1}$. Radiation hydrodynamical modelling of the light curve also suggests the mass loss from the progenitor within a short period prior to explosion, totalling 0.16 M$_\odot$ of material within 2900 R$_\odot$ of the progenitor. Furthermore, light curve modelling infers a zero-age main sequence mass of 15 M$_\odot$ for the progenitor, a progenitor radius of 1031 R$_\odot$, and an explosion energy of 1.1 $\times$ 10$^{51}$ erg.

Submitted 1 March, 2024; originally announced March 2024.

Comments: 18 pages, 23 figures, accepted in A&A

arXiv:2403.00199 [pdf, other]

Improving Socratic Question Generation using Data Augmentation and Preference Optimization

Authors: Nischal Ashok Kumar, Andrew Lan

Abstract: The Socratic method is a way of guiding students toward solving a problem independently without directly revealing the solution to the problem. Although this method has been shown to significantly improve student learning outcomes, it remains a complex, labor-intensive task for instructors. Large language models (LLMs) can be used to augment human effort by automatically generating Socratic questions for students. However, existing methods that involve prompting these LLMs sometimes produce invalid outputs, e.g., those that directly reveal the solution to the problem or provide irrelevant or premature questions. To alleviate this problem, inspired by reinforcement learning with AI feedback (RLAIF), we first propose a data augmentation method to enrich existing Socratic questioning datasets with questions that are invalid in specific ways. Next, we propose a method to optimize open-source LLMs such as Llama 2 to prefer ground-truth questions over generated invalid ones, using direct preference optimization (DPO). Our experiments on a Socratic questions dataset for student code debugging show that a DPO-optimized 7B Llama 2 model can effectively avoid generating invalid questions, and as a result, outperforms existing state-of-the-art prompting methods.

Submitted 18 April, 2024; v1 submitted 29 February, 2024; originally announced March 2024.

Comments: Published at the 19th BEA Workshop co-located with NAACL-2024

arXiv:2402.19446 [pdf, other]

ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL

Authors: Yifei Zhou, Andrea Zanette, Jiayi Pan, Sergey Levine, Aviral Kumar

Abstract: A broad use case of large language models (LLMs) is in goal-directed decision-making tasks (or "agent" tasks), where an LLM needs to not just generate completions for a given prompt, but rather make intelligent decisions over a multi-turn interaction to accomplish a task (e.g., when interacting with the web, using tools, or providing customer support). Reinforcement learning (RL) provides a general paradigm to address such agent tasks, but current RL methods for LLMs largely focus on optimizing single-turn rewards. By construction, most single-turn RL methods cannot endow LLMs with the ability to intelligently seek information over multiple turns, perform credit assignment, or reason about their past actions -- all of which are critical in agent tasks. This raises the question: how can we design effective and efficient multi-turn RL algorithms for LLMs? In this paper, we develop a framework for building multi-turn RL algorithms for fine-tuning LLMs that preserves the flexibility of existing single-turn RL methods for LLMs (e.g., proximal policy optimization), while accommodating multiple turns, long horizons, and delayed rewards effectively. To do this, our framework adopts a hierarchical RL approach and runs two RL algorithms in parallel: a high-level off-policy value-based RL algorithm to aggregate reward over utterances, and a low-level RL algorithm that utilizes this high-level value function to train a token policy within each utterance or turn. Our hierarchical framework, Actor-Critic Framework with a Hierarchical Structure (ArCHer), can also give rise to other RL methods. Empirically, we find that ArCHer significantly improves efficiency and performance on agent tasks, attaining a sample efficiency of about 100x over existing methods, while also improving with larger model capacity (up to the 7 billion scale that we tested on).

Submitted 29 February, 2024; originally announced February 2024.

arXiv:2402.18968 [pdf, other]

Ambisonics Networks -- The Effect Of Radial Functions Regularization

Authors: Bar Shaybet, Anurag Kumar, Vladimir Tourbabin, Boaz Rafaely

Abstract: Ambisonics, a popular format of spatial audio, is the spherical harmonic (SH) representation of the plane wave density function of a sound field. Many algorithms operate in the SH domain and utilize the Ambisonics as their input signal. The process of encoding Ambisonics from a spherical microphone array involves dividing by the radial functions, which may amplify noise at low frequencies. This can be overcome by regularization, with the downside of introducing errors to the Ambisonics encoding. This paper aims to investigate the impact of different ways of regularization on Deep Neural Network (DNN) training and performance. Ideally, these networks should be robust to the way of regularization. Simulated data of a single speaker in a room and experimental data from the LOCATA challenge were used to evaluate this robustness on an example algorithm of speaker localization based on the direct-path dominance (DPD) test. Results show that performance may be sensitive to the way of regularization, and an informed approach is proposed and investigated, highlighting the importance of regularization information.

Submitted 29 February, 2024; originally announced February 2024.

Comments: to be published in ICASSP 2024

arXiv:2402.18889 [pdf]

Powering Monolithic and Hybrid Organic Optical Waveguides via Integrated Focused Micro-LEDs for Sustainable Photonic Circuits

Authors: Ankur Khapre, Avulu Vinod Kumar, Rajadurai Chandrasekar

Abstract: In the domain of mechanophotonics, achieving real-time applicability of organic crystals in visible light communication (VLC) technologies necessitates affordable light-emitting diodes (LEDs) as sources of light to run photonic devices through sustainable methods. Herein, we demonstrate an efficient strategy to excite (Z)-3-(3',5'-bis(trifluoromethyl)-[1,1'-biphenyl]-4-yl)-2-(4-methoxyphenyl) acrylonitrile (CF3OMe), 9,10-bis(phenylethynyl)anthracene (BPEA) and 2,2'-((1E,1'E)-hydrazine-1,2-diylidenebis(methaneylylidene))diphenol (SAA) flexible crystal waveguides utilizing a UV LED source and transduce the respective blue, orange and yellow fluorescence signals. The capability of the focused LED lies in its ability to (i) energize mechanically bent crystals at an angle of 180°, (ii) evanescently excite the fluorescence (FL) of an SAA waveguide using the FL of a CF3OMe waveguide through energy transfer, and (iii) excite and split different signals in a 2×2 hybrid directional coupler based on SAA-BPEA crystals. These demonstrations underscore the practicality of the proposed technique for sustainable applications in photonic systems related to VLC.

Submitted 29 February, 2024; originally announced February 2024.

Comments: 12 pages, 5 figures

arXiv:2402.18878 [pdf]

Mechanophotonics: Pseudo-plastic Organic Crystal as a Fermat Spiral Optical Waveguide

Authors: Melchi Chosenyah, Avulu Vinod Kumar, Rajadurai Chandrasekar

Abstract: An unprecedented organic Fermat spiral optical waveguide (FSOW) self-transducing green fluorescence is fabricated using a pseudo-plastic (E)-1-(((5-bromopyridin-2-yl)imino)methyl)naphthalene-2-ol crystal. A 1.618-millimeter-long crystal is initially bent into a hairpin-like waveguide. Later, a meticulous mechanophotonic strategy is employed to sculpt the hairpin-like waveguide into the Fermat spiral geometry, covering a compact area of 330 × 238 µm². The optical signal in the FSOW survives two sharp 180-degree turns to produce optical output. The remarkably low bending-induced optical loss in the FSOW can be ascribed to the smooth, defect-free surface morphology of the crystal. The development of such versatile optical components capable of transducing light through sharp bends is pivotal for realizing large-scale all-organic photonic circuits.

Submitted 29 February, 2024; originally announced February 2024.

Comments: 4 pages, 5 figures

arXiv:2402.18693 [pdf, ps, other]

Symbolic Powers of Classical Varieties

Authors: Arvind Kumar, Vivek Mukundan

Abstract: Let $R=\mathbb{K}[x_1,\dots,x_n]$ and let $\mathfrak{a}_1,\dots,\mathfrak{a}_m$ be homogeneous ideals satisfying certain properties, which include a description of the Noetherian symbolic Rees algebra. We then compute the Waldschmidt constant and resurgence and show that it exhibits a stronger version of the Chudnovsky and Demailly-type bounds. We further show that these properties are satisfied for classical varieties such as the generic determinantal ideals, minors of generic symmetric matrices, generic extended Hankel matrices, and ideals of pfaffians of skew-symmetric matrices.

Submitted 28 February, 2024; originally announced February 2024.

Comments: 19 pages, comments, and suggestions are welcome

MSC Class: 13B25; 13A02

arXiv:2402.18199 [pdf, other]

Quantitative investigation of quantum emitter yield in drop-casted hexagonal boron nitride nanoflakes

Authors: Tom Kretzschmar, Sebastian Ritter, Anand Kumar, Tobias Vogl, Falk Eilenberger, Falko Schmidt

Abstract: Single photon emitters (SPEs) are a key component of quantum technologies, where they serve as pure photon sources. In this study, we investigate the generation of SPEs from drop-casted hexagonal boron nitride (hBN) nanoflakes, examining the influence of the immersion solution and the source of hBN. We show that, depending on the supplier and solution used, the number and quality of the emitters change. We perform a comprehensive optical characterization of the deposited nanoflakes to assess the quality of the generated SPEs. We show quantitative data on SPE yields, highlighting significant variations among solvents and different sources of hBN. This holds particular significance for employing drop-casted nanoflakes as SPE sources in quantum communication, sensing, and imaging. Our method is easily expandable to all kinds of surfaces and can be done without requiring complex fabrication steps and equipment, thus providing the scalability required for industrial quantum applications.

Submitted 28 February, 2024; originally announced February 2024.

Comments: 29 pages, 15 figures

arXiv:2402.18026 [pdf, other]

doi 10.1103/PhysRevD.110.022001

Characterization of the Astrophysical Diffuse Neutrino Flux using Starting Track Events in IceCube

Authors: R. Abbasi, M. Ackermann, J. Adams, S. K. Agarwalla, J. A. Aguilar, M. Ahlers, J. M. Alameddine, N. M. Amin, K. Andeen, G. Anton, C. Argüelles, Y. Ashida, S. Athanasiadou, L. Ausborm, S. N. Axani, X. Bai, A. Balagopal V., M. Baricevic, S. W. Barwick, S. Bash, V. Basu, R. Bay, J. J. Beatty, J. Becker Tjus, J. Beise, et al. (394 additional authors not shown)

Abstract: A measurement of the diffuse astrophysical neutrino spectrum is presented using IceCube data collected from 2011-2022 (10.3 years). We developed novel detection techniques to search for events with a contained vertex and exiting track induced by muon neutrinos undergoing a charged-current interaction. Searching for these starting track events allows us to not only more effectively reject atmospheric muons but also atmospheric neutrino backgrounds in the southern sky, opening a new window to the sub-100 TeV astrophysical neutrino sky. The event selection is constructed using a dynamic starting track veto and machine learning algorithms. We use this data to measure the astrophysical diffuse flux as a single power law flux (SPL) with a best-fit spectral index of $γ= 2.58 ^{+0.10}_{-0.09}$ and per-flavor normalization of $φ^{\mathrm{Astro}}_{\mathrm{per-flavor}} = 1.68 ^{+0.19}_{-0.22} \times 10^{-18} \times \mathrm{GeV}^{-1} \mathrm{cm}^{-2} \mathrm{s}^{-1} \mathrm{sr}^{-1}$ (at 100 TeV). The sensitive energy range for this dataset is 3 - 550 TeV under the SPL assumption. This data was also used to measure the flux under a broken power law, however we did not find any evidence of a low energy cutoff.

Submitted 27 February, 2024; originally announced February 2024.

Comments: 27 pages, 28 figures

Journal ref: Phys. Rev. D 110, 022001 (2024)

arXiv:2402.18001 [pdf, other]

Hilbert Space Fragmentation and Subspace Scar Time-Crystallinity in Driven Homogeneous Central-Spin Models

Authors: Abhishek Kumar, Rafail Frantzeskakis, Edwin Barnes

Abstract: We study the stroboscopic non-equilibrium quantum dynamics of periodically kicked Hamiltonians involving homogeneous central-spin interactions. The system exhibits a strong fragmentation of Hilbert space into four-dimensional Floquet-Krylov subspaces, which oscillate between two disjointed two-dimensional subspaces and thus break the discrete time-translation symmetry of the system. Our analytical and numerical analyses reveal that fully polarized states of the satellite spins exhibit fragmentations that are stable against perturbations and have high overlap with Floquet eigenstates of atypically low bipartite entanglement entropy (scar states). We present evidence of robust time-crystalline behavior in the form of a period doubling of the total magnetization of fully polarized satellite spin states that persists over long time scales. We compute non-equilibrium phase diagrams with respect to a magnetic field, coupling terms, and pulse error for various interaction types, including Heisenberg, Ising, XXZ, and XX. We also discuss possible experimental realizations of scar time crystals in color center, quantum dot, and rare-earth ion platforms.

Submitted 27 February, 2024; originally announced February 2024.

Comments: 17 pages, 9 figures, 1 table

arXiv:2402.16142 [pdf]

From Text to Transformation: A Comprehensive Review of Large Language Models' Versatility

Authors: Pravneet Kaur, Gautam Siddharth Kashyap, Ankit Kumar, Md Tabrez Nafis, Sandeep Kumar, Vikrant Shokeen

Abstract: This groundbreaking study explores the expanse of Large Language Models (LLMs), such as Generative Pre-Trained Transformer (GPT) and Bidirectional Encoder Representations from Transformers (BERT), across varied domains ranging from technology, finance, and healthcare to education. Despite their established prowess in Natural Language Processing (NLP), these LLMs have not been systematically examined for their impact on domains such as fitness and holistic well-being, urban planning, climate modelling, and disaster management. This review paper, in addition to furnishing a comprehensive analysis of the extent of LLMs' utility in diverse domains, recognizes the research gaps and realms where the potential of LLMs is yet to be harnessed. This study uncovers innovative ways in which LLMs can leave a mark in fields like fitness and well-being, urban planning, climate modelling, and disaster response, which could inspire future research and applications in these areas.

Submitted 25 February, 2024; originally announced February 2024.

arXiv:2402.15833 [pdf, other]

Prompt Perturbation Consistency Learning for Robust Language Models

Authors: Yao Qiang, Subhrangshu Nandi, Ninareh Mehrabi, Greg Ver Steeg, Anoop Kumar, Anna Rumshisky, Aram Galstyan

Abstract: Large language models (LLMs) have demonstrated impressive performance on a number of natural language processing tasks, such as question answering and text summarization. However, their performance on sequence labeling tasks such as intent classification and slot filling (IC-SF), which is a central component in personal assistant systems, lags significantly behind discriminative models. Furthermore, there is a lack of substantive research on the robustness of LLMs to various perturbations in the input prompts. The contributions of this paper are three-fold. First, we show that fine-tuning sufficiently large LLMs can produce IC-SF performance comparable to discriminative models. Next, we systematically analyze the performance deterioration of those fine-tuned models due to three distinct yet relevant types of input perturbations - oronyms, synonyms, and paraphrasing. Finally, we propose an efficient mitigation approach, Prompt Perturbation Consistency Learning (PPCL), which works by regularizing the divergence between losses from clean and perturbed samples. Our experiments demonstrate that PPCL can recover on average 59% and 69% of the performance drop for IC and SF tasks, respectively. Furthermore, PPCL beats the data augmentation approach while using ten times fewer augmented data samples.

Submitted 24 February, 2024; originally announced February 2024.

arXiv:2402.14591 [pdf, other]

High-Speed Detector For Low-Powered Devices In Aerial Grasping

Authors: Ashish Kumar, Laxmidhar Behera

Abstract: Autonomous aerial harvesting is a highly complex problem because it requires numerous interdisciplinary algorithms to be executed on mini low-powered computing devices. Object detection is one such algorithm that is compute-hungry. In this context, we make the following contributions: (i) Fast Fruit Detector (FFD), a resource-efficient, single-stage, and postprocessing-free object detector based on our novel latent object representation (LOR) module, query assignment, and prediction strategy. FFD achieves 100FPS@FP32 precision on the latest 10W NVIDIA Jetson-NX embedded device while co-existing with other time-critical sub-systems such as control, grasping, and SLAM, a major achievement of this work. (ii) A method to generate vast amounts of training data without exhaustive manual labelling of fruit images, which contain a large number of instances and are therefore costly and time-consuming to label. (iii) An open-source fruit detection dataset having plenty of very small-sized instances that are difficult to detect. Our exhaustive evaluations on our dataset and the MinneApple dataset show that FFD, being only a single-scale detector, is more accurate than many representative detectors, e.g. FFD is better than single-scale Faster-RCNN by 10.7AP, multi-scale Faster-RCNN by 2.3AP, and better than the latest single-scale YOLO-v8 by 8AP and multi-scale YOLO-v8 by 0.3AP while being considerably faster.

Submitted 1 March, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

Comments: 8 Pages, 9 Figures, 8 Tables, IEEE Robotics and Automation Letters (IEEE RA-L)

arXiv:2402.14176 [pdf, other]

Ferroelectricity at the extreme thickness limit in the archetypal antiferroelectric PbZrO$_3$

Authors: Nikhilesh Maity, Milan Haddad, Nazanin Bassiri-Gharb, Amit Kumar, Lewys Jones, Sergey Lisenkov, Inna Ponomareva

Abstract: Size-driven transition of an antiferroelectric into a polar ferroelectric or ferrielectric state is a strongly debated issue from both experimental and theoretical perspectives. While critical thickness limits for such transitions have been explored, a bottom-up approach in the ultrathin limit considering few atomic layers could provide insight into the mechanism of stabilization of the polar phases over the antipolar phase seen in bulk PbZrO$_3$. Here, we use first-principles density functional theory to predict the stability of polar phases in Pt/PbZrO$_3$/Pt nanocapacitors. In a few atomic layer thick slabs of PbZrO$_3$ sandwiched between Pt electrodes, we find that the polar phase originating from the well established R3c phase of bulk PbZrO$_3$ is energetically favorable over the antipolar phase originating from the Pbam phase of bulk PbZrO$_3$. The famous triple-well potential of antiferroelectric PbZrO$_3$ is modified in the nanocapacitor limit in such a way as to swap the positions of the global and local minima, stabilizing the polar phase relative to the antipolar one. The size effect is decomposed into the contributions from dimensionality reduction, surface charge screening, and interfacial relaxation, which reveals that it is the creation of well-compensated interfaces that stabilizes the polar phases over the antipolar ones in nanoscale PbZrO$_3$.

Submitted 21 February, 2024; originally announced February 2024.

arXiv:2402.12906 [pdf, other]

doi 10.1007/978-3-030-93620-4_7

Fog enabled distributed training architecture for federated learning

Authors: Aditya Kumar, Satish Narayana Srirama

Abstract: The amount of data being produced every second is increasing continuously. Various sensors, cameras and smart gadgets produce continuous data throughout their installation. Processing and analyzing raw data at a cloud server faces several challenges such as bandwidth, congestion, latency, privacy and security. Fog computing brings computational resources closer to IoT devices, which addresses some of these issues. These IoT devices have low computational capability, which is insufficient to train machine learning models. Mining hidden patterns and inferential rules from continuously growing data is crucial for various applications. Due to growing privacy concerns, privacy-preserving machine learning is another aspect that needs to be incorporated. In this paper, we propose a fog-enabled distributed training architecture for machine learning tasks using resource-constrained devices. The proposed architecture trains a machine learning model on rapidly changing data using online learning. The network is inlined with privacy-preserving federated learning training. Further, the learning capability of the architecture is tested on a real-world IIoT use case. We trained a neural network model for human position detection in an IIoT setup on rapidly changing data.

Submitted 20 February, 2024; originally announced February 2024.

Comments: Conference paper accepted at BDA 2021

Journal ref: Big Data Analytics 9th International Conference, BDA 2021, Virtual Event, December 15-18, 2021

arXiv:2402.12456 [pdf, other]

Three point interaction of Dirac fermions with higher spin particles and discrete symmetries

Authors: Kushal Chakraborty, Aakash Kumar, Arnab Rudra, Amey Yeole

Abstract: We constructed all possible kinematically allowed three-point interactions of two massless Dirac spinors with massive higher-spin bosons. In any $D$ spacetime, the interactions have been constructed using the projections of the massive higher spin representations of $Spin(D-1)$ over the massless complex spinor representations of $Spin(D-2)\times Spin(D-2)$. Based on this analysis, we have further classified the space of theories involving one massless Dirac spinor and a single (or multiple) massive higher spins based on the discrete symmetries: $C,\, R,$ and $ T$. We found that in any $D=2m+1/2m$, the interacting theories of a single massive higher spin have a $``m"$ mod $2$ (or $D$ mod $4$) classification.

Submitted 7 May, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

Comments: Reference added, typos corrected, textual improvement

arXiv:2402.11128 [pdf, other]

Strain activation of localized states in WSe2

Authors: Oguzhan Yücel, Denis Yagodkin, Jan N. Kirchhof, Abhijeet Kumar, Adrian Dewambrechies, Sviatoslav Kovalchuk, Yufeng Yu, Kirill I. Bolotin

Abstract: We explore strain-activated emission centers formed by atomic force microscopy (AFM) indentation in monolayer \ce{WSe2} on a flexible polymer substrate. In the indented areas, we observe sharp new photoluminescence (PL) peaks characterized by sublinear power dependence in the spectral regions 1.62 $-$ 1.66 eV and 1.70 $-$ 1.73 eV. After low-temperature thermal annealing ($< 120$ $^{\circ}$C), \ce{WSe2} experiences strain relaxation, leading to a blue shift of the peaks' spectral position and their ultimate disappearance. Our analysis of peaks' position vs. strain allows drawing multiple conclusions regarding the nature of these emission centers. We elucidate the roles of excitonic confinement and hybridization between free excitons and defect-related states, a process activated by the level of strain. Overall, our approach suggests that the energy of localized emitters may be controlled via strain engineering.

Submitted 16 February, 2024; originally announced February 2024.

arXiv:2402.10985 [pdf, other]

Leveraging AI Planning For Detecting Cloud Security Vulnerabilities

Authors: Mikhail Kazdagli, Mohit Tiwari, Akshat Kumar

Abstract: Cloud computing services provide scalable and cost-effective solutions for data storage, processing, and collaboration. Alongside their growing popularity, concerns related to their security vulnerabilities leading to data breaches and sophisticated attacks such as ransomware are growing. To address these, first, we propose a generic framework to express relations between different cloud objects, such as users, datastores, and security roles, to model access control policies in cloud systems. Access control misconfigurations are often the primary driver for cloud attacks. Second, we develop a PDDL model for detecting security vulnerabilities which can, for example, lead to widespread attacks such as ransomware and sensitive data exfiltration, among others. A planner can then generate attacks to identify such vulnerabilities in the cloud. Finally, we test our approach on 14 real Amazon AWS cloud configurations of different commercial organizations. Our system can identify a broad range of security vulnerabilities, which state-of-the-art industry tools cannot detect.

Submitted 15 February, 2024; originally announced February 2024.

arXiv:2402.10336 [pdf, other]

Ultrafast Photochemistry and Electron Diffraction for Cyclobutanone in the S2 State: Surface Hopping with Time-Dependent Density Functional Theory

Authors: Ericka Roy Miller, Sean J. Hoehn, Abhijith Kumar, Dehua Jiang, Shane M. Parker

Abstract: We simulate the photodynamics of gas-phase cyclobutanone excited to the S$_2$ state using fewest switches surface hopping (FSSH) dynamics powered by time-dependent density functional theory (TDDFT). We predict a total C3+C2 photoproduct yield of 9%, with a C3:C2 product ratio of 1:8. Two primary S$_2$$\rightarrow$S$_1$ conical intersections are identified: $β$ stretch and CCH bend, with the higher energy $β$ stretch being associated with sub-picosecond S$_2$ decay. Excited state lifetimes computed with respect to electronic state populations were found to be 7.0 ps (S$_2$$\rightarrow$S$_1$) and 550 fs (S$_1$$\rightarrow$S$_0$). We also generate time-resolved difference pair distribution functions ($Δ$PDFs) from our TDDFT-FSSH dynamics results in order to generate direct comparisons to ultrafast electron diffraction experiment observables. Global and target analysis of time-resolved $Δ$PDFs produced a distinct set of lifetimes: i) a 0.462 ps decay, and ii) a 16.8 ps decay that both resemble the S$_2$ minimum, as well as iii) a long ($>$ nanosecond) decay that resembles the S$_1$ minimum geometry and the fully separated C3/C2 products. Finally, we contextualize our results by considering the impact of the most likely sources of significant errors.

Submitted 15 February, 2024; originally announced February 2024.

Comments: 10 pages, 9 figures, preprint before peer review

arXiv:2402.09226 [pdf, other]

Directional Convergence Near Small Initializations and Saddles in Two-Homogeneous Neural Networks

Authors: Akshay Kumar, Jarvis Haupt

Abstract: This paper examines gradient flow dynamics of two-homogeneous neural networks for small initializations, where all weights are initialized near the origin. For both square and logistic losses, it is shown that for sufficiently small initializations, the gradient flow dynamics spend sufficient time in the neighborhood of the origin to allow the weights of the neural network to approximately converge in direction to the Karush-Kuhn-Tucker (KKT) points of a neural correlation function that quantifies the correlation between the output of the neural network and corresponding labels in the training data set. For square loss, it has been observed that neural networks undergo saddle-to-saddle dynamics when initialized close to the origin. Motivated by this, this paper also shows a similar directional convergence among weights of small magnitude in the neighborhood of certain saddle points.

Submitted 20 June, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

Comments: tmlr-final-version

arXiv:2402.09203 [pdf, ps, other]

Thermodynamic properties and phase diagram of quark matter within non-extensive Polyakov chiral SU(3) quark mean field model

Authors: Dhananjay Singh, Arvind Kumar

Abstract: In the present work, we apply Tsallis non-extensive statistics to study the thermodynamic properties and phase diagram of quark matter in the Polyakov chiral SU(3) quark mean field model. Within this model, the properties of the quark matter are modified through the scalar fields $σ, ζ, δ, χ$, the vector fields $ω, ρ, φ$, and the Polyakov fields $Φ$ and $\bar{Φ}$ at finite temperature and chemical potential. Non-extensive effects have been introduced through a dimensionless parameter $q$ and the results are compared to the extensive case ($q\rightarrow1$). In the non-extensive case, the exponential in the Fermi-Dirac (FD) function is modified to a $q$-exponential form. The influence of the $q$ parameter on the thermodynamic properties (pressure, energy, and entropy density, as well as the trace anomaly) is investigated. The speed of sound and specific heat with non-extensive effects are also studied. Furthermore, the effect of non-extensivity on the deconfinement phase transition as well as the chiral phase transition of $u, d,$ and $s$ quarks is explored. We found that the critical end point (CEP), which defines the point in the $(T - μ)$ phase diagram where the order of the phase transition changes, shifts to a lower value of temperature, $T_{CEP}$, and a higher value of chemical potential, $μ_{CEP}$, as the non-extensivity is increased, i.e., $q>1$.

Submitted 14 February, 2024; originally announced February 2024.

Comments: 25 pages, 9 figures (To be published in Chinese Physics C)
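
Note: the $q$-exponential replacement in the Fermi-Dirac function described in the abstract above can be illustrated with a minimal sketch. The specific form $\exp_q(x) = [1 + (q-1)x]^{1/(q-1)}$, the restriction to $q \geq 1$, and the names `q_exp` and `fermi_dirac_q` are assumptions made for illustration; the listing does not state which convention the paper uses.

```python
import math


def q_exp(x: float, q: float) -> float:
    """q-exponential [1 + (q - 1) x]^(1 / (q - 1)); tends to exp(x) as q -> 1.

    Assumed convention, restricted to q >= 1 for simplicity.
    """
    if q < 1.0:
        raise ValueError("this sketch only covers q >= 1")
    if abs(q - 1.0) < 1e-12:
        return math.exp(x)  # extensive (Boltzmann-Gibbs) limit
    base = 1.0 + (q - 1.0) * x
    if base <= 0.0:
        return 0.0  # usual cutoff where the bracket is non-positive
    return base ** (1.0 / (q - 1.0))


def fermi_dirac_q(energy: float, mu: float, temperature: float, q: float) -> float:
    """Occupation number with exp replaced by the q-exponential."""
    x = (energy - mu) / temperature
    return 1.0 / (q_exp(x, q) + 1.0)


if __name__ == "__main__":
    # Illustrative numbers only (arbitrary units), not taken from the paper.
    for q in (1.0, 1.05, 1.1):
        print(q, fermi_dirac_q(1.2, 1.0, 0.2, q))
```

In this convention the $q$-exponential grows more slowly than the ordinary exponential for positive arguments when $q>1$, so states above the chemical potential are more populated than in the extensive case.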

arXiv:2402.08732 [pdf, other]

doi 10.1093/mnras/stae418

Radio-loud fraction of z>6 quasars

Authors: Pascal M. Keller, Nithyanandan Thyagarajan, Ajay Kumar, Nissim Kanekar, Gianni Bernardi

Abstract: Quasars at redshifts $z>6$ are an excellent probe of the formation and evolution of supermassive black holes in the early Universe. The population of radio-luminous quasars is of particular interest, as such quasars could potentially be used to study the neutral intergalactic medium during cosmic reionisation via H$\,$I 21$\,$cm absorption studies. However, the lack of deep radio observations of $z>6$ quasars leaves the population poorly constrained, and suitable candidates for an H$\,$I 21$\,$cm absorption study have yet to be found. In this work, we present Jansky Very Large Array (VLA) 1$-$2 GHz radio continuum observations of 138 quasars at redshifts $6.0 \leq z<7.6$. We detect the radio continuum emission of the $z=6.1$ quasar J1034-1425, with a 1.6 GHz flux density of $170\pm 36\,μ$Jy. This quasar is radio-quiet with radio-loudness, $R \equiv f_{5\text{~GHz}}/f_{ν,\text{4400 A}} = 2.4\pm0.5$. In addition, we detect 7 other quasars at $z>6$, which have previously been characterised in the literature at these frequencies. Using the full sample, we estimate the radio-loud fraction to be $3.8^{+6.2}_{-2.4}\%$, where the uncertainties are 95% confidence intervals. This is lower than recent estimates of the radio-loud fraction in the literature, but is still marginally consistent with no redshift evolution of the radio-loud fraction.
We explore the undetected quasar population by stacking their continuum images at their optical positions and obtain a median stacked flux density of $13.8\pm 3.9~μ$Jy and luminosity of $\log{L_{5~\mathrm{GHz}}/(\mathrm{W~Hz}^{-1})}=24.2\pm0.1$.

Submitted 13 February, 2024; originally announced February 2024.

Comments: 12 pages, 8 figures, 3 tables, published in MNRAS

Journal ref: Monthly Notices of the Royal Astronomical Society, stae418 (2024)

arXiv:2402.08017 [pdf, other]

Lumos: Empowering Multimodal LLMs with Scene Text Recognition

Authors: Ashish Shenoy, Yichao Lu, Srihari Jayakumar, Debojeet Chatterjee, Mohsen Moslehpour, Pierce Chuang, Abhay Harpale, Vikas Bhardwaj, Di Xu, Shicong Zhao, Longfang Zhao, Ankit Ramchandani, Xin Luna Dong, Anuj Kumar

Abstract: We introduce Lumos, the first end-to-end multimodal question-answering system with text understanding capabilities. At the core of Lumos is a Scene Text Recognition (STR) component that extracts text from first person point-of-view images, the output of which is used to augment input to a Multimodal Large Language Model (MM-LLM). While building Lumos, we encountered numerous challenges related to STR quality, overall latency, and model inference. In this paper, we delve into those challenges, and discuss the system architecture, design choices, and modeling techniques employed to overcome these obstacles. We also provide a comprehensive evaluation for each component, showcasing high quality and efficiency.

Submitted 1 June, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

Comments: Accepted to KDD 2024 (ADS Track)

arXiv:2402.07081 [pdf, other]

Using Large Language Models for Student-Code Guided Test Case Generation in Computer Science Education

Authors: Nischal Ashok Kumar, Andrew Lan

Abstract: In computer science education, test cases are an integral part of programming assignments since they can be used as assessment items to test students' programming knowledge and provide personalized feedback on student-written code. The goal of our work is to propose a fully automated approach for test case generation that can accurately measure student knowledge, which is important for two reasons. First, manually constructing test cases requires expert knowledge and is a labor-intensive process. Second, developing test cases for students, especially those who are novice programmers, is significantly different from those oriented toward professional-level software developers. Therefore, we need an automated process for test case generation to assess student knowledge and provide feedback. In this work, we propose a large language model-based approach to automatically generate test cases and show that they are good measures of student knowledge, using a publicly available dataset that contains student-written Java code. We also discuss future research directions centered on using test cases to help students.

Submitted 10 February, 2024; originally announced February 2024.

Comments: Oral Presentation at AI4ED workshop at AAAI-2024

arXiv:2402.06785 [pdf, ps, other]

The poker-chip experiments of synthetic elastomers

Authors: Farhad Kamarei, Aditya Kumar, Oscar Lopez-Pamies

Abstract: In a recent study, Kumar and Lopez-Pamies (J. Mech. Phys. Solids 150: 104359, 2021) have provided a complete quantitative explanation of the famed poker-chip experiments of Gent and Lindley (Proc. R. Soc. Lond. Ser. A 249: 195--205, 1959) on natural rubber. In a nutshell, making use of the fracture theory of Kumar, Francfort, and Lopez-Pamies (J. Mech. Phys. Solids 112: 523--551, 2018), they have shown that the nucleation of cracks in poker-chip experiments in natural rubber is governed by the strength -- in particular, the hydrostatic strength -- of the rubber, while the propagation of the nucleated cracks is governed by the Griffith competition between the bulk elastic energy of the rubber and its intrinsic fracture energy. The main objective of this paper is to extend the theoretical study of the poker-chip experiment by Kumar and Lopez-Pamies to synthetic elastomers that, as opposed to natural rubber: ($i$) may feature a hydrostatic strength that is larger than their uniaxial and biaxial tensile strengths and ($ii$) do not exhibit strain-induced crystallization. A parametric study, together with direct comparisons with recent poker-chip experiments on a silicone elastomer, shows that these two different material characteristics have a profound impact on where and when cracks nucleate, as well as on where and when they propagate. In conjunction with the results put forth earlier for natural rubber, the results presented in this paper provide a complete description and explanation of the poker-chip experiments of elastomers at large.
As a second objective, this paper also introduces a new fully explicit constitutive prescription for the driving force that describes the material strength in the fracture theory of Kumar, Francfort, and Lopez-Pamies.

Submitted 9 February, 2024; originally announced February 2024.

arXiv:2402.05346 [pdf, other]

KIX: A Metacognitive Generalization Framework

Authors: Arun Kumar, Paul Schrater

Abstract: Humans and other animals aptly exhibit general intelligence behaviors in solving a variety of tasks with flexibility and ability to adapt to novel situations by reusing and applying high level knowledge acquired over time. But artificial agents are more of a specialist, lacking such generalist behaviors. Artificial agents will require understanding and exploiting critical structured knowledge representations. We present a metacognitive generalization framework, Knowledge-Interaction-eXecution (KIX), and argue that interactions with objects leveraging type space facilitate the learning of transferable interaction concepts and generalization. It is a natural way of integrating knowledge into reinforcement learning and promising to act as an enabler for autonomous and generalist behaviors in artificial intelligence systems.

Submitted 7 February, 2024; originally announced February 2024.

arXiv:2402.05149 [pdf, other]

FlowPG: Action-constrained Policy Gradient with Normalizing Flows

Authors: Janaka Chathuranga Brahmanage, Jiajing Ling, Akshat Kumar

Abstract: Action-constrained reinforcement learning (ACRL) is a popular approach for solving safety-critical and resource-allocation related decision making problems. A major challenge in ACRL is to ensure that the agent takes a valid action satisfying the constraints in each RL step. The commonly used approach of placing a projection layer on top of the policy network requires solving an optimization program, which can result in longer training time, slow convergence, and the zero-gradient problem. To address this, first we use a normalizing flow model to learn an invertible, differentiable mapping between the feasible action space and the support of a simple distribution on a latent variable, such as Gaussian. Second, learning the flow model requires sampling from the feasible action space, which is also challenging. We develop multiple methods, based on Hamiltonian Monte-Carlo and probabilistic sentential decision diagrams, for such action sampling for convex and non-convex constraints. Third, we integrate the learned normalizing flow with the DDPG algorithm. By design, a well-trained normalizing flow will transform policy output into a valid action without requiring an optimization solver. Empirically, our approach results in significantly fewer constraint violations (up to an order of magnitude for several instances) and is multiple times faster on a variety of continuous control tasks.

Submitted 7 February, 2024; originally announced February 2024.

Journal ref: Thirty-seventh Conference on Neural Information Processing Systems. 2023

arXiv:2402.04231 [pdf, ps, other]

Further Constructions of AMUBs for Non-prime power Composite Dimensions

Authors: Ajeet Kumar, Subhamoy Maitra

Abstract: Construction of a large class of Mutually Unbiased Bases (MUBs) for non-prime power composite dimensions ($d = k\times s$) is a long standing open problem, which leads to different construction methods for the class of Approximate MUBs (AMUBs) by relaxing the criterion that the absolute value of the dot product between two vectors chosen from different bases should be $\leq \frac{β}{\sqrt{d}}$. In this chapter, we consider a more general class of AMUBs (ARMUBs, considering the real ones too), compared to our earlier work in [Cryptography and Communications, 14(3): 527--549, 2022]. We note that the quality of AMUBs (ARMUBs) constructed using RBD$(X,A)$ with $|X|= d$ critically depends on the parameters $|s-k|$, $μ$ (maximum number of elements common between any pair of blocks), and the set of block sizes. We present the construction of $\mathcal{O}(\sqrt{d})$ many $β$-AMUBs for composite $d$ when $|s-k|< \sqrt{d}$, using RBDs having block sizes approximately $\sqrt{d}$, such that $|\braket{ψ^l_i|ψ^m_j}| \leq \frac{β}{\sqrt{d}}$ where $β= 1 + \frac{|s-k|}{2\sqrt{d}}+ \mathcal{O}(d^{-1}) \leq 2$. Moreover, if a real Hadamard matrix of order $k$ or $s$ exists, then one can construct at least $N(k)+1$ (or $N(s)+1$) many $β$-ARMUBs for dimension $d$, with $β\leq 2 - \frac{|s-k|}{2\sqrt{d}}+ \mathcal{O}(d^{-1})< 2$, where $N(w)$ is the number of MOLS$(w)$. This improves and generalizes some of our previous results for ARMUBs from two points, viz., the real cases are now extended to complex ones too.
The earlier efforts use some existing RBDs, whereas here we consider new instances of RBDs that provide better results. Similar to the earlier cases, the AMUBs (ARMUBs) constructed using RBDs are in general very sparse, where the sparsity $(ε)$ is $1 - \mathcal{O}(d^{-\frac{1}{2}})$.

Submitted 6 February, 2024; originally announced February 2024.

arXiv:2402.03964 [pdf, ps, other]

Almost Perfect Mutually Unbiased Bases that are Sparse

Authors: Ajeet Kumar, Subhamoy Maitra, Somjit Roy

Abstract: In dimension $d$, Mutually Unbiased Bases (MUBs) are a collection of orthonormal bases over $\mathbb{C}^d$ such that for any two vectors $v_1, v_2$ belonging to different bases, the scalar product $|\braket{v_1|v_2}| = \frac{1}{\sqrt{d}}$. The upper bound on the number of such bases is $d+1$. Constructions to achieve this bound are known when $d$ is a prime power. The situation is more restrictive in other cases and also when we consider the results over the reals rather than the complex numbers. Thus, certain relaxations of this model are considered in the literature and consequently Approximate MUBs (AMUBs) are studied. This enables one to construct a potentially large number of such objects for $\mathbb{C}^d$ as well as in $\mathbb{R}^d$. In this regard, we propose the concept of Almost Perfect MUBs (APMUBs), where we restrict the absolute value of the inner product $|\braket{v_1|v_2}|$ to be two-valued, one being 0 and the other $ \leq \frac{1+\mathcal{O}(d^{-λ})}{\sqrt{d}}$, such that $λ> 0$ and the numerator $1 + \mathcal{O}(d^{-λ}) \leq 2$. Each such vector constructed has the important feature that a large number of its components are zero and the non-zero components are of equal magnitude. Our techniques are based on combinatorial structures related to RBDs. We show that for several composite dimensions $d$, one can construct $\mathcal{O}(\sqrt{d})$ many APMUBs, in which cases the number of MUBs is significantly small.
To be specific, this result works for $d$ of the form $(q-e)(q+f), \ q, e, f \in \mathbb{N}$, with the conditions $0 \leq f \leq e$ for constant $e, f$ and $q$ a prime power. We also show that such APMUBs provide sets of Bi-angular vectors which are $\mathcal{O}(d^{\frac{3}{2}})$ in number, having high angular distances among them. Finally, as the MUBs are equivalent to a set of Hadamard matrices, we show that the APMUBs are so with the set of Weighing matrices.

Submitted 13 March, 2024; v1 submitted 6 February, 2024; originally announced February 2024.
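
Note: the MUB condition quoted in the abstract above, $|\braket{v_1|v_2}| = \frac{1}{\sqrt{d}}$ for vectors from different orthonormal bases, can be checked numerically in the smallest case. The sketch below is illustrative only and is not the construction from the paper: it uses the well-known $d = 2$ example (eigenbases of the three Pauli operators), which attains the upper bound of $d+1 = 3$ bases. The helper names `inner`, `is_orthonormal`, and `are_mutually_unbiased` are hypothetical.

```python
import itertools
import math


def inner(u, v):
    """Hermitian inner product <u|v> of two complex vectors."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def is_orthonormal(basis, tol=1e-12):
    """True if every pair of basis vectors has inner product delta_ij."""
    for i, u in enumerate(basis):
        for j, v in enumerate(basis):
            expected = 1.0 if i == j else 0.0
            if abs(inner(u, v) - expected) > tol:
                return False
    return True


def are_mutually_unbiased(bases, tol=1e-12):
    """True if |<u|v>| = 1/sqrt(d) for all u, v taken from different bases."""
    d = len(bases[0])
    target = 1.0 / math.sqrt(d)
    for b1, b2 in itertools.combinations(bases, 2):
        for u in b1:
            for v in b2:
                if abs(abs(inner(u, v)) - target) > tol:
                    return False
    return True


s = 1.0 / math.sqrt(2.0)
Z = [(1 + 0j, 0j), (0j, 1 + 0j)]          # computational basis
X = [(s + 0j, s + 0j), (s + 0j, -s + 0j)]  # Hadamard basis
Y = [(s + 0j, 1j * s), (s + 0j, -1j * s)]  # circular basis

if __name__ == "__main__":
    print(all(is_orthonormal(b) for b in (Z, X, Y)))
    print(are_mutually_unbiased([Z, X, Y]))
```

An approximate variant in the spirit of the abstract would replace the equality test with an upper bound $\frac{β}{\sqrt{d}}$ on the inner products between different bases.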

arXiv:2402.02651 [pdf, other]

Vision-Language Models Provide Promptable Representations for Reinforcement Learning

Authors: William Chen, Oier Mees, Aviral Kumar, Sergey Levine

Abstract: Humans can quickly learn new behaviors by leveraging background world knowledge. In contrast, agents trained with reinforcement learning (RL) typically learn behaviors from scratch. We thus propose a novel approach that uses the vast amounts of general and indexable world knowledge encoded in vision-language models (VLMs) pre-trained on Internet-scale data for embodied RL. We initialize policies with VLMs by using them as promptable representations: embeddings that encode semantic features of visual observations based on the VLM's internal knowledge and reasoning capabilities, as elicited through prompts that provide task context and auxiliary information. We evaluate our approach on visually complex, long-horizon RL tasks in Minecraft and robot navigation in Habitat. We find that our policies trained on embeddings from off-the-shelf, general-purpose VLMs outperform equivalent policies trained on generic, non-promptable image embeddings. We also find our approach outperforms instruction-following methods and performs comparably to domain-specific embeddings. Finally, we show that our approach can use chain-of-thought prompting to produce representations of common-sense semantic reasoning, improving policy performance in novel scenes by 1.5 times.

Submitted 22 May, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

arXiv:2402.02592 [pdf, other]

Unified Training of Universal Time Series Forecasting Transformers

Authors: Gerald Woo, Chenghao Liu, Akshat Kumar, Caiming Xiong, Silvio Savarese, Doyen Sahoo

Abstract: Deep learning for time series forecasting has traditionally operated within a one-model-per-dataset framework, limiting its potential to leverage the game-changing impact of large pre-trained models. The concept of universal forecasting, emerging from pre-training on a vast collection of time series datasets, envisions a single Large Time Series Model capable of addressing diverse downstream forecasting tasks. However, constructing such a model poses unique challenges specific to time series data: i) cross-frequency learning, ii) accommodating an arbitrary number of variates for multivariate time series, and iii) addressing the varying distributional properties inherent in large-scale data. To address these challenges, we present novel enhancements to the conventional time series Transformer architecture, resulting in our proposed Masked Encoder-based Universal Time Series Forecasting Transformer (Moirai). Trained on our newly introduced Large-scale Open Time Series Archive (LOTSA) featuring over 27B observations across nine domains, Moirai achieves competitive or superior performance as a zero-shot forecaster when compared to full-shot models. Code, data, and model weights can be found at https://github.com/SalesforceAIResearch/uni2ts.

Submitted 22 May, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

arXiv:2402.02203 [pdf]

Cell Painting Gallery: an open resource for image-based profiling

Authors: Erin Weisbart, Ankur Kumar, John Arevalo, Anne E. Carpenter, Beth A. Cimini, Shantanu Singh

Abstract: Image-based or morphological profiling is a rapidly expanding field wherein cells are "profiled" by extracting hundreds to thousands of unbiased, quantitative features from images of cells that have been perturbed by genetic or chemical perturbations. The Cell Painting assay is the most popular image-based profiling assay wherein six small-molecule dyes label eight cellular compartments and thousands of measurements are made, describing quantitative traits such as size, shape, intensity, and texture within the nucleus, cytoplasm, and whole cell (Cimini et al., 2023). We have created the Cell Painting Gallery, a publicly available collection of Cell Painting datasets, with granular dataset descriptions and access instructions. It is hosted by AWS on the Registry of Open Data (RODA). As of January 2024, the Cell Painting Gallery holds 656 terabytes (TB) of image and associated numerical data. It includes the largest publicly available Cell Painting dataset, in terms of perturbations tested (Joint Undertaking for Morphological Profiling or JUMP (Chandrasekaran et al., 2023)), along with many other canonical datasets using Cell Painting, close derivatives of Cell Painting (such as LipocyteProfiler (Laber et al., 2023) and Pooled Cell Painting (Ramezani et al., 2023)).

Submitted 3 February, 2024; originally announced February 2024.

Comments: 9 pages, 1 table

arXiv:2402.00849 [pdf, other]

Score-based Causal Representation Learning: Linear and General Transformations

Authors: Burak Varıcı, Emre Acartürk, Karthikeyan Shanmugam, Abhishek Kumar, Ali Tajer

Abstract: This paper addresses intervention-based causal representation learning (CRL) under a general nonparametric latent causal model and an unknown transformation that maps the latent variables to the observed variables. Linear and general transformations are investigated. The paper addresses both the identifiability and achievability aspects. Identifiability refers to determining algorithm-agnostic conditions that ensure recovering the true latent causal variables and the latent causal graph underlying them. Achievability refers to the algorithmic aspects and addresses designing algorithms that achieve identifiability guarantees. By drawing novel connections between score functions (i.e., the gradients of the logarithm of density functions) and CRL, this paper designs a score-based class of algorithms that ensures both identifiability and achievability. First, the paper focuses on linear transformations and shows that one stochastic hard intervention per node suffices to guarantee identifiability. It also provides partial identifiability guarantees for soft interventions, including identifiability up to ancestors for general causal models and perfect latent graph recovery for sufficiently non-linear causal models. Secondly, it focuses on general transformations and shows that two stochastic hard interventions per node suffice for identifiability. Notably, one does not need to know which pair of interventional environments have the same node intervened.

Submitted 26 February, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

Comments: (updated literature review) Linear transformations: stronger results than our previous paper Score-based Causal Representation Learning with Interventions (arXiv:2301.08230). General transformations: results also appear in our paper General Identifiability and Achievability for Causal Representation Learning (arXiv:2310.15450) accepted to AISTATS 2024 (oral). arXiv admin note: text overlap with arXiv:2310.15450

arXiv:2402.00586 [pdf, other]

Spin wave-driven variable-phase mutual synchronization in spin Hall nano-oscillators

Authors: Akash Kumar, Avinash kumar Chaurasiya, Victor H. González, Nilamani Behera, Roman Khymyn, Ahmad A. Awad, Johan Åkerman

Abstract: Spin-orbit torque can drive auto-oscillations of propagating spin wave (PSW) modes in nano-constriction spin Hall nano-oscillators (SHNOs). These modes allow both long-range coupling and the potential of controlling its phase -- critical aspect for nano-magnonics, spin wave logic, and Ising machines. Here, we demonstrate PSW-driven variable-phase coupling between two nano-constriction SHNOs and study how their separation and the PSW wave vector impact their mutual synchronization. In addition to ordinary in-phase mutual synchronization, we observe, using both electrical measurements and phase-resolved $μ-$Brillouin Light Scattering microscopy, mutual synchronization with a phase that can be tuned from 0 to $π$ using the drive current or the applied field. Micromagnetic simulations corroborate the experiments and visualize how the PSW patterns in the bridge connecting the two nano-constrictions govern the coupling. These results advance the capabilities of mutually synchronized SHNOs and open up new possibilities for applications in spin wave logic, unconventional computing, and Ising Machines.

Submitted 1 February, 2024; originally announced February 2024.

Comments: 19 pages

arXiv:2402.00368 [pdf, ps, other]

Emergence of tension-compression asymmetry from a complete phase-field approach to brittle fracture

Authors: Chang Liu, Aditya Kumar

Abstract: The classical variational approach to brittle fracture propagation does not distinguish between strain energy accumulation in tension versus compression and consequently results in physically unrealistic cracking under compression. A variety of energy splits have been proposed as a possible remedy. However, a unique energy split that can describe this asymmetry for general loading conditions has not been found. The main objective of this paper is to show that a complete phase-field theory of brittle fracture nucleation and propagation, one that accounts for the material strength at large, can naturally capture the tension-compression asymmetry without an energy split. One such theory has been recently proposed by Kumar et al. (2018). Over the past few years, several studies have shown that this theory is capable of accurately describing fracture nucleation and propagation for materials soft and hard under arbitrary monotonic loading conditions. However, a systematic study of the tension-compression asymmetry that emerges from this theory has not yet been reported. This paper does precisely that. In particular, this paper reports a comprehensive study of crack propagation in two problems, one involving a symmetric tension-compression state and the other involving larger compressive stresses at the crack tip. The results are compared with popular energy splits used in literature. The results show that, remarkably, for the second problem, only the complete theory is able to produce experimentally consistent results.

Submitted 1 February, 2024; originally announced February 2024.

Showing 151–200 of 2,910 results for author: Kumar, A