-
Multiple collisions of eigenvalues and singular values of matrix Gaussian field
Authors:
Wangjun Yuan
Abstract:
Let $X^β$ be a real symmetric or complex Hermitian matrix whose entries are independent Gaussian random fields. We provide the sufficient and necessary conditions such that multiple collisions of eigenvalue processes of $A^β+ T_βX^βT_β^*$ occur with positive probability. In addition, for a real or complex rectangular matrix $W^β$ with independent Gaussian random field entries, we obtain the sufficient and necessary conditions under which the probability of multiple collisions of non-trivial singular value processes of $B^β+ T_βW^β\tilde T_β$ is positive. In both cases, the size of the set of collision times is characterized via Hausdorff dimension.
Submitted 12 July, 2024;
originally announced July 2024.
-
Strong convergence for tensor GUE random matrices
Authors:
Benoît Collins,
Wangjun Yuan
Abstract:
Haagerup and Thorbjørnsen proved that iid GUEs converge strongly to free semicircular elements as the dimension grows to infinity. Motivated by considerations from quantum physics -- in particular, understanding nearest neighbor interactions in quantum spin systems -- we consider iid GUE acting on multipartite state spaces, with a mixing component on some sites and identity on the remaining sites. We show that under proper assumptions on the dimension of the sites, strong asymptotic freeness still holds. Our proof relies on an interpolation technology recently introduced by Bandeira, Boedihardjo and van Handel.
Submitted 12 July, 2024;
originally announced July 2024.
-
Implications of mappings between ICD clinical diagnosis codes and Human Phenotype Ontology terms
Authors:
Amelia LM Tan,
Rafael S Gonçalves,
William Yuan,
Gabriel A Brat,
The Consortium for Clinical Characterization of COVID-19 by EHR,
Robert Gentleman,
Isaac S Kohane
Abstract:
Objective: Integrating EHR data with other resources is essential in rare disease research due to low disease prevalence. Such integration is dependent on the alignment of ontologies used for data annotation. The International Classification of Diseases (ICD) is used to annotate clinical diagnoses; the Human Phenotype Ontology (HPO) to annotate phenotypes. Although these ontologies overlap in biomedical entities described, the extent to which they are interoperable is unknown. We investigate how well aligned these ontologies are and whether such alignments facilitate EHR data integration.
Materials and Methods: We conducted an empirical analysis of the coverage of mappings between ICD and HPO. We interpret this mapping coverage as a proxy for how easily clinical data can be integrated with research ontologies such as HPO. We quantify how exhaustively ICD codes are mapped to HPO by analyzing mappings in the UMLS Metathesaurus. We analyze the proportion of ICD codes mapped to HPO within a real-world EHR dataset.
Results and Discussion: Our analysis revealed that only 2.2% of ICD codes have direct mappings to HPO in UMLS. Within our EHR dataset, less than 50% of ICD codes have mappings to HPO terms. ICD codes that are used frequently in EHR data tend to have mappings to HPO; ICD codes that represent rarer medical conditions are seldom mapped.
Conclusion: We find that interoperability between ICD and HPO via UMLS is limited. While other mapping sources could be incorporated, there are no established conventions for what resources should be used to complement UMLS.
Submitted 11 July, 2024;
originally announced July 2024.
-
OFDM Achieves the Lowest Ranging Sidelobe Under Random ISAC Signaling
Authors:
Fan Liu,
Ying Zhang,
Yifeng Xiong,
Shuangyang Li,
Weijie Yuan,
Feifei Gao,
Shi Jin,
Giuseppe Caire
Abstract:
This paper aims to answer a fundamental question in the area of Integrated Sensing and Communications (ISAC): What is the optimal communication-centric ISAC waveform for ranging? Towards that end, we first established a generic framework to analyze the sensing performance of communication-centric ISAC waveforms built upon orthonormal signaling bases and random data symbols. Then, we evaluated their ranging performance by adopting both the periodic and aperiodic auto-correlation functions (P-ACF and A-ACF), and defined the expectation of the integrated sidelobe level (EISL) as a sensing performance metric. On top of that, we proved that among all communication waveforms with cyclic prefix (CP), the orthogonal frequency division multiplexing (OFDM) modulation is the only globally optimal waveform that achieves the lowest ranging sidelobe for quadrature amplitude modulation (QAM) and phase shift keying (PSK) constellations, in terms of both the EISL and the sidelobe level at each individual lag of the P-ACF. As a step forward, we proved that among all communication waveforms without CP, OFDM is a locally optimal waveform for QAM/PSK in the sense that it achieves a local minimum of the EISL of the A-ACF. Finally, we demonstrated by numerical results that under QAM/PSK constellations, there is no other orthogonal communication-centric waveform that achieves a lower ranging sidelobe level than that of the OFDM, in terms of both P-ACF and A-ACF cases.
Submitted 9 July, 2024;
originally announced July 2024.
-
Three-Body Recombination of Ultracold Microwave-Shielded Polar Molecules
Authors:
Ian Stevenson,
Shayamal Singh,
Ahmed Elkamshishy,
Niccoló Bigagli,
Weijun Yuan,
Siwei Zhang,
Chris H. Greene,
Sebastian Will
Abstract:
A combined experimental and theoretical study is carried out on the three-body recombination process in a gas of microwave-shielded polar molecules. For ground-state polar molecules dressed with a strong microwave field, field-linked bound states can appear in the intermolecular potential. We model three-body recombination into such bound states using classical trajectory calculations. Our results show that recombination can explain the enhanced loss rates observed at small microwave detunings in trapped samples of bosonic NaCs [Bigagli, $\textit{et al.}$, Nat. Phys. $\textbf{19}$ 1579-1584 (2023)]. Specifically, our calculations reproduce the experimentally measured three-body loss rates across a wide range of microwave Rabi couplings, detunings, and temperatures. This work suggests that for bosonic shielded molecular systems in which the two-body loss is sufficiently suppressed and a field-linked bound state is present, the dominant loss process will be three-body recombination.
Submitted 5 July, 2024;
originally announced July 2024.
-
An Intelligent Robotic System for Perceptive Pancake Batter Stirring and Precise Pouring
Authors:
Xinyuan Luo,
Shengmiao Jin,
Hung-Jui Huang,
Wenzhen Yuan
Abstract:
Cooking robots have long been desired by the commercial market, but the technical challenges remain significant. A major difficulty comes from the need to perceive and handle liquids with different properties. This paper presents a robot system that mixes batter and makes pancakes out of it, where understanding and handling the viscous liquid is an essential component. The system integrates haptic sensing and control algorithms to autonomously stir flour and water to achieve the desired batter uniformity, estimate the batter's properties such as the water-flour ratio and liquid level, and perform precise manipulations to pour the batter into any specified shape. Experimental results show the system's capability to always produce batter of the desired uniformity, estimate the water-flour ratio and liquid level precisely, and accurately pour the batter into complex shapes. This research showcases the potential for robots to assist in kitchens and is a step towards commercial culinary automation.
Submitted 1 July, 2024;
originally announced July 2024.
-
Dressed-State Spectroscopy and Magic Trapping of Microwave-Shielded NaCs Molecules
Authors:
Siwei Zhang,
Weijun Yuan,
Niccolò Bigagli,
Claire Warner,
Ian Stevenson,
Sebastian Will
Abstract:
We report on the optical polarizability of microwave-shielded ultracold NaCs molecules in an optical dipole trap. While dressing a pair of rotational states with a microwave field, we observe a marked dependence of the optical polarizability on the intensity and detuning of the dressing field. To precisely characterize differential energy shifts between dressed rotational states, we establish dressed-state spectroscopy. For strong dressing fields, we find that a magic rotational transition can be engineered and demonstrate its insensitivity to laser intensity fluctuations. The results of this work have direct relevance for evaporative cooling and the recent demonstration of molecular Bose-Einstein condensates [Bigagli, et al., Nature (2024)] and may open a door to precision microwave spectroscopy in interacting many-body systems of microwave-shielded molecules.
Submitted 27 June, 2024;
originally announced June 2024.
-
Manipulate-Anything: Automating Real-World Robots using Vision-Language Models
Authors:
Jiafei Duan,
Wentao Yuan,
Wilbert Pumacay,
Yi Ru Wang,
Kiana Ehsani,
Dieter Fox,
Ranjay Krishna
Abstract:
Large-scale endeavors like RT-1 and widespread community efforts such as Open-X-Embodiment have contributed to growing the scale of robot demonstration data. However, there is still an opportunity to improve the quality, quantity, and diversity of robot demonstration data. Although vision-language models have been shown to automatically generate demonstration data, their utility has been limited to environments with privileged state information, they require hand-designed skills, and they are limited to interactions with few object instances. We propose Manipulate-Anything, a scalable automated generation method for real-world robotic manipulation. Unlike prior work, our method can operate in real-world environments without any privileged state information or hand-designed skills, and can manipulate any static object. We evaluate our method using two setups. First, Manipulate-Anything successfully generates trajectories for all 5 real-world and 12 simulation tasks, significantly outperforming existing methods like VoxPoser. Second, Manipulate-Anything's demonstrations can train more robust behavior cloning policies than training with human demonstrations, or from data generated by VoxPoser and Code-As-Policies. We believe Manipulate-Anything can be the scalable method for both generating data for robotics and solving novel tasks in a zero-shot setting.
Submitted 27 June, 2024; v1 submitted 27 June, 2024;
originally announced June 2024.
-
Following Length Constraints in Instructions
Authors:
Weizhe Yuan,
Ilia Kulikov,
Ping Yu,
Kyunghyun Cho,
Sainbayar Sukhbaatar,
Jason Weston,
Jing Xu
Abstract:
Aligned instruction following models can better fulfill user requests than their unaligned counterparts. However, it has been shown that there is a length bias in evaluation of such models, and that training algorithms tend to exploit this bias by learning longer responses. In this work we show how to train models that can be controlled at inference time with instructions containing desired length constraints. Such models are superior in length instructed evaluations, outperforming standard instruction following models such as GPT4, Llama 3 and Mixtral.
Submitted 25 June, 2024;
originally announced June 2024.
-
Gaussian-Informed Continuum for Physical Property Identification and Simulation
Authors:
Junhao Cai,
Yuji Yang,
Weihao Yuan,
Yisheng He,
Zilong Dong,
Liefeng Bo,
Hui Cheng,
Qifeng Chen
Abstract:
This paper studies the problem of estimating physical properties (system identification) through visual observations. To facilitate geometry-aware guidance in physical property estimation, we introduce a novel hybrid framework that leverages 3D Gaussian representation to not only capture explicit shapes but also enable the simulated continuum to deduce implicit shapes during training. We propose a new dynamic 3D Gaussian framework based on motion factorization to recover the object as 3D Gaussian point sets across different time states. Furthermore, we develop a coarse-to-fine filling strategy to generate the density fields of the object from the Gaussian reconstruction, allowing for the extraction of object continuums along with their surfaces and the integration of Gaussian attributes into these continuums. In addition to the extracted object surfaces, the Gaussian-informed continuum also enables the rendering of object masks during simulations, serving as implicit shape guidance for physical property estimation. Extensive experimental evaluations demonstrate that our pipeline achieves state-of-the-art performance across multiple benchmarks and metrics. Additionally, we illustrate the effectiveness of the proposed method through real-world demonstrations, showcasing its practical utility. Our project page is at https://jukgei.github.io/project/gic.
Submitted 21 June, 2024;
originally announced June 2024.
-
RoboPoint: A Vision-Language Model for Spatial Affordance Prediction for Robotics
Authors:
Wentao Yuan,
Jiafei Duan,
Valts Blukis,
Wilbert Pumacay,
Ranjay Krishna,
Adithyavairavan Murali,
Arsalan Mousavian,
Dieter Fox
Abstract:
From rearranging objects on a table to putting groceries into shelves, robots must plan precise action points to perform tasks accurately and reliably. In spite of the recent adoption of vision language models (VLMs) to control robot behavior, VLMs struggle to precisely articulate robot actions using language. We introduce an automatic synthetic data generation pipeline that instruction-tunes VLMs to robotic domains and needs. Using the pipeline, we train RoboPoint, a VLM that predicts image keypoint affordances given language instructions. Compared to alternative approaches, our method requires no real-world data collection or human demonstration, making it much more scalable to diverse environments and viewpoints. In addition, RoboPoint is a general model that enables several downstream applications such as robot navigation, manipulation, and augmented reality (AR) assistance. Our experiments demonstrate that RoboPoint outperforms state-of-the-art VLMs (GPT-4o) and visual prompting techniques (PIVOT) by 21.8% in the accuracy of predicting spatial affordance and by 30.5% in the success rate of downstream tasks. Project website: https://robo-point.github.io.
Submitted 15 June, 2024;
originally announced June 2024.
-
PTF-FSR: A Parameter Transmission-Free Federated Sequential Recommender System
Authors:
Wei Yuan,
Chaoqun Yang,
Liang Qu,
Quoc Viet Hung Nguyen,
Guanhua Ye,
Hongzhi Yin
Abstract:
Sequential recommender systems have made significant progress. Recently, due to increasing concerns about user data privacy, some researchers have implemented federated learning for sequential recommendation, a.k.a. Federated Sequential Recommender Systems (FedSeqRecs), in which a public sequential recommender model is shared and frequently transmitted between a central server and clients to achieve collaborative learning. Although these solutions mitigate user privacy concerns to some extent, they present two significant limitations that affect their practical usability: (1) They require a globally shared sequential recommendation model. However, in real-world scenarios, the recommendation model constitutes critical intellectual property for platform and service providers. Therefore, service providers may be reluctant to disclose their meticulously developed models. (2) The communication costs are high as they correlate with the number of model parameters. This becomes particularly problematic because current FedSeqRecs will be inapplicable once sequential recommendation enters the large language model era.
To overcome the above challenges, this paper proposes a parameter transmission-free federated sequential recommendation framework (PTF-FSR), which ensures both model and data privacy protection to meet the privacy needs of service providers and system users alike. Furthermore, since PTF-FSR only transmits prediction results under privacy protection, which are independent of model sizes, this new federated learning architecture can accommodate more complex and larger sequential recommendation models. Extensive experiments conducted on three widely used recommendation datasets, employing various sequential recommendation models from both ID-based and ID-free paradigms, demonstrate the effectiveness and generalization capability of our proposed framework.
Submitted 8 June, 2024;
originally announced June 2024.
-
Markov chain Monte Carlo without evaluating the target: an auxiliary variable approach
Authors:
Wei Yuan,
Guanyang Wang
Abstract:
In sampling tasks, it is common for target distributions to be known up to a normalising constant. However, in many situations, evaluating even the unnormalised distribution can be costly or infeasible. This issue arises in scenarios such as sampling from the Bayesian posterior for tall datasets and the 'doubly-intractable' distributions. In this paper, we begin by observing that seemingly different Markov chain Monte Carlo (MCMC) algorithms, such as the exchange algorithm, PoissonMH, and TunaMH, can be unified under a simple common procedure. We then extend this procedure into a novel framework that allows the use of auxiliary variables in both the proposal and acceptance-rejection steps. We develop the theory of the new framework, applying it to existing algorithms to simplify and extend their results. Several new algorithms emerge from this framework, with improved performance demonstrated on both synthetic and real datasets.
Submitted 27 June, 2024; v1 submitted 7 June, 2024;
originally announced June 2024.
-
Near-field Beamforming for Extremely Large-scale MIMO Based on Unsupervised Deep Learning
Authors:
Jiali Nie,
Yuanhao Cui,
Zhaohui Yang,
Weijie Yuan,
Xiaojun Jing
Abstract:
Extremely Large-scale Array (ELAA) is considered a frontier technology for future communication systems, pivotal in improving wireless systems' rate and spectral efficiency. However, as ELAA employs a multitude of antennas operating at higher frequencies, users are typically situated in the near-field region where the spherical wavefront propagates. This inevitably leads to a significant increase in the overhead of beam training, requiring complex two-dimensional beam searching in both the angle domain and the distance domain. To address this problem, we propose a near-field beamforming method based on unsupervised deep learning. Our convolutional neural network efficiently extracts complex channel state information features by strategically selecting padding and kernel size. We optimize the beamformers to maximize achievable rates in a multi-user network without relying on predefined custom codebooks. Upon deployment, the model requires solely the input of pre-estimated channel state information to derive the optimal beamforming vector. Simulation results show that our proposed scheme can obtain stable beamforming gain compared with the baseline scheme. Furthermore, owing to the inherent traits of deep learning methodologies, this approach substantially diminishes the beam training costs in near-field regions.
Submitted 5 June, 2024;
originally announced June 2024.
-
Poisoning Attacks and Defenses in Recommender Systems: A Survey
Authors:
Zongwei Wang,
Junliang Yu,
Min Gao,
Wei Yuan,
Guanhua Ye,
Shazia Sadiq,
Hongzhi Yin
Abstract:
Modern recommender systems (RS) have profoundly enhanced user experience across digital platforms, yet they face significant threats from poisoning attacks. These attacks, aimed at manipulating recommendation outputs for unethical gains, exploit vulnerabilities in RS through injecting malicious data or intervening model training. This survey presents a unique perspective by examining these threats through the lens of an attacker, offering fresh insights into their mechanics and impacts. Concretely, we detail a systematic pipeline that encompasses four stages of a poisoning attack: setting attack goals, assessing attacker capabilities, analyzing victim architecture, and implementing poisoning strategies. The pipeline not only aligns with various attack tactics but also serves as a comprehensive taxonomy to pinpoint focuses of distinct poisoning attacks. Correspondingly, we further classify defensive strategies into two main categories: poisoning data filtering and robust training from the defender's perspective. Finally, we highlight existing limitations and suggest innovative directions for further exploration in this field.
Submitted 5 June, 2024; v1 submitted 3 June, 2024;
originally announced June 2024.
-
Muckenhoupt Weights Meet Brezis--Seeger--Van Schaftingen--Yung Formulae in Ball Banach Function Spaces
Authors:
Yinqin Li,
Dachun Yang,
Wen Yuan,
Yangyang Zhang,
Yirui Zhao
Abstract:
In this article, by first establishing a weighted variant of the profound and far-reaching inequality obtained by A. Cohen, W. Dahmen, I. Daubechies, and R. DeVore in 2003, the authors give two new characterizations of Muckenhoupt weights. As an application, the authors further establish a representation formula of gradients with sharp parameters in ball Banach function spaces, which extends the famous formula obtained by H. Brezis, A. Seeger, J. Van Schaftingen, and P.-L. Yung in 2021 from classical Sobolev spaces to various Sobolev-type spaces and gives an affirmative answer to the question on page 29 of [Calc. Var. Partial Differential Equations 62 (2023), Paper No. 234]. The main novelty of this article lies in subtly revealing the mutual equivalences among the Muckenhoupt weight, the weighted variant of the inequality of Cohen et al., and the weighted upper estimate of the formula of Brezis et al.
Submitted 30 May, 2024;
originally announced May 2024.
-
Chiral quantum heating and cooling with an optically controlled ion
Authors:
Jin-Tao Bu,
Jian-Qi Zhang,
Ge-Yi Ding,
Jia-Chong Li,
Jia-Wei Zhang,
Bin Wang,
Wen-Qiang Ding,
Wen-Fei Yuan,
Liang Chen,
Qi Zhong,
Ali Keçebaş,
Şahin K. Özdemir,
Fei Zhou,
Hui Jing,
Mang Feng
Abstract:
Quantum heat engines and refrigerators are open quantum systems, whose dynamics can be well understood using a non-Hermitian formalism. A prominent feature of non-Hermiticity is the existence of exceptional points (EPs), which have no counterpart in closed quantum systems. It has been shown in classical systems that dynamical encirclement in the vicinity of an EP, whether the loop includes the EP or not, could lead to chiral mode conversion. Here, we show that this is valid also for quantum systems when dynamical encircling is performed in the vicinity of their Liouvillian EPs (LEPs) which include the effects of quantum jumps and associated noise - an important quantum feature not present in previous works. We demonstrate, using a Paul-trapped ultracold ion, the first chiral quantum heating and refrigeration by dynamically encircling a closed loop in the vicinity of an LEP. We witness the cycling direction to be associated with the chirality and heat release (absorption) of the quantum heat engine (quantum refrigerator). Our experiments have revealed that not only the adiabaticity breakdown but also the Landau-Zener-Stückelberg process plays an essential role during dynamic encircling, resulting in chiral thermodynamic cycles. Our observations contribute to further understanding of chiral and topological features in non-Hermitian systems and pave the way to exploring the relation between chirality and quantum thermodynamics.
Submitted 29 May, 2024;
originally announced May 2024.
-
Laboratory-scale Perpendicular Collisionless Shock Generation and Ion Acceleration in Magnetized Head-on Colliding Plasmas
Authors:
P. Liu,
D. Wu,
D. W. Yuan,
G. Zhao,
Z. M. Sheng,
X. T. He,
J. Zhang
Abstract:
Magnetized collisionless shocks drive particle acceleration broadly in space and astrophysics. We perform the first large-scale particle-in-cell simulations with realistic laboratory parameters (density, temperature, and velocity) to investigate the magnetized shock in head-on colliding plasmas with an applied magnetic field of tens of tesla. It is shown that a perpendicular collisionless shock is formed with an approximately fourfold density jump when two pre-magnetized flows collide. This shock is also characterized by a rapid increase in neutron yield, triggered by the beam-beam nuclear reactions between injected deuterons and those reflected by the shock. Distinct from the shocks arising from the interaction of injected flows with a magnetized background, the self-generated magnetic field in these colliding plasmas experiences a significant amplification due to the increasing diamagnetic current, reaching approximately 30 times the upstream magnetic field. Moreover, we find that ions, regardless of whether they pass through or are reflected by the shock, can gain energy through shock surfing acceleration, generating a power-law energy spectrum. In addition, we demonstrate that a shock mediated only by the filamentation instability cannot be generated under the prevailing unmagnetized experimental parameters. These results provide a direct connection between astrophysical field amplification and magnetized shock formation and nonthermal ion generation.
Submitted 22 May, 2024;
originally announced May 2024.
-
Vector-Symbolic Architecture for Event-Based Optical Flow
Authors:
Hongzhi You,
Yijun Cao,
Wei Yuan,
Fanjun Wang,
Ning Qiao,
Yongjie Li
Abstract:
From a perspective of feature matching, optical flow estimation for event cameras involves identifying event correspondences by comparing feature similarity across accompanying event frames. In this work, we introduce an effective and robust high-dimensional (HD) feature descriptor for event frames, utilizing Vector Symbolic Architectures (VSA). The topological similarity among neighboring variables within VSA contributes to the enhanced representation similarity of feature descriptors for flow-matching points, while its structured symbolic representation capacity facilitates feature fusion from both event polarities and multiple spatial scales. Based on this HD feature descriptor, we propose a novel feature matching framework for event-based optical flow, encompassing both model-based (VSA-Flow) and self-supervised learning (VSA-SM) methods. In VSA-Flow, accurate optical flow estimation validates the effectiveness of HD feature descriptors. In VSA-SM, a novel similarity maximization method based on the HD feature descriptor is proposed to learn optical flow in a self-supervised way from events alone, eliminating the need for auxiliary grayscale images. Evaluation results demonstrate that our VSA-based method achieves superior accuracy in comparison to both model-based and self-supervised learning methods on the DSEC benchmark, while remaining competitive among both methods on the MVSEC benchmark. This contribution marks a significant advancement in event-based optical flow within the feature matching methodology.
Submitted 15 May, 2024; v1 submitted 13 May, 2024;
originally announced May 2024.
-
New approach to affine Moser-Trudinger inequalities via Besov polar projection bodies
Authors:
Oscar Dominguez,
Yinqin Li,
Sergey Tikhonov,
Dachun Yang,
Wen Yuan
Abstract:
We extend the affine inequalities on $\mathbb{R}^n$ for Sobolev functions in $W^{s,p}$ with $1 \leq p < n/s$ obtained recently by Haddad-Ludwig [16, 17] to the remaining range $p \geq n/s$. For each value of $s$, our results are stronger than affine Moser-Trudinger and Morrey inequalities. As a byproduct, we establish the analog of the classical $L^p$ Bourgain-Brezis-Mironescu inequalities related to the Moser-Trudinger case $p=n$. Our main tool is the affine invariant provided by Besov polar projection bodies.
Submitted 12 May, 2024;
originally announced May 2024.
-
ISAC-Assisted Wireless Rechargeable Sensor Networks with Multiple Mobile Charging Vehicles
Authors:
Muhammad Umar Farooq Qaisar,
Weijie Yuan,
Paolo Bellavista,
Guangjie Han,
Adeel Ahmed
Abstract:
As IoT-based wireless sensor networks (WSNs) become more prevalent, the issue of energy shortages becomes more pressing. One potential solution is the use of wireless power transfer (WPT) technology, which is the key to building a new shape of wireless rechargeable sensor networks (WRSNs). However, efficient charging and scheduling are critical for WRSNs to function properly. Motivated by the fact that probabilistic techniques can help enhance the effectiveness of charging scheduling for WRSNs, this article addresses the aforementioned issue and proposes a novel ISAC-assisted WRSN protocol. In particular, our proposed protocol considers several factors to balance the charging load on each mobile charging vehicle (MCV), uses an efficient charging factor strategy to partially charge network devices, and employs the ISAC concept to reduce the traveling cost of each MCV and prevent charging conflicts. Simulation results demonstrate that this protocol outperforms other classic, cutting-edge protocols in multiple areas.
Submitted 11 May, 2024;
originally announced May 2024.
-
Improving the Ranging Performance of Random ISAC Signals Through Pulse Shaping Design
Authors:
Zihan Liao,
Fan Liu,
Shuangyang Li,
Yifeng Xiong,
Weijie Yuan,
Marco Lops
Abstract:
In this paper, we propose a novel pulse shaping design for single-carrier integrated sensing and communication (ISAC) transmission. Due to the communication information embedded in the ISAC signal, the resulting auto-correlation function (ACF) is determined by both the information-conveying random symbol sequence and the signaling pulse, where the former leads to random fluctuations in the sidelobes of the ACF, impairing the range estimation performance. To overcome this challenge, we first analyze the statistical characteristics of the random ACF under the symbol-wise pulse shaping (SWPS) regime. As a step further, we formulate an optimization problem to design ISAC pulse shaping filters, which minimizes the average integrated sidelobe level ratio (ISLR) while meeting the Nyquist criterion, subject to power and bandwidth constraints. We then show that the problem can be recast as a convex quadratic program by expressing it in the frequency domain, which can be readily solved through standard tools. Numerical results demonstrate that the proposed pulse shaping design achieves substantial ranging sidelobe reduction compared to the celebrated root-raised cosine (RRC) pulse shaping, given that the communication throughput is unchanged.
Submitted 6 May, 2024; v1 submitted 6 May, 2024;
originally announced May 2024.
-
Iterative Reasoning Preference Optimization
Authors:
Richard Yuanzhe Pang,
Weizhe Yuan,
Kyunghyun Cho,
He He,
Sainbayar Sukhbaatar,
Jason Weston
Abstract:
Iterative preference optimization methods have recently been shown to perform well for general instruction tuning tasks, but typically make little improvement on reasoning tasks (Yuan et al., 2024, Chen et al., 2024). In this work we develop an iterative approach that optimizes the preference between competing generated Chain-of-Thought (CoT) candidates by optimizing for winning vs. losing reasoning steps that lead to the correct answer. We train using a modified DPO loss (Rafailov et al., 2023) with an additional negative log-likelihood term, which we find to be crucial. We show reasoning improves across repeated iterations of this scheme. While only relying on examples in the training set, our approach results in increasing accuracy on GSM8K, MATH, and ARC-Challenge for Llama-2-70B-Chat, outperforming other Llama-2-based models not relying on additionally sourced datasets. For example, we see a large improvement from 55.6% to 81.6% on GSM8K and an accuracy of 88.7% with majority voting out of 32 samples.
Submitted 25 June, 2024; v1 submitted 30 April, 2024;
originally announced April 2024.
-
Dual-Functional Waveform Design with Local Sidelobe Suppression via OTFS Signaling
Authors:
Kecheng Zhang,
Weijie Yuan,
Pingzhi Fan,
Xianbin Wang
Abstract:
Integrated sensing and communication (ISAC) is viewed as a key technology in future wireless networks. One of the main challenges in realizing ISAC is developing dual-functional waveforms that can communicate with communication receivers and perform radar sensing simultaneously. In this paper, we consider the joint design of a dual-functional orthogonal time-frequency space (OTFS) signal and a receiving filter for the ISAC system. The problem of ISAC waveform design is formulated as the minimization of the weighted integrated sidelobe level (WISL) of the ambiguity function and the interference term from ISAC waveform, with constraints on signal-to-noise ratio loss. The majorization-minimization algorithm combined with alternating iterative minimization is implemented to solve the optimization problem. Simulation results show that the WISL and the interference term can be significantly decreased to guarantee achievable data rates and detection performance.
Submitted 30 April, 2024; v1 submitted 26 April, 2024;
originally announced April 2024.
-
NTIRE 2024 Quality Assessment of AI-Generated Content Challenge
Authors:
Xiaohong Liu,
Xiongkuo Min,
Guangtao Zhai,
Chunyi Li,
Tengchuan Kou,
Wei Sun,
Haoning Wu,
Yixuan Gao,
Yuqin Cao,
Zicheng Zhang,
Xiele Wu,
Radu Timofte,
Fei Peng,
Huiyuan Fu,
Anlong Ming,
Chuanming Wang,
Huadong Ma,
Shuai He,
Zifei Dou,
Shu Chen,
Huacong Zhang,
Haiyi Xie,
Chengwei Wang,
Baoying Chen,
Jishen Zeng
, et al. (89 additional authors not shown)
Abstract:
This paper reports on the NTIRE 2024 Quality Assessment of AI-Generated Content Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2024. This challenge is to address a major challenge in the field of image and video processing, namely, Image Quality Assessment (IQA) and Video Quality Assessment (VQA) for AI-Generated Content (AIGC). The challenge is divided into the image track and the video track. The image track uses the AIGIQA-20K, which contains 20,000 AI-Generated Images (AIGIs) generated by 15 popular generative models. The image track has a total of 318 registered participants. A total of 1,646 submissions are received in the development phase, and 221 submissions are received in the test phase. Finally, 16 participating teams submitted their models and fact sheets. The video track uses the T2VQA-DB, which contains 10,000 AI-Generated Videos (AIGVs) generated by 9 popular Text-to-Video (T2V) models. A total of 196 participants have registered in the video track. A total of 991 submissions are received in the development phase, and 185 submissions are received in the test phase. Finally, 12 participating teams submitted their models and fact sheets. Some methods have achieved better results than baseline methods, and the winning methods in both tracks have demonstrated superior prediction performance on AIGC.
Submitted 7 May, 2024; v1 submitted 25 April, 2024;
originally announced April 2024.
-
Soft X-ray prompt emission from a high-redshift gamma-ray burst EP240315a
Authors:
Y. Liu,
H. Sun,
D. Xu,
D. S. Svinkin,
J. Delaunay,
N. R. Tanvir,
H. Gao,
C. Zhang,
Y. Chen,
X. -F. Wu,
B. Zhang,
W. Yuan,
J. An,
G. Bruni,
D. D. Frederiks,
G. Ghirlanda,
J. -W. Hu,
A. Li,
C. -K. Li,
J. -D. Li,
D. B. Malesani,
L. Piro,
G. Raman,
R. Ricci,
E. Troja
, et al. (170 additional authors not shown)
Abstract:
Long gamma-ray bursts (GRBs) are believed to originate from core collapse of massive stars. High-redshift GRBs can probe the star formation and reionization history of the early universe, but their detection remains rare. Here we report the detection of a GRB triggered in the 0.5--4 keV band by the Wide-field X-ray Telescope (WXT) on board the Einstein Probe (EP) mission, designated as EP240315a, whose bright peak was also detected by the Swift Burst Alert Telescope and Konus-Wind through off-line analyses. At a redshift of $z=4.859$, EP240315a showed a much longer and more complicated light curve in the soft X-ray band than in gamma-rays. Benefiting from a large field-of-view ($\sim$3600 deg$^2$) and a high sensitivity, EP-WXT captured the earlier engine activation and extended late engine activity through a continuous detection. With a peak X-ray flux at the faint end of previously known high-$z$ GRBs, the detection of EP240315a demonstrates the great potential for EP to study the early universe via GRBs.
Submitted 25 April, 2024;
originally announced April 2024.
-
The fast X-ray transient EP240315a: a z ~ 5 gamma-ray burst in a Lyman continuum leaking galaxy
Authors:
Andrew J. Levan,
Peter G. Jonker,
Andrea Saccardi,
Daniele Bjørn Malesani,
Nial R. Tanvir,
Luca Izzo,
Kasper E. Heintz,
Daniel Mata Sánchez,
Jonathan Quirola-Vásquez,
Manuel A. P. Torres,
Susanna D. Vergani,
Steve Schulze,
Andrea Rossi,
Paolo D'Avanzo,
Benjamin Gompertz,
Antonio Martin-Carrillo,
Antonio de Ugarte Postigo,
Benjamin Schneider,
Weimin Yuan,
Zhixing Ling,
Wenjie Zhang,
Xuan Mao,
Yuan Liu,
Hui Sun,
Dong Xu
, et al. (51 additional authors not shown)
Abstract:
The nature of the minute-to-hour long Fast X-ray Transients (FXTs) localised by telescopes such as Chandra, Swift, and XMM-Newton remains mysterious, with numerous models suggested for the events. Here, we report multi-wavelength observations of EP240315a, a 1600 s long transient detected by the Einstein Probe, showing it to have a redshift of z=4.859. We measure a low column density of neutral hydrogen, indicating that the event is embedded in a low-density environment, further supported by direct detection of leaking ionising Lyman-continuum. The observed properties are consistent with EP240315a being a long-duration gamma-ray burst, and these observations support an interpretation in which a significant fraction of the FXT population are lower-luminosity examples of similar events. Such transients are detectable at high redshifts by the Einstein Probe and, in the (near) future, out to even larger distances by SVOM, THESEUS, and Athena, providing samples of events into the epoch of reionisation.
Submitted 25 April, 2024;
originally announced April 2024.
-
Single-Atom Verification of the Optimal Trade-Off between Speed and Cost in Shortcuts to Adiabaticity
Authors:
J. -W. Zhang,
J. -T. Bu,
J. C. Li,
Weiquan Meng,
W. -Q. Ding,
B. Wang,
W. -F. Yuan,
H. -J. Du,
G. -Y. Ding,
W. -J. Chen,
L. Chen,
F. Zhou,
Zhenyu Xu,
M. Feng
Abstract:
The approach of shortcuts to adiabaticity enables the effective execution of adiabatic dynamics in quantum information processing with enhanced speed. Owing to the inherent trade-off between dynamical speed and the cost associated with the transitionless driving field, executing arbitrarily fast operations becomes impractical. To understand the accurate interplay between speed and energetic cost in this process, we propose theoretically and verify experimentally a new trade-off, which is characterized by a tightly optimized bound within $s$-parameterized phase spaces. Our experiment is carried out in a single ultracold $^{40}$Ca$^{+}$ ion trapped in a harmonic potential. By exactly operating the quantum states of the ion, we execute the Landau-Zener model as an example, where the quantum speed limit as well as the cost are governed by the spectral gap. We witness that our proposed trade-off is indeed tight in scenarios involving both initial eigenstates and initial thermal equilibrium states. Our work helps in understanding the fundamental constraints in shortcuts to adiabaticity and illuminates the potential of under-utilized phase spaces that have been traditionally overlooked.
Submitted 6 June, 2024; v1 submitted 24 April, 2024;
originally announced April 2024.
-
Energy-conversion device using a quantum engine with the work medium of two-atom entanglement
Authors:
J. -W. Zhang,
B. Wang,
W. -F. Yuan,
J. -C. Li,
J. -T. Bu,
G. -Y. Ding,
W. -Q. Ding,
L. Chen,
F. Zhou,
M. Feng
Abstract:
Although entanglement is considered an essential resource for quantum information processing, whether entanglement helps energy conversion or output in the quantum regime still lacks experimental evidence. Here we report on an energy-conversion device operating as a quantum engine whose working medium consists of two entangled ions confined in a harmonic potential. The two ions are entangled by virtually coupling to one of the vibrational modes shared by the two ions, and the quantum engine couples to a quantum load, which is another shared vibrational mode. We explore the energy conversion efficiency of the quantum engine and investigate the useful energy (i.e., the maximum extractable work) stored in the quantum load by tuning the two ions in different degrees of entanglement as well as detecting the change of the phonons in the load. Our observation provides, for the first time, quantitative evidence that entanglement fuels the useful energy produced by the quantum engine, but is not helpful for the energy conversion efficiency. We consider that our results may be useful to the study of quantum batteries, for which one of the key indices is the maximum extractable energy.
Submitted 24 April, 2024;
originally announced April 2024.
-
In-tube micro-pyramidal silicon nanopore for inertial-kinetic sensing of single molecules
Authors:
Jianxin Yang,
Tianle Pan,
Zhenming Xie,
Wu Yuan,
Ho-Pui Ho
Abstract:
Electrokinetic force has been the major choice for driving the translocation of molecules through a nanopore. However, the use of this approach is limited by an uncontrollable translocation speed, resulting in non-uniform conductance signals with low conformational sensitivity, which hinders the accurate discrimination of the molecules. Here, we show the first use of inertial-kinetic translocation induced by spinning an in-tube micro-pyramidal silicon nanopore fabricated using photovoltaic electrochemical etch-stop technique for biomolecular sensing. By adjusting the kinetic properties of a funnel-shaped centrifugal force field while maintaining a counter-balanced state of electrophoretic and electroosmotic effect in the nanopore, we achieved regulated translocation of proteins and obtained stable signals of long and adjustable dwell times and high conformational sensitivity. Moreover, we demonstrated instantaneous sensing and discrimination of molecular conformations and longitudinal monitoring of molecular reactions and conformation changes by wirelessly measuring characteristic features in current blockade readouts using the in-tube nanopore device.
Submitted 18 April, 2024;
originally announced April 2024.
-
Automated Similarity Metric Generation for Recommendation
Authors:
Liang Qu,
Yun Lin,
Wei Yuan,
Xiaojun Wan,
Yuhui Shi,
Hongzhi Yin
Abstract:
The embedding-based architecture has become the dominant approach in modern recommender systems, mapping users and items into a compact vector space. It then employs predefined similarity metrics, such as the inner product, to calculate similarity scores between user and item embeddings, thereby guiding the recommendation of items that align closely with a user's preferences. Given the critical role of similarity metrics in recommender systems, existing methods mainly employ handcrafted similarity metrics to capture the complex characteristics of user-item interactions. Yet, handcrafted metrics may not fully capture the diverse range of similarity patterns that can significantly vary across different domains.
To address this issue, we propose an Automated Similarity Metric Generation method for recommendations, named AutoSMG, which can generate tailored similarity metrics for various domains and datasets. Specifically, we first construct a similarity metric space by sampling from a set of basic embedding operators, which are then integrated into computational graphs to represent metrics. We employ an evolutionary algorithm to search for the optimal metrics within this metric space iteratively. To improve search efficiency, we utilize an early stopping strategy and a surrogate model to approximate the performance of candidate metrics instead of fully training models. Notably, our proposed method is model-agnostic and can seamlessly plug into different recommendation model architectures. The proposed method is validated on three public recommendation datasets across various domains in the Top-K recommendation task, and experimental results demonstrate that AutoSMG outperforms both commonly used handcrafted metrics and those generated by other search strategies.
Submitted 17 April, 2024;
originally announced April 2024.
-
Future Perspectives for Gamma-ray Burst Detection from Space
Authors:
Enrico Bozzo,
Lorenzo Amati,
Wayne Baumgartner,
Tzu-Ching Chang,
Bertrand Cordier,
Nicolas De Angelis,
Akihiro Doi,
Marco Feroci,
Cynthia Froning,
Jessica Gaskin,
Adam Goldstein,
Diego Götz,
Jon E. Grove,
Sylvain Guiriec,
Margarita Hernanz,
C. Michelle Hui,
Peter Jenke,
Daniel Kocevski,
Merlin Kole,
Chryssa Kouveliotou,
Thomas Maccarone,
Mark L. McConnell,
Hideo Matsuhara,
Paul O'Brien,
Nicolas Produit
, et al. (13 additional authors not shown)
Abstract:
Since their first discovery in the late 1960s, Gamma-ray bursts have attracted an exponentially growing interest from the international community due to their central role in the most highly debated open questions of the modern research of astronomy, astrophysics, cosmology, and fundamental physics. These range from the intimate nuclear composition of high density material within the core of ultra-dense neutron stars, to stellar evolution via the collapse of massive stars, the production and propagation of gravitational waves, as well as the exploration of the early Universe by unveiling first stars and galaxies (assessing also their evolution and cosmic re-ionization). GRBs have stimulated in the past $\sim$50 years the development of cutting-edge technological instruments for observations of high energy celestial sources from space, leading to the launch and successful operations of many different scientific missions (several of them still in data taking mode nowadays). In this review, we provide a brief description of the GRB-dedicated missions from space being designed and developed for the future. The list of these projects, not meant to be exhaustive, shall serve as a reference to interested readers to understand what is likely to come next to lead the further development of GRB research and associated phenomenology.
Submitted 17 April, 2024;
originally announced April 2024.
-
Computing renormalized curvature integrals on Poincaré-Einstein manifolds
Authors:
Jeffrey S. Case,
Ayush Khaitan,
Yueh-Ju Lin,
Aaron J. Tyrrell,
Wei Yuan
Abstract:
We describe a general procedure for computing renormalized curvature integrals on Poincaré-Einstein manifolds. In particular, we explain the connection between the Gauss-Bonnet-type formulas of Albin and Chang-Qing-Yang for the renormalized volume, and explicitly identify a scalar conformal invariant in the latter formula. Our approach constructs scalar conformal invariants that are divergences at any Einstein manifold; these imply that the scalar invariant in the Chang-Qing-Yang formula is not unique in dimension at least eight. Our procedure also produces explicit conformally invariant Gauss-Bonnet-type formulas for compact Einstein manifolds.
Submitted 9 May, 2024; v1 submitted 17 April, 2024;
originally announced April 2024.
-
NTIRE 2024 Challenge on Image Super-Resolution ($\times$4): Methods and Results
Authors:
Zheng Chen,
Zongwei Wu,
Eduard Zamfir,
Kai Zhang,
Yulun Zhang,
Radu Timofte,
Xiaokang Yang,
Hongyuan Yu,
Cheng Wan,
Yuxin Hong,
Zhijuan Huang,
Yajun Zou,
Yuan Huang,
Jiamin Lin,
Bingnan Han,
Xianyu Guan,
Yongsheng Yu,
Daoan Zhang,
Xuanwu Yin,
Kunlong Zuo,
Jinhua Hao,
Kai Zhao,
Kun Yuan,
Ming Sun,
Chao Zhou
, et al. (63 additional authors not shown)
Abstract:
This paper reviews the NTIRE 2024 challenge on image super-resolution ($\times$4), highlighting the solutions proposed and the outcomes obtained. The challenge involves generating corresponding high-resolution (HR) images, magnified by a factor of four, from low-resolution (LR) inputs using prior information. The LR images originate from bicubic downsampling degradation. The aim of the challenge is to obtain designs/solutions with the most advanced SR performance, with no constraints on computational resources (e.g., model size and FLOPs) or training data. The track of this challenge assesses performance with the PSNR metric on the DIV2K testing dataset. The competition attracted 199 registrants, with 20 teams submitting valid entries. This collective endeavour not only pushes the boundaries of performance in single-image SR but also offers a comprehensive overview of current trends in this field.
Submitted 15 April, 2024;
originally announced April 2024.
-
QI-DPFL: Quality-Aware and Incentive-Boosted Federated Learning with Differential Privacy
Authors:
Wenhao Yuan,
Xuehe Wang
Abstract:
Federated Learning (FL) has increasingly been recognized as an innovative and secure distributed model training paradigm, aiming to coordinate multiple edge clients to collaboratively train a shared model without uploading their private datasets. The challenge of encouraging mobile edge devices to participate zealously in FL model training procedures, while mitigating the privacy leakage risks during wireless transmission, remains comparatively unexplored so far. In this paper, we propose a novel approach, named QI-DPFL (Quality-Aware and Incentive-Boosted Federated Learning with Differential Privacy), to address the aforementioned intractable issue. To select clients with high-quality datasets, we first propose a quality-aware client selection mechanism based on the Earth Mover's Distance (EMD) metric. Furthermore, to attract high-quality data contributors, we design an incentive-boosted mechanism that constructs the interactions between the central server and the selected clients as a two-stage Stackelberg game, where the central server designs the time-dependent reward to minimize its cost by considering the trade-off between accuracy loss and total reward allocated, and each selected client decides the privacy budget to maximize its utility. The Nash Equilibrium of the Stackelberg game is derived to find the optimal solution in each global iteration. The extensive experimental results on different real-world datasets demonstrate the effectiveness of our proposed FL framework, by realizing the goal of privacy protection and incentive compatibility.
Submitted 12 April, 2024;
originally announced April 2024.
-
Fundamental Limits of Communication-Assisted Sensing in ISAC Systems
Authors:
Fuwang Dong,
Fan Liu,
Shihang Liu,
Yifeng Xiong,
Weijie Yuan,
Yuanhao Cui
Abstract:
In this paper, we introduce a novel communication-assisted sensing (CAS) framework that explores the potential coordination gains offered by the integrated sensing and communication technique. The CAS system endows users with beyond-line-of-sight sensing capabilities, supported by a dual-functional base station that enables simultaneous sensing and communication. To delve into the system's fundamental limits, we characterize the information-theoretic framework of the CAS system in terms of rate-distortion theory. We reveal the achievable overall distortion between the target's state and the reconstructions at the end-user, referred to as the sensing quality of service, within a special case where the distortion metric is separable for sensing and communication processes. As a case study, we employ a typical application to demonstrate distortion minimization under the ISAC signaling strategy, showcasing the potential of CAS in enhancing sensing capabilities.
Submitted 23 April, 2024; v1 submitted 11 April, 2024;
originally announced April 2024.
-
Small Magellanic Cloud Cepheids Observed with the Hubble Space Telescope Provide a New Anchor for the SH0ES Distance Ladder
Authors:
Louise Breuval,
Adam G. Riess,
Stefano Casertano,
Wenlong Yuan,
Lucas M. Macri,
Martino Romaniello,
Yukei S. Murakami,
Daniel Scolnic,
Gagandeep S. Anand,
Igor Soszyński
Abstract:
We present photometric measurements of 88 Cepheid variables in the core of the Small Magellanic Cloud (SMC), the first sample obtained with the Hubble Space Telescope (HST) and Wide Field Camera 3, in the same homogeneous photometric system as past measurements of all Cepheids on the SH0ES distance ladder. We limit the sample to the inner core and model the geometry to reduce errors in prior studies due to the non-trivial depth of this Cloud. Without crowding present in ground-based studies, we obtain an unprecedentedly low dispersion of 0.102 mag for a Period-Luminosity relation in the SMC, approaching the width of the Cepheid instability strip. The new geometric distance to 15 late-type detached eclipsing binaries in the SMC offers a rare opportunity to improve the foundation of the distance ladder, increasing the number of calibrating galaxies from three to four. With the SMC as the only anchor, we find H$_0\!=\!74.1 \pm 2.1$ km s$^{-1}$ Mpc$^{-1}$. Combining these four geometric distances with our HST photometry of SMC Cepheids, we obtain H$_0\!=\!73.17 \pm 0.86$ km s$^{-1}$ Mpc$^{-1}$. By including the SMC in the distance ladder, we also double the range where the metallicity ([Fe/H]) dependence of the Cepheid Period-Luminosity relation can be calibrated, and we find $γ= -0.22 \pm 0.05$ mag dex$^{-1}$. Our local measurement of H$_0$ based on Cepheids and Type Ia supernovae shows a 5.8$σ$ tension with the value inferred from the CMB assuming a $Λ$CDM cosmology, reinforcing the possibility of physics beyond $Λ$CDM.
Submitted 11 April, 2024;
originally announced April 2024.
-
Freditor: High-Fidelity and Transferable NeRF Editing by Frequency Decomposition
Authors:
Yisheng He,
Weihao Yuan,
Siyu Zhu,
Zilong Dong,
Liefeng Bo,
Qixing Huang
Abstract:
This paper enables high-fidelity, transferable NeRF editing by frequency decomposition. Recent NeRF editing pipelines lift 2D stylization results to 3D scenes but suffer from blurry results and fail to capture detailed structures because of the inconsistency between 2D edits. Our critical insight is that low-frequency components of images are more multiview-consistent after editing compared with their high-frequency parts. Moreover, the appearance style is mainly exhibited on the low-frequency components, and the content details especially reside in high-frequency parts. This motivates us to perform editing on low-frequency components, which results in high-fidelity edited scenes. In addition, the editing is performed in the low-frequency feature space, enabling stable intensity control and novel scene transfer. Comprehensive experiments conducted on photorealistic datasets demonstrate the superior performance of high-fidelity and transferable NeRF editing. The project page is at https://aigc3d.github.io/freditor.
Submitted 3 April, 2024;
originally announced April 2024.
-
HyperCLOVA X Technical Report
Authors:
Kang Min Yoo,
Jaegeun Han,
Sookyo In,
Heewon Jeon,
Jisu Jeong,
Jaewook Kang,
Hyunwook Kim,
Kyung-Min Kim,
Munhyong Kim,
Sungju Kim,
Donghyun Kwak,
Hanock Kwak,
Se Jung Kwon,
Bado Lee,
Dongsoo Lee,
Gichang Lee,
Jooho Lee,
Baeseong Park,
Seongjin Shin,
Joonsang Yu,
Seolki Baek,
Sumin Byeon,
Eungsup Cho,
Dooseok Choe,
Jeesung Han, et al. (371 additional authors not shown)
Abstract:
We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs.
Submitted 13 April, 2024; v1 submitted 2 April, 2024;
originally announced April 2024.
-
GI-Free Pilot-Aided Channel Estimation for Affine Frequency Division Multiplexing Systems
Authors:
Yu Zhou,
Haoran Yin,
Nanhao Zhou,
Yanqun Tang,
Xiaoying Zhang,
Weijie Yuan
Abstract:
The recently developed affine frequency division multiplexing (AFDM) can achieve full diversity in doubly selective channels, providing a comprehensive sparse representation of the delay-Doppler domain channel. Thus, accurate channel estimation is feasible by using just one pilot symbol. However, traditional AFDM channel estimation schemes necessitate the use of guard intervals (GI) to mitigate data-pilot interference, leading to spectral efficiency degradation. In this paper, we propose a GI-free pilot-aided channel estimation algorithm for AFDM systems, which improves spectral efficiency significantly. To mitigate the interference between the pilot and data symbols caused by the absence of GI, we perform joint interference cancellation, channel estimation, and signal detection iteratively. Simulation results show that the bit error rate (BER) performance of the proposed method can approach the ideal case with perfect channel estimation.
Submitted 1 April, 2024;
originally announced April 2024.
-
IPoD: Implicit Field Learning with Point Diffusion for Generalizable 3D Object Reconstruction from Single RGB-D Images
Authors:
Yushuang Wu,
Luyue Shi,
Junhao Cai,
Weihao Yuan,
Lingteng Qiu,
Zilong Dong,
Liefeng Bo,
Shuguang Cui,
Xiaoguang Han
Abstract:
Generalizable 3D object reconstruction from single-view RGB-D images remains a challenging task, particularly with real-world data. Current state-of-the-art methods develop Transformer-based implicit field learning, necessitating an intensive learning paradigm that requires dense query-supervision uniformly sampled throughout the entire space. We propose a novel approach, IPoD, which harmonizes implicit field learning with point diffusion. This approach treats the query points for implicit field learning as a noisy point cloud for iterative denoising, allowing for their dynamic adaptation to the target object shape. Such adaptive query points harness diffusion learning's capability for coarse shape recovery and also enhance the implicit representation's ability to delineate finer details. In addition, a self-conditioning mechanism is designed to use implicit predictions as the guidance of diffusion learning, leading to a cooperative system. Experiments conducted on the CO3D-v2 dataset affirm the superiority of IPoD, achieving 7.8% improvement in F-score and 28.6% in Chamfer distance over existing methods. The generalizability of IPoD is also demonstrated on the MVImgNet dataset. Our project page is at https://yushuang-wu.github.io/IPoD.
Submitted 30 March, 2024;
originally announced April 2024.
-
Robust Federated Contrastive Recommender System against Model Poisoning Attack
Authors:
Wei Yuan,
Chaoqun Yang,
Liang Qu,
Guanhua Ye,
Quoc Viet Hung Nguyen,
Hongzhi Yin
Abstract:
Federated Recommender Systems (FedRecs) have garnered increasing attention recently, thanks to their privacy-preserving benefits. However, the decentralized and open characteristics of current FedRecs present two dilemmas. First, the performance of FedRecs is compromised due to highly sparse on-device data for each client. Second, the system's robustness is undermined by the vulnerability to model poisoning attacks launched by malicious users. In this paper, we introduce a novel contrastive learning framework designed to fully leverage the client's sparse data through embedding augmentation, referred to as CL4FedRec. Unlike previous contrastive learning approaches in FedRecs that require clients to share their private parameters, our CL4FedRec aligns with the basic FedRec learning protocol, ensuring compatibility with most existing FedRec implementations. We then evaluate the robustness of FedRecs equipped with CL4FedRec by subjecting it to several state-of-the-art model poisoning attacks. Surprisingly, our observations reveal that contrastive learning tends to exacerbate the vulnerability of FedRecs to these attacks. This is attributed to the enhanced embedding uniformity, which makes it easy for the polluted target item embedding to move close to popular items. Based on this insight, we propose an enhanced and robust version of CL4FedRec (rCL4FedRec) by introducing a regularizer to maintain the distance among item embeddings with different popularity levels. Extensive experiments conducted on four commonly used recommendation datasets demonstrate that CL4FedRec significantly enhances both the model's performance and the robustness of FedRecs.
Submitted 29 March, 2024;
originally announced March 2024.
-
SAM-dPCR: Real-Time and High-throughput Absolute Quantification of Biological Samples Using Zero-Shot Segment Anything Model
Authors:
Yuanyuan Wei,
Shanhang Luo,
Changran Xu,
Yingqi Fu,
Qingyue Dong,
Yi Zhang,
Fuyang Qu,
Guangyao Cheng,
Yi-Ping Ho,
Ho-Pui Ho,
Wu Yuan
Abstract:
Digital PCR (dPCR) has revolutionized nucleic acid diagnostics by enabling absolute quantification of rare mutations and target sequences. However, current detection methodologies face challenges, as flow cytometers are costly and complex, while fluorescence imaging methods, relying on software or manual counting, are time-consuming and prone to errors. To address these limitations, we present SAM-dPCR, a novel self-supervised learning-based pipeline that enables real-time and high-throughput absolute quantification of biological samples. Leveraging the zero-shot SAM model, SAM-dPCR efficiently analyzes diverse microreactors with over 97.7% accuracy within a rapid processing time of 3.16 seconds. By utilizing commonly available lab fluorescence microscopes, SAM-dPCR facilitates the quantification of sample concentrations. The accuracy of SAM-dPCR is validated by the strong linear relationship observed between known and inferred sample concentrations. Additionally, SAM-dPCR demonstrates versatility through comprehensive verification using various samples and reactor morphologies. This accessible, cost-effective tool transcends the limitations of traditional detection methods or fully supervised AI models, marking the first application of SAM in nucleic acid detection or molecular diagnostics. By eliminating the need for annotated training data, SAM-dPCR holds great application potential for nucleic acid quantification in resource-limited settings.
Submitted 22 January, 2024;
originally announced March 2024.
-
Radiation Effects on Scientific CMOS Detectors for X-ray Astronomy: II. Total Ionizing Dose Irradiation
Authors:
Mengxi Chen,
Zhixing Ling,
Mingjun Liu,
Qinyu Wu,
Chen Zhang,
Jiaqiang Liu,
Zhenlong Zhang,
Weimin Yuan,
Shuang-Nan Zhang
Abstract:
Complementary metal-oxide-semiconductor (CMOS) detectors are a competitive choice for current and upcoming astronomical missions. To understand the performance variations of CMOS detectors in the space environment, we investigate the total ionizing dose effects on custom-made large-format X-ray CMOS detectors. Three CMOS detector samples were irradiated with a Co-60 source with a total dose of 70 krad and 105 krad. We test and compare the performance of these detectors before and after irradiation. After irradiation, the dark current increases by roughly 20 to 100 times, and the readout noise increases from 3 e- to 6 e-. The bias level at 50 ms integration time decreases by 13 to 18 Digital Number (DN) at -30 degrees. The energy resolution increases from about 150 eV to about 170 eV at 4.5 keV at -30 degrees. The conversion gain of the detectors varies by less than 2% after the irradiation. Furthermore, there are about 50 pixels whose bias at 50 ms has changed by more than 20 DN after the exposure to the radiation and about 30 to 140 pixels whose readout noise has increased by over 20 e- at -30 degrees at 50 ms integration time. These results demonstrate that the performance of large-format CMOS detectors does not suffer significant degradation in the space environment.
Submitted 23 March, 2024;
originally announced March 2024.
-
An Optimization Framework to Enforce Multi-View Consistency for Texturing 3D Meshes Using Pre-Trained Text-to-Image Models
Authors:
Zhengyi Zhao,
Chen Song,
Xiaodong Gu,
Yuan Dong,
Qi Zuo,
Weihao Yuan,
Zilong Dong,
Liefeng Bo,
Qixing Huang
Abstract:
A fundamental problem in the texturing of 3D meshes using pre-trained text-to-image models is to ensure multi-view consistency. State-of-the-art approaches typically use diffusion models to aggregate multi-view inputs, where common issues are the blurriness caused by the averaging operation in the aggregation step or inconsistencies in local features. This paper introduces an optimization framework that proceeds in four stages to achieve multi-view consistency. Specifically, the first stage generates an over-complete set of 2D textures from a predefined set of viewpoints using an MV-consistent diffusion process. The second stage selects a subset of views that are mutually consistent while covering the underlying 3D model. We show how to achieve this goal by solving semi-definite programs. The third stage performs non-rigid alignment to align the selected views across overlapping regions. The fourth stage solves an MRF problem to associate each mesh face with a selected view. In particular, the third and fourth stages are iterated, with the cuts obtained in the fourth stage encouraging non-rigid alignment in the third stage to focus on regions close to the cuts. Experimental results show that our approach significantly outperforms baseline approaches both qualitatively and quantitatively.
Submitted 22 March, 2024;
originally announced March 2024.
-
Fundamentals of Delay-Doppler Communications: Practical Implementation and Extensions to OTFS
Authors:
Shuangyang Li,
Peter Jung,
Weijie Yuan,
Zhiqiang Wei,
Jinhong Yuan,
Baoming Bai,
Giuseppe Caire
Abstract:
The recently proposed orthogonal time frequency space (OTFS) modulation, which is a typical Delay-Doppler (DD) communication scheme, has attracted significant attention thanks to its appealing performance over doubly-selective channels. In this paper, we present the fundamentals of general DD communications from the viewpoint of the Zak transform. We start our study by constructing DD domain basis functions aligning with the time-frequency (TF)-consistency condition, which are globally quasi-periodic and locally twisted-shifted. We unveil that these features are translated to unique signal structures in both time and frequency, which are beneficial for communication purposes. Then, we focus on the practical implementations of DD Nyquist communications, where we show that rectangular windows achieve perfect DD orthogonality, while truncated periodic signals can obtain sufficient DD orthogonality. In particular, a smoothed rectangular window with excess bandwidth can result in slightly worse orthogonality but better pulse localization in the DD domain. Furthermore, we present a practical pulse shaping framework for general DD communications and derive the corresponding input-output relation under various shaping pulses. Our numerical results agree with our derivations and also demonstrate advantages of DD communications over conventional orthogonal frequency-division multiplexing (OFDM).
Submitted 21 March, 2024;
originally announced March 2024.
-
OV9D: Open-Vocabulary Category-Level 9D Object Pose and Size Estimation
Authors:
Junhao Cai,
Yisheng He,
Weihao Yuan,
Siyu Zhu,
Zilong Dong,
Liefeng Bo,
Qifeng Chen
Abstract:
This paper studies a new open-set problem, the open-vocabulary category-level object pose and size estimation. Given human text descriptions of arbitrary novel object categories, the robot agent seeks to predict the position, orientation, and size of the target object in the observed scene image. To enable such generalizability, we first introduce OO3D-9D, a large-scale photorealistic dataset for this task. Derived from OmniObject3D, OO3D-9D is the largest and most diverse dataset in the field of category-level object pose and size estimation. It includes additional annotations for the symmetry axis of each category, which help resolve symmetric ambiguity. Apart from the large-scale dataset, we find another key to enabling such generalizability is leveraging the strong prior knowledge in pre-trained visual-language foundation models. We then propose a framework built on pre-trained DinoV2 and text-to-image stable diffusion models to infer the normalized object coordinate space (NOCS) maps of the target instances. This framework fully leverages the visual semantic prior from DinoV2 and the aligned visual and language knowledge within the text-to-image diffusion model, which enables generalization to various text descriptions of novel categories. Comprehensive quantitative and qualitative experiments demonstrate that the proposed open-vocabulary method, trained on our large-scale synthesized data, significantly outperforms the baseline and can effectively generalize to real-world images of unseen categories. The project page is at https://ov9d.github.io.
Submitted 18 March, 2024;
originally announced March 2024.
-
VideoMV: Consistent Multi-View Generation Based on Large Video Generative Model
Authors:
Qi Zuo,
Xiaodong Gu,
Lingteng Qiu,
Yuan Dong,
Zhengyi Zhao,
Weihao Yuan,
Rui Peng,
Siyu Zhu,
Zilong Dong,
Liefeng Bo,
Qixing Huang
Abstract:
Generating multi-view images based on text or single-image prompts is a critical capability for the creation of 3D content. Two fundamental questions on this topic are what data we use for training and how to ensure multi-view consistency. This paper introduces a novel framework that makes fundamental contributions to both questions. Unlike leveraging images from 2D diffusion models for training, we propose a dense consistent multi-view generation model that is fine-tuned from off-the-shelf video generative models. Images from video generative models are more suitable for multi-view generation because the underlying network architecture that generates them employs a temporal module to enforce frame consistency. Moreover, the video data sets used to train these models are abundant and diverse, leading to a reduced train-finetuning domain gap. To enhance multi-view consistency, we introduce a 3D-Aware Denoising Sampling, which first employs a feed-forward reconstruction module to get an explicit global 3D model, and then adopts a sampling strategy that effectively involves images rendered from the global 3D model into the denoising sampling loop to improve the multi-view consistency of the final images. As a by-product, this module also provides a fast way to create 3D assets represented by 3D Gaussians within a few seconds. Our approach can generate 24 dense views and converges much faster in training than state-of-the-art approaches (4 GPU hours versus many thousand GPU hours) with comparable visual quality and consistency. By further fine-tuning, our approach outperforms existing state-of-the-art methods in both quantitative metrics and visual effects. Our project page is aigc3d.github.io/VideoMV.
Submitted 18 March, 2024;
originally announced March 2024.
-
VisionCLIP: An Med-AIGC based Ethical Language-Image Foundation Model for Generalizable Retina Image Analysis
Authors:
Hao Wei,
Bowen Liu,
Minqing Zhang,
Peilun Shi,
Wu Yuan
Abstract:
Generalist foundation models have ushered in newfound capabilities in the medical domain. However, the contradiction between the growing demand for high-quality annotated data and patient privacy continues to intensify. The utilization of medical artificial intelligence generated content (Med-AIGC) as an inexhaustible resource repository arises as a potential solution to address the aforementioned challenge. Here we harness 1 million open-source synthetic fundus images paired with natural language descriptions to curate an ethical language-image foundation model for retina image analysis named VisionCLIP. VisionCLIP achieves competitive performance on three external datasets compared with the existing method pre-trained on real-world data in a zero-shot fashion. The employment of artificially synthetic images alongside corresponding textual data for training enables the medical foundation model to successfully assimilate knowledge of disease symptomatology, thereby circumventing potential breaches of patient confidentiality.
Submitted 16 March, 2024;
originally announced March 2024.
-
Representing Molecules as Random Walks Over Interpretable Grammars
Authors:
Michael Sun,
Minghao Guo,
Weize Yuan,
Veronika Thost,
Crystal Elaine Owens,
Aristotle Franklin Grosz,
Sharvaa Selvan,
Katelyn Zhou,
Hassan Mohiuddin,
Benjamin J Pedretti,
Zachary P Smith,
Jie Chen,
Wojciech Matusik
Abstract:
Recent research in molecular discovery has primarily been devoted to small, drug-like molecules, leaving many similarly important applications in material design without adequate technology. These applications often rely on more complex molecular structures with fewer examples that are carefully designed using known substructures. We propose a data-efficient and interpretable model for representing and reasoning over such molecules in terms of graph grammars that explicitly describe the hierarchical design space featuring motifs to be the design basis. We present a novel representation in the form of random walks over the design space, which facilitates both molecule generation and property prediction. We demonstrate clear advantages over existing methods in terms of performance, efficiency, and synthesizability of predicted molecules, and we provide detailed insights into the method's chemical interpretability.
Submitted 2 June, 2024; v1 submitted 12 March, 2024;
originally announced March 2024.