Heterogeneous integration of amorphous silicon carbide on thin film lithium niobate

Authors: Zizheng Li, Naresh Sharma, Bruno Lopez-Rodriguez, Roald van der Kolk, Thomas Scholte, Hugo Voncken, Jasper van der Boom, Simon Gröblacher, Iman Esmaeil Zadeh

Abstract: In the past decade, lithium niobate (LiNbO3 or LN) photonics, thanks to its heat-free and fast electro-optical modulation, second-order non-linearities and low loss, has been extensively investigated. Despite numerous demonstrations of high-performance LN photonics, processing lithium niobate remains challenging and suffers from incompatibilities with standard complementary metal-oxide semiconductor (CMOS) fabrication lines, limiting its scalability. Silicon carbide (SiC) is an emerging material platform with a high refractive index and a large non-linear Kerr coefficient, and is a promising candidate for heterogeneous integration with LN photonics. Current approaches of SiC/LN integration require transfer-bonding techniques, which are time-consuming, expensive, and lack precision in layer thickness. Here we show that amorphous silicon carbide (a-SiC), deposited using inductively coupled plasma enhanced chemical vapor deposition (ICPCVD) at low temperatures (< 165 °C), can be conveniently integrated with LiNbO3 and processed to form high-performance photonics. Most importantly, the fabrication only involves a standard, silicon-compatible, reactive ion etching step and leaves the LiNbO3 intact, hence its compatibility with standard foundry processes. As a proof-of-principle, we fabricated waveguides and ring resonators on the developed a-SiC/LN platform and achieved intrinsic quality factors higher than 106,000 and resonance electro-optic tunability of 3.4 pm/V with 3 mm tuning length. We showcase the possibility of dense integration by fabricating and testing ring resonators with 40 µm radius without a noticeable loss penalty. Our platform offers a CMOS-compatible and scalable approach for implementation of future fast electro-optic modulators and reconfigurable photonic circuits as well as nonlinear processes which can benefit from involving both second and third-order nonlinearities.

Submitted 12 July, 2024; originally announced July 2024.

Comments: 9 pages, 4 figures

arXiv:2407.08994 [pdf, other]

Global Attention-Guided Dual-Domain Point Cloud Feature Learning for Classification and Segmentation

Authors: Zihao Li, Pan Gao, Kang You, Chuan Yan, Manoranjan Paul

Abstract: Previous studies have demonstrated the effectiveness of point-based neural models on the point cloud analysis task. However, there remains a crucial issue on producing the efficient input embedding for raw point coordinates. Moreover, another issue lies in the limited efficiency of neighboring aggregations, which is a critical component in the network stem. In this paper, we propose a Global Attention-guided Dual-domain Feature Learning network (GAD) to address the above-mentioned issues. We first devise the Contextual Position-enhanced Transformer (CPT) module, which is armed with an improved global attention mechanism, to produce a global-aware input embedding that serves as the guidance to subsequent aggregations. Then, the Dual-domain K-nearest neighbor Feature Fusion (DKFF) is cascaded to conduct effective feature aggregation through novel dual-domain feature learning which appreciates both local geometric relations and long-distance semantic connections. Extensive experiments on multiple point cloud analysis tasks (e.g., classification, part segmentation, and scene semantic segmentation) demonstrate the superior performance of the proposed method and the efficacy of the devised modules.

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.08549 [pdf, ps, other]

A restriction estimate for a hyperbolic paraboloid in $\mathbb{R}^5$

Authors: Zhuoran Li

Abstract: In this paper, we prove a restriction estimate for a hyperbolic paraboloid in $\mathbb{R}^5$ by the polynomial partitioning method.

Submitted 11 July, 2024; originally announced July 2024.

Comments: 13 pages. Any comments are welcome

arXiv:2407.08480 [pdf]

Magic silicon dioxide for widely tunable integrated photonics

Authors: Bruno Lopez-Rodriguez, Naresh Sharma, Zizheng Li, Roald van der Kolk, Jasper van der Boom, Thomas Scholte, Jin Chang, Simon Groblacher, Iman Esmaeil Zadeh

Abstract: Integrated photonic circuits have transformed data communication, biosensing, and light detection and ranging, and hold wide-ranging potential for optical computing, optical imaging and signal processing. These applications often require tunable and reconfigurable photonic components, most commonly accomplished through the thermo-optic effect. However, the resulting tuning window is limited for standard optical materials such as silicon dioxide and silicon nitride. Most importantly, bidirectional thermal tuning on a single platform has not been realized. For the first time, we show that by tuning and optimizing the deposition conditions in inductively-coupled plasma chemical vapor deposition (ICPCVD) of silicon dioxide, this material can be used to deterministically tune the thermo-optic properties of optical devices without introducing significant losses. We demonstrate that we can deterministically integrate positive and negative wavelength shifts on a single chip, validated on amorphous silicon carbide (a-SiC), silicon nitride (SiN) and silicon-on-insulator (SOI) platforms. We observe up to a 10-fold improvement of the thermo-optic tunability and, in addition, demonstrate athermal ring resonators with shifts as low as 1.5 pm/°C. This enables the fabrication of a novel tunable coupled ring optical waveguide (CROW) requiring only a single heater. In addition, the low-temperature deposition of our silicon dioxide cladding can be combined with lift-off to isolate the optical devices resulting in a decrease in thermal crosstalk by at least two orders of magnitude. Our method paves the way for novel photonic architectures incorporating bidirectional thermo-optic tunability.

Submitted 11 July, 2024; originally announced July 2024.

arXiv:2407.08414 [pdf, other]

MeshAvatar: Learning High-quality Triangular Human Avatars from Multi-view Videos

Authors: Yushuo Chen, Zerong Zheng, Zhe Li, Chao Xu, Yebin Liu

Abstract: We present a novel pipeline for learning high-quality triangular human avatars from multi-view videos. Recent methods for avatar learning are typically based on neural radiance fields (NeRF), which are not compatible with the traditional graphics pipeline and pose great challenges for operations like editing or synthesizing under different environments. To overcome these limitations, our method represents the avatar with an explicit triangular mesh extracted from an implicit SDF field, complemented by an implicit material field conditioned on given poses. Leveraging this triangular avatar representation, we incorporate physics-based rendering to accurately decompose geometry and texture. To enhance both the geometric and appearance details, we further employ a 2D UNet as the network backbone and introduce pseudo normal ground-truth as additional supervision. Experiments show that our method can learn triangular avatars with high-quality geometry reconstruction and plausible material decomposition, inherently supporting editing, manipulation or relighting operations.

Submitted 11 July, 2024; originally announced July 2024.

Comments: Project Page: https://shad0wta9.github.io/meshavatar-page/

arXiv:2407.08382 [pdf, ps, other]

Adjusting for Participation Bias in Case-Control Genetic Association Studies for Rare Diseases

Authors: Le Wang, Zhengbang Li, Ben Fitzpatrick, Clarice Weinberg, Jinbo Chen

Abstract: Collection of genotype data in case-control genetic association studies may often be incomplete for reasons related to genes themselves. This non-ignorable missingness structure, if not appropriately accounted for, can result in participation bias in association analyses. To deal with this issue, Chen et al. (2016) proposed to collect additional genetic information from family members of individuals whose genotype data were not available, and developed a maximum likelihood method for bias correction. In this study, we develop an estimating equation approach to analyzing data collected from this design that allows adjustment of covariates. It jointly estimates odds ratio parameters for genetic association and missingness, where a logistic regression model is used to relate missingness to genotype and other covariates. Our method allows correlation between genotype and covariates while using genetic information from family members to provide information on the missing genotype data. In the estimating equation for genetic association parameters, we weight the contribution of each genotyped subject to the empirical likelihood score function by the inverse probability that the genotype data are available. We evaluate large and finite sample performance of our method via simulation studies and apply it to a family-based case-control study of breast cancer.

Submitted 11 July, 2024; originally announced July 2024.

arXiv:2407.08346 [pdf, ps, other]

Polynomial tail solutions of the non-cutoff Boltzmann equation near local Maxwellians

Authors: Renjun Duan, Zongguang Li

Abstract: This paper aims to incorporate Caflisch's decomposition into the macro-micro decomposition in Boltzmann theory, allowing the microscopic component to exhibit only a polynomial tail in large velocities. In particular, we treat the Cauchy problem on the non-cutoff Boltzmann equation under the compressible Euler scaling in the case of three-dimensional whole space. Up to a finite time we construct the Boltzmann solution around a local Maxwellian corresponding to small-amplitude classical solutions of the full compressible Euler system around constant states. We design a new energy functional which can capture the convergence rate in the small Knudsen number $\varepsilon$ and allow the microscopic part of solutions to decay polynomially in large velocities. Moreover, the energy norm of perturbations can be of the order $\varepsilon^{1/2}$ which the usual method of Hilbert expansion fails to obtain. As a byproduct of the proof, our estimates immediately yield a global-in-time existence result when the Euler solutions are taken to be constant states.

Submitted 11 July, 2024; originally announced July 2024.

Comments: 58 pages. All comments are welcome

arXiv:2407.08273

RB-SQL: A Retrieval-based LLM Framework for Text-to-SQL

Authors: Zhenhe Wu, Zhongqiu Li, Jie Zhang, Mengxiang Li, Yu Zhao, Ruiyu Fang, Zhongjiang He, Xuelong Li, Zhoujun Li, Shuangyong Song

Abstract: Large language models (LLMs) with in-context learning have significantly improved the performance of the text-to-SQL task. Previous works generally focus on using exclusive SQL generation prompts to improve the LLMs' reasoning ability. However, they mostly struggle to handle large databases with numerous tables and columns, and usually ignore the significance of pre-processing the database and extracting valuable information for more efficient prompt engineering. Based on the above analysis, we propose RB-SQL, a novel retrieval-based LLM framework for in-context prompt engineering, which consists of three modules that retrieve concise tables and columns as schema, and targeted examples for in-context learning. Experiment results demonstrate that our model achieves better performance than several competitive baselines on public datasets BIRD and Spider.

Submitted 12 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

Comments: Further improvement and modification are needed.

arXiv:2407.08194 [pdf, other]

Uncovering Emergent Spacetime Supersymmetry with Rydberg Atom Arrays

Authors: Chengshu Li, Shang Liu, Hanteng Wang, Wenjun Zhang, Zi-Xiang Li, Hui Zhai, Yingfei Gu

Abstract: In the zoo of emergent symmetries in quantum many-body physics, the previously unrealized emergent spacetime supersymmetry (SUSY) is particularly intriguing. Although it was known that spacetime SUSY could emerge at the (1+1)d tricritical Ising transition, an experimental realization is still absent. In this letter, we propose to realize the tricritical Ising transition with Rydberg atom arrays, taking advantage of the reconfigurability of these systems. In such systems, the spacetime SUSY manifests itself in the respective correlation functions of a bosonic mode and its fermionic partner. However, the correlation function of the fermionic mode inevitably involves a string operator, making direct measurement challenging in the conventional setting. Here, we utilize the analog-digital hybrid nature of the Rydberg atom arrays, which can simulate a physical Hamiltonian and perform a digital quantum circuit on the same platform, to measure the correlation function of the fermionic mode. This hybridized protocol provides an experimentally feasible way to reveal the hidden structure of the spacetime SUSY that emerges at the tricritical Ising transition.

Submitted 11 July, 2024; originally announced July 2024.

Comments: 7 pages, 3 figures

arXiv:2407.08165 [pdf, other]

Explicit_NeRF_QA: A Quality Assessment Database for Explicit NeRF Model Compression

Authors: Yuke Xing, Qi Yang, Kaifa Yang, Yilin Xu, Zhu Li

Abstract: In recent years, Neural Radiance Fields (NeRF) have demonstrated significant advantages in representing and synthesizing 3D scenes. Explicit NeRF models facilitate practical NeRF applications with faster rendering speed, and also attract considerable attention in NeRF compression due to their huge storage cost. To address the challenge of the NeRF compression study, in this paper, we construct a new dataset, called Explicit_NeRF_QA. We use 22 3D objects with diverse geometries, textures, and material complexities to train four typical explicit NeRF models across five parameter levels. Lossy compression is introduced during the model generation, pivoting the selection of key parameters such as hash table size for InstantNGP and voxel grid resolution for Plenoxels. By rendering NeRF samples to processed video sequences (PVS), a large scale subjective experiment with lab environment is conducted to collect subjective scores from 21 viewers. The diversity of content, accuracy of mean opinion scores (MOS), and characteristics of NeRF distortion are comprehensively presented, establishing the heterogeneity of the proposed dataset. The state-of-the-art objective metrics are tested in the new dataset. The best Pearson correlation, which is around 0.85, is collected from the full-reference objective metric. All tested no-reference metrics report very poor results with 0.4 to 0.6 correlations, demonstrating the need for further development of more robust no-reference metrics. The dataset, including NeRF samples, source 3D objects, multiview images for NeRF generation, PVSs, MOS, is made publicly available at the following location: https://github.com/LittlericeChloe/Explicit_NeRF_QA.

Submitted 11 July, 2024; originally announced July 2024.

Comments: 5 pages, 4 figures, 2 tables, conference

arXiv:2407.08032 [pdf, other]

Rossby Wave Instability and Substructure Formation in 3D Non-Ideal MHD Wind-Launching Disks

Authors: Chun-Yen Hsu, Zhi-Yun Li, Yisheng Tu, Xiao Hu, Min-Kai Lin

Abstract: Rings and gaps are routinely observed in the dust continuum emission of protoplanetary discs (PPDs). How they form and evolve remains debated. Previous studies have demonstrated the possibility of spontaneous gas rings and gaps formation in wind-launching disks. Here, we show that such gas substructures are unstable to the Rossby Wave Instability (RWI) through numerical simulations. Specifically, shorter wavelength azimuthal modes develop earlier, and longer wavelength ones dominate later, forming elongated (arc-like) anti-cyclonic vortices in the rings and (strongly magnetized) cyclonic vortices in the gaps that persist until the end of the simulation. Highly elongated vortices with aspect ratios of 10 or more are found to decay with time in our non-ideal MHD simulation, in contrast with the hydro case. This difference could be caused by magnetically induced motions, particularly strong meridional circulations with large values of the azimuthal component of the vorticity, which may be incompatible with the columnar structure preferred by vortices. The cyclonic and anti-cyclonic RWI vortices saturate at moderate levels, modifying but not destroying the rings and gaps in the radial gas distribution of the disk. In particular, they do not shut off the poloidal magnetic flux accumulation in low-density regions and the characteristic meridional flow patterns that are crucial to the ring and gap formation in wind-launching disks. Nevertheless, the RWI and their associated vortices open up the possibility of producing non-axisymmetric dust features observed in a small fraction of protoplanetary disks through non-ideal MHD, although detailed dust treatment is needed to explore this possibility.

Submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.07952 [pdf, other]

Regular Electric Black Holes from EMS Gravity

Authors: Zhi-Chao Li, H. Lu

Abstract: We construct Einstein-Maxwell-Scalar (EMS) theories that admit regular electric black holes. Such a Maxwell-scalar theory is equivalent to some nonlinear electrodynamics (NLED) at the level of equations of motion, but it has the advantage of circumventing the no-go theorem of regular electric black holes under a given Lagrangian of NLED. We study the thermodynamics and show that the mass of the regular black hole can be determined solely by the Maxwell field, without having to know the metric profile function. Our formalism allows one to study the applications of the electrically-charged regular black holes in areas that were previously available only to the non-regular ones.

Submitted 10 July, 2024; originally announced July 2024.

Comments: Latex, 28 pages, 3 graphs grouped in one figure

arXiv:2407.07723 [pdf, other]

Understanding is Compression

Authors: Ziguang Li, Chao Huang, Xuliang Wang, Haibo Hu, Cole Wyeth, Dongbo Bu, Quan Yu, Wen Gao, Xingwu Liu, Ming Li

Abstract: We have previously shown all understanding or learning are compression, under reasonable assumptions. In principle, better understanding of data should improve data compression. Traditional compression methodologies focus on encoding frequencies or some other computable properties of data. Large language models approximate the uncomputable Solomonoff distribution, opening up a whole new avenue to justify our theory. Under the new uncomputable paradigm, we present LMCompress based on the understanding of data using large models. LMCompress has significantly better lossless compression ratios than all other lossless data compression methods, doubling the compression ratios of JPEG-XL for images, FLAC for audios and H264 for videos, and tripling or quadrupling the compression ratio of bz2 for texts. The better a large model understands the data, the better LMCompress compresses.

Submitted 23 June, 2024; originally announced July 2024.

arXiv:2407.07651 [pdf, other]

Study of the decay and production properties of $D_{s1}(2536)$ and $D_{s2}^*(2573)$

Authors: M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann , et al. (645 additional authors not shown)

Abstract: The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ processes are studied using data samples collected with the BESIII detector at center-of-mass energies from 4.530 to 4.946~GeV. The absolute branching fractions of $D_{s1}(2536)^- \rightarrow \bar{D}^{*0}K^-$ and $D_{s2}^*(2573)^- \rightarrow \bar{D}^0K^-$ are measured for the first time to be $(35.9\pm 4.8\pm 3.5)\%$ and $(37.4\pm 3.1\pm 4.6)\%$, respectively. The measurements are in tension with predictions based on the assumption that the $D_{s1}(2536)$ and $D_{s2}^*(2573)$ are dominated by a bare $c\bar{s}$ component. The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ cross sections are measured, and a resonant structure at around 4.6~GeV with a width of 50~MeV is observed for the first time with a statistical significance of $15\sigma$ in the $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ process. It could be the $Y(4626)$ found by the Belle collaboration in the $D_s^+D_{s1}(2536)^{-}$ final state, since they have similar masses and widths. There is also evidence for a structure at around 4.75~GeV in both processes.

Submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.07545 [pdf]

Narrow Linewidth Laser Based on Extended Topological Interface States in One-Dimensional Photonic Crystals

Authors: Xiao Sun, Zhibo Li, Yiming Sun, Yupei Wang, Jue Wang, Huihua Cheng, Cong Fu, John H. Marsh, Anthony E. Kelly, Lianping Hou

Abstract: Recent advances in topological one-dimensional photonic crystal concepts have enabled the development of robust light-emitting devices by incorporating a topological interface state (TIS) at the cavity center. In this study, we theoretically and experimentally demonstrate a one-dimensional TIS-extended photonic crystal (1D-TISE-PC) structure. By integrating a linearly dispersive zero-index one-dimensional photonic crystal structure with a four-phase shift sampled grating, photons propagate along the cavity without phase differences, enhancing the robustness to material variations and extending the TIS. Our findings indicate that extending the TIS promotes a more uniform photon distribution along the laser cavity and mitigates the spatial hole burning (SHB) effect. We fabricated and characterized a 1550 nm sidewall 1D-TISE-PC semiconductor laser, achieving stable single-mode operation across a wide current range from 60 to 420 mA, with a side-mode suppression ratio of 50 dB. The 1D-TISE-PC structure exhibited a linewidth narrowing effect to approximately 150 kHz Lorentzian linewidth. Utilizing reconstruction equivalent-chirp technology for the 4PS sampled grating enabled precise wavelength control in 1D-TISE-PC laser arrays, achieving a wavelength spacing of 0.796 nm ± 0.003 nm. We show that the TIS still exists in the TISE cavity and topological protection is preserved. Its mode extension characteristics mitigate the SHB and thus narrow the linewidth. We argue that the design simplicity and improved fabrication tolerance make this architecture suitable for the development of high-power, narrow-linewidth semiconductor lasers.

Submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.07501 [pdf]

Electronic Correlation and Pseudogap-like Behavior of High-Temperature Superconductor La3Ni2O7

Authors: Yidian Li, Xian Du, Yantao Cao, Cuiying Pei, Mingxin Zhang, Wenxuan Zhao, Kaiyi Zhai, Runzhe Xu, Zhongkai Liu, Zhiwei Li, Jinkui Zhao, Gang Li, Yanpeng Qi, Hanjie Guo, Yulin Chen, Lexian Yang

Abstract: High-temperature superconductivity (HTSC) remains one of the most challenging and fascinating mysteries in condensed matter physics. Recently, superconductivity with transition temperature exceeding liquid-nitrogen temperature was discovered in La3Ni2O7 at high pressure, which provides a new platform to explore the unconventional HTSC. In this work, using high-resolution angle-resolved photoemission spectroscopy and ab-initio calculation, we systematically investigate the electronic structures of La3Ni2O7 at ambient pressure. Our experiments are in nice agreement with ab-initio calculations after considering an orbital-dependent band renormalization effect. The strong electron correlation effect pushes a flat band of $d_{z^2}$ orbital component below the Fermi level ($E_F$), which is predicted to locate right at $E_F$ under high pressure. Moreover, the $d_{x^2-y^2}$ band shows a pseudogap-like behavior with suppressed spectral weight and diminished quasiparticle peak near $E_F$. Our findings provide important insights into the electronic structure of La3Ni2O7, which will shed light on the understanding of the unconventional superconductivity in nickelates.

Submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.07447 [pdf]

Spin Splitting in Altermagnetic RuO$_2$ Enables Field-free Spin-Orbit Torque Switching via Dominant Out-of-Plane Spin Polarization

Authors: Zhuoyi Li, Zhe Zhang, Xianyang Lu, Yongbing Xu

Abstract: Researchers have recently identified a novel class of magnetism, termed "altermagnetism", which exhibits characteristics of both ferromagnetism and antiferromagnetism. Here, we report a groundbreaking discovery of efficient field-free spin-orbit torque (SOT) switching in a RuO$_2$ (101)/Co/Pt/Co/Pt/Ta structure. Our results demonstrate that the spin current flows along the [100] axis, induced by the in-plane charge current, with the spin polarization direction aligned parallel to the Néel vector. These z-polarized spins generate an out-of-plane anti-damping torque, enabling deterministic switching of the Co/Pt layer without the necessity of an external magnetic field. The altermagnetic spin splitting effect (ASSE) in RuO$_2$ promotes the generation of spin currents with pronounced anisotropic behavior, maximized when the charge current flows along the [010] direction. This unique capability yields the highest field-free switching ratio, maintaining stable SOT switching within an external field range of approximately 400 Oe. Notably, ASSE dominates the spin current, especially when the current is aligned with the [010] direction (θ = 90°). Here, the spin polarization component creates a substantial field-like effective field, surpassing the damping-like field from . This highlights the crucial role of in enhancing spin-torque efficiency and elucidating spin flow modulation mechanics in this crystalline context. Our study highlights the potential of RuO$_2$ as a powerful spin current generator, paving the way for practical applications in spin-torque switching technologies and other cutting-edge spintronic devices.

Submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.07377 [pdf, other]

Pseudospin-filter tunneling of massless Dirac fermions

Authors: Z. D. Li, W. Zeng

Abstract: The tunneling of massless Dirac fermions through a vector potential barrier is theoretically investigated, where the vector potential can be introduced by very high and very thin (delta-function) magnetic potential barriers. We show that, distinct from the previously studied electric barrier tunneling, the vector potential barriers are more transparent for pseudospin-1/2 Dirac fermions but more obstructive for pseudospin-1 Dirac fermions. By tuning the height of the vector potential barrier, the pseudospin-1/2 Dirac fermions remain transmitted, whereas the transmission of the pseudospin-1 Dirac fermions is forbidden, leading to a pseudospin filtering effect for massless Dirac fermions.

Submitted 10 July, 2024; originally announced July 2024.

Comments: 5 pages, 3 figures

arXiv:2407.07337 [pdf, other]

In-Orbit Processing or Not? Sunlight-Aware Task Scheduling for Energy-Efficient Space Edge Computing Networks

Authors: Weisen Liu, Zeqi Lai, Qian Wu, Hewu Li, Qi Zhang, Zonglun Li, Yuanjie Li, Jun Liu

Abstract: With the rapid evolution of space-borne capabilities, space edge computing (SEC) is becoming a new computation paradigm for future integrated space and terrestrial networks. Satellite edges adopt advanced on-board hardware, which not only enables new opportunities to perform complex intelligent tasks in orbit, but also involves new challenges due to the additional energy consumption in power-constrained space environment. In this paper, we present PHOENIX, an energy-efficient task scheduling framework for emerging SEC networks. PHOENIX exploits a key insight that in the SEC network, there always exist a number of sunlit edges which are illuminated during the entire orbital period and have sufficient energy supplement from the sun. PHOENIX accomplishes energy-efficient in-orbit computing by judiciously offloading space tasks to "sunlight-sufficient" edges or to the ground. Specifically, PHOENIX first formulates the SEC battery energy optimizing (SBEO) problem which aims at minimizing the average battery energy consumption while satisfying various task completion constraints. Then PHOENIX incorporates a sunlight-aware scheduling mechanism to solve the SBEO problem and schedule SEC tasks efficiently. Finally, we implement a PHOENIX prototype and build an SEC testbed. Extensive data-driven evaluations demonstrate that as compared to other state-of-the-art solutions, PHOENIX can effectively reduce up to 54.8% SEC battery energy consumption and prolong battery lifetime to 2.9$\times$ while still completing tasks on time.

Submitted 9 July, 2024; originally announced July 2024.

Comments: Accepted by IEEE INFOCOM 2024

arXiv:2407.07327 [pdf, other]

Fuse, Reason and Verify: Geometry Problem Solving with Parsed Clauses from Diagram

Authors: Ming-Liang Zhang, Zhong-Zhi Li, Fei Yin, Liang Lin, Cheng-Lin Liu

Abstract: Geometry problem solving (GPS) requires capacities of multi-modal understanding, multi-hop reasoning and theorem knowledge application. In this paper, we propose a neural-symbolic model for plane geometry problem solving (PGPS), named PGPSNet-v2, with three key steps: modal fusion, reasoning process and knowledge verification. In modal fusion, we leverage textual clauses to express fine-grained structural and semantic content of geometry diagram, and fuse diagram with textual problem efficiently through structural-semantic pre-training. For reasoning, we design an explicable solution program to describe the geometric reasoning process, and employ a self-limited decoder to generate solution program autoregressively. To reduce solution errors, a multi-level theorem verifier is proposed to eliminate solutions that do not match geometric principles, alleviating the hallucination of the neural model. We also construct a large-scale geometry problem dataset called PGPS9K, containing fine-grained annotations of textual clauses, solution program and involved knowledge tuples. Extensive experiments on datasets Geometry3K and PGPS9K show that our PGPSNet solver outperforms existing symbolic and neural solvers in GPS performance, while maintaining good explainability and reliability, and the solver components (fusion, reasoning, verification) are all justified effective.

Submitted 9 July, 2024; originally announced July 2024.

Comments: under review by journal

arXiv:2407.07020 [pdf, other]

Less is More: Efficient Brain-Inspired Learning for Autonomous Driving Trajectory Prediction

Authors: Haicheng Liao, Yongkang Li, Zhenning Li, Chengyue Wang, Chunlin Tian, Yuming Huang, Zilin Bian, Kaiqun Zhu, Guofa Li, Ziyuan Pu, Jia Hu, Zhiyong Cui, Chengzhong Xu

Abstract: Accurately and safely predicting the trajectories of surrounding vehicles is essential for fully realizing autonomous driving (AD). This paper presents the Human-Like Trajectory Prediction model (HLTP++), which emulates human cognitive processes to improve trajectory prediction in AD. HLTP++ incorporates a novel teacher-student knowledge distillation framework. The "teacher" model equipped with an adaptive visual sector, mimics the dynamic allocation of attention human drivers exhibit based on factors like spatial orientation, proximity, and driving speed. On the other hand, the "student" model focuses on real-time interaction and human decision-making, drawing parallels to the human memory storage mechanism. Furthermore, we improve the model's efficiency by introducing a new Fourier Adaptive Spike Neural Network (FA-SNN), allowing for faster and more precise predictions with fewer parameters. Evaluated using the NGSIM, HighD, and MoCAD benchmarks, HLTP++ demonstrates superior performance compared to existing models, which reduces the predicted trajectory error with over 11% on the NGSIM dataset and 25% on the HighD datasets. Moreover, HLTP++ demonstrates strong adaptability in challenging environments with incomplete input data. This marks a significant stride in the journey towards fully AD systems.

Submitted 9 July, 2024; originally announced July 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2402.19251

arXiv:2407.06904 [pdf, other]

Hypergraph based Understanding for Document Semantic Entity Recognition

Authors: Qiwei Li, Zuchao Li, Ping Wang, Haojun Ai, Hai Zhao

Abstract: Semantic entity recognition is an important task in the field of visually-rich document understanding. It distinguishes the semantic types of text by analyzing the position relationship between text nodes and the relation between text content. The existing document understanding models mainly focus on entity categories while ignoring the extraction of entity boundaries. We build a novel hypergraph attention document semantic entity recognition framework, HGA, which uses hypergraph attention to focus on entity boundaries and entity categories at the same time. It can conduct a more detailed analysis of the document text representation analyzed by the upstream model and achieves a better performance of semantic information. We apply this method on the basis of GraphLayoutLM to construct a new semantic entity recognition model HGALayoutLM. Our experiment results on FUNSD, CORD, XFUND and SROIE show that our method can effectively improve the performance of semantic entity recognition tasks based on the original model. The results of HGALayoutLM on FUNSD and XFUND reach the new state-of-the-art results.

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2407.06886 [pdf, other]

Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI

Authors: Yang Liu, Weixing Chen, Yongjie Bai, Jingzhou Luo, Xinshuai Song, Kaixuan Jiang, Zhida Li, Ganlong Zhao, Junyi Lin, Guanbin Li, Wen Gao, Liang Lin

Abstract: Embodied Artificial Intelligence (Embodied AI) is crucial for achieving Artificial General Intelligence (AGI) and serves as a foundation for various applications that bridge cyberspace and the physical world. Recently, the emergence of Multi-modal Large Models (MLMs) and World Models (WMs) have attracted significant attention due to their remarkable perception, interaction, and reasoning capabilities, making them a promising architecture for the brain of embodied agents. However, there is no comprehensive survey for Embodied AI in the era of MLMs. In this survey, we give a comprehensive exploration of the latest advancements in Embodied AI. Our analysis firstly navigates through the forefront of representative works of embodied robots and simulators, to fully understand the research focuses and their limitations. Then, we analyze four main research targets: 1) embodied perception, 2) embodied interaction, 3) embodied agent, and 4) sim-to-real adaptation, covering the state-of-the-art methods, essential paradigms, and comprehensive datasets. Additionally, we explore the complexities of MLMs in virtual and real embodied agents, highlighting their significance in facilitating interactions in dynamic digital and physical environments. Finally, we summarize the challenges and limitations of embodied AI and discuss their potential future directions. We hope this survey will serve as a foundational reference for the research community and inspire continued innovation. The associated project can be found at https://github.com/HCPLab-SYSU/Embodied_AI_Paper_List.

Submitted 11 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

Comments: The first comprehensive review of Embodied AI in the era of MLMs, 37 pages. We also provide the paper list for Embodied AI: https://github.com/HCPLab-SYSU/Embodied_AI_Paper_List

arXiv:2407.06846 [pdf, other]

SilverCycling: Exploring the Impact of Bike-Based Locomotion on Spatial Orientation for Older Adults in VR

Authors: Qiongyan Chen, Zhiqing Wu, Yucheng Liu, Lei Han, Zisu Li, Ge Lin Kan, Mingming Fan

Abstract: Spatial orientation is essential for people to effectively navigate and interact with the environment in everyday life. With age-related cognitive decline, providing VR locomotion techniques with better spatial orientation performance for older adults becomes important. Such advancements not only make VR more accessible to older adults but also enable them to reap the potential health benefits of VR technology. Natural motion-based locomotion has been shown to be effective in enhancing younger users' performance in VR navigation tasks that require spatial orientation. However, there is a lack of understanding regarding the impact of natural motion-based locomotion on spatial orientation for older adults in VR. To address this gap, we selected the SilverCycling system, a VR bike-based locomotion technique that we developed, as a representative of natural motion-based locomotion, guided by findings from our pilot study. We conducted a user study with 16 older adults to compare SilverCycling with the joystick-based controller. The findings suggest SilverCycling's potential to significantly enhance spatial orientation in the open-road urban environment for older adults, offering a better user experience. Based on our findings, we identify key factors influencing spatial orientation and propose design recommendations to make VR locomotion more accessible and user-friendly for older adults.

Submitted 9 July, 2024; originally announced July 2024.

Comments: 19 pages, 6 figures

arXiv:2407.06833 [pdf, other]

Training-free CryoET Tomogram Segmentation

Authors: Yizhou Zhao, Hengwei Bian, Michael Mu, Mostofa R. Uddin, Zhenyang Li, Xiang Li, Tianyang Wang, Min Xu

Abstract: Cryogenic Electron Tomography (CryoET) is a useful imaging technology in structural biology that is hindered by its need for manual annotations, especially in particle picking. Recent works have endeavored to remedy this issue with few-shot learning or contrastive learning techniques. However, supervised training is still inevitable for them. We instead choose to leverage the power of existing 2D foundation models and present a novel, training-free framework, CryoSAM. In addition to prompt-based single-particle instance segmentation, our approach can automatically search for similar features, facilitating full tomogram semantic segmentation with only one prompt. CryoSAM is composed of two major parts: 1) a prompt-based 3D segmentation system that uses prompts to complete single-particle instance segmentation recursively with Cross-Plane Self-Prompting, and 2) a Hierarchical Feature Matching mechanism that efficiently matches relevant features with extracted tomogram features. They collaborate to enable the segmentation of all particles of one category with just one particle-specific prompt. Our experiments show that CryoSAM outperforms existing works by a significant margin and requires even fewer annotations in particle picking. Further visualizations demonstrate its ability when dealing with full tomogram segmentation for various subcellular structures. Our code is available at: https://github.com/xulabs/aitom

Submitted 7 July, 2024; originally announced July 2024.

Comments: This preprint has not undergone peer review or any post-submission improvements or corrections. The Version of Record of this contribution will be published in MICCAI 2024

arXiv:2407.06767 [pdf, other]

Enhancing Robustness and Security in ISAC Network Design: Leveraging Transmissive Reconfigurable Intelligent Surface with RSMA

Authors: Ziwei Liu, Wen Chen, Qingqing Wu, Zhendong Li, Xusheng Zhu, Qiong Wu, Nan Cheng

Abstract: In this paper, we propose a novel transmissive reconfigurable intelligent surface transceiver-enhanced robust and secure integrated sensing and communication network. A time-division sensing communication mechanism is designed for the scenario, which enables communication and sensing to share wireless resources. To address the interference management problem and hinder eavesdropping, we implement rate-splitting multiple access (RSMA), where the common stream is designed as a useful signal and an artificial noise, while taking into account the imperfect channel state information and modeling the channel for the illegal users in a fine-grained manner as well as giving an upper bound on the error. We introduce the secrecy outage probability and construct an optimization problem with secrecy sum-rate as the objective function to optimize the common stream beamforming matrix, the private stream beamforming matrix and the timeslot duration variable. Due to the coupling of the optimization variables and the infinity of the error set, the proposed problem is a nonconvex optimization problem that cannot be solved directly. In order to address the above challenges, the block coordinate descent-based second-order cone programming algorithm is used to decouple the optimization variables and solve the problem. Specifically, the problem is decoupled into two subproblems concerning the common stream beamforming matrix, the private stream beamforming matrix, and the timeslot duration variable, which are solved by alternating optimization until convergence is reached.
To solve the problem, S-procedure, Bernstein's inequality and successive convex approximation are employed to deal with the objective function and non-convex constraints. Numerical simulation results verify the superiority of the proposed scheme in improving the secrecy energy efficiency and the Cramér-Rao boundary.

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2407.06687 [pdf, other]

Realization of Conditional Operations through Transition Pathway Engineering

Authors: Sheng Zhang, Peng Duan, Yun-Jie Wang, Tian-Le Wang, Peng Wang, Ren-Ze Zhao, Xiao-Yan Yang, Ze-An Zhao, Liang-Liang Guo, Yong Chen, Hai-Feng Zhang, Lei Du, Hao-Ran Tao, Zhi-Fei Li, Yuan Wu, Zhi-Long Jia, Wei-Cheng Kong, Zhao-Yun Chen, Yu-Chun Wu, Guo-Ping Guo

Abstract: In the NISQ era, achieving large-scale quantum computing demands compact circuits to mitigate decoherence and gate error accumulation. Quantum operations with diverse degrees of freedom hold promise for circuit compression, but conventional approaches encounter challenges in simultaneously adjusting multiple parameters. Here, we propose a transition composite gate (TCG) scheme grounded on state-selective transition path engineering, enabling more expressive conditional operations. We experimentally validate a controlled unitary (CU) gate as an example, with independent and continuous parameters. By adjusting the parameters of $\rm X^{12}$ gate, we obtain the CU family with a fidelity range of 95.2% to 99.0% leveraging quantum process tomography (QPT). To demonstrate the capability of circuit compression, we use TCG scheme to prepare 3-qubit Greenberger-Horne-Zeilinger (GHZ) and W states, with the fidelity of 96.77% and 95.72%. TCG can achieve the reduction in circuit depth of about 40% and 44% compared with the use of CZ gates only. Moreover, we show that short-path TCG (SPTCG) can further reduce the state-preparation circuit time cost. The TCG scheme exhibits advantages in certain quantum circuits and shows significant potential for large-scale quantum algorithms.

Submitted 10 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

Comments: 21 pages, 12 figures

arXiv:2407.06642 [pdf, other]

Powerful and Flexible: Personalized Text-to-Image Generation via Reinforcement Learning

Authors: Fanyue Wei, Wei Zeng, Zhenyang Li, Dawei Yin, Lixin Duan, Wen Li

Abstract: Personalized text-to-image models allow users to generate varied styles of images (specified with a sentence) for an object (specified with a set of reference images). While remarkable results have been achieved using diffusion-based generation models, the visual structure and details of the object are often unexpectedly changed during the diffusion process. One major reason is that these diffusion-based approaches typically adopt a simple reconstruction objective during training, which can hardly enforce appropriate structural consistency between the generated and the reference images. To this end, in this paper, we design a novel reinforcement learning framework by utilizing the deterministic policy gradient method for personalized text-to-image generation, with which various objectives, differential or even non-differential, can be easily incorporated to supervise the diffusion models to improve the quality of the generated images. Experimental results on personalized text-to-image generation benchmark datasets demonstrate that our proposed approach outperforms existing state-of-the-art methods by a large margin on visual fidelity while maintaining text-alignment. Our code is available at: \url{https://github.com/wfanyue/DPG-T2I-Personalization}.

Submitted 9 July, 2024; originally announced July 2024.

Comments: Accepted by ECCV 2024

arXiv:2407.06584 [pdf, other]

HiLMa-Res: A General Hierarchical Framework via Residual RL for Combining Quadrupedal Locomotion and Manipulation

Authors: Xiaoyu Huang, Qiayuan Liao, Yiming Ni, Zhongyu Li, Laura Smith, Sergey Levine, Xue Bin Peng, Koushil Sreenath

Abstract: This work presents HiLMa-Res, a hierarchical framework leveraging reinforcement learning to tackle manipulation tasks while performing continuous locomotion using quadrupedal robots. Unlike most previous efforts that focus on solving a specific task, HiLMa-Res is designed to be general for various loco-manipulation tasks that require quadrupedal robots to maintain sustained mobility. The novel design of this framework tackles the challenges of integrating continuous locomotion control and manipulation using legs. It develops an operational space locomotion controller that can track arbitrary robot end-effector (toe) trajectories while walking at different velocities. This controller is designed to be general to different downstream tasks, and therefore, can be utilized in high-level manipulation planning policy to address specific tasks. To demonstrate the versatility of this framework, we utilize HiLMa-Res to tackle several challenging loco-manipulation tasks using a quadrupedal robot in the real world. These tasks span from leveraging state-based policy to vision-based policy, from training purely from the simulation data to learning from real-world data. In these tasks, HiLMa-Res shows better performance than other methods.

Submitted 9 July, 2024; originally announced July 2024.

Comments: IROS 2024

arXiv:2407.06334 [pdf, other]

Double-Ended Synthesis Planning with Goal-Constrained Bidirectional Search

Authors: Kevin Yu, Jihye Roh, Ziang Li, Wenhao Gao, Runzhong Wang, Connor W. Coley

Abstract: Computer-aided synthesis planning (CASP) algorithms have demonstrated expert-level abilities in planning retrosynthetic routes to molecules of low to moderate complexity. However, current search methods assume the sufficiency of reaching arbitrary building blocks, failing to address the common real-world constraint where using specific molecules is desired. To this end, we present a formulation of synthesis planning with starting material constraints. Under this formulation, we propose Double-Ended Synthesis Planning (DESP), a novel CASP algorithm under a bidirectional graph search scheme that interleaves expansions from the target and from the goal starting materials to ensure constraint satisfiability. The search algorithm is guided by a goal-conditioned cost network learned offline from a partially observed hypergraph of valid chemical reactions. We demonstrate the utility of DESP in improving solve rates and reducing the number of search expansions by biasing synthesis planning towards expert goals on multiple new benchmarks. DESP can make use of existing one-step retrosynthesis models, and we anticipate its performance to scale as these one-step model capabilities improve.

Submitted 8 July, 2024; originally announced July 2024.

Comments: 10 pages main, 4 figures

arXiv:2407.06310 [pdf, other]

Homogeneous Speaker Features for On-the-Fly Dysarthric and Elderly Speaker Adaptation

Authors: Mengzhe Geng, Xurong Xie, Jiajun Deng, Zengrui Jin, Guinan Li, Tianzi Wang, Shujie Hu, Zhaoqing Li, Helen Meng, Xunying Liu

Abstract: The application of data-intensive automatic speech recognition (ASR) technologies to dysarthric and elderly adult speech is confronted by their mismatch against healthy and nonaged voices, data scarcity and large speaker-level variability. To this end, this paper proposes two novel data-efficient methods to learn homogeneous dysarthric and elderly speaker-level features for rapid, on-the-fly test-time adaptation of DNN/TDNN and Conformer ASR models. These include: 1) speaker-level variance-regularized spectral basis embedding (VR-SBE) features that exploit a special regularization term to enforce homogeneity of speaker features in adaptation; and 2) feature-based learning hidden unit contributions (f-LHUC) transforms that are conditioned on VR-SBE features. Experiments are conducted on four tasks across two languages: the English UASpeech and TORGO dysarthric speech datasets, the English DementiaBank Pitt and Cantonese JCCOCC MoCA elderly speech corpora. The proposed on-the-fly speaker adaptation techniques consistently outperform baseline iVector and xVector adaptation by statistically significant word or character error rate reductions up to 5.32% absolute (18.57% relative) and batch-mode LHUC speaker adaptation by 2.24% absolute (9.20% relative), while operating with real-time factors speeding up to 33.6 times against xVectors during adaptation.
The efficacy of the proposed adaptation techniques is demonstrated in a comparison against current ASR technologies including SSL pre-trained systems on UASpeech, where our best system produces a state-of-the-art WER of 23.33%. Analyses show VR-SBE features and f-LHUC transforms are insensitive to speaker-level data quantity in test-time adaptation. T-SNE visualization reveals they have stronger speaker-level homogeneity than baseline iVectors, xVectors and batch-mode LHUC transforms.

Submitted 8 July, 2024; originally announced July 2024.

Comments: In submission to IEEE/ACM Transactions on Audio, Speech, and Language Processing

arXiv:2407.05709 [pdf, other]

Heterogeneous window transformer for image denoising

Authors: Chunwei Tian, Menghua Zheng, Chia-Wen Lin, Zhiwu Li, David Zhang

Abstract: Deep networks can usually depend on extracting more structural information to improve denoising results. However, they may ignore correlation between pixels from an image to pursue better denoising performance. Window transformer can use long- and short-distance modeling to interact pixels to address mentioned problem. To make a tradeoff between distance modeling and denoising time, we propose a heterogeneous window transformer (HWformer) for image denoising. HWformer first designs heterogeneous global windows to capture global context information for improving denoising effects. To build a bridge between long and short-distance modeling, global windows are horizontally and vertically shifted to facilitate diversified information without increasing denoising time. To prevent the information loss phenomenon of independent patches, sparse idea is guided a feed-forward network to extract local information of neighboring patches. The proposed HWformer only takes 30% of popular Restormer in terms of denoising time.

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.05676 [pdf, other]

Continuous broadband Rydberg receiver using AC Stark shifts and Floquet States

Authors: Danni Song, Yuechun Jiao, Jinlian Hu, Yuwen Yin, Zhenhua Li, Yunhui He, Jingxu Bai, Jianming Zhao, Suotang Jia

Abstract: We demonstrate continuous broadband microwave receivers based on AC Stark shifts and Floquet states of Rydberg levels in a cesium atomic vapor cell. The resonant transition frequency of two adjacent Rydberg states 78$S_{1/2}$ and 78$P_{1/2}$ is tuned based on the AC Stark effect of a 70 MHz radio frequency (RF) field that is applied outside the vapor cell. Meanwhile, the Rydberg states also exhibit Floquet even-order sidebands that are used to extend the bandwidths further. We achieve microwave electric field measurements over 1.172 GHz of continuous frequency range. The sensitivity of the Rydberg receiver with heterodyne technique in the absence of RF field is 280.2 nV cm$^{-1}$ Hz$^{-1/2}$, while it is dramatically decreased with tuning the resonant transition frequency in the presence of RF field. Surprisingly, the sensitivity can be greatly improved if the microwave field couples the Floquet sideband transition. Achieving continuous-frequency, high-sensitivity microwave detection will promote the application of Rydberg receivers in radar and wireless communication.

Submitted 8 July, 2024; originally announced July 2024.

Comments: 5 pages, 4 figures

arXiv:2407.05600 [pdf, other]

GenArtist: Multimodal LLM as an Agent for Unified Image Generation and Editing

Authors: Zhenyu Wang, Aoxue Li, Zhenguo Li, Xihui Liu

Abstract: Despite the success achieved by existing image generation and editing methods, current models still struggle with complex problems including intricate text prompts, and the absence of verification and self-correction mechanisms makes the generated images unreliable. Meanwhile, a single model tends to specialize in particular tasks and possess the corresponding capabilities, making it inadequate for fulfilling all user requirements. We propose GenArtist, a unified image generation and editing system, coordinated by a multimodal large language model (MLLM) agent. We integrate a comprehensive range of existing models into the tool library and utilize the agent for tool selection and execution. For a complex problem, the MLLM agent decomposes it into simpler sub-problems and constructs a tree structure to systematically plan the procedure of generation, editing, and self-correction with step-by-step verification. By automatically generating missing position-related inputs and incorporating position information, the appropriate tool can be effectively employed to address each sub-problem. Experiments demonstrate that GenArtist can perform various generation and editing tasks, achieving state-of-the-art performance and surpassing existing models such as SDXL and DALL-E 3, as can be seen in Fig. 1. Project page is https://zhenyuw16.github.io/GenArtist_page.

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.05562 [pdf, other]

Focus on the Whole Character: Discriminative Character Modeling for Scene Text Recognition

Authors: Bangbang Zhou, Yadong Qu, Zixiao Wang, Zicheng Li, Boqiang Zhang, Hongtao Xie

Abstract: Recently, scene text recognition (STR) models have shown significant performance improvements. However, existing models still encounter difficulties in recognizing challenging texts that involve factors such as severely distorted and perspective characters. These challenging texts mainly cause two problems: (1) Large Intra-Class Variance. (2) Small Inter-Class Variance. An extremely distorted character may prominently differ visually from other characters within the same category, while the variance between characters from different classes is relatively small. To address the above issues, we propose a novel method that enriches the character features to enhance the discriminability of characters. Firstly, we propose the Character-Aware Constraint Encoder (CACE) with multiple blocks stacked. CACE introduces a decay matrix in each block to explicitly guide the attention region for each token. By continuously employing the decay matrix, CACE enables tokens to perceive morphological information at the character level. Secondly, an Intra-Inter Consistency Loss (I^2CL) is introduced to consider intra-class compactness and inter-class separability at feature space. I^2CL improves the discriminative capability of features by learning a long-term memory unit for each character category. Trained with synthetic data, our model achieves state-of-the-art performance on common benchmarks (94.1% accuracy) and Union14M-Benchmark (61.6% accuracy). Code is available at https://github.com/bang123-box/CFE.

Submitted 7 July, 2024; originally announced July 2024.

Comments: Accepted to IJCAI2024

arXiv:2407.05370 [pdf, other]

Learning Label Refinement and Threshold Adjustment for Imbalanced Semi-Supervised Learning

Authors: Zeju Li, Ying-Qiu Zheng, Chen Chen, Saad Jbabdi

Abstract: Semi-supervised learning (SSL) algorithms struggle to perform well when exposed to imbalanced training data. In this scenario, the generated pseudo-labels can exhibit a bias towards the majority class, and models that employ these pseudo-labels can further amplify this bias. Here we investigate pseudo-labeling strategies for imbalanced SSL including pseudo-label refinement and threshold adjustment, through the lens of statistical analysis. We find that existing SSL algorithms which generate pseudo-labels using heuristic strategies or uncalibrated model confidence are unreliable when imbalanced class distributions bias pseudo-labels. To address this, we introduce SEmi-supervised learning with pseudo-label optimization based on VALidation data (SEVAL) to enhance the quality of pseudo-labelling for imbalanced SSL. We propose to learn refinement and thresholding parameters from a partition of the training dataset in a class-balanced way. SEVAL adapts to specific tasks with improved pseudo-label accuracy and ensures pseudo-label correctness on a per-class basis. Our experiments show that SEVAL surpasses state-of-the-art SSL methods, delivering more accurate and effective pseudo-labels in various imbalanced SSL situations. SEVAL, with its simplicity and flexibility, can enhance various SSL techniques effectively. The code is publicly available at https://github.com/ZerojumpLine/SEVAL.

Submitted 7 July, 2024; originally announced July 2024.

arXiv:2407.05332 [pdf, other]

Experimental investigation of direct non-Hermitian measurement and uncertainty relation towards high-dimensional quantum domain

Authors: Yi-Tao Wang, Zhao-An Wang, Zhi-Peng Li, Xiao-Dong Zeng, Jia-Ming Ren, Wei Liu, Yuan-Ze Yang, Nai-Jie Guo, Lin-Ke Xie, Jun-You Liu, Yu-Hang Ma, Jian-Shun Tang, Chengjie Zhang, Chuan-Feng Li, Guang-Can Guo

Abstract: Non-Hermitian dynamics in quantum systems have unveiled novel phenomena, yet the implementation of valid non-Hermitian quantum measurement remains a challenge, because a universal quantum projective mechanism on the complete but skewed non-Hermitian eigenstates is not explicit in experiment. This limitation hinders the direct acquisition of non-Hermitian observable statistics (e.g., non-Hermitian population dynamics), also constrains investigations of non-Hermitian quantum measurement properties such as uncertainty relation. Here, we address these challenges by presenting a non-Hermitian projective protocol and investigating the non-Hermitian uncertainty relation. We derive the uncertainty relation for pseudo-Hermitian (PH) observables that is generalized beyond the Hermitian ones. We then investigate the projective properties of general quantum states onto complete non-Hermitian eigenvectors, and present a quantum simulating method to apply the valid non-Hermitian projective measurement on a direct-sum dilated space. Subsequently, we experimentally construct a quantum simulator in the quantum optical circuit and realize the 3-dimensional non-Hermitian quantum measurement on the single-photon qutrit. Employing this platform, we explore the uncertainty relation experimentally with different PH metrics. Our non-Hermitian quantum measurement method is state-independent and outputs directly the non-Hermitian quantum projective statistics, paving the way for studies of extensive non-Hermitian observable in quantum domain.

Submitted 7 July, 2024; originally announced July 2024.

Comments: 6 pages, 4 figures

arXiv:2407.05117 [pdf, ps, other]

Search for the baryon number and lepton number violating decays $τ^-\to Λπ^-$ and $τ^-\to \barΛπ^-$ at Belle II

Authors: Belle II Collaboration, I. Adachi, L. Aggarwal, H. Ahmed, H. Aihara, N. Akopov, A. Aloisio, N. Althubiti, N. Anh Ky, D. M. Asner, H. Atmacan, T. Aushev, V. Aushev, M. Aversano, R. Ayad, V. Babu, H. Bae, S. Bahinipati, P. Bambade, Sw. Banerjee, S. Bansal, M. Barrett, J. Baudot, A. Baur, A. Beaubien , et al. (349 additional authors not shown)

Abstract: We present a search for the baryon number $B$ and lepton number $L$ violating decays $τ^- \rightarrow Λπ^-$ and $τ^- \rightarrow \barΛ π^-$ produced from the $e^+e^-\to τ^+τ^-$ process, using a 364 fb$^{-1}$ data sample collected by the Belle II experiment at the SuperKEKB collider. No evidence of signal is found in either decay mode, which have $|Δ(B-L)|$ equal to $2$ and $0$, respectively. Upper limits at 90% credibility level on the branching fractions of $τ^- \rightarrow Λπ^-$ and $τ^- \rightarrow \barΛπ^-$ are determined to be $4.7 \times 10^{-8}$ and $4.3 \times 10^{-8}$, respectively.

Submitted 6 July, 2024; originally announced July 2024.

Comments: 8 pages, 4 figures

Report number: Belle II Preprint 2024-020; KEK Preprint 2024-17

arXiv:2407.04999 [pdf, other]

Rethinking the Effectiveness of Graph Classification Datasets in Benchmarks for Assessing GNNs

Authors: Zhengdao Li, Yong Cao, Kefan Shuai, Yiming Miao, Kai Hwang

Abstract: Graph classification benchmarks, vital for assessing and developing graph neural networks (GNNs), have recently been scrutinized, as simple methods like MLPs have demonstrated comparable performance. This leads to an important question: Do these benchmarks effectively distinguish the advancements of GNNs over other methodologies? If so, how do we quantitatively measure this effectiveness? In response, we first propose an empirical protocol based on a fair benchmarking framework to investigate the performance discrepancy between simple methods and GNNs. We further propose a novel metric to quantify the dataset effectiveness by considering both dataset complexity and model performance. To the best of our knowledge, our work is the first to thoroughly study and provide an explicit definition for dataset effectiveness in the graph learning area. Through testing across 16 real-world datasets, we found our metric to align with existing studies and intuitive assumptions. Finally, we explore the causes behind the low effectiveness of certain datasets by investigating the correlation between intrinsic graph properties and class labels, and we developed a novel technique supporting the correlation-controllable synthetic dataset generation. Our findings shed light on the current understanding of benchmark datasets, and our new platform could fuel the future evolution of graph classification benchmarks.

Submitted 6 July, 2024; originally announced July 2024.

arXiv:2407.04989 [pdf, ps, other]

FPTAS for Holant Problems with Log-Concave Signatures

Authors: Kun He, Zhidan Li, Guoliang Qiu, Chihao Zhang

Abstract: For an integer $b\ge 0$, a $b$-matching in a graph $G=(V,E)$ is a set $S\subseteq E$ such that each vertex $v\in V$ is incident to at most $b$ edges in $S$. We design a fully polynomial-time approximation scheme (FPTAS) for counting the number of $b$-matchings in graphs with bounded degrees. Our FPTAS also applies to a broader family of counting problems, namely Holant problems with log-concave signatures. Our algorithm is based on Moitra's linear programming approach (JACM'19). Using a novel construction called the extended coupling tree, we derandomize the coupling designed by Chen and Gu (SODA'24).

Submitted 6 July, 2024; originally announced July 2024.
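
As a small illustration of the $b$-matching definition in the abstract above, the following brute-force sketch counts $b$-matchings of a tiny graph by enumerating every edge subset. It runs in exponential time and is not the paper's FPTAS; the function name `count_b_matchings` and the two example graphs are assumptions made purely for illustration.

```python
from collections import Counter
from itertools import combinations


def count_b_matchings(edges, b):
    """Count edge subsets S in which every vertex touches at most b edges of S.

    Brute force over all 2^|E| subsets; only suitable for very small graphs.
    """
    count = 0
    for k in range(len(edges) + 1):
        for subset in combinations(edges, k):
            # Tally how many chosen edges are incident to each vertex.
            degree = Counter()
            for u, v in subset:
                degree[u] += 1
                degree[v] += 1
            if all(d <= b for d in degree.values()):
                count += 1
    return count


# Hypothetical example graphs: a triangle and a path on four vertices.
triangle = [(0, 1), (1, 2), (0, 2)]
path = [(0, 1), (1, 2), (2, 3)]

print(count_b_matchings(triangle, 1))  # empty set plus the three single edges
print(count_b_matchings(path, 1))
```

With $b=1$ this reduces to counting ordinary matchings, including the empty one.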

arXiv:2407.04903 [pdf, other]

MMSci: A Multimodal Multi-Discipline Dataset for PhD-Level Scientific Comprehension

Authors: Zekun Li, Xianjun Yang, Kyuri Choi, Wanrong Zhu, Ryan Hsieh, HyeonJung Kim, Jin Hyuk Lim, Sungyoung Ji, Byungju Lee, Xifeng Yan, Linda Ruth Petzold, Stephen D. Wilson, Woosang Lim, William Yang Wang

Abstract: The rapid advancement of Large Language Models (LLMs) and Large Multimodal Models (LMMs) has heightened the demand for AI-based scientific assistants capable of understanding scientific articles and figures. Despite progress, there remains a significant gap in evaluating models' comprehension of professional, graduate-level, and even PhD-level scientific content. Current datasets and benchmarks primarily focus on relatively simple scientific tasks and figures, lacking comprehensive assessments across diverse advanced scientific disciplines. To bridge this gap, we collected a multimodal, multidisciplinary dataset from open-access scientific articles published in Nature Communications journals. This dataset spans 72 scientific disciplines, ensuring both diversity and quality. We created benchmarks with various tasks and settings to comprehensively evaluate LMMs' capabilities in understanding scientific figures and content. Our evaluation revealed that these tasks are highly challenging: many open-source models struggled significantly, and even GPT-4V and GPT-4o faced difficulties. We also explored using our dataset as training resources by constructing visual instruction-following data, enabling the 7B LLaVA model to achieve performance comparable to GPT-4V/o on our benchmark. Additionally, we investigated the use of our interleaved article texts and figure images for pre-training LMMs, resulting in improvements on the material generation task. The source dataset, including articles, figures, constructed benchmarks, and visual instruction-following data, is open-sourced.

Submitted 5 July, 2024; originally announced July 2024.

Comments: Code and data are available at https://github.com/Leezekun/MMSci

arXiv:2407.04711 [pdf, other]

MetaFruit Meets Foundation Models: Leveraging a Comprehensive Multi-Fruit Dataset for Advancing Agricultural Foundation Models

Authors: Jiajia Li, Kyle Lammers, Xunyuan Yin, Xiang Yin, Long He, Renfu Lu, Zhaojian Li

Abstract: Fruit harvesting poses a significant labor and financial burden for the industry, highlighting the critical need for advancements in robotic harvesting solutions. Machine vision-based fruit detection has been recognized as a crucial component for robust identification of fruits to guide robotic manipulation. Despite considerable progress in leveraging deep learning and machine learning techniques for fruit detection, a common shortfall is the inability to swiftly extend the developed models across different orchards and/or various fruit species. Additionally, the limited availability of pertinent data further compounds these challenges. In this work, we introduce MetaFruit, the largest publicly available multi-class fruit dataset, comprising 4,248 images and 248,015 manually labeled instances across diverse U.S. orchards. Furthermore, this study proposes an innovative open-set fruit detection system leveraging advanced Vision Foundation Models (VFMs) for fruit detection that can adeptly identify a wide array of fruit types under varying orchard conditions. This system not only demonstrates remarkable adaptability in learning from minimal data through few-shot learning but also shows the ability to interpret human instructions for subtle detection tasks. The performance of the developed foundation model is comprehensively evaluated using several metrics, which outperforms the existing state-of-the-art algorithms in both our MetaFruit dataset and other open-sourced fruit datasets, thereby setting a new benchmark in the field of agricultural technology and robotic harvesting. The MetaFruit dataset and detection framework are open-sourced to foster future research in vision-based fruit harvesting, marking a significant stride toward addressing the urgent needs of the agricultural sector.

Submitted 13 May, 2024; originally announced July 2024.

Comments: 14 pages, 5 figures, 7 tables

arXiv:2407.04675 [pdf, other]

Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition

Authors: Ye Bai, Jingping Chen, Jitong Chen, Wei Chen, Zhuo Chen, Chuang Ding, Linhao Dong, Qianqian Dong, Yujiao Du, Kepan Gao, Lu Gao, Yi Guo, Minglun Han, Ting Han, Wenchao Hu, Xinying Hu, Yuxiang Hu, Deyu Hua, Lu Huang, Mingkun Huang, Youjia Huang, Jishuo Jin, Fanliu Kong, Zongwei Lan, Tianyu Li , et al. (30 additional authors not shown)

Abstract: Modern automatic speech recognition (ASR) models are required to accurately transcribe diverse speech signals (from different domains, languages, accents, etc.) given the specific contextual information in various application scenarios. Classic end-to-end models fused with extra language models perform well, but mainly in data matching scenarios and are gradually approaching a bottleneck. In this work, we introduce Seed-ASR, a large language model (LLM) based speech recognition model. Seed-ASR is developed based on the framework of audio conditioned LLM (AcLLM), leveraging the capabilities of LLMs by inputting continuous speech representations together with contextual information into the LLM. Through stage-wise large-scale training and the elicitation of context-aware capabilities in LLM, Seed-ASR demonstrates significant improvement over end-to-end models on comprehensive evaluation sets, including multiple domains, accents/dialects and languages. Additionally, Seed-ASR can be further deployed to support specific needs in various scenarios without requiring extra language models. Compared to recently released large ASR models, Seed-ASR achieves 10%-40% reduction in word (or character, for Chinese) error rates on Chinese and English public test sets, further demonstrating its powerful performance.

Submitted 10 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

arXiv:2407.04656 [pdf, other]

Lazarus: Resilient and Elastic Training of Mixture-of-Experts Models with Adaptive Expert Placement

Authors: Yongji Wu, Wenjie Qu, Tianyang Tao, Zhuang Wang, Wei Bai, Zhuohao Li, Yuan Tian, Jiaheng Zhang, Matthew Lentz, Danyang Zhuo

Abstract: Sparsely-activated Mixture-of-Experts (MoE) architecture has increasingly been adopted to further scale large language models (LLMs) due to its sub-linear scaling for computation costs. However, frequent failures still pose significant challenges as training scales. The cost of even a single failure is significant, as all GPUs need to wait idle until the failure is resolved, potentially losing considerable training progress as training has to restart from checkpoints. Existing solutions for efficient fault-tolerant training either lack elasticity or rely on building resiliency into pipeline parallelism, which cannot be applied to MoE models due to the expert parallelism strategy adopted by the MoE architecture. We present Lazarus, a system for resilient and elastic training of MoE models. Lazarus adaptively allocates expert replicas to address the inherent imbalance in expert workload and speeds up training, while a provably optimal expert placement algorithm is developed to maximize the probability of recovery upon failures. Through adaptive expert placement and a flexible token dispatcher, Lazarus can also fully utilize all available nodes after failures, leaving no GPU idle. Our evaluation shows that Lazarus outperforms existing MoE training systems by up to 5.7x under frequent node failures and 3.4x on a real spot instance trace.

Submitted 5 July, 2024; originally announced July 2024.

arXiv:2407.04467 [pdf, other]

Are Large Language Models Strategic Decision Makers? A Study of Performance and Bias in Two-Player Non-Zero-Sum Games

Authors: Nathan Herr, Fernando Acero, Roberta Raileanu, María Pérez-Ortiz, Zhibin Li

Abstract: Large Language Models (LLMs) have been increasingly used in real-world settings, yet their strategic abilities remain largely unexplored. Game theory provides a good framework for assessing the decision-making abilities of LLMs in interactions with other agents. Although prior studies have shown that LLMs can solve these tasks with carefully curated prompts, they fail when the problem setting or prompt changes. In this work we investigate LLMs' behaviour in strategic games, Stag Hunt and Prisoner's Dilemma, analyzing performance variations under different settings and prompts. Our results show that the tested state-of-the-art LLMs exhibit at least one of the following systematic biases: (1) positional bias, (2) payoff bias, or (3) behavioural bias. Subsequently, we observed that the LLMs' performance drops when the game configuration is misaligned with the affecting biases. Performance is assessed based on the selection of the correct action, one which agrees with the prompted preferred behaviours of both players. Alignment refers to whether the LLM's bias aligns with the correct action. For example, GPT-4o's average performance drops by 34% when misaligned. Additionally, the current trend of "bigger and newer is better" does not hold for the above, where GPT-4o (the current best-performing LLM) suffers the most substantial performance drop. Lastly, we note that while chain-of-thought prompting does reduce the effect of the biases on most models, it is far from solving the problem at the fundamental level.

Submitted 5 July, 2024; originally announced July 2024.

Comments: 8 pages (19 with appendix), 6 figures in the main body (4 in the appendix), 4 tables in the main body

arXiv:2407.04252 [pdf, other]

Comparing metallicity correlations in nearby non-AGN and AGN-host galaxies

Authors: Song-lin Li, Zefeng Li, Emily Wisnioski, Mark R. Krumholz, Sebastián F. Sánchez

Abstract: The gas-phase metallicity distribution within galaxies records critical information about galactic evolution. In this work we investigate how active galactic nuclei (AGN) influence this distribution by measuring the two-point correlation functions of gas-phase metallicity in 95 non-AGN and 37 AGN-host galaxies from the Calar Alto Legacy Integral Field spectroscopy Area integral field spectrographic survey. We measure metallicity using a novel Bayesian method that properly includes both stellar and AGN contributions to emission line fluxes and allows us to measure metallicities in both AGN-host and non-AGN galaxies in a single, consistent framework. We find that the two-point correlation functions of both AGN-host and non-AGN galaxies are well-fit by a simple injection-diffusion model, and that the correlation lengths $l_\mathrm{corr}$ we derive for the non-AGN galaxies are reasonably consistent with those obtained in earlier work. The AGN-host galaxies generally have smaller $l_\mathrm{corr}$ than non-AGN galaxies at fixed stellar mass, but similar $l_\mathrm{corr}$ at fixed star formation rate (SFR), suggesting that the primary effect of hosting an AGN in this sample is a reduction in SFR at fixed stellar mass, and that this in turn suppresses the correlation length. Our findings further indicate that, while both SFR and stellar mass are positively correlated with metallicity correlation length $l_\mathrm{corr}$, the former is more fundamental, implying that fluctuations in the metallicity distribution within galaxies are driven more by short-term responses to physical processes such as star formation that can change much faster than a Hubble time.

Submitted 5 July, 2024; originally announced July 2024.

Comments: 13 pages, 10 figures, submitted to MNRAS

arXiv:2407.04206 [pdf, other]

Computational Graph Representation of Equations System Constructors in Hierarchical Circuit Simulation

Authors: Zichao Long, Lin Li, Lei Han, Xianglong Meng, Chongjun Ding, Ruiyan Li, Wu Jiang, Fuchen Ding, Jiaqing Yue, Zhichao Li, Yisheng Hu, Ding Li, Heng Liao

Abstract: Equations system constructors of hierarchical circuits play a central role in device modeling, nonlinear equations solving, and circuit design automation. However, existing constructors present limitations in applications to different extents. For example, the costs of developing and reusing device models -- especially coarse-grained equivalent models of circuit modules -- remain high while parameter sensitivity analysis is complex and inefficient. Inspired by differentiable programming and leveraging the ecosystem benefits of open-source software, we propose an equations system constructor using the computational graph representation, along with its JSON format netlist, to address these limitations. This representation allows for runtime dependencies between signals and subcircuit/device parameters. The proposed method streamlines the model development process and facilitates end-to-end computation of gradients of equations remainders with respect to parameters. This paper discusses in detail the overarching concept of hierarchical subcircuit/device decomposition and nested invocation by drawing parallels to functions in programming languages, and introduces rules for parameters passing and gradient propagation across hierarchical circuit modules. The presented numerical examples, including (1) an uncoupled CMOS model representation using "equivalent circuit decomposition+dynamic parameters" and (2) operational amplifier (OpAmp) auto device sizing, have demonstrated that the proposed method supports circuit simulation and design and particularly subcircuit modeling with improved efficiency, simplicity, and decoupling compared to existing techniques.

Submitted 4 July, 2024; originally announced July 2024.

arXiv:2407.04121 [pdf, other]

Hallucination Detection: Robustly Discerning Reliable Answers in Large Language Models

Authors: Yuyan Chen, Qiang Fu, Yichen Yuan, Zhihao Wen, Ge Fan, Dayiheng Liu, Dongmei Zhang, Zhixu Li, Yanghua Xiao

Abstract: Large Language Models (LLMs) have gained widespread adoption in various natural language processing tasks, including question answering and dialogue systems. However, a major drawback of LLMs is the issue of hallucination, where they generate unfaithful or inconsistent content that deviates from the input source, leading to severe consequences. In this paper, we propose a robust discriminator named RelD to effectively detect hallucination in LLMs' generated answers. RelD is trained on the constructed RelQA, a bilingual question-answering dialogue dataset along with answers generated by LLMs and a comprehensive set of metrics. Our experimental results demonstrate that the proposed RelD successfully detects hallucination in the answers generated by diverse LLMs. Moreover, it performs well in distinguishing hallucination in LLMs' generated answers from both in-distribution and out-of-distribution datasets. Additionally, we also conduct a thorough analysis of the types of hallucinations that occur and present valuable insights. This research significantly contributes to the detection of reliable answers generated by LLMs and holds noteworthy implications for mitigating hallucination in the future work.

Submitted 4 July, 2024; originally announced July 2024.

Comments: Accepted to CIKM 2023 (Long Paper)

arXiv:2407.04118 [pdf, other]

MAPO: Boosting Large Language Model Performance with Model-Adaptive Prompt Optimization

Authors: Yuyan Chen, Zhihao Wen, Ge Fan, Zhengyu Chen, Wei Wu, Dayiheng Liu, Zhixu Li, Bang Liu, Yanghua Xiao

Abstract: Prompt engineering, as an efficient and effective way to leverage Large Language Models (LLM), has drawn a lot of attention from the research community. The existing research primarily emphasizes the importance of adapting prompts to specific tasks, rather than specific LLMs. However, a good prompt is not solely defined by its wording, but also binds to the nature of the LLM in question. In this work, we first quantitatively demonstrate that different prompts should be adapted to different LLMs to enhance their capabilities across various downstream tasks in NLP. Then we novelly propose a model-adaptive prompt optimizer (MAPO) method that optimizes the original prompts for each specific LLM in downstream tasks. Extensive experiments indicate that the proposed method can effectively refine prompts for an LLM, leading to significant improvements over various downstream tasks.

Submitted 4 July, 2024; originally announced July 2024.

Comments: Accepted to EMNLP 2023 (Findings)

arXiv:2407.04105 [pdf, other]

Can Pre-trained Language Models Understand Chinese Humor?

Authors: Yuyan Chen, Zhixu Li, Jiaqing Liang, Yanghua Xiao, Bang Liu, Yunwen Chen

Abstract: Humor understanding is an important and challenging research topic in natural language processing. With the growing popularity of pre-trained language models (PLMs), some recent work has made preliminary attempts to adopt PLMs for humor recognition and generation. However, these simple attempts do not substantially answer the question of whether PLMs are capable of humor understanding. This paper is the first work that systematically investigates the humor understanding ability of PLMs. For this purpose, a comprehensive framework with three evaluation steps and four evaluation tasks is designed. We also construct a comprehensive Chinese humor dataset, which can fully meet all the data requirements of the proposed evaluation framework. Our empirical study on the Chinese humor dataset yields some valuable observations, which are of great guiding value for future optimization of PLMs in humor understanding and generation.

Submitted 4 July, 2024; originally announced July 2024.

Comments: Accepted to WSDM 2022

Showing 1–50 of 10,552 results for author: Li, Z