A Factuality and Diversity Reconciled Decoding Method for Knowledge-Grounded Dialogue Generation

Authors: Chenxu Yang, Zheng Lin, Chong Tian, Liang Pang, Lanrui Wang, Zhengyang Tong, Qirong Ho, Yanan Cao, Weiping Wang

Abstract: Grounding external knowledge can enhance the factuality of responses in dialogue generation. However, excessive emphasis on it might result in the lack of engaging and diverse expressions. Through the introduction of randomness in sampling, current approaches can increase the diversity. Nevertheless, such sampling method could undermine the factuality in dialogue generation. In this study, to discover a solution for advancing creativity without relying on questionable randomness and to subtly reconcile the factuality and diversity within the source-grounded paradigm, a novel method named DoGe is proposed. DoGe can dynamically alternate between the utilization of internal parameter knowledge and external source knowledge based on the model's factual confidence. Extensive experiments on three widely-used datasets show that DoGe can not only enhance response diversity but also maintain factuality, and it significantly surpasses other various decoding strategy baselines.

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.04842 [pdf, other]

MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?

Authors: Zhaorun Chen, Yichao Du, Zichen Wen, Yiyang Zhou, Chenhang Cui, Zhenzhen Weng, Haoqin Tu, Chaoqi Wang, Zhengwei Tong, Qinglan Huang, Canyu Chen, Qinghao Ye, Zhihong Zhu, Yuqing Zhang, Jiawei Zhou, Zhuokai Zhao, Rafael Rafailov, Chelsea Finn, Huaxiu Yao

Abstract: While text-to-image models like DALLE-3 and Stable Diffusion are rapidly proliferating, they often encounter challenges such as hallucination, bias, and the production of unsafe, low-quality output. To effectively address these issues, it is crucial to align these models with desired behaviors based on feedback from a multimodal judge. Despite their significance, current multimodal judges frequently undergo inadequate evaluation of their capabilities and limitations, potentially leading to misalignment and unsafe fine-tuning outcomes. To address this issue, we introduce MJ-Bench, a novel benchmark which incorporates a comprehensive preference dataset to evaluate multimodal judges in providing feedback for image generation models across four key perspectives: alignment, safety, image quality, and bias. Specifically, we evaluate a large variety of multimodal judges including smaller-sized CLIP-based scoring models, open-source VLMs (e.g. LLaVA family), and close-source VLMs (e.g. GPT-4o, Claude 3) on each decomposed subcategory of our preference dataset. Experiments reveal that close-source VLMs generally provide better feedback, with GPT-4o outperforming other judges on average. Compared with open-source VLMs, smaller-sized scoring models can provide better feedback regarding text-image alignment and image quality, while VLMs provide more accurate feedback regarding safety and generation bias due to their stronger reasoning capabilities. Further studies in feedback scale reveal that VLM judges can generally provide more accurate and stable feedback in natural language (Likert-scale) than numerical scales. Notably, human evaluations on end-to-end fine-tuned models using separate feedback from these multimodal judges provide similar conclusions, further confirming the effectiveness of MJ-Bench. All data, code, models are available at https://huggingface.co/MJ-Bench.

Submitted 5 July, 2024; originally announced July 2024.

Comments: 42 pages, 13 figures, 33 tables

arXiv:2406.16177 [pdf, other]

Flowy: Supporting UX Design Decisions Through AI-Driven Pattern Annotation in Multi-Screen User Flows

Authors: Yuwen Lu, Ziang Tong, Qinyi Zhao, Yewon Oh, Bryan Wang, Toby Jia-Jun Li

Abstract: Many recent AI-powered UX design tools focus on generating individual static UI screens from natural language. However, they overlook the crucial aspect of interactions and user experiences across multiple screens. Through formative studies with UX professionals, we identified limitations of these tools in supporting realistic UX design workflows. In response, we designed and developed Flowy, an app that augments designers' information foraging process in ideation by supplementing specific user flow examples with distilled design pattern knowledge. Flowy utilizes large multimodal AI models and a high-quality user flow dataset to help designers identify and understand relevant abstract design patterns in the design space for multi-screen user flows. Our user study with professional UX designers demonstrates how Flowy supports realistic UX tasks. Our design considerations in Flowy, such as representations with appropriate levels of abstraction and assisted navigation through the solution space, are generalizable to other creative tasks and embody a human-centered, intelligence augmentation approach to using AI in UX design.

Submitted 23 June, 2024; originally announced June 2024.

arXiv:2406.13078 [pdf]

A universal bioluminescence tomography system for pre-clinical image-guided radiotherapy research

Authors: Zhishen Tong, Zijian Deng, Xiangkun Xu, Ciara Newman, Xun Jia, Yuncheng Zhong, Merle Reinhart, Paul Tsouchlos, Tim Devling, Hamid Dehghani, Iulian Iordachita, Debabrata Saha, John W. Wong, Ken Kang-Hsin Wang

Abstract: CBCT-guided small animal irradiators encounter challenges in localizing soft-tissue targets due to low imaging contrast. Bioluminescence tomography (BLT) offers a promising solution, but BLT systems have largely remained in laboratory development, limiting accessibility for researchers. In this work, we develop a universal, commercial-grade BLT-guided system (MuriGlo) designed to seamlessly integrate with commercial irradiators and empower researchers for translational studies. We demonstrate its capabilities in supporting in vitro and in vivo studies. The MuriGlo comprises a detachable mouse bed, thermostatic control, mirrors, filters, and CCD, enabling multi-projection and multi-spectral imaging. We show that the thermostatic control effectively sustains animal temperature at 37°C throughout imaging, and quantify that the system can detect as few as 61 GL261-AkaLuc cells in vitro. To illustrate how the MuriGlo can be utilized for in vivo image-guided research, we present 3 strategies, BLT-guided 5-arc, 2-field box, and BLI-guided single-beam, ranging from complicated high-conformal to simplest high-throughput plans. The high conformal BLT-guided 5-arc plan fully covers the gross tumor volume (GTV) at prescribed dose with minimal normal tissue exposure (3.9%), while the simplified, high-throughput BLT-guided 2-field box achieves 100% GTV coverage but results in higher normal tissue exposure (13.1%). Moreover, we demonstrate that the localization accuracy of MuriGlo for both widely-used SARRP and SmART irradiators is within 1 mm, and the tumor coverage reaches over 97% with 0.75 mm margin. The universal BLT-guided system offers seamless integration with commercial irradiators, achieving comparable localization accuracy, and is expected to support high-precision radiation research.

Submitted 27 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.12874 [pdf, other]

The Design, Implementation, and Performance of the LZ Calibration Systems

Authors: J. Aalbers, D. S. Akerib, A. K. Al Musalhi, F. Alder, C. S. Amarasinghe, A. Ames, T. J. Anderson, N. Angelides, H. M. Araújo, J. E. Armstrong, M. Arthurs, A. Baker, S. Balashov, J. Bang, E. E. Barillier, J. W. Bargemann, K. Beattie, T. Benson, A. Bhatti, A. Biekert, T. P. Biesiadzinski, H. J. Birch, E. Bishop, G. M. Blockinger, B. Boxer , et al. (179 additional authors not shown)

Abstract: LUX-ZEPLIN (LZ) is a tonne-scale experiment searching for direct dark matter interactions and other rare events. It is located at the Sanford Underground Research Facility (SURF) in Lead, South Dakota, USA. The core of the LZ detector is a dual-phase xenon time projection chamber (TPC), designed with the primary goal of detecting Weakly Interacting Massive Particles (WIMPs) via their induced low energy nuclear recoils. Surrounding the TPC, two veto detectors immersed in an ultra-pure water tank enable reducing background events to enhance the discovery potential. Intricate calibration systems are purposely designed to precisely understand the responses of these three detector volumes to various types of particle interactions and to demonstrate LZ's ability to discriminate between signals and backgrounds. In this paper, we present a comprehensive discussion of the key features, requirements, and performance of the LZ calibration systems, which play a crucial role in enabling LZ's WIMP-search and its broad science program. The thorough description of these calibration systems, with an emphasis on their novel aspects, is valuable for future calibration efforts in direct dark matter and other rare-event search experiments.

Submitted 20 June, 2024; v1 submitted 2 May, 2024; originally announced June 2024.

arXiv:2406.12187 [pdf, other]

Diverse Responses in Lattice Thermal Conductivity of $n$-type/$p$-type Semiconductors Driven by Asymmetric Electron-Phonon Interactions

Authors: Jianshi Sun, Shouhang Li, Zhen Tong, Cheng Shao, Han Xie, Meng An, Chuang Zhang, Xiongfei Zhu, Chen Huang, Yucheng Xiong, Xiangjun Liu

Abstract: Accurately assessing the impact of electron-phonon interaction (EPI) on the lattice thermal conductivity of semiconductors is crucial for the thermal management of electronic devices and a unified physical understanding of this issue is highly desired. In this work, we predict the lattice thermal conductivities of typical direct and indirect bandgap semiconductors accounting for EPI based on mode-level first-principles calculations. It is found that EPI has a larger effect on the lattice thermal conductivity of $p$-type doping compared to $n$-type doping in the same semiconductor at high charge carrier concentrations. The stronger EPI in $p$-type doping is attributed to the relatively higher electron density of states caused by the relatively larger $p$-orbital component. Furthermore, EPI has a stronger influence on the lattice thermal conductivity of $n$-type indirect bandgap semiconductors than $n$-type direct bandgap semiconductors. This is attributed to the relatively lower electron density of states in direct bandgap semiconductors stemming from the $s$-orbital component. This work reveals that there exist diverse responses in lattice thermal conductivity of $n$-type/$p$-type semiconductors, which can be attributed to asymmetric EPIs.

Submitted 17 June, 2024; originally announced June 2024.

Comments: 8 pages, 5 figures

arXiv:2406.02874 [pdf, other]

Giant enhancement of hole mobility for 4H-silicon carbide through suppressing interband electron-phonon scattering

Authors: Jianshi Sun, Shouhang Li, Zhen Tong, Cheng Shao, Meng An, Xiongfei Zhu, Chuang Zhang, Xiangchuan Chen, Yucheng Xiong, Thomas Frauenheim, Xiangjun Liu

Abstract: 4H-Silicon Carbide (4H-SiC) possesses a high Baliga figure of merit, making it a promising material for power electronics. However, its applications are limited by its low hole mobility. Herein, we found that the hole mobility of 4H-SiC is mainly limited by the strong interband electron-phonon scattering using mode-level first-principles calculations. Our research indicates that applying compressive strain can reverse the sign of crystal-field splitting and change the ordering of electron bands close to the valence band maximum. Therefore, the interband electron-phonon scattering is severely suppressed, and the out-of-plane hole mobility of 4H-SiC can be enhanced by 200% with 2% uniaxial compressive strain applied. This work provides new insights into the electron transport mechanisms in semiconductors and suggests a strategy to improve hole mobility that could be applied to other semiconductors with hexagonal crystalline geometries.

Submitted 20 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

Comments: 22 pages, 4 figures

arXiv:2406.02441 [pdf, other]

Probing the Scalar WIMP-Pion Coupling with the first LUX-ZEPLIN data

Authors: J. Aalbers, D. S. Akerib, A. K. Al Musalhi, F. Alder, C. S. Amarasinghe, A. Ames, T. J. Anderson, N. Angelides, H. M. Araújo, J. E. Armstrong, M. Arthurs, A. Baker, S. Balashov, J. Bang, E. E. Barillier, J. W. Bargemann, K. Beattie, T. Benson, A. Bhatti, A. Biekert, T. P. Biesiadzinski, H. J. Birch, E. J. Bishop, G. M. Blockinger, B. Boxer , et al. (178 additional authors not shown)

Abstract: Weakly interacting massive particles (WIMPs) may interact with a virtual pion that is exchanged between nucleons. This interaction channel is important to consider in models where the spin-independent isoscalar channel is suppressed. Using data from the first science run of the LUX-ZEPLIN dark matter experiment, containing 60 live days of data in a 5.5~tonne fiducial mass of liquid xenon, we report the results on a search for WIMP-pion interactions. We observe no significant excess and set an upper limit of $1.5\times10^{-46}$~cm$^2$ at a 90\% confidence level for a WIMP mass of 33~GeV/c$^2$ for this interaction.

Submitted 4 June, 2024; originally announced June 2024.

arXiv:2405.18910 [pdf, other]

Predicting Parking Availability in Singapore with Cross-Domain Data: A New Dataset and A Data-Driven Approach

Authors: Huaiwu Zhang, Yutong Xia, Siru Zhong, Kun Wang, Zekun Tong, Qingsong Wen, Roger Zimmermann, Yuxuan Liang

Abstract: The increasing number of vehicles highlights the need for efficient parking space management. Predicting real-time Parking Availability (PA) can help mitigate traffic congestion and the corresponding social problems, which is a pressing issue in densely populated cities like Singapore. In this study, we aim to collectively predict future PA across Singapore with complex factors from various domains. The contributions in this paper are listed as follows: (1) A New Dataset: We introduce the \texttt{SINPA} dataset, containing a year's worth of PA data from 1,687 parking lots in Singapore, enriched with various spatial and temporal factors. (2) A Data-Driven Approach: We present DeepPA, a novel deep-learning framework, to collectively and efficiently predict future PA across thousands of parking lots. (3) Extensive Experiments and Deployment: DeepPA demonstrates a 9.2% reduction in prediction error for up to 3-hour forecasts compared to existing advanced models. Furthermore, we implement DeepPA in a practical web-based platform to provide real-time PA predictions to aid drivers and inform urban planning for the governors in Singapore. We release the dataset and source code at https://github.com/yoshall/SINPA.

Submitted 29 May, 2024; originally announced May 2024.

Comments: Accepted by IJCAI 2024 (Multi-Year Track On AI And Social Good with ~20% acceptance rate)

arXiv:2405.14732 [pdf, other]

The Data Acquisition System of the LZ Dark Matter Detector: FADR

Authors: J. Aalbers, D. S. Akerib, A. K. Al Musalhi, F. Alder, C. S. Amarasinghe, A. Ames, T. J. Anderson, N. Angelides, H. M. Araújo, J. E. Armstrong, M. Arthurs, A. Baker, S. Balashov, J. Bang, E. E. Barillier, J. W. Bargemann, K. Beattie, T. Benson, A. Bhatti, A. Biekert, T. P. Biesiadzinski, H. J. Birch, E. Bishop, G. M. Blockinger, B. Boxer , et al. (190 additional authors not shown)

Abstract: The Data Acquisition System (DAQ) for the LUX-ZEPLIN (LZ) dark matter detector is described. The signals from 745 PMTs, distributed across three subsystems, are sampled with 100-MHz 32-channel digitizers (DDC-32s). A basic waveform analysis is carried out on the on-board Field Programmable Gate Arrays (FPGAs) to extract information about the observed scintillation and electroluminescence signals. This information is used to determine if the digitized waveforms should be preserved for offline analysis. The system is designed around the Kintex-7 FPGA. In addition to digitizing the PMT signals and providing basic event selection in real time, the flexibility provided by the use of FPGAs allows us to monitor the performance of the detector and the DAQ in parallel to normal data acquisition. The hardware and software/firmware of this FPGA-based Architecture for Data acquisition and Realtime monitoring (FADR) are discussed and performance measurements are described.

Submitted 23 May, 2024; originally announced May 2024.

Comments: 18 pages, 24 figures

arXiv:2405.02866 [pdf, other]

Universal exponential pointwise convergence for weighted multiple ergodic averages over $ \mathbb{T}^\infty $

Authors: Zhicheng Tong, Yong Li

Abstract: By employing an accelerated weighting method, we establish arbitrary polynomial and exponential pointwise convergence for multiple ergodic averages under general conditions in both discrete and continuous settings, involving quasi-periodic and almost periodic cases, which breaks the well-known slow convergence rate observed in classical ergodic theory. We also present joint Diophantine rotations as explicit applications. In particular, after excluding a zero-measure set of nearly rational rotations, we demonstrate that the pointwise exponential convergence is universal via analytic observables, even when multiplicatively averaging over the infinite-dimensional torus $ \mathbb{T}^\infty $, utilizing a novel truncated approach. Moreover, by constructing counterexamples concerning multiple ergodicity, we highlight the irremovability of the joint nonresonance and establish the optimality of our weighting method in preserving rapid convergence. We also provide numerical simulations with analysis to further illustrate our results.

Submitted 10 June, 2024; v1 submitted 5 May, 2024; originally announced May 2024.

Comments: 36 pages. Comments are welcome!

MSC Class: 37A25; 37A45

arXiv:2405.01864 [pdf, ps, other]

Full-dimensional KAM torus with frequency-preserving in infinite-dimensional Hamiltonian systems

Authors: Zhicheng Tong, Yong Li

Abstract: In this paper, we present two infinite-dimensional KAM theorems with frequency-preserving for a nonresonant frequency of Diophantine type or even weaker. To be more precise, under a nondegenerate condition for an infinite-dimensional Hamiltonian system, we prove the persistence of a full-dimensional KAM torus with the specified frequency independent of any spectral asymptotics, by means of the generating function method. This appears to be the first Kolmogorov type result in the infinite-dimensional context. As a direct application, we provide a positive answer to Bourgain's conjecture: full-dimensional invariant tori for 1D nonlinear Schrödinger equations do exist.

Submitted 3 May, 2024; originally announced May 2024.

Comments: 30 pages

MSC Class: 37K55; 35Q55

arXiv:2404.17666 [pdf, other]

Constraints On Covariant WIMP-Nucleon Effective Field Theory Interactions from the First Science Run of the LUX-ZEPLIN Experiment

Authors: J. Aalbers, D. S. Akerib, A. K. Al Musalhi, F. Alder, C. S. Amarasinghe, A. Ames, T. J. Anderson, N. Angelides, H. M. Araújo, J. E. Armstrong, M. Arthurs, A. Baker, S. Balashov, J. Bang, E. E. Barillier, J. W. Bargemann, K. Beattie, T. Benson, A. Bhatti, A. Biekert, T. P. Biesiadzinski, H. J. Birch, E. J. Bishop, G. M. Blockinger, B. Boxer , et al. (179 additional authors not shown)

Abstract: The first science run of the LUX-ZEPLIN (LZ) experiment, a dual-phase xenon time projection chamber operating in the Sanford Underground Research Facility in South Dakota, USA, has reported leading limits on spin-independent WIMP-nucleon interactions and interactions described from a non-relativistic effective field theory (NREFT). Using the same 5.5~t fiducial mass and 60 live days of exposure we report on the results of a relativistic extension to the NREFT. We present constraints on couplings from covariant interactions arising from the coupling of vector, axial currents, and electric dipole moments of the nucleon to the magnetic and electric dipole moments of the WIMP which cannot be described by recasting previous results described by an NREFT. Using a profile-likelihood ratio analysis, in an energy region from 0~keV$_\text{nr}$ to 270~keV$_\text{nr}$, we report 90% confidence level exclusion limits on the coupling strength of five interactions in both the isoscalar and isovector bases.

Submitted 26 April, 2024; originally announced April 2024.

Comments: 7 pages, 4 figures

arXiv:2404.14464 [pdf, other]

Tree of Reviews: A Tree-based Dynamic Iterative Retrieval Framework for Multi-hop Question Answering

Authors: Li Jiapeng, Liu Runze, Li Yabo, Zhou Tong, Li Mingling, Chen Xiang

Abstract: Multi-hop question answering is a knowledge-intensive complex problem. Large Language Models (LLMs) use their Chain of Thoughts (CoT) capability to reason complex problems step by step, and retrieval-augmentation can effectively alleviate factual errors caused by outdated and unknown knowledge in LLMs. Recent works have introduced retrieval-augmentation in the CoT reasoning to solve multi-hop question answering. However, these chain methods have the following problems: 1) Retrieved irrelevant paragraphs may mislead the reasoning; 2) An error in the chain structure may lead to a cascade of errors. In this paper, we propose a dynamic retrieval framework called Tree of Reviews (ToR), where the root node is the question, and the other nodes are paragraphs from retrieval, extending different reasoning paths from the root node to other nodes. Our framework dynamically decides to initiate a new search, reject, or accept based on the paragraphs on the reasoning paths. Compared to related work, we introduce a tree structure to handle each retrieved paragraph separately, alleviating the misleading effect of irrelevant paragraphs on the reasoning path; the diversity of reasoning path extension reduces the impact of a single reasoning error on the whole. We conducted experiments on three different multi-hop question answering datasets. The results show that compared to the baseline methods, ToR achieves state-of-the-art performance in both retrieval and response generation. In addition, we propose two tree-based search optimization strategies, pruning and effective expansion, to reduce time overhead and increase the diversity of path extension. We will release our code.

Submitted 22 April, 2024; originally announced April 2024.

Comments: Keywords: Multi-hop Question Answering; Retrieval-Augmented Generation; Tree of Thought; Reasoning. TLDR: We proposed a tree-based dynamic, iterative retrieval framework for multi-hop question answering

arXiv:2403.12922 [pdf, other]

Contextual AD Narration with Interleaved Multimodal Sequence

Authors: Hanlin Wang, Zhan Tong, Kecheng Zheng, Yujun Shen, Limin Wang

Abstract: The Audio Description (AD) task aims to generate descriptions of visual elements for visually impaired individuals to help them access long-form video content, such as movies. With video features, text, character bank and context information as inputs, the generated ADs are able to correspond to the characters by name and provide reasonable, contextual descriptions to help the audience understand the storyline of the movie. To achieve this goal, we propose to leverage pre-trained foundation models through a simple and unified framework to generate ADs with interleaved multimodal sequence as input, termed as Uni-AD. To enhance the alignment of features across various modalities with finer granularity, we introduce a simple and lightweight module that maps video features into the textual feature space. Moreover, we also propose a character-refinement module to provide more precise information by identifying the main characters who play a more significant role in the video context. With these unique designs, we further incorporate contextual information and a contrastive loss into our architecture to generate smoother and more contextual ADs. Experiments on the MAD-eval dataset show that Uni-AD can achieve state-of-the-art performance on AD generation, which demonstrates the effectiveness of our approach. Code will be available at https://github.com/MCG-NJU/Uni-AD.

Submitted 19 March, 2024; originally announced March 2024.

arXiv:2402.08865 [pdf, other]

doi 10.1103/PhysRevD.109.112010

New constraints on ultraheavy dark matter from the LZ experiment

Authors: J. Aalbers, D. S. Akerib, A. K. Al Musalhi, C. S. Amarasinghe, A. Ames, T. J. Anderson, N. Angelides, H. M. Araújo, J. E. Armstrong, M. Arthurs, A. Baker, S. Balashov, J. Bang, J. W. Bargemann, A. Baxter, K. Beattie, T. Benson, A. Bhatti, A. Biekert, T. P. Biesiadzinski, H. J. Birch, E. Bishop, G. M. Blockinger, B. Boxer, C. A. J. Brew , et al. (174 additional authors not shown)

Abstract: Searches for dark matter with liquid xenon time projection chamber experiments have traditionally focused on the region of the parameter space that is characteristic of weakly interacting massive particles, ranging from a few GeV/$c^2$ to a few TeV/$c^2$. Models of dark matter with a mass much heavier than this are well motivated by early production mechanisms different from the standard thermal freeze-out, but they have generally been less explored experimentally. In this work, we present a re-analysis of the first science run (SR1) of the LZ experiment, with an exposure of $0.9$ tonne$\times$year, to search for ultraheavy particle dark matter. The signal topology consists of multiple energy deposits in the active region of the detector forming a straight line, from which the velocity of the incoming particle can be reconstructed on an event-by-event basis. Zero events with this topology were observed after applying the data selection calibrated on a simulated sample of signal-like events. New experimental constraints are derived, which rule out previously unexplored regions of the dark matter parameter space of spin-independent interactions beyond a mass of 10$^{17}$ GeV/$c^2$.

Submitted 13 February, 2024; originally announced February 2024.

Comments: 9 pages, 7 figures

Journal ref: Phys. Rev. D 109, 112010 (2024)

arXiv:2401.02133 [pdf, other]

Weak effects of electron-phonon interactions on the lattice thermal conductivity of wurtzite GaN with high electron concentrations

Authors: Jianshi Sun, Shouhang Li, Zhen Tong, Cheng Shao, Xiangchuan Chen, Qianqian Liu, Yucheng Xiong, Meng An, Xiangjun Liu

Abstract: Wurtzite gallium nitride (GaN) has great potential for high-frequency and high-power applications due to its excellent electrical and thermal transport properties. However, enhancing the performance of GaN-based power electronics relies on heavy doping. Previous studies showed that electron-phonon interactions have strong effects on the lattice thermal conductivity of GaN due to the Fröhlich interaction. Surprisingly, our investigation reveals weak effects of electron-phonon interactions on the lattice thermal conductivity of n-type GaN at ultra-high electron concentrations and the impact of the Fröhlich interaction can be ignored. The small phonon-electron scattering rate is attributed to the limited scattering channels, quantified by the Fermi surface nesting function. In contrast, there is a significant reduction in the lattice thermal conductivity of p-type GaN at high hole concentrations due to the relatively larger Fermi surface nesting function. Meanwhile, as p-type GaN has relatively smaller electron-phonon matrix elements, the reduction in lattice thermal conductivity is still weaker than that observed in p-type silicon. Our work provides a deep understanding of thermal transport in doped GaN and the conclusions can be further extended to other wide-bandgap semiconductors, including $β$-Ga2O3, AlN, and ZnO.

Submitted 5 May, 2024; v1 submitted 4 January, 2024; originally announced January 2024.

arXiv:2312.14149 [pdf, other]

TagAlign: Improving Vision-Language Alignment with Multi-Tag Classification

Authors: Qinying Liu, Wei Wu, Kecheng Zheng, Zhan Tong, Jiawei Liu, Yu Liu, Wei Chen, Zilei Wang, Yujun Shen

Abstract: The crux of learning vision-language models is to extract semantically aligned information from visual and linguistic data. Existing attempts usually face the problem of coarse alignment, e.g., the vision encoder struggles in localizing an attribute-specified object. In this work, we propose an embarrassingly simple approach to better align image and text features with no need of additional data formats other than image-text pairs. Concretely, given an image and its paired text, we manage to parse objects (e.g., cat) and attributes (e.g., black) from the description, which are highly likely to exist in the image. It is noteworthy that the parsing pipeline is fully automatic and thus enjoys good scalability. With these parsed semantics as supervision signals, we can complement the commonly used image-text contrastive loss with the multi-tag classification loss. Extensive experimental results on a broad suite of semantic segmentation datasets substantiate the average 5.2% improvement of our framework over existing alternatives. Furthermore, the visualization results indicate that attribute supervision makes vision-language models accurately localize attribute-specified objects. Project page can be found at https://qinying-liu.github.io/Tag-Align.

Submitted 26 March, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

arXiv:2312.02030 [pdf, other]

doi 10.1103/PhysRevD.109.092003

First Constraints on WIMP-Nucleon Effective Field Theory Couplings in an Extended Energy Region From LUX-ZEPLIN

Authors: LZ Collaboration, J. Aalbers, D. S. Akerib, A. K. Al Musalhi, F. Alder, C. S. Amarasinghe, A. Ames, T. J. Anderson, N. Angelides, H. M. Araújo, J. E. Armstrong, M. Arthurs, A. Baker, S. Balashov, J. Bang, J. W. Bargemann, A. Baxter, K. Beattie, T. Benson, A. Bhatti, A. Biekert, T. P. Biesiadzinski, H. J. Birch, E. Bishop, G. M. Blockinger, et al. (175 additional authors not shown)

Abstract: Following the first science results of the LUX-ZEPLIN (LZ) experiment, a dual-phase xenon time projection chamber operating from the Sanford Underground Research Facility in Lead, South Dakota, USA, we report the initial limits on a model-independent non-relativistic effective field theory describing the complete set of possible interactions of a weakly interacting massive particle (WIMP) with a nucleon. These results utilize the same 5.5 t fiducial mass and 60 live days of exposure collected for the LZ spin-independent and spin-dependent analyses while extending the upper limit of the energy region of interest by a factor of 7.5 to 270 keVnr. No significant excess in this high energy region is observed. Using a profile-likelihood ratio analysis, we report 90% confidence level exclusion limits on the coupling of each individual non-relativistic WIMP-nucleon operator for both elastic and inelastic interactions in the isoscalar and isovector bases.

Submitted 26 February, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

Comments: 17 pages, 11 figures

Journal ref: Phys. Rev. D 109, 092003 (2024)

arXiv:2312.01987 [pdf, other]

Bootstrapping SparseFormers from Vision Foundation Models

Authors: Ziteng Gao, Zhan Tong, Kevin Qinghong Lin, Joya Chen, Mike Zheng Shou

Abstract: The recently proposed SparseFormer architecture provides an alternative approach to visual understanding by utilizing a significantly lower number of visual tokens via adjusting RoIs, greatly reducing computational costs while still achieving promising performance. However, training SparseFormers from scratch is still expensive, and scaling up the number of parameters can be challenging. In this paper, we propose to bootstrap SparseFormers from ViT-based vision foundation models in a simple and efficient way. Since the majority of SparseFormer blocks are the standard transformer ones, we can inherit weights from large-scale pre-trained vision transformers and freeze them as much as possible. Therefore, we only need to train the SparseFormer-specific lightweight focusing transformer to adjust token RoIs and fine-tune a few early pre-trained blocks to align the final token representation. In such a way, we can bootstrap SparseFormer architectures from various large-scale pre-trained models (e.g., IN-21K pre-trained AugRegs or CLIPs) using a rather smaller amount of training samples (e.g., IN-1K) and without labels or captions within just a few hours. As a result, the bootstrapped unimodal SparseFormer (from AugReg-ViT-L/16-384) can reach 84.9% accuracy on IN-1K with only 49 tokens, and the multimodal SparseFormer from CLIPs also demonstrates notable zero-shot performance with highly reduced computational cost without seeing any caption during the bootstrapping procedure. In addition, CLIP-bootstrapped SparseFormers, which align the output space with language without seeing a word, can serve as efficient vision encoders in multimodal large language models. Code and models are available at https://github.com/showlab/sparseformer

Submitted 4 April, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

Comments: CVPR 2024

arXiv:2311.15157 [pdf, other]

Advancing Vision Transformers with Group-Mix Attention

Authors: Chongjian Ge, Xiaohan Ding, Zhan Tong, Li Yuan, Jiangliu Wang, Yibing Song, Ping Luo

Abstract: Vision Transformers (ViTs) have been shown to enhance visual recognition through modeling long-range dependencies with multi-head self-attention (MHSA), which is typically formulated as Query-Key-Value computation. However, the attention map generated from the Query and Key captures only token-to-token correlations at one single granularity. In this paper, we argue that self-attention should have a more comprehensive mechanism to capture correlations among tokens and groups (i.e., multiple adjacent tokens) for higher representational capacity. Thereby, we propose Group-Mix Attention (GMA) as an advanced replacement for traditional self-attention, which can simultaneously capture token-to-token, token-to-group, and group-to-group correlations with various group sizes. To this end, GMA splits the Query, Key, and Value into segments uniformly and performs different group aggregations to generate group proxies. The attention map is computed based on the mixtures of tokens and group proxies and used to re-combine the tokens and groups in Value. Based on GMA, we introduce a powerful backbone, namely GroupMixFormer, which achieves state-of-the-art performance in image classification, object detection, and semantic segmentation with fewer parameters than existing models. For instance, GroupMixFormer-L (with 70.3M parameters and 384^2 input) attains 86.2% Top-1 accuracy on ImageNet-1K without external data, while GroupMixFormer-B (with 45.8M parameters) attains 51.2% mIoU on ADE20K.

Submitted 25 November, 2023; originally announced November 2023.

arXiv:2310.15455 [pdf, other]

UI Layout Generation with LLMs Guided by UI Grammar

Authors: Yuwen Lu, Ziang Tong, Qinyi Zhao, Chengzhi Zhang, Toby Jia-Jun Li

Abstract: The recent advances in Large Language Models (LLMs) have stimulated interest among researchers and industry professionals, particularly in their application to tasks concerning mobile user interfaces (UIs). This position paper investigates the use of LLMs for UI layout generation. Central to our exploration is the introduction of UI grammar -- a novel approach we proposed to represent the hierarchical structure inherent in UI screens. The aim of this approach is to guide the generative capacities of LLMs more effectively and improve the explainability and controllability of the process. Initial experiments conducted with GPT-4 showed the promising capability of LLMs to produce high-quality user interfaces via in-context learning. Furthermore, our preliminary comparative study suggested the potential of the grammar-based approach in improving the quality of generative results in specific aspects.

Submitted 23 October, 2023; originally announced October 2023.

Comments: ICML 2023 Workshop on AI and HCI

arXiv:2309.16260 [pdf, other]

doi 10.1002/adma.202309356

A gate-tunable quantum phase transition in a topological excitonic insulator

Authors: Yande Que, Yang-Hao Chan, Junxiang Jia, Anirban Das, Zhengjue Tong, Yu-Tzu Chang, Zhenhao Cui, Amit Kumar, Gagandeep Singh, Hsin Lin, Shantanu Mukherjee, Bent Weber

Abstract: Coulomb interactions among electrons and holes in two-dimensional (2D) semimetals with overlapping valence and conduction bands can give rise to a correlated insulating ground state via exciton formation and condensation. Candidate materials in which such an excitonic state uniquely combines with non-trivial band topology are atomic monolayers of tungsten ditelluride (WTe2), in which a 2D topological excitonic insulator (2D TEI) forms. However, the detailed mechanism of the 2D bulk gap formation in WTe2, in particular with regard to the role of Coulomb interactions, has remained a subject of ongoing debate. Here, we show that WTe2 is susceptible to a gate-tunable quantum phase transition, evident from an abrupt collapse of its 2D bulk energy gap upon ambipolar field-effect doping. Such gate tunability of a 2D TEI, into either n- or p-type semimetals, promises novel handles of control over non-trivial 2D superconductivity with excitonic pairing.

Submitted 28 September, 2023; originally announced September 2023.

Comments: 8 pages, 4 figures, under submission

arXiv:2309.13942 [pdf, other]

Speed Co-Augmentation for Unsupervised Audio-Visual Pre-training

Authors: Jiangliu Wang, Jianbo Jiao, Yibing Song, Stephen James, Zhan Tong, Chongjian Ge, Pieter Abbeel, Yun-hui Liu

Abstract: This work aims to improve unsupervised audio-visual pre-training. Inspired by the efficacy of data augmentation in visual contrastive learning, we propose a novel speed co-augmentation method that randomly changes the playback speeds of both audio and video data. Despite its simplicity, the speed co-augmentation method possesses two compelling attributes: (1) it increases the diversity of audio-visual pairs and doubles the size of negative pairs, resulting in a significant enhancement in the learned representations, and (2) it changes the strict correlation between audio-visual pairs but introduces a partial relationship between the augmented pairs, which is modeled by our proposed SoftInfoNCE loss to further boost the performance. Experimental results show that the proposed method significantly improves the learned representations when compared to vanilla audio-visual contrastive learning.

Submitted 25 September, 2023; originally announced September 2023.

Comments: Published at the CVPR 2023 Sight and Sound workshop

arXiv:2309.11797 [pdf, ps, other]

A sharp frequency-preserving KAM theorem with continuous dependence on parameters and several counterexamples

Authors: Zhicheng Tong, Yong Li

Abstract: This paper mainly concerns the frequency-preserving Kolmogorov-Arnold-Moser (KAM) theorem via irregular continuity with respect to the parameter. Instead of digging out domains or requiring the uniform weak convexity for the frequency mapping, we introduce the concept of relative singularity, allowing many explicit parameterized Hamiltonian systems that admit arbitrarily weak regularity. The KAM iterative scheme employed here is quasi-linear, which differs from the usual linear one. Additionally, we construct a number of counterexamples, aiming to emphasize the indispensability of our new conditions for frequency preservation. Moreover, we show the parallel applicability of our results to partial frequency-preserving and infinite-dimensional cases.

Submitted 21 September, 2023; originally announced September 2023.

Comments: 31 pages

MSC Class: 37J40; 70H08; 70K43

arXiv:2308.00333 [pdf]

doi 10.1088/1361-6528/acebf7

Performance benchmarking of an ultra-low vibration laboratory to host a commercial millikelvin scanning tunnelling microscope

Authors: Yande Que, Amit Kumar, Michael S. Lodge, Zhengjue Tong, Marcus Lai Kar Fai, Wei Tao, Zhenhao Cui, Ranjith Shivajirao, Junxiang Jia, Siew Eang Lee, Bent Weber

Abstract: Ultra-low temperature scanning tunnelling microscopy and spectroscopy (STM/STS) achieved by dilution refrigeration can provide unrivalled insight into the local electronic structure of quantum materials and atomic-scale quantum systems. Effective isolation from mechanical vibration and acoustic noise is critical in order to achieve ultimate spatial and energy resolution. Here, we report on the design and performance of an ultra-low vibration (ULV) laboratory hosting a customized but otherwise commercially available 40 mK STM. The design of the vibration isolation consists of a T-shaped concrete mass block (55 t), suspended by actively controlled pneumatic springs, and placed on a foundation separated from the surrounding building in a "room-within-a-room" design. The vibration levels achieved meet the VC-M vibration standard at >3 Hz, reached only in a limited number of laboratories worldwide. Measurement of the STM's junction noise confirms effective vibration isolation on par with custom built STMs in ULV laboratories. In this tailored low-vibration environment, the STM achieves an energy resolution of 43 µeV (144 mK), promising for the investigation and control of quantum matter at atomic length scales.

Submitted 1 August, 2023; originally announced August 2023.

arXiv:2307.15753 [pdf, other]

doi 10.1103/PhysRevD.108.072006

A search for new physics in low-energy electron recoils from the first LZ exposure

Authors: The LZ Collaboration, J. Aalbers, D. S. Akerib, A. K. Al Musalhi, F. Alder, C. S. Amarasinghe, A. Ames, T. J. Anderson, N. Angelides, H. M. Araújo, J. E. Armstrong, M. Arthurs, A. Baker, S. Balashov, J. Bang, J. W. Bargemann, A. Baxter, K. Beattie, P. Beltrame, T. Benson, A. Bhatti, A. Biekert, T. P. Biesiadzinski, H. J. Birch, G. M. Blockinger, et al. (178 additional authors not shown)

Abstract: The LUX-ZEPLIN (LZ) experiment is a dark matter detector centered on a dual-phase xenon time projection chamber. We report searches for new physics appearing through few-keV-scale electron recoils, using the experiment's first exposure of 60 live days and a fiducial mass of 5.5 t. The data are found to be consistent with a background-only hypothesis, and limits are set on models for new physics including solar axion electron coupling, solar neutrino magnetic moment and millicharge, and electron couplings to galactic axion-like particles and hidden photons. Similar limits are set on weakly interacting massive particle (WIMP) dark matter producing signals through ionized atomic states from the Migdal effect.

Submitted 9 September, 2023; v1 submitted 28 July, 2023; originally announced July 2023.

Comments: 13 pages, 10 figures. See https://tinyurl.com/LZDataReleaseRun1ER for a data release related to this paper

Journal ref: Phys. Rev. D 108, 072006 (2023)

arXiv:2306.08211 [pdf, ps, other]

Towards sharp regularity: Full dimensional tori in $ C^\infty $ vector fields over $ \mathbb{T}^\infty $

Authors: Zhicheng Tong, Yong Li

Abstract: We consider linearization of perturbed vector field $ ω+P $ over infinite dimensional torus $ \mathbb{T}^\infty $ and give sharp regularity requirement for perturbation $ P $ under which there is a nearly identical transformation conjugating the unperturbed one $ ω $ onto $ ω-\tilde{ω}+P $ via a small modifying term $ \tilde{ω} $. Besides discussing the Diophantine type introduced by Bourgain [11], we also investigate the universal nonresonance and provide weakest regularity of perturbations known so far for which KAM applies. Lower than analyticity, our results allow Gevrey or even only $ C^\infty $, and the new KAM scheme with a balancing sequence to overcome non-polynomial nonresonance is shown to be non-Newtonian that differs from the usual ones. Thereby, except deriving sharp Gevrey exponent along Diophantine nonresonance, we answer the fundamental question of what is the minimum regularity required for KAM in infinite dimensional case. Additionally, our linearization could also be employed to deal with quasi periodic case over $ \mathbb{T}^n $.

Submitted 6 July, 2023; v1 submitted 13 June, 2023; originally announced June 2023.

Comments: 28 pages

MSC Class: 37K20; 37K55

arXiv:2305.14895 [pdf, other]

doi 10.1088/1674-4527/acd593

The Lobster Eye Imager for Astronomy Onboard the SATech-01 Satellite

Authors: Z. X. Ling, X. J. Sun, C. Zhang, S. L. Sun, G. Jin, S. N. Zhang, X. F. Zhang, J. B. Chang, F. S. Chen, Y. F. Chen, Z. W. Cheng, W. Fu, Y. X. Han, H. Li, J. F. Li, Y. Li, Z. D. Li, P. R. Liu, Y. H. Lv, X. H. Ma, Y. J. Tang, C. B. Wang, R. J. Xie, Y. L. Xue, A. L. Yan, et al. (101 additional authors not shown)

Abstract: The Lobster Eye Imager for Astronomy (LEIA), a pathfinder of the Wide-field X-ray Telescope of the Einstein Probe (EP) mission, was successfully launched onboard the SATech-01 satellite of the Chinese Academy of Sciences on 27 July 2022. In this paper, we introduce the design and on-ground test results of the LEIA instrument. Using state-of-the-art Micro-Pore Optics (MPO), a wide field-of-view (FoV) of 346 square degrees (18.6 degrees $\times$ 18.6 degrees) of the X-ray imager is realized. An optical assembly composed of 36 MPO chips is used to focus incident X-ray photons, and four large-format complementary metal-oxide semiconductor (CMOS) sensors, each of 6 cm $\times$ 6 cm, are used as the focal plane detectors. The instrument has an angular resolution of 4 - 8 arcmin (in FWHM) for the central focal spot of the point spread function, and an effective area of 2 - 3 cm$^2$ at 1 keV in essentially all the directions within the field of view. The detection passband is 0.5 - 4 keV in the soft X-rays and the sensitivity is 2 - 3 $\times$ 10$^{-11}$ erg s$^{-1}$ cm$^{-2}$ (about 1 mini-Crab) at 1,000 second observation. The total weight of LEIA is 56 kg and the power is 85 W. The satellite, with a design lifetime of 2 years, operates in a Sun-synchronous orbit of 500 km with an orbital period of 95 minutes. LEIA is paving the way for future missions by verifying in flight the technologies of both novel focusing imaging optics and CMOS sensors for X-ray observation, and by optimizing the working setups of the instrumental parameters. In addition, LEIA is able to carry out scientific observations to find new transients and to monitor known sources in the soft X-ray band, albeit with limited useful observing time available.
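The field-of-view and orbital-period figures quoted in the LEIA abstract can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the paper: it assumes a flat-sky square field of view and a circular Keplerian orbit, and the Earth radius and gravitational parameter are standard values supplied here as assumptions. The function names are hypothetical.

```python
import math

# Figures quoted in the abstract.
FOV_SIDE_DEG = 18.6   # side of the square field of view, degrees
ALTITUDE_KM = 500.0   # orbital altitude, km

# Assumed constants (not from the abstract).
EARTH_RADIUS_KM = 6371.0        # mean Earth radius
EARTH_MU_KM3_S2 = 398600.4418   # Earth's gravitational parameter GM


def square_fov_area(side_deg: float) -> float:
    """Flat-sky area of a square field of view, in square degrees."""
    return side_deg ** 2


def circular_orbit_period_min(altitude_km: float) -> float:
    """Period of a circular orbit at the given altitude, in minutes."""
    semi_major_axis_km = EARTH_RADIUS_KM + altitude_km
    period_s = 2.0 * math.pi * math.sqrt(semi_major_axis_km ** 3 / EARTH_MU_KM3_S2)
    return period_s / 60.0


fov_area = square_fov_area(FOV_SIDE_DEG)
period_min = circular_orbit_period_min(ALTITUDE_KM)

print(f"FoV area: {fov_area:.1f} square degrees")
print(f"Orbital period: {period_min:.1f} minutes")
```

Under these assumptions, both results land close to the rounded values in the abstract (346 square degrees and 95 minutes); small differences are expected from rounding and from the simplified orbit model.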

Submitted 24 May, 2023; originally announced May 2023.

Comments: Accepted by RAA

arXiv:2305.14173 [pdf, other]

TVTSv2: Learning Out-of-the-box Spatiotemporal Visual Representations at Scale

Authors: Ziyun Zeng, Yixiao Ge, Zhan Tong, Xihui Liu, Shu-Tao Xia, Ying Shan

Abstract: The ultimate goal for foundation models is to be task-agnostic, i.e., to support out-of-the-box usage without task-specific fine-tuning. Although breakthroughs have been made in natural language processing and image representation learning, it is still challenging for video models to reach it due to the increasing uncertainty of spatiotemporal signals. To ease training, existing works leverage image foundation models' prior knowledge and equip them with efficient temporal modules. Despite the satisfactory fine-tuning performance, we empirically find they fall short of out-of-the-box usage, given the even degraded performance in zero-shot/linear protocols compared to their baseline counterparts. In this work, we analyze the factor that leads to degradation from the perspective of language supervision distortion. We argue that tuning a text encoder end-to-end, as done in previous work, is suboptimal since it may overfit in terms of styles, thereby losing its original generalization ability to capture the semantics of various language registers. The overfitted text encoder, in turn, provides a harmful supervision signal, degrading the video representation. To tackle this issue, we propose a degradation-free pre-training strategy to retain the generalization ability of the text encoder via freezing shallow layers while enabling the task-related semantics capturing in tunable deep layers. As for the training objective, we adopted the transcript sorting task in TVTS incorporated with masking techniques to enable scalable training. As a result, we produce a series of models, dubbed TVTSv2, with up to one billion parameters. We achieve new state-of-the-art results on various video benchmarks with a frozen backbone, surpassing the recent ImageBind, InternVideo, etc. Code is available at https://github.com/TencentARC/TVTS.

Submitted 23 May, 2023; originally announced May 2023.

Comments: Technical Report

arXiv:2305.07095 [pdf, other]

Are Machine Rationales (Not) Useful to Humans? Measuring and Improving Human Utility of Free-Text Rationales

Authors: Brihi Joshi, Ziyi Liu, Sahana Ramnath, Aaron Chan, Zhewei Tong, Shaoliang Nie, Qifan Wang, Yejin Choi, Xiang Ren

Abstract: Among the remarkable emergent capabilities of large language models (LMs) is free-text rationalization; beyond a certain scale, large LMs are capable of generating seemingly useful rationalizations, which in turn, can dramatically enhance their performances on leaderboards. This phenomenon raises a question: can machine generated rationales also be useful for humans, especially when lay humans try to answer questions based on those machine rationales? We observe that human utility of existing rationales is far from satisfactory, and expensive to estimate with human studies. Existing metrics like task performance of the LM generating the rationales, or similarity between generated and gold rationales are not good indicators of their human utility. While we observe that certain properties of rationales like conciseness and novelty are correlated with their human utility, estimating them without human involvement is challenging. We show that, by estimating a rationale's helpfulness in answering similar unseen instances, we can measure its human utility to a better extent. We also translate this finding into an automated score, GEN-U, that we propose, which can help improve LMs' ability to generate rationales with better human utility, while maintaining most of its task performance. Lastly, we release all code and collected data with this project.

Submitted 11 May, 2023; originally announced May 2023.

Comments: Accepted at ACL 2023

arXiv:2304.13838 [pdf]

doi 10.1115/1.4062844

Theoretical Puncture Mechanics of Soft Compressible Solids

Authors: Stefano Fregonese, Zhiyuan Tong, Sibo Wang, Mattia Bacca

Abstract: Accurate prediction of the force required to puncture a soft material is critical in many fields like medical technology, food processing, and manufacturing. However, such a prediction strongly depends on our understanding of the complex nonlinear behavior of the material subject to deep indentation and complex failure mechanisms. Only recently we developed theories capable of correlating puncture force with material properties and needle geometry. However, such models are based on simplifications that seldom limit their applicability to real cases. One common assumption is the incompressibility of the cut material, albeit no material is truly incompressible. In this paper we propose a simple model that accounts for linearly elastic compressibility, and its interplay with toughness, stiffness, and elastic strain-stiffening. Confirming previous theories and experiments, materials having high-toughness and low-modulus exhibit the highest puncture resistance at a given needle radius. Surprisingly, in these conditions, we observe that incompressible materials exhibit the lowest puncture resistance, where volumetric compressibility can create an additional (strain) energy barrier to puncture. Our model provides a valuable tool to assess the puncture resistance of soft compressible materials and suggests new design strategies for sharp needles and puncture-resistant materials.

Submitted 26 April, 2023; originally announced April 2023.

arXiv:2304.08451 [pdf, other]

Efficient Video Action Detection with Token Dropout and Context Refinement

Authors: Lei Chen, Zhan Tong, Yibing Song, Gangshan Wu, Limin Wang

Abstract: Streaming video clips with large-scale video tokens impede vision transformers (ViTs) for efficient recognition, especially in video action detection where sufficient spatiotemporal representations are required for precise actor identification. In this work, we propose an end-to-end framework for efficient video action detection (EVAD) based on vanilla ViTs. Our EVAD consists of two specialized designs for video action detection. First, we propose a spatiotemporal token dropout from a keyframe-centric perspective. In a video clip, we maintain all tokens from its keyframe, preserve tokens relevant to actor motions from other frames, and drop out the remaining tokens in this clip. Second, we refine scene context by leveraging remaining tokens for better recognizing actor identities. The region of interest (RoI) in our action detector is expanded into temporal domain. The captured spatiotemporal actor identity representations are refined via scene context in a decoder with the attention mechanism. These two designs make our EVAD efficient while maintaining accuracy, which is validated on three benchmark datasets (i.e., AVA, UCF101-24, JHMDB). Compared to the vanilla ViT backbone, our EVAD reduces the overall GFLOPs by 43% and improves real-time inference speed by 40% with no performance degradation. Moreover, even at similar computational costs, our EVAD can improve the performance by 1.1 mAP with higher resolution inputs. Code is available at https://github.com/MCG-NJU/EVAD.

Submitted 28 August, 2023; v1 submitted 17 April, 2023; originally announced April 2023.

Comments: technical report

arXiv:2304.03885 [pdf]

Direct Laser Writing of Surface Micro-Domes by Plasmonic Bubbles

Authors: Lihua Dong, Fulong Wang, Buyun Chen, Chenliang Xia, Pengwei Zhu, Zhi Tong, Huimin Wang, Lijun Yang, Yuliang Wang

Abstract: Plasmonic microbubbles produced by laser irradiated gold nanoparticles (GNPs) in various liquids have emerged in numerous innovative applications. The nucleation of these bubbles inherently involves rich phenomena. In this paper, we systematically investigate the physicochemical hydrodynamics of plasmonic bubbles upon irradiation of a continuous wave (CW) laser on a GNP decorated sample surface in ferric nitrate solution. Surprisingly, we observe the direct formation of well-defined micro-domes on the sample surface. It reveals that the nucleation of a plasmonic bubble is associated with the solvothermal decomposition of ferric nitrate in the solution. The plasmonic bubble acts as a template for the deposition of iron oxide nanoparticles. It first forms a rim, then a micro-shell, which eventually becomes a solid micro-dome. Experimental results show that the micro-dome radius Rd exhibits an obvious dependence on time t, which can be well interpreted theoretically. Our findings reveal the rich phenomena associated with plasmonic bubble nucleation in a thermally decomposable solution, paving a plasmonic bubble-based approach to fabricate three dimensional microstructures by using an ordinary CW laser.

Submitted 7 April, 2023; originally announced April 2023.

arXiv:2304.03768 [pdf, other]

SparseFormer: Sparse Visual Recognition via Limited Latent Tokens

Authors: Ziteng Gao, Zhan Tong, Limin Wang, Mike Zheng Shou

Abstract: Human visual recognition is a sparse process, where only a few salient visual cues are attended to rather than traversing every detail uniformly. However, most current vision networks follow a dense paradigm, processing every single visual unit (e.g., pixel or patch) in a uniform manner. In this paper, we challenge this dense paradigm and present a new method, coined SparseFormer, to imitate human's sparse visual recognition in an end-to-end manner. SparseFormer learns to represent images using a highly limited number of tokens (down to 49) in the latent space with sparse feature sampling procedure instead of processing dense units in the original pixel space. Therefore, SparseFormer circumvents most of dense operations on the image space and has much lower computational costs. Experiments on the ImageNet classification benchmark dataset show that SparseFormer achieves performance on par with canonical or well-established models while offering better accuracy-throughput tradeoff. Moreover, the design of our network can be easily extended to the video classification with promising performance at lower computational costs. We hope that our work can provide an alternative way for visual modeling and inspire further research on sparse neural architectures. The code will be publicly available at https://github.com/showlab/sparseformer

Submitted 7 April, 2023; originally announced April 2023.

Comments: Technical report

arXiv:2303.17142 [pdf, other]

Soft Neighbors are Positive Supporters in Contrastive Visual Representation Learning

Authors: Chongjian Ge, Jiangliu Wang, Zhan Tong, Shoufa Chen, Yibing Song, Ping Luo

Abstract: Contrastive learning methods train visual encoders by comparing views from one instance to others. Typically, the views created from one instance are set as positive, while views from other instances are negative. This binary instance discrimination is studied extensively to improve feature representations in self-supervised learning. In this paper, we rethink the instance discrimination framework and find the binary instance labeling insufficient to measure correlations between different samples. For an intuitive example, given a random image instance, there may exist other images in a mini-batch whose content meanings are the same (i.e., belonging to the same category) or partially related (i.e., belonging to a similar category). How to treat the images that correlate similarly to the current image instance leaves an unexplored problem. We thus propose to support the current image by exploring other correlated instances (i.e., soft neighbors). We first carefully cultivate a candidate neighbor set, which will be further utilized to explore the highly-correlated instances. A cross-attention module is then introduced to predict the correlation score (denoted as positiveness) of other correlated instances with respect to the current one. The positiveness score quantitatively measures the positive support from each correlated instance, and is encoded into the objective for pretext training. To this end, our proposed method benefits in discriminating uncorrelated instances while absorbing correlated instances for SSL.
We evaluate our soft neighbor contrastive learning method (SNCLR) on standard visual recognition benchmarks, including image classification, object detection, and instance segmentation. The state-of-the-art recognition performance shows that SNCLR is effective in improving feature representations from both ViT and CNN encoders.

Submitted 30 March, 2023; originally announced March 2023.

Comments: Accepted by ICLR23

arXiv:2303.16727 [pdf, other]

VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking

Authors: Limin Wang, Bingkun Huang, Zhiyu Zhao, Zhan Tong, Yinan He, Yi Wang, Yali Wang, Yu Qiao

Abstract: Scale is the primary factor for building a powerful foundation model that could well generalize to a variety of downstream tasks. However, it is still challenging to train video foundation models with billions of parameters. This paper shows that video masked autoencoder (VideoMAE) is a scalable and general self-supervised pre-trainer for building video foundation models. We scale the VideoMAE in both model and data with a core design. Specifically, we present a dual masking strategy for efficient pre-training, with an encoder operating on a subset of video tokens and a decoder processing another subset of video tokens. Although VideoMAE is very efficient due to high masking ratio in encoder, masking decoder can still further reduce the overall computational cost. This enables the efficient pre-training of billion-level models in video. We also use a progressive training paradigm that involves an initial pre-training on a diverse multi-sourced unlabeled dataset, followed by a post-pre-training on a mixed labeled dataset. Finally, we successfully train a video ViT model with a billion parameters, which achieves a new state-of-the-art performance on the datasets of Kinetics (90.0% on K400 and 89.9% on K600) and Something-Something (68.7% on V1 and 77.0% on V2). In addition, we extensively verify the pre-trained video ViT models on a variety of downstream tasks, demonstrating its effectiveness as a general video representation learner. The code and model is available at https://github.com/OpenGVLab/VideoMAEv2.

Submitted 18 April, 2023; v1 submitted 29 March, 2023; originally announced March 2023.

Comments: CVPR 2023 camera-ready version

arXiv:2303.16118 [pdf, other]

CycleACR: Cycle Modeling of Actor-Context Relations for Video Action Detection

Authors: Lei Chen, Zhan Tong, Yibing Song, Gangshan Wu, Limin Wang

Abstract: The relation modeling between actors and scene context advances video action detection where the correlation of multiple actors makes their action recognition challenging. Existing studies model each actor and scene relation to improve action recognition. However, the scene variations and background interference limit the effectiveness of this relation modeling. In this paper, we propose to select actor-related scene context, rather than directly leverage raw video scenario, to improve relation modeling. We develop a Cycle Actor-Context Relation network (CycleACR) where there is a symmetric graph that models the actor and context relations in a bidirectional form. Our CycleACR consists of the Actor-to-Context Reorganization (A2C-R) that collects actor features for context feature reorganizations, and the Context-to-Actor Enhancement (C2A-E) that dynamically utilizes reorganized context features for actor feature enhancement. Compared to existing designs that focus on C2A-E, our CycleACR introduces A2C-R for a more effective relation modeling. This modeling advances our CycleACR to achieve state-of-the-art performance on two popular action detection datasets (i.e., AVA and UCF101-24). We also provide ablation studies and visualizations as well to show how our cycle actor-context relation modeling improves video action detection. Code is available at https://github.com/MCG-NJU/CycleACR.

Submitted 28 March, 2023; originally announced March 2023.

Comments: technical report

arXiv:2302.14361 [pdf, ps, other]

Towards continuity: Universal frequency-preserving KAM persistence and remaining regularity

Authors: Zhicheng Tong, Yong Li

Abstract: Beyond Hölder's type, this paper mainly concerns the persistence and remaining regularity of an individual frequency-preserving KAM torus in a finitely differentiable Hamiltonian system, even allows the non-integrable part being critical finitely smooth. To achieve this goal, besides investigating the Jackson approximation theorem towards only modulus of continuity, we demonstrate an abstract regularity theorem adapting to the new iterative scheme. Via these tools, we obtain a KAM theorem with sharp differentiability hypotheses, asserting that the persistent torus keeps prescribed universal Diophantine frequency unchanged. Further, the non-Hölder regularity for invariant KAM torus as well as the conjugation is explicitly shown by introducing asymptotic analysis. To our knowledge, this is the first approach to KAM on these aspects in a continuous sense, and we also provide two systems, which cannot be studied by previous KAM but by ours.

Submitted 28 February, 2023; originally announced February 2023.

Comments: 36 pages, substantial text overlap with arXiv:2301.13590

MSC Class: 37J40; 70K60

arXiv:2302.05183 [pdf, ps, other]

Moser's Theorem with Frequency-preserving

Authors: Chang Liu, Zhicheng Tong, Yong Li

Abstract: This paper mainly concerns the KAM persistence of the mapping $\mathscr{F}:\mathbb{T}^{n}\times E\rightarrow \mathbb{T}^{n}\times \mathbb{R}^{n}$ with intersection property, where $E\subset \mathbb{R}^{n}$ is a connected closed bounded domain with interior points. By assuming that the frequency mapping satisfies certain topological degree condition and weak convexity condition, we prove some Moser type results about the invariant torus of mapping $\mathscr{F}$ with frequency-preserving under small perturbations. To our knowledge, this is the first approach to Moser's theorem with frequency-preserving. Moreover, given perturbed mappings over $ \mathbb{T}^n $, it is shown that such persistence still holds when the frequency mapping and perturbations are only continuous about parameter beyond Lipschitz or even Hölder type. We also touch the parameter without dimension limitation problem under such settings.

Submitted 10 February, 2023; originally announced February 2023.

Comments: 26 pages

MSC Class: 37E40; 37J40

arXiv:2301.13590 [pdf, ps, other]

Universal frequency-preserving KAM persistence via modulus of continuity

Authors: Zhicheng Tong, Yong Li

Abstract: In this paper, we study the persistence and remaining regularity of KAM invariant torus under sufficiently small perturbations of a Hamiltonian function together with its derivatives, in sense of finite smoothness with modulus of continuity, as a generalization of classical Hölder continuous circumstances. To achieve this goal, we extend the Jackson approximation theorem to the case of modulus of continuity, and establish a corresponding regularity theorem adapting to the new iterative scheme. Via these tools, we establish a KAM theorem with sharp differentiability hypotheses, which asserts that the persistent torus keeps prescribed universal Diophantine frequency unchanged and reaches the regularity for persistent KAM torus beyond Hölder's type.

Submitted 31 January, 2023; originally announced January 2023.

Comments: 24 pages

MSC Class: 37J40; 70K60

arXiv:2301.10051 [pdf, other]

Wise-IoU: Bounding Box Regression Loss with Dynamic Focusing Mechanism

Authors: Zanjia Tong, Yuhang Chen, Zewei Xu, Rong Yu

Abstract: The loss function for bounding box regression (BBR) is essential to object detection. Its good definition will bring significant performance improvement to the model. Most existing works assume that the examples in the training data are high-quality and focus on strengthening the fitting ability of BBR loss. If we blindly strengthen BBR on low-quality examples, it will jeopardize localization performance. Focal-EIoU v1 was proposed to solve this problem, but due to its static focusing mechanism (FM), the potential of non-monotonic FM was not fully exploited. Based on this idea, we propose an IoU-based loss with a dynamic non-monotonic FM named Wise-IoU (WIoU). The dynamic non-monotonic FM uses the outlier degree instead of IoU to evaluate the quality of anchor boxes and provides a wise gradient gain allocation strategy. This strategy reduces the competitiveness of high-quality anchor boxes while also reducing the harmful gradient generated by low-quality examples. This allows WIoU to focus on ordinary-quality anchor boxes and improve the detector's overall performance. When WIoU is applied to the state-of-the-art real-time detector YOLOv7, the AP-75 on the MS-COCO dataset is improved from 53.03% to 54.50%. Code is available at https://github.com/Instinct323/wiou.

Submitted 8 April, 2023; v1 submitted 24 January, 2023; originally announced January 2023.

arXiv:2212.03499 [pdf, other]

Learning Continuous Depth Representation via Geometric Spatial Aggregator

Authors: Xiaohang Wang, Xuanhong Chen, Bingbing Ni, Zhengyan Tong, Hang Wang

Abstract: Depth map super-resolution (DSR) has been a fundamental task for 3D computer vision. While arbitrary scale DSR is a more realistic setting in this scenario, previous approaches predominantly suffer from the issue of inefficient real-numbered scale upsampling. To explicitly address this issue, we propose a novel continuous depth representation for DSR. The heart of this representation is our proposed Geometric Spatial Aggregator (GSA), which exploits a distance field modulated by arbitrarily upsampled target gridding, through which the geometric information is explicitly introduced into feature aggregation and target generation. Furthermore, bricking with GSA, we present a transformer-style backbone named GeoDSR, which possesses a principled way to construct the functional mapping between local coordinates and the high-resolution output results, empowering our model with the advantage of arbitrary shape transformation ready to help diverse zooming demand. Extensive experimental results on standard depth map benchmarks, e.g., NYU v2, have demonstrated that the proposed framework achieves significant restoration gain in arbitrary scale depth map super-resolution compared with the prior art. Our codes are available at https://github.com/nana01219/GeoDSR.

Submitted 7 December, 2022; originally announced December 2022.

Comments: Accepted to AAAI 2023. Code is available at https://github.com/nana01219/GeoDSR

ACM Class: I.4

arXiv:2211.17120 [pdf, other]

doi 10.1103/PhysRevD.108.012010

Background Determination for the LUX-ZEPLIN (LZ) Dark Matter Experiment

Authors: J. Aalbers, D. S. Akerib, A. K. Al Musalhi, F. Alder, S. K. Alsum, C. S. Amarasinghe, A. Ames, T. J. Anderson, N. Angelides, H. M. Araújo, J. E. Armstrong, M. Arthurs, A. Baker, J. Bang, J. W. Bargemann, A. Baxter, K. Beattie, P. Beltrame, E. P. Bernard, A. Bhatti, A. Biekert, T. P. Biesiadzinski, H. J. Birch, G. M. Blockinger, B. Boxer , et al. (178 additional authors not shown)

Abstract: The LUX-ZEPLIN experiment recently reported limits on WIMP-nucleus interactions from its initial science run, down to $9.2\times10^{-48}$ cm$^2$ for the spin-independent interaction of a 36 GeV/c$^2$ WIMP at 90% confidence level. In this paper, we present a comprehensive analysis of the backgrounds important for this result and for other upcoming physics analyses, including neutrinoless double-beta decay searches and effective field theory interpretations of LUX-ZEPLIN data. We confirm that the in-situ determinations of bulk and fixed radioactive backgrounds are consistent with expectations from the ex-situ assays. The observed background rate after WIMP search criteria were applied was $(6.3\pm0.5)\times10^{-5}$ events/keV$_{ee}$/kg/day in the low-energy region, approximately 60 times lower than the equivalent rate reported by the LUX experiment.

Submitted 17 July, 2023; v1 submitted 30 November, 2022; originally announced November 2022.

Comments: 25 pages, 15 figures

Journal ref: Phys. Rev. D 108, 012010 (2023)

arXiv:2211.10007 [pdf, other]

doi 10.3847/2041-8213/aca32f

First wide field-of-view X-ray observations by a lobster eye focusing telescope in orbit

Authors: C. Zhang, Z. X. Ling, X. J. Sun, S. L. Sun, Y. Liu, Z. D. Li, Y. L. Xue, Y. F. Chen, Y. F. Dai, Z. Q. Jia, H. Y. Liu, X. F. Zhang, Y. H. Zhang, S. N. Zhang, F. S. Chen, Z. W. Cheng, W. Fu, Y. X. Han, H. Li, J. F. Li, Y. Li, P. R. Liu, X. H. Ma, Y. J. Tang, C. B. Wang , et al. (53 additional authors not shown)

Abstract: As a novel X-ray focusing technology, lobster eye micro-pore optics (MPO) feature both a wide observing field of view and true imaging capability, promising sky monitoring with significantly improved sensitivity and spatial resolution in soft X-rays. Since first proposed by Angel (1979), the optics have been extensively studied, developed and trialed over the past decades. In this Letter, we report on the first-light results from a flight experiment of the Lobster Eye Imager for Astronomy ($LEIA$), a pathfinder of the wide-field X-ray telescope of the Einstein Probe mission. The piggyback imager, launched in July 2022, has a mostly un-vignetted field of view of $18.6^\circ \times 18.6^\circ $. Its spatial resolution is in the range of 4$-$7 arcmin in FWHM and the focal spot effective area is 2$-$3 cm$^2$, both showing only mild fluctuations across the field of view. We present images of the Galactic center region, Sco X-1 and the diffuse Cygnus Loop nebula taken in snapshot observations over 0.5$-$4 keV. These are truly wide-field X-ray images of celestial bodies observed, for the first time, by a focusing imaging telescope. Initial analyses of the in-flight data show excellent agreement between the observed images and the on-ground calibration and simulations. The instrument and its characterization are briefly described, as well as the flight experiment. The results provide a solid basis for the development of the present and proposed wide-field X-ray missions using lobster eye MPO.

Submitted 17 November, 2022; originally announced November 2022.

Comments: 11 pages, 4 figures. Accepted for publication in Astrophysical Journal Letter

arXiv:2211.01590 [pdf, ps, other]

Relation between irrationality and regularity for $ C^1 $ conjugacy of $ C^2 $ circle diffeomorphisms to rigid rotations

Authors: Zhicheng Tong, Yong Li

Abstract: By introducing the modulus of continuity, we first establish the corresponding cross-ratio distortion estimates under $ C^2 $ smoothness, and further give a Denjoy-type inequality, which is almost optimal in dealing with circle diffeomorphisms. The latter plays a prominent role in the study of $ C^1 $ conjugacy to irrational rotations. We also give the explicit integrability correlation between continuity and irrationality for the first time. Further the regularity of the conjugation is also considered and proved to be sharp.

Submitted 3 November, 2022; originally announced November 2022.

Comments: 27 pages

MSC Class: 37E10; 37C15

arXiv:2210.04392 [pdf]

doi 10.1115/1.4062842

Multi-material topology optimization of adhesive backing layers via J-integral and strain energy minimizations

Authors: Zhiyuan Tong, Farid H. Benvidi, Mattia Bacca

Abstract: Strong adhesives rely on reduced stress concentrations, often obtained via specific geometry or composition of materials. In many examples in nature and engineering prototypes, the adhesive performance relies on structural rigidity being placed in specific locations. A few design principles have been formulated, based on parametric optimization, while a general design tool is still missing. We propose to use topology optimization to achieve optimal stiffness distribution in a multi-material adhesive backing layer, reducing stress concentration at specified locations. The method involves the minimization of a linear combination of J-integral and strain energy. While the J-integral minimization is aimed at reducing stress concentration, we observe that the combination of these two objectives ultimately provides the best results. We analyze three cases in plane strain conditions, namely (i) double-edged crack and (ii) center crack in tension and (iii) edge crack under shear. Each case evidences a different optimal topology with (i) and (ii) providing similar results. The optimal topology allocates stiffness in regions that are far away from the crack tip, intuitively, but the allocation of softer materials over stiffer ones can be non-trivial. To test our solutions, we plot the contact stress distribution across the interface. In all observed cases, we eliminate the stress singularity at the crack tip. Stress concentrations might arise in locations far away from the crack tip, but the final results are independent of crack size.
Our method ultimately provides optimal, flaw tolerant, adhesives where the crack location is known.

Submitted 22 June, 2023; v1 submitted 9 October, 2022; originally announced October 2022.

arXiv:2210.04383 [pdf, ps, other]

KAM theorem on modulus of continuity about parameter

Authors: Zhicheng Tong, Jiayin Du, Yong Li

Abstract: In this paper, we study the Hamiltonian systems $ H\left( {y,x,ξ,\varepsilon } \right) = \left\langle {ω\left( ξ\right),y} \right\rangle + \varepsilon P\left( {y,x,ξ,\varepsilon } \right) $, where $ ω$ and $ P $ are continuous about $ ξ$. We prove that persistent invariant tori possess the same frequency as the unperturbed tori, under certain transversality condition and weak convexity condition for the frequency mapping $ ω$. As a direct application, we prove a KAM theorem when the perturbation $P$ holds arbitrary Hölder continuity with respect to parameter $ ξ$. The infinite dimensional case is also considered. To our knowledge, this is the first approach to the systems with the only continuity in parameter beyond Hölder's type.

Submitted 19 January, 2023; v1 submitted 9 October, 2022; originally announced October 2022.

Comments: 23 pages, has been accepted for publication in SCIENCE CHINA Mathematics

MSC Class: 37J40 (Primary); 58F27 (Secondary)

arXiv:2209.13219 [pdf, other]

doi 10.1145/3503161.3547759

Im2Oil: Stroke-Based Oil Painting Rendering with Linearly Controllable Fineness Via Adaptive Sampling

Authors: Zhengyan Tong, Xiaohang Wang, Shengchao Yuan, Xuanhong Chen, Junjie Wang, Xiangzhong Fang

Abstract: This paper proposes a novel stroke-based rendering (SBR) method that translates images into vivid oil paintings. Previous SBR techniques usually formulate the oil painting problem as pixel-wise approximation. Different from this technique route, we treat oil painting creation as an adaptive sampling problem. Firstly, we compute a probability density map based on the texture complexity of the input image. Then we use the Voronoi algorithm to sample a set of pixels as the stroke anchors. Next, we search and generate an individual oil stroke at each anchor. Finally, we place all the strokes on the canvas to obtain the oil painting. By adjusting the hyper-parameter maximum sampling probability, we can control the oil painting fineness in a linear manner. Comparison with existing state-of-the-art oil painting techniques shows that our results have higher fidelity and more realistic textures. A user opinion test demonstrates that people show a stronger preference for our oil paintings than for the results of other methods. More interesting results and the code are in https://github.com/TZYSJTU/Im2Oil.

Submitted 27 September, 2022; originally announced September 2022.

Comments: ACM MM 2022 oral paper, accepted by the 30th ACM International Conference on Multimedia

arXiv:2208.14062 [pdf, ps, other]

Attack detection based on machine learning algorithms for different variants of Spectre attacks and different Meltdown attack implementations

Authors: Zhongkai Tong, Ziyuan Zhu, Yusha Zhang, Yuxin Liu, Dan Meng

Abstract: To improve the overall performance of processors, computer architects use various performance optimization techniques in modern processors, such as speculative execution, branch prediction, and out-of-order execution. Both now and in the future, these optimization techniques are critical for improving the execution speed of processor instructions. However, researchers have discovered in recent years that these techniques introduce hidden inherent security flaws, such as the Meltdown and Spectre attacks. These attacks exploit techniques such as out-of-order execution or speculative execution, combined with cache-based side-channel attacks, to leak protected data. The impact of these vulnerabilities is enormous because they are prevalent in existing and future processors. To date, Meltdown and Spectre have not been effectively addressed; instead, multiple attack variants and different attack implementations have evolved from them. This paper proposes to optimize four different hardware performance events through feature selection and use machine learning algorithms to build a real-time detection mechanism for Spectre v1, v2, v4, and different implementations of Meltdown attacks, ultimately achieving an accuracy rate of over 99%. To verify the practicality of the attack detection model, it is tested with a variety of benign programs and with implementations of Spectre attacks that differ from those used in the modeling process; the absolute accuracy also exceeds 99%, showing that the approach can cope with different attack variants and different implementations of the same attack that may occur in practice.

Submitted 30 August, 2022; originally announced August 2022.

Showing 1–50 of 110 results for author: Tong, Z