Showing 1–50 of 4,657 results for author: Li, M

Search v0.5.6 released 2020-02-24

arXiv:2407.09418 [pdf, other]

math.NA math-ph

Efficient energy-stable parametric finite element methods for surface diffusion flow and applications in solid-state dewetting

Authors: Meng Li, Yihang Guo, Jingjiang Bi

Abstract: Currently existing energy-stable parametric finite element methods for surface diffusion flow and other flows are usually limited to first-order accuracy in time. Designing a high-order algorithm for geometric flows that can also be theoretically proven to be energy-stable poses a significant challenge. Motivated by the new scalar auxiliary variable approach [F.Huang, J.Shen, Z.Yang, SIAM J. SCI.… ▽ More Currently existing energy-stable parametric finite element methods for surface diffusion flow and other flows are usually limited to first-order accuracy in time. Designing a high-order algorithm for geometric flows that can also be theoretically proven to be energy-stable poses a significant challenge. Motivated by the new scalar auxiliary variable approach [F.Huang, J.Shen, Z.Yang, SIAM J. SCI. Comput., 42 (2020), pp. A2514-A2536], we propose novel energy-stable parametric finite element approximations for isotropic/anisotropic surface diffusion flows, achieving both first-order and second-order accuracy in time. Additionally, we apply the algorithms to simulate the solid-state dewetting of thin films. Finally, extensive numerical experiments validate the accuracy, energy stability, and efficiency of our developed numerical methods. The designed algorithms in this work exhibit strong versatility, as they can be readily extended to other high-order time discretization methods (e.g., BDFk schemes). Meanwhile, the algorithms achieve remarkable computational efficiency and maintain excellent mesh quality. More importantly, the algorithm can be theoretically proven to possess unconditional energy stability, with the energy nearly equal to the original energy. △ Less

Submitted 12 July, 2024; originally announced July 2024.
arXiv:2407.09336 [pdf, other]

cs.LG cs.AI

Guidelines for Augmentation Selection in Contrastive Learning for Time Series Classification

Authors: Ziyu Liu, Azadeh Alavi, Minyi Li, Xiang Zhang

Abstract: Self-supervised contrastive learning has become a key technique in deep learning, particularly in time series analysis, due to its ability to learn meaningful representations without explicit supervision. Augmentation is a critical component in contrastive learning, where different augmentations can dramatically impact performance, sometimes influencing accuracy by over 30%. However, the selection… ▽ More Self-supervised contrastive learning has become a key technique in deep learning, particularly in time series analysis, due to its ability to learn meaningful representations without explicit supervision. Augmentation is a critical component in contrastive learning, where different augmentations can dramatically impact performance, sometimes influencing accuracy by over 30%. However, the selection of augmentations is predominantly empirical which can be suboptimal, or grid searching that is time-consuming. In this paper, we establish a principled framework for selecting augmentations based on dataset characteristics such as trend and seasonality. Specifically, we construct 12 synthetic datasets incorporating trend, seasonality, and integration weights. We then evaluate the effectiveness of 8 different augmentations across these synthetic datasets, thereby inducing generalizable associations between time series characteristics and augmentation efficiency. Additionally, we evaluated the induced associations across 6 real-world datasets encompassing domains such as activity recognition, disease diagnosis, traffic monitoring, electricity usage, mechanical fault prognosis, and finance. These real-world datasets are diverse, covering a range from 1 to 12 channels, 2 to 10 classes, sequence lengths of 14 to 1280, and data frequencies from 250 Hz to daily intervals. The experimental results show that our proposed trend-seasonality-based augmentation recommendation algorithm can accurately identify the effective augmentations for a given time series dataset, achieving an average Recall@3 of 0.667, outperforming baselines. Our work provides guidance for studies employing contrastive learning in time series analysis, with wide-ranging applications. All the code, datasets, and analysis results will be released at https://github.com/DL4mHealth/TS-Contrastive-Augmentation-Recommendation. △ Less

Submitted 12 July, 2024; originally announced July 2024.

Comments: 20 pages, 11 figures
arXiv:2407.09100 [pdf, other]

q-bio.NC

Retrospective for the Dynamic Sensorium Competition for predicting large-scale mouse primary visual cortex activity from videos

Authors: Polina Turishcheva, Paul G. Fahey, Michaela Vystrčilová, Laura Hansel, Rachel Froebe, Kayla Ponder, Yongrong Qiu, Konstantin F. Willeke, Mohammad Bashiri, Ruslan Baikulov, Yu Zhu, Lei Ma, Shan Yu, Tiejun Huang, Bryan M. Li, Wolf De Wulf, Nina Kudryashova, Matthias H. Hennig, Nathalie L. Rochefort, Arno Onken, Eric Wang, Zhiwei Ding, Andreas S. Tolias, Fabian H. Sinz, Alexander S Ecker

Abstract: Understanding how biological visual systems process information is challenging because of the nonlinear relationship between visual input and neuronal responses. Artificial neural networks allow computational neuroscientists to create predictive models that connect biological and machine vision. Machine learning has benefited tremendously from benchmarks that compare different model on the same ta… ▽ More Understanding how biological visual systems process information is challenging because of the nonlinear relationship between visual input and neuronal responses. Artificial neural networks allow computational neuroscientists to create predictive models that connect biological and machine vision. Machine learning has benefited tremendously from benchmarks that compare different model on the same task under standardized conditions. However, there was no standardized benchmark to identify state-of-the-art dynamic models of the mouse visual system. To address this gap, we established the Sensorium 2023 Benchmark Competition with dynamic input, featuring a new large-scale dataset from the primary visual cortex of ten mice. This dataset includes responses from 78,853 neurons to 2 hours of dynamic stimuli per neuron, together with the behavioral measurements such as running speed, pupil dilation, and eye movements. The competition ranked models in two tracks based on predictive performance for neuronal responses on a held-out test set: one focusing on predicting in-domain natural stimuli and another on out-of-distribution (OOD) stimuli to assess model generalization. As part of the NeurIPS 2023 competition track, we received more than 160 model submissions from 22 teams. Several new architectures for predictive models were proposed, and the winning teams improved the previous state-of-the-art model by 50%. Access to the dataset as well as the benchmarking infrastructure will remain online at www.sensorium-competition.net. △ Less

Submitted 12 July, 2024; originally announced July 2024.
arXiv:2407.09048 [pdf, other]

cs.AI

KUNPENG: An Embodied Large Model for Intelligent Maritime

Authors: Naiyao Wang, Tongbang Jiang, Ye Wang, Shaoyang Qiu, Bo Zhang, Xinqiang Xie, Munan Li, Chunliu Wang, Yiyang Wang, Hongxiang Ren, Ruili Wang, Hongjun Shan, Hongbo Liu

Abstract: Intelligent maritime, as an essential component of smart ocean construction, deeply integrates advanced artificial intelligence technology and data analysis methods, which covers multiple aspects such as smart vessels, route optimization, safe navigation, aiming to enhance the efficiency of ocean resource utilization and the intelligence of transportation networks. However, the complex and dynamic… ▽ More Intelligent maritime, as an essential component of smart ocean construction, deeply integrates advanced artificial intelligence technology and data analysis methods, which covers multiple aspects such as smart vessels, route optimization, safe navigation, aiming to enhance the efficiency of ocean resource utilization and the intelligence of transportation networks. However, the complex and dynamic maritime environment, along with diverse and heterogeneous large-scale data sources, present challenges for real-time decision-making in intelligent maritime. In this paper, We propose KUNPENG, the first-ever embodied large model for intelligent maritime in the smart ocean construction, which consists of six systems. The model perceives multi-source heterogeneous data for the cognition of environmental interaction and make autonomous decision strategies, which are used for intelligent vessels to perform navigation behaviors under safety and emergency guarantees and continuously optimize power to achieve embodied intelligence in maritime. In comprehensive maritime task evaluations, KUNPENG has demonstrated excellent performance. △ Less

Submitted 12 July, 2024; originally announced July 2024.

Comments: 9 pages, 3 figures
arXiv:2407.09019 [pdf, other]

cs.SI cs.AI

Heterogeneous Subgraph Network with Prompt Learning for Interpretable Depression Detection on Social Media

Authors: Chen Chen, Mingwei Li, Fenghuan Li, Haopeng Chen, Yuankun Lin

Abstract: Massive social media data can reflect people's authentic thoughts, emotions, communication, etc., and therefore can be analyzed for early detection of mental health problems such as depression. Existing works about early depression detection on social media lacked interpretability and neglected the heterogeneity of social media data. Furthermore, they overlooked the global interaction among users.… ▽ More Massive social media data can reflect people's authentic thoughts, emotions, communication, etc., and therefore can be analyzed for early detection of mental health problems such as depression. Existing works about early depression detection on social media lacked interpretability and neglected the heterogeneity of social media data. Furthermore, they overlooked the global interaction among users. To address these issues, we develop a novel method that leverages a Heterogeneous Subgraph Network with Prompt Learning(HSNPL) and contrastive learning mechanisms. Specifically, prompt learning is employed to map users' implicit psychological symbols with excellent interpretability while deep semantic and diverse behavioral features are incorporated by a heterogeneous information network. Then, the heterogeneous graph network with a dual attention mechanism is constructed to model the relationships among heterogeneous social information at the feature level. Furthermore, the heterogeneous subgraph network integrating subgraph attention and self-supervised contrastive learning is developed to explore complicated interactions among users and groups at the user level. Extensive experimental results demonstrate that our proposed method significantly outperforms state-of-the-art methods for depression detection on social media. △ Less

Submitted 12 July, 2024; originally announced July 2024.
arXiv:2407.08986 [pdf]

cs.CY

Exploring Generative AI Policies in Higher Education: A Comparative Perspective from China, Japan, Mongolia, and the USA

Authors: Qin Xie, Ming Li, Ariunaa Enkhtur

Abstract: This study conducts a comparative analysis of national policies on Generative AI across four countries: China, Japan, Mongolia, and the USA. Employing the Qualitative Comparative Analysis (QCA) method, it examines the responses of these nations to Generative AI in higher education settings, scrutinizing the diversity in their approaches within this group. While all four countries exhibit a positiv… ▽ More This study conducts a comparative analysis of national policies on Generative AI across four countries: China, Japan, Mongolia, and the USA. Employing the Qualitative Comparative Analysis (QCA) method, it examines the responses of these nations to Generative AI in higher education settings, scrutinizing the diversity in their approaches within this group. While all four countries exhibit a positive attitude toward Generative AI in higher education, Japan and the USA prioritize a human-centered approach and provide direct guidance in teaching and learning. In contrast, China and Mongolia prioritize national security concerns, with their guidelines focusing more on the societal level rather than being specifically tailored to education. Additionally, despite all four countries emphasizing diversity, equity, and inclusion, they consistently fail to clearly discuss or implement measures to address the digital divide. By offering a comprehensive comparative analysis of attitudes and policies regarding Generative AI in higher education across these countries, this study enriches existing literature and provides policymakers with a global perspective, ensuring that policies in this domain promote inclusion rather than exclusion. △ Less

Submitted 12 July, 2024; originally announced July 2024.

Comments: 14 pages, 1 table
arXiv:2407.08651 [pdf, other]

cs.CR cs.DC

SpiralShard: Highly Concurrent and Secure Blockchain Sharding via Linked Cross-shard Endorsement

Authors: You Lin, Mingzhe Li, Jin Zhang

Abstract: Blockchain sharding improves the scalability of blockchain systems by partitioning the whole blockchain state, nodes, and transaction workloads into different shards. However, existing blockchain sharding systems generally suffer from a small number of shards, resulting in limited concurrency. The main reason is that existing sharding systems require large shard sizes to ensure security. To enhanc… ▽ More Blockchain sharding improves the scalability of blockchain systems by partitioning the whole blockchain state, nodes, and transaction workloads into different shards. However, existing blockchain sharding systems generally suffer from a small number of shards, resulting in limited concurrency. The main reason is that existing sharding systems require large shard sizes to ensure security. To enhance the concurrency of blockchain sharding securely, we propose SpiralShard. The intuition is to allow the existence of some shards with a larger fraction of malicious nodes (i.e., corrupted shards), thus reducing shard sizes. SpiralShard can configure more and smaller shards for higher concurrency at the same network size. To ensure security with the existence of corrupted shards, we propose the Linked Cross-shard Endorsement (LCE) protocol. According to our LCE protocol, the blocks of each shard are sequentially verified and endorsed by a group of shards before being finalized. As a result, a corrupted shard can eliminate forks with the help of the other shards. We implement SpiralShard based on Harmony and conduct extensive evaluations. Experimental results show that, compared with Harmony, SpiralShard achieves around 19x throughput gain under a large network size with 4,000+ nodes. △ Less

Submitted 9 July, 2024; originally announced July 2024.
arXiv:2407.08537 [pdf, other]

cs.NI cs.CR

BriDe Arbitrager: Enhancing Arbitrage in Ethereum 2.0 via Bribery-enabled Delayed Block Production

Authors: Hulin Yang, Mingzhe Li, Jin Zhang, Alia Asheralieva, Qingsong Wei, Siow Mong Rick Goh

Abstract: The advent of Ethereum 2.0 has introduced significant changes, particularly the shift to Proof-of-Stake consensus. This change presents new opportunities and challenges for arbitrage. Amidst these changes, we introduce BriDe Arbitrager, a novel tool designed for Ethereum 2.0 that leverages Bribery-driven attacks to Delay block production and increase arbitrage gains. The main idea is to allow mali… ▽ More The advent of Ethereum 2.0 has introduced significant changes, particularly the shift to Proof-of-Stake consensus. This change presents new opportunities and challenges for arbitrage. Amidst these changes, we introduce BriDe Arbitrager, a novel tool designed for Ethereum 2.0 that leverages Bribery-driven attacks to Delay block production and increase arbitrage gains. The main idea is to allow malicious proposers to delay block production by bribing validators/proposers, thereby gaining more time to identify arbitrage opportunities. Through analysing the bribery process, we design an adaptive bribery strategy. Additionally, we propose a Delayed Transaction Ordering Algorithm to leverage the delayed time to amplify arbitrage profits for malicious proposers. To ensure fairness and automate the bribery process, we design and implement a bribery smart contract and a bribery client. As a result, BriDe Arbitrager enables adversaries controlling a limited (< 1/4) fraction of the voting powers to delay block production via bribery and arbitrage more profit. Extensive experimental results based on Ethereum historical transactions demonstrate that BriDe Arbitrager yields an average of 8.66 ETH (16,442.23 USD) daily profits. Furthermore, our approach does not trigger any slashing mechanisms and remains effective even under Proposer Builder Separation and other potential mechanisms will be adopted by Ethereum. △ Less

Submitted 11 July, 2024; originally announced July 2024.
arXiv:2407.08443 [pdf, other]

cs.CV

Infinite Motion: Extended Motion Generation via Long Text Instructions

Authors: Mengtian Li, Chengshuo Zhai, Shengxiang Yao, Zhifeng Xie, Keyu Chen, Yu-Gang Jiang

Abstract: In the realm of motion generation, the creation of long-duration, high-quality motion sequences remains a significant challenge. This paper presents our groundbreaking work on "Infinite Motion", a novel approach that leverages long text to extended motion generation, effectively bridging the gap between short and long-duration motion synthesis. Our core insight is the strategic extension and reass… ▽ More In the realm of motion generation, the creation of long-duration, high-quality motion sequences remains a significant challenge. This paper presents our groundbreaking work on "Infinite Motion", a novel approach that leverages long text to extended motion generation, effectively bridging the gap between short and long-duration motion synthesis. Our core insight is the strategic extension and reassembly of existing high-quality text-motion datasets, which has led to the creation of a novel benchmark dataset to facilitate the training of models for extended motion sequences. A key innovation of our model is its ability to accept arbitrary lengths of text as input, enabling the generation of motion sequences tailored to specific narratives or scenarios. Furthermore, we incorporate the timestamp design for text which allows precise editing of local segments within the generated sequences, offering unparalleled control and flexibility in motion synthesis. We further demonstrate the versatility and practical utility of "Infinite Motion" through three specific applications: natural language interactive editing, motion sequence editing within long sequences and splicing of independent motion sequences. Each application highlights the adaptability of our approach and broadens the spectrum of possibilities for research and development in motion generation. Through extensive experiments, we demonstrate the superior performance of our model in generating long sequence motions compared to existing methods.Project page: https://shuochengzhai.github.io/Infinite-motion.github.io/ △ Less

Submitted 12 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

Comments: 12 pages,13 figures
arXiv:2407.08273

cs.CL

RB-SQL: A Retrieval-based LLM Framework for Text-to-SQL

Authors: Zhenhe Wu, Zhongqiu Li, Jie Zhang, Mengxiang Li, Yu Zhao, Ruiyu Fang, Zhongjiang He, Xuelong Li, Zhoujun Li, Shuangyong Song

Abstract: Large language models (LLMs) with in-context learning have significantly improved the performance of text-to-SQL task. Previous works generally focus on using exclusive SQL generation prompt to improve the LLMs' reasoning ability. However, they are mostly hard to handle large databases with numerous tables and columns, and usually ignore the significance of pre-processing database and extracting v… ▽ More Large language models (LLMs) with in-context learning have significantly improved the performance of text-to-SQL task. Previous works generally focus on using exclusive SQL generation prompt to improve the LLMs' reasoning ability. However, they are mostly hard to handle large databases with numerous tables and columns, and usually ignore the significance of pre-processing database and extracting valuable information for more efficient prompt engineering. Based on above analysis, we propose RB-SQL, a novel retrieval-based LLM framework for in-context prompt engineering, which consists of three modules that retrieve concise tables and columns as schema, and targeted examples for in-context learning. Experiment results demonstrate that our model achieves better performance than several competitive baselines on public datasets BIRD and Spider. △ Less

Submitted 12 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

Comments: Further improvement and modification are needed.
arXiv:2407.08255 [pdf, other]

cs.CV cs.LG

GraphMamba: An Efficient Graph Structure Learning Vision Mamba for Hyperspectral Image Classification

Authors: Aitao Yang, Min Li, Yao Ding, Leyuan Fang, Yaoming Cai, Yujie He

Abstract: Efficient extraction of spectral sequences and geospatial information has always been a hot topic in hyperspectral image classification. In terms of spectral sequence feature capture, RNN and Transformer have become mainstream classification frameworks due to their long-range feature capture capabilities. In terms of spatial information aggregation, CNN enhances the receptive field to retain integ… ▽ More Efficient extraction of spectral sequences and geospatial information has always been a hot topic in hyperspectral image classification. In terms of spectral sequence feature capture, RNN and Transformer have become mainstream classification frameworks due to their long-range feature capture capabilities. In terms of spatial information aggregation, CNN enhances the receptive field to retain integrated spatial information as much as possible. However, the spectral feature-capturing architectures exhibit low computational efficiency, and CNNs lack the flexibility to perceive spatial contextual information. To address these issues, this paper proposes GraphMamba--an efficient graph structure learning vision Mamba classification framework that fully considers HSI characteristics to achieve deep spatial-spectral information mining. Specifically, we propose a novel hyperspectral visual GraphMamba processing paradigm (HVGM) that preserves spatial-spectral features by constructing spatial-spectral cubes and utilizes linear spectral encoding to enhance the operability of subsequent tasks. The core components of GraphMamba include the HyperMamba module for improving computational efficiency and the SpectralGCN module for adaptive spatial context awareness. The HyperMamba mitigates clutter interference by employing the global mask (GM) and introduces a parallel training inference architecture to alleviate computational bottlenecks. The SpatialGCN incorporates weighted multi-hop aggregation (WMA) spatial encoding to focus on highly correlated spatial structural features, thus flexibly aggregating contextual information while mitigating spatial noise interference. Extensive experiments were conducted on three different scales of real HSI datasets, and compared with the state-of-the-art classification frameworks, GraphMamba achieved optimal performance. △ Less

Submitted 11 July, 2024; originally announced July 2024.

Comments: 13 pages, 10 figures
arXiv:2407.08125 [pdf, ps, other]

cs.LG

Real-Time Summarization of Twitter

Authors: Yixin Jin, Meiqi Wang, Meng Li, Wenjing Zhou, Yi Shen, Hao Liu

Abstract: In this paper, we describe our approaches to TREC Real-Time Summarization of Twitter. We focus on real time push notification scenario, which requires a system monitors the stream of sampled tweets and returns the tweets relevant and novel to given interest profiles. Dirichlet score with and with very little smoothing (baseline) are employed to classify whether a tweet is relevant to a given inter… ▽ More In this paper, we describe our approaches to TREC Real-Time Summarization of Twitter. We focus on real time push notification scenario, which requires a system monitors the stream of sampled tweets and returns the tweets relevant and novel to given interest profiles. Dirichlet score with and with very little smoothing (baseline) are employed to classify whether a tweet is relevant to a given interest profile. Using metrics including Mean Average Precision (MAP, cumulative gain (CG) and discount cumulative gain (DCG), the experiment indicates that our approach has a good performance. It is also desired to remove the redundant tweets from the pushing queue. Due to the precision limit, we only describe the algorithm in this paper. △ Less

Submitted 10 July, 2024; originally announced July 2024.

Comments: This paper was accepted to International Conference on Artificial Intelligence and Electromechanical Automation 2024
arXiv:2407.08039 [pdf, other]

cs.CL

Knowledge Overshadowing Causes Amalgamated Hallucination in Large Language Models

Authors: Yuji Zhang, Sha Li, Jiateng Liu, Pengfei Yu, Yi R. Fung, Jing Li, Manling Li, Heng Ji

Abstract: Hallucination is often regarded as a major impediment for using large language models (LLMs), especially for knowledge-intensive tasks. Even when the training corpus consists solely of true statements, language models still generate hallucinations in the form of amalgamations of multiple facts. We coin this phenomenon as ``knowledge overshadowing'': when we query knowledge from a language model wi… ▽ More Hallucination is often regarded as a major impediment for using large language models (LLMs), especially for knowledge-intensive tasks. Even when the training corpus consists solely of true statements, language models still generate hallucinations in the form of amalgamations of multiple facts. We coin this phenomenon as ``knowledge overshadowing'': when we query knowledge from a language model with multiple conditions, some conditions overshadow others, leading to hallucinated outputs. This phenomenon partially stems from training data imbalance, which we verify on both pretrained models and fine-tuned models, over a wide range of LM model families and sizes.From a theoretical point of view, knowledge overshadowing can be interpreted as over-generalization of the dominant conditions (patterns). We show that the hallucination rate grows with both the imbalance ratio (between the popular and unpopular condition) and the length of dominant condition description, consistent with our derived generalization bound. Finally, we propose to utilize overshadowing conditions as a signal to catch hallucination before it is produced, along with a training-free self-contrastive decoding method to alleviate hallucination during inference. Our proposed approach showcases up to 82% F1 for hallucination anticipation and 11.2% to 39.4% hallucination control, with different models and datasets. △ Less

Submitted 10 July, 2024; originally announced July 2024.
arXiv:2407.07723 [pdf, other]

cs.IT cs.AI

Understanding is Compression

Authors: Ziguang Li, Chao Huang, Xuliang Wang, Haibo Hu, Cole Wyeth, Dongbo Bu, Quan Yu, Wen Gao, Xingwu Liu, Ming Li

Abstract: We have previously shown all understanding or learning are compression, under reasonable assumptions. In principle, better understanding of data should improve data compression. Traditional compression methodologies focus on encoding frequencies or some other computable properties of data. Large language models approximate the uncomputable Solomonoff distribution, opening up a whole new avenue to… ▽ More We have previously shown all understanding or learning are compression, under reasonable assumptions. In principle, better understanding of data should improve data compression. Traditional compression methodologies focus on encoding frequencies or some other computable properties of data. Large language models approximate the uncomputable Solomonoff distribution, opening up a whole new avenue to justify our theory. Under the new uncomputable paradigm, we present LMCompress based on the understanding of data using large models. LMCompress has significantly better lossless compression ratios than all other lossless data compression methods, doubling the compression ratios of JPEG-XL for images, FLAC for audios and H264 for videos, and tripling or quadrupling the compression ratio of bz2 for texts. The better a large model understands the data, the better LMCompress compresses. △ Less

Submitted 23 June, 2024; originally announced July 2024.
arXiv:2407.07651 [pdf, other]

hep-ex physics.data-an

Study of the decay and production properties of $D_{s1}(2536)$ and $D_{s2}^*(2573)$

Authors: M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann , et al. (645 additional authors not shown)

Abstract: The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ processes are studied using data samples collected with the BESIII detector at center-of-mass energies from 4.530 to 4.946~GeV. The absolute branching fractions of $D_{s1}(2536)^- \rightarrow \bar{D}^{*0}K^-$ and $D_{s2}^*(2573)^- \rightarrow \bar{D}^0K^-$ are measured for the first time to be… ▽ More The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ processes are studied using data samples collected with the BESIII detector at center-of-mass energies from 4.530 to 4.946~GeV. The absolute branching fractions of $D_{s1}(2536)^- \rightarrow \bar{D}^{*0}K^-$ and $D_{s2}^*(2573)^- \rightarrow \bar{D}^0K^-$ are measured for the first time to be $(35.9\pm 4.8\pm 3.5)\%$ and $(37.4\pm 3.1\pm 4.6)\%$, respectively. The measurements are in tension with predictions based on the assumption that the $D_{s1}(2536)$ and $D_{s2}^*(2573)$ are dominated by a bare $c\bar{s}$ component. The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ cross sections are measured, and a resonant structure at around 4.6~GeV with a width of 50~MeV is observed for the first time with a statistical significance of $15σ$ in the $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ process. It could be the $Y(4626)$ found by the Belle collaboration in the $D_s^+D_{s1}(2536)^{-}$ final state, since they have similar masses and widths. There is also evidence for a structure at around 4.75~GeV in both processes. △ Less

Submitted 10 July, 2024; originally announced July 2024.
arXiv:2407.07059 [pdf, other]

q-bio.NC cs.LG

Differentiable Optimization of Similarity Scores Between Models and Brains

Authors: Nathan Cloos, Moufan Li, Markus Siegel, Scott L. Brincat, Earl K. Miller, Guangyu Robert Yang, Christopher J. Cueva

Abstract: What metrics should guide the development of more realistic models of the brain? One proposal is to quantify the similarity between models and brains using methods such as linear regression, Centered Kernel Alignment (CKA), and angular Procrustes distance. To better understand the limitations of these similarity measures we analyze neural activity recorded in five experiments on nonhuman primates,… ▽ More What metrics should guide the development of more realistic models of the brain? One proposal is to quantify the similarity between models and brains using methods such as linear regression, Centered Kernel Alignment (CKA), and angular Procrustes distance. To better understand the limitations of these similarity measures we analyze neural activity recorded in five experiments on nonhuman primates, and optimize synthetic datasets to become more similar to these neural recordings. How similar can these synthetic datasets be to neural activity while failing to encode task relevant variables? We find that some measures like linear regression and CKA, differ from angular Procrustes, and yield high similarity scores even when task relevant variables cannot be linearly decoded from the synthetic datasets. Synthetic datasets optimized to maximize similarity scores initially learn the first principal component of the target dataset, but angular Procrustes captures higher variance dimensions much earlier than methods like linear regression and CKA. We show in both theory and simulations how these scores change when different principal components are perturbed. And finally, we jointly optimize multiple similarity scores to find their allowed ranges, and show that a high angular Procrustes similarity, for example, implies a high CKA score, but not the converse. △ Less

Submitted 9 July, 2024; originally announced July 2024.

Comments: 16 pages, 6 figures
arXiv:2407.06953 [pdf, other]

cs.DC

SP-Chain: Boosting Intra-Shard and Cross-Shard Security and Performance in Blockchain Sharding

Authors: Mingzhe Li, You Lin, Wei Wang, Jin Zhang

Abstract: A promising way to overcome the scalability limitations of the current blockchain is to use sharding, which is to split the transaction processing among multiple, smaller groups of nodes. A well-performed blockchain sharding system requires both high performance and high security in both intra- and cross-shard perspectives. However, existing protocols either have issues on protecting security or t… ▽ More A promising way to overcome the scalability limitations of the current blockchain is to use sharding, which is to split the transaction processing among multiple, smaller groups of nodes. A well-performed blockchain sharding system requires both high performance and high security in both intra- and cross-shard perspectives. However, existing protocols either have issues on protecting security or trade off great performance for security. In this paper, we propose SP-Chain, a blockchain sharding system with enhanced Security and Performance for both intra- and cross-shard perspectives. For intra-shard aspect, we design a two-phase concurrent voting scheme to provide high system throughput and low transaction confirmation latency. Moreover, we propose an efficient unbiased leader rotation scheme to ensure high performance under malicious behavior. For cross-shard aspect, a proof-assisted efficient cross-shard transaction processing mechanism is proposed to guard the cross-shard transactions with low overhead. We implement SP-Chain based on Harmony, and evaluate its performance via large-scale deployment. Extensive evaluations suggest that SP-Chain can process more than 10,000 tx/sec under malicious behaviors with a confirmation latency of 7.6s in a network of 4,000 nodes. △ Less

Submitted 9 July, 2024; originally announced July 2024.
arXiv:2407.06882 [pdf, other]

cs.DC

DL-Chain: Scalable and Stable Blockchain Sharding with High Concurrency via Dual-Layer Consensus

Authors: You Lin, Mingzhe Li, Qingsong Wei, Yong Liu, Siow Mong Rick Goh, Jin Zhang

Abstract: Sharding enhances blockchain scalability by partitioning nodes into multiple groups for concurrent transaction processing. Configuring a large number of \emph{small shards} helps improve the transaction concurrency of a sharding system. However, it increases the fraction of malicious nodes within each shard, easily leading to shard corruption and jeopardizing system security. Some existing works h… ▽ More Sharding enhances blockchain scalability by partitioning nodes into multiple groups for concurrent transaction processing. Configuring a large number of \emph{small shards} helps improve the transaction concurrency of a sharding system. However, it increases the fraction of malicious nodes within each shard, easily leading to shard corruption and jeopardizing system security. Some existing works have attempted to improve concurrency by reducing the shard size while maintaining security. However, they often require frequent and time-consuming recovery of corrupted shards, leading to severe system stagnation. Also, they usually require network-wide consensus to guarantee security, which limits scalability. To address these issues, we propose DL-Chain, a blockchain sharding system that can securely provide \emph{high concurrency with stable and scalable performance.} Our core idea is a \underline{D}ual-\underline{L}ayer architecture and consensus, which consists of numerous smaller proposer shards (PSs) for transaction processing and multiple larger finalizer committees (FCs) for transaction finalization. To avoid system stagnation and thus guarantee stable performance, we ensure PSs' liveness even if they are corrupted through the cooperation of PSs and FCs, thus eliminating the recovery process of corrupted PSs. To better trade-off security and scalability, we fine-tune the FCs to enable multiple FCs to coexist securely. As a result, DL-Chain allows a larger fraction of malicious nodes in each PS ($<1/2$) and thus can securely configure smaller shards for boosted stable and scalable concurrency. Evaluation results show that DL-Chain achieves up to 10 times improvement in throughput compared to existing solutions and provides stable concurrency with up to 2,550 nodes. △ Less

Submitted 9 July, 2024; originally announced July 2024.
arXiv:2407.05763 [pdf, other]

math.OC cs.MA eess.SY

Homogeneous Distributed Observers for Quasilinear Systems

Authors: Min Li, Andrey Polyakov, Siyuan Wang, Gang Zheng

Abstract: The problem of finite/fixed-time cooperative state estimation is considered for a class of quasilinear systems with nonlinearities satisfying a Hölder condition. A strongly connected nonlinear distributed observer is designed under the assumption of global observability. By proper parameter tuning with linear matrix inequalities, the observer error equation possesses finite/fixed-time stability in… ▽ More The problem of finite/fixed-time cooperative state estimation is considered for a class of quasilinear systems with nonlinearities satisfying a Hölder condition. A strongly connected nonlinear distributed observer is designed under the assumption of global observability. By proper parameter tuning with linear matrix inequalities, the observer error equation possesses finite/fixed-time stability in the perturbation-free case and input-to-state stability with respect to bounded perturbations. Numerical simulations are performed to validate this design. △ Less

Submitted 8 July, 2024; originally announced July 2024.

Comments: This manuscript has been submitted for a possible journal publication
arXiv:2407.05591 [pdf, other]

cs.LG cs.CL cs.NE

On the Power of Convolution Augmented Transformer

Authors: Mingchen Li, Xuechen Zhang, Yixiao Huang, Samet Oymak

Abstract: The transformer architecture has catalyzed revolutionary advances in language modeling. However, recent architectural recipes, such as state-space models, have bridged the performance gap. Motivated by this, we examine the benefits of Convolution-Augmented Transformer (CAT) for recall, copying, and length generalization tasks. CAT incorporates convolutional filters in the K/Q/V embeddings of an at… ▽ More The transformer architecture has catalyzed revolutionary advances in language modeling. However, recent architectural recipes, such as state-space models, have bridged the performance gap. Motivated by this, we examine the benefits of Convolution-Augmented Transformer (CAT) for recall, copying, and length generalization tasks. CAT incorporates convolutional filters in the K/Q/V embeddings of an attention layer. Through CAT, we show that the locality of the convolution synergizes with the global view of the attention. Unlike comparable architectures, such as Mamba or transformer, CAT can provably solve the associative recall (AR) and copying tasks using a single layer while also enjoying guaranteed length generalization. We also establish computational tradeoffs between convolution and attention by characterizing how convolution can mitigate the need for full attention by summarizing the context window and creating salient summary tokens to attend. Evaluations on real datasets corroborate our findings and demonstrate that CAT and its variations indeed enhance the language modeling performance. △ Less

Submitted 8 July, 2024; originally announced July 2024.
arXiv:2407.05499 [pdf, other]

eess.SY

Towards Reliable Neural Optimizers: A Permutation Equivariant Neural Approximation for Information Processing Applications

Authors: Meiyi Li, Javad Mohammadi

Abstract: The complexities of information processing across Dynamic Data Driven Applications Systems drive the development and adoption of Artificial Intelligence-based optimization solutions. Traditional solvers often suffer from slow response times and an inability to adapt swiftly to real-time input variations. To address these deficiencies, we will expand on our previous research in neural-based optimiz… ▽ More The complexities of information processing across Dynamic Data Driven Applications Systems drive the development and adoption of Artificial Intelligence-based optimization solutions. Traditional solvers often suffer from slow response times and an inability to adapt swiftly to real-time input variations. To address these deficiencies, we will expand on our previous research in neural-based optimizers by introducing a machine learning-enabled neural approximation model called LOOP-PE (Learning to Optimize the Optimization Process -- Permutation Equivariance version). This model not only enhances decision-making efficiency but also dynamically adapts to variations of data collections from sensor networks. In this work, we focus on mitigating the heterogeneity issues of data collection from sensor networks, including sensor dropout and failures, communication delays, and the complexities involved in integrating new sensors during system scaling. The proposed LOOP-PE model specifically overcomes these issues with a unique structure that is permutation equivariant, allowing it to accommodate inputs from a varying number of sensors and directly linking these inputs to their optimal operational outputs. This design significantly boosts the system's flexibility and adaptability, especially in scenarios characterized by unordered, distributed, and asynchronous data collections. Moreover, our approach increases the robustness of decision-making by integrating physical constraints through the generalized gauge map method, which theoretically ensures the decisions' practical feasibility and operational viability under dynamic conditions. We use a DDDAS case study to demonstrate that LOOP-PE model reliably delivers near-optimal and adaptable solutions, significantly outperforming traditional methods in managing the complexities of multi-sensor environments for real-time deployments. △ Less

Submitted 7 July, 2024; originally announced July 2024.
arXiv:2407.05086 [pdf, other]

physics.optics

Topological edge states in photonic Floquet insulator with unpaired Dirac cones

Authors: Hua Zhong, Yaroslav V. Kartashov, Yongdong Li, Ming Li, Yiqi Zhang

Abstract: Topological insulators are most frequently constructed using lattices with specific degeneracies in their linear spectra, such as Dirac points. For a broad class of lattices, such as honeycomb ones, these points and associated Dirac cones generally appear in non-equivalent pairs. Simultaneous breakup of the time-reversal and inversion symmetry in systems based on such lattices may result in the fo… ▽ More Topological insulators are most frequently constructed using lattices with specific degeneracies in their linear spectra, such as Dirac points. For a broad class of lattices, such as honeycomb ones, these points and associated Dirac cones generally appear in non-equivalent pairs. Simultaneous breakup of the time-reversal and inversion symmetry in systems based on such lattices may result in the formation of the unpaired Dirac cones in bulk spectrum, but the existence of topologically protected edge states in such structures remains an open problem. Here photonic Floquet insulator on honeycomb lattice with unpaired Dirac cones in its spectrum is introduced that can support unidirectional edge states appearing at the edge between two regions with opposite sublattice detuning. Topological properties of this system are characterized by the nonzero valley Chern number. Remarkably, edge states in this system can circumvent sharp corners without inter-valley scattering even though there is no total forbidden gap in the spectrum. Our results reveal unusual interplay between two different physical mechanisms of creation of topological edge states based on simultaneous breakup of different symmetries of the system. △ Less

Submitted 6 July, 2024; originally announced July 2024.

Comments: 9 pages, 7 figures. To appear in Photonics Research. Comments are welcome
arXiv:2407.05082 [pdf, other]

cs.LG cs.AI cs.CV stat.ML

DMTG: One-Shot Differentiable Multi-Task Grouping

Authors: Yuan Gao, Shuguo Jiang, Moran Li, Jin-Gang Yu, Gui-Song Xia

Abstract: We aim to address Multi-Task Learning (MTL) with a large number of tasks by Multi-Task Grouping (MTG). Given N tasks, we propose to simultaneously identify the best task groups from 2^N candidates and train the model weights simultaneously in one-shot, with the high-order task-affinity fully exploited. This is distinct from the pioneering methods which sequentially identify the groups and train th… ▽ More We aim to address Multi-Task Learning (MTL) with a large number of tasks by Multi-Task Grouping (MTG). Given N tasks, we propose to simultaneously identify the best task groups from 2^N candidates and train the model weights simultaneously in one-shot, with the high-order task-affinity fully exploited. This is distinct from the pioneering methods which sequentially identify the groups and train the model weights, where the group identification often relies on heuristics. As a result, our method not only improves the training efficiency, but also mitigates the objective bias introduced by the sequential procedures that potentially lead to a suboptimal solution. Specifically, we formulate MTG as a fully differentiable pruning problem on an adaptive network architecture determined by an underlying Categorical distribution. To categorize N tasks into K groups (represented by K encoder branches), we initially set up KN task heads, where each branch connects to all N task heads to exploit the high-order task-affinity. Then, we gradually prune the KN heads down to N by learning a relaxed differentiable Categorical distribution, ensuring that each task is exclusively and uniquely categorized into only one branch. Extensive experiments on CelebA and Taskonomy datasets with detailed ablations show the promising performance and efficiency of our method. The codes are available at https://github.com/ethanygao/DMTG. △ Less

Submitted 6 July, 2024; originally announced July 2024.

Comments: Accepted to ICML 2024

Journal ref: International Conference on Machine Learning (ICML), 2024
arXiv:2407.04955 [pdf, other]

cs.CV

Asynchronous Multimodal Video Sequence Fusion via Learning Modality-Exclusive and -Agnostic Representations

Authors: Dingkang Yang, Mingcheng Li, Linhao Qu, Kun Yang, Peng Zhai, Song Wang, Lihua Zhang

Abstract: Understanding human intentions (e.g., emotions) from videos has received considerable attention recently. Video streams generally constitute a blend of temporal data stemming from distinct modalities, including natural language, facial expressions, and auditory clues. Despite the impressive advancements of previous works via attention-based paradigms, the inherent temporal asynchrony and modality… ▽ More Understanding human intentions (e.g., emotions) from videos has received considerable attention recently. Video streams generally constitute a blend of temporal data stemming from distinct modalities, including natural language, facial expressions, and auditory clues. Despite the impressive advancements of previous works via attention-based paradigms, the inherent temporal asynchrony and modality heterogeneity challenges remain in multimodal sequence fusion, causing adverse performance bottlenecks. To tackle these issues, we propose a Multimodal fusion approach for learning modality-Exclusive and modality-Agnostic representations (MEA) to refine multimodal features and leverage the complementarity across distinct modalities. On the one hand, MEA introduces a predictive self-attention module to capture reliable context dynamics within modalities and reinforce unique features over the modality-exclusive spaces. On the other hand, a hierarchical cross-modal attention module is designed to explore valuable element correlations among modalities over the modality-agnostic space. Meanwhile, a double-discriminator strategy is presented to ensure the production of distinct representations in an adversarial manner. Eventually, we propose a decoupled graph fusion mechanism to enhance knowledge exchange across heterogeneous modalities and learn robust multimodal representations for downstream tasks. Numerous experiments are implemented on three multimodal datasets with asynchronous sequences. Systematic analyses show the necessity of our approach. △ Less

Submitted 6 July, 2024; originally announced July 2024.

Comments: TCSVT 2024
arXiv:2407.04928 [pdf, other]

cs.CV eess.IV

CLIPVQA:Video Quality Assessment via CLIP

Authors: Fengchuang Xing, Mingjie Li, Yuan-Gen Wang, Guopu Zhu, Xiaochun Cao

Abstract: In learning vision-language representations from web-scale data, the contrastive language-image pre-training (CLIP) mechanism has demonstrated a remarkable performance in many vision tasks. However, its application to the widely studied video quality assessment (VQA) task is still an open issue. In this paper, we propose an efficient and effective CLIP-based Transformer method for the VQA problem… ▽ More In learning vision-language representations from web-scale data, the contrastive language-image pre-training (CLIP) mechanism has demonstrated a remarkable performance in many vision tasks. However, its application to the widely studied video quality assessment (VQA) task is still an open issue. In this paper, we propose an efficient and effective CLIP-based Transformer method for the VQA problem (CLIPVQA). Specifically, we first design an effective video frame perception paradigm with the goal of extracting the rich spatiotemporal quality and content information among video frames. Then, the spatiotemporal quality features are adequately integrated together using a self-attention mechanism to yield video-level quality representation. To utilize the quality language descriptions of videos for supervision, we develop a CLIP-based encoder for language embedding, which is then fully aggregated with the generated content information via a cross-attention module for producing video-language representation. Finally, the video-level quality and video-language representations are fused together for final video quality prediction, where a vectorized regression loss is employed for efficient end-to-end optimization. Comprehensive experiments are conducted on eight in-the-wild video datasets with diverse resolutions to evaluate the performance of CLIPVQA. The experimental results show that the proposed CLIPVQA achieves new state-of-the-art VQA performance and up to 37% better generalizability than existing benchmark VQA methods. A series of ablation studies are also performed to validate the effectiveness of each module in CLIPVQA. △ Less

Submitted 5 July, 2024; originally announced July 2024.
arXiv:2407.04913 [pdf, other]

astro-ph.SR

Tilted Disk Precession and Negative Superhumps in HS 2325+8205: A Multi-Window Analysis

Authors: Qi-Bin Sun, Sheng-Bang Qian, Li-Ying Zhu, Qin-Mei Li, Min-Yu Li, Ping Li

Abstract: Tilted disk precession exists in different objects. Negative superhumps (NSHs) in cataclysmic variable stars (CVs) are hypothesized to arise from the interaction between the reverse precession of a tilted disk and the streams from the secondary star. Utilizing TESS photometry, we present a comprehensive investigation into the tilted disk precession and NSHs in the dwarf nova (DN) HS 2325+8205, emp… ▽ More Tilted disk precession exists in different objects. Negative superhumps (NSHs) in cataclysmic variable stars (CVs) are hypothesized to arise from the interaction between the reverse precession of a tilted disk and the streams from the secondary star. Utilizing TESS photometry, we present a comprehensive investigation into the tilted disk precession and NSHs in the dwarf nova (DN) HS 2325+8205, employing eclipse minima, eclipse depths, NSH frequencies, and NSH amplitudes and the correlation between them as the windows. We report the discovery of NSHs in HS 2325+8205 with a period of 0.185671(17) d. The NSH frequency was found to vary with a period of 3.943(9) d, similar to the tilted disk precession period validated in novae-like star (NL, SDSS J0812) and intermediate polar (IP, TV Col). The eclipsing minima of O-C were similarly found to vary cyclically in period 4.135(5) d, showing faster rise than fall. Furthermore, NSH amplitude varies parabolically, linearly increasing with periodic variations, potentially linked to changes in disk radius, mass transfer rate, and apparent area of the hot spot. Additionally, for the first time in DNe, we observe bi-periodic variations in eclipse depth (P1= 4.131(4) d and P2= 2.065(2) d ~ Pprec/2), resembling those seen in IPs, suggesting that variations with P2 are not attributable to the accretion curtain. Moreover, NSH amplitude and eclipse depth decrease with increasing NSH frequency, while NSH amplitude correlates positively with eclipse depth.These complex variations observed across multiple observational windows provide substantial evidence for understanding of tilted disk precession and NSHs. △ Less

Submitted 5 July, 2024; originally announced July 2024.

Comments: 26 pages, 10 figures and 2 tables
arXiv:2407.04557 [pdf, other]

cond-mat.mtrl-sci cs.LG

Structural Constraint Integration in Generative Model for Discovery of Quantum Material Candidates

Authors: Ryotaro Okabe, Mouyang Cheng, Abhijatmedhi Chotrattanapituk, Nguyen Tuan Hung, Xiang Fu, Bowen Han, Yao Wang, Weiwei Xie, Robert J. Cava, Tommi S. Jaakkola, Yongqiang Cheng, Mingda Li

Abstract: Billions of organic molecules are known, but only a tiny fraction of the functional inorganic materials have been discovered, a particularly relevant problem to the community searching for new quantum materials. Recent advancements in machine-learning-based generative models, particularly diffusion models, show great promise for generating new, stable materials. However, integrating geometric patt… ▽ More Billions of organic molecules are known, but only a tiny fraction of the functional inorganic materials have been discovered, a particularly relevant problem to the community searching for new quantum materials. Recent advancements in machine-learning-based generative models, particularly diffusion models, show great promise for generating new, stable materials. However, integrating geometric patterns into materials generation remains a challenge. Here, we introduce Structural Constraint Integration in the GENerative model (SCIGEN). Our approach can modify any trained generative diffusion model by strategic masking of the denoised structure with a diffused constrained structure prior to each diffusion step to steer the generation toward constrained outputs. Furthermore, we mathematically prove that SCIGEN effectively performs conditional sampling from the original distribution, which is crucial for generating stable constrained materials. We generate eight million compounds using Archimedean lattices as prototype constraints, with over 10% surviving a multi-staged stability pre-screening. High-throughput density functional theory (DFT) on 26,000 survived compounds shows that over 50% passed structural optimization at the DFT level. Since the properties of quantum materials are closely related to geometric patterns, our results indicate that SCIGEN provides a general framework for generating quantum materials candidates. △ Less

Submitted 5 July, 2024; originally announced July 2024.

Comments: 512 pages total, 4 main figures + 218 supplementary figures
arXiv:2407.04524 [pdf, other]

math.NA

Energy-stable parametric finite element approximations for regularized solid-state dewetting in strongly anisotropic materials

Authors: Meng Li, Chunjie Zhou

Abstract: In this work, we aim to develop energy-stable parametric finite element approximations for a sharp-interface model with strong surface energy anisotropy, which is derived from the first variation of an energy functional composed of film/vapor interfacial energy, substrate energy, and regularized Willmore energy. By introducing two geometric relations, we innovatively establish an equivalent regula… ▽ More In this work, we aim to develop energy-stable parametric finite element approximations for a sharp-interface model with strong surface energy anisotropy, which is derived from the first variation of an energy functional composed of film/vapor interfacial energy, substrate energy, and regularized Willmore energy. By introducing two geometric relations, we innovatively establish an equivalent regularized sharp-interface model and further construct an energy-stable parametric finite element algorithm for this equivalent model. We provide a detailed proof of the energy stability of the numerical scheme, addressing a gap in the relevant theory. Additionally, we develop another structure-preserving parametric finite element scheme that can preserve both area conservation and energy stability. Finally, we present several numerical simulations to show accuracy and efficiency as well as some structure-preserving properties of the proposed numerical methods. More importantly, extensive numerical simulations reveal that our schemes provide better mesh quality and are more suitable for long-term computations. △ Less

Submitted 5 July, 2024; originally announced July 2024.
arXiv:2407.04211 [pdf, other]

cs.LG

TimeLDM: Latent Diffusion Model for Unconditional Time Series Generation

Authors: Jian Qian, Miao Sun, Sifan Zhou, Biao Wan, Minhao Li, Patrick Chiang

Abstract: Time series generation is a crucial research topic in the area of deep learning, which can be used for data augmentation, imputing missing values, and forecasting. Currently, latent diffusion models are ascending to the forefront of generative modeling for many important data representations. Being the most pivotal in the computer vision domain, latent diffusion models have also recently attracted… ▽ More Time series generation is a crucial research topic in the area of deep learning, which can be used for data augmentation, imputing missing values, and forecasting. Currently, latent diffusion models are ascending to the forefront of generative modeling for many important data representations. Being the most pivotal in the computer vision domain, latent diffusion models have also recently attracted interest in other communities, including NLP, Speech, and Geometric Space. In this work, we propose TimeLDM, a novel latent diffusion model for high-quality time series generation. TimeLDM is composed of a variational autoencoder that encodes time series into an informative and smoothed latent content and a latent diffusion model operating in the latent space to generate latent information. We evaluate the ability of our method to generate synthetic time series with simulated and realistic datasets, benchmark the performance against existing state-of-the-art methods. Qualitatively and quantitatively, we find that the proposed TimeLDM persistently delivers high-quality generated time series. Sores from Context-FID and Discriminative indicate that TimeLDM consistently and significantly outperforms current state-of-the-art benchmarks with an average improvement of 3.4$\times$ and 3.8$\times$, respectively. Further studies demonstrate that our method presents better performance on different lengths of time series data generation. To the best of our knowledge, this is the first study to explore the potential of the latent diffusion model for unconditional time series generation and establish a new baseline for synthetic time series. △ Less

Submitted 4 July, 2024; originally announced July 2024.
arXiv:2407.02899 [pdf, other]

hep-ex

Measurement of the branching fraction of the decay $J/ψ\to p \bar{p} η$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (639 additional authors not shown)

Abstract: A high precision measurement of the branching fraction of the decay $J/ψ\to p \bar{p} η$ is performed using $(10 087 \pm 44) \times 10^6$ $J/ψ$ events recorded by the {BESIII} detector at the {BEPCII} storage ring. The branching fractions of the two decays $J/ψ\to p \bar{p} η(η\to γγ)$ and $J/ψ\to p \bar{p} η(η\to π^+ π^- π^0)$ are measured individually to be… ▽ More A high precision measurement of the branching fraction of the decay $J/ψ\to p \bar{p} η$ is performed using $(10 087 \pm 44) \times 10^6$ $J/ψ$ events recorded by the {BESIII} detector at the {BEPCII} storage ring. The branching fractions of the two decays $J/ψ\to p \bar{p} η(η\to γγ)$ and $J/ψ\to p \bar{p} η(η\to π^+ π^- π^0)$ are measured individually to be $\mathcal{B}(J/ψ\to p \bar{p} η(η\to γγ)) = (1.480 \pm 0.001 \pm 0.024)\times\,10^{-3}$ and $\mathcal{B}(J/ψ\to p \bar{p} η(η\to π^+ π^- π^0)) = (1.557 \pm 0.003 \pm 0.038)\times\,10^{-3}$, where the first uncertainties are statistical and the second systematic. Both results are compatible within their uncorrelated systematic uncertainties. The combined result is $\mathcal{B}(J/ψ\to p \bar{p} η)=(1.495 \pm 0.001 \pm 0.023)\times\,10^{-3}$ where the first uncertainty is the combined statistical uncertainty and the second one the combined systematic uncertainty of both analyses, incorporating correlations between them. In addition, the $p \bar{p}$ threshold region is investigated for a potential threshold enhancement, and no evidence for one is observed. △ Less

Submitted 3 July, 2024; originally announced July 2024.
arXiv:2407.02787 [pdf]

physics.optics quant-ph

A versatile quantum microwave photonic signal processing platform based on coincidence window selection technique

Authors: Xinghua Li, Yifan Guo, Xiao Xiang, Runai Quan, Mingtao Cao, Ruifang Dong, Tao Liu, Ming Li, Shougang Zhang

Abstract: Quantum microwave photonics (QMWP) is an innovative approach that combines energy-time entangled biphoton sources as the optical carrier with time-correlated single-photon detection for high-speed RF signal recovery. This groundbreaking method offers unique advantages such as nonlocal RF signal encoding and robust resistance to dispersion-induced frequency fading. This paper explores the versatili… ▽ More Quantum microwave photonics (QMWP) is an innovative approach that combines energy-time entangled biphoton sources as the optical carrier with time-correlated single-photon detection for high-speed RF signal recovery. This groundbreaking method offers unique advantages such as nonlocal RF signal encoding and robust resistance to dispersion-induced frequency fading. This paper explores the versatility of processing the quantum microwave photonic signal by utilizing coincidence window selection on the biphoton coincidence distribution. The demonstration includes finely-tunable RF phase shifting, flexible multi-tap transversal filtering (with up to 15 taps), and photonically implemented RF mixing, leveraging the nonlocal RF mapping characteristic of QMWP. These accomplishments significantly enhance the capability of microwave photonic systems in processing ultra-weak signals, opening up new possibilities for various applications. △ Less

Submitted 2 July, 2024; originally announced July 2024.
arXiv:2407.02774 [pdf]

physics.optics quant-ph

Quantum microwave photonic mixer with a large spurious-free dynamic range

Authors: Xinghua Li, Yifan Guo, Xiao Xiang, Runai Quan, Mingtao Cao, Ruifang Dong, Tao Liu, Ming Li, Shougang Zhang

Abstract: As one of the most fundamental functionalities of microwave photonics, microwave frequency mixing plays an essential role in modern radars and wireless communication systems. However, the commonly utilized intensity modulation in the systems often leads to inadequate spurious-free dynamic range (SFDR) for many sought-after applications. Quantum microwave photonics technique offers a promising solu… ▽ More As one of the most fundamental functionalities of microwave photonics, microwave frequency mixing plays an essential role in modern radars and wireless communication systems. However, the commonly utilized intensity modulation in the systems often leads to inadequate spurious-free dynamic range (SFDR) for many sought-after applications. Quantum microwave photonics technique offers a promising solution for improving SFDR in terms of higher-order harmonic distortion. In this paper, we demonstrate two types of quantum microwave photonic mixers based on the configuration of the intensity modulators: cascade-type and parallel-type. Leveraging the nonlocal RF signal encoding capability, both types of quantum microwave photonic mixers not only exhibit the advantage of dual-channel output but also present significant improvement in SFDR. Specifically, the parallel-type quantum microwave photonic mixer achieves a remarkable SFDR value of 113.6 dB.Hz1/2, which is 30 dB better than that of the cascade-type quantum microwave photonic mixer. When compared to the classical microwave photonic mixer, this enhancement reaches a notable 53.6 dB at the expense of 8 dB conversion loss. These results highlight the superiority of quantum microwave photonic mixers in the fields of microwave and millimeter-wave systems. Further applying multi-photon frequency entangled sources as optical carriers, the dual-channel microwave frequency conversion capability endowed by the quantum microwave photonic mixer can be extended to enhance the performance of multiple-paths microwave mixing which is essential for radar net systems. △ Less

Submitted 2 July, 2024; originally announced July 2024.
arXiv:2407.02446 [pdf, other]

cs.CL cs.AI

Predicting vs. Acting: A Trade-off Between World Modeling & Agent Modeling

Authors: Margaret Li, Weijia Shi, Artidoro Pagnoni, Peter West, Ari Holtzman

Abstract: RLHF-aligned LMs have shown unprecedented ability on both benchmarks and long-form text generation, yet they struggle with one foundational task: next-token prediction. As RLHF models become agent models aimed at interacting with humans, they seem to lose their world modeling -- the ability to predict what comes next in arbitrary documents, which is the foundational training objective of the Base… ▽ More RLHF-aligned LMs have shown unprecedented ability on both benchmarks and long-form text generation, yet they struggle with one foundational task: next-token prediction. As RLHF models become agent models aimed at interacting with humans, they seem to lose their world modeling -- the ability to predict what comes next in arbitrary documents, which is the foundational training objective of the Base LMs that RLHF adapts. Besides empirically demonstrating this trade-off, we propose a potential explanation: to perform coherent long-form generation, RLHF models restrict randomness via implicit blueprints. In particular, RLHF models concentrate probability on sets of anchor spans that co-occur across multiple generations for the same prompt, serving as textual scaffolding but also limiting a model's ability to generate documents that do not include these spans. We study this trade-off on the most effective current agent models, those aligned with RLHF, while exploring why this may remain a fundamental trade-off between models that act and those that predict, even as alignment techniques improve. △ Less

Submitted 2 July, 2024; originally announced July 2024.
arXiv:2407.02022 [pdf, ps, other]

math.CV math.AG math.DG

Smooth deformation limit of Moishezon manifolds is Moishezon

Authors: Mu-lin Li, Sheng Rao, Kai Wang, Meng-jiao Wang

Abstract: We prove the conjecture that the deformation limit of Moishezon manifolds under a smooth deformation over a unit disk in $\mathbb{C}$ is Moishezon. We prove the conjecture that the deformation limit of Moishezon manifolds under a smooth deformation over a unit disk in $\mathbb{C}$ is Moishezon. △ Less

Submitted 2 July, 2024; originally announced July 2024.

Comments: All comments are welcome
arXiv:2407.01891 [pdf, other]

cs.RO eess.SY

Refined Motion Compensation with Soft Laser Manipulators using Data-Driven Surrogate Models

Authors: Yongjun Yan, Qingpeng Ding, Mingwu Li, Junyan Yan, Shing Shin Cheng

Abstract: Non-contact laser ablation, a precise thermal technique, simultaneously cuts and coagulates tissue without the insertion errors associated with rigid needles. Human organ motions, such as those in the liver, exhibit rhythmic components influenced by respiratory and cardiac cycles, making effective laser energy delivery to target lesions while compensating for tumor motion crucial. This research in… ▽ More Non-contact laser ablation, a precise thermal technique, simultaneously cuts and coagulates tissue without the insertion errors associated with rigid needles. Human organ motions, such as those in the liver, exhibit rhythmic components influenced by respiratory and cardiac cycles, making effective laser energy delivery to target lesions while compensating for tumor motion crucial. This research introduces a data-driven method to derive surrogate models of a soft manipulator. These low-dimensional models offer computational efficiency when integrated into the Model Predictive Control (MPC) framework, while still capturing the manipulator's dynamics with and without control input. Spectral Submanifolds (SSM) theory models the manipulator's autonomous dynamics, acknowledging its tendency to reach equilibrium when external forces are removed. Preliminary results show that the MPC controller using the surrogate model outperforms two other models within the same MPC framework. The data-driven MPC controller also supports a design-agnostic feature, allowing the interchangeability of different soft manipulators within the laser ablation surgery robot system. △ Less

Submitted 1 July, 2024; originally announced July 2024.
arXiv:2407.01316 [pdf, other]

cs.LG cs.CY stat.ML

Evaluating Model Performance Under Worst-case Subpopulations

Authors: Mike Li, Hongseok Namkoong, Shangzhou Xia

Abstract: The performance of ML models degrades when the training population is different from that seen under operation. Towards assessing distributional robustness, we study the worst-case performance of a model over all subpopulations of a given size, defined with respect to core attributes Z. This notion of robustness can consider arbitrary (continuous) attributes Z, and automatically accounts for compl… ▽ More The performance of ML models degrades when the training population is different from that seen under operation. Towards assessing distributional robustness, we study the worst-case performance of a model over all subpopulations of a given size, defined with respect to core attributes Z. This notion of robustness can consider arbitrary (continuous) attributes Z, and automatically accounts for complex intersectionality in disadvantaged groups. We develop a scalable yet principled two-stage estimation procedure that can evaluate the robustness of state-of-the-art models. We prove that our procedure enjoys several finite-sample convergence guarantees, including dimension-free convergence. Instead of overly conservative notions based on Rademacher complexities, our evaluation error depends on the dimension of Z only through the out-of-sample error in estimating the performance conditional on Z. On real datasets, we demonstrate that our method certifies the robustness of a model and prevents deployment of unreliable models. △ Less

Submitted 1 July, 2024; originally announced July 2024.

Comments: Earlier version appeared in the proceedings of Advances in Neural Information Processing Systems 34 (NeurIPS 2021): https://proceedings.neurips.cc/paper_files/paper/2021/file/908075ea2c025c335f4865f7db427062-Paper.pdf
arXiv:2407.01281 [pdf, other]

cs.LG cs.AI math.FA

Bridging Smoothness and Approximation: Theoretical Insights into Over-Smoothing in Graph Neural Networks

Authors: Guangrui Yang, Jianfei Li, Ming Li, Han Feng, Ding-Xuan Zhou

Abstract: In this paper, we explore the approximation theory of functions defined on graphs. Our study builds upon the approximation results derived from the $K$-functional. We establish a theoretical framework to assess the lower bounds of approximation for target functions using Graph Convolutional Networks (GCNs) and examine the over-smoothing phenomenon commonly observed in these networks. Initially, we… ▽ More In this paper, we explore the approximation theory of functions defined on graphs. Our study builds upon the approximation results derived from the $K$-functional. We establish a theoretical framework to assess the lower bounds of approximation for target functions using Graph Convolutional Networks (GCNs) and examine the over-smoothing phenomenon commonly observed in these networks. Initially, we introduce the concept of a $K$-functional on graphs, establishing its equivalence to the modulus of smoothness. We then analyze a typical type of GCN to demonstrate how the high-frequency energy of the output decays, an indicator of over-smoothing. This analysis provides theoretical insights into the nature of over-smoothing within GCNs. Furthermore, we establish a lower bound for the approximation of target functions by GCNs, which is governed by the modulus of smoothness of these functions. This finding offers a new perspective on the approximation capabilities of GCNs. In our numerical experiments, we analyze several widely applied GCNs and observe the phenomenon of energy decay. These observations corroborate our theoretical results on exponential decay order. △ Less

Submitted 1 July, 2024; originally announced July 2024.
arXiv:2407.00948 [pdf, other]

cs.CL cs.AI cs.LG

The House Always Wins: A Framework for Evaluating Strategic Deception in LLMs

Authors: Tanush Chopra, Michael Li

Abstract: We propose a framework for evaluating strategic deception in large language models (LLMs). In this framework, an LLM acts as a game master in two scenarios: one with random game mechanics and another where it can choose between random or deliberate actions. As an example, we use blackjack because the action space nor strategies involve deception. We benchmark Llama3-70B, GPT-4-Turbo, and Mixtral i… ▽ More We propose a framework for evaluating strategic deception in large language models (LLMs). In this framework, an LLM acts as a game master in two scenarios: one with random game mechanics and another where it can choose between random or deliberate actions. As an example, we use blackjack because the action space nor strategies involve deception. We benchmark Llama3-70B, GPT-4-Turbo, and Mixtral in blackjack, comparing outcomes against expected distributions in fair play to determine if LLMs develop strategies favoring the "house." Our findings reveal that the LLMs exhibit significant deviations from fair play when given implicit randomness instructions, suggesting a tendency towards strategic manipulation in ambiguous scenarios. However, when presented with an explicit choice, the LLMs largely adhere to fair play, indicating that the framing of instructions plays a crucial role in eliciting or mitigating potentially deceptive behaviors in AI systems. △ Less

Submitted 1 July, 2024; originally announced July 2024.

Comments: Research conducted at the Deception Detection Hackathon 2024 hosted by Apart & Apollo Research
arXiv:2407.00421 [pdf]

physics.optics

Multi-wavelength switchable single-frequency hyper Raman microlasers

Authors: Chuntao Li, Ni Yao, Jintian Lin, Renhong Gao, Jianglin Guan, Guanghui Zhao, Minghui Li, Min Wang, Lingling Qiao, Ya Cheng

Abstract: Multi-wavelength switchable single-frequency microlasers in a broad spectral range are highly desirable for integrated photonic applications due to their dynamic switching functionality, narrow linewidth, and high side-mode-suppression-ratio (SMSR). Here, a strategy based on highly efficient successive excitation of different stimulated multi-photon hyper-Raman scattering (SMPHRS) processes is pro… ▽ More Multi-wavelength switchable single-frequency microlasers in a broad spectral range are highly desirable for integrated photonic applications due to their dynamic switching functionality, narrow linewidth, and high side-mode-suppression-ratio (SMSR). Here, a strategy based on highly efficient successive excitation of different stimulated multi-photon hyper-Raman scattering (SMPHRS) processes is proposed to generate multi-wavelength switchable single-frequency hyper-Raman microlasers. This is achieved through collective precise dispersion management for arranging excitation wavelengths to trigger different phase-matched SMPHRS processes in order, mode-hopping-free tuning of the pump wavelength within a wide range of 0.75 nm by leveraging strong thermo-optical broadening of ultra-high Q modes, and simultaneously suppressing harmonics generation in a lithium niobate microcavity with high second-order nonlinearity. As a result, under continuous-wave laser pump at a low level of only 3.9 mW, SMPHRS processes from two- to five-photons emerged step by step and almost depleted previously generated multi-photon Raman signal. Consequently, four-wavelength dynamically switchable single-mode lasing from near infrared (857 nm) to ultraviolet (350 nm) spanning beyond the record range (~500 nm) with high SMSRs >35 dB is reported. △ Less

Submitted 29 June, 2024; originally announced July 2024.

Comments: 17 pages,5 figures, and 1 table
arXiv:2407.00163 [pdf]

cond-mat.mtrl-sci cond-mat.str-el

Pressure Tuning the Mixture of Eu$^{2+}$ and Eu$^{3+}$ in Eu$_4$Bi$_6$Se$_{13}$

Authors: Mingyu Xu, Jose L. Gonzalez Jimenez, Greeshma C. Jose, Artittaya Boonkird, Chengkun Xing, Chelsea Harrod, Xinle Li, Haidong Zhou, Alyssa Gaiser, Xianglin Ke, Wenli Bi, Mingda Li, Weiwei Xie

Abstract: The investigation of crystallographic, electronic, and magnetic characteristics, especially the mixed valences of Eu$^{2+}$ and Eu$^{3+}$ under pressure of a novel europium-based bismuth selenide compound, Eu$_4$Bi$_6$Se$_{13}$, presented. This new compound adopts a monoclinic crystal structure classified under the P$2_1$/m space group (#11). It exhibits distinctive structural features, including… ▽ More The investigation of crystallographic, electronic, and magnetic characteristics, especially the mixed valences of Eu$^{2+}$ and Eu$^{3+}$ under pressure of a novel europium-based bismuth selenide compound, Eu$_4$Bi$_6$Se$_{13}$, presented. This new compound adopts a monoclinic crystal structure classified under the P$2_1$/m space group (#11). It exhibits distinctive structural features, including substantial Eu-Se coordination numbers, Bi-Se ladders, and linear chains of Eu atoms that propagate along the b-axis. Electronic resistivity assessments indicate that Eu$_{4}$Bi$_{6}$Se$_{13}$ exhibits weak metallic behaviors. Magnetic characterization reveals uniaxial magnetic anisotropy, with a notable spin transition at approximately 1.2 T when the magnetic field is oriented along the b-axis. This behavior, coupled with the specific Eu-Eu interatomic distances and the magnetic saturation observed at low fields, supports the identification of metamagnetic properties attributable to the flipping of europium spins. The Curie-Weiss analysis of the magnetic susceptibility measured both perpendicular and parallel to the b-axis and high-pressure partial fluorescence yield (PFY) results detected by X-ray absorption spectroscopy (XAS) reveal the tendency of the material to enter a mixed valent state where the trivalent state becomes more prominent with the pressure increase or temperature decrease. △ Less

Submitted 28 June, 2024; originally announced July 2024.

Comments: 22 pages 8 figures
arXiv:2407.00136 [pdf, other]

hep-ex

Observation of the Electromagnetic Dalitz Transition $h_c \rightarrow e^+e^-η_c$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, S. Ahmed, M. Albrecht, R. Aliberti, A. Amoroso, M. R. An, Q. An, X. H. Bai, Y. Bai, O. Bakina, R. Baldini Ferroli, I. Balossino, Y. Ban, K. Begzsuren, N. Berger, M. Bertani, D. Bettoni, F. Bianchi, J. Bloms, A. Bortone, I. Boyko, R. A. Briere , et al. (495 additional authors not shown)

Abstract: Using $(27.12\pm 0.14)\times10^8$ $ψ(3686)$ decays and data samples of $e^+e^-$ collisions with $\sqrt{s}$ from 4.130 to 4.780~GeV collected with the BESIII detector, we report the first observation of the electromagnetic Dalitz transition $h_c\to e^+e^-η_c$ with a statistical significance of $5.4σ$. We measure the ratio of the branching fractions… ▽ More Using $(27.12\pm 0.14)\times10^8$ $ψ(3686)$ decays and data samples of $e^+e^-$ collisions with $\sqrt{s}$ from 4.130 to 4.780~GeV collected with the BESIII detector, we report the first observation of the electromagnetic Dalitz transition $h_c\to e^+e^-η_c$ with a statistical significance of $5.4σ$. We measure the ratio of the branching fractions $\frac{\mathcal{B}(h_c\rightarrow e^+e^-η_c)}{\mathcal{B}(h_c\rightarrow γη_c)}$ separately for the $h_c$ samples produced via $ψ(3686)\toπ^0h_c$ and $e^+e^-\toπ^+π^-h_c$. The average ratio is determined to be $(0.59\pm0.10(\text{stat.})\pm0.04(\text{syst.}))\%$, where the uncertainty includes both statistical and systematic components. △ Less

Submitted 2 July, 2024; v1 submitted 28 June, 2024; originally announced July 2024.
arXiv:2407.00046 [pdf, other]

cs.DC cs.GR

Barrier-Augmented Lagrangian for GPU-based Elastodynamic Contact

Authors: Dewen Guo, Minchen Li, Yin Yang, Guoping Wang, Sheng Li

Abstract: We propose a GPU-based iterative method for accelerated elastodynamic simulation with the log-barrier-based contact model. While Newton's method is a conventional choice for solving the interior-point system, the presence of ill-conditioned log barriers often necessitates a direct solution at each linearized substep and costs substantial storage and computational overhead. Moreover, constraint set… ▽ More We propose a GPU-based iterative method for accelerated elastodynamic simulation with the log-barrier-based contact model. While Newton's method is a conventional choice for solving the interior-point system, the presence of ill-conditioned log barriers often necessitates a direct solution at each linearized substep and costs substantial storage and computational overhead. Moreover, constraint sets that vary in each iteration present additional challenges in algorithm convergence. Our method employs a novel barrier-augmented Lagrangian method to improve system conditioning and solver efficiency by adaptively updating an augmentation constraint sets. This enables the utilization of a scalable, inexact Newton-PCG solver with sparse GPU storage, eliminating the need for direct factorization. We further enhance PCG convergence speed with a domain-decomposed warm start strategy based on an eigenvalue spectrum approximated through our in-time assembly. Demonstrating significant scalability improvements, our method makes simulations previously impractical on 128 GB of CPU memory feasible with only 8 GB of GPU memory and orders-of-magnitude faster. Additionally, our method adeptly handles stiff problems, surpassing the capabilities of existing GPU-based interior-point methods. Our results, validated across various complex collision scenarios involving intricate geometries and large deformations, highlight the exceptional performance of our approach. △ Less

Submitted 4 June, 2024; originally announced July 2024.

Comments: 17 pages, 30 figures
arXiv:2406.19987 [pdf, other]

cs.HC

doi 10.1109/VIS54172.2023.00053

Concept Lens: Visually Analyzing the Consistency of Semantic Manipulation in GANs

Authors: Sangwon Jeong, Mingwei Li, Matthew Berger, Shusen Liu

Abstract: As applications of generative AI become mainstream, it is important to understand what generative models are capable of producing, and the extent to which one can predictably control their outputs. In this paper, we propose a visualization design, named Concept Lens, for jointly navigating the data distribution of a generative model, and concept manipulations supported by the model. Our work is fo… ▽ More As applications of generative AI become mainstream, it is important to understand what generative models are capable of producing, and the extent to which one can predictably control their outputs. In this paper, we propose a visualization design, named Concept Lens, for jointly navigating the data distribution of a generative model, and concept manipulations supported by the model. Our work is focused on modern vision-based generative adversarial networks (GAN), and their learned latent spaces, wherein concept discovery has gained significant interest as a means of image manipulation. Concept Lens is designed to support users in understanding the diversity of a provided set of concepts, the relationship between concepts, and the suitability of concepts to give semantic controls for image generation. Key to our approach is the hierarchical grouping of concepts, generated images, and the associated joint exploration. We show how Concept Lens can reveal consistent semantic manipulations for editing images, while also serving as a diagnostic tool for studying the limitations and trade-offs of concept discovery methods. △ Less

Submitted 28 June, 2024; originally announced June 2024.

Journal ref: 2023 IEEE Visualization and Visual Analytics (VIS), Melbourne, Australia, 2023, pp. 221-225
arXiv:2406.19756 [pdf, other]

cs.CV cs.AI

Structure-aware World Model for Probe Guidance via Large-scale Self-supervised Pre-train

Authors: Haojun Jiang, Meng Li, Zhenguo Sun, Ning Jia, Yu Sun, Shaqi Luo, Shiji Song, Gao Huang

Abstract: The complex structure of the heart leads to significant challenges in echocardiography, especially in acquisition cardiac ultrasound images. Successful echocardiography requires a thorough understanding of the structures on the two-dimensional plane and the spatial relationships between planes in three-dimensional space. In this paper, we innovatively propose a large-scale self-supervised pre-trai… ▽ More The complex structure of the heart leads to significant challenges in echocardiography, especially in acquisition cardiac ultrasound images. Successful echocardiography requires a thorough understanding of the structures on the two-dimensional plane and the spatial relationships between planes in three-dimensional space. In this paper, we innovatively propose a large-scale self-supervised pre-training method to acquire a cardiac structure-aware world model. The core innovation lies in constructing a self-supervised task that requires structural inference by predicting masked structures on a 2D plane and imagining another plane based on pose transformation in 3D space. To support large-scale pre-training, we collected over 1.36 million echocardiograms from ten standard views, along with their 3D spatial poses. In the downstream probe guidance task, we demonstrate that our pre-trained model consistently reduces guidance errors across the ten most common standard views on the test set with 0.29 million samples from 74 routine clinical scans, indicating that structure-aware pre-training benefits the scanning. △ Less

Submitted 28 June, 2024; originally announced June 2024.

Comments: Technical report
arXiv:2406.19236 [pdf, other]

cs.AI cs.CV cs.RO

Human-Aware Vision-and-Language Navigation: Bridging Simulation to Reality with Dynamic Human Interactions

Authors: Minghan Li, Heng Li, Zhi-Qi Cheng, Yifei Dong, Yuxuan Zhou, Jun-Yan He, Qi Dai, Teruko Mitamura, Alexander G. Hauptmann

Abstract: Vision-and-Language Navigation (VLN) aims to develop embodied agents that navigate based on human instructions. However, current VLN frameworks often rely on static environments and optimal expert supervision, limiting their real-world applicability. To address this, we introduce Human-Aware Vision-and-Language Navigation (HA-VLN), extending traditional VLN by incorporating dynamic human activitie… ▽ More Vision-and-Language Navigation (VLN) aims to develop embodied agents that navigate based on human instructions. However, current VLN frameworks often rely on static environments and optimal expert supervision, limiting their real-world applicability. To address this, we introduce Human-Aware Vision-and-Language Navigation (HA-VLN), extending traditional VLN by incorporating dynamic human activities and relaxing key assumptions. We propose the Human-Aware 3D (HA3D) simulator, which combines dynamic human activities with the Matterport3D dataset, and the Human-Aware Room-to-Room (HA-R2R) dataset, extending R2R with human activity descriptions. To tackle HA-VLN challenges, we present the Expert-Supervised Cross-Modal (VLN-CM) and Non-Expert-Supervised Decision Transformer (VLN-DT) agents, utilizing cross-modal fusion and diverse training strategies for effective navigation in dynamic human environments. A comprehensive evaluation, including metrics considering human activities, and systematic analysis of HA-VLN's unique challenges, underscores the need for further research to enhance HA-VLN agents' real-world robustness and adaptability. Ultimately, this work provides benchmarks and insights for future research on embodied AI and Sim2Real transfer, paving the way for more realistic and applicable VLN systems in human-populated environments. △ Less

Submitted 4 July, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

Comments: 30 pages, 18 figures, Project Page: https://lpercc.github.io/HA3D_simulator/
arXiv:2406.19190 [pdf, ps, other]

hep-ex

Improved measurement of the semileptonic decay $D^+_{s}\to K^0 e^+ν_e$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (643 additional authors not shown)

Abstract: Analyzing $e^+e^-$ collision data corresponding to an integrated luminosity of $7.33~\mathrm{fb}^{-1}$ collected at center-of-mass energies between 4.128 and 4.226~GeV with the BESIII detector, we measure the branching fraction of the semileptonic decay $D^+_{s}\to K^0 e^+ν_e$ to be $(2.98\pm0.23\pm0.12)\times10^{-3}$. The $D_s^+\to K^0$ hadronic form factor is determined from the differential dec… ▽ More Analyzing $e^+e^-$ collision data corresponding to an integrated luminosity of $7.33~\mathrm{fb}^{-1}$ collected at center-of-mass energies between 4.128 and 4.226~GeV with the BESIII detector, we measure the branching fraction of the semileptonic decay $D^+_{s}\to K^0 e^+ν_e$ to be $(2.98\pm0.23\pm0.12)\times10^{-3}$. The $D_s^+\to K^0$ hadronic form factor is determined from the differential decay rate of $D^+_s\to K^0 e^+ν_e$ to be $f^{K^0}_+(0)=0.636\pm0.049\pm0.013$. For both measurements, the first uncertainty is statistical and the second systematic. The branching fraction and form factor measurements are factors of 1.6 and 1.7 more precise than the previous world averages, respectively. △ Less

Submitted 27 June, 2024; originally announced June 2024.

Comments: 13 pages, 6 figures
arXiv:2406.18870 [pdf, ps, other]

math.CO

Exact results on traces of sets

Authors: Mingze Li, Jie Ma, Mingyuan Rong

Abstract: For non-negative integers $n$, $m$, $a$ and $b$, we write $\left( n,m \right) \rightarrow \left( a,b \right)$ if for every family $\mathcal{F}\subseteq 2^{[n]}$ with $|\mathcal{F}|\geqslant m$ there is an $a$-element set $T\subseteq [n]$ such that $\left| \mathcal{F}_{\mid T} \right| \geqslant b$, where $\mathcal{F}_{\mid T}=\{ F \cap T : F \in \mathcal{F} \}$. A longstanding problem in extremal s… ▽ More For non-negative integers $n$, $m$, $a$ and $b$, we write $\left( n,m \right) \rightarrow \left( a,b \right)$ if for every family $\mathcal{F}\subseteq 2^{[n]}$ with $|\mathcal{F}|\geqslant m$ there is an $a$-element set $T\subseteq [n]$ such that $\left| \mathcal{F}_{\mid T} \right| \geqslant b$, where $\mathcal{F}_{\mid T}=\{ F \cap T : F \in \mathcal{F} \}$. A longstanding problem in extremal set theory asks to determine $m(s)=\lim_{n\rightarrow +\infty}\frac{m(n,s)}{n}$, where $m(n,s)$ denotes the maximum integer $m$ such that $\left( n,m \right) \rightarrow \left( n-1,m-s \right)$ holds for non-negatives $n$ and $s$. In this paper, we establish the exact value of $m(2^{d-1}-c)$ for all $1\leqslant c\leqslant d$ whenever $d\geqslant 50$, thereby solving an open problem posed by Piga and Schülke. To be precise, we show that $$m(n,2^{d-1}-c)=\frac{2^{d}-c}{d}n \mbox{ for } 1\leq c\leq d-1 \mbox{ and } d\mid n, \mbox{ and } m(n,2^{d-1}-d)=\frac{2^{d}-d-0.5}{d}n \mbox{ for } 2d\mid n $$ holds for $d\geq 50$. Furthermore, we provide a proof that confirms a conjecture of Frankl and Watanabe from 1994, demonstrating that $m(11)=5.3$. △ Less

Submitted 26 June, 2024; originally announced June 2024.
arXiv:2406.18588 [pdf, other]

cs.CV cs.LG

Varying Manifolds in Diffusion: From Time-varying Geometries to Visual Saliency

Authors: Junhao Chen, Manyi Li, Zherong Pan, Xifeng Gao, Changhe Tu

Abstract: Deep generative models learn the data distribution, which is concentrated on a low-dimensional manifold. The geometric analysis of distribution transformation provides a better understanding of data structure and enables a variety of applications. In this paper, we study the geometric properties of the diffusion model, whose forward diffusion process and reverse generation process construct a seri… ▽ More Deep generative models learn the data distribution, which is concentrated on a low-dimensional manifold. The geometric analysis of distribution transformation provides a better understanding of data structure and enables a variety of applications. In this paper, we study the geometric properties of the diffusion model, whose forward diffusion process and reverse generation process construct a series of distributions on manifolds which vary over time. Our key contribution is the introduction of generation rate, which corresponds to the local deformation of manifold over time around an image component. We show that the generation rate is highly correlated with intuitive visual properties, such as visual saliency, of the image component. Further, we propose an efficient and differentiable scheme to estimate the generation rate for a given image component over time, giving rise to a generation curve. The differentiable nature of our scheme allows us to control the shape of the generation curve via optimization. Using different loss functions, our generation curve matching algorithm provides a unified framework for a range of image manipulation tasks, including semantic transfer, object removal, saliency manipulation, image blending, etc. We conduct comprehensive analytical evaluations to support our findings and evaluate our framework on various manipulation tasks. The results show that our method consistently leads to better manipulation results, compared to recent baselines. △ Less

Submitted 7 June, 2024; originally announced June 2024.
arXiv:2406.18546 [pdf]

cs.CV cs.AI

Application of Multimodal Fusion Deep Learning Model in Disease Recognition

Authors: Xiaoyi Liu, Hongjie Qiu, Muqing Li, Zhou Yu, Yutian Yang, Yafeng Yan

Abstract: This paper introduces an innovative multi-modal fusion deep learning approach to overcome the drawbacks of traditional single-modal recognition techniques. These drawbacks include incomplete information and limited diagnostic accuracy. During the feature extraction stage, cutting-edge deep learning models including convolutional neural networks (CNN), recurrent neural networks (RNN), and transform… ▽ More This paper introduces an innovative multi-modal fusion deep learning approach to overcome the drawbacks of traditional single-modal recognition techniques. These drawbacks include incomplete information and limited diagnostic accuracy. During the feature extraction stage, cutting-edge deep learning models including convolutional neural networks (CNN), recurrent neural networks (RNN), and transformers are applied to distill advanced features from image-based, temporal, and structured data sources. The fusion strategy component seeks to determine the optimal fusion mode tailored to the specific disease recognition task. In the experimental section, a comparison is made between the performance of the proposed multi-mode fusion model and existing single-mode recognition methods. The findings demonstrate significant advantages of the multimodal fusion model across multiple evaluation metrics. △ Less

Submitted 22 May, 2024; originally announced June 2024.
arXiv:2406.18311 [pdf, other]

cs.LG

Online Learning of Multiple Tasks and Their Relationships : Testing on Spam Email Data and EEG Signals Recorded in Construction Fields

Authors: Yixin Jin, Wenjing Zhou, Meiqi Wang, Meng Li, Xintao Li, Tianyu Hu

Abstract: This paper examines an online multi-task learning (OMTL) method, which processes data sequentially to predict labels across related tasks. The framework learns task weights and their relatedness concurrently. Unlike previous models that assumed static task relatedness, our approach treats tasks as initially independent, updating their relatedness iteratively using newly calculated weight vectors.… ▽ More This paper examines an online multi-task learning (OMTL) method, which processes data sequentially to predict labels across related tasks. The framework learns task weights and their relatedness concurrently. Unlike previous models that assumed static task relatedness, our approach treats tasks as initially independent, updating their relatedness iteratively using newly calculated weight vectors. We introduced three rules to update the task relatedness matrix: OMTLCOV, OMTLLOG, and OMTLVON, and compared them against a conventional method (CMTL) that uses a fixed relatedness value. Performance evaluations on three datasets a spam dataset and two EEG datasets from construction workers under varying conditions demonstrated that our OMTL methods outperform CMTL, improving accuracy by 1% to 3% on EEG data, and maintaining low error rates around 12% on the spam dataset. △ Less

Submitted 29 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

Search v0.5.6 released 2020-02-24