-
Dynamic neural network with memristive CIM and CAM for 2D and 3D vision
Authors:
Yue Zhang,
Woyu Zhang,
Shaocong Wang,
Ning Lin,
Yifei Yu,
Yangu He,
Bo Wang,
Hao Jiang,
Peng Lin,
Xiaoxin Xu,
Xiaojuan Qi,
Zhongrui Wang,
Xumeng Zhang,
Dashan Shang,
Qi Liu,
Kwang-Ting Cheng,
Ming Liu
Abstract:
The brain is dynamic, associative and efficient. It reconfigures by associating the inputs with past experiences, with fused memory and processing. In contrast, AI models are static, unable to associate inputs with past experiences, and run on digital computers with physically separated memory and processing. We propose a hardware-software co-design, a semantic memory-based dynamic neural network (DNN) using memristors. The network associates incoming data with past experience stored as semantic vectors. The network and the semantic memory are physically implemented on noise-robust ternary memristor-based Computing-In-Memory (CIM) and Content-Addressable Memory (CAM) circuits, respectively. We validate our co-design, using a 40nm memristor macro, on ResNet and PointNet++ for classifying images and 3D points from the MNIST and ModelNet datasets; it not only achieves accuracy on par with software but also reduces the computational budget by 48.1% and 15.9%. Moreover, it delivers a 77.6% and 93.3% reduction in energy consumption.
Submitted 12 July, 2024;
originally announced July 2024.
-
Quantifying cascading power outages during climate extremes considering renewable energy integration
Authors:
Luo Xu,
Ning Lin,
H. Vincent Poor,
Dazhi Xi,
A. T. D. Perera
Abstract:
Climate extremes, such as hurricanes, combined with large-scale integration of environment-sensitive renewables, could exacerbate the risk of widespread power outages. We introduce a coupled climate-energy model for cascading power outages, which comprehensively captures the impacts of evolving climate extremes on renewable generation, and transmission and distribution networks. The model is validated by the 2022 Puerto Rico catastrophic blackout during Hurricane Fiona, the first-ever system-wide blackout event with complete weather-induced outage records. The model presents a novel resilience pattern that was not captured by the present state-of-the-art models and reveals that early failure of certain critical components surprisingly enhances overall system resilience. Sensitivity analysis of various behind-the-meter solar integration scenarios demonstrates that lower integration levels (below 45%, including the current level) exhibit minimal impact on system resilience in this event. However, surpassing this critical level without additional flexibility resources can exacerbate the failure probability due to substantially enlarged energy imbalances.
Submitted 1 July, 2024;
originally announced July 2024.
-
Older and Wiser: The Marriage of Device Aging and Intellectual Property Protection of Deep Neural Networks
Authors:
Ning Lin,
Shaocong Wang,
Yue Zhang,
Yangu He,
Kwunhang Wong,
Arindam Basu,
Dashan Shang,
Xiaoming Chen,
Zhongrui Wang
Abstract:
Deep neural networks (DNNs), such as the widely-used GPT-3 with billions of parameters, are often kept secret due to high training costs and privacy concerns surrounding the data used to train them. Previous approaches to securing DNNs typically require expensive circuit redesign, resulting in additional overheads such as increased area, energy consumption, and latency. To address these issues, we propose a novel hardware-software co-design approach for DNN intellectual property (IP) protection that capitalizes on the inherent aging characteristics of circuits and a novel differential orientation fine-tuning (DOFT) to ensure effective protection. Hardware-wise, we employ random aging to produce authorized chips. This process circumvents the need for chip redesign, thereby eliminating any additional hardware overhead during the inference procedure of DNNs. Moreover, the authorized chips demonstrate a considerable disparity in DNN inference performance when compared to unauthorized chips. Software-wise, we propose a novel DOFT, which allows pre-trained DNNs to maintain their original accuracy on authorized chips with minimal fine-tuning, while the model's performance on unauthorized chips is reduced to random guessing. Extensive experiments on various models, including MLP, VGG, ResNet, Mixer, and SwinTransformer, with lightweight binary and practical multi-bit weights demonstrate that the proposed method achieves effective IP protection, with only 10% accuracy on unauthorized chips, while preserving nearly the original accuracy on authorized ones.
Submitted 21 June, 2024;
originally announced June 2024.
-
Composited-Nested-Learning with Data Augmentation for Nested Named Entity Recognition
Authors:
Xingming Liao,
Nankai Lin,
Haowen Li,
Lianglun Cheng,
Zhuowei Wang,
Chong Chen
Abstract:
Nested Named Entity Recognition (NNER) focuses on addressing overlapped entity recognition. Compared to Flat Named Entity Recognition (FNER), annotated resources are scarce in the corpus for NNER. Data augmentation is an effective approach to address the insufficient annotated corpus. However, there is a significant lack of exploration in data augmentation methods for NNER. Due to the presence of nested entities in NNER, existing data augmentation methods cannot be directly applied to NNER tasks. Therefore, in this work, we focus on data augmentation for NNER and resort to more expressive structures, Composited-Nested-Label Classification (CNLC) in which constituents are combined by nested-word and nested-label, to model nested entities. The dataset is augmented using the Composited-Nested-Learning (CNL). In addition, we propose the Confidence Filtering Mechanism (CFM) for a more efficient selection of generated data. Experimental results demonstrate that this approach results in improvements in ACE2004 and ACE2005 and alleviates the impact of sample imbalance.
Submitted 18 June, 2024;
originally announced June 2024.
-
Concurrent Accretion and Migration of Giant Planets in their Natal Disks with Consistent Accretion Torque
Authors:
Ya-Ping Li,
Yi-Xian Chen,
Douglas N. C. Lin
Abstract:
Migration commonly occurs during the epoch of planet formation. For emerging gas giant planets, it proceeds concurrently with their growth through the accretion of gas from their natal protoplanetary disks. A similar migration process should also apply to the stellar-mass black holes embedded in active galactic nucleus disks. In this work, we perform high resolution 3D and 2D numerical hydrodynamical simulations to study the migration dynamics for accreting embedded objects over the disk viscous timescales in a self-consistent manner. We find that an accreting planet embedded in a predominantly viscous disk has a tendency to migrate outward, in contrast to the inward orbital decay of non-accreting planets. 3D and 2D simulations find consistent outward migration results for the accreting planets. Under this circumstance, the accreting planet's outward migration is mainly due to the asymmetric spiral arms feeding from the global disk into the Hill radius. This is analogous to the unsaturated corotation torque although the imbalance is due to material accretion within the libration timescale rather than diffusion onto the inner disk. In a disk with a relatively small viscosity, the accreting planets clear deep gaps near their orbits. The tendency of inward migration is recovered, albeit with suppressed rates. By performing a parameter survey over a range of disk viscosities, we find that the transition from outward to inward migration occurs at an effective viscous efficiency factor of $\alpha \sim 0.003$ for Jupiter-mass planets.
Submitted 18 June, 2024;
originally announced June 2024.
-
Continuous-Time Digital Twin with Analogue Memristive Neural Ordinary Differential Equation Solver
Authors:
Hegan Chen,
Jichang Yang,
Jia Chen,
Songqi Wang,
Shaocong Wang,
Dingchen Wang,
Xinyu Tian,
Yifei Yu,
Xi Chen,
Yinan Lin,
Yangu He,
Xiaoshan Wu,
Yi Li,
Xinyuan Zhang,
Ning Lin,
Meng Xu,
Yi Li,
Xumeng Zhang,
Zhongrui Wang,
Han Wang,
Dashan Shang,
Qi Liu,
Kwang-Ting Cheng,
Ming Liu
Abstract:
Digital twins, the cornerstone of Industry 4.0, replicate real-world entities through computer models, revolutionising fields such as manufacturing management and industrial automation. Recent advances in machine learning provide data-driven methods for developing digital twins using discrete-time data and finite-depth models on digital computers. However, this approach fails to capture the underlying continuous dynamics and struggles with modelling complex system behaviour. Additionally, the architecture of digital computers, with separate storage and processing units, necessitates frequent data transfers and Analogue-Digital (A/D) conversion, thereby significantly increasing both time and energy costs. Here, we introduce a memristive neural ordinary differential equation (ODE) solver for digital twins, which is capable of capturing continuous-time dynamics and facilitates the modelling of complex systems using an infinite-depth model. By integrating storage and computation within analogue memristor arrays, we circumvent the von Neumann bottleneck, thus enhancing both speed and energy efficiency. We experimentally validate our approach by developing a digital twin of the HP memristor, which accurately extrapolates its nonlinear dynamics, achieving a 4.2-fold projected speedup and a 41.4-fold projected decrease in energy consumption compared to state-of-the-art digital hardware, while maintaining an acceptable error margin. Additionally, we demonstrate scalability through experimentally grounded simulations of Lorenz96 dynamics, exhibiting projected performance improvements of 12.6-fold in speed and 189.7-fold in energy efficiency relative to traditional digital approaches. By harnessing the capabilities of fully analogue computing, our breakthrough accelerates the development of digital twins, offering an efficient and rapid solution to meet the demands of Industry 4.0.
Submitted 12 June, 2024;
originally announced June 2024.
-
HateDebias: On the Diversity and Variability of Hate Speech Debiasing
Authors:
Nankai Lin,
Hongyan Wu,
Zhengming Chen,
Zijian Li,
Lianxi Wang,
Shengyi Jiang,
Dong Zhou,
Aimin Yang
Abstract:
Hate speech on social media is ubiquitous and urgently needs to be controlled. Without detecting and mitigating the biases brought by hate speech, different types of ethical problems arise. While a number of datasets have been proposed to address the problem of hate speech detection, these datasets seldom consider the diversity and variability of bias, making them far from real-world scenarios. To fill this gap, we propose a benchmark, named HateDebias, to analyze the ability of models to detect hate speech under continuous, changing environments. Specifically, to meet the diversity of biases, we collect existing hate speech detection datasets with different types of biases. To further meet the variability (i.e., the changing of bias attributes in datasets), we reorganize datasets to follow the continuous learning setting. We evaluate models trained on datasets with a single type of bias on HateDebias, where a significant drop in detection accuracy is observed. To provide a potential direction for debiasing, we further propose a debiasing framework based on continuous learning and bias information regularization, as well as memory replay strategies, to ensure the debiasing ability of the model. Experimental results on the proposed benchmark show that the aforementioned method improves on several baselines by a distinct margin, highlighting its effectiveness in real-world applications.
Submitted 7 June, 2024;
originally announced June 2024.
-
White dwarf magnetospheres: Shielding volatile content of icy objects and implications for volatile pollution scarcity
Authors:
Wen-Han Zhou,
Shang-Fei Liu,
Douglas N. C. Lin
Abstract:
Context. About 25% -- 50% of white dwarfs are found to be contaminated by heavy elements, which are believed to originate from external sources such as planetary materials. Elemental abundances suggest that most of the pollutants are rocky objects and only a small fraction of white dwarfs bear traces of volatile accretion.
Aims. In order to account for the scarcity of volatile pollution, we investigate the role of the white dwarfs' magnetospheres in shielding the volatile content of icy objects.
Methods. We estimated the volatile sublimation of inward-drifting exocomets. We assume the orbits of the exocomets are circularized by the Alfven wing drag that is effective for long-period comets.
Results. Volatile material can sublimate outside the corotation radius and be shielded by the magnetic field. The two conditions for this volatile-shielded mechanism are that the magnetosphere radius must be larger than the corotation radius and that the volatiles are depleted outside the corotation radius, which requires a sufficiently slow orbital circularization process. We applied our model to nine white dwarfs with known rotational periods, magnetic fields, and atmosphere compositions. Our volatile-shielded model may explain the excess of volatile elements such as C and S in the disk relative to the white dwarf atmosphere in WD2326+049 (G29-38). Nevertheless, given the sensitivity of our model to the circularization process and material properties of icy objects, there remains considerable uncertainty in our results.
Conclusions. Our work suggests a possible explanation for the scarcity of volatile-accretion signatures among white dwarfs. We also identify a correlation between the magnetic field strength, the spin period, and the composition of pollutants in white dwarf atmospheres.
Submitted 28 May, 2024;
originally announced May 2024.
-
A Survey on Industrial Internet of Things (IIoT) Testbeds for Connectivity Research
Authors:
Tianyu Zhang,
Chuanyu Xue,
Jiachen Wang,
Zelin Yun,
Natong Lin,
Song Han
Abstract:
Industrial Internet of Things (IIoT) technologies have revolutionized industrial processes, enabling smart automation, real-time data analytics, and improved operational efficiency across diverse industry sectors. IIoT testbeds play a critical role in advancing IIoT research and development (R&D) to provide controlled environments for technology evaluation before their real-world deployment. In this article, we conduct a comprehensive literature review on existing IIoT testbeds, aiming to identify benchmark performance, research gaps and explore emerging trends in IIoT systems. We first review the state-of-the-art resource management solutions proposed for IIoT applications. We then categorize the reviewed testbeds according to their deployed communication protocols (including TSN, IEEE 802.15.4, IEEE 802.11 and 5G) and discuss the design and usage of each testbed. Driven by the knowledge gained during this study, we present suggestions and good practices for researchers and practitioners who are planning to design and develop IIoT testbeds for connectivity research.
Submitted 30 June, 2024; v1 submitted 26 April, 2024;
originally announced April 2024.
-
Efficient and accurate neural field reconstruction using resistive memory
Authors:
Yifei Yu,
Shaocong Wang,
Woyu Zhang,
Xinyuan Zhang,
Xiuzhe Wu,
Yangu He,
Jichang Yang,
Yue Zhang,
Ning Lin,
Bo Wang,
Xi Chen,
Songqi Wang,
Xumeng Zhang,
Xiaojuan Qi,
Zhongrui Wang,
Dashan Shang,
Qi Liu,
Kwang-Ting Cheng,
Ming Liu
Abstract:
Human beings construct perception of space by integrating sparse observations into massively interconnected synapses and neurons, offering a superior parallelism and efficiency. Replicating this capability in AI finds wide applications in medical imaging, AR/VR, and embodied AI, where input data is often sparse and computing resources are limited. However, traditional signal reconstruction methods on digital computers face both software and hardware challenges. On the software front, difficulties arise from storage inefficiencies in conventional explicit signal representation. Hardware obstacles include the von Neumann bottleneck, which limits data transfer between the CPU and memory, and the limitations of CMOS circuits in supporting parallel processing. We propose a systematic approach with software-hardware co-optimizations for signal reconstruction from sparse inputs. Software-wise, we employ neural field to implicitly represent signals via neural networks, which is further compressed using low-rank decomposition and structured pruning. Hardware-wise, we design a resistive memory-based computing-in-memory (CIM) platform, featuring a Gaussian Encoder (GE) and an MLP Processing Engine (PE). The GE harnesses the intrinsic stochasticity of resistive memory for efficient input encoding, while the PE achieves precise weight mapping through a Hardware-Aware Quantization (HAQ) circuit. We demonstrate the system's efficacy on a 40nm 256Kb resistive memory-based in-memory computing macro, achieving huge energy efficiency and parallelism improvements without compromising reconstruction quality in tasks like 3D CT sparse reconstruction, novel view synthesis, and novel view synthesis for dynamic scenes. This work advances the AI-driven signal restoration technology and paves the way for future efficient and robust medical AI and 3D vision applications.
Submitted 15 April, 2024;
originally announced April 2024.
-
The Population of Massive Stars in AGN Disks
Authors:
Yi-Xian Chen,
Douglas N. C. Lin
Abstract:
Gravitational instability in the outskirts of Active Galactic Nuclei (AGN) disks leads to disk fragmentation and formation of super-massive (several $10^2\,M_\odot$) stars with potentially long lifetimes. Alternatively, stars can be captured ex-situ and grow from gas accretion in the AGN disk. However, the number density distribution throughout the disk is limited by thermal feedback as their luminosities provide the dominant heating source. We derive equilibrium stellar surface density profiles under two limiting contexts: in the case where the stellar lifetimes are prolonged due to recycling of hydrogen rich disk gas, only the fraction of gas converted into heat is removed from the disk accretion flow. Alternatively, if stellar composition recycling is inefficient and stars can evolve off the main sequence, the disk accretion rate is quenched towards smaller radii resembling a classical star-burst disk, albeit the effective removal rate depends not only on the stellar lifetime, but also the mass of stellar remnants. For AGNs with central Supermassive Black Hole (SMBH) masses of $\sim 10^6$ to $10^8\,M_\odot$ accreting at $\sim 0.1$ Eddington efficiency, we estimate a total number of $10^3$ to $10^5$ coexisting massive stars and the rate of stellar mergers to be $10^{-3}$ to 1 per year. We motivate the detailed study of interaction between a swarm of massive stars through hydro and N body simulations to provide better prescriptions of dynamical processes in AGN disks, and to constrain more accurate estimates of the stellar population.
Submitted 30 May, 2024; v1 submitted 12 April, 2024;
originally announced April 2024.
-
Resistive Memory-based Neural Differential Equation Solver for Score-based Diffusion Model
Authors:
Jichang Yang,
Hegan Chen,
Jia Chen,
Songqi Wang,
Shaocong Wang,
Yifei Yu,
Xi Chen,
Bo Wang,
Xinyuan Zhang,
Binbin Cui,
Yi Li,
Ning Lin,
Meng Xu,
Yi Li,
Xiaoxin Xu,
Xiaojuan Qi,
Zhongrui Wang,
Xumeng Zhang,
Dashan Shang,
Han Wang,
Qi Liu,
Kwang-Ting Cheng,
Ming Liu
Abstract:
Human brains image complicated scenes when reading a novel. Replicating this imagination is one of the ultimate goals of AI-Generated Content (AIGC). However, current AIGC methods, such as score-based diffusion, are still deficient in terms of rapidity and efficiency. This deficiency is rooted in the difference between the brain and digital computers. Digital computers have physically separated storage and processing units, resulting in frequent data transfers during iterative calculations, incurring large time and energy overheads. This issue is further intensified by the conversion of inherently continuous and analog generation dynamics, which can be formulated by neural differential equations, into discrete and digital operations. Inspired by the brain, we propose a time-continuous and analog in-memory neural differential equation solver for score-based diffusion, employing emerging resistive memory. The integration of storage and computation within resistive memory synapses surmounts the von Neumann bottleneck, benefiting the generative speed and energy efficiency. The closed-loop feedback integrator is time-continuous, analog, and compact, physically implementing an infinite-depth neural network. Moreover, the software-hardware co-design is intrinsically robust to analog noise. We experimentally validate our solution with 180 nm resistive memory in-memory computing macros. Demonstrating equivalent generative quality to the software baseline, our system achieved remarkable enhancements in generative speed for both unconditional and conditional generation tasks, by factors of 64.8 and 156.5, respectively. Moreover, it accomplished reductions in energy consumption by factors of 5.2 and 4.1. Our approach heralds a new horizon for hardware solutions in edge computing for generative AI applications.
Submitted 8 April, 2024;
originally announced April 2024.
-
Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects
Authors:
Zicong Fan,
Takehiko Ohkawa,
Linlin Yang,
Nie Lin,
Zhishan Zhou,
Shihao Zhou,
Jiajun Liang,
Zhong Gao,
Xuanyang Zhang,
Xue Zhang,
Fei Li,
Liu Zheng,
Feng Lu,
Karim Abou Zeid,
Bastian Leibe,
Jeongwan On,
Seungryul Baek,
Aditya Prakash,
Saurabh Gupta,
Kun He,
Yoichi Sato,
Otmar Hilliges,
Hyung Jin Chang,
Angela Yao
Abstract:
We interact with the world with our hands and see it through our own (egocentric) perspective. A holistic 3D understanding of such interactions from egocentric views is important for tasks in robotics, AR/VR, action recognition and motion generation. Accurately reconstructing such interactions in 3D is challenging due to heavy occlusion, viewpoint bias, camera distortion, and motion blur from the head movement. To this end, we designed the HANDS23 challenge based on the AssemblyHands and ARCTIC datasets with carefully designed training and testing splits. Based on the results of the top submitted methods and more recent baselines on the leaderboards, we perform a thorough analysis on 3D hand(-object) reconstruction tasks. Our analysis demonstrates the effectiveness of addressing distortion specific to egocentric cameras, adopting high-capacity transformers to learn complex hand-object interactions, and fusing predictions from different views. Our study further reveals challenging scenarios intractable with state-of-the-art methods, such as fast hand motion, object reconstruction from narrow egocentric views, and close contact between two hands and objects. Our efforts will enrich the community's knowledge foundation and facilitate future hand studies on egocentric hand-object interactions.
Submitted 25 March, 2024;
originally announced March 2024.
-
Computational Models to Study Language Processing in the Human Brain: A Survey
Authors:
Shaonan Wang,
Jingyuan Sun,
Yunhao Zhang,
Nan Lin,
Marie-Francine Moens,
Chengqing Zong
Abstract:
Despite differing from the human language processing mechanism in implementation and algorithms, current language models demonstrate remarkable human-like or surpassing language capabilities. Should computational language models be employed in studying the brain, and if so, when and how? To delve into this topic, this paper reviews efforts in using computational models for brain research, highlighting emerging trends. To ensure a fair comparison, the paper evaluates various computational models using consistent metrics on the same dataset. Our analysis reveals that no single model outperforms others on all datasets, underscoring the need for rich testing datasets and rigid experimental control to draw robust conclusions in studies involving computational models.
Submitted 20 March, 2024;
originally announced March 2024.
-
Reverse That Number! Decoding Order Matters in Arithmetic Learning
Authors:
Daniel Zhang-Li,
Nianyi Lin,
Jifan Yu,
Zheyuan Zhang,
Zijun Yao,
Xiaokang Zhang,
Lei Hou,
Jing Zhang,
Juanzi Li
Abstract:
Recent advancements in pretraining have demonstrated that modern Large Language Models (LLMs) possess the capability to effectively learn arithmetic operations. However, despite acknowledging the significance of digit order in arithmetic computation, current methodologies predominantly rely on sequential, step-by-step approaches for teaching LLMs arithmetic, resulting in a conclusion where obtaining better performance involves fine-grained step-by-step. Diverging from this conventional path, our work introduces a novel strategy that not only reevaluates the digit order by prioritizing output from the least significant digit but also incorporates a step-by-step methodology to substantially reduce complexity. We have developed and applied this method in a comprehensive set of experiments. Compared to the previous state-of-the-art (SOTA) method, our findings reveal an overall improvement of in accuracy while requiring only a third of the tokens typically used during training. For the purpose of facilitating replication and further research, we have made our code and dataset publicly available at https://anonymous.4open.science/r/RAIT-9FB7/.
Submitted 9 March, 2024;
originally announced March 2024.
-
Episodic eruptions of young accreting stars: the key role of disc thermal instability due to Hydrogen ionisation
Authors:
Sergei Nayakshin,
Fernando Cruz Saenz de Miera,
Agnes Kospal,
Aleksandra Calovic,
Jochen Eisloffel,
Douglas N. C. Lin
Abstract:
In the classical grouping of large magnitude episodic variability of young accreting stars, FUORs outshine their stars by a factor of $\sim$ 100, and can last for up to centuries; EXORs are dimmer, and last months to a year. A disc Hydrogen ionisation Thermal Instability (TI) scenario was previously proposed for FUORs but required unrealistically low disc viscosity. In the last decade, many intermediate type objects, e.g., FUOR-like in luminosity and spectra but EXOR-like in duration were found. Here we show that the intermediate type bursters Gaia20eae, PTF14jg, Gaia19bey and Gaia21bty may be naturally explained by the TI scenario with realistic viscosity values. We argue that TI predicts a dearth (desert) of bursts with peak accretion rates between $\dot M \sim 10^{-6} M_\odot$/yr and $\dot M \sim 10^{-5} M_\odot$/yr, and that this desert is seen in the sample of all the bursters with previously determined $\dot M$ burst. Most classic EXORs (FUORs) appear to be on the cold (hot) branch of the S-curve during the peak light of their eruptions; thus TI may play a role in this class differentiation. At the same time, TI is unable to explain how classic FUORs can last for up to centuries, and over-predicts the occurrence rate of short FUORs by at least an order of magnitude. We conclude that TI is a required ingredient of episodic accretion operating at R < 0.1 au, but additional physics must play a role at larger scales. Knowledge of TI inner workings from related disciplines may enable its use as a tool to constrain the nature of this additional physics.
Submitted 25 March, 2024; v1 submitted 7 March, 2024;
originally announced March 2024.
-
Dust Accumulation near the Magnetospheric Truncation of Protoplanetary Discs. II. The Effects of Opacity and Thermal Evolution
Authors:
Rixin Li,
Yi-Xian Chen,
Douglas N. C. Lin
Abstract:
Dust trapping in the global pressure bump induced by magnetospheric truncation offers a promising formation mechanism for close-in super-Earths/sub-Neptunes. These planets likely form in evolved protoplanetary discs, where the gas temperature at the expanding truncation radius becomes amiable to refractory solids. However, dust accumulation may alter the disc opacity such that thermal evolution is inevitable. To better understand how thermodynamics affects this planet formation pathway, we conduct a suite of local dust evolution simulations in an idealized inner disc model. Our calculations take into account self-consistent opacity-dependent temperature changes as well as dust evaporation and vapour condensation. We find that disc thermal evolution regulates dust growth and evolution, discouraging any accumulation of small particles that drives the increase of opacity and temperature. Significant retention of dust mass takes place when the disc environments allow runaway growth of large solids beyond the fragmentation barrier, where small particles are then swept up and preserved. Our results further validate dust accumulation near disc truncation as a promising mechanism to form close-in planets.
Submitted 22 February, 2024;
originally announced February 2024.
-
Exploring the Impact of Table-to-Text Methods on Augmenting LLM-based Question Answering with Domain Hybrid Data
Authors:
Dehai Min,
Nan Hu,
Rihui Jin,
Nuo Lin,
Jiaoyan Chen,
Yongrui Chen,
Yu Li,
Guilin Qi,
Yun Li,
Nijun Li,
Qianren Wang
Abstract:
Augmenting Large Language Models (LLMs) for Question Answering (QA) with domain specific data has attracted wide attention. However, domain data often exists in a hybrid format, including text and semi-structured tables, posing challenges for the seamless integration of information. Table-to-Text Generation is a promising solution by facilitating the transformation of hybrid data into a uniformly text-formatted corpus. Although this technique has been widely studied by the NLP community, there is currently no comparative analysis on how corpora generated by different table-to-text methods affect the performance of QA systems. In this paper, we address this research gap in two steps. First, we innovatively integrate table-to-text generation into the framework of enhancing LLM-based QA systems with domain hybrid data. Then, we utilize this framework in real-world industrial data to conduct extensive experiments on two types of QA systems (DSFT and RAG frameworks) with four representative methods: Markdown format, Template serialization, TPLM-based method, and LLM-based method. Based on the experimental results, we draw some empirical findings and explore the underlying reasons behind the success of some methods. We hope the findings of this work will provide a valuable reference for the academic and industrial communities in developing robust QA systems.
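As a rough illustration of two of the four methods named above (Markdown format and Template serialization), the Python sketch below converts a small table into text. The function names, the sentence template, and the sample column names are assumptions made for illustration; the paper's own templates and data are not reproduced here.

```python
def table_to_markdown(header: list[str], rows: list[list[str]]) -> str:
    """Serialize a table as a Markdown table (one line per row)."""
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(cell) for cell in row) + " |")
    return "\n".join(lines)


def table_to_template_text(header: list[str], rows: list[list[str]],
                           subject_col: int = 0) -> str:
    """Serialize each row with a fixed, hypothetical sentence template."""
    sentences = []
    for row in rows:
        facts = [
            f"{name} is {value}"
            for i, (name, value) in enumerate(zip(header, row))
            if i != subject_col          # the subject column names the row
        ]
        sentences.append(f"For {row[subject_col]}, " + ", ".join(facts) + ".")
    return " ".join(sentences)
```

Either output can then be placed in a text corpus for fine-tuning or retrieval; which form works better for a given QA system is the empirical question the abstract raises.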
Submitted 9 April, 2024; v1 submitted 20 February, 2024;
originally announced February 2024.
-
Magnetic field of gas giant exoplanets and its influence on the retention of their exomoons
Authors:
Xing Wei,
D. N. C. Lin
Abstract:
We study the magnetic and tidal interactions of a gas-giant exoplanet with its host star and with its exomoons, and focus on their retention. We briefly revisit the scaling law for planetary dynamo in terms of its mass, radius and luminosity. Based on the virial theorem, we construct an evolution law for planetary magnetic field and find that its initial entropy is important for the field evolution of a high-mass planet. We estimate the magnetic torques on orbit arising from the star-planet and planet-moon magnetic interactions, and find that they can compensate for tidal torques and bypass frequency valleys where dynamical-tide response is ineffective. For exomoon retention we consider two situations. In the presence of a circumplanetary disk (CPD), by comparison between the CPD's inner and outer radii, we find that planets with too strong magnetic fields or too small a distance from their host star tend not to host exomoons. During the subsequent CPD-free evolution, we find, by comparison between the planet's spindown and the moon's migration timescales, that hot Jupiters with periods of several days are unlikely to retain large exomoons, although they could be surrounded by rings from the debris of tidally disrupted moons. In contrast, moons, if formed around warm or cold Jupiters, can be preserved. Finally, we estimate the radio power and flux density due to the star-planet and planet-moon magnetic interactions and give the upper limit of detection distance by FAST.
Submitted 11 March, 2024; v1 submitted 11 February, 2024;
originally announced February 2024.
-
Hazard resistance-based spatiotemporal risk analysis for distribution network outages during hurricanes
Authors:
Luo Xu,
Ning Lin,
Dazhi Xi,
Kairui Feng,
H. Vincent Poor
Abstract:
Blackouts in recent decades show an increasing prevalence of power outages due to extreme weather events such as hurricanes. Precisely assessing the spatiotemporal outages in distribution networks, the most vulnerable part of power systems, is critical to enhance power system resilience. The Sequential Monte Carlo (SMC) simulation method is widely used for spatiotemporal risk analysis of power systems during extreme weather hazards. However, it is found here that the SMC method can lead to large errors by directly applying the fragility function or failure probability of system components in time-sequential analysis, particularly overestimating damages under evolving hazards with high-frequency sampling. To address this issue, a novel hazard resistance-based spatiotemporal risk analysis (HRSRA) method is proposed. This method converts the time-varying failure probability of a component into a hazard resistance as a time-invariant value during the simulation of evolving hazards. The proposed HRSRA provides an adaptive framework for incorporating high-spatiotemporal-resolution meteorology models into power outage simulations. By leveraging the geographic information system data of the power system and a physics-based hurricane wind field model, the superiority of the proposed method is validated using real-world time-series power outage data from Puerto Rico during Hurricane Fiona 2022.
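The overestimation described above can be illustrated with a small Python sketch. It rests on one plausible reading of the abstract, which is an assumption: re-applying a fragility curve independently at every time step compounds the failure probability as sampling gets finer, whereas a single time-invariant resistance makes the component fail only if the hazard ever exceeds it. The lognormal fragility parameters and the half-sine wind profile are hypothetical, and the paper's actual HRSRA formulation may differ.

```python
import math


def fragility(wind_speed: float, median: float = 50.0, beta: float = 0.2) -> float:
    """Hypothetical lognormal fragility curve: P(failure | wind speed)."""
    if wind_speed <= 0:
        return 0.0
    z = (math.log(wind_speed) - math.log(median)) / beta
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def naive_sequential_failure_prob(winds: list[float]) -> float:
    """Apply the fragility curve independently at every time step."""
    survive = 1.0
    for w in winds:
        survive *= 1.0 - fragility(w)
    return 1.0 - survive


def resistance_based_failure_prob(winds: list[float]) -> float:
    """Fix one resistance per component; it fails iff the peak hazard exceeds it."""
    return fragility(max(winds))


def sample_wind(n_steps: int, peak: float = 45.0) -> list[float]:
    """Half-sine wind history sampled at n_steps + 1 points (assumed profile)."""
    return [peak * math.sin(math.pi * k / n_steps) for k in range(n_steps + 1)]
```

Comparing `sample_wind(6)` with `sample_wind(600)` shows the effect: the naive estimate climbs toward 1 as the time step shrinks, while the resistance-based estimate depends only on the peak wind and stays the same.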
Submitted 18 January, 2024;
originally announced January 2024.
-
Perturbation Analysis of Markov Chain Monte Carlo for Graphical Models
Authors:
Na Lin,
Yuanyuan Liu,
Aaron Smith
Abstract:
The basic question in perturbation analysis of Markov chains is: how do small changes in the transition kernels of Markov chains translate to changes in their stationary distributions? Many papers on the subject have shown, roughly, that the change in stationary distribution is small as long as the change in the kernel is much less than some measure of the convergence rate. This result is essentially sharp for generic Markov chains. In this paper we show that much larger errors, up to size roughly the square root of the convergence rate, are permissible for many target distributions associated with graphical models. The main motivation for this work comes from computational statistics, where there is often a tradeoff between the per-step error and per-step cost of approximate MCMC algorithms. Our results show that larger perturbations (and thus less-expensive chains) still give results with small error.
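The generic statement (the stationary distribution moves little when the kernel perturbation is small relative to the convergence rate) can be checked by hand on a two-state chain. The Python sketch below is a toy illustration only, not the paper's graphical-model setting; the transition matrix is P = [[1-a, a], [b, 1-b]], whose spectral gap is a + b, and the function names are made up for this example.

```python
def stationary_two_state(a: float, b: float) -> tuple[float, float]:
    """Stationary distribution of the chain P = [[1-a, a], [b, 1-b]]."""
    return (b / (a + b), a / (a + b))


def total_variation(p: tuple[float, float], q: tuple[float, float]) -> float:
    """Total variation distance between two distributions on two states."""
    return 0.5 * sum(abs(x - y) for x, y in zip(p, q))


def stationary_shift(a: float, b: float, eps: float) -> float:
    """TV change in the stationary law when a is perturbed to a + eps."""
    return total_variation(
        stationary_two_state(a, b),
        stationary_two_state(a + eps, b),
    )
```

With the same perturbation `eps = 0.001`, the fast chain (`a = b = 0.5`, gap 1) shifts by about 0.0005, while the slow chain (`a = b = 0.005`, gap 0.01) shifts by about 0.045, consistent with a change of order eps divided by the gap.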
Submitted 21 December, 2023;
originally announced December 2023.
-
Random resistive memory-based deep extreme point learning machine for unified visual processing
Authors:
Shaocong Wang,
Yizhao Gao,
Yi Li,
Woyu Zhang,
Yifei Yu,
Bo Wang,
Ning Lin,
Hegan Chen,
Yue Zhang,
Yang Jiang,
Dingchen Wang,
Jia Chen,
Peng Dai,
Hao Jiang,
Peng Lin,
Xumeng Zhang,
Xiaojuan Qi,
Xiaoxin Xu,
Hayden So,
Zhongrui Wang,
Dashan Shang,
Qi Liu,
Kwang-Ting Cheng,
Ming Liu
Abstract:
Visual sensors, including 3D LiDAR, neuromorphic DVS sensors, and conventional frame cameras, are increasingly integrated into edge-side intelligent machines. Realizing intensive multi-sensory data analysis directly on edge intelligent machines is crucial for numerous emerging edge applications, such as augmented and virtual reality and unmanned aerial vehicles, which necessitates unified data representation, unprecedented hardware energy efficiency and rapid model training. However, multi-sensory data are intrinsically heterogeneous, causing significant complexity in the system development for edge-side intelligent machines. In addition, the performance of conventional digital hardware is limited by the physically separated processing and memory units, known as the von Neumann bottleneck, and the physical limit of transistor scaling, which contributes to the slowdown of Moore's law. These limitations are further intensified by the tedious training of models with ever-increasing sizes. We propose a novel hardware-software co-design, random resistive memory-based deep extreme point learning machine (DEPLM), that offers efficient unified point set analysis. We show the system's versatility across various data modalities and two different learning tasks. Compared to a conventional digital hardware-based system, our co-design system achieves huge energy efficiency improvements and training cost reduction. Our random resistive memory-based deep extreme point learning machine may pave the way for energy-efficient and training-friendly edge AI across various data modalities and tasks.
Submitted 14 December, 2023;
originally announced December 2023.
-
Chaotic Type I Migration in Turbulent Discs
Authors:
Yinhao Wu,
Yi-Xian Chen,
Douglas N. C. Lin
Abstract:
By performing global hydrodynamical simulations of accretion discs with driven turbulence models, we demonstrate that elevated levels of turbulence induce highly stochastic migration torques on low-mass companions embedded in these discs. This scenario applies to planets migrating within gravito-turbulent regions of protoplanetary discs as well as stars and black holes embedded in the outskirts of active galactic nuclei (AGN) accretion discs. When the turbulence level is low, linear Lindblad torques persist in the background of stochastic forces and their cumulative effect can still dominate over relatively long timescales. However, in the presence of very strong turbulence, classical flow patterns around the companion embedded in the disc are disrupted, leading to significant deviations from the expectations of classical Type I migration theory over arbitrarily long timescales. Our findings suggest that the stochastic nature of turbulent migration can prevent low-mass companions from monotonically settling into universal migration traps within the traditional laminar disc framework, thus reducing the frequency of three-body interactions and hierarchical mergers compared to previous expectations. We propose a scaling for the transition mass ratio from classical to chaotic migration $q\propto α_R$, where $α_R$ is the Reynolds viscosity stress parameter, which can be further tested and refined by conducting extensive simulations over the relevant parameter space.
Submitted 27 November, 2023;
originally announced November 2023.
-
Out-of-Distribution Generalized Dynamic Graph Neural Network for Human Albumin Prediction
Authors:
Zeyang Zhang,
Xingwang Li,
Fei Teng,
Ning Lin,
Xueling Zhu,
Xin Wang,
Wenwu Zhu
Abstract:
Human albumin is essential for indicating the body's overall health. Accurately predicting plasma albumin levels and determining appropriate doses are urgent clinical challenges, particularly in critically ill patients, to maintain optimal blood levels. However, human albumin prediction is non-trivial, as it has to leverage the dynamics of biochemical markers as well as the experience of treating patients. Moreover, the problem of distribution shift is often encountered in real clinical data, which may lead to a decline in the model prediction performance and reduce the reliability of the model's application. In this paper, we propose a framework named Out-of-Distribution Generalized Dynamic Graph Neural Network for Human Albumin Prediction (DyG-HAP), which is able to provide accurate albumin predictions for Intensive Care Unit (ICU) patients during hospitalization. We first model human albumin prediction as a dynamic graph regression problem to model the dynamics and patient relationship. Then, we propose a disentangled dynamic graph attention mechanism to capture and disentangle the patterns whose relationship to labels under distribution shifts is invariant and variant, respectively. Last, we propose an invariant dynamic graph regression method to encourage the model to rely on invariant patterns to make predictions. Moreover, we propose a dataset named Albumin level testing and nutritional dosing data for Intensive Care (ANIC) for evaluation. Extensive experiments demonstrate the superiority of our method compared to several baseline methods in human albumin prediction.
Submitted 7 March, 2024; v1 submitted 27 November, 2023;
originally announced November 2023.
-
Pruning random resistive memory for optimizing analogue AI
Authors:
Yi Li,
Songqi Wang,
Yaping Zhao,
Shaocong Wang,
Woyu Zhang,
Yangu He,
Ning Lin,
Binbin Cui,
Xi Chen,
Shiming Zhang,
Hao Jiang,
Peng Lin,
Xumeng Zhang,
Xiaojuan Qi,
Zhongrui Wang,
Xiaoxin Xu,
Dashan Shang,
Qi Liu,
Kwang-Ting Cheng,
Ming Liu
Abstract:
The rapid advancement of artificial intelligence (AI) has been marked by large language models exhibiting human-like intelligence. However, these models also present unprecedented challenges to energy consumption and environmental sustainability. One promising solution is to revisit analogue computing, a technique that predates digital computing and exploits emerging analogue electronic devices, such as resistive memory, which features in-memory computing, high scalability, and nonvolatility. However, analogue computing still faces the same challenges as before: programming nonidealities and expensive programming due to the underlying device physics. Here, we report a universal solution, software-hardware co-design using structural plasticity-inspired edge pruning to optimize the topology of a randomly weighted analogue resistive memory neural network. Software-wise, the topology of a randomly weighted neural network is optimized by pruning connections rather than precisely tuning resistive memory weights. Hardware-wise, we reveal the physical origin of the programming stochasticity using transmission electron microscopy, which is leveraged for large-scale and low-cost implementation of an overparameterized random neural network containing high-performance sub-networks. We implemented the co-design on a 40nm 256K resistive memory macro, observing 17.3% and 19.9% accuracy improvements in image and audio classification on FashionMNIST and Spoken digits datasets, as well as 9.8% (2%) improvement in PR (ROC) in image segmentation on DRIVE datasets, respectively. This is accompanied by 82.1%, 51.2%, and 99.8% improvement in energy efficiency thanks to analogue in-memory computing. By embracing the intrinsic stochasticity and in-memory computing, this work may solve the biggest obstacle of analogue computing systems and thus unleash their immense potential for next-generation AI hardware.
Submitted 13 November, 2023;
originally announced November 2023.
-
PowerFlowNet: Power Flow Approximation Using Message Passing Graph Neural Networks
Authors:
Nan Lin,
Stavros Orfanoudakis,
Nathan Ordonez Cardenas,
Juan S. Giraldo,
Pedro P. Vergara
Abstract:
Accurate and efficient power flow (PF) analysis is crucial in modern electrical networks' operation and planning. Therefore, there is a need for scalable algorithms that can provide accurate and fast solutions for both small and large scale power networks. As the power network can be interpreted as a graph, Graph Neural Networks (GNNs) have emerged as a promising approach for improving the accuracy and speed of PF approximations by exploiting information sharing via the underlying graph structure. In this study, we introduce PowerFlowNet, a novel GNN architecture for PF approximation that showcases similar performance to the traditional Newton-Raphson method but achieves it 4 times faster in the simple IEEE 14-bus system and 145 times faster in the realistic case of the French high voltage network (6470rte). Meanwhile, it significantly outperforms other traditional approximation methods, such as the DC relaxation method, in terms of performance and execution time, making PowerFlowNet a highly promising solution for real-world PF analysis. Furthermore, we verify the efficacy of our approach by conducting an in-depth experimental evaluation, thoroughly examining the performance, scalability, interpretability, and architectural dependability of PowerFlowNet. The evaluation provides insights into the behavior and potential applications of GNNs in power system analysis.
Submitted 13 February, 2024; v1 submitted 6 November, 2023;
originally announced November 2023.
-
3D Global Simulations of Accretion onto Gap-opening Planets: Implications for Circumplanetary Disc Structures and Accretion Rates
Authors:
Ya-Ping Li,
Yi-Xian Chen,
Douglas N. C. Lin
Abstract:
We perform a series of 3D simulations to study the accretion of giant planets embedded in protoplanetary discs (PPDs) over gap-opening timescales. We find that the accretion mass flux mainly comes from the intermediate latitude above the disc midplane. The circumplanetary disc (CPD) for a super-thermal planet is rotation-supported up to $\sim$20-30\% of the planet Hill radius. While both mass inflow and outflow exist in the CPD midplane, the overall trend is an outflow that forms a meridional circulation with high-latitude inflows. We confirm the absence of accretion outbursts from disc eccentricity excited by massive planets in our 3D simulations, contrary to the consensus of previous 2D simulations. This suggests the necessity of 3D simulations of accretion even for super-Jupiters. The accretion rates of planets measured in steady-state can be decomposed into the "geometric" and "density depletion" factors. Through an extensive parameter survey, we identify a power-law scaling for the geometric factor $\propto q_{\rm th}^{2/3}$ for super-thermal planets ($q_{\rm th}$ being the thermal mass ratio), which transforms to $\propto q_{\rm th}^{2}$ for less massive cases. The density depletion factor is limited by the disc accretion rate for mildly super-thermal planets, and by gap-opening for highly super-thermal ones. Moderate planetary eccentricities can enhance the accretion rates by a factor of $2-3$ through making the gap shallower, but do not impact the flow geometry. We have applied our simulation results to the accreting protoplanet system PDS 70 and can satisfactorily explain the accretion rate and CPD size in observations.
Submitted 4 October, 2023;
originally announced October 2023.
-
Changing-Look AGN Behaviour Induced by Disk-Captured Tidal Disruption Events
Authors:
Yihan Wang,
Douglas N. C. Lin,
Bing Zhang,
Zhaohuan Zhu
Abstract:
Recent observations of changing-look active galactic nuclei (AGN) hint at a frequency of accretion activity not fully explained by tidal disruption events (TDEs) stemming from relaxation processes in nucleus star clusters (NSCs), traditionally estimated to occur at rates of $10^{-4}$ to $10^{-5}$ yr$^{-1}$ per galaxy. In this letter, we propose an enhanced TDE rate through the AGN disk capture process, presenting a viable explanation for the frequent transitions observed in changing-look AGN. Specifically, we investigate the interaction between the accretion disk and retrograde stars within NSCs, resulting in the rapid occurrence of TDEs within a condensed time frame. Through detailed calculations, we derive the time-dependent TDE rates for both relaxation-induced TDE and disk-captured TDE. Our analysis reveals that TDEs triggered by the disk capture process can notably amplify the TDE rate by several orders of magnitude during the AGN phase. This mechanism offers a potential explanation for the enhanced high-energy variability characteristic of changing-look AGNs.
Submitted 29 September, 2023;
originally announced October 2023.
-
The impermanent fate of massive stars in AGN disks
Authors:
Mohamad Ali-Dib,
Douglas N. C. Lin
Abstract:
Stars are likely to form or to be captured in AGN disks. Their mass reaches an equilibrium when their rate of accretion is balanced by that of wind. If the exchanged gas is well mixed with the stellar core, this metabolic process would indefinitely sustain an "immortal" state on the main sequence (MS) and pollute the disk with He byproducts. This theoretical extrapolation is inconsistent with the super-solar α element and Fe abundances inferred from the broad emission lines in active AGNs with modest He concentration. We show this paradox can be resolved with a highly-efficient retention of the He ashes or the suppression of chemical blending. The latter mechanism is robust in the geometrically-thin, dense, sub-pc regions of the disk where the embedded-stars' mass is limited by the gap-formation condition. These stars contain a radiative zone between their mass-exchange stellar surface and the nuclear-burning core. Insulation of the core leads to the gradual decrease of its H fuel and the stars' equilibrium masses. These stars transition to their post-main-sequence (PostMS) tracks on a chemical evolution time scale of a few Myr. Subsequently, the triple-α and α-chain reactions generate α and Fe byproducts which are released into their natal disks. These PostMS stars also undergo core collapse, set off type II supernovae, and leave behind a few solar-mass residual black holes or neutron stars.
Submitted 18 September, 2023; v1 submitted 8 September, 2023;
originally announced September 2023.
-
Metal Enrichment due to Embedded Stars in AGN Discs
Authors:
Jiamu Huang,
Douglas N. C. Lin,
Gregory Shields
Abstract:
We separately assess elemental abundances in AGNs' broad and narrow emission line regions (BLR and NLR), based on a critical assessment of published results together with new photoionization models. We find 1) He/H enhancements in some AGN, exceeding what can be explained by normal chemical evolution and confirm 2) super-solar α abundance, though to a lesser degree than previously reported. We also reaffirm 3) a N/O ratio consistent with secondary production; 4) solar or slightly sub-solar Fe abundance; and 5) redshift-independent metallicity, in contrast with galactic chemical evolution. We interpret 6) the larger metallicity in the BLR than NLR in terms of an in situ stellar evolution and pollution in AGN discs (SEPAD) model. We attribute: a) the redshift independence to the heavy element pollutants being disposed into the disc and accreted onto the central supermassive black hole (SMBH); b) the limited He excess to the accretion-wind metabolism of a top-heavy population of evolving massive main sequence stars; c) the super-solar CNO enrichment to the nuclear synthesis during their post-main-sequence evolution; d) the large N/O to the byproduct of multiple stellar generations; and e) the Mg, Si, and Fe to the ejecta of type II supernovae in the disc. These results provide supporting evidence for f) ongoing self-regulated star formation, g) adequate stellar luminosity to maintain marginal gravitational stability, h) prolific production of seeds and i) dense coexistence of subsequently-grown residual black hole populations in AGN discs.
Submitted 30 August, 2023;
originally announced August 2023.
-
Stellar/BH Population in AGN Disks: Direct Binary Formation from Capture Objects in Nuclei Clusters
Authors:
Yihan Wang,
Zhaohuan Zhu,
Douglas N. C. Lin
Abstract:
The Active Galactic Nuclei (AGN) disk has been proposed as a potential channel for the merger of binary black holes. The population of massive stars and black holes in AGN disks captured from the nuclei cluster plays a crucial role in determining the efficiency of binary formation and the final merger rate within the AGN disks. In this paper, we investigate the capture process using analytical and numerical approaches. We discover a new constant integral of motion for one object's capture process. Applying this result to the whole population of the nuclei cluster captured by the AGN disk, we find that the population of captured objects depends on the angular density and eccentricity distribution of the nuclei clusters and is effectively independent of the radial density profile of the nuclei cluster and disk models. An isotropic nuclei cluster with thermal eccentricity distribution predicts a captured profile $d N/d r \propto r^{-1/4}$. The captured objects are found to be dynamically crowded within the disk. Direct binary formation right after the capture would be promising, especially for stars. The conventional migration traps that help pile up single objects in AGN disks for black hole mergers might not be required.
Submitted 26 June, 2024; v1 submitted 17 August, 2023;
originally announced August 2023.
-
Evaluating Picture Description Speech for Dementia Detection using Image-text Alignment
Authors:
Youxiang Zhu,
Nana Lin,
Xiaohui Liang,
John A. Batsis,
Robert M. Roth,
Brian MacWhinney
Abstract:
Using picture description speech for dementia detection has been studied for 30 years. Despite the long history, previous models focus on identifying the differences in speech patterns between healthy subjects and patients with dementia but do not utilize the picture information directly. In this paper, we propose the first dementia detection models that take both the picture and the description texts as inputs and incorporate knowledge from large pre-trained image-text alignment models. We observe differences between dementia and healthy samples in terms of the text's relevance to the picture and the focused area of the picture. We therefore consider that such differences could be used to enhance dementia detection accuracy. Specifically, we use the text's relevance to the picture to rank and filter the sentences of the samples. We also identify focused areas of the picture as topics and categorize the sentences according to the focused areas. We propose three advanced models that pre-process the samples based on their relevance to the picture, sub-image, and focused areas. The evaluation results show that our advanced models, with knowledge of the picture and large image-text alignment models, achieve state-of-the-art performance with the best detection accuracy at 83.44%, which is higher than the text-only baseline model at 79.91%. Lastly, we visualize the sample and picture results to explain the advantages of our models.
Submitted 11 August, 2023;
originally announced August 2023.
-
Zero-shot Skeleton-based Action Recognition via Mutual Information Estimation and Maximization
Authors:
Yujie Zhou,
Wenwen Qiang,
Anyi Rao,
Ning Lin,
Bing Su,
Jiaqi Wang
Abstract:
Zero-shot skeleton-based action recognition aims to recognize actions of unseen categories after training on data of seen categories. The key is to build the connection between visual and semantic space from seen to unseen classes. Previous studies have primarily focused on encoding sequences into a single feature vector and subsequently mapping the features to an identical anchor point within the embedded space. Their performance is hindered by 1) the neglect of global visual/semantic distribution alignment, which limits their ability to capture the true interdependence between the two spaces; and 2) the neglect of temporal information, since the frame-wise features with rich action clues are directly pooled into a single feature vector. We propose a new zero-shot skeleton-based action recognition method via mutual information (MI) estimation and maximization. Specifically, 1) we maximize the MI between visual and semantic space for distribution alignment; 2) we leverage the temporal information for estimating the MI by encouraging MI to increase as more frames are observed. Extensive experiments on three large-scale skeleton action datasets confirm the effectiveness of our method. Code: https://github.com/YujieOuO/SMIE.
Submitted 7 August, 2023;
originally announced August 2023.
-
A massive hot Jupiter orbiting a metal-rich early-M star discovered in the TESS full frame images
Authors:
Tianjun Gan,
Charles Cadieux,
Farbod Jahandar,
Allona Vazan,
Sharon X. Wang,
Shude Mao,
Jaime A. Alvarado-Montes,
D. N. C. Lin,
Étienne Artigau,
Neil J. Cook,
René Doyon,
Andrew W. Mann,
Keivan G. Stassun,
Adam J. Burgasser,
Benjamin V. Rackham,
Steve B. Howell,
Karen A. Collins,
Khalid Barkaoui,
Avi Shporer,
Jerome de Leon,
Luc Arnold,
George R. Ricker,
Roland Vanderspek,
David W. Latham,
Sara Seager
, et al. (19 additional authors not shown)
Abstract:
Observations and statistical studies have shown that giant planets are rare around M dwarfs compared with Sun-like stars. The formation mechanism of these extreme systems has remained under debate for decades. With the help of the TESS mission and ground-based follow-up observations, we report the discovery of TOI-4201b, the most massive and densest hot Jupiter around an M dwarf known so far, with a radius of $1.22\pm 0.04\ R_J$ and a mass of $2.48\pm0.09\ M_J$, about 5 times heavier than most other giant planets around M dwarfs. It also has the highest planet-to-star mass ratio ($q\sim 4\times 10^{-3}$) among such systems. The host star is an early-M dwarf with a mass of $0.61\pm0.02\ M_{\odot}$ and a radius of $0.63\pm0.02\ R_{\odot}$. It has a significantly super-solar iron abundance ([Fe/H]=$0.52\pm 0.08$ dex). However, interior structure modeling suggests that its planet TOI-4201b is metal-poor, which challenges the classical core-accretion correlation of stellar-planet metallicity, unless the planet is inflated by additional energy sources. Building on the detection of this planet, we compare the stellar metallicity distribution of four planetary groups: hot/warm Jupiters around G/M dwarfs. We find that hot/warm Jupiters show a similar metallicity dependence around G-type stars. For M dwarf host stars, the occurrence of hot Jupiters shows a much stronger correlation with iron abundance, while warm Jupiters display a weaker preference, indicating possibly different formation histories.
Submitted 13 September, 2023; v1 submitted 13 July, 2023;
originally announced July 2023.
-
Resistive memory-based zero-shot liquid state machine for multimodal event data learning
Authors:
Ning Lin,
Shaocong Wang,
Yi Li,
Bo Wang,
Shuhui Shi,
Yangu He,
Woyu Zhang,
Yifei Yu,
Yue Zhang,
Xiaojuan Qi,
Xiaoming Chen,
Hao Jiang,
Xumeng Zhang,
Peng Lin,
Xiaoxin Xu,
Qi Liu,
Zhongrui Wang,
Dashan Shang,
Ming Liu
Abstract:
The human brain is a complex spiking neural network (SNN) that learns multimodal signals in a zero-shot manner by generalizing existing knowledge. Remarkably, the brain achieves this with minimal power consumption, using event-based signals that propagate within its structure. However, mimicking the human brain in neuromorphic hardware presents both hardware and software challenges. Hardware limitations, such as the slowdown of Moore's law and the von Neumann bottleneck, hinder the efficiency of digital computers. On the software side, SNNs are known for their difficult training, especially when learning multimodal signals. To overcome these challenges, we propose a hardware-software co-design that combines a fixed and random liquid state machine (LSM) SNN encoder with trainable artificial neural network (ANN) projections. The LSM is physically implemented using analogue resistive memory, leveraging the inherent stochasticity of resistive switching to generate random weights. This highly efficient and nanoscale in-memory computing approach effectively addresses the von Neumann bottleneck and the slowdown of Moore's law. The ANN projections are implemented digitally, allowing for easy optimization using contrastive loss, which helps to overcome the difficulties associated with SNN training. We experimentally implement this co-design on a 40nm 256Kb in-memory computing macro. We first demonstrate LSM-based event encoding through supervised classification and linear probing on the N-MNIST and N-TIDIGITS datasets.
Submitted 3 July, 2023;
originally announced July 2023.
-
Self-assembled Frameworks Solid with Turbostratic Stacked Crystalline Layers -- A Frustrated 3D Crystal Lattice
Authors:
Hongmei Qin,
Jiahui Wang,
Na Lin,
Xiaoxu Sun,
Yin Chen
Abstract:
Solid materials that possess both long-range order and some degree of disorder are critical for understanding the nature of the crystalline and glassy states, but how to controllably introduce a specific type of disorder into a crystalline material remains a major challenge. Our previous work indicated that weakening the inter-layer interaction is an effective strategy for introducing disorder between the layers. Here, we show that the inter-layer interaction can be weakened to around 1/60 of that of graphite in the self-assembled material, a two-dimensional framework formed by B-C-T-A with Cu nodes, which has an obvious layered structure.
Submitted 16 June, 2023;
originally announced June 2023.
-
KoLA: Carefully Benchmarking World Knowledge of Large Language Models
Authors:
Jifan Yu,
Xiaozhi Wang,
Shangqing Tu,
Shulin Cao,
Daniel Zhang-Li,
Xin Lv,
Hao Peng,
Zijun Yao,
Xiaohan Zhang,
Hanming Li,
Chunyang Li,
Zheyuan Zhang,
Yushi Bai,
Yantao Liu,
Amy Xin,
Nianyi Lin,
Kaifeng Yun,
Linlu Gong,
Jianhui Chen,
Zhili Wu,
Yunjia Qi,
Weikai Li,
Yong Guan,
Kaisheng Zeng,
Ji Qi
, et al. (10 additional authors not shown)
Abstract:
The unprecedented performance of large language models (LLMs) necessitates improvements in evaluations. Rather than merely exploring the breadth of LLM abilities, we believe meticulous and thoughtful designs are essential to thorough, unbiased, and applicable evaluations. Given the importance of world knowledge to LLMs, we construct a Knowledge-oriented LLM Assessment benchmark (KoLA), in which we carefully design three crucial factors: (1) For \textbf{ability modeling}, we mimic human cognition to form a four-level taxonomy of knowledge-related abilities, covering $19$ tasks. (2) For \textbf{data}, to ensure fair comparisons, we use both Wikipedia, a corpus on which LLMs are prevalently pre-trained, and continuously collected emerging corpora, aiming to evaluate the capacity to handle unseen data and evolving knowledge. (3) For \textbf{evaluation criteria}, we adopt a contrastive system, including overall standard scores for better numerical comparability across tasks and models and a unique self-contrast metric for automatically evaluating knowledge-creating ability. We evaluate $28$ open-source and commercial LLMs and obtain some intriguing findings. The KoLA dataset and open-participation leaderboard are publicly released at https://kola.xlore.cn and will be continuously updated to provide references for developing LLMs and knowledge-related systems.
Submitted 30 June, 2024; v1 submitted 15 June, 2023;
originally announced June 2023.
-
The Orbital Structure and Selection Effects of the Galactic Center S-Star Cluster
Authors:
Andreas Burkert,
Stefan Gillessen,
Douglas N. C. Lin,
Xiaochen Zheng,
Philipp Schoeller,
Frank Eisenhauer,
Reinhard Genzel
Abstract:
The orbital distribution of the S-star cluster surrounding the supermassive black hole in the center of the Milky Way is analyzed. A tight, roughly exponential dependence of the pericenter distance r$_{p}$ on orbital eccentricity e$_{\star}$ is found, $\log ($r$_p)\sim$(1-e$_{\star}$), which cannot be explained simply by a random distribution of semi-major axes and eccentricities. No stars are found in the region with high e$_{\star}$ and large log r$_{p}$ or in the region with low e$_{\star}$ and small log r$_{p}$. G-clouds follow the same correlation. The likelihood P(log r$_p$,(1-e$_{\star}$)) of determining the orbital parameters of S-stars is calculated. P is very small for stars with large e$_{\star}$ and large log r$_{p}$. S-stars might exist in this region. To determine their orbital parameters, however, one needs observations over a longer time period. On the other hand, if stars existed in the region of low log r$_{p}$ and small e$_{\star}$, their orbital parameters should by now have been determined. That this region is unpopulated therefore indicates that no S-stars exist with these orbital characteristics, providing constraints for their formation. We call this region, defined by $\log$ (r$_p$/AU) $<$ 1.57+2.6(1-e$_{\star})$, the zone of avoidance. Finally, it is shown that the observed frequency of eccentricities and pericenter distances is consistent with a random sampling of log r$_{p}$ and e$_{\star}$, but only if one takes into account that no stars exist in the zone of avoidance and that orbital parameters cannot yet be determined for stars with large r$_{p}$ and large e$_{\star}$.
Submitted 3 June, 2023;
originally announced June 2023.
-
An interpretability framework for Similar case matching
Authors:
Nankai Lin,
Haonan Liu,
Jiajun Fang,
Dong Zhou,
Aimin Yang
Abstract:
Similar Case Matching (SCM) plays a pivotal role in the legal system by facilitating the efficient identification of similar cases for legal professionals. While previous research has primarily concentrated on enhancing the performance of SCM models, the aspect of interpretability has been neglected. To bridge the gap, this study proposes an integrated pipeline framework for interpretable SCM. The framework comprises four modules: judicial feature sentence identification, case matching, feature sentence alignment, and conflict resolution. In contrast to current SCM methods, our framework first extracts feature sentences within a legal case that contain essential information. Then it conducts case matching based on these extracted features. Subsequently, our framework aligns the corresponding sentences in two legal cases to provide evidence of similarity. In instances where the results of case matching and feature sentence alignment exhibit conflicts, the conflict resolution module resolves these inconsistencies. The experimental results show the effectiveness of our proposed framework, establishing a new benchmark for interpretable SCM.
Submitted 16 August, 2023; v1 submitted 4 April, 2023;
originally announced April 2023.
-
A BERT-based Unsupervised Grammatical Error Correction Framework
Authors:
Nankai Lin,
Hongbin Zhang,
Menglan Shen,
Yu Wang,
Shengyi Jiang,
Aimin Yang
Abstract:
Grammatical error correction (GEC) is a challenging task in natural language processing. While many attempts have been made for widely used languages like English or Chinese, relatively little work has been done for low-resource languages owing to the lack of large annotated corpora. In low-resource languages, the current unsupervised GEC based on language model scoring performs well. However, pre-trained language models remain to be explored in this context. This study proposes a BERT-based unsupervised GEC framework, where GEC is viewed as a multi-class classification task. The framework contains three modules: a data flow construction module, a sentence perplexity scoring module, and an error detecting and correcting module. We propose a novel scoring method for pseudo-perplexity to evaluate a sentence's probable correctness and construct a Tagalog corpus for Tagalog GEC research. The framework obtains competitive performance on the Tagalog corpus we construct and on an open-source Indonesian corpus, demonstrating that it is complementary to the baseline method for low-resource GEC tasks.
Submitted 30 March, 2023;
originally announced March 2023.
-
Chaotic Gas Accretion by Black Holes Embedded in AGN Discs as Cause of Low-spin Signatures in Gravitational Wave Events
Authors:
Yi-Xian Chen,
Douglas N. C. Lin
Abstract:
Accretion discs around super-massive black holes (SMBH) not only power active galactic nuclei (AGNs), but also host single and binary embedded stellar-mass black holes (EBHs) that grow rapidly from gas accretion. The merger of these EBHs provides a promising mechanism for the excitation of some gravitational wave events observed by LIGO-Virgo, especially those with source masses considerably larger than isolated stellar-mass black hole binaries. In addition to their mass and mass-ratio distribution, their hitherto enigmatic small spin-parameters chi_effective carry important clues and stringent constraints on their formation channels and evolutionary pathways. Here we show that, between each coalescence, the typical rapid spin of the merged EBHs is suppressed by their subsequent accretion of gas from a turbulent environment, due to its ability to randomize the flow's spin orientation with respect to that of the EBHs on an eddy-turnover timescale. This theory provides supporting evidence for the prolificacy of EBH mergers and suggests that their mass growth may be dominated by gas accretion rather than their coalescence in AGN discs.
Submitted 29 March, 2023;
originally announced March 2023.
-
Model and Evaluation: Towards Fairness in Multilingual Text Classification
Authors:
Nankai Lin,
Junheng He,
Zhenghang Tang,
Dong Zhou,
Aimin Yang
Abstract:
Recently, a growing body of research has focused on addressing bias in text classification models. However, existing research mainly focuses on the fairness of monolingual text classification models, and research on fairness for multilingual text classification is still very limited. In this paper, we focus on the task of multilingual text classification and propose a debiasing framework for multilingual text classification based on contrastive learning. Our proposed method does not rely on any external language resources and can be extended to any other language. The model contains four modules: a multilingual text representation module, a language fusion module, a text debiasing module, and a text classification module. The multilingual text representation module uses a multilingual pre-trained language model to represent the text, the language fusion module uses contrastive learning to bring the semantic spaces of different languages toward consistency, and the text debiasing module uses contrastive learning to prevent the model from identifying sensitive attribute information. The text classification module completes the basic task of multilingual text classification. In addition, existing research on the fairness of multilingual text classification evaluates fairness in a relatively simple way: it applies the monolingual equality difference evaluation method, that is, the evaluation is performed on a single language. We propose a multi-dimensional fairness evaluation framework for multilingual text classification, which evaluates the model's monolingual equality difference, multilingual equality difference, multilingual equality performance difference, and the destructiveness of the fairness strategy. We hope that our work can provide a more general debiasing method and a more comprehensive evaluation framework for multilingual text fairness tasks.
Submitted 27 March, 2023;
originally announced March 2023.
-
Dynamical Evolution of Closely Packed Multiple Planetary Systems Subject to Atmospheric Mass-Loss
Authors:
S. Wang,
D. N. C. Lin
Abstract:
A gap in exoplanets' radius distribution has been widely attributed to the photo-evaporation threshold of their progenitors' gaseous envelope. Giant impacts can also lead to substantial mass-loss. The outflowing gas endures tidal torque from the planets and their host stars. Alongside the planet-star tidal and magnetic interaction, this effect leads to planets' orbital evolution. In multiple super-Earth systems, especially in those which are closely spaced and/or contain planets locked in mean motion resonances (MMRs), modest mass-loss can lead to dynamical instabilities. In order to place some constraints on the extent of planets' mass-loss, we study the evolution of a series of idealized systems of multiple planets with equal masses and a general scaled separation. We consider mass-loss from one or more planets either in the conservative limit or with angular momentum loss from the system. We show that the stable preservation of idealized multiple planetary systems requires either a wide initial separation or a modest upper limit in the amount of mass-loss. This constraint is stringent for the multiple planetary systems in compact and resonant chains. Perturbation due to either impulsive giant impacts between super-Earths or greater than a few percent mass-loss can lead to dynamical instabilities.
Submitted 1 March, 2023;
originally announced March 2023.
-
GLM-Dialog: Noise-tolerant Pre-training for Knowledge-grounded Dialogue Generation
Authors:
Jing Zhang,
Xiaokang Zhang,
Daniel Zhang-Li,
Jifan Yu,
Zijun Yao,
Zeyao Ma,
Yiqi Xu,
Haohua Wang,
Xiaohan Zhang,
Nianyi Lin,
Sunrui Lu,
Juanzi Li,
Jie Tang
Abstract:
We present GLM-Dialog, a large-scale language model (LLM) with 10B parameters capable of knowledge-grounded conversation in Chinese using a search engine to access Internet knowledge. GLM-Dialog offers a series of applicable techniques for exploiting various external knowledge including both helpful and noisy knowledge, enabling the creation of robust knowledge-grounded dialogue LLMs with limited proper datasets. To evaluate GLM-Dialog more fairly, we also propose a novel evaluation method that allows humans to converse with multiple deployed bots simultaneously and compare their performance implicitly instead of explicitly rating them using multidimensional metrics. Comprehensive evaluations from automatic to human perspectives demonstrate the advantages of GLM-Dialog compared with existing open-source Chinese dialogue models. We release both the model checkpoint and source code, and also deploy it as a WeChat application to interact with users. We offer our evaluation platform online in an effort to promote the development of open-source models and reliable dialogue evaluation systems. An additional easy-to-use toolkit that consists of short text entity linking, query generation, and helpful knowledge classification is also released to enable diverse applications. All the source code is available on GitHub.
Submitted 28 February, 2023;
originally announced February 2023.
-
How to choose "Good" Samples for Text Data Augmentation
Authors:
Xiaotian Lin,
Nankai Lin,
Yingwen Fu,
Ziyu Yang,
Shengyi Jiang
Abstract:
Deep learning-based text classification models need abundant labeled data to obtain competitive performance. Unfortunately, annotating a large corpus is time-consuming and laborious. To tackle this, many studies use data augmentation to expand the corpus size. However, data augmentation may produce some noisy augmented samples. There is currently no work exploring sample selection for augmented samples in the natural language processing field. In this paper, we propose a novel self-training selection framework with two selectors to select the high-quality samples from data augmentation. Specifically, we first use an entropy-based strategy and the model prediction to select augmented samples. Because some high-quality samples may be wrongly filtered out at this step, we propose to recall them from the two perspectives of word overlap and semantic similarity. Experimental results show the effectiveness and simplicity of our framework.
Submitted 2 February, 2023;
originally announced February 2023.
-
Language Cognition and Language Computation -- Human and Machine Language Understanding
Authors:
Shaonan Wang,
Nai Ding,
Nan Lin,
Jiajun Zhang,
Chengqing Zong
Abstract:
Language understanding is a key scientific issue in the fields of cognitive and computer science. However, the two disciplines differ substantially in the specific research questions. Cognitive science focuses on analyzing the specific mechanism of the brain and investigating the brain's response to language; few studies have examined the brain's language system as a whole. By contrast, computer scientists focus on the efficiency of practical applications when choosing research questions but may ignore the most essential laws of language. Given these differences, can a combination of the disciplines offer new insights for building intelligent language models and studying language cognitive mechanisms? In the following text, we first review the research questions, history, and methods of language understanding in cognitive and computer science, focusing on the current progress and challenges. We then compare and contrast the research of language understanding in cognitive and computer sciences. Finally, we review existing work that combines insights from language cognition and language computation and offer prospects for future development trends.
Submitted 11 January, 2023;
originally announced January 2023.
-
Tidal disruption of stellar clusters and their remnants' spatial distribution near the galactic center
Authors:
Long Wang,
D. N. C. Lin
Abstract:
The accretion of massive star clusters via dynamical friction has previously been established to be a likely scenario for the build-up of nuclear stellar clusters (NSCs). A remaining issue is whether strong external tidal perturbation may lead to the severe disruption of loosely-bound clusters well before they sink deeply into the center of their host galaxies. We carry out a series of $N$-body simulations and verify our early idealized analytic models. We show that if the density profile of the host galaxies can be described by a power-law distribution with an index, $α<1$, the cluster would be compressed in the radial direction by the external galactic tidal field. In contrast, the galactic tidal perturbation is disruptive in regions with a steep, $α>1$, density fall-off or in the very center where gravity is dominated by the point-mass potential of super-massive black holes (SMBHs). This sufficient criterion supplements the conventional necessary Roche-lobe-filling condition in determining the preservation versus disintegration of satellite stellar systems. We simulate the disruption of stellar clusters which venture on nearly-circular, modestly- or highly-eccentric orbits into the center of galaxies with a range of background density profiles and SMBHs. We obtain the spatial distribution of the stellar-cluster remnants. We apply these results to the NSC within a few parsecs from SMBH Sgr A$^\ast$ at the Galactic Center. Recent observations indicate the coexistence of two populations of stars with distinctively separate ages and metallicities. We verify that the subsolar-metallicity population can be the debris of disrupted stellar clusters.
Submitted 18 December, 2022;
originally announced December 2022.
-
Hydrodynamical Simulations of Circumbinary Accretion: Balance between Heating and Cooling
Authors:
Hai-Yang Wang,
Xue-Ning Bai,
Dong Lai,
Douglas N. C. Lin
Abstract:
Hydrodynamical interaction in circumbinary discs (CBDs) plays a crucial role in various astrophysical systems, ranging from young stellar binaries to supermassive black hole binaries in galactic centers. Most previous simulations of binary-disc systems have adopted a locally isothermal equation of state. In this study, we use the grid-based code $\texttt{Athena++}$ to conduct a suite of two-dimensional viscous hydrodynamical simulations of circumbinary accretion on a Cartesian grid, resolving the central cavity of the binary. The gas thermodynamics is treated by thermal relaxation towards an equilibrium temperature (based on the constant$-β$ cooling ansatz, where $β$ is the cooling time in units of the local Keplerian time). Focusing on equal mass, circular binaries in CBDs with (equilibrium) disc aspect ratio $H/R=0.1$, we find that the cooling of the disc gas significantly influences the binary orbital evolution, accretion variability, and CBD morphology, and the effect depends sensitively on the disc viscosity prescriptions. When adopting a constant kinematic viscosity, a finite cooling time ($β\gtrsim 0.1$) leads to binary inspiral as opposed to outspiral and the CBD cavity becomes more symmetric. When adopting a dynamically varying $α-$viscosity, binary inspiral only occurs within a narrow range of cooling time (corresponding to $β$ around 0.5).
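The thermal relaxation described in this abstract is commonly written in the following form. This is a sketch under stated assumptions rather than the paper's own equation: the symbols $T_{\rm eq}$, $t_{\rm cool}$ and $\Omega_K$ are introduced here, and the "local Keplerian time" is taken to be $\Omega_K^{-1}$.

```latex
\frac{\mathrm{d}T}{\mathrm{d}t} = -\frac{T - T_{\rm eq}}{t_{\rm cool}},
\qquad
t_{\rm cool} = \beta\,\Omega_K^{-1}
```

In this form, $β \to 0$ recovers the locally isothermal limit adopted by most previous simulations, while large $β$ approaches adiabatic behaviour.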
Submitted 19 September, 2023; v1 submitted 14 December, 2022;
originally announced December 2022.
-
Rimmed and Rippled Accretion Disc Models to Explain AGN Continuum Lags
Authors:
D. A. Starkey,
Jiamu Huang,
Keith Horne,
Douglas N. C. Lin
Abstract:
We propose a solution to the problem of accretion disc sizes in active galactic nuclei being larger when measured by reverberation mapping than predicted by theory. Considering blackbody reprocessing on a disc with thickness profile $H(r)$, our solution invokes a steep rim or rippled structures irradiated by the central lamp-post. We model the continuum lags and the faint and bright disc spectral energy distribution (SED) in the best-studied case NGC 5548 (black hole mass $M = 7\times10^{7} M_\odot$, disc inclination $i=45^\circ$). With the lamp-post off, the observed disc SED requires a low accretion rate ($\dot{M} \sim 0.0014 M_\odot$/yr) and high prograde black hole spin ($a \sim 0.93$). Reprocessing on the thin disc gives time lags increasing with wavelength but 3 times smaller than observed. Introducing a steep $H(r)$ rim, or multiple crests, near $r = 5$ light days, reprocessing on their steep centre-facing slopes increases temperatures from $\sim1500$ K to $\sim6000$ K and this increases optical lags to match the lag data. Most of the disc surface maintains the cooler $T\propto r^{-3/4}$ temperature profile that matches the SED. The bright lamp-post may be powered by magnetic links tapping the black hole spin. The steep rim occurs near the sublimation radius for dust in the disc, as in the "failed disc wind model" for broad-line clouds. Lense-Thirring torques aligning the disc and black hole spin may also raise a warp and associated waves. In both scenarios, the small density scale height implied by the inferred value of $H(r)$ suggests possible marginal gravitational instability in the disc.
Submitted 2 December, 2022;
originally announced December 2022.
-
An Effective Deployment of Contrastive Learning in Multi-label Text Classification
Authors:
Nankai Lin,
Guanqiu Qin,
Jigang Wang,
Aimin Yang,
Dong Zhou
Abstract:
The effectiveness of contrastive learning technology in natural language processing tasks is yet to be explored and analyzed. How to construct positive and negative samples correctly and reasonably is the core challenge of contrastive learning. It is even harder to discover contrastive objects in multi-label text classification tasks. There are very few contrastive losses proposed previously. In this paper, we investigate the problem from a different angle by proposing five novel contrastive losses for multi-label text classification tasks. These are Strict Contrastive Loss (SCL), Intra-label Contrastive Loss (ICL), Jaccard Similarity Contrastive Loss (JSCL), Jaccard Similarity Probability Contrastive Loss (JSPCL), and Stepwise Label Contrastive Loss (SLCL). We explore the effectiveness of contrastive learning for multi-label text classification tasks by the employment of these novel losses and provide a set of baseline models for deploying contrastive learning techniques on specific tasks. We further perform an interpretable analysis of our approach to show how different components of contrastive learning losses play their roles. The experimental results show that our proposed contrastive losses can bring improvement to multi-label text classification tasks. Our work also explores how contrastive learning should be adapted for multi-label text classification tasks.
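Two of the losses named in this abstract (JSCL and JSPCL) build on Jaccard similarity between label sets. The abstract does not say how the similarity enters the loss, so the sketch below shows only the similarity itself. The function name `jaccard_similarity`, the example labels, and the convention for two empty label sets are assumptions made for illustration.

```python
def jaccard_similarity(labels_a, labels_b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two label sets."""
    a, b = set(labels_a), set(labels_b)
    if not a and not b:
        # Assumed convention: two samples with no labels count as identical.
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical multi-label annotations for three documents.
doc_1 = ["sports", "politics"]
doc_2 = ["sports", "finance"]
doc_3 = ["health"]

partial_overlap = jaccard_similarity(doc_1, doc_2)  # one shared label out of three
no_overlap = jaccard_similarity(doc_1, doc_3)       # no shared labels
```

A graded similarity like this lets a pair of samples that share only some labels be treated as partially positive, rather than forcing a strict positive or negative assignment.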
Submitted 14 July, 2023; v1 submitted 1 December, 2022;
originally announced December 2022.