-
LLM-QBench: A Benchmark Towards the Best Practice for Post-training Quantization of Large Language Models
Authors:
Ruihao Gong,
Yang Yong,
Shiqiao Gu,
Yushi Huang,
Yunchen Zhang,
Xianglong Liu,
Dacheng Tao
Abstract:
Recent advancements in large language models (LLMs) are propelling us toward artificial general intelligence, thanks to their remarkable emergent abilities and reasoning capabilities. However, the substantial computational and memory requirements of LLMs limit their widespread adoption. Quan- tization, a key compression technique, offers a viable solution to mitigate these demands by compressing a…
▽ More
Recent advancements in large language models (LLMs) are propelling us toward artificial general intelligence, thanks to their remarkable emergent abilities and reasoning capabilities. However, the substantial computational and memory requirements of LLMs limit their widespread adoption. Quan- tization, a key compression technique, offers a viable solution to mitigate these demands by compressing and accelerating LLMs, albeit with poten- tial risks to model accuracy. Numerous studies have aimed to minimize the accuracy loss associated with quantization. However, the quantization configurations in these studies vary and may not be optimized for hard- ware compatibility. In this paper, we focus on identifying the most effective practices for quantizing LLMs, with the goal of balancing performance with computational efficiency. For a fair analysis, we develop a quantization toolkit LLMC, and design four crucial principles considering the inference efficiency, quantized accuracy, calibration cost, and modularization. By benchmarking on various models and datasets with over 500 experiments, three takeaways corresponding to calibration data, quantization algorithm, and quantization schemes are derived. Finally, a best practice of LLM PTQ pipeline is constructed. All the benchmark results and the toolkit can be found at https://github.com/ModelTC/llmc.
△ Less
Submitted 9 May, 2024;
originally announced May 2024.
-
Fast and Controllable Post-training Sparsity: Learning Optimal Sparsity Allocation with Global Constraint in Minutes
Authors:
Ruihao Gong,
Yang Yong,
Zining Wang,
Jinyang Guo,
Xiuying Wei,
Yuqing Ma,
Xianglong Liu
Abstract:
Neural network sparsity has attracted many research interests due to its similarity to biological schemes and high energy efficiency. However, existing methods depend on long-time training or fine-tuning, which prevents large-scale applications. Recently, some works focusing on post-training sparsity (PTS) have emerged. They get rid of the high training cost but usually suffer from distinct accura…
▽ More
Neural network sparsity has attracted many research interests due to its similarity to biological schemes and high energy efficiency. However, existing methods depend on long-time training or fine-tuning, which prevents large-scale applications. Recently, some works focusing on post-training sparsity (PTS) have emerged. They get rid of the high training cost but usually suffer from distinct accuracy degradation due to neglect of the reasonable sparsity rate at each layer. Previous methods for finding sparsity rates mainly focus on the training-aware scenario, which usually fails to converge stably under the PTS setting with limited data and much less training cost. In this paper, we propose a fast and controllable post-training sparsity (FCPTS) framework. By incorporating a differentiable bridge function and a controllable optimization objective, our method allows for rapid and accurate sparsity allocation learning in minutes, with the added assurance of convergence to a predetermined global sparsity rate. Equipped with these techniques, we can surpass the state-of-the-art methods by a large margin, e.g., over 30\% improvement for ResNet-50 on ImageNet under the sparsity rate of 80\%. Our plug-and-play code and supplementary materials are open-sourced at https://github.com/ModelTC/FCPTS.
△ Less
Submitted 9 May, 2024;
originally announced May 2024.
-
Galaxy 3D Shape Recovery using Mixture Density Network
Authors:
Suk Yee Yong,
K. E. Harborne,
Caroline Foster,
Robert Bassett,
Gregory B. Poole,
Mitchell Cavanagh
Abstract:
Since the turn of the century, astronomers have been exploiting the rich information afforded by combining stellar kinematic maps and imaging in an attempt to recover the intrinsic, three-dimensional (3D) shape of a galaxy. A common intrinsic shape recovery method relies on an expected monotonic relationship between the intrinsic misalignment of the kinematic and morphological axes and the triaxia…
▽ More
Since the turn of the century, astronomers have been exploiting the rich information afforded by combining stellar kinematic maps and imaging in an attempt to recover the intrinsic, three-dimensional (3D) shape of a galaxy. A common intrinsic shape recovery method relies on an expected monotonic relationship between the intrinsic misalignment of the kinematic and morphological axes and the triaxiality parameter. Recent studies have, however, cast doubt about underlying assumptions relating shape and intrinsic kinematic misalignment. In this work, we aim to recover the 3D shape of individual galaxies using their projected stellar kinematic and flux distributions using a supervised machine learning approach with mixture density network (MDN). Using a mock dataset of the EAGLE hydrodynamical cosmological simulation, we train the MDN model for a carefully selected set of common kinematic and photometric parameters. Compared to previous methods, we demonstrate potential improvements achieved with the MDN model to retrieve the 3D galaxy shape along with the uncertainties, especially for prolate and triaxial systems. We make specific recommendations for recovering galaxy intrinsic shapes relevant for current and future integral field spectroscopic galaxy surveys.
△ Less
Submitted 5 April, 2024;
originally announced April 2024.
-
Uncertainty Quantification of the Virial Black Hole Mass with Conformal Prediction
Authors:
Suk Yee Yong,
Cheng Soon Ong
Abstract:
Precise measurements of the black hole mass are essential to gain insight on the black hole and host galaxy co-evolution. A direct measure of the black hole mass is often restricted to nearest galaxies and instead, an indirect method using the single-epoch virial black hole mass estimation is used for objects at high redshifts. However, this method is subjected to biases and uncertainties as it is…
▽ More
Precise measurements of the black hole mass are essential to gain insight on the black hole and host galaxy co-evolution. A direct measure of the black hole mass is often restricted to nearest galaxies and instead, an indirect method using the single-epoch virial black hole mass estimation is used for objects at high redshifts. However, this method is subjected to biases and uncertainties as it is reliant on the scaling relation from a small sample of local active galactic nuclei. In this study, we propose the application of conformalised quantile regression (CQR) to quantify the uncertainties of the black hole predictions in a machine learning setting. We compare CQR with various prediction interval techniques and demonstrated that CQR can provide a more useful prediction interval indicator. In contrast to baseline approaches for prediction interval estimation, we show that the CQR method provides prediction intervals that adjust to the black hole mass and its related properties. That is it yields a tighter constraint on the prediction interval (hence more certain) for a larger black hole mass, and accordingly, bright and broad spectral line width source. Using a combination of neural network model and CQR framework, the recovered virial black hole mass predictions and uncertainties are comparable to those measured from the Sloan Digital Sky Survey. The code is publicly available at https://github.com/yongsukyee/uncertain_blackholemass.
△ Less
Submitted 10 July, 2023;
originally announced July 2023.
-
Compressing Models with Few Samples: Mimicking then Replacing
Authors:
Huanyu Wang,
Junjie Liu,
Xin Ma,
Yang Yong,
Zhenhua Chai,
Jianxin Wu
Abstract:
Few-sample compression aims to compress a big redundant model into a small compact one with only few samples. If we fine-tune models with these limited few samples directly, models will be vulnerable to overfit and learn almost nothing. Hence, previous methods optimize the compressed model layer-by-layer and try to make every layer have the same outputs as the corresponding layer in the teacher mo…
▽ More
Few-sample compression aims to compress a big redundant model into a small compact one with only few samples. If we fine-tune models with these limited few samples directly, models will be vulnerable to overfit and learn almost nothing. Hence, previous methods optimize the compressed model layer-by-layer and try to make every layer have the same outputs as the corresponding layer in the teacher model, which is cumbersome. In this paper, we propose a new framework named Mimicking then Replacing (MiR) for few-sample compression, which firstly urges the pruned model to output the same features as the teacher's in the penultimate layer, and then replaces teacher's layers before penultimate with a well-tuned compact one. Unlike previous layer-wise reconstruction methods, our MiR optimizes the entire network holistically, which is not only simple and effective, but also unsupervised and general. MiR outperforms previous methods with large margins. Codes will be available soon.
△ Less
Submitted 7 January, 2022;
originally announced January 2022.
-
Design and implementation of smart cooking based on amazon echo
Authors:
Lin Xiaoguang,
Yang Yong,
Zhang Ju
Abstract:
Smart cooking based on Amazon Echo uses the internet of things and cloud computing to assist in cooking food. People may speak to Amazon Echo during the cooking in order to get the information and situation of the cooking. Amazon Echo recognizes what people say, then transfers the information to the cloud services, and speaks to people the results that cloud services make by querying the embedded…
▽ More
Smart cooking based on Amazon Echo uses the internet of things and cloud computing to assist in cooking food. People may speak to Amazon Echo during the cooking in order to get the information and situation of the cooking. Amazon Echo recognizes what people say, then transfers the information to the cloud services, and speaks to people the results that cloud services make by querying the embedded cooking knowledge and achieving the information of intelligent kitchen devices online. An intelligent food thermometer and its mobile application are well-designed and implemented to monitor the temperature of cooking food.
△ Less
Submitted 4 December, 2018;
originally announced December 2018.
-
A Study on Potential of Integrating Multimodal Interaction into Musical Conducting Education
Authors:
Gilbert Phuah Leong Siang,
Nor Azman Ismail,
Pang Yee Yong
Abstract:
With the rapid development of computer technology, computer music has begun to appear in the laboratory. Many potential utility of computer music is gradually increasing. The purpose of this paper is attempted to analyze the possibility of integrating multimodal interaction such as vision-based hand gesture and speech interaction into musical conducting education. To achieve this purpose, this pap…
▽ More
With the rapid development of computer technology, computer music has begun to appear in the laboratory. Many potential utility of computer music is gradually increasing. The purpose of this paper is attempted to analyze the possibility of integrating multimodal interaction such as vision-based hand gesture and speech interaction into musical conducting education. To achieve this purpose, this paper is focus on discuss some related research and the traditional musical conducting education. To do so, six musical conductors had been interviewed to share their musical conducting learning/ teaching experience. These interviews had been analyzed in this paper to show the syllabus and the focus of musical conducting education for beginners.
△ Less
Submitted 21 May, 2010;
originally announced May 2010.