Spatial Non-Stationary Dual-Wideband Channel Estimation for XL-MIMO Systems

Authors: Anzheng Tang, Jun-Bo Wang, Yijin Pan, Tuo Wu, Chuanwen Chang, Yijian Chen, Hongkang Yu, Maged Elkashlan

Abstract: In this paper, we investigate the channel estimation problem for extremely large-scale multi-input and multi-output (XL-MIMO) systems, considering the spherical wavefront effect, spatially non-stationary (SnS) property, and dual-wideband effects. To accurately characterize the XL-MIMO channel, we first derive a novel spatial-and-frequency-domain channel model for XL-MIMO systems and carefully examine the channel characteristics in the angular-and-delay domain. Based on the obtained channel representation, we formulate XL-MIMO channel estimation as a Bayesian inference problem. To fully exploit the clustered sparsity of angular-and-delay channels and capture the inter-antenna and inter-subcarrier correlations, a Markov random field (MRF)-based hierarchical prior model is adopted. Meanwhile, to facilitate efficient channel reconstruction, we propose a sparse Bayesian learning (SBL) algorithm based on approximate message passing (AMP) with a unitary transformation. Tailored to the MRF-based hierarchical prior model, the message passing equations are reformulated using structured variational inference, belief propagation, and mean-field rules. Finally, simulation results validate the convergence and superiority of the proposed algorithm over existing methods.

Submitted 8 July, 2024; originally announced July 2024.

Comments: This paper has been submitted to IEEE journal for possible publication

arXiv:2407.02666 [pdf, other]

Commonsense Reasoning for Legged Robot Adaptation with Vision-Language Models

Authors: Annie S. Chen, Alec M. Lessing, Andy Tang, Govind Chada, Laura Smith, Sergey Levine, Chelsea Finn

Abstract: Legged robots are physically capable of navigating a diverse variety of environments and overcoming a wide range of obstructions. For example, in a search and rescue mission, a legged robot could climb over debris, crawl through gaps, and navigate out of dead ends. However, the robot's controller needs to respond intelligently to such varied obstacles, and this requires handling unexpected and unusual scenarios successfully. This presents an open challenge to current learning methods, which often struggle with generalization to the long tail of unexpected situations without heavy human supervision. To address this issue, we investigate how to leverage the broad knowledge about the structure of the world and commonsense reasoning capabilities of vision-language models (VLMs) to aid legged robots in handling difficult, ambiguous situations. We propose a system, VLM-Predictive Control (VLM-PC), combining two key components that we find to be crucial for eliciting on-the-fly, adaptive behavior selection with VLMs: (1) in-context adaptation over previous robot interactions and (2) planning multiple skills into the future and replanning. We evaluate VLM-PC on several challenging real-world obstacle courses, involving dead ends and climbing and crawling, on a Go1 quadruped robot. Our experiments show that by reasoning over the history of interactions and future plans, VLMs enable the robot to autonomously perceive, navigate, and act in a wide range of complex scenarios that would otherwise require environment-specific engineering or human guidance.

Submitted 2 July, 2024; originally announced July 2024.

Comments: 27 pages

arXiv:2406.09770 [pdf, other]

Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion

Authors: Anke Tang, Li Shen, Yong Luo, Shiwei Liu, Han Hu, Bo Du

Abstract: Solving multi-objective optimization problems for large deep neural networks is a challenging task due to the complexity of the loss landscape and the expensive computational cost of training and evaluating models. Efficient Pareto front approximation of large models enables multi-objective optimization for various tasks such as multi-task learning and trade-off analysis. Existing algorithms for learning the Pareto set include (1) evolutionary, hypernetwork, and hypervolume-maximization methods, which are computationally expensive and have restricted scalability to large models; and (2) scalarization algorithms, where a separate model is trained for each objective ray, which is inefficient for learning the entire Pareto set and fails to capture the objective trade-offs effectively. Inspired by the recent success of model merging, we propose a practical and scalable approach to the Pareto set learning problem via mixture of experts (MoE) based model fusion. By ensembling the weights of specialized single-task models, the MoE module can effectively capture the trade-offs between multiple objectives and closely approximate the entire Pareto set of large neural networks. Once the routers are learned and a preference vector is set, the MoE module can be unloaded, thus no additional computational cost is introduced during inference. We conduct extensive experiments on vision and language tasks using large-scale models such as CLIP-ViT and GPT-2. The experimental results demonstrate that our method efficiently approximates the entire Pareto front of large models. Using only hundreds of trainable parameters of the MoE routers, our method even has lower memory usage compared to linear scalarization and algorithms that learn a single Pareto optimal solution, and is scalable to both the number of objectives and the size of the model.

Submitted 14 June, 2024; originally announced June 2024.

Comments: code is available at https://github.com/tanganke/pareto_set_learning

arXiv:2406.03280 [pdf, other]

FusionBench: A Comprehensive Benchmark of Deep Model Fusion

Authors: Anke Tang, Li Shen, Yong Luo, Han Hu, Bo Du, Dacheng Tao

Abstract: Deep model fusion is an emerging technique that unifies the predictions or parameters of several deep neural networks into a single model in a cost-effective and data-efficient manner. This enables the unified model to take advantage of the original models' strengths, potentially exceeding their performance. Although a variety of deep model fusion techniques have been introduced, their evaluations tend to be inconsistent and often inadequate to validate their effectiveness and robustness against distribution shifts. To address this issue, we introduce FusionBench, which is the first comprehensive benchmark dedicated to deep model fusion. FusionBench covers a wide range of tasks, including open-vocabulary image classification, text classification, and text-to-text generation. Each category includes up to eight tasks with corresponding task-specific models, featuring both full fine-tuning and LoRA fine-tuning, as well as models of different sizes, to ensure fair and balanced comparisons of various multi-task model fusion techniques across different tasks, model scales, and fine-tuning strategies. We implement and evaluate a broad spectrum of deep model fusion techniques. These techniques range from model ensemble methods, which combine the predictions to improve the overall performance, to model merging, which integrates different models into a single one, and model mixing methods, which upscale or recombine the components of the original models. FusionBench now contains 26 distinct tasks, 74 fine-tuned models, and 16 fusion techniques, and we are committed to consistently expanding the benchmark with more tasks, models, and fusion techniques. In addition, we offer a well-documented set of resources and guidelines to aid researchers in understanding and replicating the benchmark results. Homepage https://github.com/tanganke/fusion_bench

Submitted 14 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

Comments: Project homepage: https://github.com/tanganke/fusion_bench

arXiv:2403.14525 [pdf, other]

Optimizing queues with deadlines under infrequent monitoring

Authors: Faraz Farahvash, Ao Tang

Abstract: In this paper, we aim to improve the percentage of packets meeting their deadline in discrete-time M/M/1 queues with infrequent monitoring. More specifically, we look into policies that only monitor the system (and subsequently take actions) after a packet arrival. We model the system as an MDP and provide the optimal policy for some special cases. Furthermore, we introduce a heuristic algorithm called "AB-n" for general deadlines. Finally, we provide numerical results demonstrating the desirable performance of "AB-n" policies.

Submitted 21 March, 2024; originally announced March 2024.

Comments: 11 pages, 6 figures

arXiv:2403.02633 [pdf, other]

Spatially Non-Stationary XL-MIMO Channel Estimation: A Three-Layer Generalized Approximate Message Passing Method

Authors: Anzheng Tang, Jun-Bo Wang, Yijin Pan, Wence Zhang, Xiaodan Zhang, Yijian Chen, Hongkang Yu, Rodrigo C. de Lamare

Abstract: In this paper, the channel estimation problem for extremely large-scale multi-input multi-output (XL-MIMO) systems is investigated, taking into account the spherical wavefront effect and the spatially non-stationary (SnS) property. Due to the diversity of SnS characteristics among different propagation paths, the concurrent channel estimation of multiple paths becomes intractable. To address this challenge, we propose a two-phase channel estimation scheme. In the first phase, the angles of departure (AoDs) on the user side are estimated, and a carefully designed pilot transmission scheme enables the decomposition of the received signal from different paths. In the second phase, the subchannel estimation corresponding to different paths is formulated as a three-layer Bayesian inference problem. Specifically, the first layer captures block sparsity in the angular domain, the second layer promotes the SnS property in the antenna domain, and the third layer decouples the subchannels from the observed signals. To efficiently facilitate Bayesian inference, we propose a novel three-layer generalized approximate message passing (TL-GAMP) algorithm based on structured variational message passing and belief propagation rules. Simulation results validate the convergence and effectiveness of the proposed algorithm, showcasing its robustness to different channel scenarios.

Submitted 4 March, 2024; originally announced March 2024.

Comments: This manuscript has been submitted to the IEEE journal for possible publication

arXiv:2402.19173 [pdf, other]

StarCoder 2 and The Stack v2: The Next Generation

Authors: Anton Lozhkov, Raymond Li, Loubna Ben Allal, Federico Cassano, Joel Lamy-Poirier, Nouamane Tazi, Ao Tang, Dmytro Pykhtar, Jiawei Liu, Yuxiang Wei, Tianyang Liu, Max Tian, Denis Kocetkov, Arthur Zucker, Younes Belkada, Zijian Wang, Qian Liu, Dmitry Abulkhanov, Indraneil Paul, Zhuang Li, Wen-Ding Li, Megan Risdal, Jia Li, Jian Zhu, Terry Yue Zhuo , et al. (41 additional authors not shown)

Abstract: The BigCode project, an open-scientific collaboration focused on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder2. In partnership with Software Heritage (SWH), we build The Stack v2 on top of the digital commons of their source code archive. Alongside the SWH repositories spanning 619 programming languages, we carefully select other high-quality data sources, such as GitHub pull requests, Kaggle notebooks, and code documentation. This results in a training set that is 4x larger than the first StarCoder dataset. We train StarCoder2 models with 3B, 7B, and 15B parameters on 3.3 to 4.3 trillion tokens and thoroughly evaluate them on a comprehensive set of Code LLM benchmarks. We find that our small model, StarCoder2-3B, outperforms other Code LLMs of similar size on most benchmarks, and also outperforms StarCoderBase-15B. Our large model, StarCoder2-15B, significantly outperforms other models of comparable size. In addition, it matches or outperforms CodeLlama-34B, a model more than twice its size. Although DeepSeekCoder-33B is the best-performing model at code completion for high-resource languages, we find that StarCoder2-15B outperforms it on math and code reasoning benchmarks, as well as several low-resource languages. We make the model weights available under an OpenRAIL license and ensure full transparency regarding the training data by releasing the SoftWare Heritage persistent IDentifiers (SWHIDs) of the source code data.

Submitted 29 February, 2024; originally announced February 2024.

arXiv:2402.10525 [pdf, other]

doi 10.1145/3643834.3661547

How People Prompt to Create Interactive VR Scenes

Authors: Setareh Aghel Manesh, Tianyi Zhang, Yuki Onishi, Kotaro Hara, Scott Bateman, Jiannan Li, Anthony Tang

Abstract: Generative AI tools can provide people with the ability to create virtual environments and scenes with natural language prompts. Yet, how people will formulate such prompts is unclear -- particularly when they inhabit the environment that they are designing. For instance, it is likely that a person might say, "Put a chair here", while pointing at a location. If such linguistic features are common to people's prompts, we need to tune models to accommodate them. In this work, we present a wizard-of-oz elicitation study with 22 participants, where we studied people's implicit expectations when verbally prompting such programming agents to create interactive VR scenes. Our findings show that people prompt with several implicit expectations: (1) that agents have an embodied knowledge of the environment; (2) that agents understand embodied prompts by users; (3) that the agents can recall previous states of the scene and the conversation, and that (4) agents have a commonsense understanding of objects in the scene. Further, we found that participants prompt differently when they are prompting in situ (i.e. within the VR environment) versus ex situ (i.e. viewing the VR environment from the outside). To explore how our could be applied, we designed and built Oastaad, a conversational programming agent that allows non-programmers to design interactive VR experiences that they inhabit. Based on these explorations, we outline new opportunities and challenges for conversational programming agents that create VR environments.

Submitted 29 May, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

Comments: Accepted at ACM 2024 Designing Interactive Systems (DIS)

arXiv:2402.04958 [pdf, other]

Channel-Selective Normalization for Label-Shift Robust Test-Time Adaptation

Authors: Pedro Vianna, Muawiz Chaudhary, Paria Mehrbod, An Tang, Guy Cloutier, Guy Wolf, Michael Eickenberg, Eugene Belilovsky

Abstract: Deep neural networks have useful applications in many different tasks; however, their performance can be severely affected by changes in the data distribution. For example, in the biomedical field, their performance can be affected by changes in the data (different machines, populations) between training and test datasets. To ensure robustness and generalization to real-world scenarios, test-time adaptation has been recently studied as an approach to adjust models to a new data distribution during inference. Test-time batch normalization is a simple and popular method that achieved compelling performance on domain shift benchmarks. It is implemented by recalculating batch normalization statistics on test batches. Prior work has focused on analysis with test data that has the same label distribution as the training data. However, in many practical applications this technique is vulnerable to label distribution shifts, sometimes producing catastrophic failure. This presents a risk in applying test-time adaptation methods in deployment. We propose to tackle this challenge by only selectively adapting channels in a deep network, minimizing drastic adaptation that is sensitive to label shifts. Our selection scheme is based on two principles that we empirically motivate: (1) later layers of networks are more sensitive to label shift and (2) individual features can be sensitive to specific classes. We apply the proposed technique to three classification tasks, including CIFAR10-C, Imagenet-C, and diagnosis of fatty liver, where we explore both covariate and label distribution shifts. We find that our method retains the benefits of TTA while significantly reducing the risk of failure common in other methods, and is robust to the choice of hyperparameters.

Submitted 29 May, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

Comments: Accepted at the Conference on Lifelong Learning Agents (CoLLAs) 2024

arXiv:2402.01380 [pdf, other]

Efficient Dynamic-NeRF Based Volumetric Video Coding with Rate Distortion Optimization

Authors: Zhiyu Zhang, Guo Lu, Huanxiong Liang, Anni Tang, Qiang Hu, Li Song

Abstract: Volumetric videos, benefiting from immersive 3D realism and interactivity, hold vast potential for various applications, while the tremendous data volume poses significant challenges for compression. Recently, NeRF has demonstrated remarkable potential in volumetric video compression thanks to its simple representation and powerful 3D modeling capabilities, where a notable work is ReRF. However, ReRF separates the modeling from compression process, resulting in suboptimal compression efficiency. In contrast, in this paper, we propose a volumetric video compression method based on dynamic NeRF in a more compact manner. Specifically, we decompose the NeRF representation into the coefficient fields and the basis fields, incrementally updating the basis fields in the temporal domain to achieve dynamic modeling. Additionally, we perform end-to-end joint optimization on the modeling and compression process to further improve the compression efficiency. Extensive experiments demonstrate that our method achieves higher compression efficiency compared to ReRF on various datasets.

Submitted 2 February, 2024; originally announced February 2024.

arXiv:2402.00433 [pdf, other]

Merging Multi-Task Models via Weight-Ensembling Mixture of Experts

Authors: Anke Tang, Li Shen, Yong Luo, Nan Yin, Lefei Zhang, Dacheng Tao

Abstract: Merging various task-specific Transformer-based models trained on different tasks into a single unified model can execute all the tasks concurrently. Previous methods, exemplified by task arithmetic, have been proven to be both effective and scalable. Existing methods have primarily focused on seeking a static optimal solution within the original model parameter space. A notable challenge is mitigating the interference between parameters of different models, which can substantially deteriorate performance. In this paper, we propose to merge most of the parameters while upscaling the MLP of the Transformer layers to a weight-ensembling mixture of experts (MoE) module, which can dynamically integrate shared and task-specific knowledge based on the input, thereby providing a more flexible solution that can adapt to the specific needs of each instance. Our key insight is that by identifying and separating shared knowledge and task-specific knowledge, and then dynamically integrating them, we can mitigate the parameter interference problem to a great extent. We conduct the conventional multi-task model merging experiments and evaluate the generalization and robustness of our method. The results demonstrate the effectiveness of our method and provide a comprehensive understanding of our method. The code is available at https://github.com/tanganke/weight-ensembling_MoE

Submitted 7 June, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

arXiv:2312.06173 [pdf, other]

Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion

Authors: Anke Tang, Li Shen, Yong Luo, Liang Ding, Han Hu, Bo Du, Dacheng Tao

Abstract: Merging models fine-tuned from a common, extensively pre-trained large model but specialized for different tasks has been demonstrated as a cheap and scalable strategy to construct a multi-task model that performs well across diverse tasks. Recent research, exemplified by task arithmetic, highlights that this multi-task model can be derived through arithmetic operations on task vectors. Nevertheless, current merging techniques frequently resolve potential conflicts among parameters from task-specific models by evaluating individual attributes, such as the parameters' magnitude or sign, overlooking their collective impact on the overall functionality of the model. In this work, we propose the CONtinuous relaxation of disCRETE (Concrete) subspace learning method to identify a common low-dimensional subspace and utilize its shared information to tackle the interference problem without sacrificing much performance. Specifically, we model the problem as a bi-level optimization problem and introduce a meta-learning framework to find the Concrete subspace mask through gradient-based techniques. At the upper level, we focus on learning a shared Concrete mask to identify the subspace, while at the inner level, model merging is performed to maximize the performance of the merged model. We conduct extensive experiments on both vision domain and language domain, and the results demonstrate the effectiveness of our method. The code is available at https://github.com/tanganke/subspace_fusion

Submitted 11 December, 2023; originally announced December 2023.

arXiv:2312.04586 [pdf, other]

Automated SELinux RBAC Policy Verification Using SMT

Authors: Divyam Pahuja, Alvin Tang, Klim Tsoutsman

Abstract: Security-Enhanced Linux (SELinux) is a Linux kernel module that allows for a role-based access control (RBAC) mechanism. It provides a fine-grained security framework enabling system administrators to define security policies at the system and application level. Whilst SELinux offers robust security features through a customisable, powerful RBAC model, its manual policy management is prone to error, leaving the system vulnerable to accidental misconfigurations or loopholes. We present a tool to automate the conversion of SELinux policies into satisfiability modulo theories (SMT), enabling the verification of the intended security configurations using automated theorem proving. Our tool is capable of flagging common policy misconfigurations by asserting consistency between supplied RBAC policies and the intended specification by the user in SMT. RBAC policies are inherently complicated to verify entirely. We envision that the automated tool presented here can be further extended to identify an even broader range of policy misconfigurations, relieving the burden of managing convoluted policies on system administrators.

Submitted 4 December, 2023; originally announced December 2023.

Comments: 10 pages (excluding appendices), 2 figures, 3 appendices

ACM Class: F.4.1; D.4.6

arXiv:2311.10305 [pdf, other]

Semi-supervised ViT knowledge distillation network with style transfer normalization for colorectal liver metastases survival prediction

Authors: Mohamed El Amine Elforaici, Emmanuel Montagnon, Francisco Perdigon Romero, William Trung Le, Feryel Azzi, Dominique Trudel, Bich Nguyen, Simon Turcotte, An Tang, Samuel Kadoury

Abstract: Colorectal liver metastases (CLM) significantly impact colon cancer patients, influencing survival based on systemic chemotherapy response. Traditional methods like tumor grading scores (e.g., tumor regression grade - TRG) for prognosis suffer from subjectivity, time constraints, and expertise demands. Current machine learning approaches often focus on radiological data, yet the relevance of histological images for survival predictions, capturing intricate tumor microenvironment characteristics, is gaining recognition. To address these limitations, we propose an end-to-end approach for automated prognosis prediction using histology slides stained with H&E and HPS. We first employ a Generative Adversarial Network (GAN) for slide normalization to reduce staining variations and improve the overall quality of the images that are used as input to our prediction pipeline. We propose a semi-supervised model to perform tissue classification from sparse annotations, producing feature maps. We use an attention-based approach that weighs the importance of different slide regions in producing the final classification results. We exploit the extracted features for the metastatic nodules and surrounding tissue to train a prognosis model. In parallel, we train a vision Transformer (ViT) in a knowledge distillation framework to replicate and enhance the performance of the prognosis prediction. In our evaluation on a clinical dataset of 258 patients, our approach demonstrates superior performance with c-indexes of 0.804 (0.014) for OS and 0.733 (0.014) for TTR. Achieving 86.9% to 90.3% accuracy in predicting TRG dichotomization and 78.5% to 82.1% accuracy for the 3-class TRG classification task, our approach outperforms comparative methods. Our proposed pipeline can provide automated prognosis for pathologists and oncologists, and can greatly promote precision medicine progress in managing CLM patients.

Submitted 16 November, 2023; originally announced November 2023.

Comments: 16 pages, 7 figures and 7 tables. Submitted to the Medical Image Analysis (MedIA) journal

arXiv:2311.09490 [pdf, other]

Joint Visibility Region and Channel Estimation for Extremely Large-scale MIMO Systems

Authors: Anzheng Tang, Jun-bo Wang, Yijin Pan, Wence Zhang, Yijian Chen, Xiaodan Zhang, Hongkang Yu, Rodrigo C. de Lamare

Abstract: In this work, we investigate the joint visibility region (VR) detection and channel estimation (CE) problem for extremely large-scale multiple-input-multiple-output (XL-MIMO) systems considering both the spherical wavefront effect and spatial non-stationary (SnS) property. Unlike existing SnS CE methods that rely on the statistical characteristics of channels in the spatial or delay domain, we propose an approach that simultaneously exploits the antenna-domain spatial correlation and the wavenumber-domain sparsity of SnS channels. To this end, we introduce a two-stage VR detection and CE scheme. In the first stage, the belief regarding the visibility of antennas is obtained through a VR detection-oriented message passing (VRDO-MP) scheme, which fully exploits the spatial correlation among adjacent antenna elements. In the second stage, leveraging the VR information and wavenumber-domain sparsity, we accurately estimate the SnS channel employing the belief-based orthogonal matching pursuit (BB-OMP) method. Simulations show that the proposed algorithms lead to a significant enhancement in VR detection and CE accuracy as compared to existing methods, especially in low signal-to-noise ratio (SNR) scenarios.

Submitted 30 March, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

Comments: A major revision version has been submitted to IEEE journal

arXiv:2311.01049 [pdf]

Multi-dimensional data refining strategy for effective fine-tuning LLMs

Authors: Thanh Nguyen Ngoc, Quang Nhat Tran, Arthur Tang, Bao Nguyen, Thuy Nguyen, Thanh Pham

Abstract: Data is a cornerstone for fine-tuning large language models, yet acquiring suitable data remains challenging. Challenges encompassed data scarcity, linguistic diversity, and domain-specific content. This paper presents lessons learned while crawling and refining data tailored for fine-tuning Vietnamese language models. Crafting such a dataset, while accounting for linguistic intricacies and striking a balance between inclusivity and accuracy, demands meticulous planning. Our paper presents a multidimensional strategy including leveraging existing datasets in the English language and developing customized data-crawling scripts with the assistance of generative AI tools. A fine-tuned LLM model for the Vietnamese language, which was produced using resultant datasets, demonstrated good performance while generating Vietnamese news articles from prompts. The study offers practical solutions and guidance for future fine-tuning models in languages like Vietnamese.

Submitted 2 November, 2023; originally announced November 2023.

arXiv:2311.01048 [pdf]

AI-assisted Learning for Electronic Engineering Courses in High Education

Authors: Thanh Nguyen Ngoc, Quang Nhat Tran, Arthur Tang, Bao Nguyen, Thuy Nguyen, Thanh Pham

Abstract: This study evaluates the efficacy of ChatGPT as an AI teaching and learning support tool in an integrated circuit systems course at a higher education institution in an Asian country. Various question types were completed, and ChatGPT responses were assessed to gain valuable insights for further investigation. The objective is to assess ChatGPT's ability to provide insights, personalized support, and interactive learning experiences in engineering education. The study includes the evaluation and reflection of different stakeholders: students, lecturers, and engineers. The findings of this study shed light on the benefits and limitations of ChatGPT as an AI tool, paving the way for innovative learning approaches in technical disciplines. Furthermore, the study contributes to our understanding of how digital transformation is likely to unfold in the education sector.

Submitted 2 November, 2023; originally announced November 2023.

arXiv:2310.20227 [pdf, ps, other]

Achieving Scalable Capacity in Wireless Mesh Networks

Authors: Lei Lei, Aimin Tang, Xudong Wang

Abstract: Wireless mesh networks play a critical role in enabling key networking scenarios in beyond-5G (B5G) and 6G networks, including integrated access and backhaul (IAB), multi-hop sidelinks, and V2X. However, it still poses a challenge to deliver scalable per-node throughput via mesh networking, which significantly limits the potential of large-scale deployment of wireless mesh networks. Existing research has achieved $O(1)$ per-node throughput in a dense network, but how to achieve scalability remains an unresolved issue for an extended wireless network where the network size increases with a constant node density. This issue prevents a wireless mesh network from large-scale deployment. To this end, this paper aims to develop a theoretical approach to achieving scalable per-node throughput in wireless mesh networks. First, the key factors that limit the per-node throughput of wireless mesh networks are analyzed, through which two major ones are identified, i.e., link sharing and interference. Next, a multi-tier hierarchical architecture is proposed to overcome the link-sharing issue. The inter-tier interference under this architecture is then mitigated by utilizing orthogonal frequency allocation between adjacent tiers, while the intra-tier interference is reduced by considering two specific transmission schemes: MIMO spatial multiplexing with time-division, and MIMO beamforming. Theoretical analysis shows that the multi-tier mesh networking architecture can achieve a per-node throughput of $\Theta(1)$ in both schemes, as long as certain conditions on network parameters including bandwidth, antenna numbers, and node numbers of each tier are satisfied. A case study on a realistic deployment of 10,000 nodes is then carried out, which demonstrates that a scalable throughput of $\Theta(1)$ is achievable with a reasonable assumption on bandwidth and antenna numbers.

Submitted 31 October, 2023; originally announced October 2023.

Comments: ~12 pages, 4 figures, submitted to IEEE TIT, part of this work has been published in IEEE MASS 2022

arXiv:2310.18646 [pdf]

Predicting Agricultural Commodities Prices with Machine Learning: A Review of Current Research

Authors: Nhat-Quang Tran, Anna Felipe, Thanh Nguyen Ngoc, Tom Huynh, Quang Tran, Arthur Tang, Thuy Nguyen

Abstract: Agricultural price prediction is crucial for farmers, policymakers, and other stakeholders in the agricultural sector. However, it is a challenging task due to the complex and dynamic nature of agricultural markets. Machine learning algorithms have the potential to revolutionize agricultural price prediction by improving accuracy, real-time prediction, customization, and integration. This paper reviews recent research on machine learning algorithms for agricultural price prediction. We discuss the importance of agriculture in developing countries and the problems associated with crop price falls. We then identify the challenges of predicting agricultural prices and highlight how machine learning algorithms can support better prediction. Next, we present a comprehensive analysis of recent research, discussing the strengths and weaknesses of various machine learning techniques. We conclude that machine learning has the potential to revolutionize agricultural price prediction, but further research is essential to address the limitations and challenges associated with this approach.

Submitted 28 October, 2023; originally announced October 2023.

arXiv:2310.08184 [pdf, other]

Learn From Model Beyond Fine-Tuning: A Survey

Authors: Hongling Zheng, Li Shen, Anke Tang, Yong Luo, Han Hu, Bo Du, Dacheng Tao

Abstract: Foundation models (FM) have demonstrated remarkable performance across a wide range of tasks (especially in the fields of natural language processing and computer vision), primarily attributed to their ability to comprehend instructions and access extensive, high-quality data. This not only showcases their current effectiveness but also sets a promising trajectory towards the development of artificial general intelligence. Unfortunately, due to multiple constraints, the raw data of the model used for large model training are often inaccessible, so the use of end-to-end models for downstream tasks has become a new research trend, which we call Learn From Model (LFM) in this article. LFM focuses on the research, modification, and design of FM based on the model interface, so as to better understand the model structure and weights (in a black box environment), and to generalize the model to downstream tasks. The study of LFM techniques can be broadly categorized into five major areas: model tuning, model distillation, model reuse, meta learning and model editing. Each category encompasses a repertoire of methods and strategies that aim to enhance the capabilities and performance of FM. This paper gives a comprehensive review of the current methods based on FM from the perspective of LFM, in order to help readers better understand the current research status and ideas. To conclude, we summarize the survey by highlighting several critical areas for future exploration and addressing open issues that require further attention from the research community. The relevant papers we investigated in this article can be accessed at <https://github.com/ruthless-man/Awesome-Learn-from-Model>.

Submitted 12 October, 2023; originally announced October 2023.

Comments: 20 pages, 9 figures

arXiv:2310.04742 [pdf, other]

Parameter Efficient Multi-task Model Fusion with Partial Linearization

Authors: Anke Tang, Li Shen, Yong Luo, Yibing Zhan, Han Hu, Bo Du, Yixin Chen, Dacheng Tao

Abstract: Large pre-trained models have enabled significant advances in machine learning and served as foundation components. Model fusion methods, such as task arithmetic, have been proven to be powerful and scalable to incorporate fine-tuned weights from different tasks into a multi-task model. However, efficiently fine-tuning large pre-trained models on multiple downstream tasks remains challenging, leading to inefficient multi-task model fusion. In this work, we propose a novel method to improve multi-task fusion for parameter-efficient fine-tuning techniques like LoRA fine-tuning. Specifically, our approach partially linearizes only the adapter modules and applies task arithmetic over the linearized adapters. This allows us to leverage the advantages of model fusion over linearized fine-tuning, while still performing fine-tuning and inference efficiently. We demonstrate that our partial linearization technique enables a more effective fusion of multiple tasks into a single model, outperforming standard adapter tuning and task arithmetic alone. Experimental results demonstrate the capabilities of our proposed partial linearization technique to effectively construct unified multi-task models via the fusion of fine-tuned task vectors. We evaluate performance over an increasing number of tasks and find that our approach outperforms standard parameter-efficient fine-tuning techniques. The results highlight the benefits of partial linearization for scalable and efficient multi-task model fusion. The code is available at https://github.com/tanganke/peta

Submitted 11 March, 2024; v1 submitted 7 October, 2023; originally announced October 2023.

arXiv:2309.14225 [pdf, other]

HumanMimic: Learning Natural Locomotion and Transitions for Humanoid Robot via Wasserstein Adversarial Imitation

Authors: Annan Tang, Takuma Hiraoka, Naoki Hiraoka, Fan Shi, Kento Kawaharazuka, Kunio Kojima, Kei Okada, Masayuki Inaba

Abstract: Transferring human motion skills to humanoid robots remains a significant challenge. In this study, we introduce a Wasserstein adversarial imitation learning system, allowing humanoid robots to replicate natural whole-body locomotion patterns and execute seamless transitions by mimicking human motions. First, we present a unified primitive-skeleton motion retargeting to mitigate morphological differences between arbitrary human demonstrators and humanoid robots. An adversarial critic component is integrated with Reinforcement Learning (RL) to guide the control policy to produce behaviors aligned with the data distribution of mixed reference motions. Additionally, we employ a specific Integral Probabilistic Metric (IPM), namely the Wasserstein-1 distance with a novel soft boundary constraint to stabilize the training process and prevent mode collapse. Our system is evaluated on a full-sized humanoid JAXON in the simulator. The resulting control policy demonstrates a wide range of locomotion patterns, including standing, push-recovery, squat walking, human-like straight-leg walking, and dynamic running. Notably, even in the absence of transition motions in the demonstration dataset, robots showcase an emerging ability to transition naturally between distinct locomotion patterns as the desired speed changes.

Submitted 23 April, 2024; v1 submitted 25 September, 2023; originally announced September 2023.

arXiv:2308.12983 [pdf, ps, other]

Implementation of Formal Semantics and the Potential of Non-Classical Logic Systems for the Enhancement of Access Control Models: A Literature Review

Authors: Alvin Tang

Abstract: This literature review discovers an implementation of formal logic systems in cyber security by enhancing access control models. We explore the characteristics of the existing access control theories, their limitations and how classical logic is used therein. We then delve into the possibility of utilising non-classical logic systems for improving the models. In particular, we explore how classical logic can be used to describe and prove the correctness of role-based access control and attribute-based access control models.

Submitted 23 August, 2023; originally announced August 2023.

Comments: 10 pages

ACM Class: F.4.1; D.4.6

arXiv:2307.05609 [pdf, other]

Virtual Network Embedding without Explicit Virtual Network Specification

Authors: Jiangnan Cheng, Yingjie Bi, Ao Tang

Abstract: Network virtualization enables Internet service providers to run multiple heterogeneous and dedicated network architectures for different customers on a shared substrate. In existing works on virtual network embedding (VNE), each customer formulates a virtual network request (VNR) where a virtual network (VN) is required. Motivated by a concrete example where VN is not a proper VNR formulation to reflect the traffic demand of a customer, we propose a new VNR formulation described by the traffic demand between several access node pairs to complement the existing VNR formulation. Moreover, three different groups of VNE variants are systematically examined. Simulations demonstrate that shared channel embedding, as a new embedding variant under the proposed VNR formulation, improves the acceptance rate and reduces cost and link utility compared to traditional independent channel embedding.

Submitted 10 July, 2023; originally announced July 2023.

arXiv:2307.04945 [pdf, other]

What do LLMs need to Synthesize Correct Router Configurations?

Authors: Rajdeep Mondal, Alan Tang, Ryan Beckett, Todd Millstein, George Varghese

Abstract: We investigate whether Large Language Models (e.g., GPT-4) can synthesize correct router configurations with reduced manual effort. We find GPT-4 works very badly by itself, producing promising draft configurations but with egregious errors in topology, syntax, and semantics. Our strategy, which we call Verified Prompt Programming, is to combine GPT-4 with verifiers, and use localized feedback from the verifier to automatically correct errors. Verification requires a specification and actionable localized feedback to be effective. We show results for two use cases: translating from Cisco to Juniper configurations on a single router, and implementing no-transit policy on multiple routers. While human input is still required, if we define the leverage as the ratio of the number of automated prompts to the number of human prompts, our experiments show a leverage of 10X for Juniper translation, and 6X for implementing no-transit policy, ending with verified configurations.

Submitted 10 July, 2023; originally announced July 2023.

arXiv:2305.13871 [pdf, other]

Improving Heterogeneous Model Reuse by Density Estimation

Authors: Anke Tang, Yong Luo, Han Hu, Fengxiang He, Kehua Su, Bo Du, Yixin Chen, Dacheng Tao

Abstract: This paper studies multiparty learning, aiming to learn a model using the private data of different participants. Model reuse is a promising solution for multiparty learning, assuming that a local model has been trained for each party. Considering the potential sample selection bias among different parties, some heterogeneous model reuse approaches have been developed. However, although pre-trained local classifiers are utilized in these approaches, the characteristics of the local data are not well exploited. This motivates us to estimate the density of local data and design an auxiliary model together with the local classifiers for reuse. To address the scenarios where some local models are not well pre-trained, we further design a multiparty cross-entropy loss for calibration. Building upon existing works, we address a challenging problem of heterogeneous model reuse from a decision theory perspective and take advantage of recent advances in density estimation. Experimental results on both synthetic and benchmark data demonstrate the superiority of the proposed method.

Submitted 23 May, 2023; originally announced May 2023.

Comments: 9 pages, 5 figures. Accepted by IJCAI 2023

arXiv:2305.01543 [pdf, other]

NFT Wash Trading Detection

Authors: Derek Liu, Francesco Piccoli, Katie Chen, Adrina Tang, Victor Fang

Abstract: Wash trading is a form of market manipulation where the same entity sells an asset to themselves to drive up market prices, launder money under the cover of a legitimate transaction, or claim a tax loss without losing ownership of an asset. Although the practice is illegal with traditional assets, lack of supervision in the non-fungible token market enables criminals to wash trade and scam unsuspecting buyers while operating under regulators' radar. AnChain.AI designed an algorithm that flags transactions within an NFT collection history as wash trades when a wallet repurchases a token within 30 days of previously selling it. The algorithm also identifies intermediate transactions within a wash trade cycle. Testing on 7 popular NFT collections reveals that on average, 0.14% of transactions, 0.11% of wallets, and 0.16% of tokens in each collection are involved in wash trading. These wash trades generate an overall total price manipulation, sales, and repurchase profit of \$900K, \$1.1M, and negative \$1.6M respectively. The results draw attention to the prevalent market manipulation taking place and inform unsuspecting buyers which tokens and sellers may be involved in criminal activity.

Submitted 7 February, 2023; originally announced May 2023.
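
The flagging rule stated in this abstract (a wallet repurchases a token within 30 days of previously selling it) can be illustrated with a minimal Python sketch. This is not the AnChain.AI implementation: the `Sale` record, its field names, and the `flag_wash_trades` function are hypothetical, and the sketch does not attempt the identification of intermediate transactions within a wash trade cycle that the abstract also mentions.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class Sale(NamedTuple):
    # Hypothetical record layout; the paper's data schema is not given in the abstract.
    token_id: str
    seller: str
    buyer: str
    timestamp: datetime


def flag_wash_trades(sales, window=timedelta(days=30)):
    """Return the sales in which the buyer had sold the same token within `window`."""
    last_sold = {}  # (wallet, token_id) -> time of that wallet's most recent sale of the token
    flagged = []
    for sale in sorted(sales, key=lambda s: s.timestamp):
        sold_at = last_sold.get((sale.buyer, sale.token_id))
        # The buyer previously sold this token and is buying it back inside the window.
        if sold_at is not None and sale.timestamp - sold_at <= window:
            flagged.append(sale)
        last_sold[(sale.seller, sale.token_id)] = sale.timestamp
    return flagged


if __name__ == "__main__":
    t0 = datetime(2023, 1, 1)
    history = [
        Sale("token-1", "A", "B", t0),
        Sale("token-1", "B", "A", t0 + timedelta(days=10)),  # A buys back after 10 days
        Sale("token-2", "C", "D", t0),
        Sale("token-2", "D", "C", t0 + timedelta(days=60)),  # outside the 30-day window
    ]
    for sale in flag_wash_trades(history):
        print(sale.token_id, sale.buyer, sale.timestamp.date())
```

Tracking only the most recent sale per wallet and token keeps the pass linear after sorting; any earlier sale by the same wallet would be further from the repurchase and so could not satisfy the window if the most recent one does not.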

arXiv:2303.03221 [pdf, other]

doi 10.1145/3544548.3580896

Stargazer: An Interactive Camera Robot for Capturing How-To Videos Based on Subtle Instructor Cues

Authors: Jiannan Li, Mauricio Sousa, Karthik Mahadevan, Bryan Wang, Paula Akemi Aoyaui, Nicole Yu, Angela Yang, Ravin Balakrishnan, Anthony Tang, Tovi Grossman

Abstract: Live and pre-recorded video tutorials are an effective means for teaching physical skills such as cooking or prototyping electronics. A dedicated cameraperson following an instructor's activities can improve production quality. However, instructors who do not have access to a cameraperson's help often have to work within the constraints of static cameras. We present Stargazer, a novel approach for assisting with tutorial content creation with a camera robot that autonomously tracks regions of interest based on instructor actions to capture dynamic shots. Instructors can adjust the camera behaviors of Stargazer with subtle cues, including gestures and speech, allowing them to fluidly integrate camera control commands into instructional activities. Our user study with six instructors, each teaching a distinct skill, showed that participants could create dynamic tutorial videos with a diverse range of subjects, camera framing, and camera angle combinations using Stargazer.

Submitted 6 March, 2023; originally announced March 2023.

Comments: To appear in Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI '23), April 23--28, 2023, Hamburg, Germany

arXiv:2212.05005 [pdf, other]

doi 10.1109/TPAMI.2024.3409380

Memories are One-to-Many Mapping Alleviators in Talking Face Generation

Authors: Anni Tang, Tianyu He, Xu Tan, Jun Ling, Li Song

Abstract: Talking face generation aims at generating photo-realistic video portraits of a target person driven by input audio. Due to its nature of one-to-many mapping from the input audio to the output video (e.g., one speech content may have multiple feasible visual appearances), learning a deterministic mapping like previous works brings ambiguity during training, and thus causes inferior visual results. Although this one-to-many mapping could be alleviated in part by a two-stage framework (i.e., an audio-to-expression model followed by a neural-rendering model), it is still insufficient since the prediction is produced without enough information (e.g., emotions, wrinkles, etc.). In this paper, we propose MemFace to complement the missing information with an implicit memory and an explicit memory that follow the sense of the two stages respectively. More specifically, the implicit memory is employed in the audio-to-expression model to capture high-level semantics in the audio-expression shared space, while the explicit memory is employed in the neural-rendering model to help synthesize pixel-level details. Our experimental results show that our proposed MemFace surpasses all the state-of-the-art results across multiple scenarios consistently and significantly.

Submitted 5 March, 2024; v1 submitted 9 December, 2022; originally announced December 2022.

Comments: Project page: see https://memoryface.github.io

Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence (2024)

arXiv:2211.02012 [pdf, other]

Optimal Compression for Minimizing Classification Error Probability: an Information-Theoretic Approach

Authors: Jingchao Gao, Ao Tang, Weiyu Xu

Abstract: We formulate the problem of performing optimal data compression under the constraint that compressed data can be used for accurate classification in machine learning. We show that this translates to a problem of minimizing the mutual information between data and its compressed version under the constraint that the error probability of classification is small when using the compressed data for machine learning. We then provide analytical and computational methods to characterize the optimal trade-off between data compression and classification error probability. First, we provide an analytical characterization for the optimal compression strategy for data with binary labels. Second, for data with multiple labels, we formulate a set of convex optimization problems to characterize the optimal trade-off, from which the optimal trade-off between the classification error and compression efficiency can be obtained by numerically solving the formulated optimization problems. We further show the improvements of our formulations over the information-bottleneck methods in classification performance.

Submitted 3 November, 2022; originally announced November 2022.

Comments: This work was done in Summer 2021

arXiv:2209.02866 [pdf, ps, other]

Algorithmic Learning Foundations for Common Law

Authors: Jason D. Hartline, Daniel W. Linna Jr., Liren Shan, Alex Tang

Abstract: This paper looks at a common law legal system as a learning algorithm, models specific features of legal proceedings, and asks whether this system learns efficiently. A particular feature of our model is explicitly viewing various aspects of court proceedings as learning algorithms. This viewpoint enables directly pointing out that when the costs of going to court are not commensurate with the benefits of going to court, there is a failure of learning and inaccurate outcomes will persist in cases that settle. Specifically, cases are brought to court at an insufficient rate. On the other hand, when individuals can be compelled or incentivized to bring their cases to court, the system can learn and inaccuracy vanishes over time.

Submitted 8 September, 2022; v1 submitted 6 September, 2022; originally announced September 2022.

arXiv:2208.02711 [pdf, ps, other]

Agnostic Learning of General ReLU Activation Using Gradient Descent

Authors: Pranjal Awasthi, Alex Tang, Aravindan Vijayaraghavan

Abstract: We provide a convergence analysis of gradient descent for the problem of agnostically learning a single ReLU function under Gaussian distributions. Unlike prior work that studies the setting of zero bias, we consider the more challenging scenario when the bias of the ReLU function is non-zero. Our main result establishes that starting from random initialization, in a polynomial number of iterations gradient descent outputs, with high probability, a ReLU function that achieves a competitive error guarantee when compared to the error of the best ReLU function. We also provide finite sample guarantees, and these techniques generalize to a broader class of marginal distributions beyond Gaussians.

Submitted 4 August, 2022; originally announced August 2022.

Comments: 28 pages

arXiv:2206.08853 [pdf, other]

MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge

Authors: Linxi Fan, Guanzhi Wang, Yunfan Jiang, Ajay Mandlekar, Yuncong Yang, Haoyi Zhu, Andrew Tang, De-An Huang, Yuke Zhu, Anima Anandkumar

Abstract: Autonomous agents have made great strides in specialist domains like Atari games and Go. However, they typically learn tabula rasa in isolated environments with limited and manually conceived objectives, thus failing to generalize across a wide spectrum of tasks and capabilities. Inspired by how humans continually learn and adapt in the open world, we advocate a trinity of ingredients for building generalist agents: 1) an environment that supports a multitude of tasks and goals, 2) a large-scale database of multimodal knowledge, and 3) a flexible and scalable agent architecture. We introduce MineDojo, a new framework built on the popular Minecraft game that features a simulation suite with thousands of diverse open-ended tasks and an internet-scale knowledge base with Minecraft videos, tutorials, wiki pages, and forum discussions. Using MineDojo's data, we propose a novel agent learning algorithm that leverages large pre-trained video-language models as a learned reward function. Our agent is able to solve a variety of open-ended tasks specified in free-form language without any manually designed dense shaping reward. We open-source the simulation suite, knowledge bases, algorithm implementation, and pretrained models (https://minedojo.org) to promote research towards the goal of generally capable embodied agents.

Submitted 22 November, 2022; v1 submitted 17 June, 2022; originally announced June 2022.

Comments: Outstanding Paper Award at NeurIPS 2022. Project website: https://minedojo.org

arXiv:2204.09635 [pdf, other]

doi 10.1145/3603269.3604842

LIGHTYEAR: Using Modularity to Scale BGP Control Plane Verification

Authors: Alan Tang, Ryan Beckett, Steven Benaloh, Karthick Jayaraman, Tejas Patil, Todd Millstein, George Varghese

Abstract: Current network control plane verification tools cannot scale to large networks, because of the complexity of jointly reasoning about the behaviors of all nodes in the network. In this paper we present a modular approach to control plane verification, whereby end-to-end network properties are verified via a set of purely local checks on individual nodes and edges. The approach targets the verification of safety properties for BGP configurations and provides guarantees in the face of both arbitrary external route announcements from neighbors and arbitrary node/link failures. We have proven the approach correct and also implemented it in a tool called Lightyear. Experimental results show that Lightyear scales dramatically better than prior control plane verifiers. Further, we have used Lightyear to verify three properties of the wide area network of a major cloud provider, containing hundreds of routers and tens of thousands of edges. To our knowledge no prior tool has been demonstrated to provide such guarantees at that scale. Finally, in addition to the scaling benefits, our modular approach to verification makes it easy to localize the causes of configuration errors and to support incremental re-verification as configurations are updated.

Submitted 20 September, 2023; v1 submitted 20 April, 2022; originally announced April 2022.

Comments: 12 pages (+ 2 pages references), 3 figures, Accepted at SIGCOMM '23

Journal ref: In Proceedings of the ACM SIGCOMM 2023 Conference (ACM SIGCOMM '23). Association for Computing Machinery, New York, NY, USA, 94-107

arXiv:2201.11917 [pdf, other]

Task-Aware Network Coding Over Butterfly Network

Authors: Jiangnan Cheng, Sandeep Chinchali, Ao Tang

Abstract: Network coding allows distributed information sources such as sensors to efficiently compress and transmit data to distributed receivers across a bandwidth-limited network. Classical network coding is largely task-agnostic -- the coding schemes mainly aim to faithfully reconstruct data at the receivers, regardless of what ultimate task the received data is used for. In this paper, we analyze a new task-driven network coding problem, where distributed receivers pass transmitted data through machine learning (ML) tasks, which provides an opportunity to improve efficiency by transmitting salient task-relevant data representations. Specifically, we formulate a task-aware network coding problem over a butterfly network in real-coordinate space, where lossy analog compression through principal component analysis (PCA) can be applied. A lower bound for the total loss function for the formulated problem is given, and necessary and sufficient conditions for achieving this lower bound are also provided. We introduce ML algorithms to solve the problem in the general case, and our evaluation demonstrates the effectiveness of task-aware network coding.

Submitted 31 October, 2022; v1 submitted 27 January, 2022; originally announced January 2022.

arXiv:2112.02955 [pdf]

Does constituency analysis enhance domain-specific pre-trained BERT models for relation extraction?

Authors: Anfu Tang, Louise Deléger, Robert Bossy, Pierre Zweigenbaum, Claire Nédellec

Abstract: Recently, many studies have been conducted on the topic of relation extraction. The DrugProt track at BioCreative VII provides a manually annotated corpus for the development and evaluation of relation extraction systems, in which interactions between chemicals and genes are studied. We describe the ensemble system that we used for our submission, which combines predictions of fine-tuned bioBERT, sciBERT and const-bioBERT models by majority voting. We specifically tested the contribution of syntactic information to relation extraction with BERT. We observed that adding constituent-based syntactic information to BERT improved precision but decreased recall, since relations rarely seen in the training set were less likely to be predicted by BERT models in which the syntactic information is infused. Our code is available online [https://github.com/Maple177/drugprot-relation-extraction].

Submitted 25 November, 2021; originally announced December 2021.

Journal ref: BioCreative VII Challenge Evaluation Workshop, Nov 2021, on-line, Spain

arXiv:2112.02097 [pdf]

Global alignment for relation extraction in Microbiology

Authors: Anfu Tang, Claire Nédellec, Pierre Zweigenbaum, Louise Deléger, Robert Bossy

Abstract: We investigate a method to extract relations from texts based on global alignment and syntactic information. Combined with SVM, this method is shown to have a performance comparable to or even better than LSTM on two RE tasks.

Submitted 25 November, 2021; originally announced December 2021.

Journal ref: Junior Conference on Data Science and Engineering, Feb 2021, Orsay, France

arXiv:2110.02329 [pdf, other]

Task-aware Privacy Preservation for Multi-dimensional Data

Authors: Jiangnan Cheng, Ao Tang, Sandeep Chinchali

Abstract: Local differential privacy (LDP) can be adopted to anonymize richer user data attributes that will be input to sophisticated machine learning (ML) tasks. However, today's LDP approaches are largely task-agnostic and often lead to severe performance loss -- they simply inject noise to all data attributes according to a given privacy budget, regardless of what features are most relevant for the ultimate task. In this paper, we address how to significantly improve the ultimate task performance with multi-dimensional user data by considering a task-aware privacy preservation problem. The key idea is to use an encoder-decoder framework to learn (and anonymize) a task-relevant latent representation of user data. We obtain an analytical near-optimal solution for the linear setting with mean-squared error (MSE) task loss. We also provide an approximate solution through a gradient-based learning algorithm for general nonlinear cases. Extensive experiments demonstrate that our task-aware approach significantly improves ultimate task accuracy compared to standard benchmark LDP approaches with the same level of privacy guarantee.

Submitted 7 August, 2022; v1 submitted 5 October, 2021; originally announced October 2021.

Comments: Accepted by 39th International Conference on Machine Learning (ICML 2022)

arXiv:2109.14675 [pdf, other]

Data Sharing and Compression for Cooperative Networked Control

Authors: Jiangnan Cheng, Marco Pavone, Sachin Katti, Sandeep Chinchali, Ao Tang

Abstract: Sharing forecasts of network timeseries data, such as cellular or electricity load patterns, can improve independent control applications ranging from traffic scheduling to power generation. Typically, forecasts are designed without knowledge of a downstream controller's task objective, and thus simply optimize for mean prediction error. However, such task-agnostic representations are often too large to stream over a communication network and do not emphasize salient temporal features for cooperative control. This paper presents a solution to learn succinct, highly-compressed forecasts that are co-designed with a modular controller's task objective. Our simulations with real cellular, Internet-of-Things (IoT), and electricity load data show we can improve a model predictive controller's performance by at least $25\%$ while transmitting $80\%$ less data than the competing method. Further, we present theoretical compression results for a networked variant of the classical linear quadratic regulator (LQR) control problem.

Submitted 5 October, 2021; v1 submitted 29 September, 2021; originally announced September 2021.

Comments: Accepted by 35th Conference on Neural Information Processing Systems (NeurIPS 2021)

arXiv:2107.10209 [pdf, ps, other]

Efficient Algorithms for Learning Depth-2 Neural Networks with General ReLU Activations

Authors: Pranjal Awasthi, Alex Tang, Aravindan Vijayaraghavan

Abstract: We present polynomial time and sample efficient algorithms for learning an unknown depth-2 feedforward neural network with general ReLU activations, under mild non-degeneracy assumptions. In particular, we consider learning an unknown network of the form $f(x) = a^{\mathsf{T}}\sigma(W^{\mathsf{T}}x+b)$, where $x$ is drawn from the Gaussian distribution, and $\sigma(t) := \max(t,0)$ is the ReLU activation. Prior works for learning networks with ReLU activations assume that the bias $b$ is zero. In order to deal with the presence of the bias terms, our proposed algorithm consists of robustly decomposing multiple higher order tensors arising from the Hermite expansion of the function $f(x)$. Using these ideas we also establish identifiability of the network parameters under minimal assumptions.

Submitted 1 August, 2021; v1 submitted 21 July, 2021; originally announced July 2021.

Comments: 45 pages (including appendix). This version fixes an error in the previous version of the paper

arXiv:2107.01446 [pdf]

Architecture Information Communication in Two OSS Projects: the Why, Who, When, and What

Authors: Tingting Bi, Wei Ding, Peng Liang, Antony Tang

Abstract: Architecture information is vital for Open Source Software (OSS) development, and mailing lists are one of the widely used channels for developers to share and communicate architecture information. This work investigates the nature of architecture information communication (i.e., why, who, when, and what) by OSS developers via developer mailing lists. We employed a multiple case study approach to extract and analyze the architecture information communication from the developer mailing lists of two OSS projects, ArgoUML and Hibernate, during their development life-cycle of over 18 years. Our main findings are: (a) architecture negotiation and interpretation are the two main reasons (i.e., why) for architecture communication; (b) the amount of architecture information communicated in developer mailing lists decreases after the first stable release (i.e., when); (c) architecture communication centers around a few core developers (i.e., who); and (d) the most frequently communicated architecture elements (i.e., what) are Architecture Rationale and Architecture Model. There are a few similarities in architecture communication between the two OSS projects. Such similarities point to how OSS developers naturally gravitate towards the four aspects of architecture communication in OSS development.

Submitted 3 July, 2021; originally announced July 2021.

Comments: Preprint accepted for publication in Journal of Systems and Software, 2021

arXiv:2105.07940 [pdf]

doi 10.1016/j.jss.2021.111005

Mining Architecture Tactics and Quality Attributes Knowledge in Stack Overflow

Authors: Tingting Bi, Peng Liang, Antony Tang, Xin Xia

Abstract: Context: Architecture Tactics (ATs) are architectural building blocks that provide general architectural solutions for addressing Quality Attributes (QAs) issues. Mining and analyzing QA-AT knowledge can help the software architecture community better understand architecture design. However, manually capturing and mining this knowledge is labor-intensive and difficult. Objective: Using Stack Overflow (SO) as our source, our main goals are to effectively mine such knowledge and to gain some sense of how developers use ATs with respect to QA concerns from related discussions. Methods: We applied a semi-automatic dictionary-based mining approach to extract the QA-AT posts in SO. With the mined QA-AT posts, we identified the relationships between ATs and QAs. Results: Our approach allows us to mine QA-AT knowledge effectively with an F-measure of 0.865 and Performance of 82.2%. Using this mining approach, we are able to discover architectural synonyms of QAs and ATs used by designers, from which we discover how developers apply ATs to address quality requirements. Conclusions: We make two contributions in this work: first, we demonstrated a semi-automatic approach to mine ATs and QAs from SO posts; second, we identified little-known design relationships between QAs and ATs and grouped architectural design considerations to help architects make architecture tactics design decisions.

Submitted 17 May, 2021; originally announced May 2021.

Comments: Preprint accepted for publication in Journal of Systems and Software, 2021

arXiv:2102.00581 [pdf]

"Grip-that-there": An Investigation of Explicit and Implicit Task Allocation Techniques for Human-Robot Collaboration

Authors: Karthik Mahadevan, Maurício Sousa, Anthony Tang, Tovi Grossman

Abstract: In ad-hoc human-robot collaboration (HRC), humans and robots work on a task without pre-planning the robot's actions prior to execution; instead, task allocation occurs in real-time. However, prior research has largely focused on task allocations that are pre-planned - there has not been a comprehensive exploration or evaluation of techniques where task allocation is adjusted in real-time. Inspired by HCI research on territoriality and proxemics, we propose a design space of novel task allocation techniques including both explicit techniques, where the user maintains agency, and implicit techniques, where the efficiency of automation can be leveraged. The techniques were implemented and evaluated using a tabletop HRC simulation in VR. A 16-participant study, which presented variations of a collaborative block stacking task, showed that implicit techniques enable efficient task completion and task parallelization, and should be augmented with explicit mechanisms to provide users with fine-grained control.

Submitted 2 February, 2021; v1 submitted 31 January, 2021; originally announced February 2021.

Comments: To be published in Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems

ACM Class: H.5.2

arXiv:2012.08483 [pdf, other]

Amazon SageMaker Autopilot: a white box AutoML solution at scale

Authors: Piali Das, Valerio Perrone, Nikita Ivkin, Tanya Bansal, Zohar Karnin, Huibin Shen, Iaroslav Shcherbatyi, Yotam Elor, Wilton Wu, Aida Zolic, Thibaut Lienart, Alex Tang, Amr Ahmed, Jean Baptiste Faddoul, Rodolphe Jenatton, Fela Winkelmolen, Philip Gautier, Leo Dirac, Andre Perunicic, Miroslav Miladinovic, Giovanni Zappella, Cédric Archambeau, Matthias Seeger, Bhaskar Dutt, Laurence Rouesnel

Abstract: AutoML systems provide a black-box solution to machine learning problems by selecting the right way of processing features, choosing an algorithm and tuning the hyperparameters of the entire pipeline. Although these systems perform well on many datasets, there is still a non-negligible number of datasets for which the one-shot solution produced by each particular system would provide sub-par performance. In this paper, we present Amazon SageMaker Autopilot: a fully managed system providing an automated ML solution that can be modified when needed. Given a tabular dataset and the target column name, Autopilot identifies the problem type, analyzes the data and produces a diverse set of complete ML pipelines including feature preprocessing and ML algorithms, which are tuned to generate a leaderboard of candidate models. In the scenario where the performance is not satisfactory, a data scientist is able to view and edit the proposed ML pipelines in order to infuse their expertise and business knowledge without having to revert to a fully manual solution. This paper describes the different components of Autopilot, emphasizing the infrastructure choices that allow scalability, high quality models, editable ML pipelines, consumption of artifacts of offline meta-learning, and a convenient integration with the entire SageMaker suite allowing these trained models to be used in a production setting.

Submitted 16 December, 2020; v1 submitted 15 December, 2020; originally announced December 2020.

arXiv:2006.01353 [pdf, other]

doi 10.1145/3399715.3399921

Activity River: Visualizing Planned and Logged Personal Activities for Reflection

Authors: Bon Adriel Aseniero, Charles Perin, Wesley Willett, Anthony Tang, Sheelagh Carpendale

Abstract: We present Activity River, a personal visualization tool which enables individuals to plan, log, and reflect on their self-defined activities. We are interested in supporting this type of reflective practice as prior work has shown that reflection can help people plan and manage their time effectively. Hence, we designed Activity River based on five design goals (visualize historical and contextual data, facilitate comparison of goals and achievements, engage viewers with delightful visuals, support authorship, and enable flexible planning and logging) which we distilled from the Information Visualization and Human-Computer Interaction literature. To explore our approach's strengths and limitations, we conducted a qualitative study of Activity River using a role-playing method. Through this qualitative exploration, we illustrate how our participants envisioned using our visualization to perform dynamic and continuous reflection on their activities. We observed that they were able to assess their progress towards their plans and adapt to unforeseen circumstances using our tool.

Submitted 1 June, 2020; originally announced June 2020.

Comments: 9 pages, 6 figures. AVI '20, September 28-October 2, 2020, Salerno, Italy. 2020 Association for Computing Machinery

arXiv:2001.03021 [pdf]

TanGi: Tangible Proxies for Embodied Object Exploration and Manipulation in Virtual Reality

Authors: Martin Feick, Scott Bateman, Anthony Tang, André Miede, Nicolai Marquardt

Abstract: Exploring and manipulating complex virtual objects is challenging due to limitations of conventional controllers and free-hand interaction techniques. We present the TanGi toolkit which enables novices to rapidly build physical proxy objects using Composable Shape Primitives. TanGi also provides Manipulators allowing users to build objects including movable parts, making them suitable for rich object exploration and manipulation in VR. With a set of different use cases and applications we show the capabilities of the TanGi toolkit, and evaluate its use. In a study with 16 participants, we demonstrate that novices can quickly build physical proxy objects using the Composable Shape Primitives, and explore how different levels of object embodiment affect virtual object exploration. In a second study with 12 participants we evaluate TanGi's Manipulators, and investigate the effectiveness of embodied interaction. Findings from this study show that TanGi's proxies outperform traditional controllers, and were generally favored by participants.

Submitted 9 January, 2020; originally announced January 2020.

Comments: 10 pages, 11 figures

ACM Class: H.5.2

arXiv:1909.06026 [pdf]

doi 10.1021/acs.nanolett.9b04200

Magnetic domain wall based synaptic and activation function generator for neuromorphic accelerators

Authors: Saima A Siddiqui, Sumit Dutta, Astera Tang, Luqiao Liu, Caroline A Ross, Marc A Baldo

Abstract: Magnetic domain walls are information tokens in both logic and memory devices, and hold particular interest in applications such as neuromorphic accelerators that combine logic in memory. Here, we show that devices based on the electrical manipulation of magnetic domain walls are capable of implementing linear, as well as programmable nonlinear, functions. Unlike other approaches, domain-wall-based devices are ideal for application to both synaptic weight generators and thresholding in deep neural networks. Prototype micrometer-size devices operate with 8 ns current pulses and the energy consumption required for weight modulation is < 16 pJ. Both speed and energy consumption compare favorably to other synaptic nonvolatile devices, with the expected energy dissipation for scaled 20 nm devices close to that of biological neurons.

Submitted 7 September, 2019; originally announced September 2019.

Comments: 24 pages, 5 figures

arXiv:1901.09483 [pdf, other]

End-to-End Discriminative Deep Network for Liver Lesion Classification

Authors: Francisco Perdigon Romero, Andre Diler, Gabriel Bisson-Gregoire, Simon Turcotte, Real Lapointe, Franck Vandenbroucke-Menu, An Tang, Samuel Kadoury

Abstract: Colorectal liver metastasis is one of the most aggressive liver malignancies. While the definition of lesion type based on CT images determines the diagnosis and therapeutic strategy, the discrimination between cancerous and non-cancerous lesions is critical and requires highly skilled expertise, experience and time. In the present work we introduce an end-to-end deep learning approach to assist in the discrimination between liver metastases from colorectal cancer and benign cysts in abdominal CT images of the liver. Our approach incorporates the efficient feature extraction of InceptionV3 combined with residual connections and pre-trained weights from ImageNet. The architecture also includes fully connected classification layers to generate a probabilistic output of lesion type. We use an in-house clinical biobank with 230 liver lesions originating from 63 patients. With an accuracy of 0.96 and an F1-score of 0.92, the results obtained with the proposed approach surpass state-of-the-art methods. Our work provides the basis for incorporating machine learning tools in specialized radiology software to assist physicians in the early detection and treatment of liver lesions.

Submitted 27 January, 2019; originally announced January 2019.

arXiv:1901.04056 [pdf, other]

doi 10.1016/j.media.2022.102680

The Liver Tumor Segmentation Benchmark (LiTS)

Authors: Patrick Bilic, Patrick Christ, Hongwei Bran Li, Eugene Vorontsov, Avi Ben-Cohen, Georgios Kaissis, Adi Szeskin, Colin Jacobs, Gabriel Efrain Humpire Mamani, Gabriel Chartrand, Fabian Lohöfer, Julian Walter Holch, Wieland Sommer, Felix Hofmann, Alexandre Hostettler, Naama Lev-Cohain, Michal Drozdzal, Michal Marianne Amitai, Refael Vivantik, Jacob Sosna, Ivan Ezhov, Anjany Sekuboyina, Fernando Navarro, Florian Kofler, Johannes C. Paetzold, et al. (84 additional authors not shown)

Abstract: In this work, we report the set-up and results of the Liver Tumor Segmentation Benchmark (LiTS), which was organized in conjunction with the IEEE International Symposium on Biomedical Imaging (ISBI) 2017 and the International Conferences on Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2017 and 2018. The image dataset is diverse and contains primary and secondary tumors with varied sizes and appearances with various lesion-to-background levels (hyper-/hypo-dense), created in collaboration with seven hospitals and research institutions. Seventy-five submitted liver and liver tumor segmentation algorithms were trained on a set of 131 computed tomography (CT) volumes and were tested on 70 unseen test images acquired from different patients. We found that not a single algorithm performed best for both liver and liver tumors in the three events. The best liver segmentation algorithm achieved a Dice score of 0.963, whereas, for tumor segmentation, the best algorithms achieved Dice scores of 0.674 (ISBI 2017), 0.702 (MICCAI 2017), and 0.739 (MICCAI 2018). Retrospectively, we performed additional analysis on liver tumor detection and revealed that not all top-performing segmentation algorithms worked well for tumor detection. The best liver tumor detection method achieved a lesion-wise recall of 0.458 (ISBI 2017), 0.515 (MICCAI 2017), and 0.554 (MICCAI 2018), indicating the need for further research.
LiTS remains an active benchmark and resource for research, e.g., contributing the liver-related segmentation tasks in http://medicaldecathlon.com/. In addition, both data and online evaluation are accessible via www.lits-challenge.com.

Submitted 25 November, 2022; v1 submitted 13 January, 2019; originally announced January 2019.

Comments: Patrick Bilic, Patrick Christ, Hongwei Bran Li, and Eugene Vorontsov made equal contributions to this work. Published in Medical Image Analysis

Journal ref: Medical Image Analysis (2022) Pg. 102680

arXiv:1901.03684 [pdf, other]

doi 10.1109/ISBI.2019.8759410

Multi-Level Batch Normalization In Deep Networks For Invasive Ductal Carcinoma Cell Discrimination In Histopathology Images

Authors: Francisco Perdigon Romero, An Tang, Samuel Kadoury

Abstract: Breast cancer is the most diagnosed cancer and the most predominant cause of death in women worldwide. Imaging techniques such as breast cancer pathology help in the diagnosis and monitoring of the disease. However, identification of malignant cells can be challenging given the high heterogeneity in tissue absorption from staining agents. In this work, we present a novel approach for Invasive Ductal Carcinoma (IDC) cell discrimination in histopathology slides. We propose a model derived from the Inception architecture, with a multi-level batch normalization module between convolutional steps. This module was used as a base block for the feature extraction in a CNN architecture. We used the open IDC dataset, on which we obtained a balanced accuracy of 0.89 and an F1 score of 0.90, thus surpassing recent state-of-the-art classification algorithms tested on this public dataset.

Submitted 11 January, 2019; originally announced January 2019.

Comments: 4 pages, 5 figures

Showing 1–50 of 69 results for author: Tang, A