Showing 1–50 of 1,343 results for author: Khan, S

  1. arXiv:2407.09379

    cs.CV

    FANet: Feature Amplification Network for Semantic Segmentation in Cluttered Background

    Authors: Muhammad Ali, Mamoona Javaid, Mubashir Noman, Mustansar Fiaz, Salman Khan

    Abstract: Existing deep learning approaches leave out semantic cues that are crucial for semantic segmentation in complex scenarios, including cluttered backgrounds and translucent objects. To handle these challenges, we propose a feature amplification network (FANet) as a backbone network that incorporates semantic information using a novel feature enhancement module at multiple stages. To achi…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Accepted at ICIP 2024

  2. arXiv:2407.07611

    cs.LG cs.CE

    Physics-Informed Geometric Operators to Support Surrogate, Dimension Reduction and Generative Models for Engineering Design

    Authors: Shahroz Khan, Zahid Masood, Muhammad Usama, Konstantinos Kostas, Panagiotis Kaklis, Wei Chen

    Abstract: In this work, we propose a set of physics-informed geometric operators (GOs) to enrich the geometric data provided for training surrogate/discriminative models, dimension reduction, and generative models, typically employed for performance prediction, dimension reduction, and creating data-driven parameterisations, respectively. However, as both the input and output streams of these models consist…

    Submitted 10 July, 2024; originally announced July 2024.

  3. arXiv:2407.07054

    cs.CR cs.ET cs.LG

    A Differentially Private Blockchain-Based Approach for Vertical Federated Learning

    Authors: Linh Tran, Sanjay Chari, Md. Saikat Islam Khan, Aaron Zachariah, Stacy Patterson, Oshani Seneviratne

    Abstract: We present the Differentially Private Blockchain-Based Vertical Federated Learning (DP-BBVFL) algorithm that provides verifiability and privacy guarantees for decentralized applications. DP-BBVFL uses a smart contract to aggregate the feature representations, i.e., the embeddings, from clients transparently. We apply local differential privacy to provide privacy for embeddings stored on a blockchain…

    Submitted 9 July, 2024; originally announced July 2024.

  4. arXiv:2407.06096

    cs.CV

    Muzzle-Based Cattle Identification System Using Artificial Intelligence (AI)

    Authors: Hasan Zohirul Islam, Safayet Khan, Sanjib Kumar Paul, Sheikh Imtiaz Rahi, Fahim Hossain Sifat, Md. Mahadi Hasan Sany, Md. Shahjahan Ali Sarker, Tareq Anam, Ismail Hossain Polas

    Abstract: The absence of tamper-proof cattle identification technology was a significant problem preventing insurance companies from providing livestock insurance. This lack of technology had devastating financial consequences for marginal farmers in Bangladesh, as they did not have the opportunity to claim compensation for unexpected events such as the accidental death of cattle. Using machine learning and…

    Submitted 8 July, 2024; originally announced July 2024.

  5. arXiv:2407.05770

    astro-ph.GA

    A global view on star formation: The GLOSTAR Galactic plane survey X. Galactic HII region catalog using radio recombination lines

    Authors: S. Khan, M. R. Rugel, A. Brunthaler, K. M. Menten, F. Wyrowski, J. S. Urquhart, Y. Gong, A. Y. Yang, H. Nguyen, R. Dokara, S. A. Dzib, S. -N. X. Medina, G. N. Ortiz-León, J. D. Pandian, H. Beuther, V. S. Veena, S. Neupane, A. Cheema, W. Reich, N. Roy

    Abstract: Studies of Galactic HII regions are of crucial importance for studying star formation and the evolution of the interstellar medium. Gaining an insight into their physical characteristics contributes to a more comprehensive understanding of these phenomena. The GLOSTAR project aims to provide a GLObal view on STAR formation in the Milky Way by performing an unbiased and sensitive survey. This is ac…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Accepted for publication in A&A

  6. arXiv:2407.05618

    nucl-ex hep-ex

    Improved limit on neutrinoless double beta decay of $^{100}$Mo from AMoRE-I

    Authors: A. Agrawal, V. V. Alenkov, P. Aryal, J. Beyer, B. Bhandari, R. S. Boiko, K. Boonin, O. Buzanov, C. R. Byeon, N. Chanthima, M. K. Cheoun, J. S. Choe, Seonho Choi, S. Choudhury, J. S. Chung, F. A. Danevich, M. Djamal, D. Drung, C. Enss, A. Fleischmann, A. M. Gangapshev, L. Gastaldo, Y. M. Gavrilyuk, A. M. Gezhaev, O. Gileva, et al. (83 additional authors not shown)

    Abstract: AMoRE searches for the signature of neutrinoless double beta decay of $^{100}$Mo with a 100 kg sample of enriched $^{100}$Mo. Scintillating molybdate crystals coupled with a metallic magnetic calorimeter operate at milli-Kelvin temperatures to measure the energy of electrons emitted in the decay. As a demonstration of the full-scale AMoRE, we conducted AMoRE-I, a pre-experiment with 18 molybdate c…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 7 pages, 4 figures

  7. arXiv:2407.04123

    astro-ph.GA

    A multi-wavelength study of Galactic H II regions with extended emission

    Authors: Jyotirmoy Dey, Jagadheep D. Pandian, Dharam V. Lal, Michael R. Rugel, Andreas Brunthaler, Karl M. Menten, Friedrich Wyrowski, Nirupam Roy, Sergio A. Dzib, Sac-Nicté X. Medina, Sarwar Khan, Rohit Dokara

    Abstract: H II regions are the signposts of massive ($M\geq\,8\,M_\odot$) star-forming sites in our Galaxy. It has been observed that the ionizing photon rate inferred from the radio continuum emission of H II regions is significantly lower ($\sim$ 90%) than that inferred from far-infrared fluxes measured by IRAS. This discrepancy in the ionizing photon rates may arise due to there being significant amounts…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: Accepted for publication in A&A. 22 pages, 9 figures, 6 tables

  8. arXiv:2407.03210

    cs.LG cs.AI cs.NE

    Combining AI Control Systems and Human Decision Support via Robustness and Criticality

    Authors: Walt Woods, Alexander Grushin, Simon Khan, Alvaro Velasquez

    Abstract: AI-enabled capabilities are reaching the requisite level of maturity to be deployed in the real world, yet do not always make correct or safe decisions. One way of addressing these concerns is to leverage AI control systems alongside and in support of human decisions, relying on the AI control system in safe situations while calling on a human co-decider for critical situations. We extend a method…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 19 pages, 12 figures

    MSC Class: 68T07 ACM Class: I.2.6

    Journal ref: Proc. SPIE 13058, Disruptive Technologies in Information Sciences VIII, 130580J (6 June 2024)

  9. arXiv:2407.01155

    cs.LG

    CPT: Consistent Proxy Tuning for Black-box Optimization

    Authors: Yuanyang He, Zitong Huang, Xinxing Xu, Rick Siow Mong Goh, Salman Khan, Wangmeng Zuo, Yong Liu, Chun-Mei Feng

    Abstract: Black-box tuning has attracted recent attention because the structure and inner parameters of advanced proprietary models are not accessible. Proxy-tuning provides a test-time output adjustment for tuning black-box language models. It applies the difference of the output logits before and after tuning a smaller white-box "proxy" model to improve the black-box model. However, this technique serv…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 10 pages, 2 figures plus supplementary materials

  10. Multi-agent Cooperative Games Using Belief Map Assisted Training

    Authors: Qinwei Huang, Chen Luo, Alex B. Wu, Simon Khan, Hai Li, Qinru Qiu

    Abstract: In a multi-agent system, agents share their local observations to gain global situational awareness for decision making and collaboration using a message passing system. When to send a message, how to encode a message, and how to leverage the received messages directly affect the effectiveness of the collaboration among agents. When training a multi-agent cooperative game using reinforcement learn…

    Submitted 27 June, 2024; originally announced June 2024.

    Journal ref: ECAI 2023. IOS Press, 2023: 1617-1624

  11. arXiv:2406.16460

    hep-ph

    Constraining dark matter from strong phase transitions in a $U(1)_{L_μ-L_τ}$ model: Implications for neutrino masses and muon $g-2$

    Authors: Sandhya Choubey, Sarif Khan, Marco Merchand, Sampsa Vihonen

    Abstract: In this paper, we study a non-minimal gauged $U(1)_{L_μ-L_τ}$ model, where we add two complex singlet scalars, three right-handed Majorana neutrinos (RHN), and a vector-like dark fermion to the Standard Model (SM), all non-trivially charged under the extra gauge symmetry. The model offers an easy resolution to the muon $(g-2)$ anomaly, which fixes the scale of spontaneous symmetry breaking. Furthe…

    Submitted 24 June, 2024; originally announced June 2024.

  12. arXiv:2406.15831

    cs.CV

    Shape2.5D: A Dataset of Texture-less Surfaces for Depth and Normals Estimation

    Authors: Muhammad Saif Ullah Khan, Muhammad Zeshan Afzal, Didier Stricker

    Abstract: Reconstructing texture-less surfaces poses unique challenges in computer vision, primarily due to the lack of specialized datasets that cater to the nuanced needs of depth and normals estimation in the absence of textural information. We introduce "Shape2.5D," a novel, large-scale dataset designed to address this gap. Comprising 364k frames spanning 2635 3D models and 48 unique objects, our datase…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: This dataset paper was originally written in 2022

  13. arXiv:2406.15556

    cs.CV

    Open-Vocabulary Temporal Action Localization using Multimodal Guidance

    Authors: Akshita Gupta, Aditya Arora, Sanath Narayan, Salman Khan, Fahad Shahbaz Khan, Graham W. Taylor

    Abstract: Open-Vocabulary Temporal Action Localization (OVTAL) enables a model to recognize any desired action category in videos without the need to explicitly curate training data for all categories. However, this flexibility poses significant challenges, as the model must recognize not only the action categories seen during training but also novel categories specified at inference. Unlike standard tempor…

    Submitted 21 June, 2024; originally announced June 2024.

  14. arXiv:2406.14830

    cs.CV

    CLIP-Decoder : ZeroShot Multilabel Classification using Multimodal CLIP Aligned Representation

    Authors: Muhammad Ali, Salman Khan

    Abstract: Multi-label classification is an essential task utilized in a wide variety of real-world applications. Multi-label zero-shot learning is a method for classifying images into multiple unseen categories for which no training data is available, while in general zero-shot situations, the test set may include observed classes. The CLIP-Decoder is a novel method based on the state-of-the-art ML-Decoder…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Accepted at ICCVW-VLAR

  15. arXiv:2406.14370

    cs.CV

    Enhanced Bank Check Security: Introducing a Novel Dataset and Transformer-Based Approach for Detection and Verification

    Authors: Muhammad Saif Ullah Khan, Tahira Shehzadi, Rabeya Noor, Didier Stricker, Muhammad Zeshan Afzal

    Abstract: Automated signature verification on bank checks is critical for fraud prevention and ensuring transaction authenticity. This task is challenging due to the coexistence of signatures with other textual and graphical elements on real-world documents. Verification systems must first detect the signature and then validate its authenticity, a dual challenge often overlooked by current datasets and meth…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Accepted for publication in 16th IAPR International Workshop on Document Analysis Systems 2024

  16. arXiv:2406.14214

    cs.AI

    REVEAL-IT: REinforcement learning with Visibility of Evolving Agent poLicy for InTerpretability

    Authors: Shuang Ao, Simon Khan, Haris Aziz, Flora D. Salim

    Abstract: Understanding the agent's learning process, particularly the factors that contribute to its success or failure post-training, is crucial for comprehending the rationale behind the agent's decision-making process. Prior methods clarify the learning process by creating a structural causal model (SCM) or visually representing the distribution of value functions. Nevertheless, these approaches have co…

    Submitted 27 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: arXiv admin note: text overlap with arXiv:2307.01452 by other authors

  17. arXiv:2406.13439

    cs.CL

    Finding Blind Spots in Evaluator LLMs with Interpretable Checklists

    Authors: Sumanth Doddapaneni, Mohammed Safi Ur Rahman Khan, Sshubam Verma, Mitesh M. Khapra

    Abstract: Large Language Models (LLMs) are increasingly relied upon to evaluate text outputs of other LLMs, thereby influencing leaderboards and development decisions. However, concerns persist over the accuracy of these assessments and the potential for misleading conclusions. In this work, we investigate the effectiveness of LLMs as evaluators for text generation tasks. We propose FBI, a novel framework d…

    Submitted 19 June, 2024; originally announced June 2024.

  18. arXiv:2406.13302

    cs.CV

    Situational Instructions Database: Task Guidance in Dynamic Environments

    Authors: Muhammad Saif Ullah Khan, Sankalp Sinha, Didier Stricker, Muhammad Zeshan Afzal

    Abstract: The Situational Instructions Database (SID) addresses the need for enhanced situational awareness in artificial intelligence (AI) systems operating in dynamic environments. By integrating detailed scene graphs with dynamically generated, task-specific instructions, SID provides a novel dataset that allows AI systems to perform complex, real-world tasks with improved context sensitivity and operati…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 9 pages, 6 figures

  19. arXiv:2406.11704

    cs.CL cs.AI cs.LG

    Nemotron-4 340B Technical Report

    Authors: Nvidia, Bo Adler, Niket Agarwal, Ashwath Aithal, Dong H. Anh, Pallab Bhattacharya, Annika Brundyn, Jared Casper, Bryan Catanzaro, Sharon Clay, Jonathan Cohen, Sirshak Das, Ayush Dattagupta, Olivier Delalleau, Leon Derczynski, Yi Dong, Daniel Egert, Ellie Evans, Aleksander Ficek, Denys Fridman, Shaona Ghosh, Boris Ginsburg, Igor Gitman, Tomasz Grzegorzek, et al. (58 additional authors not shown)

    Abstract: We release the Nemotron-4 340B model family, including Nemotron-4-340B-Base, Nemotron-4-340B-Instruct, and Nemotron-4-340B-Reward. Our models are open access under the NVIDIA Open Model License Agreement, a permissive model license that allows distribution, modification, and use of the models and their outputs. These models perform competitively with open access models on a wide range of evaluation be…

    Submitted 17 June, 2024; originally announced June 2024.

  20. arXiv:2406.10326

    cs.CV

    VANE-Bench: Video Anomaly Evaluation Benchmark for Conversational LMMs

    Authors: Rohit Bharadwaj, Hanan Gani, Muzammal Naseer, Fahad Shahbaz Khan, Salman Khan

    Abstract: The recent developments in Large Multi-modal Video Models (Video-LMMs) have significantly enhanced our ability to interpret and analyze video data. Despite their impressive capabilities, current Video-LMMs have not been evaluated for anomaly detection tasks, which is critical to their deployment in practical scenarios, e.g., towards identifying deepfakes, manipulated video content, traffic accident…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Data: https://huggingface.co/datasets/rohit901/VANE-Bench

  21. arXiv:2406.09698

    physics.ins-det hep-ex

    Projected background and sensitivity of AMoRE-II

    Authors: A. Agrawal, V. V. Alenkov, P. Aryal, J. Beyer, B. Bhandari, R. S. Boiko, K. Boonin, O. Buzanov, C. R. Byeon, N. Chanthima, M. K. Cheoun, J. S. Choe, Seonho Choi, S. Choudhury, J. S. Chung, F. A. Danevich, M. Djamal, D. Drung, C. Enss, A. Fleischmann, A. M. Gangapshev, L. Gastaldo, Y. M. Gavrilyuk, A. M. Gezhaev, O. Gileva, et al. (81 additional authors not shown)

    Abstract: AMoRE-II aims to search for neutrinoless double beta decay with an array of 423 Li$_2$$^{100}$MoO$_4$ crystals operating in the cryogenic system as the main phase of the Advanced Molybdenum-based Rare process Experiment (AMoRE). AMoRE has been planned to operate in three phases: AMoRE-pilot, AMoRE-I, and AMoRE-II. AMoRE-II is currently being installed at the Yemi Underground Laboratory, located ap…

    Submitted 13 June, 2024; originally announced June 2024.

  22. arXiv:2406.09418

    cs.CV

    VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding

    Authors: Muhammad Maaz, Hanoona Rasheed, Salman Khan, Fahad Khan

    Abstract: Building on the advances of language models, Large Multimodal Models (LMMs) have contributed significant improvements in video understanding. While the current video LMMs utilize advanced Large Language Models (LLMs), they rely on either image or video encoders to process visual inputs, each of which has its own limitations. Image encoders excel at capturing rich spatial details from frame sequenc…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Technical Report

  23. arXiv:2406.09407

    cs.CV

    Towards Evaluating the Robustness of Visual State Space Models

    Authors: Hashmat Shadab Malik, Fahad Shamshad, Muzammal Naseer, Karthik Nandakumar, Fahad Shahbaz Khan, Salman Khan

    Abstract: Vision State Space Models (VSSMs), a novel architecture that combines the strengths of recurrent neural networks and latent variable models, have demonstrated remarkable performance in visual perception tasks by efficiently capturing long-range dependencies and modeling complex visual dynamics. However, their robustness under natural and adversarial perturbations remains a critical concern. In thi…

    Submitted 13 June, 2024; originally announced June 2024.

  24. arXiv:2406.08714

    eess.SP

    Real-time Digital RF Emulation -- II: A Near Memory Custom Accelerator

    Authors: Mandovi Mukherjee, Xiangyu Mao, Nael Rahman, Coleman DeLude, Joe Driscoll, Sudarshan Sharma, Payman Behnam, Uday Kamal, Jongseok Woo, Daehyun Kim, Sharjeel Khan, Jianming Tong, Jamin Seo, Prachi Sinha, Madhavan Swaminathan, Tushar Krishna, Santosh Pande, Justin Romberg, Saibal Mukhopadhyay

    Abstract: A near memory hardware accelerator, based on a novel direct path computational model, for real-time emulation of radio frequency systems is demonstrated. Our evaluation of hardware performance uses both application-specific integrated circuit (ASIC) and field programmable gate array (FPGA) methodologies: 1) The ASIC testchip implementation, using TSMC 28nm CMOS, leverages distributed autonomous…

    Submitted 12 June, 2024; originally announced June 2024.

  25. arXiv:2406.08710

    eess.SP

    Real-time Digital RF Emulation -- I: The Direct Path Computational Model

    Authors: Coleman DeLude, Joe Driscoll, Mandovi Mukherjee, Nael Rahman, Uday Kamal, Xiangyu Mao, Sharjeel Khan, Hariharan Sivaraman, Eric Huang, Jeffrey McHarg, Madhavan Swaminathan, Santosh Pande, Saibal Mukhopadhyay, Justin Romberg

    Abstract: In this paper we consider the problem of developing a computational model for emulating an RF channel. The motivation for this is that an accurate and scalable emulator has the potential to minimize the need for field testing, which is expensive, slow, and difficult to replicate. Traditionally, emulators are built using a tapped delay line model where long filters modeling the physical interaction…

    Submitted 12 June, 2024; originally announced June 2024.

  26. arXiv:2406.08486

    eess.IV cs.CV

    On Evaluating Adversarial Robustness of Volumetric Medical Segmentation Models

    Authors: Hashmat Shadab Malik, Numan Saeed, Asif Hanif, Muzammal Naseer, Mohammad Yaqub, Salman Khan, Fahad Shahbaz Khan

    Abstract: Volumetric medical segmentation models have achieved significant success on organ and tumor-based segmentation tasks in recent years. However, their vulnerability to adversarial attacks remains largely unexplored, raising serious concerns regarding the real-world deployment of tools employing such models in the healthcare sector. This underscores the importance of investigating the robustness of e…

    Submitted 12 June, 2024; originally announced June 2024.

  27. arXiv:2406.07818

    physics.app-ph cond-mat.mtrl-sci physics.ins-det physics.optics

    Single MoS2-flake as a high TCR non-cryogenic bolometer

    Authors: Saba M. Khan, Jyoti Saini, Anirban Kundu, Renu Rani, Kiran S. Hazra

    Abstract: The temperature coefficient of resistance (TCR) of a bolometer can be tuned by modifying the thermal conductance of the absorbing material, since bolometers sense radiation via the temperature change in the absorber. However, the thermal conductance of the absorber can be reduced by engineering the appropriate thermal isolation, which can be an ultimate solution towards making a highly sensitive thermal det…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 22 pages, 9 figures

  28. arXiv:2406.07710

    cs.CV

    Vehicle Speed Detection System Utilizing YOLOv8: Enhancing Road Safety and Traffic Management for Metropolitan Areas

    Authors: SM Shaqib, Alaya Parvin Alo, Shahriar Sultan Ramit, Afraz Ul Haque Rupak, Sadman Sadik Khan, Mr. Md. Sadekur Rahman

    Abstract: Vehicle speed detection is essential for ensuring traffic safety through a reduction in fatalities and accidents. Reckless driving practices are discouraged by the enforcement of speed restrictions, which is made possible by accurate monitoring of vehicle speeds. Road accidents remain one of the leading causes of death in Bangladesh. The Bangladesh Passenger Welfare Association stated in…

    Submitted 11 June, 2024; originally announced June 2024.

  29. arXiv:2406.04844

    cs.CV

    Multi-Granularity Language-Guided Multi-Object Tracking

    Authors: Yuhao Li, Muzammal Naseer, Jiale Cao, Yu Zhu, Jinqiu Sun, Yanning Zhang, Fahad Shahbaz Khan

    Abstract: Most existing multi-object tracking methods typically learn visual tracking features via maximizing dissimilarities of different instances and minimizing similarities of the same instance. While such a feature learning scheme achieves promising performance, learning discriminative features solely based on visual information is challenging, especially in case of environmental interference such as o…

    Submitted 7 June, 2024; originally announced June 2024.

  30. arXiv:2406.04413

    cs.CV cs.AI

    Efficient 3D-Aware Facial Image Editing via Attribute-Specific Prompt Learning

    Authors: Amandeep Kumar, Muhammad Awais, Sanath Narayan, Hisham Cholakkal, Salman Khan, Rao Muhammad Anwer

    Abstract: Drawing upon StyleGAN's expressivity and disentangled latent space, existing 2D approaches employ textual prompting to edit facial images with different attributes. In contrast, 3D-aware approaches that generate faces at different target poses require attribute-specific classifiers, learning separate model weights for each attribute, and are not scalable for novel attributes. In this work, we prop…

    Submitted 6 June, 2024; originally announced June 2024.

  31. arXiv:2406.03922

    cs.DS

    Engineering Semi-streaming DFS algorithms

    Authors: Kancharla Nikhilesh Bhagavan, Macharla Sri Vardhan, Madamanchi Ashok Chowdary, Shahbaz Khan

    Abstract: Depth first search is a fundamental graph problem having a wide range of applications. For a graph $G=(V,E)$ having $n$ vertices and $m$ edges, the DFS tree can be computed in $O(m+n)$ time using $O(m)$ space where $m=O(n^2)$. In the streaming environment, most graph problems are studied in the semi-streaming model where several passes (preferably one) are allowed over the input, allowing $O(nk)$ local…

    Submitted 7 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

  32. arXiv:2406.02548

    cs.CV

    Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation

    Authors: Mohamed El Amine Boudjoghra, Angela Dai, Jean Lahoud, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Fahad Shahbaz Khan

    Abstract: Recent works on open-vocabulary 3D instance segmentation show strong promise, but at the cost of slow inference speed and high computation requirements. This high computation cost is typically due to their heavy reliance on 3D clip features, which require computationally expensive 2D foundation models like Segment Anything (SAM) and CLIP for multi-view aggregation into 3D. As a consequence, this h…

    Submitted 20 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

  33. arXiv:2406.02466

    gr-qc astro-ph.HE

    What no one has seen before: gravitational waveforms from warp drive collapse

    Authors: Katy Clough, Tim Dietrich, Sebastian Khan

    Abstract: Despite originating in science fiction, warp drives have a concrete description in general relativity, with Alcubierre first proposing a spacetime metric that supported faster-than-light travel. Whilst there are numerous practical barriers to their implementation in real life, including a requirement for negative energy, computationally, one can simulate their evolution in time given an equation o…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 12 pages, 6 figures, plus appendix. Comments welcome!

  34. arXiv:2406.00843

    quant-ph cs.LG

    Diffusion-Inspired Quantum Noise Mitigation in Parameterized Quantum Circuits

    Authors: Hoang-Quan Nguyen, Xuan Bac Nguyen, Samuel Yen-Chi Chen, Hugh Churchill, Nicholas Borys, Samee U. Khan, Khoa Luu

    Abstract: Parameterized Quantum Circuits (PQCs) have been acknowledged as a leading strategy to utilize near-term quantum advantages in multiple problems, including machine learning and combinatorial optimization. When applied to specific tasks, the parameters in the quantum circuits are trained to minimize the target function. Although there have been comprehensive studies to improve the performance of the…

    Submitted 2 June, 2024; originally announced June 2024.

  35. arXiv:2406.00667

    eess.IV cs.AI cs.CL cs.CV cs.LG

    An Early Investigation into the Utility of Multimodal Large Language Models in Medical Imaging

    Authors: Sulaiman Khan, Md. Rafiul Biswas, Alina Murad, Hazrat Ali, Zubair Shah

    Abstract: Recent developments in multimodal large language models (MLLMs) have spurred significant interest in their potential applications across various medical imaging domains. On the one hand, there is a temptation to use these generative models to synthesize realistic-looking medical image data, while on the other hand, the ability to identify synthetic image data in a pool of data is also significantl…

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: Accepted in Fifth IEEE Workshop on Artificial Intelligence for HealthCare, IEEE 25th International Conference on Information Reuse and Integration for Data Science

  36. arXiv:2406.00449

    eess.IV cs.CV

    Dual Hyperspectral Mamba for Efficient Spectral Compressive Imaging

    Authors: Jiahua Dong, Hui Yin, Hongliu Li, Wenbo Li, Yulun Zhang, Salman Khan, Fahad Shahbaz Khan

    Abstract: Deep unfolding methods have made impressive progress in restoring 3D hyperspectral images (HSIs) from 2D measurements through convolutional neural networks or Transformers in spectral compressive imaging. However, they cannot efficiently capture long-range dependencies using global receptive fields, which significantly limits their performance in HSI reconstruction. Moreover, these methods may suffe…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: 13 pages, 6 figures

  37. arXiv:2406.00086

    hep-ph gr-qc hep-th

    Leptogenesis in exponential $f(R)$ gravity model

    Authors: Suhail Khan, Ajay Bassi, Rathin Adhikari

    Abstract: We show that gravitational leptogenesis with dynamical $CPT$ breaking in an expanding universe can be reconciled with exponential $f(R)$ gravity models with the axion as cold dark matter. For $L$-violating interactions, we consider both a non-supersymmetric model with heavy right-handed neutrino decay and a supersymmetric model with sneutrino decay. For both cases, we have shown that the required bary…

    Submitted 31 May, 2024; originally announced June 2024.

    Comments: 6 pages, 3 figures

  38. arXiv:2405.20363

    cs.CV

    LLMGeo: Benchmarking Large Language Models on Image Geolocation In-the-wild

    Authors: Zhiqiang Wang, Dejia Xu, Rana Muhammad Shahroz Khan, Yanbin Lin, Zhiwen Fan, Xingquan Zhu

    Abstract: Image geolocation is a critical task in various image-understanding applications. However, existing methods often fail when analyzing challenging, in-the-wild images. Inspired by the exceptional background knowledge of multimodal language models, we systematically evaluate their geolocation capabilities using a novel image dataset and a comprehensive evaluation framework. We first collect images f…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 7 pages, 3 figures, 5 tables, CVPR 2024 Workshop on Computer Vision in the Wild

  39. arXiv:2405.20084

    cs.CV

    Estimating Human Poses Across Datasets: A Unified Skeleton and Multi-Teacher Distillation Approach

    Authors: Muhammad Saif Ullah Khan, Dhavalkumar Limbachiya, Didier Stricker, Muhammad Zeshan Afzal

    Abstract: Human pose estimation is a key task in computer vision with various applications such as activity recognition and interactive systems. However, the lack of consistency in the annotated skeletons across different datasets poses challenges in developing universally applicable models. To address this challenge, we propose a novel approach integrating multi-teacher knowledge distillation with a unifie…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 15 pages (with references)

  40. arXiv:2405.19725

    quant-ph cs.CV

    Quantum Visual Feature Encoding Revisited

    Authors: Xuan-Bac Nguyen, Hoang-Quan Nguyen, Hugh Churchill, Samee U. Khan, Khoa Luu

    Abstract: Although quantum machine learning was introduced some time ago, its applications in computer vision are still limited. This paper, therefore, revisits the quantum visual encoding strategies, the initial step in quantum machine learning. Investigating the root cause, we uncover that the existing quantum encoding design fails to ensure information preservation of the visual features after the enc…

    Submitted 30 May, 2024; originally announced May 2024.

  41. arXiv:2405.19722

    cs.CV

    QClusformer: A Quantum Transformer-based Framework for Unsupervised Visual Clustering

    Authors: Xuan-Bac Nguyen, Hoang-Quan Nguyen, Samuel Yen-Chi Chen, Samee U. Khan, Hugh Churchill, Khoa Luu

    Abstract: Unsupervised vision clustering, a cornerstone in computer vision, has been studied for decades, yielding significant outcomes across numerous vision tasks. However, these algorithms involve substantial computational demands when confronted with vast amounts of unlabeled data. Conversely, Quantum computing holds promise in expediting unsupervised algorithms when handling large-scale databases. In t…

    Submitted 30 May, 2024; originally announced May 2024.

  42. arXiv:2405.18808  [pdf, other]

    cs.CV

    BRACTIVE: A Brain Activation Approach to Human Visual Brain Learning

    Authors: Xuan-Bac Nguyen, Hojin Jang, Xin Li, Samee U. Khan, Pawan Sinha, Khoa Luu

    Abstract: The human brain is a highly efficient processing unit, and understanding how it works can inspire new algorithms and architectures in machine learning. In this work, we introduce a novel framework named Brain Activation Network (BRACTIVE), a transformer-based approach to studying the human visual brain. The main objective of BRACTIVE is to align the visual features of subjects with corresponding b…

    Submitted 29 May, 2024; originally announced May 2024.

  43. arXiv:2405.18304  [pdf, other]

    cs.CV

    Multi-modal Generation via Cross-Modal In-Context Learning

    Authors: Amandeep Kumar, Muzammal Naseer, Sanath Narayan, Rao Muhammad Anwer, Salman Khan, Hisham Cholakkal

    Abstract: In this work, we study the problem of generating novel images from complex multimodal prompt sequences. While existing methods achieve promising results for text-to-image generation, they often struggle to capture fine-grained details from lengthy prompts and maintain contextual coherence within prompt sequences. Moreover, they often result in misaligned image generation for prompt sequences featu…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Technical Report

  44. arXiv:2405.13278  [pdf, other]

    cs.CV physics.med-ph

    Single color virtual H&E staining with In-and-Out Net

    Authors: Mengkun Chen, Yen-Tung Liu, Fadeel Sher Khan, Matthew C. Fox, Jason S. Reichenberg, Fabiana C. P. S. Lopes, Katherine R. Sebastian, Mia K. Markey, James W. Tunnell

    Abstract: Virtual staining streamlines traditional staining procedures by digitally generating stained images from unstained or differently stained images. While conventional staining methods involve time-consuming chemical processes, virtual staining offers an efficient and low infrastructure alternative. Leveraging microscopy-based techniques, such as confocal microscopy, researchers can expedite tissue a…

    Submitted 21 May, 2024; originally announced May 2024.

  45. arXiv:2405.12986  [pdf]

    eess.IV cs.AI cs.CV

    A Novel Feature Map Enhancement Technique Integrating Residual CNN and Transformer for Alzheimer Diseases Diagnosis

    Authors: Saddam Hussain Khan

    Abstract: Alzheimer diseases (ADs) involves cognitive decline and abnormal brain protein accumulation, necessitating timely diagnosis for effective treatment. Therefore, CAD systems leveraging deep learning advancements have demonstrated success in AD detection but pose computational intricacies and the dataset minor contrast, structural, and texture variations. In this regard, a novel hybrid FME-Residual-H…

    Submitted 25 May, 2024; v1 submitted 30 March, 2024; originally announced May 2024.

    Comments: 28 Pages, 11 Figures, 3 Tables

  46. arXiv:2405.12217  [pdf, other]

    cs.CV cs.AI cs.LG

    Adapting Large Multimodal Models to Distribution Shifts: The Role of In-Context Learning

    Authors: Guanglin Zhou, Zhongyi Han, Shiming Chen, Biwei Huang, Liming Zhu, Salman Khan, Xin Gao, Lina Yao

    Abstract: Recent studies indicate that large multimodal models (LMMs) are highly robust against natural distribution shifts, often surpassing previous baselines. Despite this, domain-specific adaptation is still necessary, particularly in specialized areas like healthcare. Due to the impracticality of fine-tuning LMMs given their vast parameter space, this work investigates in-context learning (ICL) as an e…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: 17 pages, 7 figures, 7 tables

  47. arXiv:2405.09361  [pdf, other]

    hep-th cond-mat.str-el quant-ph

    Quantum operations for Kramers-Wannier duality

    Authors: Maaz Khan, Syed Anausha Bin Zakir Khan, Arif Mohd

    Abstract: We study the Kramers-Wannier duality for the transverse-field Ising lattice on a ring. A careful consideration of the ring boundary conditions shows that the duality has to be implemented with a proper treatment of different charge sectors of both the twisted and untwisted Ising and the dual-Ising Hilbert spaces. We construct a superoperator that explicitly maps the Ising operators to the dual-Isi…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: 7 pages, 1 table, 4 figures

  48. arXiv:2405.07979  [pdf, other]

    stat.ME math.ST

    Low-order outcomes and clustered designs: combining design and analysis for causal inference under network interference

    Authors: Matthew Eichhorn, Samir Khan, Johan Ugander, Christina Lee Yu

    Abstract: Variance reduction for causal inference in the presence of network interference is often achieved through either outcome modeling, which is typically analyzed under unit-randomized Bernoulli designs, or clustered experimental designs, which are typically analyzed without strong parametric assumptions. In this work, we study the intersection of these two approaches and consider the problem of estim…

    Submitted 11 July, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

  49. arXiv:2405.07753  [pdf]

    cond-mat.mtrl-sci physics.app-ph

    Dynamic FMR and magneto-optical response of hydrogenated FCC phase Fe25Pd75 thin films and micro patterned devices

    Authors: Shahbaz Khan, Satyajit Sarkar, Nicolas B. Lawler, Ali Akbar, Muhammad Sabieh Anwar, Mariusz Martyniuk, K. Swaminathan Iyer, Mikhail Kostylev

    Abstract: In this work, we investigate the effects of H2 on the physical properties of Fe25Pd75. Broadband ferromagnetic resonance (FMR) spectroscopy revealed a significant FMR peak shift induced by H2 absorption for the FCC phased Fe25Pd75. The peak shifted towards higher applied fields, which is contrary to what was previously observed for CoPd alloys. Additionally, we conducted structural and magneto-opt…

    Submitted 13 May, 2024; originally announced May 2024.

  50. arXiv:2405.03690  [pdf, other]

    cs.CV

    How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs

    Authors: Muhammad Uzair Khattak, Muhammad Ferjad Naeem, Jameel Hassan, Muzammal Naseer, Federico Tombari, Fahad Shahbaz Khan, Salman Khan

    Abstract: Recent advancements in Large Language Models (LLMs) have led to the development of Video Large Multi-modal Models (Video-LMMs) that can handle a wide range of video understanding tasks. These models have the potential to be deployed in real-world applications such as robotics, AI assistants, medical surgery, and autonomous vehicles. The widespread adoption of Video-LMMs in our daily lives undersco…

    Submitted 8 May, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

    Comments: Technical report