George Z. Lin’s Post

Navigate the AI landscape with me! 🤖🚀💼🌐 #AITechBiz AI/ML Leader at BrandGuard AI | MassChallenge | Wharton VentureLab.

A collaborative team from Peking University and Microsoft has proposed xRAG, a system that reinterprets document embeddings, traditionally used solely for retrieval, as features from a "retrieval modality" and integrates them directly into the language model's representation space. This integration removes the need to feed the textual counterparts of document embeddings to the model, achieving a significant compression rate.

The core innovation of xRAG is its modality-fusion methodology: document embeddings are incorporated directly into the language model's representation space, so their textual content never enters the prompt. This simplifies the retrieval-augmented generation pipeline and significantly reduces the computational resources required, cutting overall FLOPs by a factor of 3.53 compared to uncompressed models. Because both the retriever and the language model remain unchanged, xRAG preserves the plug-and-play nature of retrieval augmentation and allows offline-constructed document embeddings to be reused.

Experimental evaluations across six knowledge-intensive tasks show an average improvement of over 10%, and the approach adapts to various language model backbones. These results demonstrate xRAG's effectiveness at context compression and show that it can match uncompressed models on several datasets, pointing to new directions for retrieval-augmented generation from the perspective of multimodal fusion. Modality fusion and context compression could enable high-performance, generalized multimodal models on device and/or linearly scalable retrieval systems.

Arxiv: https://lnkd.in/eWEiy6rS
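To make the modality-fusion idea concrete, here is a minimal sketch, not the authors' implementation: a small trainable projector maps a frozen retriever's document embedding into the language model's token-embedding space, so the document is consumed as a single "soft token" instead of its full text. The projector architecture, dimensions, and names (XragProjector, retriever_dim, lm_hidden_dim) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class XragProjector(nn.Module):
    """Hypothetical two-layer MLP that maps a frozen retriever's document
    embedding into the language model's hidden space, yielding one 'soft
    token' that replaces the document's raw text in the prompt."""
    def __init__(self, retriever_dim: int, lm_hidden_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(retriever_dim, lm_hidden_dim),
            nn.GELU(),
            nn.Linear(lm_hidden_dim, lm_hidden_dim),
        )

    def forward(self, doc_embedding: torch.Tensor) -> torch.Tensor:
        # (batch, retriever_dim) -> (batch, 1, lm_hidden_dim): one soft token per document
        return self.net(doc_embedding).unsqueeze(1)


# Usage sketch with assumed sizes: prepend the projected document token
# to the prompt's token embeddings before feeding the frozen LM.
retriever_dim, lm_hidden_dim = 768, 4096
projector = XragProjector(retriever_dim, lm_hidden_dim)

doc_embedding = torch.randn(1, retriever_dim)       # from an offline-built index (frozen retriever)
prompt_embeds = torch.randn(1, 32, lm_hidden_dim)   # token embeddings of the user prompt (frozen LM)

doc_token = projector(doc_embedding)                # (1, 1, lm_hidden_dim)
inputs_embeds = torch.cat([doc_token, prompt_embeds], dim=1)
print(inputs_embeds.shape)  # torch.Size([1, 33, 4096]); pass as inputs_embeds to the frozen LM
```

Because only the projector would be trained in a setup like this, the retriever and the language model stay frozen, which is what keeps retrieval augmentation plug-and-play and lets precomputed document embeddings be reused.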
