https://lnkd.in/gcgHTFhg Matrix multiplications (MatMul) are the most computationally expensive operations in large language models (LLMs) using the Transformer architecture. As LLMs scale to larger sizes, the cost of MatMul grows significantly, increasing memory usage and latency during training and inference. Now, researchers at the University of California, Santa Cruz; Soochow University; and the University of California, Davis have developed a novel architecture that completely eliminates matrix multiplications from language models while maintaining strong performance at large scales.
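To get a feel for the numbers, here is a rough, back-of-the-envelope sketch (my own illustration, not from the article) of how many multiply-accumulate operations the MatMuls in a single decoder layer cost per token; the parameter names and sizes are generic placeholders, not values from the paper.

```python
# Rough multiply-accumulate counts for the matmuls in one decoder layer, per token.
# All names and sizes below are illustrative assumptions, not from the article.
def matmul_macs_per_token(d_model, d_ff, seq_len):
    projections = 4 * d_model * d_model    # Q, K, V and output projections
    attention = 2 * seq_len * d_model      # QK^T scores and scores @ V
    feed_forward = 2 * d_model * d_ff      # up- and down-projection
    return projections + attention + feed_forward

# Doubling the width roughly quadruples the projection and feed-forward cost:
print(matmul_macs_per_token(d_model=2048, d_ff=8192, seq_len=4096))   # ~67M MACs
print(matmul_macs_per_token(d_model=4096, d_ff=16384, seq_len=4096))  # ~235M MACs
```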
Charlie Lee, PhD’s Post
More Relevant Posts
-
Found an interesting tool to create Neural Network architecture diagrams with a click of a button rather than doing it manually. Check out NN-SVG. I think it will save a lot of time while creating files suitable for inclusion in academic papers or web pages. #neuralnetworks #ai #research #researchpaper #student #phd #mastersdegree #diagram #network #artificialintelligence #svg
-
A new language model architecture that removes the need for matrix multiplication. As of now, matrix-matrix multiplication is required in most neural networks, and its computational cost grows as models scale to larger embedding dimensions. Most GPUs are optimized for matrix multiplication through CUDA (Compute Unified Device Architecture) and linear algebra libraries like cuBLAS (https://lnkd.in/gmcMH-3H). This paper proposes a new language model architecture that eliminates matrix multiplication entirely, and LLMs based on it maintained strong performance even at billion-parameter scales. Here's the link to the full paper: https://lnkd.in/g9y323yE Hugging Face models: https://lnkd.in/gcnbeC_G I think this work is amazing and can be utilized at scale. Tagging some authors of this paper: Rui-Jie Zhu, Ethan Sifferman, Jason Eshraghian, Dustin R.
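For intuition, here is a minimal sketch of the core idea as I understand it from the paper (ternary-weight, BitLinear-style layers): when weights are constrained to {-1, 0, +1}, a dense layer needs only additions, subtractions and skips instead of multiplications. This NumPy version is purely illustrative and is not the authors' fused GPU/FPGA kernels.

```python
# Illustrative sketch of a matmul-free dense layer with ternary weights.
# Not the authors' implementation; an assumption-laden teaching example.
import numpy as np

def ternary_linear(x, w_ternary):
    """x: (d_in,), w_ternary: (d_out, d_in) with entries in {-1, 0, 1}."""
    out = np.zeros(w_ternary.shape[0])
    for i, row in enumerate(w_ternary):
        # Accumulate and negate; no multiplications are needed.
        out[i] = x[row == 1].sum() - x[row == -1].sum()
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=8)
w = rng.integers(-1, 2, size=(4, 8))             # random ternary weights
print(np.allclose(ternary_linear(x, w), w @ x))  # matches the ordinary matmul
```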
-
Just came across an incredible paper on Swin Transformers, a cutting-edge architecture transforming the landscape of image classification. 🤖 📸 👓 What's Swin Transformer? It introduces a hierarchical vision architecture using shifted windows, addressing the limitations of traditional transformers in computer vision tasks by capturing both local and global contextual information. 📰 Research Paper: For those interested in diving deeper, here's the link to the research paper: https://lnkd.in/eW62_8rk 🔑 Key Features: * Hierarchical structure for multi-scale information capture. * Shifted windows for enhanced computational efficiency. 🌎 Impact: Exciting times ahead as Swin Transformers push the boundaries of what's possible in computer vision. Their efficiency and performance are paving the way for innovative applications across various domains. 👉 Let's explore the future of computer vision together! Have you come across any other fascinating research in this space? Share your thoughts below! 👇 #ComputerVision #SwinTransformers #MachineLearning #AI
2103.14030.pdf
arxiv.org
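For readers who want a concrete picture, below is a minimal sketch (not the official implementation) of Swin-style window partitioning and the cyclic shift that gives shifted-window attention its name; it assumes a square feature map and a window size that divides the height and width evenly.

```python
# Minimal sketch of Swin-style window partitioning and cyclic shifting.
# Illustrative only; sizes and names are assumptions, not the paper's code.
import torch

def window_partition(x, window_size):
    """Split a (B, H, W, C) feature map into non-overlapping windows."""
    B, H, W, C = x.shape
    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
    # -> (num_windows * B, window_size, window_size, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window_size, window_size, C)

def shifted(x, window_size):
    """Cyclically shift the map so the next block sees new window boundaries."""
    shift = window_size // 2
    return torch.roll(x, shifts=(-shift, -shift), dims=(1, 2))

feat = torch.randn(1, 8, 8, 96)                       # toy 8x8 map, 96 channels
wins = window_partition(feat, 4)                      # 4 windows of 4x4 tokens
wins_shifted = window_partition(shifted(feat, 4), 4)  # same windows, shifted view
print(wins.shape, wins_shifted.shape)                 # torch.Size([4, 4, 4, 96]) twice
```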
-
LSTM Architecture: Long Short-Term Memory networks – usually just called “LSTMs” – are a special kind of RNN capable of learning long-term dependencies. LSTMs are explicitly designed to avoid the long-term dependency problem; remembering information for long periods of time is practically their default behavior, not something they struggle to learn. Like other RNNs, LSTMs have a chain-like structure, but the repeating module has a different structure. Instead of a single neural network layer, there are four, interacting in a very special way: the first three are sigmoid layers and the last is tanh. They maintain two states: the long-term memory (LTM) c(t-1), also known as the cell state, and the short-term memory (STM) h(t-1), also known as the hidden state. In the usual diagram, the cell state is the horizontal line running through the top and the hidden state is the horizontal line running through the bottom.
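As a concrete illustration of the description above, here is a minimal single-step LSTM cell in NumPy: three sigmoid gates, one tanh candidate layer, and a cell state c and hidden state h carried between time steps. It is a teaching sketch with made-up sizes, not an optimized implementation.

```python
# Minimal LSTM cell step: three sigmoid gates, one tanh layer,
# cell state c (long-term memory) and hidden state h (short-term memory).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """W, U, b hold stacked parameters for the f, i, o, g sub-layers."""
    z = W @ x_t + U @ h_prev + b                    # (4*hidden,) pre-activations
    f, i, o, g = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)    # the three sigmoid layers
    g = np.tanh(g)                                  # the tanh candidate layer
    c_t = f * c_prev + i * g                        # update the cell state
    h_t = o * np.tanh(c_t)                          # update the hidden state
    return h_t, c_t

hidden, inp = 4, 3
rng = np.random.default_rng(0)
W = rng.normal(size=(4 * hidden, inp))
U = rng.normal(size=(4 * hidden, hidden))
b = np.zeros(4 * hidden)
h, c = np.zeros(hidden), np.zeros(hidden)
h, c = lstm_step(rng.normal(size=inp), h, c, W, U, b)
print(h.shape, c.shape)  # (4,) (4,)
```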
-
Data Science Enthusiast | Expert in Transforming Data into Insights | Unique Blend of Analytical Rigor and Innovation.
My handcrafted Artificial Neural Network: I've meticulously designed an analog, handwritten rendition of a neural network. This DIY approach allowed me to deepen my understanding of the fundamentals while honing my skills in neural network architecture. Check out the link below to delve into the intricacies of this handwritten marvel! #ArtificialNeuralNetwork #Sequential #DIYAI #DeepLearning . . . 👉 https://lnkd.in/gWeFetKX
-
How do you choose the right architecture for a neural network?
https://howtoanswershub.com
-
📃Scientific paper: Constrained Deep Reinforcement Learning for Fronthaul Compression Optimization Abstract: In the Centralized Radio Access Network (C-RAN) architecture, functions can be placed in central or distributed locations. This architecture can offer higher capacity and cost savings but also puts strict requirements on the fronthaul (FH). Adaptive FH compression schemes that adapt the compression amount to varying FH traffic are promising approaches to deal with stringent FH requirements. In this work, we design such a compression scheme using a model-free, off-policy deep reinforcement learning algorithm which accounts for FH latency and packet loss constraints. Furthermore, this algorithm is designed for model transparency and interpretability, which is crucial for AI trustworthiness in performance-critical domains. We show that our algorithm can successfully choose an appropriate compression scheme while satisfying the constraints and exhibits a roughly 70% increase in FH utilization compared to a reference scheme. (Comment: conference, IEEE) Continued on ES/IODE ➡️ https://etcse.fr/iqLM ------- If you find this interesting, feel free to follow, comment and share. We need your help to enhance our visibility, so that our platform continues to serve you.
Constrained Deep Reinforcement Learning for Fronthaul Compression Optimization
ethicseido.com
-
In their paper, "The Topos of Transformer Networks," Villani and McBurney explore the transformer neural network's superiority in large language models using topos theory. They demonstrate that while common networks like CNNs and RNNs operate within a pretopos of piecewise-linear functions, transformers function within a topos completion, enabling higher-order reasoning. This analysis also intersects with architecture search and gradient descent, positioning transformers as advanced cybernetic agents in machine learning. https://lnkd.in/g_MBtNdm
The Topos of Transformer Networks
arxiv.org
-
Microsoft Student Learn Ambassador - BETA | SIH'23 Finalist | Machine Learning | Deep Learning | Languages Models | Gen AI | LLM | Azure | Mlops Enthusiast
EfficientNet is a convolutional neural network architecture and scaling method that uniformly scales all dimensions of depth, width and resolution using a compound coefficient. Unlike conventional practice, which scales these factors arbitrarily, the EfficientNet scaling method uniformly scales network width, depth and resolution with a set of fixed scaling coefficients. For example, if we want to use 2^N times more computational resources, we can simply increase the network depth by α^N, the width by β^N and the image size by γ^N, where α, β, γ are constant coefficients determined by a small grid search on the original small model. EfficientNet uses a compound coefficient Φ to uniformly scale network width, depth and resolution in a principled way. #efficientnet #cnn #model #deeplearning #convolution #cv #ai #ml #medium #blog
EfficientNet — Scaling Depth,Width,Resolution
link.medium.com
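As a rough illustration of the compound scaling rule, the snippet below scales depth, width and resolution together using the α = 1.2, β = 1.1, γ = 1.15 coefficients reported for EfficientNet-B0 (chosen so that α·β²·γ² ≈ 2); the base depth, width and resolution values here are illustrative placeholders, not the exact B0 configuration.

```python
# Illustrative EfficientNet-style compound scaling, not the official code.
# ALPHA/BETA/GAMMA are the coefficients reported in the paper; the base
# values below are placeholder assumptions for demonstration only.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def compound_scale(phi, base_depth=18, base_width=1.0, base_resolution=224):
    """Scale depth/width/resolution together for roughly 2**phi more compute."""
    depth = round(base_depth * ALPHA ** phi)              # number of layers
    width = base_width * BETA ** phi                       # channel multiplier
    resolution = int(round(base_resolution * GAMMA ** phi))  # input image size
    return depth, width, resolution

for phi in range(4):
    print(phi, compound_scale(phi))
```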
-
https://lnkd.in/gdpWXiVv - A constructive approach to explaining the transformer: starting from a simple convolutional neural network, the video steps through all of the changes that need to be made, along with the motivations for why each change is needed.
Transformer Neural Networks Derived from Scratch
https://www.youtube.com/
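For reference, here is a minimal, self-contained sketch of the building block such derivations typically arrive at: single-head scaled dot-product self-attention in NumPy. This is my own illustration with made-up sizes, not code from the video.

```python
# Minimal single-head scaled dot-product self-attention; illustrative only.
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    """x: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_head)."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])            # (seq_len, seq_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ v                                 # (seq_len, d_head)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 16))                           # 5 tokens, d_model=16
Wq, Wk, Wv = (rng.normal(size=(16, 8)) for _ in range(3))
print(self_attention(x, Wq, Wk, Wv).shape)             # (5, 8)
```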