📣 MAX 24.4 ⚡️ is here! Featuring: ⚛️ New MAX Quantization API - bringing state-of-the-art CPU performance to models built with MAX Graphs. 🍎 MAX for macOS! 🦙 INT4 implementations of Llama 3 and Llama 2! Plus, performance and standard library improvements for Mojo 🔥, including 200+ community contributions. Learn more in the Modular blog! https://lnkd.in/gzumMscT
Modular’s Post
More Relevant Posts
-
Int4 and Int6 are a great way to reach new levels of efficiency and latency for big LLMs, particularly on CPU. MAX makes it super easy to adopt these formats and move GenAI models to them, and gets all the overhead of other systems out of the way. This release is a big step that brings together the state-of-the-art tech in the MAX platform with the programmability of Mojo. It allows you to enable these custom datatypes for YOUR models too, instead of the typical point-solution demo for a fixed reference model or two. Also, yes: MAX for NVIDIA GPUs is coming later in summer!
Modular: MAX 24.4 - Introducing Quantization APIs and MAX on macOS
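To make the Int4 idea in the post above concrete, here is a minimal sketch of symmetric per-block 4-bit quantization in plain Python. It is not the MAX Quantization API and does not reproduce any specific MAX or Llama encoding; the function names and the block size of 32 are assumptions chosen for illustration.

```python
def quantize_int4(values, block_size=32):
    """Quantize floats to signed 4-bit integers in [-8, 7], one scale per block.

    Hypothetical helper for illustration; block_size=32 is an assumed default.
    """
    blocks = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        max_abs = max(abs(v) for v in block)
        # Map the largest magnitude in the block to 7; avoid dividing by zero.
        scale = max_abs / 7 if max_abs > 0 else 1.0
        quantized = [max(-8, min(7, round(v / scale))) for v in block]
        blocks.append((scale, quantized))
    return blocks


def dequantize_int4(blocks):
    """Reconstruct approximate floats from (scale, quantized values) blocks."""
    return [scale * q for scale, quantized in blocks for q in quantized]


weights = [0.82, -0.31, 0.05, 0.0, -0.77, 0.44, 0.12, -0.09]
packed = quantize_int4(weights, block_size=4)
restored = dequantize_int4(packed)
```

Each weight is stored as a 4-bit integer plus a shared per-block scale, so the reconstruction error for any value is at most half of its block's scale. Smaller blocks lower the error at the cost of storing more scales.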
-
Benchmarking the FrodoKEM C++ library implementation on CPU targets running the GNU/Linux kernel can now capture CPU cycles, using google-benchmark and libPFM 🥳 The following results were collected on a 12th Gen Intel i7 machine, with the library compiled using GCC-13.1 and the optimization flags `-O3 -march=native -flto`. 🔎 itzmeanjan/frodokem.git on GitHub
-
As a select NVIDIA Elite partner, we at MBX (now AHEAD Engineered Solutions) have got a jump on developing reference architectures with NVIDIA IGX Orin. Case in point: our Optio L100 and L150 platforms featuring the NVIDIA IGX Orin architecture. Each connects multiple high-speed sensors and streams for on-device analytics, and supports standard enterprise requirements such as bare-metal manageability and standard Linux distributions from leading Linux providers. Dive deeper into our expanded Optio series. ⤵️ https://hubs.ly/Q01ZR5qc0...
-
Based on the upstream v6.8 kernel, the 24.04 release of Real-time Ubuntu also includes optimised support for Raspberry Pi hardware to deliver enhanced performance and compatibility across a broad compute spectrum. With Ubuntu 24.04 LTS, users can confidently explore the possibilities of optimised real-time compute on Raspberry Pi 4 and 5, driving innovation and unlocking new opportunities in embedded. Read more about Ubuntu 24.04 LTS in our announcement: https://lnkd.in/d4qHVcuf #RealTimeUbuntu
-
Torching my eGPU to train Stable Diffusion LoRA (SD 1.5 and SDXL) with 20 images. Clearly 8 GB of VRAM will be challenging here, as the total process consumes > 9 GB of VRAM. The MPS backend in macOS just keeps failing to utilize the eGPU and falls back to the CPU. But in Ubuntu 24.04, ROCm and the HIP SDK work flawlessly.
-
Updated Post: Installing and Configuring NVIDIA GPU and CUDA drivers on Ubuntu 20.04: A Guide for Use of Docker-NVIDIA https://buff.ly/3wMoSEQ
-
Wrote an article about turning a ThinkPad X1 Carbon 6th Gen laptop into a programmable USB device by enabling the xDCI controller 😯 Now I can emulate USB devices from the laptop without any external hardware, including via Raw Gadget or even Facedancer 😁 The overall process included fiddling with Linux kernel drivers, xHCI, DWC3, ACPI, BIOS/UEFI, Boot Guard, TPM, NVRAM, PCH, PMC, PSF, IOSF, and P2SB, and making a custom USB cable 😱 https://lnkd.in/dXhHwfQB
🤫 Unlocking secret ThinkPad functionality for emulating USB devices
xairy.io
-
VICTORY!!! Successfully implemented support for the final POSIX clock ID type, CLOCK_THREAD_CPUTIME_ID, in the KallistiOS kernel for the Sega #Dreamcast! It required adding two new 64-bit variables to each thread: 1) timestamp from when the scheduler last activated the thread 2) running total CPU time which gets updated every time the scheduler swaps out the thread for another. Values for both are taken from the (undocumented) SH4 performance counters, which can provide a 5ns resolution timer which is updated every time a non-sleep instruction gets executed. The end result is that we can now create a thread-level CPU performance profiler allowing you to visualize what percentage of the total CPU time is spent in each individual thread! (For the record, the Win32 API function, GetThreadTimes, only supports 100ns resolution on Windows... while we just did 5ns resolution for the Dreamcast.)
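The two per-thread variables described above can be sketched as a toy scheduler model in Python. This is not KallistiOS code: the `Thread` and `Scheduler` classes, their field names, and the integer `now` tick values are all hypothetical, standing in for the real scheduler and the SH4 performance counter.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Thread:
    name: str
    activated_at: int = 0  # counter value when the scheduler last activated this thread
    cpu_time: int = 0      # running total of CPU time, in counter ticks


class Scheduler:
    """Toy model of per-thread CPU time accounting (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.current: Optional[Thread] = None

    def switch_to(self, thread: Thread, now: int) -> None:
        # Charge the outgoing thread for the time since it was activated.
        if self.current is not None:
            self.current.cpu_time += now - self.current.activated_at
        thread.activated_at = now
        self.current = thread

    def thread_cputime(self, thread: Thread, now: int) -> int:
        # Analogue of reading CLOCK_THREAD_CPUTIME_ID for one thread:
        # the stored total plus the in-progress slice if it is running now.
        total = thread.cpu_time
        if thread is self.current:
            total += now - thread.activated_at
        return total


sched = Scheduler()
render, audio = Thread("render"), Thread("audio")
sched.switch_to(render, now=0)
sched.switch_to(audio, now=30)
sched.switch_to(render, now=50)

render_ticks = sched.thread_cputime(render, now=60)  # 30 + 10 = 40
audio_ticks = sched.thread_cputime(audio, now=60)    # 20
render_share = render_ticks / (render_ticks + audio_ticks)
```

Dividing each thread's total by the sum across threads gives the per-thread percentage that a profiler like the one mentioned in the post would display.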
-
IT Administrator @ Qualitest | Linux, Network Administration, IT Administration, Cyber Security | CompTia A+ and Security+ In Progress
Super excited to try out playing Metal Gear Solid 4 @ 60fps on my newly installed Arch system (bare metal)! To get this running I had to: 1. Install NVIDIA dkms modules (pain in the butt, but thank God for nano) 2. Add some nvidia hooks to my pacman.conf and take out kms so nouveau isn’t loaded 3. Clone the RPCS3 repo and use CMake to compile it on my system. (Shout out to the awesome people managing the AUR) It’s really awesome having the power to do as you want with your system, and Arch lets you install ONLY what you need. I’ve never had a less bloated OS in my life! Next I will try my hand at building Linux from Scratch! #arch #archlinux #rpcs3 #linuxfromscratch #AUR
-
"Code reuse attacks reuse existing code snippets to bypass existing memory protections. However, the X86_64 Linux kernel image is located in the top 2G of the address space, and even with KASLR, it can only be relocated within the top 2G. This makes it easy for an attacker to guess the virtual address. By building the kernel as Position Independent Executables (PIE), it can be placed in any virtual address, thereby increasing the number of possible locations and making it harder for an attacker to guess the virtual address. This also provides flexibility to the kernel image's virtual address, allowing it to be placed in the low half of the address space. This presentation will demonstrate the implementation of the X86 kernel relocation and explain how to build the X86_64 Linux kernel as PIE and relocate it below the top 2G." https://lnkd.in/eUcXWTBn
X86/Pie: Make Kernel Image's Virtual Address Flexible - Wenlong Hou, Ant Group
https://www.youtube.com/
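The abstract's point about "the number of possible locations" can be illustrated with back-of-the-envelope arithmetic. The sketch below is an upper bound only, since it ignores the size of the kernel image itself. The 2 MiB alignment and the 2^47-byte lower half (as with 4-level paging) are assumptions that do not come from the talk, and `placement_slots` is a hypothetical helper.

```python
import math

MiB = 1 << 20
GiB = 1 << 30

ALIGN = 2 * MiB          # assumed kernel image alignment
TOP_REGION = 2 * GiB     # "top 2G" window named in the abstract
LOW_HALF = 1 << 47       # assumed size of the lower half of the address space


def placement_slots(region_bytes: int, align_bytes: int) -> int:
    """Upper bound on distinct aligned base addresses within a region."""
    return region_bytes // align_bytes


top_slots = placement_slots(TOP_REGION, ALIGN)
low_slots = placement_slots(LOW_HALF, ALIGN)

top_bits = math.log2(top_slots)   # bits an attacker must guess in the top 2G
low_bits = math.log2(low_slots)   # bits if the image may sit anywhere in the low half
```

Under these assumptions the top-2G window offers at most 1024 aligned positions (10 bits), while the lower half offers 2^26 positions (26 bits), which is the kind of gain a position-independent kernel image is meant to unlock.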
Lead Software Engineer | C# .NET | Java | ASP.NET| Spring Boot
Modular shipping at the speed of light. Love it, can’t wait to try it out.