Fast dataset format and loader
A paper list on multimodal and large language models, used only to record papers I read on the daily arXiv for personal needs.
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal models, and Speech AI (Automatic Speech Recognition and Text-to-Speech).
React component library for crafting user-friendly and engaging conversational experiences
Build real-time multimodal AI applications 🤖🎙️📹
Notes for software engineers getting up to speed on new AI developments. Serves as a datastore for https://latent.space writing and product brainstorming, with cleaned-up canonical references under the /Resources folder.
Employee Productivity GenAI Assistant Example is a code sample and architecture pattern designed to improve the efficiency of writing tasks using AWS serverless technologies and Amazon Bedrock's generative AI models.
A minimal codebase for finetuning large multimodal models, supporting llava-1.5, qwen-vl, llava-interleave, llava-next-video, phi3-v etc.
Library to build personalized AI powered by what you've seen, said, or heard. Works with Ollama. Alternative to Rewind.ai. Open. Secure. You own your data. Rust.
The Enterprise-Grade, Production-Ready Multi-Agent Orchestration Framework. Join our community: https://discord.com/servers/agora-999382051935506503
Visualize streams of multimodal data. Fast, easy to use, and simple to integrate. Built in Rust using egui.
Repository for the paper 'MDS-ED: Multimodal Decision Support in the Emergency Department – a benchmark dataset based on MIMIC-IV'.
DataChain 🔗 Process and curate unstructured data using local ML models and LLM calls
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
This repository is used to collect papers and code in the field of AI.
A lightning-fast workflow builder that supports multimodal interaction and highly customizable extensions, and is intuitive to use even without coding knowledge.
A corpus of resources for multimodal machine learning with physiological signals (MMPS).
Phi-3 for Mac: Locally-run Vision and Language Models for Apple Silicon
Optimized local inference for LLMs with HuggingFace-like APIs for quantization, vision/language models, multimodal agents, speech, vector DB, and RAG.