How AI Is Built 🛠

Learn what world-class AI builders, engineers, and startup founders have already figured out.

About us

AI is way more than just fitting a model. It is data storage, collection, ingestion, and transformation; model training, model serving, autoscaling. In How AI Is Built, you get the earned knowledge and practical insight into how you can build AI. Nicolay Gerold teaches you what world-class builders, engineers, and startup founders have already figured out.

Website
https://www.how-ai-is-built.com/
Industry
Media Production
Company size
1 employee
Type
Self-Owned
Founded
2024
Specialties
AI, LLMs, Data Engineering, MLOps, and AI UI/UX

Employees at How AI Is Built 🛠

Updates

  • How AI Is Built 🛠 reposted this

    View profile for Nicolay Christopher Gerold

    Building AI and Data Systems | Working where unstructured data and AI intersect | Using data engineering in AI and AI in data engineering Aisbach CEO | Host of How AI Is Built

    AI agents might soon be running your business workflows. Just wrapped up a podcast on the future of AI agents. 5 key takeaways:
    - The "human in the loop" is evolving: it's no longer about offshore workers completing tasks, but about AI pausing to ask YOU for critical decisions.
    - Cost vs. accuracy trade-off: for high-stakes tasks (think financial analysis), accuracy trumps cost. But for marketing copy? A faster, cheaper model might suffice.
    - The workflow capture challenge: the magic often happens in the user's head. Capturing that decision-making process is the next frontier in agent development.
    - Enterprise-ready agents: deploying AI agents in corporate environments requires robust security, scalability, and integration with existing systems.
    - The future is declarative: moving beyond chat interfaces, the next big leap could be agents that understand and modify system states directly.
    Learn more about how you can actually bring agents into production: listen to the latest episode of How AI Is Built 🛠 with Rahul Parundekar, AI Hero. Links are in the comments below. #aiagents #agents #llms

  • How AI Is Built 🛠 reposted this

    Are you facing these reliability nightmares with your AI agents?
    - Does your AI agent occasionally produce nonsensical or off-topic responses?
    - Are you struggling to maintain consistent output quality as your agent scales?
    - Do minor changes in input sometimes lead to wildly different outputs?

    (1) Prompt engineering for consistency
    - Implement structured prompts with clear instructions and examples
    - Use few-shot learning techniques to guide the model's behavior
    - Regularly update and refine prompts based on performance data

    (2) Output filtering and post-processing
    - Implement content filters to catch inappropriate or off-topic responses
    - Use language models or rule-based systems to refine and polish agent outputs
    - Implement fact-checking mechanisms for critical information

    (3) Prompt preprocessing and compression
    - Reduce token count without losing essential information
    - Improve consistency by focusing on key elements of the prompt
    - Enable use of larger context windows without increased costs

    In today's podcast of How AI Is Built 🛠, Richmond Alake gives you the tools to create more reliable and predictable AI agents. Richmond is an AI engineer at MongoDB working on using agents to solve real-world problems. We talk about some of the latest tools for LLMs and AI agents, like Microsoft's LLMLingua (Huiqiang Jiang) for prompt compression, and crewAI (João (Joe) Moura) and LangChain (Harrison Chase) for building agents. Catch the episode on Spotify, Apple, Snipd, or wherever you get your podcasts. Links are in the comments below! What reliability challenges have you faced with your AI agents? Let me know below! Catch you next week with another episode in our series on AI agents with Rahul Parundekar.
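
    Technique (1) can be made concrete with a tiny sketch: a structured, few-shot prompt assembled deterministically, so every call to the agent sees the same scaffolding. All names and the ticket-classification task here are illustrative, not from the episode.

```python
# Build a structured few-shot prompt from fixed parts, so output
# format stays consistent across calls. Purely illustrative names.

def build_prompt(instructions: str, examples: list[tuple[str, str]], query: str) -> str:
    """Assemble a consistent few-shot prompt: instructions, examples, then the query."""
    parts = [f"Instructions: {instructions}", ""]
    for user_msg, assistant_msg in examples:
        parts.append(f"Input: {user_msg}")
        parts.append(f"Output: {assistant_msg}")
        parts.append("")
    parts.append(f"Input: {query}")
    parts.append("Output:")
    return "\n".join(parts)

prompt = build_prompt(
    instructions="Classify the support ticket as 'billing' or 'technical'. Answer with one word.",
    examples=[("My invoice is wrong", "billing"), ("The app crashes on login", "technical")],
    query="I was charged twice this month",
)
print(prompt)
```

    Because the scaffolding is code rather than ad-hoc text, refining the prompt (point "regularly update and refine") becomes editing one function.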

  • It's often beneficial to start with a simplified version of these concepts and iterate as you learn more about your specific integration challenges.

    How can you standardize data from various formats into a single, usable structure?
    1️⃣ The "two-sided funnel" model: Input: data comes from many different sources. Middle: all data is converted into one standard (canonical) format. Output: the standardized data is then shared or used.
    2️⃣ Unified data streams: turn different types of data into one consistent stream. Create parsers for each data source that convert the source's native format into your ingestible object format.
    3️⃣ Parallel processing power: use the "competing consumer" model to handle data faster and more efficiently. Workers (processes or threads) take tasks from a shared list (a message queue), so whichever worker is free picks up the next task.
    4️⃣ Actor model magic: treat each part of your system (like data collectors or processors) as a separate unit. These units manage themselves and talk to each other through messages, which makes the system more stable and easier to grow.
    5️⃣ Smart polling: instead of constantly re-checking all data, only ask for what's new or changed. This saves time and resources by focusing on updates rather than repeating checks on unchanged data.
    Want to go deeper? My latest podcast on How AI Is Built 🛠 with Kirk Marple, CEO of Graphlit, by Unstruk Data, unpacks these concepts and more. Listen here: https://lnkd.in/dkVaJtmE Which one fits best into agentic workflows?
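
    The competing-consumer model from point 3️⃣ can be sketched in a few lines: several workers pull from one shared queue, so whichever worker is free takes the next item. The "parsing" step is a stand-in (upper-casing) for converting a source's native format into the canonical format; all names are illustrative.

```python
# Competing consumers: N worker threads drain one shared task queue.
import queue
import threading

def worker(tasks: queue.Queue, results: list, lock: threading.Lock):
    while True:
        try:
            item = tasks.get_nowait()
        except queue.Empty:
            return  # queue drained, worker exits
        # Stand-in for "parse into canonical format".
        with lock:
            results.append(item.upper())
        tasks.task_done()

tasks = queue.Queue()
for doc in ["email", "pdf", "webpage", "slack message"]:
    tasks.put(doc)

results, lock = [], threading.Lock()
threads = [threading.Thread(target=worker, args=(tasks, results, lock)) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # every item processed exactly once, by whichever worker was free
```

    In production the in-process queue would be a message broker (e.g. SQS or RabbitMQ), but the contract is the same: one shared queue, many interchangeable consumers.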

  • We had the pleasure of hosting Derek Tu of Carbon. If you want to learn how to feed your data-hungry friends at OpenAI or Anthropic, listen to this one. For the ones who do not want to listen to 37 minutes of me learning from Derek, the gist is below. https://lnkd.in/dperWzTh

    How can you bring the data to your LLM? Chunks, context, metadata. All need to be stored, prepared, and delivered. While most people tweak prompts, tweaking this pipeline often brings the biggest marginal results. Some key points:
    1. Capturing and representing rich metadata is as important as the content itself when processing conversational data.
    2. Implementing a tagging system for data stored in AI systems can greatly enhance retrieval efficiency and accuracy.
    3. The best results are not always in the most complex tools: AWS Textract can provide better OCR results than LLM-based solutions.
    If you want to learn more, read below or listen to my episode with Derek Tu (Carbon) on How AI Is Built 🛠.

  • How AI Is Built 🛠 reposted this

    When building with LLMs, we easily get caught up in the new, shiny optimizations: chunking strategies, fine-tuned prompts. The biggest gains can often be found in the pipes: improving the data that is fed into your models. Tags, such as topics, the source of the document, and other metadata, allow you to retrieve more relevant input and contextualize it for the LLM.

    In today's episode of How AI Is Built 🛠, I talked to Derek Tu, founder of Carbon. Carbon is an ETL tool specifically built for LLMs that connects to various data sources and parses and chunks unstructured data before delivering it to a vector database.

    Beyond tagging, we discussed fine-tuning embedding models, necessary innovations to drive the space forward, building ETL pipelines for LLMs, extracting data from documents, and much more. Jump into the podcast ⬇ (comments) Stay tuned for next week, when we start a little mini-series on agents with Richmond Alake. #llms #genai #etl

  • Learn how to orchestrate tasks like data ingestion, transformation, and AI calls, as well as how to monitor and get analytics on data products.

    Bringing AI into data orchestration, or orchestrating data workflows for AI. Today, you can learn about both. In today's episode of How AI Is Built, I learned from Hugo Lu how to build robust, cost-efficient, and scalable data pipelines that are easy to monitor. Hugo is the founder of Orchestra, a serverless data orchestration tool that aims to provide a unified control plane for managing data pipelines, infrastructure, and analytics across an organization's modern data stack.

    If you only take away three things, here they are:

    1. Find the right level of abstraction when building data orchestration tasks/workflows: "I think the right level of abstraction is always good. I think like Prefect do this really well, right? Their big sell was, just put a decorator on a function and it becomes a task. That is a great idea. You know, just make tasks modular and have them do all the boilerplate stuff like error logging, monitoring of data, all of that stuff."

    2. Modularize data pipeline components: "It's just around understanding what that dev workflow should look like. I think it should be a bit more modular." Having a modular architecture where components like data ingestion, transformation, and model training are decoupled allows better flexibility and scalability.

    3. Adopt a streaming/event-driven architecture for low-latency AI use cases: "If you've got an event-driven architecture, then, you know, that's not what you use an orchestration tool for... if you're having a conversation with a chatbot, like, you know, you're sending messages, you're sending events, you're getting a response back. That I would argue should be dealt with by microservices."

    Listen now: https://lnkd.in/dPgZQ6Am

    Question to you: How are AI workloads changing the way you approach data orchestration? Are you using specialized tools or adapting existing ones?

    Stay tuned for next week, when I discuss how to build data pipelines specifically for generative AI with Derek Tu from Carbon.
    #genai #llms #data #dataengineering #dataorchestration
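
    The "put a decorator on a function and it becomes a task" abstraction Hugo praises can be sketched in plain Python. This mimics the spirit of Prefect's @task (boilerplate handled for you), not its actual API.

```python
# A decorator that turns a plain function into a "task": it logs start,
# success, and failure, so pipeline functions stay free of boilerplate.
import functools
import logging

logging.basicConfig(level=logging.INFO)

def task(fn):
    @functools.wraps(fn)  # preserve the wrapped function's name/docstring
    def wrapper(*args, **kwargs):
        logging.info("task %s started", fn.__name__)
        try:
            result = fn(*args, **kwargs)
        except Exception:
            logging.exception("task %s failed", fn.__name__)
            raise
        logging.info("task %s finished", fn.__name__)
        return result
    return wrapper

@task
def transform(rows: list[int]) -> list[int]:
    return [r * 2 for r in rows]

print(transform([1, 2, 3]))  # → [2, 4, 6]
```

    A real orchestrator would add retries, data monitoring, and scheduling in the same wrapper, which is exactly why the abstraction is at the function level.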

    Serverless Data Orchestration, AI in the Data Stack, AI Pipelines | ep 12
    https://spotify.com

  • How AI Is Built 🛠 reposted this

    Data storage costs skyrocketing? Quantization offers a powerful, affordable solution. Embeddings can be big, taking up lots of space. That's where quantization comes in: it shrinks embeddings down, making them easier to store and search through. Two popular methods are binary quantization (BQ) and product quantization (PQ). BQ simplifies embeddings, making them smaller but still useful. PQ breaks them into pieces, compressing each part separately. Learn more in today's podcast on How AI Is Built with Zain Hasan from Weaviate. https://lnkd.in/dZqwP4tG Let me know below what use-cases you dream about with low-cost storage. #rag #vectordatabase #llm #generativeai
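
    To make BQ tangible, here is a rough pure-Python sketch (not Weaviate's implementation): keep only the sign of each embedding dimension, so a float vector becomes one bit per dimension, and compare vectors with Hamming distance instead of cosine similarity.

```python
# Binary quantization: sign of each component -> 1 bit per dimension,
# a 32x size reduction versus float32, at some loss of precision.

def binary_quantize(vec: list[float]) -> list[int]:
    """1 where the component is positive, else 0."""
    return [1 if x > 0 else 0 for x in vec]

def hamming(a: list[int], b: list[int]) -> int:
    """Distance between binary vectors: count of differing bits."""
    return sum(x != y for x, y in zip(a, b))

q = binary_quantize([0.12, -0.98, 0.44, -0.05])
print(q)  # → [1, 0, 1, 0]
print(hamming(q, binary_quantize([0.30, -0.70, -0.20, -0.40])))  # → 1
```

    PQ follows the same compress-then-compare spirit, but splits the vector into sub-vectors and replaces each with the ID of its nearest centroid from a learned codebook.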

  • Ever wondered how AI systems handle images and videos, or how they make lightning-fast recommendations?

    Discover how you can reduce cost, boost relevance, and enable new capabilities in your vector database. In our latest episode of How AI Is Built, we dive deep into the world of vector databases with Zain Hasan, ML engineer at Weaviate. Zain shares his expertise on how vector databases are transforming search and recommendation systems.
    3 key insights:
    - Reduce costs by up to 80% with vector quantization techniques like binary and product quantization
    - Boost search relevance by combining vector search with keyword search in a hybrid approach
    - Enable powerful new capabilities with multi-vector and multimodal search across text, image, audio, and more
    Check out the full episode now: https://lnkd.in/d_XZHx92
    Let me know below:
    - How are you currently using or planning to use vector databases?
    - What are the biggest challenges you face with search or recommendation systems?
    #VectorDatabases #SemanticSearch #RecommenderSystems
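
    One common way to combine vector and keyword rankings, as in the hybrid approach mentioned above, is reciprocal rank fusion (RRF). A hedged sketch, with made-up document IDs and the conventional k = 60 constant:

```python
# Reciprocal rank fusion: score each doc by sum of 1/(k + rank)
# across the rankings it appears in; higher total wins.

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc_a", "doc_c", "doc_b"]   # ranked by embedding similarity
keyword_hits = ["doc_b", "doc_a", "doc_d"]  # ranked by keyword/BM25 match
print(rrf([vector_hits, keyword_hits]))
```

    Because RRF only uses ranks, it needs no score normalization between the two very differently scaled retrievers, which is why it is a popular default for hybrid search.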

    Mastering Vector Databases: Product & Binary Quantization, Multi-Vector Search
    https://spotify.com

  • How AI Is Built 🛠 reposted this

    Have you ever wondered how complex AI and data systems are built? For any AI or data application, you have to make a lot of choices:
    1. What type of storage to use.
    2. How to extract data from its sources.
    3. How to orchestrate your pipelines.
    4. How to integrate AI into data processing.
    In this episode of How AI Is Built, Anjan Banerjee shares his expertise on identifying data sources, selecting the right tools for extraction and storage, and the growing popularity of multi-modal storage engines like Snowflake, Databricks, and BigQuery. We also discuss the pros and cons of data orchestration tools like Airflow and the importance of choosing the right solution based on your organization's technical capabilities.
    Key takeaways:
    - Native cloud services can sometimes outperform third-party tools
    - TinyML is making waves in manufacturing and industrial setups
    - "Poka-yoke" error-proofing is crucial for data quality
    - Snowflake is overhyped, while Databricks shines for heavy data processing
    Listen to the full episode here: https://lnkd.in/dVfhVkj6
    Questions for you:
    - What are your thoughts on the rise of multi-modal storage engines?
    - How do you ensure data quality and standardization in your AI and data systems?
    Share your experiences and insights in the comments below! #AI #dataengineering #data

    Building Robust AI and Data Systems, Data Architecture, Data Quality, Data Storage | ep 10
    https://spotify.com

  • From starting with the right data sources to implementing robust data orchestration and ensuring data quality, Anjan provides a comprehensive guide to navigating the challenges of designing data architectures for AI. 💪

    Essential strategies for building AI-ready data systems. For any AI or data application, you have to make a lot of choices:
    1. What type of storage to use.
    2. How to extract data from its sources.
    3. How to orchestrate your pipelines.
    4. How to integrate AI into data processing.
    5. Whether to use managed providers or self-hosted solutions, e.g. Snowflake vs. Databricks.
    One of the key challenges for data applications? "Automated referential integrity... How can I join different data sets together with the least amount of time spent in identifying what the data actually means? From CRM platforms to your Google Analytics data or your information of user activities and so on, how can I join the different data sets to identify, let's say, a customer 360 view automatically?"
    Listen to the full episode of How AI Is Built for more expert advice from Anjan Banerjee on building data systems that can power your AI initiatives. https://lnkd.in/d7D-T3y7
    What's your biggest challenge in designing AI-ready data architectures? Share your experiences in the comments! #data #tinyml #datasystem #dataengineering
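
    The "customer 360" join Anjan describes reduces to combining records from different sources on a shared customer key. A toy sketch (the sources and key are made up; the hard part in practice is *discovering* that key, which is the referential-integrity problem):

```python
# Inner-join CRM and analytics records on a shared customer ID.
crm = {"c1": {"name": "Acme Corp", "plan": "pro"}}
analytics = {"c1": {"sessions": 42}, "c2": {"sessions": 3}}

def customer_360(crm: dict, analytics: dict) -> dict:
    """Merge per-customer fields for IDs present in both sources."""
    return {
        cid: {**crm[cid], **analytics[cid]}
        for cid in crm.keys() & analytics.keys()  # set intersection of IDs
    }

print(customer_360(crm, analytics))  # → {'c1': {'name': 'Acme Corp', 'plan': 'pro', 'sessions': 42}}
```

    Automating this means inferring which columns across systems refer to the same entity, which is where AI-assisted metadata mapping comes in.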
