How AI Is Built 🛠

Learn what world-class AI builders, engineers, and startup founders have already figured out.

About us

AI is way more than just fitting a model. It is data storage, collection, ingestion, and transformation; model training, model serving, and autoscaling. In How AI Is Built, you get the earned knowledge and practical insight into how to build AI. Nicolay Gerold teaches you what world-class builders, engineers, and startup founders have already figured out.

Website
https://www.how-ai-is-built.com/
Industry
Media Production
Company size
1 employee
Type
Self-Owned
Founded
2024
Specialties
AI, LLMs, Data Engineering, MLOps, and AI UI/UX

Updates

  • How AI Is Built 🛠 reposted this


    I build AI systems that turn unstructured data into business value. Optimizing at system level, creating feedback loops, and building data assets. CEO @Aisbach | Host How AI Is Built

    Hey everyone, we have come a long way. Super excited to conclude the first season of How AI Is Built 🛠 with a slightly different episode. Today, we are taking more of an enterprise view: what is happening in the industry, where is the potential for enterprises to adopt AI, and where can startups come in and build a solution.

    In today's episode, we talk with Jonathan Yarkoni. He is the founder of Latent, where he specializes in solving hard problems with generative AI. We cover: Which industries are ready for AI disruption? What are the key drivers of the current technology shift? How do you solve the big, hairy problems with AI?

    Some key points:
    - 2024 is about internal use cases
    - 2025 will be about integrating AI into customer-facing applications
    - Text-heavy domains (legal, education, marketing) are prime candidates, with healthcare and finance a close second
    - Companies that have "exhausted all possible paths" are perfect customers for AI solutions

    Check it out on Spotify: https://lnkd.in/dq4Sgqju
    Apple: https://lnkd.in/dGrvxwaC
    Or now also on YouTube: https://lnkd.in/dY6CWGgw

    Stay tuned for the next season. Season 2 will be all about information retrieval, recommendations, and RAG. I will try to bring together three closely related fields that do not really talk to and learn from each other.

    Unlocking Value from Unstructured Data, Real-World Applications of Generative AI | ep 17


  • How AI Is Built 🛠 reposted this


    What are the areas where generative AI can bring the highest ROI?

    1. Text-heavy domains:
    ➡️ Legal
    ➡️ Education
    ➡️ Software engineering
    ➡️ Marketing

    2. Biotech:
    ➡️ Protein engineering
    ➡️ Drug discovery

    3. Entertainment:
    ➡️ Gaming
    ➡️ Content generation
    ➡️ Personalized streaming (e.g. "Netflix on demand")

    4. Healthcare:
    ➡️ Diagnostics
    ➡️ Personalized medicine

    5. Finance:
    ➡️ Risk assessment
    ➡️ Algorithmic trading

    Key drivers:
    • Maturity of text models
    • Advancements in diffusion models
    • Increasing hyperrealism in generated content
    • Growing ability to adhere to specific styles/brands

    The next few years will likely see waves of AI adoption:
    2024: Focus on internal use cases
    2025: Broader customer-facing applications as models improve

    Learn more in today's podcast with Jonathan Yarkoni on How AI Is Built 🛠. Links to the podcast in the comments.

    What industries do you think will see the biggest #AI impact? Any surprises? 👇

  • We have now also officially launched on YouTube, if you want to catch a video version of the podcast! Link: https://lnkd.in/d3xNpeqh


    When should you use Spark to process your data for your AI systems?

    In today's episode of How AI Is Built 🛠, Abhishek Choudhary, principal data engineer at Bayer, breaks down data processing with Spark:
    - When to use Spark vs. alternatives for data processing
    - Key components of Spark: RDDs, DataFrames, and SQL
    - Integrating AI into data pipelines
    - Challenges with LLM latency and consistency
    - Data storage strategies for AI workloads
    - Orchestration tools for data pipelines
    - Tips for making LLMs more reliable in production

    Key takeaways:
    1. Data volume: Spark shines when dealing with terabytes of data or more. For datasets under 100 GB, simpler tools like Pandas or DuckDB often suffice.
    2. Uncertainty in data growth: If you expect rapid, unpredictable growth in your data volume, Spark's scalability becomes a major advantage.
    3. Complex data pipelines: Spark excels when you need to perform multiple operations (e.g., loading, transforming, aggregating) on large datasets.
    4. Existing infrastructure: If you already have a Spark cluster set up (e.g., in Databricks), leveraging it for AI data processing can be efficient.
    5. Team expertise: Consider your team's familiarity with Spark. If you're starting from scratch, simpler Python-based tools might be easier to adopt.
    6. Performance requirements: For compute-intensive operations on large datasets, Spark's distributed computing model can significantly boost performance.

    Link to the episode is in the comments below! Let me know what tools you use for data processing and why. #dataengineering #llms #ai
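The data-volume takeaways above can be condensed into a simple decision helper. This is an illustrative heuristic only: the thresholds (100 GB, terabyte scale) are rough rules of thumb from the conversation, not hard limits, and the function name is made up for this sketch.

```python
def pick_engine(dataset_gb: float, unpredictable_growth: bool = False) -> str:
    """Rough rule of thumb for choosing a data-processing engine by volume."""
    if dataset_gb >= 1000 or unpredictable_growth:
        return "spark"          # terabyte scale or uncertain growth: distributed wins
    if dataset_gb <= 100:
        return "pandas/duckdb"  # fits on one machine: simpler tools usually suffice
    return "benchmark both"     # gray zone: measure on your actual workload
```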

  • How AI Is Built 🛠 reposted this


    AI agents might soon be running your business workflows. Just wrapped up a podcast on the future of AI agents.

    5 key takeaways:
    - The "human in the loop" is evolving: It's no longer about offshore workers completing tasks, but about AI pausing to ask YOU for critical decisions.
    - Cost vs. accuracy trade-off: For high-stakes tasks (think financial analysis), accuracy trumps cost. But for marketing copy? A faster, cheaper model might suffice.
    - The workflow capture challenge: The magic often happens in the user's head. Capturing that decision-making process is the next frontier in agent development.
    - Enterprise-ready agents: Deploying AI agents in corporate environments requires robust security, scalability, and integration with existing systems.
    - The future is declarative: Moving beyond chat interfaces, the next big leap could be agents that understand and modify system states directly.

    Learn more about how you can actually bring agents into production in the latest episode of How AI Is Built 🛠 with Rahul Parundekar, AI Hero. Links are in the comments below. #aiagents #agents #llms

  • How AI Is Built 🛠 reposted this


    Are you facing these reliability nightmares with your AI agents?
    - Does your AI agent occasionally produce nonsensical or off-topic responses?
    - Are you struggling to maintain consistent output quality as your agent scales?
    - Do minor changes in input sometimes lead to wildly different outputs?

    (1) Prompt engineering for consistency
    - Implement structured prompts with clear instructions and examples
    - Use few-shot learning techniques to guide the model's behavior
    - Regularly update and refine prompts based on performance data

    (2) Output filtering and post-processing
    - Implement content filters to catch inappropriate or off-topic responses
    - Use language models or rule-based systems to refine and polish agent outputs
    - Implement fact-checking mechanisms for critical information

    (3) Prompt preprocessing and compression
    - Reduce token count without losing essential information
    - Improve consistency by focusing on key elements of the prompt
    - Enable use of larger context windows without increased costs

    In today's podcast of How AI Is Built 🛠, Richmond Alake gives you the tools to create more reliable and predictable AI agents. Richmond is an AI engineer at MongoDB working on using agents to solve real-world problems.

    We talk about some of the latest tools for LLMs and AI agents, like Microsoft's LLMLingua (Huiqiang Jiang) for prompt compression, crewAI (João (Joe) Moura) and LangChain (Harrison Chase) for building agents, and more.

    Catch the episode on Spotify, Apple, Snipd, or wherever you get your podcasts. Links are in the comments below!

    What reliability challenges have you faced with your AI agents? Let me know below! Catch you next week with another episode in our series on AI agents with Rahul Parundekar.
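Point (1), structured prompts with few-shot examples, can be sketched in a few lines of plain Python. The template layout and the example pair below are hypothetical, assumed for illustration; real prompts would use examples curated from your own performance data.

```python
# One hypothetical few-shot pair; in practice you would curate several.
FEW_SHOT_EXAMPLES = [
    ("Summarize: The cat sat on the mat all day.", "A cat lounged on a mat."),
]

def build_prompt(instruction: str, user_input: str,
                 examples=FEW_SHOT_EXAMPLES) -> str:
    """Assemble a structured prompt: clear instruction, then examples, then input."""
    parts = [f"Instruction: {instruction}", ""]
    for example_in, example_out in examples:
        parts += [f"Input: {example_in}", f"Output: {example_out}", ""]
    parts += [f"Input: {user_input}", "Output:"]
    return "\n".join(parts)
```

Keeping the template in one place makes it easy to version prompts and refine them as performance data comes in.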

  • It's often beneficial to start with a simplified version of these concepts and iterate as you learn more about your specific integration challenges.


    How can you standardize data from various formats into a single, usable structure?

    1️⃣ The "two-sided funnel" model:
    - Input: Data comes from many different sources.
    - Middle: All data is converted into one standard (canonical) format.
    - Output: The standardized data is then shared or used.

    2️⃣ Unified data streams: Turn different types of data into one consistent stream. Create parsers for each data source that convert the source's native format into your ingestible object format.

    3️⃣ Parallel processing power: Use the competing-consumer model to handle data faster and more efficiently. Workers (processes or threads) take tasks from a shared message queue, allowing faster and more efficient processing.

    4️⃣ Actor model magic: Treat each part of your system (like data collectors or processors) as a separate unit. These units manage themselves and talk to each other through messages, which makes the system more stable and easier to grow.

    5️⃣ Smart polling: Instead of constantly checking all data, only ask for what's new or changed. This saves time and resources by focusing on updates rather than repeating checks on unchanged data.

    Want to go deeper? My latest podcast on How AI Is Built 🛠 with Kirk Marple, CEO of Graphlit, by Unstruk Data, unpacks these concepts and more. Listen here: https://lnkd.in/dkVaJtmE

    Which one fits best into agentic workflows?
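The competing-consumer model from point 3️⃣ can be sketched with the standard library alone: several worker threads pull records from one shared queue and emit a canonical format. The record fields (`source`, `raw`) are invented for this sketch.

```python
import queue
import threading

def worker(tasks: queue.Queue, results: list, lock: threading.Lock) -> None:
    """Each worker competes for items on the shared queue until it is empty."""
    while True:
        try:
            record = tasks.get_nowait()
        except queue.Empty:
            return
        # Normalize into a single canonical shape (the "middle" of the funnel).
        canonical = {"source": record["source"], "text": record["raw"].strip().lower()}
        with lock:
            results.append(canonical)
        tasks.task_done()

def process(records: list, n_workers: int = 4) -> list:
    tasks: queue.Queue = queue.Queue()
    for record in records:
        tasks.put(record)
    results: list = []
    lock = threading.Lock()
    workers = [threading.Thread(target=worker, args=(tasks, results, lock))
               for _ in range(n_workers)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return results
```

Because every worker drains the same queue, throughput scales with the number of workers until the queue or the downstream sink becomes the bottleneck.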

  • We had the pleasure of hosting Derek Tu of Carbon. If you want to learn how to feed your data-hungry friends at OpenAI or Anthropic, listen to this one. For those who do not want to listen to 37 minutes of me learning from Derek, the gist is below. https://lnkd.in/dperWzTh


    How can you bring the data to your LLM? Chunks, context, metadata. All need to be stored, prepared, and delivered. While most people tweak prompts, tweaking this pipeline often brings the biggest marginal results.

    Some key points:
    1. Capturing and representing the rich metadata is as important as the content itself when processing conversational data.
    2. Implementing a tagging system for data stored in AI systems can greatly enhance retrieval efficiency and accuracy.
    3. The best results are not always in the most complex tools. AWS Textract can provide better OCR results than LLM-based solutions.

    If you want to learn more, read below or listen to my episode with Derek Tu (Carbon) on How AI Is Built 🛠.
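Point 2, the tagging system, can be sketched as a toy retrieval filter: score stored chunks by how many tags they share with the query. The chunk layout below is assumed for illustration; a real pipeline would combine tag filtering with vector similarity.

```python
def retrieve_by_tags(chunks: list[dict], query_tags: set[str],
                     top_k: int = 3) -> list[dict]:
    """Rank chunks by tag overlap with the query and return the top_k."""
    def overlap(chunk: dict) -> int:
        return len(query_tags & set(chunk["tags"]))
    # Keep only chunks with at least one matching tag, best matches first.
    matching = [c for c in chunks if overlap(c) > 0]
    return sorted(matching, key=overlap, reverse=True)[:top_k]
```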

  • How AI Is Built 🛠 reposted this


    When building with LLMs, we easily get caught up with the new, shiny optimizations: optimizing chunking strategies and fine-tuning prompts. The biggest gains can often be found in the pipes: improving the data that is fed into your models. Tags, such as topics, the source of the document, and other metadata, allow you to retrieve more relevant input and contextualize it for the LLM.

    In today's episode of How AI Is Built 🛠, I talked to Derek Tu, founder of Carbon. Carbon is an ETL tool specifically built for LLMs that connects to various data sources and parses and chunks unstructured data before delivering it to a vector database.

    Beyond tagging, we discussed fine-tuning embedding models, necessary innovations to drive the space forward, building ETL pipelines for LLMs, extracting data from documents, and much more.

    Jump into the podcast ⬇ (comments)

    Stay tuned for next week when we start a little mini-series on agents with Richmond Alake. #llms #genai #etl

  • Learn how to orchestrate tasks like data ingestion, transformation, and AI calls, as well as how to monitor and get analytics on data products.


    Bringing AI into data orchestration, or orchestrating data workflows for AI: today, you can learn about both.

    In today's episode of How AI Is Built, I learned from Hugo Lu how to build robust, cost-efficient, and scalable data pipelines that are easy to monitor. Hugo is the founder of Orchestra, a serverless data orchestration tool that aims to provide a unified control plane for managing data pipelines, infrastructure, and analytics across an organization's modern data stack.

    If you only take away three things, here they are:

    1. Find the right level of abstraction when building data orchestration tasks and workflows: "I think the right level of abstraction is always good. I think Prefect do this really well, right? Their big sell was, just put a decorator on a function and it becomes a task. That is a great idea. You know, just make tasks modular and have them do all the boilerplate stuff like error logging, monitoring of data, all of that stuff."

    2. Modularize data pipeline components: "It's just around understanding what that dev workflow should look like. I think it should be a bit more modular." Having a modular architecture where components like data ingestion, transformation, and model training are decoupled allows better flexibility and scalability.

    3. Adopt a streaming/event-driven architecture for low-latency AI use cases: "If you've got an event-driven architecture, then, you know, that's not what you use an orchestration tool for... if you're having a conversation with a chatbot, like, you know, you're sending messages, you're sending events, you're getting a response back. That I would argue should be dealt with by microservices."

    Listen now: https://lnkd.in/dPgZQ6Am

    Question to you: How are AI workloads changing the way you approach data orchestration? Are you using specialized tools or adapting existing ones?

    Stay tuned for next week, when I discuss how to build data pipelines specifically for generative AI with Derek Tu from Carbon.
    #genai #llms #data #dataengineering #dataorchestration
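The Prefect-style "just put a decorator on a function" idea quoted above can be sketched with a plain Python decorator that absorbs the boilerplate (error logging and timing) so each pipeline step stays a small, modular function. This is a toy stand-in for illustration, not Prefect's actual API.

```python
import functools
import logging
import time

logging.basicConfig(level=logging.INFO)

def task(fn):
    """Wrap a function with error-logging and timing boilerplate."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        logging.info("task %s started", fn.__name__)
        try:
            return fn(*args, **kwargs)
        except Exception:
            logging.exception("task %s failed", fn.__name__)
            raise
        finally:
            logging.info("task %s finished in %.3fs",
                         fn.__name__, time.perf_counter() - start)
    return wrapper

@task
def ingest(rows: list[int]) -> list[int]:
    # A trivial stand-in for a real ingestion step.
    return [r * 2 for r in rows]
```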

    Serverless Data Orchestration, AI in the Data Stack, AI Pipelines | ep 12


  • How AI Is Built 🛠 reposted this


    Data storage costs skyrocketing? Quantization offers a powerful, affordable solution.

    Embeddings can be big, taking up lots of space. That's where quantization comes in: it shrinks the embeddings down, making them easier to store and search through.

    Two popular methods are binary quantization (BQ) and product quantization (PQ). BQ simplifies embeddings, making them smaller but still useful. PQ breaks them into pieces, compressing each part separately.

    Learn more in today's podcast on How AI Is Built with Zain Hasan from Weaviate. https://lnkd.in/dZqwP4tG

    Let me know below what use cases you dream about with low-cost storage. #rag #vectordatabase #llm #generativeai
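A minimal sketch of binary quantization, assuming nothing beyond the idea described above: collapse each embedding dimension to one bit (1 if positive, else 0) and compare the resulting codes by Hamming distance. A 1024-dimensional float32 vector (4096 bytes) shrinks to 128 bytes this way, which is the storage win the episode describes. The helper names are made up for this sketch.

```python
def binary_quantize(vec: list[float]) -> int:
    """Pack the sign of each dimension into one bit of an integer code."""
    code = 0
    for i, value in enumerate(vec):
        if value > 0:
            code |= 1 << i
    return code

def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits: a cheap proxy for embedding distance."""
    return bin(a ^ b).count("1")
```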
