Alejandro Luis Figueroa Cevallos’ Post


Amazon MRA | Full Stack Developer | Electronic Engineer

It is certainly so, Andrew Ng: LLMs trained on their own responses through agentic workflows could potentially improve output quality, resembling human learning processes. This approach, though costly, could enhance LLM training. #AI #GenAI #innovation #technology

Andrew Ng

Founder of DeepLearning.AI; Managing General Partner of AI Fund; Founder and CEO of Landing AI

Inexpensive token generation and agentic workflows for LLMs open up new possibilities for training LLMs on synthetic data. Pretraining an LLM on its own directly generated responses to prompts doesn't help. But if an agentic workflow implemented with the LLM results in higher-quality output than the LLM can generate directly, then training on that output becomes potentially useful.

Just as humans can learn from their own thinking, perhaps LLMs can, too. Imagine a math student learning to write mathematical proofs. By solving a few problems, even without external input, they can reflect on what works and learn to generate better proofs.

LLM training involves (i) pretraining (learning from unlabeled text data to predict the next word), followed by (ii) instruction fine-tuning (learning to follow instructions) and (iii) RLHF/DPO to align the model to human values. Step (i) requires orders of magnitude more data than the others. For example, Llama 3 was pretrained on over 15 trillion tokens. LLM developers are still hungry for more data. Where can we get more text to train on?

Many developers train smaller models on the output of larger models, so a smaller model learns to mimic a larger model's behavior on a particular task. But an LLM can't learn much by training on data it generated directly. Indeed, training a model repeatedly on the output of an earlier version of itself can result in model collapse. However, an LLM wrapped in an agentic workflow can produce higher-quality output than it can generate directly. This output might be useful as pretraining data.

Efforts like these have precedents:
- When using reinforcement learning to play a game like chess, a model might learn a function that evaluates board positions. If we apply game tree search along with a low-accuracy evaluation function, the model can come up with more accurate evaluations. Then we can train that evaluation function to mimic these more accurate values.
- During alignment, Anthropic's constitutional AI uses RLAIF (RL from AI Feedback) to judge LLM output quality, substituting feedback generated by an AI model for human feedback.

A significant barrier to using agentic workflows to produce LLM training data is the cost of generating tokens. Say we want to generate 1 trillion tokens to extend a pre-existing dataset. At current retail prices, 1 trillion tokens from GPT-4-turbo ($30 per million output tokens), Claude 3 Opus ($75), Gemini 1.5 Pro ($21), and Llama-3-70B on Groq ($0.79) would cost, respectively, $30M, $75M, $21M, and $790K. Of course, an agentic workflow would require generating more than one token per final output token. But budgets for training cutting-edge LLMs easily surpass $100M, so spending a few million dollars more on data to boost performance is feasible. That's why agentic workflows might open up new opportunities for high-quality synthetic data generation.

[Original text: https://lnkd.in/gFF2AsZ9 ]
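To make the agentic-workflow idea concrete, here is a minimal Python sketch of a reflection-style loop: the model drafts an answer, critiques it, revises it, and only the final revision is kept as candidate synthetic training data. The `call_llm` helper, the prompt wording, and the two-round default are illustrative assumptions, not anything specified in the post.

```python
# Minimal sketch of the idea, not a production pipeline.
# `call_llm` is a hypothetical stand-in for any chat-completion API.

def call_llm(prompt: str) -> str:
    """Placeholder: send `prompt` to an LLM and return its text response."""
    raise NotImplementedError("Wire this to your model provider of choice.")

def agentic_answer(question: str, n_rounds: int = 2) -> str:
    """Draft an answer, then iteratively critique and revise it.

    The final revision is typically better than a single direct generation,
    which is what makes it a candidate for training data.
    """
    answer = call_llm(f"Answer the question:\n{question}")
    for _ in range(n_rounds):
        critique = call_llm(
            f"Question:\n{question}\n\nDraft answer:\n{answer}\n\n"
            "List concrete errors or weaknesses in the draft."
        )
        answer = call_llm(
            f"Question:\n{question}\n\nDraft answer:\n{answer}\n\n"
            f"Critique:\n{critique}\n\nRewrite the answer, fixing the issues."
        )
    return answer

def build_synthetic_corpus(prompts: list[str]) -> list[str]:
    """Run the agentic loop over many prompts and keep only the final outputs,
    which are the candidate synthetic pretraining or fine-tuning examples."""
    return [agentic_answer(p) for p in prompts]
```

Only the final revisions are collected because, per the argument above, the model's direct single-pass responses add little as training data; the value comes from the extra quality the workflow buys.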
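The chess precedent can be sketched the same way: a low-accuracy evaluation function wrapped in tree search yields backed-up values that are more accurate than the raw evaluations, and those values become regression targets for training the evaluation function itself. The `position` interface (`is_terminal`, `children`, `side_to_move_is_max`) and function names below are assumptions made for illustration.

```python
# Hedged sketch of the bootstrapping precedent: search + a weak evaluator
# produces better value estimates, which then serve as training targets.

def search_improved_value(position, evaluate, depth: int = 3) -> float:
    """Minimax search to `depth`, using the current (weak) `evaluate` at the leaves.

    The backed-up value is usually more accurate than evaluate(position) alone.
    """
    if depth == 0 or position.is_terminal():
        return evaluate(position)
    child_values = [
        search_improved_value(child, evaluate, depth - 1)
        for child in position.children()
    ]
    return max(child_values) if position.side_to_move_is_max() else min(child_values)

def make_training_set(positions, evaluate):
    """Pair each position with its search-improved value as a regression target."""
    return [(p, search_improved_value(p, evaluate)) for p in positions]
```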
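The cost figures in the post follow from straightforward per-million-token arithmetic; this short snippet reproduces them from the retail prices quoted there.

```python
# Back-of-the-envelope cost of generating 1 trillion output tokens,
# using the USD-per-million-token prices quoted in the post.
PRICE_PER_MILLION_TOKENS = {
    "GPT-4-turbo": 30.00,
    "Claude 3 Opus": 75.00,
    "Gemini 1.5 Pro": 21.00,
    "Llama-3-70B on Groq": 0.79,
}

TOKENS = 1_000_000_000_000  # 1 trillion tokens

for model, price in PRICE_PER_MILLION_TOKENS.items():
    cost = (TOKENS / 1_000_000) * price
    print(f"{model}: ${cost:,.0f}")
# GPT-4-turbo: $30,000,000
# Claude 3 Opus: $75,000,000
# Gemini 1.5 Pro: $21,000,000
# Llama-3-70B on Groq: $790,000
```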

Apple's Tiny LLMs, Amazon Rethinks Cashier-Free Stores, and more

deeplearning.ai
