Sometimes, LLMs act in ways we didn't intend. To solve this problem, Vectorview (YC W24) is providing custom evaluation tasks for AI. Because LLMs are non-deterministic, preventing unwanted behaviors is difficult: testing against every possible scenario is infeasible, and most evaluation benchmarks are too general to catch the specific issues that arise in real-world use. Vectorview's platform offers a suite of custom evaluation tools that benchmark AI applications against the specific, real-world scenarios they are likely to encounter. This targeted approach helps ensure that AI behaves as intended, mitigating risks that generic benchmarks often miss. The founders, Emil Fröberg and Lukas Petersson, believe that enabling access to custom evaluations at scale is the way to realize the full potential of AI. Congrats to the team on the launch!
Creating a system prompt that gives the expected results 100% of the time is a very challenging task. Can't wait to see how much of my time this tool can save. Congrats guys!
Let's do this! 🥷
This looks like a great product. Can I test it on ChatGPT?
Cool tool, definitely useful for building safety measures into the process and reducing unintended behaviours, especially as we're adopting more AI!
Congratulations Lukas Petersson and Emil Fröberg! 🌟 🚀
🚀
🚀🚀
🌟🌟
Congratulations! I can't wait to see where the journey takes you!
Is this an actual tool being built (something automated), or just a horde of freelancers getting paid hourly to test the model, like the person I met today who does literally this for a company? I'm legitimately curious, as my company could use this tool if it's actually automated.