Patterns are slowly emerging in generative AI evaluation, but we are still **very** far from having things figured out. And the risks around safety and frontier harms are tricky to understand, given these areas require highly specialized knowledge and a deep understanding of how to elicit the kinds of inputs that could trigger such harms. This is the first of many blogs we are writing as the newly formed AI Alliance. I hope you enjoy it, and please comment if you are working in this space. The tent for the alliance is HUGE and there is room for everyone!
> “In many ways, we are really talking about the evaluation of a model or agent as the new PRD (product requirements document). This flips product development on its head: defining an eval up front and working backwards requires those developing foundation models to define everything from safety mitigations to data mixtures for both pretraining and post-training.” Great to see evals appropriately represented as primitives in this framework. Excellent write-up!
The links to the various leaderboards are very noteworthy.
Nice to see the collaboration, Joe!
Thanks for sharing. The part about gaps in public benchmarks such as MMLU/GSM8K resonates a lot: what is required is both a widely accepted taxonomy of user needs and SME benchmarks to evaluate those needs. This is how product managers can influence model capabilities to align with use cases and applications. "The line of sight from something like the Massive Multitask Language Understanding (MMLU) or HellaSwag datasets, to what the downstream consumer (i.e., the developer) wants in terms of application performance is unclear and certainly non-linear"