📚 "A Multi-clustering Unbiased Relative Prediction Recommendation Scheme for Data with Hidden Multiple Overlaps" It's always fun to see the research being done with data from our partners! Check out this recent conference paper by researchers at Shenkar College of Engineering, Design and Art, featuring data from People Data Labs! 🔗 https://lnkd.in/dkhAHfZS
Dewey’s Post
More Relevant Posts
-
Got a paper that I started working on in 2019 published today! Looking back at this paper on how to compare Neyman and Fisher inference, I can't help drawing a parallel to recent discussions I have seen about Bayesian vs. frequentist inference – it is really hard to come up with a 'metric' for comparison that gives both methods their due merit, and it is easy to completely miss the point of one of the methods in the choice of how to compare them.
Is Fisher inference inferior to Neyman inference for policy analysis? - Statistical Papers
link.springer.com
-
Hint for balancing a binary dataset: the Dolan–Moré curve below shows classification quality after applying resampling methods (random undersampling, random oversampling, SMOTE) and resampling-multiplier selection (equalizing strategy and CV search) to imbalanced real and artificial datasets. 😊
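For readers who want a starting point, here is a minimal sketch of the simplest of those methods, random oversampling, in plain Python. The function name and toy data are illustrative, not from the post; libraries such as imbalanced-learn provide production versions of all three methods.

```python
import random

def random_oversample(X, y, seed=0):
    """Balance a binary dataset by duplicating randomly chosen
    minority-class samples until both classes have equal counts."""
    rng = random.Random(seed)
    labels = sorted(set(y))
    groups = {c: [x for x, t in zip(X, y) if t == c] for c in labels}
    minority = min(labels, key=lambda c: len(groups[c]))
    majority = max(labels, key=lambda c: len(groups[c]))
    deficit = len(groups[majority]) - len(groups[minority])
    extra = [rng.choice(groups[minority]) for _ in range(deficit)]
    return X + extra, y + [minority] * deficit

# Toy imbalanced data: four samples of class 0, two of class 1.
X_bal, y_bal = random_oversample([[0], [1], [2], [3], [4], [5]],
                                 [0, 0, 0, 0, 1, 1])
print(y_bal.count(0), y_bal.count(1))  # → 4 4
```

Equalizing the class counts is the "equalizing strategy" from the post; the CV-search variant instead tunes how much to resample via cross-validation.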
-
-
I'm happy to announce a new Medium article written by Carlo Lipizzi and myself about bias and trust in large language models! In this article, we discuss the challenges of quantifying and reducing bias so that public trust in such models can increase. Here is the link to the article: https://lnkd.in/eq-9phvs. This article provides a sneak peek into our research over the last few months. Stay tuned for a full-length paper that aims to propose novel metrics for quantifying bias in large language models and mechanisms for reducing it!
Bias and Trust in Large-Language Models — Quantification and Resolution
medium.com
-
Data science is at the heart of many of the innovative solutions we see on the market. Yet for professionals working in this area, setting up an environment has been a constant struggle. They often spend more time preparing it than actually exploring new ways to get insight from data. But why is this a pain?

😥 𝐓𝐨𝐨𝐥𝐬 𝐟𝐫𝐚𝐠𝐦𝐞𝐧𝐭𝐚𝐭𝐢𝐨𝐧 𝐚𝐧𝐝 𝐜𝐨𝐧𝐟𝐢𝐠𝐮𝐫𝐚𝐭𝐢𝐨𝐧: This is common across the whole industry, and the rapid development of new libraries, frameworks and tools does not make it easier. To that we also need to add the time spent actually configuring different parts of the stack, such as the GPUs.

😥 𝐃𝐫𝐢𝐯𝐞𝐫 𝐢𝐧𝐜𝐨𝐦𝐩𝐚𝐭𝐢𝐛𝐢𝐥𝐢𝐭𝐲: Dependency versioning is often a nightmare for data scientists and data engineers, especially when they are less experienced. Every update or upgrade risks breaking the entire ML environment.

😥 𝐒𝐞𝐭𝐮𝐩 𝐭𝐢𝐦𝐞: It often takes much longer than expected, mostly because of the abundance of tools to choose from, possible incompatibilities, and the additional configuration that needs to be done.

😥 𝐂𝐨𝐦𝐩𝐮𝐭𝐞 𝐫𝐞𝐬𝐨𝐮𝐫𝐜𝐞𝐬: While powerful hardware is needed to run AI at scale, for anyone doing initial exploration an AI workstation, ideally with a GPU, should be enough.

Canonical's mission to innovate at speed with open-source AI has led us to put together Data Science Stack, a developer tool that helps you set up your ML environment with ease and within minutes. Our objective is to see whether we address the challenges of data scientists and data engineers out there, so together with Lidia Luna Puerta we are conducting user research to gather feedback about our beta release. We won't take much of your time, and you will have your ML environment set up very quickly. Just sign up here: https://lnkd.in/dV2MBxHC #datascience #dataengineer #aiml #opensource
User Research: getting started with the DSS - Lidia Luna Puerta
calendly.com
-
Director | Data Scientist (MSc Computer Science and Data Science) | Certified Statistician (ICCSSA) | AI & Data Science Educator | Consultant | Talks about #ArtificialIntelligence, #MachineLearning, #DataLeadership
I love having friends in the industry who show me cool new techniques. My good friend Daniel Simpson (a genius, by the way; follow him and you won't regret it) introduced me to a new technique for measuring correlation that overcomes many of the problems with the Pearson and Spearman methods and allows measuring correlation on non-linear datasets WITHOUT assuming an underlying distribution. This is huge. Check out the article here: https://lnkd.in/emjcgS_p
A New Coefficient of Correlation
towardsdatascience.com
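The coefficient the article covers is Chatterjee's rank-based ξ. As a rough illustration, here is a minimal sketch of the no-ties case (the function name and toy data are mine, not the article's code): sort the pairs by x, rank the resulting y values, and compute ξ = 1 − 3·Σ|r[i+1] − r[i]| / (n² − 1). It approaches 1 when y is a (possibly non-monotone) function of x and hovers near 0 under independence.

```python
def chatterjee_xi(x, y):
    """Chatterjee's correlation coefficient, no-ties case:
    xi = 1 - 3 * sum(|r[i+1] - r[i]|) / (n^2 - 1),
    where r are the ranks of y after sorting the pairs by x."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    y_sorted = [y[i] for i in order]
    rank = {v: i + 1 for i, v in enumerate(sorted(y_sorted))}
    r = [rank[v] for v in y_sorted]
    num = sum(abs(r[i + 1] - r[i]) for i in range(n - 1))
    return 1 - 3 * num / (n ** 2 - 1)

xs = list(range(10))
print(chatterjee_xi(xs, [v * v for v in xs]))  # non-linear but dependent: high
```

Note that even for a perfect functional relationship ξ is below 1 at small n (here 1 − 27/99 ≈ 0.73); it converges to 1 as n grows. SciPy now ships an implementation with proper tie handling.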
-
New paper on arXiv!
We just posted a new paper, "Pareto optimal proxy metrics", on arXiv: https://lnkd.in/gYVbF3fj Good proxy metrics (1) predict the long-term impact on the north star metric and (2) are more sensitive in the short run. Existing literature mostly focuses on predicting the long-term impact; in contrast, our method finds the optimal trade-off between predicting long-term impact and short-term sensitivity! IMO, our paper provides super clear problem framing and definitions of the key entities for anyone interested in the proxy-metric problem. We're currently working on a few extensions, so we should have more on this topic within the next few months. Thanks for taking a look, and feel free to message me if you have any feedback!
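The trade-off the post describes can be made concrete with a generic Pareto-frontier filter. This is not the paper's method, just a sketch of the underlying idea: given candidate proxies scored on (long-term predictiveness, short-term sensitivity), keep only those not dominated on both axes. The scores below are made up for illustration.

```python
def pareto_front(points):
    """Keep only points not dominated on (predictiveness, sensitivity):
    a point is dominated if some other point is at least as good on both
    axes and strictly better on at least one."""
    front = []
    for i, (p, s) in enumerate(points):
        dominated = any(
            q >= p and t >= s and (q > p or t > s)
            for j, (q, t) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append((p, s))
    return front

# Hypothetical proxy candidates scored on the two axes.
candidates = [(0.9, 0.1), (0.5, 0.5), (0.1, 0.9), (0.4, 0.4), (0.2, 0.2)]
print(pareto_front(candidates))  # the three non-dominated trade-offs survive
```

The dominated candidates (0.4, 0.4) and (0.2, 0.2) drop out because (0.5, 0.5) beats them on both axes; choosing among the survivors is where a method like the paper's comes in.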
2307.01000.pdf
arxiv.org
-
Week in review - 2024-05-31 https://lnkd.in/gqpkVaFW

I stumbled across:
• Data notebook comparisons https://lnkd.in/gp4TMQHH - This website exists to compare the features of different data science notebook tools.
• All about GiB vs GB https://lnkd.in/gFrHDwUM - I was completely unaware of this previously. Apparently there have been lawsuits about this!

I've been thinking about data quality reporting. I'm currently using Databricks notebooks to explore some data and report on some insights. I will be looking into this more.

I read the first two chapters of "Solve Any Data Analysis Problem: Eight projects that show you how", and I'm considering continuing my data processing experiment codebase to see how well it might be applied to this problem.

I've been listening to:

Full Stack approach for effective AI agents https://lnkd.in/gUwpfuYN
Notes:
• Is it just a better version of Google at the moment?
• The main problem is correctness.
• 60-80% correct isn't good enough.
• Can agents make it more robust/correct?
• What can you use agents for now?

How to sell like Steve Jobs https://lnkd.in/griugg8x
Notes:
• Start with a problem that needs solving.
• Lots more in here that I didn't take notes on.

The Hard Truths Of Software Development https://lnkd.in/gyffqHGH
Notes:
• Adopting ideas is different from knowing ideas.
• Getting it wrong by preferring process over people.
• Be more feedback-driven than process-driven.

AI Coding tools comparison:
• https://lnkd.in/gfFZq89T
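The GiB-vs-GB distinction mentioned above comes down to decimal (SI) versus binary (IEC) prefixes, which a few lines make concrete (the "500 GB drive" is a made-up example):

```python
# SI prefix: 1 GB = 10**9 bytes. IEC binary prefix: 1 GiB = 2**30 bytes.
GB = 10 ** 9
GiB = 2 ** 30

disk_bytes = 500 * GB                  # a drive marketed as "500 GB"
print(f"{disk_bytes / GiB:.2f} GiB")   # → 465.66 GiB, as an OS may report it
print(GiB / GB)                        # → 1.073741824 (GiB is ~7.4% bigger)
```

The ~7% gap per gigabyte is exactly what the drive-capacity lawsuits were about: marketing uses the decimal unit, while operating systems often report the binary one.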
Week in review - 2024-05-31
paulr70.substack.com
-
We can view the bag-of-documents model as a corollary to the cluster hypothesis: if all documents relevant to a query are similar to one another, then they are also similar to their mean or centroid.
Bags of Documents and the Cluster Hypothesis
dtunkelang.medium.com
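A toy sketch of that claim (the document vectors are made up, and cosine similarity stands in for whatever similarity measure the retrieval system uses): three mutually similar "relevant document" vectors are each very close to their centroid.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Toy "relevant document" vectors that are close to one another.
docs = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.2], [1.0, 1.0, 0.15]]

# Mean (centroid) of the documents, computed coordinate-wise.
centroid = [sum(col) / len(docs) for col in zip(*docs)]

for d in docs:
    print(round(cosine(d, centroid), 3))  # each is near 1.0
```

This is only the intuition, not a proof; the article develops the argument properly.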
-
Data to power innovative products
This is so awesome!! So proud of the PDL and Dewey teams for continuing to expand the impact of aggregated employment data!