📚 "A Multi-clustering Unbiased Relative Prediction Recommendation Scheme for Data with Hidden Multiple Overlaps" It's always fun to see the research being done with data from our partners! Check out this recent conference paper by researchers at Shenkar College of Engineering, Design and Art, featuring data from People Data Labs! 🔗 https://lnkd.in/dkhAHfZS
Dewey’s Post
More Relevant Posts
-
Got a paper that I started working on in 2019 published today! Looking back at this paper on how to compare Neyman and Fisher inference, I can't help drawing a parallel to recent discussions I have seen about Bayesian vs. frequentist inference – it is really hard to come up with a 'metric' for comparison that gives both methods their due merit, and it is easy to completely miss the point of one of the methods in the choice of how to compare them.
Is Fisher inference inferior to Neyman inference for policy analysis? - Statistical Papers
link.springer.com
-
Hint for balancing a binary dataset: the Dolan–Moré curve below shows classification quality after applying resampling methods (random undersampling, random oversampling, SMOTE) and resampling-multiplier selection (equalizing strategy and CV search) to imbalanced real and artificial datasets. 😊
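For readers who want a starting point, here is a minimal sketch of the simplest of those methods, random oversampling, in plain Python. The function name and toy data are illustrative, not from the post; libraries such as imbalanced-learn provide production versions of all three methods.

```python
import random

def random_oversample(X, y, seed=0):
    """Balance a binary dataset by duplicating randomly chosen
    minority-class samples until both classes have equal counts."""
    rng = random.Random(seed)
    labels = sorted(set(y))
    groups = {c: [x for x, t in zip(X, y) if t == c] for c in labels}
    minority = min(labels, key=lambda c: len(groups[c]))
    majority = max(labels, key=lambda c: len(groups[c]))
    deficit = len(groups[majority]) - len(groups[minority])
    extra = [rng.choice(groups[minority]) for _ in range(deficit)]
    return X + extra, y + [minority] * deficit

# Toy imbalanced data: four samples of class 0, two of class 1.
X_bal, y_bal = random_oversample([[0], [1], [2], [3], [4], [5]],
                                 [0, 0, 0, 0, 1, 1])
print(y_bal.count(0), y_bal.count(1))  # → 4 4
```

Equalizing the class counts is the "equalizing strategy" from the post; the CV-search variant instead tunes how much to resample via cross-validation.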
-
-
I'm happy to announce a new Medium article written by Carlo Lipizzi and myself about bias and trust in large language models! In this article, we discuss the challenges of quantifying and reducing bias so that public trust in such models can increase. Here is the link to the article: https://lnkd.in/eq-9phvs. This article provides a sneak peek into our research over the last few months. Stay tuned for a full-length paper that aims to propose novel metrics for quantifying bias in large language models and mechanisms for reducing it!
Bias and Trust in Large-Language Models — Quantification and Resolution
medium.com
-
Data science is at the heart of many of the innovative solutions we see on the market. Yet for professionals working in this area, setting up an environment has been a constant struggle. They often spend more time preparing it than actually exploring new ways to get insight from data. But why is this a pain?

😥 𝐓𝐨𝐨𝐥𝐬 𝐟𝐫𝐚𝐠𝐦𝐞𝐧𝐭𝐚𝐭𝐢𝐨𝐧 𝐚𝐧𝐝 𝐜𝐨𝐧𝐟𝐢𝐠𝐮𝐫𝐚𝐭𝐢𝐨𝐧: This is common across the whole industry, and the rapid development of new libraries, frameworks and tools does not make it easier. To that we also need to add the time spent actually configuring different parts of the stack, such as the GPUs.

😥 𝐃𝐫𝐢𝐯𝐞𝐫 𝐢𝐧𝐜𝐨𝐦𝐩𝐚𝐭𝐢𝐛𝐢𝐥𝐢𝐭𝐲: Dependency versioning is often a nightmare for data scientists and data engineers, especially when they are less experienced. Every update or upgrade risks breaking the entire ML environment.

😥 𝐒𝐞𝐭𝐮𝐩 𝐭𝐢𝐦𝐞: It often takes much longer than expected, mostly because of the abundance of tools to choose from, possible incompatibilities, and the additional configuration that needs to be done.

😥 𝐂𝐨𝐦𝐩𝐮𝐭𝐞 𝐫𝐞𝐬𝐨𝐮𝐫𝐜𝐞𝐬: While powerful hardware is needed to run AI at scale, for anyone doing initial exploration an AI workstation, ideally with a GPU, should be enough.

Canonical's mission to innovate at speed with open-source AI has led us to put together Data Science Stack, a developer tool that helps you set up your ML environment with ease and within minutes. Our objective is to see whether we address the challenges of data scientists and data engineers out there, so together with Lidia Luna Puerta we are conducting user research to gather feedback about our beta release. We won't take much of your time, and you will have your ML environment set up very quickly. Just sign up here: https://lnkd.in/dV2MBxHC #datascience #dataengineer #aiml #opensource
User Research: getting started with the DSS - Lidia Luna Puerta
calendly.com
-
Director | Data Scientist (MSc Computer Science and Data Science) | Certified Statistician (ICCSSA) | AI & Data Science Educator | Consultant | Talks about #ArtificialIntelligence, #MachineLearning, #DataLeadership
I love having friends in the industry who show me cool new techniques. My good friend Daniel Simpson (a genius, by the way; follow him and you won't regret it) introduced me to a new technique for measuring correlation that overcomes many of the problems with the Pearson and Spearman methods and allows measuring correlation on non-linear datasets WITHOUT assuming an underlying distribution. This is huge. Check out the article here: https://lnkd.in/emjcgS_p
A New Coefficient of Correlation
towardsdatascience.com
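The coefficient the article covers is Chatterjee's rank-based ξ. As a rough illustration, here is a minimal sketch of the no-ties case (the function name and toy data are mine, not the article's code): sort the pairs by x, rank the resulting y values, and compute ξ = 1 − 3·Σ|r[i+1] − r[i]| / (n² − 1). It approaches 1 when y is a (possibly non-monotone) function of x and hovers near 0 under independence.

```python
def chatterjee_xi(x, y):
    """Chatterjee's correlation coefficient, no-ties case:
    xi = 1 - 3 * sum(|r[i+1] - r[i]|) / (n^2 - 1),
    where r are the ranks of y after sorting the pairs by x."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    y_sorted = [y[i] for i in order]
    rank = {v: i + 1 for i, v in enumerate(sorted(y_sorted))}
    r = [rank[v] for v in y_sorted]
    num = sum(abs(r[i + 1] - r[i]) for i in range(n - 1))
    return 1 - 3 * num / (n ** 2 - 1)

xs = list(range(10))
print(chatterjee_xi(xs, [v * v for v in xs]))  # non-linear but dependent: high
```

Note that even for a perfect functional relationship ξ is below 1 at small n (here 1 − 27/99 ≈ 0.73); it converges to 1 as n grows. SciPy now ships an implementation with proper tie handling.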
-
New paper on arXiv!
We just posted a new paper, "Pareto optimal proxy metrics", on arXiv: https://lnkd.in/gYVbF3fj Good proxy metrics (1) predict the long-term impact on the north star metric and (2) are more sensitive in the short run. Existing literature mostly focuses on predicting the long-term impact; in contrast, our method finds the optimal trade-off between predicting long-term impact and short-term sensitivity! IMO, our paper provides super clear problem framing and definitions of the key entities for anyone interested in the proxy-metric problem. We're currently working on a few extensions, so we should have more on this topic within the next few months. Thanks for taking a look, and feel free to message me if you have any feedback!
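The trade-off the post describes can be made concrete with a generic Pareto-frontier filter. This is not the paper's method, just a sketch of the underlying idea: given candidate proxies scored on (long-term predictiveness, short-term sensitivity), keep only those not dominated on both axes. The scores below are made up for illustration.

```python
def pareto_front(points):
    """Keep only points not dominated on (predictiveness, sensitivity):
    a point is dominated if some other point is at least as good on both
    axes and strictly better on at least one."""
    front = []
    for i, (p, s) in enumerate(points):
        dominated = any(
            q >= p and t >= s and (q > p or t > s)
            for j, (q, t) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append((p, s))
    return front

# Hypothetical proxy candidates scored on the two axes.
candidates = [(0.9, 0.1), (0.5, 0.5), (0.1, 0.9), (0.4, 0.4), (0.2, 0.2)]
print(pareto_front(candidates))  # the three non-dominated trade-offs survive
```

The dominated candidates (0.4, 0.4) and (0.2, 0.2) drop out because (0.5, 0.5) beats them on both axes; choosing among the survivors is where a method like the paper's comes in.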
2307.01000.pdf
arxiv.org
-
Week in review - 2024-05-31 https://lnkd.in/gqpkVaFW

I stumbled across:
• Data notebook comparisons https://lnkd.in/gp4TMQHH - This website exists to compare the features of different data science notebook tools.
• All about GiB vs GB https://lnkd.in/gFrHDwUM - I was completely unaware of this previously. Apparently there have been lawsuits about this!

I've been thinking about data quality reporting. I'm currently using Databricks notebooks to explore some data and report on some insights. I will be looking into this more.

I read the first two chapters of "Solve Any Data Analysis Problem: Eight projects that show you how", and I'm considering continuing my data processing experiment codebase to see how well it might be applied to this problem.

I've been listening to:

Full Stack approach for effective AI agents https://lnkd.in/gUwpfuYN
Notes:
• Is it just a better version of Google at the moment?
• The main problem is correctness.
• 60-80% correct isn't good enough.
• Can agents make it more robust/correct?
• What can you use agents for now?

How to sell like Steve Jobs https://lnkd.in/griugg8x
Notes:
• Start with a problem that needs solving.
• Lots more in here that I didn't take notes on.

The Hard Truths Of Software Development https://lnkd.in/gyffqHGH
Notes:
• Adopting ideas is different from knowing ideas.
• Getting it wrong by preferring process over people.
• Be more feedback-driven than process-driven.

AI Coding tools comparison:
• https://lnkd.in/gfFZq89T
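The GiB-vs-GB distinction mentioned above comes down to decimal (SI) versus binary (IEC) prefixes, which a few lines make concrete (the "500 GB drive" is a made-up example):

```python
# SI prefix: 1 GB = 10**9 bytes. IEC binary prefix: 1 GiB = 2**30 bytes.
GB = 10 ** 9
GiB = 2 ** 30

disk_bytes = 500 * GB                  # a drive marketed as "500 GB"
print(f"{disk_bytes / GiB:.2f} GiB")   # → 465.66 GiB, as an OS may report it
print(GiB / GB)                        # → 1.073741824 (GiB is ~7.4% bigger)
```

The ~7% gap per gigabyte is exactly what the drive-capacity lawsuits were about: marketing uses the decimal unit, while operating systems often report the binary one.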
Week in review - 2024-05-31
paulr70.substack.com
-
We can view the bag-of-documents model as a corollary to the cluster hypothesis: if all documents relevant to a query are similar to one another, then they are also similar to their mean or centroid.
Bags of Documents and the Cluster Hypothesis
dtunkelang.medium.com
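A toy sketch of that claim (the document vectors are made up, and cosine similarity stands in for whatever similarity measure the retrieval system uses): three mutually similar "relevant document" vectors are each very close to their centroid.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Toy "relevant document" vectors that are close to one another.
docs = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.2], [1.0, 1.0, 0.15]]

# Mean (centroid) of the documents, computed coordinate-wise.
centroid = [sum(col) / len(docs) for col in zip(*docs)]

for d in docs:
    print(round(cosine(d, centroid), 3))  # each is near 1.0
```

This is only the intuition, not a proof; the article develops the argument properly.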
-
Data to power innovative products
This is so awesome!! So proud of the PDL and Dewey teams for continuing to expand the impact of aggregated employment data!