Excited to share insights from our CEO Oren Netzer's engaging talk at the AdTech Cookieless ML NYC event. Oren discussed the innovative use of log file data for building machine learning models amid the evolving privacy landscape. Big thank you to Uri Goren and argmax for organizing the event and inviting us to speak. We explored how the shift away from cookies creates new opportunities for ad tech through advanced data structures and techniques, notably enhancing model training and efficiency. For those refining ML strategies in ad tech, make sure to watch the full conversation. #AdTech #MachineLearning #DataScience #PrivacyTechnology
About us
Build a Better ML Model. 10x Faster. Reduce your dataset to a small subset that maintains the statistical properties and corner cases of your full dataset. Use standard libraries to explore, clean, label, train and tune your models on a smaller subset, and build a higher quality model, faster.
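The core idea behind such a representative subset can be sketched with plain importance sampling. This is a generic illustration only (the function name and the sensitivity proxy are our own), not the DataHeroes library API:

```python
import numpy as np

def importance_sample_coreset(X, m, rng=None):
    """Sample a weighted subset of m rows whose weighted statistics
    approximate those of the full dataset X (illustrative sketch)."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    # Crude sensitivity proxy: distance from the mean, mixed with a
    # uniform floor so no point has zero sampling probability.
    dists = np.linalg.norm(X - X.mean(axis=0), axis=1)
    probs = 0.5 * dists / dists.sum() + 0.5 / n
    idx = rng.choice(n, size=m, replace=True, p=probs)
    # Inverse-probability weights keep weighted sums unbiased.
    weights = 1.0 / (m * probs[idx])
    return X[idx], weights

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 5))
Xc, w = importance_sample_coreset(X, m=500, rng=1)
# The weighted subset mean tracks the full-data mean.
approx_mean = (Xc * w[:, None]).sum(axis=0) / w.sum()
```

Real coreset constructions use much sharper sensitivity bounds tailored to the target model, which is what allows a 500-row subset to stand in for millions of rows during training and tuning.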
- Website: https://www.dataheroes.ai
- Industry: Software Development
- Company size: 11-50 employees
- Type: Privately Held
Employees at DataHeroes
- David Blumberg - Founder & Managing Partner, Blumberg Capital
- Oren Netzer - Founder and former CEO of DoubleVerify (NYSE: DV); Co-Founder and CEO, DataHeroes
- Igor Vainer - Data Scientist & Machine Learning Lead at cClearly | Data Hero
- Vittesh Sahni - Top AI Voice | Head of AI @ Acorns | Technical Advisor | AI Practitioner & Thought Leader
Updates
-
We are thrilled to join forces with Andrei Lapets from Magnite, Isaac Foster from Microsoft, and Uri Goren from argmax for an engaging day of discussions in New York on April 23rd. This gathering will feature an elite group of ad tech, data science, and engineering leaders. Our conversation will cover the latest in privacy technologies, ML model training, and much more. Secure your spot by RSVPing here: https://lu.ma/cookieless
-
🚀 Exciting news! Our CEO, Oren Netzer, spoke about Real-Time Machine Learning at PyData Global 2023. Discover how data scientists can effortlessly update ML models with fresh data without costly retraining. Watch the recording now: https://lnkd.in/d2bKpdmc #PyData #MachineLearning #DataScience #Innovation
DataHeroes CEO Oren Netzer talks Real-Time Machine Learning at PyData Global 2023
https://www.youtube.com/
-
🚀✨ Exciting News! Oren Netzer, the CEO of DataHeroes, was recently featured on Episode #65 of the XtrawAI.com podcast, where he delved into the innovative world of Coresets for AI modeling. #AI #MachineLearning #DataScience #Innovation #Technology
-
🚀 Exciting News from DataHeroes! 🚀

We are thrilled to announce that DataHeroes has been selected for Intel Ignite's Fall 2023 US Cohort! 🎉 Intel Ignite, Intel Corporation's startup accelerator program for early-stage deep tech startups, has chosen us as one of the ten companies to participate in this prestigious 12-week program. 🌟

We were selected from a pool of over 300 incredible companies, and we are honored to be part of a cohort that includes visionary startups working on innovations ranging from cloud computing to health-tech, AI, data management, and more.

At DataHeroes, we are on a mission to revolutionize the world of data-centric AI by significantly accelerating model training time and reducing training costs. We are proud to be among the startups working towards reimagining the future across industries.

We want to express our heartfelt thanks to Intel Corporation for this incredible opportunity. Mark Castleman, Sharad Garg, Reed Frerichs and Kelsie Pallanck - thank you for believing in our vision!

And congratulations to our fellow cohort members: Bloch Quantum, TryCarbonara, CLIKA Inc., CloudNatix Inc., DynamoFL, Gaia AI, Intramotev, Mesodyne and Tristar AI. We look forward to collaborating with you all and making a lasting impact together! 🙌

Stay tuned for more updates as we embark on this exciting journey with Intel Ignite. 🌍 #DataHeroes #IntelIgnite #TechInnovation #AI #DeepTech #StartupAcceleration #FutureTech #InnovationJourney
Intel Ignite Announces Startups Selected for Fall 2023 US Cohort
intel.com
-
Have you ever wondered about the magic behind algorithms that help us make decisions based on data? Dive into our latest guide on Decision Trees!
💡 Explore the diverse family of tree-based models, from the fundamental decision tree to ensemble methods like Random Forest and Gradient Boosting Trees.
💡 Discover how powerful tools like XGBoost, LightGBM, and CatBoost change the game.
💡 Delve into key components, prediction methods, advantages, and the challenges decision trees face.
💡 Gain insights into pruning, handling imbalanced data, feature importance, and the trade-off between interpretability and complexity.
Whether you're a seasoned data scientist or an ML enthusiast, our guide offers something for everyone. Ready to leaf through this knowledge? #DecisionTrees #MachineLearning #DataScience
Decision Trees: An Overview and Practical Guide
dataheroes.ai
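As a minimal illustration of the ideas the guide covers (scikit-learn on a bundled dataset; the dataset and hyperparameters are our own choices, not from the guide):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# A single shallow tree: max_depth is a simple pre-pruning knob that
# trades accuracy for interpretability.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)

# A Random Forest ensembles many deeper trees and usually generalizes
# better, at the cost of being harder to interpret.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

tree_acc = tree.score(X_te, y_te)
forest_acc = forest.score(X_te, y_te)

# Feature importances show which inputs drive the splits.
top_feature = forest.feature_importances_.argmax()
```

Swapping the forest for XGBoost, LightGBM, or CatBoost follows the same fit/score pattern; the guide walks through when each is the better fit.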
-
Modern machine learning thrives on two core principles: model quality and performance. While many methods aim to strike a balance between the two, not all prove equally effective. Enter Coresets and Coreset Trees - techniques poised to redefine the ML landscape. In our latest blog post, we:
👉 Explore the trade-off between model quality and performance
👉 Delve into industry-recognized practices like feature engineering, algorithm optimization, and distributed computing
👉 Uncover the transformative potential of Coresets & Coreset Trees
Dive into the details of these techniques and how they promise a more efficient future for ML practitioners. #MachineLearning #Coresets #ModelOptimization
Enhancing Model Quality and Performance: A Deep Dive into Coresets and Innovative Techniques
dataheroes.ai
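The "Coreset Tree" construction is commonly described as merge-and-reduce over chunks of data. A toy sketch of that structure (uniform subsampling stands in for the real sensitivity-based reduce step, and all names are our own):

```python
import numpy as np

def reduce_step(X, m, rng):
    # Toy "reduce": uniform subsample. Real coreset trees use
    # sensitivity-based sampling and carry per-point weights.
    idx = rng.choice(len(X), size=min(m, len(X)), replace=False)
    return X[idx]

def coreset_tree_root(chunks, m, seed=0):
    """Merge-and-reduce: summarize each chunk, then repeatedly merge
    pairs of summaries and re-reduce until one root summary remains."""
    rng = np.random.default_rng(seed)
    level = [reduce_step(c, m, rng) for c in chunks]
    while len(level) > 1:
        merged = []
        for i in range(0, len(level) - 1, 2):
            merged.append(reduce_step(np.vstack([level[i], level[i + 1]]), m, rng))
        if len(level) % 2:  # an odd node carries over to the next level
            merged.append(level[-1])
        level = merged
    return level[0]

# Eight chunks of a streamed dataset collapse into one m-row summary.
rng = np.random.default_rng(42)
chunks = [rng.normal(size=(1_000, 4)) for _ in range(8)]
root = coreset_tree_root(chunks, m=200)
```

The tree shape is what makes updates cheap: adding a new chunk only touches one path from a leaf to the root, rather than reprocessing the full dataset.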
-
🎉 Exciting News from DataHeroes! 🎉 We've just rolled out version 0.7.0 of the DataHeroes library! This latest upgrade brings:
✅ Seamless handling of datasets with missing values during the coreset build.
✅ Enhanced grid search now supports unsupervised coresets.
✅ Better detection and pre-processing for categorical features.
Want to see the full rundown? Check out our detailed release notes: https://lnkd.in/dakwzTeX Huge shoutout to our incredible team for their dedication and hard work on this release. 🌟
-
Artificial intelligence is evolving, with Data-Centric AI taking the forefront. Prioritizing the quality, diversity, and ethical considerations of data, this approach emphasizes the pivotal role data plays in driving AI advancements. In our newest article, we:
👉 Delve deep into the crucial role of data in Machine Learning
👉 Examine challenges like data bias, data privacy, and data imbalance
👉 Introduce the concept of coresets for efficient model training
From data collection to coresets, we unravel the transformative power of Data-Centric AI. Dive into the full article to understand how you can harness this approach to build robust AI systems. 🔎 #DataCentricAI #MachineLearning #Coresets #AIInnovations
Understanding the Power of Data-Centric AI
dataheroes.ai
-
We are beyond thrilled to share the groundbreaking work of our Lead Coreset Researcher, Morad Tukan. At ICML'23, Morad and his collaborators presented two phenomenal papers, pushing the boundaries of deep learning and coreset construction. From pioneering methodologies for efficient neural network training to developing practical frameworks applicable to any loss function, Morad's contributions are reshaping the future of machine learning. We are proud to foster an environment where creativity and scientific excellence thrive. #DeepLearning #Coresets #Research #Innovation #DataHeroes
Hello everyone, we are happy to announce that we presented two papers at ICML'23:

1) Provable Data Subset Selection For Efficient Neural Network Training
In collaboration with Samson Zhou, Alaa Maalouf, Daniela Rus, Vladimir Braverman, and Dan Feldman.
In this paper, we introduce a coreset construction methodology for subset selection in deep neural networks. Our approach handles any continuous function represented by a Radial Basis Function Neural Network (RBFNN), known for its universal approximation capabilities. Our coresets, which are small weighted subsets, approximate the loss of the input data on an RBFNN, thereby enabling the approximation of any function the RBFNN defines on the original data. Building on this coreset framework, we present a provable subset selection technique to enhance the training of deep neural networks. Our method is designed to boost the performance of widely used neural network architectures across various datasets. Empirical evaluations on function approximation tasks and on subset selection for boosting deep neural networks demonstrate the efficiency and efficacy of the methodology. If you're interested in the intersection of deep learning and subset selection, check out our paper! #DeepLearning #SubsetSelection #NeuralNetworks #Research
* Code: https://lnkd.in/eN5EBEKX
* Paper: https://lnkd.in/e2bfgD2n

2) AutoCoreset: An Automatic Practical Coreset Construction Framework
In collaboration with Alaa Maalouf, Vladimir Braverman, and Daniela Rus.
In this paper, we present a practical coreset construction framework applicable to any loss function. Coresets are typically problem-dependent, and their existence can be difficult to guarantee theoretically; our framework overcomes these limitations, providing a practical solution that requires only the input data and the cost function. This makes it suitable for a wide range of machine learning problems. If you're interested in enhancing the efficiency of your ML projects, check out our work! #MachineLearning #CoresetConstruction #Efficiency #Research
* Code: https://lnkd.in/ek9yEVnS
* Paper: https://lnkd.in/eRAWbk8m

I'm eager to hear your valuable insights and feedback.
GitHub - muradtuk/Provable-Data-Subset-Selection-For-Efficient-Neural-Network-Training: Provable Data Subset Selection For Efficient Neural Network Training
github.com
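A simplified sketch of the "works for any loss function" idea: build a matrix of per-point losses under a set of candidate models, estimate each point's sensitivity from it, and importance-sample. This is our own illustrative construction, not the actual AutoCoreset algorithm; the names and the squared-distance loss are assumptions for the example.

```python
import numpy as np

def loss_matrix_coreset(losses, m, rng=None):
    """Given losses[i, j] = cost of point i under candidate model j
    (entries >= 0), importance-sample m points with inverse-probability
    weights. Points that can dominate the total loss are sampled more."""
    rng = np.random.default_rng(rng)
    share = losses / losses.sum(axis=0, keepdims=True)
    sens = share.max(axis=1)  # worst-case share of the total loss
    probs = sens / sens.sum()
    idx = rng.choice(len(losses), size=m, replace=True, p=probs)
    weights = 1.0 / (m * probs[idx])
    return idx, weights

rng = np.random.default_rng(0)
X = rng.normal(size=(5_000, 3))
centers = rng.normal(size=(5, 3))  # candidate "models"
L = ((X[:, None, :] - centers[None]) ** 2).sum(axis=2)  # squared-distance loss

idx, w = loss_matrix_coreset(L, m=500, rng=1)
# The weighted coreset loss approximates the full loss per candidate model.
full = L.sum(axis=0)
approx = (w[:, None] * L[idx]).sum(axis=0)
rel_err = np.abs(approx - full) / full
```

The appeal of this shape is exactly what the post highlights: nothing here depends on the specific model family, only on being able to evaluate the cost function on the data.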