Siphelele Danisa’s Post

This week, we will focus on a connection I have found interesting recently involving stochastic gradient descent (SGD). We have talked about SGD at a high level before, and this week it is worth taking a step back to loosely define the algorithm. SGD is an iterative method for optimizing an objective function, used widely in machine learning and deep learning for training models. It can be regarded as a stochastic approximation of gradient descent, since it replaces the actual gradient, computed from the entire dataset, with an estimate computed from a randomly selected subset (a minibatch) of the data. Especially in high-dimensional optimization problems, this reduces the computational burden, trading a likely lower convergence rate for much cheaper iterations.

The result attached below provides an alternative perspective on SGD in the continuous-time limit (that is, assuming an infinitesimal step size). Researchers have used this continuous-time formulation to prove several properties of the algorithm; on Thursday, we will see some of these properties and yet another perspective. In the image below: D is called the diffusion matrix and is defined as the covariance of the stochastic gradients in SGD, f is the function being minimized, eta is the step size, and the small curly b is the batch size.
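Before getting to that continuous-time result, here is a minimal minibatch SGD sketch in Python on a toy least-squares problem, just to make the discrete update concrete; the problem, variable names, and hyperparameters are purely illustrative.

import numpy as np

# Toy least-squares objective f(w) = (1/N) * sum_i (x_i . w - y_i)^2.
# Everything below (sizes, seed, step size) is an illustrative choice.
rng = np.random.default_rng(0)
N, d = 1000, 5
X = rng.normal(size=(N, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=N)

eta = 0.05   # step size ("eta" above)
b = 32       # batch size (the "small curly b" above)
w = np.zeros(d)

for step in range(2000):
    idx = rng.choice(N, size=b, replace=False)      # randomly selected subset of the data
    residual = X[idx] @ w - y[idx]
    grad_est = 2.0 * X[idx].T @ residual / b        # minibatch estimate of the full gradient
    w -= eta * grad_est                             # SGD update

print(np.linalg.norm(w - w_true))   # should end up small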

  • [Image: the continuous-time (stochastic differential equation) result referenced above, involving D, f, eta, and b; no alternative text was provided.]
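Since the image carries no alt text, here is one common way this continuous-time limit is written in the literature (roughly the formulation used by Chaudhari and Soatto). I am assuming, not asserting, that the attached result takes this form, and the exact scaling convention is precisely what the comment thread below is about.

% SDE approximation of SGD (one common convention; other papers rescale differently):
%   x_t  : parameters,   f : objective,   eta : step size,   b : batch size,
%   D(x) : covariance of the per-sample stochastic gradients.
\[
  \mathrm{d}x_t \;=\; -\nabla f(x_t)\,\mathrm{d}t
      \;+\; \sqrt{2\beta^{-1} D(x_t)}\;\mathrm{d}W_t,
  \qquad \beta^{-1} \;=\; \frac{\eta}{2b}.
\]
% Under this convention the effective diffusion coefficient is (eta/2b) times the
% per-sample gradient covariance, i.e. roughly eta/2 times the covariance of the
% minibatch gradient, which is the scaling point raised in the comments.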
Yuri Robbertze

Lecturer at University of Cape Town, Quantitative Risk Analyst at Old Mutual Limited

1mo

Why do you call eta "eta", but beta "the small curly b"? 😂

Rishit Dagli

CS UG University of Toronto | AI Research, Qualcomm | Research ML, Vision UofT, Vector Institute | Prev: Civo, SpaceX, JWST | RT Kubernetes 1.26-9, TEDx, TED-Ed

1mo

No, D is not equal to the SGN (stochastic gradient noise) covariance; it is a scaled version of that: it would be eta/2 times the SGN covariance, right?
