The Colab demo of the Topics classifier is incorrect and not aligned with Google Chrome's implementation of the Topics API #307

Closed
yohhaan opened this issue Apr 18, 2024 · 1 comment

yohhaan commented Apr 18, 2024

Problem Description

The documentation of the Topics API for the web links to a demo on Google Colab that performs inferences with the model used by Chrome. However, this demo does not follow the same algorithm as the one executed in Google Chrome. As a result, classification results differ.

Some differences in the Colab:

  • the old taxonomy v1 is used
  • a pre-processing step is missing: removal of the www. prefix
  • post-filtering of the model's inference scores is incorrect: Chrome's implementation is more involved (top 5 scores kept, minimum thresholds, check of the "Unknown" topic's contribution, normalization, etc.)
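
For illustration, the pre-processing and post-filtering steps listed above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not Chrome's actual code: the function names, the default threshold values, the `UNKNOWN_TOPIC_ID` value, and the exact order of the checks are all hypothetical choices for the example. The authoritative logic and values are in Chrome's source.

```python
from typing import Dict, List, Tuple

# Assumption: placeholder ID for the "Unknown" topic, chosen for this sketch.
UNKNOWN_TOPIC_ID = -2


def strip_www(hostname: str) -> str:
    """Pre-processing: remove a leading 'www.' prefix, if present."""
    prefix = "www."
    return hostname[len(prefix):] if hostname.startswith(prefix) else hostname


def post_filter(
    scores: Dict[int, float],
    max_topics: int = 5,             # keep only the top 5 scores
    max_unknown_share: float = 0.8,  # assumption: placeholder threshold
    min_score: float = 0.1,          # assumption: placeholder threshold
    min_normalized: float = 0.25,    # assumption: placeholder threshold
) -> List[Tuple[int, float]]:
    """Filter raw model scores (topic ID -> score) into a short topic list."""
    # 1. Keep only the highest-scoring topics.
    top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:max_topics]
    total = sum(score for _, score in top)
    if total <= 0:
        return []

    # 2. If "Unknown" dominates the top scores, report no topics at all.
    unknown_score = dict(top).get(UNKNOWN_TOPIC_ID, 0.0)
    if unknown_score / total > max_unknown_share:
        return []

    # 3. Drop "Unknown" itself and any topic below the minimum raw score.
    kept = [(t, s) for t, s in top if t != UNKNOWN_TOPIC_ID and s >= min_score]

    # 4. Normalize against the top-score total and apply the second threshold.
    normalized = [(t, s / total) for t, s in kept]
    return [(t, w) for t, w in normalized if w >= min_normalized]
```

A hostname would first go through `strip_www`, then through the model, and the resulting scores through `post_filter`. Comparing the output against Chrome on a few known hostnames is the only reliable way to check that a reimplementation matches.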

I would suggest updating the Colab demo to exactly mirror Google Chrome's implementation of the Topics API for the web. This would avoid potential confusion due to classification mismatches between the Colab and Chrome implementations.

Resources

  • In this blog post and this paper, I describe the steps that Google Chrome performs when the Topics API classifies a hostname; see in particular the post-filtering algorithm.

  • Here is my correct reimplementation of the classification performed in Google Chrome: https://github.com/yohhaan/topics_classifier

jkarlin (Collaborator) commented Jul 9, 2024

Hi, thanks for creating the issue. The Colab demo that you reference was meant as a one-time demonstration of how one might extract and use the classifier model, not as an evergreen document. We leave it as an exercise to developers to look at Chrome's code and to keep up with further changes if they want to copy Chrome's behavior over time.

jkarlin closed this as completed Jul 9, 2024