
Releases: privacy-tech-lab/gpc-web-crawler

June 2024 Crawl

10 Jun 04:34
24e732e

Differences from April 2024 Crawl:

  • addition of a GPP version field that identifies whether the site is using GPP v1.0 or v1.1

April 2024 Crawl

18 Apr 18:29
b302a2a

Differences from February 2024 Crawl:

  • well-known data is no longer collected by the crawler. We use a Python script instead, which is also included in this repo.
  • longer database values are now stored as TEXT instead of VARCHAR
  • addition of OneTrustWPCCPAGoogleOptOut and OTGPPConsent cookies
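
As a rough illustration of collecting well-known data outside the crawler, the sketch below parses the body of a GPC support resource. It assumes that "well-known data" refers to the `/.well-known/gpc.json` file defined by the GPC specification, whose `gpc` member is a boolean. The function name `parse_gpc_well_known` is hypothetical, and this is not the script shipped in the repo.

```python
import json
from typing import Optional


def parse_gpc_well_known(body: str) -> Optional[bool]:
    """Return the `gpc` flag from a /.well-known/gpc.json body.

    Hypothetical helper: returns None when the body is not JSON,
    is not a JSON object, or has no boolean `gpc` member.
    """
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        # Many sites answer with an HTML error page instead of JSON.
        return None
    if not isinstance(data, dict):
        return None
    value = data.get("gpc")
    return value if isinstance(value, bool) else None
```

Fetching the body (for example with an HTTP client of your choice) is left out so that the parsing step can be checked on its own.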

February 2024 Crawl

13 Feb 02:01
d2545de

This is largely the same as the December 2023 crawl code.

Differences:

  • well-known data is collected by the crawler
  • column values in the debugging table are capped at 4,000 characters to match the column length specified in our table
  • one new human check regular expression
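
The 4,000-character cap on debugging values could look something like the following minimal sketch. The constant and function names are hypothetical, and the crawler's actual code may apply the cap differently.

```python
MAX_DEBUG_CHARS = 4000  # cap stated in the release notes above


def cap_value(value: str, limit: int = MAX_DEBUG_CHARS) -> str:
    """Truncate a value so it fits a column of `limit` characters.

    Hypothetical helper: shorter values are returned unchanged.
    """
    return value[:limit]
```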

December 2023 Crawl

03 Jan 17:14
72f5d87

This is the code we used to perform our crawl on 11,708 sites in December 2023.

The extension collects data from Firefox's urlClassification object to determine whether a site is subject to the CCPA. It collects the US Privacy String (USPS), the GPP string, and the OptanonConsent cookie to determine whether sites recognize GPC signals. This version uses a SQL database to store the data.
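
To make the US Privacy String check more concrete, here is a minimal sketch of reading the opt-out flag. It assumes the four-character IAB format (version, notice given, opt-out of sale, LSPA coverage, with the last three each being `Y`, `N`, or `-`). The names `USPrivacy`, `parse_us_privacy`, and `signals_opt_out` are hypothetical and are not taken from the extension.

```python
from typing import NamedTuple, Optional


class USPrivacy(NamedTuple):
    version: int
    notice_given: str   # "Y", "N", or "-"
    opt_out_sale: str   # "Y", "N", or "-"
    lspa_covered: str   # "Y", "N", or "-"


def parse_us_privacy(value: str) -> Optional[USPrivacy]:
    """Parse a US Privacy String such as "1YYN"; return None if malformed."""
    if len(value) != 4 or value[0] not in "0123456789":
        return None
    flags = value[1:].upper()
    if any(flag not in "YN-" for flag in flags):
        return None
    return USPrivacy(int(value[0]), flags[0], flags[1], flags[2])


def signals_opt_out(value: str) -> bool:
    """True when the string is well formed and the opt-out-of-sale flag is "Y"."""
    parsed = parse_us_privacy(value)
    return parsed is not None and parsed.opt_out_sale == "Y"
```

One way to use this is to compare `signals_opt_out` on the string captured before and after a GPC signal is sent: a change from `False` to `True` would suggest that the site recognized the signal.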

Firefox-analysis-mode-crawler

19 Aug 18:48
7badcbe

The Firefox-analysis-mode-crawler is used to crawl the top 1,000 sites of the US Privacy String Test Set.