Creative pre-registration strategies #792

Open
rdgordon-index opened this issue Sep 11, 2023 · 45 comments

Comments

@rdgordon-index
Contributor

As suggested in the explainer, sellers have the ability to fetch additional real-time signals based on a combination of renderURL and hostname (representing the publisher’s domain) that can be used during scoreAd() when scoring creatives. Specifically:

Similarly, sellers may want to fetch information about a specific creative, e.g. the results of some out-of-band ad scanning system

Checking whether the creative contents have been pre-approved by the seller. This could be implemented by an out-of-band creative review process…

In today’s programmatic ecosystem, buyers communicate their creative markup via the bid.adm during RTB, alongside other key bid metadata (advertiser domain, seat ID, IAB category, creative format, creative & campaign identifiers, etc.); however, no creative URL (aka renderURL) containing the markup is provided. As a result, there is no existing mechanism by which SSPs can obtain this URL for all existing creatives submitted in contextual auctions.

This necessarily means that all existing creatives are unable to be served in PA auctions, since creatives have to be pre-approved in order to be scored with a desirability > 0. Otherwise, the rejectReason for all PA creatives returned by scoreAd() would be pending-approval-by-exchange.

This also necessitates a mechanism to initiate such PA creative registration via renderURL, which poses some challenges, as outlined below.

The most naïve such mechanism, available today, would leverage the forDebuggingOnly.reportAdAuctionLoss() endpoint – that is, for any renderURL not found in the seller’s K/V server, initiate an API call to a seller endpoint to indicate that said renderURL has not yet been approved. According to #632 (comment), this function will be available until the end of 3PCD, and should suffice for short-term testing as well as during the 1% 3PCD time horizon.
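The flow described above can be modelled roughly as follows. This is a Python sketch of the decision logic only (a real scoreAd() is a JavaScript worklet function). The registration endpoint and the `approved` field in the K/V value are assumptions made for illustration; the returned URL stands in for what would be passed to forDebuggingOnly.reportAdAuctionLoss().

```python
from urllib.parse import urlencode

# Hypothetical seller endpoint that queues unknown creatives for review.
REGISTRATION_ENDPOINT = "https://www.example-ssp.com/register-creative"


def score_ad(render_url, bid, trusted_scoring_signals):
    """Model of a seller's scoring decision for one bid.

    Returns (result, debug_loss_url). debug_loss_url is the URL a seller
    might hand to forDebuggingOnly.reportAdAuctionLoss(), or None.
    """
    per_url = trusted_scoring_signals.get("renderURL", {})
    review = per_url.get(render_url)
    if review is None:
        # Creative unknown to the K/V server: reject it and ask for registration.
        debug_url = REGISTRATION_ENDPOINT + "?" + urlencode({"renderURL": render_url})
        result = {"desirability": 0, "rejectReason": "pending-approval-by-exchange"}
        return result, debug_url
    if not review.get("approved", False):  # "approved" is an assumed field name
        return {"desirability": 0, "rejectReason": "disapproved-by-exchange"}, None
    return {"desirability": bid}, None
```

Note that only the renderURL is available to forward here, which is exactly the limitation the challenges below describe.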

Challenges with this approach:

  • A significant volume of unregistered-creative API calls from each device, one for each such creative submitted via generateBid()

  • The K/V call doesn’t include a buyer origin - and the renderURL need not utilize the buyer origin – so there is no guaranteed way to map a URL to a given buyer (aka DSP)

  • Other key signals available in OpenRTB – such as adomain and seat – are not guaranteed to be made available to scoreAd() (and hence able to be passed into this debugging endpoint), despite them being required for creative registration. As the explainer puts it:

    The metadata accompanying the returned ad is not specified in this document, because sellers and buyers are free to establish whatever protocols they want here.

    As such, without an IAB standard for parameters like seat in renderURL, it’s unclear how buyers will be able to ensure that their creatives are being registered for all sellers.

Another approach would be to somehow leverage the Private Aggregation API, but this shares all of the challenges above, and it is also unclear how to bucket the fields required for registration (e.g. renderURL, seat, adomain). Furthermore, it would require the immediate adoption of this API (and its requirement for TEE) in order to be able to start registering creatives, and as such, this does not seem like a short-term solution.

@MattMenke2
Contributor

One thing we need to be careful about here is leaking data: renderURLs haven't been checked for k-anonymity, and so requesting them can leak data. For example, if we send them only when offered in a bid, then they could pass in a user ID for the publisher page that could be correlated with a user ID on the joining origin. 32 IGs could provide one bit of publisher page ID each, like: https://foo.test/bit-0-is-1?user=FreddyPharkas, https://foo.test/bit-1-is-0?user=FreddyPharkas, etc. Each URL has the full user ID in the joining origin, and one ordered bit from the top-level site where the auction is running.
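To make the bit-per-interest-group example concrete, here is a small Python sketch of how such URLs could be formed and what a server receiving them could recover. The function names are made up for illustration; the URL shape follows the foo.test example above.

```python
import re
from urllib.parse import parse_qs, urlparse


def urls_offered(joining_user: str, publisher_id: int, bits: int = 32) -> list[str]:
    """One renderURL per interest group: IG number i reveals bit i of publisher_id."""
    return [
        f"https://foo.test/bit-{i}-is-{(publisher_id >> i) & 1}?user={joining_user}"
        for i in range(bits)
    ]


def reconstruct(observed: list[str]) -> tuple[str, int]:
    """What a server that is sent those URLs could piece back together."""
    publisher_id = 0
    user = ""
    for url in observed:
        parsed = urlparse(url)
        match = re.fullmatch(r"/bit-(\d+)-is-([01])", parsed.path)
        if match is None:
            continue  # not one of the bit-carrying URLs
        index, bit = int(match.group(1)), int(match.group(2))
        publisher_id |= bit << index
        user = parse_qs(parsed.query)["user"][0]
    return user, publisher_id
```

Feeding the output of `urls_offered()` into `reconstruct()` returns both identifiers together, which is the cross-site join the browser is trying to prevent.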

Sending renderURLs on IG join would be more practical, but we don't know the seller origin to send the information to, and we'd need the IG to opt-in to sending the information (normally, offering a bid is considered to provide that permission).

So I think we need to figure out the privacy story here on how we can implement this without creating a new cross-top-level-origin information leak.

@rdgordon-index
Contributor Author

renderURLs haven't been checked for k-anonymity

Can you elaborate? As per https://github.com/WICG/turtledove/blob/main/FLEDGE.md#33-metadata-with-the-ad-bid, I wasn't expecting scoreAd() to ever receive a renderURL from generateBid() that didn't pass the k-anon check.

If generateBid() picks an ad whose rendering URL is not yet above the browser-enforced microtargeting prevention threshold, then the function will be called a second time, this time with a modified interestGroup argument that includes only the subset of the group's ads that are over threshold. (The under-threshold ad will, however, be counted towards the microtargeting thresholding for future auctions for this and other users.)

@michaelkleber
Collaborator

Sending on IG join would be great, from the privacy POV. If only IGs declared which sellers they were willing to bid with, this would be the preferred approach. But that hasn't been a required part of IG metadata until now. I suspect that if we propose it we will hear push-back, but maybe I'm being too pessimistic? Roni, want to pop my bubble quickly?

Can you elaborate? As per https://github.com/WICG/turtledove/blob/main/FLEDGE.md#33-metadata-with-the-ad-bid, I wasn't expecting scoreAd() to ever receive a renderURL from generateBid() that didn't pass the k-anon check.

We do pass bids along to scoreAd() even if they are for ads below the k-anon bar — we need to do that, otherwise we could never learn whether they would have been the winner, which is the condition for warming up their k-anonymity count.

@MattMenke2
Contributor

Only render URLs that win auctions (or rather, that would have won auctions) are registered with the k-anon server for the purposes of calculating k-anonymity, as otherwise, an ad could only be shown to a single user, despite appearing in IGs for a lot of users. So if you're blocking ads that you've never seen before, they'll never reach the k-anon threshold. Therefore, this would need to be done for non-k-anon ads.

@MattMenke2
Contributor

MattMenke2 commented Sep 12, 2023

And just to be clear - I mean the ads need to have won the top-level auction, in an environment that doesn't know whether they've met the k-anon threshold or not. My understanding is that you'd want to know the URL so it can be scanned before showing it anywhere. If that's not the case, and this can all be done after the ad has hit the k-anon threshold and we've already started showing the ad to users, this becomes much easier to do. We may need some sort of k-anon <renderURL, seller, component auction bool> check on how often an ad has won auctions, and once it's hit, have some way of conveying it to sellers, whether directly, or through an aggregation server of some sort.

@rdgordon-index
Contributor Author

rdgordon-index commented Sep 12, 2023

My understanding is that you'd want to know the URL so it can be scanned before showing it anywhere.

Correct.

check on how often an ad has won auctions, and once it's hit, have some way of conveying it to sellers

To be clear, there's no desire to trigger this registration under the k-anon threshold; in other words, if a creative won't be shown to N devices, then there's no need to register it "before" it reaches this threshold.

@rdgordon-index
Contributor Author

We do pass bids along to scoreAd() even if they are for ads below the k-anon bar

Only render URLs that win auctions (or rather, that would have won auctions) are registered with the k-anon server for the purposes of calculating k-anonymity

IMHO it isn't immediately obvious from the explainer that this ever reaches scoreAd() -- though, upon further inspection, it's somewhat implied by this text in https://github.com/WICG/turtledove/blob/main/FLEDGE.md#12-interest-group-attributes (emphasis added):

The browser will provide protection against microtargeting, by only rendering an ad if the same rendering URL is being shown to a sufficiently large number of people (e.g. at least 100 people would have seen the ad, if it were allowed to show)

@MattMenke2
Contributor

Agree that the explainer could be clearer on this point. I think this is the first case that's come up where the distinction really matters.

@rdgordon-index
Contributor Author

rdgordon-index commented Sep 12, 2023

Sending on IG join would be great, from the privacy POV. If only IGs declared which sellers they were willing to bid with, this would be the preferred approach. But that hasn't been a required part of IG metadata until now. I suspect that if we propose it we will hear push-back, but maybe I'm being too pessimistic? Roni, want to pop my bubble quickly?

That would add complexity if an existing buyer/IG wanted to start working with a new seller, correct? Would updateURL allow for post-join updates to a seller list?

That being said, even if IG seller declaration were in place, that doesn't address the challenge of being able to leverage the metadata provided by generateBid() when registering these new creatives, as noted in the issue description. In other words, it's not solely about making renderURLs available off-device -- though that's definitely part of the challenge.

@rdgordon-index
Contributor Author

as otherwise, an ad could only be shown to a single user, despite appearing in IGs for a lot of users

Just so that I fully understand the privacy concern -- doesn't that situation arise the first time the ad crosses the k-anon threshold already?

@MattMenke2
Contributor

I think we could pass along renderURLs to new sellers when fetching the updateURL without any major new privacy issues, though that would potentially add a bunch of network requests and overhead (we'd need to update new sellers about renderURLs, and update old sellers about new renderURLs, so if we inform sellers directly from Chrome, that could be a lot of extra traffic).

I don't think sending extra metadata specified by the IG affects the privacy characteristics here if we send the information on join (as opposed to on win on a 3P site, where it would need to be added to the k-anon check, at least). We are putting more complexity and overhead on the browser here for something that the browser doesn't really need to care about, unfortunately. Ideally we'd keep the browser API surface for this as minimal as possible.

@MattMenke2
Contributor

Just so that I fully understand the privacy concern -- doesn't that situation arise the first time the ad crosses the k-anon threshold already?

So, ideally the DSP and SSP don't know when the ad reaches the k-anon threshold for the first time, so they can't alter behavior based on that. They can only get so much information from loss reports, and auctions are run in a manner that limits the information they can get out of them. Only doing the k-anon counting after an ad wins the auction is done in part to protect against exactly that sort of gaming of the system.

@michaelkleber
Collaborator

I'm still wondering whether we can find a safe way to make this happen at IG Join and Update time. Really this is kind of about the browser mediating a direct flow of information from DSP to SSP, if both of them are OK with us doing so. I'm thinking something like:

(1) Suppose SSP X has run auctions in the past [period of time] in which X has invited DSP Y to be a buyer and some Y IG has placed a bid. (Each browser instance could keep track of this.)

(2) Suppose the IG object on which DSP Y calls Join includes a new field 'OkayToTellSellersMyAdUrls': true.

If both of those are true, then at the moment of IG Join, it seems to me like it would be OK for the browser to contact that SSP's KV server — if we knew the base URL somehow — and ask for the associated KV signals for each renderURL in the IG. And then if no KV signals came back, we could send the renderURL to some SSP-chosen scan-queueing endpoint, maybe as identified by the SSP in the KV response.

This could even be one round-trip instead of two, since I don't think there's any need for the first one to go to a trusted server. This is just a question of which endpoints are set up to receive a lot of traffic (KV, expecting calls on each auction) vs. only a little (scan-queueing, expecting traffic only when a new renderURL appears).
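A rough Python sketch of the two conditions above, as a browser might evaluate them at IG Join time. The class and method names are invented for illustration, and expiry of the "[period of time]" window is left out; only the 'OkayToTellSellersMyAdUrls' field name comes from the comment.

```python
from dataclasses import dataclass, field


@dataclass
class BrowserState:
    # (seller, buyer) pairs that took part in the same auction on this device.
    recent_pairs: set = field(default_factory=set)

    def record_auction(self, seller: str, bidding_buyers: list[str]) -> None:
        """Condition (1) bookkeeping: remember which buyers bid in this seller's auction."""
        for buyer in bidding_buyers:
            self.recent_pairs.add((seller, buyer))

    def sellers_to_notify(self, interest_group: dict) -> list[str]:
        """Sellers that may be told about this interest group's renderURLs."""
        # Condition (2): the interest group must opt in explicitly.
        if not interest_group.get("OkayToTellSellersMyAdUrls", False):
            return []
        owner = interest_group["owner"]
        # Condition (1): only sellers that have seen a bid from this buyer.
        return sorted(s for (s, b) in self.recent_pairs if b == owner)
```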

@rdgordon-index
Contributor Author

and the renderURL need not utilize the buyer origin

As per the new guidance in https://github.com/WICG/turtledove/blob/main/FLEDGE.md#14-buyer-security-considerations :

the ads renderURLs should not be same-origin with the interest group’s owner

This confirms that there will be no a priori method to be able to associate a renderURL with a particular buyer (aka DSP), which is one of the challenges noted above.

@pm-harshad-mane

This confirms that there will be no a priori method to be able to associate a renderURL with a particular buyer (aka DSP), which is one of the challenges noted above.

In this situation, DSPs should tell SSP partners which domain they will use in the renderingURL so that SSP can keep track of it on their KV server to recognize the DSP partner from the renderingURL.

@rdgordon-index
Contributor Author

DSPs should tell SSP partners which domain they will use in the renderingURL

Agreed -- but it's also not clear that there will only be a single such render domain per DSP.

@JoelPM
Contributor

JoelPM commented Oct 15, 2023

[I read through all the comments and think I understand what's being discussed/proposed, but apologies in advance if I rehash something or miss a point already made.]

I think @michaelkleber is on the right track when he says that we're looking for someone to mediate between the SSP and the DSP. However, the challenge with having it be the browser was already pointed out by @rdgordon-index in the initial description, as I think this still results in a "significant volume of unregistered-creative API calls from each device, for each such creative."

Could the K/V server be the point of coordination? It could provide an endpoint that can be queried for a list of renderURLs that have no data associated with them. It's effectively a list of cache misses. When the endpoint gets queried, it could take the extra step of filtering by checking which keys are still misses, though it doesn't have to.

Depending on how lookups get distributed geographically, it might help segment the data by region (assuming K/V servers are deployed in multiple regions and probably see different keys). This could help SSPs know which values need to be pushed to which K/V servers.
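As a sketch of the idea, a K/V store could record lookups that found nothing and expose them as a list. The class below is an in-memory Python toy with invented names, not the actual K/V server interface, and whether a trusted K/V server would be permitted to expose such a list is not settled in this thread.

```python
class MissTrackingKV:
    """Toy key/value store that remembers which renderURLs had no data."""

    def __init__(self):
        self._values = {}
        self._misses = set()

    def put(self, render_url, value):
        """Store creative-review data for a renderURL."""
        self._values[render_url] = value

    def get(self, render_url):
        """Look up a renderURL, recording a cache miss if nothing is stored."""
        if render_url not in self._values:
            self._misses.add(render_url)
            return None
        return self._values[render_url]

    def pending_misses(self, recheck=True):
        """List renderURLs seen in lookups that had no data.

        With recheck=True, drop keys that have since been populated
        (the optional extra filtering step mentioned above).
        """
        if recheck:
            self._misses = {u for u in self._misses if u not in self._values}
        return sorted(self._misses)
```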

@rdgordon-index
Contributor Author

@orrb1
Collaborator

orrb1 commented Jan 19, 2024

Hi all. We've been exploring this issue, and have prepared a document that details a proposed solution, including a chronicle of several options considered and their respective pros/cons. Please take a look:
https://docs.google.com/document/d/1s0tTN25AiPwl3ocCFYOLqeKhetZCt_YFIYQEQ7wzHqI/edit?usp=sharing

Thanks.

@orrb1
Collaborator

orrb1 commented Jan 19, 2024

Given the length of the document linked above, I believe it would be helpful to convey a high-level summary of that document here.

The design expressed in this document attempts to balance a few competing objectives:

  1. Ensure that all ads that a seller may be asked to score have been sent to that seller for creative scanning
  2. Don't overload sellers' servers with a firehose of ads to creative scan, and in particular avoid sending the same ad many more times than needed
  3. Minimize the privacy impact of sending ads for creative scanning

To this end, the design proposed has the following properties. The document explains each of these properties and their motivation in far greater detail.

  • Sellers expose an entrypoint at a well-known URI, e.g. https://www.example-ssp.com/.well-known/protected-audience-creative-scanning
  • Buyers can express metadata specifically intended for creative scanning, included alongside the renderURL in each ad in an interest group, and sent alongside the renderURL in requests to that entrypoint.
  • Buyers explicitly enumerate the sellers with which they participate in auctions and to whom the browser should send that buyer's ads for creative scanning. The buyer would do so using an entrypoint exposed at another well-known URI, e.g. https://www.example-dsp.com/.well-known/protected-audience-creative-scanning-buyer-config, and can override it for a specific interest group using seller capabilities.
  • Sellers can limit the rate at which ads are sent from each buyer using an entrypoint at another well-known URI, e.g. https://www.example-ssp.com/.well-known/protected-audience-creative-scanning-seller-config.

The majority of the document focuses on the question of when the browser would send ads to sellers' creative scanning entrypoints. The design alternative the document recommends proposes sending the ads of an interest group anytime that interest group is joined or updated, except that the browser would also keep track of which ads it already sent to each seller, so that it could reduce the volume of traffic sent to sellers' creative scanning entrypoint by sending each ad to each seller only once. To protect privacy, this deduplication would be partitioned by the joining site of the interest group.
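The send-once behaviour with per-joining-site partitioning might look roughly like this. It is a Python sketch under the assumption that the history is a set keyed by (joining site, seller, renderURL); the class name borrows the creativeScanningHistory term from the document, and the method name is invented.

```python
class CreativeScanningHistory:
    """Tracks which ads this browser already sent to each seller, per joining site."""

    def __init__(self):
        self._sent = set()

    def ads_to_send(self, joining_site, seller, render_urls):
        """Return the renderURLs not yet sent to this seller from this joining site.

        The returned URLs are marked as sent. Partitioning the key by
        joining_site means an ad already sent because of a join on one site
        is sent again for a join on another site.
        """
        fresh = []
        for url in render_urls:
            key = (joining_site, seller, url)
            if key not in self._sent:
                self._sent.add(key)
                fresh.append(url)
        return fresh
```

The partitioning costs some duplicate traffic, but without it the absence of a request would itself reveal that the same browser had joined a group carrying that ad on a different site.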

Please see the document for more details, and provide your comments here on this GitHub issue. Thanks.

@rdgordon-index
Contributor Author

https://www.example-ssp.com/.well-known/protected-audience-creative-scanning?renderURL=<URL-encoded-renderURL>&metadata=<URL-encoded-metadata>

question: can we include interestGroupOwner as well?

@rdgordon-index
Contributor Author

Though this approach relies on buyers explicitly enumerating all of the sellers with whom they participate in auctions, other approaches for determining the list of target sellers - e.g. having the browser remember, at auction time, which buyers participated in that auction - cause a potential leak of cross-site identity via the seller domain. Mitigating this leak requires an allowlist of sellers, making these approaches redundant, and leaving us no better option than the explicit enumeration.

Can you elaborate on the "potential leak" here? A buyer submitting a bid is effectively "allowing" the seller to scan their creatives.

@orrb1
Collaborator

orrb1 commented Jan 19, 2024

https://www.example-ssp.com/.well-known/protected-audience-creative-scanning?renderURL=<URL-encoded-renderURL>&metadata=<URL-encoded-metadata>

question: can we include interestGroupOwner as well?

Yes, that sounds like a good idea. I've modified the document to reflect it. I've also added a change log at the bottom of the document to record any changes made from when the document was first posted here.

Though this approach relies on buyers explicitly enumerating all of the sellers with whom they participate in auctions, other approaches for determining the list of target sellers - e.g. having the browser remember, at auction time, which buyers participated in that auction - cause a potential leak of cross-site identity via the seller domain. Mitigating this leak requires an allowlist of sellers, making these approaches redundant, and leaving us no better option than the explicit enumeration.

Can you elaborate on the "potential leak" here? A buyer submitting a bid is effectively "allowing" the seller to scan their creatives.

This is a good question. The privacy risk described here would not be part of normal operation, but a malicious party could cause a leak in the following way. An auction is run on the user's device for which the seller is userID_on_publisher.adtechB.com, which the browser would remember had participated in an auction with a given buyer. At a later time, that buyer joins the user to an interest group for which an ad's renderURL is adtechA.com/userID_on_advertiser. The browser would then send a creative scanning request https://userID_on_publisher.adtechB.com/.well-known/protected-audience-creative-scanning?renderURL=adtechA.com/userID_on_advertiser&.... Attestation is insufficient to protect against this because it's enforced at eTLD+1 so that userID_on_publisher.adtechB.com would be allowed to run an auction under the attestation for adtechB.com. This is the reason why the browser can't automatically remember the seller-buyer mapping, and the design instead relies on buyers explicitly enumerating the sellers for which their ads should be sent for creative scanning.
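One way to see why explicit enumeration closes this hole: if the browser only sends ads to seller origins the buyer has listed, and matches on the exact origin, the ID-bearing subdomain never qualifies. The Python sketch below assumes exact-origin matching, which is an illustration rather than something the design document is quoted as specifying; the function names are made up.

```python
from urllib.parse import urlsplit


def origin_of(url):
    """Scheme plus host (lower-cased), e.g. 'https://adtechb.com'."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}".lower()


def may_send_for_scanning(seller_url, buyer_enumerated_sellers):
    """True only if the seller's exact origin is on the buyer's list."""
    allowed = {origin_of(s) for s in buyer_enumerated_sellers}
    return origin_of(seller_url) in allowed
```

With `{"https://adtechB.com"}` as the buyer's list, a request to https://userID_on_publisher.adtechB.com/... would be refused even though both hosts share an eTLD+1.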

@rdgordon-index
Contributor Author

Attestation is insufficient to protect against this because it's enforced at eTLD+1

Technically true, but attestation also requires the ad tech vendor to indicate that they're not going to do this kind of thing -- and because it's on adtechB.com, that means that this ad tech would be in violation of their own attestation to the contrary.

@dmdabbs
Contributor

dmdabbs commented Jan 22, 2024

Thanks for the well-written and thought out proposal, @orrb1.
I have comments to post and find myself wanting to comment in situ such as on a PR, versus copy/paste/formatting the context into comment(s) in this issue.
Could the external doc be converted to a PR to, say, a "proposals" folder markdown doc?

@dmdabbs
Contributor

dmdabbs commented Jan 22, 2024

Buyers could specify the list of sellers to which they would want to send their ads for creative scanning by exposing an entrypoint at another well-known URI. The browser would issue a GET request to the buyer's server, e.g.
https://www.example-dsp.com/.well-known/protected-audience-creative-scanning-buyer-config

Suggestion to please use a consistent root path component for all Protected Audience .well-known URIs as Attribution Reporting has. We find this helpful for request routing.

@dmdabbs
Contributor

dmdabbs commented Jan 22, 2024

Buyers' Config Publishing

Since Chrome proposes to commit to fetching and persisting the new creative scanning config and you intend to extend sellerCapabilities, WDYT of generalizing this endpoint to publish using the single config scheme and affording future needs?

For example, https://www.example-dsp.com/.well-known/protected-audience/buyer-config

{
   "sellerCapabilities": {
     "https://seller1.com": ["creative-scanning"],
     "https://seller2.com": ["latency-stats", "creative-scanning"],
     "https://seller3.com": ["latency-stats"],
     "*": ["interest-group-counts"]
   }
}

An interest group can override the settings on any IG join, otherwise these are used. Same caveat mentioned in the explainer applies, that creative scanning cannot have a catch-all. The nifty new scanning declarations are yet another thing to hang onto every IG registration that counts against the size constraints.
In for a penny, in for a pound?
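For illustration, resolving the creative-scanning seller list from such a config could look like the Python sketch below. It assumes that an interest group's own sellerCapabilities, when present, replaces the buyer-level map wholesale, and that "creative-scanning" under "*" is ignored because no catch-all is allowed; both the override granularity and the function name are assumptions.

```python
BUYER_CONFIG = {
    "sellerCapabilities": {
        "https://seller1.com": ["creative-scanning"],
        "https://seller2.com": ["latency-stats", "creative-scanning"],
        "https://seller3.com": ["latency-stats"],
        "*": ["interest-group-counts"],
    }
}


def creative_scanning_sellers(buyer_config, interest_group=None):
    """Sellers that should receive this buyer's ads for creative scanning."""
    capabilities = buyer_config.get("sellerCapabilities", {})
    # Assumed override rule: an IG-level map replaces the buyer-level one.
    if interest_group and "sellerCapabilities" in interest_group:
        capabilities = interest_group["sellerCapabilities"]
    return sorted(
        seller
        for seller, caps in capabilities.items()
        # No catch-all for creative scanning: skip the "*" entry.
        if seller != "*" and "creative-scanning" in caps
    )
```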

@dmdabbs
Contributor

dmdabbs commented Jan 22, 2024

Seller Configs

Same might apply on the seller side along with picking up the perBuyerXXX keyed dict pattern from auctionConfig.

For example, https://www.example-ssp.com/.well-known/protected-audience/seller-config

{
  "perBuyerCreativeSampling": {
    "https://www.example-dsp.com": { "sampling_rate": 1 },   // Send all ads from this buyer
    "https://www.another-dsp.com": { "sampling_rate": 0 },   // Don't send any ads from this buyer
    "*": { "sampling_rate": 0.1 }                            // Send 10% of ads from other buyers
  }
  etc...
}

@dmdabbs
Contributor

dmdabbs commented Jan 22, 2024

Submitting Creatives

GET https://www.example-ssp.com/.well-known/protected-audience-creative-scanning?renderURL=<URL-encoded-renderURL>&interestGroupOwner=<URL-encoded-interest-group-owner>&metadata=<URL-encoded-metadata>

Chrome isn't consuming anything from the response, right? You can ditch the URL encoding by POSTing,

POST https://www.example-ssp.com/.well-known/protected-audience/creative-scanning

{
   "https://www.example-dsp.com": [
      {
        "renderURL": "https://some-adserver.com/...",   
        "metadata": {the renderURL's associated creativeScanningMetadata}
      }
   ]
}

Also free to send multiple creatives identified at the IG joining site, as suggested in your preferred Option 2b.
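A sketch of what building that POST might look like, in Python for illustration (the browser would do this natively). The helper name, the ad URL used in the usage note, and the Content-Type header are assumptions; the body shape follows the example above, keyed by interest group owner. Nothing is sent over the network here.

```python
import json
from urllib.request import Request

# Endpoint path as suggested in the comment above.
SCAN_URL = "https://www.example-ssp.com/.well-known/protected-audience/creative-scanning"


def build_scan_request(owner, ads):
    """Build (but do not send) a creative-scanning POST for one buyer's ads."""
    body = {
        owner: [
            {"renderURL": ad["renderURL"], "metadata": ad.get("metadata", {})}
            for ad in ads
        ]
    }
    return Request(
        SCAN_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed media type
        method="POST",
    )
```

For example, `build_scan_request("https://www.example-dsp.com", [{"renderURL": "https://ads.example/creative-1"}])` carries the renderURL as a plain JSON string, with no URL-encoding step, and the list can hold several creatives at once.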

@rdgordon-index
Contributor Author

question: when would creativeScanningHistory table be purged? Any relationship to how/when IGs are cleared?

@dmdabbs
Contributor

dmdabbs commented Jan 23, 2024

question: when would creativeScanningHistory table be purged? Any relationship to how/when IGs are cleared?

Not sure if this is where you were going @rdgordon-index, but statements like

since the creative scanning entrypoint would need to see an ad only once

had me wondering what frequency sellers get to re-review renderURLs. In today's workflows I'm familiar with, if our fetch url is active our partner re-verifies it.

@rdgordon-index
Contributor Author

had me wondering what frequency sellers get to re-review renderURLs

Somewhat depends on which Option is under consideration; indirectly, in Options 1 & 2, for example:

the browser would occasionally send an ad that does have an associated entry in the creativeScanningHistory

Which is some form of re-scanning, albeit indirectly -- I was asking about the explicit ability to do so.

@dmdabbs
Contributor

dmdabbs commented Jan 23, 2024

The maxTrustedBiddingSignalsURLLength currently being plumbed would be handy to specify as a 'global' IG/buyer config attribute (should the buyer-config accommodate knobs beyond seller scanning). Same if sellers will also get to limit scoring URL length.

@orrb1
Collaborator

orrb1 commented Jan 24, 2024

Thank you both, Roni and David, for your thoughtful feedback. I'll try to answer each of your points below.

Technically true, but attestation also requires the ad tech vendor to indicate that they're not going to do this kind of thing -- and because it's on adtechB.com, that means that this ad tech would be in violation of their own attestation to the contrary.

Though this is true, the Protected Audience API has a precedent of enforcing with technical restrictions what can be enforced, and relying on policy where that isn't possible.

Could the external doc be converted to a PR to, say, a "proposals" folder markdown doc?

We considered this among other options for posting this design and getting feedback. The goal was specifically to encourage most of the conversation to stay in this thread so that anyone who's interested can stay involved.

Suggestion to please use a consistent root path component for all Protected Audience .well-known URIs as Attribution Reporting has.

That's a good idea. There's an existing prefix for permission delegation, as described in the explainer, which we can use here as well. I've updated the well-known URIs in this design to be:

  • https://www.example-ssp.com/.well-known/interest-group/creative-scanning - the seller's creative scanning entrypoint
  • https://www.example-ssp.com/.well-known/interest-group/creative-scanning-seller-config - the seller's configured per-buyer rate limits for ads sent to their creative scanning entrypoint.
  • https://www.example-dsp.com/.well-known/interest-group/creative-scanning-buyer-config - the buyer listing of sellers to whom to send ads for creative scanning unless otherwise overridden by seller capabilities on the interest group.

Since Chrome proposes to commit to fetching and persisting the new creative scanning config and you intend to extend sellerCapabilities, WDYT of generalizing this endpoint to publish using the single config scheme and affording future needs?

The concern here would be in determining what to do if there's a network error while trying to fetch the buyer config. At interest group join time, we'd have only a partial interest group, and that group may have trouble participating in auctions on that device, for example, in auctions that have required seller capabilities. Providing everything inline protects against that. Having just the sellers for creative scanning in a buyer config that needs to be fetched is an acceptable risk, since, even if the buyer config fetch fails, the interest group can still participate in auctions, and presumably the buyer config fetch will succeed on another device, which will send that buyer's ads for creative scanning.

Same might apply on the seller side along with picking up the perBuyerXXX keyed dict pattern from auctionConfig.

The issue with combining perBuyerCreativeSampling together with other perBuyerXXX fields is that they're used at different times. In most of the design options listed in this doc, creative scanning happens at interest group join/update time, when there isn't an available auction config. The other perBuyerXXX fields are all used at auction time, when there is an available auction config. (There are a few options that happen at auction time, but those notably do not rely on perBuyerCreativeSampling, because the seller can dictate - via one of their return values from scoreAd() - which ads should be sent for creative scanning.)

Chrome isn't consuming anything from the response, right? You can ditch the URL encoding by POSTing,

Yes, there seem to be some compelling benefits to using a POST here. I've updated the document to use a POST instead of a GET for the creative scanning entrypoint.

Question: when would creativeScanningHistory table be purged? Any relationship to how/when IGs are cleared?

Entries in the creativeScanningHistory will be purged when interest groups are cleared.

What frequency do sellers get to re-review renderURLs. In today's workflows I'm familiar with, if our fetch url is active our partner re-verifies it.

Could you clarify this? I had envisioned the creative scanning problem as a "discovery" problem. Once the seller knows about an ad, is there any reason it couldn't reverify that ad anytime it wanted to?

From my perspective, an ad repeatedly sent to a seller's creative scanning entrypoint was a thing to be avoided because it contributed unnecessary load to the entrypoint. Still, in most of the options, an ad will likely be sent many times throughout its use. In options 3, 5, 6, and 7, a seller can explicitly request that an ad be sent to their creative scanning entrypoint at any time. In other options, e.g. options 2 and 2a, other devices would send that ad, so sellers would get an opportunity to re-verify anytime a new device joins that interest group.

The maxTrustedBiddingSignalsURLLength currently being plumbed would be handy to specify as a 'global' IG/buyer config attribute (should the buyer-config accommodate knobs beyond seller scanning). Same if sellers will also get to limit scoring URL length.

This seems like a new idea that's distinct from creative scanning. If you'd like to explore this further, could you please file a new issue for further discussion? Thanks.

@dmdabbs
Contributor

dmdabbs commented Jan 24, 2024

Providing everything inline protects against that.

Yes after posting I realised that. You want the IG in a ready-to-go state in the IG cache, sans any 'assembly.'

Could you clarify this? I had envisioned the creative scanning problem as a "discovery" problem. Once the seller knows about an ad, is there any reason it couldn't reverify that ad anytime it wanted to?

Yes. Good point. Up to sellers when to age off discovered renderURLs.

This seems like a new idea that's distinct from creative scanning. If you'd like to explore this further, could you please file a new issue for further discussion? Thanks.

Indeed it was. I'll post something separate from this thread. Thanks.

@dmdabbs
Contributor

dmdabbs commented Jan 25, 2024

@dmdabbs: Same might apply on the seller side along with picking up the perBuyerXXX keyed dict pattern from auctionConfig.
@orrb1: The issue with combining perBuyerCreativeSampling together with other perBuyerXXX fields....

Re-reading your response on the train I see that "picking up the perBuyerXXX pattern" could have been clearer.
I wasn't advocating picking up unrelated perBuyerXXX, only their slightly more concise representation ('pattern'). Perhaps a foolish consistency on my part:

This

{
  "perBuyerSamplingRates": [
    {"interest_group_owner": "https://www.example-dsp.com","sampling_rate": 1},
    {"interest_group_owner": "https://www.another-dsp.com","sampling_rate": 0}
  ],
  "defaultSamplingRate": 0.1
}

compared to

{
  "perBuyerSamplingRates": {
    "https://www.example-dsp.com": { "sampling_rate": 1},
    "https://www.another-dsp.com": { "sampling_rate": 0},
    "*": { "sampling_rate": 0.1}
  }
  etc...     
}

where the map pattern obviates the "interest_group_owner" and "defaultSamplingRate" labels. It's the OpenRTB background - looking for a concise representation to reduce network bytes. The 'etc...' was to accommodate future, appropriate attributes.
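To make the comparison concrete, here is a minimal Python sketch of how a consumer might resolve a buyer's sampling rate from each of the two shapes above. The function names are hypothetical, and the fallback of 1.0 when no default is present is an assumption taken from later in this thread rather than anything the formats themselves dictate.

```python
from typing import Any

# The two shapes from the comment above, as Python literals.
LIST_FORM = {
    "perBuyerSamplingRates": [
        {"interest_group_owner": "https://www.example-dsp.com", "sampling_rate": 1},
        {"interest_group_owner": "https://www.another-dsp.com", "sampling_rate": 0},
    ],
    "defaultSamplingRate": 0.1,
}
MAP_FORM = {
    "perBuyerSamplingRates": {
        "https://www.example-dsp.com": {"sampling_rate": 1},
        "https://www.another-dsp.com": {"sampling_rate": 0},
        "*": {"sampling_rate": 0.1},
    }
}

ASSUMED_FALLBACK = 1.0  # assumption: send everything if nothing is configured


def rate_from_list(config: dict[str, Any], buyer: str) -> float:
    """Linear scan over records, then fall back to defaultSamplingRate."""
    for entry in config.get("perBuyerSamplingRates", []):
        if entry["interest_group_owner"] == buyer:
            return entry["sampling_rate"]
    return config.get("defaultSamplingRate", ASSUMED_FALLBACK)


def rate_from_map(config: dict[str, Any], buyer: str) -> float:
    """Direct key lookup; the "*" key plays the role of the default."""
    rates = config.get("perBuyerSamplingRates", {})
    entry = rates.get(buyer, rates.get("*"))
    return entry["sampling_rate"] if entry is not None else ASSUMED_FALLBACK
```

Both lookups give the same answers for the example data; the map form simply drops the two label strings and turns the scan into a key lookup.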

@orrb1
Collaborator

orrb1 commented Jan 25, 2024

@dmdabbs: Same might apply on the seller side along with picking up the perBuyerXXX keyed dict pattern from auctionConfig.
@orrb1: The issue with combining perBuyerCreativeSampling together with other perBuyerXXX fields....

Re-reading your response on the train I see that "picking up the perBuyerXXX pattern" could have been clearer. I wasn't advocating picking up unrelated perBuyerXXX, only their slightly more concise representation ('pattern'). Perhaps a foolish consistency on my part:

This

{
  "perBuyerSamplingRates": [
    {"interest_group_owner": "https://www.example-dsp.com","sampling_rate": 1},
    {"interest_group_owner": "https://www.another-dsp.com","sampling_rate": 0}
  ],
  "defaultSamplingRate": 0.1
}

compared to

{
  "perBuyerSamplingRates": {
    "https://www.example-dsp.com": { "sampling_rate": 1},
    "https://www.another-dsp.com": { "sampling_rate": 0},
    "*": { "sampling_rate": 0.1}
  }
  etc...     
}

where the map pattern obviates the "interest_group_owner" and "defaultSamplingRate" labels. It's the OpenRTB background - looking for a concise representation to reduce network bytes. The 'etc...' was to accommodate future, appropriate attributes.

Ah, sorry for the misunderstanding. It makes a lot of sense to use a format that's consistent with existing parameters. I've updated this in the document. Thanks.

@rdgordon-index
Contributor Author

A few additional comments in advance of the WICG meeting:

  • Option 4

    • Can you elaborate on the possibility of the browser maintaining a hash by seller + renderURL + k-anon status? Would this address the case of a new seller coming on board for an existing k-anon renderURL, or a long-lived creative that is below the threshold?
  • Option 6

    • this is most similar to the existing creative registration workflow, lowest volume to seller's creative registration endpoints
    • the trustedScoringSignalsURL endpoint already is not supposed to store any state -- and we've attested to that -- and in the future, the TEE will guarantee this. If so, can you elaborate on the privacy risk here?
    • the scanning endpoint doesn't receive cross-site information in the QSPs -- so I'm curious about the nature of the leak
  • Scanning Rate

    • difficult to manage based on number of devices and IGs, renderURLs - this is a moving target, especially given the rate and scale of joinAdInterestGroup calls today
    • is this intended just to control the firehose? Is there a way to ensure that we see all renderURLs without constant tuning?

I'm aligned that Options 1, 3, 5 and 7 are less desirable; and 2 is preferable to 2b from a seller workload perspective.

@orrb1
Collaborator

orrb1 commented Mar 8, 2024

Hi everyone,

Thank you for all of your feedback on the document and proposals. Based on that feedback, we've made several changes to the design reflected in the document and described below. We've also changed the structure of the document to reflect the current recommended design, while moving the other options explored into an "Alternatives Considered" section. Please continue to provide us with feedback as we continue to explore potential solutions for supporting creative scanning with the Protected Audience API.


From the notes:

(Patrick McCann) Would be better fit for purpose if the top level seller could identify the creative scanner instead of the component sellers scanning the ads?

In the current recommended design, the owner of the interest group can indicate which attested parties should be notified of new ads. This can absolutely include the top-level seller. We've updated the design so that this seller can explicitly indicate the creative scanner. They would do this using a new creativeScanningURL in the seller config exposed at their .well-known URI (e.g., https://www.example-ssp.com/.well-known/interest-group/creative-scanning-seller-config). (This replaces the previous behavior in which the seller would expose a second .well-known URI as the target of the creative scanning request.) From the updated document: "The seller’s creative scanning entrypoint indicated in the seller config does not need to be hosted at the seller’s origin, and the seller can choose to send ads for creative scanning directly to a third party vendor that specializes in ad quality."
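As a rough illustration of the lookup described above, the following Python sketch derives the seller config location from a seller origin and reads the creativeScanningURL out of the fetched config. The well-known path and the creativeScanningURL field come from the text above; the function names, the https-only check, and the example scanner URL in the usage note are assumptions for illustration only.

```python
import json
from urllib.parse import urlsplit

# Path quoted in the comment above.
WELL_KNOWN_PATH = "/.well-known/interest-group/creative-scanning-seller-config"


def seller_config_url(seller_origin: str) -> str:
    """Build the well-known config URL for a seller origin (hypothetical helper)."""
    parts = urlsplit(seller_origin)
    if parts.scheme != "https" or not parts.netloc:
        raise ValueError("seller origin must be an https origin")
    return f"https://{parts.netloc}{WELL_KNOWN_PATH}"


def scanning_entrypoint(config_json: str) -> str:
    """Return the creative scanning entrypoint named in the seller config.

    The entrypoint may live on a different origin than the seller,
    e.g. a third-party ad quality vendor.
    """
    config = json.loads(config_json)
    return config["creativeScanningURL"]
```

For example, a config body of `{"creativeScanningURL": "https://scanner.example/scan"}` served by `https://www.example-ssp.com` would direct scanning requests to the (made-up) vendor at `scanner.example`.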


Patrick McCann and Laurentiu Badea both asked about having the Trusted Scoring Signals Server keep track of which ads had no corresponding signals - a sign that these ads had not been previously scanned - and expose those via an endpoint. Laurentiu pointed to Joel's prior comment on this issue. From Joel's comment:

Could the K/V server be the point of coordination? It could provide an endpoint that can be queried for a list of renderURLs that have no data associated with them. It's effectively a list of cache misses. When the endpoint gets queried, it could take the extra step of filtering by checking which keys are still misses, though it doesn't have to.

We've added this idea as a new "Option 8" in the alternatives considered section of the document. Copying from the analysis provided there:

If all Trusted Scoring Signals Servers were running in TEEs, a design like this could work for creative scanning while still preserving privacy. In order to mitigate the privacy risk incurred by allowing for the exfiltration of ad URLs that could potentially be used to expose a user's cross-site identity, the Trusted Scoring Signals Server could aggregate "cache misses" and then, after a delay (e.g. once a day), expose only those that have been reported by at least k devices, enforcing a k-anonymity threshold for creative scanning that would help mitigate the privacy risk. However, for this to work, the Trusted Scoring Signals request would need to include an identifier for that device, which is a privacy risk while Trusted Scoring Signals Servers still run outside of TEEs. We’re continuing to explore whether this offers a feasible solution in the short-term.
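The aggregation step in that analysis could be sketched roughly as follows. This is a simplified Python illustration under stated assumptions, not a description of any actual Trusted Scoring Signals Server: the class and method names are hypothetical, the device identifier is an opaque string, and the delay is modelled by the caller invoking `release()` once per reporting period.

```python
from collections import defaultdict


class CacheMissAggregator:
    """Collects renderURLs with no scoring signals and releases only
    those reported by at least k distinct devices (hypothetical sketch)."""

    def __init__(self, k: int) -> None:
        self.k = k
        # renderURL -> set of device identifiers that reported a miss
        self._devices_by_url: dict[str, set[str]] = defaultdict(set)

    def record_miss(self, render_url: str, device_id: str) -> None:
        # Using a set means repeated misses from one device count once.
        self._devices_by_url[render_url].add(device_id)

    def release(self) -> list[str]:
        """Return renderURLs that currently meet the k-anonymity threshold."""
        return sorted(
            url
            for url, devices in self._devices_by_url.items()
            if len(devices) >= self.k
        )
```

The need for `device_id` in `record_miss` is exactly the sticking point noted above: counting distinct devices requires some per-device identifier to reach the server.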


From the notes:

(David Dabbs) Component ads are a new thing, there will be some vendors that use this, and you would get some markup blob to scan but you would need to talk to the seller - effectively you are submitting markup

This is a fair point, as the browser currently fetches trusted scoring signals for component ads as part of the same request that fetches trusted scoring signals for ads. The design has been updated to reflect that the renderURL and creativeScanningMetadata for each component ad would be sent for creative scanning alongside the renderURL and creativeScanningMetadata for each ad.
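One way to picture the updated behaviour is the Python sketch below, which collates an interest group's ads and component ads into a single request body. The renderURL and creativeScanningMetadata names come from the discussion; the envelope keys (`ads`, `adComponents`), the empty-object default for missing metadata, and the function names are assumptions, and the actual request format is whatever the design document specifies.

```python
import json
from typing import Any


def scan_entries(items: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep only the fields relevant to creative scanning for each ad."""
    return [
        {
            "renderURL": item["renderURL"],
            # Assumption: metadata is optional and defaults to an empty object.
            "creativeScanningMetadata": item.get("creativeScanningMetadata", {}),
        }
        for item in items
    ]


def build_scan_request(interest_group: dict[str, Any]) -> str:
    """Serialize ads and component ads side by side (hypothetical envelope)."""
    body = {
        "ads": scan_entries(interest_group.get("ads", [])),
        "adComponents": scan_entries(interest_group.get("adComponents", [])),
    }
    return json.dumps(body, sort_keys=True)
```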


From the notes:

Pat McCann: Can you describe more about 5 - is it more expensive with higher quality?

Option 5 is more expensive without any benefit in quality. This option explored whether the trusted scoring signals server could be used to indicate whether an ad should be sent to the creative scanning entrypoint. The conclusion was that using the trusted scoring signals server to triage ads in this way is inefficient: assuming the creative scanning entrypoint is less expensive than the trusted scoring signals server, the browser would be making a request to the more expensive server only to decide whether to make a request to the less expensive one.


From the notes:

Stan Belov: Was a discussion from the private aggregation api to use this for the trusted scoring signals - issue was that the map is mapped to a 128bit design which does not know about the creative ids etc. Have you thought about extending the private aggregation api?

The Private Aggregation API doesn't seem to be a good match for conveying arbitrary renderURLs. The aggregation key in a Private Aggregation API event is limited to 128 bits. As such, it could be used to convey the hash of a renderURL, but without knowing a priori what the set of all possible renderURLs could be, we couldn't convert that hash back to a renderURL.
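A small Python sketch of that limitation, assuming (purely for illustration) that the 128-bit key is a truncated SHA-256 of the renderURL. The Private Aggregation API does not prescribe this hash, and both function names are hypothetical.

```python
import hashlib
from typing import Optional


def bucket_key(render_url: str) -> int:
    """Map a renderURL to a 128-bit integer key (assumed hashing scheme)."""
    digest = hashlib.sha256(render_url.encode("utf-8")).digest()
    return int.from_bytes(digest[:16], "big")  # first 16 bytes = 128 bits


def resolve(key: int, known_urls: list[str]) -> Optional[str]:
    """Recover a renderURL from its key, which only works if the URL
    is already in the candidate set."""
    for url in known_urls:
        if bucket_key(url) == key:
            return url
    return None
```

A seller that already knows a renderURL can match its key, but a newly discovered renderURL yields a key that `resolve` cannot map back to anything, which is the discovery problem creative scanning is meant to solve.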


From the notes:

David Dabbs: The simplest approach here is the industry solves this and buyers submit to sellers etc

The current proposed design provides a mechanism that could be replicated by buyers sending their creatives directly to creative scanners. Building support as part of the Protected Audience API aims to establish a set of protocols to make that process easier.


Roni Gordon: Option 4. Can you elaborate on the possibility of the browser maintaining a hash by seller + renderURL + k-anon status? Would this address the case of a new seller coming on board for an existing k-anon renderURL, or a long-lived creative that is below the threshold?

Though this would address the issues you described, the effect would be to make this option identical in its behavior to option 2. The reason for this is that, at an individual device, if the ad first arrives when it's already k-anonymous, the browser wouldn't know to which sellers the ad had been sent from other devices before it was k-anonymous. As a result, each browser would fall back to sending each ad to each seller for each new ad, and potentially a second time if first sent before it was k-anonymous.
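The per-device bookkeeping being discussed might look something like the Python sketch below. It is an assumption-laden illustration: the class name, the choice of SHA-256, and the key layout are all hypothetical, and it models a single browser's local history only.

```python
import hashlib


class ScanHistory:
    """Per-device record of (seller, renderURL, k-anon status) tuples
    that have already been sent for scanning (hypothetical sketch)."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    @staticmethod
    def _key(seller: str, render_url: str, k_anonymous: bool) -> str:
        raw = "\n".join([seller, render_url, "1" if k_anonymous else "0"])
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def should_send(self, seller: str, render_url: str, k_anonymous: bool) -> bool:
        """Return True the first time this exact tuple is seen on this device."""
        key = self._key(seller, render_url, k_anonymous)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True
```

Because each device starts with an empty history, a fresh device sends every ad to every seller regardless of what other devices have already sent, and an ad first seen below the threshold is sent again once its k-anon status flips.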


Roni Gordon: Option 6. The trustedScoringSignalsURL endpoint already is not supposed to store any state -- and we've attested to that -- and in the future, the TEE will guarantee this. If so, can you elaborate on the privacy risk here? The scanning endpoint doesn't receive cross-site information in the QSPs -- so I'm curious about the nature of the leak

Though the TEE provides a guarantee that the trusted scoring signal server won't be able to exfiltrate any information by itself, the control it has over whether an ad is sent to creative scanning servers would provide it with a mechanism for exfiltrating a small amount of information. The trusted scoring signal server potentially has access to multiple sites’ worth of information - context from the publisher site and renderURLs from advertiser sites. If the trusted scoring signal server intentionally selected a subset of ads to be sent to the seller's creative scanning entrypoint, these could be used to reconstruct a user's cross-site identity.


Roni Gordon: Scanning Rate is difficult to manage based on number of devices and IGs, renderURLs - this is a moving target, especially given the rate and scale of joinAdInterestGroup calls today. Is this intended just to control the firehose? Is there a way to ensure that we see all renderURLs without constant tuning?

The sampling rate defined in the seller config is an optional configuration that sellers may use to tune the rate of traffic as they see fit. If no per-buyer sampling rates are provided, the default sampling rate assumed is 1.0, so that all ads are sent to the sellers' creative scanning entrypoint. To ensure that they see all renderURLs, a seller may choose to maintain a sampling rate of 1.0 and, as noted in the document, efficiently shed previously discovered renderURLs at their creative scanning entrypoint.
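The two halves of that answer, browser-side sampling and seller-side shedding, can be sketched in Python as follows. All names here are hypothetical, the in-memory set stands in for whatever store a seller would really use, and the uniform random draw is an assumption about how a sampling rate might be applied.

```python
import random


def should_sample(rate: float, rng: random.Random) -> bool:
    """Browser side: send with probability `rate`.

    rng.random() is in [0.0, 1.0), so a rate of 1.0 always sends
    and a rate of 0.0 never does.
    """
    return rng.random() < rate


class ScanningEntrypoint:
    """Seller side: drop renderURLs that were already discovered."""

    def __init__(self) -> None:
        self.known: set[str] = set()
        self.queued: list[str] = []

    def receive(self, render_url: str) -> bool:
        """Queue a renderURL for scanning; return False if it was shed."""
        if render_url in self.known:
            return False
        self.known.add(render_url)
        self.queued.append(render_url)
        return True
```

Shedding keeps the scanning workload bounded by the number of distinct renderURLs, though every duplicate request still has to reach the entrypoint before it can be dropped.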

@dmdabbs
Contributor

dmdabbs commented Mar 11, 2024

Appreciate the follow-ups to address earlier threads, @orrb1, and the updated written spec proposal.

From your doc:

The browser would send an ad's renderURL for creative scanning.

A number of established features and emerging proposals concern renderURLs:

  1. The specified & implemented, but not yet required, creative size declarations factoring into k-anonymity
  2. The deprecatedRenderURLReplacements work that is underway
  3. this "renderURL scanning support" proposal
  4. Reducing interest group payload by compressing renderURLs #1076, @ardianp-google requested a few days ago
  5. Multi-bid support that is underway
  6. Video and native delivery approaches on which folks are currently iterating

Regardless of how these chips land, I presume that the constraint will remain that a bidder/buyer will not be permitted to submit novel "render URLs"; they must be recognizable as present in the IG on device.

On #1. the explainer says,

width: The creative's width. This size will be matched against the declaration in the interest group and substituted into any ad size macros present in the ad creative URL.

Does this mean that the "creative url" supplied to the seller will have AD_WIDTH & AD_HEIGHT replaced as Chrome does prior to navigating?

On #2,
Same. Will these be substituted prior to sending to seller(s)?

On #4
Basically the same. Chrome should 'instantiate' the template to a string prior to supplying to seller, yes?
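For the size-macro case under #1, the 'instantiate before sending' behaviour being asked about would amount to something like the Python sketch below. It assumes the two macro spellings (`{%AD_WIDTH%}` / `{%AD_HEIGHT%}` and `${AD_WIDTH}` / `${AD_HEIGHT}`) that the explainer appears to accept; the function name is hypothetical, and whether Chrome would actually perform this substitution before creative scanning is the open question, not something this sketch settles.

```python
def substitute_size_macros(render_url: str, width: int, height: int) -> str:
    """Replace ad size macros in a renderURL with concrete dimensions."""
    replacements = {
        "{%AD_WIDTH%}": str(width),
        "{%AD_HEIGHT%}": str(height),
        "${AD_WIDTH}": str(width),
        "${AD_HEIGHT}": str(height),
    }
    for macro, value in replacements.items():
        render_url = render_url.replace(macro, value)
    return render_url
```

A URL with no macros passes through unchanged, so a seller receiving the instantiated string would see one distinct URL per declared size rather than a single template.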

On #5
Does Chrome submit to sellers all the creatives submitted by the buyer or just the winner?

On #6
Today PA only does banners. Once you teach it about newfangled formats like video &c, the POST to the seller will need some signal indicating what media rendering use case the render URL entity is for.

buyers could provide a set of metadata fields, e.g. a domain and seat, that would be sent alongside the renderURL in support of creative scanning. A buyer would provide these using a new creativeScanningMetadata optional property

Some of these are in discussion for buyers to provide to sellers in the bid ad metadata attribute. Would be nice to avoid duplication.

Buyers would explicitly specify the list of sellers by serving a “creative scanning buyer config” at a well-known URI.

Chrome has or will mitigate attestation file availability by downloading these via some Chrome component. Wondering how to keep this file/fetch from experiencing similar issues. Can one assume that no buyer creatives will be shared if there is not a cached resource available? Also the fetch will be out of the critical path, yes?

Each seller would expose a “creative scanning seller config” at a well-known URI.

Same here.

To do so, the browser would collate all of the ads of an interest group sent to a given seller and send these as a POST request of the form

Is this answering Yes to the question above regarding multibid submissions?

'renderUrl': shoesAd1,

Suggest using realistic, illustrative URLs.
Nit (renderUrl->renderURL)

@rdgordon-index
Contributor Author

if the ad first arrives when it's already k-anonymous, the browser wouldn't know to which sellers the ad had been sent from other devices before it was k-anonymous

Can you clarify why the browser would need to know about what's happening on other devices in this case (for Option 4)? If the hash includes seller, it should already know what the 'new seller' is -- and the existing sellers are already locally stored in the cache.

@rdgordon-index
Contributor Author

If the trusted scoring signal server intentionally selected a subset of ads to be sent to the seller's creative scanning entrypoint

Can you elaborate on the nature of this "intention"? By definition, only ads that need to be re-scanned, or aren't already scanned, would be sent to the creative scanning endpoint -- so how is this any different?

@rdgordon-index
Contributor Author

To ensure that they see all renderURLs, a seller may choose to maintain a sampling rate of 1.0 and, as noted in the document, efficiently shed previously discovered renderURLs at their creative scanning entrypoint.

I don't think that's a viable solution -- that's an enormous amount of network traffic simply to discard it at the entrypoint.

@orrb1
Collaborator

orrb1 commented Apr 26, 2024

@rdgordon-index - I have a couple of small follow-up questions regarding your initial comment on this issue. If given both the renderURL and the buyer origin, would it be possible to infer the other key signals needed for creative scanning? Specifically, could adomain be determined from either response headers returned from the ad server or by rendering the creative, and could seat be inferred using the renderURL, buyer origin, and adomain?

@rdgordon-index
Contributor Author

If given both the renderURL and the buyer origin, would it be possible to infer the other key signals needed for creative scanning?

It would definitely be valuable to include a link between renderURL and buyer origin, since they aren't supposed to be the same, as per https://github.com/WICG/turtledove/blob/main/FLEDGE.md#14-buyer-security-considerations. That way, we would be able to definitively associate renderURLs with a particular buyer. Today the association is implicit, and works primarily because buyers are still using their origin as their renderURL parent domain (that security consideration was a later addition to the markdown file).

Specifically, could adomain be determined from either response headers returned from the ad server or by rendering the creative, and could seat be inferred using the renderURL, buyer origin, and adomain?

For response headers -- are you thinking about something like https://developers.google.com/authorized-buyers/rtb/protected-audience-api#automatic_creative_scanning ?

'returned from the ad server' -- #1028 talks about some of the challenges and assumptions as to whether or not the renderURL 'ad server' would have the full awareness of these parameters -- but assuming so, then yes, in principle, it could be scanned as part of the creative registration process. How would you solve for the AD_WIDTH and AD_HEIGHT macros that were added as part of #417, since the K/V call doesn't have access to these?

or by rendering the creative

Typically this would involve support for some sort of 'creative audit' flags to ensure that the renderURL is able to be fetched by creative scanners -- there are complications for geography and client-side IP expectations, for instance, that often come into play. In practice this also means crawling the rendered creative to determine adomain (a.k.a. landing pages), which isn't always straightforward -- so a declarative approach, like response headers, is preferred IMO.

Labels
None yet
7 participants