Consider dropping permission for captured surface control APIs #48

Open
jan-ivar opened this issue Oct 21, 2024 · 29 comments

Comments

@jan-ivar
Member

jan-ivar commented Oct 21, 2024

Let's continue discussion from #27 here where other members can contribute.

The choice of requiring permission influences API design, like requiring methods over attributes, but this is of secondary concern.

We should first agree on whether permission is required or not for scrolling and/or zoom features. This should be based on threat vectors and UX concerns, and what the guidelines say around that. I mention some of them in #27 (comment):

... Let's look to the guidelines for help.

§ 2.10. Require user activation for powerful APIs says "user activation ... is not always sufficient to protect users from invasive behaviours, and seeking meaningful consent is also important."

"not always" = sometimes. So there's a chance we're good, since we implement something even stronger than consuming activation here. The question is:

Is meaningful consent required here? § 1.4. Ask users for meaningful consent says: "If a useful feature has the potential to cause harm to users, ... make sure ... they can refuse consent effectively."

Do we feel buttons that (when interacted with) can scroll down or zoom a captured tab all the way out reach a level of harm? Possibly, since this might reveal more webpage information than the user expected.

But it also says: "If a feature is powerful enough to require user consent, but it’s impossible to explain to a typical user what they are consenting to, that’s a signal that you may need to reconsider the design of the feature."

Should we work on mitigating these risks directly?

Do we need to pull in the full permission machinery with delegated permissions, query and the like, or might giving UAs the option to throw NotAllowedError suffice?

@eladalon1983
Member

It's not clear to me which heavy machinery is required. If a browser wishes to have a trivial, always-granted permission of the "captured-surface-control" permission policy:

  • Does the spec not already permit that?
  • Is there a non-trivial cost to the implementers of that browser?

@steely-glint

Stepping back slightly - I feel like the risk here is that we are encouraging the user to view the captured surface through the machinery of the video call app (say) but interact with it semi-directly. There is no certainty that the VC will faithfully render the capture to the local user. What an attacking remote user sees is not necessarily what the local user sees (until they uncover the captured surface). It might have scrolled down through your emails but still be showing you and the rest of the conference the 3rd page of the first email whilst rendering the whole thing to the attacker.

So yes, I think we need informed user consent (unless you tell me how deceptive zoom/scroll is prevented otherwise).

@eladalon1983
Member

Indeed.

@jan-ivar
Member Author

There is no certainty that the VC will faithfully render the capture to the local user.

We should work to help UAs ensure this, as I propose in #49. I'd rather address the risk than slap a permission on it.

@jan-ivar
Member Author

It's not clear to me which heavy machinery is required. If a browser wishes to have a trivial, always-granted permission of the "captured-surface-control" permission policy:

  • Does the spec not already permit that?
  • Is there a non-trivial cost to the implementers of that browser?

Appeal to Triviality arguments for implementing something run afoul of § 1.7. Add new capabilities with care.

A new permission adds:

  • A new iframe allow="display-capture https://a.com; captured-surface-control https://a.com;" that web developers need to turn on explicitly
  • A new permissions.query({name: "captured-surface-control"}) value web developers need to consider
  • Implementation-defined behavior web developers need to consider for interop
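
To make the second bullet concrete, here is a minimal sketch of the feature detection a page might end up needing. It assumes the "captured-surface-control" name used in this thread. `queryCscPermission` is a hypothetical helper, and the `permissions` argument stands in for `navigator.permissions` so the snippet also runs outside a browser. Browsers that do not recognize a permission name reject `query()` with a `TypeError`, which is the extra case developers would have to handle.

```javascript
// Hypothetical helper: resolves to "granted", "denied", "prompt" or "unsupported".
// `permissions` is expected to look like navigator.permissions.
async function queryCscPermission(permissions) {
  if (!permissions || typeof permissions.query !== "function") {
    return "unsupported";
  }
  try {
    const status = await permissions.query({ name: "captured-surface-control" });
    return status.state;
  } catch (e) {
    // Unrecognized permission names reject with TypeError.
    if (e instanceof TypeError) return "unsupported";
    throw e;
  }
}
```

In a page this would be called as `queryCscPermission(navigator.permissions)`.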

One might even say a new permission without good reason is a failure to standardize.
I've not heard a good reason yet for this level of granularity at the iframe level, or at the prompt level.

If there's a way to avoid this extra level of complexity for web developers (not vendors), I'd like to exhaust those ideas first.

@eladalon1983
Member

eladalon1983 commented Oct 22, 2024

A new iframe allow="display-capture https://a.com; captured-surface-control https://a.com;" that web developers need to turn on explicitly

That is desirable. Think embedding a third-party video-conferencing tool into an application and limiting its capabilities to control the user experience.

A new permissions.query({name: "captured-surface-control"}) value web developers need to consider

That's not a problem.

Implementation-defined behavior web developers need to consider for interop

This point is purely academic, because Chrome deems the permission policy as required, and that means we have to introduce this implementation-defined part of the API (your own citation discusses that the policies are impl-defined). Luckily, we can more easily reach consensus here, because our belief that a permission policy is needed, does not force you to also implement things this way - you are free to infer user intent and skip a prompt.

One might even say a new permission without good reason is a failure to standardize.

There are good reasons and they have been presented to you in this thread as well as the earlier threads ([1], [2]).

I've not heard a good reason yet for this level of granularity at the iframe level, or at the prompt level.

For iframes - see earlier in this comment.
For prompt - you have heard good reasons. Please re-read @steely-glint's comment and my comments on both this thread as well as the earlier threads ([1], [2]).

If there's a way to avoid this extra level of complexity for web developers (not vendors), I'd like to exhaust those ideas first.

The onus is on you to present a way to avoid this "level of complexity".

@youennf

youennf commented Oct 22, 2024

Most of the permissions I saw are high level permissions like camera, location...
It is not clear to me that this particular permission will be easy to explain to the user, hence my preference for no-prompt approaches if we can.
As an illustration, is it Chrome's plan for this permission to be exposed to website settings pane along location, camera, notification and so on?

Note that this feature is very particular since it is already gated by screen share permission. I do not think other permissions are usually gated by super permissions.

To be noted that it would be convenient for the user/UA to enable/disable/reenable gesture forwarding during a capture.
The current API does not really offer this flexibility. API checks permission at a single place and the web page will then think that forwarding is on until the end of the screen share capture.

It would seem better to expose the fact that gesture forwarding for a particular media element is on or off.

@eladalon1983
Member

eladalon1983 commented Oct 22, 2024

It is not clear to me that this particular permission will be easy to explain to the user

In our experience, it is clear and it does help.
But browsers are free to skip the permission prompt.

As an illustration, is it Chrome's plan for this permission to be exposed to website settings pane along location, camera, notification and so on?

Yes.

Most of the permissions I saw are high level permissions like camera, location...
...
Note that this feature is very particular since it is already gated by screen share permission. I do not think other permissions are usually gated by super permissions.

PTZ is gated on camera.
We have a precedent.

To be noted that it would be convenient for the user/UA to enable/disable/reenable gesture forwarding during a capture.
The current API does not really offer this flexibility. API checks permission at a single place and the web page will then think that forwarding is on until the end of the screen share capture.

The spec mandates that every forwarded event is permission-checked before being forwarded.
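
A toy model of that per-event check, to show why a revocation in the middle of a capture would take effect on the very next event. This is not the spec's algorithm; `createForwarder`, `getPermissionState` and `deliver` are hypothetical names used only for illustration.

```javascript
// Toy model: the permission state is re-read for every event, never cached.
function createForwarder(getPermissionState, deliver) {
  return function onWheel(event) {
    if (getPermissionState() !== "granted") {
      return false; // dropped: the capability was revoked or never granted
    }
    deliver(event);
    return true;
  };
}

// Example: the user revokes between two wheel events.
let state = "granted";
const delivered = [];
const forward = createForwarder(() => state, (e) => delivered.push(e));

forward({ deltaY: 100 }); // forwarded
state = "denied";         // user revokes during the capture
forward({ deltaY: 100 }); // dropped
console.log(delivered.length); // 1
```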

By the way, you have just made a great point. You want the UA to allow users to revoke this capability, be it during a capture session or outside of one. I want that too. And this perfectly illustrates the need for the established and meticulously spec-ed and implemented mechanism of permissions policies. We should not "roll our own".

@steely-glint

steely-glint commented Oct 22, 2024

There is no certainty that the VC will faithfully render the capture to the local user.

We should work to help UAs ensure this, as I propose in #13 (comment). I'd rather address the risk than slap a permission on it.

I doubt this is possible - you'd have to disable SVG, CSS, layout, z-order and a tonne of things to ensure that the video tag is a faithful rendering of the capture.

@eladalon1983
Member

eladalon1983 commented Oct 22, 2024

There is no certainty that the VC will faithfully render the capture to the local user.

We should work to help UAs ensure this, as I propose in #13 (comment). I'd rather address the risk than slap a permission on it.

I doubt this is possible - you'd have to disable SVG, CSS, layout, z-order and a tonne of things to ensure that the video tag is a faithful rendering of the capture.

Indeed again.

The result is that it breaks any app that ever wants to draw in front of the video element.

And what would be achieved? A way to avoid a prompt, which is optional to begin with if UAs wish to skip it? What's the benefit?

And what would replace the Permission Policy machinery in allowing users to revoke this capability if they don't desire it?

I don't understand this thread at all. Nor do I understand the basis for the insulting accusations that Chrome has just "slapped" this prompt on without thinking it through.

@jan-ivar
Member Author

@eladalon1983 My "slap a permission on it" refers to what you asked other browser vendors to do in #48, which you said was "trivial". Thank you very much, but I'd prefer not to take that advice.

I'll try to choose my words more carefully next time when rejecting an idea.

I want to correct the record that I have not accused the Chrome team of anything. We are here to standardize functionality for all browsers. This inherently involves scrutiny and disagreements over designs and ideas, but let's keep it to that and not people.


With my co-chair hat, I ask that we not "assign intent or interpretations to other contributors' comments". This is part of our work mode, which I encourage everyone to reread every so often.

@jan-ivar
Member Author

I doubt this is possible - you'd have to disable SVG, CSS, layout, z-order and a tonne of things to ensure that the video tag is a faithful rendering of the capture.

We'd disable forwarding, not those things. UAs already do all sorts of mitigations like this. So that doesn't seem unusual.

We should also be considering these mitigations whether we add permission or not.

@eladalon1983
Member

eladalon1983 commented Oct 23, 2024

My "slap a permission on it" refers to what you asked other browser vendors to do in #48, which you said was "trivial". Thank you very much, but I'd prefer not to take that advice.

I think you have misread that comment. Please read it again. It said that you could have a "trivial" implementation that avoids a prompt, so it's the opposite of slapping a permission (prompt) - it said it would be "trivial" for you to avoid the permission (prompt).

As to adding the permission - we thought about it long and hard in Chrome and carefully decided it was necessary. If you think long and hard about this non-trivial issue, and come to the opposite conclusion, it would still be trivial for you to avoid this prompt which we non-trivially added.

We'd disable forwarding, not those things.

  1. We have feedback from multiple Web developers in our origin trial, saying they are already using the API in exactly the way you seek to block. (That is - the video element has annotations and other elements on top, and the scrolling is forwarded from an overlaid surface.)
  2. For some applications, you would be blocking it 100% of the time, not just when there is something drawing on top of the video element, because these developers always have that element over the video, ready to draw something.
  3. Even when you block forwarding when something flies across the video - how annoying it will be for the user when they try to scroll the captured tab and end up scrolling the capturing tab instead. Your limitation would create those bugs and developers would lose their access to the simple fix they currently have, of forwarding scrolls from an overlay.
  4. We have a browser (Chrome) saying they are unwilling to implement it without a permission, whereas other browsers are perfectly able to skip the prompt if they wish, and have the flexibility to change their mind on it multiple times as the years go by and new information pours in.

Given these points - especially 1 and 4 - I don't see what this discussion aims to achieve.

@eladalon1983
Member

eladalon1983 commented Oct 23, 2024

I doubt this is possible - you'd have to disable SVG, CSS, layout, z-order and a tonne of things to ensure that the video tag is a faithful rendering of the capture.

Jan-Ivar, when you read this message by Tim, please consider the possibility of drawing two video elements, one with (1) an unfaithful representation of the capture at full opacity (fully visible), then another (2) faithful representation at minimum opacity (practically invisible). The app places (2) on top and forwards scrolls from it, proving your prompt-replacement mitigations useless.

Now consider that you are asking the browser to employ heuristics to avoid that, and Tim is telling you that there might just be too many ways to do it, and that it's infeasible to both enumerate them as well as mitigate them all.

@steely-glint

You don't even need multiple video tags to be usefully deceptive. The simplest example I can come up with is to lie about the amount of scroll by clipping (in css) the rendered video to a small subsection of the captured video then un-scrolling the view port at a rate that partially counteracts the user's attempt to scroll. Most users would just spin the scroll wheel a bit further - possibly revealing things they didn't intend to.
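
The arithmetic behind this, as a rough sketch: if the page cancels a fraction k of each scroll in its clipped preview, the user sees only (1 - k) of the real movement and tends to over-scroll by a factor of 1 / (1 - k). The function names and the example numbers are illustrative assumptions, not measurements.

```javascript
// What the clipped preview appears to scroll, given what the captured tab really scrolled.
function apparentScroll(actualPx, counterFraction) {
  return actualPx * (1 - counterFraction);
}

// How far the captured tab really scrolls if the user keeps going until the
// preview appears to have moved by intendedPx.
function actualScrollForIntended(intendedPx, counterFraction) {
  if (counterFraction < 0 || counterFraction >= 1) {
    throw new RangeError("counterFraction must be in [0, 1)");
  }
  return intendedPx / (1 - counterFraction);
}

console.log(apparentScroll(400, 0.5));          // 200
console.log(actualScrollForIntended(400, 0.5)); // 800
```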

@steely-glint

I would add that Jan-Ivar is right that the problem stems from the terrible UX of having the user view and interact with a re-render of their own app. Unfortunately the modern combination of screen shares and the expectation that a VC app will fill the screen makes this inevitable. (in times gone by we'd have re-parented the 'captured' window and added it to the window-tree of the VC app. We live in more complicated times now ;-) )

@steely-glint

If you think my example is absurd, consider the value to an attacker of being able to see a couple of slides ahead in an earnings call.

@eladalon1983
Member

eladalon1983 commented Oct 23, 2024

Summarizing things:

  • Chrome believes a permissions prompt is a hard requirement for its own implementation.
  • Tim demonstrates that Chrome's position is reasonable and this philosophy is shared by others.
  • Elad argues that if other browsers wish to avoid the prompt, it's trivial for other browsers to avoid it.
  • Youenn raises the issue of revocation, which Elad argues is a great example for the benefits of the pre-existing, well-specced and well-implemented Permissions mechanisms.
  • No feasible mitigations have been demonstrated that would obviate the need for permissions. (And recall again revocation!)

I move to close this issue.

@youennf

youennf commented Oct 23, 2024

PTZ is gated on camera.

PTZ is not a permission, it is a descriptor.

If you think my example is absurd, consider the value to an attacker of being able to see a couple of slides ahead in an earnings call.

The only protection that the Google Chrome prompt will provide is that most users may ignore it. I am not sure they would be able to understand the potential threats; this is pretty hard.
If implementing a synchronous prompt in Safari, I am not sure how I would convey the right message to the user.

One disconnect seems to be that one approach is to start small and very protective and another approach is to support all potential use-cases.
The MVP to me is to forward user gestures for a video element that is fully displayed and without any other element above it. We should further define what we want to address in v1.

  • Elad argues is a great example for the benefits of the pre-existing, well-specced and well-implemented Permissions mechanisms.

I might have missed my explanation.
Revocation is basically page wide, which does not work well since the granularity may be on a per media element basis.

That is not to say there are no pros for the permission model or permission policy.
picture-in-picture is using it to control third-party delegation.

@eladalon1983
Member

PTZ is not a permission, it is a descriptor.

I see this in Chromium's Translation Console (database of localized strings with some visual context for the translator). Possibly I am missing some later development and/or a nuance?

The only protection that the Google Chrome prompt will provide is that most users may ignore it.

Nobody has said this would be the only protection. We have already discussed multiple other protections; see the move from sendWheel() to captureWheel(), for instance.

If implementing a synchronous prompt in Safari

We are not proposing a synchronous prompt.
Jan-Ivar did (here), but I objected (here).

One disconnect seems to be that one approach is to start small and very protective and another approach is to support all potential use-cases.
The MVP to me is to forward user gestures for a video element that is fully displayed and without any other element above it. We should further define what we want to address in v1.

If this is about a permission prompt, then starting small is to have it and potentially remove it.
If this is about limiting the element type, then video is too small - we have multiple developers in the origin trial that forward from a non-video element, and I am not aware of any that are forwarding from a video element. It is clear that the MVP includes non-video elements. It is NOT clear what would be gained by limiting to video-elements; the mitigation has been shown, multiple times, to be insufficient to increase security. I dare say - it's theater.

I might have missed my explanation

I think you mean "you", and if so, then yes, I have missed it. Could you please link me to the relevant comment?

Revocation is basically page wide, which does not work well since the granularity may be on a per media element basis.

Multiple concurrent screen-captures by one application is an edge-case that I don't expect to tackle in Chrome. (But the spec does not prohibit it, so other browsers should not be bothered.)

@jan-ivar
Member Author

jan-ivar commented Oct 23, 2024

... please consider the possibility of drawing two video elements, one with (1) an unfaithful representation of the capture at full opacity (fully visible), then another (2) faithful representation at minimum opacity (practically invisible). The app places (2) on top and forwards scrolls from it, proving your prompt-replacement mitigations useless.

This is click-jacking, which UAs already deal with on the daily. UAs should be given a lot of leeway to fight such behavior when detected, and we should build APIs that support them in this (like tying forwarding to playback as a start).

So these are click-jacking mitigations. Calling them "prompt-replacement" mitigations wrongly suggests a permission prompt adequately addresses click-jacking (or that mitigation can be skipped because the user was warned), which is false.

Permission prompts have been shown to be useless in explaining click-jacking threats to users. If users can't understand the risk, then we have not obtained meaningful consent.

If you think my example is absurd, consider the value to an attacker of being able to see a couple of slides ahead in an earnings call.

Remote control is mitigated. A company using a malicious VC app for earnings calls seems a bit unrealistic.

I do not wish to diminish the risk. On the contrary, I'm arguing for mitigation and against permission as panacea.

@eladalon1983
Member

Permission prompts have been shown to be useless in explaining click-jacking threats to users.

The permission policy and prompt are NOT a click-jacking prevention mechanism.

This issue started with a claim that a permission prompt is unnecessary, and a suggestion that its benefits could be better provided with other mechanisms; namely, with a limitation of the element types. In response, Tim and I have shown that element-type-limiting is easy to circumvent, which means it cannot be used as a substitute for anything, because it provides nothing. This is the correct context of this exchange. The claims that (1) a permission prompt is undesirable, and that (2) other mechanisms are sufficient substitutes, both remain unsubstantiated.

Moreover, the counter-claim that if a permission prompt is truly undesirable, the spec does not prevent UAs from skipping it, has not been addressed.

I'm arguing for mitigation and against permission as panacea.

And I am claiming that the mitigation you proposed (limiting element types) confers no security benefits.

Further, the permission was not presented as a panacea, so let's please not characterize that claim as such.

@eladalon1983
Member

I'm arguing for mitigation

The only mitigation currently under discussion is captured-surface-control/issues/28. However, my questions about the attack-vector involved, and the effectiveness of the mitigation, remain unanswered. (If you are referring to a different mitigation, however, please advise which.)

@jan-ivar
Member Author

Permissions are necessary when undesirable behaviors are indistinguishable from desirable ones (microphone capture for example).

Undesirable behaviors:

  • Attempts to click-jack scrolling input from the user, through techniques such as
    • div covering entire page
    • transparent element
    • element following the mouse
    • element larger than visible preview video
    • element not visible to the user
  • Attempts to induce over-scroll
    • no preview video
    • delayed preview video
    • inauthentic preview video

If UAs detect these behaviors, they can simply disable forwarding. No trust needed.
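
One way a UA-side check for some of the listed behaviors might be structured, purely as a sketch. The `snapshot` fields (`opacity`, `occluded`, `visibleFraction`, `previewPlaying`, `previewLatencyMs`) and the thresholds are assumptions made for illustration. Detecting an inauthentic preview is not covered, and nothing here comes from a spec or an implementation.

```javascript
// Returns the reasons forwarding should be disabled (empty array means allow).
function forwardingBlockers(snapshot, limits = { minOpacity: 0.99, maxLatencyMs: 500 }) {
  const reasons = [];
  if (snapshot.opacity < limits.minOpacity) reasons.push("transparent element");
  if (snapshot.occluded) reasons.push("covered by another element");
  if (snapshot.visibleFraction < 1) reasons.push("element not fully visible");
  if (!snapshot.previewPlaying) reasons.push("no live preview");
  if (snapshot.previewLatencyMs > limits.maxLatencyMs) reasons.push("delayed preview");
  return reasons;
}

const good = {
  opacity: 1,
  occluded: false,
  visibleFraction: 1,
  previewPlaying: true,
  previewLatencyMs: 80,
};
console.log(forwardingBlockers(good).length);                          // 0
console.log(forwardingBlockers({ ...good, opacity: 0.01 }).join(", ")); // transparent element
```

Returning reasons rather than a boolean is a design choice for the sketch: it keeps each heuristic separately testable.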

Desirable behaviors:

  • User-visible, live and stable preview area the user can comfortably interact with (doesn't jump around)
  • Emojis on top

The MVP to me is to forward user gestures for a video element that is fully displayed and without any other element above it.

This seems to satisfy the desirable behaviors except emojis. It seems implementable without relying on users trusting the website, provided the UA detects the undesirable behaviors that remain possible (last-second moving of the element, pausing playback etc.).
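
The "last-second moving" case could be approached with a stability window: only forward when the element's rectangle has been unchanged for some minimum time. A sketch under that assumption; `createStabilityTracker`, the 500 ms value and the rectangle shape are hypothetical.

```javascript
// Tracks when the element's on-screen rectangle last changed.
function createStabilityTracker(minStableMs = 500) {
  let lastRect = null;
  let lastChangeMs = 0;
  const same = (a, b) =>
    a !== null && a.x === b.x && a.y === b.y && a.width === b.width && a.height === b.height;
  return {
    // Call with the current rectangle on every layout observation.
    observe(rect, nowMs) {
      if (!same(lastRect, rect)) {
        lastRect = { ...rect };
        lastChangeMs = nowMs;
      }
    },
    // True only if a rectangle was observed and has been still for minStableMs.
    isStable(nowMs) {
      return lastRect !== null && nowMs - lastChangeMs >= minStableMs;
    },
  };
}

const tracker = createStabilityTracker(500);
tracker.observe({ x: 0, y: 0, width: 640, height: 360 }, 0);
console.log(tracker.isStable(100)); // false
console.log(tracker.isStable(600)); // true
tracker.observe({ x: 50, y: 0, width: 640, height: 360 }, 650); // element moved
console.log(tracker.isStable(700)); // false
```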

I'm open to extending MVP to solving emojis. As I proposed in #49: "CSS [or]... a div.enableGestureForwarding to forward gestures to the video element underneath."

@youennf

youennf commented Oct 25, 2024

I also would like to see whether we can do without prompts.
Safari usually requires synchronous prompts and if the only way for this feature is to use that kind of prompt, this might make it hard to support this feature in Safari.

If UAs detect these behaviors, they can simply disable forwarding. No trust needed.

That would be ideal.

  • Emojis on top

is it even needed for the initial MVP?
The demo I saw seems already useful as is for scrolling slides or wikipedia pages.

FWIW, we started very cautious for getDisplayMedia and we added some features progressively, maybe we can do the same here.

@eladalon1983
Member

Permissions are necessary when undesirable behaviors are indistinguishable from desirable ones

The established TAG design principle is: "If a useful feature has the potential to cause harm to users, make sure that the user can give meaningful consent for that feature to be used, and that they can refuse consent effectively."

A call to getDisplayMedia() invokes a prompt asking the user whether they want to share the currently-visible pixels of another surface. That is an established prompt, and accepting it does not indicate to the browser any other intention by the user. The way to understand the user's intention is to prompt them. If screen-share implicitly allows scrolling/zooming, without an additional prompt, then the aforementioned principle is broken.

Prompts might not be perfect, but they are better than guessing. It is perfectly spec-compliant for the user agent to (i) augment the permission policy with heuristics, (ii) skip the additional prompt, or (iii) modify the getDisplayMedia() prompt. The current spec is flexible enough that user agents have much flexibility in implementing it. This is desirable, and usually aids consensus formation.

Safari usually requires synchronous prompts

Chrome usually uses async prompts.
It is quite useful that the API shape proposed already accommodates both these design philosophies, as well as others.

  • Emojis on top

is it even needed for the initial MVP?

Yes, it is. The MVP is informed by the real-world requirements of real-world Web developers, and the real-world scenarios they inform us that they need to solve.

The MVP to me is to forward user gestures for a video element that is fully displayed and without any other element above it.

It seems implementable without relying on users trusting the website

This claim is unsubstantiated; the counterarguments are available earlier in the thread, which explained that the representation in the video element might not be faithful. ([1], [2])

To name another counterargument - malicious sites can get the user to scroll somewhere, then either:

  • Pop a video element where the user was already scrolling.
  • Have the video already there, but obscured by another element, then remove the obscuring element.

Reminder - I am NOT saying that the prompt is intended to stop clickjacking (recall here). Rather, I am saying that the proposed alternative of limiting-scrolling-to-video-element is susceptible to clickjacking, and it therefore fails to add any value, let alone obviate any security measures.

I'm open to extending MVP to solving emojis.

The MVP is informed by Web developers' stated needs. As explained, this includes a video completely obscured by a canvas, div or other element on top of which developers introduce whichever other elements.

@jan-ivar
Member Author

jan-ivar commented Oct 30, 2024

Thank you all for the continued and thoughtful discussion on this important issue.

I'm glad we agree that serious click-jacking concerns remain with this API. To address them, I've filed #41 so we can collaboratively work on mitigating them.

I agree with @youennf that adopting a cautious and protective approach by adding features progressively is prudent. In the long term, I believe that user agents shouldn't rely solely on user consent — especially when that consent may not be fully informed due to the complexities of risks like click-jacking. Instead, we should aim to build robust protections directly into the technology.

However, I understand that browsers may need time and practical experience to develop effective safeguards against click-jacking attacks introduced by these new features.

As a compromise, I'm open to considering the inclusion of a permission prompt in the short term, provided we can agree that our long-term goal is to eliminate the need for it once adequate mitigations are in place.

@eladalon1983 makes a good point that vendors can choose to grant this permission by default once they feel confident in their protective measures. Once all browsers reach that level of confidence, we should be able to deprecate the permission or at least reduce its implementation cost.

That said, I would prefer if the inclusion of permission doesn't dictate the API shape.

E.g. if prompting on first scroll is undesirable, what if @youennf's proposed API triggered a prompt and instant NotAllowedError?

try {
  videoElement.enableGestureForwarding = true; // triggers a prompt and fails instantly
} catch (e) {
  if (e.name !== "NotAllowedError") throw e; // rethrow anything unexpected
  console.log(videoElement.enableGestureForwarding); // false
}

@eladalon1983
Member

As a compromise, I'm open to considering the inclusion of a permission prompt in the short term, provided we can agree that our long-term goal is to eliminate the need for it once adequate mitigations are in place.

Thank you for proposing this compromise. It works for me. But to be perfectly clear - we agree to explore ways to eliminate the policy and/or prompt in the long-term, once we gain real-life data on users' and Web apps' behavior. If the policy and/or prompt are proven unnecessary, we will gladly remove them. But this long-term goal cannot block short-term API shapes that return a Promise, which is necessary for now (see below).

Once all browsers reach that level of confidence, we should be able to deprecate the permission or at least reduce its implementation cost.

I agree.

what if @youennf's proposed API triggered a prompt and instant NotAllowedError?

That is a completely unworkable solution. Please see this comment for a list of the benefits of an async API that returns a Promise. (There are three quote-response pairs in that comment. I am referring to the middle one.)
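
For comparison with the synchronous setter proposed earlier, a Promise-returning shape lets the page wait for the user's answer to a non-blocking prompt. A sketch only: it assumes the `captureWheel()` method mentioned earlier in this thread returns a Promise that rejects with `NotAllowedError` on denial. `startForwarding` is a hypothetical wrapper, and `controller` is any object exposing that method.

```javascript
// Resolves to true if forwarding was enabled, false if the user (or UA) declined.
async function startForwarding(controller, element) {
  try {
    await controller.captureWheel(element); // may show a prompt; settles once decided
    return true;
  } catch (e) {
    if (e.name !== "NotAllowedError") throw e; // rethrow anything unexpected
    return false;
  }
}
```

The caller can then update its UI from the resolved value instead of polling an attribute.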

I'm glad we agree that serious click-jacking concerns remain with this API.

We do not agree about that, but since you propose a compromise which I support - starting with a prompt and considering its removal later - we can bench this discussion.

@eladalon1983
Member

Issue transferred; heads up to those discussion participants who might otherwise be looking for it elsewhere: @jan-ivar, @youennf, @steely-glint
