Skip to content

Fix #81543: parse_url omits leading slash in windows paths #7598

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Pchelolo
Copy link

According to the RFC8089 Section 2[1]:

The generic syntax in [RFC3986] includes "path" and "authority"
components, for each of which only a subset is used in the definition
of the file URI scheme. The relevant subset of "path" is "path-
absolute"

'path-absolute' in RFC3986 is defined as beginning with a '/' character.

There's also a requirement in RFC3986 section 3.3 [2], that

If a URI contains an authority component, then the path component
must either be empty or begin with a slash ("/") character.

For file:///c:/ URL the authority component is present but empty,
which implies 'localhost'. Thus the 'path' component must begin
with the '/' character.

For all intents and purposes 'file:///c:/' and 'file://localhost/c:/'
URLs are equivalent, so parsing them differently is a surprising behaviour.

Other popular languages like Java or JavaScript preserve the
leading slash character as well.

This effectively reverts commit [3].

[1] https://datatracker.ietf.org/doc/html/rfc8089#section-2
[2] https://datatracker.ietf.org/doc/html/rfc3986#section-3.3
[3] 4505a61

@cmb69
Copy link
Member

cmb69 commented Oct 20, 2021

Thanks for the PR! IMO it's okay for master, if we properly document the change.

@Pchelolo Pchelolo force-pushed the parse_url_windows_paths branch from c2d2075 to a18157a Compare October 20, 2021 15:27
According to the RFC8089 Section 2[1]:

> The generic syntax in [RFC3986] includes "path" and "authority"
  components, for each of which only a subset is used in the definition
  of the file URI scheme.  The relevant subset of "path" is "path-
  absolute"

'path-absolute' in RFC3986 is defined as beginning with a '/' character.

There's also a requirement in RFC3986 section 3.3 [2], that

> If a URI contains an authority component, then the path component
  must either be empty or begin with a slash ("/") character.

For file:///c:/ URL the authority component is present but empty,
which implies 'localhost'. Thus the 'path' component must begin
with the '/' character.

For all intents and purposes 'file:///c:/' and 'file://localhost/c:/'
URLs are equivalent, so parsing them differently is a surprising behaviour.

Other popular languages like Java or JavaScript preserve the
leading slash character as well.

This effectively reverts commit [3].

[1] https://datatracker.ietf.org/doc/html/rfc8089#section-2
[2] https://datatracker.ietf.org/doc/html/rfc3986#section-3.3
[3] php@4505a61
@Pchelolo Pchelolo force-pushed the parse_url_windows_paths branch from a18157a to 2badf46 Compare October 20, 2021 15:29
@krakjoe krakjoe added the Bug label Oct 22, 2021
@nikic
Copy link
Member

nikic commented Nov 3, 2021

I'm not sure about this. While the change is "technically correct", I suspect that it's going to break a significant amount of code, because "file:///" URIs will no longer "just work" on Windows. You now need to manually strip the leading slash.

@cmb69
Copy link
Member

cmb69 commented Nov 3, 2021

So better discuss this on the internals mailing list?

@Pchelolo Pchelolo requested a review from bukka as a code owner October 7, 2023 13:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
4 participants