
--find-links should not warn about missing HTML5 doctype #10903

@virtuald

Description

This is not a duplicate of #10825. I've moved a number of my comments from #10825 here because this is a separate issue. That issue is about enforcing PEP 503, which states that servers implementing the Python simple index protocol should serve pages with an HTML5 doctype. This issue is not about that.


When using --find-links, this warning can appear:

  × The package index page being used does not have a proper HTML doctype declaration.
  ╰─> Problematic URL: https://www.tortall.net/~robotpy/wheels/2022/roborio/

  note: This is an issue with the page at the URL mentioned above.
  hint: You might need to reach out to the owner of that package index, to get this fixed. See https://github.com/pypa/pip/issues/10825 for context.
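For context, the condition behind this warning can be approximated with the standard library's html.parser: record the first markup declaration on the page and accept it only if it is exactly "doctype html" (case-insensitive). This is a sketch under that assumption, not pip's actual implementation; DoctypeSniffer and has_html5_doctype are hypothetical names.

```python
from html.parser import HTMLParser
from typing import Optional


class DoctypeSniffer(HTMLParser):
    """Records the first <!...> declaration seen in a document."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.decl: Optional[str] = None

    def handle_decl(self, decl: str) -> None:
        # decl is the text between "<!" and ">", e.g. "DOCTYPE html".
        if self.decl is None:
            self.decl = decl


def has_html5_doctype(page: str) -> bool:
    """Hypothetical helper: True only if the page declares <!DOCTYPE html>."""
    sniffer = DoctypeSniffer()
    sniffer.feed(page)
    sniffer.close()
    if sniffer.decl is None:
        return False  # no doctype at all
    # Normalize whitespace and case. An HTML 4.01 or XHTML doctype carries
    # extra public/system identifiers, so it fails this comparison.
    return " ".join(sniffer.decl.split()).lower() == "doctype html"
```

Note that a page with an older doctype and a page with no doctype are treated the same way here, even though both list their links perfectly well.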

Currently the --find-links documentation says:

  -f, --find-links <url>      If a URL or path to an html file, then parse for links
                              to archives such as sdist (.tar.gz) or wheel (.whl)
                              files. If a local path or file:// URL that's a
                              directory, then look for archives in the directory
                              listing. Links to VCS project URLs are not supported.

There is no HTML5 doctype requirement mentioned.
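The local-directory case in that help text amounts to listing files with archive extensions. A minimal sketch of that behavior, assuming only the two formats the help text names (.whl and .tar.gz) matter; pip itself recognizes more formats and filters candidates much more carefully, and find_local_archives is a hypothetical name:

```python
from pathlib import Path
from typing import List

# Only the two archive formats named in the --find-links help text.
ARCHIVE_SUFFIXES = (".whl", ".tar.gz")


def find_local_archives(directory: str) -> List[str]:
    """Hypothetical helper: file:// URLs for archives directly inside directory."""
    root = Path(directory).resolve()  # as_uri() requires an absolute path
    return sorted(
        p.as_uri()
        for p in root.iterdir()
        if p.is_file() and p.name.endswith(ARCHIVE_SUFFIXES)
    )
```

Nothing in this path involves HTML at all, which is part of why a doctype requirement feels out of place for --find-links.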


To me, --find-links serves a very different purpose than a full PyPI-style index. For environments where running a full Python package index is too much work (or corporate environments where working with IT is really difficult), it's very convenient to put a bunch of files on a webserver, point pip at an arbitrary directory listing, and install packages from it. Unfortunately, the most popular webservers in the world (and even Python's built-in http.server!) don't emit an HTML5 doctype by default, because it simply doesn't matter when all you're doing is showing a directory listing so users can download a file.
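This setup is easy to reproduce locally. The sketch below serves a temporary directory with the standard library's http.server on an ephemeral localhost port and returns the generated listing, so you can check which doctype (if any) your Python version emits. The file name demo-1.0-py3-none-any.whl is a placeholder, and fetch_listing is a hypothetical helper.

```python
import functools
import tempfile
import threading
import urllib.request
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path


def fetch_listing(directory: str) -> str:
    """Serve directory on an ephemeral localhost port and return its HTML listing."""
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    with ThreadingHTTPServer(("127.0.0.1", 0), handler) as server:
        thread = threading.Thread(target=server.serve_forever, daemon=True)
        thread.start()
        try:
            port = server.server_address[1]
            with urllib.request.urlopen(f"http://127.0.0.1:{port}/") as resp:
                return resp.read().decode("utf-8")
        finally:
            server.shutdown()  # stop serve_forever before the server closes


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        Path(d, "demo-1.0-py3-none-any.whl").touch()  # placeholder archive
        listing = fetch_listing(d)
        # The first line shows the doctype this Python version emits, if any.
        print(listing.splitlines()[0])
```

The same directory could then be passed to pip with --find-links http://127.0.0.1:PORT/ while the server is running.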

You might say that it's currently only a warning, and that it'll be a long time before we make it an error! But it's a useless warning, and the only way to fully resolve it is to go to every webserver vendor in the world and tell them they must use an HTML5 doctype in their directory listings because pip says so. Then those changes would need to be backported to 'stable' Linux distributions like RHEL.

In many corporate environments, developers don't get a choice of which webserver IT is using, and so this warning is just unnecessary noise and will waste hundreds of hours for developers and ops teams.

Production-quality web servers that don't emit HTML5 doctype by default

Others that don't

Those that do


I appreciate that html5lib adds a lot of work for pip maintainers. If there's a way to use html.parser and ignore the doctype (and the downgrade from an error to a warning suggests there is), it seems like that would save hundreds (thousands?) of person-hours for ops teams around the world who would otherwise have to figure out how to reconfigure their webservers because pip is being unnecessarily picky.
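A rough sketch of what this suggestion could look like: an html.parser-based collector that extracts archive links and never inspects the doctype (HTMLParser's default handle_decl does nothing). ArchiveLinkParser and extract_archive_links are hypothetical names; this is not pip's code, and it skips details a real index client handles, such as data-requires-python attributes and hash verification.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urljoin, urlsplit

# Only the two archive formats named in the --find-links help text.
ARCHIVE_SUFFIXES = (".whl", ".tar.gz")


class ArchiveLinkParser(HTMLParser):
    """Collects absolute <a href> targets; any doctype is simply ignored."""

    def __init__(self, base_url: str) -> None:
        super().__init__(convert_charrefs=True)
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        href = dict(attrs).get("href")
        if not href:
            return
        if tag == "base":
            # <base href> changes how later relative links resolve.
            self.base_url = urljoin(self.base_url, href)
        elif tag == "a":
            self.links.append(urljoin(self.base_url, href))


def extract_archive_links(page: str, page_url: str) -> List[str]:
    parser = ArchiveLinkParser(page_url)
    parser.feed(page)
    parser.close()
    # Check the suffix on the URL path so "#sha256=..." fragments don't interfere.
    return [u for u in parser.links if urlsplit(u).path.endswith(ARCHIVE_SUFFIXES)]
```

Because the parser never consults the declaration, an Apache-style listing with an HTML 4.01 doctype and one with <!DOCTYPE html> yield the same links.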

Thanks for your consideration.

Expected behavior

No warning

pip version

22.0.3

Python version

3.10

OS

any

How to Reproduce

N/A

Output

No response
