-
Notifications
You must be signed in to change notification settings - Fork 3.2k
Description
What's the problem this feature will solve?
Currently, pip installation statistics are aggregated to the gCloud and made available on libraries.io and pepy.tech. A lot of effort has gone into these numbers, but thanks to automation, they mean less now than they did a few years ago.
CI and other automation, combined with maybe a bit too much reliance on PyPI's central infrastructure, have inflated the download numbers and diluted the signal with noise.
Describe the solution you'd like
We could detect when pip is being used interactively (by checking if stdin is a tty or some other mechanism), and include that in the pip install request headers, to be included in the statistics generated by the server.
This would provide us with much cleaner data for highlighting actual community activity, instead of drowning in automation trends, overly favoring professionalized sectors of Python. Specifically, a library being manually installed 100 times may well indicate something much more interesting than a CI (or, unfortunately, a production) fleet installing a package 10,000 times.
Additional context
- I wasn't sure whether to file this on pip or on Warehouse, it seems kind of 🐔 / 🥚 to me.
- I'm not really sure if/how other package indexes solve this, but would be very interested in hearing.
- As an arbitrary example, I happen to know Mozilla uses PyPI for quite a few relatively-internal packages. Granted, they're open-source and I'm happy to see some infrastructure synergy. But, picking at random, mozlog actually ranks ok for downloads, even though it's not a very broadly-useful package, and I'm pretty sure the data will show it's mostly Mozilla infrastructure downloading it.
Thanks for your attention and keep up the good work!