-
Notifications
You must be signed in to change notification settings - Fork 3.2k
Fix vcs password leak through subprocess_cmd. #6862
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thanks for filing this! I looked at the PR, and I think there is a cleaner way we can be handling this. Making every caller have to construct and pass I'm thinking a simpler solution could involve defining a simple The wrapper class could be something like this (the name is just a suggestion, maybe class SensitiveText(object):
def __init__(self, raw, redacted):
self.raw = raw
self.redacted = redacted
def __str__(self):
return self.redactedAnd then we could have helper functions as needed so callers won't need to call def hide_url(url):
redacted = redact_password_from_url(url)
return SensitiveText(url, redacted=redacted)
hidden_url = hide_url(...)and the even simpler: def hide_password(password):
return SensitiveText(password, redacted='****')Callers could "hide" variables as soon as we know about them (though for now, we could just restrict this use to the Then pip/src/pip/_internal/utils/misc.py Lines 767 to 772 in 908e203
And of course, we'd also want to convert these objects to |
|
@cjerdonek, that would also be relevant to #6834. |
|
@cjerdonek Thanks for the feedback! I agree, the current approach in this PR feels pretty fragile. I'll start working on incorporating those suggestions! |
|
@cjerdonek I've pushed up some edits, things are looking much cleaner code-wise now! 😄 |
src/pip/_internal/utils/misc.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My instinct is that this class can be made a lot simpler. Like, couldn't it work just to do this? (no need for __new__, __init__, __eq__, and __hash__)
class HiddenText(str):
def __repr__(self):
return '<HiddenText {!r}>'.format(str(self))
def __str__(self):
return self.redacted
@property
def raw(self):
return str.__str__(self)Then "public" creation functions can be made like this, for example:
def hide_password(password):
hidden = HiddenText(password)
hidden.redacted = '****'
return hidden
a = hide_password('*secret*')
print(str(a))
print(repr(a))
print(a.raw)which gives:
****
<HiddenText '****'>
*secret*
I think decoupling the creation functions from the class will be better because it keeps the class simpler. Also, after giving it some thought, I think HiddenText is a better name in part because it's simpler to type and say.
By the way, your __hash__() implementation below I don't think is right because it bases the hash off of the redacted form, which can be the same for different values. The above, on the other hand, does the right thing because it inherits what str does, which is to use the underlying string value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the hash off of the redacted form
That's actually intentional. The idea is that if the redacted form is stored in a hashable location, the redacted form is likely what is being operated upon. The example that I ran into was the urlparse cache where the redacted url was being split. In that case (and I think all cases), the redacted form was what was being cached.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, the __eq__ is necessary for proper comparison specifically during tests. I was trying to avoid rewriting all of the assertions which compare actual values to expected values but I'm open to rewriting those tests as well if that's a better approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, the eq is necessary for proper comparison specifically during tests.
Can you give me an example of some tests that would be affected by what I suggested?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This went back to when I was patching the Link model but at that point there were a few tests that were failing when comparing the actual url (HiddenText) against the expected url (str).
Here were a couple of examples:
pip/tests/unit/test_req_file.py
Line 564 in 908e203
| def test_expand_missing_env_variables(self, tmpdir, finder): |
pip/tests/unit/test_req_file.py
Line 535 in 908e203
| def test_expand_existing_env_variables(self, tmpdir, finder): |
Since I'm no longer updating the Link model, these tests wouldn't be impacted but if the intent is to move the HiddenText cast back into the Link model eventually this kind of issue will come up. At that point, it's whether the tests should be updated or the __eq__ comparison logic.
I defer to your preferences here 😄
|
@cjerdonek I've pushed simplifications and renames to the |
|
I think you should keep this initial PR more limited in scope, so don’t
change the Link class.
…On Sun, Aug 11, 2019 at 9:23 PM Allan Feldman ***@***.***> wrote:
@cjerdonek <https://github.com/cjerdonek> I've pushed simplifications and
renames to the HiddenText class. My concerns with the PR are more
functional at the moment: I've edited the Link model to replace the url
attribute with a HiddenText type. I'm not familiar enough with the
overall architecture of pip to tell if I've actually broken something by
doing this. Do you have any concerns with modifying the Link class?
Should I move the HiddenText cast closer in to the versioncontrol logic?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#6862?email_source=notifications&email_token=AACW33TTDUSCNM6PLHYQUS3QEDQUTA5CNFSM4IK44LJKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD4BRH6A#issuecomment-520295416>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AACW33VZS3OHKLEXDHKZCFLQEDQUTANCNFSM4IK44LJA>
.
|
|
Scoping down the PR sounds like a good plan 😄 |
I still see the class methods.. |
Sorry, rereading the suggestion posted, I'm not entirely clear on the suggestion. Is the suggestion to remove the class methods entirely and replace with helper functions? Is there an advantage to having functions here over class methods? I'll go with whichever you suggest I'm just curious to learn more about the rationale. |
|
Yes, the suggestion was to have the creation functions be separate from the class. This keeps the class simpler so it doesn't e.g. have to know about passwords, parsing URL's, etc. Also, if we later want to add something like a |
|
@cjerdonek Got it, I also see what you were saying about the simplification. I'm making edits to be more in-line with your comment now 😄 |
|
@cjerdonek It doesn't look like the
|
|
@cjerdonek I've gone ahead with option 1 and left out all of the |
|
Can you say why it didn’t work for you?
…On Sun, Aug 11, 2019 at 10:23 PM Allan Feldman ***@***.***> wrote:
@cjerdonek <https://github.com/cjerdonek> It doesn't look like the
HiddenText class suggested here
<#6862 (comment)> works. I'm
not quite sure how to proceed. There are a couple of ways I can get this to
work:
1. I can use the HiddenText class with the redacted value as the
string and the raw value as an additional attribute. The downside to this
is the eq behavior may have to be changed, depending on how we want to
handle things.
2. I can remove the string subclass and pass the raw / redacted
explicitly but this has the downside of not being able to work in areas
that require strings or byte strings (like regex).
3. Possibly other ways 😄
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#6862?email_source=notifications&email_token=AACW33SBLGUEDFVY6I2YTKLQEDXTXA5CNFSM4IK44LJKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD4BS3SA#issuecomment-520302024>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AACW33UDZL73LTXQY5WZ7WDQEDXTXANCNFSM4IK44LJA>
.
|
When I used the class given, I observed the passwords getting leaked into the logs. This was due to the call to format_command_args which was ultimately using the raw value to formulate the description. |
This is why I said this in my first comment:
|
|
I worry that defaulting to operating on secrets is still error prone. Will there be future issues that occur from defaulting to the use of the raw secret value for string operations? The current implementation is fail-secure, the raw value must be explicitly accessed. I can certainly see advantages / disadvantages either way, it does seem prudent to discuss though. |
|
I would focus on getting something working that is simple and doesn't require too many changes. And we can always continue to refine. My suggestion in my first comment was "fail secure" in the sense that you're talking about, but I don't know what effect it would have on type-checking without trying it out myself. Personally, I'd rather we didn't subclass from |
|
I pushed up an edit which does an isinstance check inside of |
src/pip/_internal/utils/misc.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can just be shlex_quote(str(arg)).
It looks like the Also, I would add a comment explaining why the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, as the PR is in better shape now, I've added some more detailed comments.
Also, I don't want to put more on your plate now that things look like they're working, but did you ever try the form of class I suggested in my original comment here? That has some advantages in being more explicit about what's being passed around. For example, it would let us distinguish more easily between HiddenText and str when using the the type-checker (and not get confused about whether something has been converted yet). It also addresses the "failure mode" issue you raised earlier.
src/pip/_internal/download.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even though this seems harmless, I think it would be better to keep things narrow for now and just keep to the minimal places in the vcs code. We can expand the use in later PR's as the pattern solidifies, etc.
src/pip/_internal/utils/misc.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should add type annotations to all the new methods and functions you're adding.
tests/functional/test_install.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be best to avoid adding more tests that depend on network access, IMO, as it adds flakiness, makes the tests slower, etc. It would be best if we can find the appropriate places at which to unit test to test this functionality (starting with the "inner-most" functions).
src/pip/_internal/utils/misc.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like you have one too many Unions. Can't this just be List[Union[str, HiddenText]]?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mypy doesn't allow it because this function is in use by req_install.py and a few other files.
mypy appears to treat List[Union[str, HiddenText]] as incompatible with List[str].
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, you might want to define a short-hand for List[Union[str, HiddenText]] then to make the type annotations more concise. CommandArgs? Then it can be Union[List[str], CommandArgs].
|
@cjerdonek Ah, I see! Sorry for the misunderstanding - I read the original feedback as there being a way to avoid intercepting all of the
I thought it was a cool idea and tried playing around with it! I've pushed an update going back to the original plan of patching all of the URL locations in each of the vcs modules. |
I agree, and thanks for your efforts trying it out! :) I think we both got a taste for what that might involve. Since that "next step" would take more work and planning, etc. to make sure it all works out (as it's more extensive and potentially more sophisticated), I think it's better for now to save something like that for a later PR, and just get something simpler in place as a start to address the immediate issue.. |
|
@cjerdonek 👍 ! Any ideas on regression tests for this? If I remove the network marker from the tests submitted earlier in the PR, will those be ok? They do tend to run on the slower side. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One quick comment. (This isn't exhaustive. Just pointing out one thing for now.)
|
@cjerdonek OK I've pushed up an edit which annotates the inputs to the VCS methods and hides the url in versioncontrol.py The only known deviation is that Hopefully I've addressed all feedback 🙏 |
|
One thing that can be done to simplify the changes would be to define a function like this and put it in the same module that you define def cast_args(args):
# type: (Union[List[str], List[HiddenText], CommandArgs]) -> CommandArgs
new_args = [] # type: CommandArgs
new_args.extend(args)
return new_argsThen instead of having to break every list up into several lines, you can just convert the first element. So things like this: cmd_args = (
['export'] + self.get_remote_call_options() +
rev_options.to_args() + [url, location]
)would just need to be changed to: cmd_args = (
cast_args(['export']) + self.get_remote_call_options() +
rev_options.to_args() + [url, location]
)Also, question: how many changes to the code were due just to adding the type annotations themselves and not related to |
|
Aha! I can use a type cast helper! Thanks for the tip, I'm not very practiced with python type annotations 😄
I tried to make all of the type annotation updates in a specific commit to hopefully ease some of the review pain There should be a way in github to filter by specific commit when reviewing the diffs as well. |
|
Would you be open to filing a PR that adds type annotations to the vcs
modules without any changes related to HiddenText? That would let us see
that CI is passing. Also, in your PR I saw some changes that didn’t seem
related to the hidden text stuff (e.g. around None values), so I’m assuming
it was done to get the type checker to pass.
…On Tue, Aug 13, 2019 at 7:08 PM Allan Feldman ***@***.***> wrote:
Aha! I can use a type cast helper! Thanks for the tip, I'm not very
practiced with python type annotations 😄
how many changes to the code were due just to adding the type annotations
themselves and not related to HiddenText?
I tried to make all of the type annotation updates in a specific commit to
hopefully ease some of the review pain
eff94f7
<eff94f7>
There should be a way in github to filter by specific commit when
reviewing the diffs as well.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#6862?email_source=notifications&email_token=AACW33XV44RN3ZM5T53OIWDQENSJTA5CNFSM4IK44LJKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD4HPRYI#issuecomment-521074913>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AACW33QNRGLROXNLQPRMAQTQENSJTANCNFSM4IK44LJA>
.
|
If you're ok opening and driving that second PR, we can merge that one first and then I can rebase on top of those changes. To be honest I'm having trouble mentally tracking all of the changes flying around at the moment and I feel that driving another PR would add to my cognitive overhead.
Yeah, that's correct. In particular the none check you're referring to is because the property returns an optional so the type checker was complaining about accessing an attribute on None-typed values. |
|
You would be able to set this one down while doing that one. The purpose
would be to make each PR do fewer things. Otherwise, I would recommend you
go back to the PR prior to when I suggested that you change the argument
type, because I had assumed the vcs modules were already annotated.
…On Tue, Aug 13, 2019 at 7:25 PM Allan Feldman ***@***.***> wrote:
Would you be open to filing a PR that adds type annotations to the vcs
modules without any changes related to HiddenText?
If you're ok opening and driving that second PR, we can merge that one
first and then I can rebase on top of those changes. To be honest I'm
having trouble mentally tracking all of the changes flying around at the
moment and I feel that driving another PR would add to my cognitive
overhead.
I saw some changes that didn’t seem related to the hidden text stuff (e.g.
around None values), so I’m assuming it was done to get the type checker to
pass.
Yeah, that's correct. In particular the none check you're referring to is
because the property returns an optional so the type checker was
complaining about accessing an attribute on None-typed values.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#6862?email_source=notifications&email_token=AACW33QDQ5ATKYVUW23EVQTQENUJTA5CNFSM4IK44LJKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD4HQIUY#issuecomment-521077843>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AACW33UVOB2KCWFRZCVYX2TQENUJTANCNFSM4IK44LJA>
.
|
|
👍 I'll open a 2nd PR for the type annotations and put this one on hold for now. |
|
I posted PR #6890 to finish the work started here, FYI. |
pip installs that use auth with version control systems appear to be leaking passwords in the logs.
A partial fix appears to have been applied in #5773 but the passwords are still leaking due to
subprocess_cmdprinting the command description which includes the password in use.This PR attempts to scrub the passwords out of the logs by using a new
HiddenTextclass to hide the raw url contents in arawattribute until explicitly needed. Thesubprocess_cmdfunction has been modified to extract therawattribute when calling the underlying command.