Skip to content

Add defaults to str accessor methods #1305

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

hamdanal
Copy link
Contributor

@hamdanal hamdanal commented Aug 3, 2025

Added simple default values to the str accessor.

I made a few more changes for things I noticed while adding default values:

  • Added a dtype parameter to str.decode
  • Fixed the type of str.get key, its documented to accept hashable objects not just int
  • Made the table of str.translate covariant as it does not modify the input object
  • Marked str.wrap parameters as keyword-only (at runtime they are passed through **kwargs) and fixed their type
  • Simplified some redundant overloads and added missing tests for some edge cases
idx_list = pd.Index([["apple", "banana"], ["cherry", "date"], ["one", "eggplant"]])
_check(assert_type(idx_list.str.join("-"), "pd.Index[str]"))

# wrap doesn't accept positional arguments other than width
with pytest.raises(TypeError):
idx.str.wrap(80, False) # type: ignore[misc] # pyright: ignore[reportCallIssue]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would usually recommend you use TYPE_CHECKING_INVALID_USAGE instead of wrapping it in a pytest context manager: the goal is to test the type checking, not the error raised (same for the other places).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure happy to change it but isn't pytest.raises better because it tests the thing we are claiming to raise actually raises?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends what you want to test, the type checker has nothing to do with pytest, what we want is to make sure that pyright and mypy complain thus you put # type: ignore[misc] # pyright: ignore[reportCallIssue].
The fact that it raises a TypeError is something that pandas controls, not the stubs. That is why we wrap it in TYPE_CHECKING_INVALID_USAGE so it is not run when testing but we only check how mypy and pyright behave.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One other note is that if one day pandas decides to change the TypeError into for example a RuntimeError, you would have to change the test but fundamentally the behavior of the types has not changed. That is why it is often easier for future maintenance to wrap it with TYPE_CHECKING_INVALID_USAGE instead of a pytest.raises, maybe @Dr-Irv can add some details if I missed something.

Copy link
Contributor Author

@hamdanal hamdanal Aug 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends what you want to test, the type checker has nothing to do with pytest, what we want is to make sure that pyright and mypy complain thus you put # type: ignore[misc] # pyright: ignore[reportCallIssue].

I am well aware of that; that's why I added the ignore comments in the test.

The fact that it raises a TypeError is something that pandas controls, not the stubs. That is why we wrap it in TYPE_CHECKING_INVALID_USAGE so it is not run when testing but we only check how mypy and pyright behave.

Sure but what we actually want is the type checkers to warn on something that is not supported at runtime -- that is why this project also runs pytest, to test the runtime types as well as the static types otherwise you'll get type errors on something that is totally valid.

One other note is that if one day pandas decides to change the TypeError into for example a RuntimeError, you would have to change the test but fundamentally the behavior of the types has not changed. That is why it is often easier for future maintenance to wrap it with TYPE_CHECKING_INVALID_USAGE instead of a pytest.raises

If this is your concern, I think using pytest.raises(Exception) is the best balance, generic enough so that it doesn't depend on what error might pandas use but still check that what the test claims should raise at runtime raises.
For example, consider that in the future pandas starts allowing positional arguments here, with pytest.raises you'll immediately get a failing test so you know you should fix your stub definition while with your suggestion you get no such clue and the stub become out-of-date with the runtime.

This is all to explain why pytest.raises is better here imo but I don't feel strongly about it so I'll change it to use TYPE_CHECKING_INVALID_USAGE (which I didn't know about so thanks for the hint) similar to the other parts of the project.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
2 participants