-
Notifications
You must be signed in to change notification settings - Fork 50
fix: correct read_csv behaviours with use_cols, names, index_col #1804
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
table_column_names = [field.name for field in table.schema] | ||
for column_name in columns: | ||
if column_name not in table_column_names: | ||
possibility = min( | ||
table_column_names, | ||
key=lambda item: bigframes._tools.strings.levenshtein_distance( | ||
column_name, item | ||
), | ||
) | ||
raise ValueError( | ||
f"Column '{column_name}' of `columns` not found in this table. " | ||
f"Did you mean '{possibility}'?" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Shall we use a helper function/method? The indentation is very deep here. (https://goto.google.com/tott/733)
else: | ||
if names is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels like this is the same as elif names is not None:
which can save you a level of indentation.
assert len(table.schema) >= len(list(names)) | ||
assert len(list(names)) >= len(columns) | ||
table_column_names = [ | ||
field.name for field in table.schema[: len(list(names))] | ||
] | ||
|
||
invalid_columns = set(columns) - set(names) | ||
if len(invalid_columns) != 0: | ||
raise ValueError( | ||
"Usecols do not match columns, columns expected but not " | ||
f"found: {invalid_columns}" | ||
) | ||
|
||
rename_to_schema = dict(zip(list(names), table_column_names)) | ||
names = columns | ||
columns = [rename_to_schema[renamed_name] for renamed_name in columns] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The hosting function is very long. I think there might be an opportunity to make this code block a helper function/method
go/pystyle#function-length
Fixes internal issue 421466334 🦕