fix: correct read_csv behaviours with use_cols, names, index_col #1804

Open: wants to merge 2 commits into main

Conversation

chelsea-lin
Contributor

Fixes internal issue 421466334 🦕

product-auto-label bot added labels "size: l" (Pull request size is large.) and "api: bigquery" (Issues related to the googleapis/python-bigquery-dataframes API.) on Jun 10, 2025
product-auto-label bot added label "size: m" (Pull request size is medium.) and removed label "size: l" (Pull request size is large.) on Jun 10, 2025
chelsea-lin marked this pull request as ready for review on June 10, 2025 18:41
chelsea-lin requested review from a team as code owners on June 10, 2025 18:41
chelsea-lin requested a review from sycai on June 10, 2025 18:41
Comment on lines +632 to +644
table_column_names = [field.name for field in table.schema]
for column_name in columns:
    if column_name not in table_column_names:
        possibility = min(
            table_column_names,
            key=lambda item: bigframes._tools.strings.levenshtein_distance(
                column_name, item
            ),
        )
        raise ValueError(
            f"Column '{column_name}' of `columns` not found in this table. "
            f"Did you mean '{possibility}'?"
        )
Contributor

nit: Shall we use a helper function/method? The indentation is very deep here. (https://goto.google.com/tott/733)

Comment on lines +613 to +614
else:
    if names is not None:
Contributor

It feels like this is the same as `elif names is not None:`, which can save you a level of indentation.
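
To illustrate the suggestion, here is a small sketch of the two forms side by side. The condition on the preceding `if` is not visible in the excerpt, so `first_condition` is a placeholder, and both function names are hypothetical. The equivalence holds only when the nested `if` is the sole statement in the `else` body.

```python
from typing import Optional, Sequence


def nested_branch(first_condition: bool, names: Optional[Sequence[str]]) -> str:
    # Shape in the diff: an `if` nested directly inside `else`.
    if first_condition:
        return "first"
    else:
        if names is not None:
            return "names"
    return "fallthrough"


def flat_branch(first_condition: bool, names: Optional[Sequence[str]]) -> str:
    # Suggested shape: the same logic with `elif`, one level shallower.
    if first_condition:
        return "first"
    elif names is not None:
        return "names"
    return "fallthrough"
```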

Comment on lines +615 to +630
    assert len(table.schema) >= len(list(names))
    assert len(list(names)) >= len(columns)
    table_column_names = [
        field.name for field in table.schema[: len(list(names))]
    ]

    invalid_columns = set(columns) - set(names)
    if len(invalid_columns) != 0:
        raise ValueError(
            "Usecols do not match columns, columns expected but not "
            f"found: {invalid_columns}"
        )

    rename_to_schema = dict(zip(list(names), table_column_names))
    names = columns
    columns = [rename_to_schema[renamed_name] for renamed_name in columns]
Contributor

The enclosing function is very long. I think there is an opportunity to extract this code block into a helper function/method.

go/pystyle#function-length

Labels
api: bigquery (Issues related to the googleapis/python-bigquery-dataframes API.)
size: m (Pull request size is medium.)
3 participants