fix: correct read_csv behaviours with use_cols, names, index_col #1804

chelsea-lin · 2025-06-10T00:02:45Z

Fixes internal issue 421466334 🦕

sycai · 2025-06-10T20:19:44Z

bigframes/session/loader.py

+                table_column_names = [field.name for field in table.schema]
+                for column_name in columns:
+                    if column_name not in table_column_names:
+                        possibility = min(
+                            table_column_names,
+                            key=lambda item: bigframes._tools.strings.levenshtein_distance(
+                                column_name, item
+                            ),
+                        )
+                        raise ValueError(
+                            f"Column '{column_name}' of `columns` not found in this table. "
+                            f"Did you mean '{possibility}'?"
+                        )


nit: Shall we use a helper function/method? The indentation is very deep here. (https://goto.google.com/tott/733)
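One way the suggested extraction could look is sketched below. The helper name `check_columns_in_table` is hypothetical, and the local `levenshtein_distance` is only a stand-in so the sketch runs without bigframes; the diff itself calls `bigframes._tools.strings.levenshtein_distance`.

```python
from typing import Iterable, Sequence


def levenshtein_distance(left: str, right: str) -> int:
    """Stand-in edit distance (assumption: not the bigframes implementation)."""
    previous = list(range(len(right) + 1))
    for i, left_char in enumerate(left, start=1):
        current = [i]
        for j, right_char in enumerate(right, start=1):
            substitution = previous[j - 1] + (left_char != right_char)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def check_columns_in_table(
    columns: Iterable[str], table_column_names: Sequence[str]
) -> None:
    """Hypothetical helper: raise ValueError for the first unknown column."""
    for column_name in columns:
        if column_name in table_column_names:
            continue
        # Suggest the table column with the smallest edit distance.
        possibility = min(
            table_column_names,
            key=lambda item: levenshtein_distance(column_name, item),
        )
        raise ValueError(
            f"Column '{column_name}' of `columns` not found in this table. "
            f"Did you mean '{possibility}'?"
        )
```

With the check in its own function, the call site would shrink to a single line such as `check_columns_in_table(columns, [field.name for field in table.schema])`, and the early `continue` removes one more level of nesting.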

sycai · 2025-06-10T20:21:05Z

bigframes/session/loader.py

+        else:
+            if names is not None:


It feels like this is the same as `elif names is not None:`, which would save you a level of indentation.
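A minimal sketch of the equivalence, assuming the `else:` block contains nothing besides the nested `if`. The function names and the `first_condition` parameter are placeholders, since the preceding `if` is not shown in the diff excerpt.

```python
def nested(first_condition, names):
    """Shape of the code under review: an `if` nested directly inside `else`."""
    result = "untouched"
    if first_condition:
        result = "first branch"
    else:
        if names is not None:
            result = "names branch"
    return result


def flattened(first_condition, names):
    """Same behaviour with `elif`, one indentation level shallower."""
    result = "untouched"
    if first_condition:
        result = "first branch"
    elif names is not None:
        result = "names branch"
    return result
```

The two forms stop being interchangeable as soon as the `else:` block gains statements outside the nested `if`.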

sycai · 2025-06-10T20:23:37Z

bigframes/session/loader.py

+                assert len(table.schema) >= len(list(names))
+                assert len(list(names)) >= len(columns)
+                table_column_names = [
+                    field.name for field in table.schema[: len(list(names))]
+                ]
+
+                invalid_columns = set(columns) - set(names)
+                if len(invalid_columns) != 0:
+                    raise ValueError(
+                        "Usecols do not match columns, columns expected but not "
+                        f"found: {invalid_columns}"
+                    )
+
+                rename_to_schema = dict(zip(list(names), table_column_names))
+                names = columns
+                columns = [rename_to_schema[renamed_name] for renamed_name in columns]


The hosting function is very long. I think there is an opportunity to extract this code block into a helper function/method.

go/pystyle#function-length

Refactored for better readability. Please take a look.
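For illustration, the block under review could be isolated roughly as follows. This is a sketch, not the refactor that was actually merged: the name `remap_usecols_to_schema` is hypothetical, and it takes a plain list of schema column names instead of `table.schema`.

```python
from typing import List, Sequence, Tuple


def remap_usecols_to_schema(
    schema_column_names: Sequence[str],
    names: Sequence[str],
    columns: Sequence[str],
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper mirroring the diff.

    Returns (new_names, schema_columns): the user-facing names to keep and
    the matching column names in the table schema, in the order of `columns`.
    """
    names = list(names)
    # Same preconditions as the asserts in the diff.
    assert len(schema_column_names) >= len(names)
    assert len(names) >= len(columns)
    table_column_names = list(schema_column_names[: len(names)])

    invalid_columns = set(columns) - set(names)
    if len(invalid_columns) != 0:
        raise ValueError(
            "Usecols do not match columns, columns expected but not "
            f"found: {invalid_columns}"
        )

    # Map each user-supplied name to the schema column at the same position.
    rename_to_schema = dict(zip(names, table_column_names))
    return list(columns), [rename_to_schema[name] for name in columns]
```

Returning both lists, rather than rebinding `names` and `columns` in place as the diff does, keeps the helper free of side effects; the caller would unpack the result, for example `names, columns = remap_usecols_to_schema(...)`.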


product-auto-label bot added the size: l, api: bigquery, and size: m labels and removed the size: l label Jun 10, 2025

chelsea-lin marked this pull request as ready for review June 10, 2025 18:41

chelsea-lin requested review from a team as code owners June 10, 2025 18:41

chelsea-lin requested a review from sycai June 10, 2025 18:41

blunderbuss-gcf bot assigned ivansmf Jun 10, 2025

sycai reviewed Jun 10, 2025


chelsea-lin added 3 commits June 11, 2025 22:51

fix: correct read_csv behaviours with use_cols, names, index_col para…

1bac10c

…meters

fix test_default_index_warning_not_raised_by_read_gbq_primary_key

2282da8

refactor read_gbq_table for more readable

d345126

chelsea-lin force-pushed the main_chelsealin_readcsv branch from 94c8af9 to d345126 Compare June 11, 2025 22:54

product-auto-label bot added the size: l label and removed the size: m label Jun 11, 2025

chelsea-lin requested a review from sycai June 11, 2025 22:54

sycai previously approved these changes Jun 11, 2025


fix presubmit

25d91a2

chelsea-lin dismissed sycai’s stale review via 25d91a2 June 11, 2025 23:18

sycai approved these changes Jun 11, 2025


chelsea-lin merged commit 855031a into main Jun 12, 2025
22 of 24 checks passed

chelsea-lin deleted the main_chelsealin_readcsv branch June 12, 2025 16:58

release-please bot mentioned this pull request Jun 12, 2025

chore(main): release 2.7.0 #1805

Merged
