feat: add GeoSeries.from_xy() #1364


Merged
merged 12 commits into main from b308738592-geopandas-point-from-xy
Feb 6, 2025

Conversation

arwas11
Contributor

@arwas11 arwas11 commented Feb 5, 2025

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)
    • from_xy: screen/3pW4NWMsvfvVeYd, screen/3SimxdQU93zjinv, screen/rfsK45NBkxkUdz7

Fixes #393394365 🦕
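To illustrate what the new method does, here is a toy pure-Python sketch of `from_xy` semantics: pairing x and y coordinate sequences into point geometries. The `from_xy` helper and its WKT-string output below are illustrative stand-ins only, not the BigQuery DataFrames implementation, which returns a GEOGRAPHY-backed `GeoSeries`.

```python
# Toy sketch (assumption: illustrative stand-in, not the real
# bigframes.geopandas.GeoSeries.from_xy). Pairs x[i] with y[i]
# and renders each pair as a WKT POINT string.
def from_xy(x, y):
    if len(x) != len(y):
        raise ValueError("x and y must be the same length")
    return [f"POINT ({xi} {yi})" for xi, yi in zip(x, y)]

points = from_xy([2.5, 5.0], [3.0, 1.0])
print(points)  # ['POINT (2.5 3.0)', 'POINT (5.0 1.0)']
```

The real method accepts Series-like inputs (as in the notebook's `GeoSeries.from_xy(geo_points.x, geo_points.y)` call) rather than plain lists.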


@product-auto-label product-auto-label bot added size: m Pull request size is medium. api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. labels Feb 5, 2025
@arwas11 arwas11 requested a review from tswast February 5, 2025 19:51
@arwas11 arwas11 changed the title to feat: add GeoSeries.from_xy() Feb 5, 2025
@tswast tswast added the owlbot:run Add this label to trigger the Owlbot post processor. label Feb 5, 2025
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Feb 5, 2025
Collaborator

@tswast tswast left a comment


Thanks!

@tswast tswast marked this pull request as ready for review February 5, 2025 19:56
@tswast tswast requested review from a team as code owners February 5, 2025 19:56
@tswast tswast requested a review from Genesis929 February 5, 2025 19:56
@arwas11 arwas11 enabled auto-merge (squash) February 5, 2025 20:06
@tswast
Collaborator

tswast commented Feb 5, 2025

Doctest failures:

FAILED third_party/bigframes_vendored/pandas/core/frame.py::frame.DataFrame.map
FAILED third_party/bigframes_vendored/pandas/core/frame.py::frame.DataFrame.apply
FAILED third_party/bigframes_vendored/pandas/core/series.py::series.Series.apply
FAILED third_party/bigframes_vendored/pandas/core/series.py::series.Series.map
FAILED third_party/bigframes_vendored/pandas/core/series.py::series.Series.mask
FAILED bigframes/dataframe.py::bigframes.dataframe.DataFrame.map
FAILED bigframes/dataframe.py::bigframes.dataframe.DataFrame.applymap
FAILED bigframes/pandas/io/api.py::bigframes.pandas.io.api.read_gbq_function
FAILED bigframes/session/__init__.py::bigframes.session.Session.read_gbq_function

These don't appear to be related to this change.

@tswast tswast disabled auto-merge February 5, 2025 20:31
@tswast
Collaborator

tswast commented Feb 5, 2025

Notebook test failures:

Failed: generative_ai/bq_dataframes_llm_kmeans.ipynb
Failed: geo/geoseries.ipynb
Failed: getting_started/getting_started_bq_dataframes.ipynb
Failed: location/regionalized.ipynb_us-central1
Failed: location/regionalized.ipynb_us
Failed: remote_functions/remote_function.ipynb
Failed: remote_functions/remote_function_usecases.ipynb

The geoseries.ipynb one does look related. Let's fix before merging.

error
__ /tmpfs/src/github/python-bigquery-dataframes/notebooks/geo/geoseries.ipynb __
---------------------------------------------------------------------------
bigframes.geopandas.GeoSeries.from_xy(geo_points.x, geo_points.y)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File /tmpfs/src/github/python-bigquery-dataframes/.nox/notebook-3-10/lib/python3.10/site-packages/IPython/core/formatters.py:770, in PlainTextFormatter.__call__(self, obj)
    763 stream = StringIO()
    764 printer = pretty.RepresentationPrinter(stream, self.verbose,
    765     self.max_width, self.newline,
    766     max_seq_length=self.max_seq_length,
    767     singleton_pprinters=self.singleton_printers,
    768     type_pprinters=self.type_printers,
    769     deferred_pprinters=self.deferred_printers)
--> 770 printer.pretty(obj)
    771 printer.flush()
    772 return stream.getvalue()

File /tmpfs/src/github/python-bigquery-dataframes/.nox/notebook-3-10/lib/python3.10/site-packages/IPython/lib/pretty.py:419, in RepresentationPrinter.pretty(self, obj)
    408                         return meth(obj, self, cycle)
    409                 if (
    410                     cls is not object
    411                     # check if cls defines __repr__
   (...)
    417                     and callable(_safe_getattr(cls, "__repr__", None))
    418                 ):
--> 419                     return _repr_pprint(obj, self, cycle)
    421     return _default_pprint(obj, self, cycle)
    422 finally:

File /tmpfs/src/github/python-bigquery-dataframes/.nox/notebook-3-10/lib/python3.10/site-packages/IPython/lib/pretty.py:794, in _repr_pprint(obj, p, cycle)
    792 """A pprint that just redirects to the normal repr function."""
    793 # Find newlines and replace them with p.break_()
--> 794 output = repr(obj)
    795 lines = output.splitlines()
    796 with p.group():

File /tmpfs/src/github/python-bigquery-dataframes/bigframes/core/log_adapter.py:147, in method_logger.<locals>.wrapper(self, *args, **kwargs)
    144 _call_stack.append(full_method_name)
    146 try:
--> 147     return method(self, *args, **kwargs)
    148 except (NotImplementedError, TypeError) as e:
    149     # Log method parameters that are implemented in pandas but either missing (TypeError)
    150     # or not fully supported (NotImplementedError) in BigFrames.
    151     # Logging is currently supported only when we can access the bqclient through
    152     # self._block.expr.session.bqclient. Also, to avoid generating multiple queries
    153     # because of internal calls, we log only when the method is directly invoked.
    154     if hasattr(self, "_block") and len(_call_stack) == 1:

File /tmpfs/src/github/python-bigquery-dataframes/bigframes/series.py:346, in Series.__repr__(self)
    343     return formatter.repr_query_job(self._compute_dry_run())
    345 self._cached()
--> 346 pandas_df, _, query_job = self._block.retrieve_repr_request_results(max_results)
    347 self._set_internal_query_job(query_job)
    349 pd_series = pandas_df.iloc[:, 0]

File /tmpfs/src/github/python-bigquery-dataframes/bigframes/core/blocks.py:1453, in Block.retrieve_repr_request_results(self, max_results)
   1445 """
   1446 Retrieves a pandas dataframe containing only max_results many rows for use
   1447 with printing methods.
   1448 
   1449 Returns a tuple of the dataframe and the overall number of rows of the query.
   1450 """
   1452 # head caches full underlying expression, so row_count will be free after
-> 1453 head_result = self.session._executor.head(self.expr, max_results)
   1454 count = self.session._executor.get_row_count(self.expr)
   1456 arrow = head_result.to_arrow_table()

File /tmpfs/src/github/python-bigquery-dataframes/bigframes/session/executor.py:401, in BigQueryCachingExecutor.head(self, array_value, n_rows)
    399 maybe_row_count = self._local_get_row_count(array_value)
    400 if (maybe_row_count is not None) and (maybe_row_count <= n_rows):
--> 401     return self.execute(array_value, ordered=True)
    403 if not self.strictly_ordered and not array_value.node.explicitly_ordered:
    404     # No user-provided ordering, so just get any N rows, its faster!
    405     return self.peek(array_value, n_rows)

File /tmpfs/src/github/python-bigquery-dataframes/bigframes/session/executor.py:286, in BigQueryCachingExecutor.execute(self, array_value, ordered, col_id_overrides, use_explicit_destination, get_size_bytes, page_size, max_results)
    283 # Runs strict validations to ensure internal type predictions and ibis are completely in sync
    284 # Do not execute these validations outside of testing suite.
    285 if "PYTEST_CURRENT_TEST" in os.environ and len(col_id_overrides) == 0:
--> 286     self._validate_result_schema(array_value, iterator.schema)
    288 return ExecuteResult(
    289     arrow_batches=iterator_supplier,
    290     schema=adjusted_schema,
   (...)
    293     total_rows=iterator.total_rows,
    294 )

File /tmpfs/src/github/python-bigquery-dataframes/bigframes/session/executor.py:648, in BigQueryCachingExecutor._validate_result_schema(self, array_value, bq_schema)
    642 def _validate_result_schema(
    643     self,
    644     array_value: bigframes.core.ArrayValue,
    645     bq_schema: list[bigquery.SchemaField],
    646 ):
    647     actual_schema = tuple(bq_schema)
--> 648     ibis_schema = bigframes.core.compile.test_only_ibis_inferred_schema(
    649         self.replace_cached_subtrees(array_value.node)
    650     )
    651     internal_schema = array_value.schema
    652     if not bigframes.features.PANDAS_VERSIONS.is_arrow_list_dtype_usable:

File /tmpfs/src/github/python-bigquery-dataframes/bigframes/core/compile/api.py:83, in test_only_ibis_inferred_schema(node)
     81 node = _STRICT_COMPILER._preprocess(node)
     82 compiled = _STRICT_COMPILER.compile_node(node)
---> 83 items = tuple(
     84     bigframes.core.schema.SchemaItem(name, compiled.get_column_type(ibis_id))
     85     for name, ibis_id in zip(node.schema.names, compiled.column_ids)
     86 )
     87 return bigframes.core.schema.ArraySchema(items)

File /tmpfs/src/github/python-bigquery-dataframes/bigframes/core/compile/api.py:84, in <genexpr>(.0)
     81 node = _STRICT_COMPILER._preprocess(node)
     82 compiled = _STRICT_COMPILER.compile_node(node)
     83 items = tuple(
---> 84     bigframes.core.schema.SchemaItem(name, compiled.get_column_type(ibis_id))
     85     for name, ibis_id in zip(node.schema.names, compiled.column_ids)
     86 )
     87 return bigframes.core.schema.ArraySchema(items)

File /tmpfs/src/github/python-bigquery-dataframes/bigframes/core/compile/compiled.py:154, in UnorderedIR.get_column_type(self, key)
    147 def get_column_type(self, key: str) -> bigframes.dtypes.Dtype:
    148     ibis_type = typing.cast(
    149         bigframes.core.compile.ibis_types.IbisDtype,
    150         self._get_ibis_column(key).type(),
    151     )
    152     return typing.cast(
    153         bigframes.dtypes.Dtype,
--> 154         bigframes.core.compile.ibis_types.ibis_dtype_to_bigframes_dtype(ibis_type),
    155     )

File /tmpfs/src/github/python-bigquery-dataframes/bigframes/core/compile/ibis_types.py:307, in ibis_dtype_to_bigframes_dtype(ibis_dtype)
    305     return IBIS_TO_BIGFRAMES[ibis_dtypes.string]
    306 else:
--> 307     raise ValueError(
    308         f"Unexpected Ibis data type {ibis_dtype}. {constants.FEEDBACK_LINK}"
    309     )

ValueError: Unexpected Ibis data type point:geometry. Share your usecase with the BigQuery DataFrames team at the https://bit.ly/bigframes-feedback survey. You are currently running BigFrames version 1.35.0

Learn more about nbmake at https://github.com/treebeardtech/nbmake
@tswast tswast added the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label Feb 5, 2025
Collaborator


Per the notebook failure, please add some logic to ibis_dtype_to_bigframes_dtype looking for all ibis geography-y types and returning the gpd.array.GeometryDtype().

Something like

if isinstance(ibis_dtype, ibis_dtypes.GeoSpatial):
    return gpd.array.GeometryDtype()

around here:

Collaborator


The reason we can look for ibis_dtypes.GeoSpatial is that all the geography-y types have it as a superclass. See:
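That subclass dispatch can be sketched in a few lines. The classes below are stand-ins for the real `ibis_dtypes` and `gpd.array.GeometryDtype` types (assumption: illustrative mock hierarchy, not the actual ibis or geopandas code), but they show why one `isinstance` check against the shared base class covers every geography-flavoured dtype at once:

```python
# Stand-in dtype hierarchy: in ibis, Point, LineString, Polygon,
# etc. all subclass GeoSpatial, so a single isinstance check
# handles every geography-y type.
class GeoSpatial: ...
class Point(GeoSpatial): ...
class LineString(GeoSpatial): ...
class String: ...  # a non-geo dtype, for contrast

class GeometryDtype:
    """Stand-in for gpd.array.GeometryDtype."""
    def __eq__(self, other):
        return isinstance(other, GeometryDtype)

def ibis_dtype_to_bigframes_dtype(ibis_dtype):
    # One branch maps the whole GeoSpatial subtree.
    if isinstance(ibis_dtype, GeoSpatial):
        return GeometryDtype()
    raise ValueError(f"Unexpected Ibis data type {ibis_dtype}")

print(ibis_dtype_to_bigframes_dtype(Point()))       # a GeometryDtype
print(ibis_dtype_to_bigframes_dtype(LineString()))  # also a GeometryDtype
```

With this branch in place, the `point:geometry` dtype from the notebook traceback no longer falls through to the `ValueError`.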

Collaborator

@tswast tswast left a comment


Marking as request changes to cover the ibis_dtype_to_bigframes_dtype change for the notebook fix before merging.

arwas11 and others added 3 commits February 6, 2025 11:38
* chore: support timestamp subtractions

* Fix format

* use tree rewrites to dispatch timestamp_diff operator

* add TODO for more node updates

* polish the code and fix typos

* fix comment

* add rewrites to compile_raw and compile_peek_sql
* chore: add a tool to upload tpcds data to bigquery.

* update error type

* update docstring
@product-auto-label product-auto-label bot added size: xl Pull request size is extra large. and removed size: m Pull request size is medium. labels Feb 6, 2025
@product-auto-label product-auto-label bot added size: m Pull request size is medium. and removed size: xl Pull request size is extra large. labels Feb 6, 2025
@arwas11 arwas11 requested a review from tswast February 6, 2025 18:46
@tswast tswast removed the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label Feb 6, 2025
@tswast tswast enabled auto-merge (squash) February 6, 2025 19:17
@tswast tswast merged commit 3c3e14c into main Feb 6, 2025
22 of 24 checks passed
@tswast tswast deleted the b308738592-geopandas-point-from-xy branch February 6, 2025 19:48
shuoweil pushed a commit that referenced this pull request Feb 6, 2025
* feat: add GeoSeries.from_xy

* add from_xy test and update ibis types

* update geoseries notebook with from_xy

* Update docstring example

* fix doctstring lint error

* return GeometryDtype() for all ibis geo types

* chore: support timestamp subtractions (#1346)

* chore: support timestamp subtractions

* Fix format

* use tree rewrites to dispatch timestamp_diff operator

* add TODO for more node updates

* polish the code and fix typos

* fix comment

* add rewrites to compile_raw and compile_peek_sql

* chore: add a tool to upload tpcds data to bigquery. (#1367)

* chore: add a tool to upload tpcds data to bigquery.

* update error type

* update docstring

---------

Co-authored-by: Shenyang Cai <sycai@users.noreply.github.com>
Co-authored-by: Huan Chen <142538604+Genesis929@users.noreply.github.com>