feat: add GeoSeries.from_xy() #1364


Merged
merged 12 commits into main from b308738592-geopandas-point-from-xy
Feb 6, 2025

Conversation

arwas11
Contributor

@arwas11 arwas11 commented Feb 5, 2025

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)
    • from_xy: screen/3pW4NWMsvfvVeYd, screen/3SimxdQU93zjinv, screen/rfsK45NBkxkUdz7

Fixes #393394365 🦕
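To illustrate what the new method does, here is a toy pure-Python sketch of `from_xy` semantics: pairing x and y coordinate sequences into point geometries. The `from_xy` helper and its WKT-string output below are illustrative stand-ins only, not the BigQuery DataFrames implementation, which returns a GEOGRAPHY-backed `GeoSeries`.

```python
# Toy sketch (assumption: illustrative stand-in, not the real
# bigframes.geopandas.GeoSeries.from_xy). Pairs x[i] with y[i]
# and renders each pair as a WKT POINT string.
def from_xy(x, y):
    if len(x) != len(y):
        raise ValueError("x and y must be the same length")
    return [f"POINT ({xi} {yi})" for xi, yi in zip(x, y)]

points = from_xy([2.5, 5.0], [3.0, 1.0])
print(points)  # ['POINT (2.5 3.0)', 'POINT (5.0 1.0)']
```

The real method accepts Series-like inputs (as in the notebook's `GeoSeries.from_xy(geo_points.x, geo_points.y)` call) rather than plain lists.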


@product-auto-label product-auto-label bot added size: m Pull request size is medium. api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. labels Feb 5, 2025
@arwas11 arwas11 requested a review from tswast February 5, 2025 19:51
@arwas11 arwas11 changed the title to feat: add GeoSeries.from_xy() Feb 5, 2025
@tswast tswast added the owlbot:run Add this label to trigger the Owlbot post processor. label Feb 5, 2025
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Feb 5, 2025
Collaborator

@tswast tswast left a comment


Thanks!

@tswast tswast marked this pull request as ready for review February 5, 2025 19:56
@tswast tswast requested review from a team as code owners February 5, 2025 19:56
@tswast tswast requested a review from Genesis929 February 5, 2025 19:56
@arwas11 arwas11 enabled auto-merge (squash) February 5, 2025 20:06
@tswast
Collaborator

tswast commented Feb 5, 2025

Doctest failures:

FAILED third_party/bigframes_vendored/pandas/core/frame.py::frame.DataFrame.map
FAILED third_party/bigframes_vendored/pandas/core/frame.py::frame.DataFrame.apply
FAILED third_party/bigframes_vendored/pandas/core/series.py::series.Series.apply
FAILED third_party/bigframes_vendored/pandas/core/series.py::series.Series.map
FAILED third_party/bigframes_vendored/pandas/core/series.py::series.Series.mask
FAILED bigframes/dataframe.py::bigframes.dataframe.DataFrame.map
FAILED bigframes/dataframe.py::bigframes.dataframe.DataFrame.applymap
FAILED bigframes/pandas/io/api.py::bigframes.pandas.io.api.read_gbq_function
FAILED bigframes/session/__init__.py::bigframes.session.Session.read_gbq_function

These don't appear to be related to this change.

@tswast tswast disabled auto-merge February 5, 2025 20:31
@tswast
Collaborator

tswast commented Feb 5, 2025

Notebook test failures:

Failed: generative_ai/bq_dataframes_llm_kmeans.ipynb
Failed: geo/geoseries.ipynb
Failed: getting_started/getting_started_bq_dataframes.ipynb
Failed: location/regionalized.ipynb_us-central1
Failed: location/regionalized.ipynb_us
Failed: remote_functions/remote_function.ipynb
Failed: remote_functions/remote_function_usecases.ipynb

The geoseries.ipynb one does look related. Let's fix before merging.

error
__ /tmpfs/src/github/python-bigquery-dataframes/notebooks/geo/geoseries.ipynb __
---------------------------------------------------------------------------
bigframes.geopandas.GeoSeries.from_xy(geo_points.x, geo_points.y)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File /tmpfs/src/github/python-bigquery-dataframes/.nox/notebook-3-10/lib/python3.10/site-packages/IPython/core/formatters.py:770, in PlainTextFormatter.__call__(self, obj)
    763 stream = StringIO()
    764 printer = pretty.RepresentationPrinter(stream, self.verbose,
    765     self.max_width, self.newline,
    766     max_seq_length=self.max_seq_length,
    767     singleton_pprinters=self.singleton_printers,
    768     type_pprinters=self.type_printers,
    769     deferred_pprinters=self.deferred_printers)
--> 770 printer.pretty(obj)
    771 printer.flush()
    772 return stream.getvalue()

File /tmpfs/src/github/python-bigquery-dataframes/.nox/notebook-3-10/lib/python3.10/site-packages/IPython/lib/pretty.py:419, in RepresentationPrinter.pretty(self, obj)
    408                         return meth(obj, self, cycle)
    409                 if (
    410                     cls is not object
    411                     # check if cls defines __repr__
   (...)
    417                     and callable(_safe_getattr(cls, "__repr__", None))
    418                 ):
--> 419                     return _repr_pprint(obj, self, cycle)
    421     return _default_pprint(obj, self, cycle)
    422 finally:

File /tmpfs/src/github/python-bigquery-dataframes/.nox/notebook-3-10/lib/python3.10/site-packages/IPython/lib/pretty.py:794, in _repr_pprint(obj, p, cycle)
    792 """A pprint that just redirects to the normal repr function."""
    793 # Find newlines and replace them with p.break_()
--> 794 output = repr(obj)
    795 lines = output.splitlines()
    796 with p.group():

File /tmpfs/src/github/python-bigquery-dataframes/bigframes/core/log_adapter.py:147, in method_logger.<locals>.wrapper(self, *args, **kwargs)
    144 _call_stack.append(full_method_name)
    146 try:
--> 147     return method(self, *args, **kwargs)
    148 except (NotImplementedError, TypeError) as e:
    149     # Log method parameters that are implemented in pandas but either missing (TypeError)
    150     # or not fully supported (NotImplementedError) in BigFrames.
    151     # Logging is currently supported only when we can access the bqclient through
    152     # self._block.expr.session.bqclient. Also, to avoid generating multiple queries
    153     # because of internal calls, we log only when the method is directly invoked.
    154     if hasattr(self, "_block") and len(_call_stack) == 1:

File /tmpfs/src/github/python-bigquery-dataframes/bigframes/series.py:346, in Series.__repr__(self)
    343     return formatter.repr_query_job(self._compute_dry_run())
    345 self._cached()
--> 346 pandas_df, _, query_job = self._block.retrieve_repr_request_results(max_results)
    347 self._set_internal_query_job(query_job)
    349 pd_series = pandas_df.iloc[:, 0]

File /tmpfs/src/github/python-bigquery-dataframes/bigframes/core/blocks.py:1453, in Block.retrieve_repr_request_results(self, max_results)
   1445 """
   1446 Retrieves a pandas dataframe containing only max_results many rows for use
   1447 with printing methods.
   1448 
   1449 Returns a tuple of the dataframe and the overall number of rows of the query.
   1450 """
   1452 # head caches full underlying expression, so row_count will be free after
-> 1453 head_result = self.session._executor.head(self.expr, max_results)
   1454 count = self.session._executor.get_row_count(self.expr)
   1456 arrow = head_result.to_arrow_table()

File /tmpfs/src/github/python-bigquery-dataframes/bigframes/session/executor.py:401, in BigQueryCachingExecutor.head(self, array_value, n_rows)
    399 maybe_row_count = self._local_get_row_count(array_value)
    400 if (maybe_row_count is not None) and (maybe_row_count <= n_rows):
--> 401     return self.execute(array_value, ordered=True)
    403 if not self.strictly_ordered and not array_value.node.explicitly_ordered:
    404     # No user-provided ordering, so just get any N rows, its faster!
    405     return self.peek(array_value, n_rows)

File /tmpfs/src/github/python-bigquery-dataframes/bigframes/session/executor.py:286, in BigQueryCachingExecutor.execute(self, array_value, ordered, col_id_overrides, use_explicit_destination, get_size_bytes, page_size, max_results)
    283 # Runs strict validations to ensure internal type predictions and ibis are completely in sync
    284 # Do not execute these validations outside of testing suite.
    285 if "PYTEST_CURRENT_TEST" in os.environ and len(col_id_overrides) == 0:
--> 286     self._validate_result_schema(array_value, iterator.schema)
    288 return ExecuteResult(
    289     arrow_batches=iterator_supplier,
    290     schema=adjusted_schema,
   (...)
    293     total_rows=iterator.total_rows,
    294 )

File /tmpfs/src/github/python-bigquery-dataframes/bigframes/session/executor.py:648, in BigQueryCachingExecutor._validate_result_schema(self, array_value, bq_schema)
    642 def _validate_result_schema(
    643     self,
    644     array_value: bigframes.core.ArrayValue,
    645     bq_schema: list[bigquery.SchemaField],
    646 ):
    647     actual_schema = tuple(bq_schema)
--> 648     ibis_schema = bigframes.core.compile.test_only_ibis_inferred_schema(
    649         self.replace_cached_subtrees(array_value.node)
    650     )
    651     internal_schema = array_value.schema
    652     if not bigframes.features.PANDAS_VERSIONS.is_arrow_list_dtype_usable:

File /tmpfs/src/github/python-bigquery-dataframes/bigframes/core/compile/api.py:83, in test_only_ibis_inferred_schema(node)
     81 node = _STRICT_COMPILER._preprocess(node)
     82 compiled = _STRICT_COMPILER.compile_node(node)
---> 83 items = tuple(
     84     bigframes.core.schema.SchemaItem(name, compiled.get_column_type(ibis_id))
     85     for name, ibis_id in zip(node.schema.names, compiled.column_ids)
     86 )
     87 return bigframes.core.schema.ArraySchema(items)

File /tmpfs/src/github/python-bigquery-dataframes/bigframes/core/compile/api.py:84, in <genexpr>(.0)
     81 node = _STRICT_COMPILER._preprocess(node)
     82 compiled = _STRICT_COMPILER.compile_node(node)
     83 items = tuple(
---> 84     bigframes.core.schema.SchemaItem(name, compiled.get_column_type(ibis_id))
     85     for name, ibis_id in zip(node.schema.names, compiled.column_ids)
     86 )
     87 return bigframes.core.schema.ArraySchema(items)

File /tmpfs/src/github/python-bigquery-dataframes/bigframes/core/compile/compiled.py:154, in UnorderedIR.get_column_type(self, key)
    147 def get_column_type(self, key: str) -> bigframes.dtypes.Dtype:
    148     ibis_type = typing.cast(
    149         bigframes.core.compile.ibis_types.IbisDtype,
    150         self._get_ibis_column(key).type(),
    151     )
    152     return typing.cast(
    153         bigframes.dtypes.Dtype,
--> 154         bigframes.core.compile.ibis_types.ibis_dtype_to_bigframes_dtype(ibis_type),
    155     )

File /tmpfs/src/github/python-bigquery-dataframes/bigframes/core/compile/ibis_types.py:307, in ibis_dtype_to_bigframes_dtype(ibis_dtype)
    305     return IBIS_TO_BIGFRAMES[ibis_dtypes.string]
    306 else:
--> 307     raise ValueError(
    308         f"Unexpected Ibis data type {ibis_dtype}. {constants.FEEDBACK_LINK}"
    309     )

ValueError: Unexpected Ibis data type point:geometry. Share your usecase with the BigQuery DataFrames team at the https://bit.ly/bigframes-feedback survey. You are currently running BigFrames version 1.35.0

Learn more about nbmake at https://github.com/treebeardtech/nbmake
@tswast tswast added the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label Feb 5, 2025
Collaborator


Per the notebook failure, please add some logic to ibis_dtype_to_bigframes_dtype looking for all ibis geography-y types and returning the gpd.array.GeometryDtype().

Something like

if isinstance(ibis_dtype, ibis_dtypes.GeoSpatial):
    return gpd.array.GeometryDtype()

around here:

Collaborator


The reason we can look for ibis_dtypes.GeoSpatial is that all the geography-y types have it as a superclass. See:
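That subclass dispatch can be sketched in a few lines. The classes below are stand-ins for the real `ibis_dtypes` and `gpd.array.GeometryDtype` types (assumption: illustrative mock hierarchy, not the actual ibis or geopandas code), but they show why one `isinstance` check against the shared base class covers every geography-flavoured dtype at once:

```python
# Stand-in dtype hierarchy: in ibis, Point, LineString, Polygon,
# etc. all subclass GeoSpatial, so a single isinstance check
# handles every geography-y type.
class GeoSpatial: ...
class Point(GeoSpatial): ...
class LineString(GeoSpatial): ...
class String: ...  # a non-geo dtype, for contrast

class GeometryDtype:
    """Stand-in for gpd.array.GeometryDtype."""
    def __eq__(self, other):
        return isinstance(other, GeometryDtype)

def ibis_dtype_to_bigframes_dtype(ibis_dtype):
    # One branch maps the whole GeoSpatial subtree.
    if isinstance(ibis_dtype, GeoSpatial):
        return GeometryDtype()
    raise ValueError(f"Unexpected Ibis data type {ibis_dtype}")

print(ibis_dtype_to_bigframes_dtype(Point()))       # a GeometryDtype
print(ibis_dtype_to_bigframes_dtype(LineString()))  # also a GeometryDtype
```

With this branch in place, the `point:geometry` dtype from the notebook traceback no longer falls through to the `ValueError`.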

Collaborator

@tswast tswast left a comment


Marking as request changes to cover the ibis_dtype_to_bigframes_dtype change for the notebook fix before merging.

arwas11 and others added 3 commits February 6, 2025 11:38
* chore: support timestamp subtractions

* Fix format

* use tree rewrites to dispatch timestamp_diff operator

* add TODO for more node updates

* polish the code and fix typos

* fix comment

* add rewrites to compile_raw and compile_peek_sql
* chore: add a tool to upload tpcds data to bigquery.

* update error type

* update docstring
@product-auto-label product-auto-label bot added size: xl Pull request size is extra large. and removed size: m Pull request size is medium. labels Feb 6, 2025
@product-auto-label product-auto-label bot added size: m Pull request size is medium. and removed size: xl Pull request size is extra large. labels Feb 6, 2025
@arwas11 arwas11 requested a review from tswast February 6, 2025 18:46
@tswast tswast removed the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label Feb 6, 2025
@tswast tswast enabled auto-merge (squash) February 6, 2025 19:17
@tswast tswast merged commit 3c3e14c into main Feb 6, 2025
22 of 24 checks passed
@tswast tswast deleted the b308738592-geopandas-point-from-xy branch February 6, 2025 19:48
shuoweil pushed a commit that referenced this pull request Feb 6, 2025
* feat: add GeoSeries.from_xy

* add from_xy test and update ibis types

* update geoseries notebook with from_xy

* Update docstring example

* fix doctstring lint error

* return GeometryDtype() for all ibis geo types

* chore: support timestamp subtractions (#1346)

* chore: support timestamp subtractions

* Fix format

* use tree rewrites to dispatch timestamp_diff operator

* add TODO for more node updates

* polish the code and fix typos

* fix comment

* add rewrites to compile_raw and compile_peek_sql

* chore: add a tool to upload tpcds data to bigquery. (#1367)

* chore: add a tool to upload tpcds data to bigquery.

* update error type

* update docstring

---------

Co-authored-by: Shenyang Cai <sycai@users.noreply.github.com>
Co-authored-by: Huan Chen <142538604+Genesis929@users.noreply.github.com>