Commit c5b7fda

feat: Implement ST_LENGTH geography function (#1791)
* feat: Implement ST_LENGTH geography function

  This commit introduces the ST_LENGTH function for BigQuery DataFrames. ST_LENGTH computes the length of GEOGRAPHY objects in meters.

  The implementation includes:
  - A new operation `geo_st_length_op` in `bigframes.operations.geo_ops`.
  - The user-facing function `st_length` in `bigframes.bigquery._operations.geo`.
  - Exposure of the new operation and function in relevant `__init__.py` files.
  - Comprehensive unit tests covering various geometry types (Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, GeometryCollection), empty geographies, and NULL inputs.

  The function behaves as per the BigQuery ST_LENGTH documentation:
  - Returns 0 for POINT, MULTIPOINT, and empty GEOGRAPHYs.
  - Returns the perimeter for POLYGON and MULTIPOLYGON.
  - Returns the total length for LINESTRING and MULTILINESTRING.
  - For GEOMETRYCOLLECTION, sums the lengths/perimeters of its constituent linestrings and polygons.

* feat: Add NotImplemented length property to GeoSeries

  This commit adds a `length` property to the `GeoSeries` class. Accessing this property raises a `NotImplementedError` that directs users to the `bigframes.bigquery.st_length()` function instead.

  This change includes:
  - The `length` property in `bigframes/geopandas/geoseries.py`.
  - A unit test in `tests/system/small/geopandas/test_geoseries.py` to verify that the correct error is raised with the specified message when `GeoSeries.length` is accessed.

* Update bigframes/bigquery/_operations/__init__.py

* fix lint

* add missing compilation method

* use pandas for the expected values in tests

* fix: Apply patch for ST_LENGTH and related test updates

  This commit applies a user-provided patch that includes:
  - Removing `st_length` from `bigframes/bigquery/_operations/__init__.py`.
  - Adding an Ibis implementation for `geo_st_length_op` in `bigframes/core/compile/scalar_op_compiler.py`.
  - Modifying `KMeans` in `bigframes/ml/cluster.py` to handle `init="k-means++"`.
  - Updating geo tests in `tests/system/small/bigquery/test_geo.py` to use `to_pandas()` and `pd.testing.assert_series_equal`.

  Note: System tests requiring Google Cloud authentication were not executed due to limitations in my current environment.

* feat: Add use_spheroid parameter to ST_LENGTH and update docs

  This commit introduces the `use_spheroid` parameter to the `ST_LENGTH` geography function, aligning it more closely with the BigQuery ST_LENGTH(geography_expression[, use_spheroid]) signature.

  Key changes:
  - `bigframes.operations.geo_ops.GeoStLengthOp` is now a dataclass that accepts `use_spheroid` (defaulting to `False`). A check is included to raise `NotImplementedError` if `use_spheroid` is `True`, as this is the current limitation in BigQuery.
  - The Ibis compiler implementation for `geo_st_length_op` in `bigframes.core.compile.scalar_op_compiler.py` has been updated to accept the new `GeoStLengthOp` operator type.
  - The user-facing `st_length` function in `bigframes.bigquery._operations.geo.py` now includes the `use_spheroid` keyword argument.
  - The docstring for `st_length` has been updated to match the official BigQuery documentation, clarifying that only lines contribute to the length (points and polygons result in 0 length), and detailing the `use_spheroid` parameter. Examples have been updated accordingly.
  - Tests in `tests/system/small/bigquery/test_geo.py` have been updated to:
    - Reflect the correct behavior (0 length for polygons/points).
    - Test calls with both default `use_spheroid` and explicit `use_spheroid=False`.
    - Verify that `use_spheroid=True` raises a `NotImplementedError`.

  Note: System tests requiring Google Cloud authentication were not re-executed for this specific commit due to environment limitations identified in previous steps. The changes primarily affect the operator definition, function signature, and client-side validation, with the core Ibis compilation logic for length remaining unchanged.

* feat: Implement use_spheroid for ST_LENGTH via Ibis UDF

  This commit refactors the ST_LENGTH implementation to correctly pass the `use_spheroid` parameter to BigQuery by using Ibis's `ibis_udf.scalar.builtin('ST_LENGTH', ...)` function.

  Key changes:
  - `bigframes.operations.geo_ops.GeoStLengthOp`: The client-side `NotImplementedError` for `use_spheroid=True` (raised in `__post_init__`) has been removed. BigQuery DataFrames will now pass this parameter directly to BigQuery.
  - `bigframes.core.compile.scalar_op_compiler.geo_length_op_impl`: The implementation now always uses `ibis_udf.scalar.builtin('ST_LENGTH', x, op.use_spheroid)` instead of `x.length()`. This ensures the `use_spheroid` parameter is included in the SQL generated for BigQuery.
  - `tests/system/small/bigquery/test_geo.py`:
    - The test expecting a client-side `NotImplementedError` for `use_spheroid=True` has been removed.
    - A new test `test_st_length_use_spheroid_true_errors_from_bq` has been added. This test calls `st_length` with `use_spheroid=True` and asserts that an exception is raised from BigQuery, as BigQuery itself currently only supports `use_spheroid=False` for the `ST_LENGTH` function.
    - Existing tests for `st_length` were already updated in a previous commit to reflect that only line geometries contribute to the length, and these continue to verify behavior with `use_spheroid=False`.

  This change ensures that BigQuery DataFrames accurately reflects BigQuery's `ST_LENGTH` capabilities concerning the `use_spheroid` parameter.

* refactor: Use Ibis UDF for ST_LENGTH BigQuery builtin

  This commit refactors the ST_LENGTH geography operation to use an Ibis UDF defined via `@ibis_udf.scalar.builtin`. This aligns with the pattern exemplified by other built-in functions like ST_DISTANCE when a direct Ibis method with all necessary parameters is not available.

  Key changes:
  - A new `st_length` function is defined in `bigframes/core/compile/scalar_op_compiler.py` using `@ibis_udf.scalar.builtin`. This UDF maps to BigQuery's `ST_LENGTH(geography, use_spheroid)` function.
  - The `geo_length_op_impl` in the same file now calls this `st_length` Ibis UDF, replacing the previous use of `op_typing.ibis_function`.
  - The `GeoStLengthOp` in `bigframes/operations/geo_ops.py` and the user-facing `st_length` function in `bigframes/bigquery/_operations/geo.py` remain unchanged from the previous version, as they correctly define the operation's interface and parameters.

  This change provides a cleaner and more direct way to map the BigQuery DataFrames operation to the specific BigQuery ST_LENGTH SQL function signature, while maintaining the existing BigQuery DataFrames operation structure. The behavior of the `st_length` function, including its handling of the `use_spheroid` parameter and error conditions from BigQuery, remains the same.

* refactor: Consolidate st_length tests in test_geo.py

  This commit refactors the system tests for the `st_length` geography function in `tests/system/small/bigquery/test_geo.py`. The numerous individual test cases for different geometry types have been combined into a single, comprehensive test function `test_st_length_various_geometries`.

  This new test uses a single GeoSeries with a variety of inputs (Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, GeometryCollection, None/Empty) and compares the output of `st_length` (with both default and explicit `use_spheroid=False`) against a pandas Series of expected lengths.

  This consolidation improves the conciseness and maintainability of the tests for `st_length`. The test for `use_spheroid=True` (expecting an error from BigQuery) remains separate.

* fix: Correct export of GeoStLengthOp in operations init

  This commit fixes an ImportError caused by an incorrect name being used for the ST_LENGTH geography operator in `bigframes/operations/__init__.py`. When `geo_st_length_op` (a variable) was replaced by the dataclass `GeoStLengthOp`, the import and `__all__` list in this `__init__.py` file were not updated. This commit changes the import from `.geo_ops` to correctly import `GeoStLengthOp` and updates the `__all__` list to export `GeoStLengthOp`.

* fix system test and some linting

* fix lint

* fix doctest

* fix docstring

* Update bigframes/core/compile/scalar_op_compiler.py

---------

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
1 parent d2154c8 commit c5b7fda

8 files changed: +170 -1 lines changed

‎bigframes/bigquery/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -32,6 +32,7 @@
     st_difference,
     st_distance,
     st_intersection,
+    st_length,
 )
 from bigframes.bigquery._operations.json import (
     json_extract,
@@ -58,6 +59,7 @@
     "st_difference",
     "st_distance",
     "st_intersection",
+    "st_length",
     # json ops
     "json_extract",
     "json_extract_array",

‎bigframes/bigquery/_operations/geo.py

Lines changed: 64 additions & 0 deletions
@@ -380,3 +380,67 @@ def st_intersection(
             each aligned geometry with other.
     """
     return series._apply_binary_op(other, ops.geo_st_intersection_op)
+
+
+def st_length(
+    series: Union[bigframes.series.Series, bigframes.geopandas.GeoSeries],
+    *,
+    use_spheroid: bool = False,
+) -> bigframes.series.Series:
+    """Returns the total length in meters of the lines in the input GEOGRAPHY.
+
+    If a series element is a point or a polygon, returns zero for that row.
+    If a series element is a collection, returns the length of the lines
+    in the collection; if the collection doesn't contain lines, returns
+    zero.
+
+    The optional use_spheroid parameter determines how this function
+    measures distance. If use_spheroid is FALSE, the function measures
+    distance on the surface of a perfect sphere.
+
+    The use_spheroid parameter currently only supports the value FALSE. The
+    default value of use_spheroid is FALSE. See:
+    https://cloud.google.com/bigquery/docs/reference/standard-sql/geography_functions#st_length
+
+    **Examples:**
+
+        >>> import bigframes.geopandas
+        >>> import bigframes.pandas as bpd
+        >>> import bigframes.bigquery as bbq
+        >>> from shapely.geometry import Polygon, LineString, Point, GeometryCollection
+        >>> bpd.options.display.progress_bar = None
+
+        >>> series = bigframes.geopandas.GeoSeries(
+        ...         [
+        ...             LineString([(0, 0), (1, 0)]), # Length will be approx 1 degree in meters
+        ...             Polygon([(0.0, 0.0), (0.1, 0.1), (0.0, 0.1)]), # Length is 0
+        ...             Point(0, 1), # Length is 0
+        ...             GeometryCollection([LineString([(0,0),(0,1)]), Point(1,1)]) # Length of LineString only
+        ...         ]
+        ... )
+
+    Default behavior (use_spheroid=False):
+
+        >>> result = bbq.st_length(series)
+        >>> result
+        0    111195.101177
+        1              0.0
+        2              0.0
+        3    111195.101177
+        dtype: Float64
+
+    Args:
+        series (bigframes.series.Series | bigframes.geopandas.GeoSeries):
+            A series containing geography objects.
+        use_spheroid (bool, optional):
+            Determines how this function measures distance.
+            If FALSE (default), measures distance on a perfect sphere.
+            Currently, only FALSE is supported.
+
+    Returns:
+        bigframes.series.Series:
+            Series of floats representing the lengths in meters.
+    """
+    series = series._apply_unary_op(ops.GeoStLengthOp(use_spheroid=use_spheroid))
+    series.name = None
+    return series
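
The first value in the docstring example can be checked by hand. The sketch below sums haversine great-circle distances over a line's vertices in pure Python. It is not how BigQuery evaluates ST_LENGTH internally, and the sphere radius of 6,371,010 m is an assumption, chosen only because it reproduces the 111195.101177 shown in the doctest output. `EARTH_RADIUS_M` and `spherical_length_m` are hypothetical names.

```python
import math

# Assumed sphere radius in meters; the commit does not state which radius BigQuery uses.
EARTH_RADIUS_M = 6_371_010.0


def spherical_length_m(coords):
    """Sum great-circle distances between consecutive (lng, lat) vertices given in degrees."""
    total = 0.0
    for (lng1, lat1), (lng2, lat2) in zip(coords, coords[1:]):
        phi1, phi2 = math.radians(lat1), math.radians(lat2)
        dphi = phi2 - phi1
        dlmb = math.radians(lng2 - lng1)
        # Haversine formula for the central angle between the two vertices.
        a = (
            math.sin(dphi / 2) ** 2
            + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
        )
        total += 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))
    return total


# The two line segments used in the docstring example.
equator_segment = spherical_length_m([(0, 0), (1, 0)])
meridian_segment = spherical_length_m([(0, 0), (0, 1)])
print(round(equator_segment, 6), round(meridian_segment, 6))
```

On a perfect sphere, one degree along the equator and one degree along a meridian have the same length, which is why rows 0 and 3 of the doctest output agree.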

‎bigframes/core/compile/scalar_op_compiler.py

Lines changed: 12 additions & 1 deletion
@@ -30,7 +30,6 @@
 import bigframes.core.compile.default_ordering
 import bigframes.core.compile.ibis_types
 import bigframes.core.expression as ex
-import bigframes.dtypes
 import bigframes.operations as ops
 
 _ZERO = typing.cast(ibis_types.NumericValue, ibis_types.literal(0))
@@ -1079,6 +1078,12 @@ def geo_x_op_impl(x: ibis_types.Value):
     return typing.cast(ibis_types.GeoSpatialValue, x).x()
 
 
+@scalar_op_compiler.register_unary_op(ops.GeoStLengthOp, pass_op=True)
+def geo_length_op_impl(x: ibis_types.Value, op: ops.GeoStLengthOp):
+    # Call the st_length UDF defined in this file (or imported)
+    return st_length(x, op.use_spheroid)
+
+
 @scalar_op_compiler.register_unary_op(ops.geo_y_op)
 def geo_y_op_impl(x: ibis_types.Value):
     return typing.cast(ibis_types.GeoSpatialValue, x).y()
@@ -2057,6 +2062,12 @@ def st_distance(a: ibis_dtypes.geography, b: ibis_dtypes.geography, use_spheroid
     """Convert string to geography."""
 
 
+@ibis_udf.scalar.builtin
+def st_length(geog: ibis_dtypes.geography, use_spheroid: bool) -> ibis_dtypes.float:  # type: ignore
+    """ST_LENGTH BQ builtin. This body is never executed."""
+    pass
+
+
 @ibis_udf.scalar.builtin
 def unix_micros(a: ibis_dtypes.timestamp) -> int:  # type: ignore
     """Convert a timestamp to microseconds"""

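The `pass_op=True` registration above lets the implementation read parameters off the operator instance. A minimal sketch of that dispatch idea follows, under heavy assumptions: `RegistrySketch` and `LengthOpSketch` are hypothetical stand-ins, and the sketch emits SQL strings directly, whereas the real compiler builds Ibis expressions and may render the final SQL differently.

```python
import dataclasses
from typing import Callable, Dict, Tuple


@dataclasses.dataclass(frozen=True)
class LengthOpSketch:
    """Hypothetical stand-in for ops.GeoStLengthOp."""

    use_spheroid: bool = False


class RegistrySketch:
    """Hypothetical registry that maps an op type to its implementation."""

    def __init__(self) -> None:
        self._impls: Dict[type, Tuple[Callable[..., str], bool]] = {}

    def register_unary_op(self, op_type: type, pass_op: bool = False):
        def decorator(impl):
            # Remember whether the implementation wants the op instance too.
            self._impls[op_type] = (impl, pass_op)
            return impl

        return decorator

    def compile(self, op, operand_sql: str) -> str:
        impl, pass_op = self._impls[type(op)]
        return impl(operand_sql, op) if pass_op else impl(operand_sql)


registry = RegistrySketch()


@registry.register_unary_op(LengthOpSketch, pass_op=True)
def length_impl(x: str, op: LengthOpSketch) -> str:
    # The op instance carries the parameter that ends up in the call.
    flag = "TRUE" if op.use_spheroid else "FALSE"
    return f"ST_LENGTH({x}, {flag})"


sql = registry.compile(LengthOpSketch(), "geog_col")
print(sql)
```

Without `pass_op`, the implementation would only receive the operand and could not forward `use_spheroid`.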
‎bigframes/geopandas/geoseries.py

Lines changed: 6 additions & 0 deletions
@@ -30,6 +30,12 @@ def __init__(self, data=None, index=None, **kwargs):
             data=data, index=index, dtype=geopandas.array.GeometryDtype(), **kwargs
         )
 
+    @property
+    def length(self):
+        raise NotImplementedError(
+            "GeoSeries.length is not yet implemented. Please use bigframes.bigquery.st_length(geoseries) instead."
+        )
+
     @property
     def x(self) -> bigframes.series.Series:
         series = self._apply_unary_op(ops.geo_x_op)

‎bigframes/operations/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -101,6 +101,7 @@
     geo_x_op,
     geo_y_op,
     GeoStDistanceOp,
+    GeoStLengthOp,
 )
 from bigframes.operations.json_ops import (
     JSONExtract,
@@ -385,6 +386,7 @@
     "geo_st_geogfromtext_op",
     "geo_st_geogpoint_op",
     "geo_st_intersection_op",
+    "GeoStLengthOp",
     "geo_x_op",
     "geo_y_op",
     "GeoStDistanceOp",

‎bigframes/operations/geo_ops.py

Lines changed: 9 additions & 0 deletions
@@ -80,3 +80,12 @@ class GeoStDistanceOp(base_ops.BinaryOp):
 
     def output_type(self, *input_types: dtypes.ExpressionType) -> dtypes.ExpressionType:
         return dtypes.FLOAT_DTYPE
+
+
+@dataclasses.dataclass(frozen=True)
+class GeoStLengthOp(base_ops.UnaryOp):
+    name = "geo_st_length"
+    use_spheroid: bool = False
+
+    def output_type(self, *input_types: dtypes.ExpressionType) -> dtypes.ExpressionType:
+        return dtypes.FLOAT_DTYPE
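
Declaring the operator as a frozen dataclass gives it value semantics: two instances with the same `use_spheroid` compare equal and hash identically, and instances cannot be mutated after construction. The sketch below illustrates this with hypothetical stand-in classes (`UnaryOpSketch`, `LengthOpSketch`); it does not import bigframes, and the real `base_ops.UnaryOp` may define more than is shown here.

```python
import dataclasses
import typing


@dataclasses.dataclass(frozen=True)
class UnaryOpSketch:
    """Hypothetical stand-in for bigframes' base_ops.UnaryOp."""

    name: typing.ClassVar[str] = "unary"


@dataclasses.dataclass(frozen=True)
class LengthOpSketch(UnaryOpSketch):
    """Hypothetical stand-in for GeoStLengthOp."""

    # A class-level attribute, not a dataclass field.
    name: typing.ClassVar[str] = "geo_st_length"
    use_spheroid: bool = False


default_op = LengthOpSketch()
explicit_op = LengthOpSketch(use_spheroid=False)
spheroid_op = LengthOpSketch(use_spheroid=True)

# Frozen dataclasses compare and hash by their field values.
same = default_op == explicit_op and hash(default_op) == hash(explicit_op)
different = default_op != spheroid_op

# Assigning to a field after construction raises FrozenInstanceError.
try:
    default_op.use_spheroid = True  # type: ignore[misc]
    mutated = True
except dataclasses.FrozenInstanceError:
    mutated = False

print(same, different, mutated)
```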

‎tests/system/small/bigquery/test_geo.py

Lines changed: 64 additions & 0 deletions
@@ -19,10 +19,14 @@
 from shapely.geometry import (  # type: ignore
     GeometryCollection,
     LineString,
+    MultiLineString,
+    MultiPoint,
+    MultiPolygon,
     Point,
     Polygon,
 )
 
+from bigframes.bigquery import st_length
 import bigframes.bigquery as bbq
 import bigframes.geopandas
 
@@ -59,6 +63,66 @@ def test_geo_st_area():
     )
 
 
+# Expected length for 1 degree of longitude at the equator is approx 111195.079734 meters
+DEG_LNG_EQUATOR_METERS = 111195.07973400292
+
+
+def test_st_length_various_geometries(session):
+    input_geometries = [
+        Point(0, 0),
+        LineString([(0, 0), (1, 0)]),
+        Polygon([(0, 0), (1, 0), (0, 1), (0, 0)]),
+        MultiPoint([Point(0, 0), Point(1, 1)]),
+        MultiLineString([LineString([(0, 0), (1, 0)]), LineString([(0, 0), (0, 1)])]),
+        MultiPolygon(
+            [
+                Polygon([(0, 0), (1, 0), (0, 1), (0, 0)]),
+                Polygon([(2, 2), (3, 2), (2, 3), (2, 2)]),
+            ]
+        ),
+        GeometryCollection([Point(0, 0), LineString([(0, 0), (1, 0)])]),
+        GeometryCollection([]),
+        None,  # Represents NULL geography input
+        GeometryCollection([Point(1, 1), Point(2, 2)]),
+    ]
+    geoseries = bigframes.geopandas.GeoSeries(input_geometries, session=session)
+
+    expected_lengths = pd.Series(
+        [
+            0.0,  # Point
+            DEG_LNG_EQUATOR_METERS,  # LineString
+            0.0,  # Polygon
+            0.0,  # MultiPoint
+            2 * DEG_LNG_EQUATOR_METERS,  # MultiLineString
+            0.0,  # MultiPolygon
+            DEG_LNG_EQUATOR_METERS,  # GeometryCollection (Point + LineString)
+            0.0,  # Empty GeometryCollection
+            pd.NA,  # None input for ST_LENGTH(NULL) is NULL
+            0.0,  # GeometryCollection (Point + Point)
+        ],
+        index=pd.Index(range(10), dtype="Int64"),
+        dtype="Float64",
+    )
+
+    # Test default use_spheroid
+    result_default = st_length(geoseries).to_pandas()
+    pd.testing.assert_series_equal(
+        result_default,
+        expected_lengths,
+        rtol=1e-3,
+        atol=1e-3,  # For comparisons involving 0.0
+    )  # type: ignore
+
+    # Test explicit use_spheroid=False
+    result_explicit_false = st_length(geoseries, use_spheroid=False).to_pandas()
+    pd.testing.assert_series_equal(
+        result_explicit_false,
+        expected_lengths,
+        rtol=1e-3,
+        atol=1e-3,  # For comparisons involving 0.0
+    )  # type: ignore
+
+
 def test_geo_st_difference_with_geometry_objects():
     data1 = [
         Polygon([(0, 0), (10, 0), (10, 10), (0, 0)]),
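
The test constant (111195.0797...) and the doctest output (111195.101177) differ slightly, and several expected values are exactly 0.0, so the test passes both a relative and an absolute tolerance. The sketch below illustrates why both are useful. It uses `math.isclose` as a stand-in; the exact comparison formula inside `pd.testing.assert_series_equal` may differ, so treat this as an illustration of the idea rather than of pandas internals.

```python
import math

DOCSTRING_VALUE = 111195.101177  # from the st_length doctest output
TEST_CONSTANT = 111195.07973400292  # DEG_LNG_EQUATOR_METERS in the test

# The two reference values disagree by about two centimeters.
relative_gap = abs(DOCSTRING_VALUE - TEST_CONSTANT) / TEST_CONSTANT
within_rtol = math.isclose(DOCSTRING_VALUE, TEST_CONSTANT, rel_tol=1e-3)

# A relative tolerance alone cannot absorb noise around an expected 0.0,
# because any fraction of zero is still zero.
zero_rel_only = math.isclose(1e-9, 0.0, rel_tol=1e-3)

# An absolute tolerance covers that case.
zero_with_abs = math.isclose(1e-9, 0.0, rel_tol=1e-3, abs_tol=1e-3)

print(relative_gap, within_rtol, zero_rel_only, zero_with_abs)
```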

‎tests/system/small/geopandas/test_geoseries.py

Lines changed: 11 additions & 0 deletions
@@ -96,6 +96,17 @@ def test_geo_area_not_supported():
         bf_series.area
 
 
+def test_geoseries_length_property_not_implemented(session):
+    gs = bigframes.geopandas.GeoSeries([Point(0, 0)], session=session)
+    with pytest.raises(
+        NotImplementedError,
+        match=re.escape(
+            "GeoSeries.length is not yet implemented. Please use bigframes.bigquery.st_length(geoseries) instead."
+        ),
+    ):
+        _ = gs.length
+
+
 def test_geo_distance_not_supported():
     s1 = bigframes.pandas.Series(
         [
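
The `re.escape` call in the new test matters because `pytest.raises(match=...)` treats its argument as a regular expression searched against the exception text. The stdlib-only sketch below shows what happens to this particular message without escaping: the parentheses in `st_length(geoseries)` become a regex group, so the raw pattern no longer matches the literal text. `MESSAGE` is a name introduced here for illustration.

```python
import re

MESSAGE = (
    "GeoSeries.length is not yet implemented. "
    "Please use bigframes.bigquery.st_length(geoseries) instead."
)

# Used as a raw pattern, "(geoseries)" is a capture group, so the regex
# expects "st_lengthgeoseries" and fails to find the literal message.
raw_match = re.search(MESSAGE, MESSAGE)

# Escaping turns every metacharacter into a literal, so the search succeeds.
escaped_match = re.search(re.escape(MESSAGE), MESSAGE)

print(raw_match is None, escaped_match is not None)
```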
