Add gdal mdim get-refs algorithm#14677
Conversation
rouault
left a comment
There was a problem hiding this comment.
not a bad idea, but clearly this doesn't follow our LLM tool policy. The code has clearly not been read carefully. A lot of comments are hyper verbose, with typical "LLM plan garbage" references. I feel some of the tests are also a bit more verbose than what would be strictly needed (although it might just be my aversion for being exposed to prior verbosity)
| /****************************************************************************** | ||
| * | ||
| * Project: GDAL | ||
| * Purpose: gdal "mdim get-refs" subcommand |
There was a problem hiding this comment.
get-refs is a bit mysterious until you read the doc.
brainstorming: get-chunk-location, chunk-loc
There was a problem hiding this comment.
list-chunks or extract-chunks - I don't have a really strong feel here (get-chunk-location is absolutely fine so I'll go with that unless a stronger candidate or position arrives)
| } // namespace | ||
|
|
||
| /************************************************************************/ | ||
| /* GDALMdimGetRefsAlgorithm() */ |
There was a problem hiding this comment.
formatting issue. your pre-commit is not setup ?
There was a problem hiding this comment.
it is setup, I will check
| bool GDALMdimGetRefsAlgorithm::RunImpl(GDALProgressFunc pfnProgress, | ||
| void *pProgressData) | ||
| { | ||
| // ---------------------------------------------------------------------- |
There was a problem hiding this comment.
all those "stage", "A1", "D6" etc are really LLM noise. Please remove in general.
And you can pretty much remove that particular 5-line comment for a 2 line effective code...
| auto poSrcDS = m_inputDataset.GetDatasetRef(); | ||
| CPLAssert(poSrcDS); | ||
|
|
||
| // A2. GetRootGroup(). Null root => driver lacks mdim support => fail with a |
There was a problem hiding this comment.
make comment more succinct (remove the ==> fail part .... that's obvious from the code)
| auto poRootGroup = poSrcDS->GetRootGroup(); | ||
| CPLDebug("MDIM-GET-REFS", "input: %s, root group: %s", | ||
| poSrcDS->GetDescription(), poRootGroup ? "present" : "NULL"); | ||
|
|
|
|
||
| * ``ARRAY_NAME`` — the fullname of the input array | ||
| * ``DTYPE`` — the array's data type (e.g. ``Int16``, ``Float32``) | ||
| * ``DIM_N_NAME``, ``DIM_N_SIZE``, ``DIM_N_BLOCK``, ``DIM_N_CHUNKS`` — for |
There was a problem hiding this comment.
make a global pass to replace UTF-8 dash by ASCII dash
| * Only a single array can be emitted (``--array`` is required). | ||
| * Arrays without natural block size decline with a ``not chunk-enumerable`` error. | ||
| * The inline data payload is not extracted as a binary field. | ||
| * No geometry column is emitted. |
There was a problem hiding this comment.
| * No geometry column is emitted. |
| ----------- | ||
|
|
||
| * Only a single array can be emitted (``--array`` is required). | ||
| * Arrays without natural block size decline with a ``not chunk-enumerable`` error. |
There was a problem hiding this comment.
not really a limitation IMHO (in the sense there's nothing we can do about that)
| * Arrays without natural block size decline with a ``not chunk-enumerable`` error. |
|
|
||
| .. code-block:: bash | ||
|
|
||
| ogrinfo ocean_chunks.parquet -sql \ |
There was a problem hiding this comment.
| ogrinfo ocean_chunks.parquet -sql \ | |
| gdal vector info ocean_chunks.parquet --features --sql \ |
|
|
||
|
|
||
| @pytest.mark.require_driver("Parquet") | ||
| def test_gdalalg_mdim_get_refs_parquet_boolean_subtype(tmp_vsimem): |
There was a problem hiding this comment.
does it exercice any code path not already exercised ? I don't think so
|
The GDAL project highly values your contribution and would love to see this work merged! Unfortunately this PR has not had any activity in the last 28 days and is being automatically marked as "stale". If you think this pull request should be merged, please check
|
gdal mdim get-refs command.
Creates a vector layer (no geometry) that lists the
path,offset,sizeandinfoof chunk byte references from a multidimensional raster array.The emitted table also includes
dim_0, ...dim_nfields that identify the chunk's address (and allows filter queries), andpresenta boolean for whether the chunk exists.Metadata is also recorded in the layer for
ARRAY_NAME,DTYPE,DIM_0{:N}, and.papszInfoitems in the formCODEC_%seg.CODEC_COMPRESSION,CODEC_FILTER.AI tool usage
Tasklist
Description
Example
gdal mdim get-refs --array /Band1 autotest/gdrivers/data/netcdf/byte_chunked_multiple.nc \ /tmp/Band1.parquet --of Parquet gdal vector info /tmp/Band1.parquet --layer Band1 --features INFO: Open of `/tmp/Band1.parquet' using driver `Parquet' successful. Layer name: Band1 Metadata: ARRAY_NAME=/Band1 DTYPE=Byte DIM_0_NAME=y DIM_0_SIZE=20 DIM_0_BLOCK=10 DIM_0_CHUNKS=2 DIM_1_NAME=x DIM_1_SIZE=20 DIM_1_BLOCK=10 DIM_1_CHUNKS=2 CODEC_COMPRESSION=DEFLATE CODEC_FILTER=SHUFFLE Geometry: None Feature Count: 4 Layer SRS WKT: (unknown) dim_0: Integer64 (0.0) dim_1: Integer64 (0.0) present: Integer(Boolean) (0.0) path: String (0.0) offset: Integer64 (0.0) size: Integer64 (0.0) info: String (0.0) OGRFeature(Band1):0 dim_0 (Integer64) = 0 dim_1 (Integer64) = 0 present (Integer(Boolean)) = 1 path (String) = autotest/gdrivers/data/netcdf/byte_chunked_multiple.nc offset (Integer64) = 6108 size (Integer64) = 86 info (String) = COMPRESSION=DEFLATE; FILTER=SHUFFLE OGRFeature(Band1):1 dim_0 (Integer64) = 0 dim_1 (Integer64) = 1 present (Integer(Boolean)) = 1 path (String) = autotest/gdrivers/data/netcdf/byte_chunked_multiple.nc offset (Integer64) = 6194 size (Integer64) = 74 info (String) = COMPRESSION=DEFLATE; FILTER=SHUFFLE OGRFeature(Band1):2 dim_0 (Integer64) = 1 dim_1 (Integer64) = 0 present (Integer(Boolean)) = 1 path (String) = autotest/gdrivers/data/netcdf/byte_chunked_multiple.nc offset (Integer64) = 5959 size (Integer64) = 74 info (String) = COMPRESSION=DEFLATE; FILTER=SHUFFLE OGRFeature(Band1):3 dim_0 (Integer64) = 1 dim_1 (Integer64) = 1 present (Integer(Boolean)) = 1 path (String) = autotest/gdrivers/data/netcdf/byte_chunked_multiple.nc offset (Integer64) = 6033 size (Integer64) = 75 info (String) = COMPRESSION=DEFLATE; FILTER=SHUFFLE