fix: Update push_data and user_data annotation with JsonSerializable instead of Any #1889

Merged
vdusek merged 4 commits into apify:master from Mantisus:up-json-serializavle-typing
May 16, 2026

@Mantisus

Collaborator

Description

  • Improved the annotations for arguments that accept JSON data by replacing the implicit Any with the explicit JsonSerializable type for the push_data and user_data parameters.

Issues

@Mantisus Mantisus requested review from janbuchar and vdusek May 11, 2026 15:52
@Mantisus Mantisus self-assigned this May 11, 2026

@vdusek vdusek left a comment

Collaborator

Could you please try using this type:

JsonSerializable = dict[str, 'JsonSerializable'] | list['JsonSerializable'] | str | int | float | bool | None
"""Recursive type for JSON-serializable values - primitives plus objects and arrays with JSON-serializable contents.

Based on the definition discussed in https://github.com/python/typing/issues/182.
"""

All major type checkers support recursive types now, so we can finally type this correctly. I recently made the same change in the API client as well - https://github.com/apify/apify-client-python/blob/master/src/apify_client/_types.py#L30C1-L34C4.

@vdusek vdusek left a comment

Collaborator

The point of a better JsonSerializable type should be that it lets us type its various usages more accurately and with less complexity. Here are a few examples:

Comment thread src/crawlee/_utils/file.py
Comment thread src/crawlee/crawlers/_abstract_http/_abstract_http_crawler.py Outdated
Comment thread src/crawlee/crawlers/_basic/_basic_crawler.py
@Mantisus Mantisus requested a review from vdusek May 13, 2026 21:49

@vdusek vdusek left a comment

Collaborator

The return value of iterate_items is still AsyncIterator[dict[str, Any]] in the FileSystemDatasetClient, SqlDatasetClient, and RedisDatasetClient. Is that intended?

In Redis, there is also a relevant cast.

@vdusek vdusek left a comment

Collaborator

Also see data field in the DatasetItemDb:

data: Mapped[list[dict[str, Any]] | dict[str, Any]] = mapped_column(JsonField, nullable=False)

=>

data: Mapped[dict[str, JsonSerializable]] = mapped_column(JsonField, nullable=False)
Comment thread src/crawlee/_types.py
Comment thread src/crawlee/storage_clients/_base/_dataset_client.py Outdated
Comment thread src/crawlee/sessions/_session.py Outdated
Comment thread src/crawlee/sessions/_models.py Outdated
Comment thread src/crawlee/crawlers/_playwright/_playwright_crawler.py Outdated
Comment thread src/crawlee/crawlers/_basic/_basic_crawler.py Outdated
Comment thread src/crawlee/crawlers/_abstract_http/_abstract_http_crawler.py Outdated
@Mantisus

Collaborator Author

Also see data field in the DatasetItemDb:

data: Mapped[list[dict[str, Any]] | dict[str, Any]] = mapped_column(JsonField, nullable=False)

=>

data: Mapped[dict[str, JsonSerializable]] = mapped_column(JsonField, nullable=False)

SQLAlchemy uses its own mechanism for handling types and their mapping. This prevents the use of JsonSerializable.

@Mantisus Mantisus requested a review from vdusek May 15, 2026 01:24

@vdusek vdusek left a comment

Collaborator

LGTM, just before merging, could you please prepare a draft PR to the SDK with

dependencies = [
    # ...
    "crawlee @ git+https://github.com/apify/crawlee-python.git@master",
    # ...
]

so we can make sure the new typing doesn't cause any issues there?

@vdusek vdusek merged commit 662b93b into apify:master May 16, 2026
58 of 59 checks passed
vdusek pushed a commit to apify/apify-sdk-python that referenced this pull request May 26, 2026