
refactor!: change default incognito context to persistent context for Playwright #985

Merged
vdusek merged 34 commits into apify:master from Mantisus:pw-persist-browser-context on Feb 25, 2025

Conversation

@Mantisus Mantisus commented Feb 15, 2025

Collaborator

Description

  • Changes the PlaywrightCrawler from using the standard browser context to using a persistent browser context.
  • Allows passing a user_data_dir with the path to the directory for the context. If user_data_dir is not provided, a temporary directory will be created.
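The fallback described in the second bullet can be sketched with the standard library alone. This is a minimal illustration, not the code from this PR: `resolve_user_data_dir` and `cleanup_user_data_dir` are hypothetical helper names, and the `pw-profile-` prefix is an arbitrary choice.

```python
from __future__ import annotations

import shutil
import tempfile
from pathlib import Path


def resolve_user_data_dir(user_data_dir: str | Path | None = None) -> tuple[Path, bool]:
    """Return the profile directory and a flag telling whether it is temporary.

    Hypothetical helper: it mirrors the behaviour described above, not the
    actual crawlee implementation.
    """
    if user_data_dir is not None:
        # The caller supplied a directory: use it and make sure it exists.
        path = Path(user_data_dir)
        path.mkdir(parents=True, exist_ok=True)
        return path, False
    # No directory supplied: create a fresh temporary one.
    return Path(tempfile.mkdtemp(prefix="pw-profile-")), True


def cleanup_user_data_dir(path: Path, is_temporary: bool) -> None:
    """Remove the directory only if this code created it."""
    if is_temporary:
        shutil.rmtree(path, ignore_errors=True)
```

The resolved path is what one would presumably hand to Playwright's `BrowserType.launch_persistent_context(user_data_dir, ...)`. Tracking the temporary flag keeps cleanup from deleting a profile directory that the user owns.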

Issues

@Mantisus Mantisus requested review from Pijukatel and vdusek and removed request for vdusek February 15, 2025 03:07
@Mantisus Mantisus self-assigned this Feb 15, 2025
@Mantisus Mantisus changed the title refactor!: change default 'incognito context' to 'persistent context' for Playwright Feb 15, 2025
@vdusek vdusek added this to the 108th sprint - Tooling team milestone Feb 19, 2025

@vdusek vdusek left a comment

Collaborator

Since it contains a breaking change, could you please describe the breaking changes in the PR's description and also summarize them in the Upgrading guide?

Comment thread src/crawlee/crawlers/_playwright/_playwright_crawler.py Outdated
Comment thread src/crawlee/browsers/_browser_pool.py Outdated
Comment thread src/crawlee/browsers/_playwright_browser.py Outdated
Comment thread src/crawlee/browsers/_playwright_browser.py
Comment thread src/crawlee/browsers/_browser_pool.py
Mantisus and others added 16 commits February 19, 2025 13:26
Co-authored-by: Vlada Dusek <v.dusek96@gmail.com>
Co-authored-by: Vlada Dusek <v.dusek96@gmail.com>
Co-authored-by: Vlada Dusek <v.dusek96@gmail.com>
### Description

- fix public imports in `__init__` files
- Add `rich` to direct dependencies. It is one of `cookiecutter`'s
dependencies, but we use it directly in `statistics._models.py`

---------

Co-authored-by: Vlada Dusek <v.dusek96@gmail.com>
### Description

Add adaptive context helpers and documentation for
AdaptivePlaywrightCrawler.

### Issues

- Closes: apify#249

---------

Co-authored-by: Jan Buchar <Teyras@gmail.com>
Co-authored-by: Jan Buchar <jan.buchar@apify.com>
…#988)

### Description

- change custom `LRUCache` to `cachetools.LRUCache`. In my opinion,
the logic of `functools.lru_cache` isn't well-suited for this use case.
Therefore, if we want to modify our caching approach, using `cachetools`
appears to be a better option.

### Issues

- Closes: apify#86
### Description

- update curl-cffi version requirement to >=0.9.0.
- update default `impersonate` from `chrome124` to `chrome131`
- Migrate from `poetry` to `uv`.
- Relates: apify#628
- The update of templates to use `uv` will be implemented separately.
- `project.urls`
- python 3.13 in ci
- unify name "Set up uv package manager"
- fix contributing guide
- add all extra, remove dev extra (move to dev deps)
- relates: apify#628
Mantisus and others added 6 commits February 19, 2025 17:14
…pify#959)

Add `additional_http_error_status_codes` and
`ignore_http_error_status_codes` to PlaywrightCrawler.
Since they exist now on all crawlers, move them to `BasicCrawler` level.
Do not use `_http_client` attributes for getting additional status codes
related variables.

**Breaking:** Remove `HttpCrawlerOptions`, since it no longer has any
options beyond those in `BasicCrawlerOptions`.

- Closes: apify#953
@Mantisus Mantisus requested review from Pijukatel and vdusek February 19, 2025 17:22

@vdusek vdusek left a comment

Collaborator

There are failing tests for Windows Python 3.12.

@Mantisus Mantisus force-pushed the pw-persist-browser-context branch from 6b6236d to 5e97e31 Compare February 20, 2025 23:25
@Mantisus

Collaborator Author

> There are failing tests for Windows Python 3.12.

The problem is not related to this PR.

It's solved in PR #1007.

@Pijukatel Pijukatel left a comment

Collaborator

Thanks for the tests, just two minor comments.

Comment thread tests/unit/crawlers/_playwright/test_playwright_crawler.py Outdated
Comment thread tests/unit/crawlers/_playwright/test_playwright_crawler.py
@Mantisus Mantisus requested review from Pijukatel and vdusek February 21, 2025 13:41

@vdusek vdusek left a comment

Collaborator

Looks good, just a few comments.

Comment thread docs/upgrading/upgrading_to_v0x.md Outdated
Comment thread src/crawlee/browsers/_playwright_browser.py
Comment thread src/crawlee/browsers/_playwright_browser.py Outdated
Comment thread src/crawlee/browsers/_playwright_browser.py Outdated
Comment thread src/crawlee/browsers/_playwright_browser.py Outdated
Comment thread src/crawlee/browsers/_playwright_browser_controller.py Outdated
@Mantisus Mantisus requested a review from vdusek February 25, 2025 02:45

@vdusek vdusek left a comment

Collaborator

LGTM

@vdusek vdusek merged commit f01520d into apify:master Feb 25, 2025
@vdusek vdusek added the t-tooling Issues with this label are in the ownership of the tooling team. label Feb 25, 2025
Labels

t-tooling Issues with this label are in the ownership of the tooling team.

3 participants