Skip to content

feat: add support use_incognito_pages for browser_launch_options in PlaywrightCrawler#941

Merged
vdusek merged 9 commits into
apify:masterfrom
Mantisus:playwright-cookies
Feb 5, 2025
Merged

feat: add support use_incognito_pages for browser_launch_options in PlaywrightCrawler#941
vdusek merged 9 commits into
apify:masterfrom
Mantisus:playwright-cookies

Conversation

@Mantisus

@Mantisus Mantisus commented Jan 29, 2025

Copy link
Copy Markdown
Collaborator

Description

  • Improve cookie handling for PlaywrightCrawler. Cookies are now stored in the Session and set in Playwright Context from the Session.
  • Add use_incognito_pages option for browser_launch_options allowing each new page to be launched in a separate context.

Issues

@Mantisus Mantisus requested review from janbuchar and vdusek January 30, 2025 13:45
@janbuchar

Copy link
Copy Markdown
Collaborator

Does the incognito pages option have any relationship with the cookie handling? Also, a test would be nice 🙂

@Mantisus

Copy link
Copy Markdown
Collaborator Author

Does the incognito pages option have any relationship with the cookie handling?

Yes. When we work in basic mode we are working with one common context. In this case, cookies will be strayed between sessions.

However, by using incognito pages one page per context, we get a more controlled situation. When a session works only with the cookies intended for it.

@Mantisus Mantisus self-assigned this Jan 30, 2025
@janbuchar

Copy link
Copy Markdown
Collaborator

Does the incognito pages option have any relationship with the cookie handling?

Yes. When we work in basic mode we are working with one common context. In this case, cookies will be strayed between sessions.

Can this be a desirable state for anyone?

However, by using incognito pages one page per context, we get a more controlled situation. When a session works only with the cookies intended for it.

I sort of think that this should be the default. Do we really need to make it configurable? What does the JS version do?

@B4nan

B4nan commented Jan 31, 2025

Copy link
Copy Markdown
Member

What does the JS version do?

It's not the default, because it means no browser cache, so a huge perf cost, when we were testing this, things took literally twice the time to finish because of that.

@janbuchar

Copy link
Copy Markdown
Collaborator

What does the JS version do?

It's not the default, because it means no browser cache, so a huge perf cost, when we were testing this, things took literally twice the time to finish because of that.

So does the JS PlaywrightCrawler also not store cookies in the Session?

@Mantisus

Mantisus commented Jan 31, 2025

Copy link
Copy Markdown
Collaborator Author

So does the JS PlaywrightCrawler also not store cookies in the Session?

Stores, but when using the basic setup, they are just as flowing between sessions from the Playwright context.

@B4nan

B4nan commented Jan 31, 2025

Copy link
Copy Markdown
Member

This is the thing we really need to redesign in next major. IDK how cookies behave, but the more important part there is that because of this, we keep the same proxy in one browser instance with the defaults (so persistent contexts). In other words, sessions rotate per request, but proxies only per browser (we have some limits on how many times a browser instance can be used).

Comment thread src/crawlee/browsers/_playwright_browser_controller.py Outdated
Comment thread src/crawlee/browsers/_playwright_browser_controller.py
Comment thread src/crawlee/browsers/_playwright_browser_plugin.py Outdated

@vdusek vdusek left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last nit, otherwise LGTM, but let's wait for @janbuchar's approve as well.

Comment thread tests/unit/crawlers/_playwright/test_playwright_crawler.py
Comment thread tests/unit/crawlers/_playwright/test_playwright_crawler.py Outdated
@Mantisus Mantisus requested a review from janbuchar February 4, 2025 17:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

4 participants