Skip to content

Conversation

@mscolnick
Copy link
Contributor

@mscolnick mscolnick commented Dec 29, 2025

Reduce the size of the AppRouter class, make it more testable, add more tests.

@vercel
Copy link

vercel bot commented Dec 29, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Review Updated (UTC)
marimo-docs Ready Ready Preview, Comment Dec 29, 2025 8:29pm
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the AppRouter class to improve code organization and testability by extracting functionality into separate, focused modules. The main changes involve creating dedicated classes for path validation and directory scanning, introducing an AppDefaults dataclass to simplify parameter passing, and adding comprehensive test coverage.

Key changes:

  • Extracted path validation logic into PathValidator class for better separation of security concerns
  • Extracted directory scanning logic into DirectoryScanner class for improved testability
  • Introduced AppDefaults dataclass to reduce parameter repetition across the codebase

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
marimo/_server/files/path_validator.py New module containing PathValidator class for security validation of file paths
marimo/_server/files/directory_scanner.py New module containing DirectoryScanner class and is_marimo_app function
marimo/_server/app_defaults.py New dataclass for consolidating app default configuration parameters
marimo/_server/file_router.py Refactored to delegate to PathValidator and DirectoryScanner, significantly reducing code size
marimo/_server/session_manager.py Updated to use AppDefaults instead of passing individual parameters
marimo/_session/notebook/file_manager.py Updated to accept AppDefaults instead of individual config parameters
marimo/_server/api/endpoints/home.py Updated imports to reference DirectoryScanner.MAX_FILES
marimo/_server/api/endpoints/assets.py Updated to use PathValidator for security checks
marimo/_pyodide/bootstrap.py Updated to use AppDefaults when creating AppFileManager
tests/_server/test_path_validator.py New comprehensive tests for PathValidator functionality
tests/_server/test_directory_scanner.py New comprehensive tests for DirectoryScanner functionality
tests/_server/test_file_router.py Updated to use PathValidator instead of validate_inside_directory function
tests/_server/rtc/test_rtc_doc.py Minor optimization moving AsyncGenerator import into TYPE_CHECKING block

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

try:
validate_inside_directory(public_dir, file_path)
validator = PathValidator(public_dir)
validator.validate_inside_directory(public_dir, file_path)
Copy link

Copilot AI Dec 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PathValidator is instantiated with public_dir as the base directory on line 316, but then validate_inside_directory is called with public_dir and file_path again on line 317. This is redundant because you could use the instance's validate_file_access method instead, or simply call validate_inside_directory without passing public_dir to the constructor.

The current code creates a PathValidator with a base_directory but then doesn't leverage that instance state. Consider either:

  1. Using validator.validate_file_access(file_path) instead, which would internally use the base_directory, OR
  2. Creating the validator without a base directory: validator = PathValidator() and keeping the current validate_inside_directory call
Suggested change
validator.validate_inside_directory(public_dir, file_path)
validator.validate_file_access(file_path)
Copilot uses AI. Check for mistakes.
Comment on lines 20 to 23
from marimo._server.models.models import (
CopyNotebookRequest,
SaveNotebookRequest,
)
Copy link

Copilot AI Dec 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The imports CopyNotebookRequest and SaveNotebookRequest are imported twice: once at lines 20-23 as regular imports, and again at lines 35-39 inside the TYPE_CHECKING block. These are redundant - since they're already imported at the top, the TYPE_CHECKING imports should be removed.

Copilot uses AI. Check for mistakes.
Comment on lines +161 to +164
raise HTTPException(
status_code=HTTPStatus.REQUEST_TIMEOUT,
detail=f"Request timed out: Loading workspace files took too long. Showing first {file_count[0]} files.", # noqa: E501
)
Copy link

Copilot AI Dec 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The accumulated_results list is populated during directory scanning but is never returned when a timeout occurs. In the previous implementation (in file_router.py), partial results were stored in self._lazy_files before raising the timeout exception, allowing the caller to return partial results.

In the new DirectoryScanner, when a timeout occurs at line 161, the accumulated_results are lost. The caller in file_router.py (lines 268-275) expects to handle partial results on timeout, but self._lazy_files will still be None, so it falls back to an empty list instead of returning the files that were successfully scanned.

To fix this, DirectoryScanner should either:

  1. Include accumulated_results in the HTTPException's detail or as a custom attribute
  2. Return accumulated_results before raising the timeout exception
  3. Use a callback mechanism to provide partial results
Suggested change
raise HTTPException(
status_code=HTTPStatus.REQUEST_TIMEOUT,
detail=f"Request timed out: Loading workspace files took too long. Showing first {file_count[0]} files.", # noqa: E501
)
exc = HTTPException(
status_code=HTTPStatus.REQUEST_TIMEOUT,
detail=f"Request timed out: Loading workspace files took too long. Showing first {file_count[0]} files.", # noqa: E501
)
# Attach partial results so callers can access scanned files
setattr(exc, "partial_results", accumulated_results)
raise exc
Copilot uses AI. Check for mistakes.
contain `marimo-version:`.
- Python (`.py`) files are marimo apps if the header (first 512 bytes)
contains both `marimo.App` and `import marimo`.
- If the header contains `# /// script`, read the full file and check for
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the 512 trick is a DDOS prevention isn't this just an easy trick around that?
Could just bump this up a fair amount in this case.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see for perf. May as well only check an MB or something

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is only a refactor, not changing any logic



@dataclass
class AppDefaults:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Chance to couple with marimo/_ast/app_config.py? Agree we shouldn't move around a full AppConfig object though

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is only for the server though, i don't think "defaults" belong in _ast

contain `marimo-version:`.
- Python (`.py`) files are marimo apps if the header (first 512 bytes)
contains both `marimo.App` and `import marimo`.
- If the header contains `# /// script`, read the full file and check for
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see for perf. May as well only check an MB or something

@mscolnick mscolnick merged commit bfb742a into main Dec 30, 2025
39 of 77 checks passed
@mscolnick mscolnick deleted the ms/refactor-file-source branch December 30, 2025 15:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

3 participants