-
Notifications
You must be signed in to change notification settings - Fork 7.9k
Add the Uri\Rfc3986\Uri class to ext/uri without wither support #18836
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
43c1d9f
to
155e070
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cursory glance, I see there's some todos wrt memory management as well so maybe I'll wait a bit on that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some first remarks from just looking at the diff on GitHub.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something I'm only noticing now: I think this should rather be called ext/uri/parser_rfc3986.c
and php_lexbor.c
would be ext/uri/parser_whatwg.c
. The php_uriparser.c
name is confusing to me, because uriparser
is an extremely generic term.
To give another comparison with ext/random, since it is architecturally similar: Each engine has its own engine_enginename.c
file, e.g. engine_xoshiro256starstar.c
.
efree(uriparser_uris); | ||
} | ||
|
||
const uri_handler_t uriparser_uri_handler = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And similarly the public symbols in the file should also be prefixed with php_uri_
. Just uriparser_
can conflict with the uriparser library itself.
Here I would suggest php_uri_rfc3986_handler
.
But I'm also happy to leave this to a follow-up to not introduce too much churn.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this and the file renaming is something that I'm happy to do after most parts are merged.
This comment was marked as resolved.
This comment was marked as resolved.
@TimWolla Thanks for the patch, it's awesome! TBH I didn't even think about using the |
@kocsismate When stack allocating the struct the code should become even simpler (and possibly faster, due to fewer pointer indirection). |
0b3cfd8
to
062dc4d
Compare
9b63691
to
e0b40a7
Compare
Co-authored-by: Tim Düsterhus <tim@tideways-gmbh.com>
@@ -153,7 +153,10 @@ int URI_FUNC(CopyUriMm)(URI_TYPE(Uri) * destUri, | |||
*(destUri->hostData.ip6) = *(sourceUri->hostData.ip6); | |||
} | |||
|
|||
if (URI_FUNC(CopyRangeAsNeeded)(&destUri->hostData.ipFuture, &sourceUri->hostData.ipFuture, URI_FALSE, memory) == URI_FALSE) { | |||
if (sourceUri->hostData.ipFuture.first != NULL && sourceUri->hostText.first == sourceUri->hostData.ipFuture.first) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no further suggestions, but don't feel qualified to approve this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks mostly okay for an initial merge now. The additional clean-up can happen in a follow-up to move forward. The leak should be fixed, though.
PHP_METHOD(Uri_WhatWg_InvalidUrlException, __construct) | ||
{ | ||
zend_string *message = NULL; | ||
zval *errors = NULL; | ||
zend_long code = 0; | ||
zval *previous = NULL; | ||
|
||
ZEND_PARSE_PARAMETERS_START(0, 4) | ||
Z_PARAM_OPTIONAL | ||
Z_PARAM_STR(message) | ||
Z_PARAM_ARRAY(errors) | ||
Z_PARAM_LONG(code) | ||
Z_PARAM_OBJECT_OF_CLASS_OR_NULL(previous, zend_ce_throwable) | ||
ZEND_PARSE_PARAMETERS_END(); | ||
|
||
if (zend_update_exception_properties(INTERNAL_FUNCTION_PARAM_PASSTHRU, message, code, previous) == FAILURE) { | ||
RETURN_THROWS(); | ||
} | ||
|
||
if (errors == NULL) { | ||
zval tmp; | ||
ZVAL_EMPTY_ARRAY(&tmp); | ||
zend_update_property(uri_whatwg_invalid_url_exception_ce, Z_OBJ_P(ZEND_THIS), ZEND_STRL("errors"), &tmp); | ||
} else { | ||
zend_update_property(uri_whatwg_invalid_url_exception_ce, Z_OBJ_P(ZEND_THIS), ZEND_STRL("errors"), errors); | ||
} | ||
if (EG(exception)) { | ||
RETURN_THROWS(); | ||
} | ||
} | ||
|
||
PHP_METHOD(Uri_WhatWg_UrlValidationError, __construct) | ||
{ | ||
zend_string *context; | ||
zval *type; | ||
bool failure; | ||
|
||
ZEND_PARSE_PARAMETERS_START(3, 3) | ||
Z_PARAM_STR(context) | ||
Z_PARAM_OBJECT_OF_CLASS(type, uri_whatwg_url_validation_error_type_ce) | ||
Z_PARAM_BOOL(failure) | ||
ZEND_PARSE_PARAMETERS_END(); | ||
|
||
zend_update_property_str(uri_whatwg_url_validation_error_ce, Z_OBJ_P(ZEND_THIS), ZEND_STRL("context"), context); | ||
if (EG(exception)) { | ||
RETURN_THROWS(); | ||
} | ||
|
||
zend_update_property_ex(uri_whatwg_url_validation_error_ce, Z_OBJ_P(ZEND_THIS), ZSTR_KNOWN(ZEND_STR_TYPE), type); | ||
if (EG(exception)) { | ||
RETURN_THROWS(); | ||
} | ||
|
||
zval failure_zv; | ||
ZVAL_BOOL(&failure_zv, failure); | ||
zend_update_property(uri_whatwg_url_validation_error_ce, Z_OBJ_P(ZEND_THIS), ZEND_STRL("failure"), &failure_zv); | ||
if (EG(exception)) { | ||
RETURN_THROWS(); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't particularly like that you reordered these. It makes the diff hard to read.
PHP_METHOD(Uri_Rfc3986_Uri, equals) | ||
{ | ||
zend_object *that_object; | ||
zend_object *comparison_mode = NULL; | ||
|
||
ZEND_PARSE_PARAMETERS_START(1, 2) | ||
Z_PARAM_OBJ_OF_CLASS(that_object, uri_rfc3986_uri_ce) | ||
Z_PARAM_OPTIONAL | ||
Z_PARAM_OBJ_OF_CLASS(comparison_mode, uri_comparison_mode_ce) | ||
ZEND_PARSE_PARAMETERS_END(); | ||
|
||
uri_equals(INTERNAL_FUNCTION_PARAM_PASSTHRU, that_object, comparison_mode); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly to the other remark about file naming and organization, it might make sense to move the PHP_METHOD
implementations into the matching source file to keep php_uri.c
slim.
if (uriparser_uris->normalized_uri_initialized == false) { | ||
uriparser_copy_uri(&uriparser_uris->normalized_uri, &uriparser_uris->uri); | ||
uriparser_normalize_uri(&uriparser_uris->normalized_uri); | ||
uriparser_uris->normalized_uri_initialized = true; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (uriparser_uris->normalized_uri_initialized == false) { | |
uriparser_copy_uri(&uriparser_uris->normalized_uri, &uriparser_uris->uri); | |
uriparser_normalize_uri(&uriparser_uris->normalized_uri); | |
uriparser_uris->normalized_uri_initialized = true; | |
} | |
if (uriparser_uris->normalized_uri_initialized) { | |
return; | |
} | |
uriparser_copy_uri(&uriparser_uris->normalized_uri, &uriparser_uris->uri); | |
uriparser_normalize_uri(&uriparser_uris->normalized_uri); | |
uriparser_uris->normalized_uri_initialized = true; |
Perhaps?
ZEND_ASSERT(uriparser_uri != NULL); | ||
|
||
if (uriparser_uri->userInfo.first != NULL && uriparser_uri->userInfo.afterLast != NULL) { | ||
const char *c = memchr(uriparser_uri->userInfo.first, ':', get_text_range_length(&uriparser_uri->userInfo)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implementation is inconsistent with the implementation in uriparser_read_username
.
|
||
static uriparser_uris_t *uriparser_create_uris(void) | ||
{ | ||
uriparser_uris_t *uriparser_uris = emalloc(sizeof(*uriparser_uris)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uriparser_uris_t *uriparser_uris = emalloc(sizeof(*uriparser_uris)); | |
uriparser_uris_t *uriparser_uris = ecalloc(1, sizeof(*uriparser_uris)); |
This would cleanly zero-initialize everything. As far as I can see, the uriparser library guarantees that calling uriFreeUriMembersA()
on a zero-initialized URI is also safe, thus this improves the overall safety somewhat.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Asked if this guarantee will continue to hold: uriparser/uriparser#238
efree(uriparser_uris); | ||
if (!silent) { | ||
throw_invalid_uri_exception(); | ||
} | ||
|
||
return NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will leak uri
. Test case:
<?php
var_dump(
new Uri\Rfc3986\Uri('foo', new Uri\Rfc3986\Uri('bar'))
);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch!
* is discarded altogether. */ | ||
static void *uriparser_clone_uri(void *uri) | ||
{ | ||
uriparser_uris_t *uriparser_uris = (uriparser_uris_t *) uri; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uriparser_uris_t *uriparser_uris = (uriparser_uris_t *) uri; | |
uriparser_uris_t *uriparser_uris = uri; |
Same below.
uriFreeUriMembersA(&uriparser_uris->uri); | ||
|
||
if (uriparser_uris->normalized_uri_initialized) { | ||
ZEND_ASSERT(uriparser_uris->normalized_uri.owner); | ||
|
||
uriFreeUriMembersA(&uriparser_uris->normalized_uri); | ||
} | ||
|
||
efree(uriparser_uris); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When switching to calloc()
, you can do:
uriFreeUriMembersA(&uriparser_uris->uri); | |
if (uriparser_uris->normalized_uri_initialized) { | |
ZEND_ASSERT(uriparser_uris->normalized_uri.owner); | |
uriFreeUriMembersA(&uriparser_uris->normalized_uri); | |
} | |
efree(uriparser_uris); | |
uriFreeUriMembersA(&uriparser_uris->uri); | |
uriFreeUriMembersA(&uriparser_uris->normalized_uri); | |
efree(uriparser_uris); |
Upstream confirmed that this is safe.
ZEND_ASSERT(result == URI_SUCCESS); | ||
} | ||
|
||
static void normalize_uri_if_needed(uriparser_uris_t *uriparser_uris) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be a little nicer to make this:
static UriUriA *get_normalized_uri(uriparser_uris_t *uriparser_uris) {
if (!uriparser_uris->normalized_uri_initialized) {
// normalize
}
return &uriparser_uris->normalized_uri;
}
Then you can just is this as a getter for the normalized URI and it will to the right thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides the things that Tim said, here some small remarks
ZEND_ASSERT(uriparser_uri != NULL); | ||
|
||
if (uriparser_uri->hostText.first != NULL && uriparser_uri->hostText.afterLast != NULL && get_text_range_length(&uriparser_uri->hostText) > 0) { | ||
if (uriparser_uri->hostData.ip6 != NULL || uriparser_uri->hostData.ipFuture.first != NULL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps this needs a comment to explain that the hostText overlaps with ip6 / ipFuture
return SUCCESS; | ||
} | ||
|
||
static int str_to_int(const char *str, int len) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess len
should actually be a size_t to be strictly correct, and i
too.
} | ||
|
||
ZVAL_NEW_STR(retval, smart_str_extract(&str)); | ||
} else if (uriparser_uri->absolutePath) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be:
} else if (uriparser_uri->absolutePath) { | |
} else if (uriparser_uri->absolutePath || uriHasHostA(uriparser_uri)) { |
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, this would make a few tests fail:
$uri = new Uri\Rfc3986\Uri("https://example.com");
var_dump($uri->getPath());
Currently, this outputs string(0) ""
, but it would then output string(1) "/"
. Handling of the leading slash is a bit convoluted with uriparser. I filed a ticket with my question (uriparser/uriparser#215) and the uriparser/uriparser#8 (comment) table should clear all the questions.
charsRequired++; | ||
|
||
zend_string *uri_string = zend_string_alloc(charsRequired - 1, false); | ||
result = uriToStringA(ZSTR_VAL(uri_string), uriparser_uri, charsRequired, NULL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the +1/-1 for the NUL byte?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! This is highlighted in the uriparser documentation: https://uriparser.github.io/doc/api/latest/index.html#recomposition
There are a few WIP parts but the PR can already be reviewed.
Relates to #14461 and https://wiki.php.net/rfc/url_parsing_api