-
Notifications
You must be signed in to change notification settings - Fork 7.9k
Updated the stream_select() function so that it supports bigger file … #14452
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
…descriptor numbers without recompiling PHP on non-Windows OS, by passing the select() system call dynamically generated fd_sets with a custom fd_bigset struct type
…ow if those functions could be referenced by other extensions.
The testing complained that I had left behind two functions which I didn't dare to remove in case somebody else was using them. I have updated the pr. |
…rent architecture than what I developed this on.
I'm not sure why the tests timed out or failed; it does not seem to be related to my patch. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -756,115 +782,132 @@ static int stream_array_emulate_read_fd_set(zval *stream_array) | |||
/* }}} */ | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be tested from useland when more than 1024 file descriptors are created/used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also when there is less than 1024, but at least one has a number > 1024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mvorisek I used this code to test:
<?php
// Create a large number of file descriptors
$files = [];
$r = $w = $e = [];
for ($i = 0; $i < 20000; $i++) {
$fileName = tempnam(sys_get_temp_dir(), 'testfile');
$fp = fopen($fileName, 'r');
if ($fp === false) {
die("Failed to open file: $fileName\n");
}
$files[] = ['name' => $fileName, 'handle' => $fp];
$r[] = $fp;
}
// Check the number of file descriptors
$res = stream_select($r, $w, $e, 0, 0);
var_dump($res, $r);
// Clean up
foreach ($files as $file) {
fclose($file['handle']);
unlink($file['name']);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A function doing dup2()
can probably be added to zend_test
, allowing to zend_test_dup2(1, 9999)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TimWolla Is zend_test a mechanism to add special functions to the PHP runtime during testing? I think it would be nice to have zend_test_get_fd($resource), but perhaps that already exists?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is zend_test a mechanism to add special functions to the PHP runtime during testing?
Yes, exactly. It is meant to provide helpers to test stuff that would otherwise be untestable. You can find it in https://github.com/php/php-src/tree/master/ext/zend_test and can enable it at build time using --enable-zend-test
.
…est file descriptor id and allocates a correct size.
I will look over the PR thoroughly later today or tomorrow morning. Can anybody tell me how to build this in a stricter mode? Several commits could have been avoided by me if only my environment would let me know about unused functions etc, instead of waiting for the github CI process to check for me. |
Speaking of CI, you can get inspiration from it if you like. |
The only feedback I would give is to please update your editor to respect the It looks like you've converted a lot of tab characters to spaces, so the diff is much bigger than it should be. |
I am having trouble because stream wrappers emit warnings without respecting the silent cast argument of php_stream_cast... So a phpt test does not generate the exact same output anymore after my patch. |
The cause of the extra warnings in |
Now it seems all tests succeeded, except Windows 64 bit. Not sure how to proceed. I think the php_select macro refers to a win32 implementation which is meant to be compatible with select() in Linux and therefore should work. |
|
||
if (!PHP_SAFE_MAX_FD(max_fd, max_set_count)) { | ||
if (max_set_count == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we missing FD_BIGSET_FREE
before returning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure where you mean; I think FD_BIGSET_FREE is everywhere it is needed now, but there was a few places they were missing before the last push.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've already called FD_BIGSET_ZERO
by this point, so the memory is allocated. On this line, we are about to return without freeing that memory; thus I think we need to free it.
That being said, I'm not quite sure how we would get here.
… the PR from github.
in any case it would fall back to the current behavior
Sendt fra Outlook for iOS<https://aka.ms/o0ukef>
________________________________
Fra: Rob Landers ***@***.***>
Sendt: Friday, June 7, 2024 4:36:36 PM
Til: php/php-src ***@***.***>
Kopi: Frode Børli ***@***.***>; Author ***@***.***>
Emne: Re: [php/php-src] Updated the stream_select() function so that it supports bigger file … (PR #14452)
@withinboredom commented on this pull request.
________________________________
In ext/standard/streamsfuncs.c<#14452 (comment)>:
+ FD_BIGSET_ZERO(&rfds, max_fds);
+ FD_BIGSET_ZERO(&wfds, max_fds);
+ FD_BIGSET_ZERO(&efds, max_fds);
See man sysconf<https://www.man7.org/linux/man-pages/man3/sysconf.3.html>:
The return value of sysconf() is one of the following:
On error, -1 is returned and errno<https://www.man7.org/linux/man-pages/man3/errno.3.html> is set to indicate the
error (for example, EINVAL, indicating that name is invalid).
If name corresponds to a maximum or minimum limit, and that
limit is indeterminate, -1 is returned and errno<https://www.man7.org/linux/man-pages/man3/errno.3.html> is not
changed. (To distinguish an indeterminate limit from an
error, set errno<https://www.man7.org/linux/man-pages/man3/errno.3.html> to zero before the call, and then check
whether errno<https://www.man7.org/linux/man-pages/man3/errno.3.html> is nonzero when -1 is returned.)
If name corresponds to an option, a positive value is returned
if the option is supported, and -1 is returned if the option
is not supported.
Otherwise, the current value of the option or limit is
returned. This value will not be more restrictive than the
corresponding value that was described to the application in
<unistd.h> or <limits.h> when the application was compiled.
So, theoretically, if uname -n unlimited were possible, you could also receive a -1 too. AFAIK, it isn't possible to set the open file limit to unlimited though. (I could be wrong, but I'm 95% sure).
—
Reply to this email directly, view it on GitHub<#14452 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AARRLUDB2NEIVS4KIIK75JTZGHAPJAVCNFSM6AAAAABIWFGFUGVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDCMBUG42TENJUGQ>.
You are receiving this because you authored the thread.Message ID: ***@***.***>
|
…se an extremely high ulimit -n is configured.
The failing tests seem to relate to jit, not stream_select? |
I think it will be very rare with file descriptors above 131k, at least for the next couple of years. Perhaps the best solution is to reallocate on the fly if we see fds above 131k and use a single pass. 131k is extremely high for a single process so I think it will be rare to encounter it.
|
size_t largest = rfds.size; | ||
if (largest < wfds.size) largest = wfds.size; | ||
if (largest < efds.size) largest = efds.size; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can use max_fd
to directly compute the largest size?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like:
while (max_fd > rfds.size) fd_bigset_double_size(&rfds);
I was just not confident about possibly off by one error here...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use the same condition as in FD_BIGSET_ENSURE_CAPACITY(), or pre-compute the largest size with:
size_t num_fds = max_fd + 1;
size_t largest = (num_fds + 7) / 8;
@arnaud-lb Thank you for your detailed review. I'm merging the latest php-src and pushing an updated PR with all your suggestions. I was looking into Windows which apparently only has FD_SETSIZE 64 by default, and it uses an entirely different fd_set structure. Not sure how this is handled inside PHP and if my approach breaks it? I've not got a Windows C development environment, but I can look into it if you can give some hints perhaps? |
Oh right, it looks like this will break on Windows. We use WinSock select() according to this comment, so fd_set is as defined here: https://learn.microsoft.com/en-us/windows/win32/api/winsock/ns-winsock-fd_set on Windows. It represents an array of handles rather than a bitset, and the limit is the number of watched fds rather than the max fd value. We can use the same strategy as for other platforms, but use the fd count to decide whether to grow the set. We have to define the FD_BIGSET_ macros differently on windows:
However there is a (non-enforced) hard limit of php_select() copies It would make sense to do part of this work later in a separate PR, as long as this PR doesn't break windows. E.g. don't support more than FD_SETSIZE fds for now on Windows.
I sometimes use a Windows VM from https://developer.microsoft.com/en-us/windows/downloads/virtual-machines/ (The VirtualBox ones work well. If you use virt-manager you may be able to convert them to KVM but that's not trivial.). This page describes how to build on windows: https://wiki.php.net/internals/windows/stepbystepbuild_sdk_2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will take a proper look later but could you please in the meantime fix the coding style.
typedef struct { | ||
char *fds_bits; | ||
size_t size; | ||
} fd_bigset; | ||
#define FD_BIGSET_ENSURE_CAPACITY(fd, set) \ | ||
if (UNEXPECTED(((fd) / 8 >= (set)->size))) { \ | ||
fd_bigset_double_size(set); \ | ||
} | ||
#define FD_BIGSET_ZERO(set, num_fds) do { \ | ||
(set)->size = (num_fds + 7) / 8; \ | ||
(set)->fds_bits = (char *)ecalloc((set)->size, sizeof(char)); \ | ||
} while (0) | ||
#define FD_BIGSET_SET(fd, set) do { \ | ||
FD_BIGSET_ENSURE_CAPACITY(fd, set) \ | ||
(set)->fds_bits[(fd) / 8] |= (1 << ((fd) % 8)); \ | ||
} while (0) | ||
#define FD_BIGSET_ISSET(fd, set) (((fd) / 8 < (set)->size) ? ((set)->fds_bits[(fd) / 8] & (1 << ((fd) % 8))) : 0) | ||
#define FD_BIGSET_CLR(fd, set) do { \ | ||
if ((fd) / 8 < (set)->size) { \ | ||
(set)->fds_bits[(fd) / 8] &= ~(1 << ((fd) % 8)); \ | ||
} \ | ||
} while (0) | ||
#define FD_BIGSET_FREE(set) do { \ | ||
if ((set)->fds_bits) { \ | ||
efree((set)->fds_bits); \ | ||
(set)->fds_bits = NULL; \ | ||
} \ | ||
(set)->size = 0; \ | ||
} while (0) | ||
|
||
/* {{{ Function to double the size of an fd_bigset */ | ||
static void fd_bigset_double_size(fd_bigset *set) { | ||
ZEND_ASSERT(set && set->fds_bits); | ||
|
||
size_t old_size = set->size; | ||
size_t new_size = old_size * 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please fix coding style - we use tabs, not spaces. This is making the diff bigger and harder to review.
btw I have a real Windows env so I can test it then too. |
Updated the stream_select() function so that it supports bigger file descriptor numbers without recompiling PHP on non-Windows OS, by passing the select() system call dynamically generated fd_sets with a custom fd_bigset struct type.
The select() system call supports unlimited numbers of file descriptors technically, and so should all BSD derived unix operating systems. The perceived limitation of the select() system call to support only 1024 file descriptors is because of the fd_set structure being prebuilt to FD_SETSIZE length, but the kernel supports unlimited.
The new version of this function allocates memory on the heap to build an fd_bigset which is passed to the select() system call, and then frees the structures.
I decided to leave the functions stream_array_to_fd_set and stream_array_from_fd_set functions in streamfuncs.c in case somebody is using that.
I did not change the php_select macro, only the stream_select function.