Skip to content

Updated the stream_select() function so that it supports bigger file … #14452

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 18 commits into
base: master
Choose a base branch
from

Conversation

frodeborli
Copy link

Updated the stream_select() function so that it supports bigger file descriptor numbers without recompiling PHP on non-Windows OS, by passing the select() system call dynamically generated fd_sets with a custom fd_bigset struct type.

The select() system call supports unlimited numbers of file descriptors technically, and so should all BSD derived unix operating systems. The perceived limitation of the select() system call to support only 1024 file descriptors is because of the fd_set structure being prebuilt to FD_SETSIZE length, but the kernel supports unlimited.

The new version of this function allocates memory on the heap to build an fd_bigset which is passed to the select() system call, and then frees the structures.

I decided to leave the functions stream_array_to_fd_set and stream_array_from_fd_set functions in streamfuncs.c in case somebody is using that.

I did not change the php_select macro, only the stream_select function.

…descriptor numbers without recompiling PHP on non-Windows OS, by passing the select() system call dynamically generated fd_sets with a custom fd_bigset struct type
…ow if those functions could be referenced by other extensions.
@frodeborli
Copy link
Author

The testing complained that I had left behind two functions which I didn't dare to remove in case somebody else was using them. I have updated the pr.

@frodeborli
Copy link
Author

I'm not sure why the tests timed out or failed; it does not seem to be related to my patch.

Copy link
Member

@arnaud-lb arnaud-lb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Big +1 on the general idea. This fixes a long standing issue with stream_select() not supporting file descriptors above 1024.

@bukka @devnexen wdyt?

Maybe we can remove the --enable-fd-setsize configure flag as well. I believe it's a no-op in glibc for some time now.

@@ -756,115 +782,132 @@ static int stream_array_emulate_read_fd_set(zval *stream_array)
/* }}} */

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be tested from useland when more than 1024 file descriptors are created/used?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also when there is less than 1024, but at least one has a number > 1024

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mvorisek I used this code to test:

<?php
// Create a large number of file descriptors
$files = [];
$r = $w = $e = [];

for ($i = 0; $i < 20000; $i++) {
    $fileName = tempnam(sys_get_temp_dir(), 'testfile');
    $fp = fopen($fileName, 'r');
    if ($fp === false) {
        die("Failed to open file: $fileName\n");
    }
    $files[] = ['name' => $fileName, 'handle' => $fp];
    $r[] = $fp;
}

// Check the number of file descriptors
$res = stream_select($r, $w, $e, 0, 0);
var_dump($res, $r);
// Clean up
foreach ($files as $file) {
    fclose($file['handle']);
    unlink($file['name']);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A function doing dup2() can probably be added to zend_test, allowing to zend_test_dup2(1, 9999).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TimWolla Is zend_test a mechanism to add special functions to the PHP runtime during testing? I think it would be nice to have zend_test_get_fd($resource), but perhaps that already exists?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@frodeborli

Is zend_test a mechanism to add special functions to the PHP runtime during testing?

Yes, exactly. It is meant to provide helpers to test stuff that would otherwise be untestable. You can find it in https://github.com/php/php-src/tree/master/ext/zend_test and can enable it at build time using --enable-zend-test.

@frodeborli
Copy link
Author

I will look over the PR thoroughly later today or tomorrow morning.

Can anybody tell me how to build this in a stricter mode? Several commits could have been avoided by me if only my environment would let me know about unused functions etc, instead of waiting for the github CI process to check for me.

@devnexen
Copy link
Member

devnexen commented Jun 3, 2024

I will look over the PR thoroughly later today or tomorrow morning.

Can anybody tell me how to build this in a stricter mode? Several commits could have been avoided by me if only my environment would let me know about unused functions etc, instead of waiting for the github CI process to check for me.

Speaking of CI, you can get inspiration from it if you like.

@ramsey
Copy link
Member

ramsey commented Jun 4, 2024

The only feedback I would give is to please update your editor to respect the .editorconfig file provided with PHP. https://editorconfig.org

It looks like you've converted a lot of tab characters to spaces, so the diff is much bigger than it should be.

@frodeborli
Copy link
Author

frodeborli commented Jun 4, 2024

I am having trouble because stream wrappers emit warnings without respecting the silent cast argument of php_stream_cast...

So a phpt test does not generate the exact same output anymore after my patch.

@arnaud-lb
Copy link
Member

arnaud-lb commented Jun 4, 2024

The cause of the extra warnings in ext/standard/tests/file/userstreams_002.phpt appear to be that max_fd is never equal to -1 here.

@frodeborli
Copy link
Author

Now it seems all tests succeeded, except Windows 64 bit. Not sure how to proceed. I think the php_select macro refers to a win32 implementation which is meant to be compatible with select() in Linux and therefore should work.


if (!PHP_SAFE_MAX_FD(max_fd, max_set_count)) {
if (max_set_count == 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we missing FD_BIGSET_FREE before returning?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure where you mean; I think FD_BIGSET_FREE is everywhere it is needed now, but there was a few places they were missing before the last push.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've already called FD_BIGSET_ZERO by this point, so the memory is allocated. On this line, we are about to return without freeing that memory; thus I think we need to free it.

That being said, I'm not quite sure how we would get here.

@frodeborli
Copy link
Author

frodeborli commented Jun 7, 2024 via email

@frodeborli
Copy link
Author

The failing tests seem to relate to jit, not stream_select?

@frodeborli
Copy link
Author

frodeborli commented Jun 27, 2024 via email

Comment on lines +940 to +942
size_t largest = rfds.size;
if (largest < wfds.size) largest = wfds.size;
if (largest < efds.size) largest = efds.size;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can use max_fd to directly compute the largest size?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like:

while (max_fd > rfds.size) fd_bigset_double_size(&rfds);

I was just not confident about possibly off by one error here...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use the same condition as in FD_BIGSET_ENSURE_CAPACITY(), or pre-compute the largest size with:

size_t num_fds = max_fd + 1;
size_t largest = (num_fds + 7) / 8;
@frodeborli
Copy link
Author

@arnaud-lb Thank you for your detailed review. I'm merging the latest php-src and pushing an updated PR with all your suggestions.

I was looking into Windows which apparently only has FD_SETSIZE 64 by default, and it uses an entirely different fd_set structure. Not sure how this is handled inside PHP and if my approach breaks it? I've not got a Windows C development environment, but I can look into it if you can give some hints perhaps?

@arnaud-lb
Copy link
Member

Oh right, it looks like this will break on Windows. We use WinSock select() according to this comment, so fd_set is as defined here: https://learn.microsoft.com/en-us/windows/win32/api/winsock/ns-winsock-fd_set on Windows. It represents an array of handles rather than a bitset, and the limit is the number of watched fds rather than the max fd value.

We can use the same strategy as for other platforms, but use the fd count to decide whether to grow the set. We have to define the FD_BIGSET_ macros differently on windows:

#ifdef PHP_WIN32

typedef struct {
    uint32_t capacity;
    fd_set set;
} fd_bigset;

#define FD_BIGSET_ENSURE_CAPACITY(fd, set) \
    if (UNEXPECTED(fd.set.count == fd.capacity)) { \
        fd_bigset_double_size(set); \
    }

// other FD_BIGSET macros for windows

#else 

// FD_BIGSET macros for non-windows

#endif

However there is a (non-enforced) hard limit of MAXIMUM_WAIT_OBJECTS (64) non-socket fds in win32/select.c. I would suggest to return a failure from php_select() when there are more than MAXIMUM_WAIT_OBJECTS non-socket fds.

php_select() copies fd_set structs by assignment (fd_set sock_read, aread; ...; aread = sock_read;). This will not work with custom sized sets, so we need to replace these with memcpy()s.

It would make sense to do part of this work later in a separate PR, as long as this PR doesn't break windows. E.g. don't support more than FD_SETSIZE fds for now on Windows.

I've not got a Windows C development environment, but I can look into it if you can give some hints perhaps?

I sometimes use a Windows VM from https://developer.microsoft.com/en-us/windows/downloads/virtual-machines/ (The VirtualBox ones work well. If you use virt-manager you may be able to convert them to KVM but that's not trivial.). This page describes how to build on windows: https://wiki.php.net/internals/windows/stepbystepbuild_sdk_2

Copy link
Member

@bukka bukka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will take a proper look later but could you please in the meantime fix the coding style.

Comment on lines +620 to +655
typedef struct {
char *fds_bits;
size_t size;
} fd_bigset;
#define FD_BIGSET_ENSURE_CAPACITY(fd, set) \
if (UNEXPECTED(((fd) / 8 >= (set)->size))) { \
fd_bigset_double_size(set); \
}
#define FD_BIGSET_ZERO(set, num_fds) do { \
(set)->size = (num_fds + 7) / 8; \
(set)->fds_bits = (char *)ecalloc((set)->size, sizeof(char)); \
} while (0)
#define FD_BIGSET_SET(fd, set) do { \
FD_BIGSET_ENSURE_CAPACITY(fd, set) \
(set)->fds_bits[(fd) / 8] |= (1 << ((fd) % 8)); \
} while (0)
#define FD_BIGSET_ISSET(fd, set) (((fd) / 8 < (set)->size) ? ((set)->fds_bits[(fd) / 8] & (1 << ((fd) % 8))) : 0)
#define FD_BIGSET_CLR(fd, set) do { \
if ((fd) / 8 < (set)->size) { \
(set)->fds_bits[(fd) / 8] &= ~(1 << ((fd) % 8)); \
} \
} while (0)
#define FD_BIGSET_FREE(set) do { \
if ((set)->fds_bits) { \
efree((set)->fds_bits); \
(set)->fds_bits = NULL; \
} \
(set)->size = 0; \
} while (0)

/* {{{ Function to double the size of an fd_bigset */
static void fd_bigset_double_size(fd_bigset *set) {
ZEND_ASSERT(set && set->fds_bits);

size_t old_size = set->size;
size_t new_size = old_size * 2;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please fix coding style - we use tabs, not spaces. This is making the diff bigger and harder to review.

@bukka
Copy link
Member

bukka commented Jul 5, 2024

btw I have a real Windows env so I can test it then too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment