Skip to content

gsoc2023: net: http: culminating pull request#59669

Closed
Emna-Rekik wants to merge 41 commits intozephyrproject-rtos:mainfrom
Emna-Rekik:gsoc-2023-http
Closed

gsoc2023: net: http: culminating pull request#59669
Emna-Rekik wants to merge 41 commits intozephyrproject-rtos:mainfrom
Emna-Rekik:gsoc-2023-http

Conversation

@Emna-Rekik
Copy link

@Emna-Rekik Emna-Rekik commented Jun 23, 2023

This PR includes all commits for the gsoc2023 HTTP project.

Please see each commit message for details.

This commit adds an empty test suite for http server.
No test cases have been added in this commit.
However, creating the empty test suite provides a structure
for adding tests in subsequent commits.

Signed-off-by: Emna Rekik <emna.rekik007@gmail.com>
@cfriedt cfriedt added area: Networking GSoC Google Summer of Code labels Jun 23, 2023
@cfriedt cfriedt changed the title tests: net: http: create empty testsuites for gsoc 2023 project Jun 23, 2023
@Emna-Rekik Emna-Rekik force-pushed the gsoc-2023-http branch 2 times, most recently from 8feb4e8 to 287d7b1 Compare June 26, 2023 21:43
Copy link
Contributor

@rlubos rlubos Jun 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is more of a sample than a library at the moment.

Having a sample is fine - and we should have one, but subsys/ is not where it belongs. And the sample should make actual use of the library we're working on.

Generally, we really need to think on how should the HTTP2 server library look like. For sure we'll need a public API (as I wrote on Discord, the public header should be defined in include/zephyr/net/http). You need to think on what public functions should be exposed to the application. The most basic ones that come up to me would be:

  • http2_server_init()
  • http2_server_start()
  • http2_server_stop()

For the resource (webpage) representation in the server, you should have a look at the API added some time ago by @cfriedt (which could likely provide more details about it): https://github.com/zephyrproject-rtos/zephyr/blob/main/include/zephyr/net/http/service.h#L19

Finally, the sample app should make use of the above to create a HTTP2 server. And we should cover those functionalities in tests, just as listed in the GSoC umbrella issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, to expand on this a bit, the http service description was specifically designed to be more-or-less independent of any actual implementation details.

What that allows us to do, for example, is to easily (for various definitions of easy) compare two or more different http server implementations in various sample applications and testsuites. There may be an implementation-specific / conditionally compiled source file here or there, but otherwise it should work.

Ultimately, this will allow us to quantitatively compare HTTP server implementations in Zephyr and provide graphs, evidence, etc to show that e.g. Emna's server implementation outperforms CivetWeb by a factor of 2 or something along those lines, or this implementation or that implementation is able to handle N concurrent clients while using M bytes of memory, and so on.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll see if I can pull the CivetWeb implementation in for reference purposes, in a way that we can easily choose this or that based on Kconfig.

Copy link
Contributor

@rlubos rlubos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Emna-Rekik Please familiarize yourself with https://docs.zephyrproject.org/latest/contribute/guidelines.html#coding-style, we should follow this standard for Zephyr code. We should use tabs for indentation (configured as 8 spaces/tab), use snake_case for function/variable names etc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd be easier for apps to work with binary IP representation (we have struct net_addr that covers family and binary IP address).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, but I'd use uint8_t * here.

Comment on lines 8 to 14
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cfriedt I think we should settle whether we want to stick to POSIX socket APIs, or use zsock_* prefixed ones. The former approach proved to be problematic in the past (as we still have two ways to enable socket POSIX APIs). Not sure if we need the dependency to POSIX either in a networking component.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this conceptually belong to the HTTP1 parsing part - update header is part of HTTP1 request. IMO we only need to reply for that and take no further action (i. e. expect HTTP2 request in the next step).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we discussed last week - we may lose stream data here (overwrite it). If the initial recv() call received more data than only preface, we lose it here.

Also, think we should rethink the design here. recv() should only be caleld after the poll() call reports there is new data to read - otherwise we may block inside handle_http2_request(), effectively preventing the server from handling other requests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wasnt' this already done in http2_server_start()?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, this was for an HTTP/1 client after it was upgraded to HTTP/2

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should really avoid declaring large buffers on a stack. I think we should have one buffer statically allocated per client, and use only it for communication.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And actually it should be even needed to copy the buffers, just offset the orignal buffer when providing the pointer to the function (so you don't provide buffer directly, but buffer + some_offset).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still should be prepared for this case. I. e. the actual paylaod size may exceed the buffer we're processing, so that in case of the next recv() call we should treat the remainig data as a payload for the previous frame, and not try to parse the new frame header.

@Emna-Rekik Emna-Rekik force-pushed the gsoc-2023-http branch 3 times, most recently from a2a3452 to 4722b52 Compare July 10, 2023 14:14
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing is - you cannot feed the data to the HTTP1 parser unless you're certain it's a HTTP1 request. Otherwise, you may get unexpected results. So the order of action should be:

  • Read data on a socket until you have strlen(preface) bytes
  • If preface matches HTTP2 request - proceed as with HTTP2 request.
  • Otherwise, proceed with HTTP1 request. Only now you can feed that data to the HTTP1 parser, and search for the upgrade header. If there's a header, reply and await next request (HTTP2 this time).
@Emna-Rekik Emna-Rekik force-pushed the gsoc-2023-http branch 4 times, most recently from 26dcbd5 to 2101835 Compare July 13, 2023 11:51
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose for this variable? Looks a bit hacky considering how it's used.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

server_fd and event_fd sockets need to be handled separately. Currently, you would call `close_client_connection() on the listening socket, if it reported an error.

Comment on lines 581 to 603
Copy link
Contributor

@rlubos rlubos Jul 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment will be a little longer, but I'll try to give a general overview of how a stateful parsing could look like.

First of all, we want to get rid of all poll()/recv() calls outside of the main server loop (so the one here). The reason for this is that we don't want to block the server from handling other requests, in case recv() would block inside handle_http2_request(). But even with non-blocking calls it doesn't work great in the current design, as in case there are no new data, you'd stop the parsing and start over from scratch (i. e. preface handling) w/o serving the request.

Receving new data

What I think would be needed in the first place to achieve this goal is to redesign how we recv new data, I suggest to:

  • Introduce a separate input buffer for each client (so make it a part of the client context).
  • Introduce a new offset variable to keep track of how many bytes are currently in the buffer. This would also affect how you recv() new data:
recv(ctx->client_fds[i].fd, buffer + offset, sizeof(buffer) - offset, 0);

As you don't want to overwrite the data that is already in the buffer.

Parsing the buffer

Now, recv() will be providing new data to the buffer, so we need handle_http_request to consume it. That's where the stateful handling kicks in. If you look at the existing HTTP1 parser implementation you'll see there are lots of states, the parser also works in a way that it counsumes the new data byte by byte:
https://github.com/zephyrproject-rtos/zephyr/blob/main/subsys/net/lib/http/http_parser.c#L784
I don't think we need it to get that complex with HTTP2 since it's binary, so easier to parse.

First of all, we need to figure out what states the server can be in. The very first thing we expect from a newly connected client is a preface - therfore, it makes sense to introduce a state that represents this stage (HTTP_PREFACE). Once we've received the preface, we can determine whether we deal with HTTP1 or HTTP2 request.

For HTTP1 I believe we don't need much state representation, as most of this is handled by the existing parser. HTTP1_REQUEST should do, but if you see a need for more states for HTTP1 parsing, don't hesitate to add more.

For HTTP2 it gets a bit more complex, as we need to implement the parser ourselves. So we need to figure out, what is the next data we expect to get, when we proceed with HTTP2 request. For me, it seems that we should expect frame header. So we can introduce HTTP2_FRAME_HEADER state. Once we're done with the frame header parsing, we know what type of frame we deal with. So for a header frame, we could have HTTP2_HEADER_FRAME in which we parse header fields one by one. Similarly, in HTTP2_SETTINGS_FRAME we would parse ("consume") settings, one by one (note that they all have predefined format/size). The list goes on and on, I hope that you see where it's going. If certain frame type requires more complex parsing - you can define substates as well.

Also note, that with each frame header, you know exactly how much bytes in that frame to expect. So when you're done with parsing of the current frame ("consumed" all of the frame data), you can get back to HTTP2_FRAME_HEADER state.

Some pseudocode to consider

We should consider now how could this look like in practice (note that it's just a pseudo-C code, I did not compile it, it's just for an overview).

The main input function could look like:

int handle_http_request(struct http2_client_ctx *ctx)
{
	int ret = -EINVAL;

	do {
		switch (ctx->state) {
		case HTTP_PREFACE:
			ret = handle_http_preface(ctx);
			break;
		case HTTP1_REQUEST:
			ret = handle_http1_request(ctx);
			break;
		case HTTP2_FRAME_HEADER:
			ret = handle_http2_frame_header(ctx);
			break;
		case xxx:
			etc..
		}
	} while (ret == 0 && ctx->offset > 0);

	return ret;
}

We need to establish some protocol for the return values. 0 could indicate successful parsing and that the function should proceed with the next state (there might still be some data in the buffer to process). -EAGAIN could indicate that we expect more data to come before we can proceed with parsing. Any other negative value - parsing error, we should terminate the connection.

Now for example how could handle_preface() look like:

int handle_http_preface(struct http2_client_ctx *ctx)
{
	if (ctx->offset < strlen(preface)) {
		/* We don't have full preface yet, get more data. */
		return -EAGAIN;
	}

	if (strncmp(ctx->buffer, preface, strlen(preface)) != 0) {
		ctx->state = HTTP1_REQUEST;
	} else {
		ctx->state = HTTP2_FRAME_HEADER;
	}

	/* Now we need to "consume" the already processed data from the buffer
	 * For simplicity, I suggest to use memmove for starters, we can
	 * optimize to some other solution if needed
	 */

	/* ctx->offset will indicate remaining data in the buffer */
	ctx->offset -= strlen(preface);
	memmove(ctx->buffer, ctx->buffer + strlen(preface), ctx->offset);

	return 0;
}
@cfriedt
Copy link
Member

cfriedt commented Jul 21, 2023

Say you had a uC with a single core, not even multi-threading capable. Say that the firmware on this uC had to do lots of things like read GPIO, read / write from flash memory, serve / respond to HTTP traffic, etc

int http_client_input_work(client_context *ctx)
{  
  // how many bytes to read? 
  // depends on http parser state
  // depends also on how much data is available on your transport
  // in Linux: 
  int bytes_available_to_read;
  int ret = ioctl(ctx->fd, FIONREAD, &bytes_available_to_read);
  // in Zephyr?
  
}

int http_client_output_work(client_context *ctx)
{  
  // how many bytes to read? 
  // depends on http parser state
  
  // depends also on how much data is available on your transport
  // in Linux: 
  int bytes_available_to_write;
  int ret = ioctl(ctx->fd, FIONWRITE, &bytes_available_to_read);
  // in Zephyr?
  
  // how many bytes we can write on ctx->fd WITHOUT blocking at all
  int bytes_to_write = MIN(bytes_available_to_write, ctx->output_buffer_size);
  
  // linear buffer (array)
  int bytes_written = write(ctx->fd, ctx->output_buffer, bytes_to_write);
  // check for errors...
  __ASSERT_NO_MSG(bytes_written == bytes_to_write);
  // downside of linear buffer is that we either keep track of offset or use memmove to move data to the front of the array
  memmove( ... );
  
  // struct ring_buffer approach
  ret = write(ctx->fd, ctx->output_ring, /* MIN(bytes_to_write, ring_buffer_bytes_to_end_of_ring) */ );
  // check for errors..
  bytes_written += ret;
  bytes_to_write -= ret;
  // drain N bytes from ring buffer (essentially update internal counters)
  
  if (bytes_to_write > 0) {
    ret = write(ctx->fd, ctx->output_ring[offs], /* MIN(bytes_to_write, ring_buffer_bytes_to_end_of_ring) */ );
    // check for errors
    // drain bytes_to_write bytes from ring buffer
  }
}

int do_http_work(void)
{
  int server_fd;
  int cancel_fd;
  struct pollfd pfd[1 /* server_fd */ + 1 /* cancellation_fd */ + max_num_clients ];
  
  // condition pfd array (POLLIN, POLLOUT)
  
  int ret = poll(pfd, ARRAY_SIZE(pfd), 0);
  switch(ret) {
  case 0:
    return;
  case -1:
    // report error to main as -EINTR
    return -errno;
  default:
    break;
  }
  
  // ret contains the number of file descriptors in pfd that have events
  for(int i = 0; ret > 0; --ret, ++i) {
    struct pollfd *p = &pfd[i];

    if (p->fd == cancel_fd) {
      if (!(ctx->client_fds[i].revents & POLLIN)) {
        http_server_shutdown();
        return;
      }
    }
    
    if (p->fd in client_fds) {
      if ((ctx->client_fds[i].revents & POLLERR) == POLLERR) {
        // handle error
        close_client_connection();
      if ((ctx->client_fds[i].revents & POLLHUP) == POLLHUP) {
        // handle socket close
        close_client_connection();
      } else if ((ctx->client_fds[i].revents & POLLOUT) == POLLOUT) {
        // there is room to write data on this socket
        // even our outgoing traffic has to be stateful!
        // so sendAll() cannot work in a fully async system
        http_client_output_work(&client_context[p->fd]);
      } else if ((ctx->client_fds[i].revents & POLLIN) == POLLIN) {
        // there is room to write data on this socket
        // even our outgoing traffic has to be stateful!
        // so sendAll() cannot work in a fully async system
        http_client_input_work(&client_context[p->fd]);
      }
    }
    
    if (p->fd == server_fd) {
      // check if server *can* accept new client (check if num clients < max)
    }
  }
}

int main()
{
  do {
    // all of these functions would need to be more or less instantaneous
    do_gpio_work();
    do_flash_memory_work();
    do_http_work();
    // sleep until some event happens
  } while(1);
}
Comment on lines 237 to 206
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I've mentioned this already, but 0 is still a valid file descriptor - it should not be used as a marker of unused slot. Use -1 instead (preferably defined as some symbol like INVALID_SOCK.
Also, please make sure you follow Zephyr CS:

Suggested change
if (ctx->fds[j].fd != 0)
continue;
if (ctx->fds[j].fd != -1) {
continue;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should break the loop if ret < 0. And handle -EAGAIN as a special case - it's not an error, but an indication that more data is expected.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should drop this idea of having parse_http2_frames() which will attempt to parse as many frames there is in the buffer, and store their contents in an enormous array of frames - it won't scale well for embedded systems.

We should be parsing frames one by one - i. e. start in a FRAME_HEADER state always, and based on the content switch to a state that will be responsible of parsing the payload of the frame. In case of SETTINGS frame, parse settings one be one, until we consume all of the data in the current frame.

Implement a library for hpack static table
Fixes zephyrproject-rtos#59693

Signed-off-by: Emna Rekik <emna.rekik007@gmail.com>
Add Zephyr HTTP library

Signed-off-by: Emna Rekik <emna.rekik007@gmail.com>
Replace net_http with net_http_client

Signed-off-by: Emna Rekik <emna.rekik007@gmail.com>
Modify http Kconfig
Fixes zephyrproject-rtos#59684

Signed-off-by: Emna Rekik <emna.rekik007@gmail.com>
Verifies that HTTP method and HTTP path are correctly parsed

Signed-off-by: Emna Rekik <emna.rekik007@gmail.com>
Add sample.yaml file and adjust files after moving
server.c to lib

Signed-off-by: Emna Rekik <emna.rekik007@gmail.com>
Convert README.md to README.rst and update it.

Signed-off-by: Emna Rekik <emna.rekik007@gmail.com>
This test ensures that the server supports Gzipped
Compression.
Fixes zephyrproject-rtos#59700

Signed-off-by: Emna Rekik <emna.rekik007@gmail.com>
Add test function for basic rest api

Signed-off-by: Emna Rekik <emna.rekik007@gmail.com>
This test ensures that the server supports JSON
Format

Signed-off-by: Emna Rekik <emna.rekik007@gmail.com>
Add JSON format support

Signed-off-by: Emna Rekik <emna.rekik007@gmail.com>
Establish a TLS communication between a client and
Zephyr HTTP server.

Signed-off-by: Emna Rekik <emna.rekik007@gmail.com>
Add service header file

Signed-off-by: Emna Rekik <emna.rekik007@gmail.com>
Support Static Ressources

Signed-off-by: Emna Rekik <emna.rekik007@gmail.com>
add this line
ITERABLE_SECTION_ROM(http_resource_desc, 4)

Signed-off-by: Emna Rekik <emna.rekik007@gmail.com>
Return them to their origin after we have
done linux prototye

Signed-off-by: Emna Rekik <emna.rekik007@gmail.com>
add section rom file

Signed-off-by: Emna Rekik <emna.rekik007@gmail.com>
add socket support

Signed-off-by: Emna Rekik <emna.rekik007@gmail.com>
Add static ressource support

Signed-off-by: Emna Rekik <emna.rekik007@gmail.com>
Modify service header file

Signed-off-by: Emna Rekik <emna.rekik007@gmail.com>
Update prototype test functions and fix issues
Fixes zephyrproject-rtos#59685
Fixes zephyrproject-rtos#59686
Fixes zephyrproject-rtos#59688
Fixes zephyrproject-rtos#59690
Fixes zephyrproject-rtos#59670

Signed-off-by: Emna Rekik <emna.rekik007@gmail.com>
@jukkar
Copy link
Member

jukkar commented Sep 11, 2023

Hi all, what is the plan with this one (considering merging)? There are lot of commits (a bit too much imho), and reviewing becomes harder. Is there something major still missing from this?
Should we just remove the draft status and have a normal PR so that more people would look at it?
Also we might consider merging this even if not ready so that remaining issues could be resolved more easily. We could add to documentation that this is still very experimental. WDYT?

@jukkar
Copy link
Member

jukkar commented Sep 15, 2023

ping @cfriedt, any ideas / suggestions regarding my question?

@cfriedt
Copy link
Member

cfriedt commented Sep 15, 2023

ping @cfriedt, any ideas / suggestions regarding my question?

@jukkar - are you on Discord by any chance?

@jukkar
Copy link
Member

jukkar commented Sep 15, 2023

are you on Discord by any chance?

Yes, my nick is jukkar

break;
}
/* Create a socket */
ctx->server_fd = socket(af, SOCK_STREAM, proto);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this an issue if there is more than 1 service? The server_fd member will have the value of the last socket, which is an issue for accepting new clients.

@jukkar jukkar mentioned this pull request Oct 26, 2023
@jukkar
Copy link
Member

jukkar commented Oct 26, 2023

Moving the work to #64465, please add comments there. Closing this one.

@jukkar jukkar closed this Oct 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: Networking GSoC Google Summer of Code

5 participants