• src/sbbs3/ringbuf.c

    From Rob Swindell (in GitKraken)@VERT to Git commit to main/sbbs/master on Friday, March 24, 2023 13:27:09
    https://gitlab.synchro.net/main/sbbs/-/commit/ffe092b65b38633a5ca185e3
    Modified Files:
    src/sbbs3/ringbuf.c
    Log Message:
    Set initial state of 'data_event' and 'highwater_event' to *not* signaled

    Since commit fafed094c52b0ca1 (2 months ago), the Synchronet Web Server on
    my Windows systems has occasionally gone into a state where every HTTP session thread was causing 100% CPU utilization and each new HTTP session thread
    would just exacerbate the problem eventually leading to complete system instability/unresponsiveness until the sbbs instance was terminated. This was more readily reproducible on a Win7-32 VM, but would occasionally, though
    much less frequently, happen in a native instance on Win10-64 as well.

    Using the VisualStudio debugger, I was able to narrow it down to this:

    Each new HTTP thread (eventually, hundreds of such threads) would get stuck
    in this loop in http_output_thread():

    while(session->socket!=INVALID_SOCKET) {

    /* Wait for something to output in the RingBuffer */
    if((avail=RingBufFull(obuf))==0) { /* empty */
    if(WaitForEvent(obuf->data_event, 1000) != WAIT_OBJECT_0)
    continue;
    /* Check for spurious sem post... */
    if((avail=RingBufFull(obuf))==0)
    continue; // <--- data_event signaled, but never cleared
    }

    There appears to be a race condition where this logic could be executed immediately after the output ringbuf was created, but before writebuf() was ever called (which would have actually placed data in the output buffer), causing a potential high-utilization infinite loop: the data_event is signaled but there is no data and the event is never reset and nothing can ever add
    data to the ringbuf due to starvation of CPU cycles.

    Uses of ringbuf's data_event elsewhere in Synchronet don't seem to be subject to this issue since they always call RingBufRead after, which will clear the data_event when the ringbuf is actually empty (no similar loops to this one).

    The root cause just appears to be a simple copy/paste issue in the code added to RingBufInit(): the preexisting 'empty_event' was initialized with a
    correct initiate state of 'signaled' (because by default a ringbuf is empty) but the newly added events (data_event and highwater_event) should *not* be initially-signaled because... the ringbuf is empty. So I added some parameter comments to these calls to CreateEvent() to hopefully make that more clear
    and prevent similar mistakes in the future.

    Relieved to have this one resolved finally.

    ---
    þ Synchronet þ Vertrauen þ Home of Synchronet þ [vert/cvs/bbs].synchro.net