• Hub 3 Update

    From deon@21:2/116 to All on Friday, April 12, 2024 19:42:09
    Howdy,

    I've neglected Hub 3 for the last couple of months - but today I logged into the host to see what the damage was.

    There were a bunch of files that weren't processed (not sure why), so I processed those - as well as a few nodes marked as "auto hold". (Hub 3 puts CRASH/DAILY nodes into "Auto Hold" status if polling fails 5 times in a row, and nodes don't poll in to collect mail regularly.)
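
    The auto-hold rule described above can be sketched roughly as follows. This is only an illustration: the 5-failure threshold comes from the post, while `NodeState`, `record_poll`, `release_hold` and the node address are hypothetical and not the hub's actual code. It also assumes that a successful poll resets the counter but that a hold is only lifted by hand.

```python
from dataclasses import dataclass

MAX_FAILED_POLLS = 5  # threshold quoted in the post; everything else is assumed


@dataclass
class NodeState:
    address: str            # e.g. "21:3/100" (made-up address for illustration)
    failed_polls: int = 0   # consecutive failed poll attempts
    auto_hold: bool = False


def record_poll(node: NodeState, success: bool) -> NodeState:
    """Update the consecutive-failure counter after one poll attempt."""
    if success:
        node.failed_polls = 0
    else:
        node.failed_polls += 1
        if node.failed_polls >= MAX_FAILED_POLLS:
            node.auto_hold = True
    return node


def release_hold(node: NodeState) -> NodeState:
    """Manually release a node, as done for the BBSes that are still online."""
    node.failed_polls = 0
    node.auto_hold = False
    return node
```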

    I've released some of those auto holds as I see those BBSes are still online, so there is a backlog of mail going out to those systems. (I'm guessing they are on auto-pilot 'cause I haven't heard anyone say they aren't getting mail.)

    I've also discovered some issues with my automatic routing - so if you get an old netmail that was in transit that'll be why (oops sorry :)

    I also gave the DB server some more memory - it looks like the Linux kernel was killing it due to memory starvation.

    Let me know if you see anything else odd...


    ...лоеп
    --- SBBSecho 3.20-Linux
    * Origin: I'm playing with ANSI+videotex - wanna play too? (21:2/116)
  • From Al@21:4/106 to deon on Friday, April 12, 2024 05:00:22
    I've released some of those auto holds as I see those BBSes are still online, so there is a backlog of mail going out to those systems. (I'm guessing they are on auto-pilot 'cause I haven't heard anyone say they aren't getting mail.)

    I have noticed here in the last month or two that the net 3 hub gets put on hold because it doesn't answer, or because there is some failure.

    I clear those holds periodically and things flow as expected until the next hold comes along.

    I just cleared the holds on net 3 an hour ago or so.

    I also gave the DB server some more memory - it looks like the Linux kernel was killing it due to memory starvation.

    Sometimes when I watch mailer sessions with hub 3 the session is very slow. This could also be the cause of failures. I don't know why the session progresses slowly. A lack of memory perhaps?

    --- BBBS/Li6 v4.10 Toy-6
    * Origin: The Rusty MailBox - Penticton, BC Canada (21:4/106)
  • From deon@21:2/116 to Al on Friday, April 12, 2024 22:45:51
    Re: Hub 3 Update
    By: Al to deon on Fri Apr 12 2024 05:00 am

    Hey Al,

    I have noticed here in the last month or two that the net 3 hub gets put on hold because it doesn't answer, or because there is some failure.

    I clear those holds periodically and things flow as expected until the next hold comes along.

    I just cleared the holds on net 3 an hour ago or so.

    OK, there are probably a couple of reasons for this:
    * There is major construction going on nearby, and they are constantly taking my internet down for "maintenance" - and it's prolonged (usually around 10hrs). (They are rebuilding the rail line near me, and it's an 18-24 month project while they move it above ground.) So I imagine this long outage is probably a primary reason.

    (I have a hotspot, which gets traffic when my main cable goes down, so mail still flows, but only outbound from me.)

    * I've taken hub down for updates.

    * A nightly backup "pauses" the container and backs up the hub, but that should only be a few mins. But that might be happening while there is a session active.

    * My IPv4 link goes down (IPv6 is much more reliable...)

    Tonight I stopped the hub from accepting inbound calls while I cleared the backlog - it made it easier for me to trace a problem in the logs - which is when I noticed the kernel killing the db... ;)

    How many failed attempts (and time) before your system puts me on hold?

    Sometimes when I watch mailer sessions with hub 3 the session is very slow. This could also be the cause of failures. I don't know why the session progresses slowly. A lack of memory perhaps?

    Slow as in there is a delay before there are transfers? binkp by default has a 5 min timeout, hopefully not that slow that it times out?

    Outbound mail bundles are built on the fly, and the DB has a lot of mail in it (I've never deleted anything...), but it should be seconds before mail packets are ready, not minutes...

    I just looked in the logs for a session tonight, and it looks like 2-5s:

    [2024-04-12 22:05:02] production.INFO: PB-:- We have authed these AKAs [21:4/106.0@fsxnet] {"pid":268}
    [2024-04-12 22:05:04] production.INFO: MA-:= Got [1] echomails for [21:4/106.0@fsxnet] for sending {"pid":268}
    [2024-04-12 22:05:07] production.INFO: IS-:- Sending item [0] (118c0100.pkt) {"pid":268}
    [2024-04-12 22:05:07] production.INFO: PB-:= Packet/File [118c0100.pkt], type [4] sent. {"pid":268}
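
    For anyone wanting to measure this delay across many sessions, here is a rough sketch that pulls it out of log lines shaped like the excerpt above. The function name and regular expression are my own assumptions based only on these four lines; the real log format may have more variations than this handles.

```python
import re
from datetime import datetime

# Matches "[YYYY-MM-DD HH:MM:SS] channel.LEVEL: message" as seen in the excerpt.
LOG_LINE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] \S+: (.*)$")


def seconds_to_first_packet(lines):
    """Seconds from the 'We have authed' line to the first 'sent.' line, or None."""
    authed_at = None
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        message = match.group(2)
        if "We have authed" in message:
            authed_at = stamp
        elif "sent." in message and authed_at is not None:
            return (stamp - authed_at).total_seconds()
    return None


SAMPLE = [
    '[2024-04-12 22:05:02] production.INFO: PB-:- We have authed these AKAs [21:4/106.0@fsxnet] {"pid":268}',
    '[2024-04-12 22:05:04] production.INFO: MA-:= Got [1] echomails for [21:4/106.0@fsxnet] for sending {"pid":268}',
    '[2024-04-12 22:05:07] production.INFO: IS-:- Sending item [0] (118c0100.pkt) {"pid":268}',
    '[2024-04-12 22:05:07] production.INFO: PB-:= Packet/File [118c0100.pkt], type [4] sent. {"pid":268}',
]

print(seconds_to_first_packet(SAMPLE))  # 5.0
```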

    That said, I've noticed the website is slowing down, so I may need to think about better DB indexes and/or deleting some mail.

    To be honest, I'm surprised that memory is the issue - docker stats show it using < 200MB of the 512MB that I had assigned to the DB, yet the kernel was killing it (oom-killer). I've doubled it just in case, but I'll need to keep an eye on it.


    ...лоеп
    --- SBBSecho 3.20-Linux
    * Origin: I'm playing with ANSI+videotex - wanna play too? (21:2/116)
  • From Al@21:4/106 to deon on Friday, April 12, 2024 08:14:46
    I just cleared the holds on net 3 an hour ago or so.

    OK, there are probably a couple of reasons for this:
    * There is major construction going on nearby, and they are constantly taking my internet down for "maintenance" - and it's prolonged (usually around 10hrs). (They are rebuilding the rail line near me, and it's an 18-24 month project while they move it above ground.) So I imagine this long outage is probably a primary reason.

    (I have a hotspot, which gets traffic when my main cable goes down, so mail still flows, but only outbound from me.)

    * I've taken hub down for updates.

    * A nightly backup "pauses" the container and backs up the hub, but that should only be a few mins. But that might be happening while there is a session active.

    Yep, any of the above would cause sessions to fail.

    * My IPv4 link goes down (IPv6 is much more reliable...)

    I only have IPv4 here. That's caused by a bad actor here in the internet backbone.

    Tonight I stopped the hub from accepting inbound calls while I cleared the backlog - it made it easier for me to trace a problem in the logs - which is when I noticed the kernel killing the db... ;)

    How many failed attempts (and time) before your system puts me on hold?

    The mailer doesn't actually switch mail to hold, but it stops polling after 3 failed attempts. That will happen after about 5 minutes if any of the above is going on. I delete that failed call counter in nightly maintenance and during the day I do it myself if needed.

    Sometimes when I watch mailer sessions with hub 3 the session is very slow. This could also be the cause of failures. I don't know why the session progresses slowly. A lack of memory perhaps?

    Slow as in there is a delay before there are transfers? binkp by default has a
    5 min timeout, hopefully not that slow that it times out?

    I have never seen a session fail. Inbound mail sessions are handled by a daemon so I can only read the logs to see what is happening.

    I can watch outbound mail sessions if I switch to that window. Watching connects with hub 3, it looks like the session is running at 300 baud. Even so, I have never seen a session failure; it is just slow.

    Outbound mail bundles are built on the fly, and the DB has a lot of mail in it (I've never deleted anything...), but it should be seconds before mail packets are ready, not minutes...

    That said, I've noticed the website is slowing down, so I may need to think about better DB indexes and/or deleting some mail.

    To be honest, I'm surprised that memory is the issue - docker stats show it using < 200MB of the 512MB that I had assigned to the DB, yet the kernel was killing it (oom-killer). I've doubled it just in case, but I'll need to keep an eye on it.

    It may be something other than memory. I wonder if the process that is running your mailer/tosser is getting enough CPU time to do what it needs to do?

    Anyway, I think your new mailer/tosser is doing a good job although there may be more needed that you'll need to identify and sort out.

    --- BBBS/Li6 v4.10 Toy-6
    * Origin: The Rusty MailBox - Penticton, BC Canada (21:4/106)
  • From Zip@21:1/202 to deon on Friday, April 12, 2024 21:50:13
    Hello deon!

    On 12 Apr 2024, deon said the following...

    To be honest, I'm surprised that memory is the issue - docker stats show it using < 200MB of the 512MB that I had assigned to the DB, yet the kernel was killing it (oom-killer). I've doubled it just in case, but
    I'll need to keep an eye on it.

    I think some of the default settings for memory/virtual memory in Linux leave a lot to be desired, having vm.overcommit_memory = 0 (heuristic mode which tends to fail when an application wants lots of memory fast), vm.overcommit_ratio = 50 (only allow applications to use 50% of physical RAM) and vm.swappiness = 60 (rather high tendency to use swap).

    Some good reads:

    https://github.com/Tanzu-Solutions-Engineering/blog/blob/master/content/post/Virtual_memory_settings_in_Linux_-_The_problem_with_Overcommit.md

    https://unix.stackexchange.com/a/294651

    https://en.wikipedia.org/wiki/Memory_paging#Swappiness

    I have chosen to set vm.swappiness = 0, vm.overcommit_memory = 2 and vm.overcommit_ratio = 95 here to ensure that applications can use most of the amount of installed RAM and that swap is only used when really necessary.
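
    To put numbers on those settings: per the kernel's overcommit accounting, vm.overcommit_ratio is only consulted when vm.overcommit_memory = 2, and the resulting limit (shown as CommitLimit in /proc/meminfo) is swap plus that percentage of RAM, excluding huge pages. A small sketch of the arithmetic follows; the function name is made up and the 8 GiB RAM / 2 GiB swap host is purely an assumed example, not anyone's actual machine.

```python
def commit_limit_kib(ram_kib, swap_kib, overcommit_ratio, hugetlb_kib=0):
    """Approximate CommitLimit (KiB) when vm.overcommit_memory = 2.

    CommitLimit = (RAM - huge pages) * overcommit_ratio / 100 + swap
    """
    return (ram_kib - hugetlb_kib) * overcommit_ratio // 100 + swap_kib


# Assumed example host: 8 GiB RAM, 2 GiB swap, no huge pages.
RAM_KIB = 8 * 1024 * 1024
SWAP_KIB = 2 * 1024 * 1024

print(commit_limit_kib(RAM_KIB, SWAP_KIB, 50))  # 6291456 KiB  (6 GiB)
print(commit_limit_kib(RAM_KIB, SWAP_KIB, 95))  # 10066329 KiB (about 9.6 GiB)
```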

    Hope this helps!

    Best regards
    Zip

    --- Mystic BBS v1.12 A49 2023/04/30 (Linux/64)
    * Origin: Star Collision BBS, Uppsala, Sweden (21:1/202)
  • From deon@21:2/116 to Al on Saturday, April 13, 2024 08:58:09
    Re: Hub 3 Update
    By: Al to deon on Fri Apr 12 2024 08:14 am

    Hey Al,

    The mailer doesn't actually switch mail to hold, but it stops polling after 3 failed attempts. That will happen after about 5 minutes if any of the above is going on. I delete that failed call counter in nightly maintenance and during the day I do it myself if needed.

    Slow as in there is a delay before there are transfers? binkp by default

    I can watch outbound mail sessions if I switch to that window. Watching connects with hub 3, it looks like the session is running at 300 baud. Even so, I have never seen a session failure; it is just slow.

    So I had a look through the logs for the last couple of days. (I should expose this on the web UI as well...)

    Most sessions with 4/106 are < 7s, with the occasional one at 18s. In terms of transfers, most are 2-4k.

    In comparison, Spectres (which is EMSI/Zmodem) are 13-14s and are around 10-20k.

    And for interest, last night's catch-up session with 3/130 was 207s for 6MB of mail.
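
    As a back-of-the-envelope check on these figures against the "300 baud" impression, the sketch below works out the catch-up session's throughput and how long a typical 3k transfer would take at a real 300 baud. It assumes "6MB" means 6 MiB and that a 300 baud serial link moves about 30 bytes per second (10 bits on the wire per byte); both are assumptions, and the function names are only for illustration.

```python
def throughput_kib_per_s(size_bytes, seconds):
    """Average transfer rate in KiB per second."""
    return size_bytes / 1024 / seconds


def seconds_at_baud(size_bytes, baud, bits_per_byte=10):
    """Time to move size_bytes over a serial link (assumes 10 bits per byte)."""
    return size_bytes * bits_per_byte / baud


# Catch-up session from the post: roughly 6 MiB in 207 seconds.
print(round(throughput_kib_per_s(6 * 1024 * 1024, 207), 1))  # 29.7

# A 3 KiB transfer at a genuine 300 baud, for comparison.
print(seconds_at_baud(3 * 1024, 300))  # 102.4
```

    On those assumptions, a 2-4k transfer at a true 300 baud would take over a minute, so sessions finishing in under 7s are well clear of that.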

    If you see a slow response on sending packets (to the hub), it may also be because your packet is being processed on the fly before the hub is ready to receive the next packet.

    Anyway, I think your new mailer/tosser is doing a good job although there may be more needed that you'll need to identify and sort out.

    So at the moment, I don't think I have any issues to sort out. Everything seems to be working as I expect, although I do plan to work on optimising the indexes in the DB so that the response to queries is a little snappier, especially for the web UI.


    ...лоеп
    --- SBBSecho 3.20-Linux
    * Origin: I'm playing with ANSI+videotex - wanna play too? (21:2/116)
  • From deon@21:2/116 to Zip on Saturday, April 13, 2024 09:07:36
    Re: Re: Hub 3 Update
    By: Zip to deon on Fri Apr 12 2024 09:50 pm

    Howdy,

    I think some of the default settings for memory/virtual memory in Linux leave a lot to be desired, having vm.overcommit_memory = 0 (heuristic mode which tends to fail when an application wants lots of memory fast), vm.overcommit_ratio = 50 (only allow applications to use 50% of physical RAM) and vm.swappiness = 60 (rather high tendency to use swap).

    So in my case, there is another element, which is the memory limit assigned to the docker container. I do that so 1 container doesn't starve the resources for others.

    docker "stats" shows the DB hovering around 200MB of RAM, with the ability to use 512MB - so I thought that was well enough. But under load the oom-killer would kill it.

    The host has plenty of swap available (only using 5%) and is using <50% of system RAM (with cache buffers), so I'm not sure why it was selected to kill. I guess it's quite possible a large (infrequent) query consumed all memory. Anyway, giving it 1GB hasn't seen it be killed in 12 hrs...
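
    One way to check whether the container's own limit (rather than host-wide pressure) triggered the kill is the cgroup's event counters. Assuming the host uses cgroup v2, each cgroup has a memory.events file with an oom_kill counter. Where a given container's cgroup lives depends on the Docker and systemd setup, so the sketch below only parses the file's contents; the function names and the sample text are made up for illustration.

```python
def parse_memory_events(text):
    """Parse cgroup v2 memory.events content ('key value' per line) into a dict."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def was_oom_killed(text):
    """True if the cgroup has recorded at least one OOM kill."""
    return parse_memory_events(text).get("oom_kill", 0) > 0


# Made-up sample in the memory.events format, not real data from the hub.
SAMPLE_EVENTS = """low 0
high 0
max 37
oom 2
oom_kill 2
"""

print(was_oom_killed(SAMPLE_EVENTS))  # True
```

    A non-zero "max" count alongside "oom_kill" would suggest the cgroup hit its configured limit, which would fit the theory of an occasional large query.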

    Hope this helps!

    Thanks...


    ...лоеп
    --- SBBSecho 3.20-Linux
    * Origin: I'm playing with ANSI+videotex - wanna play too? (21:2/116)
  • From Al@21:4/106 to deon on Friday, April 12, 2024 17:19:00
    So I had a look through the logs for the last couple of days. (I should expose this on the web UI as well...)

    Most sessions with 4/106 are < 7s, with the occasional one at 18s. In terms of transfers, most are 2-4k.

    I find that sessions last for 2 or 3 seconds as long as transfers are less than 100k.

    In comparison, Spectres (which is EMSI/Zmodem) are 13-14s and are around 10-20k.

    I see that also with EMSI/Telnet sessions. They usually take ~15 seconds even with a small load of less than 100k.

    Even if these sessions take longer than that it is no problem, as long as sessions don't fail.

    If you see a slow response on sending packets (to the hub), it may also be because your packet is being processed on the fly before the hub is ready to receive the next packet.

    OK, there are many factors to consider.

    Anyway, I think your new mailer/tosser is doing a good job although there
    may be more needed that you'll need to identify and sort out.

    So at the moment, I don't think I have any issues to sort out. Everything seems to be working as I expect, although I do plan to work on optimising the indexes in the DB so that the response to queries is a little snappier, especially for the web UI.

    OK. If you are happy I am happy.

    --- BBBS/Li6 v4.10 Toy-6
    * Origin: The Rusty MailBox - Penticton, BC Canada (21:4/106)