Last night, from approximately 2300-0000 UTC, the Verium daemon went down after receiving an unusual, but not unheard of, amount of traffic. The daemon gave the following out-of-memory error:
And this is what the traffic looked like during the failure:
As you can see, traffic began to increase around 2200 and stayed elevated for about 1 hour, until the daemon ran out of memory and dropped all active connections. CPU usage increased from 1% to 3% before the daemon crashed:
However, there was no corresponding increase in CPU usage or traffic on the stratum server:
As soon as the daemon was restarted, the stratum server was able to submit shares and receive work again, and mining resumed. Blocks were mined successfully from this point, but this was not reflected in the front-end UI. Once the service that updates the interface was manually restarted, the displayed statistics caught up with reality and 3 blocks were ‘instantly’ found.
To prevent this from happening in the future, I will be cloning the existing daemon and configuring the clone to accept inbound connections only from the stratum server, while still allowing outbound connections to other nodes. This node will act as a fallback so that mining can continue should the first node fail. In the event of a failure, the backup node becomes the primary full node, and the old primary becomes the backup once it has been restarted.
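The role swap described above could be sketched roughly as follows. This is only an illustration of the idea, not the pool's actual code: the `Node` and `FailoverPool` names and the `is_healthy` callback are all hypothetical, and a real health check would probably be an RPC call to the daemon.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Node:
    """A daemon the stratum server can talk to (all fields are hypothetical)."""
    name: str
    host: str
    port: int


class FailoverPool:
    """Keeps an ordered list of daemons; the first entry is the current primary."""

    def __init__(self, nodes: List[Node], is_healthy: Callable[[Node], bool]):
        if not nodes:
            raise ValueError("at least one node is required")
        self._nodes = list(nodes)
        # Assumed health check, e.g. a wrapper around an RPC ping to the daemon.
        self._is_healthy = is_healthy

    @property
    def primary(self) -> Node:
        return self._nodes[0]

    def select(self) -> Node:
        """Return a healthy node, promoting the backup if the primary is down."""
        for _ in range(len(self._nodes)):
            if self._is_healthy(self._nodes[0]):
                return self._nodes[0]
            # Demote the failed primary to the end of the list; the next node
            # in line becomes the primary and stays there even after the old
            # primary has been restarted.
            self._nodes.append(self._nodes.pop(0))
        raise RuntimeError("no healthy daemon available")
```

With two nodes, the stratum server would call `select()` before requesting work. Once the backup has been promoted it stays primary, so a restarted daemon rejoins as the backup rather than taking its old role back.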