Temporary downtime for iPOSpays STEAM Service

March 2, 2023

INCIDENT DATE AND TIME
US-EST: March 2nd 2023
Beginning at: 11:10 AM EST
Resolved at: 12:00 PM (noon) EST

The iPOSpays STEAM service (the STEAM server built into our gateway) was temporarily down. Our general STEAM service (dvmms.com/steam) for non-gateway terminals was not impacted.

IMPACT

Terminal files could not be prepared or saved.

Parameter downloads failed. This would have impacted customers who had completed an application download just before the failure but had not yet received the parameter download.

CloudPOS customers who logged in during the above window could not perform transactions.
All previously logged in customers could still transact.

All customers who attempted transactions during the DB failover period would have been impacted.

The transaction database was briefly down during the failover from primary to secondary to tertiary. This would have lasted no more than a minute.

ROOT CAUSE

Based on the preliminary investigation, an unexpected spike in application/parameter downloads and updates crashed the parameter server, which runs its application on MongoDB.

MongoDB runs on the same server as the transaction database (MariaDB), so the crash impacted the MariaDB server as well. As designed, MariaDB failed over to a secondary instance. When we attempted to bring MongoDB up on the secondary instance, it caused the secondary MariaDB to fail as well. MariaDB was finally restored on our third redundancy server. That then allowed us to bring the other two servers back up and restore the full system.

RESPONSE HANDLING

There was no alert set up to detect connectivity failures between STEAM and MongoDB. We were alerted by an internal user who detected the failure while trying to configure something on production.

Our on-site system administrators and programmers responded immediately to the situation and
acted in the most efficient manner.

REMEDIAL STEPS
Develop new alerts to detect connectivity loss between STEAM and the databases (MariaDB and MongoDB).
ETA – 3/10
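
As an illustration only, a connectivity alert of this kind could start from a simple TCP reachability probe. The sketch below is not the actual alerting implementation: the hostnames are placeholders, the ports are the MariaDB and MongoDB defaults (3306 and 27017) and may differ in production, and the function names are hypothetical.

```python
import socket

# Placeholder endpoints (assumptions): hostnames are invented for this sketch,
# ports are the MariaDB and MongoDB defaults.
ENDPOINTS = {
    "mariadb": ("mariadb.internal.example", 3306),
    "mongodb": ("mongodb.internal.example", 27017),
}


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS lookup failures.
        return False


def find_unreachable(endpoints, probe=is_reachable):
    """Return the sorted names of endpoints the probe could not reach."""
    return sorted(
        name for name, (host, port) in endpoints.items() if not probe(host, port)
    )


def format_alert(failed):
    """Build an alert message, or return None when every endpoint is reachable."""
    if not failed:
        return None
    return "ALERT: STEAM cannot reach " + ", ".join(failed)
```

Run on a schedule from the STEAM host, `format_alert(find_unreachable(ENDPOINTS))` would return a message whenever either database stops accepting connections. A TCP probe only shows that the port is open; a fuller check would also run a lightweight query against each database.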

Separate MongoDB from MariaDB and run them on two different servers so that a failure in one does not impact the other.

ETA – 3/10

Perform load testing to benchmark peripheral services such as app download, param download, and portal access for volume and stress.

ETA – 3/20
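
A minimal sketch of what such a load test harness might look like, assuming each request can be wrapped in a Python callable. The `task` argument is a stand-in for a real app download, param download, or portal request; the function names and the reported fields are hypothetical and not part of any existing tooling.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(task):
    """Run one request and return (succeeded, elapsed_seconds)."""
    start = time.perf_counter()
    try:
        task()
        ok = True
    except Exception:
        # Any exception raised by the request counts as a failed request.
        ok = False
    return ok, time.perf_counter() - start


def run_load(task, total_requests, concurrency):
    """Issue total_requests calls to task using `concurrency` worker threads."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(lambda _: timed_call(task), range(total_requests)))
    latencies = [elapsed for _, elapsed in results]
    succeeded = sum(1 for ok, _ in results if ok)
    return {
        "requests": total_requests,
        "succeeded": succeeded,
        "failed": total_requests - succeeded,
        "max_latency_s": max(latencies) if latencies else 0.0,
    }
```

Raising `concurrency` step by step while watching the `failed` count and `max_latency_s` gives a rough picture of where a service starts to degrade, which is the kind of spike that triggered this incident.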