Performance Degradation
Postmortem

As you are aware, over the last two days we have had to reboot a number of servers to address an issue on our primary SQL Server, which hosts a number of SAP Business One deployments. The issue has a flow-on effect to other servers.

SYMPTOMS

The SQL Server disk suddenly becomes flooded with activity, which causes the entire server to become unresponsive. This has flow-on impacts to other servers running on the primary Hyper-V virtualisation server that hosts the SQL Server.

ACTIONS TAKEN

We have investigated extensively over the 48 hours since the issue first occurred and have not found any issues in our underlying hardware or infrastructure.

This points to a software issue, either in SQL Server itself or, more likely, in a process running on one of our customer deployments, such as a badly written query or a running task. That makes the issue very difficult to diagnose and rectify.

It also means that we cannot simply move the SQL Server to another set of hardware, as this is not a hardware problem.

We know for certain that it is not malicious activity and at no stage has any data been lost. The main impact is downtime for our customers, which we sincerely regret.

We have taken a number of steps overnight to move one other critical piece of SAP software, the Cloud Control Center, to a different SQL Server machine so that HANA customers are not impacted by the issue when it occurs.

NEXT STEPS

If the issue occurs again, we plan to move the SQL Server to another virtualisation server to further minimise the impact on other customers in our cloud environment. This involves a period of downtime, so we cannot do it until Saturday night this coming weekend, when the impact on customers is lowest.

We will continue to work the problem to find the root cause and then the resolution.

If you have any questions, please email me - richard@smbsolutions.com.au

Posted Dec 07, 2023 - 07:44 AEDT

Resolved
This incident has been resolved.
Posted Dec 06, 2023 - 20:03 AEDT
Monitoring
We are still monitoring and diagnosing the SQL Server but access should be restored.
Posted Dec 06, 2023 - 17:10 AEDT
Investigating
Microsoft SQL Server performance has again dropped out. We are investigating as a priority.
Posted Dec 06, 2023 - 16:35 AEDT
Monitoring
Performance has been restored. We are monitoring the SAP SQL Server for further incidents.
Posted Dec 06, 2023 - 14:31 AEDT
Update
We need to allow the server to reboot gracefully to ensure that no data corruption occurs. Please bear with us as we go through this process.
Posted Dec 06, 2023 - 13:59 AEDT
Identified
Server performance has dropped on the SAP SQL Server component again. We will restart the server immediately to avoid yesterday's extended dropouts.
Posted Dec 06, 2023 - 13:45 AEDT
This incident affected: Database Engines (SQL Server Database Engines) and Remote Desktop Services.