Users of Microsoft Azure, Office 365, Dynamics 365 and other Microsoft-powered services have been on a roller coaster for the past week, with two major outages: one affecting Microsoft's multi-factor authentication (MFA) services and, more recently, an Office 365 Exchange Online outage.
On November 19, between 04:39 UTC and 18:38 UTC, Microsoft Azure Active Directory (AD) MFA services suffered an outage. As a result, users of Office 365, Azure, Dynamics and other services that rely on Azure AD for user authentication could not log in if their organization had enabled MFA.
Engineers are actively investigating an ongoing issue affecting Azure Active Directory, when Multi-Factor Authentication is required by policy. Please refer to https://t.co/Dw19fIoS5H for updates.
— Azure Support (@AzureSupport) November 19, 2018
In a more recent Twitter update posted by Microsoft 365 Status at 11:53 UTC, Microsoft said it is working on an issue that is preventing some UK customers from connecting to Exchange Online.
As a result, Office users might be unable to connect to Exchange Online.
We’re investigating an issue where users may be unable to connect to the Exchange Online service. All details are published in the admin center, available to your Office 365 admin under Service Incident (SI)# EX165763.
— Microsoft 365 Status (@MSFT365Status) November 26, 2018
In the meantime, Microsoft has suggested that users refresh their connections to regain access, as a temporary workaround.
Admins can check regular updates to the service in the Office 365 admin center under service incident EX165763.
As for the MFA service outage, Microsoft soon mitigated the issue and published three root causes, along with monitoring gaps, that resulted in the MFA outage across its major services.
- The first root cause was a latency issue in the MFA frontend’s communication with its cache services.
- The second root cause was a race condition in processing responses from the MFA backend server, which led to recycles of the MFA frontend server processes. This resulted in further latency.
- The third root cause was triggered by the second: the MFA backend could not process any requests from the frontend, even though Microsoft’s monitoring showed it to be working fine.
Apart from the three independent root causes, gaps in the monitoring and telemetry of the MFA services also delayed the identification and understanding of these root causes.
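The third root cause is a common monitoring pitfall: a check that only confirms a process is running can report "healthy" while the service handles no requests. The Python sketch below is purely illustrative. The `Backend` class and both check functions are hypothetical names invented for this example, and nothing here reflects how Microsoft's MFA service or its monitoring is actually built.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    """Hypothetical stand-in for a backend service (an assumption, not Microsoft's design)."""
    process_alive: bool = True
    accepting_requests: bool = True

    def handle(self, request: str) -> str:
        # A stuck backend may still be running but never answer requests.
        if not self.accepting_requests:
            raise TimeoutError("backend did not respond")
        return f"ok:{request}"


def liveness_check(backend: Backend) -> bool:
    """Shallow check: only asks whether the process is up."""
    return backend.process_alive


def synthetic_probe(backend: Backend) -> bool:
    """End-to-end check: sends a test request and verifies the response."""
    try:
        return backend.handle("probe") == "ok:probe"
    except TimeoutError:
        return False


# A backend whose process is alive but which cannot serve requests.
stuck = Backend(process_alive=True, accepting_requests=False)
print("liveness check:", liveness_check(stuck))
print("synthetic probe:", synthetic_probe(stuck))
```

In this sketch the liveness check passes for the stuck backend while the synthetic probe fails, which is the kind of discrepancy that end-to-end telemetry is meant to surface.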
The outage affected users for almost 14 hours. Customers in the Middle East, Europe, Asia-Pacific and Africa were hit first, per reports, followed by American and Western European datacenters.
As next steps, Microsoft has said it will review its update deployment procedure so that similar issues can be identified during development and testing, a target it aims to meet by December 2018. It also announced a review of its monitoring services by the end of December 2018, a review of its containment process by January 2019, and an update to the process for communicating to the health dashboard, for quicker dissemination of information, by the end of December 2018.
For the Office 365 outage, we suggest that admins check the updates available in the Office 365 admin center.