Session APIs, Webhooks and Recordings Failures
Incident Report for Dyte
Date and Time: 20th June 2024 10:06 AM UTC to 20th June 2024 11:11 AM UTC

Affected Service: Webhooks, Recordings, Sessions API

Issue: All APIs under the `/v2/sessions` prefix returned invalid or no data. Meetings on the new media layer with Record on Start enabled did not start recording automatically. A subset of webhooks for the new media layer was delivered late.

Root Cause: A change rolled out on 11 June 2024 improved the performance and reliability of session data processing for larger meetings. This change also introduced a regression: event processing stopped once a threshold of failures was reached. As the event count grew gradually over several days, a message broker stopped forwarding events to various internal services.
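
The failure mode described above can be illustrated with a minimal sketch. This is not Dyte's actual broker code: the `Forwarder` class, its `max_failures` threshold and the sample `handler` are hypothetical names chosen for illustration. It compares a forwarder that halts once a failure threshold is reached with one that sets failed events aside and keeps forwarding.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class Forwarder:
    """Hypothetical event forwarder, for illustration only."""
    handler: Callable[[dict], None]
    max_failures: int          # hypothetical failure threshold
    halt_on_threshold: bool    # True reproduces the halting behaviour
    failures: int = 0
    halted: bool = False
    dead_letters: List[dict] = field(default_factory=list)

    def forward(self, events: Iterable[dict]) -> int:
        """Forward events to the handler and return how many were delivered."""
        delivered = 0
        for event in events:
            if self.halted:
                break  # downstream services receive nothing from here on
            try:
                self.handler(event)
                delivered += 1
            except Exception:
                self.failures += 1
                self.dead_letters.append(event)  # keep the event for inspection
                if self.halt_on_threshold and self.failures >= self.max_failures:
                    self.halted = True
        return delivered


def handler(event: dict) -> None:
    # Stand-in for a downstream service: every tenth event fails.
    if event["id"] % 10 == 0:
        raise ValueError("malformed event")


events = [{"id": i} for i in range(1, 101)]

halting = Forwarder(handler, max_failures=3, halt_on_threshold=True)
tolerant = Forwarder(handler, max_failures=3, halt_on_threshold=False)

print(halting.forward(events), halting.halted)
print(tolerant.forward(events), tolerant.halted)
```

With this sample data the halting forwarder delivers 27 events and then stops for good, while the tolerant one delivers 90 and sets the 10 failing events aside. Because the failure count is cumulative, a higher event volume simply reaches the threshold sooner.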

Resolution: On noticing this issue, we restarted internal services, after which all pending data was processed and APIs started responding correctly. After initial mitigation, we rolled out a fix to prevent this class of issues from happening in the future.

Impact: This incident was not a total outage; other APIs and services continued functioning normally. It affected only customers on the newer media layer.

Monitoring: We are prioritising the redeployment of the probes that continuously monitor these APIs onto a more reliable system, so that we are alerted to such issues sooner.
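
As a rough illustration of the kind of probe described above, the sketch below checks a response for both status and content, since the affected APIs returned invalid or no data rather than failing outright. It is an assumption-laden example, not Dyte's monitoring setup: `Probe`, `is_healthy`, the injected `fetch` and `alert` callables and the three-failure threshold are all hypothetical.

```python
import json
from typing import Callable, List, Tuple


def is_healthy(status: int, body: str) -> bool:
    """Treat non-200, empty, or non-JSON responses as unhealthy."""
    if status != 200 or not body.strip():
        return False
    try:
        payload = json.loads(body)
    except ValueError:
        return False
    # An empty object counts as "no data".
    return isinstance(payload, dict) and len(payload) > 0


class Probe:
    """Hypothetical probe that alerts after N consecutive failed checks."""

    def __init__(
        self,
        fetch: Callable[[], Tuple[int, str]],
        alert: Callable[[str], None],
        failures_before_alert: int = 3,
    ) -> None:
        self.fetch = fetch
        self.alert = alert
        self.failures_before_alert = failures_before_alert
        self.consecutive_failures = 0

    def tick(self) -> bool:
        """Run one check and return whether it passed."""
        try:
            status, body = self.fetch()
            ok = is_healthy(status, body)
        except Exception:
            ok = False  # a failed request counts as a failed check
        if ok:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
            # Alert once, when the threshold is first reached.
            if self.consecutive_failures == self.failures_before_alert:
                self.alert(f"{self.consecutive_failures} consecutive failed checks")
        return ok


# Canned responses stand in for real HTTP calls.
responses = iter([
    (200, '{"data": []}'),
    (200, ""),
    (500, "error"),
    (200, "not json"),
    (200, ""),
])
alerts: List[str] = []
probe = Probe(fetch=lambda: next(responses), alert=alerts.append)
results = [probe.tick() for _ in range(5)]
print(results, alerts)
```

Injecting `fetch` keeps the check logic testable without network access; in a real deployment it would wrap an HTTP request to the monitored endpoint and `tick` would run on a schedule.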

A detailed RCA has already been shared with affected customers. If you are interested in the RCA, please reach out to Dyte Support.
Posted Jun 20, 2024 - 15:30 IST