The doi.org resolution downtime on 2024-05-29 was caused by the sudden loss of our access to Cloudflare’s “Load Balancing” product, which doi.org had been using to steer traffic to appropriate backend servers based on location. We were unable to re-enable “Load Balancing”, so we implemented a workaround using AWS services, ultimately settling on AWS “Global Accelerator”.
We opened a support ticket with Cloudflare at the time, but after three weeks we still have not received any useful information from them. During this period, Cloudflare reported an issue with billing and subscriptions (https://www.cloudflarestatus.com/incidents/5t270n2ndf0h). We suspect that our problem is related to that incident, and that it simply happened to affect our account in exactly the way that would cause a loss of service.
Although widespread billing issues may explain Cloudflare’s lack of response to our ticket, we find it disappointing, especially since it resulted in actual service downtime for us.
We have decided to make our use of AWS “Global Accelerator” permanent. For doi.org, this service is significantly less expensive than Cloudflare’s “Load Balancing”, and it even appears to slightly improve latency. We continue to use other Cloudflare services, especially for rate limiting and mitigation of potential DDoS attacks.
We will continue to monitor our Cloudflare subscriptions, billing, and tickets for any new issues or explanatory information. We apologize for the inconvenience caused by this downtime.