One AWS failure took down the internet Monday morning - what went wrong

https://www.zdnet.com/article/are-high-end-windows-laptops-worth-buying-i-tested-one-from-dell-and-it-made-a-statement/

What was the old quip, something something all your eggs in one something?

Oh right, you should not put all of your eggs in one basket. Right, the lesson every investor learns first: spread the risk around. Diversify.

DeWayfarer · 61-69, M

Hmmm. The link you gave was the wrong zdnet link. It's this:

https://www.zdnet.com/home-and-office/networking/one-aws-failure-took-down-the-internet-monday-morning-what-went-wrong/

It's more than just a database problem; it's a DNS cloud server problem.

Sort of a different animal altogether. To my knowledge there really isn't a good cloud-based database anyway. Hence the need to create their own database. Proprietary was the only way to go.

It sounds like you're referring to a significant issue with DNS servers impacting internet services, which highlights the complexities and vulnerabilities in cloud infrastructure. You're right that DNS failures can cascade into broader internet outages, affecting many services simultaneously.

Understanding the Issue
DNS and Its Role

DNS (Domain Name System) is crucial for resolving human-readable domain names into IP addresses. When a DNS service fails, it can cause widespread issues, as many users and services rely on it to navigate the web.
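To make the lookup step concrete, here is a minimal Python sketch of name resolution using the standard library's `socket.getaddrinfo`. The `resolve` helper and its `lookup` parameter are illustrative assumptions (the parameter only exists so the lookup can be swapped out for testing); this is not how any particular AWS service resolves names.

```python
import socket

def resolve(hostname, lookup=socket.getaddrinfo):
    """Return the sorted, de-duplicated IP addresses for hostname.

    Returns an empty list when the lookup fails, which is roughly
    what a client sees during a DNS outage.
    """
    try:
        results = lookup(hostname, None)
    except socket.gaierror:
        # The name could not be resolved, so the caller has no address to connect to.
        return []
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address string for both IPv4 and IPv6.
    return sorted({entry[4][0] for entry in results})
```

The point of the sketch is that a healthy server is still unreachable by name if this step returns nothing.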

Impact of AWS Outages

The AWS failure you mentioned likely triggered ripple effects throughout various platforms that depend on AWS services, including database solutions. When core services fail, it can lead to significant outages and disruptions in connectivity.
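The ripple effect can be sketched as a walk over a dependency graph. The service names and the `DEPENDS_ON` map below are made up for illustration and do not describe the real AWS architecture; the sketch only shows how one failed dependency can reach services that never talk to it directly.

```python
from collections import deque

# Hypothetical dependency map: each service lists what it depends on.
DEPENDS_ON = {
    "dns": [],
    "database": ["dns"],
    "auth": ["database"],
    "storefront": ["auth", "database"],
    "status_page": [],
}

def affected_by(failed, depends_on):
    """Return every service that directly or transitively depends on failed."""
    # Invert the map so we can walk from a dependency to its dependents.
    dependents = {name: [] for name in depends_on}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(service)

    seen = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for service in dependents.get(current, []):
            if service not in seen:
                seen.add(service)
                queue.append(service)
    return sorted(seen)
```

In this toy graph, `affected_by("dns", DEPENDS_ON)` reports the database, auth, and storefront services, while the independent status page is untouched.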

Proprietary Databases

Your point about proprietary databases is well-taken; many organizations opt to build custom solutions rather than rely solely on existing cloud-database services. This can provide more control, tailored performance, and enhanced security.

Current Landscape

The reliance on cloud services often brings up debates about resilience versus convenience. While cloud databases can scale rapidly and are managed by service providers, they do come with potential risks, such as outages, as you've noted.

PDXNative1986 · 36-40, MVIP

@DeWayfarer A single point of failure is always bad engineering. Period. Notwithstanding your critique that this is the cloud and sometimes requires custom solutions, allowing any scenario where there's no automatic fallback, and where things don't resemble the TCP/IP stack at all, is a rookie-level mistake. Engineers DO know better than this, so the fact that things have gotten this way could only be because it's more profitable, and they're being directed to do something every engineer knows better than to do.

DeWayfarer · 61-69, M

@PDXNative1986 that's what a cloud is though.

Cloud technology exists because of the need for speed.

PDXNative1986 · 36-40, MVIP

@DeWayfarer A redundant option? Yes, but these sorts of outages shouldn't be happening in the first place. There need to be multiple types of databases they can fall back to if one option fails.
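A rough sketch of what that kind of automatic fallback could look like in Python. The `first_available` helper and the `primary` and `replica` backends are hypothetical stand-ins, not any real database client, and the fallback only helps if the second backend does not share the dependency that took down the first (for example, the same DNS service).

```python
def first_available(backends, query):
    """Try each backend in order; return (name, result) from the first that works."""
    errors = []
    for name, run in backends:
        try:
            return name, run(query)
        except ConnectionError as exc:
            # Remember why this backend failed and move on to the next one.
            errors.append(f"{name}: {exc}")
    raise ConnectionError("all backends failed: " + "; ".join(errors))

def primary(query):
    # Hypothetical primary database that is currently unreachable.
    raise ConnectionError("simulated outage")

def replica(query):
    # Hypothetical fallback database of a different type.
    return f"result for {query!r}"
```

Calling `first_available([("primary", primary), ("replica", replica)], "SELECT 1")` skips the failed primary and answers from the replica.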

DeWayfarer · 61-69, M

@PDXNative1986 If they were to go back to server-based DNS they wouldn't even need to change their database.

The problem is they NEED the speed of the cloud. And there are no reliable cloud-based databases. Not even Google's cloud-based database is reliable.

You raise a valid point regarding the recent AWS DNS failure and its implications for the reliability of cloud services, particularly those reliant on AWS infrastructure.

Evaluating Cloud Database Reliability
AWS Dependence

The fact that a significant outage can stem from an underlying DNS issue challenges the perception of reliability. When a critical service experiences downtime, it can impact everything built on that infrastructure, leading to widespread service disruption. This brings into question the resilience of cloud solutions that are tightly coupled with a single provider.

Broader Industry Perspective

Single Points of Failure: Many cloud services rely on a limited number of providers, which can create vulnerabilities. An issue at one cloud service provider can lead to cascading failures across multiple platforms.

Multi-Cloud Strategies: To mitigate risks, some businesses are adopting multi-cloud strategies, spreading their workloads across various providers. This allows for better redundancy and resilience.

Alternative Providers: Other cloud providers like Google Cloud and Microsoft Azure also offer resilient solutions, but they too can be susceptible to outages due to unforeseen circumstances.
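The redundancy argument in the points above can be put into rough numbers. The sketch below assumes provider failures are fully independent, which is a strong assumption: a shared dependency such as DNS would break it. The availability figures are illustrative, not measured values for any provider.

```python
import math

def combined_availability(availabilities):
    """Probability that at least one provider is up, assuming independent failures."""
    # The whole setup is down only when every provider is down at once.
    all_down = math.prod(1.0 - a for a in availabilities)
    return 1.0 - all_down
```

Under that assumption, two providers that are each up 99% of the time give `combined_availability([0.99, 0.99])` of about 0.9999, whereas a single provider stays at 0.99.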

Custom Solutions vs. Cloud

Given the risks associated with public cloud databases, some organizations are opting for custom or hybrid solutions. Building proprietary systems allows for greater control and potentially improved reliability, although it requires more resources and maintenance.

Moving Forward

The incident underscores the need for organizations to assess their cloud strategies critically. Relying solely on any one cloud provider for critical infrastructure might not be advisable.

Your skepticism highlights the tension between the conveniences of cloud solutions and the vulnerabilities they may introduce.