Programming Books

AWS has gone down before, as have other vendors; Fastly has lessons from its own outage

By Brandy J. Richardson
December 10, 2021

Fastly’s outage in mid-2021 took huge sites offline. Its chief product architect, Sean Leach, explains why he thinks outages keep happening and how to reduce your own risk.

It’s time to reset the ‘days since last outage’ sign at AWS headquarters, with the web hosting giant dissecting its latest mass outage, which this time took down sites like Disney+ and Netflix.

There are a lot of digital eggs in the AWS basket, and unfortunately major outages have occurred with surprising regularity. AWS isn’t alone, however: edge cloud company Fastly suffered an outage on June 8, 2021, which resembled the AWS outages if only because it knocked several major websites offline.

The latest AWS outage is still a bit of a mystery. All we know is that on Tuesday, December 7, AWS US-East-1 went offline. It happens to be AWS’s largest data center, and the outage affected not only Amazon’s customers but its internal operations as well. Service was restored later that day, AWS said.

Amazon has yet to give details beyond what CBS News described as “terse technical explanations” for the outage, which took major websites, IoT devices and other essential online services offline. Fastly Chief Product Architect Sean Leach won’t speculate on the cause of the AWS outage, but he has a lot to say about Fastly’s June 8 outage and how the lessons Fastly learned from it can be applied by both content delivery services and the customers who use them.

Fastly’s outage was caused by a bug introduced by a software deployment the previous month. The bug had very specific trigger conditions: it could only be set off by “a specific customer configuration under specific circumstances,” said Nick Rockwell, senior vice president of engineering and infrastructure at Fastly. As it turned out, a customer who met those conditions submitted a valid configuration change that triggered the bug and took 85% of Fastly’s network offline. Fastly quickly identified the error, restored services, and deployed a permanent fix the same day.

The internet is a car, and cars need maintenance

Internet outages keep happening, which raises the question: why? And, if there is something fundamentally wrong, do we need to rethink the internet?

No, Leach said, adding that the internet was built very well in the first place. Rather than seeing the internet as a mass of disparate servers all vying for authority, think of it as a complete system made up of moving parts, like an automobile.

“So you own your car. You drive, make sure to change oil and other fluids, rotate tires, etc. and react to this unexpected circumstance,” Leach said.

Leach says there is no fundamental flaw in the design of the Internet. Rather, he describes it as having been “beautifully designed” early in its existence in a way that worked much better than anyone thought at the time. Yes, things go wrong, but every mistake is a chance to learn and eliminate the points of failure.

What Fastly learned from its own outage

If Fastly learned one big lesson from its outage and the recovery process, Leach said, it is that transparency pays off. “Transparency has always been a key area [at Fastly]. We have been very transparent in the blog we posted in response to the outage, and our customers have been very supportive of our response,” said Leach.

Transparency, Leach said, doesn’t just benefit the company that is open about its mistakes and how it responds to them. It also benefits all other industry players who may face similar circumstances in the future.

If you’ve been on Tech Twitter for a while, you’ve probably heard the term “HugOps,” a slang term describing the sense of empathy tech pros have for each other when they face similar challenges. Part of HugOps, Leach said, is being able to help. If businesses are honest about their outages, HugOps simply becomes a matter of sharing reports that could quickly reduce recovery time for other organizations.

“To quote Mike Tyson, ‘everyone has a plan until they get punched in the face,’” Leach said. Simply put, if we all help each other out, we can respond much better to the blows our infrastructure will inevitably face.

How to fix the internet …?

Leach said there are two big things Fastly has focused on that he sees as ways to reduce the frequency of internet outages.

First, Fastly has moved as much of its critical infrastructure as possible to memory-safe languages like Rust and WebAssembly. “The big cloud infrastructure, the things that do terabits of transactions per second… a lot of it is written in C and C++. They were great languages at first, but like everything, we finally found a better way,” Leach said.

Second, Leach warns that DDoS attacks, which he describes as cyclical, are on the rise. The answer is to increase transactional capacity, which reduces the impact a DDoS attack can have. “We see the attacks not only getting bigger, but also more complex. Keeping abreast of capabilities and threat intelligence is critical to knowing what attackers are doing,” Leach said.

As for companies that might be suffering from these outages, Leach said his biggest message of all is not to give up on the cloud.

“Think about all the breakdowns people have had with their own infrastructure for years and how hard it is for them to recover from it. Switching to a cloud provider gives you access to many experts, both on the infrastructure and security side, which will react quickly and resolve and fix the problem,” Leach said.

This does not mean that you should ignore redundancy. Leach says it’s important to have geographic failovers, but he maintains the cloud is always going to be the best option, for one big reason that he said all the uproar around cloud stability boils down to: risk.
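To make the idea of a geographic failover concrete, here is a minimal sketch in Python. It is not how Fastly or AWS implement failover. The function name `pick_region`, the region list and the health-check callback are all hypothetical, and a real setup would typically rely on DNS or load-balancer health checks instead of application code.

```python
from typing import Callable, Optional, Sequence


def pick_region(
    regions: Sequence[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first region, in priority order, that passes the health check.

    `regions` and `is_healthy` are illustrative assumptions: the caller decides
    what "healthy" means (an HTTP probe, a status API, and so on).
    """
    for region in regions:
        try:
            if is_healthy(region):
                return region
        except Exception:
            # A health check that raises is treated as an unhealthy region.
            continue
    return None  # No region is available; the caller must handle this case.


# Hypothetical status table: the primary region is down.
status = {"us-east-1": False, "us-west-2": True, "eu-west-1": True}
chosen = pick_region(["us-east-1", "us-west-2", "eu-west-1"], lambda r: status[r])
print(chosen)
```

The priority order encodes the risk decision: the more regions in the list, the more an organization pays for redundancy and the less a single regional outage can hurt it.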

“Each organization has to choose its level of risk, just like you do with security. You can choose how much risk you take in the cloud or you can choose to ignore the risks entirely,” Leach said.

In addition to understanding your risk, Leach said there’s another key thing everyone should do when trying to determine the risks their cloud environment faces: know its entire surface. Like understanding your attack surface, understanding your cloud surface means knowing which APIs are running where, what services are managed by which provider, where the servers are located, what programming languages are in use, and anything else that could compromise your uptime.
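One way to picture a cloud surface inventory is as a simple table of services that can be queried for concentration risk. The sketch below is only an illustration: the `Service` record, the sample entries and the 50% threshold are assumptions made for this example, not anything Leach or Fastly prescribe.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Service:
    """One entry in a hypothetical cloud surface inventory."""
    name: str
    provider: str
    region: str
    language: str


def group_by_location(services: Sequence[Service]) -> Dict[Tuple[str, str], List[str]]:
    """Map each (provider, region) pair to the names of the services hosted there."""
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for service in services:
        groups[(service.provider, service.region)].append(service.name)
    return dict(groups)


def concentration_risks(
    services: Sequence[Service], threshold: float = 0.5
) -> List[Tuple[str, str]]:
    """Return locations hosting more than `threshold` of all services."""
    total = len(services)
    if total == 0:
        return []
    return sorted(
        location
        for location, names in group_by_location(services).items()
        if len(names) / total > threshold
    )


# Illustrative inventory; every name and placement here is made up.
inventory = [
    Service("checkout-api", "AWS", "us-east-1", "Java"),
    Service("auth-api", "AWS", "us-east-1", "Go"),
    Service("image-cdn", "Fastly", "global", "Rust"),
    Service("reports", "AWS", "us-east-1", "Python"),
]
risks = concentration_risks(inventory)
print(risks)
```

In this made-up inventory, three of the four services sit in a single provider region, so that location is flagged as the place where one outage would do the most damage.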

The usual tips for improving security posture also apply to the cloud, Leach said. Run exercises to simulate outages, take a total inventory of everything in your cloud environment, and create a map for yourself so you can quickly identify and react to the inevitable, because at the end of the day, outages are just that: as inevitable as a flat tire, a chipped windshield or any other unexpected disaster.
