CloudWatch Is of the Devil, but I Must Use It
Let’s discuss Amazon CloudWatch.
For those fortunate enough not to be caught in the weeds of Amazon Web
Services (AWS), CloudWatch is, and I quote from the official
AWS description, “a monitoring and
management service built for developers, system operators, site reliability
engineers (SRE), and IT managers.” This is all well and good, except for the
part where there isn’t any single named constituency who enjoys working with
the product. Allow me to dispense some monitoring heresy.
Better yet, let me describe this in the context of the 14 Amazon
Leadership Principles that reportedly guide every decision Amazon makes.
When you take a hard look at CloudWatch’s complete failure across all
14 Leadership Principles, you wonder how this product ever made it out
the door in its current state.
I’ll start with billing. Normally left for the tail end of articles like
this, the CloudWatch billing paradigm is so terrible that I’m leading with
it instead. You get billed per metric, per month. You get billed per
thousand metrics you request to view via the API. You get billed per
dashboard per month. You get billed per alarm per month. You get charged for
logs based upon data volume ingested, data volume stored and “vended logs”
that get published natively by AWS services on behalf of the customer. And,
you get billed per custom event. All of this can best be summed up as
“nobody on the planet understands how your CloudWatch metrics and logs get
billed”, and it leads to scenarios where monitoring vendors can inadvertently
cost you thousands of dollars by polling CloudWatch too frequently. When the
AWS charges are larger than what you’re paying your monitoring vendor, it’s
not a pleasant feeling.
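To see how polling frequency turns into real money, here is a rough, hypothetical estimator. It assumes only what the paragraph above says: the API charge is a flat price per thousand metrics requested. The price is a parameter you must take from current AWS pricing. The $0.01 figure and the 5,000-metric workload in the example are made-up inputs, not quotes.

```python
# Hypothetical estimator for the "polling CloudWatch too often" problem.
# Assumption: API requests are billed at a flat price per 1,000 metrics
# requested; supply the real price yourself, this sketch does not know it.

def monthly_polling_cost(metrics, polls_per_hour, price_per_thousand, days=30):
    """Return (metrics_requested, dollars) for one month of polling."""
    polls_per_month = polls_per_hour * 24 * days
    metrics_requested = metrics * polls_per_month  # every poll fetches every metric
    dollars = metrics_requested / 1000 * price_per_thousand
    return metrics_requested, dollars


if __name__ == "__main__":
    # Made-up inputs: 5,000 metrics polled once a minute at an assumed
    # $0.01 per 1,000 metrics requested.
    requested, cost = monthly_polling_cost(5_000, 60, 0.01)
    print(f"{requested:,} metrics requested, about ${cost:,.2f}")
    # prints "216,000,000 metrics requested, about $2,160.00"
```

The only lever in that formula you usually control is the poll interval. Dropping from once a minute to once every five minutes (12 polls per hour) cuts the same estimate by a factor of five.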
“Invent and Simplify”
CloudWatch Logs, CloudWatch Events, Custom Metrics, Vended Logs and Custom
Dashboards all mean different things internally to CloudWatch from what you’d
expect, compared to metrics solutions that actually make some fathomable
level of sense. There are, thus, multiple services that do very different
things, all operating under the “CloudWatch” moniker. For example, it isn’t
particularly intuitive to most people that scheduling a Lambda function to
invoke once an hour requires a custom CloudWatch Event. It feels overly
complicated, extremely confusing, and very quickly, you find yourself in a
situation where you’re having to build complex relationships to monitor
things that are themselves far simpler.
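For a sense of the moving parts behind “run this function once an hour”, here is a minimal sketch using boto3-style `events` and `lambda` clients (`put_rule`, `put_targets`, `add_permission`). The rule name, function name and ARN are hypothetical placeholders. The clients are passed in so the function can be exercised without AWS credentials.

```python
def schedule_hourly(events, lambda_client, rule_name, function_name, function_arn):
    """Wire a Lambda function to an hourly CloudWatch Events rule.

    `events` and `lambda_client` are boto3-style clients, e.g.
    boto3.client("events") and boto3.client("lambda"). All names are
    hypothetical placeholders. Returns the rule ARN.
    """
    # 1. Create (or update) a rule that fires once an hour.
    rule = events.put_rule(
        Name=rule_name,
        ScheduleExpression="rate(1 hour)",
        State="ENABLED",
    )
    rule_arn = rule["RuleArn"]

    # 2. Point the rule at the function.
    response = events.put_targets(
        Rule=rule_name,
        Targets=[{"Id": f"{rule_name}-target", "Arn": function_arn}],
    )
    if response.get("FailedEntryCount", 0):
        raise RuntimeError(f"put_targets failed: {response.get('FailedEntries')}")

    # 3. Let the events service invoke the function; without this
    #    permission the rule fires but the invocation is rejected.
    lambda_client.add_permission(
        FunctionName=function_name,
        StatementId=f"{rule_name}-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )
    return rule_arn
```

Three API calls across two services for a cron job is the “complex relationships” point in miniature.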
All business folks, when asked what they want from a monitoring platform,
will respond with something that resembles “a dashboard” or “a
single pane of glass view”. CloudWatch offers minutiae up the wazoo, but
it categorically offers no global view, no green/yellow/red status
indicator that gives you even a glimmer of the overall health of your site.
Want a graph of every core on your instance’s CPU for the past 30
seconds? Easy! Want to know whether your entire company should be putting out
the burning fire that is the current production state of your website? Keep
looking; CloudWatch has nothing to offer you.
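Nothing stops you from building a crude traffic light yourself, which is rather the point. Here is a minimal sketch. It assumes your alarms already exist, and it applies a policy of my own invention: any alarm firing means red, and any alarm lacking data means yellow. It reads alarm states through a boto3-style `describe_alarms` paginator.

```python
from typing import Iterable


def rollup_status(alarm_states: Iterable[str]) -> str:
    """Collapse CloudWatch alarm states into one traffic-light color.

    Policy (an assumption, not a CloudWatch feature): any ALARM -> red;
    otherwise any INSUFFICIENT_DATA -> yellow; no alarms at all -> yellow,
    since silence is not evidence of health; otherwise green.
    """
    states = list(alarm_states)
    if "ALARM" in states:
        return "red"
    if not states or "INSUFFICIENT_DATA" in states:
        return "yellow"
    return "green"


def site_status(cloudwatch) -> str:
    """Read every metric alarm via a boto3-style CloudWatch client."""
    paginator = cloudwatch.get_paginator("describe_alarms")
    states = (
        alarm["StateValue"]
        for page in paginator.paginate()
        for alarm in page.get("MetricAlarms", [])
    )
    return rollup_status(states)
```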
“Insist on the Highest Standards”
By its very nature, CloudWatch feels like small thinking. The entire
experience, start to finish, smacks of “what’s the absolute least we
could do and get away with it?” They built their MVP, and then just
sorta…stopped, frozen in amber. They created a set of building blocks,
except they didn’t solve the problem of “how do I monitor my AWS resources?”
Instead, it feels like the entire team phoned it in and let a large marketplace
of monitoring vendors develop as a result. None of those vendors has the
level of access to the raw data that CloudWatch does; all of them have built
better products. You’d think the CloudWatch team would take a clue from
the innovation that’s rapidly happening in this space, but that would
require someone to Learn and Be Curious.
“Are Right, a Lot”
Recent data is “eventually consistent”, so you always get graphs like the
one shown in Figure 1.
Figure 1. Example CloudWatch Graph
In reality, that’s a terrifying thing to see on an accurate
dashboard: something is obviously very wrong with your site! For better or
worse, the “accurate” description doesn’t apply to CloudWatch, and that’s
just how your graphs always look. “Your metrics will be eventually
consistent” is very nearly the last thing you want to hear about your
monitoring platform, second only to “what metrics?” This ties directly
Let me be very clear here: the real issue isn’t the ingestion problem.
Absolutely every vendor on the planet has the same issue; you can’t
display data you don’t have. Where CloudWatch drops the ball is in
exposing this behavior to the end user without any explanation of what’s
happening. Thus, until you grow accustomed to it, you have a heart-stopping
moment of “what the hell just happened to the site” every time you
look at a dashboard. This conditions you to be entirely too calm when
looking at your dashboards right after a real disaster has happened. If you trust
what the CloudWatch dashboards show you, you’re making a terrible
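If you pull the raw numbers yourself, for example from the parallel `Timestamps` and `Values` lists that `GetMetricData` returns, one defensive habit is to ignore the freshest datapoints before deciding whether anything is on fire. The five-minute settling window below is an arbitrary assumption, not a documented CloudWatch guarantee. Tune it against what you actually observe.

```python
from datetime import datetime, timedelta, timezone


def settled_points(timestamps, values, now, settle=timedelta(minutes=5)):
    """Keep only (timestamp, value) pairs at least `settle` older than `now`.

    Points newer than the cutoff may still be filling in, so they are
    dropped rather than graphed as a sudden cliff. Input order is preserved.
    The default window is an assumption, not a CloudWatch guarantee.
    """
    cutoff = now - settle
    return [(t, v) for t, v in zip(timestamps, values) if t <= cutoff]


if __name__ == "__main__":
    # Made-up series, newest first: the two freshest points look like an outage.
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    stamps = [now - timedelta(minutes=m) for m in (1, 4, 6, 10)]
    vals = [3.0, 40.0, 100.0, 98.0]
    for t, v in settled_points(stamps, vals, now):
        print(t.isoformat(), v)
```

This hides the false cliff at the cost of hiding a real one for the same few minutes. That trade-off is exactly why the behavior ought to be explained in the product rather than papered over by each user.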
If you’re using Lambda or Fargate, you have no choice but to use CloudWatch
Logs, where searching for anything is fundamentally terrible. If you’re
using CloudWatch Logs to diagnose anything, congratulations: you’re
diving so deep, you may drown before making it back to the surface.
For example, if I have a Lambda function that throws an error, in order to
diagnose the problem, I have to:
- Discover that it encountered an error in the first place by looking at
the invocation error CloudWatch dashboard. I also could set up a filter to
run a continuous query on the logs and alert when something shows up, except
that’s not natively supported; I need a third-party tool for that (such
- Go diving into a variety of CloudWatch log groups and find the one named
after the specific erroring function.
- Scroll manually through the many, many, many pages of log groups to find the
specific invocation that threw an error.
- Realize that the JSON object that’s retained isn’t enough to troubleshoot
with, cry in despair, and go write an article just like this one.
- Do some quick math and realize I’m paying an uncomfortable percentage of my
AWS bill for a service that’s of rather marginal utility at best.
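The first three steps can at least be scripted. Here is a minimal sketch. It assumes Lambda’s default `/aws/lambda/<function-name>` log group naming, and it assumes your function writes the literal term `ERROR` when it fails. It follows boto3-style `filter_log_events` pagination by hand. It does nothing for the fourth step: if the log line doesn’t contain what you need, no query will conjure it up.

```python
def find_lambda_errors(logs, function_name, start_ms, pattern="ERROR"):
    """Return (log_stream, timestamp_ms, message) for matching log events.

    `logs` is a boto3-style CloudWatch Logs client. The log group name
    follows Lambda's default convention; `pattern` is a CloudWatch Logs
    filter pattern, and "ERROR" is only an assumption about your log format.
    """
    request = {
        "logGroupName": f"/aws/lambda/{function_name}",
        "filterPattern": pattern,
        "startTime": start_ms,  # milliseconds since the Unix epoch
    }
    matches = []
    while True:
        response = logs.filter_log_events(**request)
        for event in response.get("events", []):
            matches.append(
                (event["logStreamName"], event["timestamp"], event["message"])
            )
        token = response.get("nextToken")
        if not token:
            break  # no more pages
        request["nextToken"] = token  # pages can be empty; keep following tokens
    return matches
```

Against a real account, the call would look something like `find_lambda_errors(boto3.client("logs"), "my-function", start_ms=int((time.time() - 3600) * 1000))`, where `my-function` is a placeholder.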
All of your metrics, all of your logs: they’re locked away inside
CloudWatch’s various components. You’re not going to find a
“page me when this threshold is exceeded” option in CloudWatch; your
options are relegated to “design an alert delivery pipeline with baling
wire and SNS” or pay a non-AWS vendor for yet another monitoring product.
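For the record, the baling-wire route looks roughly like this. You create an alarm on the `AWS/Lambda` `Errors` metric whose action publishes to an SNS topic, and you still have to connect that topic to whatever actually pages a human. The topic ARN, function name, period and threshold are hypothetical. The request is built as a plain dict so you can inspect it before handing it to a boto3-style `put_metric_alarm`.

```python
def error_alarm_request(function_name, topic_arn, threshold=1, period_s=300):
    """Build put_metric_alarm arguments for a Lambda function's Errors metric.

    Values here are illustrative assumptions: alarm when the summed error
    count over one `period_s` window reaches `threshold`.
    """
    return {
        "AlarmName": f"{function_name}-errors",
        "Namespace": "AWS/Lambda",
        "MetricName": "Errors",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": period_s,
        "EvaluationPeriods": 1,
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "notBreaching",  # no invocations is not an error
        "AlarmActions": [topic_arn],  # SNS topic; paging is still your problem
    }


def create_error_alarm(cloudwatch, function_name, topic_arn, **kwargs):
    """Send the request through a boto3-style CloudWatch client."""
    request = error_alarm_request(function_name, topic_arn, **kwargs)
    cloudwatch.put_metric_alarm(**request)
    return request["AlarmName"]
```

The last strand of wire is subscribing your pager to the topic. That is one more call, `sns.subscribe(TopicArn=..., Protocol="https", Endpoint=...)`, pointed at whatever endpoint your paging service hands you.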
CloudWatch keeps all of your metrics. It keeps your logs. It lets you build
custom dashboards to view your metrics all in one place. The building blocks
of a great service are already here; it’s the expression of that utility
that falls short, sometimes severely. The fact that large monitoring
vendors are premier sponsors of AWS events would be laughable if CloudWatch
ever were to get its act together. You wouldn’t need a third party to make
sense of a pure AWS environment, and many of them would starve to death as
they grew too weak to interrupt your conversation to ask if they can scan
your badge. Choosing to use CloudWatch vs. literally anything else is like
buying a car. “Why yes, I’d like to buy the Yugo instead of the Honda.
After all, it checks all the boxes of technically being a car, so it’s fine,
“Disagree and Commit”
It may be that the root cause of many of CloudWatch’s failings
comes from the product engineers who built it misunderstanding this
(admittedly slippery!) Leadership Principle. It’s intended to mean
passionately expressing your reservations about a decision, but once
that decision has been reached, you commit to it.
Unfortunately, it seems that the engineering teams responsible for
CloudWatch decided to “Disagree in Commits” and inflict their
arguments upon the world in the form of the product.
If I were to go on the internet and post about how terrible nearly any
other AWS service was, people would rally to that service’s defense.
It’s the internet; people will do that. But when these and many more
similar comments about CloudWatch appear, and nobody from AWS pipes in to
say “wow, I’m sorry, why do you feel that way?”, it’s
abundantly clear that if any folks on the CloudWatch team actually care about
the product, they’ve been locked in a malfunctioning restroom stall for
the better part of a decade. These comments go back at least that far, but
the company’s “Bias for Action” principle.
“Hire and Develop the Best”
The people who build CloudWatch aren’t terrible at their jobs; I
actually believe they don’t quite grasp how their product is perceived.
Given that it’s poor form to write a rant like this and not offer
suggestions for positive improvement, here are some product enhancements I’d
like to see:
- Give me the ability to rate-limit API calls at arbitrary levels rather than
being surprised at month end by a bill that’s roughly Zanzibar’s
- “Here’s an error that your Lambda function threw; here’s the log output from
that specific function” should be at most two clicks away, not 30.
- If your dog has a litter of 14 puppies, perhaps you don’t want to name
them all subtle variations of the term “CloudWatch”. The proliferation of
services and companies that all start with the word “Cloud” is the subject
of an entirely separate rant.
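Until the service offers something like the first wish, the closest substitute is a client-side limiter in whatever code of yours calls the API. This is a plain token bucket, a swapped-in technique rather than anything AWS provides, and the rate and burst numbers are yours to choose. It only bounds the calls you make, not the calls a third-party vendor makes on your behalf.

```python
import time


class TokenBucket:
    """Client-side token bucket: bursts of up to `capacity` calls,
    refilled continuously at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.tokens = float(capacity)  # start full
        self.clock = clock  # injectable so tests can control time
        self.last = clock()

    def try_acquire(self, cost=1.0):
        """Spend `cost` tokens if available; return False instead of blocking."""
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


def limited_call(bucket, fn, *args, **kwargs):
    """Call `fn` only if the bucket allows it; otherwise refuse loudly."""
    if not bucket.try_acquire():
        raise RuntimeError("API budget exhausted; skipping call")
    return fn(*args, **kwargs)
```

You would then route metric reads through it, e.g. `limited_call(bucket, cloudwatch.get_metric_data, MetricDataQueries=..., StartTime=..., EndTime=...)`. Refusing rather than queueing is deliberate, since a skipped poll is cheaper than a surprise bill.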
Please don’t misunderstand me. I use, enjoy and promote AWS services,
and I’m considered to be “an authentic voice” largely because in
addition to praising things that are excellent, I’ll call out things
that aren’t, as I’ve just done. I’ve built my career and
business on working within that ecosystem. I find AWS employees to be
smart and well-intentioned, and most of their services quite good.
CloudWatch could get there with some work, but it’s got a bunch of very
painful usability issues that keep it from being good, let alone great.