Table of contents
- CloudWatch Unified Navigation (easier correlation)
- Logs Insights News (less silos, more analytics)
- OpenSearch ❤️ CloudWatch (less silos, more analytics)
- Transaction Search (deeper distributed tracing)
- Application Signals
- X-Ray to OTEL
- AI Investigation
- Auditing Tracing Configuration
- Synthetics
- New Metrics (more coverage)
- Summary
- Resources
There were a lot of CloudWatch launches this year. In the launch session, they were summarized as:
- More Coverage
- Easier Correlation
- Less silos, more analytics
- Deeper distributed tracing
- Aided investigations
We’ve grouped them into our own categories; let’s go through all of them.
CloudWatch Unified Navigation (easier correlation)
Let’s start with something very cool, the CloudWatch Unified Navigation.
This feature aims to bring CloudWatch into almost every service pane that is available on AWS. Basically, it is a new sidebar that you can trigger.
You will mostly see this feature under the name of the “Explore related” button (naming is hard, yes).
The new feature should help you find things that belong together. Often you are looking at a certain trace and you know that something else belongs to it as well, e.g. another trace, a log, or some metrics. That is exactly what this is meant for.
Finding this feature was harder than I thought. The documentation states that it is available on different pages of CloudWatch. In the launch session, there was also a compass icon named “Explore related”. Somehow, that wasn’t the case for me.
You need to look for it in the top-right corner. It is not the compass icon described in the documentation 🤷🏽♂️ but a laptop with a wrench - I already submitted feedback.
The pages you can access it from:
- CloudWatch Metrics (navigation, legend, data points)
- Console toolbar
- In different services (e.g. Lambda → Monitoring → … → Explore related)
Once you open up this pane, you will see additional information. This is quite neat! First of all, the tracing overview page got a nice overhaul. Let’s hope this comes to the general trace map as well.
From this pane, you can see all related metrics, logs, and traces. You can also go further by clicking on the connected resources, for example, another service or API that is used by these services. Then you can see the metrics, logs, and traces of that resource.
For everybody who knows how hard it can be to even find the correct log group name, this can be a lifesaver.
Here is a list of supported services for the Explore related page. For some of the services mentioned, it somehow doesn’t work yet - for example, for my Step Function.
Overall, a very cool feature in my opinion, especially for quickly finding related logs, traces, and components.
Logs Insights News (less silos, more analytics)
We love Logs Insights. And if you use CloudWatch as your main observability solution, you will use Logs Insights daily. There were a couple of launches for Logs Insights itself. I’ll summarize them here.
New Languages to analyze logs - SQL and PPL
You can now use two more languages to analyze logs: Piped Processing Language (PPL) and SQL.
PPL follows a typical pipe approach, like you’re used to from Linux:
fields `@timestamp`, `@message`, `@logStream`, `@log`, `logLevel`, `event.path`
| where `correlationIds.httpMethod` = "POST"
| stats count() by `event.path`
And SQL, well, is SQL.
SELECT A.`message` as Message, A.`event.path` as Path, A.`lambdaFunction.coldStart` as IsColdStart, B.`status` as ApiGwStatus
FROM `/aws/lambda/dev-ApiStack-RestApiConstructRestApiHandler8040241-nX7hVTvM1SMY` as A
INNER JOIN `dev-ApiStack-RestApiConstructRestApiAccessLogGroup6FB97884-Jru7iSV5QAsj` as B
ON A.`correlationIds.requestId` = B.`requestId`
In SQL, you have the ability to use cool SQL features like
- joins
- aggregations
- and all the other stuff SQL has to offer 😉
I’m not sure if I will use PPL a lot, but I will definitely start using SQL to analyze my logs. In the example query above, I join my Lambda logs with my API Gateway logs based on the request ID to get some further data, like the integration status 😎
I like that a lot!
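To give you an idea of how this looks outside the console, here is a minimal boto3 sketch that runs a Logs Insights query programmatically and polls for the results. The log group name is a placeholder, and the new queryLanguage option for SQL/PPL is an assumption based on this launch, so treat it as a sketch rather than a recipe.

```python
import time
import boto3

logs = boto3.client("logs")

# Placeholder log group; replace with one of your own.
LOG_GROUP = "/aws/lambda/my-function"

# A classic Logs Insights query. According to this launch, SQL and PPL queries
# can be selected via an additional queryLanguage parameter (assumption on my
# side - verify against the current StartQuery documentation).
QUERY = """
fields @timestamp, @message
| filter ispresent(`correlationIds.requestId`)
| sort @timestamp desc
| limit 20
"""

start = logs.start_query(
    logGroupNames=[LOG_GROUP],
    startTime=int(time.time()) - 3600,  # last hour
    endTime=int(time.time()),
    queryString=QUERY,
)

# Poll until the query finishes, then print each result row as a dict.
while True:
    response = logs.get_query_results(queryId=start["queryId"])
    if response["status"] in ("Complete", "Failed", "Cancelled"):
        break
    time.sleep(1)

for row in response["results"]:
    print({field["field"]: field["value"] for field in row})
```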
10,000 Log Groups
There used to be a limit of 50 log groups in one query. You can now query up to 10,000 log groups if you select them by a prefix or use all available log groups.
Field Indexes
You can now also index fields of the logs you are analyzing. This will improve the performance of queries and hence reduce costs.
For example, here I’ve created a new index on all my Lambda log groups (prefix /aws/lambda/dev) on the request ID in my correlation IDs.
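If you prefer to script this, the following boto3 sketch shows the idea: it lists all log groups with the /aws/lambda/dev prefix and creates a field index policy on each of them. The PutIndexPolicy call and the shape of its policy document are assumptions based on the field index documentation, so double-check them before using this.

```python
import json
import boto3

logs = boto3.client("logs")

# Find all Lambda log groups with the dev prefix, as in the example above.
paginator = logs.get_paginator("describe_log_groups")
for page in paginator.paginate(logGroupNamePrefix="/aws/lambda/dev"):
    for group in page["logGroups"]:
        # Assumption: PutIndexPolicy takes a JSON document listing the fields
        # to index - verify the exact shape in the field index docs.
        logs.put_index_policy(
            logGroupIdentifier=group["logGroupName"],
            policyDocument=json.dumps({"Fields": ["correlationIds.requestId"]}),
        )
        print(f"Created field index policy for {group['logGroupName']}")
```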
OpenSearch ❤️ CloudWatch (less silos, more analytics)
OpenSearch now natively integrates with CloudWatch. You can create dashboards for some pre-defined use cases like:
- VPC Flow Logs
- CloudTrail Logs
- WAF Logs
The idea is quite cool. You can use it everywhere you can use OpenSearch Direct Query, which is kind of a serverless variant of OpenSearch. You only pay for the usage (but not too little).
Their pricing still seems a bit harsh and hard to calculate. Here is a pricing example from their landing page:
The total monthly charges = $732
- $3 (Direct Query OCU)
- $350 (Serverless Indexing)
- $29 (Serverless Storage)
- $350 (Serverless Search)
This is with a monthly ingest of over 1 TB!
Great feature, especially for getting an ELK stack-like experience. Let’s see if we can build dashboards ourselves soon without the need to use a pre-defined dashboard.
Transaction Search (deeper distributed tracing)
Transaction Search is another very interesting piece! Once you enable it, it will transform your X-Ray traces into OpenTelemetry spans. These spans help you gain visibility into your application.
For me, this simply looks like distributed tracing for now. But maybe this is AWS’s way of supporting more OpenTelemetry instead of only X-Ray. Maybe this will even replace X-Ray at some point? 🤔
We’ve enabled transaction search for our GitHub repository tracker (our example CloudWatch Book application) and got a few spans:
Once you open one of those, you will be redirected to the actual X-Ray trace.
You can also do some basic aggregations:
But for us, some services are missing, so that needs further investigation.
Application Signals
With this one, I needed to think first, because Application Signals already exists as a category of services.
Services like Evidently (RIP), RUM, and Synthetics fall into the category of Application Signals. However, this launch also describes the service or feature Application Signals. Yes, naming things is hard.
This feature already existed and was launched last year at re:Invent.
Application Signals wants to give you an overall view of your application and full visibility into it. The launch post promises three main features for developers:
- Developers can answer any question related to performance through an interactive visual editor
- Developers can diagnose rarely occurring issues
- Logs offer advanced features for transaction spans
With Application Signals, you can also define Service Level Objectives (SLOs). These help you understand whether you meet the goals you’ve set for yourself, for example around availability, latency, or errors.
Application Signals works at the level of whole services. You can enable it for:
- ECS
- EKS
- Lambda
But you can (I think) also enable it for everything the CloudWatch agent can run on. You enable it by installing the CloudWatch Agent or the AWS Distro for OpenTelemetry (ADOT).
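For Lambda, for example, enabling Application Signals boils down to attaching the AWS-managed ADOT layer and setting the exec wrapper environment variable. The layer ARN below is a placeholder, and the exact setup is an assumption based on the Application Signals documentation, so verify it for your runtime and region.

```python
import boto3

lam = boto3.client("lambda")

# Placeholder ARN: use the AWS-managed ADOT layer for your runtime and region.
ADOT_LAYER_ARN = "arn:aws:lambda:eu-central-1:123456789012:layer:AWSOpenTelemetryDistroPython:1"

# Assumption: the ADOT layer plus the exec wrapper turns on auto-instrumentation.
# Note: Environment replaces all existing variables, so merge them in practice.
lam.update_function_configuration(
    FunctionName="my-function",
    Layers=[ADOT_LAYER_ARN],
    Environment={"Variables": {"AWS_LAMBDA_EXEC_WRAPPER": "/opt/otel-instrument"}},
)
```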
We’ve activated Transaction Search for our example web application for the CloudWatch Book, and an Application Signals service was automatically created as well:
The canaries (we have one) are not connected yet, but we already get an overview like that.
If you want to learn more about Application Signals, make sure to check out the amazing One Observability workshop.
X-Ray to OTEL
I think one main insight from all of these launches is that AWS now supports more and more OpenTelemetry! It seems that AWS is basing its new services on OTEL spans instead of its own format. This is quite cool because it allows you to use third-party software for traces as well.
AI Investigation
Investigations is the first 👆🏽 AI feature of CloudWatch at this re:Invent. The idea is to help you debug and investigate any issues you have. You can connect it to your chat applications via SNS, and it also allows you to connect your ticketing system, like Linear, Jira, or whatever you use.
You can trigger a sample investigation to get an idea of what it looks like:
There are different panes you can see:
Feed: The feed is the overview you are used to from a ticketing system. You can see what your fellow developers posted to this investigation.
Suggestions: Suggestions are auto-generated by Q. It looks at recent deployments, configs, and much more to give you an idea of how you can improve. This looks quite nice!
Overall, the idea is amazing. Of course, it heavily depends on how well it will actually work. I’m amazed by it and will make use of it. Let’s see how well it works in a production app with lots of traffic!
Auditing Tracing Configuration
CloudWatch gives you a new overview of your tracing settings. You can turn it on for your whole account or organization. Once activated, it will search for resources in your account.
It then shows you an overview of activated traces of the following resource types:
- EC2 Instances
- VPCs
- Lambda Functions
The idea here is to give you an overview of all the different tracing settings within your infrastructure. You don’t want to miss traces of a crucial application. Especially since AWS clearly recommends sampling 100% of your traces for the OTEL spans, this will help you with that!
Unfortunately, for our accounts, it didn’t work yet and it couldn’t find any resources.
Synthetics
Synthetics also got two minor updates. With Synthetics, you can build E2E web tests. Typically, you use a headless browser for that, i.e. a browser you can control from code. There is now a new Playwright runtime for that, which is quite nice! What comes with it as well is that you can store your logs directly in CloudWatch instead of as text files in S3. That’s quite cool!
Synthetics will now also finally delete Lambda resources when canaries are removed. This was always quite a hassle: if you removed a canary, you needed to remove the CloudWatch Log Group, the Lambda function, and everything else yourself. This should now be automated!
New Metrics (more coverage)
CloudWatch announced several new metrics for some services.
Event Source Mapping Metrics for Lambda
There are now metrics available for the actual event source mapping (ESM) in Lambda. This is quite useful. If you connect SQS with a Lambda, for example, the main magic happens within the event source mapping. Until now, this was kind of a black box. Now you can see metrics like:
- PolledEventCount (events read by the ESM)
- InvokedEventCount (events invoking the Lambda function)
- FilteredOutEventCount (events filtered out)
- FailedInvokeEventCount (events failing to invoke)
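If you want to act on these new metrics, a simple starting point is an alarm on failed invokes. Here is a minimal boto3 sketch; the AWS/Lambda namespace with an EventSourceMappingUUID dimension is an assumption on my side, so check the metric details in your account before relying on it.

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Assumption: the new ESM metrics are published in the AWS/Lambda namespace
# with an EventSourceMappingUUID dimension - check the metrics in your account.
cloudwatch.put_metric_alarm(
    AlarmName="esm-failed-invokes",
    Namespace="AWS/Lambda",
    MetricName="FailedInvokeEventCount",
    Dimensions=[{"Name": "EventSourceMappingUUID", "Value": "replace-with-your-esm-uuid"}],
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=0,
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="notBreaching",
)
```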
ECS Container Insights enhanced observability
ECS now has an additional mode called enhanced observability. Before, it was only called ECS Container Insights; the enhanced observability mode gives you some more metrics.
You can set it up very easily: aws ecs put-account-setting --name containerInsights --value enhanced
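If you prefer the SDK over the CLI, here is a small boto3 sketch doing the same thing, plus the per-cluster variant. The cluster name is a placeholder, and treating "enhanced" as a valid cluster setting value is an assumption based on this launch.

```python
import boto3

ecs = boto3.client("ecs")

# Same as the CLI command above: enable enhanced Container Insights account-wide.
ecs.put_account_setting(name="containerInsights", value="enhanced")

# Or per cluster (assumption: "enhanced" is accepted as a cluster setting value).
ecs.update_cluster_settings(
    cluster="my-cluster",  # placeholder cluster name
    settings=[{"name": "containerInsights", "value": "enhanced"}],
)
```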
Some more metrics are:
- ContainerMemoryUtilization
- ContainerCpuUtilization
- ContainerCpuReserved
Database Insights
Database Insights gives you more insights into your database (🥁). Only Aurora MySQL and Aurora PostgreSQL are supported right now. It will mainly summarize logs and metrics from your DB in a dashboard.
There are two modes: Standard and Advanced.
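If you want to switch an Aurora cluster to the advanced mode from code, a boto3 sketch could look like the one below. The DatabaseInsightsMode parameter name and the Performance Insights requirement are assumptions based on the launch material, and the cluster identifier is a placeholder, so verify the details before using this.

```python
import boto3

rds = boto3.client("rds")

# Assumption: the mode is switched via a DatabaseInsightsMode parameter and
# the advanced mode requires Performance Insights - verify both in the docs.
rds.modify_db_cluster(
    DBClusterIdentifier="my-aurora-cluster",  # placeholder cluster identifier
    DatabaseInsightsMode="advanced",
    EnablePerformanceInsights=True,
    ApplyImmediately=True,
)
```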
Network Flow Monitoring
Network Flow Monitoring allows you to get network data into CloudWatch. You need to install an agent for that. If you do, you get near real-time information about your network traffic. While this is a bit bigger than “we’ve added some new metrics”, in the end, you’ll have new metrics 😉
Summary
This re:Invent had some amazing launches. The CloudWatch launches alone were amazing!
TLDR;
- More Coverage: More Metrics
- Easier Correlation: CloudWatch Unified Navigation
- Less silos, more analytics: OpenSearch integration
- Deeper distributed tracing: X-Ray → OTEL spans
- Aided investigations: AI Q Developer Assistant
Improving the user experience of CloudWatch should be one of the top priorities for AWS, in my opinion. CloudWatch is often the only reason developers still log into the console a lot. The unified navigation is a great first step.
Making use of OTEL spans instead of its own X-Ray format is a great idea as well, from my perspective. It allows AWS to support more observability tools and gives customers the ability to export traces into third-party tools and correlate them with more systems.
Let’s see what the future brings!
Resources
AWS News was a great help for that!
OpenTelemetry on AWS: Observability at Scale with Open-Source