How to Enrich AWS Cost and Usage Data with Infrastructure Data in Amazon Athena
Introduction
AWS provides Cost and Usage Reports (CUR) as a comprehensive set of cost and usage data including granular breakdowns by time period, by resource, and by other distinctions. To gain actionable insights on cost data as part of a FinOps strategy, we will enrich the Cost and Usage Reports data with more detail of the corresponding infrastructure and resources so that educated decisions can be made on how to optimize cloud cost.
In this tutorial, we will show you how to load AWS's Cost and Usage Reports into Amazon Athena and enrich the Cost and Usage Reports data with infrastructure data synced by CloudQuery. This will enable you to query your data and execute different cost optimization and FinOps use cases to understand and manage the financials of your cloud infrastructure.
Walkthrough
1. Infrastructure Data Prerequisites
In a previous guide, we talked about how to load infrastructure data into Amazon Athena. If you haven't done so already, follow the guide to setup CloudQuery to sync infrastructure data into S3 and query that data with Amazon Athena.
After completing the guide above, you should have the following completed:
- CloudQuery syncing data from AWS to S3.
- The ability to query data in Athena.
2. Start Gathering Cost and Usage Reports
To set up Cost and Usage Reports, we'll follow AWS documentation for how to set up the infrastructure necessary for Cost and Usage Reports.
By the end of this step, we'll have created the following:
- A S3 Bucket to hold our Cost and Usage Reports.
- A new Cost and Usage Report.
Let's start by creating a new S3 bucket (opens in a new tab) for our Cost and Usage Reports. To do so in AWS console, navigate to the Cost and Usage Reports page of the AWS Billing Console (opens in a new tab) (use the region of your choice) and select the following options:
- Include resource IDs
- Enable Data Refresh
On the next page, click Configure S3 Bucket and create a bucket. You can choose to use an existing bucket or customize the bucket settings if your organization has specific S3 requirements.
Once the bucket has been confirmed with the default bucket policy, select the following settings:
- Select the time granularity of your choice.
- Enable report data integration for Amazon Athena.
After verifying the settings of the Cost and Usage Report, select create report!
Please see here (opens in a new tab) for the CLI equivalent.
AWS's billingreports.amazonaws.com
will deliver these reports to our new S3 Bucket. Keep in mind that it may take up to 24 hours for AWS to start delivering reports to the Amazon S3 bucket. If you are creating a fresh S3 bucket, we suggest creating the bucket and coming back the next day to complete the rest of this guide! Storing the billing reports data in Amazon S3 will incur standard S3 costs (opens in a new tab).
You should see multiple files in the S3 bucket once the cost and usage reports have been successfully delivered to your S3 bucket. In the billing console, the cost and usage report created will also show a time stamp for Data last refreshed
such as the screenshot below.
3. Set up Amazon Athena for Cost and Usage Reports
Now that we have successfully created an S3 Bucket to receive Cost and Usage Reports, we will now setup Athena for querying our Cost and Usage Reports.
For this step, we'll use the CloudFormation template that AWS generates (opens in a new tab) in the bucket we previously created for Cost and Usage Reports. Athena can be setup manually (opens in a new tab) if CloudFormation does not work for your usage.
aws cloudformation create-stack \
--stack-name cost-and-usage-crawler \
--template-url https://cq-demo-cur.s3.amazonaws.com/cost-and-usage/cloudquery-demo-reports/crawler-cfn.yml \
--capabilities CAPABILITY_IAM
--region us-east-1
Remember to replace the template-url
parameter with the object URL for the CloudFormation template that AWS created in the Cost and Usage S3 Bucket.
This template creates the following resources:
- 3 IAM Roles
- An AWS Glue Database
- An AWS Glue crawler
- 2 Lambda functions
- An Amazon S3 notification
As needed, adjust the CloudFormation template depending on your organizational requirements such as required IAM controls, tags, and encryption settings.
You should now see 2 databases loaded in Amazon Athena:
- CloudQuery Data
- AWS Cost and Usage Reports
We can run the following query to validate that Athena is ready (opens in a new tab) for queries on the Cost and Usage data, replacingathenacurcfn_cloudquery_demo_reports
with your own database name. A status of READY
means we can query the Athena database. If the status is UPDATING
, Athena may return incomplete results.
SELECT *
FROM "athenacurcfn_cloudquery_demo_reports"."cost_and_usage_data_status"
4. Run Athena Queries on Infrastructure and Cost Data
Now, let's try a few use cases and run example Athena queries on our Infrastructure and Cost Data.
We'll start by looking at our biggest expenses by AWS resource ID:
SELECT
line_item_resource_id,
line_item_product_code,
sum(line_item_blended_cost) AS cost,
month
FROM "athenacurcfn_cloudquery_demo_reports"."cloudquery_demo_reports"
WHERE year='2023' and line_item_resource_id!=''
GROUP BY line_item_resource_id, line_item_product_code, month
HAVING sum(line_item_blended_cost) > 0
ORDER BY cost DESC;
By including the line_item_resource_id!=''
in our Athena query, we exclude cost of items that are not directly associated with resource IDs such as cost for AWS Support. This query will return the most expensive resources ordered by cost for 2023.
Let's look at another use case. One method to reduce costs is to use more cost-effective EBS volume types. In December of 2020, AWS introduced the availability (opens in a new tab) of the gp3 new Amazon EBS general purpose volumes which can provide up to 20% cost savings over existing gp2 volumes.
SELECT
costquery.line_item_resource_id,
costquery.cost,
vols.volume_type,
vols.attachments,
vols.arn,
vols.tags,
vols.state,
vols.snapshot_id,
vols.size,
vols.create_time
FROM (
SELECT
line_item_resource_id, line_item_product_code,
SUM(line_item_blended_cost) AS cost, month
FROM "athenacurcfn_cloudquery_demo_reports"."cloudquery_demo_reports"
WHERE
line_item_resource_id LIKE 'vol-%'
GROUP BY
line_item_resource_id, line_item_product_code,
month
HAVING SUM(line_item_blended_cost) > 0
ORDER BY cost DESC
) as costquery
LEFT JOIN
"cloudquery-athena-example-json"."aws_ec2_ebs_volumes" as vols
ON costquery.line_item_resource_id = vols.volume_id
WHERE vols.volume_type = 'gp2'
The above query will return all gp2 EBS volumes in 2023 and show the cost as well as useful characteristics we synced from AWS with CloudQuery including attachment information and the state of the EBS volume. We can see the current and historical cost, evaluate the potential cost savings, and decide whether these are good candidates for migration from gp2 to gp3 while avoiding potential impact to applications based off the infrastructure details added to the cost and usage data.
For more information on how to migrate from gp2 to gp3 volumes, see the AWS blog post (opens in a new tab)
Summary
We've now walked through how to load AWS's Cost and Usage Reports into Amazon Athena and enriched that data with infrastructure data synced by CloudQuery. Additionally, we've run the following 2 example queries on our infrastructure and cloud cost data:
- Listing our highest expenses by AWS resource ID.
- Listing potential candidates for gp2 to more cost-effective gp3 EBS volume migration.
We'd love to see what more you do with cost optimization and FinOps with CloudQuery and AWS Cost and Usage Reports. If you have comments or questions about using CloudQuery for Cloud Cost Management and Optimization, we would love to hear from you! Reach out to us on GitHub (opens in a new tab) or Discord (opens in a new tab)!
References
AWS: Querying Cost and Usage Reports using Amazon Athena (opens in a new tab)
AWS: What are AWS Cost and Usage Reports (opens in a new tab)
AWS Blogs: Starting your Cloud Financial Management journey: Cost visibility (opens in a new tab)