Overview
- Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL.
- Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.
- Athena is used for analytics and not to prepare data for analytics.
- Athena supports many formats
- CSV
- JSON
- ORC
- Avro
- Parquet
- Possibly others
- Athena is commonly used with Amazon QuickSight for reporting/dashboards
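As a rough illustration of "serverless SQL over S3", the sketch below submits a query and waits for it to finish. It assumes a client that behaves like `boto3.client("athena")`; the helper name `run_athena_query`, the database name, and the results bucket are hypothetical placeholders, not values from these notes.

```python
import time


def run_athena_query(client, sql, database, output_location, poll_seconds=1.0):
    """Submit a query and block until it reaches a terminal state.

    `client` is assumed to behave like boto3.client("athena").
    Returns (query_execution_id, final_state).
    """
    response = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        # Athena writes result files to this S3 location.
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = response["QueryExecutionId"]

    # Poll until the query succeeds, fails, or is cancelled.
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return query_id, state
        time.sleep(poll_seconds)


# Usage (needs boto3 and AWS credentials; all names are placeholders):
#   import boto3
#   client = boto3.client("athena")
#   run_athena_query(client, "SELECT 1", "my_database",
#                    "s3://my-results-bucket/athena/")
```

Passing the client in as a parameter keeps the helper easy to exercise with a stub, without touching AWS.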
Pricing
- Flat rate based on the amount of data scanned
- $5.00 per TB of data scanned
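To make the per-TB rate concrete, here is a minimal cost estimate. It uses the $5.00 figure from the notes and assumes a decimal terabyte (10^12 bytes); it deliberately ignores any rounding or per-query minimums, so treat the numbers as ballpark only. The function name is illustrative.

```python
PRICE_PER_TB_USD = 5.00   # rate quoted in the notes above
BYTES_PER_TB = 10 ** 12   # assumption: decimal terabyte


def estimate_query_cost(bytes_scanned: int) -> float:
    """Approximate cost in USD for a query that scans `bytes_scanned` bytes."""
    return bytes_scanned / BYTES_PER_TB * PRICE_PER_TB_USD


# Scanning a full 2 TB dataset versus only 200 GB of it.
full_scan = estimate_query_cost(2 * 10 ** 12)
reduced_scan = estimate_query_cost(200 * 10 ** 9)
print(f"full scan:    ${full_scan:.2f}")
print(f"reduced scan: ${reduced_scan:.2f}")
```

Since cost tracks bytes scanned, anything that reduces the scan (columnar formats, compression, partitioning) reduces the bill.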
Use Cases
- Business intelligence
- Analytics
- Report on, analyze, and query VPC Flow Logs
- ELB logs
- CloudTrail trails
- Ad-hoc queries
- Pretty much query any logs that originate from your
Use columnar data formats (e.g., ORC, Parquet) for cost savings (scan less!)
Compress Data for smaller retrievals
- bzip2
- gzip
- lz4
- snappy
- zlib
- zstd
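A quick way to see why compression shrinks what has to be retrieved is to compare sizes locally. The sketch below uses only Python's standard library (so it covers gzip, bzip2, and zlib, not every codec listed) and made-up log rows; real data is far less repetitive and will compress less dramatically.

```python
import bz2
import gzip
import zlib

# Made-up, highly repetitive log lines purely for illustration.
rows = ["2024-01-01,GET,/index.html,200"] * 10_000
raw = "\n".join(rows).encode("utf-8")

sizes = {
    "raw": len(raw),
    "gzip": len(gzip.compress(raw)),
    "bzip2": len(bz2.compress(raw)),
    "zlib": len(zlib.compress(raw)),
}

for name, size in sizes.items():
    print(f"{name:>5}: {size} bytes")
```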
Partition Datasets in S3 for Easier Querying on Virtual Columns
s3://yourBucket/pathToTable
/<PARTITION_COLUMN_NAME>=<VALUE>
/<PARTITION_COLUMN_NAME>=<VALUE>
/<PARTITION_COLUMN_NAME>=<VALUE>
/etc...
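The layout above can be sketched as a small helper that builds a `column=value` prefix from partition values. The column names (`year`, `month`, `day`) and the table name in the sample query are hypothetical; only the bucket path comes from the notes.

```python
def partition_prefix(table_root: str, partitions: dict) -> str:
    """Build an S3 prefix with one `column=value` segment per partition."""
    segments = [f"{column}={value}" for column, value in partitions.items()]
    return "/".join([table_root.rstrip("/"), *segments]) + "/"


prefix = partition_prefix(
    "s3://yourBucket/pathToTable",
    {"year": 2024, "month": "01", "day": "15"},  # hypothetical partition columns
)
print(prefix)
# s3://yourBucket/pathToTable/year=2024/month=01/day=15/

# A query that filters on partition columns only needs to read the
# matching prefixes (table name is a placeholder).
query = "SELECT * FROM my_table WHERE year = '2024' AND month = '01'"
```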
Use Larger Files to Minimize Overhead
Federated Query
- Allows you to run SQL queries across data stored in relational, non-relational, object, and custom data sources
- Uses Data Source Connectors that run on AWS Lambda to run Federated Queries, for example
- Store the results back in S3
Exam Alerts
- If a question asks about analyzing data in S3 using serverless SQL, you should be thinking Athena