Example of an ETL job in AWS Glue, and query in AWS Athena

Let’s walk through a simple end-to-end example of using AWS Glue to transform data from its raw format into a more queryable one, and then querying it with AWS Athena. We will do this through the console only; automating it with Terraform will be the focus of a future post.

All the data in this post comes from Apache logs, which can be downloaded from GitHub. The data has been split into 5 pieces to simulate logs being uploaded at 5 different times.

You can download the splits here: log 1, log 2, log 3, log 4, log 5

Uploading the data

To query the data in AWS, you first need to upload the data files to an S3 bucket. You can use the example files above, or any other Apache log files.

[Image: S3 bucket]
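This post uses the console for the upload, but the same step can be scripted. Below is a minimal sketch, assuming the boto3 library and an existing bucket; the function name upload_logs, the "logs/" key prefix, and the bucket and folder names in the comments are all hypothetical and not taken from this post.

```python
from pathlib import Path


def upload_logs(s3_client, bucket, log_dir, prefix="logs/"):
    """Upload every regular file in log_dir to the bucket and return the S3 keys used.

    s3_client is expected to be a boto3 S3 client, i.e. boto3.client("s3").
    The prefix default is an assumption made for this sketch.
    """
    keys = []
    for path in sorted(Path(log_dir).iterdir()):
        if not path.is_file():
            continue  # skip sub-directories
        key = f"{prefix}{path.name}"
        # upload_file(Filename, Bucket, Key) is the standard boto3 S3 client call.
        s3_client.upload_file(str(path), bucket, key)
        keys.append(key)
    return keys


# Example usage (hypothetical bucket and folder names, requires AWS credentials):
#   import boto3
#   upload_logs(boto3.client("s3"), "my-apache-logs-bucket", "./apache-logs")
```

Passing the client in as an argument keeps the function easy to try out with a stub before pointing it at a real bucket.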

Setting up an AWS Glue Job

In the AWS console, search for Glue. Once it is open, navigate to the Databases tab and create a new database; I created one called craig-test. The database acts as a data catalog: it stores schema information, not the actual data.
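For reference, the database creation step above can also be done outside the console. The following is a minimal sketch, assuming the boto3 Glue client and its create_database call; the helper name create_catalog_database and the optional description are hypothetical additions for illustration.

```python
def create_catalog_database(glue_client, name, description=""):
    """Create a Glue Data Catalog database and return the input that was sent.

    glue_client is expected to be a boto3 Glue client, i.e. boto3.client("glue").
    """
    database_input = {"Name": name}
    if description:
        database_input["Description"] = description
    # create_database(DatabaseInput={...}) registers the database in the catalog;
    # it only holds metadata (schemas), the data itself stays in S3.
    glue_client.create_database(DatabaseInput=database_input)
    return database_input


# Example usage (requires AWS credentials):
#   import boto3
#   create_catalog_database(boto3.client("glue"), "craig-test")
```

The call raises an error if a database with the same name already exists, so a real script would likely want to handle that case.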
