Data Engineering Roadmap

How big data engineering projects work

  • Terraform to build infrastructure on AWS and tear it down.
  • GitHub Actions for CI/CD to build, test, and deploy changes.
  • AWS ECR to host Docker images.
  • AWS Lambda to run code.
  • AWS EventBridge to trigger an action when an event occurs.
  • AWS S3 to store data dumps.
  • Docker to run code in a container, independent of the underlying infrastructure.
  • AWS IAM roles to give resources permission to talk to each other.
  • dbt to transform data (the T in ETL).
  • How to read from an API using Lambda and load the result to S3?
  • What is a Makefile? It is a file of named shell commands, used here to keep all the infra build and docker-compose commands in one place.
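
One possible answer to the Lambda question in the list above, as a minimal Python sketch rather than a definitive implementation. The API URL, bucket name, and key layout are hypothetical placeholders, and the `fetch` and `s3` parameters exist only so the handler can be tried locally without AWS access. It assumes the Lambda's IAM role allows `s3:PutObject` on the bucket.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Hypothetical values: replace with a real endpoint and bucket.
API_URL = "https://example.com/api/data"
BUCKET = "my-data-dumps"


def fetch_json(url):
    """Download the API response and parse it as JSON."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def build_key(now):
    """Build a date-partitioned S3 key, e.g. raw/2024/02/01/dump.json."""
    return now.strftime("raw/%Y/%m/%d/dump.json")


def handler(event, context, fetch=fetch_json, s3=None):
    """Lambda entry point: read from the API and write the payload to S3."""
    if s3 is None:
        # boto3 is imported lazily so the module also loads without it.
        import boto3
        s3 = boto3.client("s3")
    payload = fetch(API_URL)
    key = build_key(datetime.now(timezone.utc))
    s3.put_object(
        Bucket=BUCKET,
        Key=key,
        Body=json.dumps(payload).encode("utf-8"),
    )
    return {"bucket": BUCKET, "key": key}
```

Lambda calls `handler(event, context)` and the defaults take over. An EventBridge rule can then invoke it on a schedule.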

Terraform - IaC

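It is a general-purpose IaC tool that lets you build infrastructure on AWS, Azure, and other platforms from configuration files, written in HCL (HashiCorp Configuration Language) or an equivalent JSON syntax. For example: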

  • You need not do things on AWS manually, such as creating an IAM role.
  • You can create an IAM role on AWS.
  • You can create a rule on AWS EventBridge, such as one that triggers a Lambda function daily.
  • You can create an AWS ECR repository, where you can push the Docker image.
  • You can create a policy to save logs to CloudWatch Logs.
  • Basically, it is code that lets you create resources on AWS.
  • It is Infrastructure as Code (IaC).
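
To make the list above concrete, here is a rough Python sketch that builds a Terraform configuration in Terraform's JSON syntax (a file ending in `.tf.json`). It declares an ECR repository, a daily EventBridge rule, and an IAM role. The resource labels and the names passed in are hypothetical. The provider block, the Lambda function, and the target that links the rule to the function are left out for brevity.

```python
import json


def build_config(repo_name, rule_name, role_name):
    """Return a dict in Terraform's JSON configuration syntax."""
    # Trust policy that lets the Lambda service assume the role.
    assume_role_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "lambda.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }
    return {
        "resource": {
            # ECR repository to push Docker images to.
            "aws_ecr_repository": {
                "images": {"name": repo_name},
            },
            # EventBridge rule that fires once a day.
            "aws_cloudwatch_event_rule": {
                "daily": {
                    "name": rule_name,
                    "schedule_expression": "rate(1 day)",
                },
            },
            # IAM role; Terraform expects the policy as a JSON string.
            "aws_iam_role": {
                "lambda_role": {
                    "name": role_name,
                    "assume_role_policy": json.dumps(assume_role_policy),
                },
            },
        }
    }


if __name__ == "__main__":
    # Hypothetical names; save the output as main.tf.json.
    print(json.dumps(build_config("my-repo", "daily-trigger", "my-lambda-role"), indent=2))
```

With the output saved as `main.tf.json` and a provider configured, `terraform init`, `terraform apply`, and `terraform destroy` would create and tear down these resources.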

Terraform itself is a command-line tool. It can also be run from a container image (for example, in a CI job), where it executes the configuration on a machine.

Alternatives: AWS CDK (Cloud Development Kit)

GitHub Actions - CI/CD

A workflow has jobs. A job has steps. Steps are commands.

A workflow is a set of jobs, and each job is a sequence of steps, where a step runs a shell command or a reusable action. The jobs run on a machine called a runner (e.g., an Ubuntu server). Jobs can depend on each other. The workflow is triggered by an event (e.g., on a push to the repo, run the build and deploy jobs).
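
To illustrate how jobs that depend on each other end up ordered, here is a small Python sketch. It is not how GitHub implements scheduling, only a model of the idea, and the job names are hypothetical. In a real workflow file the dependencies are declared with the `needs` key on each job.

```python
from graphlib import TopologicalSorter

# Hypothetical jobs: each job maps to the jobs it depends on.
jobs = {
    "build": [],
    "test": ["build"],
    "deploy": ["test"],
}


def run_order(jobs):
    """Return an order in which every job comes after its dependencies."""
    return list(TopologicalSorter(jobs).static_order())


if __name__ == "__main__":
    for name in run_order(jobs):
        print("run job:", name)
```

Jobs with no dependency between them can run in parallel; only the dependent ones have to wait.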

Makefile

A Makefile defines named targets that run shell commands, here docker-compose commands such as up and down.

Commands to use it:

# compose up
make up

# run build format test etc
make ci

# compose down
make down

2025-01-12 Feb 2024