Skip to content

AWS DEA-C01 Preparation

Exam Parts and Weighting

Exam Details

  • 130 minutes (2hr 10 minutes)
  • 64 MCQ
  • Ask additional time as English is second language

DE Fundamentals

  • Data from Sources lands on to AWS Data Lake
  • AWS Data Lake is object file storage like S3.
  • Data landed as is is called RAW Data.
  • Then it is read by AWS Glue using Crawler.
  • Crawler crawls and builds Catalog.
  • Catalog is meta-data which lets do more operations and is schema-on-read.
  • Then data is processed and stored as Transformed Data back into AWS Data Lake.
  • Again crawler is built on Transformed Data, and then it can be used for Data Analytics by QuickSight, SageMaker, Athena or OpenSearch

flowchart LR

s1[Data Sources \n
Database
Object
IoT
Mobile]

s2[Raw Data \n
Amazon S3
]

s31[[Glue \n
Crawler
Catalog]]

s32[[Glue \n
Crawler
Catalog]]


s4[Transformed Data \n
Amazon S3
]

s5[Data Analytics \n
Amazon QuickSight
Amazon SageMaker
Amazon OpenSearch
Amazon Athena
]

s1 --> s2 --> s31 --> s4 --> s32 --> s5

subgraph sub1[Data Lake]
s2
s31
s4
s32
end

Data Lake vs Data Warehouse

Why Data Lake?

  • Cost: DL is cheap as it is file storage whereas DW being RDBMS is expensive.
  • DL is central repo for raw data, DW is central repo for historical data.
  • Structure - DL supports any format, DW store organised and structured.
  • Schema - DL is schema-on-read; DW is schema-on-write
  • Type - DL can store text, images, videos; DW is good for transactional, CRM, ERP etc.

2025-01-12 January 12, 2025