Data Platform Architecture
A full-stack data engineering pipeline, from raw sources to analytics-ready outputs.
Data Sources
3 tools
PostgreSQL
Operational DB source for transactional ingestion pipelines
APIs
REST & GraphQL endpoints as real-time data sources
Files / Storage
ADLS Gen2 & GCS blobs as raw data landing zones
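Transactional ingestion from an operational database is often done incrementally with a high-water mark: each run pulls only the rows changed since the last run. The sketch below assumes the source table has an `updated_at` column. It uses an in-memory SQLite database as a stand-in for PostgreSQL, and the `orders` table and `extract_incremental` helper are hypothetical.

```python
import sqlite3

def extract_incremental(conn, last_watermark):
    """Return rows changed after last_watermark, plus the new watermark."""
    rows = conn.execute(
        "SELECT id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    ).fetchall()
    # Advance the watermark to the latest change seen; keep it if nothing changed.
    new_watermark = rows[-1][2] if rows else last_watermark
    return rows, new_watermark

# In-memory SQLite stands in for the PostgreSQL source (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10.0, "2024-01-01T09:00"),
     (2, 25.5, "2024-01-01T10:00"),
     (3, 7.25, "2024-01-02T08:00")],
)

# First run: nothing loaded yet, so start from an empty-string sentinel.
batch, watermark = extract_incremental(conn, "")
print(len(batch), watermark)

# A source row is updated; the next run picks up only that change.
conn.execute("UPDATE orders SET amount = 12.0, updated_at = '2024-01-03T07:30' WHERE id = 1")
batch, watermark = extract_incremental(conn, watermark)
print(batch)
```

A strict `>` comparison can miss rows that commit late with an already-seen timestamp. Production pipelines often re-read a small overlap window or use change data capture instead.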
Streaming & Event Systems
3 tools
Kafka
High-throughput distributed event streaming for real-time pipelines
Google Pub/Sub
GCP-native async messaging & event delivery at scale
Azure Event Hub
Azure-native event streaming hub for high-volume telemetry
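Kafka routes keyed messages to partitions by hashing the key, and Event Hub partition keys behave in a similar way. All events for one key therefore land in one partition and keep their order. The sketch below shows the idea with a hypothetical 3-partition topic. It hashes with CRC32 purely for illustration, whereas Kafka's default partitioner uses murmur2, so real partition numbers will differ.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical topic size

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Same key -> same partition. CRC32 is an illustrative stand-in,
    # not the hash function Kafka actually uses.
    return zlib.crc32(key.encode("utf-8")) % num_partitions

# Hypothetical telemetry events: (key, payload)
events = [
    ("device-a", "temp=21.0"),
    ("device-b", "temp=19.5"),
    ("device-a", "temp=21.4"),
    ("device-c", "temp=22.1"),
    ("device-a", "temp=21.9"),
]

partitions = defaultdict(list)
for key, value in events:
    partitions[partition_for(key)].append((key, value))

# Each partition is an append-only log, so all readings for one device
# are read back in the order they were produced.
for p in sorted(partitions):
    print(p, partitions[p])
```

Ordering is guaranteed only within a partition, which is why the key is chosen to match the entity whose events must stay in sequence.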
Data Ingestion
3 tools
Apache Airflow
DAG-based orchestration for complex batch pipeline scheduling
Azure Data Factory
Cloud ETL/ELT for scalable Azure data movement & transformation
Databricks Auto Loader
Incremental file ingestion with automatic schema evolution
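DAG-based orchestration, as in Airflow, runs each task only after all of its upstream tasks have finished. Independent tasks can run side by side. The sketch below uses Python's standard `graphlib` module, not Airflow's API, to group a hypothetical pipeline into waves of tasks that could run in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dag = {
    "extract_postgres": set(),
    "extract_api": set(),
    "load_bronze": {"extract_postgres", "extract_api"},
    "build_silver": {"load_bronze"},
    "build_gold": {"build_silver"},
    "refresh_dashboard": {"build_gold"},
}

sorter = TopologicalSorter(dag)
sorter.prepare()  # also raises CycleError if the graph is not a DAG
waves = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())  # tasks whose upstreams have all finished
    waves.append(ready)
    # A real scheduler would mark tasks done only after they succeed.
    sorter.done(*ready)

for i, wave in enumerate(waves):
    print(i, wave)
```

The two extract tasks have no dependencies, so they form the first wave. Every later task waits for its single upstream.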
Data Processing
4 tools
Apache Spark
Distributed processing engine for large-scale dataset transformations
Databricks
Unified analytics platform, used at Rolls-Royce & Boots for lakehouse builds
Python
Primary language for data engineering, automation & ML workflows
SQL
Core language for data modelling, transformation & analytical queries
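Distributed engines such as Spark run a grouped aggregation in two steps. First, each partition computes partial results on its own. Then the partials are shuffled by key and merged. The pure-Python sketch below imitates that pattern on hypothetical sales data. In SQL it would be `SELECT region, SUM(amount) FROM sales GROUP BY region`.

```python
from collections import Counter
from functools import reduce

# Hypothetical sales records: (region, amount)
records = [("north", 120), ("south", 80), ("north", 30), ("east", 55),
           ("south", 20), ("east", 45), ("north", 50)]

def split_into_partitions(rows, n):
    # Round-robin split, standing in for a dataset spread across executors.
    return [rows[i::n] for i in range(n)]

def partial_sum(partition):
    # Map-side aggregation: each partition sums its own rows independently.
    totals = Counter()
    for region, amount in partition:
        totals[region] += amount
    return totals

def merge(a, b):
    # Combine partial results per key (Spark shuffles partials by key first).
    return a + b

partitions = split_into_partitions(records, 3)
partials = [partial_sum(p) for p in partitions]
totals = reduce(merge, partials, Counter())
print(dict(sorted(totals.items())))  # {'east': 100, 'north': 200, 'south': 100}
```

Aggregating before the shuffle means only one small partial per key per partition crosses the network, not every row.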
Lakehouse & Warehousing
4 tools
Delta Lake
ACID-compliant lakehouse storage and the foundation of the medallion architecture
Snowflake
Cloud data warehouse for high-performance analytical workloads
dbt
SQL-based data transformation with lineage, testing & documentation
Microsoft Fabric
Unified SaaS analytics platform combining data engineering, warehousing & BI
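The medallion architecture refines data in layers. Bronze holds raw data as landed, silver holds cleansed and conformed records, and gold holds business-level aggregates. Data tests such as dbt's built-in `unique` and `not_null` guard each layer. The in-memory sketch below walks hypothetical order data through the three layers. The `check_unique` and `check_not_null` helpers are hypothetical stand-ins for those tests, not dbt code.

```python
# Bronze: raw records as landed, including a duplicate and an incomplete row.
bronze = [
    {"order_id": "1", "customer": "acme", "amount": "100.0"},
    {"order_id": "2", "customer": "globex", "amount": "250.5"},
    {"order_id": "2", "customer": "globex", "amount": "250.5"},  # duplicate
    {"order_id": "3", "customer": None, "amount": "75.0"},       # missing customer
    {"order_id": "4", "customer": "acme", "amount": "30.0"},
]

def to_silver(rows):
    """Cleanse and conform: drop incomplete rows, deduplicate, cast types."""
    seen, out = set(), []
    for r in rows:
        if r["customer"] is None or r["order_id"] in seen:
            continue
        seen.add(r["order_id"])
        out.append({"order_id": int(r["order_id"]),
                    "customer": r["customer"],
                    "amount": float(r["amount"])})
    return out

def to_gold(rows):
    """Aggregate into a business-level table: revenue per customer."""
    revenue = {}
    for r in rows:
        revenue[r["customer"]] = revenue.get(r["customer"], 0.0) + r["amount"]
    return revenue

def check_unique(rows, column):
    values = [r[column] for r in rows]
    return len(values) == len(set(values))

def check_not_null(rows, column):
    return all(r[column] is not None for r in rows)

silver = to_silver(bronze)
# Fail fast if the silver layer breaks its contract.
assert check_unique(silver, "order_id") and check_not_null(silver, "customer")
gold = to_gold(silver)
print(gold)  # {'acme': 130.0, 'globex': 250.5}
```

On Delta Lake each layer would be a table, and the ACID guarantees keep readers from seeing a half-written silver or gold refresh.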
AI & Generative AI
6 tools
Azure OpenAI
GPT-4 & embeddings via Azure-native OpenAI service for enterprise AI apps
LangChain
Agent orchestration & RAG pipeline framework for LLM-powered workflows
Hugging Face
Pre-trained transformer models & open-source model hub for NLP & vision
MLflow
ML experiment tracking, model registry & deployment lifecycle management
Vector DB
Pinecone & ChromaDB for embedding storage, powering semantic search & RAG
Machine Learning
TensorFlow & scikit-learn for predictive modelling & feature engineering
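The retrieval step of a RAG pipeline embeds the user's question, finds the stored chunks whose embeddings are most similar, and passes those chunks to the LLM as context. The sketch below uses brute-force cosine similarity over toy 3-dimensional vectors. The documents and vectors are hypothetical. Real embeddings come from a model and have far more dimensions, and vector databases like Pinecone typically use approximate nearest-neighbour indexes rather than a full scan.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Toy "embeddings" (hypothetical values, not model output).
documents = {
    "engine maintenance schedule": [0.9, 0.1, 0.0],
    "store loyalty programme terms": [0.1, 0.8, 0.3],
    "turbine blade inspection guide": [0.8, 0.2, 0.1],
}

def top_k(query_vec, docs, k=2):
    """Return the k document texts most similar to the query vector."""
    scored = sorted(docs.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [text for text, _ in scored[:k]]

# Pretend embedding of "how often should engines be inspected?"
query = [0.85, 0.15, 0.05]
context = top_k(query, documents)
prompt = "Answer using only this context:\n" + "\n".join(context)
print(context)
```

The assembled `prompt` is what an orchestration layer such as LangChain would then send to the model together with the user's question.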
Serving & Analytics
2 tools
Power BI
Enterprise BI dashboards & self-service analytics for stakeholders
Knowledge Graph
Neo4j graph modelling for entity relationships & connected data queries
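A typical connected-data question asks which entities are linked to a given one within a few relationships. In Cypher this is roughly a variable-length pattern such as `MATCH (a {name: 'Engine-X'})-[*1..3]-(b) RETURN DISTINCT b`. The pure-Python sketch below answers the same kind of question with breadth-first search over hypothetical (source, relationship, target) triples. It is not Neo4j code.

```python
from collections import deque

# Hypothetical entity graph as (source, relationship, target) triples.
triples = [
    ("Engine-X", "HAS_PART", "Blade-7"),
    ("Blade-7", "SUPPLIED_BY", "Supplier-A"),
    ("Engine-Y", "HAS_PART", "Blade-9"),
    ("Blade-9", "SUPPLIED_BY", "Supplier-A"),
    ("Engine-Y", "OPERATED_BY", "Airline-Z"),
]

# Adjacency list; edges are traversed in both directions, like an undirected pattern.
graph = {}
for src, rel, dst in triples:
    graph.setdefault(src, []).append((rel, dst))
    graph.setdefault(dst, []).append((rel, src))

def within_hops(start, max_hops):
    """Return entities reachable from start in at most max_hops edges, with distances."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for _, neighbour in graph.get(node, []):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    del dist[start]
    return dist

print(within_hops("Engine-X", 3))
```

Raising the hop limit to 4 reaches `Engine-Y` through the shared supplier. That is the kind of indirect relationship a graph model makes cheap to query.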
Platform Engineering
4 tools
Docker
Containerisation for reproducible, portable data pipeline environments
Kubernetes
Container orchestration for scalable, resilient data platform workloads
Azure
Primary cloud: ADF, ADLS Gen2, Synapse, Fabric & Azure DevOps
Google Cloud
GCP for BigQuery, Pub/Sub, Dataflow & Cloud Composer