About
Data Engineer with 3 years of experience designing and optimizing scalable ETL/ELT pipelines and real-time streaming systems on AWS and GCP. Proven ability to deliver high-impact data products, processing over 20M daily records and 5K events/sec using Python, PySpark, and Kafka, with a strong focus on automation, performance, and delivering ML-ready data through clean, modular architectures. Seeking to apply expertise in cloud data platforms and real-time analytics to advance data-driven initiatives and strengthen data infrastructure.
Work
Google (via Vaco Binary Semantics)
Software Engineer
Gurugram, Haryana, India
Summary
Spearheaded the development and optimization of high-performance data pipelines and real-time streaming solutions for critical business and blockchain data on Google Cloud Platform.
Highlights
Optimized a modular batch pipeline for Amazon SP API reports, processing over 10K reports daily and reducing end-to-end latency by 70% through parallelized Pub/Sub and autoscaling Dataflow jobs.
Converted complex transformation logic into reusable Dataflow Flex Templates with config-driven schema mapping and parameterization, enabling rapid onboarding of new sellers and marketplaces without code changes.
Designed and implemented a low-latency real-time pipeline to ingest, transform, and load 10M+ Ethereum transactions daily into Google's proprietary graph database, leveraging Kafka and PySpark Structured Streaming.
Reduced data propagation latency by 65% in blockchain streaming through optimized micro-batch intervals, checkpoint tuning, partitioning, and parallelized graph API ingestion.
Enhanced a global AQI data pipeline by integrating enrichment, geospatial tagging, and deduplication components, increasing daily data coverage by 60% to over 20M records across 120+ countries.
Improved pipeline uptime to 99.8% and reduced ingestion failures by 90% in the AQI pipeline by implementing robust validation rules and schema drift handling.
Wipro
Project Engineer
Noida, Uttar Pradesh, India
Summary
Developed and automated scalable ETL pipelines and real-time streaming solutions for high-volume business data, contributing to operational analytics and efficiency improvements for the Marelli Project.
Highlights
Developed scalable ETL pipelines to ingest high-volume business data into centralized data lakes utilizing AWS Glue, Redshift, and S3.
Designed real-time data streaming flows with Kinesis and Python, enabling sub-minute operational analytics for critical business insights.
Automated reporting and reconciliation workflows using SQL and AWS QuickSight, reducing manual effort by 70%.
Education
Bharati Vidyapeeth's College of Engineering
B.Tech
Electronics and Communication
Grade: 8.3 CGPA
Awards
Best New Talent & Lead Recommendation Award
Awarded By
Vaco
Awarded 'Best New Talent' for rapid production delivery and exceptional performance within the first three months of tenure.
Star of the Month (x2)
Awarded By
Wipro
Recognized twice for outstanding contributions to ETL and reporting innovations, demonstrating exceptional performance and impact.
Certificates
Microsoft Certified: Azure Fundamentals
Issued By
Microsoft
Skills
Programming Languages
Python, SQL, Java.
Data Engineering
PySpark, Apache Kafka, Airflow, Hadoop, ETL/ELT Pipelines, Real-time Streaming, Data Modeling, Dataflow Flex Templates, Structured Streaming, Watermarking, Schema Drift Handling, Data Reliability.
Cloud Platforms
AWS (Redshift, S3, Lambda, Glue, Kinesis), GCP (BigQuery, Composer, Pub/Sub, Looker, Dataflow).
Databases
PostgreSQL, MySQL, Graph Databases.
Tools & Methodologies
Git, REST APIs, Shell Scripting, CI/CD, Agile, Production Support, Monitoring, AWS QuickSight.