Recro · LinkedIn · Posted 1mo ago

Data Engineer

Bengaluru, Karnataka, India

Role - Data Engineer

Experience - 3-6 yrs

Location - Bangalore


We are looking for a Data Engineer II (SDE-2) to join our data team. The ideal candidate will play a key role in developing a high-performance, scalable data lakehouse, moving us toward sub-minute data latency and unified batch/streaming compute. This is an engineering-heavy role in which you will manage complex CDC flows, optimize distributed query engines, and leverage AI to accelerate our development lifecycle.

Technical Priorities

  • Real-time CDC: Ownership of high-throughput ingestion from RDBMS to the Lakehouse using Debezium and PeerDB.
  • Lakehouse Architecture: Designing and optimizing table formats (Iceberg, Delta, Hudi) for both performance and storage efficiency.
  • Unified Compute: Developing robust ETL/ELT frameworks in PySpark and Flink (handling both batch and streaming workloads).
  • Infrastructure & Ops: Managing data workloads on AWS (EMR, EKS, MSK, S3) and automating everything via GitLab/GitHub Actions.
  • Query & BI: Tuning Trino or ClickHouse to power real-time dashboards in Metabase, Superset, and Power BI.

Requirements

  • Experience: 3–5 years in Data Engineering, specifically with distributed systems and cloud-native architectures.
  • Coding: Expert-level Python/PySpark and SQL.
  • Familiarity with Go, Java, or Scala is a plus.
  • Infrastructure: Hands-on experience with AWS (S3, EKS, MSK) and Infrastructure-as-Code.
  • Orchestration: Experience with Airflow or Temporal for complex workflow management.
  • AI-Native: Proficiency in using AI tools (Claude, Codex, Copilot) to write, test, and document code efficiently.
  • Systems Thinking: Ability to explain the trade-offs between different storage formats and processing frameworks.
  • Tech Leadership: Drive key technical initiatives by preparing technical requirement documents (TRDs) and participating actively in design reviews.
  • Domain Modelling: Hands-on experience designing OLAP domain models, including fact and dimension tables, the various slowly changing dimension (SCD) types, and one-big-table (OBT) patterns.
  • Self-Starter: Lead the team technically and bring in new ideas that contribute to the growth of the charter.
  • Customer First: Work with Product and key stakeholders, adding value to their business workflows through data and analytics.

Our Tech Stack

  • Ingestion: Debezium, PeerDB, Olake
  • Storage: Delta, Iceberg, Hudi (S3-based Lakehouse)
  • Compute: PySpark, Flink, EMR, EKS
  • Streaming: MSK (Kafka)
  • Query Engines: Trino, ClickHouse
  • Orchestration: Airflow, Temporal
  • DevOps: GitLab, GitHub Actions, Terraform
  • Visualization: Metabase, Superset, Tableau, Power BI