Data Engineer
Key Responsibilities
- Pipeline Development: Design and implement end-to-end ETL/ELT pipelines using Python and Spark to process large-scale healthcare datasets.
- Healthcare Interoperability: Lead the ingestion and transformation of clinical data exchanged in standard formats, specifically HL7 (v2/v3) and FHIR (Fast Healthcare Interoperability Resources).
- Architecture Management: Implement and optimize a Medallion Architecture (Bronze, Silver, Gold layers) to ensure data quality, lineage, and governance.
- Data Modeling: Create complex SQL queries and data models optimized for healthcare analytics, ensuring consistency across clinical and claims data.
- Optimization: Tune the performance of Spark jobs and SQL queries to handle high-velocity healthcare data streams efficiently.
Requirements
- Domain Expertise: Minimum of 10 years of experience specifically within the healthcare industry (Payer, Provider, or Health Tech).
- Interoperability Standards: Hands-on experience with HL7 and FHIR is a must-have. Candidates must understand FHIR resources, profiles, and bundles.
- Technical Core:
  - Spark: Advanced proficiency in PySpark for distributed data processing.
  - Python: Strong programming skills for data manipulation and automation.
  - SQL: Expert-level ability to write, debug, and optimize complex queries.
- Data Architecture: Proven experience implementing Medallion Architecture (Data Lakehouse patterns).
- Experience with cloud platforms (AWS, Azure, or GCP).
- Familiarity with healthcare privacy regulations and security frameworks (HIPAA, GDPR, or HITRUST).
- Experience with workflow orchestration tools like Airflow or Dagster.
- Knowledge of medical coding systems (ICD-10, LOINC, SNOMED).