Data Scientist - Hybrid (3 times per week)
Fusemachines continues to actively pursue the mission of democratizing AI for the masses by providing high-quality AI education in underserved communities and helping organizations achieve their full potential with AI.
Salary Range: US$ 140,000-170,000/year
Important: Immigration Sponsorship Policy
This position is not eligible for employment visa sponsorship or transfer sponsorship, now or in the future. This includes:
- Direct Company Sponsorship: Such as H-1B, J-1, or TN visas
- Employer of Record: Listing Fusemachines as the immigration employer on any government documentation
- Written Documentation: Providing letters or other support for any work authorization (e.g., OPT, STEM OPT, CPT)
You should be strong in core data science and applied machine learning, comfortable working with real-world data, and capable of turning modeling work into production-ready systems.
Key Responsibilities
- Problem Framing & Stakeholder Partnership
  - Translate business questions into ML problem statements (classification, regression, time series forecasting, clustering, anomaly detection, recommendation, etc.)
  - Collaborate with stakeholders to define success metrics, evaluation plans, and practical constraints (latency, interpretability, cost, data availability)
- Data Analysis & Feature Engineering
  - Use SQL and Python to extract, join, and analyze data from relational databases and data warehouses
  - Perform data profiling, missingness analysis, leakage checks, and exploratory analysis to guide modeling choices
  - Build robust feature pipelines (aggregation, encoding, scaling, embeddings where appropriate) and document assumptions
- Model Development (Core ML)
  - Train and tune supervised learning models for tabular data (e.g., logistic/linear models, tree-based methods, gradient boosting such as XGBoost/LightGBM/CatBoost, and neural nets for structured data)
  - Apply strong tabular modeling practices: handling missing data, categorical encoding, leakage prevention, class imbalance strategies, calibration, and robust cross-validation
  - Build time series models (statistical and ML/DL approaches) and validate with proper backtesting
  - Apply clustering and segmentation techniques (k-means, hierarchical, DBSCAN, Gaussian mixtures) and evaluate stability and usefulness
  - Apply statistics in practice (hypothesis testing, confidence intervals, sampling, experiment design) to support inference and decision-making
- Deep Learning
  - Build and train deep learning models using PyTorch or TensorFlow/Keras
  - Use best practices for training (regularization, calibration, class imbalance handling, reproducibility, sound train/val/test design)
- Evaluation, Explainability, and Iteration
  - Choose appropriate metrics (AUC/F1/PR, RMSE/MAE/MAPE, calibration, lift, and business KPIs) and create evaluation reports
  - Perform error analysis and interpretation (feature importance/SHAP, cohort slicing) and iterate based on evidence
- Productionization & MLOps (Project-Dependent)
  - Package models for deployment (batch scoring pipelines or real-time APIs) and collaborate with engineers on integration
  - Implement practical MLOps: versioning, reproducible training, automated evaluation, monitoring for drift/performance, and retraining plans
- Documentation & Communication
  - Communicate tradeoffs and recommendations clearly to technical and non-technical stakeholders
  - Create documentation and lightweight demos that make results actionable
- You deliver models that perform well and move business metrics (revenue lift, cost reduction, risk reduction, improved forecast accuracy, operational efficiency)
- Your work is reproducible and production-aware: clear data lineage, robust evaluation, and a credible path to deployment/monitoring
- Stakeholders trust your judgment in selecting methods and communicating uncertainty honestly
- 3–8 years of experience in data science, machine learning engineering, or applied ML (mid-to-senior)
- Strong Python skills for data analysis and modeling (pandas/numpy/scikit-learn or equivalent)
- Strong SQL skills (joins, window functions, aggregation, performance awareness)
- Solid foundation in statistics (hypothesis testing, uncertainty, bias/variance, sampling) and practical experimentation mindset
- Hands-on experience across multiple model types, including:
  - Classification & regression
  - Time series forecasting
  - Clustering/segmentation
- Experience with deep learning in PyTorch or TensorFlow/Keras
- Strong problem-solving skills: ability to work with ambiguous goals and messy data
- Clear communication skills and ability to translate analysis into decisions
- Experience with Databricks for applied ML (e.g., Spark, Delta Lake, MLflow, Databricks Jobs/Workflows)
- Experience deploying models to production (APIs, batch pipelines) and maintaining them over time (monitoring, retraining)
- Experience with orchestration tools (Airflow, Prefect, Dagster) and modern data stacks (Snowflake/BigQuery/Redshift/Databricks)
- Experience with cloud platforms (AWS/GCP/Azure/IBM) and containerization (Docker)
- Experience with responsible AI and governance best practices (privacy/PII handling, auditability, access controls)
- Consulting or client-facing delivery experience
- Cloud certifications: AWS, Google Cloud, Microsoft Azure, or IBM (data/AI/ML tracks)
- Databricks certifications (Data Scientist, Data Engineer, or related)
- Causal inference experience (e.g., quasi-experimental methods, propensity scores, uplift/heterogeneous treatment effects, experimentation beyond A/B tests)
- Agentic development experience: designing and evaluating agentic workflows (tool use, planning, memory/state, guardrails) and integrating them into products
- Deep familiarity with agentic coding tools and workflows for accelerated product development (e.g., AI-assisted IDEs, code agents, automated testing/refactoring, repo-aware assistants), including strong judgment on quality, security, and maintainability