Summary:
Responsible for building and maintaining data pipelines using Azure Data Factory, Synapse, and Databricks. Ensures reliable data ingestion, transformation, and storage across Azure platforms. Collaborates with ML Engineers and Solution Architects to deploy and monitor machine learning models, automate retraining workflows, and maintain version control of models and datasets. Performs data exploration, quality checks, and preprocessing to support scalable and production-ready ML solutions.
Roles & Responsibilities:
- Build and maintain data pipelines using Azure Data Factory, Synapse, or Databricks.
- Ensure data quality, transformation, and ingestion from various sources.
- Optimize data storage (e.g., Azure Data Lake, Blob Storage).
- Collaborate with ML Engineers and Solution Architects to deploy models into production environments.
- Design automated retraining and monitoring pipelines to track model drift and accuracy.
- Ingest, explore, and preprocess structured and unstructured data using:
  - Azure Data Lake Storage
  - Azure Synapse Analytics
  - Azure Data Factory for data pipelines
- Perform exploratory data analysis in notebooks (e.g., Azure Machine Learning Notebooks or Azure Databricks).
- Assess data quality, detect anomalies, and recommend data cleansing strategies.
Professional & Technical Skills:
- Must-Have Skills: Proficiency in Python on Azure.
- Good-to-Have Skills: Experience with cloud-based application development.
- Strong understanding of web development frameworks and libraries.
- Familiarity with database management systems and data modeling.
- Experience with Azure Data Factory, Synapse Analytics, and Databricks for data pipeline development.
- Proficiency in data ingestion, transformation, and storage using Azure Data Lake and Blob Storage.
- Skill in data exploration, preprocessing, and quality assessment for structured and unstructured data.
- Familiarity with deploying and monitoring machine learning models in production environments.
- Ability to build automated model retraining and drift monitoring pipelines.
- Strong skills in SQL, Python, and/or PySpark for data analysis and pipeline scripting.
- Knowledge of version control tools and practices (e.g., Git).
- Ability to collaborate effectively with ML Engineers and Solution Architects.
- Solid understanding of scalable, cloud-based data architectures.
Educational Qualification:
15 years of full-time education
Shift Timing:
Candidates should be willing to work in shifts and to work from the office.
Location:
Any location in India.