Custom Software Engineer
Project Role Description: Develop custom software solutions to design, code, and enhance components across systems or applications. Use modern frameworks and agile practices to deliver scalable, high-performing solutions tailored to specific business needs.
Must-have skills: PySpark
Good-to-have skills: NA
Minimum 3 years of experience required
Educational Qualification: 15 years of full-time education
We are seeking a skilled Data Engineer to design, build, and optimize scalable data pipelines on the Enterprise Data Platform (EDL) running on Cloudera (CDP) on AWS. The role involves working with the Hadoop ecosystem, building PySpark-based data processing pipelines, orchestrating workflows using Oozie and Control-M, integrating with AWS services (S3, IAM, EC2), and delivering secure, reliable, cloud-ready data solutions.
Key Responsibilities
Data Engineering & Platform Development
Build scalable data ingestion and processing pipelines using Cloudera CDP on AWS, Hadoop (HDFS, Hive, YARN), PySpark/Spark SQL, and AWS S3.
Design data flows between HDFS and S3 using DistCp, Spark read/write, file compaction, archival, and lifecycle policies.
Develop and optimize PySpark jobs for performance, partitioning, caching, and YARN resource allocation.
Build FastAPI microservices for data access, metadata, and operational endpoints.
Design scalable data models across Raw, Presentation, and Data Provisioning layers.
Workflow Orchestration & Automation
Develop and manage Apache Oozie workflows and coordinators including triggers, SLAs, HDFS/S3 path management, and kill/recovery actions.
Implement enterprise scheduling using Control-M with dependencies, calendars, alerts, SLAs, and automated retries.
Automate operational tasks using Shell/Bash scripting for monitoring, file operations, HDFS maintenance, and backfill processes.
Cloud, Storage & Platform Operations
Work with Cloudera on AWS including Cloudera Manager, Hive on Tez, Spark on YARN, cluster scaling, queues, and capacity planning.
Use AWS services: S3 (storage, versioning, lifecycle, encryption) and, optionally, IAM, EMR, EC2, and Lambda.
Implement data security and governance using Ranger, Kerberos, TLS, and audit logs; data masking/tokenization is a nice-to-have.
API Engineering (FastAPI on YARN + Hive)
Build FastAPI REST services that interact with Hive tables (via HiveServer2, Impala, LLAP, or JDBC/ODBC) and with Spark jobs on YARN.
Develop APIs to submit Spark/PySpark jobs, track job status/logs/YARN application IDs, execute Hive queries, and return dataset results.
Expose metadata, lineage, health checks, and data insights via APIs.
Implement asynchronous APIs for long-running Spark jobs.
Develop FastAPI middleware for authentication, logging, monitoring, retries, and circuit breaking.
Quality, Monitoring & Reliability
Implement data quality checks including schema validation, null checks, and reconciliation.
Monitor pipelines using Spark UI, YARN ResourceManager metrics, and Cloudera Manager alerts.
Optimize cluster and application performance, reduce cost, and improve pipeline efficiency.
Perform production issue triage, RCA, and preventive automation.
Required Skills & Experience
3–10+ years of experience in Data Engineering.
Hands-on experience with Cloudera (CDH/CDP), Hadoop, HDFS, Hive/Impala, PySpark/Spark SQL, Oozie workflows/coordinators/SLAs, Control-M scheduling, and AWS S3 architecture.
Strong Python development for ETL frameworks, exception handling, and testing.
Strong Linux/Shell scripting skills.
Understanding of distributed systems concepts including shuffle, skew handling, broadcast joins, and spill tuning.
Experience with CI/CD tools such as Git, Jenkins, Azure DevOps, or GitHub Actions.
Nice-to-Have
Cloudera Machine Learning (CML) or Cloudera Data Science Workbench (CDSW)
Delta Lake / Iceberg / Hudi
Cloudera Manager administration
Data catalog & lineage (Atlas)
Exposure to Kafka, NiFi, Informatica
HBase
Core Competencies
Ownership and accountability
Strong analytical and performance tuning mindset
Collaboration with cross-functional teams
Excellent documentation and communication skills
Pune
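Equal Employment Opportunity Statement
All employment decisions are made without regard to age, race, creed, color, religion, sex, national origin, ancestry, disability status, veteran status, sexual orientation, gender identity or expression, genetic information, marital status, citizenship status, or any other basis protected by federal, state, or local law.
Job candidates are not obligated to disclose sealed or expunged records of conviction or arrest as part of the hiring process.
Accenture is committed to providing veteran employment opportunities to our service men and women.
Please read Accenture's Recruiting and Hiring Statement for more information on how we process your data during the recruiting and hiring process.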
We work with one shared purpose: to deliver on the promise of technology and human ingenuity. Every day, more than 775,000 of us help our stakeholders continuously reinvent. Together, we drive positive change and deliver value to our clients, partners, shareholders, communities, and each other.
We believe that delivering value requires innovation, and innovation thrives in an inclusive and diverse environment. We actively foster a workplace free from bias, where everyone feels a sense of belonging and is respected and empowered to do their best work.
At Accenture, we see well-being holistically, supporting our people’s physical, mental, and financial health. We also provide opportunities to keep skills relevant through certifications, learning, and diverse work experiences. We’re proud to be consistently recognized as one of the World’s Best Workplaces™.
Join Accenture to work at the heart of change. Visit us at www.accenture.com.