Location: Mumbai

Experience: 4 to 8 years

Technologies / Skills: Advanced SQL, Python and associated libraries (Pandas, NumPy, etc.), PySpark, shell scripting, data modelling, big data, Hadoop, Hive, ETL pipelines.


Responsibilities:

• Proven success in communicating with users, other technical teams, and senior management to gather requirements, explain data modeling decisions, and develop data engineering strategy.

• Ability to work with business owners to define key business requirements and convert them into user stories with the required technical specifications.

• Communicate the results and business impact of insight initiatives to key stakeholders to collaboratively solve business problems.

• Work closely with the overall Enterprise Data & Analytics Architect and Engineering practice leads to ensure adherence to best practices and design principles.

• Assure that quality, security, and compliance requirements are met for the supported area.

• Design and create fault-tolerant data pipelines running on a cluster.

• Excellent communication skills, with the ability to influence client business and IT teams.

• Should have designed data engineering solutions end to end, with the ability to devise scalable and modular solutions.


Required Qualification:

• 3+ years of hands-on experience designing and developing data pipelines for data ingestion or transformation using Python (PySpark)/Spark SQL in the AWS cloud.

• Experience in the design and development of data pipelines and the processing of data at scale.

• Advanced experience writing and optimizing efficient SQL queries with Python and Hive, handling large data sets in big-data environments.

• Experience in debugging, tuning, and optimizing PySpark data pipelines.

• Should have implemented these concepts and have good knowledge of PySpark DataFrames, joins, caching, memory management, partitioning, parallelism, etc.

• Understanding of the Spark UI, event timelines, DAGs, and Spark config parameters in order to tune long-running data pipelines.

• Experience working on Agile implementations.

• Experience building data pipelines in streaming and batch mode.

• Experience with Git and CI/CD pipelines for deploying cloud applications.

• Good knowledge of designing Hive tables with partitioning for performance.


Desired Qualification:

• Experience in data modelling

• Hands-on experience creating workflows in a scheduling tool such as Autosys or CA Workload Automation.

• Proficiency in using SDKs to interact with native AWS services.

• Strong understanding of the concepts of ETL, ELT, and data modeling.