Big Data

Responsibilities

Create data integration pipelines to extract, cleanse, and integrate data from a variety of sources and formats for analysis and downstream use cases.
Perform data profiling, discovery, and analysis to determine the location, suitability, and coverage of data, and to identify the data types, formats, and data quality within a given data source.
Work with source system and business SMEs to understand the data requirements and the options available within customer sources to meet the data and business requirements.
Create reusable data extraction/ingestion pipelines and templates to demonstrate the logical flow and manipulation of data required to move data from customer source systems into the target data lake, warehouse, and/or sandbox.
Perform hands-on data development to build the data extraction, movement, and integration, leveraging state-of-the-art tools and practices, including both streaming and batched data ingestion techniques.
Provide elbow-to-elbow mentoring to customer staff and other consultants.
Assist in the creation of data requirements and data model design as necessary and appropriate.

Qualifications

Minimum of 3 years of experience working with the Apache Hadoop ecosystem of tools and technologies to extract, integrate, cleanse, and organize data, including experience with either the Hortonworks or Cloudera distribution.
Key Tools and Technologies

o Spark
o Scala
o Python
o Java

o Enterprise-scale ETL and ELT batched workloads
o Near real-time micro-batches
o Streaming data

Experience working with Data Governance frameworks
Some experience performing conceptual and logical data model design
Experience in the financial services, retail, or healthcare (payer or provider) industries is a plus.
Strong NoSQL, SparkSQL, and ANSI SQL query language skills
Strong verbal and written communication skills in English
Strong consulting skills; consulting experience strongly desired