- Create data integration pipelines to extract, cleanse, and consolidate data from a variety of sources and formats for analysis and use across business use cases.
- Perform data profiling, discovery, and analysis to determine the location, suitability, and coverage of data, and to identify the data types, formats, and data quality present within a given data source.
- Work with source-system and business SMEs to develop an understanding of the data requirements and the options available within customer sources for meeting them.
- Create reusable data extraction/ingestion pipelines and templates that capture the logical flow and transformations required to move data from customer source systems into the target data lake, warehouse, and/or sandbox.
- Perform hands-on data development to build data extraction, movement, and integration, leveraging state-of-the-art tools and practices, including both streaming and batch data ingestion techniques (a brief illustrative sketch follows this list).
- Provide elbow-to-elbow mentoring of customer resources and other consultants.
- Assist in the creation of data requirements and data model designs as necessary and appropriate.
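As a rough illustration of the ingestion work described above, the following is a minimal PySpark sketch contrasting a batch load with a streaming ingest. The paths, Kafka broker, topic, and column names are hypothetical placeholders, not customer specifics.

```python
from pyspark.sql import SparkSession

# Hypothetical example: all paths, topics, and names are illustrative only.
spark = SparkSession.builder.appName("ingestion-sketch").getOrCreate()

# --- Batch ingestion: load a full extract from a source landing zone ---
batch_df = (
    spark.read
    .option("header", "true")
    .csv("/landing/source_system/customers/")   # hypothetical landing path
)
# Light cleansing before writing to the lake: drop exact duplicate keys.
batch_clean = batch_df.dropDuplicates(["customer_id"])
batch_clean.write.mode("overwrite").parquet("/lake/raw/customers/")

# --- Streaming ingestion: consume near-real-time events from Kafka ---
stream_df = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
    .option("subscribe", "customer_events")            # hypothetical topic
    .load()
)
# Persist the raw event payloads as micro-batches into the data lake.
query = (
    stream_df.selectExpr("CAST(value AS STRING) AS payload")
    .writeStream
    .format("parquet")
    .option("path", "/lake/raw/customer_events/")
    .option("checkpointLocation", "/lake/_checkpoints/customer_events/")
    .start()
)
```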
- Minimum of 3 years of experience working with the Apache Hadoop ecosystem of tools and technologies to extract, integrate, cleanse, and organize data, including experience with either the Hortonworks or Cloudera distributions.
- Key Tools and Technologies
- Experience working with the following types of workloads and data pipelines:
o Enterprise-scale ETL and ELT batch workloads
o Near real-time micro-batches
o Streaming data
- Experience working with Data Governance frameworks
- Some experience performing conceptual and logical data model design
- Experience in the Financial Services, Retail, or Healthcare Payor/Provider industries is a plus.
- Strong NoSQL, Spark SQL, and ANSI SQL query skills (see the example query after this list)
- Strong verbal and written English communication skills
- Strong consulting skills; prior consulting experience strongly desired
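To illustrate the Spark SQL / ANSI SQL skills called out above, here is a minimal hypothetical example; the table and column names are placeholders.

```python
from pyspark.sql import SparkSession

# Hypothetical Spark SQL example: table and column names are illustrative.
spark = SparkSession.builder.appName("sparksql-sketch").getOrCreate()

# Register a lake table and run an ANSI-style aggregate query via Spark SQL.
spark.read.parquet("/lake/raw/customers/").createOrReplaceTempView("customers")

summary = spark.sql("""
    SELECT state,
           COUNT(*)            AS customer_count,
           MAX(last_update_ts) AS most_recent_update
    FROM customers
    GROUP BY state
    ORDER BY customer_count DESC
""")
summary.show(truncate=False)
```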