Python/Big Data Developer

Charlotte, NC

Company Name : IBA Infotech LLC

Type : Contract

Primary Skills : Hadoop, PySpark, ORC, RDD, DataFrames

Location : Charlotte

CTC : DOE

Job Description:

  • In-depth understanding of Hadoop and Spark architecture and of RDD transformations.
  • Proven experience in developing solutions using Spark architecture and PySpark for data engineering pipelines, transformation, and aggregation of data from a variety of sources into the data lake.
  • At least 3 years of relevant experience in developing PySpark programs using APIs. Expertise in file formats such as Parquet and ORC.
  • Experience in troubleshooting and fine-tuning Spark and Python-based applications for scalability and performance.
  • Experience in designing Hive tables to handle high data velocity, variety, and volume.
  • Experience in ingesting, processing, and analyzing data from disparate sources using Spark/SQL.
  • Knowledge of spark-submit and the Spark UI. Experience in creating Spark RDDs and performing operations on them.
  • Experience in creating Spark DataFrames from RDDs, Hive tables, and Parquet files, and then performing joins and aggregations on those DataFrames.
  • Experience in processing data from Python and other API modules.