Responsibilities
- Processing and Analyzing Large Datasets: utilize PySpark to process and analyze extensive datasets, ensuring data accuracy and quality (a minimal sketch follows this list).
- Building and Optimizing ETL Processes: design, develop, and optimize ETL pipelines to handle large-scale data efficiently.
- Machine Learning and Predictive Analytics: implement machine learning models and predictive analytics solutions using big data frameworks.
- Optimization and Debugging: identify performance bottlenecks, optimize workflows, and debug complex distributed systems.
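For flavor, here is a minimal sketch of the kind of PySpark ETL pipeline these responsibilities describe. The paths, schema, and column names (events.csv, user_id, amount, event_date) are hypothetical illustrations, not taken from any actual project.

```python
# Minimal PySpark ETL sketch. All paths and column names below are
# hypothetical examples, not a real project's schema.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

# Extract: read raw CSV data with a header row and inferred schema.
raw = spark.read.csv("s3://example-bucket/raw/events.csv",
                     header=True, inferSchema=True)

# Transform: basic data-quality checks -- drop rows missing a key,
# deduplicate, and keep only non-negative amounts.
clean = (raw
         .dropna(subset=["user_id", "event_date"])
         .dropDuplicates(["user_id", "event_date"])
         .filter(F.col("amount") >= 0))

# Aggregate daily totals per user.
daily = (clean
         .groupBy("user_id", F.to_date("event_date").alias("day"))
         .agg(F.sum("amount").alias("total_amount")))

# Load: write partitioned Parquet for downstream analytics.
daily.write.mode("overwrite").partitionBy("day").parquet(
    "s3://example-bucket/curated/daily_totals/")
```

Writing partitioned Parquet keeps downstream reads cheap, which is typical of the performance work mentioned in the last bullet above.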
Requirements
- Proficiency in PySpark and the Apache Spark Ecosystem: advanced skills in PySpark and familiarity with the broader Apache Spark framework.
- Strong Python Knowledge: expertise in Python, including libraries such as Pandas and NumPy, for data manipulation and analysis.
- Familiarity with Big Data Technologies: hands-on experience with HDFS, Hive, HBase, and Kafka for big data processing and storage.
- SQL Expertise: ability to write and optimize complex SQL queries for data analysis and ETL workflows (see the sketch after this list).
- Cloud Platform Experience: experience working with cloud platforms such as AWS, Azure, or Google Cloud Platform (GCP).
- Distributed Computing Concepts: in-depth understanding of distributed computing and parallel processing.
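To illustrate how the SQL and PySpark requirements combine in practice, below is a minimal sketch of running SQL against a Spark DataFrame; the table and column names (orders, region, revenue) are hypothetical.

```python
# Sketch of SQL-on-Spark; the orders table and its columns are
# hypothetical examples.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-sketch").getOrCreate()

# A tiny in-memory DataFrame standing in for a real source table.
orders = spark.createDataFrame(
    [("EU", 120.0), ("EU", 80.0), ("US", 200.0)],
    ["region", "revenue"],
)
orders.createOrReplaceTempView("orders")

# SQL runs through the same Catalyst optimizer as the DataFrame API,
# so both styles can be mixed freely within one pipeline.
summary = spark.sql("""
    SELECT region, SUM(revenue) AS total_revenue
    FROM orders
    GROUP BY region
    ORDER BY total_revenue DESC
""")
summary.show()

# explain() prints the physical plan -- the usual starting point
# when tuning a slow query.
summary.explain()
```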
What we offer
- B2B Contract: employment based on a B2B contract.
- Stable and Dynamic International Firm: the opportunity to work in a stable, dynamically developing international company.
- Engaging Projects and the Latest IT: the chance to take part in interesting projects and work with the latest information technologies.
- Competitive Rates: attractive remuneration rates.
- Renowned International Projects: involvement in the most prestigious international projects.
- Multisport and Private Medical Care: access to Multisport benefits and private healthcare services.
Work with us
Apply & join the team
Didn’t find the right role for you? Send your CV to [email protected]