Seargin is a dynamic multinational tech company operating in 50 countries. At Seargin, we drive innovation and create projects that shape the future and significantly improve quality of life. You will find our solutions in the space industry, in support of scientists developing cancer drugs, and in innovative technological implementations for industrial clients worldwide. These are just some of the areas in which we operate.
Data Engineer
Remote
EU
B2B
Senior
Enable the finding, accessing, processing, publishing, and sharing of biomedical data to facilitate insights for secondary use
Develop and maintain the EDIS end-to-end engine designed for secondary use and primary exploration, ensuring seamless integration with externally generated data sources
Integrate real-world data (RWD) from both clinical and non-clinical sources into the data ecosystem to enhance data richness and insight generation
Design and implement efficient Extract, Transform, Load (ETL) processes to ensure high-quality data is available for analysis and reporting
Work on enhancements to the data warehouse infrastructure, optimizing performance and scalability to support large volumes of biomedical data
Collaborate with data scientists and analysts to understand data requirements and ensure data availability for analytics and reporting
Implement data quality checks and validation processes to ensure the integrity and accuracy of data throughout its lifecycle
Monitor data pipelines and workflows, troubleshooting issues as they arise and performing routine maintenance to ensure data reliability
Document data workflows, processes, and standards to maintain transparency and facilitate onboarding for new team members
Utilize AWS services (e.g., S3, Redshift, Lambda) to build and manage cloud-based data solutions that support data storage, processing, and analysis
Optimize data processing and storage solutions for efficiency and performance, ensuring quick access to large datasets
Ensure compliance with relevant data protection regulations and implement security measures to protect sensitive biomedical data
Employment based on a B2B contract
Opportunity to work in a stable, rapidly growing international company
Chance to participate in interesting projects and work with the latest information technologies
Attractive remuneration rates
Involvement in the most prestigious international projects
Access to Multisport benefits and private healthcare services
4+ years of experience working with programming languages focused on data pipelines, such as Python or R
4+ years of experience working with SQL for data querying and manipulation
3+ years of experience in maintaining and optimizing data pipelines to ensure data accuracy and efficiency
3+ years of experience working with various types of storage solutions, including filesystem, relational databases, MPP (Massively Parallel Processing), and NoSQL databases
3+ years of experience in data architecture concepts, including data modeling, metadata management, workflow management, ETL/ELT processes, real-time streaming, data quality, and distributed systems
3+ years of experience with cloud technologies, particularly for data pipelines, utilizing tools such as Airflow, Glue, Dataflow, and other solutions like Elastic, Redshift, BigQuery, Lambda, S3, and EBS
Strong knowledge of relational databases, including schema design and query optimization (optional)
1+ years of experience in Java and/or Scala for data processing and application development
Very good knowledge of data serialization formats such as JSON, XML, and YAML
Excellent knowledge of Git and Gitflow, along with experience in DevOps tools such as Docker, Bamboo, Jenkins, and Terraform
Ability to conduct performance analysis, troubleshooting, and remediation of data pipelines (optional)
Excellent knowledge of Unix/Linux environments for data processing and management
Experience in implementing data quality checks and validation processes to ensure data integrity and reliability throughout the pipeline
Understanding of data security practices and regulations, particularly concerning sensitive health data and compliance standards in the pharmaceutical industry
Familiarity with data visualization tools (e.g., Tableau, Power BI) to support data-driven decision-making and reporting
Strong collaboration skills to work effectively with cross-functional teams, including data scientists, business analysts, and pharmaceutical partners
Understanding of pharmaceutical data formats, particularly SDTM (Study Data Tabulation Model), is a significant plus