at The Jackson Laboratory in Bar Harbor, Maine, United States
The Data Engineer builds, manages, and optimizes data pipelines into production for key business products. The incumbent plays a pivotal role in operationalizing large-scale data capture, management, and analytics initiatives. Data engineers comply with data quality, governance, and security requirements while creating and operationalizing integrated, reusable data pipelines for faster data access. This individual applies creativity, problem solving, and collaboration in working with IT team members. This role will also work with collaborators outside of our department to architect and manage the flow and integration of externally sourced data. The role requires a deep understanding of data architecture, data engineering, data analysis, and reporting, along with a basic understanding of data science techniques and workflows. The ideal candidate is a skilled data/software engineer with experience creating data products that support analytic solutions. The Data Engineer reports to a Product Manager, IT.
Build data pipelines: 35% Architect, create, and maintain data pipelines. Manage data pipelines consisting of a series of stages through which data flows, from data sources through integration to end-user consumption. Build pipelines for data transformation, models, schemas, metadata, and workload management. Create, maintain, and optimize pipelines as workloads move from development to production for specific use cases.
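As a minimal illustration only (not part of the posting's requirements), the source-to-integration-to-consumption flow described above can be sketched as an extract/transform/load sequence. All names here (`extract`, `transform`, `load`, the sample fields) are hypothetical:

```python
# Minimal ETL sketch: data flows from a source, through a transform
# stage that normalizes the schema, into a consumption target.
import csv
import io

def extract(raw_csv: str) -> list:
    """Parse raw CSV from a source system into records."""
    return list(csv.DictReader(io.StringIO(raw_csv)))

def transform(records: list) -> list:
    """Normalize schema: consistent keys and typed values."""
    return [{"sample_id": r["id"], "value": float(r["value"])} for r in records]

def load(records: list, sink: list) -> None:
    """Deliver transformed records to the end-user consumption target."""
    sink.extend(records)

warehouse = []
raw = "id,value\nA1,3.5\nA2,4.0\n"
load(transform(extract(raw)), warehouse)
print(warehouse)
# [{'sample_id': 'A1', 'value': 3.5}, {'sample_id': 'A2', 'value': 4.0}]
```

In production, each stage would typically be a separately scheduled, monitored step rather than in-process function calls.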
Drive automation through effective metadata management: 25% Apply innovative and modern tools, techniques, and architectures to partially or completely automate the most common, repeatable data preparation and integration tasks, minimizing manual, error-prone processes and improving productivity. Improve data management infrastructure and processes to drive automation in data integration and management. Learn and use modern data preparation, integration, and AI-enabled metadata management tools and techniques.
Collaborate with Product Managers to refine data requirements and data consumption needs for various data and analytics initiatives. Work with collaborators outside the Laboratory to integrate or ingest external data. 20%
Stay current with new innovations and propose modern data ingestion, preparation, integration, and operationalization techniques to optimally address data requirements. Train counterparts in these data pipelining and preparation techniques to enable them to better analyze the data. Contribute to grant writing as needed. 10%
Comply with data quality, data governance and security requirements. 10%
Other duties as assigned.
Knowledge, Skills, and Abilities:
Proven experience in leading and mentoring others
Bachelor's degree in computer science or engineering with 3 years' experience in relevant area of computer science or engineering
Expertise with advanced analytics tools and with object-oriented/functional scripting, using languages and platforms such as Python, Java, C++, Scala, MuleSoft, and others
Expertise with popular database programming languages for relational and non-relational databases to build, manage, and integrate data pipelines. Experience with ETL processes
Familiar with multiple deployment environments and operating systems and containerization techniques
Ability to develop, recommend and execute solutions to complex problems often under time constraints
A positive attitude, a self-starter mentality, the ability to perform successfully in a fast-paced environment, and a sense of urgency
Familiarity with open source and commercial data science platforms
Exposure to multiple, diverse technologies and processing environments. Ability to quickly comprehend the functions and capabilities of new technologies.
Thorough knowledge of Microsoft SQL Server
Working knowledge of SQL Server Integration Services (SSIS)
Working knowledge of SQL Server Reporting Services (SSRS)
Bachelor of Science in Computer Science, Engineering, Mathematics, Statistics or related subject
Expertise in SQL programming language
Expertise designing and implementing data pipelines using modern data engineering approaches and tools: SQL, MuleSoft, Delta Lake, Databricks, Spark, Glue, NiFi, StreamSets, cloud-native DWH (BigQuery, Snowflake), Kafka/Confluent, Presto/Dremio/Athena
Experience with developing solutions on cloud computing services and infrastructure with Azure
Experience with database development using a variety of relational, NoSQL, and cloud database technologies
Experience with BI tools such as Power BI
Conceptual knowledge of data and analytics, such as dimensional modeling, ETL, reporting tools, data governance, DWH, etc.
Exposure to machine learning, data science, computer vision, artificial intelligence, statistics, and/or applied...
All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, disability, or protected veteran status.