This hire will be responsible for expanding and optimizing our data and pipeline architecture, as well as our data flow and collection. The ideal candidate is an experienced data pipeline builder and data wrangler who enjoys optimizing data systems and building them from the ground up. The Data Engineer will collaborate with and support software developers, architects, and fellow data engineers on data initiatives, and will ensure that our data delivery architecture stays consistent and optimal across ongoing projects. They must also be self-directed and comfortable supporting the data needs of multiple teams, systems, and products.
Responsibilities:
Because we work at the leading edge of many technologies, we need someone who is a creative problem solver, is resourceful in getting things done, and can shift productively between working independently and working collaboratively. This person will also take on the following responsibilities:
- Process unstructured data into a form suitable for analysis.
- Support the business with ad hoc data analysis and build reliable data pipelines.
- Implement best practices and IT operations for mission-critical, tight-SLA data pipelines built on Airflow (see the sketch after this list).
- Migrate our query engine from Dremio to Redshift.
- We leverage multiple AWS data and analytics services (e.g., Glue, Kinesis, S3); SQL (e.g., PostgreSQL, Redshift, Athena); NoSQL (e.g., DocumentDB, MongoDB); and Kafka, Docker, Spark (AWS EMR and Databricks), Airflow, Dremio, and Qubole.
- We use AWS extensively, so experience with AWS cloud and AWS Data & Analytics certification will help you hit the ground running.
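For illustration only, here is a minimal sketch of the kind of tight-SLA Airflow pipeline referenced above; the DAG name, schedule, SLA window, and load step are hypothetical placeholders, not a description of our actual pipelines:

```python
# Minimal sketch: an Airflow DAG with a task-level SLA (names and values are hypothetical).
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def load_events():
    # Placeholder for an extract/load step (e.g., landing new records in the warehouse).
    pass


with DAG(
    dag_id="events_hourly",                  # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@hourly",
    catchup=False,
    default_args={
        "retries": 2,
        "retry_delay": timedelta(minutes=5),
        "sla": timedelta(minutes=30),        # Airflow flags runs that miss this window
    },
) as dag:
    PythonOperator(task_id="load_events", python_callable=load_events)
```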
Skills and Qualifications:
- 8+ years of real-world Data Engineering experience.
- Programming experience, ideally in Python, plus other data engineering languages such as Scala.
- Programming knowledge to clean structured and semi-structured datasets (see the sketch after this list).
- Experience processing large amounts of structured and unstructured data. Streaming data experience is a plus.
- Experience building and optimizing big data pipelines, architectures, and data sets.
- Background in Linux
- Experience building the infrastructure required for optimal extraction, transformation, and loading of data from various sources using SQL and cloud big data technologies such as Databricks, Snowflake, Dremio, and Qubole.
- Experience building processes that support data transformation, data structures, metadata, dependency, and workload management.
- A successful history of manipulating, processing, and extracting value from large, disconnected datasets.
- Experience creating a platform on which complex data pipelines are built using orchestration tools like Airflow and Astronomer.
- Experience with real-time sync between OLTP and OLAP systems using AWS technologies, such as syncing AWS Aurora with Amazon Redshift.
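As a hedged illustration of the data-cleaning skills described above, the sketch below flattens semi-structured JSON events into a tabular dataset with PySpark; the S3 paths and field names (event_id, event_time, payload.user_id) are hypothetical examples, not our schema:

```python
# Minimal sketch: flattening semi-structured JSON into a clean, tabular dataset with PySpark.
# All paths and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("clean_events").getOrCreate()

# Read raw, semi-structured JSON events (hypothetical S3 location).
raw = spark.read.json("s3://example-bucket/raw/events/")

events = (
    raw
    .withColumn("event_time", F.to_timestamp("event_time"))   # normalize timestamps
    .withColumn("user_id", F.col("payload.user_id"))          # promote a nested field
    .drop("payload")
    .dropDuplicates(["event_id"])                              # de-duplicate on a key
)

# Write the cleaned, analysis-ready table (hypothetical S3 location).
events.write.mode("overwrite").parquet("s3://example-bucket/clean/events/")
```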