Key Responsibilities:
- Data Pipeline Execution: Build and maintain reliable data pipelines that automate ingestion from both structured and unstructured sources. Leverage Python and SQL to ensure data flows are secure and traceable.
- Transformation Layer: Develop and manage transformation workflows using dbt, ensuring data models are modular, tested, and version-controlled via Git.
- Orchestration & Scheduling: Schedule data workflows using tools such as Airflow, Prefect, or GCP Workflows, ensuring automated and timely data delivery across our systems (a minimal sketch follows this list).
- Cloud Warehouse Support: Maintain the data warehouse environment (BigQuery), focusing on query performance, cost monitoring, and schema organization.
- Observability & Quality: Implement data validation tests and lineage tracking using frameworks like Elementary to ensure high levels of data integrity and trust.
- Infrastructure as Code (IaC): Assist in managing and deploying cloud resources (BigQuery datasets, IAM roles, GCS buckets) using Terraform to ensure a reproducible and documented environment.
- Version Control & CI/CD: Maintain the integrity of our codebase using GitHub. Ensure that every dbt change or Python script follows our CI/CD patterns (GitHub Actions) for automated testing and deployment.
- Analytics Support: Collaborate with BI analysts and product teams to provide clean, optimized data sets for reporting and internal tools.
- Operational Documentation: Maintain clear documentation (dbt docs/SOPs) for pipelines and models to support team-wide data discovery and "self-service" BI.
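For flavor, here is a minimal sketch of what one of these scheduled ingestion pipelines could look like, using Airflow (2.4+ syntax) and the google-cloud-bigquery client. The DAG id, table name, and hard-coded rows are hypothetical placeholders, not our actual pipeline:

```python
# Minimal sketch: a daily ingestion DAG (Airflow 2.4+ syntax).
# All ids, tables, and rows below are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from google.cloud import bigquery


def ingest_orders() -> None:
    """Pull rows from a (hypothetical) source and append them to BigQuery."""
    rows = [{"order_id": 1, "amount": 42.0}]  # stand-in for an API extract
    client = bigquery.Client()  # assumes GCP credentials in the environment
    client.load_table_from_json(rows, "my-project.raw.orders").result()


with DAG(
    dag_id="ingest_orders_daily",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # automated, timely delivery
    catchup=False,
) as dag:
    PythonOperator(task_id="ingest_orders", python_callable=ingest_orders)
```

In practice the hard-coded rows would be replaced by a real extract step, and failures would surface through Airflow's retries and alerting rather than being swallowed.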
Competencies & Qualifications:
- Proficiency in SQL and Python for data manipulation and automation.
- Practical experience with dbt for building and maintaining modular data models.
- Familiarity with Git/GitHub workflows and CI/CD principles for managing code.
- Hands-on experience with BigQuery (or a similar cloud data warehouse such as Snowflake).
- Practical experience with GCP (preferred) or AWS/Azure. Understanding of IAM permissions and cloud storage.
- Familiarity with Terraform or a strong desire to learn how to manage infrastructure through code rather than manual console clicks.
- Solid understanding of data modeling, including star schemas and how to structure data for efficient reporting.
- Ability to troubleshoot broken pipelines and optimize slow-running queries (see the dry-run sketch after this list).
- Strong communication skills and a desire to work collaboratively under the guidance of the Lead Data Engineer.
- Proactive about catching data issues and suggesting improvements to existing workflows.
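As one concrete example of the query-optimization side, here is a short sketch using BigQuery's dry-run mode to estimate bytes scanned before a query executes; the project and table names are hypothetical:

```python
# Estimate a query's cost without running it, via BigQuery dry-run mode.
# Assumes GCP credentials are configured; the table name is hypothetical.
from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)

job = client.query(
    "SELECT order_id, amount FROM `my-project.analytics.orders`",
    job_config=job_config,
)
print(f"Query would scan {job.total_bytes_processed / 1e9:.2f} GB")
```

Dry runs are free, so a check like this can be wired into CI to flag expensive queries before they reach production.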
Experience:
- 2 to 3 years of progressive experience in Data Engineering or Analytics Engineering.
- Prior experience in a scale-up, product-led, or data-centric organization.
- Familiarity with BI tools (Tableau, Power BI, Looker) is a plus, though not a core requirement.
- Proven track record of building and managing dbt models in a production environment.
- Experience with API-based ingestion (using Airbyte, dlt, and/or custom scripts) is a plus.
- Ability to work effectively with people at all levels in an organization.
- Excellent written and oral communication skills, with the ability to present complex ideas to varied audiences and distill them into key messages that effectively inform.