Open Opportunities
Staff Data Engineer
About The Position
Company Overview:
Cellebrite’s (NASDAQ: CLBT) mission is to enable its customers to protect and save lives, accelerate justice, and preserve privacy in communities around the world. We are a global leader in Digital Intelligence solutions for the public and private sectors, empowering organizations to master the complexities of legally sanctioned digital investigations by streamlining intelligence processes. Trusted by thousands of leading agencies and companies worldwide, Cellebrite’s Digital Intelligence platform and solutions transform how customers collect, review, analyze, and manage data in legally sanctioned investigations.
Position Overview:
Cellebrite’s AI Group is transforming digital investigations into an AI‑first world. We build the systems that allow investigators to move from raw data to insight — faster, more defensibly, and at scale.
We are a small group of strong engineers and researchers who care deeply about quality, performance, and impact. We experiment, we measure, we ship — and we hold ourselves to a very high bar.
We’re looking for a Staff Data Engineer who is deeply hands-on and thrives on solving hard technical problems.
You are the kind of engineer who:
- Made that one massive query 10x faster — because it bothered you.
- Contributed a meaningful patch to an open-source project.
- Enjoys diving into internals to understand why something behaves the way it does.
- Helps teammates level up through code reviews and technical discussions.
- Shifted complex debugging left to AI using Skills.
This is a senior Individual Contributor role. You won’t just implement requirements — you’ll shape systems through execution. You’ll research, prototype, benchmark, and then build production-grade solutions that scale.
What You’ll Do:
- Design and build scalable data pipelines for AI and ML research
- Turn experimental workflows into robust production systems
- Improve performance across storage, queries, indexing, and compute
- Build APIs and services that expose large-scale data systems
- Evaluate and introduce new technologies when they create real leverage
- Reduce infrastructure cost through smart architectural and operational decisions
- Define observability standards (metrics, tracing, logging)
- Use GenAI tools and coding agents to increase engineering velocity
You will operate across ingestion, modeling, analytics, backend services, and DevOps. End-to-end ownership is expected.
Requirements
- 8+ years of experience in Data Engineering, ML, backend and distributed systems
- Proven track record of building and operating production data platforms
- Strong understanding of various retrieval systems and data fabrics, with an eye for workload and query optimization
- Experience with cloud-native systems (AWS/GCP/Azure)
- Strong Python skills and at least one additional production language
- Deep hands-on experience with Apache Spark (batch and streaming), including understanding of execution plans, partitioning strategies, shuffle behavior, memory tuning, and performance optimization at scale
- Solid DevOps and CI/CD practices
- Strong advantage:
- Deep experience with OpenSearch / Elasticsearch, including index design, shard/replica strategy, query DSL optimization, relevance tuning, and cluster performance management
- Strong knowledge of Lakehouse technologies such as Apache Iceberg, Delta Lake, or similar — including table formats, snapshot isolation, compaction, partition evolution, metadata scaling, and query performance tuning
- Experience supporting ML / MLOps environments
- Lakehouse or large-scale analytical systems experience
- Performance tuning at scale
- Experience working with vector search or AI data pipelines
Who You Are
- Highly analytical and systems-oriented
- Obsessed with performance and correctness
- Hands-on and execution-driven
- Comfortable researching and validating new technologies independently
- Generous with knowledge and invested in raising the team’s bar
Public Technical Footprint (Required)
We value engineers who engage with the broader ecosystem. You should have at least one of the following:
- Active GitHub contributions or open-source involvement
- Technical blog posts or published articles
- Public speaking or meetup participation
- Recognized technical community presence
Why Join
- You’ll work on problems that sit at the intersection of data, AI, and large-scale systems — directly shaping how digital investigations evolve in an AI-first era.
- If you care about building systems that are fast, scalable, and elegant — and you take pride in being the person who makes them better — we’d love to talk.