Senior Data Engineer - AI Platform
About The Position
About Cellebrite:
Cellebrite’s (Nasdaq: CLBT) mission is to enable its global customers to protect and save lives by enhancing digital investigations and intelligence gathering to accelerate justice in communities around the world. Cellebrite’s AI-powered Digital Investigation Platform enables customers to lawfully access, collect, analyze and share digital evidence in legally sanctioned investigations while preserving data privacy. Thousands of public safety organizations, intelligence agencies and businesses rely on Cellebrite’s digital forensic and investigative solutions—available via cloud, on-premises and hybrid deployments—to close cases faster and safeguard communities. To learn more, visit us at www.cellebrite.com, https://investors.cellebrite.com/investors and find us on social media @Cellebrite.
Position Overview:
Cellebrite is looking for a Senior Engineer, Backend and Data, to join Cellebrite AI - the group building the AI foundation, applied AI capabilities, and agentic experiences that power the next generation of Cellebrite’s investigative products.
This role sits at the intersection of backend engineering, data engineering, and applied AI. It is designed for a strong backend engineer who has become deeply data-oriented: someone who can build reliable production services, but is also comfortable working with complex data pipelines, parsing challenges, retrieval systems, knowledge infrastructure, and AI-first development workflows.
Cellebrite AI works with complex, sensitive, real-world investigative data. The team builds capabilities that help Cellebrite products extract, normalize, index, retrieve, reason over, and safely expose data to AI systems and agentic applications.
This includes both development-time data work (preparing, cleaning, sampling, transforming, and managing data for research, data science, and machine learning) and production-time data systems (backend services, extraction pipelines, parsing infrastructure, retrieval layers, knowledge stores, and governed data access for AI agents).
Success in this role means making messy, large-scale, domain-specific data usable for AI: enabling researchers and data scientists to iterate faster, helping production AI systems retrieve and reason over the right information, and building robust backend/data infrastructure that supports reliable, scalable, and cost-effective AI capabilities across Cellebrite’s product portfolio.
This is not a classic BI data engineering role. It is a hands-on engineering role for someone who enjoys building the data backbone that makes AI systems actually work.
Responsibilities:
- Design, build, and operate backend and data infrastructure for Cellebrite AI products, platforms, and agentic systems.
- Build data extraction, parsing, normalization, enrichment, and transformation pipelines for structured, semi-structured, and unstructured investigative data.
- Support research and machine learning workflows by preparing datasets, cleaning and transforming data, enabling sampling strategies, and improving data accessibility for data scientists and AI engineers.
- Build production data services and APIs that expose data safely and efficiently to AI applications, retrieval systems, and agentic workflows.
- Develop and optimize retrieval and knowledge infrastructure, including indexing, search, hybrid retrieval, vector-aware access patterns, metadata filtering, and data access tooling for AI agents.
- Work with modern data lakehouse and analytical infrastructure, including Apache Iceberg, AWS Glue, Trino, Spark, and Pinot.
- Design and optimize storage and query patterns in systems such as PostgreSQL and/or OpenSearch, including indexing, tuning, performance optimization, and advanced data access patterns.
- Develop data manipulation logic, UDFs, backend services, and pipeline components in Python and related technologies.
- Collaborate closely with AI engineers, data scientists, backend engineers, product managers, and platform teams to turn research needs and product requirements into reliable data systems.
- Improve data quality, observability, validation, reproducibility, reliability, cost efficiency, and operational maintainability across AI data workflows.
- Help build the infrastructure that enables AI agents to access complex case data safely, efficiently, and with the right level of context and governance.
Requirements
- 4+ years of hands-on experience in backend engineering, data engineering, platform engineering, or similar production engineering roles.
- Strong backend engineering skills, with experience designing, building, and operating production services, APIs, and data-heavy backend systems.
- Strong Python experience, including building production-grade services, data processing logic, automation, and data manipulation workflows.
- Hands-on experience with Spark and modern data engineering patterns for processing large-scale structured, semi-structured, or unstructured data.
- Experience with data lakehouse / data warehouse ecosystems such as Apache Iceberg, AWS Glue, Trino, Pinot, or similar technologies.
- Strong AWS experience, including working with cloud-native data services, storage, compute, permissions, and production deployment environments.
- DBA-level depth in PostgreSQL and/or OpenSearch, including schema/index design, query tuning, performance optimization, scaling considerations, and operational troubleshooting.
- Experience building extraction, parsing, normalization, transformation, or enrichment pipelines for complex real-world data.
- Strong SQL skills and ability to reason about query plans, data modeling, indexing strategies, and performance tradeoffs.
- Ability to work independently as a strong individual contributor in an AI-first engineering environment, collaborating effectively with AI engineers, data scientists, backend engineers, and product stakeholders.
Advantage:
- Experience supporting AI/ML systems, RAG, retrieval infrastructure, embeddings, knowledge stores, evaluation datasets, or agentic workflows.
- Experience developing UDFs or custom query/runtime extensions for analytical or search systems.
- Experience with hybrid search, semantic retrieval, vector search, metadata filtering, ranking, or retrieval quality optimization.
- Experience with sensitive, regulated, security-oriented, legal, forensic, intelligence, or investigative data.
- Familiarity with responsible AI considerations such as privacy, auditability, data lineage, explainability, access control, and safe use of sensitive data.
- Experience with observability, testing, validation, reproducibility, and data quality frameworks for production data pipelines.