Posted 4 months ago

Software Engineer, Infrastructure

San Francisco, On-site, Full-time

AI Summary

Product-oriented infrastructure engineer building the backbone for a stateful, multiplayer AI agent platform. Focuses on core data infra, observability, and developer experience to improve production reliability and latency.

About this role

About Blockit

Time is the most valuable resource we have, yet coordinating it remains stuck in the dark ages. At Blockit, we're building the AI that finally fixes this: an autonomous time agent that handles the full complexity of scheduling—timezones, group coordination, in-person logistics—like an executive assistant that never sleeps.

While most LLM applications to date have been one-on-one conversations, Blockit is one of the first multiplayer, stateful AI agents: it coordinates between multiple people, maintains context across conversations, and takes real actions in the world. As more people connect their calendars, our network becomes exponentially more powerful.

This is the foundation of a platform of AI agents with access to the world's time. We're backed by Sequoia, and we're a small, sharp team that moves fast, ships constantly, and holds a high bar. If you want to build something genuinely new, we'd love to talk.

You can visit our teams page to learn more about our team and culture.

The role

Think of this as a product engineer role — but the product is everything underneath. For a stateful, multiplayer AI agent that takes real actions in the world, infrastructure is the user experience. When a meeting gets scheduled in 800ms instead of 4 seconds, when an email never gets dropped, when our agent recovers gracefully from a flaky third-party API — that's a feature users feel. You'll ship those features.

This role spans three connected areas:

Infra that users feel — own our core systems (PostgreSQL, ClickHouse, async job processing, LLM gateway, email infra) with the same product instincts an engineer would bring to a customer-facing feature. Latency, reliability, and recovery are user-facing metrics, and you'll move them. We've outgrown some of our early choices and will outgrow more, so you'll also lead the build-vs-buy and migration calls: is pg-boss still the right queue or do we need a managed system? Should we move off Postmark? When do we add Redis, Kafka, a managed vector store? You make the call, ship the migration, and bring the team along.
Observability — own how we see our system. Metrics, logs, traces, dashboards, alerting, and the on-call / pager rotation — including what we alert on and what we don't. The goal is that we catch problems before customers do, and that when something does break, the engineer paged knows exactly where to look.
Developer experience & DevOps — own how Blockit engineers work day-to-day. That includes capacity planning and deploys, but also our AI-augmented dev environment: Claude Code setup, agent tooling, sandboxes, evals, the works. We want this team to be the most AI-native engineering org in the Valley and we need someone who treats that as a first-class problem.

This role is ideal if you care deeply about how things actually run in production and want to build the foundation for a platform that coordinates millions of calendars.

What you'll do

Own, operate, and evolve our core data infrastructure: PostgreSQL, ClickHouse, and async processing pipelines — measuring success in user-facing terms (p95 scheduling latency, agent action success rate, message delivery reliability)
Own observability end-to-end: metrics, logs, traces, alerting, dashboards, and the on-call / pager rotation — including what we alert on and what we don't
Design, manage, and optimize our LLM infrastructure (gateway, evaluation pipelines, agent observability) for reliability, performance, and cost
Lead infrastructure migration decisions and execute them: e.g. evaluating pg-boss vs. managed queue alternatives, weighing whether to move off Postmark, introducing Redis / Kafka / similar when the time is right
Partner directly with product engineers to ship features — the line between "infra work" and "product work" should be invisible here
Own DevX: shape how engineers leverage AI in their day-to-day workflow — local environments, Claude Code conventions, agent tooling, eval harnesses, anything that compounds team velocity
Own DevOps: deploy pipelines, environment management, capacity planning, infra cost

What we’re looking for

4+ years of software engineering experience, with significant time on backend or infrastructure
Product sensibility — you reach for user-facing metrics first, and you can tell the difference between an infra problem that matters and one that doesn't
Deep with databases — you can read query plans, find the bottleneck, and know when to fix the query vs. fix the schema vs. fix the architecture
Production experience: you have run production services and owned the pager, and you are comfortable in an incident and methodical in the postmortem
Familiar with job queues, async processing, or event-driven systems, and have opinions on when each is the right tool
Interest in developer tooling and how AI changes the way engineers should work
Pragmatic about build vs. buy — you don't reinvent infrastructure for fun, and you don't outsource what should be core

Location

San Francisco, CA. On-site 4 days per week.

Skills

Alerting, Async Job Processing, Capacity Planning, Claude Code, ClickHouse, Dashboards, Database Query Optimization, Deploy Pipelines, Email Infrastructure, Eval Harnesses, Event-driven Systems, Infra Cost, Kafka, LLM Gateway, Logs, Metrics, Observability Tooling, On-call Rotation, pg-boss, PostgreSQL, Postmark, Producer/consumer Queues, Redis, Schema And Architecture Improvements, Traces, Vector Store
