Member of Technical Staff - Agent Platform (Agent OS)
Boson AI
Posted: June 6, 2025
Quick Summary
We are looking for a highly skilled engineer to join our team as a member of the technical staff, where you will be working on the development of our agent platform.
Job Description
About Boson AI: At Boson AI, we are not just building AI solutions; we are pioneering the future of enterprise AI. Driven by a passion for cutting-edge AI research, particularly in the transformative areas of large language models and agentic systems, our mission is to tackle the most complex real-world problems for businesses and unlock significant value. We are a dynamic and collaborative team of researchers and engineers who thrive on pushing the boundaries of what's possible, dedicated to delivering high-quality, reliable products that seamlessly integrate into the fabric of enterprise workflows and set new industry standards.
About the Role: Engineer and evolve the core Agent OS—a high-performance, resilient platform encompassing the dialog & policy engine, distributed context & memory, execution runtime, security isolation, voice runtime, and complex agentic orchestration frameworks. This system underpins all Boson agents, from low-code configuration flows in Workspace to advanced, production-grade systems leveraging RAG, ReAct, robust tool calling, and multi-step execution.
Responsibilities:
• System Ownership: Take ownership of the core dialog & policy engine. Define and implement the state machine for agent state representation, the decision-making logic, and the mechanisms for enforcing complex safety policies and guardrails at the execution layer of a workflow.
• Distributed Context & Memory: Design, implement, and maintain the high-performance context and memory systems. Focus on low-latency, reliable access to conversational and user history, including the tight integration and optimization of RAG and vector retrieval pipelines for production use.
• Agentic Orchestration Frameworks: Define, architect, and deliver robust agentic orchestration patterns, including battle-tested planner–executor schemes, ReAct-style reasoning and acting loops, and resilient, multi-step workflows that programmatically combine tools, LLMs, and stateful memory.
• Internal SDK/Framework Development: Build and evolve the internal, production-grade equivalent of frameworks like LangChain/LlamaIndex. Design composable graphs and execution chains with clear APIs and type safety that product engineering teams and low-code builders can safely reuse, extend, and deploy at scale.
• Voice Runtime Infrastructure: Own and optimize the voice runtime components for streaming audio, low-latency barge-in detection, and reliable turn-taking protocols. This requires deep collaboration with Application and ML Platform teams to meet tight latency, jitter, and quality of service (QoS) constraints.
• Tooling & Integration Architecture: Architect a robust, secure tooling and integration framework (MCP/A2A). This includes building the underlying infrastructure for tool registration, handling complex authentication/authorization, implementing rate limiting/circuit breaking, managing retries, and ensuring typed, validated I/O between agents and external microservices.
• Platform Observability & Reliability: Define, instrument, and monitor rigorous SLIs/SLOs for the Agent Platform. Lead engineering efforts to continuously improve reliability, enhance system debuggability (rich, step-level traces and structured logging), and drive core performance optimizations over time.
• API & Abstraction Design: Ensure the platform's public-facing APIs and internal abstractions are clear, well-documented, and fundamentally sound, enabling junior and senior engineers alike to compose sophisticated agent behavior without violating system invariants or introducing breaking changes.
• Advanced Capabilities R&D: Explore and prototype future capabilities, focusing on the engineering challenges of on-device personalization, privacy-preserving federated learning signals, and novel policy adaptation techniques that influence agent behavior in production.
Qualifications:
• Deep Experience: 3+ years of hands-on experience in backend engineering and distributed systems, with a track record of building and owning core platforms or frameworks used successfully by other engineering teams.
• Agentic Systems Expertise: Demonstrated, hands-on experience architecting, building, or operating production-grade agentic systems: orchestrating LLM calls, managing complex tool interactions, and defining stateful workflows—moving beyond simple single prompt/response API integrations.
• Orchestration & Design Patterns: Strong working knowledge of engineering orchestration frameworks (e.g., LangChain, LlamaIndex, or internal equivalents) and a deep understanding of core design patterns like RAG, ReAct, and multi-step planning.
• Systems Engineering Mastery: Deep and practical understanding of distributed system design, concurrent programming, and building for reliability in multi-tenant cloud environments with strictly defined latency and cost envelopes.
• Framework Evangelism: Proven experience designing, implementing, and rolling out successful frameworks or libraries that other internal engineering teams enthusiastically adopt and productively build upon.
• Security Focus: Comfort and prior experience working on security-sensitive systems, including implementing authz/authn schemes, isolation boundaries, data protection protocols, and integrating with centralized policy/safety infrastructure.
• Technical Leadership: Strong technical communication skills and the ability to lead complex, cross-functional technical initiatives, driving consensus and influencing architectural decisions across partner teams.
Bonus points:
• Experience developing and operating conversational AI platforms, agent frameworks, or high-throughput, complex workflow engines in a production setting.
• Engineering background in real-time media (audio/video) systems or low-level signaling protocols where extremely low latency and jitter management are critical performance factors.
• Prior experience building high-stakes enterprise platforms (e.g., payments, identity, core data services) where correctness, auditability, and absolute reliability are non-negotiable requirements.
• Exposure to emerging systems and engineering techniques, such as integrating federated learning models, enabling on-device personalization, or implementing bandit-style adaptive policy systems.