Why Amazon's AGI SF Lab is putting HCI on the AI keynote stage
TechCrunch reports that Danielle Perszyk, who leads human-computer interaction at Amazon’s AGI SF Lab, will keynote TechCrunch Sessions: AI on June 5 at UC Berkeley’s Zellerbach Hall. That’s conference news, but it also says something about where the field is stuck.
The industry keeps pouring money into bigger models, longer context windows, and more orchestration glue. Meanwhile, a lot of agent systems still struggle with the basic mechanics of working with people.
Agents are getting better at doing things. They’re still uneven at doing them in ways users can follow, trust, and correct.
That makes Perszyk’s background relevant. Her work reportedly combines social-cognitive research with product and engineering practice, which is a useful mix right now. Teams know how to wire models into workflows. They still have trouble deciding when an agent should act, when it should ask, how it should explain itself, and how a user can tell whether it’s on track or quietly failing.
Why this keynote stands out
The past year has been full of talk about agentic AI, usually with autonomy framed as the main challenge. It’s a challenge, sure. So are coordination, trust, and recovery after mistakes. Agents fail in messy ways because they sit between prediction and action.
A chatbot that gives a bad answer is annoying. An agent that books the wrong meeting, triggers the wrong workflow, or moves inventory based on a bad read is a different problem entirely.
That’s where HCI matters. If you’re building for logistics, customer support, internal ops, robotics, or multimodal assistants, model quality won’t carry the whole system. You need interaction design that deals honestly with uncertainty.
Perszyk’s resume gives that some weight. According to the event writeup, she has a PhD in language evolution and experience at Google and Adept. That puts her closer to systems where language, intent, and action collide, not just generic UX work or detached lab research.
The technical problem teams keep hitting
Modern agents are usually modular, whether teams admit it or not.
You have:
- perception layers for text, speech, vision, or sensor input
- a planner, often LLM-based
- tools and APIs that can actually do things
- memory or state management
- some kind of policy layer for guardrails, approval, or rollback
- telemetry, if the team has been disciplined
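The layers above can be made explicit in code. The sketch below is a minimal, hypothetical layout of that architecture; the class and field names are illustrative, not any particular framework's API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Hypothetical sketch of the modular agent layout described above.

@dataclass
class Percept:
    modality: str        # "text", "speech", "vision", "sensor"
    payload: Any
    confidence: float    # perception layers should report their own uncertainty

@dataclass
class PlannedAction:
    tool: str            # which tool/API the planner wants to call
    args: dict
    rationale: str

@dataclass
class Agent:
    planner: Callable[[list, dict], PlannedAction]
    tools: dict                                         # things that can actually act
    memory: dict = field(default_factory=dict)          # state management
    policy: Callable[[PlannedAction], bool] = lambda a: True  # guardrail hook
    log: list = field(default_factory=list)             # telemetry

    def step(self, percepts: list) -> dict:
        action = self.planner(percepts, self.memory)
        self.log.append(action)                         # record before executing
        if not self.policy(action):                     # policy layer can veto
            return {"status": "blocked", "action": action.tool}
        result = self.tools[action.tool](**action.args)
        self.memory[action.tool] = result
        return {"status": "ok", "result": result}
```

Each boundary in this sketch is a place where a production system needs its own error handling, which is exactly why the demo version looks deceptively simple.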
That architecture demos well. Production is rougher.
Multimodal systems need timing alignment, data normalization, and error handling across components that fail for different reasons. Speech recognition misses a phrase. Vision misclassifies an object. A planner generalizes too aggressively. A downstream API times out. The user sees one assistant. Underneath, it’s six unreliable systems trying to pass as one coherent actor.
Perszyk’s stated focus on building next-generation AI agents through a human-centered lens points straight at the issue many teams learn the hard way: the interaction model belongs in the architecture.
If the agent can’t communicate uncertainty, ask for clarification at the right moment, and expose enough state for users to debug outcomes, the system stays brittle even with a strong model.
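One way to put that decision in the architecture rather than in the prompt is an explicit act/ask/clarify gate. This is a sketch under assumed thresholds, not a published pattern from the lab:

```python
# Hypothetical interaction gate: the planner must report a confidence
# score, and this policy decides what the user sees.

def decide(confidence: float, reversible: bool,
           act_threshold: float = 0.85, ask_threshold: float = 0.5) -> str:
    """Map planner confidence to an interaction mode.

    Irreversible actions get a stricter bar: even a confident agent
    proposes instead of acting, so the user can approve or correct.
    """
    if confidence >= act_threshold and reversible:
        return "act"        # proceed, but log for later review
    if confidence >= ask_threshold:
        return "ask"        # propose the action and wait for approval
    return "clarify"        # admit uncertainty and request more context
```

The thresholds are arbitrary here; the point is that the act-versus-ask decision becomes a tunable, testable component instead of an emergent behavior.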
Social cognition has hard engineering consequences
The source material frames Perszyk’s work around social cognition: how humans infer intent, establish trust, and coordinate tasks. That can sound vague until you map it to product decisions.
For an agent, it shows up in questions like:
- When should the system interrupt versus continue autonomously?
- How much explanation does a user need before approving an action?
- Can the agent infer the user’s goal from partial context without getting overconfident?
- How does it recover after a misunderstanding without forcing the user to start over?
- What signals tell a user the system is unsure?
Those questions decide whether an agent is usable.
A lot of current products still handle them badly. They either dump out walls of reasoning nobody asked for, or they act with fake confidence and hide the uncertainty. Neither works well in enterprise settings, where bad actions bring audit, security, and compliance problems with them.
This is also why the source’s recommendation to deploy in “assist mode” before full autonomy makes sense. It’s practical. Let the agent propose. Keep a human in the loop. Log where human choices differ from the recommendation. That gives teams training data grounded in actual operations instead of benchmark fantasy.
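The assist-mode loop is easy to instrument. A minimal sketch, assuming a human reviewer callback; the names are hypothetical:

```python
# Sketch of assist-mode logging: the agent proposes, a human decides,
# and every divergence becomes labeled data grounded in real operations.

def review_loop(proposals: list, human_decide, log: list) -> list:
    """Run each proposal past a human and record where choices diverge."""
    for proposal in proposals:
        approved = human_decide(proposal)          # human may accept or replace
        log.append({
            "proposed": proposal,
            "approved": approved,
            "diverged": approved != proposal,      # the signal worth mining
        })
    return log
```

The `diverged` flag is the payoff: it tells you where the agent's judgment and the operator's judgment split, which is precisely the data benchmarks don't give you.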
Evaluation needs to get more serious
One of the better points in the event description is evaluation beyond benchmark accuracy. The industry still tests agents with the wrong yardsticks.
An agent can score well on task completion in a controlled environment and still be a liability in production if it:
- asks for help too late
- makes silent assumptions
- breaks badly on edge cases
- can’t explain why it took an action
- creates too much review burden for the human supervisor
So the metrics need to widen.
Track task success, yes, but also intervention rate, correction frequency, latency under tool chaining, explanation usefulness, user confidence, rollback events, and divergence between suggested and approved actions.
If you’re building internal tools, instrument this early. The pseudocode example in the source for logging per-action feedback is basic, but the idea is solid. Attach user ratings and comments to specific actions, not vague session-level satisfaction scores. Tie them to traces. Otherwise you won’t know whether the failure came from planning, perception, retrieval, or the UI.
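The source's pseudocode isn't reproduced here, but the idea it describes can be sketched as follows: feedback rows keyed by action ID and trace ID, with one of the wider metrics derived from them. Field names are assumptions.

```python
# Hypothetical per-action feedback log: each rating is tied to a specific
# action and its trace, so failures can be attributed to a component.

feedback_log: list = []

def record_feedback(action_id: str, trace_id: str, rating: int,
                    comment: str = "", intervened: bool = False) -> None:
    feedback_log.append({
        "action_id": action_id,   # the specific action, not the session
        "trace_id": trace_id,     # links to planner/tool/retrieval spans
        "rating": rating,
        "comment": comment,
        "intervened": intervened, # did the human have to step in?
    })

def intervention_rate() -> float:
    """Share of actions that needed human intervention."""
    if not feedback_log:
        return 0.0
    return sum(f["intervened"] for f in feedback_log) / len(feedback_log)
```

Because every row carries a trace ID, a bad rating can be walked back to the planning, perception, or retrieval span that caused it instead of dying as a session-level score.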
That has a direct MLOps consequence. Agent observability has to cover both model behavior and human response. Conventional application logs won’t do it. Pure model tracing won’t either.
Good architecture still comes with a debugging bill
The source recommends separating perceptual modules such as vision and ASR from decision modules, and using an event bus like Kafka or RabbitMQ to decouple components.
That’s sensible. It lets teams scale parts of the system independently, swap models without rewriting everything, and keep latency-sensitive services from getting tangled up with slower planning loops.
It also makes debugging harder.
Every boundary is another place for context loss, schema drift, or inconsistent state. Event-driven systems look clean until you have to reconstruct why an agent made a bad decision three services and two queues later.
So yes, modularize. But take end-to-end tracing, message versioning, and deterministic replay seriously. If the system touches physical-world tasks, money, or customers, auditability needs to be there before rollout, not bolted on after the first incident review.
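A small, concrete step in that direction is a versioned event envelope, so every message on the bus carries what replay and reconstruction need. This is a generic sketch, not tied to any specific Kafka or RabbitMQ client:

```python
import json
import time
import uuid

# Hypothetical event envelope: every message carries a schema version and
# a trace ID so a bad decision can be reconstructed across services.

def make_event(event_type: str, payload: dict, trace_id: str = "",
               schema_version: str = "1.0") -> str:
    return json.dumps({
        "type": event_type,
        "schema_version": schema_version,           # catches schema drift on replay
        "trace_id": trace_id or uuid.uuid4().hex,   # one ID across all hops
        "ts": time.time(),                          # ordering for reconstruction
        "payload": payload,
    })
```

With the trace ID propagated through every queue hop, "why did the agent do that three services ago" becomes a query instead of an archaeology project.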
Explainability needs discipline
The source mentions capturing intermediate reasoning and surfacing it as user-friendly justification. Fair enough. The details matter.
By 2026, most serious teams know that exposing raw chain-of-thought is a bad default. It can leak sensitive information, create legal risk, and dress up speculation as explanation. Most users need something simpler: a short action summary, the evidence behind it, and a clear statement of confidence or uncertainty.
Good explanation tooling should answer:
- what the agent is about to do
- why it thinks that makes sense
- what inputs or tools informed the decision
- what could go wrong
- how the user can intervene
That’s useful explainability. Dumping token soup into the UI isn’t.
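The five questions above map naturally onto a structured explanation payload, instead of raw chain-of-thought. A sketch with assumed field names:

```python
# Hypothetical structured explanation: one field per question the user
# actually needs answered, instead of exposed reasoning tokens.

def build_explanation(action: str, reason: str, inputs: list,
                      risks: list, undo_hint: str,
                      confidence: float) -> dict:
    return {
        "action": action,                    # what the agent is about to do
        "reason": reason,                    # short justification, not token soup
        "evidence": inputs,                  # inputs and tools behind the decision
        "risks": risks,                      # what could go wrong
        "intervention": undo_hint,           # how the user can stop or undo it
        "confidence": round(confidence, 2),  # stated uncertainty, not implied
    }
```

A payload like this is also auditable: it can be logged alongside the action it justifies, which raw reasoning dumps rarely are.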
Where this lands in the broader stack
TechCrunch’s broader agenda reportedly includes model selection, cross-modal integration, robustness, scaling, and alignment. All of that connects directly to Perszyk’s area.
Model selection affects interaction quality because different models handle ambiguity, memory, and tool use differently. Cross-modal integration changes what context actually means. Robustness becomes a user experience problem the second an agent enters messy real environments. Alignment stops being abstract once the system starts touching APIs, devices, or enterprise data.
Security runs through all of it. An agent that can act across systems increases the blast radius for prompt injection, permission creep, bad retrieval, and poisoned tool outputs. Human-centered design has to include permission boundaries, review gates, and clear action scopes. A polished interface won’t save a reckless execution model.
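Permission boundaries can be as simple as an explicit allowlist with per-tool constraints, checked before any call executes. A hypothetical sketch; the tools and limits here are invented for illustration:

```python
# Hypothetical action-scope gate: each agent gets an allowlist of tools
# and argument constraints, so a prompt injection can't widen its reach.

ALLOWED = {
    "search_docs": {},                       # read-only, always allowed
    "send_email": {"max_recipients": 1},     # narrow scope by default
}

def check_scope(tool: str, args: dict) -> bool:
    if tool not in ALLOWED:                  # deny anything off the allowlist
        return False
    limits = ALLOWED[tool]
    if "max_recipients" in limits:
        if len(args.get("recipients", [])) > limits["max_recipients"]:
            return False
    return True
```

Deny-by-default is the design choice that matters: the agent's reach is defined by what operators granted, not by whatever its planner decides to attempt.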
That’s why this keynote is worth watching. The market has heard enough vague talk about AI assistants. What teams need are better patterns for building systems that stay useful under supervision, are clear about uncertainty, and hold up in production.
Perszyk’s session may or may not get all the way there. Keynotes often promise more than they deliver. Still, the topic choice is a signal. Serious teams are spending less time arguing about model IQ and more time dealing with a harder question: how do you make an agent legible, cooperative, and safe enough to trust with actual work?
For senior engineers and tech leads, that’s the point. The next bottleneck is getting agents to act in ways people can manage.