From Models to Systems:
Engineering Trustworthy AI Agents

Yuan (Emily) Xue
Purdue CERIAS Annual Cybersecurity Symposium — April 2026

Three Curves. One Shift.

Open-Source Autonomous Agents

GitHub stars / repo growth exploding

335K stars in 3 months

Coding Agent Adoption

135K+ GitHub commits/day · ~4% of all public commits

42,896x growth in 13 months

Source: SemiAnalysis, Feb 2026

AI Traffic in Cyber Systems

Machine-to-machine / bot / agentic traffic

7,851% YoY agent traffic

Source: HUMAN Security 2026 Report

AI is no longer just a model. It is becoming an operational actor.

Story 1

The Viral Rise of Autonomous Agents

GitHub Star History: React vs OpenClaw vs Linux

Representative open-source agent repo growth vs established projects

Autonomy

It keeps trying without being told

Persistence

It stays alive, stays connected

Surprise

It finds paths you didn't script

Story 1

Why It Felt Different

Long-Running

Persists over time
Keeps state & context

Goal-Directed

Decomposes tasks
Retries & adapts

Uses tools · Retries on failure · Keeps state · Remembers context · Decomposes tasks · Writes & executes code

Goal → Planner → Tool Use → Memory → Execution → Retry / Adapt

What made these systems feel magical was not just model quality. It was agency structure: the system persists over time, it uses tools, it keeps working toward a goal, and it can surprise you with a path you didn't explicitly script. That "aha moment" is exactly what drives virality. But the same properties that create the "aha" also create the security problem.
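The Goal → Planner → Tool Use → Memory → Execution → Retry / Adapt structure can be sketched in a few lines. This is a minimal illustration, not how any particular agent is built: `plan`, `run_agent`, `AgentRun`, and the tool names `search` and `summarize` are hypothetical, and a real planner would call a model instead of returning a fixed list.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class AgentRun:
    goal: str
    memory: List[str] = field(default_factory=list)  # state kept across steps


def plan(goal: str) -> List[Tuple[str, str]]:
    # Hypothetical planner: a real one would ask a model to decompose the goal.
    return [("search", goal), ("summarize", goal)]


def run_agent(goal: str,
              tools: Dict[str, Callable[[str], str]],
              max_retries: int = 2) -> AgentRun:
    run = AgentRun(goal)
    for tool_name, arg in plan(goal):
        # One initial attempt plus up to max_retries retries per step.
        for attempt in range(1, max_retries + 2):
            try:
                result = tools[tool_name](arg)
                run.memory.append(f"{tool_name}: {result}")
                break
            except RuntimeError as err:
                # The failure is remembered, then the step is retried.
                run.memory.append(f"{tool_name} failed (attempt {attempt}): {err}")
        else:
            run.memory.append(f"{tool_name}: gave up")
    return run
```

Even in this toy form, the loop keeps acting without further instruction until the plan is exhausted, which is the property that makes the security question unavoidable.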

Story 1

Capability Requires Privilege

To do more, agents need more access.

Access They Want

File system
Browser
Terminal
APIs
Credentials
Cloud / infra

What Users Want

Convenience
Speed
No setup friction
Just works

What Security Wants

Least privilege
Auditability
Isolation
Revocation
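Three of the four items in the security list (least privilege, auditability, revocation) can be illustrated with a small grant registry. This is a sketch under stated assumptions: `GrantRegistry`, the agent name, and the scope strings are hypothetical, and isolation is not modeled here because it needs OS or sandbox support.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class GrantRegistry:
    grants: Set[Tuple[str, str]] = field(default_factory=set)  # (agent, scope)
    audit_log: List[str] = field(default_factory=list)

    def grant(self, agent: str, scope: str) -> None:
        # Least privilege: access is granted one narrow scope at a time.
        self.grants.add((agent, scope))
        self.audit_log.append(f"grant {agent} {scope}")

    def revoke(self, agent: str, scope: str) -> None:
        # Revocation takes effect on the next check.
        self.grants.discard((agent, scope))
        self.audit_log.append(f"revoke {agent} {scope}")

    def check(self, agent: str, scope: str) -> bool:
        # Auditability: every decision is recorded, allowed or denied.
        allowed = (agent, scope) in self.grants
        self.audit_log.append(f"{'allow' if allowed else 'deny'} {agent} {scope}")
        return allowed
```

The user-side wishes (no setup friction, just works) pull in the opposite direction: each explicit `grant` call is exactly the friction users want to avoid.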

Story 2

Coding Agents Are Entering the Software Supply Chain

Source: SemiAnalysis, "Claude Code is the Inflection Point," Feb 2026

135K+

GitHub commits per day
~4% of all public commits

42,896x

Growth in 13 months
since research preview

20%+

Projected share of daily
GitHub commits by end of 2026

AI is no longer just assisting code. It is participating in release pipelines.

Story 2

The Coding Agent Amplification Loop

What They Do

Generate, debug & fix code
Write tests & modify configs
Package software & set up CI/CD
Open PRs & commit changes

Where They're Used

Traditional software
ML systems
Agent systems themselves

The Amplification Loop

Agent builds software

↓

Software contains agents

↓

Agents build more software

↻

Story 2

Claude Code Release Incident

March 2025 — The release pipeline created the risk.

What Happened

The Flaw: npm package publication included .map files by default
The Leak: Obfuscated source code was easily reconstructible via these source maps
The Exposure: Internal functions, comments, and non-public logic were visible to anyone who downloaded the CLI tool

The Challenges

Data Leakage via Metadata: The source maps acted as unintended metadata
Cognitive Overload: A human error — but how can human cognitive load keep pace with AI coding velocity?
The Deskilling Problem: When automation replaces human tasks, how do humans retain the knowledge needed to make sound judgments?

In modern web development, source maps bridge the gap between compressed, unreadable code and the original source. By accidentally shipping these, Anthropic effectively "open-sourced" their proprietary agent logic.

Source: Anthropic PBC, "Security Advisory: Claude Code Source Map Exposure," March 2025. anthropic.com/news/claude-code-security-update
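A release pipeline can guard against this class of mistake with a pre-publish check on the list of files about to ship. The sketch below is an assumption about how such a gate might look, not a description of any vendor's actual fix: `find_source_maps` and `assert_publishable` are hypothetical names, and the file list is assumed to come from the packaging tool (for npm, the listing printed by `npm pack --dry-run` is one possible input).

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def find_source_maps(package_files: Iterable[str]) -> List[str]:
    """Return the files whose final extension is .map, sorted."""
    return sorted(p for p in package_files if PurePosixPath(p).suffix == ".map")


def assert_publishable(package_files: Iterable[str]) -> None:
    """Raise before publication if any source map would be shipped."""
    leaked = find_source_maps(package_files)
    if leaked:
        raise ValueError(f"refusing to publish, source maps found: {leaked}")
```

A check like this moves the decision from human attention at release time to a machine-enforced rule, which speaks to the cognitive-overload concern listed under The Challenges.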

Agents Are Becoming Infrastructure

Once agents write code, open tickets, modify configs, query systems, or operate workflows — they become part of the attack surface and part of the control plane.

AI Agent

Software Development

Release Process

Cloud Resources

Networked Systems

Browsers / APIs

Enterprise Workflows

AI agents are no longer outside the system. They are inside it.

Recent cyber benchmarks show AI agent traffic growing 7,851% YoY, with automated traffic now growing 8× faster than human traffic. — HUMAN Security, 2026

The ML Community's Perspective

Two Engineering Practices Shaping Today's AI Safety

Model Developer View

Policy / constitutional alignment
Post-training for preference & behavior shaping
RLHF / RLAIF / RLVR (reinforcement learning from human / AI / verifiable reward)
Safety evaluation and red teaming
Inference-time safety filters

Applied AI Engineer View

Prompt engineering
Context engineering
Tool scaffolding
Harnesses and workflow design
Eval sets and regression testing
Red teaming

Both communities are highly eval-driven, but they optimize different layers of the stack.

Model Safety Is Already a Lifecycle Practice

The model developer community treats safety as a multi-stage discipline.

Training

Policy / constitution shaping
RLHF / RLAIF
RLVR / reasoning optimization

→

Evaluation

Safety benchmarks
Adversarial prompting
Red teaming

→

Inference-Time Controls

Moderation
Refusal behavior
Guardrails / policy enforcement

Example: Constitutional AI Rules

"Please choose the assistant response that is as harmless and helpful as possible, without being dishonest."

"Choose the response that would be most appropriate for a helpful, honest, and harmless AI assistant."

Trends in AI Agent Development

Each phase of AI engineering expanded what agents can do — and what can go wrong.

Trends in AI Interaction and Development

Google Trends, US, 2023–2026

Prompt Engineering

Can the model be induced to say the wrong thing?

Context Engineering

Can the agent be manipulated through what it reads?

Scaffolding / Orchestration

Can the system complete an unsafe sequence of plausible actions?

Agent Harness

Can we observe, contain, and govern the full operational system?

The Risk Surface Expands at Every Stage

Prompt Engineering

Risk surface: output manipulation

Prompt injection & jailbreaks
Harmful / policy-violating outputs
Hallucinations & misleading reasoning
Sensitive information leakage

Context Engineering

Risk surface: context poisoning & trust boundary failure

RAG poisoning & malicious documents
Hidden instructions in retrieved content
Memory poisoning & stale context
Trust confusion: system vs. external input

Scaffolding / Orchestration

Risk surface: workflow-level failure & exploit chaining

Exploit chaining across tools & steps
Recursive failure loops & error amplification
Policy drift across subtasks
Bypass of human checkpoints via workflow logic

Agent Harness

Risk surface: operational governance at scale

Execution shell integrity & state corruption
Multi-agent coordination failures
Environment coupling & side effects
Evaluation blind spots in production

A chain of locally reasonable decisions that becomes globally unsafe: that is the new failure mode.
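One way to make this failure mode concrete is a check that looks at the whole action sequence rather than at each action alone. The sketch below is illustrative only: the action names and the rule (no external send after a sensitive read) are hypothetical examples of a sequence-level policy.

```python
from typing import List, Optional

# Hypothetical action names; each is individually plausible for an agent.
SENSITIVE_READS = {"read_credentials", "read_customer_db"}
EXTERNAL_SENDS = {"send_email", "http_post"}


def first_unsafe_step(actions: List[str]) -> Optional[int]:
    """Return the index of the first external send that follows a
    sensitive read, or None if the sequence never combines the two."""
    tainted = False
    for i, action in enumerate(actions):
        if action in SENSITIVE_READS:
            tainted = True
        elif action in EXTERNAL_SENDS and tainted:
            return i
    return None
```

A per-action allowlist would approve every step in `["read_credentials", "write_code", "send_email"]`; only the sequence-level view flags the final step.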
  

Where We Are Now: The Agent Harness

The harness is the surrounding system that allows agents to operate, coordinate, observe, and improve within an environment.

Execution Shell

Message passing
Tool invocation
State & turn control
Retry logic
Trace collection

Coordination Layer

Planner / executor separation
Specialist agents
Reviewer / verifier roles
Routing & hierarchy
Multi-agent composition

Environment Interface

Tools & APIs
Browser & filesystem
Code executor
CRM / DB / Slack / email

Observation & Eval

Logging & replay
Trace analysis
Regression testing
Adversarial testing
Failure diagnosis

A harness provides the runtime structure that allows AI agents to operate, interact with their environment, and be observed, constrained, and improved.
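A minimal sketch of the execution-shell and observation pieces is shown here: tool invocation goes through one choke point that records a trace event for every call, including failures and calls to unregistered tools. `Harness`, `TraceEvent`, and the `echo` tool in the usage note are hypothetical names; a production harness would add turn control, retries, and the coordination layer.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Callable, Dict, List


@dataclass
class TraceEvent:
    step: int
    tool: str
    arg: str
    ok: bool
    output: str


@dataclass
class Harness:
    tools: Dict[str, Callable[[str], str]]
    trace: List[TraceEvent] = field(default_factory=list)

    def invoke(self, tool: str, arg: str) -> str:
        step = len(self.trace)
        if tool not in self.tools:
            # Constrain: only registered tools can run, and the attempt is logged.
            self.trace.append(TraceEvent(step, tool, arg, False, "tool not registered"))
            raise KeyError(tool)
        try:
            output = self.tools[tool](arg)
        except Exception as err:
            # Observe: failures are traced before they propagate.
            self.trace.append(TraceEvent(step, tool, arg, False, repr(err)))
            raise
        self.trace.append(TraceEvent(step, tool, arg, True, output))
        return output

    def export_trace(self) -> str:
        # Serialized trace for logging, replay, or regression tests.
        return json.dumps([asdict(event) for event in self.trace])
```

For example, `Harness({"echo": str.upper}).invoke("echo", "hi")` returns the tool output and leaves one successful event in the trace.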
  

The ML Community's Core Mindset

Specify behavior. Measure behavior. Optimize behavior.

How Behavior Is Specified

By Data

Human preference data
Comparison labels
Reward models
Safety / moderation examples

By Specification

Constitutions / policy rules
AI feedback
Verifiable constraints
Programmatic judges

How It Is Enforced

RLHF
RLAIF
Verifier RL / RLVR
LLM-as-judge
Guardrails
Post-training optimization

The key assumption: if we shape the right behavior signal, the system will generalize correctly.

The modern ML community has developed a very powerful operating philosophy: if you can specify desired behavior, evaluate it, and optimize against it, you can drive remarkable capability and alignment progress.
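The "specify, measure, optimize" loop depends on a behavior signal that a program can compute. As a toy illustration of a verifiable signal (the kind of programmatic judge listed above), the sketch below scores a candidate function by the fraction of test cases it passes. `verifiable_reward` is a hypothetical name, and real verifier-based training pipelines are far more involved.

```python
from typing import Callable, List, Tuple


def verifiable_reward(candidate: Callable[[int], int],
                      cases: List[Tuple[int, int]]) -> float:
    """Fraction of (input, expected) cases the candidate gets right."""
    if not cases:
        raise ValueError("at least one test case is required")
    passed = 0
    for x, expected in cases:
        try:
            if candidate(x) == expected:
                passed += 1
        except Exception:
            # A crashing candidate simply earns no credit for that case.
            pass
    return passed / len(cases)
```

The reward says nothing about side effects, privileges used, or what happens on partial failure, which is the gap the open questions below address.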

RL Quietly Became the Development Engine

Reinforcement learning is no longer a niche method — it is becoming a general development paradigm across model and agent systems.

RLHF / RLAIF for preference alignment
Verifier-based RL for reasoning & correctness
Tool-use optimization
Self-improvement loops
Agent training through outcome signals

"The Bitter Lesson"

— Richard Sutton

Systems that scale with computation and learning tend to dominate systems built around hand-crafted human structure.

The lesson many in AI internalized: specify as little as possible, optimize as much as possible.

Leave room for search, creativity, and emergent intelligence.

This mindset helped take us from chat models to capable agents. RL is now the engine behind alignment, reasoning, tool use, and agent behavior. But this philosophy — optimize behavior, minimize specification — creates a tension with security and systems engineering, which demand explicit contracts, boundaries, and verifiable constraints. That tension is where the open questions live.

Open Question #1: Can We Make Agency Transactional?

How do we contain, checkpoint, and roll back probabilistic side effects?

Agent reads docs → writes code → opens PR → updates ticket → sends email → step 6 fails

Steps 1–5 already changed the world. What now?

Partial failure handling

Checkpointing & journaling

Idempotence

Rollback / compensation

Safe recovery

ML asks whether the behavior was good. Systems asks what happens when step 6 fails after steps 1–5 already changed the world. This is a systems primitive — can we treat agent actions like database transactions, with checkpoints, journaling, and rollback?
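One established systems pattern for this situation is to journal each step and pair it with a compensating action that is run in reverse order when a later step fails. The sketch below assumes that pattern; `run_with_compensation`, `Journal`, and the step names in the usage note are hypothetical, and it is not a full transaction manager.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# (name, action, compensation); the compensation undoes or offsets the action.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


@dataclass
class Journal:
    entries: List[str] = field(default_factory=list)


def run_with_compensation(steps: List[Step], journal: Journal) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    done: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception as err:
            journal.entries.append(f"failed {name}: {err}")
            for prev_name, _, compensate in reversed(done):
                compensate()
                journal.entries.append(f"compensated {prev_name}")
            return False
        journal.entries.append(f"done {name}")
        done.append(step)
    return True
```

Compensation is weaker than rollback: a closed PR or a reverted ticket can be offset, but an email that has already been sent cannot be unsent, so steps like that may need to be deferred to the end or gated by a human checkpoint.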

Open Question #2: What Does Least Privilege Mean for Reasoning Systems?

How do we bound what an agent is allowed to access, infer, plan, and execute?

Today

Static tool permissions
Broad access scopes
Coarse sandboxing
API-level access control

What We Need

Contextual permissions
Plan-aware authorization
Dynamic trust boundaries
Reasoning-aware control

Least privilege in classical security is about what code is allowed to do.
For agents, we also need to think about what the system is allowed to infer, plan, and attempt.

Least privilege is well understood for traditional software, but what does it mean when the system reasons, plans, and adapts? We need contextual, plan-aware authorization — not static tool permissions. This is a security primitive that the field has not yet built.
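As a rough sketch of what plan-aware authorization could look like, the check below evaluates an entire proposed plan against the task it was delegated for, before any step runs. Everything here is hypothetical (`TaskContext`, `authorize_plan`, the tool names), and the string-prefix test is a stand-in for real path canonicalization, which it does not perform.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class TaskContext:
    task: str
    allowed_tools: FrozenSet[str]
    allowed_path_prefix: str


PlanStep = Tuple[str, str]  # (tool, target)


def authorize_plan(plan: List[PlanStep], ctx: TaskContext) -> List[str]:
    """Return every violation in the plan; an empty list means authorized."""
    violations: List[str] = []
    for i, (tool, target) in enumerate(plan):
        if tool not in ctx.allowed_tools:
            violations.append(
                f"step {i}: tool '{tool}' not allowed for task '{ctx.task}'")
        elif not target.startswith(ctx.allowed_path_prefix):
            # Illustrative only: a plain prefix test does not resolve '..' segments.
            violations.append(
                f"step {i}: target '{target}' outside '{ctx.allowed_path_prefix}'")
    return violations
```

The permissions here are derived from the task rather than attached to the tool, so the same agent receives a different boundary for each delegation.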

Open Question #3: Can We Build a Compiler for Intent?

How do we translate human goals into machine-checkable authority boundaries before action?

Today

Natural language goal → action / code / tool use

Needed

NL goal → structured intent → constraints / policy → safe execution

Delegated scope

Intent representation

Constraint extraction

Pre-execution validation

Policy-aware planning

If least privilege is the principle, then intent compilation may be the mechanism.
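The pipeline NL goal → structured intent → constraints / policy → safe execution can be sketched for one very narrow goal template. This is a toy under strong assumptions: the regular expression, `Intent`, `compile_intent`, and `validate` are all hypothetical, and a real intent compiler would need far richer parsing than a single pattern.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Intent:
    verb: str
    repo: str
    forbidden_path: str


# Hypothetical goal template: "<fix|update> tests in <repo> without touching <path>"
GOAL_PATTERN = re.compile(
    r"^(?P<verb>fix|update) tests in (?P<repo>[\w-]+) "
    r"without touching (?P<path>[\w./-]+)$"
)


def compile_intent(goal: str) -> Optional[Intent]:
    """Turn a goal into structured intent, or None if it cannot be parsed."""
    match = GOAL_PATTERN.match(goal.strip().lower())
    if match is None:
        return None  # unparseable goals are rejected rather than guessed at
    return Intent(match["verb"], match["repo"], match["path"])


def validate(intent: Intent, actions: List[Tuple[str, str]]) -> List[str]:
    """Pre-execution validation of proposed (operation, path) actions."""
    errors: List[str] = []
    for op, path in actions:
        if op == "write" and path.startswith(intent.forbidden_path):
            errors.append(
                f"write to {path} violates 'without touching {intent.forbidden_path}'")
    return errors
```

The design choice worth noting is the failure behavior: when the goal does not compile, nothing executes, which is the reverse of the current path from natural language straight to action.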

Anthropic's stress-testing of frontier models found that when facing replacement or goal conflicts, systems consistently chose harmful actions over failure, demonstrating that current safety training doesn't reliably prevent "agentic misalignment."

Source: Anthropic, "Agentic Misalignment," June 2025

The Translation Gap Across Communities

Community	Strong At	Often Misses
ML / AI	Behavior shaping, evaluation, optimization	State, rollback, authority, runtime control
Software Eng	Abstractions, testing, maintainability	Model non-determinism, prompt-mediated failure
Systems	State, observability, failure propagation	Behavioral ambiguity, learned policies
Security	Threat models, privilege, containment	Agent reasoning loops, emergent workflows

We are not missing effort. We are missing a shared systems language.

What We Need to Build Next

What the Field Needs

Shared mental models across ML, systems, security, and SE
New assurance primitives for agentic systems
Runtime control and audit infrastructure
Reference architectures for trustworthy deployment

What Each Community Must Bring

ML: Better behavior shaping is not enough
Security: Treat agents as dynamic operational actors
Systems: Build rollback, observability, and control for agency
SE: Define specs, interfaces, and enforceable contracts

The bottleneck is no longer model quality. It is our ability to engineer trustworthiness around these systems.

The Agent Era Is Here.

The question is no longer: Can we build these systems?

It is: Can we build them in a way that deserves trust?

Trustworthy AI will not come from better models alone.
It will require systems thinking, engineering discipline,
and a shared map across communities.
