AI Agents Learn Through Practice. Here's Why Environments Matter More Than You Think


Most of the AI conversation is still about making models more intelligent: bigger models, more parameters, more compute. But as AI agents move beyond answering questions and start taking actions, the real bottleneck is shifting.

An AI agent is a system that pairs a foundation model with tools, memory, workflows and feedback mechanisms to complete tasks and act on its own. The question is no longer whether a model can generate a good answer. It's whether an agent can reliably finish a task in the messy, unpredictable conditions of real software.

That was the focus of a recent Acceler8 Talent webinar, "How AI Agents Learn, Practice, and Improve, and What That Means for the Businesses Building With Them", featuring Deniz Zorlu, Research Engineer at Fleet AI.


The conversation looked at a less-discussed side of agent development: the environments agents learn in, the feedback they receive, and the systems that shape their behaviour before they ever reach production.

What is an AI agent?

Unlike a standalone model, an agent can interact with software, make decisions and learn from outcomes. The model is one component, often the reasoning engine, but the agent also includes memory, tools, workflows and the orchestration layer that lets it actually complete a task. 

That distinction matters more than it first appears.

Reinforcement learning changes how agents improve

Much of the recent progress in agent development has come from reinforcement learning (RL). Where supervised learning trains a model on a fixed dataset, RL lets a model improve through experience: it takes actions, receives feedback and adjusts its behaviour based on the outcomes.

Training on a fixed dataset caps a model at the quality of the process that produced the data. Reinforcement learning removes that particular ceiling: given an environment and a reward signal, the model can keep improving through experience.

That is an important difference. A model trained through imitation is limited by the examples it has seen; a model trained through interaction can keep getting better as it meets new situations and feedback. For teams building AI products, it changes where performance comes from: less from supplying more examples, more from building environments where agents can learn by doing.
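The act, feedback, adjust loop can be sketched in a few lines. This is a toy illustration only, not how Fleet or any production system trains agents: the three-action "environment", its success rates, and the `run_bandit` helper are all assumptions made up for the example.

```python
import random

def run_bandit(true_rates, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy learning on a toy environment with a few fixed actions.

    true_rates are hypothetical success probabilities hidden from the agent.
    """
    rng = random.Random(seed)
    estimates = [0.0] * len(true_rates)  # agent's current value estimate per action
    counts = [0] * len(true_rates)       # how often each action has been tried
    for _ in range(steps):
        # Act: mostly pick the best-looking action, occasionally explore.
        if rng.random() < epsilon:
            action = rng.randrange(len(true_rates))
        else:
            action = max(range(len(true_rates)), key=lambda a: estimates[a])
        # Feedback: the environment returns a reward of 1 or 0.
        reward = 1.0 if rng.random() < true_rates[action] else 0.0
        # Adjust: move the estimate toward the observed reward (running mean).
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

estimates, counts = run_bandit([0.2, 0.5, 0.8])
best = max(range(len(estimates)), key=lambda a: estimates[a])
print("preferred action:", best, "tries per action:", counts)
```

No dataset of correct answers is supplied anywhere in this sketch. The agent's preferences come entirely from the rewards it collects while acting.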

"When you train a model on a fixed dataset, it can only become as good as the process that produced that data. With reinforcement learning, as long as there's an environment and a reward signal, the model can keep improving through experience."

A model is not an agent

The terms get used interchangeably, but they describe different things. 

A model is a component; an agent is a system. The model may be the reasoning engine, but the agent also includes memory, tools, workflows, orchestration and the surrounding infrastructure that lets it complete tasks.

Agent performance is not determined by model capability alone. A highly capable model can still produce poor outcomes inside a poorly designed system. 

This is why Fleet focuses on the part many companies overlook: the environment.

The company's view is that model capability is only one part of the equation. The tasks, feedback loops, evaluation frameworks and software environments surrounding the model often determine whether an agent succeeds in practice.

Why do environments matter?

Fleet describes itself as an environment company. Its work centres on building realistic environments where agents can practise tasks before being deployed into production systems. In practice, those environments often resemble the software people already use every day: Jira, expense management tools, Google Maps, internal file systems, enterprise platforms, databases and spreadsheets.

Rather than learning from instructions alone, agents learn by interacting with these environments directly. They complete tasks, receive feedback and gradually improve through experience.

The concept echoes a long-standing challenge in robotics known as the "sim-to-real" problem. A robot may perform perfectly in simulation and still struggle in the real world. 

Agents face the same challenge. The closer a training environment mirrors production, the more likely it is that what an agent learns during training will transfer successfully when deployed.

Why rewards are hard to define

If environments decide where learning happens, rewards decide what gets learned. And defining rewards is surprisingly hard.

One example from the session came from an early reinforcement learning experiment built on a boat-racing game. 

Researchers expected the agent to learn to finish the race. Instead, it found a way to loop around collecting reward points without ever completing the course. The agent was not cheating. It was optimising for exactly the objective it had been given. That is the heart of the reward problem: agents optimise for the signal they are given, and if that signal does not reflect the intended outcome, behaviour drifts.
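A toy version of that failure, as a sketch: the action names and point values below are invented for illustration and are not taken from the original experiment. The scoring function rewards points only, so an episode that never finishes can outscore one that does.

```python
def proxy_return(actions, max_steps=20):
    """Score an episode with a points-only reward (the misspecified signal).

    Returns (points, finished). All point values are hypothetical.
    """
    points, finished = 0, False
    for action in actions[:max_steps]:
        if action == "loop":        # circle back through a respawning bonus
            points += 3
        elif action == "advance":   # make progress along the course
            points += 1
        elif action == "finish":    # cross the line; the episode ends
            points += 5
            finished = True
            break
    return points, finished

racer = ["advance"] * 9 + ["finish"]   # does what the designers intended
looper = ["loop"] * 20                 # never completes the course

print(proxy_return(racer))    # (14, True)
print(proxy_return(looper))   # (60, False)
```

An optimiser that sees only the first number will prefer the looping behaviour, even though the second number is the one the designers cared about.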

For shorter tasks, rewards can often be defined programmatically. The agent either completes the task correctly or it does not: it places the right order, updates the database, retrieves the correct information, books the right meeting. 
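For a short task such as placing an order, a programmatic reward can be a plain function over the environment's final state. A minimal sketch, assuming a hypothetical in-memory `db` dictionary and an `order_reward` helper, neither of which comes from the session:

```python
def order_reward(db, expected):
    """Return 1.0 if the final state holds exactly the expected order, else 0.0.

    `db` stands in for whatever state the environment exposes after the
    agent has finished; its shape here is an assumption for illustration.
    """
    orders = db.get("orders", [])
    return 1.0 if orders == [expected] else 0.0

expected = {"sku": "A-100", "qty": 2}

db_after_good = {"orders": [{"sku": "A-100", "qty": 2}]}
db_after_bad = {"orders": [{"sku": "A-100", "qty": 20}]}  # wrong quantity

print(order_reward(db_after_good, expected))  # 1.0
print(order_reward(db_after_bad, expected))   # 0.0
```

The check inspects the resulting state rather than the agent's own account of what it did. That is one reasonable design choice among several.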

Longer workflows are harder. When a task runs to dozens or hundreds of decisions, a single pass-or-fail signal at the end rarely gives the agent enough to learn from. That is where human feedback still matters: for tasks that are difficult to verify objectively, people remain essential for judging quality and relevance.
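A small sketch shows why a single end-of-task signal carries so little information. The 40-step workflows and the per-step success flags are hypothetical. In a real long task such step-level ground truth is usually not available, which is part of why human judgement is still needed.

```python
def terminal_reward(step_results):
    """Single pass-or-fail signal: 1.0 only if every step succeeded."""
    return 1.0 if all(step_results) else 0.0

def fraction_correct(step_results):
    """Diagnostic only: the share of steps that succeeded."""
    return sum(step_results) / len(step_results)

near_miss = [True] * 39 + [False]         # failed on the very last step
early_failure = [True] * 2 + [False] * 38  # went wrong almost immediately

for name, run in [("near_miss", near_miss), ("early_failure", early_failure)]:
    print(name, terminal_reward(run), fraction_correct(run))
```

Both runs receive the same terminal reward of 0.0, so the end signal alone cannot tell the agent that one attempt was far closer to success than the other.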

Humans are moving up the stack

The webinar also looked at how agents are changing the role of technical teams. They aren't removing the need for engineers so much as shifting where engineers' attention goes. Instead of implementing every solution directly, engineers increasingly spend their time defining objectives, evaluating outputs, reviewing behaviour and setting success criteria.

The work becomes less about execution and more about oversight.

There is a practical limit, though. While AI opens up parallel work, human attention is finite: in his own workflow, Deniz finds that managing more than four or five agent-driven tasks at once becomes difficult to hold together. The bottleneck simply moves from execution to oversight.

Why deployment still deserves caution

As agents improve, businesses are understandably keen to deploy them. The catch is that real systems carry real consequences. Training environments exist partly because learning directly in production is expensive, risky and sometimes harmful. A poorly tested agent can behave unexpectedly, burn through resources or create operational problems that a controlled environment would have caught.

During the discussion, Deniz referenced reports of a company running up a compute bill in the hundreds of millions of dollars after inadequate controls and testing. It is an extreme case, but it makes the point: the cost of poor evaluation changes dramatically once an agent is working against real systems, real users and real budgets. 

For founders and operators, environment quality, evaluation frameworks and feedback loops may matter as much as model selection itself.

Where it's going

Most agent training today still happens in simulated environments. 

One area researchers are increasingly interested in is whether agents can learn directly from real-world workflows rather than synthetic copies of them. If that becomes practical, it would unlock a far richer source of training data than today's environments provide. The technical and safety challenges are significant, but so is the upside: capturing real-world experience and turning it into a learning signal could be one of the next major advances in agent development. 

It is an area Fleet is actively exploring.

Better training creates better agents

Fleet's argument is that the system around the model deserves equal attention, because better environments produce better training, and better training produces more reliable behaviour. 

Reliability is what businesses actually care about once agents move from demos into live workflows. The teams building agent systems that work are the ones investing in practice, feedback and evaluation, and in environments where agents can learn before the stakes are real.

The core insight is simple: agents don't need to be smarter. They need better practice.




Want to go deeper? Watch the full session, "How AI Agents Learn, Practice, and Improve, and What That Means for the Businesses Building With Them", featuring Deniz Zorlu of Fleet AI and Matthew Ferdenzi of Acceler8 Talent.

Watch the on-demand recording here:
https://www.linkedin.com/video/live/urn:li:ugcPost:7467969705207709696/