Agents and Agentic Deployments with MultiTool

Published December 16, 2024 by Eric Ghildyal

In the early days of Generative AI (GenAI), people used it for entertainment. They wrote humorous songs about celebrities and generated essays for readings that students didn’t do. Then people figured out they could use these new tools to do work, for better or for worse. GenAI started writing blog posts by the minute (not this one!). It became responsible for all those sales emails with specific, yet subtly incorrect, details about your alma mater. Over time, its use cases have expanded and its output has become more powerful, to the point where we can now ask: can an AI do more than write text? That question has paved the way for “Agentic AI.”

What is an Agent and Agentic AI?

GenAI is an umbrella term for any algorithm that outputs an artifact like text, video, or images. Earlier AI produced more abstract outputs, like moves in a chess game, stock predictions, or calculations. Agentic AI, which sits under the GenAI umbrella, outputs an action. An agent might send emails on your behalf, order DoorDash, or place phone calls. The breakthrough that allowed Agentic AI to take off is that agents are task-specific: they perform targeted actions as part of their output, rather than trying to be everything for everyone like a general-purpose LLM. An agent is purpose-built to receive a set of inputs, produce a decision, and then perform an action based on that decision.

Consider a simple example: an agent for an eCommerce store that processes warranty claims. The input is a warranty claim from a human customer plus some extra data about that customer. The output is an update to the customer support ticket with an initial approval or denial of the claim. By taking an action (changing the ticket status), the agent goes one step further than an LLM typically could. An LLM might be able to analyze the data and produce a written suggestion, but it could not carry out the decision itself.
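To make the distinction concrete, here is a minimal Python sketch of the warranty agent's input, decision, and action flow. `WarrantyClaim`, `CustomerData`, `TicketStore`, and the approval rule are all hypothetical placeholders, not a real support-desk API; the point is only that the agent's last step changes the ticket instead of returning text for a human to act on.

```python
from dataclasses import dataclass, field


@dataclass
class WarrantyClaim:
    ticket_id: str
    customer_id: str
    description: str  # a real agent would feed this text to a model


@dataclass
class CustomerData:
    customer_id: str
    prior_orders: int


@dataclass
class TicketStore:
    """Hypothetical stand-in for a support-ticket system's API."""
    statuses: dict[str, str] = field(default_factory=dict)

    def set_status(self, ticket_id: str, status: str) -> None:
        self.statuses[ticket_id] = status


def decide(claim: WarrantyClaim, customer: CustomerData) -> str:
    # Placeholder policy: a real agent would use a trained model here.
    return "approved" if customer.prior_orders > 0 else "denied"


def run_agent(claim: WarrantyClaim, customer: CustomerData,
              tickets: TicketStore) -> str:
    decision = decide(claim, customer)
    # The step that makes this an agent: it acts on an external system
    # rather than returning a suggestion for a human to apply.
    tickets.set_status(claim.ticket_id, f"initial: {decision}")
    return decision


tickets = TicketStore()
claim = WarrantyClaim("T-1001", "C-42", "Zipper broke after two weeks")
run_agent(claim, CustomerData("C-42", prior_orders=6), tickets)
print(tickets.statuses)  # {'T-1001': 'initial: approved'}
```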

Agentic AI can be implemented with traditional machine-learning methods such as classifiers or autoencoders, or, more recently, with deep neural networks. Some agents use mixed models: an LLM first processes the language-specific inputs, and then a deep neural network generates the actions.

Why use an agent?

Agents typically have two goals: save time and save money. Ideally, they do both. Our example eCommerce agent’s primary goal is to cut down on time needed to assess customer warranty claims. That goal is simple enough, but the “how” is where it gets interesting. 

Agents differ from simpler approaches like decision trees or Q-learning in the quality and variety of data they use to make decisions. Modern agent systems are adept at encoding many kinds of data sources into their underlying data structures (usually neural networks), which lets them learn from a wider array of inputs than other approaches. Our warranty agent might analyze the text of the email, a customer-provided image of the defect, and the customer’s purchase history to determine whether that customer is likely to shop again. It could then instantly approve warranty claims from everyone it identifies as a repeat shopper.

If a human were to perform this task, they might look at a photo of the defective product and decide whether the customer damaged it. An AI agent, however, can pull in data from other sources in addition to the photo, such as the customer’s past order history. This approach could reveal that the customer has made six previous purchases, and that the best course of action is to approve their claim to encourage repeat business.
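Here is a minimal sketch of how adding a second data source can flip the outcome. It assumes a hypothetical image model that estimates how likely the customer caused the damage, plus a hypothetical three-order threshold for “repeat shopper.” Both thresholds are illustrative, not tuned values.

```python
from dataclasses import dataclass


@dataclass
class ClaimSignals:
    # Hypothetical image-model output: estimated probability that the
    # customer (rather than a manufacturing defect) caused the damage.
    customer_damage_prob: float
    # Number of earlier orders, taken from the order-history system.
    prior_orders: int


# Hypothetical thresholds chosen for illustration only.
REPEAT_CUSTOMER_MIN_ORDERS = 3
DAMAGE_PROB_CUTOFF = 0.5


def photo_only_decision(s: ClaimSignals) -> str:
    """What a reviewer looking only at the photo might conclude."""
    return "deny" if s.customer_damage_prob >= DAMAGE_PROB_CUTOFF else "approve"


def multi_source_decision(s: ClaimSignals) -> str:
    """Combine the photo with purchase history, as the agent could."""
    if s.prior_orders >= REPEAT_CUSTOMER_MIN_ORDERS:
        return "approve"  # keep a likely repeat customer happy
    return photo_only_decision(s)


signals = ClaimSignals(customer_damage_prob=0.7, prior_orders=6)
print(photo_only_decision(signals))    # deny
print(multi_source_decision(signals))  # approve
```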

This process is fully hands-off for the customer service team: the inputs are queried independently and the action is taken automatically. That automation is where the real time and cost savings come from.

How does MultiTool use an agent?

The MultiTool agent’s single goal is to deploy new code safely. It tests the waters with a new deployment over and over until it feels confident in its quality. More concretely:

  1. MultiTool collects inputs about the existing production deployment. Currently, we use HTTP response codes, since they’re the most impactful factor for measuring stability. In the future, we’ll add CPU usage, memory usage, and latency as additional measures.
  2. After building a baseline model, MultiTool will take its first action: reroute a portion of traffic from the baseline deployment to the new release. This is the canary deployment.
  3. MultiTool builds a secondary model for the canary deployment. It now has two models: one for the baseline deployment and one for the canary. Both models update as more data floods in. The key insight: if the canary deployment is as stable as the baseline, then the model of its behavior should match the baseline’s model.
  4. MultiTool continuously runs a drift detection algorithm on the two models, which measures the difference between them. If there’s no drift, then the canary is as stable as (or more stable than) the baseline.
  5. If the MultiTool agent determines the two models are equally stable, it increases the amount of traffic sent to the canary deployment. Then, it waits to see if this increased load impacts the models. Giving the agent more data about the canary improves the accuracy of the drift detection. As long as things look good, MultiTool progressively ramps up traffic. 
  6. Finally, once MultiTool has developed confidence in the release, the agent will cut over and promote the canary deployment to be the new production deployment. 🚀
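The post doesn’t say which model or drift detection algorithm MultiTool uses, so the Python sketch below swaps in a simple, well-known stand-in for the steps above. Each deployment is modeled by its 5xx error rate, and a one-sided two-proportion z-test flags the canary when its error rate is significantly higher than the baseline’s. The ramp schedule, the 500-request minimum, and the 2.33 critical value are hypothetical choices for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class ErrorRateModel:
    """Running model of one deployment: the share of responses that are 5xx."""
    requests: int = 0
    errors: int = 0

    def observe(self, status_code: int) -> None:
        # Assumption: only server errors (5xx) count against stability;
        # client errors such as 404 are treated as healthy responses.
        self.requests += 1
        if 500 <= status_code <= 599:
            self.errors += 1

    @property
    def rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def canary_is_worse(baseline: ErrorRateModel, canary: ErrorRateModel,
                    z_crit: float = 2.33) -> bool:
    """One-sided two-proportion z-test, standing in for drift detection.

    Returns True when the canary's error rate is significantly higher than
    the baseline's; z_crit=2.33 is roughly a 1% false-alarm rate.
    """
    n1, n2 = baseline.requests, canary.requests
    if n1 == 0 or n2 == 0:
        raise ValueError("both deployments need traffic before comparing")
    pooled = (baseline.errors + canary.errors) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        # All responses were errors or none were: the two rates are equal.
        return False
    return (canary.rate - baseline.rate) / se > z_crit


# Hypothetical ramp schedule (percent of traffic on the canary) and
# minimum sample size before any decision is made.
RAMP_PERCENTAGES = [5, 10, 25, 50]
MIN_REQUESTS = 500


def next_action(stage: int, baseline: ErrorRateModel,
                canary: ErrorRateModel) -> str:
    """Decide what to do at ramp stage `stage` (an index into RAMP_PERCENTAGES)."""
    if min(baseline.requests, canary.requests) < MIN_REQUESTS:
        return "wait"  # not enough data to compare the two models yet
    if canary_is_worse(baseline, canary):
        return "rollback"
    if stage + 1 < len(RAMP_PERCENTAGES):
        return f"ramp to {RAMP_PERCENTAGES[stage + 1]}%"
    return "promote"  # healthy at the last stage: cut over


baseline = ErrorRateModel(requests=10_000, errors=20)  # 0.2% errors
healthy = ErrorRateModel(requests=1_000, errors=2)     # 0.2% errors
broken = ErrorRateModel(requests=1_000, errors=30)     # 3.0% errors
print(next_action(0, baseline, healthy))  # ramp to 10%
print(next_action(0, baseline, broken))   # rollback
print(next_action(3, baseline, healthy))  # promote
```

Requiring a minimum sample before testing keeps the sketch from ramping or rolling back on a handful of requests; a fuller version would likely also window or reset the canary’s counts after each ramp step so that each decision reflects the current traffic level.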

Making MultiTool an agentic model was a key design decision for us from day one. We knew that we wanted the deployment to be hands-off for developers without sacrificing speed or accuracy.

Our work is ongoing. We’re constantly improving MultiTool’s model-building capabilities and incorporating more deployment metrics so we can improve agent accuracy and support a broader range of platforms. Join our waitlist to get early access to new platforms, and connect with us on GitHub and LinkedIn to keep up to date with MultiTool’s progress.