How to Build AI Agents Using “Tool Use”?


Introduction

Before talking about AI agents, it is important to understand the lifecycle of a sophisticated language model like GPT. A large language model such as GPT starts with pretraining, when it learns from a massive corpus of text to establish a basic grasp of language. The next step is supervised fine-tuning, when the model is refined for specific tasks using curated datasets. Reward modeling then trains a model to score responses according to human preferences, providing the feedback signal used to optimize the model’s behavior. Lastly, reinforcement learning uses that signal to keep improving the model through repeated interactions, honing its ability to perform various tasks more accurately and adaptably. In this article, we will also learn how to build AI agents using “Tool Use.”

Overview

Language models like GPT are developed through pretraining, supervised fine-tuning, reward modeling, and reinforcement learning.
Each phase involves specific datasets, algorithms, model adjustments, and evaluations to enhance the model’s capabilities.
Static models struggle with providing real-time information, requiring regular fine-tuning, which is resource-intensive and often impractical.
Tool use lets us build AI agents that operate within an agentic workflow.
AI agents with access to external tools can gather real-time data, execute tasks, and maintain context, enhancing accuracy and responsiveness.

GPT Assistant Training Pipeline

Each phase of the model’s development—pretraining, supervised fine-tuning, reward modeling, and reinforcement learning—progresses through four critical components: Dataset, Algorithm, Model, and Evaluation.

Pretraining Phase

In the initial pretraining phase, the model ingests vast quantities of raw internet data, totaling trillions of words. The quality of this data varies, but its sheer volume is what matters here, and models keep benefiting from even more of it. This phase demands significant hardware resources, including large GPU clusters, and months of intensive training. The process begins by initializing the weights from scratch and updating them as learning progresses. The training objective is language modeling, that is, predicting the next token, which forms the basis of the model’s early capabilities.
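The next-token objective can be written compactly. For a training sequence of tokens x_1, …, x_T, where x_{<t} denotes the tokens before position t and θ the weights being learned, pretraining minimizes the standard language-modeling loss, the negative log-likelihood of each token given its prefix:

$$
\mathcal{L}_{\text{LM}}(\theta) = -\sum_{t=1}^{T} \log p_\theta\left(x_t \mid x_{<t}\right)
$$

Lowering this loss means the model assigns higher probability to the token that actually comes next in the data.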

Supervised Fine-Tuning Phase

Moving to supervised fine-tuning, the focus shifts to smaller, task-specific labeled datasets, such as prompts paired with ideal responses, on which the model refines its parameters to produce the desired output for each input. Here, quality matters more than quantity, so the datasets are much smaller. The training algorithm is still next-token prediction, now applied to these curated examples, and the result is a Supervised Fine-Tuning (SFT) model. Because the datasets are smaller, this phase requires fewer GPUs and less time than pretraining.

Reward Modeling Phase

Reward modeling follows. Human labelers compare and rank multiple responses to the same prompt, and a Reward Modeling (RM) model is trained on these comparisons, typically framed as binary classification (which of two responses is preferred?), to assign each response a score. These scores then act as the reinforcement signal for the final phase.
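One common way to set up the binary classification described above, used in InstructGPT-style pipelines (the exact setup behind any given model is an assumption here), is a pairwise loss. For a prompt x with a human-preferred response y_w and a rejected response y_l, the reward model r_φ is trained to minimize:

$$
\mathcal{L}_{\text{RM}}(\phi) = -\,\mathbb{E}_{(x,\, y_w,\, y_l)}\left[\log \sigma\left(r_\phi(x, y_w) - r_\phi(x, y_l)\right)\right]
$$

Here σ is the logistic sigmoid, so minimizing the loss pushes the score of the preferred response above the score of the rejected one.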

Reinforcement Learning Phase

Reinforcement learning then optimizes the model’s responses through iterative rounds of generating outputs and scoring them with the reward model, improving how it handles new prompts. However, none of these phases keeps the model’s knowledge current, so integrating real-world data to keep the model up to date remains a challenge.

The Challenge of Real-Time Data

Addressing this challenge means bridging the gap between the model’s training data and real-world information. It requires strategies for continuously bringing new data into the model’s knowledge base so that it can respond accurately to the latest queries and prompts.

However, a critical question arises: While we’ve trained our LLM on the data provided, how do we equip it to access and respond to real-world information, especially to address the latest queries and prompts?

For instance, when we tested ChatGPT 3.5 with specific questions, the model struggled to provide responses grounded in real-world data, as shown in the image below:

Fine-tune the Model

One approach is to fine-tune the model regularly, perhaps in daily sessions. However, resource limitations make this approach hard to sustain, because regular fine-tuning comes with several difficulties:

Insufficient Data: There is often not enough new data to justify frequent fine-tuning sessions.
High Computational Requirements: Fine-tuning usually requires significant processing power, which may not be feasible on a regular schedule.
Time Intensiveness: Retraining the model can take a long time, which is a major obstacle.

In light of these difficulties, it is clear that adding new data to the model requires overcoming several barriers and is not a simple operation.

Enter AI Agents

AI agents are, essentially, LLMs with built-in access to external tools. These agents can collect and process information, carry out tasks, and keep track of past interactions in their working memory. Although familiar LLM-based systems can already run code and conduct web searches, AI agents go one step further:

External Tool Use: AI agents can interface with and utilize external tools.
Data Gathering and Manipulation: They can collect and process data to help them with their tasks.
Task Planning: They can plan and carry out tasks delegated to these agents.
Working Memory: They keep details from previous exchanges, which improves dialogue flow and context.
Feature Enhancements: Together, these capabilities extend what LLMs can accomplish beyond basic question answering to actively manipulating and leveraging external resources.

Using AI Agents for Real-Time Information Retrieval

If prompted with “What is the current temperature and weather in Delhi, India?”, an online LLM-based chat system might initiate a web search to gather relevant information. Early on, LLM developers recognized that relying solely on pre-trained transformers to generate output is limiting; by integrating a web search tool, LLMs can perform more comprehensive tasks. In this scenario, the LLM could be fine-tuned or prompted (potentially with few-shot examples) to generate a specific command like {tool: web-search, query: “current temperature and weather in Delhi, India”} to initiate a search engine query.

A subsequent step identifies such commands, triggers the web search function with the appropriate parameters, retrieves the weather information, and integrates it back into the LLM’s input context for further processing.
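The detect-and-dispatch step described above can be sketched in a few lines of Python. This is a minimal sketch rather than any particular framework’s API: it assumes the model emits its command as a JSON object with tool and query keys, and web_search and run_tool_command are hypothetical names, with web_search standing in for a real search API.

```python
import json

def web_search(query):
    """Hypothetical stand-in for a real search API call."""
    return f"[search results for: {query}]"

# Registry mapping tool names (as the model writes them) to callables.
TOOLS = {"web-search": web_search}

def run_tool_command(model_output, context):
    """Execute a JSON tool command emitted by the model, if there is one.

    Returns the tool result (also appended to `context`, the text that is
    fed back to the model), or None when the output is a plain answer.
    """
    try:
        command = json.loads(model_output)
    except json.JSONDecodeError:
        return None  # not JSON: treat it as a final answer
    if not isinstance(command, dict) or command.get("tool") not in TOOLS:
        return None  # JSON, but not a recognized tool command
    result = TOOLS[command["tool"]](command["query"])
    context.append(f"Tool result ({command['tool']}): {result}")
    return result
```

Appending the result to the context is what lets the next model call use the retrieved information instead of relying on its training data.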

Handling Complex Queries with Computational Tools

If you ask a question such as “If a product-based company sells an item at a 20% loss, what is the final profit or loss?”, an LLM equipped with a code execution tool can answer it by running Python to compute the result exactly. For instance, it might generate a command like {tool: python-interpreter, code: “cost_price * (1 - 0.20)”}, where cost_price is the item’s original cost; this yields the selling price, and the loss is the difference between cost_price and that selling price (20% of cost_price). Delegating the arithmetic to a computational tool is more reliable than asking the model to produce the number directly through text generation, which can yield inaccurate results. External tools also let agents take actions, such as booking a ticket on the user’s behalf; planning and carrying out such actions is known as task planning in an agentic workflow.
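To make the arithmetic concrete, here is the kind of calculation such a python-interpreter call would perform. It is a minimal sketch: the function name price_after_change and the cost price of 500 are hypothetical, chosen only for illustration.

```python
def price_after_change(cost_price, change_pct):
    """Return (selling_price, profit) after a percentage change in price.

    A negative change_pct models a loss, so `profit` comes back negative.
    Multiplying by an integer percentage before dividing avoids float surprises.
    """
    selling_price = cost_price * (100 + change_pct) / 100
    return selling_price, selling_price - cost_price

# Hypothetical numbers: an item that cost 500 is sold at a 20% loss.
selling_price, profit = price_after_change(500, -20)
print(selling_price, profit)  # 400.0 -100.0
```

Returning both values lets the model report the selling price and the size of the loss without doing any arithmetic itself.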

In short, AI agents can solve ChatGPT’s problem of having no information about the latest real-world data. For example, we can give the model Internet access so that it can run a web search and retrieve the top matches; in this case, the tool is Internet search.

When a query requires current weather information, the flow works as follows. The application includes a list of available tools, such as get_current_weather, in its API request. The model recognizes that it needs this function and responds with a function call that specifies a location, such as “London,” as the parameter. The application then executes this call, fetching the latest weather details for London, and passes the results back to the model, which incorporates them into its response, making the answer more accurate and relevant.

Now, let’s implement tool use to understand the agentic workflow!

We are going to give the model a tool for getting the current weather. As the example above showed, without such a tool the model cannot answer a real-world question using the latest data.

Let’s begin with the implementation.

Installing dependencies and Libraries

Let’s install dependencies first:

langchain
langchain-community>=0.0.36
langchainhub>=0.1.15
llama_cpp_python  # please install the correct build based on your hardware and OS
pandas
loguru
googlesearch-python
transformers
openai
requests
rich
python-dotenv

Importing Libraries

Now, we will import libraries:

from openai import OpenAI
import json
from rich import print


import dotenv
dotenv.load_dotenv()

Keep your OpenAI API key in a .env file (loaded by dotenv.load_dotenv() above), or assign it to a variable:

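import os

# Prefer the key loaded from .env; otherwise replace the placeholder string
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY", "your_openai_api_key")

client = OpenAI(api_key=OPENAI_API_KEY)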

Interact with the GPT model through code rather than the chat interface:

messages = [{"role": "user", "content": "What's the weather like in London?"}]
response = client.chat.completions.create(
   model="gpt-4o",
   messages=messages,
)
print(response)

This code sets up a simple interaction with an AI model, asking about the weather in London. The API would process this request and return a response, which you would need to parse to get the actual answer.

It’s worth noting that this code doesn’t fetch real-time weather data. Instead, it asks an AI model to generate a response based on its training data, which may not reflect the current weather in London.

In this case, the AI acknowledged it couldn’t provide real-time information and suggested checking a weather website or app for current London weather.

This structure makes it easy to parse the API response and extract the relevant information. The additional metadata (such as token usage) is useful for monitoring and optimizing API usage.
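In practice you usually read only a few fields, for example response.choices[0].message.content for the answer text. A minimal sketch of pulling out the commonly needed fields, assuming the response has first been converted to a plain dict with the SDK’s model_dump() method (summarize_completion is a hypothetical helper name):

```python
def summarize_completion(response):
    """Extract the fields most callers need from a chat completion.

    `response` is assumed to be a plain dict, e.g. the result of calling
    model_dump() on the object returned by client.chat.completions.create.
    """
    choice = response["choices"][0]
    usage = response.get("usage") or {}
    return {
        "answer": choice["message"]["content"],    # None when the model calls a tool
        "finish_reason": choice["finish_reason"],  # e.g. "stop" or "tool_calls"
        "total_tokens": usage.get("total_tokens"),
    }
```

For example, summarize_completion(response.model_dump())["answer"] returns the same text as response.choices[0].message.content.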

Defining the Function

Now, let’s define a function for getting weather information and set up the structure for using it as a tool in an AI conversation:

def get_current_weather(location):
   """Get the current weather in a given city"""
   if "london" in location.lower():
       return json.dumps({"temperature": "20 C"})
   elif "san francisco" in location.lower():
       return json.dumps({"temperature": "15 C"})
   elif "paris" in location.lower():
       return json.dumps({"temperature": "22 C"})
   else:
       return json.dumps({"temperature": "unknown"})

messages = [{"role": "user", "content": "What's the weather like in London?"}]
tools = [
   {
       "type": "function",
       "function": {
           "name": "get_current_weather",
           "description": "Get the current weather in a given location",
           "parameters": {
               "type": "object",
               "properties": {
                   "location": {
                       "type": "string",
                       "description": "The city and state, e.g. San Francisco",
                   },
               },
               "required": ["location"],
           },
       },
   }
]

Code Explanation

This code snippet defines a function for getting weather information and sets up the structure for using it as a tool in an AI conversation. Let’s break it down:

get_current_weather function:
- Takes a location parameter.
- Returns simulated weather data for London, San Francisco, and Paris.
- For any other location, it returns “unknown”.
- The weather data is returned as a JSON string.
messages list:
- Contains a single message from the user asking about the weather in London.
- This is the same as in the previous example.
tools list:
- Defines a single tool (function) that the AI can use.
- The tool is of type “function”.
- It describes the get_current_weather function:
  - name: The name of the function to be called.
  - description: A brief description of what the function does.
  - parameters: Describes the expected input for the function:
    - It expects an object with a location property.
    - location should be a string describing a city.
    - The location parameter is required.
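Writing these JSON schemas by hand gets tedious as the number of tools grows. As a hypothetical sketch (function_to_tool is not part of any library; LangChain’s convert_to_openai_function, used later in this article, handles this far more thoroughly), a tool definition can be derived from a Python function’s name, docstring, and signature. The sketch simplifies by treating every parameter as a string and marking parameters without defaults as required.

```python
import inspect

def function_to_tool(func, param_descriptions=None):
    """Build an OpenAI-style tool definition from a Python function.

    Simplifications: every parameter is typed as "string", and parameters
    without a default value are listed as required.
    """
    param_descriptions = param_descriptions or {}
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        properties[name] = {"type": "string"}
        if name in param_descriptions:
            properties[name]["description"] = param_descriptions[name]
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": inspect.getdoc(func) or "",
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }
```

Calling function_to_tool(get_current_weather, {"location": "The city and state, e.g. San Francisco"}) produces a definition with the same shape as the hand-written entry above, using the function’s docstring as its description.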

response = client.chat.completions.create(
   model="gpt-4o",
   messages=messages,
   tools=tools,
)
print(response)
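With tools supplied, the model typically answers with a tool call instead of text, and it is up to our code to run the function and send the result back. Below is a minimal sketch of that round trip using only the raw client, following the pattern in OpenAI’s function-calling guide: the assistant message carrying the tool calls is appended first, followed by one "tool" message per call. The helper names execute_tool_calls and complete_with_tools are hypothetical, and get_current_weather is a trimmed copy of the stub defined earlier.

```python
import json

def get_current_weather(location):
    """Stub weather lookup (same idea as the function defined earlier)."""
    if "london" in location.lower():
        return json.dumps({"temperature": "20 C"})
    return json.dumps({"temperature": "unknown"})

# Map each tool name the model may request to a local Python callable.
AVAILABLE_FUNCTIONS = {"get_current_weather": get_current_weather}

def execute_tool_calls(tool_calls):
    """Run each requested tool and build the matching 'tool' messages."""
    tool_messages = []
    for call in tool_calls or []:
        function = AVAILABLE_FUNCTIONS[call.function.name]
        args = json.loads(call.function.arguments)  # arguments arrive as a JSON string
        tool_messages.append(
            {"role": "tool", "tool_call_id": call.id, "content": function(**args)}
        )
    return tool_messages

def complete_with_tools(client, messages, tools, model="gpt-4o"):
    """Ask once, run any requested tools, then ask again with their results."""
    first = client.chat.completions.create(model=model, messages=messages, tools=tools)
    reply = first.choices[0].message
    if not reply.tool_calls:
        return reply.content  # the model answered directly
    # The assistant message carrying the tool calls must precede the tool results.
    followup = messages + [reply] + execute_tool_calls(reply.tool_calls)
    second = client.chat.completions.create(model=model, messages=followup, tools=tools)
    return second.choices[0].message.content
```

For example, complete_with_tools(client, messages, tools) returns the model’s final text answer, grounded in the stub’s temperature for London.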


Here, we use three helper scripts, llms.py, tools.py, and tool_executor.py, which provide the helper classes and functions used below:

from llms import OpenAIChatCompletion
from tools import get_current_weather
from tool_executor import need_tool_use

Before going further with the code flow, let’s understand the scripts.

llms.py script

It manages interactions with OpenAI’s chat completion API, enabling the use of external tools within the chat context:

from typing import List, Optional, Any, Dict

import logging
from agents.specs import ChatCompletion
from agents.tool_executor import ToolRegistry
from langchain_core.tools import StructuredTool
from llama_cpp import ChatCompletionRequestMessage
from openai import OpenAI

logger = logging.getLogger(__name__)

class OpenAIChatCompletion:
   def __init__(self, model: str = "gpt-4o"):
       self.model = model
       self.client = OpenAI()
       self.tool_registry = ToolRegistry()

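   def bind_tools(self, tools: Optional[List[StructuredTool]] = None):
       # Register each tool so its schema is sent with every request;
       # "or []" keeps the default of None from raising a TypeError
       for tool in tools or []:
           self.tool_registry.register_tool(tool)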


   def chat_completion(
       self, messages: List[ChatCompletionRequestMessage], **kwargs
   ) -> ChatCompletion:
       tools = self.tool_registry.openai_tools
       output = self.client.chat.completions.create(
           model=self.model, messages=messages, tools=tools
       )
       logger.debug(output)
       return output


   def run_tools(self, chat_completion: ChatCompletion) -> List[Dict[str, Any]]:
       return self.tool_registry.call_tools(chat_completion)

This code defines a class OpenAIChatCompletion that encapsulates the functionality for interacting with OpenAI’s chat completion API and managing tools. Let’s break it down:

Imports

Various typing annotations and necessary modules are imported.

Class Definition

class OpenAIChatCompletion:

This class serves as a wrapper for OpenAI’s chat completion functionality.

Constructor

def __init__(self, model: str = "gpt-4o"):

Initializes the class with a specified model (default is “gpt-4o”).

Creates an OpenAI client and a ToolRegistry instance.

bind_tools method

def bind_tools(self, tools: Optional[List[StructuredTool]] = None):

Registers provided tools with the ToolRegistry.

This allows the chat completion to use these tools when needed.

chat_completion method

def chat_completion(self, messages: List[ChatCompletionRequestMessage], **kwargs) -> ChatCompletion:

Sends a request to the OpenAI API for chat completion.

Includes the registered tools in the request.

Returns the API response as a ChatCompletion object.

run_tools method

def run_tools(self, chat_completion: ChatCompletion) -> List[Dict[str, Any]]:

Executes the tools called in the chat completion response.

Returns the results of the tool executions.

tools.py

It defines individual tools or functions, such as fetching real-time weather data, that can be utilized by the AI to perform specific tasks.

import json
import requests
from langchain.tools import tool
from loguru import logger

@tool
def get_current_weather(city: str) -> str:
   """Get the current weather for a given city.


   Args:
     city (str): The city to fetch weather for.


   Returns:
     str: current weather condition as a JSON string, or an error message if the request fails.
   """
   try:
       data = json.dumps(
           requests.get(f"https://wttr.in/{city}?format=j1", timeout=10)
           .json()
           .get("current_condition")[0]
       )
       return data
   except Exception as e:
       logger.exception(e)
       error_message = f"Error fetching current weather for {city}: {e}"
       return error_message

This code defines a tool that the AI system can call through the OpenAIChatCompletion class discussed earlier. Let’s break it down:

get_current_weather:

Fetches real-time weather data for a given city using the wttr.in API.
Returns the weather data as a JSON string.
Includes error handling and logging.
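The full current_condition object returned by wttr.in contains many fields, all of which end up in the model’s context. A hypothetical refinement is to keep only the fields the model needs before returning them. This sketch assumes wttr.in’s j1 field names (temp_C, FeelsLikeC, humidity, and weatherDesc as a list of {"value": ...} objects); summarize_current_condition is not part of the original scripts.

```python
import json

def summarize_current_condition(condition):
    """Keep a few fields from a wttr.in `current_condition` entry.

    Assumes the j1 field names temp_C, FeelsLikeC, humidity and weatherDesc
    (a list of {"value": ...} dicts); missing fields come back as None.
    """
    descriptions = condition.get("weatherDesc") or [{}]
    summary = {
        "temp_C": condition.get("temp_C"),
        "feels_like_C": condition.get("FeelsLikeC"),
        "humidity": condition.get("humidity"),
        "description": descriptions[0].get("value"),
    }
    return json.dumps(summary)
```

Inside get_current_weather, this helper could be applied to the first current_condition entry in place of the plain json.dumps call, so the tool still returns a JSON string, only a smaller one.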

tool_executor.py

It handles the execution and management of tools, ensuring they are called and integrated correctly within the AI’s response workflow.

import json
from typing import Any, List, Union, Dict

from langchain_core.tools import StructuredTool

from langchain_core.utils.function_calling import convert_to_openai_function
from loguru import logger
from agents.specs import ChatCompletion, ToolCall

class ToolRegistry:
   def __init__(self, tool_format="openai"):
       self.tool_format = tool_format
       self._tools: Dict[str, StructuredTool] = {}
       self._formatted_tools: Dict[str, Any] = {}

   def register_tool(self, tool: StructuredTool):
       self._tools[tool.name] = tool
       self._formatted_tools[tool.name] = convert_to_openai_function(tool)

   def get(self, name: str) -> StructuredTool:
       return self._tools.get(name)

   def __getitem__(self, name: str) -> StructuredTool:
       return self._tools[name]

   def pop(self, name: str) -> StructuredTool:
       return self._tools.pop(name)

   @property
   def openai_tools(self) -> List[Dict[str, Any]]:
       # [{"type": "function", "function": registry.openai_tools[0]}],
       result = []
       for oai_tool in self._formatted_tools.values():
           result.append({"type": "function", "function": oai_tool})

       return result if result else None

   def call_tool(self, tool: ToolCall) -> Any:
       """Call a single tool and return the result."""
       function_name = tool.function.name
       function_to_call = self.get(function_name)


       if not function_to_call:
           raise ValueError(f"No function was found for {function_name}")


       function_args = json.loads(tool.function.arguments)
       logger.debug(f"Function {function_name} invoked with {function_args}")
       function_response = function_to_call.invoke(function_args)
       logger.debug(f"Function {function_name}, responded with {function_response}")
       return function_response

   def call_tools(self, output: Union[ChatCompletion, Dict]) -> List[Dict[str, str]]:
       """Call all tools from the ChatCompletion output and return the
       result."""
       if isinstance(output, dict):
           output = ChatCompletion(**output)


       if not need_tool_use(output):
           raise ValueError(f"No tool call was found in ChatCompletion\n{output}")

       messages = []
       # https://platform.openai.com/docs/guides/function-calling
       tool_calls = output.choices[0].message.tool_calls
       for tool in tool_calls:
           function_name = tool.function.name
           function_response = self.call_tool(tool)
           messages.append({
               "tool_call_id": tool.id,
               "role": "tool",
               "name": function_name,
               "content": function_response,
           })
       return messages

def need_tool_use(output: ChatCompletion) -> bool:
   tool_calls = output.choices[0].message.tool_calls
   if tool_calls:
       return True
   return False

def check_function_signature(
   output: ChatCompletion, tool_registry: ToolRegistry = None
):
   tools = output.choices[0].message.tool_calls
   invalid = False
   for tool in tools:
       tool: ToolCall
       if tool.type == "function":
           function_info = tool.function
           if tool_registry:
               if tool_registry.get(function_info.name) is None:
                   logger.error(f"Function {function_info.name} is not available")
                   invalid = True


           arguments = function_info.arguments
           try:
               json.loads(arguments)
           except json.JSONDecodeError as e:
               logger.exception(e)
               invalid = True
       if invalid:
           return False

   return True

Code Explanation

This code defines a ToolRegistry class and associated helper functions for managing and executing tools in an AI system. Let’s break it down:

ToolRegistry class:
- Manages a collection of tools, storing them in both their original form and an OpenAI-compatible format.
- Provides methods to register, retrieve, and execute tools.
Key methods:
- register_tool: Adds a new tool to the registry.
- openai_tools: Property that returns tools in OpenAI’s function format.
- call_tool: Executes a single tool.
- call_tools: Executes multiple tools from a ChatCompletion output.
Helper functions:
- need_tool_use: Checks if a ChatCompletion output requires tool usage.
- check_function_signature: Validates function calls against the available tools.

This ToolRegistry class is a central component for managing and executing tools in an AI system. It allows for:

Easy registration of new tools
Conversion of tools to OpenAI’s function calling format
Execution of tools based on AI model outputs
Validation of tool calls and signatures

The design allows seamless integration with AI models supporting function calling, like those from OpenAI. It provides a structured way to extend an AI system’s capabilities by allowing it to interact with external tools and data sources.

The helper functions need_tool_use and check_function_signature provide additional utility for working with ChatCompletion outputs and validating tool usage.

This code forms a crucial part of a larger system for building AI agents capable of using external tools and APIs to enhance their capabilities beyond simple text generation.

These are the helper scripts needed to expose external tools to the model and execute the tool calls it makes.


Now, an instance of OpenAIChatCompletion is created.

The get_current_weather tool is bound to this instance.

A message list is created with a user query about London’s weather.

A chat completion is requested using this setup.

llm = OpenAIChatCompletion()
llm.bind_tools([get_current_weather])

messages = [
   {"role": "user", "content": "how is the weather in London today?"}
]

output = llm.chat_completion(messages)
print(output)

The AI understood that to answer the question about London’s weather, it needed to use the get_current_weather function.
Instead of providing a direct answer, it requests that this function be called with “London” as the argument.
In a complete system, the next step would be to execute the get_current_weather function with this argument, get the result, and then potentially interact with the AI again to formulate a final response based on the weather data.

This demonstrates how the AI can intelligently decide to use available tools to gather information before providing an answer, making its responses more accurate and up-to-date.

if need_tool_use(output):
   print("Using weather tool")
   tool_results = llm.run_tools(output)
   print(tool_results)
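   # Simplification: relabel the tool output as an assistant message so it can be
   # sent back without the original assistant tool-call message
   tool_results[0]["role"] = "assistant"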


   updated_messages = messages + tool_results
   updated_messages = updated_messages + [
       {"role": "user", "content": "Think step by step and answer my question based on the above context."}
   ]
   output = llm.chat_completion(updated_messages)


print(output.choices[0].message.content)

This code:

Checks whether tools need to be used based on the AI’s output.
Runs the tool (get_current_weather) and prints the result.
Changes the role of the tool result to “assistant.”
Creates an updated message list with the original message, tool results, and a new user prompt.
Sends this updated message list for another chat completion.

The AI initially recognized it needed weather data to answer the question.
The code executed the weather tool to get this data.
The weather data was added to the context of the conversation.
The AI was then prompted to answer the original question using this new information.
The final response is a comprehensive breakdown of London’s weather, directly answering the original question with specific, up-to-date information.
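Answering the London question took one tool call, but an agent may in general need several rounds, for example to check the weather in two cities. A minimal sketch of that generalization, assuming call_model returns an object shaped like an OpenAI ChatCompletionMessage (with .content and .tool_calls): the loop keeps executing requested tools and feeding the results back as "tool" messages until the model replies with plain text or a step limit is reached. run_agent and its parameters are hypothetical, not part of the scripts above.

```python
import json

def run_agent(call_model, messages, functions, max_steps=5):
    """Loop until the model answers in plain text or max_steps is exhausted.

    call_model(messages) must return an object with .content and .tool_calls
    (like an OpenAI ChatCompletionMessage); functions maps tool names to
    Python callables that accept the tool's arguments as keyword arguments.
    """
    history = list(messages)  # copy so the caller's list is left untouched
    for _ in range(max_steps):
        reply = call_model(history)
        if not reply.tool_calls:
            return reply.content  # final answer
        history.append(reply)  # assistant message that requested the tools
        for call in reply.tool_calls:
            args = json.loads(call.function.arguments)
            result = functions[call.function.name](**args)
            history.append(
                {"role": "tool", "tool_call_id": call.id, "content": str(result)}
            )
    raise RuntimeError(f"No final answer after {max_steps} steps")
```

With the wrapper from this article, call_model could be lambda m: llm.chat_completion(m).choices[0].message, and functions could map "get_current_weather" to lambda **kw: get_current_weather.invoke(kw). Unlike relabeling the tool result as an assistant message, this keeps the assistant’s tool-call message in the history, which is the ordering the chat completions API expects; the step limit guards against a model that keeps requesting tools indefinitely.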

Conclusion

This implementation represents a significant step toward more capable, context-aware AI systems. By bridging the gap between large language models and external tools and data sources, we can create AI assistants that not only understand and generate human-like text but also interact meaningfully with the real world.

Frequently Asked Questions

Q1. What exactly is an AI agent with dynamic tool use?

Ans. An AI agent with dynamic tool use is an advanced artificial intelligence system that can autonomously select and utilize various external tools or functions to gather information, perform tasks, and solve problems. Unlike traditional chatbots or AI models that are limited to their pre-trained knowledge, these agents can interact with external data sources and APIs in real time, allowing them to provide up-to-date and contextually relevant responses.

Q2. How does using a dynamic tool differ from that of regular AI models?

Ans. Regular AI models typically rely solely on their pre-trained knowledge to generate responses. In contrast, AI agents with dynamic tool use can recognize when they need additional information, select appropriate tools to gather that information (like weather APIs, search engines, or databases), use these tools, and then incorporate the new data into their reasoning process. This allows them to handle a much wider range of tasks and provide more accurate, current information.

Q3. What are the potential applications of building AI agents with tool use?

Ans. The applications of building AI agents are vast and varied. Some examples include:
– Personal assistants who can schedule appointments, check real-time information, and perform complex research tasks.
– Customer service bots that can access user accounts, process orders, and provide product information.
– Financial advisors who can analyze market data, check current stock prices, and provide personalized investment advice.
– Healthcare assistants who can access medical databases, interpret lab results, and provide preliminary diagnoses.
– Project management systems that can coordinate tasks, access multiple data sources, and provide real-time updates.
