When building applications using Large Language Models (LLMs), the quality of responses heavily depends on effective planning and reasoning capabilities for a given user task. While traditional RAG techniques are powerful, incorporating Agentic workflows can significantly enhance the system’s ability to process and respond to queries.
In this article, you will build an Agentic RAG system with memory components using Phidata, an open-source Agentic framework. Along the way, you will see how to combine a vector database (Qdrant), an embedding model, and intelligent agents for improved results.
Learning Objectives
- Understand and design the architecture of the components required for Agentic RAG systems.
- Understand how vector databases and embedding models are integrated into the Agentic workflow to create a knowledge base.
- Learn to implement memory components for improved context retention.
- Develop an AI Agent with Phidata that can make multiple tool calls and decide which tool to use based on the user’s question or task.
- Build a real-world use case: a Document Analyzer Assistant Agent that answers from personal information in the knowledge base and falls back to DuckDuckGo when the knowledge base lacks the context.
This article was published as a part of the Data Science Blogathon.
What are Agents and RAG?
Agents, in the context of AI, are components designed to emulate human-like thinking and planning capabilities. An agent typically handles:
- Task decomposition into manageable subtasks.
- Intelligent decision-making about which tools to use and which actions to take.
- Reasoning about the best approach to solving a problem.
RAG (Retrieval-Augmented Generation) combines knowledge retrieval with LLM capabilities. When we integrate agents into RAG systems, we create a powerful workflow that can:
- Analyze user queries intelligently.
- Store user documents in a knowledge base or vector database.
- Choose appropriate knowledge sources or context for the given user query.
- Plan the retrieval and response generation process.
- Maintain context through memory components.
The key difference between traditional RAG and Agentic RAG lies in the decision-making layer that determines how to process each query and interact with tools to get real-time information.
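The decision-making layer can be pictured with a small sketch. The code below is illustrative only and does not use Phidata: `KNOWLEDGE`, `search_knowledge_base`, `search_web`, and the relevance threshold are hypothetical stand-ins. In a real agent, the LLM makes this choice through tool calling rather than through a fixed keyword rule.

```python
from typing import Dict, Tuple

# Hypothetical knowledge base: maps a topic keyword to a stored passage.
KNOWLEDGE: Dict[str, str] = {
    "indexing": "Passage about indexing techniques.",
    "collections": "Passage about collections.",
}

def search_knowledge_base(query: str) -> Tuple[str, float]:
    """Return the best passage and a crude relevance score (1.0 or 0.0)."""
    words = set(query.lower().replace("?", "").split())
    for keyword, passage in KNOWLEDGE.items():
        if keyword in words:
            return passage, 1.0
    return "", 0.0

def search_web(query: str) -> str:
    """Stand-in for a live web search tool."""
    return f"[web results for: {query}]"

def route(query: str, threshold: float = 0.5) -> Tuple[str, str]:
    """Pick a tool for the query and return (tool_name, context)."""
    passage, score = search_knowledge_base(query)
    if score >= threshold:
        return "knowledge_base", passage
    # Nothing relevant was stored, so fall back to real-time search.
    return "web_search", search_web(query)
```

Traditional RAG would always take the first branch. The agentic version adds the fallback decision, which is what lets it answer questions the stored documents do not cover.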
Now that we know what Agentic RAG is, how do we build it? Let’s break it down.
What is Phidata?
Phidata is an open-source framework designed to build, monitor, and deploy Agentic workflows. It supports multimodal AI agents equipped with memory, knowledge, tools, and reasoning capabilities. Its model-agnostic architecture ensures compatibility with various large language models (LLMs), enabling developers to transform any LLM into a functional AI agent. Additionally, Phidata allows you to deploy your Agent workflows using a bring your own cloud (BYOC) approach, offering both flexibility and control over your AI systems.
Key features of Phidata include the ability to build teams of agents that collaborate to solve complex problems, a user-friendly Agent UI for seamless interaction (Phidata playground), and built-in support for agentic retrieval-augmented generation (RAG) and structured outputs. The framework also emphasizes monitoring and debugging, providing tools to ensure robust and reliable AI applications.
Agent Use Cases with Phidata
Explore the transformative power of Agent-based systems in real-world applications, leveraging Phidata to enhance decision-making and task automation.
Financial Analysis Agent
By integrating tools like YFinance, Phidata allows the creation of agents that can fetch real-time stock prices, analyze financial data, and summarize analyst recommendations. Such agents assist investors and analysts in making informed decisions by providing up-to-date market insights.
Web Search Agent
Phidata also helps develop agents capable of retrieving real-time information from the web using search tools like DuckDuckGo, SerpAPI, or Serper. These agents can answer user queries by sourcing the latest data, making them valuable for research and information-gathering tasks.
Multimodal Agents
Phidata also supports multimodal capabilities, enabling the creation of agents that analyze images, videos, and audio. These multimodal agents can handle tasks such as image recognition, text-to-image generation, audio transcription, and video analysis, offering versatile solutions across various domains. For text-to-image or text-to-video tasks, tools like DALL-E and Replicate can be integrated, while for image-to-text and video-to-text tasks, multimodal LLMs such as GPT-4, Gemini 2.0, Claude AI, and others can be utilized.
Real-time Use Case for Agentic RAG
Imagine you have documentation for your startup and want to create a chat assistant that can answer user questions based on that documentation. To make your chatbot more intelligent, it also needs to handle real-time data. Typically, answering real-time data queries requires either rebuilding the knowledge base or retraining the model.
This is where Agents come into play. By combining the knowledge base with Agents, you can create an Agentic RAG (Retrieval-Augmented Generation) solution that not only improves the chatbot’s ability to retrieve accurate answers but also enhances its overall performance.
We have three main components that come together to form our knowledge base. First, we have Data sources, like documentation pages, PDFs, or any websites we want to use. Then we have Qdrant, which is our vector database – it’s like a smart storage system that helps us find similar information quickly. And finally, we have the embedding model that converts our text into a format that computers can understand better. These three components feed into our knowledge base, which is like the brain of our system.
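To see how these three pieces fit together, here is a minimal sketch in plain Python. It is not Qdrant or a real embedding model: `embed` is a toy bag-of-words function and `TinyVectorStore` is a hypothetical in-memory stand-in, used only to show the flow of embedding text, storing it, and searching by similarity.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

def embed(text: str) -> Dict[str, float]:
    """Toy embedding: a length-normalized bag-of-words vector (not a real model)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {word: c / norm for word, c in counts.items()} if norm else {}

def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two already-normalized sparse vectors."""
    return sum(weight * b.get(word, 0.0) for word, weight in a.items())

class TinyVectorStore:
    """In-memory stand-in for a vector database."""

    def __init__(self) -> None:
        self.items: List[Tuple[str, Dict[str, float]]] = []

    def add(self, text: str) -> None:
        # Store the raw text together with its embedding.
        self.items.append((text, embed(text)))

    def search(self, query: str, limit: int = 1) -> List[str]:
        # Rank stored texts by similarity to the query embedding.
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:limit]]

store = TinyVectorStore()
store.add("vector search finds similar points")
store.add("billing and pricing for cloud clusters")
top = store.search("how does vector search work")
```

A production system replaces `embed` with a learned embedding model and `TinyVectorStore` with a vector database, but the add-then-search flow is the same.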
Now we define the Agent object from Phidata.
The agent is connected to three components:
- A Reasoning Model (like GPT-4, Gemini 2.0, or Claude) that helps it think and plan.
- Memory (SqlAgentStorage) that helps it remember previous conversations.
- Tools (like DuckDuckGo search) that it can use to find information.
Note: Here, the knowledge base and DuckDuckGo both act as tools, and the Agent decides which one to use for each task or user query. The default embedding model is from OpenAI, so we will also use an OpenAI model, GPT-4o, as the reasoning model.
Let’s build this code.
Step-by-Step Code Implementation: Agentic RAG using Qdrant, OpenAI, and Phidata
It’s time to build a Document Analyzer Assistant Agent that answers from personal information (here, a website) stored in the knowledge base and falls back to DuckDuckGo when the knowledge base lacks the context.
Step 1: Setting Up Dependencies
To build the Agentic RAG workflow, we need to install a few libraries:
- Phidata: defines the Agent object and runs the workflow.
- OpenAI: provides the reasoning model (GPT-4o) used in the code below, as well as the default embedding model.
- Qdrant: the vector database where the knowledge base is saved and later queried for relevant information.
- DuckDuckGo: the search engine used to fetch real-time information.
pip install phidata openai duckduckgo-search qdrant-client
If an import fails later, you may also need to install sqlalchemy (used by SqlAgentStorage) and beautifulsoup4 (used by the website reader).
Step 2: Initial Configuration and API Key Setup
In this step, we set up the environment variables and gather the API credentials required to run this use case. You can get an OpenAI API key from https://platform.openai.com/: create an account, then create a new key.
from phi.knowledge.website import WebsiteKnowledgeBase
from phi.vectordb.qdrant import Qdrant
from phi.agent import Agent
from phi.storage.agent.sqlite import SqlAgentStorage
from phi.model.openai import OpenAIChat
from phi.tools.duckduckgo import DuckDuckGo
import os
os.environ['OPENAI_API_KEY'] = "<replace>"
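Hardcoding a key in a notebook is easy to leak or to forget to fill in. As a hedged alternative, the sketch below reads the key from the environment and fails early with a clear message. `require_env` is a hypothetical helper written for this article, not part of Phidata or the OpenAI SDK.

```python
import os

def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    # Treat the tutorial's "<replace>" placeholder as "not set".
    if not value or value == "<replace>":
        raise RuntimeError(f"Set the {name} environment variable before running.")
    return value
```

Calling `require_env("OPENAI_API_KEY")` before creating the agent surfaces a missing or placeholder key immediately, instead of as an authentication error on the first query.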
Step 3: Set Up the Vector Database (Qdrant)
Next, initialize the Qdrant client by providing the collection name, URL, and API key for your vector database. Qdrant stores and indexes the knowledge from the website, allowing the agent to retrieve relevant information based on user queries. This step sets up the data layer for your agent:
- Create cluster: https://cloud.qdrant.io/
- Give a name to your cluster and copy the API key once the cluster is created.
- Under the curl command, you can copy the Endpoint URL.
COLLECTION_NAME = "agentic-rag"
QDRANT_URL = "<replace>"
QDRANT_API_KEY = "<replace>"
vector_db = Qdrant(
    collection=COLLECTION_NAME,
    url=QDRANT_URL,
    api_key=QDRANT_API_KEY,
)
Step 4: Creating the Knowledge Base
Here, you’ll define the sources from which the agent will pull its knowledge. In this example, we are building a Document Analyzer agent that answers questions about a website. We will index the Qdrant documentation URL.
The WebsiteKnowledgeBase object reads the provided URL and stores the indexed content in the Qdrant vector database, where the agent can retrieve it.
Note: The load function indexes the data source into the knowledge base. Run it only once per collection name. Run it again only when you change the collection name and want to index new data.
URL = "https://qdrant.tech/documentation/overview/"
knowledge_base = WebsiteKnowledgeBase(
    urls=[URL],
    max_links=10,
    vector_db=vector_db,
)
knowledge_base.load()  # run once per collection; comment this line out after the first successful run
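Commenting a line in and out by hand is easy to get wrong. One possible way to automate the "index once per collection" rule is a small guard that records a marker file after the first successful load. This is a sketch under assumptions: `load_once`, the marker-file naming, and `state_dir` are hypothetical and not part of Phidata.

```python
from pathlib import Path
from typing import Callable

def load_once(collection: str, loader: Callable[[], None], state_dir: str = ".") -> bool:
    """Call loader() only if no marker file exists for this collection.

    Returns True when indexing ran and False when it was skipped.
    """
    marker = Path(state_dir) / f".{collection}.indexed"
    if marker.exists():
        return False
    loader()        # e.g. the knowledge base's load step
    marker.touch()  # record that this collection has been indexed
    return True
```

With the objects defined above, the call would look like `load_once(COLLECTION_NAME, knowledge_base.load)`. Because the marker is keyed by collection name, changing the name triggers a fresh load, which matches the note above.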
Step 5: Define your Agent
The Agent configures an LLM (GPT-4o) for response generation, a knowledge base for information retrieval, and SQLite storage that records interactions and responses as memory. It also sets up a DuckDuckGo search tool for web searches when the knowledge base lacks the answer. This setup forms the core AI agent capable of answering queries.
We set show_tool_calls to True to observe the runtime execution and see whether a query is routed to the knowledge base or to the DuckDuckGo search tool. The storage argument enables memory: running this cell creates a database file (agents_rag.db) in which all messages are saved, and setting add_history_to_messages to True includes earlier messages in later requests.
agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    knowledge=knowledge_base,
    tools=[DuckDuckGo()],
    show_tool_calls=True,
    markdown=True,
    storage=SqlAgentStorage(table_name="agentic_rag", db_file="agents_rag.db"),
    add_history_to_messages=True,
)
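To make the memory idea concrete, here is a minimal sketch using Python’s built-in sqlite3 module. It does not reproduce how SqlAgentStorage is implemented: the `ChatMemory` class, its table schema, and the `last_n` parameter are assumptions made for illustration.

```python
import sqlite3
from typing import List, Tuple

class ChatMemory:
    """Minimal conversation store; the schema is illustrative, not Phidata's."""

    def __init__(self, db_file: str = ":memory:") -> None:
        self.conn = sqlite3.connect(db_file)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "session_id TEXT NOT NULL, role TEXT NOT NULL, content TEXT NOT NULL)"
        )

    def add(self, session_id: str, role: str, content: str) -> None:
        with self.conn:  # commits the insert on success
            self.conn.execute(
                "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
                (session_id, role, content),
            )

    def history(self, session_id: str, last_n: int = 6) -> List[Tuple[str, str]]:
        """Return the last_n messages for a session, oldest first."""
        rows = self.conn.execute(
            "SELECT role, content FROM messages WHERE session_id = ? "
            "ORDER BY id DESC LIMIT ?",
            (session_id, last_n),
        ).fetchall()
        return rows[::-1]

memory = ChatMemory()
memory.add("s1", "user", "What is Qdrant?")
memory.add("s1", "assistant", "A vector database.")
memory.add("s1", "user", "How do I create a collection?")
recent = memory.history("s1", last_n=2)
```

Prepending `recent` to the next prompt is the basic idea behind adding history to messages: the model sees the earlier turns and can resolve follow-up questions.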
Step 6: Try Multiple Queries
Finally, the agent is ready to process user queries. Calling print_response() with a user query makes the agent retrieve relevant information from the knowledge base and generate an answer from it. If the knowledge base does not cover the query, the agent falls back to the search tool. Let’s observe the difference.
Query 1: From the knowledge base
agent.print_response(
    "what are the indexing techniques mentioned in the document?",
    stream=True
)
Query 2: Outside the knowledge base
agent.print_response(
    "who is Virat Kohli?",
    stream=True
)
Advantages of Agentic RAG
Agentic RAG combines intelligent agents with retrieval-augmented generation to improve data retrieval and decision-making. Its key advantages are:
- Enhanced reasoning capabilities for better response generation.
- Intelligent tool selection based on the query: the Agent chooses between the knowledge base, DuckDuckGo, or any other tool that can supply context.
- Memory integration for improved context awareness, so the Agent can recall and use earlier conversation messages.
- Better planning and task decomposition: the core of an Agentic workflow is to take a task, break it down into subtasks, and then decide on an action plan.
- Flexible integration with various data sources such as PDFs, websites, CSV files, documents, and many more.
Conclusion
Implementing Agentic RAG with memory components provides a reliable way to build intelligent knowledge retrieval systems and search engines. In this article, we explored what Agents and RAG are and how to combine them. In an Agentic RAG system, query routing improves because the Agent decides which source to use for each query.
Key Takeaways
- Discover how Agentic RAG with Phidata enhances AI by integrating memory, a knowledge base, and dynamic query handling.
- Learn to implement an Agentic RAG with Phidata for efficient information retrieval and adaptive response generation.
- The Phidata library provides a streamlined implementation, requiring only about 30 lines of core code, and works with multimodal models such as Gemini 2.0 Flash.
- Memory components are crucial for maintaining context and improving response relevance.
- Integration of multiple tools (knowledge base, web search) enables flexible information retrieval.
- Vector databases like Qdrant provide advanced indexing capabilities for efficient search.
Frequently Asked Questions
Q1. Does Phidata support multimodal agents?
A. Yes, Phidata is built to support multimodal AI agents capable of handling tasks involving images, videos, and audio. It integrates tools like DALL-E and Replicate for text-to-image or text-to-video generation, and utilizes multimodal LLMs such as GPT-4, Gemini 2.0, and Claude AI for image-to-text and video-to-text tasks.
Q2. Which tools and frameworks are available for building Agentic RAG systems?
A. Developing Agentic Retrieval-Augmented Generation (RAG) systems relies on tools and frameworks that integrate autonomous agents with retrieval and generation capabilities. Options include LangChain, LlamaIndex, Phidata, CrewAI, and AutoGen.
Q3. Can Phidata integrate with external tools and knowledge bases?
A. Yes, Phidata allows the integration of various tools and knowledge bases. For instance, it can connect with financial data tools like YFinance for real-time stock analysis or web search tools like DuckDuckGo for retrieving up-to-date information. This flexibility enables the creation of specialized agents tailored to specific use cases.