How to create agents with LlamaIndex
How to create a knowledge retrieval agent that can query and retrieve information from any dataset using LlamaIndex, OpenAI, and Hugging Face.
Aug 14, 2025 • 9 Minute Read

Artificial Intelligence (AI) is reshaping the way we interact with technology, and one of its most powerful applications is the creation of intelligent agents. These agents act like autonomous problem-solvers, capable of answering questions, retrieving knowledge, and even making decisions. In this article, we’ll dive into how to create a knowledge retrieval agent using LlamaIndex, a powerful framework for connecting Large Language Models (LLMs) with external data sources.
If you’re an intermediate-level AI enthusiast ready to implement agentic AI, buckle up. By the end of this article, you’ll not only understand the concepts behind LlamaIndex but also have a working knowledge retrieval agent you can deploy.
What are agents, and why use LlamaIndex?
Imagine a virtual librarian who can quickly search through thousands of books and hand you exactly the information you need. That’s essentially what an AI agent does. It autonomously processes tasks, retrieves knowledge, and interacts with users to deliver results.
Now, the challenge is: how do you give this “librarian” access to external knowledge, such as text files, databases, or articles? This is where LlamaIndex comes in.
LlamaIndex: A bridge between data and LLMs
LlamaIndex (formerly GPT Index) simplifies the process of integrating LLMs, like OpenAI’s GPT-4, with external data. Instead of overwhelming the model with vast amounts of context, LlamaIndex allows the AI to query a pre-built index of your data, ensuring faster, more relevant responses.
Key benefits include:
Efficient knowledge retrieval from both structured (e.g., tables) and unstructured data (e.g., plain text).
Compatibility with OpenAI’s API and Hugging Face Transformers.
Simplified implementation for building intelligent agents.
Building your knowledge retrieval agent
Let’s break down the process step by step. By the end, you’ll have a fully functional agent ready to retrieve knowledge from your custom dataset.
Step 1: Setting up your environment
First, let's make sure you have all the required tools installed. You'll need LlamaIndex, the OpenAI Python SDK, and Hugging Face Transformers, plus PyTorch, which Transformers uses to run the model in Step 5.
pip install llama-index openai transformers torch
You'll also need an OpenAI API key. Sign up at OpenAI if you don't already have one and store the key securely. In your code, load it like this:
import os

# Set your OpenAI API key (in real projects, read it from your shell
# environment or a secrets manager rather than hardcoding it in source)
os.environ["OPENAI_API_KEY"] = "your-openai-api-key"
Step 2: Preparing your dataset
Your agent needs something to query. Let’s prepare a dataset—a collection of text documents or articles that the agent can search.
For this example, save your text files in a folder named data/. Then, use LlamaIndex's SimpleDirectoryReader (imported from llama_index.core in current versions) to load these documents into your program:
from llama_index.core import SimpleDirectoryReader

# Load documents from the data folder
documents = SimpleDirectoryReader('./data').load_data()
print(f"Loaded {len(documents)} documents!")
If you don’t have sample documents, create a few .txt files with topics like AI applications or benefits of technology. This will give your agent something to query.
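If you'd rather script the setup, here's a quick way to generate a couple of toy files (the filenames and contents below are just placeholders):
import os

# Create a data/ folder with two small sample documents
os.makedirs("data", exist_ok=True)
samples = {
    "ai_in_healthcare.txt": "AI supports healthcare through faster diagnosis, imaging analysis, and administrative automation.",
    "benefits_of_technology.txt": "Technology improves productivity, communication, and access to information.",
}
for name, text in samples.items():
    with open(os.path.join("data", name), "w") as f:
        f.write(text)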
Step 3: Building the index
An index is like a search-friendly map of your data. LlamaIndex offers several types of indices, such as vector-based and keyword-based indices. For simplicity, let's create a vector index, which uses embeddings to find semantically similar text. (In current LlamaIndex versions the class is VectorStoreIndex; older releases called it GPTVectorStoreIndex.)
from llama_index.core import VectorStoreIndex

# Build the index from the loaded documents
index = VectorStoreIndex.from_documents(documents)

# Save the index to disk for later use
index.storage_context.persist('./index_storage')
Here’s what’s happening:
VectorStoreIndex converts your documents into embeddings, making them searchable by semantic similarity.
The persist() method saves the index to disk, so you can reuse it later.
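To reuse the persisted index in a later session without re-embedding your documents, you can reload it from disk. A minimal sketch using LlamaIndex's storage utilities:
from llama_index.core import StorageContext, load_index_from_storage

# Rebuild the storage context from the persisted directory, then load the index
storage_context = StorageContext.from_defaults(persist_dir="./index_storage")
index = load_index_from_storage(storage_context)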
Step 4: Querying the index
Now that the index is ready, let's query it. The index's query engine calls the OpenAI API under the hood to retrieve the most relevant chunks and synthesize an answer.
# Query the index through its query engine
def query_index(query: str, index):
    query_engine = index.as_query_engine()
    response = query_engine.query(query)
    return str(response)

# Example query
query = "What are the benefits of AI in healthcare?"
response = query_index(query, index)
print("Agent Response:", response)
This simple function takes a user query, runs it through the index's query engine, and returns the synthesized answer.
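To make this feel more like an agent, you can wrap the function in a simple interactive loop (a minimal sketch; the exit command is an arbitrary choice):
# Keep answering questions until the user types 'quit'
while True:
    user_query = input("Ask a question (or type 'quit'): ")
    if user_query.strip().lower() == "quit":
        break
    print("Agent Response:", query_index(user_query, index))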
Step 5: Enhancing with Hugging Face
While OpenAI’s models are great at understanding queries, you can use Hugging Face Transformers to refine responses. For instance, you might use a summarization model to clean up verbose answers.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load a Hugging Face model (t5-large is a multi-gigabyte download;
# t5-small works fine for quick local tests)
tokenizer = AutoTokenizer.from_pretrained("t5-large")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-large")

# Refine the response by summarizing it with T5
def refine_response(raw_response: str) -> str:
    # T5 expects a task prefix such as "summarize: "
    inputs = tokenizer("summarize: " + raw_response, return_tensors="pt", truncation=True)
    outputs = model.generate(**inputs, max_length=50)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Refine the agent's response
refined_response = refine_response(response)
print("Refined Response:", refined_response)
In this example:
The Hugging Face t5-large model refines the raw response by summarizing it; note the "summarize: " task prefix, which tells T5 which task to perform.
This step improves the quality of answers, especially for verbose or unstructured data.
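If you prefer less boilerplate, the same refinement step can be written with Transformers' pipeline helper. A sketch (the model choice and length limits here are assumptions to tune for your data):
from transformers import pipeline

# A summarization pipeline bundles the tokenizer, model, and generation settings
summarizer = pipeline("summarization", model="t5-small")

def refine_with_pipeline(raw_response: str) -> str:
    result = summarizer(raw_response, max_length=50, min_length=10, do_sample=False)
    return result[0]["summary_text"]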
Adding a datastore from AWS S3 or Azure Blob Storage
In production environments, datasets often reside in cloud storage services like AWS S3 or Azure Blob Storage instead of local filesystems. LlamaIndex provides multiple ways to work with these cloud storage systems, ensuring flexibility for different use cases. You can choose between:
Manual Download: Retrieve files from cloud storage manually and process them locally.
Direct Integration with AWS S3: Use LlamaIndex's S3 reader (S3Reader) for seamless access to S3 buckets.
Direct Integration with Azure Blob Storage: Use LlamaIndex's Azure reader (AzStorageBlobReader) to load data directly from Azure Blob containers.
Let’s explore these options in detail.
Option 1: Manual download and processing
The first approach involves downloading files from cloud storage to your local machine and then loading them into LlamaIndex. This is useful when you want full control over the downloaded files or when processing data offline.
AWS S3: Manual download
We use the boto3 library to fetch files from an S3 bucket, save them locally, and then load them using SimpleDirectoryReader.
import boto3
from llama_index.core import SimpleDirectoryReader
import os

# AWS S3 Configuration
AWS_ACCESS_KEY = "your-aws-access-key"
AWS_SECRET_KEY = "your-aws-secret-key"
AWS_BUCKET_NAME = "your-s3-bucket-name"
LOCAL_DOWNLOAD_PATH = "./s3_data/"

# Initialize the S3 client
s3_client = boto3.client(
    's3',
    aws_access_key_id=AWS_ACCESS_KEY,
    aws_secret_access_key=AWS_SECRET_KEY
)

# Download all files from the S3 bucket
def download_s3_bucket(bucket_name, local_path):
    os.makedirs(local_path, exist_ok=True)
    response = s3_client.list_objects_v2(Bucket=bucket_name)
    for obj in response.get('Contents', []):
        if obj['Key'].endswith('/'):  # skip "folder" placeholder keys
            continue
        target = os.path.join(local_path, obj['Key'])
        os.makedirs(os.path.dirname(target), exist_ok=True)  # handle nested keys
        s3_client.download_file(bucket_name, obj['Key'], target)
        print(f"Downloaded: {obj['Key']}")

# Download files and load into LlamaIndex
download_s3_bucket(AWS_BUCKET_NAME, LOCAL_DOWNLOAD_PATH)
documents = SimpleDirectoryReader(LOCAL_DOWNLOAD_PATH, recursive=True).load_data()
print(f"Loaded {len(documents)} documents from S3!")
Azure Blob Storage: Manual download
For Azure Blob Storage, we use the azure-storage-blob library in a similar manner to fetch and save files locally.
from azure.storage.blob import BlobServiceClient
from llama_index.core import SimpleDirectoryReader
import os

# Azure Blob Storage Configuration
AZURE_CONNECTION_STRING = "your-azure-connection-string"
AZURE_CONTAINER_NAME = "your-container-name"
LOCAL_DOWNLOAD_PATH = "./azure_blob_data/"

# Initialize the BlobServiceClient
blob_service_client = BlobServiceClient.from_connection_string(AZURE_CONNECTION_STRING)
container_client = blob_service_client.get_container_client(AZURE_CONTAINER_NAME)

# Download all blobs from the Azure container
def download_azure_container(container_client, local_path):
    os.makedirs(local_path, exist_ok=True)
    for blob in container_client.list_blobs():
        target = os.path.join(local_path, blob.name)
        os.makedirs(os.path.dirname(target), exist_ok=True)  # handle nested blob names
        blob_client = container_client.get_blob_client(blob)
        with open(target, "wb") as file:
            file.write(blob_client.download_blob().readall())
        print(f"Downloaded: {blob.name}")

# Download files and load into LlamaIndex
download_azure_container(container_client, LOCAL_DOWNLOAD_PATH)
documents = SimpleDirectoryReader(LOCAL_DOWNLOAD_PATH, recursive=True).load_data()
print(f"Loaded {len(documents)} documents from Azure Blob Storage!")
Option 2: Direct integration with AWS S3 using S3Reader
If you prefer a more streamlined approach, LlamaIndex offers a dedicated S3 reader, S3Reader, distributed as the llama-index-readers-s3 package (pip install llama-index-readers-s3). It handles authentication, data retrieval, and parsing behind the scenes. The exact constructor arguments can vary between versions, so check the reader's documentation for your installed release.
Code example: Using S3Reader
from llama_index.readers.s3 import S3Reader

# AWS S3 Configuration
AWS_ACCESS_KEY = "your-aws-access-key"
AWS_SECRET_KEY = "your-aws-secret-key"
AWS_BUCKET_NAME = "your-s3-bucket-name"

# Initialize the S3 reader
s3_reader = S3Reader(
    bucket=AWS_BUCKET_NAME,
    aws_access_id=AWS_ACCESS_KEY,
    aws_access_secret=AWS_SECRET_KEY,
)

# Load documents directly from S3
documents = s3_reader.load_data()
print(f"Loaded {len(documents)} documents from S3!")
How it works
Authentication: S3Reader uses your AWS access key and secret to authenticate against the bucket.
File Parsing: It fetches and parses documents directly from the bucket, with no local staging step.
LlamaIndex Compatibility: Documents are loaded into LlamaIndex's internal format, ready for indexing and querying.
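Because the reader returns standard LlamaIndex documents, you can feed them straight into the indexing pipeline from Step 3. A short sketch reusing the documents loaded above:
from llama_index.core import VectorStoreIndex

# Index the S3-loaded documents and run a test query
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
print(query_engine.query("What are the benefits of AI in healthcare?"))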
Option 3: Direct integration with Azure Blob Storage using AzStorageBlobReader
For Azure Blob Storage, LlamaIndex offers the AzStorageBlobReader class, distributed as the llama-index-readers-azstorage-blob package, which simplifies accessing and loading data from Azure containers. As with the S3 reader, argument names may differ slightly between versions.
Code example: Using AzStorageBlobReader
from llama_index.readers.azstorage_blob import AzStorageBlobReader

# Azure Blob Storage Configuration
AZURE_CONNECTION_STRING = "your-azure-connection-string"
AZURE_CONTAINER_NAME = "your-container-name"

# Initialize the blob reader
blob_reader = AzStorageBlobReader(
    container_name=AZURE_CONTAINER_NAME,
    connection_string=AZURE_CONNECTION_STRING,
)

# Load documents directly from Azure Blob Storage
documents = blob_reader.load_data()
print(f"Loaded {len(documents)} documents from Azure Blob Storage!")
How it works
Authentication: AzStorageBlobReader uses your Azure connection string to authenticate with the Blob Storage service.
Blob Retrieval: It fetches and processes all blobs (files) from the specified container.
LlamaIndex Compatibility: Like the S3 reader, it prepares documents for indexing and querying seamlessly.
Which method should you choose?
Manual download: You need full control over the downloaded files or want to work offline after retrieving the data.
S3Reader: You use AWS S3 and want direct integration without managing file downloads manually.
AzStorageBlobReader: You use Azure Blob Storage and want seamless integration for real-time indexing and querying.
Advantages of built-in readers
Efficiency: Directly integrates cloud storage into your workflow, reducing complexity.
Real-Time Processing: Ideal for dynamic or frequently updated datasets.
Less Boilerplate: You don't have to write boto3 or azure-storage-blob code yourself; the readers handle those details internally.
Real-world use cases
Here are some practical applications for your knowledge retrieval agent:
Customer Support Chatbots:
Train the agent on user manuals or FAQs to provide instant answers to customer queries.
Research Assistants:
Load academic articles or reports, enabling researchers to retrieve summaries or references effortlessly.
Enterprise Knowledge Management:
Use the agent to search company policies, internal documents, or training materials.
Deploying the agent can be as simple as wrapping it in a Flask app or deploying it to a serverless platform like AWS Lambda.
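As a concrete illustration, here is a minimal Flask wrapper, assuming the index from Step 3 has already been persisted to ./index_storage (the endpoint name and port are arbitrary choices):
from flask import Flask, request, jsonify
from llama_index.core import StorageContext, load_index_from_storage

app = Flask(__name__)

# Load the persisted index once at startup and reuse it across requests
storage_context = StorageContext.from_defaults(persist_dir="./index_storage")
index = load_index_from_storage(storage_context)
query_engine = index.as_query_engine()

@app.route("/query", methods=["POST"])
def handle_query():
    # Expects a JSON body like {"question": "..."}
    question = request.json.get("question", "")
    response = query_engine.query(question)
    return jsonify({"answer": str(response)})

if __name__ == "__main__":
    app.run(port=5000)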
Troubleshooting and best practices
Common pitfalls
Memory Errors:
If your dataset is too large, split documents into smaller chunks before indexing (see the sketch after this list).
Inconsistent Results:
Experiment with parameters like temperature and max_tokens to control the model’s creativity and output length.
API Errors:
Ensure your API keys are correct and check rate limits for OpenAI and Hugging Face.
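For the chunking fix mentioned above, LlamaIndex ships a SentenceSplitter that breaks documents into overlapping chunks before embedding. A sketch (the chunk sizes are starting points to tune, not recommendations):
from llama_index.core import VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter

# Split documents into ~512-token chunks with a 50-token overlap
splitter = SentenceSplitter(chunk_size=512, chunk_overlap=50)
index = VectorStoreIndex.from_documents(documents, transformations=[splitter])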
Optimization tips
Hierarchical Queries: Use advanced indices for layered queries, e.g., searching by topic first, then diving into details.
Metadata Tagging: Tag documents with metadata (e.g., author, date) to improve query relevance, as shown in the example below.
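Metadata can be attached when you construct documents. A minimal sketch (the field names are illustrative, not required keys):
from llama_index.core import Document

# Attach metadata the retriever can use to improve relevance or filtering
doc = Document(
    text="AI can reduce diagnostic errors in radiology.",
    metadata={"author": "J. Smith", "date": "2024-05-01", "topic": "healthcare"},
)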
Conclusion
Congratulations! You’ve just built a knowledge retrieval agent using LlamaIndex, OpenAI, and Hugging Face. By combining these tools, you’ve created an intelligent system capable of querying and retrieving information from any dataset.
The possibilities are endless—whether you’re building chatbots, research assistants, or enterprise tools, LlamaIndex empowers you to harness the power of LLMs effectively.
Ready to take it further? Experiment with larger datasets, custom Hugging Face models, or advanced LlamaIndex features. The future of agentic AI is in your hands!
If you're interested in expanding your knowledge of AI systems, check out my other related guides.