In this lesson, we'll explain the relationship between external memory and RAG. We'll also implement agentic RAG in two ways: first, by copying data directly into the agent's archival memory, and second, by giving the agent access to a LangChain tool to query the web. You'll build a research agent to demonstrate this. Let's get coding.

We learned in previous lessons that MemGPT agents can use their external memory, both recall and archival memory, for agentic RAG. This basically means retrieving information from archival and recall memory to inform the new messages the agent generates. Unlike traditional RAG, agentic RAG allows the agent to decide both when and how to retrieve data. For example, with archival memory search, the agent also decides what the actual query into the search function should be, so you'll often see variation in the queries it makes. In this example, the query is "dog".

In MemGPT, agents have a general-purpose archival memory that is the default external memory for the agent. Data can be saved here by both the agent and the user to be used for RAG. For example, rather than the agent explicitly storing memories, you as a user can upload files, like a handbook PDF, into archival memory. Agents can also have additional forms of external memory, or retrieval sources for RAG, via tools.

In this lab, we're going to implement an agent that has access to additional tools for RAG. We'll first go over an example of accessing a fake external database, and then run a web search using LangChain. First, we import our usual notebook print helper, and then we load some environment variables so that we can use Tavily search later. As always, we'll also create our Letta client and set its default LLM to GPT-4o-mini.

We're first going to go over how to load data manually, as a user, into archival memory. In a previous lab, we went over how to insert a string into archival memory, but you may also want to load data sources, for example, data from a file. We can do this using the client's create source function and specifying a source name; we'll name our source employee_handbook. When this source is created, you can see that it's also configured with an embedding config, which specifies the embedding model that will be used to generate embeddings for the source data. Data within a source has to all use the same embedding config so that our embedding-based queries stay accurate.

Next, we load a file called handbook.pdf into the source. This is just a fake handbook that I generated using ChatGPT. We specify the source by its ID to tell the client to load data from handbook.pdf into that source. Now that the data has been loaded into the source, we can attach it to an agent. Let's first create a basic MemGPT agent. You can specify a name if you'd like, but you don't have to. This agent currently has no archival memory, since we haven't interacted with it at all. We now attach the source data to the agent using the client's attach source function, specifying the agent ID and the source ID. When we run this, the data inside the source is copied into the agent's archival memory.
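To make that workflow concrete, here's a rough sketch of the steps as code. The method names (create_source, load_file_into_source, attach_source_to_agent, set_default_llm_config) follow the Letta client used in this course, but exact names and signatures can differ across Letta versions, so treat this as an outline rather than the definitive API.

```python
# Sketch: load a file into a data source, then attach the source to an agent so
# its pre-chunked, pre-embedded passages are copied into archival memory.
# Method names are assumptions based on the course's Letta client version.
from letta import create_client, LLMConfig

client = create_client()
client.set_default_llm_config(LLMConfig.default_config("gpt-4o-mini"))

# Create a source; it carries an embedding config so all of its data
# is embedded with the same model.
source = client.create_source(name="employee_handbook")

# Chunk and embed the PDF, storing the passages in the source.
client.load_file_into_source(filename="handbook.pdf", source_id=source.id)

# Create a basic agent (a name is optional) and attach the source,
# which copies the pre-processed passages into its archival memory.
agent_state = client.create_agent()
client.attach_source_to_agent(agent_id=agent_state.id, source_id=source.id)
```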
The source contains data that has already been pre-processed: the data inside handbook.pdf has been chunked, and embeddings have been generated for it. We can see which sources are attached to the agent using the list sources function.

Now we can ask our agent a question about something contained in the handbook, like the company's vacation policies. You don't necessarily need to add "search archival", because ideally the agent realizes that it doesn't have the answer and searches archival memory automatically, but for the sake of this notebook, I'm going to add that phrase to make sure the agent definitely searches archival memory. We first see the internal monologue: the user is asking about the company's vacation policies, so the agent realizes it needs to do an archival memory search. It decides on the query "vacation policies". Since we loaded data into the archival memory store through the source, there's a bunch of data in there corresponding to the employee handbook, and you can see that the search returned five results. The agent then decides to respond to the user about the vacation policies to keep the conversation engaging.

Just for fun, I made the vacation policies in this handbook really horrible, so it answers: it looks like the vacation policies in our company are quite specific. You will need to provide an AI agent that can fully perform your duties during any leave, and ensure that it has been tested and approved a month in advance. What do you think about that process? Of course, if you use a different employee handbook, you will get a different response, because the agent is actually using the retrieved data to run RAG and ground its answer in it.

So we just saw how the agent can connect to external data using archival memory. But rather than using archival memory search, you can also add your own custom tools. For example, if you already have a RAG data pipeline that you're happy with, you can connect your MemGPT agent directly to it instead of loading data into archival memory. We have this fake query birthday database tool that we made: given a name, it looks up that person's birthday. The argument is a string, and the birthday is returned as the function response. Our "database" is actually just a dictionary, but for the sake of this example let's pretend it's a real database. All the function does is look up the name in the dictionary and return the date if it finds it, or None if it doesn't.

The way we connect this function to MemGPT is to create a tool from it. We do this by calling client.create_tool and simply passing in the function. When this is called, Letta will actually parse the function's docstring to automatically generate an OpenAI JSON schema for the function. We just created this tool, and we can also look at the tool object it produced. It's a bit difficult to parse, but at a high level it includes the actual source code as well as the JSON schema to pass to OpenAI. Now we can create an agent that has access to this birthday database. We'll call this agent the birthday agent and give it access to the birthday tool.
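Here's a sketch of what the tool definition and registration might look like. The dictionary contents and the "My name is Sarah" human block are illustrative, and depending on your Letta version the tool function signature and the import paths for ChatMemory, create_tool, and send_message may differ slightly.

```python
from letta import ChatMemory  # import path may differ by Letta version

def query_birthday_db(name: str):
    """
    This tool queries an external database to look up the birthday
    of someone given their name.

    Args:
        name (str): The name to look up.

    Returns:
        birthday (str): The birthday in mm-dd-yyyy format, or None if not found.
    """
    # Illustrative stand-in for a real database: just a dictionary.
    my_fake_data = {
        "sarah": "03-06-1987",
        "bob": "07-06-1993",
    }
    name = name.lower()
    if name not in my_fake_data:
        return None
    return my_fake_data[name]

# Letta parses the docstring to auto-generate the OpenAI JSON schema for the tool.
birthday_tool = client.create_tool(query_birthday_db)
print(birthday_tool)  # shows the source code plus the generated JSON schema

# Give a new agent access to the tool, with a persona that mentions the database.
birthday_agent = client.create_agent(
    name="birthday_agent",
    tools=[birthday_tool.name],
    memory=ChatMemory(
        human="My name is Sarah.",  # illustrative
        persona="You are an agent with access to a birthday database "
                "that you can use to look up the user's birthday.",
    ),
)

response = client.send_message(
    agent_id=birthday_agent.id,
    role="user",
    message="When is my birthday?",
)
```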
We'll also give it the standard chat memory, where in the persona we tell it that it's an agent with access to a birthday database and that it can look up information about the user's birthday. Now that we've created this agent, we can ask it a very simple question: "When is my birthday?" Since it knows my name (or your name, if you changed it), it should be able to look up this information with the tool. Looking at the responses, we can see in the agent's internal monologue that it decided to fetch the birthday, so it calls the query birthday database tool and gets the response, March 6th, 1987, which is my birthday. It then responds, including the birthday. This is a very simple example, but hopefully it shows how you can connect external data tables you might have to a MemGPT agent by specifying a tool with a query function.

Now we're going to go over a pretty similar example, except we're going to use LangChain to search the web with a LangChain tool. Letta actually supports tools from both LangChain and CrewAI, so if there are tools you want to use from those packages, you can use them with Letta agents. In this example, we're going to create a research agent that is able to search the web to answer a question and also include citations.

First, to make sure we're set up with Tavily, we load our Tavily API key, and then we import the Tavily search results tool from LangChain. Just to make sure this is working, you can enter some kind of query. I'm going to ask what Obama's first name is, but feel free to enter your own query. Then we look at the output. The tool actually generates a lot of text in its response, but it includes a URL and some kind of summary of the content. You can try running the tool by calling .run on a query, and you'll get results back like this, which include both the URL and the content. This is a lot of tokens, so it might actually overflow the LLM's context window, but Letta's framework will automatically trim function responses that are too long.

We can import this LangChain tool into Letta by calling the Tool.from_langchain function and passing in the LangChain tool. This creates a Letta tool object. Once we create the tool, we need to persist it so that we can add it to different agents. Now we can create a research agent that uses the Tavily search tool. First, we define the research agent persona, which basically tells the agent that it can use the search tool, referring to it by its name, to search the web. We also instruct the agent to generate references, that is, to provide links to its sources. Here's an example: if the user asks "What's Obama's first name?", the assistant should include the answer in its response and also include its sources. Using this persona, we can now create the research agent. We'll name the agent research_agent, pass in the search tool's name, and of course also create the chat memory, which uses the research agent persona.

Now we can ask the research agent a question like "Who founded OpenAI?" We can see here that because the function return was really large, it actually got truncated by Letta. Looking at the response messages, we can see that the agent successfully called the imported LangChain tool to run a Tavily search for who founded OpenAI.
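Continuing from the client above, a rough sketch of wrapping the LangChain Tavily tool and building the research agent might look like this. Tool.from_langchain and add_tool follow the Letta version used in the course; the import paths, the persona wording, and the human block are assumptions for illustration.

```python
from langchain_community.tools.tavily_search import TavilySearchResults
from letta.schemas.tool import Tool  # import path may differ by Letta version

# The LangChain tool itself (requires TAVILY_API_KEY in the environment).
langchain_tool = TavilySearchResults()
# langchain_tool.run("What is Obama's first name?")  # returns URLs + content snippets

# Wrap it as a Letta tool and persist it so any agent can use it.
search_tool = Tool.from_langchain(langchain_tool)
client.add_tool(search_tool)

# Persona: point the agent at the tool by name and ask it to cite its sources.
research_agent_persona = (
    f"You are a research agent. To answer questions, search the web with the "
    f"`{search_tool.name}` tool and always include links to the sources you used. "
    "For example, if asked 'What is Obama's first name?', answer the question and "
    "list the URLs the answer came from."
)

research_agent = client.create_agent(
    name="research_agent",
    tools=[search_tool.name],
    memory=ChatMemory(
        human="My name is Sarah.",  # illustrative
        persona=research_agent_persona,
    ),
)

response = client.send_message(
    agent_id=research_agent.id,
    role="user",
    message="Who founded OpenAI?",
)
```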
It then uses this information to respond to the user with the send message function, providing both the answer and the citations. I've actually had some trouble with GPT-4o-mini where it doesn't always do this, so if you're unlucky, you can also run the same query with a GPT-4-based agent instead. But you should make sure to only run this once or twice, because GPT-4 is very expensive and we don't want you to run out of OpenAI credits. To do this, you can import an LLMConfig object from Letta and then create an agent, this time named the GPT-4 search agent, which uses the GPT-4 config. You can also pass in your own custom LLM config here if you're running this locally. Now we can ask the same question, but this time it'll run with GPT-4 instead of GPT-4o-mini. You'll probably notice that the query runs much more slowly; that's because GPT-4 is a much higher-latency and more expensive model. At the same time, the results are usually a lot more consistent, so if you didn't get good results with the previous example, you can try this out. Just like it was supposed to, it provides both the response as well as the links.

So now we've successfully implemented a MemGPT-based research agent using Letta as well as LangChain tools. Congratulations! We've implemented an example where we copy data into the agent's archival memory, as well as examples where we define a query against an external database and use LangChain-based tools. If you already use frameworks like LangChain or CrewAI, you can use those same tools, and also get the memory capabilities of MemGPT, by using Letta.
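Here's a sketch of the GPT-4 variant, reusing the search tool and persona from above. LLMConfig.default_config("gpt-4") is the assumed way to pin the model in this client version; the exact config helper may differ in your Letta version, and you could pass your own custom LLMConfig instead, for example for a local model.

```python
from letta import LLMConfig

# Same research agent setup, but pinned to GPT-4 via an explicit LLM config
# instead of the client-wide GPT-4o-mini default.
gpt4_agent = client.create_agent(
    name="gpt4_search_agent",
    tools=[search_tool.name],
    llm_config=LLMConfig.default_config("gpt-4"),  # or your own custom LLMConfig
    memory=ChatMemory(
        human="My name is Sarah.",  # illustrative
        persona=research_agent_persona,
    ),
)

# Run the same query once or twice at most: GPT-4 is slower and more expensive,
# but its citation behavior is usually more consistent.
response = client.send_message(
    agent_id=gpt4_agent.id,
    role="user",
    message="Who founded OpenAI?",
)
```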