Self-editing memory refers to the idea of allowing an LLM to update its own long-term, persistent memory over time. It's an essential part of building agents that can learn and improve. In this lesson, you'll learn how to build an agent with self-editing memory from scratch. Let's get coding.

There might be some confusion about Letta and MemGPT, since you'll see both terms in this course. Letta is an open-source agents framework that lets you easily build and deploy persistent agents as services, via APIs. MemGPT is a research paper that introduced the concept of self-editing memory for LLMs; it also refers to the type of agent design described in the paper, which was inspired by operating systems and has two tiers of memory, among other specific design choices. You can build MemGPT agents using the Letta framework.

So what makes a chatbot different from an agent? Most chatbots use LLMs to generate chat responses: under the hood, you're using an LLM to generate the next message in a conversation. In this case, the user says "hello," we append that to the conversation, and we ask the LLM to generate the next message in that conversation. It generates a response where the bot says, "hey there."

Agents, by contrast, act autonomously by taking multi-step actions in an agentic loop. For one user message, an agent might actually run the LLM multiple times. In this case, it might first think to itself, "Hey, I just got a message. Do I know anything about who sent it?" Then the agent might run a memory search tool, and after that, the agent might finally decide to send a response: "Hey, Sarah, it's good to see you again. I hope you're having a great birthday." When the agent ran the memory search tool, it was able to find additional information about who the user was, and it even found out that the user's birthday is today.
So in this example, we actually witnessed one user message trigger three different LLM calls. This is what we call the agentic loop. To implement multi-step reasoning with an LLM, there needs to be a reasoning loop that updates state: agent state gets put into a context window, that context window is the input to LLM inference, and the result of LLM inference is used to update the agent state. We call a single reasoning step in this loop the agent step.

So how do we add memory to agents? Long-term memory is part of the agent state, which gets compiled into the context window; we'll call this process context compilation. Imagine a case where the agent doesn't have any long-term memories. Then the context window could be something very simple, just "You are a helpful assistant." But what if the agent actually does have some memory, for example, that the user's name is Sarah and that their birthday is 1/1/2001? Context compilation refers to the idea that we need to put this agent state into the context window somehow. One way to do it, as in this example, is to convert the dictionary data into some sort of sentence.

So now we understand how we might add memory to agents. But how do we make this memory editable? We can make the agent's memory self-editing through the use of special memory tools. For example, say the human says hello. The agent, based on its memory, thinks the human's name is Sarah, so it replies, "Hi, Sarah." The human then says, "My name is actually Charles, not Sarah." The agent might think for a moment: "Oh no, my memory must be faulty. Time to correct it." Then it calls a special function that updates the part of the agent state referring to the human's name. And because the agent can run multiple times, it then replies, "Sorry, Charles." Let's put these lessons into practice and build an agent with self-editing memory from scratch.
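As a concrete illustration, context compilation can be sketched as a small pure function that renders the agent's memory dictionary into text for the context window. The function and field names here are illustrative, not taken from Letta:

```python
def compile_context(system_prompt: str, memory: dict) -> str:
    """Render the agent's memory dictionary into the system prompt text."""
    if not memory:
        # No long-term memories: the context window stays simple.
        return system_prompt
    # Convert each key/value pair into a simple sentence-like line.
    memory_lines = "\n".join(f"{key}: {value}" for key, value in memory.items())
    return f"{system_prompt}\n\n[MEMORY]\n{memory_lines}"

agent_memory = {"name": "Sarah", "birthday": "1/1/2001"}
compiled = compile_context("You are a helpful assistant.", agent_memory)
print(compiled)
```

With an empty dictionary this returns just the base prompt; with memories, the compiled text carries them into every LLM call.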
We'll walk you through how to use OpenAI's tool calling to implement some basic memory management features. Let's start by setting up OpenAI. Next, we need to decide on a model. Let's use GPT-4o-mini; it's a good balance of speed, cost, and reasoning ability. The next thing we need to define is the system prompt. Let's keep it really simple for now and just use "You are a chatbot."

The next step is to actually make an LLM request to OpenAI. The format of our request is to put the system prompt up front, followed by the chat history. In this case, we have an empty chat, so the only message is from the user asking, "What is my name?" What should we expect to happen? Well, there's no context about what my name is. So, as expected, the agent says, "I'm sorry, I don't know your name. How can I assist you today?"

Next, we'll go over how to actually add memory to the context window. The most basic form of memory we could have in Python is a dictionary. Let's set up an agent memory dictionary with one field, human, which is going to keep all the memories related to the human. In this case, the only memory we have is that the human's name is Bob. Note that we'll have to update our system prompt: it's no longer just "You are a chatbot," but also includes additional information that specifically tells the LLM that it has some sort of memory. We tell the LLM that it has a section of its context called memory, and that it contains information relevant to the conversation. So we're explicitly instructing the LLM to use its memory to personalize the conversation.

Next, let's actually send this request to OpenAI. You'll notice the main difference here is that we're now using the new system prompt, and we're also manually injecting the memory into the system prompt. Like before, the chat history only has one message: "What is my name?" So what do we expect will happen here?
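A sketch of this request might look like the following. The system prompt wording is approximate, and the actual API call is guarded so the snippet only contacts OpenAI when an `OPENAI_API_KEY` is set:

```python
import os

# The agent's memory: a plain dictionary with one field for the human.
agent_memory = {"human": "Name: Bob"}

# The revised system prompt tells the LLM it has a memory section
# (exact wording here is illustrative, not the lesson's verbatim prompt).
system_prompt = (
    "You are a chatbot. "
    "You have a section of your context called [MEMORY] "
    "that contains information relevant to your conversation."
)

# System prompt (with memory manually injected) first, chat history next.
messages = [
    {"role": "system",
     "content": system_prompt + "\n[MEMORY]\n" + str(agent_memory)},
    {"role": "user", "content": "What is my name?"},
]

if os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
    )
    print(response.choices[0].message.content)
```

Because the memory is injected into the system message, the model now has the context it needs to answer "What is my name?"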
Well, in this case, we are actually providing information about the human's name, so we would expect the chatbot to return something along the lines of "Your name is Bob." Great, everything is working as expected.

In this section, we're going to go over how to actually make this memory object editable by the agent. Because we're representing our memory in Python (remember, it's just a Python dictionary), the way we define a memory-editing tool is to simply write a Python function. Let's create a simple function called core_memory_save. It takes two arguments, the section and the memory; it indexes into the section and appends the memory to that section. Let's see how this works in practice. The current memory is empty: it has two fields, but both are just empty strings. Now let's try running this memory-editing tool ourselves. All right, we ran core_memory_save with some additional information about the human: that the human's name is Charles. This should have updated the memory. Great, so now we know our memory-editing tool is working as we expected.

The next step is to find a way to actually describe to the LLM how it should use this tool. OpenAI's API requires you to do two things. First, you have to provide a description of the function; in this case, our description is to save important information about you, the agent, or the human you're chatting with. Second, you need to provide a JSON schema, which describes in a programmatic way what the arguments to the function are and what their types are. Additionally, we specify that both of the arguments are required. Now that we have our tool's description and schema, we can pass that metadata into OpenAI, which will use this information to inform the LLM that it has access to these tools. So let's take a quick look at this call again. We're passing in the system prompt.
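Putting those pieces together, a minimal sketch of the tool and its schema could look like this (the description text and schema details are approximations of the lesson's, not verbatim):

```python
# The agent's memory: two sections, both starting empty.
agent_memory = {"human": "", "agent": ""}

def core_memory_save(section: str, memory: str):
    """Index into the given section and append the new memory to it."""
    agent_memory[section] += memory + "\n"

# OpenAI tool metadata: a natural-language description plus a JSON
# schema describing the arguments, their types, and which are required.
core_memory_save_metadata = {
    "type": "function",
    "function": {
        "name": "core_memory_save",
        "description": (
            "Save important information about you (the agent) "
            "or the human you are chatting with."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "section": {
                    "type": "string",
                    "enum": ["human", "agent"],
                    "description": "Memory section to update.",
                },
                "memory": {
                    "type": "string",
                    "description": "Memory to save.",
                },
            },
            "required": ["section", "memory"],
        },
    },
}

# Try the tool by hand, as in the lesson:
core_memory_save("human", "The human's name is Charles.")
print(agent_memory["human"])
```

The metadata dictionary is what gets passed to the API's `tools` parameter so the model knows the function exists and how to call it.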
Then we're also passing in the memory, and finally the chat history. Again, the chat history here is a single message: "My name is Bob." So let's think about what we'd expect to happen. The user is saying their name is Bob, but the memory is initialized in an empty state. So we would hope that the agent decides to use the core_memory_save function to save the information about the user's name being Bob. Let's see what happens. You can see that the finish reason is tool_calls, which means the LLM is trying to call a tool. We can see that the tool it's trying to call is core_memory_save, and that the arguments of the function are section: human, memory: "The human's name is Bob." Amazing, this is exactly what we wanted: the LLM is trying to save the information that the human's name is Bob after seeing it in the chat history.

All right, so that's pretty great, but OpenAI isn't actually going to execute the tool for you; that's something you're going to have to do yourself. The first step to executing the tool is to load the arguments. OpenAI's response passes the arguments as stringified JSON, which means we can use the json.loads function to parse the arguments back into a dictionary. Now that we have the arguments in a dictionary, we just need to pass them to the actual function; we can use the star-star syntax to do that and run the function. After running the function, we can check our memory, and as expected, it got updated.

Let's run the agent again and see how it responds differently now that the memory has been updated. In this request, we're including the system prompt, the updated memory object, and the question "What is my name?" As expected, the agent replies, "Your name is Bob." Congrats! You've now implemented a memory-editing function.
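The execution step can be sketched as follows. The tool-call arguments here are a hand-written stand-in for what a real response's `tool_calls[0].function.arguments` field contains:

```python
import json

agent_memory = {"human": ""}

def core_memory_save(section: str, memory: str):
    agent_memory[section] += memory + "\n"

# What the arguments look like coming back from OpenAI: stringified JSON.
raw_arguments = '{"section": "human", "memory": "The human\'s name is Bob."}'

# Step 1: parse the JSON string back into a Python dictionary.
arguments = json.loads(raw_arguments)

# Step 2: unpack the dictionary into keyword arguments with ** and run it.
core_memory_save(**arguments)

print(agent_memory["human"])
```

The `**` unpacking works because the schema's parameter names (`section`, `memory`) match the function's argument names exactly.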
In our current implementation, the agent can only take one step at a time: it can either edit the memory or respond to the user. However, if we want our agent to support multi-step reasoning, so it can combine multiple actions together, we can implement an agentic loop by calling chat completions in a while loop and allowing the agent to decide whether it wants to continue its reasoning steps or break out of the loop. For simplicity, we'll just assume that if the agent's response is not a tool call, we break out of the loop, and if it is a tool call, we stay inside the loop.

Let's start by resetting the agent memory. Next, we'll write a slight revision to the system prompt, adding information on how the agent should use its tools. We're letting the agent know that it either needs to call a tool or write a response to the user. We're also telling the agent not to take the same action multiple times, and that when it learns new information, it should always call the core_memory_save tool. These are just additional instructions that help the agent understand how to use the tools we're providing it.

Our basic agent step function will take just one argument, the user message. Next, we need to prepare the inputs to the LLM call: the system prompt, the memory, and the new user message. Next, let's actually start building the loop. The loop begins with calling OpenAI's chat completions API. Remember, we need to pass the API information about how the tool should be used. Once we get a response back from the API, we append this message to the messages list. Now this is the important part: if the agent isn't calling a tool, then we want to break out of the loop by returning. On the other hand, if the agent is calling a tool, we're going to execute that tool and continue the loop. The first thing we do is print the tool call, just so we can see it.
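The loop described above can be sketched like this. To keep the example runnable without an API key, the LLM call is injected as a plain function (in the real version it would wrap `client.chat.completions.create`); the response dictionaries mimic the shape of OpenAI's messages, and all names here are illustrative:

```python
import json

agent_memory = {"human": ""}

def core_memory_save(section: str, memory: str):
    agent_memory[section] += memory + "\n"

def agent_step(user_message, call_llm):
    """Run the agentic loop for one user message."""
    messages = [
        {"role": "system",
         "content": "You are a chatbot with memory.\n[MEMORY]\n" + str(agent_memory)},
        {"role": "user", "content": user_message},
    ]
    while True:
        response_message = call_llm(messages)   # would be the OpenAI call
        messages.append(response_message)
        tool_calls = response_message.get("tool_calls")
        if not tool_calls:
            # Not a tool call: the agent is responding, so break the loop.
            return response_message["content"]
        for tool_call in tool_calls:
            # Tool call: execute it, record the result, and keep looping.
            args = json.loads(tool_call["function"]["arguments"])
            core_memory_save(**args)
            messages.append({"role": "tool",
                             "tool_call_id": tool_call["id"],
                             "content": "Memory saved."})

# A scripted fake LLM: it first saves a memory, then responds to the user.
script = iter([
    {"role": "assistant", "content": None, "tool_calls": [
        {"id": "call_1", "function": {
            "name": "core_memory_save",
            "arguments": '{"section": "human", "memory": "The user\'s name is Bob."}'}}]},
    {"role": "assistant", "content": "Nice to meet you, Bob!"},
])
reply = agent_step("My name is Bob.", lambda messages: next(script))
print(reply)
```

Even with the fake LLM, the control flow is the real one: tool calls keep the loop going, and a plain assistant message ends it.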
Next, we need to load the arguments into a Python dictionary. Once we've loaded them, we can actually execute the core_memory_save function. And lastly, once we've executed the tool call, we also need to inject the tool call's response into the message history; this is the standard format that OpenAI expects you to follow.

Now let's actually try running this agentic loop with a simple message: "My name is Bob." Great. We can see that the agent did two things. It first called the tool: you can see that it updated the section of memory called human with the new memory, "The user's name is Bob," just as we expected. Then the agent sent a follow-up message. Amazing. The agent is able to both edit its memory and generate a response to the user that uses the updated memory, all in a single step.

Although in this example we only support two actions, a single tool and responding to the user, this same structure can be used to implement much more complex reasoning loops that combine many different tools. In MemGPT, all actions, even responding to the user, are tools. Some tools, such as sending a message, are designed to break the reasoning loop, whereas others, such as searching archival memory and editing memory, are designed not to break the loop. Congratulations! You've now implemented an agent with self-editing memory and multi-step reasoning from scratch. In the next section, we'll go deeper into how MemGPT agents actually work.
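For reference, the tool-result message that gets injected into the message history follows OpenAI's chat completions format: a message with role "tool" that references the id of the tool call it answers. The id value here is illustrative:

```python
tool_call_id = "call_abc123"  # copied from the tool call the model made

# The shape of the message appended after executing a tool:
tool_response_message = {
    "role": "tool",                 # marks this as a tool result
    "tool_call_id": tool_call_id,   # ties the result back to the tool call
    "content": "None",              # core_memory_save returns None
}
print(tool_response_message)
```

Without this message, the API will reject the next request, since every tool call in the history must be matched by a tool result.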