Migrating off ConversationBufferWindowMemory or ConversationTokenBufferMemory
Follow this guide if you're trying to migrate off one of the old memory classes listed below:
| Memory Type | Description |
|---|---|
| ConversationBufferWindowMemory | Keeps the last n messages of the conversation. Drops the oldest messages when there are more than n messages. |
| ConversationTokenBufferMemory | Keeps only the most recent messages in the conversation, under the constraint that the total number of tokens in the conversation does not exceed a certain limit. |
ConversationBufferWindowMemory and ConversationTokenBufferMemory apply additional processing on top of the raw conversation history to trim it to a size that fits inside the context window of a chat model.
This processing functionality can be accomplished using LangChain's built-in trim_messages function.
We'll begin by exploring a straightforward method that involves applying processing logic to the entire conversation history.
While this approach is easy to implement, it has a downside: as the conversation grows, so does the latency, since the logic is re-applied to all previous exchanges in the conversation at each turn.
More advanced strategies focus on incrementally updating the conversation history to avoid redundant processing.
For instance, the langgraph how-to guide on summarization demonstrates how to maintain a running summary of the conversation while discarding older messages, ensuring they aren't re-processed during later turns.
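For a sense of what that looks like, here is a minimal sketch of the incremental idea: whenever older messages are dropped, fold them into a running summary so each message is summarized exactly once. The helper name, prompt wording, and choice of ChatOpenAI below are illustrative assumptions, not the langgraph guide's actual implementation.
from langchain_core.messages import BaseMessage
from langchain_openai import ChatOpenAI

model = ChatOpenAI()  # any chat model works here


def fold_into_summary(summary: str, dropped: list[BaseMessage]) -> str:
    # Hypothetical helper: extend the running summary with the messages
    # that are about to be dropped, so they are processed exactly once.
    transcript = "\n".join(f"{m.type}: {m.content}" for m in dropped)
    prompt = (
        f"Current summary:\n{summary or '(empty)'}\n\n"
        f"Extend the summary to also cover these messages:\n{transcript}"
    )
    return model.invoke(prompt).content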
Set up
%%capture --no-stderr
%pip install --upgrade --quiet langchain-openai langchain
import os
from getpass import getpass
if "OPENAI_API_KEY" not in os.environ:
os.environ["OPENAI_API_KEY"] = getpass()
Legacy usage with LLMChain / ConversationChain
from langchain.chains import LLMChain
from langchain.memory import ConversationBufferWindowMemory
from langchain_core.messages import SystemMessage
from langchain_core.prompts import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    MessagesPlaceholder,
)
from langchain_openai import ChatOpenAI
prompt = ChatPromptTemplate(
    [
        SystemMessage(content="You are a helpful assistant."),
        MessagesPlaceholder(variable_name="chat_history"),
        HumanMessagePromptTemplate.from_template("{text}"),
    ]
)
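# ConversationBufferWindowMemory keeps only the last k exchanges (k defaults to 5);
# the trimmed window is what gets injected into "chat_history" on each turn.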
memory = ConversationBufferWindowMemory(memory_key="chat_history", return_messages=True)
legacy_chain = LLMChain(
    llm=ChatOpenAI(),
    prompt=prompt,
    memory=memory,
)
legacy_result = legacy_chain.invoke({"text": "my name is bob"})
print(legacy_result)
legacy_result = legacy_chain.invoke({"text": "what was my name"})
print(legacy_result)
{'text': 'Nice to meet you, Bob! How can I assist you today?', 'chat_history': []}
{'text': 'Your name is Bob. How can I assist you further, Bob?', 'chat_history': [HumanMessage(content='my name is bob', additional_kwargs={}, response_metadata={}), AIMessage(content='Nice to meet you, Bob! How can I assist you today?', additional_kwargs={}, response_metadata={})]}
Reimplementing ConversationBufferWindowMemory logic
Let's first create appropriate logic to process the conversation history, and then we'll see how to integrate it into an application. You can later replace this basic setup with more advanced logic tailored to your specific needs.
We'll use trim_messages to implement logic that keeps the last n messages of the conversation, dropping the oldest messages once the count exceeds n.
In addition, we'll keep the system message if it's present: when present, it's the first message in the conversation and contains the instructions for the chat model.
from langchain_core.messages import (
    AIMessage,
    BaseMessage,
    HumanMessage,
    SystemMessage,
    trim_messages,
)
from langchain_openai import ChatOpenAI
messages = [
    SystemMessage("you're a good assistant, you always respond with a joke."),
    HumanMessage("i wonder why it's called langchain"),
    AIMessage(
        'Well, I guess they thought "WordRope" and "SentenceString" just didn\'t have the same ring to it!'
    ),
    HumanMessage("and who is harrison chasing anyways"),
    AIMessage(
        "Hmmm let me think.\n\nWhy, he's probably chasing after the last cup of coffee in the office!"
    ),
    HumanMessage("why is 42 always the answer?"),
    AIMessage(
        "Because it's the only number that's constantly right, even when it doesn't add up!"
    ),
    HumanMessage("What did the cow say?"),
]
selected_messages = trim_messages(
    messages,
    token_counter=len,  # <-- len will simply count the number of messages rather than tokens
    max_tokens=5,  # <-- allow up to 5 messages
    strategy="last",
    # start_on is specified to make sure we do not generate a sequence where
    # a ToolMessage that contains the result of a tool invocation
    # appears before the AIMessage that requested the tool invocation,
    # as this will cause some chat models to raise an error.
    start_on=("human", "ai"),
    include_system=True,  # <-- keep the system message
    allow_partial=False,
)
for msg in selected_messages:
    msg.pretty_print()
================================ System Message ================================
you're a good assistant, you always respond with a joke.
================================== Ai Message ==================================
Hmmm let me think.
Why, he's probably chasing after the last cup of coffee in the office!
================================ Human Message =================================
why is 42 always the answer?
================================== Ai Message ==================================
Because it's the only number that's constantly right, even when it doesn't add up!
================================ Human Message =================================
What did the cow say?
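To reimplement ConversationTokenBufferMemory's behavior instead, swap the message counter for a real token counter and treat max_tokens as a token budget. Here is a sketch under assumptions: the gpt-4o model name and the 80-token budget are arbitrary illustrative choices. Passing a chat model as token_counter makes trim_messages use that model's token-counting method.
selected_messages = trim_messages(
    messages,
    # Pass a chat model to count actual tokens; trim_messages will use the
    # model's token-counting method rather than counting messages.
    token_counter=ChatOpenAI(model="gpt-4o"),
    max_tokens=80,  # <-- token budget instead of a message count
    strategy="last",
    start_on=("human", "ai"),
    include_system=True,  # <-- keep the system message
    allow_partial=False,
)
for msg in selected_messages:
    msg.pretty_print()
As before, the system message is retained and the oldest messages are dropped first; the only change is the unit being counted.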