How to Build a Custom Conversational Agent with LangChain

In this post, I explain how to build a custom conversational agent in LangChain.

If you are building a product on top of LLMs, you may have heard of LangChain. LangChain is an open-source, opinionated framework for working with a variety of large language models.

LangChain is a framework for developing applications powered by language models. It enables applications that are:

Data-aware: connect a language model to other sources of data

Agentic: allow a language model to interact with its environment

The main value propositions of LangChain are:

Components: abstractions for working with language models, along with a collection of implementations for each abstraction. Components are modular and easy-to-use, whether you are using the rest of the LangChain framework or not.

Off-the-shelf chains: a structured assembly of components for accomplishing specific higher-level tasks.

Off-the-shelf chains make it easy to get started. For more complex applications and nuanced use cases, components make it easy to customize existing chains or build new ones.
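For example, a single off-the-shelf LLMChain is enough to get a first result. A minimal sketch, assuming OPENAI_API_KEY is set in the environment (the prompt here is my own illustration, not from the original post):

from langchain.chains import LLMChain
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate

# An off-the-shelf chain: just a prompt template plus a model.
llm = ChatOpenAI(temperature=0)  # reads OPENAI_API_KEY from the environment
chain = LLMChain(
    llm=llm,
    prompt=PromptTemplate.from_template("Translate to French: {text}"),
)
print(chain.run(text="Hello"))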

You can find more details about how LangChain works in its documentation, which is excellent and covers most of the essential parts of the framework.

Custom Agents
As the LangChain team puts it in their "Custom Agents" blog post: "One of the most common requests we've heard is better functionality and documentation for creating custom agents. This has always been a bit tricky, because in our mind it's actually still very unclear what an 'agent' actually is, and therefore what the 'right' abstractions for them may be."

In this post, I will explain how to build a custom conversational agent in LangChain. The documentation only covers custom LLM agents that use the ReAct framework and tools to answer, and the default LangChain conversational agent may not suit every use case. So what if your agent needs to talk to the user in a specific way and perform some of its tasks with tools?

A LangChain agent has three parts:

  1. PromptTemplate: the prompt that tells the LLM how it should behave.
  2. OutputParser: parses the LLM's output and decides whether any tools should be called.
  3. AgentClass: a Python class that inherits from the LangChain Agent class, telling LangChain that our class is an agent.

First, let's create the prompt template.

PREFIX = """You are an AI agent that does its best to assist the user. You may need to use the following tools:
Your Tools:
{tools}

----
Continuously review and analyze your actions to ensure you are performing to the best of your abilities.
Constructively self-criticize your big-picture behavior constantly.
Reflect on past decisions and strategies to refine your approach.

You should only respond in JSON format as described below:
Response Format:
{format_instructions}
Ensure the response can be parsed by Python json.loads
"""

FORMAT_INSTRUCTIONS = """{
    "thoughts": {
        "text": "thought",
        "speak": "thoughts summary to say to user"
    },
    "tool": {"name": "tool name", "input": "value"}
}
"""
SUFFIX = """
USER'S INPUT
--------------------
Reply in a friendly, concise tone. NOTE: following the Response Format above is mandatory.
USER INPUT: {input}
"""

Variables:

  1. tools: This section lists the tools the agent has access to. Each tool has a name and a description, and the agent uses these attributes to decide which tool to call for a given user input.
  2. format_instructions: This variable tells the LLM how to respond. The LLM's responses should follow a format that we can parse.
  3. input: This variable contains general input from the user.
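To sanity-check the template, you can fill in the variables by hand and print the result. A quick sketch (the tool line is an invented example):

# Render the system prompt with a dummy tool list to inspect it.
print(PREFIX.format(
    tools="> search: useful for looking up facts on the web",
    format_instructions=FORMAT_INSTRUCTIONS,
))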

We now need to implement the output parser.

The output parser is responsible for parsing the LLM output into an AgentAction or an AgentFinish, and it usually depends heavily on the prompt used. This is also where you can add retries, handle stray whitespace, and so on.
import json
import re
from typing import Union

from langchain.agents.agent import Agent, AgentOutputParser
from langchain.schema import AgentAction, AgentFinish

def preprocess_json_input(input_str: str) -> str:
    """Preprocesses a string to be parsed as json.

    Replace single backslashes with double backslashes,
    while leaving already escaped ones intact.

    Args:
        input_str: String to be preprocessed

    Returns:
        Preprocessed string
    """
    corrected_str = re.sub(
        r'(?<!\\)\\(?!["\\/bfnrt]|u[0-9a-fA-F]{4})', r"\\\\", input_str
    )
    return corrected_str
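A quick illustration of what this fixes (the sample string is my own): a lone backslash, such as one from a regex in the model's reply, makes the JSON unparseable until it is doubled.

# The string below contains: {"text": "pattern \d+"} - invalid JSON,
# because \d is not a legal escape sequence.
raw = '{"text": "pattern \\d+"}'
try:
    json.loads(raw)
except json.JSONDecodeError:
    fixed = preprocess_json_input(raw)  # doubles the lone backslash
    print(json.loads(fixed)["text"])    # -> pattern \d+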

class CustomOutputParser(AgentOutputParser):
    def parse(self, llm_output: str) -> Union[AgentAction, AgentFinish]:
        # First try to parse the raw output; if that fails, retry after
        # fixing unescaped backslashes.
        try:
            parsed = json.loads(llm_output, strict=False)
        except json.JSONDecodeError:
            preprocessed_text = preprocess_json_input(llm_output)
            try:
                parsed = json.loads(preprocessed_text, strict=False)
            except Exception:
                raise ValueError(f"Could not parse LLM output: `{llm_output}`")
        # A "tool" key means the agent wants to act; otherwise it is done
        # and the output goes straight back to the user.
        if "tool" in parsed and "name" in parsed["tool"]:
            return AgentAction(
                tool=parsed["tool"]["name"],
                tool_input=parsed["tool"].get("input", ""),
                log=llm_output,
            )
        return AgentFinish(
            return_values={"output": llm_output},
            log=llm_output,
        )
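Before wiring the parser into an agent, it is worth checking both branches by hand. A small sanity check (the sample outputs are invented):

parser = CustomOutputParser()

# A reply that requests a tool should become an AgentAction...
action = parser.parse(
    '{"thoughts": {"text": "t", "speak": "s"},'
    ' "tool": {"name": "human", "input": "What is Eric\'s surname?"}}'
)
assert isinstance(action, AgentAction) and action.tool == "human"

# ...and a plain reply should become an AgentFinish.
finish = parser.parse('{"thoughts": {"text": "t", "speak": "Hello!"}}')
assert isinstance(finish, AgentFinish)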

Okay, we have the prompt and parser, and now we need to build the final part.

from __future__ import annotations
from typing import Any, List, Optional, Sequence, Tuple
from pydantic import Field
from langchain.agents.utils import validate_tools_single_input
from langchain.base_language import BaseLanguageModel
from langchain.callbacks.base import BaseCallbackManager
from langchain.chains import LLMChain
from langchain.prompts.base import BasePromptTemplate
from langchain.prompts.chat import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    MessagesPlaceholder,
)
from langchain.schema import (
    AgentAction,
    AIMessage,
    BaseMessage,
    BaseOutputParser,
    HumanMessage,
    SystemMessage,
)
from langchain.tools.base import BaseTool

class CustomChatAgent(Agent):
    output_parser: AgentOutputParser = Field(
        default_factory=CustomOutputParser)

    @classmethod
    def _get_default_output_parser(cls, **kwargs: Any) -> AgentOutputParser:
        return CustomOutputParser()

    @property
    def _agent_type(self) -> str:
        raise NotImplementedError

    @property
    def observation_prefix(self) -> str:
        """Prefix to append the observation with."""
        return "Observe: "

    @property
    def llm_prefix(self) -> str:
        """Prefix to append the llm call with."""
        return ""

    @classmethod
    def _validate_tools(cls, tools: Sequence[BaseTool]) -> None:
        super()._validate_tools(tools)
        validate_tools_single_input(cls.__name__, tools)

    @classmethod
    def create_prompt(
        cls,
        tools: Sequence[BaseTool],
        system_message: str = PREFIX,
        human_message: str = SUFFIX,
        formats: str = FORMAT_INSTRUCTIONS,
        input_variables: Optional[List[str]] = None,
        output_parser: Optional[BaseOutputParser] = None,
    ) -> BasePromptTemplate:
        tool_strings = "\n".join(
            [f"> {tool.name}: {tool.description}" for tool in tools]
        )
        _output_parser = output_parser or cls._get_default_output_parser()
        system_message = system_message.format(
            format_instructions=formats,
            tools=tool_strings,
        )
        if input_variables is None:
            input_variables = ["input", "chat_history", "agent_scratchpad"]
        messages = [
            SystemMessage(content=system_message),
            MessagesPlaceholder(variable_name="chat_history"),
            HumanMessagePromptTemplate.from_template(human_message),
            MessagesPlaceholder(variable_name="agent_scratchpad")
        ]
        return ChatPromptTemplate(input_variables=input_variables, messages=messages)

    def _construct_scratchpad(
        self, intermediate_steps: List[Tuple[AgentAction, str]]
    ) -> List[BaseMessage]:
        """Construct the scratchpad that lets the agent continue its thought process."""
        thoughts: List[BaseMessage] = []
        for action, observation in intermediate_steps:
            thoughts.append(AIMessage(content=action.log))
            human_message = HumanMessage(
                content=f"Observe: {observation}"
            )
            thoughts.append(human_message)
        return thoughts

    @classmethod
    def from_llm_and_tools(
        cls,
        llm: BaseLanguageModel,
        tools: Sequence[BaseTool],
        callback_manager: Optional[BaseCallbackManager] = None,
        output_parser: Optional[AgentOutputParser] = None,
        system_message: str = PREFIX,
        human_message: str = SUFFIX,
        formats: str = FORMAT_INSTRUCTIONS,
        input_variables: Optional[List[str]] = None,
        **kwargs: Any,
    ) -> Agent:
        """Construct an agent from an LLM and tools."""
        cls._validate_tools(tools)
        _output_parser = output_parser or cls._get_default_output_parser()
        prompt = cls.create_prompt(
            tools,
            system_message=system_message,
            human_message=human_message,
            formats=formats,
            input_variables=input_variables,
            output_parser=_output_parser,
        )
        llm_chain = LLMChain(
            llm=llm,
            prompt=prompt,
            callback_manager=callback_manager,
        )
        tool_names = [tool.name for tool in tools]
        return cls(
            llm_chain=llm_chain,
            allowed_tools=tool_names,
            output_parser=_output_parser,
            **kwargs,
        )

So, let's put all the parts together:

from langchain.agents import Tool, AgentExecutor
from langchain.agents import load_tools
from langchain.chat_models import ChatOpenAI
from langchain.llms import OpenAI
math_llm = OpenAI(temperature=0.0)
tools = load_tools(
    ["human", "llm-math"],
    llm=math_llm,
)
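The built-in "human" tool lets the agent ask you a question mid-run, and "llm-math" handles arithmetic. If you need something the built-ins don't cover, you can append your own Tool (using the Tool import above); a minimal sketch, where the name, function, and description are invented for illustration:

from datetime import datetime

def get_time(_: str) -> str:
    """Return the current date and time as an ISO string."""
    return datetime.now().isoformat()

# Custom tools need a name and a description the agent can reason about.
tools.append(Tool(
    name="time",
    func=get_time,
    description="Useful for answering questions about the current date or time.",
))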

We need memory for our agent to remember the conversation.

from langchain.memory import ConversationBufferMemory
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
llm = ChatOpenAI(temperature=0)  # assumes OPENAI_API_KEY is set in the environment
agent = CustomChatAgent.from_llm_and_tools(
            llm=llm, tools=tools)
chain = AgentExecutor.from_agent_and_tools(
            agent=agent, tools=tools, verbose=True, memory=memory, stop=["Observe:"])
chain.run("hello")
> Entering new AgentExecutor chain...
{
    "thoughts": {
        "text": "I should greet the user back.",
        "speak": "Hello! How can I assist you today?"
    }
}

> Finished chain.
chain.run("What's my friend Eric's surname?")
> Entering new AgentExecutor chain...
{
    "thoughts": {
        "text": "The user is asking for their friend's surname. I need to find out Eric's surname.",
        "speak": "Let me find out Eric's surname for you."
    },
    "tool": {
        "name": "human",
        "input": "What is Eric's surname?"
    }
}

What is Eric's surname?

Observe: Zhu
{
    "thoughts": {
        "text": "The user provided the surname 'Zhu' for their friend Eric.",
        "speak": "Eric's surname is Zhu."
    }
}

> Finished chain.

Now, our agent can determine when to speak to the user and when to perform an action using tools.

This prompt and structure were inspired by the LangChain source code.

GitHub - hwchase17/langchain: ⚡ Building applications with LLMs through composability ⚡