How to use LangChain's output parser to shape the output of a language model

Introduction

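I have been using LangChain's output parsers to structure the output of language models. I find them useful because they let me get the output in exactly the format I want.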

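In this article, I will share my experience with output parsers, discuss how I have used them to structure the output of different language models, and share some of the benefits I have found.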

I hope this article is helpful to anyone interested in using output parsers.

Here are some benefits of using an output parser:

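It makes the output of a language model more structured and easier to understand.
It returns structured fields that a program can consume directly, rather than plain text.
It can be customized to meet the needs of a specific application.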

In practice

Suppose we want to use an LLM to create a simple TODO web API server in Go.

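First, we define the output structure. In this example, it is a "SourceCode" class with a "source_code" field for the content and a "file_name" field for the file name.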

from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field

class SourceCode(BaseModel):
    source_code: str = Field(description="The current source code")
    file_name: str = Field(description="The file name with extension for this code")

parser = PydanticOutputParser(pydantic_object=SourceCode)

Next, we prepare the prompt that we will send to the LLM.

from langchain.prompts import PromptTemplate

prompt = PromptTemplate(
    template="Provide the source code for the following requirement.\n{format_instructions}\n{requirement}\n",
    input_variables=["requirement"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

Use the prompt template to build the prompt input.

todo_requirement = (
    "Create an TODO web api server in Go lang with CRUD operation endpoints."
)

_input = prompt.format_prompt(requirement=todo_requirement)

We can also inspect the formatted input before sending it to the LLM.

print(_input.to_string())
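
To make the role of `partial_variables` more concrete, here is a minimal sketch of the same idea using only the standard library: the format instructions are filled in once up front, and only the requirement is supplied later. `SCHEMA`, `FORMAT_INSTRUCTIONS`, and `build_prompt` are hypothetical stand-ins for illustration; the text actually returned by `parser.get_format_instructions()` is worded differently.

```python
import json
from functools import partial

# Hypothetical stand-in for the schema the parser derives from SourceCode.
SCHEMA = {
    "properties": {
        "source_code": {"type": "string", "description": "The current source code"},
        "file_name": {
            "type": "string",
            "description": "The file name with extension for this code",
        },
    },
    "required": ["source_code", "file_name"],
}

# Assumed wording; the real instructions produced by LangChain differ.
FORMAT_INSTRUCTIONS = (
    "Return a JSON object that conforms to this schema:\n" + json.dumps(SCHEMA)
)

TEMPLATE = (
    "Provide the source code for the following requirement.\n"
    "{format_instructions}\n{requirement}\n"
)

# Pre-fill the format instructions, leaving only `requirement` open.
build_prompt = partial(TEMPLATE.format, format_instructions=FORMAT_INSTRUCTIONS)

prompt_text = build_prompt(requirement="Create a TODO web API server in Go.")
print(prompt_text)
```

Printing the prompt this way shows exactly what the model will read, including the field names it is expected to return.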

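Then we need to decide which LLM to use. I tried several of them and found that "text-davinci-003" produced the most accurate output. Feel free to do your own research and pick the one that best fits your needs.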

from langchain.llms import OpenAI

model_name = "text-davinci-003"
# model = OpenAI(model_name="text-ada-001", n=2, best_of=2)
temperature = 0.0
model = OpenAI(model_name=model_name, temperature=temperature)
output = model(_input.to_string())

# checking the output
# print(output)
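
Since the next step depends on the completion being valid JSON, a quick check like the one below can tell a complete response from one that was cut off. This is a minimal sketch using only the standard library; `is_complete_json` is a hypothetical helper, not part of LangChain, and the two sample strings are made up for illustration.

```python
import json


def is_complete_json(text: str) -> bool:
    """Return True if `text` parses as JSON, False if it is cut off or malformed."""
    try:
        # strict=False tolerates raw newlines inside string values,
        # which is common when a model emits source code.
        json.loads(text, strict=False)
        return True
    except json.JSONDecodeError:
        return False


complete = '{"source_code": "package main", "file_name": "todo.go"}'
truncated = '{"source_code": "package main\n\nfunc main() {'

print(is_complete_json(complete))   # True
print(is_complete_json(truncated))  # False
```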

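This did not work as expected: the output was cut off, leaving an invalid JSON string that could not be parsed. After some research, I found the cause: LangChain sets a default limit of 500 total tokens for OpenAI LLM models. The token limit covers both input and output, which was not enough for the resulting text. To avoid this, I used the tiktoken library to count the prompt tokens and give the rest of the token limit to the output.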

import tiktoken

encoding = tiktoken.encoding_for_model(model_name)
prompt_tokens = len(encoding.encode(_input.to_string()))

# ...
# text-davinci-003 model has a total token limit of 4097
model = OpenAI(model_name=model_name, temperature=temperature, max_tokens=4097-prompt_tokens)
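
The subtraction above can be wrapped in a small helper that also guards against a prompt that is already too long. This is only a sketch: `completion_budget` and its `margin` parameter are hypothetical names, and the prompt size of 350 tokens is made up for illustration.

```python
def completion_budget(context_limit: int, prompt_tokens: int, margin: int = 0) -> int:
    """Return how many tokens are left for the completion.

    `margin` reserves a few tokens in case the token count is slightly off.
    """
    remaining = context_limit - prompt_tokens - margin
    if remaining <= 0:
        raise ValueError(
            f"prompt uses {prompt_tokens} tokens, leaving no room within {context_limit}"
        )
    return remaining


# 4097 is the total limit quoted above for text-davinci-003;
# 350 is a made-up prompt size.
budget = completion_budget(4097, 350, margin=16)
print(budget)  # 3731
```

The returned value is what would be passed as `max_tokens` when constructing the model.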

This time, the LLM produced the expected formatted output, shown below.

```
{
  "source_code": "package main

import (
        \"fmt\"
        \"net/http\"
)

func main() {
        http.HandleFunc(\"/todos\", todosHandler)
        http.ListenAndServe(\":8080\", nil)
}

func todosHandler(w http.ResponseWriter, r *http.Request) {
        switch r.Method {
        case \"GET\":
                // Handle GET request
        case \"POST\":
                // Handle POST request
        case \"PUT\":
                // Handle PUT request
        case \"DELETE\":
                // Handle DELETE request
        default:
                fmt.Fprintf(w, \"Method not supported\")
        }
}",
  "file_name": "todo.go"
}
```
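
With a complete response in hand, the last step is to turn the text back into a typed object. In LangChain this is a call to `parser.parse(output)`, which returns a `SourceCode` instance. The sketch below imitates that step with the standard library only, so the moving parts are visible; `ParsedSource` and `parse_source` are hypothetical names, and the sample `raw_output` is a shortened, made-up response.

```python
import json
import re
from dataclasses import dataclass


@dataclass
class ParsedSource:
    """Hypothetical stand-in for the SourceCode model."""

    source_code: str
    file_name: str


def parse_source(text: str) -> ParsedSource:
    """Extract the outermost JSON object from `text` and map it to ParsedSource."""
    # Greedy match from the first "{" to the last "}" so that braces
    # inside the generated source code stay within the match.
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model output")
    # strict=False accepts raw newlines inside the "source_code" string.
    data = json.loads(match.group(0), strict=False)
    return ParsedSource(source_code=data["source_code"], file_name=data["file_name"])


raw_output = (
    'Here is the code:\n{\n  "source_code": "package main\n\nfunc main() {}",\n'
    '  "file_name": "todo.go"\n}'
)

result = parse_source(raw_output)
print(result.file_name)  # todo.go
```

From here, `result.source_code` can be written to a file named `result.file_name`.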