什么是LLMs的反射调整?

通过自我反思来改善LLM微调

Photo by Stephen Andrews on Unsplash

我现在已经在数据科学行业工作了5年,我见过任何话题发展得像生成式人工智能一样快。几乎每周都有重大进展。本周的重大突破是反射调整LLMs,通过使用这种技术,已经使得Llama 3.1 70B模型成为迄今为止最好的开源模型。

在我们继续之前,

什么是微调?

微调是将预训练的LLM调整到特定任务或数据集的过程,通过在更小、更专业的数据集上继续训练它。以下是一个示例,说明了LLM的微调过程:

常规微调样本


Input: 'Compose a narrative involving the theme of nature'
Output: '........'

Input: 'Identify elements in this passage: the house is blue'
Output: '......'

反思微调

在普通微调的基础上添加功能,反射微调

You are a world-class AI system, capable of complex reasoning and reflection. 
Reason through the query inside <thinking> tags, and
then provide your final response inside <output> tags.
If you detect that you made a mistake in your reasoning at any point,
correct yourself inside <reflection> tags.

因此,最终的反思微调提示可能如下所示(您可以根据您的意愿更改标签,但这些标签产生了最佳结果)。

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a world-class AI system, capable of complex reasoning and reflection.
Reason through the query inside <thinking> tags, and
then provide your final response inside <output> tags.
If you detect that you made a mistake in your reasoning at any point,
correct yourself inside <reflection> tags.
<|eot_id|><|start_header_id|>user<|end_header_id|>

what is 2+2?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

LLM 反思后的产出

推理阶段:

  • 该模型首先通过标记生成其推理。该部分包含模型分析输入查询时的内部思维过程。
<thinking>
I need to determine the sum of 2 and 2.
This is a simple arithmetic problem.
</thinking>

错误检测和纠正:

  • 在部分中,模型可能包括标签,如果它发现自己的推理中有任何错误。这表明该模型能够识别错误,并将在最终确定答案之前尝试更正它们。
<thinking>
I initially thought the answer was 5, but that's incorrect.
The correct approach is to add the two numbers.
<reflection>
Correcting my previous thought: 2 + 2 = 4.
</reflection>
</thinking>

最终输出:

  • 一旦模型对其推理感到满意,它将在标签中提供最终答案。 本节介绍了从推理阶段得出的结论。
<output>
The answer is 4.
</output>

因此,最终输出是

<thinking>
I need to determine the sum of 2 and 2.
This is a simple arithmetic problem.
</thinking>

<thinking>
I initially thought the answer was 5, but that's incorrect.
The correct approach is to add the two numbers.
<reflection>
Correcting my previous thought: 2 + 2 = 4.
</reflection>
</thinking>

<output>
The answer is 4.
</output>

反射微调似乎是一个很好的解决方案,甚至使得Llama3.1 70B成为迄今为止最好的开源模型,只需进行反射微调。请尝试使用反射调整对其他LLM进行unsloth和本提示。您可以在下方查看Llama3.1反射微调模型:

希望这对你有所帮助,很快见到你。

2024-09-10 04:10:35 AI中文站翻译自原文