什么是LLMs的反射调整?
通过自我反思来改善LLM微调
我现在已经在数据科学行业工作了5年,我见过任何话题发展得像生成式人工智能一样快。几乎每周都有重大进展。本周的重大突破是反射调整LLMs,通过使用这种技术,已经使得Llama 3.1 70B模型成为迄今为止最好的开源模型。
在我们继续之前,
什么是微调?
微调是将预训练的LLM调整到特定任务或数据集的过程,通过在更小、更专业的数据集上继续训练它。以下是一个示例,说明了LLM的微调过程:
常规微调样本
Input: 'Compose a narrative involving the theme of nature'
Output: '........'
Input: 'Identify elements in this passage: the house is blue'
Output: '......'
反思微调
在普通微调的基础上添加功能,反射微调
You are a world-class AI system, capable of complex reasoning and reflection.
Reason through the query inside <thinking> tags, and
then provide your final response inside <output> tags.
If you detect that you made a mistake in your reasoning at any point,
correct yourself inside <reflection> tags.
因此,最终的反思微调提示可能如下所示(您可以根据您的意愿更改标签,但这些标签产生了最佳结果)。
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are a world-class AI system, capable of complex reasoning and reflection.
Reason through the query inside <thinking> tags, and
then provide your final response inside <output> tags.
If you detect that you made a mistake in your reasoning at any point,
correct yourself inside <reflection> tags.
<|eot_id|><|start_header_id|>user<|end_header_id|>
what is 2+2?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
LLM 反思后的产出
推理阶段:
- 该模型首先通过
标记生成其推理。该部分包含模型分析输入查询时的内部思维过程。
<thinking>
I need to determine the sum of 2 and 2.
This is a simple arithmetic problem.
</thinking>
错误检测和纠正:
- 在部分中,模型可能包括标签,如果它发现自己的推理中有任何错误。这表明该模型能够识别错误,并将在最终确定答案之前尝试更正它们。
<thinking>
I initially thought the answer was 5, but that's incorrect.
The correct approach is to add the two numbers.
<reflection>
Correcting my previous thought: 2 + 2 = 4.
</reflection>
</thinking>
最终输出:
- 一旦模型对其推理感到满意,它将在
<output>
The answer is 4.
</output>
因此,最终输出是
<thinking>
I need to determine the sum of 2 and 2.
This is a simple arithmetic problem.
</thinking>
<thinking>
I initially thought the answer was 5, but that's incorrect.
The correct approach is to add the two numbers.
<reflection>
Correcting my previous thought: 2 + 2 = 4.
</reflection>
</thinking>
<output>
The answer is 4.
</output>
反射微调似乎是一个很好的解决方案,甚至使得Llama3.1 70B成为迄今为止最好的开源模型,只需进行反射微调。请尝试使用反射调整对其他LLM进行unsloth和本提示。您可以在下方查看Llama3.1反射微调模型:
希望这对你有所帮助,很快见到你。