# Finetuning from Last Checkpoint

You must first edit the `Trainer` to add `save_strategy` and `save_steps`. The configuration below saves a checkpoint to the `outputs` folder every 50 steps.

```python
trainer = SFTTrainer(
    ....
    args = TrainingArguments(
        ....
        output_dir = "outputs",
        save_strategy = "steps",
        save_steps = 50,
    ),
)
```

Then, in the trainer, run:

```python
trainer_stats = trainer.train(resume_from_checkpoint = True)
```

This starts from the latest checkpoint and continues training.
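To check which checkpoint would be picked up before resuming, it can help to look inside the output folder. The sketch below is a minimal illustration, assuming the Trainer's usual `checkpoint-<step>` folder naming. `find_latest_checkpoint` is a hypothetical helper written for this example, not part of any library.

```python
import os
import re
import tempfile

# Matches the "checkpoint-<step>" folder names (assumed naming convention).
_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")

def find_latest_checkpoint(output_dir):
    """Hypothetical helper: return the highest-numbered checkpoint folder, or None."""
    if not os.path.isdir(output_dir):
        return None
    best_step, best_path = -1, None
    for name in os.listdir(output_dir):
        match = _CHECKPOINT_RE.match(name)
        path = os.path.join(output_dir, name)
        if match and os.path.isdir(path):
            step = int(match.group(1))
            if step > best_step:
                best_step, best_path = step, path
    return best_path

# Small demo using fake checkpoint folders in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for step in (50, 100, 150):
        os.makedirs(os.path.join(tmp, f"checkpoint-{step}"))
    demo_latest = os.path.basename(find_latest_checkpoint(tmp))

print(demo_latest)  # checkpoint-150
```

One possible use is `trainer.train(resume_from_checkpoint = find_latest_checkpoint("outputs"))`: when the helper returns `None`, training starts from scratch, which can be convenient for a first run where no checkpoint exists yet.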

### Wandb Integration

```
# Install the library
!pip install wandb --upgrade

# Set up Wandb
!wandb login <token>

import os

os.environ["WANDB_PROJECT"] = "<name>"
os.environ["WANDB_LOG_MODEL"] = "checkpoint"
```

Then set the following in `TrainingArguments()`:

```
report_to = "wandb",
logging_steps = 1, # change if needed
save_steps = 100, # change if needed
run_name = "<name>", # (optional)
```

To train the model, run `trainer.train()`. To resume training, run:

```
import wandb
run = wandb.init()
artifact = run.use_artifact('<username>/<Wandb-project-name>/<run-id>', type='model')
artifact_dir = artifact.download()
trainer.train(resume_from_checkpoint=artifact_dir)
```

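## :question:How do I do early stopping?

If you want to stop or pause a finetuning/training run when the evaluation loss stops decreasing, you can use early stopping, which halts the training process. Use `EarlyStoppingCallback`.

As usual, set up your trainer and your evaluation dataset. The settings below stop the training run once `eval_loss` (the evaluation loss) has not decreased for about 3 evaluations in a row, which is about 30 training steps with `eval_steps = 10`.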

```python
from trl import SFTConfig, SFTTrainer
trainer = SFTTrainer(
    args = SFTConfig(
        fp16_full_eval = True,
        per_device_eval_batch_size = 2,
        eval_accumulation_steps = 4,
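        output_dir = "training_checkpoints", # where checkpoints used for early stopping are saved
        save_strategy = "steps",             # save the model every N steps
        save_steps = 10,                     # how many steps between saves
        save_total_limit = 3,                # keep only 3 saved checkpoints to save disk space
        eval_strategy = "steps",             # evaluate every N steps
        eval_steps = 10,                     # how many steps between evaluations
        load_best_model_at_end = True,       # MUST USE for early stopping
        metric_for_best_model = "eval_loss", # metric we want to early stop on
        greater_is_better = False,           # the lower the eval loss, the better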
    ),
    model = model,
    tokenizer = tokenizer,
    train_dataset = new_dataset["train"],
    eval_dataset = new_dataset["test"],
)
```
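The save and evaluation settings above are deliberately aligned. As far as we know, `load_best_model_at_end = True` generally expects the save and evaluation strategies to match and, for step-based schedules, `save_steps` to be a multiple of `eval_steps`, so that every saved checkpoint has an evaluation score. The sketch below is only an illustration of that rule under this assumption. `check_best_model_settings` is a hypothetical helper, not a library function, and the exact checks may differ between library versions.

```python
def check_best_model_settings(save_strategy, eval_strategy, save_steps, eval_steps):
    """Hypothetical helper: return a list of problems (empty list means compatible)."""
    problems = []
    if save_strategy != eval_strategy:
        # Assumed rule: both strategies should be the same ("steps" or "epoch").
        problems.append("save_strategy and eval_strategy should match")
    elif save_strategy == "steps" and save_steps % eval_steps != 0:
        # Assumed rule: each save should coincide with an evaluation.
        problems.append("save_steps should be a multiple of eval_steps")
    return problems

# The values used in the configuration above pass the check.
ok = check_best_model_settings("steps", "steps", save_steps=10, eval_steps=10)
# Saving every 15 steps while evaluating every 10 steps would not line up.
bad = check_best_model_settings("steps", "steps", save_steps=15, eval_steps=10)

print(ok)   # []
print(bad)  # ['save_steps should be a multiple of eval_steps']
```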

Next, add the callback, which can also be customized:

```python
from transformers import EarlyStoppingCallback
early_stopping_callback = EarlyStoppingCallback(
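    early_stopping_patience = 3,     # How many evaluations to wait while the eval loss does not decrease.
                                     # For example, the loss may rise and then drop again within 3 evaluations.
    early_stopping_threshold = 0.0,  # Can be set higher. This is the minimum amount the loss must
                                     # decrease to count as an improvement. For example, with 0.01 a
                                     # drop from 0.02 to 0.01 does not count, so patience keeps running out.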
)
trainer.add_callback(early_stopping_callback)
```

Then train the model as usual with `trainer.train()`.
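To build intuition for how `early_stopping_patience` and `early_stopping_threshold` interact, here is a simplified, library-free model of the stopping rule. It is a sketch under the assumption that an evaluation counts as an improvement only when the loss drops by more than the threshold. `stop_index` is a hypothetical function for illustration and is not the actual `EarlyStoppingCallback` code.

```python
def stop_index(eval_losses, patience=3, threshold=0.0):
    """Return the index of the evaluation at which training would stop, or None."""
    best = None
    bad_evals = 0  # consecutive evaluations without a sufficient improvement
    for i, loss in enumerate(eval_losses):
        if best is None or best - loss > threshold:
            best = loss      # new best loss: reset the patience counter
            bad_evals = 0
        else:
            bad_evals += 1
            if bad_evals >= patience:
                return i
    return None

# One eval loss per evaluation (every 10 training steps in the config above).
losses = [1.00, 0.80, 0.70, 0.72, 0.71, 0.73, 0.60]

stopped_at = stop_index(losses, patience=3)
survives = stop_index(losses, patience=4)

print(stopped_at)  # 5
print(survives)    # None
```

With a patience of 3, the run stops at the sixth evaluation and never reaches the later drop to 0.60. A patience of 4 would have let it continue, which is the trade-off to keep in mind when choosing the value.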

