# Finetuning from the Last Checkpoint

You must first edit the `Trainer` to add `save_strategy` and `save_steps`. The example below saves a checkpoint to the `outputs` folder every 50 steps.

```python
trainer = SFTTrainer(
    tokenizer = tokenizer,
    ....
    args = TrainingArguments(
        ....
        output_dir = "outputs",
        save_strategy = "steps",
        save_steps = 50,
    ),
)
```

Then call the trainer like this:

```python
trainer_stats = trainer.train(resume_from_checkpoint = True)
```

This will pick up from the latest checkpoint and continue training.
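As a mental model, `resume_from_checkpoint = True` looks for the `checkpoint-<step>` folder with the highest step number inside `output_dir`. The sketch below is illustrative only (not the actual `transformers` implementation), showing how that lookup roughly works:

```python
import os
import re

def get_last_checkpoint(output_dir: str):
    """Return the path of the checkpoint-<step> folder with the highest
    step number, or None if no checkpoint folder exists yet."""
    pattern = re.compile(r"^checkpoint-(\d+)$")
    checkpoints = [
        (int(m.group(1)), name)
        for name in os.listdir(output_dir)
        if (m := pattern.match(name)) and os.path.isdir(os.path.join(output_dir, name))
    ]
    if not checkpoints:
        return None
    # Pick the folder with the largest step number,
    # e.g. checkpoint-150 wins over checkpoint-100
    _, latest = max(checkpoints)
    return os.path.join(output_dir, latest)
```

You can also pass an explicit path instead of `True`, e.g. `trainer.train(resume_from_checkpoint = "outputs/checkpoint-100")`, to resume from a specific checkpoint rather than the newest one.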

### Wandb Integration

```
# Install the library
!pip install wandb --upgrade

# Log in to Wandb
!wandb login <token>

import os

os.environ["WANDB_PROJECT"] = "<name>"
os.environ["WANDB_LOG_MODEL"] = "checkpoint"
```

Then in `TrainingArguments()` set

```
report_to = "wandb",
logging_steps = 1,   # Change if needed
save_steps = 100,    # Change if needed
run_name = "<name>", # (Optional)
```

To train the model, run `trainer.train()`; to resume a run, execute:

```python
import wandb
run = wandb.init()
artifact = run.use_artifact('<username>/<Wandb-project-name>/<run-id>', type='model')
artifact_dir = artifact.download()
trainer.train(resume_from_checkpoint=artifact_dir)
```

## :question:How do I do Early Stopping?

If you want to stop or pause a finetuning/training run when its evaluation loss stops decreasing, use early stopping via `EarlyStoppingCallback`.

Set up your trainer and evaluation dataset as usual. The setup below stops the training run if `eval_loss` has not decreased after roughly 3 evaluation steps.

```python
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = new_dataset["train"],
    eval_dataset = new_dataset["test"],
    args = SFTConfig(
        max_steps = 30,
        fp16_full_eval = True,
        per_device_eval_batch_size = 2,
        eval_accumulation_steps = 4,
        output_dir = "training_checkpoints", # Location of saved checkpoints for early stopping
        save_strategy = "steps",             # Save the model every N steps
        save_steps = 10,                     # How many steps until we save the model
        eval_strategy = "steps",             # Evaluate every N steps
        eval_steps = 10,                     # How many steps until we run evaluation
        save_total_limit = 3,                # Keep only 3 saved checkpoints to save disk space
        load_best_model_at_end = True,       # Required for early stopping
        metric_for_best_model = "eval_loss", # The metric we want to early stop on
        greater_is_better = False,           # A lower eval loss is better
    ),
)
```

Then we add the callback, which can also be customized:

```python
from transformers import EarlyStoppingCallback
early_stopping_callback = EarlyStoppingCallback(
    early_stopping_patience = 3,     # How many evaluation steps to wait while the loss
                                     # does not decrease; e.g. the loss might go up, but
                                     # come back down within 3 steps.
    early_stopping_threshold = 0.0,  # Can be set higher - how much the loss must decrease
                                     # before we consider stopping early. e.g. 0.01 means if
                                     # the loss goes from 0.02 to 0.01, we would consider
                                     # stopping the run early.
)
trainer.add_callback(early_stopping_callback)
```

Then train the model as usual with `trainer.train()`.
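For intuition, the patience/threshold rule the callback applies can be sketched in plain Python (an illustrative model, not the actual `transformers` code):

```python
class EarlyStopper:
    """Tracks eval_loss and signals a stop after `patience` consecutive
    evaluations without an improvement of more than `threshold`."""

    def __init__(self, patience: int = 3, threshold: float = 0.0):
        self.patience = patience
        self.threshold = threshold
        self.best_loss = float("inf")
        self.bad_evals = 0  # consecutive evaluations without improvement

    def should_stop(self, eval_loss: float) -> bool:
        if self.best_loss - eval_loss > self.threshold:
            self.best_loss = eval_loss   # loss improved enough: reset the counter
            self.bad_evals = 0
        else:
            self.bad_evals += 1          # no real improvement at this evaluation
        return self.bad_evals >= self.patience
```

With `patience = 3`, the eval losses `0.9, 0.8, 0.81, 0.82, 0.83` would trigger a stop at the third evaluation after the best value of `0.8`.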

