
Getting Started with DeepSpeed

2025/10/24 19:21:24 Source: https://blog.csdn.net/smartcat2010/article/details/139566617

pip install deepspeed

transformers is supported: pass --deepspeed together with a config file.

model_engine, optimizer, _, _ = deepspeed.initialize(args=cmd_args,model=model,model_parameters=params)

Distributed training, mixed precision, and related setup are all handled inside deepspeed.initialize and the returned model_engine.

Remove: torch.distributed.init_process_group(...)

for step, batch in enumerate(data_loader):
    # forward() method
    loss = model_engine(batch)
    # runs backpropagation
    model_engine.backward(loss)
    # weight update
    model_engine.step()
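The loop relies on only three calls on the engine: calling it for the forward pass, backward(loss), and step(). Below is a minimal sketch of that contract that runs without DeepSpeed or a GPU. FakeEngine and train_steps are hypothetical names used only for illustration; a real model_engine comes from deepspeed.initialize.

```python
class FakeEngine:
    """Hypothetical stand-in that records the calls a training loop makes."""

    def __init__(self):
        self.calls = []

    def __call__(self, batch):
        # Forward pass: the "loss" here is simply the sum of the batch.
        self.calls.append("forward")
        return float(sum(batch))

    def backward(self, loss):
        # A real engine would run backpropagation here.
        self.calls.append("backward")

    def step(self):
        # A real engine would update the weights here.
        self.calls.append("step")


def train_steps(engine, data_loader):
    """Run forward, backward, and step once per batch; return the losses."""
    losses = []
    for step, batch in enumerate(data_loader):
        loss = engine(batch)    # forward() method
        engine.backward(loss)   # runs backpropagation
        engine.step()           # weight update
        losses.append(loss)
    return losses


engine = FakeEngine()
losses = train_steps(engine, [[1, 2], [3, 4]])
```

Because train_steps only touches those three methods, the same function body should work unchanged when handed an engine returned by deepspeed.initialize.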

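Gradient averaging: handled automatically inside model_engine.backward.

Loss scaling: handled automatically.

Learning rate scheduler: handled automatically inside model_engine.step.

Save & load: the model, optimizer, and LR scheduler states are all saved; client_sd holds user-defined data.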

_, client_sd = model_engine.load_checkpoint(args.load_dir, args.ckpt_id)
step = client_sd['step']
...
if step % args.save_interval == 0:
    client_sd['step'] = step
    ckpt_id = loss.item()
    model_engine.save_checkpoint(args.save_dir, ckpt_id, client_sd=client_sd)
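The save condition is easy to get backwards. Written as a bare step % save_interval with no comparison, it is truthy on every step that is not a multiple of the interval. The sketch below compares the two forms in plain Python; save_steps is a hypothetical helper, not part of DeepSpeed.

```python
def save_steps(total_steps, save_interval, compare_to_zero=True):
    """Return the steps at which a checkpoint would be written."""
    saved = []
    for step in range(total_steps):
        if compare_to_zero:
            # True only on multiples of the interval (0, interval, 2*interval, ...).
            should_save = step % save_interval == 0
        else:
            # Bare modulo: truthy whenever step is NOT a multiple of the interval.
            should_save = bool(step % save_interval)
        if should_save:
            saved.append(step)
    return saved


every_third = save_steps(7, 3)                         # [0, 3, 6]
inverted = save_steps(7, 3, compare_to_zero=False)     # [1, 2, 4, 5]
```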

Config file (for example, named ds_config.json):

{
  "train_batch_size": 8,
  "gradient_accumulation_steps": 1,
  "optimizer": {
    "type": "Adam",
    "params": {
      "lr": 0.00015
    }
  },
  "fp16": {
    "enabled": true
  },
  "zero_optimization": true
}
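In DeepSpeed, train_batch_size is the global batch size. It equals the per-GPU micro batch size times gradient_accumulation_steps times the number of GPUs. The sketch below builds the same config as a Python dict and derives the per-GPU micro batch for a given GPU count. micro_batch_per_gpu is a hypothetical helper, and the GPU counts are assumptions chosen for illustration.

```python
import json


def micro_batch_per_gpu(config, world_size):
    """Derive the per-GPU micro batch size implied by the global batch size."""
    train_batch_size = config["train_batch_size"]
    accum_steps = config.get("gradient_accumulation_steps", 1)
    per_gpu, remainder = divmod(train_batch_size, accum_steps * world_size)
    if remainder or per_gpu == 0:
        raise ValueError(
            f"train_batch_size={train_batch_size} is not divisible by "
            f"gradient_accumulation_steps * world_size = {accum_steps * world_size}"
        )
    return per_gpu


ds_config = {
    "train_batch_size": 8,
    "gradient_accumulation_steps": 1,
    "optimizer": {"type": "Adam", "params": {"lr": 0.00015}},
    "fp16": {"enabled": True},
    "zero_optimization": True,
}

# Serialized form; this text could be written to ds_config.json.
text = json.dumps(ds_config, indent=2)

on_eight_gpus = micro_batch_per_gpu(ds_config, world_size=8)  # 1 sample per GPU
on_four_gpus = micro_batch_per_gpu(ds_config, world_size=4)   # 2 samples per GPU
```

With this config, a GPU count that does not divide 8 evenly (for example 3) raises ValueError in the sketch.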

hostfile: compatible with OpenMPI and Horovod; each line gives a hostname and its number of GPUs (slots).

worker-1 slots=4
worker-2 slots=4
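To make the format concrete, here is a minimal sketch that reads hostfile text into a mapping of hostname to GPU count. parse_hostfile is a hypothetical helper written for illustration; it is not DeepSpeed's own parser, and skipping blank lines and # comments is an assumption.

```python
def parse_hostfile(text):
    """Parse 'hostname slots=N' lines into {hostname: N}."""
    hosts = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # assumption: blank lines and comments are ignored
        name, slots_field = line.split()
        key, _, value = slots_field.partition("=")
        if key != "slots":
            raise ValueError(f"expected 'slots=N', got {slots_field!r}")
        hosts[name] = int(value)
    return hosts


hosts = parse_hostfile("worker-1 slots=4\nworker-2 slots=4\n")
total_gpus = sum(hosts.values())
```

For the two-line hostfile above, this yields two hosts with 4 slots each, 8 GPUs in total.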

Launch command:

deepspeed --hostfile=myhostfile <client_entry.py> <client args> \
  --deepspeed --deepspeed_config ds_config.json

--num_nodes: how many machines to run on.

--num_gpus: how many GPUs to use on each node.

--include: allowlist of nodes and GPU indices; e.g. --include="worker-2:0,1"

--exclude: denylist of nodes and GPU indices; e.g. --exclude="worker-2:0@worker-3:0,1"
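The filter strings join node entries with @, and each entry is a hostname optionally followed by : and a comma-separated list of GPU indices. The sketch below parses that syntax and applies an exclude filter to a host table. parse_filter and apply_exclude are hypothetical helpers, not DeepSpeed functions, and the three-node host table is made up for the example.

```python
def parse_filter(spec):
    """Parse 'nodeA:0,1@nodeB' into {'nodeA': [0, 1], 'nodeB': None}.

    None means the whole node is selected (no GPU indices were listed).
    """
    selection = {}
    for node_spec in spec.split("@"):
        name, sep, slots = node_spec.partition(":")
        selection[name] = [int(s) for s in slots.split(",")] if sep else None
    return selection


def apply_exclude(hosts, spec):
    """Remove the GPUs named in spec from hosts ({hostname: gpu_count})."""
    excluded = parse_filter(spec)
    remaining = {}
    for name, slots in hosts.items():
        gpu_ids = list(range(slots))
        if name in excluded:
            drop = excluded[name]
            gpu_ids = [] if drop is None else [i for i in gpu_ids if i not in drop]
        if gpu_ids:
            remaining[name] = gpu_ids
    return remaining


included = parse_filter("worker-2:0,1")
hosts = {"worker-1": 4, "worker-2": 4, "worker-3": 4}  # hypothetical cluster
remaining = apply_exclude(hosts, "worker-2:0@worker-3:0,1")
```

With these inputs, worker-1 keeps all four GPUs, worker-2 keeps GPUs 1 to 3, and worker-3 keeps GPUs 2 and 3.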

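Environment variables:

At launch they are set on all nodes.

They go in a ".deepspeed_env" file, placed in the directory you launch from or in ~/; for example: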

NCCL_IB_DISABLE=1
NCCL_SOCKET_IFNAME=eth0
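The file is a plain list of KEY=VALUE lines. As a rough sketch of how such a file can be read into a dict of variables to export, see below. parse_env_file is a hypothetical helper, not the launcher's actual code.

```python
def parse_env_file(text):
    """Parse KEY=VALUE lines into a dict, skipping blank lines."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected KEY=VALUE, got {line!r}")
        env[key] = value
    return env


env = parse_env_file("NCCL_IB_DISABLE=1\nNCCL_SOCKET_IFNAME=eth0\n")
```

Values stay strings, as environment variables do, so NCCL_IB_DISABLE maps to "1" rather than the integer 1.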

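Running the "deepspeed" command on one machine launches processes on all nodes.

Launching via mpirun is also supported, but the communication backend is still NCCL rather than MPI.

Note:

CUDA_VISIBLE_DEVICES is not supported; GPUs can only be selected like this: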

deepspeed --include localhost:1 ...
