
Deep Learning -- Optimizers (a detailed guide with code and test walkthroughs, covering the common optimizers SGD, SGDM, NAG, Adagrad, RMSProp, Adam, Adadelta, Nadam, and more)

Date: 2024-04-29 03:31 / Author: Anonymous


W_{new} = W_{old} - \alpha \cdot \Delta T(W_{old})
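This is plain SGD: step the weights against the gradient, scaled by the learning rate \alpha. A minimal sketch of one update step, assuming a toy quadratic loss (w - 3)^2; the name sgd_step and the numbers are illustrative, not from the original post:

import numpy as np

def sgd_step(w, grad, lr=0.1):
    # W_new = W_old - alpha * grad(W_old)
    return w - lr * grad

# toy loss L(w) = (w - 3)^2, gradient 2 * (w - 3)
w = np.array([0.0])
for _ in range(50):
    grad = 2 * (w - 3.0)
    w = sgd_step(w, grad, lr=0.1)
print(w)  # converges to roughly 3.0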

V_{new} = V_{old} \cdot \eta - \alpha \cdot \Delta T(W_{old})
W_{new} = W_{old} + V_{new}

W_{new} = V_{old} \cdot \eta - \alpha \cdot \Delta T(W_{old}) + W_{old}
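SGD with momentum (SGDM) keeps a velocity V that accumulates past gradients with decay \eta (the momentum coefficient) and adds that velocity to the weights, which smooths the update direction. A sketch on the same toy quadratic as above; names and hyperparameters are illustrative:

import numpy as np

def sgdm_step(w, v, grad, lr=0.1, momentum=0.9):
    # V_new = eta * V_old - alpha * grad(W_old)
    v = momentum * v - lr * grad
    # W_new = W_old + V_new
    return w + v, v

w, v = np.array([0.0]), np.array([0.0])
for _ in range(100):
    grad = 2 * (w - 3.0)
    w, v = sgdm_step(w, v, grad)
print(w)  # converges toward 3.0, with some oscillation from the momentum term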

W_{future} = V_{old} \cdot \eta + W_{old}

V_{new} = V_{old} \cdot \eta - \alpha \cdot \Delta T(W_{future})
W_{new} = W_{old} + V_{new}
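Nesterov momentum (NAG) first takes the momentum step to the look-ahead point W_{future}, evaluates the gradient there, and only then forms the new velocity, which tends to overshoot less than plain momentum. A sketch under the same toy setup; the names nag_step and grad_fn are illustrative:

import numpy as np

def grad_fn(w):
    # gradient of the toy loss (w - 3)^2
    return 2 * (w - 3.0)

def nag_step(w, v, lr=0.1, momentum=0.9):
    # W_future = W_old + eta * V_old  (look-ahead point)
    w_future = w + momentum * v
    # V_new = eta * V_old - alpha * grad(W_future)
    v = momentum * v - lr * grad_fn(w_future)
    # W_new = W_old + V_new
    return w + v, v

w, v = np.array([0.0]), np.array([0.0])
for _ in range(100):
    w, v = nag_step(w, v)
print(w)  # converges toward 3.0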

torch.optim.SGD(params, lr=<required parameter>, momentum=0, dampening=0, weight_decay=0, nesterov=False)
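All three variants above map onto this single class: momentum=0 gives plain SGD, momentum > 0 gives SGDM, and nesterov=True switches to NAG. A minimal usage sketch; the model and the random batch are placeholders:

import torch

model = torch.nn.Linear(10, 1)            # placeholder model
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, nesterov=True)

x, y = torch.randn(32, 10), torch.randn(32, 1)  # dummy batch
optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()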

cache_{new} = cache_{old} + [\Delta T(W_{old})]^2
W_{new} = W_{old} - \frac{\alpha}{\sqrt{cache_{new} + \epsilon}} \cdot \Delta T(W_{old})
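Adagrad accumulates squared gradients in cache and divides the learning rate by the square root of that cache, so parameters that keep receiving large gradients get smaller effective steps. A sketch on the same toy loss; names and step counts are illustrative:

import numpy as np

def adagrad_step(w, cache, grad, lr=0.1, eps=1e-8):
    # cache_new = cache_old + grad^2
    cache = cache + grad ** 2
    # W_new = W_old - lr / sqrt(cache_new + eps) * grad
    w = w - lr / np.sqrt(cache + eps) * grad
    return w, cache

w, cache = np.array([0.0]), np.array([0.0])
for _ in range(500):
    grad = 2 * (w - 3.0)
    w, cache = adagrad_step(w, cache, grad)
print(w)  # slowly approaches 3.0; the effective step shrinks as cache grows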

torch.optim.Adagrad(params, lr=0.01, lr_decay=0, weight_decay=0, initial_accumulator_value=0)


cache_{new} = cache_{old} \cdot \gamma + (1 - \gamma) \cdot [\Delta T(W_{old})]^2
W_{new} = W_{old} - \frac{\alpha}{\sqrt{cache_{new} + \epsilon}} \cdot \Delta T(W_{old})

E(cache_{new}^2) = E(cache_{old}^2) \cdot \gamma + (1 - \gamma) \cdot [\Delta T(W_{old})]^2
W_{new} = W_{old} - \frac{\alpha}{\sqrt{E(cache_{new}^2) + \epsilon}} \cdot \Delta T(W_{old})
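RMSProp replaces Adagrad's ever-growing sum with an exponential moving average controlled by \gamma (exposed as alpha in the PyTorch signature below), so old gradients decay away and the step size does not collapse. A sketch of the update rule; names and hyperparameters are illustrative:

import numpy as np

def rmsprop_step(w, cache, grad, lr=0.01, gamma=0.9, eps=1e-8):
    # cache_new = gamma * cache_old + (1 - gamma) * grad^2
    cache = gamma * cache + (1 - gamma) * grad ** 2
    # W_new = W_old - lr / sqrt(cache_new + eps) * grad
    w = w - lr / np.sqrt(cache + eps) * grad
    return w, cache

w, cache = np.array([0.0]), np.array([0.0])
for _ in range(500):
    grad = 2 * (w - 3.0)
    w, cache = rmsprop_step(w, cache, grad)
print(w)  # approaches 3.0 without the Adagrad-style step-size collapse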

torch.optim.RMSprop(params, lr=0.01, alpha=0.99, eps=1e-08, weight_decay=0, momentum=0, centered=False)

\Delta W_{AdaGrad} = -\frac{\alpha}{\sqrt{cache_{new} + \epsilon}} \cdot \Delta T(W_{old})
\Delta W_{RMSprop} = -\frac{\alpha}{\sqrt{E(cache_{new}^2) + \epsilon}} \cdot \Delta T(W_{old})
\Delta W = -\frac{\Delta W_{AdaGrad}}{\Delta W_{RMSprop}}
W_{new} = W_{old} + \Delta W
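The formulas above sketch the idea of scaling one accumulator by another. The standard Adadelta rule, which is what torch.optim.Adadelta implements, keeps two running averages (squared gradients and squared updates, both decayed by rho) and uses the ratio of their square roots as the step size, so no explicit learning rate is needed. A hedged sketch of that standard rule; names and step counts are illustrative:

import numpy as np

def adadelta_step(w, eg2, edx2, grad, rho=0.9, eps=1e-6):
    # running average of squared gradients
    eg2 = rho * eg2 + (1 - rho) * grad ** 2
    # update scaled by the ratio of the two accumulators (no learning rate)
    dx = -np.sqrt(edx2 + eps) / np.sqrt(eg2 + eps) * grad
    # running average of squared updates
    edx2 = rho * edx2 + (1 - rho) * dx ** 2
    return w + dx, eg2, edx2

w = np.array([0.0])
eg2, edx2 = np.zeros_like(w), np.zeros_like(w)
for _ in range(2000):
    grad = 2 * (w - 3.0)
    w, eg2, edx2 = adadelta_step(w, eg2, edx2, grad)
print(w)  # gradually moves toward 3.0 as the update accumulator warms up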

torch.optim.Adadelta(params, lr=1.0, rho=0.9, eps=1e-06, weight_decay=0)

m_{new} = \beta_1 \cdot m_{old} + (1 - \beta_1) \cdot \Delta T(W_{old})

cache_{new} = cache_{old} \cdot \beta_2 + (1 - \beta_2) \cdot [\Delta T(W_{old})]^2
W_{new} = W_{old} - \frac{\alpha}{\sqrt{cache_{new} + \epsilon}} \cdot m_{new}
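Adam combines the momentum-style first moment m (decayed by \beta_1) with the RMSProp-style second moment cache (decayed by \beta_2). The published algorithm also bias-corrects both accumulators, which the formulas above omit; the sketch below includes that correction. Names, step counts, and the learning rate are illustrative:

import numpy as np

def adam_step(w, m, cache, grad, t, lr=0.01,
              beta1=0.9, beta2=0.999, eps=1e-8):
    # first moment: m_new = beta1 * m_old + (1 - beta1) * grad
    m = beta1 * m + (1 - beta1) * grad
    # second moment: cache_new = beta2 * cache_old + (1 - beta2) * grad^2
    cache = beta2 * cache + (1 - beta2) * grad ** 2
    # bias correction (part of published Adam, omitted in the formulas above)
    m_hat = m / (1 - beta1 ** t)
    cache_hat = cache / (1 - beta2 ** t)
    # W_new = W_old - lr * m_hat / (sqrt(cache_hat) + eps)
    w = w - lr * m_hat / (np.sqrt(cache_hat) + eps)
    return w, m, cache

w, m, cache = np.array([0.0]), np.array([0.0]), np.array([0.0])
for t in range(1, 1001):
    grad = 2 * (w - 3.0)
    w, m, cache = adam_step(w, m, cache, grad, t)
print(w)  # approaches 3.0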

torch.optim.Adam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, amsgrad=False)
torch.optim.NAdam(params, lr=0.002, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, momentum_decay=0.004)
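The title promises a code test flow; a hedged sketch of such a loop is given below. It trains the same tiny model with each of the optimizers listed above and prints the final loss so they can be compared side by side. The model, the random data, and the hyperparameters are placeholders, not tuned values:

import torch

def train(optimizer_factory, steps=200):
    torch.manual_seed(0)
    model = torch.nn.Linear(10, 1)                    # placeholder model
    optimizer = optimizer_factory(model.parameters())
    criterion = torch.nn.MSELoss()
    x, y = torch.randn(256, 10), torch.randn(256, 1)  # dummy data
    for _ in range(steps):
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    return loss.item()  # loss from the last step

optimizers = {
    "SGD":      lambda p: torch.optim.SGD(p, lr=0.01),
    "SGDM":     lambda p: torch.optim.SGD(p, lr=0.01, momentum=0.9),
    "NAG":      lambda p: torch.optim.SGD(p, lr=0.01, momentum=0.9, nesterov=True),
    "Adagrad":  lambda p: torch.optim.Adagrad(p, lr=0.01),
    "RMSprop":  lambda p: torch.optim.RMSprop(p, lr=0.01),
    "Adadelta": lambda p: torch.optim.Adadelta(p, lr=1.0),
    "Adam":     lambda p: torch.optim.Adam(p, lr=0.001),
    "NAdam":    lambda p: torch.optim.NAdam(p, lr=0.002),
}

for name, factory in optimizers.items():
    print(f"{name:9s} final loss: {train(factory):.4f}")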



