Optim
VMCOptimizer
vmc/optim/optimizer/VMCOptimizer
```python
class VMCOptimizer(BaseVMCOptimizer):

    def __init__(
        self,
        nqs: DDP,
        sampler_param: dict,
        electron_info: ElectronInfo,
        opt: Optimizer,
        lr_scheduler: Union[List[LRScheduler], LRScheduler] = None,
        max_iter: int = 2000,
        dtype: Dtype = None,
        external_model: any = None,
        check_point: str = None,
        read_model_only: bool = False,
        only_sample: bool = False,
        pre_CI: CIWavefunction = None,
        pre_train_info: dict = None,
        clean_opt_state: bool = False,
        noise_lambda: float = 0.05,
        sr: bool = False,
        sr_config: SRConfig | None = None,
        use_lm: bool = False,
        lm_config: LMConfig | None = None,
        use_rgn: bool = False,
        rgn_config: RGNConfig | None = None,
        interval: int = 100,
        prefix: str = "VMC",
        MAX_AD_DIM: int = -1,
        kfac: KFACPreconditioner | None = None,
        use_clip_grad: bool = False,
        max_grad_norm: float = 1.0,
        max_grad_value: float = 1.0,
        start_clip_grad: int = None,
        clip_grad_method: str = "l2",
        clip_grad_scheduler: Optional[Callable[[int], float]] = None,
        use_3sigma: bool = False,
        k_step_clip: int = 100,
        use_spin_raising: bool = False,
        spin_raising_coeff: float = 1.0,
        only_output_spin_raising: bool = False,
        spin_raising_scheduler: Optional[Callable[[int], float]] = None,
    ): ...
```
opt-params
```python
from torch import optim

from utils import ElectronInfo, Dtype

# model, dtype, sampler_param and electron_info are assumed to be defined earlier
opt_type = optim.AdamW
opt_params = {"lr": 0.001, "betas": (0.9, 0.999)}
opt = opt_type(model.parameters(), **opt_params)

prefix = "vmc"


def clip_grad_scheduler(step):
    # gradient-clipping threshold as a function of the iteration step
    if step <= 4000:
        max_grad = 1.0
    elif step <= 8000:
        max_grad = 0.1
    else:
        max_grad = 0.01
    return max_grad


vmc_opt_params = {
    "nqs": model,
    "opt": opt,
    # "lr_scheduler": lr_scheduler,
    # "read_model_only": True,
    "dtype": dtype,
    "sampler_param": sampler_param,
    # "only_sample": True,
    "electron_info": electron_info,
    # "use_spin_raising": True,
    # "spin_raising_coeff": 1.0,
    # "only_output_spin_raising": True,
    "max_iter": 5000,
    "interval": 100,
    "MAX_AD_DIM": 80000,
    # "check_point": f"./h50/focus-init/checkpoint/H50-2.00-oao-mps-rnn-dcut-30-222-focus-20w-checkpoint.pth",
    "prefix": prefix,
    "use_clip_grad": True,
    "max_grad_norm": 1,
    "start_clip_grad": -1,
    "clip_grad_scheduler": clip_grad_scheduler,
}
```
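To illustrate how a clip_grad_scheduler like the one above might interact with max_grad_norm and start_clip_grad, here is a NumPy sketch of global L2-norm clipping (essentially the rule that torch.nn.utils.clip_grad_norm_ applies with norm_type=2). It assumes that clipping is active from iteration start_clip_grad onward and that the scheduler's value replaces max_grad_norm at that step; clip_grad_l2 is a hypothetical helper, not part of PyNQS.

```python
import numpy as np


def clip_grad_l2(grads, step, max_grad_norm=1.0, start_clip_grad=-1,
                 clip_grad_scheduler=None):
    """Hypothetical sketch: global L2-norm clipping with a step-dependent threshold."""
    # assumed: no clipping before iteration start_clip_grad
    if step < start_clip_grad:
        return grads
    # assumed: the scheduler's value overrides max_grad_norm at this step
    if clip_grad_scheduler is not None:
        max_norm = clip_grad_scheduler(step)
    else:
        max_norm = max_grad_norm
    # L2 norm over all gradient tensors taken together
    total_norm = np.sqrt(sum(np.sum(np.abs(g) ** 2) for g in grads))
    if total_norm <= max_norm:
        return grads
    # rescale every gradient by the same factor so the global norm equals max_norm
    scale = max_norm / total_norm
    return [g * scale for g in grads]
```

With start_clip_grad=-1, as in the example, clipping applies from the first iteration, and the threshold drops from 1.0 to 0.1 after step 4000.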
- nqs: the ansatz (e.g., Transformer, MPS-RNN, Graph-MPS-RNN).
- opt: the optimizer (e.g., Adam, AdamW, SGD).
- lr_scheduler: the learning-rate scheduler (LRScheduler). Default: None.
- read_model_only: read only the model from the checkpoint file.
- dtype: data type and device, e.g., Dtype(dtype=torch.complex128, device="cuda").
- sampler_param: sampler parameters; see sample-param.
- only_sample: sample without computing gradients; use this to evaluate the energy.
- max_iter: the number of iterations.
- interval: save a checkpoint file every interval iterations.
- MAX_AD_DIM: the batch size (nbatch) used in the backward pass.
- check_point: checkpoint file from which the model, optimizer and lr_scheduler are read. Default: None.
- prefix: the prefix of the checkpoint file, e.g., vmc-checkpoint.pth.
- use_clip_grad: enable gradient clipping. Default: False.
- max_grad_norm: the maximum L2 norm of the gradient when clipping.
- start_clip_grad: clip gradients starting from this iteration.
- clip_grad_scheduler: the scheduler for gradient clipping, of type Callable[[int], float].
- sr: use minSR.
- sr_config: configure SR/minSR. damping_lambda can be either a positive constant or a callable Callable[[int], float] receiving the optimization step. The default is the constant 1.0e-4.
```python
from pynqs.optim import SRConfig

# constant damping
sr_config = SRConfig(sr_method="minsr", damping_lambda=1.0e-4)

# scheduled damping
sr_config = SRConfig(
    sr_method="minsr",
    damping_lambda=lambda step: max(1.0e-4 * 0.95**step, 1.0e-6),
)
```
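As an illustration of what a damped minSR step does, here is a NumPy sketch in its usual form: the centered, sample-normalized log-derivative matrix (samples x parameters) is used to solve an n_samples x n_samples system instead of the n_params x n_params SR system. The function names, the lr argument and the centering convention are assumptions for illustration, not PyNQS's implementation; damping_lambda is resolved per step the way SRConfig.damping_lambda is described above.

```python
import numpy as np


def resolve(value, step):
    # constant or Callable[[int], float], as accepted by damping_lambda
    return value(step) if callable(value) else value


def minsr_update(log_derivs, local_energies, lr, damping_lambda, step):
    """Hypothetical sketch of a damped minSR parameter update.

    log_derivs:     O[s, k] = d log Psi(x_s) / d theta_k, shape (n_samples, n_params)
    local_energies: E_loc(x_s), shape (n_samples,)
    """
    n_samples = log_derivs.shape[0]
    # center over samples and normalize by sqrt(n_samples)
    o_bar = (log_derivs - log_derivs.mean(axis=0)) / np.sqrt(n_samples)
    e_bar = (local_energies - local_energies.mean()) / np.sqrt(n_samples)
    lam = resolve(damping_lambda, step)
    # n_samples x n_samples matrix replaces the n_params x n_params S matrix
    t_mat = o_bar @ o_bar.conj().T
    x = np.linalg.solve(t_mat + lam * np.eye(n_samples), e_bar)
    return -lr * (o_bar.conj().T @ x)
```

By the identity (O^dagger O + lambda I)^-1 O^dagger = O^dagger (O O^dagger + lambda I)^-1, the result equals the damped SR update with the same lambda; minSR is cheaper when there are many more parameters than samples.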
- use_lm: use the linear method.
- lm_config: configure the linear method through LMConfig. The delta field can be either a non-negative constant or a callable Callable[[int], float] receiving the optimization step. With a constant value, PyNQS keeps the default schedule max(delta * 0.9**step, 1e-6).
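```python
from pynqs.optim import LMConfig

# constant delta: decays as max(delta * 0.9**step, 1e-6)
lm_config = LMConfig(delta=0.1)

# scheduled delta: the callable's return value is used directly at each step
lm_config = LMConfig(delta=lambda step: max(0.1 * 0.9**step, 1.0e-6))
```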
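The delta rule described for LMConfig can be written as a small helper. This is a sketch of the documented behavior (constant: geometric decay with a floor; callable: value used as-is); lm_delta and its decay and floor arguments are hypothetical names, not part of PyNQS.

```python
def lm_delta(delta, step, decay=0.9, floor=1e-6):
    """Sketch of the documented LMConfig.delta rule (hypothetical helper)."""
    if callable(delta):
        # Callable[[int], float]: use the returned value directly
        return delta(step)
    # constant: default schedule max(delta * 0.9**step, 1e-6)
    return max(delta * decay**step, floor)
```

Under this rule the two LMConfig examples above yield the same delta at every step.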
- use_rgn: use the Rayleigh-Gauss-Newton optimizer.
- rgn_config: configure RGN through RGNConfig. The epsilon, delta and damping_lambda fields can be constants or callables Callable[[int], float] receiving the optimization step.
```python
from pynqs.optim import RGNConfig

rgn_config = RGNConfig(
    epsilon=1.0,
    delta=0.0,
    damping_lambda=1.0e-3,
)
```
Optimizer
Linear method
Linear method ref.
J. Chem. Phys. 152, 024111 (2020); doi: 10.1063/1.5125803
Phys. Rev. Research 7, 043351 (2025)
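The gradient computation for the linear method is implemented in the function LM_grad in pynqs/optim/grad/lm.py.
To use it, set use_lm=True in class VMCOptimizer and pass an LMConfig through lm_config.
There is also a hyperparameter \(\delta\) to tune, which corresponds to LMConfig.delta and defaults to \(0.1\). If delta is a constant, it decays as delta = max(delta * 0.9**(epoch), 1e-6).
If delta is a Callable[[int], float], its return value delta(epoch) is used directly as \(\delta\) at each step.
In principle, this term should decay to \(0\) by the end of the optimization.
The code consists of two parts: gradient computation and the parameter update. The gradient is computed as in J. Chem. Phys. 152, 024111 (2020); see the documentation for the full derivation. In short, the method ends by constructing and solving a generalized eigenvalue problem (GEVP):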
with
where
and
Here \(|\varPsi_i\rangle = \partial_{\theta_i}|\varPsi\rangle\).
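The GEVP step can be sketched in NumPy. The sketch assumes the standard linear-method form in the basis \(\{|\varPsi\rangle, |\varPsi_1\rangle, \dots, |\varPsi_n\rangle\}\), with current energy \(E_0\), energy gradient \(\mathbf g\), and Hamiltonian and overlap matrices \(\mathbf H\), \(\mathbf S\) in the derivative block (orthogonalized to \(|\varPsi\rangle\), so the overlap is block diagonal); it also assumes \(\delta\) is added to the diagonal of \(\mathbf H\). Whether LM_grad follows exactly this convention is not stated here, and the name lm_gevp_step and its eigenvector-selection rule are hypothetical:

\[
\begin{pmatrix} E_0 & \mathbf g^\dagger/2 \\ \mathbf g/2 & \mathbf H + \delta \mathbf I \end{pmatrix}
\begin{pmatrix} 1 \\ \Delta\boldsymbol\theta \end{pmatrix}
= \lambda
\begin{pmatrix} 1 & \mathbf 0^\dagger \\ \mathbf 0 & \mathbf S \end{pmatrix}
\begin{pmatrix} 1 \\ \Delta\boldsymbol\theta \end{pmatrix}
\]

```python
import numpy as np


def lm_gevp_step(e0, grad, hmat, smat, delta, tol=1e-8):
    """Hypothetical sketch of one linear-method step (not PyNQS's LM_grad).

    e0:    current energy E_0
    grad:  energy gradient g, shape (n,)
    hmat:  Hamiltonian matrix in the derivative block, shape (n, n)
    smat:  overlap matrix in the derivative block, shape (n, n)
    delta: shift added to the diagonal of hmat (assumed role of delta)
    Returns the selected eigenvalue and the parameter update.
    """
    n = grad.shape[0]
    dtype = np.result_type(grad, hmat, smat, np.float64)
    # augmented Hamiltonian in the basis {|Psi>, |Psi_1>, ..., |Psi_n>}
    a = np.zeros((n + 1, n + 1), dtype=dtype)
    a[0, 0] = e0
    a[0, 1:] = np.conj(grad) / 2
    a[1:, 0] = grad / 2
    a[1:, 1:] = hmat + delta * np.eye(n)
    # block-diagonal overlap: |Psi> normalized, derivative block orthogonal to it
    b = np.zeros_like(a)
    b[0, 0] = 1.0
    b[1:, 1:] = smat
    # reduce A v = lambda B v to an ordinary eigenproblem (assumes S is invertible)
    evals, evecs = np.linalg.eig(np.linalg.solve(b, a))
    # lowest eigenvalue whose eigenvector keeps a significant |Psi> component
    for k in np.argsort(evals.real):
        v = evecs[:, k]
        if abs(v[0]) > tol * np.linalg.norm(v):
            return evals[k], v[1:] / v[0]
    raise RuntimeError("no eigenvector with a significant first component")
```

For one parameter with \(E_0 = 1\), \(g = 1\), \(H = 2\), \(S = 1\), the step is \(\Delta\theta = 1 - \sqrt 2\) at \(\delta = 0\) and shrinks to \(2 - \sqrt 5\) at \(\delta = 1\), which is why a decaying \(\delta\) acts as a stabilizing shift early in the optimization.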
For the update, a line-search-like procedure similar to the one in the paper above is implemented; see the function try_step_update.
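A line-search-like update can be illustrated with a minimal sketch, assuming the common pattern of trying a few scaled versions of the step and keeping the one with the lowest estimated energy. try_scaled_steps and energy_fn are hypothetical stand-ins; this is not the actual try_step_update, whose candidate steps and energy estimator are not described here.

```python
import numpy as np


def try_scaled_steps(theta, dtheta, energy_fn, scales=(1.0, 0.5, 0.25)):
    """Hypothetical line-search-style update (not PyNQS's try_step_update).

    energy_fn stands in for an energy estimate at trial parameters.
    The current parameters are kept if no scaled step lowers the energy.
    """
    theta = np.asarray(theta)
    dtheta = np.asarray(dtheta)
    best_theta, best_e = theta, energy_fn(theta)
    for s in scales:
        trial = theta + s * dtheta
        e = energy_fn(trial)
        if e < best_e:
            best_theta, best_e = trial, e
    return best_theta, best_e
```

Keeping the current parameters when every trial step raises the energy is what makes this kind of update safe for large or noisy linear-method steps.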