Optim
#####

VMCOptimizer
========================

``vmc/optim/optimizer/VMCOptimizer``

.. code-block:: python
    :linenos:

    class VMCOptimizer(BaseVMCOptimizer):
        def __init__(
            self,
            nqs: DDP,
            sampler_param: dict,
            electron_info: ElectronInfo,
            opt: Optimizer,
            lr_scheduler: Union[List[LRScheduler], LRScheduler] = None,
            max_iter: int = 2000,
            dtype: Dtype = None,
            external_model: any = None,
            check_point: str = None,
            read_model_only: bool = False,
            only_sample: bool = False,
            pre_CI: CIWavefunction = None,
            pre_train_info: dict = None,
            clean_opt_state: bool = False,
            noise_lambda: float = 0.05,
            sr: bool = False,
            sr_config: SRConfig | None = None,
            use_lm: bool = False,
            lm_config: LMConfig | None = None,
            use_rgn: bool = False,
            rgn_config: RGNConfig | None = None,
            interval: int = 100,
            prefix: str = "VMC",
            MAX_AD_DIM: int = -1,
            kfac: KFACPreconditioner | None = None,
            use_clip_grad: bool = False,
            max_grad_norm: float = 1.0,
            max_grad_value: float = 1.0,
            start_clip_grad: int = None,
            clip_grad_method: str = "l2",
            clip_grad_scheduler: Optional[Callable[[int], float]] = None,
            use_3sigma: bool = False,
            k_step_clip: int = 100,
            use_spin_raising: bool = False,
            spin_raising_coeff: float = 1.0,
            only_output_spin_raising: bool = False,
            spin_raising_scheduler: Optional[Callable[[int], float]] = None,
        )

.. _opt-params:

----------
opt-params
----------

.. code-block:: python
    :linenos:

    from torch import optim  # assumption: ``optim`` refers to ``torch.optim``
    from utils import ElectronInfo, Dtype

    # ``model``, ``dtype``, ``sampler_param`` and ``electron_info`` are
    # assumed to have been created earlier in the script.
    opt_type = optim.AdamW
    opt_params = {"lr": 0.001, "betas": (0.9, 0.999)}
    opt = opt_type(model.parameters(), **opt_params)
    prefix = "vmc"

    def clip_grad_scheduler(step):
        # Tighten the clipping threshold as the optimization proceeds.
        if step <= 4000:
            max_grad = 1.0
        elif step <= 8000:
            max_grad = 0.1
        else:
            max_grad = 0.01
        return max_grad

    vmc_opt_params = {
        "nqs": model,
        "opt": opt,
        # "lr_scheduler": lr_scheduler,
        # "read_model_only": True,
        "dtype": dtype,
        "sampler_param": sampler_param,
        # "only_sample": True,
        "electron_info": electron_info,
        # "use_spin_raising": True,
        # "spin_raising_coeff": 1.0,
        # "only_output_spin_raising": True,
        "max_iter": 5000,
        "interval": 100,
        "MAX_AD_DIM": 80000,
        # "check_point": f"./h50/focus-init/checkpoint/H50-2.00-oao-mps-rnn-dcut-30-222-focus-20w-checkpoint.pth",
        "prefix": prefix,
        "use_clip_grad": True,
        "max_grad_norm": 1,
        "start_clip_grad": -1,
        "clip_grad_scheduler": clip_grad_scheduler,
    }

* ``nqs``: the ansatz (e.g., **Transformer**, **MPS-RNN**, **Graph-MPS-RNN**).
* ``opt``: the optimizer (e.g., **Adam**, **AdamW**, **SGD**).
* ``lr_scheduler``: the learning-rate scheduler (``LRScheduler`` or a list of them), Default: ``None``.
* ``read_model_only``: read only the model from the checkpoint file.
* ``dtype``: the data type and device (e.g., ``Dtype(dtype=torch.complex128, device="cuda")``).
* ``sampler_param``: see :ref:`sample-params`.
* ``only_sample``: skip the gradient calculation. This is used to evaluate the energy only.
* ``max_iter``: the number of optimization iterations.
* ``interval``: the number of iterations between checkpoint saves.
* ``MAX_AD_DIM``: the batch size (``nbatch``) used in the **backward** pass.
* ``check_point``: read the model, optimizer and lr_scheduler from the checkpoint file, Default: ``None``.
* ``prefix``: the prefix of the checkpoint file name, e.g., ``vmc-checkpoint.pth``.
* ``use_clip_grad``: enable gradient clipping, Default: ``False``.
* ``max_grad_norm``: the maximum l2-norm allowed when clipping the gradient.
* ``start_clip_grad``: the iteration ``k`` from which gradient clipping starts.
* ``clip_grad_scheduler``: the gradient-clipping schedule, a ``Callable[[int], float]`` that maps the step to the clipping threshold.
* ``sr``: use minSR.
* ``sr_config``: configure SR/minSR. ``damping_lambda`` can be either a positive constant or a
  callable ``Callable[[int], float]`` that receives the optimization step. The default is the
  constant ``1.0e-4``.

  .. code-block:: python
      :linenos:

      from pynqs.optim import SRConfig

      # constant damping
      sr_config = SRConfig(sr_method="minsr", damping_lambda=1.0e-4)

      # scheduled damping
      sr_config = SRConfig(
          sr_method="minsr",
          damping_lambda=lambda step: max(1.0e-4 * 0.95**step, 1.0e-6),
      )

* ``use_lm``: use the Linear method.
* ``lm_config``: configure the Linear method through ``LMConfig``. The ``delta`` field can be
  either a non-negative constant or a callable ``Callable[[int], float]`` that receives the
  optimization step. With a constant value, PyNQS applies the default schedule
  ``max(delta * 0.9**step, 1e-6)``.

  .. code-block:: python
      :linenos:

      from pynqs.optim import LMConfig

      # constant delta, decayed by the default schedule
      lm_config = LMConfig(delta=0.1)

      # user-defined schedule
      lm_config = LMConfig(delta=lambda step: max(0.1 * 0.9**step, 1.0e-6))

* ``use_rgn``: use the Rayleigh-Gauss-Newton optimizer.
* ``rgn_config``: configure RGN through ``RGNConfig``. The ``epsilon``, ``delta``, and
  ``damping_lambda`` fields can be constants or callables ``Callable[[int], float]`` that
  receive the optimization step.

  .. code-block:: python
      :linenos:

      from pynqs.optim import RGNConfig

      rgn_config = RGNConfig(
          epsilon=1.0,
          delta=0.0,
          damping_lambda=1.0e-3,
      )

Optimizer
=========

-------------
Linear method
-------------

Linear method references:

- J. Chem. Phys. 152, 024111 (2020); doi: 10.1063/1.5125803
- Phys. Rev. Research 7, 043351 (2025)

The gradient calculation of the Linear method is implemented in the function ``LM_grad`` in
``pynqs/optim/grad/lm.py``. To use it, set ``use_lm=True`` in the class ``VMCOptimizer`` and pass
an ``LMConfig`` through ``lm_config``.

In addition, the hyperparameter :math:`\delta` needs to be tuned. It corresponds to
``LMConfig.delta`` and defaults to :math:`0.1`. If ``delta`` is a constant, it is decayed according
to ``delta = max(delta * 0.9**(epoch), 1e-6)``. If ``delta`` is a ``Callable[[int], float]``, the
return value of ``delta(epoch)`` is used directly as :math:`\delta` at every step. In theory, this
term should decay to :math:`0` by the end of the optimization.
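
The two cases for ``delta`` described above can be sketched as follows. This is only a minimal
illustration, not the PyNQS implementation: the helper ``resolve_delta`` and its ``floor``
argument are hypothetical names introduced here, and only the decay rule
``max(delta * 0.9**epoch, 1e-6)`` is taken from the text.

.. code-block:: python
    :linenos:

    from typing import Callable, Union

    DeltaSpec = Union[float, Callable[[int], float]]


    def resolve_delta(delta: DeltaSpec, epoch: int, floor: float = 1e-6) -> float:
        """Return the shift used at ``epoch`` (hypothetical helper, not PyNQS API)."""
        if callable(delta):
            # A callable is evaluated as-is; no extra decay is applied.
            return float(delta(epoch))
        # A constant is decayed geometrically and bounded below by ``floor``.
        return max(delta * 0.9**epoch, floor)


    # Constant delta: decays from 0.1 towards the floor 1e-6.
    constant_schedule = [resolve_delta(0.1, epoch) for epoch in (0, 1, 1000)]

    # Callable delta: the user-defined schedule is used directly.
    custom_value = resolve_delta(lambda step: 0.5 / (1 + step), 4)

Because of the lower bound, a constant ``delta`` never reaches exactly zero under this rule; a
callable is needed if the shift should vanish completely.
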

The Linear method implementation consists of two parts: computing the gradient and applying the
update. The gradient computation follows J. Chem. Phys. 152, 024111 (2020); the detailed
derivation is given in the documentation. In short, it constructs and solves the following
generalized eigenvalue problem (GEVP):

.. math::

    Lc = \tilde{E} Rc,\quad
    L = \begin{bmatrix} E & G^\top_{\rm r} \\ G_{\rm c} & H \end{bmatrix} + \delta I,\quad
    R = \begin{bmatrix} 1 & \\ & S \end{bmatrix} + \delta' I

with

.. math::

    [G_{\rm r}]_i &= \langle{h_i(n)}\rangle - E\langle{g_i(n)}\rangle\\
    [G_{\rm c}]_i &= \sum_n p(n) \bar{\epsilon}(n) O_i(n)\\
    S_{ij} &= \sum_n p(n) \bar{o}_i(n)O_j(n)\\
    H_{ij} &= \sum_n p(n)\bar{o}_i(n) h_j(n) - [G_{\rm c}]_i\langle{O_j(n)}\rangle

where

.. math::

    \bar{\epsilon}(n) &= E_{\rm loc}(n) - \langle{E_{\rm loc}(n)}\rangle\\
    \bar{o}_i(n) &= O_i(n) - \langle{O_i(n)}\rangle

and

.. math::

    h_i(n) = \partial_i E_{\rm loc}(n) + O_i(n)E_{\rm loc}(n),\quad
    \partial_i E_{\rm loc}(n) = \frac{\partial}{\partial\theta_i}\sum_{m\in SD}H_{nm}\frac{\varPsi(m)}{\varPsi(n)}

These quantities can also be written in terms of the derivative states:

.. math::

    g_i(n) &= \frac{\langle n|\varPsi_i\rangle}{\langle n|\varPsi\rangle}
            = \frac{1}{\varPsi(n)} \frac{\partial\varPsi(n)}{\partial\theta_i} = O_i(n)\\
    h_i(n) &= \frac{\langle n|\hat{H}|\varPsi_i\rangle}{\langle n|\varPsi\rangle}
            = \partial_i E_{\rm loc}(n) + O_i(n)E_{\rm loc}(n)

Here :math:`|\varPsi_i\rangle = \partial_{\theta_i}|\varPsi\rangle`.

For the update step, a line-search-like scheme similar to the one in the paper above is
implemented; see the function ``try_step_update``.
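
As a rough illustration of the GEVP above, the NumPy sketch below assembles :math:`L` and
:math:`R` from per-sample data and solves :math:`Lc = \tilde{E}Rc`. It is not the code in
``LM_grad`` or ``try_step_update``. The assumptions are: all quantities are real-valued (so no
complex conjugation appears), the samples are synthetic random numbers, the function names
``build_lm_matrices`` and ``solve_gevp`` are hypothetical, and the eigenpair with the lowest real
eigenvalue is selected and normalized so that :math:`c_0 = 1`.

.. code-block:: python
    :linenos:

    import numpy as np


    def build_lm_matrices(p, O, eloc, h, delta=0.1, delta_prime=1e-3):
        """Assemble (L, R) of the GEVP from per-sample data (real-valued sketch).

        p:    (N,)   sample probabilities p(n), summing to 1
        O:    (N, P) log-derivatives O_i(n)
        eloc: (N,)   local energies E_loc(n)
        h:    (N, P) values h_i(n)
        """
        E = p @ eloc                    # <E_loc>
        O_mean = p @ O                  # <O_i>
        eps_bar = eloc - E              # centered local energy
        weighted_O_bar = p[:, None] * (O - O_mean)  # p(n) * o_bar_i(n)

        G_r = p @ h - E * O_mean        # <h_i> - E <g_i>, using g_i = O_i
        G_c = (p * eps_bar) @ O         # sum_n p(n) eps_bar(n) O_i(n)
        S = weighted_O_bar.T @ O        # sum_n p(n) o_bar_i(n) O_j(n)
        H = weighted_O_bar.T @ h - np.outer(G_c, O_mean)

        dim = O.shape[1] + 1
        L = np.empty((dim, dim))
        L[0, 0] = E
        L[0, 1:] = G_r
        L[1:, 0] = G_c
        L[1:, 1:] = H
        L += delta * np.eye(dim)        # shift delta * I

        R = np.zeros((dim, dim))
        R[0, 0] = 1.0
        R[1:, 1:] = S
        R += delta_prime * np.eye(dim)  # shift delta' * I keeps R invertible
        return L, R


    def solve_gevp(L, R):
        """Solve L c = E~ R c and return the pair with the lowest real eigenvalue."""
        # R is invertible here, so the GEVP reduces to an ordinary
        # eigenproblem for R^{-1} L.
        evals, evecs = np.linalg.eig(np.linalg.solve(R, L))
        k = np.argmin(evals.real)       # assumed selection rule
        c = evecs[:, k]
        return evals[k], c / c[0]       # normalize so that c_0 = 1


    # Synthetic data: N samples, P parameters.
    rng = np.random.default_rng(0)
    N, P = 200, 3
    p = rng.random(N)
    p /= p.sum()
    O = rng.normal(size=(N, P))
    eloc = rng.normal(size=N)
    h = rng.normal(size=(N, P))

    L, R = build_lm_matrices(p, O, eloc, h)
    E_tilde, c = solve_gevp(L, R)

In linear-method schemes the components ``c[1:]``, after normalizing to ``c[0] = 1``, are commonly
interpreted as the proposed parameter step. Whether PyNQS uses exactly this convention is not
stated here.
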