Bug
Resolution: Fixed
Medium
On a DNE Lustre client (MDSCOUNT ≥ 2), if any MDT's HSM coordinator is not enabled when a copytool calls llapi_hsm_copytool_register, the per-MDT ioctl returns -ENXIO, and lmv_hsm_ct_register double-fputs the kuc pipe file. When the copytool later closes the pipe fd, the kernel hits BUG_ON(f_count == 0) in filp_flush and panics.
Cause: libcfs_kkuc_group_add consumes the caller's fget-ed reference rather than taking its own via get_file. The error labels in lmv_hsm_ct_register fall through err_kkuc_rem → err_fput, so after a per-MDT failure libcfs_kkuc_group_rem fputs the reference and err_fput fputs it again.
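Fix: libcfs_kkuc_group_add should take its own reference (reg->kr_fp = get_file(filp)). lmv_hsm_ct_register should then drop the RETURN(0) before err_kkuc_rem and guard libcfs_kkuc_group_rem with if (rc), so that err_fput releases the caller's fget-ed reference exactly once on both the success and the error path.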
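The reference accounting described above can be checked with a small user-space model. This is only a sketch, not the Lustre code: toy_file and toy_ct_register are hypothetical names, the file is reduced to a bare counter, and the model assumes the copytool's open fd holds exactly one reference before registration starts.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct file: only the refcount is modelled. */
struct toy_file {
	int f_count;
};

/*
 * Toy model of the reference flow in lmv_hsm_ct_register.
 *   fixed  - false: group_add borrows the caller's ref (current behaviour)
 *            true:  group_add takes its own ref, as with get_file()
 *   mdt_rc - result of the per-MDT ioctl (0 or a negative errno)
 * Returns f_count after the registration attempt.
 */
static int toy_ct_register(struct toy_file *f, bool fixed, int mdt_rc)
{
	assert(f->f_count > 0);	/* the open fd must already hold a reference */

	f->f_count++;		/* fget() by the caller */
	if (fixed)
		f->f_count++;	/* group_add takes its own reference */

	/* Old success path: RETURN(0), the group keeps the caller's ref. */
	if (mdt_rc == 0 && !fixed)
		return f->f_count;

	if (mdt_rc != 0)
		f->f_count--;	/* err_kkuc_rem: group_rem fputs the group ref */
	f->f_count--;		/* err_fput: drop the caller's fget() ref */

	return f->f_count;
}
```

In this model, a failed registration with the current logic leaves the count at 0 while the fd is still open, which is the state the later close trips over. With the proposed change the count returns to 1 after a failure and sits at 2 (fd plus group) after a success.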
Repro: mount Lustre with ≥2 MDTs; lctl set_param mdt.<fs>-MDT0000.hsm_control=enabled and mdt.<fs>-MDT0001.hsm_control=disabled; run any HSM copytool (e.g. lhsmtool_posix --archive=1 --hsm-root /tmp/a $MOUNT) on a client. Kernel panics within ~1 s. Reproduced on Lustre 2.17.53, kernel 6.12.0-124.56.1.el10_1.aarch64.
Latent upstream since LU-3365 (2014): lmv_hsm_ct_register has not changed since then, and the bug has stayed hidden because most deployments enable hsm_control on every MDT before any copytool registers.