Lustre / LU-20311

hsm: lmv_hsm_ct_register double fput() -> kernel BUG


    • Type: Bug
    • Resolution: Fixed
    • Priority: Medium
    • Lustre 2.18.0

      On a DNE Lustre client (MDSCOUNT ≥ 2), if any MDT's HSM coordinator is not enabled when a copytool calls llapi_hsm_copytool_register, the per-MDT ioctl returns -ENXIO, and lmv_hsm_ct_register double-fputs the kuc pipe file. When the copytool later closes the pipe fd, the kernel hits BUG_ON(f_count == 0) in filp_flush and panics.

      Cause: libcfs_kkuc_group_add consumes the caller's fget-ed reference rather than taking its own via get_file. The error labels in lmv_hsm_ct_register fall through err_kkuc_rem → err_fput, so after a per-MDT failure libcfs_kkuc_group_rem fputs the reference and err_fput fputs it again.
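
The ownership problem can be illustrated with a small user-space model. This is a sketch only, not the Lustre source: `model_file`, `model_reg`, `model_fput`, `model_group_add`, `model_group_rem`, and `model_ct_register_buggy` are hypothetical stand-ins for `struct file`, the kkuc registration entry, `fput`, `libcfs_kkuc_group_add`, `libcfs_kkuc_group_rem`, and `lmv_hsm_ct_register`. The reference count is a plain integer, and an `underflows` counter stands in for the point where the kernel would BUG.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct file: just a reference count. */
struct model_file {
	int f_count;    /* live references */
	int underflows; /* puts attempted while f_count was already 0 */
};

/* Hypothetical stand-in for the kkuc registration entry. */
struct model_reg {
	struct model_file *kr_fp;
};

static void model_fput(struct model_file *f)
{
	if (f->f_count == 0)
		f->underflows++; /* where the real kernel would BUG */
	else
		f->f_count--;
}

/* Behaviour described above: keeps the caller's reference, takes none. */
static int model_group_add(struct model_reg *reg, struct model_file *f)
{
	if (reg == NULL)
		return -EINVAL;
	reg->kr_fp = f;
	return 0;
}

static void model_group_rem(struct model_reg *reg)
{
	model_fput(reg->kr_fp); /* drops the registration's reference */
	reg->kr_fp = NULL;
}

/* mdt_rc[i] is the simulated result of the per-MDT ioctl on MDT i. */
int model_ct_register_buggy(struct model_file *f, struct model_reg *reg,
			    const int *mdt_rc, int mdt_count)
{
	int rc, i;

	assert(f != NULL);
	f->f_count++; /* models fget() on the pipe fd */
	rc = model_group_add(reg, f);
	if (rc)
		goto err_fput;

	for (i = 0; i < mdt_count; i++) {
		rc = mdt_rc[i];
		if (rc)
			goto err_kkuc_rem;
	}
	return 0; /* the registration now owns the fget reference */

err_kkuc_rem:
	model_group_rem(reg); /* first put of the fget reference */
err_fput:
	model_fput(f); /* a second put when reached via err_kkuc_rem */
	return rc;
}
```

In this model, starting with `f_count = 1` for the open pipe fd and failing the second MDT with `-ENXIO` leaves `f_count` at 0 while the fd is still open. The copytool's later close is then the put that underflows.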

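      Fix: libcfs_kkuc_group_add should take its own reference (reg->kr_fp = get_file(filp)). lmv_hsm_ct_register should then drop the RETURN(0) before err_kkuc_rem and guard libcfs_kkuc_group_rem with if (rc), so that the success path also falls through to err_fput and the caller's fget reference is released exactly once on every path.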
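
The proposed reference flow can be checked with a small user-space model. This is a sketch under stated assumptions rather than the actual patch: `model_file`, `model_reg`, `model_get_file`, `model_fput`, `model_group_add`, `model_group_rem`, and `model_ct_register_fixed` are hypothetical stand-ins for the kernel objects and functions named in this ticket. The model ignores everything except reference counting, including any cleanup of MDTs that had already registered.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct file: just a reference count. */
struct model_file {
	int f_count;
};

/* Hypothetical stand-in for the kkuc registration entry. */
struct model_reg {
	struct model_file *kr_fp;
};

static struct model_file *model_get_file(struct model_file *f)
{
	f->f_count++;
	return f;
}

static void model_fput(struct model_file *f)
{
	assert(f->f_count > 0); /* a put with no reference left is a bug */
	f->f_count--;
}

/* Fixed behaviour: the registration takes its own reference. */
static int model_group_add(struct model_reg *reg, struct model_file *f)
{
	if (reg == NULL)
		return -EINVAL;
	reg->kr_fp = model_get_file(f);
	return 0;
}

static void model_group_rem(struct model_reg *reg)
{
	model_fput(reg->kr_fp); /* drops only the registration's reference */
	reg->kr_fp = NULL;
}

/* mdt_rc[i] is the simulated result of the per-MDT ioctl on MDT i. */
int model_ct_register_fixed(struct model_file *f, struct model_reg *reg,
			    const int *mdt_rc, int mdt_count)
{
	int rc, i;

	model_get_file(f); /* models fget() on the pipe fd */
	rc = model_group_add(reg, f);
	if (rc)
		goto err_fput;

	for (i = 0; i < mdt_count; i++) {
		rc = mdt_rc[i];
		if (rc)
			goto err_kkuc_rem;
	}
	/* No early return: success falls through with rc == 0. */

err_kkuc_rem:
	if (rc)
		model_group_rem(reg); /* undo the registration on failure only */
err_fput:
	model_fput(f); /* always release the caller's fget reference */
	return rc;
}
```

Starting from `f_count = 1` for the open pipe fd, a failed registration returns the count to 1, and a successful one leaves it at 2: one reference for the fd and one held by the registration.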

      Repro: mount Lustre with ≥2 MDTs; lctl set_param mdt.<fs>-MDT0000.hsm_control=enabled and mdt.<fs>-MDT0001.hsm_control=disabled; run any HSM copytool (e.g. lhsmtool_posix --archive=1 --hsm-root /tmp/a $MOUNT) on a client. Kernel panics within ~1 s. Reproduced on Lustre 2.17.53, kernel 6.12.0-124.56.1.el10_1.aarch64.

      Latent upstream since LU-3365 (2014): lmv_hsm_ct_register has not changed since then, and most deployments enable hsm_control on every MDT before any copytool registers, so the failing path is rarely exercised.

            rread Robert Read
            rread Robert Read