Lustre / LU-5195

HSM: mdt_hsm_cdt_actions.c:104:cdt_llog_process() failed to process HSM_ACTIONS llog


Details


    Description

      Several times while testing HSM in a virtual environment (CentOS 6.5 + Lustre 2.5.1 on clients and servers), we've observed what may be HSM_ACTIONS llog corruption.

      Here's our internal bug description:
      A Lustre filesystem on which HSM and changelogs were in use started misbehaving. After a reboot, the system began logging many of these errors to the system log:

      <3>LustreError: 2990:0:(mdt_hsm_cdt_actions.c:104:cdt_llog_process()) tas01-MDT0000: failed to process HSM_ACTIONS llog (rc=-2)
      <3>LustreError: 2990:0:(mdt_hsm_cdt_actions.c:104:cdt_llog_process()) Skipped 600 previous similar messages
      <3>LustreError: 2990:0:(llog_cat.c:192:llog_cat_id2handle()) tas01-MDD0000: error opening log id 0x1c:1:0: rc = -2
      <3>LustreError: 2990:0:(llog_cat.c:192:llog_cat_id2handle()) Skipped 600 previous similar messages
      <3>LustreError: 2990:0:(llog_cat.c:556:llog_cat_process_cb()) tas01-MDD0000: cannot find handle for llog 0x1c:1: -2
      <3>LustreError: 2990:0:(llog_cat.c:556:llog_cat_process_cb()) Skipped 600 previous similar messages
      <3>LustreError: 2990:0:(mdt_hsm_cdt_actions.c:104:cdt_llog_process()) tas01-MDT0000: failed to process HSM_ACTIONS llog (rc=-2)
      <3>LustreError: 2990:0:(mdt_hsm_cdt_actions.c:104:cdt_llog_process()) Skipped 600 previous similar messages
      <3>LustreError: 2990:0:(llog_cat.c:192:llog_cat_id2handle()) tas01-MDD0000: error opening log id 0x1c:1:0: rc = -2
      <3>LustreError: 2990:0:(llog_cat.c:192:llog_cat_id2handle()) Skipped 600 previous similar messages
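Kernel code conventionally returns negative errno values, so the rc of -2 in these messages corresponds to ENOENT ("No such file or directory"), which is consistent with the "cannot find handle for llog" message. As a rough illustration, the sketch below extracts and decodes the return code from a log line. The helper name `decode_rc` and the regular expression are assumptions made for this example; they are not part of Lustre.

```python
import errno
import os
import re

# Matches the two spellings seen in the log above: "rc=-2" and "rc = -2".
RC_PATTERN = re.compile(r"rc\s*=\s*(-?\d+)")


def decode_rc(line):
    """Return (rc, errno name, description) for a log line, or None if it has no rc field."""
    match = RC_PATTERN.search(line)
    if match is None:
        return None
    rc = int(match.group(1))
    code = abs(rc)  # the log shows the negated errno value
    name = errno.errorcode.get(code, "UNKNOWN")
    return rc, name, os.strerror(code)


sample = ("<3>LustreError: 2990:0:(llog_cat.c:192:llog_cat_id2handle()) "
          "tas01-MDD0000: error opening log id 0x1c:1:0: rc = -2")
print(decode_rc(sample))
```

Lines that carry the code without an `rc` label, such as the llog_cat_process_cb() message ending in ": -2", are not matched by this pattern and return None.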

      At that point the MDS would not accept any HSM request, nor would it deliver any.

      The MGT/MDT were unmounted and remounted as ldiskfs, and the hsm_actions file was deleted. Lustre was then remounted, and HSM became usable again.
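The workaround described above can be sketched as an ordered command plan. This is only an illustration under stated assumptions: the device and mount point arguments are hypothetical placeholders, the location of hsm_actions at the root of the ldiskfs mount is assumed rather than confirmed by this report, and the function only builds the command strings without executing anything. Deleting the file presumably discards whatever HSM action records it held, so treat this as a last resort.

```python
import shlex


def hsm_actions_reset_plan(mdt_device, lustre_mountpoint, ldiskfs_mountpoint):
    """Build (but do not run) the command sequence for the workaround.

    All three arguments are site-specific; the values used in any call
    are hypothetical and must be replaced with the real device and paths.
    """
    # Assumption: hsm_actions sits at the root of the MDT backing filesystem.
    llog_path = f"{ldiskfs_mountpoint.rstrip('/')}/hsm_actions"
    steps = [
        ["umount", lustre_mountpoint],                               # stop the MGT/MDT
        ["mount", "-t", "ldiskfs", mdt_device, ldiskfs_mountpoint],  # expose backing fs
        ["rm", llog_path],                                           # drop the suspect llog
        ["umount", ldiskfs_mountpoint],
        ["mount", "-t", "lustre", mdt_device, lustre_mountpoint],    # restart the MGT/MDT
    ]
    return [shlex.join(step) for step in steps]


# Hypothetical device and mount points, for illustration only.
for command in hsm_actions_reset_plan("/dev/sdb", "/mnt/mdt", "/mnt/mdt-ldiskfs"):
    print(command)
```

Printing the plan first makes it easy to review each step before running anything by hand on a real MDS.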

      We do not have a simple reproducer for this, but it has happened several times.

            People

              jamesanunez James Nunez (Inactive)
              paf Patrick Farrell