
osc_cache.c:899:osc_extent_wait() timeout quite often

Details

    • Bug
    • Resolution: Duplicate
    • Critical
    • None
    • Lustre 2.5.0, Lustre 2.4.2
    • None
    • RHEL6
    • 2
    • 12438

    Description

      We hit client hangs quite often on all of the login nodes, and the following Lustre error messages are printed. The client cannot recover until it is rebooted.

      Jan 22 17:23:23 ff01 kernel: LustreError: 84026:0,(osc_cache.c:899:osc_extent_wait()) extent ffff8831a49b0678@{[0 > 0/255], [3|0|+|rpc|wihY|ffff88283005bc48], [4096|1|+||ffff8828fb76b228|256|ffff88319695e040]} home2-OST000b-osc-ffff883fdbbd8800: wait ext to 0 timedout, recovery in progress?
      

      Attachments

        1. lctl.dk.23.17.tgz
          1.40 MB
          Shuichi Ihara
        2. lctl.dk.after.tgz
          885 kB
          Shuichi Ihara
        3. lctl.dk1.tgz
          0.2 kB
          Shuichi Ihara
        4. messages.after_call_trace
          1.75 MB
          Shuichi Ihara
        5. messages.after_osc_msg
          2.51 MB
          Shuichi Ihara
        6. messages.before_call_trace
          1.01 MB
          Shuichi Ihara

        Issue Links

          Activity

            [LU-4552] osc_cache.c:899:osc_extent_wait() timeout quite often
            bfaccini Bruno Faccini (Inactive) made changes -
            Link New: This issue is duplicated by LU-4300 [ LU-4300 ]
            pjones Peter Jones made changes -
            Resolution New: Duplicate [ 3 ]
            Status Original: Open [ 1 ] New: Resolved [ 5 ]
            pjones Peter Jones added a comment -

            Thanks Ihara

            pjones Peter Jones made changes -
            Labels Original: p4j

            ihara Shuichi Ihara (Inactive) added a comment -

            Bruno, yes. As far as we tested, I think this is a duplicate of LU-4300. Please close this ticket (LU-4552).

            bfaccini Bruno Faccini (Inactive) added a comment -

            Hello Shuichi, do you agree that I can close this ticket as a duplicate of LU-4300?

            ihara Shuichi Ihara (Inactive) added a comment -

            Thanks, Bruno!
            After setting "echo 0 > /proc/fs/lustre/ldlm/namespaces/*/early_lock_cancel", the problem was no longer reproduced, so this looks like the same problem as LU-4300. We tried a couple of times, but nothing happened and the installer finished without errors.

            bfaccini Bruno Faccini (Inactive) added a comment -

            Hello Shuichi,
            After having a look at the back-traces (I still need to review the Lustre debug logs!), your problem seems similar to the one reported in LU-4300.
            Also, could you try to run the same 100% reproducer on a node where ELC has been disabled? I think this can be set with "echo 0 > /proc/fs/lustre/ldlm/namespaces/*/early_lock_cancel".
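A note on the quoted one-liner: in bash, a redirection target that expands to more than one word is rejected as an "ambiguous redirect", so the glob form may fail on a client that has several LDLM namespaces. The sketch below is one way to apply the same setting per namespace. The function name `set_elc` and its optional base-directory argument are hypothetical and introduced only for illustration; the default path is the one quoted in this ticket, and writing to it is assumed to require root.

```shell
# Hypothetical helper (not part of Lustre): write VALUE into the
# early_lock_cancel file of every namespace directory under BASE.
# BASE defaults to the /proc path quoted in this ticket.
set_elc() {
    value="$1"
    base="${2:-/proc/fs/lustre/ldlm/namespaces}"
    for f in "$base"/*/early_lock_cancel; do
        # If the glob matched nothing, $f is the literal pattern; skip it.
        [ -e "$f" ] || continue
        # One redirection per file avoids the multi-word redirect target.
        echo "$value" > "$f" || return 1
    done
}
```

Calling `set_elc 0` would disable ELC in every namespace found, and `set_elc 1` would turn it back on. Where `lctl` is installed, `lctl set_param ldlm.namespaces.*.early_lock_cancel=0` should have the same effect, although that command does not appear in this ticket.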
            pjones Peter Jones made changes -
            Labels New: p4j
            ihara Shuichi Ihara (Inactive) made changes -
            Attachment New: messages.after_call_trace [ 14040 ]
            Attachment New: messages.after_osc_msg [ 14041 ]
            Attachment New: messages.before_call_trace [ 14042 ]

            People

              hongchao.zhang Hongchao Zhang
              ihara Shuichi Ihara (Inactive)
              Votes: 0
              Watchers: 9
