[LU-1535] LustreError: 1843:0:(mds_open.c:1645:mds_close())

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 1.8.9
    • None
    • None
    • 3
    • 6379

    Description

      On our customer's Lustre system, an MDS service thread hung after call traces were dumped. The following messages appeared on the MDS around the time of the call traces:

      Jun 17 05:45:08 ALPL505 kernel: LustreError: 1843:0:(mds_open.c:1645:mds_close()) @@@ no handle for file close ino 21989538: cookie 0x1e6d8ca7fa6bf800  req@ffff810287df6400 x1401981983149299/t0 o35->c9344f7b-1e2a-0615-0b51-cbf06bb316a5@NET_0x500000a030235_UUID:0/0 lens 408/4896 e 0 to 0 dl 1339883114 ref 1 fl Interpret:/0/0 rc 0/0
      Jun 17 05:45:08 ALPL505 kernel: LustreError: 1843:0:(mds_open.c:1645:mds_close()) Skipped 1 previous similar message
      Jun 17 05:45:08 ALPL505 kernel: LustreError: 1843:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-116)  req@ffff810287df6400 x1401981983149299/t0 o35->c9344f7b-1e2a-0615-0b51-cbf06bb316a5@NET_0x500000a030235_UUID:0/0 lens 408/2928 e 0 to 0 dl 1339883114 ref 1 fl Interpret:/0/0 rc -116/0
      Jun 17 05:45:08 ALPL505 kernel: LustreError: 1843:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 1 previous similar message
      Jun 17 05:45:09 ALPL505 kernel: LustreError: 2131:0:(mds_open.c:1645:mds_close()) @@@ no handle for file close ino 21922157: cookie 0x1e6d8ca7fa68693c  req@ffff810237e71c00 x1401981983149371/t0 o35->c9344f7b-1e2a-0615-0b51-cbf06bb316a5@NET_0x500000a030235_UUID:0/0 lens 408/4896 e 0 to 0 dl 1339883115 ref 1 fl Interpret:/0/0 rc 0/0
      Jun 17 06:02:54 ALPL505 kernel: Lustre: Service thread pid 981 was inactive for 710.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      


        Activity

          pjones Peter Jones added a comment -

          Rather than setting an incorrect release, we can leave the fix-release field empty and populate it if and when we form concrete plans for a 1.8.9 release.

          laisiyao Lai Siyao added a comment -

          The actual fix version should be 1.8.9, if there is one, but that value can't be entered in the field because it doesn't exist yet, so 1.8.8 is used.

          laisiyao Lai Siyao added a comment -

          Cory, thanks for your clarification.

          spitzcor Cory Spitz added a comment -

          Lai, which patch landed? You marked this bug fixed for 2.3.0, but I don't see any master patches beyond LU-1128, and that was marked fixed for 2.2.0 and 2.1.2. Change #3138 landed on b1_8, so maybe this ticket should be marked fixed for 1.8.9 instead, if that is possible.

          laisiyao Lai Siyao added a comment -

          Patch landed.

          laisiyao Lai Siyao added a comment -

          The issue fixed by LU-1128 is in the ldlm server, so it can occur on the MGS, MDS, and OSS.


          ihara Shuichi Ihara (Inactive) added a comment -

          Lai, thanks! We will try your backported patch. By the way, this problem happened on the MDS; does the LU-1128 issue cause this problem on the MDS as well?
          laisiyao Lai Siyao added a comment -

          This looks to be the same issue in http://jira.whamcloud.com/browse/LU-1128. I've backported the fix at: http://review.whamcloud.com/#change,3138. Could you find a way to verify?


          ihara Shuichi Ihara (Inactive) added a comment -

          Lustre-1.8.6-wc1 is running on this cluster.

          adilger Andreas Dilger added a comment -

          Ihara, what version of Lustre is this?

          People

            laisiyao Lai Siyao
            ihara Shuichi Ihara (Inactive)
            Votes:
            0
            Watchers:
            2
