[LU-4705] LustreError: 89827:0:(mdc_locks.c:916:mdc_enqueue()) ldlm_cli_enqueue: -2

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.11.0, Lustre 2.10.2
    • Affects Version/s: Lustre 2.5.1
    • Labels: None
    • Environment: Running tip of Lustre b2_5, 1 MGS, 1 MDS, 2 OSS, 12 clients.
    • Severity: 3
    • Rank (Obsolete): 12942

    Description

      Unexpected MDC LustreError messages on most clients.

      Client 10:
      Mar 4 03:27:11 lustre10 kernel: LustreError: 183913:0:(mdc_locks.c:916:mdc_enqueue()) ldlm_cli_enqueue: -2

      Client 11:
      Mar 4 00:37:25 lustre11 kernel: LustreError: 89827:0:(mdc_locks.c:916:mdc_enqueue()) ldlm_cli_enqueue: -2

      Client 12:
      Mar 4 00:39:36 lustre12 kernel: LustreError: 11-0: cal-MDT0000-mdc-ffff8807b75c4000: Communicating with 192.168.20.1@tcp1, operation ldlm_enqueue failed with -116.
      Mar 4 00:39:36 lustre12 kernel: LustreError: 70225:0:(mdc_locks.c:916:mdc_enqueue()) ldlm_cli_enqueue: -116
      Mar 4 00:39:36 lustre12 kernel: LustreError: 70225:0:(vvp_io.c:1227:vvp_io_init()) cal: refresh file layout [0x200001c0b:0x176e:0x0] error -116.
      Mar 4 03:09:33 lustre12 kernel: LustreError: 70225:0:(mdc_locks.c:916:mdc_enqueue()) ldlm_cli_enqueue: -2

      Client 13:
      Mar 4 00:29:54 lustre13 kernel: LustreError: 167294:0:(mdc_locks.c:916:mdc_enqueue()) ldlm_cli_enqueue: -2

      Client 14:
      Mar 4 01:18:04 lustre14 kernel: LustreError: 11-0: cal-MDT0000-mdc-ffff880787af8400: Communicating with 192.168.20.1@tcp1, operation ldlm_enqueue failed with -116.
      Mar 4 01:18:04 lustre14 kernel: LustreError: 11503:0:(mdc_locks.c:916:mdc_enqueue()) ldlm_cli_enqueue: -116
      Mar 4 01:18:04 lustre14 kernel: LustreError: 11503:0:(vvp_io.c:1227:vvp_io_init()) cal: refresh file layout [0x200001c12:0xbbe2:0x0] error -116.

      Client 16:
      Mar 4 01:00:46 lustre16 kernel: LustreError: 141605:0:(mdc_locks.c:916:mdc_enqueue()) ldlm_cli_enqueue: -2

      Client 17:
      Mar 4 00:13:39 lustre17 kernel: LustreError: 11-0: cal-MDT0000-mdc-ffff8808038aa000: Communicating with 192.168.20.1@tcp1, operation ldlm_enqueue failed with -116.
      Mar 4 00:13:39 lustre17 kernel: LustreError: 126770:0:(mdc_locks.c:916:mdc_enqueue()) ldlm_cli_enqueue: -116
      Mar 4 00:13:39 lustre17 kernel: LustreError: 126770:0:(vvp_io.c:1227:vvp_io_init()) cal: refresh file layout [0x200001beb:0x1aedf:0x0] error -116.
      Mar 4 02:02:43 lustre17 kernel: LustreError: 126770:0:(mdc_locks.c:916:mdc_enqueue()) ldlm_cli_enqueue: -2

      Client 18:
      Mar 1 05:34:03 lustre18 kernel: LustreError: 146331:0:(mdc_locks.c:916:mdc_enqueue()) ldlm_cli_enqueue: -2
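
For triage, the per-client excerpts above can be tallied by host and error code. What follows is a minimal sketch and not part of the original report: the helper name `tally_enqueue_errors`, the regular expression, and the errno names (ENOENT for 2 and ESTALE for 116, which are the Linux values) are assumptions made for illustration.

```python
import re
from collections import Counter

# Linux errno values seen in the report (assumption: the clients are Linux hosts).
ERRNO_NAMES = {2: "ENOENT", 116: "ESTALE"}

# Matches the mdc_enqueue() message format quoted in the description.
PATTERN = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]+\s+(?P<host>\S+)\s+kernel: LustreError: "
    r"\d+:\d+:\(mdc_locks\.c:\d+:mdc_enqueue\(\)\) ldlm_cli_enqueue: -(?P<errno>\d+)"
)


def tally_enqueue_errors(lines):
    """Count mdc_enqueue() failures per (host, errno name)."""
    counts = Counter()
    for line in lines:
        match = PATTERN.match(line.strip())
        if match:
            code = int(match.group("errno"))
            name = ERRNO_NAMES.get(code, f"errno {code}")
            counts[(match.group("host"), name)] += 1
    return counts


# Sample lines copied from the description; the first one is a different
# message (ldlm_enqueue RPC failure) and is deliberately not counted.
sample = [
    "Mar 4 00:39:36 lustre12 kernel: LustreError: 11-0: cal-MDT0000-mdc-ffff8807b75c4000: Communicating with 192.168.20.1@tcp1, operation ldlm_enqueue failed with -116.",
    "Mar 4 00:39:36 lustre12 kernel: LustreError: 70225:0:(mdc_locks.c:916:mdc_enqueue()) ldlm_cli_enqueue: -116",
    "Mar 4 03:09:33 lustre12 kernel: LustreError: 70225:0:(mdc_locks.c:916:mdc_enqueue()) ldlm_cli_enqueue: -2",
    "Mar 4 00:29:54 lustre13 kernel: LustreError: 167294:0:(mdc_locks.c:916:mdc_enqueue()) ldlm_cli_enqueue: -2",
]

for (host, name), n in sorted(tally_enqueue_errors(sample).items()):
    print(host, name, n)
```

In practice the same function could be fed lines from a syslog file on each client. Grouping by errno name separates the -2 messages from the -116 ones, which in the excerpts above always appear together with a failed layout refresh.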

          Activity

            gerrit Gerrit Updater added a comment -

            John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/29736/
            Subject: LU-4705 mdc: improve mdc_enqueue() error message
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set:
            Commit: c27470755cf40ee33056011883a0d0600ce00340

            gerrit Gerrit Updater added a comment -

            Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/29736
            Subject: LU-4705 mdc: improve mdc_enqueue() error message
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set: 1
            Commit: b51accd1a652406afbe41ad764d116d0f361a0fb

            pjones Peter Jones added a comment -

            Landed for 2.11

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/28978/
            Subject: LU-4705 mdc: improve mdc_enqueue() error message
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 471c5303eb29d5ea1ba5a683173bda63095dae78

            gerrit Gerrit Updater added a comment -

            Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: https://review.whamcloud.com/28978
            Subject: LU-4705 mdc: improve mdc_enqueue() error message
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 9d8f53da6ac5482262c188ba1e0ca3fb395aedfd

            kjstrosahl Kurt J. Strosahl (Inactive) added a comment - edited

            I just saw an instance of this error in the Lustre file system at TJNAF. It is the only instance I can recall seeing here. We are running pristine Lustre 2.5.3.

            To expand a bit more: I have a test environment that I'm using to benchmark OSS systems. Presently I have three OSTs on a single server running Lustre 2.5.3. I've mounted it on a single client and am running IOR tests with the following parameters:

            mpirun -np 12 -bynode -machinefile ./nodelist ./ior -F -e -m -g -i 10 -t 1024k -b 42G -o /testL/benchmark/test

            where nodelist contains a single node.

            mjo Mike O'Connor added a comment -

            This is being seen at Gulfstream. In their environment there doesn't appear to be any operational consequence, but it scared them. It would be nice if we could mute these errors, as discussed in https://jira.hpdd.intel.com/browse/LU-4705?focusedCommentId=79255&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-79255

            People

              Assignee: WC Triage (wc-triage)
              Reporter: Brett Lee (brett, Inactive)
              Votes: 0
              Watchers: 9
