Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-5934

mdt_intent_reint()) ASSERTION( rc == 0 ) failed: Error occurred but lock handle is still in use, rc = -2

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.7.0
    • Lustre 2.7.0, Lustre 2.5.4
    • None
    • 3
    • 16571

    Description

      Bug hit with soak tests while testing MDT failover:

      LDISKFS-fs warning (device dm-2): ldiskfs_multi_mount_protect: MMP interval 42 higher than expected, please wait.
      
      LDISKFS-fs (dm-2): recovery complete
      LDISKFS-fs (dm-2): mounted filesystem with ordered data mode. quota=on. Opts: 
      Lustre: soaked-MDT0000: used disk, loading
      LustreError: 11-0: soaked-MDT0000-lwp-MDT0000: Communicating with 0@lo, operation mds_connect failed with -11.
      Lustre: soaked-MDT0000: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-450
      LDISKFS-fs (dm-3): recovery complete
      LDISKFS-fs (dm-3): mounted filesystem with ordered data mode. quota=on. Opts: 
      Lustre: home-MDT0001: used disk, loading
      Lustre: home-MDT0001: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-450
      Lustre: soaked-MDT0000: Will be in recovery for at least 2:30, or until 9 clients reconnect
      LustreError: 4283:0:(mdt_open.c:1652:mdt_reint_open()) @@@ OPEN & CREAT not in open replay/by_fid.  req@ffff88081283c800 x14844821187
      66556/t0(77329207333) o101->e0f06817-48a4-23ba-7529-dacbe1c5c96b@192.168.1.116@o2ib100:0/0 lens 1408/3512 e 0 to 0 dl 1416337962 ref 
      1 fl Interpret:/4/0 rc 0/0
      Lustre: home-MDT0001: Will be in recovery for at least 2:30, or until 9 clients reconnect
      LustreError: 4653:0:(mdt_open.c:1652:mdt_reint_open()) @@@ OPEN & CREAT not in open replay/by_fid.  req@ffff8804122ef800 x14844820751
      33836/t0(77310248349) o101->8fe2e201-b336-c2b3-1690-2b8671ccb3ac@192.168.1.121@o2ib100:0/0 lens 568/3512 e 0 to 0 dl 1416337966 ref 1
       fl Interpret:/4/0 rc 0/0
      LustreError: 4653:0:(mdt_open.c:1652:mdt_reint_open()) Skipped 6 previous similar messages
      LustreError: 4275:0:(mdt_open.c:1652:mdt_reint_open()) @@@ OPEN & CREAT not in open replay/by_fid.  req@ffff88041ec27c00 x14844821336
      81580/t0(77313286391) o101->5af8cf20-f59d-b674-4342-95d4f9187d5e@192.168.1.126@o2ib100:0/0 lens 1744/3512 e 0 to 0 dl 1416337976 ref 
      1 fl Interpret:/4/0 rc 0/0
      LustreError: 4275:0:(mdt_open.c:1652:mdt_reint_open()) Skipped 26 previous similar messages
      LustreError: 4284:0:(mdt_open.c:1652:mdt_reint_open()) @@@ OPEN & CREAT not in open replay/by_fid.  req@ffff880836688000 x14844821165
      12776/t0(77315789759) o101->0a6d6736-23b0-a177-bd78-0dc387bbcfed@192.168.1.124@o2ib100:0/0 lens 1792/3512 e 0 to 0 dl 1416337994 ref 
      1 fl Interpret:/4/0 rc 0/0
      LustreError: 4284:0:(mdt_open.c:1652:mdt_reint_open()) Skipped 2 previous similar messages
      LustreError: 4689:0:(mdt_open.c:1652:mdt_reint_open()) @@@ OPEN & CREAT not in open replay/by_fid.  req@ffff880410c2b400 x14844823276
      10436/t0(77382041571) o101->c9895ebb-5dd8-614d-e10a-fe88fed518aa@192.168.1.127@o2ib100:0/0 lens 808/3512 e 0 to 0 dl 1416337998 ref 1
       fl Interpret:/4/0 rc 0/0
      LustreError: 4689:0:(mdt_open.c:1652:mdt_reint_open()) Skipped 8 previous similar messages
      Lustre: home-MDT0001: Recovery over after 0:39, of 9 clients 9 recovered and 0 were evicted.
      Lustre: soaked-MDT0000: Recovery over after 1:24, of 9 clients 9 recovered and 0 were evicted.
      LustreError: 0:0:(ldlm_lockd.c:344:waiting_locks_callback()) ### lock callback timer expired after 151s: evicting client at 192.168.1
      .124@o2ib100  ns: mdt-soaked-MDT0000_UUID lock: ffff88055ce43700/0xeb9796ef206ff833 lrc: 4/0,0 mode: CR/CR res: [0x2ac0000bd6:0xe1db:
      0x0].0 bits 0x2 rrc: 6 type: IBT flags: 0x60200000000020 nid: 192.168.1.124@o2ib100 remote: 0x55621c70e3b62760 expref: 76246 pid: 469
      4 timeout: 4340788641 lvb_type: 0
      LustreError: 4725:0:(mdt_handler.c:4078:mdt_intent_reint()) ASSERTION( rc == 0 ) failed: Error occurred but lock handle is still in use, rc = -2
      LustreError: 4725:0:(mdt_handler.c:4078:mdt_intent_reint()) LBUG
      Pid: 4725, comm: mdt03_011
      
      Call Trace:
       [<ffffffffa0550895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
       [<ffffffffa0550e97>] lbug_with_loc+0x47/0xb0 [libcfs]
       [<ffffffffa103d79a>] mdt_intent_reint+0x51a/0x520 [mdt]
       [<ffffffffa103ac4e>] mdt_intent_policy+0x3ae/0x770 [mdt]
       [<ffffffffa07ed2e5>] ldlm_lock_enqueue+0x135/0x980 [ptlrpc]
       [<ffffffffa0816dab>] ldlm_handle_enqueue0+0x51b/0x10c0 [ptlrpc]
       [<ffffffff81069f15>] ? enqueue_entity+0x125/0x450
       [<ffffffffa103b116>] mdt_enqueue+0x46/0xe0 [mdt]
       [<ffffffffa103fcda>] mdt_handle_common+0x52a/0x1470 [mdt]
       [<ffffffffa107d455>] mds_regular_handle+0x15/0x20 [mdt]
       [<ffffffffa08470d5>] ptlrpc_server_handle_request+0x385/0xc00 [ptlrpc]
       [<ffffffffa05514ce>] ? cfs_timer_arm+0xe/0x10 [libcfs]
       [<ffffffffa05623b5>] ? lc_watchdog_touch+0x65/0x170 [libcfs]
       [<ffffffffa083e779>] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
       [<ffffffff810546b9>] ? __wake_up_common+0x59/0x90
       [<ffffffffa084843d>] ptlrpc_main+0xaed/0x1770 [ptlrpc]
       [<ffffffffa0847950>] ? ptlrpc_main+0x0/0x1770 [ptlrpc]
       [<ffffffff8109abf6>] kthread+0x96/0xa0
       [<ffffffff8100c20a>] child_rip+0xa/0x20
       [<ffffffff8109ab60>] ? kthread+0x0/0xa0
       [<ffffffff8100c200>] ? child_rip+0x0/0x20
      
      Kernel panic - not syncing: LBUG
      

      Please note that LU-5930 was already created for the "OPEN & CREAT not in open replay/by_fid" message.

      Attachments

        Issue Links

          Activity

            People

              liang Liang Zhen (Inactive)
              johann Johann Lombardi (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: