Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-8907

LBUG: (namei.c:816:ll_create_node()) ASSERTION( list_empty(&inode->i_dentry) ) failed

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.10.0
    • Lustre 2.5.3
    • RHEL 6 Bull kernel 2.6.32-642.4.2.el6.Bull.100.x86_64
      Lustre build based on 2.5.3.90
    • 3
    • 9223372036854775807

    Description

      One Lustre client crashed right after a recovery with the following messages:

      2016-10-29 16:00:01 Lustre: DEBUG MARKER: Sat Oct 29 16:00:01 2016
       2016-10-29 16:00:01
       2016-10-29 16:01:07 Lustre: 17766:0:(client.c:1942:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1477749640/real 1477749640] req@ffff8811e1b9a400 x1547883906151520/t0(0) 
       o101->scratch3-MDT0000-mdc-ffff88205ec91400@JO.BOO.AL.IL@o2ib2:12/10 lens 632/1136 e 0 to 1 dl 1477749667 ref 2 fl Rpc:XP/0/ffffffff rc 0/-1
       2016-10-29 16:01:07 Lustre: 17766:0:(client.c:1942:ptlrpc_expire_one_request()) Skipped 12 previous similar messages
       2016-10-29 16:01:07 Lustre: scratch3-MDT0000-mdc-ffff88205ec91400: Connection to scratch3-MDT0000 (at JO.BOO.AL.IL@o2ib2) was lost; in progress operations using this service will wait for recovery to complet
       e
       2016-10-29 16:01:07 Lustre: Skipped 12 previous similar messages
       2016-10-29 16:01:07 Lustre: scratch3-MDT0000-mdc-ffff88205ec91400: Connection restored to scratch3-MDT0000 (at JO.BOO.AL.IL@o2ib2)
       2016-10-29 16:01:07 Lustre: Skipped 12 previous similar messages
       2016-10-29 16:01:07 LustreError: 17766:0:(namei.c:816:ll_create_node()) ASSERTION( list_empty(&inode->i_dentry) ) failed:
       2016-10-29 16:01:07 LustreError: 17766:0:(namei.c:816:ll_create_node()) LBUG
      
      

      The stack is as follows:

      PID: 17766 TASK: ffff8819ab5b0ab0 CPU: 6 COMMAND: "rsync"
       #0 [ffff88189c697b20] machine_kexec at ffffffff8103ff4b
       #1 [ffff88189c697b80] crash_kexec at ffffffff810cfce2
       #2 [ffff88189c697c50] panic at ffffffff81546ce9
       #3 [ffff88189c697cd0] lbug_with_loc at ffffffffa067beeb [libcfs]
       #4 [ffff88189c697cf0] ll_create_nd at ffffffffa0dbf854 [lustre]
       #5 [ffff88189c697d70] vfs_create at ffffffff811a7946
       #6 [ffff88189c697db0] do_filp_open at ffffffff811ab75e
       #7 [ffff88189c697f20] do_sys_open at ffffffff81194e87
       #8 [ffff88189c697f70] sys_open at ffffffff81194f90
       #9 [ffff88189c697f80] system_call_fastpath at ffffffff8100b0d2
       RIP: 0000003c88adb480 RSP: 00007fffd85b6bc8 RFLAGS: 00010246
       RAX: 0000000000000002 RBX: ffffffff8100b0d2 RCX: 0000000000000001
       RDX: 0000000000000180 RSI: 00000000000000c2 RDI: 00007fffd85b7d70
       RBP: 00007fffd85b7d9b R8: 7a672e7761722e30 R9: 00000000ffffffff
       R10: 0000000000000001 R11: 0000000000000246 R12: ffffffff81194f90
       R13: ffff88189c697f78 R14: 00007fffd85b7d9c R15: 0000000000000000
       ORIG_RAX: 0000000000000002 CS: 0033 SS: 002b
      

       

      Even though the intent is to create the inode with a dentry in the filesystem namespace (/somepath/.tmpfile), the inode structure field i_dentry is populated with one dentry pointing at /.lustre/fid/[0x298cd542a:0x3b3d:0x0].

      Strangely enough, even though the crash occured in ll_create_nd() -> ll_create_it() -> ll_create_node, before the call to d_instantiate() in ll_create_nd() -> ll_create_it(), the fid actually really points back to the requested location:

      # lfs fid2path /mountpoint [0x298cd542a:0x3b3d:0x0]
      /somepath/.tmpfile
      

      I can't upload the crashdump due to site restriction, but will be happy to give more informations as you request them.

      Attachments

        Activity

          [LU-8907] LBUG: (namei.c:816:ll_create_node()) ASSERTION( list_empty(&inode->i_dentry) ) failed
          pjones Peter Jones added a comment -

          Landed for 2.10

          pjones Peter Jones added a comment - Landed for 2.10
          pjones Peter Jones added a comment -

          Thanks

          pjones Peter Jones added a comment - Thanks

          Thanks Bruno.

          The patch was backported to our branch.

          This ticket can be closed.

          spiechurski Sebastien Piechurski added a comment - Thanks Bruno. The patch was backported to our branch. This ticket can be closed.

          Hello Seb,
          As you can see https://review.whamcloud.com/25296/ has just landed on master.
          And as per its content (at least its pure fix part in core code, i.e. simply remove LBUG), it should be pretty easy for you to back-port it on top of your Lustre branch based on 2.5.3.90.

          bfaccini Bruno Faccini (Inactive) added a comment - Hello Seb, As you can see https://review.whamcloud.com/25296/ has just landed on master. And as per its content (at least its pure fix part in core code, i.e. simply remove LBUG), it should be pretty easy for you to back-port it on top of your Lustre branch based on 2.5.3.90.

          Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/25296/
          Subject: LU-8907 llite: handle client racy case during create
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: ee25babfe72378f9496a9732742984f26eb7d4a5

          gerrit Gerrit Updater added a comment - Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/25296/ Subject: LU-8907 llite: handle client racy case during create Project: fs/lustre-release Branch: master Current Patch Set: Commit: ee25babfe72378f9496a9732742984f26eb7d4a5

          Hello Bruno,

           

          What is the next step for this patch ?

          Are we waiting for reviewers ?

          spiechurski Sebastien Piechurski added a comment - Hello Bruno,   What is the next step for this patch ? Are we waiting for reviewers ?

          Yes, new sanity/test_161d sub-test is also a reproducer in auto-tests environment and it also demonstrate that problem is still alive in master!
          This has been proofed by patch-set #2 testing of https://review.whamcloud.com/25296, and now patch-set #3 introduces a first attempt to fix simply removing the inaccurate

          LASSERT(ll_d_hlist_empty(&inode->i_dentry));

          in ll_create_node() and exposes it to full tests suites.

          bfaccini Bruno Faccini (Inactive) added a comment - Yes, new sanity/test_161d sub-test is also a reproducer in auto-tests environment and it also demonstrate that problem is still alive in master! This has been proofed by patch-set #2 testing of https://review.whamcloud.com/25296 , and now patch-set #3 introduces a first attempt to fix simply removing the inaccurate LASSERT(ll_d_hlist_empty(&inode->i_dentry)); in ll_create_node() and exposes it to full tests suites.

          Faccini Bruno (bruno.faccini@intel.com) uploaded a new patch: https://review.whamcloud.com/25296
          Subject: LU-8907 llite: handle client racy case during create
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 3d35f58c3b6c736e5db3d22dea2451bf634cdb95

          gerrit Gerrit Updater added a comment - Faccini Bruno (bruno.faccini@intel.com) uploaded a new patch: https://review.whamcloud.com/25296 Subject: LU-8907 llite: handle client racy case during create Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: 3d35f58c3b6c736e5db3d22dea2451bf634cdb95

          Hi Bruno, did you see the problem reproduced in the auto tests environment ?

          spiechurski Sebastien Piechurski added a comment - Hi Bruno, did you see the problem reproduced in the auto tests environment ?

          I have just pushed https://review.whamcloud.com/24976 to b2_5 branch. It introduces sanity/test_161d sub-test in order to check that the reproducer I have identified also fails in the auto-tests environment in a first time. And then, with next patch-set that removing LBUG could fix and not introduce any regression.

          Next, will also check, using the same way, if problem also exists in master.

          bfaccini Bruno Faccini (Inactive) added a comment - I have just pushed https://review.whamcloud.com/24976 to b2_5 branch. It introduces sanity/test_161d sub-test in order to check that the reproducer I have identified also fails in the auto-tests environment in a first time. And then, with next patch-set that removing LBUG could fix and not introduce any regression. Next, will also check, using the same way, if problem also exists in master.

          People

            bfaccini Bruno Faccini (Inactive)
            spiechurski Sebastien Piechurski
            Votes:
            0 Vote for this issue
            Watchers:
            8 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: