
[LU-3318] mdc_set_lock_data() ASSERTION( old_inode->i_state & I_FREEING )

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Affects Version/s: Lustre 2.4.0, Lustre 2.5.0
    • Fix Version/s: Lustre 2.4.0
    • Labels: None
    • Environment: 2.3.64 ppc64 client, 2.1.4 x86_64 server
    • Severity: 3
    • Rank: 8220

    Description

      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:184:lov_init_sub()) header@c000000c6a188980[0x0, 28, [0x100f00000:0x131a05a:0x0] hash]{ 
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:184:lov_init_sub()) ....lovsub@c000000c6a188a18[0]
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:184:lov_init_sub()) ....osc@c000000d6758cad0id: 0x0:20029530 idx: 240 gen: 0 kms_valid: 1 kms 1618944 rc: 0 force_sync: 0 min_xid: 0 size: 1618944 mtime: 1368036151 atime: 1368036151 ctime: 1368036150 blocks: 2048
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:184:lov_init_sub()) } header@c000000c6a188980
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:184:lov_init_sub()) stripe 0 is already owned.
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) header@c000000f4fd1c8f8[0x0, 45, [0x5ae58ebb30:0x27:0x0] hash]{ 
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) ....vvp@c000000f4fd1c990(+ 0 0) inode: c000000c6a513178 6549798167695589415/1524993723 100666 1 0 c000000f4fd1c990 [0x5ae58ebb30:0x27:0x0]
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) ....lov@c000000c562bdfd8stripes: 2, valid, lsm{c000000c56b65b00 0x0BD10BD0 1 2 0}: 
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) header@c000000c6a188980[0x0, 28, [0x100f00000:0x131a05a:0x0] hash]{ 
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) ....lovsub@c000000c6a188a18[0]
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) ....osc@c000000d6758cad0id: 0x0:20029530 idx: 240 gen: 0 kms_valid: 1 kms 1618944 rc: 0 force_sync: 0 min_xid: 0 size: 1618944 mtime: 1368036151 atime: 1368036151 ctime: 1368036150 blocks: 2048
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) } header@c000000c6a188980
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) header@c000000c6a188a70[0x0, 18, [0x1006b0000:0x130771a:0x0] hash]{ 
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) ....lovsub@c000000c6a188b08[1]
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) ....osc@c000000d6758cbf0id: 0x0:19953434 idx: 107 gen: 0 kms_valid: 1 kms 1048576 rc: 0 force_sync: 0 min_xid: 0 size: 1048576 mtime: 1368036151 atime: 1368036151 ctime: 1368036151 blocks: 2048
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) } header@c000000c6a188a70
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) 
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) } header@c000000f4fd1c8f8
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) owned.
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:186:lov_init_sub()) header@c000000f4fd1c7f0[0x0, 1, [0x5ae58ebb30:0x27:0x0]]
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:186:lov_init_sub()) try to own.
      2013-05-08 11:02:31 LustreError: 7256:0:(lcommon_cl.c:1201:cl_file_inode_init()) Failure to initialize cl object [0x5ae58ebb30:0x27:0x0]: -5
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:184:lov_init_sub()) header@c000000c6a188980[0x0, 28, [0x100f00000:0x131a05a:0x0] hash]{ 
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:184:lov_init_sub()) ....lovsub@c000000c6a188a18[0]
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:184:lov_init_sub()) ....osc@c000000d6758cad0id: 0x0:20029530 idx: 240 gen: 0 kms_valid: 1 kms 1618944 rc: 0 force_sync: 0 min_xid: 0 size: 1618944 mtime: 1368036151 atime: 1368036151 ctime: 1368036150 blocks: 2048
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:184:lov_init_sub()) } header@c000000c6a188980
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:184:lov_init_sub()) stripe 0 is already owned.
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) header@c000000f4fd1c8f8[0x0, 45, [0x5ae58ebb30:0x27:0x0] hash]{ 
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) ....vvp@c000000f4fd1c990(+ 0 0) inode: c000000c6a513178 6549798167695589415/1524993723 100666 1 0 c000000f4fd1c990 [0x5ae58ebb30:0x27:0x0]
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) ....lov@c000000c562bdfd8stripes: 2, valid, lsm{c000000c56b65b00 0x0BD10BD0 1 2 0}: 
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) header@c000000c6a188980[0x0, 28, [0x100f00000:0x131a05a:0x0] hash]{ 
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) ....lovsub@c000000c6a188a18[0]
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) ....osc@c000000d6758cad0id: 0x0:20029530 idx: 240 gen: 0 kms_valid: 1 kms 1618944 rc: 0 force_sync: 0 min_xid: 0 size: 1618944 mtime: 1368036151 atime: 1368036151 ctime: 1368036150 blocks: 2048
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) } header@c000000c6a188980
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) header@c000000c6a188a70[0x0, 18, [0x1006b0000:0x130771a:0x0] hash]{ 
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) ....lovsub@c000000c6a188b08[1]
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) ....osc@c000000d6758cbf0id: 0x0:19953434 idx: 107 gen: 0 kms_valid: 1 kms 1048576 rc: 0 force_sync: 0 min_xid: 0 size: 1048576 mtime: 1368036151 atime: 1368036151 ctime: 1368036151 blocks: 2048
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) } header@c000000c6a188a70
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) 
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) } header@c000000f4fd1c8f8
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) owned.
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:186:lov_init_sub()) header@c000000f4fd1c6e8[0x0, 1, [0x5ae58ebb30:0x27:0x0]]
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:186:lov_init_sub()) try to own.
      2013-05-08 11:02:31 LustreError: 7256:0:(llite_lib.c:2197:ll_prep_inode()) new_inode -fatal: rc -5
      2013-05-08 11:08:51 LustreError: 45085:0:(mdc_locks.c:143:mdc_set_lock_data()) ASSERTION( old_inode->i_state & I_FREEING ) failed: Found existing inode c000000d336808f8/6549372160627028526/1524894535 state 0 in lock: setting data to c000000c6abfdb78/252641838/1524894535
      2013-05-08 11:08:51 LustreError: 45085:0:(mdc_locks.c:143:mdc_set_lock_data()) LBUG
      2013-05-08 11:08:51 Call Trace:
      2013-05-08 11:08:51 [c000000ddf04f1d0] [c000000000012f04] .show_stack+0x74/0x1c0 (unreliable)
      2013-05-08 11:08:51 [c000000ddf04f280] [d00000000a9e0cb8] .libcfs_debug_dumpstack+0xd8/0x150 [libcfs]
      2013-05-08 11:08:51 [c000000ddf04f330] [d00000000a9e1480] .lbug_with_loc+0x50/0xc0 [libcfs]
      2013-05-08 11:08:51 [c000000ddf04f3c0] [d00000000c24fd2c] .mdc_set_lock_data+0x33c/0x340 [mdc]
      2013-05-08 11:08:51 [c000000ddf04f480] [d00000000c4a8674] .ll_lookup_it_finish+0xb34/0x1700 [lustre]
      2013-05-08 11:08:51 [c000000ddf04f5d0] [d00000000c4a96d8] .ll_lookup_it+0x498/0xfb0 [lustre]
      2013-05-08 11:08:51 [c000000ddf04f750] [d00000000c4aa4b4] .ll_lookup_nd+0x2c4/0x580 [lustre]
      2013-05-08 11:08:51 [c000000ddf04f810] [c0000000001d0614] .do_lookup+0x254/0x2d0
      2013-05-08 11:08:51 [c000000ddf04f8e0] [c0000000001d3348] .__link_path_walk+0x1f8/0x15a0
      2013-05-08 11:08:51 [c000000ddf04fa10] [c0000000001d4aa8] .path_walk+0x98/0x180
      2013-05-08 11:08:51 [c000000ddf04fab0] [c0000000001d4d9c] .do_path_lookup+0x7c/0xf0
      2013-05-08 11:08:51 [c000000ddf04fb40] [c0000000001d5b80] .user_path_at+0x60/0xb0
      2013-05-08 11:08:51 [c000000ddf04fc90] [c0000000001c9034] .vfs_fstatat+0x44/0xb0
      2013-05-08 11:08:51 [c000000ddf04fd30] [c0000000001c9274] .SyS_stat64+0x24/0x60
      2013-05-08 11:08:51 [c000000ddf04fe30] [c000000000008564] syscall_exit+0x0/0x40
      2013-05-08 11:08:51 Kernel panic - not syncing: LBUG
      2013-05-08 11:08:51 Call Trace:
      2013-05-08 11:08:51 [c000000ddf04f1f0] [c000000000012f04] .show_stack+0x74/0x1c0 (unreliable)
      2013-05-08 11:08:51 [c000000ddf04f2a0] [c0000000005c4f34] .panic+0xc4/0x1f8
      2013-05-08 11:08:51 [c000000ddf04f330] [d00000000a9e14e0] .lbug_with_loc+0xb0/0xc0 [libcfs]
      2013-05-08 11:08:51 [c000000ddf04f3c0] [d00000000c24fd2c] .mdc_set_lock_data+0x33c/0x340 [mdc]
      2013-05-08 11:08:51 [c000000ddf04f480] [d00000000c4a8674] .ll_lookup_it_finish+0xb34/0x1700 [lustre]
      2013-05-08 11:08:51 [c000000ddf04f5d0] [d00000000c4a96d8] .ll_lookup_it+0x498/0xfb0 [lustre]
      2013-05-08 11:08:51 [c000000ddf04f750] [d00000000c4aa4b4] .ll_lookup_nd+0x2c4/0x580 [lustre]
      2013-05-08 11:08:51 [c000000ddf04f810] [c0000000001d0614] .do_lookup+0x254/0x2d0
      2013-05-08 11:08:51 [c000000ddf04f8e0] [c0000000001d3348] .__link_path_walk+0x1f8/0x15a0
      2013-05-08 11:08:51 [c000000ddf04fa10] [c0000000001d4aa8] .path_walk+0x98/0x180
      2013-05-08 11:08:51 [c000000ddf04fab0] [c0000000001d4d9c] .do_path_lookup+0x7c/0xf0
      2013-05-08 11:08:51 [c000000ddf04fb40] [c0000000001d5b80] .user_path_at+0x60/0xb0
      2013-05-08 11:08:51 [c000000ddf04fc90] [c0000000001c9034] .vfs_fstatat+0x44/0xb0
      2013-05-08 11:08:51 [c000000ddf04fd30] [c0000000001c9274] .SyS_stat64+0x24/0x60
      2013-05-08 11:08:51 [c000000ddf04fe30] [c000000000008564] syscall_exit+0x0/0x40
      2013-05-08 11:08:51 May  8 11:08:51 rzuseqlac2 kernel: LustreError: 45085:0:(mdc_locks.c:143:mdc_set_lock_data()) ASSERTION( old_inode->i_state & I_FREEING ) failed: Found existing inode c000000d336808f8/6549372160627028526/1524894535 state 0 in lock: setting data to c000000c6abfdb78/252641838/1524894535
      2013-05-08 11:08:51 May  8 11:08:51 rzuseqlac2 kernel: LustreError: 45085:0:(mdc_locks.c:143:mdc_set_lock_data()) LBUG
      2013-05-08 11:08:51 May  8 11:08:51 rzuseqlac2 kernel: Kernel panic - not syncing: LBUG
      

      We have two crash dumps available for analysis.

      LLNL-bug-id: BG-165

    Attachments

    Issue Links

    Activity


            Ned, fwiw, my related case LU-3327 is definitely just a 32-bit app on a 64-bit client.

            thermeon Scott Sampson added a comment

            I'm having trouble understanding what exactly we think applications are doing to trigger this bug. For example, nasf said:

            we do not know how the 32-bit ino# will be used, and cannot prevent it from being passed back down to llite.

            How would an application pass an inode number back to llite? I'm not aware of any library or system calls that allow files to be accessed by inode number.

            Do we think it has to be the same process mixing 32-bit and 64-bit system calls, or is it problematic just to mix 32-bit and 64-bit processes that access the same files on the same client?

            nedbass Ned Bass (Inactive) added a comment

            Can we detect this case and return an error rather than asserting on it? Allowing a misbehaved application to crash the OS seems unacceptable.

            nedbass Ned Bass (Inactive) added a comment
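
            [Editor's note: a minimal userspace sketch of what Ned is suggesting. mdc_set_lock_data and I_FREEING are names from the real code; the structures, the flag value, and the choice of -EIO here are illustrative stand-ins, not the actual patch.]

            ```c
            #include <assert.h>
            #include <stdio.h>

            #ifndef I_FREEING
            #define I_FREEING 0x20   /* illustrative value, not the kernel's header */
            #endif
            #ifndef EIO
            #define EIO 5
            #endif

            /* Minimal stand-ins for the kernel structures involved. */
            struct inode { unsigned long i_state; unsigned long i_ino; };
            struct ldlm_lock { struct inode *l_ast_data; };

            /*
             * Sketch of mdc_set_lock_data() with the LASSERTF demoted to an
             * error return: if the lock already carries a live inode that
             * differs from the one being attached, report -EIO so the lookup
             * fails instead of LBUG'ing the whole node.
             */
            static int mdc_set_lock_data_sketch(struct ldlm_lock *lock,
                                                struct inode *inode)
            {
                struct inode *old = lock->l_ast_data;

                if (old != NULL && old != inode && !(old->i_state & I_FREEING)) {
                    fprintf(stderr, "stale lock data: ino %lu vs %lu\n",
                            old->i_ino, inode->i_ino);
                    return -EIO;  /* caller fails the lookup, no panic */
                }
                lock->l_ast_data = inode;
                return 0;
            }

            int main(void)
            {
                struct inode a = { .i_state = 0, .i_ino = 100 };
                struct inode b = { .i_state = 0, .i_ino = 200 };
                struct ldlm_lock lock = { .l_ast_data = &a };

                /* Conflicting live inode: error, no crash. */
                assert(mdc_set_lock_data_sketch(&lock, &b) == -EIO);

                /* Once the old inode is being freed, the update is allowed. */
                a.i_state |= I_FREEING;
                assert(mdc_set_lock_data_sketch(&lock, &b) == 0);
                assert(lock.l_ast_data == &b);
                return 0;
            }
            ```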

            Most applications do not use or care about the inode number. However, 32-bit applications on a 64-bit client will definitely get an error from stat() if the curproc check is removed. That is MUCH more likely to cause a problem for users than a very obscure case of someone mixing 32-bit and 64-bit applications that use the inode number on the same client and expecting them to work well together.

            This has been in all of the 2.x releases without any complaints, so I'd rather leave this alone.

            adilger Andreas Dilger added a comment
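
            [Editor's note: the stat() error Andreas mentions comes from the kernel's compat-stat path, which returns EOVERFLOW when a 64-bit st_ino cannot be represented in a 32-bit struct stat. A simplified sketch of that check, using the two inode numbers from the crash log above:]

            ```c
            #include <assert.h>
            #include <stdint.h>

            #ifndef EOVERFLOW
            #define EOVERFLOW 75  /* Linux errno value, shown for illustration */
            #endif

            /*
             * Simplified model of the overflow check a 32-bit stat() hits
             * when the kernel fills a 32-bit struct stat from a 64-bit
             * inode number (modeled loosely on cp_compat_stat()).
             */
            static int fill_compat_ino(uint64_t ino64, uint32_t *ino32)
            {
                if (ino64 > UINT32_MAX)
                    return -EOVERFLOW;  /* 32-bit stat() fails on this file */
                *ino32 = (uint32_t)ino64;
                return 0;
            }

            int main(void)
            {
                uint32_t ino32;

                /* The 64-bit ino from the crash log cannot be represented. */
                assert(fill_compat_ino(6549372160627028526ULL, &ino32) == -EOVERFLOW);

                /* The 32-bit flattened ino from the same log fits fine. */
                assert(fill_compat_ino(252641838ULL, &ino32) == 0);
                assert(ino32 == 252641838U);
                return 0;
            }
            ```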

            Mixing 64-bit ino# internally and 32-bit ino# externally for 32-bit applications may cause unexpected behavior, because we do not know how the 32-bit ino# will be used, and cannot prevent it from being passed back down to llite. So for a given Lustre client, use either all 32-bit ino# (on a 32-bit platform or via "-o 32bitapi") or all 64-bit ino#, NOT mixed.

            yong.fan nasf (Inactive) added a comment
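
            [Editor's note: the mixing nasf describes can be seen concretely in the two FID-flattening helpers. The sketch below is loosely modeled on Lustre's fid_flatten() and fid_flatten32() — the real bit-mixing differs — but the essential property holds: the same FID maps to two different inode numbers depending on which path is taken, so a client that mixes the schemes can end up with two in-core inodes for one file, which is what the assertion above tripped over.]

            ```c
            #include <assert.h>
            #include <stdint.h>
            #include <stdio.h>

            /* A Lustre FID: sequence, object id, version. */
            struct lu_fid { uint64_t f_seq; uint32_t f_oid; uint32_t f_ver; };

            /* Simplified stand-in for fid_flatten(): full 64-bit ino. */
            static uint64_t fid_flatten_sketch(const struct lu_fid *fid)
            {
                uint64_t ino = (fid->f_seq << 24) + fid->f_oid;
                return ino ? ino : fid->f_oid;
            }

            /* Simplified stand-in for fid_flatten32(): fold into 32 bits. */
            static uint32_t fid_flatten32_sketch(const struct lu_fid *fid)
            {
                uint32_t ino = (uint32_t)((fid->f_seq << 12) ^ (fid->f_seq >> 20)) +
                               fid->f_oid;
                return ino ? ino : fid->f_oid;
            }

            int main(void)
            {
                /* The FID from the crash log: [0x5ae58ebb30:0x27:0x0] */
                struct lu_fid fid = { .f_seq = 0x5ae58ebb30ULL, .f_oid = 0x27,
                                      .f_ver = 0 };

                uint64_t ino64 = fid_flatten_sketch(&fid);
                uint32_t ino32 = fid_flatten32_sketch(&fid);

                /* Two different "inode numbers" for one and the same file. */
                assert(ino64 != (uint64_t)ino32);
                printf("64-bit ino %llu, 32-bit ino %u\n",
                       (unsigned long long)ino64, ino32);
                return 0;
            }
            ```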
            di.wang Di Wang added a comment -

            Yes, I agree removing cfs_curproc_is_32bit() should fix this problem. http://review.whamcloud.com/#change,6371


            My preference would be that we use 64-bit inode numbers internally all the time, and only expose 32-bit inode numbers to the application. It looks like this is a defect introduced by http://review.whamcloud.com/5711 from LU-2904, since that is what introduced the use of ll_need_32bit_api(sbi) in ll_iget(). I suspect it might be OK to replace it with (sbi->ll_flags & LL_SBI_32BIT_API) so that the cfs_curproc_is_32bit() part is skipped in this case.

            adilger Andreas Dilger added a comment
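
            [Editor's note: the distinction Andreas draws can be sketched as below. ll_need_32bit_api() and LL_SBI_32BIT_API are names from the real code; cfs_curproc_is_32bit() is stubbed here and the flag value is illustrative.]

            ```c
            #include <assert.h>

            #define LL_SBI_32BIT_API 0x1  /* illustrative flag value */

            struct ll_sb_info { unsigned int ll_flags; };

            /* Stub: pretend the current process is a 32-bit one. */
            static int cfs_curproc_is_32bit(void) { return 1; }

            /*
             * Current check: per-process, so a 64-bit and a 32-bit process
             * on the same client can instantiate the same file under two
             * different inode numbers.
             */
            static int ll_need_32bit_api_cur(struct ll_sb_info *sbi)
            {
                return (sbi->ll_flags & LL_SBI_32BIT_API) || cfs_curproc_is_32bit();
            }

            /*
             * Proposed check: per-mount only ("-o 32bitapi"), so every
             * process on the client sees the same inode-number scheme.
             */
            static int ll_need_32bit_api_fixed(struct ll_sb_info *sbi)
            {
                return (sbi->ll_flags & LL_SBI_32BIT_API) != 0;
            }

            int main(void)
            {
                struct ll_sb_info sbi = { .ll_flags = 0 };  /* no 32bitapi mount */

                /* A 32-bit process flips the answer under the old check... */
                assert(ll_need_32bit_api_cur(&sbi) == 1);
                /* ...but not under the per-mount check. */
                assert(ll_need_32bit_api_fixed(&sbi) == 0);
                return 0;
            }
            ```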
            di.wang Di Wang added a comment -

            Hmm, how about in prep_inode we always try to locate the inode (via ilookup) by the ino created by fid_flatten32 first; if that finds no inode associated with the ino, we prepare the inode in the normal way. Since 32-bit applications are probably rare these days, we could add a flag in the superblock, set it whenever a 32-bit application runs, and only do the fid_flatten32 check when that flag is set. Though this method makes things a bit more complicated, and would also slow down client metadata performance once a 32-bit application has been running.

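
            [Editor's note: a toy userspace simulation of Di Wang's proposal. The inode cache, prep_inode_sketch(), and the flag are all hypothetical stand-ins for the kernel's icache, ll_prep_inode(), and the suggested superblock flag; only the control flow is meant to match the idea.]

            ```c
            #include <assert.h>
            #include <stdint.h>

            /* Toy in-core inode cache standing in for the kernel's icache. */
            struct toy_inode { uint64_t i_ino; int used; };
            static struct toy_inode icache[8];

            /* Stand-in for ilookup(): find an existing inode, or NULL. */
            static struct toy_inode *toy_ilookup(uint64_t ino)
            {
                for (int i = 0; i < 8; i++)
                    if (icache[i].used && icache[i].i_ino == ino)
                        return &icache[i];
                return 0;
            }

            /* Stand-in for iget(): find or create an inode under ino. */
            static struct toy_inode *toy_iget(uint64_t ino)
            {
                struct toy_inode *in = toy_ilookup(ino);
                if (in)
                    return in;
                for (int i = 0; i < 8; i++)
                    if (!icache[i].used) {
                        icache[i].i_ino = ino;
                        icache[i].used = 1;
                        return &icache[i];
                    }
                return 0;
            }

            /*
             * The proposal, sketched: before building an inode under its
             * 64-bit ino, first probe the cache for an alias under the
             * 32-bit flattened ino and reuse it, so one file never gets
             * two in-core inodes. The probe only runs when the flag says
             * a 32-bit application has been seen.
             */
            static struct toy_inode *prep_inode_sketch(uint64_t ino64,
                                                       uint32_t ino32,
                                                       int have_32bit_users)
            {
                if (have_32bit_users) {
                    struct toy_inode *in = toy_ilookup(ino32);
                    if (in)
                        return in;  /* reuse the 32-bit-ino alias */
                }
                return toy_iget(ino64);
            }

            int main(void)
            {
                /* A 32-bit app already instantiated the file (inos from the log). */
                struct toy_inode *a = toy_iget(252641838);

                /* A later 64-bit lookup of the same file reuses it instead of
                 * creating a duplicate (the duplicate is what tripped the LASSERT). */
                struct toy_inode *b = prep_inode_sketch(6549372160627028526ULL,
                                                        252641838, 1);
                assert(a == b);

                /* With the flag clear, we would have gotten the duplicate. */
                struct toy_inode *c = prep_inode_sketch(6549372160627028526ULL,
                                                        252641838, 0);
                assert(c != a);
                return 0;
            }
            ```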

            Hi Andreas, why is 32-bit inode number still needed by internal code?

            jay Jinshan Xiong (Inactive) added a comment

            Jinshan,
            this isn't necessarily a problem with the NFS file handle, but rather some part of the internal code is using the 32-bit inode number AND the 64-bit inode number internally for some comparison, when it shouldn't be doing this.

            adilger Andreas Dilger added a comment

            The problem is whether some 32-bit user (like NFS?) will use the ino directly: if we truncate the 64-bit inode number to 32 bits and return it to the user, the user might not be able to get the inode back with that 32-bit ino.

            I thought we pack the FID into the NFS file handle. I can't think of a use case for using the ino directly - but apparently I'm not an expert on this.

            jay Jinshan Xiong (Inactive) added a comment

            People

              Assignee: di.wang Di Wang
              Reporter: nedbass Ned Bass (Inactive)
              Votes: 0
              Watchers: 12
