Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-3318

mdc_set_lock_data() ASSERTION( old_inode->i_state & I_FREEING )

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.4.0, Lustre 2.5.0
    • Lustre 2.4.0
    • None
    • 2.3.64 ppc64 client
      2.1.4 x86_64 server
    • 3
    • 8220

    Description

      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:184:lov_init_sub()) header@c000000c6a188980[0x0, 28, [0x100f00000:0x131a05a:0x0] hash]{ 
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:184:lov_init_sub()) ....lovsub@c000000c6a188a18[0]
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:184:lov_init_sub()) ....osc@c000000d6758cad0id: 0x0:20029530 idx: 240 gen: 0 kms_valid: 1 kms 1618944 
      rc: 0 force_sync: 0 min_xid: 0 size: 1618944 mtime: 1368036151 atime: 1368036151 ctime: 1368036150 blocks: 2048
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:184:lov_init_sub()) } header@c000000c6a188980
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:184:lov_init_sub()) stripe 0 is already owned.
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) header@c000000f4fd1c8f8[0x0, 45, [0x5ae58ebb30:0x27:0x0] hash]{ 
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) ....vvp@c000000f4fd1c990(+ 0 0) inode: c000000c6a513178 6549798167695589415/1524993723 100666 1 0 c000000f4fd1c990 [0x5ae58ebb30:0x27:0x0]
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) ....lov@c000000c562bdfd8stripes: 2, valid, lsm{c000000c56b65b00 0x0BD10BD0 1 2 0}: 
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) header@c000000c6a188980[0x0, 28, [0x100f00000:0x131a05a:0x0] hash]{ 
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) ....lovsub@c000000c6a188a18[0]
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) ....osc@c000000d6758cad0id: 0x0:20029530 idx: 240 gen: 0 kms_valid: 1 kms 1618944 rc: 0 force_sync: 0 min_xid: 0 size: 1618944 mtime: 1368036151 atime: 1368036151 ctime: 1368036150 blocks: 2048
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) } header@c000000c6a188980
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) header@c000000c6a188a70[0x0, 18, [0x1006b0000:0x130771a:0x0] hash]{ 
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) ....lovsub@c000000c6a188b08[1]
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) ....osc@c000000d6758cbf0id: 0x0:19953434 idx: 107 gen: 0 kms_valid: 1 kms 1048576 rc: 0 force_sync: 0 min_xid: 0 size: 1048576 mtime: 1368036151 atime: 1368036151 ctime: 1368036151 blocks: 2048
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) } header@c000000c6a188a70
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) 
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) } header@c000000f4fd1c8f8
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) owned.
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:186:lov_init_sub()) header@c000000f4fd1c7f0[0x0, 1, [0x5ae58ebb30:0x27:0x0]]
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:186:lov_init_sub()) try to own.
      2013-05-08 11:02:31 LustreError: 7256:0:(lcommon_cl.c:1201:cl_file_inode_init()) Failure to initialize cl object [0x5ae58ebb30:0x27:0x0]: -5
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:184:lov_init_sub()) header@c000000c6a188980[0x0, 28, [0x100f00000:0x131a05a:0x0] hash]{ 
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:184:lov_init_sub()) ....lovsub@c000000c6a188a18[0]
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:184:lov_init_sub()) ....osc@c000000d6758cad0id: 0x0:20029530 idx: 240 gen: 0 kms_valid: 1 kms 1618944 rc: 0 force_sync: 0 min_xid: 0 size: 1618944 mtime: 1368036151 atime: 1368036151 ctime: 1368036150 blocks: 2048
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:184:lov_init_sub()) } header@c000000c6a188980
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:184:lov_init_sub()) stripe 0 is already owned.
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) header@c000000f4fd1c8f8[0x0, 45, [0x5ae58ebb30:0x27:0x0] hash]{ 
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) ....vvp@c000000f4fd1c990(+ 0 0) inode: c000000c6a513178 6549798167695589415/1524993723 100666 1 0 c000000f4fd1c990 [0x5ae58ebb30:0x27:0x0]
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) ....lov@c000000c562bdfd8stripes: 2, valid, lsm{c000000c56b65b00 0x0BD10BD0 1 2 0}: 
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) header@c000000c6a188980[0x0, 28, [0x100f00000:0x131a05a:0x0] hash]{ 
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) ....lovsub@c000000c6a188a18[0]
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) ....osc@c000000d6758cad0id: 0x0:20029530 idx: 240 gen: 0 kms_valid: 1 kms 1618944 rc: 0 force_sync: 0 min_xid: 0 size: 1618944 mtime: 1368036151 atime: 1368036151 ctime: 1368036150 blocks: 2048
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) } header@c000000c6a188980
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) header@c000000c6a188a70[0x0, 18, [0x1006b0000:0x130771a:0x0] hash]{ 
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) ....lovsub@c000000c6a188b08[1]
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) ....osc@c000000d6758cbf0id: 0x0:19953434 idx: 107 gen: 0 kms_valid: 1 kms 1048576 rc: 0 force_sync: 0 min_xid: 0 size: 1048576 mtime: 1368036151 atime: 1368036151 ctime: 1368036151 blocks: 2048
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) } header@c000000c6a188a70
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) 
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) } header@c000000f4fd1c8f8
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) owned.
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:186:lov_init_sub()) header@c000000f4fd1c6e8[0x0, 1, [0x5ae58ebb30:0x27:0x0]]
      2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:186:lov_init_sub()) try to own.
      2013-05-08 11:02:31 LustreError: 7256:0:(llite_lib.c:2197:ll_prep_inode()) new_inode -fatal: rc -5
      2013-05-08 11:08:51 LustreError: 45085:0:(mdc_locks.c:143:mdc_set_lock_data()) ASSERTION( old_inode->i_state & I_FREEING ) failed: Found existing inode c000000d336808f8/6549372160627028526/1524894535 state 0 in lock: setting data to c000000c6abfdb78/252641838/1524894535
      2013-05-08 11:08:51 LustreError: 45085:0:(mdc_locks.c:143:mdc_set_lock_data()) LBUG
      2013-05-08 11:08:51 Call Trace:
      2013-05-08 11:08:51 [c000000ddf04f1d0] [c000000000012f04] .show_stack+0x74/0x1c0 (unreliable)
      2013-05-08 11:08:51 [c000000ddf04f280] [d00000000a9e0cb8] .libcfs_debug_dumpstack+0xd8/0x150 [libcfs]
      2013-05-08 11:08:51 [c000000ddf04f330] [d00000000a9e1480] .lbug_with_loc+0x50/0xc0 [libcfs]
      2013-05-08 11:08:51 [c000000ddf04f3c0] [d00000000c24fd2c] .mdc_set_lock_data+0x33c/0x340 [mdc]
      2013-05-08 11:08:51 [c000000ddf04f480] [d00000000c4a8674] .ll_lookup_it_finish+0xb34/0x1700 [lustre]
      2013-05-08 11:08:51 [c000000ddf04f5d0] [d00000000c4a96d8] .ll_lookup_it+0x498/0xfb0 [lustre]
      2013-05-08 11:08:51 [c000000ddf04f750] [d00000000c4aa4b4] .ll_lookup_nd+0x2c4/0x580 [lustre]
      2013-05-08 11:08:51 [c000000ddf04f810] [c0000000001d0614] .do_lookup+0x254/0x2d0
      2013-05-08 11:08:51 [c000000ddf04f8e0] [c0000000001d3348] .__link_path_walk+0x1f8/0x15a0
      2013-05-08 11:08:51 [c000000ddf04fa10] [c0000000001d4aa8] .path_walk+0x98/0x180
      2013-05-08 11:08:51 [c000000ddf04fab0] [c0000000001d4d9c] .do_path_lookup+0x7c/0xf0
      2013-05-08 11:08:51 [c000000ddf04fb40] [c0000000001d5b80] .user_path_at+0x60/0xb0
      2013-05-08 11:08:51 [c000000ddf04fc90] [c0000000001c9034] .vfs_fstatat+0x44/0xb0
      2013-05-08 11:08:51 [c000000ddf04fd30] [c0000000001c9274] .SyS_stat64+0x24/0x60
      2013-05-08 11:08:51 [c000000ddf04fe30] [c000000000008564] syscall_exit+0x0/0x40
      2013-05-08 11:08:51 Kernel panic - not syncing: LBUG
      2013-05-08 11:08:51 Call Trace:
      2013-05-08 11:08:51 [c000000ddf04f1f0] [c000000000012f04] .show_stack+0x74/0x1c0 (unreliable)
      2013-05-08 11:08:51 [c000000ddf04f2a0] [c0000000005c4f34] .panic+0xc4/0x1f8
      2013-05-08 11:08:51 [c000000ddf04f330] [d00000000a9e14e0] .lbug_with_loc+0xb0/0xc0 [libcfs]
      2013-05-08 11:08:51 [c000000ddf04f3c0] [d00000000c24fd2c] .mdc_set_lock_data+0x33c/0x340 [mdc]
      2013-05-08 11:08:51 [c000000ddf04f480] [d00000000c4a8674] .ll_lookup_it_finish+0xb34/0x1700 [lustre]
      2013-05-08 11:08:51 [c000000ddf04f5d0] [d00000000c4a96d8] .ll_lookup_it+0x498/0xfb0 [lustre]
      2013-05-08 11:08:51 [c000000ddf04f750] [d00000000c4aa4b4] .ll_lookup_nd+0x2c4/0x580 [lustre]
      2013-05-08 11:08:51 [c000000ddf04f810] [c0000000001d0614] .do_lookup+0x254/0x2d0
      2013-05-08 11:08:51 [c000000ddf04f8e0] [c0000000001d3348] .__link_path_walk+0x1f8/0x15a0
      2013-05-08 11:08:51 [c000000ddf04fa10] [c0000000001d4aa8] .path_walk+0x98/0x180
      2013-05-08 11:08:51 [c000000ddf04fab0] [c0000000001d4d9c] .do_path_lookup+0x7c/0xf0
      2013-05-08 11:08:51 [c000000ddf04fb40] [c0000000001d5b80] .user_path_at+0x60/0xb0
      2013-05-08 11:08:51 [c000000ddf04fc90] [c0000000001c9034] .vfs_fstatat+0x44/0xb0
      2013-05-08 11:08:51 [c000000ddf04fd30] [c0000000001c9274] .SyS_stat64+0x24/0x60
      2013-05-08 11:08:51 [c000000ddf04fe30] [c000000000008564] syscall_exit+0x0/0x40
      2013-05-08 11:08:51 May  8 11:08:51 rzuseqlac2 kernel: LustreError: 45085:0:(mdc_locks.c:143:mdc_set_lock_data()) ASSERTION( old_inode->i_state & I_FREEING ) failed: Found existing inode c000000d336808f8/6549372160627028526/1524894535 state 0 in lock: setting data to c000000c6abfdb78/252641838/1524894535
      2013-05-08 11:08:51 May  8 11:08:51 rzuseqlac2 kernel: LustreError: 45085:0:(mdc_locks.c:143:mdc_set_lock_data()) LBUG
      2013-05-08 11:08:51 May  8 11:08:51 rzuseqlac2 kernel: Kernel panic - not syncing: LBUG
      

      We have two crash dumps available for analysis.

      LLNL-bug-id: BG-165

      Attachments

        Issue Links

          Activity

            [LU-3318] mdc_set_lock_data() ASSERTION( old_inode->i_state & I_FREEING )

            Hi Andreas, why is 32-bit inode number still needed by internal code?

            jay Jinshan Xiong (Inactive) added a comment - Hi Andreas, why is 32-bit inode number still needed by internal code?

            Jinshan,
            this isn't necessarily a problem with the NFS file handle, but rather some part of the internal code is using the 32-bit inode number AND the 64-bit inode number internally for some comparison, when it shouldn't be doing this.

            adilger Andreas Dilger added a comment - Jinshan, this isn't necessarily a problem with the NFS file handle, but rather some part of the internal code is using the 32-bit inode number AND the 64-bit inode number internally for some comparison, when it shouldn't be doing this.

            The problem is that if some 32bit user(like NFS ?) will use ino directly? if we cut 64 bit inode ino to 32 bit and return it to the user, which might not be able to get back the inode with this 32 bit ino?

            I thought we pack FID into NFS file handle. I can't think of an use case of using ino directly - but apparently I'm not an expert about this

            jay Jinshan Xiong (Inactive) added a comment - The problem is that if some 32bit user(like NFS ?) will use ino directly? if we cut 64 bit inode ino to 32 bit and return it to the user, which might not be able to get back the inode with this 32 bit ino? I thought we pack FID into NFS file handle. I can't think of an use case of using ino directly - but apparently I'm not an expert about this

            I can confirm that LU-3327 involves a 32bit application on a 64bit client.

            thermeon Scott Sampson added a comment - I can confirm that LU-3327 involves a 32bit application on a 64bit client.

            This was discussed in the past, and the assumption was made that no applications were going to access both 32-bit and 64-bit inode numbers at the same time. The inode number should not be stored or used internally, but rather the FID should always be used for internal access, and the inode number should always be regenerated in the context of the user thread that is doing the stat() or whatever needs it.

            adilger Andreas Dilger added a comment - This was discussed in the past, and the assumption was made that no applications were going to access both 32-bit and 64-bit inode numbers at the same time. The inode number should not be stored or used internally, but rather the FID should always be used for internal access, and the inode number should always be regenerated in the context of the user thread that is doing the stat() or whatever needs it.
            di.wang Di Wang added a comment - - edited

            The problem is that if some 32bit user(like NFS ?) will use ino directly? if we cut 64 bit inode ino to 32 bit and return it to the user, which might not be able to get back the inode with this 32 bit ino?

            di.wang Di Wang added a comment - - edited The problem is that if some 32bit user(like NFS ?) will use ino directly? if we cut 64 bit inode ino to 32 bit and return it to the user, which might not be able to get back the inode with this 32 bit ino?

            We should always use `unsigned long' in kernel space. The only concern for 32bit compatibility is when we're filling in the user buffer. So in this case for ll_iget(), we should call cl_fid_build_ino() with api32 = 0.

            jay Jinshan Xiong (Inactive) added a comment - We should always use `unsigned long' in kernel space. The only concern for 32bit compatibility is when we're filling in the user buffer. So in this case for ll_iget(), we should call cl_fid_build_ino() with api32 = 0.

            Is that possible those 32 bits application running on these ppc 64-bit clients?

            We confirmed that it is possible to build and run 32-bit apps on the ppc64 client that hit this bug.

            nedbass Ned Bass (Inactive) added a comment - Is that possible those 32 bits application running on these ppc 64-bit clients? We confirmed that it is possible to build and run 32-bit apps on the ppc64 client that hit this bug.
            di.wang Di Wang added a comment -

            Just tried to figure out the FID from these two inodes in the error message. The FID should be [0x5ae40b47b0: 0x1b22e: 0].

            So fid_flatten( [0x5ae40b47b0: 0x1b22e: 0]) = 6549372160627028526 and fid_flatten32([0x5ae40b47b0: 0x1b22e: 0]) = 252641838

            which just match the error message and confirm this guess

            2013-05-08 11:08:51 LustreError: 45085:0:(mdc_locks.c:143:mdc_set_lock_data()) ASSERTION( old_inode->i_state & I_FREEING ) failed: Found existing inode c000000d336808f8/6549372160627028526/1524894535 state 0 in lock: setting data to c000000c6abfdb78/252641838/1524894535
            2013-05-08 11:08:51 LustreError: 45085:0:(mdc_locks.c:143:mdc_set_lock_data()) LBUG
            

            But not sure how to fix this.

            di.wang Di Wang added a comment - Just tried to figure out the FID from these two inodes in the error message. The FID should be [0x5ae40b47b0: 0x1b22e: 0] . So fid_flatten( [0x5ae40b47b0: 0x1b22e: 0] ) = 6549372160627028526 and fid_flatten32( [0x5ae40b47b0: 0x1b22e: 0] ) = 252641838 which just match the error message and confirm this guess 2013-05-08 11:08:51 LustreError: 45085:0:(mdc_locks.c:143:mdc_set_lock_data()) ASSERTION( old_inode->i_state & I_FREEING ) failed: Found existing inode c000000d336808f8/6549372160627028526/1524894535 state 0 in lock: setting data to c000000c6abfdb78/252641838/1524894535 2013-05-08 11:08:51 LustreError: 45085:0:(mdc_locks.c:143:mdc_set_lock_data()) LBUG But not sure how to fix this.
            di.wang Di Wang added a comment -

            Is that possible those 32 bits application running on these ppc 64-bit clients? And also, is it possible to umount clients and erase the config log with tunefs, and remount clients, so lmv layer can be created on client side, which will slow down the sequence consume speed.

            di.wang Di Wang added a comment - Is that possible those 32 bits application running on these ppc 64-bit clients? And also, is it possible to umount clients and erase the config log with tunefs, and remount clients, so lmv layer can be created on client side, which will slow down the sequence consume speed.

            We didn't upgrade all of our clients to 2.1.4 because we paused the rollout due to LU-3029. Also, this file system (lscratchrza) has ppc 32-bit clients mounting it right now (rzdawndev a BG/P system), and there would be 32-bit applications writing to the file system. These clients are not the same clients that generated this ASSERTION, those are ppc 64-bit clients.

            marc@llnl.gov D. Marc Stearman (Inactive) added a comment - We didn't upgrade all of our clients to 2.1.4 because we paused the rollout due to LU-3029 . Also, this file system (lscratchrza) has ppc 32-bit clients mounting it right now (rzdawndev a BG/P system), and there would be 32-bit applications writing to the file system. These clients are not the same clients that generated this ASSERTION, those are ppc 64-bit clients.

            People

              di.wang Di Wang
              nedbass Ned Bass (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              12 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: