[LU-3318] mdc_set_lock_data() ASSERTION( old_inode->i_state & I_FREEING ) Created: 10/May/13 Updated: 07/Nov/13 Resolved: 31/May/13 |
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0 |
| Fix Version/s: | Lustre 2.4.0, Lustre 2.5.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Ned Bass | Assignee: | Di Wang |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Environment: | 2.3.64 ppc64 client |
| Attachments: | |
| Issue Links: | |
| Severity: | 3 |
| Rank (Obsolete): | 8220 |
| Description |
2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:184:lov_init_sub()) header@c000000c6a188980[0x0, 28, [0x100f00000:0x131a05a:0x0] hash]{
2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:184:lov_init_sub()) ....lovsub@c000000c6a188a18[0]
2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:184:lov_init_sub()) ....osc@c000000d6758cad0id: 0x0:20029530 idx: 240 gen: 0 kms_valid: 1 kms 1618944 rc: 0 force_sync: 0 min_xid: 0 size: 1618944 mtime: 1368036151 atime: 1368036151 ctime: 1368036150 blocks: 2048
2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:184:lov_init_sub()) } header@c000000c6a188980
2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:184:lov_init_sub()) stripe 0 is already owned.
2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) header@c000000f4fd1c8f8[0x0, 45, [0x5ae58ebb30:0x27:0x0] hash]{
2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) ....vvp@c000000f4fd1c990(+ 0 0) inode: c000000c6a513178 6549798167695589415/1524993723 100666 1 0 c000000f4fd1c990 [0x5ae58ebb30:0x27:0x0]
2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) ....lov@c000000c562bdfd8stripes: 2, valid, lsm{c000000c56b65b00 0x0BD10BD0 1 2 0}:
2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) header@c000000c6a188980[0x0, 28, [0x100f00000:0x131a05a:0x0] hash]{
2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) ....lovsub@c000000c6a188a18[0]
2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) ....osc@c000000d6758cad0id: 0x0:20029530 idx: 240 gen: 0 kms_valid: 1 kms 1618944 rc: 0 force_sync: 0 min_xid: 0 size: 1618944 mtime: 1368036151 atime: 1368036151 ctime: 1368036150 blocks: 2048
2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) } header@c000000c6a188980
2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) header@c000000c6a188a70[0x0, 18, [0x1006b0000:0x130771a:0x0] hash]{
2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) ....lovsub@c000000c6a188b08[1]
2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) ....osc@c000000d6758cbf0id: 0x0:19953434 idx: 107 gen: 0 kms_valid: 1 kms 1048576 rc: 0 force_sync: 0 min_xid: 0 size: 1048576 mtime: 1368036151 atime: 1368036151 ctime: 1368036151 blocks: 2048
2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) } header@c000000c6a188a70
2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub())
2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) } header@c000000f4fd1c8f8
2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) owned.
2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:186:lov_init_sub()) header@c000000f4fd1c7f0[0x0, 1, [0x5ae58ebb30:0x27:0x0]]
2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:186:lov_init_sub()) try to own.
2013-05-08 11:02:31 LustreError: 7256:0:(lcommon_cl.c:1201:cl_file_inode_init()) Failure to initialize cl object [0x5ae58ebb30:0x27:0x0]: -5
2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:184:lov_init_sub()) header@c000000c6a188980[0x0, 28, [0x100f00000:0x131a05a:0x0] hash]{
2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:184:lov_init_sub()) ....lovsub@c000000c6a188a18[0]
2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:184:lov_init_sub()) ....osc@c000000d6758cad0id: 0x0:20029530 idx: 240 gen: 0 kms_valid: 1 kms 1618944 rc: 0 force_sync: 0 min_xid: 0 size: 1618944 mtime: 1368036151 atime: 1368036151 ctime: 1368036150 blocks: 2048
2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:184:lov_init_sub()) } header@c000000c6a188980
2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:184:lov_init_sub()) stripe 0 is already owned.
2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) header@c000000f4fd1c8f8[0x0, 45, [0x5ae58ebb30:0x27:0x0] hash]{
2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) ....vvp@c000000f4fd1c990(+ 0 0) inode: c000000c6a513178 6549798167695589415/1524993723 100666 1 0 c000000f4fd1c990 [0x5ae58ebb30:0x27:0x0]
2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) ....lov@c000000c562bdfd8stripes: 2, valid, lsm{c000000c56b65b00 0x0BD10BD0 1 2 0}:
2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) header@c000000c6a188980[0x0, 28, [0x100f00000:0x131a05a:0x0] hash]{
2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) ....lovsub@c000000c6a188a18[0]
2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) ....osc@c000000d6758cad0id: 0x0:20029530 idx: 240 gen: 0 kms_valid: 1 kms 1618944 rc: 0 force_sync: 0 min_xid: 0 size: 1618944 mtime: 1368036151 atime: 1368036151 ctime: 1368036150 blocks: 2048
2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) } header@c000000c6a188980
2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) header@c000000c6a188a70[0x0, 18, [0x1006b0000:0x130771a:0x0] hash]{
2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) ....lovsub@c000000c6a188b08[1]
2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) ....osc@c000000d6758cbf0id: 0x0:19953434 idx: 107 gen: 0 kms_valid: 1 kms 1048576 rc: 0 force_sync: 0 min_xid: 0 size: 1048576 mtime: 1368036151 atime: 1368036151 ctime: 1368036151 blocks: 2048
2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) } header@c000000c6a188a70
2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub())
2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) } header@c000000f4fd1c8f8
2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:185:lov_init_sub()) owned.
2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:186:lov_init_sub()) header@c000000f4fd1c6e8[0x0, 1, [0x5ae58ebb30:0x27:0x0]]
2013-05-08 11:02:31 LustreError: 7256:0:(lov_object.c:186:lov_init_sub()) try to own.
2013-05-08 11:02:31 LustreError: 7256:0:(llite_lib.c:2197:ll_prep_inode()) new_inode -fatal: rc -5
2013-05-08 11:08:51 LustreError: 45085:0:(mdc_locks.c:143:mdc_set_lock_data()) ASSERTION( old_inode->i_state & I_FREEING ) failed: Found existing inode c000000d336808f8/6549372160627028526/1524894535 state 0 in lock: setting data to c000000c6abfdb78/252641838/1524894535
2013-05-08 11:08:51 LustreError: 45085:0:(mdc_locks.c:143:mdc_set_lock_data()) LBUG
2013-05-08 11:08:51 Call Trace:
2013-05-08 11:08:51 [c000000ddf04f1d0] [c000000000012f04] .show_stack+0x74/0x1c0 (unreliable)
2013-05-08 11:08:51 [c000000ddf04f280] [d00000000a9e0cb8] .libcfs_debug_dumpstack+0xd8/0x150 [libcfs]
2013-05-08 11:08:51 [c000000ddf04f330] [d00000000a9e1480] .lbug_with_loc+0x50/0xc0 [libcfs]
2013-05-08 11:08:51 [c000000ddf04f3c0] [d00000000c24fd2c] .mdc_set_lock_data+0x33c/0x340 [mdc]
2013-05-08 11:08:51 [c000000ddf04f480] [d00000000c4a8674] .ll_lookup_it_finish+0xb34/0x1700 [lustre]
2013-05-08 11:08:51 [c000000ddf04f5d0] [d00000000c4a96d8] .ll_lookup_it+0x498/0xfb0 [lustre]
2013-05-08 11:08:51 [c000000ddf04f750] [d00000000c4aa4b4] .ll_lookup_nd+0x2c4/0x580 [lustre]
2013-05-08 11:08:51 [c000000ddf04f810] [c0000000001d0614] .do_lookup+0x254/0x2d0
2013-05-08 11:08:51 [c000000ddf04f8e0] [c0000000001d3348] .__link_path_walk+0x1f8/0x15a0
2013-05-08 11:08:51 [c000000ddf04fa10] [c0000000001d4aa8] .path_walk+0x98/0x180
2013-05-08 11:08:51 [c000000ddf04fab0] [c0000000001d4d9c] .do_path_lookup+0x7c/0xf0
2013-05-08 11:08:51 [c000000ddf04fb40] [c0000000001d5b80] .user_path_at+0x60/0xb0
2013-05-08 11:08:51 [c000000ddf04fc90] [c0000000001c9034] .vfs_fstatat+0x44/0xb0
2013-05-08 11:08:51 [c000000ddf04fd30] [c0000000001c9274] .SyS_stat64+0x24/0x60
2013-05-08 11:08:51 [c000000ddf04fe30] [c000000000008564] syscall_exit+0x0/0x40
2013-05-08 11:08:51 Kernel panic - not syncing: LBUG
2013-05-08 11:08:51 Call Trace:
2013-05-08 11:08:51 [c000000ddf04f1f0] [c000000000012f04] .show_stack+0x74/0x1c0 (unreliable)
2013-05-08 11:08:51 [c000000ddf04f2a0] [c0000000005c4f34] .panic+0xc4/0x1f8
2013-05-08 11:08:51 [c000000ddf04f330] [d00000000a9e14e0] .lbug_with_loc+0xb0/0xc0 [libcfs]
2013-05-08 11:08:51 [c000000ddf04f3c0] [d00000000c24fd2c] .mdc_set_lock_data+0x33c/0x340 [mdc]
2013-05-08 11:08:51 [c000000ddf04f480] [d00000000c4a8674] .ll_lookup_it_finish+0xb34/0x1700 [lustre]
2013-05-08 11:08:51 [c000000ddf04f5d0] [d00000000c4a96d8] .ll_lookup_it+0x498/0xfb0 [lustre]
2013-05-08 11:08:51 [c000000ddf04f750] [d00000000c4aa4b4] .ll_lookup_nd+0x2c4/0x580 [lustre]
2013-05-08 11:08:51 [c000000ddf04f810] [c0000000001d0614] .do_lookup+0x254/0x2d0
2013-05-08 11:08:51 [c000000ddf04f8e0] [c0000000001d3348] .__link_path_walk+0x1f8/0x15a0
2013-05-08 11:08:51 [c000000ddf04fa10] [c0000000001d4aa8] .path_walk+0x98/0x180
2013-05-08 11:08:51 [c000000ddf04fab0] [c0000000001d4d9c] .do_path_lookup+0x7c/0xf0
2013-05-08 11:08:51 [c000000ddf04fb40] [c0000000001d5b80] .user_path_at+0x60/0xb0
2013-05-08 11:08:51 [c000000ddf04fc90] [c0000000001c9034] .vfs_fstatat+0x44/0xb0
2013-05-08 11:08:51 [c000000ddf04fd30] [c0000000001c9274] .SyS_stat64+0x24/0x60
2013-05-08 11:08:51 [c000000ddf04fe30] [c000000000008564] syscall_exit+0x0/0x40
2013-05-08 11:08:51 May 8 11:08:51 rzuseqlac2 kernel: LustreError: 45085:0:(mdc_locks.c:143:mdc_set_lock_data()) ASSERTION( old_inode->i_state & I_FREEING ) failed: Found existing inode c000000d336808f8/6549372160627028526/1524894535 state 0 in lock: setting data to c000000c6abfdb78/252641838/1524894535
2013-05-08 11:08:51 May 8 11:08:51 rzuseqlac2 kernel: LustreError: 45085:0:(mdc_locks.c:143:mdc_set_lock_data()) LBUG
2013-05-08 11:08:51 May 8 11:08:51 rzuseqlac2 kernel: Kernel panic - not syncing: LBUG
We have two crash dumps available for analysis. LLNL-bug-id: BG-165 |
| Comments |
| Comment by Jinshan Xiong (Inactive) [ 10/May/13 ] |
|
Is this another occurrence of |
| Comment by Peter Jones [ 10/May/13 ] |
|
Di, can you comment whether this is the same issue as ? Thanks, Peter |
| Comment by Ned Bass [ 10/May/13 ] |
|
These crashes occurred shortly after updating the PPC client to Lustre 2.3.64. Prior to this we were running 2.3.58 on the client and never hit this bug. |
| Comment by Di Wang [ 12/May/13 ] |
|
Oh, the server version is 2.1.4, which does not have the OI cache yet. So it should not be 3190, IMHO. |
| Comment by Di Wang [ 12/May/13 ] |
|
Is it easy to reproduce the problem? Could you upload the crash dump somewhere? Hmm, the inode number looks unreal.
2013-05-08 11:08:51 LustreError: 45085:0:(mdc_locks.c:143:mdc_set_lock_data()) ASSERTION( old_inode->i_state & I_FREEING ) failed: Found existing inode c000000d336808f8/6549372160627028526/1524894535 state 0 in lock: setting data to c000000c6abfdb78/252641838/1524894535
2013-05-08 11:08:51 LustreError: 45085:0:(mdc_locks.c:143:mdc_set_lock_data()) LBUG
I do not know whether those CLIO error messages are harmful. Jinshan, could you please comment? |
| Comment by Ned Bass [ 13/May/13 ] |
|
Di, see ftp://ftp.whamcloud.com/uploads/LU-3318.vmcore.tar.bz2 No, we don't know how to easily reproduce the problem. |
| Comment by Andreas Dilger [ 13/May/13 ] |
|
Ned, is this on a filesystem that was upgraded from 1.8 (or earlier) originally? Di, it looks like the lov_mds_md.lmm_oi is getting an IDIF FID "[0x100f00000:0x131a05a:0x0]", which doesn't make sense. Maybe it is for old 1.8 lov_mds_md that used "seq = 0" instead of filling in the generation at that point? I thought we stopped using ostid_to_fid() for lmm_oi? |
| Comment by Ned Bass [ 13/May/13 ] |
|
Yes, the filesystem was upgraded from 1.8 or earlier. |
| Comment by Di Wang [ 13/May/13 ] |
|
Andreas, that is ost_idx, not generation.
/* convert an OST objid into an IDIF FID SEQ number */
static inline obd_seq fid_idif_seq(obd_id id, __u32 ost_idx)
{
return FID_SEQ_IDIF | (ost_idx << 16) | ((id >> 32) & 0xffff);
}
So ost_idx is 0xf0 (240) in this case. Oh, and we only stopped using ostid_to_fid() in the metadata stack on the client; inside CLIO it is still used. |
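For illustration, inverting the packing above recovers the OST index from the IDIF sequence seen in the log. This is a minimal user-space sketch; the only assumption beyond fid_idif_seq() above is that the OST index occupies bits 16..31 of the SEQ, exactly as the (ost_idx << 16) term implies.
#include <stdio.h>
#include <stdint.h>

/* Undo fid_idif_seq(): the OST index sits in bits 16..31 of an IDIF SEQ. */
int main(void)
{
        uint64_t seq = 0x100f00000ULL;            /* SEQ from the log above */
        uint32_t ost_idx = (seq >> 16) & 0xffff;  /* undo (ost_idx << 16)   */

        printf("ost_idx = 0x%x (%u)\n", ost_idx, ost_idx);
        /* prints ost_idx = 0xf0 (240), matching "idx: 240" in the log */
        return 0;
}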
| Comment by Di Wang [ 13/May/13 ] |
|
Sigh, it is a ppc64 crash dump and I do not have a ppc64 environment. Ned, could you please extract the Lustre debug log from it and upload it somewhere? Thanks. |
| Comment by Ned Bass [ 13/May/13 ] |
|
Debug log from client LU-3318.lustre_debug_log.txt.gz |
| Comment by Di Wang [ 14/May/13 ] |
|
Hmm, I checked the debug log, 00000080:00020000:32.0:1367969153.115866:3776:32225:0:(llite_lib.c:2197:ll_prep_inode()) new_inode -fatal: rc -5
00020000:00020000:32.0:1367969153.123510:6448:32225:0:(lov_object.c:184:lov_init_sub()) header@c000000a3fd98200[0x0, 4, [0x100ed0000:0x130ad85:0x0] hash]{
00020000:00020000:32.0:1367969153.123522:6448:32225:0:(lov_object.c:184:lov_init_sub()) ....lovsub@c000000a3fd98298[0]
00020000:00020000:32.0:1367969153.123530:6448:32225:0:(lov_object.c:184:lov_init_sub()) ....osc@c000000a354367d0id: 0x0:19967365 idx: 237 gen: 0 kms_valid: 1 kms 2472 rc: 0 force_sync: 0 min_xid: 0 size: 2472 mtime: 1367966890 atime: 1367966860 ctime: 1367966890 blocks: 8
00020000:00020000:32.0:1367969153.123544:6448:32225:0:(lov_object.c:184:lov_init_sub()) } header@c000000a3fd98200
00020000:00020000:32.0:1367969153.123553:6096:32225:0:(lov_object.c:184:lov_init_sub()) stripe 0 is already owned.
00020000:00020000:32.0:1367969153.123563:6448:32225:0:(lov_object.c:185:lov_init_sub()) header@c000000a3fc27eb8[0x0, 3, [0x5ae187c950:0x1e332:0x0] hash]{
00020000:00020000:32.0:1367969153.156639:6448:32225:0:(lov_object.c:185:lov_init_sub()) ....vvp@c000000a3fc27f50(- 0 0) inode: c000000a555c1178 6548664631873889074/1524729801 100666 1 0 c000000a3fc27f50 [0x5ae187c950:0x1e332:0x0]
00020000:00020000:32.0:1367969153.156650:6752:32225:0:(lov_object.c:185:lov_init_sub()) ....lov@c000000a96bb9758stripes: 2, valid, lsm{c000000a2875fd80 0x0BD10BD0 1 2 0}:
00020000:00020000:32.0:1367969153.156658:6944:32225:0:(lov_object.c:185:lov_init_sub()) header@c000000a3fd98200[0x0, 4, [0x100ed0000:0x130ad85:0x0] hash]{
00020000:00020000:32.0:1367969153.156664:6944:32225:0:(lov_object.c:185:lov_init_sub()) ....lovsub@c000000a3fd98298[0]
00020000:00020000:32.0:1367969153.156689:6944:32225:0:(lov_object.c:185:lov_init_sub()) ....osc@c000000a354367d0id: 0x0:19967365 idx: 237 gen: 0 kms_valid: 1 kms 2472 rc: 0 force_sync: 0 min_xid: 0 size: 2472 mtime: 1367966890 atime: 1367966860 ctime: 1367966890 blocks: 8
00020000:00020000:32.0:1367969153.156711:6944:32225:0:(lov_object.c:185:lov_init_sub()) } header@c000000a3fd98200
00020000:00020000:32.0:1367969153.156749:6944:32225:0:(lov_object.c:185:lov_init_sub()) header@c000000a3fd982f0[0x0, 2, [0x1010e0000:0x1330db1:0x0] hash]{
00020000:00020000:32.0:1367969153.167572:6944:32225:0:(lov_object.c:185:lov_init_sub()) ....lovsub@c000000a3fd98388[1]
00020000:00020000:32.0:1367969153.167579:6944:32225:0:(lov_object.c:185:lov_init_sub()) ....osc@c000000a354368f0id: 0x0:20123057 idx: 270 gen: 0 kms_valid: 1 kms 0 rc: 0 force_sync: 0 min_xid: 0 size: 0 mtime: 1367966864 atime: 1367966864 ctime: 1367966864 blocks: 0
00020000:00020000:32.0:1367969153.167588:6944:32225:0:(lov_object.c:185:lov_init_sub()) } header@c000000a3fd982f0
00020000:00020000:32.0:1367969153.167592:6448:32225:0:(lov_object.c:185:lov_init_sub())
00020000:00020000:32.0:1367969153.167596:6448:32225:0:(lov_object.c:185:lov_init_sub()) } header@c000000a3fc27eb8
00020000:00020000:32.0:1367969153.167600:6096:32225:0:(lov_object.c:185:lov_init_sub()) owned.
00020000:00020000:32.0:1367969153.167605:6256:32225:0:(lov_object.c:186:lov_init_sub()) header@c0000005ee567990[0x0, 1, [0x5ae187c950:0x1e332:0x0]]
00020000:00020000:32.0:1367969153.167610:6096:32225:0:(lov_object.c:186:lov_init_sub()) try to own.
Hmm, it seems different LOV objects have the same FID: header@c000000a3fc27eb8[0x0, 3, [0x5ae187c950:0x1e332:0x0] hash] and header@c0000005ee567990[0x0, 1, [0x5ae187c950:0x1e332:0x0]]. So maybe the CLIO cache or LU cache has a problem. Jinshan, please comment. Note: it is a 2.1.4 server with a 2.3.64 ppc client. |
| Comment by Andreas Dilger [ 14/May/13 ] |
|
It's also possible that the LOV objects actually DO have the same FID, since there has never been any checking of this field (it has only ever been used for debug messages). This could easily happen in case of a backup and restore of 1.8 inodes, and then creating new files under 1.8. Are there any messages that print the actual inode numbers instead of the LOV EA FID? That would allow us to check via lfs getstripe -v if the LOV EA FIDs are actually the same. At that point it should be easy to lfs_migrate one of the files to a new one. This would preferably be done on the file where the LOV EA FID does not match the inode number (i.e. ls -i). |
| Comment by Jinshan Xiong (Inactive) [ 14/May/13 ] |
|
Hi Ned, were you running 32bit applications in PPC64? |
| Comment by Ned Bass [ 14/May/13 ] |
|
Hi Jinshan, I think it's unlikely that 32-bit applications were running, but I'll investigate that possibility. |
| Comment by Di Wang [ 14/May/13 ] |
|
Andreas, by LOV objects I actually mean the top objects built in the CLIO stack, whose FIDs are obtained from lookup; for an IGIF the FID should be built directly from the ino, not from lmm_oi, if that is what you mean. But I just noticed that this IGIF sequence number [0x5ae187c950:0x1e332:0x0] is even bigger than the normal sequence start 0x200000400ULL, so clearly something is wrong here. |
| Comment by Di Wang [ 14/May/13 ] |
|
Hmm, According to the debug message, 00020000:00020000:9.0:1367969152.885287:6448:32223:0:(lov_object.c:185:lov_init_sub()) header@c000000a3fc27048[0x0, 3, [0x5ae187c950:0x1e322:0x0] hash]{
00020000:00020000:9.0:1367969152.885298:6448:32223:0:(lov_object.c:185:lov_init_sub()) ....vvp@c000000a3fc270e0(- 0 0) inode: c000000a35e8bbf8 6548664631873889058/1524729801 100666 1 0 c000000a3fc270e0 [0x5ae187c950:0x1e322:0x0]
The inode ino is 6548664631873889058, which is indeed built from [0x5ae187c950:0x1e322:0x0]. But 0x5ae187c950 seems too big to me: (0x5ae187c950 - 0x200000400) means roughly 380 billion FID sequences have been used. Even if the system were constantly umounting and remounting, it would take 380B/1000 = 380M umount/mount cycles to consume 380 billion sequences, which seems unlikely to me. Ned, how long ago was the system upgraded to 2.1.4? Could you please try "lfs fid2path 0x5ae187c950:0x1e322:0x0"? |
| Comment by Ned Bass [ 14/May/13 ] |
|
Di,
I believe we upgraded it in March 2012.
It returns "No such file or directory". |
| Comment by Ned Bass [ 14/May/13 ] |
|
For reference, recently created files also have "big" sequence numbers:
# rzuseqlac2 /root > touch /p/lscratchrza/bass6/file1
# rzuseqlac2 /root > lfs path2fid !$
lfs path2fid /p/lscratchrza/bass6/file1
[0x5ae60e2460:0x55c7:0x0] |
| Comment by Jinshan Xiong (Inactive) [ 14/May/13 ] |
|
For the duplicate inode problem on the client side, the only possible cause I can discover so far is that 32-bit applications were running. |
| Comment by Andreas Dilger [ 14/May/13 ] |
|
Ned, do you have any x86 clients that can access this filesystem? Do they return something different for lfs path2fid /p/lscratchrza/bass6/file1? Di, is there a file under /proc/fs/lustre/fld/ on the MDS that can dump the allocated SEQ numbers? I agree that 0x5ae60e2460 seems completely unreasonable for a sequence number. |
| Comment by Ned Bass [ 14/May/13 ] |
|
Andreas, lfs path2fid /p/lscratchrza/bass6/file1 on an x86_64 client returns the same fid. |
| Comment by Di Wang [ 14/May/13 ] |
|
Andreas, there is no such file in Lustre 2.1. |
| Comment by D. Marc Stearman (Inactive) [ 14/May/13 ] |
|
This file system hardware is 7+ years old. It may have been reformatted once about 5 years ago. Large object numbers would not surprise me. It probably started out running Lustre 1.6. |
| Comment by Di Wang [ 14/May/13 ] |
|
There is bug
Ned, could you please check whether there is an lmv layer on the client side, and also confirm whether you erased the config log during the upgrade from 1.8 to 2.1? Also, are you able to find some old files created shortly after the upgrade and shortly before the client update to 2.3.58, to see what their FIDs look like? Thanks! |
| Comment by Andreas Dilger [ 14/May/13 ] |
|
Di, you previously wrote:
Is there a patch needed to fix that problem? Could it lead to this inconsistency (or maybe the |
| Comment by Ned Bass [ 14/May/13 ] |
No, there is no lmv layer on the client.
It looks like we did not erase the config log:
llog_reader CONFIGS/lsa-MDT0000
Header size : 8192
Time : Thu Sep 29 09:45:18 2011
Number of records: 3486
Target uuid : config_uuid
-----------------------
#01 (224)marker 1 (flags=0x01, v1.8.5.0) lsa-mdtlov 'lov setup' Thu Sep 29 09:45:18 2011-
#02 (112)attach 0:lsa-mdtlov 1:lov 2:lsa-mdtlov_UUID
#03 (168)lov_setup 0:lsa-mdtlov 1:(struct lov_desc)
uuid=lsa-mdtlov_UUID stripe:cnt=1 size=1048576 offset=18446744073709551615 pattern=0x1
#04 (224)marker 1 (flags=0x02, v1.8.5.0) lsa-mdtlov 'lov setup' Thu Sep 29 09:45:18 2011-
#05 (224)marker 2 (flags=0x01, v1.8.5.0) lsa-MDT0000 'add mdt' Thu Sep 29 09:45:18 2011-
I'll look, but I'm not very hopeful of finding any. Do the files need to have been created on a pre-2.3.58 client? We have 2.1 clients using this filesystem too, and no way to tell which clients created which files. |
| Comment by Di Wang [ 14/May/13 ] |
|
Andreas: yes, the conversion of lmm_oi to FID has been separated from ostid_to_fid() by the patch http://review.whamcloud.com/#change,6044, and since lmm_oi is now only used for debugging (or getstripe output), we might remove it eventually. A CLIO object is created directly from a FID, i.e. the FID we got from name lookup, which has nothing to do with lmm_oi, IMHO. So it seems to me lmm_oi is not the problem here, but I will check. Hmm, this indeed seems similar to
May 12 22:51:39 l2 kernel: LustreError: 4543:0:(mdc_locks.c:143:mdc_set_lock_data()) ASSERTION( old_inode->i_state & I_FREEING ) failed: Found existing inode ffff88084365e178/144115205255786850/33554436 state 0 in lock: setting data to ffff8808689361b8/144115205255786850/33554436
So there it is the same lock and the same inode i_ino/generation, but different inodes (the pointers differ). And in this ticket
2013-05-08 11:08:51 LustreError: 45085:0:(mdc_locks.c:143:mdc_set_lock_data()) ASSERTION( old_inode->i_state & I_FREEING ) failed: Found existing inode c000000d336808f8/6549372160627028526/1524894535 state 0 in lock: setting data to c000000c6abfdb78/252641838/1524894535
2013-05-08 11:08:51 LustreError: 45085:0:(mdc_locks.c:143:mdc_set_lock_data()) LBUG
it is the same lock and generation, but different i_ino and different inodes. Quite interesting. |
| Comment by Di Wang [ 14/May/13 ] |
|
Ned: yes, if you can find files created just before and after you updated the client to 2.3.58 and tell me their FIDs, that would be great, since I want to understand whether this giant sequence is caused by |
| Comment by Di Wang [ 14/May/13 ] |
|
According to the error message here
2013-05-08 11:08:51 LustreError: 45085:0:(mdc_locks.c:143:mdc_set_lock_data()) ASSERTION( old_inode->i_state & I_FREEING ) failed: Found existing inode c000000d336808f8/6549372160627028526/1524894535 state 0 in lock: setting data to c000000c6abfdb78/252641838/1524894535
2013-05-08 11:08:51 LustreError: 45085:0:(mdc_locks.c:143:mdc_set_lock_data()) LBUG
the old inode number is 252641838, which definitely cannot come from fid_flatten([0x5ae40b47b0:0x1b634:0x0]), so fid_flatten32 must have been used to form the ino from the FID. But later fid_flatten was used to form the ino for the same object, which causes the LBUG. So clearly someone issued a 32-bit system call for the object, and later a 64-bit system call for the same object. |
| Comment by Di Wang [ 14/May/13 ] |
|
Another interesting problem: according to the comment from Ned (14/May/13 7:51 AM)
# rzuseqlac2 /root > touch /p/lscratchrza/bass6/file1
# rzuseqlac2 /root > lfs path2fid !$
lfs path2fid /p/lscratchrza/bass6/file1
[0x5ae60e2460:0x55c7:0x0]
and when the ticket was created
header@c000000f4fd1c6e8[0x0, 1, [0x5ae58ebb30:0x27:0x0]]
the sequence was around 0x5ae58ebb30. Ned, I am not sure if you can still find this file (lfs fid2path 0x5ae58ebb30:0x27:0x0) and find out its creation time. If that file was created around that time, then about (0x5ae60e2460 - 0x5ae58ebb30) = 8M sequences were consumed in 4 days, which still seems too much to me, unless I am missing something. Ned, how many clients does this cluster have? And currently all clients are running 2.3.58, right? |
| Comment by Ned Bass [ 14/May/13 ] |
|
Ah, apparently our 2.1 x86 clients don't have the
$ touch a b c d
$ for x in a b c d ; do lfs path2fid $x ; done
[0x5b05195040:0x1:0x0]
[0x5b05197750:0x1:0x0]
[0x5b05199e60:0x1:0x0]
[0x5b0519c188:0x1:0x0]
$ pwd
/p/lscratchrza/bass6
$ rpm -q lustre
lustre-2.1.2-4chaos_2.6.32_220.23.1.1chaos.ch5.x86_64.x86_64
$ arch
x86_64 |
| Comment by D. Marc Stearman (Inactive) [ 14/May/13 ] |
|
We didn't upgrade all of our clients to 2.1.4 because we paused the rollout due to |
| Comment by Di Wang [ 14/May/13 ] |
|
Is it possible that 32-bit applications are running on these 64-bit ppc clients? Also, would it be possible to umount the clients, erase the config log with tunefs, and remount them, so that an lmv layer is created on the client side? That would slow down the rate of sequence consumption. |
| Comment by Di Wang [ 15/May/13 ] |
|
I just tried to work out the FID behind the two inode numbers in the error message. The FID should be [0x5ae40b47b0:0x1b22e:0x0]: fid_flatten([0x5ae40b47b0:0x1b22e:0x0]) = 6549372160627028526 and fid_flatten32([0x5ae40b47b0:0x1b22e:0x0]) = 252641838, which match the error message exactly and confirm this guess:
2013-05-08 11:08:51 LustreError: 45085:0:(mdc_locks.c:143:mdc_set_lock_data()) ASSERTION( old_inode->i_state & I_FREEING ) failed: Found existing inode c000000d336808f8/6549372160627028526/1524894535 state 0 in lock: setting data to c000000c6abfdb78/252641838/1524894535
2013-05-08 11:08:51 LustreError: 45085:0:(mdc_locks.c:143:mdc_set_lock_data()) LBUG
But I am not sure how to fix this yet. |
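For reference, a minimal user-space sketch of the 64-bit flattening (modelled on the fid_flatten() helper in the Lustre headers; fid_flatten32() is omitted because its hashing is more involved) reproduces the 64-bit ino in the assertion:
#include <stdio.h>
#include <stdint.h>

/* Sketch of Lustre's fid_flatten(): squash a normal (seq, oid) FID
 * into a 64-bit inode number. */
static uint64_t fid_flatten(uint64_t seq, uint32_t oid)
{
        return (seq << 24) + ((seq >> 24) & 0xffffff0000ULL) + oid;
}

int main(void)
{
        /* the FID reconstructed above: [0x5ae40b47b0:0x1b22e:0x0] */
        printf("%llu\n",
               (unsigned long long)fid_flatten(0x5ae40b47b0ULL, 0x1b22e));
        /* prints 6549372160627028526, the 64-bit ino in the LBUG message;
         * fid_flatten32() maps the same FID to 252641838 */
        return 0;
}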
| Comment by Ned Bass [ 15/May/13 ] |
We confirmed that it is possible to build and run 32-bit apps on the ppc64 client that hit this bug. |
| Comment by Jinshan Xiong (Inactive) [ 15/May/13 ] |
|
We should always use `unsigned long' in kernel space. The only concern for 32-bit compatibility is when we're filling in the user buffer. So in this case, for ll_iget(), we should call cl_fid_build_ino() with api32 = 0. |
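For context, the dispatch Jinshan refers to looks roughly like this (a simplified sketch of cl_fid_build_ino() in the lclient code, not the verbatim source): the api32 flag decides which flattening is used, so two callers disagreeing on api32 derive two different inos from one FID.
/* simplified: which flattening a caller gets depends on api32 */
__u64 cl_fid_build_ino(const struct lu_fid *fid, int api32)
{
        if (BITS_PER_LONG == 32 || api32)
                return fid_flatten32(fid);   /* 32-bit hash of the FID */
        return fid_flatten(fid);             /* full 64-bit flattening */
}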
| Comment by Di Wang [ 15/May/13 ] |
|
The problem is: what if some 32-bit user (like NFS?) uses the ino directly? If we truncate the 64-bit ino to 32 bits and return it to the user, the user might not be able to get the inode back with that 32-bit ino. |
| Comment by Andreas Dilger [ 15/May/13 ] |
|
This was discussed in the past, and the assumption was made that no applications were going to access both 32-bit and 64-bit inode numbers at the same time. The inode number should not be stored or used internally, but rather the FID should always be used for internal access, and the inode number should always be regenerated in the context of the user thread that is doing the stat() or whatever needs it. |
| Comment by Scott Sampson [ 15/May/13 ] |
|
I can confirm that |
| Comment by Jinshan Xiong (Inactive) [ 15/May/13 ] |
I thought we packed the FID into the NFS file handle. I can't think of a use case for using the ino directly - but apparently I'm not an expert on this |
| Comment by Andreas Dilger [ 15/May/13 ] |
|
Jinshan, |
| Comment by Jinshan Xiong (Inactive) [ 15/May/13 ] |
|
Hi Andreas, why is the 32-bit inode number still needed by internal code? |
| Comment by Di Wang [ 15/May/13 ] |
|
Hmm, how about this: in prep_inode, always try to locate the inode (ilookup) by the ino built with fid_flatten32 first; if no inode is associated with that ino, prepare the inode in the normal way. Since 32-bit applications are probably rare these days, we could add a flag in the super block and set it when a 32-bit application runs, then only do the fid_flatten32 check when that flag is set. This makes things a bit more complicated, though, and would also slow down client metadata performance once a 32-bit application has been running, as in the sketch below. |
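A rough sketch of that idea (the flag name and helper are hypothetical, invented here for illustration; ilookup(), fid_flatten32() and ll_s2sbi() are the real kernel/Lustre calls):
/* Hypothetical helper for ll_prep_inode(): probe the icache for a
 * 32-bit-hashed alias of this FID before building a 64-bit inode.
 * LL_SBI_SEEN_32BIT_API is an invented per-superblock flag, set the
 * first time a 32-bit syscall touches the mount. */
static struct inode *ll_probe_32bit_alias(struct super_block *sb,
                                          const struct lu_fid *fid)
{
        if (!(ll_s2sbi(sb)->ll_flags & LL_SBI_SEEN_32BIT_API))
                return NULL;

        return ilookup(sb, fid_flatten32(fid));
}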
| Comment by Andreas Dilger [ 16/May/13 ] |
|
My preference would be that we use 64-bit inode numbers internally all the time, and only expose 32-bit inode numbers to the application. It looks like this is a defect introduced by http://review.whamcloud.com/5711 from |
| Comment by Di Wang [ 16/May/13 ] |
|
Yes, I agree that removing cfs_curproc_is_32bit should fix this problem: http://review.whamcloud.com/#change,6371 |
| Comment by nasf (Inactive) [ 17/May/13 ] |
|
Mixing 64-bit ino# internally with 32-bit ino# externally for 32-bit applications may cause unexpected behavior, because we do not know how the 32-bit ino# will be used and cannot prevent it from being passed back down to llite. So for a given Lustre client it should be either all 32-bit ino# (on a 32-bit platform or via "-o 32bitapi") or all 64-bit ino#, NOT mixed. |
| Comment by Andreas Dilger [ 17/May/13 ] |
|
Most applications do not use or care about the inode number. However, 32-bit applications on a 64-bit client will definitely get an error from stat() if the curproc check is removed. That is MUCH more likely to cause a problem for users than the very obscure case of someone mixing 32-bit and 64-bit applications that use the inode number on the same client and expecting them to work well together. This has been in all of the 2.x releases without any complaints, so I'd rather leave this alone. |
| Comment by Ned Bass [ 17/May/13 ] |
|
Can we detect this case and return an error rather than asserting on it? Allowing a misbehaved application to crash the OS seems unacceptable. |
| Comment by Ned Bass [ 17/May/13 ] |
|
I'm having trouble understanding what exactly we think applications are doing to trigger this bug. For example, nasf said:
How would an application pass an inode number back to llite? I'm not aware of any library or system calls that allow files to be accessed by inode number. Do we think it has to be the same process mixing 32-bit and 64-bit system calls, or is it problematic just to mix 32-bit and 64-bit processes that access the same files on the same client? |
| Comment by Scott Sampson [ 17/May/13 ] |
|
Ned, fwiw, my related case |
| Comment by Ned Bass [ 17/May/13 ] |
|
I managed to reproduce the bug by running this repeatedly, where stat32 is a program which I compiled with -m32 that just calls stat() on its argument:
echo 3 > /proc/sys/vm/drop_caches
cat $FILE > /dev/null
sleep 1
./stat32 $FILE
I eventually hit the ASSERTION running cat. The behavior is non-deterministic. Sometimes stat32 gets -EIO back from stat() with this console error:
LustreError: 59173:0:(llite_lib.c:2197:ll_prep_inode()) new_inode -fatal: rc -5
Other times there are no failures. |
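For reference, a minimal stat32.c along the lines Ned describes (the name and output format are illustrative; build with gcc -m32 -o stat32 stat32.c):
#include <stdio.h>
#include <sys/stat.h>

/* Call stat() on the argument from a 32-bit process; on a 64-bit
 * client this is what makes llite derive the inode number via the
 * 32-bit FID hash instead of the full 64-bit flattening. */
int main(int argc, char *argv[])
{
        struct stat st;

        if (argc != 2) {
                fprintf(stderr, "usage: %s FILE\n", argv[0]);
                return 1;
        }
        if (stat(argv[1], &st) != 0) {
                perror("stat");
                return 1;
        }
        printf("ino=%llu\n", (unsigned long long)st.st_ino);
        return 0;
}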
| Comment by Ned Bass [ 17/May/13 ] |
|
Also, I looked more closely at the original crash dump to see what the application was doing. The user seems to be building a C++ application using the Boost.Jam build tool and the IBM xlC compilers. Here is the process hierarchy for the process that crashed: crash> ps -p 25993
PID: 0 TASK: c000000000e903f0 CPU: 0 COMMAND: "swapper"
PID: 1 TASK: c000000f59aa87f0 CPU: 45 COMMAND: "init"
PID: 3900 TASK: c000000f43edbb30 CPU: 32 COMMAND: "ksshd"
PID: 48487 TASK: c000000dc5d10900 CPU: 33 COMMAND: "ksshd"
PID: 48807 TASK: c000000d4eed2a70 CPU: 38 COMMAND: "ksshd"
PID: 48810 TASK: c000000385466480 CPU: 11 COMMAND: "tcsh"
PID: 5628 TASK: c0000004c12fe520 CPU: 12 COMMAND: "xjam1"
PID: 5925 TASK: c000000f3667a810 CPU: 36 COMMAND: "bjam"
PID: 22403 TASK: c0000004c0594e60 CPU: 17 COMMAND: "bgxlC"
PID: 25989 TASK: c0000003853634c0 CPU: 36 COMMAND: "ipa"
PID: 25993 TASK: c000000f36458ff0 CPU: 12 COMMAND: "xlCcode"
The file with the duplicate FID was a .cpp source file. I'm not sure which part of the tool chain is 32-bit, but this isn't exactly an obscure use case. |
| Comment by Di Wang [ 18/May/13 ] |
|
Hmm, since there is not enough debug log here, here is what I think happens: 1. Some 32-bit application (thread) accesses the file, which prepares the inode with a 32-bit ino (because of http://review.whamcloud.com/#change,5711) and also gets back the LOOKUP|UPDATE|LAYOUT lock. Probably the .cpp file is being processed by some 32-bit pre-processing tool before or during the build? Just a wild guess. In any case, the 32-bit ino must have been built because the file was accessed by a thread with the TIF_32BIT flag. I will update the patch http://review.whamcloud.com/#change,6371 according to Andreas's comment. |
| Comment by Jodi Levi (Inactive) [ 31/May/13 ] |
|
Now that change 6371 has landed to master, can this ticket be closed? Or are there additional patches to land? |
| Comment by Ned Bass [ 31/May/13 ] |
|
Yes, this can be closed. |
| Comment by Peter Jones [ 31/May/13 ] |
|
Thanks Ned! |