[LU-7798] ll_prep_inode()) ASSERTION( fid_is_sane(&md.body->mbo_fid1) ) failed: Created: 19/Feb/16 Updated: 07/Oct/16 Resolved: 07/Oct/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical |
| Reporter: | Frank Heckes (Inactive) | Assignee: | WC Triage |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | soak | ||
| Environment: |
lola |
||
| Attachments: |
|
||||||||||||||||
| Issue Links: |
|
||||||||||||||||
| Severity: | 3 | ||||||||||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||||||||||
| Description |
|
Error happened during soak testing of build '20160218' (see: https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-20160218). DNE is enabled. Event history:
Error message and stack trace is same for both events: <6>Lustre: soaked-MDT0004-mdc-ffff880852d1a000: Connection restored to 192.168.1.110@o2ib10 (at 192.168.1.110@o2ib10) <0>LustreError: 23705:0:(llite_lib.c:2295:ll_prep_inode()) ASSERTION( fid_is_sane(&md.body->mbo_fid1) ) failed: <0>LustreError: 23705:0:(llite_lib.c:2295:ll_prep_inode()) LBUG <4>Pid: 23705, comm: pct <4> <4>Call Trace: <4> [<ffffffffa050b875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs] <4> [<ffffffffa050be77>] lbug_with_loc+0x47/0xb0 [libcfs] <4> [<ffffffffa0aa0192>] ll_prep_inode+0x752/0xc40 [lustre] <4> [<ffffffffa07f6d60>] ? lustre_swab_mdt_body+0x0/0x130 [ptlrpc] <4> [<ffffffffa0ab42b2>] ll_new_node+0x682/0x7f0 [lustre] <4> [<ffffffffa0ab6b04>] ll_mkdir+0x104/0x220 [lustre] <4> [<ffffffff8122ec0f>] ? security_inode_permission+0x1f/0x30 <4> [<ffffffff8119d759>] vfs_mkdir+0xd9/0x140 <4> [<ffffffff811a04e7>] sys_mkdirat+0xc7/0x1b0 <4> [<ffffffff8100c6f5>] ? math_state_restore+0x45/0x60 <4> [<ffffffff811a05e8>] sys_mkdir+0x18/0x20 <4> [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b <4> <0>Kernel panic - not syncing: LBUG <4>Pid: 23705, comm: pct Not tainted 2.6.32-504.30.3.el6.x86_64 #1 <4>Call Trace: <4> [<ffffffff815293fc>] ? panic+0xa7/0x16f <4> [<ffffffffa050becb>] ? lbug_with_loc+0x9b/0xb0 [libcfs] <4> [<ffffffffa0aa0192>] ? ll_prep_inode+0x752/0xc40 [lustre] <4> [<ffffffffa07f6d60>] ? lustre_swab_mdt_body+0x0/0x130 [ptlrpc] <4> [<ffffffffa0ab42b2>] ? ll_new_node+0x682/0x7f0 [lustre] <4> [<ffffffffa0ab6b04>] ? ll_mkdir+0x104/0x220 [lustre] <4> [<ffffffff8122ec0f>] ? security_inode_permission+0x1f/0x30 <4> [<ffffffff8119d759>] ? vfs_mkdir+0xd9/0x140 <4> [<ffffffff811a04e7>] ? sys_mkdirat+0xc7/0x1b0 <4> [<ffffffff8100c6f5>] ? math_state_restore+0x45/0x60 <4> [<ffffffff811a05e8>] ? sys_mkdir+0x18/0x20 <4> [<ffffffff8100b0d2>] ? system_call_fastpath+0x16/0x1b
Attached console, messages and vmcore-dmesg.txt from both clients. |
| Comments |
| Comment by Frank Heckes (Inactive) [ 19/Feb/16 ] |
|
Crash files have been saved to lhn.lola.hpdd.intel.com:/scratch/crashdumps/lu-7798/ |
| Comment by Frank Heckes (Inactive) [ 19/Feb/16 ] |
|
I forgot to mention that the OSS nodes have been extended to operate in active-active failover configuration |
| Comment by Andreas Dilger [ 19/Feb/16 ] |
|
The client should not crash when accessing bad data from the network. It is very likely that this is a result of bad data being sent from the server, but that is a separate issue. This bug should focus on adding proper error handling to ll_prep_inode() for this case instead of crashing. |
| Comment by Cliff White (Inactive) [ 10/Mar/16 ] |
|
This bug has re-occured in 2.8.0-RC5 testing Lustre: 3790:0:(client.c:2063:ptlrpc_expire_one_request()) Skipped 6 previous similar messages^M
Lustre: 3779:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1457583412/real 1457583412] req@ffff880275d579c0 x1528367925364328/t0(0) o38->soaked-MDT0007-mdc-ffff88082f2f4800@192.168.1.111@o2ib10:12/10 lens 520/544 e 0 to 1 dl 1457583433 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1^M
Lustre: 3779:0:(client.c:2063:ptlrpc_expire_one_request()) Skipped 6 previous similar messages^M
LustreError: 11-0: soaked-MDT0006-mdc-ffff88082f2f4800: operation mds_reint to node 192.168.1.110@o2ib10 failed: rc = -19^M
Lustre: soaked-MDT0007-mdc-ffff88082f2f4800: Connection restored to 192.168.1.111@o2ib10 (at 192.168.1.111@o2ib10)^M
LustreError: 3779:0:(client.c:2874:ptlrpc_replay_interpret()) @@@ request replay timed out.^M
req@ffff88046cbfb6c0 x1528367924440432/t412317941034(412317941034) o36->soaked-MDT0006-mdc-ffff88082f2f4800@192.168.1.111@o2ib10:12/10 lens 616/192 e 1 to 1 dl 1457583622 ref 2 fl Interpret:EX/4/ffffffff rc -110/-1^M
Lustre: soaked-MDT0006-mdc-ffff88082f2f4800: Connection restored to 192.168.1.111@o2ib10 (at 192.168.1.111@o2ib10)^M
LustreError: 85659:0:(llite_lib.c:2295:ll_prep_inode()) ASSERTION( fid_is_sane(&md.body->mbo_fid1) ) failed: ^M
LustreError: 85659:0:(llite_lib.c:2295:ll_prep_inode()) LBUG^M
Pid: 85659, comm: pct^M
^M
Call Trace:^M
[<ffffffffa045c875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]^M
[<ffffffffa045ce77>] lbug_with_loc+0x47/0xb0 [libcfs]^M
[<ffffffffa09ff192>] ll_prep_inode+0x752/0xc40 [lustre]^M
[<ffffffffa0747d60>] ? lustre_swab_mdt_body+0x0/0x130 [ptlrpc]^M
[<ffffffffa0a132b2>] ll_new_node+0x682/0x7f0 [lustre]^M
[<ffffffffa0a15b04>] ll_mkdir+0x104/0x220 [lustre]^M
[<ffffffff8122ec0f>] ? security_inode_permission+0x1f/0x30^M
[<ffffffff8119d759>] vfs_mkdir+0xd9/0x140^M
[<ffffffff811a04e7>] sys_mkdirat+0xc7/0x1b0^M
[<ffffffff8100c6f5>] ? math_state_restore+0x45/0x60^M
[<ffffffff811a05e8>] sys_mkdir+0x18/0x20^M
[<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b^M
^M
|
| Comment by Cliff White (Inactive) [ 10/Mar/16 ] |
|
Rest of kernel panic Kernel panic - not syncing: LBUG^M Pid: 85659, comm: pct Not tainted 2.6.32-504.30.3.el6.x86_64 #1^M Call Trace:^M [<ffffffff815293fc>] ? panic+0xa7/0x16f^M [<ffffffffa045cecb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]^M [<ffffffffa09ff192>] ? ll_prep_inode+0x752/0xc40 [lustre]^M [<ffffffffa0747d60>] ? lustre_swab_mdt_body+0x0/0x130 [ptlrpc]^M [<ffffffffa0a132b2>] ? ll_new_node+0x682/0x7f0 [lustre]^M [<ffffffffa0a15b04>] ? ll_mkdir+0x104/0x220 [lustre]^M [<ffffffff8122ec0f>] ? security_inode_permission+0x1f/0x30^M [<ffffffff8119d759>] ? vfs_mkdir+0xd9/0x140^M [<ffffffff811a04e7>] ? sys_mkdirat+0xc7/0x1b0^M [<ffffffff8100c6f5>] ? math_state_restore+0x45/0x60^M [<ffffffff811a05e8>] ? sys_mkdir+0x18/0x20^M [<ffffffff8100b0d2>] ? system_call_fastpath+0x16/0x1b^M Initializing cgroup subsys cpuset^M Initializing cgroup subsys cpu^M crash dump has been copied to lola:/scratch/crashdumps/lu-7798/ |
| Comment by Cliff White (Inactive) [ 06/Oct/16 ] |
|
We appear to have hit this again, while doing ZFS metadata testing: Oct 6 22:18:08 spirit-17 kernel: LustreError: 36451:0:(llite_lib.c:2465:ll_prep_inode()) ASSERTION( fid_is_sane(&md.body->mbo_fid1) ) failed: Oct 6 22:18:08 spirit-17 kernel: LustreError: 36451:0:(llite_lib.c:2465:ll_prep_inode()) LBUG Oct 6 22:18:08 spirit-17 kernel: Pid: 36451, comm: mdtest Oct 6 22:18:08 spirit-17 kernel: #012Call Trace: Oct 6 22:18:08 spirit-17 kernel: [<ffffffffa08ab853>] libcfs_debug_dumpstack+0x53/0x80 [libcfs] Oct 6 22:18:08 spirit-17 kernel: [<ffffffffa08abdf5>] lbug_with_loc+0x45/0xc0 [libcfs] Oct 6 22:18:08 spirit-17 kernel: [<ffffffffa0de7002>] ll_prep_inode+0xb12/0xe60 [lustre] Oct 6 22:18:08 spirit-17 kernel: [<ffffffffa0df6c3f>] ? ll_lookup_it+0x64f/0xe20 [lustre] Oct 6 22:18:08 spirit-17 kernel: [<ffffffffa0df7c2c>] ll_atomic_open+0x81c/0x12c0 [lustre] Oct 6 22:18:08 spirit-17 kernel: [<ffffffff811ec8a1>] do_last+0xa11/0x1270 Oct 6 22:18:08 spirit-17 kernel: [<ffffffff811eede2>] path_openat+0xc2/0x490 Oct 6 22:18:08 spirit-17 kernel: [<ffffffff811f05ab>] do_filp_open+0x4b/0xb0 Oct 6 22:18:08 spirit-17 kernel: [<ffffffff811fd147>] ? __alloc_fd+0xa7/0x130 Oct 6 22:18:08 spirit-17 kernel: [<ffffffff811ddf53>] do_sys_open+0xf3/0x1f0 Oct 6 22:18:08 spirit-17 kernel: [<ffffffff811de06e>] SyS_open+0x1e/0x20 Oct 6 22:18:08 spirit-17 kernel: [<ffffffff811de0a6>] SyS_creat+0x16/0x20 Oct 6 22:18:08 spirit-17 kernel: [<ffffffff81646a09>] system_call_fastpath+0x16/0x1b Oct 6 22:18:08 spirit-17 kernel: |
| Comment by Peter Jones [ 07/Oct/16 ] |
|
seems to be a duplicate of |