[LU-6665] Interop 2.7.0<->master conf-sanity test_80: (import.c:293:ptlrpc_invalidate_import()) ASSERTION( imp->imp_invalid ) failed Created: 29/May/15 Updated: 16/Jan/22 Resolved: 16/Jan/22 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.8.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | Mikhail Pershin |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Environment: |
server: 2.7.0 |
||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
This issue was created by maloo for sarah_lw <wei3.liu@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/6d359bc8-0035-11e5-a922-5254006e85c2.

The sub-test test_80 failed with the following error:

test failed to respond and timed out

OST console shows:

03:25:51:Lustre: DEBUG MARKER: /usr/sbin/lctl set_param fail_val=10 fail_loc=0x906
03:25:51:LustreError: 11-0: MGC10.1.4.201@tcp: operation obd_ping to node 10.1.4.201@tcp failed: rc = -107
03:25:51:LustreError: Skipped 7 previous similar messages
03:26:22:LustreError: 166-1: MGC10.1.4.201@tcp: Connection to MGS (at 10.1.4.201@tcp) was lost; in progress operations using this service will fail
03:26:22:Lustre: 14127:0:(client.c:1939:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1432178732/real 1432178732] req@ffff880077ed96c0 x1501746053519508/t0(0) o250->MGC10.1.4.201@tcp@10.1.4.201@tcp:26/25 lens 400/544 e 0 to 1 dl 1432178738 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
03:26:22:Lustre: 14127:0:(client.c:1939:ptlrpc_expire_one_request()) Skipped 8 previous similar messages
03:26:22:Lustre: Evicted from MGS (at 10.1.4.201@tcp) after server handle changed from 0x7a35d8e1992e29fd to 0x7a35d8e1992e2aac
03:26:22:LustreError: 4602:0:(fail.c:132:__cfs_fail_timeout_set()) cfs_fail_timeout id 906 sleeping for 15000ms
03:26:22:Lustre: DEBUG MARKER: mkdir -p /mnt/ost2
03:26:22:Lustre: DEBUG MARKER: test -b /dev/lvm-Role_OSS/P2
03:26:22:Lustre: DEBUG MARKER: mkdir -p /mnt/ost2; mount -t lustre /dev/lvm-Role_OSS/P2 /mnt/ost2
03:26:22:LDISKFS-fs (dm-1): mounted filesystem with ordered data mode. quota=on. Opts:
03:26:22:LDISKFS-fs (dm-1): mounted filesystem with ordered data mode. quota=on. Opts:
03:26:22:LustreError: 4746:0:(fail.c:132:__cfs_fail_timeout_set()) cfs_fail_timeout id 906 sleeping for 15000ms
03:26:22:LustreError: 4602:0:(fail.c:136:__cfs_fail_timeout_set()) cfs_fail_timeout id 906 awake
03:26:22:Lustre: MGC10.1.4.201@tcp: Connection restored to MGS (at 10.1.4.201@tcp)
03:26:22:LustreError: 4746:0:(fail.c:136:__cfs_fail_timeout_set()) cfs_fail_timeout id 906 awake
03:26:22:LustreError: 4746:0:(import.c:293:ptlrpc_invalidate_import()) ASSERTION( imp->imp_invalid ) failed:
03:26:22:LustreError: 4746:0:(import.c:293:ptlrpc_invalidate_import()) LBUG
03:26:22:Pid: 4746, comm: mount.lustre
03:26:22:
03:26:22:Call Trace:
03:26:22: [<ffffffffa0820895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
03:26:22: [<ffffffffa0820e97>] lbug_with_loc+0x47/0xb0 [libcfs]
03:26:22: [<ffffffffa0c3f06d>] ptlrpc_invalidate_import+0x85d/0x930 [ptlrpc]
03:26:22: [<ffffffffa08311c1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
03:26:22: [<ffffffffa0c440f6>] ? ptlrpc_set_import_discon+0xf6/0x5b0 [ptlrpc]
03:26:22: [<ffffffffa0c445e3>] ptlrpc_reconnect_import+0x33/0x1b0 [ptlrpc]
03:26:22: [<ffffffffa08311c1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
03:26:22: [<ffffffffa12ea2ea>] mgc_set_info_async+0x5ea/0x1940 [mgc]
03:26:22: [<ffffffffa08311c1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
03:26:22: [<ffffffffa0a006d1>] obd_set_info_async.clone.2+0xf1/0x360 [obdclass]
03:26:22: [<ffffffffa0a06c18>] lustre_start_mgc+0x14c8/0x1e00 [obdclass]
03:26:22: [<ffffffffa08311c1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
03:26:22: [<ffffffffa0a356f2>] server_fill_super+0x5c2/0x1690 [obdclass]
03:26:22: [<ffffffffa082b818>] ? libcfs_log_return+0x28/0x40 [libcfs]
03:26:22: [<ffffffffa0a07ab0>] lustre_fill_super+0x560/0xa80 [obdclass]
03:26:22: [<ffffffffa0a07550>] ? lustre_fill_super+0x0/0xa80 [obdclass]
03:26:22: [<ffffffff811917af>] get_sb_nodev+0x5f/0xa0
03:26:22: [<ffffffffa09feb05>] lustre_get_sb+0x25/0x30 [obdclass]
03:26:22: [<ffffffff81190deb>] vfs_kern_mount+0x7b/0x1b0
03:26:22: [<ffffffff81190f92>] do_kern_mount+0x52/0x130
03:26:22: [<ffffffff811b2b9b>] do_mount+0x2fb/0x930
03:26:22: [<ffffffff811b3260>] sys_mount+0x90/0xe0
03:26:22: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
03:26:22:
03:26:22:Kernel panic - not syncing: LBUG
03:26:22:Pid: 4746, comm: mount.lustre Not tainted 2.6.32-504.8.1.el6_lustre.x86_64 #1
03:26:22:Call Trace:
03:26:22: [<ffffffff81529b76>] ? panic+0xa7/0x16f
03:26:22: [<ffffffffa0820eeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
03:26:22: [<ffffffffa0c3f06d>] ? ptlrpc_invalidate_import+0x85d/0x930 [ptlrpc]
03:26:22: [<ffffffffa08311c1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
03:26:22: [<ffffffffa0c440f6>] ? ptlrpc_set_import_discon+0xf6/0x5b0 [ptlrpc]
03:26:22: [<ffffffffa0c445e3>] ? ptlrpc_reconnect_import+0x33/0x1b0 [ptlrpc]
03:26:22: [<ffffffffa08311c1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
03:26:22: [<ffffffffa12ea2ea>] ? mgc_set_info_async+0x5ea/0x1940 [mgc]
03:26:22: [<ffffffffa08311c1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
03:26:22: [<ffffffffa0a006d1>] ? obd_set_info_async.clone.2+0xf1/0x360 [obdclass]
03:26:22: [<ffffffffa0a06c18>] ? lustre_start_mgc+0x14c8/0x1e00 [obdclass]
03:26:22: [<ffffffffa08311c1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
03:26:22: [<ffffffffa0a356f2>] ? server_fill_super+0x5c2/0x1690 [obdclass]
03:26:22: [<ffffffffa082b818>] ? libcfs_log_return+0x28/0x40 [libcfs]
03:26:22: [<ffffffffa0a07ab0>] ? lustre_fill_super+0x560/0xa80 [obdclass]
03:26:22: [<ffffffffa0a07550>] ? lustre_fill_super+0x0/0xa80 [obdclass]
03:26:22: [<ffffffff811917af>] ? get_sb_nodev+0x5f/0xa0
03:26:22: [<ffffffffa09feb05>] ? lustre_get_sb+0x25/0x30 [obdclass]
03:26:22: [<ffffffff81190deb>] ? vfs_kern_mount+0x7b/0x1b0
03:26:22: [<ffffffff81190f92>] ? do_kern_mount+0x52/0x130
03:26:22: [<ffffffff811b2b9b>] ? do_mount+0x2fb/0x930
03:26:22: [<ffffffff811b3260>] ? sys_mount+0x90/0xe0
03:26:22: [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
03:26:22:Initializing cgroup subsys cpuset
03:26:22:Initializing cgroup subsys cpu |
| Comments |
| Comment by Andreas Dilger [ 01/Jun/15 ] |
|
Sarah, is this a repeatable failure or only intermittent? |
| Comment by Sarah Liu [ 08/Jul/15 ] |
|
Hi Andreas, this is a repeatable issue: |
| Comment by Patrick Farrell (Inactive) [ 10/Aug/15 ] |
|
This sure looks like https://jira.hpdd.intel.com/browse/LU-4913. It's being reproduced by the test added for that issue. It seems the race there is not completely closed. (And I suspect this isn't related to interop.) Cray has seen this in our testing of 2.5 with the patch from |
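A minimal, hypothetical model of the suspected race, as a sketch only (this is not Lustre code; `struct model_import`, `replay_interleaving()` and the helper functions are invented for illustration). Reading the console log in the description, the mount.lustre thread (pid 4746) appears to sleep in the fail_loc 0x906 delay, the MGC logs "Connection restored to MGS" while it sleeps, and as soon as it wakes `ptlrpc_invalidate_import()` fails `ASSERTION( imp->imp_invalid )`. The sketch assumes that some earlier step of the reconnect path sets `imp_invalid` and that restoring the connection clears it again, and replays both orderings deterministically:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the one field of struct obd_import that matters
 * here; this is NOT Lustre's real structure. */
struct model_import {
    bool invalid; /* models imp->imp_invalid */
};

/* Mount thread, before the injected sleep. Assumption: models whichever
 * step of the reconnect path marks the import invalid. */
static void mount_thread_disconnect(struct model_import *imp)
{
    imp->invalid = true;
}

/* Another thread, while the mount thread sleeps: the connection to the MGS
 * is restored, which (by assumption) makes the import valid again. */
static void other_thread_connection_restored(struct model_import *imp)
{
    imp->invalid = false;
}

/* Mount thread, after waking: the invariant that
 * ptlrpc_invalidate_import() asserts. Returns true if it would hold. */
static bool mount_thread_invalidate_check(const struct model_import *imp)
{
    return imp->invalid;
}

/* Replays one interleaving. restore_during_sleep selects whether the
 * "connection restored" event lands inside the mount thread's sleep window.
 * Returns true if the ASSERTION(imp->imp_invalid) equivalent would pass. */
bool replay_interleaving(bool restore_during_sleep)
{
    struct model_import imp = { .invalid = false };

    mount_thread_disconnect(&imp);
    assert(imp.invalid); /* right after the disconnect step the flag is set */

    /* ... the mount thread sleeps here (fail_loc 0x906 in test_80) ... */
    if (restore_during_sleep)
        other_thread_connection_restored(&imp);

    return mount_thread_invalidate_check(&imp);
}
```

`replay_interleaving(false)` returns true (the check holds), while `replay_interleaving(true)` returns false, which corresponds to the LBUG above. Replaying orderings sequentially instead of with real threads makes the failing order reproducible every time; the model only illustrates a check-then-act window and says nothing about how the LU-4913 fix closes it.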
| Comment by Saurabh Tandan (Inactive) [ 10/Feb/16 ] |
|
Another instance found for interop tag 2.7.66 - 2.7.1 Server/EL7 Client, build# 3316.
Another instance found for interop tag 2.7.66 - 2.7.1 Server/EL6.7 Client, build# 3316.
| Comment by Saurabh Tandan (Inactive) [ 24/Feb/16 ] |
|
Another instance found for interop - 2.7.1 Server/EL6.7 Client, tag 2.7.90. |
| Comment by Mikhail Pershin [ 16/Jan/22 ] |
|
outdated |