Details
- Bug
- Resolution: Fixed
- Blocker
- Lustre 2.1.0
- None
- 3
- 24,361
- 5052
Description
Oracle reports this assertion failure when running obdfilter-survey:
obdfilter-survey test 2a hung and hit the following LBUG on one of the client nodes:
Lustre: DEBUG MARKER: == obdfilter-survey test 2a: Stripe F/S over the Network ============================================= 08:40:49 (1292686849)
Lustre: 8086:0:(sec.c:1474:sptlrpc_import_sec_adapt()) import lustre-OST0000_osc->host_0_UUID netid 50000: select flavor null
Lustre: 8086:0:(sec.c:1474:sptlrpc_import_sec_adapt()) Skipped 5 previous similar messages
LustreError: 8309:0:(osc_request.c:773:osc_announce_cached()) dirty 11296 - 11297 > system dirty_max 589824
LustreError: 8290:0:(osc_request.c:773:osc_announce_cached()) dirty 12006 - 12007 > system dirty_max 589824
LustreError: 8302:0:(osc_request.c:773:osc_announce_cached()) dirty 12051 - 12052 > system dirty_max 589824
LustreError: 8306:0:(osc_request.c:773:osc_announce_cached()) dirty 5853 - 5854 > system dirty_max 589824
LustreError: 8400:0:(osc_request.c:773:osc_announce_cached()) dirty 10889 - 10890 > system dirty_max 589824
LustreError: 8388:0:(osc_request.c:773:osc_announce_cached()) dirty 9779 - 9780 > system dirty_max 589824
LustreError: 8387:0:(osc_request.c:773:osc_announce_cached()) dirty 4950 - 4951 > system dirty_max 589824
LustreError: 8387:0:(osc_request.c:773:osc_announce_cached()) Skipped 1 previous similar message
LustreError: 8517:0:(osc_request.c:773:osc_announce_cached()) dirty 10796 - 10797 > system dirty_max 589824
LustreError: 8517:0:(osc_request.c:773:osc_announce_cached()) Skipped 1 previous similar message
LustreError: 8756:0:(cl_page.c:986:cl_page_own0()) page@ffff81010a07ccc0[2 ffff810063a3ecd0:0 ^0000000000000000_0000000000000000 1 0 2 ffff81010b699610 0000000000000000 0x0]
LustreError: 8756:0:(cl_page.c:986:cl_page_own0()) echo_client-page@ffff81010a24bf78 vm@ffff810101e03cc8
LustreError: 8756:0:(cl_page.c:986:cl_page_own0()) osc-page@ffff810109a339b8: 1< 0x845fed 258 0 - - - > 2< 0 0 0x0 0x308 | 0000000000000000 ffff8100614308e8 ffff810066e8f600 ffffffff889451c0 ffff810109a339b8 > 3< - ffff81011ff8e040 0 0 1 > 4< 0 7 8 39845888 - | + - + - > 5< + - + - | 0 - -512 + +>
LustreError: 8756:0:(cl_page.c:986:cl_page_own0()) end page@ffff81010a07ccc0
LustreError: 8756:0:(cl_page.c:986:cl_page_own0()) pg->cp_owner == NULL
LustreError: 8756:0:(cl_page.c:986:cl_page_own0()) ASSERTION(0) failed
LustreError: 8756:0:(cl_page.c:986:cl_page_own0()) LBUG
Pid: 8756, comm: lctl
Call Trace:
[<ffffffff885b85f1>] libcfs_debug_dumpstack+0x51/0x60 [libcfs]
[<ffffffff885b8b2a>] lbug_with_loc+0x7a/0xd0 [libcfs]
[<ffffffff885c3960>] cfs_tracefile_init+0x0/0x10a [libcfs]
[<ffffffff886ab720>] cl_page_own0+0x1a0/0x2f0 [obdclass]
[<ffffffff88ac7801>] echo_client_brw_ioctl+0x1531/0x1cd0 [obdecho]
[<ffffffff8000d47a>] dput+0x2c/0x114
[<ffffffff88066381>] nfs_lookup_revalidate+0x2be/0x443 [nfs]
[<ffffffff88acaf50>] echo_client_iocontrol+0x1360/0x1b00 [obdecho]
[<ffffffff800cc354>] zone_statistics+0x3e/0x6d
[<ffffffff800d1707>] __vmalloc_area_node+0x12e/0x156
[<ffffffff88654e17>] obd_ioctl_getdata+0x5b7/0xeb0 [obdclass]
[<ffffffff8002c9bc>] mntput_no_expire+0x19/0x89
[<ffffffff8866965c>] class_handle_ioctl+0x1dcc/0x2160 [obdclass]
[<ffffffff8000cd72>] do_path_lookup+0x275/0x2f1
[<ffffffff8000d9e4>] permission+0x8d/0xc8
[<ffffffff801aaaeb>] misc_open+0x16c/0x260
[<ffffffff8865457a>] obd_class_ioctl+0x19a/0x230 [obdclass]
[<ffffffff80064c7d>] lock_kernel+0x1b/0x32
[<ffffffff8004217f>] do_ioctl+0x55/0x6b
[<ffffffff800301de>] vfs_ioctl+0x457/0x4b9
[<ffffffff800b76a3>] audit_syscall_entry+0x180/0x1b3
[<ffffffff8004c607>] sys_ioctl+0x59/0x78
[<ffffffff8005d28d>] tracesys+0xd5/0xe0
Kernel panic - not syncing: LBUG
Eric Mei comments that the obdecho threads are apparently sharing pages that they are not supposed to share.
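To illustrate why shared pages would trip the `pg->cp_owner == NULL` check seen in the log, here is a minimal, single-threaded sketch of an ownership check. It is not the Lustre code: `struct demo_page`, `demo_page_own()` and `demo_page_disown()` are hypothetical names invented for this example, and the sketch returns an error where the log shows the real check failing as an assertion (LBUG). It also ignores locking entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a cached page; only the owner field matters here. */
struct demo_page {
    const void *owner;   /* NULL while no I/O context owns the page */
};

/*
 * Try to take ownership of the page on behalf of the I/O context `io`.
 * Returns 0 on success, -1 if some context already owns the page.
 * The -1 path corresponds to the failed "pg->cp_owner == NULL" check
 * in the log; this sketch reports it instead of asserting.
 */
static int demo_page_own(struct demo_page *pg, const void *io)
{
    if (pg->owner != NULL)
        return -1;
    pg->owner = io;
    return 0;
}

/* Release ownership; only the current owner may do this. */
static void demo_page_disown(struct demo_page *pg, const void *io)
{
    assert(pg->owner == io);
    pg->owner = NULL;
}
```

If two contexts are handed the same page and the second calls `demo_page_own()` before the first has called `demo_page_disown()`, the second call fails. Under the assumption that this mirrors the situation Eric Mei describes, that is the point at which the real code would hit the assertion.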