Details
-
Bug
-
Resolution: Fixed
-
Blocker
-
Lustre 2.8.0
-
None
-
3
-
9223372036854775807
Description
The error occurred during soak testing of build '20151027' (see 2.7.62, 5245b94a4aed1c4dcfa8b394acf916ea7accf138) + the following patches
DNE is enabled. The MDTs were formatted with ldiskfs and the OSTs with zfs as the backend file system. The MDTs are configured in an active-active HA failover configuration:
- lola-8 (mdt0,1), lola-9 (mdt2,3)
- lola-9 (mdt4,5), lola-11 (mdt6,7)
Due to the problems described in LU-7039, update_log and update_log_dirs were deleted on all MDTs while they were mounted as ldiskfs devices.
Before starting soak, the command
lctl lfsck_start -A -M soaked-MDT0000 -t all
was executed on Oct 27 at 08:00 (cluster time).
Almost immediately, nodes lola-9 and lola-10 crashed with an LBUG:
Oct 27 08:00:42 lola-9 kernel: LustreError: 7394:0:(lu_object.h:862:lu_object_attr()) ASSERTION( ((o)->lo_header->loh_attr & LOHA_EXISTS) != 0 ) failed:
Oct 27 08:00:42 lola-9 kernel: LustreError: 7394:0:(lu_object.h:862:lu_object_attr()) LBUG
-----------------------------------------------------------------------
Oct 27 08:00:42 lola-10 kernel: LustreError: 6078:0:(lu_object.h:862:lu_object_attr()) ASSERTION( ((o)->lo_header->loh_attr & LOHA_EXISTS) != 0 ) failed:
Oct 27 08:00:42 lola-10 kernel: LustreError: 6078:0:(lu_object.h:862:lu_object_attr()) LBUG
Oct 27 08:00:42 lola-10 kernel: Pid: 6078, comm: lfsck_namespace
Oct 27 08:00:42 lola-10 kernel:
Oct 27 08:00:42 lola-10 kernel: Call Trace:
Oct 27 08:00:42 lola-10 kernel: [<ffffffffa0658875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
Oct 27 08:00:42 lola-10 kernel: [<ffffffffa0658e77>] lbug_with_loc+0x47/0xb0 [libcfs]
Oct 27 08:00:42 lola-10 kernel: [<ffffffffa107b06e>] lfsck_lock+0x3fe/0x430 [lfsck]
Oct 27 08:00:42 lola-10 kernel: [<ffffffffa12374d4>] ? lod_index_try+0x184/0x300 [lod]
Oct 27 08:00:42 lola-10 kernel: [<ffffffffa1099c86>] lfsck_namespace_insert_normal+0x306/0xa00 [lfsck]
Oct 27 08:00:42 lola-10 kernel: [<ffffffffa10a1a21>] lfsck_namespace_dsd_single+0x511/0xd40 [lfsck]
Oct 27 08:00:42 lola-10 kernel: [<ffffffffa10a5156>] lfsck_namespace_double_scan_dir+0x6d6/0xe40 [lfsck]
Oct 27 08:00:42 lola-10 kernel: [<ffffffffa10a5c14>] lfsck_namespace_double_scan_one+0x354/0x1330 [lfsck]
Oct 27 08:00:42 lola-10 kernel: [<ffffffffa07a58b1>] ? lu_object_find_at+0xb1/0xe0 [obdclass]
Oct 27 08:00:42 lola-10 kernel: [<ffffffffa07a49ed>] ? lu_object_put+0x25d/0x3b0 [obdclass]
Oct 27 08:00:42 lola-10 kernel: [<ffffffffa10afdcd>] lfsck_namespace_double_scan_one_trace_file+0x5bd/0x8d0 [lfsck]
Oct 27 08:00:42 lola-10 kernel: [<ffffffffa10b045b>] lfsck_namespace_assistant_handler_p2+0x37b/0x1830 [lfsck]
Oct 27 08:00:42 lola-10 kernel: [<ffffffff81087540>] ? process_timeout+0x0/0x10
Oct 27 08:00:42 lola-10 kernel: [<ffffffffa10924a3>] lfsck_assistant_engine+0x1633/0x2010 [lfsck]
Oct 27 08:00:42 lola-10 kernel: [<ffffffff81064c00>] ? default_wake_function+0x0/0x20
Oct 27 08:00:42 lola-10 kernel: [<ffffffffa1090e70>] ? lfsck_assistant_engine+0x0/0x2010 [lfsck]
Oct 27 08:00:42 lola-10 kernel: [<ffffffff8109e78e>] kthread+0x9e/0xc0
Oct 27 08:00:42 lola-10 kernel: [<ffffffff8100c28a>] child_rip+0xa/0x20
Oct 27 08:00:42 lola-10 kernel: [<ffffffff8109e6f0>] ? kthread+0x0/0xc0
Oct 27 08:00:42 lola-10 kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20
Oct 27 08:00:42 lola-10 kernel:
Oct 27 08:00:42 lola-10 kernel: Kernel panic - not syncing: LBUG
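For triage, it can help to reduce a trace like the one above to its chain of functions. The following is only a minimal sketch and not part of the Lustre tooling: the names Frame, FRAME_RE, parse_frames and SAMPLE are hypothetical, and the regular expression assumes the "[<address>] symbol+0xoffset/0xsize [module]" frame layout seen in this log. Frames that the kernel prefixes with "?" are treated as unreliable, since the unwinder is not certain they belong to the actual call chain.

```python
import re
from typing import List, NamedTuple


class Frame(NamedTuple):
    address: str
    reliable: bool  # False when the kernel printed "?" before the symbol
    symbol: str
    offset: int     # offset of the return address inside the function
    size: int       # total size of the function
    module: str     # "" when no [module] tag follows the frame


# Assumed frame layout: [<addr>] [? ]symbol+0xOFF/0xSIZE [module]
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<q>\?\s+)?"
    r"(?P<sym>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<mod>\w+)\])?"
)


def parse_frames(log_text: str) -> List[Frame]:
    """Extract every stack frame found in log_text, in order of appearance."""
    return [
        Frame(
            address=m["addr"],
            reliable=m["q"] is None,
            symbol=m["sym"],
            offset=int(m["off"], 16),
            size=int(m["size"], 16),
            module=m["mod"] or "",
        )
        for m in FRAME_RE.finditer(log_text)
    ]


# A few frames copied from the lola-10 trace in this report.
SAMPLE = """
Oct 27 08:00:42 lola-10 kernel: [<ffffffffa0658e77>] lbug_with_loc+0x47/0xb0 [libcfs]
Oct 27 08:00:42 lola-10 kernel: [<ffffffffa107b06e>] lfsck_lock+0x3fe/0x430 [lfsck]
Oct 27 08:00:42 lola-10 kernel: [<ffffffffa12374d4>] ? lod_index_try+0x184/0x300 [lod]
Oct 27 08:00:42 lola-10 kernel: [<ffffffff8109e78e>] kthread+0x9e/0xc0
"""

if __name__ == "__main__":
    # Print only the frames the unwinder considers reliable, innermost first.
    for frame in parse_frames(SAMPLE):
        if frame.reliable:
            print(f"{frame.symbol} [{frame.module or 'kernel'}]")
```

Because the pattern is matched with finditer over the whole text, the sketch does not depend on each log entry being on its own line.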
Crash dump files have been written for the first event:
- lola-9:/var/crash/127.0.0.1-2015-10-27-08:00:58
and can be provided on demand.
No corresponding log entries exist on any other node.