[LU-7384] lu_object_attr()) ASSERTION( ((o)->lo_header->loh_attr & LOHA_EXISTS) != 0 Created: 04/Nov/15  Updated: 24/Nov/15  Resolved: 24/Nov/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: Lustre 2.8.0

Type: Bug Priority: Blocker
Reporter: nasf (Inactive) Assignee: nasf (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

The error occurred during soak testing of build '20151027' (see 2.7.62, commit 5245b94a4aed1c4dcfa8b394acf916ea7accf138) plus the following patches
DNE is enabled. The MDTs were formatted with ldiskfs and the OSTs with ZFS as the backend file system. The MDTs are set up in an active-active HA failover configuration:

  • lola-8 (mdt0,1), lola-9 (mdt2,3)
  • lola-10 (mdt4,5), lola-11 (mdt6,7)

Due to the problems described in LU-7039, update_log and update_log_dirs were deleted on all MDTs while they were mounted as ldiskfs devices.
Before starting the soak test, the command

 lctl lfsck_start -A -M soaked-MDT0000 -t all

was executed at Oct 27 08:00 (cluster time).
Almost immediately, nodes lola-9 and lola-10 crashed with an LBUG:

Oct 27 08:00:42 lola-9 kernel: LustreError: 7394:0:(lu_object.h:862:lu_object_attr()) ASSERTION( ((o)->lo_header->loh_attr & LOHA_EXISTS) != 0 ) failed:
Oct 27 08:00:42 lola-9 kernel: LustreError: 7394:0:(lu_object.h:862:lu_object_attr()) LBUG

-----------------------------------------------------------------------
Oct 27 08:00:42 lola-10 kernel: LustreError: 6078:0:(lu_object.h:862:lu_object_attr()) ASSERTION( ((o)->lo_header->loh_attr & LOHA_EXISTS) != 0 ) failed:
Oct 27 08:00:42 lola-10 kernel: LustreError: 6078:0:(lu_object.h:862:lu_object_attr()) LBUG
Oct 27 08:00:42 lola-10 kernel: Pid: 6078, comm: lfsck_namespace
Oct 27 08:00:42 lola-10 kernel: 
Oct 27 08:00:42 lola-10 kernel: Call Trace:
Oct 27 08:00:42 lola-10 kernel: [<ffffffffa0658875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
Oct 27 08:00:42 lola-10 kernel: [<ffffffffa0658e77>] lbug_with_loc+0x47/0xb0 [libcfs]
Oct 27 08:00:42 lola-10 kernel: [<ffffffffa107b06e>] lfsck_lock+0x3fe/0x430 [lfsck]
Oct 27 08:00:42 lola-10 kernel: [<ffffffffa12374d4>] ? lod_index_try+0x184/0x300 [lod]
Oct 27 08:00:42 lola-10 kernel: [<ffffffffa1099c86>] lfsck_namespace_insert_normal+0x306/0xa00 [lfsck]
Oct 27 08:00:42 lola-10 kernel: [<ffffffffa10a1a21>] lfsck_namespace_dsd_single+0x511/0xd40 [lfsck]
Oct 27 08:00:42 lola-10 kernel: [<ffffffffa10a5156>] lfsck_namespace_double_scan_dir+0x6d6/0xe40 [lfsck]
Oct 27 08:00:42 lola-10 kernel: [<ffffffffa10a5c14>] lfsck_namespace_double_scan_one+0x354/0x1330 [lfsck]
Oct 27 08:00:42 lola-10 kernel: [<ffffffffa07a58b1>] ? lu_object_find_at+0xb1/0xe0 [obdclass]
Oct 27 08:00:42 lola-10 kernel: [<ffffffffa07a49ed>] ? lu_object_put+0x25d/0x3b0 [obdclass]
Oct 27 08:00:42 lola-10 kernel: [<ffffffffa10afdcd>] lfsck_namespace_double_scan_one_trace_file+0x5bd/0x8d0 [lfsck]
Oct 27 08:00:42 lola-10 kernel: [<ffffffffa10b045b>] lfsck_namespace_assistant_handler_p2+0x37b/0x1830 [lfsck]
Oct 27 08:00:42 lola-10 kernel: [<ffffffff81087540>] ? process_timeout+0x0/0x10
Oct 27 08:00:42 lola-10 kernel: [<ffffffffa10924a3>] lfsck_assistant_engine+0x1633/0x2010 [lfsck]
Oct 27 08:00:42 lola-10 kernel: [<ffffffff81064c00>] ? default_wake_function+0x0/0x20
Oct 27 08:00:42 lola-10 kernel: [<ffffffffa1090e70>] ? lfsck_assistant_engine+0x0/0x2010 [lfsck]
Oct 27 08:00:42 lola-10 kernel: [<ffffffff8109e78e>] kthread+0x9e/0xc0
Oct 27 08:00:42 lola-10 kernel: [<ffffffff8100c28a>] child_rip+0xa/0x20
Oct 27 08:00:42 lola-10 kernel: [<ffffffff8109e6f0>] ? kthread+0x0/0xc0
Oct 27 08:00:42 lola-10 kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20
Oct 27 08:00:42 lola-10 kernel: 
Oct 27 08:00:42 lola-10 kernel: Kernel panic - not syncing: LBUG

Crash dump files have been written for the first event and can be provided on demand:

  • lola-9:/var/crash/127.0.0.1-2015-10-27-08:00:58

No corresponding log entries exist on any other node.



 Comments   
Comment by Gerrit Updater [ 04/Nov/15 ]

Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/17042
Subject: LU-7384 lfsck: check transaction stop status
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 3c547175ac5f997a5f4d2ae2d19cf331642c61ab

Comment by Gerrit Updater [ 24/Nov/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/17042/
Subject: LU-7384 lfsck: check transaction stop status
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 9a466ecccf5546e9fc3d0ce7b5c11280377e5a02

Comment by Joseph Gmitter (Inactive) [ 24/Nov/15 ]

Landed for 2.8

Generated at Sat Feb 10 02:08:26 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.