[LU-8740] LBUG dt_declare_delete() ASSERTION( dt ) failed - in lfsck Created: 20/Oct/16  Updated: 17/Dec/16  Resolved: 17/Dec/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.10.0

Type: Bug Priority: Critical
Reporter: Cliff White (Inactive) Assignee: nasf (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

After a failover, lfsck was started on lola-8 (the MGS/MDS node) with the following command:

2016-10-20 05:38:51,030:fsmgmt.fsmgmt:INFO     executing cmd:
                lctl lfsck_start -M soaked-MDT0000 -s 1000 -t all
                if [[ $? != 0 ]]; then
                        lctl lfsck_start -M soaked-MDT0000 -s 1000 -t namespace,layout
                        if [[ $? != 0 ]]; then
                                lctl lfsck_start -M soaked-MDT0000 -s 1000 -t namespace
                        fi
                fi

The node immediately hit an LBUG.

Oct 20 05:38:53 lola-8 kernel: LustreError: 6404:0:(dt_object.h:2639:dt_declare_delete()) ASSERTION( dt ) failed:
Oct 20 05:38:53 lola-8 kernel: LustreError: 6404:0:(dt_object.h:2639:dt_declare_delete()) LBUG
Oct 20 05:38:53 lola-8 kernel: Pid: 6404, comm: mdt_out00_018
Oct 20 05:38:53 lola-8 kernel:
Oct 20 05:38:53 lola-8 kernel: Call Trace:
Oct 20 05:38:53 lola-8 kernel: [<ffffffffa081d875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
Oct 20 05:38:53 lola-8 kernel: [<ffffffffa081de77>] lbug_with_loc+0x47/0xb0 [libcfs]
Oct 20 05:38:53 lola-8 kernel: [<ffffffffa11cab8d>] dt_declare_delete+0x12d/0x1a0 [lfsck] 
Oct 20 05:38:53 lola-8 kernel: [<ffffffffa082e495>] ? cfs_hash_bd_lookup_intent+0x65/0x130 [libcfs]
Oct 20 05:38:53 lola-8 kernel: [<ffffffffa11df3ce>] lfsck_namespace_in_notify+0x46e/0xd30 [lfsck]
Oct 20 05:38:53 lola-8 kernel: [<ffffffffa0938859>] ? htable_lookup+0xd9/0x210 [obdclass]
Oct 20 05:38:53 lola-8 kernel: [<ffffffffa11bd647>] lfsck_in_notify+0xf7/0x610 [lfsck]
Oct 20 05:38:53 lola-8 kernel: [<ffffffffa10b2a7c>] ? osd_declare_xattr_set+0x17c/0x330 [osd_ldiskfs]
Oct 20 05:38:53 lola-8 kernel: [<ffffffffa0ba343a>] out_xattr_set_add_exec+0x20a/0x3e0 [ptlrpc]
Oct 20 05:38:53 lola-8 kernel: [<ffffffffa11bd550>] ? lfsck_in_notify+0x0/0x610 [lfsck]
Oct 20 05:38:53 lola-8 kernel: [<ffffffffa0b9d73a>] out_xattr_set+0x33a/0x430 [ptlrpc]
Oct 20 05:38:53 lola-8 kernel: [<ffffffffa0ba0fd7>] out_handle+0x1377/0x18d0 [ptlrpc]
Oct 20 05:38:53 lola-8 kernel: [<ffffffff8105e9b6>] ? enqueue_task+0x66/0x80
Oct 20 05:38:53 lola-8 kernel: [<ffffffff8105ab8d>] ? check_preempt_curr+0x6d/0x90
Oct 20 05:38:53 lola-8 kernel: [<ffffffffa0aede20>] ? target_bulk_timeout+0x0/0xc0 [ptlrpc]
Oct 20 05:38:53 lola-8 kernel: [<ffffffffa0b90e4a>] ? req_can_reconstruct+0x6a/0x120 [ptlrpc]
Oct 20 05:38:53 lola-8 kernel: [<ffffffffa0b9803c>] tgt_request_handle+0x8ec/0x1440 [ptlrpc]
Oct 20 05:38:53 lola-8 kernel: [<ffffffffa0b44791>] ptlrpc_main+0xd31/0x1800 [ptlrpc]
Oct 20 05:38:53 lola-8 kernel: [<ffffffff81539b0e>] ? thread_return+0x4e/0x7d0
Oct 20 05:38:53 lola-8 kernel: [<ffffffffa0b43a60>] ? ptlrpc_main+0x0/0x1800 [ptlrpc]
Oct 20 05:38:53 lola-8 kernel: [<ffffffff810a138e>] kthread+0x9e/0xc0
Oct 20 05:38:53 lola-8 kernel: [<ffffffff8100c28a>] child_rip+0xa/0x20
Oct 20 05:38:53 lola-8 kernel: [<ffffffff810a12f0>] ? kthread+0x0/0xc0
Oct 20 05:38:53 lola-8 kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20
Oct 20 05:38:53 lola-8 kernel:


 Comments   
Comment by Cliff White (Inactive) [ 20/Oct/16 ]

The crash dump from lola-8 is available on lola at:

/scratch/crash_lustre/lu-8740/lola-8
Comment by Joseph Gmitter (Inactive) [ 20/Oct/16 ]

Hi Fan Yong,

Could you look at this issue?

Thanks.
Joe

Comment by Gerrit Updater [ 21/Oct/16 ]

Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/23301
Subject: LU-8740 lfsck: hold lock when access trace file object
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 4481e376002c8a5b09eb4bf054b67048ac38b223

Comment by Gerrit Updater [ 17/Dec/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/23301/
Subject: LU-8740 lfsck: hold lock when access trace file object
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 3d255acab28a2e36a90460bee4fbf7a88fad815c

Generated at Sat Feb 10 02:20:09 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.