[LU-6896] update llog object is missing during recovery. Created: 23/Jul/15  Updated: 09/Sep/16  Resolved: 10/Aug/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: Lustre 2.8.0

Type: Bug
Priority: Critical
Reporter: Di Wang
Assignee: Di Wang
Resolution: Fixed
Votes: 0
Labels: dne2

Issue Links:
Related: is related to LU-6831 "The ticket for tracking all DNE2 bugs" (Reopened)
Severity: 3
Rank (Obsolete): 9223372036854775807

Description

Found this during a 24-hour failover test on hyperion.

2015-07-22 21:06:58 Lustre: DEBUG MARKER: ==== Checking the clients loads AFTER failover -- failure NOT OK
2015-07-22 21:08:08 LustreError: 6302:0:(update_recovery.c:726:update_is_committed()) lustre-MDT0001: master transno 4294990737 committed 4294990976
2015-07-22 21:08:08 LustreError: 6302:0:(update_recovery.c:738:update_is_committed()) LBUG
2015-07-22 21:08:08 Pid: 6302, comm: tgt_recov
2015-07-22 21:08:08
2015-07-22 21:08:08 Call Trace:
2015-07-22 21:08:08  [<ffffffffa04a7875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
2015-07-22 21:08:08  [<ffffffffa04a7e77>] lbug_with_loc+0x47/0xb0 [libcfs]
2015-07-22 21:08:08  [<ffffffffa08ed00e>] update_recovery_exec+0x1ebe/0x1f20 [ptlrpc]
2015-07-22 21:08:08  [<ffffffffa04b3c01>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
2015-07-22 21:08:08  [<ffffffffa08eeed7>] distribute_txn_replay_handle+0x387/0x10c0 [ptlrpc]
2015-07-22 21:08:08  [<ffffffffa04b3c01>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
2015-07-22 21:08:08  [<ffffffffa0824483>] ? target_recovery_overseer+0x93/0x320 [ptlrpc]
2015-07-22 21:08:08  [<ffffffffa082b39f>] target_recovery_thread+0x92f/0x24d0 [ptlrpc] 
2015-07-22 21:08:08  [<ffffffffa082aa70>] ? target_recovery_thread+0x0/0x24d0 [ptlrpc]
2015-07-22 21:08:08  [<ffffffff8109e78e>] kthread+0x9e/0xc0
2015-07-22 21:08:08  [<ffffffff8100c28a>] child_rip+0xa/0x20
2015-07-22 21:08:08  [<ffffffff8109e6f0>] ? kthread+0x0/0xc0
2015-07-22 21:08:08  [<ffffffff8100c280>] ? child_rip+0x0/0x20

After checking the logs, it appears that the update llog object is missing after recovery. We probably need more synchronization around update llog creation.
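
The following is a minimal toy model, not Lustre code, of why an object created without a synchronous commit can be missing after a failover. Every name in it (ToyTarget, create, crash_and_restart, log_survives_failover, "update_llog") is hypothetical and exists only for illustration. It assumes the general failure mode suggested by the patch subject "update log initialization needs to be sync"; the actual change is in the patch at http://review.whamcloud.com/15721.

```python
class ToyTarget:
    """Toy storage target: changes stay in memory until they are committed.

    Hypothetical model for illustration only; it does not mirror Lustre's
    real data structures or APIs.
    """

    def __init__(self):
        self.in_memory = set()  # objects visible to the running server
        self.on_disk = set()    # objects that survive a crash

    def create(self, name, sync=False):
        """Create an object; with sync=True, commit before returning."""
        self.in_memory.add(name)
        if sync:
            self.commit()

    def commit(self):
        """Make everything currently in memory durable."""
        self.on_disk |= self.in_memory

    def crash_and_restart(self):
        """Simulate a failover: uncommitted state is lost."""
        self.in_memory = set(self.on_disk)

    def has(self, name):
        return name in self.in_memory


def log_survives_failover(sync):
    """Create the log object, fail over, and report whether it still exists."""
    target = ToyTarget()
    target.create("update_llog", sync=sync)
    target.crash_and_restart()
    return target.has("update_llog")


if __name__ == "__main__":
    print("async creation survives failover:", log_survives_failover(sync=False))
    print("sync creation survives failover:", log_survives_failover(sync=True))
```

In this model the asynchronously created object is lost if the target fails before the next commit, while the synchronously created one is always present when recovery starts.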



Comments
Comment by Gerrit Updater [ 25/Jul/15 ]

wangdi (di.wang@intel.com) uploaded a new patch: http://review.whamcloud.com/15721
Subject: LU-6896 llog: update log initialization needs to be sync
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: be5bdd2e844c758014223ca9fd01b31ace9027ae

Comment by Gerrit Updater [ 09/Aug/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/15721/
Subject: LU-6896 llog: update log initialization needs to be sync
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 42a175ccdd39f164a12c65e9f9cd726bd7a3aaed

Comment by Peter Jones [ 10/Aug/15 ]

Landed for 2.8

Generated at Sat Feb 10 02:04:13 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.