[LU-12616] MDS node crashed LustreError: 23042:0:(mdt_handler.c:5135:mdt_init0()) ASSERTION( info != ((void *)0) ) Created: 30/Jul/19  Updated: 23/Jun/20  Resolved: 03/Sep/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.13.0

Type: Bug Priority: Major
Reporter: Alexander Boyko Assignee: Alexander Boyko
Resolution: Fixed Votes: 0
Labels: LTS12, patch

Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   
[ 1832.493775] LNet: 26562:0:(socklnd_cb.c:425:ksocknal_txlist_done()) Deleting packet type 1 len 520 172.18.1.3@tcp->172.18.1.4@tcp
[ 1870.590047] LustreError: 20610:0:(mgc_request.c:249:do_config_log_add()) MGC172.18.1.3@tcp: failed processing log, type 4: rc = -110
[ 1882.605517] LustreError: 23042:0:(mdt_handler.c:5135:mdt_init0()) ASSERTION( info != ((void *)0) ) failed:
[ 1882.616438] LustreError: 23042:0:(mdt_handler.c:5135:mdt_init0()) LBUG  

 The dk log shows the following sequence of steps:

1. started cleanup of MDT01
2. started cleanup of MDT00
3. finished cleanup of MDT01, and also cleanup of MDS
4. started MDT01 mount + setup of MDS
5. finished setup of MDS
6. finished cleanup of MDT00, and also cleanup of MDS
7. asserted during MDT01 initialization
The main problem is that the MDS was stopped while MDT01 was being mounted. This looks like incorrect cleanup of the MDS.
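
The ordering described above can be replayed in a small, single-threaded model. This is only an illustrative sketch and not the Lustre code or the landed patch: the class names, the method names, and the "registered targets" stop rule are all hypothetical, chosen so that the naive model reproduces the sequence from the dk log. The second model shows one common way to close such a window (a use count taken at mount start and dropped at cleanup finish); whether the actual fix works this way is not stated in this ticket.

```python
class NaiveService:
    """Hypothetical model: the shared service (MDS) is stopped whenever
    a finishing cleanup sees no registered targets."""

    def __init__(self, targets):
        self.registered = set(targets)
        self.running = True

    def cleanup_begin(self, target):
        # The target is deregistered as soon as its cleanup starts.
        self.registered.discard(target)

    def cleanup_finish(self, target):
        # Stop decision based only on the registered-target list.
        if not self.registered:
            self.running = False

    def mount_begin(self, target):
        # Service setup; the target itself is not registered yet.
        self.running = True

    def mount_init(self, target):
        # Stands in for the failed assertion in the log.
        if not self.running:
            raise RuntimeError("service stopped during %s init" % target)
        self.registered.add(target)


class RefCountedService:
    """Hypothetical alternative: a use count decides when to stop."""

    def __init__(self, targets):
        self.users = len(targets)
        self.running = True

    def cleanup_begin(self, target):
        pass  # the reference is held until cleanup finishes

    def cleanup_finish(self, target):
        self.users -= 1
        if self.users == 0:
            self.running = False

    def mount_begin(self, target):
        if self.users == 0:
            self.running = True
        self.users += 1  # reference taken before initialization

    def mount_init(self, target):
        if not self.running:
            raise RuntimeError("service stopped during %s init" % target)


# The step order reported in the description.
STEPS = [
    ("cleanup_begin", "MDT01"),
    ("cleanup_begin", "MDT00"),
    ("cleanup_finish", "MDT01"),
    ("mount_begin", "MDT01"),
    ("cleanup_finish", "MDT00"),
    ("mount_init", "MDT01"),
]


def survives(service):
    """Replay STEPS; return False if initialization hits the stopped service."""
    try:
        for op, target in STEPS:
            getattr(service, op)(target)
    except RuntimeError:
        return False
    return True
```

Calling survives(NaiveService(["MDT00", "MDT01"])) returns False, matching the reported failure, while survives(RefCountedService(["MDT00", "MDT01"])) returns True. In real concurrent code the count and the start/stop transitions would also need to be protected by a lock; the model omits that because it replays a fixed order.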



 Comments   
Comment by Gerrit Updater [ 30/Jul/19 ]

Alexandr Boyko (c17825@cray.com) uploaded a new patch: https://review.whamcloud.com/35652
Subject: LU-12616 obclass: fix MDS start/stop race
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 347718de9bfcd759949a0a56221ae5b75afd02dd

Comment by James A Simmons [ 05/Aug/19 ]

I think I might have caused this. Can you try patch https://review.whamcloud.com/#/c/34718/

Comment by Gerrit Updater [ 03/Sep/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35652/
Subject: LU-12616 obclass: fix MDS start/stop race
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 3cce65712d94cffe8f1626545845b95b88aef672

Comment by Peter Jones [ 03/Sep/19 ]

Landed for 2.13

Comment by Gerrit Updater [ 23/Jun/20 ]

Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/39157
Subject: LU-12616 obclass: fix MDS start/stop race
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 8c757edba00b5fd6ddf76eb41b41fd398e95eb66
