[LU-6906] During a 24-hour DNE test, one MDS cannot be mounted after a restart. Created: 26/Jul/15  Updated: 10/Aug/15  Resolved: 03/Aug/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: Lustre 2.8.0

Type: Bug
Priority: Minor
Reporter: Di Wang
Assignee: Di Wang
Resolution: Fixed
Votes: 0
Labels: None

Issue Links:
Related: is related to LU-6831 "The ticket for tracking all DNE2 bugs" (Status: Reopened)
Severity: 3

 Description   

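During a 24-hour DNE test (recovery-mds-scale, test_failover_mds), one of the MDSs (mds3, serving lustre-MDT0002) could not be mounted after a restart. The mount failed with -5 (EIO) after the MGC failed to process the recover llog: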

Lustre: 2635:0:(client.c:2018:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1437941992/real 1437941992]  req@ffff881022b95980 x1507790665285844/t0(0) o256->MGC192.168.2.125@o2ib@192.168.2.125@o2ib:26/25 lens 304/240 e 0 to 1 dl 1437942748 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
LustreError: 166-1: MGC192.168.2.125@o2ib: Connection to MGS (at 192.168.2.125@o2ib) was lost; in progress operations using this service will fail
LustreError: 2635:0:(mgc_request.c:2072:mgc_process_config()) Cannot process recover llog -5
Lustre: MGC192.168.2.125@o2ib: Connection restored to MGS (at 192.168.2.125@o2ib)
LustreError: 15c-8: MGC192.168.2.125@o2ib: The configuration from log 'lustre-MDT0002' failed (-5). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
LustreError: 2635:0:(obd_mount_server.c:1306:server_start_targets()) failed to start server lustre-MDT0002: -5
LustreError: 2635:0:(obd_mount_server.c:1790:server_fill_super()) Unable to start targets: -5
Lustre: Failing over lustre-MDT0002
Lustre: server umount lustre-MDT0002 complete
LustreError: 2635:0:(obd_mount.c:1342:lustre_fill_super()) Unable to mount  (-5)
Lustre: DEBUG MARKER: recovery-mds-scale test_failover_mds: @@@@@@ FAIL: Restart of mds3 failed!
Lustre: DEBUG MARKER: Duration: 86400
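
When reading a trace like the one above, it can help to pull the negative return code out of each LustreError line and translate it to its errno name. The following is a minimal sketch for that purpose only; the helper name `decode_rc` and the regular expression are assumptions made for illustration and are not part of Lustre or its tooling.

```python
import errno
import os
import re

# A negative number that is not glued to a preceding word character, so that
# message IDs such as "166-1" or "15c-8" and names such as "lustre-MDT0002"
# are not mistaken for return codes.
NEG_RC = re.compile(r"(?<!\w)-(\d+)\b")


def decode_rc(line):
    """Return (rc, errno name, message) for the last negative rc in line, or None."""
    matches = NEG_RC.findall(line)
    if not matches:
        return None
    code = int(matches[-1])
    return -code, errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


if __name__ == "__main__":
    sample = ("LustreError: 2635:0:(mgc_request.c:2072:mgc_process_config()) "
              "Cannot process recover llog -5")
    print(decode_rc(sample))
```

Applied to the lines above, every failure from `mgc_process_config()` up to `lustre_fill_super()` carries the same code, -5 (EIO), which suggests the recover llog error was propagated unchanged up the mount path.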


 Comments   
Comment by Gerrit Updater [ 26/Jul/15 ]

wangdi (di.wang@intel.com) uploaded a new patch: http://review.whamcloud.com/15728
Subject: LU-6906 mgc: IR log failure should not stop mount
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: ae9a3fd00c393b111455b04d3a1b7f48906e6979

Comment by Gerrit Updater [ 03/Aug/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/15728/
Subject: LU-6906 mgc: IR log failure should not stop mount
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: ad4ca062720cf5c0ee5bda5c9dcc3e09521d07bb
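
The patch subject ("IR log failure should not stop mount") describes a policy change: a failure while processing the recovery (IR) log is tolerated, while a failure of the main configuration log still aborts the mount. The sketch below illustrates that policy in Python under stated assumptions; it is not the merged C patch, and `start_target`, `fetch`, and the log names are hypothetical names introduced only for this example.

```python
import errno


def start_target(fetch, config_log, recovery_log, warn=print):
    """Process a required config log and an optional recovery log.

    fetch(name) is a hypothetical callable returning 0 on success or a
    negative errno value on failure, mirroring the kernel convention.
    """
    # The main configuration log is mandatory: any failure stops the mount.
    rc = fetch(config_log)
    if rc != 0:
        return rc

    # The recovery log is treated as best effort: report the failure
    # and continue instead of propagating the error to the caller.
    rc = fetch(recovery_log)
    if rc != 0:
        warn(f"cannot process recovery log {recovery_log!r}: {rc} (ignored)")

    return 0


if __name__ == "__main__":
    # Simulate the situation in this ticket: only the recovery log fails with -EIO.
    def flaky_fetch(name):
        return -errno.EIO if name == "recovery" else 0

    print(start_target(flaky_fetch, "config", "recovery"))
```

With this policy, a transient MGS timeout that affects only the recovery log produces a warning rather than a failed restart.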
