[LU-11056] OSS can't connect to MDS after hard reboot Created: 25/May/18  Updated: 01/Feb/19  Resolved: 28/Aug/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.12.0, Lustre 2.10.7

Type: Bug Priority: Minor
Reporter: Hongchao Zhang Assignee: Hongchao Zhang
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
Severity: 3

 Description   

After the OSS was powered off and rebooted (a hard reboot), it could not connect to the MDS:

May 17 17:59:49 zfswh2-oss-17-2 kernel: LustreError: 11-0: zfswh2-MDT0000-lwp-OST0006: operation mds_connect to node 10.26.1.1@tcp failed: rc = -114
May 18 15:12:05 zfswh2-oss-17-2 kernel: Lustre: zfswh2-OST0006: Connection restored to zfswh2-MDT0000-mdtlov_UUID (at 10.26.1.1@tcp)
May 18 15:12:05 zfswh2-oss-17-2 kernel: Lustre: zfswh2-OST0007: Connection restored to zfswh2-MDT0000-mdtlov_UUID (at 10.26.1.1@tcp)
May 18 15:12:19 zfswh2-oss-17-2 kernel: Lustre: zfswh2-MDT0000-lwp-OST0005: Connection restored to 10.26.1.1@tcp (at 10.26.1.1@tcp)
May 18 15:13:09 zfswh2-oss-17-2 kernel: Lustre: Evicted from MGS (at 10.26.1.1@tcp) after server handle changed from 0xf074011de3327f2c to 0xb0f98aa68fc803d9
May 18 15:13:09 zfswh2-oss-17-2 kernel: Lustre: MGC10.26.1.1@tcp: Connection restored to 10.26.1.1@tcp (at 10.26.1.1@tcp)
May 18 15:13:34 zfswh2-oss-17-2 kernel: Lustre: zfswh2-MDT0000-lwp-OST0006: Connection restored to 10.26.1.1@tcp (at 10.26.1.1@tcp)
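The only explicit error in the log above is the mds_connect failure with rc = -114. Lustre kernel code reports failures as negated Linux errno values, so on a Linux host this decodes to -EALREADY ("Operation already in progress"). The Python sketch below is a hypothetical triage helper, not part of Lustre. It extracts the rc from a console line and names it, assuming that the errno table of the machine running the script matches the server's (true for Linux hosts).

```python
from __future__ import annotations

import errno
import os
import re

# Hypothetical pattern: matches the "failed: rc = <n>" suffix of a LustreError line.
RC_PATTERN = re.compile(r"failed: rc = (-?\d+)")


def rc_from_log(line: str) -> int | None:
    """Return the rc value from a Lustre console line, or None if there is none."""
    match = RC_PATTERN.search(line)
    return int(match.group(1)) if match else None


def describe_rc(rc: int) -> tuple[str, str]:
    """Translate a negative rc into (errno name, message).

    Assumes rc is a negated errno value and uses the errno table of the
    host running this script (Linux errno numbers on a Lustre server).
    """
    code = -rc
    name = errno.errorcode.get(code, f"unknown errno {code}")
    return name, os.strerror(code)


if __name__ == "__main__":
    line = ("LustreError: 11-0: zfswh2-MDT0000-lwp-OST0006: operation "
            "mds_connect to node 10.26.1.1@tcp failed: rc = -114")
    rc = rc_from_log(line)
    if rc is not None:
        print(rc, *describe_rc(rc))
```

Lines without an rc (such as the "Connection restored" messages) yield None and are skipped, so the helper can be run over a whole console log.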


 Comments   
Comment by Hongchao Zhang [ 25/May/18 ]

The patch is tracked at https://review.whamcloud.com/#/c/32536

Comment by Gerrit Updater [ 28/Aug/18 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32536/
Subject: LU-11056 lwp: fix lwp reconnection issue
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 0814d5077343953115f50982a2e93cebb29bda68

Comment by Peter Jones [ 28/Aug/18 ]

Landed for 2.12

Comment by Gerrit Updater [ 07/Jan/19 ]

Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33977
Subject: LU-11056 lwp: fix lwp reconnection issue
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: 9d8b82481cbefd2fc6ffe48fd695dad95c5f8bd8

Comment by Gerrit Updater [ 19/Jan/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33977/
Subject: LU-11056 lwp: fix lwp reconnection issue
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: f9f2325146cee63ff1501479f720f9711538122a

Comment by Gerrit Updater [ 31/Jan/19 ]

Oleg Drokin (green@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34149
Subject: Revert "LU-11056 lwp: fix lwp reconnection issue"
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: dcb70a2a8be6f30dbed923e62dbcfee1db452afc

Comment by Gerrit Updater [ 01/Feb/19 ]

James Nunez (jnunez@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34154
Subject: Revert "LU-11056 lwp: fix lwp reconnection issue"
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: 9183a17b8a5ab7df798b5c58d53d4a490bb01d89

Generated at Sat Feb 10 02:40:33 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.