[LU-7778] mount of MDT(==MGS) failed after MDS restart Created: 16/Feb/16 Updated: 24/Feb/16 Resolved: 24/Feb/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.8.0 |
| Fix Version/s: | Lustre 2.8.0, Lustre 2.9.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Frank Heckes (Inactive) | Assignee: | Di Wang |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | soak | ||
| Environment: |
lola |
||
| Attachments: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
The error occurred during soak testing of build '20160215' (see https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-20150215). DNE is enabled. Please note that build 20150215 is a vanilla build of the master branch. Sequence of events:
Attached are the messages, console, and manually forced debug logs of node lola-8. |
| Comments |
| Comment by Di Wang [ 16/Feb/16 ] |
Feb 15 16:37:47 lola-8 kernel: LDISKFS-fs (dm-2): mounted filesystem with ordered data mode. quota=on. Opts:
Feb 15 16:37:48 lola-8 kernel: LustreError: 11-0: soaked-MDT0006-osp-MDT0001: operation mds_connect to node 192.168.1.111@o2ib10 failed: rc = -16
Feb 15 16:37:48 lola-8 kernel: LustreError: Skipped 3 previous similar messages
Feb 15 16:37:48 lola-8 kernel: Lustre: 4320:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1455583068/real 1455583068] req@ffff8804037909c0 x1526289292853684/t0(0) o38->soaked-MDT0000-osp-MDT0001@192.168.1.109@o2ib10:24/4 lens 520/544 e 0 to 1 dl 1455583079 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Feb 15 16:37:48 lola-8 kernel: Lustre: 4320:0:(client.c:2063:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
Feb 15 16:37:54 lola-8 kernel: LustreError: 137-5: soaked-MDT0003_UUID: not available for connect from 192.168.1.104@o2ib10 (no target). If you are running an HA pair check that the target is mounted on the other server.
Feb 15 16:37:54 lola-8 kernel: LustreError: Skipped 58 previous similar messages
Feb 15 16:38:03 lola-8 kernel: Lustre: soaked-MDT0001: Client d26c53bc-3d10-5c53-0c35-f189140fc2e8 (at 192.168.1.131@o2ib100) reconnecting, waiting for 14 clients in recovery for 3:53
Feb 15 16:38:03 lola-8 kernel: Lustre: Skipped 180 previous similar messages
Feb 15 16:38:20 lola-8 kernel: LustreError: 15c-8: MGC192.168.1.108@o2ib10: The configuration from log 'soaked-MDT0000' failed (-5). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
Feb 15 16:38:20 lola-8 kernel: LustreError: 4538:0:(obd_mount_server.c:1309:server_start_targets()) failed to start server soaked-MDT0000: -5

It looks like MDT0 has trouble communicating with the MGS. Unfortunately, there are no logs that indicate what happened. I guess I need to monitor the "run". |
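For readers triaging the excerpt above, the return codes it reports are negated Linux errno values. The following is a minimal sketch, not part of the original comment, that pulls those codes out of three of the logged lines and maps them to their symbolic names. The helper names `extract_rc` and `decode_rc` and the regular expression are hypothetical, written only for this illustration; the sample lines are copied from the log with timestamps trimmed.

```python
import errno
import re

# Sample lines copied from the lola-8 messages above (timestamps trimmed,
# the second line shortened after the return code).
LINES = [
    "LustreError: 11-0: soaked-MDT0006-osp-MDT0001: operation mds_connect "
    "to node 192.168.1.111@o2ib10 failed: rc = -16",
    "LustreError: 15c-8: MGC192.168.1.108@o2ib10: The configuration from "
    "log 'soaked-MDT0000' failed (-5).",
    "LustreError: 4538:0:(obd_mount_server.c:1309:server_start_targets()) "
    "failed to start server soaked-MDT0000: -5",
]

# Hypothetical pattern covering the three forms a return code takes above:
# "rc = -N", "failed (-N)", and ": -N".
RC_PATTERN = re.compile(r"(?:rc = |failed \(|: )(-\d+)")


def extract_rc(line):
    """Return the first negative return code found in a log line, or None."""
    match = RC_PATTERN.search(line)
    return int(match.group(1)) if match else None


def decode_rc(rc):
    """Map a negated errno value (e.g. -16) to its symbolic name."""
    return errno.errorcode.get(-rc, "unknown")


if __name__ == "__main__":
    for line in LINES:
        rc = extract_rc(line)
        print(rc, decode_rc(rc))
```

On Linux this reports EBUSY for the failed mds_connect (rc = -16) and EIO for the two soaked-MDT0000 failures (-5). The symbolic names only narrow down the class of failure; they do not by themselves identify the root cause.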
| Comment by Gerrit Updater [ 18/Feb/16 ] |
|
wangdi (di.wang@intel.com) uploaded a new patch: http://review.whamcloud.com/18509 |
| Comment by Gerrit Updater [ 24/Feb/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/18509/ |
| Comment by Peter Jones [ 24/Feb/16 ] |
|
Landed for 2.8 and 2.9 |