Details
- Bug
- Resolution: Fixed
- Major
- None
- Lustre 2.4.1
- 3
- 15573
Description
Lustre 2.4 clients initially failed to mount the Sonexion Lustre filesystem because of InfiniBand (IB) issues. Subsequent mount attempts then failed because the failed first attempt left an MGC device entry behind.
Here is the log from a client for the initial failed mount attempt:
console-20131016t19:2013-10-16T19:27:10.464396-05:00 c0-0c2s0n3 LustreError: 3603:0:(client.c:1052:ptlrpc_import_delay_req()) @@@ send limit expired req@ffff88041e5a9000 x1449100178358368/t0(0) o101->MGC10.10.84.202@o2ib@10.10.84.202@o2ib :26/25 lens 328/344 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
console-20131016t19:2013-10-16T19:27:20.047254-05:00 c0-0c2s0n3 LustreError: 3593:0:(client.c:1052:ptlrpc_import_delay_req()) @@@ send limit expired req@ffff88081bdee400 x1449100178358376/t0(0) o101->MGC10.10.84.202@o2ib@10.10.84.202@o2ib :26/25 lens 328/344 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
console-20131016t19:2013-10-16T19:27:32.906155-05:00 c0-0c2s0n3 LustreError: 3593:0:(client.c:1052:ptlrpc_import_delay_req()) @@@ send limit expired req@ffff88081bdee400 x1449100178358380/t0(0) o101->MGC10.10.84.202@o2ib@10.10.84.202@o2ib :26/25 lens 328/344 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
console-20131016t19:2013-10-16T19:28:02.179944-05:00 c0-0c2s0n3 LustreError: 3603:0:(client.c:1052:ptlrpc_import_delay_req()) @@@ send limit expired req@ffff88081bdf0800 x1449100178358372/t0(0) o101->MGC10.10.84.202@o2ib@10.10.84.202@o2ib :26/25 lens 328/344 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
console-20131016t19:2013-10-16T19:28:02.179952-05:00 c0-0c2s0n3 LustreError: 15c-8: MGC10.10.84.202@o2ib: The configuration from log 'snxtest-client' failed (-5). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
console-20131016t19:2013-10-16T19:28:02.179965-05:00 c0-0c2s0n3 LustreError: 3603:0:(llite_lib.c:1055:ll_fill_super()) Unable to process log: -5
console-20131016t19:2013-10-16T19:28:02.179973-05:00 c0-0c2s0n3 Lustre: Unmounted snxtest-client
console-20131016t19:2013-10-16T19:28:20.355005-05:00 c0-0c2s0n3 Lustre: 3529:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1381969623/real 1381969623] req@ffff88041e5ab800 x1449100178358364/t0(0) o250->MGC10.10.84.202@o2ib@10.10.84.202@o2ib:26/25 lens 400/544 e 0 to 1 dl 1381969698 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
console-20131016t19:2013-10-16T19:28:20.355017-05:00 c0-0c2s0n3 LustreError: 3603:0:(obd_mount.c:1267:lustre_fill_super()) Unable to mount (-5)
console-20131016t19:2013-10-16T19:28:20.355024-05:00 c0-0c2s0n3 mount.lustre: mount 10.10.84.202@o2ib:10.10.84.203@o2ib:/snxtest at /mnt/pdraid failed: Input/output error
console-20131016t19:2013-10-16T19:28:20.355031-05:00 c0-0c2s0n3 Is the MGS running?
console-20131016t19:2013-10-16T19:28:20.355039-05:00 c0-0c2s0n3 Error mounting lustre filesystem, 10.10.84.202@o2ib:10.10.84.203@o2ib:/snxtest at /mnt/pdraid
Here is the output of "lctl dl" on a client after the failed mount attempt:
# /sbin/lctl dl
0 UP mgc MGC10.10.84.39@o2ib 10d641b6-8b1c-9e11-5561-2600bf3be157 5
1 UP lov snx11000-clilov-ffff88041dfd7c00 fa5ccc52-57a8-e85d-21bf-62a2292eaab7 4
2 UP lmv snx11000-clilmv-ffff88041dfd7c00 fa5ccc52-57a8-e85d-21bf-62a2292eaab7 4
3 UP mdc snx11000-MDT0000-mdc-ffff88041dfd7c00 fa5ccc52-57a8-e85d-21bf-62a2292eaab7 5
4 UP osc snx11000-OST0002-osc-ffff88041dfd7c00 fa5ccc52-57a8-e85d-21bf-62a2292eaab7 5
5 UP osc snx11000-OST0005-osc-ffff88041dfd7c00 fa5ccc52-57a8-e85d-21bf-62a2292eaab7 5
6 UP osc snx11000-OST0004-osc-ffff88041dfd7c00 fa5ccc52-57a8-e85d-21bf-62a2292eaab7 5
7 UP osc snx11000-OST0006-osc-ffff88041dfd7c00 fa5ccc52-57a8-e85d-21bf-62a2292eaab7 5
8 UP osc snx11000-OST0003-osc-ffff88041dfd7c00 fa5ccc52-57a8-e85d-21bf-62a2292eaab7 5
9 UP osc snx11000-OST0007-osc-ffff88041dfd7c00 fa5ccc52-57a8-e85d-21bf-62a2292eaab7 5
10 UP osc snx11000-OST0000-osc-ffff88041dfd7c00 fa5ccc52-57a8-e85d-21bf-62a2292eaab7 5
11 UP osc snx11000-OST0001-osc-ffff88041dfd7c00 fa5ccc52-57a8-e85d-21bf-62a2292eaab7 5
12 ST mgc MGC10.10.84.202@o2ib 48f1967d-e484-f070-dbd2-e190e1f7d19a 1
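To spot a leftover entry like device 12 in the listing above without reading it by eye, a small parser can help. This is only a sketch: it assumes each device record has six whitespace-separated fields (index, state, type, name, UUID, and a trailing count that I take to be a reference count), and the names `Device`, `parse_lctl_dl`, and `not_up` are hypothetical helpers, not part of Lustre.

```python
from typing import List, NamedTuple


class Device(NamedTuple):
    index: int
    state: str      # e.g. "UP" or "ST", as printed by lctl dl
    dev_type: str   # e.g. "mgc", "osc"
    name: str
    uuid: str
    refcount: int   # assumption: the last column is a reference count


def parse_lctl_dl(text: str) -> List[Device]:
    """Parse device records, skipping anything that is not a 6-field row."""
    devices = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6 or not fields[0].isdigit():
            continue  # prompt line, blank line, or unexpected format
        idx, state, dev_type, name, uuid, refs = fields
        devices.append(Device(int(idx), state, dev_type, name, uuid, int(refs)))
    return devices


def not_up(devices: List[Device]) -> List[Device]:
    """Return devices whose state is anything other than UP."""
    return [d for d in devices if d.state != "UP"]


# Three rows copied from the listing in this report.
SAMPLE = """\
# /sbin/lctl dl
0 UP mgc MGC10.10.84.39@o2ib 10d641b6-8b1c-9e11-5561-2600bf3be157 5
1 UP lov snx11000-clilov-ffff88041dfd7c00 fa5ccc52-57a8-e85d-21bf-62a2292eaab7 4
12 ST mgc MGC10.10.84.202@o2ib 48f1967d-e484-f070-dbd2-e190e1f7d19a 1
"""

if __name__ == "__main__":
    for dev in not_up(parse_lctl_dl(SAMPLE)):
        print(f"device {dev.index} ({dev.name}) is in state {dev.state}")
```

In practice the text would come from running `lctl dl` on the affected client rather than from a hard-coded sample.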
After the initial failed mount attempt, the IB issues were fixed, but subsequent mount attempts still failed: mount.lustre reported "File exists" (EEXIST, the -17 in the messages). Here are the log messages from a client:
console-20131016t19:2013-10-16T19:57:15.046983-05:00 c0-0c2s0n3 LustreError: 3867:0:(genops.c:320:class_newdev()) Device MGC10.10.84.202@o2ib already exists at 12, won't add
console-20131016t19:2013-10-16T19:57:15.047438-05:00 c0-0c2s0n3 LustreError: 3867:0:(obd_config.c:374:class_attach()) Cannot create device MGC10.10.84.202@o2ib of type mgc : -17
console-20131016t19:2013-10-16T19:57:15.047721-05:00 c0-0c2s0n3 LustreError: 3867:0:(obd_mount.c:196:lustre_start_simple()) MGC10.10.84.202@o2ib attach error -17
console-20131016t19:2013-10-16T19:57:15.047727-05:00 c0-0c2s0n3 LustreError: 3867:0:(obd_mount.c:1267:lustre_fill_super()) Unable to mount (-17)
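The negative numbers in these messages are negated errno values, so -5 and -17 can be mapped to their symbolic names with Python's standard `errno` module. The sketch below is illustrative only: the regular expression is an assumption that covers just the three forms seen in this report ("(-17)", ": -17", and "error -17"), and `codes_in` and `describe` are hypothetical helper names.

```python
import errno
import os
import re
from typing import List

# Assumption: return codes appear as "(-N)", ": -N", or "error -N".
# Deliberately does not match the "rc 0/-1" field or timestamp offsets.
RC_PATTERN = re.compile(r"(?:\(|error |: )-(\d+)\b")


def codes_in(line: str) -> List[int]:
    """Return the negative return codes found in one log line."""
    return [-int(number) for number in RC_PATTERN.findall(line)]


def describe(rc: int) -> str:
    """Render a negated errno as 'rc: NAME (message)'."""
    code = abs(rc)
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{rc}: {name} ({os.strerror(code)})"


if __name__ == "__main__":
    line = ("LustreError: 3867:0:(obd_mount.c:1267:lustre_fill_super()) "
            "Unable to mount (-17)")
    for rc in codes_in(line):
        print(describe(rc))
```

The message text returned by `os.strerror` depends on the platform's C library, so only the symbolic name should be relied on when comparing across systems.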
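This error persisted on the client until it was rebooted. It looks like the initial mount failure left behind a stale MGC device entry (device 12, MGC10.10.84.202@o2ib, shown in state ST in the "lctl dl" output), so every later attempt to attach an MGC with the same name fails with -17 and the filesystem cannot be mounted.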
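The suspected failure sequence can be illustrated with a toy model. This is not Lustre code and does not describe the eventual fix: `DeviceTable`, `mount`, and the `cleanup_on_failure` flag are all hypothetical, and the only behavior taken from the logs is that attaching a device under an already-registered name returns -17.

```python
import errno


class DeviceTable:
    """Toy stand-in for a name-keyed device table (not Lustre code)."""

    def __init__(self) -> None:
        self._slots = {}
        self._next_index = 0

    def attach(self, name: str) -> int:
        # Mirrors the logged behavior: a duplicate name is refused.
        if name in self._slots:
            return -errno.EEXIST
        self._slots[name] = self._next_index
        self._next_index += 1
        return 0

    def detach(self, name: str) -> None:
        self._slots.pop(name, None)


def mount(table: DeviceTable, mgc_name: str, network_ok: bool,
          cleanup_on_failure: bool) -> int:
    """Attach the MGC, then pretend to fetch the config over the network."""
    rc = table.attach(mgc_name)
    if rc:
        return rc
    if not network_ok:
        # Hypothetical switch: whether the failed mount removes its MGC entry.
        if cleanup_on_failure:
            table.detach(mgc_name)
        return -errno.EIO
    return 0


if __name__ == "__main__":
    table = DeviceTable()
    name = "MGC10.10.84.202@o2ib"
    first = mount(table, name, network_ok=False, cleanup_on_failure=False)
    second = mount(table, name, network_ok=True, cleanup_on_failure=False)
    print(first, second)
```

With `cleanup_on_failure=False` the first call fails with the I/O error and the second fails with the "already exists" error even though the network is healthy, which matches the sequence reported here. With `cleanup_on_failure=True` the second call succeeds in this model.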
Issue Links
- is related to LU-4943 Client Failes to mount filesystem (Resolved)
I think this bug really is the same as LU-4943. The patch for LU-4943 has been iterated upon and now takes the same approach as Parinay's patch. The patch for LU-4943 (http://review.whamcloud.com/#/c/10129/14) has now landed, and I have tested that the landed patch resolves this issue. This bug should be closed.
Parinay's patch (http://review.whamcloud.com/#/c/10569/) is almost the same as the landed patch, so it can be abandoned.