[LU-6713] Noisy error messages on client while creating DNE filesystem Created: 12/Jun/15 Updated: 01/Jul/16 Resolved: 28/Jul/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.7.0, Lustre 2.8.0 |
| Fix Version/s: | Lustre 2.8.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Robert Read (Inactive) | Assignee: | Di Wang |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Seen on Lustre 2.7.0. While creating a 128-MDT filesystem, I noticed that clients sometimes take a long time to connect to new MDTs after they've been added. I saw a lot of these messages on a client's console:

Jun 12 00:08:13 client00 kernel: LustreError: 1275:0:(fld_request.c:170:fld_client_add_target()) Skipped 12 previous similar messages
Jun 12 00:08:13 client00 kernel: Lustre: 1275:0:(lmv_obd.c:300:lmv_init_ea_size()) scratch-clilmv-ffff880772a0ec00: NULL export for 11
Jun 12 00:08:13 client00 kernel: Lustre: 1275:0:(lmv_obd.c:300:lmv_init_ea_size()) Skipped 462 previous similar messages
Jun 12 00:08:19 client00 kernel: LustreError: 1277:0:(fld_request.c:170:fld_client_add_target()) cli-scratch-clilmv-ffff880772a0ec00: Attempt to add target scratch-MDT0025-mdc-ffff880772a0ec00 (idx 37) on fly - skip it
Jun 12 00:08:19 client00 kernel: LustreError: 1277:0:(fld_request.c:170:fld_client_add_target()) Skipped 13 previous similar messages
Jun 12 00:08:19 client00 kernel: Lustre: 1277:0:(lmv_obd.c:300:lmv_init_ea_size()) scratch-clilmv-ffff880772a0ec00: NULL export for 12
Jun 12 00:08:19 client00 kernel: Lustre: 1277:0:(lmv_obd.c:300:lmv_init_ea_size()) Skipped 258 previous similar messages
Jun 12 00:08:25 client00 kernel: Lustre: 1278:0:(lmv_obd.c:300:lmv_init_ea_size()) scratch-clilmv-ffff880772a0ec00: NULL export for 13
Jun 12 00:08:25 client00 kernel: Lustre: 1278:0:(lmv_obd.c:300:lmv_init_ea_size()) Skipped 56 previous similar messages
Jun 12 00:08:25 client00 kernel: LustreError: 1278:0:(fld_request.c:170:fld_client_add_target()) cli-scratch-clilmv-ffff880772a0ec00: Attempt to add target scratch-MDT0027-mdc-ffff880772a0ec00 (idx 39) on fly - skip it
Jun 12 00:08:25 client00 kernel: LustreError: 1278:0:(fld_request.c:170:fld_client_add_target()) Skipped 8 previous similar messages

Eventually the client did connect to all the MDTs, but it took about 20 minutes. |
| Comments |
| Comment by Andreas Dilger [ 12/Jun/15 ] |
|
We saw what is likely a related problem during performance testing. If "lfs mkdir -c" is used to create striped directories right after mount (before the MDSes have all connected to each other), the striped directories end up with too few stripes. It would be interesting to get a debug log from the client and one of the MDSes, if possible, to see where it is spending so much time. Even with the MDSes creating 128*128=16384 connections between themselves, that shouldn't take more than a few seconds of RPCs. |
| Comment by Gerrit Updater [ 13/Jun/15 ] |
|
wangdi (di.wang@intel.com) uploaded a new patch: http://review.whamcloud.com/15269 |
| Comment by Di Wang [ 13/Jun/15 ] |
|
This slowness might be because lmv->lmv_init_mutex covers too much of lmv_add_target(). I have shrunk the region it protects in lmv_add_target(); see the patch above. |
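The general idea of shrinking a lock's protected region can be sketched as below. This is only an illustration in Python, not the code from the patch: the `TargetTable` class, its method names, and the assumption that the slow step is a per-target connect are all hypothetical and do not come from lmv_obd.c.

```python
import threading
import time


class TargetTable:
    """Hypothetical stand-in for a client's table of metadata targets."""

    def __init__(self):
        self._lock = threading.Lock()
        self.targets = {}

    def _slow_connect(self, idx):
        # Placeholder for slow per-target work (assumed, for illustration).
        time.sleep(0.01)
        return f"export-{idx}"

    def add_target_wide(self, idx):
        # Wide critical section: the slow step runs while the lock is held,
        # so concurrent callers are serialized behind it.
        with self._lock:
            export = self._slow_connect(idx)
            self.targets[idx] = export

    def add_target_narrow(self, idx):
        # Narrow critical section: do the slow step without the lock and
        # hold the lock only while updating the shared table.
        export = self._slow_connect(idx)
        with self._lock:
            self.targets[idx] = export


def add_all(table, method_name, count):
    """Add `count` targets concurrently using the named method."""
    method = getattr(table, method_name)
    threads = [threading.Thread(target=method, args=(i,)) for i in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return table.targets
```

Both variants leave the table in the same final state; the narrow one simply lets the slow steps overlap instead of queueing behind the lock. Whether this matches what the real patch moves out from under lmv_init_mutex is not stated in this ticket.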
| Comment by Gerrit Updater [ 27/Jul/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/15269/ |
| Comment by Peter Jones [ 28/Jul/15 ] |
|
Landed for 2.8 |