Noisy error messages on client while creating DNE filesystem

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.8.0
    • Lustre 2.7.0, Lustre 2.8.0
    • None
    • 3

    Description

      Seen on Lustre 2.7.0.

      While creating a 128-MDT filesystem, I noticed that clients sometimes take a long time to connect to new MDTs after they have been added. I saw many messages like these on a client's console:

      Jun 12 00:08:13 client00 kernel: LustreError: 1275:0:(fld_request.c:170:fld_client_add_target()) Skipped 12 previous similar messages
      Jun 12 00:08:13 client00 kernel: Lustre: 1275:0:(lmv_obd.c:300:lmv_init_ea_size()) scratch-clilmv-ffff880772a0ec00: NULL export for 11
      Jun 12 00:08:13 client00 kernel: Lustre: 1275:0:(lmv_obd.c:300:lmv_init_ea_size()) Skipped 462 previous similar messages
      Jun 12 00:08:19 client00 kernel: LustreError: 1277:0:(fld_request.c:170:fld_client_add_target()) cli-scratch-clilmv-ffff880772a0ec00: Attempt to add target scratch-MDT0025-mdc-ffff880772a0ec00 (idx 37) on fly - skip it
      Jun 12 00:08:19 client00 kernel: LustreError: 1277:0:(fld_request.c:170:fld_client_add_target()) Skipped 13 previous similar messages
      Jun 12 00:08:19 client00 kernel: Lustre: 1277:0:(lmv_obd.c:300:lmv_init_ea_size()) scratch-clilmv-ffff880772a0ec00: NULL export for 12
      Jun 12 00:08:19 client00 kernel: Lustre: 1277:0:(lmv_obd.c:300:lmv_init_ea_size()) Skipped 258 previous similar messages
      Jun 12 00:08:25 client00 kernel: Lustre: 1278:0:(lmv_obd.c:300:lmv_init_ea_size()) scratch-clilmv-ffff880772a0ec00: NULL export for 13
      Jun 12 00:08:25 client00 kernel: Lustre: 1278:0:(lmv_obd.c:300:lmv_init_ea_size()) Skipped 56 previous similar messages
      Jun 12 00:08:25 client00 kernel: LustreError: 1278:0:(fld_request.c:170:fld_client_add_target()) cli-scratch-clilmv-ffff880772a0ec00: Attempt to add target scratch-MDT0027-mdc-ffff880772a0ec00 (idx 39) on fly - skip it
      Jun 12 00:08:25 client00 kernel: LustreError: 1278:0:(fld_request.c:170:fld_client_add_target()) Skipped 8 previous similar messages
      

      Eventually the client did connect to all the MDTs, but it took about 20 minutes.
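
      Because the console rate-limits repeated messages, the visible lines understate how often each message fired. A minimal sketch in C for tallying the "Skipped N previous similar messages" summaries per reporting function; the helper names `skipped_count` and `total_skipped` are hypothetical and are not part of Lustre or any of its tools:

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Return N from "Skipped N previous similar messages", or -1 if the
 * line is not a rate-limit summary line. */
long skipped_count(const char *line)
{
    const char *p = strstr(line, "Skipped ");
    long n;

    if (p == NULL)
        return -1;
    if (sscanf(p, "Skipped %ld", &n) != 1)
        return -1;
    if (strstr(p, "previous similar messages") == NULL)
        return -1;
    return n;
}

/* Sum the suppressed-message counts over all lines that mention `func`
 * (for example "lmv_init_ea_size"). Lines without a summary are ignored. */
long total_skipped(const char *const *lines, size_t nlines, const char *func)
{
    long total = 0;

    for (size_t i = 0; i < nlines; i++) {
        long n;

        if (strstr(lines[i], func) == NULL)
            continue;
        n = skipped_count(lines[i]);
        if (n > 0)
            total += n;
    }
    return total;
}
```

      Applied to the excerpt above, the summaries add up to 776 suppressed lmv_init_ea_size() messages and 33 suppressed fld_client_add_target() messages within roughly 12 seconds.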

          Activity

            [LU-6713] Noisy error messages on client while creating DNE filesystem
            pjones Peter Jones added a comment -

            Landed for 2.8

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/15269/
            Subject: LU-6713 lmv: lock necessary part of lmv_add_target
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 1670c57315340db997c9058950148a05634f43f1

            di.wang Di Wang added a comment -

            This slowness might be because lmv->lmv_init_mutex covers too much of lmv_add_target(). I have shrunk the area of lmv_add_target() that it protects; see patch http://review.whamcloud.com/15269.
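
            The general idea in this comment, holding a mutex only around the shared-state update rather than across a slow operation, can be sketched in userspace C with pthreads. This is not the Lustre patch: `struct target_table`, `add_target_narrow`, `table_init`, and `stub_connect` are hypothetical names, and the pthread mutex only stands in for the kernel mutex lmv->lmv_init_mutex.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define MAX_TARGETS 256

enum slot_state { SLOT_EMPTY = 0, SLOT_CONNECTING, SLOT_READY };

struct target_table {
    pthread_mutex_t init_mutex;          /* stands in for lmv_init_mutex */
    enum slot_state state[MAX_TARGETS];  /* shared, guarded by init_mutex */
    size_t          count;               /* number of SLOT_READY entries */
};

/* Hypothetical slow step (e.g. a network connect); touches no shared state. */
typedef int (*connect_fn)(unsigned int idx);

/* Placeholder connect routine that always succeeds. */
int stub_connect(unsigned int idx)
{
    (void)idx;
    return 0;
}

void table_init(struct target_table *t)
{
    memset(t, 0, sizeof(*t));
    pthread_mutex_init(&t->init_mutex, NULL);
}

/* Add a target while holding the mutex only for the table updates.
 * Returns 0 on success, -1 if idx is out of range, already taken,
 * or the connect step fails. */
int add_target_narrow(struct target_table *t, unsigned int idx,
                      connect_fn do_connect)
{
    int rc;

    if (idx >= MAX_TARGETS)
        return -1;

    /* Short critical section: reserve the slot. */
    pthread_mutex_lock(&t->init_mutex);
    if (t->state[idx] != SLOT_EMPTY) {
        pthread_mutex_unlock(&t->init_mutex);
        return -1;
    }
    t->state[idx] = SLOT_CONNECTING;
    pthread_mutex_unlock(&t->init_mutex);

    /* Slow step runs without the mutex, so other additions can proceed. */
    rc = do_connect(idx) == 0 ? 0 : -1;

    /* Short critical section: publish the result or release the slot. */
    pthread_mutex_lock(&t->init_mutex);
    if (rc == 0) {
        t->state[idx] = SLOT_READY;
        t->count++;
    } else {
        t->state[idx] = SLOT_EMPTY;
    }
    pthread_mutex_unlock(&t->init_mutex);

    return rc;
}
```

            The intermediate SLOT_CONNECTING state is a design choice of this sketch: it keeps two callers from adding the same index at once while the mutex is released. Whether the real lmv_add_target() needs an equivalent guard depends on details not shown in this ticket.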

            gerrit Gerrit Updater added a comment -

            wangdi (di.wang@intel.com) uploaded a new patch: http://review.whamcloud.com/15269
            Subject: LU-6713 lmv: lock necessary part of lmv_add_target
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: e04b00e82d25202950380d7cc2b31db9aff7d27a

            adilger Andreas Dilger added a comment -

            We saw what is likely a related problem during performance testing. If "lfs mkdir -c" is used to create striped directories right after mount (before the MDSes have all connected to each other), then the striped directories will have too few stripes.

            It would be interesting to get a debug log from the client and one of the MDSes, if possible, to see where it is spending so much time. Even with the MDSes creating 128*128=16384 connections between themselves, that shouldn't take more than a few seconds of RPCs.

            People

              di.wang Di Wang
              rread Robert Read