LU-12506: Client unable to mount filesystem with very large number of MDTs

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.14.0, Lustre 2.12.7
    • Affects Version/s: Lustre 2.10.8, Lustre 2.12.3
    • Labels: None
    • Severity: 3

    Description

      Hello,
      There was a message on the lustre-discuss list about this issue back in May (http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/2019-May/016475.html), and I've managed to reproduce this error. However, I couldn't find an open ticket for it, so I wanted to create one.

      My environment is the following:

      Servers and clients are using the upstream 2.12.2 release and the same base kernel version:

      [root@dac-e-1 ~]# lfs --version
      lfs 2.12.2
      # Server kernel version
      3.10.0-957.10.1.el7_lustre.x86_64
      # Client kernel version (unpatched)
      3.10.0-957.10.1.el7.x86_64
      

      There are 24 servers, each containing 12x NVMe flash devices. For this test I am configuring the block devices on each server identically: 3 of the 12 devices on each server are partitioned into a 200G MDT plus an OST in the remaining space, and the rest are used as OSTs.

      Altogether this makes 72 MDTs and 288 OSTs in the filesystem.

      Below are the syslog messages from the client and servers when attempting to mount the filesystem:

      Client syslog - Nid: 10.47.21.72@o2ib1
      -- Logs begin at Wed 2019-07-03 19:54:04 BST, end at Thu 2019-07-04 13:06:12 BST. --
      Jul 04 12:59:43 cpu-e-1095 kernel: Lustre: DEBUG MARKER: Attempting client mount from 10.47.21.72@o2ib1
      Jul 04 12:59:56 cpu-e-1095 kernel: LustreError: 94792:0:(mdc_request.c:2700:mdc_setup()) fs1-MDT0031-mdc-ffff9f4c85ad8000: failed to setup changelog char device: rc = -16
      Jul 04 12:59:56 cpu-e-1095 kernel: LustreError: 94792:0:(obd_config.c:559:class_setup()) setup fs1-MDT0031-mdc-ffff9f4c85ad8000 failed (-16)
      Jul 04 12:59:56 cpu-e-1095 kernel: LustreError: 94792:0:(obd_config.c:1835:class_config_llog_handler()) MGC10.47.18.1@o2ib1: cfg command failed: rc = -16
      Jul 04 12:59:56 cpu-e-1095 kernel: Lustre:    cmd=cf003 0:fs1-MDT0031-mdc  1:fs1-MDT0031_UUID  2:10.47.18.17@o2ib1  
      Jul 04 12:59:56 cpu-e-1095 kernel: LustreError: 15c-8: MGC10.47.18.1@o2ib1: The configuration from log 'fs1-client' failed (-16). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
      Jul 04 12:59:56 cpu-e-1095 kernel: LustreError: 94774:0:(obd_config.c:610:class_cleanup()) Device 58 not setup
      Jul 04 12:59:56 cpu-e-1095 kernel: Lustre: Unmounted fs1-client
      Jul 04 12:59:56 cpu-e-1095 kernel: LustreError: 94774:0:(obd_mount.c:1608:lustre_fill_super()) Unable to mount  (-16)
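
      The rc = -16 on the client is -EBUSY, returned while mdc_setup() registers the per-MDT changelog character device. Below is a minimal sketch of the registration pattern involved (illustrative only, assuming the pre-fix misc-device approach discussed in the comments; the function name here is made up and is not the exact Lustre source):

      /*
       * Sketch: one misc character device per MDC, registered with a dynamic
       * minor.  On these 3.10-era kernels, drivers/char/misc.c hands dynamic
       * minors out of a fixed 64-entry pool shared by every driver that uses
       * MISC_DYNAMIC_MINOR, so once the pool is exhausted misc_register()
       * returns -EBUSY -- the -16 seen in mdc_setup() above.
       */
      #include <linux/fs.h>
      #include <linux/kernel.h>
      #include <linux/miscdevice.h>
      #include <linux/module.h>
      #include <linux/slab.h>

      static const struct file_operations chlg_fops = {
              .owner = THIS_MODULE,
      };

      static int chlg_char_dev_register(const char *obd_name)
      {
              struct miscdevice *mdev;
              int rc;

              mdev = kzalloc(sizeof(*mdev), GFP_KERNEL);
              if (!mdev)
                      return -ENOMEM;

              mdev->minor = MISC_DYNAMIC_MINOR;       /* ask for a dynamic minor */
              mdev->fops  = &chlg_fops;
              mdev->name  = kasprintf(GFP_KERNEL, "changelog-%s", obd_name);
              if (!mdev->name) {
                      kfree(mdev);
                      return -ENOMEM;
              }

              /* Fails with -EBUSY once all 64 dynamic misc minors are in use. */
              rc = misc_register(mdev);
              if (rc) {
                      kfree(mdev->name);
                      kfree(mdev);
              }
              return rc;
      }

      Because the dynamic-minor pool is shared node-wide, other drivers using MISC_DYNAMIC_MINOR also consume entries, which would explain the mount failing somewhat below 64 MDTs in some setups.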
      
      Servers syslog
      [root@xcat1 ~]# xdsh csd3-buff 'journalctl -a --since "12:59" _TRANSPORT=kernel' | xdshbak -c                                                                                                  
      HOSTS -------------------------------------------------------------------------
      dac-e-1
      -------------------------------------------------------------------------------
      -- Logs begin at Thu 2019-03-21 15:42:02 GMT, end at Thu 2019-07-04 13:04:24 BST. --
      Jul 04 12:59:43 dac-e-1 kernel: Lustre: DEBUG MARKER: Attempting client mount from 10.47.21.72@o2ib1
      Jul 04 12:59:55 dac-e-1 kernel: Lustre: MGS: Connection restored to 08925711-bdfa-621f-89ec-0364645c915c (at 10.47.21.72@o2ib1)
      Jul 04 12:59:55 dac-e-1 kernel: Lustre: Skipped 2036 previous similar messages
      
      HOSTS -------------------------------------------------------------------------
      dac-e-10, dac-e-11, dac-e-12, dac-e-13, dac-e-14, dac-e-15, dac-e-16, dac-e-17, dac-e-18, dac-e-19, dac-e-2, dac-e-20, dac-e-21, dac-e-22, dac-e-23, dac-e-24, dac-e-3, dac-e-4, dac-e-5, dac-e-6, dac-e-7, dac-e-8, dac-e-9
      -------------------------------------------------------------------------------
      -- No entries --
      

      Attached are Lustre debug logs from both the client and the dac-e-1 server, which hosts the MGT.

      I can provide debug logs from all 24 servers if that would help, just let me know.

      I've successfully used the same configuration with 2x MDTs per server, so 48 MDTs in total, without problem, but I haven't confirmed what Scott mentioned on the mailing list about the failure starting at 56 MDTs.

      Thanks,
      Matt


          Activity

            spitzcor Cory Spitz added a comment -

            > The patch is required on clients but not on servers ?
            Yes, https://review.whamcloud.com/#/c/37759/ only affects mdc.


            alex.ku Alex Kulyavtsev added a comment -

            Do you have this patch backported to b2_12?

            Can it be backported to the upcoming 2.12.6 release?

            The patch is required on clients but not on servers?

            I likely hit this issue when trying to mount two large Lustre filesystems with 40 MDTs each on the same client. The combined MDT count is 2*40 = 80 > 64. I can mount these filesystems one at a time, but not both at the same time.

            pjones Peter Jones added a comment -

            OK Matt, fair enough. Let's engage again if/when you are ready to start raising the bar again.


            mrb Matt Rásó-Barnett (Inactive) added a comment -

            Hi Peter, I'm afraid I haven't tested it, no, and I'm unlikely to be able to do so for some time, as I'm no longer actively working with this system.

            It might be something I get to look at again in Q3/Q4 this year, as we will be installing more all-flash nodes to double the size of our current all-flash Lustre filesystem, so I imagine we will be doing some intensive benchmarking work once the system integration is done. Indeed, with the number of servers we'll have at that point, we will be getting close to needing this fix if we want to have an MDT on every server in the filesystem.

            Cheers,
            Matt

            pjones Peter Jones added a comment -

            mrb have you ever re-tried this test with the fix in place?


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37917/
            Subject: LU-12506 mdc: clean up code style for mdc_locks.c
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 0716f5a9d98a4fa299b2cfc7cfee236313e3dbcc


            gerrit Gerrit Updater added a comment -

            Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38058
            Subject: LU-12506 tests: clean up MDT name generation
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: e5d323b7a9c1aa5969b90ef4fc3ec302a23d46e9


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37759/
            Subject: LU-12506 changelog: support large number of MDT
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: d0423abc1adc717b08de61be3556688cccd52ddf


            gerrit Gerrit Updater added a comment -

            Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/37917
            Subject: LU-12506 mdc: clean up code style for mdc_locks.c
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: d08b729acb70fba933da40e7699b621e2643355f


            hongchao.zhang Hongchao Zhang added a comment -

            Hi John,
            Thanks! Replacing the miscdevice with dynamically allocated devices is a better solution; I have updated the patch accordingly.

            jhammond John Hammond added a comment - edited

            This could/should be solved by using dynamic devices instead of misc devices. See https://review.whamcloud.com/#/c/37552/4/lustre/ofd/ofd_access_log.c@406 for an approach which should work here as well.
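
            For reference, here is a minimal sketch of the "dynamic devices" approach described above (illustrative only, loosely following the ofd_access_log.c change linked, not the exact LU-12506 patch; CHLG_MAX_DEVICES and the function names are made up): reserve a char-device region once at module init, then create per-MDT devices inside it, so the number of changelog devices is no longer bounded by the 64-entry misc dynamic-minor pool.

            #include <linux/cdev.h>
            #include <linux/device.h>
            #include <linux/err.h>
            #include <linux/fs.h>
            #include <linux/module.h>

            #define CHLG_MAX_DEVICES 256      /* hypothetical cap, well above 64 */

            static dev_t chlg_devt;           /* base dev_t for the reserved region */
            static struct class *chlg_class;

            static int chlg_dev_init(void)
            {
                    int rc;

                    /* One dynamically allocated major, CHLG_MAX_DEVICES minors. */
                    rc = alloc_chrdev_region(&chlg_devt, 0, CHLG_MAX_DEVICES,
                                             "changelog");
                    if (rc)
                            return rc;

                    chlg_class = class_create(THIS_MODULE, "changelog");
                    if (IS_ERR(chlg_class)) {
                            unregister_chrdev_region(chlg_devt, CHLG_MAX_DEVICES);
                            return PTR_ERR(chlg_class);
                    }
                    return 0;
            }

            /* Per-MDC: take a free index, add a cdev and create the /dev node. */
            static int chlg_add_one(struct cdev *cdev,
                                    const struct file_operations *fops,
                                    unsigned int index, const char *obd_name)
            {
                    dev_t devt = MKDEV(MAJOR(chlg_devt),
                                       MINOR(chlg_devt) + index);
                    struct device *dev;
                    int rc;

                    cdev_init(cdev, fops);
                    cdev->owner = THIS_MODULE;
                    rc = cdev_add(cdev, devt, 1);
                    if (rc)
                            return rc;

                    dev = device_create(chlg_class, NULL, devt, NULL,
                                        "changelog-%s", obd_name);
                    if (IS_ERR(dev)) {
                            cdev_del(cdev);
                            return PTR_ERR(dev);
                    }
                    return 0;
            }

            With this layout the minors come out of a region reserved specifically for the changelog devices, so the practical limit is whatever cap the driver chooses rather than the shared misc pool.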


            People

              Assignee: hongchao.zhang Hongchao Zhang
              Reporter: mrb Matt Rásó-Barnett (Inactive)
              Votes: 0
              Watchers: 14
