  Lustre / LU-12506

Client unable to mount filesystem with very large number of MDTs

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.14.0, Lustre 2.12.7
    • Affects Version/s: Lustre 2.10.8, Lustre 2.12.3
    • Labels: None
    • Severity: 3

    Description

      Hello,
      There was a message on the lustre-discuss list about this issue back in May (http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/2019-May/016475.html), and I've managed to reproduce the error. I couldn't find an open ticket for it, however, so I wanted to create one.

      My environment is the following:

      Servers and clients are using the upstream 2.12.2 release and the same kernel version:

      [root@dac-e-1 ~]# lfs --version
      lfs 2.12.2
      # Server kernel version
      3.10.0-957.10.1.el7_lustre.x86_64
      # Client kernel version (unpatched)
      3.10.0-957.10.1.el7.x86_64
      

      There are 24 servers, each containing 12x NVMe flash devices. For this test I am configuring the block devices on each server identically, with 3 devices on each server partitioned into a 200G MDT plus the remaining space as an OST.

      Altogether this makes 72 MDTs and 288 OSTs in the filesystem.
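
      For illustration only (these commands are not from the original report): a hedged sketch of the per-device layout described above. Device names and indices are placeholders, the MGS NID is taken from the logs below, and the exact mkfs.lustre options used were not given.

      # Hypothetical layout for one of the 3 split devices on a server:
      sgdisk -n 1:0:+200G -n 2:0:0 /dev/nvme0n1   # 200G MDT partition, rest as OST
      mkfs.lustre --mdt --fsname=fs1 --index=0 \
          --mgsnode=10.47.18.1@o2ib1 /dev/nvme0n1p1
      mkfs.lustre --ost --fsname=fs1 --index=0 \
          --mgsnode=10.47.18.1@o2ib1 /dev/nvme0n1p2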

      Below are the syslog messages from the client and servers when attempting to mount the filesystem:

      Client syslog - Nid: 10.47.21.72@o2ib1
      -- Logs begin at Wed 2019-07-03 19:54:04 BST, end at Thu 2019-07-04 13:06:12 BST. --
      Jul 04 12:59:43 cpu-e-1095 kernel: Lustre: DEBUG MARKER: Attempting client mount from 10.47.21.72@o2ib1
      Jul 04 12:59:56 cpu-e-1095 kernel: LustreError: 94792:0:(mdc_request.c:2700:mdc_setup()) fs1-MDT0031-mdc-ffff9f4c85ad8000: failed to setup changelog char device: rc = -16
      Jul 04 12:59:56 cpu-e-1095 kernel: LustreError: 94792:0:(obd_config.c:559:class_setup()) setup fs1-MDT0031-mdc-ffff9f4c85ad8000 failed (-16)
      Jul 04 12:59:56 cpu-e-1095 kernel: LustreError: 94792:0:(obd_config.c:1835:class_config_llog_handler()) MGC10.47.18.1@o2ib1: cfg command failed: rc = -16
      Jul 04 12:59:56 cpu-e-1095 kernel: Lustre:    cmd=cf003 0:fs1-MDT0031-mdc  1:fs1-MDT0031_UUID  2:10.47.18.17@o2ib1  
      Jul 04 12:59:56 cpu-e-1095 kernel: LustreError: 15c-8: MGC10.47.18.1@o2ib1: The configuration from log 'fs1-client' failed (-16). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
      Jul 04 12:59:56 cpu-e-1095 kernel: LustreError: 94774:0:(obd_config.c:610:class_cleanup()) Device 58 not setup
      Jul 04 12:59:56 cpu-e-1095 kernel: Lustre: Unmounted fs1-client
      Jul 04 12:59:56 cpu-e-1095 kernel: LustreError: 94774:0:(obd_mount.c:1608:lustre_fill_super()) Unable to mount  (-16)
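
      For context: rc = -16 is -EBUSY. In 2.12.2 each MDC registers a per-MDT changelog character device through the kernel's misc-device interface, and these 3.10 kernels provide only 64 dynamic misc minors (DYNAMIC_MINORS in drivers/char/misc.c). The failure at fs1-MDT0031 (hex index 0x31 = 49, i.e. the 50th MDT) is consistent with the remaining minors already being held by other drivers. A quick check on the client, assuming the devices appear in /proc/misc as changelog-<fsname>-MDTxxxx:

      # Misc minors held by non-changelog drivers (/proc/misc lists "minor name"):
      grep -v changelog /proc/misc | wc -l
      # Per-MDT changelog devices registered before the mount failed:
      grep changelog /proc/misc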
      
      Servers syslog
      [root@xcat1 ~]# xdsh csd3-buff 'journalctl -a --since "12:59" _TRANSPORT=kernel' | xdshbak -c                                                                                                  
      HOSTS -------------------------------------------------------------------------
      dac-e-1
      -------------------------------------------------------------------------------
      -- Logs begin at Thu 2019-03-21 15:42:02 GMT, end at Thu 2019-07-04 13:04:24 BST. --
      Jul 04 12:59:43 dac-e-1 kernel: Lustre: DEBUG MARKER: Attempting client mount from 10.47.21.72@o2ib1
      Jul 04 12:59:55 dac-e-1 kernel: Lustre: MGS: Connection restored to 08925711-bdfa-621f-89ec-0364645c915c (at 10.47.21.72@o2ib1)
      Jul 04 12:59:55 dac-e-1 kernel: Lustre: Skipped 2036 previous similar messages
      
      HOSTS -------------------------------------------------------------------------
      dac-e-10, dac-e-11, dac-e-12, dac-e-13, dac-e-14, dac-e-15, dac-e-16, dac-e-17, dac-e-18, dac-e-19, dac-e-2, dac-e-20, dac-e-21, dac-e-22, dac-e-23, dac-e-24, dac-e-3, dac-e-4, dac-e-5, dac-e-6, dac-e-7, dac-e-8, dac-e-9
      -------------------------------------------------------------------------------
      -- No entries --
      

      Attached are Lustre debug logs from both the client and the dac-e-1 server, which contains the MGT.

      I can provide debug logs from all 24 servers if that would help, just let me know.

      I've successfully used the same configuration with 2x MDTs per server, 48 MDTs in total, without problems, but I haven't confirmed what Scott mentioned on the mailing list about the failure starting at 56 MDTs (a rough check of that figure follows below).

      Thanks,
      Matt
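
      A hedged back-of-the-envelope check of the 56-MDT figure, assuming the 64 dynamic misc-minor limit is the cause: the first failure should come at roughly 64 minus the number of minors other drivers already hold on the client, so a node with about 8 other misc devices would start failing around MDT 56.

      # Rough estimate of where changelog device setup should start to fail,
      # assuming DYNAMIC_MINORS = 64 on these kernels:
      used=$(grep -cv changelog /proc/misc)
      echo "expect -EBUSY around changelog device number $((64 - used))"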

      Attachments

        Issue Links

          Activity

            [LU-12506] Client unable to mount filesystem with very large number of MDTs

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/42087/
            Subject: LU-12506 changelog: support large number of MDT
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set:
            Commit: 0596a16841406b93ec1e348fcc9eecce62d9fe8b

            gerrit Gerrit Updater added a comment -

            Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/42087
            Subject: LU-12506 changelog: support large number of MDT
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: b9380fe5ed814d91dac2d1d03ad817ffb0869766
            gerrit Gerrit Updater added a comment - edited

            Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/41485
            Subject: LU-12506 tests: handle more MDTs in sanity.sh
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 28fa92e0552f0f9135256fa4611c68e5c6396773

            Pushed to LU-14058 instead.


            alex.ku Alex Kulyavtsev added a comment -

            Peter,

            is it possible to backport this patch to 2.12 and include it in the 2.12.6 release? This would simplify upgrades on nodes with the upstream client installed; otherwise I will have to fork off.

            I have tested this patch on RHEL with 88 MDTs, with the code built from the HPE source tree.
            pjones Peter Jones added a comment -

            The fix itself has landed for 2.14. The creation of a test is being tracked under LU-14058.
            spitzcor Cory Spitz added a comment -

            > The patch is required on clients but not on servers?
            Yes, https://review.whamcloud.com/#/c/37759/ only affects mdc.


            alex.ku Alex Kulyavtsev added a comment -

            Do you have this patch backported to b2_12?

            Can it be backported to the upcoming 2.12.6 release?

            The patch is required on clients but not on servers?

            I likely hit this issue when trying to mount two large Lustre filesystems with 40 MDTs each on the same client: the MDT count is 2*40 = 80 > 64. I can mount these filesystems one at a time, but not both at the same time.
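
            For context: this matches the pre-fix behaviour, since the changelog devices of every mounted filesystem draw on the same per-client pool of 64 dynamic misc minors, so the limit applies to the client as a whole rather than per filesystem. A minimal check, assuming the devices are visible in /proc/misc:

            # Changelog devices from all mounted filesystems count against one pool:
            grep -c changelog /proc/misc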

            pjones Peter Jones added a comment -

            OK Matt, fair enough. Let's engage again if/when you are ready to start raising the bar again.


            mrb Matt Rásó-Barnett (Inactive) added a comment -

            Hi Peter, I'm afraid I haven't tested it, no, and I'm unlikely to be able to do so for some time, as I'm no longer actively working with this system.

            It might be something I get to look at again in Q3/Q4 this year, as we will be installing more all-flash nodes to double the size of our current all-flash Lustre filesystem, so I imagine we will be doing some intensive benchmarking work once the system integration is done. Indeed, with the number of servers we'll have at that point, we will be getting close to needing this fix if we wanted an MDT on every server in the filesystem.

            Cheers,
            Matt
            pjones Peter Jones added a comment -

            mrb have you ever re-tried this test with the fix in place?


            People

              Assignee: hongchao.zhang Hongchao Zhang
              Reporter: mrb Matt Rásó-Barnett (Inactive)
              Votes: 0
              Watchers: 14

              Dates

                Created:
                Updated:
                Resolved: