Lustre / LU-10040

nodemap and quota issues (ineffective GID mapping)

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.11.0, Lustre 2.10.2
    • Affects Version/s: Lustre 2.10.0, Lustre 2.10.1
    • Severity: 2

    Description

      We're using the nodemap feature with map_mode=gid_only in production, and we are seeing more and more issues with GID mapping: affected GIDs appear to fall back to squash_gid instead of being mapped as configured. The nodemap configuration hasn't changed for these groups; we just add new groups from time to time.

      Example: configuration of the 'sherlock' nodemap on the MGS:

      [root@oak-md1-s1 sherlock]# pwd
      /proc/fs/lustre/nodemap/sherlock
      
      [root@oak-md1-s1 sherlock]# cat ranges 
      [
       { id: 6, start_nid: 0.0.0.0@o2ib4, end_nid: 255.255.255.255@o2ib4 },
       { id: 5, start_nid: 0.0.0.0@o2ib3, end_nid: 255.255.255.255@o2ib3 }
      ]
      
      [root@oak-md1-s1 sherlock]# cat idmap 
      [
       { idtype: gid, client_id: 3525, fs_id: 3741 } { idtype: gid, client_id: 6401, fs_id: 3752 } { idtype: gid, client_id: 99001, fs_id: 3159 } { idtype: gid, client_id: 10525, fs_id: 3351 } { idtype: gid, client_id: 11886, fs_id: 3593 } { idtype: gid, client_id: 12193, fs_id: 3636 } { idtype: gid, client_id: 13103, fs_id: 3208 } { idtype: gid, client_id: 17079, fs_id: 3700 } { idtype: gid, client_id: 19437, fs_id: 3618 } { idtype: gid, client_id: 22959, fs_id: 3745 } { idtype: gid, client_id: 24369, fs_id: 3526 } { idtype: gid, client_id: 26426, fs_id: 3352 } { idtype: gid, client_id: 29361, fs_id: 3746 } { idtype: gid, client_id: 29433, fs_id: 3479 } { idtype: gid, client_id: 30289, fs_id: 3262 } { idtype: gid, client_id: 32264, fs_id: 3199 } { idtype: gid, client_id: 32774, fs_id: 3623 } { idtype: gid, client_id: 38517, fs_id: 3702 } { idtype: gid, client_id: 40387, fs_id: 3708 } { idtype: gid, client_id: 47235, fs_id: 3674 } { idtype: gid, client_id: 48931, fs_id: 3325 } { idtype: gid, client_id: 50590, fs_id: 3360 } { idtype: gid, client_id: 52892, fs_id: 3377 } { idtype: gid, client_id: 56316, fs_id: 3353 } { idtype: gid, client_id: 56628, fs_id: 3411 } { idtype: gid, client_id: 59943, fs_id: 3372 } { idtype: gid, client_id: 63938, fs_id: 3756 } { idtype: gid, client_id: 100533, fs_id: 3281 } { idtype: gid, client_id: 244300, fs_id: 3617 } { idtype: gid, client_id: 254778, fs_id: 3362 } { idtype: gid, client_id: 267829, fs_id: 3748 } { idtype: gid, client_id: 270331, fs_id: 3690 } { idtype: gid, client_id: 305454, fs_id: 3371 } { idtype: gid, client_id: 308753, fs_id: 3367 }
      
      [root@oak-md1-s1 sherlock]# cat squash_gid 
      99
      [root@oak-md1-s1 sherlock]# cat map_mode 
      gid_only
      
      [root@oak-md1-s1 sherlock]# cat admin_nodemap 
      0
      [root@oak-md1-s1 sherlock]# cat deny_unknown 
      1
      [root@oak-md1-s1 sherlock]# cat trusted_nodemap 
      0
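      
      For reference, a nodemap along these lines would typically be created on the MGS with lctl commands such as the ones below. This is only an illustrative sketch: the NID range expressions are placeholders and the exact flags should be checked against the manual; these are not the exact commands we used.
      
      lctl nodemap_add sherlock
      # one range per client network (placeholder range syntax)
      lctl nodemap_add_range --name sherlock --range '10.9.[0-255].[0-255]@o2ib4'
      lctl nodemap_add_range --name sherlock --range '10.210.[0-255].[0-255]@o2ib3'
      # map client GIDs to canonical filesystem GIDs, one pair at a time
      lctl nodemap_add_idmap --name sherlock --idtype gid --idmap 11886:3593
      lctl nodemap_add_idmap --name sherlock --idtype gid --idmap 32264:3199
      # map GIDs only, squash unknown GIDs to 99, deny unknown access
      lctl nodemap_modify --name sherlock --property map_mode --value gid_only
      lctl nodemap_modify --name sherlock --property squash_gid --value 99
      lctl nodemap_modify --name sherlock --property deny_unknown --value 1
      lctl nodemap_modify --name sherlock --property admin --value 0
      lctl nodemap_modify --name sherlock --property trusted --value 0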
      
      
      

      Issue with group: GID 3593 (mapped to GID 11886 on sherlock)

      lfs quota, not mapped (using canonical GID 3593):

      [root@oak-rbh01 ~]# lfs quota -g oak_euan /oak
      Disk quotas for group oak_euan (gid 3593):
           Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
                 /oak 33255114444  50000000000 50000000000       -  526016  7500000 7500000       -
      
      

      Broken lfs quota mapped on sherlock (o2ib4):

      [root@sh-113-01 ~]# lfs quota -g euan /oak
      Disk quotas for grp euan (gid 11886):
           Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
                 /oak 2875412844*      1       1       -      26*      1       1       -
      [root@sh-113-01 ~]# lctl list_nids
      10.9.113.1@o2ib4
      
      

      It matches the quota usage for squash_gid:

      [root@oak-rbh01 ~]# lfs quota -g 99 /oak
      Disk quotas for group 99 (gid 99):
           Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
                 /oak 2875412844*      1       1       -      26*      1       1       -
      
      

       

      Please note that GID mapping works fine for most of the groups, though:

      3199 -> 32264 (sherlock)
      
      canonical:
      [root@oak-rbh01 ~]# lfs quota -g oak_ruthm /oak
      Disk quotas for group oak_ruthm (gid 3199):
           Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
                 /oak 10460005688  20000000000 20000000000       - 1683058  3000000 3000000       -
      
      mapped (sherlock):
      [root@sh-113-01 ~]# lfs quota -g ruthm /oak
      Disk quotas for grp ruthm (gid 32264):
           Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
                 /oak 10460005688  20000000000 20000000000       - 1683058  3000000 3000000       -
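      
      As a side note, the server-side view of a broken vs. working mapping can be compared independently of quota. This is a hedged sketch: it assumes the nodemap parameters are readable on the MDS and that the nodemap_test_id interface (as used by the sanity-sec test suite) is available in this release.
      
      # on the MDS, check that the idmap entries actually propagated from the MGS;
      # a missing client_id here would explain the fallback to squash_gid
      lctl get_param nodemap.sherlock.idmap | grep -E 'client_id: (11886|32264)'
      # if available, ask the server how a given NID+GID pair is mapped
      lctl nodemap_test_id --nid 10.9.113.1@o2ib4 --idtype gid --id 11886
      lctl nodemap_test_id --nid 10.9.113.1@o2ib4 --idtype gid --id 32264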
      
      
      

      Failing over the MDT resolved the problem for a few groups, but not all. Failing the MDT back showed the issue on the exact same groups that were originally affected (currently 4-5).

      While I haven't seen it myself yet, the issue seems to affect users, as a few of them have reported erroneous EDQUOT errors. This is why it is quite urgent to figure out what's wrong. Please note that the issue was already present before we applied the patch from LU-9929.

      I'm willing to attach some debug logs, but what debug flags should I enable to troubleshoot such a quota+nodemap issue on client and server?
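      
      One possible way to capture such traces, assuming the standard 'quota', 'sec' and 'config' debug masks cover the relevant quota and nodemap code paths, would be:
      
      # on the MDS/MGS and on an affected o2ib4 client
      lctl set_param debug=+quota
      lctl set_param debug=+sec
      lctl set_param debug=+config
      lctl clear
      # reproduce, e.g. 'lfs quota -g euan /oak' on the client, then dump the log
      lctl dk /tmp/lu-10040.dk.log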

      Thanks!
      Stephane

      Attachments

        1. break_nodemap_rbtree.sh
          0.7 kB
        2. oak-md1-s1.glb-grp.txt
          11 kB
        3. oak-md1-s2.dk.log
          1.25 MB
        4. oak-md1-s2.glb-grp.txt
          11 kB
        5. oak-md1-s2.mdt.dk.full.log
          53.84 MB
        6. reproducer.log
          3 kB
        7. sh-101-59.client.dk.full.log
          2.25 MB
        8. sh-113-01.dk.log
          547 kB

        Issue Links

          Activity

            [LU-10040] nodemap and quota issues (ineffective GID mapping)

            gerrit Gerrit Updater added a comment -

            John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/30206/
            Subject: LU-10040 nodemap: add nodemap idmap correctly
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set:
            Commit: e881c665bb60543fd2bbbd2d195ccce99a65f16b

            gerrit Gerrit Updater added a comment -

            James Nunez (james.a.nunez@intel.com) uploaded a new patch: https://review.whamcloud.com/30206
            Subject: LU-10040 nodemap: add nodemap idmap correctly
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set: 1
            Commit: a4de3c0f0ae3dbb684ba63874fa70e171c219cdf
            pjones Peter Jones added a comment -

            Landed for 2.11


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/29364/
            Subject: LU-10040 nodemap: add nodemap idmap correctly
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 253ccbd55ffe7fcdc405c9fcc4f72a47578920fe
            emoly.liu Emoly Liu added a comment -

            Stephane,

            That's great. After the MGS restarts/remounts, the other targets will detect that the config log lock has changed and will then fetch the config log from the MGS to update their local copies.

            Thanks,

            Emoly

            sthiell Stephane Thiell added a comment - - edited

            Hi Emoly,

            Good news. I renamed ./CONFIGS/nodemap to ./CONFIGS/nodemap.corrupted instead of removing it, and it worked! I was then able to mount the MGS and recreate all the nodemaps by hand from there. And now I can add new idmaps again, and they are properly propagated to the targets. The corrupted 'sherlock' nodemap can no longer be seen from the MGS.

            After some time, a few minutes maybe (not immediately), the corrupted 'sherlock' nodemap was also automatically removed from all targets (MDT, OST). This is great.

            Thanks again! By the way, I am now running 2.10.1 with the patch on the MGS/MDS.

            Stephane

            emoly.liu Emoly Liu added a comment - - edited

            Here are some steps to remove the nodemap config log from the MGS. This will remove all nodemap information from the MGS, so before doing that, you should save all of the nodemap information with "cp -r /proc/fs/lustre/nodemap $nodemap_dir" or "lctl get_param nodemap.*.* > $nodemap_file". (A shell sketch follows the steps.)

            1. umount your MGS
            2. mount your MGS with ldiskfs type, using the command: mount -t ldiskfs $your_MGS_device $mountpoint
            3. cd $mountpoint; you will see the file ./CONFIGS/nodemap. I also suggest saving a backup (e.g. to /tmp/nodemap) before removing (or renaming) it.
            4. umount your MGS and remount it with lustre type
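
            A shell sketch of the above steps, using the same placeholder variables:

            # save the running nodemap state first
            lctl get_param 'nodemap.*.*' > $nodemap_file
            # steps 1-2: stop the MGS and mount its backing device as plain ldiskfs
            umount $mountpoint
            mount -t ldiskfs $your_MGS_device $mountpoint
            # step 3: keep a copy of the config log, then remove (or rename) it
            cp $mountpoint/CONFIGS/nodemap /tmp/nodemap
            rm $mountpoint/CONFIGS/nodemap
            # step 4: remount as lustre; the nodemaps then need to be recreated by hand
            umount $mountpoint
            mount -t lustre $your_MGS_device $mountpoint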

            Please let me know if this works for you.

            emoly.liu Emoly Liu added a comment -

            Stephane, I saw the same "-2" logs from my server on Oct. 9 when I tried to reproduce this issue. Let me see how to purge the nodemap log.


            sthiell Stephane Thiell added a comment -

            Also, I cannot remount the MGS anymore (2.10.1 + patch gerrit 29364):

            [ 1174.919438] LDISKFS-fs (dm-1): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
            [ 1174.932548] Lustre: 17247:0:(osd_handler.c:7008:osd_mount()) MGS-osd: device /dev/mapper/md1-rbod1-mgt was upgraded from Lustre-1.x without enabling the dirdata feature. If you do not want to downgrade to Lustre-1.x again, you can enable it via 'tune2fs -O dirdata device'
            [ 1175.062057] Lustre: 17247:0:(nodemap_storage.c:914:nodemap_load_entries()) MGS-osd: failed to load nodemap configuration: rc = -2
            [ 1175.075067] LustreError: 17247:0:(mgs_fs.c:187:mgs_fs_setup()) MGS: error loading nodemap config file, file must be removed via ldiskfs: rc = -2
            [ 1175.089557] LustreError: 17247:0:(mgs_handler.c:1297:mgs_init0()) MGS: MGS filesystem method init failed: rc = -2
            [ 1175.145812] LustreError: 17247:0:(obd_config.c:608:class_setup()) setup MGS failed (-2)
            [ 1175.154748] LustreError: 17247:0:(obd_mount.c:203:lustre_start_simple()) MGS setup error -2
            [ 1175.164081] LustreError: 17247:0:(obd_mount_server.c:135:server_deregister_mount()) MGS not registered
            [ 1175.174463] LustreError: 15e-a: Failed to start MGS 'MGS' (-2). Is the 'mgs' module loaded?
            [ 1175.282230] Lustre: server umount MGS complete
            

            sthiell Stephane Thiell added a comment -

            Hi,
            I applied the patch and tried it on the MDS, but unfortunately the MDS is not able to process the nodemap log. I will need to find a way to purge the nodemap log.

            oak-MDT0000:

            [  127.492117] Lustre: Lustre: Build Version: 2.10.1_srcc02
            [  127.527461] LNet: Using FMR for registration
            [  127.553475] LNet: Added LNI 10.0.2.52@o2ib5 [8/256/0/180]
            [  190.367048] LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc
            [  191.433340] LustreError: 137-5: oak-MDT0000_UUID: not available for connect from 10.210.45.60@o2ib3 (no target). If you are running an HA pair check that the target is mounted on the other server.
            [  191.452861] LustreError: Skipped 3 previous similar messages
            [  191.471790] Lustre: 13119:0:(mgc_request.c:1797:mgc_process_recover_nodemap_log()) MGC10.0.2.51@o2ib5: error processing nodemap log nodemap: rc = -2
            [  191.523256] Lustre: oak-MDT0000: Not available for connect from 10.210.47.38@o2ib3 (not set up)
            [  191.532970] Lustre: Skipped 3 previous similar messages
            [  192.015060] Lustre: oak-MDT0000: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900
            [  192.501895] Lustre: oak-MDD0000: changelog on
            [  192.549977] Lustre: oak-MDT0000: Will be in recovery for at least 2:30, or until 1212 clients reconnect
            [  192.560492] Lustre: oak-MDT0000: Connection restored to cd0e08e0-aa22-f4da-21ed-94f218f886a1 (at 10.210.45.100@o2ib3)
            [  192.595309] Lustre: oak-MDT0000: root_squash is set to 99:99
            [  192.603004] Lustre: oak-MDT0000: nosquash_nids set to 10.0.2.[1-3]@o2ib5 10.0.2.[51-58]@o2ib5 10.0.2.[101-120]@o2ib5 10.0.2.[221-223]@o2ib5 10.0.2.[226-229]@o2ib5 10.0.2.[232-235]@o2ib5 10.0.2.[240-241]@o2ib5 10.210.47.253@o2ib3 10.9.0.[1-2]@o2ib4
            ...
            
            

            Thanks,
            Stephane


            People

              Assignee: emoly.liu Emoly Liu
              Reporter: sthiell Stephane Thiell
              Votes: 0
              Watchers: 4

              Dates

                Created:
                Updated:
                Resolved: