Lustre / LU-5690

Unable to parse the mdt.nosquash_nids parameter when using commas in expr_list

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.8.0
    • Affects Version/s: Lustre 2.4.3, Lustre 2.5.3
    • Labels: None
    • Environment: Lustre 2.4.3 w/ patch LU-4460
    • Severity: 2
    • Rank: 15931

    Description

      One of our support engineers reports that Lustre is unable to start the MDS when the value of mdt.nosquash_nids contains a comma.

      The LNET NID range syntax is given in the documentation (23.2.3. Tips on Using Root Squash). According to that grammar, setting mdt.nosquash_nids=192.168.0.[13,14]@tcp0 should be valid.

      <nidlist>     :== <nidrange> [ ' ' <nidrange> ]
      <nidrange>   :== <addrrange> '@' <net>
      <addrrange>  :== '*' |
                 <ipaddr_range> |
                 <numaddr_range>
      <ipaddr_range>       :== <numaddr_range>.<numaddr_range>.<numaddr_range>.<numaddr_range>
      <numaddr_range>      :== <number> |
                         <expr_list>
      <expr_list>  :== '[' <range_expr> [ ',' <range_expr>] ']'
      <range_expr> :== <number> |
                 <number> '-' <number> |
                 <number> '-' <number> '/' <number>
      <net>        :== <netname> | <netname><number>
      <netname>    :== "lo" | "tcp" | "o2ib" | "cib" | "openib" | "iib" | 
                 "vib" | "ra" | "elan" | "gm" | "mx" | "ptl"
      <number>     :== <nonnegative decimal> | <hexadecimal>
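
      To make the grammar above concrete, here is a minimal userspace C sketch that checks whether a number matches an <expr_list> such as [13,14] or [0-8/2]. It is an illustration only, not Lustre's own parser; the names parse_number(), range_expr_match() and expr_list_match() are hypothetical, and the sketch assumes that an <expr_list> may hold any number of comma-separated <range_expr> entries.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Parse a <number>: nonnegative decimal, or hexadecimal with a 0x prefix. */
static bool parse_number(const char *s, char **end, unsigned long *out)
{
    int base = (s[0] == '0' && (s[1] == 'x' || s[1] == 'X')) ? 16 : 10;

    if (!isdigit((unsigned char)s[0]))
        return false;
    *out = strtoul(s, end, base);
    return true;
}

/* Match value against one <range_expr>: N, LO-HI or LO-HI/STEP. */
static bool range_expr_match(const char *expr, unsigned long value)
{
    unsigned long lo, hi, step = 1;
    char *end;

    if (!parse_number(expr, &end, &lo))
        return false;
    hi = lo;
    if (*end == '-') {
        if (!parse_number(end + 1, &end, &hi))
            return false;
        if (*end == '/' &&
            (!parse_number(end + 1, &end, &step) || step == 0))
            return false;
    }
    if (*end != '\0' || hi < lo)
        return false;
    return value >= lo && value <= hi && (value - lo) % step == 0;
}

/* Match value against an <expr_list> such as "[13,14]" or "[0-8/2]". */
static bool expr_list_match(const char *list, unsigned long value)
{
    char buf[128];
    size_t len = strlen(list);
    char *tok = buf;

    if (len < 3 || len >= sizeof(buf) ||
        list[0] != '[' || list[len - 1] != ']')
        return false;
    memcpy(buf, list + 1, len - 2);     /* copy the text between brackets */
    buf[len - 2] = '\0';

    while (tok != NULL) {
        char *comma = strchr(tok, ',');

        if (comma != NULL)
            *comma = '\0';              /* isolate one <range_expr> */
        if (range_expr_match(tok, value))
            return true;
        tok = comma ? comma + 1 : NULL;
    }
    return false;
}
```

      With this sketch, expr_list_match("[13,14]", 14) is true and expr_list_match("[13,14]", 15) is false, which is the behaviour the last octet of 192.168.0.[13,14]@tcp0 is expected to have.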
      

      I was able to reproduce this issue in my test lab. Here are the traces.

      [root@mds1 ~]# tunefs.lustre --erase-params --param mgsnode=192.168.122.41@tcp --param lov.stripecount=1 --param lov.stripesize=1048576 --param "mdt.nosquash_nids=192.168.0.[13,14]@tcp0" /dev/vdb
      checking for existing Lustre data: found
      Reading CONFIGS/mountdata
      
         Read previous values:
      Target:     scratch-MDT0000
      Index:      0
      Lustre FS:  scratch
      Mount type: ldiskfs
      Flags:      0x1
                    (MDT )
      Persistent mount opts: user_xattr,errors=remount-ro
      Parameters: mgsnode=192.168.122.41@tcp lov.stripecount=1 lov.stripesize=1048576 mdt.nosquash_nids=192.168.0.13@tcp0
      
      
         Permanent disk data:
      Target:     scratch-MDT0000
      Index:      0
      Lustre FS:  scratch
      Mount type: ldiskfs
      Flags:      0x41
                    (MDT update )
      Persistent mount opts: user_xattr,errors=remount-ro
      Parameters: mgsnode=192.168.122.41@tcp lov.stripecount=1 lov.stripesize=1048576 mdt.nosquash_nids=192.168.0.[13,14]@tcp0
      
      Writing CONFIGS/mountdata
      [root@mds1 ~]# mount -v -t lustre /dev/vdb /mnt/fs/scratch/mdt/
      arg[0] = /sbin/mount.lustre
      arg[1] = -v
      arg[2] = -o
      arg[3] = rw
      arg[4] = /dev/vdb
      arg[5] = /mnt/fs/scratch/mdt/
      source = /dev/vdb (/dev/vdb), target = /mnt/fs/scratch/mdt
      options = rw
      checking for existing Lustre data: found
      Reading CONFIGS/mountdata
      mounting device /dev/vdb at /mnt/fs/scratch/mdt, flags=0x1000000 options=osd=osd-ldiskfs,user_xattr,errors=remount-ro,mgsnode=192.168.122.41@tcp,param=mgsnode=192.168.122.41@tcp,param=lov.stripecount=1,param=lov.stripesize=1048576,param=mdt.nosquash_nids=192.168.0.[13,14]@tcp0,svname=scratch-MDT0000,device=/dev/vdb
      mount.lustre: mount /dev/vdb at /mnt/fs/scratch/mdt failed: Invalid argument retries left: 0
      mount.lustre: mount /dev/vdb at /mnt/fs/scratch/mdt failed: Invalid argument
      This may have multiple causes.
      Are the mount options correct?
      Check the syslog for more info.
      [root@mds1 ~]# dmesg |tail
      LDISKFS-fs (vdb): barriers disabled
      LDISKFS-fs (vdb): mounted filesystem with ordered data mode. quota=on. Opts:
      LDISKFS-fs (vdb): barriers disabled
      LDISKFS-fs (vdb): mounted filesystem with ordered data mode. quota=on. Opts:
      LDISKFS-fs (vdb): Unrecognized mount option "14]@tcp0" or missing value   <=== NOT GOOD :-(
      LustreError: 8846:0:(osd_handler.c:5481:osd_mount()) scratch-MDT0000-osd: can't mount /dev/vdb: -22
      LustreError: 8846:0:(obd_config.c:572:class_setup()) setup scratch-MDT0000-osd failed (-22)
      LustreError: 8846:0:(obd_mount.c:201:lustre_start_simple()) scratch-MDT0000-osd setup error -22
      LustreError: 8846:0:(obd_mount_server.c:1686:server_fill_super()) Unable to start osd on /dev/vdb: -22
      LustreError: 8846:0:(obd_mount.c:1324:lustre_fill_super()) Unable to mount  (-22)
      

      In the debug trace:

      00000020:01000004:0.0:1411043156.536717:0:8846:0:(obd_mount.c:839:lmd_print()) options: user_xattr,errors=remount-ro,14]@tcp0
      

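      Something is clearly going wrong in the way the parameters are parsed: the parameter is cut at the comma inside the brackets, and the remainder (14]@tcp0) is handed to ldiskfs as a mount option.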

      [root@mds1 ~]# mount -t ldiskfs /dev/vdb /mnt/fs/scratch/mdt/
      [root@mds1 ~]# hexdump -C /mnt/fs/scratch/mdt/CONFIGS/mountdata
      00000000  01 00 d0 1d 00 00 00 00  00 00 00 00 00 00 00 00  |................|
      00000010  01 00 00 00 41 00 00 00  00 00 00 00 01 00 00 00  |....A...........|
      00000020  73 63 72 61 74 63 68 00  00 00 00 00 00 00 00 00  |scratch.........|
      00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
      *
      00000060  73 63 72 61 74 63 68 2d  4d 44 54 30 30 30 30 00  |scratch-MDT0000.|
      00000070  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
      *
      00001000  75 73 65 72 5f 78 61 74  74 72 2c 65 72 72 6f 72  |user_xattr,error|
      00001010  73 3d 72 65 6d 6f 75 6e  74 2d 72 6f 00 00 00 00  |s=remount-ro....|
      00001020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
      *
      00002000  20 6d 67 73 6e 6f 64 65  3d 31 39 32 2e 31 36 38  | mgsnode=192.168|
      00002010  2e 31 32 32 2e 34 31 40  74 63 70 20 6c 6f 76 2e  |.122.41@tcp lov.|
      00002020  73 74 72 69 70 65 63 6f  75 6e 74 3d 31 20 6c 6f  |stripecount=1 lo|
      00002030  76 2e 73 74 72 69 70 65  73 69 7a 65 3d 31 30 34  |v.stripesize=104|
      00002040  38 35 37 36 20 6d 64 74  2e 6e 6f 73 71 75 61 73  |8576 mdt.nosquas|
      00002050  68 5f 6e 69 64 73 3d 31  39 32 2e 31 36 38 2e 30  |h_nids=192.168.0|
      00002060  2e 5b 31 33 2c 31 34 5d  40 74 63 70 30 00 5d 40  |.[13,14]@tcp0.]@|
      00002070  74 63 70 00 00 00 00 00  00 00 00 00 00 00 00 00  |tcp.............|
      00002080  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
      *
      00003000
      

      The flaw is in the way lmd_parse() parses the param flags in lustre/obdclass/obd_mount.c (source from Lustre 2.5.3):

      1143         } else if (strncmp(s1, "param=", 6) == 0) {
      1144             int length;
      1145             char *tail = strchr(s1 + 6, ',');
      1146             if (tail == NULL) {
      1147                 length = strlen(s1);
      1148             } else {
      1149                 lnet_nid_t nid;
      1150                 char      *param_str = tail + 1;
      1151                 int        supplementary = 1;
      1152 
      1153                 while (class_parse_nid_quiet(param_str, &nid,
      1154                                  &param_str) == 0) {
      1155                     supplementary = 0;
      1156                 }
      1157                 length = param_str - s1 - supplementary;
      1158             }
      1159             length -= 6;
      1160             strncat(lmd->lmd_params, s1 + 6, length);
      1161             strcat(lmd->lmd_params, " ");
      1162             s3 = s1 + 6 + length;
      1163             clear++;
      

      This part of the code was introduced by LU-4460. However, reverting that patch is not enough: the parsing will still fail because of line 1145 (introduced in commit 6a65619724), which treats the first comma as the end of the parameter.
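
      As an illustration of the problem around line 1145, here is a minimal userspace C sketch that compares a plain strchr() search with a search that skips commas enclosed in brackets. It is not the actual fix for this ticket; find_unbracketed_comma(), param_len_naive() and param_len_bracket_aware() are hypothetical names, and the NID handling added by LU-4460 is left out.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return a pointer to the first ',' in s that is
 * not enclosed in '[' ... ']', or NULL if there is none.
 */
static const char *find_unbracketed_comma(const char *s)
{
    int depth = 0;

    for (; *s != '\0'; s++) {
        if (*s == '[')
            depth++;
        else if (*s == ']' && depth > 0)
            depth--;
        else if (*s == ',' && depth == 0)
            return s;
    }
    return NULL;
}

/* Length of the value of a "param=..." option, cut at the first comma. */
static size_t param_len_naive(const char *opt)
{
    const char *val = opt + 6;              /* skip "param=" */
    const char *tail = strchr(val, ',');    /* same search as line 1145 */

    return tail ? (size_t)(tail - val) : strlen(val);
}

/* Same, but commas inside an expr_list do not end the value. */
static size_t param_len_bracket_aware(const char *opt)
{
    const char *val = opt + 6;              /* skip "param=" */
    const char *tail = find_unbracketed_comma(val);

    return tail ? (size_t)(tail - val) : strlen(val);
}
```

      For the option string param=mdt.nosquash_nids=192.168.0.[13,14]@tcp0,svname=scratch-MDT0000, the naive version stops after "[13" and leaves 14]@tcp0 behind as a separate option, which matches the debug trace. The bracket-aware version keeps the whole value up to the comma before svname.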

      A workaround is to use mdt.nosquash_nids='192.168.0.13@tcp0 192.168.0.14@tcp0' (or, of course, 192.168.0.[13-14]@tcp0 in this case, but this is only an example).
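
      For longer lists, the comma-free form can be generated mechanically. The following C sketch is only an illustration under stated assumptions: expand_comma_list() is a hypothetical helper, it handles a single expr_list per NID, and it rejects entries that contain ranges.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: rewrite a NID containing one comma-separated
 * expr_list of plain numbers, e.g. "192.168.0.[13,14]@tcp0", as the
 * space-separated list "192.168.0.13@tcp0 192.168.0.14@tcp0".
 * Returns 0 on success, -1 on error (no bracketed list, a range entry,
 * or an output buffer that is too small).
 */
static int expand_comma_list(const char *nid, char *out, size_t outlen)
{
    const char *open = strchr(nid, '[');
    const char *close = open ? strchr(open, ']') : NULL;
    const char *item;
    size_t used = 0;

    if (close == NULL || outlen == 0)
        return -1;
    out[0] = '\0';

    for (item = open + 1; item < close; ) {
        const char *comma = memchr(item, ',', (size_t)(close - item));
        const char *end = comma ? comma : close;
        int n;

        if (memchr(item, '-', (size_t)(end - item)) != NULL)
            return -1;                      /* ranges are not expanded */

        n = snprintf(out + used, outlen - used, "%s%.*s%.*s%s",
                     used ? " " : "",
                     (int)(open - nid), nid,    /* text before '[' */
                     (int)(end - item), item,   /* one list entry */
                     close + 1);                /* text after ']' */
        if (n < 0 || (size_t)n >= outlen - used)
            return -1;                      /* output truncated */
        used += (size_t)n;
        item = end + 1;
    }
    return used ? 0 : -1;
}
```

      Called with "192.168.0.[13,14]@tcp0", the sketch produces the value used in the workaround above.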

            People

              Assignee: Jian Yu (yujian)
              Reporter: Bruno Travouillon (bruno.travouillon) (Inactive)
              Votes: 0
              Watchers: 7