
LU-8498: configuration from log 'nodemap' failed (-22)

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.9.0

    Description

      On current master (2.8.56), a newly created file system failed to mount an OST due to a nodemap log error.

      From the log message:

      00000100:00000001:3.0:1470938259.243549:0:9524:0:(client.c:1052:ptlrpc_set_destroy()) Process leaving
      00000100:00000001:3.0:1470938259.243549:0:9524:0:(client.c:2896:ptlrpc_queue_wait()) Process leaving (rc=0 : 0 : 0)
      10000000:00000001:3.0:1470938259.243551:0:9524:0:(mgc_request.c:1716:mgc_process_recover_nodemap_log()) Process leaving via out (rc=18446744073709551594 : -22 : 0xffffffffffffffea)
      

      It looks like the corresponding configuration log had a size of zero, which triggered this error:

              if (ealen == 0) { /* no logs transferred */
      #ifdef HAVE_SERVER_SUPPORT
                      /* config changed since first read RPC */
                      if (cld_is_nodemap(cld) && config_read_offset == 0) {
                              recent_nodemap = NULL;
                              nodemap_config_dealloc(new_config);
                              new_config = NULL;
      
                              CDEBUG(D_INFO, "nodemap config changed in transit, retrying\n");
      
                              /* setting eof to false, we request config again */
                              eof = false;
                              GOTO(out, rc = 0);
                      }
      #endif
                      if (!eof)
                              rc = -EINVAL;
                      GOTO(out, rc);
              }
      

      We have a debug log and will attach it soon.
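
      For reference, a debug log of this kind can be captured with the standard lctl debug facilities around the failed mount. A minimal sketch (the debug mask, buffer size, and output path here are illustrative assumptions, not the exact settings used):

          # widen the debug mask and buffer, then clear old entries
          lctl set_param debug=-1
          lctl set_param debug_mb=512
          lctl clear
          # reproduce the failure
          mount -t lustre ost1/ost1 /mnt/lustre/ost1
          # dump the kernel debug buffer to a file for attachment
          lctl dk /tmp/lu-8498-debug.log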

      Attachments

        1. log_failedmount_vanilla_master
          0.2 kB
          John Salinas
        2. log.Jinshan
          0.2 kB
          John Salinas
        3. log.mds.LU-8498
          0.2 kB
          John Salinas
        4. mds_lctl_log_afterpatch
          0.2 kB
          John Salinas
        5. oss_lctl_log.afterpatch
          0.2 kB
          John Salinas


          Activity

            pjones Peter Jones made changes -
            Resolution: Fixed
            Status: Open -> Resolved
            pjones Peter Jones added a comment -

            Landed for 2.9


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/21939/
            Subject: LU-8498 nodemap: new zfs index files not properly initialized
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: dd69c1da4ab22c66801df6a6e316d210ea71454d
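
            To confirm the landed fix is present in a local checkout of fs/lustre-release, something like the following should work (a sketch using the commit ID quoted above):

                # look for the LU-8498 fix on the current branch
                git log --oneline --grep='LU-8498'
                # or inspect the specific commit directly
                git show --stat dd69c1da4ab22c66801df6a6e316d210ea71454d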

            jsalians_intel John Salinas (Inactive) made changes -
            Priority: Blocker -> Critical

            jsalians_intel John Salinas (Inactive) added a comment -

            It appears that sometimes the OST still does connect – I just tried now, and of course it connected because I was trying to collect more information. I will lower the priority for now.


            kit.westneat Kit Westneat (Inactive) added a comment -

            Hey John,

            I see that there's a nodemap error, which is definitely a problem, but it looks like the OST still connects ok:
            Aug 20 19:55:46 wolf-4 kernel: Lustre: lsdraid-OST0001: Connection restored to 192.168.1.5@o2ib (at 192.168.1.5@o2ib)

            Was there a problem on the clients as well? Do you happen to have the debug logs for this time period for the OSS and MDS?

            BTW there's another nodemap related bug / patch here:
            http://review.whamcloud.com/#/c/22004/

            I don't think it's related to what you're seeing, but it could be worth adding to the mix.
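
            For anyone who wants to try that change on top of a local build, the usual Gerrit workflow should apply; a sketch, where <N> is a placeholder for the current patch set number of change 22004 and the fetch URL is assumed to be the review.whamcloud.com Gerrit serving fs/lustre-release:

                # fetch change 22004 from Gerrit and apply it to the current branch
                git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/04/22004/<N>
                git cherry-pick FETCH_HEAD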


            jsalians_intel John Salinas (Inactive) added a comment -

            It does indeed appear that I can only connect one OST – when the second OST tries to connect I see this:

            [root@wolf-4 mdisolation_step2_draid_testing]# zpool create -f -o cachefile=none -O recordsize=16MB ost1 draid1 /dev/mapper/mpathl /dev/mapper/mpathk /dev/mapper/mpathj /dev/mapper/mpathi /dev/mapper/mpathr /dev/mapper/mpathm /dev/mapper/mpathb /dev/mapper/mpatha /dev/mapper/mpatho /dev/mapper/mpathn /dev/mapper/mpathq spare '$draid1-0-s0' metadata mirror /dev/mapper/mpathg /dev/mapper/mpathx
            [root@wolf-4 mdisolation_step2_draid_testing]# zpool feature@large_blocks=enabled ost1
            [root@wolf-4 mdisolation_step2_draid_testing]# mkfs.lustre --backfstype=zfs --reformat --fsname=lsdraid --ost --index=1 --mgsnode=192.168.1.5@o2ib ost1/ost1

            Permanent disk data:
            Target: lsdraid:OST0001
            Index: 1
            Lustre FS: lsdraid
            Mount type: zfs
            Flags: 0x62
            (OST first_time update )
            Persistent mount opts:
            Parameters: mgsnode=192.168.1.5@o2ib

            mkfs_cmd = zfs create -o canmount=off -o xattr=sa ost1/ost1
            Writing ost1/ost1 properties
            lustre:version=1
            lustre:flags=98
            lustre:index=1
            lustre:fsname=lsdraid
            lustre:svname=lsdraid:OST0001
            lustre:mgsnode=192.168.1.5@o2ib
            [root@wolf-4 mdisolation_step2_draid_testing]# mount -t lustre ost1/ost1 /mnt/lustre/ost1

            Aug 20 19:55:13 wolf-4 kernel: ZFS: spa metadata allocation class feature not enabled (ddt,dmu,mos)
            Aug 20 19:55:13 wolf-4 kernel: ZFS: vdev_alloc primary 'ddt,dmu,mos'
            Aug 20 19:55:14 wolf-4 kernel: ZFS: vdev_construct_zaps assign primary 'ddt,dmu,mos'
            Aug 20 19:55:27 wolf-4 ntpd_intres[2643]: ntp_intres.request: permission denied
            Aug 20 19:55:27 wolf-4 ntpd_intres[2643]: ntp_intres.request: permission denied
            Aug 20 19:55:41 wolf-4 kernel: LustreError: 41114:0:(mgc_request.c:257:do_config_log_add()) MGC192.168.1.5@o2ib: failed processing log, type 4: rc = -22
            Aug 20 19:55:41 wolf-4 kernel: Lustre: lsdraid-OST0001: new disk, initializing
            Aug 20 19:55:41 wolf-4 kernel: Lustre: srv-lsdraid-OST0001: No data found on store. Initialize space
            Aug 20 19:55:46 wolf-4 kernel: Lustre: lsdraid-OST0001: Connection restored to 192.168.1.5@o2ib (at 192.168.1.5@o2ib)
            Aug 20 19:55:46 wolf-4 kernel: Lustre: Skipped 2 previous similar messages

            [root@wolf-4 mdisolation_step2_draid_testing]# mount |grep lustre
            ost1/ost1 on /mnt/lustre/ost1 type lustre (ro)
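
            As a side check on the pool created above (assuming the feature line was intended to be "zpool set feature@large_blocks=enabled ost1"), the feature state and recordsize can be verified with:

                # large_blocks should report enabled/active, recordsize should be 16M
                zpool get feature@large_blocks ost1
                zfs get recordsize ost1/ost1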

            from mds:
            [root@wolf-5 mdisolation_step2_draid_testing]# lctl dl
            0 UP osd-zfs lsdraid-MDT0000-osd lsdraid-MDT0000-osd_UUID 10
            1 UP mgs MGS MGS 9
            2 UP mgc MGC192.168.1.5@o2ib ac271bb5-f2ee-996f-94cf-6c9cd20a078a 5
            3 UP mds MDS MDS_uuid 3
            4 UP lod lsdraid-MDT0000-mdtlov lsdraid-MDT0000-mdtlov_UUID 4
            5 UP mdt lsdraid-MDT0000 lsdraid-MDT0000_UUID 9
            6 UP mdd lsdraid-MDD0000 lsdraid-MDD0000_UUID 4
            7 UP qmt lsdraid-QMT0000 lsdraid-QMT0000_UUID 4
            8 UP lwp lsdraid-MDT0000-lwp-MDT0000 lsdraid-MDT0000-lwp-MDT0000_UUID 5
            9 UP osp lsdraid-OST0000-osc-MDT0000 lsdraid-MDT0000-mdtlov_UUID 5
            10 UP osp lsdraid-OST0001-osc-MDT0000 lsdraid-MDT0000-mdtlov_UUID 5

            from oss1
            [root@wolf-3 mdisolation_step2_draid_testing]# lctl dl
            0 UP osd-zfs lsdraid-OST0000-osd lsdraid-OST0000-osd_UUID 5
            1 UP mgc MGC192.168.1.5@o2ib c2ddb1b6-eda7-06a1-c4c7-7a10c79169f2 5
            2 UP ost OSS OSS_uuid 3
            3 UP obdfilter lsdraid-OST0000 lsdraid-OST0000_UUID 5
            4 UP lwp lsdraid-MDT0000-lwp-OST0000 lsdraid-MDT0000-lwp-OST0000_UUID 5
            [root@wolf-3 mdisolation_step2_draid_testing]#

            from oss2
            [root@wolf-4 mdisolation_step2_draid_testing]# lctl dl
            0 UP osd-zfs lsdraid-OST0001-osd lsdraid-OST0001-osd_UUID 5
            1 UP mgc MGC192.168.1.5@o2ib 9baa6009-12fa-026e-a230-d8562d7b63c7 5
            2 UP ost OSS OSS_uuid 3
            3 UP obdfilter lsdraid-OST0001 lsdraid-OST0001_UUID 5
            4 UP lwp lsdraid-MDT0000-lwp-OST0001 lsdraid-MDT0000-lwp-OST0001_UUID 5

            jsalians_intel John Salinas (Inactive) added a comment - edited

            The good news is that one OST can now connect to the MDS. I saw one odd message when clients were mounting:

            [26059.015447] libcfs: module verification failed: signature and/or required key missing - tainting kernel
            [26059.030063] LNet: HW CPU cores: 72, npartitions: 8
            [26059.038309] alg: No test for adler32 (adler32-zlib)
            [26059.044985] alg: No test for crc32 (crc32-table)
            [26068.348811] LNet: HW CPU cores: 72, npartitions: 8
            [26068.357409] alg: No test for adler32 (adler32-zlib)
            [26068.364132] alg: No test for crc32 (crc32-table)
            [26076.388483] Lustre: Lustre: Build Version: 2.8.56
            [26076.471053] LNet: Added LNI 192.168.1.6@o2ib [8/512/0/180]
            [26076.536301] LustreError: 44972:0:(mgc_request.c:1533:mgc_apply_recover_logs()) mgc: cannot find uuid by nid 192.168.1.4@o2ib
            [26076.550097] Lustre: 44972:0:(mgc_request.c:1760:mgc_process_recover_nodemap_log()) MGC192.168.1.5@o2ib: error processing recovery log lsdraid-cliir: rc = -2

            .4 is OSS2 and .5 is the MDS node.
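
            One way to double-check which NID belongs to which node, and that the NID the MGC complained about is reachable, is the standard lctl tooling (a sketch; run on each server):

                # list the NIDs configured on the local node
                lctl list_nids
                # ping the NID reported in the "cannot find uuid by nid" error
                lctl ping 192.168.1.4@o2ib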

            [root@wolf-7 ~]# df -h /mnt/lustre/
            Filesystem Size Used Avail Use% Mounted on
            192.168.1.5@o2ib:/lsdraid 7.0T 16M 7.0T 1% /mnt/lustre

            So we are missing the capacity of the zpool from OSS2:

            [root@wolf-3 ~]# zpool list
            NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
            ost0 9.97T 21.0M 9.97T - 0% 0% 1.00x ONLINE -
            [root@wolf-3 ~]# zfs list
            NAME USED AVAIL REFER MOUNTPOINT
            ost0 17.8M 7.01T 19K /ost0
            ost0/ost0 17.6M 7.01T 17.6M /ost0/ost0
            [root@wolf-3 ~]# mount |grep lustre
            ost0/ost0 on /mnt/lustre/ost0 type lustre (ro)

            [root@wolf-4 mdisolation_step2_draid_testing]# zpool list
            NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
            ost1 9.97T 846K 9.97T - 0% 0% 1.00x ONLINE -
            [root@wolf-4 mdisolation_step2_draid_testing]# zfs list
            NAME USED AVAIL REFER MOUNTPOINT
            ost1 1.51M 7.01T 19K /ost1
            ost1/ost1 1.37M 7.01T 1.37M /ost1/ost1
            [root@wolf-4 mdisolation_step2_draid_testing]# mount |grep lustre
            ost1/ost1 on /mnt/lustre/ost1 type lustre (ro)
            [25673.466998] ZFS: spa metadata allocation class feature not enabled (ddt,dmu,mos)
            [25673.477813] ZFS: vdev_alloc primary 'ddt,dmu,mos'
            [25674.379121] ZFS: vdev_construct_zaps assign primary 'ddt,dmu,mos'
            [25677.322609] LustreError: 52126:0:(mgc_request.c:257:do_config_log_add()) MGC192.168.1.5@o2ib: failed processing log, type 4: rc = -22
            [25677.342892] Lustre: lsdraid-OST0000: new disk, initializing
            [25677.351849] Lustre: srv-lsdraid-OST0000: No data found on store. Initialize space
            [25677.450718] Lustre: lsdraid-OST0000: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-450
            [26177.093757] LustreError: 11-0: lsdraid-MDT0000-lwp-OST0000: operation obd_ping to node 192.168.1.5@o2ib failed: rc = -107
            [26177.108646] LustreError: Skipped 21 previous similar messages
            [26177.117392] Lustre: lsdraid-MDT0000-lwp-OST0000: Connection to lsdraid-MDT0000 (at 192.168.1.5@o2ib) was lost; in progress operations using this service will wait for recovery to complete
            [26177.140650] Lustre: Skipped 21 previous similar messages
            [26177.149666] LustreError: 167-0: lsdraid-MDT0000-lwp-OST0000: This client was evicted by lsdraid-MDT0000; in progress operations using this service will fail.
            [26177.170128] LustreError: Skipped 21 previous similar messages
            [26177.180193] Lustre: lsdraid-MDT0000-lwp-OST0000: Connection restored to 192.168.1.5@o2ib (at 192.168.1.5@o2ib)
            [26177.194074] Lustre: Skipped 21 previous similar messages


            jsalians_intel John Salinas (Inactive) added a comment -

            Today I rebuilt with the tip of master plus this patch and was able to get an OSS to mount successfully – using the same scripts:

            1. ./setup_oss1.sh
              Restarting network (via systemctl): Job for network.service failed because the control process exited with error code. See "systemctl status network.service" and "journalctl -xe" for details.
              [FAILED]
              modprobe: FATAL: Module lnet is in use.
              modprobe: FATAL: Module zfs is in use.
              pool: ost0
              state: ONLINE
              scan: none requested
              config:

            NAME STATE READ WRITE CKSUM
            ost0 ONLINE 0 0 0
            draid1-0

            {any}

            ONLINE 0 0 0
            mpathaj ONLINE 0 0 0
            mpathai ONLINE 0 0 0
            mpathah ONLINE 0 0 0
            mpathag ONLINE 0 0 0
            mpathaq ONLINE 0 0 0
            mpathap ONLINE 0 0 0
            mpathak ONLINE 0 0 0
            mpathz ONLINE 0 0 0
            mpatham ONLINE 0 0 0
            mpathal ONLINE 0 0 0
            mpathao ONLINE 0 0 0
            mirror-1

            {ddt,dmu,mos}

            ONLINE 0 0 0
            mpathae ONLINE 0 0 0
            mpathaa ONLINE 0 0 0
            spares
            $draid1-0-s0 AVAIL

            errors: No known data errors
            ost0 feature@large_blocks enabled local

            Permanent disk data:
            Target: lsdraid-OST0000
            Index: 0
            Lustre FS: lsdraid
            Mount type: zfs
            Flags: 0x42
            (OST update )
            Persistent mount opts:
            Parameters: mgsnode=192.168.1.5@o2ib

            mkfs_cmd = zfs create -o canmount=off -o xattr=sa ost0/ost0
            Writing ost0/ost0 properties
            lustre:version=1
            lustre:flags=66
            lustre:index=0
            lustre:fsname=lsdraid
            lustre:svname=lsdraid-OST0000
            lustre:mgsnode=192.168.1.5@o2ib

            2. mount |grep ost0
              ost0 on /ost0 type zfs (rw,xattr,noacl)
              ost0/ost0 on /mnt/lustre/ost0 type lustre (ro)

            I will install the rest of the system, and if we can get all of the nodes up I will mark this as fixed – but we will need this fix merged.
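
            Once the remaining nodes are up, a quick client-side sanity check that both OSTs are visible could be the following (a sketch, assuming the client mount point used above):

                # both lsdraid-OST0000 and lsdraid-OST0001 should be listed
                lfs df -h /mnt/lustre
                lfs osts /mnt/lustre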


            People

              Assignee: kit.westneat Kit Westneat (Inactive)
              Reporter: jay Jinshan Xiong (Inactive)
              Votes: 0
              Watchers: 6