Lustre / LU-8508

kernel:LustreError: 3842:0:(lu_object.c:1243:lu_device_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed: Refcount is 1

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Version/s: Lustre 2.9.0, Lustre 2.8.0
    • Labels: None
    • Severity: 3

    Description

      While testing Lustre DNE2, I noticed some issues with the latest master builds. When mounting storage targets on servers other than the one with the MGT, I get a kernel panic with the messages below. I have validated (to the best of my ability) that this is not a network issue; I have also tried an FE build, which works, and another master build (3419), which also works:

       
      [root@zlfs2-oss1 ~]# mount -vvv -t lustre /dev/nvme0n1 /mnt/MDT0000
      arg[0] = /sbin/mount.lustre
      arg[1] = -v
      arg[2] = -o
      arg[3] = rw
      arg[4] = /dev/nvme0n1
      arg[5] = /mnt/MDT0000
      source = /dev/nvme0n1 (/dev/nvme0n1), target = /mnt/MDT0000
      options = rw
      checking for existing Lustre data: found
      Reading CONFIGS/mountdata
      Writing CONFIGS/mountdata
      mounting device /dev/nvme0n1 at /mnt/MDT0000, flags=0x1000000 options=osd=osd-ldiskfs,user_xattr,errors=remount-ro,mgsnode=192.168.5.21@o2ib,virgin,update,param=mgsnode=192.168.5.21@o2ib,svname=zlfs2-MDT0000,device=/dev/nvme0n1
      mount.lustre: cannot parse scheduler options for '/sys/block/nvme0n1/queue/scheduler'
      
      Message from syslogd@zlfs2-oss1 at Aug 16 21:52:33 ...
       kernel:LustreError: 3842:0:(lu_object.c:1243:lu_device_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed: Refcount is 1
      
      Message from syslogd@zlfs2-oss1 at Aug 16 21:52:33 ...
       kernel:LustreError: 3842:0:(lu_object.c:1243:lu_device_fini()) LBUG
      
      Message from syslogd@zlfs2-oss1 at Aug 16 21:52:33 ...
       kernel:Kernel panic - not syncing: LBUG
      

      Attached is some debugging output and more info.

      Builds Tried:
      master b3424 - issues
      master b3423 - issues
      master b3420 - issues
      master b3419 - works
      fe 2.8 b18 - works

    Activity


            kit.westneat Kit Westneat (Inactive) added a comment -

            Hey Peter,

            Are we talking about change 22004? I only see two style comments from Andreas. There are a few over-80-character auto-comments as well, but I thought we were ignoring those now to match the Linux style guide. I'll refresh it, but I want to make sure I'm not missing something.

            Thanks,
            Kit

            pjones Peter Jones added a comment -

            Kit

            I think that at the moment a second reviewer is holding off in anticipation of another version being forthcoming. Given that there are quite a number of comments, I think it would be good to refresh it.

            Peter


            kit.westneat Kit Westneat (Inactive) added a comment -

            Hey Peter,

            I wasn't planning on it since he +1'd it, unless there were other issues found, but I can if that's desired.

            Kit
            pjones Peter Jones added a comment -

            Kit

            Will you be refreshing the patch in light of Andreas's review feedback?

            Peter


            kit.westneat Kit Westneat (Inactive) added a comment -

            BTW, the cause of the second bug is that if a new OST mounts before the MGC has pulled the nodemap config from the MGS, it creates a new blank config on disk. Part of that code was erroneously assuming it was running on the MGS (normally all new records are created there and then sent to the OSTs), so it was returning an error. That's why the first OST failed to mount. When the other OSTs were mounted, the MGC was already connected to the MGS, so it was able to pull the config and save it properly. That's why the other OSTs were able to mount after rebooting, but nvme0n1 wasn't able to until the others were mounted.


            kit.westneat Kit Westneat (Inactive) added a comment -

            This patch is still a work in progress, but it addresses both of these issues.


            gerrit Gerrit Updater added a comment -

            Kit Westneat (kit.westneat@gmail.com) uploaded a new patch: http://review.whamcloud.com/22004
            Subject: LU-8508 nodemap: improve object handling in cache saving
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 6753d578f44195ff6e4476266538887f6cd07712


            yong.fan nasf (Inactive) added a comment -

            Failing to mount the OST is a different issue from the original "ASSERTION( atomic_read(&d->ld_ref) == 0 )".

            "The first target I try to mount which isn't on the same server as the MGT will fail and get stuck in this state. Not mounted but in a lock somewhere, its like it starts the service without a target."

            Have you mounted the MGS before mounting the MDT or OST? If not, please mount the MGS (that is, the MGT on the MGS node) first. Otherwise, please enable -1 level Lustre kernel debug on both the MGS and the OSS/MDS, then try again and attach the Lustre debug logs. Thanks!
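
            For reference, full ("-1" level) debug can typically be enabled and collected with lctl along these lines; the dump file path below is just an example:

            # run on both the MGS and the affected OSS/MDS node
            lctl set_param debug=-1        # enable all Lustre debug flags
            lctl clear                     # empty the kernel debug buffer
            # ...reproduce the failing mount...
            lctl dk /tmp/lustre-debug.log  # dump the debug buffer to a file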

            adam.j.roe Adam Roe (Inactive) added a comment - - edited

            Some more verbose information on the failed first mount:

            [root@zlfs2-mds2 ~]# mount -vvv -t lustre  /dev/nvme0n1 /mnt/OST0004
            arg[0] = /sbin/mount.lustre
            arg[1] = -v
            arg[2] = -o
            arg[3] = rw
            arg[4] = /dev/nvme0n1
            arg[5] = /mnt/OST0004
            source = /dev/nvme0n1 (/dev/nvme0n1), target = /mnt/OST0004
            options = rw
            checking for existing Lustre data: found
            Reading CONFIGS/mountdata
            Writing CONFIGS/mountdata
            mounting device /dev/nvme0n1 at /mnt/OST0004, flags=0x1000000 options=osd=osd-ldiskfs,,errors=remount-ro,mgsnode=192.168.5.21@o2ib,virgin,update,param=mgsnode=192.168.5.21@o2ib,svname=zlfs2-OST0004,device=/dev/nvme0n1
            mount.lustre: cannot parse scheduler options for '/sys/block/nvme0n1/queue/scheduler'
            mount.lustre: mount /dev/nvme0n1 at /mnt/OST0004 failed: Invalid argument retries left: 0
            mount.lustre: mount /dev/nvme0n1 at /mnt/OST0004 failed: Invalid argument
            This may have multiple causes.
            Are the mount options correct?
            Check the syslog for more info.
            
            Aug 17 22:10:37 zlfs2-mds2 kernel: LDISKFS-fs (nvme0n1): file extents enabled, maximum tree depth=5
            Aug 17 22:10:37 zlfs2-mds2 kernel: LDISKFS-fs (nvme0n1): mounted filesystem with ordered data mode. Opts: errors=remount-ro
            Aug 17 22:10:37 zlfs2-mds2 kernel: LDISKFS-fs (nvme0n1): file extents enabled, maximum tree depth=5
            Aug 17 22:10:37 zlfs2-mds2 kernel: LDISKFS-fs (nvme0n1): mounted filesystem with ordered data mode. Opts: ,errors=remount-ro,no_mbcache
            Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 4148:0:(mgc_request.c:257:do_config_log_add()) MGC192.168.5.21@o2ib: failed processing log, type 4: rc = -22
            Aug 17 22:10:38 zlfs2-mds2 kernel: Lustre: zlfs2-OST0004: new disk, initializing
            Aug 17 22:10:38 zlfs2-mds2 kernel: Lustre: srv-zlfs2-OST0004: No data found on store. Initialize space
            Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 4314:0:(nodemap_storage.c:368:nodemap_idx_nodemap_add_update()) cannot add nodemap config to non-existing MGS.
            Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 4314:0:(nodemap_storage.c:1315:nodemap_fs_init()) zlfs2-OST0004: error loading nodemap config file, file must be removed via ldiskfs: rc = -22
            Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 4314:0:(ofd_dev.c:248:ofd_stack_fini()) header@ffff881ffa035080[0x0, 1, [0x1:0x0:0x0] hash exist]{
            Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 4314:0:(ofd_dev.c:248:ofd_stack_fini()) ....local_storage@ffff881ffa0350d0
            Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 4314:0:(ofd_dev.c:248:ofd_stack_fini()) ....osd-ldiskfs@ffff882006418700osd-ldiskfs-object@ffff882006418700(i:ffff881ff9da1e88:78/1354905553)[plain]
            Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 4314:0:(ofd_dev.c:248:ofd_stack_fini()) } header@ffff881ffa035080
            Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 4314:0:(ofd_dev.c:248:ofd_stack_fini()) header@ffff881ffa034fc0[0x0, 1, [0x200000003:0x0:0x0] hash exist]{
            Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 4314:0:(ofd_dev.c:248:ofd_stack_fini()) ....local_storage@ffff881ffa035010
            Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 4314:0:(ofd_dev.c:248:ofd_stack_fini()) ....osd-ldiskfs@ffff88200641b700osd-ldiskfs-object@ffff88200641b700(i:ffff881ff9d98d88:77/1354905519)[plain]
            Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 4314:0:(ofd_dev.c:248:ofd_stack_fini()) } header@ffff881ffa034fc0
            Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 4314:0:(ofd_dev.c:248:ofd_stack_fini()) header@ffff881ffa034c00[0x0, 1, [0xa:0x0:0x0] hash exist]{
            Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 4314:0:(ofd_dev.c:248:ofd_stack_fini()) ....local_storage@ffff881ffa034c50
            Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 4314:0:(ofd_dev.c:248:ofd_stack_fini()) ....osd-ldiskfs@ffff88200641b100osd-ldiskfs-object@ffff88200641b100(i:ffff881ff9daaf88:79/1354905587)[plain]
            Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 4314:0:(ofd_dev.c:248:ofd_stack_fini()) } header@ffff881ffa034c00
            Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 4314:0:(ofd_dev.c:248:ofd_stack_fini()) header@ffff88202684cd80[0x0, 1, [0xa:0x2:0x0] hash exist]{
            Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 4314:0:(ofd_dev.c:248:ofd_stack_fini()) ....local_storage@ffff88202684cdd0
            Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 4314:0:(ofd_dev.c:248:ofd_stack_fini()) ....osd-ldiskfs@ffff881ffa237e00osd-ldiskfs-object@ffff881ffa237e00(i:ffff882024048948:80/1354905588)[plain]
            Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 4314:0:(ofd_dev.c:248:ofd_stack_fini()) } header@ffff88202684cd80
            Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 4314:0:(ofd_dev.c:248:ofd_stack_fini()) header@ffff881ffa035380[0x0, 1, [0x200000001:0x1017:0x0] hash exist]{
            Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 4314:0:(ofd_dev.c:248:ofd_stack_fini()) ....local_storage@ffff881ffa0353d0
            Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 4314:0:(ofd_dev.c:248:ofd_stack_fini()) ....osd-ldiskfs@ffff88200641be00osd-ldiskfs-object@ffff88200641be00(i:ffff882024309a48:7864321/2679038361)[plain]
            Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 4314:0:(ofd_dev.c:248:ofd_stack_fini()) } header@ffff881ffa035380
            Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 4314:0:(obd_config.c:578:class_setup()) setup zlfs2-OST0004 failed (-22)
            Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 4314:0:(obd_config.c:1671:class_config_llog_handler()) MGC192.168.5.21@o2ib: cfg command failed: rc = -22
            Aug 17 22:10:38 zlfs2-mds2 kernel: Lustre:    cmd=cf003 0:zlfs2-OST0004  1:dev  2:0  3:f
            Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 15b-f: MGC192.168.5.21@o2ib: The configuration from log 'zlfs2-OST0004'failed from the MGS (-22).  Make sure this client and the MGS are running compatible versions of Lustre.
            Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 4148:0:(obd_mount_server.c:1352:server_start_targets()) failed to start server zlfs2-OST0004: -22
            Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 4148:0:(obd_mount_server.c:1844:server_fill_super()) Unable to start targets: -22
            Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 4148:0:(obd_config.c:625:class_cleanup()) Device 3 not setup
            Aug 17 22:10:38 zlfs2-mds2 kernel: Lustre: server umount zlfs2-OST0004 complete
            Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 4148:0:(obd_mount.c:1453:lustre_fill_super()) Unable to mount /dev/nvme0n1 (-22)
            

            I then try to mount again and get the same as above with this extra:

            The target service is already running. (/dev/nvme0n1)
            

            I reboot the server and mounting that target still fails; however, if I mount a different target on the same server beforehand, say nvme1n1, I am then able to mount nvme0n1 without issue.


            adam.j.roe Adam Roe (Inactive) added a comment -

            Okay, so the strange behavior:

            The first target I try to mount which isn't on the same server as the MGT will fail and get stuck in this state: not mounted, but stuck in a lock somewhere; it's as if it starts the service without a target.

            mount.lustre: mount /dev/nvme1n1 at /mnt/OST0005 failed: Operation already in progress
            The target service is already running. (/dev/nvme1n1)
            

            All other targets after the first will error out, but not get stuck:

            mount.lustre: mount /dev/nvme0n1 at /mnt/MDT0004 failed: Invalid argument
            This may have multiple causes.
            Are the mount options correct?
            Check the syslog for more info.
            

            If I then run the mount command for a second time it will mount. But I have not found a way to recover the first locked target. I have to reboot and remount.


            adam.j.roe Adam Roe (Inactive) added a comment -

            Okay, I tested the build; to update: it didn't crash the system, but it fails to mount the target. See below.

            Some targets mount, some won't. Strange behavior going on; I will report back soon.

            Mount

            [root@lfsmaster FORMAT_SCRIPTS]# ssh zlfs2-mds2
            Last login: Wed Aug 17 08:46:40 2016 from 10.10.100.99
            [root@zlfs2-mds2 ~]# mount -vvv -t lustre  /dev/nvme0n1 /mnt/OST0004
            arg[0] = /sbin/mount.lustre
            arg[1] = -v
            arg[2] = -o
            arg[3] = rw
            arg[4] = /dev/nvme0n1
            arg[5] = /mnt/OST0004
            source = /dev/nvme0n1 (/dev/nvme0n1), target = /mnt/OST0004
            options = rw
            checking for existing Lustre data: found
            Reading CONFIGS/mountdata
            Writing CONFIGS/mountdata
            mounting device /dev/nvme0n1 at /mnt/OST0004, flags=0x1000000 options=osd=osd-ldiskfs,,errors=remount-ro,mgsnode=192.168.5.21@o2ib,virgin,update,param=mgsnode=192.168.5.21@o2ib,svname=zlfs2-OST0004,device=/dev/nvme0n1
            mount.lustre: cannot parse scheduler options for '/sys/block/nvme0n1/queue/scheduler'
            mount.lustre: mount /dev/nvme0n1 at /mnt/OST0004 failed: Invalid argument retries left: 0
            mount.lustre: mount /dev/nvme0n1 at /mnt/OST0004 failed: Invalid argument
            This may have multiple causes.
            Are the mount options correct?
            Check the syslog for more info.
            

            Logs

            Aug 17 18:10:22 zlfs2-mds2 kernel: LDISKFS-fs (nvme0n1): file extents enabled, maximum tree depth=5
            Aug 17 18:10:22 zlfs2-mds2 kernel: LDISKFS-fs (nvme0n1): mounted filesystem with ordered data mode. Opts: errors=remount-ro
            Aug 17 18:10:23 zlfs2-mds2 kernel: LDISKFS-fs (nvme0n1): file extents enabled, maximum tree depth=5
            Aug 17 18:10:23 zlfs2-mds2 kernel: LDISKFS-fs (nvme0n1): mounted filesystem with ordered data mode. Opts: ,errors=remount-ro,no_mbcache
            Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 14037:0:(mgc_request.c:257:do_config_log_add()) MGC192.168.5.21@o2ib: failed processing log, type 4: rc = -22
            Aug 17 18:10:23 zlfs2-mds2 kernel: Lustre: zlfs2-OST0004: new disk, initializing
            Aug 17 18:10:23 zlfs2-mds2 kernel: Lustre: srv-zlfs2-OST0004: No data found on store. Initialize space
            Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 14203:0:(nodemap_storage.c:368:nodemap_idx_nodemap_add_update()) cannot add nodemap config to non-existing MGS.
            Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 14203:0:(nodemap_storage.c:1315:nodemap_fs_init()) zlfs2-OST0004: error loading nodemap config file, file must be removed via ldiskfs: rc = -22
            Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 14203:0:(ofd_dev.c:248:ofd_stack_fini()) header@ffff881ffb9ed5c0[0x0, 1, [0x1:0x0:0x0] hash exist]{
            Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 14203:0:(ofd_dev.c:248:ofd_stack_fini()) ....local_storage@ffff881ffb9ed610
            Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 14203:0:(ofd_dev.c:248:ofd_stack_fini()) ....osd-ldiskfs@ffff882022c59100osd-ldiskfs-object@ffff882022c59100(i:ffff881ff8e91e88:78/106428767)[plain]
            Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 14203:0:(ofd_dev.c:248:ofd_stack_fini()) } header@ffff881ffb9ed5c0
            Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 14203:0:(ofd_dev.c:248:ofd_stack_fini()) header@ffff881ffb9ed800[0x0, 1, [0x200000003:0x0:0x0] hash exist]{
            Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 14203:0:(ofd_dev.c:248:ofd_stack_fini()) ....local_storage@ffff881ffb9ed850
            Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 14203:0:(ofd_dev.c:248:ofd_stack_fini()) ....osd-ldiskfs@ffff882022c59e00osd-ldiskfs-object@ffff882022c59e00(i:ffff881ff8e88d88:77/106428733)[plain]
            Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 14203:0:(ofd_dev.c:248:ofd_stack_fini()) } header@ffff881ffb9ed800
            Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 14203:0:(ofd_dev.c:248:ofd_stack_fini()) header@ffff881ffb9ed2c0[0x0, 1, [0xa:0x0:0x0] hash exist]{
            Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 14203:0:(ofd_dev.c:248:ofd_stack_fini()) ....local_storage@ffff881ffb9ed310
            Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 14203:0:(ofd_dev.c:248:ofd_stack_fini()) ....osd-ldiskfs@ffff882022c58600osd-ldiskfs-object@ffff882022c58600(i:ffff881ff8e9af88:79/106428801)[plain]
            Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 14203:0:(ofd_dev.c:248:ofd_stack_fini()) } header@ffff881ffb9ed2c0
            Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 14203:0:(ofd_dev.c:248:ofd_stack_fini()) header@ffff882007239140[0x0, 1, [0xa:0x2:0x0] hash exist]{
            Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 14203:0:(ofd_dev.c:248:ofd_stack_fini()) ....local_storage@ffff882007239190
            Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 14203:0:(ofd_dev.c:248:ofd_stack_fini()) ....osd-ldiskfs@ffff88200f958800osd-ldiskfs-object@ffff88200f958800(i:ffff881ff8ea80c8:80/106428802)[plain]
            Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 14203:0:(ofd_dev.c:248:ofd_stack_fini()) } header@ffff882007239140
            Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 14203:0:(ofd_dev.c:248:ofd_stack_fini()) header@ffff881ffb9ed440[0x0, 1, [0x200000001:0x1017:0x0] hash exist]{
            Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 14203:0:(ofd_dev.c:248:ofd_stack_fini()) ....local_storage@ffff881ffb9ed490
            Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 14203:0:(ofd_dev.c:248:ofd_stack_fini()) ....osd-ldiskfs@ffff882022c59600osd-ldiskfs-object@ffff882022c59600(i:ffff881ff8e71e88:5898241/3450894875)[plain]
            Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 14203:0:(ofd_dev.c:248:ofd_stack_fini()) } header@ffff881ffb9ed440
            Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 14203:0:(obd_config.c:578:class_setup()) setup zlfs2-OST0004 failed (-22)
            Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 14203:0:(obd_config.c:1671:class_config_llog_handler()) MGC192.168.5.21@o2ib: cfg command failed: rc = -22
            Aug 17 18:10:23 zlfs2-mds2 kernel: Lustre:    cmd=cf003 0:zlfs2-OST0004  1:dev  2:0  3:f
            Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 15b-f: MGC192.168.5.21@o2ib: The configuration from log 'zlfs2-OST0004'failed from the MGS (-22).  Make sure this client and the MGS are running compatible versions of Lustre.
            Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 14037:0:(obd_mount_server.c:1352:server_start_targets()) failed to start server zlfs2-OST0004: -22
            Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 14037:0:(obd_mount_server.c:1844:server_fill_super()) Unable to start targets: -22
            Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 14037:0:(obd_config.c:625:class_cleanup()) Device 3 not setup
            Aug 17 18:10:23 zlfs2-mds2 kernel: Lustre: server umount zlfs2-OST0004 complete
            Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 14037:0:(obd_mount.c:1453:lustre_fill_super()) Unable to mount /dev/nvme0n1 (-22)
            

            People

              Assignee: kit.westneat Kit Westneat (Inactive)
              Reporter: adam.j.roe Adam Roe (Inactive)
              Votes: 0
              Watchers: 6
