[LU-8498] configuration from log 'nodemap' failed (-22)

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.9.0

    Description

      In the current master (2.8.56), a newly created file system fails to mount an OST due to a nodemap log error.

      From the log message:

      00000100:00000001:3.0:1470938259.243549:0:9524:0:(client.c:1052:ptlrpc_set_destroy()) Process leaving
      00000100:00000001:3.0:1470938259.243549:0:9524:0:(client.c:2896:ptlrpc_queue_wait()) Process leaving (rc=0 : 0 : 0)
      10000000:00000001:3.0:1470938259.243551:0:9524:0:(mgc_request.c:1716:mgc_process_recover_nodemap_log()) Process leaving via out (rc=18446744073709551594 : -22 : 0xffffffffffffffea)
      

      It looks like the corresponding log has zero size, which triggered this error.

              if (ealen == 0) { /* no logs transferred */
      #ifdef HAVE_SERVER_SUPPORT
                      /* config changed since first read RPC */
                      if (cld_is_nodemap(cld) && config_read_offset == 0) {
                              recent_nodemap = NULL;
                              nodemap_config_dealloc(new_config);
                              new_config = NULL;
      
                              CDEBUG(D_INFO, "nodemap config changed in transit, retrying\n");
      
                              /* setting eof to false, we request config again */
                              eof = false;
                              GOTO(out, rc = 0);
                      }
      #endif
                      if (!eof)
                              rc = -EINVAL;
                      GOTO(out, rc);
              }
      

      We have a debug log and will attach it soon.
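
      For reference, the three forms printed in that mgc_process_recover_nodemap_log() line are the same 64-bit value: -22 is -EINVAL, which matches the "rc = -EINVAL" branch in the snippet above when a zero-length log arrives without eof being set. A minimal stand-alone sketch of the arithmetic (not Lustre code; it assumes the usual Linux errno numbering, where EINVAL is 22):

              #include <errno.h>
              #include <stdio.h>

              int main(void)
              {
                      long long rc = -EINVAL; /* -22, as in the debug line */

                      /* signed, unsigned and hex forms of the same value */
                      printf("%lld %llu %#llx\n",
                             rc, (unsigned long long)rc, (unsigned long long)rc);
                      /* prints: -22 18446744073709551594 0xffffffffffffffea */
                      return 0;
              }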

      Attachments

        1. log_failedmount_vanilla_master
        2. log.Jinshan
        3. log.mds.LU-8498
        4. mds_lctl_log_afterpatch
        5. oss_lctl_log.afterpatch

        Issue Links

          Activity

            [LU-8498] configuration from log 'nodemap' failed (-22)

            It does indeed appear that I can only connect one OST – when the second OST tries to connect I see this:

            [root@wolf-4 mdisolation_step2_draid_testing]# zpool create -f -o cachefile=none -O recordsize=16MB ost1 draid1 /dev/mapper/mpathl /dev/mapper/mpathk /dev/mapper/mpathj /dev/mapper/mpathi /dev/mapper/mpathr /dev/mapper/mpathm /dev/mapper/mpathb /dev/mapper/mpatha /dev/mapper/mpatho /dev/mapper/mpathn /dev/mapper/mpathq spare '$draid1-0-s0' metadata mirror /dev/mapper/mpathg /dev/mapper/mpathx
            [root@wolf-4 mdisolation_step2_draid_testing]# zpool feature@large_blocks=enabled ost1
            [root@wolf-4 mdisolation_step2_draid_testing]# mkfs.lustre --backfstype=zfs --reformat --fsname=lsdraid --ost --index=1 --mgsnode=192.168.1.5@o2ib ost1/ost1

            Permanent disk data:
            Target: lsdraid:OST0001
            Index: 1
            Lustre FS: lsdraid
            Mount type: zfs
            Flags: 0x62
            (OST first_time update )
            Persistent mount opts:
            Parameters: mgsnode=192.168.1.5@o2ib

            mkfs_cmd = zfs create -o canmount=off -o xattr=sa ost1/ost1
            Writing ost1/ost1 properties
            lustre:version=1
            lustre:flags=98
            lustre:index=1
            lustre:fsname=lsdraid
            lustre:svname=lsdraid:OST0001
            lustre:mgsnode=192.168.1.5@o2ib
            [root@wolf-4 mdisolation_step2_draid_testing]# mount -t lustre ost1/ost1 /mnt/lustre/ost1

            Aug 20 19:55:13 wolf-4 kernel: ZFS: spa metadata allocation class feature not enabled (ddt,dmu,mos)
            Aug 20 19:55:13 wolf-4 kernel: ZFS: vdev_alloc primary 'ddt,dmu,mos'
            Aug 20 19:55:14 wolf-4 kernel: ZFS: vdev_construct_zaps assign primary 'ddt,dmu,mos'
            Aug 20 19:55:27 wolf-4 ntpd_intres[2643]: ntp_intres.request: permission denied
            Aug 20 19:55:27 wolf-4 ntpd_intres[2643]: ntp_intres.request: permission denied
            Aug 20 19:55:41 wolf-4 kernel: LustreError: 41114:0:(mgc_request.c:257:do_config_log_add()) MGC192.168.1.5@o2ib: failed processing log, type 4: rc = -22
            Aug 20 19:55:41 wolf-4 kernel: Lustre: lsdraid-OST0001: new disk, initializing
            Aug 20 19:55:41 wolf-4 kernel: Lustre: srv-lsdraid-OST0001: No data found on store. Initialize space
            Aug 20 19:55:46 wolf-4 kernel: Lustre: lsdraid-OST0001: Connection restored to 192.168.1.5@o2ib (at 192.168.1.5@o2ib)
            Aug 20 19:55:46 wolf-4 kernel: Lustre: Skipped 2 previous similar messages

            [root@wolf-4 mdisolation_step2_draid_testing]# mount |grep lustre
            ost1/ost1 on /mnt/lustre/ost1 type lustre (ro)

            from mds:
            [root@wolf-5 mdisolation_step2_draid_testing]# lctl dl
            0 UP osd-zfs lsdraid-MDT0000-osd lsdraid-MDT0000-osd_UUID 10
            1 UP mgs MGS MGS 9
            2 UP mgc MGC192.168.1.5@o2ib ac271bb5-f2ee-996f-94cf-6c9cd20a078a 5
            3 UP mds MDS MDS_uuid 3
            4 UP lod lsdraid-MDT0000-mdtlov lsdraid-MDT0000-mdtlov_UUID 4
            5 UP mdt lsdraid-MDT0000 lsdraid-MDT0000_UUID 9
            6 UP mdd lsdraid-MDD0000 lsdraid-MDD0000_UUID 4
            7 UP qmt lsdraid-QMT0000 lsdraid-QMT0000_UUID 4
            8 UP lwp lsdraid-MDT0000-lwp-MDT0000 lsdraid-MDT0000-lwp-MDT0000_UUID 5
            9 UP osp lsdraid-OST0000-osc-MDT0000 lsdraid-MDT0000-mdtlov_UUID 5
            10 UP osp lsdraid-OST0001-osc-MDT0000 lsdraid-MDT0000-mdtlov_UUID 5

            from oss1
            [root@wolf-3 mdisolation_step2_draid_testing]# lctl dl
            0 UP osd-zfs lsdraid-OST0000-osd lsdraid-OST0000-osd_UUID 5
            1 UP mgc MGC192.168.1.5@o2ib c2ddb1b6-eda7-06a1-c4c7-7a10c79169f2 5
            2 UP ost OSS OSS_uuid 3
            3 UP obdfilter lsdraid-OST0000 lsdraid-OST0000_UUID 5
            4 UP lwp lsdraid-MDT0000-lwp-OST0000 lsdraid-MDT0000-lwp-OST0000_UUID 5
            [root@wolf-3 mdisolation_step2_draid_testing]#

            from oss2
            [root@wolf-4 mdisolation_step2_draid_testing]# lctl dl
            0 UP osd-zfs lsdraid-OST0001-osd lsdraid-OST0001-osd_UUID 5
            1 UP mgc MGC192.168.1.5@o2ib 9baa6009-12fa-026e-a230-d8562d7b63c7 5
            2 UP ost OSS OSS_uuid 3
            3 UP obdfilter lsdraid-OST0001 lsdraid-OST0001_UUID 5
            4 UP lwp lsdraid-MDT0000-lwp-OST0001 lsdraid-MDT0000-lwp-OST0001_UUID 5

            jsalians_intel John Salinas (Inactive) added a comment

            The good news is that one OST can now connect to the MDS. I saw one odd message when the clients were mounting:

            [26059.015447] libcfs: module verification failed: signature and/or required key missing - tainting kernel
            [26059.030063] LNet: HW CPU cores: 72, npartitions: 8
            [26059.038309] alg: No test for adler32 (adler32-zlib)
            [26059.044985] alg: No test for crc32 (crc32-table)
            [26068.348811] LNet: HW CPU cores: 72, npartitions: 8
            [26068.357409] alg: No test for adler32 (adler32-zlib)
            [26068.364132] alg: No test for crc32 (crc32-table)
            [26076.388483] Lustre: Lustre: Build Version: 2.8.56
            [26076.471053] LNet: Added LNI 192.168.1.6@o2ib [8/512/0/180]
            [26076.536301] LustreError: 44972:0:(mgc_request.c:1533:mgc_apply_recover_logs()) mgc: cannot find uuid by nid 192.168.1.4@o2ib
            [26076.550097] Lustre: 44972:0:(mgc_request.c:1760:mgc_process_recover_nodemap_log()) MGC192.168.1.5@o2ib: error processing recovery log lsdraid-cliir: rc = -2

            192.168.1.4 is OSS2 and 192.168.1.5 is the MDS node (the rc = -2 above is decoded in the sketch after this comment).

            [root@wolf-7 ~]# df -h /mnt/lustre/
            Filesystem Size Used Avail Use% Mounted on
            192.168.1.5@o2ib:/lsdraid 7.0T 16M 7.0T 1% /mnt/lustre

            so we are missing the zpool from OSS2:

            [root@wolf-3 ~]# zpool list
            NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
            ost0 9.97T 21.0M 9.97T - 0% 0% 1.00x ONLINE -
            [root@wolf-3 ~]# zfs list
            NAME USED AVAIL REFER MOUNTPOINT
            ost0 17.8M 7.01T 19K /ost0
            ost0/ost0 17.6M 7.01T 17.6M /ost0/ost0
            [root@wolf-3 ~]# mount |grep lustre
            ost0/ost0 on /mnt/lustre/ost0 type lustre (ro)

            [root@wolf-4 mdisolation_step2_draid_testing]# zpool list
            NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
            ost1 9.97T 846K 9.97T - 0% 0% 1.00x ONLINE -
            [root@wolf-4 mdisolation_step2_draid_testing]# zfs list
            NAME USED AVAIL REFER MOUNTPOINT
            ost1 1.51M 7.01T 19K /ost1
            ost1/ost1 1.37M 7.01T 1.37M /ost1/ost1
            [root@wolf-4 mdisolation_step2_draid_testing]# mount |grep lustre
            ost1/ost1 on /mnt/lustre/ost1 type lustre (ro)
            [25673.466998] ZFS: spa metadata allocation class feature not enabled (ddt,dmu,mos)
            [25673.477813] ZFS: vdev_alloc primary 'ddt,dmu,mos'
            [25674.379121] ZFS: vdev_construct_zaps assign primary 'ddt,dmu,mos'
            [25677.322609] LustreError: 52126:0:(mgc_request.c:257:do_config_log_add()) MGC192.168.1.5@o2ib: failed processing log, type 4: rc = -22
            [25677.342892] Lustre: lsdraid-OST0000: new disk, initializing
            [25677.351849] Lustre: srv-lsdraid-OST0000: No data found on store. Initialize space
            [25677.450718] Lustre: lsdraid-OST0000: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-450
            [26177.093757] LustreError: 11-0: lsdraid-MDT0000-lwp-OST0000: operation obd_ping to node 192.168.1.5@o2ib failed: rc = -107
            [26177.108646] LustreError: Skipped 21 previous similar messages
            [26177.117392] Lustre: lsdraid-MDT0000-lwp-OST0000: Connection to lsdraid-MDT0000 (at 192.168.1.5@o2ib) was lost; in progress operations using this service will wait for recovery to complete
            [26177.140650] Lustre: Skipped 21 previous similar messages
            [26177.149666] LustreError: 167-0: lsdraid-MDT0000-lwp-OST0000: This client was evicted by lsdraid-MDT0000; in progress operations using this service will fail.
            [26177.170128] LustreError: Skipped 21 previous similar messages
            [26177.180193] Lustre: lsdraid-MDT0000-lwp-OST0000: Connection restored to 192.168.1.5@o2ib (at 192.168.1.5@o2ib)
            [26177.194074] Lustre: Skipped 21 previous similar messages

            jsalians_intel John Salinas (Inactive) added a comment - edited
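
            Regarding the "cannot find uuid by nid 192.168.1.4@o2ib" / "rc = -2" lines above: -2 is -ENOENT, which would be consistent with the NID-to-UUID lookup in mgc_apply_recover_logs() simply coming up empty for OSS2 while the lsdraid-cliir recovery log is processed. A toy sketch of that kind of lookup (the table and helper below are hypothetical, not Lustre code; only the NIDs come from the messages above):

                #include <errno.h>
                #include <stdio.h>
                #include <string.h>

                struct nid2uuid { const char *nid; const char *uuid; };

                /* hypothetical table: OSS2's NID (192.168.1.4@o2ib) is simply not present */
                static const struct nid2uuid known[] = {
                        { "192.168.1.5@o2ib", "example-client-uuid" },
                };

                static int uuid_by_nid(const char *nid, const char **uuid)
                {
                        for (size_t i = 0; i < sizeof(known) / sizeof(known[0]); i++) {
                                if (strcmp(known[i].nid, nid) == 0) {
                                        *uuid = known[i].uuid;
                                        return 0;
                                }
                        }
                        return -ENOENT; /* -2, the rc in the recovery-log error */
                }

                int main(void)
                {
                        const char *uuid = NULL;

                        printf("rc = %d\n", uuid_by_nid("192.168.1.4@o2ib", &uuid));
                        return 0;
                }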

            Today I rebuilt with the tip of master plus this patch and was able to get an OSS to mount successfully, using the same scripts:

            # ./setup_oss1.sh
              Restarting network (via systemctl): Job for network.service failed because the control process exited with error code. See "systemctl status network.service" and "journalctl -xe" for details.
              [FAILED]
              modprobe: FATAL: Module lnet is in use.
              modprobe: FATAL: Module zfs is in use.
              pool: ost0
              state: ONLINE
              scan: none requested
              config:

            NAME STATE READ WRITE CKSUM
            ost0 ONLINE 0 0 0
            draid1-0

            {any}

            ONLINE 0 0 0
            mpathaj ONLINE 0 0 0
            mpathai ONLINE 0 0 0
            mpathah ONLINE 0 0 0
            mpathag ONLINE 0 0 0
            mpathaq ONLINE 0 0 0
            mpathap ONLINE 0 0 0
            mpathak ONLINE 0 0 0
            mpathz ONLINE 0 0 0
            mpatham ONLINE 0 0 0
            mpathal ONLINE 0 0 0
            mpathao ONLINE 0 0 0
            mirror-1

            {ddt,dmu,mos}

            ONLINE 0 0 0
            mpathae ONLINE 0 0 0
            mpathaa ONLINE 0 0 0
            spares
            $draid1-0-s0 AVAIL

            errors: No known data errors
            ost0 feature@large_blocks enabled local

            Permanent disk data:
            Target: lsdraid-OST0000
            Index: 0
            Lustre FS: lsdraid
            Mount type: zfs
            Flags: 0x42
            (OST update )
            Persistent mount opts:
            Parameters: mgsnode=192.168.1.5@o2ib

            mkfs_cmd = zfs create -o canmount=off -o xattr=sa ost0/ost0
            Writing ost0/ost0 properties
            lustre:version=1
            lustre:flags=66
            lustre:index=0
            lustre:fsname=lsdraid
            lustre:svname=lsdraid-OST0000
            lustre:mgsnode=192.168.1.5@o2ib

            # mount |grep ost0
              ost0 on /ost0 type zfs (rw,xattr,noacl)
              ost0/ost0 on /mnt/lustre/ost0 type lustre (ro)

            I will install the rest of the system, and if we can get all of the nodes up I will mark this as fixed – but we will need this fix merged.

            jsalians_intel John Salinas (Inactive) added a comment

            I would be surprised if there were a timing issue – I run the scripts by hand, first on the MDS and then, once it is done, on OSS1, and I try it several times by hand to capture debug logs. This process has worked for months and months until nodemap came along. However, that doesn't mean there isn't another issue here.

            jsalians_intel John Salinas (Inactive) added a comment

            It looks like the nodemap config loaded properly, but there was an error in registration:
            On MDS:

            20000000:02020000:21.0:1471449182.711299:0:40275:0:(mgs_handler.c:297:mgs_check_target()) 13b-9: lsdraid-OST0000 claims to have registered, but this MGS does not know about it, preventing registration.
            

            On OSS:

            10000000:01000000:37.0:1471448892.078388:0:48119:0:(mgc_request.c:2086:mgc_process_log()) MGC192.168.1.5@o2ib: configuration from log 'nodemap' succeeded (0).
            

            Is it possible there's some kind of timing issue between reformatting the OSSes and the MDS?

            kit.westneat Kit Westneat (Inactive) added a comment
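
            One more data point from the mkfs.lustre output quoted in this ticket: the lsdraid-OST0000 format shows Flags: 0x42 (OST update), i.e. without the first_time bit, while the freshly re-formatted MDT/MGS shows 0x65 (MDT MGS first_time update). A target without first_time presents itself to the MGS as already registered, which looks consistent with the 13b-9 refusal quoted above. A small sketch decoding those flag values (the bit values are inferred from the strings mkfs.lustre itself printed; the real LDD_F_* names and values in lustre_disk.h should be double-checked):

                #include <stdio.h>

                /*
                 * Bit values inferred from the Flags lines in this ticket:
                 * 0x65 = "MDT MGS first_time update", 0x62 = "OST first_time update",
                 * 0x42 = "OST update".
                 */
                #define SV_TYPE_MDT   0x0001
                #define SV_TYPE_OST   0x0002
                #define SV_TYPE_MGS   0x0004
                #define F_FIRST_TIME  0x0020  /* target has never registered with the MGS */
                #define F_UPDATE      0x0040

                static void decode(unsigned int flags)
                {
                        printf("%#04x:%s%s%s%s%s\n", flags,
                               flags & SV_TYPE_MDT  ? " MDT" : "",
                               flags & SV_TYPE_MGS  ? " MGS" : "",
                               flags & SV_TYPE_OST  ? " OST" : "",
                               flags & F_FIRST_TIME ? " first_time" : "",
                               flags & F_UPDATE     ? " update" : "");
                }

                int main(void)
                {
                        decode(0x65); /* fresh MDT/MGS (zfspool/mdt1) */
                        decode(0x62); /* OST0001 formatted with --reformat */
                        decode(0x42); /* OST0000 on the existing pool, no first_time */
                        return 0;
                }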

            Please see:
            11192890 Aug 17 10:50 oss_lctl_log.afterpatch
            8799197 Aug 17 10:54 mds_lctl_log_afterpatch

            Let me know if you need anything else.

            jsalians_intel John Salinas (Inactive) added a comment

            Can you post the MDS and OSS logs from that time period?

            kit.westneat Kit Westneat (Inactive) added a comment

            Build #148 (Aug 17, 2016 1:01:09 PM)

            Changes: LU-8498 nodemap: new zfs index files not properly initialized
            Manually triggered by user johnsali for Gerrit: http://review.whamcloud.com/21939 in silent mode
            Revision: 6ddcac0c807ffa7c847aef5faf46b6643d0ae538

            $ cat reprovision.sh
            date >> ljb_log
            loadjenkinsbuild -b 148 -j zfs-c-p-lustre-patch-vanilla -t server -p test-el7.2-x86_64 -d el7.2 -a x86_64 -v -n wolf-4 -r >> ljb_log
            loadjenkinsbuild -b 148 -j zfs-c-p-lustre-patch-vanilla -t server -p test-el7.2-x86_64 -d el7.2 -a x86_64 -v -n wolf-5 -r >> ljb_log
            loadjenkinsbuild -b 148 -j zfs-c-p-lustre-patch-vanilla -t server -p test-el7.2-x86_64 -d el7.2 -a x86_64 -v -n wolf-3 -r >> ljb_log

            loadjenkinsbuild -b 148 -j zfs-c-p-lustre-patch-vanilla -t client -p test-el7.2-x86_64 -d el7.2 -a x86_64 -v -n wolf-6 -r >> ljb_log
            loadjenkinsbuild -b 148 -j zfs-c-p-lustre-patch-vanilla -t client -p test-el7.2-x86_64 -d el7.2 -a x86_64 -v -n wolf-7 -r >> ljb_log
            loadjenkinsbuild -b 148 -j zfs-c-p-lustre-patch-vanilla -t client -p test-el7.2-x86_64 -d el7.2 -a x86_64 -v -n wolf-8 -r >> ljb_log
            loadjenkinsbuild -b 148 -j zfs-c-p-lustre-patch-vanilla -t client -p test-el7.2-x86_64 -d el7.2 -a x86_64 -v -n wolf-9 -r >> ljb_log
            $ ./reprovision.sh

            MDS 0
            Restarting network (via systemctl): Job for network.service failed because the control process exited with error code. See "systemctl status network.service" and "journalctl -xe" for details.

            Permanent disk data:
            Target: lsdraid:MDT0000
            Index: 0
            Lustre FS: lsdraid
            Mount type: zfs
            Flags: 0x65
            (MDT MGS first_time update )
            Persistent mount opts:
            Parameters:

            checking for existing Lustre data: not found
            mkfs_cmd = zpool create -f -O canmount=off zfspool mirror /dev/sdb /dev/sdc mirror /dev/sdd /dev/sde
            mkfs_cmd = zfs create -o canmount=off -o xattr=sa zfspool/mdt1
            Writing zfspool/mdt1 properties
            lustre:version=1
            lustre:flags=101
            lustre:index=0
            lustre:fsname=lsdraid
            lustre:svname=lsdraid:MDT0000

            [ 3213.129550] ZFS: Loaded module v0.6.5-1, ZFS pool version 5000, ZFS filesystem version 5
            [ 3213.395985] sdb:
            [ 3213.540905] sdb: sdb1 sdb9
            [ 3213.707262] sdb: sdb1 sdb9
            [ 3213.844325] sdc:
            [ 3213.914772] sdc: sdc1 sdc9
            [ 3214.106819] sdc: sdc1 sdc9
            [ 3214.251383] sdd:
            [ 3214.295644] sdd: sdd1 sdd9
            [ 3214.463067] sdd: sdd1 sdd9
            [ 3214.600756] sde:
            [ 3214.653600] sde: sde1 sde9
            [ 3214.862012] sde: sde1 sde9
            [ 3215.053045] SPL: using hostid 0x00000000
            [ 3216.449252] Lustre: MGS: Connection restored to ee5079dc-35fc-2fc6-00a3-b0600c8552a3 (at 0@lo)
            [ 3216.844470] Lustre: ctl-lsdraid-MDT0000: No data found on store. Initialize space
            [ 3216.885787] Lustre: lsdraid-MDT0000: new disk, initializing
            [ 3217.429235] Lustre: ctl-lsdraid-MDT0000: super-sequence allocation rc = 0 [0x0000000200000400-0x0000000240000400]:0:mdt
            [ 3242.250332] Lustre: lsdraid-MDT0000: Connection restored to ee5079dc-35fc-2fc6-00a3-b0600c8552a3 (at 0@lo)
            [ 3385.282575] Lustre: MGS: Connection restored to 03723fcc-1218-74ec-5511-35a6a3f56a98 (at 192.168.1.3@o2ib)

            OSS 1
            Restarting network (via systemctl): Job for network.service failed because the control process exited with error code. See "systemctl status network.service" and "journalctl -xe" for details.
            [FAILED]
            pool: ost0
            state: ONLINE
            scan: none requested
            config:

            NAME STATE READ WRITE CKSUM
            ost0 ONLINE 0 0 0
            draid1-0

            {any}

            ONLINE 0 0 0
            mpathaj ONLINE 0 0 0
            mpathai ONLINE 0 0 0
            mpathah ONLINE 0 0 0
            mpathag ONLINE 0 0 0
            mpathaq ONLINE 0 0 0
            mpathap ONLINE 0 0 0
            mpathak ONLINE 0 0 0
            mpathz ONLINE 0 0 0
            mpatham ONLINE 0 0 0
            mpathal ONLINE 0 0 0
            mpathao ONLINE 0 0 0
            mirror-1

            {ddt,dmu,mos}

            ONLINE 0 0 0
            mpathae ONLINE 0 0 0
            mpathaa ONLINE 0 0 0
            spares
            $draid1-0-s0 AVAIL

            errors: No known data errors
            ost0 feature@large_blocks enabled local

            Permanent disk data:
            Target: lsdraid-OST0000
            Index: 0
            Lustre FS: lsdraid
            Mount type: zfs
            Flags: 0x42
            (OST update )
            Persistent mount opts:
            Parameters: mgsnode=192.168.1.5@o2ib

            mkfs_cmd = zfs create -o canmount=off -o xattr=sa ost0/ost0
            Writing ost0/ost0 properties
            lustre:version=1
            lustre:flags=66
            lustre:index=0
            lustre:fsname=lsdraid
            lustre:svname=lsdraid-OST0000
            lustre:mgsnode=192.168.1.5@o2ib
            mount.lustre: mount ost0/ost0 at /mnt/lustre/ost0 failed: No such file or directory
            Is the MGS specification correct?
            Is the filesystem name correct?
            If upgrading, is the copied client log valid? (see upgrade docs)

            [ 3359.353412] ZFS: Unloaded module v0.6.5-1
            [ 3359.365204] SPL: Unloaded module v0.6.5-1
            [ 3359.383985] SPL: Loaded module v0.6.5-1
            [ 3360.922575] ZFS: Loaded module v0.6.5-1, ZFS pool version 5000, ZFS filesystem version 5
            [ 3362.920084] ZFS: spa metadata allocation class feature not enabled (ddt,dmu,mos)
            [ 3362.929898] ZFS: vdev_alloc primary 'ddt,dmu,mos'
            [ 3363.067121] SPL: using hostid 0x00000000
            [ 3363.778060] ZFS: vdev_construct_zaps assign primary 'ddt,dmu,mos'
            [ 3366.736766] LustreError: 13a-8: Failed to get MGS log lsdraid-OST0000 and no local copy.
            [ 3366.746709] LustreError: 15c-8: MGC192.168.1.5@o2ib: The configuration from log 'lsdraid-OST0000' failed (-2). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
            [ 3366.775097] LustreError: 47845:0:(obd_mount_server.c:1352:server_start_targets()) failed to start server lsdraid-OST0000: -2
            [ 3366.788532] LustreError: 47845:0:(obd_mount_server.c:1844:server_fill_super()) Unable to start targets: -2
            [ 3366.800271] LustreError: 47845:0:(obd_mount_server.c:1558:server_put_super()) no obd lsdraid-OST0000
            [ 3367.038876] Lustre: server umount lsdraid-OST0000 complete
            [ 3367.046063] LustreError: 47845:0:(obd_mount.c:1453:lustre_fill_super()) Unable to mount (-2)

            jsalians_intel John Salinas (Inactive) added a comment
            jsalians_intel John Salinas (Inactive) added a comment - edited

            It looks like the latest patch failed two tests:

            == sanity test 101f: check mmap read performance ===================================================== 15:55:08 (1471388108)
            /usr/bin/iozone
            Cancel LRU locks on lustre client to flush the client cache
            Reset readahead stats
            mmap read the file with small block size
            checking missing pages
            sanity test_101f: @@@@@@ FAIL: misses too much pages!
            Trace dump:
            = /usr/lib64/lustre/tests/test-framework.sh:4817:error()
            = /usr/lib64/lustre/tests/sanity.sh:6671:test_101f()
            = /usr/lib64/lustre/tests/test-framework.sh:5081:run_one()
            = /usr/lib64/lustre/tests/test-framework.sh:5120:run_one_logged()
            = /usr/lib64/lustre/tests/test-framework.sh:4967:run_test()
            = /usr/lib64/lustre/tests/sanity.sh:6674:main()

            Here is the debug log: https://testing.hpdd.intel.com/test_logs/de8292ec-6437-11e6-906c-5254006e85c2/show_text
            ...etc...
            00000080:00200000:1.0:1471388110.672492:0:19130:0:(rw26.c:779:ll_write_end()) queued page: 255.
            00000080:00200000:1.0:1471388110.672493:0:19130:0:(rw26.c:647:ll_write_begin()) Writing 32767 of 0 to 4096 bytes
            00000080:00200000:1.0:1471388110.672496:0:19130:0:(rw26.c:779:ll_write_end()) page@ffff8805daa07a00[3 ffff8805fbf98740 1 1 ffff8805fd1a1b78]

            00000080:00200000:1.0:1471388110.672497:0:19130:0:(rw26.c:779:ll_write_end()) vvp-page@ffff8805daa07a50(0:0) vm@ffffea00179166c0 2fffff00000805 3:0 ffff8805daa07a00 32767 lru

            00000080:00200000:1.0:1471388110.672498:0:19130:0:(rw26.c:779:ll_write_end()) lov-page@ffff8805daa07a90, raid0

            00000080:00200000:1.0:1471388110.672501:0:19130:0:(rw26.c:779:ll_write_end()) osc-page@ffff8805daa07af8 32767: 1< 0x845fed 0 0 - - > 2< 134213632 0 0 0x0 0x100 | (null) ffff8805fda93270 ffff8805fbec04c0 > 3< 0 0 0 > 4< 0 0 8 33816576 - | - - - - > 5< - - - - | 0 - | 0 - ->

            00000080:00200000:1.0:1471388110.672502:0:19130:0:(rw26.c:779:ll_write_end()) end page@ffff8805daa07a00

            00000080:00200000:1.0:1471388110.672502:0:19130:0:(rw26.c:779:ll_write_end()) queued page: 256.
            00000080:00200000:1.0:1471388110.672503:0:19130:0:(vvp_io.c:902:vvp_io_write_commit()) commit async pages: 256, from 0, to 4096
            00000008:00010000:1.0:1471388110.672509:0:19130:0:(osc_cache.c:816:osc_extent_find()) ### extent: ffff8805f7e7cfc0 ns: lustre-OST0001-osc-ffff880bf93c0000 lock: ffff8805fc0c3e00/0x96bbf1af18b6cddd lrc: 4/0,1 mode: PW/PW res: [0x1925:0x0:0x0].0x0 rrc: 1 type: EXT [0->18446744073709551615] (req 0->1048575) flags: 0x20000000000 nid: local remote: 0x90fceb1b494878f5 expref: -99 pid: 19130 timeout: 0 lvb_type: 1
            00000008:00010000:1.0:1471388110.672528:0:19130:0:(osc_cache.c:2395:osc_queue_async_io()) ### extent: ffff8805f7e7cfc0 ns: lustre-OST0001-osc-ffff880bf93c0000 lock: ffff8805fc0c3e00/0x96bbf1af18b6cddd lrc: 4/0,1 mode: PW/PW res: [0x1925:0x0:0x0].0x0 rrc: 1 type: EXT [0->18446744073709551615] (req 0->1048575) flags: 0x20000000000 nid: local remote: 0x90fceb1b494878f5 expref: -99 pid: 19130 timeout: 0 lvb_type: 1
            00000008:00010000:1.0:1471388110.672545:0:19130:0:(osc_cache.c:2395:osc_queue_async_io()) ### extent: ffff8805f7e7cfc0 ns: lustre-OST0001-osc-ffff880bf93c0000 lock: ffff8805fc0c3e00/0x96bbf1af18b6cddd lrc: 4/0,1 mode: PW/PW res: [0x1925:0x0:0x0].0x0 rrc: 1 type: EXT [0->18446744073709551615] (req 0->1048575) flags: 0x20000000000 nid: local remote: 0x90fceb1b494878f5 expref: -99 pid: 19130 timeout: 0 lvb_type: 1
            00000008:00010000:1.0:1471388110.672561:0:19130:0:(osc_cache.c:2395:osc_queue_async_io()) ### extent: ffff8805f7e7cfc0 ns: lustre-OST0001-osc-ffff880bf93c0000 lock: ffff8805fc0c3e00/0x96bbf1af18b6cddd lrc: 4/0,1 mode: PW/PW res: [0x1925:0x0:0x0].0x0 rrc: 1 type: EXT [0->18446744073709551615] (req 0->1048575) flags: 0x20000000000 nid: local remote: 0x90fceb1b494878f5 expref: -99 pid: 19130 timeout: 0 lvb_type: 1
            00000008:00010000:1.0:1471388110.672578:0:19130:0:(osc_cache.c:2395:osc_queue_async_io()) ### extent: ffff8805f7e7cfc0 ns: lustre-OST0001-osc-ffff880bf93c0000 lock: ffff8805fc0c3e00/0x96bbf1af18b6cddd lrc: 4/0,1 mode: PW/PW res: [0x1925:0x0:0x0].0x0 rrc: 1 type: EXT [0->18446744073709551615] (req 0->1048575) flags: 0x20000000000 nid: local remote: 0x90fceb1b494878f5 expref: -99 pid: 19130 timeout: 0 lvb_type: 1
            00000008:00010000:1.0:1471388110.672594:0:19130:0:(osc_cache.c:2395:osc_queue_async_io()) ### extent: ffff8805f7e7cfc0 ns: lustre-OST0001-osc-ffff880bf93c0000 lock: ffff8805fc0c3e00/0x96bbf1af18b6cddd lrc: 4/0,1 mode: PW/PW res: [0x1925:0x0:0x0].0x0 rrc: 1 type: EXT [0->18446744073709551615] (req 0->1048575) flags: 0x20000000000 nid: local remote: 0x90fceb1b494878f5 expref: -99 pid: 19130 timeout: 0 lvb_type: 1
            00000008:00010000:1.0:1471388110.672610:0:19130:0:(osc_cache.c:2395:osc_queue_async_io()) ### extent: ffff8805f7e7cfc0 ns: lustre-OST0001-osc-ffff880bf93c0000 lock: ffff8805fc0c3e00/0x96bbf1af18b6cddd lrc: 4/0,1 mode: PW/PW res: [0x1925:0x0:0x0].0x0 rrc: 1 type: EXT [0->18446744073709551615] (req 0->1048575) flags: 0x20000000000 nid: local remote: 0x90fceb1b494878f5 expref: -99 pid: 19130 timeout: 0 lvb_type: 1
            00000008:00010000:1.0:1471388110.672626:0:19130:0:(osc_cache.c:2395:osc_queue_async_io()) ### extent: ffff8805f7e7cfc0 ns: lustre-OST0001-osc-ffff880bf93c0000 lock: ffff8805fc0c3e00/0x96bbf1af18b6cddd lrc: 4/0,1 mode: PW/PW res: [0x1925:0x0:0x0].0x0 rrc: 1 type: EXT [0->18446744073709551615] (req 0->1048575) flags: 0x20000000000 nid: local remote: 0x90fceb1b494878f5 expref: -99 pid: 19130 timeout: 0 lvb_type: 1
            00000080:00200000:1.0:1471388110.672643:0:19130:0:(vvp_io.c:924:vvp_io_write_commit()) Committed 256 pages 1048576 bytes, tot: 1048576
            00000080:00200000:1.0:1471388110.672644:0:19130:0:(file.c:1036:ll_merge_attr()) [0x2000013a0:0xe9:0x0] updating i_size 134217728
            00000080:00200000:1.0:1471388110.672645:0:19130:0:(vvp_io.c:1042:vvp_io_write_start()) write: nob 1048576, result: 1048576
            00000020:00010000:1.0:1471388110.672650:0:19130:0:(cl_lock.c:58:cl_lock_trace0()) release lock: ffff880617cc4eb8 (ffff8805fd246998/0) at cl_lock_release():237
            00000020:00010000:1.0:1471388110.672651:0:19130:0:(cl_lock.c:58:cl_lock_trace0()) cancel lock: ffff880617cc4eb8 (ffff8805fd246998/0) at cl_lock_cancel():154
            00000100:00100000:0.0:1471388110.672652:0:6896:0:(ptlrpcd.c:411:ptlrpcd_check()) transfer 1 async RPCs [1->0]
            00000020:00010000:1.0:1471388110.672653:0:19130:0:(cl_lock.c:58:cl_lock_trace0()) cancel lock: ffff880619f20a30 (ffff880608208758/1) at cl_lock_cancel():154
            00010000:00010000:1.0:1471388110.672654:0:19130:0:(ldlm_lock.c:800:ldlm_lock_decref_internal_nolock()) ### ldlm_lock_decref(PW) ns: lustre-OST0001-osc-ffff880bf93c0000 lock: ffff8805fc0c3e00/0x96bbf1af18b6cddd lrc: 5/0,1 mode: PW/PW res: [0x1925:0x0:0x0].0x0 rrc: 1 type: EXT [0->18446744073709551615] (req 0->1048575) flags: 0x10020000000000 nid: local remote: 0x90fceb1b494878f5 expref: -99 pid: 19130 timeout: 0 lvb_type: 1
            00000008:00010000:0.0:1471388110.672655:0:6896:0:(osc_cache.c:1097:osc_extent_make_ready()) ### extent: ffff8805f7e7cfc0 ns: lustre-OST0001-osc-ffff880bf93c0000 lock: ffff8805fc0c3e00/0x96bbf1af18b6cddd lrc: 5/0,1 mode: PW/PW res: [0x1925:0x0:0x0].0x0 rrc: 1 type: EXT [0->18446744073709551615] (req 0->1048575) flags: 0x10020000000000 nid: local remote: 0x90fceb1b494878f5 expref: -99 pid: 19130 timeout: 0 lvb_type: 1
            00010000:00010000:1.0:1471388110.672657:0:19130:0:(ldlm_lock.c:873:ldlm_lock_decref_internal()) ### add lock into lru list ns: lustre-OST0001-osc-ffff880bf93c0000 lock: ffff8805fc0c3e00/0x96bbf1af18b6cddd lrc: 4/0,0 mode: PW/PW res: [0x1925:0x0:0x0].0x0 rrc: 1 type: EXT [0->18446744073709551615] (req 0->1048575) flags: 0x10020000000000 nid: local remote: 0x90fceb1b494878f5 expref: -99 pid: 19130 timeout: 0 lvb_type: 1
            00000020:00010000:1.0:1471388110.672660:0:19130:0:(cl_lock.c:58:cl_lock_trace0()) destroy lock: ffff880617cc4eb8 (ffff8805fd246998/0) at cl_lock_fini():89
            00000020:00010000:1.0:1471388110.672661:0:19130:0:(cl_lock.c:58:cl_lock_trace0()) destroy lock: ffff880619f20a30 (ffff8805fd246998/1) at cl_lock_fini():89
            00000080:00200000:1.0:1471388110.672663:0:19130:0:(file.c:1169:ll_file_io_generic()) Range unlock [32512, 32767]
            00000080:00200000:1.0:1471388110.672664:0:19130:0:(vvp_io.c:312:vvp_io_fini()) [0x2000013a0:0xe9:0x0] ignore/verify layout 0/0, layout version 0 restore needed 0
            00000080:00200000:1.0:1471388110.672665:0:19130:0:(file.c:1219:ll_file_io_generic()) iot: 1, result: 1048576
            00000080:00200000:1.0:1471388110.672671:0:19130:0:(file.c:2933:ll_fsync()) VFS Op:inode=[0x2000013a0:0xe9:0x0](ffff8805fbddda90)
            00000080:00200000:1.0:1471388110.672674:0:19130:0:(vvp_io.c:1373:vvp_io_init()) [0x2000013a0:0xe9:0x0] ignore/verify layout 0/0, layout version 0 restore needed 0
            00020000:00200000:1.0:1471388110.672677:0:19130:0:(lov_io.c:417:lov_io_iter_init()) shrink: 0 [0, 18446744073709551615)
            00010000:00010000:0.0:1471388110.672740:0:6896:0:(ldlm_lock.c:1474:ldlm_lock_match()) ### matched (133169152 133173247) ns: lustre-OST0001-osc-ffff880bf93c0000 lock: ffff8805fc0c3e00/0x96bbf1af18b6cddd lrc: 3/0,0 mode: PW/PW res: [0x1925:0x0:0x0].0x0 rrc: 1 type: EXT [0->18446744073709551615] (req 0->1048575) flags: 0x20000000000 nid: local remote: 0x90fceb1b494878f5 expref: -99 pid: 19130 timeout: 0 lvb_type: 1
            00000008:00100000:0.0:1471388110.672866:0:6896:0:(osc_request.c:1315:osc_brw_prep_request()) brw rpc ffff8805f6020900 - object 0x0:6437 offset 133169152<>134217728
            00000100:00100000:2.0:1471388110.672906:0:6898:0:(client.c:1620:ptlrpc_send_new_req()) Sending RPC pname:cluuid:pid:xid:nid:opc ptlrpcd_00_02:8db7504e-a9aa-cce3-6ec0-a9745ceb5f0c:6898:1542858500576320:192.168.5.153@o2ib:4
            00000008:00010000:2.0:1471388110.674148:0:6898:0:(osc_cache.c:857:osc_extent_finish()) ### extent: ffff8805f7e7cfc0 ns: lustre-OST0001-osc-ffff880bf93c0000 lock: ffff8805fc0c3e00/0x96bbf1af18b6cddd lrc: 2/0,0 mode: PW/PW res: [0x1925:0x0:0x0].0x0 rrc: 1 type: EXT [0->18446744073709551615] (req 0->1048575) flags: 0x20000000000 nid: local remote: 0x90fceb1b494878f5 expref: -99 pid: 19130 timeout: 0 lvb_type: 1
            00000008:00010000:2.0:1471388110.674232:0:6898:0:(osc_cache.c:489:__osc_extent_remove()) ### extent: ffff8805f7e7cfc0 ns: lustre-OST0001-osc-ffff880bf93c0000 lock: ffff8805fc0c3e00/0x96bbf1af18b6cddd lrc: 2/0,0 mode: PW/PW res: [0x1925:0x0:0x0].0x0 rrc: 1 type: EXT [0->18446744073709551615] (req 0->1048575) flags: 0x20000000000 nid: local remote: 0x90fceb1b494878f5 expref: -99 pid: 19130 timeout: 0 lvb_type: 1
            00000100:00100000:2.0:1471388110.674239:0:6898:0:(client.c:2043:ptlrpc_check_set()) Completed RPC pname:cluuid:pid:xid:nid:opc ptlrpcd_00_02:8db7504e-a9aa-cce3-6ec0-a9745ceb5f0c:6898:1542858500576320:192.168.5.153@o2ib:4
            00000080:00200000:1.0:1471388110.674265:0:19130:0:(vvp_io.c:312:vvp_io_fini()) [0x2000013a0:0xe9:0x0] ignore/verify layout 0/0, layout version 0 restore needed 0
            00000100:00100000:1.0:1471388110.674286:0:19130:0:(client.c:1620:ptlrpc_send_new_req()) Sending RPC pname:cluuid:pid:xid:nid:opc iozone:8db7504e-a9aa-cce3-6ec0-a9745ceb5f0c:19130:1542858500576336:192.168.5.144@o2ib:44
            00000100:00100000:1.0:1471388110.674303:0:19130:0:(client.c:2333:ptlrpc_set_wait()) set ffff88061bd7a900 going to sleep for 6 seconds
            00000100:00100000:1.0:1471388110.917462:0:19130:0:(client.c:2043:ptlrpc_check_set()) Completed RPC pname:cluuid:pid:xid:nid:opc iozone:8db7504e-a9aa-cce3-6ec0-a9745ceb5f0c:19130:1542858500576336:192.168.5.144@o2ib:44
            00000080:00200000:1.0:1471388110.917471:0:19130:0:(vvp_io.c:1373:vvp_io_init()) [0x2000013a0:0xe9:0x0] ignore/verify layout 0/0, layout version 0 restore needed 0
            00020000:00200000:1.0:1471388110.917475:0:19130:0:(lov_io.c:417:lov_io_iter_init()) shrink: 0 [0, 9223372036854775807)
            00000100:00100000:2.0:1471388110.917514:0:6898:0:(ptlrpcd.c:411:ptlrpcd_check()) transfer 1 async RPCs [3->2]
            00000100:00100000:2.0:1471388110.917520:0:6898:0:(client.c:1620:ptlrpc_send_new_req()) Sending RPC pname:cluuid:pid:xid:nid:opc ptlrpcd_00_02:8db7504e-a9aa-cce3-6ec0-a9745ceb5f0c:6898:1542858500576352:192.168.5.153@o2ib:16
            00000100:00100000:2.0:1471388112.175215:0:6898:0:(client.c:2687:ptlrpc_free_committed()) lustre-OST0001-osc-ffff880bf93c0000: committing for last_committed 21474841962 gen 1
            00000100:00100000:2.0:1471388112.176550:0:6898:0:(client.c:2043:ptlrpc_check_set()) Completed RPC pname:cluuid:pid:xid:nid:opc ptlrpcd_00_02:8db7504e-a9aa-cce3-6ec0-a9745ceb5f0c:6898:1542858500576352:192.168.5.153@o2ib:16
            00000080:00200000:1.0:1471388112.176597:0:19130:0:(vvp_io.c:312:vvp_io_fini()) [0x2000013a0:0xe9:0x0] ignore/verify layout 0/0, layout version 0 restore needed 0
            00000080:00200000:1.0:1471388112.176613:0:19130:0:(file.c:323:ll_file_release()) VFS Op:inode=[0x2000013a0:0xe9:0x0](ffff8805fbddda90)
            00000002:00080000:1.0:1471388112.176641:0:19130:0:(mdc_request.c:782:mdc_close()) @@@ matched open req@ffff8800b4abb900 x1542858500572144/t21474859659(21474859659) o101->lustre-MDT0000-mdc-ffff880bf93c0000@192.168.5.144@o2ib:12/10 lens 816/600 e 0 to 0 dl 1471388117 ref 1 fl Complete:RP/4/0 rc 0/0
            00000100:00100000:1.0:1471388112.176666:0:19130:0:(client.c:1620:ptlrpc_send_new_req()) Sending RPC pname:cluuid:pid:xid:nid:opc iozone:8db7504e-a9aa-cce3-6ec0-a9745ceb5f0c:19130:1542858500576368:192.168.5.144@o2ib:35
            00000100:00100000:1.0:1471388112.176700:0:19130:0:(client.c:2333:ptlrpc_set_wait()) set ffff88061bd7a900 going to sleep for 6 seconds
            00000100:00100000:1.0:1471388112.177257:0:19130:0:(client.c:2043:ptlrpc_check_set()) Completed RPC pname:cluuid:pid:xid:nid:opc iozone:8db7504e-a9aa-cce3-6ec0-a9745ceb5f0c:19130:1542858500576368:192.168.5.144@o2ib:35
            00000002:00100000:1.0:1471388112.177274:0:19130:0:(mdc_request.c:710:mdc_free_open()) @@@ free open request rq_replay= 0
            req@ffff8800b4abb900 x1542858500572144/t21474859659(21474859659) o101->lustre-MDT0000-mdc-ffff880bf93c0000@192.168.5.144@o2ib:12/10 lens 816/600 e 0 to 0 dl 1471388117 ref 1 fl Complete:R/4/0 rc 0/0
            00010000:00010000:17.0:1471388112.181769:0:19132:0:(ldlm_resource.c:323:lprocfs_lru_size_seq_write()) dropping all unused locks from namespace lustre-OST0000-osc-ffff880bf93c0000
            00010000:00010000:17.0:1471388112.181823:0:19132:0:(ldlm_resource.c:323:lprocfs_lru_size_seq_write()) dropping all unused locks from namespace lustre-OST0001-osc-ffff880bf93c0000
            00010000:00010000:7.0:1471388112.181886:0:13815:0:(ldlm_request.c:1115:ldlm_cli_cancel_local()) ### client-side cancel ns: lustre-OST0001-osc-ffff880bf93c0000 lock: ffff8805fc0c3e00/0x96bbf1af18b6cddd lrc: 2/0,0 mode: PW/PW res: [0x1925:0x0:0x0].0x0 rrc: 1 type: EXT [0->18446744073709551615] (req 0->1048575) flags: 0x28400000000 nid: local remote: 0x90fceb1b494878f5 expref: -99 pid: 19130 timeout: 0 lvb_type: 1
            00000080:00200000:7.0:1471388112.181908:0:13815:0:(vvp_io.c:1373:vvp_io_init()) [0x2000013a0:0xe9:0x0] ignore/verify layout 1/0, layout version 0 restore needed 0
            00000080:00200000:7.0:1471388112.242363:0:13815:0:(vvp_io.c:312:vvp_io_fini()) [0x2000013a0:0xe9:0x0] ignore/verify layout 1/0, layout version 0 restore needed 0
            00010000:00010000:7.0:1471388112.242395:0:13815:0:(ldlm_request.c:1174:ldlm_cancel_pack()) ### packing ns: lustre-OST0001-osc-ffff880bf93c0000 lock: ffff8805fc0c3e00/0x96bbf1af18b6cddd lrc: 1/0,0 mode: -/PW res: [0x1925:0x0:0x0].0x0 rrc: 1 type: EXT [0->18446744073709551615] (req 0>1048575) flags: 0x4869400000000 nid: local remote: 0x90fceb1b494878f5 expref: -99 pid: 19130 timeout: 0 lvb_type: 1
            00010000:00010000:7.0:1471388112.242402:0:13815:0:(ldlm_request.c:1178:ldlm_cancel_pack()) 1 locks packed
            00000100:00100000:7.0:1471388112.242414:0:13815:0:(client.c:1620:ptlrpc_send_new_req()) Sending RPC pname:cluuid:pid:xid:nid:opc ldlm_bl_13:8db7504e-a9aa-cce3-6ec0-a9745ceb5f0c:13815:1542858500576384:192.168.5.153@o2ib:103
            00000100:00100000:7.0:1471388112.242451:0:13815:0:(client.c:2333:ptlrpc_set_wait()) set ffff8805fd730e40 going to sleep for 6 seconds
            00000100:00100000:7.0:1471388112.242942:0:13815:0:(client.c:2043:ptlrpc_check_set()) Completed RPC pname:cluuid:pid:xid:nid:opc ldlm_bl_13:8db7504e-a9aa-cce3-6ec0-a9745ceb5f0c:13815:1542858500576384:192.168.5.153@o2ib:103
            ...etc...
            00000020:00010000:6.0:1471388113.593184:0:19136:0:(cl_lock.c:58:cl_lock_trace0()) destroy lock: ffff880c1bbe8030 (ffff8806082087e8/1) at cl_lock_fini():89
            00000080:00200000:6.0:1471388113.593188:0:19136:0:(vvp_io.c:312:vvp_io_fini()) [0x2000013a0:0xe9:0x0] ignore/verify layout 0/1, layout version 0 restore needed 0
            00000080:00200000:6.0:1471388113.593207:0:19136:0:(file.c:3410:__ll_inode_revalidate()) VFS Op:inode=[0x200000007:0x1:0x0](ffff880bf55fc410),name=/
            00000002:00010000:6.0:1471388113.593212:0:19136:0:(mdc_locks.c:1083:mdc_intent_lock()) (name: ,[0x200000007:0x1:0x0]) in obj [0x200000007:0x1:0x0], intent: lookup flags 0
            00010000:00010000:6.0:1471388113.593216:0:19136:0:(ldlm_lock.c:746:ldlm_lock_addref_internal_nolock()) ### ldlm_lock_addref(PR) ns: lustre-MDT0000-mdc-ffff880bf93c0000 lock: ffff880bfae55e00/0x96bbf1af18b6a255 lrc: 2/1,0 mode: PR/PR res: [0x200000007:0x1:0x0].0x0 bits 0x11 rrc: 2 type: IBT flags: 0x0 nid: local remote: 0x1a97a8993566831b expref: -99 pid: 5162 timeout: 0 lvb_type: 0
            00010000:00010000:6.0:1471388113.593225:0:19136:0:(ldlm_lock.c:1474:ldlm_lock_match()) ### matched (0 0) ns: lustre-MDT0000-mdc-ffff880bf93c0000 lock: ffff880bfae55e00/0x96bbf1af18b6a255 lrc: 2/1,0 mode: PR/PR res: [0x200000007:0x1:0x0].0x0 bits 0x11 rrc: 1 type: IBT flags: 0x0 nid: local remote: 0x1a97a8993566831b expref: -99 pid: 5162 timeout: 0 lvb_type: 0
            00000080:00010000:6.0:1471388113.593230:0:19136:0:(dcache.c:298:ll_lookup_finish_locks()) setting l_data to inode [0x200000007:0x1:0x0](ffff880bf55fc410)
            00000080:00010000:6.0:1471388113.593232:0:19136:0:(llite_internal.h:1339:ll_set_lock_data()) setting l_data to inode [0x200000007:0x1:0x0](ffff880bf55fc410) for lock 0x96bbf1af18b6a255
            00000080:00010000:6.0:1471388113.593234:0:19136:0:(dcache.c:201:ll_intent_drop_lock()) releasing lock with cookie 0x96bbf1af18b6a255 from it ffff880bf787bbd8
            00010000:00010000:6.0:1471388113.593235:0:19136:0:(ldlm_lock.c:800:ldlm_lock_decref_internal_nolock()) ### ldlm_lock_decref(PR) ns: lustre-MDT0000-mdc-ffff880bf93c0000 lock: ffff880bfae55e00/0x96bbf1af18b6a255 lrc: 3/1,0 mode: PR/PR res: [0x200000007:0x1:0x0].0x0 bits 0x11 rrc: 1 type: IBT flags: 0x10000000000000 nid: local remote: 0x1a97a8993566831b expref: -99 pid: 5162 timeout: 0 lvb_type: 0
            00010000:00010000:6.0:1471388113.593239:0:19136:0:(ldlm_lock.c:873:ldlm_lock_decref_internal()) ### add lock into lru list ns: lustre-MDT0000-mdc-ffff880bf93c0000 lock: ffff880bfae55e00/0x96bbf1af18b6a255 lrc: 2/0,0 mode: PR/PR res: [0x200000007:0x1:0x0].0x0 bits 0x11 rrc: 1 type: IBT flags: 0x10000000000000 nid: local remote: 0x1a97a8993566831b expref: -99 pid: 5162 timeout: 0 lvb_type: 0
            00000080:00200000:6.0:1471388113.593243:0:19136:0:(file.c:3708:ll_inode_permission()) VFS Op:inode=[0x200000007:0x1:0x0](ffff880bf55fc410), inode mode 41ed mask 1
            00000080:00200000:6.0:1471388113.593248:0:19136:0:(dcache.c:361:ll_revalidate_nd()) VFS Op:name=f101f.sanity, flags=0
            00000080:00200000:6.0:1471388113.593250:0:19136:0:(file.c:3410:__ll_inode_revalidate()) VFS Op:inode=[0x200000007:0x1:0x0](ffff880bf55fc410),name=/
            00000002:00010000:6.0:1471388113.593252:0:19136:0:(mdc_locks.c:1083:mdc_intent_lock()) (name: ,[0x200000007:0x1:0x0]) in obj [0x200000007:0x1:0x0], intent: lookup flags 0
            00010000:00010000:6.0:1471388113.593254:0:19136:0:(ldlm_lock.c:746:ldlm_lock_addref_internal_nolock()) ### ldlm_lock_addref(PR) ns: lustre-MDT0000-mdc-ffff880bf93c0000 lock: ffff880bfae55e00/0x96bbf1af18b6a255 lrc: 2/1,0 mode: PR/PR res: [0x200000007:0x1:0x0].0x0 bits 0x11 rrc: 2 type: IBT flags: 0x0 nid: local remote: 0x1a97a8993566831b expref: -99 pid: 5162 timeout: 0 lvb_type: 0
            00010000:00010000:6.0:1471388113.593258:0:19136:0:(ldlm_lock.c:1474:ldlm_lock_match()) ### matched (0 0) ns: lustre-MDT0000-mdc-ffff880bf93c0000 lock: ffff880bfae55e00/0x96bbf1af18b6a255 lrc: 2/1,0 mode: PR/PR res: [0x200000007:0x1:0x0].0x0 bits 0x11 rrc: 1 type: IBT flags: 0x0 nid: local remote: 0x1a97a8993566831b expref: -99 pid: 5162 timeout: 0 lvb_type: 0
            00000080:00010000:6.0:1471388113.593263:0:19136:0:(dcache.c:298:ll_lookup_finish_locks()) setting l_data to inode [0x200000007:0x1:0x0](ffff880bf55fc410)
            00000080:00010000:6.0:1471388113.593264:0:19136:0:(llite_internal.h:1339:ll_set_lock_data()) setting l_data to inode [0x200000007:0x1:0x0](ffff880bf55fc410) for lock 0x96bbf1af18b6a255
            00000080:00010000:6.0:1471388113.593266:0:19136:0:(dcache.c:201:ll_intent_drop_lock()) releasing lock with cookie 0x96bbf1af18b6a255 from it ffff880bf787bd18
            00010000:00010000:6.0:1471388113.593267:0:19136:0:(ldlm_lock.c:800:ldlm_lock_decref_internal_nolock()) ### ldlm_lock_decref(PR) ns: lustre-MDT0000-mdc-ffff880bf93c0000 lock: ffff880bfae55e00/0x96bbf1af18b6a255 lrc: 3/1,0 mode: PR/PR res: [0x200000007:0x1:0x0].0x0 bits 0x11 rrc: 1 type: IBT flags: 0x10000000000000 nid: local remote: 0x1a97a8993566831b expref: -99 pid: 5162 timeout: 0 lvb_type: 0
            00010000:00010000:6.0:1471388113.593275:0:19136:0:(ldlm_lock.c:873:ldlm_lock_decref_internal()) ### add lock into lru list ns: lustre-MDT0000-mdc-ffff880bf93c0000 lock: ffff880bfae55e00/0x96bbf1af18b6a255 lrc: 2/0,0 mode: PR/PR res: [0x200000007:0x1:0x0].0x0 bits 0x11 rrc: 1 type: IBT flags: 0x10000000000000 nid: local remote: 0x1a97a8993566831b expref: -99 pid: 5162 timeout: 0 lvb_type: 0
            00000080:00200000:6.0:1471388113.593279:0:19136:0:(file.c:3708:ll_inode_permission()) VFS Op:inode=[0x200000007:0x1:0x0](ffff880bf55fc410), inode mode 41ed mask 3
            00000080:00200000:6.0:1471388113.593284:0:19136:0:(namei.c:1323:ll_unlink()) VFS Op:name=f101f.sanity, dir=[0x200000007:0x1:0x0](ffff880bf55fc410)
            00010000:00010000:6.0:1471388113.593293:0:19136:0:(ldlm_request.c:1115:ldlm_cli_cancel_local()) ### client-side cancel ns: lustre-MDT0000-mdc-ffff880bf93c0000 lock: ffff8805fd5a9000/0x96bbf1af18b6cdf9 lrc: 2/0,0 mode: PR/PR res: [0x2000013a0:0xe9:0x0].0x0 bits 0x1b rrc: 3 type: IBT flags: 0x28400000000 nid: local remote: 0x1a97a8993567018b expref: -99 pid: 19136 timeout: 0 lvb_type: 3
            00000080:00200000:6.0:1471388113.593304:0:19136:0:(vvp_object.c:141:vvp_conf_set()) [0x2000013a0:0xe9:0x0]: losing layout lock
            00010000:00010000:6.0:1471388113.593310:0:19136:0:(ldlm_request.c:1115:ldlm_cli_cancel_local()) ### client-side cancel ns: lustre-MDT0000-mdc-ffff880bf93c0000 lock: ffff8805fc0c4200/0x96bbf1af18b6cdcf lrc: 2/0,0 mode: CR/CR res: [0x2000013a0:0xe9:0x0].0x0 bits 0x9 rrc: 3 type: IBT flags: 0x28400000000 nid: local remote: 0x1a97a89935670137 expref: -99 pid: 19130 timeout: 0 lvb_type: 3
            00000080:00200000:6.0:1471388113.593315:0:19136:0:(vvp_object.c:141:vvp_conf_set()) [0x2000013a0:0xe9:0x0]: losing layout lock
            00010000:00010000:6.0:1471388113.593324:0:19136:0:(ldlm_request.c:1174:ldlm_cancel_pack()) ### packing ns: lustre-MDT0000-mdc-ffff880bf93c0000 lock: ffff8805fd5a9000/0x96bbf1af18b6cdf9 lrc: 1/0,0 mode: --/PR res: [0x2000013a0:0xe9:0x0].0x0 bits 0x1b rrc: 2 type: IBT flags: 0x4829400000000 nid: local remote: 0x1a97a8993567018b expref: -99 pid: 19136 timeout: 0 lvb_type: 3
            00010000:00010000:6.0:1471388113.593329:0:19136:0:(ldlm_request.c:1174:ldlm_cancel_pack()) ### packing ns: lustre-MDT0000-mdc-ffff880bf93c0000 lock: ffff8805fc0c4200/0x96bbf1af18b6cdcf lrc: 1/0,0 mode: --/CR res: [0x2000013a0:0xe9:0x0].0x0 bits 0x9 rrc: 2 type: IBT flags: 0x4829400000000 nid: local remote: 0x1a97a89935670137 expref: -99 pid: 19130 timeout: 0 lvb_type: 3
            00010000:00010000:6.0:1471388113.593334:0:19136:0:(ldlm_request.c:1178:ldlm_cancel_pack()) 2 locks packed
            00010000:00010000:6.0:1471388113.593335:0:19136:0:(ldlm_lock.c:200:ldlm_lock_put()) ### final lock_put on destroyed lock, freeing it. ns: lustre-MDT0000-mdc-ffff880bf93c0000 lock: ffff8805fd5a9000/0x96bbf1af18b6cdf9 lrc: 0/0,0 mode: --/PR res: [0x2000013a0:0xe9:0x0].0x0 bits 0x1b rrc: 2 type: IBT flags: 0x4829400000000 nid: local remote: 0x1a97a8993567018b expref: -99 pid: 19136 timeout: 0 lvb_type: 3
            00010000:00010000:6.0:1471388113.593340:0:19136:0:(ldlm_lock.c:200:ldlm_lock_put()) ### final lock_put on destroyed lock, freeing it. ns: lustre-MDT0000-mdc-ffff880bf93c0000 lock: ffff8805fc0c4200/0x96bbf1af18b6cdcf lrc: 0/0,0 mode: --/CR res: [0x2000013a0:0xe9:0x0].0x0 bits 0x9 rrc: 1 type: IBT flags: 0x4829400000000 nid: local remote: 0x1a97a89935670137 expref: -99 pid: 19130 timeout: 0 lvb_type: 3
            00000100:00100000:6.0:1471388113.593354:0:19136:0:(client.c:1620:ptlrpc_send_new_req()) Sending RPC pname:cluuid:pid:xid:nid:opc iozone:8db7504e-a9aa-cce3-6ec0-a9745ceb5f0c:19136:1542858500609344:192.168.5.144@o2ib:36
            00000100:00100000:6.0:1471388113.593373:0:19136:0:(client.c:2333:ptlrpc_set_wait()) set ffff8805f6d15c80 going to sleep for 6 seconds
            00000100:00100000:6.0:1471388113.594796:0:19136:0:(client.c:2043:ptlrpc_check_set()) Completed RPC pname:cluuid:pid:xid:nid:opc iozone:8db7504e-a9aa-cce3-6ec0-a9745ceb5f0c:19136:1542858500609344:192.168.5.144@o2ib:36
            00000080:00200000:6.0:1471388113.594832:0:19136:0:(file.c:3410:__ll_inode_revalidate()) VFS Op:inode=[0x200000007:0x1:0x0](ffff880bf55fc410),name=/
            00000002:00010000:6.0:1471388113.594836:0:19136:0:(mdc_locks.c:1083:mdc_intent_lock()) (name: ,[0x200000007:0x1:0x0]) in obj [0x200000007:0x1:0x0], intent: lookup flags 0
            00010000:00010000:6.0:1471388113.594840:0:19136:0:(ldlm_lock.c:746:ldlm_lock_addref_internal_nolock()) ### ldlm_lock_addref(PR) ns: lustre-MDT0000-mdc-ffff880bf93c0000 lock: ffff880bfae55e00/0x96bbf1af18b6a255 lrc: 2/1,0 mode: PR/PR res: [0x200000007:0x1:0x0].0x0 bits 0x11 rrc: 2 type: IBT flags: 0x0 nid: local remote: 0x1a97a8993566831b expref: -99 pid: 5162 timeout: 0 lvb_type: 0
            00010000:00010000:6.0:1471388113.594845:0:19136:0:(ldlm_lock.c:1474:ldlm_lock_match()) ### matched (0 0) ns: lustre-MDT0000-mdc-ffff880bf93c0000 lock: ffff880bfae55e00/0x96bbf1af18b6a255 lrc: 2/1,0 mode: PR/PR res: [0x200000007:0x1:0x0].0x0 bits 0x11 rrc: 1 type: IBT flags: 0x0 nid: local remote: 0x1a97a8993566831b expref: -99 pid: 5162 timeout: 0 lvb_type: 0
            00000080:00010000:6.0:1471388113.594850:0:19136:0:(dcache.c:298:ll_lookup_finish_locks()) setting l_data to inode [0x200000007:0x1:0x0](ffff880bf55fc410)
            00000080:00010000:6.0:1471388113.594851:0:19136:0:(llite_internal.h:1339:ll_set_lock_data()) setting l_data to inode [0x200000007:0x1:0x0](ffff880bf55fc410) for lock 0x96bbf1af18b6a255
            00000080:00010000:6.0:1471388113.594853:0:19136:0:(dcache.c:201:ll_intent_drop_lock()) releasing lock with cookie 0x96bbf1af18b6a255 from it ffff880bf787bb40
            00010000:00010000:6.0:1471388113.594854:0:19136:0:(ldlm_lock.c:800:ldlm_lock_decref_internal_nolock()) ### ldlm_lock_decref(PR) ns: lustre-MDT0000-mdc-ffff880bf93c0000 lock: ffff880bfae55e00/0x96bbf1af18b6a255 lrc: 3/1,0 mode: PR/PR res: [0x200000007:0x1:0x0].0x0 bits 0x11 rrc: 1 type: IBT flags: 0x10000000000000 nid: local remote: 0x1a97a8993566831b expref: -99 pid: 5162 timeout: 0 lvb_type: 0
            00010000:00010000:6.0:1471388113.594859:0:19136:0:(ldlm_lock.c:873:ldlm_lock_decref_internal()) ### add lock into lru list ns: lustre-MDT0000-mdc-ffff880bf93c0000 lock: ffff880bfae55e00/0x96bbf1af18b6a255 lrc: 2/0,0 mode: PR/PR res: [0x200000007:0x1:0x0].0x0 bits 0x11 rrc: 1 type: IBT flags: 0x10000000000000 nid: local remote: 0x1a97a8993566831b expref: -99 pid: 5162 timeout: 0 lvb_type: 0
            00000080:00200000:6.0:1471388113.594863:0:19136:0:(file.c:3708:ll_inode_permission()) VFS Op:inode=[0x200000007:0x1:0x0](ffff880bf55fc410), inode mode 41ed mask 1
            00000080:00200000:6.0:1471388113.594868:0:19136:0:(namei.c:665:ll_lookup_nd()) VFS Op:name=f101f.sanity.DUMMY, dir=[0x200000007:0x1:0x0](ffff880bf55fc410), flags=1
            00000080:00200000:6.0:1471388113.594871:0:19136:0:(namei.c:559:ll_lookup_it()) VFS Op:name=f101f.sanity.DUMMY, dir=[0x200000007:0x1:0x0](ffff880bf55fc410), intent=getattr
            00000002:00010000:6.0:1471388113.594875:0:19136:0:(mdc_locks.c:1083:mdc_intent_lock()) (name: f101f.sanity.DUMMY,[0x0:0x0:0x0]) in obj [0x200000007:0x1:0x0], intent: getattr flags 0
            00010000:00010000:6.0:1471388113.594886:0:19136:0:(ldlm_lock.c:746:ldlm_lock_addref_internal_nolock()) ### ldlm_lock_addref(CR) ns: lustre-MDT0000-mdc-ffff880bf93c0000 lock: ffff8805fd5aa200/0x96bbf1af18b6ce00 lrc: 3/1,0 mode: --/CR res: [0x200000007:0x1:0x0].0x0 bits 0x0 rrc: 2 type: IBT flags: 0x10000000000000 nid: local remote: 0x0 expref: -99 pid: 19136 timeout: 0 lvb_type: 0
            00010000:00010000:6.0:1471388113.594890:0:19136:0:(ldlm_request.c:921:ldlm_cli_enqueue()) ### client-side enqueue START, flags 0x1000 ns: lustre-MDT0000-mdc-ffff880bf93c0000 lock: ffff8805fd5aa200/0x96bbf1af18b6ce00 lrc: 3/1,0 mode: --/CR res: [0x200000007:0x1:0x0].0x0 bits 0x2 rrc: 2 type: IBT flags: 0x0 nid: local remote: 0x0 expref: -99 pid: 19136 timeout: 0 lvb_type: 0
            00010000:00010000:6.0:1471388113.594895:0:19136:0:(ldlm_request.c:976:ldlm_cli_enqueue()) ### sending request ns: lustre-MDT0000-mdc-ffff880bf93c0000 lock: ffff8805fd5aa200/0x96bbf1af18b6ce00 lrc: 3/1,0 mode: --/CR res: [0x200000007:0x1:0x0].0x0 bits 0x2 rrc: 2 type: IBT flags: 0x0 nid: local remote: 0x0 expref: -99 pid: 19136 timeout: 0 lvb_type: 0
            00000100:00100000:6.0:1471388113.594906:0:19136:0:(client.c:1620:ptlrpc_send_new_req()) Sending RPC pname:cluuid:pid:xid:nid:opc iozone:8db7504e-a9aa-cce3-6ec0-a9745ceb5f0c:19136:1542858500609360:192.168.5.144@o2ib:101
            00000100:00100000:6.0:1471388113.594918:0:19136:0:(client.c:2333:ptlrpc_set_wait()) set ffff8805f6d15c80 going to sleep for 6 seconds
            00000100:00100000:6.0:1471388113.595565:0:19136:0:(client.c:2043:ptlrpc_check_set()) Completed RPC pname:cluuid:pid:xid:nid:opc iozone:8db7504e-a9aa-cce3-6ec0-a9745ceb5f0c:19136:1542858500609360:192.168.5.144@o2ib:101
            00010000:00010000:6.0:1471388113.595586:0:19136:0:(ldlm_request.c:567:ldlm_cli_enqueue_fini()) ### client-side enqueue END (ABORTED) ns: lustre-MDT0000-mdc-ffff880bf93c0000 lock: ffff8805fd5aa200/0x96bbf1af18b6ce00 lrc: 4/1,0 mode: --/CR res: [0x200000007:0x1:0x0].0x0 bits 0x2 rrc: 2 type: IBT flags: 0x0 nid: local remote: 0x0 expref: -99 pid: 19136 timeout: 0 lvb_type: 0
            00010000:00010000:6.0:1471388113.595595:0:19136:0:(ldlm_request.c:513:failed_lock_cleanup()) ### setting FL_LOCAL_ONLY | LDLM_FL_FAILED | LDLM_FL_ATOMIC_CB | LDLM_FL_CBPENDING ns: lustre-MDT0000-mdc-ffff880bf93c0000 lock: ffff8805fd5aa200/0x96bbf1af18b6ce00 lrc: 4/1,0 mode: --/CR res: [0x200000007:0x1:0x0].0x0 bits 0x2 rrc: 2 type: IBT flags: 0x206400000000 nid: local remote: 0x0 expref: -99 pid: 19136 timeout: 0 lvb_type: 0
            00010000:00010000:6.0:1471388113.595600:0:19136:0:(ldlm_lock.c:800:ldlm_lock_decref_internal_nolock()) ### ldlm_lock_decref(CR) ns: lustre-MDT0000-mdc-ffff880bf93c0000 lock: ffff8805fd5aa200/0x96bbf1af18b6ce00 lrc: 4/1,0 mode: --/CR res: [0x200000007:0x1:0x0].0x0 bits 0x2 rrc: 2 type: IBT flags: 0x10206400000000 nid: local remote: 0x0 expref: -99 pid: 19136 timeout: 0 lvb_type: 0
            00010000:00010000:6.0:1471388113.595605:0:19136:0:(ldlm_lock.c:856:ldlm_lock_decref_internal()) ### final decref done on cbpending lock ns: lustre-MDT0000-mdc-ffff880bf93c0000 lock: ffff8805fd5aa200/0x96bbf1af18b6ce00 lrc: 3/0,0 mode: --/CR res: [0x200000007:0x1:0x0].0x0 bits 0x2 rrc: 2 type: IBT flags: 0x10206400000000 nid: local remote: 0x0 expref: -99 pid: 19136 timeout: 0 lvb_type: 0
            00010000:00010000:6.0:1471388113.595610:0:19136:0:(ldlm_lockd.c:1749:ldlm_handle_bl_callback()) ### client blocking AST callback handler ns: lustre-MDT0000-mdc-ffff880bf93c0000 lock: ffff8805fd5aa200/0x96bbf1af18b6ce00 lrc: 4/0,0 mode: --/CR res: [0x200000007:0x1:0x0].0x0 bits 0x2 rrc: 2 type: IBT flags: 0x206400000000 nid: local remote: 0x0 expref: -99 pid: 19136 timeout: 0 lvb_type: 0
            00010000:00010000:6.0:1471388113.595614:0:19136:0:(ldlm_lockd.c:1762:ldlm_handle_bl_callback()) Lock ffff8805fd5aa200 already unused, calling callback (ffffffffa0eceff0)
            00010000:00010000:6.0:1471388113.595616:0:19136:0:(ldlm_request.c:1115:ldlm_cli_cancel_local()) ### client-side cancel ns: lustre-MDT0000-mdc-ffff880bf93c0000 lock: ffff8805fd5aa200/0x96bbf1af18b6ce00 lrc: 5/0,0 mode: --/CR res: [0x200000007:0x1:0x0].0x0 bits 0x2 rrc: 2 type: IBT flags: 0x20e400000000 nid: local remote: 0x0 expref: -99 pid: 19136 timeout: 0 lvb_type: 0
            00010000:00010000:6.0:1471388113.595628:0:19136:0:(ldlm_request.c:1128:ldlm_cli_cancel_local()) not sending request (at caller's instruction)
            00010000:00010000:6.0:1471388113.595630:0:19136:0:(ldlm_lockd.c:1771:ldlm_handle_bl_callback()) ### client blocking callback handler END ns: lustre-MDT0000-mdc-ffff880bf93c0000 lock: ffff8805fd5aa200/0x96bbf1af18b6ce00 lrc: 3/0,0 mode: --/CR res: [0x200000007:0x1:0x0].0x0 bits 0x2 rrc: 2 type: IBT flags: 0x4a0f400000000 nid: local remote: 0x0 expref: -99 pid: 19136 timeout: 0 lvb_type: 0
            00010000:00010000:6.0:1471388113.595635:0:19136:0:(ldlm_lock.c:200:ldlm_lock_put()) ### final lock_put on destroyed lock, freeing it. ns: lustre-MDT0000-mdc-ffff880bf93c0000 lock: ffff8805fd5aa200/0x96bbf1af18b6ce00 lrc: 0/0,0 mode: --/CR res: [0x200000007:0x1:0x0].0x0 bits 0x2 rrc: 2 type: IBT flags: 0x4a0f400000000 nid: local remote: 0x0 expref: -99 pid: 19136 timeout: 0 lvb_type: 0
            00000002:00100000:6.0:1471388113.595646:0:19136:0:(mdc_locks.c:587:mdc_finish_enqueue()) @@@ op: 8 disposition: 7, status: 0 req@ffff8805f8e1b300 x1542858500609360/t0(0) o101->lustre-MDT0000-mdc-ffff880bf93c0000@192.168.5.144@o2ib:12/10 lens 712/536 e 0 to 0 dl 1471388120 ref 1 fl Complete:R/0/0 rc 301/301
            00010000:00010000:6.0:1471388113.595659:0:19136:0:(ldlm_lock.c:1495:ldlm_lock_match()) ### not matched ns ffff8805fa8cb000 type 13 mode 30 res 8589934599/1 (0 0)
            This looks like LU-8413

            and test_21, which had a TIMEOUT: the test stopped responding and never completed. It appears the OSS node ran out of memory during the run, which got processes, including systemd-journal, OOM-killed, and that in turn left the dd on the client blocked for more than 120 seconds without ever making progress (a quick check for this is sketched after the console log below).
            18:57:54:[ 35.401655] Mem-Info:
            18:57:54:[ 35.405029] Node 0 DMA per-cpu:
            18:57:54:[ 35.409384] CPU 0: hi: 0, btch: 1 usd: 0
            18:57:54:[ 35.415581] Node 0 DMA32 per-cpu:
            18:57:54:[ 35.420128] CPU 0: hi: 42, btch: 7 usd: 36
            18:57:54:[ 35.426323] active_anon:632 inactive_anon:2092 isolated_anon:0
            18:57:54:[ 35.426323] active_file:0 inactive_file:0 isolated_file:0
            18:57:54:[ 35.426323] unevictable:18297 dirty:0 writeback:0 unstable:0
            18:57:54:[ 35.426323] free:499 slab_reclaimable:1742 slab_unreclaimable:4599
            18:57:54:[ 35.426323] mapped:748 shmem:2129 pagetables:74 bounce:0
            18:57:54:[ 35.426323] free_cma:0
            18:57:54:[ 35.466224] Node 0 DMA free:488kB min:4kB low:4kB high:4kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:572kB managed:488kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
            18:57:54:[ 35.516862] lowmem_reserve[]: 0 140 140 140
            18:57:54:[ 35.523578] Node 0 DMA32 free:1508kB min:1512kB low:1888kB high:2268kB active_anon:2528kB inactive_anon:8368kB active_file:0kB inactive_file:0kB unevictable:73188kB isolated(anon):0kB isolated(file):0kB present:166320kB managed:145724kB mlocked:0kB dirty:0kB writeback:0kB mapped:2992kB shmem:8516kB slab_reclaimable:6968kB slab_unreclaimable:18396kB kernel_stack:1344kB pagetables:296kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
            18:57:54:[ 35.579210] lowmem_reserve[]: 0 0 0 0
            18:57:54:[ 35.585507] Node 0 DMA: 0*4kB 1*8kB (U) 0*16kB 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 488kB
            18:57:54:[ 35.603198] Node 0 DMA32: 183*4kB (UEMR) 49*8kB (UR) 14*16kB (R) 5*32kB (R) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1508kB
            18:57:54:[ 35.621271] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
            18:57:54:[ 35.632651] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
            18:57:54:[ 35.643597] 20426 total pagecache pages
            18:57:54:[ 35.649313] 0 pages in swap cache
            18:57:54:[ 35.654428] Swap cache stats: add 0, delete 0, find 0/0
            18:57:54:[ 35.661603] Free swap = 0kB
            18:57:54:[ 35.666143] Total swap = 0kB
            18:57:54:[ 35.670610] 41723 pages RAM
            18:57:54:[ 35.674975] 0 pages HighMem/MovableOnly
            18:57:54:[ 35.680511] 5170 pages reserved
            18:57:54:[ 35.685199] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
            18:57:54:[ 35.695151] [ 200] 0 200 4289 99 12 0 0 rpc.idmapd
            18:57:54:[ 35.705657] [ 227] 0 227 8998 639 21 0 -1000 systemd-udevd
            18:57:54:[ 35.716393] [ 399] 0 399 7157 316 17 0 0 systemd-journal
            18:57:54:[ 35.727277] Out of memory: Kill process 399 (systemd-journal) score 8 or sacrifice child
            18:57:54:[ 35.737359] Killed process 399 (systemd-journal) total-vm:28628kB, anon-rss:228kB, file-rss:1036kB
            18:57:54:[ 35.771271] systemd[1]: systemd-journald.service: main process exited, code=killed, status=9/KILL
            18:57:54:[FAILED] Failed to start Journal Service.
            18:57:54:See 'systemctl status systemd-journald.service' for details.
            18:57:54:[ 35.796637] systemd[1]: Failed to start Journal Service.
            18:57:54:[ 35.803704] systemd[1]: Unit systemd-journald.service entered failed state.
            18:57:54:[ 35.812630] systemd[1]: systemd-journald.service failed.
            18:57:54:[ 35.820624] systemd[1]: systemd-journald.service has no holdoff time, scheduling restart.
            18:57:54:[ OK ] Stopped Journal Service.
            18:57:54:[ 35.836694] systemd[1]: start request repeated too quickly for systemd-journald.service
            18:57:54:[FAILED] Failed to start Journal Service.
            18:57:54:See 'systemctl status systemd-journald.service' for details.
            18:57:54:[ 35.860669] systemd[1]: Failed to start Journal Service.
            18:57:54:[ 35.889869] systemd[1]: Unit systemd-journald.service entered failed state.
            18:57:54:[ 35.898766] systemd[1]: systemd-journald.service failed.
            18:57:54:[ 35.905807] systemd[1]: start request repeated too quickly for systemd-journald.service
            18:57:54:[FAILED] Failed to start Journal Service.
            18:57:54:See 'systemctl status systemd-journald.service' for details.
            18:57:54:[ 35.930716] systemd[1]: Failed to start Journal Service.
            18:57:54:[ 35.937928] systemd-udevd[227]: worker [384] failed while handling '/devices/pci0000:00/0000:00:01.0/0000:01:00.0/0000:02:08.0/0000:03:00.0/host0/port-0:0/end_device-0:0/target0:0:0/0:0:0:0/block/sda/sda2'
            This looks like LU-8326
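            A minimal sketch of how that signature could be confirmed, assuming only stock tools (dmesg, grep); the patterns are the OOM-killer lines shown above and the kernel's standard hung-task warning:

                # on the OSS: OOM-kill evidence (journald itself was killed, so the
                # kernel ring buffer / serial console is the most reliable record)
                dmesg -T | grep -E 'Out of memory|Killed process'
                # on the client: the stuck dd should show up as a hung-task warning
                dmesg -T | grep 'blocked for more than 120 seconds'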

            Will get a build going and see how it looks on my end.

            jsalians_intel John Salinas (Inactive) added a comment - - edited
            Looks like the latest patch failed two tests:
            == sanity test 101f: check mmap read performance ===================================================== 15:55:08 (1471388108)
            /usr/bin/iozone
            Cancel LRU locks on lustre client to flush the client cache
            Reset readahead stats
            mmap read the file with small block size
            checking missing pages
            sanity test_101f: @@@@@@ FAIL: misses too much pages!
            Trace dump:
            = /usr/lib64/lustre/tests/test-framework.sh:4817:error()
            = /usr/lib64/lustre/tests/sanity.sh:6671:test_101f()
            = /usr/lib64/lustre/tests/test-framework.sh:5081:run_one()
            = /usr/lib64/lustre/tests/test-framework.sh:5120:run_one_logged()
            = /usr/lib64/lustre/tests/test-framework.sh:4967:run_test()
            = /usr/lib64/lustre/tests/sanity.sh:6674:main()
            Here is the debug log:
            https://testing.hpdd.intel.com/test_logs/de8292ec-6437-11e6-906c-5254006e85c2/show_text
00000020:00010000:6.0:1471388113.593184:0:19136:0:(cl_lock.c:58:cl_lock_trace0()) destroy lock: ffff880c1bbe8030 (ffff8806082087e8/1) at cl_lock_fini():89 00000080:00200000:6.0:1471388113.593188:0:19136:0:(vvp_io.c:312:vvp_io_fini()) [0x2000013a0:0xe9:0x0] ignore/verify layout 0/1, layout version 0 restore needed 0 00000080:00200000:6.0:1471388113.593207:0:19136:0:(file.c:3410:__ll_inode_revalidate()) VFS Op:inode= [0x200000007:0x1:0x0] (ffff880bf55fc410),name=/ 00000002:00010000:6.0:1471388113.593212:0:19136:0:(mdc_locks.c:1083:mdc_intent_lock()) (name: , [0x200000007:0x1:0x0] ) in obj [0x200000007:0x1:0x0] , intent: lookup flags 0 00010000:00010000:6.0:1471388113.593216:0:19136:0:(ldlm_lock.c:746:ldlm_lock_addref_internal_nolock()) ### ldlm_lock_addref(PR) ns: lustre-MDT0000-mdc-ffff880bf93c0000 lock: ffff880bfae55e00/0x96bbf1af18b6a255 lrc: 2/1,0 mode: PR/PR res: [0x200000007:0x1:0x0] .0x0 bits 0x11 rrc: 2 type: IBT flags: 0x0 nid: local remote: 0x1a97a8993566831b expref: -99 pid: 5162 timeout: 0 lvb_type: 0 00010000:00010000:6.0:1471388113.593225:0:19136:0:(ldlm_lock.c:1474:ldlm_lock_match()) ### matched (0 0) ns: lustre-MDT0000-mdc-ffff880bf93c0000 lock: ffff880bfae55e00/0x96bbf1af18b6a255 lrc: 2/1,0 mode: PR/PR res: [0x200000007:0x1:0x0] .0x0 bits 0x11 rrc: 1 type: IBT flags: 0x0 nid: local remote: 0x1a97a8993566831b expref: -99 pid: 5162 timeout: 0 lvb_type: 0 00000080:00010000:6.0:1471388113.593230:0:19136:0:(dcache.c:298:ll_lookup_finish_locks()) setting l_data to inode [0x200000007:0x1:0x0] (ffff880bf55fc410) 00000080:00010000:6.0:1471388113.593232:0:19136:0:(llite_internal.h:1339:ll_set_lock_data()) setting l_data to inode [0x200000007:0x1:0x0] (ffff880bf55fc410) for lock 0x96bbf1af18b6a255 00000080:00010000:6.0:1471388113.593234:0:19136:0:(dcache.c:201:ll_intent_drop_lock()) releasing lock with cookie 0x96bbf1af18b6a255 from it ffff880bf787bbd8 00010000:00010000:6.0:1471388113.593235:0:19136:0:(ldlm_lock.c:800:ldlm_lock_decref_internal_nolock()) ### ldlm_lock_decref(PR) ns: lustre-MDT0000-mdc-ffff880bf93c0000 lock: ffff880bfae55e00/0x96bbf1af18b6a255 lrc: 3/1,0 mode: PR/PR res: [0x200000007:0x1:0x0] .0x0 bits 0x11 rrc: 1 type: IBT flags: 0x10000000000000 nid: local remote: 0x1a97a8993566831b expref: -99 pid: 5162 timeout: 0 lvb_type: 0 00010000:00010000:6.0:1471388113.593239:0:19136:0:(ldlm_lock.c:873:ldlm_lock_decref_internal()) ### add lock into lru list ns: lustre-MDT0000-mdc-ffff880bf93c0000 lock: ffff880bfae55e00/0x96bbf1af18b6a255 lrc: 2/0,0 mode: PR/PR res: [0x200000007:0x1:0x0] .0x0 bits 0x11 rrc: 1 type: IBT flags: 0x10000000000000 nid: local remote: 0x1a97a8993566831b expref: -99 pid: 5162 timeout: 0 lvb_type: 0 00000080:00200000:6.0:1471388113.593243:0:19136:0:(file.c:3708:ll_inode_permission()) VFS Op:inode= [0x200000007:0x1:0x0] (ffff880bf55fc410), inode mode 41ed mask 1 00000080:00200000:6.0:1471388113.593248:0:19136:0:(dcache.c:361:ll_revalidate_nd()) VFS Op:name=f101f.sanity, flags=0 00000080:00200000:6.0:1471388113.593250:0:19136:0:(file.c:3410:__ll_inode_revalidate()) VFS Op:inode= [0x200000007:0x1:0x0] (ffff880bf55fc410),name=/ 00000002:00010000:6.0:1471388113.593252:0:19136:0:(mdc_locks.c:1083:mdc_intent_lock()) (name: , [0x200000007:0x1:0x0] ) in obj [0x200000007:0x1:0x0] , intent: lookup flags 0 00010000:00010000:6.0:1471388113.593254:0:19136:0:(ldlm_lock.c:746:ldlm_lock_addref_internal_nolock()) ### ldlm_lock_addref(PR) ns: lustre-MDT0000-mdc-ffff880bf93c0000 lock: ffff880bfae55e00/0x96bbf1af18b6a255 lrc: 2/1,0 mode: PR/PR res: [0x200000007:0x1:0x0] .0x0 
bits 0x11 rrc: 2 type: IBT flags: 0x0 nid: local remote: 0x1a97a8993566831b expref: -99 pid: 5162 timeout: 0 lvb_type: 0 00010000:00010000:6.0:1471388113.593258:0:19136:0:(ldlm_lock.c:1474:ldlm_lock_match()) ### matched (0 0) ns: lustre-MDT0000-mdc-ffff880bf93c0000 lock: ffff880bfae55e00/0x96bbf1af18b6a255 lrc: 2/1,0 mode: PR/PR res: [0x200000007:0x1:0x0] .0x0 bits 0x11 rrc: 1 type: IBT flags: 0x0 nid: local remote: 0x1a97a8993566831b expref: -99 pid: 5162 timeout: 0 lvb_type: 0 00000080:00010000:6.0:1471388113.593263:0:19136:0:(dcache.c:298:ll_lookup_finish_locks()) setting l_data to inode [0x200000007:0x1:0x0] (ffff880bf55fc410) 00000080:00010000:6.0:1471388113.593264:0:19136:0:(llite_internal.h:1339:ll_set_lock_data()) setting l_data to inode [0x200000007:0x1:0x0] (ffff880bf55fc410) for lock 0x96bbf1af18b6a255 00000080:00010000:6.0:1471388113.593266:0:19136:0:(dcache.c:201:ll_intent_drop_lock()) releasing lock with cookie 0x96bbf1af18b6a255 from it ffff880bf787bd18 00010000:00010000:6.0:1471388113.593267:0:19136:0:(ldlm_lock.c:800:ldlm_lock_decref_internal_nolock()) ### ldlm_lock_decref(PR) ns: lustre-MDT0000-mdc-ffff880bf93c0000 lock: ffff880bfae55e00/0x96bbf1af18b6a255 lrc: 3/1,0 mode: PR/PR res: [0x200000007:0x1:0x0] .0x0 bits 0x11 rrc: 1 type: IBT flags: 0x10000000000000 nid: local remote: 0x1a97a8993566831b expref: -99 pid: 5162 timeout: 0 lvb_type: 0 00010000:00010000:6.0:1471388113.593275:0:19136:0:(ldlm_lock.c:873:ldlm_lock_decref_internal()) ### add lock into lru list ns: lustre-MDT0000-mdc-ffff880bf93c0000 lock: ffff880bfae55e00/0x96bbf1af18b6a255 lrc: 2/0,0 mode: PR/PR res: [0x200000007:0x1:0x0] .0x0 bits 0x11 rrc: 1 type: IBT flags: 0x10000000000000 nid: local remote: 0x1a97a8993566831b expref: -99 pid: 5162 timeout: 0 lvb_type: 0 00000080:00200000:6.0:1471388113.593279:0:19136:0:(file.c:3708:ll_inode_permission()) VFS Op:inode= [0x200000007:0x1:0x0] (ffff880bf55fc410), inode mode 41ed mask 3 00000080:00200000:6.0:1471388113.593284:0:19136:0:(namei.c:1323:ll_unlink()) VFS Op:name=f101f.sanity, dir= [0x200000007:0x1:0x0] (ffff880bf55fc410) 00010000:00010000:6.0:1471388113.593293:0:19136:0:(ldlm_request.c:1115:ldlm_cli_cancel_local()) ### client-side cancel ns: lustre-MDT0000-mdc-ffff880bf93c0000 lock: ffff8805fd5a9000/0x96bbf1af18b6cdf9 lrc: 2/0,0 mode: PR/PR res: [0x2000013a0:0xe9:0x0] .0x0 bits 0x1b rrc: 3 type: IBT flags: 0x28400000000 nid: local remote: 0x1a97a8993567018b expref: -99 pid: 19136 timeout: 0 lvb_type: 3 00000080:00200000:6.0:1471388113.593304:0:19136:0:(vvp_object.c:141:vvp_conf_set()) [0x2000013a0:0xe9:0x0] : losing layout lock 00010000:00010000:6.0:1471388113.593310:0:19136:0:(ldlm_request.c:1115:ldlm_cli_cancel_local()) ### client-side cancel ns: lustre-MDT0000-mdc-ffff880bf93c0000 lock: ffff8805fc0c4200/0x96bbf1af18b6cdcf lrc: 2/0,0 mode: CR/CR res: [0x2000013a0:0xe9:0x0] .0x0 bits 0x9 rrc: 3 type: IBT flags: 0x28400000000 nid: local remote: 0x1a97a89935670137 expref: -99 pid: 19130 timeout: 0 lvb_type: 3 00000080:00200000:6.0:1471388113.593315:0:19136:0:(vvp_object.c:141:vvp_conf_set()) [0x2000013a0:0xe9:0x0] : losing layout lock 00010000:00010000:6.0:1471388113.593324:0:19136:0:(ldlm_request.c:1174:ldlm_cancel_pack()) ### packing ns: lustre-MDT0000-mdc-ffff880bf93c0000 lock: ffff8805fd5a9000/0x96bbf1af18b6cdf9 lrc: 1/0,0 mode: --/PR res: [0x2000013a0:0xe9:0x0] .0x0 bits 0x1b rrc: 2 type: IBT flags: 0x4829400000000 nid: local remote: 0x1a97a8993567018b expref: -99 pid: 19136 timeout: 0 lvb_type: 3 
00010000:00010000:6.0:1471388113.593329:0:19136:0:(ldlm_request.c:1174:ldlm_cancel_pack()) ### packing ns: lustre-MDT0000-mdc-ffff880bf93c0000 lock: ffff8805fc0c4200/0x96bbf1af18b6cdcf lrc: 1/0,0 mode: --/CR res: [0x2000013a0:0xe9:0x0] .0x0 bits 0x9 rrc: 2 type: IBT flags: 0x4829400000000 nid: local remote: 0x1a97a89935670137 expref: -99 pid: 19130 timeout: 0 lvb_type: 3 00010000:00010000:6.0:1471388113.593334:0:19136:0:(ldlm_request.c:1178:ldlm_cancel_pack()) 2 locks packed 00010000:00010000:6.0:1471388113.593335:0:19136:0:(ldlm_lock.c:200:ldlm_lock_put()) ### final lock_put on destroyed lock, freeing it. ns: lustre-MDT0000-mdc-ffff880bf93c0000 lock: ffff8805fd5a9000/0x96bbf1af18b6cdf9 lrc: 0/0,0 mode: --/PR res: [0x2000013a0:0xe9:0x0] .0x0 bits 0x1b rrc: 2 type: IBT flags: 0x4829400000000 nid: local remote: 0x1a97a8993567018b expref: -99 pid: 19136 timeout: 0 lvb_type: 3 00010000:00010000:6.0:1471388113.593340:0:19136:0:(ldlm_lock.c:200:ldlm_lock_put()) ### final lock_put on destroyed lock, freeing it. ns: lustre-MDT0000-mdc-ffff880bf93c0000 lock: ffff8805fc0c4200/0x96bbf1af18b6cdcf lrc: 0/0,0 mode: --/CR res: [0x2000013a0:0xe9:0x0] .0x0 bits 0x9 rrc: 1 type: IBT flags: 0x4829400000000 nid: local remote: 0x1a97a89935670137 expref: -99 pid: 19130 timeout: 0 lvb_type: 3 00000100:00100000:6.0:1471388113.593354:0:19136:0:(client.c:1620:ptlrpc_send_new_req()) Sending RPC pname:cluuid:pid:xid:nid:opc iozone:8db7504e-a9aa-cce3-6ec0-a9745ceb5f0c:19136:1542858500609344:192.168.5.144@o2ib:36 00000100:00100000:6.0:1471388113.593373:0:19136:0:(client.c:2333:ptlrpc_set_wait()) set ffff8805f6d15c80 going to sleep for 6 seconds 00000100:00100000:6.0:1471388113.594796:0:19136:0:(client.c:2043:ptlrpc_check_set()) Completed RPC pname:cluuid:pid:xid:nid:opc iozone:8db7504e-a9aa-cce3-6ec0-a9745ceb5f0c:19136:1542858500609344:192.168.5.144@o2ib:36 00000080:00200000:6.0:1471388113.594832:0:19136:0:(file.c:3410:__ll_inode_revalidate()) VFS Op:inode= [0x200000007:0x1:0x0] (ffff880bf55fc410),name=/ 00000002:00010000:6.0:1471388113.594836:0:19136:0:(mdc_locks.c:1083:mdc_intent_lock()) (name: , [0x200000007:0x1:0x0] ) in obj [0x200000007:0x1:0x0] , intent: lookup flags 0 00010000:00010000:6.0:1471388113.594840:0:19136:0:(ldlm_lock.c:746:ldlm_lock_addref_internal_nolock()) ### ldlm_lock_addref(PR) ns: lustre-MDT0000-mdc-ffff880bf93c0000 lock: ffff880bfae55e00/0x96bbf1af18b6a255 lrc: 2/1,0 mode: PR/PR res: [0x200000007:0x1:0x0] .0x0 bits 0x11 rrc: 2 type: IBT flags: 0x0 nid: local remote: 0x1a97a8993566831b expref: -99 pid: 5162 timeout: 0 lvb_type: 0 00010000:00010000:6.0:1471388113.594845:0:19136:0:(ldlm_lock.c:1474:ldlm_lock_match()) ### matched (0 0) ns: lustre-MDT0000-mdc-ffff880bf93c0000 lock: ffff880bfae55e00/0x96bbf1af18b6a255 lrc: 2/1,0 mode: PR/PR res: [0x200000007:0x1:0x0] .0x0 bits 0x11 rrc: 1 type: IBT flags: 0x0 nid: local remote: 0x1a97a8993566831b expref: -99 pid: 5162 timeout: 0 lvb_type: 0 00000080:00010000:6.0:1471388113.594850:0:19136:0:(dcache.c:298:ll_lookup_finish_locks()) setting l_data to inode [0x200000007:0x1:0x0] (ffff880bf55fc410) 00000080:00010000:6.0:1471388113.594851:0:19136:0:(llite_internal.h:1339:ll_set_lock_data()) setting l_data to inode [0x200000007:0x1:0x0] (ffff880bf55fc410) for lock 0x96bbf1af18b6a255 00000080:00010000:6.0:1471388113.594853:0:19136:0:(dcache.c:201:ll_intent_drop_lock()) releasing lock with cookie 0x96bbf1af18b6a255 from it ffff880bf787bb40 00010000:00010000:6.0:1471388113.594854:0:19136:0:(ldlm_lock.c:800:ldlm_lock_decref_internal_nolock()) ### 
ldlm_lock_decref(PR) ns: lustre-MDT0000-mdc-ffff880bf93c0000 lock: ffff880bfae55e00/0x96bbf1af18b6a255 lrc: 3/1,0 mode: PR/PR res: [0x200000007:0x1:0x0] .0x0 bits 0x11 rrc: 1 type: IBT flags: 0x10000000000000 nid: local remote: 0x1a97a8993566831b expref: -99 pid: 5162 timeout: 0 lvb_type: 0 00010000:00010000:6.0:1471388113.594859:0:19136:0:(ldlm_lock.c:873:ldlm_lock_decref_internal()) ### add lock into lru list ns: lustre-MDT0000-mdc-ffff880bf93c0000 lock: ffff880bfae55e00/0x96bbf1af18b6a255 lrc: 2/0,0 mode: PR/PR res: [0x200000007:0x1:0x0] .0x0 bits 0x11 rrc: 1 type: IBT flags: 0x10000000000000 nid: local remote: 0x1a97a8993566831b expref: -99 pid: 5162 timeout: 0 lvb_type: 0 00000080:00200000:6.0:1471388113.594863:0:19136:0:(file.c:3708:ll_inode_permission()) VFS Op:inode= [0x200000007:0x1:0x0] (ffff880bf55fc410), inode mode 41ed mask 1 00000080:00200000:6.0:1471388113.594868:0:19136:0:(namei.c:665:ll_lookup_nd()) VFS Op:name=f101f.sanity.DUMMY, dir= [0x200000007:0x1:0x0] (ffff880bf55fc410), flags=1 00000080:00200000:6.0:1471388113.594871:0:19136:0:(namei.c:559:ll_lookup_it()) VFS Op:name=f101f.sanity.DUMMY, dir= [0x200000007:0x1:0x0] (ffff880bf55fc410), intent=getattr 00000002:00010000:6.0:1471388113.594875:0:19136:0:(mdc_locks.c:1083:mdc_intent_lock()) (name: f101f.sanity.DUMMY, [0x0:0x0:0x0] ) in obj [0x200000007:0x1:0x0] , intent: getattr flags 0 00010000:00010000:6.0:1471388113.594886:0:19136:0:(ldlm_lock.c:746:ldlm_lock_addref_internal_nolock()) ### ldlm_lock_addref(CR) ns: lustre-MDT0000-mdc-ffff880bf93c0000 lock: ffff8805fd5aa200/0x96bbf1af18b6ce00 lrc: 3/1,0 mode: --/CR res: [0x200000007:0x1:0x0] .0x0 bits 0x0 rrc: 2 type: IBT flags: 0x10000000000000 nid: local remote: 0x0 expref: -99 pid: 19136 timeout: 0 lvb_type: 0 00010000:00010000:6.0:1471388113.594890:0:19136:0:(ldlm_request.c:921:ldlm_cli_enqueue()) ### client-side enqueue START, flags 0x1000 ns: lustre-MDT0000-mdc-ffff880bf93c0000 lock: ffff8805fd5aa200/0x96bbf1af18b6ce00 lrc: 3/1,0 mode: --/CR res: [0x200000007:0x1:0x0] .0x0 bits 0x2 rrc: 2 type: IBT flags: 0x0 nid: local remote: 0x0 expref: -99 pid: 19136 timeout: 0 lvb_type: 0 00010000:00010000:6.0:1471388113.594895:0:19136:0:(ldlm_request.c:976:ldlm_cli_enqueue()) ### sending request ns: lustre-MDT0000-mdc-ffff880bf93c0000 lock: ffff8805fd5aa200/0x96bbf1af18b6ce00 lrc: 3/1,0 mode: --/CR res: [0x200000007:0x1:0x0] .0x0 bits 0x2 rrc: 2 type: IBT flags: 0x0 nid: local remote: 0x0 expref: -99 pid: 19136 timeout: 0 lvb_type: 0 00000100:00100000:6.0:1471388113.594906:0:19136:0:(client.c:1620:ptlrpc_send_new_req()) Sending RPC pname:cluuid:pid:xid:nid:opc iozone:8db7504e-a9aa-cce3-6ec0-a9745ceb5f0c:19136:1542858500609360:192.168.5.144@o2ib:101 00000100:00100000:6.0:1471388113.594918:0:19136:0:(client.c:2333:ptlrpc_set_wait()) set ffff8805f6d15c80 going to sleep for 6 seconds 00000100:00100000:6.0:1471388113.595565:0:19136:0:(client.c:2043:ptlrpc_check_set()) Completed RPC pname:cluuid:pid:xid:nid:opc iozone:8db7504e-a9aa-cce3-6ec0-a9745ceb5f0c:19136:1542858500609360:192.168.5.144@o2ib:101 00010000:00010000:6.0:1471388113.595586:0:19136:0:(ldlm_request.c:567:ldlm_cli_enqueue_fini()) ### client-side enqueue END (ABORTED) ns: lustre-MDT0000-mdc-ffff880bf93c0000 lock: ffff8805fd5aa200/0x96bbf1af18b6ce00 lrc: 4/1,0 mode: --/CR res: [0x200000007:0x1:0x0] .0x0 bits 0x2 rrc: 2 type: IBT flags: 0x0 nid: local remote: 0x0 expref: -99 pid: 19136 timeout: 0 lvb_type: 0 00010000:00010000:6.0:1471388113.595595:0:19136:0:(ldlm_request.c:513:failed_lock_cleanup()) ### setting FL_LOCAL_ONLY 
| LDLM_FL_FAILED | LDLM_FL_ATOMIC_CB | LDLM_FL_CBPENDING ns: lustre-MDT0000-mdc-ffff880bf93c0000 lock: ffff8805fd5aa200/0x96bbf1af18b6ce00 lrc: 4/1,0 mode: --/CR res: [0x200000007:0x1:0x0] .0x0 bits 0x2 rrc: 2 type: IBT flags: 0x206400000000 nid: local remote: 0x0 expref: -99 pid: 19136 timeout: 0 lvb_type: 0 00010000:00010000:6.0:1471388113.595600:0:19136:0:(ldlm_lock.c:800:ldlm_lock_decref_internal_nolock()) ### ldlm_lock_decref(CR) ns: lustre-MDT0000-mdc-ffff880bf93c0000 lock: ffff8805fd5aa200/0x96bbf1af18b6ce00 lrc: 4/1,0 mode: --/CR res: [0x200000007:0x1:0x0] .0x0 bits 0x2 rrc: 2 type: IBT flags: 0x10206400000000 nid: local remote: 0x0 expref: -99 pid: 19136 timeout: 0 lvb_type: 0 00010000:00010000:6.0:1471388113.595605:0:19136:0:(ldlm_lock.c:856:ldlm_lock_decref_internal()) ### final decref done on cbpending lock ns: lustre-MDT0000-mdc-ffff880bf93c0000 lock: ffff8805fd5aa200/0x96bbf1af18b6ce00 lrc: 3/0,0 mode: --/CR res: [0x200000007:0x1:0x0] .0x0 bits 0x2 rrc: 2 type: IBT flags: 0x10206400000000 nid: local remote: 0x0 expref: -99 pid: 19136 timeout: 0 lvb_type: 0 00010000:00010000:6.0:1471388113.595610:0:19136:0:(ldlm_lockd.c:1749:ldlm_handle_bl_callback()) ### client blocking AST callback handler ns: lustre-MDT0000-mdc-ffff880bf93c0000 lock: ffff8805fd5aa200/0x96bbf1af18b6ce00 lrc: 4/0,0 mode: --/CR res: [0x200000007:0x1:0x0] .0x0 bits 0x2 rrc: 2 type: IBT flags: 0x206400000000 nid: local remote: 0x0 expref: -99 pid: 19136 timeout: 0 lvb_type: 0 00010000:00010000:6.0:1471388113.595614:0:19136:0:(ldlm_lockd.c:1762:ldlm_handle_bl_callback()) Lock ffff8805fd5aa200 already unused, calling callback (ffffffffa0eceff0) 00010000:00010000:6.0:1471388113.595616:0:19136:0:(ldlm_request.c:1115:ldlm_cli_cancel_local()) ### client-side cancel ns: lustre-MDT0000-mdc-ffff880bf93c0000 lock: ffff8805fd5aa200/0x96bbf1af18b6ce00 lrc: 5/0,0 mode: --/CR res: [0x200000007:0x1:0x0] .0x0 bits 0x2 rrc: 2 type: IBT flags: 0x20e400000000 nid: local remote: 0x0 expref: -99 pid: 19136 timeout: 0 lvb_type: 0 00010000:00010000:6.0:1471388113.595628:0:19136:0:(ldlm_request.c:1128:ldlm_cli_cancel_local()) not sending request (at caller's instruction) 00010000:00010000:6.0:1471388113.595630:0:19136:0:(ldlm_lockd.c:1771:ldlm_handle_bl_callback()) ### client blocking callback handler END ns: lustre-MDT0000-mdc-ffff880bf93c0000 lock: ffff8805fd5aa200/0x96bbf1af18b6ce00 lrc: 3/0,0 mode: --/CR res: [0x200000007:0x1:0x0] .0x0 bits 0x2 rrc: 2 type: IBT flags: 0x4a0f400000000 nid: local remote: 0x0 expref: -99 pid: 19136 timeout: 0 lvb_type: 0 00010000:00010000:6.0:1471388113.595635:0:19136:0:(ldlm_lock.c:200:ldlm_lock_put()) ### final lock_put on destroyed lock, freeing it. ns: lustre-MDT0000-mdc-ffff880bf93c0000 lock: ffff8805fd5aa200/0x96bbf1af18b6ce00 lrc: 0/0,0 mode: --/CR res: [0x200000007:0x1:0x0] .0x0 bits 0x2 rrc: 2 type: IBT flags: 0x4a0f400000000 nid: local remote: 0x0 expref: -99 pid: 19136 timeout: 0 lvb_type: 0 00000002:00100000:6.0:1471388113.595646:0:19136:0:(mdc_locks.c:587:mdc_finish_enqueue()) @@@ op: 8 disposition: 7, status: 0 req@ffff8805f8e1b300 x1542858500609360/t0(0) o101->lustre-MDT0000-mdc-ffff880bf93c0000@192.168.5.144@o2ib:12/10 lens 712/536 e 0 to 0 dl 1471388120 ref 1 fl Complete:R/0/0 rc 301/301 00010000:00010000:6.0:1471388113.595659:0:19136:0:(ldlm_lock.c:1495:ldlm_lock_match()) ### not matched ns ffff8805fa8cb000 type 13 mode 30 res 8589934599/1 (0 0) This looks like LU-8413 and test_21 which had a TIMEOUT and test failed to respond and it timed out. 
It appeared the OSS node ran out of memory during the test which caused processes including system.journal to be killed. It appears that this caused the dd on the client to be continually blocked for me than 120 seconds and never make progress. 18:57:54:[ 35.401655] Mem-Info: 18:57:54:[ 35.405029] Node 0 DMA per-cpu: 18:57:54:[ 35.409384] CPU 0: hi: 0, btch: 1 usd: 0 18:57:54:[ 35.415581] Node 0 DMA32 per-cpu: 18:57:54:[ 35.420128] CPU 0: hi: 42, btch: 7 usd: 36 18:57:54:[ 35.426323] active_anon:632 inactive_anon:2092 isolated_anon:0 18:57:54:[ 35.426323] active_ file:0 inactive_ file:0 isolated_ file:0 18:57:54:[ 35.426323] unevictable:18297 dirty:0 writeback:0 unstable:0 18:57:54:[ 35.426323] free:499 slab_reclaimable:1742 slab_unreclaimable:4599 18:57:54:[ 35.426323] mapped:748 shmem:2129 pagetables:74 bounce:0 18:57:54:[ 35.426323] free_cma:0 18:57:54:[ 35.466224] Node 0 DMA free:488kB min:4kB low:4kB high:4kB active_anon:0kB inactive_anon:0kB active_ file:0kB inactive_ file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:572kB managed:488kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes 18:57:54:[ 35.516862] lowmem_reserve[]: 0 140 140 140 18:57:54:[ 35.523578] Node 0 DMA32 free:1508kB min:1512kB low:1888kB high:2268kB active_anon:2528kB inactive_anon:8368kB active_ file:0kB inactive_ file:0kB unevictable:73188kB isolated(anon):0kB isolated(file):0kB present:166320kB managed:145724kB mlocked:0kB dirty:0kB writeback:0kB mapped:2992kB shmem:8516kB slab_reclaimable:6968kB slab_unreclaimable:18396kB kernel_stack:1344kB pagetables:296kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes 18:57:54:[ 35.579210] lowmem_reserve[]: 0 0 0 0 18:57:54:[ 35.585507] Node 0 DMA: 0*4kB 1*8kB (U) 0*16kB 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 488kB 18:57:54:[ 35.603198] Node 0 DMA32: 183*4kB (UEMR) 49*8kB (UR) 14*16kB (R) 5*32kB (R) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1508kB 18:57:54:[ 35.621271] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB 18:57:54:[ 35.632651] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB 18:57:54:[ 35.643597] 20426 total pagecache pages 18:57:54:[ 35.649313] 0 pages in swap cache 18:57:54:[ 35.654428] Swap cache stats: add 0, delete 0, find 0/0 18:57:54:[ 35.661603] Free swap = 0kB 18:57:54:[ 35.666143] Total swap = 0kB 18:57:54:[ 35.670610] 41723 pages RAM 18:57:54:[ 35.674975] 0 pages HighMem/MovableOnly 18:57:54:[ 35.680511] 5170 pages reserved 18:57:54:[ 35.685199] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name 18:57:54:[ 35.695151] [ 200] 0 200 4289 99 12 0 0 rpc.idmapd 18:57:54:[ 35.705657] [ 227] 0 227 8998 639 21 0 -1000 systemd-udevd 18:57:54:[ 35.716393] [ 399] 0 399 7157 316 17 0 0 systemd-journal 18:57:54:[ 35.727277] Out of memory: Kill process 399 (systemd-journal) score 8 or sacrifice child 18:57:54:[ 35.737359] Killed process 399 (systemd-journal) total-vm:28628kB, anon-rss:228kB, file-rss:1036kB 18:57:54:[ 35.771271] systemd [1] : systemd-journald.service: main process exited, code=killed, status=9/KILL 18:57:54:[FAILED [0m] Failed to start Journal Service. 18:57:54:See 'systemctl status systemd-journald.service' for details. 
18:57:54:[ 35.796637] systemd[1]: Failed to start Journal Service.
18:57:54:[ 35.803704] systemd[1]: Unit systemd-journald.service entered failed state.
18:57:54:[ 35.812630] systemd[1]: systemd-journald.service failed.
18:57:54:[ 35.820624] systemd[1]: systemd-journald.service has no holdoff time, scheduling restart.
18:57:54:[ OK ] Stopped Journal Service.
18:57:54:[ 35.836694] systemd[1]: start request repeated too quickly for systemd-journald.service
18:57:54:[FAILED] Failed to start Journal Service.
18:57:54:See 'systemctl status systemd-journald.service' for details.
18:57:54:[ 35.860669] systemd[1]: Failed to start Journal Service.
18:57:54:[ 35.889869] systemd[1]: Unit systemd-journald.service entered failed state.
18:57:54:[ 35.898766] systemd[1]: systemd-journald.service failed.
18:57:54:[ 35.905807] systemd[1]: start request repeated too quickly for systemd-journald.service
18:57:54:[FAILED] Failed to start Journal Service.
18:57:54:See 'systemctl status systemd-journald.service' for details.
18:57:54:[ 35.930716] systemd[1]: Failed to start Journal Service.
18:57:54:[ 35.937928] systemd-udevd[227]: worker [384] failed while handling '/devices/pci0000:00/0000:00:01.0/0000:01:00.0/0000:02:08.0/0000:03:00.0/host0/port-0:0/end_device-0:0/target0:0:0/0:0:0:0/block/sda/sda2'

This looks like LU-8326. I will get a build going and see how it looks on my end.

            Build #139 (Aug 16, 2016 7:48:23 PM)
            Changes: LU-8498 nodemap: new zfs index files not properly initialized
            Manually triggered by user johnsali for Gerrit: http://review.whamcloud.com/21939 in silent mode.
            Commit 3f2b10c706db51b156bec4516ca277e116663482 by kit.westneat
            LU-8498 nodemap: new zfs index files not properly initialized
            Calling index->next on a new zfs index returns a non-zero RC, but ldiskfs
            indexes start with a blank record. This change modifies the config load
            code to always write the default nodemap to an empty index file.
            Signed-off-by: Kit Westneat <kit.westneat@gmail.com>
            Change-Id: I30a365f65463979889f09f7ad5ffcdacc83fa868
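
            The idea in this change, as described above, can be sketched as follows. This is only an
            illustrative sketch: every name in it (nm_index, nm_index_first, nm_index_next,
            nm_index_write_default, nm_index_process_record) is hypothetical and not a Lustre API. It
            just mirrors the commit message: an empty ZFS index is detected on the first iterator
            step and seeded with the default nodemap so it behaves like an ldiskfs-backed index.

            #include <errno.h>

            struct nm_index;                                   /* opaque index handle (hypothetical) */

            int nm_index_first(struct nm_index *idx);          /* 0: positioned on a record,
                                                                * >0 or -ENOENT: index is empty */
            int nm_index_next(struct nm_index *idx);           /* >0: end of index reached */
            int nm_index_write_default(struct nm_index *idx);  /* seed the default nodemap record */
            void nm_index_process_record(struct nm_index *idx);

            int nodemap_load_config_sketch(struct nm_index *idx)
            {
                    int rc = nm_index_first(idx);

                    /* A brand-new ZFS index reports "no records" here, while an ldiskfs
                     * index starts with a blank record; seeding the default nodemap makes
                     * both backends look the same to the config loader. */
                    if (rc > 0 || rc == -ENOENT) {
                            rc = nm_index_write_default(idx);
                            if (rc == 0)
                                    rc = nm_index_first(idx);
                    }
                    if (rc != 0)
                            return rc < 0 ? rc : -ENOENT;      /* write failed or still empty */

                    do {
                            nm_index_process_record(idx);
                    } while ((rc = nm_index_next(idx)) == 0);

                    return rc > 0 ? 0 : rc;                    /* >0 only means end of index */
            }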

            loadjenkinsbuild -b 139 -j zfs-c-p-lustre-patch-vanilla -t server -p test-el7.2-x86_64 -d el7.2 -a x86_64 -v -n wolf-4 -r >> ljb_log
            loadjenkinsbuild -b 139 -j zfs-c-p-lustre-patch-vanilla -t server -p test-el7.2-x86_64 -d el7.2 -a x86_64 -v -n wolf-5 -r >> ljb_log
            loadjenkinsbuild -b 139 -j zfs-c-p-lustre-patch-vanilla -t server -p test-el7.2-x86_64 -d el7.2 -a x86_64 -v -n wolf-3 -r >> ljb_log
            loadjenkinsbuild -b 139 -j zfs-c-p-lustre-patch-vanilla -t client -p test-el7.2-x86_64 -d el7.2 -a x86_64 -v -n wolf-6 -r >> ljb_log
            loadjenkinsbuild -b 139 -j zfs-c-p-lustre-patch-vanilla -t client -p test-el7.2-x86_64 -d el7.2 -a x86_64 -v -n wolf-7 -r >> ljb_log
            loadjenkinsbuild -b 139 -j zfs-c-p-lustre-patch-vanilla -t client -p test-el7.2-x86_64 -d el7.2 -a x86_64 -v -n wolf-8 -r >> ljb_log
            loadjenkinsbuild -b 139 -j zfs-c-p-lustre-patch-vanilla -t client -p test-el7.2-x86_64 -d el7.2 -a x86_64 -v -n wolf-9 -r >> ljb_log

            Same error for me:

            [root@wolf-3 mdisolation_step2_draid_testing]# zpool status -v ost0
            pool: ost0
            state: ONLINE
            scan: none requested
            config:

            NAME                       STATE     READ WRITE CKSUM
            ost0                       ONLINE       0     0     0
              draid1-0 {any}           ONLINE       0     0     0
                mpathaj                ONLINE       0     0     0
                mpathai                ONLINE       0     0     0
                mpathah                ONLINE       0     0     0
                mpathag                ONLINE       0     0     0
                mpathaq                ONLINE       0     0     0
                mpathap                ONLINE       0     0     0
                mpathak                ONLINE       0     0     0
                mpathz                 ONLINE       0     0     0
                mpatham                ONLINE       0     0     0
                mpathal                ONLINE       0     0     0
                mpathao                ONLINE       0     0     0
              mirror-1 {ddt,dmu,mos}   ONLINE       0     0     0
                mpathae                ONLINE       0     0     0
                mpathaa                ONLINE       0     0     0
            spares
              $draid1-0-s0             AVAIL

            errors: No known data errors
            [root@wolf-3 mdisolation_step2_draid_testing]# zpool get all ost0 |grep large_blocks
            ost0 feature@large_blocks enabled local
            [root@wolf-3 mdisolation_step2_draid_testing]# zfs list
            NAME USED AVAIL REFER MOUNTPOINT
            ost0 1.35M 7.01T 19K /ost0
            ost0/ost0 1.21M 7.01T 1.21M /ost0/ost0
            [root@wolf-3 mdisolation_step2_draid_testing]# mount -t lustre ost0/ost0 /mnt/lustre/ost0
            mount.lustre: mount ost0/ost0 at /mnt/lustre/ost0 failed: No such file or directory
            Is the MGS specification correct?
            Is the filesystem name correct?
            If upgrading, is the copied client log valid? (see upgrade docs)

            [ 4669.245843] LustreError: 13a-8: Failed to get MGS log lsdraid-OST0000 and no local copy.
            [ 4669.255978] LustreError: 15c-8: MGC192.168.1.5@o2ib: The configuration from log 'lsdraid-OST0000' failed (-2). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
            [ 4669.284565] LustreError: 48267:0:(obd_mount_server.c:1352:server_start_targets()) failed to start server lsdraid-OST0000: -2
            [ 4669.303204] LustreError: 48267:0:(obd_mount_server.c:1844:server_fill_super()) Unable to start targets: -2
            [ 4669.303994] LustreError: 48267:0:(obd_mount_server.c:1558:server_put_super()) no obd lsdraid-OST0000
            [ 4669.465491] Lustre: server umount lsdraid-OST0000 complete
            [ 4669.472711] LustreError: 48267:0:(obd_mount.c:1453:lustre_fill_super()) Unable to mount (-2)
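
            Not from this log, but when a target gets -2 (ENOENT) for its configuration log, one quick
            check is whether the MGS actually holds that log. A hedged sketch of commands that could
            help, assuming they are run on the MGS node (192.168.1.5@o2ib here) and on the OSS for
            this particular setup:

            # On the MGS: list the configuration logs it currently holds
            lctl --device MGS llog_catlist

            # If lsdraid-OST0000 is listed, inspect it; if it is missing, the target may never have
            # registered with the MGS (for example after a writeconf or an fsname mismatch)
            lctl --device MGS llog_info lsdraid-OST0000
            lctl --device MGS llog_print lsdraid-OST0000

            # From the OSS: confirm the MGS NID is reachable at all
            lctl ping 192.168.1.5@o2ib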

            jsalians_intel John Salinas (Inactive) added a comment

            Kit Westneat (kit.westneat@gmail.com) uploaded a new patch: http://review.whamcloud.com/21939
            Subject: LU-8498 nodemap: new zfs index files not properly initialized
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: e5097f35594e265803a363f4e2e6fc6ea62f62fc

            gerrit Gerrit Updater added a comment

            People

              kit.westneat Kit Westneat (Inactive)
              jay Jinshan Xiong (Inactive)
              Votes: 0
              Watchers: 6
