LU-16905: sanity-quota/18 Failure (possibly due to an incorrect timeout)


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.16.0
    • Affects Version/s: None
    • Labels: None
    • Severity: 3

    Description

      == sanity-quota test 18: MDS failover while writing, no watchdog triggered (b14840) ========================================================== 08:41:17 (1686832877)
      sleep 5 for ZFS zfs
      Waiting for MDT destroys to complete
      Creating test directory
      fail_val=0
      fail_loc=0
      Waiting 90s for 'u'
      Updated after 2s: want 'u' got 'u'
      User quota (limit: 200)
      Disk quotas for usr quota_usr (uid 60000):
      Filesystem kbytes quota limit grace files quota limit grace
      /mnt/lustre 0 0 204800 - 0 0 0 -
      lustre-MDT0000_UUID
      0 - 0 - 0 - 0 -
      lustre-OST0000_UUID
      0 - 0 - - - - -
      lustre-OST0001_UUID
      0 - 0 - - - - -
      Total allocated inode limit: 0, total allocated block limit: 0
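
      For context, the 204800 KB "limit" column above is the 200 MB block hard limit the test gives quota_usr before writing (200 * 1024 = 204800 KiB). A minimal sketch of setting and checking such a limit with lfs, assuming the mount point and exact options (they are not copied from sanity-quota.sh):

        # Hedged sketch: give quota_usr a 200 MiB block hard limit and no
        # soft limit; block limits are in KiB by default, so 204800 matches
        # the "limit" column in the report above. Mount point is assumed.
        lfs setquota -u quota_usr -b 0 -B 204800 /mnt/lustre
        lfs quota -v -u quota_usr /mnt/lustre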
      sysctl: cannot stat /proc/sys/lustre/timeout: No such file or directory
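
      The sysctl failure just above is worth flagging given the summary's "incorrect timeout" theory: /proc/sys/lustre/timeout is the legacy location of the obd timeout, and when that path is absent the script reads no value at all. A minimal sketch of a more robust read; the fallback default of 100 s is an assumption:

        # Hedged sketch: read the obd timeout via lctl get_param first,
        # then try the legacy proc path, then fall back to an assumed
        # default of 100 s.
        TIMEOUT=$(lctl get_param -n timeout 2>/dev/null ||
                  cat /proc/sys/lustre/timeout 2>/dev/null ||
                  echo 100)
        echo "using obd timeout: ${TIMEOUT}s"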
      Write 100M (buffered) ...
      running as uid/gid/euid/egid 60000/60000/60000/60000, groups:
      [dd] [if=/dev/zero] [bs=1M] [of=/mnt/lustre/d18.sanity-quota/f18.sanity-quota] [count=100]
      UUID 1K-blocks Used Available Use% Mounted on
      lustre-MDT0000_UUID 2210688 4096 2204544 1% /mnt/lustre[MDT:0]
      lustre-OST0000_UUID 3771392 3072 3748864 1% /mnt/lustre[OST:0]
      lustre-OST0001_UUID 3771392 3072 3766272 1% /mnt/lustre[OST:1]

      filesystem_summary: 7542784 6144 7515136 1% /mnt/lustre

      Fail mds for 0 seconds
      Failing mds1 on oleg365-server
      Stopping /mnt/lustre-mds1 (opts:) on oleg365-server
      08:41:31 (1686832891) shut down
      Failover mds1 to oleg365-server
      mount facets: mds1
      Starting mds1: -o localrecov lustre-mdt1/mdt1 /mnt/lustre-mds1
      oleg365-server: oleg365-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 8
      pdsh@oleg365-client: oleg365-server: ssh exited with exit code 1
      Started lustre-MDT0000
      08:41:48 (1686832908) targets are mounted
      08:41:48 (1686832908) facet_failover done
      oleg365-client.virtnet: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid
      mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
      100+0 records in
      100+0 records out
      104857600 bytes (105 MB) copied, 48.3932 s, 2.2 MB/s
      (dd_pid=1833, time=25, timeout=600)
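
      The "(dd_pid=1833, time=25, timeout=600)" line reflects the test's structure: dd runs in the background while mds1 is failed over, and the script polls the dd pid until it exits, erroring only if it outlives the timeout. A minimal sketch of that pattern; runas, fail, and error are test-framework.sh helpers visible in this log, while the loop shape, variable names, and 5 s poll interval are assumptions:

        # Hedged sketch of the write-while-failover loop implied above.
        runas -u 60000 -g 60000 dd if=/dev/zero of=$TESTFILE bs=1M count=100 &
        DD_PID=$!
        fail mds1                    # stop, then remount and recover the MDS facet
        timeout=600
        elapsed=0
        while kill -0 $DD_PID 2>/dev/null; do
                sleep 5
                elapsed=$((elapsed + 5))
                (( elapsed >= timeout )) && error "dd ran longer than ${timeout}s"
        done
        echo "(dd_pid=$DD_PID, time=$elapsed, timeout=$timeout)"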
      Disk quotas for usr quota_usr (uid 60000):
      Filesystem kbytes quota limit grace files quota limit grace
      /mnt/lustre 98310 0 204800 - 1 0 0 -
      lustre-MDT0000_UUID
      2* - 2 - 1 - 0 -
      lustre-OST0000_UUID
      98309 - 114688 - - - - -
      lustre-OST0001_UUID
      0 - 0 - - - - -
      Total allocated inode limit: 0, total allocated block limit: 114688
      Delete files...
      Wait for unlink objects finished...
      sleep 5 for ZFS zfs
      sleep 5 for ZFS zfs
      Waiting for MDT destroys to complete
      sleep 5 for ZFS zfs
      Waiting for MDT destroys to complete
      Creating test directory
      fail_val=0
      fail_loc=0
      User quota (limit: 200)
      Disk quotas for usr quota_usr (uid 60000):
      Filesystem kbytes quota limit grace files quota limit grace
      /mnt/lustre 0 0 204800 - 0 0 0 -
      lustre-MDT0000_UUID
      0 - 0 - 0 - 0 -
      lustre-OST0000_UUID
      0 - 0 - - - - -
      lustre-OST0001_UUID
      0 - 0 - - - - -
      Total allocated inode limit: 0, total allocated block limit: 0
      sysctl: cannot stat /proc/sys/lustre/timeout: No such file or directory
      Write 100M (directio) ...
      running as uid/gid/euid/egid 60000/60000/60000/60000, groups:
      [dd] [if=/dev/zero] [bs=1M] [of=/mnt/lustre/d18.sanity-quota/f18.sanity-quota] [count=100] [oflag=direct]
      UUID 1K-blocks Used Available Use% Mounted on
      lustre-MDT0000_UUID 2210560 3840 2204672 1% /mnt/lustre[MDT:0]
      lustre-OST0000_UUID 3771392 3072 3758080 1% /mnt/lustre[OST:0]
      lustre-OST0001_UUID 3771392 3072 3766272 1% /mnt/lustre[OST:1]

      filesystem_summary: 7542784 6144 7524352 1% /mnt/lustre

      Fail mds for 0 seconds
      Failing mds1 on oleg365-server
      Stopping /mnt/lustre-mds1 (opts:) on oleg365-server
      08:42:46 (1686832966) shut down
      Failover mds1 to oleg365-server
      mount facets: mds1
      Starting mds1: -o localrecov lustre-mdt1/mdt1 /mnt/lustre-mds1
      oleg365-server: oleg365-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 8
      pdsh@oleg365-client: oleg365-server: ssh exited with exit code 1
      Started lustre-MDT0000
      08:43:02 (1686832982) targets are mounted
      08:43:02 (1686832982) facet_failover done
      oleg365-client.virtnet: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid
      mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
      100+0 records in
      100+0 records out
      104857600 bytes (105 MB) copied, 52.5656 s, 2.0 MB/s
      (dd_pid=4187, time=30, timeout=600)
      Disk quotas for usr quota_usr (uid 60000):
      Filesystem kbytes quota limit grace files quota limit grace
      /mnt/lustre 102407 0 204800 - 1 0 0 -
      lustre-MDT0000_UUID
      2* - 2 - 1 - 0 -
      lustre-OST0000_UUID
      102406 - 107525 - - - - -
      lustre-OST0001_UUID
      0 - 0 - - - - -
      Total allocated inode limit: 0, total allocated block limit: 107525
      Delete files...
      Wait for unlink objects finished...
      sleep 5 for ZFS zfs
      sleep 5 for ZFS zfs
      Waiting for MDT destroys to complete
      sanity-quota test_18: @@@@@@ FAIL: [ 2836.180747] Lustre: ll_ost_io00_004: service thread pid 27906 was inactive for 40.067 seconds. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      Trace dump:
      = /home/green/git/lustre-release/lustre/tests/test-framework.sh:6566:error()
      = /home/green/git/lustre-release/lustre/tests/sanity-quota.sh:2945:test_18()
      = /home/green/git/lustre-release/lustre/tests/test-framework.sh:6906:run_one()
      = /home/green/git/lustre-release/lustre/tests/test-framework.sh:6955:run_one_logged()
      = /home/green/git/lustre-release/lustre/tests/test-framework.sh:6792:run_test()
      = /home/green/git/lustre-release/lustre/tests/sanity-quota.sh:2948:main()
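
      What actually failed: test 18 asserts that no service-thread watchdog fires during the failover, but ll_ost_io00_004 on the OST sat inactive for 40.067 s while the MDS was down, long enough to trip the watchdog and emit the "was inactive" console message quoted in the FAIL line. The test appears to detect this by scanning the server's kernel log after the run; a minimal sketch of such a check, where the grep pattern is an assumption and do_facet/error are test-framework.sh helpers:

        # Hedged sketch: fail the test if any service-thread watchdog
        # message reached the server console during the run.
        WATCHDOG=$(do_facet ost1 dmesg | grep "was inactive for")
        [ -n "$WATCHDOG" ] && error "$WATCHDOG"

      Since dd legitimately stalls while mds1 is down, a watchdog threshold that is short relative to the failover window can fire even though nothing is actually hung, which would be consistent with the summary's "possibly due to an incorrect timeout" and with the unreadable /proc/sys/lustre/timeout noted earlier.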
      Dumping lctl log to /tmp/testlogs//sanity-quota.test_18.*.1686833039.log
      Delete files...
      Wait for unlink objects finished...
      rsync: chown "/tmp/testlogs/.sanity-quota.test_18.debug_log.oleg365-server.1686833039.log.4knRXN" failed: Operation not permitted (1)
      rsync: chown "/tmp/testlogs/.sanity-quota.test_18.dmesg.oleg365-server.1686833039.log.Nvxt9o" failed: Operation not permitted (1)
      rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1651) [generator=3.1.2]
      sleep 5 for ZFS zfs
      Waiting for MDT destroys to complete
      Delete files...
      Wait for unlink objects finished...
      sleep 5 for ZFS zfs
      Waiting for MDT destroys to complete


    People

      Assignee: Arshad Hussain (arshad512)
      Reporter: Arshad Hussain (arshad512)
      Votes: 0
      Watchers: 4
