LU-19674: ost-pools test_25 hung


Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Medium
    • Affects Version/s: Lustre 2.18.0, Lustre 2.17.0
    • Severity: 3

    Description

      This issue was created by maloo for jianyu <yujian@whamcloud.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/7318d056-303a-4312-af07-6e96e5ad1e37

      test_25 failed with the following error:

      == ost-pools test 25: Create new pool and restart MDS ==== 21:05:29 (1764709529)
      CMD: onyx-80vm11 lctl pool_new lustre.testpool1
      onyx-80vm11: Pool lustre.testpool1 created
      CMD: onyx-80vm11 lctl get_param -n lod.lustre-MDT0000-mdtlov.pools.testpool1 				2>/dev/null || echo foo
      CMD: onyx-80vm12 lctl get_param -n lod.lustre-MDT0001-mdtlov.pools.testpool1 				2>/dev/null || echo foo
      CMD: onyx-80vm11 lctl get_param -n lod.lustre-MDT0002-mdtlov.pools.testpool1 				2>/dev/null || echo foo
      CMD: onyx-80vm1.onyx.whamcloud.com lctl get_param -n lov.lustre-*.pools.testpool1 		2>/dev/null || echo foo
      CMD: onyx-80vm11 lctl pool_add lustre.testpool1 OST0000; sync
      onyx-80vm11: OST lustre-OST0000_UUID added to pool lustre.testpool1
      CMD: onyx-80vm1.onyx.whamcloud.com lctl get_param -n 			lov.lustre-*.pools.testpool1 | sort -u |
      			tr '\n' ' ' 
      Failing mds1 on onyx-80vm11
      CMD: onyx-80vm11 grep -c /mnt/lustre-mds1' ' /proc/mounts || true
      Stopping /mnt/lustre-mds1 (opts:) on onyx-80vm11
      CMD: onyx-80vm11 umount -d /mnt/lustre-mds1
      CMD: onyx-80vm11 lsmod | grep lnet > /dev/null &&
      lctl dl | grep ' ST ' || true
      21:05:36 (1764709536) shut down
      facet: mds1 facet_host: onyx-80vm11 facet_failover_host: onyx-80vm11
      Failover mds1 to onyx-80vm11
      mount facets: mds1
      CMD: onyx-80vm11 dmsetup status /dev/mapper/mds1_flakey >/dev/null 2>&1
      CMD: onyx-80vm11 dmsetup status /dev/mapper/mds1_flakey 2>&1
      CMD: onyx-80vm11 test -b /dev/mapper/mds1_flakey
      CMD: onyx-80vm11 e2label /dev/mapper/mds1_flakey
      Start mds1: mount -t lustre -o localrecov  /dev/mapper/mds1_flakey /mnt/lustre-mds1
      CMD: onyx-80vm11 mkdir -p /mnt/lustre-mds1; mount -t lustre -o localrecov  /dev/mapper/mds1_flakey /mnt/lustre-mds1
      CMD: onyx-80vm11 /usr/sbin/lctl get_param -n health_check
      CMD: onyx-80vm11 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/opt/iozone/bin:/usr/lib64/openmpi/bin:/usr/share/Modules/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/sbin:/sbin::/sbin:/bin:/usr/sbin: NAME=autotest_config 		TESTLOG_PREFIX=/autotest/autotest-2/2025-12-02/lustre-master_full-dne-part-1_4677_7_48951508-b72b-41b2-b97c-9a9735efc9b9//ost-pools TESTNAME=test_25 		CONFIG=/usr/lib64/lustre/tests/cfg/autotest_config.sh bash rpc.sh set_default_debug \"vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck\" \"all\" 
      onyx-80vm11: CMD: onyx-80vm11 hostname -I
      onyx-80vm11: onyx-80vm11.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
      CMD: onyx-80vm11 e2label /dev/mapper/mds1_flakey 				2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
      pdsh@onyx-80vm1: onyx-80vm11: ssh exited with exit code 1
      CMD: onyx-80vm11 e2label /dev/mapper/mds1_flakey 2>/dev/null
      Started lustre-MDT0000
      21:05:51 (1764709551) targets are mounted
      21:05:51 (1764709551) facet_failover done
      CMD: onyx-80vm11 lctl get_param -n at_min
      CMD: onyx-80vm11 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/opt/iozone/bin:/usr/lib64/openmpi/bin:/usr/share/Modules/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/sbin:/sbin::/sbin:/bin:/usr/sbin: NAME=autotest_config 		TESTLOG_PREFIX=/autotest/autotest-2/2025-12-02/lustre-master_full-dne-part-1_4677_7_48951508-b72b-41b2-b97c-9a9735efc9b9//ost-pools TESTNAME=test_25 		CONFIG=/usr/lib64/lustre/tests/cfg/autotest_config.sh bash rpc.sh wait_import_state FULL os[cp].lustre-OST0000-osc-MDT0000.ost_server_uuid 50 
      onyx-80vm11: CMD: onyx-80vm11 hostname -I
      onyx-80vm11: onyx-80vm11.onyx.whamcloud.com: executing wait_import_state FULL os[cp].lustre-OST0000-osc-MDT0000.ost_server_uuid 50
      onyx-80vm11: os[cp].lustre-OST0000-osc-MDT0000.ost_server_uuid in FULL state after 0 sec
      

      Test session details:
      clients: https://build.whamcloud.com/job/lustre-master/4677 - 4.18.0-553.85.1.el8_10.x86_64
      servers: https://build.whamcloud.com/job/lustre-master/4677 - 4.18.0-553.85.1.el8_lustre.x86_64
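
      For reference, the pool/restart sequence that test_25 exercises can be replayed by hand roughly as below. This is a minimal sketch reconstructed from the console log above, not the test script itself; the fs name "lustre", pool "testpool1", device /dev/mapper/mds1_flakey and mount point /mnt/lustre-mds1 are specific to this run.

      # On the MDS node (onyx-80vm11 in this run):
      lctl pool_new lustre.testpool1                 # create an empty pool
      lctl pool_add lustre.testpool1 OST0000; sync   # add OST0000 to the pool
      umount -d /mnt/lustre-mds1                     # stop mds1 (facet_failover)
      mount -t lustre -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1   # restart mds1
      lctl get_param -n lod.lustre-MDT0000-mdtlov.pools.testpool1   # verify the pool after restart

      The log shows the lustre-OST0000 import on MDT0000 reaching FULL immediately after the restart, so the hang happens at some later point in the test; the autotest session timeout (839 minutes) eventually fired with ost-pools still the running suite.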

      <<Please provide additional information about the failure here>>

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      ost-pools test_25 - Timeout occurred after 839 minutes, last suite running was ost-pools

People

    Assignee: WC Triage (wc-triage)
    Reporter: Maloo (maloo)