Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-11387

sanityn test_34: lock timeout happened

    XMLWordPrintable

Details

    • Bug
    • Resolution: Unresolved
    • Minor
    • None
    • Lustre 2.13.0, Lustre 2.12.3
    • None
    • 3
    • 9223372036854775807

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/4080e38e-b988-11e8-9df3-52540065bddc

      test_34 failed with the following error:

      lock timeout happened
      

      This is a new failure starts from tag-2.11.55

      test log

      == sanityn test 34: no lock timeout under IO ========================================================= 02:25:24 (1536978324)
      CMD: trevis-33vm3 lctl get_param -n ldlm.namespaces.filter-*.lock_timeouts
      CMD: trevis-33vm3 lctl set_param fail_loc=0x512
      fail_loc=0x512
      CMD: trevis-33vm3 lctl set_param fail_loc=0x512
      fail_loc=0x512
      CMD: trevis-33vm3 lctl set_param fail_loc=0x512
      fail_loc=0x512
      CMD: trevis-33vm3 lctl set_param fail_loc=0x512
      fail_loc=0x512
      CMD: trevis-33vm3 lctl set_param fail_loc=0x512
      fail_loc=0x512
      CMD: trevis-33vm3 lctl set_param fail_loc=0x512
      fail_loc=0x512
      CMD: trevis-33vm3 lctl set_param fail_loc=0x512
      fail_loc=0x512
      lock should not expire
      writing on client1
      reading on client2
      CMD: trevis-33vm3 lctl get_param -n ldlm.namespaces.filter-*.lock_timeouts
       sanityn test_34: @@@@@@ FAIL: lock timeout happened 
        Trace dump:
        = /usr/lib64/lustre/tests/test-framework.sh:5764:error()
        = /usr/lib64/lustre/tests/sanityn.sh:1190:test_34()
        = /usr/lib64/lustre/tests/test-framework.sh:6045:run_one()
        = /usr/lib64/lustre/tests/test-framework.sh:6084:run_one_logged()
        = /usr/lib64/lustre/tests/test-framework.sh:5931:run_test()
        = /usr/lib64/lustre/tests/sanityn.sh:1198:main()
      Dumping lctl log to /autotest/autotest2/2018-09-14/lustre-master-el7-x86_64--full--1_1_1__3792___ae7d30c1-91f6-4521-85bc-1217e7c76208/sanityn.test_34.*.1536978344.log
      CMD: trevis-33vm1.trevis.whamcloud.com,trevis-33vm2,trevis-33vm3,trevis-33vm4 /usr/sbin/lctl dk > /autotest/autotest2/2018-09-14/lustre-master-el7-x86_64--full--1_1_1__3792___ae7d30c1-91f6-4521-85bc-1217e7c76208/sanityn.test_34.debug_log.\$(hostname -s).1536978344.log;
               dmesg > /autotest/autotest2/2018-09-14/lustre-master-el7-x86_64--full--1_1_1__3792___ae7d30c1-91f6-4521-85bc-1217e7c76208/sanityn.test_34.dmesg.\$(hostname -s).1536978344.log
      Resetting fail_loc on all nodes...CMD: trevis-33vm1.trevis.whamcloud.com,trevis-33vm2,trevis-33vm3,trevis-33vm4 lctl set_param -n fail_loc=0 	    fail_val=0 2>/dev/null
      done.
      CMD: trevis-33vm3 lctl set_param -n fail_loc=0 2>/dev/null || true
      CMD: trevis-33vm1.trevis.whamcloud.com lctl get_param -n at_min
      CMD: trevis-33vm1.trevis.whamcloud.com PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests/../utils:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/qt-3.3/bin:/usr/lib64/compat-openmpi16/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/sbin:/sbin:/bin::/sbin:/bin:/usr/sbin: NAME=autotest_config bash rpc.sh wait_import_state \(FULL\|IDLE\) osc.lustre-OST0000-osc-ffff9de2562cf000.ost_server_uuid
      osc.lustre-OST0000-osc-ffff9de25e57e000.ost_server_uuid 40 
      == rpc test complete, duration -o sec ================================================================ 02:25:47 (1536978347)
      trevis-33vm1.trevis.whamcloud.com: executing wait_import_state (FULL|IDLE) osc.lustre-OST0000-osc-ffff9de2562cf000.ost_server_uuid osc.lustre-OST0000-osc-ffff9de25e57e000.ost_server_uuid 40
      osc.lustre-OST0000-osc-ffff9de2562cf000.ost_server_uuid in IDLE state after 0 sec
      CMD: trevis-33vm1.trevis.whamcloud.com lctl get_param -n at_min
      CMD: trevis-33vm1.trevis.whamcloud.com PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests/../utils:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/qt-3.3/bin:/usr/lib64/compat-openmpi16/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/sbin:/sbin:/bin::/sbin:/bin:/usr/sbin: NAME=autotest_config bash rpc.sh wait_import_state \(FULL\|IDLE\) osc.lustre-OST0001-osc-ffff9de2562cf000.ost_server_uuid
      osc.lustre-OST0001-osc-ffff9de25e57e000.ost_server_uuid 40 
      == rpc test complete, duration -o sec ================================================================ 02:25:48 (1536978348)
      

      OST dmesg

      [21528.561273] Lustre: *** cfs_fail_loc=512, val=0***
      [21529.151161] Lustre: *** cfs_fail_loc=512, val=0***
      [21529.163524] LustreError: 6657:0:(ldlm_lockd.c:251:expired_lock_main()) ### lock callback timer expired after 1s: evicting client at 10.9.5.158@tcp  ns: filter-lustre-OST0001_UUID lock: ffff8c6a7596e240/0x7408246bca0d1d1c lrc: 3/0,0 mode: PW/PW res: [0x433a:0x0:0x0].0x0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->4095) flags: 0x60000400000020 nid: 10.9.5.158@tcp remote: 0x7280a0e6030637b7 expref: 9 pid: 8833 timeout: 21529 lvb_type: 0
      [21529.169313] Lustre: *** cfs_fail_loc=512, val=0***
      [21529.169943] LustreError: 29618:0:(ldlm_lockd.c:2263:ldlm_cancel_handler()) ldlm_cancel from 10.9.5.158@tcp arrived at 1536978332 with bad export cookie 8360972753583283470
      [21529.169966] Lustre: *** cfs_fail_loc=512, val=0***
      [21529.172542] Lustre: *** cfs_fail_loc=512, val=0***
      

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      sanityn test_34 - lock timeout happened

      Attachments

        Issue Links

          Activity

            People

              wc-triage WC Triage
              maloo Maloo
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated: