Lustre / LU-18237

recovery-small: test 10a fails with 'no eviction: before:1726673307'


    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor

      This issue was created by maloo for James Simmons <uja.ornl@gmail.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/64bcf39a-0c5d-4ebe-aa8e-07c6118719cd

      Test session details:
      clients: https://build.whamcloud.com/job/lustre-reviews/107661 - 5.14.0-362.24.1.el9_3.x86_64
      servers: https://build.whamcloud.com/job/lustre-reviews/107661 - 5.14.0-362.24.1_lustre.el9.x86_64

      current_state: FULL
      state_history:

      • [ 1726673245, DISCONN ]
      • [ 1726673245, CONNECTING ]
      • [ 1726673245, RECOVER ]
      • [ 1726673245, FULL ]
      • [ 1726673262, DISCONN ]
      • [ 1726673262, CONNECTING ]
      • [ 1726673262, RECOVER ]
      • [ 1726673262, FULL ]
      • [ 1726673279, DISCONN ]
      • [ 1726673279, CONNECTING ]
      • [ 1726673279, RECOVER ]
      • [ 1726673279, FULL ]
      • [ 1726673297, DISCONN ]
      • [ 1726673297, CONNECTING ]
      • [ 1726673297, RECOVER ]
      • [ 1726673297, FULL ]

      mdc.lustre-MDT0001-mdc-ffff88f406920000.state=
      current_state: FULL
      state_history:

      • [ 1726671294, CONNECTING ]
      • [ 1726671295, FULL ]

      mdc.lustre-MDT0002-mdc-ffff88f406920000.state=
      current_state: FULL
      state_history:

      • [ 1726671294, CONNECTING ]
      • [ 1726671295, FULL ]

      mdc.lustre-MDT0003-mdc-ffff88f406920000.state=
      current_state: FULL
      state_history:

      • [ 1726671294, CONNECTING ]
      • [ 1726671295, FULL ]

        recovery-small test_10a: @@@@@@ FAIL: no eviction: before:1726673307
        Trace dump:
        = /usr/lib64/lustre/tests/test-framework.sh:7177:error()
        = /usr/lib64/lustre/tests/recovery-small.sh:154:test_10a()
        = /usr/lib64/lustre/tests/test-framework.sh:7522:run_one()
        = /usr/lib64/lustre/tests/test-framework.sh:7585:run_one_logged()
        = /usr/lib64/lustre/tests/test-framework.sh:7408:run_test()
        = /usr/lib64/lustre/tests/recovery-small.sh:171:main()
        Dumping lctl log to /autotest/autotest-1/2024-09-18/lustre-reviews_review-dne-part-5_107661_16_4f1e33f4-b320-40c7-8af0-801d1ab9dc56//recovery-small.test_10a.*.1726673420.log
        CMD: trevis-24vm7,trevis-56vm1.trevis.whamcloud.com,trevis-56vm2,trevis-56vm3,trevis-83vm7 /usr/sbin/lctl dk > /autotest/autotest-1/2024-09-18/lustre-reviews_review-dne-part-5_107661_16_4f1e33f4-b320-40c7-8af0-801d1ab9dc56//recovery-small.test_10a.debug_log.\$(hostname -s).1726673420.log;
        dmesg > /autotest/autotest-1/2024-09-18/lustre-reviews_review-dne-part-5_107661_16_4f1e33f4-b320-40c7-8af0-801d1ab9dc56//recovery-small.test_10a.dmesg.\$(hostname -s).1726673420.log
        CMD: trevis-56vm1.trevis.whamcloud.com checkstat -v -p 0777 /mnt/lustre
        /mnt/lustre has perms 0777 OK
        CMD: trevis-83vm7 dmesg
        [ 2314.424588] Lustre: mdt00_001: service thread pid 10192 was inactive for 42.474 seconds. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
        CMD: trevis-83vm7 dmesg
        [ 2383.543931] Lustre: mdt00_001: service thread pid 10192 completed after 111.597s. This likely indicates the system was overloaded (too many service threads, or not enough hardware resources).
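
The failure message leaves the value between "no eviction:" and "before:" empty, and none of the state histories above contains an EVICTED entry. The newest state change in the dump is 1726673297, ten seconds earlier than the "before" timestamp 1726673307. The following is a minimal Python sketch for checking a pasted state dump the same way. It assumes the test's check amounts to "some EVICTED entry is newer than the before timestamp"; recovery-small.sh itself is not quoted in this report, and the function names here are hypothetical helpers, not part of the Lustre test framework.

```python
import re

# Matches one state_history entry such as "[ 1726673297, FULL ]".
ENTRY_RE = re.compile(r"\[\s*(\d+),\s*([A-Z_]+)\s*\]")


def parse_state_history(text):
    """Return (timestamp, state) pairs in the order they appear in text."""
    return [(int(ts), state) for ts, state in ENTRY_RE.findall(text)]


def last_eviction(entries):
    """Return the newest EVICTED timestamp, or None if there is none."""
    stamps = [ts for ts, state in entries if state == "EVICTED"]
    return max(stamps) if stamps else None


def evicted_after(entries, before):
    """True only if an EVICTED entry is newer than the `before` timestamp.

    Assumption: this mirrors the condition the test enforces.
    """
    newest = last_eviction(entries)
    return newest is not None and newest > before


# Last reconnect cycle of the first import plus the MDT0001 history,
# copied from the report above.
SAMPLE = """
[ 1726673297, DISCONN ]
[ 1726673297, CONNECTING ]
[ 1726673297, RECOVER ]
[ 1726673297, FULL ]
[ 1726671294, CONNECTING ]
[ 1726671295, FULL ]
"""

entries = parse_state_history(SAMPLE)
print(last_eviction(entries))              # None
print(evicted_after(entries, 1726673307))  # False
```

Run against the full dump, the same check gives the same result: the first import cycles through DISCONN, CONNECTING, RECOVER and FULL four times but never records EVICTED.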

            Assignee: WC Triage (wc-triage)
            Reporter: Maloo (maloo)