Lustre / LU-18237

recovery-small: test 10a fails with 'no eviction: before:1726673307'


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Severity: 3

    Description

      This issue was created by maloo for James Simmons <uja.ornl@gmail.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/64bcf39a-0c5d-4ebe-aa8e-07c6118719cd

      Test session details:
      clients: https://build.whamcloud.com/job/lustre-reviews/107661 - 5.14.0-362.24.1.el9_3.x86_64
      servers: https://build.whamcloud.com/job/lustre-reviews/107661 - 5.14.0-362.24.1_lustre.el9.x86_64


      current_state: FULL
      state_history:
       - [ 1726673245, DISCONN ]
       - [ 1726673245, CONNECTING ]
       - [ 1726673245, RECOVER ]
       - [ 1726673245, FULL ]
       - [ 1726673262, DISCONN ]
       - [ 1726673262, CONNECTING ]
       - [ 1726673262, RECOVER ]
       - [ 1726673262, FULL ]
       - [ 1726673279, DISCONN ]
       - [ 1726673279, CONNECTING ]
       - [ 1726673279, RECOVER ]
       - [ 1726673279, FULL ]
       - [ 1726673297, DISCONN ]
       - [ 1726673297, CONNECTING ]
       - [ 1726673297, RECOVER ]
       - [ 1726673297, FULL ]
      mdc.lustre-MDT0001-mdc-ffff88f406920000.state=
      current_state: FULL
      state_history:
       - [ 1726671294, CONNECTING ]
       - [ 1726671295, FULL ]
      mdc.lustre-MDT0002-mdc-ffff88f406920000.state=
      current_state: FULL
      state_history:
       - [ 1726671294, CONNECTING ]
       - [ 1726671295, FULL ]
      mdc.lustre-MDT0003-mdc-ffff88f406920000.state=
      current_state: FULL
      state_history:
       - [ 1726671294, CONNECTING ]
       - [ 1726671295, FULL ]
      recovery-small test_10a: @@@@@@ FAIL: no eviction: before:1726673307
      Trace dump:
      = /usr/lib64/lustre/tests/test-framework.sh:7177:error()
      = /usr/lib64/lustre/tests/recovery-small.sh:154:test_10a()
      = /usr/lib64/lustre/tests/test-framework.sh:7522:run_one()
      = /usr/lib64/lustre/tests/test-framework.sh:7585:run_one_logged()
      = /usr/lib64/lustre/tests/test-framework.sh:7408:run_test()
      = /usr/lib64/lustre/tests/recovery-small.sh:171:main()
      Dumping lctl log to /autotest/autotest-1/2024-09-18/lustre-reviews_review-dne-part-5_107661_16_4f1e33f4-b320-40c7-8af0-801d1ab9dc56//recovery-small.test_10a.*.1726673420.log
      CMD: trevis-24vm7,trevis-56vm1.trevis.whamcloud.com,trevis-56vm2,trevis-56vm3,trevis-83vm7 /usr/sbin/lctl dk > /autotest/autotest-1/2024-09-18/lustre-reviews_review-dne-part-5_107661_16_4f1e33f4-b320-40c7-8af0-801d1ab9dc56//recovery-small.test_10a.debug_log.\$(hostname -s).1726673420.log;
        dmesg > /autotest/autotest-1/2024-09-18/lustre-reviews_review-dne-part-5_107661_16_4f1e33f4-b320-40c7-8af0-801d1ab9dc56//recovery-small.test_10a.dmesg.\$(hostname -s).1726673420.log
      CMD: trevis-56vm1.trevis.whamcloud.com checkstat -v -p 0777 /mnt/lustre
      /mnt/lustre has perms 0777 OK
      CMD: trevis-83vm7 dmesg
      [ 2314.424588] Lustre: mdt00_001: service thread pid 10192 was inactive for 42.474 seconds. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      CMD: trevis-83vm7 dmesg
      [ 2383.543931] Lustre: mdt00_001: service thread pid 10192 completed after 111.597s. This likely indicates the system was overloaded (too many service threads, or not enough hardware resources).
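
      The failure message suggests the test expected an EVICTED transition newer than `before` (1726673307) in the client's mdc state_history, yet none of the dumps above contains an EVICTED entry: the first one only shows four DISCONN/CONNECTING/RECOVER/FULL cycles, the last starting at 1726673297. The Python sketch below re-states that kind of check against the captured output. It is an assumption about what the test verifies, not the actual logic of recovery-small.sh, and the helper names (parse_history, latest_eviction, eviction_after, disconnect_gaps) are hypothetical.

```python
import re

# One state_history entry as dumped in this report, e.g. "- [ 1726673297, FULL ]".
# State names are upper-case and may contain "_" (e.g. REPLAY_LOCKS).
ENTRY = re.compile(r"\[\s*(\d+),\s*([A-Z_]+)\s*\]")


def parse_history(text):
    """Return a list of (timestamp, state) tuples found in text."""
    return [(int(ts), state) for ts, state in ENTRY.findall(text)]


def latest_eviction(history):
    """Return the newest EVICTED timestamp, or None if none was recorded."""
    evictions = [ts for ts, state in history if state == "EVICTED"]
    return max(evictions) if evictions else None


def eviction_after(history, before):
    """True only if an EVICTED transition was recorded strictly after `before`.

    Hypothetical re-statement of the check implied by the failure message
    "no eviction: before:<timestamp>"; not taken from recovery-small.sh.
    """
    evict = latest_eviction(history)
    return evict is not None and evict > before


def disconnect_gaps(history):
    """Seconds elapsed between consecutive DISCONN entries."""
    times = [ts for ts, state in history if state == "DISCONN"]
    return [later - earlier for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    # First state_history dump from this report (its device name was not captured).
    report = """
     - [ 1726673245, DISCONN ]
     - [ 1726673245, CONNECTING ]
     - [ 1726673245, RECOVER ]
     - [ 1726673245, FULL ]
     - [ 1726673262, DISCONN ]
     - [ 1726673262, CONNECTING ]
     - [ 1726673262, RECOVER ]
     - [ 1726673262, FULL ]
     - [ 1726673279, DISCONN ]
     - [ 1726673279, CONNECTING ]
     - [ 1726673279, RECOVER ]
     - [ 1726673279, FULL ]
     - [ 1726673297, DISCONN ]
     - [ 1726673297, CONNECTING ]
     - [ 1726673297, RECOVER ]
     - [ 1726673297, FULL ]
    """
    history = parse_history(report)
    print(latest_eviction(history))             # None
    print(eviction_after(history, 1726673307))  # False
    print(disconnect_gaps(history))             # [17, 17, 18]
```

      Run on the first dump, the sketch finds no EVICTED entry at all (so `eviction_after` is False for before=1726673307) and DISCONN entries 17, 17 and 18 seconds apart.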

    People

      Assignee: WC Triage
      Reporter: Maloo