
sanity test_120e: FAIL: 2 cancel RPC occured

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.16.0
    • Affects Version/s: Lustre 2.16.0
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      This issue was created by maloo for jianyu <yujian@whamcloud.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/36c3631b-e39a-4fa7-b2f9-567a21a15aa6

      test_120e failed with the following error:

      == sanity test 120e: Early Lock Cancel: unlink test ====== 00:39:44 (1727743184)
      striped dir -i0 -c1 -H crush /mnt/lustre/d120e.sanity
      ldlm.namespaces.lustre-MDT0000-mdc-ffff0000dfa1c000.lru_size=200
      ldlm.namespaces.lustre-MDT0001-mdc-ffff0000dfa1c000.lru_size=200
      ldlm.namespaces.lustre-MDT0002-mdc-ffff0000dfa1c000.lru_size=200
      ldlm.namespaces.lustre-MDT0003-mdc-ffff0000dfa1c000.lru_size=200
      ldlm.namespaces.lustre-OST0000-osc-ffff0000dfa1c000.lru_size=200
      ldlm.namespaces.lustre-OST0001-osc-ffff0000dfa1c000.lru_size=200
      ldlm.namespaces.lustre-OST0002-osc-ffff0000dfa1c000.lru_size=200
      ldlm.namespaces.lustre-OST0003-osc-ffff0000dfa1c000.lru_size=200
      ldlm.namespaces.lustre-OST0004-osc-ffff0000dfa1c000.lru_size=200
      ldlm.namespaces.lustre-OST0005-osc-ffff0000dfa1c000.lru_size=200
      ldlm.namespaces.lustre-OST0006-osc-ffff0000dfa1c000.lru_size=200
      ldlm.namespaces.lustre-OST0007-osc-ffff0000dfa1c000.lru_size=200
      1+0 records in
      1+0 records out
      512 bytes copied, 0.00646128 s, 79.2 kB/s
      1+0 records in
      1+0 records out
      512 bytes copied, 0.00815561 s, 62.8 kB/s
      CMD: onyx-76vm12 /usr/sbin/lctl get_param -n ldlm.services.ldlm_canceld.stats
      CMD: onyx-76vm12 /usr/sbin/lctl get_param -n ldlm.services.ldlm_canceld.stats
       sanity test_120e: @@@@@@ FAIL: 2 cancel RPC occured
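
The two `lctl get_param -n ldlm.services.ldlm_canceld.stats` calls in the log suggest the test snapshots the server-side cancel counter before and after the unlink and fails if it changed. A minimal sketch of that comparison follows. The function names `cancel_count` and `cancel_delta` are hypothetical, the stats layout (counter name in column 1, sample count in column 2) is an assumption, and the snapshot text is made up for illustration rather than taken from the failing run.

```shell
# Read ldlm_canceld stats text on stdin and print the ldlm_cancel sample count.
# Assumes column 1 is the counter name and column 2 is the count; prints 0 if absent.
cancel_count() {
	awk '$1 == "ldlm_cancel" { n = $2 } END { print n + 0 }'
}

# Print the change in the cancel count between two stats snapshots.
# A non-zero result means cancel RPCs reached the server in between.
cancel_delta() {
	before=$(printf '%s\n' "$1" | cancel_count)
	after=$(printf '%s\n' "$2" | cancel_count)
	echo $((after - before))
}

# Made-up snapshots for illustration only.
snap1='snapshot_time 1727743184.1 secs.nsecs
ldlm_cancel 40 samples [usec] 10 200 3000'
snap2='snapshot_time 1727743185.2 secs.nsecs
ldlm_cancel 42 samples [usec] 10 200 3000'

delta=$(cancel_delta "$snap1" "$snap2")
[ "$delta" -eq 0 ] || echo "FAIL: $delta cancel RPC occured"
```

Because only the difference between the two snapshots is checked, any lock cancelled for an unrelated reason during the window (for example, a lock aging out of the LRU) would be counted against the test as well.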
      

      Test session details:
      clients: https://build.whamcloud.com/job/lustre-master/4581 - 5.14.0-362.24.1.el9_3.aarch64
      servers: https://build.whamcloud.com/job/lustre-master/4581 - 4.18.0-477.27.1.el8_lustre.x86_64

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      sanity test_120e - 2 cancel RPC occured

          Activity

            [LU-18288] sanity test_120e: FAIL: 2 cancel RPC occured
            pjones Peter Jones added a comment -

            Merged for 2.16

            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/56642/
            Subject: LU-18288 tests: lru_resize_disable sets lru_max_age
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 2f946588daef34b54ed67734290973267915865e

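Going only by the patch subject, the fix appears to pin `lru_max_age` together with `lru_size` when LRU resizing is disabled for a test. The sketch below is not the merged patch: `lru_pin_commands` is a hypothetical dry-run helper that prints the `lctl set_param` commands instead of running them, and the max-age value in the usage line is a placeholder, since the value the real patch uses is not stated in this ticket.

```shell
# Hypothetical dry-run helper: print (do not execute) the commands that would
# pin both the LRU size and the LRU max age on every client lock namespace.
#   $1 = fixed lru_size (the log above shows 200)
#   $2 = lru_max_age value; supplied by the caller, not taken from the patch
lru_pin_commands() {
	printf '%s\n' "lctl set_param ldlm.namespaces.*.lru_size=$1"
	printf '%s\n' "lctl set_param ldlm.namespaces.*.lru_max_age=$2"
}

# Usage with a placeholder max age:
lru_pin_commands 200 3900000
```

Printing the commands keeps the sketch runnable on a machine without Lustre; on a real client the printed lines would be executed directly.
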
            yujian Jian Yu added a comment -

            Lustre 2.16.0 RC5 sanity test 120f hit the same failure:
            https://testing.whamcloud.com/test_sets/098f4984-aeae-448d-b550-2b94f6f87eff

            gerrit Gerrit Updater added a comment -

            "Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/56642
            Subject: LU-18288 tests: lru_resize_disable sets lru_max_age
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: f604c7cdcbb132f6e97743354253d0f68b39671b

            adilger Andreas Dilger added a comment - - edited

            It looks like this started failing frequently on 2024-09-03.

            It might be caused by patch https://review.whamcloud.com/53682 "LU-17428 ldlm: reduce default lru_max_age", which landed on 2024-08-30 and may be triggering more lock cancellations during testing.

            yujian Jian Yu added a comment -

            The failure occurred consistently on the following test sessions:
            lustre-master-el8.8-x86_64-full-dne-part-2
            lustre-master-el8.9-x86_64-full-zfs-part-2
            lustre-master-el9.4-x86_64_lustre-master-sles15sp6-x86_64-full-part-2
            lustre-master-el8.10-x86_64_lustre-b2_15-el8.8-x86_64-rolling-upgrade-oss-zfs
            lustre-master-el8.10-x86_64_lustre-b2_15-el8.8-x86_64-rolling-upgrade-mds-zfs
            lustre-master-el9.4-x86_64_lustre-b2_15-el8.8-x86_64-rolling-upgrade-oss-zfs


            adilger Andreas Dilger added a comment -

            Various test_120* subtests have been failing in a similar manner for years. However, this is becoming much more frequent.

            People

              Assignee: adilger Andreas Dilger
              Reporter: maloo Maloo