
ZFS sanity test_316: lfs migrate -m1 failed: no such file or directory

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.16.0
    • None
    • 3

    Description

      This issue was created by maloo for Andreas Dilger <adilger@whamcloud.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/473fe491-78ba-4552-a0ab-556025352754

      test_316 failed with the following error:

      lfs migrate -m1 failed
      

      Test session details:
      clients: https://build.whamcloud.com/job/lustre-master-patchless/768 - 4.18.0-240.22.1.el8_3.x86_64
      servers: https://build.whamcloud.com/job/lustre-master-patchless/768 - 4.18.0-240.22.1.el8_3.x86_64

      Started failing on 2023-04-13. Nothing in the logs except an ESTALE error:

      ll_close_inode_openhandle()) lustre-clilmv-ffff991acdd0a800: inode [0x200005223:0x10470:0x0] mdc close failed: rc = -116
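
The `rc = -116` in the message above is a negated errno value. As a minimal sketch (the helper name `decode_rc` is hypothetical and not part of Lustre), the number can be mapped back to its symbolic name with the Python standard library. The mapping is platform-specific; on Linux, errno 116 is ESTALE.

```python
import errno
import os
from typing import Tuple


def decode_rc(rc: int) -> Tuple[str, str]:
    """Map a kernel-style negative return code to (symbolic name, description).

    Hypothetical helper for reading log lines such as 'rc = -116'.
    The result depends on the errno table of the platform running it.
    """
    code = abs(rc)
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # On a Linux host this is expected to name ESTALE for -116.
    print(decode_rc(-116))
```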
      

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      sanity test_316 - lfs migrate -m1 failed


        Activity

          [LU-16743] ZFS sanity test_316: lfs migrate -m1 failed: no such file or directory
          pjones Peter Jones added a comment -

          Landed for 2.16


          gerrit Gerrit Updater added a comment -

          "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/52052/
          Subject: LU-16743 lod: create stripe with correct attr
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 6be9476e790ceef71e874b2745a8280443d5c90b

          gerrit Gerrit Updater added a comment -

          "Lai Siyao <lai.siyao@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52052
          Subject: LU-16743 mdt: clear attr->la_flags before migration
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 7a84550a7bf813d93ef1d9275b1164912e9579ce

          laisiyao Lai Siyao added a comment -
          00080000:00000040:1.0:1686430631.310794:0:327942:0:(osd_object.c:1034:osd_attr_get()) lustre-MDT0001: set orphan flag on [0x240002340:0x20:0x0] (0x8877f/0x0)
          

          This shows that inode->i_flags has ORPHAN set, which should not happen, because this flag should be set on obj->oo_lma_flags (which is 0 here).
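
For scanning many such debug lines, a rough sketch like the following pulls out the FID and the trailing pair of hex values. The function and field names are hypothetical. It assumes the FID is printed as [seq:oid:ver] and, following the reading in this comment, that the pair in parentheses is inode->i_flags followed by obj->oo_lma_flags; that ordering is an assumption, not something confirmed from the source code here.

```python
import re
from typing import NamedTuple, Optional, Tuple


class Fid(NamedTuple):
    seq: int
    oid: int
    ver: int


HEX = r"0x[0-9a-fA-F]+"
# FID printed as [seq:oid:ver]
FID_RE = re.compile(rf"\[({HEX}):({HEX}):({HEX})\]")
# Trailing "(a/b)" pair at the end of the message
FLAGS_RE = re.compile(rf"\(({HEX})/({HEX})\)\s*$")


def parse_orphan_line(line: str) -> Optional[Tuple[Fid, int, int]]:
    """Return (fid, inode_flags, lma_flags) or None if the line does not match.

    The meaning of the two flag values is assumed from the comment text.
    """
    fid_m = FID_RE.search(line)
    flags_m = FLAGS_RE.search(line)
    if fid_m is None or flags_m is None:
        return None
    fid = Fid(*(int(g, 16) for g in fid_m.groups()))
    inode_flags, lma_flags = (int(g, 16) for g in flags_m.groups())
    return fid, inode_flags, lma_flags


LINE = (
    "00080000:00000040:1.0:1686430631.310794:0:327942:0:"
    "(osd_object.c:1034:osd_attr_get()) lustre-MDT0001: "
    "set orphan flag on [0x240002340:0x20:0x0] (0x8877f/0x0)"
)

if __name__ == "__main__":
    print(parse_orphan_line(LINE))
```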

          laisiyao Lai Siyao added a comment -

          A Maloo test results search shows this occurs on the ZFS backend only. It failed because the ORPHAN flag is read on the target stripe of the migrating directory; I'm afraid its inode attr may not be initialized correctly.

          adilger Andreas Dilger added a comment - - edited

          Patches landed in that timeframe that might affect this test:

          $ git log --oneline --after 2023-04-10 --before 2023-04-14
          ae2ff5174d LU-16732 ldiskfs: update for ext4-delayed-iput for RHEL9.1
          4b12f2b9ae LU-16646 krb: improve lookup of user's credentials
          9784178eff LU-16646 krb: use system ccache for Lustre services
          5731acd499 LU-16646 krb: get rid of MEMORY private cache for krb creds
          3214d4d860 LU-16630 sec: improve Kerberos cross-realm trust remapping
          0b00e318e2 LU-13444 tests: sanity to set PTLDEBUG
          fce9af0b7f LU-16503 utils: add --hex-idx option for lfs getstripe
          023a873c14 LU-16608 tests: Reduce lock count in 255c
          d7e20133c3 LU-16221 kernel: RHEL 9.1 server support
          dfb08bbf77 LU-16465 llite: fix LSOM blocks for ftruncate and close
          2de1dbd440 LU-16350 osd-ldiskfs: no_llseek removed, dquot_transfer
          0006eb3644 LU-16328 llite: migrate_folio, vfs_setxattr
          133ed0cf6f LU-16327 llite: read_folio, release_folio, filler_t
          ca59060c42 LU-11047 mdt: standardize mdt object locking
          4435d0121f LU-14139 statahead: batched statahead processing
          1c8a49bedf LU-11404 llite: only first sync to MDS matter
          36a199db2b LU-10391 ptlrpc: change cc_nid in nrs to be struct lnet_nid
          a6915dd9d8 LU-10391 obdclass: change class_match_nid to take lnet_nid
          163331cb81 LU-10391 lustre: introduce class_parse_nid()
          b4a28a3269 LU-10391 lustre: rename class_parse_nid to class_parse_nid4
          16d84b0305 LU-10391 obdclass: change class_add/check_uuid to large nid
          42b49afdc8 LU-10391 lnet: change LNetAddPeer() to take struct lnet_nid
          6c0b4329d0 LU-16339 quota: notify OSTs until lge_qunit_nu is set
          3be4258839 LU-16634 llite: move common ioctl code to ll_iocontrol()
          1f4825eff0 LU-16634 obdclass: improve iocontrol error messages
          4a1465577e LU-16634 misc: remove unnecessary ioctl typecasts
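
To narrow a list like the one above, the one-line log output can be split into hash, ticket, component, and subject and then filtered by component. This is only an illustrative sketch: the names are hypothetical, it assumes every subject follows the "LU-NNNNN component: subject" pattern seen here, and the choice of "mdt" as the filter is an example, not a claim about which patch caused the failure.

```python
import re
from typing import Iterable, List, NamedTuple, Set


class Commit(NamedTuple):
    sha: str
    ticket: str
    component: str
    subject: str


# "<sha> LU-<number> <component>: <subject>"
ONELINE_RE = re.compile(
    r"^(?P<sha>[0-9a-f]+) (?P<ticket>LU-\d+) (?P<component>[\w-]+): (?P<subject>.+)$"
)


def parse_oneline(text: str) -> List[Commit]:
    """Parse `git log --oneline` output; lines that do not match are skipped."""
    commits = []
    for raw in text.splitlines():
        m = ONELINE_RE.match(raw.strip())
        if m:
            commits.append(Commit(**m.groupdict()))
    return commits


def by_component(commits: Iterable[Commit], wanted: Set[str]) -> List[Commit]:
    """Keep only commits whose component prefix is in `wanted`."""
    return [c for c in commits if c.component in wanted]


SAMPLE = """\
ca59060c42 LU-11047 mdt: standardize mdt object locking
2de1dbd440 LU-16350 osd-ldiskfs: no_llseek removed, dquot_transfer
fce9af0b7f LU-16503 utils: add --hex-idx option for lfs getstripe
"""

if __name__ == "__main__":
    for commit in by_component(parse_oneline(SAMPLE), {"mdt"}):
        print(commit.sha, commit.ticket, commit.subject)
```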
          

          adilger Andreas Dilger added a comment -

          Lai, could you please take a look at this. It failed 12x last week, which isn't a huge number given how often sanity.sh is run, but it looks like a real defect and not a test script issue.

          nangelinas Nikitas Angelinas added a comment - +1 on master: https://testing.whamcloud.com/test_sets/38964588-9046-45fe-902a-bb7ac1660871
          ssmirnov Serguei Smirnov added a comment - +1 on master: https://testing.whamcloud.com/test_sets/84b4be3d-4ab1-4333-aa16-d415848972ec

          People

            laisiyao Lai Siyao
            maloo Maloo