[LU-5637] sanity test_130a: FIEMAP on 1-stripe file(/mnt/lustre/f130a.sanity) failed

Project: Lustre

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.11.0
    • Lustre 2.7.0
    • None
    • client: lustre-master RHEL7
      server: lustre-master RHEL6
      build #2641
    • 3
    • 15784

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/bfcd5a82-3610-11e4-8a7f-5254006e85c2.

      The sub-test test_130a failed with the following error:

      FIEMAP on 1-stripe file(/mnt/lustre/f130a.sanity) failed

      Several FIEMAP tests failed, from test 130a through 130e.

      == sanity test 130a: FIEMAP (1-stripe file) ========================================================== 09:49:12 (1409935752)
      1+0 records in
      1+0 records out
      65536 bytes (66 kB) copied, 0.00196579 s, 33.3 MB/s
      Filesystem type is: bd00bd0
      File size of /mnt/lustre/f130a.sanity is 65536 (16 blocks of 4096 bytes)
       ext:     logical_offset:        physical_offset: length:   expected: flags:
         0:        0..      15:      40923..     40938:     16:             eof
      /mnt/lustre/f130a.sanity: 1 extent found
       sanity test_130a: @@@@@@ FAIL: FIEMAP on 1-stripe file(/mnt/lustre/f130a.sanity) failed 
      

          Activity

            jhammond John Hammond added a comment -

            Seems like the wrong e2fsprogs is being installed again. In the recent failures, the extent length reported by filefrag is in units of 4096-byte blocks, whereas the test expects it in units of 1K blocks, so 16 != 64.
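            A minimal Python sketch of the arithmetic behind "16 != 64" (illustration only, not code from sanity.sh; `blocks_for` is a hypothetical helper): the same 65536-byte file from the log above is 16 blocks at 4096 bytes per block but 64 blocks at 1024 bytes per block.

```python
def blocks_for(size_bytes: int, block_size: int) -> int:
    """Blocks needed to cover size_bytes; a partial last block counts as one."""
    return -(-size_bytes // block_size)  # ceiling division on integers

size = 65536  # bytes written by dd in test_130a
print(blocks_for(size, 4096))  # 16: the unit in the failing filefrag output
print(blocks_for(size, 1024))  # 64: the unit the test expects
```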
            pjones Peter Jones added a comment -

            Niu

            Discussing this on the triage call, we thought that this previously rare issue, which is now occurring more frequently, might be due to the PFL changes. Could you please investigate?

            Thanks

            Peter

            sarah Sarah Liu added a comment - edited

            In the past 7 days (5/30 to 6/5) of lustre-review testing, sanity test_130a/b/c/d/e each failed 21 times. The error was seen only on review-ldiskfs (10 failures) and review-dne-part1 (11 failures). A total of 244 review-ldiskfs and review-dne-part1 sessions ran last week, so the failure rate is just under 10% (21/244 ≈ 8.6%).
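            The rate can be checked from the counts in the comment. A quick Python check, assuming the 244 sessions are the combined review-ldiskfs and review-dne-part1 total for the week:

```python
failures = {"review-ldiskfs": 10, "review-dne-part1": 11}
sessions = 244  # combined review-ldiskfs + review-dne-part1 sessions, 5/30-6/5

total = sum(failures.values())
rate = total / sessions
print(f"{total}/{sessions} = {rate:.1%}")  # 21/244 = 8.6%
```

            Per-configuration rates can't be derived from these numbers, because the 244 sessions aren't split by configuration.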

            bogl Bob Glossman (Inactive) added a comment -

            I think the proper fix is to start building e2fsprogs for SLES12 in our tools/e2fsprogs builds and then install that on SLES12 clients, so the fix(es) are outside of Lustre, in e2fsprogs and TEI.

            FWIW, I have built and installed the most recent 1.42.12-wc1 version of e2fsprogs locally on SLES12 clients and servers without any problems.

            adilger Andreas Dilger added a comment -

            It would be best to install the Lustre e2fsprogs on the clients. The current test is skipped if filefrag doesn't support FIEMAP at all. Most of the FIEMAP code has landed in upstream e2fsprogs, but not the Lustre-specific part that handles files striped across multiple devices.

            The best solution would be to allow the test to run but skip the parts that depend on the "device:" field being returned. The stock filefrag also won't run on multi-striped files, so those tests need to be skipped too.
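            The approach described above could be expressed as a small decision function. The real test is shell code in sanity.sh; this Python sketch only illustrates the logic, and `FilefragCaps`, `plan_checks`, and the check names are hypothetical:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class FilefragCaps:
    fiemap: bool      # filefrag supports FIEMAP at all
    lustre_dev: bool  # output has the Lustre "dev:" (OST index) column

def plan_checks(caps: FilefragCaps, stripe_count: int) -> List[str]:
    """Pick which parts of a FIEMAP subtest to run (hypothetical check names)."""
    if not caps.fiemap:
        return []  # existing behaviour: skip the whole test
    if stripe_count > 1 and not caps.lustre_dev:
        return []  # stock filefrag cannot map multi-striped files
    checks = ["extent_count", "extent_length"]
    if caps.lustre_dev:
        checks.append("ost_index")  # needs the "dev:" column
    return checks

stock = FilefragCaps(fiemap=True, lustre_dev=False)
print(plan_checks(stock, 1))  # ['extent_count', 'extent_length']
print(plan_checks(stock, 2))  # []
```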

            bogl Bob Glossman (Inactive) added a comment -

            Maybe the proper remedy is just to install the Lustre e2fsprogs on all our test nodes, regardless of client/server role or version. However, I don't think we require it on client-only Lustre installs, those done with lustre-client-* RPMs on unpatched kernels. If that's the case, installing it everywhere could mean we are no longer testing what we say we support.

            bogl Bob Glossman (Inactive) added a comment -

            Looking at test nodes on Onyx, I can confirm that only the native e2fsprogs is installed on client nodes, not the Lustre e2fsprogs. However, this is true for el6 as well as el7, so it's not clear why the test is broken only on el7. The native e2fsprogs is e2fsprogs-1.42.9-4.el7.x86_64 on el7 and e2fsprogs-1.41.12-18.el6_5.1.x86_64 on el6.

            bogl Bob Glossman (Inactive) added a comment -

            That could explain why I can't reproduce the failure. I have the Lustre e2fsprogs installed everywhere, including on the clients, where it may not even be needed. In fact, I have the latest 1.42.12.wc1 versions installed.

            adilger Andreas Dilger added a comment -

            The failed test run printed "expected:" (i.e. the next expected physical block number) instead of "dev:" (i.e. the Lustre OST number). This suggests the node did not have the correct Lustre e2fsprogs installed.

            Similarly, in the next test, test_130b "FIEMAP (2-stripe file)", filefrag printed:

            File size of /mnt/lustre/f130b.sanity is 2097152 (512 blocks of 4096 bytes)
            /mnt/lustre/f130b.sanity: FIBMAP unsupported
            

            which also means that the Lustre e2fsprogs wasn't installed, since Lustre doesn't allow normal FIEMAP on multi-stripe files.
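            The diagnosis above can be sketched as a heuristic over `filefrag -v` output. This Python version is illustrative only: `classify_filefrag` and its return labels are hypothetical, and it assumes the header layouts shown in this ticket ("expected:" from upstream e2fsprogs, "dev:" from Lustre e2fsprogs).

```python
def classify_filefrag(output: str) -> str:
    """Heuristically guess which filefrag produced `filefrag -v` output."""
    if "FIBMAP unsupported" in output:
        # FIEMAP was refused and the FIBMAP fallback failed as well.
        return "stock-fibmap"
    for line in output.splitlines():
        cols = line.split()
        if cols and cols[0] == "ext:":
            if "dev:" in cols:
                return "lustre"  # Lustre e2fsprogs: OST index column
            if "expected:" in cols:
                return "stock"   # upstream e2fsprogs: next expected block
    return "unknown"

failed = " ext:     logical_offset:        physical_offset: length:   expected: flags:"
manual = " ext:     device_logical:        physical_offset: length:  dev: flags:"
print(classify_filefrag(failed))  # stock
print(classify_filefrag(manual))  # lustre
```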

            bogl Bob Glossman (Inactive) added a comment -

            I haven't been able to reproduce this failure manually yet. Running only sanity, or only test 130a, doesn't fail. I see different output from the filefrag command:

            Filesystem type is: bd00bd0
            File size of /mnt/lustre/f130a.sanity is 65536 (64 blocks of 1024 bytes)
             ext:     device_logical:        physical_offset: length:  dev: flags:
               0:        0..      63:     137344..    137407:     64: 0000: last,net,eof
            /mnt/lustre/f130a.sanity: 1 extent found
            

            The values from my manual test look more like what I would expect than the ones in the failed test log. I can't yet explain the difference.
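            Both outputs describe one 64 KiB extent, only in different block units. A Python sketch that parses either form and converts the extent lengths to bytes, assuming the `filefrag -v` column layout shown in this ticket (`mapped_bytes` and the regexes are hypothetical, not part of sanity.sh):

```python
import re

# Patterns assume the `filefrag -v` layout shown in this ticket.
HEADER_RE = re.compile(r"\((\d+) blocks of (\d+) bytes\)")
EXTENT_RE = re.compile(r"^\s*\d+:\s*\d+\.\.\s*\d+:\s*\d+\.\.\s*\d+:\s*(\d+):")

def mapped_bytes(output: str) -> int:
    """Sum the extent lengths in `filefrag -v` output, converted to bytes."""
    header = HEADER_RE.search(output)
    if header is None:
        raise ValueError("no 'N blocks of B bytes' header found")
    block_size = int(header.group(2))
    total = 0
    for line in output.splitlines():
        m = EXTENT_RE.match(line)
        if m:
            total += int(m.group(1)) * block_size  # length is in block_size units
    return total

failed = """File size of /mnt/lustre/f130a.sanity is 65536 (16 blocks of 4096 bytes)
   0:        0..      15:      40923..     40938:     16:             eof"""
manual = """File size of /mnt/lustre/f130a.sanity is 65536 (64 blocks of 1024 bytes)
   0:        0..      63:     137344..    137407:     64: 0000: last,net,eof"""
print(mapped_bytes(failed), mapped_bytes(manual))  # 65536 65536
```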
            pjones Peter Jones added a comment -

            Bob

            Could you please assist with this issue?

            Thanks

            Peter


            People

              Assignee: mdiep Minh Diep
              Reporter: maloo Maloo
              Votes: 0
              Watchers: 11
