Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.11.0, Lustre 2.10.4
    • Lustre 2.11.0
    • None
    • onyx, full interop
      servers: el7.4, ldiskfs, branch b2_10, v2.10.2, b52
      clients: el7.4, branch master, v2.10.56, b3678
    • 3

    Description

      session: https://testing.hpdd.intel.com/test_sessions/cb5f13e0-9177-4f4e-9a26-d197001db0c0
      test set: https://testing.hpdd.intel.com/test_sets/5922f1c0-e4ef-11e7-8027-52540065bddc

      From test_log:

      copying /usr/share/dbench/client.txt to /mnt/lustre/d8.sanity-pfl/client.txt
      cp: error writing '/mnt/lustre/d8.sanity-pfl/client.txt': Invalid argument
      cp: failed to extend '/mnt/lustre/d8.sanity-pfl/client.txt': Invalid argument
        Trace dump:
        = rundbench:55:main()
      sanity-pfl: FAIL: test-framework exiting on error
       sanity-pfl test_8: @@@@@@ FAIL: dbench failed 
        Trace dump:
        = /usr/lib64/lustre/tests/test-framework.sh:5328:error()
        = /usr/lib64/lustre/tests/sanity-pfl.sh:333:test_8()
        = /usr/lib64/lustre/tests/test-framework.sh:5604:run_one()
        = /usr/lib64/lustre/tests/test-framework.sh:5643:run_one_logged()
        = /usr/lib64/lustre/tests/test-framework.sh:5490:run_test()
        = /usr/lib64/lustre/tests/sanity-pfl.sh:337:main()
      

          Activity

            [LU-10437] sanity-pfl test_8: dbench failed

            gerrit Gerrit Updater added a comment -

            John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/30784/
            Subject: LU-10437 lod: clear layout header when generating layout
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set:
            Commit: 0e38e97e2c4209ac31f3f6f9bc245da9a991006c
            pjones Peter Jones added a comment -

            Landed for 2.11

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/30785/
            Subject: LU-10437 lod: clear layout header when generating layout
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 47d6ce20cfd8b04f20f7fc7accc39b3902780900

            gerrit Gerrit Updater added a comment -

            Jinshan Xiong (jinshan.xiong@intel.com) uploaded a new patch: https://review.whamcloud.com/30785
            Subject: LU-10437 lod: clear layout header when generating layout
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: b34f3ccad87cbcdd9b0c2bd4a84d6221735dc9dd

            gerrit Gerrit Updater added a comment -

            Jinshan Xiong (jinshan.xiong@intel.com) uploaded a new patch: https://review.whamcloud.com/30784
            Subject: LU-10437 lod: clear layout header when generating layout
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set: 1
            Commit: 2eb12e91fea8f57b1c00dac7d756bedcda4aee1f

            jay Jinshan Xiong (Inactive) added a comment -

            It turned out that the b2_10 branch clears neither the lcm_flags nor the lcm_padding fields when packing the layout on the server side. The corresponding code is in lod_generate_lovea():

                    lcm = (struct lov_comp_md_v1 *)lmm;
                    lcm->lcm_magic = cpu_to_le32(LOV_MAGIC_COMP_V1);
                    lcm->lcm_entry_count = cpu_to_le16(comp_cnt);
                    /* lcm_flags and the lcm_padding fields are not cleared here */

                    offset = sizeof(*lcm) + sizeof(*lcme) * comp_cnt;
                    LASSERT(offset % sizeof(__u64) == 0);

            This confuses b2_11 clients: lcm_flags and lcm_mirror_count end up holding random values, so the layout fails the client's sanity check.

            I think the best way to fix this problem is to create a patch for b2_10 that clears the corresponding fields.
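To illustrate why uninitialized header fields break a newer client, here is a minimal sketch. The struct is a simplified, hypothetical stand-in for lov_comp_md_v1 (field widths, order, and the magic value are assumptions, and little-endian conversion is omitted), and client_check_header() is a hypothetical sanity check, not the real Lustre client code. pack_header_buggy() mirrors the b2_10 pattern quoted above by writing only the magic and entry count; pack_header_fixed() zeroes the whole header first, which is one way to "clear the layout header" and not necessarily what the merged patch does.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified composite-layout header. Field names follow the
 * comment above; widths, order and the magic value are assumptions, and
 * on-disk little-endian conversion is omitted. */
struct comp_hdr_sketch {
    uint32_t lcm_magic;
    uint32_t lcm_size;
    uint32_t lcm_layout_gen;
    uint16_t lcm_flags;
    uint16_t lcm_entry_count;
    uint16_t lcm_mirror_count;  /* padding to an old server, meaningful to a new client */
    uint16_t lcm_padding1[3];
    uint64_t lcm_padding2;
};

#define SKETCH_MAGIC_COMP_V1 0x0BD60BD0u /* assumed value */
#define SKETCH_KNOWN_FLAGS   0x000Fu     /* hypothetical set of valid flag bits */

/* Mirrors the b2_10 pattern: only magic and entry count are written, so
 * every other field keeps whatever bytes were already in the buffer. */
void pack_header_buggy(struct comp_hdr_sketch *h, uint16_t comp_cnt)
{
    h->lcm_magic = SKETCH_MAGIC_COMP_V1;
    h->lcm_entry_count = comp_cnt;
}

/* Clears the whole header before filling it in, so flags, mirror count
 * and padding are guaranteed to be zero. */
void pack_header_fixed(struct comp_hdr_sketch *h, uint16_t comp_cnt)
{
    memset(h, 0, sizeof(*h));
    h->lcm_magic = SKETCH_MAGIC_COMP_V1;
    h->lcm_entry_count = comp_cnt;
}

/* Hypothetical client-side check: reject a bad magic, unknown flag bits,
 * or more mirrors than components (lcm_mirror_count is assumed to store
 * the number of mirrors minus one). Returns 0 or -EINVAL. */
int client_check_header(const struct comp_hdr_sketch *h)
{
    if (h->lcm_magic != SKETCH_MAGIC_COMP_V1)
        return -EINVAL;
    if (h->lcm_flags & ~SKETCH_KNOWN_FLAGS)
        return -EINVAL;
    if ((unsigned int)h->lcm_mirror_count + 1 > h->lcm_entry_count)
        return -EINVAL;
    return 0;
}
```

Filling a header with stale bytes (for example memset(&h, 0xA5, sizeof(h))) and then calling pack_header_buggy(&h, 2) makes client_check_header() return -EINVAL, the same rc = -22 seen in the client logs below; pack_header_fixed() on the same buffer passes the check.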
            yujian Jian Yu added a comment -

            sanity-pfl test 15 in the same interop test session hit the same failure:

            == sanity-pfl test 15: Verify component options for lfs find ========================================= 12:46:26 (1513687586)
            dd: error writing '/mnt/lustre/d15.sanity-pfl/f1': Invalid argument
            

            Debug log on client node:

            00000080:00200000:1.0:1513687587.114545:0:6996:0:(vvp_io.c:312:vvp_io_fini()) [0x200062e21:0x2743:0x0] ignore/verify layout 1/0, layout version 0 need write layout 0, restore needed 0
            00020000:00020000:1.0:1513687587.114768:0:6996:0:(lov_object.c:1220:lov_layout_change()) lustre-clilov-ffff88005daa9000: cannot apply new layout on [0x200062e21:0x2743:0x0] : rc = -22
            00010000:00010000:1.0:1513687587.116338:0:6996:0:(ldlm_lock.c:800:ldlm_lock_decref_internal_nolock()) ### ldlm_lock_decref(CR) ns: ?? lock: ffff88005cf43440/0xed5adfd7af8ec4ec lrc: 3/1,0 mode: CR/CR res: ?? rrc=?? type: ??? flags: 0x10000000000000 nid: local remote: 0x4869a6ac1a9aca2b expref: -99 pid: 6996 timeout: 0 lvb_type: 3
            00010000:00010000:1.0:1513687587.116341:0:6996:0:(ldlm_lock.c:873:ldlm_lock_decref_internal()) ### add lock into lru list ns: ?? lock: ffff88005cf43440/0xed5adfd7af8ec4ec lrc: 2/0,0 mode: CR/CR res: ?? rrc=?? type: ??? flags: 0x10000000000000 nid: local remote: 0x4869a6ac1a9aca2b expref: -99 pid: 6996 timeout: 0 lvb_type: 3
            00000080:00020000:1.0:1513687587.116345:0:6996:0:(vvp_io.c:1495:vvp_io_init()) lustre: refresh file layout [0x200062e21:0x2743:0x0] error -22.
            00000080:00200000:1.0:1513687587.117486:0:6996:0:(vvp_io.c:312:vvp_io_fini()) [0x200062e21:0x2743:0x0] ignore/verify layout 0/0, layout version -2 need write layout 0, restore needed 0
            00000080:00200000:1.0:1513687587.117488:0:6996:0:(file.c:1423:ll_file_io_generic()) f1: 2 io complete with rc: -22, result: 0, restart: 0
            00000080:00200000:1.0:1513687587.117489:0:6996:0:(file.c:1459:ll_file_io_generic()) f1: write *ppos: 1048576, pos: 1048576, ret: 0, rc: -22
            
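The "rc = -22" in these debug lines and the "Invalid argument" printed by cp and dd are the same error: kernel code returns a negative errno, write(2) fails with errno set to that value, and the tool prints strerror(errno). A minimal sketch of the mapping, assuming Linux errno numbering (EINVAL == 22) and glibc's message text; the helper name lustre_rc_message() is hypothetical.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical helper: turn a kernel-style return code as printed in the
 * Lustre logs ("rc = -22", "error -22") into the message a user-space tool
 * prints after write(2) fails. Assumes Linux errno values. */
const char *lustre_rc_message(int rc)
{
    /* Kernel code returns -errno; user space sees the positive errno. */
    return strerror(rc < 0 ? -rc : rc);
}
```

With glibc on Linux, lustre_rc_message(-22) returns "Invalid argument", matching the cp and dd output in this ticket.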
            yujian Jian Yu added a comment -

            Dmesg log on client node:

            LustreError: 514:0:(lov_object.c:1220:lov_layout_change()) lustre-clilov-ffff88005daa9000: cannot apply new layout on [0x200062e21:0x2735:0x0] : rc = -22
            LustreError: 514:0:(vvp_io.c:1495:vvp_io_init()) lustre: refresh file layout [0x200062e21:0x2735:0x0] error -22.
            

            Debug log on client node:

            00000080:00200000:1.0:1513687522.574612:0:514:0:(vvp_io.c:312:vvp_io_fini()) [0x200062e21:0x2735:0x0] ignore/verify layout 1/0, layout version 0 need write layout 0, restore needed 0
            00020000:00020000:1.0:1513687522.578220:0:514:0:(lov_object.c:1220:lov_layout_change()) lustre-clilov-ffff88005daa9000: cannot apply new layout on [0x200062e21:0x2735:0x0] : rc = -22
            00010000:00010000:1.0:1513687522.579782:0:514:0:(ldlm_lock.c:800:ldlm_lock_decref_internal_nolock()) ### ldlm_lock_decref(CR) ns: ?? lock: ffff88005cf42240/0xed5adfd7af8ec0c4 lrc: 3/1,0 mode: CR/CR res: ?? rrc=?? type: ??? flags: 0x10000000000000 nid: local remote: 0x4869a6ac1a9ab3d4 expref: -99 pid: 514 timeout: 0 lvb_type: 3
            00010000:00010000:1.0:1513687522.579786:0:514:0:(ldlm_lock.c:873:ldlm_lock_decref_internal()) ### add lock into lru list ns: ?? lock: ffff88005cf42240/0xed5adfd7af8ec0c4 lrc: 2/0,0 mode: CR/CR res: ?? rrc=?? type: ??? flags: 0x10000000000000 nid: local remote: 0x4869a6ac1a9ab3d4 expref: -99 pid: 514 timeout: 0 lvb_type: 3
            00000080:00020000:1.0:1513687522.579791:0:514:0:(vvp_io.c:1495:vvp_io_init()) lustre: refresh file layout [0x200062e21:0x2735:0x0] error -22.
            00000080:00200000:1.0:1513687522.580937:0:514:0:(vvp_io.c:312:vvp_io_fini()) [0x200062e21:0x2735:0x0] ignore/verify layout 0/0, layout version -2 need write layout 0, restore needed 0
            00000080:00200000:1.0:1513687522.580940:0:514:0:(file.c:1423:ll_file_io_generic()) client.txt: 2 io complete with rc: -22, result: 0, restart: 0
            00000080:00200000:1.0:1513687522.580942:0:514:0:(file.c:1459:ll_file_io_generic()) client.txt: write *ppos: 16777216, pos: 16777216, ret: 0, rc: -22
            

            Dmesg log on MDS:

            LustreError: 18669:0:(mdt_lvb.c:163:mdt_lvbo_fill()) lustre-MDT0000: expected 368 actual 344.
            
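The client debug lines above share a colon-separated prefix followed by a (file:line:function()) source location. When comparing runs like these, a small filter can extract the source location and flag lines that report -22. A minimal sketch in C: the meaning assigned to each prefix field is an assumption inferred from these lines (the pid field matches the "LustreError: 514:0:" dmesg prefix), and struct dbg_rec, parse_dbg_line() and reports_einval() are hypothetical names, not Lustre tools.

```c
#include <stdio.h>
#include <string.h>

/* Fields pulled from one Lustre debug-log line. The meaning of each
 * colon-separated prefix field is an assumption based on this ticket. */
struct dbg_rec {
    unsigned int subsys;  /* first hex field, e.g. 00000080 (assumed subsystem) */
    unsigned int mask;    /* second hex field, e.g. 00200000 (assumed debug mask) */
    long long sec, usec;  /* timestamp, e.g. 1513687522.574612 */
    int pid;              /* e.g. 514 */
    char file[64];        /* source file, e.g. vvp_io.c */
    int line;             /* source line, e.g. 312 */
    char func[64];        /* function name, e.g. vvp_io_fini */
    const char *msg;      /* rest of the line after "()) " */
};

/* Returns 0 on success, -1 if the line does not match the expected prefix. */
int parse_dbg_line(const char *s, struct dbg_rec *r)
{
    unsigned int cpu, type, stack, ext;  /* parsed but not kept */
    int n = sscanf(s, "%x:%x:%u.%u:%lld.%lld:%u:%d:%u:(%63[^:]:%d:%63[^(]",
                   &r->subsys, &r->mask, &cpu, &type, &r->sec, &r->usec,
                   &stack, &r->pid, &ext, r->file, &r->line, r->func);
    if (n != 12)
        return -1;
    const char *m = strstr(s, "()) ");
    r->msg = m ? m + 4 : "";
    return 0;
}

/* True if the message reports errno 22 in one of the forms seen above. */
int reports_einval(const struct dbg_rec *r)
{
    return strstr(r->msg, "rc = -22") != NULL ||
           strstr(r->msg, "error -22") != NULL ||
           strstr(r->msg, "rc: -22") != NULL;
}
```

Fed the lov_object.c line from the debug log above, parse_dbg_line() yields file lov_object.c, line 1220, function lov_layout_change and pid 514, and reports_einval() returns true. Lines without the numeric prefix, such as the dmesg "LustreError:" lines, are rejected with -1.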

            People

              wc-triage WC Triage
              jcasper James Casper (Inactive)
              Votes: 0
              Watchers: 6
