Lustre / LU-582

1.8<->2.1 interop: sanity test_132: FAIL: some glimpse RPC is expected

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.4.0, Lustre 2.1.5
    • Affects Version/s: Lustre 2.0.0, Lustre 2.1.2, Lustre 2.1.3, Lustre 1.8.8, Lustre 1.8.6
    • Labels: None
    • Severity: 3
    • Bugzilla ID: 23,339
    • 4234

    Description

      After upgrading, sanity test 132 failed on Lustre 2.0.66.0 as follows:

      == sanity test 132: som avoids glimpse rpc == 03:17:56 (1312885076)
      ====> SOM is disabled, 0 glimpse RPC occured
       sanity test_132: @@@@@@ FAIL: some glimpse RPC is expected 
      

      Please refer to the Maloo report for more logs: https://maloo.whamcloud.com/test_sets/a570d34e-c278-11e0-8bdf-52540025f9af

      This is a known issue: bug 23339.
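For context, the failing check compares a glimpse-RPC counter sampled before and after a stat of the test file: with SOM disabled, at least one glimpse RPC is expected, and the test fails when the delta is zero. A minimal sketch of that logic, assuming a hypothetical snapshot-file format and the counter name ldlm_glimpse_enqueue (illustrative, not the exact sanity.sh code):

```shell
#!/bin/sh
# Sketch of the test_132 pass/fail logic: compare a glimpse counter taken
# before and after stat()ing the file. The counter name and snapshot format
# are assumptions modelled on lctl stats output, not the exact sanity.sh code.

# Print the sample count of a named counter from a stats snapshot file.
calc_stats() {
    awk -v c="$2" '$1 == c { print $2 }' "$1"
}

# $1/$2: stats snapshots taken before/after the stat() call.
check_glimpse() {
    before=$(calc_stats "$1" ldlm_glimpse_enqueue)
    after=$(calc_stats "$2" ldlm_glimpse_enqueue)
    before=${before:-0}
    after=${after:-0}
    glimpses=$((after - before))
    if [ "$glimpses" -eq 0 ]; then
        echo "FAIL: some glimpse RPC is expected"
        return 1
    fi
    echo "====> $glimpses glimpse RPC occurred"
}
```

In the real test the snapshots would come from lctl stats on the client/OST; here they are plain files so the logic can be exercised standalone.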

      Attachments

      Issue Links

      Activity

            the logs of client and OST

            hongchao.zhang Hongchao Zhang added a comment -

            There is indeed a glimpse request during the 'stat $DIR/$tfile' call.

            On client-7:

            00000001:00010000:2.0:1344970176.071156:0:12030:0:(glimpse.c:120:cl_glimpse_lock()) Glimpsing inode [0x20b2735b0:0x1:0x0]
            00000020:00010000:2.0:1344970176.071177:0:12030:0:(cl_lock.c:143:cl_lock_trace0()) enqueue lock: ffff8802e6a12938@(2 ffff8802decb8ae0 1 0 0 1 1 0)(ffff880331731d80/0/1) at cl_enqueue_try():1189
            00000020:00010000:2.0:1344970176.071182:0:12030:0:(cl_lock.c:143:cl_lock_trace0()) enclosure lock: ffff8802e6a12af8@(1 (null) 0 0 0 1 1 0)(ffff880331731d80/1/1) at cl_lock_enclosure():1685
            00000020:00010000:2.0:1344970176.071186:0:12030:0:(cl_lock.c:143:cl_lock_trace0()) enclosure lock: ffff8802e6a12938@(2 ffff8802decb8ae0 1 1 0 1 1 0)(ffff880331731d80/0/2) at cl_lock_enclosure():1685
            00000020:00010000:2.0:1344970176.071190:0:12030:0:(cl_lock.c:143:cl_lock_trace0()) enqueue lock: ffff8802e6a12af8@(2 ffff8802decb8ae0 1 0 0 1 1 0)(ffff880331731aa0/1/0) at cl_enqueue_try():1189

            On fat-intel-2:

            00010000:00000001:13.0:1344970176.078486:0:7796:0:(ldlm_lockd.c:1065:ldlm_handle_enqueue0()) Process entered
            00010000:00010000:13.0:1344970176.078487:0:7796:0:(ldlm_lockd.c:1067:ldlm_handle_enqueue0()) ### server-side enqueue handler START
            00010000:00000001:13.0:1344970176.078489:0:7796:0:(ldlm_lockd.c:1419:ldlm_request_cancel()) Process entered
            00010000:00000001:13.0:1344970176.078489:0:7796:0:(ldlm_lockd.c:1423:ldlm_request_cancel()) Process leaving (rc=0 : 0 : 0)
            00010000:00000001:13.0:1344970176.078491:0:7796:0:(ldlm_lock.c:1245:ldlm_lock_create()) Process entered
            00010000:00000010:13.0:1344970176.078494:0:7796:0:(ldlm_resource.c:983:ldlm_resource_new()) slab-alloced '(res)': 320 at ffff8103058d1940.
            00002000:00000001:13.0:1344970176.078496:0:7796:0:(filter_lvb.c:76:filter_lvbo_init()) Process entered
            00002000:00000010:13.0:1344970176.078498:0:7796:0:(filter_lvb.c:84:filter_lvbo_init()) kmalloced 'lvb': 40 at ffff810320fdd880.
            00002000:00000002:13.0:1344970176.078500:0:7796:0:(filter_lvb.c:96:filter_lvbo_init()) lustre-OST0000: filter_lvbo_init(o_seq=0, o_id=70152)
            00002000:00000001:13.0:1344970176.078502:0:7796:0:(filter.c:1474:filter_fid2dentry()) Process entered
            00002000:00000002:13.0:1344970176.078504:0:7796:0:(filter.c:1499:filter_fid2dentry()) looking up object O/d8/70152
            00002000:00000002:13.0:1344970176.078507:0:7796:0:(filter.c:1518:filter_fid2dentry()) got child objid 70152: ffff810320f62660, count = 1
            00002000:00000001:13.0:1344970176.078509:0:7796:0:(filter.c:1522:filter_fid2dentry()) Process leaving (rc=18446604449170728544 : -139624538823072 : ffff810320f62660)
            00002000:00010000:13.0:1344970176.078511:0:7796:0:(filter_lvb.c:116:filter_lvbo_init()) res: 0x11208 initial lvb size: 0x200, mtime: 0x502a9dbf, blocks: 0x8
            00002000:00000001:13.0:1344970176.078514:0:7796:0:(filter_lvb.c:120:filter_lvbo_init()) Process leaving
            00002000:00000002:13.0:1344970176.078514:0:7796:0:(filter.c:221:f_dput()) putting 70152: ffff810320f62660, count = 0
            00010000:00000001:13.0:1344970176.078516:0:7796:0:(ldlm_lock.c:413:ldlm_lock_new()) Process entered
            00010000:00000010:13.0:1344970176.078518:0:7796:0:(ldlm_lock.c:418:ldlm_lock_new()) slab-alloced '(lock)': 560 at ffff810305f58240.
            00000020:00000001:13.0:1344970176.078520:0:7796:0:(lustre_handles.c:88:class_handle_hash()) Process entered
            00000020:00000040:13.0:1344970176.078521:0:7796:0:(lustre_handles.c:122:class_handle_hash()) added object ffff810305f58240 with handle 0xfa6adb28e6af43de to hash
            00000020:00000001:13.0:1344970176.078524:0:7796:0:(lustre_handles.c:123:class_handle_hash()) Process leaving
            00010000:00000001:13.0:1344970176.078525:0:7796:0:(ldlm_lock.c:455:ldlm_lock_new()) Process leaving (rc=18446604448717701696 : -139624991849920 : ffff810305f58240)
            00010000:00000001:13.0:1344970176.078528:0:7796:0:(ldlm_extent.c:806:ldlm_interval_alloc()) Process entered
            00010000:00000010:13.0:1344970176.078530:0:7796:0:(ldlm_extent.c:809:ldlm_interval_alloc()) slab-alloced '(node)': 72 at ffff81030d7ceac0.
            00010000:00000001:13.0:1344970176.078532:0:7796:0:(ldlm_extent.c:815:ldlm_interval_alloc()) Process leaving (rc=18446604448844016320 : -139624865535296 : ffff81030d7ceac0)
            00010000:00000001:13.0:1344970176.078536:0:7796:0:(ldlm_lock.c:1284:ldlm_lock_create()) Process leaving (rc=18446604448717701696 : -139624991849920 : ffff810305f58240)
            00010000:00010000:13.0:1344970176.078539:0:7796:0:(ldlm_lockd.c:1152:ldlm_handle_enqueue0()) ### server-side enqueue handler, new lock created ns: filter-lustre-OST0000_UUID lock: ffff810305f58240/0xfa6adb28e6af43de lrc: 2/0,0 mode: -/PR res: 70152/0 rrc: 1 type: EXT [0->0] (req 0>0) flags: 0x0 remote: 0xeb07c50dcc271870 expref: -99 pid: 7796 timeout 0
            00000020:00000040:13.0:1344970176.078546:0:7796:0:(genops.c:1064:__class_export_add_lock_ref()) lock = ffff810305f58240, export = ffff81030c5bbc00, refs = 1
            00010000:00000040:13.0:1344970176.078548:0:7796:0:(ldlm_lockd.c:1164:ldlm_handle_enqueue0()) lock GETting export ffff81030c5bbc00 : new locks_count 16
            00000020:00000040:13.0:1344970176.078550:0:7796:0:(genops.c:782:class_export_get()) GETting export ffff81030c5bbc00 : new refcount 22
            00010000:00000001:13.0:1344970176.078553:0:7796:0:(ldlm_lock.c:1302:ldlm_lock_enqueue()) Process entered
            00002000:00000001:13.0:1344970176.078554:0:7796:0:(filter.c:1696:filter_intent_policy()) Process entered

            So filter_intent_policy() is called!

            hongchao.zhang Hongchao Zhang added a comment -
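The debug lines above follow the usual Lustre debug-log layout, which I read as subsys-mask:debug-mask:cpu:secs.usecs:stack:pid:extra:(file:line:function()) message (the field names are my interpretation, not authoritative). A small sketch for decoding one line of a dump taken with `lctl dk`:

```shell
# Decode the leading fields of a Lustre debug-log line.
# Assumed layout: subsys:mask:cpu:secs.usecs:stack:pid:extra:(file:line:func()) msg
# (field names are a best guess from the log format, not authoritative)
parse_dk_line() {
    echo "$1" | sed -E \
        's/^([0-9a-f]+):([0-9a-f]+):([^:]+):([0-9.]+):[^:]*:([0-9]+):[^(]*\(([^:]+):([0-9]+):([^(]+)\(\)\).*/subsys=\1 mask=\2 time=\4 pid=\5 src=\6:\7 fn=\8/'
}

# Example: isolate the glimpse path in a dump saved with `lctl dk /tmp/lustre.dk`
# grep -E 'cl_glimpse_lock|filter_intent_policy' /tmp/lustre.dk
```

With such a helper, the client and OST traces quoted here can be lined up by timestamp and pid without reading the raw prefixes by eye.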
            yujian Jian Yu added a comment -

            After a clean upgrade from Lustre 1.8.8-wc1 to 2.1.3 RC1, the issue occurred again:
            https://maloo.whamcloud.com/test_sets/57d3cf2c-e673-11e1-afac-52540035b04c


            Is there a new ticket for 2.x client?
            I ran into this problem yesterday. Both server and client are 2.1.2.

            jaylan Jay Lan (Inactive) added a comment -
            yujian Jian Yu added a comment - - edited

            Old Lustre Version: 1.8.8-wc1
            Lustre Build: http://build.whamcloud.com/job/lustre-b1_8/198/

            New Lustre Version: 2.1.2
            Lustre Tag: v2_1_2_RC2
            Lustre Build: http://build.whamcloud.com/job/lustre-b2_1/86/
            Network: TCP (1GigE)

            After a clean upgrade with the following configuration, sanity test 132 still failed:

            OSS1: RHEL5/x86_64
            OSS2: RHEL5/x86_64
            MDS: RHEL5/x86_64
            Client1: RHEL6/x86_64
            Client2: RHEL6/x86_64
            
            client-1: == sanity test 132: som avoids glimpse rpc == 04:43:30 (1339069410)
            client-1: ====> SOM is disabled, 0 glimpse RPC occured
            client-1:  sanity test_132: @@@@@@ FAIL: some glimpse RPC is expected
            client-1: Dumping lctl log to /home/yujian/test_logs/2012-06-07/043300/sanity.test_132.*.1339069414.log
            client-1: FAIL 132 (5s)
            

            Maloo report: https://maloo.whamcloud.com/test_sets/c250c2d6-b0d3-11e1-99ce-52540035b04c


            This is failing 16% of the time currently, with 2.6.18 clients vs. 2.6.32 servers:

            https://maloo.whamcloud.com/test_sets/35a3f450-9faa-11e1-b416-52540035b04c

            adilger Andreas Dilger added a comment -
            pjones Peter Jones added a comment -

            James

            I think that a new ticket would be simpler/clearer.

            Peter


            Originally this ticket reported an interop problem, but I noted that it fails with 2.X clients. It stopped failing, but now it fails again with Lustre 2.X clients. Perhaps this ticket should be retitled.

            simmonsja James A Simmons added a comment -
            pjones Peter Jones added a comment -

            James

            Can you please confirm that the 1.8.x client that you are testing with is at least version 1.8.7-wc1 (not the Oracle 1.8.7)?

            Thanks

            Peter


            Peter, can you reopen this ticket?

            simmonsja James A Simmons added a comment -
            pjones Peter Jones added a comment -

            Thanks James!


            People

              Assignee: hongchao.zhang Hongchao Zhang
              Reporter: yujian Jian Yu
              Votes: 0
              Watchers: 4
