Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-582

1.8<->2.1 interop: sanity test_132: FAIL: some glimpse RPC is expected

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.4.0, Lustre 2.1.5
    • Lustre 2.0.0, Lustre 2.1.2, Lustre 2.1.3, Lustre 1.8.8, Lustre 1.8.6
    • None
    • 3
    • 23,339
    • 4234

    Description

      After the upgrading, sanity test 132 failed on Lustre 2.0.66.0 as follows:

      == sanity test 132: som avoids glimpse rpc == 03:17:56 (1312885076)
      ====> SOM is disabled, 0 glimpse RPC occured
       sanity test_132: @@@@@@ FAIL: some glimpse RPC is expected 
      

      Please refer to the Maloo report for more logs: https://maloo.whamcloud.com/test_sets/a570d34e-c278-11e0-8bdf-52540025f9af

      This is an known issue: bug 23339.

      Attachments

        Issue Links

          Activity

            [LU-582] 1.8<->2.1 interop: sanity test_132: FAIL: some glimpse RPC is expected

            I've been using http://review.whamcloud.com/#change,3693 for
            several months, and recommend it for b2_1.

            schamp Stephen Champion added a comment - I've been using http://review.whamcloud.com/#change,3693 for several months, and recommend it for b2_1.

            I saw this problem between 2.1.3 server and 2.3.0 client. Let me know if you want the test_logs.

            jaylan Jay Lan (Inactive) added a comment - I saw this problem between 2.1.3 server and 2.3.0 client. Let me know if you want the test_logs.

            the patch has been updated

            hongchao.zhang Hongchao Zhang added a comment - the patch has been updated

            there is a bug in get_ost_param

            get_ost_param() {
            local token=$1
            local gl_sum=0
            for node in $(osts_nodes); do
            gl=$(do_node $node "$LCTL get_param -n ost.OSS.ost.stats" | awk '/'$token'/

            {print $2}

            ' | head -n 1)
            [ x$gl = x"" ] && gl=0
            gl_sum=$((gl_sum + gl))
            done
            echo $gl <--- here should be "echo $gl_sum"!
            }

            previous, there is only one OSS in autotest, then the problem is hidden, but in upgrade test, there are 2 OSS,
            then the issue shows up. the updated patch will be attached soon!

            hongchao.zhang Hongchao Zhang added a comment - there is a bug in get_ost_param get_ost_param() { local token=$1 local gl_sum=0 for node in $(osts_nodes); do gl=$(do_node $node "$LCTL get_param -n ost.OSS.ost.stats" | awk '/'$token'/ {print $2} ' | head -n 1) [ x$gl = x"" ] && gl=0 gl_sum=$((gl_sum + gl)) done echo $gl <--- here should be "echo $gl_sum"! } previous, there is only one OSS in autotest, then the problem is hidden, but in upgrade test, there are 2 OSS, then the issue shows up. the updated patch will be attached soon!
            yujian Jian Yu added a comment -

            the debug patch is tracked at http://review.whamcloud.com/#change,3692

            The real patch is in http://review.whamcloud.com/#change,3693. Testing is ongoing.

            yujian Jian Yu added a comment - the debug patch is tracked at http://review.whamcloud.com/#change,3692 The real patch is in http://review.whamcloud.com/#change,3693 . Testing is ongoing.

            Yujian help to reproduce the bug after upgrading Lustre, and the ldlm_glimpse_enqueue is indeed increased.

            the debug patch is tracked at http://review.whamcloud.com/#change,3692

            Hi Yujian, could you please help to test with the debug patch? thanks!!

            hongchao.zhang Hongchao Zhang added a comment - Yujian help to reproduce the bug after upgrading Lustre, and the ldlm_glimpse_enqueue is indeed increased. the debug patch is tracked at http://review.whamcloud.com/#change,3692 Hi Yujian, could you please help to test with the debug patch? thanks!!

            the logs of client and OST

            hongchao.zhang Hongchao Zhang added a comment - the logs of client and OST

            there is indeed glimpse request during calling 'stat $DIR/$tfile'

            in client-7

            00000001:00010000:2.0:1344970176.071156:0:12030:0:(glimpse.c:120:cl_glimpse_lock()) Glimpsing inode [0x20b2735b0:0x1:0x0]
            00000020:00010000:2.0:1344970176.071177:0:12030:0:(cl_lock.c:143:cl_lock_trace0()) enqueue lock: ffff8802e6a12938@(2 ffff8802decb8ae0 1 0 0 1 1 0)(ffff880331731d80/0/1) at cl_enqueue_try():1189
            00000020:00010000:2.0:1344970176.071182:0:12030:0:(cl_lock.c:143:cl_lock_trace0()) enclosure lock: ffff8802e6a12af8@(1 (null) 0 0 0 1 1 0)(ffff880331731d80/1/1) at cl_lock_enclosure():1685
            00000020:00010000:2.0:1344970176.071186:0:12030:0:(cl_lock.c:143:cl_lock_trace0()) enclosure lock: ffff8802e6a12938@(2 ffff8802decb8ae0 1 1 0 1 1 0)(ffff880331731d80/0/2) at cl_lock_enclosure():1685
            00000020:00010000:2.0:1344970176.071190:0:12030:0:(cl_lock.c:143:cl_lock_trace0()) enqueue lock: ffff8802e6a12af8@(2 ffff8802decb8ae0 1 0 0 1 1 0)(ffff880331731aa0/1/0) at cl_enqueue_try():1189

            in fat-intel-2

            00010000:00000001:13.0:1344970176.078486:0:7796:0:(ldlm_lockd.c:1065:ldlm_handle_enqueue0()) Process entered
            00010000:00010000:13.0:1344970176.078487:0:7796:0:(ldlm_lockd.c:1067:ldlm_handle_enqueue0()) ### server-side enqueue handler START
            00010000:00000001:13.0:1344970176.078489:0:7796:0:(ldlm_lockd.c:1419:ldlm_request_cancel()) Process entered
            00010000:00000001:13.0:1344970176.078489:0:7796:0:(ldlm_lockd.c:1423:ldlm_request_cancel()) Process leaving (rc=0 : 0 : 0)
            00010000:00000001:13.0:1344970176.078491:0:7796:0:(ldlm_lock.c:1245:ldlm_lock_create()) Process entered
            00010000:00000010:13.0:1344970176.078494:0:7796:0:(ldlm_resource.c:983:ldlm_resource_new()) slab-alloced '(res)': 320 at ffff8103058d1940.
            00002000:00000001:13.0:1344970176.078496:0:7796:0:(filter_lvb.c:76:filter_lvbo_init()) Process entered
            00002000:00000010:13.0:1344970176.078498:0:7796:0:(filter_lvb.c:84:filter_lvbo_init()) kmalloced 'lvb': 40 at ffff810320fdd880.
            00002000:00000002:13.0:1344970176.078500:0:7796:0:(filter_lvb.c:96:filter_lvbo_init()) lustre-OST0000: filter_lvbo_init(o_seq=0, o_id=70152)
            00002000:00000001:13.0:1344970176.078502:0:7796:0:(filter.c:1474:filter_fid2dentry()) Process entered
            00002000:00000002:13.0:1344970176.078504:0:7796:0:(filter.c:1499:filter_fid2dentry()) looking up object O/d8/70152
            00002000:00000002:13.0:1344970176.078507:0:7796:0:(filter.c:1518:filter_fid2dentry()) got child objid 70152: ffff810320f62660, count = 1
            00002000:00000001:13.0:1344970176.078509:0:7796:0:(filter.c:1522:filter_fid2dentry()) Process leaving (rc=18446604449170728544 : -139624538823072 : ffff810320f62660)
            00002000:00010000:13.0:1344970176.078511:0:7796:0:(filter_lvb.c:116:filter_lvbo_init()) res: 0x11208 initial lvb size: 0x200, mtime: 0x502a9dbf, blocks: 0x8
            00002000:00000001:13.0:1344970176.078514:0:7796:0:(filter_lvb.c:120:filter_lvbo_init()) Process leaving
            00002000:00000002:13.0:1344970176.078514:0:7796:0:(filter.c:221:f_dput()) putting 70152: ffff810320f62660, count = 0
            00010000:00000001:13.0:1344970176.078516:0:7796:0:(ldlm_lock.c:413:ldlm_lock_new()) Process entered
            00010000:00000010:13.0:1344970176.078518:0:7796:0:(ldlm_lock.c:418:ldlm_lock_new()) slab-alloced '(lock)': 560 at ffff810305f58240.
            00000020:00000001:13.0:1344970176.078520:0:7796:0:(lustre_handles.c:88:class_handle_hash()) Process entered
            00000020:00000040:13.0:1344970176.078521:0:7796:0:(lustre_handles.c:122:class_handle_hash()) added object ffff810305f58240 with handle 0xfa6adb28e6af43de to hash
            00000020:00000001:13.0:1344970176.078524:0:7796:0:(lustre_handles.c:123:class_handle_hash()) Process leaving
            00010000:00000001:13.0:1344970176.078525:0:7796:0:(ldlm_lock.c:455:ldlm_lock_new()) Process leaving (rc=18446604448717701696 : -139624991849920 : ffff810305f58240)
            00010000:00000001:13.0:1344970176.078528:0:7796:0:(ldlm_extent.c:806:ldlm_interval_alloc()) Process entered
            00010000:00000010:13.0:1344970176.078530:0:7796:0:(ldlm_extent.c:809:ldlm_interval_alloc()) slab-alloced '(node)': 72 at ffff81030d7ceac0.
            00010000:00000001:13.0:1344970176.078532:0:7796:0:(ldlm_extent.c:815:ldlm_interval_alloc()) Process leaving (rc=18446604448844016320 : -139624865535296 : ffff81030d7ceac0)
            00010000:00000001:13.0:1344970176.078536:0:7796:0:(ldlm_lock.c:1284:ldlm_lock_create()) Process leaving (rc=18446604448717701696 : -139624991849920 : ffff810305f58240)
            00010000:00010000:13.0:1344970176.078539:0:7796:0:(ldlm_lockd.c:1152:ldlm_handle_enqueue0()) ### server-side enqueue handler, new lock created ns: filter-lustre-OST0000_UUID lock: ffff810305f58240/0xfa6adb28e6af43de lrc: 2/0,0 mode: -/PR res: 70152/0 rrc: 1 type: EXT [0->0] (req 0>0) flags: 0x0 remote: 0xeb07c50dcc271870 expref: -99 pid: 7796 timeout 0
            00000020:00000040:13.0:1344970176.078546:0:7796:0:(genops.c:1064:__class_export_add_lock_ref()) lock = ffff810305f58240, export = ffff81030c5bbc00, refs = 1
            00010000:00000040:13.0:1344970176.078548:0:7796:0:(ldlm_lockd.c:1164:ldlm_handle_enqueue0()) lock GETting export ffff81030c5bbc00 : new locks_count 16
            00000020:00000040:13.0:1344970176.078550:0:7796:0:(genops.c:782:class_export_get()) GETting export ffff81030c5bbc00 : new refcount 22
            00010000:00000001:13.0:1344970176.078553:0:7796:0:(ldlm_lock.c:1302:ldlm_lock_enqueue()) Process entered
            00002000:00000001:13.0:1344970176.078554:0:7796:0:(filter.c:1696:filter_intent_policy()) Process entered

            the filter_intent_policy is called!

            hongchao.zhang Hongchao Zhang added a comment - there is indeed glimpse request during calling 'stat $DIR/$tfile' in client-7 00000001:00010000:2.0:1344970176.071156:0:12030:0:(glimpse.c:120:cl_glimpse_lock()) Glimpsing inode [0x20b2735b0:0x1:0x0] 00000020:00010000:2.0:1344970176.071177:0:12030:0:(cl_lock.c:143:cl_lock_trace0()) enqueue lock: ffff8802e6a12938@(2 ffff8802decb8ae0 1 0 0 1 1 0)(ffff880331731d80/0/1) at cl_enqueue_try():1189 00000020:00010000:2.0:1344970176.071182:0:12030:0:(cl_lock.c:143:cl_lock_trace0()) enclosure lock: ffff8802e6a12af8@(1 (null) 0 0 0 1 1 0)(ffff880331731d80/1/1) at cl_lock_enclosure():1685 00000020:00010000:2.0:1344970176.071186:0:12030:0:(cl_lock.c:143:cl_lock_trace0()) enclosure lock: ffff8802e6a12938@(2 ffff8802decb8ae0 1 1 0 1 1 0)(ffff880331731d80/0/2) at cl_lock_enclosure():1685 00000020:00010000:2.0:1344970176.071190:0:12030:0:(cl_lock.c:143:cl_lock_trace0()) enqueue lock: ffff8802e6a12af8@(2 ffff8802decb8ae0 1 0 0 1 1 0)(ffff880331731aa0/1/0) at cl_enqueue_try():1189 in fat-intel-2 00010000:00000001:13.0:1344970176.078486:0:7796:0:(ldlm_lockd.c:1065:ldlm_handle_enqueue0()) Process entered 00010000:00010000:13.0:1344970176.078487:0:7796:0:(ldlm_lockd.c:1067:ldlm_handle_enqueue0()) ### server-side enqueue handler START 00010000:00000001:13.0:1344970176.078489:0:7796:0:(ldlm_lockd.c:1419:ldlm_request_cancel()) Process entered 00010000:00000001:13.0:1344970176.078489:0:7796:0:(ldlm_lockd.c:1423:ldlm_request_cancel()) Process leaving (rc=0 : 0 : 0) 00010000:00000001:13.0:1344970176.078491:0:7796:0:(ldlm_lock.c:1245:ldlm_lock_create()) Process entered 00010000:00000010:13.0:1344970176.078494:0:7796:0:(ldlm_resource.c:983:ldlm_resource_new()) slab-alloced '(res)': 320 at ffff8103058d1940. 00002000:00000001:13.0:1344970176.078496:0:7796:0:(filter_lvb.c:76:filter_lvbo_init()) Process entered 00002000:00000010:13.0:1344970176.078498:0:7796:0:(filter_lvb.c:84:filter_lvbo_init()) kmalloced 'lvb': 40 at ffff810320fdd880. 00002000:00000002:13.0:1344970176.078500:0:7796:0:(filter_lvb.c:96:filter_lvbo_init()) lustre-OST0000: filter_lvbo_init(o_seq=0, o_id=70152) 00002000:00000001:13.0:1344970176.078502:0:7796:0:(filter.c:1474:filter_fid2dentry()) Process entered 00002000:00000002:13.0:1344970176.078504:0:7796:0:(filter.c:1499:filter_fid2dentry()) looking up object O/d8/70152 00002000:00000002:13.0:1344970176.078507:0:7796:0:(filter.c:1518:filter_fid2dentry()) got child objid 70152: ffff810320f62660, count = 1 00002000:00000001:13.0:1344970176.078509:0:7796:0:(filter.c:1522:filter_fid2dentry()) Process leaving (rc=18446604449170728544 : -139624538823072 : ffff810320f62660) 00002000:00010000:13.0:1344970176.078511:0:7796:0:(filter_lvb.c:116:filter_lvbo_init()) res: 0x11208 initial lvb size: 0x200, mtime: 0x502a9dbf, blocks: 0x8 00002000:00000001:13.0:1344970176.078514:0:7796:0:(filter_lvb.c:120:filter_lvbo_init()) Process leaving 00002000:00000002:13.0:1344970176.078514:0:7796:0:(filter.c:221:f_dput()) putting 70152: ffff810320f62660, count = 0 00010000:00000001:13.0:1344970176.078516:0:7796:0:(ldlm_lock.c:413:ldlm_lock_new()) Process entered 00010000:00000010:13.0:1344970176.078518:0:7796:0:(ldlm_lock.c:418:ldlm_lock_new()) slab-alloced '(lock)': 560 at ffff810305f58240. 00000020:00000001:13.0:1344970176.078520:0:7796:0:(lustre_handles.c:88:class_handle_hash()) Process entered 00000020:00000040:13.0:1344970176.078521:0:7796:0:(lustre_handles.c:122:class_handle_hash()) added object ffff810305f58240 with handle 0xfa6adb28e6af43de to hash 00000020:00000001:13.0:1344970176.078524:0:7796:0:(lustre_handles.c:123:class_handle_hash()) Process leaving 00010000:00000001:13.0:1344970176.078525:0:7796:0:(ldlm_lock.c:455:ldlm_lock_new()) Process leaving (rc=18446604448717701696 : -139624991849920 : ffff810305f58240) 00010000:00000001:13.0:1344970176.078528:0:7796:0:(ldlm_extent.c:806:ldlm_interval_alloc()) Process entered 00010000:00000010:13.0:1344970176.078530:0:7796:0:(ldlm_extent.c:809:ldlm_interval_alloc()) slab-alloced '(node)': 72 at ffff81030d7ceac0. 00010000:00000001:13.0:1344970176.078532:0:7796:0:(ldlm_extent.c:815:ldlm_interval_alloc()) Process leaving (rc=18446604448844016320 : -139624865535296 : ffff81030d7ceac0) 00010000:00000001:13.0:1344970176.078536:0:7796:0:(ldlm_lock.c:1284:ldlm_lock_create()) Process leaving (rc=18446604448717701696 : -139624991849920 : ffff810305f58240) 00010000:00010000:13.0:1344970176.078539:0:7796:0:(ldlm_lockd.c:1152:ldlm_handle_enqueue0()) ### server-side enqueue handler, new lock created ns: filter-lustre-OST0000_UUID lock: ffff810305f58240/0xfa6adb28e6af43de lrc: 2/0,0 mode: - /PR res: 70152/0 rrc: 1 type: EXT [0->0] (req 0 >0) flags: 0x0 remote: 0xeb07c50dcc271870 expref: -99 pid: 7796 timeout 0 00000020:00000040:13.0:1344970176.078546:0:7796:0:(genops.c:1064:__class_export_add_lock_ref()) lock = ffff810305f58240, export = ffff81030c5bbc00, refs = 1 00010000:00000040:13.0:1344970176.078548:0:7796:0:(ldlm_lockd.c:1164:ldlm_handle_enqueue0()) lock GETting export ffff81030c5bbc00 : new locks_count 16 00000020:00000040:13.0:1344970176.078550:0:7796:0:(genops.c:782:class_export_get()) GETting export ffff81030c5bbc00 : new refcount 22 00010000:00000001:13.0:1344970176.078553:0:7796:0:(ldlm_lock.c:1302:ldlm_lock_enqueue()) Process entered 00002000:00000001:13.0:1344970176.078554:0:7796:0:(filter.c:1696:filter_intent_policy()) Process entered the filter_intent_policy is called!
            yujian Jian Yu added a comment -

            After clean upgrading from Lustre 1.8.8-wc1 to 2.1.3 RC1, the issue occurred again:
            https://maloo.whamcloud.com/test_sets/57d3cf2c-e673-11e1-afac-52540035b04c

            yujian Jian Yu added a comment - After clean upgrading from Lustre 1.8.8-wc1 to 2.1.3 RC1, the issue occurred again: https://maloo.whamcloud.com/test_sets/57d3cf2c-e673-11e1-afac-52540035b04c

            Is there a new ticket for 2.x client?
            I ran into this problem yesterday. Both server and client are 2.1.2.

            jaylan Jay Lan (Inactive) added a comment - Is there a new ticket for 2.x client? I ran into this problem yesterday. Both server and client are 2.1.2.

            People

              hongchao.zhang Hongchao Zhang
              yujian Jian Yu
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: