Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-11362

sanity test_156: timeout loop in ptlrpc_check_set()

    XMLWordPrintable

Details

    • Bug
    • Resolution: Unresolved
    • Minor
    • None
    • None
    • None
    • 3
    • 9223372036854775807

    Description

      This issue was created by maloo for John Hammond <jhammond@whamcloud.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/8ee96d5e-b56c-11e8-a7de-52540065bddc

      test_156 failed with the following error:

      Timeout occurred after 166 mins, last suite running was sanity, restarting cluster to continue tests
      

      dd (pid 6412) is looping in ptlrpc_check_set():

      [10125.205230] dd              S ffff968d3a832f70     0  6412  31766 0x00000080
      [10125.206015] Call Trace:
      [10125.206273]  [<ffffffffac914029>] schedule+0x29/0x70
      [10125.206797]  [<ffffffffac9118d4>] schedule_timeout+0x174/0x2c0
      [10125.207550]  [<ffffffffac2a3750>] ? internal_add_timer+0x70/0x70
      [10125.208213]  [<ffffffffc0af5650>] ? ptlrpc_init_rq_pool+0x110/0x110 [ptlrpc]
      [10125.208966]  [<ffffffffc0aff3b0>] ptlrpc_set_wait+0x480/0x790 [ptlrpc]
      [10125.209703]  [<ffffffffac2cf670>] ? wake_up_state+0x20/0x20
      [10125.210355]  [<ffffffffc0aff73d>] ptlrpc_queue_wait+0x7d/0x220 [ptlrpc]
      [10125.211068]  [<ffffffffc0ae46a2>] ldlm_cli_enqueue+0x3d2/0x920 [ptlrpc]
      [10125.211774]  [<ffffffffc0adf7c0>] ? ldlm_expired_completion_wait+0x220/0x220 [ptlrpc]
      [10125.212679]  [<ffffffffc0c7e710>] ? osc_lock_lockless_cancel+0xe0/0xe0 [osc]
      [10125.213491]  [<ffffffffc0c7dad0>] ? osc_lock_upcall+0x580/0x580 [osc]
      [10125.214167]  [<ffffffffc0c74965>] osc_enqueue_base+0x2b5/0x6a0 [osc]
      [10125.214844]  [<ffffffffc0c7d550>] ? osc_lock_lvb_update+0x330/0x330 [osc]
      [10125.215616]  [<ffffffffc0c7f17b>] osc_lock_enqueue+0x38b/0x840 [osc]
      [10125.216297]  [<ffffffffc0c7d550>] ? osc_lock_lvb_update+0x330/0x330 [osc]
      [10125.217058]  [<ffffffffc0935de5>] cl_lock_enqueue+0x65/0x120 [obdclass]
      [10125.217778]  [<ffffffffc0cd8285>] lov_lock_enqueue+0x95/0x150 [lov]
      [10125.218497]  [<ffffffffc0935de5>] cl_lock_enqueue+0x65/0x120 [obdclass]
      [10125.219225]  [<ffffffffc0936377>] cl_lock_request+0x67/0x1f0 [obdclass]
      [10125.219931]  [<ffffffffc093a26b>] cl_io_lock+0x2bb/0x3d0 [obdclass]
      [10125.220688]  [<ffffffffc093a5fb>] cl_io_loop+0x11b/0xc70 [obdclass]
      [10125.221444]  [<ffffffffc0d30e02>] ll_file_io_generic+0x4e2/0xd10 [lustre]
      [10125.222178]  [<ffffffffc0d31b82>] ll_file_aio_write+0x372/0x540 [lustre]
      [10125.222880]  [<ffffffffc0d31df4>] ll_file_write+0xa4/0x170 [lustre]
      [10125.223591]  [<ffffffffac41b490>] vfs_write+0xc0/0x1f0
      [10125.224151]  [<ffffffffac9206e1>] ? system_call_after_swapgs+0xae/0x146
      [10125.224845]  [<ffffffffac41c2bf>] SyS_write+0x7f/0xf0
      [10125.225434]  [<ffffffffac9206e1>] ? system_call_after_swapgs+0xae/0x146
      [10125.226136]  [<ffffffffac920795>] system_call_fastpath+0x1c/0x21
      [10125.226762]  [<ffffffffac9206e1>] ? system_call_after_swapgs+0xae/0x146
      

       
      client logs show dd does 2000+ 1-second loops in ptlrpc_set_wait():

      00010000:00010000:1.0:1536619472.378230:0:6412:0:(ldlm_request.c:942:ldlm_cli_enqueue()) ### client-side enqueue START, flags 0x40000000 ns: lustre-OST0002-osc-ffff968d3a345000 lock: ffff968d26d2c480/0x17ff4219e8fed853 lrc: 3/0,1 mode: --/PW res: [0x2ff0:0x0:0x0].0x0 rrc: 2 type: EXT [0->4095] (req 0->4095) flags: 0x0 nid: local remote: 0x0 expref: -99 pid: 6412 timeout: 0 lvb_type: 1
      00010000:00010000:1.0:1536619472.378239:0:6412:0:(ldlm_request.c:1014:ldlm_cli_enqueue()) ### sending request ns: lustre-OST0002-osc-ffff968d3a345000 lock: ffff968d26d2c480/0x17ff4219e8fed853 lrc: 3/0,1 mode: --/PW res: [0x2ff0:0x0:0x0].0x0 rrc: 2 type: EXT [0->4095] (req 0->4095) flags: 0x0 nid: local remote: 0x0 expref: -99 pid: 6412 timeout: 0 lvb_type: 1
      00000100:00080000:1.0:1536619472.378247:0:6412:0:(client.c:1569:ptlrpc_send_new_req()) @@@ req waiting for recovery: (FULL != CONNECTING)  req@ffff968d218dcc00 x1611261872492656/t0(0) o101->lustre-OST0002-osc-ffff968d3a345000@10.9.5.125@tcp:28/4 lens 328/400 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
      00000100:00100000:1.0:1536619472.378251:0:6412:0:(client.c:2347:ptlrpc_set_wait()) set ffff968d23136180 going to sleep for 0 seconds
      00000100:00100000:1.0:1536619473.377745:0:6412:0:(client.c:2347:ptlrpc_set_wait()) set ffff968d23136180 going to sleep for 0 seconds
      00010000:00010000:1.0:1536619473.852020:0:18075:0:(ldlm_lockd.c:1669:ldlm_handle_bl_callback()) ### client blocking AST callback handler ns: lustre-OST0001-osc-ffff968d3a345000 lock: ffff968cf7975440/0x17ff4219e8fed845 lrc: 2/0,0 mode: PR/PR res: [0x2fd0:0x0:0x0].0x0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x420000010000 nid: local remote: 0x61c83a90b60e6c16 expref: -99 pid: 6411 timeout: 0 lvb_type: 1
      00010000:00010000:1.0:1536619473.852031:0:18075:0:(ldlm_lockd.c:1700:ldlm_handle_bl_callback()) Lock ffff968cf7975440 already unused, calling callback (ffffffffc0c7e710)
      00010000:00010000:1.0:1536619473.852034:0:18075:0:(ldlm_request.c:1251:ldlm_cli_cancel_local()) ### client-side cancel ns: lustre-OST0001-osc-ffff968d3a345000 lock: ffff968cf7975440/0x17ff4219e8fed845 lrc: 3/0,0 mode: PR/PR res: [0x2fd0:0x0:0x0].0x0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x428400010000 nid: local remote: 0x61c83a90b60e6c16 expref: -99 pid: 6411 timeout: 0 lvb_type: 1
      00010000:00010000:1.0:1536619473.852117:0:18075:0:(ldlm_request.c:1310:ldlm_cancel_pack()) ### packing ns: lustre-OST0001-osc-ffff968d3a345000 lock: ffff968cf7975440/0x17ff4219e8fed845 lrc: 2/0,0 mode: --/PR res: [0x2fd0:0x0:0x0].0x0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x4c29400010000 nid: local remote: 0x61c83a90b60e6c16 expref: -99 pid: 6411 timeout: 0 lvb_type: 1
      00010000:00010000:1.0:1536619473.852129:0:18075:0:(ldlm_request.c:1314:ldlm_cancel_pack()) 1 locks packed
      00010000:00010000:1.0:1536619473.852141:0:18075:0:(ldlm_lockd.c:1709:ldlm_handle_bl_callback()) ### client blocking callback handler END ns: lustre-OST0001-osc-ffff968d3a345000 lock: ffff968cf7975440/0x17ff4219e8fed845 lrc: 1/0,0 mode: --/PR res: [0x2fd0:0x0:0x0].0x0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x4c29400010000 nid: local remote: 0x61c83a90b60e6c16 expref: -99 pid: 6411 timeout: 0 lvb_type: 1
      00010000:00010000:1.0:1536619473.852147:0:18075:0:(ldlm_lock.c:197:ldlm_lock_put()) ### final lock_put on destroyed lock, freeing it. ns: lustre-OST0001-osc-ffff968d3a345000 lock: ffff968cf7975440/0x17ff4219e8fed845 lrc: 0/0,0 mode: --/PR res: [0x2fd0:0x0:0x0].0x0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x4c29400010000 nid: local remote: 0x61c83a90b60e6c16 expref: -99 pid: 6411 timeout: 0 lvb_type: 1
      00000100:00100000:1.0:1536619473.852166:0:1881:0:(client.c:1625:ptlrpc_send_new_req()) Sending RPC pname:cluuid:pid:xid:nid:opc ptlrpcd_01_01:ac35fb57-abc0-fa1f-db48-e4504a124ce3:1881:1611261872492672:10.9.5.125@tcp:103
      00000100:00100000:1.0:1536619473.852624:0:1881:0:(client.c:2053:ptlrpc_check_set()) Completed RPC pname:cluuid:pid:xid:nid:opc ptlrpcd_01_01:ac35fb57-abc0-fa1f-db48-e4504a124ce3:1881:1611261872492672:10.9.5.125@tcp:103
      00000100:00100000:1.0:1536619474.377727:0:6412:0:(client.c:2347:ptlrpc_set_wait()) set ffff968d23136180 going to sleep for 0 seconds
      00000100:00100000:1.0:1536619475.377720:0:6412:0:(client.c:2347:ptlrpc_set_wait()) set ffff968d23136180 going to sleep for 0 seconds
      00000100:00100000:1.0:1536619476.377721:0:6412:0:(client.c:2347:ptlrpc_set_wait()) set ffff968d23136180 going to sleep for 0 seconds
      00000100:00100000:1.0:1536619477.377737:0:6412:0:(client.c:2347:ptlrpc_set_wait()) set ffff968d23136180 going to sleep for 0 seconds
      00000100:00100000:1.0:1536619478.377757:0:6412:0:(client.c:2347:ptlrpc_set_wait()) set ffff968d23136180 going to sleep for 0 seconds
      00000100:00100000:1.0:1536619479.377746:0:6412:0:(client.c:2347:ptlrpc_set_wait()) set ffff968d23136180 going to sleep for 0 seconds
      00000100:00100000:1.0:1536619480.377722:0:6412:0:(client.c:2347:ptlrpc_set_wait()) set ffff968d23136180 going to sleep for 0 seconds
      00000100:00100000:1.0:1536619481.377735:0:6412:0:(client.c:2347:ptlrpc_set_wait()) set ffff968d23136180 going to sleep for 0 seconds
      00000100:00100000:1.0:1536619482.377718:0:6412:0:(client.c:2347:ptlrpc_set_wait()) set ffff968d23136180 going to sleep for 0 seconds
      00000100:00100000:1.0:1536619483.377739:0:6412:0:(client.c:2347:ptlrpc_set_wait()) set ffff968d23136180 going to sleep for 0 seconds
      00000100:00100000:1.0:1536619484.377729:0:6412:0:(client.c:2347:ptlrpc_set_wait()) set ffff968d23136180 going to sleep for 0 seconds
      00000100:00100000:1.0:1536619485.377712:0:6412:0:(client.c:2347:ptlrpc_set_wait()) set ffff968d23136180 going to sleep for 0 seconds
      00000100:00100000:1.0:1536619486.377724:0:6412:0:(client.c:2347:ptlrpc_set_wait()) set ffff968d23136180 going to sleep for 0 seconds
      ...
      00000100:00100000:1.0:1536621583.377721:0:6412:0:(client.c:2347:ptlrpc_set_wait()) set ffff968d23136180 going to sleep for 0 seconds
      00000100:00100000:1.0:1536621584.377725:0:6412:0:(client.c:2347:ptlrpc_set_wait()) set ffff968d23136180 going to sleep for 0 seconds
      00000100:00100000:1.0:1536621585.377723:0:6412:0:(client.c:2347:ptlrpc_set_wait()) set ffff968d23136180 going to sleep for 0 seconds
      

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      sanity test_156 - Timeout occurred after 166 mins, last suite running was sanity, restarting cluster to continue tests

      Attachments

        Issue Links

          Activity

            People

              wc-triage WC Triage
              maloo Maloo
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated: