Lustre / LU-5432

bogus FIDs cause endless loops in fld_client_rpc()


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.7.0
    • Affects Version/s: Lustre 2.7.0
    • Severity: 3
    • Rank (Obsolete): 15132

    Description

      If a secondary MDT tries to use a FID with a bogus sequence, the handler loops forever in fld_client_rpc():

      [ 9667.196083] LustreError: 28865:0:(fld_handler.c:261:fld_server_lookup()) srv-lustre-MDT0000: Cannot find sequence 0x4000002c0000401: rc = -2
      [ 9667.198478] LustreError: 28865:0:(fld_handler.c:261:fld_server_lookup()) Skipped 167399 previous similar messages
      [ 9699.057201] LNet: 4534:0:(watchdog.c:200:lcw_dump_stack()) Service thread pid 17054 was inactive for 62.00s. The thread might be hung, or it might only be slow and will resume  later. Dumping the stack trace for debugging purposes:
      [ 9699.061160] Pid: 17054, comm: mdt01_011
      
      17054 mdt01_011
      [<ffffffffa06821fa>] ptlrpc_set_wait+0x2ea/0x830 [ptlrpc]
      [<ffffffffa06827c7>] ptlrpc_queue_wait+0x87/0x220 [ptlrpc]
      [<ffffffffa087455b>] fld_client_rpc+0x15b/0x4b0 [fld]
      [<ffffffffa0879c81>] fld_server_lookup+0x151/0x340 [fld]
      [<ffffffffa0d6f567>] lod_fld_lookup+0x1e7/0x350 [lod]
      [<ffffffffa0d81b63>] lod_object_init+0x103/0x3c0 [lod]
      [<ffffffffa0455b98>] lu_object_alloc+0xd8/0x320 [obdclass]
      [<ffffffffa045718f>] lu_object_find_at+0x2bf/0x410 [obdclass]
      [<ffffffffa04572f6>] lu_object_find+0x16/0x20 [obdclass]
      [<ffffffffa0c95f56>] mdt_object_find+0x56/0x170 [mdt]
      [<ffffffffa0ccbe71>] mdt_reint_open+0x2e1/0x2180 [mdt]
      [<ffffffffa0cb2811>] mdt_reint_rec+0x41/0xe0 [mdt]
      [<ffffffffa0c9cdb3>] mdt_reint_internal+0x4d3/0x7b0 [mdt]
      [<ffffffffa0c9d286>] mdt_intent_reint+0x1f6/0x520 [mdt]
      [<ffffffffa0c9b929>] mdt_intent_policy+0x499/0xcf0 [mdt]
      [<ffffffffa0644342>] ldlm_lock_enqueue+0x302/0x880 [ptlrpc]
      [<ffffffffa066c343>] ldlm_handle_enqueue0+0x373/0x1130 [ptlrpc]
      [<ffffffffa06eb592>] tgt_enqueue+0x62/0x1d0 [ptlrpc]
      [<ffffffffa06eacbe>] tgt_request_handle+0x71e/0xb10 [ptlrpc]
      [<ffffffffa069d847>] ptlrpc_main+0xd47/0x1860 [ptlrpc]
      [<ffffffff8109eab6>] kthread+0x96/0xa0
      [<ffffffff8100c30a>] child_rip+0xa/0x20
      [<ffffffffffffffff>] 0xffffffffffffffff
      

      This was found through RPC corruption.
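
      The log above suggests the lookup was reissued again and again even though the server answered rc = -2 (-ENOENT) every time. As a minimal sketch of how a retry loop can avoid this, the C fragment below retries only on a transient error and gives up at once on a permanent one. It is not the Lustre code and not the fix that was landed: lookup_with_retry(), toy_lookup(), toy_busy_lookup(), the choice of -EAGAIN as the "transient" code, and the retry cap are all assumptions made for illustration.

      ```c
      #include <assert.h>
      #include <errno.h>
      #include <stddef.h>
      #include <stdint.h>

      /* Hypothetical lookup callback: returns 0 and fills *index on success,
       * or a negative errno value on failure. Not a Lustre API. */
      typedef int (*seq_lookup_fn)(uint64_t seq, uint32_t *index);

      /* Retry only while the error is transient (assumed here to be -EAGAIN)
       * and never more than max_tries times. Any other result, including
       * -ENOENT for an unknown sequence, ends the loop immediately. */
      static int lookup_with_retry(seq_lookup_fn lookup, uint64_t seq,
                                   uint32_t *index, int max_tries,
                                   int *tries_used)
      {
          int rc = -EAGAIN;
          int tries = 0;

          assert(lookup != NULL && index != NULL && max_tries > 0);

          while (tries < max_tries) {
              tries++;
              rc = lookup(seq, index);
              if (rc != -EAGAIN)
                  break;          /* success or permanent error */
          }

          if (tries_used != NULL)
              *tries_used = tries;
          return rc;
      }

      /* Toy server that knows exactly one sequence (made-up value). */
      static int toy_lookup(uint64_t seq, uint32_t *index)
      {
          if (seq == 0x200000400ULL) {
              *index = 0;
              return 0;
          }
          return -ENOENT;         /* unknown sequence: permanent failure */
      }

      /* Toy server that is always busy, to exercise the retry cap. */
      static int toy_busy_lookup(uint64_t seq, uint32_t *index)
      {
          (void)seq;
          (void)index;
          return -EAGAIN;
      }
      ```

      With these toy callbacks, a lookup of the bogus sequence from the log (0x4000002c0000401) returns -ENOENT after a single attempt, while the always-busy callback stops after max_tries attempts with -EAGAIN instead of spinning.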

          People

            Assignee: John Hammond (jhammond)
            Reporter: John Hammond (jhammond)
            Votes: 0
            Watchers: 2
