Lustre / LU-5261

user process is unkillable in wait_for_completion()

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.6.0, Lustre 2.5.2, Lustre 2.15.0
    • None
    • 3
    • 14680

    Description

      User processes blocked in wait_for_completion() (called from osc_io_setattr_end() and osc_io_fsync_end()) cannot be killed while the server is unavailable, so the node has to be rebooted to get rid of them:

      LustreError: 13775:0:(ofd_obd.c:873:ofd_setattr()) testfs-OST0001: can't find object [0x100000000:0x5:0x0]
      Lustre: testfs-OST0001-o: trigger OI scrub by RPC for [0x100000000:0x5:0x0], rc = 0 [1]
      INFO: task touch:15134 blocked for more than 120 seconds.
      touch         D 0000000000000001     0 15134  15113 0x00000000
      Call Trace:
       [<ffffffff8150f475>] schedule_timeout+0x215/0x2e0
       [<ffffffff8150f0f3>] wait_for_common+0x123/0x180
       [<ffffffff8150f20d>] wait_for_completion+0x1d/0x20
       [<ffffffffa0cdba7c>] osc_io_setattr_end+0xbc/0x190 [osc]
       [<ffffffffa08bd100>] cl_io_end+0x60/0x150 [obdclass]
       [<ffffffffa0d554b1>] lov_io_end_wrapper+0xf1/0x100 [lov]
       [<ffffffffa0d551fe>] lov_io_call+0x8e/0x130 [lov]
       [<ffffffffa0d56f8c>] lov_io_end+0x4c/0xf0 [lov]
       [<ffffffffa08bd100>] cl_io_end+0x60/0x150 [obdclass]
       [<ffffffffa08c1e82>] cl_io_loop+0xc2/0x1b0 [obdclass]
       [<ffffffffa11838d8>] cl_setattr_ost+0x218/0x2f0 [lustre]
       [<ffffffffa11501cc>] ll_setattr_raw+0xa2c/0x1080 [lustre]
       [<ffffffffa115087d>] ll_setattr+0x5d/0xf0 [lustre]
       [<ffffffff8119ead8>] notify_change+0x168/0x340
       [<ffffffff811b2b7c>] utimes_common+0xdc/0x1b0
       [<ffffffff811b2ce9>] do_utimes+0x99/0xf0
       [<ffffffff811b2e42>] sys_utimensat+0x32/0x90
      

      The problem being hit on the OST is somewhat irrelevant for the purposes of this bug. It would be ideal if the client actually handled this error properly and didn't hang at all, but there will always be some other case where the OST is inactive and the client doesn't get any reply at all.

      Instead of using wait_for_completion(), these paths could use l_wait_event() or wait_for_completion_killable() so that the user process can be killed if there is a problem on the OST.
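
      A minimal sketch of the calling pattern this suggests, not the actual Lustre code. In the kernel, wait_for_completion_killable() returns 0 once the completion is signalled and -ERESTARTSYS if a fatal signal arrives first. To keep the example compilable outside the kernel, struct completion and wait_for_completion_killable() below are userspace stand-ins, and osc_wait_for_reply() is a hypothetical helper name. Mapping the interrupted wait to -EINTR is also an assumption.

```c
#include <errno.h>
#include <stdbool.h>

/* Userspace stand-ins (assumption): in the kernel these come from
 * <linux/completion.h> and <linux/errno.h>. */
#define ERESTARTSYS 512

struct completion {
	bool done;	/* model only: true once the RPC reply has arrived */
};

/* Model of the kernel call: 0 if the completion was signalled,
 * -ERESTARTSYS if the sleeping task received a fatal signal instead.
 * The model treats "not done" as "killed while the server never replied". */
static int wait_for_completion_killable(struct completion *c)
{
	return c->done ? 0 : -ERESTARTSYS;
}

/* Hypothetical helper showing how osc_io_setattr_end() or
 * osc_io_fsync_end() could wait without becoming unkillable.
 * io_rc is the result the RPC callback would have stored. */
static int osc_wait_for_reply(struct completion *sync, int io_rc)
{
	int rc = wait_for_completion_killable(sync);

	if (rc < 0)
		return -EINTR;	/* killed before the OST replied */

	return io_rc;		/* reply arrived: report the I/O result */
}
```

      One caveat a real patch would have to address: once the waiter can return early, the completion (and whatever callback data it lives in) must stay valid until the RPC callback has finished with it, since that callback may still call complete() later.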

            People

              emoly.liu Emoly Liu
              adilger Andreas Dilger