Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Affects Version/s: Lustre 2.4.0
    • Fix Version/s: Lustre 2.4.0
    • 3
    • 6294

    Description

      Some recent landing introduced a problem in osp cleanup.
      Specifically, test 27 of recovery-small seems to be affected.
      The test deliberately breaks osc communication, and perhaps osp is not able to recover from that?
      Trace of the hung umount:

      PID: 28642  TASK: ffff880097b0a140  CPU: 6   COMMAND: "umount"
       #0 [ffff88007b6c1898] schedule at ffffffff814f7c98
       #1 [ffff88007b6c1960] osp_sync_fini at ffffffffa069d09d [osp]
       #2 [ffff88007b6c19c0] osp_process_config at ffffffffa06972c0 [osp]
       #3 [ffff88007b6c1a20] lod_cleanup_desc_tgts at ffffffffa05ed564 [lod]
       #4 [ffff88007b6c1a70] lod_process_config at ffffffffa05f0266 [lod]
       #5 [ffff88007b6c1af0] mdd_process_config at ffffffffa0427c4b [mdd]
       #6 [ffff88007b6c1b50] mdt_stack_fini at ffffffffa0726b21 [mdt]
       #7 [ffff88007b6c1bb0] mdt_device_fini at ffffffffa072799a [mdt]
       #8 [ffff88007b6c1bf0] class_cleanup at ffffffffa0fb5247 [obdclass]
       #9 [ffff88007b6c1c70] class_process_config at ffffffffa0fb6b2c [obdclass]
      #10 [ffff88007b6c1d00] class_manual_cleanup at ffffffffa0fb7869 [obdclass]
      #11 [ffff88007b6c1dc0] server_put_super at ffffffffa0fc83bc [obdclass]
      #12 [ffff88007b6c1e30] generic_shutdown_super at ffffffff8117d6ab
      #13 [ffff88007b6c1e50] kill_anon_super at ffffffff8117d796
      #14 [ffff88007b6c1e70] lustre_kill_super at ffffffffa0fb9666 [obdclass]
      #15 [ffff88007b6c1e90] deactivate_super at ffffffff8117e825
      #16 [ffff88007b6c1eb0] mntput_no_expire at ffffffff8119a89f
      #17 [ffff88007b6c1ee0] sys_umount at ffffffff8119b34b
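
      For illustration, here is a minimal userspace analogue (pthreads) of the pattern this stack suggests. None of the names below are Lustre symbols and the real code is kernel-side, so treat it only as a sketch of the assumption: the cleanup path blocks until the osp sync thread drains its queue of pending sync RPCs, while every send now fails because the target import is already closed, so the wait never ends.

      #include <pthread.h>
      #include <stdbool.h>
      #include <stdio.h>
      #include <unistd.h>

      /*
       * Toy userspace analogue of the suspected hang (not Lustre code):
       * fini() waits for a background sync thread to drain its queue, but
       * every "send" fails because the peer was failed over, so the queue
       * never shrinks and fini() (i.e. the umount) blocks forever.
       */

      static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
      static pthread_cond_t  drained = PTHREAD_COND_INITIALIZER;
      static int  pending = 5;            /* sync records still queued */
      static bool import_alive = false;   /* peer already gone */

      /* stand-in for sending one sync RPC; -19 mirrors the -ENODEV below */
      static int send_one(void)
      {
              return import_alive ? 0 : -19;
      }

      static void *sync_thread(void *arg)
      {
              (void)arg;
              for (;;) {
                      pthread_mutex_lock(&lock);
                      if (pending == 0) {
                              pthread_cond_broadcast(&drained);
                              pthread_mutex_unlock(&lock);
                              return NULL;        /* queue drained, exit */
                      }
                      if (send_one() == 0)
                              pending--;          /* never happens here */
                      pthread_mutex_unlock(&lock);
                      sleep(1);                   /* back off and retry */
              }
      }

      /* plays the role of the cleanup path seen under osp_sync_fini() */
      static void sync_fini(void)
      {
              pthread_mutex_lock(&lock);
              while (pending > 0)                 /* waits forever here */
                      pthread_cond_wait(&drained, &lock);
              pthread_mutex_unlock(&lock);
      }

      int main(void)
      {
              pthread_t tid;

              pthread_create(&tid, NULL, sync_thread, NULL);
              printf("umount: waiting for the sync queue to drain...\n");
              sync_fini();           /* hangs, like the umount above */
              pthread_join(tid, NULL);
              return 0;
      }

      Under that assumption the umount cannot finish until the sync thread itself notices the closed import and stops retrying.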
      

      After this nothing could make progress:

      [145371.090429] LustreError: 22398:0:(fail.c:133:__cfs_fail_timeout_set()) cfs_fail_timeout id 407 sleeping for 10000ms
      [145380.552626] Lustre: lustre-MDT0000: Will be in recovery for at least 1:00, or until 1 client reconnects
      [145380.573743] Lustre: lustre-MDT0000: Recovery over after 0:01, of 1 clients 1 recovered and 0 were evicted.
      [145380.588321] Lustre: lustre-OST0001: deleting orphan objects from 0x0:176 to 192
      [145380.588747] Lustre: Skipped 1 previous similar message
      [145381.093065] LustreError: 22398:0:(fail.c:137:__cfs_fail_timeout_set()) cfs_fail_timeout id 407 awake
      [145459.804339] Lustre: Failing over lustre-MDT0000
      [145459.809747] LustreError: 11-0: lustre-MDT0000-mdc-ffff88008c61dbf0: Communicating with 0@lo, operation mds_reint failed with -19.
      [145459.810324] LustreError: Skipped 5 previous similar messages
      [145460.115626] LustreError: 20940:0:(client.c:1039:ptlrpc_import_delay_req()) @@@ IMP_CLOSED   req@ffff88008fd2ebf0 x1425344870597376/t0(0) o6->lustre-OST0000-osc-MDT0000@0@lo:28/4 lens 664/432 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
      [145460.157395] LustreError: 20938:0:(client.c:1039:ptlrpc_import_delay_req()) @@@ IMP_CLOSED   req@ffff880073692bf0 x1425344870597409/t0(0) o6->lustre-OST0001-osc-MDT0000@0@lo:28/4 lens 664/432 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
      [145460.158377] LustreError: 20938:0:(client.c:1039:ptlrpc_import_delay_req()) Skipped 7 previous similar messages
      [145460.608530] LustreError: 137-5: lustre-MDT0000: Not available for connect from 0@lo (stopping)
      [145465.605136] LustreError: 137-5: lustre-MDT0000: Not available for connect from 0@lo (stopping)
      [145465.606017] LustreError: Skipped 3 previous similar messages
      ...
      

      I have a crashdump.
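
      A small aside for reading the console errors above: Lustre reports failures as negative errno values, so the "-19" from mds_reint is just -ENODEV ("No such device"), consistent with the MDT being stopped while requests are still being issued. A trivial standalone check (plain C, nothing Lustre-specific):

      #include <errno.h>
      #include <stdio.h>
      #include <string.h>

      /* Decode the "-19" seen in the console log: Lustre returns negative
       * errno values, and errno 19 on Linux is ENODEV ("No such device"). */
      int main(void)
      {
              int rc = -19;   /* value copied from the mds_reint failure above */

              printf("rc %d -> %s (ENODEV == %d)\n", rc, strerror(-rc), ENODEV);
              return 0;
      }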

    People

      Assignee: bzzz Alex Zhuravlev
      Reporter: green Oleg Drokin