Lustre / LU-7755

reboot fails on lustre client if filesystem is still mounted


Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.7.0
    • Labels: None
    • Environment: RHEL7; kernel: 3.10.0_229.4.2.el7.x86_64; mlnx-OFED.3.0.2
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      When rebooting a Lustre client while a Lustre filesystem is still mounted, shutdown hangs with the following traces:
      [ OK ] Stopped LSB: Starts and stops the InfiniBand ACM service.
      [ OK ] Stopped target Network.
      Stopping LSB: Activates/Deactivates InfiniBand Drive...t boot time....
      [ 1787.569703] LNetError: 131-3: Received notification of device removal
      [ 1787.569703] Please shutdown LNET to allow this to proceed
      [ ***] A stop job is running for LSB: Activates/Deactivates...t at boot time.[ 1921.209984] INFO: task modprobe:4495 blocked for more than 120 seconds.
      [ 1921.216612] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 1921.224447] modprobe D ffff88046f6b3680 0 4495 4436 0x00000080
      [ 1921.231541] ffff880857f63c60 0000000000000082 ffff880857fa38e0 ffff880857f63fd8
      [ 1921.239017] ffff880857f63fd8 ffff880857f63fd8 ffff880857fa38e0 ffff8808667c20d8
      [ 1921.246478] ffff8808667c20e0 7fffffffffffffff ffff880857fa38e0 ffff8808667c2100
      [ 1921.253938] Call Trace:
      [ 1921.256396] [<ffffffff816095f9>] schedule+0x29/0x70
      [ 1921.261361] [<ffffffff81607549>] schedule_timeout+0x209/0x2d0
      [ 1921.267209] [<ffffffffa0786357>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
      [ 1921.273821] [<ffffffff81609af6>] wait_for_completion+0x116/0x170
      [ 1921.279913] [<ffffffff810a9500>] ? wake_up_state+0x20/0x20
      [ 1921.285490] [<ffffffffa0764dd3>] cma_remove_one+0x193/0x210 [rdma_cm]
      [ 1921.292030] [<ffffffffa03a7466>] ib_unregister_device+0x46/0xf0 [ib_core]
      [ 1921.298914] [<ffffffffa04bd5db>] mlx4_ib_remove+0xdb/0x320 [mlx4_ib]
      [ 1921.305363] [<ffffffffa0504f48>] mlx4_remove_device+0x88/0xd0 [mlx4_core]
      [ 1921.312234] [<ffffffffa0504fd3>] mlx4_unregister_interface+0x43/0x80 [mlx4_core]
      [ 1921.319717] [<ffffffffa04d563f>] mlx4_ib_cleanup+0x10/0x9d1 [mlx4_ib]
      [ 1921.326248] [<ffffffff810dad3b>] SyS_delete_module+0x16b/0x2d0
      [ 1921.332170] [<ffffffff8160f98a>] ? do_page_fault+0x1a/0x70
      [ 1921.337752] [<ffffffff81013b0c>] ? do_notify_resume+0x9c/0xb0
      [ 1921.343586] [<ffffffff81614169>] system_call_fastpath+0x16/0x1b
      [** ] A stop job is running for LSB: Activates/Deactivates...t at boot time.[ 2041.265430] INFO: task modprobe:4495 blocked for more than 120 seconds.
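The kernel message above asks for LNet to be shut down before the InfiniBand device can be removed, which suggests unmounting Lustre and unloading its modules before the InfiniBand stack is stopped. The following is a minimal sketch of such a pre-reboot check, not a confirmed fix for this ticket. The helper names `lustre_mounts` and `pre_reboot_commands` are hypothetical; the sketch assumes `/proc/mounts`-style input and that the `lustre_rmmod` script shipped with Lustre is available on the client.

```python
def lustre_mounts(mounts_text):
    """Return mount points of type 'lustre' from /proc/mounts-style text."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass.
        if len(fields) >= 3 and fields[2] == "lustre":
            points.append(fields[1])
    return points


def pre_reboot_commands(mounts_text):
    """Build the commands to run before reboot (hypothetical helper).

    Longer paths are unmounted first so nested mounts go before their
    parents. Running lustre_rmmod afterwards is an assumption: it is meant
    to unload the Lustre and LNet modules so the IB driver can be removed.
    """
    points = sorted(lustre_mounts(mounts_text), key=len, reverse=True)
    cmds = [["umount", p] for p in points]
    cmds.append(["lustre_rmmod"])
    return cmds


if __name__ == "__main__":
    with open("/proc/mounts") as f:
        for cmd in pre_reboot_commands(f.read()):
            # Print only; running the commands is left to the administrator.
            print(" ".join(cmd))
```

The script only prints the commands it would run, so it can be used safely to inspect a client before wiring anything into a shutdown unit. Mount points containing spaces are escaped in `/proc/mounts` and are not handled by this sketch.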


            People

              Assignee: wc-triage WC Triage
              Reporter: FSaunier Frederic Saunier (Inactive)