Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-4695

Timeout at end of recovery-small

    XMLWordPrintable

Details

    • Bug
    • Resolution: Duplicate
    • Minor
    • None
    • Lustre 2.6.0
    • 3
    • 12906

    Description

      This issue was created by maloo for Nathaniel Clark <nathaniel.l.clark@intel.com>

      This issue relates to the following test suite run:
      http://maloo.whamcloud.com/test_sets/95df0edc-a132-11e3-ab55-52540035b04c.

      suite_log:

      CMD: wtm-15vm4 umount -d -f /mnt/ost1
      CMD: wtm-15vm4 lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
      CMD: wtm-15vm4 ! zpool list -H lustre-ost1 >/dev/null 2>&1 ||
      			grep -q ^lustre-ost1/ /proc/mounts ||
      			zpool export  lustre-ost1
      CMD: wtm-15vm4 grep -c /mnt/ost2' ' /proc/mounts
      Stopping /mnt/ost2 (opts:-f) on wtm-15vm4
      CMD: wtm-15vm4 umount -d -f /mnt/ost2
      

      console log of OST1:

      Lustre: DEBUG MARKER: == recovery-small test complete, duration 2715 sec == 20:15:02 (1393647302)
      Lustre: lustre-OST0000: deleting orphan objects from 0x0:5251 to 0x0:5377
      Lustre: lustre-OST0001: deleting orphan objects from 0x0:5123 to 0x0:5153
      Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 5 sec
      Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 5 sec
      Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 5 sec
      Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 5 sec
      Lustre: DEBUG MARKER: grep -c /mnt/ost1' ' /proc/mounts
      Lustre: DEBUG MARKER: umount -d -f /mnt/ost1
      Lustre: server umount lustre-OST0000 complete
      Lustre: Skipped 2 previous similar messages
      Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
      Lustre: DEBUG MARKER: ! zpool list -H lustre-ost1 >/dev/null 2>&1 ||
                              grep -q ^lustre-ost1/ /proc/mounts ||
                              zpool export  lustre-ost1
      Lustre: DEBUG MARKER: grep -c /mnt/ost2' ' /proc/mounts
      Lustre: DEBUG MARKER: umount -d -f /mnt/ost2
      Lustre: lustre-OST0001 is waiting for obd_unlinked_exports more than 8 seconds. The obd refcount = 5. Is it stuck?
      LustreError: 166-1: MGC10.10.16.161@tcp: Connection to MGS (at 10.10.16.161@tcp) was lost; in progress operations using this service will fail
      LustreError: Skipped 2 previous similar messages
      Lustre: lustre-OST0001 is waiting for obd_unlinked_exports more than 16 seconds. The obd refcount = 5. Is it stuck?
      Lustre: lustre-OST0001 is waiting for obd_unlinked_exports more than 32 seconds. The obd refcount = 5. Is it stuck?
      Lustre: lustre-OST0001 is waiting for obd_unlinked_exports more than 64 seconds. The obd refcount = 5. Is it stuck?
      Lustre: lustre-OST0001 is waiting for obd_unlinked_exports more than 128 seconds. The obd refcount = 5. Is it stuck?
      INFO: task umount:26349 blocked for more than 120 seconds.
            Tainted: P           ---------------    2.6.32-431.5.1.el6_lustre.g1131719.x86_64 #1
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      umount        D 0000000000000001     0 26349  26348 0x00000080
       ffff88001bebdaa8 0000000000000086 ffff88001bebda08 ffff8800118be000
       ffffffffa13a5a09 0000000000000000 ffff880010c85084 ffffffffa13a5a09
       ffff88002b5545f8 ffff88001bebdfd8 000000000000fbc8 ffff88002b5545f8
      Call Trace:
       [<ffffffff81528eb2>] schedule_timeout+0x192/0x2e0
       [<ffffffff81084220>] ? process_timeout+0x0/0x10
       [<ffffffffa132a29b>] obd_exports_barrier+0xab/0x180 [obdclass]
       [<ffffffffa187f98b>] ofd_device_fini+0x6b/0x250 [ofd]
       [<ffffffffa1353a03>] class_cleanup+0x573/0xd30 [obdclass]
       [<ffffffffa132c436>] ? class_name2dev+0x56/0xe0 [obdclass]
       [<ffffffffa135572a>] class_process_config+0x156a/0x1ad0 [obdclass]
       [<ffffffffa07bf4e8>] ? libcfs_log_return+0x28/0x40 [libcfs]
       [<ffffffffa134da42>] ? lustre_cfg_new+0x312/0x6e0 [obdclass]
       [<ffffffffa1355e09>] class_manual_cleanup+0x179/0x6f0 [obdclass]
       [<ffffffffa132c436>] ? class_name2dev+0x56/0xe0 [obdclass]
       [<ffffffffa138f1f9>] server_put_super+0x8e9/0xe40 [obdclass]
       [<ffffffff8118b23b>] generic_shutdown_super+0x5b/0xe0
       [<ffffffff8118b326>] kill_anon_super+0x16/0x60
       [<ffffffffa1357cc6>] lustre_kill_super+0x36/0x60 [obdclass]
       [<ffffffff8118bac7>] deactivate_super+0x57/0x80
       [<ffffffff811aaaff>] mntput_no_expire+0xbf/0x110
       [<ffffffff811ab64b>] sys_umount+0x7b/0x3a0
       [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
      Lustre: 12217:0:(client.c:1912:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1393647833/real 1393647833]  req@ffff880038f04800 x1461340392671056/t0(0) o250->MGC10.10.16.161@tcp@10.10.16.161@tcp:26/25 lens 400/544 e 0 to 1 dl 1393647858 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      Lustre: 12217:0:(client.c:1912:ptlrpc_expire_one_request()) Skipped 20 previous similar messages
      Lustre: lustre-OST0001 is waiting for obd_unlinked_exports more than 256 seconds. The obd refcount = 5. Is it stuck?
      

      Attachments

        Issue Links

          Activity

            People

              wc-triage WC Triage
              maloo Maloo
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: