[LU-4695] Timeout at end of recovery-small Created: 03/Mar/14  Updated: 04/Jun/14  Resolved: 04/Jun/14

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.6.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: WC Triage
Resolution: Duplicate Votes: 0
Labels: zfs

Issue Links:
Duplicate
is duplicated by LU-3665 obdfilter-survey test_3a: unmount stu... Resolved
Related
is related to LU-3230 conf-sanity fails to start run: umoun... Resolved
Severity: 3
Rank (Obsolete): 12906

 Description   

This issue was created by maloo for Nathaniel Clark <nathaniel.l.clark@intel.com>

This issue relates to the following test suite run:
http://maloo.whamcloud.com/test_sets/95df0edc-a132-11e3-ab55-52540035b04c.

suite_log:

CMD: wtm-15vm4 umount -d -f /mnt/ost1
CMD: wtm-15vm4 lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
CMD: wtm-15vm4 ! zpool list -H lustre-ost1 >/dev/null 2>&1 ||
			grep -q ^lustre-ost1/ /proc/mounts ||
			zpool export  lustre-ost1
CMD: wtm-15vm4 grep -c /mnt/ost2' ' /proc/mounts
Stopping /mnt/ost2 (opts:-f) on wtm-15vm4
CMD: wtm-15vm4 umount -d -f /mnt/ost2
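The three-part one-liner in the suite log above is a short-circuit guard: the pool is exported only when it is actually imported and none of its datasets still appear in /proc/mounts. A standalone sketch of the same guard (the `pool` variable is illustrative; the original hard-codes `lustre-ost1`):

```shell
# Guarded ZFS pool export, as in the test-suite teardown above.
# The || chain succeeds early (and skips the export) when either:
#   (1) the pool is not imported, or
#   (2) a dataset from the pool is still mounted.
pool=lustre-ost1

! zpool list -H "$pool" >/dev/null 2>&1 ||   # pool not imported: nothing to do
    grep -q "^$pool/" /proc/mounts ||        # dataset still mounted: skip export
    zpool export "$pool"                     # otherwise export the pool
```

Either way the chain exits 0, so the teardown script does not abort when the pool was never imported.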

console log of OST1:

Lustre: DEBUG MARKER: == recovery-small test complete, duration 2715 sec == 20:15:02 (1393647302)
Lustre: lustre-OST0000: deleting orphan objects from 0x0:5251 to 0x0:5377
Lustre: lustre-OST0001: deleting orphan objects from 0x0:5123 to 0x0:5153
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 5 sec
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 5 sec
Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 5 sec
Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 5 sec
Lustre: DEBUG MARKER: grep -c /mnt/ost1' ' /proc/mounts
Lustre: DEBUG MARKER: umount -d -f /mnt/ost1
Lustre: server umount lustre-OST0000 complete
Lustre: Skipped 2 previous similar messages
Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
Lustre: DEBUG MARKER: ! zpool list -H lustre-ost1 >/dev/null 2>&1 ||
                        grep -q ^lustre-ost1/ /proc/mounts ||
                        zpool export  lustre-ost1
Lustre: DEBUG MARKER: grep -c /mnt/ost2' ' /proc/mounts
Lustre: DEBUG MARKER: umount -d -f /mnt/ost2
Lustre: lustre-OST0001 is waiting for obd_unlinked_exports more than 8 seconds. The obd refcount = 5. Is it stuck?
LustreError: 166-1: MGC10.10.16.161@tcp: Connection to MGS (at 10.10.16.161@tcp) was lost; in progress operations using this service will fail
LustreError: Skipped 2 previous similar messages
Lustre: lustre-OST0001 is waiting for obd_unlinked_exports more than 16 seconds. The obd refcount = 5. Is it stuck?
Lustre: lustre-OST0001 is waiting for obd_unlinked_exports more than 32 seconds. The obd refcount = 5. Is it stuck?
Lustre: lustre-OST0001 is waiting for obd_unlinked_exports more than 64 seconds. The obd refcount = 5. Is it stuck?
Lustre: lustre-OST0001 is waiting for obd_unlinked_exports more than 128 seconds. The obd refcount = 5. Is it stuck?
INFO: task umount:26349 blocked for more than 120 seconds.
      Tainted: P           ---------------    2.6.32-431.5.1.el6_lustre.g1131719.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
umount        D 0000000000000001     0 26349  26348 0x00000080
 ffff88001bebdaa8 0000000000000086 ffff88001bebda08 ffff8800118be000
 ffffffffa13a5a09 0000000000000000 ffff880010c85084 ffffffffa13a5a09
 ffff88002b5545f8 ffff88001bebdfd8 000000000000fbc8 ffff88002b5545f8
Call Trace:
 [<ffffffff81528eb2>] schedule_timeout+0x192/0x2e0
 [<ffffffff81084220>] ? process_timeout+0x0/0x10
 [<ffffffffa132a29b>] obd_exports_barrier+0xab/0x180 [obdclass]
 [<ffffffffa187f98b>] ofd_device_fini+0x6b/0x250 [ofd]
 [<ffffffffa1353a03>] class_cleanup+0x573/0xd30 [obdclass]
 [<ffffffffa132c436>] ? class_name2dev+0x56/0xe0 [obdclass]
 [<ffffffffa135572a>] class_process_config+0x156a/0x1ad0 [obdclass]
 [<ffffffffa07bf4e8>] ? libcfs_log_return+0x28/0x40 [libcfs]
 [<ffffffffa134da42>] ? lustre_cfg_new+0x312/0x6e0 [obdclass]
 [<ffffffffa1355e09>] class_manual_cleanup+0x179/0x6f0 [obdclass]
 [<ffffffffa132c436>] ? class_name2dev+0x56/0xe0 [obdclass]
 [<ffffffffa138f1f9>] server_put_super+0x8e9/0xe40 [obdclass]
 [<ffffffff8118b23b>] generic_shutdown_super+0x5b/0xe0
 [<ffffffff8118b326>] kill_anon_super+0x16/0x60
 [<ffffffffa1357cc6>] lustre_kill_super+0x36/0x60 [obdclass]
 [<ffffffff8118bac7>] deactivate_super+0x57/0x80
 [<ffffffff811aaaff>] mntput_no_expire+0xbf/0x110
 [<ffffffff811ab64b>] sys_umount+0x7b/0x3a0
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Lustre: 12217:0:(client.c:1912:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1393647833/real 1393647833]  req@ffff880038f04800 x1461340392671056/t0(0) o250->MGC10.10.16.161@tcp@10.10.16.161@tcp:26/25 lens 400/544 e 0 to 1 dl 1393647858 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: 12217:0:(client.c:1912:ptlrpc_expire_one_request()) Skipped 20 previous similar messages
Lustre: lustre-OST0001 is waiting for obd_unlinked_exports more than 256 seconds. The obd refcount = 5. Is it stuck?
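The `obd_unlinked_exports` messages above double their reporting interval (8, 16, 32, 64, 128, 256 seconds) while `obd_exports_barrier()` waits for export references to drain; meanwhile the umount thread stays blocked in `schedule_timeout()`, which is what triggers the 120-second hung-task warning. A minimal sketch of that doubling-report pattern (illustrative only, not Lustre source):

```shell
# Doubling watchdog interval, as seen in the console log:
# each "Is it stuck?" report waits twice as long as the previous one.
waited=8
while [ "$waited" -le 256 ]; do
    echo "waiting for obd_unlinked_exports more than $waited seconds"
    waited=$((waited * 2))
done
```

With a refcount that never drops (here stuck at 5), the real barrier keeps doubling indefinitely, so the umount never completes on its own.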


 Comments   
Comment by Nathaniel Clark [ 03/Mar/14 ]

recovery-small/29b review-zfs on master (pre 2.6)
https://maloo.whamcloud.com/test_sets/a2782ad6-a063-11e3-9f3a-52540035b04c

Comment by Nathaniel Clark [ 03/Mar/14 ]

replay-single/42 review-zfs on master (pre 2.6)
https://maloo.whamcloud.com/test_sets/1a3d804a-a003-11e3-bfe8-52540035b04c

Comment by Jodi Levi (Inactive) [ 18/Apr/14 ]

Nathaniel,
Can you confirm if this is still occurring?
Thank you!

Comment by Nathaniel Clark [ 21/Apr/14 ]

replay-single/42 review-zfs on master (pre 2.6)
https://maloo.whamcloud.com/test_sets/d7cf8e58-c82d-11e3-98ab-52540035b04c

Comment by Jodi Levi (Inactive) [ 04/Jun/14 ]

Reopening the ticket to remove the fix version, since this issue is a duplicate.
