Details
Bug
Resolution: Duplicate
Minor
None
Lustre 2.4.0
None
3
7057
Description
While running various tests in a loop, I see the "MGS is waiting for obd_unlinked_exports ... Is it stuck?" message fairly regularly.
The latest occurrence was in replay-single test 52:
== replay-single test 52: time out lock replay (3764) == 01:37:31 (1362638251)
Filesystem 1K-blocks Used Available Use% Mounted on
192.168.10.216@tcp:/lustre 374928 50772 303616 15% /mnt/lustre
mcreate: cannot create `/mnt/lustre2/fsa-centos6-6.localnet' with mode 0100644: Read-only file system
rm: cannot remove `/mnt/lustre2/fsa-centos6-6.localnet': No such file or directory
fail_loc=0x8000030c
Failing mds1 on centos6-6.localnet
Stopping /mnt/mds1 (opts:) on centos6-6.localnet
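The fail_loc=0x8000030c line is the test arming a Lustre fault-injection point before mds1 is failed. A minimal Python sketch for splitting such a value is shown below. It assumes the libcfs convention that the high bit (0x80000000) means "trigger only once" and treats the low 16 bits as the injection point id; both masks and the decode_fail_loc helper are illustrative assumptions, not copied from the Lustre headers.

```python
# Hypothetical helper; the bit layout below is an assumption about the
# libcfs fail_loc convention, not a copy of the Lustre headers.
CFS_FAIL_ONCE = 0x80000000  # assumed: high bit = trigger the fault only once
FAIL_ID_MASK = 0x0000FFFF   # assumed: low 16 bits = injection point id


def decode_fail_loc(value: int) -> dict:
    """Split a fail_loc value into the once-flag, the id, and leftover bits."""
    value &= 0xFFFFFFFF  # fail_loc is treated as a 32-bit quantity
    return {
        "once": bool(value & CFS_FAIL_ONCE),
        "fail_id": value & FAIL_ID_MASK,
        # Any bits outside the once-flag and the id mask (none expected here).
        "other_flags": value & ~(CFS_FAIL_ONCE | FAIL_ID_MASK) & 0xFFFFFFFF,
    }


if __name__ == "__main__":
    info = decode_fail_loc(0x8000030C)
    print(info["once"], hex(info["fail_id"]), hex(info["other_flags"]))
    # prints: True 0x30c 0x0
```

Which hook 0x30c selects is defined by the OBD_FAIL_* constants in the Lustre source and is not decoded here.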
[37981.683825] Lustre: DEBUG MARKER: == replay-single test 52: time out lock replay (3764) == 01:37:31 (1362638251)
[37981.704645] Lustre: DEBUG MARKER: cancel_lru_locks mdc start
[37981.735425] Lustre: DEBUG MARKER: cancel_lru_locks mdc stop
[37981.917515] Turning device loop0 (0x700000) read-only
[37981.942649] Lustre: DEBUG MARKER: mds1 REPLAY BARRIER on lustre-MDT0000
[37981.950096] Lustre: DEBUG MARKER: local REPLAY BARRIER on lustre-MDT0000
[37984.526745] LustreError: 137-5: UUID 'lustre-MDT0000_UUID' is not available for connect (no target)
[37984.527548] LustreError: Skipped 1 previous similar message
[37989.523057] LustreError: 137-5: UUID 'lustre-MDT0000_UUID' is not available for connect (no target)
[37989.523952] LustreError: Skipped 2 previous similar messages
[37996.136083] Lustre: MGS is waiting for obd_unlinked_exports more than 8 seconds. The obd refcount = 5. Is it stuck?
[37999.132513] LustreError: 137-5: UUID 'lustre-MDT0000_UUID' is not available for connect (no target)
[38012.137563] Lustre: MGS is waiting for obd_unlinked_exports more than 16 seconds. The obd refcount = 5. Is it stuck?
[38019.133096] LustreError: 137-5: UUID 'lustre-MDT0000_UUID' is not available for connect (no target)
[38019.133919] LustreError: Skipped 11 previous similar messages
[38044.137577] Lustre: MGS is waiting for obd_unlinked_exports more than 32 seconds. The obd refcount = 5. Is it stuck?
[38050.132095] Lustre: 11212:0:(client.c:1866:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1362638298/real 1362638298] req@ffff8800b451a7f0 x1428827873041828/t0(0) o250->MGC192.168.10.216@tcp@0@lo:26/25 lens 400/544 e 0 to 1 dl 1362638319 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
[38050.134028] Lustre: 11212:0:(client.c:1866:ptlrpc_expire_one_request()) Skipped 25 previous similar messages
[38054.136029] LustreError: 137-5: UUID 'lustre-MDT0000_UUID' is not available for connect (no target)
[38054.136894] LustreError: Skipped 20 previous similar messages
[38108.137578] Lustre: MGS is waiting for obd_unlinked_exports more than 64 seconds. The obd refcount = 5. Is it stuck?
[38119.136041] LustreError: 137-5: UUID 'lustre-MDT0000_UUID' is not available for connect (no target)
[38119.137021] LustreError: Skipped 38 previous similar messages
[38236.137579] Lustre: MGS is waiting for obd_unlinked_exports more than 128 seconds. The obd refcount = 5. Is it stuck?
[38249.135021] LustreError: 137-5: UUID 'lustre-MDT0000_UUID' is not available for connect (no target)
[38249.135879] LustreError: Skipped 77 previous similar messages
[38400.624138] INFO: task umount:3429 blocked for more than 120 seconds.
[38400.624649] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[38400.625436] umount D 0000000000000003 2608 3429 3428 0x00000000
[38400.626043] ffff88006ed25a98 0000000000000086 0000000000000000 ffff88006ed25a48
[38400.626850] ffff88006ed25a08 ffff88006a542bf0 ffffffffa0d9320f 0000000000000000
[38400.627634] ffff88007a544ab8 ffff88006ed25fd8 000000000000fba8 ffff88007a544ab8
[38400.628438] Call Trace:
[38400.628792] [<ffffffff814f8ad1>] schedule_timeout+0x191/0x2e0
[38400.629232] [<ffffffff8107bcd0>] ? process_timeout+0x0/0x10
[38400.629751] [<ffffffffa0a7f75d>] cfs_schedule_timeout_and_set_state+0x1d/0x20 [libcfs]
[38400.630619] [<ffffffffa0d19670>] obd_exports_barrier+0xb0/0x190 [obdclass]
[38400.631133] [<ffffffffa05d2936>] mgs_device_fini+0xf6/0x5c0 [mgs]
[38400.631615] [<ffffffffa0d45cc7>] class_cleanup+0x577/0xda0 [obdclass]
[38400.633249] [<ffffffffa0d1be9c>] ? class_name2dev+0x7c/0xe0 [obdclass]
[38400.633799] [<ffffffffa0d475ac>] class_process_config+0x10bc/0x1c80 [obdclass]
[38400.634531] [<ffffffffa0d40f93>] ? lustre_cfg_new+0x353/0x7e0 [obdclass]
[38400.635035] [<ffffffffa0d482e9>] class_manual_cleanup+0x179/0x6e0 [obdclass]
[38400.635513] [<ffffffff814faebe>] ? _read_unlock+0xe/0x10
[38400.635993] [<ffffffffa0d1be9c>] ? class_name2dev+0x7c/0xe0 [obdclass]
[38400.636501] [<ffffffffa0d5415d>] server_put_super+0x43d/0xe60 [obdclass]
[38400.637008] [<ffffffff8117d6ab>] generic_shutdown_super+0x5b/0xe0
[38400.637480] [<ffffffff8117d796>] kill_anon_super+0x16/0x60
[38400.638004] [<ffffffffa0d4a0e6>] lustre_kill_super+0x36/0x60 [obdclass]
[38400.638468] [<ffffffff8117e825>] deactivate_super+0x85/0xa0
[38400.638904] [<ffffffff8119a89f>] mntput_no_expire+0xbf/0x110
[38400.639338] [<ffffffff8119b34b>] sys_umount+0x7b/0x3a0
[38400.639764] [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
[38492.136071] Lustre: MGS is waiting for obd_unlinked_exports more than 256 seconds. The obd refcount = 5. Is it stuck?
[38509.132648] LustreError: 137-5: UUID 'lustre-MDT0000_UUID' is not available for connect (no target)
[38509.133504] LustreError: Skipped 154 previous similar messages
...
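The MGS warnings report 8, 16, 32, 64, 128 and 256 seconds, and the gap between consecutive warnings doubles as well. Together with the stack trace (mgs_device_fini calling obd_exports_barrier, which sleeps via cfs_schedule_timeout_and_set_state while umount sits in state D), this suggests a wait loop with an exponentially growing uninterruptible sleep. The Python sketch below models such a loop and compares it with the timestamps from this log. The 2 s starting sleep, the 8 s warning threshold and all helper names are assumptions inferred from the log, not taken from the Lustre source.

```python
import re

# Hypothetical model of the barrier wait loop: sleep, then warn if the sleep
# length has reached warn_from, then double the sleep. Inferred from the log.
def warning_schedule(rounds, first_sleep=2, warn_from=8):
    """Return (elapsed_seconds, reported_n) for each warning the model emits."""
    elapsed, sleep, out = 0, first_sleep, []
    for _ in range(rounds):
        elapsed += sleep  # uninterruptible sleep, so the task shows as 'D'
        if sleep >= warn_from:
            out.append((elapsed, sleep))
        sleep *= 2
    return out


WARN_RE = re.compile(
    r"\[\s*(\d+\.\d+)\] Lustre: MGS is waiting for obd_unlinked_exports "
    r"more than (\d+) seconds")


def observed_warnings(log_text):
    """Extract (kernel_timestamp, reported_n) pairs from a console log."""
    return [(float(t), int(n)) for t, n in WARN_RE.findall(log_text)]


def gaps(points):
    """Time between consecutive warnings."""
    return [b[0] - a[0] for a, b in zip(points, points[1:])]


# The six warning lines copied from the console log above.
SAMPLE = """\
[37996.136083] Lustre: MGS is waiting for obd_unlinked_exports more than 8 seconds. The obd refcount = 5. Is it stuck?
[38012.137563] Lustre: MGS is waiting for obd_unlinked_exports more than 16 seconds. The obd refcount = 5. Is it stuck?
[38044.137577] Lustre: MGS is waiting for obd_unlinked_exports more than 32 seconds. The obd refcount = 5. Is it stuck?
[38108.137578] Lustre: MGS is waiting for obd_unlinked_exports more than 64 seconds. The obd refcount = 5. Is it stuck?
[38236.137579] Lustre: MGS is waiting for obd_unlinked_exports more than 128 seconds. The obd refcount = 5. Is it stuck?
[38492.136071] Lustre: MGS is waiting for obd_unlinked_exports more than 256 seconds. The obd refcount = 5. Is it stuck?
"""

if __name__ == "__main__":
    obs = observed_warnings(SAMPLE)
    model = warning_schedule(rounds=8)
    # Reported N values should line up one-to-one with the model.
    print([n for _, n in obs] == [n for _, n in model])
    # Gaps between warnings should match the model's sleep lengths.
    print(all(abs(o - m) < 0.01 for o, m in zip(gaps(obs), gaps(model))))
```

If the model holds, the first warning at 37996.1 puts entry into the barrier about 14 s earlier, near 37982.1, close to the REPLAY BARRIER markers. The obd refcount stays at 5 in every warning, which suggests the remaining references are never dropped and the loop never exits.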
I saved a crash dump so all interested parties can take a look: /exports/crashdumps/t2/hung-obd_unlinked_exports.dmp (the modules are there too).