[LU-7888] kernel: INFO: task mount.lustre:22219 blocked for more than 120 seconds Created: 19/Mar/16  Updated: 17/Aug/16  Resolved: 16/May/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.7.0, Lustre 2.9.0
Fix Version/s: Lustre 2.9.0

Type: Bug Priority: Major
Reporter: Andreas Dilger Assignee: nasf (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Duplicate
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

During a test, the following caused the test to fail:

Mar  2 15:12:36 lotus-42vm5 kernel: LDISKFS-fs (sdc): mounted filesystem with ordered data mode. quota=on. Opts: 
Mar  2 15:12:36 lotus-42vm5 kernel: 
Mar  2 15:12:38 lotus-42vm5 kernel: LDISKFS-fs (sda): mounted filesystem with ordered data mode. quota=on. Opts: 
Mar  2 15:12:38 lotus-42vm5 kernel: 
Mar  2 15:12:38 lotus-42vm5 kernel: LDISKFS-fs (sde): mounted filesystem with ordered data mode. quota=on. Opts: 
Mar  2 15:12:38 lotus-42vm5 kernel: 
Mar  2 15:12:38 lotus-42vm5 kernel: LDISKFS-fs (sdd): mounted filesystem with ordered data mode. quota=on. Opts: 
Mar  2 15:12:38 lotus-42vm5 kernel: 
Mar  2 15:13:03 lotus-42vm5 kernel: LDISKFS-fs (sdc): mounted filesystem with ordered data mode. quota=on. Opts: 
Mar  2 15:13:03 lotus-42vm5 kernel: 
Mar  2 15:13:03 lotus-42vm5 kernel: LDISKFS-fs (sdc): mounted filesystem with ordered data mode. quota=on. Opts: 
Mar  2 15:13:03 lotus-42vm5 kernel: 
Mar  2 15:13:09 lotus-42vm5 kernel: LDISKFS-fs (sdd): mounted filesystem with ordered data mode. quota=on. Opts: 
Mar  2 15:13:09 lotus-42vm5 kernel: 
Mar  2 15:13:09 lotus-42vm5 kernel: LDISKFS-fs (sdd): mounted filesystem with ordered data mode. quota=on. Opts: 
Mar  2 15:13:09 lotus-42vm5 kernel: 
Mar  2 15:13:09 lotus-42vm5 kernel: Lustre: ctl-testfs-MDT0000: No data found on store. Initialize space
Mar  2 15:13:09 lotus-42vm5 kernel: Lustre: testfs-MDT0000: new disk, initializing
Mar  2 15:13:09 lotus-42vm5 kernel: Lustre: Failing over testfs-MDT0000
Mar  2 15:13:10 lotus-42vm5 kernel: Lustre: server umount testfs-MDT0000 complete
Mar  2 15:13:10 lotus-42vm5 kernel: LDISKFS-fs (sda): mounted filesystem with ordered data mode. quota=on. Opts: 
Mar  2 15:13:10 lotus-42vm5 kernel: 
Mar  2 15:13:10 lotus-42vm5 kernel: LDISKFS-fs (sda): mounted filesystem with ordered data mode. quota=on. Opts: 
Mar  2 15:13:10 lotus-42vm5 kernel: 
Mar  2 15:13:10 lotus-42vm5 kernel: Lustre: srv-testfs-MDT0001: No data found on store. Initialize space
Mar  2 15:13:10 lotus-42vm5 kernel: Lustre: Skipped 1 previous similar message
Mar  2 15:13:10 lotus-42vm5 kernel: Lustre: testfs-MDT0001: new disk, initializing
Mar  2 15:13:10 lotus-42vm5 kernel: LustreError: 137-5: testfs-MDT0000_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
Mar  2 15:13:41 lotus-42vm5 kernel: Lustre: 7592:0:(client.c:2048:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1456960415/real 1456960415]  req@ffff880061d19c80 x1527730997821688/t0(0) o38->testfs-MDT0000-osp-MDT0001@10.14.82.129@tcp:24/4 lens 520/544 e 0 to 1 dl 1456960421 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Mar  2 15:13:42 lotus-42vm5 kernel: LDISKFS-fs (sdd): mounted filesystem with ordered data mode. quota=on. Opts: 
Mar  2 15:13:42 lotus-42vm5 kernel: 
Mar  2 15:14:00 lotus-42vm5 kernel: LustreError: 137-5: testfs-MDT0000_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
Mar  2 15:14:00 lotus-42vm5 kernel: LustreError: Skipped 1 previous similar message
Mar  2 15:14:36 lotus-42vm5 kernel: Lustre: 7592:0:(client.c:2048:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1456960465/real 1456960465]  req@ffff88005e254980 x1527730997821712/t0(0) o38->testfs-MDT0000-osp-MDT0001@10.14.82.129@tcp:24/4 lens 520/544 e 0 to 1 dl 1456960476 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Mar  2 15:14:36 lotus-42vm5 kernel: Lustre: 7592:0:(client.c:2048:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Mar  2 15:14:50 lotus-42vm5 kernel: LustreError: 137-5: testfs-MDT0000_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
Mar  2 15:14:50 lotus-42vm5 kernel: LustreError: Skipped 1 previous similar message
Mar  2 15:15:31 lotus-42vm5 kernel: Lustre: 7592:0:(client.c:2048:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1456960515/real 1456960515]  req@ffff88005e254c80 x1527730997821736/t0(0) o38->testfs-MDT0000-osp-MDT0001@10.14.82.129@tcp:24/4 lens 520/544 e 0 to 1 dl 1456960531 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Mar  2 15:15:31 lotus-42vm5 kernel: Lustre: 7592:0:(client.c:2048:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Mar  2 15:15:40 lotus-42vm5 kernel: LustreError: 137-5: testfs-MDT0000_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
Mar  2 15:15:40 lotus-42vm5 kernel: LustreError: Skipped 1 previous similar message
Mar  2 15:15:44 lotus-42vm5 kernel: INFO: task mount.lustre:22219 blocked for more than 120 seconds.
Mar  2 15:15:44 lotus-42vm5 kernel:      Not tainted 2.6.32-573.18.1.el6_lustre.x86_64 #1
Mar  2 15:15:44 lotus-42vm5 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar  2 15:15:44 lotus-42vm5 kernel: mount.lustre  D 0000000000000000     0 22219  22217 0x00000080
Mar  2 15:15:44 lotus-42vm5 kernel: ffff88007840f808 0000000000000082 ffff880002215a00 ffff88007bb26f4b
Mar  2 15:15:44 lotus-42vm5 kernel: ffffffffa12a9e1b ffffffffa12a9e19 ffff88007840f828 ffff88007bb26f78
Mar  2 15:15:44 lotus-42vm5 kernel: ffff88007840f818 ffffffff8129caf8 ffff88007c5c45f8 ffff88007840ffd8
Mar  2 15:15:44 lotus-42vm5 kernel: Call Trace:
Mar  2 15:15:44 lotus-42vm5 kernel: [<ffffffff8129caf8>] ? vsnprintf+0x218/0x5e0
Mar  2 15:15:44 lotus-42vm5 kernel: [<ffffffff8153bc06>] __mutex_lock_slowpath+0x96/0x210
Mar  2 15:15:44 lotus-42vm5 kernel: [<ffffffff8129cf64>] ? snprintf+0x34/0x40
Mar  2 15:15:44 lotus-42vm5 kernel: [<ffffffffa125404f>] ? server_name2fsname+0x6f/0x90 [obdclass]
Mar  2 15:15:44 lotus-42vm5 kernel: [<ffffffffa040a1f0>] ? qsd_conn_callback+0x0/0x180 [lquota]
Mar  2 15:15:44 lotus-42vm5 kernel: [<ffffffff8153b72b>] mutex_lock+0x2b/0x50
Mar  2 15:15:44 lotus-42vm5 kernel: [<ffffffffa128ac3f>] lustre_register_lwp_item+0xdf/0xab0 [obdclass]
Mar  2 15:15:44 lotus-42vm5 kernel: [<ffffffffa040bd09>] qsd_prepare+0x949/0x11b0 [lquota]
Mar  2 15:15:44 lotus-42vm5 kernel: [<ffffffffa0464c82>] osd_prepare+0x132/0x720 [osd_ldiskfs]
Mar  2 15:15:44 lotus-42vm5 kernel: [<ffffffffa08e5640>] lod_prepare+0xf0/0x1e0 [lod]
Mar  2 15:15:44 lotus-42vm5 kernel: [<ffffffffa095d0af>] mdd_prepare+0xef/0x1280 [mdd]
Mar  2 15:15:44 lotus-42vm5 kernel: [<ffffffffa149f81b>] ? tgt_ses_key_init+0x6b/0x190 [ptlrpc]
Mar  2 15:15:44 lotus-42vm5 kernel: [<ffffffffa125ff2d>] ? keys_fill+0xdd/0x1c0 [obdclass]
Mar  2 15:15:44 lotus-42vm5 kernel: [<ffffffffa07faf76>] mdt_prepare+0x56/0x3b0 [mdt]
Mar  2 15:15:44 lotus-42vm5 kernel: [<ffffffffa1264403>] ? lu_context_init+0xa3/0x240 [obdclass]
Mar  2 15:15:44 lotus-42vm5 kernel: [<ffffffffa128a51f>] server_start_targets+0x176f/0x1db0 [obdclass]
Mar  2 15:15:44 lotus-42vm5 kernel: [<ffffffffa1252550>] ? class_config_llog_handler+0x0/0x17b0 [obdclass]
Mar  2 15:15:44 lotus-42vm5 kernel: [<ffffffffa1291595>] server_fill_super+0xbe5/0x1a7c [obdclass]
Mar  2 15:15:44 lotus-42vm5 kernel: [<ffffffffa125cc82>] lustre_fill_super+0xa82/0x2150 [obdclass]
Mar  2 15:15:44 lotus-42vm5 kernel: [<ffffffffa125c200>] ? lustre_fill_super+0x0/0x2150 [obdclass]
Mar  2 15:15:44 lotus-42vm5 kernel: [<ffffffff8119567f>] get_sb_nodev+0x5f/0xa0
Mar  2 15:15:44 lotus-42vm5 kernel: [<ffffffffa1253de5>] lustre_get_sb+0x25/0x30 [obdclass]
Mar  2 15:15:44 lotus-42vm5 kernel: [<ffffffff81194cbb>] vfs_kern_mount+0x7b/0x1b0
Mar  2 15:15:44 lotus-42vm5 kernel: [<ffffffff81194e62>] do_kern_mount+0x52/0x130
Mar  2 15:15:44 lotus-42vm5 kernel: [<ffffffff811b6e1b>] do_mount+0x2fb/0x930
Mar  2 15:15:44 lotus-42vm5 kernel: [<ffffffff811b74e0>] sys_mount+0x90/0xe0
Mar  2 15:15:44 lotus-42vm5 kernel: [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
Mar  2 15:15:44 lotus-42vm5 kernel: INFO: task mount.lustre:22496 blocked for more than 120 seconds.
Mar  2 15:15:44 lotus-42vm5 kernel:      Not tainted 2.6.32-573.18.1.el6_lustre.x86_64 #1
Mar  2 15:15:44 lotus-42vm5 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar  2 15:15:44 lotus-42vm5 kernel: mount.lustre  D 0000000000000000     0 22496  22495 0x00000080
Mar  2 15:15:44 lotus-42vm5 kernel: ffff88006083f908 0000000000000082 ffffffffffffffff 0000000000000000
Mar  2 15:15:44 lotus-42vm5 kernel: 0000000000000001 77fa41c5810ac58a 0000000000000e3b ffff880077fa41c5
Mar  2 15:15:44 lotus-42vm5 kernel: 0000000000000000 0000000affffffff ffff880077e805f8 ffff88006083ffd8
Mar  2 15:15:44 lotus-42vm5 kernel: Call Trace:
Mar  2 15:15:44 lotus-42vm5 kernel: [<ffffffffa036db53>] ? libcfs_debug_vmsg2+0x5e3/0xbe0 [libcfs]
Mar  2 15:15:44 lotus-42vm5 kernel: [<ffffffff8153bc06>] __mutex_lock_slowpath+0x96/0x210
Mar  2 15:15:44 lotus-42vm5 kernel: [<ffffffff8153b72b>] mutex_lock+0x2b/0x50
Mar  2 15:15:44 lotus-42vm5 kernel: [<ffffffffa0341d10>] mgc_set_info_async+0x450/0x1a50 [mgc]
Mar  2 15:15:44 lotus-42vm5 kernel: [<ffffffffa036e191>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
Mar  2 15:15:44 lotus-42vm5 kernel: [<ffffffffa1284745>] server_mgc_set_fs+0x115/0x4e0 [obdclass]
Mar  2 15:15:44 lotus-42vm5 kernel: [<ffffffffa1288e5f>] server_start_targets+0xaf/0x1db0 [obdclass]
Mar  2 15:15:44 lotus-42vm5 kernel: [<ffffffffa125ab48>] ? lustre_start_mgc+0xac8/0x2180 [obdclass]
Mar  2 15:15:44 lotus-42vm5 kernel: [<ffffffffa1291595>] server_fill_super+0xbe5/0x1a7c [obdclass]
Mar  2 15:15:44 lotus-42vm5 kernel: [<ffffffffa125cc82>] lustre_fill_super+0xa82/0x2150 [obdclass]
Mar  2 15:15:44 lotus-42vm5 kernel: [<ffffffffa125c200>] ? lustre_fill_super+0x0/0x2150 [obdclass]
Mar  2 15:15:44 lotus-42vm5 kernel: [<ffffffff8119567f>] get_sb_nodev+0x5f/0xa0
Mar  2 15:15:44 lotus-42vm5 kernel: [<ffffffffa1253de5>] lustre_get_sb+0x25/0x30 [obdclass]
Mar  2 15:15:44 lotus-42vm5 kernel: [<ffffffff81194cbb>] vfs_kern_mount+0x7b/0x1b0
Mar  2 15:15:44 lotus-42vm5 kernel: [<ffffffff81194e62>] do_kern_mount+0x52/0x130
Mar  2 15:15:44 lotus-42vm5 kernel: [<ffffffff811b6e1b>] do_mount+0x2fb/0x930
Mar  2 15:15:44 lotus-42vm5 kernel: [<ffffffff811b74e0>] sys_mount+0x90/0xe0
Mar  2 15:15:44 lotus-42vm5 kernel: [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
Mar  2 15:16:26 lotus-42vm5 kernel: Lustre: 7592:0:(client.c:2048:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1456960565/real 1456960565]  req@ffff88005e254980 x1527730997821760/t0(0) o38->testfs-MDT0000-osp-MDT0001@10.14.82.129@tcp:24/4 lens 520/544 e 0 to 1 dl 1456960586 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Mar  2 15:16:26 lotus-42vm5 kernel: Lustre: 7592:0:(client.c:2048:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Mar  2 15:16:30 lotus-42vm5 kernel: LustreError: 137-5: testfs-MDT0000_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
Mar  2 15:16:30 lotus-42vm5 kernel: LustreError: Skipped 1 previous similar message
Mar  2 15:17:21 lotus-42vm5 kernel: Lustre: 7592:0:(client.c:2048:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1456960615/real 1456960615]  req@ffff88005e254c80 x1527730997821784/t0(0) o38->testfs-MDT0000-osp-MDT0001@10.14.82.129@tcp:24/4 lens 520/544 e 0 to 1 dl 1456960641 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
class_config_llog_handler+0x0/0x17b0 [obdclass]
Mar  2 15:19:00 lotus-42vm5 kernel: LustreError: 137-5: testfs-MDT0000_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
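Both hung tasks above are stuck in mutex_lock() on the mount path: pid 22219 under lustre_register_lwp_item() via qsd_prepare(), and pid 22496 under mgc_set_info_async() via server_mgc_set_fs(). Together with the repeated o38 connect timeouts to testfs-MDT0000 (already umounted at 15:13:10), the traces are consistent with one thread holding a global lock while an LWP registration callback waits on a connection that can never complete, so every later mount blocks on the same mutex. A minimal userspace sketch of that shape (hypothetical names, pthreads standing in for kernel mutexes; not the actual Lustre code):

/* deadlock_sketch.c: hypothetical illustration only, not Lustre code.
 * Thread A registers an LWP callback and runs it while holding the
 * global lock; the callback never returns (standing in for the o38
 * connect RPC to the unmounted MDT0000 that keeps timing out above).
 * Thread B, like mount.lustre:22496, then blocks forever in
 * pthread_mutex_lock().  Build with: gcc -pthread deadlock_sketch.c */
#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t lwp_lock = PTHREAD_MUTEX_INITIALIZER;

static void blocking_connect_cb(void)
{
	for (;;)			/* the peer target never comes back */
		sleep(1);
}

/* Problematic pattern: the callback is invoked under the global lock. */
static void *mount_thread_a(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&lwp_lock);
	/* ... add the item to the global registration list ... */
	blocking_connect_cb();		/* never returns, lock never dropped */
	pthread_mutex_unlock(&lwp_lock);
	return NULL;
}

/* Any later mount hangs here, matching the hung-task traces. */
static void *mount_thread_b(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&lwp_lock);	/* blocked > 120s => hung task */
	pthread_mutex_unlock(&lwp_lock);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, mount_thread_a, NULL);
	sleep(1);			/* let A take the lock first */
	pthread_create(&b, NULL, mount_thread_b, NULL);
	pthread_join(b, NULL);		/* never completes */
	return 0;
}

Built with gcc -pthread, thread B never returns from pthread_mutex_lock(), which is exactly what the hung-task watchdog reports after 120 seconds.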


 Comments   
Comment by Gerrit Updater [ 21/Mar/16 ]

Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/19034
Subject: LU-7888 obdclass: not hold global lock when lwp callback
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: d0e4a83866bfead0106671e67efa78fa0ed6ecc5
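Going by the subject line, the patch stops invoking the LWP callback while the global registration lock is held. A sketch of that direction (hypothetical names, reusing lwp_lock from the sketch in the description; consult the review URL for the actual change):

/* Sketch of the fix direction named by the subject line: mutate the
 * registration list under the lock, then drop the lock before running
 * the potentially blocking callback. */
struct lwp_item {
	struct lwp_item *next;		/* global registration list link */
	void (*cb)(struct lwp_item *);	/* may block on a connect RPC */
};

static struct lwp_item *lwp_list;	/* protected by lwp_lock */

static void register_lwp_item_fixed(struct lwp_item *item)
{
	pthread_mutex_lock(&lwp_lock);
	item->next = lwp_list;		/* list mutation stays locked */
	lwp_list = item;
	pthread_mutex_unlock(&lwp_lock);

	item->cb(item);			/* runs unlocked, so a second
					 * mount can take lwp_lock */
}

Dropping the lock before the callback means the item must stay valid after unlock, so a change like this would normally also need reference counting or a local snapshot of the list.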

Comment by Gerrit Updater [ 16/May/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/19034/
Subject: LU-7888 obdclass: not hold global lock when lwp callback
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: bccfc65d04dbd59bedb5dc1509bbdc732fc09b53

Comment by Peter Jones [ 16/May/16 ]

Landed for 2.9
