Details
-
Bug
-
Resolution: Cannot Reproduce
-
Major
-
None
-
Lustre 2.4.0
-
3
-
5195
Description
single node FSTYPE=zfs sanity.sh is hanging for me:
[ 4020.591245] Lustre: DEBUG MARKER: == sanity test 64b: check out-of-space detection on client ============= 18:00:03 (1350079203) [ 4301.008200] LNet: Service thread pid 29286 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: [ 4301.018042] Pid: 29286, comm: ll_ost_io01_005 [ 4301.018946] [ 4301.018947] Call Trace: [ 4301.019836] [<ffffffff814fb5d1>] schedule_timeout+0x191/0x2e0 [ 4301.021092] [<ffffffff8107bcd0>] ? process_timeout+0x0/0x10 [ 4301.022300] [<ffffffffa09f56e0>] __cv_timedwait_common+0xc0/0x280 [spl] [ 4301.023736] [<ffffffff814fc20c>] ? __mutex_lock_slowpath+0x21c/0x2c0 [ 4301.025058] [<ffffffff8108fd60>] ? autoremove_wake_function+0x0/0x40 [ 4301.026319] [<ffffffffa09f58d3>] __cv_timedwait+0x13/0x20 [spl] [ 4301.027593] [<ffffffffa0a956b7>] txg_delay+0xf7/0x140 [zfs] [ 4301.028851] [<ffffffffa0a6d5fe>] dsl_dir_tempreserve_space+0x1de/0x1f0 [zfs] [ 4301.030237] [<ffffffffa0a5732d>] ? dmu_tx_wait+0x11d/0x120 [zfs] [ 4301.031719] [<ffffffffa0a5740e>] ? dmu_tx_assign+0x8e/0x4b0 [zfs] [ 4301.033237] [<ffffffffa08b955c>] ? osd_trans_start+0x9c/0x430 [osd_zfs] [ 4301.034868] [<ffffffffa0cebe1d>] ? ofd_trans_start+0x22d/0x450 [ofd] [ 4301.036448] [<ffffffffa0cf0772>] ? ofd_commitrw_write+0x652/0x12a0 [ofd] [ 4301.037807] [<ffffffffa0cf1775>] ? ofd_commitrw+0x3b5/0x9c0 [ofd] [ 4301.039079] [<ffffffffa0c2dbd8>] ? obd_commitrw+0x128/0x3d0 [ost] [ 4301.040326] [<ffffffffa0c34e34>] ? ost_brw_write+0xd04/0x15d0 [ost] [ 4301.041592] [<ffffffffa05042f0>] ? target_bulk_timeout+0x0/0xc0 [ptlrpc] [ 4301.042939] [<ffffffffa0c3a250>] ? ost_handle+0x3120/0x4550 [ost] [ 4301.044167] [<ffffffffa0e9b464>] ? libcfs_id2str+0x74/0xb0 [libcfs] [ 4301.045471] [<ffffffffa0551483>] ? ptlrpc_server_handle_request+0x463/0xe70 [ptlrpc] [ 4301.047047] [<ffffffffa0e8f66e>] ? cfs_timer_arm+0xe/0x10 [libcfs] [ 4301.048460] [<ffffffffa054a171>] ? ptlrpc_wait_event+0xb1/0x2a0 [ptlrpc] [ 4301.049727] [<ffffffffa0e9f6d1>] ? libcfs_debug_msg+0x41/0x50 [libcfs] [ 4301.051178] [<ffffffff81051f73>] ? __wake_up+0x53/0x70 [ 4301.052242] [<ffffffffa055401a>] ? ptlrpc_main+0xb9a/0x1960 [ptlrpc] [ 4301.053519] [<ffffffffa0553480>] ? ptlrpc_main+0x0/0x1960 [ptlrpc] [ 4301.054678] [<ffffffff8100c14a>] ? child_rip+0xa/0x20 [ 4301.055860] [<ffffffffa0553480>] ? ptlrpc_main+0x0/0x1960 [ptlrpc] [ 4301.057251] [<ffffffffa0553480>] ? ptlrpc_main+0x0/0x1960 [ptlrpc] [ 4301.058523] [<ffffffff8100c140>] ? child_rip+0x0/0x20 [ 4301.059582] [ 4301.059918] LustreError: dumping log to /tmp/lustre-log.1350079484.29286 [ 4307.044564] LNet: Service thread pid 29646 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: [ 4307.049281] Pid: 29646, comm: ll_ost_io00_007 [ 4307.050021] [ 4307.050022] Call Trace: [ 4307.050711] [<ffffffff8100e530>] ? dump_trace+0x190/0x3b0 [ 4307.051628] [<ffffffff81174d23>] ? create_object+0x33/0x2c0 [ 4307.052579] [<ffffffff814fdb54>] ? _spin_lock_irqsave+0x24/0x30 [ 4307.053624] [<ffffffff81061580>] ? pick_next_task_fair+0xd0/0x130 [ 4307.054684] [<ffffffff814fb5d1>] schedule_timeout+0x191/0x2e0 [ 4307.055661] [<ffffffff8107bcd0>] ? process_timeout+0x0/0x10 [ 4307.056646] [<ffffffffa09f56e0>] __cv_timedwait_common+0xc0/0x280 [spl] [ 4307.057801] [<ffffffff814fc20c>] ? __mutex_lock_slowpath+0x21c/0x2c0 [ 4307.058899] [<ffffffffa09ed0ca>] ? kmem_alloc_debug+0x13a/0x4c0 [spl] [ 4307.060010] [<ffffffffa09f58d3>] __cv_timedwait+0x13/0x20 [spl] [ 4307.061059] [<ffffffffa0a956b7>] txg_delay+0xf7/0x140 [zfs] [ 4307.062048] [<ffffffffa0a6d5fe>] dsl_dir_tempreserve_space+0x1de/0x1f0 [zfs] [ 4307.063270] [<ffffffffa0a577fb>] dmu_tx_assign+0x47b/0x4b0 [zfs] [ 4307.064295] [<ffffffffa08b955c>] osd_trans_start+0x9c/0x430 [osd_zfs] [ 4307.065440] [<ffffffffa0cebe1d>] ofd_trans_start+0x22d/0x450 [ofd] [ 4307.066507] [<ffffffffa0cf0772>] ofd_commitrw_write+0x652/0x12a0 [ofd] [ 4307.067656] [<ffffffffa0cf1775>] ofd_commitrw+0x3b5/0x9c0 [ofd] [ 4307.068690] [<ffffffffa0c2dbd8>] obd_commitrw+0x128/0x3d0 [ost] [ 4307.069707] [<ffffffffa0c34e34>] ost_brw_write+0xd04/0x15d0 [ost] [ 4307.070777] [<ffffffffa05042f0>] ? target_bulk_timeout+0x0/0xc0 [ptlrpc] [ 4307.071911] [<ffffffffa0c3a250>] ost_handle+0x3120/0x4550 [ost] [ 4307.072975] [<ffffffffa0e9b464>] ? libcfs_id2str+0x74/0xb0 [libcfs] [ 4307.074082] [<ffffffffa0551483>] ptlrpc_server_handle_request+0x463/0xe70 [ptlrpc] [ 4307.075370] [<ffffffffa0e8f66e>] ? cfs_timer_arm+0xe/0x10 [libcfs] [ 4307.076445] [<ffffffffa054a171>] ? ptlrpc_wait_event+0xb1/0x2a0 [ptlrpc] [ 4307.077561] [<ffffffffa0e9f6d1>] ? libcfs_debug_msg+0x41/0x50 [libcfs] [ 4307.078655] [<ffffffff81051f73>] ? __wake_up+0x53/0x70 [ 4307.079601] [<ffffffffa055401a>] ptlrpc_main+0xb9a/0x1960 [ptlrpc] [ 4307.080673] [<ffffffffa0553480>] ? ptlrpc_main+0x0/0x1960 [ptlrpc] [ 4307.081885] [<ffffffff8100c14a>] child_rip+0xa/0x20 [ 4307.082707] [<ffffffffa0553480>] ? ptlrpc_main+0x0/0x1960 [ptlrpc] [ 4307.083715] [<ffffffffa0553480>] ? ptlrpc_main+0x0/0x1960 [ptlrpc] [ 4307.084911] [<ffffffff8100c140>] ? child_rip+0x0/0x20 [ 4307.085896] ... [ 4914.044577] Lustre: 29335:0:(service.c:1266:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-207), not sending early reply [ 4914.044580] req@ffff8801a51c9bf0 x1415657954602061/t0(0) o4->5dc665da-eb3e-3cef-f2b8-d3318e082c88@0@lo:0/0 lens 488/448 e 5 to 0 dl 1350080102 ref 2 fl Interpret:/0/0 rc 0/0 [ 5024.072116] Lustre: 4182:0:(client.c:1909:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1350079284/real 1350079284] req@ffff88016fb9fbf0 x1415657954602058/t0(0) o4->lustre-OST0001-osc-ffff88023539fbf0@0@lo:6/4 lens 488/448 e 5 to 1 dl 1350080207 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 [ 5024.080010] Lustre: lustre-OST0001-osc-ffff88023539fbf0: Connection to lustre-OST0001 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete [ 5024.083451] Lustre: lustre-OST0001: Client 5dc665da-eb3e-3cef-f2b8-d3318e082c88 (at 0@lo) reconnecting [ 5024.087183] Lustre: lustre-OST0001: Client 5dc665da-eb3e-3cef-f2b8-d3318e082c88 (at 0@lo) refused reconnection, still busy with 1 active RPCs [ 5024.090527] LustreError: 11-0: an error occurred while communicating with 0@lo. The ost_connect operation failed with -16
And it never completes.