[LU-2554] replay-single test 80b: dd: opening `/mnt/lustre/f80b': Input/output error Created: 31/Dec/12 Updated: 14/Aug/16 Resolved: 14/Aug/16 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 1.8.8 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Jian Yu | Assignee: | WC Triage |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Lustre Branch: b1_8
ENABLE_QUOTA=yes
MGS/MDS Nodes: client-27vm3 (active), client-27vm7 (passive)
OSS Nodes: client-27vm4 (active), client-27vm8 (active) |
||
| Severity: | 3 |
| Rank (Obsolete): | 5978 |
| Description |
|
replay-single test 80b failed as follows:

== replay-single test 80b: write replay with changed data (checksum resend) ========================== 03:49:14 (1356781754)
CMD: client-27vm4 lctl get_param obdfilter.lustre-OST0000.sync_journal
obdfilter.lustre-OST0000.sync_journal=1
CMD: client-27vm4 lctl set_param -n obdfilter.lustre-OST0000.sync_journal 0
CMD: client-27vm4 sync
Filesystem 1K-blocks Used Available Use% Mounted on
client-27vm3:client-27vm7:/lustre
36535940 1731756 32947784 5% /mnt/lustre
CMD: client-27vm4 /usr/sbin/lctl --device %lustre-OST0000 notransno
CMD: client-27vm4 /usr/sbin/lctl --device %lustre-OST0000 readonly
CMD: client-27vm4 /usr/sbin/lctl mark ost1 REPLAY BARRIER on lustre-OST0000
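For context, the three lctl commands above form the test framework's replay barrier: notransno stops the OST from assigning new transaction numbers, readonly puts the backing device into read-only mode so nothing further commits to disk, and the mark only annotates the debug log. Anything issued after this point can survive the upcoming failover only through client-side replay. A rough sketch of the sequence as the replay_barrier() helper in test-framework.sh drives it (the do_facet wrapper and facet/device names follow the framework's conventions; the exact b1_8 helper body is an assumption, not a quote):

    # replay barrier against ost1 / lustre-OST0000 (sketch, not verbatim b1_8 code)
    do_facet ost1 sync                                          # commit pending transactions
    do_facet ost1 "$LCTL --device %lustre-OST0000 notransno"    # stop assigning transnos
    do_facet ost1 "$LCTL --device %lustre-OST0000 readonly"     # later writes stay uncommitted
    do_facet ost1 "$LCTL mark 'ost1 REPLAY BARRIER on lustre-OST0000'"  # debug-log marker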
error on ioctl 0x4008669a for '/mnt/lustre/f80b' (3): Input/output error
error: setstripe: create stripe file '/mnt/lustre/f80b' failed
dd: opening `/mnt/lustre/f80b': Input/output error
replay-single test_80b: @@@@@@ FAIL: Cannot write
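The ioctl number in the first error line decodes as _IOW('f', 154, long) (write direction, 8-byte argument, type 'f', nr 154 yields 0x4008669a), which matches LL_IOC_LOV_SETSTRIPE, so the EIO is hit while lfs setstripe asks the MDS to create the file's OST objects; the subsequent dd open of the same file then fails identically. The client-side steps are roughly the following (illustrative stripe arguments, not copied from replay-single.sh):

    # sketch of the client-side sequence that fails above
    lfs setstripe -c 1 -i 0 /mnt/lustre/f80b             # LL_IOC_LOV_SETSTRIPE ioctl -> EIO
    dd if=/dev/zero of=/mnt/lustre/f80b bs=4k count=1    # open() -> Input/output error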
Syslog on MDS client-27vm7 showed that:

Dec 29 03:49:16 client-27vm7 kernel: Lustre: DEBUG MARKER: == replay-single test 80b: write replay with changed data (checksum resend) ========================== 03:49:14 (1356781754)
Dec 29 03:49:16 client-27vm7 rshd[3181]: pam_unix(rsh:session): session closed for user root
Dec 29 03:49:16 client-27vm7 xinetd[2057]: EXIT: shell status=0 pid=3181 duration=0(sec)
Dec 29 03:49:32 client-27vm7 kernel: LustreError: 2442:0:(lov_request.c:694:lov_update_create_set()) error creating fid 0x30003c sub-object on OST idx 0/1: rc = -11
Dec 29 03:50:18 client-27vm7 kernel: Lustre: Service thread pid 2526 was inactive for 56.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Dec 29 03:50:19 client-27vm7 kernel: Pid: 2526, comm: ll_mdt_02
Dec 29 03:50:19 client-27vm7 kernel: Call Trace:
Dec 29 03:50:19 client-27vm7 kernel: [<ffffffff88921220>] lustre_pack_request+0x630/0x6f0 [ptlrpc]
Dec 29 03:50:19 client-27vm7 kernel: [<ffffffff8006389f>] schedule_timeout+0x8a/0xad
Dec 29 03:50:19 client-27vm7 kernel: [<ffffffff8009a41d>] process_timeout+0x0/0x5
Dec 29 03:50:19 client-27vm7 kernel: [<ffffffff889e7695>] osc_create+0xc75/0x13d0 [osc]
Dec 29 03:50:19 client-27vm7 kernel: [<ffffffff8008ee84>] default_wake_function+0x0/0xe
Dec 29 03:50:19 client-27vm7 kernel: [<ffffffff88a96edb>] qos_remedy_create+0x45b/0x570 [lov]
Dec 29 03:50:19 client-27vm7 kernel: [<ffffffff8002deea>] __wake_up+0x38/0x4f
Dec 29 03:50:20 client-27vm7 kernel: [<ffffffff8008e67d>] dequeue_task+0x18/0x37
Dec 29 03:50:20 client-27vm7 kernel: [<ffffffff88a90df3>] lov_fini_create_set+0x243/0x11e0 [lov]
Dec 29 03:50:20 client-27vm7 kernel: [<ffffffff88a84b72>] lov_create+0x1552/0x1860 [lov]
Dec 29 03:50:20 client-27vm7 kernel: [<ffffffff88a857a8>] lov_iocontrol+0x928/0xf0f [lov]
Dec 29 03:50:20 client-27vm7 kernel: [<ffffffff8008ee84>] default_wake_function+0x0/0xe
Dec 29 03:50:20 client-27vm7 kernel: [<ffffffff88c72b21>] mds_finish_open+0x1fa1/0x4370 [mds]
Dec 29 03:50:20 client-27vm7 kernel: [<ffffffff80009860>] __d_lookup+0xb0/0xff
Dec 29 03:50:20 client-27vm7 kernel: [<ffffffff8000d543>] dput+0x2c/0x114
Dec 29 03:50:20 client-27vm7 kernel: [<ffffffff88c52fad>] mds_verify_child+0x2dd/0x870 [mds]
Dec 29 03:50:20 client-27vm7 kernel: [<ffffffff888f59a0>] ldlm_blocking_ast+0x0/0x2a0 [ptlrpc]
Dec 29 03:50:20 client-27vm7 kernel: [<ffffffff88c79d41>] mds_open+0x2f01/0x386b [mds]
Dec 29 03:50:20 client-27vm7 kernel: [<ffffffff887bacfd>] libcfs_debug_vmsg2+0x70d/0x970 [libcfs]
Dec 29 03:50:20 client-27vm7 kernel: [<ffffffff888d886c>] _ldlm_lock_debug+0x57c/0x6e0 [ptlrpc]
Dec 29 03:50:20 client-27vm7 kernel: [<ffffffff8891f5f1>] lustre_swab_buf+0x81/0x170 [ptlrpc]
Dec 29 03:50:20 client-27vm7 kernel: [<ffffffff8000d543>] dput+0x2c/0x114
Dec 29 03:50:20 client-27vm7 kernel: [<ffffffff88c500a5>] mds_reint_rec+0x365/0x550 [mds]
Dec 29 03:50:20 client-27vm7 kernel: [<ffffffff88c7ac6e>] mds_update_unpack+0x1fe/0x280 [mds]
Dec 29 03:50:20 client-27vm7 kernel: [<ffffffff88c42eda>] mds_reint+0x35a/0x420 [mds]
Dec 29 03:50:20 client-27vm7 kernel: [<ffffffff88c41dea>] fixup_handle_for_resent_req+0x5a/0x2c0 [mds]
Dec 29 03:50:20 client-27vm7 kernel: [<ffffffff88c4cbee>] mds_intent_policy+0x49e/0xc10 [mds]
Dec 29 03:50:21 client-27vm7 kernel: [<ffffffff888e0270>] ldlm_resource_putref_internal+0x230/0x460 [ptlrpc]
Dec 29 03:50:21 client-27vm7 kernel: [<ffffffff888ddeb6>] ldlm_lock_enqueue+0x186/0xb20 [ptlrpc]
Dec 29 03:50:21 client-27vm7 kernel: [<ffffffff888da7fd>] ldlm_lock_create+0x9bd/0x9f0 [ptlrpc]
Dec 29 03:50:21 client-27vm7 kernel: [<ffffffff88902870>] ldlm_server_blocking_ast+0x0/0x83d [ptlrpc]
Dec 29 03:50:21 client-27vm7 kernel: [<ffffffff888ffb39>] ldlm_handle_enqueue+0xc09/0x1210 [ptlrpc]
Dec 29 03:50:21 client-27vm7 kernel: [<ffffffff88c4bb2e>] mds_handle+0x40ce/0x4cf0 [mds]
Dec 29 03:50:21 client-27vm7 kernel: [<ffffffff887b7868>] libcfs_ip_addr2str+0x38/0x40 [libcfs]
Dec 29 03:50:21 client-27vm7 kernel: [<ffffffff887b7c7e>] libcfs_nid2str+0xbe/0x110 [libcfs]
Dec 29 03:50:21 client-27vm7 kernel: [<ffffffff8892aaf5>] ptlrpc_server_log_handling_request+0x105/0x130 [ptlrpc]
Dec 29 03:50:21 client-27vm7 kernel: [<ffffffff8892d874>] ptlrpc_server_handle_request+0x984/0xe00 [ptlrpc]
Dec 29 03:50:21 client-27vm7 kernel: [<ffffffff8892dfd5>] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc]
Dec 29 03:50:21 client-27vm7 kernel: [<ffffffff8008d2a9>] __wake_up_common+0x3e/0x68
Dec 29 03:50:21 client-27vm7 kernel: [<ffffffff8892ef16>] ptlrpc_main+0xf16/0x10e0 [ptlrpc]
Dec 29 03:50:21 client-27vm7 kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11
Dec 29 03:50:21 client-27vm7 kernel: [<ffffffff8892e000>] ptlrpc_main+0x0/0x10e0 [ptlrpc]
Dec 29 03:50:21 client-27vm7 kernel: [<ffffffff8005dfa7>] child_rip+0x0/0x11
Dec 29 03:50:21 client-27vm7 kernel: LustreError: dumping log to /tmp/lustre-log.1356781818.2526
Dec 29 03:50:32 client-27vm7 kernel: LustreError: 2526:0:(lov_request.c:694:lov_update_create_set()) error creating fid 0x30003c sub-object on OST idx 0/1: rc = -5
Dec 29 03:50:32 client-27vm7 kernel: LustreError: 2526:0:(mds_open.c:440:mds_create_objects()) error creating objects for inode 3145788: rc = -5
Dec 29 03:50:32 client-27vm7 kernel: LustreError: 2526:0:(mds_open.c:825:mds_finish_open()) mds_create_objects: rc = -5
Dec 29 03:50:32 client-27vm7 kernel: Lustre: Service thread pid 2526 completed after 69.90s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Dec 29 03:50:42 client-27vm7 kernel: LustreError: 2442:0:(lov_request.c:694:lov_update_create_set()) error creating fid 0x30003c sub-object on OST idx 1/1: rc = -11
Dec 29 03:51:42 client-27vm7 kernel: LustreError: 2532:0:(lov_request.c:694:lov_update_create_set()) error creating fid 0x30003c sub-object on OST idx 1/1: rc = -5
Dec 29 03:51:42 client-27vm7 kernel: LustreError: 2532:0:(mds_open.c:440:mds_create_objects()) error creating objects for inode 3145788: rc = -5
Dec 29 03:51:42 client-27vm7 kernel: LustreError: 2532:0:(mds_open.c:825:mds_finish_open()) mds_create_objects: rc = -5
Dec 29 03:51:42 client-27vm7 rshd[3223]: root@client-27vm1.lab.whamcloud.com as root: cmd='/usr/sbin/lctl mark "/usr/sbin/lctl mark replay-single test_80b: @@@@@@ FAIL: Cannot write ";echo XXRETCODE:$?'
Dec 29 03:51:42 client-27vm7 kernel: Lustre: DEBUG MARKER: /usr/sbin/lctl mark replay-single test_80b: @@@@@@ FAIL: Cannot write

Maloo report: https://maloo.whamcloud.com/test_sets/56e3c084-51b9-11e2-a904-52540035b04c

Tests 81a, 81b, 82 and 83 also failed with the same issue. |
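The rc values in the syslog are negated errnos: the OST object create first returns rc = -11 (EAGAIN) while the ll_mdt_02 thread is blocked in osc_create() per the stack trace, and the MDS eventually gives up with rc = -5 (EIO), which is the Input/output error the client's setstripe and dd report. A quick way to decode such codes (plain shell plus a Python one-liner; assumes a standard Linux errno table):

    # map the kernel rc values from the log back to errno names
    for rc in 11 5; do
        python -c "import errno, os; print(-$rc, errno.errorcode[$rc], os.strerror($rc))"
    done
    # -11 EAGAIN Resource temporarily unavailable
    # -5 EIO Input/output error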
| Comments |
| Comment by Jian Yu [ 03/Jan/13 ] |
|
Replay-single tests 80b, 81a, 81b, 82 and 83 passed in another failover test run on the same Lustre b1_8 build #236: |
| Comment by Jian Yu [ 09/Jan/13 ] |
|
One more instance occurred on Lustre b1_8 build #238: http://build.whamcloud.com/job/lustre-b1_8/238 |
| Comment by James A Simmons [ 14/Aug/16 ] |
|
Closing as Won't Fix: old blocker against an unsupported Lustre version (1.8). |