[LU-4177] Failure on test suite parallel-scale-nfsv3 test_iorssf Created: 29/Oct/13 Updated: 08/Jan/14 Resolved: 30/Oct/13

| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | Di Wang |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | dne |
| Environment: | client and server: lustre-b2_5 build #2 RHEL6 ldiskfs |
| Severity: | 3 |
| Bugzilla ID: | 1639 |
| Rank (Obsolete): | 11308 |
| Description |
This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/861fabce-40a6-11e3-a5c2-52540035b04c.

The sub-test test_iorssf failed with the following error:
client dmesg shows:

Lustre: DEBUG MARKER: == parallel-scale-nfsv3 test iorssf: iorssf == 01:40:35 (1383036035)
INFO: task IOR:13356 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
IOR D 0000000000000000 0 13356 13352 0x00000080
ffff880324b4bc88 0000000000000082 0000000000000000 ffff880324b4bc4c
0000000000000000 ffff88033fc24300 ffff880032e76700 0000000000000400
ffff880323f4c638 ffff880324b4bfd8 000000000000fb88 ffff880323f4c638
Call Trace:
[<ffffffff8150f78e>] __mutex_lock_slowpath+0x13e/0x180
[<ffffffff8150f62b>] mutex_lock+0x2b/0x50
[<ffffffff8111c4b1>] generic_file_aio_write+0x71/0x100
[<ffffffff8100b9ce>] ? common_interrupt+0xe/0x13
[<ffffffffa0386f8e>] nfs_file_write+0xde/0x1f0 [nfs]
[<ffffffff8118106a>] do_sync_write+0xfa/0x140
[<ffffffff81096da0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff8100bb8e>] ? apic_timer_interrupt+0xe/0x20
[<ffffffff8121bed6>] ? security_file_permission+0x16/0x20
[<ffffffff81181368>] vfs_write+0xb8/0x1a0
[<ffffffff81181c61>] sys_write+0x51/0x90
[<ffffffff810dc685>] ? __audit_syscall_exit+0x265/0x290
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Lustre: DEBUG MARKER: parallel-scale-nfsv3 test_iorssf: @@@@@@ FAIL: test_iorssf failed with 1
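The trace shows IOR blocked in mutex_lock() called from generic_file_aio_write(): on this RHEL6 (2.6.32) kernel the buffered write path serializes all writers to a file on its inode mutex, and test_iorssf runs IOR in single-shared-file mode, so every task funnels through the same mutex on the NFS client. As a rough illustration only (not the actual test code; the path, block size, and process count below are assumptions), the workload has this shape:

```c
/*
 * Minimal sketch of the single-shared-file workload shape: several
 * processes writing concurrently to ONE file, as IOR does in ssf mode.
 * On the 2.6.32 (RHEL6) kernel in the trace, each write takes the inode
 * mutex inside generic_file_aio_write(), so one slow NFS write can keep
 * the other writers blocked long enough to trip the 120 s hung-task
 * watchdog.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

#define NPROC 4                /* assumed task count */
#define BLOCK (1 << 20)        /* assumed 1 MiB transfer size */
#define NREPS 64

int main(void)
{
    /* Hypothetical stand-in for /mnt/lustre/d0.ior.ssf/iorData. */
    const char *path = "/tmp/iorData";

    for (int i = 0; i < NPROC; i++) {
        if (fork() == 0) {
            int fd = open(path, O_WRONLY | O_CREAT, 0644);
            if (fd < 0) { perror("open"); _exit(1); }

            char *buf = malloc(BLOCK);
            if (!buf) _exit(1);
            memset(buf, 'A' + i, BLOCK);

            /* Each task writes its own strided region of the shared
             * file; the writes still contend on the same inode mutex. */
            for (int r = 0; r < NREPS; r++) {
                off_t off = ((off_t)r * NPROC + i) * BLOCK;
                if (pwrite(fd, buf, BLOCK, off) != BLOCK)
                    perror("pwrite");
            }
            close(fd);
            _exit(0);
        }
    }
    while (wait(NULL) > 0)
        ;
    return 0;
}
```

Note that the hung-task message alone does not mean the I/O failed outright; it means a writer waited more than 120 seconds for the mutex, and the IOR summary in the comments below shows the run did eventually finish.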
| Comments |
| Comment by Sarah Liu [ 29/Oct/13 ] |
Same error seen in parallel-scale-nfsv4 test_iorssf: https://maloo.whamcloud.com/test_sets/01509f4c-40a7-11e3-a5c2-52540035b04c

Lustre: DEBUG MARKER: == parallel-scale-nfsv4 test iorssf: iorssf == 02:27:02 (1383038822)
INFO: task IOR:17809 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
IOR D 0000000000000002 0 17809 17806 0x00000080
ffff8803293e9c88 0000000000000086 0000000000000000 ffff880032e36700
0000000000000002 0000000000000001 ffff8803293e9c28 ffffffff81065d54
ffff88032cfb1058 ffff8803293e9fd8 000000000000fb88 ffff88032cfb1058
Call Trace:
[<ffffffff81065d54>] ? enqueue_task_fair+0x64/0x100
[<ffffffff8150f78e>] __mutex_lock_slowpath+0x13e/0x180
[<ffffffff8150f62b>] mutex_lock+0x2b/0x50
[<ffffffff8111c4b1>] generic_file_aio_write+0x71/0x100
[<ffffffffa0386f8e>] nfs_file_write+0xde/0x1f0 [nfs]
[<ffffffff8118106a>] do_sync_write+0xfa/0x140
[<ffffffff81096da0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff8121bed6>] ? security_file_permission+0x16/0x20
[<ffffffff81181368>] vfs_write+0xb8/0x1a0
[<ffffffff81181c61>] sys_write+0x51/0x90
[<ffffffff810dc685>] ? __audit_syscall_exit+0x265/0x290
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
INFO: task IOR:17810 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
IOR D 0000000000000002 0 17810 17806 0x00000080
ffff88032bb81c88 0000000000000086 0000000000000000 ffff88032bb81c88
0000000000000000 0000000000000000 00000000ffffffff ffff88032bb80000
ffff880334f01ab8 ffff88032bb81fd8 000000000000fb88 ffff880334f01ab8
Call Trace:
[<ffffffff8150e18a>] ? thread_return+0xa8/0x76e
[<ffffffff8150f78e>] __mutex_lock_slowpath+0x13e/0x180
[<ffffffff8150f62b>] mutex_lock+0x2b/0x50
[<ffffffff8111c4b1>] generic_file_aio_write+0x71/0x100
[<ffffffffa0386f8e>] nfs_file_write+0xde/0x1f0 [nfs]
[<ffffffff8118106a>] do_sync_write+0xfa/0x140
[<ffffffff81096da0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff811c419b>] ? inotify_inode_queue_event+0xeb/0x120
[<ffffffff8100bb8e>] ? apic_timer_interrupt+0xe/0x20
[<ffffffff8121bed6>] ? security_file_permission+0x16/0x20
[<ffffffff81181368>] vfs_write+0xb8/0x1a0
[<ffffffff8117f8fa>] ? vfs_llseek+0x2a/0x40
[<ffffffff81181c61>] sys_write+0x51/0x90
[<ffffffff810dc685>] ? __audit_syscall_exit+0x265/0x290
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
| Comment by Jodi Levi (Inactive) [ 29/Oct/13 ] |
Di,
| Comment by Di Wang [ 30/Oct/13 ] |
Path: /mnt/lustre/d0.ior.ssf

Summary:
Commencing write performance test.
Commencing read performance test.

Operation  Max (MiB)  Min (MiB)  Mean (MiB)  Std Dev  Max (OPs)  Min (OPs)  Mean (OPs)  Std Dev  Mean (s)  Op grep #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggsize
---------  ---------  ---------  ----------  -------  ---------  ---------  ----------  -------  --------

Max Write: 16.64 MiB/sec (17.45 MB/sec)

Run finished: Tue Oct 29 01:54:04 2013

This seems to be a duplicate of LU-1639.
| Comment by Di Wang [ 30/Oct/13 ] |
I checked the debug log on the MDS; it shows the directory is indeed not empty:

00000004:00000001:3.0:1383036844.842826:0:3111:0:(osd_handler.c:4729:osd_it_ea_next()) Process entered
00000004:00000001:3.0:1383036844.842827:0:3111:0:(osd_handler.c:4737:osd_it_ea_next()) Process leaving (rc=0 : 0 : 0)
00000004:00000001:3.0:1383036844.842828:0:3111:0:(osd_handler.c:4565:osd_it_ea_fini()) Process entered
00000020:00000001:3.0:1383036844.842830:0:3111:0:(lustre_fid.h:719:fid_flatten32()) Process leaving (rc=29361006 : 29361006 : 1c0036e)
00000004:00000001:3.0:1383036844.842831:0:3111:0:(osd_handler.c:4568:osd_it_ea_fini()) Process leaving
00000004:00000001:3.0:1383036844.842832:0:3111:0:(mdd_dir.c:271:mdd_dir_is_empty()) Process leaving (rc=18446744073709551577 : -39 : ffffffffffffffd9)
00000004:00000001:3.0:1383036844.842833:0:3111:0:(mdd_dir.c:462:mdd_may_delete()) Process leaving (rc=18446744073709551577 : -39 : ffffffffffffffd9)
00000004:00000001:3.0:1383036844.842835:0:3111:0:(mdd_dir.c:1351:mdd_unlink_sanity_check()) Process leaving (rc=18446744073709551577 : -39 : ffffffffffffffd9)
00000004:00000001:3.0:1383036844.842836:0:3111:0:(mdd_dir.c:1502:mdd_unlink()) Process leaving via cleanup (rc=18446744073709551577 : -39 : 0xffffffffffffffd9)

I have no idea, though, why iorData was not deleted before the rmdir of d0.ior.ssf, and unfortunately there is no debug log here that would show the reason. Anyway, this has nothing to do with DNE; it should be a duplicate of LU-1639.
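For reference (an illustration added here, not part of the original comment): rc = -39 is ENOTEMPTY, and 18446744073709551577 (0xffffffffffffffd9) is the same value printed as an unsigned 64-bit integer. What mdd_unlink() hits is the ordinary rmdir-on-non-empty-directory failure, which a minimal userspace sketch reproduces (the /tmp paths are hypothetical stand-ins for d0.ior.ssf and the leftover iorData):

```c
/*
 * Minimal sketch: rmdir(2) on a non-empty directory fails with
 * ENOTEMPTY (errno 39), the same -39 that mdd_dir_is_empty()
 * propagated in the MDS debug log above.
 */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    /* Hypothetical stand-ins for d0.ior.ssf and the leftover iorData. */
    mkdir("/tmp/d0.ior.ssf", 0755);
    int fd = open("/tmp/d0.ior.ssf/iorData", O_CREAT | O_WRONLY, 0644);
    close(fd);

    if (rmdir("/tmp/d0.ior.ssf") < 0)
        /* Prints "Directory not empty" with errno 39 on Linux. */
        printf("rmdir: %s (errno %d)\n", strerror(errno), errno);

    /* The debug log prints rc both ways; the u64 form is just the
     * two's-complement view of -39. */
    printf("%lld\n", (long long)0xffffffffffffffd9ULL);  /* -39 */
    return 0;
}
```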