[LU-552] 1.8<->2.1 interop: parallel-scale connectathon test hung
| Created: | 29/Jul/11 | Updated: | 16/Aug/16 | Resolved: | 16/Aug/16 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.1.0, Lustre 1.8.6 |
| Fix Version/s: | Lustre 2.1.0, Lustre 1.8.7 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Jian Yu | Assignee: | WC Triage |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Lustre Clients: Lustre Servers: |
| Severity: | 3 |
| Rank (Obsolete): | 10342 |
| Description |
|
parallel-scale connectathon test hung as follows:

Test #7 - Test parent/child mutual exclusion.
	Parent: 7.0 - F_TLOCK [ ffc, 9] PASSED.
	Parent: Wrote 'aaaa eh' to testfile [ 4092, 7 ].
	Parent: Now free child to run, should block on lock.
	Parent: Check data in file to insure child blocked.
	Parent: Read 'aaaa eh' from testfile [ 4092, 7 ].
	Parent: 7.1 - COMPARE [ ffc, 7] PASSED.
	Parent: Now unlock region so child will unblock.
	Parent: 7.2 - F_ULOCK [ ffc, 9] PASSED.

On client node fat-amd-3-ib:

[root@fat-amd-3-ib tests]# ps auxww
<~snip~>
root     16272  0.0  0.0 107268  2160 pts/0    S+   08:19   0:00 bash /usr/lib64/lustre/tests/parallel-scale.sh
root     16274  0.0  0.0 106020  1304 pts/0    S+   08:19   0:00 sh runtests -f
root     16281  0.0  0.0   6600   556 pts/0    S+   08:19   0:00 tlocklfs /mnt/lustre/d0.connectathon
root     16282  0.0  0.0   6436   332 pts/0    S+   08:19   0:00 tlocklfs /mnt/lustre/d0.connectathon

[root@fat-amd-3-ib tests]# echo t > /proc/sysrq-trigger
<~snip~>
tlocklfs      S 0000000000000004     0 16281  16274 0x00000080
 ffff880234c9dca8 0000000000000082 0000000000000000 0000000000000082
 ffff880234c9dc28 ffff8803d5c97cb8 0000000000000000 0000000101fbf9c8
 ffff8803190f7ab8 ffff880234c9dfd8 000000000000f598 ffff8803190f7ab8
Call Trace:
 [<ffffffff8117bf7b>] pipe_wait+0x5b/0x80
 [<ffffffff8108e100>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff814dbc1e>] ? mutex_lock+0x1e/0x50
 [<ffffffff8117c9d6>] pipe_read+0x3e6/0x4e0
 [<ffffffff811723ea>] do_sync_read+0xfa/0x140
 [<ffffffff8108e100>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff811bc395>] ? fcntl_setlk+0x75/0x320
 [<ffffffff81204ef6>] ? security_file_permission+0x16/0x20
 [<ffffffff81172e15>] vfs_read+0xb5/0x1a0
 [<ffffffff810d1ac2>] ? audit_syscall_entry+0x272/0x2a0
 [<ffffffff81172f51>] sys_read+0x51/0x90
 [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b

tlocklfs      S 0000000000000004     0 16282  16281 0x00000080
 ffff880105f5ba98 0000000000000086 00003f9affffff9d 0000020300003f9a
 0000000000000000 0000000000000001 ffff880105f5ba88 ffffffffa0789fb0
 ffff880104877ab8 ffff880105f5bfd8 000000000000f598 ffff880104877ab8
Call Trace:
 [<ffffffffa0789fb0>] ? ldlm_lock_dump+0x560/0x640 [ptlrpc]
 [<ffffffffa07b900d>] ldlm_flock_completion_ast+0x61d/0x9f0 [ptlrpc]
 [<ffffffff8105dc20>] ? default_wake_function+0x0/0x20
 [<ffffffffa07a7565>] ldlm_cli_enqueue_fini+0x6c5/0xba0 [ptlrpc]
 [<ffffffff8105dc20>] ? default_wake_function+0x0/0x20
 [<ffffffffa07ab074>] ldlm_cli_enqueue+0x344/0x7a0 [ptlrpc]
 [<ffffffffa09a7edd>] ll_file_flock+0x47d/0x6b0 [lustre]
 [<ffffffff81190f40>] ? mntput_no_expire+0x30/0x110
 [<ffffffffa07b89f0>] ? ldlm_flock_completion_ast+0x0/0x9f0 [ptlrpc]
 [<ffffffff8117f451>] ? path_put+0x31/0x40
 [<ffffffff811bc243>] vfs_lock_file+0x23/0x40
 [<ffffffff811bc497>] fcntl_setlk+0x177/0x320
 [<ffffffff811845f7>] sys_fcntl+0x197/0x530
 [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b

Dmesg on the MDS node fat-amd-1-ib showed that:

Lustre: DEBUG MARKER: == test connectathon: connectathon == 08:17:31
Lustre: DEBUG MARKER: ./runtests -N 10 -b -f /mnt/lustre/d0.connectathon
Lustre: DEBUG MARKER: ./runtests -N 10 -g -f /mnt/lustre/d0.connectathon
Lustre: DEBUG MARKER: ./runtests -N 10 -s -f /mnt/lustre/d0.connectathon
Lustre: DEBUG MARKER: ./runtests -N 10 -l -f /mnt/lustre/d0.connectathon
Lustre: Service thread pid 27106 was inactive for 0.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Lustre: Service thread pid 27106 completed after 0.00s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Pid: 27106, comm: mdt_06
Call Trace:
 [<ffffffffa09c16fe>] cfs_waitq_wait+0xe/0x10 [libcfs]
 [<ffffffffa0c17b89>] ptlrpc_wait_event+0x2b9/0x2c0 [ptlrpc]
 [<ffffffff8105dc60>] ? default_wake_function+0x0/0x20
 [<ffffffffa0c1f6a5>] ptlrpc_main+0x4f5/0x1900 [ptlrpc]
 [<ffffffff8100c1ca>] child_rip+0xa/0x20
 [<ffffffffa0c1f1b0>] ? ptlrpc_main+0x0/0x1900 [ptlrpc]
 [<ffffffff8100c1c0>] ? child_rip+0x0/0x20

Maloo report: https://maloo.whamcloud.com/test_sets/52ca975a-b9a5-11e0-8bdf-52540025f9af |
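The parent/child locking sequence of test #7 can be sketched as follows. This is a minimal Python reconstruction for illustration only, not the actual tlocklfs source (which is a C program, and which additionally verifies via the file contents that the child really blocked). On a Lustre client mounted with the flock option, these fcntl byte-range lock calls travel the ldlm flock enqueue path visible in the hung child's stack trace (ll_file_flock -> ldlm_cli_enqueue -> ldlm_flock_completion_ast).

```python
# Sketch of connectathon test #7: parent takes a non-blocking lock
# (F_TLOCK), frees the child, which then blocks on the same region
# until the parent's F_ULOCK. In this bug report, the child never
# returned from its blocking lock request.
import fcntl
import os
import tempfile

tf = tempfile.NamedTemporaryFile(delete=False)
tf.write(b"\0" * 8192)
tf.close()
path = tf.name

r, w = os.pipe()  # parent signals child when to attempt the lock

pid = os.fork()
if pid == 0:  # ---- child ----
    os.read(r, 1)  # wait until the parent holds the lock
    with open(path, "r+b") as f:
        # Blocking F_SETLKW on the same region; this is the request
        # that hung forever in the report.
        fcntl.lockf(f, fcntl.LOCK_EX, 7, 4092)
        fcntl.lockf(f, fcntl.LOCK_UN, 7, 4092)
    os._exit(0)

# ---- parent ----
with open(path, "r+b") as f:
    # Non-blocking lock of 7 bytes at offset 4092 ("7.0 - F_TLOCK")
    fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB, 7, 4092)
    os.write(w, b"g")  # free the child; it should block on the lock
    # "7.2 - F_ULOCK": release the region so the child can proceed
    fcntl.lockf(f, fcntl.LOCK_UN, 7, 4092)
_, status = os.waitpid(pid, 0)
os.unlink(path)
print("child exit status:", status)
```

Against a local filesystem this completes immediately; the bug was that on a 1.8<->2.1 mixed cluster the child's blocking enqueue was never granted.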
| Comments |
| Comment by Jian Yu [ 26/Aug/11 ] |
|
Lustre Clients: Lustre Servers:
Client mount options: "user_xattr,acl,flock"
Client mount options: "user_xattr,acl"
Client mount options: "user_xattr,acl,localflock" |
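The choice of client mount option matters for this test. As a hedged sketch (the filesystem name and MGS NID below are placeholders, not from this report): "flock" enables cluster-coherent fcntl/flock locking through the server-side ldlm flock path exercised here, while "localflock" keeps locks client-local only, avoiding the server enqueue but providing no cross-node coherency; with neither option, lock calls are refused.

```shell
# Hypothetical client mounts; mgs@o2ib:/lustre is a placeholder.
mount -t lustre -o user_xattr,acl,flock      mgs@o2ib:/lustre /mnt/lustre
mount -t lustre -o user_xattr,acl,localflock mgs@o2ib:/lustre /mnt/lustre
```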
| Comment by Jian Yu [ 15/Feb/12 ] |
|
Lustre Clients: Lustre Servers:
Client mount options: "user_xattr,acl,flock" |
| Comment by James A Simmons [ 16/Aug/16 ] |
|
Old blocker for an unsupported version.