[LU-9908] conf-sanity test_41b: test failed to respond and timed out Created: 24/Aug/17 Updated: 24/Oct/17 Resolved: 16/Oct/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.10.1 |
| Fix Version/s: | Lustre 2.11.0, Lustre 2.10.2 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | Yang Sheng |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
This issue was created by maloo for Bob Glossman <bob.glossman@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/c07d5048-8871-11e7-b93b-5254006e85c2.

The sub-test test_41b failed with the following error:

test failed to respond and timed out

The test hangs and fails during client umount of Lustre. I can't find the root cause(s): I have looked for oopses or panics with stack traces and can't find any. A history search shows several similar failures on sles12sp2 recently.

Info required for matching: conf-sanity 41b |
| Comments |
| Comment by Bob Glossman (Inactive) [ 24/Aug/17 ] |
|
This failure didn't reproduce on retest, so it doesn't fail 100% of the time. |
| Comment by Bob Glossman (Inactive) [ 24/Aug/17 ] |
|
Failures like this are happening in more places than test 41b. Once again the hang occurs during a client umount. Since conf-sanity on sles12sp2 is tested so little, this failure may have been lurking for quite a long time. |
| Comment by Yang Sheng [ 25/Aug/17 ] |
|
Looks like client hang:

19:32:01:[15548.934823] Leftover inexact backtrace:
19:32:01:[15548.934823]
19:32:01:[15548.934826] umount S 0000000000000000 0 22377 22376 0x00000000
19:32:01:[15548.934827] ffff88007ae87a78 ffff8800641a1300 ffff88007c1b5800 ffff88007ae88000
19:32:01:[15548.934828] ffff88007ae87ab0 00000001003a2c78 ffff88007fc0e040 0000000000000000
19:32:01:[15548.934829] ffff88007ae87a90 ffffffff815e4c45 ffff88007fc0e040 ffff88007ae87b38
19:32:01:[15548.934830] Call Trace:
19:32:01:[15548.934832] [<ffffffff815e4c45>] schedule+0x35/0x80
19:32:01:[15548.934833] [<ffffffff815e74d3>] schedule_timeout+0x163/0x2d0
19:32:01:[15548.934857] [<ffffffffa0994c7b>] ptlrpc_set_wait+0x1cb/0x850 [ptlrpc]
19:32:01:[15548.934881] [<ffffffffa0995378>] ptlrpc_queue_wait+0x78/0x210 [ptlrpc]
19:32:01:[15548.934889] [<ffffffffa0ac851b>] mdc_statfs+0xab/0x2e0 [mdc]
19:32:01:[15548.934898] [<ffffffffa092a1ce>] lmv_statfs+0x26e/0xa30 [lmv]
19:32:01:[15548.934917] [<ffffffffa0c3bbeb>] ll_statfs_internal+0xeb/0xe00 [lustre]
19:32:01:[15548.934929] [<ffffffffa0c3c97b>] ll_statfs+0x7b/0x160 [lustre]
19:32:01:[15548.934932] [<ffffffff8122dc13>] statfs_by_dentry+0x93/0x110
19:32:01:[15548.934935] [<ffffffff8122dca6>] vfs_statfs+0x16/0xb0
19:32:01:[15548.934937] [<ffffffff8122dd80>] user_statfs+0x40/0x70
19:32:01:[15548.934939] [<ffffffff8122ddc0>] SYSC_statfs+0x10/0x30
19:32:01:[15548.934941] [<ffffffff815e872e>] entry_SYSCALL_64_fastpath+0x12/0x6d
19:32:01:[15548.936314] DWARF2 unwinder stuck at entry_SYSCALL_64_fastpath+0x12/0x6d
19:32:01:[15548.936314] |
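The trace above shows the umount process sleeping in schedule_timeout under ptlrpc_set_wait, reached from a statfs system call through ll_statfs, lmv_statfs and mdc_statfs. When scanning many console logs for the same signature, a small script can pull out the frames and the module each one belongs to. What follows is only a rough sketch: `FRAME_RE`, `parse_frames` and `SAMPLE` are hypothetical names made up for illustration, not part of Lustre or its test framework, and the pattern assumes frames are printed in the `[<address>] symbol+0xoffset/0xsize [module]` form seen in this log.

```python
import re

# Hypothetical helper, not part of Lustre: matches one frame such as
#   [<ffffffffa0994c7b>] ptlrpc_set_wait+0x1cb/0x850 [ptlrpc]
# The trailing "[module]" is optional; core-kernel frames do not have it.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frames(text):
    """Return (symbol, module) pairs in log order; module is None for
    frames that carry no "[module]" suffix."""
    return [(m.group("symbol"), m.group("module"))
            for m in FRAME_RE.finditer(text)]


# A few lines copied from the console log quoted in this ticket.
SAMPLE = """\
19:32:01:[15548.934830] Call Trace:
19:32:01:[15548.934832] [<ffffffff815e4c45>] schedule+0x35/0x80
19:32:01:[15548.934833] [<ffffffff815e74d3>] schedule_timeout+0x163/0x2d0
19:32:01:[15548.934857] [<ffffffffa0994c7b>] ptlrpc_set_wait+0x1cb/0x850 [ptlrpc]
19:32:01:[15548.934889] [<ffffffffa0ac851b>] mdc_statfs+0xab/0x2e0 [mdc]
19:32:01:[15548.936314] DWARF2 unwinder stuck at entry_SYSCALL_64_fastpath+0x12/0x6d
"""

frames = parse_frames(SAMPLE)

# Frames that belong to a loadable module, innermost first.
module_frames = [(sym, mod) for sym, mod in frames if mod is not None]

for sym, mod in frames:
    print(f"{sym:20s} {mod or '(kernel)'}")
```

The "DWARF2 unwinder stuck" line is skipped on purpose because it lacks the `[<address>]` prefix. Run against a full console log, the first entry of `module_frames` gives the innermost module function the task was waiting in, which may help group similar hangs such as the ones reported in the later comments.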
| Comment by Bob Glossman (Inactive) [ 28/Aug/17 ] |
|
this fail seems to be reproducing. Here's another one: |
| Comment by Gerrit Updater [ 28/Aug/17 ] |
|
Yang Sheng (yang.sheng@intel.com) uploaded a new patch: https://review.whamcloud.com/28767 |
| Comment by Bob Glossman (Inactive) [ 07/Sep/17 ] |
|
another on master: |
| Comment by Bob Glossman (Inactive) [ 14/Sep/17 ] |
|
The patch https://review.whamcloud.com/28767 changes (fixes?) test 70e, but does nothing for the similar failures seen in test 41b. Here's another seen on b2_10 in 41b: |
| Comment by Bob Glossman (Inactive) [ 18/Sep/17 ] |
|
another on master: |
| Comment by Gerrit Updater [ 20/Sep/17 ] |
|
Bob Glossman (bob.glossman@intel.com) uploaded a new patch: https://review.whamcloud.com/29108 |
| Comment by Sarah Liu [ 20/Sep/17 ] |
|
another one on b2_10 branch 2.10.1 RC1 testing with SLES12sp2 client |
| Comment by Gerrit Updater [ 16/Oct/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/28767/ |
| Comment by Peter Jones [ 16/Oct/17 ] |
|
Landed for 2.11 |
| Comment by Gerrit Updater [ 24/Oct/17 ] |
|
John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/29108/ |