[LU-2573] Replay-single test_26 Error: 'test failed to respond and timed out' Created: 04/Jan/13 Updated: 17/Apr/17 Resolved: 17/Apr/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Keith Mannthey (Inactive) | Assignee: | WC Triage |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Environment: | Autotest system |
| Severity: | 3 |
| Rank (Obsolete): | 6010 |
| Description |
|
This is from a Git submission and the associated automated testing, seen here: https://maloo.whamcloud.com/test_sessions/5c4ed954-4693-11e2-b16f-52540035b04c

The MDS hits a page fault and reboots:

```
Dec 14 21:52:23 client-30vm3 kernel: Lustre: DEBUG MARKER: umount -d /mnt/mds1
Dec 14 21:52:23 client-30vm3 xinetd[1573]: EXIT: shell status=0 pid=7391 duration=0(sec)
Dec 14 21:52:23 client-30vm3 xinetd[1573]: START: shell pid=7414 from=::ffff:10.10.4.185
Dec 14 21:52:23 client-30vm3 rshd[7415]: root@client-30vm6.lab.whamcloud.com as root: cmd='(PATH=$PATH:/usr/lib64/lustre/utils:/usr/lib64/lustre/tests:/sbin:/usr/sbin; cd /usr/lib64/lustre/tests; LUSTRE="/usr/lib64/lustre" sh -c "umount -d /mnt/mds1");echo XXRETCODE:$?'
Dec 14 21:52:30 client-30vm3 kernel: Removing read-only on unknown block (0xfd00000)
Dec 14 21:52:30 client-30vm3 kernel: BUG: Bad page map in process in.rshd pte:00000001 pmd:7bd7d067
Dec 14 21:52:30 client-30vm3 xinetd[1573]: EXIT: shell status=0 pid=7414 duration=7(sec)
Dec 14 21:52:30 client-30vm3 kernel: page:ffffea0000000000 flags:(null) count:-1 mapcount:-1 mapping:(null) index:0
Dec 14 21:52:30 client-30vm3 kernel: addr:00007fc144a2e000 vm_flags:08000070 anon_vma:(null) mapping:ffff88007b46a558 index:91
Dec 14 21:52:30 client-30vm3 kernel: vma->vm_ops->fault: filemap_fault+0x0/0x500
Dec 14 21:52:30 client-30vm3 kernel: vma->vm_file->f_op->mmap: generic_file_mmap+0x0/0x60
Dec 14 21:52:30 client-30vm3 kernel: Pid: 7414, comm: in.rshd Not tainted 2.6.32-279.14.1.el6_lustre.g5fd2de9.x86_64 #1
Dec 14 21:52:30 client-30vm3 kernel: Call Trace:
Dec 14 21:52:30 client-30vm3 kernel: [<ffffffff8113ab48>] ? print_bad_pte+0x1d8/0x290
Dec 14 21:52:30 client-30vm3 xinetd[1573]: START: shell pid=7438 from=::ffff:10.10.4.185
Dec 14 21:52:30 client-30vm3 kernel: [<ffffffff81280f8c>] ? __bitmap_weight+0x8c/0xb0
Dec 14 21:52:30 client-30vm3 kernel: [<ffffffff8113d99b>] ? unmap_vmas+0xbeb/0xc30
Dec 14 21:52:30 client-30vm3 kernel: [<ffffffff81144ce1>] ? unmap_region+0x91/0x130
Dec 14 21:52:30 client-30vm3 kernel: [<ffffffff81145396>] ? do_munmap+0x2b6/0x3a0
Dec 14 21:52:30 client-30vm3 kernel: [<ffffffff811454d6>] ? sys_munmap+0x56/0x80
Dec 14 21:52:30 client-30vm3 kernel: [<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b
Dec 14 21:52:30 client-30vm3 kernel: Disabling lock debugging due to kernel taint
Dec 14 21:52:30 client-30vm3 rshd[7439]: root@client-30vm6.lab.whamcloud.com as root: cmd='/usr/sbin/lctl mark "lsm
```

The system reboots a few seconds later. The MDS system log appears to hold the real trace of the root issue; it can be downloaded here: https://maloo.whamcloud.com/test_logs/c2101c70-4694-11e2-b16f-52540035b04c/download

The MDS resets a second time during sanity. This may be an isolated issue: none of the last 100 runs reported this failure. |
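For context on how this surfaces as "test failed to respond and timed out": the harness runs each step on the remote node over rsh and recovers the exit status through the `XXRETCODE:` sentinel visible in the rshd log line above. Below is a minimal sketch of that pattern, not the actual test-framework.sh code; the `remote_run` helper name and the 300-second timeout are illustrative assumptions. If the node panics and reboots mid-command, the sentinel never arrives and the step can only be reported as a timeout.

```bash
#!/bin/bash
# Minimal sketch (illustrative, not the real test-framework.sh helper) of the
# remote-execution pattern seen in the rshd log line above: run a command on a
# remote node and recover its exit status via the XXRETCODE: sentinel, since
# rsh does not propagate remote exit codes by itself.

remote_run() {
    local node=$1; shift
    local out

    # Assumed 300s budget; a node that panics and reboots mid-command never
    # echoes the sentinel, so the harness can only report a timeout.
    if ! out=$(timeout 300 rsh "$node" "($*); echo XXRETCODE:\$?"); then
        echo "$node: test failed to respond and timed out" >&2
        return 1
    fi

    printf '%s\n' "${out%XXRETCODE:*}"    # the command's own output
    return "${out##*XXRETCODE:}"          # the remote exit status
}

# Example: the unmount step from the log above.
remote_run client-30vm3 'umount -d /mnt/mds1'
```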
| Comments |
| Comment by Andreas Dilger [ 17/Apr/17 ] |
|
Close old issue. |