[LU-2573] Replay-single test_26 Error: 'test failed to respond and timed out' Created: 04/Jan/13  Updated: 17/Apr/17  Resolved: 17/Apr/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Keith Mannthey (Inactive) Assignee: WC Triage
Resolution: Cannot Reproduce Votes: 0
Labels: None
Environment:

Autotest system
https://maloo.whamcloud.com/test_sessions/5c4ed954-4693-11e2-b16f-52540035b04c


Severity: 3
Rank (Obsolete): 6010

 Description   

This is from a git submission; the automated testing can be seen here: https://maloo.whamcloud.com/test_sessions/5c4ed954-4693-11e2-b16f-52540035b04c

The MDS hits a "Bad page map" kernel BUG and reboots.

Dec 14 21:52:23 client-30vm3 kernel: Lustre: DEBUG MARKER: umount -d /mnt/mds1
Dec 14 21:52:23 client-30vm3 xinetd[1573]: EXIT: shell status=0 pid=7391 duration=0(sec)
Dec 14 21:52:23 client-30vm3 xinetd[1573]: START: shell pid=7414 from=::ffff:10.10.4.185
Dec 14 21:52:23 client-30vm3 rshd[7415]: root@client-30vm6.lab.whamcloud.com as root: cmd='(PATH=$PATH:/usr/lib64/lustre/utils:/usr/lib64/lustre/tests:/sbin:/usr/sbin; cd /usr/lib64/lustre/tests; LUSTRE="/usr/lib64/lustre" sh -c "umount -d /mnt/mds1");echo XXRETCODE:$?'
Dec 14 21:52:30 client-30vm3 kernel: Removing read-only on unknown block (0xfd00000)
Dec 14 21:52:30 client-30vm3 kernel: BUG: Bad page map in process in.rshd  pte:00000001 pmd:7bd7d067
Dec 14 21:52:30 client-30vm3 xinetd[1573]: EXIT: shell status=0 pid=7414 duration=7(sec)
Dec 14 21:52:30 client-30vm3 kernel: page:ffffea0000000000 flags:(null) count:-1 mapcount:-1 mapping:(null) index:0
Dec 14 21:52:30 client-30vm3 kernel: addr:00007fc144a2e000 vm_flags:08000070 anon_vma:(null) mapping:ffff88007b46a558 index:91
Dec 14 21:52:30 client-30vm3 kernel: vma->vm_ops->fault: filemap_fault+0x0/0x500
Dec 14 21:52:30 client-30vm3 kernel: vma->vm_file->f_op->mmap: generic_file_mmap+0x0/0x60
Dec 14 21:52:30 client-30vm3 kernel: Pid: 7414, comm: in.rshd Not tainted 2.6.32-279.14.1.el6_lustre.g5fd2de9.x86_64 #1
Dec 14 21:52:30 client-30vm3 kernel: Call Trace:
Dec 14 21:52:30 client-30vm3 kernel: [<ffffffff8113ab48>] ? print_bad_pte+0x1d8/0x290
Dec 14 21:52:30 client-30vm3 xinetd[1573]: START: shell pid=7438 from=::ffff:10.10.4.185
Dec 14 21:52:30 client-30vm3 kernel: [<ffffffff81280f8c>] ? __bitmap_weight+0x8c/0xb0
Dec 14 21:52:30 client-30vm3 kernel: [<ffffffff8113d99b>] ? unmap_vmas+0xbeb/0xc30
Dec 14 21:52:30 client-30vm3 kernel: [<ffffffff81144ce1>] ? unmap_region+0x91/0x130
Dec 14 21:52:30 client-30vm3 kernel: [<ffffffff81145396>] ? do_munmap+0x2b6/0x3a0
Dec 14 21:52:30 client-30vm3 kernel: [<ffffffff811454d6>] ? sys_munmap+0x56/0x80
Dec 14 21:52:30 client-30vm3 kernel: [<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b
Dec 14 21:52:30 client-30vm3 kernel: Disabling lock debugging due to kernel taint
Dec 14 21:52:30 client-30vm3 rshd[7439]: root@client-30vm6.lab.whamcloud.com as root: cmd='/usr/sbin/lctl mark "lsm

The system reboots a few seconds later. The MDS console log appears to contain the real trace of the root issue; it can be downloaded here: https://maloo.whamcloud.com/test_logs/c2101c70-4694-11e2-b16f-52540035b04c/download

The MDS resets a second time during sanity.

This appears to be an isolated issue: 0 of the last 100 test runs reported it.



 Comments   
Comment by Andreas Dilger [ 17/Apr/17 ]

Close old issue.

Generated at Sat Feb 10 01:26:20 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.