Lustre / LU-2573

Replay-single test_26 Error: 'test failed to respond and timed out'


Details


    Description

This is from a git submission; the automated test results can be seen here: https://maloo.whamcloud.com/test_sessions/5c4ed954-4693-11e2-b16f-52540035b04c

The MDS hits a page fault and reboots.

      Dec 14 21:52:23 client-30vm3 kernel: Lustre: DEBUG MARKER: umount -d /mnt/mds1
      Dec 14 21:52:23 client-30vm3 xinetd[1573]: EXIT: shell status=0 pid=7391 duration=0(sec)
      Dec 14 21:52:23 client-30vm3 xinetd[1573]: START: shell pid=7414 from=::ffff:10.10.4.185
      Dec 14 21:52:23 client-30vm3 rshd[7415]: root@client-30vm6.lab.whamcloud.com as root: cmd='(PATH=$PATH:/usr/lib64/lustre/utils:/usr/lib64/lustre/tests:/sbin:/usr/sbin; cd /usr/lib64/lustre/tests; LUSTRE="/usr/lib64/lustre" sh -c "umount -d /mnt/mds1");echo XXRETCODE:$?'
      Dec 14 21:52:30 client-30vm3 kernel: Removing read-only on unknown block (0xfd00000)
      Dec 14 21:52:30 client-30vm3 kernel: BUG: Bad page map in process in.rshd  pte:00000001 pmd:7bd7d067
      Dec 14 21:52:30 client-30vm3 xinetd[1573]: EXIT: shell status=0 pid=7414 duration=7(sec)
      Dec 14 21:52:30 client-30vm3 kernel: page:ffffea0000000000 flags:(null) count:-1 mapcount:-1 mapping:(null) index:0
      Dec 14 21:52:30 client-30vm3 kernel: addr:00007fc144a2e000 vm_flags:08000070 anon_vma:(null) mapping:ffff88007b46a558 index:91
      Dec 14 21:52:30 client-30vm3 kernel: vma->vm_ops->fault: filemap_fault+0x0/0x500
      Dec 14 21:52:30 client-30vm3 kernel: vma->vm_file->f_op->mmap: generic_file_mmap+0x0/0x60
      Dec 14 21:52:30 client-30vm3 kernel: Pid: 7414, comm: in.rshd Not tainted 2.6.32-279.14.1.el6_lustre.g5fd2de9.x86_64 #1
      Dec 14 21:52:30 client-30vm3 kernel: Call Trace:
      Dec 14 21:52:30 client-30vm3 kernel: [<ffffffff8113ab48>] ? print_bad_pte+0x1d8/0x290
      Dec 14 21:52:30 client-30vm3 xinetd[1573]: START: shell pid=7438 from=::ffff:10.10.4.185
      Dec 14 21:52:30 client-30vm3 kernel: [<ffffffff81280f8c>] ? __bitmap_weight+0x8c/0xb0
      Dec 14 21:52:30 client-30vm3 kernel: [<ffffffff8113d99b>] ? unmap_vmas+0xbeb/0xc30
      Dec 14 21:52:30 client-30vm3 kernel: [<ffffffff81144ce1>] ? unmap_region+0x91/0x130
      Dec 14 21:52:30 client-30vm3 kernel: [<ffffffff81145396>] ? do_munmap+0x2b6/0x3a0
      Dec 14 21:52:30 client-30vm3 kernel: [<ffffffff811454d6>] ? sys_munmap+0x56/0x80
      Dec 14 21:52:30 client-30vm3 kernel: [<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b
      Dec 14 21:52:30 client-30vm3 kernel: Disabling lock debugging due to kernel taint
      Dec 14 21:52:30 client-30vm3 rshd[7439]: root@client-30vm6.lab.whamcloud.com as root: cmd='/usr/sbin/lctl mark "lsm
      

The system reboots a few seconds later. The MDS system log appears to contain the real trace of the root issue; it can be downloaded here: https://maloo.whamcloud.com/test_logs/c2101c70-4694-11e2-b16f-52540035b04c/download

The MDS resets a second time during the sanity test suite.

This may be an isolated issue: 0 of the last 100 test runs reported this failure.


          People

            wc-triage WC Triage
            keith Keith Mannthey (Inactive)
            Votes: 0
            Watchers: 2
