Lustre / LU-6108

Interop 2.6.0<->2.7 sanity-scrub test_12: OST oops


Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.7.0
    • Component/s: None
    • Environment:
      server: 2.6.0
      client: lustre-master build #2808
    • Severity: 3
    • Rank: 16995

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/767fbc34-9624-11e4-acfc-5254006e85c2.

      The sub-test test_12 failed with the following error:

      test failed to respond and timed out
      
      18:56:07:Lustre: DEBUG MARKER: == sanity-scrub test 12: OI scrub can rebuild invalid /O entries == 18:55:49 (1420512949)
      18:56:07:Lustre: DEBUG MARKER: /usr/sbin/lctl set_param fail_loc=0x195
      18:56:07:Lustre: *** cfs_fail_loc=195, val=0***
      18:56:07:Lustre: DEBUG MARKER: grep -c /mnt/ost1' ' /proc/mounts
      18:56:07:Lustre: DEBUG MARKER: umount -d /mnt/ost1
      18:56:07:Lustre: Failing over lustre-OST0000
      18:56:07:Lustre: Skipped 2 previous similar messages
      18:56:07:Lustre: server umount lustre-OST0000 complete
      18:56:07:Lustre: Skipped 3 previous similar messages
      18:56:07:Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
      18:56:07:LustreError: 137-5: lustre-OST0000_UUID: not available for connect from 10.2.4.99@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
      18:56:07:LustreError: Skipped 3 previous similar messages
      18:56:07:Lustre: DEBUG MARKER: /usr/sbin/lctl set_param fail_loc=0x233
      18:56:07:Lustre: DEBUG MARKER: mkdir -p /mnt/ost1
      18:56:07:Lustre: DEBUG MARKER: test -b /dev/lvm-Role_OSS/P1
      18:56:07:Lustre: DEBUG MARKER: mkdir -p /mnt/ost1; mount -t lustre -o user_xattr,noscrub  		                   /dev/lvm-Role_OSS/P1 /mnt/ost1
      18:56:07:LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. quota=on. Opts: 
      18:56:07:Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
      18:56:07:Lustre: lustre-OST0000: Will be in recovery for at least 1:00, or until 2 clients reconnect
      18:56:07:Lustre: Skipped 1 previous similar message
      18:56:08:Lustre: DEBUG MARKER: e2label /dev/lvm-Role_OSS/P1 2>/dev/null
      18:56:08:Lustre: lustre-OST0000: Denying connection for new client 2bf68531-a8c6-f88c-5c44-2fce40112099 (at 10.2.4.94@tcp), waiting for all 2 known clients (1 recovered, 0 in progress, and 0 evicted) to recover in 1:05
      18:56:08:Lustre: lustre-OST0000: Recovery over after 0:05, of 2 clients 2 recovered and 0 were evicted.
      18:56:08:Lustre: Skipped 2 previous similar messages
      18:56:08:Lustre: *** cfs_fail_loc=233, val=0***
      18:56:08:LustreError: 9129:0:(ldlm_resource.c:1150:ldlm_resource_get()) lustre-OST0000: lvbo_init failed for resource 0x26b:0x0: rc = -78
      18:56:08:LustreError: 9129:0:(ldlm_resource.c:1150:ldlm_resource_get()) Skipped 2 previous similar messages
      18:56:08:BUG: unable to handle kernel paging request at 000000005a5a5a5a
      18:56:08:IP: [<ffffffffa07ea251>] ldlm_lock_create+0x201/0xd70 [ptlrpc]
      18:56:08:PGD 0 
      18:56:08:Oops: 0000 [#1] SMP 
      18:56:08:last sysfs file: /sys/devices/system/cpu/online
      18:56:08:CPU 0 
      18:56:08:Modules linked in: osp(U) ofd(U) lfsck(U) ost(U) mgc(U) osd_ldiskfs(U) lquota(U) lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) sha512_generic sha256_generic libcfs(U) ldiskfs(U) jbd2 nfs fscache nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs autofs4 ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core microcode virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk pata_acpi ata_generic ata_piix virtio_pci virtio_ring virtio dm_mirror dm_region_hash dm_log dm_mod [last unloaded: obdecho]
      18:56:08:
      18:56:08:Pid: 11802, comm: ll_ost00_057 Tainted: G        W  ---------------    2.6.32-431.20.3.el6_lustre.x86_64 #1 Red Hat KVM
      18:56:08:RIP: 0010:[<ffffffffa07ea251>]  [<ffffffffa07ea251>] ldlm_lock_create+0x201/0xd70 [ptlrpc]
      18:56:08:RSP: 0018:ffff880040213c60  EFLAGS: 00010246
      18:56:08:RAX: ffff88004f297ad8 RBX: ffff88004f297900 RCX: ffff88007c022840
      18:56:08:RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff88004f297a10
      18:56:09:RBP: ffff880040213cb0 R08: 0000000000000005 R09: 0000000000000000
      18:56:09:R10: ffff88004f297900 R11: 0000000000000200 R12: ffff88004f297900
      18:56:09:R13: ffffffffa0916ea0 R14: ffff88006d5a0000 R15: 000000005a5a5a5a
      18:56:09:FS:  0000000000000000(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
      18:56:09:CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
      18:56:09:CR2: 000000005a5a5a5a CR3: 0000000079c56000 CR4: 00000000000006f0
      18:56:09:DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      18:56:09:DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      18:56:09:Process ll_ost00_057 (pid: 11802, threadinfo ffff880040212000, task ffff88004020eaa0)
      18:56:09:Stack:
      18:56:09: ffff880000000010 0000000000000000 0000000b00000004 ffffffffa05d5e31
      18:56:09:<d> ffff88003deddcc8 ffff880049cea000 ffff88003deddda8 ffffffffa0916ea0
      18:56:09:<d> ffff88006d5a0000 0000000000000001 ffff880040213d20 ffffffffa0811c74
      18:56:09:Call Trace:
      18:56:09: [<ffffffffa05d5e31>] ? lprocfs_counter_add+0x151/0x1c0 [obdclass]
      18:56:09: [<ffffffffa0811c74>] ldlm_handle_enqueue0+0x174/0x11d0 [ptlrpc]
      18:56:09: [<ffffffffa08947c2>] tgt_enqueue+0x62/0x1d0 [ptlrpc]
      18:56:09: [<ffffffffa0894b6c>] tgt_request_handle+0x23c/0xac0 [ptlrpc]
      18:56:09: [<ffffffffa084426a>] ptlrpc_main+0xd1a/0x1980 [ptlrpc]
      18:56:09: [<ffffffffa0843550>] ? ptlrpc_main+0x0/0x1980 [ptlrpc]
      18:56:09: [<ffffffff8109abf6>] kthread+0x96/0xa0
      18:56:09: [<ffffffff8100c20a>] child_rip+0xa/0x20
      18:56:09: [<ffffffff8109ab60>] ? kthread+0x0/0xa0
      18:56:09: [<ffffffff8100c200>] ? child_rip+0x0/0x20
      18:56:09:Code: 00 00 49 8d 84 24 d8 01 00 00 49 c7 84 24 90 00 00 00 00 00 00 00 ba 01 00 00 00 49 89 84 24 d8 01 00 00 49 89 84 24 e0 01 00 00 <49> 8b 07 48 8b 00 48 8b b8 88 01 00 00 e8 7d ba de ff 4d 89 24 
      18:56:09:RIP  [<ffffffffa07ea251>] ldlm_lock_create+0x201/0xd70 [ptlrpc]
      

      Info required for matching: sanity-scrub 12
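
      For reference, the failure sequence recorded in the DEBUG MARKER lines above can be condensed into the shell sketch below. This is only an approximation of what sanity-scrub test_12 drives on the OSS in this run; the fail_loc values, device path, and mount point are taken verbatim from the log, and the surrounding test-framework helpers are omitted.

      # Sketch of the OSS-side steps seen in the console log (not the actual
      # sanity-scrub.sh code). Paths and fail_loc values are specific to this run.
      lctl set_param fail_loc=0x195                  # fault injection before the failover
      umount -d /mnt/ost1                            # fail over lustre-OST0000
      lctl set_param fail_loc=0x233                  # second fault injection before remount
      mkdir -p /mnt/ost1
      mount -t lustre -o user_xattr,noscrub /dev/lvm-Role_OSS/P1 /mnt/ost1
      # Once clients reconnect and recovery completes, lvbo_init fails with
      # rc = -78 and the next lock enqueue oopses in ldlm_lock_create().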

          People

            Assignee: WC Triage
            Reporter: Maloo
