LU-5904: Interop 2.6.0<->master sanity-scrub test_13: BUG: ldlm_lock_create() on OST

Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.7.0
    • Component/s: None
    • Environment: server: 2.6.0, client: lustre-master build #2733
    • Severity: 3
    • Rank: 16494

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/a99dab2e-68f9-11e4-9444-5254006e85c2.

      The sub-test test_13 failed with the following error:

      test failed to respond and timed out
      
      15:56:02:Lustre: DEBUG MARKER: == sanity-scrub test 13: OI scrub can rebuild missed /O entries == 15:55:38 (1415404538)
      15:56:02:Lustre: DEBUG MARKER: /usr/sbin/lctl set_param fail_loc=0x196
      15:56:02:Lustre: *** cfs_fail_loc=196, val=0***
      15:56:02:Lustre: Skipped 63 previous similar messages
      15:56:02:Lustre: DEBUG MARKER: /usr/sbin/lctl set_param fail_loc=0
      15:56:02:Lustre: DEBUG MARKER: grep -c /mnt/ost1' ' /proc/mounts
      15:56:03:Lustre: DEBUG MARKER: umount -d /mnt/ost1
      15:56:03:Lustre: Failing over lustre-OST0000
      15:56:03:Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
      15:56:03:Lustre: DEBUG MARKER: mkdir -p /mnt/ost1
      15:56:03:Lustre: DEBUG MARKER: test -b /dev/lvm-Role_OSS/P1
      15:56:03:Lustre: DEBUG MARKER: mkdir -p /mnt/ost1; mount -t lustre -o user_xattr,noscrub  		                   /dev/lvm-Role_OSS/P1 /mnt/ost1
      15:56:03:LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. quota=on. Opts: 
      15:56:04:LustreError: 137-5: lustre-OST0000_UUID: not available for connect from 10.2.4.225@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
      15:56:04:Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
      15:56:04:Lustre: DEBUG MARKER: e2label /dev/lvm-Role_OSS/P1 2>/dev/null
      15:56:04:Lustre: lustre-OST0000: Will be in recovery for at least 1:00, or until 2 clients reconnect
      15:56:04:Lustre: lustre-OST0000: Denying connection for new client 02b0a6e6-f977-0e0b-772a-b2c3b66ac87b (at 10.2.4.220@tcp), waiting for all 2 known clients (0 recovered, 0 in progress, and 0 evicted) to recover in 0:59
      15:56:04:Lustre: lustre-OST0000: Denying connection for new client 02b0a6e6-f977-0e0b-772a-b2c3b66ac87b (at 10.2.4.220@tcp), waiting for all 2 known clients (1 recovered, 0 in progress, and 0 evicted) to recover in 1:19
      15:56:04:Lustre: lustre-OST0000: Recovery over after 0:06, of 2 clients 2 recovered and 0 were evicted.
      15:56:04:Lustre: lustre-OST0000: deleting orphan objects from 0x0:738 to 0x0:769
      15:56:04:LustreError: 11266:0:(ldlm_resource.c:1150:ldlm_resource_get()) lustre-OST0000: lvbo_init failed for resource 0x2d8:0x0: rc = -2
      15:56:04:LustreError: 11266:0:(ldlm_resource.c:1150:ldlm_resource_get()) Skipped 63 previous similar messages
      15:56:04:BUG: unable to handle kernel paging request at 000000005a5a5a5a
      15:56:04:IP: [<ffffffffa07ea251>] ldlm_lock_create+0x201/0xd70 [ptlrpc]
      15:56:04:PGD 0 
      15:56:05:Oops: 0000 [#1] SMP 
      15:56:05:last sysfs file: /sys/devices/system/cpu/online
      15:56:05:CPU 0 
      15:56:05:Modules linked in: osp(U) ofd(U) lfsck(U) ost(U) mgc(U) osd_ldiskfs(U) lquota(U) lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) sha512_generic sha256_generic libcfs(U) ldiskfs(U) jbd2 nfs fscache nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs autofs4 ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core microcode virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: obdecho]
      15:56:05:
      15:56:05:Pid: 11261, comm: ll_ost00_014 Tainted: G        W  ---------------    2.6.32-431.20.3.el6_lustre.x86_64 #1 Red Hat KVM
      15:56:06:RIP: 0010:[<ffffffffa07ea251>]  [<ffffffffa07ea251>] ldlm_lock_create+0x201/0xd70 [ptlrpc]
      15:56:06:RSP: 0018:ffff88006c515c60  EFLAGS: 00010246
      15:56:06:RAX: ffff88006ccc1358 RBX: ffff88006ccc1180 RCX: ffff880037a64880
      15:56:06:RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff88006ccc1290
      15:56:06:RBP: ffff88006c515cb0 R08: 0000000000000005 R09: 0000000000000000
      15:56:06:R10: ffff88006ccc1180 R11: 0000000000000200 R12: ffff88006ccc1180
      15:56:06:R13: ffffffffa0916ea0 R14: ffff88007ba5e400 R15: 000000005a5a5a5a
      15:56:06:FS:  0000000000000000(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
      15:56:06:CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
      15:56:06:CR2: 000000005a5a5a5a CR3: 000000007db32000 CR4: 00000000000006f0
      15:56:06:DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      15:56:07:DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      15:56:07:Process ll_ost00_014 (pid: 11261, threadinfo ffff88006c514000, task ffff88006f803540)
      15:56:07:Stack:
      15:56:07: ffff880000000010 0000000000000000 0000000b00000004 ffffffffa05d5e31
      15:56:07:<d> ffff88007ce36738 ffff88006e0ed850 ffff88007ce36818 ffffffffa0916ea0
      15:56:07:<d> ffff88007ba5e400 0000000000000001 ffff88006c515d20 ffffffffa0811c74
      15:56:07:Call Trace:
      15:56:07: [<ffffffffa05d5e31>] ? lprocfs_counter_add+0x151/0x1c0 [obdclass]
      15:56:07: [<ffffffffa0811c74>] ldlm_handle_enqueue0+0x174/0x11d0 [ptlrpc]
      15:56:07: [<ffffffffa08947c2>] tgt_enqueue+0x62/0x1d0 [ptlrpc]
      15:56:08: [<ffffffffa0894b6c>] tgt_request_handle+0x23c/0xac0 [ptlrpc]
      15:56:08: [<ffffffffa084426a>] ptlrpc_main+0xd1a/0x1980 [ptlrpc]
      15:56:08: [<ffffffffa0843550>] ? ptlrpc_main+0x0/0x1980 [ptlrpc]
      15:56:08: [<ffffffff8109abf6>] kthread+0x96/0xa0
      15:56:08: [<ffffffff8100c20a>] child_rip+0xa/0x20
      15:56:09: [<ffffffff8109ab60>] ? kthread+0x0/0xa0
      15:56:09: [<ffffffff8100c200>] ? child_rip+0x0/0x20
      15:56:09:Code: 00 00 49 8d 84 24 d8 01 00 00 49 c7 84 24 90 00 00 00 00 00 00 00 ba 01 00 00 00 49 89 84 24 d8 01 00 00 49 89 84 24 e0 01 00 00 <49> 8b 07 48 8b 00 48 8b b8 88 01 00 00 e8 7d ba de ff 4d 89 24 
      15:56:09:RIP  [<ffffffffa07ea251>] ldlm_lock_create+0x201/0xd70 [ptlrpc]
      15:56:09: RSP <ffff88006c515c60>
      15:56:09:CR2: 000000005a5a5a5a
      15:56:09:Initializing cgroup subsys cpuset
      15:56:09:Initializing cgroup subsys cpu
      
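      The faulting address 000000005a5a5a5a matches the kernel slab-debug POISON_INUSE pattern (byte 0x5a, used to fill objects that are allocated but not yet initialized), so the oops in ldlm_lock_create() looks like a dereference through an uninitialized pointer (note R15 = 5a5a5a5a), hit shortly after the OST failover and the lvbo_init failures above. As a hedged triage sketch, the faulting offset could be mapped to a source line with gdb once matching debuginfo is installed; the module path below is an assumption and varies by packaging:

      # resolve the faulting offset to a source line (sketch)
      MOD=/lib/modules/2.6.32-431.20.3.el6_lustre.x86_64/updates/kernel/fs/lustre/ptlrpc.ko
      gdb -batch -ex 'list *(ldlm_lock_create+0x201)' "$MOD"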

      Info required for matching: sanity-scrub 13
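      For retesting against a current build, the sub-test can be run on its own through the Lustre test framework. A minimal sketch, assuming the standard test layout under /usr/lib64/lustre/tests and an already-configured test cluster:

      # run only sanity-scrub test_13
      cd /usr/lib64/lustre/tests
      ONLY=13 sh ./sanity-scrub.sh
      # or via the auster wrapper:
      # ./auster -v sanity-scrub --only 13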

          People

            Assignee: WC Triage
            Reporter: Maloo
