Lustre / LU-3760

Interop 2.3.0<->2.4 failure on test suite sanity-scrub test_6: unable to handle kernel paging request at ffff880336ebfe50

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.5.0
    • Labels: None
    • Severity: 3
    • Rank (Obsolete): 9692

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/61f9acf0-02bb-11e3-a4b4-52540035b04c.

      The sub-test test_6 failed with the following error:

      test failed to respond and timed out

      14:37:34:Lustre: DEBUG MARKER: == sanity-scrub test 6: OI scrub resumes from last checkpoint == 14:37:23 (1376170643)
      14:37:34:Lustre: DEBUG MARKER: grep -c /mnt/mds1' ' /proc/mounts
      14:37:34:Lustre: DEBUG MARKER: umount -d -f /mnt/mds1
      14:37:34:Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
      14:38:29:Lustre: DEBUG MARKER: grep -c /mnt/mds1' ' /proc/mounts
      14:38:29:Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
      14:38:29:Lustre: DEBUG MARKER: mkfs.lustre --mgs --fsname=lustre --mdt --index=0 --param=sys.timeout=20 --param=lov.stripesize=1048576 --param=lov.stripecount=0 --param=mdt.identity_upcall=/usr/sbin/l_getidentity --backfstype=ldiskfs --device-size=100000 --mountfsoptions=errors=remount
      14:38:29:LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. quota=off. Opts: 
      14:38:43:Lustre: DEBUG MARKER: running=$(grep -c /mnt/mds1' ' /proc/mounts);
      14:38:43:mpts=$(mount | grep -c /mnt/mds1' ');
      14:38:43:if [ $running -ne $mpts ]; then
      14:38:43:    echo $(hostname) env are INSANE!;
      14:38:43:    exit 1;
      14:38:43:fi
      14:38:43:Lustre: DEBUG MARKER: running=$(grep -c /mnt/mds1' ' /proc/mounts);
      14:38:43:mpts=$(mount | grep -c /mnt/mds1' ');
      14:38:43:if [ $running -ne $mpts ]; then
      14:38:43:    echo $(hostname) env are INSANE!;
      14:38:43:    exit 1;
      14:38:43:fi
      14:38:43:Lustre: DEBUG MARKER: mkdir -p /mnt/mds1
      14:38:44:Lustre: DEBUG MARKER: test -b /dev/lvm-MDS/P1
      14:38:44:Lustre: DEBUG MARKER: mkdir -p /mnt/mds1; mount -t lustre -o user_xattr,acl  		                   /dev/lvm-MDS/P1 /mnt/mds1
      14:38:44:LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. quota=off. Opts: 
      14:38:44:LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. quota=off. Opts: 
      14:38:44:Lustre: Setting parameter lustre-MDT0000-mdtlov.lov.stripesize in log lustre-MDT0000
      14:38:44:Lustre: Skipped 14 previous similar messages
      14:38:44:Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
      14:38:44:Lustre: DEBUG MARKER: e2label /dev/lvm-MDS/P1 2>/dev/null
      14:38:55:Lustre: DEBUG MARKER: lctl get_param -n timeout
      14:38:55:Lustre: DEBUG MARKER: /usr/sbin/lctl mark Using TIMEOUT=20
      14:38:55:Lustre: DEBUG MARKER: Using TIMEOUT=20
      14:38:55:Lustre: DEBUG MARKER: lctl dl | grep ' IN osc ' 2>/dev/null | wc -l
      14:38:55:Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version
      14:38:55:BUG: unable to handle kernel paging request at ffff880336ebfe50
      14:38:55:IP: [<ffffffffa0a164db>] alloc_qos+0x87b/0x2190 [lov]
      14:38:55:PGD 1a86063 PUD 0 
      14:38:55:Oops: 0000 [#1] SMP 
      14:38:55:last sysfs file: /sys/devices/system/cpu/possible
      14:38:55:CPU 0 
      14:38:55:Modules linked in: cmm(U) osd_ldiskfs(U) mdt(U) mdd(U) mds(U) fsfilt_ldiskfs(U) mgs(U) mgc(U) lustre(U) lquota(U) lov(U) osc(U) mdc(U) fid(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) sha512_generic sha256_generic libcfs(U) ldiskfs(U) jbd2 nfs fscache nfsd lockd nfs_acl auth_rpcgss exportfs autofs4 sunrpc ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core microcode virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
      14:38:55:
      14:38:55:Pid: 1238, comm: mdt00_002 Not tainted 2.6.32-279.5.1.el6_lustre.gb16fe80.x86_64 #1 Red Hat KVM
      14:38:55:RIP: 0010:[<ffffffffa0a164db>]  [<ffffffffa0a164db>] alloc_qos+0x87b/0x2190 [lov]
      14:38:55:RSP: 0018:ffff88003778d510  EFLAGS: 00010213
      14:38:55:RAX: ffff880064192b80 RBX: ffff880050dc4908 RCX: ffff88005a1d6b80
      14:38:55:RDX: 0000000000000006 RSI: ffff8800634712f8 RDI: ffff880050dc48d8
      14:38:55:RBP: ffff88003778d5c0 R08: 000000005a5a5a5a R09: 0000000000000001
      14:38:55:R10: ffff880050dc47b8 R11: 0000000000000006 R12: 0000000000000000
      14:38:55:R13: 0000000000000006 R14: 000000005a5a5a5a R15: ffff880050dc4758
      14:38:55:FS:  00007f5102c27700(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
      14:38:55:CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      14:38:56:CR2: ffff880336ebfe50 CR3: 0000000037d04000 CR4: 00000000000006f0
      14:38:56:DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      14:38:56:DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      14:38:56:Process mdt00_002 (pid: 1238, threadinfo ffff88003778c000, task ffff880064586aa0)
      14:38:56:Stack:
      14:38:56: ffff88003778d670 0000000600000000 ffff88005aa23478 ffff880050dc47b8
      14:38:56:<d> 0000000000000000 ffffffff811ad7d7 0000000000000001 0000000000000246
      14:38:56:<d> ffff88005a1d6b80 0000000600000050 ffff880050dc48d8 ffff88003778d62c
      14:38:56:Call Trace:
      14:38:56: [<ffffffff811ad7d7>] ? __find_get_block+0x97/0xe0
      14:38:56: [<ffffffffa0a17f20>] alloc_idx_array+0x130/0xdf0 [lov]
      14:38:56: [<ffffffffa0a19b14>] qos_prep_create+0xf4/0x1600 [lov]
      14:38:56: [<ffffffffa0a13aba>] lov_prep_create_set+0xea/0x390 [lov]
      14:38:56: [<ffffffffa09fa78c>] lov_create+0x1ac/0x1410 [lov]
      14:38:56: [<ffffffffa0d02bdb>] ? osd_object_read_unlock+0x9b/0xe0 [osd_ldiskfs]
      14:38:56: [<ffffffffa0c13f06>] ? mdd_read_unlock+0x26/0x30 [mdd]
      14:38:56: [<ffffffffa0bf8a8c>] mdd_lov_create+0xd0c/0x21c0 [mdd]
      14:38:56: [<ffffffffa0c06a0d>] mdd_create+0xdfd/0x2180 [mdd]
      14:38:56: [<ffffffffa04f4952>] ? cfs_hash_bd_from_key+0x42/0xe0 [libcfs]
      14:38:56: [<ffffffffa04f42f9>] ? cfs_hash_bd_add_locked+0x29/0x90 [libcfs]
      14:38:56: [<ffffffffa0cffb4f>] ? osd_xattr_get+0x9f/0x350 [osd_ldiskfs]
      14:38:56: [<ffffffffa0914637>] cml_create+0x97/0x250 [cmm]
      14:38:56: [<ffffffffa0c70ddf>] ? mdt_version_get_save+0x8f/0xd0 [mdt]
      14:38:56: [<ffffffffa0c84b9f>] mdt_reint_open+0x108f/0x18a0 [mdt]
      14:38:56: [<ffffffffa0c0d2be>] ? md_ucred+0x1e/0x60 [mdd]
      14:38:56: [<ffffffffa0c52235>] ? mdt_ucred+0x15/0x20 [mdt]
      14:38:56: [<ffffffffa0c6e151>] mdt_reint_rec+0x41/0xe0 [mdt]
      14:38:56: [<ffffffffa0c679aa>] mdt_reint_internal+0x50a/0x810 [mdt]
      14:38:56: [<ffffffffa0c67f7d>] mdt_intent_reint+0x1ed/0x500 [mdt]
      14:38:56: [<ffffffffa0c64191>] mdt_intent_policy+0x371/0x6a0 [mdt]
      14:38:56: [<ffffffffa0792881>] ldlm_lock_enqueue+0x361/0x8f0 [ptlrpc]
      14:38:56: [<ffffffffa07ba9bf>] ldlm_handle_enqueue0+0x48f/0xf70 [ptlrpc]
      14:38:56: [<ffffffffa0c64506>] mdt_enqueue+0x46/0x130 [mdt]
      14:38:56: [<ffffffffa0c5b802>] mdt_handle_common+0x922/0x1740 [mdt]
      14:38:56: [<ffffffffa0c5c6f5>] mdt_regular_handle+0x15/0x20 [mdt]
      14:38:56: [<ffffffffa07eab3c>] ptlrpc_server_handle_request+0x41c/0xe00 [ptlrpc]
      14:38:56: [<ffffffffa04df65e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      14:38:56: [<ffffffffa07e1f37>] ? ptlrpc_wait_event+0xa7/0x2a0 [ptlrpc]
      14:38:56: [<ffffffff810533f3>] ? __wake_up+0x53/0x70
      14:38:56: [<ffffffffa07ec111>] ptlrpc_main+0xbf1/0x19e0 [ptlrpc]
      14:38:57: [<ffffffffa07eb520>] ? ptlrpc_main+0x0/0x19e0 [ptlrpc]
      14:38:57: [<ffffffff8100c14a>] child_rip+0xa/0x20
      14:38:57: [<ffffffffa07eb520>] ? ptlrpc_main+0x0/0x19e0 [ptlrpc]
      14:38:57: [<ffffffffa07eb520>] ? ptlrpc_main+0x0/0x19e0 [ptlrpc]
      14:38:57: [<ffffffff8100c140>] ? child_rip+0x0/0x20
      14:38:57:Code: 00 41 8d 45 01 31 d2 f7 f1 8b 03 41 89 d5 83 c0 01 44 89 ea 89 03 48 8b 43 10 44 8b 34 90 41 83 fe ff 74 cc 49 8b 47 58 45 89 f0 <4a> 8b 04 c0 48 85 c0 74 bc f6 80 80 00 00 00 01 74 b3 48 8b 15 
      14:38:57:RIP  [<ffffffffa0a164db>] alloc_qos+0x87b/0x2190 [lov]
      14:38:57: RSP <ffff88003778d510>
      14:38:57:CR2: ffff880336ebfe50
      14:38:57:Initializing cgroup subsys cpuset
      14:38:57:Initializing cgroup subsys cpu
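
A quick cross-check of the fault address, offered as a reading of the dump above rather than a confirmed diagnosis. The bytes at the faulting instruction in the Code: line (<4a> 8b 04 c0) appear to decode as mov (%rax,%r8,8),%rax, so the accessed address should be RAX + R8*8. R8 and R14 both hold 0x5a5a5a5a, which looks like a memory poison fill pattern; reading it as an index taken from freed or uninitialized memory is an assumption, not something the log states. The helper name effective_address is hypothetical.

```python
# Register values copied from the oops above.
RAX = 0xffff880064192b80   # base pointer of the array being indexed
R8 = 0x5a5a5a5a            # index register (same value as R14)
CR2 = 0xffff880336ebfe50   # faulting address reported by the kernel

def effective_address(base, index, scale):
    """Compute base + index*scale, wrapped to 64 bits like the CPU does."""
    return (base + index * scale) & 0xFFFFFFFFFFFFFFFF

# Assumed decoding of the faulting instruction: mov (%rax,%r8,8),%rax
fault = effective_address(RAX, R8, 8)

print(hex(fault))      # 0xffff880336ebfe50
print(fault == CR2)    # True
```

The computed address matches CR2, which is consistent with alloc_qos() indexing a pointer array with a bogus 0x5a5a5a5a index.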
      

        Activity

          yong.fan nasf (Inactive) added a comment - This is another instance of the failure tracked in LU-3368.

          jlevi Jodi Levi (Inactive) added a comment - Fan Yong, Can you please have a look at this one and comment? Thank you!

          People

            Assignee: yong.fan nasf (Inactive)
            Reporter: maloo Maloo
            Votes: 0
            Watchers: 3
