LU-3160 (Lustre)

recovery-random-scale test_fail_client_mds: RIP: cl_object_top+0xe/0x150 [obdclass]

    Description

      While running recovery-random-scale (which fails one random client and then fails the MDS), the dd operation on the other, live client (wtm-4vm5) hung and the client crashed:

      2013-04-11 03:01:35: dd run starting
      + mkdir -p /mnt/lustre/d0.dd-wtm-4vm5.rosso.whamcloud.com
      + /usr/bin/lfs setstripe -c -1 /mnt/lustre/d0.dd-wtm-4vm5.rosso.whamcloud.com
      + cd /mnt/lustre/d0.dd-wtm-4vm5.rosso.whamcloud.com
      ++ /usr/bin/lfs df /mnt/lustre/d0.dd-wtm-4vm5.rosso.whamcloud.com
      + FREE_SPACE=97381848
      + BLKS=21910915
      + echo 'Free disk space is 97381848, 4k blocks to dd is 21910915'
      + load_pid=2634
      + wait 2634
      + dd bs=4k count=21910915 status=noxfer if=/dev/zero of=/mnt/lustre/d0.dd-wtm-4vm5.rosso.whamcloud.com/dd-file
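
The FREE_SPACE and BLKS values in the trace above are consistent with the load script sizing the dd run at 90% of the free space, converted from KB to 4k blocks. The sketch below reproduces that arithmetic. The formula is an assumption inferred from the two numbers in the trace, not something the report states, and it assumes the free space figure is in KB.

```shell
#!/bin/sh
# Value copied from the trace above; assumed to be in KB.
FREE_SPACE=97381848

# Assumed formula: keep 10% headroom, then divide by 4 to get 4k blocks.
# Shell integer arithmetic truncates at each division.
BLKS=$((FREE_SPACE * 9 / 10 / 4))

echo "Free disk space is $FREE_SPACE, 4k blocks to dd is $BLKS"
```

Under that assumption the computed block count matches the `count=` argument passed to dd in the trace, so dd would have been writing roughly 83 GiB when the client crashed.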
      

      Console log on wtm-4vm5:

      BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
      IP: [<ffffffffa05b4b4e>] cl_object_top+0xe/0x150 [obdclass]
      PGD 7b01e067 PUD 7b016067 PMD 0
      Oops: 0000 [#1] SMP
      last sysfs file: /sys/devices/system/cpu/possible
      CPU 0
      Modules linked in: lmv(U) mgc(U) lustre(U) lov(U) osc(U) mdc(U) fid(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) sha512_generic sha256_generic libcfs(U) nfsd exportfs autofs4 nfs lockd fscache nfs_acl auth_rpcgss sunrpc ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core microcode virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
      
      Pid: 2683, comm: flush-lustre-1 Not tainted 2.6.32-279.19.1.el6.x86_64 #1 Red Hat KVM
      RIP: 0010:[<ffffffffa05b4b4e>]  [<ffffffffa05b4b4e>] cl_object_top+0xe/0x150 [obdclass]
      RSP: 0018:ffff88004c3c5980  EFLAGS: 00010282
      RAX: ffff88007bc75800 RBX: ffff88007d1c21e8 RCX: 0000000000000098
      RDX: ffff88003e2bb200 RSI: ffffffffa0602400 RDI: 0000000000000098
      RBP: ffff88004c3c5990 R08: 0000000000000001 R09: 0000000000000000
      R10: 000000000000000f R11: 000000000000000f R12: ffff88007d1bc3d0
      R13: 0000000000000004 R14: 0000000000000098 R15: ffff88007bc75800
      FS:  0000000000000000(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
      CR2: 0000000000000098 CR3: 000000007b02d000 CR4: 00000000000006f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process flush-lustre-1 (pid: 2683, threadinfo ffff88004c3c4000, task ffff88007b730080)
      Stack:
       ffff88004c3c5990 ffff88007d1c21e8 ffff88004c3c59d0 ffffffffa05c575d
      <d> 0000000000000000 ffff8800430e3c00 ffff88007d1bc378 0000000000000000
      <d> ffff88007d1bf768 ffff88007b020e80 ffff88004c3c5a30 ffffffffa09d3488
      Call Trace:
       [<ffffffffa05c575d>] cl_io_sub_init+0x3d/0xc0 [obdclass]
       [<ffffffffa09d3488>] lov_sub_get+0x218/0x690 [lov]
       [<ffffffffa09d5116>] lov_io_iter_init+0xd6/0x480 [lov]
       [<ffffffffa05c279d>] cl_io_iter_init+0x5d/0x110 [obdclass]
       [<ffffffffa05c6d3c>] cl_io_loop+0x4c/0x1b0 [obdclass]
       [<ffffffffa0a5233b>] cl_sync_file_range+0x2fb/0x4e0 [lustre]
       [<ffffffffa0a7ba7f>] ll_writepages+0x6f/0x1a0 [lustre]
       [<ffffffff811255d1>] do_writepages+0x21/0x40
       [<ffffffff8119fe8d>] writeback_single_inode+0xdd/0x290
       [<ffffffff811a029e>] writeback_sb_inodes+0xce/0x180
       [<ffffffff811a03fb>] writeback_inodes_wb+0xab/0x1b0
       [<ffffffff811a079b>] wb_writeback+0x29b/0x3f0
       [<ffffffff814e9c50>] ? thread_return+0x4e/0x76e
       [<ffffffff8107d572>] ? del_timer_sync+0x22/0x30
       [<ffffffff811a0a89>] wb_do_writeback+0x199/0x240
       [<ffffffff811a0b93>] bdi_writeback_task+0x63/0x1b0
       [<ffffffff81090857>] ? bit_waitqueue+0x17/0xd0
       [<ffffffff81134170>] ? bdi_start_fn+0x0/0x100
       [<ffffffff811341f6>] bdi_start_fn+0x86/0x100
       [<ffffffff81134170>] ? bdi_start_fn+0x0/0x100
       [<ffffffff81090626>] kthread+0x96/0xa0
       [<ffffffff8100c0ca>] child_rip+0xa/0x20
       [<ffffffff81090590>] ? kthread+0x0/0xa0
       [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      Code: 04 00 00 00 04 00 e8 52 b7 e8 ff 48 c7 c7 60 2b 60 a0 e8 16 b3 e7 ff 66 0f 1f 44 00 00 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 00 <48> 8b 07 0f 1f 80 00 00 00 00 48 89 c2 48 8b 80 88 00 00 00 48
      RIP  [<ffffffffa05b4b4e>] cl_object_top+0xe/0x150 [obdclass]
       RSP <ffff88004c3c5980>
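
One way to read the oops above: the marked bytes `<48> 8b 07` in the Code line decode to `mov (%rdi),%rax`, and on x86-64 RDI carries the first function argument. The sketch below only checks that RDI and the faulting address in CR2 are the same value, using two lines copied from the log. The interpretation that cl_object_top was handed a pointer computed as a member at offset 0x98 of a NULL structure is an inference, not something the report confirms.

```shell
#!/bin/sh
# Two lines copied verbatim from the console log above.
oops='RDX: ffff88003e2bb200 RSI: ffffffffa0602400 RDI: 0000000000000098
CR2: 0000000000000098 CR3: 000000007b02d000 CR4: 00000000000006f0'

# Extract the first-argument register (RDI) and the faulting address (CR2).
rdi=$(printf '%s\n' "$oops" | sed -n 's/.*RDI: \([0-9a-f]*\).*/\1/p')
cr2=$(printf '%s\n' "$oops" | sed -n 's/.*CR2: \([0-9a-f]*\) .*/\1/p')

# If they match, the fault came from dereferencing the argument itself.
if [ "$rdi" = "$cr2" ]; then
    echo "fault address equals RDI: 0x$rdi ($((0x$rdi)) bytes from NULL)"
fi
```

If the two values match, the crash happened on the first dereference of the argument, which points the investigation at the caller in the trace (cl_io_sub_init, reached from lov_sub_get) rather than at cl_object_top itself.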
      

      Maloo report: https://maloo.whamcloud.com/test_sets/c1c906c6-a294-11e2-81ba-52540035b04c

      The console logs in the above Maloo report are incomplete because of TT-1107. Please refer to the attachment for the full console logs.

          People

            niu Niu Yawei (Inactive)
            yujian Jian Yu