LU-3160

recovery-random-scale test_fail_client_mds: RIP: cl_object_top+0xe/0x150 [obdclass]

Details


    Description

      While running recovery-random-scale (failing one random client and then failing the MDS), the dd operation on the other live client (wtm-4vm5) hung and the client crashed:

      2013-04-11 03:01:35: dd run starting
      + mkdir -p /mnt/lustre/d0.dd-wtm-4vm5.rosso.whamcloud.com
      + /usr/bin/lfs setstripe -c -1 /mnt/lustre/d0.dd-wtm-4vm5.rosso.whamcloud.com
      + cd /mnt/lustre/d0.dd-wtm-4vm5.rosso.whamcloud.com
      ++ /usr/bin/lfs df /mnt/lustre/d0.dd-wtm-4vm5.rosso.whamcloud.com
      + FREE_SPACE=97381848
      + BLKS=21910915
      + echo 'Free disk space is 97381848, 4k blocks to dd is 21910915'
      + load_pid=2634
      + wait 2634
      + dd bs=4k count=21910915 status=noxfer if=/dev/zero of=/mnt/lustre/d0.dd-wtm-4vm5.rosso.whamcloud.com/dd-file
      

      Console log on wtm-4vm5:

      BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
      IP: [<ffffffffa05b4b4e>] cl_object_top+0xe/0x150 [obdclass]
      PGD 7b01e067 PUD 7b016067 PMD 0
      Oops: 0000 [#1] SMP
      last sysfs file: /sys/devices/system/cpu/possible
      CPU 0
      Modules linked in: lmv(U) mgc(U) lustre(U) lov(U) osc(U) mdc(U) fid(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) sha512_generic sha256_generic libcfs(U) nfsd exportfs autofs4 nfs lockd fscache nfs_acl auth_rpcgss sunrpc ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core microcode virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
      
      Pid: 2683, comm: flush-lustre-1 Not tainted 2.6.32-279.19.1.el6.x86_64 #1 Red Hat KVM
      RIP: 0010:[<ffffffffa05b4b4e>]  [<ffffffffa05b4b4e>] cl_object_top+0xe/0x150 [obdclass]
      RSP: 0018:ffff88004c3c5980  EFLAGS: 00010282
      RAX: ffff88007bc75800 RBX: ffff88007d1c21e8 RCX: 0000000000000098
      RDX: ffff88003e2bb200 RSI: ffffffffa0602400 RDI: 0000000000000098
      RBP: ffff88004c3c5990 R08: 0000000000000001 R09: 0000000000000000
      R10: 000000000000000f R11: 000000000000000f R12: ffff88007d1bc3d0
      R13: 0000000000000004 R14: 0000000000000098 R15: ffff88007bc75800
      FS:  0000000000000000(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
      CR2: 0000000000000098 CR3: 000000007b02d000 CR4: 00000000000006f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process flush-lustre-1 (pid: 2683, threadinfo ffff88004c3c4000, task ffff88007b730080)
      Stack:
       ffff88004c3c5990 ffff88007d1c21e8 ffff88004c3c59d0 ffffffffa05c575d
      <d> 0000000000000000 ffff8800430e3c00 ffff88007d1bc378 0000000000000000
      <d> ffff88007d1bf768 ffff88007b020e80 ffff88004c3c5a30 ffffffffa09d3488
      Call Trace:
       [<ffffffffa05c575d>] cl_io_sub_init+0x3d/0xc0 [obdclass]
       [<ffffffffa09d3488>] lov_sub_get+0x218/0x690 [lov]
       [<ffffffffa09d5116>] lov_io_iter_init+0xd6/0x480 [lov]
       [<ffffffffa05c279d>] cl_io_iter_init+0x5d/0x110 [obdclass]
       [<ffffffffa05c6d3c>] cl_io_loop+0x4c/0x1b0 [obdclass]
       [<ffffffffa0a5233b>] cl_sync_file_range+0x2fb/0x4e0 [lustre]
       [<ffffffffa0a7ba7f>] ll_writepages+0x6f/0x1a0 [lustre]
       [<ffffffff811255d1>] do_writepages+0x21/0x40
       [<ffffffff8119fe8d>] writeback_single_inode+0xdd/0x290
       [<ffffffff811a029e>] writeback_sb_inodes+0xce/0x180
       [<ffffffff811a03fb>] writeback_inodes_wb+0xab/0x1b0
       [<ffffffff811a079b>] wb_writeback+0x29b/0x3f0
       [<ffffffff814e9c50>] ? thread_return+0x4e/0x76e
       [<ffffffff8107d572>] ? del_timer_sync+0x22/0x30
       [<ffffffff811a0a89>] wb_do_writeback+0x199/0x240
       [<ffffffff811a0b93>] bdi_writeback_task+0x63/0x1b0
       [<ffffffff81090857>] ? bit_waitqueue+0x17/0xd0
       [<ffffffff81134170>] ? bdi_start_fn+0x0/0x100
       [<ffffffff811341f6>] bdi_start_fn+0x86/0x100
       [<ffffffff81134170>] ? bdi_start_fn+0x0/0x100
       [<ffffffff81090626>] kthread+0x96/0xa0
       [<ffffffff8100c0ca>] child_rip+0xa/0x20
       [<ffffffff81090590>] ? kthread+0x0/0xa0
       [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      Code: 04 00 00 00 04 00 e8 52 b7 e8 ff 48 c7 c7 60 2b 60 a0 e8 16 b3 e7 ff 66 0f 1f 44 00 00 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 00 <48> 8b 07 0f 1f 80 00 00 00 00 48 89 c2 48 8b 80 88 00 00 00 48
      RIP  [<ffffffffa05b4b4e>] cl_object_top+0xe/0x150 [obdclass]
       RSP <ffff88004c3c5980>
      

      Maloo report: https://maloo.whamcloud.com/test_sets/c1c906c6-a294-11e2-81ba-52540035b04c

      The console logs in the above Maloo report were not gathered completely due to TT-1107. Please refer to the attachment for the full console logs.

      Attachments

        Activity

          pjones Peter Jones added a comment -

          Landed for 2.4


          jay Jinshan Xiong (Inactive) added a comment -

          This should be a problem with force umount causing the failure in the layout refresh. I have updated the patch; please take a look.
          jay Jinshan Xiong (Inactive) added a comment (edited) -

          Do you know why we didn't flush dirty on layout lock revocation in the beginning? Thanks.

          We don't want to introduce a cascading problem: if one of the OSTs is inaccessible, the client will be evicted by the MDT because it cannot cancel the layout lock in time.

          If what you said were true, we should have seen this problem during migration, because we use an orphan object to restripe. I will take a look at this.


          niu Niu Yawei (Inactive) added a comment -

          It looks like we chose to flush dirty pages when changing the layout but not on layout lock revocation. I'm not sure that was the best choice, but the problem can be worked around another way: ignore the -ENOENT error on layout refresh for I/O.
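
          A minimal sketch of that workaround idea, assuming the layout-refresh call sits in the IO init path. ll_layout_refresh() and ci_ignore_layout are names that appear elsewhere in this ticket; the CIT_FSYNC check and the exact placement are assumptions for illustration, not the landed patch:

          /* Sketch only: tolerate -ENOENT from layout refresh when the IO is
           * just flushing dirty pages, since the file may already have been
           * unlinked on the MDS. */
          static int io_init_refresh_layout(struct inode *inode, struct cl_io *io)
          {
                  __u32 gen = 0;
                  int rc = 0;

                  if (!io->ci_ignore_layout) {
                          rc = ll_layout_refresh(inode, &gen);
                          if (rc == -ENOENT && io->ci_type == CIT_FSYNC)
                                  /* File removed on the MDS: let the flush
                                   * proceed so the dirty pages can still be
                                   * cleaned instead of LBUGging on umount. */
                                  rc = 0;
                  }
                  return rc;
          }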

          niu Niu Yawei (Inactive) added a comment -

          Hi Xiong,

          Why don't we flush the dirty pages before losing the layout lock (see ll_md_blocking_ast())?

          This patch caused an LBUG when running conf-sanity test_0:

          • the client writes data to a file, then unlinks it;
          • the layout lock is revoked, but the dirty pages are not flushed;
          • on client umount, the kernel tries to flush the dirty pages via ll_writepages(), which now verifies the layout (what this patch added), but the layout fetch inevitably fails because the file has been removed on the MDS;
          • the dirty pages are never flushed, and the LBUG is triggered.

          Do you know why we didn't flush dirty on layout lock revocation in the beginning? Thanks.


          niu Niu Yawei (Inactive) added a comment -

          It looks like the patch revealed another problem: conf-sanity test_0 will always hit an LBUG (see LU-3230). I'm looking into it.

          niu Niu Yawei (Inactive) added a comment -

          http://review.whamcloud.com/6154

          jay Jinshan Xiong (Inactive) added a comment -

          Yes, this is exactly what we're doing. However, there are some cases where we need to start an IO at the OSC layer that may already hold the conf lock, which could cause a deadlock. ci_ignore_layout was introduced to indicate whether the IO needs to hold the layout lock before starting. Usually, if the IO operates on pages, we are safe, because a layout change has to clean up the pages first.

          For the cl_sync_file_range() case, I set ci_ignore_layout to 1 because I thought that if there were dirty pages, the layout could not be changed. Obviously I was wrong, because there is a race.

          For this specific problem, it can be fixed by setting ci_ignore_layout to 0. Please also check the other callers of cl_sync_file_range() to make sure the assumption holds for them.

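          A minimal sketch of the fix direction described above, assuming a ccc_env_thread_io()-style cl_io setup inside cl_sync_file_range() (that helper name is an assumption here); only the ci_ignore_layout line is the point, and the other fsync parameters are omitted. The review referenced in this ticket (http://review.whamcloud.com/6154) carries the actual change:

          io = ccc_env_thread_io(env);
          io->ci_obj = clob;
          io->ci_ignore_layout = 0;  /* was 1: honour the layout lock so the
                                      * flush cannot race with a layout change */

          rc = cl_io_init(env, io, CIT_FSYNC, io->ci_obj);
          if (rc == 0)
                  rc = cl_io_loop(env, io);
          cl_io_fini(env, io);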

          niu Niu Yawei (Inactive) added a comment -

          Hi Xiong,

          I don't see why we didn't use lo_type_guard to protect the layout in the first place; it looks simpler and safer than the current way (active_ios & waitq).

          • Take the read lock for IO: call lov_conf_freeze() in lov_io_init_raid0() and lov_conf_thaw() in lov_io_fini();
          • Take the write lock for a layout change: call lov_conf_lock() and lov_conf_unlock() in lov_layout_change() to make the "layout delete -> layout reinstall" sequence atomic.

          Did I miss anything?

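          A sketch of the two-lock scheme described in the comment above, using the function and field names it mentions (lo_type_guard, lov_conf_freeze/thaw, lov_conf_lock/unlock); the signatures and bodies are illustrative only, not the current implementation:

          /* IO path: hold lo_type_guard for read so the striping cannot be
           * swapped out while the raid0 sub-IOs are being set up. */
          static int lov_io_init_raid0_sketch(const struct lu_env *env,
                                              struct lov_object *lov,
                                              struct lov_io *lio)
          {
                  lov_conf_freeze(lov);   /* read lock on lo_type_guard */
                  /* ... build sub-IOs against the current layout ... */
                  return 0;               /* dropped later in lov_io_fini() */
          }

          static void lov_io_fini_sketch(struct lov_object *lov)
          {
                  lov_conf_thaw(lov);     /* release the read lock */
          }

          /* Layout change: hold the write lock so the "layout delete ->
           * layout reinstall" sequence is atomic w.r.t. in-flight IO. */
          static int lov_layout_change_sketch(const struct lu_env *env,
                                              struct lov_object *lov,
                                              const struct cl_object_conf *conf)
          {
                  int rc = 0;

                  lov_conf_lock(lov);     /* write lock on lo_type_guard */
                  /* ... delete the old layout, install the new one ... */
                  lov_conf_unlock(lov);
                  return rc;
          }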

          jay Jinshan Xiong (Inactive) added a comment -

          Niu, I think you're absolutely right about this. I extracted the stack trace from the vmcore; here is another process:

          PID: 11897  TASK: ffff8808365a3500  CPU: 25  COMMAND: "dd"
           #0 [ffff88044e527e90] crash_nmi_callback at ffffffff81029796
           #1 [ffff88044e527ea0] notifier_call_chain at ffffffff814ef745
           #2 [ffff88044e527ee0] atomic_notifier_call_chain at ffffffff814ef7aa
           #3 [ffff88044e527ef0] notify_die at ffffffff810969ae
           #4 [ffff88044e527f20] do_nmi at ffffffff814ed3c3
           #5 [ffff88044e527f50] nmi at ffffffff814eccd0
              [exception RIP: strrchr+23]
              RIP: ffffffff81270227  RSP: ffff8808344b3268  RFLAGS: 00000202
              RAX: ffffffffa05c21bb  RBX: ffff880418403200  RCX: 0000000000000000
              RDX: 0000000000000000  RSI: 000000000000002f  RDI: ffffffffa05c21b0
              RBP: ffff8808344b3268   R8: 0000000000000073   R9: 00000000fffffffc
              R10: 0000000000000001  R11: 000000000000000f  R12: ffff880833a92f08
              R13: ffff88041d764f38  R14: ffff880418403200  R15: ffffffffa060ed80
              ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
          --- <NMI exception stack> ---
           #6 [ffff8808344b3268] strrchr at ffffffff81270227
           #7 [ffff8808344b3270] libcfs_debug_vmsg2 at ffffffffa045c76a [libcfs]
           #8 [ffff8808344b33e0] libcfs_debug_msg at ffffffffa045d2c1 [libcfs]
           #9 [ffff8808344b3440] cl_page_put at ffffffffa05a056d [obdclass]
          #10 [ffff8808344b34b0] cl_page_delete0 at ffffffffa05a10eb [obdclass]
          #11 [ffff8808344b34f0] cl_page_delete at ffffffffa05a1412 [obdclass]
          #12 [ffff8808344b3510] ll_invalidatepage at ffffffffa0a759bd [lustre]
          #13 [ffff8808344b3550] vvp_page_discard at ffffffffa0a87cdc [lustre]
          #14 [ffff8808344b3580] cl_page_invoid at ffffffffa059cf98 [obdclass]
          #15 [ffff8808344b35d0] cl_page_discard at ffffffffa059d0a3 [obdclass]
          #16 [ffff8808344b35e0] discard_cb at ffffffffa05a4a84 [obdclass]
          #17 [ffff8808344b3620] cl_page_gang_lookup at ffffffffa05a1f14 [obdclass]
          #18 [ffff8808344b36d0] cl_lock_discard_pages at ffffffffa05a489e [obdclass]
          #19 [ffff8808344b3720] osc_lock_flush at ffffffffa091aa6f [osc]
          #20 [ffff8808344b3780] osc_lock_cancel at ffffffffa091acd7 [osc]
          #21 [ffff8808344b37d0] cl_lock_cancel0 at ffffffffa05a2735 [obdclass]
          #22 [ffff8808344b3800] cl_lock_cancel at ffffffffa05a32db [obdclass]
          #23 [ffff8808344b3820] cl_locks_prune at ffffffffa05a6bd3 [obdclass]
          #24 [ffff8808344b38c0] lov_delete_raid0 at ffffffffa09aa7dc [lov]
          #25 [ffff8808344b3970] lov_conf_set at ffffffffa09ab1fb [lov]
          #26 [ffff8808344b39e0] cl_conf_set at ffffffffa059b298 [obdclass]
          #27 [ffff8808344b3a10] ll_layout_conf at ffffffffa0a2e1b8 [lustre]
          #28 [ffff8808344b3a50] ll_layout_lock_set at ffffffffa0a3bced [lustre]
          #29 [ffff8808344b3b40] ll_layout_refresh at ffffffffa0a3f61b [lustre]
          #30 [ffff8808344b3c90] vvp_io_init at ffffffffa0a8b61f [lustre]
          #31 [ffff8808344b3cd0] cl_io_init0 at ffffffffa05a98e8 [obdclass]
          #32 [ffff8808344b3d10] cl_io_init at ffffffffa05ac6a4 [obdclass]
          #33 [ffff8808344b3d50] cl_io_rw_init at ffffffffa05adf64 [obdclass]
          #34 [ffff8808344b3da0] ll_file_io_generic at ffffffffa0a31598 [lustre]
          #35 [ffff8808344b3e20] ll_file_aio_write at ffffffffa0a32c12 [lustre]
          #36 [ffff8808344b3e80] ll_file_write at ffffffffa0a32efc [lustre]
          #37 [ffff8808344b3ef0] vfs_write at ffffffff81176588
          #38 [ffff8808344b3f30] sys_write at ffffffff81176e81
          #39 [ffff8808344b3f80] system_call_fastpath at ffffffff8100b072
              RIP: 00000030690dae60  RSP: 00007fffd66dfbc8  RFLAGS: 00000206
              RAX: 0000000000000001  RBX: ffffffff8100b072  RCX: 00000030690dae60
              RDX: 0000000000001000  RSI: 00000000019e3000  RDI: 0000000000000001
              RBP: 00000000019e3000   R8: 000000306938eee8   R9: 0000000000000001
              R10: 0000000000003003  R11: 0000000000000246  R12: 00000000019e2fff
              R13: 0000000000000000  R14: 0000000000001000  R15: 0000000000000000
              ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b
          

          It's changing the layout. However, I don't know why this happened; unfortunately, I can't extract the Lustre log.


          People

            Assignee: niu Niu Yawei (Inactive)
            Reporter: yujian Jian Yu
            Votes: 0
            Watchers: 12
