Lustre / LU-4619

Client GPF, null pointer & LBUG using FIEMAP

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.6.0, Lustre 2.5.2
    • Affects Version/s: Lustre 2.4.0, Lustre 2.4.1, Lustre 2.5.0, Lustre 2.6.0
    • 3
    • 12644

    Description

      Running a simple program designed to exercise the FIEMAP ioctl on a master client (from 140211) against a slightly older master server on CentOS 6.4 produces the failures below. The code will be attached to this ticket. The same code reproduces all three, and which one we hit seems to be more or less random:

      1) Null pointer dereference in cl_object_top

      2) Assertion failure and LBUG in mdc_close:

      <0>LustreError: 1752:0:(mdc_request.c:913:mdc_close()) ASSERTION( mod->mod_open_req != NULL && mod->mod_open_req->rq_type != LI_POISON ) failed: POISONED open (null)!
      <0>LustreError: 1752:0:(mdc_request.c:913:mdc_close()) LBUG
      

      3) GPF in mdc_close trying to print a debug message.

      Running the code 5-10 times consistently crashes my client with one of these errors. The included 'runit' script does this, though it sometimes needs to be run twice before the client crashes.

      Also, after crashing the client, something is corrupted in the fiemap directory containing the code, and I have to delete it and replace it with a copy from elsewhere before I can run the code again. (The fiemap code also sometimes reports errors related to incorrect behavior before the client crashes.)
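
For readers unfamiliar with the interface, the following is a minimal sketch of what exercising the FIEMAP ioctl looks like from user space on Linux. It is not the reproducer attached to this ticket. The helper name `dump_extents`, the `MAX_EXTENTS` buffer size, and the choice of `FIEMAP_FLAG_SYNC` are assumptions made for illustration only.

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>      /* FS_IOC_FIEMAP */
#include <linux/fiemap.h>  /* struct fiemap, struct fiemap_extent */

#define MAX_EXTENTS 32  /* arbitrary buffer size chosen for this sketch */

/* Hypothetical helper: print up to MAX_EXTENTS extents of `path`.
 * Returns the number of extents mapped, or -1 on any error. */
int dump_extents(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        perror("open");
        return -1;
    }

    /* struct fiemap is a header followed by an array of extents. */
    size_t size = sizeof(struct fiemap) +
                  MAX_EXTENTS * sizeof(struct fiemap_extent);
    struct fiemap *fm = calloc(1, size);
    if (fm == NULL) {
        close(fd);
        return -1;
    }

    fm->fm_start = 0;
    fm->fm_length = FIEMAP_MAX_OFFSET;  /* map the whole file */
    fm->fm_flags = FIEMAP_FLAG_SYNC;    /* assumption: sync data before mapping */
    fm->fm_extent_count = MAX_EXTENTS;

    if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
        perror("ioctl(FS_IOC_FIEMAP)");
        free(fm);
        close(fd);
        return -1;
    }

    for (unsigned int i = 0; i < fm->fm_mapped_extents; i++) {
        const struct fiemap_extent *e = &fm->fm_extents[i];
        printf("extent %u: logical=%llu physical=%llu length=%llu flags=0x%x\n",
               i,
               (unsigned long long)e->fe_logical,
               (unsigned long long)e->fe_physical,
               (unsigned long long)e->fe_length,
               (unsigned int)e->fe_flags);
    }

    int mapped = (int)fm->fm_mapped_extents;
    free(fm);
    close(fd);
    return mapped;
}
```

Calling `dump_extents()` on a file in a Lustre mount from a small `main` would issue one FIEMAP request per call. File systems that do not implement FIEMAP make the ioctl fail, in which case the helper returns -1.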

      Here are the full stack traces for those bugs.

      Full stack trace from 1) above:

      <1>BUG: unable to handle kernel NULL pointer dereference at 0000000000000080
      <1>IP: [<ffffffffa040b3de>] cl_object_top+0xe/0x150 [obdclass]
      <4>PGD 374fb067 PUD 7d5d9067 PMD 0
      <4>Oops: 0000 [#1] SMP
      <4>last sysfs file: /sys/devices/virtual/block/dm-1/dm/name
      <4>CPU 1
      <4>Modules linked in: osc(U) lmv(U) mgc(U) nfs lockd fscache auth_rpcgss nfs_acl lustre(U) lov(U) mdc(U) fid(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) sha512_generic sha256_generic crc32c_intel libcfs(U) sunrpc ipv6 ppdev parport_pc parport e1000 vmware_balloon sg i2c_piix4 i2c_core shpchp ext4 jbd2 mbcache sd_mod crc_t10dif sr_mod cdrom mptspi mptscsih mptbase scsi_transport_spi pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
      <4>
      <4>Pid: 1614, comm: flush-lustre-1 Not tainted 2.6.32.358.18.1.el6_lustre #4 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
      <4>RIP: 0010:[<ffffffffa040b3de>]  [<ffffffffa040b3de>] cl_object_top+0xe/0x150 [obdclass]
      <4>RSP: 0018:ffff88007c9319d0  EFLAGS: 00010282
      <4>RAX: ffff88007c916800 RBX: ffff88007d62f488 RCX: 0000000000000080
      <4>RDX: ffff880079b3f400 RSI: ffffffffa04676a0 RDI: 0000000000000080
      <4>RBP: ffff88007c9319e0 R08: 0000000000000001 R09: 0000000000000000
      <4>R10: 0000000000000000 R11: 0000000000000000 R12: ffff88007d62ba10
      <4>R13: 0000000000000004 R14: 0000000000000080 R15: ffff88007c916800
      <4>FS:  0000000000000000(0000) GS:ffff880002280000(0000) knlGS:0000000000000000
      <4>CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
      <4>CR2: 0000000000000080 CR3: 000000007b4ac000 CR4: 00000000000007e0
      <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      <4>Process flush-lustre-1 (pid: 1614, threadinfo ffff88007c930000, task ffff88007cea6ae0)
      <4>Stack:
      <4> ffff88007c9319e0 ffff88007d62f488 ffff88007c931a20 ffffffffa041abed
      <4><d> 0000000000000000 ffff88007b4b34c0 ffff88007d62b9b8 0000000000000000
      <4><d> ffff88007be6a608 ffff88007d6d2898 ffff88007c931a80 ffffffffa08de9fb
      <4>Call Trace:
      <4> [<ffffffffa041abed>] cl_io_sub_init+0x3d/0xc0 [obdclass]
      <4> [<ffffffffa08de9fb>] lov_sub_get+0x22b/0x690 [lov]
      <4> [<ffffffffa08e1376>] lov_io_iter_init+0xd6/0x480 [lov]
      <4> [<ffffffffa0417dcd>] cl_io_iter_init+0x5d/0x110 [obdclass]
      <4> [<ffffffffa041bdac>] cl_io_loop+0x4c/0x1b0 [obdclass]
      <4> [<ffffffffa0968d1b>] cl_sync_file_range+0x31b/0x500 [lustre]
      <4> [<ffffffffa0993e5b>] ll_writepages+0x8b/0x1c0 [lustre]
      <4> [<ffffffff8112e181>] do_writepages+0x21/0x40
      <4> [<ffffffff811aca6d>] writeback_single_inode+0xdd/0x290
      <4> [<ffffffff811ace7e>] writeback_sb_inodes+0xce/0x180
      <4> [<ffffffff811ad242>] wb_writeback+0x162/0x3f0
      <4> [<ffffffff8150e570>] ? thread_return+0x4e/0x76e
      <4> [<ffffffff81081be2>] ? del_timer_sync+0x22/0x30
      <4> [<ffffffff811ad58b>] wb_do_writeback+0xbb/0x240
      <4> [<ffffffff811ad773>] bdi_writeback_task+0x63/0x1b0
      <4> [<ffffffff81096c67>] ? bit_waitqueue+0x17/0xd0
      <4> [<ffffffff8113cc20>] ? bdi_start_fn+0x0/0x100
      <4> [<ffffffff8113cca6>] bdi_start_fn+0x86/0x100
      <4> [<ffffffff8113cc20>] ? bdi_start_fn+0x0/0x100
      <4> [<ffffffff81096a36>] kthread+0x96/0xa0
      <4> [<ffffffff8100c0ca>] child_rip+0xa/0x20
      <4> [<ffffffff810969a0>] ? kthread+0x0/0xa0
      <4> [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      <4>Code: 48 89 df e8 85 e0 d5 e0 48 c7 c3 f4 ff ff ff e9 2a ff ff ff 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 00 <48> 8b 07 0f 1f 80 00 00 00 00 48 89 c2 48 8b 40 70 48 85 c0 75
      <1>RIP  [<ffffffffa040b3de>] cl_object_top+0xe/0x150 [obdclass]
      <4> RSP <ffff88007c9319d0>
      <4>CR2: 0000000000000080
      

      Full stack trace from 2) above:

      <0>LustreError: 1752:0:(mdc_request.c:913:mdc_close()) ASSERTION( mod->mod_open_req != NULL && mod->mod_open_req->rq_type != LI_POISON ) failed: POISONED open (null)!
      <0>LustreError: 1752:0:(mdc_request.c:913:mdc_close()) LBUG
      <4>Pid: 1752, comm: rm
      <4>
      <4>Call Trace:
      <4> [<ffffffffa0297895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      <4> [<ffffffffa0297e97>] lbug_with_loc+0x47/0xb0 [libcfs]
      <4> [<ffffffffa08a1ec5>] mdc_close+0x845/0xa50 [mdc]
      <4> [<ffffffffa099beb9>] ? ll_i2suppgid+0x19/0x30 [lustre]
      <4> [<ffffffffa059f626>] lmv_close+0x2c6/0x530 [lmv]
      <4> [<ffffffffa096da67>] ll_close_inode_openhandle+0x2f7/0x10b0 [lustre]
      <4> [<ffffffffa096e9cc>] ll_md_real_close+0x1ac/0x220 [lustre]
      <4> [<ffffffffa099e48d>] ll_md_blocking_ast+0x39d/0x7d0 [lustre]
      <4> [<ffffffffa03e0bed>] ? class_handle_unhash_nolock+0x2d/0x150 [obdclass]
      <4> [<ffffffffa03e111c>] ? class_handle_unhash+0x3c/0x50 [obdclass]
      <4> [<ffffffffa05ece9c>] ldlm_cancel_callback+0x6c/0x1a0 [ptlrpc]
      <4> [<ffffffffa0606b8a>] ldlm_cli_cancel_local+0x8a/0x470 [ptlrpc]
      <4> [<ffffffffa0609e8e>] ldlm_cli_cancel_list_local+0xee/0x290 [ptlrpc]
      <4> [<ffffffffa060a1c3>] ldlm_cancel_resource_local+0x193/0x290 [ptlrpc]
      <4> [<ffffffffa08a49ee>] mdc_resource_get_unused+0x14e/0x2c0 [mdc]
      <4> [<ffffffffa08a5e0d>] mdc_unlink+0x3ad/0x500 [mdc]
      <4> [<ffffffffa059c18b>] lmv_unlink+0x1db/0x7a0 [lmv]
      <4> [<ffffffffa0982858>] ? ll_prep_md_op_data+0x1a8/0x4a0 [lustre]
      <4> [<ffffffffa09a2678>] ll_unlink+0x158/0x610 [lustre]
      <4> [<ffffffff8118fe90>] vfs_unlink+0xa0/0xf0
      <4> [<ffffffff8118ebca>] ? lookup_hash+0x3a/0x50
      <4> [<ffffffff81192235>] do_unlinkat+0xf5/0x1b0
      <4> [<ffffffff810dc937>] ? audit_syscall_entry+0x1d7/0x200
      <4> [<ffffffff81192452>] sys_unlinkat+0x22/0x40
      <4> [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
      

      Full stack trace from 3) above:

      <4>general protection fault: 0000 [#1] SMP
      <4>last sysfs file: /sys/module/ptlrpc/initstate
      <4>CPU 1
      <4>Modules linked in: osc(U) lmv(U) mgc(U) nfs lockd fscache auth_rpcgss nfs_acl lustre(U) lov(U) mdc(U) fid(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) sha512_generic sha256_generic crc32c_intel libcfs(U) sunrpc ipv6 ppdev parport_pc parport e1000 vmware_balloon sg i2c_piix4 i2c_core shpchp ext4 jbd2 mbcache sd_mod crc_t10dif sr_mod cdrom mptspi mptscsih mptbase scsi_transport_spi pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
      <4>
      <4>Pid: 1858, comm: rm Not tainted 2.6.32.358.18.1.el6_lustre #4 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
      <4>RIP: 0010:[<ffffffff8128389f>]  [<ffffffff8128389f>] memcpy+0x9f/0x120
      <4>RSP: 0018:ffff88007ae4b4f0  EFLAGS: 00010202
      <4>RAX: db73880000000601 RBX: db73880000000601 RCX: 0000000000000011
      <4>RDX: 0000000000000011 RSI: ffffffffa08b274f RDI: db73880000000601
      <4>RBP: ffff88007ae4b588 R08: 6374616d20404040 R09: 6e65706f20646568
      <4>R10: 686374616d204040 R11: 206e65706f206465 R12: ffffffffa08b2760
      <4>R13: ffffffffa08b274f R14: ffff88007ae4b678 R15: db73880000001000
      <4>FS:  00007f2f14f58700(0000) GS:ffff880002280000(0000) knlGS:0000000000000000
      <4>CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      <4>CR2: 00000000004073b0 CR3: 000000007bd98000 CR4: 00000000000007e0
      <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      <4>Process rm (pid: 1858, threadinfo ffff88007ae4a000, task ffff88007b377500)
      <4>Stack:
      <4> ffffffff81281826 00000000ffffffff ffff88007ae4b558 ffffffffa08b7080
      <4><d> 0000000000000000 0000000000000011 00000601810a15fa 00000000000009ff
      <4><d> db73880000000601 0000000000000000 0000000000000000 0000000000000000
      <4>Call Trace:
      <4> [<ffffffff81281826>] ? vsnprintf+0x336/0x5e0
      <4> [<ffffffffa029727b>] ? cfs_set_ptldebug_header+0x2b/0xc0 [libcfs]
      <4> [<ffffffffa02a717a>] libcfs_debug_vmsg2+0x2ea/0xbb0 [libcfs]
      <4> [<ffffffff81281708>] ? vsnprintf+0x218/0x5e0
      <4> [<ffffffffa02a37e5>] ? libcfs_nid2str+0x155/0x160 [libcfs]
      <4> [<ffffffffa0639034>] _debug_req+0x454/0x680 [ptlrpc]
      <4> [<ffffffffa08a1c89>] mdc_close+0x609/0xa50 [mdc]
      <4> [<ffffffffa099beb9>] ? ll_i2suppgid+0x19/0x30 [lustre]
      <4> [<ffffffffa059f626>] lmv_close+0x2c6/0x530 [lmv]
      <4> [<ffffffffa096da67>] ll_close_inode_openhandle+0x2f7/0x10b0 [lustre]
      <4> [<ffffffffa096e9cc>] ll_md_real_close+0x1ac/0x220 [lustre]
      <4> [<ffffffffa099e48d>] ll_md_blocking_ast+0x39d/0x7d0 [lustre]
      <4> [<ffffffffa03e0bed>] ? class_handle_unhash_nolock+0x2d/0x150 [obdclass]
      <4> [<ffffffffa03e111c>] ? class_handle_unhash+0x3c/0x50 [obdclass]
      <4> [<ffffffffa05ece9c>] ldlm_cancel_callback+0x6c/0x1a0 [ptlrpc]
      <4> [<ffffffffa0606b8a>] ldlm_cli_cancel_local+0x8a/0x470 [ptlrpc]
      <4> [<ffffffffa0609e8e>] ldlm_cli_cancel_list_local+0xee/0x290 [ptlrpc]
      <4> [<ffffffffa060a1c3>] ldlm_cancel_resource_local+0x193/0x290 [ptlrpc]
      <4> [<ffffffffa08a49ee>] mdc_resource_get_unused+0x14e/0x2c0 [mdc]
      <4> [<ffffffffa08a5e0d>] mdc_unlink+0x3ad/0x500 [mdc]
      <4> [<ffffffffa059c18b>] lmv_unlink+0x1db/0x7a0 [lmv]
      <4> [<ffffffffa0982858>] ? ll_prep_md_op_data+0x1a8/0x4a0 [lustre]
      <4> [<ffffffffa09a2678>] ll_unlink+0x158/0x610 [lustre]
      <4> [<ffffffff8118fe90>] vfs_unlink+0xa0/0xf0
      <4> [<ffffffff8118ebca>] ? lookup_hash+0x3a/0x50
      <4> [<ffffffff81192235>] do_unlinkat+0xf5/0x1b0
      <4> [<ffffffff810dc937>] ? audit_syscall_entry+0x1d7/0x200
      <4> [<ffffffff81192452>] sys_unlinkat+0x22/0x40
      <4> [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
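
One observation on the register dump in this last trace: R08 through R11 contain printable ASCII when their bytes are read in little-endian order, and the memcpy destination in RDI (db73880000000601) is not a canonical x86-64 address, which is a condition that raises a general protection fault rather than a page fault. The sketch below decodes such register values and checks canonicality. The helper names are hypothetical, and the canonical check assumes 48-bit virtual addresses.

```c
#include <stdint.h>

/* Hypothetical helper: write the 8 bytes of `reg` into `out` in
 * little-endian (x86-64 memory) order, replacing non-printable bytes
 * with '.', and NUL-terminate. `out` must hold at least 9 bytes. */
void reg_to_ascii(uint64_t reg, char out[9])
{
    for (int i = 0; i < 8; i++) {
        unsigned char c = (unsigned char)(reg >> (8 * i));
        out[i] = (c >= 0x20 && c < 0x7f) ? (char)c : '.';
    }
    out[8] = '\0';
}

/* Assumes 48-bit virtual addresses: an address is canonical when
 * bits 63..47 are all zeros or all ones. Returns 1 if canonical. */
int is_canonical(uint64_t addr)
{
    uint64_t top = addr >> 47;  /* the upper 17 bits */
    return top == 0 || top == 0x1ffff;
}
```

Applied to the dump above, R08 (6374616d20404040) decodes to "@@@ matc" and R09 (6e65706f20646568) to "hed open", so the bytes in flight spell "@@@ matched open". That is consistent with the report that the fault happens while mdc_close is printing a debug message.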
      

          People

            bobijam Zhenyu Xu
            paf Patrick Farrell