
Oops in mdt_dump_lmm+0x16/0x410 [mdt]


    Description

      Nov 6 16:38:07 lustre-mds-0-0 kernel: BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
      Nov 6 16:38:07 lustre-mds-0-0 kernel: IP: [<ffffffffa0cfb246>] mdt_dump_lmm+0x16/0x410 [mdt]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: PGD 0
      Nov 6 16:38:07 lustre-mds-0-0 kernel: Oops: 0000 [#1] SMP
      Nov 6 16:38:07 lustre-mds-0-0 kernel: last sysfs file: /sys/devices/pci0000:00/0000:00:09.0/0000:19:00.0/0000:1a:04.0/0000:1c:00.0/irq
      Nov 6 16:38:07 lustre-mds-0-0 kernel: CPU 4
      Nov 6 16:38:07 lustre-mds-0-0 kernel: Modules linked in: osp(U) lod(U) mdt(U) mgs(U) mgc(U) fsfilt_ldiskfs(U) osd_ldiskfs(U) lquota(U) mdd(U) lustre(U) lov(U) osc(U) mdc(U) fid(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) sha512_generic sha256_generic crc32c_intel libcfs(U) ldiskfs(U) autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf iptable_filter ip_tables ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core microcode serio_raw i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support i7core_edac edac_core ioatdma raid10 myri10ge ses enclosure sg igb dca ptp pps_core sr_mod cdrom ext4 mbcache jbd2 sd_mod crc_t10dif usb_storage ahci mptsas mptscsih mptbase scsi_transport_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
      Nov 6 16:38:07 lustre-mds-0-0 kernel:
      Nov 6 16:38:07 lustre-mds-0-0 kernel: Pid: 4408, comm: mdt02_002 Not tainted 2.6.32-358.14.1.el6_lustre.g0a46394.x86_64 #1 SUN MICROSYSTEMS SUN FIRE X4170 SERVER /ASSY,MOTHERBOARD,X4170
      Nov 6 16:38:07 lustre-mds-0-0 kernel: RIP: 0010:[<ffffffffa0cfb246>] [<ffffffffa0cfb246>] mdt_dump_lmm+0x16/0x410 [mdt]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: RSP: 0018:ffff88066bf87a20 EFLAGS: 00010282
      Nov 6 16:38:07 lustre-mds-0-0 kernel: RAX: 0000000000000003 RBX: ffff88066bf7e000 RCX: ffffc9002118d6f0
      Nov 6 16:38:07 lustre-mds-0-0 kernel: RDX: ffff88066914bc00 RSI: 0000000000000000 RDI: 0000000000000040
      Nov 6 16:38:07 lustre-mds-0-0 kernel: RBP: ffff88066bf87a70 R08: 0000000000008001 R09: ffff88066bf7e510
      Nov 6 16:38:07 lustre-mds-0-0 kernel: R10: ffff88067451c49c R11: ffffffffa03b89b0 R12: ffff880669236070
      Nov 6 16:38:07 lustre-mds-0-0 kernel: R13: ffff8806793c77a0 R14: 0000000000000038 R15: ffff880669208a68
      Nov 6 16:38:07 lustre-mds-0-0 kernel: FS: 00007f00c33bf700(0000) GS:ffff88038ac00000(0000) knlGS:0000000000000000
      Nov 6 16:38:07 lustre-mds-0-0 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      Nov 6 16:38:07 lustre-mds-0-0 kernel: CR2: 000000000000001c CR3: 00000006789ec000 CR4: 00000000000007e0
      Nov 6 16:38:07 lustre-mds-0-0 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      Nov 6 16:38:07 lustre-mds-0-0 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Nov 6 16:38:07 lustre-mds-0-0 kernel: Process mdt02_002 (pid: 4408, threadinfo ffff88066bf86000, task ffff88066d264080)
      Nov 6 16:38:07 lustre-mds-0-0 kernel: Stack:
      Nov 6 16:38:07 lustre-mds-0-0 kernel: ffff88066bf7e000 ffff880677269000 ffff88066bf87a70 ffffffffa0ce1832
      Nov 6 16:38:07 lustre-mds-0-0 kernel: <d> ffff880669236070 ffff88066bf7e000 ffff880669236070 ffff8806793c77a0
      Nov 6 16:38:07 lustre-mds-0-0 kernel: <d> 0000000000000038 ffff880669208a68 ffff88066bf87b00 ffffffffa0cf4b0f
      Nov 6 16:38:07 lustre-mds-0-0 kernel: Call Trace:
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa0ce1832>] ? mdt_pack_attr2body+0xe2/0x270 [mdt]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa0cf4b0f>] mdt_getattr_internal+0x56f/0x1210 [mdt]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa0cf661e>] mdt_getattr_name_lock+0xe6e/0x1980 [mdt]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa06bd135>] ? lustre_msg_buf+0x55/0x60 [ptlrpc]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa06e5646>] ? __req_capsule_get+0x166/0x700 [ptlrpc]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa06bf3c4>] ? lustre_msg_get_flags+0x34/0xb0 [ptlrpc]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa0cf73cd>] mdt_intent_getattr+0x29d/0x490 [mdt]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa0ce3f3e>] mdt_intent_policy+0x39e/0x720 [mdt]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa0675831>] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa069c1ef>] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa0ce43c6>] mdt_enqueue+0x46/0xe0 [mdt]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa0ceaab7>] mdt_handle_common+0x647/0x16d0 [mdt]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa06bebac>] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa0d243f5>] mds_regular_handle+0x15/0x20 [mdt]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa06ce3c8>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa03e85de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa03f9d9f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa06c5729>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffff81055ad3>] ? __wake_up+0x53/0x70
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa06cf75e>] ptlrpc_main+0xace/0x1700 [ptlrpc]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa06cec90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffff8100c0ca>] child_rip+0xa/0x20
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa06cec90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa06cec90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      Nov 6 16:38:07 lustre-mds-0-0 kernel: Code: 41 ab 9e ff 48 89 83 70 04 00 00 e9 2d ff ff ff 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 53 48 83 ec 28 0f 1f 44 00 00 <44> 0f b7 66 1c 41 89 fe 41 89 fd 48 89 f3 41 81 e6 00 04 06 02
      Nov 6 16:38:07 lustre-mds-0-0 kernel: RIP [<ffffffffa0cfb246>] mdt_dump_lmm+0x16/0x410 [mdt]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: RSP <ffff88066bf87a20>
      Nov 6 16:38:07 lustre-mds-0-0 kernel: CR2: 000000000000001c
      Nov 6 16:38:07 lustre-mds-0-0 kernel: --[ end trace 0dadd51afe1c36b7 ]--
      Nov 6 16:38:07 lustre-mds-0-0 kernel: Kernel panic - not syncing: Fatal exception

      We were trying to set up an active/active MDS/MDT configuration on a cluster with two MDSs and two MDTs. While trying to mount the 1.8.9 clients, we hit this panic on the MDS.

      Our goal was: start from a 1.8.x server -> upgrade to 2.4.1 -> back up and restore the single MDT to a new system with one MDT -> add another MDT on a different MDS as a remote MDT.

      The last step was to use tunefs.lustre to configure active/active HA on the MDSs.
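
      As a side note from reading the oops above: the faulting address (CR2 = 000000000000001c) matches the offset of lmm_stripe_count within struct lov_mds_md_v1, the first field mdt_dump_lmm() reads, and the faulting instruction bytes (44 0f b7 66 1c, i.e. movzwl 0x1c(%rsi),%r12d) load that 16-bit field through RSI, which is 0 in the register dump. In other words, the crash is consistent with mdt_dump_lmm() being handed a NULL LOV EA pointer. The userspace snippet below is only a hedged sketch of the 2.4-era wire layout to show the offset arithmetic; the field names are from memory, not copied from the tree.

      #include <stdio.h>
      #include <stddef.h>
      #include <stdint.h>

      /* Hedged sketch of the on-wire lov_mds_md_v1 layout; exact field
       * names vary a little between versions, but the offsets do not. */
      struct lov_mds_md_v1 {
              uint32_t lmm_magic;          /* offset  0 */
              uint32_t lmm_pattern;        /* offset  4 */
              uint64_t lmm_object_id;      /* offset  8 */
              uint64_t lmm_object_seq;     /* offset 16 */
              uint32_t lmm_stripe_size;    /* offset 24 */
              uint16_t lmm_stripe_count;   /* offset 28 = 0x1c, matches CR2 */
              uint16_t lmm_layout_gen;     /* offset 30 */
      };

      int main(void)
      {
              printf("lmm_stripe_count offset: 0x%zx\n",
                     offsetof(struct lov_mds_md_v1, lmm_stripe_count));
              return 0;
      }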


        Activity

          [LU-4222] Oops in mdt_dump_lmm+0x16/0x410 [mdt]

          I'll try it.

          Regarding my earlier comment, I do sometimes see failures with 3 or 4 mounts, but usually on only one client out of 100. So it seems to become more likely with more mounts. Also, the order of the mounts seems to matter; I don't get errors if I mount all six in reverse order :/ I can't figure out what the pattern is yet.

          nedbass Ned Bass (Inactive) added a comment
          di.wang Di Wang added a comment - - edited

          Ned, could you try this patch

          diff --git a/lustre/mdc/mdc_locks.c b/lustre/mdc/mdc_locks.c
          index 28c07b4..4541f1a 100644
          --- a/lustre/mdc/mdc_locks.c
          +++ b/lustre/mdc/mdc_locks.c
          @@ -434,6 +434,10 @@ static struct ptlrpc_request *mdc_intent_getattr_pack(struct obd_export *exp,
                   lit = req_capsule_client_get(&req->rq_pill, &RMF_LDLM_INTENT);
                   lit->opc = (__u64)it->it_op;
           
          +       if (obddev->u.cli.cl_max_mds_easize == 0) {
          +               CERROR("%s: cl_max_mds_easize is zero!\n", obddev->obd_name);
          +               RETURN(ERR_PTR(-EINVAL));
          +       }
                   /* pack the intended request */
                   mdc_getattr_pack(req, valid, it->it_flags, op_data,
                                    obddev->u.cli.cl_max_mds_easize);
          

          to see whether you can see the error message on the client side?


          My test looks something like this, where <hostlist> contains about 100 nodes (pdsh is a distributed remote command invoker):

          pdsh -w <hostlist> 'umount -a -t lustre ; for x in 1 2 3 4 5 6 ; do mount /p/lscratch$x ; done ; cat /p/lscratch5/bass6/x 2>&1'
          
          

          If I mount all six filesystems, I always get some subset of nodes for which the cat command failed with 'Bad address'.

          If I mount only five, I usually but not always get failures.

          If I mount four or fewer, I never get failures.

          nedbass Ned Bass (Inactive) added a comment

          Di, this client normally mounts 6 filesystems. The problem seems more likely to happen if 5 or more of those filesystems are mounted. I have yet to see it if I only mount 4 or fewer of the filesystems. I'm using about 100 client nodes to test so the evidence is pretty compelling.

          nedbass Ned Bass (Inactive) added a comment
          di.wang Di Wang added a comment -

          Hmm, I think there are two problems here:
          1. The MDS should not retrieve the LOVEA if there is no room for it in the request; this is easy to fix.
          2. Why do clients send a getattr RPC without reserving space for the LOVEA? IMHO, if cl_max_md_size is being set correctly, all getattr RPCs should have enough space for the LOVEA. Could you check the debug log on the client side to see where the RPC comes from? Is it from mdc_intent_getattr_pack()?

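
          To illustrate the first point, here is a rough, hedged sketch of the kind of server-side check being suggested (only ask for the LOV EA when the reply actually has room for it), roughly where mdt_getattr_internal() sets up its reply buffers. This is an illustration of the idea under those assumptions, not the landed fix.

          /* Hedged sketch, assuming the 2.4-era mdt_getattr_internal()
           * context (info->mti_pill, struct md_attr *ma): fetch the LOV EA
           * only when a server-side reply buffer was actually reserved for
           * it, so ma_lmm can never be NULL when mdt_dump_lmm() runs. */
          ma->ma_lmm      = req_capsule_server_get(info->mti_pill, &RMF_MDT_MD);
          ma->ma_lmm_size = req_capsule_get_size(info->mti_pill, &RMF_MDT_MD,
                                                 RCL_SERVER);
          if (ma->ma_lmm != NULL && ma->ma_lmm_size > 0)
                  ma->ma_need |= MA_LOV;
          else
                  ma->ma_need &= ~MA_LOV;   /* no room: skip the LOV EA */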

          Di, initial results are that the client-side patch does not fix the problem.

          nedbass Ned Bass (Inactive) added a comment

          After rebooting more clients, many of them came back up still having this problem. So we should be able to verify pretty easily whether the patch works.

          nedbass Ned Bass (Inactive) added a comment
          di.wang Di Wang added a comment - - edited

          Hmm, it seems to me that cl_max_md_size is not being set correctly in time in some cases, in which case the client will rely on it to pack the getattr RPC. I updated the patch, but it requires remounting the client. You said a remount can fix the problem? So after a remount you can no longer reproduce it? Anyway, can you try this patch? Thanks.

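
          For context on where cl_max_mds_easize comes from, here is a hedged sketch of the client-side initialization path being discussed, modeled on the 2.4-era mdc_init_ea_size(); it is shown only to illustrate why the cached value can lag behind the mounts, and the names are from memory rather than quoted from the tree.

          /* Hedged sketch: the upper layers call down into the MDC to grow
           * the cached EA sizes once the number of OSTs is known.  Until
           * this has run, cl_max_mds_easize can still be zero (or too
           * small), which is the window being discussed above. */
          static int mdc_init_ea_size(struct obd_export *exp, int easize,
                                      int def_easize, int cookiesize)
          {
                  struct client_obd *cli = &exp->exp_obd->u.cli;

                  if (cli->cl_max_mds_easize < easize)
                          cli->cl_max_mds_easize = easize;
                  if (cli->cl_default_mds_easize < def_easize)
                          cli->cl_default_mds_easize = def_easize;
                  if (cli->cl_max_mds_cookiesize < cookiesize)
                          cli->cl_max_mds_cookiesize = cookiesize;
                  return 0;
          }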

          My only guess is that mdt_getattr_name_lock() sets MA_LOV in ma_need:

          1422                 if (try_layout) {
          1423                         child_bits |= MDS_INODELOCK_LAYOUT;
          1424                         /* try layout lock, it may fail to be granted due to
          1425                          * contention at LOOKUP or UPDATE */
          1426                         if (!mdt_object_lock_try(info, child, lhc, child_bits,
          1427                                                  MDT_CROSS_LOCK)) {
          1428                                 child_bits &= ~MDS_INODELOCK_LAYOUT;
          1429                                 LASSERT(child_bits != 0);
          1430                                 rc = mdt_object_lock(info, child, lhc,
          1431                                                 child_bits, MDT_CROSS_LOCK);
          1432                         } else {
          1433                                 ma_need |= MA_LOV;
          ...
          1452         rc = mdt_getattr_internal(info, child, ma_need);
          
          nedbass Ned Bass (Inactive) added a comment
          di.wang Di Wang added a comment -

          Hmm, so the RPC does not need the LOVEA, but somehow the server is still trying to get it.


          Di, with that patch I get:

          mdt_getattr_internal() ls4-MDT0000: RPC from <UUID>: does not need LOVEA
          mdt_attr_get_lov() [<FID>] retrieve lovEA with (null):0
          mdt_attr_get_lov() [<FID>] got lovEA with (null):80
          

          This would have crashed, but I kept the check before mdt_dump_lmm() to log an error and return EFAULT if ma->ma_lmm is NULL.

          nedbass Ned Bass (Inactive) added a comment
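
          For completeness, a hedged sketch of the kind of defensive check described above (not necessarily the exact hunk that was kept), placed just before the mdt_dump_lmm() call in mdt_getattr_internal():

          /* Hedged sketch: refuse to dump/copy the LOV EA when no reply
           * buffer was reserved for it, instead of dereferencing NULL. */
          if (ma->ma_lmm == NULL) {
                  CERROR("no LOV EA buffer reserved in the getattr reply\n");
                  RETURN(-EFAULT);
          }
          mdt_dump_lmm(D_INFO, ma->ma_lmm);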

          People

            di.wang Di Wang
            mdiep Minh Diep