
LU-4222: Oops in mdt_dump_lmm+0x16/0x410 [mdt]

Details

    Description

      Nov 6 16:38:07 lustre-mds-0-0 kernel: BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
      Nov 6 16:38:07 lustre-mds-0-0 kernel: IP: [<ffffffffa0cfb246>] mdt_dump_lmm+0x16/0x410 [mdt]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: PGD 0
      Nov 6 16:38:07 lustre-mds-0-0 kernel: Oops: 0000 [#1] SMP
      Nov 6 16:38:07 lustre-mds-0-0 kernel: last sysfs file: /sys/devices/pci0000:00/0000:00:09.0/0000:19:00.0/0000:1a:04.0/0000:1c:00.0/irq
      Nov 6 16:38:07 lustre-mds-0-0 kernel: CPU 4
      Nov 6 16:38:07 lustre-mds-0-0 kernel: Modules linked in: osp(U) lod(U) mdt(U) mgs(U) mgc(U) fsfilt_ldiskfs(U) osd_ldiskfs(U) lquota(U) mdd(U) lustre(U) lov(U) osc(U) mdc(U) fid(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) sha512_generic sha256_generic crc32c_intel libcfs(U) ldiskfs(U) autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf iptable_filter ip_tables ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core microcode serio_raw i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support i7core_edac edac_core ioatdma raid10 myri10ge ses enclosure sg igb dca ptp pps_core sr_mod cdrom ext4 mbcache jbd2 sd_mod crc_t10dif usb_storage ahci mptsas mptscsih mptbase scsi_transport_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
      Nov 6 16:38:07 lustre-mds-0-0 kernel:
      Nov 6 16:38:07 lustre-mds-0-0 kernel: Pid: 4408, comm: mdt02_002 Not tainted 2.6.32-358.14.1.el6_lustre.g0a46394.x86_64 #1 SUN MICROSYSTEMS SUN FIRE X4170 SERVER /ASSY,MOTHERBOARD,X4170
      Nov 6 16:38:07 lustre-mds-0-0 kernel: RIP: 0010:[<ffffffffa0cfb246>] [<ffffffffa0cfb246>] mdt_dump_lmm+0x16/0x410 [mdt]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: RSP: 0018:ffff88066bf87a20 EFLAGS: 00010282
      Nov 6 16:38:07 lustre-mds-0-0 kernel: RAX: 0000000000000003 RBX: ffff88066bf7e000 RCX: ffffc9002118d6f0
      Nov 6 16:38:07 lustre-mds-0-0 kernel: RDX: ffff88066914bc00 RSI: 0000000000000000 RDI: 0000000000000040
      Nov 6 16:38:07 lustre-mds-0-0 kernel: RBP: ffff88066bf87a70 R08: 0000000000008001 R09: ffff88066bf7e510
      Nov 6 16:38:07 lustre-mds-0-0 kernel: R10: ffff88067451c49c R11: ffffffffa03b89b0 R12: ffff880669236070
      Nov 6 16:38:07 lustre-mds-0-0 kernel: R13: ffff8806793c77a0 R14: 0000000000000038 R15: ffff880669208a68
      Nov 6 16:38:07 lustre-mds-0-0 kernel: FS: 00007f00c33bf700(0000) GS:ffff88038ac00000(0000) knlGS:0000000000000000
      Nov 6 16:38:07 lustre-mds-0-0 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      Nov 6 16:38:07 lustre-mds-0-0 kernel: CR2: 000000000000001c CR3: 00000006789ec000 CR4: 00000000000007e0
      Nov 6 16:38:07 lustre-mds-0-0 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      Nov 6 16:38:07 lustre-mds-0-0 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Nov 6 16:38:07 lustre-mds-0-0 kernel: Process mdt02_002 (pid: 4408, threadinfo ffff88066bf86000, task ffff88066d264080)
      Nov 6 16:38:07 lustre-mds-0-0 kernel: Stack:
      Nov 6 16:38:07 lustre-mds-0-0 kernel: ffff88066bf7e000 ffff880677269000 ffff88066bf87a70 ffffffffa0ce1832
      Nov 6 16:38:07 lustre-mds-0-0 kernel: <d> ffff880669236070 ffff88066bf7e000 ffff880669236070 ffff8806793c77a0
      Nov 6 16:38:07 lustre-mds-0-0 kernel: <d> 0000000000000038 ffff880669208a68 ffff88066bf87b00 ffffffffa0cf4b0f
      Nov 6 16:38:07 lustre-mds-0-0 kernel: Call Trace:
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa0ce1832>] ? mdt_pack_attr2body+0xe2/0x270 [mdt]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa0cf4b0f>] mdt_getattr_internal+0x56f/0x1210 [mdt]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa0cf661e>] mdt_getattr_name_lock+0xe6e/0x1980 [mdt]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa06bd135>] ? lustre_msg_buf+0x55/0x60 [ptlrpc]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa06e5646>] ? __req_capsule_get+0x166/0x700 [ptlrpc]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa06bf3c4>] ? lustre_msg_get_flags+0x34/0xb0 [ptlrpc]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa0cf73cd>] mdt_intent_getattr+0x29d/0x490 [mdt]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa0ce3f3e>] mdt_intent_policy+0x39e/0x720 [mdt]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa0675831>] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa069c1ef>] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa0ce43c6>] mdt_enqueue+0x46/0xe0 [mdt]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa0ceaab7>] mdt_handle_common+0x647/0x16d0 [mdt]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa06bebac>] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa0d243f5>] mds_regular_handle+0x15/0x20 [mdt]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa06ce3c8>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa03e85de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa03f9d9f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa06c5729>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffff81055ad3>] ? __wake_up+0x53/0x70
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa06cf75e>] ptlrpc_main+0xace/0x1700 [ptlrpc]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa06cec90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffff8100c0ca>] child_rip+0xa/0x20
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa06cec90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa06cec90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      Nov 6 16:38:07 lustre-mds-0-0 kernel: Code: 41 ab 9e ff 48 89 83 70 04 00 00 e9 2d ff ff ff 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 53 48 83 ec 28 0f 1f 44 00 00 <44> 0f b7 66 1c 41 89 fe 41 89 fd 48 89 f3 41 81 e6 00 04 06 02
      Nov 6 16:38:07 lustre-mds-0-0 kernel: RIP [<ffffffffa0cfb246>] mdt_dump_lmm+0x16/0x410 [mdt]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: RSP <ffff88066bf87a20>
      Nov 6 16:38:07 lustre-mds-0-0 kernel: CR2: 000000000000001c
      Nov 6 16:38:07 lustre-mds-0-0 kernel: ---[ end trace 0dadd51afe1c36b7 ]---
      Nov 6 16:38:07 lustre-mds-0-0 kernel: Kernel panic - not syncing: Fatal exception
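      For reference, the faulting instruction appears to decode to a 16-bit load at offset 0x1c from %rsi, which is NULL here and is the lmm argument of mdt_dump_lmm(). Assuming the b2_4-era struct lov_mds_md_v1 layout, offset 0x1c is lmm_stripe_count, which suggests mdt_dump_lmm() was handed a NULL LOV EA buffer. A standalone sketch of that offset calculation (the struct below is a local illustration, not the Lustre header):

          #include <stddef.h>
          #include <stdint.h>
          #include <stdio.h>

          /* Local illustration of the v1 LOV EA header layout (mirrors the
           * 2.4-era lov_mds_md_v1 fields); used only to show where 0x1c falls. */
          struct lmm_v1_sketch {
                  uint32_t lmm_magic;             /* offset  0 */
                  uint32_t lmm_pattern;           /* offset  4 */
                  uint64_t lmm_object_id;         /* offset  8 */
                  uint64_t lmm_object_seq;        /* offset 16 */
                  uint32_t lmm_stripe_size;       /* offset 24 */
                  uint16_t lmm_stripe_count;      /* offset 28 == 0x1c, the fault address */
                  uint16_t lmm_layout_gen;        /* offset 30 */
          };

          int main(void)
          {
                  /* The oops reports CR2 = 0x1c with RSI (the lmm argument) == 0,
                   * i.e. a read of lmm->lmm_stripe_count through a NULL pointer. */
                  printf("lmm_stripe_count offset = 0x%zx\n",
                         offsetof(struct lmm_v1_sketch, lmm_stripe_count));
                  return 0;
          }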

      We were trying to set up an active/active MDS/MDT configuration on a cluster with two MDSs and two MDTs. While trying to mount the 1.8.9 clients, we hit this panic on the MDS.

      Our goal was: start from a 1.8.x server -> upgrade to 2.4.1 -> back up and restore the single MDT to a new system with one MDT -> add another MDT on a different MDS as a remote MDT.

      The last step is to use tunefs.lustre to configure active/active HA on the MDSs.

        Activity

          di.wang Di Wang added a comment -

          Oh, it seems related to LU-3338. Ned, is that patch in your Lustre source (lustre-2.4.0-19chaos)?

          di.wang Di Wang added a comment -

          Hmm, I compared the Lustre b2_4 code with the chaos 2.4 tree; in mdc_intent_getattr_pack:

          chaos lustre 2.4

                  /* pack the intended request */
                  mdc_getattr_pack(req, valid, it->it_flags, op_data,
                                   obddev->u.cli.cl_default_mds_easize);
          

          my lustre b2_4 branch

                /* pack the intended request */
                mdc_getattr_pack(req, valid, it->it_flags, op_data,
                                 obddev->u.cli.cl_max_mds_easize);
          

          Ned, could you please confirm that? Did someone change this code in the chaos tree?

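          For context, a minimal sketch of why the distinction matters, as I understand it: cl_default_mds_easize is sized for the filesystem's default stripe count, while cl_max_mds_easize is sized for the widest possible file, so a getattr reply buffer reserved from the default size can be too small for a file striped wider than the default. The numbers and helper below are purely illustrative, not the actual Lustre fields:

          #include <stdio.h>

          /* v1 LOV EA size: 32-byte header + 24 bytes per stripe (wire format). */
          static unsigned int lov_ea_size_v1(unsigned int stripe_count)
          {
                  return 32 + stripe_count * 24;
          }

          int main(void)
          {
                  unsigned int default_stripes = 1;   /* filesystem default striping */
                  unsigned int max_stripes     = 160; /* e.g. total number of OSTs */

                  /* A reply buffer reserved from the default size is too small
                   * for a file striped across every OST. */
                  printf("default easize = %u, max easize = %u\n",
                         lov_ea_size_v1(default_stripes),
                         lov_ea_size_v1(max_stripes));
                  return 0;
          }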
          di.wang Di Wang added a comment -

          Hmm, could you please try this in addition to patch 8550:

          diff --git a/lustre/mdc/mdc_request.c b/lustre/mdc/mdc_request.c
          index cc592a7..f9ef390 100644
          --- a/lustre/mdc/mdc_request.c
          +++ b/lustre/mdc/mdc_request.c
          @@ -189,14 +189,13 @@ static int mdc_getattr_common(struct obd_export *exp,
           
                   CDEBUG(D_NET, "mode: %o\n", body->mode);
           
          -        if (body->eadatasize != 0) {
          -                mdc_update_max_ea_from_body(exp, body);
          -
          -                eadata = req_capsule_server_sized_get(pill, &RMF_MDT_MD,
          -                                                      body->eadatasize);
          -                if (eadata == NULL)
          -                        RETURN(-EPROTO);
          -        }
          +       mdc_update_max_ea_from_body(exp, body);
          +       if (body->eadatasize != 0) {
          +               eadata = req_capsule_server_sized_get(pill, &RMF_MDT_MD,
          +                                                     body->eadatasize);
          +               if (eadata == NULL)
          +                       RETURN(-EPROTO);
          +       }
           
                   if (body->valid & OBD_MD_FLRMTPERM) {
          
          
          
          di.wang Di Wang added a comment -

          Hmm, this probably means the cl_max_md_size initialization is being delayed when there are multiple clients being mounted at the same time, since cl_max_md_size is updated when the OSC is connected. So we need to initialize cl_max_md_size synchronously during the mount process.

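          A minimal sketch of that idea, assuming a per-MDC field that is normally only bumped as each OSC connects; all names below are illustrative stand-ins, not the actual client_obd fields or the eventual fix:

          #include <stddef.h>

          /* v1 LOV EA size: 32-byte header + 24 bytes per stripe (wire format). */
          static size_t lov_ea_size_v1(unsigned int stripe_count)
          {
                  return 32 + (size_t)stripe_count * 24;
          }

          /* Illustrative stand-in for the per-MDC client state. */
          struct client_state_sketch {
                  size_t max_md_size;     /* normally bumped as each OSC connects */
          };

          /* Hypothetical mount-time hook: make sure max_md_size is non-zero and
           * large enough for the widest expected file before any getattr intent
           * is packed, so the reply always reserves room for the LOV EA. */
          static void client_seed_md_size(struct client_state_sketch *cli,
                                          unsigned int max_stripes)
          {
                  size_t easize = lov_ea_size_v1(max_stripes);

                  if (cli->max_md_size < easize)
                          cli->max_md_size = easize;
          }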

          nedbass Ned Bass (Inactive) added a comment -

          No, the error does not show up on the client when I reproduce the bug.

          nedbass Ned Bass (Inactive) added a comment -

          I'll try it.

          Regarding my earlier comment, I do sometimes see failures with 3 or 4 mounts, but usually only one client out of 100, so it seems to become more likely with more mounts. The order of the mounts also seems to matter: I don't get errors if I mount all six in reverse order. I can't figure out what the pattern is yet.
          di.wang Di Wang added a comment - - edited

          Ned, could you try this patch

          diff --git a/lustre/mdc/mdc_locks.c b/lustre/mdc/mdc_locks.c
          index 28c07b4..4541f1a 100644
          --- a/lustre/mdc/mdc_locks.c
          +++ b/lustre/mdc/mdc_locks.c
          @@ -434,6 +434,10 @@ static struct ptlrpc_request *mdc_intent_getattr_pack(struct obd_export *exp,
                   lit = req_capsule_client_get(&req->rq_pill, &RMF_LDLM_INTENT);
                   lit->opc = (__u64)it->it_op;
           
          +       if (obddev->u.cli.cl_max_mds_easize == 0) {
          +               CERROR("%s: cl_max_mds_easize is zero!\n", obddev->obd_name);
          +               RETURN(ERR_PTR(-EINVAL));
          +       }
                   /* pack the intended request */
                   mdc_getattr_pack(req, valid, it->it_flags, op_data,
                                    obddev->u.cli.cl_max_mds_easize);
          

          to see whether you can see the error message on the client side?


          nedbass Ned Bass (Inactive) added a comment -

          My test looks something like this, where <hostlist> contains about 100 nodes (pdsh is a distributed remote command invoker):

          pdsh -w <hostlist> 'umount -a -t lustre ; for x in 1 2 3 4 5 6 ; do mount /p/lscratch$x ; done ; cat /p/lscratch5/bass6/x 2>&1'
          
          

          If I mount all six filesystems, I always get some subset of nodes for which the cat command failed with 'Bad address'.

          If I mount only five, I usually but not always get failures.

          If I mount four or fewer, I never get failures.


          nedbass Ned Bass (Inactive) added a comment -

          Di, this client normally mounts six filesystems. The problem seems more likely to happen if five or more of those filesystems are mounted; I have yet to see it if I mount only four or fewer. I'm using about 100 client nodes to test, so the evidence is pretty compelling.

          di.wang Di Wang added a comment -

          Hmm, I think there are two problems here:
          1. The MDS should not retrieve the LOVEA if there is no room in the request; this is easy to fix (a sketch of such a guard follows below).
          2. Why do clients send getattr RPCs without reserving space for the LOVEA? IMHO, if cl_max_md_size is set correctly, every getattr RPC should have enough space for the LOVEA. Could you check the debug log on the client side to see where the RPC comes from? Is it from mdc_intent_getattr_pack?
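          Regarding point 1, a minimal sketch of what such a server-side guard could look like (hypothetical names; the real code would operate on the req_capsule reply buffer for RMF_MDT_MD):

          #include <errno.h>
          #include <stddef.h>
          #include <string.h>

          /* Hypothetical sketch of the MDS-side fix: before copying (or dumping)
           * the LOV EA into the getattr reply, check that the client actually
           * reserved a reply buffer and that the EA fits; otherwise skip the EA
           * rather than dereferencing a NULL or short buffer as in the oops above. */
          static int pack_lov_ea(void *reply_buf, size_t reply_len,
                                 const void *lov_ea, size_t ea_size)
          {
                  if (reply_buf == NULL || reply_len == 0)
                          return 0;       /* client reserved no room: skip the EA */
                  if (ea_size > reply_len)
                          return -ERANGE; /* EA would overflow the reply buffer */

                  memcpy(reply_buf, lov_ea, ea_size);
                  return (int)ea_size;
          }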


          nedbass Ned Bass (Inactive) added a comment -

          Di, initial results are that the client-side patch does not fix the problem.


          People

            Assignee: di.wang Di Wang
            Reporter: mdiep Minh Diep
            Votes: 0
            Watchers: 12
