
LU-4222: Oops in mdt_dump_lmm+0x16/0x410 [mdt]

Details

    Description

      Nov 6 16:38:07 lustre-mds-0-0 kernel: BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
      Nov 6 16:38:07 lustre-mds-0-0 kernel: IP: [<ffffffffa0cfb246>] mdt_dump_lmm+0x16/0x410 [mdt]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: PGD 0
      Nov 6 16:38:07 lustre-mds-0-0 kernel: Oops: 0000 [#1] SMP
      Nov 6 16:38:07 lustre-mds-0-0 kernel: last sysfs file: /sys/devices/pci0000:00/0000:00:09.0/0000:19:00.0/0000:1a:04.0/0000:1c:00.0/irq
      Nov 6 16:38:07 lustre-mds-0-0 kernel: CPU 4
      Nov 6 16:38:07 lustre-mds-0-0 kernel: Modules linked in: osp(U) lod(U) mdt(U) mgs(U) mgc(U) fsfilt_ldiskfs(U) osd_ldiskfs(U) lquota(U) mdd(U) lustre(U) lov(U) osc(U) mdc(U) fid(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) sha512_generic sha256_generic crc32c_intel libcfs(U) ldiskfs(U) autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf iptable_filter ip_tables ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core microcode serio_raw i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support i7core_edac edac_core ioatdma raid10 myri10ge ses enclosure sg igb dca ptp pps_core sr_mod cdrom ext4 mbcache jbd2 sd_mod crc_t10dif usb_storage ahci mptsas mptscsih mptbase scsi_transport_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
      Nov 6 16:38:07 lustre-mds-0-0 kernel:
      Nov 6 16:38:07 lustre-mds-0-0 kernel: Pid: 4408, comm: mdt02_002 Not tainted 2.6.32-358.14.1.el6_lustre.g0a46394.x86_64 #1 SUN MICROSYSTEMS SUN FIRE X4170 SERVER /ASSY,MOTHERBOARD,X4170
      Nov 6 16:38:07 lustre-mds-0-0 kernel: RIP: 0010:[<ffffffffa0cfb246>] [<ffffffffa0cfb246>] mdt_dump_lmm+0x16/0x410 [mdt]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: RSP: 0018:ffff88066bf87a20 EFLAGS: 00010282
      Nov 6 16:38:07 lustre-mds-0-0 kernel: RAX: 0000000000000003 RBX: ffff88066bf7e000 RCX: ffffc9002118d6f0
      Nov 6 16:38:07 lustre-mds-0-0 kernel: RDX: ffff88066914bc00 RSI: 0000000000000000 RDI: 0000000000000040
      Nov 6 16:38:07 lustre-mds-0-0 kernel: RBP: ffff88066bf87a70 R08: 0000000000008001 R09: ffff88066bf7e510
      Nov 6 16:38:07 lustre-mds-0-0 kernel: R10: ffff88067451c49c R11: ffffffffa03b89b0 R12: ffff880669236070
      Nov 6 16:38:07 lustre-mds-0-0 kernel: R13: ffff8806793c77a0 R14: 0000000000000038 R15: ffff880669208a68
      Nov 6 16:38:07 lustre-mds-0-0 kernel: FS: 00007f00c33bf700(0000) GS:ffff88038ac00000(0000) knlGS:0000000000000000
      Nov 6 16:38:07 lustre-mds-0-0 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      Nov 6 16:38:07 lustre-mds-0-0 kernel: CR2: 000000000000001c CR3: 00000006789ec000 CR4: 00000000000007e0
      Nov 6 16:38:07 lustre-mds-0-0 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      Nov 6 16:38:07 lustre-mds-0-0 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Nov 6 16:38:07 lustre-mds-0-0 kernel: Process mdt02_002 (pid: 4408, threadinfo ffff88066bf86000, task ffff88066d264080)
      Nov 6 16:38:07 lustre-mds-0-0 kernel: Stack:
      Nov 6 16:38:07 lustre-mds-0-0 kernel: ffff88066bf7e000 ffff880677269000 ffff88066bf87a70 ffffffffa0ce1832
      Nov 6 16:38:07 lustre-mds-0-0 kernel: <d> ffff880669236070 ffff88066bf7e000 ffff880669236070 ffff8806793c77a0
      Nov 6 16:38:07 lustre-mds-0-0 kernel: <d> 0000000000000038 ffff880669208a68 ffff88066bf87b00 ffffffffa0cf4b0f
      Nov 6 16:38:07 lustre-mds-0-0 kernel: Call Trace:
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa0ce1832>] ? mdt_pack_attr2body+0xe2/0x270 [mdt]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa0cf4b0f>] mdt_getattr_internal+0x56f/0x1210 [mdt]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa0cf661e>] mdt_getattr_name_lock+0xe6e/0x1980 [mdt]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa06bd135>] ? lustre_msg_buf+0x55/0x60 [ptlrpc]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa06e5646>] ? __req_capsule_get+0x166/0x700 [ptlrpc]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa06bf3c4>] ? lustre_msg_get_flags+0x34/0xb0 [ptlrpc]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa0cf73cd>] mdt_intent_getattr+0x29d/0x490 [mdt]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa0ce3f3e>] mdt_intent_policy+0x39e/0x720 [mdt]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa0675831>] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa069c1ef>] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa0ce43c6>] mdt_enqueue+0x46/0xe0 [mdt]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa0ceaab7>] mdt_handle_common+0x647/0x16d0 [mdt]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa06bebac>] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa0d243f5>] mds_regular_handle+0x15/0x20 [mdt]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa06ce3c8>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa03e85de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa03f9d9f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa06c5729>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffff81055ad3>] ? __wake_up+0x53/0x70
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa06cf75e>] ptlrpc_main+0xace/0x1700 [ptlrpc]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa06cec90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffff8100c0ca>] child_rip+0xa/0x20
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa06cec90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa06cec90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      Nov 6 16:38:07 lustre-mds-0-0 kernel: Code: 41 ab 9e ff 48 89 83 70 04 00 00 e9 2d ff ff ff 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 53 48 83 ec 28 0f 1f 44 00 00 <44> 0f b7 66 1c 41 89 fe 41 89 fd 48 89 f3 41 81 e6 00 04 06 02
      Nov 6 16:38:07 lustre-mds-0-0 kernel: RIP [<ffffffffa0cfb246>] mdt_dump_lmm+0x16/0x410 [mdt]
      Nov 6 16:38:07 lustre-mds-0-0 kernel: RSP <ffff88066bf87a20>
      Nov 6 16:38:07 lustre-mds-0-0 kernel: CR2: 000000000000001c
      Nov 6 16:38:07 lustre-mds-0-0 kernel: ---[ end trace 0dadd51afe1c36b7 ]---
      Nov 6 16:38:07 lustre-mds-0-0 kernel: Kernel panic - not syncing: Fatal exception
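      For reference, the faulting instruction appears to decode to a 16-bit load at offset 0x1c from %rsi, which is NULL here and is the lmm argument of mdt_dump_lmm(). Assuming the b2_4-era struct lov_mds_md_v1 layout, offset 0x1c is lmm_stripe_count, which suggests mdt_dump_lmm() was handed a NULL LOV EA buffer. A standalone sketch of that offset calculation (the struct below is a local illustration, not the Lustre header):

          #include <stddef.h>
          #include <stdint.h>
          #include <stdio.h>

          /* Local illustration of the v1 LOV EA header layout (mirrors the
           * 2.4-era lov_mds_md_v1 fields); used only to show where 0x1c falls. */
          struct lmm_v1_sketch {
                  uint32_t lmm_magic;             /* offset  0 */
                  uint32_t lmm_pattern;           /* offset  4 */
                  uint64_t lmm_object_id;         /* offset  8 */
                  uint64_t lmm_object_seq;        /* offset 16 */
                  uint32_t lmm_stripe_size;       /* offset 24 */
                  uint16_t lmm_stripe_count;      /* offset 28 == 0x1c, the fault address */
                  uint16_t lmm_layout_gen;        /* offset 30 */
          };

          int main(void)
          {
                  /* The oops reports CR2 = 0x1c with RSI (the lmm argument) == 0,
                   * i.e. a read of lmm->lmm_stripe_count through a NULL pointer. */
                  printf("lmm_stripe_count offset = 0x%zx\n",
                         offsetof(struct lmm_v1_sketch, lmm_stripe_count));
                  return 0;
          }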

      We were trying to set up an active/active MDS/MDT configuration on a cluster with two MDSs and two MDTs. While trying to mount the 1.8.9 clients, we hit this panic on the MDS.

      Our goal was: start from a 1.8.x server -> upgrade to 2.4.1 -> back up and restore the single MDT to a new system with one MDT -> add another MDT on a different MDS as a remote MDT.

      The last step is to use tunefs.lustre to configure active/active HA on the MDSs.

        Activity

          di.wang Di Wang added a comment -

          Oh, it seems related to LU-3338. Ned, is that patch in your Lustre source (lustre-2.4.0-19chaos)?

          di.wang Di Wang added a comment -

          Hmm, I compared the Lustre b2_4 code with the chaos 2.4 tree; in mdc_intent_getattr_pack:

          chaos lustre 2.4

                  /* pack the intended request */
                  mdc_getattr_pack(req, valid, it->it_flags, op_data,
                                   obddev->u.cli.cl_default_mds_easize);
          

          my lustre b2_4 branch

                /* pack the intended request */
                mdc_getattr_pack(req, valid, it->it_flags, op_data,
                                 obddev->u.cli.cl_max_mds_easize);
          

          Ned, could you please confirm that? Did someone change this code in the chaos tree?

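          For context, a minimal sketch of why the distinction matters, as I understand it: cl_default_mds_easize is sized for the filesystem's default stripe count, while cl_max_mds_easize is sized for the widest possible file, so a getattr reply buffer reserved from the default size can be too small for a file striped wider than the default. The numbers and helper below are purely illustrative, not the actual Lustre fields:

          #include <stdio.h>

          /* v1 LOV EA size: 32-byte header + 24 bytes per stripe (wire format). */
          static unsigned int lov_ea_size_v1(unsigned int stripe_count)
          {
                  return 32 + stripe_count * 24;
          }

          int main(void)
          {
                  unsigned int default_stripes = 1;   /* filesystem default striping */
                  unsigned int max_stripes     = 160; /* e.g. total number of OSTs */

                  /* A reply buffer reserved from the default size is too small
                   * for a file striped across every OST. */
                  printf("default easize = %u, max easize = %u\n",
                         lov_ea_size_v1(default_stripes),
                         lov_ea_size_v1(max_stripes));
                  return 0;
          }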
          di.wang Di Wang added a comment -

          Hmm, could you please try this in addition to patch 8550:

          diff --git a/lustre/mdc/mdc_request.c b/lustre/mdc/mdc_request.c
          index cc592a7..f9ef390 100644
          --- a/lustre/mdc/mdc_request.c
          +++ b/lustre/mdc/mdc_request.c
          @@ -189,14 +189,13 @@ static int mdc_getattr_common(struct obd_export *exp,
           
                   CDEBUG(D_NET, "mode: %o\n", body->mode);
           
          -        if (body->eadatasize != 0) {
          -                mdc_update_max_ea_from_body(exp, body);
          -
          -                eadata = req_capsule_server_sized_get(pill, &RMF_MDT_MD,
          -                                                      body->eadatasize);
          -                if (eadata == NULL)
          -                        RETURN(-EPROTO);
          -        }
          +       mdc_update_max_ea_from_body(exp, body);
          +       if (body->eadatasize != 0) {
          +               eadata = req_capsule_server_sized_get(pill, &RMF_MDT_MD,
          +                                                     body->eadatasize);
          +               if (eadata == NULL)
          +                       RETURN(-EPROTO);
          +       }
           
                   if (body->valid & OBD_MD_FLRMTPERM) {
          
          
          
          di.wang Di Wang added a comment -

          Hmm, this probably means the cl_max_md_size initialization is being delayed when there are multiple clients being mounted at the same time, since cl_max_md_size is updated when the OSC is connected. So we need to initialize cl_max_md_size synchronously during the mount process.

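          A minimal sketch of that idea, assuming a per-MDC field that is normally only bumped as each OSC connects; all names below are illustrative stand-ins, not the actual client_obd fields or the eventual fix:

          #include <stddef.h>

          /* v1 LOV EA size: 32-byte header + 24 bytes per stripe (wire format). */
          static size_t lov_ea_size_v1(unsigned int stripe_count)
          {
                  return 32 + (size_t)stripe_count * 24;
          }

          /* Illustrative stand-in for the per-MDC client state. */
          struct client_state_sketch {
                  size_t max_md_size;     /* normally bumped as each OSC connects */
          };

          /* Hypothetical mount-time hook: make sure max_md_size is non-zero and
           * large enough for the widest expected file before any getattr intent
           * is packed, so the reply always reserves room for the LOV EA. */
          static void client_seed_md_size(struct client_state_sketch *cli,
                                          unsigned int max_stripes)
          {
                  size_t easize = lov_ea_size_v1(max_stripes);

                  if (cli->max_md_size < easize)
                          cli->max_md_size = easize;
          }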

          nedbass Ned Bass (Inactive) added a comment -

          No, the error does not show up on the client when I reproduce the bug.

          nedbass Ned Bass (Inactive) added a comment -

          I'll try it.

          Regarding my earlier comment, I do sometimes see failures with 3 or 4 mounts, but usually only one client out of 100, so it seems to become more likely with more mounts. The order of the mounts also seems to matter: I don't get errors if I mount all six in reverse order. I can't figure out what the pattern is yet.
          di.wang Di Wang added a comment - - edited

          Ned, could you try this patch

          diff --git a/lustre/mdc/mdc_locks.c b/lustre/mdc/mdc_locks.c
          index 28c07b4..4541f1a 100644
          --- a/lustre/mdc/mdc_locks.c
          +++ b/lustre/mdc/mdc_locks.c
          @@ -434,6 +434,10 @@ static struct ptlrpc_request *mdc_intent_getattr_pack(struct obd_export *exp,
                   lit = req_capsule_client_get(&req->rq_pill, &RMF_LDLM_INTENT);
                   lit->opc = (__u64)it->it_op;
           
          +       if (obddev->u.cli.cl_max_mds_easize == 0) {
          +               CERROR("%s: cl_max_mds_easize is zero!\n", obddev->obd_name);
          +               RETURN(ERR_PTR(-EINVAL));
          +       }
                   /* pack the intended request */
                   mdc_getattr_pack(req, valid, it->it_flags, op_data,
                                    obddev->u.cli.cl_max_mds_easize);
          

          to see whether you can see the error message on the client side?


          nedbass Ned Bass (Inactive) added a comment -

          My test looks something like this, where <hostlist> contains about 100 nodes (pdsh is a distributed remote command invoker):

          pdsh -w <hostlist> 'umount -a -t lustre ; for x in 1 2 3 4 5 6 ; do mount /p/lscratch$x ; done ; cat /p/lscratch5/bass6/x 2>&1'
          
          

          If I mount all six filesystems, I always get some subset of nodes for which the cat command failed with 'Bad address'.

          If I mount only five, I usually but not always get failures.

          If I mount four or fewer, I never get failures.


          nedbass Ned Bass (Inactive) added a comment -

          Di, this client normally mounts six filesystems. The problem seems more likely to happen if five or more of those filesystems are mounted; I have yet to see it if I mount only four or fewer. I'm using about 100 client nodes to test, so the evidence is pretty compelling.

          di.wang Di Wang added a comment -

          Hmm, I think there are two problems here:
          1. The MDS should not retrieve the LOVEA if there is no room in the request; this is easy to fix (a sketch of such a guard follows below).
          2. Why do clients send getattr RPCs without reserving space for the LOVEA? IMHO, if cl_max_md_size is set correctly, every getattr RPC should have enough space for the LOVEA. Could you check the debug log on the client side to see where the RPC comes from? Is it from mdc_intent_getattr_pack?
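          Regarding point 1, a minimal sketch of what such a server-side guard could look like (hypothetical names; the real code would operate on the req_capsule reply buffer for RMF_MDT_MD):

          #include <errno.h>
          #include <stddef.h>
          #include <string.h>

          /* Hypothetical sketch of the MDS-side fix: before copying (or dumping)
           * the LOV EA into the getattr reply, check that the client actually
           * reserved a reply buffer and that the EA fits; otherwise skip the EA
           * rather than dereferencing a NULL or short buffer as in the oops above. */
          static int pack_lov_ea(void *reply_buf, size_t reply_len,
                                 const void *lov_ea, size_t ea_size)
          {
                  if (reply_buf == NULL || reply_len == 0)
                          return 0;       /* client reserved no room: skip the EA */
                  if (ea_size > reply_len)
                          return -ERANGE; /* EA would overflow the reply buffer */

                  memcpy(reply_buf, lov_ea, ea_size);
                  return (int)ea_size;
          }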


          nedbass Ned Bass (Inactive) added a comment -

          Di, initial results are that the client-side patch does not fix the problem.


          People

            Assignee: di.wang Di Wang
            Reporter: mdiep Minh Diep
            Votes: 0
            Watchers: 12
