[LU-4791] lod_ah_init() ASSERTION( lc->ldo_stripenr == 0 ) failed: Created: 20/Mar/14 Updated: 21/May/14 Resolved: 17/Apr/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.6.0, Lustre 2.4.2 |
| Fix Version/s: | Lustre 2.6.0, Lustre 2.5.2 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Patrick Valentin (Inactive) | Assignee: | Di Wang |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | mn4 |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 13190 |
| Description |
|
Lustre: 2.4.2
On a file system with 170 OSTs but without "wide striping" enabled (ea_inode not set on the MDT), issuing an "lfs setstripe -c -1 <file>" command and then writing to this file causes an MDS crash.

# lfs setstripe -c -1 /fs_pv/170_stripe_file
error on ioctl 0x4008669a for '/fs_pv/170_stripe_file' (3): No space left on device
error: setstripe: create stripe file '/fs_pv/170_stripe_file' failed
# ls -l /fs_pv/
total 0
-rw-r--r-- 1 root root 0 Mar 20 14:17 170_stripe_file

After the "lfs setstripe" command, the dmesg content on the MDS is the following:

# dmesg
Lustre: 11776:0:(osd_handler.c:833:osd_trans_start()) fs_pv-MDT0000: too many transaction credits (2424 > 2048)
Lustre: 11776:0:(osd_handler.c:840:osd_trans_start()) create: 0/0, delete: 0/0, destroy: 0/0
Lustre: 11776:0:(osd_handler.c:845:osd_trans_start()) attr_set: 0/0, xattr_set: 2/28
Lustre: 11776:0:(osd_handler.c:852:osd_trans_start()) write: 171/2394, punch: 0/0, quota 2/2
Lustre: 11776:0:(osd_handler.c:857:osd_trans_start()) insert: 0/0, delete: 0/0
Lustre: 11776:0:(osd_handler.c:862:osd_trans_start()) ref_add: 0/0, ref_del: 0/0
Pid: 11776, comm: mdt01_005
Call Trace:
[<ffffffffa03a5895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[<ffffffffa0bb131e>] osd_trans_start+0x65e/0x680 [osd_ldiskfs]
[<ffffffffa0cd8309>] lod_trans_start+0x1b9/0x250 [lod]
[<ffffffffa084b357>] mdd_trans_start+0x17/0x20 [mdd]
[<ffffffffa083b0b9>] mdd_create_data+0x539/0x7d0 [mdd]
[<ffffffffa0c4beac>] mdt_finish_open+0x125c/0x1950 [mdt]
[<ffffffffa0c47778>] ? mdt_object_open_lock+0x1c8/0x510 [mdt]
[<ffffffffa0c4ca56>] mdt_open_by_fid_lock+0x4b6/0x7d0 [mdt]
[<ffffffffa0c4d5cb>] mdt_reint_open+0x56b/0x21d0 [mdt]
[<ffffffffa03c283e>] ? upcall_cache_get_entry+0x28e/0x860 [libcfs]
[<ffffffffa06fcdbc>] ? lustre_msg_add_version+0x6c/0xc0 [ptlrpc]
[<ffffffffa0592240>] ? lu_ucred+0x20/0x30 [obdclass]
[<ffffffffa0c18015>] ? mdt_ucred+0x15/0x20 [mdt]
[<ffffffffa0c342ec>] ? mdt_root_squash+0x2c/0x410 [mdt]
[<ffffffffa0724636>] ? __req_capsule_get+0x166/0x700 [ptlrpc]
[<ffffffffa0592240>] ? lu_ucred+0x20/0x30 [obdclass]
[<ffffffffa0c38aa1>] mdt_reint_rec+0x41/0xe0 [mdt]
[<ffffffffa0c1dc73>] mdt_reint_internal+0x4c3/0x780 [mdt]
[<ffffffffa0c1e1fd>] mdt_intent_reint+0x1ed/0x520 [mdt]
[<ffffffffa0c1c0ae>] mdt_intent_policy+0x39e/0x720 [mdt]
[<ffffffffa06b4831>] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
[<ffffffffa06db1df>] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
[<ffffffffa0c1c536>] mdt_enqueue+0x46/0xe0 [mdt]
[<ffffffffa0c22c27>] mdt_handle_common+0x647/0x16d0 [mdt]
[<ffffffffa06fdb9c>] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
[<ffffffffa0c5c835>] mds_regular_handle+0x15/0x20 [mdt]
[<ffffffffa070d3b8>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
[<ffffffffa03a65de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
[<ffffffffa03b7d9f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
[<ffffffffa0704719>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
[<ffffffff81058bd3>] ? __wake_up+0x53/0x70
[<ffffffffa070e74e>] ptlrpc_main+0xace/0x1700 [ptlrpc]
[<ffffffffa070dc80>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
[<ffffffff8100c20a>] child_rip+0xa/0x20
[<ffffffffa070dc80>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
[<ffffffffa070dc80>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
[<ffffffff8100c200>] ? child_rip+0x0/0x20

When trying to write to the file, the write command hangs and the MDS crashes:

# echo Hello > /fs_pv/170_stripe_file

After the MDS is restarted, the crash trace is the following:

crash> bt
PID: 5428 TASK: ffff88031ae66ac0 CPU: 5 COMMAND: "mdt01_001"
#0 [ffff880315645738] machine_kexec at ffffffff8103915b
#1 [ffff880315645798] crash_kexec at ffffffff810c5e62
#2 [ffff880315645868] panic at ffffffff815280aa
#3 [ffff8803156458e8] lbug_with_loc at ffffffffa03a5eeb [libcfs]
#4 [ffff880315645908] lod_ah_init at ffffffffa0cee9ef [lod]
#5 [ffff880315645968] mdd_object_make_hint at ffffffffa082ea83 [mdd]
#6 [ffff880315645998] mdd_create_data at ffffffffa083aeb2 [mdd]
#7 [ffff8803156459f8] mdt_finish_open at ffffffffa0c4beac [mdt]
#8 [ffff880315645a88] mdt_reint_open at ffffffffa0c4e046 [mdt]
#9 [ffff880315645b78] mdt_reint_rec at ffffffffa0c38aa1 [mdt]
#10 [ffff880315645b98] mdt_reint_internal at ffffffffa0c1dc73 [mdt]
#11 [ffff880315645bd8] mdt_intent_reint at ffffffffa0c1e1fd [mdt]
#12 [ffff880315645c28] mdt_intent_policy at ffffffffa0c1c0ae [mdt]
#13 [ffff880315645c68] ldlm_lock_enqueue at ffffffffa06b4831 [ptlrpc]
#14 [ffff880315645cc8] ldlm_handle_enqueue0 at ffffffffa06db1df [ptlrpc]
#15 [ffff880315645d38] mdt_enqueue at ffffffffa0c1c536 [mdt]
#16 [ffff880315645d58] mdt_handle_common at ffffffffa0c22c27 [mdt]
#17 [ffff880315645da8] mds_regular_handle at ffffffffa0c5c835 [mdt]
#18 [ffff880315645db8] ptlrpc_server_handle_request at ffffffffa070d3b8 [ptlrpc]
#19 [ffff880315645eb8] ptlrpc_main at ffffffffa070e74e [ptlrpc]
#20 [ffff880315645f48] kernel_thread at ffffffff8100c20a

crash> log | tail -80
LustreError: 5428:0:(lod_object.c:704:lod_ah_init()) ASSERTION( lc->ldo_stripenr == 0 ) failed:
LustreError: 5428:0:(lod_object.c:704:lod_ah_init()) LBUG
Pid: 5428, comm: mdt01_001
Call Trace:
[<ffffffffa03a5895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[<ffffffffa03a5e97>] lbug_with_loc+0x47/0xb0 [libcfs]
[<ffffffffa0cee9ef>] lod_ah_init+0x57f/0x5c0 [lod]
[<ffffffffa082ea83>] mdd_object_make_hint+0x83/0xa0 [mdd]
[<ffffffffa083aeb2>] mdd_create_data+0x332/0x7d0 [mdd]
[<ffffffffa0c4beac>] mdt_finish_open+0x125c/0x1950 [mdt]
[<ffffffffa0c47778>] ? mdt_object_open_lock+0x1c8/0x510 [mdt]
[<ffffffffa0c4e046>] mdt_reint_open+0xfe6/0x21d0 [mdt]
[<ffffffffa03c283e>] ? upcall_cache_get_entry+0x28e/0x860 [libcfs]
[<ffffffffa06fcdbc>] ? lustre_msg_add_version+0x6c/0xc0 [ptlrpc]
[<ffffffffa0c38aa1>] mdt_reint_rec+0x41/0xe0 [mdt]
[<ffffffffa0c1dc73>] mdt_reint_internal+0x4c3/0x780 [mdt]
[<ffffffffa0c1e1fd>] mdt_intent_reint+0x1ed/0x520 [mdt]
[<ffffffffa0c1c0ae>] mdt_intent_policy+0x39e/0x720 [mdt]
[<ffffffffa06b4831>] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
[<ffffffffa06db1df>] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
[<ffffffffa0c1c536>] mdt_enqueue+0x46/0xe0 [mdt]
[<ffffffffa0c22c27>] mdt_handle_common+0x647/0x16d0 [mdt]
[<ffffffffa06fdb9c>] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
[<ffffffffa0c5c835>] mds_regular_handle+0x15/0x20 [mdt]
[<ffffffffa070d3b8>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
[<ffffffffa03a65de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
[<ffffffffa03b7d9f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
[<ffffffffa0704719>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
[<ffffffff81058bd3>] ? __wake_up+0x53/0x70
[<ffffffffa070e74e>] ptlrpc_main+0xace/0x1700 [ptlrpc]
[<ffffffffa070dc80>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
[<ffffffff8100c20a>] child_rip+0xa/0x20
[<ffffffffa070dc80>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
[<ffffffffa070dc80>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
[<ffffffff8100c200>] ? child_rip+0x0/0x20
Kernel panic - not syncing: LBUG
Pid: 5428, comm: mdt01_001 Not tainted 2.6.32-431.1.2.el6.Bull.44.x86_64 #1
Call Trace:
[<ffffffff815280a3>] ? panic+0xa7/0x16f
[<ffffffffa03a5eeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
[<ffffffffa0cee9ef>] ? lod_ah_init+0x57f/0x5c0 [lod]
[<ffffffffa082ea83>] ? mdd_object_make_hint+0x83/0xa0 [mdd]
[<ffffffffa083aeb2>] ? mdd_create_data+0x332/0x7d0 [mdd]
[<ffffffffa0c4beac>] ? mdt_finish_open+0x125c/0x1950 [mdt]
[<ffffffffa0c47778>] ? mdt_object_open_lock+0x1c8/0x510 [mdt]
[<ffffffffa0c4e046>] ? mdt_reint_open+0xfe6/0x21d0 [mdt]
[<ffffffffa03c283e>] ? upcall_cache_get_entry+0x28e/0x860 [libcfs]
[<ffffffffa06fcdbc>] ? lustre_msg_add_version+0x6c/0xc0 [ptlrpc]
[<ffffffffa0c38aa1>] ? mdt_reint_rec+0x41/0xe0 [mdt]
[<ffffffffa0c1dc73>] ? mdt_reint_internal+0x4c3/0x780 [mdt]
[<ffffffffa0c1e1fd>] ? mdt_intent_reint+0x1ed/0x520 [mdt]
[<ffffffffa0c1c0ae>] ? mdt_intent_policy+0x39e/0x720 [mdt]
[<ffffffffa06b4831>] ? ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
[<ffffffffa06db1df>] ? ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
[<ffffffffa0c1c536>] ? mdt_enqueue+0x46/0xe0 [mdt]
[<ffffffffa0c22c27>] ? mdt_handle_common+0x647/0x16d0 [mdt]
[<ffffffffa06fdb9c>] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
[<ffffffffa0c5c835>] ? mds_regular_handle+0x15/0x20 [mdt]
[<ffffffffa070d3b8>] ? ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
[<ffffffffa03a65de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
[<ffffffffa03b7d9f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
[<ffffffffa0704719>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
[<ffffffff81058bd3>] ? __wake_up+0x53/0x70
[<ffffffffa070e74e>] ? ptlrpc_main+0xace/0x1700 [ptlrpc]
[<ffffffffa070dc80>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
[<ffffffff8100c20a>] ? child_rip+0xa/0x20
[<ffffffffa070dc80>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
[<ffffffffa070dc80>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
[<ffffffff8100c200>] ? child_rip+0x0/0x20
crash>

After the MGT, MDT and OSTs are mounted, the hung "echo" command on the client ends, and the file content is correct:

# cat /fs_pv/170_stripe_file
Hello

The content of dmesg on the client is then the following:

Lustre: 4376:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1395328613/real 1395328613] req@ffff8801b95afc00 x1463099919122744/t0(0) o101->fs_pv-MDT0000-mdc-ffff8801bd642400@10.1.0.15@o2ib:12/10 lens 584/1136 e 0 to 1 dl 1395328620 ref 2 fl Rpc:XP/0/ffffffff rc 0/-1
Lustre: 4376:0:(client.c:1868:ptlrpc_expire_one_request()) Skipped 1480 previous similar messages
Lustre: fs_pv-MDT0000-mdc-ffff8801bd642400: Connection to fs_pv-MDT0000 (at 10.1.0.15@o2ib) was lost; in progress operations using this service will wait for recovery to complete
Lustre: Skipped 41 previous similar messages
Lustre: fs_pv-OST00a9-osc-ffff8801bd642400: Connection to fs_pv-OST00a9 (at 10.1.0.15@o2ib) was lost; in progress operations using this service will wait for recovery to complete
LustreError: 166-1: MGC10.1.0.15@o2ib: Connection to MGS (at 10.1.0.15@o2ib) was lost; in progress operations using this service will fail
Lustre: Skipped 169 previous similar messages
LNetError: 6581:0:(o2iblnd_cb.c:3012:kiblnd_check_txs_locked()) Timed out tx: tx_queue, 8 seconds
LNetError: 6581:0:(o2iblnd_cb.c:3075:kiblnd_check_conns()) Timed out RDMA with 10.1.0.15@o2ib (58): c: 0, oc: 0, rc: 8
Lustre: Evicted from MGS (at 10.1.0.15@o2ib) after server handle changed from 0x4ece3e4b34440eb4 to 0x221c3affca3337c9
Lustre: MGC10.1.0.15@o2ib: Connection restored to MGS (at 10.1.0.15@o2ib)
Lustre: Skipped 11 previous similar messages
Lustre: fs_pv-OST0002-osc-ffff8801bd642400: Connection restored to fs_pv-OST0002 (at 10.1.0.15@o2ib)
Lustre: fs_pv-OST000c-osc-ffff8801bd642400: Connection restored to fs_pv-OST000c (at 10.1.0.15@o2ib)
Lustre: Skipped 90 previous similar messages

As this ASSERT is the same as described in

Adding another call to lod_object_free_striping() a few lines below in the same routine, for the case where dt_xattr_set() fails, fixes the problem. Perhaps the next step would be to modify "lfs setstripe" so that it sets the stripe count to 160 when the requested value is larger and "ea_inode" is not set.

The change we added:

--- a/lustre/lod/lod_lov.c
+++ b/lustre/lod/lod_lov.c
@@ -562,6 +562,8 @@ int lod_generate_and_set_lovea(const str
 	info->lti_buf.lb_len = lmm_size;
 	rc = dt_xattr_set(env, next, &info->lti_buf, XATTR_NAME_LOV, 0,
 			  th, BYPASS_CAPA);
+	if (rc < 0)
+		lod_object_free_striping(env, lo);
 	RETURN(rc);
 }
|
| Comments |
| Comment by Peter Jones [ 20/Mar/14 ] |
|
Di is looking into this one |
| Comment by Di Wang [ 21/Mar/14 ] |
|
Patrick, thanks for the analysis, which makes sense to me. Will you post a patch for review? Thanks. |
| Comment by Jodi Levi (Inactive) [ 21/Mar/14 ] |
|
Is Master affected by this as well? |
| Comment by Antoine Percher [ 24/Mar/14 ] |
|
The thing that I don't understand is why the MDT tries to create a file |
| Comment by Di Wang [ 27/Mar/14 ] |
|
Jodi: I did not try this on master, but according to the code, the problem should exist on master as well. Antoine: I am not sure what you mean? But according to the description, "without "wide striping" enabled", you should expect some kind of error when creating more than 160 stripes, though ENOSPC might be a bit confusing in this case. So I thought this ticket is for resolving the crash? Please correct me if I misunderstood. Thanks. |
| Comment by Di Wang [ 28/Mar/14 ] |
|
Patch for master: http://review.whamcloud.com/#/c/9835/ |
| Comment by Antoine Percher [ 28/Mar/14 ] |
|
Di, I just want to say that without "wide striping" enabled, when you try to create a file with the maximum stripe count with "setstripe -c -1 /fs_pv/170_stripe_file", the MDT does not assume that the max is 160, and for me that |
| Comment by Antoine Percher [ 28/Mar/14 ] |
|
I can reformulate like this: |
| Comment by Di Wang [ 28/Mar/14 ] |
|
Antoine: Ah, I see what you mean. Sure, I will update the patch. Thanks! |
| Comment by Di Wang [ 28/Mar/14 ] |
|
It turns out we did not consider the overhead of the xattr; I just updated the patch. BTW: if you do not enable "wide striping", the maximum stripe count is 165 for now. |
| Comment by James A Simmons [ 31/Mar/14 ] |
|
This might fix |
| Comment by James Nunez (Inactive) [ 17/Apr/14 ] |
|
Patch landed to master |
| Comment by Peter Jones [ 17/Apr/14 ] |
|
Landed for 2.6. Will track landing for b2_4 and b2_5 separately |
| Comment by Aurelien Degremont (Inactive) [ 18/Apr/14 ] |
|
Peter, could you point to where this tracking will be done? Other tickets? Which ones? |
| Comment by Peter Jones [ 18/Apr/14 ] |
|
I do this outside of JIRA |
| Comment by James Nunez (Inactive) [ 18/Apr/14 ] |
|
Patch for b2_5 at http://review.whamcloud.com/#/c/10020/ |
| Comment by Jay Lan (Inactive) [ 18/Apr/14 ] |
|
Does this patch supersede the patch in |
| Comment by Di Wang [ 18/Apr/14 ] |
|
No, I believe you need both. |
| Comment by James A Simmons [ 06/May/14 ] |
|
This patch resolved |
| Comment by Bob Glossman (Inactive) [ 08/May/14 ] |
|
backport to b2_4: |
| Comment by Ryan Haasken [ 21/May/14 ] |
|
There are two patches against b2_4 linked in this ticket. I believe that this b2_4 patch should be abandoned: http://review.whamcloud.com/#/c/9837, because it includes this fix for Then this would be the correct b2_4 fix for this ticket: http://review.whamcloud.com/10267. Is that right? |
| Comment by Patrick Valentin (Inactive) [ 21/May/14 ] |
|
Yes http://review.whamcloud.com/#/c/9837 must be abandoned. |