[LU-4791] lod_ah_init() ASSERTION( lc->ldo_stripenr == 0 ) failed: Created: 20/Mar/14  Updated: 21/May/14  Resolved: 17/Apr/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.6.0, Lustre 2.4.2
Fix Version/s: Lustre 2.6.0, Lustre 2.5.2

Type: Bug Priority: Critical
Reporter: Patrick Valentin (Inactive) Assignee: Di Wang
Resolution: Fixed Votes: 0
Labels: mn4

Issue Links:
Related
is related to LU-2963 fail to create large stripe count fil... Resolved
is related to LU-5027 Needs rhel 6.5 support in b2_4 Resolved
Severity: 3
Rank (Obsolete): 13190

 Description   

Lustre: 2.4.2
kernel: 2.6.32-431.1.2
configuration to reproduce: 2 nodes

  • first node: MGT + MDT + 170 OSTs (loop devices)
  • second node: client

On a file system with 170 OSTs but without "wide striping" enabled (the ea_inode feature is not set on the MDT), running an "lfs setstripe -c -1 <file>" command and then writing to the file causes an MDS crash.
lfs setstripe fails with ENOSPC, but the file is created anyway:

# lfs setstripe -c -1 /fs_pv/170_stripe_file
error on ioctl 0x4008669a for '/fs_pv/170_stripe_file' (3): No space left on device
error: setstripe: create stripe file '/fs_pv/170_stripe_file' failed
# ls -l /fs_pv/
total 0
-rw-r--r-- 1 root root 0 Mar 20 14:17 170_stripe_file

After the "lfs setstripe" command, the dmesg output on the MDS is the following:

# dmesg
Lustre: 11776:0:(osd_handler.c:833:osd_trans_start()) fs_pv-MDT0000: too many transaction credits (2424 > 2048)
Lustre: 11776:0:(osd_handler.c:840:osd_trans_start())   create: 0/0, delete: 0/0, destroy: 0/0
Lustre: 11776:0:(osd_handler.c:845:osd_trans_start())   attr_set: 0/0, xattr_set: 2/28
Lustre: 11776:0:(osd_handler.c:852:osd_trans_start())   write: 171/2394, punch: 0/0, quota 2/2
Lustre: 11776:0:(osd_handler.c:857:osd_trans_start())   insert: 0/0, delete: 0/0
Lustre: 11776:0:(osd_handler.c:862:osd_trans_start())   ref_add: 0/0, ref_del: 0/0
Pid: 11776, comm: mdt01_005

Call Trace:
 [<ffffffffa03a5895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
 [<ffffffffa0bb131e>] osd_trans_start+0x65e/0x680 [osd_ldiskfs]
 [<ffffffffa0cd8309>] lod_trans_start+0x1b9/0x250 [lod]
 [<ffffffffa084b357>] mdd_trans_start+0x17/0x20 [mdd]
 [<ffffffffa083b0b9>] mdd_create_data+0x539/0x7d0 [mdd]
 [<ffffffffa0c4beac>] mdt_finish_open+0x125c/0x1950 [mdt]
 [<ffffffffa0c47778>] ? mdt_object_open_lock+0x1c8/0x510 [mdt]
 [<ffffffffa0c4ca56>] mdt_open_by_fid_lock+0x4b6/0x7d0 [mdt]
 [<ffffffffa0c4d5cb>] mdt_reint_open+0x56b/0x21d0 [mdt]
 [<ffffffffa03c283e>] ? upcall_cache_get_entry+0x28e/0x860 [libcfs]
 [<ffffffffa06fcdbc>] ? lustre_msg_add_version+0x6c/0xc0 [ptlrpc]
 [<ffffffffa0592240>] ? lu_ucred+0x20/0x30 [obdclass]
 [<ffffffffa0c18015>] ? mdt_ucred+0x15/0x20 [mdt]
 [<ffffffffa0c342ec>] ? mdt_root_squash+0x2c/0x410 [mdt]
 [<ffffffffa0724636>] ? __req_capsule_get+0x166/0x700 [ptlrpc]
 [<ffffffffa0592240>] ? lu_ucred+0x20/0x30 [obdclass]
 [<ffffffffa0c38aa1>] mdt_reint_rec+0x41/0xe0 [mdt]
 [<ffffffffa0c1dc73>] mdt_reint_internal+0x4c3/0x780 [mdt]
 [<ffffffffa0c1e1fd>] mdt_intent_reint+0x1ed/0x520 [mdt]
 [<ffffffffa0c1c0ae>] mdt_intent_policy+0x39e/0x720 [mdt]
 [<ffffffffa06b4831>] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
 [<ffffffffa06db1df>] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
 [<ffffffffa0c1c536>] mdt_enqueue+0x46/0xe0 [mdt]
 [<ffffffffa0c22c27>] mdt_handle_common+0x647/0x16d0 [mdt]
 [<ffffffffa06fdb9c>] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
 [<ffffffffa0c5c835>] mds_regular_handle+0x15/0x20 [mdt]
 [<ffffffffa070d3b8>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
 [<ffffffffa03a65de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
 [<ffffffffa03b7d9f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
 [<ffffffffa0704719>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
 [<ffffffff81058bd3>] ? __wake_up+0x53/0x70
 [<ffffffffa070e74e>] ptlrpc_main+0xace/0x1700 [ptlrpc]
 [<ffffffffa070dc80>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
 [<ffffffff8100c20a>] child_rip+0xa/0x20
 [<ffffffffa070dc80>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
 [<ffffffffa070dc80>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
 [<ffffffff8100c200>] ? child_rip+0x0/0x20

When trying to write to the file, the write command hangs and the MDS crashes:

# echo Hello > /fs_pv/170_stripe_file 

After the MDS is restarted, the crash trace is the following:

crash> bt
PID: 5428   TASK: ffff88031ae66ac0  CPU: 5   COMMAND: "mdt01_001"
 #0 [ffff880315645738] machine_kexec at ffffffff8103915b
 #1 [ffff880315645798] crash_kexec at ffffffff810c5e62
 #2 [ffff880315645868] panic at ffffffff815280aa
 #3 [ffff8803156458e8] lbug_with_loc at ffffffffa03a5eeb [libcfs]
 #4 [ffff880315645908] lod_ah_init at ffffffffa0cee9ef [lod]
 #5 [ffff880315645968] mdd_object_make_hint at ffffffffa082ea83 [mdd]
 #6 [ffff880315645998] mdd_create_data at ffffffffa083aeb2 [mdd]
 #7 [ffff8803156459f8] mdt_finish_open at ffffffffa0c4beac [mdt]
 #8 [ffff880315645a88] mdt_reint_open at ffffffffa0c4e046 [mdt]
 #9 [ffff880315645b78] mdt_reint_rec at ffffffffa0c38aa1 [mdt]
#10 [ffff880315645b98] mdt_reint_internal at ffffffffa0c1dc73 [mdt]
#11 [ffff880315645bd8] mdt_intent_reint at ffffffffa0c1e1fd [mdt]
#12 [ffff880315645c28] mdt_intent_policy at ffffffffa0c1c0ae [mdt]
#13 [ffff880315645c68] ldlm_lock_enqueue at ffffffffa06b4831 [ptlrpc]
#14 [ffff880315645cc8] ldlm_handle_enqueue0 at ffffffffa06db1df [ptlrpc]
#15 [ffff880315645d38] mdt_enqueue at ffffffffa0c1c536 [mdt]
#16 [ffff880315645d58] mdt_handle_common at ffffffffa0c22c27 [mdt]
#17 [ffff880315645da8] mds_regular_handle at ffffffffa0c5c835 [mdt]
#18 [ffff880315645db8] ptlrpc_server_handle_request at ffffffffa070d3b8 [ptlrpc]
#19 [ffff880315645eb8] ptlrpc_main at ffffffffa070e74e [ptlrpc]
#20 [ffff880315645f48] kernel_thread at ffffffff8100c20a

crash> log | tail -80

LustreError: 5428:0:(lod_object.c:704:lod_ah_init()) ASSERTION( lc->ldo_stripenr == 0 ) failed: 
LustreError: 5428:0:(lod_object.c:704:lod_ah_init()) LBUG
Pid: 5428, comm: mdt01_001

Call Trace:
 [<ffffffffa03a5895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
 [<ffffffffa03a5e97>] lbug_with_loc+0x47/0xb0 [libcfs]
 [<ffffffffa0cee9ef>] lod_ah_init+0x57f/0x5c0 [lod]
 [<ffffffffa082ea83>] mdd_object_make_hint+0x83/0xa0 [mdd]
 [<ffffffffa083aeb2>] mdd_create_data+0x332/0x7d0 [mdd]
 [<ffffffffa0c4beac>] mdt_finish_open+0x125c/0x1950 [mdt]
 [<ffffffffa0c47778>] ? mdt_object_open_lock+0x1c8/0x510 [mdt]
 [<ffffffffa0c4e046>] mdt_reint_open+0xfe6/0x21d0 [mdt]
 [<ffffffffa03c283e>] ? upcall_cache_get_entry+0x28e/0x860 [libcfs]
 [<ffffffffa06fcdbc>] ? lustre_msg_add_version+0x6c/0xc0 [ptlrpc]
 [<ffffffffa0c38aa1>] mdt_reint_rec+0x41/0xe0 [mdt]
 [<ffffffffa0c1dc73>] mdt_reint_internal+0x4c3/0x780 [mdt]
 [<ffffffffa0c1e1fd>] mdt_intent_reint+0x1ed/0x520 [mdt]
 [<ffffffffa0c1c0ae>] mdt_intent_policy+0x39e/0x720 [mdt]
 [<ffffffffa06b4831>] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
 [<ffffffffa06db1df>] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
 [<ffffffffa0c1c536>] mdt_enqueue+0x46/0xe0 [mdt]
 [<ffffffffa0c22c27>] mdt_handle_common+0x647/0x16d0 [mdt]
 [<ffffffffa06fdb9c>] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
 [<ffffffffa0c5c835>] mds_regular_handle+0x15/0x20 [mdt]
 [<ffffffffa070d3b8>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
 [<ffffffffa03a65de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
 [<ffffffffa03b7d9f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
 [<ffffffffa0704719>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
 [<ffffffff81058bd3>] ? __wake_up+0x53/0x70
 [<ffffffffa070e74e>] ptlrpc_main+0xace/0x1700 [ptlrpc]
 [<ffffffffa070dc80>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
 [<ffffffff8100c20a>] child_rip+0xa/0x20
 [<ffffffffa070dc80>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
 [<ffffffffa070dc80>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
 [<ffffffff8100c200>] ? child_rip+0x0/0x20

Kernel panic - not syncing: LBUG
Pid: 5428, comm: mdt01_001 Not tainted 2.6.32-431.1.2.el6.Bull.44.x86_64 #1
Call Trace:
 [<ffffffff815280a3>] ? panic+0xa7/0x16f
 [<ffffffffa03a5eeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
 [<ffffffffa0cee9ef>] ? lod_ah_init+0x57f/0x5c0 [lod]
 [<ffffffffa082ea83>] ? mdd_object_make_hint+0x83/0xa0 [mdd]
 [<ffffffffa083aeb2>] ? mdd_create_data+0x332/0x7d0 [mdd]
 [<ffffffffa0c4beac>] ? mdt_finish_open+0x125c/0x1950 [mdt]
 [<ffffffffa0c47778>] ? mdt_object_open_lock+0x1c8/0x510 [mdt]
 [<ffffffffa0c4e046>] ? mdt_reint_open+0xfe6/0x21d0 [mdt]
 [<ffffffffa03c283e>] ? upcall_cache_get_entry+0x28e/0x860 [libcfs]
 [<ffffffffa06fcdbc>] ? lustre_msg_add_version+0x6c/0xc0 [ptlrpc]
 [<ffffffffa0c38aa1>] ? mdt_reint_rec+0x41/0xe0 [mdt]
 [<ffffffffa0c1dc73>] ? mdt_reint_internal+0x4c3/0x780 [mdt]
 [<ffffffffa0c1e1fd>] ? mdt_intent_reint+0x1ed/0x520 [mdt]
 [<ffffffffa0c1c0ae>] ? mdt_intent_policy+0x39e/0x720 [mdt]
 [<ffffffffa06b4831>] ? ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
 [<ffffffffa06db1df>] ? ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
 [<ffffffffa0c1c536>] ? mdt_enqueue+0x46/0xe0 [mdt]
 [<ffffffffa0c22c27>] ? mdt_handle_common+0x647/0x16d0 [mdt]
 [<ffffffffa06fdb9c>] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
 [<ffffffffa0c5c835>] ? mds_regular_handle+0x15/0x20 [mdt]
 [<ffffffffa070d3b8>] ? ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
 [<ffffffffa03a65de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
 [<ffffffffa03b7d9f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
 [<ffffffffa0704719>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
 [<ffffffff81058bd3>] ? __wake_up+0x53/0x70
 [<ffffffffa070e74e>] ? ptlrpc_main+0xace/0x1700 [ptlrpc]
 [<ffffffffa070dc80>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
 [<ffffffff8100c20a>] ? child_rip+0xa/0x20
 [<ffffffffa070dc80>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
 [<ffffffffa070dc80>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
 [<ffffffff8100c200>] ? child_rip+0x0/0x20
crash> 

After the MGT, MDT and OSTs are mounted again, the hung "echo" command on the client completes, and the file content is correct:

# cat /fs_pv/170_stripe_file
Hello

The content of dmesg on client is then the following:

Lustre: 4376:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1395328613/real 1395328613]  req@ffff8801b95afc00 x1463099919122744/t0(0) o101->fs_pv-MDT0000-mdc-ffff8801bd642400@10.1.0.15@o2ib:12/10 lens 584/1136 e 0 to 1 dl 1395328620 ref 2 fl Rpc:XP/0/ffffffff rc 0/-1
Lustre: 4376:0:(client.c:1868:ptlrpc_expire_one_request()) Skipped 1480 previous similar messages
Lustre: fs_pv-MDT0000-mdc-ffff8801bd642400: Connection to fs_pv-MDT0000 (at 10.1.0.15@o2ib) was lost; in progress operations using this service will wait for recovery to complete
Lustre: Skipped 41 previous similar messages
Lustre: fs_pv-OST00a9-osc-ffff8801bd642400: Connection to fs_pv-OST00a9 (at 10.1.0.15@o2ib) was lost; in progress operations using this service will wait for recovery to complete
LustreError: 166-1: MGC10.1.0.15@o2ib: Connection to MGS (at 10.1.0.15@o2ib) was lost; in progress operations using this service will fail
Lustre: Skipped 169 previous similar messages
LNetError: 6581:0:(o2iblnd_cb.c:3012:kiblnd_check_txs_locked()) Timed out tx: tx_queue, 8 seconds
LNetError: 6581:0:(o2iblnd_cb.c:3075:kiblnd_check_conns()) Timed out RDMA with 10.1.0.15@o2ib (58): c: 0, oc: 0, rc: 8
Lustre: Evicted from MGS (at 10.1.0.15@o2ib) after server handle changed from 0x4ece3e4b34440eb4 to 0x221c3affca3337c9
Lustre: MGC10.1.0.15@o2ib: Connection restored to MGS (at 10.1.0.15@o2ib)
Lustre: Skipped 11 previous similar messages
Lustre: fs_pv-OST0002-osc-ffff8801bd642400: Connection restored to fs_pv-OST0002 (at 10.1.0.15@o2ib)
Lustre: fs_pv-OST000c-osc-ffff8801bd642400: Connection restored to fs_pv-OST000c (at 10.1.0.15@o2ib)
Lustre: Skipped 90 previous similar messages

Since this assertion is the same as the one described in LU-4260, we backported the corresponding patch to Lustre 2.4.2, but it does not fix the problem.
That patch adds a call to lod_object_free_striping() if lod_fld_lookup() fails in lod_generate_and_set_lovea().

Adding another call to lod_object_free_striping() a few lines below in the same routine, for the case where dt_xattr_set() fails, fixes the problem.
The "No space left on device" error message still appears when running "lfs setstripe -c -1", but the MDS no longer crashes.

Perhaps the next step would be to modify "lfs setstripe" so that it sets the stripe count to 160 when the requested value is larger and "ea_inode" is not set.
But the additional call to lod_object_free_striping() should be kept in any case, as dt_xattr_set() could fail for other reasons.

The change we added on top of the LU-4260 patch is the following:

--- a/lustre/lod/lod_lov.c
+++ b/lustre/lod/lod_lov.c
@@ -562,6 +562,8 @@ int lod_generate_and_set_lovea(const str
        info->lti_buf.lb_len = lmm_size;
        rc = dt_xattr_set(env, next, &info->lti_buf, XATTR_NAME_LOV, 0,
                          th, BYPASS_CAPA);
+       if (rc < 0)
+               lod_object_free_striping(env, lo);

        RETURN(rc);
 }


 Comments   
Comment by Peter Jones [ 20/Mar/14 ]

Di is looking into this one

Comment by Di Wang [ 21/Mar/14 ]

Patrick, thanks for the analysis, which makes sense to me. Will you post a patch for review? Thanks.

Comment by Jodi Levi (Inactive) [ 21/Mar/14 ]

Is Master affected by this as well?

Comment by Antoine Percher [ 24/Mar/14 ]

What I don't understand is why the MDT tries to create a file
with more than 160 OSTs when the maximum in this case is 160.
The proposed fix is just a workaround to clear the lo struct; a good fix
would be for the MDT to create a file with 160 OSTs without the ENOSPC error.

Comment by Di Wang [ 27/Mar/14 ]

Jodi: I did not try this on master, but according to the code, the problem should exist on master as well.

Antoine: I am not sure what you mean. But given the description ("without "wide striping" enabled"), you should expect some kind of error when creating more than 160 stripes, though ENOSPC might be a bit confusing in this case. So I thought this ticket was about resolving the crash? Please correct me if I misunderstood. Thanks.

Comment by Di Wang [ 28/Mar/14 ]

master http://review.whamcloud.com/#/c/9835/
b2_4 http://review.whamcloud.com/#/c/9837

Comment by Antoine Percher [ 28/Mar/14 ]

Di, I just wanted to say that without "wide striping" enabled, when you try to create a file with the maximum stripe count using "lfs setstripe -c -1 /fs_pv/170_stripe_file", the MDT does not assume that the maximum is 160, and for me that
is the main issue.

Comment by Antoine Percher [ 28/Mar/14 ]

Let me reformulate it like this:
When we run "lfs setstripe -c -1 /fs_pv/stripe_file" with your patch and without "wide striping" enabled,
does lfs return E2BIG or create a file with around 160 stripes?
If lfs returns E2BIG, then I would say that is the main issue.
In Lustre 2.1.x the max stripe count was min(NbOST, 160),
and in 2.4.x the max stripe count should be:
if (wide striping enabled) then min(NbOST, max stripes that fit in 1m)
else min(NbOST, 160)

Comment by Di Wang [ 28/Mar/14 ]

Antoine: Ah, I see what you mean. Sure, I will update the patch. Thanks!

Comment by Di Wang [ 28/Mar/14 ]

It turns out we did not consider the overhead of the xattr; I have just updated the patch. BTW: if you do not enable "wide striping", the max stripe count is 165 for now.

Comment by James A Simmons [ 31/Mar/14 ]

This might fix LU-2963 that we have experienced at ORNL.

Comment by James Nunez (Inactive) [ 17/Apr/14 ]

Patch landed to master

Comment by Peter Jones [ 17/Apr/14 ]

Landed for 2.6. Will track landing for b2_4 and b2_5 separately

Comment by Aurelien Degremont (Inactive) [ 18/Apr/14 ]

Peter, could you point to where this tracking will be done? In other tickets? Which ones?

Comment by Peter Jones [ 18/Apr/14 ]

I do this outside of JIRA

Comment by James Nunez (Inactive) [ 18/Apr/14 ]

Patch for b2_5 at http://review.whamcloud.com/#/c/10020/

Comment by Jay Lan (Inactive) [ 18/Apr/14 ]

Does this patch supersede the patch in LU-4260,
"ASSERTION( lc->ldo_stripenr == 0 ) failed:"?
I found that its one-line change is included in the patch for this LU.

Comment by Di Wang [ 18/Apr/14 ]

no, I believe you need both.

Comment by James A Simmons [ 06/May/14 ]

This patch resolved LU-2963. I asked for LU-2963 to be closed. Could you link that ticket to this one. Thank you.

Comment by Bob Glossman (Inactive) [ 08/May/14 ]

backport to b2_4:
http://review.whamcloud.com/10267

Comment by Ryan Haasken [ 21/May/14 ]

There are two patches against b2_4 linked in this ticket. I believe that this b2_4 patch should be abandoned: http://review.whamcloud.com/#/c/9837

because it includes this fix for LU-4260: http://review.whamcloud.com/#/c/8325/

Then this would be the correct b2_4 fix for this ticket: http://review.whamcloud.com/10267

Is that right?

Comment by Patrick Valentin (Inactive) [ 21/May/14 ]

Yes, http://review.whamcloud.com/#/c/9837 must be abandoned.
And both the LU-4260 patch (http://review.whamcloud.com/#/c/8325) and the LU-4791 patch (http://review.whamcloud.com/10267) must be landed in b2_4.

Generated at Sat Feb 10 01:45:53 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.