Details
- Bug
- Resolution: Fixed
- Critical
- Lustre 2.6.0, Lustre 2.4.2
- 3
- 13190
Description
Lustre: 2.4.2
Kernel: 2.6.32-431.1.2
Configuration to reproduce: 2 nodes
- first node: MGT + MDT + 170 OSTs (loop devices)
- second node: client
On a file system with 170 OSTs but without "wide striping" enabled (the ea_inode feature is not set on the MDT), issuing an "lfs setstripe -c -1 <file>" command and then writing to this file causes an MDS crash.
The "lfs setstripe" command fails with ENOSPC, but the file is still created:
# lfs setstripe -c -1 /fs_pv/170_stripe_file
error on ioctl 0x4008669a for '/fs_pv/170_stripe_file' (3): No space left on device
error: setstripe: create stripe file '/fs_pv/170_stripe_file' failed
# ls -l /fs_pv/
total 0
-rw-r--r-- 1 root root 0 Mar 20 14:17 170_stripe_file
after "lfs setstripe" command, the dmesg content on MDS is the following:
# dmesg
Lustre: 11776:0:(osd_handler.c:833:osd_trans_start()) fs_pv-MDT0000: too many transaction credits (2424 > 2048)
Lustre: 11776:0:(osd_handler.c:840:osd_trans_start()) create: 0/0, delete: 0/0, destroy: 0/0
Lustre: 11776:0:(osd_handler.c:845:osd_trans_start()) attr_set: 0/0, xattr_set: 2/28
Lustre: 11776:0:(osd_handler.c:852:osd_trans_start()) write: 171/2394, punch: 0/0, quota 2/2
Lustre: 11776:0:(osd_handler.c:857:osd_trans_start()) insert: 0/0, delete: 0/0
Lustre: 11776:0:(osd_handler.c:862:osd_trans_start()) ref_add: 0/0, ref_del: 0/0
Pid: 11776, comm: mdt01_005
Call Trace:
[<ffffffffa03a5895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[<ffffffffa0bb131e>] osd_trans_start+0x65e/0x680 [osd_ldiskfs]
[<ffffffffa0cd8309>] lod_trans_start+0x1b9/0x250 [lod]
[<ffffffffa084b357>] mdd_trans_start+0x17/0x20 [mdd]
[<ffffffffa083b0b9>] mdd_create_data+0x539/0x7d0 [mdd]
[<ffffffffa0c4beac>] mdt_finish_open+0x125c/0x1950 [mdt]
[<ffffffffa0c47778>] ? mdt_object_open_lock+0x1c8/0x510 [mdt]
[<ffffffffa0c4ca56>] mdt_open_by_fid_lock+0x4b6/0x7d0 [mdt]
[<ffffffffa0c4d5cb>] mdt_reint_open+0x56b/0x21d0 [mdt]
[<ffffffffa03c283e>] ? upcall_cache_get_entry+0x28e/0x860 [libcfs]
[<ffffffffa06fcdbc>] ? lustre_msg_add_version+0x6c/0xc0 [ptlrpc]
[<ffffffffa0592240>] ? lu_ucred+0x20/0x30 [obdclass]
[<ffffffffa0c18015>] ? mdt_ucred+0x15/0x20 [mdt]
[<ffffffffa0c342ec>] ? mdt_root_squash+0x2c/0x410 [mdt]
[<ffffffffa0724636>] ? __req_capsule_get+0x166/0x700 [ptlrpc]
[<ffffffffa0592240>] ? lu_ucred+0x20/0x30 [obdclass]
[<ffffffffa0c38aa1>] mdt_reint_rec+0x41/0xe0 [mdt]
[<ffffffffa0c1dc73>] mdt_reint_internal+0x4c3/0x780 [mdt]
[<ffffffffa0c1e1fd>] mdt_intent_reint+0x1ed/0x520 [mdt]
[<ffffffffa0c1c0ae>] mdt_intent_policy+0x39e/0x720 [mdt]
[<ffffffffa06b4831>] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
[<ffffffffa06db1df>] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
[<ffffffffa0c1c536>] mdt_enqueue+0x46/0xe0 [mdt]
[<ffffffffa0c22c27>] mdt_handle_common+0x647/0x16d0 [mdt]
[<ffffffffa06fdb9c>] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
[<ffffffffa0c5c835>] mds_regular_handle+0x15/0x20 [mdt]
[<ffffffffa070d3b8>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
[<ffffffffa03a65de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
[<ffffffffa03b7d9f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
[<ffffffffa0704719>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
[<ffffffff81058bd3>] ? __wake_up+0x53/0x70
[<ffffffffa070e74e>] ptlrpc_main+0xace/0x1700 [ptlrpc]
[<ffffffffa070dc80>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
[<ffffffff8100c20a>] child_rip+0xa/0x20
[<ffffffffa070dc80>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
[<ffffffffa070dc80>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
[<ffffffff8100c200>] ? child_rip+0x0/0x20
When trying to write to the file, the write command hangs and the MDS crashes:
# echo Hello > /fs_pv/170_stripe_file
After the MDS is restarted, the crash trace is the following:
crash> bt
PID: 5428  TASK: ffff88031ae66ac0  CPU: 5  COMMAND: "mdt01_001"
 #0 [ffff880315645738] machine_kexec at ffffffff8103915b
 #1 [ffff880315645798] crash_kexec at ffffffff810c5e62
 #2 [ffff880315645868] panic at ffffffff815280aa
 #3 [ffff8803156458e8] lbug_with_loc at ffffffffa03a5eeb [libcfs]
 #4 [ffff880315645908] lod_ah_init at ffffffffa0cee9ef [lod]
 #5 [ffff880315645968] mdd_object_make_hint at ffffffffa082ea83 [mdd]
 #6 [ffff880315645998] mdd_create_data at ffffffffa083aeb2 [mdd]
 #7 [ffff8803156459f8] mdt_finish_open at ffffffffa0c4beac [mdt]
 #8 [ffff880315645a88] mdt_reint_open at ffffffffa0c4e046 [mdt]
 #9 [ffff880315645b78] mdt_reint_rec at ffffffffa0c38aa1 [mdt]
#10 [ffff880315645b98] mdt_reint_internal at ffffffffa0c1dc73 [mdt]
#11 [ffff880315645bd8] mdt_intent_reint at ffffffffa0c1e1fd [mdt]
#12 [ffff880315645c28] mdt_intent_policy at ffffffffa0c1c0ae [mdt]
#13 [ffff880315645c68] ldlm_lock_enqueue at ffffffffa06b4831 [ptlrpc]
#14 [ffff880315645cc8] ldlm_handle_enqueue0 at ffffffffa06db1df [ptlrpc]
#15 [ffff880315645d38] mdt_enqueue at ffffffffa0c1c536 [mdt]
#16 [ffff880315645d58] mdt_handle_common at ffffffffa0c22c27 [mdt]
#17 [ffff880315645da8] mds_regular_handle at ffffffffa0c5c835 [mdt]
#18 [ffff880315645db8] ptlrpc_server_handle_request at ffffffffa070d3b8 [ptlrpc]
#19 [ffff880315645eb8] ptlrpc_main at ffffffffa070e74e [ptlrpc]
#20 [ffff880315645f48] kernel_thread at ffffffff8100c20a
crash> log | tail -80
LustreError: 5428:0:(lod_object.c:704:lod_ah_init()) ASSERTION( lc->ldo_stripenr == 0 ) failed:
LustreError: 5428:0:(lod_object.c:704:lod_ah_init()) LBUG
Pid: 5428, comm: mdt01_001
Call Trace:
[<ffffffffa03a5895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[<ffffffffa03a5e97>] lbug_with_loc+0x47/0xb0 [libcfs]
[<ffffffffa0cee9ef>] lod_ah_init+0x57f/0x5c0 [lod]
[<ffffffffa082ea83>] mdd_object_make_hint+0x83/0xa0 [mdd]
[<ffffffffa083aeb2>] mdd_create_data+0x332/0x7d0 [mdd]
[<ffffffffa0c4beac>] mdt_finish_open+0x125c/0x1950 [mdt]
[<ffffffffa0c47778>] ? mdt_object_open_lock+0x1c8/0x510 [mdt]
[<ffffffffa0c4e046>] mdt_reint_open+0xfe6/0x21d0 [mdt]
[<ffffffffa03c283e>] ? upcall_cache_get_entry+0x28e/0x860 [libcfs]
[<ffffffffa06fcdbc>] ? lustre_msg_add_version+0x6c/0xc0 [ptlrpc]
[<ffffffffa0c38aa1>] mdt_reint_rec+0x41/0xe0 [mdt]
[<ffffffffa0c1dc73>] mdt_reint_internal+0x4c3/0x780 [mdt]
[<ffffffffa0c1e1fd>] mdt_intent_reint+0x1ed/0x520 [mdt]
[<ffffffffa0c1c0ae>] mdt_intent_policy+0x39e/0x720 [mdt]
[<ffffffffa06b4831>] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
[<ffffffffa06db1df>] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
[<ffffffffa0c1c536>] mdt_enqueue+0x46/0xe0 [mdt]
[<ffffffffa0c22c27>] mdt_handle_common+0x647/0x16d0 [mdt]
[<ffffffffa06fdb9c>] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
[<ffffffffa0c5c835>] mds_regular_handle+0x15/0x20 [mdt]
[<ffffffffa070d3b8>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
[<ffffffffa03a65de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
[<ffffffffa03b7d9f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
[<ffffffffa0704719>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
[<ffffffff81058bd3>] ? __wake_up+0x53/0x70
[<ffffffffa070e74e>] ptlrpc_main+0xace/0x1700 [ptlrpc]
[<ffffffffa070dc80>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
[<ffffffff8100c20a>] child_rip+0xa/0x20
[<ffffffffa070dc80>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
[<ffffffffa070dc80>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
[<ffffffff8100c200>] ? child_rip+0x0/0x20
Kernel panic - not syncing: LBUG
Pid: 5428, comm: mdt01_001 Not tainted 2.6.32-431.1.2.el6.Bull.44.x86_64 #1
Call Trace:
[<ffffffff815280a3>] ? panic+0xa7/0x16f
[<ffffffffa03a5eeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
[<ffffffffa0cee9ef>] ? lod_ah_init+0x57f/0x5c0 [lod]
[<ffffffffa082ea83>] ? mdd_object_make_hint+0x83/0xa0 [mdd]
[<ffffffffa083aeb2>] ? mdd_create_data+0x332/0x7d0 [mdd]
[<ffffffffa0c4beac>] ? mdt_finish_open+0x125c/0x1950 [mdt]
[<ffffffffa0c47778>] ? mdt_object_open_lock+0x1c8/0x510 [mdt]
[<ffffffffa0c4e046>] ? mdt_reint_open+0xfe6/0x21d0 [mdt]
[<ffffffffa03c283e>] ? upcall_cache_get_entry+0x28e/0x860 [libcfs]
[<ffffffffa06fcdbc>] ? lustre_msg_add_version+0x6c/0xc0 [ptlrpc]
[<ffffffffa0c38aa1>] ? mdt_reint_rec+0x41/0xe0 [mdt]
[<ffffffffa0c1dc73>] ? mdt_reint_internal+0x4c3/0x780 [mdt]
[<ffffffffa0c1e1fd>] ? mdt_intent_reint+0x1ed/0x520 [mdt]
[<ffffffffa0c1c0ae>] ? mdt_intent_policy+0x39e/0x720 [mdt]
[<ffffffffa06b4831>] ? ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
[<ffffffffa06db1df>] ? ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
[<ffffffffa0c1c536>] ? mdt_enqueue+0x46/0xe0 [mdt]
[<ffffffffa0c22c27>] ? mdt_handle_common+0x647/0x16d0 [mdt]
[<ffffffffa06fdb9c>] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
[<ffffffffa0c5c835>] ? mds_regular_handle+0x15/0x20 [mdt]
[<ffffffffa070d3b8>] ? ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
[<ffffffffa03a65de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
[<ffffffffa03b7d9f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
[<ffffffffa0704719>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
[<ffffffff81058bd3>] ? __wake_up+0x53/0x70
[<ffffffffa070e74e>] ? ptlrpc_main+0xace/0x1700 [ptlrpc]
[<ffffffffa070dc80>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
[<ffffffff8100c20a>] ? child_rip+0xa/0x20
[<ffffffffa070dc80>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
[<ffffffffa070dc80>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
[<ffffffff8100c200>] ? child_rip+0x0/0x20
crash>
After the MGT, MDT and OSTs are mounted again, the hung "echo" command on the client completes, and the file content is correct:
# cat /fs_pv/170_stripe_file
Hello
The content of dmesg on the client is then the following:
Lustre: 4376:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1395328613/real 1395328613] req@ffff8801b95afc00 x1463099919122744/t0(0) o101->fs_pv-MDT0000-mdc-ffff8801bd642400@10.1.0.15@o2ib:12/10 lens 584/1136 e 0 to 1 dl 1395328620 ref 2 fl Rpc:XP/0/ffffffff rc 0/-1
Lustre: 4376:0:(client.c:1868:ptlrpc_expire_one_request()) Skipped 1480 previous similar messages
Lustre: fs_pv-MDT0000-mdc-ffff8801bd642400: Connection to fs_pv-MDT0000 (at 10.1.0.15@o2ib) was lost; in progress operations using this service will wait for recovery to complete
Lustre: Skipped 41 previous similar messages
Lustre: fs_pv-OST00a9-osc-ffff8801bd642400: Connection to fs_pv-OST00a9 (at 10.1.0.15@o2ib) was lost; in progress operations using this service will wait for recovery to complete
LustreError: 166-1: MGC10.1.0.15@o2ib: Connection to MGS (at 10.1.0.15@o2ib) was lost; in progress operations using this service will fail
Lustre: Skipped 169 previous similar messages
LNetError: 6581:0:(o2iblnd_cb.c:3012:kiblnd_check_txs_locked()) Timed out tx: tx_queue, 8 seconds
LNetError: 6581:0:(o2iblnd_cb.c:3075:kiblnd_check_conns()) Timed out RDMA with 10.1.0.15@o2ib (58): c: 0, oc: 0, rc: 8
Lustre: Evicted from MGS (at 10.1.0.15@o2ib) after server handle changed from 0x4ece3e4b34440eb4 to 0x221c3affca3337c9
Lustre: MGC10.1.0.15@o2ib: Connection restored to MGS (at 10.1.0.15@o2ib)
Lustre: Skipped 11 previous similar messages
Lustre: fs_pv-OST0002-osc-ffff8801bd642400: Connection restored to fs_pv-OST0002 (at 10.1.0.15@o2ib)
Lustre: fs_pv-OST000c-osc-ffff8801bd642400: Connection restored to fs_pv-OST000c (at 10.1.0.15@o2ib)
Lustre: Skipped 90 previous similar messages
As this ASSERT is the same as the one described in LU-4260, we backported the corresponding patch to Lustre 2.4.2, but it does not fix the problem.
That patch adds a call to lod_object_free_striping() when lod_fld_lookup() fails in lod_generate_and_set_lovea().
Adding another call to lod_object_free_striping() a few lines below in the same routine, for the case where dt_xattr_set() fails, fixes the problem.
The "No space left on device" error message still appears when running "lfs setstripe -c -1", but the MDS no longer crashes.
Perhaps the next step would be to modify "lfs setstripe" so that it caps the stripe count at 160 when the requested value is larger and "ea_inode" is not set.
But the additional call to lod_object_free_striping() should be kept in any case, as dt_xattr_set() could fail for other reasons.
The change we added on top of the LU-4260 patch is the following:
--- a/lustre/lod/lod_lov.c
+++ b/lustre/lod/lod_lov.c
@@ -562,6 +562,8 @@ int lod_generate_and_set_lovea(const str
 	info->lti_buf.lb_len = lmm_size;
 	rc = dt_xattr_set(env, next, &info->lti_buf, XATTR_NAME_LOV, 0, th, BYPASS_CAPA);
+	if (rc < 0)
+		lod_object_free_striping(env, lo);
 	RETURN(rc);
 }