Details
- Bug
- Resolution: Fixed
- Critical
- Lustre 2.6.0, Lustre 2.4.2
- 3
- 13190
Description
Lustre: 2.4.2
Kernel: 2.6.32-431.1.2
Configuration to reproduce: 2 nodes
- first node: MGT + MDT + 170 OSTs (loop devices)
- second node: client
On a file system with 170 OSTs but without "wide striping" enabled (the ea_inode feature is not set on the MDT), issuing an "lfs setstripe -c -1 <file>" command and then writing to this file causes an MDS crash.
The "lfs setstripe" command fails with ENOSPC, but the file is still created:
# lfs setstripe -c -1 /fs_pv/170_stripe_file
error on ioctl 0x4008669a for '/fs_pv/170_stripe_file' (3): No space left on device
error: setstripe: create stripe file '/fs_pv/170_stripe_file' failed
# ls -l /fs_pv/
total 0
-rw-r--r-- 1 root root 0 Mar 20 14:17 170_stripe_file
after "lfs setstripe" command, the dmesg content on MDS is the following:
# dmesg
Lustre: 11776:0:(osd_handler.c:833:osd_trans_start()) fs_pv-MDT0000: too many transaction credits (2424 > 2048)
Lustre: 11776:0:(osd_handler.c:840:osd_trans_start()) create: 0/0, delete: 0/0, destroy: 0/0
Lustre: 11776:0:(osd_handler.c:845:osd_trans_start()) attr_set: 0/0, xattr_set: 2/28
Lustre: 11776:0:(osd_handler.c:852:osd_trans_start()) write: 171/2394, punch: 0/0, quota 2/2
Lustre: 11776:0:(osd_handler.c:857:osd_trans_start()) insert: 0/0, delete: 0/0
Lustre: 11776:0:(osd_handler.c:862:osd_trans_start()) ref_add: 0/0, ref_del: 0/0
Pid: 11776, comm: mdt01_005
Call Trace:
[<ffffffffa03a5895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[<ffffffffa0bb131e>] osd_trans_start+0x65e/0x680 [osd_ldiskfs]
[<ffffffffa0cd8309>] lod_trans_start+0x1b9/0x250 [lod]
[<ffffffffa084b357>] mdd_trans_start+0x17/0x20 [mdd]
[<ffffffffa083b0b9>] mdd_create_data+0x539/0x7d0 [mdd]
[<ffffffffa0c4beac>] mdt_finish_open+0x125c/0x1950 [mdt]
[<ffffffffa0c47778>] ? mdt_object_open_lock+0x1c8/0x510 [mdt]
[<ffffffffa0c4ca56>] mdt_open_by_fid_lock+0x4b6/0x7d0 [mdt]
[<ffffffffa0c4d5cb>] mdt_reint_open+0x56b/0x21d0 [mdt]
[<ffffffffa03c283e>] ? upcall_cache_get_entry+0x28e/0x860 [libcfs]
[<ffffffffa06fcdbc>] ? lustre_msg_add_version+0x6c/0xc0 [ptlrpc]
[<ffffffffa0592240>] ? lu_ucred+0x20/0x30 [obdclass]
[<ffffffffa0c18015>] ? mdt_ucred+0x15/0x20 [mdt]
[<ffffffffa0c342ec>] ? mdt_root_squash+0x2c/0x410 [mdt]
[<ffffffffa0724636>] ? __req_capsule_get+0x166/0x700 [ptlrpc]
[<ffffffffa0592240>] ? lu_ucred+0x20/0x30 [obdclass]
[<ffffffffa0c38aa1>] mdt_reint_rec+0x41/0xe0 [mdt]
[<ffffffffa0c1dc73>] mdt_reint_internal+0x4c3/0x780 [mdt]
[<ffffffffa0c1e1fd>] mdt_intent_reint+0x1ed/0x520 [mdt]
[<ffffffffa0c1c0ae>] mdt_intent_policy+0x39e/0x720 [mdt]
[<ffffffffa06b4831>] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
[<ffffffffa06db1df>] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
[<ffffffffa0c1c536>] mdt_enqueue+0x46/0xe0 [mdt]
[<ffffffffa0c22c27>] mdt_handle_common+0x647/0x16d0 [mdt]
[<ffffffffa06fdb9c>] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
[<ffffffffa0c5c835>] mds_regular_handle+0x15/0x20 [mdt]
[<ffffffffa070d3b8>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
[<ffffffffa03a65de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
[<ffffffffa03b7d9f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
[<ffffffffa0704719>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
[<ffffffff81058bd3>] ? __wake_up+0x53/0x70
[<ffffffffa070e74e>] ptlrpc_main+0xace/0x1700 [ptlrpc]
[<ffffffffa070dc80>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
[<ffffffff8100c20a>] child_rip+0xa/0x20
[<ffffffffa070dc80>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
[<ffffffffa070dc80>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
[<ffffffff8100c200>] ? child_rip+0x0/0x20
When trying to write to the file, the write command hangs and the MDS crashes:
# echo Hello > /fs_pv/170_stripe_file
After the MDS is restarted, the crash trace is the following:
crash> bt
PID: 5428  TASK: ffff88031ae66ac0  CPU: 5  COMMAND: "mdt01_001"
 #0 [ffff880315645738] machine_kexec at ffffffff8103915b
 #1 [ffff880315645798] crash_kexec at ffffffff810c5e62
 #2 [ffff880315645868] panic at ffffffff815280aa
 #3 [ffff8803156458e8] lbug_with_loc at ffffffffa03a5eeb [libcfs]
 #4 [ffff880315645908] lod_ah_init at ffffffffa0cee9ef [lod]
 #5 [ffff880315645968] mdd_object_make_hint at ffffffffa082ea83 [mdd]
 #6 [ffff880315645998] mdd_create_data at ffffffffa083aeb2 [mdd]
 #7 [ffff8803156459f8] mdt_finish_open at ffffffffa0c4beac [mdt]
 #8 [ffff880315645a88] mdt_reint_open at ffffffffa0c4e046 [mdt]
 #9 [ffff880315645b78] mdt_reint_rec at ffffffffa0c38aa1 [mdt]
#10 [ffff880315645b98] mdt_reint_internal at ffffffffa0c1dc73 [mdt]
#11 [ffff880315645bd8] mdt_intent_reint at ffffffffa0c1e1fd [mdt]
#12 [ffff880315645c28] mdt_intent_policy at ffffffffa0c1c0ae [mdt]
#13 [ffff880315645c68] ldlm_lock_enqueue at ffffffffa06b4831 [ptlrpc]
#14 [ffff880315645cc8] ldlm_handle_enqueue0 at ffffffffa06db1df [ptlrpc]
#15 [ffff880315645d38] mdt_enqueue at ffffffffa0c1c536 [mdt]
#16 [ffff880315645d58] mdt_handle_common at ffffffffa0c22c27 [mdt]
#17 [ffff880315645da8] mds_regular_handle at ffffffffa0c5c835 [mdt]
#18 [ffff880315645db8] ptlrpc_server_handle_request at ffffffffa070d3b8 [ptlrpc]
#19 [ffff880315645eb8] ptlrpc_main at ffffffffa070e74e [ptlrpc]
#20 [ffff880315645f48] kernel_thread at ffffffff8100c20a
crash> log | tail -80
LustreError: 5428:0:(lod_object.c:704:lod_ah_init()) ASSERTION( lc->ldo_stripenr == 0 ) failed:
LustreError: 5428:0:(lod_object.c:704:lod_ah_init()) LBUG
Pid: 5428, comm: mdt01_001
Call Trace:
[<ffffffffa03a5895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[<ffffffffa03a5e97>] lbug_with_loc+0x47/0xb0 [libcfs]
[<ffffffffa0cee9ef>] lod_ah_init+0x57f/0x5c0 [lod]
[<ffffffffa082ea83>] mdd_object_make_hint+0x83/0xa0 [mdd]
[<ffffffffa083aeb2>] mdd_create_data+0x332/0x7d0 [mdd]
[<ffffffffa0c4beac>] mdt_finish_open+0x125c/0x1950 [mdt]
[<ffffffffa0c47778>] ? mdt_object_open_lock+0x1c8/0x510 [mdt]
[<ffffffffa0c4e046>] mdt_reint_open+0xfe6/0x21d0 [mdt]
[<ffffffffa03c283e>] ? upcall_cache_get_entry+0x28e/0x860 [libcfs]
[<ffffffffa06fcdbc>] ? lustre_msg_add_version+0x6c/0xc0 [ptlrpc]
[<ffffffffa0c38aa1>] mdt_reint_rec+0x41/0xe0 [mdt]
[<ffffffffa0c1dc73>] mdt_reint_internal+0x4c3/0x780 [mdt]
[<ffffffffa0c1e1fd>] mdt_intent_reint+0x1ed/0x520 [mdt]
[<ffffffffa0c1c0ae>] mdt_intent_policy+0x39e/0x720 [mdt]
[<ffffffffa06b4831>] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
[<ffffffffa06db1df>] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
[<ffffffffa0c1c536>] mdt_enqueue+0x46/0xe0 [mdt]
[<ffffffffa0c22c27>] mdt_handle_common+0x647/0x16d0 [mdt]
[<ffffffffa06fdb9c>] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
[<ffffffffa0c5c835>] mds_regular_handle+0x15/0x20 [mdt]
[<ffffffffa070d3b8>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
[<ffffffffa03a65de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
[<ffffffffa03b7d9f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
[<ffffffffa0704719>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
[<ffffffff81058bd3>] ? __wake_up+0x53/0x70
[<ffffffffa070e74e>] ptlrpc_main+0xace/0x1700 [ptlrpc]
[<ffffffffa070dc80>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
[<ffffffff8100c20a>] child_rip+0xa/0x20
[<ffffffffa070dc80>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
[<ffffffffa070dc80>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
[<ffffffff8100c200>] ? child_rip+0x0/0x20
Kernel panic - not syncing: LBUG
Pid: 5428, comm: mdt01_001 Not tainted 2.6.32-431.1.2.el6.Bull.44.x86_64 #1
Call Trace:
[<ffffffff815280a3>] ? panic+0xa7/0x16f
[<ffffffffa03a5eeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
[<ffffffffa0cee9ef>] ? lod_ah_init+0x57f/0x5c0 [lod]
[<ffffffffa082ea83>] ? mdd_object_make_hint+0x83/0xa0 [mdd]
[<ffffffffa083aeb2>] ? mdd_create_data+0x332/0x7d0 [mdd]
[<ffffffffa0c4beac>] ? mdt_finish_open+0x125c/0x1950 [mdt]
[<ffffffffa0c47778>] ? mdt_object_open_lock+0x1c8/0x510 [mdt]
[<ffffffffa0c4e046>] ? mdt_reint_open+0xfe6/0x21d0 [mdt]
[<ffffffffa03c283e>] ? upcall_cache_get_entry+0x28e/0x860 [libcfs]
[<ffffffffa06fcdbc>] ? lustre_msg_add_version+0x6c/0xc0 [ptlrpc]
[<ffffffffa0c38aa1>] ? mdt_reint_rec+0x41/0xe0 [mdt]
[<ffffffffa0c1dc73>] ? mdt_reint_internal+0x4c3/0x780 [mdt]
[<ffffffffa0c1e1fd>] ? mdt_intent_reint+0x1ed/0x520 [mdt]
[<ffffffffa0c1c0ae>] ? mdt_intent_policy+0x39e/0x720 [mdt]
[<ffffffffa06b4831>] ? ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
[<ffffffffa06db1df>] ? ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
[<ffffffffa0c1c536>] ? mdt_enqueue+0x46/0xe0 [mdt]
[<ffffffffa0c22c27>] ? mdt_handle_common+0x647/0x16d0 [mdt]
[<ffffffffa06fdb9c>] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
[<ffffffffa0c5c835>] ? mds_regular_handle+0x15/0x20 [mdt]
[<ffffffffa070d3b8>] ? ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
[<ffffffffa03a65de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
[<ffffffffa03b7d9f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
[<ffffffffa0704719>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
[<ffffffff81058bd3>] ? __wake_up+0x53/0x70
[<ffffffffa070e74e>] ? ptlrpc_main+0xace/0x1700 [ptlrpc]
[<ffffffffa070dc80>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
[<ffffffff8100c20a>] ? child_rip+0xa/0x20
[<ffffffffa070dc80>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
[<ffffffffa070dc80>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
[<ffffffff8100c200>] ? child_rip+0x0/0x20
crash>
After the MGT, MDT and OSTs are mounted again, the hung "echo" command on the client completes, and the file content is correct:
# cat /fs_pv/170_stripe_file
Hello
The content of dmesg on the client is then the following:
Lustre: 4376:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1395328613/real 1395328613] req@ffff8801b95afc00 x1463099919122744/t0(0) o101->fs_pv-MDT0000-mdc-ffff8801bd642400@10.1.0.15@o2ib:12/10 lens 584/1136 e 0 to 1 dl 1395328620 ref 2 fl Rpc:XP/0/ffffffff rc 0/-1
Lustre: 4376:0:(client.c:1868:ptlrpc_expire_one_request()) Skipped 1480 previous similar messages
Lustre: fs_pv-MDT0000-mdc-ffff8801bd642400: Connection to fs_pv-MDT0000 (at 10.1.0.15@o2ib) was lost; in progress operations using this service will wait for recovery to complete
Lustre: Skipped 41 previous similar messages
Lustre: fs_pv-OST00a9-osc-ffff8801bd642400: Connection to fs_pv-OST00a9 (at 10.1.0.15@o2ib) was lost; in progress operations using this service will wait for recovery to complete
LustreError: 166-1: MGC10.1.0.15@o2ib: Connection to MGS (at 10.1.0.15@o2ib) was lost; in progress operations using this service will fail
Lustre: Skipped 169 previous similar messages
LNetError: 6581:0:(o2iblnd_cb.c:3012:kiblnd_check_txs_locked()) Timed out tx: tx_queue, 8 seconds
LNetError: 6581:0:(o2iblnd_cb.c:3075:kiblnd_check_conns()) Timed out RDMA with 10.1.0.15@o2ib (58): c: 0, oc: 0, rc: 8
Lustre: Evicted from MGS (at 10.1.0.15@o2ib) after server handle changed from 0x4ece3e4b34440eb4 to 0x221c3affca3337c9
Lustre: MGC10.1.0.15@o2ib: Connection restored to MGS (at 10.1.0.15@o2ib)
Lustre: Skipped 11 previous similar messages
Lustre: fs_pv-OST0002-osc-ffff8801bd642400: Connection restored to fs_pv-OST0002 (at 10.1.0.15@o2ib)
Lustre: fs_pv-OST000c-osc-ffff8801bd642400: Connection restored to fs_pv-OST000c (at 10.1.0.15@o2ib)
Lustre: Skipped 90 previous similar messages
As this ASSERT is the same as the one described in LU-4260, we backported the corresponding patch to Lustre 2.4.2, but it does not fix the problem.
That patch adds a call to lod_object_free_striping() when lod_fld_lookup() fails in lod_generate_and_set_lovea().
Adding another call to lod_object_free_striping() a few lines below in the same routine, for the case where dt_xattr_set() fails, fixes the problem.
The "No space left on device" error message still appears when running "lfs setstripe -c -1", but the MDS no longer crashes.
Perhaps the next step would be to modify "lfs setstripe" so that it caps the stripe count at 160 when the requested value is larger and "ea_inode" is not set.
But the additional call to lod_object_free_striping() should be kept in any case, as dt_xattr_set() could fail for other reasons.
The change we added on top of the LU-4260 patch is the following:
--- a/lustre/lod/lod_lov.c
+++ b/lustre/lod/lod_lov.c
@@ -562,6 +562,8 @@ int lod_generate_and_set_lovea(const str
 	info->lti_buf.lb_len = lmm_size;
 	rc = dt_xattr_set(env, next, &info->lti_buf, XATTR_NAME_LOV, 0, th, BYPASS_CAPA);
+	if (rc < 0)
+		lod_object_free_striping(env, lo);
 	RETURN(rc);
 }