Details
Bug
Resolution: Fixed
Critical
Lustre 2.4.0, Lustre 2.5.0
3
6755
Description
This could probably be reproduced with racer given enough runs, but I can reproduce it directly as follows:
# llmount.sh
# mount n@tcp:/lustre /mnt/lustre2 -t lustre
# (cd /mnt/lustre; while true; do lfs setstripe -c 1 f0; done) &
# (cd /mnt/lustre2; while true; do mv f0 f1; done) &

Message from syslogd@n at Feb 8 15:36:51 ...
 kernel:LustreError: 3186:0:(lod_lov.c:782:lod_load_striping()) ASSERTION( lo->ldo_stripe[i] ) failed: stripe 0 is NULL

Message from syslogd@n at Feb 8 15:36:51 ...
 kernel:LustreError: 3186:0:(lod_lov.c:782:lod_load_striping()) LBUG

Message from syslogd@n at Feb 8 15:36:51 ...
 kernel:Kernel panic - not syncing: LBUG
Here is the backtrace of the rename handler from the crash dump:
crash> bt -l
PID: 13628  TASK: ffff8800a98a1540  CPU: 1  COMMAND: "mdt00_001"
 #0 [ffff8800a98a3828] machine_kexec at ffffffff81031f7b
    /usr/src/debug/kernel-2.6.32-279.19.1.el6/linux-2.6.32-279.19.1.el6.x86_64/arch/x86/kernel/machine_kexec_64.c: 336
 #1 [ffff8800a98a3888] crash_kexec at ffffffff810b8c22
    /usr/src/debug/kernel-2.6.32-279.19.1.el6/linux-2.6.32-279.19.1.el6.x86_64/kernel/kexec.c: 1106
 #2 [ffff8800a98a3958] panic at ffffffff814e9818
    /usr/src/debug/kernel-2.6.32-279.19.1.el6/linux-2.6.32-279.19.1.el6.x86_64/kernel/panic.c: 103
 #3 [ffff8800a98a39d8] lbug_with_loc at ffffffffa0595eeb [libcfs]
    /root/lustre-release/libcfs/libcfs/linux/linux-debug.c: 188
 #4 [ffff8800a98a39f8] lod_load_striping at ffffffffa0e199f3 [lod]
    /root/lustre-release/lustre/lod/lod_internal.h: 255
 #5 [ffff8800a98a3a38] lod_declare_attr_set at ffffffffa0e25fbb [lod]
    /root/lustre-release/lustre/lod/lod_object.c: 300
 #6 [ffff8800a98a3a88] mdd_rename at ffffffffa0beb6d8 [mdd]
    /root/lustre-release/lustre/mdd/mdd_dir.c: 2087
 #7 [ffff8800a98a3ba8] mdt_reint_rename at ffffffffa0d54617 [mdt]
    /root/lustre-release/lustre/mdt/mdt_reint.c: 1270
 #8 [ffff8800a98a3cc8] mdt_reint_rec at ffffffffa0d506b1 [mdt]
    /root/lustre-release/libcfs/include/libcfs/libcfs_debug.h: 211
 #9 [ffff8800a98a3ce8] mdt_reint_internal at ffffffffa0d49d13 [mdt]
    /root/lustre-release/libcfs/include/libcfs/libcfs_debug.h: 211
#10 [ffff8800a98a3d28] mdt_reint at ffffffffa0d4a044 [mdt]
    /root/lustre-release/lustre/mdt/mdt_handler.c: 1818
#11 [ffff8800a98a3d48] mdt_handle_common at ffffffffa0d3afb8 [mdt]
    /root/lustre-release/lustre/mdt/mdt_handler.c: 2981
#12 [ffff8800a98a3d98] mds_regular_handle at ffffffffa0d725f5 [mdt]
    /root/lustre-release/lustre/mdt/mdt_mds.c: 354
#13 [ffff8800a98a3da8] ptlrpc_server_handle_request at ffffffffa08e9c7c [ptlrpc]
    /root/lustre-release/lustre/include/lustre_net.h: 2771
#14 [ffff8800a98a3ea8] ptlrpc_main at ffffffffa08eb1c6 [ptlrpc]
    /root/lustre-release/lustre/ptlrpc/service.c: 2487
#15 [ffff8800a98a3f48] kernel_thread at ffffffff8100c0ca
    /usr/src/debug///////kernel-2.6.32-279.19.1.el6/linux-2.6.32-279.19.1.el6.x86_64/arch/x86/kernel/entry_64.S: 1213
At the same time, lfs setstripe is in ioctl(), and its mdt_reint_open() handler is in:
mdt_reint_open() ... mdt_create_data() ... lod_declare_xattr_set() ... osp_precreate_reserve()
Issue Links
is related to
  LU-2523 ll_update_inode()) ASSERTION( lu_fid_eq(&lli->lli_fid, &body->fid1) ) failed: Trying to change FID (Resolved)
  LU-4083 lod_lov.c:824:lod_load_striping()) ASSERTION( lo->ldo_stripenr == 0 ) failed (Resolved)
  LU-4260 ASSERTION( lc->ldo_stripenr == 0 ) failed: (Resolved)
  LU-3059 shrink lod_object to 128 bytes (Resolved)
  LU-3072 add more operations to racer (Closed)
I think that http://review.whamcloud.com/#/c/7919 resolves the race condition in the unlink and rename paths, but isn't http://review.whamcloud.com/#/c/7223/3 still necessary for the setattr case? Once 7223 has landed, will this bug be ready to close?
7223 was rejected on the grounds that the MDT layer should not look into LOD internals, but doesn't 7919 do the same thing? Could we land 7223 as a short-term solution for the setattr case?