[LU-10335] Ubuntu1604 client sanity-130a: FAIL: filefrag -ves core dumped Created: 06/Dec/17 Updated: 27/May/20 Resolved: 10/Aug/18 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.11.0, Lustre 2.10.2, Lustre 2.10.4 |
| Fix Version/s: | Lustre 2.12.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Sarah Liu | Assignee: | Sarah Liu |
| Resolution: | Won't Do | Votes: | 0 |
| Labels: | ubuntu | ||
| Environment: |
server: 2.10.2 RC1 |
||
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 | ||||||||||||||||
| Description |
|
Here is the Maloo link: https://testing.hpdd.intel.com/test_sets/cad15292-db78-11e7-9c63-52540065bddc
test_130a console:
== sanity test 130a: FIEMAP (1-stripe file) ========================================================== 01:03:11 (1512522191)
1+0 records in
1+0 records out
65536 bytes (66 kB, 64 KiB) copied, 0.00117857 s, 55.6 MB/s
Filesystem type is: bd00bd0
File size of /mnt/lustre/f130a.sanity is 65536 (16 blocks of 4096 bytes)
ext: logical_offset: physical_offset: length: expected: flags:
/usr/lib64/lustre/tests/sanity.sh: line 9256: 25375 Aborted (core dumped) filefrag -ves $fm_file
sanity test_130a: @@@@@@ FAIL: filefrag /mnt/lustre/f130a.sanity failed
test_130b/c/e:
== sanity test 130e: FIEMAP (test continuation FIEMAP calls) ========================================= 01:03:25 (1512522205)
/mnt/lustre/f130e.sanity: FIBMAP unsupported
Filesystem type is: bd00bd0
File size of /mnt/lustre/f130e.sanity is 67043328 (16368 blocks of 4096 bytes) |
| Comments |
| Comment by Andreas Dilger [ 06/Dec/17 ] |
|
This looks like the error message reported in
*** buffer overflow detected ***: filefrag terminated
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7ff72cd1c7e5]
/lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x5c)[0x7ff72cdbe11c]
/lib/x86_64-linux-gnu/libc.so.6(+0x117120)[0x7ff72cdbc120]
/lib/x86_64-linux-gnu/libc.so.6(+0x116689)[0x7ff72cdbb689]
/lib/x86_64-linux-gnu/libc.so.6(_IO_default_xsputn+0x80)[0x7ff72cd206b0]
/lib/x86_64-linux-gnu/libc.so.6(_IO_vfprintf+0xc90)[0x7ff72ccf2e00]
/lib/x86_64-linux-gnu/libc.so.6(__vsprintf_chk+0x84)[0x7ff72cdbb714]
/lib/x86_64-linux-gnu/libc.so.6(__sprintf_chk+0x7d)[0x7ff72cdbb66d]
filefrag[0x4018c8]
filefrag[0x401ccf]
filefrag[0x4012f5]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7ff72ccc5830]
filefrag[0x4015c9]
======= Memory map: ========
What version of e2fsprogs is installed on the Ubuntu client? |
| Comment by Andreas Dilger [ 08/Dec/17 ] |
|
Sarah, could you please report what version of filefrag is installed on the Ubuntu client (rpm -qf $(which filefrag))? I suspect it is the unpatched filefrag, which will not be able to run test_130b/c/e, but it should be able to run test_130a without crashing. Could you (or someone) please log in to an Ubuntu client and run filefrag under gdb to get a stack trace? You may need to install the e2fsprogs-debug RPM to get useful information from the crash. |
| Comment by Sarah Liu [ 11/Dec/17 ] |
|
Hi Andreas, I just checked, and there is no filefrag installed on the Ubuntu client:
root@onyx-21vm3:~# dpkg -l|grep -i filefra
root@onyx-21vm3:~# dpkg -l|grep -i e2fsprogs
ii e2fsprogs 1.42.13-1ubuntu1 amd64 ext2/ext3/ext4 file system utilities |
| Comment by Andreas Dilger [ 12/Dec/17 ] |
|
The filefrag program is part of e2fsprogs. |
| Comment by Andreas Dilger [ 29/Jan/18 ] |
|
Sarah, could you please run filefrag under gdb and collect the stack trace. |
| Comment by Sarah Liu [ 29/Jan/18 ] |
|
OK, I will update the ticket when I get the trace. |
| Comment by Sarah Liu [ 07/Feb/18 ] |
|
This is what I got:
(gdb) set args -ves /mnt/lustre/file-130a
(gdb) run
Starting program: /usr/sbin/filefrag -ves /mnt/lustre/file-130a
Filesystem type is: bd00bd0
File size of /mnt/lustre/file-130a is 65536 (16 blocks of 4096 bytes)
ext: logical_offset: physical_offset: length: expected: flags:
*** buffer overflow detected ***: /usr/sbin/filefrag terminated
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7ffff7a847e5]
/lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x5c)[0x7ffff7b2615c]
/lib/x86_64-linux-gnu/libc.so.6(+0x117160)[0x7ffff7b24160]
/lib/x86_64-linux-gnu/libc.so.6(+0x1166c9)[0x7ffff7b236c9]
/lib/x86_64-linux-gnu/libc.so.6(_IO_default_xsputn+0x80)[0x7ffff7a886b0]
/lib/x86_64-linux-gnu/libc.so.6(_IO_vfprintf+0xc90)[0x7ffff7a5ae00]
/lib/x86_64-linux-gnu/libc.so.6(__vsprintf_chk+0x84)[0x7ffff7b23754]
/lib/x86_64-linux-gnu/libc.so.6(__sprintf_chk+0x7d)[0x7ffff7b236ad]
/usr/sbin/filefrag[0x4018c8]
/usr/sbin/filefrag[0x401ccf]
/usr/sbin/filefrag[0x4012f5]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7ffff7a2d830]
/usr/sbin/filefrag[0x4015c9]
======= Memory map: ========
00400000-00403000 r-xp 00000000 fd:01 2231256 /usr/sbin/filefrag
00602000-00603000 r--p 00002000 fd:01 2231256 /usr/sbin/filefrag
00603000-00604000 rw-p 00003000 fd:01 2231256 /usr/sbin/filefrag
00604000-00626000 rw-p 00000000 00:00 0 [heap]
7ffff77f7000-7ffff780d000 r-xp 00000000 fd:01 262665 /lib/x86_64-linux-gnu/libgcc_s.so.1
7ffff780d000-7ffff7a0c000 ---p 00016000 fd:01 262665 /lib/x86_64-linux-gnu/libgcc_s.so.1
7ffff7a0c000-7ffff7a0d000 rw-p 00015000 fd:01 262665 /lib/x86_64-linux-gnu/libgcc_s.so.1
7ffff7a0d000-7ffff7bcd000 r-xp 00000000 fd:01 266956 /lib/x86_64-linux-gnu/libc-2.23.so
7ffff7bcd000-7ffff7dcd000 ---p 001c0000 fd:01 266956 /lib/x86_64-linux-gnu/libc-2.23.so
7ffff7dcd000-7ffff7dd1000 r--p 001c0000 fd:01 266956 /lib/x86_64-linux-gnu/libc-2.23.so
7ffff7dd1000-7ffff7dd3000 rw-p 001c4000 fd:01 266956 /lib/x86_64-linux-gnu/libc-2.23.so
7ffff7dd3000-7ffff7dd7000 rw-p 00000000 00:00 0
7ffff7dd7000-7ffff7dfd000 r-xp 00000000 fd:01 266954 /lib/x86_64-linux-gnu/ld-2.23.so
7ffff7fea000-7ffff7fed000 rw-p 00000000 00:00 0
7ffff7ff6000-7ffff7ff7000 rw-p 00000000 00:00 0
7ffff7ff7000-7ffff7ffa000 r--p 00000000 00:00 0 [vvar]
7ffff7ffa000-7ffff7ffc000 r-xp 00000000 00:00 0 [vdso]
7ffff7ffc000-7ffff7ffd000 r--p 00025000 fd:01 266954 /lib/x86_64-linux-gnu/ld-2.23.so
7ffff7ffd000-7ffff7ffe000 rw-p 00026000 fd:01 266954 /lib/x86_64-linux-gnu/ld-2.23.so
7ffff7ffe000-7ffff7fff000 rw-p 00000000 00:00 0
7ffffffde000-7ffffffff000 rw-p 00000000 00:00 0 [stack]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
Program received signal SIGABRT, Aborted.
0x00007ffff7a42428 in __GI_raise (sig=sig@entry=6)
at ../sysdeps/unix/sysv/linux/raise.c:54
54 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) backtrace
#0 0x00007ffff7a42428 in __GI_raise (sig=sig@entry=6)
at ../sysdeps/unix/sysv/linux/raise.c:54
#1 0x00007ffff7a4402a in __GI_abort () at abort.c:89
#2 0x00007ffff7a847ea in __libc_message (do_abort=do_abort@entry=2,
fmt=fmt@entry=0x7ffff7b9c49f "*** %s ***: %s terminated\n")
at ../sysdeps/posix/libc_fatal.c:175
#3 0x00007ffff7b2615c in __GI___fortify_fail (msg=<optimized out>,
msg@entry=0x7ffff7b9c430 "buffer overflow detected") at fortify_fail.c:37
#4 0x00007ffff7b24160 in __GI___chk_fail () at chk_fail.c:28
#5 0x00007ffff7b236c9 in _IO_str_chk_overflow (fp=<optimized out>,
c=<optimized out>) at vsprintf_chk.c:31
#6 0x00007ffff7a886b0 in __GI__IO_default_xsputn (f=0x7fffffffa580,
data=<optimized out>, n=8) at genops.c:455
#7 0x00007ffff7a5ae00 in _IO_vfprintf_internal (s=s@entry=0x7fffffffa580,
format=<optimized out>, format@entry=0x402124 "%#04x,",
ap=ap@entry=0x7fffffffa6b8) at vfprintf.c:1631
#8 0x00007ffff7b23754 in ___vsprintf_chk (s=0x7fffffffa7c0 "0x800", flags=1,
slen=6, format=0x402124 "%#04x,", args=args@entry=0x7fffffffa6b8)
at vsprintf_chk.c:82
#9 0x00007ffff7b236ad in ___sprintf_chk (s=<optimized out>,
flags=<optimized out>, slen=<optimized out>, format=<optimized out>)
at sprintf_chk.c:31
#10 0x00000000004018c8 in ?? ()
#11 0x0000000000401ccf in ?? ()
#12 0x00000000004012f5 in ?? ()
#13 0x00007ffff7a2d830 in __libc_start_main (main=0x400b00, argc=3,
argv=0x7fffffffec08, init=<optimized out>, fini=<optimized out>,
rtld_fini=<optimized out>, stack_end=0x7fffffffebf8)
at ../csu/libc-start.c:291
#14 0x00000000004015c9 in ?? ()
(gdb)
|
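Frames #8 and #9 above show __sprintf_chk() being called with slen=6 and the format "%#04x,", i.e. glibc's fortified sprintf was told the destination holds only 6 bytes. As a minimal sketch (the helper name format_flag_truncated is hypothetical, not part of filefrag), the same format can be written with snprintf() into a buffer of the same size: snprintf() truncates instead of aborting, and its return value is the length the full string would have had.

```c
#include <stdio.h>

/*
 * Hypothetical helper: formats mask with the same "%#04x," format that the
 * unpatched filefrag passes to sprintf(), but uses snprintf() so the output
 * is truncated to bufsize - 1 characters plus a NUL instead of overflowing.
 * Returns the length (excluding the NUL) the untruncated string would have.
 */
static int format_flag_truncated(char *buf, size_t bufsize, unsigned int mask)
{
    return snprintf(buf, bufsize, "%#04x,", mask);
}
```

For example, with char buf[6], format_flag_truncated(buf, sizeof(buf), 0x80000000u) returns 11 (the length of "0x80000000,") and leaves "0x800" in buf, which matches the truncated string visible in frame #8. The full string plus its NUL needs 12 bytes, twice the 6 that were available, while a small flag such as 0x80 ("0x80,") still fits.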
| Comment by Sarah Liu [ 14/Feb/18 ] |
|
I rebuilt e2fsprogs with debug symbols, but cannot hit the problem with the updated filefrag:
root@onyx-24vm1:~# file /usr/sbin/filefrag
/usr/sbin/filefrag: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=6c421f2064cfcb7aff7314dcb9db4f380e7378f0, not stripped
root@onyx-24vm1:~# gdb filefrag
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from filefrag...done.
(gdb) set args -ves /mnt/lustre/foo
(gdb) run
Starting program: /usr/sbin/filefrag -ves /mnt/lustre/foo
Filesystem type is: bd00bd0
File size of /mnt/lustre/foo is 65536 (16 blocks of 4096 bytes)
ext: logical_offset: physical_offset: length: expected: flags:
0: 0.. 15: 35346.. 35361: 16: last,0x80000000,eof
/mnt/lustre/foo: 1 extent found
[Inferior 1 (process 10604) exited normally]
(gdb) quit |
| Comment by Andreas Dilger [ 01/Mar/18 ] |
|
The problem was that the unpatched Ubuntu e2fsprogs prints unknown flags into a temporary buffer, and Lustre always sets the 0x80000000 flag to identify network-based filesystems. That flag overflows the temporary buffer, which is only 6 bytes:
    /* print any unknown flags as hex values */
    for (mask = 1; fe_flags != 0 && mask != 0; mask <<= 1) {
            char hex[6];

            if ((fe_flags & mask) == 0)
                    continue;
            sprintf(hex, "%#04x,", mask);
            print_flag(&fe_flags, mask, flags, hex);
    }
Any unknown flag would overflow this, since it would always have at least 4 hex digits, plus the leading 0x, the trailing comma, and the terminating NUL, so at least 8 bytes are written each time (0x80000000 alone needs 12). I've submitted a patch upstream for this. |
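A minimal sketch of how the loop above could avoid the overflow, assuming a 32-bit unsigned int mask: size the temporary buffer from sizeof(mask) so it fits the widest possible value, and write it with snprintf(). The helper append_unknown_flags is hypothetical; it is not the upstream e2fsprogs patch, and strncat() stands in for filefrag's print_flag(), which is not reproduced here.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: appends each set bit of fe_flags to out as "0x...,".
 * out must already hold a NUL-terminated string (for example "" or "last,")
 * and outlen must be at least 1.
 */
static void append_unknown_flags(unsigned int fe_flags, char *out, size_t outlen)
{
    unsigned int mask;

    for (mask = 1; fe_flags != 0 && mask != 0; mask <<= 1) {
        /* 2 hex digits per byte of mask, plus "0x", "," and the NUL:
         * 12 bytes for a 32-bit mask, enough for "0x80000000," */
        char hex[sizeof(mask) * 2 + 4];

        if ((fe_flags & mask) == 0)
            continue;
        snprintf(hex, sizeof(hex), "%#04x,", mask);
        fe_flags &= ~mask;              /* this bit has now been printed */
        /* bounded append: never writes past out[outlen - 1] */
        strncat(out, hex, outlen - strlen(out) - 1);
    }
}
```

For example, starting from char out[64] = "last,", append_unknown_flags(0x80000000u, out, sizeof(out)) leaves "last,0x80000000," in out. Deriving the buffer size from the type of mask, rather than a hand-picked constant, keeps it correct for every bit the loop can visit, and snprintf() plus the bounded strncat() mean a later mistake truncates the text instead of corrupting the stack.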
| Comment by Andreas Dilger [ 29/Mar/18 ] |
|
According to https://marc.info/?l=linux-ext4&m=152010285623799&w=2 the patch was landed, but was subsequently lost from the tree:
I've resubmitted the patch, and asked that it also be landed to the Debian maintenance branch so it will appear in Ubuntu. |
| Comment by Andreas Dilger [ 30/Mar/18 ] |
|
This has landed in upstream e2fsprogs for the 1.45 and 1.44.2 releases:
commit 17a1f2c1929630e3a79e6b98168d56f96acf2e8b
Author: Andreas Dilger <adilger@dilger.ca>
AuthorDate: Thu Mar 29 12:36:54 2018 -0600
Commit: Theodore Ts'o <tytso@mit.edu>
CommitDate: Thu Mar 29 23:01:19 2018 -0400
filefrag: avoid temporary buffer overflow
If an unknown flag is present in a FIEMAP extent, it is printed as a
hex value into a temporary buffer before adding it to the flags. If
that unknown flag is over 0xfff then it will overflow the temporary
buffer.
Reported-by: Sarah Liu <wei3.liu@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-10335
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Add to ALWAYS_EXCEPT for Ubuntu until they get an updated e2fsprogs release with this fix and/or we install our patched e2fsprogs. |
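To see exactly which single-bit masks are too large for the original char hex[6], a minimal sketch (the helper first_overflowing_mask is hypothetical) can ask snprintf() with a NULL buffer how long each "%#04x," string would be. By this count a 6-byte buffer first fails at 0x100, because "0x100," is 6 characters plus the NUL, while a 12-byte buffer holds every 32-bit mask, including Lustre's 0x80000000.

```c
#include <stdio.h>

/*
 * Hypothetical helper: returns the smallest single-bit 32-bit mask whose
 * "%#04x," text plus the terminating NUL does not fit in bufsize bytes,
 * or 0 if every single-bit mask fits.
 */
static unsigned int first_overflowing_mask(size_t bufsize)
{
    unsigned int mask;

    /* same iteration as filefrag: 0x1, 0x2, ... until the bit shifts out */
    for (mask = 1; mask != 0; mask <<= 1) {
        /* snprintf(NULL, 0, ...) only reports the would-be length */
        int len = snprintf(NULL, 0, "%#04x,", mask);

        if ((size_t)len + 1 > bufsize)
            return mask;
    }
    return 0;
}
```

For example, first_overflowing_mask(6) returns 0x100, first_overflowing_mask(7) returns 0x1000, and first_overflowing_mask(12) returns 0.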
| Comment by Andreas Dilger [ 08/Aug/18 ] |
|
If the above patch has been included in the Ubuntu e2fsprogs (at least for the versions we are testing), then this ALWAYS_EXCEPT can be removed. If not, can you please file a ticket with the upstream Ubuntu bug tracker asking them to backport the above patch from e2fsprogs master into their release? |
| Comment by James A Simmons [ 09/Aug/18 ] |
|
I just ran apt-get source e2fsprogs and checked for the fix you posted here. For Ubuntu 18 the fix is there, but it's missing in Ubuntu 16. Since we have Ubuntu server support for 2.12, we will need the Lustre-specific e2fsprogs anyway. |
| Comment by Andreas Dilger [ 10/Aug/18 ] |
|
The patched e2fsprogs does not have this problem, only the unpatched e2fsprogs. It is good that this is included in Ubuntu 18, and it isn't clear we can do anything about Ubuntu 16 at this point. |
| Comment by James A Simmons [ 10/Aug/18 ] |
|
Once we have patched e2fsprogs for ldiskfs support for Ubuntu this should go away. |
| Comment by Gerrit Updater [ 23/Oct/18 ] |
|
James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/33459 |
| Comment by Gerrit Updater [ 17/Nov/18 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33459/ |