[LU-10335] Ubuntu1604 client sanity-130a: FAIL: filefrag -ves core dumped
Created: 06/Dec/17  Updated: 27/May/20  Resolved: 10/Aug/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.11.0, Lustre 2.10.2, Lustre 2.10.4
Fix Version/s: Lustre 2.12.0

Type: Bug
Priority: Major
Reporter: Sarah Liu
Assignee: Sarah Liu
Resolution: Won't Do
Votes: 0
Labels: ubuntu
Environment:

server: 2.10.2 RC1
client: Ubuntu16.04


Issue Links:
Related
is related to LU-6007: FIEMAP fails xfstests's fiemap-tester (Open)
is related to LU-10997: Ubuntu 18 support (Resolved)
is related to LU-13177: add e2fsprog support for SLES15SP1 (Resolved)
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Here is the Maloo link https://testing.hpdd.intel.com/test_sets/cad15292-db78-11e7-9c63-52540065bddc

test_130a console

== sanity test 130a: FIEMAP (1-stripe file) ========================================================== 01:03:11 (1512522191)
1+0 records in
1+0 records out
65536 bytes (66 kB, 64 KiB) copied, 0.00117857 s, 55.6 MB/s
Filesystem type is: bd00bd0
File size of /mnt/lustre/f130a.sanity is 65536 (16 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
/usr/lib64/lustre/tests/sanity.sh: line 9256: 25375 Aborted                 (core dumped) filefrag -ves $fm_file
 sanity test_130a: @@@@@@ FAIL: filefrag /mnt/lustre/f130a.sanity failed

test_130b/c/e

== sanity test 130e: FIEMAP (test continuation FIEMAP calls) ========================================= 01:03:25 (1512522205)
/mnt/lustre/f130e.sanity: FIBMAP unsupported
Filesystem type is: bd00bd0
File size of /mnt/lustre/f130e.sanity is 67043328 (16368 blocks of 4096 bytes)


 Comments   
Comment by Andreas Dilger [ 06/Dec/17 ]

This looks like the error message reported in LU-10333:

*** buffer overflow detected ***: filefrag terminated
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7ff72cd1c7e5]
/lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x5c)[0x7ff72cdbe11c]
/lib/x86_64-linux-gnu/libc.so.6(+0x117120)[0x7ff72cdbc120]
/lib/x86_64-linux-gnu/libc.so.6(+0x116689)[0x7ff72cdbb689]
/lib/x86_64-linux-gnu/libc.so.6(_IO_default_xsputn+0x80)[0x7ff72cd206b0]
/lib/x86_64-linux-gnu/libc.so.6(_IO_vfprintf+0xc90)[0x7ff72ccf2e00]
/lib/x86_64-linux-gnu/libc.so.6(__vsprintf_chk+0x84)[0x7ff72cdbb714]
/lib/x86_64-linux-gnu/libc.so.6(__sprintf_chk+0x7d)[0x7ff72cdbb66d]
filefrag[0x4018c8]
filefrag[0x401ccf]
filefrag[0x4012f5]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7ff72ccc5830]
filefrag[0x4015c9]
======= Memory map: ========

What version of e2fsprogs is installed on the Ubuntu client?

Comment by Andreas Dilger [ 08/Dec/17 ]

Sarah, could you please report what version of filefrag is installed on the Ubuntu client (rpm -qf $(which filefrag))? I suspect it is the unpatched filefrag, which will not be able to run test_130b/c/e, but it should be able to run test_130a without crashing.

Could you (or someone) please log in to an Ubuntu client and run filefrag under gdb to get a stack trace? You may need to install the e2fsprogs-debug RPM to get useful information from the crash.

Comment by Sarah Liu [ 11/Dec/17 ]

Hi Andreas,

I just checked, and there is no filefrag package installed on the Ubuntu client, only e2fsprogs:

root@onyx-21vm3:~# dpkg -l|grep -i filefra 
root@onyx-21vm3:~# dpkg -l|grep -i e2fsprogs
ii  e2fsprogs                              1.42.13-1ubuntu1                           amd64        ext2/ext3/ext4 file system utilities

Comment by Andreas Dilger [ 12/Dec/17 ]

The filefrag program is part of e2fsprogs.

Comment by Andreas Dilger [ 29/Jan/18 ]

Sarah, could you please run filefrag under gdb and collect the stack trace?

Comment by Sarah Liu [ 29/Jan/18 ]

Ok, I will update the ticket when I get the trace

Comment by Sarah Liu [ 07/Feb/18 ]

This is what I got:

(gdb) set args -ves /mnt/lustre/file-130a 
(gdb) run
Starting program: /usr/sbin/filefrag -ves /mnt/lustre/file-130a 
Filesystem type is: bd00bd0
File size of /mnt/lustre/file-130a is 65536 (16 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
*** buffer overflow detected ***: /usr/sbin/filefrag terminated
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7ffff7a847e5]
/lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x5c)[0x7ffff7b2615c]
/lib/x86_64-linux-gnu/libc.so.6(+0x117160)[0x7ffff7b24160]
/lib/x86_64-linux-gnu/libc.so.6(+0x1166c9)[0x7ffff7b236c9]
/lib/x86_64-linux-gnu/libc.so.6(_IO_default_xsputn+0x80)[0x7ffff7a886b0]
/lib/x86_64-linux-gnu/libc.so.6(_IO_vfprintf+0xc90)[0x7ffff7a5ae00]
/lib/x86_64-linux-gnu/libc.so.6(__vsprintf_chk+0x84)[0x7ffff7b23754]
/lib/x86_64-linux-gnu/libc.so.6(__sprintf_chk+0x7d)[0x7ffff7b236ad]
/usr/sbin/filefrag[0x4018c8]
/usr/sbin/filefrag[0x401ccf]
/usr/sbin/filefrag[0x4012f5]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7ffff7a2d830]
/usr/sbin/filefrag[0x4015c9]
======= Memory map: ========
00400000-00403000 r-xp 00000000 fd:01 2231256                            /usr/sbin/filefrag
00602000-00603000 r--p 00002000 fd:01 2231256                            /usr/sbin/filefrag
00603000-00604000 rw-p 00003000 fd:01 2231256                            /usr/sbin/filefrag
00604000-00626000 rw-p 00000000 00:00 0                                  [heap]
7ffff77f7000-7ffff780d000 r-xp 00000000 fd:01 262665                     /lib/x86_64-linux-gnu/libgcc_s.so.1
7ffff780d000-7ffff7a0c000 ---p 00016000 fd:01 262665                     /lib/x86_64-linux-gnu/libgcc_s.so.1
7ffff7a0c000-7ffff7a0d000 rw-p 00015000 fd:01 262665                     /lib/x86_64-linux-gnu/libgcc_s.so.1
7ffff7a0d000-7ffff7bcd000 r-xp 00000000 fd:01 266956                     /lib/x86_64-linux-gnu/libc-2.23.so
7ffff7bcd000-7ffff7dcd000 ---p 001c0000 fd:01 266956                     /lib/x86_64-linux-gnu/libc-2.23.so
7ffff7dcd000-7ffff7dd1000 r--p 001c0000 fd:01 266956                     /lib/x86_64-linux-gnu/libc-2.23.so
7ffff7dd1000-7ffff7dd3000 rw-p 001c4000 fd:01 266956                     /lib/x86_64-linux-gnu/libc-2.23.so
7ffff7dd3000-7ffff7dd7000 rw-p 00000000 00:00 0 
7ffff7dd7000-7ffff7dfd000 r-xp 00000000 fd:01 266954                     /lib/x86_64-linux-gnu/ld-2.23.so
7ffff7fea000-7ffff7fed000 rw-p 00000000 00:00 0 
7ffff7ff6000-7ffff7ff7000 rw-p 00000000 00:00 0 
7ffff7ff7000-7ffff7ffa000 r--p 00000000 00:00 0                          [vvar]
7ffff7ffa000-7ffff7ffc000 r-xp 00000000 00:00 0                          [vdso]
7ffff7ffc000-7ffff7ffd000 r--p 00025000 fd:01 266954                     /lib/x86_64-linux-gnu/ld-2.23.so
7ffff7ffd000-7ffff7ffe000 rw-p 00026000 fd:01 266954                     /lib/x86_64-linux-gnu/ld-2.23.so
7ffff7ffe000-7ffff7fff000 rw-p 00000000 00:00 0 
7ffffffde000-7ffffffff000 rw-p 00000000 00:00 0                          [stack]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

Program received signal SIGABRT, Aborted.
0x00007ffff7a42428 in __GI_raise (sig=sig@entry=6)
    at ../sysdeps/unix/sysv/linux/raise.c:54
54	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) backtrace
#0  0x00007ffff7a42428 in __GI_raise (sig=sig@entry=6)
    at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00007ffff7a4402a in __GI_abort () at abort.c:89
#2  0x00007ffff7a847ea in __libc_message (do_abort=do_abort@entry=2, 
    fmt=fmt@entry=0x7ffff7b9c49f "*** %s ***: %s terminated\n")
    at ../sysdeps/posix/libc_fatal.c:175
#3  0x00007ffff7b2615c in __GI___fortify_fail (msg=<optimized out>, 
    msg@entry=0x7ffff7b9c430 "buffer overflow detected") at fortify_fail.c:37
#4  0x00007ffff7b24160 in __GI___chk_fail () at chk_fail.c:28
#5  0x00007ffff7b236c9 in _IO_str_chk_overflow (fp=<optimized out>, 
    c=<optimized out>) at vsprintf_chk.c:31
#6  0x00007ffff7a886b0 in __GI__IO_default_xsputn (f=0x7fffffffa580, 
    data=<optimized out>, n=8) at genops.c:455
#7  0x00007ffff7a5ae00 in _IO_vfprintf_internal (s=s@entry=0x7fffffffa580, 
    format=<optimized out>, format@entry=0x402124 "%#04x,", 
    ap=ap@entry=0x7fffffffa6b8) at vfprintf.c:1631
#8  0x00007ffff7b23754 in ___vsprintf_chk (s=0x7fffffffa7c0 "0x800", flags=1, 
    slen=6, format=0x402124 "%#04x,", args=args@entry=0x7fffffffa6b8)
    at vsprintf_chk.c:82
#9  0x00007ffff7b236ad in ___sprintf_chk (s=<optimized out>, 
    flags=<optimized out>, slen=<optimized out>, format=<optimized out>)
    at sprintf_chk.c:31
#10 0x00000000004018c8 in ?? ()
#11 0x0000000000401ccf in ?? ()
#12 0x00000000004012f5 in ?? ()
#13 0x00007ffff7a2d830 in __libc_start_main (main=0x400b00, argc=3, 
    argv=0x7fffffffec08, init=<optimized out>, fini=<optimized out>, 
    rtld_fini=<optimized out>, stack_end=0x7fffffffebf8)
    at ../csu/libc-start.c:291
#14 0x00000000004015c9 in ?? ()
(gdb) 

Comment by Sarah Liu [ 14/Feb/18 ]

I rebuilt e2fsprogs with debug symbols, but I cannot reproduce the problem with the rebuilt filefrag:

root@onyx-24vm1:~# file /usr/sbin/filefrag 
/usr/sbin/filefrag: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=6c421f2064cfcb7aff7314dcb9db4f380e7378f0, not stripped

root@onyx-24vm1:~# gdb filefrag
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from filefrag...done.
(gdb) set args -ves /mnt/lustre/foo
(gdb) run
Starting program: /usr/sbin/filefrag -ves /mnt/lustre/foo
Filesystem type is: bd00bd0
File size of /mnt/lustre/foo is 65536 (16 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..      15:      35346..     35361:     16:             last,0x80000000,eof
/mnt/lustre/foo: 1 extent found
[Inferior 1 (process 10604) exited normally]
(gdb) quit

Comment by Andreas Dilger [ 01/Mar/18 ]

The problem is that the unpatched Ubuntu e2fsprogs formats each unknown extent flag as a hex value into a temporary buffer that is only 6 bytes long. Lustre always sets the 0x80000000 flag to identify network-based filesystems, and formatting that value overflows the buffer:

        /* print any unknown flags as hex values */
        for (mask = 1; fe_flags != 0 && mask != 0; mask <<= 1) {
                char hex[6];

                if ((fe_flags & mask) == 0)
                        continue;
                sprintf(hex, "%#04x,", mask);
                print_flag(&fe_flags, mask, flags, hex);
        }

The "%#04x," format writes the leading 0x, at least 2 hex digits, a comma, and a trailing NUL, so any flag above 0xff needs at least 7 bytes and overflows the 6-byte buffer. The Lustre flag 0x80000000 needs 12 bytes. I've submitted a patch upstream for this.
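
The buffer-size arithmetic can be checked with a short C sketch. This is not the upstream patch: the helper names hex_flag_size and format_unknown_flags are hypothetical, unsigned int is assumed to be 32 bits wide, and every set bit is treated as an unknown flag. It shows one way to size the temporary buffer for the widest mask and to bound each write with snprintf:

```c
#include <stdio.h>
#include <string.h>

/* Bytes needed, including the trailing NUL, to format one mask with "%#04x,". */
size_t hex_flag_size(unsigned int mask)
{
	/* snprintf(NULL, 0, ...) returns the length the output would have. */
	int len = snprintf(NULL, 0, "%#04x,", mask);

	return len < 0 ? 0 : (size_t)len + 1;
}

/*
 * Append each set bit of fe_flags to out as a comma-terminated hex value.
 * Hypothetical helper for illustration: unlike filefrag, it does not know
 * any flag names, so every bit is printed as hex.
 * Returns 0 on success, or -1 if out is too small.
 */
int format_unknown_flags(unsigned int fe_flags, char *out, size_t out_size)
{
	unsigned int mask;
	size_t used = 0;

	if (out_size == 0)
		return -1;
	out[0] = '\0';

	for (mask = 1; fe_flags != 0 && mask != 0; mask <<= 1) {
		/* 12 bytes: room for the widest 32-bit mask plus comma and NUL. */
		char hex[sizeof("0x80000000,")];
		int len;

		if ((fe_flags & mask) == 0)
			continue;

		/* snprintf never writes more than sizeof(hex) bytes. */
		len = snprintf(hex, sizeof(hex), "%#04x,", mask);
		if (len < 0 || (size_t)len >= sizeof(hex) ||
		    used + (size_t)len >= out_size)
			return -1;

		memcpy(out + used, hex, (size_t)len + 1);
		used += (size_t)len;
		fe_flags &= ~mask;
	}
	return 0;
}
```

Under these assumptions hex_flag_size(0x10) is 6, hex_flag_size(0x100) is 7, and hex_flag_size(0x80000000) is 12, so only masks up to 0xff fit in a 6-byte buffer.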

Comment by Andreas Dilger [ 29/Mar/18 ]

According to https://marc.info/?l=linux-ext4&m=152010285623799&w=2 the patch was landed, but was subsequently lost from the tree:

List: linux-ext4
Subject: Re: [PATCH] filefrag: avoid temporary buffer overflow
From: Theodore Ts'o <tytso () mit ! edu>
Date: 2018-03-03 18:47:12
Message-ID: 20180303184712.GA26224 () thunk ! org

On Fri, Mar 02, 2018 at 09:48:28AM -0800, Darrick J. Wong wrote:
> On Thu, Mar 01, 2018 at 01:09:46PM -0700, Andreas Dilger wrote:
> > From: Andreas Dilger <adilger@dilger.ca>
> >
> > If an unknown flag is present in a FIEMAP extent, it is printed as a
> > hex value into a temporary buffer before adding it to the flags. If
> > that unknown flag is over 0xffff then it will overflow the temporary
> > buffer.
> >
> > Reported-by: Sarah Liu <wei3.liu@intel.com>
> > Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-10335
> > Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
>
> Looks ok,
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

Thanks, applied with the 0xfff fixup in the commit description.

- Ted

I've resubmitted the patch, and asked that it also be landed to the Debian maintenance branch so it will appear in Ubuntu.

Comment by Andreas Dilger [ 30/Mar/18 ]

This has landed to upstream e2fsprogs for the 1.45 and 1.44.2 releases:

commit 17a1f2c1929630e3a79e6b98168d56f96acf2e8b
Author:     Andreas Dilger <adilger@dilger.ca>
AuthorDate: Thu Mar 29 12:36:54 2018 -0600
Commit:     Theodore Ts'o <tytso@mit.edu>
CommitDate: Thu Mar 29 23:01:19 2018 -0400

    filefrag: avoid temporary buffer overflow
    
    If an unknown flag is present in a FIEMAP extent, it is printed as a
    hex value into a temporary buffer before adding it to the flags.  If
    that unknown flag is over 0xfff then it will overflow the temporary
    buffer.
    
    Reported-by: Sarah Liu <wei3.liu@intel.com>
    Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-10335
    Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>

Add the affected sanity tests to ALWAYS_EXCEPT for Ubuntu until Ubuntu ships an updated e2fsprogs release with this fix and/or we install our patched e2fsprogs.

Comment by Andreas Dilger [ 08/Aug/18 ]

If the above patch has been included in the Ubuntu e2fsprogs (at least for the versions we are testing), then this ALWAYS_EXCEPT can be removed. If not, can you please file a ticket with the upstream Ubuntu bug tracker to have them backport the above patch from e2fsprogs master into their release?

Comment by James A Simmons [ 09/Aug/18 ]

I just did an apt-get source e2fsprogs and checked for the fix you posted here. For Ubuntu18 the fix is there, but it's lacking in Ubuntu16. Since we have Ubuntu server support for 2.12, we will need the Lustre-specific e2fsprogs anyway.

Comment by Andreas Dilger [ 10/Aug/18 ]

The patched e2fsprogs does not have this problem, only the unpatched e2fsprogs. It is good that this is included in Ubuntu 18, and it isn't clear we can do anything about Ubuntu 16 at this point.

Comment by James A Simmons [ 10/Aug/18 ]

Once we have a patched e2fsprogs with ldiskfs support for Ubuntu, this should go away.

Comment by Gerrit Updater [ 23/Oct/18 ]

James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/33459
Subject: LU-10335 test: enable sanity 130 tests for Ubuntu
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 59bab702aed6d10ae70336b8372a13e6169093ee

Comment by Gerrit Updater [ 17/Nov/18 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33459/
Subject: LU-10335 test: enable sanity 130 tests for Ubuntu
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 41a099f9c03c2a3ff62360433985ea5de3e52962
