[LU-1250] 1.8.7-wc1/2.1.1<->2.2.0 interop: lfs: munmap_chunk(): invalid pointer: 0x0000000001b0bed0 Created: 22/Mar/12 Updated: 25/Apr/14 Resolved: 25/Apr/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.1.1, Lustre 1.8.7 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Jian Yu | Assignee: | Keith Mannthey (Inactive) |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Lustre Configuration: Distro/Arch: RHEL6.2/x86_64 |
||
| Attachments: |
|
| Severity: | 3 |
| Rank (Obsolete): | 8295 |
| Description |
|
Test steps:

While running step 3), the striping information was retrieved correctly. However, at the same time, lfs hit the following issue on both 2.1.1 and 1.8.7-wc1 clients:

# lfs getstripe -c /mnt/lustre/testfile.client-1.2000
2000
*** glibc detected *** lfs: munmap_chunk(): invalid pointer: 0x0000000001b0bed0 ***
======= Backtrace: =========
/lib64/libc.so.6[0x398c0750c6]
/lib64/libc.so.6(closedir+0xd)[0x398c0a690d]
lfs[0x41c3a1]
lfs[0x41c7f0]
lfs[0x406370]
lfs[0x4273a8]
lfs[0x406937]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x398c01ecdd]
lfs[0x4030b9]
======= Memory map: ========
00400000-00464000 r-xp 00000000 08:01 928651 /usr/bin/lfs
00663000-00665000 rw-p 00063000 08:01 928651 /usr/bin/lfs
00665000-00680000 rw-p 00000000 00:00 0
00864000-00866000 rw-p 00064000 08:01 928651 /usr/bin/lfs
01b05000-01b26000 rw-p 00000000 00:00 0 [heap]
398b800000-398b820000 r-xp 00000000 08:01 541056 /lib64/ld-2.12.so
398ba1f000-398ba20000 r--p 0001f000 08:01 541056 /lib64/ld-2.12.so
398ba20000-398ba21000 rw-p 00020000 08:01 541056 /lib64/ld-2.12.so
398ba21000-398ba22000 rw-p 00000000 00:00 0
398bc00000-398bc02000 r-xp 00000000 08:01 541059 /lib64/libdl-2.12.so
398bc02000-398be02000 ---p 00002000 08:01 541059 /lib64/libdl-2.12.so
398be02000-398be03000 r--p 00002000 08:01 541059 /lib64/libdl-2.12.so
398be03000-398be04000 rw-p 00003000 08:01 541059 /lib64/libdl-2.12.so
398c000000-398c197000 r-xp 00000000 08:01 541057 /lib64/libc-2.12.so
398c197000-398c397000 ---p 00197000 08:01 541057 /lib64/libc-2.12.so
398c397000-398c39b000 r--p 00197000 08:01 541057 /lib64/libc-2.12.so
398c39b000-398c39c000 rw-p 0019b000 08:01 541057 /lib64/libc-2.12.so
398c39c000-398c3a1000 rw-p 00000000 00:00 0
398cc00000-398cc22000 r-xp 00000000 08:01 540770 /lib64/libncurses.so.5.7
398cc22000-398ce21000 ---p 00022000 08:01 540770 /lib64/libncurses.so.5.7
398ce21000-398ce22000 rw-p 00021000 08:01 540770 /lib64/libncurses.so.5.7
398d800000-398d81d000 r-xp 00000000 08:01 541091 /lib64/libtinfo.so.5.7
398d81d000-398da1d000 ---p 0001d000 08:01 541091 /lib64/libtinfo.so.5.7
398da1d000-398da21000 rw-p 0001d000 08:01 541091 /lib64/libtinfo.so.5.7
398dc00000-398dc3a000 r-xp 00000000 08:01 541092 /lib64/libreadline.so.6.0
398dc3a000-398de3a000 ---p 0003a000 08:01 541092 /lib64/libreadline.so.6.0
398de3a000-398de42000 rw-p 0003a000 08:01 541092 /lib64/libreadline.so.6.0
398de42000-398de43000 rw-p 00000000 00:00 0
7fe05e942000-7fe05e958000 r-xp 00000000 08:01 540760 /lib64/libgcc_s-4.4.6-20110824.so.1
7fe05e958000-7fe05eb57000 ---p 00016000 08:01 540760 /lib64/libgcc_s-4.4.6-20110824.so.1
7fe05eb57000-7fe05eb58000 rw-p 00015000 08:01 540760 /lib64/libgcc_s-4.4.6-20110824.so.1
7fe05eb60000-7fe05eb65000 rw-p 00000000 00:00 0
7fe05eb66000-7fe05eb68000 rw-p 00000000 00:00 0
7fe05eb68000-7fe05eb6d000 rw-s 00000000 00:04 720896 /SYSV00000000 (deleted)
7fe05eb6d000-7fe05eb6e000 rw-p 00000000 00:00 0
7fff8ca1d000-7fff8ca32000 rw-p 00000000 00:00 0 [stack]
7fff8cb12000-7fff8cb13000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
Aborted (core dumped)

# gdb /usr/bin/lfs /tmp/lfs.2972.core
Core was generated by `lfs getstripe -c /mnt/lustre/testfile.client-1.2000'.
Program terminated with signal 6, Aborted.
#0 0x000000398c032885 in raise () from /lib64/libc.so.6
(gdb) bt
#0 0x000000398c032885 in raise () from /lib64/libc.so.6
#1 0x000000398c034065 in abort () from /lib64/libc.so.6
#2 0x000000398c06f7a7 in __libc_message () from /lib64/libc.so.6
#3 0x000000398c0750c6 in malloc_printerr () from /lib64/libc.so.6
#4 0x000000398c0a690d in closedir () from /lib64/libc.so.6
#5 0x000000000041c3a1 in llapi_semantic_traverse (path=0x1b05030 "/mnt/lustre/testfile.client-1.2000", parent=0x0, sem_init=0x41d2b0 <cb_getstripe>,
sem_fini=0x4177d0 <cb_common_fini>, data=0x7fff8ca2f600, de=0x0, size=4097) at liblustreapi.c:1132
#6 0x000000000041c7f0 in param_callback (path=0x7fff8ca3164e "/mnt/lustre/testfile.client-1.2000", sem_init=0x41d2b0 <cb_getstripe>, sem_fini=0x4177d0 <cb_common_fini>,
param=0x7fff8ca2f600) at liblustreapi.c:1159
#7 0x0000000000406370 in lfs_getstripe (argc=3, argv=0x7fff8ca2f840) at lfs.c:889
#8 0x00000000004273a8 in Parser_execarg (argc=3, argv=0x7fff8ca2f840, cmds=0x664400) at util/parser.c:104
#9 0x0000000000406937 in main (argc=4, argv=0x7fff8ca2f838) at lfs.c:2528
The core dump file lfs.2972.core from the 2.1.1 client is attached. |
| Comments |
| Comment by Jian Yu [ 23/Mar/12 ] |
|
In addition, if the "lfs getstripe" operation is performed on a non-empty directory with a stripe count of 2000, the following errors occur on both 2.1.1 and 1.8.7-wc1 clients:

error: llapi_mds_getfileinfo: IOC_MDC_GETFILEINFO failed for /mnt/lustre/testdir.client-1/: Invalid argument (22)
error: llapi_semantic_traverse: '' is UNKNOWN type 0 |
| Comment by Peter Jones [ 18/May/12 ] |
|
Keith, could you please look into this one? Thanks, Peter |
| Comment by Keith Mannthey (Inactive) [ 10/Aug/12 ] |
|
I have scripts and Torro access to debug this issue as it was initially reported. Creating 1000 OSTs on a system requires some disk space; I am investigating this and waiting for fat-amd-2 and fat-amd-4 to free up. |
| Comment by Keith Mannthey (Inactive) [ 29/Aug/12 ] |
|
I have fat nodes and am working to recreate this issue. |
| Comment by Keith Mannthey (Inactive) [ 13/Sep/12 ] |
|
Sorry I didn't get it recreated due to other work. I will wait for hardware again. |
| Comment by Keith Mannthey (Inactive) [ 08/Feb/13 ] |
|
Are we seeing this issue with 1.8.9 interop testing? |
| Comment by Jian Yu [ 16/Feb/13 ] |
Hi Keith, The large stripe count (>160 OSTs) interop testing was not in the Lustre 1.8.9-wc1 build/release test plan. We have to run it manually in the Lustre 2.x (>=2.2) build/release testing cycles. |
| Comment by Keith Mannthey (Inactive) [ 16/May/13 ] |
|
Is this still an error that needs to be looked into? |
| Comment by Jian Yu [ 17/May/13 ] |
|
Since the "large xattr" feature is still not enabled by default on the master branch, the large stripe count interop testing is not run by default. So, to determine whether the issue still exists, we have to perform the test manually on 1.8.9-wc1/2.1.5<->2.4.0. |
| Comment by Keith Mannthey (Inactive) [ 25/Apr/14 ] |
|
Please reopen if this occurs again. |