[LU-4045] Overflow in tracking declares for index, Created: 02/Oct/13 Updated: 03/Nov/17 Resolved: 16/Jul/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.8.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Alexey Lyashkov | Assignee: | Hongchao Zhang |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Issue Links: | |
| Severity: | 3 |
| Rank (Obsolete): | 10856 |
| Description |
|
Is this expected?

Lustre: DEBUG MARKER: == sanity test 233: checking that OBF of the FS root succeeds == 14:55:00 (1380624900)
Lustre: DEBUG MARKER: == sanity test 234: xattr cache should not crash on ENOMEM == 14:55:00 (1380624900)
Lustre: DEBUG MARKER: sanity test_234: @@@@@@ FAIL: touch failed
Lustre: DEBUG MARKER: == sanity test 235: LU-1715: flock deadlock detection does not work properly == 14:55:00 (1380624900)
Lustre: DEBUG MARKER: sanity test_235: @@@@@@ FAIL: error executing flock_deadlock /mnt/lustre/f.sanity.235
Lustre: DEBUG MARKER: == sanity test 236: Layout swap on open unlinked file == 14:55:00 (1380624900)
Lustre: DEBUG MARKER: sanity test_236: @@@@@@ FAIL: cannot setstripe on '/mnt/lustre/d0.sanity/d236/f1': rc = 28
Lustre: DEBUG MARKER: == sanity test 900: umount should not race with any mgc requeue thread == 14:55:00 (1380624900)
LustreError: 11-0: lustre-MDT0000-lwp-OST0001: Communicating with 0@lo, operation obd_ping failed with -107.
Lustre: lustre-MDT0000-lwp-OST0000: Connection to lustre-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete
Lustre: Skipped 2 previous similar messages
Lustre: lustre-MDT0000: Not available for connect from 0@lo (stopping)
Lustre: Skipped 1 previous similar message
LustreError: Skipped 4 previous similar messages
LustreError: 17372:0:(client.c:1076:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8801d1537610 x1447689357275628/t0(0) o13->lustre-OST0000-osc-MDT0000@0@lo:7/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
Lustre: server umount lustre-MDT0000 complete
Lustre: 28485:0:(client.c:1897:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1380624907/real 1380624907] req@ffff8801f2f64108 x1447689357275644/t0(0) o251->MGC192.168.69.5@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1380624913 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: 28485:0:(client.c:1897:ptlrpc_expire_one_request()) Skipped 1 previous similar message
LNet: Removed LNI 192.168.69.5@tcp
LNet: HW CPU cores: 8, npartitions: 2
alg: No test for crc32 (crc32-table)
alg: No test for adler32 (adler32-zlib)
alg: No test for crc32 (crc32-pclmul)
Lustre: Lustre: Build Version: 2.4.93-gbe84477-CHANGED-2.6.32-358.18.1-lustre
LNet: Added LNI 192.168.69.5@tcp [8/256/0/180]
LNet: Accept secure, port 988
Lustre: Echo OBD driver; http://www.lustre.org/
LDISKFS-fs (loop0): mounted filesystem with ordered data mode. quota=on. Opts:
Lustre: lustre-MDT0000: used disk, loading
LustreError: 11-0: lustre-MDT0000-lwp-MDT0000: Communicating with 0@lo, operation mds_connect failed with -11.
LDISKFS-fs (loop1): mounted filesystem with ordered data mode. quota=on. Opts:
LustreError: 29613:0:(osd_io.c:1065:osd_ldiskfs_write_record()) loop1: error reading offset 0 (block 0): rc = -28
LustreError: 29613:0:(llog_osd.c:156:llog_osd_write_blob()) lustre-OST0000-osd: error writing log record: rc = -28
Lustre: 29613:0:(osd_internal.h:978:osd_trans_exec_op()) lustre-OST0000-osd: Overflow in tracking declares for index, rb = 0
Pid: 29613, comm: mount.lustre
Call Trace:
 [<ffffffffa09198c5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
 [<ffffffffa0c85908>] osd_trans_exec_op+0x1f8/0x2e0 [osd_ldiskfs]
 [<ffffffffa0c96ec3>] osd_attr_set+0xe3/0x540 [osd_ldiskfs]
 [<ffffffffa0ca3cb5>] ? osd_punch+0x1d5/0x620 [osd_ldiskfs]
 [<ffffffffa0b17d91>] llog_osd_write_blob+0x211/0x850 [obdclass]
 [<ffffffffa0b1ba84>] llog_osd_write_rec+0x814/0x13d0 [obdclass]
 [<ffffffffa0b3d99e>] ? lu_context_init+0x4e/0x240 [obdclass]
 [<ffffffffa0aef3d8>] llog_write_rec+0xc8/0x290 [obdclass]
 [<ffffffffa0af0ab8>] ? llog_write+0x288/0x420 [obdclass]
 [<ffffffffa0af0add>] llog_write+0x2ad/0x420 [obdclass]
 [<ffffffffa0af0c74>] llog_copy_handler+0x24/0x30 [obdclass]
 [<ffffffffa0af1ce6>] llog_process_thread+0x866/0xce0 [obdclass]
 [<ffffffffa0af0c50>] ? llog_copy_handler+0x0/0x30 [obdclass]
 [<ffffffffa0af383a>] llog_process_or_fork+0x12a/0x550 [obdclass]
 [<ffffffffa0af4062>] llog_backup+0x3e2/0x510 [obdclass]
 [<ffffffff8129b390>] ? sprintf+0x40/0x50
 [<ffffffffa073046f>] mgc_process_log+0x10af/0x1770 [mgc]
 [<ffffffffa0730d33>] ? do_config_log_add+0x203/0x580 [mgc]
 [<ffffffffa0732153>] mgc_process_config+0x653/0x11e0 [mgc]
 [<ffffffffa0b32eb6>] lustre_process_log+0x256/0x990 [obdclass]
 [<ffffffffa0929871>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
 [<ffffffffa0b6150b>] ? server_register_mount+0x54b/0x820 [obdclass]
 [<ffffffffa0b63917>] server_start_targets+0x357/0x1aa0 [obdclass]
 [<ffffffffa0b36355>] ? lustre_start_mgc+0x465/0x1dc0 [obdclass]
 [<ffffffffa0c8d13b>] ? osd_obd_connect+0x22b/0x270 [osd_ldiskfs]
 [<ffffffffa0b2e7d0>] ? class_config_llog_handler+0x0/0x1830 [obdclass]
 [<ffffffffa0b69aac>] server_fill_super+0xbac/0x1630 [obdclass]
 [<ffffffffa0b37e78>] lustre_fill_super+0x1c8/0x520 [obdclass]
 [<ffffffffa0b37cb0>] ? lustre_fill_super+0x0/0x520 [obdclass]
 [<ffffffff81199cbf>] get_sb_nodev+0x5f/0xa0
 [<ffffffffa0b30105>] lustre_get_sb+0x25/0x30 [obdclass]
 [<ffffffff8119927b>] vfs_kern_mount+0x7b/0x1b0
 [<ffffffff81199422>] do_kern_mount+0x52/0x130
 [<ffffffff811b9ca2>] do_mount+0x2d2/0x8c0
 [<ffffffff811ba320>] sys_mount+0x90/0xe0
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
LustreError: 29613:0:(osd_io.c:1065:osd_ldiskfs_write_record()) loop1: error reading offset 0 (block 0): rc = -28
LustreError: 29613:0:(llog_osd.c:156:llog_osd_write_blob()) lustre-OST0000-osd: error writing log record: rc = -28 |
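For background: in the ldiskfs OSD, modifications are supposed to be declared against the transaction handle before the transaction starts, and osd_trans_exec_op() (where the warning above originates) checks at execution time that each operation type was covered by a matching declaration; "Overflow in tracking declares for index" means an index operation was executed without, or beyond, its declaration. Below is a minimal userspace sketch of that declare/execute accounting, written only to illustrate the check. The names (sketch_thandle, trans_declare_op, trans_exec_op, the OT_* types) are simplified assumptions, not the actual Lustre structures or functions, and the real code also ties the declarations to journal credit reservation, which this sketch omits.

/*
 * Hypothetical userspace sketch of declare/execute tracking; not the
 * real osd_internal.h implementation.
 */
#include <stdio.h>

enum ot_type { OT_CREATE, OT_INDEX, OT_ATTR_SET, OT_WRITE, OT_MAX };

static const char *ot_name[OT_MAX] = { "create", "index", "attr_set", "write" };

struct sketch_thandle {
	int declared[OT_MAX];	/* operations declared before the tx starts */
	int executed[OT_MAX];	/* operations actually executed */
};

/* declare phase: record how many operations of each type are expected */
static void trans_declare_op(struct sketch_thandle *th, enum ot_type op, int n)
{
	th->declared[op] += n;
}

/* execute phase: warn if an op runs more times than it was declared */
static void trans_exec_op(struct sketch_thandle *th, enum ot_type op)
{
	th->executed[op]++;
	if (th->executed[op] > th->declared[op])
		fprintf(stderr,
			"Overflow in tracking declares for %s, rb = %d\n",
			ot_name[op], (int)op);
}

int main(void)
{
	struct sketch_thandle th = { { 0 }, { 0 } };

	/* The caller declared only an attr_set... */
	trans_declare_op(&th, OT_ATTR_SET, 1);

	/* ...but the execution path also performs an index update that
	 * was never declared, so the check fires. */
	trans_exec_op(&th, OT_ATTR_SET);
	trans_exec_op(&th, OT_INDEX);
	return 0;
}

In this sketch the warning is purely diagnostic; the assumption is that in the real OSD an undeclared operation can also run out of reserved transaction credits, which is consistent with the rc = -28 (ENOSPC) errors seen alongside the message above.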
| Comments |
| Comment by Ian Costello (Inactive) [ 15/Jan/14 ] |
|
We are also seeing these stack traces in the logs on MDSes recently upgraded to 2.4.2. |
| Comment by Ian Costello (Inactive) [ 15/Jan/14 ] |
|
Actually, the stack traces we are seeing are not the same, but the preceding message is. Here is an example of the issue (if necessary, I will create a separate bug to track it):

2014-01-14T17:43:16+11:00 gerbil5 kernel: Lustre: 1013:0:(osd_internal.h:956:osd_trans_exec_op()) short-MDT0000-osd: Overflow in tracking declares for index, rb = 2 |
| Comment by Mark Mansk [ 03/Jun/14 ] |
|
We're seeing this test fail at Cray now with 2.6. Is there any status on it? |
| Comment by Gerrit Updater [ 24/Apr/15 ] |
|
Hongchao Zhang (hongchao.zhang@intel.com) uploaded a new patch: http://review.whamcloud.com/14566 |
| Comment by Hongchao Zhang [ 24/Apr/15 ] |
|
The patch is tracked at http://review.whamcloud.com/#/c/14566/ |
| Comment by Gerrit Updater [ 16/Jul/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/14566/ |
| Comment by Peter Jones [ 16/Jul/15 ] |
|
Landed for 2.8 |