[LU-2041] SWL ldiskfs_add_entry: bad entry in directory #127928380: rec_len is smaller than minimal Created: 28/Sep/12  Updated: 18/Oct/12  Resolved: 18/Oct/12

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.3.0, Lustre 2.4.0
Fix Version/s: Lustre 2.3.0, Lustre 2.4.0

Type: Bug Priority: Critical
Reporter: Cliff White (Inactive) Assignee: nasf (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment:

SWL - Hyperion/LLNL


Issue Links:
Duplicate
is duplicated by LU-2080 SWL -ldiskfs_add_entry: bad entry in ... Resolved
Severity: 3
Rank (Obsolete): 4217

 Description   

Running the SWL test; after ~6 hours we are seeing an ldiskfs failure and EROFS on the MDT.

 
Sep 27 20:13:20 hyperion-rst6 kernel: Lustre: 4918:0:(mdt_recovery.c:622:mdt_steal_ack_locks()) Stealing 1 locks from rs ffff880108e55000 x1414295525519816.t4537566185 o0 NID 192.168.116.112@o2ib1
Sep 27 20:13:20 hyperion-rst6 kernel: Lustre: 4233:0:(service.c:2105:ptlrpc_handle_rs()) All locks stolen from rs ffff880108e55000 x1414295525519816.t4537566185 o0 NID 192.168.116.112@o2ib1
Sep 27 20:16:39 hyperion-rst6 kernel: LDISKFS-fs error (device md1): ldiskfs_add_entry:
Sep 27 20:16:39 hyperion-rst6 kernel: LDISKFS-fs error (device md1): ldiskfs_add_entry: bad entry in directory #127928380: rec_len is smaller than minimal - block=64030811offset=536(536), inode=127946027, rec_len=0, name_len=4
Sep 27 20:16:39 hyperion-rst6 kernel: Aborting journal on device md1-8.
Sep 27 20:16:39 hyperion-rst6 kernel: LDISKFS-fs error (device md1): ldiskfs_journal_start_sb: Detected aborted journal
Sep 27 20:16:39 hyperion-rst6 kernel: LDISKFS-fs (md1): Remounting filesystem read-only
Sep 27 20:16:39 hyperion-rst6 kernel: LDISKFS-fs error (device md1): ldiskfs_journal_start_sb: Detected aborted journal
Sep 27 20:16:39 hyperion-rst6 kernel: LustreError: 4774:0:(fsfilt-ldiskfs.c:332:fsfilt_ldiskfs_start()) error starting handle for op 10 (114 credits): rc -30
Sep 27 20:16:39 hyperion-rst6 kernel: LustreError: 4774:0:(llog_server.c:414:llog_origin_handle_cancel()) fsfilt_start_log() failed: -30
Sep 27 20:16:39 hyperion-rst6 kernel: LustreError: 4774:0:(llog_server.c:453:llog_origin_handle_cancel()) Cancel 0 of 121 llog-records failed: -30
Sep 27 20:16:39 hyperion-rst6 kernel: LustreError: 4774:0:(fsfilt-ldiskfs.c:332:fsfilt_ldiskfs_start()) error starting handle for op 10 (114 credits): rc -30
Sep 27 20:16:39 hyperion-rst6 kernel: LustreError: 4774:0:(llog_server.c:414:llog_origin_handle_cancel()) fsfilt_start_log() failed: -30
Sep 27 20:16:39 hyperion-rst6 kernel: LustreError: 4774:0:(llog_server.c:453:llog_origin_handle_cancel()) Cancel 0 of 121 llog-records failed: -30
Sep 27 20:16:39 hyperion-rst6 kernel: LDISKFS-fs (md1): bad entry in directory #127928380: rec_len is smaller than minimal - block=64030811offset=536(536), inode=127946027, rec_len=0, name_len=4
Sep 27 20:16:39 hyperion-rst6 kernel: LDISKFS-fs error (device md1) in iam_txn_add: Journal has aborted
Sep 27 20:16:39 hyperion-rst6 kernel: LustreError: 4865:0:(osd_io.c:1014:osd_ldiskfs_write_record()) journal_get_write_access() returned error -30
Sep 27 20:16:39 hyperion-rst6 kernel: LustreError: 4865:0:(osd_handler.c:894:osd_trans_stop()) Failure in transaction hook: -30
Sep 27 20:16:39 hyperion-rst6 kernel: LustreError: 4865:0:(osd_handler.c:899:osd_trans_stop()) Failure to stop transaction: -30
Sep 27 20:16:39 hyperion-rst6 kernel: LDISKFS-fs error (device md1) in iam_txn_add: Journal has aborted
Sep 27 20:16:39 hyperion-rst6 kernel: LustreError: 4731:0:(osd_io.c:1014:osd_ldiskfs_write_record()) journal_get_write_access() returned error -30
Sep 27 20:16:39 hyperion-rst6 kernel: LustreError: 4731:0:(osd_handler.c:894:osd_trans_stop()) Failure in transaction hook: -30
Sep 27 20:16:39 hyperion-rst6 kernel: LustreError: 4731:0:(osd_handler.c:899:osd_trans_stop()) Failure to stop transaction: -30
Sep 27 20:16:39 hyperion-rst6 kernel: Remounting filesystem read-only
Sep 27 20:16:39 hyperion-rst6 kernel: LDISKFS-fs error (device md1) in iam_txn_add: Journal has aborted

Ran fsck on the device and recovered.

LDISKFS-fs (md1): mounted filesystem with ordered data mode. quota=off. Opts:
LDISKFS-fs (md1): mounted filesystem with ordered data mode. quota=off. Opts:
Lustre: MGC192.168.127.6@o2ib: Reactivating import
Lustre: lustre-MDT0000: used disk, loading
Lustre: 5311:0:(ldlm_lib.c:2139:target_recovery_init()) RECOVERY: service lustre-MDT0000, 91 recoverable clients, last_transno 4541171590
Lustre: lustre-MDT0000: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-450
Lustre: 4326:0:(mgc_request.c:1534:mgc_process_recover_log()) Process recover log lustre-mdtir error -22
Lustre: 4326:0:(mgc_request.c:1534:mgc_process_recover_log()) Skipped 2 previous similar messages
Lustre: lustre-MDT0000: Will be in recovery for at least 2:30, or until 91 clients reconnect
LustreError: 5315:0:(ldlm_lib.c:1717:check_for_next_transno()) lustre-MDT0000: waking for gap in transno, VBR is OFF (skip: 4541171591, ql: 62, comp: 29, conn: 91, next: 4541177085, last_committed: 4541171590)
LustreError: 5315:0:(mds_lov.c:351:mds_lov_update_objids()) Unexpected gap in objids
LustreError: 5315:0:(ldlm_lib.c:1717:check_for_next_transno()) lustre-MDT0000: waking for gap in transno, VBR is OFF (skip: 4541177086, ql: 62, comp: 29, conn: 91, next: 4541181324, last_committed: 4541171590)
Lustre: lustre-MDT0000: disconnecting 1 stale clients
Lustre: lustre-MDT0000: Recovery over after 1:24, of 91 clients 90 recovered and 1 was evicted.
Lustre: MDS mdd_obd-lustre-MDT0000: lustre-OST0028_UUID now active, resetting orphans

Will continue to run and see if we get a repeat.



 Comments   
Comment by Peter Jones [ 28/Sep/12 ]

Fanyong

Could you please look into this one?

Thanks

Peter

Comment by nasf (Inactive) [ 28/Sep/12 ]

It is a really strange failure. Several possible causes:

1) The block was reused by something else, and the dentry was re-written.

Recently, only the OI shrink patch could affect block allocation, but that patch only affects blocks belonging to OI files, and those blocks can never be reused by non-OI directories, regardless of whether the OI shrink patch is applied. So it should not be related to that patch.

On the other hand, if the block were re-written by others, it is quite possible the whole entry would have been re-written, not just the "rec_len" itself. But from the log, the "ino#" and "name_len" are reasonable values, so the block should not have been re-written.

2) ldiskfs internal bugs.
There may be corner cases where an ldiskfs leaf-node split improperly sets a wrong "rec_len". We can add a debug patch to check for this.

3) Hardware issues.

So if we can reproduce the failure, it is probably caused by 2); otherwise it may be 3).
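
For reference, here is a minimal, illustrative sketch of the on-disk directory entry layout and the sanity check behind the "rec_len is smaller than minimal" message. The names dir_entry_sketch, DIR_REC_LEN and check_dir_entry are hypothetical stand-ins that mirror ext4_dir_entry_2 and ldiskfs_check_dir_entry(); this is not the actual ldiskfs source and it ignores endianness handling.

#include <stdint.h>

/* Illustrative only: mirrors the ext4_dir_entry_2 layout used by ldiskfs. */
struct dir_entry_sketch {
	uint32_t inode;      /* inode number referenced by this entry */
	uint16_t rec_len;    /* bytes covered by this record: header + name + padding */
	uint8_t  name_len;   /* length of the name that follows */
	uint8_t  file_type;
	char     name[];     /* name bytes, not NUL-terminated on disk */
};

/* Smallest legal record: 8-byte header plus the name, rounded up to 4 bytes. */
#define DIR_REC_LEN(name_len)  ((((name_len) + 8) + 3) & ~3U)

/* Returns 0 if the entry looks sane; non-zero encodes the reason it is bad. */
static int check_dir_entry(const struct dir_entry_sketch *de,
			   unsigned int offset, unsigned int blocksize)
{
	unsigned int rlen = de->rec_len;   /* real code converts from little-endian */

	if (rlen < DIR_REC_LEN(1))
		return 1;   /* "rec_len is smaller than minimal": the rec_len=0 case in the log */
	if (rlen % 4)
		return 2;   /* rec_len must be 4-byte aligned */
	if (rlen < DIR_REC_LEN(de->name_len))
		return 3;   /* record too small to hold its own name */
	if (offset + rlen > blocksize)
		return 4;   /* record runs past the end of the directory block */
	return 0;
}

A rec_len of 0, as in the log above (inode=127946027, rec_len=0, name_len=4), fails the first test, which is why ldiskfs aborts the journal and remounts read-only.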

Comment by nasf (Inactive) [ 28/Sep/12 ]

This is the patch for 2):

http://review.whamcloud.com/#change,4118

Comment by Di Wang [ 28/Sep/12 ]

Sorry for stepping in here. Clearly LU-1948 is not fixed by the patch from 1976. I just re-investigated the result Cliff got on LU-1948. It seems to me that two threads concurrently accessed the dir.

Thread A added an entry to dir 78644697 and hit a zero rec_len here:

 
2012-09-18 14:13:37 LDISKFS-fs error (device md1): add_dirent:
2012-09-18 14:13:37 LDISKFS-fs error (device md1): ldiskfs_add_entry: bad entry in directory #78644697: rec_len is smaller than minimal - block=39336951offset=152(152), inode=78655820, rec_len=0, name_len=4
2012-09-18 14:13:37 Aborting journal on device md1-8.

Thread B hit the BUG_ON at a later place, which I added for debugging the problem:

@@ -1387,7 +1751,17 @@
 	if (de->inode) {
 		struct ext4_dir_entry_2 *de1 = (struct ext4_dir_entry_2 *)((char *)de + nlen);
 		de1->rec_len = ext4_rec_len_to_disk(rlen - nlen, blocksize);
+		if (de1->rec_len == 0) {
+			__ext4_error(dir->i_sb, function, "%lu: name %s, lck %p, name_len=%d\n",
+				     dir->i_ino, name, name_len);
+			BUG_ON(de1->rec_len == 0);
+		}
 		de->rec_len = ext4_rec_len_to_disk(nlen, blocksize);
+		if (de->rec_len == 0) {
+			__ext4_error(dir->i_sb, function, "%lu: name %s, lck %p name_len=%d\n",
+				     dir->i_ino, name, name_len);
+			BUG_ON(de->rec_len == 0);
+		}
 		de = de1;
 	}
 	de->file_type = EXT4_FT_UNKNOWN;

The console message when hitting the BUG:

 
2012-09-18 14:13:37 78644697: name L0F7, name_len=4
2012-09-18 14:13:37
2012-09-18 14:13:37 ------------[ cut here ]------------
2012-09-18 14:13:37 kernel BUG at /var/lib/jenkins/workspace/lustre-reviews/arch/x86_64/build_type/server/distro/el6/ib_stack/inkernel/BUILD/BUILD/lustre-ldiskfs-3.3.0/ldiskfs/namei.c:1757!
2012-09-18 14:13:37 invalid opcode: 0000 [#1] SMP
2012-09-18 14:13:37 last sysfs file: /sys/devices/pci0000:00/0000:00:09.0/0000:05:00.0/infiniband_mad/umad0/port
2012-09-18 14:13:37 CPU 7
......

If threads A and B were serialized in this function, both of them should have failed at ldiskfs_check_dir_entry. So it still looks like an ldiskfs pdir patch problem. Liang, could you please have another look at this one?

Also, for this test each dir only includes 10 files, so IMHO it should not involve index splitting; all dirs are just single-block dirs.

Comment by Liang Zhen (Inactive) [ 28/Sep/12 ]

For directories w/o indices, the PDO lock is equivalent to an RW semaphore which is exclusive for writing, which means PDO is effectively disabled for a small directory like this. But I will check my code again...

Comment by Liang Zhen (Inactive) [ 28/Sep/12 ]

Nasf, I would suggest running with wangdi's patch from LU-1948 to gather more information instead of your patch; actually I just added a little more debug information to Di's patch: http://review.whamcloud.com/#change,4020
What do you think?

Comment by nasf (Inactive) [ 28/Sep/12 ]

Di, why do you think that "If thread A and B are serialized for this function, both of them should fail at ldiskfs_check_dir_entry"?

If thread A calls add_dirent_to_buf() with a NULL "de", it will call ldiskfs_check_dir_entry(); if thread B calls add_dirent_to_buf() with a non-NULL "de", it will not call ldiskfs_check_dir_entry().

So thread A and thread B may fail at different positions. Maybe I am missing something; please correct me.
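
To make the two code paths concrete, here is a simplified, hypothetical control-flow sketch of add_dirent_to_buf(), condensed from the discussion above and reusing the dir_entry_sketch/check_dir_entry names from the earlier sketch; it is not the actual ldiskfs code.

#include <errno.h>

static int add_dirent_to_buf_sketch(struct dir_entry_sketch *de, char *block,
				    unsigned int blocksize, unsigned int needed)
{
	if (de == NULL) {
		/* Thread A's path: scan the block, validating every entry. */
		unsigned int offset = 0;

		de = (struct dir_entry_sketch *)block;
		while (offset < blocksize) {
			if (check_dir_entry(de, offset, blocksize))
				return -EIO;            /* "bad entry in directory" fires here */
			if (de->rec_len >= needed)      /* found a record big enough to split */
				break;
			offset += de->rec_len;
			de = (struct dir_entry_sketch *)(block + offset);
		}
		if (offset >= blocksize)
			return -ENOSPC;
	}

	/*
	 * Thread B's path: a caller that already located 'de' lands here
	 * without ever running check_dir_entry(), so a corrupted rec_len is
	 * only caught by the added debug BUG_ON when the record is split.
	 */
	/* ... split 'de' and write the new name into the freed space ... */
	return 0;
}

Under this reading, a thread entering with de == NULL reports the "bad entry" error, while a thread entering with a pre-located de can only trip the added BUG_ON, which matches the two different failure sites observed.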

Comment by Di Wang [ 29/Sep/12 ]

Please see the trace on LU-1948

2012-09-18 14:13:37 [<ffffffffa0db5c5d>] ldiskfs_add_entry+0xcd/0x500 [ldiskfs]
2012-09-18 14:13:37 [<ffffffffa1001689>] __osd_ea_add_rec+0xb9/0x190 [osd_ldiskfs]
2012-09-18 14:13:37 [<ffffffffa100dbeb>] osd_index_ea_insert+0x21b/0x5e0 [osd_ldiskfs]
2012-09-18 14:13:37 [<ffffffffa0eee977>] __mdd_index_insert_only+0x147/0x150 [mdd]
2012-09-18 14:13:37 [<ffffffffa0eef9b1>] __mdd_index_insert+0x51/0x1f0 [mdd]
2012-09-18 14:13:37 [<ffffffffa0ef55e3>] mdd_create+0x19a3/0x20c0 [mdd]
2012-09-18 14:13:37 [<ffffffffa03aa5b1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
2012-09-18 14:13:37 [<ffffffffa06a4637>] cml_create+0x97/0x250 [cmm]
2012-09-18 14:13:37 [<ffffffffa0f8bb9f>] mdt_reint_open+0x108f/0x18a0 [mdt]
2012-09-18 14:13:37 [<ffffffffa0f75151>] mdt_reint_rec+0x41/0xe0 [mdt]

Then in ldiskfs_add_entry(), since there are only about 10 entries per directory, it should not be a dx dir and the size should be == 4096.

	blocks = dir->i_size >> sb->s_blocksize_bits;
	for (block = 0; block < blocks; block++) {
		bh = ldiskfs_bread(handle, dir, block, 0, &retval);
		if (!bh)
			return retval;
		/* <--- it should enter add_dirent_to_buf() here, since the initial
		 * size should be 4096 and adding 10 entries will not exceed 4096 */
		retval = add_dirent_to_buf(handle, dentry, inode, NULL, bh);
		if (retval != -ENOSPC) {
			brelse(bh);
			return retval;
		}
		if (blocks == 1 && !dx_fallback &&
		    LDISKFS_HAS_COMPAT_FEATURE(sb, LDISKFS_FEATURE_COMPAT_DIR_INDEX))
			return make_indexed_dir(handle, dentry, inode, bh);
		brelse(bh);
	}
	bh = ldiskfs_append(handle, dir, &block, &retval);
	if (!bh)
		return retval;
	de = (struct ldiskfs_dir_entry_2 *) bh->b_data;
	de->inode = 0;
	de->rec_len = ldiskfs_rec_len_to_disk(blocksize, blocksize);
	/* <---- even if it enters from here, de->inode = 0 makes it impossible
	 * to reach the BUG_ON */
	retval = add_dirent_to_buf(handle, dentry, inode, de, bh);

Comment by Liang Zhen (Inactive) [ 29/Sep/12 ]

fdtree will actually create 10 subdirs and 10 files, so the directory has 20 entries, but it is still plain w/o an index; I tried this on my local workstation and verified it.
The question is why we only see this while running fdtree. Although the logs on this ticket didn't print out the filename, the name_len is still 4, which matches the name length used by fdtree, so I guess the directory was still created by fdtree.
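
As a back-of-the-envelope check of that claim (using the 8-byte entry header and 4-byte alignment assumed in the earlier sketch, not measured on the test system): each name_len=4 entry needs ((4 + 8) + 3) & ~3 = 12 bytes, so 20 fdtree entries plus "." and ".." come to roughly 20 * 12 + 2 * 12 = 264 bytes, far below a 4096-byte block, so make_indexed_dir() is never triggered and the directory stays plain.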

Comment by Liang Zhen (Inactive) [ 29/Sep/12 ]

I checked the PDO patch again but found nothing. A directory created by fdtree only has 20 entries and is still a plain directory w/o an index, and I didn't change that code path at all. Also, a directory created by fdtree has hostname+pid as part of the directory name, which means the directory should be unique for each thread; if there were a bug that messed up file creation for a single thread, we would likely have seen it months ago because PDO is a 2.2 feature.

However, I can't explain why we see two failures in different places (wangdi's previous comment). The only possibility that comes to mind is that the OSD created two instances of htree_lock_head for the directory, or even created two lu_objects for the same directory in a rare case.

The patch I posted in the previous comment should give enough information to verify this (http://review.whamcloud.com/#change,4020, build is http://build.whamcloud.com/job/lustre-reviews/9547/), so we may have to wait for another try with this patch.

Comment by Liang Zhen (Inactive) [ 30/Sep/12 ]

I rebased my debug patch and made it depend on the patch from LU-1951; the build is ready:
http://build.whamcloud.com/job/lustre-reviews/9573/

Comment by Liang Zhen (Inactive) [ 02/Oct/12 ]

I merged the fix for LU-1951, the fix for LU-2041, and the debug patch from LU-1948 together (http://review.whamcloud.com/#change,4153); here is the build: http://build.whamcloud.com/job/lustre-reviews/9619/ . I think it's the next thing we should try on Hyperion. Cliff, could you please try it? Thanks.

Comment by Liang Zhen (Inactive) [ 03/Oct/12 ]

Also added another fix from wangdi to the debug patch; the latest build is: http://build.whamcloud.com/job/lustre-reviews/9641/

Comment by Liang Zhen (Inactive) [ 04/Oct/12 ]

Just posted a patch: http://review.whamcloud.com/#change,4186
It fixes a corner case in osd_it_ea_rec() which might cause an inconsistent OI cache. I'm not 100% sure it is the fix for this bug, but at least last time, after applying this patch, SWL ran much longer than before w/o seeing this bug.

Comment by Cliff White (Inactive) [ 05/Oct/12 ]

Crashed.

Oct  5 18:27:58 hyperion-rst6 kernel: LDISKFS-fs Dumping htree_lock ffff88011e8db200, mode 3, head ffff880139a39180
Oct  5 18:27:58 hyperion-rst6 kernel:   granted mode: 3:1
Oct  5 18:27:58 hyperion-rst6 kernel: LDISKFS-fs error (device md1): add_dirent(de1): dir(134759485, dx: 1, nblks: 33) de(mdtest.16.41 : 12), off 2376 rlen 40 nlen 40
Oct  5 18:27:58 hyperion-rst6 kernel: LDISKFS-fs error (device md1): ldiskfs_dx_find_entry: bad entry in directory #134759485: rec_len is smaller than minimal - block=67436580offset=2416(59760), inode=134767633, rec_len=0, name_len=5
Oct  5 18:27:58 hyperion-rst6 kernel: Aborting journal on device md1-8.
Oct  5 18:27:58 hyperion-rst6 kernel: LustreError: 4331:0:(osd_io.c:1014:osd_ldiskfs_write_record()) journal_get_write_access() returned error -30
Oct  5 18:27:58 hyperion-rst6 kernel: LustreError: 4331:0:(osd_handler.c:894:osd_trans_stop()) Failure in transaction hook: -30
Oct  5 18:27:58 hyperion-rst6 kernel: LDISKFS-fs error (device md1) in osd_trans_stop: Journal has aborted
Oct  5 18:27:58 hyperion-rst6 kernel: LDISKFS-fs (md1): Remounting filesystem read-only
Oct  5 18:27:58 hyperion-rst6 kernel: LDISKFS-fs (md1): Remounting filesystem read-only
Oct  5 18:27:58 hyperion-rst6 kernel: LustreError: 4331:0:(osd_handler.c:899:osd_trans_stop()) Failure to stop transaction: -30
Oct  5 18:27:58 hyperion-rst6 kernel:
Oct  5 18:27:58 hyperion-rst6 kernel: ------------[ cut here ]------------
Oct  5 18:27:58 hyperion-rst6 kernel: kernel BUG at /var/lib/jenkins/workspace/lustre-reviews/arch/x86_64/build_type/server/distro/el6/ib_stack/inkernel/BUILD/BUILD/lustre-ldiskfs-3.3.0/ldiskfs/namei.c:1757!
Oct  5 18:27:58 hyperion-rst6 kernel: invalid opcode: 0000 [#1] SMP
Oct  5 18:27:58 hyperion-rst6 kernel: last sysfs file: /sys/devices/system/cpu/cpu15/cache/index2/shared_cpu_map
Oct  5 18:27:58 hyperion-rst6 kernel: CPU 5  
Oct  5 18:27:59 hyperion-rst6 kernel: Modules linked in: cmm(U) osd_ldiskfs(U) mdt(U) mdd(U) mds(U) fsfilt_ldiskfs(U) exportfs mgs(U) mgc(U) lustre(U) lquota(U) lov(U) osc(U) mdc(U) fid(U) fld(U) ptlrpc(U) obdclass(U) lvfs(U) ldiskfs(U) mbcache jbd2 zfs(P)(U) zcommon(P)(U) znvpair(P)(U) zavl(P)(U) zunicode(P)(U) spl(U) zlib_deflate ko2iblnd(U) lnet(U) sha512_generic sha256_generic libcfs(U) cpufreq_ondemand acpi_cpufreq freq_table mperf ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ib_sa mlx4_ib ib_mad ib_core dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan tun kvm raid0 sg sr_mod cdrom sd_mod crc_t10dif dcdbas serio_raw ata_generic pata_acpi ata_piix iTCO_wdt iTCO_vendor_support mptsas mptscsih mptbase scsi_transport_sas i7core_edac edac_core ipv6 nfs lockd fscache nfs_acl auth_rpcgss sunrpc mlx4_en mlx4_core bnx2 [last unloaded: scsi_wait_scan]
Oct  5 18:27:59 hyperion-rst6 kernel:
Oct  5 18:27:59 hyperion-rst6 kernel: Pid: 6331, comm: mdt03_016 Tainted: P           ---------------    2.6.32-279.5.1.el6_lustre.gb16fe80.x86_64 #1 Dell Inc. PowerEdge R610/0K399H
Oct  5 18:27:59 hyperion-rst6 kernel: RIP: 0010:[<ffffffffa0726502>]  [<ffffffffa0726502>] add_dirent_to_buf+0x552/0x6a0 [ldiskfs]
Oct  5 18:27:59 hyperion-rst6 kernel: RSP: 0018:ffff880159a17480  EFLAGS: 00010246
Oct  5 18:27:59 hyperion-rst6 kernel: RAX: ffff8801eea83800 RBX: ffff880313470948 RCX: ffff880313470970
Oct  5 18:27:59 hyperion-rst6 kernel: RDX: ffff88031833f400 RSI: 0000000000000046 RDI: ffff8801eea83c00
Oct  5 18:27:59 hyperion-rst6 kernel: RBP: ffff880159a17550 R08: 0000000000000000 R09: 0000000000000000
Oct  5 18:27:59 hyperion-rst6 kernel: R10: 0000000000000001 R11: 0000000000000000 R12: ffff880146a44250
Oct  5 18:27:59 hyperion-rst6 kernel: R13: 0000000000000948 R14: 000000000000000c R15: 0000000000000028
Oct  5 18:27:59 hyperion-rst6 kernel: FS:  00002aaaab47e700(0000) GS:ffff880028240000(0000) knlGS:0000000000000000
Oct  5 18:27:59 hyperion-rst6 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Oct  5 18:27:59 hyperion-rst6 kernel: CR2: 00002aaaab34230c CR3: 0000000001a85000 CR4: 00000000000006e0
Oct  5 18:27:59 hyperion-rst6 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Oct  5 18:27:59 hyperion-rst6 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400

Oct  5 18:27:59 hyperion-rst6 kernel: Stack:
Oct  5 18:27:59 hyperion-rst6 kernel: ffffc9002ae7e1f0 000000000000000c ffff880100000948 0000000000000028
Oct  5 18:27:59 hyperion-rst6 kernel: <d> 0000000000000028 00000000ecc23e5e ffff8801abec6080 ffffffff080863e4
Oct  5 18:27:59 hyperion-rst6 kernel: <d> 0000000000021000 ffffffffa074720c ffff88011e8db200 ffff880113f7cc18
Oct  5 18:27:59 hyperion-rst6 kernel: Call Trace:
Oct  5 18:27:59 hyperion-rst6 kernel: [<ffffffffa0728999>] ldiskfs_dx_add_entry+0x169/0x7e0 [ldiskfs]
Oct  5 18:27:59 hyperion-rst6 kernel: [<ffffffffa07433f1>] ? ldiskfs_xattr_set_handle+0x151/0x560 [ldiskfs]
Oct  5 18:27:59 hyperion-rst6 kernel: [<ffffffffa072929d>] ldiskfs_add_entry+0x28d/0x500 [ldiskfs]
Oct  5 18:27:59 hyperion-rst6 kernel: [<ffffffffa0f13689>] __osd_ea_add_rec+0xb9/0x190 [osd_ldiskfs]
Oct  5 18:27:59 hyperion-rst6 kernel: [<ffffffffa0f1fc1b>] osd_index_ea_insert+0x21b/0x5e0 [osd_ldiskfs]
Oct  5 18:27:59 hyperion-rst6 kernel: [<ffffffffa0e21d2d>] ? __mdd_xattr_set+0x25d/0x380 [mdd]
Oct  5 18:27:59 hyperion-rst6 kernel: [<ffffffffa0e3a767>] __mdd_index_insert_only+0x147/0x150 [mdd]
Oct  5 18:27:59 hyperion-rst6 kernel: [<ffffffffa0e3b901>] __mdd_index_insert+0x51/0x1f0 [mdd]
Oct  5 18:27:59 hyperion-rst6 kernel: [<ffffffffa0e3bdf9>] ? mdd_object_capa+0x29/0x1b0 [mdd]
Oct  5 18:27:59 hyperion-rst6 kernel: [<ffffffffa0e416eb>] mdd_create+0x1a0b/0x2180 [mdd]
Oct  5 18:27:59 hyperion-rst6 kernel: [<ffffffffa0f19f9f>] ? osd_xattr_get+0x9f/0x360 [osd_ldiskfs]
Oct  5 18:27:59 hyperion-rst6 kernel: [<ffffffffa0f6d637>] cml_create+0x97/0x250 [cmm]
Oct  5 18:27:59 hyperion-rst6 kernel: [<ffffffffa0eaaddf>] ? mdt_version_get_save+0x8f/0xd0 [mdt]
Oct  5 18:27:59 hyperion-rst6 kernel: [<ffffffffa0ebeb9f>] mdt_reint_open+0x108f/0x18a0 [mdt]
Oct  5 18:27:59 hyperion-rst6 kernel: [<ffffffffa0e4728e>] ? md_ucred+0x1e/0x60 [mdd]
Oct  5 18:27:59 hyperion-rst6 kernel: [<ffffffffa0e8c235>] ? mdt_ucred+0x15/0x20 [mdt]
Oct  5 18:27:59 hyperion-rst6 kernel: [<ffffffffa0ea8151>] mdt_reint_rec+0x41/0xe0 [mdt]
Oct  5 18:27:59 hyperion-rst6 kernel: [<ffffffffa0ea19aa>] mdt_reint_internal+0x50a/0x810 [mdt]
Oct  5 18:27:59 hyperion-rst6 kernel: [<ffffffffa0ea1f7d>] mdt_intent_reint+0x1ed/0x500 [mdt]
Oct  5 18:27:59 hyperion-rst6 kernel: [<ffffffffa0e9e191>] mdt_intent_policy+0x371/0x6a0 [mdt]
Oct  5 18:27:59 hyperion-rst6 kernel: [<ffffffffa090d881>] ldlm_lock_enqueue+0x361/0x8f0 [ptlrpc]
Oct  5 18:27:59 hyperion-rst6 kernel: [<ffffffffa09359bf>] ldlm_handle_enqueue0+0x48f/0xf70 [ptlrpc]
Oct  5 18:27:59 hyperion-rst6 kernel: [<ffffffffa0e9e506>] mdt_enqueue+0x46/0x130 [mdt]
Oct  5 18:27:59 hyperion-rst6 kernel: [<ffffffffa0e95802>] mdt_handle_common+0x922/0x1740 [mdt]
Oct  5 18:27:59 hyperion-rst6 kernel: [<ffffffffa0e966f5>] mdt_regular_handle+0x15/0x20 [mdt]
Oct  5 18:27:59 hyperion-rst6 kernel: [<ffffffffa0965b3c>] ptlrpc_server_handle_request+0x41c/0xe00 [ptlrpc]
Oct  5 18:27:59 hyperion-rst6 kernel: [<ffffffffa039365e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
Oct  5 18:27:59 hyperion-rst6 kernel: [<ffffffffa03a514f>] ? lc_watchdog_touch+0x6f/0x180 [libcfs]
Oct  5 18:27:59 hyperion-rst6 kernel: [<ffffffffa095cf37>] ? ptlrpc_wait_event+0xa7/0x2a0 [ptlrpc]
Oct  5 18:27:59 hyperion-rst6 kernel: [<ffffffff810533f3>] ? __wake_up+0x53/0x70
Oct  5 18:27:59 hyperion-rst6 kernel: [<ffffffffa0967111>] ptlrpc_main+0xbf1/0x19e0 [ptlrpc]
Oct  5 18:27:59 hyperion-rst6 kernel: [<ffffffffa0966520>] ? ptlrpc_main+0x0/0x19e0 [ptlrpc]
Oct  5 18:27:59 hyperion-rst6 kernel: [<ffffffff8100c14a>] child_rip+0xa/0x20
Oct  5 18:27:59 hyperion-rst6 kernel: [<ffffffffa0966520>] ? ptlrpc_main+0x0/0x19e0 [ptlrpc]
Oct  5 18:27:59 hyperion-rst6 kernel: [<ffffffffa0966520>] ? ptlrpc_main+0x0/0x19e0 [ptlrpc]
Oct  5 18:27:59 hyperion-rst6 kernel: [<ffffffff8100c140>] ? child_rip+0x0/0x20
Oct  5 18:27:59 hyperion-rst6 kernel: Code: 54 24 18 48 8b 55 b8 48 89 c1 31 c0 48 89 14 24 48 c7 c2 38 c3 74 a0 e8 0d 55 01 00 48 8b 4d a8 66 83 79 04 00 0f 85 13 fd ff ff <0f> 0b eb fe 48 8b 7d 80 e8 d1 e7 01 00 49 8b bc 24 08 01 00 00
Oct  5 18:27:59 hyperion-rst6 kernel: RIP  [<ffffffffa0726502>] add_dirent_to_buf+0x552/0x6a0 [ldiskfs]
Oct  5 18:27:59 hyperion-rst6 kernel: RSP <ffff880159a17480>

I don't think we dropped a vmcore on this one; still looking.

Comment by Liang Zhen (Inactive) [ 05/Oct/12 ]

This time we crashed in mdtest, and the directory is one with indices, so it looks more like a PDO issue. However:

  • we can see in the log (granted mode: 3:1), which means only one thread was granted entry to the directory, yet two threads hit different errors in different places under the same directory (same ino: 134759485)
  • there is an entry with name_len=5, but mdtest will never create a file with name_len=5
    Oct  5 18:27:58 hyperion-rst6 kernel: LDISKFS-fs error (device md1): ldiskfs_dx_find_entry: bad entry in directory #134759485: rec_len is smaller than minimal - block=67436580offset=2416(59760), inode=134767633, rec_len=0, name_len=5
    

Based on these two observations, it could be either:
a) PDO lock totally screwed up locking;

or

b) two OSD objects with different FIDs point to the same inode (directory) for some unknown reason.

I think the only choice now is to disable several things one by one:

  • OI cache
  • OI shrinker
  • PDO

So I have to create different builds to try...

Comment by Liang Zhen (Inactive) [ 06/Oct/12 ]

The latest build is here: http://build.whamcloud.com/job/lustre-reviews/9744/
The major change in this build is that the OI cache is disabled.

Comment by nasf (Inactive) [ 06/Oct/12 ]

This patch fixes some potential OI mapping cache inconsistency issues:

http://review.whamcloud.com/#change,4207

Comment by Liang Zhen (Inactive) [ 07/Oct/12 ]

The latest build is here: http://build.whamcloud.com/job/lustre-reviews/9758

I merged fanyong's fix into this build.

Comment by Peter Jones [ 08/Oct/12 ]

Dropping priority because the fix landed for 2.3.

Comment by Jodi Levi (Inactive) [ 17/Oct/12 ]

Patch for master here: http://review.whamcloud.com/#change,4208

Comment by Jodi Levi (Inactive) [ 18/Oct/12 ]

Please let me know if this needs to be reopened for more work.
