
[LU-2041] SWL ldiskfs_add_entry: bad entry in directory #127928380: rec_len is smaller than minimal

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Affects Version/s: Lustre 2.3.0, Lustre 2.4.0
    • Fix Version/s: Lustre 2.3.0, Lustre 2.4.0
    • Labels: None
    • Environment: SWL - Hyperion/LLNL
    • Severity: 3
    • 4217

    Description

      While running the SWL test, after ~6 hours we are seeing an ldiskfs failure and EROFS on the MDT.

       
      Sep 27 20:13:20 hyperion-rst6 kernel: Lustre: 4918:0:(mdt_recovery.c:622:mdt_steal_ack_locks()) Stealing 1 locks from rs ffff880108e55000 x1414295525519816.t4537566185 o0 NID 192.168.116.112@o2ib1
      Sep 27 20:13:20 hyperion-rst6 kernel: Lustre: 4233:0:(service.c:2105:ptlrpc_handle_rs()) All locks stolen from rs ffff880108e55000 x1414295525519816.t4537566185 o0 NID 192.168.116.112@o2ib1
      Sep 27 20:16:39 hyperion-rst6 kernel: LDISKFS-fs error (device md1): ldiskfs_add_entry:
      Sep 27 20:16:39 hyperion-rst6 kernel: LDISKFS-fs error (device md1): ldiskfs_add_entry: bad entry in directory #127928380: rec_len is smaller than minimal - block=64030811offset=536(536), inode=127946027, rec_len=0, name_len=4
      Sep 27 20:16:39 hyperion-rst6 kernel: Aborting journal on device md1-8.
      Sep 27 20:16:39 hyperion-rst6 kernel: LDISKFS-fs error (device md1): ldiskfs_journal_start_sb: Detected aborted journal
      Sep 27 20:16:39 hyperion-rst6 kernel: LDISKFS-fs (md1): Remounting filesystem read-only
      Sep 27 20:16:39 hyperion-rst6 kernel: LDISKFS-fs error (device md1): ldiskfs_journal_start_sb: Detected aborted journal
      Sep 27 20:16:39 hyperion-rst6 kernel: LustreError: 4774:0:(fsfilt-ldiskfs.c:332:fsfilt_ldiskfs_start()) error starting handle for op 10 (114 credits): rc -30
      Sep 27 20:16:39 hyperion-rst6 kernel: LustreError: 4774:0:(llog_server.c:414:llog_origin_handle_cancel()) fsfilt_start_log() failed: -30
      Sep 27 20:16:39 hyperion-rst6 kernel: LustreError: 4774:0:(llog_server.c:453:llog_origin_handle_cancel()) Cancel 0 of 121 llog-records failed: -30
      Sep 27 20:16:39 hyperion-rst6 kernel: LustreError: 4774:0:(fsfilt-ldiskfs.c:332:fsfilt_ldiskfs_start()) error starting handle for op 10 (114 credits): rc -30
      Sep 27 20:16:39 hyperion-rst6 kernel: LustreError: 4774:0:(llog_server.c:414:llog_origin_handle_cancel()) fsfilt_start_log() failed: -30
      Sep 27 20:16:39 hyperion-rst6 kernel: LustreError: 4774:0:(llog_server.c:453:llog_origin_handle_cancel()) Cancel 0 of 121 llog-records failed: -30
      Sep 27 20:16:39 hyperion-rst6 kernel: LDISKFS-fs (md1): bad entry in directory #127928380: rec_len is smaller than minimal - block=64030811offset=536(536), inode=127946027, rec_len=0, name_len=4
      Sep 27 20:16:39 hyperion-rst6 kernel: LDISKFS-fs error (device md1) in iam_txn_add: Journal has aborted
      Sep 27 20:16:39 hyperion-rst6 kernel: LustreError: 4865:0:(osd_io.c:1014:osd_ldiskfs_write_record()) journal_get_write_access() returned error -30
      Sep 27 20:16:39 hyperion-rst6 kernel: LustreError: 4865:0:(osd_handler.c:894:osd_trans_stop()) Failure in transaction hook: -30
      Sep 27 20:16:39 hyperion-rst6 kernel: LustreError: 4865:0:(osd_handler.c:899:osd_trans_stop()) Failure to stop transaction: -30
      Sep 27 20:16:39 hyperion-rst6 kernel: LDISKFS-fs error (device md1) in iam_txn_add: Journal has aborted
      Sep 27 20:16:39 hyperion-rst6 kernel: LustreError: 4731:0:(osd_io.c:1014:osd_ldiskfs_write_record()) journal_get_write_access() returned error -30
      Sep 27 20:16:39 hyperion-rst6 kernel: LustreError: 4731:0:(osd_handler.c:894:osd_trans_stop()) Failure in transaction hook: -30
      Sep 27 20:16:39 hyperion-rst6 kernel: LustreError: 4731:0:(osd_handler.c:899:osd_trans_stop()) Failure to stop transaction: -30
      Sep 27 20:16:39 hyperion-rst6 kernel: Remounting filesystem read-only
      Sep 27 20:16:39 hyperion-rst6 kernel: LDISKFS-fs error (device md1) in iam_txn_add: Journal has aborted
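
      For reference (from the ext4/ldiskfs on-disk format, not specific to this
      ticket): the "rec_len is smaller than minimal" check rejects any live
      directory entry whose rec_len is below EXT4_DIR_REC_LEN(name_len), so the
      rec_len=0, name_len=4 entry above is plainly corrupt.

      /* On-disk directory entry (ext4; ldiskfs uses the same layout with an
       * LDISKFS_ prefix), shown for reference. */
      struct ext4_dir_entry_2 {
              __le32  inode;                  /* inode number; 0 means unused */
              __le16  rec_len;                /* distance to the next entry */
              __u8    name_len;
              __u8    file_type;
              char    name[EXT4_NAME_LEN];    /* name, not NUL-terminated */
      };

      /* Minimal legal rec_len: 8-byte header + name, rounded up to 4 bytes. */
      #define EXT4_DIR_PAD            4
      #define EXT4_DIR_ROUND          (EXT4_DIR_PAD - 1)
      #define EXT4_DIR_REC_LEN(name_len)      (((name_len) + 8 + \
                                                EXT4_DIR_ROUND) & ~EXT4_DIR_ROUND)

      /* Here EXT4_DIR_REC_LEN(4) == 12, but the on-disk rec_len is 0. */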
      

      Ran fsck on the device; it recovered:

      LDISKFS-fs (md1): mounted filesystem with ordered data mode. quota=off. Opts:
      LDISKFS-fs (md1): mounted filesystem with ordered data mode. quota=off. Opts:
      Lustre: MGC192.168.127.6@o2ib: Reactivating import
      Lustre: lustre-MDT0000: used disk, loading
      Lustre: 5311:0:(ldlm_lib.c:2139:target_recovery_init()) RECOVERY: service lustre-MDT0000, 91 recoverable clients, last_transno 4541171590
      Lustre: lustre-MDT0000: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-450
      Lustre: 4326:0:(mgc_request.c:1534:mgc_process_recover_log()) Process recover log lustre-mdtir error -22
      Lustre: 4326:0:(mgc_request.c:1534:mgc_process_recover_log()) Skipped 2 previous similar messages
      Lustre: lustre-MDT0000: Will be in recovery for at least 2:30, or until 91 clients reconnect
      LustreError: 5315:0:(ldlm_lib.c:1717:check_for_next_transno()) lustre-MDT0000: waking for gap in transno, VBR is OFF (skip: 4541171591, ql: 62, comp: 29, conn: 91, next: 4541177085, last_committed: 4541171590)
      LustreError: 5315:0:(mds_lov.c:351:mds_lov_update_objids()) Unexpected gap in objids
      LustreError: 5315:0:(ldlm_lib.c:1717:check_for_next_transno()) lustre-MDT0000: waking for gap in transno, VBR is OFF (skip: 4541177086, ql: 62, comp: 29, conn: 91, next: 4541181324, last_committed: 4541171590)
      Lustre: lustre-MDT0000: disconnecting 1 stale clients
      Lustre: lustre-MDT0000: Recovery over after 1:24, of 91 clients 90 recovered and 1 was evicted.
      Lustre: MDS mdd_obd-lustre-MDT0000: lustre-OST0028_UUID now active, resetting orphans
      
      

      Will continue to run and see if we get a repeat.

      Activity


            Liang Zhen (Inactive) added a comment -

            Just posted a patch: http://review.whamcloud.com/#change,4186
            It fixes a corner case in osd_it_ea_rec() that could cause an inconsistent OI cache. I'm not 100% sure it is the fix for this bug, but at least the last SWL run went much longer than before without hitting this bug after the patch was applied.

            Liang Zhen (Inactive) added a comment -

            Also added another fix from wangdi to the debug patch; the latest build is: http://build.whamcloud.com/job/lustre-reviews/9641/

            Liang Zhen (Inactive) added a comment -

            I merged the fix for LU-1951, the fix for LU-2041, and the debug patch from LU-1948 together (http://review.whamcloud.com/#change,4153); here is its build: http://build.whamcloud.com/job/lustre-reviews/9619/. I think it's the next thing we should try on Hyperion. Cliff, could you please try it? Thanks.

            Liang Zhen (Inactive) added a comment -

            I rebased my debug patch and made it depend on the LU-1951 patch; the build is ready:
            http://build.whamcloud.com/job/lustre-reviews/9573/
            Liang Zhen (Inactive) added a comment - edited

            I checked the PDO patch again but found nothing. A directory created by fdtree has only 20 entries, so it is still a plain directory without an index, and I didn't change that code path at all. Also, directories created by fdtree have hostname+pid as part of the directory name, so each directory should be unique to its thread; if there were a bug that messes up file creation within a single thread, we would likely have seen it months ago, because PDO is a 2.2 feature.

            However, I can't explain why we see two failures in different places (wangdi's previous comment). The only possibility that comes to mind is that the OSD created two instances of htree_lock_head for the directory, or, in a rare case, even two lu_objects for the same directory.

            The patch I posted in the previous comment should give the information needed to verify this (http://review.whamcloud.com/#change,4020, build: http://build.whamcloud.com/job/lustre-reviews/9547/), so we may have to wait for another run with this patch.
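
            A toy model of that hypothesis (hypothetical names, not actual OSD
            code): if an object-lookup race hands two threads separate lock
            objects for the same on-disk directory, the PDO lock no longer
            excludes them from each other:

            #include <pthread.h>

            /* Stand-in for htree_lock_head: one lock per in-memory object. */
            struct dir_object {
                    pthread_mutex_t lock;
            };

            static void insert_entry(struct dir_object *o)
            {
                    pthread_mutex_lock(&o->lock);
                    /* ... read, split, and rewrite the shared directory block ... */
                    pthread_mutex_unlock(&o->lock);
            }

            /* If thread A looked up object "a" while thread B raced and got a
             * duplicate object "b" for the SAME directory, insert_entry(a) and
             * insert_entry(b) serialize on different mutexes and can interleave
             * freely on the shared block -- an unserialized double insert of
             * the kind that could corrupt rec_len. */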

            Liang Zhen (Inactive) added a comment - edited

            fdtree actually creates 10 subdirectories and 10 files, so the directory has 20 entries but is still a plain directory without an index; I tried this on my local workstation and verified it.
            The question is why we only see this while running fdtree. The logs on this ticket don't print the filename, but name_len is 4, which matches the name length fdtree uses, so I guess the directory was created by fdtree.

            Di Wang added a comment -

            Please see the trace on LU-1948

            2012-09-18 14:13:37 [<ffffffffa0db5c5d>] ldiskfs_add_entry+0xcd/0x500 [ldiskfs]
            2012-09-18 14:13:37 [<ffffffffa1001689>] __osd_ea_add_rec+0xb9/0x190 [osd_ldiskfs]
            2012-09-18 14:13:37 [<ffffffffa100dbeb>] osd_index_ea_insert+0x21b/0x5e0 [osd_ldiskfs]
            2012-09-18 14:13:37 [<ffffffffa0eee977>] __mdd_index_insert_only+0x147/0x150 [mdd]
            2012-09-18 14:13:37 [<ffffffffa0eef9b1>] __mdd_index_insert+0x51/0x1f0 [mdd]
            2012-09-18 14:13:37 [<ffffffffa0ef55e3>] mdd_create+0x19a3/0x20c0 [mdd]
            2012-09-18 14:13:37 [<ffffffffa03aa5b1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
            2012-09-18 14:13:37 [<ffffffffa06a4637>] cml_create+0x97/0x250 [cmm]
            2012-09-18 14:13:37 [<ffffffffa0f8bb9f>] mdt_reint_open+0x108f/0x18a0 [mdt]
            2012-09-18 14:13:37 [<ffffffffa0f75151>] mdt_reint_rec+0x41/0xe0 [mdt]

            Then in ldiskfs_add_entry: since there are only about 10 entries per directory, it should not be a dx (indexed) directory, and the directory size should be exactly 4096 (a single block).

            blocks = dir->i_size >> sb->s_blocksize_bits;
            for (block = 0; block < blocks; block++) {
                    bh = ldiskfs_bread(handle, dir, block, 0, &retval);
                    if (!bh)
                            return retval;
                    retval = add_dirent_to_buf(handle, dentry, inode, NULL, bh);
                    /* <--- it should go into add_dirent_to_buf here, since the
                     * initial size should be 4096 and adding 10 entries should
                     * not go over 4096. */
                    if (retval != -ENOSPC) {
                            brelse(bh);
                            return retval;
                    }
                    if (blocks == 1 && !dx_fallback &&
                        LDISKFS_HAS_COMPAT_FEATURE(sb, LDISKFS_FEATURE_COMPAT_DIR_INDEX))
                            return make_indexed_dir(handle, dentry, inode, bh);
                    brelse(bh);
            }
            bh = ldiskfs_append(handle, dir, &block, &retval);
            if (!bh)
                    return retval;
            de = (struct ldiskfs_dir_entry_2 *)bh->b_data;
            de->inode = 0;
            de->rec_len = ldiskfs_rec_len_to_disk(blocksize, blocksize);
            retval = add_dirent_to_buf(handle, dentry, inode, de, bh);
            /* <---- even if it enters from here, de->inode = 0 makes it
             * impossible to reach the BUG_ON. */


            nasf (Inactive) added a comment -

            Di, why do you think that "If thread A and B are serialized for this function, both of them should fail at ldiskfs_check_dir_entry"?

            If thread A calls add_dirent_to_buf() with a NULL "de", it will call ldiskfs_check_dir_entry(); if thread B calls add_dirent_to_buf() with a non-NULL "de", it will not call ldiskfs_check_dir_entry().

            So thread A and thread B may fail at different positions. Maybe I am missing something; please correct me.
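
            For reference, an abridged skeleton of add_dirent_to_buf() from the
            ext4 code of that era (details elided; the ldiskfs version is
            patched but keeps the same shape) shows why entry validation runs
            only on the de == NULL path:

            static int add_dirent_to_buf(handle_t *handle, struct dentry *dentry,
                                         struct inode *inode, struct ext4_dir_entry_2 *de,
                                         struct buffer_head *bh)
            {
                    /* ... local declarations elided ... */
                    if (!de) {
                            /* No target entry given: scan the block, validating
                             * every entry along the way. */
                            de = (struct ext4_dir_entry_2 *)bh->b_data;
                            top = bh->b_data + blocksize - reclen;
                            while ((char *)de <= top) {
                                    if (!ext4_check_dir_entry("ext4_add_entry",
                                                              dir, de, bh, offset))
                                            return -EIO;
                                    /* ... stop at an entry with enough room,
                                     * else advance: ... */
                                    de = (struct ext4_dir_entry_2 *)((char *)de + rlen);
                                    offset += rlen;
                            }
                            if ((char *)de > top)
                                    return -ENOSPC;
                    }
                    /* A non-NULL "de" (e.g. the fresh tail entry set up after
                     * ldiskfs_append()) skips the scan, so nothing calls
                     * ext4_check_dir_entry() on that path; the split/insert
                     * logic below runs either way. */
                    /* ... */
            }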


            Liang Zhen (Inactive) added a comment -

            Nasf, I would suggest running wangdi's patch on LU-1948 to gather more information instead of your patch; actually, I just added a little more debug information to Di's patch: http://review.whamcloud.com/#change,4020
            What do you think?


            Liang Zhen (Inactive) added a comment -

            For a directory without an index, the PDO lock is equivalent to an RW semaphore that is exclusive for writing, which means PDO is effectively disabled for a small directory like this. But I will check my code again...
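
            As a loose analogy (not the actual htree_lock API): with no index
            there is nothing to lock per-key, so for a plain directory the PDO
            path should reduce to a whole-directory rwlock, and two inserts into
            the same small directory should never run concurrently:

            #include <pthread.h>

            static pthread_rwlock_t dir_lock = PTHREAD_RWLOCK_INITIALIZER;

            static void plain_dir_insert(void)
            {
                    pthread_rwlock_wrlock(&dir_lock);  /* exclusive for writing */
                    /* ... modify the single directory block ... */
                    pthread_rwlock_unlock(&dir_lock);
            }

            static void plain_dir_lookup(void)
            {
                    pthread_rwlock_rdlock(&dir_lock);  /* shared for reading */
                    /* ... scan the block ... */
                    pthread_rwlock_unlock(&dir_lock);
            }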

            Di Wang added a comment -

            Sorry for stepping in here. Clearly LU-1948 is not fixed by the LU-1976 patch. I just re-investigated the results Cliff got on LU-1948. It seems to me that two threads concurrently accessed the directory.

            Thread A added an entry to directory 78644697 and hit a zero rec_len here:

             
            2012-09-18 14:13:37 LDISKFS-fs error (device md1): add_dirent:
            2012-09-18 14:13:37 LDISKFS-fs error (device md1): ldiskfs_add_entry: bad entry in directory #78644697: rec_len is smaller than minimal - block=39336951offset=152(152), inode=78655820, rec_len=0, name_len=4
            2012-09-18 14:13:37 Aborting journal on device md1-8.
            

            Thread B hit the BUG_ON at the later point that I added for debugging the problem:

            @@ -1387,7 +1751,17 @@
                    if (de->inode) {
                            struct ext4_dir_entry_2 *de1 = (struct ext4_dir_entry_2 *)((char *)de + nlen);
                            de1->rec_len = ext4_rec_len_to_disk(rlen - nlen, blocksize);
            +               if (de1->rec_len == 0) {
            +                       __ext4_error(dir->i_sb, function, "%lu: name %s, lck %p, name_len=%d\n",
            +                                    dir->i_ino, name, name_len);
            +                       BUG_ON(de1->rec_len == 0);
            +               }
                            de->rec_len = ext4_rec_len_to_disk(nlen, blocksize);
            +               if (de->rec_len == 0) {
            +                       __ext4_error(dir->i_sb, function, "%lu: name %s, lck %p name_len=%d\n",
            +                                    dir->i_ino, name, name_len);
            +                       BUG_ON(de->rec_len == 0);
            +               }
                            de = de1;
                    }
                    de->file_type = EXT4_FT_UNKNOWN;
            
            

            The console message when the BUG was hit:

             
            2012-09-18 14:13:37 78644697: name L0F7, name_len=4
            2012-09-18 14:13:37
            2012-09-18 14:13:37 ------------[ cut here ]------------
            2012-09-18 14:13:37 kernel BUG at /var/lib/jenkins/workspace/lustre-reviews/arch/x86_64/build_type/server/distro/el6/ib_stack/inkernel/BUILD/BUILD/lustre-ldiskfs-3.3.0/ldiskfs/namei.c:1757!
            2012-09-18 14:13:37 invalid opcode: 0000 [#1] SMP
            2012-09-18 14:13:37 last sysfs file: /sys/devices/pci0000:00/0000:00:09.0/0000:05:00.0/infiniband_mad/umad0/port
            2012-09-18 14:13:37 CPU 7
            ......
            

            If threads A and B were serialized through this function, both of them should have failed at ldiskfs_check_dir_entry. So this still looks like a problem in the ldiskfs pdir patch. Liang, could you please have another look at this one?

            Also, for this test each directory only includes 10 files, so it should not involve index splitting IMHO; all the directories are just single-block directories.
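
            To make the unserialized interleaving concrete, here is a standalone
            sketch (an assumed interleaving, not the actual ldiskfs code; the
            struct and macro are reduced versions of the real ones): if thread B
            re-reads the tail entry after thread A has already shrunk it, B's
            split produces exactly the rec_len=0, name_len=4 entry seen in the
            logs. The sequential double call below models B reading "de" after
            A's write:

            #include <stdio.h>
            #include <string.h>

            /* ext4/ldiskfs rounding: 8-byte header + name, rounded up to 4 */
            #define DIR_REC_LEN(name_len)   (((name_len) + 8 + 3) & ~3)

            struct dirent2 {                /* reduced ldiskfs_dir_entry_2 */
                    unsigned short rec_len;
                    unsigned char  name_len;
            };

            /* The split step of add_dirent_to_buf(): shrink *de to its minimal
             * length and turn the remainder into a new tail entry after it. */
            static void split_entry(char *block, unsigned int off)
            {
                    struct dirent2 *de = (struct dirent2 *)(block + off);
                    unsigned int nlen = DIR_REC_LEN(de->name_len);
                    unsigned int rlen = de->rec_len;   /* read current length */
                    struct dirent2 *de1 = (struct dirent2 *)((char *)de + nlen);

                    de1->rec_len = rlen - nlen;        /* new tail gets the rest */
                    de->rec_len  = nlen;               /* de shrinks to minimal */
            }

            int main(void)
            {
                    char block[4096];
                    struct dirent2 *tail = (struct dirent2 *)block;

                    memset(block, 0, sizeof(block));
                    tail->rec_len  = 4096;             /* one big tail entry */
                    tail->name_len = 4;                /* fdtree-style name */

                    split_entry(block, 0);             /* thread A's split */
                    split_entry(block, 0);             /* thread B, racing:
                                                        * rlen == nlen now, so
                                                        * the new tail entry
                                                        * gets rec_len == 0 */

                    printf("tail rec_len after racing splits: %u\n",  /* 0 */
                           (unsigned)((struct dirent2 *)(block + DIR_REC_LEN(4)))->rec_len);
                    return 0;
            }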


            People

              Assignee: nasf (Inactive)
              Reporter: Cliff White (Inactive)
              Votes: 0
              Watchers: 7

              Dates

                Created:
                Updated:
                Resolved: