[LU-1292] ldiskfs_ext_walk_space error Created: 08/Apr/12  Updated: 06/Nov/13  Resolved: 06/Nov/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.0
Fix Version/s: None

Type: Bug
Priority: Major
Reporter: Larry Gu
Assignee: WC Triage
Resolution: Cannot Reproduce
Votes: 0
Labels: None
Environment: RHEL 5.6, kernel 2.6.18-238.19.1.el5, Intel Xeon X5650

Severity: 3
Rank (Obsolete): 4026

 Description   

We have a Lustre 2.1 filesystem with 2 MDSes and 4 OSSes; each OSS has two OSTs. Recently one OST reported an ldiskfs_ext_walk_space error and was then remounted read-only. After we rebooted the OSS, the OST mounted and worked without any fsck. The syslog is below:

Apr 8 21:00:43 oss1 kernel: LDISKFS-fs error (device sdb): ldiskfs_ext_walk_space: inode #12388311: (comm ll_ost_io_153) path[1].p_hdr == NULL
Apr 8 21:00:43 oss1 kernel: Aborting journal on device sdb-8.
Apr 8 21:00:43 oss1 kernel: LustreError: 7722:0:(obd.h:1613:obd_transno_commit_cb()) dcfs-OST0007: transno 3231335624 commit error: 2
Apr 8 21:00:43 oss1 kernel: LDISKFS-fs error (device sdb): ldiskfs_journal_start_sb: Detected aborted journal
Apr 8 21:00:43 oss1 kernel: LDISKFS-fs (sdb): Remounting filesystem read-only
Apr 8 21:00:43 oss1 kernel: LustreError: 31323:0:(fsfilt-ldiskfs.c:492:fsfilt_ldiskfs_brw_start()) can't get handle for 45 credits: rc = -30
Apr 8 21:00:43 oss1 kernel: LustreError: 31323:0:(filter_io_26.c:712:filter_commitrw_write()) error starting transaction: rc = -30
Apr 8 21:00:43 oss1 kernel: LDISKFS-fs (sdb): Remounting filesystem read-only
Apr 8 21:00:43 oss1 kernel: LustreError: 18874:0:(fsfilt-ldiskfs.c:358:fsfilt_ldiskfs_start()) error starting handle for op 8 (71 credits): rc -30
Apr 8 21:00:43 oss1 kernel: LustreError: 18035:0:(fsfilt-ldiskfs.c:492:fsfilt_ldiskfs_brw_start()) can't get handle for 569 credits: rc = -30
Apr 8 21:00:43 oss1 kernel: LustreError: 18035:0:(filter_io_26.c:712:filter_commitrw_write()) error starting transaction: rc = -30
Apr 8 21:00:43 oss1 kernel: LustreError: 18035:0:(fsfilt-ldiskfs.c:358:fsfilt_ldiskfs_start()) error starting handle for op 8 (71 credits): rc -30
Apr 8 21:00:43 oss1 kernel: LustreError: 18035:0:(fsfilt-ldiskfs.c:358:fsfilt_ldiskfs_start()) Skipped 2 previous similar messages
Apr 8 21:00:43 oss1 kernel: LustreError: 18035:0:(filter_io_26.c:712:filter_commitrw_write()) error starting transaction: rc = -30
Apr 8 21:00:43 oss1 kernel: LustreError: 18035:0:(filter_io_26.c:712:filter_commitrw_write()) error starting transaction: rc = -30
Apr 8 21:00:43 oss1 kernel: LustreError: 8014:0:(filter_io_26.c:712:filter_commitrw_write()) error starting transaction: rc = -30
Apr 8 21:00:43 oss1 kernel: LustreError: 18033:0:(filter_io_26.c:712:filter_commitrw_write()) error starting transaction: rc = -30
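
For anyone triaging a similar log, here is a minimal sketch (not part of the original report) that pulls out the first LDISKFS-fs error and decodes the repeated return code. The helper names (first_fs_error, return_codes, errno_name) are hypothetical, and the two sample lines are copied from the syslog above. It assumes the usual Linux errno numbering, where 30 is EROFS (read-only file system), which matches the read-only remount that follows the journal abort.

```python
import errno
import re

# Two lines copied verbatim from the syslog above.
SAMPLE = """\
Apr 8 21:00:43 oss1 kernel: LDISKFS-fs error (device sdb): ldiskfs_ext_walk_space: inode #12388311: (comm ll_ost_io_153) path[1].p_hdr == NULL
Apr 8 21:00:43 oss1 kernel: LustreError: 31323:0:(fsfilt-ldiskfs.c:492:fsfilt_ldiskfs_brw_start()) can't get handle for 45 credits: rc = -30
"""

# Matches "LDISKFS-fs error (device X): function: message".
FS_ERROR = re.compile(
    r"LDISKFS-fs error \(device (?P<dev>\w+)\): (?P<func>\w+): (?P<msg>.*)"
)
# Matches both "rc = -30" and "rc -30", as both forms appear in the log.
RC = re.compile(r"\brc\s*=?\s*(-\d+)")


def first_fs_error(text):
    """Return (device, function, message) of the first LDISKFS-fs error, or None."""
    for line in text.splitlines():
        m = FS_ERROR.search(line)
        if m:
            return m.group("dev"), m.group("func"), m.group("msg")
    return None


def return_codes(text):
    """Return the sorted set of negative rc values found in the text."""
    return sorted({int(m.group(1)) for m in RC.finditer(text)})


def errno_name(rc):
    """Map a negative kernel-style return code to its errno symbol."""
    return errno.errorcode.get(-rc, "unknown")


if __name__ == "__main__":
    print(first_fs_error(SAMPLE))
    for rc in return_codes(SAMPLE):
        print(rc, errno_name(rc))
```

Read this way, the rc = -30 messages are consequences of the read-only remount, and the ldiskfs_ext_walk_space line is the event to investigate.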



 Comments   
Comment by Marek Magrys [ 11/Apr/12 ]

We used to have a similar problem. Running a 'forced' fsck on the failed OST did the trick, so you might give it a try.
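
The comment does not say which command was used, so the following is only a sketch under assumptions: that "forced" refers to the -f option of e2fsck, that the OST device is /dev/sdb (inferred from "device sdb" in the log), and that the OST is unmounted first. The helper forced_fsck_command is hypothetical. It only builds and prints the command line and does not run anything. The -n option (open read-only, answer "no" to all prompts) is included by default so a first pass makes no changes. For ldiskfs targets, the Lustre-patched e2fsprogs would normally be the appropriate build to use.

```python
import shlex


def forced_fsck_command(device, read_only=True):
    """Build an e2fsck command line that forces a check of `device`.

    -f forces a check even if the filesystem is marked clean.
    -n opens the filesystem read-only and answers "no" to every prompt.
    """
    cmd = ["e2fsck", "-f"]
    if read_only:
        cmd.append("-n")
    cmd.append(device)
    return cmd


if __name__ == "__main__":
    # Device name is an assumption taken from "device sdb" in the log.
    print(shlex.join(forced_fsck_command("/dev/sdb")))
```

Dropping read_only gives the repairing variant, which should only be run after reviewing the read-only pass.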
