[LU-6278] lfsck of upgraded 1.8 filesystem does not add linkEA Created: 25/Feb/15  Updated: 05/Jun/18  Resolved: 05/Jun/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.4
Fix Version/s: Lustre 2.5.5

Type: Bug Priority: Minor
Reporter: Andreas Dilger Assignee: nasf (Inactive)
Resolution: Won't Fix Votes: 0
Labels: mq115, patch
Environment:

Lustre 2.5.3-88-ge835226

  • single-node server (MGT, MDT, 4x OST)
  • formatted with Lustre 1.4, upgraded to 1.6, 1.8, 2.1, 2.4.3, 2.5.3+
  • lfsck run on Lustre 2.4.3 after enabling dirdata
  • lfsck run multiple times on 2.5.3+
Filesystem volume name:   myth-MDT0000
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery dirdata sparse_super large_file huge_file uninit_bg dir_nlink
Filesystem state:         clean
Inode count:              2621440
Block count:              2621440
Free blocks:              2119689
Free inodes:              1075487
Filesystem created:       Thu Nov  9 13:21:39 2006
Last mount time:          Mon Feb 23 23:52:09 2015
Last write time:          Mon Feb 23 23:52:09 2015
Last checked:             Mon Feb 23 04:02:09 2015

Issue Links:
Related
Severity: 3
Rank (Obsolete): 17601

 Description   

The filesystem was formatted with 1.x, so has many inodes with IGIF FIDs. OI Scrub was run under 2.4.3 to create the OI files and lma xattrs on all files.

Running lctl lfsck_start -M myth-MDT0000 -t namespace on the MDS (even multiple times) with 2.5.3+ does not appear to create the link xattrs on directories, even though it has created link xattrs on the regular files. This means that lfs fid2path does not work properly.

There appear to be some errors that LFSCK hits (see first_failure_position below), but it does not report them to the log, so there is no way to know what is going wrong.

# lctl get_param mdd.*.lfsck_namespace
mdd.myth-MDT0000.lfsck_namespace=
name: lfsck_namespace
magic: 0xa0629d03
version: 2
status: completed
flags: scanned-once,inconsistent
param: (null)
time_since_last_completed: 233 seconds
time_since_latest_start: 349 seconds
time_since_last_checkpoint: 233 seconds
latest_start_position: 13, N/A, N/A
last_checkpoint_position: 2621440, N/A, N/A
first_failure_position: 32769, [0x8001:0x4145293e:0x0], 1051280482039016589
checked_phase1: 3104290
checked_phase2: 0
updated_phase1: 912977
updated_phase2: 0
failed_phase1: 0
failed_phase2: 0
dirs: 84295
M-linked: 0
nlinks_repaired: 0
lost_found: 0
success_count: 12
run_time_phase1: 116 seconds
run_time_phase2: 0 seconds
average_speed_phase1: 26761 items/sec
average_speed_phase2: 0 objs/sec
real-time_speed_phase1: N/A
real-time_speed_phase2: N/A
current_position: N/A

There are only useless log entries from OI Scrub instead:

00000004:10000000:1.0:1424838545.669434:0:2326:0:(osd_scrub.c:1240:osd_otable_it_preload()) OSD pre-loaded: max = 2621440, preload = 10470, rc = 0
00000004:10000000:1.0:1424838545.669464:0:2326:0:(osd_scrub.c:1240:osd_otable_it_preload()) OSD pre-loaded: max = 2621440, preload = 10471, rc = 0
00000004:10000000:1.0:1424838545.669501:0:2326:0:(osd_scrub.c:1240:osd_otable_it_preload()) OSD pre-loaded: max = 2621440, preload = 10474, rc = 0
00000004:10000000:1.0:1424838545.669773:0:2326:0:(osd_scrub.c:1240:osd_otable_it_preload()) OSD pre-loaded: max = 2621440, preload = 32769, rc = 0
00000004:10000000:1.0:1424838545.670051:0:2326:0:(osd_scrub.c:1240:osd_otable_it_preload()) OSD pre-loaded: max = 2621440, preload = 32770, rc = 0
00000004:10000000:1.0:1424838545.670686:0:2326:0:(osd_scrub.c:1240:osd_otable_it_preload()) OSD pre-loaded: max = 2621440, preload = 32775, rc = 0

The first_failure_position refers to a directory in the namespace:

debugfs:  ncheck 0x8001
Inode   Pathname
32769   /ROOT/backup/ruby/Music/U2
debugfs:  stat <0x8001>
Inode: 32769   Type: directory    Mode:  0500   Flags: 0x0
Generation: 1095051582    Version: 0x00000000:00000000
User:  1000   Group:  1000   Size: 4096
File ACL: 0    Directory ACL: 0
Links: 5   Blockcount: 8
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x4a6548f7:00000000 -- Mon Jul 20 22:49:59 2009
 atime: 0x53d6d027:01b56e18 -- Mon Jul 28 16:35:19 2014
 mtime: 0x48699a91:00000000 -- Mon Jun 30 20:46:41 2008
crtime: 0x4a6548e2:21db6948 -- Mon Jul 20 22:49:38 2009
Size of extra inode fields: 28
Extended attributes stored in inode body: 
  lma = "00 00 00 00 00 00 00 00 01 80 00 00 00 00 00 00 3e 29 45 41 00 00 00 00 " (24)
  lma: fid=[0x8001:0x4145293e:0x0] compat=0 incompat=0
BLOCKS:
(0):53248
TOTAL: 1
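
The lma bytes in the debugfs output above can be decoded by hand to confirm that this directory carries an IGIF FID. The following Python sketch assumes the xattr is little-endian, with two 32-bit fields (compat, incompat) followed by a FID made of a 64-bit sequence, a 32-bit object ID, and a 32-bit version. That layout is inferred from the 24-byte value and the decoded line printed by debugfs, and the helper name is illustrative only.

```python
import struct

# Assumed layout (little-endian): u32 compat, u32 incompat,
# then FID = u64 seq, u32 oid, u32 ver. 24 bytes in total.
LMA_FORMAT = "<IIQII"


def decode_lma(hex_bytes):
    """Decode a space-separated hex dump of an lma xattr (hypothetical helper)."""
    raw = bytes.fromhex(hex_bytes.replace(" ", ""))
    expected = struct.calcsize(LMA_FORMAT)
    if len(raw) != expected:
        raise ValueError("expected %d bytes, got %d" % (expected, len(raw)))
    compat, incompat, seq, oid, ver = struct.unpack(LMA_FORMAT, raw)
    return {
        "compat": compat,
        "incompat": incompat,
        "seq": seq,
        "oid": oid,
        "ver": ver,
        "fid": "[0x%x:0x%x:0x%x]" % (seq, oid, ver),
    }


lma = decode_lma("00 00 00 00 00 00 00 00 01 80 00 00 00 00 00 00 "
                 "3e 29 45 41 00 00 00 00")
print(lma["fid"])  # prints [0x8001:0x4145293e:0x0]

# For an IGIF FID the sequence is the inode number and the object ID is the
# inode generation, both taken from the debugfs stat output above.
print(lma["seq"] == 32769, lma["oid"] == 1095051582)  # prints True True
```

The decoded sequence and object ID match the inode number (32769) and generation (1095051582) reported by debugfs, which is what one would expect for an inode created before the upgrade to 2.x.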


 Comments   
Comment by nasf (Inactive) [ 25/Feb/15 ]
status: completed
flags: scanned-once,inconsistent

That means LFSCK is running in "dryrun" mode, but because of a bug on b2_5 the param is not shown correctly. I will make a patch to verify that.

Comment by Gerrit Updater [ 25/Feb/15 ]

Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/13861
Subject: LU-6278 lfsck: show lfsck parameter correctly
Project: fs/lustre-release
Branch: b2_5
Current Patch Set: 1
Commit: 9337942978442695cb2b36f729c005f4fa397e9e

Comment by Andreas Dilger [ 25/Feb/15 ]

With the updated patches applied to the MDS it does look like dryrun was enabled:

mdd.myth-MDT0000.lfsck_namespace=
name: lfsck_namespace
magic: 0xa0629d03
version: 2
status: completed
flags: scanned-once,inconsistent
param: dryrun
time_since_last_completed: 16716 seconds
time_since_latest_start: 17122 seconds
time_since_last_checkpoint: 16716 seconds

Comment by Andreas Dilger [ 25/Feb/15 ]

It doesn't seem possible to clear the dryrun setting, even with -r.

Comment by nasf (Inactive) [ 25/Feb/15 ]

On the master branch, the "dryrun" flag is cleared automatically when "-r" is specified. I will backport the patch to b2_5. Until then, you can specify "--dryrun off" for the same purpose.
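
For a b2_5 system without the backport, the workaround above amounts to passing the reset and dryrun options together. A minimal Python sketch that only assembles the argument list and does not run anything; the helper name and its defaults are illustrative assumptions, while the flags are the ones quoted in this ticket.

```python
def lfsck_start_args(device, lfsck_type="namespace", reset=True, dryrun_off=True):
    """Build an lctl lfsck_start argument list (hypothetical helper)."""
    args = ["lctl", "lfsck_start", "-M", device, "-t", lfsck_type]
    if reset:
        # -r restarts the scan from the beginning, as mentioned in the ticket.
        args.append("-r")
    if dryrun_off:
        # Explicitly turn dryrun off, since b2_5 does not clear it on -r.
        args += ["--dryrun", "off"]
    return args


print(" ".join(lfsck_start_args("myth-MDT0000")))
# prints: lctl lfsck_start -M myth-MDT0000 -t namespace -r --dryrun off
```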

Comment by Andreas Dilger [ 25/Feb/15 ]

I removed the lfsck_bookmark and lfsck_namespace files, and re-ran LFSCK, which cleared the dryrun flag. The status no longer shows a first_failure_position, and the link xattrs have now been created:

# lfs fid2path /myth "[0x8001:0x4145293e:0x0]"
/myth/backup/ruby/Music/U2

It seems like a defect that dryrun stays set until it is manually removed, rather than being cleared once the lfsck run is completed.
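
Because a lingering dryrun setting is easy to miss, the lfsck_namespace status text can be checked before trusting a completed run. The Python sketch below parses "key: value" lines and reports two conditions seen in this ticket: param containing dryrun, and a first_failure_position that is not N/A. The sample is an abridged combination of lines from the two outputs quoted in this ticket, the helper names are hypothetical, and treating param as a comma-separated list is an assumption based on how the flags field is printed.

```python
# Abridged sample assembled from the two status outputs quoted in this ticket.
SAMPLE = """\
mdd.myth-MDT0000.lfsck_namespace=
name: lfsck_namespace
status: completed
flags: scanned-once,inconsistent
param: dryrun
first_failure_position: 32769, [0x8001:0x4145293e:0x0], 1051280482039016589
"""


def parse_lfsck_status(text):
    """Turn 'key: value' lines into a dict; lines without a colon are skipped."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            status[key.strip()] = value.strip()
    return status


def review_status(status):
    """Return human-readable findings (hypothetical helper, not part of Lustre)."""
    findings = []
    # Assumption: multiple params are comma-separated, like the flags field.
    params = [p.strip() for p in status.get("param", "").split(",")]
    if "dryrun" in params:
        findings.append("dryrun is set")
    first_failure = status.get("first_failure_position", "N/A")
    if first_failure.split(",")[0].strip() != "N/A":
        findings.append("first failure at " + first_failure)
    return findings


status = parse_lfsck_status(SAMPLE)
for finding in review_status(status):
    print(finding)
```

In practice the text would come from lctl get_param mdd.*.lfsck_namespace on the MDS. On an unpatched b2_5 build the param line reads (null) even when dryrun is active, so the first check only helps once the display fix is applied.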

Comment by nasf (Inactive) [ 05/Jun/18 ]

The issue has been resolved on master, and we have no plans to land more patches for b2_5-based releases, so I am closing the ticket.
