[LU-7429] sanity-lfsck test_23c: @@@@@@ FAIL: (8) unexpected size
Created: 16/Nov/15  Updated: 14/Nov/19  Resolved: 24/Jan/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0, Lustre 2.9.0, Lustre 2.10.0
Fix Version/s: Lustre 2.10.0

Type: Bug
Priority: Minor
Reporter: parinay v kondekar (Inactive)
Assignee: nasf (Inactive)
Resolution: Fixed
Votes: 0
Labels: patch

Attachments: File 23c.lctl.tgz     File 23c__7th_JAN.lctl.tgz    
Issue Links:
Duplicate
is duplicated by LU-7113 sanity-lfsck_23b test failed Lustre: ... Resolved
Severity: 3

 Description   

Configuration: 4-node setup with 1 MDS, 1 OSS, and 2 patchless clients
Release
2.6.32_431.29.2.el6.x86_64
2.6.32_431.29.2.el6.x86_64

Server 2.7.62
Client 2.7.62

git hash 049252c

stdout.log
== sanity-lfsck test 23c: LFSCK can repair dangling name entry (3) == 02:09:55 (1446948595)
#####
The objectA has multiple hard links, one of them corresponding
to the name entry_B. But there is something wrong for the name
entry_B and cause entry_B to references non-exist object_C.
In the first-stage scanning, the LFSCK will think the entry_B
as dangling, and re-create the lost object_C. And then others
modified the re-created object_C. When the LFSCK comes to the
second-stage scanning, it will find that the former re-creating
object_C maybe wrong and try to replace the object_C with the
real object_A. But because object_C has been modified, so the
LFSCK cannot replace it.
#####
Inject failure stub on MDT0 to simulate dangling name entry
fail_loc=0x1621
fail_loc=0
'ls' should fail because of dangling name entry
fail_val=10
fail_loc=0x1602
Trigger namespace LFSCK to find out dangling name entry
Started LFSCK on the device lustre-MDT0000: scrub namespace
stat: cannot stat `/mnt/lustre/d23c.sanity-lfsck/d0/foo': No such file or directory
stat: cannot stat `/mnt/lustre/d23c.sanity-lfsck/d0/foo': No such file or directory
Waiting 32 secs for update
stat: cannot stat `/mnt/lustre/d23c.sanity-lfsck/d0/foo': No such file or directory
stat: cannot stat `/mnt/lustre/d23c.sanity-lfsck/d0/foo': No such file or directory
stat: cannot stat `/mnt/lustre/d23c.sanity-lfsck/d0/foo': No such file or directory
stat: cannot stat `/mnt/lustre/d23c.sanity-lfsck/d0/foo': No such file or directory
stat: cannot stat `/mnt/lustre/d23c.sanity-lfsck/d0/foo': No such file or directory
stat: cannot stat `/mnt/lustre/d23c.sanity-lfsck/d0/foo': No such file or directory
stat: cannot stat `/mnt/lustre/d23c.sanity-lfsck/d0/foo': No such file or directory
stat: cannot stat `/mnt/lustre/d23c.sanity-lfsck/d0/foo': No such file or directory
stat: cannot stat `/mnt/lustre/d23c.sanity-lfsck/d0/foo': No such file or directory
stat: cannot stat `/mnt/lustre/d23c.sanity-lfsck/d0/foo': No such file or directory
Waiting 22 secs for update
stat: cannot stat `/mnt/lustre/d23c.sanity-lfsck/d0/foo': No such file or directory
stat: cannot stat `/mnt/lustre/d23c.sanity-lfsck/d0/foo': No such file or directory
stat: cannot stat `/mnt/lustre/d23c.sanity-lfsck/d0/foo': No such file or directory
stat: cannot stat `/mnt/lustre/d23c.sanity-lfsck/d0/foo': No such file or directory
stat: cannot stat `/mnt/lustre/d23c.sanity-lfsck/d0/foo': No such file or directory
stat: cannot stat `/mnt/lustre/d23c.sanity-lfsck/d0/foo': No such file or directory
stat: cannot stat `/mnt/lustre/d23c.sanity-lfsck/d0/foo': No such file or directory
stat: cannot stat `/mnt/lustre/d23c.sanity-lfsck/d0/foo': No such file or directory
stat: cannot stat `/mnt/lustre/d23c.sanity-lfsck/d0/foo': No such file or directory
stat: cannot stat `/mnt/lustre/d23c.sanity-lfsck/d0/foo': No such file or directory
Waiting 12 secs for update
stat: cannot stat `/mnt/lustre/d23c.sanity-lfsck/d0/foo': No such file or directory
stat: cannot stat `/mnt/lustre/d23c.sanity-lfsck/d0/foo': No such file or directory
stat: cannot stat `/mnt/lustre/d23c.sanity-lfsck/d0/foo': No such file or directory
stat: cannot stat `/mnt/lustre/d23c.sanity-lfsck/d0/foo': No such file or directory
stat: cannot stat `/mnt/lustre/d23c.sanity-lfsck/d0/foo': No such file or directory
stat: cannot stat `/mnt/lustre/d23c.sanity-lfsck/d0/foo': No such file or directory
stat: cannot stat `/mnt/lustre/d23c.sanity-lfsck/d0/foo': No such file or directory
stat: cannot stat `/mnt/lustre/d23c.sanity-lfsck/d0/foo': No such file or directory
stat: cannot stat `/mnt/lustre/d23c.sanity-lfsck/d0/foo': No such file or directory
stat: cannot stat `/mnt/lustre/d23c.sanity-lfsck/d0/foo': No such file or directory
Waiting 2 secs for update
stat: cannot stat `/mnt/lustre/d23c.sanity-lfsck/d0/foo': No such file or directory
stat: cannot stat `/mnt/lustre/d23c.sanity-lfsck/d0/foo': No such file or directory
Update not seen after 32s: wanted '0' got ''
stat: cannot stat `/mnt/lustre/d23c.sanity-lfsck/guard': No such file or directory
name: lfsck_namespace
magic: 0xa0621a0b
version: 2
status: scanning-phase2
flags: scanned-once,inconsistent
param: create_mdtobj
last_completed_time: N/A
time_since_last_completed: N/A
latest_start_time: 1446948595
time_since_latest_start: 33 seconds
last_checkpoint_time: 1446948595
time_since_last_checkpoint: 33 seconds
latest_start_position: 12, N/A, N/A
last_checkpoint_position: 25037, N/A, N/A
first_failure_position: N/A, N/A, N/A
checked_phase1: 8
checked_phase2: 0
updated_phase1: 1
updated_phase2: 0
failed_phase1: 0
failed_phase2: 0
directories: 4
dirent_repaired: 2
linkea_repaired: 0
nlinks_repaired: 0
multiple_linked_checked: 1
multiple_linked_repaired: 0
unknown_inconsistency: 0
unmatched_pairs_repaired: 0
dangling_repaired: 0
multiple_referenced_repaired: 0
bad_file_type_repaired: 0
lost_dirent_repaired: 0
local_lost_found_scanned: 0
local_lost_found_moved: 0
local_lost_found_skipped: 0
local_lost_found_failed: 0
striped_dirs_scanned: 0
striped_dirs_repaired: 0
striped_dirs_failed: 0
striped_dirs_disabled: 0
striped_dirs_skipped: 0
striped_shards_scanned: 0
striped_shards_repaired: 0
striped_shards_failed: 0
striped_shards_skipped: 0
name_hash_repaired: 0
success_count: 0
run_time_phase1: 0 seconds
run_time_phase2: 9 seconds
average_speed_phase1: 8 items/sec
average_speed_phase2: 0 objs/sec
average_speed_total: 0 items/sec
real_time_speed_phase1: N/A
real_time_speed_phase2: 0 objs/sec
current_position: [0x0:0x0:0x0]
 sanity-lfsck test_23c: @@@@@@ FAIL: (8) unexpected size 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:4812:error_noexit()
  = /usr/lib64/lustre/tests/test-framework.sh:4843:error()
  = /usr/lib64/lustre/tests/sanity-lfsck.sh:3114:test_23c()
  = /usr/lib64/lustre/tests/test-framework.sh:5090:run_one()
  = /usr/lib64/lustre/tests/test-framework.sh:5127:run_one_logged()
  = /usr/lib64/lustre/tests/test-framework.sh:4944:run_test()
  = /usr/lib64/lustre/tests/sanity-lfsck.sh:3137:main()
Dumping lctl log to /tmp/test_logs/1446948549/sanity-lfsck.test_23c.*.1446948628.log
FAIL 23c (34s)



stderr.log
pdsh@fre0203: fre0201: ssh exited with exit code 1
pdsh@fre0203: fre0202: ssh exited with exit code 1
Using TIMEOUT=20
excepting tests: 



Seagate-Bug-Id: MRP-3134



 Comments   
Comment by parinay v kondekar (Inactive) [ 07/Jan/16 ]

PTLDEBUG=-1 logs attached.

Comment by Gerrit Updater [ 15/Feb/16 ]

kirtan.shetty (kirtan.shetty@seagate.com) uploaded a new patch: http://review.whamcloud.com/18452
Subject: LU-7429 test: LFSCK isn't getting completed within given time.
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: c8ac134bca875e95e0cc3a46e9e3fd2ccdf098e6

Comment by nasf (Inactive) [ 17/Feb/16 ]
        wait_update_facet client "stat $DIR/$tdir/d0/foo |
                awk '/Size/ { print \\\$2 }'" "0" 32 || {
                stat $DIR/$tdir/guard
                $SHOW_NAMESPACE
                error "(8) unexpected size"
        }

The logic above does not expect the LFSCK to complete. Instead, it expects the LFSCK to find the dangling name entry and re-create the lost MDT-object. That repair should happen during "scanning-phase1". According to the logs, however, the LFSCK had already moved to "scanning-phase2", so the expected repair should already have happened, yet the test results show that it did NOT. Simply waiting longer for the LFSCK may therefore not fix the root issue, unless we hit a very rare corner case: the LFSCK was still in "phase1" just before "$SHOW_NAMESPACE" ran and had moved to "phase2" by the time it ran. If that is the case, we need to work out why the namespace LFSCK runs so slowly.
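
To illustrate the distinction drawn in this comment, here is a minimal sketch that classifies an lfsck_namespace status dump as "repaired", "pending" (still in phase 1), or "missed" (past phase 1 with no repair counted). The helper names lfsck_field and check_dangling_repair are hypothetical and are not part of test-framework.sh; the field names are taken from the status dump in the description. Using dangling_repaired as the indicator is an assumption made for illustration only, and the test may rely on a different counter.

```shell
#!/bin/bash
# Hypothetical helpers for illustration; not part of the Lustre test framework.

# Print the value of one "key: value" field from a status dump read on stdin.
lfsck_field() {
    local key=$1
    awk -F': ' -v k="$key" '$1 == k { print $2; exit }'
}

# Classify a status dump passed as the first argument:
#   repaired - the (assumed) repair counter is non-zero
#   pending  - no repair yet, but phase 1 is still running
#   missed   - no repair and the scan is no longer in phase 1
check_dangling_repair() {
    local dump=$1
    local status repaired

    status=$(printf '%s\n' "$dump" | lfsck_field status)
    repaired=$(printf '%s\n' "$dump" | lfsck_field dangling_repaired)

    if [ "${repaired:-0}" -gt 0 ]; then
        echo "repaired"
    elif [ "$status" = "scanning-phase1" ]; then
        echo "pending"
    else
        echo "missed"
    fi
}

# Three fields copied from the failing run's status dump.
sample='status: scanning-phase2
flags: scanned-once,inconsistent
dangling_repaired: 0'

check_dangling_repair "$sample"
```

For the failing run above this reports "missed", which matches the observation that the scan reached phase 2 without the expected repair. A "pending" result would instead suggest that a longer wait could still help.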

Comment by Gerrit Updater [ 22/Jun/16 ]

Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/20916
Subject: LU-7429 tests: inject lfsck failure properly
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 1114c5927df938729ba59da1c39232dbdc74df3a

Comment by Gerrit Updater [ 11/Jul/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/20916/
Subject: LU-7429 tests: inject lfsck failure properly
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: b399a2d8cbe195eb93e6000d19e24e5ed8864b69

Comment by nasf (Inactive) [ 12/Jul/16 ]

The patch has landed on master.

Comment by Gerrit Updater [ 19/Jul/16 ]

Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/21412
Subject: LU-7429 tests: generate dangling name entry properly
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: cb3462ae2274a55a03e323d7f5fcdd597b337163

Comment by Gerrit Updater [ 27/Jul/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/21412/
Subject: LU-7429 tests: generate dangling name entry properly
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 9bb5a2fd3e76b460fd5121d48bc492be27a2e4f5

Comment by Gerrit Updater [ 09/Jan/17 ]

Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/24763
Subject: LU-7429 tests: generate dangling name entry properly
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 4696914e3a5347580628474380c15ea9532cf806

Comment by nasf (Inactive) [ 09/Jan/17 ]

Reopening the ticket for the pending patch https://review.whamcloud.com/24763

Comment by Gerrit Updater [ 24/Jan/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/24763/
Subject: LU-7429 tests: generate dangling name entry properly
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: d3919b209e0a4762540d9c28dd6bc6e3c4bbf33b
