[LU-8472] sanity-scrub test_5 times out Created: 02/Aug/16  Updated: 06/Feb/18  Resolved: 26/Sep/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.9.0
Fix Version/s: Lustre 2.9.0

Type: Bug Priority: Critical
Reporter: James Nunez (Inactive) Assignee: nasf (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment:

autotest review-dne-part-2


Issue Links:
Related
is related to LU-8493 Do not set stale flag for new created... Resolved
is related to LU-8768 sanity-scrub test_10a: 'test failed ... Open
is related to LU-8646 sanity-scrub tests 4b and 4c fail wit... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

sanity-scrub test 5 hangs and times out. Looking at the console logs, it’s not clear why the test is hanging.

From the test_log, the last thing we see is:

Starting client: onyx-40vm1.onyx.hpdd.intel.com:  -o user_xattr,flock onyx-40vm7@tcp:/lustre /mnt/lustre
CMD: onyx-40vm1.onyx.hpdd.intel.com mkdir -p /mnt/lustre
CMD: onyx-40vm1.onyx.hpdd.intel.com mount -t lustre -o user_xattr,flock onyx-40vm7@tcp:/lustre /mnt/lustre
CMD: onyx-40vm3,onyx-40vm7 /usr/sbin/lctl set_param -n osd-ldiskfs.*.full_scrub_ratio=0
CMD: onyx-40vm3,onyx-40vm7 /usr/sbin/lctl set_param fail_val=3 fail_loc=0x190
fail_val=3
fail_loc=0x190
fail_val=3
fail_loc=0x190
  File: '/mnt/lustre/d5.sanity-scrub/mds1/f5.sanity-scrub800'
  Size: 0         	Blocks: 0          IO Block: 4194304 regular empty file
Device: 2c54f966h/743766374d	Inode: 144115205322834801  Links: 1
Access: (0444/-r--r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2016-08-01 19:00:37.000000000 -0700
Modify: 2016-08-01 18:59:51.000000000 -0700
Change: 2016-08-01 19:00:37.000000000 -0700
 Birth: -

I don’t see any LBUGs or ASSERTIONs in any of the console logs. In the client logs, when this test completes we see an mds_connect to the MDS fail and then a later connect. From the client dmesg, however, we see the problem connection to the MDS, but the client never connects to the MDS:

[13304.508640] Lustre: Unmounted lustre-client
[13304.511334] Lustre: Skipped 2 previous similar messages
[13367.608943] Lustre: DEBUG MARKER: mkdir -p /mnt/lustre
[13367.616737] Lustre: DEBUG MARKER: mount -t lustre -o user_xattr,flock onyx-40vm7@tcp:/lustre /mnt/lustre
[13368.970139] Lustre: DEBUG MARKER: grep -c /mnt/lustre' ' /proc/mounts
[13368.978423] Lustre: DEBUG MARKER: lsof -t /mnt/lustre
[13369.048679] Lustre: DEBUG MARKER: umount /mnt/lustre 2>&1
[13449.961128] Lustre: DEBUG MARKER: mkdir -p /mnt/lustre
[13449.968913] Lustre: DEBUG MARKER: mount -t lustre -o user_xattr,flock onyx-40vm7@tcp:/lustre /mnt/lustre
[13450.020802] LustreError: 11-0: lustre-MDT0001-mdc-ffff880051c8a800: operation mds_connect to node 10.2.4.185@tcp failed: rc = -16
[13450.024064] LustreError: Skipped 2 previous similar messages
[13485.017940] LustreError: 11-0: lustre-MDT0001-mdc-ffff880051c8a800: operation mds_connect to node 10.2.4.185@tcp failed: rc = -16
[13485.023276] LustreError: Skipped 6 previous similar messages
[13550.024059] LustreError: 11-0: lustre-MDT0001-mdc-ffff880051c8a800: operation mds_connect to node 10.2.4.185@tcp failed: rc = -16
[13550.031793] LustreError: Skipped 12 previous similar messages
[13680.018274] LustreError: 11-0: lustre-MDT0001-mdc-ffff880051c8a800: operation mds_connect to node 10.2.4.185@tcp failed: rc = -16
[13680.023602] LustreError: Skipped 25 previous similar messages
[13940.017773] LustreError: 11-0: lustre-MDT0001-mdc-ffff880051c8a800: operation mds_connect to node 10.2.4.185@tcp failed: rc = -16
[13940.023074] LustreError: Skipped 51 previous similar messages
[14455.018070] LustreError: 11-0: lustre-MDT0001-mdc-ffff880051c8a800: operation mds_connect to node 10.2.4.185@tcp failed: rc = -16
[14455.023708] LustreError: Skipped 102 previous similar messages
[15060.018374] LustreError: 11-0: lustre-MDT0001-mdc-ffff880051c8a800: operation mds_connect to node 10.2.4.185@tcp failed: rc = -16
[15060.024085] LustreError: Skipped 120 previous similar messages
[15665.017755] LustreError: 11-0: lustre-MDT0001-mdc-ffff880051c8a800: operation mds_connect to node 10.2.4.185@tcp failed: rc = -16
[15665.024000] LustreError: Skipped 120 previous similar messages
[16270.017567] LustreError: 11-0: lustre-MDT0001-mdc-ffff880051c8a800: operation mds_connect to node 10.2.4.185@tcp failed: rc = -16
[16270.023955] LustreError: Skipped 120 previous similar messages
[16816.123140] SysRq : Show State
[16816.124024]   task                        PC stack   pid father
[16816.124024] systemd         S 0000000000000000     0     1      0 0x00000000
[16816.124024]  ffff88007c7b3db8 0000000000000086 ffff88007c7a8000 ffff88007c7b3fd8
[16816.124024]  ffff88007c7b3fd8 ffff88007c7b3fd8 ffff88007c7a8000 0000000000000000
[16816.124024]  0000000000000000 ffff88003693d5a0 ffff88007c7a8000 0000000000000000
[16816.124024] Call Trace:
[16816.124024]  [<ffffffff8163b809>] schedule+0x29/0x70
[16816.124024]  [<ffffffff8163a99d>] schedule_hrtimeout_range_clock+0x12d/0x150
[16816.124024]  [<ffffffff81228a59>] ? ep_scan_ready_list.isra.9+0x1b9/0x1f0
[16816.124024]  [<ffffffff8163a9d3>] schedule_hrtimeout_range+0x13/0x20
[16816.124024]  [<ffffffff81228cee>] ep_poll+0x23e/0x360
[16816.124024]  [<ffffffff8122b23d>] ? do_timerfd_settime+0x2ed/0x3a0
[16816.124024]  [<ffffffff810b88d0>] ? wake_up_state+0x20/0x20
[16816.124024]  [<ffffffff81229ded>] SyS_epoll_wait+0xed/0x120
[16816.124024]  [<ffffffff81646889>] system_call_fastpath+0x16/0x1b

These failures do NOT look like LU-8070 or LU-8399.
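
For reference, rc = -16 is -EBUSY, which an MDT typically returns to connect requests while it has not finished recovery. A rough way to confirm whether MDT0001 is stuck in recovery (the target names below are illustrative, taken from the log above) would be:

  # on the MDS hosting MDT0001
  lctl get_param -n mdt.lustre-MDT0001.recovery_status
  # on the client, the import state of the stuck target
  lctl get_param mdc.lustre-MDT0001-mdc-*.import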

Recent failure logs are at
2016-07-22 - https://testing.hpdd.intel.com/test_sets/b5c71fbc-5087-11e6-8968-5254006e85c2
2016-07-25 - https://testing.hpdd.intel.com/test_sets/cdc385e8-52d9-11e6-bf87-5254006e85c2
2016-07-25 - https://testing.hpdd.intel.com/test_sets/eaaeec4a-52d6-11e6-bf87-5254006e85c2
2016-07-29 - https://testing.hpdd.intel.com/test_sets/97c36552-5604-11e6-b5b1-5254006e85c2
2016-07-29 - https://testing.hpdd.intel.com/test_sets/5d523e04-55e9-11e6-aa74-5254006e85c2
2016-08-01 - https://testing.hpdd.intel.com/test_sets/0dca7564-583b-11e6-aa74-5254006e85c2
2016-08-01 - https://testing.hpdd.intel.com/test_sets/64a45f36-584e-11e6-aa74-5254006e85c2
2016-08-02 - https://testing.hpdd.intel.com/test_sets/058493fc-588c-11e6-b5b1-5254006e85c2



 Comments   
Comment by Andreas Dilger [ 04/Aug/16 ]

Fan Yong, can you please take a look at this?

Comment by nasf (Inactive) [ 13/Aug/16 ]

There is a known issue with DNE recovery and OI scrub. If the MDT is restored from a file-level backup, then when the MDT mounts it is quite possible that only part of the OI mappings are rebuilt immediately; the rest are rebuilt in the background after the mount, unless the "-o noscrub" option is specified. If "-o noscrub" is specified when mounting the MDT, then only the initial OI scrub fixes some important objects' OI mappings, and the remaining OI mappings stay invalid until the administrator triggers the OI scrub manually after the mount. Usually that is fine, but DNE makes things more complex, because DNE recovery depends on update logs, which are FID-based llogs. When the MDT mounts, it collects update logs from the other MDTs and parses them to replay the uncommitted modifications. These FID-based recovery RPCs depend on valid OI mappings, but as described above, it is NOT guaranteed that all the OI mappings have been rebuilt during the recovery. So if "-o noscrub" is specified, the cross-MDT recovery may hit an -EREMCHG (-78) failure; otherwise it may hit -EINPROGRESS (-115) and retry forever until the related OI mapping is rebuilt.
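
For context, a minimal sketch of the two mount modes described above (the device and filesystem names are illustrative, and the exact lfsck_start syntax should be checked against the target Lustre release):

  # without "-o noscrub": the OI scrub restarts automatically and rebuilds
  # the remaining OI mappings in the background after the mount
  mount -t lustre /dev/mdt_dev /mnt/mds1

  # with "-o noscrub": only the initial OI scrub fixes a few important
  # objects; the admin must trigger the scrub manually afterwards
  mount -t lustre -o noscrub /dev/mdt_dev /mnt/mds1
  lctl lfsck_start -M lustre-MDT0000 -t scrub

  # the scrub progress/status can be checked with
  lctl get_param -n osd-ldiskfs.lustre-MDT0000.oi_scrub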

In sanity-scrub test_5, we mount the MDT without "-o noscrub", so the cross-MDT recovery triggers a full-speed OI scrub. However, the test sets the fail_loc OBD_FAIL_OSD_SCRUB_DELAY to slow down the OI scrub so that its status can be checked, which makes the scrub run very slowly (one OI mapping every 3 seconds). The recovery thread therefore has to wait and retry again and again, and that is why the test times out.
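
To put a rough number on the delay (assuming, as the test_log above suggests, on the order of 800 files per MDT whose OI mappings need to be rebuilt):

  800 mappings x 3 seconds/mapping = 2400 seconds (40 minutes)

which is well beyond the test timeout, so the recovery thread stuck behind the throttled scrub can never finish in time.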

Comment by nasf (Inactive) [ 13/Aug/16 ]

It is not possible to totally avoid FID-based RPCs during OI scrub, but we should try to minimize them, especially on the MDT side. There are at least three points that can be improved:

1) The DNE logic should guarantee that once the sub-modifications on all related MDTs have been committed, the corresponding update logs are cancelled on those MDTs. Otherwise, the next time the MDTs mount, unnecessary update log scanning is triggered. We have already observed related bugs on current master where update log cancellation fails:

[112275.488838] LustreError: 82035:0:(llog_cat.c:744:llog_cat_cancel_records()) lustre-MDT0000-osp-MDT0001: fail to cancel 1 of 1 llog-records: rc = -116

2) In update_recovery_exec(), before replaying a cross-MDT modification, it should check whether the related modification/update log has already been committed; if so, skip it. For implementation convenience, the current logic locates the object by the FID in the update log before checking whether it is committed. That lookup may trigger the OI scrub and block, even when the update log should simply be skipped. The logic needs to be adjusted to avoid the unnecessary object lookup.

3) The OI scrub tests should flush all modifications before the file-level backup to avoid cross-MDT recovery after the restore (see the sketch at the end of this comment).

Point 1) is already in progress via LU-8493. I will work on point 3).
Di, would you please adjust the DNE recovery logic for point 2)? Thanks!
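
For point 3), a minimal sketch of what the test scripts could do before the file-level backup (the force_sync tunable path is an assumption here and should be verified against the osd in use):

  # flush all outstanding modifications on every MDT so that no update
  # logs are left to replay after the backup/restore cycle
  sync
  lctl set_param -n osd-ldiskfs.*MDT*.force_sync=1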

Comment by Gerrit Updater [ 13/Aug/16 ]

Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/21918
Subject: LU-8472 scrub: try to avoid recovery during OI scrub
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 5b6e28c38bef45878e3ef2d595d52f67d927fc15

Comment by nasf (Inactive) [ 14/Sep/16 ]

We have recently hit many sanity-scrub test failures related to the OI scrub being triggered when locating objects for the update logs. We need the patch http://review.whamcloud.com/21918.

Comment by Gerrit Updater [ 26/Sep/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/21918/
Subject: LU-8472 scrub: try to avoid recovery during OI scrub
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: a41c6fad4672a60166088b9ad8aeb4f1b51c38e7

Comment by Peter Jones [ 26/Sep/16 ]

Landed for 2.9
