[LU-9251] sanity-lfsck test_26a: test failed to respond and timed out Created: 24/Mar/17  Updated: 15/Feb/18

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: nasf (Inactive) Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for nasf <fan.yong@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/803bb4ba-0fa0-11e7-b5b2-5254006e85c2.

== sanity-lfsck test 26a: LFSCK can add the missing local name entry back to the namespace =========== 23:24:00 (1490225040)
#####
The local name entry back referenced by the MDT-object is lost.
The namespace LFSCK will add the missing local name entry back
to the normal namespace.
#####
Inject failure stub on MDT0 to simulate the case that
foo's name entry will be removed, but the foo's object
and its linkEA are kept in the system.
CMD: trevis-37vm7 /usr/sbin/lctl set_param fail_loc=0x1624
fail_loc=0x1624
CMD: trevis-37vm7 /usr/sbin/lctl set_param fail_loc=0
fail_loc=0
Trigger namespace LFSCK to repair the missing remote name entry
CMD: trevis-37vm7 /usr/sbin/lctl lfsck_start -M lustre-MDT0000 -t namespace -r -A
Started LFSCK on the device lustre-MDT0000: scrub namespace
CMD: trevis-37vm7 /usr/sbin/lctl lfsck_query -t namespace -M lustre-MDT0000 -w |
		      awk '/^namespace_mdts_completed/ { print \$2 }'


 Comments   
Comment by nasf (Inactive) [ 24/Mar/17 ]

According to the logs, the namespace LFSCK on MDT0/1/2 has completed, but the namespace LFSCK on MDT3 was blocked at:

[17949.831270] lfsck           S ffff880077498000     0 25770      2 0x00000080
[17949.832977]  ffff88006978fc90 0000000000000046 ffff880077498000 ffff88006978ffd8
[17949.834796]  ffff88006978ffd8 ffff88006978ffd8 ffff880077498000 ffff88007ae86000
[17949.836610]  ffff8800774acc00 ffff88007ae960d0 0000000000000000 ffff880077498000
[17949.838416] Call Trace:
[17949.839706]  [<ffffffff8168b9e9>] schedule+0x29/0x70
[17949.841292]  [<ffffffffa0d45a8e>] lfsck_double_scan_generic+0x22e/0x2c0 [lfsck]
[17949.843026]  [<ffffffff810c4fe0>] ? wake_up_state+0x20/0x20
[17949.844615]  [<ffffffffa0d4edc0>] lfsck_namespace_double_scan+0x30/0x140 [lfsck]
[17949.846366]  [<ffffffffa0d45de9>] lfsck_double_scan+0x59/0x200 [lfsck]
[17949.848044]  [<ffffffffa084b641>] ? lprocfs_counter_sub+0xc1/0x130 [obdclass]
[17949.849783]  [<ffffffffa0d4abe4>] lfsck_master_engine+0x494/0x1330 [lfsck]
[17949.851479]  [<ffffffff810c4fe0>] ? wake_up_state+0x20/0x20
[17949.853070]  [<ffffffffa0d4a750>] ? lfsck_master_oit_engine+0x14c0/0x14c0 [lfsck]
[17949.854831]  [<ffffffff810b064f>] kthread+0xcf/0xe0
[17949.856366]  [<ffffffff810b0580>] ? kthread_create_on_node+0x140/0x140
[17949.858048]  [<ffffffff81696898>] ret_from_fork+0x58/0x90
[17949.859627]  [<ffffffff810b0580>] ? kthread_create_on_node+0x140/0x140
[17949.861292] lfsck_namespace S ffffffffa0db1ca0     0 25772      2 0x00000080
[17949.863007]  ffff880061e5fd48 0000000000000046 ffff88007749bec0 ffff880061e5ffd8
[17949.864821]  ffff880061e5ffd8 ffff880061e5ffd8 ffff88007749bec0 0000000000000000
[17949.866645]  ffff88007ae86008 ffff88007ae86000 ffff88007ae96000 ffffffffa0db1ca0
[17949.868463] Call Trace:
[17949.869757]  [<ffffffff8168b9e9>] schedule+0x29/0x70
[17949.871347]  [<ffffffffa0d4cc7d>] lfsck_assistant_engine+0x11fd/0x20c0 [lfsck]
[17949.873082]  [<ffffffff810ce47c>] ? dequeue_entity+0x11c/0x5d0
[17949.874708]  [<ffffffff8168b3e0>] ? __schedule+0x3b0/0x990
[17949.876289]  [<ffffffff810c4fe0>] ? wake_up_state+0x20/0x20
[17949.877882]  [<ffffffffa0d4ba80>] ? lfsck_master_engine+0x1330/0x1330 [lfsck]
[17949.879594]  [<ffffffff810b064f>] kthread+0xcf/0xe0
[17949.881114]  [<ffffffff810b0580>] ? kthread_create_on_node+0x140/0x140
[17949.882839]  [<ffffffff81696898>] ret_from_fork+0x58/0x90
[17949.884412]  [<ffffffff810b0580>] ? kthread_create_on_node+0x140/0x140

This means the namespace LFSCK had almost finished on MDT3, but for some unknown reason it was blocked while sending its notification to the other MDTs. Without that status update, lfsck_query() keeps waiting, which is why the test timed out.
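For anyone reproducing this by hand, the unbounded wait described above could be replaced with a bounded poll, so that a stuck MDT produces an error message instead of a hang. The following is only a sketch: the helper names completed_mdts and wait_namespace_completed are hypothetical, the field name is taken from the awk filter in the test log, and it assumes that lfsck_query returns immediately when called without -w.

```shell
# Hypothetical helper: read lfsck_query-style output on stdin and print the
# value of the namespace_mdts_completed field (same awk filter as the test log).
completed_mdts() {
    awk '/^namespace_mdts_completed/ { print $2 }'
}

# Hypothetical helper: run the given query command about once per second until
# it reports EXPECTED completed MDTs, giving up after TIMEOUT attempts
# (roughly TIMEOUT seconds). Returns 0 on success, 1 on timeout.
# Usage: wait_namespace_completed EXPECTED TIMEOUT QUERY_CMD [ARGS...]
wait_namespace_completed() {
    expected=$1
    remaining=$2
    shift 2
    count=""
    while [ "$remaining" -gt 0 ]; do
        # Run the query command and extract the completed-MDT count.
        count=$("$@" 2>/dev/null | completed_mdts)
        if [ "$count" = "$expected" ]; then
            return 0
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
    echo "timed out: ${count:-0}/$expected MDTs completed" >&2
    return 1
}
```

With the four MDTs in this run, an invocation might look like `wait_namespace_completed 4 300 /usr/sbin/lctl lfsck_query -t namespace -M lustre-MDT0000`; the 300-second limit is an arbitrary illustrative value.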
