[LU-9251] sanity-lfsck test_26a: test failed to respond and timed out
Created: 24/Mar/17 Updated: 15/Feb/18
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug |
| Priority: | Minor |
| Reporter: | nasf (Inactive) |
| Assignee: | WC Triage |
| Resolution: | Unresolved |
| Votes: | 0 |
| Labels: | None |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
This issue was created by maloo for nasf <fan.yong@intel.com>.
This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/803bb4ba-0fa0-11e7-b5b2-5254006e85c2

== sanity-lfsck test 26a: LFSCK can add the missing local name entry back to the namespace =========== 23:24:00 (1490225040)
#####
The local name entry back referenced by the MDT-object is lost.
The namespace LFSCK will add the missing local name entry back
to the normal namespace.
#####
Inject failure stub on MDT0 to simulate the case that
foo's name entry will be removed, but the foo's object
and its linkEA are kept in the system.
CMD: trevis-37vm7 /usr/sbin/lctl set_param fail_loc=0x1624
fail_loc=0x1624
CMD: trevis-37vm7 /usr/sbin/lctl set_param fail_loc=0
fail_loc=0
Trigger namespace LFSCK to repair the missing remote name entry
CMD: trevis-37vm7 /usr/sbin/lctl lfsck_start -M lustre-MDT0000 -t namespace -r -A
Started LFSCK on the device lustre-MDT0000: scrub namespace
CMD: trevis-37vm7 /usr/sbin/lctl lfsck_query -t namespace -M lustre-MDT0000 -w |
awk '/^namespace_mdts_completed/ { print \$2 }'
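The repair that test 26a checks can be illustrated with a toy user-space model. It assumes, as the description above implies, that the linkEA kept on foo's object records the parent directory and the name under which foo was linked, so the namespace LFSCK can notice a parent that lacks the matching entry and insert it back. This is a hedged sketch, not Lustre's implementation: FIDs are reduced to plain integers, and `struct directory`, `struct link_ea`, `dir_has_entry()` and `repair_name_entry()` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_ENTRIES 16
#define NAME_LEN    32

/* Hypothetical directory entry: name -> child FID (FID simplified to an integer). */
struct dir_entry {
    char          name[NAME_LEN];
    unsigned long child_fid;
};

/* Hypothetical in-memory directory identified by its own FID. */
struct directory {
    unsigned long    fid;
    struct dir_entry entries[MAX_ENTRIES];
    int              nr_entries;
};

/* Hypothetical linkEA record kept on the child object: parent FID + name. */
struct link_ea {
    unsigned long parent_fid;
    char          name[NAME_LEN];
};

/* Return true if @dir already has @name pointing at @child_fid. */
static bool dir_has_entry(const struct directory *dir, const char *name,
                          unsigned long child_fid)
{
    for (int i = 0; i < dir->nr_entries; i++)
        if (dir->entries[i].child_fid == child_fid &&
            strcmp(dir->entries[i].name, name) == 0)
            return true;
    return false;
}

/*
 * Hypothetical repair step: if the child's linkEA names @dir as its parent
 * but @dir has no matching name entry, add the entry back.
 * Returns 1 if an entry was re-added, 0 if nothing was needed, -1 if full.
 */
static int repair_name_entry(struct directory *dir, unsigned long child_fid,
                             const struct link_ea *lea)
{
    struct dir_entry *de;

    assert(dir != NULL && lea != NULL);
    if (lea->parent_fid != dir->fid)
        return 0;               /* linkEA refers to a different parent */
    if (dir_has_entry(dir, lea->name, child_fid))
        return 0;               /* namespace is already consistent */
    if (dir->nr_entries >= MAX_ENTRIES)
        return -1;              /* no room left in this toy directory */

    de = &dir->entries[dir->nr_entries++];
    snprintf(de->name, sizeof(de->name), "%s", lea->name);
    de->child_fid = child_fid;
    return 1;
}
```

Running the repair a second time on the same object returns 0 because the entry is already present, so the step is idempotent and safe to repeat.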
| Comments |
| Comment by nasf (Inactive) [ 24/Mar/17 ] |
According to the logs, the namespace LFSCK has completed on MDT0, MDT1 and MDT2, but the namespace LFSCK on MDT3 was blocked at:

[17949.831270] lfsck S ffff880077498000 0 25770 2 0x00000080
[17949.832977] ffff88006978fc90 0000000000000046 ffff880077498000 ffff88006978ffd8
[17949.834796] ffff88006978ffd8 ffff88006978ffd8 ffff880077498000 ffff88007ae86000
[17949.836610] ffff8800774acc00 ffff88007ae960d0 0000000000000000 ffff880077498000
[17949.838416] Call Trace:
[17949.839706] [<ffffffff8168b9e9>] schedule+0x29/0x70
[17949.841292] [<ffffffffa0d45a8e>] lfsck_double_scan_generic+0x22e/0x2c0 [lfsck]
[17949.843026] [<ffffffff810c4fe0>] ? wake_up_state+0x20/0x20
[17949.844615] [<ffffffffa0d4edc0>] lfsck_namespace_double_scan+0x30/0x140 [lfsck]
[17949.846366] [<ffffffffa0d45de9>] lfsck_double_scan+0x59/0x200 [lfsck]
[17949.848044] [<ffffffffa084b641>] ? lprocfs_counter_sub+0xc1/0x130 [obdclass]
[17949.849783] [<ffffffffa0d4abe4>] lfsck_master_engine+0x494/0x1330 [lfsck]
[17949.851479] [<ffffffff810c4fe0>] ? wake_up_state+0x20/0x20
[17949.853070] [<ffffffffa0d4a750>] ? lfsck_master_oit_engine+0x14c0/0x14c0 [lfsck]
[17949.854831] [<ffffffff810b064f>] kthread+0xcf/0xe0
[17949.856366] [<ffffffff810b0580>] ? kthread_create_on_node+0x140/0x140
[17949.858048] [<ffffffff81696898>] ret_from_fork+0x58/0x90
[17949.859627] [<ffffffff810b0580>] ? kthread_create_on_node+0x140/0x140
[17949.861292] lfsck_namespace S ffffffffa0db1ca0 0 25772 2 0x00000080
[17949.863007] ffff880061e5fd48 0000000000000046 ffff88007749bec0 ffff880061e5ffd8
[17949.864821] ffff880061e5ffd8 ffff880061e5ffd8 ffff88007749bec0 0000000000000000
[17949.866645] ffff88007ae86008 ffff88007ae86000 ffff88007ae96000 ffffffffa0db1ca0
[17949.868463] Call Trace:
[17949.869757] [<ffffffff8168b9e9>] schedule+0x29/0x70
[17949.871347] [<ffffffffa0d4cc7d>] lfsck_assistant_engine+0x11fd/0x20c0 [lfsck]
[17949.873082] [<ffffffff810ce47c>] ? dequeue_entity+0x11c/0x5d0
[17949.874708] [<ffffffff8168b3e0>] ? __schedule+0x3b0/0x990
[17949.876289] [<ffffffff810c4fe0>] ? wake_up_state+0x20/0x20
[17949.877882] [<ffffffffa0d4ba80>] ? lfsck_master_engine+0x1330/0x1330 [lfsck]
[17949.879594] [<ffffffff810b064f>] kthread+0xcf/0xe0
[17949.881114] [<ffffffff810b0580>] ? kthread_create_on_node+0x140/0x140
[17949.882839] [<ffffffff81696898>] ret_from_fork+0x58/0x90
[17949.884412] [<ffffffff810b0580>] ? kthread_create_on_node+0x140/0x140

This means the namespace LFSCK had almost finished on MDT3, but for some unknown reason it was blocked while sending its status notification to the other MDTs. Without that status update, lfsck_query() keeps waiting, so the test never completes and eventually times out.
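The hang follows a general pattern: a coordinating thread sleeps until every peer has reported completion, so a single lost notification leaves it, and anything polling its status, waiting indefinitely. The sketch below models that pattern in user space with POSIX threads. It is not Lustre code; `struct completion_tracker`, `tracker_init()`, `tracker_notify()` and `tracker_wait()` are hypothetical names. It uses `pthread_cond_timedwait()` so that a missing notification shows up as a bounded wait that reports how many peers are still outstanding, rather than an open-ended sleep like the one the stuck threads appear to be in.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical tracker: a master waits until every peer has reported. */
struct completion_tracker {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             pending;    /* peers that have not reported yet */
};

static void tracker_init(struct completion_tracker *t, int npeers)
{
    pthread_mutex_init(&t->lock, NULL);
    pthread_cond_init(&t->cond, NULL);
    t->pending = npeers;
}

/* Called when one peer's completion notification arrives. */
static void tracker_notify(struct completion_tracker *t)
{
    pthread_mutex_lock(&t->lock);
    assert(t->pending > 0);
    t->pending--;
    pthread_cond_broadcast(&t->cond);
    pthread_mutex_unlock(&t->lock);
}

/*
 * Wait at most @timeout_sec seconds for all peers to report.
 * Returns 0 when every peer reported, otherwise the number still missing:
 * the case that would block forever with an untimed wait.
 */
static int tracker_wait(struct completion_tracker *t, int timeout_sec)
{
    struct timespec deadline;
    int rc = 0;
    int missing;

    /* pthread_cond_timedwait() takes an absolute CLOCK_REALTIME deadline. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_sec;

    pthread_mutex_lock(&t->lock);
    /* Loop to absorb spurious wakeups; stop on timeout or any error. */
    while (t->pending > 0 && rc == 0)
        rc = pthread_cond_timedwait(&t->cond, &t->lock, &deadline);
    assert(rc == 0 || rc == ETIMEDOUT);
    missing = t->pending;
    pthread_mutex_unlock(&t->lock);
    return missing;
}
```

For example, after `tracker_init(&t, 2)` and a single `tracker_notify(&t)`, `tracker_wait(&t, 1)` returns 1 after about one second, pointing at the one peer whose update never arrived instead of hanging.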