Lustre / LU-8863

[LU-8863] LFSCK fails to complete, node cannot recover after LFSCK aborted.

Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.9.0, Lustre 2.10.0
    • Environment: Soak cluster lustre: 2.8.60_5_gcc5601d
    • Severity: 3

    Description

      LFSCK fails to complete on lola-8, MDT0000:

      lctl lfsck_start -M soaked-MDT0000 -s 1000 -t namespace
      2016-11-21 13:21:36,440:fsmgmt.fsmgmt:INFO     lfsck started on lola-8
      2016-11-21 13:21:52,069:fsmgmt.fsmgmt:INFO     lfsck still in progress for soaked-MDT0000 after 15s
      2016-11-21 13:22:22,672:fsmgmt.fsmgmt:INFO     lfsck still in progress for soaked-MDT0000 after 45s
      2016-11-21 13:23:23,898:fsmgmt.fsmgmt:INFO     lfsck still in progress for soaked-MDT0000 after 105s
      2016-11-21 13:25:26,280:fsmgmt.fsmgmt:INFO     lfsck still in progress for soaked-MDT0000 after 225s
      2016-11-21 13:29:31,072:fsmgmt.fsmgmt:INFO     lfsck still in progress for soaked-MDT0000 after 465s
      2016-11-21 13:37:40,601:fsmgmt.fsmgmt:INFO     lfsck still in progress for soaked-MDT0000 after 945s
      2016-11-21 13:53:59,778:fsmgmt.fsmgmt:INFO     lfsck still in progress for soaked-MDT0000 after 1905s
      2016-11-21 14:26:38,226:fsmgmt.fsmgmt:INFO     lfsck still in progress for soaked-MDT0000 after 3825s
      2016-11-21 15:31:55,117:fsmgmt.fsmgmt:INFO     lfsck still in progress for soaked-MDT0000 after 7665s
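
      For reference, the start-and-poll sequence above can be reproduced by hand; a minimal sketch, assuming the soaked filesystem name and that the commands are run on the MDS hosting MDT0000:

          # start a namespace LFSCK on MDT0000, rate-limited to 1000 objects/sec
          lctl lfsck_start -M soaked-MDT0000 -s 1000 -t namespace

          # poll the scan state; the scan is still running while status reports scanning-phase1/2
          lctl get_param mdd.soaked-MDT0000.lfsck_namespace | grep -E 'status|current_position'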
      

      I aborted LFSCK with lfsck_stop.
      LFSCK stopped, but clients and other servers were unable to reconnect.
      Example client:

      Lustre: Evicted from MGS (at MGC192.168.1.108@o2ib10_0) after server handle changed from 0xd9fafa0ca7b8e5dc to 0x732870fe43aa2fe7
      Lustre: MGC192.168.1.108@o2ib10: Connection restored to MGC192.168.1.108@o2ib10_0 (at 192.168.1.108@o2ib10)
      Lustre: Skipped 1 previous similar message
      LustreError: 183198:0:(lmv_obd.c:1402:lmv_statfs()) can't stat MDS #0 (soaked-MDT0000-mdc-ffff880426c9c000), error -4
      LustreError: 183198:0:(llite_lib.c:1736:ll_statfs_internal()) md_statfs fails: rc = -4
      

      The system appears to be wedged in this state; rebooting and remounting the lola-8 MDT does not fix the issue.
      I dumped the Lustre log on lola-8 while LFSCK was running; it is attached.
      Also attached is the lfsck_layout output.
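
      For reference, a minimal sketch of the abort step and of capturing the attached data, assuming the soaked-MDT0000 target and illustrative output paths:

          # abort the running LFSCK on MDT0000
          lctl lfsck_stop -M soaked-MDT0000

          # dump the kernel debug log on lola-8 (path is only an example)
          lctl dk > /tmp/lola-8-lustre-debug.log

          # capture the layout LFSCK state attached to this ticket
          lctl get_param mdd.soaked-MDT0000.lfsck_layout > /tmp/lola-8-lfsck_layout.txt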

      Attachments

        Issue Links

          Activity


            Hitting this again on latest master

            # lctl get_param mdd.*.lfsck_layout
            mdd.soaked-MDT0000.lfsck_layout=
            name: lfsck_layout
            magic: 0xb1734d76
            version: 2
            status: scanning-phase1
            flags:
            param:
            last_completed_time: 1485198303
            time_since_last_completed: 89193 seconds
            latest_start_time: 1485275978
            time_since_latest_start: 11518 seconds
            last_checkpoint_time: 1485287462
            time_since_last_checkpoint: 34 seconds
            latest_start_position: 77
            last_checkpoint_position: 769655424
            first_failure_position: 0
            success_count: 2
            repaired_dangling: 807
            repaired_unmatched_pair: 0
            repaired_multiple_referenced: 0
            repaired_orphan: 0
            repaired_inconsistent_owner: 2310456
            repaired_others: 0
            skipped: 0
            failed_phase1: 0
            failed_phase2: 0
            checked_phase1: 4033934
            checked_phase2: 0
            run_time_phase1: 11517 seconds
            run_time_phase2: 0 seconds
            average_speed_phase1: 350 items/sec
            average_speed_phase2: N/A
            real-time_speed_phase1: 246 items/sec
            real-time_speed_phase2: N/A
            current_position: 773325008
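
            As a sanity check, the counters above are self-consistent: checked_phase1 / run_time_phase1 = 4033934 / 11517 ≈ 350 items/sec, matching average_speed_phase1, so the scan is advancing, just slowly. A minimal sketch for confirming that current_position keeps moving rather than the scan being hung (same parameter as above):

                # re-sample the layout LFSCK state a few times; an increasing
                # current_position means the scan is progressing, not hung
                for i in 1 2 3; do
                    lctl get_param mdd.soaked-MDT0000.lfsck_layout | grep -E 'status|current_position'
                    sleep 30
                done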
            

            Logs

            ERROR    lfsck found errors lola-8/soaked-MDT0000: lf_repaired: 0
            ERROR    lfsck found errors lola-8/soaked-MDT0000: lf_repaired: 0
            ERROR    lfsck found errors lola-8/soaked-MDT0000: lf_repaired: 0
            ERROR    lfsck found errors lola-8/soaked-MDT0000: lf_repaired: 0
            
            cliffw Cliff White (Inactive) added a comment

            This is new to me; I am not sure how to check for DNE recovery as opposed to ordinary recovery. The filesystem recovery completes before we trigger LFSCK in soak.

            cliffw Cliff White (Inactive) added a comment
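
            One hedged way to check this: per-MDT recovery state, including the MDT-to-MDT (DNE) connections, is visible in recovery_status on each MDS; under DNE the other MDTs' osp connections are counted among the "clients" there. A minimal sketch, run on each MDS (target names as in this ticket):

                # show recovery state for the local MDTs; under DNE the other MDTs
                # connect as clients, so their recovery shows up in these counters
                lctl get_param mdt.soaked-MDT*.recovery_status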

            Cliff, have you checked whether DNE recovery and filesystem usage are always blocked until all MDTs are available? It may be that LFSCK is being blocked by the DNE recovery. Preferably, we'd want the filesystem to be usable as soon as MDT0000 is up (at least for those parts that are on available MDTs), but it may be that DNE2 recovery waits for all MDTs to be up before it allows recovery to complete.

            adilger Andreas Dilger added a comment

            According to the logs, both the namespace LFSCK and the layout LFSCK were running before the lfsck_stop. Apart from some inconsistent owners repaired by the layout LFSCK, everything else looks normal. In fact, the inconsistent-owner repairs are false positives caused by asynchronous chown operations during the layout LFSCK. So from the LFSCK point of view, no other inconsistency was found during the log interval.

            But just like LU-8742, the LFSCK ran somewhat slowly, which is why you had to stop it manually with lfsck_stop. According to the logs, though, the LFSCK really was scanning, not hung. How long it will take (already more than 720 seconds) depends on the object count in the system.

            As for the system hang, I cannot establish any relationship with the uncompleted LFSCK, because LFSCK runs in the background and it did not find any inconsistency. That means, from the LFSCK point of view, the system was mountable. As the last comment described, recovery eventually timed out and some clients were evicted; that may be related to why the clients remain hung.

            yong.fan nasf (Inactive) added a comment
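
            A note on the speed: the run above was started with -s 1000, which caps the scan at 1000 objects per second. That cap is one knob to check when LFSCK seems slow (per the lfsck_start documentation, -s 0 means no limit), though the speeds in the lfsck_layout output above were already below it. A minimal sketch, reusing the invocation from this ticket:

                # same invocation the soak framework used, but with the rate cap removed
                lctl lfsck_start -M soaked-MDT0000 -s 0 -t namespace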
            cliffw Cliff White (Inactive) added a comment - edited

            I have been restarting the four MDS nodes to attempt to clear this. Upon restarting the fourth MDS (192.168.1.111@o2ib), MDT0000 finally completed recovery.

            Lustre: soaked-MDT0000: Received new LWP connection from 192.168.1.110@o2ib10, removing former export from same NID
            Lustre: Skipped 47 previous similar messages
            Lustre: soaked-MDT0000: Client soaked-MDT0002-mdtlov_UUID (at 192.168.1.110@o2ib10) refused connection, still busy with 10 references
            Lustre: Skipped 47 previous similar messages
            Lustre: 5798:0:(client.c:2111:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1479919197/real 1479919241]  req@ffff88041ce450c0 x1551735784078192/t0(0) o38->soaked-MDT0003-osp-MDT0000@192.168.1.111@o2ib10:24/4 lens 520/544 e 0 to 1 dl 1479919252 ref 1 fl Rpc:eXN/0/ffffffff rc 0/-1
            Lustre: 5798:0:(client.c:2111:ptlrpc_expire_one_request()) Skipped 5 previous similar messages
            Lustre: MGS: Connection restored to 192.168.1.111@o2ib10 (at 192.168.1.111@o2ib10)
            format at ldlm_lib.c:1221:target_handle_connect doesn't end in newline
            Lustre: soaked-MDT0000: Rejecting reconnect from the known client soaked-MDT0000-lwp-MDT0003_UUID (at 192.168.1.111@o2ib10) because it is indicating it is a new client
            Lustre: soaked-MDT0000: recovery is timed out, evict stale exports
            Lustre: soaked-MDT0000: disconnecting 12 stale clients
            Lustre: 5967:0:(ldlm_lib.c:1624:abort_req_replay_queue()) @@@ aborted:  req@ffff880821355080 x1551097408033824/t0(171804417562) o35->d7ef0a77-d6cc-d6e6-b6ee-f0c4dd8b3805@192.168.1.113@o2ib100:-1/-1 lens 512/0 e 2753 to 0 dl 1479919489 ref 1 fl Complete:/4/ffffffff rc 0/-1
            LustreError: 5967:0:(ldlm_lib.c:1645:abort_lock_replay_queue()) @@@ aborted:  req@ffff880827c96050 x1551509616864032/t0(0) o101->soaked-MDT0003-mdtlov_UUID@192.168.1.111@o2ib10:-1/-1 lens 328/0 e 2752 to 0 dl 1479919496 ref 1 fl Complete:/40/ffffffff rc 0/-1
            Lustre: soaked-MDT0000: Denying connection for new client c3576180-37f0-1109-b79d-ead98e580c5d(at 192.168.1.127@o2ib100), waiting for 21 known clients (6 recovered, 3 in progress, and 12 evicted) to recover in 21187373:09
            Lustre: Skipped 3 previous similar messages
            Lustre: 5967:0:(ldlm_lib.c:2035:target_recovery_overseer()) soaked-MDT0000 recovery is aborted by hard timeout
            Lustre: 5967:0:(ldlm_lib.c:2045:target_recovery_overseer()) recovery is aborted, evict exports in recovery
            Lustre: soaked-MDT0000: Recovery over after 1147:10, of 21 clients 6 recovered and 15 were evicted.
            Lustre: soaked-MDT0000: Connection restored to soaked-MDT0001-mdtlov_UUID (at 192.168.1.109@o2ib10)
            Lustre: soaked-MDT0000: Connection restored to 3f3093dd-91db-9fe9-0a49-09ac8c98d9ec (at 192.168.1.125@o2ib100)
            Lustre: Skipped 1 previous similar message
            Lustre: soaked-MDT0000: Connection restored to b111dd6e-3c52-31d6-0957-ebaf9d411485 (at 192.168.1.123@o2ib100)
            Lustre: Skipped 1 previous similar message
            Lustre: soaked-MDT0000: Connection restored to 0ac581dd-e230-df88-61be-a489f0ddcab5 (at 192.168.1.126@o2ib100)
            Lustre: Skipped 1 previous similar message
            

            However, clients remain hung; still investigating.

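            If recovery itself is what is wedging MDT0000, it can be abandoned explicitly instead of waiting for the hard timeout; a hedged sketch of the two usual options (the device path and mount point below are only placeholders), with the caveat that aborting recovery evicts the clients that have not yet replayed:

                # abort recovery on the running MDT device
                lctl --device soaked-MDT0000 abort_recovery

                # or skip recovery entirely when remounting the MDT
                mount -t lustre -o abort_recov /dev/<mdt-device> /mnt/soaked-mdt0000
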
            laisiyao Lai Siyao added a comment -

            I just applied for access to Lola: DCO-6349. I'll check the system tomorrow.


            After a reboot of the second MDT (lola-9), reconnection is still hung.

            Lustre: Skipped 8 previous similar messages
            Lustre: soaked-MDT0000: Received new LWP connection from 192.168.1.109@o2ib10, removing former export from same NID
            Lustre: soaked-MDT0000: Client soaked-MDT0001-mdtlov_UUID (at 192.168.1.109@o2ib10) refused connection, still busy with 7 references
            format at ldlm_lib.c:1221:target_handle_connect doesn't end in newline
            Lustre: soaked-MDT0000: Rejecting reconnect from the known client soaked-MDT0000-lwp-MDT0001_UUID (at 192.168.1.109@o2ib10) because it is indicating it is a new client
            
            cliffw Cliff White (Inactive) added a comment
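
            The "refused connection, still busy with N references" messages indicate that the old export for that NID has not been released yet. One hedged way to see which exports MDT0000 is still holding (standard mdt proc paths):

                # list the exports MDT0000 currently holds and their client UUIDs
                lctl get_param mdt.soaked-MDT0000.exports.*.uuid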
            pjones Peter Jones added a comment -

            Lai

            Please could you advise on this one? The system is left in the hung state so you should be able to access it directly if that would help.

            Peter


            People

              Assignee: laisiyao Lai Siyao
              Reporter: cliffw Cliff White (Inactive)
              Votes: 0
              Watchers: 7
