[LU-9952] soft lockup in osd_inode_iteration() for lustre 2.8.1 - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Fixed
Priority: Major
Fix Version/s: None
Affects Version/s: Lustre 2.8.0
Labels: None
Environment: Lustre ldiskfs server back end running version 2.8.1 with a few additional patches. The OS is RHEL 6.9.
Severity: 2
Rank (Obsolete): 9223372036854775807

Description

One of our production file systems running Lustre 2.8.1 experienced a soft lockup very similar to LU-9488. I attempted to backport the patch, but far too many changes have happened between Lustre 2.8.1 and Lustre 2.10.0, and I am unsure whether I would get the port right. I have attached the backtrace.

Attachments

vmcore-dmesg-f1.txt (512 kB, 06/Sep/17 6:56 PM)

Issue Links

is related to:
- LU-9040 Soft lockup on CPU during lfsck (Resolved)
- LU-9488 soft lockup in osd_inode_iteration() (Resolved)

Activity

James A Simmons added a comment - 26/Sep/17 7:00 PM

We wouldn't be running the test framework on our production system. It looks like I just need to create a bunch of files on the file system.

lctl set_param -n osd*.*MDT*.force_sync=1
lctl set_param fail_val=1 fail_loc=0x190
lctl lfsck_start -M lustre-MDT0000
lctl set_param fail_val=0 fail_loc=0x198

To check the status:
lctl get_param -n osd-ldiskfs.lustre-MDT0000.oi_scrub | grep status

Does this look right? What values should I use to reset it back to normal working conditions?
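Outside the test framework, the steps in this comment could be collected into a small script. This is only a sketch: the function names, the MDT variable and the DRY_RUN switch are hypothetical, the fail_loc values are copied from the comment without interpretation (their meaning is not stated in this ticket), and the reset step assumes that setting fail_loc and fail_val back to 0 is what disarms fault injection.

```shell
#!/bin/sh
# Hypothetical helper wrapping the reproduction steps from this comment.
# Assumptions (not from the ticket): run on the MDS node, target is
# lustre-MDT0000 unless MDT is set, DRY_RUN=1 prints instead of executing.
MDT=${MDT:-lustre-MDT0000}

# Execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Steps as listed in the comment; fail_loc values copied verbatim.
start_repro() {
    # Quote the pattern so the shell does not glob it; lctl matches it itself.
    run lctl set_param -n "osd*.*MDT*.force_sync=1"
    run lctl set_param fail_val=1 fail_loc=0x190
    run lctl lfsck_start -M "$MDT"
    run lctl set_param fail_val=0 fail_loc=0x198
}

# Show the OI scrub status lines for the target MDT.
check_status() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "lctl get_param -n osd-ldiskfs.$MDT.oi_scrub | grep status"
    else
        lctl get_param -n "osd-ldiskfs.$MDT.oi_scrub" | grep status
    fi
}

# Assumed reset: fail_loc=0 disarms injection, fail_val=0 clears its argument.
reset_fail_loc() {
    run lctl set_param fail_loc=0 fail_val=0
}
```

Sourcing the file and running `DRY_RUN=1 start_repro` prints the four commands without touching the system, which makes it easy to review them before a real run.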

nasf (Inactive) added a comment - 21/Sep/17 10:11 AM

Let's check whether this patch works: https://review.whamcloud.com/#/c/29133/

James A Simmons added a comment - 19/Sep/17 4:47 PM

Could you create a test condition before the 30th of September?

nasf (Inactive) added a comment - 19/Sep/17 1:17 AM

I think we need a new fail_loc to simulate osd_inode_iteration() trouble; for example, inject a new failure stub in osd_iit_next() to simulate various bitmap layout cases.

James A Simmons added a comment - 18/Sep/17 3:49 PM - edited

We are in the process of testing these patches. I attempted to recreate the problem with "lctl set_param fail_loc=0x1504", but that didn't work. What would you recommend to recreate this problem on a 2.8 system? Note that we removed the offending files to make our production file system usable again.

nasf (Inactive) added a comment - 08/Sep/17 4:31 AM

The known patches on master related to the OI scrub soft lockup have been backported as follows:

https://review.whamcloud.com/28903
https://review.whamcloud.com/28904
https://review.whamcloud.com/28905
https://review.whamcloud.com/28906

Peter Jones added a comment - 07/Sep/17 5:21 PM

Fan Yong

Can you please advise on this one?

Thanks

Peter

People

Assignee: nasf (Inactive)

Reporter: James A Simmons

Votes: 0

Watchers: 4

Dates

Created: 06/Sep/17 6:56 PM

Updated: 05/Jun/18 4:42 PM

Resolved: 05/Jun/18 4:42 PM