[LU-2285] Test failure on replay-ost-single test_3: write page inode failed -2

Details

    • Bug
    • Resolution: Cannot Reproduce
    • Major
    • None
    • Lustre 2.4.0
    • None
    • 3
    • 5469

    Description

      This issue was created by maloo for Oleg Drokin <green@whamcloud.com>

      This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/70b83f68-2797-11e2-9e20-52540035b04c.

      The sub-test test_3 failed with the following error:

      test_3 failed with 1

      On the client we can see in dmesg that the write actually failed with -2 (ENOENT).

      Lustre: DEBUG MARKER: /usr/sbin/lctl mark == replay-ost-single test 3: Fail OST during write, with verification ================================ 12:42:29 \(1352148149\)
      Lustre: DEBUG MARKER: == replay-ost-single test 3: Fail OST during write, with verification ================================ 12:42:29 (1352148149)
      LustreError: 17578:0:(vvp_io.c:1037:vvp_io_commit_write()) Write page 11 of inode ffff81031f41fb28 failed -2
      Lustre: DEBUG MARKER: /usr/sbin/lctl mark  replay-ost-single test_3: @@@@@@ FAIL: test_3 failed with 1 
      

      Info required for matching: replay-ost-single 3

          Activity

            bzzz Alex Zhuravlev added a comment -

            look at something like:

            /* This will trigger a watchdog timeout */
            OBD_FAIL_TIMEOUT(OBD_FAIL_MDS_STATFS_LCW_SLEEP,
                             (MDT_SERVICE_WATCHDOG_FACTOR *
                              at_get(&svcpt->scp_at_estimate)) + 1);
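
            For context, the idea is presumably to drop such a delay into the orphan-cleanup path so that the window Li Wei describes below stays open long enough for a file creation to land inside it. This is a rough sketch only, not a patch against any tree: OBD_FAIL_OSP_CLEANUP_ORPHAN_SLEEP is a made-up fail_loc name, and the surrounding osp_precreate_cleanup_orphans() code is elided.

            /* Sketch: inside osp_precreate_cleanup_orphans(), after opd_last_used_id
             * has been read and the OST_CREATE request carrying OBD_FL_DELORPHAN has
             * been sent, hold the thread so that a concurrent osp_precreate_get_id()
             * can hand out an ID from the stale pre-created pool.
             * OBD_FAIL_OSP_CLEANUP_ORPHAN_SLEEP is hypothetical; cfs_fail_val would be
             * set from userspace, e.g. "lctl set_param fail_val=<seconds>". */
            OBD_FAIL_TIMEOUT(OBD_FAIL_OSP_CLEANUP_ORPHAN_SLEEP, cfs_fail_val);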

            liwei Li Wei (Inactive) added a comment -

            It is tricky to make sure the file creation (osp_precreate_reserve() and osp_precreate_get_id()) happens after osp_precreate_cleanup_orphans() has read opd_last_used_id and before the orphan cleanup RPC has been replied to.

            bzzz Alex Zhuravlev added a comment -

            Hmm, why is it tricky? Basically we want a non-empty pool in a specific OSP, then stop the corresponding OST, wait until the MDS gets disconnected, and try to create a file using that specific OST?

            liwei Li Wei (Inactive) added a comment -

            The new test is a bit tricky to get right, but here is a proof-of-concept fix for the major issue: http://review.whamcloud.com/4590.

            bzzz Alex Zhuravlev added a comment -

            No, I think the problem is there and it's more serious than you described: basically there should be no reservations/allocations done during the orphan cleanup procedure, so I'm suggesting we develop a test reproducing this first.

            liwei Li Wei (Inactive) added a comment - edited

            Alex, are you suggesting there is no problem that needs to be fixed with regard to the race? But I don't get how the existing code blocks allocations during orphan deletions.

            bzzz Alex Zhuravlev added a comment -

            I'd suggest developing a test which blocks the orphan cleanup procedure on the OST and forces the MDS to allocate a pre-created object. The race you described seems to be a subset of the general case, in which any object allocation should be blocked during the orphan cleanup procedure.

            liwei Li Wei (Inactive) added a comment - edited

            The file creation was satisfied from the pre-created pool without waiting for a pre-creation RPC. The race I was trying to point out is between osp_precreate_get_id() and osp_precreate_cleanup_orphans(). The latter sends on-disk last used IDs to OSTs, while osp_precreate_get_id() increments in-memory last used IDs. Hence, the race may cause object IDs assigned based on pre-OST-failure counters to be destroyed by the concurrent orphan deletion request. The log shows that this is what happened.

            If osp_precreate_cleanup_orphans() sent in-memory last used IDs instead, this problem would go away, I think. Of course, we would need to make sure the in-memory last used IDs are properly initialized from their on-disk counterparts before the initial orphan deletions. I haven't tried this yet. Any objections?
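
            To make the interleaving concrete, here is a small stand-alone userspace analogue of the race (pthreads, not Lustre code). The IDs loosely mirror the syslog quoted further down in this thread (in-memory last used ID 34, pre-created objects up to 67); all names and numbers are illustrative only.

            #include <pthread.h>
            #include <stdbool.h>
            #include <stdio.h>

            #define PRE_LAST_CREATED 67            /* objects the OST pre-created before the failure */

            static int  on_disk_last_used   = 34;  /* counter the MDT last committed to disk    */
            static int  in_memory_last_used = 34;  /* counter used to hand out IDs to new files */
            static bool ost_object_exists[PRE_LAST_CREATED + 1];

            static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
            static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
            static int step;                       /* 0: start, 1: boundary read, 2: ID handed out */

            /* Analogue of osp_precreate_cleanup_orphans(): snapshot the on-disk counter,
             * then (after the creator has sneaked in) let the "OST" destroy everything above it. */
            static void *cleanup_thread(void *arg)
            {
                (void)arg;
                pthread_mutex_lock(&lock);
                int boundary = on_disk_last_used;  /* stale, on-disk value goes into the request */
                step = 1;
                pthread_cond_broadcast(&cond);
                while (step < 2)                   /* "RPC in flight"; creator grabs an ID meanwhile */
                    pthread_cond_wait(&cond, &lock);
                pthread_mutex_unlock(&lock);

                for (int id = boundary + 1; id <= PRE_LAST_CREATED; id++)
                    ost_object_exists[id] = false; /* OST deletes all orphans above the boundary */
                printf("cleanup: destroyed objects %d..%d\n", boundary + 1, PRE_LAST_CREATED);
                return NULL;
            }

            /* Analogue of osp_precreate_get_id(): hand out the next in-memory ID. */
            static void *creator_thread(void *arg)
            {
                pthread_mutex_lock(&lock);
                while (step < 1)                   /* run only after the boundary has been read */
                    pthread_cond_wait(&cond, &lock);
                int id = ++in_memory_last_used;    /* ID 35, taken from the stale pre-created pool */
                step = 2;
                pthread_cond_broadcast(&cond);
                pthread_mutex_unlock(&lock);

                *(int *)arg = id;
                return NULL;
            }

            int main(void)
            {
                for (int id = 0; id <= PRE_LAST_CREATED; id++)
                    ost_object_exists[id] = true;

                int new_id = 0;
                pthread_t c, w;
                pthread_create(&c, NULL, cleanup_thread, NULL);
                pthread_create(&w, NULL, creator_thread, &new_id);
                pthread_join(c, NULL);
                pthread_join(w, NULL);

                /* The first write to the new file hits a backing object that is already gone. */
                printf("write to object %d: %s\n", new_id,
                       ost_object_exists[new_id] ? "ok" : "fails with -ENOENT (-2)");
                return 0;
            }

            Built with "cc -pthread", the sketch hands out ID 35 and then destroys it during cleanup, which is the same shape as the -2/ENOENT seen by the writers in the reports above.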

            bzzz Alex Zhuravlev added a comment -

            > The file creation in question happened after the OSP had sent the OBD_FL_DELORPHAN request but before the OSP got the reply and updated its counters. I think the file creation should wait until the last created ID was known from the OST.

            Correct, though this is supposed to be the case in the code; look at osp_precreate_thread(): upon reconnect we restart the main loop and do not get to osp_precreate_send() before the synchronous osp_precreate_cleanup_orphans() has completed.

            Perhaps there is a race between reconnect and ptlrpc_queue_wait()?

            liwei Li Wei (Inactive) added a comment -

            In both reports above, the processes writing to Lustre got ENOENT:

            == replay-ost-single test 3: Fail OST during write, with verification ================================ 12:42:29 (1352148149)
            Failing ost1 on node client-21-ib
            CMD: client-21-ib grep -c /mnt/ost1' ' /proc/mounts
            tee: standard output: No such file or directory
            [...]
            

            I'm able to reproduce this ENOENT locally on a single-node VM setup with ZFS targets:

            == replay-ost-single test 2: |x| 10 open(O_CREAT)s == 12:38:36 (1352695116)
            replay-ost-single.sh: line 72: echo: write error: No such file or directory
             replay-ost-single test_2: @@@@@@ FAIL: create /mnt/lustre/d0.replay-ost-single/f.replay-ost-single.2-2 
            

            From my syslog:

            Nov 12 12:38:36 h221g kernel: LustreError: 22747:0:(osp_precreate.c:382:osp_precreate_cleanup_orphans()) lustre-OST0000-osc-MDT0000: going to cleanup orphans since 3
            Nov 12 12:38:37 h221g kernel: LustreError: 22605:0:(ofd_obd.c:1069:ofd_orphans_destroy()) lustre-OST0000: deleting orphan objects from 5 to 67
            Nov 12 12:38:37 h221g kernel: Lustre: DEBUG MARKER: == replay-ost-single test 2: |x| 10 open(O_CREAT)s == 12:38:36 (1352695116)
            Nov 12 12:38:37 h221g kernel: LustreError: 22501:0:(osp_precreate.c:721:osp_precreate_reserve()) lustre-OST0000-osc-MDT0000: precreated 33: opd_pre_last_created 67opd_pre_used_id 34 opd_pre_reserved 0
            Nov 12 12:38:37 h221g kernel: LustreError: 22501:0:(osp_precreate.c:780:osp_precreate_get_id()) Incremented opd_pre_used_id for OSP 0: 35
            Nov 12 12:38:37 h221g kernel: LustreError: 22501:0:(osp_object.c:291:osp_object_create()) Written last used ID for OSP 0: 35: 0
            Nov 12 12:38:41 h221g kernel: LustreError: 22605:0:(ldlm_resource.c:1107:ldlm_resource_get()) lvbo_init failed for resource 34: rc -2
            Nov 12 12:38:42 h221g kernel: LustreError: 22605:0:(ldlm_resource.c:1107:ldlm_resource_get()) lvbo_init failed for resource 33: rc -2
            Nov 12 12:38:42 h221g kernel: LustreError: 22501:0:(osp_precreate.c:721:osp_precreate_reserve()) lustre-OST0000-osc-MDT0000: precreated 32: opd_pre_last_created 67opd_pre_used_id 35 opd_pre_reserved 0
            Nov 12 12:38:42 h221g kernel: LustreError: 22501:0:(osp_precreate.c:780:osp_precreate_get_id()) Incremented opd_pre_used_id for OSP 0: 36
            Nov 12 12:38:42 h221g kernel: LustreError: 22501:0:(osp_object.c:291:osp_object_create()) Written last used ID for OSP 0: 36: 0
            Nov 12 12:38:42 h221g kernel: Lustre: DEBUG MARKER: replay-ost-single test_2: @@@@@@ FAIL: create /mnt/lustre/d0.replay-ost-single/f.replay-ost-single.2-2
            Nov 12 12:38:42 h221g kernel: LustreError: 22605:0:(ldlm_resource.c:1107:ldlm_resource_get()) Skipped 1 previous similar message
            Nov 12 12:38:42 h221g kernel: LustreError: 22605:0:(ldlm_resource.c:1107:ldlm_resource_get()) lvbo_init failed for resource 32: rc -2
            Nov 12 12:38:43 h221g kernel: LustreError: 22747:0:(osp_precreate.c:447:osp_precreate_cleanup_orphans()) lustre-OST0000-osc-MDT0000: Got last_id 67 from OST, last_used is 36, next 67
            Nov 12 12:38:43 h221g kernel: LustreError: 22747:0:(osp_precreate.c:284:osp_precreate_send()) lustre-OST0000-osc-MDT0000: new last_created 100
            

            The file creation in question happened after the OSP had sent the OBD_FL_DELORPHAN request but before the OSP got the reply and updated its counters. I think the file creation should wait until the last created ID was known from the OST.

            Another problem observed during the investigation is that the OSP sends the last used ID plus one in OBD_FL_DELORPHAN requests, but OFD (as well as OBDFilter) expects just the last used ID.

            I'll submit patches to fix these issues.

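            As an illustration of the "file creation waits until orphan cleanup completes" idea described above, here is a minimal userspace sketch of such a gate. It is not the actual change proposed at http://review.whamcloud.com/4590; the names and numbers are illustrative, and the DELORPHAN request/reply is reduced to a comment.

            #include <pthread.h>
            #include <stdbool.h>
            #include <stdio.h>

            /* A gate that holds back ID reservation while orphan cleanup is in flight,
             * plus the last-used counter -- a loose analogue of the OSP pre-create state. */
            static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
            static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
            static bool cleanup_done;              /* set once the DELORPHAN reply has arrived */
            static int  last_used = 34;            /* in-memory last used object ID            */

            /* Analogue of the recovery path: destroy stale pre-created objects, learn the
             * authoritative counters from the OST's reply, then open the gate. */
            static void *cleanup_thread(void *arg)
            {
                (void)arg;
                printf("cleanup: deleting orphans above %d, waiting for the reply\n", last_used);
                /* ... the DELORPHAN request/reply and counter refresh would happen here ... */
                pthread_mutex_lock(&lock);
                cleanup_done = true;
                pthread_cond_broadcast(&cond);
                pthread_mutex_unlock(&lock);
                return NULL;
            }

            /* Analogue of osp_precreate_reserve()/osp_precreate_get_id(): instead of dipping
             * into the stale pre-created pool, block until the cleanup has finished. */
            static int reserve_id(void)
            {
                pthread_mutex_lock(&lock);
                while (!cleanup_done)
                    pthread_cond_wait(&cond, &lock);
                int id = ++last_used;              /* IDs handed out only after counters are valid */
                pthread_mutex_unlock(&lock);
                return id;
            }

            int main(void)
            {
                pthread_t t;
                pthread_create(&t, NULL, cleanup_thread, NULL);
                printf("create: got object ID %d, after orphan cleanup completed\n", reserve_id());
                pthread_join(t, NULL);
                return 0;
            }

            A real fix would also need to cover the second problem noted above (the off-by-one in the ID sent with OBD_FL_DELORPHAN), which this sketch does not model.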

            liwei Li Wei (Inactive) added a comment -

            https://maloo.whamcloud.com/test_sets/a2a98882-2865-11e2-9c12-52540035b04c

            People

              Assignee: liwei Li Wei (Inactive)
              Reporter: maloo Maloo
              Votes: 0
              Watchers: 7
