[LU-9212] Lustre errors on precreation (e -115) Created: 09/Mar/17  Updated: 21/Mar/18  Resolved: 21/Mar/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major
Reporter: cv_eng (Inactive) Assignee: Peter Jones
Resolution: Won't Fix Votes: 0
Labels: None
Environment:

[root@storage06 ~]# uname -r
3.10.0-327.3.1.el7_lustre.x86_64
[root@storage06 ~]# cat /etc/redhat-release
Scientific Linux release 7.2 (Nitrogen)


Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Dear Support,

One of our customers contacted us because of the following errors:
[root@mds01 ~]# dmesg -T | grep -E "opd_pre_status -116|opd_pre_status -115"
[Thu Mar 9 14:50:45 2017] LustreError: 25696:0:(osp_precreate.c:1243:osp_precreate_ready_condition()) wurfs-OST001b-osc-MDT0000: precreate failed opd_pre_status -115
[Thu Mar 9 15:01:44 2017] LustreError: 25753:0:(osp_precreate.c:1243:osp_precreate_ready_condition()) wurfs-OST001b-osc-MDT0000: precreate failed opd_pre_status -115
[Thu Mar 9 15:11:46 2017] LustreError: 12529:0:(osp_precreate.c:1243:osp_precreate_ready_condition()) wurfs-OST001b-osc-MDT0000: precreate failed opd_pre_status -115
[Thu Mar 9 15:22:04 2017] LustreError: 25542:0:(osp_precreate.c:1243:osp_precreate_ready_condition()) wurfs-OST001b-osc-MDT0000: precreate failed opd_pre_status -115
[Thu Mar 9 15:33:14 2017] LustreError: 25248:0:(osp_precreate.c:1243:osp_precreate_ready_condition()) wurfs-OST0021-osc-MDT0000: precreate failed opd_pre_status -116
[Thu Mar 9 15:43:56 2017] LustreError: 25656:0:(osp_precreate.c:1243:osp_precreate_ready_condition()) wurfs-OST001b-osc-MDT0000: precreate failed opd_pre_status -115
[Thu Mar 9 15:54:21 2017] LustreError: 25645:0:(osp_precreate.c:1243:osp_precreate_ready_condition()) wurfs-OST001c-osc-MDT0000: precreate failed opd_pre_status -115

I have started by focusing on one of the OSTs reporting the error: OST001b.
It is currently mounted on storage06:
[9752632.984562] Lustre: wurfs-OST001c-os: trigger OI scrub by RPC for [0x1001c0000:0x1ad9d9d:0x0], rc = 0 [1]
[9752632.984569] Lustre: Skipped 127 previous similar messages
[9752877.938323] LustreError: 23938:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST0021: unable to precreate: rc = -116
[9752877.938334] LustreError: 23938:0:(ofd_dev.c:1781:ofd_create_hdl()) Skipped 266 previous similar messages
[9752884.305673] mpt2sas1: log_info(0x31120302): originator(PL), code(0x12), sub_code(0x0302)
[9753043.238296] LustreError: 42247:0:(osd_compat.c:593:osd_obj_update_entry()) wurfs-OST001c-osd: the FID [0x580000400:0x2d2c:0x0] is used by two objects: 524529/2876391915 1582/813719656
[9753043.238307] LustreError: 42247:0:(osd_compat.c:593:osd_obj_update_entry()) Skipped 3455 previous similar messages
[9753235.723470] Lustre: wurfs-OST001b-os: trigger OI scrub by RPC for [0x1001b0000:0x19a5c22:0x0], rc = 0 [1]
[9753235.723478] Lustre: Skipped 130 previous similar messages
[9753429.709498] mpt2sas1: log_info(0x31120302): originator(PL), code(0x12), sub_code(0x0302)
[9753478.487827] LustreError: 42848:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001c: unable to precreate: rc = -115
[9753478.487835] LustreError: 42848:0:(ofd_dev.c:1781:ofd_create_hdl()) Skipped 263 previous similar messages
[9753648.598453] LustreError: 4652:0:(osd_compat.c:593:osd_obj_update_entry()) wurfs-OST001c-osd: the FID [0x580000400:0x2d2c:0x0] is used by two objects: 524529/2876391915 1582/813719656
[9753648.598462] LustreError: 4652:0:(osd_compat.c:593:osd_obj_update_entry()) Skipped 3509 previous similar messages

I can see these messages repeated many times.
[root@storage06 ~]# mount | grep mapper
/dev/mapper/ost001b on /lustre/ost1b type lustre (ro)
/dev/mapper/ost001c on /lustre/ost1c type lustre (ro)
/dev/mapper/ost001d on /lustre/ost1d type lustre (ro)
/dev/mapper/ost0021 on /lustre/ost21 type lustre (ro)
/dev/mapper/ost0022 on /lustre/ost22 type lustre (ro)
/dev/mapper/ost0023 on /lustre/ost23 type lustre (ro)

The customer has run lctl lfsck_start -M wurfs-OST001b, but it didn't help.
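
(Worth double-checking whether that LFSCK run actually completed before moving on; a sketch of the status queries, assuming the standard parameter names:)

[root@storage06 ~]# lctl get_param -n obdfilter.wurfs-OST001b.lfsck_layout   # on the OSS
[root@mds01 ~]# lctl get_param -n mdd.wurfs-MDT0000.lfsck_layout             # on the MDS

Both should report "status: completed" once the scan has finished.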

I suppose that the next step is:
debugfs -c -R "ncheck 524529 1582" /dev/mapper/ost001b
(I wonder if it can be done while the fs is mounted.)
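
(A hedged aside: the "is used by two objects" messages above were reported by wurfs-OST001c-osd, and the inode numbers 524529 and 1582 are the inode halves of its inode/generation pairs, so the ncheck may belong on /dev/mapper/ost001c rather than ost001b. Also, per the debugfs man page, -c (catastrophic mode) forces the filesystem open read-only, so running it against a mounted device should be safe, if potentially stale:)

[root@storage06 ~]# debugfs -c -R "ncheck 524529 1582" /dev/mapper/ost001c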



 Comments   
Comment by cv_eng (Inactive) [ 09/Mar/17 ]

I pressed enter by mistake.

However, I think the next step would be:
debugfs -c -R "stat <inode from previous command>" /dev/mapper/ost001b | grep fid
so we can understand which files it is complaining about.

Then, mount the OST in ldiskfs mode and remove the files?
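
(A minimal sketch of that, assuming the OST is first unmounted from Lustre and that /mnt/ost001b exists; on ldiskfs OSTs the object files live under O/<seq>/d(<objid> mod 32)/<objid>, so locate and verify the objects before removing anything:)

[root@storage06 ~]# umount /lustre/ost1b
[root@storage06 ~]# mount -t ldiskfs /dev/mapper/ost001b /mnt/ost001b
[root@storage06 ~]# ls -l /mnt/ost001b/O/0/d2/   # hypothetical path: d-subdir = objid mod 32
[root@storage06 ~]# umount /mnt/ost001b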
Unfortunately, it seems the customer only noticed the problem because new files were all being written to the OSTs without errors, so looking at the logs does not seem to help.

(We see this error on multiple OSTs.)

Comment by Joseph Gmitter (Inactive) [ 09/Mar/17 ]

Hi cv_eng,

I see the version was selected as 2.8.0. Was that due to filing it originally as an LU ticket (moved over to your CV project now) or is the site really using 2.8.0? Also, can you tell us what site this is supporting?

Thanks.
Joe

Comment by cv_eng (Inactive) [ 09/Mar/17 ]

Hello Joe,

The customer is currently using Lustre 2.8.0 (no IEEL).
The customer I am talking about is Universiteit van Wageningen.

I am not sure I have opened the ticket in the right place.

Kind regards,
Andrea

Comment by Brad Hoagland (Inactive) [ 10/Mar/17 ]

Hi cv_eng,

Could you provide the Confirmation of Service (COS) number for this site?

Thank you,

Brad

Comment by cv_eng (Inactive) [ 12/Mar/17 ]

Dear Brad,

I am not able to get it right now; I will try on Monday.
Is there anything else we can do in the meantime?

Kind regards,

Comment by cv_eng (Inactive) [ 13/Mar/17 ]

Dear Brad, all,

Please note that this is the community version of Lustre and there is no support contract paid.

Kind regards,

Comment by cv_eng (Inactive) [ 13/Mar/17 ]

Dear all,

Keeping you posted.
We have taken the OSS hosting this OST offline and checked the filesystem.
There were no errors on the OSS.

I get an error on the MDS when I try to create a file on that OST:
[root@node009 brokenosts]# touch test2
touch: cannot touch ‘test2’: No space left on device

On the mds:
Mar 13 16:51:05 mds01 kernel: LustreError: 25716:0:(osp_precreate.c:1243:osp_precreate_ready_condition()) wurfs-OST001b-osc-MDT0000: precreate failed opd_pre_status -115
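
(One way to confirm the failure follows the OST rather than actual free space is to pin a test file to it by stripe index; a sketch from a client, with index 27 = 0x1b and the directory assumed:)

[root@node009 brokenosts]# lfs setstripe -i 27 -c 1 test-ost001b   # expected to fail the same way
[root@node009 brokenosts]# lfs df | grep OST001b                   # compare reported free space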

Shall we schedule a downtime and run fsck on the MDT volume?

Kind regards,

Comment by Peter Jones [ 15/Mar/17 ]

You may find that you get more prompt community support by directing such questions to the community mailing list. More people subscribe to those than review tickets in the issue tracking system.

Comment by cv_eng (Inactive) [ 16/Mar/17 ]

Dear all,

Any reply would be appreciated.
The current situation is as follows.
We have stopped Pacemaker on storage06 (the node on which that resource is running):
[root@storage06 log]# pcs status | grep 1b
storage-ost001b (ocf::heartbeat:Filesystem): Started storage06.failover.cluster

  * storage-ost001b_monitor_120000 on storage06.failover.cluster 'not running' (7): call=295, status=complete, exitreason='none'

We then ran e2fsck -n /dev/mapper/ost001b; it reported nothing to repair.

Today, I noticed that there are still errors and we can't create files on this OST:
[Mon Mar 13 18:36:44 2017] LustreError: 42126:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115
[Mon Mar 13 18:46:44 2017] LustreError: 35949:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115
[Mon Mar 13 18:56:44 2017] LustreError: 26996:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115
[Mon Mar 13 19:06:45 2017] LustreError: 26989:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115
[Tue Mar 14 03:37:13 2017] LustreError: 26995:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115
[Tue Mar 14 03:47:13 2017] LustreError: 44782:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115
[Tue Mar 14 03:57:14 2017] LustreError: 35964:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115
[Tue Mar 14 04:07:14 2017] LustreError: 35964:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115
[Tue Mar 14 04:17:14 2017] LustreError: 26994:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115
[Tue Mar 14 04:27:15 2017] LustreError: 27006:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115
[Tue Mar 14 04:37:15 2017] LustreError: 27006:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115
[Tue Mar 14 04:47:15 2017] LustreError: 35964:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115
[Tue Mar 14 07:07:30 2017] LustreError: 35960:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115

Looking at /usr/include/asm-generic/errno.h, those error codes map to:
#define EINPROGRESS 115 /* Operation now in progress */
#define ESTALE 116 /* Stale file handle */

(On some other OSTs we see error -116 as well.)
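
(The mapping can be confirmed without grepping headers, e.g. with the stock python on SL7:)

[root@mds01 ~]# python -c 'import errno, os; print errno.errorcode[115], os.strerror(115)'
EINPROGRESS Operation now in progress
[root@mds01 ~]# python -c 'import errno, os; print errno.errorcode[116], os.strerror(116)'
ESTALE Stale file handle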

I will also try the mailing list.

Kind regards,

Comment by cv_eng (Inactive) [ 16/Mar/17 ]

Dear Peter,

UPDATE:
I have noticed that on this OST (wurfs-OST001b) the OI scrub gets launched every ~7 seconds:
[root@storage06 wurfs-OST001b]# cat oi_scrub
name: OI_scrub
magic: 0x4c5fd252
oi_files: 64
status: completed
flags:
param:
time_since_last_completed: 8 seconds
time_since_latest_start: 8 seconds
time_since_last_checkpoint: 8 seconds
latest_start_position: 12
last_checkpoint_position: 30515713
first_failure_position: N/A
checked: 3417
updated: 0
failed: 0
prior_updated: 0
noscrub: 0
igif: 1
success_count: 2526979
run_time: 0 seconds
average_speed: 3417 objects/sec
real-time_speed: N/A
current_position: N/A
lf_scanned: 0
lf_repaired: 0
lf_failed: 0
[root@storage06 wurfs-OST001b]# cat oi_scrub
name: OI_scrub
magic: 0x4c5fd252
oi_files: 64
status: completed
flags:
param:
time_since_last_completed: 2 seconds
time_since_latest_start: 2 seconds
time_since_last_checkpoint: 2 seconds
latest_start_position: 12
last_checkpoint_position: 30515713
first_failure_position: N/A
checked: 3417
updated: 0
failed: 0
prior_updated: 0
noscrub: 0
igif: 1
success_count: 2526980
run_time: 0 seconds
average_speed: 3417 objects/sec
real-time_speed: N/A
current_position: N/A
lf_scanned: 0
lf_repaired: 0
lf_failed: 0
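
(The same counters are available through lctl, which makes it easy to watch success_count tick up each time the scrub is re-triggered; a sketch, assuming the ldiskfs OSD parameter name:)

[root@storage06 ~]# lctl get_param -n osd-ldiskfs.wurfs-OST001b.oi_scrub | grep -E 'status|success_count|time_since_latest_start'
[root@storage06 ~]# watch -n 2 "lctl get_param -n osd-ldiskfs.wurfs-OST001b.oi_scrub | grep success_count"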

Dumping the Lustre debug log from the ring buffer, I see:
00080000:02000400:24.0:1489665812.888068:0:35949:0:(osd_handler.c:860:osd_fid_lookup()) wurfs-OST001b-os: trigger OI scrub by RPC for [0x1001b0000:0x19a5c22:0x0], rc = 0 [1]
00002000:00020000:24.0:1489665812.888083:0:35949:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115
00100000:10000000:27.0:1489665812.923388:0:40057:0:(osd_scrub.c:758:osd_scrub_post()) wurfs-OST001b: OI scrub post, result = 1
00100000:10000000:27.0:1489665812.923400:0:40057:0:(osd_scrub.c:1520:osd_scrub_main()) wurfs-OST001b: OI scrub: stop, pos = 30515713: rc = 1
00002000:00080000:24.0:1489665822.903706:0:35949:0:(ofd_dev.c:1747:ofd_create_hdl()) wurfs-OST001b: reserve 64 objects in group 0x0 at 26893346
00100000:10000000:27.0:1489665822.903984:0:40212:0:(osd_scrub.c:660:osd_scrub_prep()) wurfs-OST001b: OI scrub prep, flags = 0x4e
00100000:10000000:27.0:1489665822.903992:0:40212:0:(osd_scrub.c:278:osd_scrub_file_reset()) wurfs-OST001b: reset OI scrub file, old flags = 0x0, add flags = 0x0
00100000:10000000:27.0:1489665822.904016:0:40212:0:(osd_scrub.c:1510:osd_scrub_main()) wurfs-OST001b: OI scrub start, flags = 0x4e, pos = 12
00080000:02000400:24.0:1489665822.904062:0:35949:0:(osd_handler.c:860:osd_fid_lookup()) wurfs-OST001b-os: trigger OI scrub by RPC for [0x1001b0000:0x19a5c22:0x0], rc = 0 [1]
00002000:00020000:24.0:1489665822.904079:0:35949:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115
00100000:10000000:27.0:1489665822.940373:0:40212:0:(osd_scrub.c:758:osd_scrub_post()) wurfs-OST001b: OI scrub post, result = 1
00100000:10000000:27.0:1489665822.940385:0:40212:0:(osd_scrub.c:1520:osd_scrub_main()) wurfs-OST001b: OI scrub: stop, pos = 30515713: rc = 1
00002000:00080000:8.0:1489665832.919771:0:10464:0:(ofd_dev.c:1747:ofd_create_hdl()) wurfs-OST001b: reserve 64 objects in group 0x0 at 26893346
00100000:10000000:20.0:1489665832.920031:0:40406:0:(osd_scrub.c:660:osd_scrub_prep()) wurfs-OST001b: OI scrub prep, flags = 0x4e
00100000:10000000:20.0:1489665832.920037:0:40406:0:(osd_scrub.c:278:osd_scrub_file_reset()) wurfs-OST001b: reset OI scrub file, old flags = 0x0, add flags = 0x0
00100000:10000000:20.0:1489665832.920057:0:40406:0:(osd_scrub.c:1510:osd_scrub_main()) wurfs-OST001b: OI scrub start, flags = 0x4e, pos = 12
00080000:02000400:8.0:1489665832.920094:0:10464:0:(osd_handler.c:860:osd_fid_lookup()) wurfs-OST001b-os: trigger OI scrub by RPC for [0x1001b0000:0x19a5c22:0x0], rc = 0 [1]
00002000:00020000:8.0:1489665832.920113:0:10464:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115
00100000:10000000:20.0:1489665832.955088:0:40406:0:(osd_scrub.c:758:osd_scrub_post()) wurfs-OST001b: OI scrub post, result = 1
00100000:10000000:20.0:1489665832.955101:0:40406:0:(osd_scrub.c:1520:osd_scrub_main()) wurfs-OST001b: OI scrub: stop, pos = 30515713: rc = 1
00002000:00080000:30.0:1489665842.935720:0:35960:0:(ofd_dev.c:1747:ofd_create_hdl()) wurfs-OST001b: reserve 64 objects in group 0x0 at 26893346
00100000:10000000:27.0:1489665842.936008:0:40553:0:(osd_scrub.c:660:osd_scrub_prep()) wurfs-OST001b: OI scrub prep, flags = 0x4e
00100000:10000000:27.0:1489665842.936015:0:40553:0:(osd_scrub.c:278:osd_scrub_file_reset()) wurfs-OST001b: reset OI scrub file, old flags = 0x0, add flags = 0x0
00100000:10000000:27.0:1489665842.936038:0:40553:0:(osd_scrub.c:1510:osd_scrub_main()) wurfs-OST001b: OI scrub start, flags = 0x4e, pos = 12
00080000:02000400:30.0:1489665842.936081:0:35960:0:(osd_handler.c:860:osd_fid_lookup()) wurfs-OST001b-os: trigger OI scrub by RPC for [0x1001b0000:0x19a5c22:0x0], rc = 0 [1]
00002000:00020000:30.0:1489665842.936096:0:35960:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115
00100000:10000000:27.0:1489665842.972129:0:40553:0:(osd_scrub.c:758:osd_scrub_post()) wurfs-OST001b: OI scrub post, result = 1
00100000:10000000:27.0:1489665842.972141:0:40553:0:(osd_scrub.c:1520:osd_scrub_main()) wurfs-OST001b: OI scrub: stop, pos = 30515713: rc = 1
00002000:00080000:10.0:1489665852.951770:0:35949:0:(ofd_dev.c:1747:ofd_create_hdl()) wurfs-OST001b: reserve 64 objects in group 0x0 at 26893346
00100000:10000000:18.0:1489665852.951986:0:40838:0:(osd_scrub.c:660:osd_scrub_prep()) wurfs-OST001b: OI scrub prep, flags = 0x4e
00100000:10000000:18.0:1489665852.951992:0:40838:0:(osd_scrub.c:278:osd_scrub_file_reset()) wurfs-OST001b: reset OI scrub file, old flags = 0x0, add flags = 0x0
00100000:10000000:18.0:1489665852.952017:0:40838:0:(osd_scrub.c:1510:osd_scrub_main()) wurfs-OST001b: OI scrub start, flags = 0x4e, pos = 12
00080000:02000400:10.0:1489665852.952060:0:35949:0:(osd_handler.c:860:osd_fid_lookup()) wurfs-OST001b-os: trigger OI scrub by RPC for [0x1001b0000:0x19a5c22:0x0], rc = 0 [1]
00002000:00020000:10.0:1489665852.952089:0:35949:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115
00100000:10000000:18.0:1489665852.987792:0:40838:0:(osd_scrub.c:758:osd_scrub_post()) wurfs-OST001b: OI scrub post, result = 1
00100000:10000000:18.0:1489665852.987804:0:40838:0:(osd_scrub.c:1520:osd_scrub_main()) wurfs-OST001b: OI scrub: stop, pos = 30515713: rc = 1
00002000:00080000:8.0:1489665862.967664:0:35949:0:(ofd_dev.c:1747:ofd_create_hdl()) wurfs-OST001b: reserve 64 objects in group 0x0 at 26893346
00100000:10000000:27.0:1489665862.967948:0:41207:0:(osd_scrub.c:660:osd_scrub_prep()) wurfs-OST001b: OI scrub prep, flags = 0x4e
00100000:10000000:27.0:1489665862.967955:0:41207:0:(osd_scrub.c:278:osd_scrub_file_reset()) wurfs-OST001b: reset OI scrub file, old flags = 0x0, add flags = 0x0
00100000:10000000:27.0:1489665862.967982:0:41207:0:(osd_scrub.c:1510:osd_scrub_main()) wurfs-OST001b: OI scrub start, flags = 0x4e, pos = 12
00080000:02000400:8.0:1489665862.968024:0:35949:0:(osd_handler.c:860:osd_fid_lookup()) wurfs-OST001b-os: trigger OI scrub by RPC for [0x1001b0000:0x19a5c22:0x0], rc = 0 [1]
00002000:00020000:8.0:1489665862.968040:0:35949:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115
00100000:10000000:27.0:1489665863.004087:0:41207:0:(osd_scrub.c:758:osd_scrub_post()) wurfs-OST001b: OI scrub post, result = 1
00100000:10000000:27.0:1489665863.004098:0:41207:0:(osd_scrub.c:1520:osd_scrub_main()) wurfs-OST001b: OI scrub: stop, pos = 30515713: rc = 1
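
(For reference, a dump like the above can be captured roughly as follows; a sketch:)

[root@storage06 ~]# lctl set_param debug=-1     # enable all debug masks (very verbose)
[root@storage06 ~]# lctl clear                  # empty the debug ring buffer
(wait ~30s so a few precreate attempts land)
[root@storage06 ~]# lctl dk /tmp/ost001b-debug.txt
[root@storage06 ~]# grep -E 'OI scrub|precreate' /tmp/ost001b-debug.txt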

I tried to see where that FID leads, but it seems the file doesn't actually exist.
(The customer has moved everything away from these OSTs.)
[root@nfs01 ~]# lfs fid2path wurfs "[0x1001b0000:0x19a5c22:0x0]"
ioctl err -22: Invalid argument (22)
fid2path: error on FID [0x1001b0000:0x19a5c22:0x0]: Invalid argument
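
(A hedged observation: seq 0x1001b0000 is in the IDIF range and encodes OST index 0x1b, i.e. this looks like an OST object FID rather than an MDT file FID, which would explain the EINVAL from fid2path. If the backing object exists, its owning MDT FID can usually be read from the object's filter_fid xattr with the OST mounted as ldiskfs; a sketch, with 0x19a5c22 = 26893346 and d-subdir = 26893346 mod 32 = 2:)

[root@storage06 ~]# ll_decode_filter_fid /mnt/ost001b/O/0/d2/26893346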

Any ideas?

Comment by James Casper [ 30/Mar/17 ]

Saw something similar on master b3541:

https://testing.hpdd.intel.com/test_sets/7d9a7bb2-0a12-11e7-8c9f-5254006e85c2
LU-9277: (ofd_dev.c:1776:ofd_create_hdl()) lustre-OST0002: unable to precreate: rc = -30
LU-9212: (ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115

Comment by Peter Jones [ 21/Mar/18 ]

Unsupported site running older version
