|
I pressed Enter by mistake.
I think the next step would be:
debugfs -c -R "stat <result from previous command>" /dev/mapper/ost001b | grep fid
so that we can identify which files it is complaining about.
Then mount the OST as ldiskfs and remove the files?
Unfortunately, it seems the customer noticed the issue only because files were being written to the OSTs that have no errors, so looking at the logs doesn't seem to help.
(We have this error for multiple OSTs)
|
|
Dear all,
Any reply would be appreciated.
The current situation is the following.
We stopped Pacemaker on storage06 (the node that has that resource running):
[root@storage06 log]# pcs status | grep 1b
storage-ost001b (ocf::heartbeat:Filesystem): Started storage06.failover.cluster
- storage-ost001b_monitor_120000 on storage06.failover.cluster 'not running' (7): call=295, status=complete, exitreason='none'
We then ran e2fsck -n /dev/mapper/ost001b.
e2fsck reported nothing to repair.
Today I noticed that there are still errors and we cannot create files on this OST:
[Mon Mar 13 18:36:44 2017] LustreError: 42126:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115
[Mon Mar 13 18:46:44 2017] LustreError: 35949:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115
[Mon Mar 13 18:56:44 2017] LustreError: 26996:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115
[Mon Mar 13 19:06:45 2017] LustreError: 26989:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115
[Tue Mar 14 03:37:13 2017] LustreError: 26995:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115
[Tue Mar 14 03:47:13 2017] LustreError: 44782:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115
[Tue Mar 14 03:57:14 2017] LustreError: 35964:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115
[Tue Mar 14 04:07:14 2017] LustreError: 35964:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115
[Tue Mar 14 04:17:14 2017] LustreError: 26994:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115
[Tue Mar 14 04:27:15 2017] LustreError: 27006:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115
[Tue Mar 14 04:37:15 2017] LustreError: 27006:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115
[Tue Mar 14 04:47:15 2017] LustreError: 35964:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115
[Tue Mar 14 07:07:30 2017] LustreError: 35960:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115
Looking at /usr/include/asm-generic/errno.h, the error seems to refer to:
#define EINPROGRESS 115 /* Operation now in progress */
#define ESTALE 116 /* Stale file handle */
(On some other OSTs we also see error 116.)
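For reference, the same lookup can be done without opening the header. This is only a minimal sketch: the helper name decode_rc is made up for illustration, and it assumes it runs on a Linux host, where Python's errno table matches the kernel's numbering.

```python
import errno
import os
import re

# Matches the "rc = -115" part of a Lustre console or debug log line.
RC_PATTERN = re.compile(r"rc = (-?\d+)")


def decode_rc(line):
    """Return (rc, symbolic name, description) for a log line, or None."""
    match = RC_PATTERN.search(line)
    if match is None:
        return None
    rc = int(match.group(1))
    code = abs(rc)  # the kernel reports errors as negative errno values
    # errno.errorcode is platform specific; on Linux 115 maps to EINPROGRESS.
    name = errno.errorcode.get(code, "UNKNOWN")
    return rc, name, os.strerror(code)


sample = ("[Mon Mar 13 18:36:44 2017] LustreError: 42126:0:"
          "(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: "
          "unable to precreate: rc = -115")
print(decode_rc(sample))
```

Feeding it the lines from the other OSTs should give the matching name for 116 in the same way.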
I will also try the mailing list.
Kind regards,
|
|
Dear Peter,
UPDATE:
I have noticed that on this OST (wurfs-OST001b) the OI scrub is launched every ~7 seconds:
[root@storage06 wurfs-OST001b]# cat oi_scrub
name: OI_scrub
magic: 0x4c5fd252
oi_files: 64
status: completed
flags:
param:
time_since_last_completed: 8 seconds
time_since_latest_start: 8 seconds
time_since_last_checkpoint: 8 seconds
latest_start_position: 12
last_checkpoint_position: 30515713
first_failure_position: N/A
checked: 3417
updated: 0
failed: 0
prior_updated: 0
noscrub: 0
igif: 1
success_count: 2526979
run_time: 0 seconds
average_speed: 3417 objects/sec
real-time_speed: N/A
current_position: N/A
lf_scanned: 0
lf_repaired: 0
lf_failed: 0
[root@storage06 wurfs-OST001b]# cat oi_scrub
name: OI_scrub
magic: 0x4c5fd252
oi_files: 64
status: completed
flags:
param:
time_since_last_completed: 2 seconds
time_since_latest_start: 2 seconds
time_since_last_checkpoint: 2 seconds
latest_start_position: 12
last_checkpoint_position: 30515713
first_failure_position: N/A
checked: 3417
updated: 0
failed: 0
prior_updated: 0
noscrub: 0
igif: 1
success_count: 2526980
run_time: 0 seconds
average_speed: 3417 objects/sec
real-time_speed: N/A
current_position: N/A
lf_scanned: 0
lf_repaired: 0
lf_failed: 0
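To compare the two dumps without reading them by eye, something like the following could be used. It is a rough sketch: parse_oi_scrub and completed_between are hypothetical helper names, and the input is only a subset of the fields pasted above. The point it illustrates is that success_count goes up by one between the two reads while checked and updated stay the same, i.e. a scrub run completed in between without repairing anything.

```python
def parse_oi_scrub(text):
    """Parse the 'key: value' lines of an oi_scrub dump into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that have no colon at all
            fields[key.strip()] = value.strip()
    return fields


def completed_between(first, second):
    """Number of scrub runs that completed between two dumps."""
    return int(second["success_count"]) - int(first["success_count"])


# Subset of the two dumps shown above.
first = parse_oi_scrub("""\
status: completed
time_since_latest_start: 8 seconds
checked: 3417
updated: 0
success_count: 2526979
""")
second = parse_oi_scrub("""\
status: completed
time_since_latest_start: 2 seconds
checked: 3417
updated: 0
success_count: 2526980
""")

print(completed_between(first, second))      # 1
print(second["checked"], second["updated"])  # 3417 0
```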
Dumping the logs from the ring buffer, I see:
00080000:02000400:24.0:1489665812.888068:0:35949:0:(osd_handler.c:860:osd_fid_lookup()) wurfs-OST001b-os: trigger OI scrub by RPC for [0x1001b0000:0x19a5c22:0x0], rc = 0 [1]
00002000:00020000:24.0:1489665812.888083:0:35949:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115
00100000:10000000:27.0:1489665812.923388:0:40057:0:(osd_scrub.c:758:osd_scrub_post()) wurfs-OST001b: OI scrub post, result = 1
00100000:10000000:27.0:1489665812.923400:0:40057:0:(osd_scrub.c:1520:osd_scrub_main()) wurfs-OST001b: OI scrub: stop, pos = 30515713: rc = 1
00002000:00080000:24.0:1489665822.903706:0:35949:0:(ofd_dev.c:1747:ofd_create_hdl()) wurfs-OST001b: reserve 64 objects in group 0x0 at 26893346
00100000:10000000:27.0:1489665822.903984:0:40212:0:(osd_scrub.c:660:osd_scrub_prep()) wurfs-OST001b: OI scrub prep, flags = 0x4e
00100000:10000000:27.0:1489665822.903992:0:40212:0:(osd_scrub.c:278:osd_scrub_file_reset()) wurfs-OST001b: reset OI scrub file, old flags = 0x0, add flags = 0x0
00100000:10000000:27.0:1489665822.904016:0:40212:0:(osd_scrub.c:1510:osd_scrub_main()) wurfs-OST001b: OI scrub start, flags = 0x4e, pos = 12
00080000:02000400:24.0:1489665822.904062:0:35949:0:(osd_handler.c:860:osd_fid_lookup()) wurfs-OST001b-os: trigger OI scrub by RPC for [0x1001b0000:0x19a5c22:0x0], rc = 0 [1]
00002000:00020000:24.0:1489665822.904079:0:35949:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115
00100000:10000000:27.0:1489665822.940373:0:40212:0:(osd_scrub.c:758:osd_scrub_post()) wurfs-OST001b: OI scrub post, result = 1
00100000:10000000:27.0:1489665822.940385:0:40212:0:(osd_scrub.c:1520:osd_scrub_main()) wurfs-OST001b: OI scrub: stop, pos = 30515713: rc = 1
00002000:00080000:8.0:1489665832.919771:0:10464:0:(ofd_dev.c:1747:ofd_create_hdl()) wurfs-OST001b: reserve 64 objects in group 0x0 at 26893346
00100000:10000000:20.0:1489665832.920031:0:40406:0:(osd_scrub.c:660:osd_scrub_prep()) wurfs-OST001b: OI scrub prep, flags = 0x4e
00100000:10000000:20.0:1489665832.920037:0:40406:0:(osd_scrub.c:278:osd_scrub_file_reset()) wurfs-OST001b: reset OI scrub file, old flags = 0x0, add flags = 0x0
00100000:10000000:20.0:1489665832.920057:0:40406:0:(osd_scrub.c:1510:osd_scrub_main()) wurfs-OST001b: OI scrub start, flags = 0x4e, pos = 12
00080000:02000400:8.0:1489665832.920094:0:10464:0:(osd_handler.c:860:osd_fid_lookup()) wurfs-OST001b-os: trigger OI scrub by RPC for [0x1001b0000:0x19a5c22:0x0], rc = 0 [1]
00002000:00020000:8.0:1489665832.920113:0:10464:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115
00100000:10000000:20.0:1489665832.955088:0:40406:0:(osd_scrub.c:758:osd_scrub_post()) wurfs-OST001b: OI scrub post, result = 1
00100000:10000000:20.0:1489665832.955101:0:40406:0:(osd_scrub.c:1520:osd_scrub_main()) wurfs-OST001b: OI scrub: stop, pos = 30515713: rc = 1
00002000:00080000:30.0:1489665842.935720:0:35960:0:(ofd_dev.c:1747:ofd_create_hdl()) wurfs-OST001b: reserve 64 objects in group 0x0 at 26893346
00100000:10000000:27.0:1489665842.936008:0:40553:0:(osd_scrub.c:660:osd_scrub_prep()) wurfs-OST001b: OI scrub prep, flags = 0x4e
00100000:10000000:27.0:1489665842.936015:0:40553:0:(osd_scrub.c:278:osd_scrub_file_reset()) wurfs-OST001b: reset OI scrub file, old flags = 0x0, add flags = 0x0
00100000:10000000:27.0:1489665842.936038:0:40553:0:(osd_scrub.c:1510:osd_scrub_main()) wurfs-OST001b: OI scrub start, flags = 0x4e, pos = 12
00080000:02000400:30.0:1489665842.936081:0:35960:0:(osd_handler.c:860:osd_fid_lookup()) wurfs-OST001b-os: trigger OI scrub by RPC for [0x1001b0000:0x19a5c22:0x0], rc = 0 [1]
00002000:00020000:30.0:1489665842.936096:0:35960:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115
00100000:10000000:27.0:1489665842.972129:0:40553:0:(osd_scrub.c:758:osd_scrub_post()) wurfs-OST001b: OI scrub post, result = 1
00100000:10000000:27.0:1489665842.972141:0:40553:0:(osd_scrub.c:1520:osd_scrub_main()) wurfs-OST001b: OI scrub: stop, pos = 30515713: rc = 1
00002000:00080000:10.0:1489665852.951770:0:35949:0:(ofd_dev.c:1747:ofd_create_hdl()) wurfs-OST001b: reserve 64 objects in group 0x0 at 26893346
00100000:10000000:18.0:1489665852.951986:0:40838:0:(osd_scrub.c:660:osd_scrub_prep()) wurfs-OST001b: OI scrub prep, flags = 0x4e
00100000:10000000:18.0:1489665852.951992:0:40838:0:(osd_scrub.c:278:osd_scrub_file_reset()) wurfs-OST001b: reset OI scrub file, old flags = 0x0, add flags = 0x0
00100000:10000000:18.0:1489665852.952017:0:40838:0:(osd_scrub.c:1510:osd_scrub_main()) wurfs-OST001b: OI scrub start, flags = 0x4e, pos = 12
00080000:02000400:10.0:1489665852.952060:0:35949:0:(osd_handler.c:860:osd_fid_lookup()) wurfs-OST001b-os: trigger OI scrub by RPC for [0x1001b0000:0x19a5c22:0x0], rc = 0 [1]
00002000:00020000:10.0:1489665852.952089:0:35949:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115
00100000:10000000:18.0:1489665852.987792:0:40838:0:(osd_scrub.c:758:osd_scrub_post()) wurfs-OST001b: OI scrub post, result = 1
00100000:10000000:18.0:1489665852.987804:0:40838:0:(osd_scrub.c:1520:osd_scrub_main()) wurfs-OST001b: OI scrub: stop, pos = 30515713: rc = 1
00002000:00080000:8.0:1489665862.967664:0:35949:0:(ofd_dev.c:1747:ofd_create_hdl()) wurfs-OST001b: reserve 64 objects in group 0x0 at 26893346
00100000:10000000:27.0:1489665862.967948:0:41207:0:(osd_scrub.c:660:osd_scrub_prep()) wurfs-OST001b: OI scrub prep, flags = 0x4e
00100000:10000000:27.0:1489665862.967955:0:41207:0:(osd_scrub.c:278:osd_scrub_file_reset()) wurfs-OST001b: reset OI scrub file, old flags = 0x0, add flags = 0x0
00100000:10000000:27.0:1489665862.967982:0:41207:0:(osd_scrub.c:1510:osd_scrub_main()) wurfs-OST001b: OI scrub start, flags = 0x4e, pos = 12
00080000:02000400:8.0:1489665862.968024:0:35949:0:(osd_handler.c:860:osd_fid_lookup()) wurfs-OST001b-os: trigger OI scrub by RPC for [0x1001b0000:0x19a5c22:0x0], rc = 0 [1]
00002000:00020000:8.0:1489665862.968040:0:35949:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115
00100000:10000000:27.0:1489665863.004087:0:41207:0:(osd_scrub.c:758:osd_scrub_post()) wurfs-OST001b: OI scrub post, result = 1
00100000:10000000:27.0:1489665863.004098:0:41207:0:(osd_scrub.c:1520:osd_scrub_main()) wurfs-OST001b: OI scrub: stop, pos = 30515713: rc = 1
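The restart interval can be read directly from the "OI scrub start" lines. The sketch below assumes the colon-separated layout seen in the dump above, with the epoch timestamp in the fourth field; the function names are made up for illustration. On the three start lines used here the runs are about 10 seconds apart.

```python
def scrub_start_times(lines):
    """Collect the epoch timestamps of 'OI scrub start' debug log lines."""
    times = []
    for line in lines:
        if "OI scrub start" not in line:
            continue
        # Layout assumed from the dump: subsys:mask:cpu:timestamp:...
        fields = line.split(":", 4)
        times.append(float(fields[3]))
    return times


def intervals(times):
    """Seconds between consecutive scrub starts, rounded to milliseconds."""
    return [round(later - earlier, 3) for earlier, later in zip(times, times[1:])]


log = [
    "00100000:10000000:27.0:1489665822.904016:0:40212:0:(osd_scrub.c:1510:osd_scrub_main()) wurfs-OST001b: OI scrub start, flags = 0x4e, pos = 12",
    "00002000:00020000:24.0:1489665822.904079:0:35949:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115",
    "00100000:10000000:20.0:1489665832.920057:0:40406:0:(osd_scrub.c:1510:osd_scrub_main()) wurfs-OST001b: OI scrub start, flags = 0x4e, pos = 12",
    "00100000:10000000:27.0:1489665842.936038:0:40553:0:(osd_scrub.c:1510:osd_scrub_main()) wurfs-OST001b: OI scrub start, flags = 0x4e, pos = 12",
]

print(intervals(scrub_start_times(log)))
```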
I tried to see where that FID leads, but it seems the file doesn't actually exist
(the customer has moved everything off this OST):
[root@nfs01 ~]# lfs fid2path wurfs "[0x1001b0000:0x19a5c22:0x0]"
ioctl err -22: Invalid argument (22)
fid2path: error on FID [0x1001b0000:0x19a5c22:0x0]: Invalid argument
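It may help to split the FID into its parts. The sketch below rests on an assumption about the FID layout that should be checked against the Lustre sources: sequences from 0x100000000 to 0x1ffffffff are IDIF sequences for OST objects, with the OST index in bits 16-31 of the sequence and the upper bits of the object ID in bits 0-15. Under that assumption the FID decodes to OST index 27 (0x1b) and object ID 26893346, which is the same number as in the "reserve 64 objects in group 0x0 at 26893346" lines above. If it really is an OST object FID rather than an MDT file FID, that might also be why lfs fid2path rejects it.

```python
# Assumed bounds of the IDIF sequence range (verify against the Lustre headers).
FID_SEQ_IDIF = 0x100000000
FID_SEQ_IDIF_MAX = 0x1FFFFFFFF


def parse_fid(text):
    """Split '[seq:oid:ver]' into three integers."""
    seq, oid, ver = (int(part, 16) for part in text.strip("[]").split(":"))
    return seq, oid, ver


def describe_fid(text):
    """Decode a FID and, if it falls in the IDIF range, the OST object it names."""
    seq, oid, ver = parse_fid(text)
    info = {"seq": seq, "oid": oid, "ver": ver, "is_idif": False}
    if FID_SEQ_IDIF <= seq <= FID_SEQ_IDIF_MAX:
        info["is_idif"] = True
        info["ost_index"] = (seq >> 16) & 0xFFFF             # bits 16-31 of seq
        info["object_id"] = ((seq & 0xFFFF) << 32) | oid     # high bits from seq
    return info


info = describe_fid("[0x1001b0000:0x19a5c22:0x0]")
print(info["ost_index"], info["object_id"])  # 27 26893346
```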
Any ideas?
|