[LU-9157] replay-single test_80c: rmdir failed Created: 24/Feb/17 Updated: 26/Feb/19 Resolved: 12/Sep/18 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.10.0, Lustre 2.10.1, Lustre 2.11.0, Lustre 2.10.2, Lustre 2.12.0, Lustre 2.10.3, Lustre 2.10.4, Lustre 2.10.5, Lustre 2.10.7 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | James Casper | Assignee: | Lai Siyao |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Environment: |
trevis-46vm1/2/3/7/8, Full Group test w/DNE, |
||
| Issue Links: |
|
||||||||||||||||||||||||
| Severity: | 3 | ||||||||||||||||||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||||||||||||||||||
| Description |
|
https://testing.hpdd.intel.com/test_sessions/13a7c94c-f92a-11e6-aa39-5254006e85c2 From test_log: rm: cannot remove '/mnt/lustre/d80c.replay-single/remote_dir': Input/output error replay-single test_80c: @@@@@@ FAIL: rmdir failed Trace dump: = /usr/lib64/lustre/tests/test-framework.sh:4841:error() = /usr/lib64/lustre/tests/replay-single.sh:2637:test_80c() = /usr/lib64/lustre/tests/test-framework.sh:5117:run_one() = /usr/lib64/lustre/tests/test-framework.sh:5156:run_one_logged() = /usr/lib64/lustre/tests/test-framework.sh:5003:run_test() = /usr/lib64/lustre/tests/replay-single.sh:2641:main() |
| Comments |
| Comment by Sarah Liu [ 20/Jun/17 ] |
|
MDS console https://testing.hpdd.intel.com/test_sets/93814ab4-53f0-11e7-bacf-5254006e85c2 10:16:00:[59442.222216] Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds2; mount -t lustre lustre-mdt2/mdt2 /mnt/lustre-mds2
10:16:00:[59442.618812] Lustre: lustre-MDT0001: Imperative Recovery not enabled, recovery window 60-180
10:16:00:[59442.622180] LustreError: 3386:0:(osd_oi.c:503:osd_oid()) lustre-MDT0001-osd: unsupported quota oid: 0x16
10:16:00:[59442.829897] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check
10:16:00:[59443.177304] Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
10:16:00:[59443.788151] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-55vm3.trevis.hpdd.intel.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
10:16:00:[59443.792192] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-55vm3.trevis.hpdd.intel.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
10:16:00:[59444.131692] Lustre: DEBUG MARKER: trevis-55vm3.trevis.hpdd.intel.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
10:19:58:[59444.136692] Lustre: DEBUG MARKER: trevis-55vm3.trevis.hpdd.intel.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
10:19:58:[59444.397031] Lustre: DEBUG MARKER: lctl set_param -n mdt.lustre*.enable_remote_dir=1
10:19:58:[59444.739127] Lustre: DEBUG MARKER: zfs get -H -o value lustre:svname lustre-mdt2/mdt2 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
10:19:58:[59445.082988] Lustre: DEBUG MARKER: zfs get -H -o value lustre:svname lustre-mdt2/mdt2 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
10:19:58:[59445.432320] Lustre: DEBUG MARKER: zfs get -H -o value lustre:svname lustre-mdt2/mdt2 2>/dev/null
10:19:58:[59446.305312] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-55vm2.trevis.hpdd.intel.com: executing wait_import_state_mount FULL mdc.lustre-MDT0001-mdc-*.mds_server_uuid
10:19:58:[59446.305685] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-55vm1.trevis.hpdd.intel.com: executing wait_import_state_mount FULL mdc.lustre-MDT0001-mdc-*.mds_server_uuid
10:19:58:[59446.626587] Lustre: DEBUG MARKER: trevis-55vm1.trevis.hpdd.intel.com: executing wait_import_state_mount FULL mdc.lustre-MDT0001-mdc-*.mds_server_uuid
10:19:58:[59446.628706] Lustre: DEBUG MARKER: trevis-55vm2.trevis.hpdd.intel.com: executing wait_import_state_mount FULL mdc.lustre-MDT0001-mdc-*.mds_server_uuid
10:19:58:[59447.057549] Lustre: lustre-MDT0001: Will be in recovery for at least 1:00, or until 5 clients reconnect
10:19:58:[59447.570240] Lustre: lustre-MDT0001: Recovery over after 0:01, of 5 clients 5 recovered and 0 were evicted.
10:19:58:[59448.124116] Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 1 sec
10:19:58:[59448.124291] Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 1 sec
10:19:58:[59448.512706] Lustre: DEBUG MARKER: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 1 sec
10:19:58:[59448.527843] Lustre: DEBUG MARKER: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 1 sec
10:19:58:[59450.364950] Lustre: DEBUG MARKER: /usr/sbin/lctl mark replay-single test_80c: @@@@@@ FAIL: rmdir failed
10:19:58:[59450.685468] Lustre: DEBUG MARKER: replay-single test_80c: @@@@@@ FAIL: rmdir failed
10:19:58:[59451.049620] Lustre: DEBUG MARKER: /usr/sbin/lctl dk > /test_logs/2017-06-16/lustre-master-el7-x86_64--full--1_19_1__3603__-70308503758260-171737/replay-single.test_80c.debug_log.$(hostname -s).1497694559.log;
10:19:58:[59451.049620] dmesg > /test_logs/2017-06-16/lustre-master-el7-x86_64--full--1_19_1__3
10:19:58:[59453.691066] Lustre: DEBUG MARKER: lctl set_param -n fail_loc=0 fail_val=0 2>/dev/null
10:19:58:[59455.834157] Lustre: DEBUG MARKER: for i in $(find /proc/spl -type f); do
10:19:58:[59455.834157] echo ====${i}=======================;
10:19:58:[59455.834157] cat ${i};
10:19:58:[59455.834157] done
10:19:58:[59456.934168] Lustre: DEBUG MARKER: echo ====zpool events -v=======================;
10:19:58:[59456.934168] zpool events -v
10:19:58:[59457.623449] LustreError: 3434:0:(llog_cat.c:763:llog_cat_cancel_records()) lustre-MDT0000-osp-MDT0001: fail to cancel 0 of 1 llog-records: rc = -116
10:19:58:[59457.752332] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == replay-single test 80d: DNE: create remote dir, drop update rep from MDT1, fail 2 MDTs ============ 10:16:06 \(1497694566\)
|
| Comment by James Nunez (Inactive) [ 08/Aug/18 ] |
|
Some recent failures: It fails 100% of the time on master with review-dne-zfs-part-4. |
| Comment by Peter Jones [ 08/Aug/18 ] |
|
Lai Could you please advise? Thanks Peter |
| Comment by Lai Siyao [ 04/Sep/18 ] |
|
The failure should be the same as 40000000:00000040:0.0:1535086843.183909:0:24271:0:(fid_request.c:400:seq_client_alloc_fid()) cli-cli-lustre-MDT0000-osp-MDT0001: Allocated FID [0x20000bb98:0x1:0x0] 00000004:00000040:0.0:1535086843.183917:0:24271:0:(mdt_handler.c:5324:mdt_object_init()) object init, fid = [0x20000bb98:0x1:0x0] 00000004:00000040:1.0:1535086843.184867:0:23089:0:(osp_trans.c:265:object_update_request_dump()) i = 0 fid = [0x20000bb98:0x1:0x0] op = create params = 1 batchid = 0 size = 256 repsize 0 00000040:00001000:0.0:1535086843.213847:0:24271:0:(llog_osd.c:399:llog_osd_write_rec()) new record 106a0000 to [0x20000bb98:0x1:0x0] 00000004:00000040:0.0:1535086843.213858:0:24271:0:(osp_trans.c:265:object_update_request_dump()) i = 0 fid = [0x20000bb98:0x1:0x0] op = attr_get params = 0 batchid = 0 size = 40 repsize 208 00000004:00000040:0.0:1535086843.214723:0:24271:0:(osp_md_object.c:1163:osp_md_write()) write [0x20000bb98:0x1:0x0] offset = 0 length = 32768 00000004:00000040:0.0:1535086843.214752:0:24271:0:(osp_md_object.c:1163:osp_md_write()) write [0x20000bb98:0x1:0x0] offset = 32768 length = 504 00000040:00080000:0.0:1535086843.214755:0:24271:0:(llog_osd.c:698:llog_osd_write_rec()) added record [0x20000bb98:0x1:0x0].1, 504 off33272 00000004:00000040:1.0:1535086843.214788:0:23089:0:(osp_trans.c:265:object_update_request_dump()) i = 0 fid = [0x20000bb98:0x1:0x0] op = write params = 2 batchid = 0 size = 32832 repsize 0 00000004:00000040:1.0:1535086843.214789:0:23089:0:(osp_trans.c:265:object_update_request_dump()) i = 1 fid = [0x20000bb98:0x1:0x0] op = write params = 2 batchid = 0 size = 568 repsize 0 00000040:00001000:1.0:1535086846.270681:0:24271:0:(llog_osd.c:399:llog_osd_write_rec()) new record 106a0000 to [0x20000bb98:0x1:0x0] 00000004:00000040:1.0:1535086846.270684:0:24271:0:(osp_md_object.c:1163:osp_md_write()) write [0x20000bb98:0x1:0x0] offset = 0 length = 88 00000004:00000040:1.0:1535086846.270692:0:24271:0:(osp_md_object.c:1163:osp_md_write()) write [0x20000bb98:0x1:0x0] offset = 88 length = 4 00000004:00000040:1.0:1535086846.270695:0:24271:0:(osp_md_object.c:1163:osp_md_write()) write [0x20000bb98:0x1:0x0] offset = 32760 length = 8 00000004:00000040:1.0:1535086846.270699:0:24271:0:(osp_md_object.c:1163:osp_md_write()) write [0x20000bb98:0x1:0x0] offset = 33272 length = 1160 00000040:00080000:1.0:1535086846.270702:0:24271:0:(llog_osd.c:698:llog_osd_write_rec()) added record [0x20000bb98:0x1:0x0].2, 1160 off34432 00000004:00000040:1.0:1535086872.182169:0:23089:0:(osp_trans.c:265:object_update_request_dump()) i = 3 fid = [0x20000bb98:0x1:0x0] op = write params = 2 batchid = 0 size = 152 repsize 0 00000004:00000040:1.0:1535086872.182172:0:23089:0:(osp_trans.c:265:object_update_request_dump()) i = 4 fid = [0x20000bb98:0x1:0x0] op = write params = 2 batchid = 0 size = 72 repsize 0 00000004:00000040:1.0:1535086872.182174:0:23089:0:(osp_trans.c:265:object_update_request_dump()) i = 5 fid = [0x20000bb98:0x1:0x0] op = write params = 2 batchid = 0 size = 72 repsize 0 00000004:00000040:1.0:1535086872.182176:0:23089:0:(osp_trans.c:265:object_update_request_dump()) i = 6 fid = [0x20000bb98:0x1:0x0] op = write params = 2 batchid = 0 size = 1224 repsize 0 00000004:00000040:0.0:1535086877.998347:0:26196:0:(mdt_handler.c:5344:mdt_object_free()) object free, fid = [0x20000bb98:0x1:0x0] 00000004:00000040:1.0:1535086890.715650:0:27095:0:(mdt_handler.c:5324:mdt_object_init()) object init, fid = [0x20000bb98:0x1:0x0] 00000004:00000040:1.0:1535086890.715674:0:27095:0:(osp_trans.c:265:object_update_request_dump()) i = 0 fid = [0x20000bb98:0x1:0x0] op = attr_get params = 0 batchid = 0 size = 40 repsize 208 00000004:00000040:1.0:1535086890.716560:0:27095:0:(osp_md_object.c:1238:osp_md_read()) lustre-MDT0000-osp-MDT0001 [0x20000bb98:0x1:0x0] read offset 0 size 32768 00000004:00000040:1.0:1535086890.716562:0:27095:0:(osp_trans.c:265:object_update_request_dump()) i = 0 fid = [0x20000bb98:0x1:0x0] op = read params = 2 batchid = 0 size = 72 repsize 32768 00000004:00000040:1.0:1535086890.721227:0:27095:0:(osp_md_object.c:1238:osp_md_read()) lustre-MDT0000-osp-MDT0001 [0x20000bb98:0x1:0x0] read offset 32768 size 32768 00000004:00000040:1.0:1535086890.721229:0:27095:0:(osp_trans.c:265:object_update_request_dump()) i = 0 fid = [0x20000bb98:0x1:0x0] op = read params = 2 batchid = 0 size = 72 repsize 32768 40000000:00000040:0.0:1535086899.333519:0:26923:0:(fid_request.c:400:seq_client_alloc_fid()) cli-cli-lustre-MDT0000-osp-MDT0001: Allocated FID [0x20000bb98:0x1:0x0] 00000004:00000040:0.0:1535086899.333537:0:26923:0:(mdt_handler.c:5324:mdt_object_init()) object init, fid = [0x20000bb98:0x1:0x0] 00000004:00000040:1.0:1535086899.334459:0:26954:0:(osp_trans.c:265:object_update_request_dump()) i = 0 fid = [0x20000bb98:0x1:0x0] op = create params = 1 batchid = 0 size = 256 repsize 0 00000040:00001000:0.0:1535086899.371627:0:26923:0:(llog_osd.c:399:llog_osd_write_rec()) new record 106a0000 to [0x20000bb98:0x1:0x0] 00000004:00000040:0.0:1535086899.371630:0:26923:0:(osp_md_object.c:1163:osp_md_write()) write [0x20000bb98:0x1:0x0] offset = 0 length = 32768 00000004:00000040:0.0:1535086899.371679:0:26923:0:(osp_md_object.c:1163:osp_md_write()) write [0x20000bb98:0x1:0x0] offset = 32768 length = 504 00000040:00080000:0.0:1535086899.371684:0:26923:0:(llog_osd.c:698:llog_osd_write_rec()) added record [0x20000bb98:0x1:0x0].1, 504 off33272 00000004:00080000:1.0:1535086899.394315:0:26954:0:(osp_object.c:1304:osp_invalidate()) Invalidate osp_object [0x20000bb98:0x1:0x0] |
| Comment by Peter Jones [ 12/Sep/18 ] |
|
duplicate of |