[LU-5188] nbp6-OST002f-osc-MDT0000: invalid setattr record, lsr_valid:0 Created: 12/Jun/14 Updated: 03/Mar/15 Resolved: 27/Jun/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.3 |
| Fix Version/s: | Lustre 2.6.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Mahmoud Hanafi | Assignee: | Niu Yawei (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | HB | ||
| Issue Links: |
|
||||||||||||||||||||
| Severity: | 3 | ||||||||||||||||||||
| Rank (Obsolete): | 14392 | ||||||||||||||||||||
| Description |
|
mounting mdt we start to see these errors then the system becomes unreponsive nbp6-mds login: LDISKFS-fs (dm-1): recovery complete LDISKFS-fs (dm-1): mounted filesystem with ordered data mode. quota=on. Opts: LDISKFS-fs (dm-2): recovery complete LDISKFS-fs (dm-2): mounted filesystem with ordered data mode. quota=on. Opts: Lustre: nbp6-MDT0000: Not available for connect from 10.153.1.57@o2ib233 (not set up) Lustre: nbp6-MDT0000: used disk, loading Lustre: nbp6-MDT0000: Not available for connect from 10.153.0.76@o2ib233 (not set up) Lustre: 2967:0:(mdt_handler.c:4969:mdt_process_config()) For interoperability, skip this mdt.group_upcall. It is obsolete. Lustre: 2967:0:(mdt_handler.c:4969:mdt_process_config()) For interoperability, skip this mdt.quota_type. It is obsolete. LustreError: 3069:0:(osp_sync.c:487:osp_sync_new_setattr_job()) nbp6-OST002f-osc-MDT0000: invalid setattr record, lsr_valid:0 LustreError: 3069:0:(osp_sync.c:797:osp_sync_process_queues()) can't send: -22 LustreError: 3069:0:(osp_sync.c:487:osp_sync_new_setattr_job()) nbp6-OST002f-osc-MDT0000: invalid setattr record, lsr_valid:0 LustreError: 3069:0:(osp_sync.c:797:osp_sync_process_queues()) can't send: -22 LustreError: 11-0: nbp6-MDT0000-lwp-MDT0000: Communicating with 0@lo, operation mds_connect failed with -11. Lustre: nbp6-MDT0000: Will be in recovery for at least 5:00, or until 1083 clients reconnect LustreError: 3069:0:(osp_sync.c:797:osp_sync_process_queues()) can't send: -22 LustreError: 3081:0:(osp_sync.c:487:osp_sync_new_setattr_job()) nbp6-OST0032-osc-MDT0000: invalid setattr record, lsr_valid:0 LustreError: 3081:0:(osp_sync.c:487:osp_sync_new_setattr_job()) Skipped 212864 previous similar messages LustreError: 3069:0:(osp_sync.c:797:osp_sync_process_queues()) Skipped 229931 previous similar messages LustreError: 3081:0:(osp_sync.c:797:osp_sync_process_queues()) can't send: -22 LustreError: 3069:0:(osp_sync.c:487:osp_sync_new_setattr_job()) nbp6-OST002f-osc-MDT0000: invalid setattr record, lsr_valid:0 LustreError: 3069:0:(osp_sync.c:487:osp_sync_new_setattr_job()) Skipped 450953 previous similar messages LustreError: 3081:0:(osp_sync.c:797:osp_sync_process_queues()) Skipped 445853 previous similar messages LustreError: 3081:0:(osp_sync.c:797:osp_sync_process_queues()) can't send: -22 LustreError: 3069:0:(osp_sync.c:487:osp_sync_new_setattr_job()) nbp6-OST002f-osc-MDT0000: invalid setattr record, lsr_valid:0 LustreError: 3069:0:(osp_sync.c:487:osp_sync_new_setattr_job()) Skipped 930108 previous similar messages LustreError: 3081:0:(osp_sync.c:797:osp_sync_process_queues()) Skipped 925668 previous similar messages LustreError: 3081:0:(osp_sync.c:797:osp_sync_process_queues()) can't send: -22 LustreError: 3069:0:(osp_sync.c:487:osp_sync_new_setattr_job()) nbp6-OST002f-osc-MDT0000: invalid setattr record, lsr_valid:0 LustreError: 3069:0:(osp_sync.c:487:osp_sync_new_setattr_job()) Skipped 1897185 previous similar messages LustreError: 3081:0:(osp_sync.c:797:osp_sync_process_queues()) Skipped 1898934 previous similar messages LustreError: 3081:0:(osp_sync.c:487:osp_sync_new_setattr_job()) nbp6-OST0032-osc-MDT0000: invalid setattr record, lsr_valid:0 LustreError: 3069:0:(osp_sync.c:797:osp_sync_process_queues()) can't send: -22 LustreError: 3069:0:(osp_sync.c:797:osp_sync_process_queues()) Skipped 3829999 previous similar messages LustreError: 3081:0:(osp_sync.c:487:osp_sync_new_setattr_job()) Skipped 3851279 previous similar messages LustreError: 3081:0:(osp_sync.c:797:osp_sync_process_queues()) can't send: -22 LustreError: 3069:0:(osp_sync.c:487:osp_sync_new_setattr_job()) nbp6-OST002f-osc-MDT0000: invalid setattr record, lsr_valid:0 LustreError: 3069:0:(osp_sync.c:487:osp_sync_new_setattr_job()) Skipped 7683067 previous similar messages LustreError: 3081:0:(osp_sync.c:797:osp_sync_process_queues()) Skipped 7758692 previous similar messages LustreError: 3081:0:(osp_sync.c:487:osp_sync_new_setattr_job()) nbp6-OST0032-osc-MDT0000: invalid setattr record, lsr_valid:0 LustreError: 3069:0:(osp_sync.c:797:osp_sync_process_queues()) can't send: -22 LustreError: 3069:0:(osp_sync.c:797:osp_sync_process_queues()) Skipped 15494392 previous similar messages LustreError: 3081:0:(osp_sync.c:487:osp_sync_new_setattr_job()) Skipped 15461799 previous similar messages LustreError: 3069:0:(osp_sync.c:487:osp_sync_new_setattr_job()) nbp6-OST002f-osc-MDT0000: invalid setattr record, lsr_valid:0 LustreError: 3081:0:(osp_sync.c:797:osp_sync_process_queues()) can't send: -22 LustreError: 3081:0:(osp_sync.c:797:osp_sync_process_queues()) Skipped 31047376 previous similar messages LustreError: 3069:0:(osp_sync.c:487:osp_sync_new_setattr_job()) Skipped 30898154 previous similar messages INFO: task crond:3224 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. crond D 0000000000000003 0 3224 2649 0x00000080 ffff8805c5579d38 0000000000000086 ffff8805c5579d00 ffff8805c5579cfc 0000000000000000 ffff88063fc24700 ffff880028296780 0000000000000500 ffff880625ed85f8 ffff8805c5579fd8 000000000000fc40 ffff880625ed85f8 Call Trace: [<ffffffff81540545>] schedule_timeout+0x215/0x2e0 [<ffffffff81346555>] ? extract_entropy+0xe5/0x140 [<ffffffff815401c3>] wait_for_common+0x123/0x180 [<ffffffff81063be0>] ? default_wake_function+0x0/0x20 [<ffffffff815402dd>] wait_for_completion+0x1d/0x20 [<ffffffff810921c8>] synchronize_sched+0x58/0x60 [<ffffffff81092150>] ? wakeme_after_rcu+0x0/0x20 [<ffffffff8122205c>] install_session_keyring_to_cred+0x6c/0xd0 [<ffffffff812221f3>] join_session_keyring+0x133/0x160 [<ffffffff810dbff7>] ? audit_syscall_entry+0x1d7/0x200 [<ffffffff81220df8>] keyctl_join_session_keyring+0x38/0x70 [<ffffffff81221a20>] sys_keyctl+0x170/0x190 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b LustreError: 3081:0:(osp_sync.c:797:osp_sync_process_queues()) can't send: -22 LustreError: 3069:0:(osp_sync.c:487:osp_sync_new_setattr_job()) nbp6-OST002f-osc-MDT0000: invalid setattr record, lsr_valid:0 LustreError: 3069:0:(osp_sync.c:487:osp_sync_new_setattr_job()) Skipped 61763336 previous similar messages LustreError: 3081:0:(osp_sync.c:797:osp_sync_process_queues()) Skipped 62092386 previous similar messages |
| Comments |
| Comment by Peter Jones [ 12/Jun/14 ] |
|
Oleg is looking into this |
| Comment by Oleg Drokin [ 12/Jun/14 ] |
|
Hm, it appears that this is a bug in Patch is at http://review.whamcloud.com/10706 - this should help your immediate problem. Niu: regarding the |
| Comment by Mahmoud Hanafi [ 12/Jun/14 ] |
|
ok thanks will build and install the patch. |
| Comment by Peter Jones [ 12/Jun/14 ] |
|
NIu Could you please complete any follow up work on this ticket? Thanks Peter |
| Comment by Niu Yawei (Inactive) [ 13/Jun/14 ] |
Right, I think we'd return 0 instead of -EINVAL, so that we just print an error message, and osp sync process can continue processing on other records. |
| Comment by Andreas Dilger [ 16/Jun/14 ] |
|
I think skipping the invalid record is OK, since LFSCK 2 should fix up the ownership of the OST objects on its next scan. |
| Comment by Mahmoud Hanafi [ 20/Jun/14 ] |
|
The patched fixed our issue. |
| Comment by Andreas Dilger [ 20/Jun/14 ] |
|
Patch landed to master for 2.6.0. |
| Comment by Di Wang [ 25/Jun/14 ] |
|
This patch cause It seems osp_sync_new_setattr_job return 0 but did not issue RPC, so we should decrease opd_syn_rpc_in_flight and opd_sync_rpc_in_progress in this case. |
| Comment by Di Wang [ 25/Jun/14 ] |
| Comment by Jodi Levi (Inactive) [ 27/Jun/14 ] |
|
Patch landed to Master. |