[LU-17783] Wrong buffer for field 'batch_update_reply' in format 'MDS_BATCH' from statahead for old server - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Fixed
Priority: Critical
Fix Version/s: Lustre 2.16.0
Affects Version/s: Lustre 2.16.0
Labels:
- statahead
Environment:
Client: 2.15.62
Server: 2.14.0.144

Severity: 3
Epic Link: Batched Statahead

Description

Mounting a filesystem served by an older server (2.14.0-ddn144) from a master-branch client (2.15.62) prints error messages related to batched statahead on the client console:

LustreError: 11-0: myth-MDT0000-mdc-ffff90ca1519d3d8:
    operation mds_batch to node 192.168.20.1@tcp failed: rc = -95
LustreError: 3832:0:(layout.c:2320:__req_capsule_get()) @@@
    Wrong buffer for field 'batch_update_reply' (1 of 1) in format 'MDS_BATCH', 0 vs. 8 (server)
    req@ffff90ca017f4040 x1797385654110784/t0(0) o63->myth-MDT0000-mdc-ffff90ca1519d3d8@
    192.168.20.1@tcp:24/4 lens 3432/224 e 0 to 0 dl 1714120732 ref 1
    fl Interpret:RQU/200/0 rc -95/-95 job:'ll_sa.0.centos7' uid:0 gid:0

The client should not try to use batch RPCs when connected to an MDS/OSS that does not advertise the "OBD_CONNECT_BATCH_RPC" feature flag. On this client, the flag is absent from the connect flags negotiated with the MDT:

client# lctl get_param mdc.*.import | grep connect_flags:
    connect_flags: [ write_grant, server_lock, version, acl, create_on_write, inode_bit_locks,
                     getattr_by_fid, no_oh_for_devices, max_byte_per_rpc, early_lock_cancel,
                     adaptive_timeouts, lru_resize, alt_checksum_algorithm, fid_is_enabled,
                     version_recovery, pools, grant_shrink, large_ea, full20, layout_lock,
                     64bithash, jobstats, umask, einprogress, grant_param, lvb_type, short_io,
                     flock_deadlock, disp_stripe, open_by_fid, lfsck, multi_mod_rpcs, dir_stripe,
                     subtree, bulk_mbits, second_flags, file_secctx, dir_migrate, sum_statfs,
                     overstriping, flr, lock_convert, archive_id_array, increasing_xid, selinux_policy,
                     lsom, pcc, crush, async_discard, getattr_pfid, lseek, dom_lvb, atomic_open_lock ]

Issue Links

is duplicated by
- LU-17834 wrong buffer for field (Open)

is related to
- LU-17610 Batched statahead errors on client against 2.12 server (Open)
- LU-14139 batched statahead processing (Resolved)

People

Assignee: Qian Yingjin
Reporter: Andreas Dilger
Votes: 0
Watchers: 4

Dates

Created: 26/Apr/24 8:47 AM
Updated: 29/May/24 1:23 PM
Resolved: 29/May/24 1:23 PM