[LU-10039] ioctl error after downgrade Created: 27/Sep/17  Updated: 28/Sep/17

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.0, Lustre 2.10.1
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Andreas Dilger Assignee: Zhenyu Xu
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Related
is related to LU-10027 Unable to finish mount on MDS while ... Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Took the following steps to reproduce the issue, but the issue was not reproduced.
Steps for both zfs and ldiskfs:
1. Clients and Servers with b_ieel3_0 build 222. Created Lustre file system.
2. Unmounted and upgraded all clients and Servers to 2.10.1. Mounted lustre again.
3. Unmounted Clients and Downgraded them to b_ieel3_0 build 222. Mounted Lustre again when nodes were up.
4. Unmounted MDS and downgraded it to b_ieel3_0 build 222. Mounted lustre and it worked this time.
5. Ran sanity now and the test ended abruptly after the completion of test_117 for both zfs and ldiskfs.

When I checked the state of the nodes, the MDT and one client were unmounted, while the OSTs and the second client were mounted.

Results:
ldiskfs - https://testing.hpdd.intel.com/test_sessions/df43fbf4-a2fb-11e7-b786-5254006e85c2
zfs - https://testing.hpdd.intel.com/test_sessions/335a5516-a2fd-11e7-bb19-5254006e85c2



 Comments   
Comment by Andreas Dilger [ 27/Sep/17 ]

After discussion with Saurabh, it seems that the user tools (in particular lfs) were not downgraded with the kernel modules. This is generating errors when using lfs setstripe in sanity test_27A, which is setting the default layout on the root directory:

error on ioctl 0x4008669a for '/mnt/lustre' (3): Invalid argument
error: setstripe: create stripe file '/mnt/lustre' failed
error on ioctl 0x4008669a for '/mnt/lustre' (3): Invalid argument
error: setstripe: create stripe file '/mnt/lustre' failed

The ioctl is LL_IOC_LOV_SETSTRIPE (0x40000000 >> (_IOC_DIRSHIFT = 30) = 1 = _IOC_WRITE, 0x08 = sizeof(long), 0x66 = 'f', 0x9a = 154). However, it isn't clear why the kernel is complaining about the struct lov_mds_md that is being sent. Is this a bad magic, or something else? Since we aren't using composite layouts for these files, it should be compatible with existing tools.
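
For reference, the decoding of 0x4008669a can be checked with a small sketch. It assumes the generic Linux ioctl field layout from asm-generic/ioctl.h (8 bits of number, 8 bits of type, 14 bits of size, 2 bits of direction, so the direction field starts at bit 30 of the full 32-bit value); some architectures use a different layout. The helper name decode_ioctl is hypothetical and is not part of Lustre or the kernel.

```python
# Field widths as in Linux's asm-generic/ioctl.h (assumed; some
# architectures such as powerpc or mips use a different split).
IOC_NRBITS, IOC_TYPEBITS, IOC_SIZEBITS, IOC_DIRBITS = 8, 8, 14, 2

IOC_NRSHIFT = 0
IOC_TYPESHIFT = IOC_NRSHIFT + IOC_NRBITS      # 8
IOC_SIZESHIFT = IOC_TYPESHIFT + IOC_TYPEBITS  # 16
IOC_DIRSHIFT = IOC_SIZESHIFT + IOC_SIZEBITS   # 30

IOC_DIR_NAMES = {
    0: "_IOC_NONE",
    1: "_IOC_WRITE",
    2: "_IOC_READ",
    3: "_IOC_READ|_IOC_WRITE",
}


def decode_ioctl(cmd):
    """Split a 32-bit ioctl command into direction, size, type and number.

    Hypothetical helper for illustration only.
    """
    direction = (cmd >> IOC_DIRSHIFT) & ((1 << IOC_DIRBITS) - 1)
    size = (cmd >> IOC_SIZESHIFT) & ((1 << IOC_SIZEBITS) - 1)
    type_char = chr((cmd >> IOC_TYPESHIFT) & ((1 << IOC_TYPEBITS) - 1))
    number = (cmd >> IOC_NRSHIFT) & ((1 << IOC_NRBITS) - 1)
    return {
        "dir": IOC_DIR_NAMES[direction],
        "size": size,
        "type": type_char,
        "nr": number,
    }


# The command value reported in the lfs setstripe error message.
decoded = decode_ioctl(0x4008669A)
print(decoded)
```

The decoded fields match an _IOW('f', 154, long) style definition on a 64-bit system, where sizeof(long) is 8.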

Comment by Saurabh Tandan (Inactive) [ 28/Sep/17 ]

Re-ran sanity using exactly the same steps as above, and the sanity run did not end abruptly this time. Before running sanity I ran the individual tests that failed in the previous run, i.e. test_27A and test_65i, but they passed when run alone in the second run. Results for the second run are as follows:
ldiskfs- https://testing.hpdd.intel.com/test_sessions/d735f45c-a40a-11e7-b786-5254006e85c2
zfs- https://testing.hpdd.intel.com/test_sessions/0a221f12-a40b-11e7-b786-5254006e85c2
