[LU-12833] lustre_lwp_add_conn can't find lwp device Created: 07/Oct/19  Updated: 06/Dec/19  Resolved: 06/Dec/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.13.0
Fix Version/s: Lustre 2.14.0

Type: Bug Priority: Major
Reporter: Alexander Boyko Assignee: Alexander Boyko
Resolution: Fixed Votes: 0
Labels: patch
Environment:

After an upgrade from version 2.5


Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

LWP uses the fsname-client config to create its devices.
The error occurred while processing record #25:

00000040:00001000:2.0:1569263809.706616:0:96423:0:(llog.c:603:llog_process_thread()) lrh_index: 25 lrh_len: 112 (4304 remains)
00000020:00000001:2.0:1569263809.706616:0:96423:0:(obd_mount_server.c:804:client_lwp_config_process()) Process entered
00000020:00000001:2.0:1569263809.706617:0:96423:0:(obd_mount_server.c:738:lustre_lwp_add_conn()) Process entered
00000020:00000001:2.0:1569263809.706618:0:96423:0:(obd_mount_server.c:700:lustre_find_lwp()) Process entered
00000020:00000010:2.0:1569263809.706618:0:96423:0:(obd_mount_server.c:705:lustre_find_lwp()) kmalloced '*lwpname': 64 at ffff8807c129bd40.
00000020:00000001:2.0:1569263809.706619:0:96423:0:(obd_mount_server.c:346:tgt_name2lwp_name()) Process entered
00000020:00000010:2.0:1569263809.706619:0:96423:0:(obd_mount_server.c:348:tgt_name2lwp_name()) kmalloced 'fsname': 64 at ffff8807c129b780.
00000020:00000001:2.0:1569263809.706620:0:96423:0:(obd_mount_server.c:372:tgt_name2lwp_name()) Process leaving via cleanup (rc=0 : 0 : 0x0)
00000020:00000010:2.0:1569263809.706621:0:96423:0:(obd_mount_server.c:376:tgt_name2lwp_name()) kfreed 'fsname': 64 at ffff8807c129b780.
00000020:00000001:2.0:1569263809.706656:0:96423:0:(obd_mount_server.c:727:lustre_find_lwp()) Process leaving (rc=18446744073709551614 : -2 : fffffffffffffffe)
00000020:00020000:2.0:1569263809.706657:0:96423:0:(obd_mount_server.c:742:lustre_lwp_add_conn()) xxx1116-OST0000: can't find lwp device.
00000020:00000001:2.0:1569263809.706660:0:96423:0:(obd_mount_server.c:743:lustre_lwp_add_conn()) Process leaving via out (rc=18446744073709551614 : -2 : 0xfffffffffffffffe)
00000020:00000010:2.0:1569263809.706661:0:96423:0:(obd_mount_server.c:772:lustre_lwp_add_conn()) kfreed 'lwpname': 64 at ffff8807c129bd40.
00000020:00000001:2.0:1569263809.706662:0:96423:0:(obd_mount_server.c:773:lustre_lwp_add_conn()) Process leaving (rc=18446744073709551614 : -2 : fffffffffffffffe)
00000020:00000001:2.0:1569263809.706663:0:96423:0:(obd_mount_server.c:902:client_lwp_config_process()) Process leaving (rc=18446744073709551614 : -2 : fffffffffffffffe)
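
Each rc triple in the trace above prints the same return code three ways: as an unsigned 64-bit value, as a signed value, and in hex. The following is a minimal sketch of that conversion; the helper name decode_rc is illustrative and not part of Lustre. It shows that 18446744073709551614 is -2, i.e. -ENOENT.

```python
import errno


def decode_rc(unsigned_rc: int) -> int:
    """Reinterpret an unsigned 64-bit value as a signed two's-complement integer."""
    if unsigned_rc >= 1 << 63:
        return unsigned_rc - (1 << 64)
    return unsigned_rc


rc = decode_rc(18446744073709551614)
# Print the signed value, its 64-bit hex form, and the errno name.
print(rc, hex(rc & 0xFFFFFFFFFFFFFFFF), errno.errorcode.get(-rc))
# -> -2 0xfffffffffffffffe ENOENT
```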

Based on the config, this record should have been skipped:

#19 (224)END marker 7 (flags=0x02, v2.5.1.0) xxx1116-client 'mount opts' Thu Oct 1 15:29:07 2015-
#20 (224)SKIP START marker 11 (flags=0x05, v2.5.1.0) xxx1116-MDT0002 'add mdc' Thu Oct 1 15:55:50 2015-Thu Oct 1 16:22:50 2015
#21 (088)SKIP add_uuid nid=10.10.10.8@o2ib(0x500000a956a08) 0: 1:10.10.10.8@o2ib
#22 (128)SKIP attach 0:xxx1116-MDT0002-mdc 1:mdc 2:xxx1116-clilmv_UUID
#23 (144)SKIP setup 0:xxx1116-MDT0002-mdc 1:xxx1116-MDT0002_UUID 2:10.10.10.8@o2ib
#24 (088)SKIP add_uuid nid=10.10.10.7@o2ib(0x500000a956a07) 0: 1:10.10.10.7@o2ib
#25 (112)SKIP add_conn 0:xxx1116-MDT0002-mdc 1:10.10.10.7@o2ib
#26 (088)SKIP add_uuid nid=10.10.10.7@o2ib(0x500000a956a07) 0: 1:10.10.10.7@o2ib
#27 (112)SKIP add_conn 0:xxx1116-MDT0002-mdc 1:10.10.10.7@o2ib
#28 (168)SKIP modify_mdc_tgts add 0:xxx1116-clilmv 1:xxx1116-MDT0002_UUID 2:2 3:1 4:xxx1116-MDT0002-mdc_UUID
#29 (224)SKIP END marker 11 (flags=0x06, v2.5.1.0) xxx1116-MDT0002 'add mdc' Thu Oct 1 15:55:50 2015-Thu Oct 1 16:22:50 2015
#30 (224)marker 12 (flags=0x01, v2.5.1.0) xxx1116-client 'mount opts' Thu Oct 1 15:55:50 2015-
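
To spot the SKIP records in a dump like the one above, a small parser is enough. This is only a sketch: the regular expression is inferred from the lines shown in this ticket, not from a documented format, and parse_record is a hypothetical helper name.

```python
import re

# "#<index> (<length>)" followed by an optional "SKIP " and the record body.
RECORD_RE = re.compile(r"^#(\d+) \((\d+)\)(SKIP )?(.*)$")


def parse_record(line):
    """Split one dump line into index, length, skip flag and body."""
    m = RECORD_RE.match(line.strip())
    if m is None:
        return None
    index, length, skip, body = m.groups()
    return {
        "index": int(index),
        "len": int(length),
        "skip": skip is not None,
        "body": body,
    }


DUMP = """\
#24 (088)SKIP add_uuid nid=10.10.10.7@o2ib(0x500000a956a07) 0: 1:10.10.10.7@o2ib
#25 (112)SKIP add_conn 0:xxx1116-MDT0002-mdc 1:10.10.10.7@o2ib
#30 (224)marker 12 (flags=0x01, v2.5.1.0) xxx1116-client 'mount opts' Thu Oct 1 15:55:50 2015-
"""

records = [parse_record(line) for line in DUMP.splitlines()]
skipped = [r["index"] for r in records if r["skip"]]
print(skipped)  # -> [24, 25]
```

Record #25 has length 112, which matches the lrh_index: 25 lrh_len: 112 line in the debug trace.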

It looks like client_lwp_config_process() has a bug: it processes add_conn without having processed the preceding add_uuid.
For a marker, it skips the record if the SKIP flag is set. For add_uuid, it relies on the flags from marker processing, so it skips that record too. For add_conn, however, it processes the record, tries to find an LWP device, and fails, because the device is added by the add_uuid record.
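
The failure mode described above can be reproduced with a toy model. This is a sketch under stated assumptions, not the Lustre code and not necessarily what the landed patch does: the flag values are inferred from the dump (0x05 for SKIP START, 0x06 for SKIP END), devices are keyed by NID purely for illustration, and process() with its check_skip_on_add_conn switch is hypothetical.

```python
ENOENT = 2

# Marker flag bits, inferred from the dump: 0x01 START, 0x02 END, 0x04 SKIP.
CM_START, CM_END, CM_SKIP = 0x01, 0x02, 0x04


def process(records, check_skip_on_add_conn):
    """Toy model of config-record processing with a marker-driven skip state."""
    devices = set()   # devices created by add_uuid (assumption: keyed by NID)
    skipping = False  # set by a SKIP START marker, cleared by an END marker
    for kind, arg in records:
        if kind == "marker":
            if arg & CM_START:
                skipping = bool(arg & CM_SKIP)
            elif arg & CM_END:
                skipping = False
        elif kind == "add_uuid":
            if not skipping:
                devices.add(arg)
        elif kind == "add_conn":
            if check_skip_on_add_conn and skipping:
                continue  # honor the skip state, as add_uuid does
            if arg not in devices:
                return -ENOENT  # "can't find lwp device"
    return 0


RECORDS = [
    ("marker", 0x05),                # SKIP START 'add mdc'
    ("add_uuid", "10.10.10.7@o2ib"),
    ("add_conn", "10.10.10.7@o2ib"),
    ("marker", 0x06),                # SKIP END 'add mdc'
]

print(process(RECORDS, check_skip_on_add_conn=False))  # -> -2
print(process(RECORDS, check_skip_on_add_conn=True))   # -> 0
```

With the skip state ignored for add_conn, the lookup fails with -2, the same return code seen in the trace; once add_conn honors the skip state, the section is passed over cleanly.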

The workaround is to clean the SKIP records out of the client config.
The lctl clear_conf command or a writeconf should also help.

Only SKIP records for the 'add mdc' command break LWP config processing.



 Comments   
Comment by Gerrit Updater [ 07/Oct/19 ]

Alexandr Boyko (c17825@cray.com) uploaded a new patch: https://review.whamcloud.com/36391
Subject: LU-12833 obdclass: fix LWP config processing
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: f79f124a0cdce385a78cd9cf43fe73cf063bf260

Comment by Gerrit Updater [ 06/Dec/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36391/
Subject: LU-12833 obdclass: fix LWP config processing
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: e1b10b084ebc74a0ce74538caa329775181b329b

Comment by Peter Jones [ 06/Dec/19 ]

Landed for 2.14

Generated at Sat Feb 10 02:56:03 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.