[LU-9224] race condition between client_fid_fini() and seq_client_flush() Created: 18/Mar/17  Updated: 30/Mar/17  Resolved: 30/Mar/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.10.0

Type: Bug Priority: Major
Reporter: nasf (Inactive) Assignee: nasf (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

The failure was reported in sanity test_802 as follows:

 [34652.466072] Turning device loop1 (0x700001) read-only
 [34652.466711] Lustre: lustre-OST0000-osd: set dev_rdonly on this device
 [34653.175764] LDISKFS-fs (loop2): file extents enabled, maximum tree depth=5
 [34653.180480] LDISKFS-fs (loop2): mounted filesystem with ordered data mode. Opts: ,errors=remount-ro,no_mbcache,nodelalloc
 [34653.182471] Turning device loop2 (0x700002) read-only
 [34656.566184] Lustre: 22422:0:(client.c:2113:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1489746185/real 1489746185]  req@ffff88004c56fc40 x1562110670363184/t0(0) o8->lustre-OST0001-osc-MDT0000@0@lo:28/4 lens 520/544 e 0 to 1 dl 1489746190 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
 [34658.568693] LustreError: 3307:0:(lmv_obd.c:1395:lmv_statfs()) can't stat MDS #0 (lustre-MDT0000-mdc-ffff88002024c800), error -13
 [34658.570580] BUG: unable to handle kernel paging request at ffff88007e6d6e10
 [34658.571428] IP: [<ffffffff81700f82>] mutex_lock_nested+0xc2/0x3b0
 [34658.572016] PGD 2e75067 PUD bcc1a067 PMD bca26067 PTE 800000007e6d6060
 [34658.572635] Oops: 0002 [#1] SMP DEBUG_PAGEALLOC
 [34658.573208] Modules linked in: obdecho(OE) lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lfsck(OE) mgc(OE) lov(OE) osc(OE) mdc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) brd ext4 loop mbcache jbd2 rpcsec_gss_krb5 syscopyarea ata_generic sysfillrect pata_acpi sysimgblt ttm drm_kms_helper drm ata_piix i2c_piix4 pcspkr serio_raw virtio_console virtio_blk virtio_balloon i2c_core libata floppy nfsd ip_tables [last unloaded: obdecho]
 [34658.577701] CPU: 2 PID: 22422 Comm: ptlrpcd_rcv Tainted: G        W  OE  ------------   3.10.0-debug #1
 [34658.578784] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
 [34658.579352] task: ffff8800ad87eac0 ti: ffff88006ae90000 task.ti: ffff88006ae90000
 [34658.580395] RIP: 0010:[<ffffffff81700f82>]  [<ffffffff81700f82>] mutex_lock_nested+0xc2/0x3b0
 [34658.581479] RSP: 0018:ffff88006ae93a48  EFLAGS: 00010046
 [34658.582032] RAX: 0000000000020000 RBX: ffff8800ad87eac0 RCX: 0000000000000000
 [34658.582627] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000246
 [34658.583219] RBP: ffff88006ae93ab8 R08: 0000000000000000 R09: 0000000000000000
 [34658.583809] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800ad87eac0
 [34658.584423] R13: ffff88007e6d6e08 R14: ffff88007e6d6e10 R15: 0000000000000246
 [34658.585012] FS:  0000000000000000(0000) GS:ffff8800bc700000(0000) knlGS:0000000000000000
 [34658.586634] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
 [34658.587204] CR2: ffff88007e6d6e10 CR3: 00000000487fb000 CR4: 00000000000006e0
 [34658.587796] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 [34658.588386] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
 [34658.588972] Stack:
 [34658.589472]  ffffffffa010b059 ffff880071feb000 ffff88006ae90000 ffff880071feb0d8
 [34658.590546]  ffff8800b0955c48 ffff88006ae93ad0 ffffffffa022fee7 0000000000000010
 [34658.591620]  0000000006b1ecbe ffff8800ad87eac0 ffff88007e6d6e08 ffff880053b03c40
 [34658.592692] Call Trace:
 [34658.593207]  [<ffffffffa010b059>] ? seq_client_flush+0x59/0x170 [fid]
 [34658.593799]  [<ffffffffa022fee7>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
 [34658.594384]  [<ffffffffa010b059>] seq_client_flush+0x59/0x170 [fid]
 [34658.594961]  [<ffffffff810b7ce0>] ? wake_up_state+0x20/0x20
 [34658.595534]  [<ffffffffa0783341>] mdc_import_event+0x91/0xa00 [mdc]
 [34658.596149]  [<ffffffffa05d9b1d>] ptlrpc_deactivate_and_unlock_import+0xdd/0x3e0 [ptlrpc]
 [34658.597244]  [<ffffffffa05ddc1e>] ptlrpc_connect_interpret+0x4ae/0x26f0 [ptlrpc]
 [34658.598391]  [<ffffffffa05bb404>] ? ptlrpc_unregister_bulk+0xc4/0x7c0 [ptlrpc]
 [34658.599529]  [<ffffffff817063f7>] ? _raw_spin_unlock+0x27/0x40
 [34658.600122]  [<ffffffffa05b3159>] ? after_reply+0x9a9/0xf50 [ptlrpc]
 [34658.600730]  [<ffffffffa05b4728>] ? ptlrpc_check_set.part.21+0x1028/0x1e80 [ptlrpc]
 [34658.601815]  [<ffffffffa05b3b77>] ptlrpc_check_set.part.21+0x477/0x1e80 [ptlrpc]
 [34658.602866]  [<ffffffff81391dd7>] ? debug_object_free+0x127/0x180
 [34658.603439]  [<ffffffff81700989>] ? schedule_timeout+0x179/0x2a0
 [34658.604036]  [<ffffffffa05b55db>] ptlrpc_check_set+0x5b/0xe0 [ptlrpc]
 [34658.604651]  [<ffffffffa05e138b>] ptlrpcd_check+0x4bb/0x570 [ptlrpc]
 [34658.605262]  [<ffffffffa05e16fb>] ptlrpcd+0x2bb/0x580 [ptlrpc]
 [34658.605834]  [<ffffffff810b7ce0>] ? wake_up_state+0x20/0x20
 [34658.606434]  [<ffffffffa05e1440>] ? ptlrpcd_check+0x570/0x570 [ptlrpc]
 [34658.607018]  [<ffffffff810a2eda>] kthread+0xea/0xf0
 [34658.607574]  [<ffffffff810a2df0>] ? kthread_create_on_node+0x140/0x140
 [34658.608155]  [<ffffffff8170fbd8>] ret_from_fork+0x58/0x90
 [34658.608718]  [<ffffffff810a2df0>] ? kthread_create_on_node+0x140/0x140
 [34658.609297] Code: 41 f7 87 44 c0 ff ff 00 ff ff 07 0f 85 b8 02 00 00 9c 58 0f 1f 44 00 00 49 89 c7 fa 66 0f 1f 44 00 00 b8 00 00 02 00 4d 8d 75 08 <f0> 41 0f c1 45 08 89 c2 c1 ea 10 66 39 c2 89 d1 0f 85 69 02 00 
 [34658.611497] RIP  [<ffffffff81700f82>] mutex_lock_nested+0xc2/0x3b0
 [34658.612077]  RSP <ffff88006ae93a48>
 [34658.612605] CR2: ffff88007e6d6e10


 Comments   
Comment by nasf (Inactive) [ 18/Mar/17 ]

When a client mount fails, or when the client is unmounted, client_fid_fini() is called. At the same time, an asynchronous connection failure may trigger seq_client_flush(). Unfortunately, the two can race: the parameter passed to seq_client_flush() may already have been released by client_fid_fini().

Comment by Gerrit Updater [ 18/Mar/17 ]

Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/26079
Subject: LU-9224 fid: race between client_fid_fini and seq_client_flush
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: d7959890967eaca7d0a6076622d5e2f3d015fa5d

Comment by Gerrit Updater [ 30/Mar/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/26079/
Subject: LU-9224 fid: race between client_fid_fini and seq_client_flush
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: c0d30fc79be0788fca19bfa7e3b5048881195fc6

Generated at Sat Feb 10 02:24:18 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.