[LU-15012] Unreplayed open leads to version mismatch Created: 17/Sep/21  Updated: 06/Jan/22

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Andriy Skulysh Assignee: Andriy Skulysh
Resolution: Unresolved Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   
mdc_close() matches the open request for the file:

00000002:00100000:1.0:1599093179.305780:0:16802:0:(mdc_request.c:911:mdc_close()) @@@ matched open  req@ffff9868388f9b00 x1676760733129664/t30064836004(30064836004) o101->lustre-MDT0001-mdc-ffff986938cf2800@192.168.2.11@tcp:12/10 lens 760/840 e 0 to 0 dl 1599093229 ref 1 fl Complete:RP/4/ffffffff rc 0/-1 job:'cp.0'

An uncommitted close removes the open request from the replay list:

00000100:00080000:0.0:1599093302.301396:0:12330:0:(import.c:86:import_set_state_nolock()) ffff9868364e1800 lustre-MDT0001_UUID: changing import state from CONNECTING to REPLAY
00000100:00080000:0.0:1599093302.301439:0:12330:0:(import.c:1568:ptlrpc_import_recovery_state_machine()) replay requested by lustre-MDT0001_UUID
00000100:00100000:0.0:1599093302.301441:0:12330:0:(client.c:2795:ptlrpc_free_committed()) lustre-MDT0001-mdc-ffff986938cf2800: committing for last_committed 30064836076 gen 1
00000100:00100000:0.0:1599093302.301444:0:12330:0:(client.c:2821:ptlrpc_free_committed()) @@@ stopping search  req@ffff98683a193180 x1676760733148032/t30064836094(30064836094) o36->lustre-MDT0001-mdc-ffff986938cf2800@192.168.2.12@tcp:12/10 lens 488/456 e 0 to 0 dl 1599093230 ref 1 fl Complete:R/4/0 rc 0/0 job:'cp.0'
00000100:00100000:0.0:1599093302.301455:0:12330:0:(client.c:2848:ptlrpc_free_committed()) @@@ free closed open request  req@ffff9868388f9b00 x1676760733129664/t30064836004(30064836004) o101->lustre-MDT0001-mdc-ffff986938cf2800@192.168.2.12@tcp:12/10 lens 760/840 e 0 to 0 dl 1599093229 ref 1 fl Complete:R/4/ffffffff rc 0/-1 job:'cp.0'
00000100:00000040:0.0:1599093302.301464:0:12330:0:(client.c:2604:__ptlrpc_req_finished()) @@@ refcount now 1  req@ffff9868388f9b00 x1676760733129664/t30064836004(30064836004) o101->lustre-MDT0001-mdc-ffff986938cf2800@192.168.2.12@tcp:12/10 lens 760/840 e 0 to 0 dl 1599093229 ref 2 fl Complete:RM/4/ffffffff rc 0/-1 job:'cp.0'
00000100:00080000:0.0:1599093302.301469:0:12330:0:(recover.c:88:ptlrpc_replay_next()) import ffff9868364e1800 from lustre-MDT0001_UUID committed 30064836076 last 0

So an unlink replayed from another client destroys the file (moves it to orphan):

00000020:00000040:0.0:1599093308.803564:0:2607:0:(tgt_handler.c:579:tgt_handle_recovery()) @@@ Got new replay  req@ffff930c9b201200 x1676760723796928/t0(30064836133) o36->d148e573-bac7-d122-a32d-19499a53d6da@192.168.2.20@tcp:338/0 lens 488/0 e 0 to 0 dl 1599093358 ref 1 fl Complete:/4/ffffffff rc 0/-1 job:'rm.0'
00000004:00080000:0.0:1599093308.803938:0:2607:0:(mdd_dir.c:1547:mdd_finish_unlink([0x240000bd3:0x5d27:0x0]  open count = 0 is dir 0

and all other replayed requests for the file fail the version check:

00000100:00000400:0.0:1599093308.966908:0:12330:0:(client.c:3045:ptlrpc_replay_interpret()) @@@ Version mismatch during replay  req@ffff98683fedd200 x1676760733164352/t30064836142(30064836142) o36->lustre-MDT0001-mdc-ffff986938cf2800@192.168.2.12@tcp:12/10 lens 544/440 e 0 to 0 dl 1599093359 ref 2 fl Interpret:R/4/0 rc -75/-75 job:'cp.0'


 Comments   
Comment by Gerrit Updater [ 17/Sep/21 ]

"Andriy Skulysh <andriy.skulysh@hpe.com>" uploaded a new patch: https://review.whamcloud.com/44965
Subject: LU-15012 llite: Unreplayed open leads to version mismatch
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: ca4cda6f394ba2189028c7f5caf7a9208fe91ee3

Generated at Sat Feb 10 03:14:42 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.