[LU-15756] obdfilter-survey test_2b: (osc_request.c:1037:osc_init_grant()) lustre-..._osc: granted 364544 but already consumed 1048576
Created: 18/Apr/22  Updated: 20/Apr/22  Resolved: 20/Apr/22

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.15.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: WC Triage
Resolution: Duplicate Votes: 0
Labels: None

Issue Links:
Duplicate
duplicates LU-14302 obdfilter-survey test 2b hangs in cas... Reopened
Severity: 3
Rank (Obsolete): 9223372036854775807

Description

This issue was created by maloo for Cliff White <cwhite@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/84394511-6981-4fb2-a676-622684e8d2b0

test_2b failed with the following error:

Timeout occurred after 164 minutes, last suite running was obdfilter-survey

Obdfilter appears to have failed with a stack dump:

[ 6088.557336] Lustre: DEBUG MARKER: == obdfilter-survey test 2b: Stripe F/S over the Network, async journal ========================================================== 22:47:10 (1649026030)
[ 7214.297009] LustreError: 7934:0:(osc_request.c:1037:osc_init_grant()) lustre-OST0006_osc: granted 364544 but already consumed 1048576
[ 7214.299054] LustreError: 7934:0:(osc_request.c:1037:osc_init_grant()) Skipped 1 previous similar message
[ 8540.211625] LustreError: 7934:0:(osc_request.c:1037:osc_init_grant()) lustre-OST0005_osc: granted 364544 but already consumed 1048576
[ 9740.964390] LustreError: 26861:0:(echo_client.c:1087:echo_device_free()) echo_client still has objects at cleanup time, wait for 1 second
[ 9741.330538] sysrq: SysRq : Changing Loglevel
[ 9741.331382] sysrq: Loglevel set to 8
[ 9741.333400] sysrq: SysRq : Show State
[ 9741.334065] task:systemd         state:S stack:    0 pid:    1 ppid:     0 flags:0x00000000
[ 9741.335391] Call Trace:
[ 9741.335874]  __schedule+0x2c4/0x700
[ 9741.336476]  schedule+0x37/0xa0
[ 9741.337026]  schedule_hrtimeout_range_clock+0xf7/0x110
[ 9741.337885]  ? seq_release+0x24/0x30
[ 9741.338495]  ep_poll+0x403/0x460
[ 9741.339059]  ? wake_up_q+0x80/0x80
[ 9741.339638]  do_epoll_wait+0xb0/0xd0
[ 9741.340253]  __x64_sys_epoll_wait+0x1a/0x20
[ 9741.340975]  do_syscall_64+0x5b/0x1a0
[ 9741.341601]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[ 9741.342437] RIP: 0033:0x7f2e1bc230f7
[ 9741.343057] Code: Unable to access opcode bytes at RIP 0x7f2e1bc230cd.
[ 9741.344111] RSP: 002b:00007ffd66015680 EFLAGS: 00000293 ORIG_RAX: 00000000000000e8
[ 9741.345324] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f2e1bc230f7
[ 9741.346468] RDX: 0000000000000066 RSI: 00007ffd660156c0 RDI: 0000000000000004
[ 9741.347607] RBP: 00007ffd660156c0 R08: 0000000000000000 R09: 00007ffd66015ce0
[ 9741.348741] R10: 00000000ffffffff R11: 0000000000000293 R12: 0000000000000066
[ 9741.349872] R13: 00000000ffffffff R14: 00007ffd660156c0 R15: 0000000000000001
[ 9741.351013] task:kthreadd        state:S stack:    0 pid:    2 ppid:     0 flags:0x80004000
[ 9741.352334] Call Trace:
[ 9741.352773]  __schedule+0x2c4/0x700
[ 9741.353386]  ? __set_cpus_allowed_ptr+0xa4/0x200
[ 9741.354160]  schedule+0x37/0xa0
[ 9741.354703]  kthreadd+0x2e8/0x300
[ 9741.355282]  ? kthread_create_on_cpu+0xa0/0xa0
[ 9741.356022]  ret_from_fork+0x35/0x40
[ 9741.356633] task:rcu_gp          state:I stack:    0 pid:    3 ppid:     2 flags:0x80004000
[ 9741.357950] Call Trace:
[ 9741.358393]  __schedule+0x2c4/0x700
[ 9741.358993]  schedule+0x37/0xa0
[ 9741.359551]  rescuer_thread+0x29e/0x340
[ 9741.360208]  ? process_one_work+0x360/0x360
[ 9741.360908]  kthread+0x116/0x130
[ 9741.361466]  ? kthread_flush_work_fn+0x10/0x10
[ 9741.362208]  ret_from_fork+0x35/0x40
[ 9741.362812] task:rcu_par_gp      state:I stack:    0 pid:    4 ppid:     2 flags:0x80004000
[ 9741.364129] Call Trace:
[ 9741.364570]  __schedule+0x2c4/0x700
[ 9741.365167]  schedule+0x37/0xa0
[ 9741.365708]  rescuer_thread+0x29e/0x340
[ 9741.366357]  ? process_one_work+0x360/0x360
[ 9741.367063]  kthread+0x116/0x130
[ 9741.367616]  ? kthread_flush_work_fn+0x10/0x10
[ 9741.368357]  ret_from_fork+0x35/0x40
[ 9741.368970] task:kworker/0:0H    state:I stack:    0 pid:    6 ppid:     2 flags:0x80004000
[ 9741.370310] Workqueue:  0x0 (events_highpri)
[ 9741.371033] Call Trace:
[ 9741.371481]  __schedule+0x2c4/0x700
[ 9741.372079]  schedule+0x37/0xa0
[ 9741.372620]  worker_thread+0xb8/0x390
[ 9741.373249]  ? create_worker+0x1a0/0x1a0
[ 9741.373911]  kthread+0x116/0x130
[ 9741.374468]  ? kthread_flush_work_fn+0x10/0x10
[ 9741.375209]  ret_from_fork+0x35/0x40
[ 9741.375811] task:mm_percpu_wq    state:I stack:    0 pid:    9 ppid:     2 flags:0x80004000
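
The osc_init_grant() messages above point to a grant-accounting inconsistency: at (re)connect time the server reported less grant (364544 bytes) than the client had already consumed (1048576 bytes). The sketch below shows only the shape of that consistency check; the type and function names (client_grant, check_init_grant) are hypothetical and not the actual Lustre source.

#include <stdio.h>

/* Hypothetical stand-in for the client-side grant bookkeeping. */
struct client_grant {
	long granted;	/* grant the server says this client holds (connect reply) */
	long consumed;	/* grant already used for dirty/in-flight pages */
};

/* Warn when the server's grant is smaller than what was already consumed. */
static void check_init_grant(const char *obd_name, const struct client_grant *cg)
{
	if (cg->consumed > cg->granted)
		fprintf(stderr, "%s: granted %ld but already consumed %ld\n",
			obd_name, cg->granted, cg->consumed);
}

int main(void)
{
	/* Values taken from the log excerpt above. */
	struct client_grant cg = { .granted = 364544, .consumed = 1048576 };

	check_init_grant("lustre-OST0006_osc", &cg);
	return 0;
}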

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
obdfilter-survey test_2b - Timeout occurred after 164 minutes, last suite running was obdfilter-survey



Comments
Comment by Oleg Drokin [ 19/Apr/22 ]

This is not a stack dump; it is a timeout that triggered a sysrq-t from the test framework to get traces from all tasks on the system.
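
For reference, the "Show State" dump above can be reproduced from user space by writing 't' to /proc/sysrq-trigger (the earlier "Loglevel set to 8" line corresponds to sysrq '8', which raises the console log level so the traces are visible). A minimal sketch, assuming root and that kernel.sysrq is enabled:

#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/proc/sysrq-trigger", "w");

	if (!f) {
		perror("fopen /proc/sysrq-trigger");
		return 1;
	}
	fputc('t', f);	/* 't' = dump the state and stack of every task to the kernel log */
	fclose(f);
	return 0;
}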
