report hung radostrace ops on interrupt#92
Conversation
|
test result - https://paste.ubuntu.com/p/FvpMC8XsCX/ |
| clog << "process killed" << endl; | ||
| got_sigint = 1; | ||
| } else { | ||
| exit(signum); |
There was a problem hiding this comment.
exit(3) isn't a async-signal-safe function: https://www.man7.org/linux/man-pages/man7/signal-safety.7.html
better to use _exit() instead.
The same concern is applicable for clog too. We need to review "signal safety" across the codease. But that can be done separately later.
| int found = 0; | ||
|
|
||
| memset(&key, 0, sizeof(key)); | ||
| while (bpf_map_get_next_key(map_fd, &key, &next_key) == 0) { |
There was a problem hiding this comment.
To get the first key that's not in the map, bpf_map_get_next_key needs to be called with NULL as key. But a valid "key" is passed now (key is zero-initalized but it's not NULL), so this is likely to miss the correct first key.
struct client_op_k *cur = NULL;
while (bpf_map_get_next_key(map_fd, cur, &next_key) == 0) {
if (bpf_map_lookup_elem(map_fd, &next_key, &val) == 0) {
...
}
key = next_key;
cur = &key;
}
would fix it.
|
I also think "hung ops" better describes than "hang ops", so suggest name changes as such. |
3baee40 to
aa20743
Compare
Report unfinished radostrace ops from the BPF ops map before exit when tracing stops via Ctrl+C or when the session timeout expires. - keep SIGINT on a graceful shutdown path - dump pending ops with duration, client, tid, osd, object, and rw mode - report pending ops before exiting on -t timeout - prevent false positives when ring buffer reservation cannot reserve space Signed-off-by: Zhang Hua <joshua.zhang@canonical.com>
aa20743 to
b05de17
Compare
taodd
left a comment
There was a problem hiding this comment.
We also need to add a test to deliberately cause a hang Op
| } | ||
| if (name_len > 0) { | ||
| bpf_probe_read_user(val->object_name, name_len, (void *)name_base); | ||
| } |
| return 0; | ||
| } | ||
|
|
||
| static int report_hung_ops(struct radostrace_bpf *skel, |
There was a problem hiding this comment.
We need to change the format about the hung ops output.
- Keep the same format as the complete ops output
- Add a column to indicate whether an op is complete or incomplete
Handle SIGINT without exiting immediately so radostrace can dump operations still pending in the BPF ops map before cleanup.