Conversation
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
2b6011b to
2606352
Compare
2606352 to
807b2a9
Compare
| pub fn is_creation(&self) -> bool { | ||
| matches!(self.file, FileData::Creation(_)) | ||
| } | ||
|
|
||
| pub fn is_unlink(&self) -> bool { | ||
| matches!(self.file, FileData::Unlink(_)) | ||
| } | ||
|
|
There was a problem hiding this comment.
There has to be an idiomatic way of dispatching to a handling function depending on the type ; a match could be a way, but the is_creation() function wasn't going in that direction.
@Molter73 , what would you advise ?
There was a problem hiding this comment.
One difficulty is that the 'unlink' handler has to be called "after" the host path is set, and the 'creation' handler has to be called "before" the host path.
There was a problem hiding this comment.
We can do a match at the call site, but that would require us to make the file field public in the event and it might not be something we want to do.
Another alternative could be to move the check itself to the associated handle_* method, but that means we need to make that call for all events, regardless of the type they are. This might be the way to go if it doesn't impact performance too much
With the current approach we could still expand usability of the method, imagine in the future we need to call the method protected by this check for rename events as well, we can simply rename the method and add the additional check to something like this:
pub fn is_path_removed(&self) -> bool {
matches!(self.file, FileData::Unlink(_) | FileData::Rename(_))
}For the time being, I wouldn't worry too much about it, we can revisit it once the code evolves a bit more to see what pattern emerges.
| with open(original_file, 'w') as f: | ||
| f.write('This is a test') | ||
|
|
||
| # Create two hardlinks in the unmonitored directory | ||
| hardlink_file1 = os.path.join(ignored_dir, 'hardlink1.txt') | ||
| os.link(original_file, hardlink_file1) | ||
|
|
||
| hardlink_file2 = os.path.join(ignored_dir, 'hardlink2.txt') | ||
| os.link(original_file, hardlink_file2) | ||
|
|
||
| os.remove(hardlink_file1) | ||
| os.remove(hardlink_file2) |
There was a problem hiding this comment.
This seems to be flaky. The inode of the original file may not have landed in the probe's map before we start deleting.
I will split the test to wait for the CREATION event before going on with the unlinks.
There was a problem hiding this comment.
This shows that we have a bit of a problem in our code, if a file is created and removed we will miss the host path for it, I wonder if that is related to a check we do on the filesystem and if we should just ignore that and proceed to add the host path regardless of whether we see it in the filesystem or not 🤔
I think I originally put that check in place because I was worried inodes for a file that was removed would linger if we missed the unlink event, but I'm fairly certain the scan should take care of cleaning up.
There was a problem hiding this comment.
I wonder if that is related to a check we do on the filesystem
Do you mean that you suspect that this is adding to the delay before the host path is added to the inode map ?
Even if we try to add the information as soon as possible the race-condition is there, and I don't see any way we could avoid it completely.
There was a problem hiding this comment.
You shouldn't have a race condition though:
- The call to create the file is done.
- The file_open LSM hook is called and an event is put in the ringbuffer.
- The call to create the file finishes and immediately begins the operation for removing it.
- The path_unlink LSM hook is called and an event is put in the ringbuffer.
The fact we have a single ringbuffer ensures userspace will see the two events in order so we should have the host path when the unlink event comes around.
I was under the incorrect impression this check would be in the path of the file creation event, but it is not, so I don't really see where the race condition would be. I'll try to debug this a bit with a fresher head on Monday.
87bffaf to
fb7d100
Compare
Molter73
left a comment
There was a problem hiding this comment.
Overall looks good, have a couple small comments, but overall we should be good to merge soon
| if self.inode_map.borrow_mut().remove(inode).is_some() { | ||
| self.metrics.scan_inc(ScanLabels::InodeRemoved); | ||
| } |
There was a problem hiding this comment.
Do we want to have a metric for when the removal fails? It shouldn't happen, but it would be nice to know if it does.
There was a problem hiding this comment.
Well, it would happen if a customer monitors files with more than one link. I agree that it would be good to know if we are in this case ; as the behavior of fact would probably be quite erratic.
| with open(original_file, 'w') as f: | ||
| f.write('This is a test') | ||
|
|
||
| # Create two hardlinks in the unmonitored directory | ||
| hardlink_file1 = os.path.join(ignored_dir, 'hardlink1.txt') | ||
| os.link(original_file, hardlink_file1) | ||
|
|
||
| hardlink_file2 = os.path.join(ignored_dir, 'hardlink2.txt') | ||
| os.link(original_file, hardlink_file2) | ||
|
|
||
| os.remove(hardlink_file1) | ||
| os.remove(hardlink_file2) |
There was a problem hiding this comment.
This shows that we have a bit of a problem in our code, if a file is created and removed we will miss the host path for it, I wonder if that is related to a check we do on the filesystem and if we should just ignore that and proceed to add the host path regardless of whether we see it in the filesystem or not 🤔
I think I originally put that check in place because I was worried inodes for a file that was removed would linger if we missed the unlink event, but I'm fairly certain the scan should take care of cleaning up.
| event.set_old_host_path(host_path); | ||
| } | ||
|
|
||
| // Special handling for unlink events |
There was a problem hiding this comment.
What makes the handling special?
There was a problem hiding this comment.
What I want to express is that "this is the specific things we have to do because this event is a deletion".
| Ok(()) | ||
| } | ||
|
|
||
| /// Special handling for unlink events. |
There was a problem hiding this comment.
What makes the handling special?
| # Create a guard file to ensure all events have been processed | ||
| guard_file = os.path.join(monitored_dir, 'guard.txt') | ||
| with open(guard_file, 'w') as f: | ||
| f.write('guard') |
There was a problem hiding this comment.
This is probably a pattern we could adopt directly in the wait_events method.
Description
When a file is unlinked, the corresponding inode should be removed from the kernel inode map and also from the inode->path map in userland.
Checklist
Automated testing
If any of these don't apply, please comment below.
Testing Performed
TODO(replace-me)
Use this space to explain how you tested your PR, or, if you didn't test it, why you did not do so. (Valid reasons include "CI is sufficient" or "No testable changes")
In addition to reviewing your code, reviewers must also review your testing instructions, and make sure they are sufficient.
For more details, ref the Confluence page about this section.