strace: order of <unfinished ...> and <... resumed>

try-catch-finally picture try-catch-finally · Jul 12, 2014 · Viewed 10.1k times · Source

I'm writing a script that analyzes file access traced with strace.

The trace contains some calls which have been interrupted by another process. strace shows them with <unfinished ...> and <... close resumed> (in case of an interrupted close call) markers.

[pid 26817] 12:48:22.972737 close(449 <unfinished ...>
[pid 28708] 12:48:22.972797 fcntl(451, F_SETFD, FD_CLOEXEC <unfinished ...>
[pid 26817] 12:48:22.972808 <... close resumed> ) = 0 

The process and all its threads have been traced with

strace -f -tt -p <pid>

The man page is uncertain about when the call has been finished.

If a system call is being executed and meanwhile another one is being called from a different thread/process then strace will try to preserve the order of those events and mark the ongoing call as being unfinished. When the call returns it will be marked as resumed.

While I'd assume that, natually the resumed marker will indicate that the call is now finished. I'd like to ask if it is so.

Can the above trace excerpt be reconstructed to

A

[pid 28708] 12:48:22.972797 fcntl(451, F_SETFD, FD_CLOEXEC <unfinished ...>
[pid 26817] 12:48:22.972808 close(449) = 0

or should it be reconstructed to

B

[pid 26817] 12:48:22.972737 close(449) = 0 
[pid 28708] 12:48:22.972797 fcntl(451, F_SETFD, FD_CLOEXEC <unfinished ...>

The order is crucial here since there may be multiple calls between the unfinished and resumed and one of them might do something with the file that is about to be closed at the moment.

Answer

Juan Cespedes picture Juan Cespedes · Sep 8, 2014

The system call begins when strace writes the line close(449 <unfinished ...>, and ends when it outputs <... close resumed>.

The close is not interrupted by any other call or signal: the other call is executed by another process, while the kernel is closing your file descriptor.

There is no way to know which is the exact point when the file descriptor is closed; the only thing you know is that it is not closed until the syscall is executed, and that it is closed when the syscall finishes.