Saturday, October 24, 2009

ptrace system call

Linux provides a few facilities for tracing. The ptrace system call is used to trace a process, while strace, a tool built on top of ptrace, is used to trace the system calls and signals of a process. Let’s look at the usage of ptrace here.


What is ptrace?

Using the ptrace system call, a parent process can observe and control the execution of another process, and examine and change its core image and registers.

Ptrace finds its application primarily in break-point debugging and system call tracing.


How to trace a process?

The parent forks a child, and the child makes a PTRACE_TRACEME request, allowing the parent to trace it. The child then execs the program that is to be traced by the parent (the tracer).

While the child (the exec’d program) is being traced, it stops each time a signal is delivered to it, even if the signal is being ignored. (SIGKILL is the exception: it kills the child as usual.) The parent (the tracer) is notified at its next wait() call and can then inspect and modify the child while it is stopped. The parent may then suppress the signal or deliver a different signal to the child instead. When it is done tracing, the parent can either kill the child or let it continue executing.
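
The stop-and-resume cycle described above can be sketched in a few lines. This is only an illustration: the function name observe_signal_stop and the exit code 7 are made up for the example. The child raises SIGUSR1 on itself, the parent sees the stop through waitpid(), and then resumes the child with PTRACE_CONT, passing 0 as data so that the signal is not delivered.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns the signal that stopped the child, or -1
 * on failure. *child_exit_code receives the child's exit status.
 */
int observe_signal_stop(int *child_exit_code)
{
    int status;
    pid_t child = fork();

    if (child == -1)
        return -1;
    if (child == 0) {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(1);
        raise(SIGUSR1);  /* untraced, this would terminate the process */
        _exit(7);        /* reached only if the tracer suppresses SIGUSR1 */
    }

    /* The child stops when SIGUSR1 is about to be delivered. */
    if (waitpid(child, &status, 0) == -1 || !WIFSTOPPED(status))
        return -1;
    int sig = WSTOPSIG(status);

    /* data == 0: resume the child without delivering the signal. */
    if (ptrace(PTRACE_CONT, child, NULL, NULL) == -1) {
        kill(child, SIGKILL);
        waitpid(child, NULL, 0);
        return -1;
    }

    if (waitpid(child, &status, 0) == -1)
        return -1;
    *child_exit_code = WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    return sig;
}
```

Passing a signal number instead of 0 as the last argument of PTRACE_CONT would deliver that signal to the child when it resumes.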

The parent process can also attach to an existing process and trace it using the PTRACE_ATTACH request.
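
As a rough sketch of attaching, the code below attaches to a process that never asked to be traced, waits for it to stop, and detaches again. The names attach_and_detach and attach_demo are invented for this example, and the demo attaches to its own forked child only to keep it self-contained. Attaching to an arbitrary process is subject to permission checks (same user, and on many distributions the Yama ptrace_scope setting), so it may fail with EPERM.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: attach to pid, wait for the resulting stop, then
 * detach. Returns the signal that stopped the process, or -1 on failure.
 */
int attach_and_detach(pid_t pid)
{
    int status;

    if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) == -1)
        return -1;
    /* PTRACE_ATTACH sends SIGSTOP; wait until the process has stopped. */
    if (waitpid(pid, &status, 0) == -1 || !WIFSTOPPED(status))
        return -1;
    int sig = WSTOPSIG(status);

    /* Detach and let the process run on (data == 0: no signal). */
    if (ptrace(PTRACE_DETACH, pid, NULL, NULL) == -1)
        return -1;
    return sig;
}

/* Demo: the child just sleeps and never calls PTRACE_TRACEME. */
int attach_demo(void)
{
    pid_t child = fork();

    if (child == -1)
        return -1;
    if (child == 0) {
        for (;;)
            pause();
    }
    int sig = attach_and_detach(child);
    kill(child, SIGKILL);   /* clean up the demo process */
    waitpid(child, NULL, 0);
    return sig;
}
```

On success attach_demo() returns SIGSTOP, the signal used to stop the process being attached to.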


Syntax:


#include <sys/ptrace.h>

long ptrace(enum __ptrace_request request, pid_t pid,
void *addr, void *data);


pid is the ID of the process to be traced.
request determines the action to be performed on the traced (child) process.
addr and data depend on the request, as described below. A few requests do not use addr or data.


Requests:

Let’s look at some of the important requests that can be performed on the traced process using the ptrace system call. The ptrace man page (man ptrace) describes more requests than are covered here.

PTRACE_TRACEME
Indicates that this process is to be traced by its parent. Any signal (except SIGKILL) delivered to this process will cause it to stop and its parent to be notified via wait(). Also, all subsequent calls to exec() by this process will cause a SIGTRAP to be sent to it, giving the parent a chance to gain control before the new program begins execution. A process probably shouldn’t make this request if its parent isn’t expecting to trace it. (pid, addr, and data are ignored.) This request is done only by the child process.

PTRACE_PEEKTEXT, PTRACE_PEEKDATA
Reads a word at the location addr in the child’s memory, returning the word as the result of the ptrace() call. Linux does not have separate text and data address spaces, so the two requests are currently equivalent. (The argument data is ignored.)
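
A minimal sketch of reading a word, assuming the tracer and the child come from the same fork() so that a global variable has the same address in both. The names marker and peek_child_marker and the value 0x1234 are invented for the example.

```c
#include <errno.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* After fork() this variable has the same address in parent and child. */
static volatile long marker = 0;

/* Hypothetical demo: returns the word read from the child, -1 on failure. */
long peek_child_marker(void)
{
    int status;
    pid_t child = fork();

    if (child == -1)
        return -1;
    if (child == 0) {
        marker = 0x1234;          /* changes only the child's copy */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(1);
        raise(SIGSTOP);           /* stop so the parent can look */
        _exit(0);
    }

    if (waitpid(child, &status, 0) == -1 || !WIFSTOPPED(status))
        return -1;

    errno = 0;                    /* -1 can also be a valid word */
    long word = ptrace(PTRACE_PEEKDATA, child, (void *)&marker, NULL);
    int failed = (word == -1 && errno != 0);

    kill(child, SIGKILL);         /* done with the child */
    waitpid(child, NULL, 0);
    return failed ? -1 : word;
}
```

The parent’s own copy of marker stays 0; the value 0x1234 comes from the child’s memory.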

PTRACE_PEEKUSR
Reads a word at offset addr in the child’s USER area, which holds the registers and other information about the process. Typically the offset must be word-aligned. (The argument data is ignored.)

PTRACE_POKETEXT, PTRACE_POKEDATA
Copies the word data to location addr in the child’s memory.
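
A small sketch of writing a word, under the same assumption that tracer and child share an address layout because of fork(). The names exit_word and poke_child_exit_code are invented for the example: the parent overwrites a variable in the stopped child, and the child then exits with the new value.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The child exits with this value; the parent overwrites the child's copy. */
static volatile long exit_word = 1;

/* Hypothetical demo: returns the child's exit status, or -1 on failure. */
int poke_child_exit_code(long new_value)
{
    int status;
    pid_t child = fork();

    if (child == -1)
        return -1;
    if (child == 0) {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(100);
        raise(SIGSTOP);               /* wait here for the parent's poke */
        _exit((int)exit_word);        /* re-read after being resumed */
    }

    if (waitpid(child, &status, 0) == -1 || !WIFSTOPPED(status))
        return -1;

    /* Write new_value into the child's copy, then resume the child. */
    if (ptrace(PTRACE_POKEDATA, child, (void *)&exit_word,
               (void *)new_value) == -1 ||
        ptrace(PTRACE_CONT, child, NULL, NULL) == -1) {
        kill(child, SIGKILL);
        waitpid(child, NULL, 0);
        return -1;
    }

    if (waitpid(child, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

For instance, poke_child_exit_code(42) makes the child exit with status 42 even though its own code only ever stored 1 in the variable.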

PTRACE_POKEUSR
Copies the word data to offset addr in the child’s USER area. As above, the offset must typically be word-aligned. In order to maintain the integrity of the kernel, some modifications to the USER area are disallowed.

PTRACE_GETREGS, PTRACE_GETFPREGS
Copies the child’s general purpose or floating-point registers, respectively, to location data in the parent. (addr is ignored.)

PTRACE_GETSIGINFO
Retrieve information about the signal that caused the stop. Copies a siginfo_t structure from the child to location data in the parent. (addr is ignored.)

PTRACE_SETREGS, PTRACE_SETFPREGS
Copies the child’s general purpose or floating-point registers, respectively, from location data in the parent. As for PTRACE_POKEUSR, some general purpose register modifications may be disallowed. (addr is ignored.)

PTRACE_SETSIGINFO
Set signal information. Copies a siginfo_t structure from location data in the parent to the child. This will only affect signals that would normally be delivered to the child and were caught by the tracer.

PTRACE_CONT
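Restarts the stopped child process. If data is non-zero, it is interpreted as a signal to be delivered to the child; otherwise, no signal is delivered. (addr is ignored.)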

PTRACE_SYSCALL, PTRACE_SINGLESTEP
Restarts the stopped child as for PTRACE_CONT, but arranges for the child to be stopped at the next entry to or exit from a system call, or after execution of a single instruction, respectively.
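
A sketch of PTRACE_SYSCALL that avoids reading registers, so it does not depend on the CPU architecture: it only counts how often the child stops at a system call entry or exit. The function name count_syscall_stops is invented, and the child’s getpid call is just a convenient system call to observe.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: number of syscall entry/exit stops, -1 on failure. */
long count_syscall_stops(void)
{
    int status;
    long stops = 0;
    pid_t child = fork();

    if (child == -1)
        return -1;
    if (child == 0) {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(1);
        raise(SIGSTOP);        /* let the parent take control first */
        syscall(SYS_getpid);   /* at least one system call to observe */
        _exit(0);
    }

    if (waitpid(child, &status, 0) == -1 || !WIFSTOPPED(status))
        return -1;

    for (;;) {
        /* Resume until the next syscall entry or exit (no signal). */
        if (ptrace(PTRACE_SYSCALL, child, NULL, NULL) == -1) {
            kill(child, SIGKILL);
            waitpid(child, NULL, 0);
            return -1;
        }
        if (waitpid(child, &status, 0) == -1)
            return -1;
        if (WIFEXITED(status) || WIFSIGNALED(status))
            break;             /* the child is gone */
        if (WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP)
            stops++;           /* a syscall entry or exit stop */
    }
    return stops;
}
```

Each completed system call contributes two stops, one at entry and one at exit, so the count is roughly twice the number of system calls the child made.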

PTRACE_KILL
Sends the child a SIGKILL to terminate it.

PTRACE_ATTACH
Attaches to the process specified in pid, making it a traced "child" of the current process; the behavior of the child is as if it had done a PTRACE_TRACEME.

PTRACE_DETACH
Restarts the stopped child as for PTRACE_CONT, but first detaches from the process, undoing the reparenting effect of PTRACE_ATTACH, and the effects of PTRACE_TRACEME.

Return values:

On success, PTRACE_PEEK* requests return the requested data, while other requests return zero. On error, all requests return -1, and errno is set appropriately.
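
Because a PEEK request returns the data itself, a result of -1 is ambiguous: it may be an error or a word whose value really is -1. One common way to tell them apart, sketched below with an invented wrapper name (peek_word), is to clear errno before the call and check it afterwards.

```c
#include <errno.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical wrapper: returns 0 and stores the word in *out on
 * success, otherwise returns the errno value of the failed request.
 */
int peek_word(pid_t pid, void *addr, long *out)
{
    errno = 0;
    long word = ptrace(PTRACE_PEEKDATA, pid, addr, NULL);

    if (word == -1 && errno != 0)
        return errno;      /* a real error, not the data value -1 */
    *out = word;
    return 0;
}
```

Calling peek_word() on a process that the caller is not tracing, for example its own pid, fails; the man page lists ESRCH for that case.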

Examples:

Let’s see an example that prints the number of the system call made by a child process.

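#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <sys/user.h>   /* struct user_regs_struct */

int main(void)
{
    pid_t child;
    struct user_regs_struct regs;

    child = fork();
    if (child == 0) {
        /* child: ask to be traced, then exec the program to trace */
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        execl("/bin/ls", "ls", (char *)NULL);
    } else {
        /* parent: wait for the child to stop at the exec */
        wait(NULL);
        ptrace(PTRACE_GETREGS, child, NULL, &regs);
        /* orig_eax is the 32-bit x86 field; on x86-64 it is orig_rax */
        printf("The child made a system call %ld\n", (long)regs.orig_eax);
        ptrace(PTRACE_CONT, child, NULL, NULL);
    }
    return 0;
}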

In the above program, a parent forks a child and the child executes the image of ls by calling the execl system call.


The system call numbers are defined in /usr/src/linux-2.6.23.12/arch/i386/kernel/syscall_table.S
ENTRY(sys_call_table)
.long sys_unlink /* 10 */
.long sys_execve /*exec’s system call no is 11 */


So the program should print 11 as the system call number and then list the files in the current directory.

Before executing the system call, the kernel checks whether the process is being traced. If it is, the kernel stops the process and gives control to the tracing process so it can examine and modify the traced process' registers.


Before running exec, the child calls ptrace with PTRACE_TRACEME as the first argument. This tells the kernel that the process is being traced, and when the child executes the execve system call, it hands over control to its parent. The parent waits for notification from the kernel with a wait() call. Then the parent can check the arguments of the system call or do other things, such as looking into the registers.
When the system call occurs, the kernel saves the original contents of the eax register, which contains the system call number. We can read this value from the child's USER area by calling ptrace with the first argument PTRACE_GETREGS.


After we are done examining the system call, the child can be resumed by calling ptrace with the first argument PTRACE_CONT, which lets the system call continue.
The structure user_regs_struct is defined in /usr/include/sys/user.h and contains, among others, the ebx, ecx and edx registers, which hold system call arguments, and eip, the instruction pointer. You can print those register values as well if you wish.
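
The register names above are those of 32-bit x86. As a sketch only, the function below (syscall_at_exec_stop, a name invented here) reads the saved system call number on either 32-bit or 64-bit x86 by selecting the field name at compile time; on other architectures it simply returns -2, since the register layout differs there. It kills the child at the exec stop, so ls never actually runs.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: system call number saved at the exec stop,
 * -1 on failure, -2 on architectures this sketch does not cover.
 */
long syscall_at_exec_stop(void)
{
#if defined(__x86_64__) || defined(__i386__)
    int status;
    long nr = -1;
    struct user_regs_struct regs;
    pid_t child = fork();

    if (child == -1)
        return -1;
    if (child == 0) {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(1);
        execl("/bin/ls", "ls", (char *)NULL);
        _exit(127);              /* exec failed */
    }

    /* The child stops with SIGTRAP once the exec has succeeded. */
    if (waitpid(child, &status, 0) == -1 || !WIFSTOPPED(status))
        return -1;

    if (ptrace(PTRACE_GETREGS, child, NULL, &regs) != -1) {
#if defined(__x86_64__)
        nr = (long)regs.orig_rax;   /* 64-bit name of the field */
#else
        nr = (long)regs.orig_eax;   /* 32-bit name of the field */
#endif
    }

    kill(child, SIGKILL);        /* ls is never allowed to run */
    waitpid(child, NULL, 0);
    return nr;
#else
    return -2;
#endif
}
```

The result can be compared against SYS_execve from sys/syscall.h, whose value is 11 on 32-bit x86 and differs on 64-bit x86.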


Let’s see another example, which counts the number of machine instructions executed by ls.

#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <syscall.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <errno.h>

int main(void)
{
    long long counter = 0;  /* machine instruction counter */
    int wait_val;           /* child's wait status */
    int pid;                /* child's process id */

    puts("Please wait");

    switch (pid = fork()) {
    case -1:
        perror("fork");
        break;
    case 0: /* child process starts */
        /*
         * must be called in order to allow the
         * parent to control the child process
         */
        ptrace(PTRACE_TRACEME, 0, 0, 0);
        /*
         * executes the program and causes the child to stop with
         * SIGTRAP, which notifies the parent; the parent can now
         * switch to PTRACE_SINGLESTEP
         */
        execl("/bin/ls", "ls", (char *)NULL);
        break;
        /* child process ends */
    default: /* parent process starts */
        /* parent waits for the child to stop at the exec */
        wait(&wait_val);
        /*
         * loop while the child is stopped by SIGTRAP; this is
         * equivalent to the raw wait status 1407 (0x57f)
         */
        while (WIFSTOPPED(wait_val) && WSTOPSIG(wait_val) == SIGTRAP) {
            counter++;
            /* single-step the child; report an error if that fails */
            if (ptrace(PTRACE_SINGLESTEP, pid, 0, 0) != 0)
                perror("ptrace");
            /* wait for the next instruction to complete */
            wait(&wait_val);
        }
    }
    printf("Number of machine instructions : %lld\n", counter);
    return 0;
}


Here, the parent forks a child, which execs ls. The child sets its trace flag with a PTRACE_TRACEME request, allowing the parent to trace it. The parent then single-steps the child by repeatedly calling ptrace with the PTRACE_SINGLESTEP request and the child’s pid.

You can exec your own program instead of ls to see how many machine instructions it executes; a lower count is a rough indication of a faster program.


It gets more interesting if you modify the child’s image using the other ptrace requests, such as the POKE requests to write to its memory and PTRACE_SETREGS to set its registers. You can also try inserting breakpoints into it.
