How process input and output works in Linux
With a basic knowledge of stdout, stderr, and stdin, you can usually get through your daily tasks, but by learning a little more about how they work under the hood, you can put them to more interesting and powerful uses.
Let's start by looking at how commands and the shell really work, and then we'll move on to manipulating the input and output of any running process.
The bash shell (or a command) and its input and output
Contrary to popular belief, the bash redirection operators do not need to appear at the end of the line; they may appear anywhere on the line, even mixed in with the command line arguments. The following is perfectly valid:
[root@zork1 scratch]# 2>err ls -l 1>out foo nofile
and is equivalent to:
[root@zork1 scratch]# ls -l foo nofile 1>out 2>err
Both commands produce the same results: a long listing for the existing file "foo" is placed in a file called "out" and an error message regarding non-existing file "nofile" is placed into a file called "err."
What the bash redirection operators actually do is replace and/or duplicate the file descriptors of the command being run. In this case the command is "ls": bash forks a child process, sets up the redirections in that child, and then the child calls execve() to run the ls program. File descriptors survive execve(), so ls starts with them already in place and keeps them for as long as it runs. There is a file descriptor for standard input (0), standard output (1), and standard error (2). You can see them by determining the process id of the program and listing the /proc/[pid]/fd/ directory. You will see that the files "0", "1", and "2" are symbolic links to other files on the system, such as the device files associated with pseudo-terminals (your screen). In our example, fd 1 is changed to point to the file "out" instead of the terminal screen, and fd 2 is likewise changed to point to the file "err".
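You can check this without hunting for a pid by using /proc/self/fd, which always refers to the process that is looking at it. The sketch below assumes Linux with /proc mounted; the temporary directory is just scaffolding. It runs ls on its own descriptor directory with fd 1 and fd 2 redirected, so the listing that lands in "out" shows where fd 1 and fd 2 were pointing while ls ran.

```shell
#!/bin/sh
# Minimal sketch: let ls list its own file descriptor table while redirected.
# Assumes Linux with /proc mounted; the temp directory is only scaffolding.
dir=$(cd "$(mktemp -d)" && pwd -P)   # resolve symlinks so paths match /proc
cd "$dir" || exit 1

# /proc/self is ls itself here, so its fd 1 and fd 2 already point at
# "out" and "err": the shell set them up before ls started running.
ls -l /proc/self/fd 1>out 2>err

cat out
```

In the listing, the lines for "1" and "2" end in the paths of "out" and "err".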
If you want to have a little fun, open a couple of different windows (I recommend screen or, even better, tmux), use the "tty" command to determine the terminal device associated with each window, and then, from within one window, launch a new bash process and redirect its standard error to the other window, as follows:
2>/dev/pts/2 bash
This launches a bash shell whose error output is sent to the second window. Now type commands in the new bash shell and see what happens. Standard error should appear in the other window and standard output should appear in the original window. Because interactive bash writes its prompt and the echo of what you type to standard error, those show up in the other window as well. Determine the pid of the new shell, list its /proc/[pid]/fd directory, and see what you find there.
Now a word on pipes. Like the redirection operators, a pipe also modifies file descriptors. If you have "command1 | command2", the pipe changes the stdout (fd 1) of command1 to point to the write end of a new pipe. At the same time, it changes the stdin (fd 0) of command2 to point to the read end of that same pipe. This happens before any other redirection operators are evaluated. You can see this for yourself by typing the following commands:
[root@zork1 ~]# bash | cat -
[root@zork1 ~]# tty
/dev/pts/1
[root@zork1 ~]# ps -ef | grep bash | grep pts\/1
root 1515 1502 0 Jul29 pts/1 00:00:00 -bash
root 3294 1515 0 13:13 pts/1 00:00:00 bash
root 3304 3294 0 13:13 pts/1 00:00:00 grep bash
[root@zork1 ~]# ls -l /proc/3294/fd
total 0
lrwx------. 1 root root 64 Jul 30 13:14 0 -> /dev/pts/1
l-wx------. 1 root root 64 Jul 30 13:14 1 -> pipe:
lrwx------. 1 root root 64 Jul 30 13:13 2 -> /dev/pts/1
lrwx------. 1 root root 64 Jul 30 13:14 255 -> /dev/pts/1
[root@zork1 ~]# ps -ef | grep cat
root 3295 1515 0 13:13 pts/1 00:00:00 cat -
root 3308 3294 0 13:14 pts/1 00:00:00 grep cat
[root@zork1 ~]# ls -l /proc/3295/fd
total 0
lr-x------. 1 root root 64 Jul 30 13:14 0 -> pipe:
lrwx------. 1 root root 64 Jul 30 13:14 1 -> /dev/pts/1
lrwx------. 1 root root 64 Jul 30 13:13 2 -> /dev/pts/1
[root@zork1 ~]#
You see that fd 1 for bash and fd 0 for cat both point to the same pipe.
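The same thing can be observed non-interactively. In this sketch (again assuming Linux with /proc mounted), ls lists its own descriptors while its stdout feeds cat, so fd 1 shows up as a "pipe:" entry rather than a terminal or a file, just like fd 1 of the inner bash in the session above.

```shell
#!/bin/sh
# Minimal sketch: the left side of a pipeline has fd 1 pointing at a pipe.
# Assumes Linux with /proc mounted; the temp file is only scaffolding.
out=$(mktemp)

# ls describes its own fd table; cat simply copies what arrives on the pipe.
ls -l /proc/self/fd | cat - >"$out"

cat "$out"
```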
It's important to realize that the redirection operators are evaluated from left to right. This matters most when you want to swap stdout and stderr.
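A minimal sketch of why the order matters, using the same "foo" and "nofile" names as the examples in this section (the temporary directory is scaffolding): the same two redirections, written in two different orders, send the error message to two different places.

```shell
#!/bin/sh
# Minimal sketch: the same two redirections in two different orders.
# Runs in a throwaway directory; "foo" exists, "nofile" does not.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch foo

# fd 1 -> out1 first, then fd 2 copies fd 1: both streams land in out1.
ls foo nofile >out1 2>&1

# fd 2 copies fd 1 while fd 1 is still this script's stdout, and only then
# does fd 1 move to out2: the error goes to stdout, and out2 gets only "foo".
ls foo nofile 2>&1 >out2
```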
Open a new bash shell, cd into an empty directory, and issue the following commands:
[root@zork1 scratch]# touch foo
[root@zork1 scratch]# ls foo nofile
ls: cannot access nofile: No such file or directory
foo
[root@zork1 scratch]#
Both standard output and standard error are going to the same place: the local terminal screen. Now issue the following commands:
[root@zork1 scratch]# tty
/dev/pts/2
[root@zork1 scratch]# 3>&2 2>&1 1>&3 3>&- ls foo nofile
ls: cannot access nofile: No such file or directory
foo
[root@zork1 scratch]#
You see the same output as before, but something different must be going on because we are using a lot of redirects. Evaluating from left to right: we create a new file descriptor 3 that points to the same place fd 2 currently points (/dev/pts/2); then we make fd 2 point to where fd 1 currently points (also /dev/pts/2); then we make fd 1 point to where fd 3 currently points (again /dev/pts/2). Finally, we close fd 3 for good measure, since we aren't using it anymore. We have just swapped standard output and standard error, but because everything still goes to the same place (/dev/pts/2), we don't notice any difference in the output. So we didn't really achieve anything, except that now we know how to swap stdout and stderr.
That can be useful. Consider the following:
[root@zork1 scratch]# 3>&2 2>&1 1>&3 3>&- ls foo nofile | cat - >out
foo
[root@zork1 scratch]# cat out
ls: cannot access nofile: No such file or directory
[root@zork1 scratch]#
The first thing to happen is the creation of a pipe; fd 1 for "ls" and fd 0 for "cat" are both pointed to this pipe. Then, on the "ls" side: a new file descriptor 3 is created and pointed to where fd 2 points, which is /dev/pts/2. Then fd 2 is pointed to where fd 1 currently points, which is the pipe. Then fd 1 is pointed to where fd 3 points, which is /dev/pts/2. Then fd 3 is closed, and the "ls" command is run. Its error output (fd 2) goes into the pipe, and its standard output (fd 1) goes to the screen (/dev/pts/2). Meanwhile, on the "cat" side, standard input is provided by the pipe, and standard output is redirected to a file called "out". Thus cat receives the error output from "ls" on the pipe as its standard input (fd 0) and copies it to its output, which in this case is the file "out". In summary: the standard output from "ls" is echoed to the screen, and its error output is piped into the cat command on the right-hand side of the pipe.
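The swap pipeline above can be replayed in a script. As an assumption made purely so the result can be inspected, the outer braces point fd 2 at a file named "screen", which stands in for the terminal; everything inside the braces is the command from the session above.

```shell
#!/bin/sh
# Minimal sketch of the fd-3 swap feeding stderr into a pipe.
# The file "screen" stands in for the terminal: the outer braces point
# fd 2 at it so that we can inspect what would have been displayed.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch foo

{
  # Save fd 2 in fd 3, point fd 2 at the pipe (fd 1), then point fd 1 at
  # the saved fd 3 and close fd 3: stdout and stderr have traded places.
  3>&2 2>&1 1>&3 3>&- ls foo nofile | cat - >out
} 2>screen

cat screen   # the regular listing ("foo")
cat out      # the error message that travelled through the pipe
```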
Indeed, we can simplify the above command without creating fd 3: if we change fd 2 first, we are then free to direct fd 1 in any manner we wish. For example:
[root@zork1 scratch]# 2>&1 ls foo nofile | cat - >out
will have both stdout and stderr going into the pipe;
[root@zork1 scratch]# 2>&1 1>lsout ls foo nofile | cat - >out
will put stdout into a file called "lsout", and stderr will go into the pipe;
[root@zork1 scratch]# 2>&1 1>/dev/null ls foo nofile | cat - >out
will make stdout disappear, and stderr will go into the pipe.
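The three variants can be checked side by side. This sketch runs them in a throwaway directory and, unlike the session above, gives each pipeline its own output file ("both", "errs", "errs2" are illustrative names) so the results don't overwrite each other.

```shell
#!/bin/sh
# Minimal sketch of the three variants, each writing the pipe's contents
# to its own file instead of reusing "out".
dir=$(mktemp -d)
cd "$dir" || exit 1
touch foo

2>&1 ls foo nofile | cat - >both                # stdout and stderr into the pipe
2>&1 1>lsout ls foo nofile | cat - >errs        # stdout to lsout, stderr into the pipe
2>&1 1>/dev/null ls foo nofile | cat - >errs2   # stdout discarded, stderr into the pipe
```

In each case fd 2 is set first, while fd 1 is still the pipe; only afterwards is fd 1 sent somewhere else.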
If you keep in mind that most programs and scripts are designed to write their standard output to fd 1 and their standard error to fd 2 (as long as the programmer follows good coding practice), you have mechanisms for dealing with those two streams as you see fit. For example, a common question is "I would like to swap stderr and stdout so that I can pipe stderr into another command." As we saw above, you shouldn't really think of it as "swapping"; instead, think of it as (a) pointing fd 2 at the pipe by duplicating fd 1, and then (b) pointing fd 1 at something other than the pipe (/dev/null, your terminal screen, or a file).
Once you understand this process, many possibilities open up. For example, if you would like to pipe stdout into one command and stderr into a different command, you can do so by creating your own pipes and pointing the file descriptors at them, as shown in the next section.
Using pipes to capture different output streams
Since you know now that pipes and redirection operators are nothing more than manipulation of file descriptors, you can create and use your own pipes as you see fit. Let's run a command, and pipe its standard output into another command, and its standard error into yet a different command.
- First let's open up three tmux windows
- Next let's create two pipes
[root@zork1 scratch]# mkfifo /tmp/myfifo1; mkfifo /tmp/myfifo2
(you may also use "mknod /tmp/myfifo p" if you like)
- Now in window1 launch an ls command whose standard output goes into pipe1 and whose standard error goes into pipe2:
[root@zork1 scratch]# 1>/tmp/myfifo1 2>/tmp/myfifo2 ls foo nofile
When you hit return, nothing happens: opening a FIFO for writing blocks until some process opens it for reading, so the shell is still waiting on /tmp/myfifo1 and ls has not even started yet.
- In window 2, launch a cat process whose standard input is the first pipe:
[root@zork1 scratch]# </tmp/myfifo1 cat -
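This also appears to hang: cat has nothing to read yet, because window 1 can now open the first FIFO but is still blocked opening the second one, so ls has not run.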
- In window 3, launch a second cat process that reads from the second pipe:
[root@zork1 scratch]# </tmp/myfifo2 cat -
Hitting return on this process will complete the chain, and all three processes will terminate.
As you can see in the following picture, the expected results are that no output from the ls command appears in window 1 (upper left); the standard output from the ls command appears in window 2 (upper right); and the standard error from the ls command appears in window 3 (bottom). So now you know how to treat stderr and stdout independently and use them in any manner you see fit.
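The three-window experiment can also be condensed into one script, with two background cat processes standing in for windows 2 and 3 and plain files standing in for their screens. The FIFOs live in a temporary directory rather than at /tmp/myfifo1 and /tmp/myfifo2; the file names are illustrative assumptions.

```shell
#!/bin/sh
# Minimal sketch: split stdout and stderr through two named pipes in one
# script, with background readers standing in for windows 2 and 3.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch foo
mkfifo myfifo1 myfifo2

# Readers first ("windows" 2 and 3); each open blocks until a writer appears.
<myfifo1 cat - >from_stdout &
<myfifo2 cat - >from_stderr &

# Writer ("window" 1): stdout into the first FIFO, stderr into the second.
1>myfifo1 2>myfifo2 ls foo nofile

wait   # both cats see end-of-file once ls exits, and then terminate
cat from_stdout from_stderr
```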
stderr to screen, stdout and stderr to file
Here is how to run one or more commands, capturing the standard output and error, in the order in which they are generated, to a logfile, while displaying only the standard error on any terminal screen you like.
- Open two windows (shells)
- Create some test files:
touch /tmp/foo /tmp/foo1 /tmp/foo2
- In window 1:
mkfifo /tmp/fifo
</tmp/fifo cat - >/tmp/logfile
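- Then, in window 2:
(ls -l /tmp/foo /tmp/nofile /tmp/foo1 /tmp/nofile /tmp/nofile; echo successful test; ls /tmp/nofile1111) 2>&1 1>/tmp/fifo | tee /tmp/fifo 1>/dev/pts/1
The subshell runs some "ls" and "echo" commands in sequence, some of which succeed (producing stdout) and some of which fail (producing stderr), in order to generate a mingled stream of output and error messages so that you can check the ordering in the log file. The redirections follow the same rules as before: fd 2 of the subshell is pointed at the pipe, fd 1 is pointed at the FIFO, and tee copies the error stream both into the FIFO and to /dev/pts/1 (use whichever terminal should display the errors). Because the error messages take a detour through tee, one can occasionally land in the log slightly after output that was produced later, so the ordering is usually, but not strictly, preserved.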
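Here is the same recipe condensed into one script. A background cat stands in for window 1, and a file named "screen" stands in for /dev/pts/1 so the result can be inspected; both are assumptions for the sake of the sketch. One deliberate deviation: the script holds an extra write end of the FIFO open (fd 3) so that cat cannot see end-of-file early if the subshell finishes before tee has opened the FIFO.

```shell
#!/bin/sh
# Minimal sketch of the logfile recipe in one script. A background reader
# stands in for window 1; the file "screen" stands in for /dev/pts/1.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch foo foo1 foo2
mkfifo fifo

<fifo cat - >logfile &   # "window 1": everything written to the FIFO ends up in logfile
exec 3>fifo              # hold a write end open so cat cannot hit EOF too early

# fd 2 copies fd 1 (already the pipe), then fd 1 moves to the FIFO;
# tee copies the stderr stream into the FIFO and onto "screen".
(ls -l foo nofile foo1; echo successful test; ls nofile1111) 2>&1 1>fifo | tee fifo 1>screen

exec 3>&-                # drop our write end; cat exits once the FIFO drains
wait
cat logfile
```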
How to take control of a running process's stdin/stdout/stderr
The discussion above prepares us for an even more interesting task: manipulating the input and output of processes that are already running.
Let's say you are physically at the console of your Linux server: you log in, run a bunch of commands, and walk away without closing the shell (let's ignore any security implications). Later, away from the server room, you log in remotely and would like to view the command history from earlier; perhaps you need to know exactly which options you used for a particular program. How do you retrieve this command history? This is a bit of a dilemma because:
- You don't have physical access to the console
- The bash shell is still running which means it holds the command history in memory (it has not yet written to .bash_history)
- There is no signal you can send to the bash process to cause it to dump its history
- Even if you could dump the history, there's no guarantee it won't be overwritten by other bash processes on the system before you have a chance to review it
The way to deal with this is to identify the bash process in question (the target shell) and, as long as you have access to another shell on the system, whether via ssh or other means, take control of the standard input and output of the target shell. Then you can issue a "history" command, which retrieves the history from memory and displays it. You can also issue any other commands you like, because you are now interacting with the target shell. In layman's terms, you "redirect the standard input and output" of the target shell to the shell you have access to.
This procedure actually allows you to control the input and output of any running process on the system - shell or otherwise - which means its uses extend beyond our example. As long as you understand the principle, you can modify it to suit your needs.
The procedure works on Linux systems and requires gdb (the GNU debugger). GDB can attach to a running process and make it call functions; in our case we use that ability to have the target call open() and dup2(), replacing the file descriptors for its stdin, stdout, and stderr. Keep in mind that when GDB attaches to a process, it suspends execution of the process in the same way as SIGSTOP, and when GDB detaches, execution resumes in the same way as SIGCONT.
For any process on a Linux system, if you know its pid, you may examine its file descriptors with "ls -l /proc/[pid]/fd/". This brings up a question: if the filesystem gives you access to the file descriptors, why don't you just modify the files in /proc/[pid]/fd/ instead of using GDB? Two reasons:
- The kernel does not let you. If you try to modify those files, regardless of whether you own them or are root, you will get "Permission denied."
- More importantly, you should not be modifying the file descriptors of a running process anyway. You first need to suspend the process, whether by GDB, SIGSTOP, or Ctrl-Z (which sends SIGTSTP, the terminal's catchable counterpart of SIGSTOP), and then make your changes.
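Suspending and resuming can be tried safely on a throwaway process. This sketch is Linux-specific, since it reads the state letter from field 3 of /proc/[pid]/stat, and proc_state is a hypothetical helper name. It stops a background sleep with SIGSTOP and resumes it with SIGCONT, which is the same pause and resume that GDB's attach and detach impose on the target.

```shell
#!/bin/sh
# Minimal sketch: stop and resume a throwaway process with SIGSTOP/SIGCONT.
# Assumes Linux: field 3 of /proc/[pid]/stat is the state letter
# (T = stopped, S = sleeping). proc_state is a hypothetical helper.
proc_state() { cut -d' ' -f3 "/proc/$1/stat"; }

sleep 30 &
pid=$!

kill -s STOP "$pid"   # cannot be caught or ignored, unlike Ctrl-Z's SIGTSTP
sleep 1               # give the signal time to take effect
stopped=$(proc_state "$pid")

kill -s CONT "$pid"   # resume execution where it left off
sleep 1
resumed=$(proc_state "$pid")

kill "$pid"           # clean up the demo process
echo "stopped: $stopped, resumed: $resumed"
```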
Procedure
This procedure makes use of "screen" or "tmux", which is not strictly necessary but makes life easier.
- ssh into the server, launch a screen or tmux session, and open a few windows. You can determine the pseudo-terminal device for each window by typing "tty". Let's say we have three shells running in the screen or tmux session, in windows 1, 2, and 3, on /dev/pts/1, /dev/pts/2, and /dev/pts/3 respectively (the devices used in the steps below)
- Determine the pid of the target bash process. This process is currently associated with a terminal device such as /dev/tty1, because that was the device associated with mingetty when you logged in at the console
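- From screen window 1, run "gdb -p [pid]" and issue the following commands within gdb. Recent gdb versions refuse to call a function that has no debug information unless you cast its return type, hence the (int) casts; the flag values 0 and 1 passed to open() are O_RDONLY and O_WRONLY. Each dup2 call should print the descriptor it replaced (0, 1, or 2); -1 means the call failed:
  - to change the standard input of the target process: p (int) dup2((int) open("/dev/pts/2", 0), 0)
  - to change the standard output of the target process: p (int) dup2((int) open("/dev/pts/3", 1), 1)
  - to change the standard error of the target process: p (int) dup2((int) open("/dev/pts/3", 1), 2)
  - finally, "detach" (or "quit") so that the target process resumes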
- From window 1 (/dev/pts/1), run "ls -l /proc/[pid]/fd" to verify the file descriptor changes for the bash process we want to manipulate
- What you type in window 2 is now fed to two places: the bash shell originally running in window 2, and the target shell. Therefore, from window 2 (/dev/pts/2), type "hhiissttoorryy[return][return]". You have to type everything twice because two processes are now reading keyboard input from /dev/pts/2, and the characters you type are divided between them: the kernel hands each character to whichever reader is waiting for it, which with two interactive shells tends to alternate, so the first character goes to one shell, the next to the other, and so on. If you had three processes reading from the same terminal, you would have to type each character three times to ensure the target shell receives the full command; otherwise it would only get every third character.
- The problem with the above step is that it runs "history" in both shells. You can get around this by temporarily setting the stdin of the window 2 bash shell to some unused device, such as /dev/tty5 (for example, by applying the same gdb technique to the window 2 shell from window 1). The /dev/pts/2 keyboard is then input for only one process instead of two, and you can type commands as normal (they just won't be echoed to the window 2 screen; they will be echoed to /dev/pts/3 instead, because that's where you've redirected the target's output.)
- Since you typed "history" and it was fed to the target process, whose stdout is window 3, switch to window 3 to see the command output. You now have the command history for the target bash shell.
- Back in window 1, use gdb again on [pid] to reset the standard input, output, and error of the target shell to their original values (/dev/tty1). You can also set the window 2 stdin back to what it should be.
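If you like, you can clean up the extra file descriptors that the open() calls left behind in the target shell by running, for example, 'exec 3>&-' in it (this closes fd 3; repeat for any other extra descriptors listed in /proc/[pid]/fd).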
You can probably make the above even easier and more user-friendly by using a single window for both input and output: open the window, determine its tty or pty, temporarily point its own shell's standard input and output at an unused device, and assign the target shell's standard input and output to this tty or pty. That way you can use a single screen for all input and output with the target shell.
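For the step of determining the pid of the target bash process, one option is to walk /proc and print every bash process together with the device its stdin points to; in the scenario above, the target is the shell whose stdin is /dev/tty1. This is a sketch under stated assumptions: Linux with /proc, root privileges to read other users' fd links, and list_bash_ttys is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# Hypothetical helper: print "<pid> <stdin target>" for every bash process.
# Assumes Linux with /proc; fd links of other users' processes need root.
list_bash_ttys() {
  for d in /proc/[0-9]*; do
    # /proc/<pid>/comm holds the command name, e.g. "bash"
    [ "$(cat "$d/comm" 2>/dev/null)" = bash ] || continue
    printf '%s %s\n' "${d#/proc/}" "$(readlink "$d/fd/0" 2>/dev/null)"
  done
}

list_bash_ttys
```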