+++
title = "Intro to Linux and the Bash command line, pt III"
date = 2019-01-11T08:23:00Z
+++

New year, new post. In this third, and most probably final, part of this tutorial/guide series I will be mentioning some useful commands and programs usually present in most standard Linux installations. I will be talking especially about programs/commands to manipulate text output from programs and files. I will also talk a little bit about regular expressions, a powerful tool to perform searches inside text strings.

## Filters

Programs that perform operations on input text and then write the result to standard output are commonly known as filters. You may already be familiar with one of these commands, which is the first one that I'm going to talk about.

### cat

This command allows you to see the content of a text file (or files). It stands for con*cat*enate, and not for the house pet. The most basic use of this command is to view the contents of a text file, just by typing cat followed by the path of the file we wish to see.
However, as its name implies, it also has the ability to concatenate the contents of multiple text files. For example, let's say we have the text file 'sample.txt'

```sh
user@host:~/Documents/notes$ cat sample.txt
Pepe cool
Tide Pods lame
Uganda Knuckles cool
Thanos cool
JPEG ok
Despacito lame
Bowsette cool
Harold cool
Sans coolest
Minions lamest
NPC cool
```

And we want to concatenate its content with the file 'sample2.txt' to standard output

```sh
user@host:~/Documents/notes$ cat sample.txt sample2.txt
Pepe cool
Tide Pods lame
Uganda Knuckles cool
Thanos cool
JPEG ok
Despacito lame
Bowsette cool
Harold cool
Sans coolest
Minions lamest
NPC cool
Troll Face old
Can haz chezburger really old
ROFLcopter super old
Dancing baby ancient
```

As usual, this command accepts different options, like for example the -n option to display line numbers

```sh
user@host:~/Documents/notes$ cat -n sample.txt
     1  Pepe cool
     2  Tide Pods lame
     3  Uganda Knuckles cool
     4  Thanos cool
     5  JPEG ok
     6  Despacito lame
     7  Bowsette cool
     8  Harold cool
     9  Sans coolest
    10  Minions lamest
    11  NPC cool
```

As always, you can check other options by looking up cat and any other command with man, as explained in the previous part.

### head

This is a really simple command: it shows the first n lines of a text file/output. To use it, type head, followed by -n, then the number of lines to show, and then the file. For example, let's say we want to see the first 5 lines of sample.txt

```sh
user@host:~/Documents/notes$ head -n 5 sample.txt
Pepe cool
Tide Pods lame
Uganda Knuckles cool
Thanos cool
JPEG ok
```

If we use the command without passing the number of lines we wish to see, it outputs the first 10 lines by default. For this just type head followed by the path of the file.

### tail

Basically the same as head, except it shows the last lines. Let's say we want to see the last three lines

```sh
user@host:~/Documents/notes$ tail -n 3 sample.txt
Sans coolest
Minions lamest
NPC cool
```

As with head, the default is to output 10 lines.
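
Since head and tail both read from standard input when no file is given, they can be chained to pick out lines from the middle of a file. Here is a small sketch of that idea; the temporary directory and the shortened six-line copy of sample.txt are made up for the illustration, and the vertical bar that connects the two commands is the pipe, explained further down in this post.

```sh
# Recreate a shortened stand-in for sample.txt in a temporary directory
# (both the directory and the shortened file exist only for this illustration).
tmpdir=$(mktemp -d)
printf '%s\n' 'Pepe cool' 'Tide Pods lame' 'Uganda Knuckles cool' \
    'Thanos cool' 'JPEG ok' 'Despacito lame' > "$tmpdir/sample.txt"

# Take the first 5 lines, then keep only the last 2 of those: lines 4 and 5.
middle=$(head -n 5 "$tmpdir/sample.txt" | tail -n 2)
echo "$middle"

# With a plus sign, tail starts AT the given line: everything from line 5 on.
rest=$(tail -n +5 "$tmpdir/sample.txt")
echo "$rest"

# Clean up the temporary directory.
rm -r "$tmpdir"
```

The plus sign form is handy for skipping a header: tail -n +2 prints a file without its first line.
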
### sort

This command is as obvious as it seems. It sorts output. For example

```sh
user@host:~/Documents/notes$ sort sample.txt
Bowsette cool
Despacito lame
Harold cool
JPEG ok
Minions lamest
NPC cool
Pepe cool
Sans coolest
Thanos cool
Tide Pods lame
Uganda Knuckles cool
```

### sed

This one is a really powerful utility to transform and manipulate text. However, to keep this tutorial short, I will only be showing a couple of the most common use cases. sed stands for "stream editor". The way to use sed is to pass it a kind of script (a sed script) that tells it what to do with the text. The first and one of the most basic uses of sed is to perform the same task as head, that is, to get the first n lines. For example, let's say we want the first 7 lines of sample.txt

```sh
user@host:~/Documents/notes$ sed '7q' sample.txt
Pepe cool
Tide Pods lame
Uganda Knuckles cool
Thanos cool
JPEG ok
Despacito lame
Bowsette cool
```

Of course, what I've just told you is a simplification of what it really does. More accurately, sed prints every line it reads by default, and the script '7q' tells it to quit once it reaches line seven.

Another basic use of sed, and arguably the most common one, is to perform search and replace operations on text. The basic syntax for this operation is 's/search/replace/', where search is the term you want to search for, and replace is the term you wish to replace it with. By default it will replace only the first occurrence in each line. However, we can specify which or how many occurrences we want to replace by adding a number and/or letter to the end. For example, if we add a two ('s/search/replace/2') it will replace only the second occurrence in each line. But what if we want to replace each and every occurrence in all of the text? For that we would use the letter g at the end. Let's say, for example, that we want to replace all occurrences of "cool" in our sample.txt file with "dank".
In this case we would type something like this

```sh
user@host:~/Documents/notes$ sed 's/cool/dank/g' sample.txt
Pepe dank
Tide Pods lame
Uganda Knuckles dank
Thanos dank
JPEG ok
Despacito lame
Bowsette dank
Harold dank
Sans dankest
Minions lamest
NPC dank
```

A thing to keep in mind is that you should enclose the sed script in single quotes, so that the shell passes it to sed untouched. Of course these are only some of the most basic uses of this command.

### grep

This is the last program to manipulate text output that I want to mention. I will demonstrate its basic use in this section, and I will show you a little bit more about it in the next section, when I write about regular expressions. Back to grep: it is a program that searches for a pattern that you give it, and prints the lines that contain that pattern. For example, let's say that we want only the cool (or dank) memes in our file to be displayed

```sh
user@host:~/Documents/notes$ grep 'cool' sample.txt
Pepe cool
Uganda Knuckles cool
Thanos cool
Bowsette cool
Harold cool
Sans coolest
NPC cool
```

The pattern that we passed it is actually the most basic form of regular expression, which we will look at in detail next.

## Regular expressions

A regular expression, or regex for short, is a string of text that defines a search pattern for a larger set of text. Regexes are used in many programs, such as text editors and search engines, and can also be of great use in the terminal.

### An intermission

Before going into actual regular expressions in grep, I want to mention a couple of characters that can make your life easier when dealing with files in the terminal. They are called wildcards, and they are the asterisk (*) and the question mark (?). If you've ever wondered why you can't (or at least shouldn't) use those characters in any of your files' names, that's why. I'll start by explaining the asterisk.
When you use the asterisk, you are asking for all files that have any number of any characters (including none) in the place where you put it. For example, we could be looking at files that start with sa

```sh
user@host:~/Documents/notes$ ls sa*
saturday.txt sample.txt sample2.txt sample.png
```

Or, as another example, we could be looking for files that just contain sa in their name

```sh
user@host:~/Documents/notes$ ls *sa*
asado.png saturday.txt sample.txt sample2.txt sample.png
```

Now the question mark. The question mark indicates that there should be exactly one character in its place, just any character. Let's say that we want to see all files with the name "sample" that have a three character extension

```sh
user@host:~/Documents/notes$ ls sample.???
sample.txt sample.png
```

Wildcards come in really handy when you need to manipulate multiple files with similar names. If the files that you wish to manipulate don't really have similar names, you might want to use curly braces to indicate a list of files to manipulate, separated by commas. For example

```sh
user@host:~/Documents/notes$ rm {monday.txt,december1999.txt,saturday.txt}
```

### Back to regex

Now I'll explain some things about regular expressions, and I'll demonstrate some basic uses with grep. Here are some basic concepts

* `.` - The dot means a single character (any character). e.g. 'be.r' would match bear, beer, befr, etc.
* `*` - The preceding element matches 0 or more times. e.g. 'an*t' would match at, ant, annt, annnt, etc.
* `+` - The preceding element matches one or more times. e.g. 'an+t' would match ant, annt, annnt, etc.
* `?` - The preceding element matches 0 or one time. e.g. 'an?t' would match at, and ant.
* `{n}` - The preceding element matches exactly n times.
* `{min,}` - The preceding element matches at least min times.
* `{min,max}` - The preceding element matches at least min times, and no more than max times.
* `|` - The pipe, logical OR operator. e.g.
'gray|grey' would match gray and grey.
* `()` - Parentheses group multiple characters as one element. e.g. 'gr(a|e)y' would match gray and grey.
* `[abc]` - It matches a character that is one of those inside the brackets.
* `[^abc]` - It matches a character that is not one of those inside the brackets.
* `[a-d]` - A range of characters. i.e. a, b, c, or d.
* `^` - Matches the beginning of the line.
* `$` - Matches the end of the line.

So now let's suppose, for a practical example with grep, that we want to find all lines that have "cool" or "ok" in them. In this case we would use the "|" pipe symbol. However, if we use normal grep, we would have to escape the pipe symbol like this "\|". That's why it is better that we use "grep -E" to enable extended regex, or its older alias "egrep" (which recent versions of GNU grep flag as obsolescent). It would look something like this

```sh
user@host:~/Documents/notes$ egrep 'cool|ok' sample.txt
Pepe cool
Uganda Knuckles cool
Thanos cool
JPEG ok
Bowsette cool
Harold cool
Sans coolest
NPC cool
```

Let's suppose, for another example, that we want to match those lines with a 't' as the last character

```sh
user@host:~/Documents/notes$ egrep 't$' sample.txt
Sans coolest
Minions lamest
```

I have already mentioned and shown you the use of regexes with grep (and/or egrep). Now I would like to show a more practical example with sed. Yes, sed uses its own script language to alter text input, however, it also makes use of regular expressions. Let's suppose that we have a file that looks like this

```sh
user@host:~/Documents/notes$ cat shortcuts
# Some shortcuts
d ~/Documents
D ~/Downloads
m ~/Music
pp ~/Pictures
vv ~/Videos
s ~/.scripts # My scripts
cf ~/.config # My configs
```

As we can see there is a lot of whitespace, and although comments might be of help to humans, they are of no use to a machine.
Let's begin by getting rid of the comments. For that we first need to remember the search and replace command of sed, 's/search/replace/g', since we basically want to find any comment-looking string and replace it with, well, nothing. Now we have to think of a regex that will match comments, and for that '#.*' will do. What this regex means is: match '#' and everything after it. Now let's put it together, and...

```sh
user@host:~/Documents/notes$ sed 's/#.*//g' shortcuts

d ~/Documents
D ~/Downloads
m ~/Music
pp ~/Pictures
vv ~/Videos
s ~/.scripts
cf ~/.config
```

Bam, there it is. However, we still have the blank lines left, and, if you pay close attention, the comments have been deleted, but the spaces that used to be before some of the comments are still there. So first, let's improve our current sed command. If we want to match 0 or more spaces (zero because not every comment has a space before it) we would use the `*` symbol, but what symbol would we use for spaces? Well, that's an easy one: in sed (GNU sed, at least) '\s' matches any whitespace character, so now our sed command looks like this 's/\s*#.*//g'.

Let's take care of the last part, getting rid of blank lines. For this we need to issue a separate command, but fortunately we can stack commands in one line with a semicolon (;). Now we need a way to match empty lines with a regex, and that's very easy: '^$' just matches the beginning and the end of the line together. After that, we add a sed command for deleting lines which I haven't mentioned yet (d), and our one-liner is ready...

```sh
user@host:~/Documents/notes$ sed 's/\s*#.*//g; /^$/d' shortcuts
d ~/Documents
D ~/Downloads
m ~/Music
pp ~/Pictures
vv ~/Videos
s ~/.scripts
cf ~/.config
```

Of course, issuing this command will not replace the original file, it will simply output the result to the terminal screen. If you want to overwrite the original file with the result of the sed command, you can pass sed the '-i' option.
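
Because '-i' overwrites the file for good, it can be worth keeping a copy of the original. The following is only a sketch, assuming GNU sed, where a suffix attached directly to '-i' makes sed save the untouched original under that suffix. The file name shortcuts_demo and the temporary directory are made up for the illustration.

```sh
# Build a throwaway stand-in for the shortcuts file
# (hypothetical name and location, used only for this illustration).
tmpdir=$(mktemp -d)
printf '%s\n' '# Some shortcuts' 'd ~/Documents' '' \
    's ~/.scripts # My scripts' > "$tmpdir/shortcuts_demo"

# Edit in place; the .bak suffix keeps the original as shortcuts_demo.bak.
# Assumes GNU sed (both for the attached suffix and for \s).
sed -i.bak 's/\s*#.*//g; /^$/d' "$tmpdir/shortcuts_demo"

# The edited file has no comments or blank lines left...
cleaned=$(cat "$tmpdir/shortcuts_demo")
echo "$cleaned"

# ...while the backup still starts with the original comment line.
backup_first_line=$(head -n 1 "$tmpdir/shortcuts_demo.bak")
echo "$backup_first_line"

# Clean up the temporary directory.
rm -r "$tmpdir"
```

Once you are happy with the result, the .bak file can simply be deleted.
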
## Piping and redirecting output

This post is already getting too long, however there's one more useful thing about *nix systems that I'd like to mention - the pipeline. A pipeline in Unix and Unix-like OSs is a chain of programs in which the output of each one is redirected to the input of the next. Along with that, there are operators to redirect standard output to files (and vice versa).

### Redirecting to and from files

Let's suppose that we want to repeat the last example, and want to clean the file of comments and blank lines. We already know how to overwrite that file, however, what if we want to save the result to another file using common Unix operators in bash? For that we can use the '>' and '>>' operators. For example, let's say we want to save the result to a second file called "shortcuts_clean"

```sh
user@host:~/Documents/notes$ sed 's/\s*#.*//g; /^$/d' shortcuts > shortcuts_clean
```

Since there was no "shortcuts_clean" file, it has been created automatically. However, if the file had already existed, it would have been overwritten, unless we had used the '>>' operator. In that case, the output would have been appended to the already existing file.

Just as there's '>' to redirect TO files, there's also '<' to redirect from files to a program's standard input. However, most of the time you would just pass the name/path of the file to the program as an argument.

### Piping

Now that we know how to redirect from and to files, we can learn how to redirect from one program to another, with pipes. The pipe operator in *nix systems is the vertical bar symbol (|). Let's suppose that we want to see the first three files in our current directory. For that, we can pipe the output of ls into head, like this

```sh
user@host:~/Documents/notes$ ls | head -n 3
asado.png
monday.txt
sample.txt
```

Now let's get back to our sample.txt file. Let's imagine that we first want to sort our lines, and we want to preserve only those lines that contain "cool" or "lame".
Then let's suppose we want to modify them to contain legit terms, and not some antiquated boomer slang, so we want to replace cool with dank, and lame with normie. Finally we want that to be output to a file instead of the screen. Whew! Sounds like a lot of stuff to do, but it is quite simple, and it looks like this

```sh
user@host:~/Documents/notes$ egrep 'cool|lame' sample.txt | sort | sed 's/cool/dank/g;s/lame/normie/g' > memes.txt
```

So if we now take a look at the file...

```sh
user@host:~/Documents/notes$ cat memes.txt
Bowsette dank
Despacito normie
Harold dank
Minions normiest
NPC dank
Pepe dank
Sans dankest
Thanos dank
Tide Pods normie
Uganda Knuckles dank
```

And that's basically it.

## Post scriptum

Before ending it for good, I want to show some other programs that might be of use in the Bash command line.

### less

This command might come in handy when there's another command that outputs a lot of text that overfills the terminal screen. You can pipe (as we have just learned) the output of that command to less, so that you can navigate with your arrow keys, or better yet with the vim keys j and k. You can also search for terms by typing slash (/), just like with man.

### tar

This program is used in Linux to create and extract archives in the .tar format, usually also compressing them using gzip (.gz). There are usually two ways you will be using the program. One to extract files from a compressed archive

```sh
user@host:~/Documents/notes$ tar -xzvf oldnotes.tar.gz
```

And the other to archive and compress files

```sh
user@host:~/Documents/notes$ tar -czvf allnotes.tar.gz *
```

To learn more about the different options of this program, I recommend you check the man pages of tar ('man tar').

### ssh and scp

You may have already heard about ssh, which stands for "secure shell", even if you are new to Linux or Unix/Unix-like systems. This program is used to connect to other computers over a network (or the internet for instance), especially to servers.
Let's suppose that you have a server with IP address 180.80.8.20 and your user is tux

```sh
user@host:~$ ssh tux@180.80.8.20
```

Of course, here we have assumed that the standard ssh port (22) is being used, otherwise you will have to specify it by passing -p followed by the port number.

Now let's talk about scp, which stands for "secure copy". This command uses the same protocol as ssh, and it's used to copy files from one computer to another over a network. Let's suppose that you want to copy a file from your current computer to the server we used in the previous example

```sh
user@host:~$ scp somefile tux@180.80.8.20:/home/tux/directory/
```

If we were trying to do it the other way around, that is, from the remote computer to your local computer, it would look like this

```sh
user@host:~$ scp tux@180.80.8.20:/home/tux/directory/somefile directory/
```

Just as with ssh, if you are not using the standard port 22, you need to tell the program which port you are trying to connect to, except that in the case of scp, the flag is '-P' instead of '-p', and goes right after "scp".

Well, that's it for this tutorial/guide series, I really hope it was of use to you.