+++
title = "Intro to Linux and the Bash command line, pt III"
date = 2019-01-11T08:23:00Z
+++

New year, new post. In this third, and most probably final, part of this
tutorial/guide series I will be covering some useful commands and programs
that come with most standard Linux installations, especially programs for
manipulating text output from programs and files. I will also talk a little
bit about regular expressions, a powerful tool for searching inside text
strings.

<!-- more -->

## Filters

Programs that perform operations on input text and then write the result to
standard output are commonly known as filters. You may already be familiar with
one of them, which is the first one that I'm going to talk about.

### cat

This command allows you to see the content of a text file (or files). It
stands for con*cat*enate, not for the house pet. The most basic use of this
command is to view the contents of a text file, just by typing cat followed by
the path of the file we wish to see. However, as its name implies, it can also
concatenate the contents of multiple text files. For example, say we have the
text file 'sample.txt'

```sh
user@host:~/Documents/notes$ cat sample.txt
Pepe    cool
Tide Pods   lame
Uganda Knuckles cool
Thanos  cool
JPEG    ok
Despacito   lame
Bowsette    cool
Harold  cool
Sans    coolest
Minions lamest
NPC cool
```

And we want to concatenate its content with that of the file 'sample2.txt' and
print the result to standard output

```sh
user@host:~/Documents/notes$ cat sample.txt sample2.txt
Pepe    cool
Tide Pods   lame
Uganda Knuckles cool
Thanos  cool
JPEG    ok
Despacito   lame
Bowsette    cool
Harold  cool
Sans    coolest
Minions lamest
NPC cool
Troll Face  old
Can haz chezburger  really old
ROFLcopter  super old
Dancing baby    ancient
```

As usual, this command accepts different options, like for example the -n
option, which numbers the output lines

```sh
user@host:~/Documents/notes$ cat -n sample.txt
     1  Pepe    cool
     2  Tide Pods   lame
     3  Uganda Knuckles cool
     4  Thanos  cool
     5  JPEG    ok
     6  Despacito   lame
     7  Bowsette    cool
     8  Harold  cool
     9  Sans    coolest
    10  Minions lamest
    11  NPC cool
```

As always, you can check other options by looking up cat and any other command
with man, as explained in the previous part.
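If you want to try cat yourself without touching your own files, here is a
small sketch you can run as a script. It works in a fresh temporary directory,
and the two files a.txt and b.txt (and their contents) are hypothetical
stand-ins for sample.txt and sample2.txt. The printf lines use '>', which saves
their output to a file; redirection is explained further down.

```sh
# Work in a fresh temporary directory so none of your real files are touched.
cd "$(mktemp -d)" || exit 1

# Two small hypothetical files, tab-separated like sample.txt.
printf 'Pepe\tcool\nJPEG\tok\n' > a.txt
printf 'Troll Face\told\n' > b.txt

# Print a.txt and then b.txt as one continuous stream (3 lines in total).
cat a.txt b.txt

# Print a.txt with its lines numbered.
cat -n a.txt
```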

### head

This is a really simple command: it shows the first n lines of a text file or
of a program's output. To use it, type head, followed by -n, then the number
of lines to show, and then the file. For example, let's say we want to see the
first 5 lines of sample.txt

```sh
user@host:~/Documents/notes$ head -n 5 sample.txt
Pepe    cool
Tide Pods   lame
Uganda Knuckles cool
Thanos  cool
JPEG    ok
```

If we use the command without specifying how many lines we wish to see, it
outputs the first 10 lines by default. For this, just type head followed by the
path of the file.

### tail

Basically the same as head, except it shows the last lines. Let's say we want
to see the last three lines

```sh
user@host:~/Documents/notes$ tail -n 3 sample.txt
Sans    coolest
Minions lamest
NPC cool
```

As with head, the default is to output 10 lines.
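A quick way to see exactly which lines head and tail pick is to run them on a
file whose lines are just their own line numbers. The sketch below assumes the
seq utility (part of GNU coreutils) to generate a hypothetical numbers.txt in a
temporary directory.

```sh
cd "$(mktemp -d)" || exit 1
# A hypothetical 12-line file containing the numbers 1 to 12, one per line.
seq 1 12 > numbers.txt

head -n 5 numbers.txt   # 1 to 5
tail -n 3 numbers.txt   # 10 to 12
head numbers.txt        # no -n given: the first 10 lines (1 to 10)
tail numbers.txt        # no -n given: the last 10 lines (3 to 12)
```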

### sort

This command does exactly what its name suggests: it sorts lines of text,
alphabetically by default. For example

```sh
user@host:~/Documents/notes$ sort sample.txt
Bowsette    cool
Despacito   lame
Harold  cool
JPEG    ok
Minions lamest
NPC cool
Pepe    cool
Sans    coolest
Thanos  cool
Tide Pods   lame
Uganda Knuckles cool
```
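One thing worth knowing: by default sort compares lines as text, character by
character, so numbers don't come out in numeric order. Here's a minimal sketch
with a hypothetical nums.txt, using sort's -n (numeric) and -r (reverse)
options.

```sh
cd "$(mktemp -d)" || exit 1
printf '10\n9\n100\n' > nums.txt

sort nums.txt      # text order:         10, 100, 9
sort -n nums.txt   # numeric order:      9, 10, 100
sort -r nums.txt   # reverse text order: 9, 100, 10
```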

### sed

This one is a really powerful utility to transform and manipulate text.
However, to keep this tutorial short, I will only be showing a couple of its
most common uses. sed stands for "stream editor".

The way to use sed is to pass it a kind of script (a sed script) that tells it
what to do with the text. One of the most basic uses of sed is to perform the
same task as head: getting the first n lines. For example, let's say we want
the first 7 lines of sample.txt

```sh
user@host:~/Documents/notes$ sed '7q' sample.txt
Pepe    cool
Tide Pods   lame
Uganda Knuckles cool
Thanos  cool
JPEG    ok
Despacito   lame
Bowsette    cool
```

Of course, what I've just told you is a simplification of what it really does.
More accurately, sed prints every line it reads by default; in the script
'7q', the 7 is an address that selects line 7, and q tells sed to quit when it
gets there (after printing that line).
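To convince yourself that '7q' really behaves like head -n 7, you can compare
both on a hypothetical 20-line file, again generated with seq in a temporary
directory.

```sh
cd "$(mktemp -d)" || exit 1
seq 1 20 > numbers.txt

# sed prints each line it reads; at line 7 it prints it and then quits.
sed '7q' numbers.txt
# Same output:
head -n 7 numbers.txt
```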

Another basic use of sed, and arguably the most common one, is to perform
search and replace operations on text. The basic syntax for these operations is
's/<search>/<replace>/', where <search> is the term you want to search for, and
<replace> is the term you wish to replace it with.

By default it will replace only the first occurrence on each line. However, we
can specify which occurrences we want to replace by adding a number and/or a
letter to the end. For example, if we add a two ('s/<search>/<replace>/2'), it
will replace only the second occurrence on each line.
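Here is a small sketch of how the flag at the end of the s command changes
what gets replaced. word.txt is a hypothetical two-line file, so you can see
that the counting starts over on every line. The last command, which combines
a number with g, relies on GNU sed.

```sh
cd "$(mktemp -d)" || exit 1
printf 'banana\nalpaca\n' > word.txt

sed 's/a/A/' word.txt     # first "a" on each line:       bAnana, Alpaca
sed 's/a/A/2' word.txt    # second "a" on each line:      banAna, alpAca
sed 's/a/A/g' word.txt    # every "a" on each line:       bAnAnA, AlpAcA
sed 's/a/A/2g' word.txt   # GNU sed, second "a" onwards:  banAnA, alpAcA
```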

But what if we want to replace each and every occurrence, not just one per
line? For that we would add the letter g (for "global") at the end. Let's say,
for example, that we want to replace all occurrences of "cool" in our
sample.txt file with "dank". In this case we would type something like this

```sh
user@host:~/Documents/notes$ sed 's/cool/dank/g' sample.txt
Pepe    dank
Tide Pods   lame
Uganda Knuckles dank
Thanos  dank
JPEG    ok
Despacito   lame
Bowsette    dank
Harold  dank
Sans    dankest
Minions lamest
NPC dank
```

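One thing to keep in mind is that you should enclose the sed script in single
quotes, so that the shell passes it to sed exactly as typed instead of trying
to interpret characters such as $ or * itself. Of course, these are only some
of the most basic uses of this command.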

### grep

This is the last program to manipulate text output that I want to mention. I
will demonstrate its basic use in this section, and show you a little more
about it in the next section, where I will be writing about regular
expressions.

Back to grep: it searches for a pattern that you give it, and prints the lines
that contain that pattern. For example, let's say that we want only the cool
(or dank) memes in our file to be displayed

```sh
user@host:~/Documents/notes$ grep 'cool' sample.txt
Pepe    cool
Uganda Knuckles cool
Thanos  cool
Bowsette    cool
Harold  cool
Sans    coolest
NPC cool
```

The text that we passed it is actually the most basic form of regular
expression, which we will look into in more detail next.
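Since grep looks for the pattern anywhere in a line, 'cool' also matches
'coolest'. Below is a small sketch with a hypothetical three-line memes.txt; it
also uses two standard grep options not covered above, -v (print the lines
that do NOT match) and -c (count the matching lines).

```sh
cd "$(mktemp -d)" || exit 1
printf 'Sans\tcoolest\nNPC\tcool\nMinions\tlamest\n' > memes.txt

grep 'cool' memes.txt      # the Sans and NPC lines: "coolest" contains "cool" too
grep -v 'cool' memes.txt   # -v inverts the match: only the Minions line
grep -c 'cool' memes.txt   # -c prints the number of matching lines instead: 2
```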

## Regular expressions

A regular expression, or regex for short, is a string of text that defines a
search pattern to look for in a larger body of text. Regexes are used in many
programs, such as text editors and search engines, and can also be of great
use in the terminal.

### An intermission

Before going into actual regular expressions in grep, I want to mention a
couple of characters that can make your life easier when dealing with files in
the terminal. They are called wildcards, and they are the asterisk (*) and the
question mark (?). The shell replaces them with the matching file names before
running the command, which is why it's best not to use those characters in
your files' names (Linux technically allows it, but it makes those files
awkward to work with in the terminal).

I'll start by explaining the asterisk. The asterisk matches any sequence of
characters, of any length (including none at all), in the place where you put
it. For example, we could be looking at files that start with sa

```sh
user@host:~/Documents/notes$ ls sa*
saturday.txt sample.txt sample2.txt sample.png
```

Or, as another example, we could be looking for files that contain sa anywhere
in their name

```sh
user@host:~/Documents/notes$ ls *sa*
asado.png saturday.txt sample.txt sample2.txt sample.png
```

Now the question mark. The question mark matches exactly one character in its
place, whatever that character is. Let's say that we want to see all files
named "sample" that have a three-character extension

```sh
user@host:~/Documents/notes$ ls sample.???
sample.txt sample.png
```
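It's worth knowing that it's the shell, not ls, that expands these wildcards:
ls only ever receives the list of matching names. You can see this by using
echo instead of ls. The sketch below creates empty files with the hypothetical
names from the examples above in a temporary directory. The order in which the
names are printed depends on your locale, so no output is shown.

```sh
cd "$(mktemp -d)" || exit 1
# Empty files with hypothetical names taken from the examples above.
touch asado.png saturday.txt sample.txt sample2.txt sample.png

echo sa*          # names starting with "sa" (4 files)
echo *sa*         # names containing "sa" anywhere (all 5 files)
echo sample.???   # "sample." followed by exactly three characters (2 files)
echo zz*          # no match: the pattern is passed along unchanged
```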

Wildcards come in really handy when you need to manipulate multiple files with
similar names. If the files that you wish to manipulate don't really have
similar names, you might want to use curly braces instead, to give a
comma-separated list of files to manipulate. For example

```sh
user@host:~/Documents/notes$ rm {monday.txt,december1999.txt,saturday.txt}
```
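Unlike wildcards, the braces are expanded by Bash whether or not the files
exist, and a common suffix can be factored out of the list. A small sketch,
again in a throwaway directory with hypothetical files (keep.txt is only there
to show that nothing else gets deleted). Brace expansion is a Bash feature, so
this needs to run under Bash rather than a plain POSIX sh.

```sh
#!/usr/bin/env bash
cd "$(mktemp -d)" || exit 1
touch monday.txt december1999.txt saturday.txt keep.txt

# echo shows what the braces expand to, before rm ever sees anything.
echo {monday.txt,december1999.txt,saturday.txt}
# Same list, with the common ".txt" suffix written only once.
echo {monday,december1999,saturday}.txt

rm {monday,december1999,saturday}.txt
ls    # only keep.txt is left
```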

### Back to regex

Now I'll explain some things about regular expressions, and I'll demonstrate
some basic uses with grep. Here are some basic concepts

* `.` - The dot means a single character (any character). e.g. 'be.r' would match
  bear, beer, befr, etc.
* `*` - The preceding element matches 0 or more times. e.g. 'an*t' would match
  at, ant, annt, annnt, etc.
* `+` - The preceding element matches one or more times. e.g. 'an+t' would match
  ant, annt, annnt, etc.
* `?` - The preceding element matches 0 or 1 times. e.g. 'an?t' would match at
  and ant.
* `{n}` - The preceding element matches exactly n times.
* `{min,}` - The preceding element matches at least min times.
* `{min,max}` - The preceding element matches at least min times, and no more
  than max times.
* `|` - The pipe, logical OR operator. e.g. 'gray|grey' would match gray and grey
* `()` - Parentheses group multiple characters as one element. e.g.
  'gr(a|e)y' would match gray and grey.
* `[abc]` - It matches if a character is one of those inside the brackets.
* `[^abc]` - It matches any single character that is not one of those inside
  the brackets.
* `[a-d]` - A range of characters. i.e. a, b, c, or d.
* `^` - Matches the beginning of the line.
* `$` - Matches the end of the line.
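To see these rules in action, here's a sketch that runs grep -E (extended
regexes, which are explained just below) against a hypothetical word list in a
temporary directory. The comment on each line shows which words match.

```sh
cd "$(mktemp -d)" || exit 1
# A hypothetical word list, one word per line.
printf '%s\n' at ant annt bear beer gray grey > words.txt

grep -E 'be.r'      words.txt   # .          -> bear, beer
grep -E 'an*t'      words.txt   # *          -> at, ant, annt
grep -E 'an+t'      words.txt   # +          -> ant, annt
grep -E 'an?t'      words.txt   # ?          -> at, ant
grep -E 'an{2}t'    words.txt   # {n}        -> annt
grep -E 'an{1,}t'   words.txt   # {min,}     -> ant, annt
grep -E 'an{0,1}t'  words.txt   # {min,max}  -> at, ant
grep -E 'gray|grey' words.txt   # |          -> gray, grey
grep -E 'gr(a|e)y'  words.txt   # ()         -> gray, grey
grep -E '^[^ab]'    words.txt   # ^ and [^]  -> gray, grey (first letter is neither a nor b)
grep -E 'r$'        words.txt   # $          -> bear, beer
```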

So now, for a practical example with grep, let's suppose that we want to find
all lines that have "cool" or "ok" in them. In this case we would use the "|"
pipe symbol. However, by default grep uses basic regexes, in which |, +, ?,
braces and parentheses are ordinary characters unless escaped, so with GNU
grep we would have to write the pipe like this "\|". That's why it is better
to use "grep -E" to enable extended regexes, or its shorter alias "egrep"
(newer GNU grep releases warn that egrep is obsolescent, so "grep -E" is the
safer habit). It would look something like this

```sh
user@host:~/Documents/notes$ egrep 'cool|ok' sample.txt
Pepe    cool
Uganda Knuckles cool
Thanos  cool
JPEG    ok
Bowsette    cool
Harold  cool
Sans    coolest
NPC cool
```
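The difference between basic and extended mode is easy to see side by side.
Here's a minimal sketch with a hypothetical three-line sample.txt in a
temporary directory; note that the \| spelling is a GNU grep extension.

```sh
cd "$(mktemp -d)" || exit 1
printf 'Pepe\tcool\nJPEG\tok\nDespacito\tlame\n' > sample.txt

grep -E 'cool|ok' sample.txt    # extended regex: | means OR (the Pepe and JPEG lines)
grep 'cool\|ok' sample.txt      # basic regex: GNU grep needs the pipe escaped, same result
# Basic regex with an unescaped pipe: grep looks for the literal text "cool|ok".
grep 'cool|ok' sample.txt || echo 'no match'
```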

Let's suppose, for another example, that we want to match those lines with a 't' as the last character

```sh
user@host:~/Documents/notes$ egrep 't$' sample.txt
Sans    coolest
Minions lamest
```

I have already shown you how to use regexes with grep (and egrep). Now I would
like to show a more practical example with sed. Yes, sed uses its own script
language to alter text input, but it also makes use of regular expressions.

Let's suppose that we have a file that looks like this

```sh
user@host:~/Documents/notes$ cat shortcuts
# Some shortcuts

d       ~/Documents
D       ~/Downloads
m       ~/Music
pp      ~/Pictures
vv      ~/Videos


s       ~/.scripts # My scripts
cf      ~/.config # My configs
```

As we can see, there is a lot of whitespace, and although comments might be of
help to humans, they are of no use to the machine. Let's begin by getting rid
of the comments. For that, we first need to remember sed's search and replace
command, 's/<search>/<replace>/g', since we basically want to replace any
comment-looking string with, well, nothing. Now we have to think of a regex
that will match comments, and '#.*' will do. This regex means: match '#' and
everything after it. Now let's put it together

```sh
user@host:~/Documents/notes$ sed 's/#.*//g' shortcuts


d       ~/Documents
D       ~/Downloads
m       ~/Music
pp      ~/Pictures
vv      ~/Videos


s       ~/.scripts
cf      ~/.config
```

Bam, there it is. However, we still have the blank lines left, and, if you pay
close attention, you'll see that although the comments have been deleted, the
spaces that used to be before some of them are still there.

So first, let's improve our current sed command. To match 0 or more spaces
(zero, because not every comment has a space before it) we would use the `*`
symbol, but what would we use to match the spaces themselves? Well, that's an
easy one: in GNU sed, '\s' matches any whitespace character (such as a space
or a tab), so now our sed command looks like this 's/\s*#.*//g'.

Let's take care of the last part, getting rid of blank lines. For this we
need a separate command, but fortunately we can stack sed commands in one line
by separating them with a semicolon (;). First we need a regex that matches
empty lines, and that's very easy: '^$' matches the beginning of the line
immediately followed by its end. After that, we add the sed command for
deleting lines, which I haven't mentioned yet (d), and our one-liner is ready...

```sh
user@host:~/Documents/notes$ sed 's/\s*#.*//g; /^$/d' shortcuts
d       ~/Documents
D       ~/Downloads
m       ~/Music
pp      ~/Pictures
vv      ~/Videos
s       ~/.scripts
cf      ~/.config
```

Of course, issuing this command will not modify the original file, it will
simply print the result to the terminal screen. If you want to overwrite the
original file with the result of the sed command, you can pass sed the '-i'
option (that's how GNU sed works; BSD and macOS sed expect a backup-file
suffix right after -i, e.g. -i '' for no backup).
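Putting the whole thing together, here's a sketch you can run as a script. It
recreates a shortened, hypothetical version of the shortcuts file in a
temporary directory, prints the cleaned-up version, and then edits the file in
place. It assumes GNU sed (for \s); writing the suffix attached to the option,
as in -i.bak, also keeps a copy of the original.

```sh
cd "$(mktemp -d)" || exit 1
# A shortened, hypothetical copy of the shortcuts file from above.
printf '%s\n' '# Some shortcuts' '' 'd       ~/Documents' '' \
    's       ~/.scripts # My scripts' > shortcuts

# Print the cleaned-up version; the file itself is not modified.
sed 's/\s*#.*//g; /^$/d' shortcuts

# Edit the file in place, saving the original as shortcuts.bak first.
sed -i.bak 's/\s*#.*//g; /^$/d' shortcuts
cat shortcuts
```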

## Piping and redirecting output

This post is already getting too long, but there's one more useful thing
about *nix systems that I'd like to mention - the pipeline. A pipeline in
Unix and Unix-like OSs is a chain of programs in which the standard output of
each one is connected to the standard input of the next. Along with that,
there are operators to redirect standard output to files (and files to
standard input).

### Redirecting to and from files

Let's suppose that we want to repeat the last example, and clean the file of
comments and blank lines. We already know how to overwrite that file, but
what if we want to save the result to another file instead? For that we can
use bash's '>' and '>>' operators. For example, let's say we want to save the
result to a second file called "shortcuts_clean"

```sh
user@host:~/Documents/notes$ sed 's/\s*#.*//g; /^$/d' shortcuts > shortcuts_clean
```

Since there was no "shortcuts_clean" file, it has been created automatically.
However, if the file had already existed, '>' would have overwritten it. Had we
used the '>>' operator instead, the output would have been appended to the
existing file.
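A minimal sketch of the difference between the two operators, using a
hypothetical log.txt in a temporary directory:

```sh
cd "$(mktemp -d)" || exit 1
echo 'first'  > log.txt    # creates log.txt
echo 'second' > log.txt    # > empties the file first, so "first" is gone
echo 'third' >> log.txt    # >> appends to the end instead
cat log.txt                # prints "second", then "third"
```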

Just as there's '>' to redirect output TO files, there's also '<' to redirect
a file to a program's standard input. However, most of the time you would just
pass the name/path of the file to the program as an argument.
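Both forms usually give the same result, but there is a subtle difference:
with '<' the shell opens the file, so the program never learns its name. wc -l
shows this nicely. A small sketch with a hypothetical letters.txt:

```sh
cd "$(mktemp -d)" || exit 1
printf 'b\na\n' > letters.txt

sort < letters.txt    # the shell feeds the file into sort's standard input
sort letters.txt      # same output; here sort opens the file itself
wc -l < letters.txt   # prints only the line count
wc -l letters.txt     # prints the count followed by the file name
```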

### Piping

Now that we know how to redirect from and to files, we can learn how to
redirect from one program to another, with pipes. The pipe operator in *nix
systems is the vertical bar symbol (|). Let's suppose that we want to see the
first three files in our current directory, for that, we can pipe the output of
ls into head, like this

```sh
user@host:~/Documents/notes$ ls | head -n 3
asado.png
monday.txt
sample.txt
```
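One way to think about a pipe is as a temporary file you don't have to create
or clean up yourself. The sketch below does the same job twice, first with an
intermediate file and then with a pipe (names.txt and sorted.txt are
hypothetical).

```sh
cd "$(mktemp -d)" || exit 1
printf '%s\n' Thanos Pepe NPC Sans Harold > names.txt

# Without a pipe: store the intermediate result in a file, then read it back.
sort names.txt > sorted.txt
head -n 3 sorted.txt

# With a pipe: sort's output goes straight into head's input, no extra file.
sort names.txt | head -n 3
```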

Now let's get back to our sample.txt file. Let's imagine that we want to keep
only those lines that contain "cool" or "lame", and sort them. Then let's
suppose we want to modify them to contain legit terms, and not some antiquated
boomer slang, so we want to replace cool with dank, and lame with normie.
Finally, we want the result to be written to a file instead of the screen.
Whew! Sounds like a lot of stuff to do, but it is quite simple, and it looks
like this

```sh
user@host:~/Documents/notes$ egrep 'cool|lame' sample.txt | sort | sed 's/cool/dank/g;s/lame/normie/g' > memes.txt
```

So if we now take a look at the file...

```sh
user@host:~/Documents/notes$ cat memes.txt
Bowsette    dank
Despacito   normie
Harold  dank
Minions normiest
NPC dank
Pepe    dank
Sans    dankest
Thanos  dank
Tide Pods   normie
Uganda Knuckles dank
```

And that's basically it.

## Post scriptum

Before ending it for good, I want to show some other programs that might be of
use in the Bash command line.

### less

This command comes in handy when another command outputs so much text that it
overflows the terminal screen. You can pipe (as we have just learned) the
output of that command to less, so that you can scroll through it with your
arrow keys, or better yet with the vim-style keys j and k, and press q to quit.
You can also search for terms by typing slash (/), just like with man.

### tar

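This program is used in Linux to create and extract archives in the .tar
format, which are usually also compressed with gzip (.tar.gz).

There are usually two ways you will be using the program. One is to extract
files from a compressed archive (-x means extract, -z handles the gzip
compression, -v lists the files as they are processed, and -f is followed by
the archive's file name)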

```sh
user@host:~/Documents/notes$ tar -xzvf oldnotes.tar.gz
```

And the other is to archive and compress files (-c means create a new archive)

```sh
user@host:~/Documents/notes$ tar -czvf allnotes.tar.gz *
```

To learn more about the different options of this program, I recommend you
check the man pages of tar ('man tar').
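Here's a small round trip you can try in a temporary directory: create an
archive, list its contents, and extract it somewhere else. The directory and
file names are hypothetical; -t (list the contents) and -C (extract into a
given directory) are standard tar options not covered above.

```sh
cd "$(mktemp -d)" || exit 1
mkdir notes restored
printf 'hello\n' > notes/a.txt
printf 'world\n' > notes/b.txt

# c = create, z = gzip, v = verbose, f = the archive name that follows
tar -czvf allnotes.tar.gz notes
# t = list what's inside the archive without extracting anything
tar -tzf allnotes.tar.gz
# x = extract; -C restored unpacks into "restored" instead of the current directory
tar -xzvf allnotes.tar.gz -C restored
cat restored/notes/a.txt
```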

### ssh and scp

You may have already heard about ssh, which stands for "secure shell", even if
you are new to Linux or Unix/Unix-like systems. This program is used to log
into and work on other computers over a network (the internet, for instance),
especially servers.

Let's suppose that you have a server with IP address 180.80.8.20 and your user
is tux

```sh
user@host:~$ ssh tux@180.80.8.20
```

Of course, here we have assumed that the standard ssh port (22) is being used,
otherwise you will have to specify it by passing -p followed by the port
number.

Now let's talk about scp, which stands for "secure copy". This command uses the
same protocol as ssh, and it's used to copy files from one computer to another
over a network. Let's suppose that you want to copy a file from your current
computer to the server we used in the previous example

```sh
user@host:~$ scp somefile tux@180.80.8.20:/home/tux/directory/
```

If we were trying to do it the other way around, that is, from the remote
computer to your local computer, it would look like this

```sh
user@host:~$ scp tux@180.80.8.20:/home/tux/directory/somefile directory/
```

Just as with ssh, if you are not using the standard port 22, you need to tell
the program which port to connect to, except that in the case of scp the flag
is '-P' instead of '-p', and it goes right after "scp", before the file
arguments.

Well, that's it for this tutorial/guide series, I really hope it was of use to
you.