Shell Basics
Bash is the command-line shell you will meet on most Linux systems. Whether you're a seasoned developer or new to Linux, knowing the basics of bash is essential to getting the most out of the system. This guide walks through the most common commands and shell features to give you a foundation to build on. By the end, you'll be able to navigate the file system, manage files and directories, and process text from the command line.
A very useful reference (and recommended reading) is Google's Shell Style Guide: https://google.github.io/styleguide/shellguide.html
cd
Allows you to change the current working directory.
The cd (change directory) command is used to navigate the file system in the bash shell. You can use it to move to different directories on your system. For example, to move to the home directory, you would run the following command:
$ cd ~
To move to a specific directory, you need to provide the path to that directory. For example:
$ cd /var/www/html
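A few other forms of cd are worth knowing. The following is a minimal sketch; the scratch directory and the names projects and site are hypothetical and exist only for illustration.

```shell
# Create a scratch directory tree to move around in (hypothetical names).
base=$(mktemp -d)
mkdir -p "$base/projects/site"

cd "$base/projects/site"   # absolute path
cd ..                      # relative path: up one level, into projects
cd site                    # relative path: back down into site
cd - > /dev/null           # return to the previous directory (projects)
here=$(basename "$PWD")
echo "$here"               # name of the directory we ended up in
```

cd - prints the directory it switches to, which is why its output is discarded here.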
man
The man (manual) command is used to display the manual pages for a specific command in the bash shell. The manual pages contain detailed information about the command, including its syntax, options, and usage examples. To display the manual page for a command, you simply need to type man followed by the command name. For example, to view the manual page for the ls command, you would run the following command:
$ man ls
Each manual page is organized into headed sections, with each section providing a different type of information about the command. Some of the most common sections include:
- NAME: a brief description of the command
- SYNOPSIS: the syntax for using the command
- DESCRIPTION: a detailed explanation of the command and its options
- OPTIONS: a list of the available options for the command
- EXAMPLES: examples of how to use the command
The manual pages are a valuable resource for learning about the various commands available in the bash shell, and are a must-know tool for anyone looking to become proficient with the Linux command line.
& and wait
In bash, the & symbol and the wait command are used to manage the execution of commands and to run them in parallel.
1. & symbol: The & symbol is used to run a command in the background, allowing the shell to continue executing other commands while the background command is running. For example:
$ command1 &
$ command2
In this example, command1 is run in the background and command2 is executed immediately after, without waiting for command1 to finish.
2. wait: The wait command is used to wait for background commands to finish before executing subsequent commands. For example:
$ command1 &
$ wait
$ command2
In this example, command1 is run in the background, and wait is used to wait for command1 to finish before executing command2.
3. Parallel processing: Parallel processing refers to the execution of multiple commands simultaneously, in order to reduce the total time required to complete all of the commands. In bash, you can use the & symbol and wait to run commands in the background and control the order in which they are executed. For example:
$ command1 &
$ command2 &
$ wait
$ command3
In this example, command1 and command2 are run in parallel in the background, and wait is used to wait for both commands to finish before executing command3.
Note: Running commands in parallel can be a powerful technique to improve performance, but it can also increase the complexity of your scripts and make it more difficult to manage the execution order of commands. Use parallel processing with caution and make sure to test your scripts thoroughly before deploying them in a production environment.
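When running jobs in parallel it usually matters whether each one succeeded. The sketch below assumes two stand-in jobs built from sleep and exit; in a real script they would be your own commands. It uses $! (the process ID of the most recent background job) and wait with a process ID, which returns that job's exit status.

```shell
# Two stand-in jobs run in the background; the second one fails on purpose.
( sleep 1; exit 0 ) &
pid1=$!                 # $! holds the PID of the most recent background job
( sleep 1; exit 3 ) &
pid2=$!

# wait PID returns that job's exit status, so failures can be detected.
status1=0; wait "$pid1" || status1=$?
status2=0; wait "$pid2" || status2=$?
echo "job1 exited with $status1, job2 exited with $status2"
```

Both jobs sleep at the same time, so the whole block takes about one second rather than two.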
. and ..
In Unix-like operating systems, . and .. are special directory entries that are used for navigation between directories.
- .: The . (dot) entry refers to the current directory. For example, if you are in the /home/user directory, the . entry would refer to /home/user.
- ..: The .. (double dot) entry refers to the parent directory. For example, if you are in the /home/user/docs directory, the .. entry would refer to /home/user.
These entries can be used to navigate between directories, either by using the cd command or by specifying the path to a file or directory. For example, to move from the current directory to its parent directory, you could use the following command:
$ cd ..
Note that . and .. are relative paths, meaning that what they refer to depends on the current directory. To find the absolute path they correspond to, use the pwd command, which prints the absolute path of the current working directory; .. is that path with its last component removed.
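To see how . and .. resolve in practice, here is a small sketch. The directory names user and docs mirror the example above but are created in a temporary location, which is an assumption made so the commands are safe to run anywhere.

```shell
base=$(mktemp -d)
mkdir -p "$base/user/docs"
cd "$base/user/docs"

ls -a                        # even an empty directory lists . and ..
current=$(cd . && pwd)       # . resolves to the directory we are already in
parent=$(cd .. && pwd)       # resolve .. to an absolute path in a subshell
echo "current: $current"
echo "parent:  $parent"
```

Because the cd commands run inside $( ), the working directory of the main shell does not change.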
~
~ is the home directory (/home/username by default).
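The expansion of ~ is performed by the shell before the command runs, and only when the ~ is unquoted. A minimal sketch (the file name notes.txt is hypothetical and does not need to exist):

```shell
echo ~               # an unquoted ~ expands to the home directory
echo ~/notes.txt     # ~ can start a longer path
echo "~"             # inside quotes, ~ stays a literal character
expanded=~
literal="~"
```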
*
The * (asterisk) is a wildcard character in Unix-like operating systems that is used to match zero or more characters in a file or directory name.
- Usage: The * wildcard is used in many contexts, including file name expansion, pattern matching, and command line arguments. For example, to list all of the files in the current directory that have a .txt extension, you could use the following command:
$ ls *.txt
In this example, the * wildcard matches zero or more characters in the file name, and the resulting list of files will include all files in the current directory that have names ending in .txt.
- More specific patterns: The * wildcard can also be used in combination with other characters to match more specific patterns. For example:
$ ls file*
In this example, the * wildcard matches zero or more characters that come after the string file, and the resulting list of files will include all files in the current directory that start with the string file.
Note: The wildcard character * is a powerful tool that can be used to perform operations on a large number of files or directories, but it can also be dangerous if used improperly. Be careful when using wildcards, especially when executing commands that modify or delete files, as it's possible to inadvertently affect more files than intended.
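One way to act on the caution above is to preview what a pattern matches with echo before running a destructive command. The sketch below uses hypothetical file names in a temporary directory.

```shell
base=$(mktemp -d)
cd "$base"
touch file1.txt file2.txt file_notes.md report.txt

echo *.txt          # file1.txt file2.txt report.txt
echo file*          # file1.txt file2.txt file_notes.md
echo file*.txt      # file1.txt file2.txt

# Preview a destructive command with echo first, then run it for real.
echo rm -- file*.txt
rm -- file*.txt
remaining=$(echo *)
echo "$remaining"   # file_notes.md report.txt
```

The shell expands the pattern before the command starts, so echo shows exactly the arguments that rm would receive.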
ls
The ls (list) command is used to display the contents of a directory. By default, it shows the files and directories in the current directory. For example:
$ ls
You can also use the ls command to view the contents of a specific directory. For example:
$ ls /var/www/html
Running ls with the -s flag lists each file or directory with its allocated size in blocks (typically 1 KiB each with GNU ls).
$ ls -s
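Running with the -alth flags shows a more detailed listing that includes hidden files (-a), uses the long format with permissions, ownership, and sizes (-l), sorts by modification time with the newest first (-t), and prints sizes in human-readable units (-h):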
$ ls -alth
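A short sketch of a few other ls forms, run in a temporary directory with hypothetical file names. The -A and -d flags are not covered above; -A is like -a but leaves out the . and .. entries, and -d lists a directory itself rather than its contents.

```shell
base=$(mktemp -d)
cd "$base"
mkdir data
touch notes.txt .hidden

ls                # data and notes.txt (hidden files are skipped)
ls -A             # also lists .hidden, but not . and ..
ls -d */          # only directories: data/
ls -l notes.txt   # long format for a single file
visible=$(ls | wc -l)
all=$(ls -A | wc -l)
```

When its output goes to a pipe instead of a terminal, ls prints one entry per line, which is why counting lines with wc -l works here.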
cat
The cat (concatenate) command is a basic Unix utility that is used to concatenate and display the contents of one or more files. The most common use cases of cat include:
- Display the contents of a file: To display the contents of a file, simply run cat followed by the file name:
$ cat file.txt
This will print the contents of file.txt to the standard output (typically, the terminal).
- Concatenate multiple files: cat can also be used to concatenate multiple files into a single output. For example:
$ cat file1.txt file2.txt > output.txt
In this example, the contents of file1.txt and file2.txt are concatenated into a single output file output.txt.
- Write standard input to a file: When cat is given no file names, it reads from standard input (the keyboard, by default), so it can be used to create a file from typed text. For example:
$ cat > output.txt
In this example, whatever is typed on the keyboard is written to the file output.txt until you press Ctrl-D to signal the end of input.
Note: The cat command is simple, but versatile and widely used in Unix-like systems. Be careful when redirecting its output with >, which overwrites the target file if it already exists, and when printing very large files to the terminal.
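The difference between overwriting and appending can be sketched as follows. The file names are hypothetical, and the -n flag (number the output lines) is an addition not covered above.

```shell
base=$(mktemp -d)
cd "$base"
printf 'alpha\nbeta\n' > file1.txt
printf 'gamma\n' > file2.txt

cat file1.txt file2.txt > output.txt   # > overwrites output.txt
cat file2.txt >> output.txt            # >> appends instead of overwriting
cat -n output.txt                      # -n numbers the output lines
lines=$(wc -l < output.txt)
```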
column
The column tool is a Unix command line utility that formats text into aligned columns, which makes tabular data easier to read.
Here's a simple explanation of how column works:
- Input: The column command takes its input from either a file or standard input. In table mode (-t), fields are split on whitespace by default; use -s to name a different delimiter, such as a comma.
- Output: In table mode, the output is a table of the input data in which each column is aligned and padded with spaces.
- Options: The column command has several options that can be used to customize its output. Some of the most commonly used options are:
- -c: Specify the width of the output, in characters.
- -t: Determine the number of columns in the input and create a table.
- -s: Specify the characters that delimit columns in the input (used with -t).
- -n: The meaning of this option differs between column implementations; check man column on your system before relying on it.
- Usage: Here is an example of how you can use the column command to format and display tabular data:
$ cat genotype_report.txt
ID CallRate #AA #AB #BB #NC %AA %AB %BB %NC
27813162 0.185837 141 192 142 2081 0.06 0.08 0.06 0.81
30282025 0.185837 115 240 120 2081 0.04 0.09 0.05 0.81
21928702 0.996870 596 951 1001 8 0.23 0.37 0.39 0.00
27942765 0.241393 142 292 183 1939 0.06 0.11 0.07 0.76
In this form the columns are hard to follow. Consider instead:
$ cat genotype_report.txt | column -t
ID        CallRate  #AA  #AB  #BB   #NC   %AA   %AB   %BB   %NC
27813162  0.185837  141  192  142   2081  0.06  0.08  0.06  0.81
30282025  0.185837  115  240  120   2081  0.04  0.09  0.05  0.81
21928702  0.996870  596  951  1001  8     0.23  0.37  0.39  0.00
27942765  0.241393  142  292  183   1939  0.06  0.11  0.07  0.76
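The same approach works for input that is not whitespace-separated. The sketch below assumes a small, made-up comma-separated sample and uses -s to tell column which delimiter to split on. Because column is not installed on every system, the example checks for it first.

```shell
# Hypothetical comma-separated sample data.
csv='ID,CallRate,Notes
27813162,0.185837,low
21928702,0.996870,ok'

if command -v column > /dev/null; then
    # -t builds a table; -s, splits the input on commas instead of whitespace.
    table=$(printf '%s\n' "$csv" | column -t -s,)
    printf '%s\n' "$table"
else
    echo "column is not installed" >&2
fi
```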
find
The find command is a powerful tool in Unix-like operating systems that is used to search for files and directories based on certain criteria.
Here's a simple explanation of how find works:
- Syntax: The basic syntax of the find command is:
$ find [path] [expression]
where [path] is the location to start the search, and [expression] is a set of options, tests, and actions used to filter the search results and act on them.
- Options: The find command has several tests and actions that can be used to customize the search. Some of the most commonly used are:
- -name: Search for files and directories with a specific name.
- -type: Search for files of a specific type, such as regular files, directories, links, etc.
- -mtime: Search for files based on their modification time.
- -exec: Execute a command on the search results.
- Usage: Here is an example of how you can use the find command to search for files with a specific name in the current directory and all of its subdirectories:
$ find . -name "*.txt"
./file1.txt
./dir1/file2.txt
./dir2/file3.txt
In this example, the find command starts the search in the current directory (.) and searches for files with the .txt extension. The pattern is quoted so that the shell passes it to find unchanged instead of expanding it first. The search results are displayed on the screen.
Note: The find command is a very powerful and flexible tool, and it has many more options and capabilities that have not been covered in this simple explanation. To learn more about the find command, refer to the official documentation or other online resources.
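The tests and actions listed above can be combined in a single command. The following sketch builds a small, hypothetical directory tree in a temporary location and runs a few typical searches against it.

```shell
base=$(mktemp -d)
cd "$base"
mkdir -p dir1 dir2
touch file1.txt dir1/file2.txt dir2/file3.log

# -type f restricts matches to regular files; -name tests the file name.
find . -type f -name '*.txt'

# -mtime -1 matches files modified less than 24 hours ago.
find . -type f -mtime -1

# -exec runs a command once per match; {} is replaced by the path.
find . -type f -name '*.log' -exec ls -l {} \;
count=$(find . -type f -name '*.txt' | wc -l)
```

The \; at the end marks where the -exec command stops; the backslash keeps the shell from treating the semicolon as its own command separator.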
grep
The grep command is used to search for patterns in text files in the bash shell. It is a powerful tool for searching through large amounts of text data, making it ideal for tasks such as finding specific information in log files or code. The basic syntax for using grep is as follows:
$ grep [pattern] [file]
where pattern is the pattern you want to search for and file is the name of the file you want to search in. For example:
$ grep "error" logfile.log
This will search the file logfile.log for lines containing the string "error".
grep has many options that allow you to customize your search, such as searching for patterns in multiple files, searching recursively through directories, and changing what is printed for each match. Some common options include:
- -r: search recursively through directories
- -i: perform a case-insensitive search
- -l: list the names of the files that contain a match, but not the actual lines that match
- -c: print the number of matching lines
- -n: print the line number of each match
With grep, you can quickly find the information you need in large text data sets, making it a valuable tool for any bash shell user.
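The options above can be tried out on a small sample. The log files and their contents below are made up for illustration.

```shell
base=$(mktemp -d)
cd "$base"
mkdir logs
printf 'start\nError: disk full\nok\nerror: retry\n' > logs/app.log
printf 'all good\n' > logs/other.log

grep -n 'error' logs/app.log      # 4:error: retry
grep -in 'error' logs/app.log     # lines 2 and 4 (case-insensitive)
grep -c -i 'error' logs/app.log   # 2
grep -rl -i 'error' logs          # logs/app.log
matches=$(grep -c -i 'error' logs/app.log)
files=$(grep -rl -i 'error' logs)
```

Note that -c counts matching lines, not individual matches, so a line containing the pattern twice is counted once.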
stdout, stdin, and pipes
In the bash shell, stdout (standard output) and stdin (standard input) are two streams of data that are used by the shell and its commands to exchange information.
stdout is the default destination for the output of a command. When you run a command in the bash shell, the result is sent to stdout and displayed on the screen. For example, when you run the ls command, the list of files in the current directory is sent to stdout and displayed on the screen.
stdin is the default source of input for a command. When a command requires input, it reads the data from stdin. For example, when you run the sort command with no file argument, it will wait for input from stdin and sort the data that is entered.
Pipes are a powerful feature of the bash shell that allow you to connect the stdout of one command to the stdin of another command. This allows you to chain multiple commands together to perform complex processing tasks. For example, to sort the output of the ls command and display the result, you would use the following pipeline:
$ ls | sort
In this pipeline, the ls command sends its output to stdout, which is connected to the stdin of the sort command. The sort command sorts the data and sends its output to stdout, which is then displayed on the screen.
Pipes are a highly versatile feature of the bash shell that allow you to combine simple commands to perform complex processing tasks. They are an essential tool for anyone looking to become proficient with the Linux command line.
To print a pedigree file to stdout and then pipe it into the column formatting tool:
$ cat pedigree.txt | column -t
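Pipelines can be longer than two commands. The sketch below is one common idiom for finding the most frequent value in a list; the IDs are sample data, and uniq is a standard utility not otherwise covered in this guide.

```shell
# Sample dam IDs, one per line.
dams='7651593
6245551
7736346
6245551
6245551'

# sort groups equal lines together, uniq -c counts each group,
# sort -rn orders the counts from largest to smallest,
# and head -n 1 keeps only the first (most frequent) entry.
top=$(printf '%s\n' "$dams" | sort | uniq -c | sort -rn | head -n 1)
echo "$top"
```

Each command in the chain reads the previous command's stdout as its own stdin, so no intermediate files are needed.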
head
Prints the first lines of a file or of standard input (10 lines by default).
$ head [file]
You can specify the number of lines using the -n flag.
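A quick sketch of both forms, using seq (which prints a sequence of numbers, one per line) as stand-in input:

```shell
seq 1 20 | head            # first 10 lines (the default)
seq 1 20 | head -n 3       # first 3 lines
default=$(seq 1 20 | head | wc -l)
three=$(seq 1 20 | head -n 3 | wc -l)
```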
mkdir
Creates a new directory.
$ mkdir [my-directory]
mv
Used to move files and directories. Can also rename files and directories.
# move files/directories to new destination
$ mv [files/directories] [destination]
# rename file/directory
$ mv [file/directory] [new name]
cp
Copies files and directories. Use the -r flag to copy a directory together with its contents.
$ cp [files/directories] [destination]
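The mkdir, mv, and cp commands are often used together. The following sketch works in a temporary directory with hypothetical names; the -p flag for mkdir (create missing parent directories) is an addition not covered above.

```shell
base=$(mktemp -d)
cd "$base"

mkdir -p project/data/raw                   # -p also creates missing parents
touch notes.txt
mv notes.txt project/                       # move into a directory
mv project/notes.txt project/README.txt     # rename in place
cp project/README.txt project/README.bak    # copy a single file
cp -r project project_backup                # -r copies a directory tree
ls -R project_backup
```

Both mv and cp silently overwrite an existing destination file by default, so it is worth checking the target name before running them.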
pwd
pwd (print working directory) outputs the absolute path of the current working directory.
$ pwd
sudo
sudo is used to temporarily gain administrative privileges for a single command. If a command fails because it is restricted, put sudo before it. You will then be prompted for your password.
$ sudo [command]
rm
The rm (remove) command is used to delete files and directories in the bash shell. It is important to be cautious when using this command, as it permanently deletes the specified files and directories and they cannot be recovered. For example, to delete a file named file.txt, you would run the following command:
$ rm file.txt
To delete a directory, you need to use the -r (recursive) option. For example:
# simple file remove
$ rm [path/to/file]
# remove directory
$ rm -r [path/to/directory]
# remove multiple files
$ rm [path/to/file1] [path/to/file2]
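A few additional forms are sketched below in a temporary directory with hypothetical file names. The -f flag and the -- separator are not covered above: -f suppresses the error for a missing file, and -- marks the end of options so that a name starting with a dash is treated as a file.

```shell
base=$(mktemp -d)
cd "$base"
mkdir -p old/logs
touch old/logs/a.log keep.txt
touch -- -odd-name.txt

rm -- -odd-name.txt     # -- ends option parsing, so the name is not read as flags
rm -r old               # removes the directory and everything inside it
rm -f missing.txt       # -f: no error for a file that does not exist
left=$(ls)
echo "$left"
```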
wc
Count the number of lines, words, characters, and bytes in a file or input.
$ wc [options] [file]
For example:
$ wc pedigree.txt
Will look something like this:
397 4222 22566 pedigree.txt
- 397 is the number of lines.
- 4222 is the number of words.
- 22566 is the number of bytes (the same as the number of characters for plain ASCII text).
You can also pass in multiple files.
Important flags:
- -l, --lines - Print the number of lines.
- -w, --words - Print the number of words.
- -m, --chars - Print the number of characters.
- -c, --bytes - Print the number of bytes.
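The flags above can be checked against a small, made-up sample. When wc reads from standard input instead of a named file, it prints only the counts, which makes the result easy to capture in a variable.

```shell
text='one two three
four five'

printf '%s\n' "$text" | wc        # lines, words, and bytes
lines=$(printf '%s\n' "$text" | wc -l)
words=$(printf '%s\n' "$text" | wc -w)
bytes=$(printf '%s\n' "$text" | wc -c)
echo "lines=$lines words=$words bytes=$bytes"
```

Some implementations pad the counts with leading spaces, so compare them numerically rather than as strings.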
awk
The awk command is a powerful text processing tool in the bash shell. It is used to manipulate and extract information from text files based on patterns or conditions. The basic syntax for using awk is as follows:
$ awk 'program' file
where program is the set of instructions for processing the data and file is the name of the file you want to process.
In its simplest form, awk operates on each line of a file and performs actions based on the content of the line. For example, to print the first field of each line in a file, you would use the following awk command:
$ awk '{print $1}' file
awk also supports more complex processing, such as filtering data based on conditions and performing arithmetic operations. For example, to print the lines in a file that contain the string "error" and calculate the average value of the third field across those lines, you would use the following awk command:
$ awk '/error/ {print; sum+=$3; count++} END {if (count) print "Average:", sum/count}' file
awk is a highly flexible and versatile tool that can be used to perform a wide range of text processing tasks. Whether you need to extract information from log files, manipulate CSV data, or perform complex data processing, awk is an essential tool for any bash shell user.
For example, let's say you have this pedigree file:
9997515 5017534 7651593
9939593 5765357 6245551
9998766 5455357 7736346
9954672 5017534 6245551
9999762 5455357 6245551
To print out all lines of the file:
$ awk '{print}' pedigree.txt
If you wanted to see every line with a certain id in the first column, you could do this:
$ awk '{if ($1 == "9997515") print}' pedigree.txt
Outputs:
9997515 5017534 7651593
Imagine you wanted to find the id of an animal that had a particular sire and dam. You could do this:
$ awk '{if($2 == "5017534" && $3 == "7651593") {print}}' pedigree.txt
Outputs:
9997515 5017534 7651593
To print several columns, you can use multiple print statements; each print writes its output on a new line. In awk, $0 refers to the entire line and $1 to the first column. For example:
$ awk '{print $1; print $2}' pedigree.txt
will print each animal ID followed by its sire ID on the next line to stdout.
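A few more awk idioms, sketched on the same pedigree data. The column meanings (animal, sire, dam) follow the examples above; the per-dam count is an illustrative addition.

```shell
pedigree='9997515 5017534 7651593
9939593 5765357 6245551
9998766 5455357 7736346
9954672 5017534 6245551
9999762 5455357 6245551'

# A bare pattern with no action prints every matching line.
printf '%s\n' "$pedigree" | awk '$2 == "5017534"'

# A comma in print puts several fields on one line, separated by a space.
printf '%s\n' "$pedigree" | awk '{print $1, $2}'

# Count offspring per dam with an associative array, then report in END.
counts=$(printf '%s\n' "$pedigree" |
    awk '{n[$3]++} END {for (dam in n) print dam, n[dam]}' | sort)
echo "$counts"
```

The order of a for-in loop over an awk array is unspecified, which is why the result is piped through sort.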
join
The join command is a Unix utility that is used to combine two files based on the values in a common field, much like a relational database join operation. It is useful for merging data from two files into a single output, one line per matching key.
- Basic Usage: The basic syntax of the join command is:
$ join [OPTION]... FILE1 FILE2
where FILE1 and FILE2 are the two files to be joined, and [OPTION]... are optional flags that modify the behavior of the command.
By default, join combines the two files based on the first field in each file (delimited by whitespace). For example:
$ join file1.txt file2.txt
This will output the combined data, with each line representing a combination of a matching line from file1.txt and file2.txt.
- Options: join supports a variety of options that can be used to modify its behavior, including:
- -j: Specifies the field to use for joining the two files (default is 1).
- -t: Specifies the character used to separate fields in the input files (default is whitespace).
- -a: Specifies a file (1 or 2) whose unpairable lines should also be printed, that is, lines with no match in the other file.
- Input Format: join operates on plain text files, and assumes that each line of the input files contains one or more fields separated by a delimiter (whitespace by default). Both files must be sorted on the join field, and the join fields are compared as text.
Note: join is a powerful tool for combining data from two files, but it is limited to combining data based on a common field in each file. If you need to perform more complex data manipulations, consider using a tool like awk or sed.
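Because join expects sorted input, a typical workflow sorts both files on the key first. The sketch below uses two small, hypothetical files keyed on an animal ID: one listing sires and one listing call rates.

```shell
base=$(mktemp -d)
cd "$base"
# Hypothetical inputs: animal ID with sire, and animal ID with call rate.
printf '9997515 5017534\n9939593 5765357\n9998766 5455357\n' > sires.txt
printf '9998766 0.99\n9939593 0.18\n' > rates.txt

# join expects both inputs sorted on the join field.
sort -k1,1 sires.txt > sires.sorted
sort -k1,1 rates.txt > rates.sorted

join sires.sorted rates.sorted          # only IDs present in both files
join -a 1 sires.sorted rates.sorted     # also keep unmatched lines from file 1
joined=$(join sires.sorted rates.sorted)
```

Animal 9997515 has no call rate, so it appears only in the output of the -a 1 form.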
datamash
datamash is a command-line tool for performing simple numerical and statistical operations on text data, such as finding the sum, average, minimum, or maximum of a column of numbers in a text file.
- Basic Usage: The basic syntax of the datamash command is:
$ datamash [OPTION]... OPERATION COLUMN < [INPUT FILE]
where [OPTION]... are optional flags that modify the behavior of the command, OPERATION is the operation to be performed (such as sum, mean, min, or max), COLUMN is the number of the field to operate on, and [INPUT FILE] is the file containing the data. datamash reads its data from standard input, so the file is supplied with a redirect or a pipe.
For example, to find the sum of the first column consisting of numbers in a file data.txt, you could use the following command:
$ datamash sum 1 < data.txt
- Options: datamash supports a variety of options that can be used to modify its behavior, including:
- -t: Specifies the delimiter used to separate fields in the input (default is tab).
- -g: Specifies the field(s) to group by, so that the operation is performed separately for each group.
- -s: Sorts the input before grouping; grouping with -g expects rows of the same group to be adjacent.
- Input Format: datamash operates on plain text, and assumes that each line of the input contains one or more fields separated by a delimiter (default is tab). It provides both numeric and text operations, and can operate on any column of the data.
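Grouping is where datamash is most useful. The sketch below assumes a small, made-up tab-separated input with a group label in column 1 and a value in column 2. Because datamash is often not installed by default, the example checks for it first.

```shell
# Hypothetical tab-separated data: group label, then a value.
data=$(printf 'b\t4\na\t1\nb\t6\na\t3\n')

if command -v datamash > /dev/null; then
    # -s sorts the input, -g 1 groups by column 1;
    # sum 2 and mean 2 are both computed on column 2 for each group.
    result=$(printf '%s\n' "$data" | datamash -s -g 1 sum 2 mean 2)
    printf '%s\n' "$result"
else
    echo "datamash is not installed" >&2
fi
```

Each output line holds the group label followed by one value per requested operation, separated by tabs.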
sed
sed is a Unix stream editor that is used to perform text transformations on an input stream (such as a file or output from a command). sed is a powerful tool that can perform a wide variety of text transformations, including substitution, deletion, insertion, and search and replace operations.
- Basic Usage: The basic syntax of the sed command is:
$ sed [OPTION]... 'SCRIPT' [INPUT FILE]
where [OPTION]... are optional flags that modify the behavior of the command, 'SCRIPT' is a script that defines the transformations to be performed, and [INPUT FILE] is the file or input stream to be transformed (if no input file is specified, sed operates on the standard input).
For example, to perform a simple search and replace operation on a file file.txt, you could use the following command:
$ sed 's/old/new/g' file.txt
This will print the contents of file.txt to standard output with all occurrences of the string "old" replaced by the string "new". The file itself is not changed unless the -i option is used.
- Script Syntax: The script passed to sed as an argument defines the transformations to be performed. The script consists of a series of commands, each of which is applied to each line of the input. The most commonly used sed commands include:
- s/old/new/g: Substitutes all occurrences of "old" with "new" on each line of the input.
- d: Deletes the current line.
- a\text: Appends the specified text as a new line after the current line.
- i\text: Inserts the specified text as a new line before the current line.
- Options: sed supports a variety of options that can be used to modify its behavior, including:
- -n: Suppresses the default output of each line after it has been processed by the script.
- -e: Adds a script to the commands to be executed; repeat it to apply several scripts in one invocation.
- -i: Modifies the input file in place, rather than writing the output to standard output (GNU sed; BSD and macOS sed require a backup suffix argument after -i, which may be empty).
- Input Format: sed operates on plain text files or input streams, and can handle any type of text data, including numeric, string, and date/time data.
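The commands and options above can be compared side by side on a small, made-up sample. The examples read from standard input and avoid -i, so no file is modified.

```shell
text='old line one
keep this
old and old again'

# s/old/new/ replaces only the first match on each line; g replaces all.
printf '%s\n' "$text" | sed 's/old/new/'
printf '%s\n' "$text" | sed 's/old/new/g'

# d deletes lines matching the address; -n with p prints only matches.
printf '%s\n' "$text" | sed '/keep/d'
printf '%s\n' "$text" | sed -n '/keep/p'

# -e chains several commands in one invocation.
result=$(printf '%s\n' "$text" | sed -e '/keep/d' -e 's/old/new/g')
printf '%s\n' "$result"
```

A pattern between slashes before a command, such as /keep/, restricts that command to the lines that match it.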