Shell Basics

Bash is a powerful command-line shell that provides users with a wide range of functionality on Linux and other Unix-like systems. Whether you're a seasoned developer or new to Linux, learning the basics of bash is essential to getting the most out of your Linux experience. This guide walks you through the most common bash commands and gives you a solid foundation to build on as you continue to learn. By the end, you'll be able to navigate the file system, manage files and directories, and perform a variety of other tasks from the command line. Let's dive in!

A very useful reference (and recommended reading!) is Google's Shell Style Guide: https://google.github.io/styleguide/shellguide.html

cd

Allows you to change the current working directory.

The cd (change directory) command is used to navigate the file system in the bash shell. You can use it to move to different directories on your system. For example, to move to the home directory, you would run the following command:

$ cd ~

To move to a specific directory, you need to provide the path to that directory. For example:

$ cd /var/www/html
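To try cd safely, the sketch below builds a throwaway directory tree with mktemp -d. That helper is not covered above, and the tree is only an assumption of the example, mirroring /var/www/html. It then uses pwd to confirm where each cd lands.

```shell
#!/usr/bin/env bash
# Scratch tree that mirrors /var/www/html, so nothing real is touched.
base=$(mktemp -d)
mkdir -p "$base/var/www/html"

# Move to a specific directory by giving its full path.
cd "$base/var/www/html"
web_dir=$(pwd)

# Move to the home directory (plain `cd` with no argument does the same).
cd ~
home_dir=$(pwd)

echo "$web_dir"
echo "$home_dir"
```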

man

The man (manual) command is used to display the manual pages for a specific command in the bash shell. The manual pages contain detailed information about the command, including its syntax, options, and usage examples. To display the manual page for a command, you simply need to type man followed by the command name. For example, to view the manual page for the ls command, you would run the following command:

$ man ls

Each manual page is divided into standard headings, and each heading provides a different type of information about the command. Some of the most common include:

NAME: a brief description of the command
SYNOPSIS: the syntax for using the command
DESCRIPTION: a detailed explanation of the command and its options
OPTIONS: a list of the available options for the command
EXAMPLES: examples of how to use the command

The manual pages are a valuable resource for learning about the various commands available in the bash shell, and are a must-know tool for anyone looking to become proficient with the Linux command line.

& and wait

In bash, the & symbol and the wait command are used to manage the execution of commands and to run commands in parallel.

& symbol: The & symbol is used to run a command in the background, allowing the shell to continue executing other commands while the background command is running. For example:

$ command1 &
$ command2

In this example, command1 runs in the background and command2 starts immediately, without waiting for command1 to finish.

wait: The wait command pauses until background commands have finished before running subsequent commands. For example:

$ command1 &
$ wait
$ command2

In this example, command1 runs in the background, and wait blocks until it finishes before command2 runs.

Parallel processing: Parallel processing refers to the execution of multiple commands simultaneously, in order to reduce the total time required to complete all of the commands. In bash, you can use the & symbol and wait to run commands in the background and control the order in which they are executed. For example:

$ command1 &
$ command2 &
$ wait
$ command3

In this example, command1 and command2 are run in parallel in the background, and wait is used to wait for both commands to finish before executing command3.
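The pattern above can be tried with two stand-in jobs. In this sketch each job is a subshell that sleeps for one second, a placeholder for real work, and then writes a hypothetical result file. Because both jobs run in the background, the script takes about one second rather than two.

```shell
#!/usr/bin/env bash
# Scratch directory for the jobs' output files.
out=$(mktemp -d)

# Two stand-in "slow" commands, each started in the background with &.
( sleep 1; echo "result from job 1" > "$out/job1.txt" ) &
( sleep 1; echo "result from job 2" > "$out/job2.txt" ) &

# Block until every background job started by this shell has exited.
wait

# Both files are guaranteed to exist now, so reading them is safe.
results=$(cat "$out/job1.txt" "$out/job2.txt")
echo "$results"
```

Without the wait line, cat could run before either job had written its file.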

Note: Running commands in parallel can be a powerful technique to improve performance, but it can also increase the complexity of your scripts and make it more difficult to manage the execution order of commands. Use parallel processing with caution and make sure to test your scripts thoroughly before deploying them in a production environment.

. and ..

In Unix-like operating systems, . and .. are special directory entries that are used for navigation between directories.

.: The . (dot) entry refers to the current directory. For example, if you are in the /home/user directory, the . entry would refer to /home/user.
..: The .. (double dot) entry refers to the parent directory. For example, if you are in the /home/user/docs directory, the .. entry would refer to /home/user.

These entries can be used to navigate between directories, either with the cd command or as part of a longer path (for example, ../notes.txt refers to notes.txt in the parent directory). For example, to move from the current directory to its parent directory, you could use the following command:

$ cd ..

Note that . and .. are relative paths, meaning that what they refer to depends on the current directory. To find the absolute path that .. points to, you can run cd .. and then pwd. Where the realpath utility is available, realpath .. prints it directly.
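Here is a short sketch of . and .. in action. It builds a temporary tree standing in for /home/user/docs and prints where each command lands. The mktemp -d scratch location is an assumption of the example, not part of the text above.

```shell
#!/usr/bin/env bash
# Scratch tree standing in for /home/user/docs.
base=$(mktemp -d)
mkdir -p "$base/user/docs"
cd "$base/user/docs"

cd ..              # .. is the parent directory
parent=$(pwd)

cd ./docs          # . is the current directory, so ./docs is the same as docs
child=$(pwd)

ls_parent=$(ls ..) # list the parent directory without moving into it

echo "$parent"
echo "$child"
echo "$ls_parent"
```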

~

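~ expands to your home directory, which is typically /home/username on Linux and /Users/username on macOS. It is shorthand for the value of the $HOME variable.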
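The shell performs this expansion before the command runs, and only when ~ is unquoted. In this small sketch, Documents is just an example directory name, and nothing is created.

```shell
#!/usr/bin/env bash
# Unquoted, ~ is expanded by the shell to the value of $HOME.
expanded=$(echo ~)

# Inside quotes no expansion happens: echo receives a literal "~".
literal=$(echo "~")

# ~/name expands to a path inside your home directory ("Documents" is an example).
docs_path=$(echo ~/Documents)

echo "$expanded"
echo "$literal"
echo "$docs_path"
```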

*

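In bash, the * (asterisk) is a wildcard character that matches zero or more characters in a file or directory name. The shell replaces the pattern with the list of matching names before the command runs. By default, * does not match names that begin with a dot (hidden files).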

Usage: The * wildcard is used in many contexts, including file name expansion, pattern matching, and command line arguments. For example, to list all of the files in the current directory that have a .txt extension, you could use the following command:

$ ls *.txt

In this example, the * wildcard matches zero or more characters in the file name, and the resulting list of files will include all files in the current directory that have names ending in .txt.

Combining with other characters: The * wildcard can also be combined with other characters to match more specific patterns. For example:

$ ls file*

In this example, the * wildcard matches zero or more characters that come after the string file, and the resulting list of files will include all files in the current directory that start with the string file.

Note: The wildcard character * is a powerful tool that can be used to perform operations on a large number of files or directories, but it can also be dangerous if used improperly. Be careful when using wildcards, especially when executing commands that modify or delete files, as it's possible to inadvertently affect more files than intended.
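Because the shell expands the wildcard, not ls, you can see exactly what a pattern matches by handing it to echo. The sketch below uses a scratch directory with a few made-up file names. Prefixing a destructive command with echo is a cheap way to preview it, in the spirit of the note above.

```shell
#!/usr/bin/env bash
# Scratch directory with made-up file names.
demo=$(mktemp -d)
cd "$demo"
touch file1.txt file2.txt file3.csv notes.txt

txt_files=$(echo *.txt)      # names ending in .txt
file_prefixed=$(echo file*)  # names starting with "file"

echo "$txt_files"
echo "$file_prefixed"

# Preview a destructive command: this only prints the rm command line
# with the expanded names; nothing is deleted.
echo rm *.txt
```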

ls

ls: The ls (list) command is used to display the contents of a directory. By default, it shows the files and directories in the current directory. For example:

$ ls

You can also use the ls command to view the contents of a specific directory. For example:

$ ls /var/www/html

Running ls with the -s flag also shows how much disk space each file occupies, counted in blocks. GNU ls uses 1 KiB blocks by default, while other versions may use 512-byte blocks.

$ ls -s

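The -alth combination shows a detailed listing. Here -a includes hidden files, and -l uses the long format (permissions, owner, group, size, and modification time). The -t flag sorts by modification time with the newest first, and -h prints sizes in human-readable units such as K and M: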

$ ls -alth
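Hidden files (names starting with a dot) only appear with -a. The quick check below runs in a scratch directory with made-up file names. When ls writes to a pipe or a variable rather than a terminal, it prints one name per line.

```shell
#!/usr/bin/env bash
# Scratch directory with one visible and one hidden (made-up) file.
demo=$(mktemp -d)
cd "$demo"
touch visible.txt .hidden-config

plain=$(ls)           # dot files are skipped by default
everything=$(ls -a)   # -a also lists ., .., and dot files

echo "$plain"
echo "$everything"
```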

cat

The cat (concatenate) command is a basic Unix utility that is used to concatenate and display the contents of one or more files. The most common use cases of cat include:

Display the contents of a file: To display the contents of a file, simply run cat followed by the file name:

$ cat file.txt

This will print the contents of file.txt to the standard output (typically, the terminal).

Concatenate multiple files: cat can also be used to concatenate multiple files into a single output. For example:

$ cat file1.txt file2.txt > output.txt

In this example, the contents of file1.txt and file2.txt are concatenated into a single output file output.txt.

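Combine standard input and files: when cat is given no file name (or the special name -), it reads standard input, which is the text you type on the keyboard. For example, to save what you type to a file:

$ cat > output.txt

Press Ctrl+D on an empty line to finish. As with any > redirection, an existing output.txt is overwritten (use >> to append instead). To combine typed input with existing files, place - where the input should go, as in cat header.txt - footer.txt > output.txt (these file names are only examples).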

Note: The cat command is simple but versatile, and it is widely used on Unix-like systems. Be careful with very large files, though. Printing one floods the terminal (head or less are better for a quick look), and redirecting output with > silently overwrites the target file.
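The uses above can be combined in one short sketch. The file names are hypothetical and live in a scratch directory, and - marks where standard input is read.

```shell
#!/usr/bin/env bash
# Scratch directory with two small, made-up input files.
dir=$(mktemp -d)
printf 'line from file1\n' > "$dir/file1.txt"
printf 'line from file2\n' > "$dir/file2.txt"

# Concatenate two files into a new file (> overwrites output.txt if it exists).
cat "$dir/file1.txt" "$dir/file2.txt" > "$dir/output.txt"

# Append (>>) file1 followed by whatever arrives on standard input ("-").
echo "line from stdin" | cat "$dir/file1.txt" - >> "$dir/output.txt"

combined=$(cat "$dir/output.txt")
echo "$combined"
```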

column

The column tool is a Unix command line utility that is used to format and display tabular data in a readable and organized manner.

Here's a simple explanation of how column works:

Input: The column command takes its input from either a file or standard input. In table mode (-t), each line is split into fields on whitespace by default, or on the character(s) given with -s, such as a comma.
Output: In table mode, column pads each field with spaces so that every column lines up.
Options: The column command has several options that can be used to customize its output. Some of the most commonly used options are:

-t: Format the input as a table, lining its fields up in columns.
-s: Specify the character(s) that separate fields in the input (used together with -t).
-c: Set the total width of the output, in characters.
-n: This is not a numeric option. Its meaning differs between implementations of column, so check man column on your system before using it.

Usage: Here is an example of how you can use the column command to format and display tabular data:

$ cat genotype_report.txt
ID CallRate #AA #AB #BB #NC %AA %AB %BB %NC
27813162 0.185837 141 192 142 2081 0.06 0.08 0.06 0.81
30282025 0.185837 115 240 120 2081 0.04 0.09 0.05 0.81
21928702 0.996870 596 951 1001 8 0.23 0.37 0.39 0.00
27942765 0.241393 142 292 183 1939 0.06 0.11 0.07 0.76

The fields do not line up, which makes the columns hard to follow. Piping the file through column -t aligns them:

$ cat genotype_report.txt | column -t
ID        CallRate  #AA  #AB  #BB   #NC   %AA   %AB   %BB   %NC
27813162  0.185837  141  192  142   2081  0.06  0.08  0.06  0.81
30282025  0.185837  115  240  120   2081  0.04  0.09  0.05  0.81
21928702  0.996870  596  951  1001  8     0.23  0.37  0.39  0.00
27942765  0.241393  142  292  183   1939  0.06  0.11  0.07  0.76
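For comma-separated data, combine -t with -s ','. This sketch saves three of the columns from the report above as a small CSV in a temporary file; the file itself is an assumption of the example. Exact spacing between columns can differ slightly between column implementations, so the checks only verify that the commas are gone and the fields line up.

```shell
#!/usr/bin/env bash
# A small CSV built from three columns of the report above.
csv=$(mktemp)
cat > "$csv" <<'EOF'
ID,CallRate,#NC
27813162,0.185837,2081
21928702,0.996870,8
EOF

# -t: build a table; -s ',': split input fields on commas instead of whitespace.
aligned=$(column -t -s ',' "$csv")
echo "$aligned"
```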

find

The find command is a powerful tool in Unix-like operating systems that is used to search for files and directories based on certain criteria.

Here's a simple explanation of how find works:

Syntax: The basic syntax of the find command is:

$ find [path] [expression]

Where [path] is the location to start the search, and [expression] is a set of options and tests used to filter the search results.

Options: The find command has several options that can be used to customize the search results. Some of the most commonly used options are:

-name: Search for files and directories with a specific name.
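-type: Search for a specific type of entry, such as f (regular file), d (directory), or l (symbolic link).
-mtime: Search for files by modification time, measured in days (for example, -mtime -7 matches files modified within the last 7 days).
-exec: Run a command on each match. {} stands for the matched path, and the command ends with \; (or with + to pass many paths at once).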

Usage: Here is an example of how you can use the find command to search for files with a specific name in the current directory and all of its subdirectories:

$ find . -name "*.txt"
./file1.txt
./dir1/file2.txt
./dir2/file3.txt

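In this example, find starts in the current directory (.), descends into every subdirectory, and lists the entries whose names match *.txt. The pattern is quoted so that the shell passes it to find unexpanded. Without the quotes, the shell could replace *.txt with matching names from the current directory before find ever sees it.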

Note: The find command is a very powerful and flexible tool, and it has many more options and capabilities than this simple explanation covers. To learn more, run man find.
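The sketch below recreates the directory layout from the example in a scratch directory and adds a made-up notes.md. It then shows -type and -exec alongside -name. The {} + form of -exec hands the matched paths to the command in batches rather than one at a time.

```shell
#!/usr/bin/env bash
# Scratch tree matching the example output above, plus a made-up notes.md.
root=$(mktemp -d)
mkdir -p "$root/dir1" "$root/dir2"
touch "$root/file1.txt" "$root/dir1/file2.txt" "$root/dir2/file3.txt" "$root/dir2/notes.md"
cd "$root"

# Regular files (-type f) whose names match *.txt, sorted for stable output.
txt_files=$(find . -type f -name "*.txt" | sort)
echo "$txt_files"

# Delete every .md file; {} + passes the matched paths to a single rm.
find . -type f -name "*.md" -exec rm {} +
leftover_md=$(find . -name "*.md")
```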

grep

grep: The grep command is used to search for patterns in text files in the bash shell. It is a powerful tool for searching through large amounts of text data, making it ideal for tasks such as finding specific information in log files or code. The basic syntax for using grep is as follows:

$ grep [pattern] [file]

where pattern is the pattern you want to search for and file is the name of the file you want to search in. For example:

$ grep "error" logfile.log

This prints every line of logfile.log that contains the string error. The match is case-sensitive, and it also matches inside longer words such as errors; add -w to match whole words only.

grep has many options that allow you to customize your search, such as searching multiple files, searching recursively through directories, ignoring case, and listing only the names of matching files. Some common options include:

-r: search recursively through directories
-i: perform a case-insensitive search
-l: list the names of the files that contain the matching pattern, but not the actual lines that match.
-c: count the number of matching lines.
-n: print the line number of each match.

With grep, you can quickly find the information you need in large text data sets, making it a valuable tool for any bash shell user.
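The sketch below uses a tiny made-up log to show how -i, -c, -n, -r, and -l change the result. The log lines and file names are invented for the example.

```shell
#!/usr/bin/env bash
# Scratch directory with an invented log and a second file without errors.
logdir=$(mktemp -d)
cat > "$logdir/logfile.log" <<'EOF'
INFO starting job
ERROR disk full
info retrying
error permission denied
EOF
printf 'all good\n' > "$logdir/other.log"

exact=$(grep -c "error" "$logdir/logfile.log")       # case-sensitive count: 1
any_case=$(grep -ci "error" "$logdir/logfile.log")   # -i also matches ERROR: 2
numbered=$(grep -n "error" "$logdir/logfile.log")    # line number, then the line
matching_files=$(grep -ril "error" "$logdir")        # only logfile.log matches

echo "$exact $any_case"
echo "$numbered"
echo "$matching_files"
```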

stdout, stdin, and pipes

stdout, stdin, and | (pipes): In the bash shell, stdout (standard output) and stdin (standard input) are two streams of data that are used by the shell and its commands to exchange information. stdout is the default destination for the output of a command. When you run a command in the bash shell, the result is sent to stdout and displayed on the screen. For example, when you run the ls command, the list of files in the current directory is sent to stdout and displayed on the screen.

stdin is the default source of input for a command. When a command requires input, it reads the data from stdin. For example, if you run sort without a file name, it waits for you to type lines on stdin and sorts them once you press Ctrl+D.

Pipes are a powerful feature of the bash shell that allow you to connect the stdout of one command to the stdin of another command. This allows you to chain multiple commands together to perform complex processing tasks. For example, to sort the output of the ls command and display the result, you would use the following pipeline:

$ ls | sort

In this pipeline, the ls command sends its output to stdout, which is then redirected to the stdin of the sort command. The sort command sorts the data and sends its output to stdout, which is then displayed on the screen.

Because each command in a pipeline only needs to read stdin and write stdout, small tools can be combined freely. This makes pipes one of the most useful features to learn on the Linux command line.

To print a pedigree file to stdout and then pipe it into the column formatting tool:

$ cat pedigree.txt | column -t
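A slightly longer pipeline counts how many distinct sires appear in column 2, using the pedigree data from the awk section saved to a temporary file. The last command also shows <, which connects a file to a command's stdin without cat.

```shell
#!/usr/bin/env bash
# Temporary copy of the example pedigree (animal, sire, dam).
ped=$(mktemp)
cat > "$ped" <<'EOF'
9997515 5017534 7651593
9939593 5765357 6245551
9998766 5455357 7736346
9954672 5017534 6245551
9999762 5455357 6245551
EOF

# cut writes column 2 to stdout; sort -u reads it on stdin and drops
# duplicates; wc -l counts the remaining lines (tr strips padding).
unique_sires=$(cut -d ' ' -f 2 "$ped" | sort -u | wc -l | tr -d ' ')

# < feeds the file to sort's stdin; head keeps the first sorted line.
first_sorted=$(sort < "$ped" | head -n 1)

echo "$unique_sires"
echo "$first_sorted"
```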

head

Prints the first lines of a file or of standard input (10 lines by default).

$ head [file]

Use the -n flag to choose how many lines to print:

$ head -n 5 [file]
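head also works on standard input, which makes it handy at the end of a pipeline. This sketch uses seq, which prints a range of numbers, as a stand-in input.

```shell
#!/usr/bin/env bash
# seq 1 100 prints the numbers 1 to 100, one per line.
first_three=$(seq 1 100 | head -n 3)

# Without -n, head prints 10 lines (tr strips padding from wc's output).
default_count=$(seq 1 100 | head | wc -l | tr -d ' ')

echo "$first_three"
echo "$default_count"
```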

mkdir

Creates a new directory.

$ mkdir [my-directory]
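When the parent directories don't exist yet, mkdir -p creates the whole chain, and it does not complain if the directory is already there. The sketch below works in a scratch location, and the directory names are placeholders.

```shell
#!/usr/bin/env bash
# Scratch location so the example creates nothing permanent.
base=$(mktemp -d)

mkdir "$base/my-directory"               # a single new directory
mkdir -p "$base/projects/raw/data"       # creates projects/ and raw/ as needed
mkdir -p "$base/projects/raw/data"       # already exists: no error with -p
```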

mv

Used to move files and directories. Can also rename files and directories.

# move files/directories to a new destination
$ mv [files/directories] [destination]
# rename file/directory
$ mv [file/directory] [new name]
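This sketch shows both forms of mv, using made-up file names in a scratch directory. When several items are moved at once, the destination has to be an existing directory.

```shell
#!/usr/bin/env bash
# Scratch directory with two made-up files and a target directory.
base=$(mktemp -d)
cd "$base"
touch report.txt notes.txt
mkdir archive

# Move two files into an existing directory.
mv report.txt notes.txt archive/

# Rename a file (here, one that now lives inside archive/).
mv archive/notes.txt archive/notes-old.txt
```

mv overwrites an existing file at the destination without asking; add -i to be prompted first.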

cp

Copies files. To copy a directory along with everything inside it, add the -r (recursive) flag.

$ cp [files] [destination]
$ cp -r [directories] [destination]
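This sketch copies a single file and then a whole directory. Note the -r needed for the directory. The file names are placeholders in a scratch location.

```shell
#!/usr/bin/env bash
# Scratch directory with a made-up results folder.
base=$(mktemp -d)
cd "$base"
mkdir -p results
printf 'data\n' > results/run1.txt

cp results/run1.txt run1-copy.txt   # copy one file under a new name
cp -r results results-backup        # copy a directory and its contents
```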

pwd

pwd (print working directory) prints the absolute path of the current working directory.

$ pwd

sudo

sudo runs a single command with administrative (root) privileges. If a command fails because you lack permission, put sudo in front of it. sudo prompts for your own password, not root's, and it only works if your account has been granted sudo rights.

$ sudo [command]

rm

rm: The rm (remove) command is used to delete files and directories in the bash shell. Be cautious when using it: rm does not move files to a trash folder, so deleted files and directories generally cannot be recovered. For example, to delete a file named file.txt, you would run the following command:

$ rm file.txt

To delete a directory and its contents, use the -r (recursive) option. Common forms:

# simple file remove
$ rm [path/to/file]
# remove directory
$ rm -r [path/to/directory]
# remove multiple files
$ rm [path/to/file1] [path/to/file2]
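This sketch runs the same three forms against throwaway files in a scratch directory, so nothing important is at risk.

```shell
#!/usr/bin/env bash
# Scratch directory with throwaway files and a nested directory.
base=$(mktemp -d)
cd "$base"
touch file1.txt file2.txt lone.txt
mkdir -p old-results/run1
touch old-results/run1/output.txt

rm lone.txt              # simple file remove
rm file1.txt file2.txt   # several files at once
rm -r old-results        # a directory and everything inside it
```

Adding -i makes rm ask before each deletion, which is a useful safety net while you are learning.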

wc

Count the number of lines, words, characters, and bytes in a file or input.

$ wc [options] [file]

For example:

$ wc pedigree.txt

Will look something like this:

397 4222 22566 pedigree.txt

397 is the number of lines.
4222 is the number of words.
22566 is the number of bytes.

You can also pass in multiple files; wc then prints one line per file followed by a total.

Important flags:

-l, --lines - Print the number of lines.
-w, --words - Print the number of words.
-m, --chars - Print the number of characters.
-c, --bytes - Print the number of bytes.
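A tiny input whose counts are easy to check by hand makes the flags concrete. The sketch reads the file through < rather than naming it, so wc prints just the number without a file name. The arithmetic expansion $(( )) strips the padding that some wc versions add.

```shell
#!/usr/bin/env bash
# Two lines, three words, six bytes: "a b\n" (4 bytes) + "c\n" (2 bytes).
sample=$(mktemp)
printf 'a b\nc\n' > "$sample"

lines=$(( $(wc -l < "$sample") ))
words=$(( $(wc -w < "$sample") ))
bytes=$(( $(wc -c < "$sample") ))

echo "$lines lines, $words words, $bytes bytes"
```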

awk

awk: awk is a powerful text-processing language available on Unix-like systems. It is a separate program that you run from bash, not part of bash itself. It is used to manipulate and extract information from text files based on patterns or conditions. The basic syntax for using awk is as follows:

$ awk 'program' file

where program is the set of instructions for processing the data and file is the name of the file you want to process.

In its simplest form, awk operates on each line of a file and performs actions based on the content of the line. For example, to print the first field of each line in a file, you would use the following awk command:

$ awk '{print $1}' file

awk also supports more complex processing, such as filtering lines by pattern and performing arithmetic. For example, to compute the average of the third field over only the lines that contain error, you would use the following awk command:

$ awk '/error/ {sum+=$3; count++} END {if (count > 0) print "Average:", sum/count}' file

The if (count > 0) check avoids dividing by zero when no line matches.

awk is a highly flexible and versatile tool that can be used to perform a wide range of text processing tasks. Whether you need to extract information from log files, manipulate CSV data, or perform complex data processing, awk is an essential tool for any bash shell user.

For example, let's say you have this pedigree file, pedigree.txt, whose three columns are the animal ID, the sire ID, and the dam ID:

9997515 5017534 7651593
9939593 5765357 6245551
9998766 5455357 7736346
9954672 5017534 6245551
9999762 5455357 6245551

To print out all lines of the file:

$ awk '{print}' pedigree.txt

If you wanted to see every line with a certain id in the first column, you could do this:

$ awk '{if ($1 == "9997515") print}' pedigree.txt

Outputs:

9997515 5017534 7651593

Imagine you wanted to find the id of an animal that had a particular sire and dam. You could do this:

$ awk '{if($2 == "5017534" && $3 == "7651593") {print}}' pedigree.txt

Outputs:

9997515 5017534 7651593

Each print statement writes its own output line. In awk, $0 refers to the whole line, $1 to the first field, $2 to the second, and so on. For example:

$ awk '{print $1; print $2}' pedigree.txt

prints each animal's ID on one line and its sire's ID on the next.
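The pedigree queries above can be run end to end on a temporary copy of the example file. The sketch also shows two idioms not used above. A bare condition such as $2 == "5455357" acts as a filter without an explicit if, and an associative array counts offspring per dam.

```shell
#!/usr/bin/env bash
# Temporary copy of the example pedigree (animal, sire, dam).
ped=$(mktemp)
cat > "$ped" <<'EOF'
9997515 5017534 7651593
9939593 5765357 6245551
9998766 5455357 7736346
9954672 5017534 6245551
9999762 5455357 6245551
EOF

# Animal IDs (column 1) whose sire (column 2) is 5455357.
by_sire=$(awk '$2 == "5455357" { print $1 }' "$ped")

# Offspring per dam (column 3); sorted because awk's for-in order is unspecified.
per_dam=$(awk '{ n[$3]++ } END { for (d in n) print d, n[d] }' "$ped" | sort)

echo "$by_sire"
echo "$per_dam"
```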

join

The join command is a Unix utility that is used to combine two files based on the values in a common field, much like a relational database join operation. It is useful for merging data from two files into a single output, matching lines on a field the files share.

Basic Usage: The basic syntax of the join command is:

$ join [OPTION]... FILE1 FILE2

where FILE1 and FILE2 are the two files to be joined, and [OPTION]... are optional flags that modify the behavior of the command.

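By default, join combines the two files based on the first field in each file (fields are separated by blanks). Both files must be sorted on the join field, for example with sort -k1,1. Otherwise join can miss matches or report that a file is not in sorted order. For example: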

$ join file1.txt file2.txt

This will output the combined data, with each line representing a combination of a matching line from file1.txt and file2.txt.

Options: join supports a variety of options that can be used to modify its behavior, including:

-j: Specifies the field to use for joining the two files (default is 1). Use -1 and -2 instead to pick a different field in file 1 and file 2.
-t: Specifies the character used to separate fields in the input files (by default, fields are separated by runs of blanks).
-a: Followed by 1 or 2, also prints the lines from that file that have no match in the other file.

Input Format: join operates on plain text files, and assumes that each line of the input files contains one or more fields separated by a delimiter (blanks by default). Fields are compared as text, not as numbers or dates, so sort both files the same way before joining. With -1, -2, or -j you can join on any field.

Note: join is a powerful tool for combining data from two files, but it is limited to combining data based on a common field in each file. If you need to perform more complex data manipulations, consider using a tool like awk or sed.
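As a sketch, the pedigree from the awk section can be split into a hypothetical sires file and a dams file that share the animal ID in field 1, and join puts them back together. Both inputs are passed through sort first, because join expects sorted input. Only some animals have a dam listed, which shows what -a 1 changes.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)

# Hypothetical "animal sire" file, sorted on field 1.
printf '9997515 5017534\n9939593 5765357\n9998766 5455357\n9954672 5017534\n9999762 5455357\n' \
  | sort -k1,1 > "$dir/sires.txt"

# Hypothetical "animal dam" file with only three animals, sorted on field 1.
printf '9997515 7651593\n9939593 6245551\n9999762 6245551\n' \
  | sort -k1,1 > "$dir/dams.txt"

# Default join: only animals present in both files.
joined=$(join "$dir/sires.txt" "$dir/dams.txt")

# -a 1 also keeps animals from file 1 that have no dam entry.
all_sires=$(join -a 1 "$dir/sires.txt" "$dir/dams.txt")

echo "$joined"
echo "$all_sires"
```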

datamash

datamash is a command-line tool for performing simple numerical and statistical operations on text data, such as finding the sum, average, minimum, or maximum of a column of numbers in a text file.

Basic Usage: The basic syntax of the datamash command is:

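$ datamash [OPTION]... OPERATION FIELD [OPERATION FIELD]... < [INPUT FILE]

where [OPTION]... are optional flags that modify the behavior of the command, OPERATION is the operation to be performed (such as sum, mean, min, or max), FIELD is the number of the column it applies to, and [INPUT FILE] is the file containing the data. datamash reads its data from standard input, so the file is supplied with <.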

For example, to find the sum of the first column consisting of numbers in a file data.txt, you could use the following command:

$ datamash sum 1 < data.txt

Options: datamash supports a variety of options that can be used to modify its behavior, including:

-t: Specifies the delimiter used to separate fields in the input file (default is tab).
-W: Treats any run of spaces and tabs as the delimiter.
-g: Specifies the grouping field(s) used to perform operations on a subset of the data.
-s: Sorts the input before grouping, so that -g works correctly on unsorted data.

Input Format: datamash operates on plain text, and assumes that each line of the input contains one or more fields separated by a delimiter (default is tab). Operations such as sum and mean need numeric fields, while operations such as count, unique, first, and last also work on text. You can apply an operation to any column of the input.

sed

sed is a Unix stream editor that is used to perform text transformations on an input stream (such as a file or output from a command). sed can perform a wide variety of text transformations, including search and replace (substitution), deleting lines, and inserting or appending lines.

Basic Usage: The basic syntax of the sed command is:

$ sed [OPTION]... 'SCRIPT' [INPUT FILE]

where [OPTION]... are optional flags that modify the behavior of the command, 'SCRIPT' is a script that defines the transformations to be performed, and [INPUT FILE] is the file or input stream to be transformed (if no input file is specified, sed operates on the standard input).

For example, to perform a simple search and replace operation on a file file.txt, you could use the following command:

$ sed 's/old/new/g' file.txt

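This replaces every occurrence of the string "old" with "new" on each line and prints the result to standard output. The g flag means every match on a line, not just the first. file.txt itself is not changed unless you use -i.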

Script Syntax: The script passed to sed as an argument defines the transformations to be performed. The script consists of a series of commands, each of which is applied to each line of the input file. The most commonly used sed commands include:

s/old/new/g: Substitutes all occurrences of "old" with "new" on each line of the input.
d: Deletes the line, usually combined with an address, as in /pattern/d or 3d.
a\text: Appends the specified text as a new line after the current line.
i\text: Inserts the specified text as a new line before the current line.

Options: sed supports a variety of options that can be used to modify its behavior, including:

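-n: Suppresses the default printing of each line, so only lines printed explicitly (for example with the p command) appear.
-e: Adds a script to run; repeat -e to pass several commands, which are applied in order.
-i: Modifies the input file in place, rather than writing the output to standard output. GNU sed accepts a bare -i, while BSD/macOS sed requires a backup-suffix argument (use -i '' for no backup).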

Input Format: sed operates on plain text files or input streams and treats every line as text. It has no built-in notion of numbers or dates.
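The sketch below exercises the commands and options above on piped text rather than a file, so nothing is modified in place. The input lines are made up.

```shell
#!/usr/bin/env bash
# Made-up input lines.
input='old value
keep old old
drop me'

# Two scripts via -e: replace every "old" on each line, then delete lines matching /drop/.
edited=$(printf '%s\n' "$input" | sed -e 's/old/new/g' -e '/drop/d')

# -n turns off automatic printing; 2p prints only the second line.
second=$(printf '%s\n' "$input" | sed -n '2p')

echo "$edited"
echo "$second"
```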

Helical CLI

Linux, Mac, or Windows?