<p><em>Unix Command Line for the Molecular Ecologist</em>, Joe Wan, 2017-01-13</p>
<p>Basic familiarity with the Unix command line opens up a whole world of bioinformatics
tools and analysis opportunities. This guide introduces the most important
commands and concepts you’ll need to get started.</p>
<h2 id="quick-cheat-sheet">Quick cheat sheet</h2>
<h3 id="navigating-the-file-system">Navigating the file system</h3>
<ul>
<li>Change directories: <code class="highlighter-rouge">cd [PATH]</code></li>
<li>List files in directory: <code class="highlighter-rouge">ls</code> (current directory) or <code class="highlighter-rouge">ls [PATH]</code> (another directory)</li>
<li>See file sizes in a directory: <code class="highlighter-rouge">ls -lh [PATH]</code></li>
<li>Print current directory: <code class="highlighter-rouge">pwd</code></li>
<li>Make a directory: <code class="highlighter-rouge">mkdir [PATH]</code></li>
<li>Remove an empty directory: <code class="highlighter-rouge">rmdir [PATH]</code></li>
</ul>
<h3 id="working-with-files">Working with files</h3>
<p>Basic operations:</p>
<ul>
<li>Move/rename a file: <code class="highlighter-rouge">mv [FILE] [NEW PATH]</code></li>
<li>Delete a file (permanently!): <code class="highlighter-rouge">rm [FILE]</code></li>
<li>Delete an entire directory (permanently!): <code class="highlighter-rouge">rm -r [DIRECTORY]</code></li>
<li>Copy a file: <code class="highlighter-rouge">cp [FILE] [NEW PATH]</code></li>
<li>Copy an entire directory: <code class="highlighter-rouge">cp -r [DIRECTORY] [NEW PATH]</code></li>
</ul>
<p>Useful tricks:</p>
<ul>
<li>Download a file: <code class="highlighter-rouge">wget -O [OUTPUT FILE] [URL]</code> or <code class="highlighter-rouge">curl -o [OUTPUT FILE] [URL]</code> (<code class="highlighter-rouge">wget</code> may not be available in OS X)</li>
<li>View a file: <code class="highlighter-rouge">head -[NUMBER OF LINES] [FILE]</code> or <code class="highlighter-rouge">less [FILE]</code></li>
<li>Create a blank file: <code class="highlighter-rouge">touch [FILE]</code></li>
<li>Edit a file: <code class="highlighter-rouge">nano [FILE]</code></li>
<li>Search within a file: <code class="highlighter-rouge">grep [QUERY] [FILE]</code> or <code class="highlighter-rouge">grep -B [BEFORE] -A [AFTER] [QUERY] [FILE]</code></li>
<li>Create a symbolic link to a file: <code class="highlighter-rouge">ln -s [FILE] [NEW LINK]</code></li>
<li>Wildcards: <code class="highlighter-rouge">*</code> matches any number of characters in a path; <code class="highlighter-rouge">?</code> matches one</li>
</ul>
<h3 id="working-with-programs">Working with programs</h3>
<ul>
<li>General syntax: <code class="highlighter-rouge">[PROGRAM NAME OR PATH] [ARGUMENT 1] [ARGUMENT 2] [...]</code> (quote arguments if they have spaces)</li>
<li>For help: <code class="highlighter-rouge">[PROGRAM] -h</code> or <code class="highlighter-rouge">[PROGRAM] --help</code> or <code class="highlighter-rouge">man [PROGRAM]</code></li>
<li>Redirecting output to file: <code class="highlighter-rouge">[PROGRAM] [ARGUMENTS] > [OUTPUT FILE]</code></li>
<li>Piping output to another program: <code class="highlighter-rouge">[PROGRAM 1] [ARGUMENTS] | [PROGRAM 2] [ARGUMENTS]</code></li>
<li>View a gzipped (<code class="highlighter-rouge">.gz</code>) file: <code class="highlighter-rouge">zcat [FILE] | less</code> (<code class="highlighter-rouge">gzcat</code> on OS X)</li>
<li>Redirect output to log: <code class="highlighter-rouge">[PROGRAM WITH OUTPUT] > [OUTPUT FILE]</code></li>
<li>Redirect output and errors to log: <code class="highlighter-rouge">[PROGRAM WITH OUTPUT] > logfile.log 2>&1</code></li>
</ul>
<p>Writing shell scripts:</p>
<ul>
<li>Begin the script with “shebang” line: <code class="highlighter-rouge"><span class="c">#!/bin/bash</span></code></li>
<li>Use <code class="highlighter-rouge">#</code> for comments</li>
<li>Run with <code class="highlighter-rouge">bash [SCRIPT PATH]</code> or <code class="highlighter-rouge">[SCRIPT PATH]</code> (prefixed with <code class="highlighter-rouge">./</code> if the script is in the current directory)</li>
<li>If running as <code class="highlighter-rouge">[SCRIPT PATH]</code>, use <code class="highlighter-rouge">chmod +x [SCRIPT PATH]</code> to set permissions</li>
</ul>
<h2 id="why-should-i-use-the-command-line">Why should I use the command line?</h2>
<p>Like the desktop interface you use on your computer every day, the command line
allows you to work with files and programs. Though the commands might feel
clunky at first, you’ll find that using the command line offers several
important benefits:</p>
<ul>
<li><strong>Power.</strong> Many tools can only be used through the command line. This is
particularly true in bioinformatics: to use many cutting-edge bioinformatic
algorithms, you’ll need to run them from the command line.</li>
<li><strong>Flexibility.</strong> With basic familiarity with the command line, you can
combine different tools and automate parts of your analyses.</li>
<li><strong>Reproducibility.</strong> With its simple interface and the ability to save lists
of commands as scripts, the command line is a powerful tool for reproducible
analysis.</li>
<li><strong>Working remotely.</strong> If you want to run analyses remotely (for instance, on
a more powerful machine or a computing cluster) you’ll often need to work
in the command line.</li>
</ul>
<h2 id="what-is-unix-what-is-the-command-line-what-is-bash">What is Unix? What is the command-line? What is <code class="highlighter-rouge">bash</code>?</h2>
<p>Unix is a family of operating systems which share a set of tools and a system
for organizing files and resources. Both Linux and OS X are “Unix-like”
operating systems, and the tools we’ll use in this tutorial should work on both.</p>
<p>When using the command line, you enter commands which are interpreted by a
<strong>shell</strong> or command-line interpreter. The shell reads a command input by the
user, runs the action, then repeats this cycle indefinitely.</p>
<p><code class="highlighter-rouge">bash</code> is the most commonly used shell and is often the default, so this
tutorial focuses on <code class="highlighter-rouge">bash</code> syntax. We’ll just go over basic commands, but
<code class="highlighter-rouge">bash</code> has its own powerful (but clunky) programming language; I recommend
checking out some of its more advanced features (<code class="highlighter-rouge">for</code> loops, <code class="highlighter-rouge">if</code> statements, and
variables come in handy quite often).</p>
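<p>As a small taste of those features, here is a sketch combining a variable, a <code class="highlighter-rouge">for</code> loop, and an <code class="highlighter-rouge">if</code> statement. The project and sample names, and the idea of one <code class="highlighter-rouge">.txt</code> file per sample, are hypothetical and only there for illustration:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># A variable: note there are no spaces around the = sign
project='soil_survey'
echo "Checking files for $project"

# A for loop runs the indented commands once for each word in the list
for sample in site_A site_B site_C; do
    # An if statement: [ -e PATH ] is true when PATH exists
    if [ -e "$sample.txt" ]; then
        echo "$sample: found $sample.txt"
    else
        echo "$sample: no file yet"
    fi
done
</code></pre></div></div>
<p>Try creating one of the files with <code class="highlighter-rouge">touch site_B.txt</code> and running the loop again.</p>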
<h2 id="1-getting-started-navigating-the-file-system">1. Getting started: navigating the file system</h2>
<p>Let’s get started. Open up a command line terminal and type <code class="highlighter-rouge">ls</code>, then press
enter. You will see a list of files and folders. On my computer, I see:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Desktop
Documents
Downloads
Music
Pictures
Videos
</code></pre></div></div>
<p>The <code class="highlighter-rouge">ls</code> command (short for “list”) lists files and folders in the current
directory. You can change directories with <code class="highlighter-rouge">cd</code> (short for “change
directory”). Enter the command <code class="highlighter-rouge">cd Desktop</code> (use the name of another directory
if you don’t have a <code class="highlighter-rouge">Desktop</code> folder). Then, use <code class="highlighter-rouge">ls</code> again. This time you should
see a different list: the files and folders inside <code class="highlighter-rouge">Desktop</code>, the same ones you see on your desktop.</p>
<p>How do we go back? Unix uses <code class="highlighter-rouge">..</code> to refer to the directory above the one we’re
currently in. So, enter <code class="highlighter-rouge">cd ..</code>, then check that we’re back in our starting
location by listing the files with the <code class="highlighter-rouge">ls</code> command.</p>
<p>You’ve seen how to do some basic navigation. Next, we’ll learn how Unix uses
<strong>paths</strong> to specify locations. <code class="highlighter-rouge">Desktop</code> and <code class="highlighter-rouge">..</code> are both examples of paths.
As you may have seen before, a path is a list of folders (often ending with a file name), separated with <code class="highlighter-rouge">/</code>.
It is very important to distinguish between <strong>absolute</strong> and <strong>relative</strong> paths.</p>
<h3 id="absolute-paths">Absolute paths</h3>
<p>An absolute path refers unambiguously to a single location. Just as the address
“1600 Pennsylvania Ave NW, Washington, DC, USA” refers to a particular building,
an absolute path like <code class="highlighter-rouge">/home/joe/Documents/file.txt</code> refers to a specific
file. <strong>Absolute paths begin with a forward slash <code class="highlighter-rouge">/</code> or a tilde <code class="highlighter-rouge">~</code>.</strong> This is
because in Unix operating systems (including OS X and Linux), all files live
under a <strong>root directory</strong> called <code class="highlighter-rouge">/</code>.</p>
<p>The tilde <code class="highlighter-rouge">~</code> is a special character in absolute paths. Paths beginning with
<code class="highlighter-rouge">~</code> refer to the current user’s home directory (often something like
<code class="highlighter-rouge">/home/USERNAME</code>). Usually, when starting a session in the command line, you
begin in your home directory.</p>
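<p>You can watch the tilde being expanded with a few commands. This sketch borrows <code class="highlighter-rouge">pwd</code>, which prints the current directory as an absolute path (it is introduced properly below), and <code class="highlighter-rouge">echo</code>, which simply prints the arguments the shell hands it:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cd /        # go to the root directory using an absolute path
pwd         # prints /
cd ~        # the shell replaces ~ with your home directory before cd runs
pwd         # prints your home directory, e.g. /home/joe
echo ~      # shows the absolute path that ~ stands for
</code></pre></div></div>
<p>Running <code class="highlighter-rouge">cd</code> with no path at all also takes you back to your home directory.</p>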
<h3 id="relative-paths">Relative paths</h3>
<p>A relative path gives a location <em>relative</em> to the current location. Just as
the phrase “the person on my right” can refer to different people depending on
where the speaker is, a relative path like <code class="highlighter-rouge">Documents/file.txt</code> only makes
sense if you know the current directory (the <strong>working directory</strong>). If you are
in the directory <code class="highlighter-rouge">/home/joe/</code>, the path <code class="highlighter-rouge">Documents/file.txt</code> refers to a
file named <code class="highlighter-rouge">file.txt</code> inside a directory named <code class="highlighter-rouge">Documents</code> which is in the
current directory–that is, a file with the absolute path
<code class="highlighter-rouge">/home/joe/Documents/file.txt</code>. <strong>If a path doesn’t start with <code class="highlighter-rouge">/</code> or <code class="highlighter-rouge">~</code>, it
is a relative path.</strong></p>
<p>In relative paths, <code class="highlighter-rouge">..</code> and <code class="highlighter-rouge">.</code> have special meanings. The symbol <code class="highlighter-rouge">..</code> is very
useful; it means “the directory above” (the parent directory). So, if we’re in <code class="highlighter-rouge">/home/joe/Documents/</code>,
<code class="highlighter-rouge">../</code> refers to <code class="highlighter-rouge">/home/joe/</code> and <code class="highlighter-rouge">../../</code> refers to <code class="highlighter-rouge">/home/</code>. It can also show
up in the middle of paths: from <code class="highlighter-rouge">/home/joe/</code>, <code class="highlighter-rouge">Documents/../Files/</code> would refer to
<code class="highlighter-rouge">/home/joe/Files/</code>.</p>
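<p>To try these rules without touching real folders, the sketch below builds a throwaway copy of the example layout under <code class="highlighter-rouge">/tmp</code> (the <code class="highlighter-rouge">/tmp/paths_demo</code> location is an arbitrary choice for this demo) and uses <code class="highlighter-rouge">..</code> both at the start and in the middle of a path. <code class="highlighter-rouge">mkdir -p</code> creates any missing parent directories along the way:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Throwaway version of the /home/joe layout from the text
mkdir -p /tmp/paths_demo/home/joe/Documents /tmp/paths_demo/home/joe/Files
cd /tmp/paths_demo/home/joe/Documents
cd ../..                      # up two levels
pwd                           # prints /tmp/paths_demo/home
cd joe/Documents/../Files     # .. in the middle of a relative path
pwd                           # prints /tmp/paths_demo/home/joe/Files
</code></pre></div></div>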
<p>Paths starting with <code class="highlighter-rouge">./</code> refer to things inside the current folder, so
<code class="highlighter-rouge">./Documents</code> is the same as <code class="highlighter-rouge">Documents</code>. This may seem pointless, but this is
actually useful in some cases (e.g. when we run programs, as you’ll see later).</p>
<p>With both types of path, be careful about spaces and certain other special
characters (including <code class="highlighter-rouge">&'"$<>()|</code>). The interpreter treats spaces as
separators, so a path like <code class="highlighter-rouge">/home/joe/Documents/New File.txt</code> looks like two
separate parts: <code class="highlighter-rouge">/home/joe/Documents/New</code> and <code class="highlighter-rouge">File.txt</code>. Paths with spaces or special
characters need to be quoted: <code class="highlighter-rouge">'/home/joe/Documents/New File.txt'</code>. You can
also put a backslash before each special character:
<code class="highlighter-rouge">/home/joe/Documents/New\ File.txt</code>.</p>
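<p>One way to see exactly how the shell splits a command into arguments is <code class="highlighter-rouge">printf</code>: with the format <code class="highlighter-rouge">'[%s]\n'</code>, it prints each argument it receives on its own line, wrapped in square brackets. The paths don’t need to exist, since <code class="highlighter-rouge">printf</code> only prints them:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Unquoted: the space splits the path into two arguments
printf '[%s]\n' /home/joe/Documents/New File.txt
# [/home/joe/Documents/New]
# [File.txt]

# Quoted or backslash-escaped: a single argument
printf '[%s]\n' '/home/joe/Documents/New File.txt'
printf '[%s]\n' /home/joe/Documents/New\ File.txt
# both print [/home/joe/Documents/New File.txt]
</code></pre></div></div>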
<p>One final note: if a path ends with a <code class="highlighter-rouge">/</code>, it must refer to a directory.
However, the reverse is not true: a path not ending in <code class="highlighter-rouge">/</code> might still be
a directory.</p>
<h3 id="putting-it-together">Putting it together</h3>
<p>Using your knowledge of paths, you can navigate the entire file system.
Usually, relative and absolute paths can be used interchangeably.</p>
<p>Enter <code class="highlighter-rouge">pwd</code> (short for “print working directory”; it has nothing to do with
passwords). This tells you your current location as an absolute path; mine is
<code class="highlighter-rouge">/home/joe/</code>. <code class="highlighter-rouge">pwd</code> is useful when you forget where you currently are!</p>
<p>Earlier, we used <code class="highlighter-rouge">cd</code> to navigate one directory at a time, but it is actually
more powerful. You can give it an absolute path: try using <code class="highlighter-rouge">cd /</code> to go to <code class="highlighter-rouge">/</code>,
the root directory, and using <code class="highlighter-rouge">ls</code> to list files. Then, go back to the original
directory (use the absolute path previously printed by <code class="highlighter-rouge">pwd</code>). Practice using
<code class="highlighter-rouge">cd</code> with some different absolute and relative paths.</p>
<p>The <code class="highlighter-rouge">ls</code> command also has other useful abilities. You can give it a path to list
the contents of that path: for example, <code class="highlighter-rouge">ls /home/joe/Desktop</code>. Want to know
how big your files are? Use <code class="highlighter-rouge">ls -lh</code> to see this information. (Try including
a path after <code class="highlighter-rouge">ls -lh</code>!)</p>
<p>Finally, you can make directories with the <code class="highlighter-rouge">mkdir</code> command and remove an empty
directory with the <code class="highlighter-rouge">rmdir</code> command. Try this out by making a directory for your
command-line learning efforts: I put mine at <code class="highlighter-rouge">~/Documents/command_line_demo</code>.
Make a directory inside that directory called <code class="highlighter-rouge">junk</code>, then remove the directory
with <code class="highlighter-rouge">rmdir</code>.</p>
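<p>Here is one way the exercise might look, using the location suggested above. The <code class="highlighter-rouge">-p</code> flag is an extra not required by the exercise: it tells <code class="highlighter-rouge">mkdir</code> to also create <code class="highlighter-rouge">~/Documents</code> if it is missing, and not to complain if the directory already exists:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mkdir -p ~/Documents/command_line_demo       # the practice directory
mkdir ~/Documents/command_line_demo/junk     # a directory inside it
ls ~/Documents/command_line_demo             # lists junk
rmdir ~/Documents/command_line_demo/junk     # works because junk is empty
ls ~/Documents/command_line_demo             # junk is gone
</code></pre></div></div>
<p>If <code class="highlighter-rouge">junk</code> contained any files, <code class="highlighter-rouge">rmdir</code> would refuse to delete it, which makes it a safe way to tidy up empty folders.</p>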
<h3 id="summary-navigating-the-file-system">Summary: navigating the file system</h3>
<ul>
<li>Change directories: <code class="highlighter-rouge">cd [PATH]</code></li>
<li>List files in directory: <code class="highlighter-rouge">ls</code> (current directory) or <code class="highlighter-rouge">ls [PATH]</code> (another directory)</li>
<li>See file sizes in a directory: <code class="highlighter-rouge">ls -lh [PATH]</code></li>
<li>Print current directory: <code class="highlighter-rouge">pwd</code></li>
<li>Make a directory: <code class="highlighter-rouge">mkdir [PATH]</code></li>
<li>Remove an empty directory: <code class="highlighter-rouge">rmdir [PATH]</code></li>
</ul>
<h2 id="2-working-with-files">2. Working with files</h2>
<p>Now we know how to move around the filesystem. However, that isn’t very useful
if it’s all we can do! To do work using the command line, we have to know how
to work with files.</p>
<h3 id="downloading-and-viewing-files">Downloading and viewing files</h3>
<p>Many of the files you work with will be downloaded from the web. It’s often
handy to use the command line to download files. The <code class="highlighter-rouge">wget</code> tool will do this.
Let’s download an ebook of <em>On the Origin of Species</em> from the Project Gutenberg
website using the command:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>wget 'http://www.gutenberg.org/cache/epub/2009/pg2009.txt'
</code></pre></div></div>
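<p>On OS X, where <code class="highlighter-rouge">wget</code> isn’t installed by default, use <code class="highlighter-rouge">curl</code> instead. The <code class="highlighter-rouge">-L</code> option tells <code class="highlighter-rouge">curl</code> to follow redirects in case the website has moved the file:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl -L -o pg2009.txt 'http://www.gutenberg.org/cache/epub/2009/pg2009.txt'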
</code></pre></div></div>
<p>Is this the right file? We could open it up in Microsoft Word to see, but let’s
stick to command line tools. The <code class="highlighter-rouge">cat</code> command prints the contents of a
text file. Let’s try it:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cat pg2009.txt
</code></pre></div></div>
<p>Wow, that printed the entire text of <em>On the Origin of Species</em>! There are other
ways to view this file without printing all of it. Try:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>head -10 pg2009.txt
</code></pre></div></div>
<p>This should print the first 10 lines of our file. Finally, for something fancy,
try the <code class="highlighter-rouge">less</code> viewer:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>less pg2009.txt
</code></pre></div></div>
<p>This lets you scroll up and down through the file, all in the command line!
Type the letter ‘q’ to exit.</p>
<h3 id="moving-copying-and-deleting-files">Moving, copying, and deleting files</h3>
<p>Having our Darwin ebook is nice, but <code class="highlighter-rouge">pg2009.txt</code> isn’t an informative title.
We could have used command line options to download it to a different location:
<code class="highlighter-rouge">wget -O [OUTPUT FILE] [URL]</code> or <code class="highlighter-rouge">curl -o [OUTPUT FILE] [URL]</code>, but this is a
good opportunity to learn how to rename files. Let’s rename the file using the
<code class="highlighter-rouge">mv</code> (“move”) command:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mv pg2009.txt 'Origin of Species.txt'
</code></pre></div></div>
<p>Notice the single quotes around the filename! This is necessary because there
are spaces in the new path. The command
<code class="highlighter-rouge">mv pg2009.txt Origin\ of\ Species.txt</code> would also have worked. Also,
though we’ve just used <code class="highlighter-rouge">mv</code> to rename a file, <code class="highlighter-rouge">mv</code> will also move files between
directories if the new path is a directory (<code class="highlighter-rouge">test/</code>) or points to a location in
a different directory (<code class="highlighter-rouge">test/'Origin of Species.txt'</code>):</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mkdir test/
mv 'Origin of Species.txt' test/
# Equivalent to the following:
# mv 'Origin of Species.txt' test/'Origin of Species.txt'
</code></pre></div></div>
<p>Now try your hand at moving the file back to our working directory (hint:
<code class="highlighter-rouge">./</code> is the relative path for the current directory).</p>
<p>Copying books is hard, but copying ebooks is trivial. Let’s make a new copy of
Darwin’s book using the <code class="highlighter-rouge">cp</code> (copy) command:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cp 'Origin of Species.txt' darwin.txt
</code></pre></div></div>
<p>Use the <code class="highlighter-rouge">head</code> or <code class="highlighter-rouge">less</code> command to check that <code class="highlighter-rouge">darwin.txt</code> has the same text.</p>
<p>I’m getting overwhelmed by all these files; aren’t you? Let’s delete the extra
copy using the <code class="highlighter-rouge">rm</code> command:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>rm darwin.txt
</code></pre></div></div>
<p>Be <strong>VERY</strong> careful using the <code class="highlighter-rouge">rm</code> command. Unlike deleting things in the
Finder, <code class="highlighter-rouge">rm</code> doesn’t send things to a trash folder. <strong>Once you <code class="highlighter-rouge">rm</code> a file,
it’s gone forever!</strong> If you’re nervous about this, you can install and use the
<a href="http://hasseg.org/trash/"><code class="highlighter-rouge">trash</code> tool for OS X</a> or <a href="https://github.com/andreafrancia/trash-cli"><code class="highlighter-rouge">trash-cli</code> for some
Linux systems</a>.</p>
<p><code class="highlighter-rouge">rm</code> will also delete entire directories if you need it to (even if there are
files inside). First, make a folder with some files in it:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mkdir darwin_library/
cp 'Origin of Species.txt' darwin_library/darwin.txt
</code></pre></div></div>
<p>You can check that this worked with the <code class="highlighter-rouge">ls</code> command. To delete the directory,
use <code class="highlighter-rouge">rm -r</code> (“remove, recursive”).</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>rm -r darwin_library/
</code></pre></div></div>
<p>Notice that <code class="highlighter-rouge">rm -r</code> looks like <code class="highlighter-rouge">ls -lh</code>: both
have a command (<code class="highlighter-rouge">rm</code>, <code class="highlighter-rouge">ls</code>) followed by some options (<code class="highlighter-rouge">-r</code>, <code class="highlighter-rouge">-lh</code>). Another
useful command is <code class="highlighter-rouge">cp -r</code> (“copy, recursive”), which copies an entire
directory (including any files inside it). We’ll learn more about options when
we discuss running programs.</p>
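<p>Here is a short sketch of <code class="highlighter-rouge">cp -r</code> in action. The folder and file names (<code class="highlighter-rouge">demo_folder</code>, <code class="highlighter-rouge">notes.txt</code>) are made up for this example, and the empty file is created with <code class="highlighter-rouge">touch</code>, which is covered just below:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mkdir demo_folder
touch demo_folder/notes.txt          # an empty file, so the folder isn't empty
cp -r demo_folder demo_folder_copy   # copies the folder and everything inside it
ls demo_folder_copy                  # lists notes.txt
</code></pre></div></div>
<p>Without <code class="highlighter-rouge">-r</code>, <code class="highlighter-rouge">cp</code> refuses to copy a directory. When you’re done, <code class="highlighter-rouge">rm -r demo_folder demo_folder_copy</code> removes both.</p>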
<h3 id="other-tricks-for-files">Other tricks for files</h3>
<h4 id="creating-files">Creating files</h4>
<p>Want to create a blank file? Do the following:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>touch file.txt
</code></pre></div></div>
<h4 id="editing-text">Editing text</h4>
<p>Often you’ll want to make small edits to text from within the terminal. There
are many tools for this, but I find <code class="highlighter-rouge">nano</code> the simplest to
use. Try:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>nano 'Origin of Species.txt'
</code></pre></div></div>
<p>To exit, hit Ctrl + X; you’ll be given options to save or discard changes. This
and other commands are at the bottom of the screen (<code class="highlighter-rouge">^X</code> is Ctrl + X and so on)
if you forget.</p>
<p>If you give <code class="highlighter-rouge">nano</code> the name of a file that doesn’t exist yet, it will create
that file for you.</p>
<h4 id="search">Search</h4>
<p>Looking for text in a file is often useful for bioinformatics. However, if
you’ve ever accidentally opened a genome in Microsoft Word, you might know that
this could be painfully slow. The <code class="highlighter-rouge">grep</code> tool is very fast and incredibly
useful for bioinformatics! As a demo, we can find out what Darwin has to say
about fungus:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>grep 'fungus' 'Origin of Species.txt'
</code></pre></div></div>
<p>(The first thing after <code class="highlighter-rouge">grep</code> is what you’re searching for; the next is the
file you’re looking in. I mix up the order of these two all the time.)</p>
<p>Not bad! Darwin uses that word twice: once in the line “…the water or some
parasitic fungus is infinitely more numerous in…” and again in “…fungus
exceeds its allies in the above respects, it will then be…”</p>
<p>Even cooler: we can see the context around those lines. <code class="highlighter-rouge">-B [NUMBER]</code> and
<code class="highlighter-rouge">-A [NUMBER]</code> tell <code class="highlighter-rouge">grep</code> how many lines to print before and after each match, respectively. So:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>grep -B 10 -A 10 'fungus' 'Origin of Species.txt'
</code></pre></div></div>
<p>This shows us the only passage in the <em>Origin</em> where Darwin discusses fungi–a
very interesting paragraph about niches and resource partitioning. (But do
closely related species actually compete more strongly?)</p>
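<p>Two more <code class="highlighter-rouge">grep</code> options come up constantly in bioinformatics: <code class="highlighter-rouge">-c</code> counts matching lines instead of printing them, and <code class="highlighter-rouge">-i</code> ignores the difference between upper and lower case. The sketch below builds a tiny, made-up FASTA file (<code class="highlighter-rouge">demo.fasta</code>) so it can run anywhere. In FASTA format each sequence starts with a header line beginning with <code class="highlighter-rouge">></code>, so counting those lines counts sequences:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># A tiny hypothetical FASTA file with two sequences
printf '>seq1 soil sample\nACGTACGT\n>seq2 leaf sample\nGGCCTTAA\n' > demo.fasta
grep -c '^>' demo.fasta     # ^ anchors the match to the start of a line; prints 2
grep -i 'SOIL' demo.fasta   # matches "soil" despite the different case
</code></pre></div></div>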
<h4 id="links">Links</h4>
<p>One of my favorite Unix features is the ability to link files. Instead of
copying a file, you can make a <strong>symbolic link</strong> to the file that you can read
from and write to just like a real file. To do this, use <code class="highlighter-rouge">ln -s [FILE] [NEW LINK]</code> (“link,
symbolic”):</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ln -s 'Origin of Species.txt' 'symbolic_link.txt'
</code></pre></div></div>
<p>View the file (<code class="highlighter-rouge">head</code> or <code class="highlighter-rouge">less</code>) to see that the contents are the same. You
need to be careful with symbolic links, however. First, making changes in the
linked version will also change the original file–I have accidentally
overwritten data this way. Second, a link that uses a relative path is interpreted
from the link’s own location, so if you move the link (or the directory containing
it) without also moving the original file, the link will no longer point to a
valid file.</p>
<p>You can also make symbolic links to directories, which comes in handy pretty
often.</p>
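<p>The relative-path pitfall is easier to understand by watching it happen. In this sketch (all names are hypothetical), the link stores the relative path <code class="highlighter-rouge">data/reads.txt</code>, which is looked up from wherever the link itself lives, so moving the link alone breaks it:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mkdir -p link_demo/data
printf 'ACGT\n' > link_demo/data/reads.txt
# The target path is stored as written and resolved from link_demo/
ln -s data/reads.txt link_demo/reads_link.txt
cat link_demo/reads_link.txt      # prints ACGT, read through the link
ls -l link_demo                   # shows reads_link.txt -> data/reads.txt

# Move only the link: it now looks for ./data/reads.txt, which doesn't exist
mv link_demo/reads_link.txt ./
cat reads_link.txt                # fails with "No such file or directory"
</code></pre></div></div>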
<h4 id="wildcards">Wildcards</h4>
<p>It can be frustrating using commands like <code class="highlighter-rouge">mv</code> or <code class="highlighter-rouge">rm</code> on many similar files.
Say we have a bunch of text files:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cp 'Origin of Species.txt' junk1.txt
cp 'Origin of Species.txt' junk2.txt
cp 'Origin of Species.txt' junk03.txt
cp 'Origin of Species.txt' more_junk.txt
</code></pre></div></div>
<p>We could delete each file separately, but this takes too long. Instead, we can
use the <strong>wildcards</strong> <code class="highlighter-rouge">*</code> and <code class="highlighter-rouge">?</code>. <code class="highlighter-rouge">*</code> can match any number of characters,
while <code class="highlighter-rouge">?</code> can match only one character. So, this command will delete only
<code class="highlighter-rouge">junk1.txt</code> and <code class="highlighter-rouge">junk2.txt</code>:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>rm junk?.txt
</code></pre></div></div>
<p>This command will delete <code class="highlighter-rouge">junk1.txt</code>, <code class="highlighter-rouge">junk2.txt</code>, and <code class="highlighter-rouge">junk03.txt</code>:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>rm junk*.txt
</code></pre></div></div>
<p>Finally, this command will delete all four files:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>rm *junk*.txt
</code></pre></div></div>
<p>This command successfully matches <code class="highlighter-rouge">junk1.txt</code>, etc. because <code class="highlighter-rouge">*</code> is allowed to
match zero characters.</p>
<p>It’s important to understand how the shell actually processes wildcards. Under
the hood, whenever it sees part of a command with <code class="highlighter-rouge">*</code> or <code class="highlighter-rouge">?</code>, it replaces
that part with a list of existing paths matching the pattern, before the program even runs. So, when we typed
<code class="highlighter-rouge">rm junk*.txt</code>, the shell converted it to this command:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>rm junk03.txt junk1.txt junk2.txt
</code></pre></div></div>
<p>This is subtle, but very important to understand. Some commands won’t take more
than one path, so using a wildcard won’t allow you to run them on multiple
files. Also, using wildcards can lead to unexpected effects. What does the
following command do?</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mv junk?.txt
</code></pre></div></div>
<p><code class="highlighter-rouge">mv</code> doesn’t work with one path, but this command actually does something. Why?
The shell expands it to <code class="highlighter-rouge">mv junk1.txt junk2.txt</code>, which renames <code class="highlighter-rouge">junk1.txt</code> to
<code class="highlighter-rouge">junk2.txt</code> and <strong>overwrites</strong> the old <code class="highlighter-rouge">junk2.txt</code>. I’ve accidentally overwritten data this way!</p>
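<p>A simple habit protects against surprises like this: before running <code class="highlighter-rouge">rm</code> or <code class="highlighter-rouge">mv</code> with a wildcard, run <code class="highlighter-rouge">echo</code> on the same pattern. <code class="highlighter-rouge">echo</code> just prints its arguments, so you see the exact list the shell will substitute. This sketch recreates the example files as empty files with <code class="highlighter-rouge">touch</code>:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>touch junk1.txt junk2.txt junk03.txt more_junk.txt
echo junk?.txt      # junk1.txt junk2.txt
echo junk*.txt      # junk03.txt junk1.txt junk2.txt
echo *junk*.txt     # junk03.txt junk1.txt junk2.txt more_junk.txt
</code></pre></div></div>
<p>The shell sorts the matches alphabetically, which is why <code class="highlighter-rouge">junk03.txt</code> comes first. Once the list looks right, swap <code class="highlighter-rouge">echo</code> for the real command.</p>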
<h3 id="summary-working-with-files">Summary: working with files</h3>
<p>Basic operations:</p>
<ul>
<li>Move/rename a file: <code class="highlighter-rouge">mv [FILE] [NEW PATH]</code></li>
<li>Delete a file (permanently!): <code class="highlighter-rouge">rm [FILE]</code></li>
<li>Delete an entire directory (permanently!): <code class="highlighter-rouge">rm -r [DIRECTORY]</code></li>
<li>Copy a file: <code class="highlighter-rouge">cp [FILE] [NEW PATH]</code></li>
<li>Copy an entire directory: <code class="highlighter-rouge">cp -r [DIRECTORY] [NEW PATH]</code></li>
</ul>
<p>Useful tricks:</p>
<ul>
<li>Download a file: <code class="highlighter-rouge">wget -O [OUTPUT FILE] [URL]</code> or <code class="highlighter-rouge">curl -o [OUTPUT FILE] [URL]</code> (<code class="highlighter-rouge">wget</code> may not be available in OS X)</li>
<li>View a file: <code class="highlighter-rouge">head -[NUMBER OF LINES] [FILE]</code> or <code class="highlighter-rouge">less [FILE]</code></li>
<li>Create a blank file: <code class="highlighter-rouge">touch [FILE]</code></li>
<li>Edit a file: <code class="highlighter-rouge">nano [FILE]</code></li>
<li>Search within a file: <code class="highlighter-rouge">grep [QUERY] [FILE]</code></li>
<li>Create a symbolic link to a file: <code class="highlighter-rouge">ln -s [FILE] [NEW LINK]</code></li>
<li>Wildcards: <code class="highlighter-rouge">*</code> matches any number of characters in a path; <code class="highlighter-rouge">?</code> matches one</li>
</ul>
<h2 id="3-working-with-programs">3. Working with programs</h2>
<p>Okay, now we have some familiarity with moving around in directories and with
manipulating files. Now, we’re ready to start actually doing stuff–and for
that, we need programs. We’ll learn to give input and options to programs, and
to work with output.</p>
<p>To run a program in the command line, you type the name of the program, or a
path to the executable file that runs the program. After that, include any
arguments (information about what to do) that the program expects. All parts
of this command should be separated with spaces–this is why you had to quote
the filename if it contained spaces (to prevent it being interpreted as
multiple arguments). When we run
<code class="highlighter-rouge">grep 'fungus' 'Origin of Species.txt'</code>, the shell splits this into three
parts (removing the quotes): <code class="highlighter-rouge">grep</code>, <code class="highlighter-rouge">fungus</code>, and <code class="highlighter-rouge">Origin of Species.txt</code>; <code class="highlighter-rouge">grep</code> is the program and
the other two parts are arguments.</p>
<h3 id="figuring-out-program-usage">Figuring out program usage</h3>
<p>Many of the commands we’ve seen take arguments that modify their behavior–
think <code class="highlighter-rouge">ls -lh</code> and <code class="highlighter-rouge">cp -r</code>. Often, these arguments (“flags”) begin with one or
two hyphens. When using a new program, we have to figure out which flags we
need to do the task we want.</p>
<p>Fortunately, command-line programs worth their salt come with built-in help.
This can usually be found with <code class="highlighter-rouge">[PROGRAM] -h</code> or <code class="highlighter-rouge">[PROGRAM] --help</code>, or by
entering <code class="highlighter-rouge">man [PROGRAM]</code> (for “manual”).</p>
<p>We’ll use the <code class="highlighter-rouge">wc</code> command as an example. Let’s say we’re interested in how
many lines are in our <code class="highlighter-rouge">Origin of Species.txt</code> file. The <code class="highlighter-rouge">wc</code> (“word
count”) program can count lines, but how? Let’s read the help. Try these three
commands and see which ones work on your system:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>wc -h
wc --help
man wc
</code></pre></div></div>
<p>Which option gives us the line count? Count the lines using this option. You
should come up with around twenty thousand lines!</p>
<h3 id="redirecting-output">Redirecting output</h3>
<p>By default, most programs print their output to the terminal. You can instead
send that output to a file, or feed it into another program as input.</p>
<p>We saw that <code class="highlighter-rouge">grep</code> can be used to search for lines. Let’s say we’d like to write
the fungus passage we found earlier to a file. For this we use <code class="highlighter-rouge">></code>, which
<strong>redirects</strong> output that would otherwise be printed to the terminal into a file:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>grep -A 10 -B 10 'fungus' 'Origin of Species.txt' > 'darwin_on_fungi.txt'
</code></pre></div></div>
<p>Run the command and check this file. Redirecting can also be useful for saving
error messages to a log file. Programs usually write error messages separately
from their regular output, so <code class="highlighter-rouge">></code> alone won’t capture them. For this, add
<code class="highlighter-rouge">2>&1</code> after the redirect, which sends error messages to the same place as the
regular output:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[PROGRAM WITH OUTPUT] > logfile.log 2>&1
</code></pre></div></div>
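<p>Programs write regular output (“standard output”) and error messages (“standard error”) as two separate streams. To see the difference, the sketch below runs <code class="highlighter-rouge">ls</code> on one file that exists and one that doesn’t, in a throwaway directory (both file names are made up for the demo). Only the second log file ends up with both the file name and the error message:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#!/bin/bash
# Work in a fresh scratch directory so no real files are touched
cd "$(mktemp -d)"
touch exists.txt

# ls prints "exists.txt" as regular output (standard output) and
# complains about missing.txt as an error message (standard error)

# > captures only regular output; the error still appears on screen
ls exists.txt missing.txt > output_only.log

# > sends regular output to the file, then 2>&1 sends errors
# to the same place, so this file gets both
ls exists.txt missing.txt > everything.log 2>&1

cat output_only.log   # just: exists.txt
cat everything.log    # exists.txt plus an error line about missing.txt
</code></pre></div></div>
<p>The order matters here: <code class="highlighter-rouge">2>&1</code> points errors at wherever regular output is going at that moment, so it has to come after <code class="highlighter-rouge">> everything.log</code>. Written the other way round, the errors would still go to the screen.</p>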
<p>You can also send the output of programs through other programs using a <code class="highlighter-rouge">|</code>,
which is known as <strong>piping</strong> output. Let’s use the <code class="highlighter-rouge">wc</code> program to count the
lines in the passage, without having to save it to a file:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>grep -A 10 -B 10 'fungus' 'Origin of Species.txt' | wc -l
</code></pre></div></div>
<p>You can chain several pipes together, and even combine piping with redirecting:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>grep 'fungus' 'Origin of Species.txt' | wc -l > 'darwin_on_fungi_line_count.txt'
</code></pre></div></div>
<p>This finds the lines, counts them, then writes the number to a file.</p>
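<p>Here is a self-contained sketch of chaining more than one pipe. It writes a few made-up species names to a file, keeps the fungal genera with <code class="highlighter-rouge">grep -E</code> (which lets <code class="highlighter-rouge">|</code> mean “or” inside the quoted pattern), removes duplicates with <code class="highlighter-rouge">sort -u</code>, counts what’s left with <code class="highlighter-rouge">wc -l</code>, and saves the count to a file:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#!/bin/bash
# Work in a fresh scratch directory
cd "$(mktemp -d)"

# Made-up example data: one species per line, with a duplicate
printf '%s\n' 'Amanita muscaria' 'Boletus edulis' 'Amanita muscaria' 'Quercus robur' > species.txt

# grep -E keeps lines matching Amanita OR Boletus (3 lines),
# sort -u sorts them and drops the duplicate (2 lines),
# wc -l counts them, and > saves the count to a file
grep -E 'Amanita|Boletus' species.txt | sort -u | wc -l > fungal_count.txt

cat fungal_count.txt   # 2 (OS X's wc pads the number with spaces)
</code></pre></div></div>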
<p>This is one of the most powerful features of the Unix command line. One of the
most useful commands when processing nucleotide data is <code class="highlighter-rouge">zcat [FILE] | wc -l</code>
(<code class="highlighter-rouge">gzcat</code> if on OS X), which uses <code class="highlighter-rouge">zcat</code> to decompress a gzipped file, then
sends the result to <code class="highlighter-rouge">wc -l</code> to count the lines. You can also use <code class="highlighter-rouge">zcat [FILE] | less</code>
to view a gzipped file without saving an uncompressed copy.</p>
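<p>For example, sequencing reads are often stored in gzipped FASTQ files, where each read normally takes up four lines, so the line count divided by four gives the number of reads. The sketch below assumes standard four-line records (no wrapped sequence lines) and builds a tiny made-up file to try this on. It uses <code class="highlighter-rouge">gzip -dc</code> (decompress and print), which does the same job as <code class="highlighter-rouge">zcat</code>/<code class="highlighter-rouge">gzcat</code> and works on both Linux and OS X:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#!/bin/bash
# Work in a fresh scratch directory
cd "$(mktemp -d)"

# Made-up gzipped FASTQ file with two reads (four lines each)
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n' | gzip > reads.fastq.gz

# Decompress, count lines, and strip the spaces OS X's wc adds
lines=$(gzip -dc reads.fastq.gz | wc -l | tr -d ' ')

# Four lines per read, so reads = lines / 4
echo "lines: $lines, reads: $(( lines / 4 ))"   # lines: 8, reads: 2
</code></pre></div></div>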
<h3 id="your-own-programs-simple-shell-scripts">Your own programs: simple shell scripts</h3>
<p>You can also run custom programs. One useful example is storing your command
line commands in a file called a <strong>shell script</strong>.</p>
<p>Let’s say we want an easier way to run the command <code class="highlighter-rouge">ls -lh ~/</code> (listing the
contents of your home directory along with file sizes). Create a file called <code class="highlighter-rouge">check_home.sh</code> in
your working directory and edit it (using a text editor, or the <code class="highlighter-rouge">nano</code> command
for extra credit). Enter the following text:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#!/bin/bash</span>
<span class="nb">ls</span> <span class="nt">-lh</span> ~/
</code></pre></div></div>
<p>The first line is a standard way to start bash scripts, called a “shebang” line
(“hash” for <code class="highlighter-rouge">#</code> + “bang” for <code class="highlighter-rouge">!</code>). The part after <code class="highlighter-rouge">#!</code> is the path to the program that
should be used to run the file, in this case bash (the shell program that interprets
the commands you type).</p>
<p>Save the file and run it like this:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>bash check_home.sh
</code></pre></div></div>
<p>This should list information about the files in your home directory. But let’s
say we want to run the <code class="highlighter-rouge">check_home.sh</code> program without having to type <code class="highlighter-rouge">bash</code>
before it. Let’s try:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>check_home.sh
</code></pre></div></div>
<p>This fails, saying <code class="highlighter-rouge">check_home.sh: command not found</code>. This is because the shell
only looks for programs in a list of directories stored in the <code class="highlighter-rouge">PATH</code> variable, and
your current directory usually isn’t on that list. To run a program in your current
directory, you have to add <code class="highlighter-rouge">./</code> before its name. This is a security feature: imagine if
someone tricked you into downloading a malicious file called <code class="highlighter-rouge">ls</code>. So, let’s try:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>./check_home.sh
</code></pre></div></div>
<p>Oh no, another error: <code class="highlighter-rouge">bash: ./check_home.sh: Permission denied</code>. What
happened? Unix won’t run a file unless you give that file <strong>execute
permission</strong>. This is another security feature. The <code class="highlighter-rouge">chmod</code> command can be used
to add this permission:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>chmod +x check_home.sh
</code></pre></div></div>
<p>Now, it should work as intended! Run it again using:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>./check_home.sh
</code></pre></div></div>
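<p>If you’re curious what <code class="highlighter-rouge">chmod +x</code> actually changed, <code class="highlighter-rouge">ls -l</code> shows a file’s permissions in its first column, and an <code class="highlighter-rouge">x</code> there means execute permission. The sketch below recreates the script in a scratch directory and lists it before and after. The exact letters you see depend on your system’s default settings, so treat the comments as typical rather than guaranteed:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#!/bin/bash
# Recreate check_home.sh in a fresh scratch directory
cd "$(mktemp -d)"
printf '#!/bin/bash\nls -lh ~/\n' > check_home.sh

ls -l check_home.sh   # typically starts with -rw-r--r-- (no x)
chmod +x check_home.sh
ls -l check_home.sh   # typically starts with -rwxr-xr-x (x = execute)

# Now the script can be run directly
./check_home.sh
</code></pre></div></div>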
<p>Finally, let’s say we want to add information about what the script’s doing.
Update the file so it says:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#!/bin/bash</span>
<span class="c"># List the contents of our home directory</span>
<span class="nb">ls</span> <span class="nt">-lh</span> ~/
</code></pre></div></div>
<p>The second line, which begins with <code class="highlighter-rouge">#</code>, is a <strong>comment</strong>: bash ignores everything
from the <code class="highlighter-rouge">#</code> to the end of the line. Comments can also go at the end of a line of code:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#!/bin/bash</span>
<span class="c"># List the contents of our home directory</span>
<span class="nb">ls</span> <span class="nt">-lh</span> ~/ <span class="c"># This line has an end-of-line comment</span>
</code></pre></div></div>
<p>Run the commented script to see that it works properly. Comments are a great
way to make your shell scripts clearer.</p>
<p>You’ve saved the commands for a simple “analysis” into a shell script! This
is really convenient for doing complex analyses: just put all the commands into
a shell script, and run the whole script.</p>
<p>Bash (the shell these scripts are written for) is actually its own programming
language, with variables, loops, and other features you’d expect. You can
implement complex logic in your scripts if you need to.</p>
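<p>As a small taste of those features, here is a sketch of a script that uses a variable and a <code class="highlighter-rouge">for</code> loop to count the reads in every gzipped FASTQ file in a directory (a standard FASTQ file uses four lines per read). The sample file names and contents are made up for the demo; with real data you would skip the setup lines and run the loop in your data directory. Note that the whole loop’s output can be redirected at once with <code class="highlighter-rouge">></code> after <code class="highlighter-rouge">done</code>:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#!/bin/bash
# Setup: a fresh scratch directory with two made-up gzipped FASTQ files
cd "$(mktemp -d)"
printf '@r1\nACGT\n+\nIIII\n' | gzip > sample_A.fastq.gz
printf '@r1\nACGT\n+\nIIII\n@r2\nGGCA\n+\nIIII\n' | gzip > sample_B.fastq.gz

# A variable holding the output file name; refer to it as "$outfile"
outfile="read_counts.txt"

# Loop over every file in this directory ending in .fastq.gz
for f in *.fastq.gz; do
    # Decompress, count lines, strip the spaces OS X's wc adds
    lines=$(gzip -dc "$f" | wc -l | tr -d ' ')
    # Print the file name and its read count (four lines per read)
    echo "$f $(( lines / 4 ))"
done > "$outfile"   # everything the loop prints goes into the file

cat "$outfile"
# sample_A.fastq.gz 1
# sample_B.fastq.gz 2
</code></pre></div></div>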
<h3 id="summary-working-with-programs">Summary: working with programs</h3>
<ul>
<li>General syntax: <code class="highlighter-rouge">[PROGRAM NAME OR PATH] [ARGUMENT 1] [ARGUMENT 2] [...]</code> (quote arguments if they have spaces)</li>
<li>For help: <code class="highlighter-rouge">[PROGRAM] -h</code> or <code class="highlighter-rouge">[PROGRAM] --help</code> or <code class="highlighter-rouge">man [PROGRAM]</code></li>
<li>Redirecting output to file: <code class="highlighter-rouge">[PROGRAM] [ARGUMENTS] > [OUTPUT FILE]</code></li>
<li>Piping output to another program: <code class="highlighter-rouge">[PROGRAM 1] [ARGUMENTS] | [PROGRAM 2] [ARGUMENTS]</code></li>
</ul>
<p>Useful pipes/redirects:</p>
<ul>
<li>Count lines in zipped file: <code class="highlighter-rouge">zcat [FILE] | wc -l</code> (<code class="highlighter-rouge">gzcat</code> on OS X)</li>
<li>View zipped file: <code class="highlighter-rouge">zcat [FILE] | less</code> (<code class="highlighter-rouge">gzcat</code> on OS X)</li>
<li>Redirect output to log: <code class="highlighter-rouge">[PROGRAM WITH OUTPUT] > [OUTPUT FILE]</code></li>
<li>Redirect output and errors to log: <code class="highlighter-rouge">[PROGRAM WITH OUTPUT] > logfile.log 2>&1</code></li>
</ul>
<p>Writing shell scripts:</p>
<ul>
<li>Begin the script with “shebang” line: <code class="highlighter-rouge"><span class="c">#!/bin/bash</span></code></li>
<li>Use <code class="highlighter-rouge">#</code> for comments</li>
<li>Run with <code class="highlighter-rouge">bash [SCRIPT PATH]</code> or <code class="highlighter-rouge">[SCRIPT PATH]</code> (with <code class="highlighter-rouge">./</code> if in current directory)</li>
<li>If running as <code class="highlighter-rouge">[SCRIPT PATH]</code>, use <code class="highlighter-rouge">chmod +x [SCRIPT PATH]</code> to set permissions</li>
</ul>