Objectives

Note: This UNIX primer is heavily borrowed from a lab originally designed by two excellent bioinformaticians at UC Davis, Keith Bradnam and Ian Korf. You can find their version of the UNIX primer here: UNIX and Perl Primer for Biologists


Why UNIX?

UNIX systems allow us to issue commands from the keyboard, without the need for a graphical interface or mouse navigation. Importantly, typed commands can be saved and rerun, which lets us automate tasks. This is a necessity when working with genome-scale data.

Increasingly, the raw output of biological research takes the form of large text files that must be analyzed in silico. UNIX is particularly well suited to working with such files and has several powerful, flexible commands for processing data.


The terminal

We will be accessing the UNIX system through the terminal. The terminal provides a command line interface through which we can control the UNIX-based operating system.

To access the terminal, use the Spotlight search tool (the little magnifying glass in the top right of the menu bar), and then launch Apple’s terminal application. You can also access the terminal through the Applications menu.

Your terminal should look something like:


Accessing the server

Now, it’s time to try and log in to the server. You will need to access it remotely using a command called ssh. We will first log on to Cornell’s login server (cbsulogin). Then, we will log on to the class server (cbsufindley).

Substitute your lab id for my_lab_id. You will be prompted to enter your password after each command.

$ ssh my_lab_id@cbsulogin.tc.cornell.edu   ## substitute your lab id for my_lab_id; this allows your computer access to the class server
    <enter your password>
$ ssh cbsufindley                   ## this is our class server, where computational tasks can be performed
    <enter your password> 

Note: If you are having trouble logging in to Cornell’s login server (cbsulogin), it may be because it’s too busy. If this happens, try accessing a different login server (cbsulogin2, cbsulogin3).


Working with files

Moving files

The following sections deal with moving, renaming, removing, and looking at files. First, make a temporary file named tmp1.txt using the touch command. Then, move this file into the /workdir/genomics/my_lab_id/Temp directory using the mv command.

For the mv command, we always have to specify a source file (or directory) that we want to move, and then specify a target location.

$ cd /workdir/genomics/my_lab_id
$ touch tmp1.txt
$ mv tmp1.txt /workdir/genomics/my_lab_id/Temp
$ ls
Temp
## You have moved the tmp1.txt file, so you should not see it listed here.
$ ls Temp/
## Is tmp1.txt there?  It should be.

Renaming files

In the earlier example, the destination for the mv command was a directory name (Temp). So we moved a file from its source location to a target location (source and target are important concepts for many UNIX commands). But note that the target could have been a (different) file name, rather than a directory. Try it out.

$ cd /workdir/genomics/my_lab_id/Temp
$ ls
tmp1.txt
$ mv tmp1.txt tmp2.txt
$ ls
tmp2.txt

In this example, we used mv to move the file called tmp1.txt to a file called tmp2.txt. In UNIX, mv is essentially how we rename files, as there is no separate standard rename command.
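The difference between the two kinds of target can be tried safely in a scratch directory. The sketch below is only an illustration: the names sandbox, notes.txt, renamed.txt, and Archive are hypothetical, and mktemp -d is used simply to get a throwaway directory outside your lab folder.

```shell
# Work in a throwaway directory so nothing in your lab folder is touched.
# All names here (sandbox, notes.txt, renamed.txt, Archive) are hypothetical.
sandbox=$(mktemp -d)
cd "$sandbox"

touch notes.txt
mkdir Archive

# Target is an existing directory: the file moves into it, name unchanged.
mv notes.txt Archive
ls Archive

# Target is a file name that does not exist yet: the file is renamed.
mv Archive/notes.txt Archive/renamed.txt
ls Archive
```

The first ls should list notes.txt and the second should list renamed.txt, because mv treats an existing directory as a place to put the file and anything else as the file's new name.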

You can also use mv to move and rename directories, just as you did with files.

TASK: Try making a new directory (Temp2) in your /workdir/genomics/my_lab_id/ folder. Move the Temp2 directory inside the Temp/ directory using mv.


The most dangerous UNIX command you will ever learn!

How can we remove files from directories? In order to do this, we must use the rm command.

Potentially, rm is a very dangerous command. UNIX does not have a trash or recycling bin. Therefore, if you delete something with rm, you will not get it back! Indeed, it is possible to delete everything in your home directory (all directories and subdirectories) with rm.

Let me repeat that. It is possible to delete EVERY file you have ever created with one simple rm command. Are you scared yet? It is also possible for you to delete everything in the course! Now that’s extra scary. Luckily, there is a way of making rm slightly less dangerous. Adding the -i option makes rm ask for confirmation before deleting each file.

$ cd /workdir/genomics/my_lab_id/Temp
$ ls
Temp2 tmp2.txt
$ rm -i tmp2.txt
rm: remove regular empty file ‘tmp2.txt’?
## type a 'y' and enter to confirm that you want the file removed

Isn’t the -i option helpful? From here on out, you should always add -i when you use the rm command.
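To see what each answer to the prompt does, you can try rm -i on throwaway files. This is a minimal sketch under some assumptions: the file names keep_me.txt and delete_me.txt are hypothetical, and the answers are fed in with printf and a pipe only so the example runs without anyone typing. At the terminal you would type y or n yourself.

```shell
# Scratch directory and file names are hypothetical, for illustration only.
sandbox=$(mktemp -d)
cd "$sandbox"
touch keep_me.txt delete_me.txt

# Answering "n": rm -i leaves the file alone.
printf 'n\n' | rm -i keep_me.txt

# Answering "y": the file is removed for good.
printf 'y\n' | rm -i delete_me.txt

# Only the file we declined to delete should still be listed.
ls
```

Any answer other than yes leaves the file in place, which is why the prompt is a useful safety net.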


Go forth and multiply

Copying files with the cp (copy) command is very similar to moving them. Remember to always specify a source and a target location. Let’s create a new file and make a copy of it.

$ cd /workdir/genomics/my_lab_id/
$ touch tmp3
$ cp tmp3 tmp4
$ ls
tmp3 tmp4

There are two files, Droso.fa and rna.fa, in the /workdir/genomics/data folder. Copy these into your folder using the cp command.

$ cd /workdir/genomics/my_lab_id
$ cp /workdir/genomics/data/Droso.fa /workdir/genomics/my_lab_id
$ cp /workdir/genomics/data/rna.fa .
$ ls
Droso.fa  rna.fa  Temp  tmp3 tmp4

The last step introduces a new concept. In UNIX, the current directory can be represented by a . (dot). You will mostly use this for copying files into the directory you are currently in.
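The dot can be checked in a scratch area before you rely on it. In this sketch the directories data and work and the file example.fa are hypothetical stand-ins, not files from the course server.

```shell
# Hypothetical scratch layout: a "data" folder holding a file, and a "work" folder.
sandbox=$(mktemp -d)
mkdir "$sandbox/data" "$sandbox/work"
touch "$sandbox/data/example.fa"

# Move into the work folder and confirm where we are.
cd "$sandbox/work"
pwd

# The trailing dot means "the directory I am in right now",
# so the copy lands in the work folder.
cp "$sandbox/data/example.fa" .
ls
```

Because cp copies rather than moves, example.fa should now exist in both data and work.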


Time to tidy up

We now have an empty directory that we should remove. To do this, use the rmdir command. This will only remove empty directories, so it is quite safe to use.

$ cd /workdir/genomics/my_lab_id/Temp     
$ rmdir Temp2
$ cd ..
$ rmdir Temp
$ rm -i tmp*

In the last step, we introduced the concept of wildcard characters. The asterisk (*) acts as a wildcard, essentially meaning ‘match anything’. Because both tmp3 and tmp4 begin with tmp, tmp* will refer to both files.
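Because a wildcard can match more files than you expect, it is worth previewing the matches with a harmless command before handing the same pattern to rm. The sketch below assumes a scratch directory containing three hypothetical files; test_notes is included only to show a name that does not match.

```shell
# Hypothetical scratch directory with two matching files and one that does not match.
sandbox=$(mktemp -d)
cd "$sandbox"
touch tmp3 tmp4 test_notes

# List first: see exactly which names tmp* expands to.
ls tmp*

# The shell replaces tmp* with the matching names before the command runs.
matched=$(echo tmp*)
echo "$matched"
```

Only tmp3 and tmp4 are matched. test_notes starts with t but not with tmp, so it is left out.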


Viewing and editing files

Less is more

So far, we have covered listing the contents of directories and moving, copying, and deleting files and directories. Now we will briefly cover how you can look at files. In UNIX, the less command lets you view (but not edit) text files.

Let’s take a look at a file of nucleotide sequences that encode RNA. Each sequence is preceded by a header line that starts with a ‘>’. The header lines include information about species and gene names (“Migut” stands for Mimulus guttatus) and have a gene ID after the period (e.g., “E00076” is the 76th gene on the fifth or “E” chromosome).

$ less rna.fa

TASK: Explore the file by scrolling around with the mouse or the arrow keys. How many headers are there in the file?

When you are done looking at the file, get out of less by typing q for quit.


Editing files with nano

In addition to looking at files, you may also want to create files you can manipulate or edit files that already exist. Fortunately, UNIX has a built-in text editor that is pretty handy. You can access the text editor with the command nano.

$ nano test

Your screen should look like:

Look at all those handy functions detailed across the bottom!

Type ‘this is a test’ in the file and close it by pressing ‘control’ and ‘X’ at the same time. You should say ‘y’ when asked if you want to save it, then press Enter to accept the file name. You should now have a new file called ‘test’ in your directory. If you use less to look at it, it should read ‘this is a test’.


LAB ASSIGNMENT 1

Your lab is almost finished. Please complete the following activity to confirm that you have finished the lab. This will test your ability to move, copy, and edit files.

You can use nano to edit a file that already exists. First, copy rna.fa into a new file named rna.fa_my_lab_id. Using nano, edit the rna.fa_my_lab_id file by changing the name of the first header to ‘Gene1’. Then, use the mv command to move rna.fa_my_lab_id to the /workdir/genomics/lab1 directory.