Introducing the Shell
Some vocabulary:
- GUI: Graphical User Interface, the most common way to use computers nowadays
- CLI: Command Line Interface, old-fashioned
CLIs are harder to learn than GUIs, but:
- they are the best-supported way to interact with HPC systems (which might be why most of you are here today).
- they allow a smooth transition between experimentation and automation: you write commands interactively and then compose them into a script, which is great for reproducibility (see REPL).
- they can be more productive once learned.
- The command prompt may look different from one system to another and can show various information. We’ll (try to) always show it as `$`, which is the “worst” case (least amount of information).
- You can type commands after the command prompt.
- When copying and pasting, be careful not to copy the command prompt.
- A shell is a program whose primary purpose is to read commands and run other programs.
- This lesson uses Bash, the default shell in many implementations of Unix.
- Programs can be run in Bash by entering commands at the command-line prompt.
- The shell’s main advantages are its high action-to-keystroke ratio, its support for automating repetitive tasks, and its capacity to access networked machines.
- A significant challenge when using the shell can be knowing what commands need to be run and how to run them.
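As a minimal sketch of entering commands at the prompt, the lines below run three standard programs. Type each one after the `$` (without copying the `$` itself) and press Enter. The exact output of `date` and `uname` depends on your machine.

```bash
echo "Hello from Bash"   # run the echo program, which prints its arguments
date                     # print the current date and time
uname                    # print the name of the operating system kernel, e.g. Linux
```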
Navigating Files and Directories
- The file system is responsible for managing information on the disk.
- Information is stored in files, which are stored in directories (folders).
- Directories can also store other directories, which then form a directory tree.
- `pwd` prints the user’s current working directory.
- `ls [path]` prints a listing of a specific file or directory; `ls` on its own lists the current working directory.
- `cd [path]` changes the current working directory.
- Most commands take options that begin with a single `-`.
- Directory names in a path are separated with `/` on Unix, but `\` on Windows.
- `/` on its own is the root directory of the whole file system.
- An absolute path specifies a location from the root of the file system.
- A relative path specifies a location starting from the current location.
- `.` on its own means ‘the current directory’; `..` means ‘the directory above the current one’.
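The short session below sketches these navigation commands. It works inside a scratch directory created with `mktemp -d` (a standard utility not covered in this lesson) and uses `mkdir -p`, from the next section, to build a small tree. The variable `demo` and the directory names `data` and `raw` are made up for illustration.

```bash
demo=$(mktemp -d)          # create an empty scratch directory; mktemp prints its path
mkdir -p "$demo/data/raw"  # -p creates data/ and data/raw/ in one go
cd "$demo"                 # change to the scratch directory
pwd                        # print the current working directory
ls                         # list the current working directory: data
cd data/raw                # relative path, resolved from the current directory
pwd                        # the printed path now ends in /data/raw
cd ..                      # .. is the directory above, so we are now in data
ls .                       # . is the current directory; lists: raw
ls -a                      # the -a option also lists hidden entries, including . and ..
cd /                       # / on its own is the root of the whole file system
pwd                        # prints /
```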
Working With Files and Directories
- `cp [old] [new]` copies a file.
- `mkdir [path]` creates a new directory.
- `mv [old] [new]` moves (renames) a file or directory.
- `rm [path]` removes (deletes) a file.
- `*` matches zero or more characters in a filename, so `*.txt` matches all files ending in `.txt`.
- `?` matches any single character in a filename, so `?.txt` matches `a.txt` but not `any.txt`.
- Use of the Control key may be described in many ways, including `Ctrl-X`, `Control-X`, and `^X`.
- The shell does not have a trash bin: once something is deleted, it’s really gone.
- Most files’ names are `something.extension`. The extension isn’t required, and doesn’t guarantee anything, but is normally used to indicate the type of data in the file.
- Depending on the type of work you do, you may need a more powerful text editor than Nano.
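Here is a hedged sketch of these commands in a scratch directory. It assumes `mktemp -d` for the scratch directory and uses `touch` (not covered above) to create empty files. All file and directory names are hypothetical.

```bash
demo=$(mktemp -d) && cd "$demo"          # work in a throwaway directory
mkdir thesis                              # create a new directory
touch thesis/draft.txt                    # touch creates an empty file
cp thesis/draft.txt thesis/backup.txt     # copy: draft.txt stays where it is
mv thesis/draft.txt thesis/quotes.txt     # move (rename): draft.txt no longer exists
touch a.txt any.txt                       # two files to try the wildcards on
ls *.txt                                  # * matches zero or more characters: lists both files
ls ?.txt                                  # ? matches exactly one character: lists only a.txt
rm any.txt                                # gone for good: there is no trash bin
ls                                        # a.txt thesis
```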
Pipes and Filters
- `wc` counts lines, words, and characters in its inputs.
- `cat` displays the contents of its inputs.
- `sort` sorts its inputs.
- `head` displays the first 10 lines of its input by default without additional arguments.
- `tail` displays the last 10 lines of its input by default without additional arguments.
- `command > [file]` redirects a command’s output to a file (overwriting any existing content).
- `command >> [file]` appends a command’s output to a file.
- `[first] | [second]` is a pipeline: the output of the first command is used as the input to the second.
- The best way to use the shell is to use pipes to combine simple single-purpose programs (filters).
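A minimal sketch of these filters and redirections, assuming a made-up file `numbers.txt` with one number per line. The file is created with `printf`, which repeats its format once per argument. The `-n` options are standard: `sort -n` sorts numerically, and `head -n`/`tail -n` choose how many lines to show.

```bash
demo=$(mktemp -d) && cd "$demo"            # work in a throwaway directory
printf '%s\n' 12 3 45 7 3 > numbers.txt    # > writes (or overwrites) numbers.txt
cat numbers.txt                            # show the five numbers
wc -l numbers.txt                          # count the lines (5)
sort -n numbers.txt > sorted.txt           # sort numerically into a new file
head -n 2 sorted.txt                       # first two lines: 3 and 3
tail -n 1 sorted.txt >> summary.txt        # >> appends the largest value (45)
# A pipeline: sort, keep the first three lines, then the last of those,
# i.e. the third-smallest number (7), without any temporary files.
sort -n numbers.txt | head -n 3 | tail -n 1
```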
Loops
- A `for` loop repeats commands once for every thing in a list.
- Every `for` loop needs a variable to refer to the thing it is currently operating on.
- Use `$name` to expand a variable (i.e., get its value). `${name}` can also be used.
- Do not use spaces, quotes, or wildcard characters such as ‘*’ or ‘?’ in filenames, as it complicates variable expansion.
- Give files consistent names that are easy to match with wildcard patterns to make it easy to select them for looping.
- Use the up-arrow key to scroll up through previous commands to edit and repeat them.
- Use Ctrl+R to search through the previously entered commands.
- Use `history` to display recent commands, and `![number]` to repeat a command by number.
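The loop below is a sketch of these points. The `.dat` file names are hypothetical and chosen so that a single wildcard selects them. The loop also shows one case where the `${name}` form is required rather than optional.

```bash
demo=$(mktemp -d) && cd "$demo"            # work in a throwaway directory
touch basilisk.dat minotaur.dat unicorn.dat
# The wildcard is expanded once, before the loop starts; on each pass
# the variable "filename" holds one of the matching names.
for filename in *.dat
do
    echo "Processing $filename"
    # Braces mark where the variable name ends: written as $filename_backup,
    # the shell would look up a different (unset) variable, filename_backup.
    cp "$filename" "${filename}_backup"
done
```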
Shell Scripts
- Save commands in files (usually called shell scripts) for re-use.
- `bash [filename]` runs the commands saved in a file.
- `$@` refers to all of a shell script’s command-line arguments.
- `$1`, `$2`, etc., refer to the first command-line argument, the second command-line argument, etc.
- Place variables in quotes if the values might have spaces in them.
- Letting users decide what files to process is more flexible and more consistent with built-in Unix commands.
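As a sketch, the script below prints its first argument and then the line count of every file it is given. The script name `count-lines.sh` and the input files are hypothetical. The script is written with a here-document whose quoted `'EOF'` marker keeps `$1` and `$@` from being expanded while the file is created. One input file deliberately contains a space to show why the quotes matter.

```bash
demo=$(mktemp -d) && cd "$demo"            # work in a throwaway directory
cat > count-lines.sh <<'EOF'
# count-lines.sh: report the number of lines in each file given as an argument
echo "First argument: $1"
for file in "$@"          # "$@" expands to every argument, each kept as one word
do
    echo "$file: $(wc -l < "$file")"
done
EOF
printf 'a\nb\n' > two.txt
printf 'x\n' > "one line.txt"
bash count-lines.sh two.txt "one line.txt"
```

Because the file names come from the command line, the same script also works with wildcards, e.g. `bash count-lines.sh *.txt`.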
Finding Things
- `find` finds files with specific properties that match patterns.
- `grep` selects lines in files that match patterns.
- `--help` is an option supported by many Bash commands, and by many programs that can be run from within Bash, to display more information on how to use them.
- `man [command]` displays the manual page for a given command.
- `$([command])` inserts a command’s output in place.
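A small sketch of `find`, `grep`, and `$(...)` on a made-up directory of notes. The options `-name`, `-type d`, `-r`, `-n`, and `-c` are standard options of these tools. The file names and contents are hypothetical.

```bash
demo=$(mktemp -d) && cd "$demo"                  # work in a throwaway directory
mkdir -p notes/old
printf 'apple\nbanana\ncherry\n' > notes/fruit.txt
printf 'banana split\n' > notes/old/desserts.txt
find . -name '*.txt'          # files whose names match *.txt, at any depth
find . -type d                # directories only
grep banana notes/fruit.txt   # lines of fruit.txt containing "banana"
grep -n an notes/fruit.txt    # -n prefixes each matching line with its number
grep -r banana notes          # -r searches every file below notes/
# $(...) puts find's output in place, so wc receives the file names as arguments
# (this splits on spaces, one more reason to avoid them in file names).
wc -l $(find . -name '*.txt')
```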
File Permissions
- `ls -l` gives information about basic file permissions.
- Files and directories have a user owner and a group owner.
- `chown` and `chgrp` change the user owner and the group owner of a file; the permissions themselves are changed with `chmod`.
- ACLs (access control lists) provide a more fine-grained mechanism for granting access to additional users and groups.
- ACLs can be inspected and modified with the `getfacl` and `setfacl` commands.
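A minimal sketch of inspecting ownership and permissions on a scratch file. It uses `chmod`, which changes permission bits, and `id -g`, which prints your primary group ID. `getfacl` runs only if it is installed. The `setfacl` line is left commented out because the user name `alice` is hypothetical.

```bash
demo=$(mktemp -d) && cd "$demo"   # work in a throwaway directory
touch results.txt
ls -l results.txt               # mode string, link count, user owner, group owner, ...
chgrp "$(id -g)" results.txt    # set the group owner (you must be a member of that group)
chmod g+w results.txt           # chmod changes permissions: here, let the group owner write
ls -l results.txt               # the group part of the mode string now includes w
if command -v getfacl > /dev/null 2>&1
then
    getfacl results.txt         # show the ACL entries (base permissions if no extra ACL is set)
fi
# setfacl -m u:alice:r results.txt   # hypothetical: give user alice read access via an ACL
```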
Recommendations on using the shell on shared HPC systems
- When working on the login node of an HPC system, make sure your commands are not using too many computational resources and do not run for more than a couple of minutes.
- If you need more power, create an interactive (or batch) allocation with the resource manager of the HPC cluster (e.g., SLURM).
- Performing too many file operations can be inefficient, and it can also slow down the shared filesystem for other users.