Introductory Notes for New Users of RCS HPC Systems
1. This Page
This page is intended for users new to RCS HPC/HTC systems. It describes:
- the steps required to access RCS HPC systems — in particular those steps required from a MS Windows desktop/laptop;
- the nature of these HPC systems;
- sharing resources with other users and running computational jobs on them;
- some experimental services which are new to these HPC systems.
2. Overview
All RCS HPC systems are used remotely via SSH. Users authenticate (i.e., login) using an SSH client; after successful authentication a command-line interface is presented. This can be used to submit computational jobs to the batch system queues.
The remainder of this section may be considered the short version of this document — for those familiar with remotely accessing Linux-based HPC systems and submitting jobs to batch systems. For those that are not, please read the remaining sections!
- Getting an Account and Authentication
- Email [email protected] briefly describing your computational requirements. More. . .
- Connecting
- All systems are accessed via SSH, SCP and/or SFTP. More. . .
- Network/Firewall Issues
- All systems are firewalled. Some systems are accessible from all University of Manchester IP addresses; others are not. Few are accessible from outside of the University of Manchester. More. . .
- Using GUI-Based Applications
- SSH, on its own, gives a command-line interface only. Should the use of GUI-based applications be required, for example the Notepad/Wordpad-like editor Gedit, or the Matlab graphical shell, then X-Windows (X11) may be tunnelled through the SSH connection (and an X-server will be required on the local desktop/laptop). More. . .
- The Nature of the Systems
- All RCS-administered HPC systems are Linux clusters of many computers, usually called nodes. In most cases these clusters exist on a completely private network; users directly access only one or two login/head nodes. More. . .
- Running a Computational Job in the Batch System
- Many people are likely to be using each cluster simultaneously; all computational jobs must be run on compute nodes, not on the login/head node(s). Computational work is submitted to these compute nodes via the batch system queues. More. . .
- Running Interactive Computational Jobs
- The vast majority of computational work carried out on RCS HPC systems is done in batch mode, i.e., non-interactively. On rare occasions it is necessary to run jobs interactively. Experimental queues exist on two RCS systems, Man2 and Mace01, which facilitate this. More. . .
- Virtual Desktops
- Running GUI-based, interactive computations presents a problem: if the local desktop or laptop on which the GUI is displayed is switched off, or loses network connectivity, the computation will be killed even though it is running on the remote HPC system. Using a virtual desktop to display the GUI eliminates this problem. More. . .
- Troubleshooting and FAQ
3. Getting an Account; Authentication
Getting an Account
To get an account on any RCS-administered HPC system, email [email protected], briefly describing the computational work that you wish to carry out, for example:
- applications, compilers or libraries needed;
- a rough estimate of diskspace required;
- the nature of the computational jobs you hope to run, for example:
- Do you wish to run a few long-running jobs, or a lot of short jobs?
- Does your work require a particularly large amount of memory?
- Is your code serial or parallel? (i.e., can it use more than one CPU at once?)
Getting Your Username and Password
For each HPC system run by RCS, you will have a username and password to enable you to authenticate (login) and run computational jobs. These credentials are independent of your central IT Services username and password, though, simply for ease of administration, the username will usually be the same.
Once you have an account on an RCS-administered HPC system, the system-administrator will contact you to give you your username and password.
For security reasons, as soon as you have received your credentials for a system, you should log in and change your password (using the passwd command).
4. Connecting to RCS Linux-Based HPC Systems
4.1. Secure Shell (SSH)
Secure Shell (SSH) is a network protocol which is used to connect to remote computers, i.e., to authenticate (login) and interact with the remote system.
Macintosh OS-X systems and all popular Linux distributions include an SSH client called OpenSSH; MS Windows users must download and install one. The most popular is PuTTY which can be freely downloaded and installed.
4.2. Using OpenSSH on Linux and OS-X
At a command line, on Linux or OS-X, simply type
    ssh <username>@<remote.system.name>
for example
    ssh [email protected]
The first time you connect to a particular system you will be prompted to confirm its authenticity, for example
    The authenticity of host 'man2.nw-grid.ac.uk (130.88.200.243)' can't be established.
    RSA key fingerprint is cf:48:69:ff:99:f0:a1:4a:80:0b:46:b5:40:c0:fc:4c.
    Are you sure you want to continue connecting (yes/no)?
Unless you have any reason for doubt, enter yes and you will then be prompted for your password; enter that given to you by the system's administrator (not your central IT Services password).
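For those who connect often, OpenSSH can read per-host settings from the file ~/.ssh/config, so that a short alias stands in for the full username and system name. The following is a minimal sketch only: the function name, the alias hpc, the host remote.example.org and the username your_username are all placeholders, not real RCS values. It prints a stanza for review rather than editing the configuration file itself.

```shell
#!/bin/bash
# Print an OpenSSH client configuration stanza for one remote system.
# The alias, host name and username are placeholders for illustration.
ssh_config_stanza() {
    local alias_name="$1" host_name="$2" user_name="$3"
    printf 'Host %s\n' "$alias_name"
    printf '    HostName %s\n' "$host_name"
    printf '    User %s\n' "$user_name"
}

# Example: review the output, then append it to ~/.ssh/config by hand.
ssh_config_stanza hpc remote.example.org your_username
```

With such a stanza in place, typing ssh hpc would be equivalent to typing ssh your_username@remote.example.org.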
4.3. Using PuTTY
From a MS Windows desktop/laptop, to authenticate (login) to a remote Linux system, install and start PuTTY
PuTTY Configuration
Enter the name of the system to which you wish to connect and click Open.
The first time you connect to any given system you will see a PuTTY security alert.
PuTTY Security Alert
The next step is authentication. Enter your username at the prompt — this will usually be your central IT Services username:
PuTTY Login Prompt
Enter your username at the prompt and then, when asked, the password given to you for this system.
5. GUI-Based Applications and X-Windows
Using PuTTY alone allows you to login and enter commands, for example, submit computational jobs to the batch system. But what if you want to start a GUI-based editor, such as gedit, or start the Matlab GUI? Then you will need to be running an X11 Server on your local desktop/laptop and also to connect using PuTTY with X11 tunnelling enabled.
Macintosh OS-X systems and all popular Linux distributions include an X11 server — that on Linux is always running (assuming you are running a GUI-based desktop such as GNOME or KDE). MS Windows users must download and install one. The most popular are Hummingbird eXceed and Xming; the University has a site licence for eXceed; Xming may be freely downloaded and installed.
PuTTY: Enable X11 Forwarding
Ensure the Enable X11 Forwarding box is checked.
Once you have an X11 server installed, then in order:
- Start the X11 server — eXceed or Xming.
- Start PuTTY — ensure the Enable X11 forwarding box is "checked" (see figure).
- Log in to the remote Linux system as normal. You should then be able to start GUI-based applications such as gedit and Matlab on the remote system and have them displayed on your local desktop/laptop.
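On Linux and OS-X, the OpenSSH counterpart of PuTTY's Enable X11 forwarding box is the -X option, i.e., ssh -X <username>@<remote.system.name>. When tunnelling is active, SSH sets the DISPLAY environment variable in the remote session, so checking that variable after login is a quick test. The sketch below assumes nothing about the RCS systems beyond that; the function name is hypothetical.

```shell
#!/bin/bash
# Report whether an X11 display is available in the current session.
# Returns 0 if DISPLAY is set and non-empty, 1 otherwise.
check_x11_display() {
    if [ -n "${DISPLAY:-}" ]; then
        echo "X11 display available: ${DISPLAY}"
        return 0
    fi
    echo "DISPLAY is not set; GUI applications will not start" >&2
    return 1
}

# Run the check; "|| true" stops a failed check from aborting the script.
check_x11_display || true
```

If the check reports that DISPLAY is not set, the usual causes are that the local X11 server is not running or that forwarding was not enabled when connecting.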
6. File Transfer
It is likely that you will wish to upload files to the HPC system, or download them to your desktop/laptop. Linux and OS-X users can do this using the OpenSSH suite of utilities (which comes with all popular distributions). MS Windows users must download a suitable client; WinSCP, which is freely downloadable, is a popular choice.
6.1. Using SCP and SFTP
At a command line, on Linux or OS-X, to upload a file from your desktop/laptop, simply type
    scp <local.filename> <username>@<remote.system.name>:<remote.filename>
for example
    scp my_prog.f90 [email protected]:my_programme.f90
To download a file to your desktop/laptop, enter, for example,
    scp [email protected]:my_results.dat my_remote_results.dat
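Whole directories can be copied in the same way by adding the -r (recursive) option to scp; -p additionally preserves modification times and permissions. The wrapper below is a sketch rather than an RCS-supplied tool: the function name, the DRY_RUN variable and all file, user and host names are hypothetical. With DRY_RUN=1 it prints the scp command instead of running it, which is a safe way to check the arguments first.

```shell
#!/bin/bash
# Copy a file or directory to a remote system with scp.
# Set DRY_RUN=1 to print the command instead of running it.
upload() {
    local source_path="$1" remote_login="$2" remote_path="$3"
    # -r copies directories recursively; -p preserves times and modes.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo scp -r -p "$source_path" "${remote_login}:${remote_path}"
    else
        scp -r -p "$source_path" "${remote_login}:${remote_path}"
    fi
}

# Example (placeholder names): show what would be run for a results directory.
DRY_RUN=1 upload results_dir your_username@remote.example.org results_backup
```

For interactive transfers, sftp <username>@<remote.system.name> opens a session in which put and get upload and download files.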
6.2. Using WinSCP
To download or upload files, start WinSCP, enter the name of the system you wish to upload/download files to/from in the Host name box, and then enter your username and password.
WinSCP Login
Enter the name of the remote system, your username and password, and click Login
The first time you login to any given system you will see a warning message.
WinSCP Warning
Once logged in, a nice drag-n-drop interface is presented.
WinSCP Drag-n-Drop Interface
7. Network and Firewall Issues
All RCS-administered HPC systems are firewalled; the firewall policies vary and depend on the purpose of the system. The system-specific documentation should give details. If a system is not accessible from all University of Manchester IP addresses, users will be required to register addresses from which they plan to connect. Access may be possible using the University VPN — from both on and off campus.
8. The Nature of HPC Systems
- Each HPC System is a Cluster
- Each HPC system is a cluster of nodes, on a private network. Only the login/master node is accessible on the public network and only this node is accessed by users. All the other nodes are compute nodes (which are directly accessed only by the system-administrator).
- Each HPC System is Used by Many People
- Many people use each HPC cluster; the computational resources are shared between them.
- Batch Systems and Queues
- Users submit computational work from the login/master node to the compute nodes via a batch system.
9. Running Computational Jobs
The HPC systems are a shared computational resource. To ensure everyone gets a fair share and to allow the system to function correctly:
- All computationally-intensive work must be submitted to the batch system's queues; computationally-intensive processes run on the login node will be killed without warning. (Low-intensity work, such as editing, is of course perfectly acceptable on the login node.)
- A consequence of this is that most computational work must be carried out in batch mode, i.e., non-interactively. For example, if using Matlab, the required computation must be done by asking Matlab to run a Matlab programme, rather than by starting the (interactive) application interface (e.g., GUI).
- Most RCS-run HPC systems use SGE (Sun Grid Engine) as the batch system:
- An Introduction to SGE: Batch Systems, SGE, Queues, Submitting, Monitoring and Deleting Jobs;
- Parallel Jobs with example Qsub Scripts;
- Job Arrays: running lots of related jobs efficiently;
- More on SGE: Job Dependencies, Advance Reservation.
- Some RCS-run HPC systems do not make use of SGE, notably Horace. Details of how to submit jobs to their respective batch systems should be found in the dedicated documentation.
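As a rough illustration of batch mode on an SGE system, the sketch below is a small job script. Lines beginning with #$ are directives read by qsub; -S, -cwd and -N are standard SGE options, but any further options, environment setup or queue names needed on a particular RCS system are not shown and should be taken from that system's documentation. The job name, the function name and the arithmetic standing in for real work are placeholders.

```shell
#!/bin/bash
# Use bash to interpret the job script.
#$ -S /bin/bash
# Run the job from the directory it was submitted from.
#$ -cwd
# Name of the job as shown by qstat.
#$ -N example_job

# Placeholder computation: sums the integers 1 to 5.
run_job() {
    local total=0 i
    echo "Job started on $(hostname)"
    # Replace this loop with the real work, e.g. matlab < my_prog.m
    for i in 1 2 3 4 5; do
        total=$((total + i))
    done
    echo "Result: ${total}"
}

run_job
```

Saved as, say, my_job.sh, the script would be submitted from the login node with qsub my_job.sh, monitored with qstat and, if necessary, deleted with qdel followed by the job number.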
10. Experimental Interactive/GUI Queues
Traditionally all computational jobs run on HPC clusters are batch jobs, i.e., once started, there is no interaction with the computation; no GUI is required or used. For example, Matlab code is run at the command-line interface (e.g., matlab < my_prog.m) rather than within the graphical shell.
However, in some cases use of an application GUI may be desirable or even necessary (e.g., with Matlab or Fluent). For this, interactive queues exist on man2.nw-grid.ac.uk and mace01.mace.manchester.ac.uk which enable users to queue interactive, GUI-based sessions.
These interactive queues are experimental. Please contact the system administrator of Man2 and/or Mace01 before using them.
11. Experimental Virtual Desktop Services
It may be that queued, GUI-based sessions take some hours, and that during that time a user wishes to change location (e.g., move from office to home) and/or the computer being used (e.g., from office desktop to home laptop).
Shutting down (or suspending) a desktop or laptop on which a remotely-running application GUI is displayed will usually force the application to exit (when the connection timeout is exceeded), killing the job half-way through; and of course a user at a different location can no longer interact with an application displayed on the original desktop!
VNC solves these problems via its virtual desktop. Applications run on, and are displayed on, the virtual desktop whether or not the virtual desktop is itself currently being displayed. This means that a user can start an application on a virtual desktop, then disconnect and reconnect as required, while the application continues to run untroubled.
An experimental virtual desktop (VNC) service is being trialled on mace01.mace.manchester.ac.uk.
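A common way of reaching a VNC virtual desktop through a firewall is to tunnel it over SSH. By convention, VNC display number N listens on TCP port 5900 + N. The sketch below only prints the tunnel command for a given display number. Whether the trial service on Mace01 is reached in this way is an assumption to be confirmed with the system administrator, and the function names and username are placeholders.

```shell
#!/bin/bash
# Print the TCP port conventionally used by VNC display number N (5900 + N).
vnc_port() {
    echo $((5900 + $1))
}

# Print an ssh command that forwards the local VNC port to the same port
# on the remote system.
vnc_tunnel_command() {
    local display_number="$1" remote_login="$2"
    local port
    port="$(vnc_port "$display_number")"
    echo "ssh -L ${port}:localhost:${port} ${remote_login}"
}

# Example (placeholder username): tunnel for virtual desktop number 1.
vnc_tunnel_command 1 your_username@mace01.mace.manchester.ac.uk
```

With such a tunnel open, a VNC viewer on the local desktop/laptop would be pointed at localhost:1 rather than at the remote system directly.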