User documentation for CASTOR at the RAL Tier-1

CASTOR (CERN Advanced STORage manager) is a software system used to manage a large tape-based storage system. It was developed at CERN, but has been customised for the environment at RAL. This page gives a brief explanation of the system and some guidance for users, with pointers to more detailed information. It is aimed especially at non-LHC VOs, and assumes that the reader has some general knowledge of the Grid (see elsewhere on this web site for pointers to further information).

CASTOR contacts

There is a weekly meeting for discussion between CASTOR users and members of the Tier-1 team. This is currently held every Wednesday at 13:30, except for the first Wednesday in the month when it's replaced by a general Tier-1 meeting. The meeting is held physically in building R75 (the RAL main entrance building), but remote participants can ask to be phoned in.

CASTOR components

CASTOR is a complex system with a number of more-or-less separate components. General information about CASTOR can be found on the CERN web site:

CASTOR home page at CERN
CASTOR User Guide

However, the configuration at RAL is somewhat different, so this page explains some of the RAL-specific details, as illustrated in this schematic diagram.

The comments below refer to some CASTOR-specific commands. The Tier-1 has its own User Interface (UI) hosts, but access to these is no longer available to general users. However, it may be possible to negotiate access for a small number of people per experiment to enable debugging or privileged operations. RAL users may also be able to use the Tier-2 UI in PPD, but this operates as a separate system and hence may not fully interoperate (use of RFIO requires UIDs and GIDs to match on the two systems). Access from outside RAL is generally not possible except via the Grid (SRM) interface.

Some environment variables are needed for the commands, as described below. To summarise, the relevant variables are:


CNS_HOST=castorns.ads.rl.ac.uk
STAGE_SVCCLASS=atlasSimStrip
STAGE_HOST=catlasstager.ads.rl.ac.uk
RFIO_USE_CASTOR_V2=YES

with the service class and stager name changed as appropriate.
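
One way to avoid retyping these settings is to wrap them in a small shell function. The following is only a sketch: castor_env is a hypothetical helper name, not a CASTOR command, and the values passed to it are the ATLAS examples quoted above.

```shell
# Hypothetical helper (not part of CASTOR): export the client settings
# for one stager host ($1) and one service class ($2).
castor_env() {
    export CNS_HOST=castorns.ads.rl.ac.uk
    export STAGE_HOST="$1"
    export STAGE_SVCCLASS="$2"
    export RFIO_USE_CASTOR_V2=YES
}

# Example call using the ATLAS values shown above.
castor_env catlasstager.ads.rl.ac.uk atlasSimStrip
```

Calling the function again with a different stager and service class replaces the previous settings for the current shell session.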

The tape storage system

The Tier-1 has a large robotic tape store with a potential capacity of around 5 PB, and a total of 18 tape drives. This is shared between all the user communities, but some of the drives are reserved for particular VOs to prevent one VO starving another of resources. The mounting of tape cartridges to and from tape drives is managed by a piece of software called the Volume Drive Queue Manager (VDQM), but this is normally transparent to users.

As files are deleted from tapes, gaps are left. A piece of software called Repack is used to move files around on tape to recover this space. In addition it is possible to group related files together on tape in file families, but this is somewhat complex and is not described here.

The Name Server

The CASTOR name server maintains a single namespace for all files stored in the system at RAL, using an underlying Oracle database. This is a Unix-like namespace with a root of /castor/ads.rl.ac.uk/. The name server host is castorns.ads.rl.ac.uk; usually this will be used by default, but if necessary it can be defined using the CNS_HOST environment variable.

Name server commands are prefixed with ns. These are generally rather low-level commands which are unlikely to be needed by most users, but the nsls command may be useful to list files in a similar way to the Unix ls. The environment variable CASTOR_HOME can be defined as a prefix used with relative path names.
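
The effect of CASTOR_HOME on relative names can be illustrated with a few lines of shell. This is a sketch only: castor_abspath is a hypothetical helper that mimics the prefixing rule, assuming the prefix is simply joined to the relative name with a slash, and the directory used is taken from the lcg-cp example later on this page. The name server clients do the real resolution themselves.

```shell
# Hypothetical helper: show the full name that a relative path
# would have once CASTOR_HOME is applied as a prefix.
castor_abspath() {
    case "$1" in
        /*) printf '%s\n' "$1" ;;                      # already absolute
        *)  printf '%s\n' "${CASTOR_HOME%/}/$1" ;;     # prefix with CASTOR_HOME
    esac
}

export CNS_HOST=castorns.ads.rl.ac.uk
export CASTOR_HOME=/castor/ads.rl.ac.uk/test/atlas

castor_abspath test.file    # prints /castor/ads.rl.ac.uk/test/atlas/test.file

# If the CASTOR clients are installed, list the same file by its relative name.
if command -v nsls >/dev/null 2>&1; then
    nsls -l test.file
fi
```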

The Stagers

A Stager is a software system which manages files on a pool of disk servers, and transfers (stages) those files to and from the tape system. In general CASTOR expects to use the disk servers as a cache, from which files can be deleted if space is needed, as they can subsequently be recalled from tape. However, CASTOR has also recently been enhanced to support the use of disk-only files with no tape copy. Files can only be accessed from disk, so if a file has been migrated to tape and removed from disk it has to be staged back to disk before it can be used, which can take some time.

At RAL each of the major LHC experiments (ATLAS, CMS and LHCb) has its own stager to avoid contention. The other experiments all share a single stager. Requests in the stagers are scheduled using the LSF batch scheduler. The disk pools managed by a stager are divided into service classes which have a set of defined properties, for example whether the files are backed up to tape or not.

There are again some command-line tools to interact with the stager. The most useful of these is stager_qry -M <filename>, which gives some information about the status of the given file, in particular whether it's currently staged to disk.

These commands need to know the stager host name, which can be set using the STAGE_HOST environment variable. The current names are catlasstager.ads.rl.ac.uk, ccmsstager.ads.rl.ac.uk, clhcbstager.ads.rl.ac.uk and genstager.ads.rl.ac.uk. It may also be necessary to set the variable STAGE_SVCCLASS to the name of the relevant service class. The service classes are specific to each experiment so in general you will need to ask which class to use, but the defined class names can be obtained from the information system or from ganglia as described below. There is also some information on the wiki.
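
As an illustration of how these settings fit together with stager_qry, the sketch below picks a stager host from a VO name and then queries one file. stager_for_vo is a hypothetical helper, the assumption that genstager is the stager shared by the other experiments is ours, and the service class and file name are just the examples used elsewhere on this page.

```shell
# Hypothetical helper: map a VO name to one of the stager hosts listed above.
stager_for_vo() {
    case "$1" in
        atlas) echo catlasstager.ads.rl.ac.uk ;;
        cms)   echo ccmsstager.ads.rl.ac.uk ;;
        lhcb)  echo clhcbstager.ads.rl.ac.uk ;;
        *)     echo genstager.ads.rl.ac.uk ;;   # assumed to be the shared stager
    esac
}

export STAGE_HOST="$(stager_for_vo atlas)"
export STAGE_SVCCLASS=atlasSimStrip             # example class; ask which one applies to you

# Query the status of one file if the CASTOR clients are installed.
if command -v stager_qry >/dev/null 2>&1; then
    stager_qry -M /castor/ads.rl.ac.uk/test/atlas/test.file
fi
```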

File classes

Each file belongs to a file class, the main purpose of which is to define whether the file will be copied to tape or not. The file class may depend on both the service class and the file name, e.g. all files under a given directory. Files in a given file class will also generally be grouped together when written to tape. File class properties can be listed with the nslistclass command, and the class for a given file can be determined using nsls --class.

RFIO

RFIO (Remote File I/O) is a software protocol which provides Unix-like access to files in the CASTOR namespace. Note that this is not a true Unix filesystem, but a library which mimics the standard POSIX I/O functions, together with a set of command-line tools similar to the standard Unix tools. Particularly useful commands are rfdir (similar to nsls described above) and rfcp (similar to the Unix cp).

The current version of RFIO is not Grid-aware: it maps users according to their local Unix uid/gid. This limits its usefulness in a Grid environment, especially at RAL where users are in general no longer given local accounts. A Grid-aware version of RFIO has been developed and is expected to appear with the next major upgrade to CASTOR, but this may not be until the end of 2009. However, the Grid-aware version is already in use with the DPM disk storage system used at many Grid sites, and its clients are therefore distributed with the standard Grid User Interface software. Unfortunately these have the same names but are not interoperable with the CASTOR RFIO tools. The former are generally stored in /opt/lcg/bin and the latter in /usr/bin with the latter usually coming first in the PATH, so it may be necessary to ensure that you refer to them using the full path name. You should also set the environment variable RFIO_USE_CASTOR_V2=YES, as well as the stager variables described above.
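
Because two incompatible sets of tools share the same command names, it can help to check which copies are visible and in what order the shell will find them. The sketch below is a plain shell illustration, not a CASTOR tool: which_all is a hypothetical helper that walks the PATH and prints every executable with the given name.

```shell
# Hypothetical helper: print every executable called "$1" found on the PATH,
# in lookup order, one per line. The body runs in a subshell so that the
# changes to IFS and globbing do not leak into the calling shell.
which_all() (
    IFS=:
    set -f                                  # do not glob-expand PATH entries
    for dir in $PATH; do
        candidate="${dir:-.}/$1"            # an empty PATH entry means "."
        if [ -f "$candidate" ] && [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
        fi
    done
)

# Show all rfcp commands that are visible; the first line is the one
# that a bare "rfcp" would run.
which_all rfcp

export RFIO_USE_CASTOR_V2=YES
```

If the first entry is not the client you want, call the intended one by its full path name.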

The SRM

SRM (Storage Resource Manager) is a standard Grid protocol used to communicate with a storage system. The CASTOR implementation is currently a separate software layer on top of the standard CASTOR system, with its own front-end servers and back-end database. The SRM exposes a Grid-enabled web-service interface and can therefore be addressed directly by a client, but in general it's more convenient to use higher-level tools as described below. Each experiment (VO) has its own SRM endpoint called srm-<voname>.gridpp.rl.ac.uk, which may map to several load-balanced hosts for resilience.

The SRM has recently been upgraded to version 2 of the protocol. The main new feature is support for so-called space tokens. These are named storage areas with defined properties, which for CASTOR basically map to service classes (although not all service classes have an associated space token).

The current implementation of the CASTOR SRM is not VOMS-aware; it relies on a static mapping from the DN of the user to a VO-based Unix account via the so-called Grid map file. One consequence of this is that it is not possible for a user to belong to more than one VO with the same certificate (DN) as the user mapping will always be whichever one happens to be found first. Users in multiple VOs should therefore have a separate certificate for each VO.
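
The first-match behaviour can be illustrated with a small shell sketch. Everything in it is hypothetical: the DN, the account names and the first_mapping helper are invented for the example, and the real SRM does not use this script. It assumes the usual Grid map file layout of a quoted DN followed by an account name.

```shell
# Hypothetical helper: print the account on the first map-file line
# whose quoted DN matches "$1". "$2" is the path to the map file.
first_mapping() {
    DN="$1" awk 'BEGIN { dn = "\"" ENVIRON["DN"] "\"" }
                 index($0, dn) == 1 { print $NF; exit }' "$2"
}

# Build a demonstration map file in which one DN appears for two VOs.
mapfile_demo=$(mktemp)
cat > "$mapfile_demo" <<'EOF'
"/C=UK/O=Example/CN=jane doe" atlas001
"/C=UK/O=Example/CN=jane doe" cms001
EOF

# Only the first entry is ever used, so this prints atlas001.
first_mapping "/C=UK/O=Example/CN=jane doe" "$mapfile_demo"
rm -f "$mapfile_demo"
```

The second line is never reached for this DN, which is why a separate certificate per VO is needed.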

The information system

Information about the CASTOR SRMs is published in the Grid information system in the standard way. Currently each SRM endpoint appears as a separate Storage Element (SE), and each service class is published as a separate GlueSA object, which in turn has an attached VOInfo object for each associated space token (if any).

High-level client tools

The standard way to access CASTOR is via the general Grid clients, i.e. the lcg-utils command-line tools, the GFAL API and the FTS for bulk data movement. General information about data management can be found in the gLite Users' Guide, and there is also a good introduction on the SEEGrid wiki. GFAL and lcg-utils have man pages which can also be found on the web, and FTS has wiki-based documentation.

Note that GFAL and lcg-utils have both C and python APIs.

In general the lcg-utils tools are designed to work with the LFC file catalogue. However, for simple uses this may not be necessary and the tools can be used without the LFC; files can be copied to and from the CASTOR SRM with lcg-cp and deleted with lcg-del. A simple example to copy a local file into CASTOR would be:


lcg-cp file:/etc/group srm://srm-atlas.gridpp.rl.ac.uk/castor/ads.rl.ac.uk/test/atlas/test.file

(after creating your Grid proxy), where srm-atlas should be replaced by the name of the SRM endpoint for your VO, and the file names should be changed as appropriate.
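
The same copy can be written so that the VO name and the CASTOR path are supplied separately, which makes it easier to adapt. This is a sketch under stated assumptions: castor_surl is a hypothetical helper that follows the srm-<voname>.gridpp.rl.ac.uk naming convention described above, and the local destination /tmp/test.file.copy is an arbitrary example.

```shell
# Hypothetical helper: build the SRM URL for a VO ($1) and an
# absolute CASTOR path ($2).
castor_surl() {
    printf 'srm://srm-%s.gridpp.rl.ac.uk%s\n' "$1" "$2"
}

surl=$(castor_surl atlas /castor/ads.rl.ac.uk/test/atlas/test.file)
echo "$surl"

# With a valid Grid proxy and lcg-utils installed, copy a local file in
# and then copy it back out again.
if command -v lcg-cp >/dev/null 2>&1; then
    lcg-cp file:/etc/group "$surl"
    lcg-cp "$surl" file:/tmp/test.file.copy
fi
```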

CASTOR monitoring

Some information about the state of CASTOR can be obtained from ganglia.


Last modified Tue 13 January 2009