PDSF - STAR
What are these MuDST files, and how do I work with them?
The MuDSTs are the reduced STAR data files, which are hosted both here at PDSF and at RCF. There is an excellent tutorial to help you get started here:
http://rnc.lbl.gov/~jhthomas/public/MuDstTutorial06.ppt
Local STAR Resources at PDSF
STAR keeps local copies of several key software components at PDSF: many of the most-used STAR software builds, and a local copy of the STAR database. Using the local resources, as opposed to loading over the wide area network (through AFS or an offsite DB server), should improve the performance of your jobs. You can find the list of locally installed libraries (as maintained by Eric Hjort) at PDSF-STAR Libraries.
You can switch between the local and AFS libraries through a file in your home directory called .pdsf_setup :
#!/usr/bin/csh -f
# Select one of the options below
# use_local strongly recommended
# to get afs based setup:
# setenv STAR_PDSF_LINUX_SETUP use_afs
# to avoid STAR setup altogether:
# setenv STAR_PDSF_LINUX_SETUP use_none
# to get local STAR setup:
setenv STAR_PDSF_LINUX_SETUP use_local
Note that use_local is uncommented. To use the AFS libraries, comment out the use_local line and uncomment the use_afs line. Using the AFS libraries will typically be slower, but may be useful if you need the most recently built DEV libraries or want to compare against the local setup.
Use of the local DB on PDSF is automatic with the STAR load-balancing model. If you need to use a different server for testing, you can grab the appropriate reference file, e.g. dbServers_dbx.xml, from the $STAR/StDb/servers area and put it into your home directory as dbServers.xml (please ask an expert if you are unsure!).
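As a minimal sketch of that step (the dbx file is the example named above; check with an expert that it is the right reference for your test):

# copy the reference server file into your home area under the expected name
cp $STAR/StDb/servers/dbServers_dbx.xml ~/dbServers.xml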
STAR PWG & Disk Space
The Unix file groups associated with the working groups were created as the STAR working groups were formed. At this time (Jan 2010) we are restructuring the working groups to map onto the recently redefined groups, such that:
starpcol  -- peripheral collisions
starhf    -- heavy flavor
starspin  -- spin
And now we are adding:
starbulkc -- bulk correlations (starebye + starhbt)
starjetc  -- jet correlations (starhipt + starestr)
starlf    -- light flavor (starspec + starstra)
Each of the OLD PWGs has an area under /eliza13/[OLD PWG]. Users should be assigned to at least one working group. To check which groups you are in, simply type the Unix command id. If you find that you are not in the STAR PWG that you need to be in, please contact your PWG convener and ask to be added to that group at PDSF.
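For example (the user name, numeric IDs, and group list below are made up for illustration):

id
uid=12345(myname) gid=1034(rhstar) groups=1034(rhstar),2045(starhf),2046(starspin)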
The new PWGs will be defined soon, and new PWG areas will be set up on the new disk /eliza14/star/pwg/[New PWG].
STAR use of NERSC HPSS?
STAR has a significant storage allocation on NERSC HPSS (http://www.nersc.gov/nusers/systems/HPSS) that is applied for annually. Each STAR user can access that storage individually. To make use of HPSS, you will need to set up your HPSS token. Details can be found at http://www.nersc.gov/nusers/systems/hpss/usage_intro.php.
There are several resources available to help you use HPSS more effectively; please see the HPSS help page for examples (HPSS Use Examples, http://www.nersc.gov/nusers/systems/hpss/usage_examples.php).
How to back up your files to HPSS with HTAR
HPSS is the main storage element available at PDSF. Users should think of HPSS as their primary storage on the cluster: centralized disks are for temporary data storage while processing files. The disks are expected to crash occasionally, and the policy is that they are brought back online EMPTY.
You can write individual files to HPSS, but you can also create tar files directly on HPSS using the htar utility (http://www.nersc.gov/nusers/resources/hpss/usage_htar.php). For example, the following creates a directory "mythesisdir" under one's home area on HPSS and then builds a tar file in that directory from the contents of the disk directory "qa-analysis-dir":
hsi "mkdir -p mythesisdir/" htar -cvf mythesisdir/QA-Analysis-2010.tar qa-analysis-dir
Tasks like this can be extended to routinely manage your important data.
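As a sketch of how such an archive can later be checked or retrieved (the archive name follows the example above):

# list the contents of the archive that now lives on HPSS
htar -tvf mythesisdir/QA-Analysis-2010.tar
# extract the archive again into the current directory
htar -xvf mythesisdir/QA-Analysis-2010.tar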
How can I use HPSS with my batch jobs?
For STAR, a few scripts for simulation are posted here: Hijing, Geant, Event Reconstruction. Eventually we expect to reshape these to make use of a STAR-specific interface to HPSS. That development will be documented here when it is ready.
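In the meantime, plain hsi calls work from inside a batch script. The sketch below only illustrates the idea; the HPSS directory and file names are hypothetical, and the macro path follows the scratch-space example further down.

#!/bin/sh
# run in the job-private scratch area
cd $SCRATCH
# stage one input MuDst from HPSS into scratch (archive directory and file name are hypothetical)
hsi "cd mudst-archive; get myinput.MuDst.root"
root4star -q ~/analysis/macros/myAnalysis.C myinput.MuDst.root
# see the scratch-space example below for copying the output back to HPSS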
How can I tell what embedding files are on disk?
We are in the process of implementing the file catalog system recently expanded at RCF to include the embedding files. In the meantime, however, please refer to the page:
Embedding Report
for the listing of available embedding files.
SGE Questions
For a good overview, please see the PDSF SGE page (http://www.nersc.gov/nusers/systems/PDSF/software/SGE.php). There are several tools available to monitor your jobs. The qmon command is a graphical interface to SGE which can be quite useful if your network connection is good. Try the command-line tools sgeuser and qstat for overall farm status and your individual job listings, respectively; both are discussed in the overview page linked above.
Batch jobs: Local scratch space ($SCRATCH)
Each node has local disk storage associated with it, accessible through $SCRATCH. It is recommended that users read and write in the scratch area while their jobs are running, then copy their output files to the final destination (either HPSS or GPFS disk).
SGE, the batch queue system, maintains a unique disk area for each job as scratch. The environment variable $SCRATCH is mapped to this area for each individual job. This means that users do not have to worry about their jobs running on different cores of one node interfering with each other.
It's important to remember that SGE removes this directory as soon as the job is complete. If you want to keep any output files, your job will need to archive those files before exiting.
Batch jobs: I get an error when I try to create a directory under /scratch. What do I do?
The local scratch area is now managed by SGE, and users CANNOT create and maintain their own directories on /scratch. The disk area you can write to is pointed to by the environment variable $SCRATCH or $TMPDIR. Please use these instead of /scratch/$username:
#!/bin/sh
# first (and only) argument: the MuDst file to analyze
mudstfile=$1
# run in the job-private scratch area
cd $SCRATCH
pwd
root4star -q ~/analysis/macros/myAnalysis.C $mudstfile
# rename the output and archive it to HPSS before the job (and $SCRATCH) goes away
mv myoutput.root $mudstfile.analysis.root
hsi "cd analysis; prompt; mput $mudstfile.analysis.root"
It produces output like the following:
/scratch/1135296.1.starprod.64bit.q
Warning in <TEnvRec::ChangeValue>: duplicate entry <Library.TMCParticle=libEGPythia6.so
libEG.so libGraf.so libVMC.so> for level 0; ignored
  *******************************************
  *                                         *
  *        W E L C O M E  to  R O O T       *
  *                                         *
  *   Version  5.12/00f   23 October 2006   *
  *                                         *
  *  You are welcome to visit our Web site  *
  *          http://root.cern.ch            *
  *                                         *
  *******************************************

FreeType Engine v2.1.9 used to render TrueType fonts.
Compiled on 23 July 2008 for linux with thread support.

CINT/ROOT C/C++ Interpreter version 5.16.13, June 8, 2006
Type ? for help. Commands must be C++ statements.
Enclose multiple statements between { }.
*** Float Point Exception is OFF ***
*** Start at Date : Thu Oct 15 11:08:59 2009
QAInfo:You are using STAR_LEVEL : new, ROOT_LEVEL : 5.12.00 and node : pdsf3
[clip]
***********************************************************************
*            NERSC HPSS User SYSTEM (archive.nersc.gov)               *
***********************************************************************
Username: aarose  UID: 34500  Acct: 34500(34500)  Copies: 1  Firewall: off
[hsi.3.4.3 Thu Jan 29 16:10:54 PST 2009][V3.4.3_2009_01_28.05]
A:/home/s/starofl->
[clip]
How to retrieve SGE info for jobs that have finished
Accounting information can be obtained with the SGE qacct command, which by default queries the SGE accounting file $SGE_ROOT/default/common/accounting. Since the accounting file is rotated on PDSF, you will need to point to a specific accounting file to query your job. First, find the accounting file by date:
ls $SGE_ROOT/default/common/accounting.*
Then query the file with:
qacct -j yourjobid -f $SGE_ROOT/default/common/accounting.yourjobrundate
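If you do not remember when the job ran, a small loop over the rotated files also works; the job id below is made up for illustration:

# try each rotated accounting file until one of them knows about the job
for f in $SGE_ROOT/default/common/accounting.*; do
    qacct -j 1135296 -f $f 2>/dev/null && break
done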
What disk resources are available?
There are several disk/storage systems available to STAR PDSF users. Use
myquota -g rhstar
to see what disks are available to STAR. Disk space, other than the /home, /common, and AFS areas, is NOT backed up. The bulk of the disk space (on the eliza systems) is use-at-your-own-risk and CAN BE WIPED CLEAN AS NEEDED. STAR users need to back up their data to HPSS. It is YOUR responsibility to back up your important files to HPSS.
How to access the /eliza filesystems from a batch job?
The eliza filesystems are visible from the interactive nodes (pdsf.nersc.gov) and the batch pool. Batch processes should always specify a dvio resource in the job description (the STAR scheduler handles this more or less automatically):
qsub -hard -l eliza13io=1 yourscript
For more examples of dvio use (dv = data vault), please see the NERSC-maintained PDSF FAQ entry.
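The same request can also be embedded in the job script itself as SGE directives, which avoids retyping it on every qsub; a minimal sketch:

#!/bin/sh
#$ -hard -l eliza13io=1
# ... your analysis commands, reading input from /eliza13 ...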
****** Caution going below this item *******
How to use CHOS to select OS environment?
Please see the NERSC-maintained PDSF FAQ entry for CHOS.
PDSF offers several system versions, but at this point RH8 works best for STAR users. Place a file called .chos (note the leading dot; it is a dot file) in your home directory with one line in it:
32sl44
For more info see the NERSC list of general PDSF FAQs.
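A quick way to create that file (my understanding is that CHOS picks it up at your next login, so start a fresh session afterwards):

echo 32sl44 > ~/.chos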
How to avoid selecting mass storage resident files from the file catalog?
If the path starts with /nersc/projects/starofl then you are getting files from HPSS. You need to include "storage != HPSS" in your get_file_list.pl query, or require "storage=local" or "storage=NFS" instead.
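As a sketch, a disk-only query might look like the line below; the -keys/-cond/-limit option names and the filetype value are assumptions based on typical FileCatalog usage, and only the storage condition comes from the note above:

get_file_list.pl -keys 'path,filename' -cond 'filetype=daq_reco_MuDst,storage!=HPSS' -limit 10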
How to submit jobs to SGE when my primary group is NOT rhstar?
SGE creates a shell from scratch, and by default none of your session variables are inherited. To overcome this, create a .sge_request file in each directory from which you plan to submit your STAR jobs. This file should contain the following lines:
-v EGROUP=rhstar
-P star
You can add other resource variables to this file instead of putting them on the qsub command line. The man page for sge_request (man sge_request) will tell you more about this file. If placed in the current working directory, this file will affect ONLY jobs submitted from that directory. If placed in $HOME, it will affect ALL your jobs.
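For example, to create the file in a submission directory:

cat > .sge_request <<'EOF'
-v EGROUP=rhstar
-P star
EOF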
How to klog to my RHIC afs account?
PDSF does not belong to the rhic.bnl.gov AFS cell (the default cell is nersc), so you have to specify the rhic.bnl.gov cell explicitly. Additionally, your PDSF username may be different from your RCF username. If so, you need to specify your AFS account name explicitly as well.
klog -cell rhic.bnl.gov -principal YourRCFUserName
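After the klog you can verify that the token was obtained with the standard AFS tokens command:

klog -cell rhic.bnl.gov -principal YourRCFUserName   # same command as above
tokens                                               # should now list a token for the rhic.bnl.gov cell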
How to transfer files from BNL to PDSF using GRID copy?
At RCF, STAR has 4 grid machines, stargrid01-04. When you log in (e.g. rterm stargrid01 from rssh.rhic.bnl.gov), you will have the grid tools in your path.
To obtain a short-lived NERSC grid certificate, type the following at one of the stargrid machines at RCF:
Step 1:
myproxy-logon -l your_username_at_pdsf -s nerscca.nersc.gov [ -t hours ]
At the MyProxy prompt, enter your normal NERSC password.
The default proxy lifetime is 11 hours, but you can specify longer lifetimes, up to 277 hours. (Note: you can check the remaining lifetime of your certificate at any time by typing grid-proxy-info.) See also: NERSC documentation.
Now you can transfer the files from BNL to PDSF using globus-url-copy, for example:
Step 2:
globus-url-copy (-vb) (-r) (-p 10) (-cd) (-tcp-bs 8000000) file:///star/$path_to_your_file_at_bnl gsiftp://pdsfdtn1.nersc.gov/eliza14/$path_to_your_file_at_pdsf
where -vb is verbose, -r is recursive, and -cd says to create the directories at the destination.
Performance options: -tcp-bs 8000000 sets the size (in bytes) of the TCP buffer, and -p 10 requests 10 parallel streams.
For other options see, for example, the web documentation.
Make sure you have write permission on $path_to_your_file_at_pdsf!
The globus-url-copy command can copy only files, not directories. If you need to move a directory, you can use the tar or gzip commands to wrap it up into a single file.
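A sketch of that workflow, run from a stargrid node at BNL; the directory name and destination path are placeholders:

# wrap the directory into one file
tar -czf mydir.tar.gz mydir
# transfer the single tarball as in Step 2
globus-url-copy -vb file://$PWD/mydir.tar.gz \
    gsiftp://pdsfdtn1.nersc.gov/eliza14/$path_to_your_area_at_pdsf/mydir.tar.gz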
Note 2: If you need Perl, there is a mismatch between the Perl in /opt/star/bin/perl and the grid installs that has not yet been fixed; you will need to 'unsetenv PERL5LIB' (as of September 20, 2010). If you don't need Perl, don't worry.