PDSF - STAR

From Computing@RNC

Revision as of 23:11, 1 October 2009


My password no longer works at PDSF

  • I mistyped the password three (or more) times

For security reasons, NERSC will lock out an account that has three or more failed login attempts. The lockout lasts 12 hours. If this happens to you, contact NERSC account support (1-800-666-3772, option #2) to have it reset.

  • It's been a while, and I can't remember my password

Please call NERSC account support (1-800-666-3772, option #2) to have your password reset. Note that if you have not logged in for a long time (~6 months), your account may have been deactivated. If this is the case, NERSC account support will ask you to re-submit your signed NERSC User Agreement.



How to create & access individual web content on PDSF

  • Old model (deprecated)
    * static content put into $HOME/public_html
    * accessible via http://pdsfweb01.nersc.gov/~username
  • Current and future model
    * static content goes under the group-writable area /project/projectdirs/star/www/
    * accessed via http://portal.nersc.gov/project/star/
    * please add your own user-area subdirectory, e.g. http://portal.nersc.gov/project/star/username
    * there will be no system-wide migration; each user should migrate their own web area
    * static HTML only (dynamic content must be pre-generated)
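
Migrating a personal web area can be sketched as a small shell function. This is only a minimal sketch, not an official NERSC procedure: the function name publish_static, its arguments, and the decision to make the copied files world-readable with chmod are assumptions. Only the target area /project/projectdirs/star/www/ and the resulting URL pattern come from the list above.

```shell
# publish_static SRC_DIR WWW_ROOT USERNAME   (hypothetical helper)
# Copies static files from SRC_DIR into WWW_ROOT/USERNAME and prints the
# URL under which the portal is expected to serve them.
publish_static() {
    src=$1
    www_root=$2
    user=$3
    dest="$www_root/$user"

    mkdir -p "$dest" || return 1
    # Copy the contents of SRC_DIR (including dot files) into the user area.
    cp -R "$src"/. "$dest"/ || return 1
    # Assumption: the web server needs read access for everyone.
    chmod -R a+rX "$dest" || return 1

    echo "http://portal.nersc.gov/project/star/$user"
}

# Example (not executed here):
# publish_static "$HOME/public_html" /project/projectdirs/star/www "$USER"
```

The example call copies the old $HOME/public_html area, which matches the per-user migration described above.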


How to use IO resources of networked file systems (*eliza*)

The networked file systems on PDSF are visible from both the interactive nodes (pdsf.nersc.gov) and the batch nodes. Batch processes should always specify an IO resource in the job description. The STAR scheduler handles this more or less automatically. For explicit job submission, use:


    qsub -hard -l elizaXXio=1 [script]


Here -l elizaXXio=1 identifies the network IO resource being accessed by the job (XX is the number of the eliza system) and assigns a resource limit of 1. Failure to supply resource limits explicitly can cause your jobs to take a larger fraction of an IO resource, degrading its overall performance to the detriment of everyone.
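
To avoid typing the wrong resource name, the request can be derived from the path the job reads or writes. The following is a minimal sketch: the helper name eliza_io_resource is hypothetical, and it assumes the resource is always named after the number in the /elizaXX mount point, as the description above suggests.

```shell
# eliza_io_resource PATH   (hypothetical helper)
# Prints the SGE IO resource request for the eliza system a path lives on,
# e.g. /eliza13/... -> eliza13io=1. Returns non-zero if PATH is not
# under an /elizaNN mount point.
eliza_io_resource() {
    num=$(printf '%s\n' "$1" | sed -n 's|^/eliza\([0-9][0-9]*\)\(/.*\)*$|\1|p')
    [ -n "$num" ] || return 1
    echo "eliza${num}io=1"
}

# Example (not executed here; the path and script name are placeholders):
# qsub -hard -l "$(eliza_io_resource /eliza13/mypwg/input)" myjob.csh
```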

Users who abuse the limits will have their use of the system limited more directly.

For more information about setting io resource usage, please see this PDSF FAQ entry.


How can I use HPSS with my batch jobs?

HPSS is the main storage element available at PDSF. All PDSF users have an HPSS account and access to the STAR resources. To make use of HPSS, you will need to set up your HPSS token. Details can be found here. Users should think of HPSS as their primary storage on the cluster. The centralized disks are for temporary data storage while processing files; they are expected to crash occasionally, and the policy is that they are brought back online empty.

There are several resources available to help you use HPSS more effectively; please see the HPSS help page (HPSS Use Examples) for examples.

For STAR, a few scripts for simulation are posted here: Hijing, Geant, Event Reconstruction


How can I tell what embedding files are on disk?

We are in the process of implementing the file catalog system that was recently expanded at RCF to include the embedding files. In the meantime, however, please refer to the page:

   Embedding Report

for the listing of available embedding files.


Local STAR Resources at PDSF

STAR keeps local copies of several key software components at PDSF: many of the most used STAR software builds, and a local copy of the STAR database. Using the local resources, as opposed to loading over the wide area network (through AFS or an offsite DB server), should improve the performance of your jobs. The list of locally installed libraries (as maintained by Eric Hjort) is at PDSF-STAR Libraries.

You can switch between the local and AFS libraries through a file in your home directory called .pdsf_setup :

   #!/usr/bin/csh -f
   # Select one of the options below
   # use_local strongly recommended
   # to get afs based setup:
   #       setenv STAR_PDSF_LINUX_SETUP use_afs
   #to avoid STAR setup altogether:
   #       setenv STAR_PDSF_LINUX_SETUP use_none
   #to get local STAR setup:
           setenv STAR_PDSF_LINUX_SETUP use_local


Note that use_local is uncommented. To use the AFS libraries, comment out the use_local line and uncomment the use_afs line. Using the AFS libraries will typically be slower but may be useful if you need the most recently built DEV libraries or want to do a comparison with the local setup.

Use of the local DB on PDSF is automatic with the STAR load-balancing model. If you need to use a different server for testing, you can grab the appropriate reference file, e.g. dbServers_dbx.xml (please ask an expert if you are unsure!), from the $STAR/StDb/servers area and put it into your home directory as dbServers.xml.
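
The copy step for a test server can be sketched as follows. This is a minimal sketch: the helper name use_db_reference and the habit of saving an existing dbServers.xml as dbServers.xml.bak are assumptions, not part of the STAR setup. The file names come from the paragraph above.

```shell
# use_db_reference REFERENCE_FILE [HOME_DIR]   (hypothetical helper)
# Installs REFERENCE_FILE as HOME_DIR/dbServers.xml (HOME_DIR defaults
# to $HOME), keeping any previous override as dbServers.xml.bak.
use_db_reference() {
    ref=$1
    home_dir=${2:-$HOME}
    target="$home_dir/dbServers.xml"

    if [ ! -f "$ref" ]; then
        echo "no such reference file: $ref" >&2
        return 1
    fi
    # Keep an existing override so it can be restored after testing.
    if [ -f "$target" ]; then
        mv "$target" "$target.bak" || return 1
    fi
    cp "$ref" "$target"
}

# Example (not executed here):
# use_db_reference "$STAR/StDb/servers/dbServers_dbx.xml"
```

Removing dbServers.xml from the home directory afterwards returns you to the automatic load-balanced selection described above.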


Use of STAR PWG Disk Space

Each PWG has an area on /eliza13/[PWG]. Each user should be assigned to at least one working group. To check which groups you are in, simply type the unix command id. If you find that you are not in the STAR PWG that you need to be in, please contact your PWG convener and ask to be added to that group at PDSF.




****** Caution going below this item *******

How to use CHOS to select OS environment?

Please see the NERSC-maintained PDSF FAQ entry for CHOS.

PDSF offers several system versions, but at this point RH8 works best for STAR users. Place a .chos file in your home directory (note the leading dot; it is a dot file) containing one line:

  32sl44

For more info see the NERSC list of general PDSF FAQs.



How to avoid selecting mass storage resident files from the file catalog?

If the path starts with /nersc/projects/starofl then you are getting files in HPSS. To get files on disk, include "storage != HPSS" in the conditions of your get_file_list.pl query, or require "storage=local" or "storage=NFS" instead.
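
One way to make sure the HPSS exclusion is never forgotten is to build the condition string with a helper. This is a minimal sketch: the helper name disk_only_cond is hypothetical, and the get_file_list.pl options shown in the commented example (-keys, -cond, -limit) and the filetype value are assumptions about the usual catalog query syntax, so check them against the script's own help before relying on them.

```shell
# disk_only_cond [CONDITIONS]   (hypothetical helper)
# Appends storage!=HPSS to a comma-separated list of catalog conditions,
# so the query returns only disk-resident files.
disk_only_cond() {
    if [ -n "$1" ]; then
        printf '%s,storage!=HPSS\n' "$1"
    else
        printf 'storage!=HPSS\n'
    fi
}

# Example (not executed here; the options and the filetype value are assumptions):
# get_file_list.pl -keys path,filename \
#     -cond "$(disk_only_cond 'filetype=daq_reco_MuDst')" -limit 10
```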


SGE Questions

For a good overview, please see the [http://www.nersc.gov/nusers/systems/PDSF/software/SGE.php PDSF SGE page]. There are several tools available to monitor your jobs. The qmon command is a graphical interface to SGE, which can be quite useful if your network connection is good. Try the inline commands 'sgeuser' and 'qstat' for overall farm status and your individual job listings, respectively; both are discussed in the overview page linked above.


How to submit jobs to SGE when my primary group is NOT rhstar?

SGE creates a shell from scratch, and by default none of your session variables are inherited. To overcome this difficulty, create a .sge_request file in the directory from which you plan to submit your STAR jobs. This file should contain the following lines:

  -v EGROUP=rhstar
  -P star


You can add other resource variables to this file instead of giving them on the qsub command line. The manpage for sge_request (man sge_request) will tell you more about this file. If placed in the current working directory, this file will affect ONLY jobs submitted from that directory. If placed in $HOME, it will affect ALL your jobs.
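
Creating the file can be scripted. The following is a minimal sketch: the helper name write_sge_request is hypothetical, and refusing to overwrite an existing file is a design choice made here, not an SGE requirement. The two option lines are the ones given above.

```shell
# write_sge_request [DIR]   (hypothetical helper)
# Writes a .sge_request file with the STAR defaults into DIR
# (default: $HOME). Refuses to overwrite an existing file, since it
# may already hold other default options.
write_sge_request() {
    dir=${1:-$HOME}
    file="$dir/.sge_request"

    if [ -e "$file" ]; then
        echo "$file already exists; edit it by hand" >&2
        return 1
    fi
    printf '%s\n' '-v EGROUP=rhstar' '-P star' > "$file"
}

# Example (not executed here): affect only jobs submitted from this directory
# write_sge_request .
```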


How to retrieve job qacct info

Accounting information can be obtained using the SGE qacct command, which by default queries the SGE accounting file $SGE_ROOT/default/common/accounting. Since the accounting file is rotated on PDSF, you will need to point to a specific accounting file to query an older job. First, find the accounting file by date:

 ls $SGE_ROOT/default/common/accounting.*

And then query the file by:

 qacct -j yourjobid -f $SGE_ROOT/default/common/accounting.yourjobrundate
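
If you do not know which rotated file holds your job, you can generate one qacct command per accounting file and try them in turn. This is a minimal sketch: the helper name qacct_commands is hypothetical, and it only prints the commands rather than running them. The -j and -f options and the file location are the ones shown above.

```shell
# qacct_commands JOB_ID [ACCOUNTING_DIR]   (hypothetical helper)
# Prints one "qacct -j JOB_ID -f FILE" line for the current accounting
# file and for every rotated accounting.* file in ACCOUNTING_DIR
# (default: $SGE_ROOT/default/common).
qacct_commands() {
    job=$1
    dir=${2:-$SGE_ROOT/default/common}

    for f in "$dir"/accounting "$dir"/accounting.*; do
        # Skip the unexpanded glob pattern when no rotated files exist.
        [ -f "$f" ] || continue
        echo "qacct -j $job -f $f"
    done
}

# Example (not executed here; 12345 is a placeholder job ID):
# qacct_commands 12345
```

Start with the file whose date matches the day your job ran and work outward from there.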


What disk resources are available?

There are several disk/storage systems available to STAR PDSF users. Use

 myquota -g rhstar  

to see what disks are available to STAR. Disk space other than the /home, /common and AFS areas is NOT backed up. The bulk of the disk space (on the eliza systems) is use-at-your-own-risk and CAN BE WIPED CLEAN AS NEEDED. It is YOUR responsibility to back up your important files to HPSS.


How to access the /eliza filesystems?

The eliza filesystems are visible from the interactive nodes (pdsf.nersc.gov) and the batch pool. Batch processes should always specify a dvio resource in the job description (the STAR scheduler handles this more or less automatically):

    qsub -hard -l eliza13io=1 yourscript

For more examples of dvio use (dv = datavault), please see the NERSC-maintained PDSF FAQ entry.


How to klog to my RHIC afs account?

PDSF does not belong to the rhic.bnl.gov afs cell (the default cell is nersc), so you have to specify the rhic.bnl.gov cell explicitly. Additionally, your PDSF username may be different than on RACF. If so, you need to specify your afs account name explicitly as well.


 klog -cell rhic.bnl.gov -principal YourRCFUserName


What's this MuDST file thing, and how do I work with them?

The MuDSTs are the reduced STAR data files, which are hosted both here at PDSF and at RCF. There is an excellent [http://rnc.lbl.gov/~jhthomas/public/MuDstTutorial06.ppt tutorial] to help you get started.

