Date: February 6, 2008 14:00:01 PM CST
To: USQCD Collaboration Members
From: USQCD Scientific Program Committee -- Andreas Kronfeld (Chair),
Tom Blum, Chris Dawson, Colin Morningstar, Frithjof Karsch,
John Negele, Junko Shigemitsu
Dear Colleagues,
This message is a Call for Proposals for awards of time on the USQCD
computer resources dedicated to lattice QCD (and other lattice field
theories). These are the DOE QCDOC at BNL, clusters at Fermilab and
JLab, and awards to USQCD from the INCITE program.
All members of the USQCD Collaboration are eligible to submit proposals.
Those interested in joining the Collaboration should contact Bob Sugar
(sugar@physics.ucsb.edu).
Let us begin with some important dates:
February 6: this Call for Proposals
February 29: proposals due
March 21: reports to proponents sent out
April 4, 5: All Hands' Meeting at JLab
April 25: allocations announced (if not earlier)
July 1: new allocations start
Proponents are invited to make an oral presentation of their proposals
at the All Hands' Meeting. These presentations are not mandatory, but
are recommended for large projects and in those cases where the report
raises serious issues.
The web site for the All Hands' Meeting is
http://www.usqcd.org/meetings/allHands2008/.
Since 2007, USQCD policy has been to apply as a Collaboration for time
on the "leadership-class" computers, installed at Argonne and Oak Ridge
National Laboratories, and allocated through the DOE's INCITE Program
(see http://hpc.science.doe.gov/). This strategy has been successful,
with an award of 10M core-hours to USQCD on the ORNL Cray XT4 in CY
2007, and awards of 19.6M core-hours on the ANL BlueGene/P and 7.1M
core-hours on the ORNL Cray XT4 for CY 2008. The USQCD allocation for
CY 2008 was the largest awarded by the INCITE Program. The CY 2008
allocations are for the first year of a three-year INCITE grant.
Allocations for the second and third year of the grant will be made
after the review of progress reports to be submitted in the summers of
CY 2008 and 2009.
The resources that we can expect in 2009 and 2010, based on past trends,
are considerable (see below). The timetable for INCITE is not within
our control, however, and it has not been synchronized with the USQCD
allocation year (starting each July 1). For this reason, our process
for large awards looks ahead to the INCITE process. The details are
presented below, but let us state the key advantage here: in this way we
can identify a set of projects that have a high scientific value to the
USQCD Collaboration as a whole, that are portable to INCITE hardware,
and that can responsibly and fruitfully use large computer resources.
Although the Executive Committee submitted a three-year plan for use of
the leadership-class computers in the INCITE proposal, it has the
freedom to shift priorities when submitting the annual progress reports.
Requests for additional resources and shifts in direction should carry
great weight when they are based on proposals that have been vetted
scientifically by nearly all of the US lattice QCD community. When USQCD is granted
time, the Executive Committee will consult with the Scientific Program
Committee to decide how much to increase the total allocation(s) of
these project(s), while moving them to INCITE's leadership-class
machines.
This will require more effort from proponents of big projects to explain
how the USQCD Collaboration will profit from extended running, and how
feasible it is to move to INCITE computers. At the same time, we do not
want to place such a burden on proponents of smaller projects. Please
note, however, that should some large-project running be moved to INCITE
computers, it would be possible to increase allocations for small
projects.
Collaboration members who wish to perform calculations on these resources
can present requests according to procedures specified below. The
Scientific Program Committee would like to handle requests and awards
either in QCDOC node-hours (for QCDOC) or in the equivalent node-hours
for the JLab "6n" cluster (for all clusters). Conversion factors for all
machines are given below. When making requests, please keep in mind that
the total computing resource is around 12 teraflop/s-yr, corresponding to
around 4 teraflop/s-yr and 88,473,600 node-hours on QCDOC, and around
8 teraflop/s-yr and 27,400,000 6n-equivalent node-hours on clusters.
A detailed description of USQCD resources can be found at the end of
this message.
Starting this year, proposals for time on clusters must also specify the
amount of disk and tape storage needed. Until now, Fermilab and JLab
have kindly provided storage at no cost to the LQCD Infrastructure
Project. With FY 2008, this has changed, and the Project will be
charged for new disks and tapes. Clearly, the more storage we buy, the
fewer CPU nodes we can buy. It is therefore essential that proposals
determine whether it is more cost effective to store or recompute files.
Guidelines for reaching these conclusions are given below.
The requests can be of three types:
A) requests for large amounts of supercomputer time---more than
500,000 6n-equivalent node-hours---to support calculations of
benefit for the whole USQCD Collaboration;
B) requests for medium amounts of supercomputer time---500,000
6n-equivalent node-hours or less---to support calculations that
are scientifically sound but not necessarily of such broad benefit;
C) requests for exploratory calculations, such as those needed to
develop and/or benchmark code, acquire expertise on the use of the
machines, or to perform investigations of limited scope.
Requests of Type A and B must be made in writing to the Scientific
Program Committee and are subject to the policies spelled out below.
Requests of Type C should be made in an e-mail message to
Bob Mawhinney (rdm@phys.columbia.edu) for QCDOCs at BNL,
Paul Mackenzie (mackenzie@fnal.gov) for clusters at FNAL,
Chip Watson (Chip.Watson@jlab.org) for clusters at JLab.
Type C requests for QCDOC can be for access to either a 64-node, single
motherboard partition or a larger partition; the total request should be
less than ten thousand node-hours. Type C requests for clusters should
not exceed 4000 6n-equivalent node-hours. The requests will be honored
up to a total not exceeding 5% of the available time. If the demand
exceeds such limits, the Scientific Program Committee will reconsider
the procedures for access.
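As a rough illustration of the size limits above, the following Python sketch sorts a cluster request into Type A or Type B by the 500,000 6n-equivalent node-hour threshold and checks a Type C request against its size caps. The function names are hypothetical, and the sketch looks only at size; whether a request is truly exploratory (Type C) or of Collaboration-wide benefit (Type A) is a judgment the numbers cannot make.

```python
# Illustrative sketch only; function names are hypothetical, limits are from this Call.
TYPE_A_THRESHOLD_6N = 500_000   # more than this many 6n-equivalent node-hours -> Type A
TYPE_C_CAP_QCDOC = 10_000       # Type C on QCDOC: fewer than this many QCDOC node-hours
TYPE_C_CAP_CLUSTER_6N = 4_000   # Type C on clusters: at most this many 6n-equivalent node-hours


def classify_by_size(hours_6n: float) -> str:
    """Return "A" or "B" for a request given in 6n-equivalent node-hours."""
    return "A" if hours_6n > TYPE_A_THRESHOLD_6N else "B"


def type_c_within_cap(machine: str, hours: float) -> bool:
    """Check a Type C request against its size cap.

    `machine` is "qcdoc" (hours in QCDOC node-hours) or "cluster"
    (hours in 6n-equivalent node-hours).
    """
    if machine == "qcdoc":
        return hours < TYPE_C_CAP_QCDOC
    if machine == "cluster":
        return hours <= TYPE_C_CAP_CLUSTER_6N
    raise ValueError(f"unknown machine: {machine!r}")


if __name__ == "__main__":
    print(classify_by_size(750_000))            # A
    print(classify_by_size(500_000))            # B
    print(type_c_within_cap("cluster", 4_000))  # True
```

A request of exactly 500,000 6n-equivalent node-hours falls under Type B, since Type A is defined as "more than" that amount.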
The rest of this message deals with requests of Types A and B. It is
organized as follows:
i) policy directives regarding the usage of awarded resources;
ii) guidelines for the format of the proposals and deadline for
submission;
iii) procedures that will be followed to reach a consensus on the
research programs and the allocations;
iv) procedures that will be followed to guide USQCD's response to
new computing resources from programs such as INCITE;
v) description of USQCD resources at BNL, Fermilab, and JLab, followed
by a forecast of what could be attainable through INCITE.
- o -
i) Policy directives.
1) This Call for Proposals is for calculations that will further the
physics goals of the USQCD Collaboration, as stated in the proposals for
funding submitted to the DOE (see http://www.usqcd.org/), and have the
potential of benefiting additional research projects by members of the
Collaboration.
2) Proposals of Type A are for investigations of very large scale,
which will require a substantial fraction of the available resources.
Proposals of Type B are for investigations of medium to large scale,
which will require a smaller amount of resources. Proposals requesting
more than 500,000 6n-equivalent node-hours will be considered as
Type A, smaller ones as Type B. For QCDOC, this dividing line is
6 weeks (820 hours) on a 4096-node partition.
It is hoped that about 80% of the available resources will be
allocated to proposals of Type A and about 15% to proposals of
Type B, with the rest being reserved for small allocations and
contingencies. Because our process is proposal-driven, however, we
cannot guarantee the 80-15-5 split.
3) Proposals of Type A are for investigations that benefit the whole
USQCD Collaboration. Thus it is expected that the calculations will
either produce data, such as lattice gauge fields or quark propagators,
that can be used by the entire Collaboration, or that the calculations
produce physics results listed among the Collaboration's strategic goals.
Accordingly, proponents planning to generate multi-purpose data must
describe in their proposal what data will be made available to the whole
Collaboration, and how soon, and specify clearly what physics analyses
they would like to perform in an "exclusive manner" on these data (see
below), and the expected time to complete them.
Similarly, proponents planning important physics analyses should explain
how the proposed work meets our strategic goals and how its results
would interest the broader physics community.
Projects generating multi-purpose data are candidates to be moved to
leadership-class computers to which USQCD attains access. Therefore,
these proposals must provide additional information on several fronts:
they should
demonstrate the potential to be of broad benefit, for example
by providing a list of other projects that would use the shared
data;
present a roadmap for future planning, presenting, for example,
criteria for deciding when to stop with one ensemble and start
with another;
discuss how they would cope with a substantial increase in
allocated resources, from the portability of the code and
storage needed to the availability of competent personnel
to carry out the running;
explain the technical feasibility of moving the generated data
to USQCD facilities, storing it there, and loading it into QCDOC
or a cluster for further calculations.
Finally, it will be much easier to move multi-purpose projects to INCITE
if broadly similar projects (e.g., with the same action, but different
algorithms or parameters) are coherent and unified.
Projects carrying out strategic analyses are candidates to be extended
should the multi-purpose projects move (to any extent) to other
leadership-class computers. These proposals should, therefore, include
the same kind of information as those above, except for matters of code
portability. These proposals should detail their plans to move data
generated on INCITE (or similar) resources and analyze it on USQCD
resources.
4) Proposals of Type B are not required to share data or to work towards
stated Collaboration goals, although doing so is a plus.
Type B proposals may also be scientifically valuable even if not closely
aligned with USQCD goals. In that case the proposal should contain a
clear discussion of the physics motivations. If appropriate, Type B
proposals may discuss data-sharing and strategic importance as in the
case of Type A proposals.
Proponents of Type B proposals are not required to give as many details
about how they would extend their running, should USQCD obtain computer
time elsewhere. If USQCD is indeed successful in such efforts, it is
likely that Type B awards will be increased at some time during the
allocation year.
5) The data that will be made available to the whole Collaboration will
have to be released promptly. "Promptly" should be interpreted with
common sense. Lattice gauge fields and propagators do not have to be
released as they are produced, especially if the group is still testing
the production environment. On the other hand, it is not considered
reasonable to delay release of, say, 190 files, just because the last 10
will not be available for a few months.
After a period during which such data will remain for the exclusive use
of the members of the USQCD Collaboration, and possibly of members of
other collaborations under reciprocal agreements, the data will be made
available worldwide as decided by the Executive Committee.
6) The USQCD Collaboration recognizes that the production of shared data
will generally entail a substantial amount of work by the investigators
generating the data. They should therefore be given priority in
analyzing the data, particularly for their principal physics interests.
Thus, proponents are encouraged to outline a set of physics analyses that
they would like to carry out with these data in an exclusive manner and
the amount of time that they would like to reserve to themselves to
complete such calculations.
When using the shared data, all other members of the USQCD collaboration
agree to respect such exclusivity. Thus, they shall refrain from using
the data to reproduce the reserved or closely similar analyses. In its
evaluation of the proposals, the Scientific Program Committee will pay
particular attention to requests for exclusive use of the data and will
ask the proposers to revise any request it finds too broad or
otherwise excessive. Once an accepted proposal has been posted
on the Collaboration website, it should be deemed by all parties that the
request for exclusive use has been accepted by the Scientific Program
Committee. Any dispute that may arise regarding the usage of such
data will have to be directed to the Scientific Program Committee for
resolution and all members of the Collaboration should abide by the
decisions of this Committee.
7) Usage of the USQCD software, developed under our SciDAC grants, is
recommended, but not required. USQCD software is designed to be
efficient and portable, and its development leverages efforts throughout
the Collaboration. If you use this software, the SPC can be confident
that your project can use USQCD resources efficiently. Software
developed outside the collaboration must be documented to show that it
performs efficiently on its target platform(s). Information on
portability is welcome, but not mandatory.
8) The investigators whose proposals have been selected by the Scientific
Program Committee for a possible award of USQCD resources shall agree to
have their proposals posted on a password protected website, available
only to our Collaboration, for consideration during the All Hands'
Meeting. Abstracts of approved proposals will be posted on a publicly
accessible web site, and should be written accordingly.
9) The investigators receiving an allocation of time following this Call
for Proposals must maintain a public web page that reasonably documents
their plans, progress, and the availability of data. These pages should
contain information that funding agencies and review panels can use to
determine whether USQCD is a well-run organization. The public web page
need not contain unpublished scientific results, or other sensitive
information.
The SPC will not accept new proposals from old projects that still have
no web page. Please communicate the URL to mackenzie@fnal.gov.
- o -
ii) Format of the proposals and deadline for submission.
The proposals should contain a title page with title, abstract and the
listing of all participating investigators. The body, including
bibliography and embedded figures, should not exceed 12 pages in length
for requests of Type A, and 10 pages in length for requests of Type B,
with font size of 11pt or larger. If necessary, further figures, with
captions but without text, can be appended, for a maximum of 8 additional
pages. CVs, publication lists and similar personal information are not
requested and should not be submitted. Title page, proposal body and
optional appended figures should be submitted as a single pdf file, in an
attachment to an e-mail message sent to ask@fnal.gov.
The deadline for receipt of the proposals is noon CST on Friday,
February 29, 2008.
The last sentence of the abstract must state the total amount of computer
time for QCDOC in node-hours and/or for clusters in 6n-equivalent
node-hours (see below). Proposals lacking this information will be
returned without review (but will be reviewed if the corrected proposal
is returned quickly and without other changes).
The body of the proposal should contain the following information,
if possible in the order below:
1) The physics goals of the calculation.
2) The computational strategy, including such details as gauge and
fermionic actions, parameters, computational methods.
3) The software used, including a description of the main algorithms
and the code base employed. If you use USQCD software, it is not
necessary to document performance in the proposal. If you use your own
code base, then the proposal should provide enough information to show
that it performs efficiently on its target platform(s). Information on
portability is welcome, but not mandatory. As feedback for the software
development team, proposals may include an explanation of deficiencies
of the USQCD software for carrying out the proposed work.
4) The amount of resources requested, for QCDOC in node-hours or for
clusters in 6n-equivalent node-hours. Here one should also state which
machine is most desirable and why, and whether it is feasible or
desirable to run some parts of the proposed work on one machine, and
other parts on another.
USQCD has clusters with several kinds of nodes, from single-processor,
single-core, to dual-processor, quad-core. The Scientific Program
Committee will use the following table to convert:
1 QCDOC node-hour = 0.122 6n-equivalent node-hour
1 qcd node-hour = 0.498 6n-equivalent node-hour
1 pion node-hour = 0.683 6n-equivalent node-hour
1 6n node-hour = 1 6n-equivalent node-hour
1 kaon node-hour = 1.757 6n-equivalent node-hour
1 7n node-hour = 3.1 6n-equivalent node-hour
These numbers are based on the average of asqtad and DWF fermion
inverters. See http://lqcd.fnal.gov/performance.html for details on
clusters, including performance of the clover inverter. For proposals
the conversion factor for the new "J/psi" cluster at Fermilab can be
taken to be
1 J/psi node-hour = 2.55 6n-equivalent node-hour (estimated)
The total request(s) on QCDOC and clusters should also be specified in
the last sentence of the proposal's abstract (see above).
Proposals of Type A should indicate longer-term computing needs here,
writing with an eye towards the possibility of additional computer
resources from programs such as INCITE.
In addition to CPU, proposals must specify how much mass storage is
needed. The resources section of the proposal should state how much
existing storage is in use, and how much new storage is needed, for disk and
tape, in Tbytes. In addition, please also restate the storage request
in 6n-equivalent node-hours, using the following conversion factor:
1 Tbyte disk = 20,000 6n-equivalent node-hour
1 Tbyte tape = 4,000 6n-equivalent node-hour
The point of this exercise is to encourage you to decide whether it makes
more sense to store or to recompute a file. These estimates are advice
to the LQCD Infrastructure Project; they are not part of the allocation
for computing.
Please bear in mind that proposals for QCDOC that plan to move files to
JLab or Fermilab must also estimate the project's mass storage needs.
5) What data will be made available to the entire Collaboration, and
the schedule for sharing it.
6) What calculations the investigators would like to perform in an
"exclusive manner" (see above in the section on policy directives),
and for how long they would like to reserve to themselves this
exclusive right.
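The conversions requested in item 4 amount to simple multiplications. The Python sketch below, with hypothetical function names, applies the conversion factors quoted above to a CPU request and to a storage request; the example request is made up for illustration.

```python
# Illustrative sketch; factors are copied from this Call, names are hypothetical.
TO_6N_EQUIVALENT = {
    "QCDOC": 0.122,
    "qcd": 0.498,
    "pion": 0.683,
    "6n": 1.0,
    "kaon": 1.757,
    "7n": 3.1,
    "J/psi": 2.55,  # estimated in the Call
}

# Storage cost in 6n-equivalent node-hours per Tbyte.
DISK_6N_PER_TBYTE = 20_000
TAPE_6N_PER_TBYTE = 4_000


def cpu_to_6n(machine: str, node_hours: float) -> float:
    """Convert node-hours on `machine` to 6n-equivalent node-hours."""
    return node_hours * TO_6N_EQUIVALENT[machine]


def storage_to_6n(disk_tbytes: float, tape_tbytes: float) -> float:
    """Restate a storage request (in Tbytes) in 6n-equivalent node-hours."""
    return disk_tbytes * DISK_6N_PER_TBYTE + tape_tbytes * TAPE_6N_PER_TBYTE


if __name__ == "__main__":
    # Made-up request: 200,000 kaon node-hours, 2 Tbytes of disk, 5 Tbytes of tape.
    print(round(cpu_to_6n("kaon", 200_000)))  # 351400
    print(storage_to_6n(2, 5))                # 60000
```

In this made-up case the storage request is worth about a sixth of the CPU request, which is the kind of comparison the store-or-recompute question calls for. The storage figure remains advice to the LQCD Infrastructure Project and is not added to the computing allocation.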
iii) Procedure for the awards.
The Scientific Program Committee will receive proposals until the
deadline of noon CST on Friday, February 29, 2008. Proposals not
stating the total request for QCDOC and clusters in the last sentence of
the abstract will be returned without review (but will be reviewed if
the corrected proposal is returned quickly and without other changes).
Proposals that are considered meritorious and conforming to the goals of
the Collaboration will be posted on the web at http://www.usqcd.org/,
in the Collaboration's password-protected area. Proposals recommended
for awards in previous years can be found there too.
The Scientific Program Committee (SPC) will make a preliminary
assessment of the proposals. On March 21, 2008, the SPC will send a
report to the proponents raising any concerns about the proposal.
The proposals will be presented and discussed at the All Hands' Meeting,
April 4-5, 2008, at Thomas Jefferson National Accelerator Facility.
Proposals of Type A will be allotted somewhat more time than proposals
of Type B, because they have to devote time to planning and logistics,
not just science. If the SPC report raised serious issues, the
proponents may submit a revised proposal any time up to the beginning of
the All Hands' Meeting, and they should explain the changes in the oral
presentation.
A Collaboration discussion will follow the presentations of the Type A
proposals. This is the opportunity for the whole Collaboration to
comment on the directions to be taken in making awards to these
proposals. Particularly if we are fortunate enough to obtain further
computing resources from INCITE, it is very important that the
Collaboration guide the SPC in setting priorities among the Type A
proposals. We shall need to determine not only the size of the awards,
but also which projects receive priority in INCITE bids.
On the second day we will have presentations on proposals of Type B,
followed by another round of discussion. As before, it is important to
the process for Collaborators to express their views on the relative
priority of the proposed projects. The possibility of increasing
Type B allocations, should USQCD receive further computing resources,
makes this input more important than ever.
Following the All Hands' Meeting the SPC will determine a set of
recommendations on the awards. The quality of the initial proposal, the
proponents' response to concerns raised in the written report, and the
views of the Collaboration expressed at the All Hands' Meeting will all
influence the outcome. The SPC will send its recommendations to the
Executive Committee shortly after the All Hands' Meeting, and inform the
proponents once the recommendations have been accepted by the Executive
Committee. The successful proposals and the size of their awards will be
posted on the web.
The new USQCD allocations will commence July 1, 2008.
Scientific publications describing calculations carried out with these
awards should acknowledge the use of USQCD resources, by including the
following sentence in the Acknowledgments:
"Computations for this work were carried out in part on facilities of
the USQCD Collaboration, which are funded by the Office of Science of
the U.S. Department of Energy."
Projects whose sole source of computing is USQCD should omit the phrase
"in part".
iv) Procedure for INCITE.
The USQCD Collaboration must be ready for significant increases in
available computer resources from programs such as INCITE. At present,
INCITE is the only program subject to these policies, but it is hoped
that these policies would be effective if other, similar opportunities
are identified by the Executive Committee. For brevity, the following
paragraphs refer only to INCITE.
In addition to the awards, the SPC will present a prioritized list of
Type A projects that could be moved or extended to INCITE. When
preparing its progress report and request for resources for CY 2009, the
Executive Committee will assess how many of these projects should be put
forth. The INCITE request will draw from sections 1-4 (see above) of
the original USQCD proposal. It will also include information deemed
necessary to make the case persuasive to INCITE, including a statement
that the proposal's scientific importance has been vetted by the SPC and
the whole USQCD Collaboration. The INCITE request will be assembled
collaboratively by the Executive Committee and the proponents of the
original USQCD proposal(s).
To allow for flexibility, the Executive Committee may consult with the
SPC. Some reasons to do so would be to ensure that our INCITE request
stays faithful to USQCD's scientific priorities; to check if priorities
remain the same as at the previous annual review; to determine the
extent to which a project should be extended, or merely moved; to assess
whether it is wise to re-balance the allocations so that the whole
Collaboration receives a boost.
v) USQCD computing resources.
The Scientific Program Committee will allocate 7200 hours/year to
Type A and Type B proposals. Of the 8766 hours in an average year the
facilities are supposed to provide 8000 hours of uptime. We then reserve
400 hours (i.e., 5%) for each host laboratory's own use, and another 400
hours for Type C proposals and contingencies.
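The hours budget in the preceding paragraph can be checked with a few lines of Python. This is only a sketch of the arithmetic stated above; the variable names are hypothetical.

```python
# Illustrative sketch of the hours budget described above; names are hypothetical.
HOURS_PER_AVERAGE_YEAR = 365.25 * 24   # 8766
UPTIME_HOURS = 8000                    # uptime the facilities are supposed to provide
HOST_LAB_HOURS = 400                   # reserved for each host laboratory's own use
TYPE_C_AND_CONTINGENCY_HOURS = 400     # reserved for Type C proposals and contingencies

allocatable_hours = UPTIME_HOURS - HOST_LAB_HOURS - TYPE_C_AND_CONTINGENCY_HOURS
host_lab_fraction = HOST_LAB_HOURS / UPTIME_HOURS

if __name__ == "__main__":
    print(HOURS_PER_AVERAGE_YEAR)  # 8766.0
    print(allocatable_hours)       # 7200
    print(host_lab_fraction)       # 0.05
```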
At BNL:
QCDOC supercomputer: 12,288 processors running at 400 MHz.
The typical mode of operation is with a few large partitions, ranging
from 1024 to 4096 nodes and possibly larger. Requests should include
a discussion of possible, as well as optimal, partition sizes for the
desired physics.
1 QCDOC node-hour = 0.122 6n-equivalent node-hour
For further information see http://www.bnl.gov/lqcd/
At FNAL:
120 node cluster ("QCD")
120 single-processor 2.8 GHz P4 nodes
1 GB memory/node
Myrinet network
34 GB local scratch disk/node
total: 7200*120*0.498 = 430,220 6n-equivalent node-hours
1 qcd node-hour = 0.498 6n-equivalent node-hour
520 node cluster ("Pion")
518 single-processor 3.2 GHz P4 nodes
1 GB memory/node
Infiniband network
30 GB local scratch disk/node
total: 7200*518*0.683 = 2,547,790 6n-equivalent node-hours
1 pion node-hour = 0.683 6n-equivalent node-hour
600 node cluster ("Kaon")
600 dual-core, dual-processor 2.0 GHz Opteron nodes
(2400 total cpu cores available)
4 GB memory/node
Infiniband network
88 GB local scratch disk/node
total: 7200*600*1.757 = 7,591,110 6n-equivalent node-hours
1 kaon node-hour = 1.757 6n-equivalent node-hour
Projected ~1000 node cluster ("J/psi")
1000 quad-core, dual-socket xx GHz Xeon or Opteron nodes
(~8000 total cpu cores available)
8 GB memory/node
Infiniband network
88 GB local scratch disk/node
Expected release to production: March 1, 2009
total: 2400*1000*2.55 = 6,120,000 6n-equivalent node-hours
1 J/psi node-hour = 2.55 6n-equivalent node-hour (estimated)
These clusters will share 80 TBytes of associated disk storage (60 TBytes
in dCache, ~ 20 TBytes in conventional NFS-mounted storage), and have
access to ~ 250 TBytes of tape storage. The current maximum size of
files is 400 GBytes.
For further information see http://lqcd.fnal.gov/
At JLab:
256 node Infiniband cluster ("6n")
256 dual-core 3.0 GHz Pentium-D
(512 total cpu cores available)
1 GB memory/node
Infiniband 4x fabric
50 GB local scratch disk/node
Users must run 2 processes per node, or run multi-threaded code.
total: 7200*256*1 = 1,843,200 6n-equivalent node-hours
396 node Infiniband cluster ("7n")
396 quad-core, dual-processor 1.9 GHz Opteron (Barcelona)
(3168 total cpu cores available)
8 GB memory/node
Infiniband 4x fabric
50 GB local scratch disk/node
Users must run 8 processes per node, or run multi-threaded code.
total: 7200*396*3.1 = 8,838,720 6n-equivalent node-hours
1 7n node-hour = 3.1 6n-equivalent node-hour
These clusters will share 25-40 TBytes of associated disk storage, and
have access to a large tape storage facility of 500 Tbytes. The maximum
size of tape files is currently 20 GB, but could be increased if needed.
For further information see http://lqcd.jlab.org/
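The per-cluster totals listed above follow the pattern hours * nodes * conversion factor. The Python sketch below recomputes them from the figures given in this Call; the names are hypothetical. Because the printed conversion factors are rounded to three decimals, the recomputed totals for the "QCD", "Pion", and "Kaon" clusters differ from the quoted ones by roughly 0.01-0.02%, which suggests the quoted totals were computed with unrounded factors.

```python
# Illustrative sketch; node counts, hours, and factors are copied from this Call.
# (name, nodes, hours used in the Call's total, 6n-equivalent factor, total quoted in the Call)
CLUSTERS = [
    ("qcd",   120,  7200, 0.498, 430_220),
    ("pion",  518,  7200, 0.683, 2_547_790),
    ("kaon",  600,  7200, 1.757, 7_591_110),
    ("J/psi", 1000, 2400, 2.55,  6_120_000),  # projected cluster, estimated factor
    ("6n",    256,  7200, 1.0,   1_843_200),
    ("7n",    396,  7200, 3.1,   8_838_720),
]


def total_6n(nodes: int, hours: int, factor: float) -> float:
    """6n-equivalent node-hours for one cluster."""
    return nodes * hours * factor


cluster_total = sum(total_6n(n, h, f) for _, n, h, f, _ in CLUSTERS)
qcdoc_node_hours = 12_288 * 7200  # QCDOC is handled in its own node-hours

if __name__ == "__main__":
    for name, n, h, f, quoted in CLUSTERS:
        computed = total_6n(n, h, f)
        print(f"{name:6s} computed {computed:12,.0f}  quoted {quoted:12,d}")
    print(f"cluster total: {cluster_total:,.0f}")
    print(f"QCDOC node-hours: {qcdoc_node_hours:,d}")
```

The cluster sum comes out near the 27,400,000 6n-equivalent node-hours quoted earlier in this message, and the QCDOC product matches the 88,473,600 node-hours quoted there.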
Potentially available from INCITE:
The INCITE proposal requested 104.4M core-hours on the BlueGene/P and
37.5M core-hours on the Cray XT4 for CY 2009, and 195.8M core-hours on
the BlueGene/P and 76.2M core-hours on the Cray XT4 for CY 2010. These
requests were based on the announced plans of the DOE to upgrade its
leadership-class machines and our recent history of awards from the
INCITE Program and at NERSC. We are unlikely to learn our allocation
for CY 2009 until the end of the current calendar year.