Computing-related topics

Jefferson Lab provides a scientific computing infrastructure for its nuclear physics users. A computing cluster, colloquially known as The Farm, provides interactive and batch data processing capabilities. A mass storage system provides bulk data storage in the form of several flavors of disk storage (work, volatile and cache) and a robotic tape library. The computing requirements are driven by the facility users and coordinated by the hall or collaboration leadership and Experimental Nuclear Physics (ENP) division management. The ENP division funds the computing resources and their ongoing maintenance as part of the annual operating budget.
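Batch work on such a cluster is typically submitted through a scheduler. As a minimal sketch, assuming a Slurm-based setup (the partition name, account, paths and executable below are illustrative placeholders, not actual site values):

```shell
#!/bin/bash
# Minimal Slurm batch script for a hypothetical Farm job.
# The partition, paths, and program name are placeholders --
# consult the SCI group's documentation for site-specific values.
#SBATCH --job-name=replay-run
#SBATCH --partition=production   # assumed partition name
#SBATCH --ntasks=1
#SBATCH --mem=4G
#SBATCH --time=04:00:00
#SBATCH --output=replay-%j.log

# Stage input from cache disk, run the analysis, write results to work disk.
srun ./analyze /cache/hallx/raw/run1234.dat /work/hallx/output/
```

Submitted with `sbatch replay.sh`, the scheduler queues the job and writes its log to `replay-<jobid>.log` when it runs.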

This computing infrastructure is managed by the Scientific Computing (SCI) group in the IT division. Networking and other computing support for ENP is provided by the Computing and Networking Infrastructure (CNI) group, also in IT. The IT division is also responsible for cyber security and for managing various computing-related metrics that are reported to the DOE.

There are several areas where coordinated interaction between ENP and IT takes place at a technical level. This is done via the offline and online computing coordinators, assisted by the leader of the ENP data acquisition support group.

The coordinators are:

Offline computing:

  • Hall A - Ole Hansen
  • Hall B - Veronique Ziegler
  • Hall C - Brad Sawatzky
  • Hall D - Mark Ito

Online computing:

  • Hall A - Alexandre Camsonne
  • Hall B - Sergey Boyarinov
  • Hall C - Brad Sawatzky
  • Hall D - David Lawrence

Some of these areas of interaction between ENP and IT divisions are documented in the following pages.

Data Management Plans

Each experiment performed at Jefferson Lab represents a significant investment, not only by the groups working on the experiment but also by the funding agencies. It is prudent, then, to ensure that this investment is protected so that future researchers may not only take advantage of the final physics results but also have access to the data that produced those results, in a form allowing data processing to be repeated in the light of new techniques or insights.

By far the largest volume of data generated by an experiment is the raw data containing the digitized readout from the data acquisition system. However, the raw data is only meaningful in the context defined by the metadata recorded as the data is taken: accelerator parameters, detector operating conditions and calibrations, operator logs and much more. Since all of this data is stored in digital form, it is also important to archive documentation, the software to read the data formats, and even the software used to process the data. By far the safest course is to attempt to preserve as much of the available information and software as possible. To be sure, there will be points of diminishing return, and a decision on what is not worth keeping must be made case by case.

Such is the importance of the preservation of data, that the funding agencies are asking grant applicants to provide a plan for how they will manage the data from their experiment.

With this in mind, the Scientific Computing (SCI) group in the IT division has written a JLab Data Management Plan that broadly outlines the steps taken to preserve data. Based on this plan, Experimental Nuclear Physics division management has prepared data management plans for each of the four experiment halls. Each hall-specific plan takes into account differences in the ways the halls operate their online and offline data processing. These plans can be referred to by principal investigators when preparing their own data management plans, which should greatly simplify that process.

  • Data_Management_Plan_Hall-A.pdf (65.18 KB)
  • Data_Management_Plan_Hall-B.pdf (69.55 KB)
  • Data_Management_Plan_Hall-C.pdf (70.08 KB)
  • Data_Management_Plan_Hall_D_v2.pdf (236.11 KB)