Pegasus Data Management Plan

Effective December 2014, the Pegasus group is subject to revised data management planning requirements [1] as a condition of its continued Federal research support. The following Data Management Plan has been drafted in accordance with those requirements and serves as the basis for the disclosure and hosting of public data sets herein.

Data Management Plan
Version: 1
Effective: 1 December 2014

Data Types and Sources

Raw Data are defined as the signals, images, and other output recorded by diagnostics during operation of the Pegasus facility, including calibration factors where warranted. Analyzed Data are defined as raw data that have been aggregated, interpreted, or processed by a researcher in a documented fashion in order to draw a conclusion. Published Data are defined as raw and/or analyzed data that appear in peer-reviewed publications, including charts, figures, and images. Digital Research Data considered necessary to validate research findings include all published data used to infer the conclusions presented in peer-reviewed publications.

Content and Format

All raw research data is organized by plasma discharge number. Commercial scientific analysis suites are used to analyze data. These include: Igor Pro, IDL, MATLAB, SIMULINK, and Mathematica. Open source and commercial C and FORTRAN compilers are employed when needed. Open-source MDSplus client software is employed for access to a subset of Pegasus data.

Raw data storage formats include: Igor Pro binary waveform files [2]; ASCII text files; image files in the publicly documented Tag Image File Format (TIFF) [3], device-independent bitmap format (BMP) [4], and Flexible Image Transport System format (FITS) [5]; and proprietary formats associated with some diagnostics, such as the Vision Research CINE imaging system binary format, whose specification is available under a vendor non-disclosure agreement. Since September 2006, a subset of these data has also been available via MDSplus [6].
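As an illustration of why publicly documented formats matter, the following is a minimal Python sketch that reads the header of a BMP image using only the standard library and the published header layout [4]. The function name read_bmp_header is hypothetical and is not part of any Pegasus software. The sketch assumes the common 40-byte (or larger) information header and rejects anything else.

```python
import struct


def read_bmp_header(data: bytes) -> dict:
    """Parse the BMP file header and the leading fields of the info header.

    Hypothetical helper for illustration only. Assumes a little-endian BMP
    with an information header of at least 40 bytes.
    """
    if len(data) < 30 or data[:2] != b"BM":
        raise ValueError("not a BMP file")
    # File header (14 bytes): signature, file size, two reserved words,
    # and the byte offset at which the pixel data begins.
    file_size, _res1, _res2, pixel_offset = struct.unpack_from("<IHHI", data, 2)
    # Info header: header size, width, height, colour planes, bits per pixel.
    header_size, width, height, planes, bits_per_pixel = struct.unpack_from(
        "<IiiHH", data, 14
    )
    if header_size < 40:
        raise ValueError("unsupported BMP information header")
    return {
        "file_size": file_size,
        "pixel_offset": pixel_offset,
        "width": width,
        "height": height,
        "bits_per_pixel": bits_per_pixel,
    }
```

A caller would pass the first bytes of an image file, for example read_bmp_header(open(path, "rb").read(64)), where path is any BMP file. No vendor software is involved at any step, which is the property the open formats listed above are intended to provide.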

Analyzed data are generated in a variety of formats suited to their intended research usage. These include, but are not limited to: electronic log books; digital photographs; Igor Pro packed experiment files [7]; Subversion version control repositories; SQL relational databases; and analysis code outputs.

The Igor Pro scientific analysis suite is used due to cost and ease-of-use considerations. Its data formats are incorporated into the data archive format. Because these formats are openly documented, others can use the data without relying on Igor Pro. MDSplus has been partially adopted due to its rising prominence in the fusion community. Its relative complexity, high software update rate, and associated higher support and maintenance costs have limited its application within the group to cases where it is a technical requirement for the proper operation and archival of some control and diagnostic system output.

Preservation and Sharing

All raw data generated by Pegasus operations and analyzed data deemed relevant for archival storage by research staff are preserved by ensuring at least three copies are maintained at all times. Such data preservation will be performed by the Pegasus research team for the duration of funded research activity.

A file server in the Pegasus laboratory space provides 9 TB of primary storage, which is used for recently acquired raw data and relevant analyzed data. Primary storage comprises three computers equipped with hardware RAID-5 systems: the main file server and two systems designated as its backups, one located on-site and the other off-site. Older raw data is placed in a read-only format and periodically archived in triplicate to hard disk drives. One copy is kept online to host its content. The remaining two archival copies are stored in separate on- and off-site locations.
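The triplicate storage described above is only useful if the copies actually agree. The following is a minimal Python sketch of one way such a check could be performed, by comparing SHA-256 digests of the copies of a single file. It is an illustration under stated assumptions, not a description of the procedure used by Pegasus staff. The function names sha256_of and copies_agree and the default of three copies are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Iterable


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_agree(paths: Iterable[Path], minimum_copies: int = 3) -> bool:
    """True when at least `minimum_copies` files exist and all are identical.

    Hypothetical check: the paths stand for the same file on the main file
    server, the on-site backup, and the off-site backup.
    """
    existing = [p for p in paths if p.is_file()]
    if len(existing) < minimum_copies:
        return False
    digests = {sha256_of(p) for p in existing}
    return len(digests) == 1
```

A check of this kind reports a problem both when a copy is missing and when a copy differs from the others, so it covers the loss of a backup as well as silent corruption on one of the drives.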

Sharing of unpublished raw and analyzed data will be facilitated, consistent with non-disclosure agreements and other legal restrictions. The development, public hosting, administration, securing, and maintenance of an Internet or web-based portal for such data are infeasible at the anticipated level of supported effort, owing to cost and support staff considerations. Instead, requests for unpublished raw or analyzed data will be negotiated and accommodated on an individual-request basis, subject to an acceptable use agreement. Interested parties may request data access via pegasus_dmp@lists.wisc.edu. This address will be listed on the public Pegasus web site (http://pegasus.ep.wisc.edu).

All published data will be aggregated into publication-specific data sets (“public data sets”). Public data sets will be provided in openly documented, machine-readable formats. They will be hosted on the Pegasus public website while direct project funding is provided. An index page associating publications with their public data sets will be established, and an associated Digital Object Identifier (DOI) [8] will be generated to identify the data set index in a permanent, consistent, and data host-neutral fashion. When feasible and cost-effective, public data sets may additionally be archived with the publisher of the peer-reviewed article as supplementary information and subsequently managed according to the publisher's preservation policies. Publications in peer-reviewed journals will indicate how associated public data sets may be accessed by clearly identifying the public data set index DOI within the body of the article. The data set will be made publicly available and the data set index site updated by Pegasus staff no later than thirty days following publication.
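The thirty-day release requirement and the DOI-based index lend themselves to a simple record per publication. The following is a minimal Python sketch of such a record. The class name PublicDataSet and its fields are hypothetical, and the use of https://doi.org/ as the resolver prefix is an assumption about how a DOI would be turned into a link.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# "No later than thirty days following publication."
RELEASE_WINDOW = timedelta(days=30)


@dataclass(frozen=True)
class PublicDataSet:
    """Hypothetical index entry linking one publication to its data set."""

    article_title: str
    published_on: date
    index_doi: str  # DOI of the public data set index

    @property
    def release_deadline(self) -> date:
        """Last day on which the data set may be made public."""
        return self.published_on + RELEASE_WINDOW

    @property
    def index_url(self) -> str:
        """Resolver link for the DOI (assumed https://doi.org/ prefix)."""
        return f"https://doi.org/{self.index_doi}"

    def is_overdue(self, today: date, released: bool) -> bool:
        """True when the data set is unreleased and the deadline has passed."""
        return not released and today > self.release_deadline
```

For example, an article published on 1 March 2015 would have a release deadline of 31 March 2015. Because the link is built from the DOI rather than from a host name, it would remain valid if the data set index later moved to a different host.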

If direct project funding ends, Pegasus staff will provide an archival set of all collected raw data and the published data described above. Analyzed data that is not selected for permanent archival (e.g., data subject to a non-disclosure agreement or containing proprietary information or industry trade secrets) will be destroyed after direct funding ends. The archival data collection will then be deposited with a permanent, publicly accessible Internet archival repository in existence at the time, such as that presently provided by the University of Wisconsin MINDS@UW digital content repository [9]. Appropriate DOI metadata updates will be performed to allow uninterrupted public data access. Management responsibility will be transferred from supported researchers to the chosen repository managers at the time of archival.

This policy is subject to future revision to accommodate changes to data management policies, laws, and regulations, or changes in the resources available to the Pegasus project.

Protection

Pegasus digital research data do not include confidential information, personal privacy data, or Personally Identifiable Information, and do not affect U.S. national, homeland, or economic security. A minority subset of analysis routines and site-specific systems has been procured under non-disclosure agreements with commercial vendors and will not be publicly disclosed without vendor permission. These are the binary format specifications for Phantom CINE imaging systems, and the source code and documentation for the Pegasus Plasma Control System, developed jointly with General Atomics. With the exception of the intellectual property and the proprietary and business confidential information identified above, public disclosure of Pegasus research data to the maximal extent permitted does not cause a significant negative impact on innovation or U.S. competitiveness.

[1] “Statement on Digital Data Management [7/28/2014],” DOE Office of Science (2014). Online: http://science.energy.gov/funding-opportunities/digital-data-management; accessed April 9, 2015.
[2] “Igor Technical Note 003: Igor Binary File Format,” WaveMetrics, Inc. (1999). Online: ftp://ftp.wavemetrics.com/mirror/IgorPro/Technical_Notes/TN003.zip; accessed April 9, 2015.
[3] “TIFF: Revision 6.0,” Adobe Developers Association, June 1992. Online: http://partners.adobe.com/public/developer/en/tiff/TIFF6.pdf; accessed April 9, 2015.
[4] “Bitmap Storage,” Microsoft Corporation. Online: https://msdn.microsoft.com/en-us/library/dd183391%28v=vs.85%29.aspx; accessed April 9, 2015.
[5] W.D. Pence, et al., “Definition of the Flexible Image Transport System (FITS), version 3.0,” Astronomy and Astrophysics 524, A42 (2010).
[6] J. A. Stillerman, T. W. Fredian, K. A. Klare, and G. Manduchi, “MDSplus data acquisition system,” Rev. Sci. Instrum. 68 (1), 939-942 (1997).
[7] H. Rodstein, “Igor Pro Technical Note 003: Writing Packed Files,” WaveMetrics, Inc. (1999). Online: ftp://ftp.wavemetrics.com/mirror/IgorPro/Technical_Notes/PTN003.zip; accessed April 9, 2015.
[8] “DOI Handbook.” Online: http://dx.doi.org/10.1000/182; accessed April 9, 2015.
[9] “Minds@UW,” University of Wisconsin-Madison Digital Collections Center. Online: http://uwdcc.library.wisc.edu/minds/index.shtml; accessed April 9, 2015.