Effective December 2014, the Pegasus group is subject to revised data management planning requirements as terms of its continued Federal research support. The following Data Management Plan has been drafted in accordance with the requirements and serves as the basis for the disclosure and hosting of public data sets herein.
Data Management Plan
Version: 3
Effective: 1 July 2022
Rationale
This data management plan governs digital research data generated by the Pegasus-III Experiment and its predecessor, the Pegasus Toroidal Experiment. In general, it is intended to describe how research products from Federally supported funds are made publicly available to the extent permitted by law.
Pegasus-III data do not in general include confidential information, personal privacy data, or Personally Identifiable Information, nor do they affect U.S. national, homeland, or economic security. Elements under non-disclosure agreement (NDA), however, will not be disclosed. These presently include: (1) specifications for CINE files, and (2) the Pegasus-III plasma control system, developed jointly with General Atomics. With the exception of the intellectual property and the proprietary and business-confidential information identified in this paragraph, public disclosure of Pegasus-III data to the maximal extent permitted does not cause a significant negative impact on innovation or U.S. competitiveness.
Data Collected, Generated, or Used
For the purposes of this policy, the following terms describe the data collected, generated, and used that originate from the Pegasus-III Experiment, together with prior records originating from the Pegasus Toroidal Experiment. Raw data are defined to be the actual signals, images, etc. recorded by diagnostics arising from operating the Pegasus-III (Pegasus) spherical tokamak and its supporting systems, including calibration factors where warranted. Analyzed data are defined to be raw data that have been aggregated, interpreted, or processed by researcher(s) to draw conclusions. Published data are defined as raw and/or analyzed data that appear in peer-reviewed publications, including charts, figures, and images. Digital research data that are considered necessary to validate research findings are all published data justifying conclusions in peer-reviewed publications. Public data sets are aggregated digital research data corresponding to, cited in, and accompanying publication of individual peer-reviewed articles by supported Pegasus-III researchers after December 1, 2014. Their provenance and public hosting are discussed below.
Standards
All raw data are organized by facility pulse/discharge number for pulse-specific data and/or by relational database name and acquisition time stamps for intra-pulse records.
Raw data storage formats include, but are not limited to: Igor Pro binary waveforms; ASCII text; image files in publicly-documented Tag Image File Format (TIFF), device-independent bitmap format (BMP), and Flexible Image Transport System format (FITS); MDSplus trees; and, where required, proprietary formats made available to Pegasus-III team members under non-disclosure agreement (NDA).
Analyzed data are generated in a variety of formats. These include, but are not limited to: electronic log books; digital photographs; Igor Pro packed experiment files; Subversion software source code repositories; SQL relational databases; and analysis code outputs.
Related Tools, Software and/or Code
Commercial scientific analysis suites and research codes are used to analyze data. These suites include: Igor Pro, IDL, MATLAB, SIMULINK, LabVIEW, and Mathematica. C and FORTRAN compilers and Python interpreters are employed when needed. Low-level access codes and application programming interfaces for Pegasus-III data are provided to authorized users by Pegasus-III staff.
Access to raw and analyzed data is provided via network-based services hosted on Pegasus-III servers. These include: MDSplus, SMB file shares, Subversion source control, and SQL databases, with access privileges made available to authorized users. Open-source clients are available to utilize these services.
Data Sharing
Sharing of unpublished raw and analyzed data will be facilitated via a written acceptable use agreement, consistent with compliance to NDAs or other legal restrictions. Developing, publicly hosting, administering, securing, and maintaining an Internet or web-based portal for such data is infeasible at the anticipated level of supported effort, given the associated cost and support staff requirements. Requests for unpublished data will therefore be negotiated on an individual-request basis. Such requests should be addressed to pegasus_dmp@office365.wisc.edu, the address listed on the public Pegasus-III website (https://pegasus.ep.wisc.edu).
All published data will be aggregated into publication-specific data sets (“public data sets”). Public data sets will be provided in openly-documented, machine-readable formats and hosted on the Pegasus-III public website. An index page therein associates peer-reviewed publications, their public data sets, and their compliant file formats. Public data sets will be posted within 30 days of an article’s initial publication.
Following bibliographic best practice, Digital Object Identifiers (DOIs) are generated to identify public data sets in a permanent, consistent, citable, and data host-neutral fashion. Articles that use data subject to this policy and are published in peer-reviewed journals after December 1, 2014 will indicate how their public data sets may be accessed by clearly citing the public data set DOI in the body of the article. Such articles published after January 1, 2021 will additionally be published under an appropriate open-access license.
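As an illustration of how a cited DOI leads a reader to a public data set, the following minimal Python sketch converts a DOI string into a resolver link at https://doi.org. The function name, the simplified DOI pattern, and the example DOI are assumptions made for illustration only; they are not part of this plan and do not identify an actual Pegasus-III data set.

```python
import re
from urllib.parse import quote

# Simplified, assumed pattern: "10." + registrant code + "/" + non-empty suffix.
_DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi: str) -> str:
    """Return the https://doi.org resolver URL for a DOI string."""
    doi = doi.strip()
    if not _DOI_PATTERN.match(doi):
        raise ValueError(f"not a recognizable DOI: {doi!r}")
    # Percent-encode unusual characters but keep the prefix/suffix slash.
    return "https://doi.org/" + quote(doi, safe="/")


# Made-up DOI, used only to show the call.
print(doi_to_url("10.5555/example-dataset"))
```

Because the resolver link depends only on the DOI, a citation of this form remains valid if the data set is later moved to a different host and its DOI metadata are updated.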
Data Preservation
All raw and analyzed data relevant for archival storage are preserved by ensuring at least three copies are always maintained with high reliability on on-site and off-site storage systems. Public data sets are hosted on the Pegasus-III public website. Such preservation will be performed by Pegasus-III staff for the duration of funded research activity. Raw and analyzed data may additionally be stored on institutional and/or commercial cloud providers available to University of Wisconsin-Madison staff.
If direct project funding ends, Pegasus-III staff at the time will produce an archival set of all collected raw data, published data, and a subset of analyzed data described above. Analyzed data that is not selected for permanent archival at that time will be destroyed. The archival data collection will then be deposited with a permanent, publicly Internet-accessible repository in existence at the time. Appropriate DOI metadata updates will be performed to allow uninterrupted public data access and preserve the bibliographic record. Management responsibility will be transferred from supported researchers to the chosen repository managers at the time of archival.
Data Protection: Security and Integrity
32 TB of redundant, protected storage (~180 TB total raw storage) is presently implemented for legacy Pegasus and Pegasus-III data archival. This storage is implemented using three commercially-available networked RAID-6 systems. Each system maintains its storage integrity against simultaneous loss of two storage disks in its multi-disk RAID array. A primary RAID-6 system in the Pegasus-III laboratory is used for day-to-day operations and data access. A secondary backup system is located in a separate, physically access-controlled server room on-site. A tertiary backup system is located off-site. Backup systems are synchronized from primary storage daily.
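The two-disk fault tolerance of RAID-6 comes from reserving the equivalent of two disks in each array for parity. The short Python sketch below shows that arithmetic. The disk count and disk size in the example are hypothetical and do not describe the actual Pegasus-III arrays.

```python
def raid6_usable_tb(num_disks: int, disk_tb: float) -> float:
    """Usable capacity of one RAID-6 array, ignoring filesystem overhead.

    RAID-6 stores two disks' worth of parity, so an array survives the
    simultaneous loss of any two member disks.
    """
    if num_disks < 4:
        raise ValueError("RAID-6 requires at least four disks")
    return (num_disks - 2) * disk_tb


# Hypothetical array: 8 disks of 4 TB each.
print(raid6_usable_tb(8, 4.0))  # 24.0
```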
Data integrity is safeguarded by the commercial vendors of the underlying storage platforms from a technical perspective and Pegasus-III staff from an administrative perspective.
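One administrative check that could complement vendor-level safeguards is a periodic checksum comparison between primary storage and a backup copy. The Python sketch below is an assumption about how such a spot check might look, not a description of the procedure Pegasus-III staff actually use. The function names and directory arguments are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mismatched_files(primary: Path, backup: Path) -> list[str]:
    """List files under `primary` that are missing or differ in `backup`.

    Paths are returned relative to `primary`, in sorted order.
    """
    problems = []
    for source in sorted(p for p in primary.rglob("*") if p.is_file()):
        relative = source.relative_to(primary)
        copy = backup / relative
        if not copy.is_file() or sha256_of(copy) != sha256_of(source):
            problems.append(relative.as_posix())
    return problems
```

An empty list from mismatched_files indicates that every file on the primary system has a byte-identical counterpart on the backup. The check runs in one direction only, so files present solely on the backup are not reported.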
Oversight of Data Management
All data management and intellectual property responsibilities are assigned to the projects’ principal investigator or their designee. In general, write access to the storage services described above is limited to systems at the Pegasus-III facility that are responsible for generating data under the direct control of supported researchers. Other users generally have read-only access privileges.
This policy is subject to future revision to accommodate changes to data management policies, laws, and regulations, or changes in the resources available to the Pegasus-III Experiment. Current data management policies will be posted and maintained on the Pegasus-III public website.
Revision History
Version 3
6 March 2025
Updated language to reflect DMP in present funding period.
Version 2
9 November 2020
Updated notification email address.
Version 1
1 December 2014
Initial Release