Broad View Of Data Management

Requirements for Good Data Management

The crucial first step in data management is recording the relevant information in a form that can be managed. Not all components of the recorded data may need to be managed explicitly, but unrecorded data are unmanageable.

Archivist Terry Cook has discussed the formal criteria for creating data that will remain available and useful over time (Cook, 1995). Cook draws on the work of Richard Cox, David Bearman, and John McDonald to define a "set of needs for capturing, maintaining and using electronic records":

1. "Records must be comprehensive: a record reflecting who, what, when, where, why, with whom, and so on, must be created for every business transaction."

2. "Records must be authentic: authorizations for access to the data...must be recorded and traceable to each record and transaction."

3. "Records must be tamper-proof: no deletion or alteration to a record should occur...If a record is changed or corrected, a second record must be created and linked to the first. Moreover, each use...of a record is also a transaction and thus must generate its own record."

Although Cook is specifically addressing the requirements for managing business data, essentially the same needs must be met by anyone wanting to ensure the validity of conclusions drawn from stored flow cytometry data.
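As an illustration of how these three needs might translate to stored data, the following is a minimal sketch of an append-only record log. The names `Record`, `RecordLog`, `create`, `correct`, and `read` are hypothetical and are not taken from Cook or from any cytometry package. The sketch assumes that a correction is stored as a second record linked to the first, and that every read is itself logged as a record.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class Record:
    record_id: int
    who: str                          # person or process responsible
    what: str                         # description of the transaction
    when: datetime                    # time the record was created (UTC)
    supersedes: Optional[int] = None  # id of the record this one corrects


class RecordLog:
    """Hypothetical append-only log: records are added, never edited or deleted."""

    def __init__(self) -> None:
        self._records: List[Record] = []

    def _append(self, who: str, what: str,
                supersedes: Optional[int] = None) -> Record:
        rec = Record(len(self._records) + 1, who, what,
                     datetime.now(timezone.utc), supersedes)
        self._records.append(rec)
        return rec

    def _get(self, record_id: int) -> Record:
        if not 1 <= record_id <= len(self._records):
            raise KeyError(record_id)
        return self._records[record_id - 1]

    def create(self, who: str, what: str) -> Record:
        return self._append(who, what)

    def correct(self, who: str, original_id: int, what: str) -> Record:
        # A correction never alters the original; it adds a linked record.
        self._get(original_id)
        return self._append(who, what, supersedes=original_id)

    def read(self, who: str, record_id: int) -> Record:
        # Each use of a record is itself a transaction and is logged.
        rec = self._get(record_id)
        self._append(who, f"accessed record {record_id}")
        return rec

    def __len__(self) -> int:
        return len(self._records)
```

In this sketch, creating a record, correcting it, and reading the original leaves three entries in the log, with the original text still intact. A real system would also need persistent storage and access control, which are omitted here.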

Recording the Relevant Information

The relevant information in flow cytometry data is much more than numerical lists of measurements. It is important that the data include specific information identifying the content and purpose of each sample, the machine parameters of the run, and any unusual features that could be important for future analysis.
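One way to picture such an enriched record is a small annotation structure kept alongside the measurements. The sketch below is illustrative only: the class `SampleAnnotation`, its field names, and the helper `missing_fields` are assumptions made for this example, not part of any flow cytometry file standard or commercial package.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SampleAnnotation:
    """Hypothetical context record linked to one sample's measurements."""
    sample_id: str   # identifies which measurement file this describes
    content: str     # what the sample contains (cells, stains)
    purpose: str     # why the sample was run
    instrument_settings: Dict[str, str] = field(default_factory=dict)
    notes: str = ""  # unusual features worth knowing later (optional)


# Fields assumed, for this sketch, to be mandatory for every sample.
REQUIRED = ("sample_id", "content", "purpose", "instrument_settings")


def missing_fields(annotation: SampleAnnotation) -> List[str]:
    """Return the names of required fields that were left empty."""
    return [name for name in REQUIRED if not getattr(annotation, name)]
```

Checking every annotation against the same list of required fields is one simple way to keep the recorded information consistent from sample to sample; which fields are mandatory is a decision for each facility.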

To date, most commercial flow cytometry software has been oriented toward the display and analysis of data from single samples and has provided very limited facilities for linking the full array of other possible information to cell measurement data. Staff of flow cytometry facilities should be familiar with the data enrichment options that their software does offer and should encourage facility users to take full advantage of them.

As much of the relevant context information as possible should be included, linked to the numerical data. A data management system needs to be consistent in the types of information that are included. To avoid omitting impor-

Data Processing and Analysis

