Is Your Data Valid? The What & How
29.8.17
Data validation refers to the various procedures that ensure a program or machine receives and operates on correct, useful, and clean data. These procedures often rely on routines known as validation rules or check routines, which verify the correctness, significance, and security of data entered into the system.
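As a concrete illustration, a check routine is often just a small function that tests one field against one rule. The sketch below is a minimal Python example; the field name, plausible range, and messages are hypothetical and not tied to any particular system.

```python
# Minimal sketch of a single validation rule ("check routine").
# The field, plausible range, and messages are hypothetical examples.
def check_systolic_bp(value):
    """Range check: flag missing or implausible systolic blood pressure values."""
    if value is None:
        return "missing value"
    if not 60 <= value <= 250:
        return f"out of range: {value} mmHg"
    return None  # None means the value passed the check

# Running the check over a few entered values
for entered in [120, None, 400]:
    print(entered, "->", check_systolic_bp(entered))
```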
The accuracy of any research depends on the reliability and significance of the data assessed to produce the results. In the domain of clinical research, data validation can be separated into two categories: Clinical Database Validation (testing the software) and Clinical Data Validation (testing the data in the software).
Clinical Database Validation and Clinical Data Validation share several similarities: both require significant documentation, both comprise processes that validate the entire system for quality and consistency, and both are required by various guidelines and regulations.
Despite these similarities, there are notable differences between the two categories in timing, documentation, and process.
Data validation in clinical research centers on a series of documented tests of the data that ensure its quality and integrity. These documented tests usually check four of the following eight characteristics of authentic clinical data.
- Originality: all data comes from the original source and is not overwritten, so that copies and transformations of the data are complete, accurate, and traceable back to the original source.
- Attribution: the sources of the data are known and recorded.
- Legibility: the data is clear and human readable.
- Contemporary: the data is recorded at the time it is generated.
- Accuracy: the data is correct.
- Endurance: the data remains available for the entire period it is required to be kept.
- Completeness: all available data is included.
- Consistency: all data uses consistent terms and is non-conflicting.
Data validation tests always investigate the originality, accuracy, completeness, and consistency of data.
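Two of these characteristics, completeness and consistency, lend themselves especially well to mechanical checks. The following is a rough Python sketch; the record layout, field names, and rules are hypothetical.

```python
from datetime import date

# Hypothetical subject record; field names are illustrative only.
record = {
    "subject_id": "1001",
    "visit_date": date(2017, 8, 1),
    "informed_consent_date": date(2017, 8, 15),
    "sex": "F",
}

REQUIRED_FIELDS = ["subject_id", "visit_date", "informed_consent_date", "sex"]

def check_completeness(rec):
    """Completeness: every required field must be present and non-empty."""
    return [f for f in REQUIRED_FIELDS if not rec.get(f)]

def check_consistency(rec):
    """Consistency: dates and coded values must not conflict."""
    issues = []
    if rec["visit_date"] < rec["informed_consent_date"]:
        issues.append("visit recorded before informed consent")
    if rec["sex"] not in ("M", "F"):
        issues.append(f"sex uses an unexpected code: {rec['sex']!r}")
    return issues

print("missing:", check_completeness(record))
print("conflicts:", check_consistency(record))
```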
Data Validation Outline
The process of data validation can be quite complex and depends heavily on the data being captured, the data management software used, business and regulatory concerns, and many other factors, so there are many possible variations and options. However, the process can be summarized under the following general outline:
PLANNING
- The sponsor decides which checks will be used, the appropriate code lists, and the procedures to follow for invalid results.
- All checks, code lists, and procedures are set and documented.
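In practice, the planning output is often captured as a data validation plan that specifies every check and code list. The excerpt below is a hypothetical sketch of what such a specification might look like; the check IDs, fields, and code list are illustrative only.

```python
# Hypothetical excerpt of a data validation plan: each entry documents
# one check, the code list it relies on, and what to do with failures.
SEX_CODELIST = {"M": "Male", "F": "Female", "U": "Unknown"}

VALIDATION_PLAN = [
    {
        "check_id": "DM-001",
        "field": "sex",
        "rule": "value must be in the SEX code list",
        "codelist": SEX_CODELIST,
        "on_failure": "raise a query to the site",
    },
    {
        "check_id": "VS-002",
        "field": "systolic_bp",
        "rule": "value must be between 60 and 250 mmHg",
        "on_failure": "raise a query to the site",
    },
]

for check in VALIDATION_PLAN:
    print(check["check_id"], "-", check["rule"])
```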
IMPLEMENTATION AND TESTING
- The defined checks and code lists are implemented in the clinical database management system.
- As a usual part of database validation, test data and test procedures for the checks are created.
- Test procedures are carried out.
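Testing the implemented checks typically means feeding them records designed to pass and to fail, then confirming the expected outcome. Below is a small sketch using Python's built-in unittest module; the check and the values are hypothetical.

```python
import unittest

def check_sex_code(value, codelist=("M", "F", "U")):
    """Code-list check: the value must appear in the agreed code list."""
    return None if value in codelist else f"not in code list: {value!r}"

class TestSexCodeCheck(unittest.TestCase):
    def test_valid_code_passes(self):
        self.assertIsNone(check_sex_code("F"))

    def test_invalid_code_is_flagged(self):
        self.assertIsNotNone(check_sex_code("X"))

if __name__ == "__main__":
    unittest.main()
```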
DATA ENTRY AND VALIDATION
- Checks are performed during data entry, either as the data is entered or in batches at intervals.
- Invalid results are corrected or accepted following the documented procedures.
- This last set of checks is commonly referred to as data cleaning.
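A batch data-cleaning pass usually amounts to running every planned check against every record and collecting the discrepancies (often raised as queries) for review. The following is a simplified Python sketch; the records and checks are hypothetical.

```python
# Simplified batch validation pass: run each check against each record
# and collect discrepancies for the data management team to resolve.
def check_required(rec):
    return "missing subject_id" if not rec.get("subject_id") else None

def check_age(rec):
    age = rec.get("age")
    if age is not None and not 0 <= age <= 120:
        return f"implausible age: {age}"
    return None

CHECKS = [check_required, check_age]

records = [
    {"subject_id": "1001", "age": 34},
    {"subject_id": "", "age": 150},
]

queries = []
for i, rec in enumerate(records):
    for check in CHECKS:
        issue = check(rec)
        if issue:
            queries.append({"record": i, "issue": issue})

print(queries)  # each entry would be resolved or documented per the plan
```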
DATABASE LOCK
- The database is locked when no more changes or updates to the data are expected.
- Analysts may still run further checks after database lock to determine whether any adjustments are needed in order to generate the analysis datasets.
The quality and integrity of clinical data are critically important because the FDA, business partners, and other regulators weigh them heavily when evaluating the worth of a product. Clinical data reliability also affects treatment decisions, and therefore the health of patients across a large fraction of the world's population.
Data+ offers innovative solutions to streamline your data capture process. It enriches your understanding by ensuring data accuracy and saves you time by helping you pick up only the information relevant to your research. Data+ interfaces with public data sources and seamlessly populates your database with relevant information, continuously validating the data along the way.