The Age of Big Data for Clinical Trials

Posted December 26, 2017; updated March 24, 2019. Categories: Blog, Healthcare IT, Medical Research

The term big data describes large volumes of data, whether structured, semi-structured, or unstructured, that businesses can analyze for potential gains. Big data is only beginning to enter the clinical trials arena, but experts believe it will soon be used extensively across the healthcare sector.

The rapidly expanding field of big data is playing a pivotal role in the development of the healthcare and research sectors. Big data tools have made it much simpler to collect, manage, analyze, and integrate the large volumes of data that healthcare systems produce. Big data is now being used to support care delivery, clinical research, and disease exploration. A report by the McKinsey Global Institute estimated that if US healthcare used big data effectively, the sector could create more than $300 billion in value every year [1].

In the healthcare and pharmaceutical sectors, data comes from many sources, including the research and development process and records generated by patients, physicians, retailers, caregivers, and others. Organized and used effectively, this data can make it easier to identify promising drug candidates, speeding up clinical trials and helping bring medicines to market sooner. Other advantages of big data in the healthcare sector are listed below:

Key Advantages of Big Data in a Clinical Setting

  • Big data can support predictive models of biological processes, helping researchers design more sophisticated drugs. By drawing on the available molecular and clinical data, predictive modeling could help identify new candidate molecules that are likely to be developed successfully into drugs that act on their biological targets safely and effectively.
  • Big data makes it possible to identify patients for clinical trials from sources beyond those available at healthcare institutions, such as social media. Trial inclusion also depends on many factors, including genetic information, daily routine, and blood type, and big data makes it much easier to target such specific populations, enabling smaller, shorter, and less expensive trials.
  • The prospects of big data in clinical research go beyond identifying and enrolling patients. It can be used to discover novel targeted therapies based on biomarkers and genetic markers, to assess protocol feasibility, to recognize adverse event patterns among patient subpopulations, and to pre-populate case report forms in electronic data capture (EDC) systems.
  • Diagnostic medical images occupy a significant amount of storage space. Ultrasound, fluoroscopy, CT, MRI, X-ray, and molecular imaging are some of the imaging techniques prominent in clinical settings. A single study can produce anywhere from a few megabytes to hundreds of gigabytes of data, so such data requires large storage capacity and careful organization.
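Targeting a specific population for a trial amounts to screening candidate records against the trial's inclusion criteria. The minimal sketch below assumes a hypothetical `Candidate` record with age, blood type, and a single boolean standing in for a required genetic marker; the function names, fields, and thresholds are illustrative only, and real protocols involve far more criteria and data sources.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    # Hypothetical fields for illustration; not a real EHR or trial schema.
    patient_id: str
    age: int
    blood_type: str
    has_target_variant: bool  # stands in for a genetic marker the protocol requires


def is_eligible(c: Candidate, min_age: int, max_age: int, blood_types: frozenset) -> bool:
    """Return True if the candidate meets every (illustrative) inclusion criterion."""
    return (
        min_age <= c.age <= max_age
        and c.blood_type in blood_types
        and c.has_target_variant
    )


def prescreen(
    candidates: list,
    min_age: int = 18,
    max_age: int = 65,
    blood_types: frozenset = frozenset({"O+", "O-"}),
) -> list:
    """Return the IDs of candidates who pass pre-screening, in input order."""
    return [c.patient_id for c in candidates if is_eligible(c, min_age, max_age, blood_types)]


CANDIDATES = [
    Candidate("p1", 34, "O+", True),   # passes all criteria
    Candidate("p2", 71, "O+", True),   # too old
    Candidate("p3", 45, "A+", True),   # blood type not in the target set
    Candidate("p4", 29, "O-", False),  # lacks the required marker
    Candidate("p5", 52, "O-", True),   # passes all criteria
]

print(prescreen(CANDIDATES))  # ['p1', 'p5']
```

Keeping each criterion in one small predicate makes it easy to add or relax a criterion and see how the eligible pool changes, which is the kind of question that shapes trial size and cost.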
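Recognizing adverse event patterns among subpopulations can start with comparing event rates across groups. The following sketch uses made-up records split by a hypothetical biomarker status; it only computes descriptive rates and is not a statistical test, so a real analysis would also need to account for sample size and confounding.

```python
from collections import Counter

# Hypothetical records: (biomarker_status, had_adverse_event). Not real trial data.
TRIAL_RECORDS = [
    ("positive", True), ("positive", False), ("positive", True), ("positive", False),
    ("negative", False), ("negative", False), ("negative", True), ("negative", False),
]


def adverse_event_rates(records: list) -> dict:
    """Return the fraction of patients in each subgroup who had an adverse event."""
    totals = Counter(group for group, _ in records)
    events = Counter(group for group, had_event in records if had_event)
    # Counter returns 0 for groups with no events, so every subgroup gets a rate.
    return {group: events[group] / totals[group] for group in totals}


print(adverse_event_rates(TRIAL_RECORDS))  # {'positive': 0.5, 'negative': 0.25}
```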

Organizing and Filtering Large Datasets

As the size and volume of data increase, understanding the relationships within the data and designing efficient, accurate, and computationally tractable analyses require new computer-aided methods and platforms. Several frameworks have been developed for analyzing and organizing large datasets. One of them is Hadoop, which uses MapReduce, a programming paradigm that enables scalability across many servers in a Hadoop cluster and has diverse real-world applications. Hadoop has been used to speed up the processing of large datasets.
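To make the MapReduce idea concrete, the sketch below simulates its three stages (map, shuffle, reduce) in a single Python process, counting how often each diagnosis code appears in a set of hypothetical records. It does not use Hadoop's own Java API; the function names and record format are assumptions for illustration. In a real cluster, the framework distributes the map and reduce calls across servers and performs the shuffle over the network.

```python
from collections import defaultdict
from typing import Iterable, Iterator, Tuple

# Hypothetical input: each line is one de-identified record, "patient_id,diagnosis_code".
RECORDS = ["p1,I10", "p2,E11", "p3,I10", "p4,S06", "p5,E11", "p6,I10"]


def map_phase(record: str) -> Iterator[Tuple[str, int]]:
    """Emit a (key, 1) pair for the diagnosis code in one record."""
    _patient_id, code = record.split(",")
    yield code, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> dict:
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list) -> Tuple[str, int]:
    """Combine all values for one key into a single result."""
    return key, sum(values)


def map_reduce(records: Iterable[str]) -> dict:
    # In a cluster, each map_phase and reduce_phase call could run on a different server.
    mapped = (pair for record in records for pair in map_phase(record))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


print(map_reduce(RECORDS))  # {'E11': 2, 'I10': 3, 'S06': 1}
```

Because each `map_phase` call sees only one record and each `reduce_phase` call sees only one key, both stages can run in parallel on separate machines, which is what lets the pattern scale with the size of the cluster.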

Chen et al. designed a computer-aided decision support system to help physicians plan accurate treatment for patients with traumatic brain injury (TBI) [2]. The system combined each patient's demographic data, medical records, and features extracted from CT scans to predict the level of intracranial pressure (ICP). Its accuracy, sensitivity, and specificity were reported to be around 70.3%, 65.2%, and 73.7%, respectively. The growing number of healthcare institutions and patients has driven greater dependence on computer-aided diagnostics and decision support systems across the healthcare industry. Areas such as diagnosis, prognosis, and analysis can therefore benefit from using computational intelligence to manage, filter, and organize big data.
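Performance figures like those reported for this system are all computed from a classifier's confusion matrix. The sketch below assumes the prediction is framed as a binary task (for example, elevated ICP versus not elevated) and uses hypothetical counts that are not taken from Chen et al.; `classification_metrics` is an illustrative helper, not part of any library.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute standard binary-classifier metrics from confusion-matrix counts.

    tp/fn: positive cases (assumed here: elevated ICP) predicted positive/negative.
    tn/fp: negative cases predicted negative/positive.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,   # share of all predictions that are correct
        "sensitivity": tp / (tp + fn),   # share of positive cases that are detected
        "specificity": tn / (tn + fp),   # share of negative cases correctly ruled out
        "precision": tp / (tp + fp),     # share of positive predictions that are correct
    }


# Hypothetical counts for illustration only; not data from Chen et al.
print(classification_metrics(tp=30, fp=10, tn=40, fn=20))
# {'accuracy': 0.7, 'sensitivity': 0.6, 'specificity': 0.8, 'precision': 0.75}
```

One useful consequence: accuracy equals prevalence × sensitivity + (1 − prevalence) × specificity, where prevalence is the share of positive cases, so accuracy always falls between sensitivity and specificity.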


  1. J. Manyika, M. Chui, B. Brown et al., Big Data: The Next Frontier for Innovation, Competition, and Productivity, McKinsey Global Institute, 2011.
  2. W. Chen, C. Cockrell, K. R. Ward, and K. Najarian, “Intracranial pressure level prediction in traumatic brain injury by extracting features from multiple sources and using machine learning methods,” in Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine (BIBM ’10), pp. 510–515, IEEE, December 2010.
