Big Data in Healthcare

Posted by Christine Fritsch on September 19, 2017

A few weeks ago, I discussed the impact of big data in the law enforcement community. This week, I’d like to touch on another area of growth for big data and analytics: healthcare.

Major federal healthcare departments, particularly the Department of Health and Human Services (HHS) and the Department of Veterans Affairs (VA), have been using big data and analytics in a variety of ways: from biomedical research, medical monitoring and precision medicine to helping curb waste, fraud and abuse.

On that last point, healthcare agencies have been gathering more internal and external data, both structured and unstructured, and using technologies such as machine learning, natural language processing and predictive analytics to sift through it and flag outliers and unusual patterns that may indicate fraud. Through these methods, the Centers for Medicare & Medicaid Services (CMS) under HHS has identified hundreds of millions of dollars in fraud. According to a FedTech article, Steve Shandy, a program manager at the HHS Office of Inspector General (OIG), predicts big investments in natural language processing and social media analytics down the line to continue identifying fraud and abuse.
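
As a rough illustration of the outlier-flagging idea, the sketch below scores hypothetical per-provider billing averages with a modified z-score (median and median absolute deviation). This is a common robust choice because one extreme value does not inflate the baseline the way it would with a mean and standard deviation. The provider IDs, amounts and the 3.5 cutoff are assumptions for illustration only; this is not CMS's actual method, which the article does not describe.

```python
from statistics import median

def flag_outliers(amounts, threshold=3.5):
    """Return keys whose values are outliers by modified z-score (median/MAD)."""
    values = list(amounts.values())
    med = median(values)
    # Median absolute deviation: a spread measure that resists extreme values.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread at all, so nothing stands out
    flagged = []
    for key, v in amounts.items():
        score = 0.6745 * (v - med) / mad  # modified z-score
        if abs(score) > threshold:
            flagged.append(key)
    return flagged

# Hypothetical average billed amount per claim, keyed by a made-up provider ID.
claims = {
    "P001": 120.0, "P002": 135.0, "P003": 128.0, "P004": 118.0,
    "P005": 140.0, "P006": 131.0, "P007": 610.0, "P008": 125.0,
}
print(flag_outliers(claims))  # ['P007']
```

A flagged provider is only a lead for review, not proof of fraud; real systems would combine many such signals.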

Big data in healthcare, especially in research, simply makes sense: finding answers to medical problems requires collecting and studying large amounts of varied data. In recent years, HHS and VA have announced a number of big data and analytics initiatives to aid medical research and administration. A sampling:


  • The Veterans Health Administration is working with wearable devices such as the Fitbit and Apple Watch to monitor patients' health statistics, using programmed sensors and algorithms to report readings that fall outside normal ranges.
  • VA is collecting blood samples for genetic analysis, using supercomputing capabilities at the Department of Energy, to help predict and treat post-traumatic stress disorder and other combat-related injuries and conditions.


  • The National Institutes of Health (NIH) recently doubled its supercomputing capacity, enabling it to perform 1.2 thousand trillion (1.2 quadrillion) operations per second, to dig through immense amounts of data on cancer, diabetes, mental health and other areas of medical research.
  • NIH launched its "All of Us" program to gather data over time from more than 1 million people in the U.S. to study a variety of health conditions and the impact of individual differences in lifestyle, environment and biological makeup.
  • The Food and Drug Administration (FDA) is using high-performance computing, modeling and simulation to evaluate medical devices and drugs and to identify patients who may need dosage adjustments to get the full benefit of a drug. FDA is also working to build a natural history database to collect data that may lead to the development of "model-based drugs" for chronic diseases.

Reported big data spending from FY 2014 through FY 2016 appears to confirm this growth at HHS and VA:

Source: FPDS, Deltek

Both VA and HHS saw large increases in spending between FY 2015 and FY 2016: taking services and software together, spending roughly doubled at VA (from $13.8M to $27.5M) and rose about 31% at HHS (from $64.7M to $84.5M). At VA, the sharpest growth came in big data services, which nearly tripled from $2.9M in FY 2015 to $8.5M in FY 2016, an increase of about 193%. Of that $8.5M, VA spent $2.4M on analysis support and $2.3M on data warehouse-related services. In software, VA spent $10.9M in FY 2015 and $19M in FY 2016, an increase of about 74%. Breaking that down further, the agency invested nearly $7.3M in FY 2016 in predictive analytics software and $3.3M in machine data analytics.

At HHS, spending on big data services rose about 28% from FY 2015 to FY 2016, from $37.4M to $47.9M. Spending on big data software increased by about 34% over the same period, from $27.3M to $36.6M. Within services, $30.8M was spent in FY 2016 on data warehouse-related support services and $14.3M on analysis support. On the software side, $20.5M was spent in FY 2016 on predictive analytics, $9.3M on analytics and $2M on machine data analytics.
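
To make the year-over-year arithmetic easy to check, here is a small sketch that recomputes the percentage changes from the FY 2015 and FY 2016 dollar figures reported for each agency, using FY 2015 as the baseline. The combined totals add only the services and software figures; if the underlying chart includes other categories, its totals could differ.

```python
def pct_change(old, new):
    """Percent change relative to the earlier (baseline) year."""
    return (new - old) / old * 100

# FY 2015 and FY 2016 big data spending in $M, as reported in the text.
spending = {
    ("VA", "services"): (2.9, 8.5),
    ("VA", "software"): (10.9, 19.0),
    ("HHS", "services"): (37.4, 47.9),
    ("HHS", "software"): (27.3, 36.6),
}

for (agency, category), (fy15, fy16) in spending.items():
    print(f"{agency} {category}: {pct_change(fy15, fy16):+.0f}%")

# Services and software combined per agency (other categories are not included).
for agency in ("VA", "HHS"):
    fy15 = sum(v[0] for (a, _), v in spending.items() if a == agency)
    fy16 = sum(v[1] for (a, _), v in spending.items() if a == agency)
    print(f"{agency} combined: ${fy15:.1f}M -> ${fy16:.1f}M ({pct_change(fy15, fy16):+.0f}%)")
```

For VA services this prints +193%, and for HHS software +34%. The baseline matters: dividing the change by the FY 2016 figure instead would understate growth, for example 5.6 / 8.5 ≈ 66% for VA services rather than 193%.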

Note: the numbers above are based on Federal Procurement Data System (FPDS) reported spending from FY 2014 through FY 2016, which Deltek filtered using specific big data keywords.
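
Deltek's keyword list is not published here, so the following is only a hypothetical sketch of how keyword-based filtering of contract records might work: a case-insensitive regular expression built from an assumed keyword list, applied to made-up FPDS-style records.

```python
import re

# Hypothetical keyword list; the actual keywords Deltek used are not given.
BIG_DATA_KEYWORDS = [
    "big data", "predictive analytics", "data warehouse",
    "machine learning", "natural language processing",
]

# One case-insensitive pattern that matches any of the keywords.
PATTERN = re.compile("|".join(re.escape(k) for k in BIG_DATA_KEYWORDS), re.IGNORECASE)

def filter_big_data(records):
    """Keep only records whose description mentions at least one keyword."""
    return [r for r in records if PATTERN.search(r["description"])]

# Made-up records in the spirit of FPDS contract actions (amounts in $M).
records = [
    {"agency": "VA", "description": "Predictive Analytics software licenses", "amount": 1.2},
    {"agency": "VA", "description": "Office furniture", "amount": 0.4},
    {"agency": "HHS", "description": "Data Warehouse support services", "amount": 3.0},
]

for r in filter_big_data(records):
    print(r["agency"], r["description"])
```

Simple keyword matching like this can both miss relevant contracts and pick up irrelevant ones, so keyword-filtered totals are best read as estimates.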

Given the range of big data uses in the healthcare sector, continued interest in big data spending, particularly on software and services, is likely. Technologies and methods such as machine learning, artificial intelligence, natural language processing and predictive analytics appear poised to lead the future use of big data in support of healthcare agencies' missions.