We live in an era of unprecedented growth in health data. Digitization of medical records, medical imaging, genomic data, clinical notes, and more have all contributed to an exponential increase in the amount of medical data. The potential benefit of leveraging this health data is enormous. However, this growth brings new challenges, including data privacy and security and the need for data standardization and interoperability. There is also a need for effective tools that extract the information buried in this data and use it to derive valuable insights, inferences, and deep analytics that make sense of the data and support clinicians.

Today, I’m excited to announce Project Health Insights Preview. Project Health Insights is a service that derives insights from patient data and includes pre-built models designed to power key high-value scenarios in the health domain. The models receive patient data in different modalities, analyze it, and provide clinicians with inferences and insights backed by evidence from the input data. These insights can help healthcare professionals understand clinical data in scenarios such as patient profiling and clinical trial matching.

Project Health Insights—leveraging patient data to power actionable insights

Project Health Insights supports pre-built models that receive patient data in multiple modalities as input and produce insights and inferences that include:

  • Confidence scores: the higher the confidence score, the more certain the model is about the inferred value.
  • Evidence: links between the model output and specific evidence in the input, such as references to the spans of text that led to an insight.
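
To make these two output properties concrete, here is a minimal Python sketch of how a client application might use them: keep only inferences above a chosen confidence threshold, then resolve each evidence span back to the supporting text. The `Inference` and `Evidence` classes, their field names, the character-offset convention for spans, and the 0-to-1 confidence range are all hypothetical assumptions for illustration; they are not the service's actual response format.

```python
from dataclasses import dataclass, field


# Hypothetical shapes for illustration only; NOT the Project Health Insights schema.
@dataclass
class Evidence:
    offset: int  # assumed: character offset of the supporting span in the source note
    length: int  # assumed: length of the span in characters


@dataclass
class Inference:
    name: str          # hypothetical attribute name, e.g. "histology"
    value: str         # the inferred value
    confidence: float  # assumed range 0..1; higher means the model is more certain
    evidence: list[Evidence] = field(default_factory=list)


def confident_inferences(inferences, threshold):
    """Keep inferences whose confidence meets the threshold, most confident first."""
    kept = [inf for inf in inferences if inf.confidence >= threshold]
    return sorted(kept, key=lambda inf: inf.confidence, reverse=True)


def evidence_text(note, inference):
    """Resolve each evidence span back to the text in the note that supports it."""
    return [note[e.offset:e.offset + e.length] for e in inference.evidence]


if __name__ == "__main__":
    note = "Pathology: invasive ductal carcinoma of the left breast."
    inferences = [
        Inference("histology", "invasive ductal carcinoma", 0.92,
                  [Evidence(note.find("invasive"), len("invasive ductal carcinoma"))]),
        Inference("tumorSite", "breast", 0.41,
                  [Evidence(note.find("breast"), len("breast"))]),
    ]
    for inf in confident_inferences(inferences, threshold=0.5):
        print(inf.name, inf.value, inf.confidence, evidence_text(note, inf))
```

Filtering by confidence lets a reviewer focus on the inferences the model is most certain about, while the evidence spans let them check each one directly against the source text rather than taking it on trust.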

Project Health Insights Preview includes two enterprise-grade AI models that can be provisioned and deployed in a matter of minutes: Oncology Phenotype and Clinical Trial Matcher.

Oncology Phenotype is a model that enables healthcare providers to rapidly identify key cancer attributes among patients in their populations who have an existing cancer diagnosis. The model identifies cancer attributes such as tumor site, histology, clinical stage tumor, nodes, and metastasis (TNM) categories, and pathologic stage TNM categories from unstructured clinical documents.

Key features of the Oncology Phenotype model include:

  • Cancer case finding.
  • Clinical text extraction for solid tumors.
  • Importance ranking of evidence.

Clinical Trial Matcher is a model that matches patients to potentially suitable clinical trials based on the trials’ eligibility criteria and the patients’ data. The model helps find relevant clinical trials that a patient may qualify for, as well as a cohort of potentially eligible patients for a list of clinical trials.

Key features of the Clinical Trial Matcher model include:

  • Support for scenarios that are:
    • Patient-centric: helping patients find potentially suitable clinical trials and assess their eligibility against the trials’ criteria.
    • Trial-centric: matching a trial against a database of patients to locate a cohort of potentially suitable patients.
  • Interactive matching, where the model identifies missing information needed to further narrow down the list of potential clinical trials through an interactive experience.
  • Support for various modalities of patient data such as unstructured clinical notes, structured patient data, and Fast Healthcare Interoperability Resources (FHIR®) bundles.
  • Support for searching built-in knowledge graphs of clinical trials from clinicaltrials.gov, as well as matching against a custom trial protocol with specific eligibility criteria.

Streamlining clinical trial matching and cancer research

According to the World Health Organization, the number of registered clinical trials increased by more than 4800 percent from 1999 to 2021. Today, more than 82,000 clinical trials are actively recruiting participants worldwide (based on clinicaltrials.gov), with increasingly complicated eligibility criteria. However, clinical trial enrollment relies on manual screening of millions of patients, each with up to hundreds of clinical notes that a healthcare professional must review and analyze, which makes the process unsustainable. Given this, it is not surprising that up to 80 percent of clinical trials miss their enrollment timelines and up to 48 percent fail to meet their enrollment targets, according to data provided by Tufts University. The Clinical Trial Matcher model aims to solve exactly this problem by analyzing patient data against the complex eligibility criteria of clinical trials, matching patients with diverse conditions to the trials for which they are potentially eligible.

The Oncology Phenotype model allows physicians to effectively analyze cancer patients’ data based on tumor site, tumor histology, and cancer staging. Together, these models provide crucial building blocks toward the goals set out by the White House Cancer Moonshot initiative: to develop and test new treatments, to share more data and knowledge, to collaborate on tools that can benefit all, and to make progress toward ending cancer as we know it.

Providing value across the health and life sciences industry

Johns Hopkins University Medical Center is an early user of Project Health Insights. Dr. Srinivasan Yegnasubramanian is using the Oncology Phenotype model to leverage unstructured data and accelerate cancer registry curation for patients with solid tumors.

Pangaea Data is a Microsoft partner working in health AI. “At Pangaea Data we help companies discover 22 times more undiagnosed, misdiagnosed, and miscoded patients by characterizing them through unlocking and summarization of clinically valid actionable intelligence from patient records in a federated privacy-preserving, scalable, and evolving manner. We are exploring using Project Health Insights to augment our own advanced capabilities for characterizing patients.”—Vibhor Gupta, Director and Founder, Pangaea Data.

Akkure Genomics helps patients utilize their own genomic data or DNA to improve their chances of finding a clinical trial. “At AKKURE GENOMICS we leverage Project Health Insights, which empowers our own AI and digital DNA platform capabilities, to help patients get matched to clinical trials based on their individual medical diagnoses, thus boosting enrollment, improving the chances of finding a precision-matched trial and accelerating discovery of new therapeutics and cures.”—Professor Oran Rigby, Chief Engineering Officer and Founder, Akkure.

Built with the end user in mind

Initial models were validated in a research setting through a strategic partnership between Microsoft and Providence to accelerate digital transformation in health and life sciences. These models can enable oncologists to substantially scale up their precision oncology capabilities and generate intelligence and insights useful to clinicians as well as beneficial to patients.

“Microsoft’s ability to structure complex concepts with their natural language processing tools for cancer has contributed significantly to our ability to build research cohorts and discuss cancer treatment options.”—Dr. Carlo Bifulco, Chief Medical Officer, Providence Genomics.

Microsoft will continue to expand capabilities within Project Health Insights to support additional health workloads and enable insights that will guide key decision-making in healthcare.

Microsoft continues to grow its portfolio of AI services for health

Microsoft continues to invest in AI services for the health and life sciences industry. Along with other new offerings in the Microsoft Cloud for Healthcare, we are pleased to announce new enhancements to Text Analytics for Health (TA4H).

The new enhancements include:

  • Social Determinants of Health (SDoH) and Ethnicity information extraction. The newly introduced SDoH and Ethnicity features enable extraction of social, environmental, and demographic factors from unstructured text. These factors can support the development of more inclusive healthcare applications. Read more about it in our blog.
  • Temporal assertions. TA4H can now identify the temporal context of extracted entities: past, present, or future.
  • Custom entities. Customers can now extend TA4H to support custom entities based on their own data, and can also extend the entities the service already extracts.

We are also excited to share that Azure Health Bot now has a new Azure OpenAI template in preview. The Azure Health Bot OpenAI template allows customers to extend their Azure Health Bot instance with Azure OpenAI Service to answer unrecognized utterances more intelligently. This feature will be enabled through the Azure Health Bot template catalogue. Customers can import this template into their bot instance using their Azure OpenAI resource endpoint and key, which enables fallback answers generated by GPT from trusted, medically viable sources that customers provision. This feature gives customers a way to experiment with this capability as a preview.1 Read more about this, and about how to apply responsible AI principles when implementing your own Health Bot instance, in this blog.

We look forward to what the coming years will bring for the health and life sciences industry, empowered by these new capabilities and the continued innovation we are seeing across AI and machine learning. The potential for more precise care, faster and more efficient clinical trials, and, in turn, greater availability of drugs and therapies and accelerated medical research is unparalleled. Microsoft looks forward to partnering with you and your organizations on this journey to improve the health of humankind.

1 At this time, we are offering the preview for internal testing and evaluation purposes only.

FHIR® is a registered trademark of Health Level Seven International, registered in the U.S. Trademark Office, and is used with their permission.