How to apply NLP to healthcare datasets

There are many NLP-based solutions in the healthcare industry, both open-source and enterprise, that claim high accuracy and fast results. However, when such systems are deployed in real production scenarios, they end up with low precision and low recall, which reduces productivity and hurts the company's bottom line. NLP on medical text is difficult, and this session will lay out the basics of clinical NLP, along with common pitfalls and how to get around them.
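
Because the gap between claimed accuracy and production behaviour shows up as precision and recall, it helps to be explicit about how those numbers are computed for entity extraction. The sketch below uses exact-match, entity-level scoring and assumes each entity is a hypothetical `(start_char, end_char, label)` tuple; the annotations are made up for illustration, and partial-overlap scoring (which many evaluations also report) is not covered.

```python
from typing import Set, Tuple

# Assumed representation: (start_char, end_char, label); a match must be exact.
Entity = Tuple[int, int, str]


def precision_recall_f1(gold: Set[Entity], predicted: Set[Entity]) -> Tuple[float, float, float]:
    """Exact-match entity-level precision, recall and F1 (0.0 when undefined)."""
    true_pos = len(gold & predicted)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical note and annotations, for illustration only
note = "Pt denies chest pain. Hx of DM2."
gold = {(10, 20, "SYMPTOM"), (28, 31, "DIAGNOSIS")}        # "chest pain", "DM2"
predicted = {(10, 20, "SYMPTOM"), (3, 9, "SYMPTOM"),       # "denies" is a false positive
             (22, 24, "DIAGNOSIS")}                         # "Hx" is a false positive
print(precision_recall_f1(gold, predicted))
```

Here one of three predictions is correct and one of two gold entities is found, so precision is about 0.33 and recall 0.5: a system can look busy while missing half of what matters.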


Understand the models and techniques typically deployed for healthcare NLP, the constraints in building a deep learning architecture for healthcare data, and the considerations in architecting a scalable backend that processes large quantities of data in encrypted form.

That has changed with the advent of Spark's higher-level APIs, which are called, guess what, DataFrames and Datasets (DataFrames arrived in Spark 1.3, and Datasets were introduced in Spark 1.6). In this hack session we will perform some common pandas transformations on popular datasets and see how closely the equivalent Spark DataFrame transformations resemble them. As a developer, you will need to remember less code and fewer ways of thinking.

Apart from this, we will look at some basic Spark concepts that help us write better Spark transformations.

Lastly, for advanced users who have always wanted more from pandas, we will look at the map, flatMap, and lazily evaluated computations that are now possible in Scala with Spark Datasets.
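
The idea behind lazy map/flatMap can be seen without a Spark cluster: transformations only describe work, and nothing runs until an action asks for results. The sketch below is a pure-Python analogy built on generators, not Spark's Dataset API; the class `LazyPipeline` and its methods are hypothetical names chosen for illustration.

```python
from typing import Callable, Iterable, Iterator, List


class LazyPipeline:
    """Toy lazily evaluated collection (an analogy only, NOT the Spark API).

    Each transformation records a new iterator factory; user functions run
    only when collect() (the "action") is called.
    """

    def __init__(self, make_iter: Callable[[], Iterator]):
        # Zero-argument callable that builds a fresh iterator on demand
        self._make_iter = make_iter

    @classmethod
    def from_iterable(cls, items: Iterable) -> "LazyPipeline":
        data = list(items)
        return cls(lambda: iter(data))

    def map(self, f: Callable) -> "LazyPipeline":
        # Exactly one output element per input element
        return LazyPipeline(lambda: (f(x) for x in self._make_iter()))

    def flat_map(self, f: Callable[..., Iterable]) -> "LazyPipeline":
        # Zero or more outputs per input, flattened into one stream
        return LazyPipeline(lambda: (y for x in self._make_iter() for y in f(x)))

    def collect(self) -> List:
        # Only here are the recorded transformations actually executed
        return list(self._make_iter())


calls: List[str] = []


def tokenize(line: str) -> List[str]:
    calls.append(line)  # records when the function really executes
    return line.lower().split()


notes = ["Patient denies fever", "History of diabetes"]
pipeline = LazyPipeline.from_iterable(notes).flat_map(tokenize).map(len)
print(calls)               # [] : building the pipeline ran nothing
print(pipeline.collect())  # [7, 6, 5, 7, 2, 8]
```

Deferring work this way is what lets an engine like Spark see the whole chain of transformations before deciding how to execute it.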

Some of the concepts that we will discuss in this hack session are detailed below.
  1. Introduction to clinical NLP
  2. Explaining the problem statement – named entity recognition (NER) for clinical text
  3. Data formats and data collection
  4. Feature Engineering for clinical NER
  5. Pitfalls and possible solutions for each:
    • Non-standard text format
    • Multiple forms of the same DX (diagnosis) entry, e.g. diabetes == diab
    • Contextual negation
    • Temporal negation
    • Experiencer negation
    • Building taxonomies and ontologies
    • Compliance issues
  6. Architectural decisions for processing healthcare data
  7. One last thing – if you are building a healthcare product, what should your MVP be?
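
For the non-standard text and "diabetes == diab" pitfalls, a common first step is to normalize notes before any matching or tagging. The sketch below lowercases, strips stray punctuation and spacing, and expands abbreviations through a dictionary; the `ABBREVIATIONS` map and `normalize_note` function are hypothetical, and a real map would need clinical review because some abbreviations are ambiguous without context.

```python
import re
from typing import Dict, List

# Hypothetical abbreviation map for illustration only; a production map would be
# curated with clinical coders, since some abbreviations need context to resolve.
ABBREVIATIONS: Dict[str, str] = {
    "diab": "diabetes",
    "dm": "diabetes mellitus",
    "htn": "hypertension",
    "sob": "shortness of breath",
}

# Keep only alphanumeric runs: drops stray punctuation and irregular spacing
TOKEN_RE = re.compile(r"[a-z0-9]+")


def normalize_note(text: str) -> List[str]:
    """Lowercase, tokenize, and expand known abbreviations to a canonical form."""
    tokens = TOKEN_RE.findall(text.lower())
    normalized: List[str] = []
    for tok in tokens:
        # Split multi-word expansions so downstream matching stays token-based
        normalized.extend(ABBREVIATIONS.get(tok, tok).split())
    return normalized


print(normalize_note("Pt has DIAB.  HTN; SOB on exertion"))
# ['pt', 'has', 'diabetes', 'hypertension', 'shortness', 'of', 'breath', 'on', 'exertion']
print(normalize_note("diab") == normalize_note("Diabetes"))  # True
```

A dictionary lookup is deliberately simple; it makes variant spellings collapse to one form, but it cannot disambiguate an abbreviation whose meaning depends on the surrounding text.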

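The three negation pitfalls (contextual, temporal and experiencer) can be illustrated with a heavily simplified, trigger-based approach in the spirit of the NegEx/ConText family of algorithms; this is not an implementation of either. The trigger lists, `SCOPE_WINDOW`, `FindingContext` and `classify` are hypothetical, and real systems also need scope terminators (such as "but") and far larger lexicons.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical, deliberately tiny trigger lists (assumptions for illustration)
NEGATION_TRIGGERS = ["no", "denies", "negative for", "without"]
HISTORICAL_TRIGGERS = ["history of", "h/o", "previous"]
FAMILY_TRIGGERS = ["mother", "father", "sister", "brother", "family history of"]
SCOPE_WINDOW = 5  # max tokens a trigger may precede the finding by (assumed)


@dataclass
class FindingContext:
    finding: str
    negated: bool      # contextual negation, e.g. "denies chest pain"
    historical: bool   # temporal context, e.g. "history of diabetes"
    experiencer: str   # "patient" or "family", e.g. "mother has diabetes"


def _trigger_before(text: str, start: int, triggers: List[str]) -> bool:
    """True if any trigger occurs in the last SCOPE_WINDOW tokens before `start`."""
    window = " ".join(text[:start].split()[-SCOPE_WINDOW:])
    return any(re.search(rf"\b{re.escape(t)}\b", window) for t in triggers)


def classify(sentence: str, finding: str) -> FindingContext:
    text = sentence.lower()
    start = text.find(finding.lower())
    if start < 0:
        raise ValueError(f"{finding!r} not found in sentence")
    return FindingContext(
        finding=finding,
        negated=_trigger_before(text, start, NEGATION_TRIGGERS),
        historical=_trigger_before(text, start, HISTORICAL_TRIGGERS),
        experiencer="family" if _trigger_before(text, start, FAMILY_TRIGGERS) else "patient",
    )


for sent, finding in [
    ("Patient denies chest pain.", "chest pain"),
    ("History of diabetes, now controlled.", "diabetes"),
    ("Mother has diabetes.", "diabetes"),
]:
    print(classify(sent, finding))
```

In all three sentences a naive keyword match would report an active condition for the patient; the context flags are what keep such mentions from being coded as current diagnoses.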

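For the feature-engineering step, one common approach (not necessarily the one used in this session) is to describe each token with hand-crafted features and feed them to a sequence tagger such as a conditional random field (CRF). The sketch below assumes whitespace-tokenized input and a hypothetical diagnosis lexicon `DX_LEXICON`; the feature names are illustrative.

```python
import re
from typing import Dict, List, Set

# Hypothetical lexicon; in practice it might come from a curated terminology
# or an internal taxonomy.
DX_LEXICON: Set[str] = {"diabetes", "hypertension", "copd", "asthma"}


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Hand-crafted features for token i, in the style often fed to a CRF tagger."""
    tok = tokens[i]
    feats: Dict[str, object] = {
        "lower": tok.lower(),
        "suffix3": tok[-3:].lower(),
        "is_upper": tok.isupper(),                      # abbreviations such as COPD
        "is_title": tok.istitle(),
        "has_digit": any(ch.isdigit() for ch in tok),   # e.g. "DM2", "500mg"
        "is_dose": bool(re.fullmatch(r"\d+(\.\d+)?(mg|ml|mcg)", tok.lower())),
        "in_dx_lexicon": tok.lower() in DX_LEXICON,
    }
    # Neighbouring words often signal the entity type
    feats["prev_lower"] = tokens[i - 1].lower() if i > 0 else "<BOS>"
    feats["next_lower"] = tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>"
    return feats


sentence = "Pt with COPD started metformin 500mg".split()
features = [token_features(sentence, i) for i in range(len(sentence))]
print(features[2])  # features for "COPD"
```

Each per-token dictionary can then be passed to a CRF implementation that accepts feature dictionaries; which features actually help is an empirical question for each dataset.
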
Manas Ranjan Kar

Manas currently leads the NLP & Data Science practice at Episource, a healthcare company. His daily work revolves around semantic technologies and computational linguistics (NLP), building algorithms and machine learning models, reading data science journals, and architecting secure product backends in the cloud. He has designed multiple commercial NLP solutions in healthcare, food & beverages, finance, and retail. He is deeply involved in building large-scale business process automation and deriving deep insights from structured and unstructured data using NLP and machine learning.

Duration of Hack-Session: 1 hour
