How to identify spreadsheets and databases with protected health information (PII and PHI)

The nice folks from IBM’s developerWorks group asked me to write an intermediate-level set of instructions (with a little code) for how technical teams can identify and find databases and spreadsheets that might contain personally identifiable information (PII) and protected health information (PHI).

The article is now available on IBM’s developerWorks; here’s the abstract:

Identity theft and medical fraud are growing problems. They are so big the U.S. government is spending billions of dollars securing its own computer systems and has written thousands of pages of new regulations that you must follow to help protect your customer and employee data. To comply with new regulations and properly secure data, you will need to find personally identifiable information (PII) and protected health information (PHI) in your databases and documents. Both PHI and PII are conceptually easy to understand but very difficult to track in the thousands of relational data stores, files, and spreadsheets that make up a typical organization’s IT environment. This article describes some methods to automatically identify and inventory PII, PHI, and other sensitive data with databases and spreadsheets using Java™ technology and the Apache Ant build tool.
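As a rough illustration of the inventory idea in the abstract, here is a minimal Java sketch that flags column names that hint at PII or PHI. It is not the code from the developerWorks article. The class name `PiiColumnNameScanner`, the method `findSuspectColumns`, and the list of name fragments are all assumptions made for this example. In practice the names might come from JDBC's `DatabaseMetaData.getColumns` or from a spreadsheet's header row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

/** Hypothetical helper: flags column names that hint at PII or PHI. */
final class PiiColumnNameScanner {

    // Illustrative name fragments only (an assumption); a real inventory needs a reviewed list.
    private static final Pattern SUSPECT_NAME = Pattern.compile(
            "ssn|social|birth|dob|patient|diagnosis|mrn|phone|email|address");

    /** Returns the column names that contain any suspect fragment, in input order. */
    static List<String> findSuspectColumns(List<String> columnNames) {
        List<String> hits = new ArrayList<>();
        for (String name : columnNames) {
            // Case-insensitive substring match against the fragment list.
            if (SUSPECT_NAME.matcher(name.toLowerCase(Locale.ROOT)).find()) {
                hits.add(name);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hard-coded sample names; real input would come from database or spreadsheet metadata.
        List<String> columns = List.of("order_id", "PATIENT_SSN", "date_of_birth", "unit_price");
        System.out.println(findSuspectColumns(columns));
    }
}
```

Substring matching like this is deliberately crude: it will produce false positives (a column called `classname` contains `ssn`) and will miss cryptically named columns, so treat the result as a list of candidates for human review.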

Don’t forget that there are open source and commercial scanning tools that start with similar functionality but add more features. When you look for third-party tools, consider automated discovery (the tools automatically find databases and record sources with PII/PHI), configurable templates (you add your own rules), broad coverage (all files, databases, and network transfers are covered), content scanning, and auditing.
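To make "configurable templates" and "content scanning" a little more concrete, here is a small Java sketch of a rule-based scanner that counts how many sampled values match each named rule. It does not describe how any particular tool works. The class `ContentRuleScanner`, its `addRule` and `scan` methods, and the two sample patterns are assumptions made for this example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical rule-based content scanner; rule names and patterns are illustrative. */
final class ContentRuleScanner {

    // Rules are kept in insertion order so reports are stable.
    private final Map<String, Pattern> rules = new LinkedHashMap<>();

    /** Registers a named rule; this plays the role of a user-supplied template. */
    ContentRuleScanner addRule(String name, String regex) {
        rules.put(name, Pattern.compile(regex));
        return this;
    }

    /** Counts, per rule, how many of the sampled values contain a match. */
    Map<String, Integer> scan(List<String> values) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
            int hits = 0;
            for (String value : values) {
                // Null cells are skipped rather than treated as matches.
                if (value != null && rule.getValue().matcher(value).find()) {
                    hits++;
                }
            }
            counts.put(rule.getKey(), hits);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Simplified patterns (an assumption); production rules need validation and tuning.
        ContentRuleScanner scanner = new ContentRuleScanner()
                .addRule("us-ssn", "\\b\\d{3}-\\d{2}-\\d{4}\\b")
                .addRule("email", "\\b[\\w.+-]+@[\\w-]+(\\.[\\w-]+)+\\b");
        List<String> sample = List.of("123-45-6789", "jane@example.com", "42 widgets");
        System.out.println(scanner.scan(sample));
    }
}
```

Keeping the rules as data rather than hard-coded checks means new patterns can be added without touching the scanning loop, which is the point of a template-driven design.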

Please check out my developerWorks article and either comment directly on the IBM site or drop me a note here about what you think of it.


34 thoughts on “How to identify spreadsheets and databases with protected health information (PII and PHI)”

  1. Great article. I work in a hospital and often have access to patients’ private information. Just as we can track our own credit history through the different credit companies, the same concept should be applied here too. There should be independent companies that keep track of who pulls our information and looks at it. For a small fee we could check those records ourselves and dispute any access made without our authorization.

  2. To comply with new regulations and properly secure data, you will need to find personally identifiable information (PII) and protected health information (PHI) in your databases and documents.

  3. Identity theft and medical fraud are definitely a growing problem today, and I would especially like to thank Shahid, who took the time to write on this topic and put patients like us at ease.

    Not only that, I learned a great deal from reading his developerWorks article. Great help indeed!

    I also came across quite a useful site called Findrxonline. Please do go through it and see how safe and informative it is. Hope you have a good time reading it too.

    Regards
    Dan Watson

  4. Nice post. Both PHI and PII are conceptually easy to understand but very difficult to track in the thousands of relational data stores, files, and spreadsheets that make up a typical organization’s IT environment.

  5. This article describes some methods to automatically identify and
    inventory PII, PHI, and other sensitive data with databases and
    spreadsheets using Java Technology.

  6. This is mind-blowing! The U.S. government is superb! It is spending billions of dollars securing its own computer systems. These new regulations just need to be strictly followed for security purposes!

  7. To comply with new regulations and properly secure data, you
    will need to find personally identifiable information and protected health information in your databases
    and documents.

  8. Great article. I work in a hospital and often have access to patients’
    private information. Just as we can track our own credit history through
    the different credit companies, the same concept should be
    applied here too. There should be independent companies that keep
    track of who pulls our information and looks at it.

  9. As is so often the case when programming, the options can become a mountain compared to the molehill of the problem. I’ve programmed multiple versions of a similar system, and am still developing new approaches to explore.
