A number of clients have been asking me about protected health information (PHI) solutions, so I thought I’d put out a general call for help from my esteemed readers. What I’m looking for is a general-purpose data de-identification library (preferably open source) that I could use in both OSS and commercial systems. Even if it costs money, I’d love to hear about it.
The idea is to find PHI automatically in any arbitrary data packet (HL7, e-mail, database, etc.) and then be able to flag it, one-way hash it, tokenize it, add it to a dictionary, etc. ...
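To make the request a bit more concrete, here is a rough sketch of the kind of workflow I have in mind: flag candidate PHI, one-way hash it, swap in a token, and record the token in a dictionary. Everything in it is an assumption for illustration, not any real library's API: the `PATTERNS` table, the function names (`find_phi`, `one_way_hash`, `tokenize`), and the token format are all hypothetical, and three regular expressions are nowhere near enough to catch real PHI such as names, dates, or record numbers.

```python
import hashlib
import hmac
import re

# Hypothetical, deliberately tiny pattern set (an assumption for this sketch).
# Real PHI detection needs far more than a few regular expressions.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def find_phi(text):
    """Flag candidate PHI: return (kind, start, end, value) tuples in text order."""
    hits = []
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((kind, match.start(), match.end(), match.group()))
    return sorted(hits, key=lambda hit: hit[1])


def one_way_hash(value, key):
    """Keyed one-way hash (HMAC-SHA256) so values cannot be looked up without the key."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def tokenize(text, key, dictionary):
    """Replace each flagged value with a token and record token -> hash in dictionary."""
    pieces = []
    position = 0
    for kind, start, end, value in find_phi(text):
        if start < position:
            continue  # skip a match that overlaps one already replaced
        digest = one_way_hash(value, key)
        token = f"[{kind}:{digest[:12]}]"
        dictionary[token] = digest  # the original value is never stored
        pieces.append(text[position:start])
        pieces.append(token)
        position = end
    pieces.append(text[position:])
    return "".join(pieces)


if __name__ == "__main__":
    seen = {}
    message = "Contact jane.doe@example.com, SSN 555-12-3456."
    print(tokenize(message, b"demo-key-only", seen))
    print(sorted(seen))
```

Because the hash is keyed, the same value always maps to the same token under one key, which keeps records linkable after de-identification, while a different key yields unrelated tokens. A real library would also need to understand message structure (HL7 segments, database columns) rather than treating everything as flat text.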