Data Privacy

I attended a seminar today about data privacy given by Bradley Malin, an assistant professor of Biomedical Informatics at Vanderbilt.

It was an interesting talk on an interesting subject. You have probably heard about the AOL search data scandal this past summer, when AOL released a large set of search logs. One might think that publicly releasing anonymous data does no harm. But although the users were anonymized with unique numbers, it didn’t take long for people to identify a user personally by cross-referencing the logs with other public data. Malin emphasized the same point: even when data is anonymous, it may still be possible to work out who it belongs to by using other publicly available data. His own interest is in medical data.
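To make the cross-referencing idea concrete, here is a minimal sketch in Python. It is not Malin's method and not what happened with the AOL logs. Every record, name, and field in it is made up for illustration, and I am assuming that ZIP code, birth year, and sex are the fields the two data sets have in common.

```python
from collections import defaultdict

# Hypothetical "anonymized" release: names are replaced by numeric IDs,
# but ZIP code, birth year, and sex are left in place.
anonymized = [
    {"user_id": 1001, "zip": "37203", "birth_year": 1975, "sex": "F", "diagnosis": "asthma"},
    {"user_id": 1002, "zip": "37212", "birth_year": 1982, "sex": "M", "diagnosis": "diabetes"},
    {"user_id": 1003, "zip": "37212", "birth_year": 1982, "sex": "M", "diagnosis": "migraine"},
]

# Hypothetical public data (for example a voter list) that includes names.
public = [
    {"name": "Alice Example", "zip": "37203", "birth_year": 1975, "sex": "F"},
    {"name": "Bob Example", "zip": "37212", "birth_year": 1982, "sex": "M"},
    {"name": "Carol Example", "zip": "37215", "birth_year": 1990, "sex": "F"},
]

# Assumed shared fields; real data sets may overlap on different ones.
SHARED_FIELDS = ("zip", "birth_year", "sex")


def key(record):
    """Build a comparison key from the fields both data sets share."""
    return tuple(record[field] for field in SHARED_FIELDS)


def reidentify(anonymized_rows, public_rows):
    """Map user_id to name wherever the shared fields match one-to-one."""
    names_by_key = defaultdict(list)
    for row in public_rows:
        names_by_key[key(row)].append(row["name"])

    rows_by_key = defaultdict(list)
    for row in anonymized_rows:
        rows_by_key[key(row)].append(row)

    matches = {}
    for k, rows in rows_by_key.items():
        names = names_by_key.get(k, [])
        # Only an unambiguous match on both sides counts as re-identification.
        if len(rows) == 1 and len(names) == 1:
            matches[rows[0]["user_id"]] = names[0]
    return matches


if __name__ == "__main__":
    print(reidentify(anonymized, public))
```

In this toy data only user 1001 is re-identified. Users 1002 and 1003 share the same ZIP code, birth year, and sex, so the match is ambiguous and neither is named. The more fields a release contains, the fewer people share each combination, which is why detailed "anonymous" records are risky.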

One highlight of the talk was that while a birth certificate had just 15 or 16 fields more than a decade ago, it now has more than 200, many of them personally identifying. I was very surprised to learn that. It was also interesting to learn that there are about 3 million cameras in the US and 4 million in England. This means that the average person is caught on camera many times (I don’t remember the exact figures at the moment).

Malin also talked about some methods that he and his colleagues have developed to prevent re-identification from anonymous data. There also seem to be efforts to prevent people from being identified by their faces in images.

Data and personal information privacy is a very important issue. The growing use of the web in all parts of our lives, and the ease and sometimes fun of sharing our personal lives, overcome the fear of being identified. We are becoming more transparent, and we are encouraged to be so. More than a year ago, I asked “Share it all or not?” to point out how much we seem to like sharing our likes, dislikes, whereabouts, etc. Of course, it is fun and cool most of the time, but we should remember that we leave traces everywhere we go, and we have to be careful and intelligent. Any piece of data that seems unimportant and not personally identifying on its own can be combined with other data to complete the chain.
