I recently had the chance to sit down with M. Eric Johnson, Director of Tuck’s Glassmeyer/McNamee Center for Digital Strategies and Professor of Operations Management at the Tuck School of Business, Dartmouth College, to talk about his recent paper “Data Hemorrhages in the Health-Care Sector” (.pdf). The results of Dr. Johnson’s study were startling. For instance, his finding that a great deal of personal patient information is openly available on Peer-to-Peer (P2P) file sharing networks resulted in a great deal of media attention from publications dealing with privacy like SC Magazine, technology publications like Wired, and general interest publications like USA Today. We are thrilled that Dr. Johnson agreed to do a full interview with Security, Privacy, and The Law.
Because the interview is long and covers a number of important topics of interest, we will post the interview in three parts. The first installment of the interview follows below. In this part of the interview, Dr. Johnson discusses how he came to be interested in information security, how he conducted his research, and his findings about just how much personal health information is available on P2P networks.
AARON WRIGHT: Thank you for agreeing to talk to me. The first thing I wanted to ask you was if you would be willing to give us a little bit of an overview of your paper “Data Hemorrhages in the Health-Care Sector”. So, would you talk, just a little bit for people who haven’t read the paper, about your very broad findings?
DR. M. ERIC JOHNSON: Yes. And you said earlier you wanted me to give you some background of why we were doing this to begin with. Do you want me to start with that?
AARON: That would be great.
ERIC: I direct a center for digital strategies here at Tuck and the center is focused on enterprise computing and large organizations. So we see the world through the eyes of a Chief Information Officer of a Fortune 500 company; that’s our viewpoint. And we do a number of things related to that. We run a CIO roundtable, both in the US and in Europe, that meets three times a year here in the US and twice a year in Europe. And it’s CIOs of those kinds of companies. So in the US, it’s companies like Cisco Systems and Eaton and 3M and Staples, folks like that. In Europe, it’s BMW, BT and Nestle, ABB, folks like that. We’ve been doing that for some time now, seven or eight years, and our focus is thinking about how technology enables business strategy. About four or five years ago, we began to hear more and more from the CIOs that security and related privacy issues were increasingly affecting their ability to use IT in different kinds of environments, and it really had a big impact on a lot of the things they were doing, and so that’s really what started our interest in security. We run a security workshop with CISOs (chief information security officers) that meets once a year; our most recent one, from the fall, we held in conjunction with Senators Collins and Lieberman, sponsored by The I3P in Washington at the Senate Dirksen building. It’s the same kind of thing as the CIO roundtable: we pulled together a group of 30 or 40 CISOs of large organizations, including some healthcare providers, and we had a discussion about the pressing issues that they are facing. One of the issues that has really captured our imagination, my imagination, is not the technical security issues – that is, the hacks and so forth – but rather what I call the inadvertent disclosures.
I argue that many of the largest security breaches over the last few years have been what I would call, inadvertent disclosures – that is disclosures that resulted by mistakes in the organization or sloppiness in the organization that exposed customer data in one way, shape, or form.
So we started studying that in earnest maybe about three years ago and we’ve looked at a lot of different aspects of inadvertent disclosures – everything from lost laptops to misposting of information on the web, taking things that weren’t meant to be web-facing and inadvertently making them web-facing. But probably the most interesting area that we’ve looked at in the last couple of years has been the issue of inadvertent disclosures through file-sharing. It’s, I think, a largely misunderstood area. People, I think, are often shocked to realize that there are millions and millions of people that are participating in different types of file-sharing activities. People would think about Napster from 10 years ago, think, well, isn’t that dead and gone, but, of course, in the place of Napster grew up many, many different clients and networks that enable file sharing. And as the recording industry and others have gone after them one by one, it just seems to drive even more innovation in that space. So as soon as one gets closed, as soon as they’ve closed down eDonkey or Grokster, up grow five new ones. Last year it was LimeWire, this year it is FrostWire, and it just keeps growing and growing and growing. And most of the estimates show that the population has just continued to grow well past 10 million simultaneous users sharing music and other media all the time. But, of course, what we’ve learned over the years is that people often, inadvertently, share much more than their media files.
We actually started this project a couple of years ago looking at banking. We studied the top 30 US banks and found lots and lots of sensitive material being leaked out. That boiled up into a congressional hearing on that topic 18 months ago. It was a fascinating discussion because the people on the panel were myself, the CIO of the Department of Transportation, who had to explain how its chief privacy officer had leaked out a whole bunch of information, the CEO of LimeWire, who was on the hot seat, and a number of other interesting folks. What was clear there was the realization that there was a lot of stuff leaking out and not a lot had been done about it.
So, at that time, we decided to start looking at healthcare because we believed that the leaks we had seen in banking probably would be even more substantial in healthcare. And in fact, that is indeed what we found. We have lots of theories on that, why that’s true and so forth. But really, the way I see the study, this one on data hemorrhaging, is as what I call a window into the data that’s moving around within the US healthcare system. I had a lot of interest a couple of years ago just in peer-to-peer and the problems that file-sharing faced. I mean, I think it’s still a big problem, but that’s not really my interest anymore. Now I see peer-to-peer as just one window into the kinds of issues that organizations face in maintaining control over data. I think peer-to-peer was a particularly interesting window into the fragmented nature of the US healthcare system. Unlike banking, where you have 10 or 20 very large players that control most of the activities, and, in those banks, there is a lot of sophisticated IT, healthcare is much more fragmented.
We started by looking at the top ten publicly traded healthcare firms, and for each firm we created what we call a digital signature: basically, a set of terms related to each one of those firms that, if you Googled them, would probably take you back to that firm and, if you typed them into LimeWire, would likely lead you to things that might surprise you, in terms of documents and whatnot that are being inadvertently shared.
If you look under the hood of any one of those top ten publicly traded firms, you’ll find that each one of those is a roll-up of many, many small hospitals. So, you have lots and lots of individual hospitals that still operate under their original names in their communities. And so the names of those hospitals would be part of the digital signature that we were searching on, that we were looking for. We would use the names of local hospitals, or brands that they use in those markets, as pieces of the digital signature that we would search for, and then we would search across the major networks for file matches across that set of digital signatures. And of course, we found a lot of files related to them.
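The digital-signature idea described above can be illustrated with a minimal sketch. It assumes a signature is simply a set of lowercase terms (the firm name plus its local hospital names or brands) and that a file "matches" when any term appears in its name. The firm names, hospital names, and function names below are all hypothetical, invented for illustration; the interview does not describe the study's actual tooling or matching rules.

```python
# Hypothetical sketch: every firm, hospital, and function name here is
# invented for illustration and does not come from the study.

def build_signature(firm, local_names):
    """Return a set of lowercase search terms for one firm."""
    return {firm.lower()} | {name.lower() for name in local_names}


def match_signatures(filename, signatures):
    """Return the sorted firms whose signature has a term in the filename."""
    lowered = filename.lower()
    return sorted(
        firm
        for firm, terms in signatures.items()
        if any(term in lowered for term in terms)
    )


# Two made-up firms, each a roll-up of made-up local hospitals.
signatures = {
    "Example Health Corp": build_signature(
        "Example Health Corp",
        ["Riverside Community Hospital", "Lakeview Medical Center"],
    ),
    "Sample Care Inc": build_signature("Sample Care Inc", ["Hilltop Clinic"]),
}

hits = match_signatures(
    "Riverside Community Hospital billing 2008.xls", signatures
)
```

Plain substring matching is the simplest possible choice here; it also shows why a sample gathered this way would pull in many irrelevant files that still need manual review.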
Our goal in this initial study was really just to get a sense of the types of data that we might encounter, and so we did a sampling over a couple-week period where we collected about 3000 files that had some match against the digital signature of these ten concerns. And, as you might expect, out of those that we were just grabbing automatically to see what they looked like, roughly half of them (the paper describes the exact statistics in detail) were duplicate files that had been copied and were being moved around by different players. A big hunk of them were irrelevant to our real interest – that is, they might have something to do with healthcare, but they weren’t really what we were looking for. You find all kinds of funny things, of course. Medical students are sharing whole healthcare texts that have been digitized, so, of course, we saw those things running about, and journal articles and whatnot that were being shared.
But, once we sifted that down, at the end of the day we found a nice hunk, I forget what the exact number was, 200-300 files that were of interest to us, that had some match against these concerns and in the end contained some data we thought was interesting in some way, shape, or form. We went through and cataloged those files and categorized them by the organization, by whether they were a spreadsheet, a little database, a Word document, or a PDF file, and by what kinds of information they had: did they have patient information, did they have employee information. I mean, in some cases we’d find a spreadsheet with a bunch of employee information. It was just a working spreadsheet or whatever being used inside of one of these organizations. But, in some cases, and we describe that in the paper, there were alarming disclosures of patient information or employee information. We cataloged those and we did a simple analysis of what we found.
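As a rough illustration of the sifting and cataloging steps just described, the sketch below drops duplicate files and then tallies the remainder by file type. It assumes duplicates are byte-for-byte copies (detected with a SHA-256 hash of the content) and that file type can be read from the extension; both are simplifying assumptions, and the category table and function names are hypothetical rather than taken from the paper.

```python
import hashlib
from collections import Counter
from pathlib import PurePath

# Hypothetical mapping from extension to a catalog category.
CATEGORIES = {
    ".xls": "spreadsheet",
    ".mdb": "database",
    ".doc": "Word document",
    ".pdf": "PDF",
}


def deduplicate(files):
    """Keep the first (name, content) pair for each distinct content hash."""
    seen = set()
    unique = []
    for name, content in files:
        digest = hashlib.sha256(content).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append((name, content))
    return unique


def categorize(name):
    """Map a file name to a catalog category based on its extension."""
    return CATEGORIES.get(PurePath(name).suffix.lower(), "other")


def tally(files):
    """Count unique files per category after removing exact duplicates."""
    return Counter(categorize(name) for name, _ in deduplicate(files))
```

Hashing the content rather than comparing names matters because the same file often circulates under different names. Judging whether a file actually contains patient or employee information still requires someone to open and read it.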
Then over the next six months, we followed up with a few particularly promising organizations that seemed to be having more issues around leaks. One of the things you have to understand about peer-to-peer, that people have a hard time digesting, is that what you see on any given day or week or time changes dramatically as new members come and go, old members log in, share files, log out; unlike the web where a website may be persistent over a period of time and relatively stable. With peer-to-peer, the network is constantly changing and the individuals involved are constantly changing. And, so what you might find on a Tuesday at 2 o’clock can be very different than Thursday at 3. So we, in a rather casual way, over the next six months, sampled back in particularly promising areas of the network, places or terms we thought were particularly interesting. Or even individuals because when people share, they become members of these networks and many times from a music point of view, if I find that someone I know is sharing interesting music to me, if I go back and look, I’ll find they’re sharing more interesting music to me, so I’ll go back periodically to see what they’ve got. Just like if I go to check your blog, if I like your blog I might kind of syndicate, or whatever. It’s not quite as advanced as Google Reader but you can go back and browse, what’s called browsing a host, go back and look at a particular host that you found was interesting, just like if I went back to a blog that I found that was interesting. And so we did that and as we did that, of course, we found even more and more alarming and interesting leaks – some of which were quite extensive. 
Leaks where you would find a spreadsheet from one health care organization with over 20,000 patients, and for those patients, 82 fields of information, not just name, date, social security numbers, things like that, but a much more detailed set of information, including their employer, their insurance carrier, the doctor that was treating them, the diagnostic codes that were used. So some were very rich sources of information and they would come from health care organizations, they would come from partners in the health care supply chains, a collection agency, a group of anesthesiologists that may service a whole set of hospitals in a region, or a group of psychiatric providers who, again, may be servicing across a number of different healthcare organizations. Each one pointing to the fragmented nature of IT in these healthcare chains.
[Continued in Part 2]
* In the next part of this interview Dr. Johnson talks about why information in the healthcare sector is uniquely vulnerable and why that vulnerability represents a special set of challenges and dangers to providers and consumers alike.
- M. Eric Johnson’s homepage.
- “Data Hemorrhages in the Health-Care Sector” (.pdf)
- Angela Moscaritolo, SC Magazine, “Medical Data Leakage Rampant on P2P Networks” 2/12/09
- Kim Zetter, Threat Level Blog, Wired, “Academic Claims to Find Sensitive Medical Info Exposed on Peer-to-Peer Networks” 3/2/09
- Byron Acohido and Jon Swartz, USA Today “Data Scams Have Kicked into High Gear as Markets Tumble” 1/30/09