Interview with M. Eric Johnson, Part 3

In this, the third and final part of Security, Privacy and the Law’s interview with M. Eric Johnson (Part 1 may be found here and Part 2 is here), Dr. Johnson talks about why the fragmented nature of the American healthcare system is so dangerous and why he believes greater consolidation would better protect private information. He also talks about the specific problems associated with data security on peer-to-peer file sharing networks.

AARON WRIGHT: That makes good sense. So you keep coming back, it seems to me, to the sort of fragmented nature of the U.S. healthcare system, and you talked very early on about having a couple of theories about why inadvertent disclosures were so prevalent, you call “more prevalent.” I don’t want to put words in your mouth. Do you think that’s because of the fragmented nature of the healthcare sector?

DR. M. ERIC JOHNSON: Yeah. I really do. I mean from an IT perspective, the IT that is employed in the healthcare sector in the US – while there is some very sophisticated technology, what we would call islands of automation – the kind of enterprise IT used to actually kind of run the business is less sophisticated than in many other industries. The fragmented nature of the industry really drives that, but it’s not the only thing that drives that. The incentives for individual health care organizations to put large investments in enterprise IT have not been so clear. And, of course, I think that’s one of the things that the Obama administration is trying to change with this stimulus bill and the new legislation around that: to try to create incentives, financial incentives, for organizations to make investments in more enterprise IT.

I think one of the things I find really interesting about this, something I’ve been puzzling through myself in the last few weeks, is that, among the privacy advocates, there’s a lot of concern about universal health care records and electronic medical records in general. I think that you have to separate out a couple of issues there. One issue is just the security of healthcare information, and I would argue that moving towards enterprise healthcare IT will improve the security of healthcare information over the ad hoc way we track information now. There are some privacy advocates that will argue that paper is inherently more secure and they have one point, which is that, as information gets aggregated, the magnitude of disclosures could be much larger than stealing file folders individually. With that said, I think what they’re missing is that there’s a tremendous amount of information that’s already digital and I think they’re naïve to believe it’s not going to be more digital. In a very short time anyways, it’s all moving very quickly there. The question is how will it move and will it be moved into more secure kind of enterprise systems, or will it live in lots of smaller less secure applications? And, I would argue that moving towards enterprise or more enterprise IT format will enhance security in general across the U.S. healthcare system over time. Will it overnight? No. Then, will the transition be painful? Yes. But I think that I’d rather have them (enterprises) investing in security, and I trust their security a lot more than I would trust the security of a small office and their ability to manage my information in a spreadsheet.

But, then there’s the other issue, which I think is more legitimate from the privacy perspective, and that is what policy decisions will we make about this information once it is universally accessible? And that’s another question which has lots and lots of implications. As the information becomes more universally available in more standard formats, then the temptation will be to use that information, of course, and to use it for both good and maybe not so good reasons. So, everything from public health initiatives to allowing firms to use that information to market to me or present opportunities from a healthcare provider that maybe I’m not so excited about, or to allow employers, or the U.S. government for that matter, to use that information to maybe make decisions about my own healthcare or the way I’ll be treated that isn’t so exciting to me also. That’s a really large debate, but that’s not, in my mind, a security debate, that’s a policy debate and it’s easy to get them mixed up.

AARON: You would say from a purely security perspective then, that greater centralization of the health records would be an improvement over the status quo?

ERIC: And moving towards enterprise IT solutions, which have far larger investments in security than many of the applications that exist today.

AARON: Just to highlight this point, because I think it’s an interesting one, your paper reports being able to access several thousand patients’ data with relatively little effort. You say you think that there’s an incentive for criminals to use more effort than the effort you put into finding this data. Do you feel comfortable ball-parking what percentage of people’s data might already be available to a determined criminal?

ERIC: That would be a hard one to ball-park and probably way out on a limb for me to do something like that. But I also think that our little peer-to-peer experiment used a greater effort than a casual observer. I was working with a company called Tiversa and Tiversa has access to the major peer-to-peer networks, so we could see multiple networks at one time, and be able to track them over some period of time. Still, we weren’t expending very much effort and had a pretty small budget. But, you know, a more motivated individual would certainly be able to do more than we did and, of course, we were just looking at one little window, one little source of disclosure. There are many other ways to harvest data from healthcare organizations than peer-to-peer. And, so, I think that the data could be had and there is a lot of data out there.

AARON: So, I know, you are not very interested in peer-to-peer anymore. But, frankly, I am, so I want to talk to you a little bit about that, if that’s all right. So, why do you think it is that peer-to-peer is such a common way for this information to come out?

ERIC: Well, I think, probably there’s a couple of features of peer-to-peer that really facilitate this. This is a hypothesis of mine: if we had never killed Napster, if we had found a way to reform Napster to begin with, maybe we wouldn’t be having this conversation. But as I said earlier, the death of Napster and then the subsequent legal maneuvering of the recording industry and other content owners against peer-to-peer file sharing created tremendous innovation in this space. And with that innovation came lots and lots of different clients operating on different networks, each with their own motivations and interests. Some of them open source, some of them private companies. Many of them started as companies and then moved to open source over time. But in all those cases, you end up with lots of different clients. So you take the Gnutella network, there are many new clients that operate on Gnutella, and any particular user of one of those clients has different levels of sophistication and so forth, and so a lot of what we can ascertain is that many times it’s just user error: when they install the client, they end up exposing more information than they thought they would – their whole hard drive in some cases. Sometimes that’s just because of the ignorance of the user; other times it may be because the client itself was really designed in such a way to try to expose more information, either maliciously or, you know, to facilitate file sharing. The peer-to-peer file-sharing community wants to make it as easy as possible for people to share information, so many of the clients come up with wizards that look on your hard drive for media files, and if you store media files in and amongst other documents – for example, if you’ve got a bunch of stuff sitting in My Documents, media and otherwise – typically it’s going to suggest that you share the My Documents folder and bam, you are sharing everything.
And then, of course, there is malware, and there is a fair amount of malware growing in that community, so those things also end up causing users to expose.

AARON: One of the things we’ve been tracking on our blog is Congresswoman Mary Bono Mack has recently introduced a bill, I don’t know if you are aware of this, seeking to regulate peer-to-peer networks, which would require clear and conspicuous notice of what files the peer-to-peer networks would be sharing and informed consent of the user before the installation of the software and on the initial activation of the file sharing functions. Just based on my description, does that seem like it addresses some of those concerns?

ERIC: I am aware of some of these actions and I think they’re completely futile. They’re interesting attempts. Everyone sees the problem and they want to fix it but the reality is, if you look at the peer-to-peer community now, there’s so few real companies left. When we had this hearing 18 months ago, there was a company called LimeWire and they could grab a CEO by the neck and drag him in there. But, that’s just one little piece and even since that time, now we’ve got open source versions of LimeWire, FrostWire and others that are growing very quickly. Who are you going to regulate? And many of these are not U.S.-based anymore. They are completely open-source initiatives. I don’t see that it’s practical at all to try to get the different communities, these open-source communities, they are not going to adhere to the regulation and there’s no one to go grab by the neck and drag them into court and say, “change your program”. So I think it’s nice for her or for them to create some hype around this or whatever, but it’s not going to have any real effect.

AARON: Do you see any potential legal solutions or do you think this is something that’s got to be dealt with on the end of the user?

ERIC: I think there are two or three avenues to kind of try to reduce the peer-to-peer problem. Of course, user education, as you just alluded to, is a big piece. There are other avenues and some of them are pretty unpalatable. The internet service providers have been pointed to as one of the solutions. The security community, and in particular software that you can buy from security providers, is another place to look. But I think that in all those cases, I really think about it more from a business point of view. You know, I think for business, the real issue is to try to prevent data from getting into ad hoc formats that then could easily be leaked out, whether it’s through peer-to-peer or lost laptops or any of these other ways. And to say that we can go fix this peer-to-peer problem, I think it’s more a symptom. I don’t think we are going to fix it per se and even if we could, then there would be some other ways that the information can leak out. The real issue is better access control around the information and better control over the data from a business point of view.

AARON: Okay. That’s all the questions I had and we are about out of time. So, anything else you want to add before we go?

ERIC: Well, I think that the last thing I would say is that the next couple of years are going to be very interesting in this space, between the investments in healthcare and the new administration’s positioning around security. Melissa Hathaway has got her work cut out for her with a lot of interesting issues coming to bear. But I am quite optimistic we will make some good progress on information in the supply chain of any business.  I think security will radically change over the next few years.

AARON: Thank you very much. I really appreciate your time.

Interview with M. Eric Johnson, Part 2

In this, the second part of Security, Privacy and the Law’s three-part interview with M. Eric Johnson (begun here), Dr. Johnson talks about why he thinks the healthcare sector is uniquely vulnerable to security breaches and what special problems that vulnerability poses.

DR. M. ERIC JOHNSON: You know, if I step back and ask what do I think is really interesting out of what we saw, I think there are two or three things. The first thing is that the fragmented nature of the US healthcare system means that there are many players, and some of them are very unsophisticated from an IT perspective. There are small practices, doctors who don’t employ fleets of IT people and so there are, of course, elements of weakness.

In the debate that is going on right now around electronic healthcare records, one of the things I find most amusing is this notion that records aren’t already digitized. I mean, most of our records are already quickly moving into digital format, even in very small practices. You know, people somehow have this vision of those file folders lined up in the offices. And sure they exist in plenty of small practices, but alongside them, most practices, even very small practices, have some IT and they’re using it to do their patient billing, they’re tracking some basic amount of information about me through that. Maybe not all my information. They haven’t maybe digitized all my images or radiology or so forth but they’ve digitized parts of those. And what you find is a huge continuum on that and that information, of course, does get passed around in this healthcare supply chain in what I call ad hoc file formats. So, rather than what you might see in a bank, enterprise IT – Oracle, or SAP, or some Microsoft enterprise-level system – a lot of the data ends up in spreadsheets and small Access databases, other documents and whatnot, which can easily and do easily get passed around.

What I find interesting about that is that that, in many ways, is an underlying root problem of these inadvertent disclosures, whether or not they show up on a peer-to-peer file sharing network. They may or may not (end up on P2P), depending on the users. But they end up on laptops, they end up on Zip drives, they end up on all kinds of other media, which gets lost or disposed of improperly and every one of those is a potential inadvertent leak source.

AARON WRIGHT: So that’s the first of the important things you say you take from the study. What were the others?

ERIC: I think that one’s a pretty interesting issue. The second one, that is equally interesting to me, is the mischief that can be created from this kind of information. You know, we spent, as I said, a good deal of time studying the banking sector and in the banking sector, you worry a lot about people’s names, social security numbers, Visa or other account numbers being leaked. Of course, a leaked Visa number with my name and security code is very fungible. That is, I can create financial costs from that very easily and at relatively low costs and low sophistication from the criminal’s point of view, which of course has attracted a huge industry of criminal elements that are doing that.

In healthcare, what’s true is that first of all, there is a criminal element. It’s growing. We know it’s growing. There are different types of frauds that are happening that we can talk about, it’s kind of the third interesting area of the three takeaways I would say. But finishing off this second idea. 

What I think is interesting in healthcare is that the type of data that is leaking is similar to that of banking – name, date, social security number, these kinds of things – things that could be used to create traditional financial fraud. Because if I’ve got your social security number and your birthday and a bunch of personal information about you, I could create frauds where I open accounts or whatnot in your name. But I think what’s far more alarming, from a consumer point of view, is that the data is far more personal. That is, it goes well beyond name, date and social security number. The kinds of things we see are related to my doctor, my diagnoses, maybe my employer. Because of the (healthcare) financial web where you’ve got some very significant players – my employer is a big player, my healthcare provider, my insurance provider is a big player – typically, those pieces of data often are kept together with information about me and so suddenly it’s not just me but it’s my employer, my healthcare provider, my doctor, my insurance provider, that are all, in some sense, part of the breach. And in some ways you can say that the breach affects them too. If I’m a large employer and a couple of thousand employees have a disclosure but I’m listed with them, the disclosure is also against me. And then probably the most alarming is that you’d see some relatively detailed protected healthcare information, diagnoses and so forth, that I may not want disclosed for obvious reasons. So, that second takeaway is just the nature, the richness of the data and the fact that, to go back to the first kind of takeaway, you’ve got this ad hoc file format flying around with some pretty rich data, far richer than you might see in the financial world.

So then, getting to that last one, the third one, which is how does that create fraud and what’s going on in that space. There are, I would say, three types of fraud that are prevalent in the healthcare world. The first is kind of good old-fashioned medical fraud, which typically involves billing payers (Medicare/Medicaid, other insurance payers) for treatments that likely were never rendered or exaggerating those treatments for individuals. A lot of that fraud has been around for a long time; Medicare/Medicaid has been fighting that for years. Some estimates say that 10 percent of US healthcare expenditures are really fraud. Those are staggeringly large numbers, when you think about the trillions of dollars that get spent on healthcare in the US. But, much of that has been around for a long time. These kinds of disclosures facilitate that, but there are plenty of other ways to perpetrate it. The second is medical identity theft, which involves, typically, treatment. In this case, it’s getting treatment under some other individual’s identity. The most common approach for criminals to create wealth from that is to steal identities and then package them up and resell them to people who need access to U.S. healthcare, people who don’t have insurance, illegal immigrants, whatnot. There have been a number of cases, some of which have already been in prosecution, where identities have been sold to people who need access to healthcare, and then they go get healthcare as Eric Johnson for a while. If they have my insurance information and identity information about me, it’s relatively easy for them to gain access to healthcare.

The alarming thing about that is not only is there fraud that goes on there, but when they do that, they are changing my medical records in those places. So, suddenly you get lots of data accumulating in a medical record that is unrelated to me. And when I talk to docs about this, they’ll quickly share stories of “we always kind of scratch our heads when someone rolls into the emergency room and we look up in their healthcare record and see that the last time they were here they weighed 200 lbs. and now they weigh 125 lbs. and they didn’t lose weight. These are two different people but what are we going to do about it. At the moment, we are treating them and that’s what it’s about.” 

The last kind of area that we see around fraud, which is some of the most sophisticated fraud, well it can be unsophisticated. The unsophisticated types look to basically find ways to get prescription drugs to resell and they may do that at a very low level so that if I can get individuals’ identities and just get whatever, extra prescriptions for Viagra, OxyContin, then I can go resell that. At a larger level, the more sophisticated version typically involves using identities that have been stolen, sometimes what we call synthetic identities, because sometimes they’ll use parts of real identities with other fabricated pieces of information to bill payers fraudulently for people who don’t exist, deceased individuals, and all kinds of things. When I say they are more elaborate, typically these things have to be built up over time and built around some bit of a real medical system. That is, maybe it’s a clinic that actually is providing care to some group of people with doctors and whatnot, but in some sense the clinic is a fabrication or a fraud, the back end of the clinic is all designed to commit fraud and so they have some element of realism to make them seem legitimate, and to make it easier for them to kind of commit these frauds, and these kinds of organizations grow over time, many times years, before they’re caught, and they are consumers of identities because identities fuel their fraud. And so identities can be packaged up and sold to them, and then used to commit the frauds. But, as I started saying at the top of this, if you think about all three of these that I have mentioned, they all require more effort and sophistication than typical financial fraud. Of course, the criminals go to the easier house first, right?
There’s a kind of a rolling belief that, when the financial fraud becomes harder and harder, we will see more fraud in healthcare and there’s lots of reasons to believe that, largely because of the data practices that I’m talking about that fuel it and also because many of the safeguards that have grown up in the banking sector don’t yet exist in the healthcare sector. That is, we don’t have Big Brother Visa looking out for individuals in the same way. Today Visa is so good, I would guess that many of your readers have had their Visa cards compromised and often they learn that from Visa themselves – a call saying, “Did you make this purchase?” and many times they call exceedingly quickly, within hours of the fraud and immediately the card is shut off, we move on to a new number and the consumers are out very little, if anything, other than the aggravation of the event. In healthcare, there aren’t the kind of agencies or organizations with large fraud practices and algorithms that are tracking this and watching for it. It’s more likely the patients, or consumers themselves, may notice some strange billing and wonder what went wrong. Many people in health care worry that many patients don’t have a huge incentive to really chase those down, and maybe don’t understand their statements well enough to even notice when frauds are being committed against them. Also, the amounts of money that can be fraudulently obtained through healthcare could be much larger. There aren’t kind of preset limits and whatnot, like Visa might have, and the frauds, because they involve identities, sometimes are harder to stop over time. I can change my Visa number tomorrow and then Visa can shut the number down and it’s over, and very little fraud can be committed against a defunct Visa card, but my information related to my identity, like my social security number and whatnot, could be used over and over again to try to commit different types of medical fraud.
So, many of us believe that we will see more fraud in the healthcare sector over the next ten years. 

AARON: That’s actually something I did want to talk to you about. Your paper indicates that this type of crime is relatively new and it’s not something we have a particularly good handle on. I was wondering what you predict those trends are going to look like. About how many of these types of medical identity thefts and medical fraud in general do you think are going on now, and ten years from now what do you see the trend being? 

ERIC: What is kind of funny in some ways is that we say it’s “new,” but in fact, as I said earlier, medical fraud, particularly fraudulently billing Medicare and Medicaid, is an old crime, and Medicare/Medicaid has been fighting it for years. But that type of fraud usually involved corrupt organizations that were just overbilling, typically for real patients, and so there’s all kinds of effort and work that goes into auditing health care systems. Medicare and Medicaid are involved in that to try to prevent that type of fraud. That has been around a long time and, as I said, they have been as high as ten percent, big numbers. But these newer innovations, I would say, around medical identity theft are, in fact, much newer. The numbers are not available; there really are very few good numbers. The FTC has been tracking some complaints, but we all know that they only ever hear about or see a very small fraction of what happens, and so, there really aren’t any good numbers out there. It’s left to kind of people’s imaginations what the extent of the problem we’re having is and how quickly it’s growing. I think the data is so suspect at this moment that I would be hard-pressed to really believe the numbers that are around at the moment. I think it’s from the anecdotal evidence just from individuals in healthcare organizations that we see it and wonder where this is really going. But, we think it’s going to grow.

AARON: One of the things you mentioned is that there is some difficulty of monitoring and I was hoping you would point out for us why you think this monitoring is so difficult. Do you think it’s a lack of awareness or is it a combination of factors? What do you think is going on there?

ERIC: Some monitoring in what way? Just to make sure I understand.

AARON: Sure. The difficulty of monitoring both whether or not someone is currently the victim of medical identity theft and, in a broader sense, monitoring how many of these types of thefts are going on.

ERIC: Yeah. To date, I think what’s – I’ll make the comparison again back to the financial sector, and in the financial sector, of course, we not only have Big Brother Visa but we also have a few very powerful credit agencies that are tracking your credit worthiness and your financial performance across all your financial undertakings. There’s really nothing like that in healthcare other than individual payers, who would be tracking your health care expenditures for their own purposes and, of course, they’re watching for fraud, so Blue Cross Blue Shield is watching for fraud within its own system as is Medicare and Medicaid, but there’s nothing that spans those organizations, that is the Equifax, or whatever, of the healthcare world that would be able to see fraud across different sources. So there’s one structural difference.

You mention awareness. I think, as I mentioned earlier, at a patient or consumer level, I think consumers probably spend far more time scrutinizing their bank statements and credit card statements than they do statements from their healthcare providers, and, to be honest, a lot of the HMOs and so forth, the way they’re structured now, they’ve created a situation where there’s really no reason for patients to scrutinize. If I go pay my co-pay and just move on, there’s really no reason for me to kind of be looking at any of those statements, and I may not even be getting statements, in fact. So, there’s plenty of reason to believe that there’s less awareness on every dimension, less overall monitoring of the healthcare dollars that are being expended on my behalf.

 [Continued in part 3]

* In part three Dr. Johnson talks about why the fragmented nature of the American healthcare system is so dangerous and why he believes greater consolidation would better protect private information. He also talks about the specific problems associated with data security on peer-to-peer file sharing networks.

Interview with M. Eric Johnson, author of "Data Hemorrhages in the Health-Care Sector"

I recently had the chance to sit down with M. Eric Johnson, Director of Tuck’s Glassmeyer/McNamee Center for Digital Strategies and Professor of Operations Management at the Tuck School of Business, Dartmouth College, to talk about his recent paper “Data Hemorrhages in the Health-Care Sector” (.pdf).   The results of Dr. Johnson’s study were startling.  For instance, his finding that a great deal of personal patient information is openly available on Peer-to-Peer (P2P) file sharing networks resulted in a great deal of media attention from publications dealing with privacy like SC Magazine, technology publications like Wired, and general interest publications like USA Today.  We are thrilled that Dr. Johnson agreed to do a full interview with Security, Privacy, and The Law.

Because the interview is long and covers a number of important topics of interest, we will post the interview in three parts.  The first installment of the interview follows below.  In this part of the interview, Dr. Johnson discusses how he came to be interested in information security, how he conducted his research, and his findings about just how much personal health information is available on P2P networks.
 

AARON WRIGHT: Thank you for agreeing to talk to me. The first thing I wanted to ask you was if you would be willing to give us a little bit of the overview of your paper “Data Hemorrhages in the Health-Care Sector”. So, would you talk, just a little bit for people who haven’t read the paper, about your very broad findings?

DR. M. ERIC JOHNSON: Yes.  And you said earlier you wanted me to give you some background of why we were doing this to begin with.  Do you want me to start with that?

AARON:  That would be great.

ERIC: I direct a center for digital strategies here at Tuck and the center is focused on enterprise computing and large organizations. So we see the world through the eyes of a Chief Information Officer of a Fortune 500 company, that’s our viewpoint. And we do a number of things related to that. We run a CIO roundtable, both in the US and in Europe, that meets three times a year here in the US and twice a year in Europe. And it’s CIOs of those kinds of companies. So in the US, it’s companies like Cisco Systems and Eaton and 3M and Staples, folks like that. In Europe, it’s BMW, BT and Nestle, ABB, folks like that. We’ve been doing that for some time now, seven or eight years, and our focus is thinking about how technology enables business strategy. About four or five years ago, we began to hear more and more from the CIOs that security and related privacy issues were beginning to more and more impact their ability to use IT in different kinds of environments and it really had a big impact on a lot of the things they were doing and so that’s really what started our interest in security. We run a security workshop with CISOs (chief information security officers) that meets once a year; our most recent one, from the fall, we held in conjunction with Senators Collins and Lieberman, sponsored by The I3P in Washington at the Senate Dirksen building. It’s the same kind of thing as the CIO roundtable: we pulled together a group of 30 or 40 CISOs of large organizations, including some healthcare providers, and we had a discussion about the pressing issues that they are facing. One of the issues that has really captured our imagination, my imagination, is not the technical security issues – that is, the hacks and so forth – but rather what I call the inadvertent disclosures.

I argue that many of the largest security breaches over the last few years have been what I would call inadvertent disclosures – that is, disclosures that resulted from mistakes in the organization or sloppiness in the organization that exposed customer data in one way, shape, or form.

So we started studying that in earnest maybe about three years ago and we’ve looked at a lot of different aspects of inadvertent disclosures – everything from lost laptops to misposting on the web of information, taking things that weren’t meant to be web-facing and inadvertently making them web-facing. But probably the most interesting area that we’ve looked at in the last couple of years has been the issue of inadvertent disclosures through file-sharing. It’s a, I think, largely very misunderstood area. People, I think, are often shocked to realize that there are millions and millions of people that are participating in different types of file-sharing activities. People would think about Napster from 10 years ago, think well isn’t that dead and gone, but, of course, in the place of Napster grew up many, many different clients and networks that enable file sharing. And as the recording industry and others have gone after them one by one, it just seems to drive even more innovation in that space. So as soon as one kind of gets closed, as soon as they’ve closed down eDonkey or Grokster, up grow five new ones. Last year it was LimeWire, this year it’s FrostWire, and it just keeps growing and growing and growing. And most of the estimates show that the population just has continued to grow well past 10 million simultaneous users sharing music and other media all the time. But, of course, what we’ve learned over the years is that people often, inadvertently, share much more than their media files.

We actually started this project a couple of years ago looking at banking. We studied the top 30 US banks and found lots and lots of sensitive material being leaked out. That boiled up into a congressional hearing on that topic 18 months ago. It was a fascinating discussion because the people on the panel were myself; the CIO of the Department of Transportation, who had to explain how its chief privacy officer had leaked out a whole bunch of information; the CEO of LimeWire, who was on the hot seat; and a number of other interesting folks. What was clear there was the realization that there was a lot of stuff leaking out and not a lot had been done about it.

So, at that time, we decided to start looking at healthcare because we believed that the leaks we had seen in banking probably would be even more substantial in healthcare. And in fact, that is indeed what we found. We have lots of theories on why that’s true and so forth. But really, the way I see the study, this one on data hemorrhaging, is as what I call a window into the data that’s moving around within the US healthcare system. I had a lot of interest a couple of years ago just in peer-to-peer and the problems that file sharing faced. I mean, I think it’s still a big problem, but that’s not really my interest anymore. Now I see peer-to-peer as just one window into the kinds of issues that organizations face in maintaining control over data. I think peer-to-peer was a particularly interesting window into the fragmented nature of the US healthcare system. Unlike banking, where you have 10 or 20 very large players that control most of the activities and, in those banks, there is a lot of sophisticated IT, healthcare is much more fragmented.

We started by looking at the top ten publicly traded healthcare firms, and for each firm we created what we call a digital signature: basically, a set of terms related to that firm that, if you Googled them, would probably take you back to that firm and, if you typed them into LimeWire, would likely lead you to things that might surprise you, in terms of documents and whatnot that are being inadvertently shared.

If you look under the hood of any one of those top ten publicly traded firms, you’ll find that each one is a roll-up of many, many small hospitals. So, you have lots and lots of individual hospitals that still operate under their original names in their communities. And so the names of those hospitals would be part of the digital signature that we were searching on. We would use the names of local hospitals, or brands that they use in those markets, as pieces of the digital signature, and then we would search across the major networks for file matches against that set of digital signatures. And of course, we found a lot of files related to them.
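As an illustration only, the signature-matching idea described above might be sketched as follows. This is not the study's actual tooling; the firm names, hospital names, and the `matching_firms` helper are all hypothetical, and the sketch assumes matching is done on whole words in a file name.

```python
import re

# Hypothetical firms and signature terms, invented for illustration only.
SIGNATURES = {
    "Example Health Corp": ["example health", "riverside general hospital", "ehc"],
    "Sample Care Inc": ["sample care", "lakeview medical center"],
}


def normalize(text):
    """Lowercase the text and collapse punctuation/underscores into single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def matching_firms(filename, signatures=SIGNATURES):
    """Return the firms whose signature terms appear as whole words in the file name."""
    padded = f" {normalize(filename)} "
    hits = []
    for firm, terms in signatures.items():
        # Padding with spaces keeps a short term like "ehc" from matching inside "techcrunch".
        if any(f" {normalize(term)} " in padded for term in terms):
            hits.append(firm)
    return hits
```

Under these assumptions, a file named after a local hospital brand is attributed to its parent firm, while an unrelated media file matches nothing.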

Our goal in this initial study was really just to get a sense of the types of data that we might encounter, and so we did a sampling over a couple-week period where we collected about 3000 files that had some match against the digital signatures of these ten concerns. And, as you might expect, out of those that we were just kind of grabbing automatically to see what they looked like, roughly half of them (the paper describes the exact statistics in detail) were duplicate files that had been copied and were being moved around by different players. A big hunk of them were irrelevant to our real interest – that is, they might have something to do with healthcare, but they weren’t really what we were looking for. You find all kinds of funny things, of course. Medical students are sharing whole healthcare texts that have been digitized, so, of course, we saw those things running about, and journal articles and whatnot that were being shared.
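One common way to spot copies of the same file circulating under different names is to compare content hashes. The sketch below assumes that approach purely for illustration; the interview does not say how duplicates were actually identified, and `unique_by_content` is a hypothetical helper.

```python
import hashlib


def unique_by_content(files):
    """Keep the first name seen for each distinct file content.

    `files` is an iterable of (name, content_bytes) pairs.
    """
    seen = set()
    unique = []
    for name, content in files:
        digest = hashlib.sha256(content).hexdigest()
        if digest not in seen:
            # First time this exact content appears; later copies are skipped.
            seen.add(digest)
            unique.append(name)
    return unique
```

Two files with different names but identical bytes collapse to a single entry, which is how a sample of thousands of collected files can shrink by about half.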

But once we sifted that down, at the end of the day, we found a nice hunk, I forget what the exact number was, 200 to 300 files that were of interest to us, that had some match against these concerns and in the end contained some data we thought was interesting in some way, shape, or form. We went through and cataloged those files and categorized them by the organization; by whether they were a spreadsheet, a little database, a Word document, or a PDF file; and by what kinds of information they had: did they have patient information, did they have employee information. I mean, in some cases we’d find a spreadsheet with a bunch of employee information. It was just a working spreadsheet or whatever being used inside of one of these organizations. But in some cases, and we describe that in the paper, there were alarming disclosures of patient information or employee information. We cataloged those and we did a simple analysis of what we found.
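A simple tally of that kind of catalog might look like the sketch below. The record layout, the extension-to-type mapping, and the `catalog` function are assumptions made for illustration, not the categories or tools used in the paper.

```python
from collections import Counter
from pathlib import PurePath

# Assumed mapping from file extension to the document types mentioned above.
FILE_TYPES = {
    ".xls": "spreadsheet",
    ".mdb": "database",
    ".doc": "Word document",
    ".pdf": "PDF",
}


def catalog(records):
    """Count files by document type and by the kind of information they contain.

    Each record is a dict with hypothetical keys 'organization', 'filename',
    and 'info' (a set of labels such as 'patient' or 'employee').
    """
    by_type = Counter()
    by_info = Counter()
    for record in records:
        ext = PurePath(record["filename"]).suffix.lower()
        by_type[FILE_TYPES.get(ext, "other")] += 1
        by_info.update(record["info"])
    return by_type, by_info
```

The two counters give the kind of simple breakdown described: how many files of each type were found, and how many exposed patient versus employee information.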

Then over the next six months, we followed up with a few particularly promising organizations that seemed to be having more issues around leaks. One of the things you have to understand about peer-to-peer, that people have a hard time digesting, is that what you see on any given day or week or time changes dramatically as new members come and go and old members log in, share files, and log out, unlike the web, where a website may be persistent over a period of time and relatively stable. With peer-to-peer, the network is constantly changing and the individuals involved are constantly changing. And so what you might find on a Tuesday at 2 o’clock can be very different from Thursday at 3. So we, in a rather casual way, over the next six months, sampled back in particularly promising areas of the network, places or terms we thought were particularly interesting. Or even individuals, because when people share, they become members of these networks, and many times, from a music point of view, if I find that someone is sharing music that’s interesting to me, if I go back and look, I’ll find they’re sharing more music that’s interesting to me, so I’ll go back periodically to see what they’ve got. Just like if I go to check your blog: if I like your blog, I might kind of syndicate it, or whatever. It’s not quite as advanced as Google Reader, but you can go back and browse, what’s called browsing a host, go back and look at a particular host that you found was interesting, just like if I went back to a blog that I found interesting. And so we did that, and as we did that, of course, we found more and more alarming and interesting leaks, some of which were quite extensive.

These were leaks where you would find a spreadsheet from one health care organization with over 20,000 patients and, for those patients, 82 fields of information: not just name, date, social security numbers, things like that, but a much more detailed set of information, including their employer, their insurance carrier, the doctor that was treating them, and the diagnostic codes that were used. So some were very rich sources of information, and they would come from health care organizations, or they would come from partners in the health care supply chains: a collection agency, a group of anesthesiologists that may service a whole set of hospitals in a region, or a group of psychiatric providers who, again, may be servicing a number of different healthcare organizations. Each one points to the fragmented nature of IT in these healthcare chains.

[Continued in Part 2]

* In the next part of this interview, Dr. Johnson talks about why information in the healthcare sector is uniquely vulnerable and why that vulnerability represents a special set of challenges and dangers to providers and consumers alike.

Links: