Research Papers

Privacy-Preserving Data Mining of Medical Data Using Data Separation-Based Techniques

Authors
  • Gang Kou
  • Yi Peng
  • Yong Shi
  • Zhengxin Chen

Abstract

Data mining is concerned with the extraction of useful knowledge from various types of data. Medical data mining has recently become a popular data mining topic. Compared with other data mining areas, medical data mining has some unique characteristics. Because medical records relate to human subjects, privacy concerns are taken more seriously than in other data mining tasks. This paper applies data separation-based techniques to preserve privacy in the classification of medical data. We take two approaches to protect privacy: one vertically partitions the medical data and mines the partitions at multiple sites; the other horizontally splits the data across multiple sites. In the vertical partition approach, each site uses a subset of the attributes to compute its results, and the distributed results are assembled at a central trusted party using a majority-vote ensemble method. In the horizontal partition approach, the records are distributed among several sites; each site computes results from its own data, and a central trusted party is responsible for integrating these results. We implement both approaches using medical datasets from the UCI KDD Archive and report the experimental results.
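The two data-separation schemes described in the abstract can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The nearest-centroid base classifier, the helper names (`NearestCentroid`, `majority_vote`, `vertical_partition_predict`, `horizontal_partition_predict`), the round-robin row split, and the use of majority voting to integrate the horizontally partitioned results are all hypothetical choices: the abstract names majority voting only for the vertical approach and does not specify the base classifier. The sketch also assumes every site holds the class labels of its own records.

```python
from collections import Counter
from typing import Dict, List, Sequence

Row = List[float]


class NearestCentroid:
    """Hypothetical stand-in base classifier: predicts the class whose mean vector is closest."""

    def __init__(self) -> None:
        self.centroids: Dict[int, List[float]] = {}

    def fit(self, X: Sequence[Row], y: Sequence[int]) -> "NearestCentroid":
        sums: Dict[int, List[float]] = {}
        counts: Counter = Counter()
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for j, v in enumerate(row):
                acc[j] += v
            counts[label] += 1
        self.centroids = {c: [s / counts[c] for s in acc] for c, acc in sums.items()}
        return self

    def predict_one(self, row: Row) -> int:
        # Squared Euclidean distance to each class centroid
        def dist(c: int) -> float:
            return sum((a - b) ** 2 for a, b in zip(row, self.centroids[c]))
        return min(self.centroids, key=dist)


def majority_vote(labels: Sequence[int]) -> int:
    # On a tie, most_common returns the label encountered first (i.e. the earliest site)
    return Counter(labels).most_common(1)[0][0]


def project(rows: Sequence[Row], cols: Sequence[int]) -> List[Row]:
    # Keep only the attribute columns a given site is allowed to see
    return [[r[j] for j in cols] for r in rows]


def vertical_partition_predict(X_train, y_train, X_test, column_groups):
    """Each site holds all records but only its own attribute columns."""
    site_predictions = []
    for cols in column_groups:
        model = NearestCentroid().fit(project(X_train, cols), y_train)
        site_predictions.append([model.predict_one(r) for r in project(X_test, cols)])
    # Central trusted party: majority vote per test record; only labels leave each site
    return [majority_vote(votes) for votes in zip(*site_predictions)]


def horizontal_partition_predict(X_train, y_train, X_test, n_sites):
    """Each site holds a disjoint subset of records with all attributes."""
    site_predictions = []
    for s in range(n_sites):
        idx = range(s, len(X_train), n_sites)  # round-robin split (an assumption)
        model = NearestCentroid().fit([X_train[i] for i in idx], [y_train[i] for i in idx])
        site_predictions.append([model.predict_one(r) for r in X_test])
    # Central trusted party integrates the local results, here by majority vote (an assumption)
    return [majority_vote(votes) for votes in zip(*site_predictions)]


if __name__ == "__main__":
    # Toy data for illustration only (not from the UCI KDD Archive)
    X_train = [[0, 0, 0, 0], [0, 1, 0, 1], [1, 0, 1, 0], [9, 9, 9, 9], [9, 8, 9, 8], [8, 9, 8, 9]]
    y_train = [0, 0, 0, 1, 1, 1]
    X_test = [[0.5, 0.5, 0.5, 0.5], [8.5, 8.5, 8.5, 8.5]]
    print(vertical_partition_predict(X_train, y_train, X_test, [[0, 1], [2, 3], [0, 3]]))  # → [0, 1]
    print(horizontal_partition_predict(X_train, y_train, X_test, n_sites=2))  # → [0, 1]
```

In both functions only predicted labels reach the trusted party; raw attribute values stay at their site. An odd number of sites avoids voting ties; with an even number, a tie falls back to the first site's prediction.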
Year: 2007
Volume: 6
Page/Article: S429-S434
DOI: 10.2481/dsj.6.S429
Published on Aug 3, 2007
Peer Reviewed