Optik

Volume 126, Issue 24, December 2015, Pages 5188-5193

Local-weighted Citation-kNN algorithm for breast ultrasound image classification

https://doi.org/10.1016/j.ijleo.2015.09.231

Abstract

A new multiple-instance learning (MIL) algorithm that combines the local distribution features of samples with Citation-kNN is proposed. Two local distribution features are considered: local distance and local sparseness. The voters are weighted according to their local distribution, and the different weighting schemes and their combinations are applied to the Musk benchmark data sets and to breast ultrasound (BUS) images. In the Musk data sets, a bag represents a molecule, and the instances in a bag represent low-energy conformations of that molecule. For BUS image classification, an image is viewed as a bag and its sub-regions as the instances of the bag, so that the image classification problem is converted into a MIL problem. In comparison with Citation-kNN and other methods, the proposed algorithm demonstrates competitive classification accuracy and adaptability.

Introduction

Multiple-instance learning (MIL) addresses learning problems with incomplete information about the labels of the training data. In traditional supervised learning, each training example is represented by a fixed-length feature vector with a known label. In MIL, each example is called a bag and is represented by multiple instances; in other words, each example is represented by a variable-length set of feature vectors. Labels are provided only for the training bags; the labels of the individual instances are unknown. The task is to learn a model that predicts the labels of new bags [1], [2], [3].
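As a concrete illustration (not taken from the paper), such a data set can be represented as a list of variable-length instance arrays with a single label per bag; the bag sizes and feature dimension below are arbitrary:

```python
import numpy as np

# Hypothetical MIL data set (values are arbitrary): each bag is a
# variable-length array of 4-D instance vectors, labeled only at bag level.
rng = np.random.default_rng(0)
bags = [rng.normal(size=(n, 4)) for n in (3, 5, 2)]   # three bags
bag_labels = np.array([1, 0, 1])                      # instance labels are unknown
```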

The first work on MIL was done by Dietterich et al. while investigating the problem of drug activity prediction [4]. The problem was to determine whether a given drug molecule binds strongly to a target protein, and the axis-parallel rectangle (APR) algorithm was introduced to solve it. Since then, many MIL methods have been developed for a wide range of applications, such as diverse density (DD) for stock market prediction [5], natural scene classification [6] and content-based image retrieval [7], the MIL support vector machine for image classification [8], and Citation-kNN for web mining [9].

Citation-kNN is an improved kNN algorithm adapted to the MIL setting. It is a lazy learning method, deferring the processing of training data until a query needs to be answered [11]. It borrows the concepts of citation and reference from the scientific literature: not only the neighboring bags of a bag b are taken into account, but also the bags that count b as a neighbor [9]. The Hausdorff distance [10] is used to measure distances among bags, shifting the MIL problem from discriminating instances to discriminating bags [3].
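For two bags A and B, the Hausdorff distance can be computed from the pairwise instance distances. The sketch below is a minimal NumPy formulation of the classical (max-min) form; Citation-kNN implementations often use a minimal-Hausdorff variant, and this excerpt does not say which variant the paper adopts:

```python
import numpy as np

def hausdorff(A, B):
    """Classical (max-min) Hausdorff distance between two bags,
    each an (n_instances, n_features) array."""
    # All pairwise Euclidean distances between instances of A and B.
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    h_ab = d.min(axis=1).max()   # farthest A-instance from its nearest B-instance
    h_ba = d.min(axis=0).max()   # and vice versa
    return max(h_ab, h_ba)
```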

However, in the Citation-kNN algorithm the contribution of each training bag to the classification is either 0 or 1; the distribution of the bags in feature space is not considered. In most cases, however, this distribution affects the final decision or the decision confidence.

To address this, an improved Citation-kNN algorithm called locally weighted Citation-kNN (LWCKNN) is proposed in this paper. Distribution features, such as the relative distance and sparseness among bags, are taken into account. The algorithm is tested on the Musk data sets and on breast ultrasound images, and it shows better results than the traditional Citation-kNN.

The rest of the paper is organized as follows. The Citation-kNN algorithm is reviewed in Section 2. The LWCKNN algorithm is presented in Section 3. The experimental results are shown in Section 4. Finally, the discussion and conclusions are given in Section 5.


Citation-kNN

The standard k-nearest neighbor algorithm (k-NN) classifies a test sample based on the k closest training examples in feature space: the test sample is assigned to the class occurring most frequently among its k nearest neighbors. Usually, the Euclidean distance is used to measure the closeness of samples. For two samples a and b in feature space, the distance between them can be written as

$\mathrm{Dist}(a, b) = \lVert a - b \rVert$

But for MIL, the distance between bags cannot be computed directly in this way, since each bag contains a variable number of instances; the Hausdorff distance between the instance sets is used instead.
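Combining the two ideas, a bag-level k-NN can rank the training bags by such a bag distance and take a majority vote among the k nearest. A minimal sketch, reusing the hausdorff() function from the earlier example and assuming non-negative integer class labels:

```python
def knn_predict(test_bag, train_bags, train_labels, k=3):
    """Bag-level k-NN sketch: rank training bags by hausdorff() distance
    to the test bag and take a majority vote among the k nearest."""
    dists = np.array([hausdorff(test_bag, b) for b in train_bags])
    nearest = np.argsort(dists)[:k]
    return np.bincount(train_labels[nearest]).argmax()
```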

Locally weighted Citation-kNN (LWCKNN)

In the Citation-kNN algorithm, when a test bag X is to be classified, its reference set and citer set are computed using the Hausdorff distance, and together they form the voter set of X. Majority voting among the training bags in the voter set is usually used to decide the label of X. This process does not take the distribution of samples into consideration: each element of the voter set makes an equal contribution to the prediction of the test bag X, no matter where it lies relative to X and to the other elements of the voter set.
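As a sketch of this voting scheme (the parameter names r and c, for the numbers of references and citers, are illustrative), the voter set can be built and an unweighted majority vote taken as follows, again reusing hausdorff() from above:

```python
def citation_knn_predict(test_bag, train_bags, train_labels, r=3, c=5):
    """Citation-kNN sketch: the voters are the r nearest training bags
    (references) plus every training bag that would rank the test bag
    among its own c nearest neighbors (citers); the label is decided by
    an unweighted majority vote, as in the original algorithm."""
    d_test = np.array([hausdorff(test_bag, b) for b in train_bags])
    references = set(np.argsort(d_test)[:r].tolist())

    citers = set()
    n = len(train_bags)
    for i in range(n):
        # Count training bags that are closer to bag i than the test bag is.
        closer = sum(hausdorff(train_bags[i], train_bags[j]) < d_test[i]
                     for j in range(n) if j != i)
        if closer < c:           # the test bag is among bag i's c nearest
            citers.add(i)

    voters = np.array(sorted(references | citers))
    return np.bincount(train_labels[voters]).argmax()
```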

Experimental results

The data sets in our experiments are Musk1 and Musk2, the benchmark data sets for MIL, and a set of breast ultrasound images acquired by the Department of Ultrasound of the Second Affiliated Hospital of Harbin Medical University. Different weighting methods are selected and combined in the experiments, and the results are compared with those of the traditional Citation-kNN algorithm.

In the experiments, ten-fold cross-validation is used: all the data are randomly divided into 10 groups.
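A minimal sketch of this protocol, assuming the bags are simply shuffled and split (the fold count of 10 follows the text; 92 is Musk1's bag count, used here only as an example size):

```python
import numpy as np

# Shuffle the bag indices, cut them into 10 roughly equal groups, and let
# each group serve once as the test set while the other nine train.
rng = np.random.default_rng(0)
folds = np.array_split(rng.permutation(92), 10)
for i, test_idx in enumerate(folds):
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    # ... train on train_idx, evaluate on test_idx, average the accuracies ...
```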

Discussions and conclusions

The distribution of samples is an important factor in classification. To improve the Citation-kNN decision rule, the local distribution features of samples are considered in this paper, so that different voters make different contributions to the classification. The Distance-Weighted Decision weights each voter according to its distance from the test bag: a voter closer to the test bag receives a higher weight. The Sparseness-Weighted Decision weights each voter according to the local sparseness of the bags around it.
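A minimal sketch of how such weights might enter the vote is given below; the particular weighting functions (inverse distance, and mean distance to the other voters as a sparseness proxy) are illustrative assumptions, not the paper's formulas:

```python
def weighted_vote(test_bag, train_bags, train_labels, voters, eps=1e-12):
    """Illustrative locally weighted vote: each voter's weight decays with
    its distance to the test bag and grows with the sparseness of its
    neighborhood, taken here as the mean distance to the other voters."""
    voters = np.asarray(voters)
    d_test = np.array([hausdorff(test_bag, train_bags[v]) for v in voters])
    w_dist = 1.0 / (d_test + eps)        # distance weighting: closer is heavier

    w_sparse = np.ones(len(voters))      # sparseness weighting (proxy)
    for a, v in enumerate(voters):
        others = [hausdorff(train_bags[v], train_bags[u]) for u in voters if u != v]
        if others:
            w_sparse[a] = np.mean(others)

    w = w_dist * w_sparse
    labels = train_labels[voters]
    classes = np.unique(labels)
    scores = [w[labels == cls].sum() for cls in classes]
    return classes[int(np.argmax(scores))]
```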
