Characterization of Mueller matrix elements for classifying human skin cancer utilizing random forest algorithm

Ngan Thanh Luu; Thanh-Hai Le; Quoc-Hung Phan; Thi-Thu-Hien Pham

doi:10.1117/1.JBO.26.7.075001

5 July 2021


Journal of Biomedical Optics, Vol. 26, Issue 7, 075001 (July 2021). https://doi.org/10.1117/1.JBO.26.7.075001

Abstract

Significance: The Mueller matrix decomposition method is widely used for the analysis of biological samples. However, its presumed sequential appearance of the basic optical effects (e.g., dichroism, retardance, and depolarization) limits its accuracy and application.

Aim: An approach is proposed for detecting and classifying human melanoma and non-melanoma skin cancer lesions based on the characteristics of the Mueller matrix elements and a random forest (RF) algorithm.

Approach: In the proposed technique, 669 data points corresponding to the 16 elements of the Mueller matrices obtained from 32 tissue samples with squamous cell carcinoma (SCC), basal cell carcinoma (BCC), melanoma, and normal features are input into an RF classifier as predictors.

Results: The results show that the proposed model yields an average precision of 93%. Furthermore, the classification results show that for biological tissues, the circular polarization properties (i.e., elements m₄₄, m₃₄, m₂₄, and m₁₄ of the Mueller matrix) dominate the linear polarization properties (i.e., elements m₁₃, m₃₁, m₂₂, and m₄₁ of the Mueller matrix) in determining the classification outcome of the trained classifier.

Conclusions: Overall, our study provides a simple, accurate, and cost-effective solution for developing a technique for classification and diagnosis of human skin cancer.

1. Introduction

According to the International Agency for Research on Cancer, there were 300,000 new cases of melanoma and over 1,000,000 new cases of non-melanoma skin cancer in 2018.¹,² Furthermore, the true number of skin cancer cases may be even higher than these figures owing to factors such as the registration methodology of skin cancer and the quality of skin cancer data.³ Without early detection and preventative control, melanoma can quickly progress and become far more dangerous, e.g., from stage I with a five-year survival rate of 97% to stage IV with a five-year survival rate of just 10% to 20%.³ Therefore, the early detection of skin cancer is essential in improving the prognosis of skin cancer patients.

No reliable biomarkers exist for melanoma diagnosis. Consequently, current diagnostic methods for skin lesions are subjective and imprecise. Typically, a patient must undergo around 36 biopsies to confirm (or discount) melanoma. However, despite this large number of biopsies, false negative predictions cannot be entirely ruled out.⁴ Thus, new skin cancer detection methods with greater accuracy and less invasiveness are urgently required. Among the various optical imaging technologies currently available, optical coherence tomography (OCT)⁵,⁶ and polarization-sensitive OCT⁷ make possible the real-time comprehensive morphological mapping of skin tissue samples with micrometer resolution by measuring the inherent properties of light (e.g., the scattering, birefringence, and refractive index properties) as it propagates through the sample.⁸ However, while OCT has a greater sensitivity for detecting melanoma than other techniques, such as reflectance confocal microscopy,⁹ high-frequency ultrasonography,¹⁰ and multispectral imaging,¹¹ detecting early stage melanoma using OCT still poses a significant challenge⁵ due to the great number of different types of non-melanoma skin cancer.¹²

Many studies have shown that the Stokes–Mueller method, based on polarized light, has significant potential for replacing current clinical standards for skin cancer detection. Lu and Chipman¹³ proposed a Mueller matrix decomposition method for determining the diattenuation, retardance, and depolarization properties of a sample. Ghosh et al.¹⁴ investigated the efficacy of the Mueller matrix decomposition method in extracting the individual intrinsic polarimetry characteristics of a scattering medium with both linear birefringence (LB) and optical activity. Du et al.¹⁵ used a Mueller matrix imaging technique to construct two-dimensional images of the polarization parameters (i.e., attenuation, depolarization power, and linear retardance) of human skin basal cell carcinoma (BCC) and human papillary thyroid carcinoma tissues. Martin et al.¹⁶ used the Mueller matrix decomposition techniques proposed by Lu and Chipman¹³ and Ossikovski¹⁷ to differentiate between healthy and irradiated pig skin samples based on their measured retardance, diattenuation, and depolarization properties. Pham et al.¹⁸–²² employed a Stokes–Mueller method to examine the polarization properties of skin cancer, liver cancer tissues, neuroblastoma, collagen-rich tendons, and cartilage. It was shown that the proposed method yielded nine effective parameters for distinguishing between normal skin tissue and various skin cancer tissues, including BCC, squamous cell carcinoma (SCC), and malignant melanoma.

Machine learning provides a powerful tool for the objective and precise diagnosis of cancer through its use of statistics, probabilistic algorithms, and massive computational power. According to recent studies, machine learning techniques can improve the accuracy of cancer detection by 15% to 20%.²³ For example, Codella et al.²⁴ combined a deep-learning convolutional neural network (CNN) with image segmentation algorithms to recognize melanoma in a dataset consisting of 900 training dermoscopic images and 379 test images. The classification accuracy was found to be 76%. By contrast, the average diagnostic accuracy of eight expert dermatologists was just 70.5%. Esteva et al.²⁵ used a GoogleNet Inception v3 CNN architecture and a transfer learning technique to perform the first-level classification of three class disease partitions (benign, malignant, and non-neoplastic) with an accuracy of 72.1% and the second-level classification of the same partitions with an accuracy of 55.4%. Baldwin et al.²⁶ proposed an automated Mueller matrix polarization imaging system and a classification and regression tree (CART) statistical analysis approach for classifying three classes of Sinclair swine tissue (normal, benign, and cancerous) and showed that the sensitivity was as high as 90%. Sigurdsson et al.²⁷ detected five skin tumor lesion types using Raman spectra and a nonlinear neural network. The experimental results showed that the proposed system achieved a classification rate of 80.5% for malignant melanomas and 95.8% for BCC. Legesse et al.²⁸ used a perceptron algorithm to discriminate healthy and tumorous regions in coherent anti-Stokes Raman scattering (CARS) images of BCC based on an analysis of the texture features. It was shown that the classifier achieved a sensitivity of 88% and a specificity of 91%. Murugan et al.²⁹ used random forest (RF) and support vector machine (SVM) classifiers for skin cancer detection. The experimental results showed that the proposed system achieved a classification rate of 72.2% using RF and 87.81% using SVM+RF. Singh et al.³⁰ detected breast cancer using an RF classifier. It was shown that the classifier achieved a sensitivity of 90.56% and a specificity of 86.40%. Building on the achievements of the Mueller matrix method in Refs. 13–22 and of machine learning techniques for skin cancer detection in Refs. 24–30, the RF classifier is adopted for this study because it resists overfitting and is suitable for classifying untrained data.³¹ Notably, the RF reduces the influence of noisy trees.³² Moreover, the RF allows the feature importance to be investigated,³³ which is useful for analyzing the impact of the optical properties of tissue on different types of skin pathology. Accordingly, the present study explores the feasibility of using a machine learning technique to discriminate between normal skin tissue and three classes of skin cancer based on the 16 elements of the Mueller matrix of a biomedical sample, in which all of the optical effects may appear simultaneously.

2. Stokes–Mueller Matrix Polarimetry Formalism for Skin Cancer Classification

The output Stokes vector, $S_{c}$, has the following form:

Eq. (1)

$$S_{c} = \begin{bmatrix} S_{0} \\ S_{1} \\ S_{2} \\ S_{3} \end{bmatrix}_{c} = [M_{sample}] \hat{S}_{c} = \begin{bmatrix} m_{11} & m_{12} & m_{13} & m_{14} \\ m_{21} & m_{22} & m_{23} & m_{24} \\ m_{31} & m_{32} & m_{33} & m_{34} \\ m_{41} & m_{42} & m_{43} & m_{44} \end{bmatrix} \begin{bmatrix} \hat{S}_{0} \\ \hat{S}_{1} \\ \hat{S}_{2} \\ \hat{S}_{3} \end{bmatrix}_{c},$$

where [ $M_{sample}$ ] is the Mueller matrix of a biomedical sample with depolarization, LB, circular birefringence (CB), linear dichroism (LD), and circular dichroism (CD) properties, and $\hat{S}_{c}$ is the input Stokes vector.

Assume that the sample is illuminated by four input lights with linear polarization states (i.e., ${\hat{S}}_{0 °} = {[\begin{matrix} 1, & 1, & 0, & 0 \end{matrix}]}^{T}$ , ${\hat{S}}_{45 °} = {[\begin{matrix} 1, & 0, & 1, & 0 \end{matrix}]}^{T}$ , ${\hat{S}}_{90 °} = {[\begin{matrix} 1, & - 1, & 0, & 0 \end{matrix}]}^{T}$ , and ${\hat{S}}_{135 °} = {[\begin{matrix} 1, & 0, & - 1, & 0 \end{matrix}]}^{T}$ ) and two input lights with circular polarization states (i.e., right-handed ${\hat{S}}_{RHC} = {[\begin{matrix} 1, & 0, & 0, & 1 \end{matrix}]}^{T}$ and left-handed ${\hat{S}}_{LHC} = {[\begin{matrix} 1, & 0, & 0, & - 1 \end{matrix}]}^{T}$ ). The Mueller matrix elements of the biomedical sample, [ $M_{sample}$ ], are then obtained as

Eq. (2)

$$\begin{bmatrix} m_{11} & m_{12} & m_{13} & m_{14} \\ m_{21} & m_{22} & m_{23} & m_{24} \\ m_{31} & m_{32} & m_{33} & m_{34} \\ m_{41} & m_{42} & m_{43} & m_{44} \end{bmatrix} = \frac{1}{2} \begin{bmatrix} S_{0^{\circ}}(1) + S_{90^{\circ}}(1) & S_{0^{\circ}}(1) - S_{90^{\circ}}(1) & S_{45^{\circ}}(1) - S_{135^{\circ}}(1) & S_{RHC}(1) - S_{LHC}(1) \\ S_{0^{\circ}}(2) + S_{90^{\circ}}(2) & S_{0^{\circ}}(2) - S_{90^{\circ}}(2) & S_{45^{\circ}}(2) - S_{135^{\circ}}(2) & S_{RHC}(2) - S_{LHC}(2) \\ S_{0^{\circ}}(3) + S_{90^{\circ}}(3) & S_{0^{\circ}}(3) - S_{90^{\circ}}(3) & S_{45^{\circ}}(3) - S_{135^{\circ}}(3) & S_{RHC}(3) - S_{LHC}(3) \\ S_{0^{\circ}}(4) + S_{90^{\circ}}(4) & S_{0^{\circ}}(4) - S_{90^{\circ}}(4) & S_{45^{\circ}}(4) - S_{135^{\circ}}(4) & S_{RHC}(4) - S_{LHC}(4) \end{bmatrix} .$$
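Eq. (2) can be checked numerically with a short Python sketch. The function names and the example matrix below are illustrative assumptions rather than the authors' code, and dividing by $m_{11}$ is only one plausible reading of the normalization mentioned later in the text.

```python
def apply_mueller(m, s_in):
    """Eq. (1): output Stokes vector = Mueller matrix times input Stokes vector."""
    return [sum(m[i][j] * s_in[j] for j in range(4)) for i in range(4)]


def mueller_from_stokes(s0, s45, s90, s135, s_rhc, s_lhc):
    """Eq. (2): rebuild the Mueller matrix row by row from six output Stokes vectors."""
    return [
        [
            0.5 * (s0[i] + s90[i]),       # column 1
            0.5 * (s0[i] - s90[i]),       # column 2
            0.5 * (s45[i] - s135[i]),     # column 3
            0.5 * (s_rhc[i] - s_lhc[i]),  # column 4
        ]
        for i in range(4)
    ]


def normalized_features(m):
    """Assumed normalization: divide by m11 and drop m11, leaving 15 features."""
    m11 = m[0][0]
    return [m[i][j] / m11 for i in range(4) for j in range(4) if (i, j) != (0, 0)]


# Hypothetical Mueller matrix, used only to exercise the functions.
M_TRUE = [
    [1.0, 0.25, 0.125, 0.0],
    [0.25, 0.5, 0.0, -0.125],
    [0.125, 0.0, 0.5, 0.25],
    [0.0, 0.125, -0.25, 0.75],
]

# The six input polarization states listed in the text.
INPUT_STATES = {
    "0": [1, 1, 0, 0],
    "45": [1, 0, 1, 0],
    "90": [1, -1, 0, 0],
    "135": [1, 0, -1, 0],
    "RHC": [1, 0, 0, 1],
    "LHC": [1, 0, 0, -1],
}

# Simulate the six measurements, then recover the matrix from them.
OUTPUTS = {name: apply_mueller(M_TRUE, s) for name, s in INPUT_STATES.items()}
M_REBUILT = mueller_from_stokes(
    OUTPUTS["0"], OUTPUTS["45"], OUTPUTS["90"],
    OUTPUTS["135"], OUTPUTS["RHC"], OUTPUTS["LHC"],
)
FEATURES = normalized_features(M_REBUILT)
```

In this sketch the six simulated measurements recover the example matrix, because each column of Eq. (2) isolates one Mueller column from a pair of orthogonal input states.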

3. Skin Cancer Classification Model

3.1. Decision Tree Algorithm

Decision tree algorithms implement classification by splitting the dataset using binary questions based on the feature vectors.³⁴ In particular, the feature vectors (denoted as $X$ ) are taken as tree nodes in the classification architecture, while the class labels are denoted as $Y$ . A decision rule, $d (t)$ , is then used to map each $X$ to $d (X)$ , where $d (X)$ represents the class label of the feature vectors.³⁵ Depending on whether or not the input features (i.e., attributes) satisfy the binary question, they are divided into two groups (known as branches) of nodes. Thus, by applying multiple questions to the flow, the decision tree classifies the input dataset into multiple different class labels.

One of the most well-known decision tree algorithms is the CART algorithm proposed by Breiman et al.,³⁶ which constructs decision trees by selecting, at each split, the feature threshold that gives the best value of the Gini index or the information gain, depending on the tuned parameters.³⁷ Notably, the algorithm not only accommodates both numerical and categorical variables but also handles outliers in the dataset in a robust manner.³⁸ As such, it is well suited to the classification problem considered in the present study, in which the instances in the dataset [i.e., the Mueller matrix elements describing the optical (depolarization, LB, CB, LD, and CD) properties of human tissue samples] are numerical and have no missing values, but may contain outliers.

3.2. Random Forest Algorithm

The RF classification algorithm³¹ builds multiple individual sub-decision trees, $\{T_{1} (X), T_{2} (X), T_{3} (X), \dots, T_{n} (X)\}$, as building blocks for categorization tasks.³⁵ Each individual sub-decision tree utilizes a different method to generate the binary questions used for classification purposes, and hence the resulting tree structure and organization are unique. Since each sub-decision tree in the RF architecture performs its own classification procedure, each tree can be regarded as an individual predictor that votes on the class of the input data; the final classification outcome is then determined via a polling process. Compared to the traditional decision tree classification algorithm described above, the RF classifier manages the bias-variance trade-off more effectively by combining small decision trees with random feature subsets, thereby helping to prevent overfitting during the training process.³⁹
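The polling step can be sketched in a few lines of Python. The three one-question "trees" and their thresholds below are invented for illustration and are not the trained trees from this study; only the majority-vote logic is the point.

```python
from collections import Counter


def forest_predict(trees, x):
    """Return the majority-vote label and the vote tally for feature vector x."""
    votes = Counter(tree(x) for tree in trees)
    label, _count = votes.most_common(1)[0]
    return label, votes


# Three hypothetical one-question trees acting on a dict of Mueller elements.
# The thresholds are made up for this example.
TOY_TREES = [
    lambda x: "cancer" if x["m24"] > 0.05 else "normal",
    lambda x: "cancer" if x["m34"] > 0.10 else "normal",
    lambda x: "cancer" if x["m44"] < 0.60 else "normal",
]

SAMPLE = {"m24": 0.08, "m34": 0.04, "m44": 0.40}
LABEL, VOTES = forest_predict(TOY_TREES, SAMPLE)
```

Here two of the three toy trees vote "cancer" and one votes "normal", so the forest outputs "cancer" even though a single tree disagrees.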

3.3. Gini Impurity

The Gini index is a statistical measure for quantifying the heterogeneity of a dataset.⁴⁰ As described above, in binary decision trees, decision rules, $d (t)$, are used to split the learning set $L$, which contains a certain number of feature vectors $X$. The set $L$ is split into two subsets, namely $L_{1}$ and $L_{2}$, such that the data points of each subset conform to a specific rule, i.e., $d (t)$. Consequently, the impurities of $L_{1}$ and $L_{2}$ are less than that of their parent, $L$.⁴¹ The impurity is measured by the Gini index, which has the following form:

Eq. (3)

$$G = 1 - \sum_{i = 1}^{k} {(p (c_{i} | t))}^{2},$$

where $G$ is the Gini index at node $t$, $k$ is the number of classes, and $p (c_{i} | t)$ is the probability that a data point of the dataset $L$ at node $t$ belongs to class $c_{i}$. The Gini index is bounded between 0 and 1, where $G = 0$ represents a complete equality of the data (i.e., all the data in the subset after splitting belong to a specific class label), whereas larger values indicate a greater inequality of the data. For $k$ equally represented classes, $G$ reaches its maximum of $1 - 1 / k$, which approaches 1 only as the number of classes grows.
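As a worked check of Eq. (3), the Python sketch below computes the Gini index from a list of class labels. The function name is an illustrative choice; the class counts in the last example are the training counts reported in Table 1.

```python
from collections import Counter


def gini_impurity(labels):
    """Eq. (3): one minus the sum of squared class proportions at a node."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


# A pure node: every vector has the same label.
G_PURE = gini_impurity(["BCC"] * 10)

# Four equally represented classes: 1 - 4 * (1/4)^2 = 0.75.
G_BALANCED = gini_impurity(["BCC", "SCC", "melanoma", "normal"])

# Class counts of the training set before oversampling (Table 1).
TRAIN_LABELS = ["BCC"] * 282 + ["SCC"] * 231 + ["melanoma"] * 52 + ["normal"] * 42
G_TRAIN = gini_impurity(TRAIN_LABELS)  # roughly 0.627
```

With four classes the index cannot exceed 0.75, so the imbalanced training set sits somewhat below that ceiling.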

4. Experimental Setup and Data Acquisition Process

4.1. Experimental Setup

Figure 1 presents a schematic illustration of the experimental setup used in this study. The illumination light was produced by a frequency-stable He-Ne laser (HNLS008R, SIOS Co.) with a center wavelength of 632.8 nm. The light emitted by the laser passed through a quarter-wave plate (QWP0-63304-4-R10, CVI Co.) and a polarizer (GTH5M, Thorlabs Co.) and was then incident on the sample (i.e., biological tissue mounted on a quartz slice). It is noted that quartz slices were used to minimize the depolarization effect when light passed through the sample. Furthermore, the blank quartz slides were measured before performing experiments for calibration purposes. The quarter-wave plate was used to produce two circular polarization input states (right-handed and left-handed), while the polarizer was used to produce four linear polarization input states (0 deg, 45 deg, 90 deg, and 135 deg). For both optical elements, the polarization states were produced using rotary stages and a controller. The polarized light was passed through a neutral density filter (NDC-100-2, ONSET Co.) to ensure a consistent intensity of the light incident on the sample. The light emerging from the sample was detected by a commercial Stokes polarimeter (PAX5710, Thorlabs Co.), where the intensity was sampled at a rate of 33.33 samples/s and the output Stokes vector $S_{c}$ was constructed accordingly.

Fig. 1

Setup of measurement system.

The experiments considered 32 biomedical skin tissue samples, specifically, 12 BCC samples, 4 melanoma samples, 4 normal samples, and 12 SCC samples, where each sample was cut into 4 to 6 slices. For each slice, the measurement process was performed at 4 to 6 different positions, with the output Mueller matrix calculated for each point. For each data point, the 16 elements of the Mueller matrix were compiled into a feature vector, $X$. Data from the 32 samples (i.e., 669 vectors) were split into two parts for training and testing. As shown in Table 1, the training dataset includes 607 feature vectors belonging to four different classes, namely BCC (282 vectors), SCC (231 vectors), melanoma (52 vectors), and normal (42 vectors). One sample of each skin tissue type was used for evaluation of the trained model; thus, the testing dataset includes 62 feature vectors, namely BCC (30 vectors), SCC (23 vectors), melanoma (3 vectors), and normal (6 vectors).

Table 1

Number of feature vectors of each class label in training and testing datasets.

	BCC	SCC	Melanoma	Normal
Number of training feature vectors	282	231	52	42
Number of testing feature vectors	30	23	3	6

It is noted that, because the samples differed in shape, each sample was cut into a different number of slices and measured at a different number of positions of interest. Hence, the number of feature vectors belonging to each sample varies. This leads to the difference in the ratio of training to testing feature vectors for each type of skin tissue.

4.2. Classification Workflow

4.2.1. Data preprocessing

One of the most common problems facing machine learning classifiers is that of imbalanced datasets, where the data records of the majority class overwhelm those of the other classes. In such a situation, the training process is unable to learn proper classification rules for the minority classes, and hence the classification accuracy for these classes is severely impaired.⁴² As shown in Table 1, the dataset employed in this study suffered from this imbalance problem since the BCC class contained 282 feature vectors, whereas the normal class contained only 42 vectors. Accordingly, an oversampling technique⁴³ was applied to randomly duplicate instances of the minority classes (SCC, melanoma, and normal skin) until each class matched the original number of feature vectors belonging to the BCC majority class (see Table 2).

Table 2

Number of feature vectors of each class label in training and testing datasets after oversampling.

	BCC	SCC	Melanoma	Normal
Number of training feature vectors	282	282	282	282
Number of testing feature vectors	30	30	30	30
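A minimal sketch of random oversampling by duplication, in pure Python, is given here. The function name, the seed, and the placeholder feature vectors are assumptions made for illustration; only the class counts come from Table 1.

```python
import random
from collections import Counter, defaultdict


def random_oversample(features, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(features, labels):
        by_class[y].append(x)
    target = max(len(rows) for rows in by_class.values())
    out_x, out_y = [], []
    for y, rows in by_class.items():
        # Keep every original row, then add random duplicates up to the target.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for x in rows + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Placeholder feature vectors with the training class counts of Table 1.
CLASS_COUNTS = {"BCC": 282, "SCC": 231, "melanoma": 52, "normal": 42}
X_RAW, Y_RAW = [], []
for name, count in CLASS_COUNTS.items():
    for i in range(count):
        X_RAW.append((name, i))  # stand-in for a 15-element feature vector
        Y_RAW.append(name)

X_BAL, Y_BAL = random_oversample(X_RAW, Y_RAW)
BALANCED_COUNTS = Counter(Y_BAL)
```

After balancing, each class holds 282 vectors, matching the training row of Table 2; the added rows are exact copies of existing ones, so no new measurements are created.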

4.2.2. Training

Figure 2 presents a flowchart of the 10-fold cross-validation training process performed in this study. The hyperparameters of the classifier were selected using the grid search technique. Given a range of values for each hyperparameter, this technique fits the predictor with each candidate setting, computes the corresponding score, and selects the setting that optimizes the classification performance. Specifically, the candidate values for the number of trees and the depth of each individual tree were [10, 20, 30, …, 990, 1000] and [2, 3, 4, …, 20], respectively. The resulting RF classifier consisted of 220 sub-decision trees, and the depth of the model was limited to 14 layers; these parameter values were chosen through experimentation. In addition, the classifier used the Gini impurity index as the splitting criterion. For each data point, a feature vector of 15 Mueller matrix elements was fed into the classifier. (Note that one of the features, Mueller matrix element $m_{11}$, was used for normalization purposes and hence was not used as a predictor.) Notably, there were no instances of missing data, and thus handling schemes for missing data were not required. Cross-validation is usually performed using $k = 10$ folds⁴⁴ since a larger value of $k$ reduces the size of each fold and thus reduces the difference in size between the training set and the resampling subset. As a result, the bias, i.e., the difference between the true value and the expected value of the estimator, is decreased. For the present training process, 10-fold cross-validation was repeated three times. According to Molinaro et al.⁴⁵ and Kim,⁴⁶ repeating $k$ -fold cross-validation is beneficial in improving the precision score of classification models while maintaining a small bias.

Fig. 2

Flowchart showing RF classifier training process with 10-fold cross-validation.
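The size of the search described above can be made concrete with a short Python sketch that enumerates the hyperparameter grid and generates the index splits for 10-fold cross-validation repeated three times. The function name, the seed, and the sample count of 1128 (four classes of 282 vectors after oversampling) are assumptions for illustration; the sketch does not train any classifier.

```python
import random
from itertools import product

# Candidate values quoted in the text.
N_TREES_GRID = list(range(10, 1001, 10))  # 10, 20, ..., 1000
DEPTH_GRID = list(range(2, 21))           # 2, 3, ..., 20
GRID = list(product(N_TREES_GRID, DEPTH_GRID))


def repeated_kfold(n_samples, k=10, repeats=3, seed=0):
    """Yield (train_indices, validation_indices) for repeated k-fold CV."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)  # a fresh shuffle for each repetition
        folds = [idx[i::k] for i in range(k)]
        for i in range(k):
            val = folds[i]
            train = [j for f, fold in enumerate(folds) if f != i for j in fold]
            yield train, val


N_SAMPLES = 4 * 282  # assumed: oversampled training set
SPLITS = list(repeated_kfold(N_SAMPLES))
FITS_FOR_FULL_SEARCH = len(GRID) * len(SPLITS)
```

Under these assumptions the grid holds 100 × 19 = 1900 settings and each setting is scored on 30 splits, which gives a sense of the computational cost of an exhaustive search.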

5. Results and Discussion

Figure 3 shows the training and validation accuracy results for the 10 folds of the dataset. For the training set, the classification accuracy is equal to 100% in virtually every fold. By contrast, for the validation set, the classification accuracy reduces to around 91%. This tendency is reasonable since the oversampling process increases the number of duplicated feature vectors; the trained model therefore produces multiple rules for one instance, and the rules become specific to a portion of the training data. This increases the training accuracy but decreases the validation accuracy.

Fig. 3

Training accuracy and validation accuracy of 10-fold cross-validation process.

The performance of the trained RF classifier on the test dataset, which comprised 30 BCC, 23 SCC, 3 melanoma, and 6 normal feature vectors before oversampling and 30 feature vectors per class afterward, was evaluated using a confusion matrix, as shown in Table 3. The best classification performance was obtained for the melanoma class, with 30 true positive cases and no false positive or false negative cases. A good classification performance was also obtained for the normal skin tissue, i.e., 30 true positive cases, no false negative cases, and just 2 false positive cases. However, for the BCC and SCC classes, the classification performance was degraded, with 7 false positive outcomes for the BCC class and 9 false negative outcomes for the SCC class. Interestingly, most of the misclassified SCC instances, i.e., 7 of 9, were misclassified as BCC. It is noted that when cancerous tumors develop in the tissue, numerous changes in the collagen components occur, including the deposition of collagen fibrils resulting from an increased number of fibroblasts, the production of proteolytic enzymes for cancer invasion, etc.⁴⁷,⁴⁸ These changes in biological structure allowed the classification model to distinguish clearly between normal tissue and cancerous tissue. By contrast, some cases of BCC and SCC share the same clinical features, such as an ulcer with a rolled border, which may confuse the estimator.⁴⁹

Table 3

Confusion matrix of trained RF classifier when applied to test dataset.

		Predicted class
		BCC	Melanoma	Normal	SCC
True class	BCC	30	0	0	0
	Melanoma	0	30	0	0
	Normal	0	0	30	0
	SCC	7	0	2	21

The receiver operating characteristic (ROC) curve is an evaluation metric for binary classification.⁵⁰ It plots the true positive rate (TPR) against the false positive rate (FPR) at different thresholds. Thus, the area under the curve (AUC) can be used to evaluate the model with unbiased estimation. The closer the AUC score is to 1, the better the model. As shown in Table 4, the AUC score for melanoma and normal skin tissue was 1. The performance of the RF model in predicting BCC and SCC was lower but still good, with scores of 0.999 and 0.996, respectively. Overall, the mean AUC for all types of skin tissue is 0.999.

Table 4

AUC scores of each class and mean AUC of the proposed method.

BCC	SCC	Melanoma	Normal	Mean AUC
0.999	0.996	1.0	1.0	0.999
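For readers unfamiliar with the AUC, the Python sketch below computes it for one class against the rest by using the standard equivalence between the area under the ROC curve and the probability that a randomly chosen positive is scored above a randomly chosen negative. The scores in the example are invented; the per-class values at the end are those of Table 4.

```python
def auc_score(scores, is_positive):
    """AUC as the fraction of (positive, negative) pairs ranked correctly,
    counting ties as half a correct pair."""
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical class probabilities for four test vectors (one-vs-rest).
TOY_SCORES = [0.9, 0.8, 0.4, 0.3]
TOY_TRUTH = [True, False, True, False]
TOY_AUC = auc_score(TOY_SCORES, TOY_TRUTH)  # 3 of 4 pairs ranked correctly

# Mean of the per-class AUC values reported in Table 4.
TABLE4_AUC = {"BCC": 0.999, "SCC": 0.996, "Melanoma": 1.0, "Normal": 1.0}
MEAN_AUC = sum(TABLE4_AUC.values()) / len(TABLE4_AUC)
```

The mean of the four tabulated values is 0.99875, which rounds to the reported 0.999.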

Table 5 analyzes the performance of the trained classifier for the four different class labels. The performance metrics, i.e., the precision, recall, and $F 1$ score are defined as follows:

Eq. (4)

$$Precision = \frac{TP}{TP + FP},$$

Eq. (5)

$$Recall = \frac{TP}{TP + FN},$$

Eq. (6)

$$F 1 \; score = 2 \times \frac{Precision \times Recall}{Precision + Recall},$$

where TP is the true positive, FP is the false positive, and FN is the false negative.

Table 5

Performance evaluation analysis of trained classification model.

Class label	Precision	Recall	F1 score
BCC	0.81	1.00	0.90
Melanoma	1.00	1.00	1.00
Normal	0.94	1.00	0.97
SCC	1.00	0.70	0.82
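The entries of Table 5 can be reproduced from the confusion matrix in Table 3 with Eqs. (4) to (6). The Python sketch below is one way to do so; the variable and function names are illustrative.

```python
LABELS = ["BCC", "Melanoma", "Normal", "SCC"]

# Table 3: rows are true classes, columns are predicted classes.
CONFUSION = [
    [30, 0, 0, 0],
    [0, 30, 0, 0],
    [0, 0, 30, 0],
    [7, 0, 2, 21],
]


def per_class_metrics(cm):
    """Return {class index: (precision, recall, F1)} following Eqs. (4)-(6)."""
    n = len(cm)
    out = {}
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, truly another class
        fn = sum(cm[c]) - tp                       # truly c, predicted another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        out[c] = (precision, recall, f1)
    return out


METRICS = per_class_metrics(CONFUSION)
ACCURACY = sum(CONFUSION[i][i] for i in range(4)) / sum(map(sum, CONFUSION))
MEAN_PRECISION = sum(m[0] for m in METRICS.values()) / len(METRICS)

for i, name in enumerate(LABELS):
    p, r, f = METRICS[i]
    print(f"{name:9s} precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

Computed this way, the overall accuracy is 111/120 = 0.925 and the mean per-class precision is about 0.937, both consistent with the figure of roughly 93% quoted in the text.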

The precision metric evaluates the reliability of the positive predictions of the model, with a value closer to 1 indicating that a larger share of the feature vectors assigned to a class truly belong to that class. As shown, the classifier attains a precision of 1 for the SCC class. In other words, every feature vector that it labels as SCC is indeed SCC, i.e., there are no false positives for this class. Meanwhile, the recall metric evaluates how completely the trained model retrieves the members of each class. For example, the recall value of 1 for the BCC class indicates that every BCC feature vector is labeled as BCC; equivalently, if the trained model predicts that a feature vector does not belong to the BCC class, then that vector belongs to one of the three other classes (SCC, normal, or melanoma) with a probability of 100%. Finally, the $F 1$ score combines the precision and recall scores as their harmonic mean. Because it takes both precision and recall into account, it is a more general metric for evaluating models than either of the two alone. As also shown in Table 5, the trained model achieves a good classification performance for the melanoma class ( $precision = 1$ ; $recall = 1$ ). Moreover, the classifier also achieves a good performance for the normal class ( $precision = 0.94$ ; $recall = 1$ ). However, as implied in the confusion matrix in Table 3, the classifier has a poorer performance for the BCC and SCC classes. Overall, the trained classifier successfully discriminates four classes of skin tissues with a mean accuracy of 0.93.

Figure 4 presents the distributions and magnitudes of the 15 Mueller matrix elements of the SCC, BCC, melanoma, and normal skin tissue samples. The vertical and horizontal axes show the magnitude and distribution of the corresponding Mueller matrix elements, respectively. Note that, as described earlier, one of the matrix elements ( $m_{11}$ ) was used for normalization purposes and is hence omitted here. It is seen that for each Mueller matrix element, the magnitude is approximately equal for all four types of tissue sample. However, the distribution varies from one sample type to another. For example, for element $m_{22}$ , the distributions of the different sample types are affected by outliers, which result in a significant skew of the distribution. Thus, element $m_{22}$ has only a low contribution to the outcome of the classification model, as shown in Fig. 5. In other words, although the range of data distribution varies among the four types of samples, it is difficult to classify the data without relying on advanced algorithms. It is also noted that the average standard deviations of the Mueller matrix elements $m_{12}$ , $m_{13}$ , $m_{14}$ , $m_{21}$ , $m_{24}$ , and $m_{34}$ over the 4 to 6 measurement points of the same slice are the highest, ranging from 0.03 to 0.1 for the four types of sample, whereas the average standard deviations of the other elements over the 4 to 6 measurement points of the same slice are smaller than 0.01. This is also clearly observed in Fig. 4, where the elements $m_{12}$ , $m_{13}$ , $m_{14}$ , $m_{21}$ , $m_{24}$ , and $m_{34}$ have wider distributions than the other Mueller matrix elements.

Fig. 4

Magnitude and distribution of Mueller matrix elements for four different types of skin tissues.

Fig. 5

Feature importance ranking of 15 elements of Mueller matrix in (a) descending order; (b) Mueller matrix form.

Figure 5 shows the relative importance of the 15 different elements of the Mueller matrix within the classification model. Note that the feature importance represents the reduction in the node Gini impurity weighted by the node probability.³⁷ For each decision tree, the importance score of feature $i$ on node $j$ , $n i_{j}$ , is calculated as

Eq. (7)

$$n i_{j} = w_{j} G_{j} - w_{j} (L) G_{j} (L) - w_{j} (R) G_{j} (R),$$

where $w_{j}$ is the proportion of the number of samples reaching node $j$ ; $w_{j} (L)$ and $w_{j} (R)$ are the corresponding proportions of samples reaching the child nodes of the left and right splits of node $j$ , respectively; and $G_{j}$ , $G_{j} (L)$ , and $G_{j} (R)$ are the Gini impurities of node $j$ and its left and right child nodes, respectively.

Thus, the importance of feature $i$ in a specific tree, $f i$ , can be calculated as

Eq. (8)

$$f i = \frac{\sum_{j}^{s} n i_{j}}{\sum_{k}^{N} n i_{k}},$$

where $s$ is the number of nodes $j$ that split on feature $i$ ; $N$ is the number of nodes; and $f i$ is the importance of feature $i$ , normalized to a value between 0 and 1.

Finally, the importance score of feature $i$ in a forest of $T$ estimators, $F i$ , is given by the average importance score of feature $i$ over the individual trees, i.e.,

Eq. (9)

$$F i = \frac{\sum^{T} f i}{T} .$$
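Eqs. (7) to (9) can be traced on a toy forest with the Python sketch below. The node records, their weights, and their Gini values are invented for illustration, and the dictionary layout is an assumption of this sketch rather than the data structure of any particular library.

```python
def node_importance(node):
    """Eq. (7): weighted impurity of a node minus that of its two children."""
    return (
        node["w"] * node["g"]
        - node["w_left"] * node["g_left"]
        - node["w_right"] * node["g_right"]
    )


def tree_feature_importance(nodes, n_features):
    """Eq. (8): per-feature sum of node importances, normalized over all nodes."""
    totals = [0.0] * n_features
    for node in nodes:
        totals[node["feature"]] += node_importance(node)
    norm = sum(totals)
    return [t / norm for t in totals] if norm else totals


def forest_feature_importance(trees, n_features):
    """Eq. (9): average of the per-tree importances over the T trees."""
    per_tree = [tree_feature_importance(nodes, n_features) for nodes in trees]
    return [sum(col) / len(per_tree) for col in zip(*per_tree)]


# Toy tree A: the root splits on feature 0, its right child splits on feature 1.
TREE_A = [
    {"feature": 0, "w": 1.0, "g": 0.5,
     "w_left": 0.5, "g_left": 0.0, "w_right": 0.5, "g_right": 0.5},
    {"feature": 1, "w": 0.5, "g": 0.5,
     "w_left": 0.25, "g_left": 0.0, "w_right": 0.25, "g_right": 0.0},
]
# Toy tree B: a single split on feature 0 that yields two pure children.
TREE_B = [
    {"feature": 0, "w": 1.0, "g": 0.5,
     "w_left": 0.5, "g_left": 0.0, "w_right": 0.5, "g_right": 0.0},
]

FI_A = tree_feature_importance(TREE_A, n_features=2)
FI_B = tree_feature_importance(TREE_B, n_features=2)
FOREST_FI = forest_feature_importance([TREE_A, TREE_B], n_features=2)
```

In this toy case tree A splits its importance evenly between the two features, tree B assigns everything to feature 0, and the forest average is 0.75 and 0.25, which sums to 1 as the normalization in Eq. (8) implies.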

As shown in Fig. 5(b) and referring to Eq. (2), elements $m_{24}$ , $m_{34}$ , $m_{44}$ , and $m_{14}$ in the Mueller matrix, which correspond to the left- and right-handed circular polarization input states, have high relative contributions of 9.7%, 8.3%, 7.5%, and 6.0%, respectively, toward the classification outcome. By contrast, elements $m_{43}$ , $m_{33}$ , $m_{23}$ , and $m_{13}$ , corresponding to the input linear polarization states of 45 deg and 135 deg, have relatively low contributions of 5.7%, 5.7%, 5.6%, and 5.4%, respectively. Similarly, elements $m_{32}$ , $m_{42}$ , $m_{31}$ , $m_{22}$ , and $m_{41}$ , corresponding to linear polarization states of 0 deg and 90 deg, also have low contributions of 7.0%, 6.9%, 5.2%, 4.3%, and 4.2%, respectively. Meanwhile, elements $m_{21}$ and $m_{12}$ , also corresponding to linear polarization states of 0 deg and 90 deg, have high relative contributions of 9.3% and 9.1%, respectively. In other words, the circular polarization elements in the Mueller matrix exert, on average, a greater effect on the classification outcome than the linear polarization elements. This finding is reasonable since skin tissue samples have a high natural scattering effect, which causes a helicity flip of the circularly polarized light while passing through the sample.⁵¹ In general, the results presented above indicate that the proposed technique, based on Stokes–Mueller matrix polarimetry and an RF classification algorithm, provides a simple and accurate tool for skin cancer classification and diagnosis applications.

6. Conclusion

This study has proposed a Stokes–Mueller polarimetry method based on an RF classifier consisting of 220 sub-decision binary trees for discriminating between four different types of skin tissues, namely BCC, melanoma, SCC, and normal, based on the measured values of the 16 elements in the output Mueller matrix. Based on the experimental results obtained for 32 skin tissue samples, it has been shown that the proposed model achieves an average classification accuracy of 93% for the four skin tissue types. It has additionally been shown that among all of the elements in the Mueller matrix, elements $m_{44}$ , $m_{34}$ , $m_{24}$ , and $m_{14}$ , relating to the left- and right-handed circular polarization states, have a stronger discriminatory power than those relating to the linear polarization states. Overall, the results show that the proposed framework has promising potential for the development of machine learning approaches for automated cancer tissue screening and diagnosis.

Disclosures

The authors declare no conflicts of interest.

Acknowledgements

The authors gratefully acknowledge the financial support provided to this study by the Vietnam National University Ho Chi Minh City (VNU-HCM) under Grant No. DS2020-28-02.

References

1.

E. Maverakis et al., “Metastatic melanoma—a review of current and future treatment options,” Acta Derm. Venereol., 95 (5), 516 –524 (2015). https://doi.org/10.2340/00015555-2035 ADVEA4 1651-2057 Google Scholar

2.

J. Ferlay et al., “Estimating the global cancer incidence and mortality in 2018: GLOBOCAN sources and methods,” Int. J. Cancer, 144 (8), 1941–1953 (2019). https://doi.org/10.1002/ijc.31937

3.

S. Gupta, R. Reintjes and N. Trialonis-Suthakharan, “Analysis of the methodology of skin cancer incidence registration in German cancer registries,” Ann. Cancer Epidemiol., 3, 8 (2019). https://doi.org/10.21037/ace.2019.08.04

4.

A. Bhattacharya et al., “Precision diagnosis of melanoma and other skin lesions from digital images,” AMIA Jt. Summits Transl. Sci. Proc., 2017, 220–226 (2017).

5.

A. Rajabi-Estarabadi et al., “Optical coherence tomography imaging of melanoma skin cancer,” Lasers Med. Sci., 34 (2), 411–420 (2019). https://doi.org/10.1007/s10103-018-2696-1

6.

Y.-Q. Xiong et al., “Optical coherence tomography for the diagnosis of malignant skin tumors: a meta-analysis,” J. Biomed. Opt., 23 (2), 020902 (2018). https://doi.org/10.1117/1.JBO.23.2.020902

7.

T. Marvdashti et al., “Classification of basal cell carcinoma in human skin using machine learning and quantitative features captured by polarization sensitive optical coherence tomography,” Biomed. Opt. Express, 7 (9), 3721 (2016). https://doi.org/10.1364/BOE.7.003721

8.

M. Mogensen and G. B. E. Jemec, “Diagnosis of nonmelanoma skin cancer/keratinocyte carcinoma: a review of diagnostic accuracy of nonmelanoma skin cancer diagnostic tests and technologies,” Dermatol. Surg., 33 (10), 1158–1174 (2007). https://doi.org/10.1111/j.1524-4725.2007.33251.x

9.

S. Arroyo-Camarena et al., “Spectroscopic and imaging characteristics of pigmented non-melanoma skin cancer and melanoma in patients with skin phototypes III and IV,” Oncol. Ther., 4 (2), 315–331 (2016). https://doi.org/10.1007/s40487-016-0036-9

10.

M. Raszewska-Famielec et al., “Clinical usefulness of high-frequency ultrasonography in the monitoring of basal cell carcinoma treatment effects,” Postepy Dermatol. Alergol., 37 (3), 364–370 (2020). https://doi.org/10.5114/ada.2020.96099

11.

L. Rey-Barroso et al., “Visible and extended near-infrared multispectral imaging for skin cancer diagnosis,” Sensors, 18, 1441 (2018). https://doi.org/10.3390/s18051441

12.

S. Batz et al., “Differentiation of different nonmelanoma skin cancer types using OCT,” Skin Pharmacol. Physiol., 31 (5), 238–245 (2018). https://doi.org/10.1159/000489269

13.

S.-Y. Lu and R. A. Chipman, “Interpretation of Mueller matrices based on polar decomposition,” J. Opt. Soc. Am. A, 13 (5), 1106 (1996). https://doi.org/10.1364/JOSAA.13.001106

14.

N. Ghosh, M. F. G. Wood and I. A. Vitkin, “Mueller matrix decomposition for extraction of individual polarization parameters from complex turbid media exhibiting multiple scattering, optical activity, and linear birefringence,” J. Biomed. Opt., 13 (4), 044036 (2008). https://doi.org/10.1117/1.2960934

15.

E. Du et al., “Mueller matrix polarimetry for differentiating characteristic features of cancerous tissues,” J. Biomed. Opt., 19 (7), 076013 (2014). https://doi.org/10.1117/1.JBO.19.7.076013

16.

L. Martin, G. Le Brun and B. Le Jeune, “Mueller matrix decomposition for biological tissue analysis,” Opt. Commun., 293, 4–9 (2013). https://doi.org/10.1016/j.optcom.2012.11.086

17.

R. Ossikovski, “Analysis of depolarizing Mueller matrices through a symmetric decomposition,” J. Opt. Soc. Am. A, 26, 1109–1118 (2009). https://doi.org/10.1364/JOSAA.26.001109

18.

T. T. H. Pham and Y. L. Lo, “Extraction of effective parameters of turbid media utilizing the Mueller matrix approach: study of glucose sensing,” J. Biomed. Opt., 17 (9), 097002 (2012). https://doi.org/10.1117/1.JBO.17.9.097002

19.

T. T. H. Pham and Y. L. Lo, “Extraction of effective parameters of anisotropic optical materials using a decoupled analytical method,” J. Biomed. Opt., 17 (2), 025006 (2012). https://doi.org/10.1117/1.JBO.17.2.025006

20.

T. T. H. Pham et al., “Optical parameters of human blood plasma, collagen, and calfskin based on the Stokes–Mueller technique,” Appl. Opt., 57 (16), 4353–4359 (2018). https://doi.org/10.1364/AO.57.004353

21.

D. L. Le et al., “Characterization of healthy and nonmelanoma-induced mouse utilizing the Stokes–Mueller decomposition,” J. Biomed. Opt., 23 (12), 125003 (2018). https://doi.org/10.1117/1.JBO.23.12.125003

22.

D. L. Le et al., “Characterization of healthy and cancerous human skin tissue utilizing Stokes–Mueller polarimetry technique,” Opt. Commun., 480, 126460 (2021). https://doi.org/10.1016/j.optcom.2020.126460

23.

J. A. Cruz and D. S. Wishart, “Applications of machine learning in cancer prediction and prognosis,” Cancer Inf., 2, 59–77 (2007). https://doi.org/10.1177/117693510600200030

24.

N. C. F. Codella et al., “Deep learning ensembles for melanoma recognition in dermoscopy images,” IBM J. Res. Dev., 61 (4–5), 5:1–5:15 (2017). https://doi.org/10.1147/JRD.2017.2708299

25.

A. Esteva et al., “Dermatologist-level classification of skin cancer with deep neural networks,” Nature, 542, 115–118 (2017). https://doi.org/10.1038/nature21056

26.

M. Baldwin et al., “Mueller matrix imaging for cancer detection,” in Proc. 25th Annu. Int. Conf. IEEE Eng. Med. and Biol. Soc., 1027–1030 (2003). https://doi.org/10.1109/IEMBS.2003.1279419

27.

S. Sigurdsson et al., “Detection of skin cancer by classification of Raman spectra,” IEEE Trans. Biomed. Eng., 51 (10), 1784–1793 (2004). https://doi.org/10.1109/TBME.2004.831538

28.

F. B. Legesse et al., “Texture analysis and classification in coherent anti-Stokes Raman scattering (CARS) microscopy images for automated detection of skin cancer,” Comput. Med. Imaging Graphics, 43, 36–43 (2015). https://doi.org/10.1016/j.compmedimag.2015.02.010

29.

A. Murugan et al., “Diagnosis of skin cancer using machine learning techniques,” Microprocess. Microsyst., 81, 103727 (2020). https://doi.org/10.1016/j.micpro.2020.103727

30.

V. P. Singh et al., “Mammogram classification using selected GLCM features and random forest classifier,” Int. J. Comput. Sci. Inf. Secur., 14, 82–87 (2016).

31.

L. Breiman, “Random forests,” Mach. Learn., 45 (1), 5–32 (2001). https://doi.org/10.1023/A:1010933404324

32.

A. Criminisi and J. Shotton, Decision Forests for Computer Vision and Medical Image Analysis, Springer, London (2013).

33.

J.-H. Hur, S.-Y. Ihm and Y.-H. Park, “A variable impacts measurement in random forest for mobile cloud computing,” Wireless Commun. Mobile Comput., 2017, 1–13 (2017). https://doi.org/10.1155/2017/6817627

34.

S. Wang et al., “An improved random forest-based rule extraction method for breast cancer diagnosis,” Appl. Soft Comput., 86, 105941 (2020). https://doi.org/10.1016/j.asoc.2019.105941

35.

S. R. Safavian and D. Landgrebe, “A survey of decision tree classifier methodology,” IEEE Trans. Syst. Man Cybern., 21 (3), 660–674 (1991). https://doi.org/10.1109/21.97458

36.

L. Breiman et al., Classification and Regression Trees, Chapman and Hall/CRC (1984).

37.

F. Pedregosa et al., “Scikit-learn: machine learning in Python,” J. Mach. Learn. Res., 12 (85), 2825–2830 (2011).

38.

S. Sathyadevan and R. R. Nair, “Comparative analysis of decision tree algorithms: ID3, C4.5 and random forest,” Smart Innovation, Systems and Technologies, 31, 549–562, Springer, New Delhi (2015).

39.

V. Svetnik et al., “Random forest: a classification and regression tool for compound classification and QSAR modeling,” J. Chem. Inf. Comput. Sci., 43 (6), 1947–1958 (2003). https://doi.org/10.1021/ci034160g

40.

Y. Zhang and J. Yao, “Gini objective functions for three-way classifications,” Int. J. Approx. Reason., 81, 103–114 (2017). https://doi.org/10.1016/j.ijar.2016.11.005

41.

L. Raileanu and K. Stoffel, “Theoretical comparison between the Gini index and information gain criteria,” Ann. Math. Artif. Intel., 41, 77–93 (2004). https://doi.org/10.1023/B:AMAI.0000018580.96245.c6

42.

Y. Sun, A. K. C. Wong and M. S. Kamel, “Classification of imbalanced data: a review,” Int. J. Pattern Recognit. Artif. Intell., 23 (4), 687–719 (2009). https://doi.org/10.1142/S0218001409007326

43.

H. He and E. A. Garcia, “Learning from imbalanced data,” IEEE Trans. Knowl. Data Eng., 21 (9), 1263–1284 (2009). https://doi.org/10.1109/TKDE.2008.239

44.

M. Kuhn and K. Johnson, Applied Predictive Modeling, Springer, New York (2013).

45.

A. M. Molinaro, R. Simon and R. M. Pfeiffer, “Prediction error estimation: a comparison of resampling methods,” Bioinformatics, 21 (15), 3301–3307 (2005). https://doi.org/10.1093/bioinformatics/bti499

46.

J.-H. Kim, “Estimating classification error rate: repeated cross-validation, repeated hold-out and bootstrap,” Comput. Stat. Data Anal., 53 (11), 3735–3745 (2009). https://doi.org/10.1016/j.csda.2009.04.009

47.

H. R. Lee et al., “Digital histology with Mueller microscopy: how to mitigate an impact of tissue cut thickness fluctuations,” J. Biomed. Opt., 24 (7), 076004 (2019). https://doi.org/10.1117/1.JBO.24.7.076004

48.

J. Qi and D. S. Elson, “Mueller polarimetric imaging for surgical and diagnostic applications: a review,” J. Biophotonics, 10 (8), 950–982 (2017). https://doi.org/10.1002/jbio.201600152

49.

T. H. Ryu et al., “Features causing confusion between basal cell carcinoma and squamous cell carcinoma in clinical diagnosis,” Ann. Dermatol., 30 (1), 64–70 (2018). https://doi.org/10.5021/ad.2018.30.1.64

50.

A. P. Bradley, “The use of the area under the ROC curve in the evaluation of machine learning algorithms,” Pattern Recognit., 30 (7), 1145–1159 (1997). https://doi.org/10.1016/S0031-3203(96)00142-2

51.

C. Macdonald and I. Meglinski, “Backscattering of circular polarized light from a disperse random medium influenced by optical clearing,” Laser Phys. Lett., 8 (4), 324–328 (2011). https://doi.org/10.1002/lapl.201010133

Biography

Ngan Thanh Luu received her BS degree in biomedical engineering from International University, Vietnam National University, Vietnam, in 2021. Her research interests include artificial intelligence (AI), deep learning, and machine learning techniques for biosensing, skin cancer detection, cancer detection, biomedical detection, and applications.

Thanh-Hai Le received his BS degree in mechatronic engineering from Ho Chi Minh City (HCMC) University of Technology, Vietnam, and his MS and PhD degrees in bio-mechatronic engineering from Sungkyunkwan University, South Korea, in 2007 and 2011, respectively. In September 2011, he joined the HCMC University of Technology, where he is currently a lecturer in the Department of Mechatronics, Faculty of Mechanical Engineering. His current research interests are vision-guided systems, diagnostic imaging systems using MRIs, CT scans, and x-rays, industrial automation using PLC, and instructional methodology.

Quoc-Hung Phan received his BS degree in mechanical engineering from HCMC University of Technology, Vietnam, in 2004, his MS degree from the Department of Mechanical Engineering, Southern Taiwan University, Taiwan, in 2007, and his PhD from the Department of Mechanical Engineering, National Cheng Kung University in 2016. His research interests include surface plasmon resonance, Stokes–Mueller matrix polarimetry, optical biosensing, and non-invasive glucose monitoring devices.

Thi-Thu-Hien Pham received her BS degree in mechatronics from the HCMC University of Technology–Vietnam National University, HCMC, Vietnam, in 2003 and her MS and PhD degrees in mechanical engineering from Southern Taiwan University of Technology and National Cheng Kung University, Tainan, Taiwan, in 2007 and 2012, respectively. She is currently head of the Biomedical Photonics Lab and an associate professor in the Department of Biomedical Engineering, International University–Vietnam National University HCMC, Ho Chi Minh City, Vietnam. Her research interests are in the areas of polarized light-tissue studies, polarimetry, optical techniques in precision measurement to determine the optical properties of bio-samples (glucose, collagen, and tumor) or cancer detection (skin, liver, and breast), noninvasive glucose measurement, cell/tissue characterization, laser/LED applications, and automatic control systems.

CC BY: © The Authors. Published by SPIE under a Creative Commons Attribution 4.0 Unported License. Distribution or reproduction of this work in whole or in part requires full attribution of the original publication, including its DOI.

Ngan Thanh Luu, Thanh-Hai Le, Quoc-Hung Phan, and Thi-Thu-Hien Pham "Characterization of Mueller matrix elements for classifying human skin cancer utilizing random forest algorithm," Journal of Biomedical Optics 26(7), 075001 (5 July 2021). https://doi.org/10.1117/1.JBO.26.7.075001

Received: 16 April 2021; Accepted: 16 June 2021; Published: 5 July 2021
