Principal component and multiple correspondence analysis for handling mixed variables in the smoothed location model

Classifying objects into groups when the measured variables are a mixture of continuous and binary variables has attracted the attention of statisticians. Among discriminant methods for classification, the Smoothed Location Model (SLM) is used to handle data that contain both continuous...

Bibliographic Details
Main Author: Ngu, Penny Ai Huong
Format: Thesis
Language: English
Published: 2016
Subjects: QA299.6-433 Analysis
Online Access: http://etd.uum.edu.my/6034/
id my.uum.etd.6034
record_format eprints
spelling my.uum.etd.6034 2021-04-19T02:43:12Z http://etd.uum.edu.my/6034/ Principal component and multiple correspondence analysis for handling mixed variables in the smoothed location model Ngu, Penny Ai Huong QA299.6-433 Analysis 2016 Thesis NonPeerReviewed text en /6034/1/s817094_01.pdf text en /6034/2/s817094_02.pdf Ngu, Penny Ai Huong (2016) Principal component and multiple correspondence analysis for handling mixed variables in the smoothed location model. Masters thesis, Universiti Utara Malaysia.
institution Universiti Utara Malaysia
building UUM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Utara Malaysia
content_source UUM Electronic Theses
url_provider http://etd.uum.edu.my/
language English
topic QA299.6-433 Analysis
description Classifying objects into groups when the measured variables are a mixture of continuous and binary variables has attracted the attention of statisticians. Among discriminant methods for classification, the Smoothed Location Model (SLM) handles data that contain continuous and binary variables simultaneously. However, the model becomes infeasible when the data contain a large number of binary variables: many binary variables create numerous multinomial cells, and a large share of those cells end up empty. Past studies have shown that many empty cells degrade the performance of the constructed smoothed location model. To overcome the problem of many empty cells caused by a large number of measured variables (mainly binary), this study proposes four new SLMs that combine the existing SLM with Principal Component Analysis (PCA) and four types of Multiple Correspondence Analysis (MCA). PCA is used to handle a large number of continuous variables, whereas MCA is used to handle a large number of binary variables. The performance of the four proposed models, SLM+PCA+Indicator MCA, SLM+PCA+Burt MCA, SLM+PCA+Joint Correspondence Analysis (JCA), and SLM+PCA+Adjusted MCA, is compared on the basis of misclassification rate. Results of a simulation study show that the SLM+PCA+JCA model performs best in all tested conditions, as it extracted the smallest number of binary components and ran with the shortest computational time. An investigation on a real data set, the full breast cancer data, also showed that this model produces the lowest misclassification rate. The next lowest misclassification rate is obtained by SLM+PCA+Adjusted MCA, followed by the SLM+PCA+Burt MCA and SLM+PCA+Indicator MCA models. Although the SLM+PCA+Indicator MCA model performs worst of the four, it is still better than a few existing classification methods. Overall, the developed smoothed location models can be considered alternative methods for classification tasks involving a large number of mixed variables, mainly binary.
format Thesis
author Ngu, Penny Ai Huong
title Principal component and multiple correspondence analysis for handling mixed variables in the smoothed location model
publishDate 2016
url http://etd.uum.edu.my/6034/
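
The abstract's remark that many binary variables create numerous, mostly empty multinomial cells can be illustrated with a small sketch. It is not taken from the thesis: it only assumes that b binary variables define 2^b cells per group, and it draws hypothetical observations as independent fair coin flips. The names `cell_index`, `count_empty_cells`, and `simulate` are made up for this illustration.

```python
import random


def cell_index(binary_row):
    """Map a tuple of 0/1 values to a multinomial cell index in [0, 2**b)."""
    index = 0
    for bit in binary_row:
        index = index * 2 + bit
    return index


def count_empty_cells(binary_rows, n_binary):
    """Return (total_cells, empty_cells) for one group of observations."""
    total_cells = 2 ** n_binary
    occupied = {cell_index(row) for row in binary_rows}
    return total_cells, total_cells - len(occupied)


def simulate(n_obs, n_binary, seed=0):
    """Draw n_obs rows of independent fair-coin binary variables.

    This sampling scheme is an assumption made for illustration only.
    """
    rng = random.Random(seed)
    rows = [
        tuple(rng.randint(0, 1) for _ in range(n_binary))
        for _ in range(n_obs)
    ]
    return count_empty_cells(rows, n_binary)


if __name__ == "__main__":
    # With a fixed sample size, the share of empty cells grows quickly with b.
    for b in (2, 5, 10, 15):
        total, empty = simulate(n_obs=100, n_binary=b)
        print(f"b={b:2d}  cells={total:6d}  empty={empty:6d}  ({empty / total:.1%})")
```

With 100 observations and 10 binary variables there are 1024 cells, so at least 924 of them must be empty however the data fall. This is the situation in which, according to the abstract, the binary variables are reduced (there, with MCA) before the smoothed location model is fitted.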