Document Details

Document Type : Thesis 
Document Title :
A STATISTICAL STUDY ABOUT PRINCIPAL COMPONENTS ANALYSIS OF MIXTURE MODELS
دراسة احصائية حول تحليل المركبات الرئيسية للنماذج المختلطة
 
Subject : Faculty of Science 
Document Language : Arabic 
Abstract : Data scientists use various machine learning algorithms to find patterns in large data sets that lead to practical insights. To treat such data properly, we need to examine whether it can be represented in a low-dimensional space. In addition, we fit the reduced data with different mixture models to find a suitable one. This step yields a statistical model whose estimated parameters describe the original data as closely as possible. In this research, we use principal component analysis to map the data from a high-dimensional to a low-dimensional space and to express the data in a way that highlights their similarities and differences. We propose two scenarios. The first treats the reduced data as a single Gaussian mixture model, with parameters estimated by the expectation-maximization algorithm. A clustering method is applied to the reduced data, and the mixture model is then fitted to the reduced data using the cluster means as initial values for the component means. The second scenario treats each variable in the reduced data individually, once by fitting a Gaussian mixture model to each variable and once by fitting a Cauchy mixture model to each variable. The benefit of the Cauchy mixture model lies in its ability to handle heterogeneity and outliers. The model parameters are again estimated with the expectation-maximization algorithm. The effectiveness of the discussed methods is demonstrated through a simulation study and on real datasets.
In this research, we also discuss principal component analysis of mixed data (PCAMIX) and demonstrate how it is useful for today's real-world data. Nowadays, most databases contain mixed data, meaning a combination of numerical and categorical variables. The PCAMIX method is used to handle this type of database and allows statistical information to be collected about the studied population. The efficiency of PCAMIX is investigated using a data set available in an R package and using simulated data.
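The following is a minimal sketch of the second scenario for a single reduced variable, not the thesis's own implementation. It assumes the principal component scores have already been computed, uses simulated values in their place, and assumes k-means as the clustering step (the abstract does not name the clustering method). The function names `kmeans_1d` and `fit_gmm_1d` are hypothetical, and Python is used here for illustration only. The Cauchy mixture variant is not shown.

```python
import math
import random


def kmeans_1d(xs, k, iters=50):
    """Plain 1-D k-means (assumed clustering step); returns sorted cluster means."""
    s = sorted(xs)
    # Start the centres at evenly spaced positions in the sorted data.
    centres = [s[(2 * j + 1) * len(s) // (2 * k)] for j in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda c: abs(x - centres[c]))
            groups[nearest].append(x)
        # Keep the old centre if a cluster happens to be empty.
        centres = [sum(g) / len(g) if g else centres[j]
                   for j, g in enumerate(groups)]
    return sorted(centres)


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_gmm_1d(xs, k, iters=200):
    """EM for a k-component univariate Gaussian mixture.

    Component means are initialised from the cluster means, as the
    abstract describes; weights and standard deviations start from
    simple defaults (an assumption of this sketch).
    """
    n = len(xs)
    means = kmeans_1d(xs, k)
    centre = sum(xs) / n
    spread = math.sqrt(sum((x - centre) ** 2 for x in xs) / n)
    sds = [spread] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, s)
                    for w, m, s in zip(weights, means, sds)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: update weights, means and standard deviations.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            sds[j] = max(math.sqrt(var), 1e-6)  # guard against collapse
    return weights, means, sds


# Simulated stand-in for one principal component score (not thesis data).
random.seed(0)
scores = ([random.gauss(-2.0, 1.0) for _ in range(300)]
          + [random.gauss(3.0, 1.0) for _ in range(200)])

weights, means, sds = fit_gmm_1d(scores, k=2)
print("weights:", weights)
print("means:", means)
print("standard deviations:", sds)
```

Starting EM from the cluster means rather than from random values tends to reduce the risk of converging to a poor local maximum, which is presumably why the abstract pairs the clustering step with the mixture fit.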
Supervisor : Dr. Zakiah Ibrahim Kalantan 
Thesis Type : Master Thesis 
Publishing Year : 1441 AH (2020 AD)
 
Added Date : Thursday, August 6, 2020 

Researchers

Researcher Name (Arabic) : ندى عائض القحطاني
Researcher Name (English) : Alqahtani, Nada Ayed
Researcher Type : Researcher
Dr Grade : Master
Email :

Files

File Name : 46658.pdf
Type : pdf
Description :
