
Multiple Imputation of Industrial Missing Data Using Chernoff Bound Techniques

Dr. A. Finny Belwin, Dr. A. Linda Sherin

Abstract



Data mining requires a pre-processing step in which the data are prepared and cleaned to ensure their quality. A missing value occurs when no value is stored for a variable in an observation. Imputation is a popular remedy because it is conceptually simple and because the resulting sample has the same number of observations as the full data set. Industry relies on it to handle very large volumes of data. Multiple imputation is a popular technique for analyzing incomplete data. A missing-at-random mechanism is often assumed when multiple imputation is performed, that is, the response mechanism is assumed not to depend on the missing variable. However, the assumption of ignorable nonresponse may lead to heavily biased estimates when the missingness is in fact nonignorable. In this research, we take the selection model approach and specify the response model and the respondents’ outcome model to capture the joint model of the study variable and the response indicator. This article considers several supervised and unsupervised machine learning techniques, including mean, median, standard deviation, regression, and the naïve Bayesian classifier. The methods are compared using correlation analysis, which indicates whether the imputed values are positively related, negatively related, or unrelated to each other. The quality of the imputed values is evaluated using the Chernoff bound theorem, which sets infimum and supremum bounds on the imputed data. Based on the behaviour of monotonic sequences and subsequences, we determine whether the imputed missing values form an increasing, a decreasing, or a bounded monotonic sequence with a finite limit, and we examine the property that every bounded sequence of missing values has a convergent subsequence. A standard machine learning repository dataset is used for the evaluation.
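As an illustration of the simplest techniques named above, the following minimal sketch fills missing entries with the mean or the median of the observed values and then checks, through a Pearson correlation, whether the two imputed series are positively related. The function names `impute` and `pearson` and the sample `readings` are hypothetical and are not taken from the article; the article's own procedure may differ.

```python
from statistics import mean, median


def impute(values, strategy="mean"):
    """Replace None entries with the mean or median of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "median":
        fill = median(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


# Hypothetical sensor readings; None marks a missing value.
readings = [2.0, None, 4.0, 10.0, None]

mean_filled = impute(readings, "mean")
median_filled = impute(readings, "median")

# A positive coefficient means the two imputed series move together.
r = pearson(mean_filled, median_filled)
print(mean_filled, median_filled, r)
```

Both strategies keep the sample size unchanged, which is the property the abstract highlights; they differ only in the constant used to fill the gaps.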
This article focuses primarily on how to apply the Chernoff bound theorem to the imputation of missing values. The proposed data augmentation algorithm uses the respondents’ outcome model and incorporates a semiparametric estimation of that model. The proposed multiple imputation method performs well if the specified response model is correct. In this paper, we propose a multiple imputation method for nonignorable nonresponse based on data augmentation techniques, an effective imputation approach for datasets where the proportion of missing data is high. Existing imputation approaches cannot satisfy analysis requirements because of their low accuracy and poor stability; in particular, imputation accuracy falls rapidly as the rate of missing data increases. Patterns of missing data can be defined with respect to cases or attributes. The global impact of the imputed data is assessed with several statistical tests. The imputation value is found to be high with the Darboux variate, which fixes the infimum and supremum of the missing data.
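The abstract does not state which form of the Chernoff bound is used. As a hedged sketch only, the example below uses the Hoeffding form (derived by the Chernoff method) for independent observations bounded between an infimum `lo` and a supremum `hi`: the probability that the sample mean deviates from its expectation by at least t is at most 2·exp(−2nt²/(hi − lo)²). Independence, boundedness, the function names, and the sample data are all assumptions introduced for illustration.

```python
import math


def tail_bound(n, lo, hi, t):
    """Hoeffding-form bound on P(|sample mean - expected mean| >= t)."""
    if n <= 0 or hi <= lo or t < 0:
        raise ValueError("need n > 0, hi > lo and t >= 0")
    return min(1.0, 2.0 * math.exp(-2.0 * n * t * t / (hi - lo) ** 2))


def halfwidth(n, lo, hi, delta):
    """Deviation t at which the bound above equals delta."""
    if n <= 0 or hi <= lo or not 0.0 < delta < 1.0:
        raise ValueError("need n > 0, hi > lo and 0 < delta < 1")
    return (hi - lo) * math.sqrt(math.log(2.0 / delta) / (2.0 * n))


# Hypothetical observed (non-missing) values.
observed = [2.0, 4.0, 10.0, 6.0, 8.0]
lo, hi = min(observed), max(observed)  # infimum and supremum of the sample
n = len(observed)
delta = 0.05

t = halfwidth(n, lo, hi, delta)
sample_mean = sum(observed) / n

# A mean-imputed value lies in [lo, hi] by construction; the interval
# sample_mean +/- t covers the expected mean with probability >= 1 - delta.
print(lo, hi, sample_mean, t, tail_bound(n, lo, hi, t))
```

With only five observations the interval is wide; the half-width shrinks in proportion to 1/√n as more observed values become available.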

Keywords: Imputation, Knowledge Transfer, Missing data, Data patterns, Multiple imputation, Chernoff Bound, Infimum and Supremum.




DOI: https://doi.org/10.37628/ijods.v6i2.639
