What to do when the dataset for the linkage/dendrogram functions contains NaNs?
Hi all,
I am new to cluster analysis and am trying to use the linkage function, followed by dendrogram, on an array that contains a few NaNs. linkage returns a Y-by-3 array whose third column contains only NaNs, and dendrogram then crashes.
Any advice on how to deal with this?
Cheers,
Jan
Accepted Answer
Sandeep
2023-8-31
Hi Jan Skerswetat,
When dealing with missing values (NaNs) in cluster analysis, there are a few approaches you can consider:
- Imputation: Fill in the missing values with estimates, using techniques such as mean imputation, median imputation, or regression imputation. Note that imputation can introduce bias into the analysis.
- Deletion: Remove the rows or columns that contain missing values. This is suitable when the missing values are few and randomly distributed. It always discards some information, and it can bias the results if the values are not missing at random.
- Distance-based imputation: Instead of imputing the missing values directly, calculate the pairwise distances between observations and use those distances to estimate the missing values. This can be useful if you have a good understanding of the underlying data structure.
Before applying any of these approaches, it's important to understand the nature and pattern of missing values in your dataset. Additionally, consider the potential impact of missing values on your cluster analysis results and whether imputation or deletion is more appropriate for your specific case.
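As a rough illustration of the approaches above, here is a minimal sketch. The matrix X, the variable names, and the choice of the 'average' linkage method are assumptions made up for this example, not taken from the question. The last option is a related variant rather than imputation proper: it leaves the NaNs in place and computes a Euclidean distance over the observed coordinates only, rescaled for the number of coordinates actually used, then passes that distance vector to linkage. It assumes the Statistics and Machine Learning Toolbox (pdist, linkage, dendrogram) and R2016b or later for implicit expansion.

```matlab
% Hypothetical data: 6 observations (rows), 3 variables (columns), two NaNs.
X = [1.0 2.0 3.0;
     1.1 NaN 3.2;
     0.9 2.1 2.9;
     5.0 6.0 7.0;
     5.2 6.1 NaN;
     4.9 5.8 7.1];

% Inspect the pattern of missing values before choosing an approach.
nanPerColumn = sum(isnan(X), 1);   % number of NaNs in each variable
nanPerRow    = sum(isnan(X), 2);   % number of NaNs in each observation

% Option 1: deletion. Keep only the rows without any NaN.
Xdel = X(~any(isnan(X), 2), :);

% Option 2: mean imputation. Replace each NaN with the mean of the
% non-missing entries in the same column.
colMeans  = mean(X, 1, 'omitnan');
fillVals  = repmat(colMeans, size(X, 1), 1);
Ximp      = X;
Ximp(isnan(X)) = fillVals(isnan(X));

% Option 3 (variant): NaN-aware Euclidean distance. pdist calls the handle
% with xi as one row and xj as a block of rows; coordinates that are NaN in
% either observation are skipped and the sum is rescaled accordingly.
nanEuc = @(xi, xj) sqrt(size(xi, 2) .* sum((xi - xj).^2, 2, 'omitnan') ...
                        ./ sum(~isnan(xi - xj), 2));
D = pdist(X, nanEuc);

% Both inputs are free of NaNs in the distances, so the third column of
% the linkage output is finite and dendrogram can draw the tree.
Zimp  = linkage(Ximp, 'average');
Zdist = linkage(D, 'average');
dendrogram(Zimp);
```

Which option is appropriate depends on how many values are missing and why. With only a few NaNs scattered at random, deletion or the NaN-aware distance is usually the least invasive. Note that if two observations share no observed coordinates, the custom distance itself returns NaN, so it is worth checking D with isnan before calling linkage.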
You can refer to the following page for some commonly used imputation functions:
Hope you find it helpful.