Naive Bayes Classifier on the MNIST Dataset
Naive Bayes Classification
fitcnb
Train multiclass naive Bayes model
Syntax
Mdl = fitcnb(Tbl,ResponseVarName)
Mdl = fitcnb(Tbl,formula)
Mdl = fitcnb(Tbl,Y)
Mdl = fitcnb(X,Y)
Mdl = fitcnb(___,Name,Value)
[Mdl,AggregateOptimizationResults] = fitcnb(___)
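For orientation, here is a minimal sketch of the table-based and matrix-based syntaxes, using the fisheriris sample data that ships with Statistics and Machine Learning Toolbox (the variable names SL, SW, PL, PW are illustrative):
load fisheriris
tbl = array2table(meas,'VariableNames',{'SL','SW','PL','PW'});
tbl.Species = species;                      % response stored in the table
Mdl1 = fitcnb(tbl,'Species');               % Tbl,ResponseVarName form
Mdl2 = fitcnb(tbl,'Species ~ SL + PW');     % Tbl,formula form
Mdl3 = fitcnb(meas,species);                % X,Y matrix form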
Normal (Gaussian) Distribution
The 'normal' distribution (specify using 'normal') is appropriate for predictors that have normal distributions in each class. For each predictor you model with a normal distribution, the naive Bayes classifier estimates a separate normal distribution for each class by computing the mean and standard deviation of the training data in that class.
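As a small illustration (a sketch using the fisheriris sample data rather than the digit data used later), the per-class mean and standard deviation estimated for a normally distributed predictor can be read from the trained model:
load fisheriris
Mdl = fitcnb(meas,species,'DistributionNames','normal');
% DistributionParameters{c,p} holds [mean; std] of predictor p in class c
Mdl.DistributionParameters{1,1}   % mean and std of predictor 1 for the first class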
Kernel Distribution
The 'kernel' distribution (specify using 'kernel') is appropriate for predictors that have a continuous distribution. It does not require a strong assumption such as a normal distribution, and you can use it in cases where the distribution of a predictor may be skewed or have multiple peaks or modes. It requires more computing time and more memory than the normal distribution. For each predictor you model with a kernel distribution, the naive Bayes classifier computes a separate kernel density estimate for each class based on the training data for that class. By default, the kernel is the normal kernel, and the classifier selects a width automatically for each class and predictor. The software supports specifying different kernels for each predictor, and different widths for each predictor or class.
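A hedged sketch of those options, again on the fisheriris sample data; 'Kernel' and 'Width' are the fitcnb name-value arguments that control the smoother type and bandwidth, and the values shown here are illustrative:
load fisheriris
Mdl = fitcnb(meas,species, ...
    'DistributionNames','kernel', ...
    'Kernel','epanechnikov', ...   % same kernel type for every predictor
    'Width',0.5);                  % one fixed bandwidth for all classes and predictors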
When automatic hyperparameter optimization is enabled, fitcnb searches over the following:
DistributionNames — fitcnb searches among 'normal' and 'kernel'.
Kernel — fitcnb searches among 'normal', 'box', 'epanechnikov', and 'triangle'.
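These entries describe the search space used when optimization is requested. A sketch of such a call on the fisheriris sample data (the optimization settings shown are illustrative choices, not values from this experiment):
load fisheriris
rng(1)   % for reproducibility of the Bayesian optimization
Mdl = fitcnb(meas,species, ...
    'OptimizeHyperparameters',{'DistributionNames','Kernel','Width'}, ...
    'HyperparameterOptimizationOptions', ...
    struct('MaxObjectiveEvaluations',20,'ShowPlots',false));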
PCA is used for dimensionality reduction; it is an unsupervised, linear method. The explained variance contribution of each principal component is used as the selection criterion.
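As an alternative to thresholding individual components (done below with explained > 0.05), the variance contribution can also be applied cumulatively. A sketch, where X0 stands for any observations-by-features matrix and 95% is an illustrative threshold:
[coeff,score,~,~,explained] = pca(X0);    % X0 is a placeholder input matrix
k = find(cumsum(explained) >= 95, 1);     % smallest k reaching 95% cumulative variance
Xreduced = score(:,1:k);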
The data set contains 10,000 samples:
Label    Count
_____    _____
  0      1000
  1      1000
  2      1000
  3      1000
  4      1000
  5      1000
  6      1000
  7      1000
  8      1000
  9      1000
%% Load the digit image data set shipped with MATLAB
filename = 'mnist';
digitDatasetPath = fullfile(matlabroot,'toolbox','nnet','nndemos', ...
    'nndatasets','DigitDataset');
imds = imageDatastore(digitDatasetPath, ...
    'IncludeSubfolders',true,'LabelSource','foldernames');
%% Count the images per label
labelCount = countEachLabel(imds);
%% Vectorize every image into one column of imgs
Y = imds.Labels;
numImages = sum(labelCount.Count);
imgs = zeros(numel(readimage(imds,1)),numImages);   % preallocate pixels-by-images matrix
for i = 1:numImages
    img = im2double(readimage(imds,i));
    imgs(:,i) = img(:);
end
%% PCA on the observations-by-pixels matrix
[coeff,score,latent,tsquared,explained,mu] = pca(imgs');
% figure; plot(explained)
%% Keep the components that each explain more than 0.05% of the variance
idx = explained > 0.05;
X = score(:,idx);
%% Fit a naive Bayes model with kernel distributions
DistributionNames = 'kernel';
Mdl = fitcnb(X,Y,"DistributionNames",DistributionNames);
isLabels = resubPredict(Mdl);
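%% Evaluate the kernel-distribution model (illustrative follow-up)
% One possible way to assess the fit: resubstitution accuracy and a
% confusion chart; accKernel is an illustrative variable name.
accKernel = sum(isLabels == Y) / numel(Y)
figure
confusionchart(Y,isLabels);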
%%
% Refit using normal distributions; the data loading, vectorization, and PCA
% steps are identical to the ones above, so X and Y are reused here.
DistributionNames = 'normal';
Mdl = fitcnb(X,Y,"DistributionNames",DistributionNames);
isLabels = resubPredict(Mdl);
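%% Evaluate the normal-distribution model (illustrative follow-up)
% Resubstitution accuracy for the normal-distribution model, so it can be
% compared against the kernel-distribution model above; accNormal is an
% illustrative variable name.
accNormal = sum(isLabels == Y) / numel(Y)
figure
confusionchart(Y,isLabels);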