Welcome to the MATLAB tutorial version of Data Science Rosetta Stone. Before beginning this tutorial, please check to make sure you have MATLAB installed.
Note: In MATLAB,
% This is a single line comment.
%{
This is a paragraph comment
%}
Now let’s get started!
First specify the format of the variables to be read in: %C = categorical, %d = integer, %f = floating-point number.
formatSpec = '%C%C%d%f%f';
student = readtable('class.csv', 'Delimiter', ',', 'Format', formatSpec);
MATLAB can also read tables from .xlsx files, the Excel spreadsheet format.
student_xlsx = readtable('class.xlsx');
student_json = jsondecode(fileread('class.json'));
summary(student);
Variables:
    Name: 19×1 categorical
        Values: Alfred 1, Alice 1, Barbara 1, Carol 1, Henry 1, James 1, Jane 1, Janet 1, Jeffrey 1, John 1, Joyce 1, Judy 1, Louise 1, Mary 1, Philip 1, Robert 1, Ronald 1, Thomas 1, William 1
    Sex: 19×1 categorical
        Values: F 9, M 10
    Age: 19×1 int32
        Values: Min 11, Median 13, Max 16
    Height: 19×1 double
        Values: Min 51.3, Median 62.8, Max 72
    Weight: 19×1 double
        Values: Min 50.5, Median 99.5, Max 150
The ":" operator tells MATLAB to select all columns (variables), while "1:5" selects only the first 5 observations.
disp(student(1:5,:));
      Name      Sex    Age    Height    Weight
    _______    ___    ___    ______    ______
    Alfred     M      14       69      112.5
    Alice      F      13     56.5       84
    Barbara    F      13     65.3       98
    Carol      F      14     62.8      102.5
    Henry      M      14     63.5      102.5
age = table2array(student(:,3));
disp(mean(age));
13.3158
height = table2array(student(:,4));
disp(mean(height));
62.3368
weight = table2array(student(:,5));
disp(mean(weight));
100.0263
numeric_vars = student(:,{'Age', 'Height', 'Weight'});
statarray = grpstats(numeric_vars, [], {'min', 'median', 'mean', 'max'});
disp(statarray);
           GroupCount    min_Age    median_Age    mean_Age    max_Age    min_Height    median_Height    mean_Height    max_Height    min_Weight    median_Weight    mean_Weight    max_Weight
           __________    _______    __________    ________    _______    __________    _____________    ___________    __________    __________    _____________    ___________    __________
    All        19          11           13         13.316       16         51.3            62.8           62.337           72           50.5            99.5           100.03          150
weight = table2array(student(:,5));
disp(std(weight));
22.7739
disp(sum(weight));
1.9005e+03
disp(length(weight));
19
disp(max(weight));
150
disp(min(weight));
50.5000
disp(median(weight));
99.5000
tabulate(age);
  Value    Count    Percent
      1        0      0.00%
      2        0      0.00%
      3        0      0.00%
      4        0      0.00%
      5        0      0.00%
      6        0      0.00%
      7        0      0.00%
      8        0      0.00%
      9        0      0.00%
     10        0      0.00%
     11        2     10.53%
     12        5     26.32%
     13        3     15.79%
     14        4     21.05%
     15        4     21.05%
     16        1      5.26%
sex = table2array(student(:,{'Sex'}));
tabulate(sex);
  Value    Count    Percent
  F            9     47.37%
  M           10     52.63%
crosstable = varfun(@(x) length(x), student, 'GroupingVariables', {'Age' 'Sex'}, ...
                    'InputVariables', {});
disp(crosstable);
    Age    Sex    GroupCount
    ___    ___    __________
    11     F          1
    11     M          1
    12     F          2
    12     M          3
    13     F          2
    13     M          1
    14     F          2
    14     M          2
    15     F          2
    15     M          2
    16     M          1
% Find the indices of those students who are females, and then get those
% observations from the student table.
females = student(student.Sex == 'F',:);
disp(females(1:5,:));
      Name      Sex    Age    Height    Weight
    _______    ___    ___    ______    ______
    Alice      F      13     56.5       84
    Barbara    F      13     65.3       98
    Carol      F      14     62.8      102.5
    Jane       F      12     59.8       84.5
    Janet      F      15     62.5      112.5
The first argument of the cat function is dim, specified as 2 here to concatenate column-wise.
height_weight = cat(2, table2array(student(:,4)), table2array(student(:,5)));
disp(corr(height_weight));
    1.0000    0.8778
    0.8778    1.0000
Weight = table2array(student(:,{'Weight'}));
histogram(Weight, 40:20:160)
xlabel('Weight');
ylabel('Frequency');
boxplot(Weight);
mx = mean(Weight);
ylabel('Weight');
hold on
plot(mx, 'd')
hold off
Height = table2array(student(:,{'Height'}));
scatter(Height, Weight)
xticks(50:5:75)
yticks(40:20:160)
xlabel('Height')
ylabel('Weight')
scatter(Height, Weight)
xticks(50:5:75)
yticks(40:20:160)
xlabel('Height')
ylabel('Weight')
b = polyfit(Height, Weight, 1);
m = b(1);
y_int = b(2);
lsline
annotation('textbox', [.2 .5 .3 .3], 'String', sprintf('Line: y = %fx + %f', m, y_int), ...
           'FitBoxToText', 'on');
Sex = table2array(student(:,{'Sex'}));
histogram(Sex)
xlabel('Sex')
ylabel('Frequency')
females = student(student.Sex == 'F',:);
males = student(student.Sex == 'M',:);
Female_Weight = table2array(females(:,{'Weight'}));
Male_Weight = table2array(males(:,{'Weight'}));
clf
boxplot(Weight, Sex);
means = [mean(Female_Weight), mean(Male_Weight)];
xlabel('Sex');
ylabel('Weight');
hold on
plot(means, 'd')
hold off
% The "./" (and similarly, ".^2") tells MATLAB to divide (and similarly,
% exponentiate) element-wise, instead of matrix-wise.
student.BMI = student.Weight ./ student.Height .^ 2 * 703;
disp(student(1:5,:));
      Name      Sex    Age    Height    Weight     BMI
    _______    ___    ___    ______    ______    ______
    Alfred     M      14       69      112.5     16.612
    Alice      F      13     56.5       84       18.499
    Barbara    F      13     65.3       98       16.157
    Carol      F      14     62.8      102.5     18.271
    Henry      M      14     63.5      102.5      17.87
student.BMI_Class = student.Name;
for i = 1:size(student,1)
    if student.BMI(i) < 19.0
        student.BMI_Class(i) = 'Underweight';
    else
        student.BMI_Class(i) = 'Healthy';
    end
end
disp(student(1:5,:));
      Name      Sex    Age    Height    Weight     BMI       BMI_Class
    _______    ___    ___    ______    ______    ______    ___________
    Alfred     M      14       69      112.5     16.612    Underweight
    Alice      F      13     56.5       84       18.499    Underweight
    Barbara    F      13     65.3       98       16.157    Underweight
    Carol      F      14     62.8      102.5     18.271    Underweight
    Henry      M      14     63.5      102.5      17.87    Underweight
student.LogWeight = log(student.Weight);
student.ExpAge = exp(double(student.Age));
student.SqrtHeight = sqrt(student.Height);
student.BMI_Neg = student.BMI;
for i = 1:size(student,1)
    if student.BMI(i) < 19.0
        student.BMI_Neg(i) = -student.BMI(i);
    end
end
student.BMI_Pos = abs(student.BMI_Neg);
student.BMI_Check = (student.BMI_Pos == student.BMI);
disp(student(1:5,:));
      Name      Sex    Age    Height    Weight     BMI      BMI_Class      LogWeight      ExpAge      SqrtHeight    BMI_Neg    BMI_Pos    BMI_Check
    _______    ___    ___    ______    ______    ______    ___________    _________    __________    __________    _______    _______    _________
    Alfred     M      14       69      112.5     16.612    Underweight      4.723      1.2026e+06      8.3066      -16.612     16.612      true
    Alice      F      13     56.5       84       18.499    Underweight     4.4308      4.4241e+05      7.5166      -18.499     18.499      true
    Barbara    F      13     65.3       98       16.157    Underweight      4.585      4.4241e+05      8.0808      -16.157     16.157      true
    Carol      F      14     62.8      102.5     18.271    Underweight     4.6299      1.2026e+06      7.9246      -18.271     18.271      true
    Henry      M      14     63.5      102.5      17.87    Underweight     4.6299      1.2026e+06      7.9687      -17.87      17.87       true
Setting the variables to an empty array deletes them from the table.
student.LogWeight = [];
student.ExpAge = [];
student.SqrtHeight = [];
student.BMI_Neg = [];
student.BMI_Pos = [];
student.BMI_Check = [];
disp(student(1:5,:));
      Name      Sex    Age    Height    Weight     BMI       BMI_Class
    _______    ___    ___    ______    ______    ______    ___________
    Alfred     M      14       69      112.5     16.612    Underweight
    Alice      F      13     56.5       84       18.499    Underweight
    Barbara    F      13     65.3       98       16.157    Underweight
    Carol      F      14     62.8      102.5     18.271    Underweight
    Henry      M      14     63.5      102.5      17.87    Underweight
student = sortrows(student, 'Age');
disp(student(1:5,:));
     Name      Sex    Age    Height    Weight     BMI       BMI_Class
    ______    ___    ___    ______    ______    ______    ___________
    Joyce     F      11     51.3       50.5      13.49     Underweight
    Thomas    M      11     57.5       85        18.073    Underweight
    James     M      12     57.3       83        17.772    Underweight
    Jane      F      12     59.8       84.5      16.612    Underweight
    John      M      12     59         99.5      20.094    Healthy
student = sortrows(student, 'Sex');
disp(student(1:5,:));
      Name      Sex    Age    Height    Weight     BMI       BMI_Class
    _______    ___    ___    ______    ______    ______    ___________
    Joyce      F      11     51.3       50.5      13.49     Underweight
    Jane       F      12     59.8       84.5     16.612     Underweight
    Louise     F      12     56.3       77       17.078     Underweight
    Alice      F      13     56.5       84       18.499     Underweight
    Barbara    F      13     65.3       98       16.157     Underweight
group_means = grpstats(student, 'Sex', 'mean', 'DataVars', {'Age', 'Height', 'Weight', 'BMI'});
disp(group_means);
         Sex    GroupCount    mean_Age    mean_Height    mean_Weight    mean_BMI
         ___    __________    ________    ___________    ___________    ________
    F    F           9         13.222       60.589          90.111       17.051
    M    M          10         13.4         63.91          108.95        18.594
disp(student(15:19,:));
      Name      Sex    Age    Height    Weight     BMI       BMI_Class
    _______    ___    ___    ______    ______    ______    ___________
    Alfred     M      14       69      112.5     16.612    Underweight
    Henry      M      14     63.5      102.5      17.87    Underweight
    Ronald     M      15       67      133       20.828    Healthy
    William    M      15     66.5      112       17.805    Underweight
    Philip     M      16       72      150       20.341    Healthy
newObs = {'Name', 'Sex', 'Age', 'Height', 'Weight', 'BMI', 'BMI_Class'; ...
          'Jane', 'F', 14, 56.3, 77.0, 17.077695, 'Underweight'};
newTable = dataset2table(cell2dataset(newObs));
student = vertcat(student, newTable);
disp(student(16:20,:));
      Name      Sex    Age    Height    Weight     BMI       BMI_Class
    _______    ___    ___    ______    ______    ______    ___________
    Henry      M      14     63.5      102.5      17.87    Underweight
    Ronald     M      15       67      133       20.828    Healthy
    William    M      15     66.5      112       17.805    Underweight
    Philip     M      16       72      150       20.341    Healthy
    Jane       F      14     56.3       77       17.078    Underweight
% To create a user-defined function, create a new file in MATLAB with the
% function definition, and save the file as function_name.m. Here, toKG.m
% would be:
%
% function KG = toKG(lb)
%     KG = 0.45359237 * lb;
% end
student.Weight_KG = toKG(student.Weight);
disp(student(1:5,:));
      Name      Sex    Age    Height    Weight     BMI       BMI_Class     Weight_KG
    _______    ___    ___    ______    ______    ______    ___________    _________
    Joyce      F      11     51.3       50.5      13.49     Underweight     22.906
    Jane       F      12     59.8       84.5     16.612     Underweight     38.329
    Louise     F      12     56.3       77       17.078     Underweight     34.927
    Alice      F      13     56.5       84       18.499     Underweight     38.102
    Barbara    F      13     65.3       98       16.157     Underweight     44.452
formatSpec = '%C%f%f%f%f%f%f';
fish = readtable('fish.csv', 'Delimiter', ',', 'Format', formatSpec);
fish = sortrows(fish, 'Weight', 'descend');
disp(fish(1:5,:));
    Species    Weight    Length1    Length2    Length3    Height    Width
    _______    ______    _______    _______    _______    ______    ______
    Bream       NaN       29.5       32         37.3      13.913    5.0728
    Pike       1650       59         63.4       68        10.812    7.48
    Pike       1600       56         60         64         9.6      6.144
    Pike       1550       56         60         64         9.6      6.144
    Pike       1250       52         56         59.7      10.686    6.9849
fish = rmmissing(fish);
disp(fish(1:5,:));
    Species    Weight    Length1    Length2    Length3    Height    Width
    _______    ______    _______    _______    _______    ______    ______
    Pike       1650       59         63.4       68        10.812    7.48
    Pike       1600       56         60         64         9.6      6.144
    Pike       1550       56         60         64         9.6      6.144
    Pike       1250       52         56         59.7      10.686    6.9849
    Perch      1100       39         42         44.6      12.8      6.8684
formatSpec = '%C%C%d%f%f';
student = readtable('class.csv', 'Delimiter', ',', 'Format', formatSpec);
student1 = student(:, {'Name', 'Sex', 'Age'});
disp(student1(1:5,:));
      Name      Sex    Age
    _______    ___    ___
    Alfred     M      14
    Alice      F      13
    Barbara    F      13
    Carol      F      14
    Henry      M      14
student2 = student(:, {'Name', 'Height', 'Weight'});
disp(student2(1:5,:));
      Name      Height    Weight
    _______    ______    ______
    Alfred       69       112.5
    Alice       56.5       84
    Barbara     65.3       98
    Carol       62.8      102.5
    Henry       63.5      102.5
new = join(student1, student2);
disp(new(1:5,:));
      Name      Sex    Age    Height    Weight
    _______    ___    ___    ______    ______
    Alfred     M      14       69      112.5
    Alice      F      13     56.5       84
    Barbara    F      13     65.3       98
    Carol      F      14     62.8      102.5
    Henry      M      14     63.5      102.5
newstudent1 = student(:, {'Name', 'Sex', 'Age'});
disp(newstudent1(1:5,:));
      Name      Sex    Age
    _______    ___    ___
    Alfred     M      14
    Alice      F      13
    Barbara    F      13
    Carol      F      14
    Henry      M      14
newstudent2 = student(:, {'Height', 'Weight'});
disp(newstudent2(1:5,:));
    Height    Weight
    ______    ______
      69      112.5
    56.5       84
    65.3       98
    62.8      102.5
    63.5      102.5
new2 = [newstudent1, newstudent2];
disp(new2(1:5,:));
      Name      Sex    Age    Height    Weight
    _______    ___    ___    ______    ______
    Alfred     M      14       69      112.5
    Alice      F      13     56.5       84
    Barbara    F      13     65.3       98
    Carol      F      14     62.8      102.5
    Henry      M      14     63.5      102.5
% Currently there is not a built-in MATLAB function for creating pivot tables;
% only user-defined functions could be used to create them.
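A pivot-style table can, however, be assembled from existing table functions. A minimal sketch, assuming the student table loaded earlier: count rows per Age/Sex group with varfun (as in the cross-tabulation section above), then spread the Sex values into columns with unstack.

```matlab
% Grouped row counts by Age and Sex (varfun adds a GroupCount variable).
counts = varfun(@(x) length(x), student, 'GroupingVariables', {'Age' 'Sex'}, ...
                'InputVariables', {});

% Spread Sex into columns, yielding a pivot-style Age-by-Sex count table.
pivoted = unstack(counts(:, {'Age', 'Sex', 'GroupCount'}), 'GroupCount', 'Sex');
disp(pivoted);
```

Groups with no observations (e.g. 16-year-old females) appear as NaN in the unstacked table.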
price = readtable('price.xlsx');
disp(unique(price.STATE));
    'Baja California Norte'
    'British Columbia'
    'California'
    'Campeche'
    'Colorado'
    'Florida'
    'Illinois'
    'Michoacan'
    'New York'
    'North Carolina'
    'Nuevo Leon'
    'Ontario'
    'Quebec'
    'Saskatchewan'
    'Texas'
    'Washington'
formatSpec = '%f%f%f%f%d';
iris = readtable('iris.csv', 'Delimiter', ',', 'Format', formatSpec);
features = table2array(iris(:, {'SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth'}));
% Z-score function to scale the features
Zsc = @(x) (x - mean(x)) ./ std(x);
features_scaled = Zsc(features);
disp(pca(features_scaled));
    0.5224    0.3723    0.7210   -0.2620
   -0.2634    0.9256   -0.2420    0.1241
    0.5813    0.0211   -0.1409    0.8012
    0.5656    0.0654   -0.6338   -0.5235
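Beyond the component coefficients shown above, pca can also return the variance captured by each component. A short sketch, assuming features_scaled from the previous step:

```matlab
% latent holds the variance of each principal component; dividing by the
% total gives the proportion of variance explained per component.
[coeff, score, latent] = pca(features_scaled);
disp(latent / sum(latent));
```

The first component typically dominates for the iris features, which is why two components are often sufficient for plotting.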
sizeIris = size(iris);
numRows = sizeIris(1);
% Set the seed of the random number generator for reproducibility.
rng(29);
[trainInd, valInd, testInd] = dividerand(numRows, 0.7, 0, 0.3);
train = iris(trainInd,:);
test = iris(testInd,:);
csvwrite('iris_train_ML.csv', table2array(train));
csvwrite('iris_test_ML.csv', table2array(test));
formatSpec = '%d%f%f%C%C%C%C%d';
tips = readtable('tips.csv', 'Delimiter', ',', 'Format', formatSpec);
tips.fifteen = 0.15 * tips.total_bill;
tips.greater15 = (tips.tip > tips.fifteen);
[b, dev, stats] = glmfit(tips.total_bill, tips.greater15, 'binomial', 'link', 'logit');
fprintf('The coefficients of the model are: %.3f and %.3f\n', b(1), b(2));
fprintf('The deviance of the fit is: %.3f\n', dev);
fprintf('Other statistics of the model are:\n');
disp(stats);

The coefficients of the model are: 1.648 and -0.072
The deviance of the fit is: 313.743
Other statistics of the model are:
         beta: [2×1 double]
          dfe: 242
         sfit: 1.0097
            s: 1
      estdisp: 0
         covb: [2×2 double]
           se: [2×1 double]
    coeffcorr: [2×2 double]
            t: [2×1 double]
            p: [2×1 double]
        resid: [244×1 double]
       residp: [244×1 double]
       residd: [244×1 double]
       resida: [244×1 double]
          wts: [244×1 double]
linreg = fitlm(tips,'tip~total_bill');
disp(linreg);
Linear regression model:
    tip ~ 1 + total_bill

Estimated Coefficients:
                   Estimate       SE        tStat       pValue
                   ________    _________    ______    __________
    (Intercept)    0.92027       0.15973    5.7612    2.5264e-08
    total_bill     0.10502     0.0073648     14.26    6.6925e-34

Number of observations: 244, Error degrees of freedom: 242
Root Mean Squared Error: 1.02
R-squared: 0.457, Adjusted R-Squared 0.454
F-statistic vs. constant model: 203, p-value = 6.69e-34
formatSpec = '%f%d%d%d%f';
train = readtable('tips_train.csv', 'Delimiter', ',', 'Format', formatSpec);
test = readtable('tips_test.csv', 'Delimiter', ',', 'Format', formatSpec);
train.fifteen = 0.15 * train.total_bill;
train.greater15 = (train.tip > train.fifteen);
test.fifteen = 0.15 * test.total_bill;
test.greater15 = (test.tip > test.fifteen);
[b, dev, stats] = glmfit(train.total_bill, train.greater15, 'binomial', 'link', 'logit');
fprintf('The coefficients of the model are: %.3f and %.3f\n', b(1), b(2));
fprintf('The deviance of the fit is: %.3f\n', dev);
fprintf('Other statistics of the model are:\n');
disp(stats);

The coefficients of the model are: 1.646 and -0.071
The deviance of the fit is: 250.584
Other statistics of the model are:
         beta: [2×1 double]
          dfe: 193
         sfit: 1.0107
            s: 1
      estdisp: 0
         covb: [2×2 double]
           se: [2×1 double]
    coeffcorr: [2×2 double]
            t: [2×1 double]
            p: [2×1 double]
        resid: [195×1 double]
       residp: [195×1 double]
       residd: [195×1 double]
       resida: [195×1 double]
          wts: [195×1 double]
predictions = glmval(b, test.total_bill, 'logit');
predY = round(predictions);
Results = strings(size(test,1),1);
for i = 1:size(test,1)
    if (predY(i) == test.greater15(i))
        Results(i) = 'Correct';
    else
        Results(i) = 'Wrong';
    end
end
tabulate(Results);
      Value    Count    Percent
    Correct       34     69.39%
      Wrong       15     30.61%
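Instead of tallying Correct/Wrong by hand, a confusion matrix summarizes the same predictions by class. A minimal sketch, assuming predY and test from the block above:

```matlab
% Rows are actual classes (0/1), columns are predicted classes.
% Both inputs are cast to double so the grouping types match.
C = confusionmat(double(test.greater15), predY);
disp(C);
```

The diagonal entries of C sum to the "Correct" count from tabulate; the off-diagonal entries sum to "Wrong".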
train = readtable('boston_train.xlsx');
test = readtable('boston_test.xlsx');
x_train = train;
x_train.Target = [];
y_train = train.Target;
x_test = test;
x_test.Target = [];
y_test = test.Target;
linreg = fitlm(table2array(x_train), y_train);
disp(linreg);
Linear regression model:
    y ~ 1 + x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10 + x11 + x12 + x13

Estimated Coefficients:
                    Estimate         SE         tStat       pValue
                   __________    _________    ________    __________
    (Intercept)        36.108        6.505      5.5509    5.7321e-08
    x1              -0.085634     0.042774      -2.002      0.046077
    x2               0.046034      0.01715      2.6842     0.0076262
    x3               0.036413     0.076006     0.47909       0.63219
    x4                  3.248       1.0741      3.0238     0.0026862
    x5                -14.873       4.6361     -3.2081     0.0014633
    x6                 3.5769      0.53699      6.6609    1.0962e-10
    x7             -0.0087032     0.016853    -0.51643       0.60589
    x8                -1.3689      0.25296     -5.4115    1.1818e-07
    x9                0.31312     0.082366      3.8016    0.00017037
    x10             -0.012882    0.0045986     -2.8012     0.0053829
    x11               -0.9769        0.171      -5.713    2.4255e-08
    x12              0.011326    0.0033585      3.3722    0.00083155
    x13              -0.52672     0.062563      -8.419    1.0751e-15

Number of observations: 354, Error degrees of freedom: 340
Root Mean Squared Error: 4.99
R-squared: 0.724, Adjusted R-Squared 0.713
F-statistic vs. constant model: 68.5, p-value = 1.39e-86
predictions = predict(linreg, table2array(x_test));
sq_diff = (predictions - y_test) .^ 2;
disp(mean(sq_diff));
17.7713
train = readtable('breastcancer_train.xlsx');
test = readtable('breastcancer_test.xlsx');
x_train = train;
x_train.Target = [];
y_train = train.Target;
x_test = test;
x_test.Target = [];
y_test = test.Target;
rng(29);
treeMod = fitctree(x_train, y_train);
var_import = predictorImportance(treeMod);
var_import = var_import';
var_import(:,2) = var_import;
for i = 1:size(var_import,1)
    var_import(i,1) = i;
end
var_import = sortrows(var_import, 2, 'descend');
disp(var_import(1:5,:));
    24.0000    0.0292
    28.0000    0.0077
     2.0000    0.0019
    12.0000    0.0013
     5.0000    0.0009
predictions = predict(treeMod, x_test);
Results = strings(size(test,1),1);
for i = 1:size(test,1)
    if (predictions(i) == y_test(i))
        Results(i) = 'Correct';
    else
        Results(i) = 'Wrong';
    end
end
tabulate(Results);
      Value    Count    Percent
    Correct      160     93.57%
      Wrong       11      6.43%
train = readtable('boston_train.xlsx');
test = readtable('boston_test.xlsx');
x_train = train;
x_train.Target = [];
y_train = train.Target;
x_test = test;
x_test.Target = [];
y_test = test.Target;
rng(29);
treeMod = fitrtree(x_train, y_train);
var_import = predictorImportance(treeMod);
var_import = var_import';
var_import(:,2) = var_import;
for i = 1:size(var_import,1)
    var_import(i,1) = i;
end
var_import = sortrows(var_import, 2, 'descend');
disp(var_import(1:5,:));
     6.0000    0.6921
    13.0000    0.2451
     8.0000    0.1227
     5.0000    0.0506
     1.0000    0.0359
train = readtable('breastcancer_train.xlsx');
test = readtable('breastcancer_test.xlsx');
x_train = train;
x_train.Target = [];
y_train = train.Target;
x_test = test;
x_test.Target = [];
y_test = test.Target;
rng(29);
rfMod = fitrensemble(table2array(x_train), y_train, 'Method', 'bag');
var_import = predictorImportance(rfMod);
var_import = var_import';
var_import(:,2) = var_import;
for i = 1:size(var_import,1)
    var_import(i,1) = i;
end
var_import = sortrows(var_import, 2, 'descend');
disp(var_import(1:5,:));
    23.0000    0.0045
    28.0000    0.0041
    24.0000    0.0038
     8.0000    0.0029
    21.0000    0.0026
predictions = predict(rfMod, table2array(x_test));
predictions = round(predictions);
Results = strings(size(test,1),1);
for i = 1:size(test,1)
    if (predictions(i) == y_test(i))
        Results(i) = 'Correct';
    else
        Results(i) = 'Wrong';
    end
end
tabulate(Results);
      Value    Count    Percent
    Correct      166     97.08%
      Wrong        5      2.92%
train = readtable('boston_train.xlsx');
test = readtable('boston_test.xlsx');
x_train = train;
x_train.Target = [];
y_train = train.Target;
x_test = test;
x_test.Target = [];
y_test = test.Target;
rng(29);
rfMod = fitrensemble(table2array(x_train), y_train, 'Method', 'bag');
var_import = predictorImportance(rfMod);
var_import = var_import';
var_import(:,2) = var_import;
for i = 1:size(var_import,1)
    var_import(i,1) = i;
end
var_import = sortrows(var_import, 2, 'descend');
disp(var_import(1:5,:));
     6.0000    0.4811
    13.0000    0.4738
     1.0000    0.0873
     3.0000    0.0842
     8.0000    0.0723
predictions = predict(rfMod, table2array(x_test));
sq_diff = (predictions - y_test) .^ 2;
disp(mean(sq_diff));
10.5583
Note: in a real implementation, the features should be scaled first.
train = readtable('breastcancer_train.xlsx');
test = readtable('breastcancer_test.xlsx');
x_train = train;
x_train.Target = [];
y_train = train.Target;
x_test = test;
x_test.Target = [];
y_test = test.Target;
rng(29);
svMod = fitcsvm(x_train, y_train);
predictions = predict(svMod, x_test);
Results = strings(size(test,1),1);
for i = 1:size(test,1)
    if (predictions(i) == y_test(i))
        Results(i) = 'Correct';
    else
        Results(i) = 'Wrong';
    end
end
tabulate(Results);
      Value    Count    Percent
    Correct      163     95.32%
      Wrong        8      4.68%
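The scaling mentioned in the note can be done by standardizing each feature with the training-set statistics; a minimal sketch, assuming the x_train, x_test, and y_train variables from the block above (the test set is scaled with the training mean and standard deviation to avoid leaking test information into the fit):

```matlab
% Z-score scaling fitted on the training set only.
Xtr = table2array(x_train);
Xte = table2array(x_test);
mu = mean(Xtr);
sigma = std(Xtr);
Xtr_scaled = (Xtr - mu) ./ sigma;
Xte_scaled = (Xte - mu) ./ sigma;

% Fit the SVM on the scaled features.
svMod = fitcsvm(Xtr_scaled, y_train);
```

fitcsvm also accepts a 'Standardize', true name-value pair that performs an equivalent internal standardization.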
Note: in a real implementation, the features should be scaled first.
train = readtable('boston_train.xlsx');
test = readtable('boston_test.xlsx');
x_train = train;
x_train.Target = [];
y_train = train.Target;
x_test = test;
x_test.Target = [];
y_test = test.Target;
rng(29);
svMod = fitrsvm(x_train, y_train);
formatSpec = '%f%f%f%f%d';
iris = readtable('iris.csv', 'Delimiter', ',', 'Format', formatSpec);
features = table2array(iris(:, {'SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth'}));
iris.Labels = strings(size(iris,1),1);
for i = 1:size(iris,1)
    if (iris.Target(i) == 0)
        iris.Labels(i) = 'Setosa';
    elseif (iris.Target(i) == 1)
        iris.Labels(i) = 'Versicolor';
    else
        iris.Labels(i) = 'Virginica';
    end
end
rng(29);
[labels, C] = kmeans(features, 3);
iris.Predictions = labels;
disp(crosstab(iris.Labels, iris.Predictions));
     0    50     0
     3     0    47
    36     0    14
rng(29);
tree = linkage(features, 'ward', 'euclidean', 'savememory', 'on');
labels = cluster(tree, 'maxclust', 3);
iris.Predictions = labels;
disp(crosstab(iris.Labels, iris.Predictions));
     0     0    50
     1    49     0
    35    15     0
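The linkage tree itself can also be inspected visually; a short sketch, assuming the tree variable computed above:

```matlab
% Plot the agglomeration hierarchy; by default dendrogram shows the top
% 30 merge nodes, which is enough to see the three main clusters.
dendrogram(tree);
xlabel('Observation group')
ylabel('Linkage distance')
```

Cutting the dendrogram where the vertical gaps are largest corresponds to the 'maxclust', 3 choice passed to cluster.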
% Assumes a table "air" with the monthly series in air.AIR has already been loaded.
model = arima('Constant', 0, 'D', 1, 'Seasonality', 12, ...
              'MALags', 1, 'SMALags', 12);
est_model = estimate(model, air.AIR);
[yF, yMSE] = forecast(est_model, 24, 'Y0', air.AIR);
plot(yF);
    ARIMA(0,1,1) Model Seasonally Integrated with Seasonal MA(12):
    ---------------------------------------------------------------
    Conditional Probability Distribution: Gaussian

    Parameter     Value        Standard Error    t Statistic
    ---------    ----------    --------------    -----------
    Constant     0             Fixed             Fixed
    MA{1}        -0.309349     0.0619375         -4.99454
    SMA{12}      -0.112821     0.082419          -1.36887
    Variance     123.282       12.5426           9.82908
train = readtable('boston_train.xlsx');
test = readtable('boston_test.xlsx');
x_train = train;
x_train.Target = [];
y_train = train.Target;
x_test = test;
x_test.Target = [];
y_test = test.Target;
rng(29);
rfMod = fitrensemble(table2array(x_train), y_train, 'Method', 'bag');
predictions = predict(rfMod, table2array(x_train));
r2_rf = 1 - ((sum((y_train - predictions) .^ 2)) / (sum((y_train - mean(y_train)) .^ 2)));
fprintf('Random forest regression model r^2 score (coefficient of determination): %.3f\n', r2_rf);
Random forest regression model r^2 score (coefficient of determination): 0.930
predictions = predict(rfMod, table2array(x_test));
r2_rf = 1 - ( (sum((y_test - predictions) .^ 2)) / (sum((y_test - mean(y_test)) .^ 2)) );
fprintf('Random forest regression model r^2 score (coefficient of determination): %.3f\n', r2_rf);
Random forest regression model r^2 score (coefficient of determination): 0.867
train = readtable('digits_train.xlsx');
test = readtable('digits_test.xlsx');
x_train = train;
x_train.Target = [];
y_train = train.Target;
x_test = test;
x_test.Target = [];
y_test = test.Target;
rng(29);
rfMod = fitrensemble(table2array(x_train), y_train, 'Method', 'bag');
predY = predict(rfMod, table2array(x_train));
predY = round(predY);
Results = zeros(size(train,1),1);
for i = 1:size(Results,1)
    if (predY(i) == y_train(i))
        Results(i) = 1;
    else
        Results(i) = 0;
    end
end
accuracy_rf = (1/size(x_train,1)) * sum(Results);
fprintf('Random forest model accuracy: %.3f\n', accuracy_rf);
Random forest model accuracy: 0.692
predY = predict(rfMod, table2array(x_test));
predY = round(predY);
Results = zeros(size(test,1),1);
for i = 1:size(Results,1)
    if (predY(i) == y_test(i))
        Results(i) = 1;
    else
        Results(i) = 0;
    end
end
accuracy_rf = (1/size(x_test,1)) * sum(Results);
fprintf('Random forest model accuracy: %.3f\n', accuracy_rf);
Random forest model accuracy: 0.526
Note, from the MathWorks website: "MATLAB is an abbreviation for "matrix laboratory." While other programming languages mostly work with numbers one at a time, MATLAB® is designed to operate primarily on whole matrices and arrays. All MATLAB variables are multidimensional arrays, no matter what type of data."
A matrix is a two-dimensional array often used for linear algebra operations. Please see the following example of matrix creation and access:
my_matrix = [1 2 3; 4 5 6; 7 8 9]
disp(my_matrix(2,2));
5
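Since MATLAB operates on whole matrices, it distinguishes element-wise operators (prefixed with a dot, as used in the BMI section above) from matrix operators. A small sketch contrasting the two on the matrix just created:

```matlab
A = [1 2 3; 4 5 6; 7 8 9];

% Element-wise product: each entry is multiplied by itself.
disp(A .* A);

% Matrix product: rows of the left operand times columns of the right.
disp(A * A);
```

The two results differ in every entry except where the definitions happen to coincide; using * where .* was intended is a common source of silent numerical bugs.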