What is the problem with the shape of the roc curve with low auc(.4)?
I'm trying to plot a ROC curve. I have 75 data points and I considered only 10 features. I'm getting a staircase-like image (see below). Is this due to the small data set? Can we add more points to improve the curve?
The AUC is very low (0.44). Is there any way to upload a CSV file?
% Load the class labels and the features (first 10 columns only)
species1 = readtable('target.csv');
species1 = table2cell(species1);
meas1 = readtable('feature.csv');
meas1 = meas1(:,1:10);
meas1 = table2array(meas1);
% Split the data: first half for training, second half for testing
numObs = length(species1);
half = floor(numObs/2);
training = meas1(1:half,:);
trainingSpecies = species1(1:half);
trainingSpecies = cell2mat(trainingSpecies);
sample = meas1(half+1:end,:);
group = species1(half+1:end,:);
group = cell2mat(group);
% Train the SVM, score the test set and plot the ROC curve
SVMModel = fitcsvm(training,trainingSpecies);
[label,score] = predict(SVMModel,sample);
[X,Y,T,AUC] = perfcurve(group,score(:,2),'1');
plot(X,Y,'LineWidth',3)
xlabel('False positive rate')
ylabel('True positive rate')
title('ROC for Classification')
matlab svm
perfcurve creates a threshold for every single point; when that happens, the plot will always be stepwise.
– Durkee
Mar 22 at 12:54
There's nothing to solve here. This is just how it works. You could run a smoothing function I guess but that would degrade the quality.
– Durkee
Mar 22 at 13:27
Which smoothing function?
– leena s
Mar 22 at 13:44
asked Mar 22 at 9:38 by leena s (edited Mar 23 at 4:37)
1 Answer
As indicated by Durkee, the ROC curve produced by the perfcurve function will always be stepwise. In fact, the empirical ROC curve is built from empirical (as opposed to theoretical) cumulative distribution functions (ECDFs) of the scores, and ECDFs are step functions by definition (they are computed only on the values observed in the sample).
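To see where the steps come from, here is a minimal sketch that builds an empirical ROC curve by hand with base MATLAB functions only. The labels and scores are made-up toy values, not the asker's data, and the variable names are chosen for illustration. Each test observation contributes one threshold, so the curve moves either up (a positive) or right (a negative) at every point.

```matlab
% Toy data (made up for illustration): 1 = positive class, 0 = negative class
labels = [1 1 0 1 0 0 1 0]';
scores = [0.9 0.8 0.7 0.6 0.55 0.4 0.3 0.1]';

% Sort by decreasing score; every observation then acts as one threshold
[~, order] = sort(scores, 'descend');
sortedLabels = labels(order);

% Fraction of positives / negatives scored at or above each threshold
tpr = [0; cumsum(sortedLabels == 1) / sum(labels == 1)];
fpr = [0; cumsum(sortedLabels == 0) / sum(labels == 0)];

% Area under the empirical curve (trapezoidal rule)
auc = trapz(fpr, tpr);

% With untied scores, each segment is purely vertical or purely horizontal
plot(fpr, tpr, 'b.-', 'LineWidth', 2)
xlabel('False positive rate')
ylabel('True positive rate')
```

With only a few dozen test observations, as in the question (half of 75 points), the curve can have at most that many steps, so the staircase look is expected rather than a sign of an error.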
Usually, smoothing of the ROC curve is done via binning. You could bin the score values and compute an approximate ROC curve, or you could bin the False Positive Rate values obtained from the actual ROC curve (i.e. bin the X values generated by perfcurve()), which generates a smoother version that preserves the area under the curve (AUC).
In the following example I show and compare the smoothed ROC curves obtained from these two options, which can be accomplished using the TVals option and the XVals option of the perfcurve function, respectively.
In each case, the binning is done with the tiedrank function so that we get approximately equal-sized bins (equal in the number of cases). The values to use for the TVals and XVals options are then computed with the grpstats function as the max value in each bin of the original (pre-binned) variable (scores or X, respectively).
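Before the full example, the binning step on its own may be easier to follow on a handful of numbers. The sketch below uses made-up values for x and a hypothetical nbins of 4. It relies on tiedrank and grpstats, which (like perfcurve) come with the Statistics and Machine Learning Toolbox.

```matlab
% Toy values (made up for illustration), 8 observations split into 4 bins
x = [0.12 0.95 0.40 0.33 0.78 0.51 0.07 0.66]';
nbins = 4;

% Rank each value (ties would get their average rank), then map ranks to bins 1..nbins
grp = ceil(nbins * tiedrank(x) / numel(x));

% Largest value in each bin, returned in order of the bin index
thr = grpstats(x, grp, @max);

disp([x grp])   % each value next to its bin index
disp(thr)       % upper edge of each bin
```

Here every bin receives two observations, and thr holds the upper edge of each bin; these are the kind of values passed to TVals or XVals in the code that follows.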
%% Reference for the original ROC curve example: https://www.mathworks.com/help/stats/perfcurve.html
load fisheriris
pred = meas(51:end,1:2);
resp = (1:100)'>50; % Versicolor = 0, virginica = 1
mdl = fitglm(pred,resp,'Distribution','binomial','Link','logit');
scores = mdl.Fitted.Probability;
[X,Y,T,AUC] = perfcurve(species(51:end,:),scores,'virginica');
AUC
%% Define the number of bins to use for smoothing
nbins = 10;
%% Option 1 (RED): Smooth the ROC curve by defining score thresholds (based on equal-size bins of the score).
scores_grp = ceil(nbins * tiedrank(scores(:,1)) / length(scores));
scores_thr = grpstats(scores, scores_grp, @max);
[X_grpScore,Y_grpScore,T_grpScore,AUC_grpScore] = perfcurve(species(51:end,:),scores,'virginica','TVals',scores_thr);
AUC_grpScore
%% Option 2 (GREEN) Smooth the ROC curve by binning the False Positive Rate (variable X of the perfcurve() output)
X_grp = ceil(nbins * tiedrank(X(:,1)) / length(X));
X_thr = grpstats(X, X_grp, @max);
[X_grpFPR,Y_grpFPR,T_grpFPR,AUC_grpFPR] = perfcurve(species(51:end,:),scores,'virginica','XVals',X_thr);
AUC_grpFPR
%% Plot
figure
plot(X,Y,'b.-'); hold on
plot(X_grpScore,Y_grpScore,'rx-')
plot(X_grpFPR,Y_grpFPR,'g.-')
xlabel('False positive rate')
ylabel('True positive rate')
title('ROC for Classification by Logistic Regression')
legend('Original ROC curve', ...
sprintf('Smoothed ROC curve in %d bins (based on score bins)', nbins), ...
sprintf('Smoothed ROC curve in %d bins (based on FPR bins)', nbins), ...
'Location', 'SouthEast')
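The graphical output from this code is the following:
[Figure: original ROC curve (blue) with the two smoothed ROC curves, based on score bins (red) and on FPR bins (green)]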
Note: if you look at the text output generated by the above code, you will notice that, as anticipated, the AUC values for the original ROC curve and the smoothed ROC curve based on FPR bins (GREEN option) coincide (AUC = 0.7918), whereas the AUC value for the smoothed ROC curve based on score bins (RED option) is considerably smaller than the original AUC (= 0.6342), so the FPR approach should be preferred as a smoothing technique for plotting purposes. Note, however, that the FPR approach requires computing the ROC curve twice: once on the original scores variable, and once on the binned FPR values (the X values of the first ROC calculation).
However, the second ROC calculation can be avoided, because the same smoothed ROC curve can be obtained by binning the X values and computing the max(Y) value in each bin, as shown in the following snippet:
%% Compute max(Y) on the binned X values
% Make a dataset with the X and Y variables as columns (for easier manipulation and grouping)
ds = dataset(X,Y);
% Compute equal size bins on X and the corresponding MAX statistics
ds.X_grp = ceil(nbins * tiedrank(ds.X(:,1)) / size(ds.X,1));
% 'DataVars' takes a single value, so the variable names go in a cell array
ds_grp = grpstats(ds, 'X_grp', @max, 'DataVars', {'X','Y'});
% Add the smooth curve to the previous plot
hold on
plot(ds_grp.max_X, ds_grp.max_Y, 'mx-')
And now you should see the above plot with the green curve overplotted by a magenta curve with cross markers.
answered Mar 25 at 3:19 by mastropi