What is the problem with the shape of the ROC curve with low AUC (0.4)?


I'm trying to plot a ROC curve. I have 75 data points and I considered only 10 features. I'm getting a staircase-like image (see below). Is this due to the small data set? Can we add more points to improve the curve?
The AUC is very low, 0.44. Is there any method to upload a CSV file?



% Read the class labels and the feature matrix from CSV files
species1 = readtable('target.csv');
species1 = table2cell(species1);
meas1 = readtable('feature.csv');
meas1 = meas1(:,1:10);                 % keep only the first 10 features
meas1 = table2array(meas1);
numObs = length(species1);

% Split the data in half: first half for training, second half for testing
half = floor(numObs/2);
training = meas1(1:half,:);
trainingSpecies = cell2mat(species1(1:half));
sample = meas1(half+1:end,:);
group = cell2mat(species1(half+1:end,:));

% Train the SVM and score the held-out samples
SVMModel = fitcsvm(training,trainingSpecies);
[label,score] = predict(SVMModel,sample);

% ROC curve for the positive class '1'
[X,Y,T,AUC] = perfcurve(group,score(:,2),'1');
plot(X,Y,'LineWidth',3)
xlabel('False positive rate')
ylabel('True positive rate')
title('ROC for Classification')


[my ROC — the resulting staircase-shaped ROC plot]

matlab svm






asked Mar 22 at 9:38 by leena s, edited Mar 23 at 4:37

  • Perfcurve creates a threshold for every single point; when that happens, it will always be a stepwise plot.

    – Durkee
    Mar 22 at 12:54

  • There's nothing to solve here; this is just how it works. You could run a smoothing function, I guess, but that would degrade the quality.

    – Durkee
    Mar 22 at 13:27

  • Which smoothing function?

    – leena s
    Mar 22 at 13:44

1 Answer






As indicated by Durkee, the curve returned by the perfcurve function will always be stepwise. In fact, an ROC curve is an empirical (as opposed to theoretical) cumulative distribution function (ECDF), and ECDFs are stepwise functions by definition, since the cumulative distribution is evaluated only at the values observed in the sample.
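
To make the stepwise nature concrete, here is a minimal, self-contained sketch on made-up toy scores (an illustration added for this point, not part of the original answer or the question's data): an ROC curve computed by hand, sweeping a threshold over every observed score, coincides with the staircase returned by perfcurve.

%% Toy illustration (hypothetical data): the empirical ROC can only change at
%% observed score values, so it is a step function by construction.
rng(0);                                  % reproducibility
labels = [ones(20,1); zeros(20,1)];      % 20 positives, 20 negatives
scores = [randn(20,1)+0.5; randn(20,1)]; % positives score slightly higher on average

% Manual ROC: sweep a threshold over every distinct observed score
thr = [Inf; sort(unique(scores),'descend')];
TPR = zeros(numel(thr),1);
FPR = zeros(numel(thr),1);
for k = 1:numel(thr)
    pred   = scores >= thr(k);
    TPR(k) = sum(pred & labels==1) / sum(labels==1);
    FPR(k) = sum(pred & labels==0) / sum(labels==0);
end

% perfcurve produces the same staircase
[Xtoy,Ytoy] = perfcurve(labels, scores, 1);

figure
stairs(FPR, TPR, 'b-', 'LineWidth', 2); hold on
plot(Xtoy, Ytoy, 'r--')
xlabel('False positive rate'), ylabel('True positive rate')
legend('Manual threshold sweep', 'perfcurve', 'Location', 'SouthEast')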



Usually, smoothing of the ROC curve is done via binning. You can either bin the score values and compute an approximate ROC curve, or bin the false-positive-rate values of the actual ROC curve (i.e., bin the X values returned by perfcurve()), which produces a smoothed version that preserves the area under the curve (AUC).



In the following example I will show and compare the smoothed ROC curves obtained from these two options, which can be accomplished using the TVals option and the XVals option of the perfcurve function, respectively.



In each case, the binning is done so that we get approximately equal-sized bins (equal in the number of cases) using the tiedrank function. The values to pass to the TVals and XVals options are then computed with the grpstats function as the maximum value within each bin of the corresponding pre-binned variable (scores or X, respectively).
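
As a quick, hypothetical illustration of that binning recipe (the values below are made up for this sketch and are not part of the original answer), ceil(nbins * tiedrank(x) / length(x)) assigns equal-count bin indices, and grpstats(..., @max) then extracts the per-bin maxima used as thresholds:

% Toy example of the equal-count binning formula used in the code below
x     = [0.05 0.20 0.22 0.35 0.50 0.51 0.70 0.90]';   % 8 made-up score values
nbins = 4;
grp   = ceil(nbins * tiedrank(x) / length(x))         % bin index per value: [1 1 2 2 3 3 4 4]'
thr   = grpstats(x, grp, @max)                        % per-bin maxima: [0.20 0.35 0.51 0.90]'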



%% Reference for the original ROC curve example: https://www.mathworks.com/help/stats/perfcurve.html
load fisheriris
pred = meas(51:end,1:2);
resp = (1:100)'>50; % Versicolor = 0, virginica = 1
mdl = fitglm(pred,resp,'Distribution','binomial','Link','logit');
scores = mdl.Fitted.Probability;
[X,Y,T,AUC] = perfcurve(species(51:end,:),scores,'virginica');
AUC

%% Define the number of bins to use for smoothing
nbins = 10;

%% Option 1 (RED): Smooth the ROC curve by defining score thresholds (based on equal-size bins of the score).
scores_grp = ceil(nbins * tiedrank(scores(:,1)) / length(scores));
scores_thr = grpstats(scores, scores_grp, @max);
[X_grpScore,Y_grpScore,T_grpScore,AUC_grpScore] = perfcurve(species(51:end,:),scores,'virginica','TVals',scores_thr);
AUC_grpScore

%% Option 2 (GREEN) Smooth the ROC curve by binning the False Positive Rate (variable X of the perfcurve() output)
X_grp = ceil(nbins * tiedrank(X(:,1)) / length(X));
X_thr = grpstats(X, X_grp, @max);
[X_grpFPR,Y_grpFPR,T_grpFPR,AUC_grpFPR] = perfcurve(species(51:end,:),scores,'virginica','XVals',X_thr);
AUC_grpFPR

%% Plot
figure
plot(X,Y,'b.-'); hold on
plot(X_grpScore,Y_grpScore,'rx-')
plot(X_grpFPR,Y_grpFPR,'g.-')
xlabel('False positive rate')
ylabel('True positive rate')
title('ROC for Classification by Logistic Regression')
legend('Original ROC curve', ...
       sprintf('Smoothed ROC curve in %d bins (based on score bins)', nbins), ...
       sprintf('Smoothed ROC curve in %d bins (based on FPR bins)', nbins), ...
       'Location', 'SouthEast')


The graphical output of this code is the following plot:
[Figure: original stepwise ROC curve in blue, smoothed ROC curve based on score bins in red, smoothed ROC curve based on FPR bins in green]



Note: if you look at the text output generated by the above code, you will notice that, as anticipated, the AUC values for the original ROC curve and for the smoothed curve based on FPR bins (GREEN option) coincide (AUC = 0.7918), whereas the AUC of the smoothed curve based on score bins (RED option) is considerably smaller than the original (AUC = 0.6342). The FPR approach should therefore be preferred as a smoothing technique for plotting purposes. Note, however, that it requires computing the ROC curve twice: once on the original scores and once on the binned FPR values (the X values of the first ROC calculation).

However, the second ROC calculation can be avoided because the same smoothed ROC curve can be obtained by binning the X values and computing the max(Y) value on each bin, as shown in the following snippet:



%% Compute max(Y) on the binned X values
% Make a dataset with the X and Y variables as columns (for easier manipulation and grouping)
ds = dataset(X,Y);
% Compute equal size bins on X and the corresponding MAX statistics
ds.X_grp = ceil(nbins * tiedrank(ds.X(:,1)) / size(ds.X,1));
ds_grp = grpstats(ds, 'X_grp', @max, 'DataVars', {'X','Y'});
% Add the smooth curve to the previous plot
hold on
plot(ds_grp.max_X, ds_grp.max_Y, 'mx-')


And now you should see the above plot with the green curve overlaid by a magenta curve with x markers, i.e. the two constructions yield the same smoothed ROC curve.






answered Mar 25 at 3:19 by mastropi