What is the problem with the shape of the roc curve with low auc(.4)?

I'm trying to plot a ROC curve. I have 75 data points and I considered only 10 features. I'm getting a staircase-like plot (see below). Is this due to the small data set? Can we add more points to improve the curve?
The AUC is also very low (0.44). Is there any way to upload a CSV file?



species1= readtable('target.csv');
species1 = table2cell(species1)
meas1= readtable('feature.csv');
meas1=meas1(:,1:10);
meas1= table2array(meas1)
numObs = length(species1);

half = floor(numObs/2);
training = meas1(1:half,:);
trainingSpecies = species1(1:half);
sample = meas1(half+1:end,:);
trainingSpecies = cell2mat(trainingSpecies)
group = species1(half+1:end,:);
group = cell2mat(group)
SVMModel = fitcsvm(training,trainingSpecies)
[label,score] = predict(SVMModel,sample);

[X,Y,T,AUC] = perfcurve(group,score(:,2),'1');
plot(X,Y,'LineWidth',3)
xlabel('False positive rate')
ylabel('True positive rate')
title('ROC for Classification ')


[Figure: my ROC]










  • Perfcurve creates a threshold for every single point; when that happens, the plot will always be stepwise.

    – Durkee
    Mar 22 at 12:54











  • There's nothing to solve here. This is just how it works. You could run a smoothing function I guess but that would degrade the quality.

    – Durkee
    Mar 22 at 13:27











  • Which smoothing function?

    – leena s
    Mar 22 at 13:44

















matlab svm






edited Mar 23 at 4:37







leena s

















asked Mar 22 at 9:38









leena s












1 Answer

As Durkee indicated, the curve returned by the perfcurve function will always be stepwise. The empirical ROC curve is built from empirical (as opposed to theoretical) cumulative distribution functions (ECDFs) of the scores in each class, and an ECDF is a step function by definition, because it is computed only on the values observed in the sample.
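To see where the steps come from, here is a minimal sketch that builds an empirical ROC curve by hand, without perfcurve. The labels and scores are hypothetical toy data, not the data from the question, and the loop is only an illustration of the idea, not how perfcurve is implemented internally.

```matlab
% Hypothetical toy data: 1 = positive class, 0 = negative class
labels = [1 1 0 1 0 0 1 0]';
scores = [0.9 0.8 0.7 0.6 0.55 0.4 0.3 0.1]';

% One threshold per distinct observed score, swept from high to low
thr  = [Inf; sort(unique(scores), 'descend')];
nPos = sum(labels == 1);
nNeg = sum(labels == 0);
tpr  = zeros(size(thr));
fpr  = zeros(size(thr));
for k = 1:numel(thr)
    predicted = scores >= thr(k);                      % classified as positive
    tpr(k) = sum(predicted & (labels == 1)) / nPos;    % true positive rate
    fpr(k) = sum(predicted & (labels == 0)) / nNeg;    % false positive rate
end

% Each row moves either up (a positive crossed the threshold)
% or right (a negative crossed it), never diagonally
disp([fpr tpr])

% Area under the step curve by the trapezoidal rule
auc = trapz(fpr, tpr)   % 0.75 for this toy data
```

With only a handful of observations, each step is large, which is why a ROC curve from a small test set looks like a coarse staircase. More test observations make the steps smaller but do not remove them.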



Usually, smoothing of the ROC curve is done via binning. You could bin the score values and compute an approximate ROC curve, or you could bin the False Positive Rate values obtained by the actual ROC curve (i.e. bin the X values generated by perfcurve()) which generates a smooth version that preserves the area under the curve (AUC).



In the following example I will show and compare the smoothed ROC curves obtained from these two options, which can be accomplished using the TVals option and the XVals option of the perfcurve function, respectively.



In each case, the binning is done so that we get approximately equal-sized (equal in the number of cases) bins using the tiedrank function. The values to use for the TVals and the XVals options are then computed using the grpstats function as the max value on each bin of the original/pre-binned variable (scores or X, respectively).
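The binning rule just described can be checked on a toy vector first. This is a small sketch under stated assumptions: the vector v and the choice of 3 bins are hypothetical, and accumarray is used here in place of grpstats only to keep the illustration short.

```matlab
% Hypothetical toy vector (stands in for the scores or the X values)
v = [0.9 0.1 0.5 0.7 0.3 0.2]';
nbins = 3;

% Rank-based bin index in 1..nbins: roughly equal counts per bin
grp = ceil(nbins * tiedrank(v) / numel(v));

% Max of the original values within each bin (these become the thresholds)
binMax = accumarray(grp, v, [nbins 1], @max);

disp(grp')      % 3 1 2 3 2 1
disp(binMax')   % 0.2 0.5 0.9
```

Because the bins are defined on ranks rather than on the values themselves, each bin holds about the same number of observations even when the values are unevenly spread.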



%% Reference for the original ROC curve example: https://www.mathworks.com/help/stats/perfcurve.html
load fisheriris
pred = meas(51:end,1:2);
resp = (1:100)'>50; % Versicolor = 0, virginica = 1
mdl = fitglm(pred,resp,'Distribution','binomial','Link','logit');
scores = mdl.Fitted.Probability;
[X,Y,T,AUC] = perfcurve(species(51:end,:),scores,'virginica');
AUC

%% Define the number of bins to use for smoothing
nbins = 10;

%% Option 1 (RED): Smooth the ROC curve by defining score thresholds (based on equal-size bins of the score).
scores_grp = ceil(nbins * tiedrank(scores(:,1)) / length(scores));
scores_thr = grpstats(scores, scores_grp, @max);
[X_grpScore,Y_grpScore,T_grpScore,AUC_grpScore] = perfcurve(species(51:end,:),scores,'virginica','TVals',scores_thr);
AUC_grpScore

%% Option 2 (GREEN) Smooth the ROC curve by binning the False Positive Rate (variable X of the perfcurve() output)
X_grp = ceil(nbins * tiedrank(X(:,1)) / length(X));
X_thr = grpstats(X, X_grp, @max);
[X_grpFPR,Y_grpFPR,T_grpFPR,AUC_grpFPR] = perfcurve(species(51:end,:),scores,'virginica','XVals',X_thr);
AUC_grpFPR

%% Plot
figure
plot(X,Y,'b.-'); hold on
plot(X_grpScore,Y_grpScore,'rx-')
plot(X_grpFPR,Y_grpFPR,'g.-')
xlabel('False positive rate')
ylabel('True positive rate')
title('ROC for Classification by Logistic Regression')
legend('Original ROC curve', ...
sprintf('Smoothed ROC curve in %d bins (based on score bins)', nbins), ...
sprintf('Smoothed ROC curve in %d bins (based on FPR bins)', nbins), ...
'Location', 'SouthEast')


The graphical output from this code is the following:
[Figure: original ROC curve (blue) and the two smoothed ROC curves, based on score bins (red) and on FPR bins (green)]



Note: if you look at the text output generated by the above code, you will notice that, as anticipated, the AUC values for the original ROC curve and for the smoothed ROC curve based on FPR bins (GREEN option) coincide (AUC = 0.7918), whereas the AUC for the smoothed ROC curve based on score bins (RED option) is considerably smaller (AUC = 0.6342). The FPR approach should therefore be preferred as the smoothing technique for plotting purposes. Note, however, that the FPR approach requires computing the ROC curve twice: once on the original scores variable, and once on the binned FPR values (the X values of the first ROC calculation).

However, the second ROC calculation can be avoided because the same smoothed ROC curve can be obtained by binning the X values and computing the max(Y) value on each bin, as shown in the following snippet:



%% Compute max(Y) on the binned X values
% Make a dataset with the X and Y variables as columns (for easier manipulation and grouping)
ds = dataset(X,Y);
% Compute equal size bins on X and the corresponding MAX statistics
ds.X_grp = ceil(nbins * tiedrank(ds.X(:,1)) / size(ds.X,1));
% 'DataVars' takes a single value, so both variable names go in one cell array
ds_grp = grpstats(ds, 'X_grp', @max, 'DataVars', {'X','Y'});
% Add the smooth curve to the previous plot
hold on
plot(ds_grp.max_X, ds_grp.max_Y, 'mx-')


You should now see the above plot with the green curve covered by a magenta curve with cross markers.
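If you would rather not use a dataset array, the same max-per-bin step can be sketched with plain vectors and accumarray. The X and Y values below are hypothetical toy ROC points, not the output of the fisheriris example, and this is an alternative illustration rather than part of the original answer.

```matlab
% Hypothetical toy ROC points (X = false positive rate, Y = true positive rate)
X = [0 0 0 0.25 0.25 0.5 0.75 0.75 1]';
Y = [0 0.25 0.5 0.5 0.75 0.75 0.75 1 1]';
nbins = 3;

% Equal-size bins on X, based on tied ranks
X_grp = ceil(nbins * tiedrank(X) / numel(X));

% Max of X and max of Y within each bin give the smoothed curve's points
maxX = accumarray(X_grp, X, [nbins 1], @max);
maxY = accumarray(X_grp, Y, [nbins 1], @max);

disp([maxX maxY])   % rows: (0, 0.5), (0.5, 0.75), (1, 1)
```

To overlay the result on an existing ROC plot, call plot(maxX, maxY, 'mx-') after hold on.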






share|improve this answer























    Your Answer






    StackExchange.ifUsing("editor", function ()
    StackExchange.using("externalEditor", function ()
    StackExchange.using("snippets", function ()
    StackExchange.snippets.init();
    );
    );
    , "code-snippets");

    StackExchange.ready(function()
    var channelOptions =
    tags: "".split(" "),
    id: "1"
    ;
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function()
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled)
    StackExchange.using("snippets", function()
    createEditor();
    );

    else
    createEditor();

    );

    function createEditor()
    StackExchange.prepareEditor(
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader:
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    ,
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    );



    );













    draft saved

    draft discarded


















    StackExchange.ready(
    function ()
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55296702%2fwhat-is-the-problem-with-the-shape-of-the-roc-curve-with-low-auc-4%23new-answer', 'question_page');

    );

    Post as a guest















    Required, but never shown

























    1 Answer
    1






    active

    oldest

    votes








    1 Answer
    1






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    0














    As indicated by Durkee, the perfcurve function will always be stepwise. In fact, the ROC curve is an empirical (as opposed to theoretical) cumulative distribution function (ecdf), and ecdf are stepwise functions by definition (as it computes the CDF on the values observed in the sample).



    Usually, smoothing of the ROC curve is done via binning. You could bin the score values and compute an approximate ROC curve, or you could bin the False Positive Rate values obtained by the actual ROC curve (i.e. bin the X values generated by perfcurve()) which generates a smooth version that preserves the area under the curve (AUC).



    In the following example I will show and compare the smoothed ROC curves obtained from these two options, which can be accomplished using the TVals option and the XVals option of the perfcurve function, respectively.



    In each case, the binning is done so that we get approximately equal-sized (equal in the number of cases) bins using the tiedrank function. The values to use for the TVals and the XVals options are then computed using the grpstats function as the max value on each bin of the original/pre-binned variable (scores or X, respectively).



    %% Reference for the original ROC curve example: https://www.mathworks.com/help/stats/perfcurve.html
    load fisheriris
    pred = meas(51:end,1:2);
    resp = (1:100)'>50; % Versicolor = 0, virginica = 1
    mdl = fitglm(pred,resp,'Distribution','binomial','Link','logit');
    scores = mdl.Fitted.Probability;
    [X,Y,T,AUC] = perfcurve(species(51:end,:),scores,'virginica');
    AUC

    %% Define the number of bins to use for smoothing
    nbins = 10;

    %% Option 1 (RED): Smooth the ROC curve by defining score thresholds (based on equal-size bins of the score).
    scores_grp = ceil(nbins * tiedrank(scores(:,1)) / length(scores));
    scores_thr = grpstats(scores, scores_grp, @max);
    [X_grpScore,Y_grpScore,T_grpScore,AUC_grpScore] = perfcurve(species(51:end,:),scores,'virginica','TVals',scores_thr);
    AUC_grpScore

    %% Option 2 (GREEN) Smooth the ROC curve by binning the False Positive Rate (variable X of the perfcurve() output)
    X_grp = ceil(nbins * tiedrank(X(:,1)) / length(X));
    X_thr = grpstats(X, X_grp, @max);
    [X_grpFPR,Y_grpFPR,T_grpFPR,AUC_grpFPR] = perfcurve(species(51:end,:),scores,'virginica','XVals',X_thr);
    AUC_grpFPR

    %% Plot
    figure
    plot(X,Y,'b.-'); hold on
    plot(X_grpScore,Y_grpScore,'rx-')
    plot(X_grpFPR,Y_grpFPR,'g.-')
    xlabel('False positive rate')
    ylabel('True positive rate')
    title('ROC for Classification by Logistic Regression')
    legend('Original ROC curve', ...
    sprintf('Smoothed ROC curve in %d bins (based on score bins)', nbins), ...
    sprintf('Smoothed ROC curve in %d bins (based on FPR bins)', nbins), ...
    'Location', 'SouthEast')


    The graphical output from this code is the following:
    enter image description here



    Note: if you look at the text output generated by the above code, you will notice that, as anticipated, the AUC values for the original ROC and the smoothed ROC curve based on FPR bins (GREEN option) coincide (AUC = 0.7918), whereas the AUC value for the smoothed ROC curve based on score bins (RED option) is quite smaller than the original AUC (= 0.6342), so the FPR approach should be preferred as smoothing technique for plotting purposes. Note however that the FPR approach requires computing the ROC curve twice, once on the original scores variable, and once on the binned FPR values (X values of the first ROC calculation).

    However, the second ROC calculation can be avoided because the same smoothed ROC curve can be obtained by binning the X values and computing the max(Y) value on each bin, as shown in the following snippet:



    %% Compute max(Y) on the binned X values
    % Make a dataset with the X and Y variables as columns (for easier manipulation and grouping)
    ds = dataset(X,Y);
    % Compute equal size bins on X and the corresponding MAX statistics
    ds.X_grp = ceil(nbins * tiedrank(ds.X(:,1)) / size(ds.X,1));
    ds_grp = grpstats(ds, 'X_grp', @max, 'DataVars', 'X', 'Y');
    % Add the smooth curve to the previous plot
    hold on
    plot(ds_grp.max_X, ds_grp.max_Y, 'mx-')


    And now you should see the above plot where the green curve has been overridden by a magenta curve with star points.






    share|improve this answer



























      0














      As indicated by Durkee, the perfcurve function will always be stepwise. In fact, the ROC curve is an empirical (as opposed to theoretical) cumulative distribution function (ecdf), and ecdf are stepwise functions by definition (as it computes the CDF on the values observed in the sample).



      Usually, smoothing of the ROC curve is done via binning. You could bin the score values and compute an approximate ROC curve, or you could bin the False Positive Rate values obtained by the actual ROC curve (i.e. bin the X values generated by perfcurve()) which generates a smooth version that preserves the area under the curve (AUC).



      In the following example I will show and compare the smoothed ROC curves obtained from these two options, which can be accomplished using the TVals option and the XVals option of the perfcurve function, respectively.



      In each case, the binning is done so that we get approximately equal-sized (equal in the number of cases) bins using the tiedrank function. The values to use for the TVals and the XVals options are then computed using the grpstats function as the max value on each bin of the original/pre-binned variable (scores or X, respectively).



      %% Reference for the original ROC curve example: https://www.mathworks.com/help/stats/perfcurve.html
      load fisheriris
      pred = meas(51:end,1:2);
      resp = (1:100)'>50; % Versicolor = 0, virginica = 1
      mdl = fitglm(pred,resp,'Distribution','binomial','Link','logit');
      scores = mdl.Fitted.Probability;
      [X,Y,T,AUC] = perfcurve(species(51:end,:),scores,'virginica');
      AUC

      %% Define the number of bins to use for smoothing
      nbins = 10;

      %% Option 1 (RED): Smooth the ROC curve by defining score thresholds (based on equal-size bins of the score).
      scores_grp = ceil(nbins * tiedrank(scores(:,1)) / length(scores));
      scores_thr = grpstats(scores, scores_grp, @max);
      [X_grpScore,Y_grpScore,T_grpScore,AUC_grpScore] = perfcurve(species(51:end,:),scores,'virginica','TVals',scores_thr);
      AUC_grpScore

      %% Option 2 (GREEN) Smooth the ROC curve by binning the False Positive Rate (variable X of the perfcurve() output)
      X_grp = ceil(nbins * tiedrank(X(:,1)) / length(X));
      X_thr = grpstats(X, X_grp, @max);
      [X_grpFPR,Y_grpFPR,T_grpFPR,AUC_grpFPR] = perfcurve(species(51:end,:),scores,'virginica','XVals',X_thr);
      AUC_grpFPR

      %% Plot
      figure
      plot(X,Y,'b.-'); hold on
      plot(X_grpScore,Y_grpScore,'rx-')
      plot(X_grpFPR,Y_grpFPR,'g.-')
      xlabel('False positive rate')
      ylabel('True positive rate')
      title('ROC for Classification by Logistic Regression')
      legend('Original ROC curve', ...
      sprintf('Smoothed ROC curve in %d bins (based on score bins)', nbins), ...
      sprintf('Smoothed ROC curve in %d bins (based on FPR bins)', nbins), ...
      'Location', 'SouthEast')


      The graphical output from this code is the following:
      enter image description here



      Note: if you look at the text output generated by the above code, you will notice that, as anticipated, the AUC values for the original ROC and the smoothed ROC curve based on FPR bins (GREEN option) coincide (AUC = 0.7918), whereas the AUC value for the smoothed ROC curve based on score bins (RED option) is quite smaller than the original AUC (= 0.6342), so the FPR approach should be preferred as smoothing technique for plotting purposes. Note however that the FPR approach requires computing the ROC curve twice, once on the original scores variable, and once on the binned FPR values (X values of the first ROC calculation).

      However, the second ROC calculation can be avoided because the same smoothed ROC curve can be obtained by binning the X values and computing the max(Y) value on each bin, as shown in the following snippet:



      %% Compute max(Y) on the binned X values
      % Make a dataset with the X and Y variables as columns (for easier manipulation and grouping)
      ds = dataset(X,Y);
      % Compute equal size bins on X and the corresponding MAX statistics
      ds.X_grp = ceil(nbins * tiedrank(ds.X(:,1)) / size(ds.X,1));
      ds_grp = grpstats(ds, 'X_grp', @max, 'DataVars', 'X', 'Y');
      % Add the smooth curve to the previous plot
      hold on
      plot(ds_grp.max_X, ds_grp.max_Y, 'mx-')


      And now you should see the above plot where the green curve has been overridden by a magenta curve with star points.






      share|improve this answer

























        0












        0








        0







        As indicated by Durkee, the perfcurve function will always be stepwise. In fact, the ROC curve is an empirical (as opposed to theoretical) cumulative distribution function (ecdf), and ecdf are stepwise functions by definition (as it computes the CDF on the values observed in the sample).



        Usually, smoothing of the ROC curve is done via binning. You could bin the score values and compute an approximate ROC curve, or you could bin the False Positive Rate values obtained by the actual ROC curve (i.e. bin the X values generated by perfcurve()) which generates a smooth version that preserves the area under the curve (AUC).



        In the following example I will show and compare the smoothed ROC curves obtained from these two options, which can be accomplished using the TVals option and the XVals option of the perfcurve function, respectively.



        In each case, the binning is done so that we get approximately equal-sized (equal in the number of cases) bins using the tiedrank function. The values to use for the TVals and the XVals options are then computed using the grpstats function as the max value on each bin of the original/pre-binned variable (scores or X, respectively).



        %% Reference for the original ROC curve example: https://www.mathworks.com/help/stats/perfcurve.html
        load fisheriris
        pred = meas(51:end,1:2);
        resp = (1:100)'>50; % Versicolor = 0, virginica = 1
        mdl = fitglm(pred,resp,'Distribution','binomial','Link','logit');
        scores = mdl.Fitted.Probability;
        [X,Y,T,AUC] = perfcurve(species(51:end,:),scores,'virginica');
        AUC

        %% Define the number of bins to use for smoothing
        nbins = 10;

        %% Option 1 (RED): Smooth the ROC curve by defining score thresholds (based on equal-size bins of the score).
        scores_grp = ceil(nbins * tiedrank(scores(:,1)) / length(scores));
        scores_thr = grpstats(scores, scores_grp, @max);
        [X_grpScore,Y_grpScore,T_grpScore,AUC_grpScore] = perfcurve(species(51:end,:),scores,'virginica','TVals',scores_thr);
        AUC_grpScore

        %% Option 2 (GREEN) Smooth the ROC curve by binning the False Positive Rate (variable X of the perfcurve() output)
        X_grp = ceil(nbins * tiedrank(X(:,1)) / length(X));
        X_thr = grpstats(X, X_grp, @max);
        [X_grpFPR,Y_grpFPR,T_grpFPR,AUC_grpFPR] = perfcurve(species(51:end,:),scores,'virginica','XVals',X_thr);
        AUC_grpFPR

        %% Plot
        figure
        plot(X,Y,'b.-'); hold on
        plot(X_grpScore,Y_grpScore,'rx-')
        plot(X_grpFPR,Y_grpFPR,'g.-')
        xlabel('False positive rate')
        ylabel('True positive rate')
        title('ROC for Classification by Logistic Regression')
        legend('Original ROC curve', ...
        sprintf('Smoothed ROC curve in %d bins (based on score bins)', nbins), ...
        sprintf('Smoothed ROC curve in %d bins (based on FPR bins)', nbins), ...
        'Location', 'SouthEast')


        The graphical output from this code is the following:
        enter image description here



        Note: if you look at the text output generated by the above code, you will notice that, as anticipated, the AUC values for the original ROC and the smoothed ROC curve based on FPR bins (GREEN option) coincide (AUC = 0.7918), whereas the AUC value for the smoothed ROC curve based on score bins (RED option) is quite smaller than the original AUC (= 0.6342), so the FPR approach should be preferred as smoothing technique for plotting purposes. Note however that the FPR approach requires computing the ROC curve twice, once on the original scores variable, and once on the binned FPR values (X values of the first ROC calculation).

        However, the second ROC calculation can be avoided because the same smoothed ROC curve can be obtained by binning the X values and computing the max(Y) value on each bin, as shown in the following snippet:



        %% Compute max(Y) on the binned X values
        % Make a dataset with the X and Y variables as columns (for easier manipulation and grouping)
        ds = dataset(X,Y);
        % Compute equal size bins on X and the corresponding MAX statistics
        ds.X_grp = ceil(nbins * tiedrank(ds.X(:,1)) / size(ds.X,1));
        ds_grp = grpstats(ds, 'X_grp', @max, 'DataVars', 'X', 'Y');
        % Add the smooth curve to the previous plot
        hold on
        plot(ds_grp.max_X, ds_grp.max_Y, 'mx-')


        And now you should see the above plot where the green curve has been overridden by a magenta curve with star points.






        share|improve this answer













        As indicated by Durkee, the perfcurve function will always be stepwise. In fact, the ROC curve is an empirical (as opposed to theoretical) cumulative distribution function (ecdf), and ecdf are stepwise functions by definition (as it computes the CDF on the values observed in the sample).



        Usually, smoothing of the ROC curve is done via binning. You could bin the score values and compute an approximate ROC curve, or you could bin the False Positive Rate values obtained by the actual ROC curve (i.e. bin the X values generated by perfcurve()) which generates a smooth version that preserves the area under the curve (AUC).



        In the following example I will show and compare the smoothed ROC curves obtained from these two options, which can be accomplished using the TVals option and the XVals option of the perfcurve function, respectively.



        In each case, the binning is done so that we get approximately equal-sized (equal in the number of cases) bins using the tiedrank function. The values to use for the TVals and the XVals options are then computed using the grpstats function as the max value on each bin of the original/pre-binned variable (scores or X, respectively).



        %% Reference for the original ROC curve example: https://www.mathworks.com/help/stats/perfcurve.html
        load fisheriris
        pred = meas(51:end,1:2);
        resp = (1:100)'>50; % Versicolor = 0, virginica = 1
        mdl = fitglm(pred,resp,'Distribution','binomial','Link','logit');
        scores = mdl.Fitted.Probability;
        [X,Y,T,AUC] = perfcurve(species(51:end,:),scores,'virginica');
        AUC

        %% Define the number of bins to use for smoothing
        nbins = 10;

        %% Option 1 (RED): Smooth the ROC curve by defining score thresholds (based on equal-size bins of the score).
        scores_grp = ceil(nbins * tiedrank(scores(:,1)) / length(scores));
        scores_thr = grpstats(scores, scores_grp, @max);
        [X_grpScore,Y_grpScore,T_grpScore,AUC_grpScore] = perfcurve(species(51:end,:),scores,'virginica','TVals',scores_thr);
        AUC_grpScore

        %% Option 2 (GREEN): Smooth the ROC curve by binning the False Positive Rate (variable X of the perfcurve() output)
        X_grp = ceil(nbins * tiedrank(X(:,1)) / length(X));
        X_thr = grpstats(X, X_grp, @max);
        [X_grpFPR,Y_grpFPR,T_grpFPR,AUC_grpFPR] = perfcurve(species(51:end,:),scores,'virginica','XVals',X_thr);
        AUC_grpFPR

        %% Plot
        figure
        plot(X,Y,'b.-'); hold on
        plot(X_grpScore,Y_grpScore,'rx-')
        plot(X_grpFPR,Y_grpFPR,'g.-')
        xlabel('False positive rate')
        ylabel('True positive rate')
        title('ROC for Classification by Logistic Regression')
        legend('Original ROC curve', ...
        sprintf('Smoothed ROC curve in %d bins (based on score bins)', nbins), ...
        sprintf('Smoothed ROC curve in %d bins (based on FPR bins)', nbins), ...
        'Location', 'SouthEast')


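        The graphical output from this code is the following:
        [Figure: "ROC for Classification by Logistic Regression", showing the original ROC curve (blue) and the smoothed curves based on score bins (red) and on FPR bins (green); x-axis: false positive rate, y-axis: true positive rate]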



        Note: if you look at the text output generated by the above code, you will notice that, as anticipated, the AUC values for the original ROC curve and for the smoothed ROC curve based on FPR bins (GREEN option) coincide (AUC = 0.7918), whereas the AUC value for the smoothed ROC curve based on score bins (RED option) is considerably smaller (AUC = 0.6342). The FPR approach should therefore be preferred as the smoothing technique for plotting purposes. Note, however, that the FPR approach requires computing the ROC curve twice: once on the original scores variable, and once on the binned FPR values (the X values of the first ROC calculation).

        However, the second ROC calculation can be avoided because the same smoothed ROC curve can be obtained by binning the X values and computing the max(Y) value on each bin, as shown in the following snippet:



        %% Compute max(Y) on the binned X values
        % Make a dataset with the X and Y variables as columns (for easier manipulation and grouping)
        ds = dataset(X,Y);
        % Compute equal size bins on X and the corresponding MAX statistics
        ds.X_grp = ceil(nbins * tiedrank(ds.X(:,1)) / size(ds.X,1));
        ds_grp = grpstats(ds, 'X_grp', @max, 'DataVars', {'X', 'Y'});
        % Add the smooth curve to the previous plot
        hold on
        plot(ds_grp.max_X, ds_grp.max_Y, 'mx-')


        And now you should see the above plot, with the green curve overlaid by a magenta curve with cross ('x') markers.
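If the dataset type or grpstats is not available, the same max-per-bin step can probably be done with base MATLAB functions only. The sketch below is an alternative under stated assumptions, not the method used above: the X and Y values are hypothetical ROC points (in practice they would come from perfcurve), and the average ranks that tiedrank would return are rebuilt with unique and accumarray.

```matlab
% Toy ROC points (hypothetical); in the answer these come from perfcurve
X = [0 0 0 0.25 0.25 0.5 0.5 0.75 1]';
Y = [0 0.25 0.5 0.5 0.75 0.75 1 1 1]';
nbins = 3;

% Average ranks with ties (what tiedrank(X) returns), using base functions only
[~, ~, idx] = unique(X);                % index of each value among the sorted distinct values
idx = idx(:);                           % force a column for accumarray
cnt = accumarray(idx, 1);               % multiplicity of each distinct value
avgRank = cumsum(cnt) - (cnt - 1) / 2;  % average rank of each tie group
ranks = avgRank(idx);

% Same bin formula as above, then the per-bin maxima of X and Y
grp = ceil(nbins * ranks / numel(X));
Xs = accumarray(grp, X, [], @max);
Ys = accumarray(grp, Y, [], @max);

disp([Xs Ys])   % points of the smoothed curve
```

Because tied X values share one average rank, they always fall into the same bin, so bins can end up with unequal counts when the curve has long vertical runs.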

















        answered Mar 25 at 3:19 by mastropi




























