Bin and Calculate Entropy using Numpy
I am attempting to perform the following task:
For a given column of data (stored as a numpy array), "bin" the data in a greedy fashion, where I test the current value and the next one together in order to calculate the entropy of that pair.
Pseudocode would look like this:
split_data(feature):
    BestValues = 0
    BestGain = 0
    For Each Value in Feature:
        Calculate CurrentGain As InformationGain(Entropy(Feature) - Entropy(Value + Next Value))
        If CurrentGain > BestGain:
            Set BestValues = Value, Next Value
            Set BestGain = CurrentGain
    return BestValues
I currently have Python code that looks like the following:
import math
import numpy

# This function finds the total entropy for a given dataset
def entropy(dataset):
    # Declare variables
    total_entropy = 0
    # Determine the classes and the number of items in each class
    classes = numpy.unique(dataset[:, -1])
    # Loop through each "class", or label
    for aclass in classes:
        # Create temp variables
        currFreq = 0
        currProb = 0
        # Loop through each row in the dataset
        for row in dataset:
            # If that row has the same label as the current class, increment the frequency
            if aclass == row[-1]:
                currFreq = currFreq + 1
            # If not, continue
            else:
                continue
        # The current probability is the # of occurrences / total occurrences
        currProb = currFreq / len(dataset)
        # If the frequency is 0, the entropy contribution is 0. If not, use the entropy formula
        if currFreq > 0:
            total_entropy = total_entropy + (-currProb * math.log(currProb, 2))
        else:
            return 0
    # Return the total entropy
    return total_entropy

# This function gets the entropy for a single attribute
def entropy_by_attribute(dataset, feature):
    # The attribute is the specific feature (column) of the dataset
    attribute = dataset[:, feature]
    # The target_variables are the unique labels in the last column
    target_variables = numpy.unique(dataset[:, -1])
    # The unique values in the column we are evaluating
    variables = numpy.unique(attribute)
    # The entropy for the attribute in question
    entropy_attribute = 0
    # Loop through each of the possible values
    for variable in variables:
        denominator = 0
        entropy_each_feature = 0
        # For every row in the column
        for row in attribute:
            # If it is equal to the current value we are evaluating, increase the denominator
            if row == variable:
                denominator = denominator + 1
        # Now loop through each class
        for target_variable in target_variables:
            numerator = 0
            # Loop through the dataset
            for row in dataset:
                # If the current row's feature value equals the value being evaluated
                # and its label equals the label being evaluated, increase the numerator
                if row[feature] == variable and row[-1] == target_variable:
                    numerator = numerator + 1
            # Use eps to protect against divide by 0
            fraction = numerator / (denominator + numpy.finfo(float).eps)
            entropy_each_feature = entropy_each_feature + (-fraction * math.log(fraction + numpy.finfo(float).eps, 2))
        # Weight this value's entropy by how often the value occurs
        big_fraction = denominator / len(dataset)
        entropy_attribute = entropy_attribute + (big_fraction * entropy_each_feature)
    # Return that entropy
    return entropy_attribute

# This function calculates the information gain
def infogain(dataset, feature):
    # Grab the entropy of the total dataset
    total_entropy = entropy(dataset)
    # Grab the entropy for the current feature being evaluated
    feature_entropy = entropy_by_attribute(dataset, feature)
    # Calculate the information gain
    infogain = float(abs(total_entropy - feature_entropy))
    # Return the information gain
    return infogain
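For reference, this is roughly how I call these functions at the moment, on a tiny made-up array where the last column is the label (the numbers are placeholders, not my real data):

import numpy

# Toy dataset: one feature column plus a label column (placeholder values)
data = numpy.array([[2.0, 0],
                    [3.5, 0],
                    [6.1, 1],
                    [7.8, 1]])

print(entropy(data))       # entropy of the label column
print(infogain(data, 0))   # information gain for feature 0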
However, I am unsure how to do the following:
1. For a feature, grab its total entropy.
2. For a single feature, determine the entropy using a binning technique where I test two values at a time.
I cannot work out how to write code to accomplish 1 and 2, and I am struggling. I will continue to update this question with any progress I make; a rough sketch of what I am aiming for is below.
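For item 1, numpy.unique with return_counts=True should give the value frequencies for a column directly, and the greedy pair test from the pseudocode could then look roughly like the sketch below (the column_entropy helper and this split_data are only an illustration of the idea, not tested code):

import numpy

def column_entropy(column):
    # Entropy of a 1-D array, computed from the value counts
    _, counts = numpy.unique(column, return_counts=True)
    probs = counts / len(column)
    return -numpy.sum(probs * numpy.log2(probs))

def split_data(dataset, feature):
    # Greedy pass over consecutive value pairs in one feature column,
    # keeping the pair whose two-value "bin" gives the largest gain
    column = dataset[:, feature]
    feature_entropy = column_entropy(column)
    best_gain = -numpy.inf
    best_values = None
    for current, following in zip(column, column[1:]):
        pair_entropy = column_entropy(numpy.array([current, following]))
        current_gain = feature_entropy - pair_entropy
        if current_gain > best_gain:
            best_gain = current_gain
            best_values = (current, following)
    return best_values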
python numpy statistics
asked Mar 23 at 23:24 by Jerry M.
1 Answer
The following function computes the entropy for a single column (feature):
import math

def entropy(attributes, dataset, targetAttr):
    freq = {}          # maps each value to its frequency count
    entropy = 0.0
    index = 0
    # Find the position of the target attribute in the attribute list
    for item in attributes:
        if targetAttr == item:
            break
        else:
            index = index + 1
    index = index - 1
    # Count how often each value appears in that column
    for item in dataset:
        if item[index] in freq:
            # Increase the frequency count
            freq[item[index]] += 1.0
        else:
            # Initialize the count for a value seen for the first time
            freq[item[index]] = 1.0
    # Sum -p * log2(p) over the value frequencies
    for count in freq.values():
        entropy = entropy + (-count / len(dataset)) * math.log(count / len(dataset), 2)
    return entropy
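For example, it can be tried on a small made-up dataset like this (the attribute names and rows below are purely illustrative, and the exact column used depends on how the attributes list lines up with the dataset columns):

attributes = ['outlook', 'play']   # hypothetical column names
dataset = [['sunny', 'no'],
           ['sunny', 'no'],
           ['rain', 'yes'],
           ['rain', 'yes']]

# Call the entropy() function defined above on this toy data
print(entropy(attributes, dataset, 'play'))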
answered Apr 29 at 20:04 by Jerry M.