How Does the Hashing Trick in Machine Learning Work?


I have a large categorical dataset and a feedforward ANN that I am using for classification. I programmed the machine learning model in Excel VBA (the only programming language I currently have access to).

I have 150 categories in my dataset that I need to process. I have tried Binary Encoding and One-Hot Encoding, but because of the number of categories, the resulting vectors are often too large for VBA to handle and I end up with a memory error.

I'd like to give the hashing trick a go and see if it works any better. However, I don't understand how to do this in Excel.

I have reviewed the following links to try and understand it:

https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/feature-hashing

https://medium.com/value-stream-design/introducing-one-of-the-best-hacks-in-machine-learning-the-hashing-trick-bf6a9c8af18f

https://en.wikipedia.org/wiki/Vowpal_Wabbit

I still don't completely understand it. Here is what I have done so far. I used the code example from the following question to create a hash sequence for my categorical data:
Generate short hash string based using VBA

Using that code, I have been able to produce collision-free numerical hash sequences. However, what do I do now? Does the hash sequence need to be converted to a binary vector? This is where I get lost.

I provided a small example of my data thus far. Would somebody be able to show me step by step how the hashing trick works (preferably for Excel)?

CATEGORY    HASH SEQUENCE
STEEL       37152
PLASTIC     31081
ALUMINUM    2310
BRONZE      9364

asked Mar 28 at 16:24

junfanbl

216 bronze badges

excel machine-learning hash hashcode


1 Answer
The hashing trick keeps words outside your vocabulary (misspellings, for example) from taking up extra memory. In a regular bag-of-words (BOW) model, you have one dimension per word in the vocabulary. A misspelled word and its correct spelling therefore occupy separate dimensions, if the misspelled word is in the model at all. If it is not in the model, then depending on your model you might ignore it completely. Both effects add up over time. "Misspelled word" here is just an example of any word that is not in the vocabulary used to create the vectors you train your model with. Any model trained this way cannot adapt to new vocabulary without being trained all over again.
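To make the vocabulary problem concrete, here is a minimal sketch in Python (not VBA) of a fixed-vocabulary encoding. The names `VOCAB`, `INDEX`, and `one_hot` are made up for illustration, and the four categories are the ones from the question. It is only meant to show what happens to a word the vocabulary has never seen.

```python
# Hypothetical vocabulary, fixed at training time (categories from the question).
VOCAB = ["STEEL", "PLASTIC", "ALUMINUM", "BRONZE"]
INDEX = {word: i for i, word in enumerate(VOCAB)}


def one_hot(word):
    """Return a vector with one dimension per vocabulary word."""
    vec = [0] * len(VOCAB)
    i = INDEX.get(word)
    if i is not None:
        vec[i] = 1
    # An out-of-vocabulary word falls through and leaves the vector all zeros.
    return vec


print(one_hot("BRONZE"))  # [0, 0, 0, 1]
print(one_hot("COPPER"))  # [0, 0, 0, 0]
```

The vector length is tied to the vocabulary size, so supporting "COPPER" would mean growing every vector and retraining.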

The hashing method lets you incorporate out-of-vocabulary words, at the cost of some potential accuracy, and it bounds your memory use. You start by defining a hash function that takes some input (typically a word) and maps it to an output value within a predetermined range, say 0 to 2^16 - 1. Your vectors are then always capped at size 2^16 (an arbitrary value, really), which prevents memory issues. Hash functions do have "collisions": hash(a) might equal hash(b). With an appropriately large output range this is rare, but it is possible, and when it happens two inputs share a dimension, so you lose some accuracy. In return, because the hash function can in principle take any input string, an out-of-vocabulary word still produces a vector of the same size as the vectors used to train the model. Since the new data vector is the same size as the training vectors, you can use it to refine your model instead of being forced to train a new one.
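The steps above can be sketched as follows, again in Python rather than VBA. This is an illustration under assumptions, not a definitive implementation: `N_BUCKETS = 16` is an arbitrarily small vector size chosen for readability, `zlib.crc32` stands in for whatever hash function you already have, and `bucket` and `hashed_vector` are hypothetical helper names. The last lines reuse the hash values posted in the question and reduce them with a modulo, which is the step that turns a hash number into a vector index.

```python
import zlib

N_BUCKETS = 16  # assumed vector size; the text above uses 2**16 as an example


def bucket(word, n_buckets=N_BUCKETS):
    """Map any string to an index in [0, n_buckets) using a stable hash."""
    # crc32 is only a stand-in; any deterministic hash function works here.
    return zlib.crc32(word.encode("utf-8")) % n_buckets


def hashed_vector(words, n_buckets=N_BUCKETS):
    """Build a fixed-size count vector; colliding words share one slot."""
    vec = [0] * n_buckets
    for w in words:
        vec[bucket(w, n_buckets)] += 1
    return vec


seen = hashed_vector(["STEEL"])
unseen = hashed_vector(["COPPER"])  # never part of any vocabulary
print(len(seen), len(unseen))  # 16 16

# Hash values copied from the question, reduced to slots by modulo.
question_hashes = {"STEEL": 37152, "PLASTIC": 31081, "ALUMINUM": 2310, "BRONZE": 9364}
slots = {name: h % N_BUCKETS for name, h in question_hashes.items()}
print(slots)  # {'STEEL': 0, 'PLASTIC': 9, 'ALUMINUM': 6, 'BRONZE': 4}
```

Each category's input vector is then all zeros except for a 1 (or a count) at its slot. The same arithmetic should carry over to VBA with a fixed-length array and the `Mod` operator. A smaller `N_BUCKETS` saves memory but makes collisions more likely.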

answered Mar 28 at 16:43

Evan Mata

16713 bronze badges
