Looking for an automated way to generate taxonomies for a glossary
I'm really not sure where to post this question, as this is more of a data governance question than a programming one, and is also somewhat subjective. I will remove/repost elsewhere if this is not the right place.
I am setting up a glossary of terms using a bottom-up approach. Namely, I am initially generating the terms from all of the unique logical column names from different databases and aggregating them into one central location, the glossary. The clean-up and creation of relationships is proving to be an exceptionally difficult and manual process so far. This particular post concerns automating the creation of taxonomies/hierarchical groupings within the larger list of glossary terms.
Let's say that my glossary contains the ten terms below. The terms which I know to be related are not necessarily united by any naming convention, so I'm simulating that with this list:
['term a',
'second term',
'term 3',
'term d',
'term number five',
'sixth term',
'seventh term',
'term eight',
'ninth term',
'term 10']
I know for a fact that 'sixth term' is a parent term of 'term a', 'term number five' and 'term 10', and ideally I would want to create that association in something like a Python dictionary of lists, as follows:
'sixth term': ['term a', 'term number five', 'term 10']
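To make the target structure concrete, here is a minimal sketch of building that dictionary of lists from known parent/child pairs. The names `known_pairs` and `build_taxonomy` are hypothetical, and the pairs themselves are assumed to come from whatever matching step ends up producing them.

```python
from collections import defaultdict

# Hypothetical input: (parent, child) pairs that are already known to be related.
known_pairs = [
    ('sixth term', 'term a'),
    ('sixth term', 'term number five'),
    ('sixth term', 'term 10'),
]

def build_taxonomy(pairs):
    """Group child terms under their parent term, keeping input order."""
    taxonomy = defaultdict(list)
    for parent, child in pairs:
        # Skip repeats so each child is listed once per parent.
        if child not in taxonomy[parent]:
            taxonomy[parent].append(child)
    return dict(taxonomy)

taxonomy = build_taxonomy(known_pairs)
```

With the pairs above, `taxonomy` holds the single entry shown in the example line.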
So basically, my problem is that:
- I have a series of terms that I know to be related in a hierarchical fashion
- The related terms do not necessarily follow a standard naming convention
My initial thoughts on how to deal with this are:
- Trying to use string-matching on the definitions of these terms
- Finding any common terminology that does exist between the related terms and starting from there
- Using a library like difflib to do sequence-matching on terms and their definitions
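As a rough sketch of the difflib idea, the snippet below scores every other term against one candidate parent using `difflib.SequenceMatcher` and sorts the results. The helper name `rank_by_similarity` is my own and not part of difflib, and the placeholder terms stand in for real names or definitions.

```python
from difflib import SequenceMatcher

terms = ['term a', 'second term', 'term 3', 'term d', 'term number five',
         'sixth term', 'seventh term', 'term eight', 'ninth term', 'term 10']

def rank_by_similarity(target, candidates):
    """Return (candidate, ratio) pairs, most similar to target first."""
    scored = [
        (candidate, SequenceMatcher(None, target, candidate).ratio())
        for candidate in candidates
        if candidate != target  # do not compare the target with itself
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

ranking = rank_by_similarity('sixth term', terms)
for candidate, ratio in ranking:
    print(f'{ratio:.2f}  {candidate}')
```

The ratio is purely character-based, so it may well rank unrelated terms above the true children when names share no convention.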
The problem with using definitions as inputs is that the definitions are not always good, so trying to detect semantic similarity might not produce useful results. I am also skeptical about sequence matching since, as I mentioned, the related terms do not necessarily follow the same naming convention.
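For contrast with a character-level sequence match, a word-overlap (Jaccard) score ignores word order and only counts shared words. This is a minimal sketch under the assumption that terms or definitions can be split on whitespace; the function names `tokens` and `jaccard` are hypothetical.

```python
def tokens(text):
    """Lowercase a string and split it into a set of whitespace-separated words."""
    return set(text.lower().split())

def jaccard(a, b):
    """Fraction of distinct words that two strings share (0.0 to 1.0)."""
    words_a, words_b = tokens(a), tokens(b)
    if not words_a and not words_b:
        return 0.0  # two empty strings: treat as no evidence of similarity
    return len(words_a & words_b) / len(words_a | words_b)

# 'term' is the only shared word out of four distinct words.
score = jaccard('sixth term', 'term number five')
```

Here `score` is 0.25. With poor definitions the shared words may be mostly generic ones, which is the same weakness described above.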
If necessary, I can create the associations manually, but since the glossary is so large (about 9,000 terms) I really want to avoid that. I also suspect that there is a way to do this with machine learning, as this seems like a classification problem, but I am a novice in ML and don't know what kind of algorithm would be able to do what I'm seeking.
Basically, my problem is somewhat unusual in that it's a governance problem that has spilled over into a programming one, and right now I can only think of creating these associations manually. I am not looking for code so much as libraries, tools, or general suggestions on how to deal with this kind of problem.
data-structures glossary
asked Mar 25 at 2:55
njrob