


Looking for an automated way to generate taxonomies for a glossary

















I'm not sure where to post this question, since it is more of a data governance question than a programming one, and it is also somewhat subjective. I will remove it and repost elsewhere if this is not the right place.



I am setting up a glossary of terms using a bottom-up approach. Namely, I am initially generating the terms from all of the unique logical column names from different databases and aggregating them into one central location, the glossary. The clean-up and creation of relationships is proving to be an exceptionally difficult and manual process so far. This particular post concerns automating the creation of taxonomies/hierarchical groupings within the larger list of glossary terms.
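As a rough illustration of the bottom-up step, here is a minimal Python sketch that normalizes raw column names into candidate terms and records which source column each one came from. The function names, the example sources (`crm`, `billing`) and the splitting rules (camelCase boundaries, underscores, hyphens, dots) are assumptions for illustration, not part of the setup described above. Collapsing spelling variants like this may shrink the list before any relationships are built.

```python
from __future__ import annotations

import re
from collections import defaultdict


def normalize_column_name(name: str) -> str:
    """Turn a raw column name such as 'custFirstName' or 'CUST_FIRST_NAME'
    into a lowercase, space-separated candidate term (assumed rules)."""
    # Insert a space at lowercase/digit -> uppercase boundaries (camelCase).
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name)
    # Treat underscores, hyphens and dots as word separators.
    spaced = re.sub(r"[_\-.]+", " ", spaced)
    # Lowercase and collapse repeated whitespace.
    return " ".join(spaced.lower().split())


def aggregate_terms(columns_by_source: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each normalized term to the set of 'source.column' names it came from."""
    terms: dict[str, set[str]] = defaultdict(set)
    for source, columns in columns_by_source.items():
        for column in columns:
            terms[normalize_column_name(column)].add(f"{source}.{column}")
    return dict(terms)


# Hypothetical example: two databases naming the same attribute differently.
columns = {
    "crm": ["custFirstName", "cust_id"],
    "billing": ["CUST_FIRST_NAME"],
}
terms = aggregate_terms(columns)
# terms["cust first name"] == {"crm.custFirstName", "billing.CUST_FIRST_NAME"}
```

Keeping the originating columns alongside each term preserves lineage, which is useful later when a reviewer needs to check what a merged term actually refers to.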



Say my glossary contains the ten terms below. The terms I know to be related are not necessarily united by any naming convention, so this list simulates that:



['term a',
'second term',
'term 3',
'term d',
'term number five',
'sixth term',
'seventh term',
'term eight',
'ninth term',
'term 10']


I know for a fact that 'sixth term' is the parent of 'term a', 'term number five' and 'term 10', and ideally I would record that association in something like a Python dictionary of lists:



{'sixth term': ['term a', 'term number five', 'term 10']}
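A minimal sketch of that dictionary-of-lists structure, assuming relationships are first collected as (child, parent) pairs (an assumption; the pairs could come from manual review or from an automated suggestion step). It also inverts the mapping so that a term with two different parents is flagged, which only makes sense if the taxonomy is meant to be a strict tree rather than a polyhierarchy. The helper names `build_taxonomy`, `parent_of` and `root_terms` are hypothetical.

```python
from collections import defaultdict


def build_taxonomy(edges):
    """Build a parent -> [children] dict from (child, parent) pairs,
    keeping children in first-seen order and skipping duplicates."""
    taxonomy = defaultdict(list)
    for child, parent in edges:
        if child not in taxonomy[parent]:
            taxonomy[parent].append(child)
    return dict(taxonomy)


def parent_of(taxonomy):
    """Invert parent -> [children] into child -> parent.
    Raises ValueError if a child has two different parents (strict tree assumed)."""
    parents = {}
    for parent, children in taxonomy.items():
        for child in children:
            if child in parents and parents[child] != parent:
                raise ValueError(
                    f"{child!r} has two parents: {parents[child]!r} and {parent!r}"
                )
            parents[child] = parent
    return parents


def root_terms(taxonomy, all_terms):
    """Terms that never appear as a child are treated as top-level terms."""
    children = {child for kids in taxonomy.values() for child in kids}
    return [term for term in all_terms if term not in children]


edges = [
    ("term a", "sixth term"),
    ("term number five", "sixth term"),
    ("term 10", "sixth term"),
]
taxonomy = build_taxonomy(edges)
# taxonomy == {'sixth term': ['term a', 'term number five', 'term 10']}
```

Storing edges first and deriving the dictionary from them makes it easy to add, review or reject individual relationships without rebuilding the whole structure by hand.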


In short, the problem is that:



  • I have a series of terms that I know to be related in a hierarchical fashion

  • The related terms do not necessarily follow a standard naming convention

My initial thoughts on how to deal with this are:



  • Using string matching on the definitions of these terms

  • Finding whatever common terminology does exist between the related terms and starting from there

  • Using a library such as difflib to run sequence matching on the terms and their definitions

The problem with using definitions as inputs is that they are often of poor quality, so detecting semantic similarity from them may not produce useful results. I am also skeptical of sequence matching because, as mentioned, related terms do not necessarily follow the same naming convention.
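The concern about sequence matching can be checked quickly on the simulated list with `difflib.SequenceMatcher` from Python's standard library. This sketch ranks the other names by `ratio()` against 'sixth term'. On this list the two highest scores go to 'ninth term' and 'seventh term', which share the suffix 'th term', while the known children 'term a', 'term number five' and 'term 10' all rank below them. The function name `rank_by_name_similarity` is hypothetical.

```python
from difflib import SequenceMatcher


def rank_by_name_similarity(term, candidates):
    """Rank candidate names by SequenceMatcher.ratio() against `term`,
    highest first, excluding the term itself."""
    scores = [
        (candidate, SequenceMatcher(None, term, candidate).ratio())
        for candidate in candidates
        if candidate != term
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


terms = ['term a', 'second term', 'term 3', 'term d', 'term number five',
         'sixth term', 'seventh term', 'term eight', 'ninth term', 'term 10']

for candidate, score in rank_by_name_similarity('sixth term', terms):
    print(f"{score:.2f}  {candidate}")
```

On data like this, character-level similarity appears to reward shared spelling rather than actual hierarchy, which supports treating it at most as a weak signal.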



If necessary, I can create the associations manually, but the glossary is large (about 9,000 terms), so I would really like to avoid that. I also suspect this could be done with machine learning, since it looks like a classification problem, but I am a novice in ML and don't know which kind of algorithm would fit.
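One way to frame it as classification, sketched here as an assumption rather than a known solution: label a small seed set of parent/child pairs by hand, then suggest a parent for each remaining term by comparing its definition with each parent's known children (a nearest-centroid classifier over bag-of-words vectors with cosine similarity). Everything below uses only the standard library; the definitions, parent names, stopword list and `min_score` threshold are hypothetical. Libraries such as scikit-learn offer sturdier building blocks for the same idea (`TfidfVectorizer`, `NearestCentroid`). Since the definitions are weak, suggestions like these would need human review.

```python
import math
import re
from collections import Counter

# Small illustrative stopword list (an assumption; tune for real data).
STOPWORDS = {"a", "an", "the", "of", "for", "to", "in", "and", "or", "on", "is"}


def bag_of_words(text):
    """Lowercased word counts, ignoring the stopwords above."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def suggest_parents(definitions, seed_taxonomy, min_score=0.2):
    """Nearest-centroid suggestions: for each unlabeled term, pick the parent
    whose known children's definitions are most similar to the term's own.
    Returns {term: (parent, score)} for scores >= min_score."""
    centroids = {}
    for parent, children in seed_taxonomy.items():
        centroid = Counter()
        for child in children:
            # Summing counts is fine: cosine similarity ignores overall scale.
            centroid.update(bag_of_words(definitions[child]))
        centroids[parent] = centroid

    labeled = set(seed_taxonomy) | {c for kids in seed_taxonomy.values() for c in kids}
    suggestions = {}
    for term, text in definitions.items():
        if term in labeled or not centroids:
            continue
        vector = bag_of_words(text)
        score, parent = max((cosine(vector, c), p) for p, c in centroids.items())
        if score >= min_score:
            suggestions[term] = (parent, score)
    return suggestions


definitions = {  # hypothetical glossary entries
    "cust id": "unique identifier of a customer account",
    "cust first name": "given name of the customer",
    "order total": "total amount of the order in dollars",
    "order tax": "tax amount charged on the order",
    "customer surname": "family name of the customer",
    "shipping fee": "amount charged for shipping the order",
}
seed = {"customer": ["cust id", "cust first name"], "order": ["order total", "order tax"]}
print(suggest_parents(definitions, seed))
# 'customer surname' -> 'customer', 'shipping fee' -> 'order'
```

Accepted suggestions can be added to the seed set and the step repeated, so the manual effort shifts from creating every link to reviewing proposed ones.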



This is a somewhat unusual problem: a governance issue that has spilled over into programming, and right now the only approach I can see is creating these associations manually. I am looking less for code than for libraries, tools, or general suggestions on how to tackle this kind of problem.










      data-structures glossary
















      asked Mar 25 at 2:55









      njrob

      196



