Relationship between words/phrases

What do you want to have included in the WordNet.Net library?

Postby thanhdn » Tue Jul 19, 2005 10:15 am

Hi all,

I have a question. If you have experience with this problem, please help me.

+ First, finding the relations between two words:
Given two words, the code should find the relations between them (or compute their semantic/lexical similarity). These relations include "kind-of" (hypernymy and hyponymy for nouns, hypernymy and troponymy for verbs), "part-of" (holonymy and meronymy for nouns), and equivalence (synonymy).
Example:
"tree" is a kind of "plant", so "tree" is a hyponym of "plant" and "plant" is a hypernym of "tree". Similarly, since "trunk" is a part of "tree", "trunk" is a meronym of "tree" and "tree" is a holonym of "trunk". Finally, "car" is a synonym of "auto".

As Troy wrote to me in his example, "if you enter 'car' and 'wheel' you would like 'meronym' to be one of the types listed", so the code may return a list of relations.

An hour ago I coded a searcher that looks for the connection between two words. To reduce the computation time, I added two restrictions: first, only the synonym relation is considered (using Jeff's lexicon class), and second, the length of the search path is limited.
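
To make the two restrictions concrete, here is a minimal sketch in Python rather than .NET. It is not the searcher described above: the `SYNONYMS` table is hypothetical toy data standing in for WordNet lookups, and `find_path` and `max_len` are names made up for the illustration.

```python
from collections import deque

# Hypothetical toy synonym links; a real searcher would query WordNet instead.
SYNONYMS = {
    "car": {"auto", "automobile"},
    "auto": {"car"},
    "automobile": {"car", "motorcar"},
    "motorcar": {"automobile"},
}


def find_path(start, goal, max_len):
    """Breadth-first search over synonym links only (restriction 1).

    Paths longer than max_len words are never explored (restriction 2).
    Returns the list of words on the shortest path, or None.
    """
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        if len(path) == max_len:
            continue  # the path is already at the limit, do not extend it
        for nxt in sorted(SYNONYMS.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


print(find_path("car", "motorcar", 3))
print(find_path("car", "motorcar", 2))
```

With a limit of 3 the search reaches "motorcar" through "automobile"; with a limit of 2 it gives up, which is the price paid for the shorter running time.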

+ Second, finding the relations between words in a short phrase:
Since each word may have more than one sense, we can disambiguate the words in a short phrase. (My friend told me the Lesk algorithm can be used to solve this problem, but I've never read about it.)

For example, the word "pine" has 2 senses:
sense 1: kind of evergreen tree with needle-shaped leaves
sense 2: waste away through sorrow or illness
The word "cone" has 3 senses:
sense 1: solid body which narrows to a point
sense 2: something of this shape whether solid or hollow
sense 3: fruit of certain evergreen tree

By comparing each of the two senses of "pine" with each of the three senses of "cone", we find that the words "evergreen tree" occur in exactly one sense of each word (sense 1 of "pine" and sense 3 of "cone"). These two senses are then declared to be the most appropriate ones when "pine" and "cone" are used together. This inference may help to reduce the complexity.
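
The comparison above can be sketched in a few lines of Python. This is a simplified gloss-overlap version of the idea, not the full Lesk algorithm; the glosses are copied from this post, while the stop list and the function names are assumptions made for the illustration.

```python
from itertools import product

# Glosses copied from the post above.
SENSES = {
    "pine": [
        "kind of evergreen tree with needle-shaped leaves",
        "waste away through sorrow or illness",
    ],
    "cone": [
        "solid body which narrows to a point",
        "something of this shape whether solid or hollow",
        "fruit of certain evergreen tree",
    ],
}

# Assumed stop list so that function words do not count as overlap.
STOP_WORDS = {"a", "of", "or", "this", "to", "which", "with", "through", "whether"}


def content_words(gloss):
    """Lower-case the gloss and drop the stop words."""
    return {w for w in gloss.lower().split() if w not in STOP_WORDS}


def best_sense_pair(word_a, word_b):
    """Return (sense of a, sense of b, shared words) with 1-based sense numbers."""
    best = (None, None, set())
    pairs = product(enumerate(SENSES[word_a], 1), enumerate(SENSES[word_b], 1))
    for (i, gloss_a), (j, gloss_b) in pairs:
        shared = content_words(gloss_a) & content_words(gloss_b)
        if len(shared) > len(best[2]):
            best = (i, j, shared)
    return best


sense_a, sense_b, shared = best_sense_pair("pine", "cone")
print(sense_a, sense_b, sorted(shared))  # 1 3 ['evergreen', 'tree']
```

Only 2 x 3 = 6 gloss pairs are compared here, but the number of pairs grows with the product of the sense counts, so longer phrases get expensive quickly.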


Have you coded something like this? I am now thinking about this problem in order to experiment with an application that could semi-automatically match XML schemas.

Thank you.
thanhdn
 
Posts: 13
Joined: Fri Jul 15, 2005 8:06 am

OK

Postby thanhdn » Sun Aug 14, 2005 1:22 am

I think I should research and start coding these features. I will also write some articles to discuss their useful applications with you.

Collaboration or help is always welcome!


Thank you in advance.

Postby ebswift » Sun Aug 14, 2005 5:23 am

Hello Thanh, just post any thoughts you have in here, and I will post my ideas too. Hopefully by discussing we can come up with more useful thoughts.

My only thought so far is a "brute force" approach. I'll use "trunk" and "tree" as an example to explain what I mean.

First, do a search for "trunk". Next, find the part of speech using Jeff Martin's lexicon. Then, now that we have established that we are dealing with a noun, start running each relation search that is returned as part of the search set, and look at the returned word list to see whether it contains the second word.

When the search gets to 'holonym', it will find 'tree', so you have established that tree is a holonym of trunk. For the brute force to work effectively you would also need a list of the synonyms of the second word 'tree' to compare with lexemes in the relations searches instead of using the single word 'tree'. That would allow for a 'fuzzy' match on words.
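
As a rough illustration of this brute-force idea, here is a sketch in Python with hypothetical toy data. The `RELATIONS` and `SYNONYMS` tables and the `find_relations` function are made up for the example and are not part of the WordNet.Net API; in the library each entry would come from running a relation search.

```python
# Hypothetical toy data: for each first word, the words returned by each
# relation search. A real implementation would run the searches on WordNet.
RELATIONS = {
    "trunk": {"hypernym": {"stem", "stalk"}, "holonym": {"tree"}},
    "wheel": {"holonym": {"car"}},
}

# Hypothetical synonym lists for the second word, used for the fuzzy match.
SYNONYMS = {"auto": {"car", "automobile"}}


def find_relations(first, second):
    """Return the names of all relation searches on `first` whose results
    contain `second` or one of its synonyms."""
    targets = {second} | SYNONYMS.get(second, set())
    return sorted(
        name
        for name, words in RELATIONS.get(first, {}).items()
        if words & targets
    )


print(find_relations("trunk", "tree"))  # ['holonym']
print(find_relations("wheel", "auto"))  # ['holonym'], matched via the synonym "car"
```

The second call shows why the synonym list matters: "auto" itself never appears in the results for "wheel", but its synonym "car" does.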

Those are my thoughts anyway. It would allow for a fast-track solution to your problem, leaving a more elegant one to be designed later. If you look at the code for an overview search in the Windows example of the library, you will see how the available relation searches are populated into a popup menu from the part-of-speech buttons along the top. You could activate those searches manually.
~Troy
ebswift
Site Admin
 
Posts: 90
Joined: Tue Jun 07, 2005 12:41 am
Location: Queensland, Australia

semantic similarity

Postby thanhdn » Sun Aug 14, 2005 2:06 pm

Hi Troy,

That's a great way. Yes, brute force is slow, but it simply generates all possible routes, and for now it seems to be the only way to approach the semantic relatedness problem.

In relation to my article at CP, I am currently more concerned with semantic similarity, which is a special case of semantic relatedness (which you presented above). Semantic similarity only considers the IS-A (kind-of) relation (hyponymy/hypernymy for nouns and troponymy for verbs).

Computing the path distance between two words a and b means searching for the connection path between them in WordNet. This can be done by searching the paths from each sense of a to each sense of b and then selecting the shortest one. Path length is measured in nodes rather than links, so the length between siblings (sister nodes) is 3, and the length between two members of the same synset is 1.

For example, a hyponymy relation in WordNet:
Code:

                      object
                         |
                      artifact
                     /        \
      instrumentality          article
             |                    |
   conveyance, transport        ware
             |                    |
          vehicle             table ware
             |                    |
      wheeled vehicle    cutlery, eating utensil
        /          \              |
automotive, motor  bike, bicycle  fork
   /         \
car, auto,...  truck

Looking at this tree (sorry if it looks bad, but it took 5 minutes to draw :D ), the length between "car" and "auto" is 1 because they both belong to the same synset. The length between "car" and "bike" is 4, and the length between "car" and "fork" is 12.
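
These lengths can be checked with a short Python sketch. It assumes the hierarchy is a plain tree with one sense per word, which is a simplification of real WordNet (where a word has several senses and the shortest path over all sense pairs is taken). The `PARENT` and `SYNSET_OF` tables are transcribed from the diagram, with each synset represented by its first word; the function names are made up for the illustration.

```python
# Child -> parent links transcribed from the diagram; one node per synset.
PARENT = {
    "artifact": "object",
    "instrumentality": "artifact",
    "article": "artifact",
    "conveyance": "instrumentality",
    "vehicle": "conveyance",
    "wheeled vehicle": "vehicle",
    "automotive": "wheeled vehicle",
    "bike": "wheeled vehicle",
    "car": "automotive",
    "truck": "automotive",
    "ware": "article",
    "table ware": "ware",
    "cutlery": "table ware",
    "fork": "cutlery",
}

# Words that share a synset are mapped to the same node.
SYNSET_OF = {"auto": "car", "bicycle": "bike"}


def ancestors(node):
    """Return the chain from node up to the root, node first."""
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def path_length(word_a, word_b):
    """Number of nodes on the path between the two words' synsets."""
    up_a = ancestors(SYNSET_OF.get(word_a, word_a))
    up_b = ancestors(SYNSET_OF.get(word_b, word_b))
    steps_in_b = {node: i for i, node in enumerate(up_b)}
    for steps_in_a, node in enumerate(up_a):
        if node in steps_in_b:
            # links on both sides of the common ancestor, plus 1 to count nodes
            return steps_in_a + steps_in_b[node] + 1
    return None


for pair in [("car", "auto"), ("car", "truck"), ("car", "bike"), ("car", "fork")]:
    print(pair, path_length(*pair))
```

The four pairs give 1, 3, 4 and 12, matching the sibling length of 3 and the other figures quoted above.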

Personally, I think the path length above gives us a simple way to compute the relatedness distance between two words. Some issues need to be addressed:
- Lemmatization: when looking up a word in WN, the word is first lemmatized, so the distance between "book" and "books" is 0 since they are identical. What about "mice" and "mouse"? This can be done by using Morph.cs, but I have not tried it yet.
- Part of speech: path length only compares words that have the same part of speech (POS). This means that we don't compare a noun with a verb, because they are located in different taxonomy trees. I only consider words that are nouns, verbs, or adjectives, in that order. We will use Jeff Martin's lexicon: when considering a word, we first check whether it is a noun, and if so we treat it as a noun and disregard its verb or adjective readings. If it is not a noun, we check whether it is a verb, and so on.
- Compound nouns: terms like "travel agent" will be treated as a single word during tokenization.
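
The three points above can be sketched as a small preprocessing step. This is only an illustration in Python: the `IRREGULAR` and `LEXICON` tables are hypothetical stand-ins for what Morph.cs and Jeff Martin's lexicon would provide, and the plural rule is deliberately naive.

```python
# Hypothetical lookup tables; a real system would consult WordNet's
# exception lists and a full lexicon instead.
IRREGULAR = {"mice": "mouse"}
LEXICON = {
    "book": {"noun", "verb"},
    "mouse": {"noun"},
    "eat": {"verb"},
    "travel agent": {"noun"},
}
POS_PRIORITY = ("noun", "verb", "adj")


def lemmatize(word):
    """Reduce a word to its base form (toy rule: exceptions, then strip -s)."""
    word = word.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    if word.endswith("s") and word[:-1] in LEXICON:
        return word[:-1]
    return word


def pick_pos(word):
    """Return (lemma, POS), preferring noun, then verb, then adjective."""
    lemma = lemmatize(word)
    for pos in POS_PRIORITY:
        if pos in LEXICON.get(lemma, ()):
            return lemma, pos
    return lemma, None


def tokenize(text):
    """Split on spaces, keeping two-word compounds found in the lexicon."""
    words = text.lower().split()
    tokens, i = [], 0
    while i < len(words):
        pair = " ".join(words[i:i + 2])
        if i + 1 < len(words) and pair in LEXICON:
            tokens.append(pair)
            i += 2
        else:
            tokens.append(words[i])
            i += 1
    return tokens


print(pick_pos("books"))               # ('book', 'noun')
print(pick_pos("Mice"))                # ('mouse', 'noun')
print(tokenize("travel agent books"))  # ['travel agent', 'books']
```

Note that "book" is listed as both a noun and a verb, and the noun reading wins because of the priority order.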

There are many WordNet-based similarity measures, such as Leacock-Chodorow and Wu-Palmer, which are based on path length and depth, and Resnik, which is based on information content and therefore needs corpus statistics. The path-length measures have the advantage of being independent of corpus statistics, but they are also not very successful.

Besides the formula I proposed in the experiment of the article at CP, I think there is one simple path-based similarity measure:

Sim(s1, s2) = 1 / dist(s1, s2)
where s1 and s2 are synsets of the words a and b respectively, and dist(s1, s2) is the path length between s1 and s2.
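
As a quick worked example in Python, the formula can be applied to the path lengths quoted for the diagram earlier in this post (1, 4 and 12). The function name is made up for the illustration, and the check on the input assumes the node-counting convention, under which the shortest possible path has length 1.

```python
def similarity(path_length):
    """Sim(s1, s2) = 1 / dist(s1, s2), with dist counted in nodes."""
    if path_length < 1:
        raise ValueError("a node-counted path length is at least 1")
    return 1.0 / path_length


# Path lengths taken from the diagram example in this post.
examples = {("car", "auto"): 1, ("car", "bike"): 4, ("car", "fork"): 12}
for pair, dist in examples.items():
    print(pair, round(similarity(dist), 3))
```

The scores are 1.0, 0.25 and 0.083, so similarity falls off quickly as the path grows, and two members of the same synset get the maximum score of 1.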

Considering only semantic similarity, do you agree with me?
Last edited by thanhdn on Mon Aug 15, 2005 9:45 am, edited 2 times in total.

Postby ebswift » Sun Aug 14, 2005 8:06 pm

Thanh, that does look like the most logical approach, great diagram example.
~Troy

...

Postby thanhdn » Mon Aug 15, 2005 1:43 am

Troy,

Thank you very much for attending to this matter though I know you are busy now.

Thanks again!

Lesk algorithm and other related algorithms

Postby richardn » Mon Aug 15, 2005 8:29 pm

Hello Thanh,

Try looking at the following article, which discusses applying the Lesk algorithm and other related algorithms to the WordNet library.

http://www.msi.umn.edu/general/Reports/ ... 005-25.pdf

I for one would be interested if you managed to apply these ideas to WordNet using .NET code.

Richard
richardn
 
Posts: 1
Joined: Mon Aug 15, 2005 8:25 pm

Good news

Postby thanhdn » Tue Aug 16, 2005 4:32 am

Hi Richard,

Thank you very much for that interesting paper.
Yes, I am trying to apply these good ideas to WordNet.Net.
The major problems now are improving the speed of the connection-path search and extending the measurement algorithms.

Thanh

Postby ebswift » Sun Aug 28, 2005 11:21 pm

Thanh,

I have found what appears to be a project specialising in the kind of relationship searching you are trying to achieve; it's in Perl, but the functionality may be of use.

The summary page is here:

http://search.cpan.org/dist/WordNet-Similarity/doc/intro.pod

There is a web interface here:

http://marimba.d.umn.edu/cgi-bin/similarity/similarity.cgi

Perl modules and documentation are here:

http://search.cpan.org/dist/WordNet-Similarity/

I hope this information is of some use.
~Troy

....

Postby thanhdn » Wed Aug 31, 2005 2:03 pm

Hello Troy,

Thank you very much, it is very helpful. Unfortunately, I don't know Perl, but I can still benefit from Ted's documentation and the library's functionality.

Currently the temporary project is hosted at: http://opensvn.csie.org/WordNetDotNet/t ... cts/Thanh/

PS: I am very interested in your current work on returning search results as a structure, which is much better than a flat string.

Re: Relationship between words/phrases

Postby bsubba » Sun Aug 21, 2011 7:20 pm

Hi Thanhdn,
I am trying to write a program in PHP to find the semantic similarity between two given nouns, using WordNet as the knowledge base. I saw your post, but I am having difficulty building the kind of tree you show in it. Could you please help me build such a tree programmatically, so that I can use different similarity measures such as Resnik, Lesk, etc.?
It would be nice if you could provide some kind of algorithm or steps to build the tree.
Thanks in advance.
Regards,
bikash
bsubba
 
Posts: 1
Joined: Sun Aug 21, 2011 6:48 pm

