Eigenword embeddings and spectral learning; I'm a beginner....

In summary: A mathematics undergraduate on an internship is learning Java in order to work with the SWELL code, and is having difficulty with the environment setup and with compiling the code. The advice in the thread is to start with the basics (IDE, "Hello World", file I/O) and work up from there.
  • #1
dominique_
Hi everyone,

I am a mathematics undergraduate and I'm currently doing an internship at the informatics department of a university. I am well and truly out of my depth. My supervisor has assigned me tasks which include Java (a language I'm having to quickly pick up, having only used python/R).

The first task is to write a program that takes as an input a set of words and outputs a set of random vector embeddings for it so that they can be fed into wordvectors.org.
These are the steps I think I have to do, but I don't know how. I'm working in NetBeans for Java, but I'm wondering if I should use Eclipse.
1. Open a text file.
2. Read in the text file.
3. Store the words (i.e. strings) of the text file in an array.
4. Open another text file.
5. Loop over the array of stored words.
6. For each word, output the word to the second text file, along with a few random numbers.
7. Close the text files.
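The seven steps above can be sketched in Java roughly as follows. This is a minimal sketch under stated assumptions, not the supervisor's required format: the class name `RandomEmbeddings`, the file names `words.txt` and `vectors.txt`, the dimension of 5, the fixed seed, and the output layout (one word per line followed by space-separated numbers) are all illustrative choices that do not come from the thread.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Random;

class RandomEmbeddings {

    // Steps 1-3: open the input file and collect its whitespace-separated words.
    static List<String> readWords(Path input) throws IOException {
        List<String> words = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                for (String token : line.trim().split("\\s+")) {
                    if (!token.isEmpty()) {
                        words.add(token);
                    }
                }
            }
        } // try-with-resources closes the reader (part of step 7)
        return words;
    }

    // Steps 4-7: write one line per word: the word, then `dim` random numbers in [-1, 1).
    // The "word v1 v2 ... vd" layout is an assumption about what the consumer expects.
    static void writeEmbeddings(List<String> words, Path output, int dim, long seed)
            throws IOException {
        Random rng = new Random(seed); // fixed seed so runs are repeatable
        try (BufferedWriter writer = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            for (String word : words) {
                StringBuilder sb = new StringBuilder(word);
                for (int i = 0; i < dim; i++) {
                    double value = 2.0 * rng.nextDouble() - 1.0;
                    sb.append(' ').append(String.format(Locale.ROOT, "%.6f", value));
                }
                writer.write(sb.toString());
                writer.newLine();
            }
        } // the writer is closed here as well
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default file names; pass real paths as arguments.
        Path input = Paths.get(args.length > 0 ? args[0] : "words.txt");
        Path output = Paths.get(args.length > 1 ? args[1] : "vectors.txt");
        writeEmbeddings(readWords(input), output, 5, 42L);
    }
}
```

Compiled with `javac RandomEmbeddings.java`, it could be run as `java RandomEmbeddings words.txt vectors.txt`. A `List` is used instead of a raw array because the number of words is not known until the file has been read.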

Additionally, in the upcoming weeks I am to create a first word embedding prototype which he has detailed as
1) Download WordNet
2) Create a matrix graph from it and
3) do Singular value decomposition on it.
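To connect step 3 with the linear algebra, here is a small sketch of what "doing SVD on a graph matrix" can look like in plain Java. It is not the SWELL implementation and not a full SVD: it only finds the largest singular value and its right singular vector by power iteration on A^T A. The 4-node graph in `main` is a made-up stand-in for a WordNet-derived adjacency matrix, and the class name `TopSingularVector` is hypothetical. A real prototype would more likely call a linear algebra library for the full decomposition.

```java
import java.util.Arrays;

class TopSingularVector {

    // Compute a * x for an m x n matrix a and a vector x of length n.
    static double[] multiply(double[][] a, double[] x) {
        double[] y = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < x.length; j++) {
                y[i] += a[i][j] * x[j];
            }
        }
        return y;
    }

    // Compute transpose(a) * y for an m x n matrix a and a vector y of length m.
    static double[] multiplyTranspose(double[][] a, double[] y) {
        double[] x = new double[a[0].length];
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < x.length; j++) {
                x[j] += a[i][j] * y[i];
            }
        }
        return x;
    }

    // Euclidean length of a vector.
    static double norm(double[] v) {
        double sum = 0.0;
        for (double value : v) {
            sum += value * value;
        }
        return Math.sqrt(sum);
    }

    // Power iteration on A^T A, starting from the uniform unit vector.
    // Converges to the leading right singular vector as long as the start
    // vector is not orthogonal to it.
    static double[] topRightSingularVector(double[][] a, int iterations) {
        int n = a[0].length;
        double[] v = new double[n];
        Arrays.fill(v, 1.0 / Math.sqrt(n));
        for (int it = 0; it < iterations; it++) {
            double[] w = multiplyTranspose(a, multiply(a, v));
            double length = norm(w);
            if (length == 0.0) {
                break; // a is the zero matrix on this direction; nothing to refine
            }
            for (int j = 0; j < n; j++) {
                v[j] = w[j] / length;
            }
        }
        return v;
    }

    // The largest singular value is the length of A v for the leading vector v.
    static double topSingularValue(double[][] a, int iterations) {
        return norm(multiply(a, topRightSingularVector(a, iterations)));
    }

    public static void main(String[] args) {
        // Hypothetical graph: nodes 0, 1, 2 form a triangle and node 3 hangs off node 2.
        double[][] adjacency = {
            {0, 1, 1, 0},
            {1, 0, 1, 0},
            {1, 1, 0, 1},
            {0, 0, 1, 0}
        };
        System.out.println(topSingularValue(adjacency, 200));
        System.out.println(Arrays.toString(topRightSingularVector(adjacency, 200)));
    }
}
```

In an embedding setting, the entries of the leading singular vectors (one entry per node, i.e. per word) are what get used as coordinates, so this single vector would give a one-dimensional embedding.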

I am also to understand SWELL code for Java.

---> SWELL https://github.com/paramveerdhillon/swell
---> Paper on which this project is based http://www.pdhillon.com/dhillon15a.pdf
---> http://wordvectors.org/ (I'm required to download data and experiment with this...?)

I am so sorry if this is vague or lazy, but I've been researching academic papers and trying to absorb so much and I don't want to screw this opportunity up.

I'd be grateful for absolutely anything. I obtained this internship based on my linear algebra knowledge and concepts such as linear transformation, SVD, eigenvectors are obviously at play here and have made my research more digestible but I can't see how any of that can be implemented by me, how am I of use to this research team, gah!

Thank you in advance
 
  • #2
For the IDE, either one is as good as the other. I would find out what the majority of people in the department are using and go with that. The important thing is to pick one and stick with it. I use IntelliJ, so I'm not familiar with either of them, but there are plenty of people here who use NetBeans and Eclipse.

There are lots of tutorials for learning Java I/O. Here is a site that is good for learning the basics of Java programming - http://www.mkyong.com/tutorials/java-io-tutorials/.
I would start with the http://www.mkyong.com/java/how-to-read-file-from-java-bufferedreader-example/ and http://www.mkyong.com/java/how-to-write-to-file-in-java-bufferedwriter-example/ tutorials.

Meanwhile, download the SWELL code and see if you can get it to compile and run the examples.
 
  • #3
Thank you so much Borg!

You're going to have to bear with me: I downloaded the SWELL code and there are multiple classes, .java files and .txt files. So I don't know what to do. Are the examples the .txt files? How do I compile and run the examples? I need to add things to the code, right?

I don't know if you can be much clearer...

All the PhD students in my lab are working on other projects and even they are at a loss :(

Thanks for the links, though
 
  • #4
Take it a step at a time and you'll get through it. Before you get to the SWELL code, let's tackle your environment.
  1. Which IDE are you going to use?
  2. Have you been able to create and compile a basic "Hello World" project in your IDE?
Once those are accomplished, you should try to get the I/O examples working. Only then should you try to work with the SWELL code. You don't want to have to deal with environment setup issues at the same time as you're trying to figure out the SWELL API.
 
  • #5
Eclipse. However, when I go to open a .java file from SWELL it only gives me the NetBeans option, and I can't seem to deselect it and use Eclipse.

I can compile a basic Hello World, yes. Right now, I've just begun to understand loops in Java. So I have knowledge of int, doubles, constructors, instance and local variables, stuff like that. The beginner stuff. I've done a few tutorials on Udacity, as well.
 
  • #6
dominique_ said:
Eclipse. However, when I go to open a .java file from SWELL it only gives me the NetBeans option, and I can't seem to deselect it and use Eclipse.

I can compile a basic Hello World, yes. Right now, I've just begun to understand loops in Java. So I have knowledge of int, doubles, constructors, instance and local variables, stuff like that. The beginner stuff. I've done a few tutorials on Udacity, as well.
For now, it isn't important to associate .java files with Eclipse. How are you compiling the code? Through the command line or from within an Eclipse project (http://www.cis.upenn.edu/~matuszek/cit591-2004/Pages/starting-eclipse.html)? Have you created a project for the SWELL code?

Other notes:
I've taken an initial look at the SWELL repository, and the project uses Apache Ant to compile the code. You will need to download the latest version (http://mirrors.koehn.com/apache//ant/binaries/apache-ant-1.9.5-bin.zip). The build.xml file that Ant uses has hardcoded paths to Eclipse that you'll probably need to change later. Which brings me to your environment: are you using Windows, Linux or something else? Please note the version.
 
  • #7
From within an Eclipse project.

No I haven't created a project for the swell code
 
  • #8
I'm on Linux x86-64
 
  • #9
dominique_ said:
From within an Eclipse project.

No I haven't created a project for the swell code
dominique_ said:
I'm on Linux x86-64
OK. I don't usually work on UNIX-based systems so I won't be much help with environmental settings.

As for the SWELL code, I've looked at the project a little more closely and it uses makefiles and C++ files that I haven't worked with. I'll see if I can download it and get it running tomorrow. If we're lucky, the project will just create a jar file that you can use in your project. I haven't read the SWELL docs yet.

I think that it would be better if you review the I/O examples and get them running first. I don't have time today to dig into the SWELL code but I can answer questions on the I/O examples pretty easily.
 
  • #10
Thanks for all your help!
 
  • #11
Wow, the work you're into sounds really interesting. I'm reading about it all right now. My two cents: use Eclipse. Since you're new, I should warn you to be sure to use the most modern forms of anything you use. For instance, be sure to look at the modern collections classes
http://docs.oracle.com/javase/7/docs/api/java/util/Collections.html
which support operations like set intersection and union
https://docs.oracle.com/javase/tutorial/collections/interfaces/set.html
rather than getting too wrapped up in arrays. Also be sure to look at classes like Scanner for reading stuff in. I don't even know what the most modern Java has; I haven't worked in it for years. But time spent reading the API docs will save ten times the amount of time spent reinventing the wheel.
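As a rough illustration of the collections and Scanner advice, the sketch below reads words with `java.util.Scanner` and does set intersection and union with `retainAll` and `addAll`, which is how the Oracle Set tutorial linked above expresses those operations. The class name `WordSets` and the two sample sentences are made up for the example.

```java
import java.util.Locale;
import java.util.Scanner;
import java.util.Set;
import java.util.TreeSet;

class WordSets {

    // Read whitespace-separated words from a string into a sorted set.
    // Scanner can wrap a File or an InputStream in the same way.
    static Set<String> readWords(String text) {
        Set<String> words = new TreeSet<>();
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNext()) {
                words.add(scanner.next().toLowerCase(Locale.ROOT));
            }
        }
        return words;
    }

    // Words present in both sets; the inputs are left unmodified.
    static Set<String> intersection(Set<String> a, Set<String> b) {
        Set<String> result = new TreeSet<>(a);
        result.retainAll(b);
        return result;
    }

    // Words present in either set; the inputs are left unmodified.
    static Set<String> union(Set<String> a, Set<String> b) {
        Set<String> result = new TreeSet<>(a);
        result.addAll(b);
        return result;
    }

    public static void main(String[] args) {
        Set<String> first = readWords("the cat sat on the mat");
        Set<String> second = readWords("the dog sat on the log");
        System.out.println(intersection(first, second)); // prints [on, sat, the]
        System.out.println(union(first, second));        // prints [cat, dog, log, mat, on, sat, the]
    }
}
```

A set drops the duplicate "the" automatically, which an array would not, and `TreeSet` keeps the words in sorted order so the output is predictable.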

But most importantly, don't let anything stupid stop you. Come back here, or to stack exchange to get answers if you hit stupid brick walls. The work you're into is way too interesting to get bogged down.
 
  • #12
Fooality said:
Wow, the work you're into sounds really interesting. I'm reading about it all right now. My two cents: use Eclipse. Since you're new, I should warn you to be sure to use the most modern forms of anything you use, because some tutorials give it to you from the ground up (aka Java.20). For instance, be sure to look at the modern collections classes
http://docs.oracle.com/javase/7/docs/api/java/util/Collections.html
which support operations like set intersection and union
https://docs.oracle.com/javase/tutorial/collections/interfaces/set.html
rather than getting too wrapped up in arrays. Also be sure to look at classes like Scanner for reading stuff in. I don't even know what the most modern Java has; I haven't worked in it for years. But time spent reading the API docs will save ten times the amount of time spent reinventing the wheel.

But most importantly, don't let anything stupid stop you. Come back here, or to stack exchange to get answers if you hit stupid brick walls. The work you're into is way too interesting to get bogged down.
 
  • #13
Thanks Fooality, I managed to write code that takes in words and spits out vectors using BufferedReader and FileReader. I got some help from some PhD students, too.

The next step is the prototype which I'll no doubt be back for help with.

Thanks so much for your help!
 
  • #14
Good, don't be a stranger. I started watching the MIT OpenCourseWare class on AI, and I was really interested in Vapnik's support vector machines. You've got that same idea of partitioning a multi-dimensional space, which I assume is behind the singular value decomposition of the graph in your project (http://www.cc.gatech.edu/~vempala/papers/dfkvv.pdf). It's really compelling stuff. I was a CS major, not a math major, and unfortunately some of the math is daunting to me, but I think there's some amazing work coming from this area. I don't know the details of Hinton's "thought vectors", for instance
http://www.extremetech.com/extreme/...s-could-revolutionize-artificial-intelligence
but it sounds like it's in the same area of research. It's good to see the AI field moving beyond the broken naive biological models of neural nets and genetic algorithms and breaking new ground. You're in an amazing field at an amazing time.
 

FAQ: Eigenword embeddings and spectral learning; I'm a beginner....

What are eigenword embeddings?

Eigenword embeddings are word representations used in natural language processing. Each word is mapped to a dense numerical vector whose dimension is far smaller than the vocabulary size, and the vectors are computed with a spectral method (in the paper linked in the thread, canonical correlation analysis between words and their contexts). Words that occur in similar contexts end up with similar vectors, which can be useful for tasks such as language translation and sentiment analysis.

What is spectral learning?

Spectral learning is a family of machine learning techniques that recover the underlying structure of data from a spectral decomposition (an eigendecomposition or a singular value decomposition) of a matrix built from that data, such as a word co-occurrence matrix. Eigenword embeddings are one application of it to words.
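As a worked equation, the decomposition step can be written as below. This is a generic truncated-SVD sketch, not the exact procedure of the paper linked in the thread: the symbols M (an n-by-m matrix with one row per word), k (the embedding dimension), and e_i (the vector assigned to word i) are notation introduced here for illustration.

```latex
% Full singular value decomposition of the word matrix M (n words, m columns)
M = U \Sigma V^{\top}, \qquad
\Sigma = \operatorname{diag}(\sigma_1, \dots, \sigma_r), \quad
\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > 0

% Keep only the k largest singular values and their singular vectors
M_k = U_k \Sigma_k V_k^{\top}, \qquad k \le r

% One possible embedding of word i: row i of U_k
e_i = (U_k)_{i,:} \in \mathbb{R}^{k}
```

By the Eckart-Young theorem, M_k is the best rank-k approximation of M in the Frobenius norm, which is why keeping only the leading singular vectors loses as little of M as possible for a given k.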

How are eigenword embeddings and spectral learning related?

Eigenword embeddings are a product of spectral learning: a matrix describing how words relate to their contexts is decomposed, and the leading singular vectors give each word its numerical representation. The SVD step of the prototype described in the thread is a small instance of the same idea.

What are the benefits of using eigenword embeddings and spectral learning?

Because the embeddings come from a matrix decomposition, they rest on standard linear algebra (SVD, eigenvectors) that is well understood. The resulting vectors can capture subtle relationships between words and may improve the performance of tasks such as language translation and sentiment analysis.

How can I get started with eigenword embeddings and spectral learning?

If you are a beginner, it is recommended to first familiarize yourself with the basics of natural language processing and machine learning. Then, you can start by reading research papers and tutorials on eigenword embeddings and spectral learning to gain a deeper understanding of the concepts and techniques involved. You can also experiment with existing implementations or try building your own models using open-source libraries and datasets.
