Eigenword embeddings and spectral learning; I'm a beginner....

Discussion Overview

The discussion revolves around the challenges faced by a mathematics undergraduate intern tasked with implementing word embeddings using Java. The scope includes programming tasks related to file I/O, understanding and using the SWELL code, and applying concepts from linear algebra such as singular value decomposition (SVD) and eigenvectors.

Discussion Character

  • Homework-related
  • Technical explanation
  • Exploratory

Main Points Raised

  • The intern expresses uncertainty about the programming tasks assigned, particularly in Java, and seeks guidance on how to proceed with file handling and generating random vector embeddings.
  • Some participants suggest that either NetBeans or Eclipse can be used as an IDE, emphasizing the importance of consistency in choice.
  • One participant recommends starting with basic Java I/O tutorials to build foundational skills before tackling the SWELL code.
  • There are questions about how to compile and run the SWELL examples, with some participants noting the complexity of the code structure and the need for clarity on using the SWELL API.
  • One participant mentions the importance of using modern Java features, such as collections and the Scanner class, rather than relying solely on arrays.
  • There is a suggestion to focus on getting the I/O examples working before delving into the SWELL code, indicating a step-by-step approach to learning.

Areas of Agreement / Disagreement

Participants generally agree on the need for a structured approach to learning Java and working with the SWELL code. However, there is no consensus on the best IDE to use, as preferences vary among participants.

Contextual Notes

Participants note the intern's lack of experience with Java and the challenges posed by the complexity of the SWELL code, which includes multiple classes and dependencies. There is also mention of the need to adapt to the specific environment (Linux x86-64) and the use of Apache Ant for compilation, which may require additional setup.

Who May Find This Useful

This discussion may be useful for beginners in programming, particularly those working with Java in academic or research settings, as well as individuals interested in word embeddings and linear algebra applications in computer science.

dominique_
Hi everyone,

I am a mathematics undergraduate and I'm currently doing an internship at the informatics department of a university. I am well and truly out of my depth. My supervisor has assigned me tasks which include Java (a language I'm having to quickly pick up, having only used python/R).

The first task is to write a program that takes as an input a set of words and outputs a set of random vector embeddings for it so that they can be fed into wordvectors.org.
These are the steps I think I have to do, but I don't know how. I'm working in NetBeans for Java, but I'm wondering if I should use Eclipse.
1. Open a text file.
2. Read in the text file.
3. Store the words (i.e. strings) of the text file in an array.
4. Open another text file.
5. Loop over the array of stored words.
6. For each word, output the word to the text file, along with a few random numbers.
7. Close the text files.
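
For reference, steps 1 to 7 can be sketched in Java roughly as follows. This is only a sketch under assumptions, not a definitive solution: the class name `RandomEmbeddings`, the file names `words.txt` and `vectors.txt`, the dimension of 5, and the output layout (one word per line followed by space-separated numbers) are all made up for illustration, so check the format wordvectors.org actually expects before uploading anything.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Random;

class RandomEmbeddings {

    // Builds one output line: the word followed by `dim` random numbers in [-1, 1).
    // The space-separated layout is an assumption, not a documented format.
    static String embeddingLine(String word, int dim, Random rng) {
        StringBuilder sb = new StringBuilder(word);
        for (int i = 0; i < dim; i++) {
            double value = 2.0 * rng.nextDouble() - 1.0;
            sb.append(' ').append(String.format(Locale.ROOT, "%.6f", value));
        }
        return sb.toString();
    }

    // Steps 1-3: open the input file, read it, and store the words in a list.
    static List<String> readWords(Path input) throws IOException {
        List<String> words = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                for (String token : line.trim().split("\\s+")) {
                    if (!token.isEmpty()) {
                        words.add(token);
                    }
                }
            }
        } // try-with-resources closes the reader (step 7)
        return words;
    }

    // Steps 4-7: open the output file, loop over the words, write word + numbers, close.
    static void writeEmbeddings(List<String> words, Path output, int dim, Random rng)
            throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            for (String word : words) {
                writer.write(embeddingLine(word, dim, rng));
                writer.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default file names; pass real paths as arguments.
        Path input = Paths.get(args.length > 0 ? args[0] : "words.txt");
        Path output = Paths.get(args.length > 1 ? args[1] : "vectors.txt");
        List<String> words = readWords(input);
        writeEmbeddings(words, output, 5, new Random(42)); // fixed seed for repeatable output
    }
}
```

A list is used instead of a raw array so the number of words does not need to be known in advance, and the try-with-resources blocks close both files even if an exception is thrown.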

Additionally, in the upcoming weeks I am to create a first word embedding prototype which he has detailed as
1) Download WordNet
2) Create a matrix graph from it and
3) do Singular value decomposition on it.
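
To make step 3 a little more concrete, here is a small sketch in plain Java. It does not compute a full SVD and it does not touch WordNet: it uses power iteration on A^T A, a simpler swapped-in technique, to approximate only the largest singular value and its right singular vector. The four-word graph, the class name `TopSingularVector` and every method name are hypothetical. A real prototype would more likely call a linear algebra library.

```java
import java.util.Random;

class TopSingularVector {

    // Returns A * x.
    static double[] multiply(double[][] a, double[] x) {
        double[] y = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < x.length; j++) {
                y[i] += a[i][j] * x[j];
            }
        }
        return y;
    }

    // Returns A^T * x.
    static double[] multiplyTransposed(double[][] a, double[] x) {
        double[] y = new double[a[0].length];
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < y.length; j++) {
                y[j] += a[i][j] * x[i];
            }
        }
        return y;
    }

    // Euclidean length of v.
    static double norm(double[] v) {
        double sum = 0.0;
        for (double x : v) {
            sum += x * x;
        }
        return Math.sqrt(sum);
    }

    // Power iteration on A^T A (use iterations >= 1). Overwrites v, whose length
    // must equal the number of columns of A, with an approximate top right
    // singular vector and returns the matching singular value. Returns 0 if
    // A^T A maps the current iterate to the zero vector.
    static double topSingularValue(double[][] a, double[] v, int iterations, long seed) {
        Random rng = new Random(seed);
        for (int j = 0; j < v.length; j++) {
            v[j] = rng.nextGaussian(); // random start vector
        }
        for (int it = 0; it < iterations; it++) {
            double[] w = multiplyTransposed(a, multiply(a, v)); // w = A^T A v
            double n = norm(w);
            if (n == 0.0) {
                return 0.0;
            }
            for (int j = 0; j < v.length; j++) {
                v[j] = w[j] / n; // renormalise to unit length
            }
        }
        return norm(multiply(a, v)); // sigma = ||A v|| for a unit vector v
    }

    public static void main(String[] args) {
        // Toy undirected graph, invented for illustration (not WordNet data).
        // Edges: cat-dog, cat-animal, dog-animal, animal-entity.
        String[] words = {"cat", "dog", "animal", "entity"};
        double[][] adjacency = {
            {0, 1, 1, 0},
            {1, 0, 1, 0},
            {1, 1, 0, 1},
            {0, 0, 1, 0},
        };
        double[] v = new double[words.length];
        double sigma = topSingularValue(adjacency, v, 500, 7L);
        System.out.println("largest singular value ~ " + sigma);
        for (int i = 0; i < words.length; i++) {
            System.out.println(words[i] + " " + v[i]);
        }
    }
}
```

Each word's entry in the vector is a one-dimensional embedding. Keeping the top k singular vectors instead of one would give k-dimensional embeddings, which is where a proper SVD routine comes in.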

I am also to understand SWELL code for Java.

---> SWELL https://github.com/paramveerdhillon/swell
---> Paper on which this project is based http://www.pdhillon.com/dhillon15a.pdf
---> http://wordvectors.org/ (I'm required to download data and experiment with this...?)

I am so sorry if this is vague or lazy but I've been researching academic papers and trying to absorb so much and I don't want to screw this opportunity up.

I'd be grateful for absolutely anything. I obtained this internship based on my linear algebra knowledge and concepts such as linear transformation, SVD, eigenvectors are obviously at play here and have made my research more digestible but I can't see how any of that can be implemented by me, how am I of use to this research team, gah!

Thank-you in advance
 
For the IDE, either one is as good as the other. I would find out what the majority of people in the department are using and go with that. The important thing is to pick one and stick with it. I use IntelliJ so I'm not familiar with either of them, but there are plenty of people here who use NetBeans and Eclipse.

There are lots of tutorials for learning Java I/O. Here is a site that is good for learning the basics of Java programming - http://www.mkyong.com/tutorials/java-io-tutorials/.
I would start with the BufferedReader (http://www.mkyong.com/java/how-to-read-file-from-java-bufferedreader-example/) and BufferedWriter (http://www.mkyong.com/java/how-to-write-to-file-in-java-bufferedwriter-example/) tutorials.

Meanwhile, download the SWELL code and see if you can get it to compile and run the examples.
 
Thank you so much Borg!

You're going to have to bear with me: I downloaded the SWELL code and there are multiple classes, .java files and .txt files, so I don't know what to do. Are the examples the .txt files? How do I compile and run the examples? I need to add things to the code, right?

I don't know if you can be much clearer...

All the PhD students in my lab are working on other projects and even they are at a loss :(

Thanks for the links, though
 
Take it a step at a time and you'll get through it. Before you get to the SWELL code, let's tackle your environment.
  1. Which IDE are you going to use?
  2. Have you been able to create and compile a basic "Hello World" project in your IDE?
Once those are accomplished, you should try to get the I/O examples working. Only then should you try to work with the SWELL code. You don't want to have to deal with environment setup issues at the same time as you're trying to figure out the SWELL API.
 
Eclipse. However, when I go to open a .java file from SWELL it only gives me the NetBeans option, and I can't seem to deselect it and use Eclipse.

I can compile a basic Hello World, yes. Right now, I've just begun to understand loops in Java. So I have knowledge of int, doubles, constructors, instance and local variables, stuff like that. The beginner stuff. I've done a few tutorials on Udacity, as well.
 
dominique_ said:
Eclipse. However, when I go to open a .java file from SWELL it only gives me the NetBeans option, and I can't seem to deselect it and use Eclipse.

I can compile a basic Hello World, yes. Right now, I've just begun to understand loops in Java. So I have knowledge of int, doubles, constructors, instance and local variables, stuff like that. The beginner stuff. I've done a few tutorials on Udacity, as well.
For now, it isn't important to associate .java files with Eclipse. How are you compiling the code? Through the command line or from within an Eclipse project (http://www.cis.upenn.edu/~matuszek/cit591-2004/Pages/starting-eclipse.html)? Have you created a project for the SWELL code?

Other notes:
I've taken an initial look at the SWELL repository, and the project uses Apache Ant to compile the code. You will need to download the latest version (http://mirrors.koehn.com/apache//ant/binaries/apache-ant-1.9.5-bin.zip). The build.xml file that Ant uses has hardcoded paths to Eclipse that you'll probably need to change later. Which brings me to your environment: are you using Windows, Linux or something else? Please note the version.
 
From within an Eclipse project.

No I haven't created a project for the swell code
 
I'm on Linux x86-64
 
dominique_ said:
From within an Eclipse project.

No I haven't created a project for the swell code
dominique_ said:
I'm on Linux x86-64
OK. I don't usually work on UNIX-based systems so I won't be much help with environmental settings.

As for the SWELL code, I've looked at the project a little more closely and it uses makefiles and C++ files that I haven't worked with. I'll see if I can download it and get it running tomorrow. If we're lucky, the project will just create a jar file that you can use in your project. I haven't read the SWELL docs.

I think that it would be better if you review the I/O examples and get them running first. I don't have time today to dig into the SWELL code but I can answer questions on the I/O examples pretty easily.
 
  • #10
Thanks for all your help!
 
  • #11
Wow, the work you're into sounds really interesting. I'm reading about it all right now. My two cents: use Eclipse. Since you're new, I should warn you to use the most modern forms of anything you use. For instance, be sure to look at the modern collections classes
http://docs.oracle.com/javase/7/docs/api/java/util/Collections.html
Which support operations like set intersection and union
https://docs.oracle.com/javase/tutorial/collections/interfaces/set.html
Rather than getting too wrapped up in arrays. Also be sure to look at classes like Scanner for reading stuff in. I don't even know what the most modern Java has; I haven't worked in it for years. But time spent reading the API docs will save ten times the amount of time spent programming and reinventing the wheel.
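
As a small illustration of the collections and Scanner suggestion, here is a minimal sketch. The class name `VocabularySets` and the sample word lists are invented for the example. It reads words with `Scanner` into a `TreeSet`, then uses `addAll` for union and `retainAll` for intersection.

```java
import java.util.Scanner;
import java.util.Set;
import java.util.TreeSet;

class VocabularySets {

    // Reads whitespace-separated words into a sorted set; duplicates collapse.
    static Set<String> readWords(String text) {
        Set<String> words = new TreeSet<>();
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNext()) {
                words.add(scanner.next());
            }
        }
        return words;
    }

    // Union: copy a, then add everything from b.
    static Set<String> union(Set<String> a, Set<String> b) {
        Set<String> result = new TreeSet<>(a);
        result.addAll(b);
        return result;
    }

    // Intersection: copy a, then keep only what is also in b.
    static Set<String> intersection(Set<String> a, Set<String> b) {
        Set<String> result = new TreeSet<>(a);
        result.retainAll(b);
        return result;
    }

    public static void main(String[] args) {
        Set<String> first = readWords("cat dog bird cat");
        Set<String> second = readWords("dog fish cat");
        System.out.println(union(first, second));        // prints [bird, cat, dog, fish]
        System.out.println(intersection(first, second)); // prints [cat, dog]
    }
}
```

Copying into a new set before calling `addAll` or `retainAll` matters because both methods modify the set they are called on.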

But most importantly, don't let anything stupid stop you. Come back here, or to stack exchange to get answers if you hit stupid brick walls. The work you're into is way too interesting to get bogged down.
 
  • #12
Fooality said:
Wow, the work you're into sounds really interesting. I'm reading about it all right now. My two cents: use Eclipse. Since you're new, I should warn you to use the most modern forms of anything you use, because some tutorials give it to you from the ground up (aka Java.20). For instance, be sure to look at the modern collections classes
http://docs.oracle.com/javase/7/docs/api/java/util/Collections.html
Which support operations like set intersection and union
https://docs.oracle.com/javase/tutorial/collections/interfaces/set.html
Rather than getting too wrapped up in arrays. Also be sure to look at classes like Scanner for reading stuff in. I don't even know what the most modern Java has; I haven't worked in it for years. But time spent reading the API docs will save ten times the amount of time spent programming and reinventing the wheel.

But most importantly, don't let anything stupid stop you. Come back here, or to stack exchange to get answers if you hit stupid brick walls. The work you're into is way too interesting to get bogged down.
 
  • #13
Thanks Fooality, I managed to write code to take in words and spit out vectors using BufferedReader and FileReader. I got some help from some PhD students, too.

The next step is the prototype which I'll no doubt be back for help with.

Thanks so much for your help!
 
  • #14
Good, don't be a stranger. I started watching the MIT OpenCourseWare class on AI, and I was really interested in Vapnik's support vector machines. You've got that same idea of partitioning a multi-dimensional space, which I assume is behind the singular value decomposition of the graph in your project. (http://www.cc.gatech.edu/~vempala/papers/dfkvv.pdf) It's really compelling stuff. I was a CS major, not a math major, and unfortunately some of the math is daunting to me, but I think there's some amazing work coming from this area. I don't know the details of Hinton's "thought vectors", for instance
http://www.extremetech.com/extreme/...s-could-revolutionize-artificial-intelligence
But it sounds like it's in the same area of research. It's good to see the AI field moving beyond the broken naive biological models of neural nets and genetic algorithms and breaking new ground. You're in an amazing field at an amazing time.
 
