
Eigenword embeddings and spectral learning; I'm a beginner...

  1. Jun 12, 2015 #1
    Hi everyone,

    I am a mathematics undergraduate and I'm currently doing an internship at the informatics department of a university. I am well and truly out of my depth. My supervisor has assigned me tasks that involve Java (a language I'm having to pick up quickly, having only used Python and R).

    The first task is to write a program that takes a set of words as input and outputs a random vector embedding for each word, so that they can be fed into wordvectors.org.
    These are the steps I think I have to take, but I don't know how to do them. I'm working in NetBeans for Java, but I'm wondering if I should use Eclipse.


    1. Opening a text file
    2. Reading in a text file
    3. Storing the words (i.e. strings) of the text file in an array.
    4. Open another text file
    5. Loop over the array of stored words.
    6. For each word, output the word to the text file, along with a few random numbers.
    7. Close the text files.
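The seven steps above can be sketched in Java roughly as follows. This is a minimal sketch, not SWELL code, and it rests on assumptions: the input file holds whitespace-separated words, and each output line is the word followed by space-separated random numbers, which is assumed (check the site's own upload instructions) to be the plain-text format wordvectors.org accepts. The class name `RandomEmbeddings`, the default dimension of 50 and the fixed seed are arbitrary choices for illustration.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Random;

public class RandomEmbeddings {

    // Steps 1-3: open the input file, read it line by line, store each word.
    static List<String> readWords(String path) throws IOException {
        List<String> words = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Split on runs of whitespace; skip the empty token a blank line yields.
                for (String token : line.trim().split("\\s+")) {
                    if (!token.isEmpty()) {
                        words.add(token);
                    }
                }
            }
        } // try-with-resources closes the reader, even on an exception
        return words;
    }

    // Build one output line: the word followed by `dim` random numbers in [-1, 1).
    static String embeddingLine(String word, int dim, Random random) {
        StringBuilder sb = new StringBuilder(word);
        for (int i = 0; i < dim; i++) {
            double value = 2.0 * random.nextDouble() - 1.0;
            // Locale.ROOT guarantees a '.' decimal separator on every machine.
            sb.append(' ').append(String.format(Locale.ROOT, "%.6f", value));
        }
        return sb.toString();
    }

    // Steps 4-7: open the output file, loop over the words, write each line, close.
    static void writeEmbeddings(List<String> words, String path, int dim, long seed)
            throws IOException {
        Random random = new Random(seed); // fixed seed: reproducible vectors
        try (BufferedWriter writer = new BufferedWriter(new FileWriter(path))) {
            for (String word : words) {
                writer.write(embeddingLine(word, dim, random));
                writer.newLine();
            }
        } // the writer is flushed and closed here (step 7)
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: java RandomEmbeddings <words.txt> <vectors.txt> [dim]");
            return;
        }
        int dim = args.length > 2 ? Integer.parseInt(args[2]) : 50;
        List<String> words = readWords(args[0]);
        writeEmbeddings(words, args[1], dim, 42L);
        System.out.println("Wrote " + words.size() + " vectors of dimension " + dim);
    }
}
```

It can be run with `javac RandomEmbeddings.java` followed by `java RandomEmbeddings words.txt vectors.txt 50`. Using try-with-resources means both files get closed even if reading or writing fails; replacing the fixed seed with `new Random()` gives different vectors on every run.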

    Additionally, in the upcoming weeks I am to create a first word embedding prototype which he has detailed as
    1) Download WordNet
    2) Create a matrix graph from it and
    3) do Singular value decomposition on it.
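To see the linear algebra behind steps 2 and 3 on something tiny, here is a hedged sketch that does not touch WordNet at all. It builds a symmetric adjacency matrix A for a hypothetical five-word graph and finds the top singular value sigma_1 and right singular vector v_1 by power iteration on A^T A. Power iteration is a stand-in chosen so the example needs no libraries; it only recovers the leading singular triplet, whereas a real prototype would presumably use a linear algebra library's truncated SVD and keep k > 1 dimensions. The words, edges, seed and iteration count are made up for illustration.

```java
import java.util.Locale;
import java.util.Random;

public class PowerIterationSvd {

    // y = A x, where A is rows x cols and x has length cols.
    static double[] multiply(double[][] a, double[] x) {
        double[] y = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < x.length; j++) {
                y[i] += a[i][j] * x[j];
            }
        }
        return y;
    }

    // x = A^T y, where y has length rows.
    static double[] multiplyTranspose(double[][] a, double[] y) {
        double[] x = new double[a[0].length];
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < x.length; j++) {
                x[j] += a[i][j] * y[i];
            }
        }
        return x;
    }

    static double norm(double[] v) {
        double sum = 0.0;
        for (double d : v) {
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    static void scale(double[] v, double factor) {
        for (int i = 0; i < v.length; i++) {
            v[i] *= factor;
        }
    }

    // Power iteration on A^T A. Converges to the top right singular vector v_1
    // when sigma_1 is strictly larger than sigma_2. Assumes A is not all zeros.
    static double[] topRightSingularVector(double[][] a, int iterations, long seed) {
        Random random = new Random(seed);
        double[] v = new double[a[0].length];
        for (int i = 0; i < v.length; i++) {
            v[i] = random.nextDouble() + 0.1; // strictly positive start vector
        }
        scale(v, 1.0 / norm(v));
        for (int it = 0; it < iterations; it++) {
            double[] w = multiplyTranspose(a, multiply(a, v)); // w = A^T A v
            scale(w, 1.0 / norm(w));
            v = w;
        }
        return v;
    }

    public static void main(String[] args) {
        // Hypothetical toy relation graph, NOT WordNet data.
        String[] words = {"cat", "dog", "pet", "car", "vehicle"};
        int[][] edges = {{0, 1}, {0, 2}, {1, 2}, {3, 4}};
        double[][] a = new double[words.length][words.length];
        for (int[] e : edges) {
            a[e[0]][e[1]] = 1.0; // undirected graph, so A is symmetric
            a[e[1]][e[0]] = 1.0;
        }
        double[] v = topRightSingularVector(a, 200, 42L);
        double[] coords = multiply(a, v); // A v_1 = sigma_1 u_1: a 1-D embedding
        System.out.printf(Locale.ROOT, "sigma_1 = %.4f%n", norm(coords));
        for (int i = 0; i < words.length; i++) {
            System.out.printf(Locale.ROOT, "%s %.4f%n", words[i], coords[i]);
        }
    }
}
```

One common choice is to use the rows of U_k Sigma_k as k-dimensional word vectors; with k = 1 each word's coordinate is (A v_1)_i = sigma_1 (u_1)_i. On this toy graph sigma_1 = 2, cat, dog and pet share a coordinate of about 1.1547 (2/sqrt(3)), and car and vehicle come out at 0, so the connected triangle of words groups together.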

    I am also supposed to understand the SWELL Java code.

    ---> SWELL https://github.com/paramveerdhillon/swell
    ---> Paper on which this project is based http://www.pdhillon.com/dhillon15a.pdf
    ---> http://wordvectors.org/ (I'm required to download data and experiment with this...?)


    I am so sorry if this is vague or lazy, but I've been researching academic papers and trying to absorb so much, and I don't want to screw this opportunity up.

    I'd be grateful for absolutely anything. I obtained this internship on the strength of my linear algebra knowledge, and concepts such as linear transformations, SVD and eigenvectors are obviously at play here. They have made my research more digestible, but I can't see how I could implement any of it myself. How am I of use to this research team? Gah!

    Thank you in advance
     
  3. Jun 12, 2015 #2

    Borg

    Science Advisor
    Gold Member

    For the IDE, either one is as good as the other. I would find out what most people in the department are using and go with that. The important thing is to pick one and stick with it. I use IntelliJ, so I'm not familiar with either of them, but there are plenty of people here who use NetBeans and Eclipse.

    There are lots of tutorials for learning Java I/O. Here is a site that is good for learning the basics of Java programming - mkyong.com.
    I would start with the BufferedReader and BufferedWriter tutorials.

    Meanwhile, download the SWELL code and see if you can get it to compile and run the examples.
     
  4. Jun 12, 2015 #3
    Thank you so much Borg!

    You're going to have to bear with me: I downloaded the SWELL code and there are multiple classes, .java files and .txt files. So I don't know what to do. Are the examples the .txt files? How do I compile and run the examples? I need to add things to the code, right?

    I don't know if you can be much clearer...

    All the PhD students in my lab are working on other projects and even they are at a loss :(

    Thanks for the links, though
     
  5. Jun 12, 2015 #4

    Borg

    Science Advisor
    Gold Member

    Take it a step at a time and you'll get through it. Before you get to the SWELL code, let's tackle your environment.
    1. Which IDE are you going to use?
    2. Have you been able to create and compile a basic "Hello World" project in your IDE?
    Once those are accomplished, you should try to get the I/O examples working. Only then should you try to work with the SWELL code. You don't want to have to deal with environment setup issues at the same time as you're trying to figure out the SWELL API.
     
  6. Jun 12, 2015 #5
    Eclipse. However, when I go to open a .java file from SWELL, it only gives me the NetBeans option, and I can't seem to deselect it and use Eclipse.

    I can compile a basic Hello World, yes. Right now, I've just begun to understand loops in Java. So I know about ints, doubles, constructors, instance and local variables, stuff like that: the beginner stuff. I've done a few tutorials on Udacity as well.
     
  7. Jun 12, 2015 #6

    Borg

    Science Advisor
    Gold Member

    For now, it isn't important to associate .java files with Eclipse. How are you compiling the code? Through the command line or from within an Eclipse project? Have you created a project for the SWELL code?

    Other notes:
    I've had an initial look at the SWELL repository, and the project uses Apache Ant to compile the code. You will need to download the latest version (http://mirrors.koehn.com/apache//ant/binaries/apache-ant-1.9.5-bin.zip [Broken]). The build.xml file that Ant uses has hardcoded paths to Eclipse that you'll probably need to change later. Which brings me to your environment: are you using Windows, Linux or something else? Please note the version.
     
    Last edited by a moderator: May 7, 2017
  8. Jun 12, 2015 #7
    From within an Eclipse project.

    No, I haven't created a project for the SWELL code.
     
  9. Jun 12, 2015 #8
    I'm on Linux x86-64
     
  10. Jun 12, 2015 #9

    Borg

    Science Advisor
    Gold Member

    OK. I don't usually work on UNIX-based systems, so I won't be much help with environment settings.

    As far as the SWELL code goes, I've looked at the project a little more closely and it uses makefiles and C++ files that I haven't worked with. I'll see if I can download it and get it running tomorrow. If we're lucky, the project will just create a jar file that you can use in your project; I haven't read the SWELL docs.

    I think that it would be better if you review the I/O examples and get them running first. I don't have time today to dig into the SWELL code but I can answer questions on the I/O examples pretty easily.
     
  11. Jun 12, 2015 #10
    Thanks for all your help!
     
  12. Jun 12, 2015 #11
    Wow, the work you're doing sounds really interesting. I'm reading about it all right now. My two cents: use Eclipse. Since you're new, I should warn you to be sure to use the most modern forms of anything you use. For instance, be sure to look at the modern collections classes
    http://docs.oracle.com/javase/7/docs/api/java/util/Collections.html
    Which support operations like set intersection and union
    https://docs.oracle.com/javase/tutorial/collections/interfaces/set.html
    Rather than getting too wrapped up in arrays. Also, be sure to look at classes like Scanner for reading input. I don't even know what the most modern Java has; I haven't worked in it for years. But time spent reading the API docs will save ten times as much time spent reinventing the wheel.
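As a small illustration of that advice, the sketch below uses `Scanner` to split text into words and `java.util.Set` operations instead of arrays: `addAll` for union and `retainAll` for intersection. The class name and the two toy vocabularies are hypothetical; one possible use in this project would be checking which of your words also appear in an evaluation word list.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Scanner;
import java.util.Set;

public class VocabularySets {

    // Scanner splits on whitespace by default; a LinkedHashSet drops
    // duplicate words while keeping them in first-seen order.
    static Set<String> readVocabulary(Scanner scanner) {
        Set<String> vocab = new LinkedHashSet<>();
        while (scanner.hasNext()) {
            vocab.add(scanner.next().toLowerCase(Locale.ROOT));
        }
        return vocab;
    }

    // Union: copy a, then add everything from b.
    static Set<String> union(Set<String> a, Set<String> b) {
        Set<String> result = new LinkedHashSet<>(a);
        result.addAll(b);
        return result;
    }

    // Intersection: copy a, then keep only elements also in b.
    static Set<String> intersection(Set<String> a, Set<String> b) {
        Set<String> result = new LinkedHashSet<>(a);
        result.retainAll(b);
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical inputs. new Scanner(new File("words.txt")) would read a
        // file instead (that constructor throws FileNotFoundException).
        Set<String> mine = readVocabulary(new Scanner("cat dog cat bird"));
        Set<String> benchmark = readVocabulary(new Scanner("dog bird fish"));
        System.out.println("union: " + union(mine, benchmark));
        System.out.println("intersection: " + intersection(mine, benchmark));
    }
}
```

With the inputs in `main` this prints `union: [cat, dog, bird, fish]` and `intersection: [dog, bird]`. `LinkedHashSet` keeps the first-seen order; a plain `HashSet` would not guarantee any particular order.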

    But most importantly, don't let anything stupid stop you. Come back here, or go to Stack Exchange, to get answers if you hit stupid brick walls. The work you're doing is way too interesting to get bogged down.
     
  13. Jun 12, 2015 #12
     
  14. Jun 15, 2015 #13
    Thanks Fooality, I managed to write code to take in words and spit out vectors using BufferedReader and FileReader. I got some help from some PhD students, too.

    The next step is the prototype which I'll no doubt be back for help with.

    Thanks so much for your help!
     
  15. Jun 15, 2015 #14
    Good, don't be a stranger. I started watching the MIT OpenCourseWare class on AI, and I was really interested in Vapnik's support vector machines. You've got that same idea of partitioning a multi-dimensional space, which I assume is behind the singular value decomposition of the graph in your project (http://www.cc.gatech.edu/~vempala/papers/dfkvv.pdf). It's really compelling stuff. I was a CS major, not a math major, and unfortunately some of the math is daunting to me, but I think there's some amazing work coming from this area. I don't know the details of Hinton's "thought vectors", for instance,
    http://www.extremetech.com/extreme/...s-could-revolutionize-artificial-intelligence
    but it sounds like they're in the same area of research. It's good to see the AI field moving beyond the broken, naive biological models of neural nets and genetic algorithms and breaking new ground. You're in an amazing field at an amazing time.
     