
Mixing blogs using javascript or other means

  1. May 1, 2008 #1
    Is it possible with javascript to take one article from blog A, one article from blog B, one article from blog C, and show them all together in one big table, with one article per table cell?

    Even better, is it possible to mix blogs as above but without duplicating content on an intermediate server, and instead transferring content directly from the blog server to the client?
     
    Last edited: May 1, 2008
  4. May 1, 2008 #3
    This is exactly what "RSS" was invented for.

    If you go to basically any blog, you will see a little icon in the top right corner of your web browser; it will either say "RSS" or be a little orange box that looks like a cell phone battery indicator. Clicking it should put you into your web browser's RSS reader. If you add the blogs you want to read to the RSS reader, then when you open the reader later you will see all the blogs you have added mixed together, exactly as you asked.

    Alternatively, use "Google Reader", which is a website-based RSS feed reader. There are many RSS programs and websites of this type.
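As a rough sketch of what such a reader is doing under the hood (nothing here comes from any real reader; the `parseRssItems` function and the feed fragment below are invented for illustration), pulling item titles and links out of RSS XML looks something like this in JavaScript:

```javascript
// Extract each <item>'s title and link from a feed's XML.
// This is a simplistic regex-based sketch, not a full RSS parser.
function parseRssItems(xml) {
    var items = [];
    var itemRe = /<item>([\s\S]*?)<\/item>/g;
    var m;
    while ((m = itemRe.exec(xml)) !== null) {
        var body = m[1];
        var title = (body.match(/<title>([\s\S]*?)<\/title>/) || [])[1] || "";
        var link = (body.match(/<link>([\s\S]*?)<\/link>/) || [])[1] || "";
        items.push({ title: title, link: link });
    }
    return items;
}

// A made-up feed fragment:
var feed =
    "<rss><channel>" +
    "<item><title>Post A</title><link>http://blog-a.example/1</link></item>" +
    "<item><title>Post B</title><link>http://blog-a.example/2</link></item>" +
    "</channel></rss>";

var items = parseRssItems(feed);
// items[0].title is "Post A", items[1].link is "http://blog-a.example/2"
```

A reader repeats this for every subscribed feed and merges the results into one list.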
     
  5. May 1, 2008 #4
    Thanks. I tried both Firefox's "Live Bookmarks" and "Google Reader", and neither provides the functionality I was asking for. I want to mix articles, not just list titles or view all the articles of one blog at a time. Does any similar application support mixing blogs?

    In other words, small articles would look like posts in a thread on Physics Forums: one comes from me, one from you, one from someone else, and so on. This would facilitate a conversation between bloggers that is not censored by anyone except the reader.
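The interleaving described here takes only a few lines of JavaScript once the articles have somehow been fetched (fetching is the hard part the rest of this thread deals with). A minimal sketch, with invented blog names and article data:

```javascript
// Merge articles from several blogs by date, newest first, and render
// them one per table cell, as described above. All data is made up.
function mixArticles(blogs) {
    var all = [];
    for (var i = 0; i < blogs.length; i++) {
        for (var j = 0; j < blogs[i].articles.length; j++) {
            var a = blogs[i].articles[j];
            all.push({ blog: blogs[i].name, title: a.title, date: a.date });
        }
    }
    all.sort(function (x, y) { return y.date - x.date; }); // newest first
    var html = "<table>";
    for (var k = 0; k < all.length; k++) {
        html += "<tr><td>" + all[k].blog + ": " + all[k].title + "</td></tr>";
    }
    return html + "</table>";
}

var table = mixArticles([
    { name: "Blog A", articles: [{ title: "Alpha", date: 3 }] },
    { name: "Blog B", articles: [{ title: "Beta",  date: 1 }] },
    { name: "Blog C", articles: [{ title: "Gamma", date: 2 }] }
]);
// table: "<table><tr><td>Blog A: Alpha</td></tr><tr><td>Blog C: Gamma</td></tr><tr><td>Blog B: Beta</td></tr></table>"
```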
     
  6. May 6, 2008 #5

    -Job-

    Science Advisor

    You can use Javascript to retrieve content from blogs through Ajax. You can either retrieve the content from any existing XML feeds, or just retrieve the article's HTML and parse it (not a great idea).

    Since javascript running in your page would need to query pages residing on different domains, the browser might block this or bring up a security warning, depending on the user's security settings (cross-domain requests are considered a big security risk).

    A better approach is to retrieve the content from the blogs on the server side.
     
  7. May 6, 2008 #6
    I actually want to edit the content before presenting it. What's the command for retrieving the article's HTML?

    I see many sites include scripts from several other sites (this is obvious because of NoScript, a security extension for Firefox that I am using). For example, youtube.com has scripts from doubleclick.net and ytimg.com, and the ytimg.com scripts are essential for YouTube's operation. Is that cross-site scripting?

    Firefox doesn't block scripts from ytimg.com by default. Can I do the same in my site?
     
  8. May 6, 2008 #7

    -Job-

    Science Advisor

    There's something called the "Same Origin Policy", and I don't think it applies to script tags, so it's acceptable to load a script from a separate domain via a linked script source (browsers with really tight security settings might still complain). If the remote site is in your "Trusted Sites" list, then it should be OK; otherwise you get an error. For instance, in IE you get a javascript error saying "Permission Denied".

    You can load a page's HTML with Ajax as in the following example:
    Code (Text):

    <html>
    <body>
    <script type="text/javascript">
    function AjaxRequest(page)
    {
        var xmlHttp;
        try{  // Firefox, Opera 8.0+, Safari
            xmlHttp = new XMLHttpRequest();  
        }catch (e){  // Internet Explorer
            try{
                xmlHttp = new ActiveXObject("Msxml2.XMLHTTP");
            }catch (e){
                try{
                    xmlHttp = new ActiveXObject("Microsoft.XMLHTTP");
                }catch (e){
                    alert("Your browser does not support AJAX!");
                    return false;
                }
            }
        }
        xmlHttp.onreadystatechange = function()
        {
            if(xmlHttp.readyState==4)
            {
                alert(xmlHttp.responseText);
            }
        }
        xmlHttp.open("GET",page,true);
        xmlHttp.send(null);
    }
    AjaxRequest("http://www.google.com");
    </script>
    </body>
    </html>
     
    So you would be able to load either the HTML from a blog article, or load in XML from some XML feed (if one is available). You can avoid cross-domain issues by having your server act as a proxy - meaning, you'd use Ajax to tell the server to go get some page and return it.
     
    Last edited: May 6, 2008
  9. May 7, 2008 #8
    I've tried that code on Firefox and IE 6, and they both fail at this point:

    xmlHttp.open("GET",page,true);

    I know this because I put debug alert() calls immediately before and immediately after this line. Even after adding "http://www.google.com" to IE's list of trusted sites, the same thing happens. Any thoughts?
     
  10. May 7, 2008 #9

    -Job-

    Science Advisor

    Works for me in IE6. I get a permission error in Firefox, which is expected.
     
  11. May 7, 2008 #10
    Got it to work in IE, but I had to modify the security settings. Do you understand what youtube is doing in cooperation with ytimg.com? Here it is:

    http://youtube.com/watch?v=mi-koOafKOk

    What if ytimg.com was blogspot.com?
     
  12. May 7, 2008 #11

    -Job-

    Science Advisor

    They're loading scripts and images from ytimg.com, through script and img tags, both of which are allowed.

    They also load scripts from Google Analytics for example, as do most sites, for visitor stats.

    In your case, loading scripts and CSS through these tags won't suit your needs unless the remote blog sites expose scripts that contain article contents in the form of javascript (for example using JSON). This is a good idea, actually, and might happen eventually since JSON is becoming very popular.

    An option is to use IFRAME tags, each pointing to an article, however the Same Origin policy applies to IFRAMEs as well, so you won't be able to modify the contents of the IFRAME (to match your page style for example).
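The scripts-containing-JSON idea mentioned above can be sketched as follows. To be clear, this is a hypothetical pattern (it later became widely known as "JSONP"), and the callback name, URL, and article payload below are all invented:

```javascript
// The page defines a callback function, then loads a cross-domain script
// whose entire body is a call to that callback with the article data:
//
//   <script src="http://someblog.example/articles.js"></script>
//
// where articles.js would contain something like:
//   showArticles([{ "title": "Hello", "body": "First post" }]);

var received = null;
function showArticles(articles) {
    received = articles; // in a real page: render them into the table
}

// Simulate the remote script's payload arriving and executing
// (the script tag would do this for us in a browser):
var remotePayload = 'showArticles([{ "title": "Hello", "body": "First post" }]);';
eval(remotePayload);

// received now holds the cross-domain article data
```

Because script tags are exempt from the Same Origin Policy, this works without a proxy, but only if the blog cooperates by publishing such a script.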
     
  13. May 7, 2008 #12
    What a pain, that Same Origin policy.

    I think I'll end up using an existing java-based spider like the ones below and modifying it:

    http://java-source.net/open-source/crawlers

    If only they could run as web-based applets. Convincing people to explicitly download and run a stand-alone java application might be hard.
     
  15. May 8, 2008 #14

    -Job-

    Science Advisor

    You shouldn't let the Same Origin Policy stop you because you can easily get around that.

    All you need is a server-side script, in PHP for example, that receives a URL, makes a request to that page, and prints out the response (that will be maybe 10 lines of PHP code).

    Then instead of making an Ajax request to www.someoneelsesblog.com, you'd make a request to www.mydomain.com/proxy.php?url=www.someoneelsesblog.com and take it from there.
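On the client side, the rewritten request might look like this (a sketch only; "proxy.php" and "mydomain.com" are the placeholders from the post above, and the blog URL is invented):

```javascript
// Instead of requesting the blog directly, point the Ajax call at your
// own proxy script and pass the real URL as a query parameter.
function proxiedUrl(target) {
    return "http://www.mydomain.com/proxy.php?url=" + encodeURIComponent(target);
}

var url = proxiedUrl("http://www.someoneelsesblog.com/article?id=7");
// url: "http://www.mydomain.com/proxy.php?url=http%3A%2F%2Fwww.someoneelsesblog.com%2Farticle%3Fid%3D7"
```

The encodeURIComponent call matters: without it, the "?" and "=" in the target URL would be misread as part of the proxy's own query string.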
     
  16. May 8, 2008 #15
    Alright. On the other hand, if this gets a little popular after a while, the bandwidth requirements would become too high for me this way. But it seems someone has thought of a PHP script that bypasses the server and gets the content directly from the sites being mixed: what does this script here do?

    "Web Based Web Crawler": http://snippets.dzone.com/posts/show/3853

    I have copied the code below. Is it server-side or client-side? What code goes at "// insert code here"?

    PHP:

    // insert code here
    <html>
    <head><title>Web Crawler</title></head>
    <body>
    <form id="form1" method="post" action="">
         <label>URL:
     <input name="url" type="text" id="url" value="<?php if (isset($_POST['url'])) echo htmlspecialchars($_POST['url']); ?>" size="65" maxlength="255" />
         </label>
         <br />
         <br />
         <label>
         <input type="submit" name="Submit" value="Submit" />
         </label>
         <label>
         <input name="Reset" type="reset" id="Reset" value="Reset" />
         </label>
         <br />
    </form>
    </body>
    </html>
    <?php
    if (isset($_POST['url'])) {
        $url = $_POST['url'];
        $f = @fopen($url, "r");
        if ($f) {
            // read the page one line at a time
            while ($buf = fgets($f, 4096)) {
                // capture the target of every <a href="..."> link on this line
                preg_match_all("/<\s*a\s+[^>]*href\s*=\s*[\"']?([^\"' >]+)[\"' >]/isU", $buf, $words);
                foreach ($words[1] as $link) {
                    print strtolower($link) . "<br>";
                }
            }
            fclose($f);
        }
    }
    ?>
     
    I have put the script on a php-supporting site, below. It seems to have a syntax error. What am I doing wrong?

    http://members.lycos.co.uk/blogmixer/test.html
     
    Last edited: May 8, 2008
  17. May 8, 2008 #16
    On second thought, I think WordPress blogs include PHP support, so maybe one blog could be created that plays the role of proxy to the other WordPress blogs. Plus all blogspot.com blogs! The bandwidth would then be unlimited.

    Do blogs run out of bandwidth?
     
  18. May 8, 2008 #17
    Well, it looks like the PHP parser isn't parsing your page. It might be because your extension is .html instead of .php. In any case, you should read a basic lesson on PHP from php.net to get used to the syntax. You really only need one function call to file_get_contents() and some parsing, or something along those lines.

    There's no reason why blogs won't run out of bandwidth like any other site. It depends on the hosting plan, among other things.

    EDIT: I just read Job's post. Yeah, you could probably alter the script a little bit (for example getting rid of the forms and changing $_POST to $_GET to make things easier), and it should work.
     
    Last edited: May 8, 2008
  19. May 8, 2008 #18
    So it gets the data from the site to the server, and from the server to the client, like a normal proxy? No server bypass? (I wanted to bypass the server and connect directly to the sites and then mix the contents, the server would only give the instructions how to do that).
     
  20. May 8, 2008 #19
    Yes, that's the idea. Your server is the proxy. I don't know what you mean by server bypass.
     
  21. May 8, 2008 #20

    -Job-

    User Avatar
    Science Advisor

    The code in between <? ?> tags is PHP. You need to place it on a PHP capable server and name it with the .php extension.

    The code receives a posted variable containing a URL, performs a request to that URL, receives the response, uses regular expressions to extract the hyperlinks, and repeats the process on those links. Basically it's a spider that crawls through web pages.
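The link-extraction step described above, transplanted from PHP into JavaScript (the thread's snippet does this server-side; the HTML sample and function name here are made up):

```javascript
// Pull the target of every <a href="..."> out of an HTML string,
// using essentially the same regular expression as the PHP snippet.
function extractLinks(html) {
    var re = /<\s*a\s+[^>]*href\s*=\s*["']?([^"' >]+)["' >]/gi;
    var links = [];
    var m;
    while ((m = re.exec(html)) !== null) {
        links.push(m[1].toLowerCase());
    }
    return links;
}

var links = extractLinks(
    '<p><a href="http://Blog-A.example/post1">one</a> and ' +
    "<a href='http://blog-b.example/post2'>two</a></p>"
);
// links: ["http://blog-a.example/post1", "http://blog-b.example/post2"]
```

A crawler would then fetch each extracted link and run the same extraction on the result.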
     