Mixing blogs using javascript or other means

Ulysees · May 1, 2008

Is it possible, with JavaScript, to take one article from blog A, one from blog B and one from blog C, and show them all together in one big table, with one article per table cell?

Even better, is it possible to mix blogs as above but without duplicating content on an intermediate server, and instead transferring content directly from the blog server to the client?

Ulysees · May 1, 2008

PS: All the blogs I want to mix are hosted on www.blogspot.com.

Coin · May 1, 2008

This is exactly what "RSS" was invented for.

If you visit almost any blog, you will see a little icon in the top right corner of your web browser: it either says "RSS" or is a little orange box that looks like a cell phone battery indicator. Clicking it should open the feed in your web browser's RSS reader. If you add the blogs you want to read to the RSS reader, then when you open it later you will see all the blogs you added mixed together, exactly as you asked.

Alternatively, use "Google Reader", which is a web-based RSS feed reader. There are many RSS programs and websites of this type.

Ulysees · May 1, 2008

Thanks. I tried both "Live Bookmarks" in Firefox and "Google Reader", and neither of them provides the functionality I was asking for. I want to mix articles, not just list titles or view all the articles of one blog at a time. Does any similar application support mixing blogs?

In other words, small articles would look like posts in a Physics Forums thread: one comes from me, one comes from you, one comes from someone else, etc. This would facilitate a conversation between bloggers that is not censored by anyone except the reader.
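The thread-style mix described here is essentially a round-robin merge. A minimal sketch in JavaScript, assuming each blog's posts have already been fetched into an array, newest first (`interleave` is a hypothetical name and the post names are placeholders):

```javascript
// Mix posts from several blogs round-robin, like posts in a forum thread:
// first post of blog A, first of B, first of C, then the second of each,
// and so on. Blogs that run out of posts are simply skipped.
function interleave(blogs) {
  var longest = 0;
  for (var i = 0; i < blogs.length; i++) {
    longest = Math.max(longest, blogs[i].length);
  }
  var mixed = [];
  for (var k = 0; k < longest; k++) {         // the k-th post of every blog...
    for (var b = 0; b < blogs.length; b++) {  // ...before any (k+1)-th post
      if (k < blogs[b].length) {
        mixed.push(blogs[b][k]);
      }
    }
  }
  return mixed;
}

var thread = interleave([
  ["A1", "A2", "A3"],   // placeholder posts from blog A
  ["B1"],               // blog B has only one post
  ["C1", "C2"]
]);
// thread is ["A1", "B1", "C1", "A2", "C2", "A3"]
```

Sorting by post date instead (as RSS readers usually do) would be a different design; round-robin guarantees every blog gets a turn.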

-Job- · May 6, 2008

You can use Javascript to retrieve content from blogs through Ajax. You can either retrieve the content from any existing XML feeds, or just retrieve the article's HTML and parse it (not a great idea).

Since JavaScript running in your page would need to query pages residing on different domains, the browser might block this or bring up a security warning, depending on the user's security settings (cross-site scripting is a big risk).

A better approach is to retrieve the content from the blogs on the server side.

Ulysees · May 6, 2008

-Job- said:

You can either retrieve the content from any existing XML feeds, or just retrieve the article's HTML and parse it (not a great idea).

I actually want to edit the content before presenting it. What's the command for retrieving the article's HTML?

Since javascript running in your page would need to query pages residing in different domains, depending on the user's security settings the browser might block this, or bring up a security warning (cross-site scripting is a big risk).

I see many sites include scripts from several other sites (it's obvious because of NoScript, a Firefox security add-on I use). For example, youtube.com has scripts from doubleclick.net and ytimg.com, and the ytimg.com scripts are essential for YouTube's operation. Is that cross-site scripting?

Firefox doesn't block scripts from ytimg.com by default. Can I do the same on my site?

-Job- · May 6, 2008

There's something called the "Same Origin Policy", and I don't think it applies to script tags, so it's acceptable to load a script from a separate domain via a linked script source (browsers with really tight security settings might still complain). If the remote site is in your "Trusted Sites" list, then it should be OK; otherwise you get an error. For instance, in IE you get a JavaScript error saying "Permission Denied".
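To make the rule concrete: two pages share an origin only when scheme, host and port all match, and only then may Ajax read one from the other. A small sketch using the modern `URL` API (available in current browsers and Node.js, but not in 2008-era browsers such as IE6); `sameOrigin` is a hypothetical helper and the example.com addresses are placeholders:

```javascript
// Same Origin check: scheme, host name and port must all be identical.
// The URL parser reports a default port (80 for http, 443 for https) as "".
function sameOrigin(a, b) {
  var ua = new URL(a);
  var ub = new URL(b);
  return ua.protocol === ub.protocol &&
         ua.hostname === ub.hostname &&
         ua.port === ub.port;
}

sameOrigin("http://www.example.com/a.html", "http://www.example.com/b/c.html"); // true
sameOrigin("http://www.example.com/", "http://blog.example.com/");              // false: different host
sameOrigin("http://www.example.com/", "https://www.example.com/");              // false: different scheme
sameOrigin("http://www.example.com/", "http://www.example.com:8080/");          // false: different port
```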

You can load a page's HTML with Ajax as in the following example:

Code:

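<html>
<body>
<script type="text/javascript">
// Fetch a page with XMLHttpRequest and show its HTML in an alert box.
// The request is subject to the Same Origin Policy, so cross-domain URLs
// will normally fail unless the browser's security settings allow them.
function AjaxRequest(page)
{
    var xmlHttp;
    try{  // Firefox, Opera 8.0+, Safari, IE7+
        xmlHttp = new XMLHttpRequest();
    }catch (e){  // Internet Explorer 6 and older
        try{
            xmlHttp = new ActiveXObject("Msxml2.XMLHTTP");
        }catch (e){
            try{
                xmlHttp = new ActiveXObject("Microsoft.XMLHTTP");
            }catch (e){
                alert("Your browser does not support AJAX!");
                return false;
            }
        }
    }
    xmlHttp.onreadystatechange = function()
    {
        // readyState 4 means the whole response has been received
        if(xmlHttp.readyState==4)
        {
            alert(xmlHttp.responseText);
        }
    }
    xmlHttp.open("GET",page,true);  // true = asynchronous request
    xmlHttp.send(null);
}
AjaxRequest("http://www.google.com");
</script>
</body>
</html>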

So you would be able to load either the HTML from a blog article, or load in XML from some XML feed (if one is available). You can avoid cross-domain issues by having your server act as a proxy - meaning, you'd use Ajax to tell the server to go get some page and return it.

Ulysees · May 7, 2008

I've tried that code in Firefox and IE6, and they both fail at this line:

xmlHttp.open("GET",page,true);

I know this because I put some debug output using alert() immediately before and immediately after this line. Even after adding "http://www.google.com" to IE's list of trusted sites, the same thing happens. Any thoughts?

-Job- · May 7, 2008

Works for me in IE6; I get a permission error in FF, which is expected.

Ulysees · May 7, 2008

Got it to work in IE, but I had to modify the security settings. Do you understand what YouTube is doing in cooperation with ytimg.com? Here it is:

http://youtube.com/watch?v=mi-koOafKOk

What if ytimg.com was blogspot.com?

-Job- · May 7, 2008

They're loading scripts and images from ytimg.com, through script and img tags, both of which are allowed.

They also load scripts from Google Analytics for example, as do most sites, for visitor stats.

In your case, loading scripts and CSS through these tags won't suit your needs unless the remote blog sites expose scripts that contain article contents in the form of JavaScript (for example, using JSON). This is actually a good idea, and it might happen eventually, since JSON is becoming very popular.

An option is to use IFRAME tags, each pointing to an article; however, the Same Origin Policy applies to IFRAMEs as well, so you won't be able to modify the contents of the IFRAME (to match your page style, for example).

Ulysees · May 7, 2008

What a pain, that Same Origin policy.

I think I'll end up using an existing Java-based spider like the ones below and modifying it:

http://java-source.net/open-source/crawlers

If only they could run as web-based applets. Convincing people to explicitly download and run a stand-alone Java application might be hard.

Ulysees · May 7, 2008

Does this work for you?

http://snippets.dzone.com/posts/show/3853

-Job- · May 8, 2008

You shouldn't let the Same Origin Policy stop you because you can easily get around that.

All you need is a server-side script, in PHP for example, that receives a URL, makes a request to that page and prints out the response (that's about 10 lines of PHP code).

Then instead of making an Ajax request to www.someoneelsesblog.com, you'd make a request to www.mydomain.com/proxy.php?url=www.someoneelsesblog.com and take it from there.
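On the client side, the target address should be URL-encoded before it is appended to the proxy address; otherwise any `?`, `&` or `=` in it would be read as parameters of the proxy itself. A minimal sketch, assuming a proxy script that reads its target from a `url` parameter as in the example above (`proxyUrl` and the blog's query string are hypothetical):

```javascript
// Build the request address for a same-domain proxy script.
// encodeURIComponent escapes ":", "/", "?", "&" and "=" so the whole
// target URL, including its own query string, travels as one parameter.
function proxyUrl(proxy, target) {
  return proxy + "?url=" + encodeURIComponent(target);
}

var target = "http://www.someoneelsesblog.com/?p=12&view=full";
var u = proxyUrl("http://www.mydomain.com/proxy.php", target);
// u is "http://www.mydomain.com/proxy.php?url=http%3A%2F%2Fwww.someoneelsesblog.com%2F%3Fp%3D12%26view%3Dfull"
```

The Ajax request would then go to `u` instead of the remote address. Since a proxy that fetches any URL it is given can be abused by others, it is worth having the server accept only the blog domains you actually mix (for example, hosts ending in blogspot.com).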

Ulysees · May 8, 2008

Alright. On the other hand, if it gets a little popular after a while, the bandwidth requirements would get too high for me this way. But it seems someone has written a PHP script that bypasses the server and gets the content directly from the sites being mixed. What does this script do?

"Web Based Web Crawler": http://snippets.dzone.com/posts/show/3853

I have copied the code below. Is it server-side or client-side? What code goes at "// insert code here"?

PHP:

<?php
// insert code here
// Remember the submitted URL so the form can show it again.
$url = isset($_POST['url']) ? $_POST['url'] : '';
?>
<html>
<head><title>Web Crawler</title></head>
<body>
<form id="form1" method="post" action="">
     <label>URL:
     <input name="url" type="text" id="url" value="<?php echo htmlspecialchars($url); ?>" size="65" maxlength="255" />
     </label>
     <br />
     <br />
     <label>
     <input type="submit" name="Submit" value="Submit" />
     </label>
     <label>
     <input name="Reset" type="reset" id="Reset" value="Reset" />
     </label>
     <br />
</form>
<?php
// Fetch the submitted page and print every link found in it.
if ($url !== '') {
    $html = @file_get_contents($url);   // read the whole page at once
    if ($html === false) {
        print "Could not open " . htmlspecialchars($url);
    } else {
        // Group 1 of the pattern captures the href value of each <a> tag.
        preg_match_all("/<\s*a\s+[^>]*href\s*=\s*[\"']?([^\"' >]+)[\"' >]/isU", $html, $words);
        foreach ($words[1] as $link) {
            print htmlspecialchars(strtolower($link)) . "<br>";
        }
    }
}
?>
</body>
</html>

I have put the script on a PHP-supporting site (below), but it seems to have a syntax error. What am I doing wrong?

http://members.lycos.co.uk/blogmixer/test.html

Ulysees · May 8, 2008

-Job- said:

you'd make a request to www.mydomain.com/proxy.php?url=www.someoneelsesblog.com and take it from there.

On second thought, I think WordPress blogs include PHP support, so maybe one blog could be created to act as the proxy for the other WordPress blogs, plus all blogspot.com blogs! The bandwidth would then be unlimited.

Do blogs run out of bandwidth?

Tedjn · May 8, 2008

Well, it looks like the PHP parser isn't parsing your page. It might be because your extension is .html instead of .php. In any case, you should read a basic lesson on PHP from php.net to get used to the syntax. You really only need one function call to file_get_contents() and some parsing, or something along those lines.

Blogs can run out of bandwidth just like any other site; it depends on the hosting plan, among other things.

EDIT: I just read Job's post. Yeah, you could probably alter the script a little bit (for example getting rid of the forms and changing $_POST to $_GET to make things easier), and it should work.

Ulysees · May 8, 2008

So it gets the data from the site to the server, and from the server to the client, like a normal proxy? No server bypass? (I wanted to bypass the server and connect directly to the sites and then mix the contents; the server would only give the instructions on how to do that.)

Tedjn · May 8, 2008

Yes, that's the idea. Your server is the proxy. I don't know what you mean by server bypass.

-Job- · May 8, 2008

The code in between <? ?> tags is PHP. You need to place it on a PHP capable server and name it with the .php extension.

Basically, the code receives a posted variable containing a URL, performs a request to that URL, receives the response, and uses regular expressions to extract the hyperlinks and print them. Repeating the process on those links is what turns it into a spider that crawls through web pages.
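For comparison, the link-extraction step can be sketched in JavaScript, running on HTML that has already been fetched (for example through a proxy). As said earlier in the thread, parsing HTML this way is fragile, so treat it as a sketch only; `extractLinks` is a hypothetical name:

```javascript
// Extract the href value of every <a> tag, mirroring the PHP regex.
// PHP's /U (ungreedy) modifier has no JavaScript equivalent, so the
// quantifier before "href" is made lazy explicitly with "?"; "i" ignores
// case and "g" lets exec() walk through every match in turn.
function extractLinks(html) {
  var re = /<\s*a\s+[^>]*?href\s*=\s*["']?([^"' >]+)["' >]/gi;
  var links = [];
  var m;
  while ((m = re.exec(html)) !== null) {
    // Group 1 holds the URL; lower-cased to match the PHP version.
    links.push(m[1].toLowerCase());
  }
  return links;
}

var found = extractLinks(
  '<p><a href="http://A.com/x">A</a> <A class="c" href=\'/b\'>B</A> <a name="n">no link</a></p>'
);
// found is ["http://a.com/x", "/b"]; the <a> without href is ignored
```

Note that lower-casing, kept here only to mirror the PHP script, can break links on servers with case-sensitive paths.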

-Job- · May 8, 2008

If your site were to get popular you'd change the approach slightly. You'd have the browser cache requests and monitor the blog URLs for changes (at which time it would update the cache).
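A sketch of the caching half of this idea, in JavaScript. It swaps change monitoring for simple time-based expiry, and the fetch function is passed in as a synchronous callback to keep the example short (a real proxy would fetch asynchronously); `createCache`, `fetchPage` and `maxAgeMs` are hypothetical names:

```javascript
// Time-based cache: a URL is fetched again only when its cached copy is
// older than maxAgeMs, so repeat requests for the same blog page are
// served locally instead of hitting the remote blog each time.
function createCache(fetchPage, maxAgeMs) {
  var entries = new Map();  // url -> { body, fetchedAt }
  // "now" is passed in (e.g. Date.now()) so expiry is easy to test.
  return function get(url, now) {
    var hit = entries.get(url);
    if (hit && now - hit.fetchedAt < maxAgeMs) {
      return hit.body;              // still fresh: reuse the cached copy
    }
    var body = fetchPage(url);      // missing or stale: fetch and store
    entries.set(url, { body: body, fetchedAt: now });
    return body;
  };
}

var fetches = 0;
var getPage = createCache(function (url) {
  fetches++;
  return "content of " + url;
}, 60000);                          // keep copies for 60 seconds

getPage("http://someblog.blogspot.com/", 0);      // fetched (fetches is 1)
getPage("http://someblog.blogspot.com/", 30000);  // cached  (fetches is 1)
getPage("http://someblog.blogspot.com/", 90000);  // stale, fetched again (fetches is 2)
```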

-Job- · May 8, 2008

Take a look at this; it's what I was thinking about:
http://buzz.blogger.com/2006/11/json-on-new-blogger.html

If a blog has a JSON feed, then you can put a script reference on your page, such as:
<script type="text/javascript" src="http://www.someoneelsesblog.com/feed/json.php?somequeryvarshere"></script>
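For Blogspot blogs specifically, the feed address can be built in code. A minimal sketch, assuming Blogger's GData feed conventions: the `/feeds/posts/default` path with `alt=json-in-script`, which makes the server wrap the JSON in a call to the function named by `callback` (the technique usually called JSONP), and `max-results` to limit the number of posts. These parameters are an assumption to check against Blogger's documentation; the blog name `someblog`, the callback `showPosts` and the function `bloggerFeedUrl` are hypothetical:

```javascript
// Build the address of a Blogspot blog's post feed in "JSON in script" form.
// ASSUMPTION: path and parameter names follow Blogger's GData conventions.
function bloggerFeedUrl(blogName, callbackName, maxResults) {
  return "http://" + blogName + ".blogspot.com/feeds/posts/default" +
         "?alt=json-in-script" +
         "&callback=" + encodeURIComponent(callbackName) +
         "&max-results=" + maxResults;
}

var src = bloggerFeedUrl("someblog", "showPosts", 1);
// src is "http://someblog.blogspot.com/feeds/posts/default?alt=json-in-script&callback=showPosts&max-results=1"
// A page would reference it as: <script type="text/javascript" src="(src)"></script>
```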

JSON is just data stored in Javascript objects. So a JSON feed might return something like:

Code:

CONTENT_VAR = {Article1:"Today i found i don't have a life...", Comment:"I don't have a life either"}

Because of the script reference, the blog's contents would get loaded in directly (no proxy) and would be immediately available to JavaScript, so you can then use them in your page, for instance:

Code:

// write out every property of the JSON object
for(var a in CONTENT_VAR){
     document.write(CONTENT_VAR[a]);
}

This approach bypasses the Same Origin Policy. One thing to keep in mind is that if one of the feeds becomes unavailable or excessively slow, it will directly impact your page's load time, so these feeds should be placed at the bottom of the page, after the page structure has loaded.
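For the original question (one article per table cell), here is a sketch of how the collected feeds could be rendered. It assumes each feed has already been delivered to a callback as a JavaScript object in Blogger's GData JSON shape (`feed.entry[i].title.$t`, `feed.entry[i].content.$t`); that shape, and the `mixFeeds` and `escapeHtml` names, are assumptions for illustration:

```javascript
// Escape text so a post title cannot inject markup into the table.
function escapeHtml(s) {
  return String(s).replace(/&/g, "&amp;").replace(/</g, "&lt;")
                  .replace(/>/g, "&gt;").replace(/"/g, "&quot;");
}

// Build a table with one cell per blog, holding that blog's newest post,
// wrapping to a new row every `columns` cells.
// ASSUMPTION: Blogger GData JSON shape (entry.title.$t, entry.content.$t).
function mixFeeds(feeds, columns) {
  var cells = [];
  for (var i = 0; i < feeds.length; i++) {
    var entries = (feeds[i].feed && feeds[i].feed.entry) || [];
    if (entries.length === 0) continue;    // skip blogs with no posts
    var post = entries[0];
    cells.push("<td><h3>" + escapeHtml(post.title.$t) + "</h3>" +
               post.content.$t + "</td>"); // post body is HTML, inserted as-is
  }
  var rows = [];
  for (var r = 0; r < cells.length; r += columns) {
    rows.push("<tr>" + cells.slice(r, r + columns).join("") + "</tr>");
  }
  return "<table>" + rows.join("") + "</table>";
}
```

In the browser, each JSONP callback could push its object into an array, and once all feeds have arrived the returned string could be assigned to the `innerHTML` of a placeholder element. Post bodies are inserted as raw HTML, so only mix blogs whose content you trust.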

Ulysees · May 9, 2008

Alright! So it's definitely JSON then. And I just found out that Blogger supports it.

Are there any good resources for getting started with web development, such as a course and a reference for JavaScript?

-Job- · May 10, 2008

W3Schools has great tutorials for getting started with most web stuff:
http://www.w3schools.com/
