# Mixing blogs using javascript or other means

## Main Question or Discussion Point

Is it possible with javascript to take one article from blog A, one article from blog B, one article from blog C, and show them all together in one big table, with one article per table cell?

Even better, is it possible to mix blogs as above but without duplicating content on an intermediate server, and instead transferring content directly from the blog server to the client?


This is exactly what "RSS" was invented for.

Thanks. I tried both Firefox's "Live Bookmarks" and Google Reader, and neither provides the functionality I was asking for. I want to mix articles, not just list titles or view all the articles of one blog at a time. Does any similar application support mixing blogs?

In other words, short articles would look like posts in a Physics Forums thread: one comes from me, one from you, one from someone else, and so on. This would facilitate a conversation between bloggers that is censored by no one but the reader.

-Job-
You can use Javascript to retrieve content from blogs through Ajax. You can either retrieve the content from any existing XML feeds, or just retrieve the article's HTML and parse it (not a great idea).

Since javascript running in your page would need to query pages residing in different domains, depending on the user's security settings the browser might block this, or bring up a security warning (cross-site scripting is a big risk).

A better approach is to retrieve the content from the blogs on the server side.

You can either retrieve the content from any existing XML feeds, or just retrieve the article's HTML and parse it (not a great idea).
I actually want to edit the content before presenting it. What's the command for retrieving the article's HTML?

Since javascript running in your page would need to query pages residing in different domains, depending on the user's security settings the browser might block this, or bring up a security warning (cross-site scripting is a big risk).
I see that many sites include scripts from several other sites (it's obvious because of a Firefox security plugin I use called NoScript). For example, youtube.com has scripts from doubleclick.net and ytimg.com, and the ytimg.com scripts are essential for YouTube's operation. Is that cross-site scripting?

Firefox doesn't block scripts from ytimg.com by default. Can I do the same in my site?

-Job-
There's something called the "Same Origin Policy", and I don't think it applies to script tags, so it's acceptable to load a script from a separate domain via a linked script source (browsers with very tight security settings might still complain). If the remote site is in your "Trusted Sites" list it should be OK; otherwise you get an error. In IE, for instance, you get a JavaScript error saying "Permission Denied".

You can load a page's HTML with Ajax as in the following example:
Code:
<html>
<body>
<script type="text/javascript">
function AjaxRequest(page)
{
    var xmlHttp;
    try {  // Firefox, Opera 8.0+, Safari
        xmlHttp = new XMLHttpRequest();
    } catch (e) {  // Internet Explorer
        try {
            xmlHttp = new ActiveXObject("Msxml2.XMLHTTP");
        } catch (e) {
            try {
                xmlHttp = new ActiveXObject("Microsoft.XMLHTTP");
            } catch (e) {
                return false;
            }
        }
    }
    // Do something with the response once the request completes
    xmlHttp.onreadystatechange = function()
    {
        if (xmlHttp.readyState == 4) {
            alert(xmlHttp.responseText);
        }
    };
    xmlHttp.open("GET", page, true);
    xmlHttp.send(null);
}
</script>
</body>
</html>
So you would be able to load either the HTML from a blog article, or load in XML from some XML feed (if one is available). You can avoid cross-domain issues by having your server act as a proxy - meaning, you'd use Ajax to tell the server to go get some page and return it.

I've tried that code on firefox and IE 6 and they both fail at this point:

xmlHttp.open("GET",page,true);

I know this because I put debug statements using alert() immediately before and immediately after that line. Adding "http://www.google.com" to IE's list of trusted sites, the same still happens. Any thoughts?

-Job-
Works for me in IE6; I get a permission error in FF, which is expected.

Got it to work in IE, but I had to modify the security settings. Do you understand what youtube.com is doing in cooperation with ytimg.com?

What if ytimg.com was blogspot.com?

-Job-
They're loading scripts and images from ytimg.com, through script and img tags, both of which are allowed.

They also load scripts from Google Analytics for example, as do most sites, for visitor stats.

In your case, loading scripts and CSS through these tags won't suit your needs unless the remote blogs expose scripts that contain article contents in the form of JavaScript (for example, using JSON). That is actually a good idea and might happen eventually, since JSON is becoming very popular.

An option is to use IFRAME tags, each pointing to an article, however the Same Origin policy applies to IFRAMEs as well, so you won't be able to modify the contents of the IFRAME (to match your page style for example).

What a pain, that Same Origin policy.

I think I'll end up using an existing Java-based spider like the ones below and modifying it:

http://java-source.net/open-source/crawlers

If only they could run as web-based applets. Convincing people to explicitly download and run a stand-alone Java application might be hard.

Does this work for you?

http://snippets.dzone.com/posts/show/3853

-Job-
You shouldn't let the Same Origin Policy stop you because you can easily get around that.

All you need is a server-side script, in PHP for example, that receives a URL, makes a request to that page, and prints out the response (that will be about 10 lines of PHP).

Then instead of making an Ajax request to www.someoneelsesblog.com, you'd make a request to www.mydomain.com/proxy.php?url=www.someoneelsesblog.com and take it from there.
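As a rough sketch (the proxy path `proxy.php` and its `url` parameter are just the example names from this thread, not a fixed API), the client could build the proxied request URL like this; one detail worth adding is that the target URL should be encoded so any query string of its own survives the trip:

```javascript
// Build the URL for a request through the server-side proxy.
// encodeURIComponent keeps any "?" or "&" inside the target URL
// from being misread as part of our own query string.
function buildProxyUrl(targetUrl) {
    return "http://www.mydomain.com/proxy.php?url=" + encodeURIComponent(targetUrl);
}

// Usage with the AjaxRequest function from earlier in the thread:
// AjaxRequest(buildProxyUrl("http://www.someoneelsesblog.com/article?id=7"));
```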

Alright. On the other hand, if it got even a little popular, the bandwidth requirements of proxying everything would get too high for me. But it seems someone has thought of a PHP script that bypasses the server and gets the content directly from the sites being mixed: what does this script do?

"Web Based Web Crawler": http://snippets.dzone.com/posts/show/3853

I have copied the code below. Is it server-side or client-side? What code goes at "// insert code here"?

PHP:
// insert code here
<html>
<body>
<form id="form1" method="post" action="">
<label>URL:
<input name="url" type="text" id="url" value="<?php if (isset($_POST['url'])) echo htmlspecialchars($_POST['url']); ?>" size="65" maxlength="255" />
</label>
<br /><br />
<label><input type="submit" name="Submit" value="Submit" /></label>
<label><input name="Reset" type="reset" id="Reset" value="Reset" /></label>
<br />
</form>
</body>
</html>
<?php
if (isset($_POST['url'])) {
    $url = $_POST['url'];
    $f = @fopen($url, "r");
    // Read the page one chunk at a time (the original called fgets twice
    // per iteration, silently throwing away every other chunk)
    while ($buf = fgets($f, 4096)) {
        // Collect the href targets of all anchor tags in this chunk
        preg_match_all("/<\s*a\s+[^>]*href\s*=\s*[\"']?([^\"' >]+)[\"' >]/isU", $buf, $words);
        foreach ($words[1] as $link) {
            $cur_word = strtolower($link);
            print "$cur_word<br>";
        }
    }
    fclose($f);
}
?>
I have put the script on a php-supporting site, below. It seems to have a syntax error. What am I doing wrong?

http://members.lycos.co.uk/blogmixer/test.html

you'd make a request to www.mydomain.com/proxy.php?url=www.someoneelsesblog.com and take it from there.

On second thought, I think wordpress blogs include php support, so maybe one blog could be created that plays the role of the proxy to the other wordpress blogs. Plus all blogspot.com blogs! The bandwidth would then be unlimited.

Do blogs run out of bandwidth?

Well, it looks like the PHP interpreter isn't processing your page, probably because the file extension is .html instead of .php. In any case, you should read a basic PHP lesson on php.net to get used to the syntax. You really only need one call to file_get_contents() and some parsing, or something along those lines.

There's no reason why blogs won't run out of bandwidth like any other site. It depends on the hosting plan, among other things.

EDIT: I just read Job's post. Yeah, you could probably alter the script a little (for example, getting rid of the form and changing $_POST to $_GET to make things easier), and it should work.

So it gets the data from the site to the server, and from the server to the client, like a normal proxy? No server bypass? (I wanted to bypass the server and connect directly to the sites and then mix the contents; the server would only give the instructions for how to do that.)

Yes, that's the idea. Your server is the proxy. I don't know what you mean by server bypass.

-Job-
The code in between <? ?> tags is PHP. You need to place it on a PHP capable server and name it with the .php extension.

Basically the code receives a posted variable containing a URL, it performs a request to that URL, receives the response, uses regular expressions to extract the hyperlinks and repeats the process on those links. Basically it's a spider that crawls through web pages.
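For illustration, the same link-extraction step could be sketched in JavaScript (a regular expression is fragile compared to a real HTML parser, and this pattern only approximates the one in the PHP snippet above):

```javascript
// Pull the href targets out of anchor tags in a chunk of HTML,
// lowercased, in the order they appear.
function extractLinks(html) {
    var re = /<\s*a\s+[^>]*href\s*=\s*["']?([^"' >]+)/gi;
    var links = [];
    var match;
    while ((match = re.exec(html)) !== null) {
        links.push(match[1].toLowerCase());
    }
    return links;
}
```

A crawler would then repeat the fetch-and-extract cycle on each returned link, just as the PHP version does.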

-Job-
If your site were to get popular you'd change the approach slightly. You'd have the browser cache requests and monitor the blog URLs for changes (at which time it would update the cache).

-Job-
Take a look at this, this is what i was thinking about:
http://buzz.blogger.com/2006/11/json-on-new-blogger.html

If a blog has a JSON feed, then you can put a script reference on your page, such as:
<script type="text/javascript" src="http://www.someoneelsesblog.com/feed/json.php?somequeryvarshere"></script>

JSON is just data stored in Javascript objects. So a JSON feed might return something like:
Code:
CONTENT_VAR = {Article1:"Today i found i don't have a life...", Comment:"I don't have a life either"}
Because of the script reference, the blog's contents would get loaded in directly (no proxy), and would be immediately available to javascript so you can then use it in your page, for instance:
Code:
for (var a in CONTENT_VAR) {
    document.write(CONTENT_VAR[a]);
}
This approach bypasses the same origin policy. One thing to keep in mind is that if one of the feeds becomes unavailable or excessively slow it will impact your page's load time directly, so these feeds should be placed at the bottom of the page, after the page structure has loaded.

Alright! So it's definitely JSON then. And I just found that Blogger supports it.

Any good resources for getting started with web development, like a course and a reference for JavaScript?

-Job-