Feb 27, 2007

Decentralized Content Distribution Network

What would happen if a single web server were suddenly inundated with an abnormally high volume of traffic, such that the cost of serving the content far outweighed the cost-per-user ratio the publisher had planned for? This is known as the "Slashdot effect," and it can apply to any number of scenarios across the digital publishing realm.

What inevitably happens is that a small publisher creates a piece of media (an article, a graphic, an executable application, etc.) that somehow finds massive appeal among millions of people around the world. This opens a virtual floodgate of people trying to access the content online, and the servers (which the publisher is more than likely leasing) are collectively bombarded with content requests.

For a well-provisioned server this is usually not a problem, as bandwidth is allocated accordingly. But for low-bandwidth servers (or, say, publishers who cannot afford large amounts of bandwidth), either the website in question is crushed under the weight of the traffic, or the host charges massive overages to the publisher, which the publisher often cannot afford.

Decentralized content distribution can alleviate the worst of this: each participating node uses very little of its own network resources, yet collectively the content is served as though it came from a single high-speed server. Ten kilobytes per second from a large number of servers adds up in the end, just as pennies add up at scale.

So say each participating server handles about 10 KB/sec of the requests. On that server's monthly usage this amounts to very little and is inconsequential. But multiply that by 200 servers and the content can be served at around 2,000 KB/sec or higher (depending on how much bandwidth each server actually contributes). Like a vest of Kevlar, when the bullet (the millions of requests) hits the main server, the decentralized network intercepts it and spreads the impact across a large area in small doses.
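To make that arithmetic concrete, here is a minimal sketch in Python; the per-node rate and node count are just the example figures from above, not measurements of any real network:

```python
# Back-of-the-envelope aggregate throughput for a decentralized CDN.
# Each node contributes an inconsequential slice of bandwidth, but the
# swarm as a whole behaves like one high-speed server.

PER_NODE_KBPS = 10    # ~10 KB/sec handled by each participating server
NODE_COUNT = 200      # number of servers in the distribution network

aggregate_kbps = PER_NODE_KBPS * NODE_COUNT
print(f"Aggregate throughput: {aggregate_kbps} KB/sec")  # -> 2000 KB/sec
```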

For websites this seems to work amazingly well, and in fact some websites already handle their traffic this way (serving some 25 million users while the main server stays low on bandwidth usage). That got me thinking about how the same approach could be used to solve an age-old problem concerning online Virtual Reality.

In one of the earliest written synopses of an online virtual community (Habitat), Morningstar and Farmer pointed out that for a wide-scale virtual environment to keep growing, the serving of its content could not be centralized. Decentralization was key to the continued advancement and growth of virtual environments. I would like to believe this applies to the "environment server" as well as the content server (the two are often separate), though I am currently focusing on a solution for the content end (with the environment server being a consideration for well into the future).

When I muse on this, what I am thinking is that while current systems can easily handle tens of thousands of simultaneous users, the cost per user begins to climb sharply as the number of users grows. In essence, a centralized system sees its costs rise disproportionately with scale.

Taking this into account, even the best planned and best programmed system will eventually hit a limit on how many users can coexist per centralized server before the costs of that system become prohibitive to its expansion. And herein lies the point of Decentralized Content Distribution Networks.

While I am unfamiliar with the detailed mechanics of systems such as Second Life or There, I do understand that they take a fundamentally similar approach to the content served in their virtual environments. While Second Life effectively "streams" content using a derivative of the RealMedia system (Philip Rosedale was one of the creators of that format), I am wont to believe that the content still comes from a centralized system of servers. Of this much I am fairly certain, though it is quite possible that the Second Life system also groups users by node in order to offload the distribution of the content. I could be very wrong on that last point, but the behavior of the system seems to suggest such node grouping.

In recent months I have read articles describing how a self-replicating object in the Second Life environment managed to disable their servers, so in light of that information I am now leaning toward the idea that they are running the Second Life environment on a centralized system.

This in itself is fine if a company in this field wishes to establish a user base no larger than about 100,000, but beyond that threshold the cost per user will begin to skyrocket out of control as the numbers increase.

In the end, since I am interested in the theoretical applications of virtual environments and their advancement, I have found the Active Worlds environment to be the most receptive to this line of testing, and I have continued attempting to find solutions to the problems outlined in Morningstar and Farmer's "Lessons Learned From Lucasfilm's Habitat," one of which happens to be this centralization problem.

Over the past few months I have indeed found a very plausible candidate for solving this decentralization aspect within the Active Worlds environment: routing the Object Path through a server-side P2P network that self-replicates and organizes the information from a main server across 260 high-bandwidth servers around the world as a specific type of hash file. The more bandwidth the Object Path requires, the more efficient the CDN becomes, though in my early use I have noticed variable speeds (not enough testing or time has been given to it yet to gauge its overall ability).
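To give a purely illustrative sense of how content can be spread across a pool of servers by hashing (this is a generic sketch of the technique, not the actual mechanism I am using), imagine hashing each file's name to decide which nodes hold its copies. The 260-node figure comes from the network described above; the 3-way replication and the file names are made-up examples:

```python
import hashlib

# Generic sketch: hash a file's name to pick which of the network's
# nodes are responsible for holding its replicas. The replication
# factor and the example file names are assumptions for illustration.

NODE_COUNT = 260
REPLICAS = 3

def nodes_for(filename: str) -> list[int]:
    """Return the indices of the nodes that would cache this file."""
    digest = hashlib.sha1(filename.encode("utf-8")).hexdigest()
    start = int(digest, 16) % NODE_COUNT
    return [(start + i) % NODE_COUNT for i in range(REPLICAS)]

print(nodes_for("trees/oak1.rwx"))        # hypothetical Object Path model
print(nodes_for("textures/brick4.jpg"))   # hypothetical texture
```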

One thing I noticed immediately is that it actually works in the Active Worlds environment. Entering the new address in the World Properties dialog and hitting Apply will crash your browser, but after you restart the browser you realize that the change of address did take effect: the Object Path is being redirected through the CDN, which begins the hash replication and self-organization across the whole network (server-side, not user-side).

After some testing in the virtual world, I have seen roughly 99% accuracy when downloading files from the distributed network; that is, maybe 5 or 6 files will randomly time out while being located and thus fail to load properly, though simply exiting and re-entering the world seems to correct this automatically.
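Re-entering the world effectively just asks the network for those files again, which is the same idea as the simple retry loop sketched below. The Object Path URL, file name, and timings here are hypothetical, not the actual path I am testing:

```python
import time
import urllib.request

# Hypothetical illustration of the retry behaviour described above: a
# small fraction of files time out on the first attempt, and simply
# asking the network again (as re-entering the world does) resolves it.

OBJECT_PATH = "http://example-cdn.net/objectpath"   # made-up address

def fetch_with_retry(filename: str, attempts: int = 3, timeout: float = 10.0) -> bytes:
    for attempt in range(1, attempts + 1):
        try:
            with urllib.request.urlopen(f"{OBJECT_PATH}/{filename}", timeout=timeout) as resp:
                return resp.read()
        except OSError:                  # covers timeouts and connection errors
            if attempt == attempts:
                raise
            time.sleep(2 * attempt)      # back off briefly before trying again
    raise RuntimeError("unreachable")

# data = fetch_with_retry("trees/oak1.rwx")
```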

While I will not disclose exactly how this method is accomplished, I will say that for the most part it works just fine. I would imagine it would perform much better under far higher stress than a few users entering at random, so this method would be best suited to virtual environments where you expect much heavier load on the Object Path (hundreds, thousands, or more users). In theory, it should be possible to minimize the overall bandwidth consumption of an entire universe server's set of Object Paths even if it were filled to the brim with users (possibly serving every Object Path in a universe from a single cable modem and a home computer with low latency).
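As a rough back-of-the-envelope picture of why that cable-modem claim is plausible (every number below is an assumption, not a measurement): once the network has replicated the files, the origin's outbound traffic depends on the size of the Object Path, not on how many users are downloading it.

```python
# Hypothetical numbers: after seeding, user demand is absorbed by the
# network's nodes, so the origin's upload cost is roughly fixed.

OBJECT_PATH_MB = 150      # assumed total size of the world's models/textures
SEED_COPIES = 3           # assumed times the origin uploads each file to seed the CDN
USERS = 10_000            # simultaneous users, all served by the network's nodes

origin_upload_mb = OBJECT_PATH_MB * SEED_COPIES   # one-time cost, independent of users
user_demand_mb = OBJECT_PATH_MB * USERS           # demand absorbed by the swarm

print(f"Origin uploads:      {origin_upload_mb:,} MB")
print(f"Demand on the swarm: {user_demand_mb:,} MB")
```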

So, from theoretical application to proof of concept: at least the serving of the content can be decentralized, as Morningstar and Farmer suggested.
