The Internet Portal / Sub-Ether (June 30, 2004)

I want to turn the entire Internet into one big computer.


Download info at the bottom.


About my ideas: most of my ideas are things that seemed like a good idea, or at least a novelty, when I thought of them, only for me to find out when I tell people about them that somebody had the same idea long before me.

I'm not a very well-read person; I'd much rather create than research. I'd rather write my own music than play somebody else's. And I'd rather reinvent the wheel than buy a prebuilt wheel.

This may seem wasteful since the wheel has been perfected over thousands of years, but I'm probably better at building wheels now than your average person.

My idea for The Internet Portal


The one obvious thing missing from the Internet:

The Internet is not free; it's paid for by lots of telecom companies and probably some governments to some degree. But on the whole it's relatively cheap for the average person to use: you pay a small fee for access. Most notably, it is not run by any one company or central authority that can mess it up for everybody else.

Linux is an example of a free software project. Some guy sat down and wrote some software and gave it away for free and now lots of people use it. Lots of other people came along and wrote tools and environments and desktops and all sorts of things that are also given away basically for free if you can download them off the Internet.

Lots of companies exist to supply useful information or tools or services on the Internet, some of them using this free software. Google, Yahoo, eBay, and Amazon come to mind.

But I had one of those "EUREKA!" moments last week (June 30, 2004 at about 8am New York time or so) when it suddenly occurred to me that there's no portal/big useful service that isn't run by a company of some kind.

A lot of the p2p applications that exist are for file sharing, and a lot of the grid computer applications that exist are for crunching massive amounts of data. You break the data into small chunks and dole them out to whichever P4 is available to process the chunk.
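
Just to make that pattern concrete, here's a toy sketch of the chunk-and-dole-out idea in Python. All the names are made up for illustration; real grid systems are obviously much smarter about scheduling and failures.

    def split_into_chunks(data, chunk_size):
        """Break a big dataset into small, independently processable chunks."""
        return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

    def dole_out(chunks, workers):
        """Hand chunks to whichever machines are available, round-robin style."""
        assignments = {w: [] for w in workers}
        for i, chunk in enumerate(chunks):
            assignments[workers[i % len(workers)]].append(chunk)
        return assignments

    chunks = split_into_chunks(list(range(100)), 10)   # pretend this is real data
    print(dole_out(chunks, ["p4-box-1", "p4-box-2", "p4-box-3"]))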

The most popular of these apparently is SETI@home. RC5 was a personal favorite of mine. A lot of people seem to be jumping on the bandwagon: apparently in November of 2004 IBM decided to get in on the act. Good for them. I just don't see why only a handful of people should be taking advantage of this resource when EVERYBODY can.

For the first generation of p2p and grid computing (for lack of a better term), this is great. It gets things done. But like all ventures into the new and unknown, the first attempt is usually pretty simple, and later generations advance on the earlier ideas.

So here's a second generation idea for you.

There are lots of people's machines sitting there doing nothing with their idle CPU, while MS and Amazon and eBay and Google have farms upon farms of machines grinding away, finding web pages and selling stuff to you.

Why not use a p2p program to group together all of the available idle CPU and drive space to coordinate a useful system for a truly global decentralized portal?

While you're at it, why stop at a portal? Why not an Ebay, a web search tool, an amazon, an instant messenger, an Active-worlds, a freecycle? Why does ANY web application have to be centralized?

Remember the movie Terminator? Well, basically Skynet's what I'm talking about. Once again proving I rarely have an original idea.

So I figure what you need is a foundation program. All it does is serve as the (and I hate to use this term, but it's useful here) application server. It does all of the p2p discovery and get-in-sync-with-everybody-else type work, and accepts requests for labor as well as posting your local requests for labor to the p2p network.
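
To make that a little more concrete, here's a rough sketch of what one of those "requests for labor" might look like. The field names and helpers are placeholders I made up; nothing about the real message format is settled.

    import json, time, uuid

    def make_labor_request(application, payload, requester_id):
        """Build a labor request that any idle node in the p2p network could pick up."""
        return {
            "id": str(uuid.uuid4()),     # unique id so results can be matched back up
            "application": application,  # e.g. "compiler", "search", "seti"
            "payload": payload,          # whatever the application needs done
            "requester": requester_id,   # who the results go back to
            "posted": time.time(),
        }

    request = make_labor_request("compiler", {"file": "foo.c"}, "node-42")
    print(json.dumps(request, indent=2))   # roughly what would go out on the wire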

Upon this you build the applications. You can write your own for people to discover (and as they get more use, the system will forward them to more machines so they're distributed more efficiently). This will effectively make everybody their own web/mail/IM server, which apparently is against a lot of ISPs' terms of service agreements. Well, ISPs, like the phone company, will have to grow up and get used to it. Remember the famous economic one-liner: "If you don't cannibalize your business, somebody else will." So whoever openly adopts this first, wins.

Where's the money to be made? There isn't any. Linus Torvalds doesn't make money from Linux (not directly, anyway), and the gcc people don't either. But when enough people start clamoring for an ISP that allows them to run the Internet Portal, somebody will sell them the service. Remember, this is for the public good: the kind developers write the program, everybody contributes to the system by allowing some use of their hardware and bandwidth, and you pay your ISP for access to the Internet just as you do now. They'll fall in line when the popularity hits a critical mass. Rome wasn't built in a day and all that.

There are lots of problems that I've come up with, and many more that I haven't. All I know so far is that it's the idea requiring the biggest effort I've ever had, which is why I haven't started working on it myself yet. This is bigger than me. I take that back: I just started working on it on 2/12/05.

There has to be some kind of trusted authority so that people don't go around abusing The Internet Portal. There has to be a mechanism for updating bits, adding and removing applications, administrative roles and so on.

I have lots of ideas of how to go about this, which I'll get to below, but I just wanted to present the basic idea and see if it takes off. I don't have the time to build this whole thing myself, so I'm hoping that some inspired developers will rally around the idea and start a project with me.

This isn't meant to be a KDE-or-Gnome situation. Blasphemous as it may sound, I think sometimes monopolies are good. In this case (and KDE/Gnome is a perfect example), rather than have two or more competing systems, why not just work on one that everybody can be happy with? Because you can't make everybody happy, that's why. So go ahead, compete. Have fun. I hope your implementation is better than mine.

I just want to point out that there are many cell phone companies all paying lots of money to put cell towers in all of the population-dense places, covering the same areas many times over with basically the same service, while there are places not too far out in the boonies where you can't get a signal, because all these companies have to compete.

Maybe, just maybe, they could all sit down and say: we'll compete on service and features, but as far as covering areas with signal goes, let's put all our money together in one pile, cover a wider area ONCE, and share the resulting coverage network, instead of 10 companies covering the same area 10 times.

Nahhhh. People are too greedy for that. So go ahead and make 5 Internet Portals (see below). That's fine with me. I just think this is an idea whose time will soon come, and I just want to tell people in case they hadn't thought of it themselves yet. Like eBay and Google, one will rise to the top and eventually everybody will jump on the bandwagon, again for the greater good.

Anyway, enough of that rant, back to the purpose at hand.

It occurs to me that the first thing the average computer user is going to say when somebody asks them to run this p2p software on their machine is, "I'm not running anything 'p2p' on my machine; it's against my religion. And besides, why should I waste my CPU and disk space that I paid for on other people using my machine?"

Okay, well, maybe the slightly-above-average computer user.

Well, the answer is that you're going to hear that a lot, and at first everybody's going to balk at the idea of letting other people run software on their machine. But like all big breakthrough technologies (which take a long time to become the status quo), it's something everybody will slowly get used to and, like Google, will someday not be able to live without.

Right now, some people leave their computers on all of the time, some do not. I expect that in the next 10 years the computer, in some form, will be on all the time, downloading your favorite TV shows and news bits overnight so you will have them ready to view while you drink your morning coffee. What I envision is that over time this system will become so ubiquitous that the personal computer and Internet will seem largely useless without it, much as a TV is useless without cable or satellite. Yeah, you can watch broadcast TV, but who wants to do that anymore... I figure there will someday be a generation of kids who won't know life without this paradigm.

Maybe I'm being a little big-headed here, but it seems glaring to me that this is almost inevitable. Maybe it's not; maybe something even better will come along and usurp this before it becomes big. But given how well the Internet and big free-software systems have been doing, it seems to me this is a no-brainer.

Except that somebody has to build it first.

So here are some ideas I had. If you want to look at how to get p2p to work and to scale to some degree, go talk to the nice folks in Kazaa land; they've got that whole supernode thing all worked out.
If you want to see a daemon that's good at sending stored data around and getting it to the right place when asked, check out Freenet.

Two very good starting points. But Kazaa and Freenet lack one important thing that Yahoo, eBay, and Amazon have and Google is very quickly picking up on: user accounts.

So how can you possibly verify authoritatively that you are who you say you are when there's no central authority to verify against? Well, DNS works, doesn't it? Another good starting point.

If you see what I'm getting at here, all the pieces exist in one form or another; somebody just has to sit down and connect the dots.

So now you're saying, "DNS is distributed, but there's a hierarchy and a few root servers that are in trusted hands. So it *is* centralized." Okay, you win. I guess we're just going to have to throw up our hands and forget the whole thing. Or we can find 7 people to trust. I volunteer to be one of them. And I've got a few geeky friends who I'm sure will be happy to be the others. So who's with me?

Account information doesn't have to be big; it just has to have an id and some unique hash, and everything else can be slurped off the ether from wherever it lies. I haven't even gotten to the interesting stuff yet, and it already sounds complicated. And it is. But so is the GNU C++ optimizing compiler. It can be done. Man built the space shuttle. This is nothing by comparison. Hey, maybe some of my friends at NASA would be into this. This is right up their alley.
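
Here's about how small I mean. A toy sketch, with the exact fields being my guess of the moment rather than a settled format:

    import hashlib

    def make_account(user_id, public_blob):
        """An account record is just an id plus a unique hash; the rest lives in the ether."""
        digest = hashlib.sha256((user_id + public_blob).encode()).hexdigest()
        return {"id": user_id, "hash": digest}

    print(make_account("alice", "alice's public key or profile blob"))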


The Implementation


Where do I begin? The problem with this project is that it is so huge, I don't know where to even start with the design.

Well, here's what I did. I wrote the Sub-Ether connect program. This is the core of the system; it serves to connect all the computers in the "Sub-Ether" to each other. Then I wrote a few utilities that poll the network and tell you who's on and what applications are available.
Click here for the page about the implementation. Now I'm working on my first application: the distributed compiler. After that I'll start getting to some of the things mentioned below...

The Name


When I first thought of it, "The Internet Portal" seemed like the closest thing I could come up with to a name that described the idea. Friends have since mentioned to me that they didn't think it was so hot, so we came up with "Sub-Ether." It sorta makes sense if you turn the network model upside down; literally, not metaphorically. This is an application layer on which other applications operate, so it would really be super-ether, not sub-ether. But I like the sound of it. Suggestions welcome.

The Information Gatherer


One of the most annoying things about Windows is that, for all of its available resources, whenever you do anything that involves opening up the file dialog, the machine sits there and grinds away finding all available devices, files, and associated icons to display in that box.
There is something to be said for on-demand information, because it is most current, but there's also something to be said for humans not having to wait for machines all the time.

I'm in favor of the latter camp.

The information gatherer is a daemon that runs on each node and spends some small amount of idle time gathering useful information for likely future requests, so when a request is made, the user doesn't have to wait for the information-gathering part of the response.
For example: the first time a node wants to compile something, it wouldn't know beforehand that the user wanted to use a compiler, so it would have to note that the compiler was now an 'interesting' application, go out and find some other available nodes with a compatible version of the compiler, and then go run the compiles.

But from then on, the information gatherer would keep tabs on available compiler nodes so that the next time the compiler was invoked, the information would be immediately available. It's 2005; why Windows doesn't do this, I'll never understand. Maybe it's to remind everybody who's in control.
The information gatherer will be poked by applications and told what kinds of information to cache for which applications. So it can be used to keep track of where to get the user's favorite news bits from (and possibly the news content itself), and information about other network nodes to be used for compiling, searching, mailing, and other node-wide tasks. Like finding aliens.
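
Here's a toy sketch of the shape of the thing: a background thread that keeps the cache of 'interesting' information warm. find_nodes_offering() is a placeholder for the real p2p discovery call, which doesn't exist yet.

    import threading, time

    cache = {}           # application name -> nodes last seen offering it
    interesting = set()  # applications some local program has poked us about

    def find_nodes_offering(app):
        """Placeholder for the real p2p discovery call."""
        return ["node-7", "node-12"]

    def poke(app):
        """Applications call this to mark an application worth tracking."""
        interesting.add(app)

    def gatherer(interval=60):
        """Refresh the cache during idle time so users never wait on discovery."""
        while True:
            for app in list(interesting):
                cache[app] = find_nodes_offering(app)
            time.sleep(interval)

    threading.Thread(target=gatherer, args=(1,), daemon=True).start()
    poke("compiler")   # the first compile marks the compiler as 'interesting'
    time.sleep(2)      # give the gatherer a moment; after that, answers are instant
    print(cache["compiler"])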




Security and User Accounts


I'm not big on security. In fact, as a programmer, I find it rather annoying. It's a useless layer of crap that makes things more complicated and run slower, all because you can't trust anybody. So while I see and understand the need, I still don't like it, and I'm not terribly good at it. "And then, one Thursday, nearly two thousand years after one man had been nailed to a tree for saying how great it would be to be nice to people for a change..." Well, you know how it goes.
Wouldn't it be nice if everybody was just nice to each other for a change?

For my part, what I envision someday is a setup where user information is stored securely over the distributed network, and your private key would be the way to unlock it and prove you are you. I'm not up to that yet; I'm still building the framework. There are plenty of things to do before security is required, and by then I'm hoping somebody really interested in such things will offer to help.
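
The rough shape I have in mind looks something like this. I'm using Ed25519 signatures from the third-party Python `cryptography` package purely as an example primitive; the real scheme is a long way off and may look nothing like it.

    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    private_key = Ed25519PrivateKey.generate()  # never leaves your machine
    public_key = private_key.public_key()       # stored with your record in the ether

    challenge = b"prove you are alice: nonce 8675309"
    signature = private_key.sign(challenge)     # only the real alice can produce this

    public_key.verify(signature, challenge)     # raises InvalidSignature on a forgery
    print("identity verified")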

As far as user accounts go, I also eventually want them distributed, like everything else, but like KDE, it's not a bad idea to borrow a few stepping stones here and there. A few months ago a friend of mine proposed the idea of OSSS: Open Source Single Sign-on. I mentioned this to some other friends of mine and they pointed me here: http://www.projectliberty.org/ They offer a digital identity, which sounds like what I'm looking for. I have to read about it more, but this is probably what I'm going to start with until I can write something of my own.



The Underlying Datastore


This is almost as interesting as Security and User Accounts. Except that I hate security and therefore find this more interesting.
Where do we put all of this information?

Rankings


Some applications, like web search and the compiler, won't relate to user accounts; they're just using resources. But for things like auctions and questions, you want to know that the user offering the item or answer is trustworthy. So we have rankings.

Rankings will be saved with your user record on your machine as part of your user information. There being no central server in this system, there isn't really anywhere else 'safe' to put it. This of course means that anybody could just doctor their user account and give themselves a high rank to look very trustworthy. Well, it's not going to be THAT easy. The people who vote on a user's rank will themselves have ranking information. So when you vote on somebody, your identifying user information will be stored with your vote, for and with that user. If somebody wants to know your rank, they can look at your numbers, but they can also verify it by polling all of the people who voted for you and seeing how trustworthy THEY are. Plus, comments will be added, which is something a computer is going to have a hard time faking. So after you verify that all the people who voted for you have varied start times, are online, and had something intelligible to say, you'll have a reasonable idea of how valid their ranking is.
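
In code, the verification step might look something like this sketch. All the record structures are invented for illustration:

    def verify_rank(user_record, fetch_voter):
        """Recompute a user's rank from their stored votes, weighing each voter."""
        score, weight = 0.0, 0.0
        for vote in user_record["votes"]:
            voter = fetch_voter(vote["voter_id"])   # poll the voter off the ether
            if voter is None or not vote.get("comment"):
                continue                            # unreachable or commentless votes don't count
            score += vote["value"] * voter["rank"]  # weigh the vote by the voter's own rank
            weight += voter["rank"]
        return score / weight if weight else 0.0

    record = {"votes": [{"voter_id": "bob", "value": 1.0, "comment": "fast shipper"}]}
    print(verify_rank(record, lambda vid: {"rank": 0.9}))   # 1.0, vouched for by a 0.9 voter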

This isn't a perfect system, but it's a start, and I'm always open to suggestions. Remember this isn't a one person effort, it's for everybody by everybody.



Authority


This is sort of like ranking, but it's not for user accounts or applications; it's for the authority of the administration of the Sub-Ether system. My authority will be primary. I will trust my friends and other sysadmins that I know are 'good guys' and give them authority as well. They will be secondary. They can then trust people as tertiary, and so on. Machines that have this authority will be the ones everybody goes to to check the validity of an application or download. If you trust a level 5 authority for downloading programs, then you can get the info from anybody at level 5 and up. You can GET the program from anywhere, but you can verify its MD5 with the authorities. This will keep my machine from getting killed trying to serve everything to everybody.
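
So fetching a program would look roughly like this: get the bits from anywhere, but check the MD5 against an authority you trust. The lookup function here is a placeholder for the real authority query.

    import hashlib

    def get_authority_md5(app_name, max_level=5):
        """Placeholder: ask an authority at your trusted level or above for the sum."""
        return "d41d8cd98f00b204e9800998ecf8427e"   # MD5 of an empty file, for the demo

    def verify_download(app_name, data, max_level=5):
        """The program can come from anywhere; only the checksum needs an authority."""
        return hashlib.md5(data).hexdigest() == get_authority_md5(app_name, max_level)

    print(verify_download("sedistccd", b""))   # True for the empty-file placeholder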

It sounds like I'm making a central point of failure, and to some degree, I am. But this doesn't affect the functioning of the Sub-Ether system, just the administration of it.



Applications




Questions and Answers

This seems like a good candidate to start with. It's a simple application, it's useful, certainly lots of websites try to do this type of thing, and to some degree it's Usenet. But there's no existing Amazon/eBay/Google-class company doing this.
It's an application where you can post a question, and anybody who subscribes to the Q&A application can freely respond with an answer. Everybody can search the existing questions and answers, and can respond if an answer was useful, thus upping the Q&A vote count of the answerer. So future question-askers will have an idea whether the person answering the question is likely to have a good answer.
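
The records involved could be as simple as this sketch (the field names are just my guesses):

    question = {"id": 1, "asker": "bob", "text": "Why won't my kernel build?"}
    answer = {"question_id": 1, "answerer": "alice",
              "text": "You're missing the ncurses headers.", "useful_votes": 0}

    def mark_useful(ans):
        """A reader found the answer useful; credit the answerer's Q&A count."""
        ans["useful_votes"] += 1

    mark_useful(answer)
    print(answer["useful_votes"])   # future askers can see alice gives good answers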

Mail

The Compiler

This seems like one of the easier applications to build, since it piggybacks on an existing project I'm already terribly impressed with: distcc.
While all of my Sub-Ether applications build on Windows, Linux, Solaris, and AIX, this one's going to favor the non-Windows machines. (Don't worry, you Windows folks, you can still help out by running the connector and functioning as a router.) The idea is that people who have distcc installed and working can offer up their spare CPU by running the Sub-Ether distcc program. This will allow others in the ether to use your compiler to compile their programs, and vice versa.
A friend recently told me it took him many hours to build KDE. I think maybe if you've got a few dozen or hundred machines helping you, it won't take so long. :-)
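
The glue can stay thin because distcc already takes its list of helpers from the DISTCC_HOSTS environment variable (that part is real distcc behavior). Something like this sketch, where the discovery call is a stand-in for the ether:

    import os, subprocess

    def discover_compile_nodes():
        """Placeholder: ask the ether which nodes are offering sedistccd right now."""
        return ["node7.example.net", "node12.example.net"]

    # Fan compiles out to the discovered peers, keeping localhost in the mix.
    os.environ["DISTCC_HOSTS"] = " ".join(discover_compile_nodes() + ["localhost"])
    subprocess.run(["make", "-j8", "CC=distcc gcc"])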

More details as I get it working.

News



The Datastore Application

Not to be confused with the underlying datastore.

Chess



Obviously, somebody's going to write a chess game. It will put Deep Thought to shame.

Chat

Once I work out a way to map auth credentials from Project Liberty (see above) to Sub-Ether global user IDs, it will be pretty simple to find somebody by name in the ether and chat with them.

WWW

Well, of course, everybody's going to be running their own webserver. Why use some provider of blogs/webspace/whatever with annoying ads, when you can host your own server and store your data on the Shared Drive?

SETI@home, Einstein@home and their ilk

Although it would initially be a step back for these guys, it seems silly to me that everybody writes their own CPU-sharing system. I don't know much about the design of SETI@home, but I get the idea that it's more of a client-to-many-servers system than a peer-to-peer system. I'm offering up peer to peer, and there's a simple enough way to do check-out-a-block, process it, return-the-results with the Sub-Ether framework. As soon as I have the distcc application working, I'm going to ask these guys if they're interested. I mean, why have everybody running 10 different CPU-sharing systems, when you can run one and just offer your application to be plugged in by those who want to help?
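
The whole volunteer side of such a project could collapse to a loop like this sketch, with the three ether_* calls standing in for a framework API that doesn't exist yet:

    import time

    def ether_checkout_block(project):
        """Placeholder: ask a peer for an unprocessed block, or None if idle."""
        return None

    def process(block):
        """Placeholder: the project's own number crunching goes here."""
        return {"block": block, "result": "crunched"}

    def ether_return_results(project, results):
        """Placeholder: hand the finished block back into the ether."""

    def volunteer_loop(project="seti"):
        while True:
            block = ether_checkout_block(project)   # check out a block
            if block is None:
                time.sleep(300)                     # nothing to do; check back later
                continue
            ether_return_results(project, process(block))   # process, return results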



Who has what program.

There is the question of what runs where; not every node in the ether is going to have every installable application. Each computer still has to run some software locally, so who decides who gets what?

Well, there are a few things to go by. There are classifications of software. Take the compiler, for example: it makes sense that the people who use the compiler offer up CPU for others to compile with, so that one can be installed for both use and serving. But the SETI program, for example: nobody really has a need to run it for themselves, but they might like to contribute their CPU to the cause, so these types of programs would fall under the volunteer category.
A lot of programs fit in those two categories. Everything else will default to no. The point is to not allow a rogue program to infiltrate and abuse the entire system. So if you want to browse, you'll host the browser and store indexing information. If you want to buy stuff on eBay, you'll also host the eBay application.
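
As a sketch, the policy is simple enough to fit in a dozen lines (the table contents are examples, not a real catalog):

    CLASSIFICATION = {
        "compiler": "use-and-serve",   # using it obliges you to serve it too
        "browser":  "use-and-serve",   # browse, and you host indexing information
        "seti":     "volunteer",       # nothing in it for you locally; pure donation
    }

    def may_install(app, volunteered):
        kind = CLASSIFICATION.get(app)
        if kind == "use-and-serve":
            return True
        if kind == "volunteer":
            return app in volunteered   # only if this node opted in
        return False                    # everything else defaults to no

    print(may_install("compiler", set()))       # True
    print(may_install("seti", {"seti"}))        # True
    print(may_install("rogue-app", {"seti"}))   # False: rogue code stays out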

Where does the software come from.

In a word, me. Obviously the plan is for everybody to write software for the ether, and you can offer it up to anybody, but if you want 'trusted' software you're going to have to trust somebody, and since this is my ether, you can trust me if you want. Each node will define what level of trust it requires before accepting software to run. You can have exceptions: 'I want to run this untrusted software,' or 'I never want to run anything from Microsoft.' Whatever you want; all of that can be configurable.
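
Configuration-wise, I picture each node carrying something like this (all the names are illustrative, nothing more):

    TRUST_CONFIG = {
        "max_level": 5,                   # accept software vouched for at level 5 or better
        "always_allow": {"my-pet-tool"},  # "I want to run this untrusted software"
        "never_allow": {"microsoft"},     # "never run anything from Microsoft"
    }

    def accept_software(name, publisher, level, cfg=TRUST_CONFIG):
        if name in cfg["always_allow"]:
            return True
        if publisher in cfg["never_allow"]:
            return False
        return level <= cfg["max_level"]  # lower number = closer to the root of trust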



There can be only one

Well, okay, you can have as many as you want, and perhaps, eventually, there will be so many participating nodes that there will be a use for segregated sub-ethers. But for starters, it seems to me having one makes the most sense. How useful would it be to have separate phone companies that couldn't call each other's phones? Or email systems that couldn't mail between them? I envision something more like the Gnutella network: the protocol is available and there's a certain lowest common denominator that everybody agrees to, but you can implement whatever you want, so that all the pieces can talk to each other on some level.

Bill Yeager

Well, as it turns out, this is yet another case where somebody had pretty much the same idea before me. I read about it in Network World, the 3/27/06 issue. Bill Yeager is a famous guy who did a lot of neat things, some time around 1997 I think. On page 42, in the third paragraph on the page, he describes at least part of this project in one nice succinct paragraph:
    The charter was to create an open source project for the creation of
    peer-to-peer protocols that would yield a virtual layer on top of the
    TCP/IP stack. That would return end-to-end connectivity to the Internet
    by making the traversal of NATs and firewalls transparent and provide
    host endpoints with globally unique identifiers.


More to come.

Download info.

So I started actually working on it. Here are some notes on what I'm doing.

Now (2/17/06) I've got something that builds and runs okay and does distributed compiling with distccd. Download the source here.
Unzip and untar this file.
There's a buildse.sh script, which you run first, and if it all builds well, there's an installse.sh script which will install it into /usr/local if you're root.
If you're not root and you want to install it somewhere else, change line 3 of the installse.sh script, before you run it.
There's a HOWTO file in the tar that explains step by step how to set up the distributed compiling stuff.

A bit more documentation, and an install script (5/19/06). Download the Release II source here.
Unzip and untar this file.
Read the HOWTO-README, and there's also a HOWTO in the sedistccd directory.

I got into Ajax, so now there's a status console web page for seconnect. A few bug fixes in sedistccd. Now I'm working on a Sub-Ether webserver; should be neat when I'm done. 8/14/06. Download the Release III source here.
Unzip and untar this file.
Read the HOWTO-README, and there's also a HOWTO in the sedistccd directory.
When seconnect is running you can point your web browser to http://localhost:1124 and see the console.

You can email me at spamme at deadpelican.com