The Internet Portal (June 30, 2004) Sub-Ether
I want to turn the entire Internet into one big computer.
Download info at the bottom.
About my ideas:
Most of my ideas are things that seemed like a good idea, or at least a novelty, when I thought of them,
only to find out when I tell people about them that somebody had the same idea long before me.
I'm not a very well-read person; I'd much rather create than research. I'd rather write my own
music than play somebody else's, and I'd rather reinvent the wheel than buy a prebuilt wheel.
This may seem wasteful, since the wheel has been perfected over thousands of years, but
I'm probably better at building wheels now than your average person.
My idea for The Internet Portal
The one obvious thing missing from the Internet:
The Internet is not free; it's paid for by lots of telecom companies and probably some governments
to some degree. But on the whole it's relatively cheap for the average person to use: you pay a small
fee for access to the Internet. Most notably, it is not run by any one company or central authority
that can mess it up for everybody else.
Linux is an example of a free software project. Some guy sat down, wrote some software, and gave it
away for free, and now lots of people use it. Lots of other people came along and wrote tools and
environments and desktops and all sorts of things that are also given away basically for free,
if you can download them off the Internet.
Lots of companies exist to supply useful information or tools or services on the Internet, some of them
using this free software.
Google, Yahoo, eBay and Amazon come to mind.
But I had one of those "EUREKA!" moments last week (June 30, 2004, at about 8am New York time or so) when it suddenly
occurred to me that there's no portal or big useful service that isn't run by a company of some kind.
A lot of the p2p applications that exist are for file sharing, and a lot of the grid
computing applications that exist are for crunching massive amounts of data: you break the data
into small chunks and dole them out to whichever P4 is available to process a chunk.
The most popular of these apparently is SETI@home; RC5 was a personal
favorite of mine. A lot of people seem to be jumping on the bandwagon.
Apparently in November of 2004 IBM decided to get in on the act.
Good for them. I just don't see why only a handful of people should be taking
advantage of this resource, when EVERYBODY can.
For the first generation of p2p and grid computing (for lack of a better term) this is great.
It gets things done, but like all ventures into the new and unknown, the first attempt is usually
pretty simple, and later generations advance on the earlier ideas.
So here's a second-generation idea for you.
There are lots of people's machines sitting there doing nothing with their idle CPU, while MS and Amazon and
eBay and Google have farms upon farms of machines grinding away, finding web pages and selling stuff to you.
Why not use a p2p program to pool all of that available idle CPU and drive space
to coordinate a useful system, a truly global, decentralized portal?
While you're at it, why stop at a portal? Why not an eBay, a web search tool, an Amazon, an instant messenger,
an Active Worlds, a Freecycle? Why does ANY web application have to be centralized?
Remember the movie Terminator? Well, basically Skynet's what I'm talking about. Once again
proving I rarely have an original idea.
So I figure what you need is a foundation program. All it does is serve as the (and I hate to use this term,
but it's useful here) application server. It does all of the p2p discovery and get-in-sync-with-everybody-else
type work, and accepts requests for labor as well as posting your local requests for labor to the p2p network.
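Just to make that concrete, here's a minimal sketch of what a "request for labor" might look like as a
data structure. Every field name here is my own guess; nothing is final:

    #include <cstdint>
    #include <iostream>
    #include <string>

    // A guess at what a labor request might carry. The connector only
    // routes these; the application named in 'application' does the work.
    struct LaborRequest {
        std::string requester_id;   // globally unique node/user id
        std::string application;    // e.g. "sedistcc", "qna", "websearch"
        std::string payload_ref;    // where to fetch the work unit from
        uint32_t    priority;       // so nodes can pick what to serve first
        uint32_t    ttl_hops;       // stop forwarding when this hits zero
    };

    int main() {
        LaborRequest req{"node-42", "sedistcc", "ether://node-42/job/7", 1, 8};
        // A real connector would serialize this and route it to peers;
        // here we just print it to show the shape of the thing.
        std::cout << req.application << " job from " << req.requester_id
                  << " (ttl " << req.ttl_hops << ")\n";
        return 0;
    }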
Upon this you build the applications. You can write your own for people to discover (and as they get
more use, the system will forward them to more machines so they're distributed
more efficiently). This will effectively make everybody their own web/mail/IM server, which apparently
is against a lot of ISPs' terms of service agreements. Well, ISPs, like the phone company, will have to grow up
and get used to it. Remember the famous economic one-liner: "If you don't cannibalize your business,
somebody else will." So whoever openly adopts this first, wins. Where's the money to be made?
There isn't. Linus Torvalds doesn't make money from Linux (not directly, anyway), and the gcc people don't either.
But when enough people start clamoring for an ISP that allows them to run the Internet Portal, somebody
will sell them the service. Remember, this is for the public good: kind developers write the program,
everybody contributes to the system by allowing some use of their hardware and bandwidth, and
you pay your ISP for access to the Internet just as you do now. They'll fall in line when the
popularity hits a critical mass. Rome wasn't built in a day and all that.
There are lots of problems that I've come up with, and many more that I haven't.
All I know so far is that it's the biggest idea, in terms of effort required, that I've ever had, which is why
I haven't started working on it myself yet. This is bigger than me. I take that back: I just started working on it,
2/12/05.
There has to be some kind of trusted authority so that people don't go around abusing The Internet Portal.
There has to be a mechanism for updating bits, adding and removing applications, administrative roles and so on.
I have lots of ideas of how to go about this, which I'll get to below, but I just wanted to present
the basic idea and see if it takes off.
I don't have the time to build this whole thing myself, so I'm hoping that some inspired developers
will rally around the idea and start a project with me.
It's not a KDE-or-Gnome situation; blasphemous as it may sound, I think sometimes monopolies are good. In this case
(and KDE/Gnome is a perfect example), rather than have two or more competing systems, why not just work on one
that everybody can be happy with? Because you can't make everybody happy, that's why. So go ahead, compete.
Have fun. I hope your implementation is better than mine.
I just want to point out that there are many cell phone companies all paying lots of money
to put cell towers in all of the population-dense places, covering the same areas many times over
with basically the same service, while there are places not too far out in the boonies where you
can't get a signal, because all these companies have to compete.
Maybe, just maybe, they could all sit down and say: we'll compete on service and features, but as far
as covering areas with signal goes, let's put all our money together in one pile, cover a wider area ONCE,
and share the resulting coverage network, instead of 10 companies covering the same area 10 times.
Nahhhh. People are too greedy for that. So go ahead and make 5 Internet Portals (see below).
That's fine with me.
I just think this is an idea whose time will soon come, and I just want to tell people in case they hadn't
thought of it themselves yet. Like with eBay and Google, one will rise to the top and eventually
everybody will join the bandwagon, again for the greater good.
Anyway, enough of that rant, back to the purpose at hand.
It occurs to me that the first thing the average computer user is going to say when somebody asks
them to run this p2p software on their machine is: "I'm not running anything 'p2p' on my machine,
it's against my religion. And besides, why should I waste my CPU and disk space, which I paid for,
on other people using my machine?"
Okay, well, maybe the slightly-above-average computer user.
Well, the answer is that you're going to hear that a lot, and at first everybody's going
to balk at the idea of letting other people run software on their machine,
but like all big breakthrough technologies (which take a long time to become the status quo),
it's something everybody will slowly get used to, and, like Google, will someday not be able
to live without.
Right now, some people leave their computers on all of the time; some do not. I expect that within
the next 10 years, the computer, in some form, will be on all the time, downloading your
favorite TV shows and news bits overnight so you will have them ready to view while you
drink your morning coffee. What I envision is that over time, this system will become so
ubiquitous that the personal computer and the Internet will seem largely useless without it,
much as a TV is useless without cable or satellite. Yeah, you can watch broadcast TV, but
who wants to do that anymore... I figure there will someday be a generation
of kids who won't know life without this paradigm.
Maybe I'm being a little big-headed here, but it seems glaring to me that this is almost inevitable.
Maybe it's not; maybe something even better will come along and usurp this before it becomes big.
But given how well the Internet and big free-software systems have been doing, it seems to me this is a
no-brainer.
Except that somebody has to build it first.
So here are some ideas I had. If you want to look at how to get p2p to work, and to scale to some degree,
go talk to the nice folks in Kazaa land; they've got that whole supernode thing all worked out.
If you want to see a daemon that's good at sending stored data around and getting it to the right place
when asked, check out Freenet.
Two very good starting points.
But Kazaa and Freenet lack one important thing that Yahoo, eBay, and Amazon have, and that
Google is very quickly picking up on:
user accounts.
So how can you possibly verify authoritatively that you are who you say you are when there's no
central authority to verify against? Well, DNS works, doesn't it? Another good starting point.
If you see what I'm getting at here, all the pieces exist in one form or another; somebody just
has to sit down and connect the dots.
So now you're saying: "DNS is distributed, but there's a hierarchy, and there are a few root servers
that are in trusted hands. So it *is* centralized."
Okay, you win. I guess we're just going to have to throw up our hands and forget the whole thing.
Or we can find 7 people to trust. I volunteer to be one of them. And I've got a few geeky friends
who I'm sure will be happy to be the others. So who's with me?
Account information doesn't have to be big; it just has to have an id and some unique hash, and
everything else can be slurped off the ether from wherever it lies. I haven't even gotten to the
interesting stuff yet, and it already sounds complicated. And it is. But so is the GNU C++
optimizing compiler. It can be done. Man built the space shuttle; this is nothing by comparison.
Hey, maybe some of my friends at NASA would be into this. This is right up their alley.
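For what it's worth, here's how small an account record could be. This is just an illustration;
std::hash below is a stand-in for a real cryptographic digest, which it is not:

    #include <functional>
    #include <iostream>
    #include <string>

    // The bare minimum: an id and a unique hash. Everything else gets
    // slurped off the ether from wherever it lies.
    struct Account {
        std::string id;     // globally unique user id
        size_t      hash;   // stand-in for a real cryptographic digest
    };

    int main() {
        std::string id = "dave@deadpelican";
        Account acct{id, std::hash<std::string>{}(id + ":some-secret")};
        std::cout << acct.id << " -> " << acct.hash << "\n";
        return 0;
    }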
The Implementation
Where do I begin? The problem with this project is that it is so huge, I
don't know where to even start with the design.
Well, here's what I did. I wrote the Sub-Ether connect program.
This is the core of the system. It serves to connect all the computers in the "Sub-Ether"
to each other.
Then I wrote a few utilities that poll the network and tell you who's on
and what applications are available.
Click here for the
page about the implementation.
Now I'm working on my first application. The distributed compiler. After that
I'll start getting to some of the things mentioned below...
The Name
When I first thought of it, "The Internet Portal" seemed like the closest thing I
could come up with to a name that described the idea. Friends have since mentioned to me
that they didn't think it was so hot, so we came up with "Sub-Ether." It sorta makes sense
if you turn the network model upside down. Literally, not metaphorically: this is an
application layer on which other applications operate, so it would really be super-ether,
not sub-ether. But I like the sound of it. Suggestions welcome.
The Information Gatherer
One of the most annoying things about Windows is that, for all of its available
resources, whenever you do anything that involves opening up the file dialog,
the machine sits there and grinds away finding all available devices, files
and associated icons to display in that box.
There is something to be said for on-demand information, because it is the most
current, but there's also something to be said for humans not having to wait
for machines all the time.
I'm in the latter camp.
The information gatherer is a daemon, running on each node, that spends some small
amount of idle time gathering useful information for likely future requests.
So when a request is made, the user doesn't have to wait for the
information-gathering part of the response.
For example: the first time a node wants to compile something, it wouldn't have known
beforehand that the user wanted to use a compiler, so it would have to note
that the compiler was now an 'interesting' application, go out and
find some other available nodes with a compatible version of the compiler, and
then go run the compiles.
But from then on, the information gatherer would keep tabs on available compiler
nodes so that next time the compiler was invoked, the information would be
immediately available.
It's 2005. Why Windows doesn't do this, I'll never understand. Maybe it's to remind
everybody who's in control.
The information gatherer will be poked by applications and told what kinds
of information to cache for which applications. So it can be used to keep track of
where to get the user's favorite news bits (and possibly the news content
itself), and information about other network nodes to be used for compiling, searching,
mailing, and other node-wide tasks. Like finding aliens.
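Here's a toy version of the gatherer's core, the way I picture it: applications poke it with what's
'interesting', an idle-time refresh keeps the node lists current, and lookups come straight out of the
cache. The discovery part is stubbed out, since that's really the connector's job:

    #include <iostream>
    #include <map>
    #include <string>
    #include <vector>

    // Toy information gatherer: caches which nodes can serve which
    // application, so the user never waits on discovery.
    class Gatherer {
        std::map<std::string, std::vector<std::string>> cache_;
    public:
        // Applications call this to mark something 'interesting'.
        void poke(const std::string& app) { cache_[app]; }

        // Called during idle CPU time; a real version would query the ether.
        void refresh() {
            for (auto& [app, nodes] : cache_)
                nodes = {"nodeA", "nodeB"};   // stub for p2p discovery
        }

        // Instant answer from the cache, no network round trip.
        const std::vector<std::string>& lookup(const std::string& app) {
            return cache_[app];
        }
    };

    int main() {
        Gatherer g;
        g.poke("sedistcc");   // first compile marks the compiler interesting
        g.refresh();          // happens in the background from then on
        for (const auto& n : g.lookup("sedistcc"))
            std::cout << "compiler node: " << n << "\n";
        return 0;
    }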
Security and User Accounts
I'm not big on security. In fact as a programmer, I find it rather
annoying. It's a useless layer of crap that makes things more complicated
and run slower. All because you can't trust anybody. So while I see and
understand the need, I still don't like it, and I'm not terribly good at it.
"And then, one Thursday, nearly two thousand years after one man had
been nailed to a tree for saying how great it would be to
be nice to people for a change..." Well, you know how it goes.
Wouldn't it be nice if everybody was just nice to each other for a change.
For my part, what I envision someday is a setup where user information
is stored securely over the distributed network, and your private key would
be the way to unlock it and prove you are you. I'm not up to that yet; I'm
still building the framework. There are plenty of things to do before security
is required, and by then I'm hoping somebody really interested in such things
will offer to help.
As far as user accounts go, I also eventually want this distributed, like
everything else, but like KDE, it's not a bad idea to borrow a few stepping stones
here and there.
A few months ago a friend of mine proposed the idea of OSSS: Open Source Single
Sign-on. I mentioned this to some other friends of mine and they pointed me here:
http://www.projectliberty.org/ It's an organization that offers a digital identity standard,
which sounds like what I'm looking for.
I have to read about it more, but this is probably what I'm going to start with
until I can write something of my own.
The Underlying Datastore
This is almost as interesting as Security and User Accounts. Except that I hate security
and therefore find this more interesting.
Where do we put all of this information?
Rankings
Some applications, like web search and the compiler, won't relate to
user accounts; they're just using resources. But for things like
auctions and questions, you want to know that the user
offering the item or answer is trustworthy. So we have rankings.
Rankings will be saved with your user record, on your machine, as part of
your user information. There being no central server in this system, there
isn't really anywhere else 'safe' to put them. This of course means
that anybody could just doctor their user account and make themselves look
very trustworthy by giving themselves a high rank. Well, it's not going to be
THAT easy.
The people who vote on a user's rank will themselves have ranking information.
So when you vote on somebody, your identifying user information will be
stored with your vote, for and with that user. If somebody wants to know your rank
they can look at your numbers, but they can also verify it by
polling all of the people who voted for you and seeing how trustworthy THEY are.
Plus, comments will be attached to votes, which is something a computer is going to have
a hard time faking. So after you verify that the people who voted for somebody
have varied start times, are online, and had something intelligible to say,
you'll have a reasonable idea of how valid their ranking is.
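In code, a one-level version of that verification might look like the sketch below. The trust
function is a stub, and the weighting and the skip-empty-comments rule are just my illustration of
the idea, not a final design:

    #include <iostream>
    #include <string>
    #include <vector>

    // A vote stored with the user record, as described above.
    struct Vote {
        std::string voter_id;
        int         score;       // e.g. +1 / -1
        std::string comment;     // humans write these; bots have a hard time
    };

    // Stub: a real node would poll the voter over the ether and compute
    // its own (possibly recursive) trust estimate for them.
    double voter_trust(const std::string& /*voter_id*/) { return 0.8; }

    // Verified rank: each vote is weighted by how much we trust the voter,
    // and votes with no intelligible comment are ignored entirely.
    double verified_rank(const std::vector<Vote>& votes) {
        double rank = 0.0;
        for (const auto& v : votes) {
            if (v.comment.empty()) continue;   // suspicious: skip it
            rank += v.score * voter_trust(v.voter_id);
        }
        return rank;
    }

    int main() {
        std::vector<Vote> votes = {
            {"alice", +1, "answered my question perfectly"},
            {"bot17", +1, ""},                 // no comment, discounted
        };
        std::cout << "verified rank: " << verified_rank(votes) << "\n";
        return 0;
    }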
This isn't a perfect system, but it's a start, and I'm always open to
suggestions. Remember this isn't a one person effort, it's for everybody
by everybody.
Authority
This is sort of like ranking, but it's not for user accounts in applications;
it's for the authority of the administration of the Sub-Ether system itself.
My authority will be primary. I will trust my friends and other sysadmins
that I know are 'good guys' and give them authority as well. They will be
secondary. They can then trust people as tertiary, and so on.
Machines that have this authority will be the ones that everybody goes to
to check the validity of an application or download. If you trust a level-5
authority for downloading programs, then you can get the info from anybody level 5
and up. You can GET the program from anywhere, but you verify its MD5
with the authorities. This will keep my machine from getting killed trying
to serve everything to everybody.
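Here's roughly the check I have in mind, as a sketch. The levels and digests are invented for
illustration (I say MD5 above, but any digest would do):

    #include <iostream>
    #include <map>
    #include <string>

    // Authority levels: 1 = primary (me), 2 = secondary (friends I trust),
    // and so on down the chain.
    struct Authority {
        int         level;
        std::string md5_for_program;   // digest this authority vouches for
    };

    // Accept a download only if some authority at or above our trust
    // threshold vouches for its digest. You can GET the bits from anywhere.
    bool download_ok(const std::map<std::string, Authority>& authorities,
                     const std::string& actual_md5, int max_trusted_level) {
        for (const auto& [name, auth] : authorities)
            if (auth.level <= max_trusted_level &&
                auth.md5_for_program == actual_md5)
                return true;
        return false;
    }

    int main() {
        std::map<std::string, Authority> auths = {
            {"dave",   {1, "d41d8cd98f00b204e9800998ecf8427e"}},
            {"random", {9, "ffffffffffffffffffffffffffffffff"}},
        };
        // We trust authorities up to level 5; the level-9 node doesn't count.
        std::cout << download_ok(auths, "d41d8cd98f00b204e9800998ecf8427e", 5)
                  << "\n";   // prints 1
        return 0;
    }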
It sounds like I'm making a central point of failure, and to some degree, it is.
But this doesn't affect the functioning of the Sub-Ether system, just the administration
of it.
Applications
Questions and Answers
This seems like a good candidate to start with. It's a simple application, it's useful,
certainly lots of websites try to do this type of thing, and to some degree, it's Usenet.
But there's no existing Amazon/eBay/Google-class company doing it.
It's an application where you can post a question, and anybody who subscribes to the Q&A application
can freely respond with an answer. Everybody can search the existing questions and
answers, and can vote if an answer was useful, thus upping the Q&A vote count of the answerer.
So future question-askers will have an idea of whether the person answering is likely
to have a good answer.
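Data-wise it's tiny, which is part of why it's a good starter. Something like this, with all the
names provisional:

    #include <iostream>
    #include <string>
    #include <vector>

    // One answer to a question; useful_votes feeds the answerer's Q&A rank.
    struct Answer {
        std::string answerer_id;
        std::string text;
        int         useful_votes = 0;
    };

    struct Question {
        std::string asker_id;
        std::string text;
        std::vector<Answer> answers;
    };

    int main() {
        Question q{"newbie", "How do I untar a .tar.gz?", {}};
        q.answers.push_back({"dave", "tar xzf file.tar.gz", 0});
        q.answers[0].useful_votes++;   // a reader found the answer useful
        std::cout << q.answers[0].answerer_id << " now has "
                  << q.answers[0].useful_votes << " useful vote(s)\n";
        return 0;
    }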
Mail
The Compiler
This seems like one of the easier applications to build, since it piggybacks
on an existing project I'm already terribly impressed with: distcc.
While all of my Sub-Ether applications build on Windows, Linux, Solaris and AIX,
this one's going to favor the non-Windows machines. (Don't worry, you Windows folks, you
can still help out by running the connector and functioning as a router.)
The idea is that people who have distcc installed and working can offer up their spare
CPU by running the Sub-Ether distcc program. This will allow others in the ether to
use your compiler to compile their programs, and vice versa.
A friend recently told me it took him many hours to build KDE. I think maybe
if you've got a few dozen or hundred machines helping you, it won't take so long. :-)
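To give the flavor of how the piggybacking could work: distcc really does read its host list from the
DISTCC_HOSTS environment variable, so the Sub-Ether side mostly has to discover willing nodes and
build that string. The discovery call below is a stand-in for the real ether query, and setenv is
POSIX, so this is a non-Windows sketch, fittingly:

    #include <cstdlib>
    #include <iostream>
    #include <string>
    #include <vector>

    // Stand-in for asking the ether which nodes run the sedistcc service.
    std::vector<std::string> discover_compile_nodes() {
        return {"192.168.1.20", "192.168.1.31", "localhost"};
    }

    int main() {
        // Join the discovered nodes into distcc's space-separated host list.
        std::string hosts;
        for (const auto& h : discover_compile_nodes()) {
            if (!hosts.empty()) hosts += " ";
            hosts += h;
        }
        // DISTCC_HOSTS is real distcc; with it set, "make CC=distcc" fans
        // compiles out across the listed machines. setenv is POSIX-only.
        setenv("DISTCC_HOSTS", hosts.c_str(), 1);
        std::cout << "DISTCC_HOSTS=" << hosts << "\n";
        return 0;
    }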
More details as I get it working.
News
The Datastore Application
Not to be confused with the underlying datastore.
Chess
Obviously, somebody's going to write a chess game. It will put Deep Thought to shame.
Chat
Once I work out a way to map auth credentials from Project Liberty (see above) to
Sub-Ether global user ids, it will be pretty simple to find somebody by name
in the ether and chat with them.
WWW
Well, of course, everybody's going to be running their own webserver.
Why use some provider of blogs/webspace/whatever, with their annoying ads, when
you can host your own server and store your data on the Shared Drive?
SETI@home, Einstein@home and their ilk
Although it would initially be a step back for these guys, it seems silly
to me that everybody writes their own CPU-sharing system. I don't know much about
the design of SETI@home, but I get the idea that it's more of a client-to-many-servers
system than a peer-to-peer system. I'm offering up peer-to-peer, and there's a simple enough way
to do block-check-out, process, return-results with the Sub-Ether framework,
and as soon as I have the distcc application working, I'm going to ask these
guys if they're interested.
I mean, why have everybody running 10 different CPU-sharing systems, when
you can run one, and just offer your application to be plugged in by
those who want to help?
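The loop I mean is as simple as it sounds. All three calls below are stubs; the point is just the
shape: check out a block, process it, return the results:

    #include <iostream>
    #include <optional>
    #include <string>

    // Stubs for the three framework calls a CPU-sharing app would need.
    std::optional<std::string> check_out_block() {
        static int n = 0;
        if (n >= 3) return std::nullopt;   // no more work available
        return "block-" + std::to_string(n++);
    }
    std::string process(const std::string& block) { return block + ":done"; }
    void return_results(const std::string& r) { std::cout << r << "\n"; }

    int main() {
        // check-out -> process -> return, until the ether runs dry.
        while (auto block = check_out_block())
            return_results(process(*block));
        return 0;
    }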
Who has what program.
There is the question of what runs where; not every node in the ether
is going to have every installable application. Each computer still has to
run some software locally, so who decides who gets what?
Well, there are a few things to go by. There are classifications
of software. Take the compiler, for example: it makes sense that the people
who use the compiler offer up CPU for others to compile with,
so that one can be installed for both use and serving. But the SETI program,
for example: nobody really has a need to run the SETI program
for themselves, but they might like to contribute their CPU to the cause,
so these types of programs would fall under the volunteer category.
A lot of programs fit in those two categories. Everything else
will default to no. The point is to not allow a rogue program
to infiltrate and abuse the entire system. So if you want to browse,
you'll host the browser and store indexing information.
If you want to buy stuff on the ether's eBay, you'll also host the eBay
application.
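So each program gets a classification, and anything unclassified defaults to no. A sketch, with the
category and program names being mine:

    #include <iostream>
    #include <map>
    #include <string>

    // The three classifications described above.
    enum class Policy { UseAndServe, Volunteer, Deny };

    Policy classify(const std::map<std::string, Policy>& known,
                    const std::string& app) {
        auto it = known.find(app);
        // Anything unclassified defaults to Deny, so a rogue program
        // can't infiltrate and abuse the whole system.
        return it == known.end() ? Policy::Deny : it->second;
    }

    int main() {
        std::map<std::string, Policy> known = {
            {"sedistcc", Policy::UseAndServe},  // you compile for others, they compile for you
            {"seti",     Policy::Volunteer},    // pure donation of CPU
        };
        std::cout << (classify(known, "mystery-app") == Policy::Deny) << "\n"; // 1
        return 0;
    }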
Where does the software come from.
In a word: me. Obviously the plan is for everybody to write software for the
ether, and you can offer it up to anybody, but if you want 'trusted' software
you're going to have to trust somebody, and since this is my ether, you
can trust me if you want.
Each node will define what level of trust it requires before it will accept software to run.
You can have exceptions: "I want to run this untrusted software," or "I never want
to run anything from Microsoft," whatever you want. All of that can
be configurable.
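Configuration-wise, I picture something like this: explicit never/always exceptions are checked
first, then the node's trust-level threshold. All the names here are invented:

    #include <iostream>
    #include <set>
    #include <string>

    struct NodeConfig {
        int max_trusted_level = 3;          // accept software vouched for at levels 1..3
        std::set<std::string> always_run;   // "I want to run this untrusted thing"
        std::set<std::string> never_run;    // "never anything from Microsoft"
    };

    bool may_run(const NodeConfig& cfg, const std::string& publisher, int level) {
        if (cfg.never_run.count(publisher)) return false;  // exceptions win first
        if (cfg.always_run.count(publisher)) return true;
        return level <= cfg.max_trusted_level;             // then the trust chain
    }

    int main() {
        NodeConfig cfg;
        cfg.never_run.insert("microsoft");
        cfg.always_run.insert("my-buddy");
        std::cout << may_run(cfg, "microsoft", 1) << " "   // 0
                  << may_run(cfg, "my-buddy", 99) << "\n"; // 1
        return 0;
    }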
There can be only one
Well, okay, you can have as many as you want, and perhaps, eventually, there will
be so many participating nodes that there will be a use for segregated sub-ethers.
But for starters, it seems to me that having one would make the most sense.
How useful would it be to have separate phone companies that couldn't call each other's phones?
Or email systems that couldn't mail between them?
I envision something more like the Gnutella network: the protocol is available and there's a
certain lowest common denominator that everybody agrees to, but you can implement whatever
you want, so that all the pieces can talk to each other on some level.
Bill Yeager
Well, as it turns out, this is yet another case where somebody had pretty much the same
idea before me. I read about it in Network World, the 3/27/06 issue.
Bill Yeager is a famous guy who did a lot of neat things, starting some time around
1997, I think. On page 42, third paragraph, he describes at least part of this
project in one nice succinct paragraph:
The charter was to create an open source project for
the creation of peer-to-peer protocols that would yield
a virtual layer on top of the TCP/IP stack. That would return
end-to-end connectivity to the Internet by making
the traversal of NATs and firewalls transparent and
provide host endpoints with globally unique identifiers.
More to come.
Download info.
So I started actually working on it. Here are some notes on what I'm doing.
Now (2/17/06) I've got something that builds and runs okay and does distributed compiling with distccd.
Download the source here.
Unzip and untar this file.
There's a buildse.sh script, which you run first, and if it all builds well, there's
an installse.sh script which will install it into /usr/local if you're root.
If you're not root, or you want to install it somewhere else, change line 3 of the installse.sh
script before you run it.
There's a HOWTO file in the tar that explains step by step how to set up the distributed compiling
stuff.
5/19/06
A bit more documentation, and an install script.
Download the Release II source here.
Unzip and untar this file.
Read the HOWTO-README, and there's also a HOWTO in the sedistccd directory.
I got into Ajax, so now there's a status console web page for seconnect.
There are a few bug fixes in sedistccd. Now I'm working on a Sub-Ether webserver; it should
be neat when I'm done.
8/14/06
Download the Release III source here.
Unzip and untar this file.
Read the HOWTO-README, and there's also a HOWTO in the sedistccd directory.
When seconnect is running you can point your web browser to
http://localhost:1124 and see the console.
You can email me at spamme at deadpelican.com.