URLD - A wireless-oriented web page discovery program.

This document is an introduction to urld.

				Jim Binkley   jrb@cs.pdx.edu
				Sumit Chawla, sumit@cs.pdx.edu

Outline:

1. basic idea
	introduction
	3 usage scenarios
2. help needed
	port it to something
	demo it on something
	do something nifty and unexpected with it 
3. implementation/how it works (some details)
4. security considerations
5. contact us

---------------------------
1. Basic Idea

Urld reads and writes UDP-based broadcast messages made up 
of World Wide Web Uniform Resource Locators (URLs).  These
messages consist of a system identity string and an associated
set of 1 or more HTML tags; for example, one entry might consist of:

http://www.cs.pdx.edu, "PSU CS department page"   

And the logical output in the web page created by urld on some 
other system might look like:

131.252.201.4  homebrew.cs.pdx.edu

	PSU CS department page  <----- an url ...
	----------------------

Messages from nearby nodes are sent to IP limited broadcast
and are written to a local html file on a receiving system.  For
example, on UNIX, the default output file is /tmp/urld.html.  This file
may be viewed by any web browser via file:/tmp/urld.html.
As a result one can determine nearby systems.  In summary, 
systems advertise WWW URLs to each other.
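
As a rough illustration of the idea (not urld's actual code), the
sketch below turns a table of heard systems into an HTML page like the
one described above.  The function name render_page and the exact HTML
layout are assumptions made for this example; the real output format
of urld may differ.

```python
import html

def render_page(systems):
    """Build an HTML page from advertisements that were heard.

    systems maps (ip, hostname) -> list of (url, description).
    The layout is an assumption for illustration only.
    """
    lines = ["<html><body>"]
    for (ip, hostname), urls in systems.items():
        # One heading per advertising system: its address and name.
        lines.append("<h3>%s  %s</h3>" % (html.escape(ip), html.escape(hostname)))
        lines.append("<ul>")
        for url, text in urls:
            # Escape everything received from the network before
            # placing it in the page.
            lines.append('<li><a href="%s">%s</a></li>'
                         % (html.escape(url, quote=True), html.escape(text)))
        lines.append("</ul>")
    lines.append("</body></html>")
    return "\n".join(lines) + "\n"
```

A reader would write the returned string to its output file
(/tmp/urld.html by default on UNIX).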

Thus we can distinguish between two urld runtime modes, which
we can call "reading" and "writing".  A system may be a reader (reads
URLs and puts them in a local html page), a writer (sends URLs), or both.
Doing both is the default.  On the other hand, a wireless laptop or
PDA user might minimally want to be a reader to find locally hearable
servers advertising thru public access points.  A fixed server (a wired
box) might be a writer hooked up on the same network as an
802.11 access point.  Thus urld serves as a way to advertise local
information thru the access point to wireless systems that are in
the same "cell".  Of course remote URLs reachable via the Internet
can be advertised too.  An advertised URL does not need to be local.
(The content behind an advertised URL can live on a remote web server).
However the advertising system is local, or at least hearable via a
local broadcast.

The server maintainer might not be too interested in seeing who
showed up at the cafe, and might simply run in write-only mode.  On
the other hand, a laptop/PDA user (hereafter a Mobile User) might
want to be both a reader and a writer: a reader to learn about other urld
systems nearby (including special monthly coffee deals at the local
coffee emporium that is acting as a public 802.11 site),
and a writer to advertise the cool web pages that said Mobile User is making
available either on their laptop or someplace else.
 
We suggest that by default everybody run reader/writer, but you have to 
make your own decision.  A Mobile User might not wish to let others know 
that he/she is lurking nearby.  DE GUSTIBUS NON DISPUTANDUM EST.
(How do you say mobile in Latin?).

btw, urld now has an official IANA approved UDP port, 3534.

1.1 usage scenarios

Let us suggest three possible usage scenarios, which we will call:

1. server advertisements

Assume you have urld, and are the owner of an Access Point that you
have made available to the public somehow, be it a for-pay scheme or
a free scheme.  Let's assume your company sells almond lattes and is
called lottajava inc., and that you have a web server somewhere (possibly
on your local urld server or elsewhere) with a web page set up to
advertise your company either locally or nationally.  This "web server"
could be on an openAP system, of course.  Or it could be somewhere else
entirely.  This is your choice.  We would point out though that "local"
urls may be better, because you are trying to advertise to *local* customers.

You could hook up urld as follows:

-------------
|  server   |  <--- runs urld and writes to ethernet broadcast
-------------
      |
      | ethernet (urld writes urld message)
      ----------------------------------------------------
				|
				| ethernet
			     802.11 Access Point
				|
				| WLAN side
		-------------------------------------
		|		|		    |   local wireless domain
urld readers    MU1		MU2		    MU3
	     
The above could be collapsed/integrated onto a UNIX system that has openAP
capability.  Note that the AP is a bridge.  We expect broadcast packets
sent on the ethernet to wander onto the wireless link (which is how
things work anyway).

You create a web page on the server (or somewhere), and set up 
the urld configuration as a writer, to advertise your url as follows:

http://www.lottajava.com   LottaJava Inc page

Your server (we will assume it is a UNIX box, say running linux,
with an ethernet port called eth0) has urld on it, and possibly a
web server for www.lottajava.com, although the web server could be in
Jamaica.  You run urld in writer mode, and it writes out your url above.
Your customers can see it, provided they are running urld in reader mode.
Your url is then stored in the customers' urld read-side output files.
The customer simply uses any web browser, displays the local urld file,
and then clicks on your url to visit your page.  By default a urld writer
sends its messages every 10 seconds, and a reader throws away urls that
are not refreshed within around 30 seconds.
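
The send/expire timing can be sketched as follows.  This is only an
illustration under assumed names (UrlCache, refresh, expire); it is
not taken from the urld source.  The two constants reflect the default
values mentioned above.

```python
SEND_INTERVAL = 10   # seconds between advertisements (default per the text)
EXPIRE_AFTER = 30    # seconds without a refresh before a url is dropped

class UrlCache:
    """Reader-side table of urls and the time each was last heard."""

    def __init__(self, expire_after=EXPIRE_AFTER):
        self.expire_after = expire_after
        self.last_seen = {}          # (sender, url) -> timestamp in seconds

    def refresh(self, sender, url, now):
        # Called for every url found in a received message.
        self.last_seen[(sender, url)] = now

    def expire(self, now):
        # Drop entries that have not been refreshed recently and
        # return the keys that were removed.
        stale = [key for key, seen in self.last_seen.items()
                 if now - seen > self.expire_after]
        for key in stale:
            del self.last_seen[key]
        return stale
```

With these defaults a url survives up to two lost advertisements in a
row before it disappears from the reader's page.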

2. mobile node advertisements (Mobile Users as peers)

In theory, with an 802.11 AP in managed mode, it MAY be possible for
Mobile Users to see other Mobile Users.  (In theory, this should work.
In practice, it HAS worked, but there is no telling how random APs may
behave in this regard.  We need to widely deploy urld and see what
features or misfeatures of APs exist in that arena).  So for example,
MU1 above at the coffee shop should be able to see that MU2 and MU3 are
"nearby".  This assumes of course that MU1, etc., are writing.  If they
are only reading, you won't know about them from the urld point of view.
Lurking nodes are certainly possible.

3. ad hoc applications based on #2

If Mobile Users can send messages, it should be possible to build
higher-level applications that take the file:/tmp/urld.html
file as input (or a pure XML version) and thus determine local systems 
(local peers).  This might allow systems that are in the same broadcast 
domain (broadcast area) to exchange files in a peer-to-peer fashion.  
One could write a messaging application or an N-party game as well.
XML probably has a role to play here.
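
As a starting point for such an application, here is a minimal sketch
that pulls the advertised links out of the reader's HTML output.  It
assumes only that urls appear as ordinary <a href="..."> anchors; the
names LinkCollector and extract_links are made up for this example,
and the real file layout may carry more information than this parser
collects.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from anchor tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None    # href of the anchor currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

def extract_links(html_text):
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links
```

An application could call extract_links(open("/tmp/urld.html").read())
periodically and treat the result as its list of nearby peers.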

---------------
2. help needed

We need the assistance of a community committed to making this work.
We submit that urld is a mobile-wireless application, and can have
widespread applicability in helping to make public (and private) wireless 
nodes popular, especially with the people bringing up APs for public
use. 

How can others help?

1. port urld to something else.  Ideally, urld needs to be as universal
as possible.  We have supplied Linux/FreeBSD/WIN32 and (not yet) java versions.
(They may have bugs and can stand more testing too).
Urld can stand to be ported to other platforms.  If you do so,
please send us your code, with a binary, for re-release.  

2. set urld up and test it and demonstrate it to others.  Propaganda
efforts are needed and are important.  Urld needs to be deployed
on wired servers so that wireless customers can take advantage of it.
	
3. take urld and engineer up some higher-level application for it.
Something using XML would be a very nice idea.

---------------
3. Some implementation details

Note we have supplied WIN32, FreeBSD, and Linux capabilities,
as well as a java script.

In this section, we present a few implementation details.
Urld is in some sense "simple", though maybe not as simple as it looks.

3.1 sockets

Of course, urld uses UDP sockets.  There is a reader socket, and
1 to many writer sockets.  Writer sockets are per interface.
1 to N interfaces may be specified in the config file.  Each
interface means urld is supposed to write the broadcast (or multicast)
urld packet out said interface.  There are various not terribly
interoperable mechanisms used to bind "broadcast" output to
an interface.  On BSD, it is a pain in the rear end, as you
have to use the Berkeley Packet Filter (bpf).  On linux there is
a nice socket option that makes it easy.  Thus we can distinguish
several possible capabilities, like so:

can read broadcast and/or multicast

can write broadcast and/or multicast to a "default" interface.

	can write broadcast to a second interface, that is
		not the first interface (according to ifconfig -a)

	can write broadcast/multicast to > 1 interface

In general linux/FreeBSD systems can do all of the above, except that
FreeBSD cannot write when an interface command is not explicitly
given.  Sumit came up with a way to make this fairly flexible with
WIN32 as well.
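
For readers unfamiliar with per-interface broadcast sockets, here is a
small sketch of a writer socket.  It assumes that the "nice socket
option" on linux is SO_BINDTODEVICE, which is our guess and is not
confirmed by this document; the function name make_writer_socket is
also made up.  The bpf approach needed on BSD is not shown.

```python
import socket

URLD_PORT = 3534   # urld's UDP port, as stated earlier in this document

def make_writer_socket(ifname=None):
    """Create a UDP socket allowed to send to a broadcast address.

    If ifname is given, try to tie output to that interface with
    SO_BINDTODEVICE (Linux only; normally needs root privileges).
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    if ifname is not None:
        opt = getattr(socket, "SO_BINDTODEVICE", None)
        if opt is None:
            sock.close()
            raise OSError("SO_BINDTODEVICE is not available on this platform")
        sock.setsockopt(socket.SOL_SOCKET, opt, ifname.encode() + b"\0")
    return sock
```

A writer would then send with something like
sock.sendto(payload, ("255.255.255.255", URLD_PORT)), using one such
socket per configured interface.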

3.2 write side

The writer takes urls specified in the urld.conf file and
writes them out of 1 to N interfaces in a Tag Length Value (TLV) format.
The protocol itself is specified in docs/urld_protocol.txt,
and is fairly straightforward and easily extensible (similar
to RADIUS when it gets down to it).  Writes are coordinated by
the sendTime configuration setting, which like all urld timers
is measured in seconds.  Of course, this is done with "alarm" or 
any functional equivalent that gives you seconds.  
Logically urld can be divided into a writer thread and a reader
thread.  However, on UNIX, the two threads can be "simulated",
possibly with an alarm signal and the select(2) system call.
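
To show what a Tag Length Value encoding looks like in general, here
is a generic sketch.  It is NOT the urld wire format: the type codes
T_URL and T_TEXT, the 1-byte type, and the 2-byte length are all
assumptions made up for this example.  The real layout is the one in
docs/urld_protocol.txt.

```python
import struct

# Hypothetical type codes, for illustration only.
T_URL = 1
T_TEXT = 2

def tlv_encode(items):
    """Encode (type, value-bytes) pairs as type(1) length(2) value(n)."""
    out = bytearray()
    for type_code, value in items:
        if len(value) > 0xFFFF:
            raise ValueError("value too long for a 2-byte length field")
        out += struct.pack("!BH", type_code, len(value))
        out += value
    return bytes(out)

def tlv_decode(data):
    """Decode a buffer produced by tlv_encode, checking every length."""
    items = []
    pos = 0
    while pos < len(data):
        if pos + 3 > len(data):
            raise ValueError("truncated TLV header")
        type_code, length = struct.unpack_from("!BH", data, pos)
        pos += 3
        if pos + length > len(data):
            raise ValueError("truncated TLV value")
        items.append((type_code, data[pos:pos + length]))
        pos += length
    return items
```

The appeal of TLV is extensibility: a receiver that does not know a
type code can use the length field to skip over it.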

3.3 read side

The read side reads ALL packets sent to broadcast or multicast 224.0.0.1.
Output is "filtered" in the sense that the MD5 message digest function
is used to learn if urls are "new" or not within the expire number
of seconds.  If not new, urls are ignored.  As a result, urld does
not write out its output HTML file unless urls actually change
for some reason.  If a change does occur, however, it writes that file
out immediately upon reception of the packet.  Urls will
time out eventually, which will also cause a rewrite of the file. 
In addition, an optional urlTime timer is provided that sets
the automatic HTML "rewrite" pragma timer, which in theory
should automatically "reload" a page.
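
One plausible way to use MD5 for this kind of change detection is
sketched below.  Exactly what urld hashes (each url, each packet, or
the whole table) is not stated here, so the choice to digest the whole
set of entries, and the names digest_urls and ChangeDetector, are
assumptions.

```python
import hashlib

def digest_urls(entries):
    """Return an MD5 hex digest over (sender, url, text) entries.

    Entries are sorted first so that arrival order does not matter.
    """
    md5 = hashlib.md5()
    for sender, url, text in sorted(entries):
        for field in (sender, url, text):
            md5.update(field.encode("ascii", "replace"))
            md5.update(b"\0")        # field separator
    return md5.hexdigest()

class ChangeDetector:
    """Remember the last digest and report whether the set changed."""

    def __init__(self):
        self.last = None

    def changed(self, entries):
        current = digest_urls(entries)
        if current == self.last:
            return False             # nothing new: skip the file rewrite
        self.last = current
        return True
```

The reader would rewrite its HTML file only when changed() returns
True, which covers both newly heard urls and urls that timed out.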

---------------
4. security considerations

The fundamental problem with urld security is likely no different
from using the web elsewhere.  All urld does is produce a web file.
When you click on something in that web file, be careful what you
download, and especially what you download and execute with a web browser.
Common sense should apply here.  For example, if someone offers up a web 
page that consists of a word document, urld isn't going to make downloading 
that and viewing it any more or less safe.  It isn't going to
prevent you from downloading and executing a trojan horse program.

Urld does not execute anything.  It also limits the number of urls
received per system, and the total number of systems that can
be heard from.  Input is ASCII and is placed in the output html file.
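
The kind of limits described above can be sketched as follows.  The
two limit values and the function name accept are hypothetical; the
actual limits used by urld are not given in this document.

```python
MAX_SYSTEMS = 64           # hypothetical cap on systems heard from
MAX_URLS_PER_SYSTEM = 8    # hypothetical cap on urls per system

def accept(table, sender, url):
    """Store url for sender in table if the limits allow it.

    table maps sender -> list of urls.  Returns True if the url is
    (or already was) stored, False if it was rejected.
    """
    # Only printable ASCII input is accepted.
    if not (url.isascii() and url.isprintable()):
        return False
    urls = table.get(sender)
    if urls is None:
        if len(table) >= MAX_SYSTEMS:
            return False             # too many systems already
        urls = table[sender] = []
    if url in urls:
        return True
    if len(urls) >= MAX_URLS_PER_SYSTEM:
        return False                 # this system sent too many urls
    urls.append(url)
    return True
```

Bounding both dimensions keeps a noisy or hostile sender from growing
the reader's table, and thus its HTML file, without limit.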

Urld's read buffer is not on the stack. It is limited to 1500 bytes.
The size is checked via the recvfrom(2) system call.  This should
limit the possibilities of any buffer overflow attacks.  
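
The same idea, a fixed-size buffer whose size is handed to the receive
call, looks roughly like this in a higher-level language.  This is an
illustration only, and the name read_packet is made up.

```python
import socket

MAX_PACKET = 1500                 # largest message urld will read
_buf = bytearray(MAX_PACKET)      # allocated once and reused

def read_packet(sock):
    """Read one datagram, never storing more than MAX_PACKET bytes."""
    nbytes, addr = sock.recvfrom_into(_buf, MAX_PACKET)
    # Only the first nbytes of the buffer belong to this datagram.
    return bytes(_buf[:nbytes]), addr
```

Because the receive call is told the buffer size, it never writes past
the end of the buffer.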

Urld does have to run as root on UNIX systems because it uses
broadcast sockets. (Although at some point, perhaps linux will
have a capability for that?).  It writes to 255.255.255.255 from the broadcast
IP point of view.  This is called "limited broadcast".  Urld does
not use directed broadcast.  It also cannot write messages faster
than one message per second.  No message can be larger than 1500 bytes.

---------------
5. contact us

Sumit Chawla at sumit@cs.pdx.edu

Jim Binkley at jrb@cs.pdx.edu