URLD - A wireless-oriented web page discovery program.

This document is an introduction to urld.

Jim Binkley, jrb@cs.pdx.edu
Sumit Chawla, sumit@cs.pdx.edu

Outline:

1. basic idea
     introduction
     usage scenarios
2. help needed
     port it to something
     demo it on something
     do something nifty and unexpected with it
3. implementation/how it works (some details)
4. security considerations
5. contact us

---------------------------
1. Basic Idea

Urld reads and writes UDP-based broadcast messages made up of World Wide
Web Uniform Resource Locators (URLs). These messages consist of a system
identity string and an associated set of 1 or more HTML tags; for example,
one URL might consist of:

    http://www.cs.pdx.edu, "PSU CS department page"

And the logical output in the web page created by urld on some other
system might look like:

    131.252.201.4 homebrew.cs.pdx.edu
    PSU CS department page          <----- a url
    ...
    ----------------------

Messages from nearby nodes are sent to IP limited broadcast and are
written to a local html file on a receiving system. For example, on UNIX,
the default output file is /tmp/urld.html. This file may be viewed by any
web browser via file:/tmp/urld.html. As a result one can determine which
systems are nearby. In summary, systems advertise WWW URLs to each other.

Thus we can distinguish between two urld runtime modes, which we call
"reading" and "writing". A system may be a reader (receives URLs and puts
them in a local html page), a writer (sends URLs), or both. Doing both is
the default.

A wireless laptop or PDA user might minimally want to be a reader, to find
locally hearable servers advertising through public access points. A fixed
server (a wired box) might be a writer hooked up on the same network as an
802.11 access point. Thus urld serves as a way to advertise local
information through the access point to wireless systems that are in the
same "cell". Of course remote URLs reachable via the Internet can be
advertised too.
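The exchange described above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions, not urld's code:
the tab/newline message layout and the function names are hypothetical
(the real wire format is the Tag Length Value encoding described in
docs/urld_protocol.txt), and the HTML is a guess at the "logical output",
not urld's actual page. The port number 3534 and the limited broadcast
address are taken from this document.

```python
import html
import socket

URLD_PORT = 3534                      # urld's UDP port, per this document
LIMITED_BROADCAST = "255.255.255.255"


def encode_advert(identity, urls):
    """Pack an identity string and (url, description) pairs into bytes.

    HYPOTHETICAL text layout for illustration only; urld itself uses
    the TLV format in docs/urld_protocol.txt.
    """
    lines = [identity] + [f"{url}\t{desc}" for url, desc in urls]
    return "\n".join(lines).encode("ascii")


def decode_advert(payload):
    """Inverse of encode_advert: (identity, [(url, description), ...])."""
    identity, *rest = payload.decode("ascii").split("\n")
    return identity, [tuple(line.split("\t", 1)) for line in rest]


def render_html(sender_ip, identity, urls):
    """Render one sender's advertisement as an HTML fragment."""
    items = "".join(
        f'<li><a href="{html.escape(url, quote=True)}">'
        f"{html.escape(desc)}</a></li>\n"
        for url, desc in urls
    )
    header = f"<p>{html.escape(sender_ip)} {html.escape(identity)}</p>\n"
    return header + "<ul>\n" + items + "</ul>\n"


def send_advert(payload, port=URLD_PORT):
    """Send one datagram to the IP limited broadcast address."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(payload, (LIMITED_BROADCAST, port))


# Round trip without touching the network.
payload = encode_advert(
    "homebrew.cs.pdx.edu",
    [("http://www.cs.pdx.edu", "PSU CS department page")],
)
identity, urls = decode_advert(payload)
print(render_html("131.252.201.4", identity, urls))
```

Nothing is sent unless send_advert(payload) is called explicitly. The
descriptions are HTML-escaped before being written out, since the text
comes from whoever happens to be broadcasting nearby.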
An advertised URL does not need to be local. (Local content can live on a
remote web server.) However, the advertising system is local, or at least
hearable via a local broadcast.

The server maintainer might not be too interested in seeing who showed up
at the cafe, and might simply run in write-only mode. On the other hand, a
laptop/PDA user (hereafter a Mobile User) might want to be both a reader
and a writer, to learn about other urld systems nearby (including special
monthly coffee deals at the local coffee emporium that is acting as a
public 802.11 site) and/or to advertise the cool web pages that said
Mobile User is making available either on their laptop or someplace else.

We suggest that by default everybody run reader/writer, but you have to
make your own decision. A Mobile User might not wish to let others know
that he/she is lurking nearby. DE GUSTIBUS NON DISPUTANDUM EST. (How do
you say mobile in Latin?)

btw, urld now has an official IANA-approved UDP port, 3534.

1.1 usage scenarios

Let us suggest three possible usage scenarios:

1. server advertisements

Assume you have urld, and are the owner of an Access Point that you have
made available to the public somehow, be it a for-pay scheme or a free
scheme. Let's assume your company sells almond lattes, and is called
lottajava inc., and that you have a web server somewhere (possibly on your
local urld server or elsewhere) with a web page set up to advertise your
company either locally or nationally. This "web server" could be on an
openAP system, of course. Or it could be somewhere else entirely. This is
your choice. We would point out though that "local" urls may be better,
because you are trying to advertise to *local* customers.
You could hook up urld as follows:

    -------------
    |  server   |  <--- runs urld and writes to ethernet broadcast
    -------------
          |
          | ethernet (urld writes urld message)
    ----------------------------------------------------
          |
          | ethernet
    802.11 Access Point
          |
          | WLAN side
    -------------------------------------
      |       |       |        local wireless domain
     MU1     MU2     MU3       urld readers

The above could be collapsed/integrated onto a UNIX system that has openAP
capability. Note that the AP is a bridge. We expect broadcast packets sent
on the ethernet to wander onto the wireless link (which is how things work
anyway).

You create a web page on the server (or somewhere), and set up the urld
configuration as a writer, to advertise your url as follows:

    http://www.lottajava.com LottaJava Inc page

Your server (we will assume it is a UNIX box, say running linux, with an
ethernet port called eth0) has urld on it, and possibly a web server for
www.lottajava.com, although the web server could be in Jamaica. You run
urld in writer mode, and it writes out your url above. Your customers can
see it, since they are running urld in reader mode. So your url is stored
in the customers' urld read-side output files. The customer simply uses
any web browser, displays the local urld file, and then clicks on your url
to visit your page. By default a urld writer sends messages every 10
seconds, and a reader throws away urls that are not refreshed within
around 30 seconds.

2. mobile node advertisements (Mobile Users as peers)

In theory, with an 802.11 AP in managed mode, it MAY be possible for
Mobile Users to see other Mobile Users. (We need to widely deploy urld and
see what features or misfeatures of APs exist in that arena. In theory,
this should work. In practice, it HAS worked, but there is no telling how
random APs may behave in this regard.) So for example, MU1 above at the
coffee shop should be able to see that MU2 and MU3 are "nearby". This
assumes of course that MU2 and MU3 are writing.
If they are only reading, you won't know about them from the urld point of
view. Lurking nodes are certainly possible.

3. ad hoc applications based on #2

If Mobile Nodes can send messages, it should be possible to build
higher-level applications that could take the file:/tmp/urld.html file as
input (or a pure XML version) and thus determine local systems (local
peers). This might allow systems that are in the same broadcast domain
(broadcast area) to exchange files in a peer-to-peer fashion. One could
write a messaging application or an N-party game as well. XML probably has
a role to play here.

---------------
2. help needed

We need the assistance of a community committed to making this work. We
submit that urld is a mobile-wireless application, and can have widespread
applicability in helping to make public (and private) wireless nodes
popular, especially with the people bringing up APs for public use.

How can others help?

1. port urld to something else.

Ideally, urld needs to be as universal as possible. We have supplied
linux/freebsd/WIN32 and (not yet) java versions. (They may have bugs and
can stand more testing too.) Urld can stand to be ported to other
platforms. If you do so, please resubmit your code with a binary for
re-release.

2. set urld up and test it and demonstrate it to others.

Propaganda efforts are needed and are important. Urld needs to be deployed
on wired servers so that wireless customers can take advantage of it.

3. take urld and engineer up some higher-level application for it.

Something using XML would be a very nice idea.

---------------
3. Some implementation details

Note we have supplied WIN32, FreeBSD, and Linux capabilities, as well as a
java script.

In this section, we present a few implementation details. Urld is in some
sense "simple", and in other ways maybe not so simple.

3.1 sockets

Of course, urld uses UDP sockets. There is a reader socket, and 1 to many
writer sockets. Writer sockets are per interface.
1 to N interfaces may be specified in the config file. Each interface
listed means urld is supposed to write the broadcast (or multicast) urld
packet out that interface. The mechanisms used to bind "broadcast" output
to an interface vary and are not terribly interoperable. On BSD, it is a
pain in the rear end, as you have to use the Berkeley Packet Filter (bpf).
On linux there is a nice socket option that makes it easy.

Thus we can distinguish several different possible capabilities, like so:

    can read broadcast and/or multicast
    can write broadcast and/or multicast to a "default" interface
    can write broadcast to a second interface, that is not the first
        interface (according to ifconfig -a)
    can write broadcast/multicast to > 1 interface

In general linux/FreeBSD systems can do all of the above, except that
FreeBSD cannot write unless an interface command is explicitly given.
Sumit came up with a way to make this fairly flexible with WIN32 as well.

3.2 write side

The writer takes urls specified in the urld.conf file and writes them out
1-N interfaces in a Tag Length Value format. The protocol itself is
specified in docs/urld_protocol.txt, and is fairly straightforward and
easily extensible (similar to radius when it gets down to it).

Writes are coordinated by the sendTime configuration setting, which like
all urld timers is measured in seconds. Of course, this is done with
"alarm" or any functional equivalent that gives you seconds. Logically
urld can be divided into a writer thread and a reader thread. However, on
UNIX, the two threads can be "simulated", possibly with an alarm signal
and the select(2) system call.

3.3 read side

The read side reads ALL packets sent to broadcast or to multicast
224.0.0.1. Output is "filtered" in the sense that the MD5 message digest
function is used to learn whether urls are "new" or not within the expire
number of seconds. If not new, urls are ignored.
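The read-side filtering just described can be sketched as follows. This
is a minimal illustration under stated assumptions, not urld's code: the
class name, the choice of fields fed to MD5, and the explicit "now"
argument are all hypothetical. The 30 second expiry echoes the "around 30
seconds" default mentioned earlier in this document.

```python
import hashlib


class SeenUrls:
    """Remembers MD5 digests of received urls and when each was last heard."""

    def __init__(self, expire_seconds=30):
        self.expire_seconds = expire_seconds
        self._last_heard = {}          # digest -> time last heard (seconds)

    def observe(self, sender, url, description, now):
        """Record one received url; return True if it counts as new."""
        # Hypothetical key: which fields urld actually hashes is not
        # stated in this document.
        key = "\0".join((sender, url, description)).encode("utf-8")
        digest = hashlib.md5(key).digest()
        last = self._last_heard.get(digest)
        is_new = last is None or now - last > self.expire_seconds
        self._last_heard[digest] = now
        return is_new

    def expire(self, now):
        """Drop urls not refreshed in time; return True if any were dropped."""
        stale = [d for d, t in self._last_heard.items()
                 if now - t > self.expire_seconds]
        for digest in stale:
            del self._last_heard[digest]
        return bool(stale)


seen = SeenUrls(expire_seconds=30)
entry = ("131.252.201.4", "http://www.cs.pdx.edu", "PSU CS department page")
print(seen.observe(*entry, now=0))    # True: first time heard
print(seen.observe(*entry, now=10))   # False: refreshed, nothing changed
print(seen.expire(now=45))            # True: last heard 35 seconds ago
print(seen.observe(*entry, now=50))   # True: heard again after timing out
```

A True result from either method is the point at which a reader would
rewrite its output HTML file; a False result means the file can be left
alone.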
As a result, urld does not write out its output HTML file unless urls
actually change for some reason. It will, however, write that file out
immediately upon reception of any packet that does cause a change. Urls
will time out eventually, which will also cause a rewrite of the file. In
addition, an optional urlTime timer is provided that sets the automatic
HTML "rewrite" pragma timer, which in theory should cause a browser to
automatically "reload" the page.

---------------
4. security considerations

The fundamental problem with urld security is likely no different from
using the web elsewhere. All urld does is produce a web file. When you
click on something in that web file, be careful what you download, and
especially what you download and execute with a web browser. Common sense
should apply here. For example, if someone offers up a web page that
consists of a word document, urld isn't going to make downloading and
viewing it any more or less safe. It isn't going to prevent you from
downloading and executing a trojan horse program.

Urld does not execute anything. It also limits the number of urls received
per system, and the total number of systems that can be heard from. Input
is ASCII and is placed in the output html file.

Urld's read buffer is not on the stack. It is limited to 1500 bytes. The
size is checked via the recvfrom(2) system call. This should limit the
possibilities of any buffer overflow attacks.

Urld does have to run as root on unix systems because it uses broadcast
sockets. (Although at some point, perhaps linux will have a capability for
that?)

It writes to 255.255.255.255 from the broadcast IP point of view. This is
called "limited broadcast". Urld does not use directed broadcast. It also
cannot write messages faster than one message per second. No message can
be larger than 1500 bytes.

---------------
5. contact us

Sumit Chawla at sumit@cs.pdx.edu
Jim Binkley at jrb@cs.pdx.edu