OpenSimulator’s login process and common login problems
May 26, 2011
Posted by justincc in opensim, opensim-arch, opensim-tech-basics, secondlife, viewers, virtual-environments, virtual-worlds.
Today I’m going to write about how a viewer logs in to an OpenSimulator grid. This is a considerably more complicated process than a simple website login. If you ever need to find out why login isn’t working it really helps to know what’s going on under the hood.
For simplicity’s sake, we’ll look at the standalone case, where all the regions and grid services are running under a single OpenSim.exe process.
Step 1: First, the viewer sends an XMLRPC message to the Login Service URI containing the name, password, viewer version and other details. The -loginuri parameter on the command line tells the viewer where the login service is (here it’s 192.168.1.2 and the port number is 9000). In viewers such as Imprudence, the viewer’s grid manager can fetch this information from the grid info XML that the grid advertises at a well-known URL.
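To make Step 1 concrete, here is a minimal Python sketch that builds such an XMLRPC request body without sending it. The method name login_to_simulator, the field names and the "$1$" + MD5 password form are my recollection of the Second Life style login protocol and should be treated as assumptions; the channel and version values are placeholders.

```python
import hashlib
import xmlrpc.client

# The login URI from the example above (standalone on 192.168.1.2, port 9000).
LOGIN_URI = "http://192.168.1.2:9000/"


def build_login_request(first, last, password, start="last"):
    """Return the XML-RPC request body for a login attempt (nothing is sent)."""
    # Assumption: the password is sent as "$1$" followed by its MD5 hex digest.
    passwd = "$1$" + hashlib.md5(password.encode("utf-8")).hexdigest()
    params = {
        "first": first,
        "last": last,
        "passwd": passwd,
        "start": start,               # where to place the user: "last" or "home"
        "channel": "Example Viewer",  # hypothetical viewer name
        "version": "0.0.0",           # hypothetical viewer version
    }
    # Assumption: the login method is called "login_to_simulator".
    return xmlrpc.client.dumps((params,), methodname="login_to_simulator")


body = build_login_request("Test", "User", "secret")
print(body)
```

To actually send the request, something like xmlrpc.client.ServerProxy(LOGIN_URI) could be used against a running login service. A real viewer sends more fields than this sketch does.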
Step 2: The login service uses these details to check that the password is correct. If it is then it looks up the simulator that the user should be placed in (e.g. last or home). In this case there’s only one region called “My Region”. The login service will generate a random ‘circuit code’ and ask the simulator to record this code and set up an ‘agent’ in My Region. The agent represents a user in an OpenSim region.
Step 3: Once the simulator has generated the agent, it returns the randomly generated ‘circuit code’ to the login service. The login service will package this up together with the IP address and port of the region and return it to the viewer in a reply XMLRPC message. The login service gets the region’s IP address and port from the ExternalHostName and InternalPort entries in the bin/Regions/Regions.ini file (config files there can also have other names as long as they end with .ini). In this case the entry is

[My Region]
RegionUUID = dd5b77f8-bf88-45ac-aace-35bd76426c81
Location = 1000,1000
...
InternalPort = 9000
ExternalHostName = 192.168.1.2

The host name here is 192.168.1.2 (the same as the login service, since we’re on a standalone) and the internal port is 9000. This is the same number as the login service port, but in this case it specifies the port that the client should use for UDP messages between itself and the simulator (we’ll ignore HTTP capabilities in this post). The ExternalHostName can also be SYSTEMIP, in which case the default IP of the machine hosting the simulator is used (which would also be 192.168.1.2).
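If you want to double-check what address and port a region file will hand out, a rough sketch like the following can read the entries. It uses Python’s configparser, which is not the parser OpenSimulator itself uses, so treat it only as an approximation. The region_endpoints function and its system_ip parameter are names invented for this example, and the sample text leaves out the elided lines of the real entry.

```python
import configparser

# A cut-down copy of the region entry above (elided lines left out).
SAMPLE = """\
[My Region]
RegionUUID = dd5b77f8-bf88-45ac-aace-35bd76426c81
Location = 1000,1000
InternalPort = 9000
ExternalHostName = 192.168.1.2
"""


def region_endpoints(ini_text, system_ip=None):
    """Map each region name to the (host, UDP port) the viewer would be given."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    endpoints = {}
    for name in parser.sections():
        host = parser.get(name, "ExternalHostName")
        if host == "SYSTEMIP":
            # Stand-in for "the default IP of the hosting machine";
            # the caller supplies it rather than us guessing.
            host = system_ip
        endpoints[name] = (host, parser.getint(name, "InternalPort"))
    return endpoints


print(region_endpoints(SAMPLE))
```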
Step 4: When the viewer receives the XMLRPC reply, it extracts the circuit code, simulator IP and port. To make first contact with the simulator, it sends it a UseCircuitCode UDP message containing the circuit code. The simulator compares this against the circuit code that the Login Service gave it for that client. If they match then the simulator sends back an Ack packet and the two start talking to each other (i.e. the viewer gets sent terrain, object and texture information and the user can move around the region). If they don’t match then the simulator logs a warning and ignores the UseCircuitCode message.
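Steps 2 to 4 can be summarised as a toy in-memory model. This is not OpenSimulator code: the class names, the dictionary keyed by agent ID and the reply fields are all made up for illustration, and no network traffic is involved.

```python
import secrets


class ToySimulator:
    """Remembers which circuit code to expect for each agent."""

    def __init__(self):
        self.expected = {}   # agent id -> circuit code from the login service
        self.warnings = []

    def add_agent(self, agent_id, circuit_code):
        # Step 2: the login service asks us to record the code and set up an agent.
        self.expected[agent_id] = circuit_code

    def on_use_circuit_code(self, agent_id, circuit_code):
        # Step 4: compare the viewer's code with the one we were given.
        if agent_id in self.expected and self.expected[agent_id] == circuit_code:
            return "Ack"
        self.warnings.append("unexpected circuit code from %s" % agent_id)
        return None  # the message is ignored


class ToyLoginService:
    def __init__(self, simulator, sim_host, sim_port):
        self.simulator = simulator
        self.sim_host = sim_host
        self.sim_port = sim_port

    def login(self, agent_id):
        # Step 2: generate a random circuit code and register it with the simulator.
        code = secrets.randbelow(2 ** 31)
        self.simulator.add_agent(agent_id, code)
        # Step 3: return the code plus the region's address to the viewer.
        return {"circuit_code": code, "sim_ip": self.sim_host, "sim_port": self.sim_port}


sim = ToySimulator()
service = ToyLoginService(sim, "192.168.1.2", 9000)
reply = service.login("agent-1")
print(sim.on_use_circuit_code("agent-1", reply["circuit_code"]))
```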
Whew, quite a process, eh? As you can imagine, there’s a lot that can go wrong. Let’s go through the possible problems.
Viewer has wrong loginuri
This is an easy one. If the viewer is trying to log in with the wrong URI (e.g. 192.168.1.3 in the example above) or the wrong port (e.g. 9001) then you’ll get something like an “Unable to connect to grid” message; nothing will ever reach the Login Service.
Viewer has wrong login credentials
Another easy one. The Login Service will reject the credentials and tell the viewer, which will display a “Could not authenticate your avatar” message.
A firewall prevents the Login Service from replying to the viewer
In this case the viewer can send some initial TCP packets to the Login Service but can’t get anything back. As above, the viewer will present an “Unable to connect to grid” message but this time after a longer pause until it times out on the Login Service connection.
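A rough way to tell these login service cases apart from the viewer machine is to try a plain TCP connection to the login URI’s host and port and see how it fails. The sketch below is only a heuristic, and probe_tcp is a name made up for this example: a refused connection suggests that nothing is listening on that port, while a timeout is more consistent with a firewall silently dropping packets or with a host that doesn’t exist.

```python
import socket


def probe_tcp(host, port, timeout=3.0):
    """Try a TCP connection and describe, roughly, how it went."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"   # packets flow in both directions
    except ConnectionRefusedError:
        return "refused"         # host reachable, nothing listening on the port
    except socket.timeout:
        return "timeout"         # no reply at all within the timeout
    except OSError as exc:
        return "error: %s" % exc  # e.g. no route to host


if __name__ == "__main__":
    # Address and port taken from the example above.
    print(probe_tcp("192.168.1.2", 9000))
```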
Viewer receives misconfigured external host name from Regions.ini
Now it gets more complicated. Suppose that instead of putting 192.168.1.2 in the My Region config I accidentally put 192.168.1.3, which is a machine that doesn’t exist on my network.

[My Region]
RegionUUID = dd5b77f8-bf88-45ac-aace-35bd76426c81
Location = 1000,1000
...
InternalPort = 9000
ExternalHostName = 192.168.1.3

In this case, the first part of the login process works okay and the progress bar moves along in the viewer. But when the Login Service returns the simulator information to the viewer, it returns the ExternalHostName of 192.168.1.3 instead of 192.168.1.2. The viewer will make a number of attempts to contact this non-existent simulator for the second part of the login, and so appear to hang for a while on a message such as “Waiting for region handshake…” before failing with a “We’re having trouble connecting…”
In this case, since 192.168.1.3 has no machine, a simple ping will reveal the mistake. If there is a machine at that address or it’s the port number that is wrong then things are more complicated. It’s difficult to diagnose problems here since UDP, unlike TCP, is connectionless. If you have a utility like netcat available on the viewer machine, you can try sending nonsense to the address and port given in Regions.ini. For instance, above we could do

echo "foo" | nc -u 192.168.1.2 9000

and the simulator would print out a “Malformed data” message.
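If netcat isn’t available on the viewer machine, the same probe can be sketched in a few lines of Python. The send_udp_probe function is a name invented for this example. As with netcat, nothing comes back to the sender; you have to watch the simulator console to see whether the datagram arrived.

```python
import socket


def send_udp_probe(host, port, payload=b"foo\n"):
    """Send one UDP datagram and return the number of bytes handed to the OS."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # UDP is connectionless, so a successful send says nothing about delivery.
        return sock.sendto(payload, (host, port))


if __name__ == "__main__":
    # Address and port from the Regions.ini entry above.
    send_udp_probe("192.168.1.2", 9000)
```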
Viewer can’t reach region external host name
Now let’s suppose that the ExternalHostName and InternalPort are correct, but the viewer can’t reach that address for some reason (e.g. UDP messages to that port are blocked by a firewall). You’ll see exactly the same symptoms as if the host name is misconfigured. The diagnostics are also the same, with the addition that you need to thoroughly check your firewall and other network settings.
You can also see this if you’ve specified a public IP for ExternalHostName but you’re attempting a connection from within your LAN and your router does not support NAT loopback. The easiest solution is to get a router that does support NAT loopback though you might also want to try the workarounds listed on that wiki page.
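As a quick sanity check for the NAT loopback situation, the following sketch flags the combination of a public ExternalHostName and a viewer on a private LAN address. It is only a heuristic built on Python’s ipaddress module, and may_need_nat_loopback is a hypothetical helper name. It can’t tell you whether your router actually supports loopback.

```python
import ipaddress


def may_need_nat_loopback(external_host, viewer_ip):
    """True if the region advertises a public IP but the viewer is on a private one.

    Returns None when external_host is not an IP literal (e.g. a DNS name
    or SYSTEMIP), since this sketch does no name resolution.
    """
    try:
        external = ipaddress.ip_address(external_host)
    except ValueError:
        return None
    viewer = ipaddress.ip_address(viewer_ip)
    return (not external.is_private) and viewer.is_private


# 8.8.8.8 stands in for "some public address"; it is not an OpenSimulator host.
print(may_need_nat_loopback("8.8.8.8", "192.168.1.50"))
print(may_need_nat_loopback("192.168.1.2", "192.168.1.50"))
```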
A firewall prevents the simulator from replying to the viewer
Unlike the firewall blocking the login service reply above, this time the first part of the login process will complete correctly and the simulator will even receive the UseCircuitCode message. However, the Ack that it replies with (and any other UDP messages) is blocked by a firewall. In the simulator log you will see messages such as
[LLUDPSERVER]: Ignoring a repeated UseCircuitCode from 2c3b8307-e257-4d1e-b12f-76f2b8f50ee9 at 192.168.1.3:1208 for circuit 546230463
as the viewer resends the UseCircuitCode packet another 3 times (while it displays the “Waiting for region handshake…” message). Eventually, the viewer gives up and displays the “We’re having trouble connecting…” message. In this case, you need to carefully check that your firewall allows outbound UDP messages from the simulator to the viewer’s IP address.
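If the simulator log is long, a small script can pick out this symptom. The sketch below counts the “Ignoring a repeated UseCircuitCode” lines per viewer address and circuit; the regular expression is modelled only on the single log line shown above, so other OpenSimulator versions may format the message differently.

```python
import re
from collections import Counter

# Modelled on the example log line above; the exact format is an assumption.
PATTERN = re.compile(
    r"\[LLUDPSERVER\]: Ignoring a repeated UseCircuitCode from "
    r"(?P<agent>[0-9a-fA-F-]{36}) at (?P<ip>[\d.]+):(?P<port>\d+) "
    r"for circuit (?P<circuit>\d+)"
)


def count_repeats(log_lines):
    """Count repeated UseCircuitCode messages per (viewer IP, circuit code)."""
    counts = Counter()
    for line in log_lines:
        match = PATTERN.search(line)
        if match:
            counts[(match["ip"], int(match["circuit"]))] += 1
    return counts


example = (
    "[LLUDPSERVER]: Ignoring a repeated UseCircuitCode from "
    "2c3b8307-e257-4d1e-b12f-76f2b8f50ee9 at 192.168.1.3:1208 for circuit 546230463"
)
print(count_repeats([example] * 3))
```

Several repeats for the same circuit with no successful login in between would point towards the simulator’s replies not reaching the viewer.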
As you can see, the login process is complicated. Much of this complexity exists so that in grid mode simulators can be hosted on different machines to the login service.
In grid mode, all of the above information still applies, with the addition that the login service and simulators communicate over a network rather than within a single process. This is another point of failure. If there’s a problem here then you should see an error in the login service log and the viewer will return with an “Unable to connect to grid” message.