This week's topic is URLs and how to interpret them. Many of you, especially students, will know most of this already, but bear with me and I think you'll find a few new things.
So what exactly does "URL" mean, anyway?
It stands for "Uniform Resource Locator"; the name comes from the fact that it was designed as a common or uniform method of stating the location of many different kinds of files or resources on the Internet, as opposed to the widely varying standards in place at the time. Originally it stood for "Universal Resource Locator" though this was changed some years ago when it was realized it would never quite supplant every other type of address out there.
How do you pronounce "URL"?
Most commonly, it is pronounced by spelling it out, though some people say it as a word sounding like "Earl". Both are correct as long as the person you're talking to understands what you mean.
Why do they look the way they do?
There are three main parts of almost every URL: the protocol identifier, the host name, and the path. As an example, let's look at a basic URL: "http://www.wou.edu/ucs/index.html", the address of the UCS home page.
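If you are curious how a program might pull those three parts apart, here is a minimal sketch in Python using the standard library's urllib.parse module. The variable names are my own, chosen only for illustration, and note that Python reports the protocol identifier without its trailing colon.

```python
from urllib.parse import urlsplit

# The example URL from the text above.
url = "http://www.wou.edu/ucs/index.html"
parts = urlsplit(url)

protocol = parts.scheme   # the protocol identifier, without the colon
host = parts.hostname     # the host name
path = parts.path         # the path, including its leading slash

print(protocol)  # http
print(host)      # www.wou.edu
print(path)      # /ucs/index.html
```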
What is the protocol identifier?
The protocol identifier is everything before the double slash; in this case "http:". This stands for the HyperText Transfer Protocol, the method by which browsers request information from web servers. URLs starting with "http:" are the addresses of web pages or other files accessible over the Web. Other common protocol identifiers you might see include "https:" for secure web pages, "ftp:" for file transfers, and "mailto:" for email addresses.
Note that protocol identifiers are usually spelled out in lowercase, but really it doesn't matter if they are capitalized or not.
What is the host name?
The host name comes between the double slash and the first single slash; in this case, "www.wou.edu". As you can see, the host name is divided by dots; the first part is the name of the actual computer that hosts the file being requested, and the remaining parts make up the domain name, which tells your browser where to look to find that computer.
Note that "www" is just a commonly used name for web servers; really, a web server can be named anything, so don't be surprised if you don't see "www" anywhere in a URL. Also note that, like protocol identifiers, host names are not case-sensitive; you can type them in capitals or lowercase or any combination thereof, though lowercase is most commonly used.
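To see the case rule in action, here is a small sketch, again assuming Python's urllib.parse. The helper name server_key is hypothetical; it relies on the fact that urlsplit lowercases the protocol identifier and that the hostname attribute lowercases the host name.

```python
from urllib.parse import urlsplit

def server_key(url):
    """Return the (protocol, host) pair in lowercase form."""
    parts = urlsplit(url)
    # urlsplit() lowercases the scheme; .hostname lowercases the host.
    return (parts.scheme, parts.hostname)

shouting = server_key("HTTP://WWW.WOU.EDU/ucs/index.html")
normal = server_key("http://www.wou.edu/ucs/index.html")

print(shouting == normal)  # True
```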
The domain name is actually optional when you are within that domain; for example, when you are on the campus network, you can put http://www/ucs/index.html into your browser and it will return the same page as http://www.wou.edu/ucs/index.html because you are within the "wou.edu" domain already. There's no real reason to do this, though, except maybe to save a few keystrokes if you have to type in the URL by hand; if you ever send a URL to somebody else, you should use the full host address including the domain name.
How do I interpret domain names?
Domain names work from most specific to most general; "wou.edu" indicates the piece of the Internet run by WOU, and that WOU is an educational institution. There might be an address out there called "wou.com", which indicates a commercial business, perhaps named "Widgets: Original and Unlimited". As it happens, "wou.com" has actually been registered by a company that is in the business of registering domain names they don't really need and reselling them, but that doesn't affect us here at WOU because our domain name is under the ".edu" top-level domain.
Note that spammers or scammers sometimes try to take advantage of some people's confusion about domain names; for instance, you might get a scam email claiming to be from PayPal and asking you to click a link at "www.online-paypal.com". The domain name "online-paypal" could be owned by anyone in the world; just because it has the word "paypal" in it doesn't mean it has anything to do with the company named PayPal. Remember that the dot is the only character that separates parts of the domain name; a dash, underscore, or anything else is just part of that piece of the domain name.
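The "only the dot separates" rule can be turned into a simple check. This is just a sketch: the function name belongs_to is my own, and I am assuming for illustration that "paypal.com" is the domain you actually trust.

```python
def belongs_to(host, domain):
    """True if host is the domain itself or a name inside it.

    Only a dot counts as a separator, so the host must either equal
    the domain or end with "." followed by the domain.
    """
    host = host.lower()
    domain = domain.lower()
    return host == domain or host.endswith("." + domain)

print(belongs_to("www.paypal.com", "paypal.com"))         # True
print(belongs_to("www.online-paypal.com", "paypal.com"))  # False
```

The second host fails because "online-paypal" is a single piece of the name; the dash does not split it.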
What about a longer domain name like "www.bbc.co.uk"?
Top-level domains like ".com", ".org", and ".net" are generic and can be registered by anyone in the world, though they are most heavily used in the United States; ".edu" is, for the most part, limited to US educational institutions. Domain names ending in a two-letter country code such as ".uk" or ".de" are registered under that country's domain, and the computers are usually, though not necessarily, located there; "uk" stands for "United Kingdom", i.e. Great Britain and Northern Ireland, and "de" for Deutschland, i.e. Germany. The ".co" in the domain name denotes a commercial entity, similar to ".com" in the USA. The "bbc" part indicates that the computer belongs to the British Broadcasting Corporation, and the "www", of course, is the name of the specific computer, in this case the BBC's main web server.
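Since host names run from most specific to most general, it can help to read them backwards. Here is a tiny sketch of that idea; the function name general_to_specific is hypothetical.

```python
def general_to_specific(host):
    """Split a host name on dots and list the pieces, most general first."""
    return list(reversed(host.lower().split(".")))

pieces = general_to_specific("www.bbc.co.uk")
print(pieces)  # ['uk', 'co', 'bbc', 'www']
```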
As if this weren't confusing enough, some organizations in the US use the ".us" country code; many state and city governments, for instance, do this. Also, just in the last few years, more top-level domains have been approved, such as ".biz", ".info" and others; as time goes by, things will get increasingly complex.
How can I register a domain name of my own?
I can only cover the basics, as the details of the process would make this FAQ far too long. In general, what you need to do is go to the website of a domain name registration company (of which there are many) and search to find a domain name that is not already in use. Once you've found one, you pay a fee to the company to register that name. You'll have to renew that registration on a regular basis, usually yearly; you should be very careful to do this before it expires, because spammers often swoop in and register names that have just expired, hoping to take advantage of any traffic that goes to that domain.
Then you need to find a place to host the domain; there are far too many web hosting companies to list, and we cannot give out recommendations. You should consider many factors such as how much storage space you can get, how much traffic (AKA bandwidth) you are allowed, what other services (such as email lists, databases, and programming languages) the company offers, what their customer service is like, and of course the price.
There are many sites that specialize in comparing and contrasting various web hosting companies; I suggest going to google.com and searching for "compare web host companies" (without the quotes) to find some of these sites. Many include testimonials and reviews by customers, but you should be careful about deciding based on only one comparison site; if you are interested in a particular company, be sure to check them out in several comparison sites before deciding.
Once you've decided on a hosting company and paid them their fee, you need to upload web pages to the location they give you. Usually this is done via FTP. You can also configure email addresses and any other services the company provides. After that comes the problem of getting people to visit the site, but that is an entirely separate issue and will not be covered here.
Note that WOU does not register domain names, or host domain names registered elsewhere, except in very special cases such as "www.wouwolves.com", owned by the athletic department.
Right, let's get back to URLs, then. What's the last part of the URL?
That is the path. Just as the host name states the particular computer the resource is located on, the path indicates the specific file on that computer. It is divided up by slashes, indicating successive layers of folders, and usually ends in a file name. The structure of the path is totally up to the administrator of the server to decide; there are a few standard practices, but none are actually required, and I don't have room to go into them anyway.
One important thing to remember is that on Unix and Linux computers, which make up about two-thirds of all servers on the Internet, path names are case sensitive. This means that a word with capital letters is considered different from one with lowercase. For instance, since our web server runs Unix, you will get an error if you try to go to http://www.wou.edu/UCS/index.html instead of http://www.wou.edu/ucs/index.html. If you have to type in a URL, just to be on the safe side you should make sure to get the capitalization exactly right.
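Here is a toy model of why the capitalized path fails. It is only a sketch: a real web server looks in its file system, while this one looks in a made-up set called FILES_ON_SERVER, but the exact-match behavior is the same idea.

```python
# Hypothetical stand-in for the files a case-sensitive server holds.
FILES_ON_SERVER = {"/ucs/index.html"}

def lookup(path):
    """Return 200 (OK) for an exact match, otherwise 404 (Not Found)."""
    # No lowercasing here: "UCS" and "ucs" are different names.
    return 200 if path in FILES_ON_SERVER else 404

print(lookup("/ucs/index.html"))  # 200
print(lookup("/UCS/index.html"))  # 404
```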
What if the URL ends in a slash, with no filename?
In this case, the web server will try to find an index file for the specified folder. For most web servers, including ours, the default index file name is "index.html", though many (again, including ours) are moving to the PHP language, so their index files will be named "index.php". If you go to http://www.wou.edu/ucs/ you will see that it is the same as http://www.wou.edu/ucs/index.html. It is a good idea to name the main page in a folder "index.html" because it makes it easy to find. If there is no file named "index.html" then our web server will try other names such as "index.php", "index.htm", "default.htm", and a few others; if no file by any of these names is in the folder, the web server will show a list of every file in the folder, including those that you might not have intended the public to see.
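The search described above can be sketched in a few lines. This is an illustration, not our server's actual configuration: the list INDEX_NAMES holds only the names mentioned in this FAQ, in the order given, and the function name resolve_folder is my own.

```python
# Index file names from the text, tried in this order. A real server's
# list is set by its administrator and may be longer.
INDEX_NAMES = ["index.html", "index.php", "index.htm", "default.htm"]

def resolve_folder(files_in_folder):
    """Decide what to send back for a URL that ends in a slash.

    Returns ("file", name) for the first index file found, or
    ("listing", names) with every file name if none is present.
    """
    for name in INDEX_NAMES:
        if name in files_in_folder:
            return ("file", name)
    return ("listing", sorted(files_in_folder))

print(resolve_folder({"index.php", "notes.txt"}))
# ('file', 'index.php')
print(resolve_folder({"grades.txt", "draft.doc"}))
# ('listing', ['draft.doc', 'grades.txt'])
```

The second call shows the risk mentioned above: with no index file, every name in the folder is exposed.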
What about URLs with that "~" character after the first slash?
That character is called a "tilde" and on most web servers including ours, it indicates that the following word is a username, and that the path after it starts from a folder in that user's home area reserved for public web pages. Here at WOU, that is the "public_html" folder in your personal drive; that folder is accessible to the world via the URL http://www.wou.edu/~yourusername/. Remember to name the main page in that folder "index.html" so people will be directed there instead of seeing the entire list of files in that folder.
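For the curious, here is roughly how a server might translate a tilde URL into a file location. Treat it as a sketch under assumptions: the HOME_ROOT value "/home" and the function name are hypothetical, and only the "public_html" folder name comes from the description above. It also does not guard against tricky paths such as ones containing "..", which a real server must do.

```python
from pathlib import PurePosixPath

# Hypothetical location of user home areas on the server.
HOME_ROOT = PurePosixPath("/home")

def user_path_to_file(url_path):
    """Map "/~username/rest" to HOME_ROOT/username/public_html/rest.

    Returns None if the path does not start with a tilde username.
    """
    if not url_path.startswith("/~"):
        return None
    username, _, rest = url_path[2:].partition("/")
    if not username:
        return None
    target = HOME_ROOT / username / "public_html"
    if rest:
        target = target / rest
    return str(target)

print(user_path_to_file("/~yourusername/photos/trip.html"))
# /home/yourusername/public_html/photos/trip.html
```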
What about URLs that change as soon as I type them in and hit Enter?
Those are web aliases. They are URLs that lead to another URL, which may be on the same server, a different one, or even in an entirely different domain. For instance, the URL http://www.wou.edu/faq is an alias that leads to the FAQ index at http://www.wou.edu/ucs/faq/. The address of the actual page may change at some point, but that alias will not; all you will ever have to remember is that alias. Another example is http://www.wou.edu/forums which is an alias leading to the forums index page at http://forums.wou.edu/jive4/index.jspa. We will set up aliases by request for anyone who has a web page at WOU; just email me at email@example.com and tell me the exact URL of the existing page, and the alias you want. We won't give out very short and generic aliases because that limits the choices available to others, but go ahead and ask for what you want because that decision is made on a case-by-case basis.
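Conceptually, an alias is just an entry in a lookup table. The sketch below uses the two aliases mentioned above; the table name ALIASES and the function resolve_alias are hypothetical, and a real web server would typically answer with an HTTP redirect rather than a Python dictionary lookup.

```python
# Alias path -> real URL, using the two examples from the text.
ALIASES = {
    "/faq": "http://www.wou.edu/ucs/faq/",
    "/forums": "http://forums.wou.edu/jive4/index.jspa",
}

def resolve_alias(path):
    """Return the URL an alias leads to, or None if it is not an alias."""
    # Treat "/faq" and "/faq/" as the same alias.
    return ALIASES.get(path.rstrip("/"))

print(resolve_alias("/faq"))      # http://www.wou.edu/ucs/faq/
print(resolve_alias("/forums/"))  # http://forums.wou.edu/jive4/index.jspa
print(resolve_alias("/ucs/index.html"))  # None
```

If the real page ever moves, only the right-hand side of the table changes; the alias people remember stays the same.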
I clicked on a URL in an email message and it didn't go to where it said it would go to! What happened?
First of all, you should not be clicking links in email messages. It is very easy, using a variety of tricks, to make an email program display a link as one thing when it really leads somewhere else. If there is any doubt, copy the text of the URL and paste it into your browser; that will defeat most tricks. Even then, though, you should be extremely careful. Never enter sensitive information in a web page you reached via a link in an email message, unless there is absolutely no doubt that the message came from the person you think it did, and you trust that person.
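One of the simplest tricks is a link whose visible text is one URL while the address behind it is another. Here is a sketch of how a program could spot that in an HTML message, using Python's html.parser and urllib.parse. The class and function names are my own, the sample message is made up, and this catches only this one trick, not the many others.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

class LinkCollector(HTMLParser):
    """Collect (real address, visible text) for every <a href=...> link."""
    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

def suspicious_links(html):
    """Return (shown host, real host) for links whose text is a URL
    pointing at a different host than the link really goes to."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    found = []
    for href, text in collector.links:
        if not text.lower().startswith(("http://", "https://")):
            continue  # visible text is not a URL, nothing to compare
        shown = urlsplit(text).hostname
        real = urlsplit(href).hostname
        if shown != real:
            found.append((shown, real))
    return found

# Made-up scam message for illustration.
message = '<a href="http://www.online-paypal.com/login">http://www.paypal.com/</a>'
print(suspicious_links(message))
# [('www.paypal.com', 'www.online-paypal.com')]
```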
Anyway, that's all I have time for today. This is only a basic overview; I've left out port numbers, CGI parameters, and other URL features, but you can feel free to ask me more questions by emailing me at firstname.lastname@example.org or posting them on the FAQ forum under the topic for this FAQ. (Remember that you can get to the forums at http://www.wou.edu/forums and log in with your email username and password.)
University Computing Services 503-838-8154 | or e-mail: email@example.com