Short URLs in Markup

Everything you need to know about rel-shortlink

Authors and administrators can and should use rel="shortlink" to indicate their preferred short URL1 for a piece of content. Below, I discuss history, reasoning, implementation details, and helpful hints.

Why rel="shortlink"?

Previous suggestions included rev="canonical", which is both confusing and technically incorrect2, and the less problematic rel="shorturl".

However, unlike the other variants, rel="shortlink" is a WhatWG standards body3 proposal and is going through the microformats process. A rel="shortlink" link is the way to go.

Terms used in this document

slug
The part of a short URL after the domain and slash.
service
A web app that takes a long URL and optionally a custom slug and returns a shorter URL that redirects to the original URL.

Implementation into content

The following examples are mostly from Microformats.org. The long, original resource URL is http://www.flickr.com/photos/tantek/3909804165/.

The #1 most important thing to do as an author or website administrator is to use the following HTTP header to indicate the "shortlink."4

Link: <http://flic.kr/p/6XuLyD>; rel="shortlink"
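As a rough sketch of how a site might emit this header from server-side JavaScript: the lookup table, the function names, and the handler shape (modeled on Node's http module) are all assumptions for illustration, not part of any standard.

```javascript
// Hypothetical lookup table: request path -> preferred short URL.
const SHORTLINKS = new Map([
  ['/photos/tantek/3909804165/', 'http://flic.kr/p/6XuLyD'],
]);

// Build the value of the Link header for a given short URL.
function shortlinkHeaderValue(shortUrl) {
  return '<' + shortUrl + '>; rel="shortlink"';
}

// Handler in the style of Node's http module: sets the Link header
// (only when a short URL is known) before the response body is written.
function addShortlinkHeader(req, res) {
  const shortUrl = SHORTLINKS.get(req.url);
  if (shortUrl) {
    res.setHeader('Link', shortlinkHeaderValue(shortUrl));
  }
}
```

Note that in Node, req.url includes any query string, so a real lookup would probably need to normalize the path first.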

Authors should also include the following in the HTML <head>5

<link rel="shortlink" type="text/html" href="http://flic.kr/p/6XuLyD">

The type attribute is optional and almost meaningless since the other end of the link is just a redirection back to the current page; thus the content type is already implied.

Both of the above representations are important and machine-readable. (You can test your implementation here.)
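On the consuming side, a client that has fetched the headers of the long URL could pick the shortlink out of the Link header roughly like this. This is a simplified sketch: findShortlink is a hypothetical helper, and the parsing deliberately ignores edge cases such as commas inside quoted parameter values.

```javascript
// Return the URL of the first entry in a Link header value whose rel
// contains "shortlink", or null if there is none.
function findShortlink(linkHeader) {
  // Each entry looks like: <url>; param=value; param=value
  const entryPattern = /<([^>]*)>((?:\s*;\s*[^;,]+)*)/g;
  let match;
  while ((match = entryPattern.exec(linkHeader)) !== null) {
    const url = match[1];
    const params = match[2];
    const rel = /;\s*rel\s*=\s*"?([^";,]*)"?/i.exec(params);
    // rel may hold several space-separated values.
    if (rel && rel[1].toLowerCase().split(/\s+/).includes('shortlink')) {
      return url;
    }
  }
  return null;
}
```

For the Flickr example above, findShortlink('<http://flic.kr/p/6XuLyD>; rel="shortlink"') returns the flic.kr URL.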

If authors wish to expose short URLs to human site visitors (e.g. to suggest short URLs in the <body>), they may do so any way they wish. As long as the above methods are also used, services (i.e. machines) will correctly understand the canonical shortlink.

However, there are still related microformat standards which maintain a degree of machine-readability for user-facing markup. Markup standardization is inherently good and encourages portability (of CSS styles and JavaScript code, for example).

<a rel="shortlink" type="text/html" href="http://flic.kr/p/6XuLyD">
 http://flic.kr/p/6XuLyD
</a>

or

<input type="text"
  class="shortlink"
  value="http://flic.kr/p/6XuLyD"
  readonly="readonly" />

The latter would present an input box with the short link, which makes selection for copy & paste easy. The readonly attribute prevents users from accidentally overwriting the link.

User selection of shortlinks may be further aided by executing the following JavaScript when users click the text input:

this.focus(); this.select();

A best-practice, unobtrusive jQuery code snippet to apply this to any and all input elements with a class of "shortlink" is as follows:6

$(document).ready(function() {
        // Auto-select short urls
        $('input.shortlink').click(function()
        {
                this.focus();
                this.select();
        });
});

Some authors also provide a button that copies the short URL to the clipboard using Flash (as this is not allowed in JavaScript).7 Be aware that Flash can crash Macs and that a growing number of users are using Flash blockers. You may only want to show the Flash-based copy button if Flash is both installed and not blocked. Let me know of any related best-practice code.

Shortening service implementations

  1. Shortening services should always use the "301 Moved Permanently" HTTP status code combined with a "Location:" header to redirect user-agents to the longer URL.

  2. The WhatWG suggests that short URLs consider avoiding lookalike glyphs such as 0/O and 1/l/I, "particularly when manual entry will be required (e.g. printed, spoken)."

  3. The WhatWG also suggests allowing for human-readable slugs (short URLs), which allows users to better guess what lies on the other side of the link. This means allowing custom, manually-set slugs.

  4. Whether or not capital letters are used is up to the service provider. It is easier to verbally dictate a URL of all lowercase, though mixed-case services can provide dramatically shorter URLs when hosting a large number of URL redirections. Thus it is my personal recommendation to consider using all lowercase when the redirection service is only expected to hold at most a few thousand redirections, and mixed-case otherwise.8

  5. Shortening services expected to produce four-letter or longer slugs, or allow the public to enter free-form slugs, should strongly consider blocking offensive terms for at least automatically-generated slugs.9

  6. In the spirit of the robustness principle,10 a URL shortening service should expect other programs to add extra punctuation on the end of short URLs it expects to resolve. The service should attempt to gracefully deal with this. For example, a user may write:

    ... these guidelines for short URLs (http://ajh.us/k).

    and a plaintext URL auto-linker11 may match this as

    http://ajh.us/k)

    in which case the service at ajh.us may receive a request for the URLs

    http://ajh.us/k),
    http://ajh.us/k%29, or perhaps even
    http://ajh.us/k).

    Thus, it is a best practice for services to use one of the following algorithms (in pseudocode) when redirecting12.

    Whitelist method (most forgiving):

    slug //from user
    pattern = /^[0-9a-z]+/i //whitelist of characters allowed in slug
    slug = regex_match(pattern, slug)
    if ! slug then fail
    

    Right-trim method (serves more 404s for truly horrible requests):

    slug //from user
    slug = urldecode(slug) //decode %-encoded characters e.g. %29
    characters_to_trim = ")>]}.,;!-"
    slug = right_trim(slug, characters_to_trim)
    if length(slug) == 0 then fail
    

    Let me know if additional erroneous trailing characters should be stripped in the second method.

    I would use whichever algorithm is faster in your environment.

  7. I am currently unsure as to the best way to deal with special characters like “”é() in long URLs. Such characters "shouldn't" be in HTTP URLs in the first place, but again, according to the robustness principle, services should support them, as they do show up in the wild.

    Please contact me if you are aware of a best practice for dealing with special characters in original URLs.
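To make points 1 and 6 above concrete, here is one way the two slug-cleaning methods and the 301 redirect might look in JavaScript. Treat it as a sketch under assumptions: the function names, the use of a Map as the lookup table, and the plain-object response description are mine for illustration, and a real service would need to adapt them to its own storage and framework.

```javascript
// Whitelist method: keep only the leading run of allowed characters.
function whitelistSlug(raw) {
  const match = /^[0-9a-z]+/i.exec(raw);
  return match ? match[0] : null; // null means "fail"
}

// Right-trim method: decode %-escapes, then strip trailing punctuation.
function rightTrimSlug(raw) {
  let slug;
  try {
    slug = decodeURIComponent(raw); // e.g. "k%29" becomes "k)"
  } catch (err) {
    return null; // malformed %-encoding
  }
  slug = slug.replace(/[)>\]}.,;!-]+$/, '');
  return slug.length > 0 ? slug : null;
}

// Resolve a requested slug against a lookup table (a Map here, purely
// for illustration) and describe the HTTP response to send.
function resolve(rawSlug, table, clean = whitelistSlug) {
  const slug = clean(rawSlug);
  if (slug === null || !table.has(slug)) {
    return { status: 404, headers: {} };
  }
  return { status: 301, headers: { Location: table.get(slug) } };
}
```

With a table that maps "k" to some long URL, both resolve("k)", table) and resolve("k%29", table, rightTrimSlug) describe a 301 response whose Location header is that long URL, while a request consisting only of punctuation falls through to a 404.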

Caveats

Be careful not to serve shortlink incorrectly by making incorrect assumptions regarding GET parameters. A resource's shortlink must redirect to exactly the same content.13

Also understand that using shortlink is a commitment to always support that short URL for as many years into the future as possible.14

Lastly, this standard is new (having been suggested less than a year ago, in early 2009). It is not finalized, and some early implementors are still using alternatives such as rev="canonical", rel="short", etc.15 However, this standard makes the most sense and is backed by standards bodies; Google Suggest also hints that rel="shortlink" is already the most popular choice.


  1. Short URLs redirect to a longer URL when entered into an address bar. They are convenient for media where space or characters come at a premium, such as Twitter, phone, and print. ↩︎

  2. Technically, a rev="canonical" link means "the page I am linking to considers me to be the canonical, or best/truest, URL for this content." An example of why this is wrong for short URLs was pointed out by Eli White: "The canonical URL of something on php.net might look like: http://php.net/manual/en/security.php ... However, you will never actually be served HTML under that canonical URL, instead you are auto-redirected to a mirror, such as: http://us3.php.net/manual/en/security.php." This means for php.net to use rev="canonical" would be to incorrectly assert that the current page is the canonical, or "one true" URL for that content. What we really want is a way to say, "the resource I am linking to is the canonical, or 'one true,' short URL for myself," which is a different thing entirely. ↩︎

  3. The WhatWG is the group bringing us HTML5 and is similar in function to the W3C. ↩︎

  4. This is because other services, such as Twitter clients, may use an HTTP HEAD request on the long URL to see if it has a canonical short URL defined. ↩︎

  5. This helps bookmarklets and other client-side code function, as bookmarklets only have the document to work with, not the headers. ↩︎

  6. It is tempting to apply the above code on the focus event (a more general event than click, it would include tabbing into the input box), but some browsers require the focus event to finish firing before select() is called for select() to have any effect. ↩︎

  7. Please indicate to your browser manufacturer and/or standards bodies that you would like a JavaScript method to copy to the clipboard (if the user trusts the site in question). ↩︎

  8. It is possible, though uncommon and not recommended, to produce even shorter URLs by using more characters than just alphanumerics. For instance, slashes, periods, commas, and a question mark could be used. However, there are serious reasons to avoid these characters in slugs, especially as a trailing character. For instance, the URL http://example.com/foo. will typically become a hyperlink to http://example.com/foo, losing the trailing full stop, when auto-linked as it would be in Twitter or a plaintext email. Some other short URL providers use Unicode symbols in their short URLs, but support for these URLs is weak among email and Twitter clients and on many mobile devices. ↩︎

  9. TinyURL.com had a minor mess on their hands when someone linked an offensive slug to the then-current U.S. First Lady's homepage. ↩︎

  10. The robustness principle is, "Be conservative in what you do; be liberal in what you accept from others." ↩︎

  11. John Gruber suggests a liberal regex pattern that can avoid these mistakes (usually). ↩︎

  12. Unless of course the characters to be trimmed can appear in legitimate short URL slugs. In that case, you should trim them only if a legitimate short URL is not found. However, supporting these characters is obviously a really bad idea, in the opposite case where third party plaintext URL auto-linkers prematurely end matching your link. ↩︎

  13. This excludes trivial enhancements to the page, e.g. a user menu at the top, as long as the main content resource is unchanged. ↩︎

  14. Cool URIs don't change ↩︎

  15. As of February 14, 2010, this still includes Flickr, where the original rev-canonical variant was first proposed. ↩︎


February 23rd, 2010
Alan Hogan (@alanhogan_com)