Converting from Base10 to Base62

Although this is a fundamental problem used in a variety of situations, I couldn't find a blog which explained this in a way I would understand it easily, and that was annoying me. Hence, I've decided to write a short post on Base10 to Base62 conversion, and what it means.

What is a Base?


A number is said to be of a certain Base, based on the number of characters/digits that are used to represent a particular number. The most commonly used Base is Base10 or the Decimal system where the numbers are represented using ten digits (0-9). Any real number can be expressed in Base10, and what it means is each digit in that number is between the range [0-9]. For example here is an Integer represented in Base10 -

(27)10

Another commonly used system, on which computers operate is the Binary or Base2 system. Here every number is represented by just 2 digits (0 & 1). Here is the number 4 represented in Binary -

(100)2

Converting between these two is an elementary problem and is explained in many blogs/websites/YouTube.


What is Base62?


As you can predict, Base62 is a way of representing numbers using 62 characters/digits. They are -

a-z -> 26
A-Z -> 26
0-9 -> 10
Total -> 62

Note - Base64 usually includes + and / characters additionally. 

Base62 is extremely useful when you want to map a number into a string which contains the characters listed above. One of its main applications is in creating a TinyURL for a given URL. 

Assume for example your TinyURL generator generates a 7 character URL for every long URL you provide it, and the characters can only be in the range specified above. This means we can potentially create -

62^7  3.5 trillion URLs

These 3.5 Trillion URLs can be represented by 43 bits in the Binary system. That means - 

0101010110......01010 (43 bits in total)

Now, assuming we have a 43-bit number and we want to get its equivalent TinyURL. The first thing we do is to convert this Binary number into Decimal. For simplicity's sake, let's assume the 43-bit number maps to 100 as its Decimal equivalent. 

Now, we have to convert 100 to Base62. This can be done by the following algorithm -
dividend = number
while(dividend > 0) {
   remainder = dividend % 62
   dividend = dividend / 62
   collection.prepend(remainder)
}
What this algorithm is doing is to modulo and divide the number by 62 to get the digits we need to map to characters. This will essentially return a collection of digits from [0-61].

Let's implement this example for Decimal value 100.

Pass 1 - 

Remainder - 100%62 = 38; Dividend - 100/62 = 1; Collection -> [38]

Pass 2-

Remainder - 1 % 62 = 1; Dividend - 1/62 = 0; Collection -> [1,38]


Now that we have our Collection we can map the values to their appropriate characters -

base62Alphabet = [a,b,c,...,A,B,C,...,0,1,2,...] is an array of 62 characters, with each index representing one character. From this mapping we get -

1 -> b
38 -> M

So the equivalent Base62 conversion of 100 -> bM. Using the same logic, we can reduce any 43-bit number to it's corresponding 7 character string, which will be our TinyURL.


P.S - We can verify that our Decimal is obtained from the Base62 by doing this -

[1,38] -> 1 * 62^1 + 38 * 62 ^ 0 = 62+38 = 100.

P.P.S - The same algorithm can be used to convert from Base10 to any Base, by replacing the number we modulo and divide on with the Base we want it to be converted.



Comments

  1. Thanks for the post. have one question though. How do we make sure the short URL is always 7 character long. The example of using 100 created a short URL which is only 2 character long. Kindly clarify

    ReplyDelete

Post a Comment