Dismiss Notice
Join Physics Forums Today!
The friendliest, high quality science and math community on the planet! Everyone who loves science is here!

Questions on encoding and combinatorics for an equation generator web app

  1. Nov 11, 2012 #1
    (I apologize for the following lack of clarity, I know next to nothing. Links would be cool if the answer would take to long, I simply don't know how to even google these questions )

    Question 1, non decimal numbers:
    I have a set of parameters I'd like to encode. Each parameter is in the form XX, under a base 62 number system using the characters (0-9, a-z, A-Z). If there are two of these "digits", how many numbers do they represent?

    My guess is that it's BASE^DIGITS, so 62^2 which gives me 3844. 2^62 yields a bazillion which can't be correct (right?). Is BASE^DIGITS even how you do it?

    Question 2, converting non decimal to decimal:
    Given a decimal number, say 1337, how do I convert this to my 62-base and vice versa? I have no idea where to begin with this.

    Question 3, encoding combinations as tightly as possible:
    Let's say I have 4 options (A, B, C, D), how can I best fit these options into 2 characters in my base-62 number system. Here's my guess (I'm lol'ing at the thought of how I'm going to try and word this):

    Each option is either "on" or "off", 1 or 0. Therefore:
    A = 1 or 0
    B = 1 or 0
    C = 1 or 0
    D = 1 or 0

    If I make each option represent one of these bits so that they are in the order of A B C D and reverse it, D C B A then:
    A = 1
    B = 2
    C = 4
    D = 8
    E = 16
    F = 32...

    Therefore, if my 62-base encoding in Question 1 is 0a (which equals 10 in decimal), then I know that I have options B and D because 2 + 8 = 10, and I can't combine A, B, C, or D in any other way to make 10.

    Am I on the right track with this? So if my encoding equates to 23, I know it must represent options E + C + B + A because 16 + 4 + 2 + 1

    This feels correct to me. I just don't know how to actually determine 23 is E + C + B + A algorithmically.


    I'm returning to school this Spring after several years without any formal education and would like to catch myself back up in Math (I'm retaking Trig to be on the safe side). To do this, I am developing a web app which spits out math problems.

    I'd like to develop what I call "sessions", which is a set of questions that fit certain parameters. I'd like to encode a session into a URL such that you would have something like:



    AA = The ID of an area of mathematics (Algebra, Geometry, Calculus, etc)
    XX = Is a certain type of problem within that area (Solve for X, Area of a triangle, limits, etc)
    YY = Is some type of modifier (only use integers, use real numbers, etc)
    ZZ = Number of problems

    The number system is base 62 (characters a-z, A-Z, 0-9), and each parameter is a pair of digits to make it easier. Therefore, each parameter has 62^2 = 3844 possibilites.


    Ugly Alternative
    The ugly alternative to this would simply be to not encode and do something like:
    mathiom.com/?field=calculus & type=limit & modifier=integers & count=10

    which would return 10 limit problems using only integers (whatever that means lol). If I were to create a complex session with 100 problems using all types of modifiers then the query string above would grow ridiculously long.

    This project will be hosted on GitHub, so people can add extra fields of math, modifications, etc. This means there will be no server-side language and therefore I can't simply keep a database of shortcuts (ie, mathiom.com/?session=1 would equal the above).

    Yes, this is a lot of work just to get nice URLs...but it's the best way I can think of that doesn't resort to a URL shortening service like bit.ly
  2. jcsd
  3. Nov 12, 2012 #2

    I like Serena

    User Avatar
    Homework Helper

    Welcome to PF, OMGCarlos! :smile:

    For the record, your ugly alternative would be my preferred solution.
    I think it doesn't really matter that the string grows long, especially not if it's done by a program.
    However, it is useful that the string is readable by humans.

    Your answer to Q1 is correct.

    For Q2 to convert 1337 to 62-base, you need to do:

    hi62digit = 1337 div 62, where div is division rounded down
    lo62digit = 1337 mod 62, where mod is the remainder after division
    After that you'll need to convert each digit to the appropriate symbol using a set of if-statements.

    To convert it back, you need a set of if-statements to convert the symbol to a digit.
    After that you can calculate the number with:
    number = hi62digit * 62 + lo62digit

    For Q3 you're basically converting a binary number into your 62-base number.
    Calculate the number with:
    number = A * 1 + B * 2 + C * 4 + D * 8
    or alternatively with:
    number = A + (B << 1) + (C<< 2) + (D << 3), where << is the shift-operator.
    Then encode the number to 62-base as before.
  4. Nov 12, 2012 #3
    Wow, I understand completely. Thanks!

    It didn't occur to me to use a series of if-statements or even a table to convert from 62-base to decimal. Looking at this now, it is pretty obvious as my program can't possibly know that 'a' represents 10, for example.

    However after thinking this through, I have to multiply the number of question type by 8 (AAXXYYZZ), so even at 10 question types I end up with an 80 character encoding which defeats the purpose.

    I think I'll go with the alternate method and just write my own URL shortener. I could represent 62^4 short URLs in 4 characters, which is infinitely more than my app will probably ever generate.

    Thanks :)
  5. Nov 12, 2012 #4


    User Avatar
    Homework Helper

    You might consider using something similar to youtube's base 64 system for video id's which also uses "-" and "_" in addition to (0-9, a-z, A-Z).

    For conversion, you can use a 62 or 64 byte table to convert a base 62 or base 64 digit into an Ascii character, and use a 256 byte table (with only 62 or 64 values used) to convert an Ascii character to a base 62 or base 64 digit. This will eliminate having to use if statements.

    If you use a base 64 system, then an integer, which is stored as binary in a computer, will already be the base 64 bit number, with upper digit = (integer >> 6) and lower digit = (integer & 0x3f).

    For the options, you could use a table of 4096 pointers to "strings", where each "string" is a count, followed by <count> options.
    Last edited: Nov 12, 2012
  6. Nov 12, 2012 #5
    Interesting. I didn't think of using byte tables, which is funny considering that I was actually referring to the ASCII table while I worked out a solution.

    Your method would certainly be faster and more optimized, but what I ended up doing is building a string of allowable characters: $chrs = 'acbABC123...';

    I can then perform a bitwise operation by on $chrs.indexOf('a'), which yields a similar, albiet slower result.
  7. Nov 12, 2012 #6


    User Avatar
    Science Advisor

    That looks suspiciously like Perl, in which case you should be using the core modules, rather than writing everything from scratch.
  8. Nov 12, 2012 #7
    Oh it's actually JavaScript. I used the dollar sign to make it easier to read because it was inline with the rest of the text :P

    You gave me a lead however. Turns out JavaScript supports window.btoa

    I'm hosting the site on GitHub, which allows me to design it in a way that it's easy to collaborate on (I only "know" Trig right now...know as in several years ago). The downside is that GitHub only supports static HTML sites, and so I can't take advantage of server languages.

    Though, I'm considering getting a simple web hosting plan to do just that.
    Last edited: Nov 12, 2012
Share this great discussion with others via Reddit, Google+, Twitter, or Facebook