Sunday, August 25, 2013

Cracking the Fringe Glyph Code

Lately I've been watching quite a bit of the TV series "Fringe."  For those unfamiliar, it starts out vaguely reminiscent of The X-Files, with a special agent investigating paranormal cases.  However, it is quite distinct from The X-Files: the focus is less on UFOs and government conspiracy and more on crazy science, or science fiction realized.  In the vein of modern cinematic television, the overarching plot takes a more central role as well.  The point is, it's a solid show, and I've consumed three 20-episode seasons faster than I'd like to admit.

As I watched, I noticed the scene breaks were often accompanied by one of a set of images.  I made some half-hearted attempts to correlate the symbols with the current episode, but never made any progress.  A quick search of the web took me to Fringepedia, where I learned that the images made up a code.  They link to a nice post from Ars Technica editor Julian Sanchez about cracking the "glyph code."  My first thought was, "I know I could break the code if I put in the effort, so why bother?  I'll just look up the solution."  Then I thought better of it and decided to crack it myself.  I recommend you give it a try, as it's pretty straightforward.  The most time-consuming part is collecting the data, but I think the Ars Technica article links to a repository of the glyphs in each episode.

If you want to crack it yourself, stop reading here, or risk spoiling the fun.

I say that the code is relatively easy because the Fringepedia page states that it is a monoalphabetic substitution cypher.  If you are a fancy computer whiz (like Julian Sanchez), you can reach for dictionary attack programs and whatnot (which may or may not be worth the time), or you can stay organized and think logically.

Perhaps it was cheating to know in advance that the code was a monoalphabetic substitution (one symbol = one letter), or perhaps it was simply the simplest and therefore most logical place to start.  Either way, my first step was compiling my data.  With one season's worth of self-collected data (season 3), I did a frequency analysis.  This showed me the three most common symbols (with appearance percentage):  Apple8 (15%), Leaf1 (14%), and SmokeR_3 (11.5%).  That is, the apple image with the dot at roughly the 8 o'clock position, the leaf with the dot at the 1 o'clock position, and the smoke face looking to the right with the dot at the 3 o'clock position (not knowing whether there were right-facing smoke faces with other dot patterns, I ended up recording extra information in the symbol name).  Searching for "letter frequency" quickly brings you to a Wikipedia page showing that the three most common letters in the English language are E, T, and A.
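For anyone who would rather script the counting than tally by hand, here is a minimal Python sketch of that kind of frequency analysis.  It is not what I actually did (I counted by hand), and the `sample` data and episode names below are made-up placeholders, not the real season 3 glyphs.

```python
from collections import Counter

def glyph_frequencies(episodes):
    """Return (glyph, percent) pairs, most common glyph first."""
    counts = Counter(glyph for glyphs in episodes.values() for glyph in glyphs)
    total = sum(counts.values())
    return [(glyph, 100.0 * n / total) for glyph, n in counts.most_common()]

# Placeholder data for illustration only (NOT the real episode glyphs).
sample = {
    "ep1": ["Apple8", "Leaf1", "Apple8", "SmokeR_3"],
    "ep2": ["Leaf1", "Apple8", "LHand10_7", "LHand10_7"],
}

for glyph, pct in glyph_frequencies(sample):
    print(f"{glyph:10s} {pct:5.1f}%")
```

The ranked list is then compared against a letter frequency table, keeping in mind that a small sample will not line up perfectly.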

I assigned Apple8 to E, and either A or T to each of Leaf1 and SmokeR_3.  From there it was good old crossword skills that cracked the code.  Episode 14 was crucial.  The episode's romance theme, the heart-shaped dot on the last glyph of the episode, and the fact that the code word contained all three high-frequency letters together allowed for a good guess.  A six-letter word, xEAxTx, related to romance?  HEARTS fits the bill.
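The crossword step can also be sketched in code: filter a word list against a partially solved pattern, then let context pick the winner.  This is only an illustration.  The `candidates` list is a tiny hypothetical word list, `fits` is a helper name I made up, and `?` stands in for the x placeholders above.

```python
def fits(pattern, word, assigned, unknown="?"):
    """True if `word` is consistent with a partially solved `pattern`.

    `assigned` holds letters already given to some glyph; in a
    monoalphabetic cypher an unsolved glyph cannot reuse one of them.
    """
    if len(pattern) != len(word):
        return False
    for p, w in zip(pattern, word):
        if p == unknown:
            if w in assigned:
                return False
        elif p != w:
            return False
    return True

# Hypothetical word list; a real attempt would use a full dictionary.
candidates = ["HEARTS", "HEALTH", "KISSES", "LOVERS", "BEATEN", "SEATED"]
matches = [w for w in candidates if fits("?EA?T?", w, assigned={"E", "A", "T"})]
print(matches)  # ['HEARTS', 'HEALTH']
```

The pattern alone leaves more than one candidate, which is where the episode's romance theme and the heart-shaped dot do the deciding.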

To save myself the trouble of typing everything out, I put each episode's cypher text into a spreadsheet and linked the plain text to my translated guesses.


This way, whenever I changed a letter in column 2, it would change my translations of each episode's cypher text.  Another key clue was symbol 13, LHand10_7 (left hand oriented at 10 o'clock with the dot at 7 o'clock), appearing as a double letter at the end of a word.  How many letters can be doubled?  The answer is apparently almost all of them, but not commonly at the end of words.  That for the most part rules out vowels and suggests S or L.  Referring back skeptically to the frequency charts (given that we don't have two-letter words and that our sample size is limited, the frequency table will be off by a bit), I found it relatively straightforward to make educated guesses leading to correct-looking solutions.  Next thing I knew, my cypher was pointing out my poor data collection on several episodes.
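The same linked-spreadsheet idea fits in a few lines of Python: keep one key, and re-translate every episode whenever a guess changes.  This is a rough sketch rather than my actual spreadsheet.  GlyphA, GlyphB, and GlyphC are made-up glyph names, the episode contents are placeholders, and which of Leaf1 and SmokeR_3 gets A versus T is just one of the two guesses.

```python
def decode(glyphs, key, unknown="_"):
    """Translate a list of glyph names; unsolved glyphs show as `unknown`."""
    return "".join(key.get(glyph, unknown) for glyph in glyphs)

def decode_all(episodes, key):
    """Re-translate every episode with the current key."""
    return {name: decode(glyphs, key) for name, glyphs in episodes.items()}

def ends_in_double(glyphs):
    """True if a word ends with the same glyph twice in a row."""
    return len(glyphs) >= 2 and glyphs[-1] == glyphs[-2]

# Placeholder cypher text: GlyphA..GlyphC are hypothetical glyph names.
episodes = {
    "ep_a": ["GlyphA", "Apple8", "Leaf1", "GlyphB", "SmokeR_3", "GlyphC"],
    "ep_b": ["GlyphC", "GlyphA", "Apple8", "LHand10_7", "LHand10_7"],
}

# Starting guesses from the frequency analysis (the A/T split is a guess).
key = {"Apple8": "E", "Leaf1": "A", "SmokeR_3": "T"}
print(decode_all(episodes, key))

# Changing the key updates every translation at once.
key.update({"GlyphA": "H", "GlyphB": "R", "GlyphC": "S"})
print(decode_all(episodes, key))
print([name for name, g in episodes.items() if ends_in_double(g)])
```

Words flagged by `ends_in_double` are the ones worth checking against likely final doubles such as S or L.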

It's worth noting that my code breaking was simplistic and relied on several features.  First, that each symbol represented a letter.  Second, that each episode produced a word; that is, I knew where words started and ended.  Lastly, I had some good context.  Given an arbitrary message typed using this cypher, my technique might not have been as efficient.

All in all, a fun puzzle, but there were no exactly mind-blowing revelations in the coded messages.
