Joe Kelley's Blog
http://www.joefkelley.com/
Recent content on Joe Kelley's BlogHugo -- gohugo.ioen-usJoe KelleySun, 22 Mar 2020 00:00:00 +0000Optimal Guess Who
http://www.joefkelley.com/906/
Sun, 22 Mar 2020 00:00:00 +0000http://www.joefkelley.com/906/Recently, The Riddler on FiveThirtyEight featured a puzzle that required determining an optimal strategy for the game Guess Who. The resulting solution was a bit surprising to me since I had always assumed you would want to cut your remaining candidates in half, but that turned out to not be the case. So I’ll walk through my derivation of the strategy and give a table that shows what you should do in every situation to play optimally.Byzantine Generals Riddler
http://www.joefkelley.com/905/
Mon, 08 Aug 2016 00:00:00 +0000http://www.joefkelley.com/905/Every Friday, FiveThirtyEight.com posts a probability/statistics riddle as part of their “Riddler” column. I highly recommend them, they are usually quite tricky, but often very clever.
Anyway, I believe the solution to last week’s riddle was wrong! Well, not wrong technically, but not optimal. Here’s the problem:
You are the emperor of Byzantium (lucky you!) and you have N generals working for you. You know that more than half of your generals are loyal, and the rest are traitors.Explicitly using Scala Implicit conversions
http://www.joefkelley.com/904/
Wed, 18 Nov 2015 00:00:00 +0000http://www.joefkelley.com/904/It seems a bit of a contradiction, but I recently had a problem with Scala’s implicit conversions being a bit too implicit. I wanted a simple way to make them more explicit.
Here’s an example usage of implicit conversions:
case class Foo(x: Int) case class Bar(s: String) { def doSomething(): Unit = { println("Bar: " + s) } } implicit def foo2bar(foo: Foo): Bar = new Bar(foo.x.toString) new Foo(10).doSomething() Predictably, this prints “Bar: 10”.Optimizing Sparse Matrix Multiplication on Hadoop
http://www.joefkelley.com/853/
Fri, 08 May 2015 00:00:00 +0000http://www.joefkelley.com/853/Recently at work I spent some time trying to optimize large sparse matrix multiplication on Hadoop as part of an implementation of a larger algorithm (Markov Clustering). We ended up pursuing a different route, but I decided to continue pursuing the problem on my own time. I learned a fair amount, and I don’t think there exists a much better way of doing this out there. So I decided to write up an explanation of the algorithms involved, as well as release the code:Dota 2 Ability Draft Statistical Analysis - Part 2
http://www.joefkelley.com/788/
Sat, 24 Jan 2015 00:00:00 +0000http://www.joefkelley.com/788/In a previous post, I started an analysis of Dota 2’s Ability Draft mode. This is continued here, with a focus on increasingly fine-grained analysis. That post is probably a better place to start, if you haven’t read it yet.
If you just want the data, I’ve uploaded it to a few Google fusion tables:
Single Abilities Abilities Melee vs Ranged Pairs of Abilities Ability Synergy Probably the defining aspect of ability draft that makes it so fun is the ability to combine abilities from different heroes, and so I spent some time analyzing pairs of abilities.Dota 2 Ability Draft Statistical Analysis - Part 1
http://www.joefkelley.com/765/
Thu, 22 Jan 2015 00:00:00 +0000http://www.joefkelley.com/765/Dota 2’s ability draft game mode is sorely lacking some statistical analysis. The awesome dota-metrics blog did a good post about it a while back (https://dotametrics.wordpress.com/2014/06/11/random-ability-draft-hero-win-rates/) but only talked about which heroes were best, which misses the real meat of ability draft: the abilities!
I downloaded ability draft data from the Dota 2 web api for a period between Nov 9-12 2014, about 220,000 matches. Note that this is after the Phantom Lancer and Bloodseeker reworks, but before Oracle was added.Binary Matrix Proof Puzzle
http://www.joefkelley.com/743/
Thu, 23 Oct 2014 00:00:00 +0000http://www.joefkelley.com/743/This puzzle is one of the hardest I’ve ever solved, but it’s a surprisingly elegant and creative solution. Shout-out to Mark Liu, who I originally heard it from. I’ll warn you ahead of time - it probably requires some Math/CS theory.
Consider a matrix with the following constraints:
It contains only 1s and 0s It is wider than it is tall (more columns than rows) There is at least one 1 in each column Prove that there exists a 1 at some position in the matrix, for which there are more 1s in that row than in that column.Random sampling in Hive
http://www.joefkelley.com/736/
Thu, 05 Jun 2014 00:00:00 +0000http://www.joefkelley.com/736/Let’s say you have a Hive table with ten billion rows, but you want to efficiently randomly sample a fixed number- maybe ten thousand. The most obvious (and obviously wrong) way to do this is simply:
select * from my_table limit 10000; Hive makes no guarantees about the order of records if you don’t sort them, but in practice, they come back in the same order in which they’re in the file, so this is far from truly random.Hive argmin and argmax
http://www.joefkelley.com/727/
Wed, 28 May 2014 00:00:00 +0000http://www.joefkelley.com/727/By far the most common Hive question on StackOverflow is how to do argmin and argmax. Usually there’s a lot of other complex thinking wrapped around the problem, but most of the time when someone is having a hard time writing a tricky query, it boils down to them not knowing how to do argmin or argmax.
It’s funny because this is a feature that Hive developers consciously chose not to implement.Let the compiler prevent bugs
http://www.joefkelley.com/714/
Thu, 24 Apr 2014 00:00:00 +0000http://www.joefkelley.com/714/I’m a fan of statically-typed languages. A type-aware compiler is extremely helpful in suggesting the correctness of code, but it has to be used properly. This is an idea that I think good programmers are aware of, but I don’t think I’ve seen it explicitly called out. You should write your code in such a way that the logic is tied to the type. This allows the compiler to do more for you and prevent more bugs.Sink the Ship Puzzle - Solution
http://www.joefkelley.com/681/
Fri, 21 Feb 2014 00:00:00 +0000http://www.joefkelley.com/681/Here’s a solution to the “sink the ship” puzzle I posted a little while ago:
First of all, let’s reduce this to one dimension. Let’s say the ship starts at an integer position on a line: \(x_0\), and moves with some integer velocity: \(v\). All possible combinations of those two values must be eventually hit by our strategy.
A first intuition might be to try to drop missiles on every possible location.Sink the Ship Puzzle
http://www.joefkelley.com/677/
Tue, 04 Feb 2014 00:00:00 +0000http://www.joefkelley.com/677/Imagine an infinite ocean, with a single tiny island in the middle. Call this island position (0, 0). Now say there is a ship somewhere in this ocean, and it starts at an exact integer number of miles away from the island in both the x and y directions. You can picture an infinite grid centered on the island, and the ship starts on one of the grid points. The ship then starts moving; it has an x-velocity and y-velocity that are both integer numbers of miles per hour.Decimal expansions containing series
http://www.joefkelley.com/635/
Thu, 30 Jan 2014 00:00:00 +0000http://www.joefkelley.com/635/I just saw a great question on the Math StackExchange asking why \(\frac{100000000}{99989999}\) generates the fibonacci series in its decimal expansion: \(1.000100020003000500080013002100340055...\). And it isn’t just a trick; it really does give the fibonacci series until it can’t fit any more in 4-digit chunks. And even then, it just carries the extra digits over.
It’s a pretty neat technique to prove this. Basically, consider the sum: \(f_0+f_1x+f_2x^2+f_3x^3+...\)
Then, if you plug in a decimal like \(0.1+2+3+... ≠ -1/12
http://www.joefkelley.com/579/
Wed, 22 Jan 2014 00:00:00 +0000http://www.joefkelley.com/579/There was an unfortunate youtube video released recently making the claim that \(1 + 2 + 3 + 4 + ... = -\frac{1}{12}\). This is simply not true. Here’s the proof presented in the video:
$$ \displaystyle \begin{align} \text{Let }S_1 &=1-1+1-1+1-1+... \\ \text{Let }S_2 &=1-2+3-4+5-6+... \\ \text{Let }S_3 &=1+2+3+4+5+6+... \\ \end{align} \\ $$ First, calculate \(S_1\):
$$ \begin{align} S_1 + S_1 = 1&-1+1-1+1-... \\ &+1-1+1-1+... \\ \end{align} \\ 2S_1=1 \\ S_1=\frac{1}{2} \\ $$ Next, \(S_2\):I wish I was writing Scala
http://www.joefkelley.com/574/
Tue, 14 Jan 2014 00:00:00 +0000http://www.joefkelley.com/574/Today I wrote a chunk of Java code that really hammers home the usefulness of functional programming by how much it sucks. I had to take a list of strings, and create a new list containing only the first words of the input list. For simplicity, “word” means anything up until the first whitespace. Here it is:
List<String>; result = new ArrayList<String>(); for (String line : input) { for (int i = 0; i < line.Sorting by transposing
http://www.joefkelley.com/556/
Tue, 03 Dec 2013 00:00:00 +0000http://www.joefkelley.com/556/Here’s a clever algorithm for sorting integers. The bulk of the work is actually done by a transposition of a list of lists of different lengths. Normally, you would think of a transpose as something like this:
$$ \begin{array}{ccc} a & b & c \\ d & e & f \\ g & h & i \end{array} \Rightarrow \begin{array}{ccc} a & d & g \\ b & e & h \\ c & f & i \end{array} $$ But what if each row has a different number of elements?www.what-would-i-tweet.com
http://www.joefkelley.com/546/
Thu, 21 Nov 2013 00:00:00 +0000http://www.joefkelley.com/546/I recently finished the first version of a website:
what-would-i-tweet.com
The idea for it is shamelessly stolen from a website that does the same thing with Facebook. You log in, and then it generates tweets similar to what you would tweet, based on your tweet history. You can also have it generate tweets based on other people’s tweets. For example:
what-would-i-tweet.com/generate/amandabynes
I am bad at front-end web development, so it’s extremely bare-bones for now.Integer Fractions Puzzle
http://www.joefkelley.com/518/
Fri, 01 Nov 2013 00:00:00 +0000http://www.joefkelley.com/518/Here’s a programming puzzle: Given an integer \(n\), count the unique integer solutions \((x, y)\) to the equation:
\(\frac{1}{x}+\frac{1}{y}=\frac{1}{n!}\)
And let’s add a requirement that it must be reasonable efficient. We could be dealing with \(n=5000\), which means that \(n!\) is already an extremely large number. Any attempt at a brute-force-ish program is going to be useless with numbers that big. If you want to take a crack at the problem on your own, stop reading here and go for it.Quick ssh/tmux tip
http://www.joefkelley.com/514/
Tue, 01 Oct 2013 00:00:00 +0000http://www.joefkelley.com/514/I used to have the bad habit of ssh-ing into a machine, starting some long-running process, and then being stuck wherever I was (at work, wanting to go home, or at home, wanting to go to work) because disconnecting from wifi would mean my ssh session would die, and take down the long-running process with it. This is easily solved with something like tmux, but I would often simply forget to start a tmux session after ssh’ing in.Fibonacci Revisited
http://www.joefkelley.com/469/
Tue, 20 Aug 2013 00:00:00 +0000http://www.joefkelley.com/469/A coworker of mine (thanks, Jeff!) recently pointed out that I had been a little sloppy with my analysis of various algorithms to compute the \(n^{th}\) fibonacci number in a previous post. I made the simplifying assumption that arithmetic operations (addition and multiplication) happen in constant time. This is often, but not always, a good assumption to make; either the numbers involved in the computation are small enough that addition and multiplication do happen in constant time, or the algorithm is dominated by other factors.An abuse of the Java type system
http://www.joefkelley.com/460/
Tue, 13 Aug 2013 00:00:00 +0000http://www.joefkelley.com/460/What if you wanted to have a List or Collection of items, but with completely different types? Obviously, the standard way to deal with this would be with simply a List, but then accessing items would require casts and type-checks and would not receive any of the typical benefits of statically-typed code. So is there a way to dynamically create a List, where each item has a statically-checked, and entirely different type?Divisibility diagrams
http://www.joefkelley.com/434/
Wed, 07 Aug 2013 00:00:00 +0000http://www.joefkelley.com/434/Check out this diagram that can be used to check if a number is divisible by 7:
What you do is this: Start at the red circle. For each digit in your number, starting with the left-most, you follow that many black arrows. Between each digit, follow a single blue arrow. If you end on the red circle, the number is divisible by 7, otherwise it’s not. For example, if your number was 245, you would follow 2 black arrows, one blue, 4 black, one blue, and finally 5 black.A practical guide to online privacy
http://www.joefkelley.com/420/
Mon, 10 Jun 2013 00:00:00 +0000http://www.joefkelley.com/420/The NSA has been getting quite a lot of flack recently for all of the data it has been gathering from major US companies, and in fact, the US internet infrastructure as a whole. There’s a substantial outcry against this on the grounds of privacy, so I want to provide a few tips and tools that you can use if you want to keep your internet traffic secure and private, but first two disclaimers:Binary Operator Puzzle - Solution
http://www.joefkelley.com/383/
Tue, 30 Apr 2013 00:00:00 +0000http://www.joefkelley.com/383/Here’s one possible solution to the puzzle I posted last week. The task was to find a commutative and associative function with no identity element. I had hoped to post a more clever solution, but found some small errors in edge-cases and technicalities to everything I had planned on posting! But still, here’s a solution that does work, even if it’s a little ugly:
$$ f(x,y)=\left\{ \begin{array}{rl} x+y &\mbox{ if }x \neq 0, y \neq 0 \\ 0 &\mbox{ otherwise} \end{array} \right.Binary Operator Puzzle
http://www.joefkelley.com/362/
Wed, 24 Apr 2013 00:00:00 +0000http://www.joefkelley.com/362/Simple puzzle: can you come up with a binary operator over integers that is commutative and associative, but has no identity element? For example, addition is commutative because \(x+y=y+x\) and associative because \((x+y)+z=x+(y+z)\), but it does not satisfy the puzzle because \(0\) is an identity element: \(x+0=x\). Likewise, multiplication does not work because \(1\) is an identity element. \(\text{ }f(x,y)=xy+1\) is commutative and has no identity element, but is not associative.Patterns that aren't
http://www.joefkelley.com/330/
Thu, 21 Mar 2013 00:00:00 +0000http://www.joefkelley.com/330/In eighth grade, a friend of mine discovered an interesting pattern: start with the number 11, add 2, then add 4, then 6, etc. It appears that this generates prime numbers! He showed me the first few numbers in the series:
\(11, 13, 17, 23, 31, 41, 53, 67...\)
I was amazed. All of those are indeed prime numbers! But look what happens if you go a little further…
\(...83, 101, 121\)Good Will Hunting problems
http://www.joefkelley.com/293/
Thu, 14 Mar 2013 00:00:00 +0000http://www.joefkelley.com/293/I recently watched the movie Good Will Hunting for the first time. In it, Matt Damon plays a genius janitor that is discovered when he solves two sets of math problems that were left on a chalk board. Surprisingly, those problems were not nearly as difficult as they were portrayed in the movie. Here’s the first set, which the camera helpfully pauses on briefly in the movie:
Consider the following graph, G: 4 / 1 - 2 = 3Obscure Java features
http://www.joefkelley.com/274/
Wed, 06 Mar 2013 00:00:00 +0000http://www.joefkelley.com/274/Here’s a few features of the Java language that you might not have heard of. I think it’s interesting to plumb the depths of a relatively simple language.
1. Labeled loops and other strange control flow Did you know that you can give loops and if-statements names? For instance:
countToTen: for(int n = 1; n <= 10; n++) { System.out.print(n + "... "); evenOrOdd: if (n % 2 == 0) { System.Mersenne Primes
http://www.joefkelley.com/251/
Thu, 07 Feb 2013 00:00:00 +0000http://www.joefkelley.com/251/About a week ago, the largest known prime number was discovered. It has over 17 million digits, so I’m not going to post it here directly, but I will post a simpler representation: \(2^{57,885,161}-1\). This prime is more than four million digits longer than the previous record-holder. Think about how big that number is for a second… the number of particles in the universe has only about 80 digits. The number of different possible chess games has about 120 digits.Weighing Balls Puzzles - Solutions
http://www.joefkelley.com/239/
Mon, 04 Feb 2013 00:00:00 +0000http://www.joefkelley.com/239/Here are the solutions to the puzzles in the last post. If you want to read those first, click here.
Version 1: Easy
The basic idea is that there are three possible outcomes to every weighing: the left side is heavier, the right is heavier, or they are equal. So in the first weighing we want to have those three outcomes split the possible heavy balls into three groups of three possibilities.Weighing Balls Puzzles
http://www.joefkelley.com/233/
Thu, 31 Jan 2013 00:00:00 +0000http://www.joefkelley.com/233/Here’s a series of puzzles, in increasing difficulty. This is a common type of puzzle, and I’ve heard the first three many times before, but the last one is one that I came up with. It’s meant to be extremely difficult - it just might be the hardest of its kind ever posed.
Version 1: Easy
You are given a set of nine balls, one of which is very slightly heavier than the others, and a balance scale.Undecidable Problems III
http://www.joefkelley.com/215/
Thu, 24 Jan 2013 00:00:00 +0000http://www.joefkelley.com/215/In my second undecidable problems post, I talked about using a hypothetical solution to the Halting Problem to solve open problems in math. In this post, I’ll go the other direction, and use the non-existence of such a solution to prove a shocking fact: not only are there problems which cannot be computed, there are individual numbers that cannot be computed.
First, define a function \(f\) as “computable” if we can write a program to compute it for all inputs in its domain.Undecidable Problems II
http://www.joefkelley.com/191/
Wed, 23 Jan 2013 00:00:00 +0000http://www.joefkelley.com/191/In a previous post, I talked about the Halting Problem, and the fact that it was undecidable- that is, it was impossible to write a program to solve it for all cases. There are a few interesting related problems that may give some understanding as to why this must be the case.
As in the previous post, let:
$$ H(P,x)=\left\{ \begin{array}{rl} 1 &\mbox{ if P halts on x} \\ 0 &\mbox{ otherwise} \end{array} \right.Sequence Alignment
http://www.joefkelley.com/131/
Fri, 11 Jan 2013 00:00:00 +0000http://www.joefkelley.com/131/Sequence alignment is the task of aligning two sequences of characters, with possible gaps inserted, in order to maximize the matching characters. Its most notable application is in biology, where it is commonly used on sequences of nucleotides (DNA) or proteins.
Usually, to “score” an alignment, we choose three values: a match score, a mismatch penalty, and a gap penalty. For example, it’s common to choose match=2, mismatch=-1, gap=-1. With those choices, the following alignment would get a score of 5 (4 matches, 2 gaps, 1 mismatch):Undecidable Problems
http://www.joefkelley.com/111/
Tue, 11 Dec 2012 00:00:00 +0000http://www.joefkelley.com/111/One of the most surprising things I learned in my freshman year studying computer science was that there are certain mathematical/computational problems that can be very simply, concretely, and exactly stated, yet cannot be solved. These problems are called “undecidable” problems, and to me it seems counterintuitive that they should exist, and yet, they very clearly do, and I’ll go over one of the most interesting ones in this post.Internet Security
http://www.joefkelley.com/83/
Wed, 24 Oct 2012 00:00:00 +0000http://www.joefkelley.com/83/In a previous post, I explained the basics of how the infrastructure of the internet works to enable communication from one computer to another. This post will try to explain the basics of how sensitive information makes it all the way across the internet, with only the intended recipients able to read it.
I’d like to first note that I’ll only be covering the basics of one of four principles of information security.Computing Fibonacci numbers
http://www.joefkelley.com/20/
Wed, 17 Oct 2012 00:00:00 +0000http://www.joefkelley.com/20/The Fibonacci series is a very famous recursively-defined series that is often used as an introductory exercise in beginning programming courses. It has a very simple definition:
\(f(n)=f(n-1)+f(n-2)\)
We often choose \(f(0)=0\) and \(f(1)=1\), which makes the first few terms of the series:
\(0,1,1,2,3,5,8,13,21,34...\)
And it’s as simple as that. But computing an individual element of the series via computer is actually an interesting optimization problem.How the Internet works
http://www.joefkelley.com/10/
Tue, 16 Oct 2012 00:00:00 +0000http://www.joefkelley.com/10/Let’s talk about the internet. Everyone knows how to use it, but almost nobody actually knows how it functions. If you think about it, there’s definitely some magic going on here. When you type a website into your browser, e.g. “www.joefkelley.com”, how does it know which server it needs to talk to? How does your data get sent there? How does it get sent back? There’s a bunch of intermediate computers that all of this has to hop over; how do they know where to route the traffic to?