I can’t pretend that I know exactly what Matt Williams is talking about in his article, but suffice to say he’s doing something very impressive with his Pi. I’ll let him explain:
Substructure substance matching is, in many ways, a non-trivial exercise in Cheminformatics. The amount of data used to determine matches grows very quickly. For instance, one method of describing a molecule’s “fingerprint” uses 880 bits, which allows 2^880 possible combinations. This space is very sparsely populated, but there are still many potential combinations.
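To make the fingerprint idea a little more concrete, here is a minimal sketch of how a fingerprint screen is commonly done. It is not taken from Matt's article. It assumes a fingerprint is a fixed-width bit string (880 bits here, to match the 2^880 figure) in which each bit records whether some structural feature is present. The names `make_fingerprint`, `may_contain` and `screen`, and the example molecules, are hypothetical.

```python
# Hypothetical sketch: fingerprints are modelled as Python ints used as bitsets.
FINGERPRINT_BITS = 880  # assumed width, matching the 2^880 figure in the quote


def make_fingerprint(set_bits):
    """Build a fingerprint with the given bit positions switched on."""
    fp = 0
    for bit in set_bits:
        if not 0 <= bit < FINGERPRINT_BITS:
            raise ValueError(f"bit {bit} is outside the fingerprint")
        fp |= 1 << bit
    return fp


def may_contain(candidate_fp, query_fp):
    """True if every bit set in the query is also set in the candidate."""
    return (candidate_fp & query_fp) == query_fp


def screen(library, query_fp):
    """Return the names of library entries that pass the bit screen."""
    return [name for name, fp in library.items() if may_contain(fp, query_fp)]


if __name__ == "__main__":
    # Made-up molecules and bit positions, purely for illustration.
    library = {
        "mol_a": make_fingerprint([1, 5, 42]),
        "mol_b": make_fingerprint([1, 5]),
        "mol_c": make_fingerprint([5, 42, 879]),
    }
    query = make_fingerprint([5, 42])
    print(screen(library, query))
```

A screen like this only rules candidates out. Anything that passes would still need a proper structure comparison to confirm the match.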
If you’re still with me, in his article he explains how the Pi does pattern matching with grep and how the search speeds up once the data is being read from the cache. Read his article here.
Thanks for the shout-out! I’ve been quite pleased with the computing power of the Pi 2 in particular. Of course, it wasn’t that long ago that a machine like this would have been considered a server unto itself.
As a follow-up, I found yesterday that not all greps are created equal. The grep in BusyBox is approximately 30x slower than GNU grep, so a search that took GNU grep under 2 seconds took over a minute with BusyBox!