Using Regex to Match Multiples of Four

This topic came up recently in a discussion on stack overflow so I thought I’d post what I came up with here. Obviously this is not the most practical way to determine multiples of four but, I think this is an interesting question if for no other reason than it presents an interesting challenge academically.

Ultimately regex is just a pattern matcher right? so what types of patterns might be created by numbers in multiples of four? To answer this question I wrote a quick program to print out all multiples of four from 1 – 500 ( try it)  ;)

what I noticed is that the last digit of each number was alternating between 0 4 8 2 6. Now you might be tempted to use this immediately and just check all strings of digits to see if they end in one of these numbers, but that wouldn’t work, since other integers also end with those digits that aren’t themselves divisible by four such as 10, 14, 18, 22, 26, etc… and so the search continues. Next I looked at the last two digits and noticed a repeating pattern between 0 and 100

if you prefix the single digits with zeros you’ll notice that this pattern repeats every increment of 100. So now I’m feeling pretty confident that I’m on to something. To test my theory further I pulled up Google and typed in 2147483648 % 4 (which is the next highest number past the maximum 32 bit signed int value which is divisible by 4) this was just the first arbitrary value that came to mind and has no other meaning that I’m aware of and as it turns out 2147483648 % 4 = 0 so I’m feeling really good right now. I suppose you could actually write out a mathematical proof and prove that this theory works, but I’m more into application. So I figure at this point all I have to do is write up this regex and then I can test it against the output of the program written above. So my next goal is to write the actual regex.

If you notice I conveniently made the program print out the OR regex operator so I can just cut and paste most of the regex and I’m halfway home. All I want are the last two digits so the first part of my regex looks something like this:

you’ll notice I prefixed the zeros to the single digits and added 00 to the front. Again this is because I want to match the last TWO chars including the 00 from 100 (this will also return strings of 0 as a valid multiple of four as it should). so now I have my regex suffix wrtten. According to my theory any string of of digits suffixed by the aforementioned two digits is a multiple of four so I just need to write a rule for the prefix (any digit) and I’m done. This is very easy and is simply [0-9]* So now my regex looks like this:

Now I’m almost done. What have I forgotten? Single digits!!! 0,4 and 8 will be rejected by the regex above since they are single digits and the above pattern only matches two digits preceded by 0 or more digits. so I have to tweak the regex a little and I end up with this:

and that’s pretty much it. Technically you would also have to add word boundaries since you want to treat the entire string of digits as a word. you would add boundary tags like this:

but whether or not you do that depends on your application. If you were going to use this in a lexer you might be building with jflex for instance you might not want to include those since you could have other rules for similar lexemes.

So all in all that’s how I would do it. That’s probably not the shortest most concise regex and I’m sure there are better ways to do it, but if you’re looking for something quick and dirty I don’t think it gets any quicker or dirtier.

Leave a Reply