A better way to obfuscate IP addresses in Python?
- November 12th, 2009
- Posted in Drupal6 . Ubuntu
I’m too lazy to ask Google right now… but
>>> import re
>>> ipaddress = re.compile(r"(\d+\.\d+\.\d+\.\d+)")
>>> fp = open("logfile.log", "r")
>>> newfp = open("komserver01.log.temp", "w")
>>> for line in fp:
...     for ip in ipaddress.finditer(line):
...         newip = ip.group(1)
...         # keep the first three octets, mask the last one
...         octets = re.search(r"(\d+)\.(\d+)\.(\d+)", newip)
...         obfus = octets.group(1) + "." + octets.group(2) + "." + octets.group(3) + ".xx"
...         line = line.replace(newip, obfus)
...     newfp.write(line)
...
>>> newfp.close()
>>> fp.close()
It looks easy and it’s fast… but is there a simpler way to do this?
Hmpf.

I do love python. But how could it possibly beat:
sed -re "s/(([0-9]+\.){3})[0-9]+/\1xx/g" access.log
?
I was thinking the same exact thing :) I asked how to do this about 15 years ago and that line was exactly the same :)
I guess this would work:
newfp.write(re.sub(r'(\d+)\.(\d+)\.(\d+)\.\d+', r'\1.\2.\3.xx', fp.read()))
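To see what that substitution does, here is a small check on a made-up log line. The sample line and the `mask_ips` name are only illustrations; the pattern uses a single capture group for the first three octets, mirroring the sed expression:

```python
import re

# Group 1 captures the first three octets together with their dots;
# the last octet is matched outside the group so it can be dropped.
IP_PATTERN = re.compile(r"((?:\d+\.){3})\d+")

def mask_ips(text):
    """Replace the last octet of anything that looks like a dotted quad with 'xx'."""
    return IP_PATTERN.sub(r"\1xx", text)

sample = '192.168.1.77 - - "GET / HTTP/1.1" 200 (via 10.0.0.5)'
print(mask_ips(sample))
# 192.168.1.xx - - "GET / HTTP/1.1" 200 (via 10.0.0.xx)
```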
sometimes thinking in simple terms is hard ;)
Please don’t slurp in the whole file at once unless you’re completely certain you will never have a large logfile (we’re dealing with multi-gigabyte logfiles at work…). I rather liked the original approach of reading it in line by line. Just do
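A rough sketch of that line-by-line approach, not necessarily the snippet the commenter had in mind: apply `re.sub` to each line as it is read, so only one line is held in memory at a time. The function name `obfuscate_file` is made up for illustration, and text mode with the default encoding is an assumption about the logfile.

```python
import re

# First three octets in group 1; the fourth octet is replaced by "xx".
IP_PATTERN = re.compile(r"(\d+\.\d+\.\d+)\.\d+")

def obfuscate_file(src_path, dst_path):
    """Copy src_path to dst_path, masking the last octet of each IP-like match."""
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        for line in src:  # iterating a file object yields one line at a time
            dst.write(IP_PATTERN.sub(r"\1.xx", line))
```

With the filenames from the post this would be called as `obfuscate_file("logfile.log", "komserver01.log.temp")`. The `with` block also closes both files, which the interactive version has to do by hand.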
Also: that regex isn’t particularly good, in the “will catch other things that aren’t IPv4 addresses” sense, but it’s probably good enough for this.
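If the false positives ever matter, one possible tightening (a sketch, not something anyone in this thread posted) is to restrict each octet to 0-255 and to refuse matches that are glued to further digits or dotted numbers. The names `OCTET`, `STRICT_IPV4` and `mask_ipv4` are made up for illustration.

```python
import re

# One octet in the range 0-255, without leading zeros.
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"

# Group 1 holds the first three octets with their dots. The lookarounds reject
# candidates that directly touch more digits or another dotted number.
STRICT_IPV4 = re.compile(
    r"(?<!\d)(?<!\d\.)((?:" + OCTET + r"\.){3})" + OCTET + r"(?!\d|\.\d)"
)

def mask_ipv4(line):
    """Mask the last octet of strings that are plausible IPv4 addresses."""
    return STRICT_IPV4.sub(r"\1xx", line)

print(mask_ipv4("client 192.168.1.77 connected."))
# client 192.168.1.xx connected.
print(mask_ipv4("build 1.2.3.4.5, host 10.0.0.256"))
# build 1.2.3.4.5, host 10.0.0.256
```

A genuine four-part version number such as 1.2.3.4 still looks exactly like an address and would be masked too; no regex can tell those apart without context.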
it’s a python application where this functionality needs to be implemented ;)
No problem ;)
(p.s. What’s the point of the OpenID login? It just told me I was blocked.)