It is impossible to ignore Avro at work - it is the data serialization format
of choice (and rightly so), whether the data is stored in Kafka
or in our document database, Espresso. Recently, I needed to read
Avro data serialized by a Java application, and I looked into how I might use
Python to read such data.
Having worked with Python for a while, I am trying to pick up Ruby, especially
for some of my work with Logstash. While trying out a small program in Ruby, I
got stumped by a peculiar trait of Ruby hashes with default values. It cost
me an hour of my life that I am not going to get back. :(
OK, I don't particularly like calling a bug fantastic; in this case, it is
more the troubleshooting of the bug that was fantastic. What I found interesting
was how the layers were peeled away one by one to reach the probable region of
the root cause. (Yeah, the root cause is probably so esoteric, and confined to
such a specific combination of versions, that it is unlikely to be looked at by
anybody.)
After more than a month of tireless research and testing, we have finally got
to the bottom of our ZooKeeper mystery. Corruption during AES encryption in
Xen v4.1 or v3.4 paravirtual guests running a Linux 3.0+ kernel, combined with
the lack of TCP checksum validation in IPSec Transport mode, which leads to
the admission of corrupted TCP data on a ZooKeeper node, resulting in an
unhandled exception from which ZooKeeper is unable to recover. Jeez. Talk
about a needle in a haystack… Even after all this, we are still unsure where
precisely the bug lies. Despite that fact, we’re still pretty satisfied with
the outcome of the investigation. Now all we need to do is work around it.
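To make the checksum part of that chain concrete: the TCP checksum is the 16-bit one's complement sum described in RFC 1071, and a receiver that validates it would reject a segment in which even a single bit got flipped. Below is a minimal Python sketch of that idea. The payload is made up for illustration and this is not code from the investigation itself.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's complement checksum as described in RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold the carries back in
    return ~total & 0xFFFF


# Hypothetical payload, standing in for data sent to a ZooKeeper node.
payload = b"hypothetical zookeeper payload"
corrupted = bytearray(payload)
corrupted[3] ^= 0x01  # flip a single bit, as a corruption bug might

sent = internet_checksum(payload)
received = internet_checksum(bytes(corrupted))
print("checksum mismatch detected:", sent != received)
```

With validation skipped, nothing performs this comparison, so the corrupted bytes are handed to the application as if they were good.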
I have a confession to make. Hollywood has always fascinated me. Not because of
the larger-than-life stories they come up with. But because of the enormous
machinery that churns out a movie. To the utter frustration of my family, I
always stay back at the end of a movie, looking at all the credits which flash
by - to see the rest of the iceberg under the tip: the thousands of people who
made this movie happen, of whom only a fraction get the worldwide
adulation, though all of them were needed to make it happen.
Apple has patented a piece of technology which would allow government and
police to block transmission of information, including video and photographs,
from any public gathering or venue they deem “sensitive”, and “protected from
In other words, these powers will have control over what can and cannot be
documented on wireless devices during any public event.
And while the company says the affected sites are to be mostly cinemas,
theaters, concert grounds and similar locations, Apple Inc. also says “covert
police or government operations may require complete ‘blackout’ conditions.”
And those who think that this is not coming to Android in the future are
deluded. If Apple manages to get this technology into the field, it is only a
matter of time before Android handset manufacturers are forced to incorporate it
as well. If the technology exists, in today's post-9/11 world, it is difficult
to resist government pressure on such matters.
Of course, it would be interesting to see the security features for this tech,
as it is very likely to be abused - by repressive governments (read: every
one) as well as criminal enterprises (recording-free drug zones, anybody?).
Who said the field of security cannot have humour? An Android app that
controls the commode in Japan (you know, the
land of fully programmable toilets, I kid you not) has a reported
vulnerability because the Bluetooth pairing code is hardcoded.
Curious about several peculiar Apple-related 404 errors for images in my web
server logs, I decided to find out what was going on, and learned
yet another nugget that I really didn't want to know. (sigh)
I just read a rather disturbing article from Sophos security. The
article describes the NSA's interpretation of the law and some of the internal
policies that it uses in surveillance.
They also reveal that courts don't always determine who's targeted for
surveillance because that discretion is practiced by the NSA's own analysts,
with only a percentage of decisions being reviewed by regular internal audits.
To make those decisions, NSA analysts use information including IP addresses,
potential targets' statements, and public information and data collected by
In the absence of such information - for example, if a potential target
is using online anonymity services such as Tor, or sending encrypted email and
instant messages - agents are encouraged to assume that the target is outside
This is the part that needs to be emphasized again and again - all this hullabaloo
in the USA about the NSA's surveillance is about snooping on American citizens. If
you are not one, you have no rights at all, and the NSA has no limits on what it can
sniff out about you and how long it can keep that info. I know, it is pretty much
common sense, but when I see Indians getting all worked up about this
revelation, I sometimes feel that some of them don't get this.
So, coming back to the article: if an American is using Tor or
encrypted email or encrypted chat messages, then unless he has been
positively identified as a US citizen, he will be treated like a foreign
person - essentially with no rights.
And this part is interesting:
If communication is encrypted - particularly if a US person is using certain
types of cryptology or steganography known to have been used by "individuals
associated with a foreign power or foreign territory” - the NSA is free to
collect it and store it "indefinitely" for future reference and cryptanalysis
That is a loophole right there in my opinion - will they still keep the crypto
data if they already have the means to crack it? :-)
High Scalability had an interesting link today about a project that combines
Raspberry Pi, btsync and Owncloud to create
essentially a personal Dropbox replacement with none of the costs or storage
limits. Just as importantly, given the hot topic of the day, you get the
peace of mind of knowing that you are not making it easy for intelligence
agencies to go through your most important and personal data.
The players in this solution are:
btsync: A still-alpha lab product from the original BitTorrent
creators, which allows you to securely sync a folder between multiple devices
that you own. Ready-to-use binaries are provided for all the major platforms
(desktop and mobile) as well as several ARM architectures (which is where
Raspberry Pi comes in). The UI is not great, which is probably why
the next piece of the puzzle comes in - Owncloud. But if you really want just the
basics, this is all the software that you need for a synchronized folder
among multiple devices.
Unfortunately, btsync is not open source software.
So it is entirely up to you whom you trust more - Dropbox or BitTorrent Inc.
btsync is reported to phone home for version checks
and to upload anonymized stats. I have looked around; btsync doesn't have any
open source competition yet.
Owncloud: This is actually a standalone application for sharing
your files via a Dropbox-like web interface. It has an extensive list of
features - sync between devices, multiple user support, file
versioning, undelete, Lucene-based search, shared calendar, tasks,
data migration/backup and many more. Most importantly, this is open source
software, with all the code available on GitHub.
One question that came to my mind after reading the feature set is that
Owncloud already has a multiple-device file sync feature. So
why would you need btsync?
From what I have read on the net, it seems that btsync is considered to be
more reliable as a file sync client. So the idea is to use btsync everywhere,
and on one of the devices, use Owncloud to provide the interface to
serve/edit files over the web.
So how does Raspberry Pi - the overnight micro-computing sensation -
fit into all this? The answer lies in the way BitTorrent works. For transfers
to happen for a torrent, you need at least one seed up with the complete data.
Since btsync is essentially multiple torrents bunched together, it needs a seed
as well. And if all your devices are mobile and not always on, there is a good
chance that when you need a file, none of the other devices are up and
you are cut off from your data.
The solution is simple: keep one of the btsync devices always running,
essentially acting as the seed for your data. If this always-on computer
is a tiny box hanging off a wall socket and burning a mind-numbingly low 6 watts,
well .. you can see the appeal of R-Pi.
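To put rough numbers on that "good chance", here is a back-of-the-envelope sketch in Python. The uptime fractions are invented for illustration, and the calculation assumes each device is online independently of the others, which real devices (all asleep at night, say) will not strictly satisfy.

```python
from math import prod


def chance_no_peer_online(uptime_fractions):
    """Probability that every listed device is offline at a random moment.

    Assumes each device is online independently of the others.
    """
    return prod(1.0 - p for p in uptime_fractions)


# Hypothetical uptime fractions for a laptop, a phone and a work desktop.
mobile_only = [0.30, 0.50, 0.35]
# The same devices plus one always-on box (assumed 99% uptime).
with_always_on = mobile_only + [0.99]

print(f"cut off, mobile devices only: {chance_no_peer_online(mobile_only):.2%}")
print(f"cut off, with always-on seed: {chance_no_peer_online(with_always_on):.2%}")
```

Under these made-up numbers, the odds of finding no peer drop by a factor of a hundred once a single always-on device joins the folder, which is the whole argument for parking a low-power box on the network.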
But I already have an always-on device - my Synology NAS, which also
happens to be an ARM device. So to try it out, I downloaded the PPC version of
btsync and tried to run it - no luck. The btsync binary is built against glibc 2.4
while the NAS firmware ships glibc 2.3. btsync uses inotify on glibc 2.4
and therefore will never support glibc 2.3, so I am out of luck here:
./btsync: /lib/libc.so.6: version `GLIBC_2.4' not found (required by ./btsync)
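A quick way to check for this kind of mismatch before downloading a binary is to ask Python which C library the box is running. This is a minimal sketch, assuming Python is available on the device. `platform.libc_ver()` is in the standard library, while `parse_version` and `glibc_at_least` are helper names made up here.

```python
import platform


def parse_version(text):
    """Turn a dotted version such as '2.3.6' into a comparable tuple (2, 3, 6)."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def glibc_at_least(required, found=None):
    """Return True if the glibc version `found` satisfies `required`.

    When `found` is omitted, the version is detected from the running
    interpreter; a non-glibc or undetectable libc counts as a failure.
    """
    if found is None:
        lib, found = platform.libc_ver()
        if lib != "glibc" or not found:
            return False
    return parse_version(found) >= parse_version(required)


print("detected libc:", platform.libc_ver())
print("glibc 2.4 or newer:", glibc_at_least("2.4"))
```

Comparing tuples rather than strings matters here, since as strings "2.17" would sort before "2.4".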
The one thing I am not yet comfortable with on the Raspberry Pi is its lack of a
shutdown switch. Raspberry Pi is perfect for headless usage, and with a USB wifi
dongle, the only wire it needs is the charger. However, to shut it down properly,
you cannot just turn it off. Just like any other Linux machine, you need to
execute the shutdown command, which will unmount the filesystems cleanly before
turning off the machine. Mess this up, and you will end up with a filesystem
which needs an fsck on bootup, and the machine will not boot until you attach a
keyboard and console to fsck the filesystem.
Till I get myself a hack to shut down the R-Pi headlessly in a clean and convenient
way, I am just not too comfortable using it for serious applications, let alone
letting it touch my precious data. There is a nice discussion on the Raspberry Pi
forums that I need to read up on to do this, and a few blogs (like
this) already provide various ways to do that. I just need to
find some time to go through all that.
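For what it is worth, the general shape of such a hack is simple: a small daemon waits for some trigger and then runs the normal shutdown command. Here is a minimal Python sketch. The trigger is deliberately left as a plain callable because I have not picked one yet. A push button wired to a GPIO pin would be the obvious candidate, but that part is an assumption and is not shown here.

```python
import subprocess
import time

SHUTDOWN_COMMAND = ["sudo", "shutdown", "-h", "now"]


def wait_and_shutdown(trigger, poll_seconds=0.5, run=subprocess.run):
    """Poll `trigger` until it returns True, then run a clean shutdown.

    `trigger` is any zero-argument callable (hypothetical example: a
    function that reads a button on a GPIO pin). `run` is injectable so
    the logic can be exercised without actually halting the machine.
    """
    while not trigger():
        time.sleep(poll_seconds)
    return run(SHUTDOWN_COMMAND, check=False)


# Example wiring, left commented out on purpose:
# wait_and_shutdown(lambda: button_is_pressed())  # button_is_pressed is hypothetical
```

The script would need to start at boot and run as a user allowed to call shutdown via sudo without a password prompt, which is also an assumption about the setup.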