Wednesday, 18 June 2014

Three weeks into Coding - Progress / Roadblocks

Well, this was supposed to be written when I was two weeks into coding. But it slipped my mind, and my mentor has kindly reminded me to write it. It's almost four weeks now, but I will just talk about my experiences in the first three weeks.

When I wrote the last blog post, I hadn't even begun coding for GSoC and was all excited to start. And now, it's been three weeks. I can't use the cliched phrase "It's been a long journey", but it has been a nice journey. Let me first describe what I have done so far before going into more philosophical discussions.

Beginning with week 1. My task for the week was to implement the framework of the Markov Network. I had to define all those little functions for adding nodes, removing edges, setting observations, accessing potentials and so on. I had to take care to ensure that we don't pass integers as node names, that we don't access an observation that was never set, and so on. Well, to be fair to the previous developers of pgmpy, I copied quite a few things from the code for BayesianModels. So, thanks, previous developers of pgmpy!
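To give a flavour of what such small functions and checks look like, here is a minimal sketch. The class name `TinyMarkovNetwork` and all of its methods are hypothetical and made up for illustration; this is not pgmpy's actual interface.

```python
class TinyMarkovNetwork:
    """Illustrative undirected model (hypothetical, not pgmpy's class)."""

    def __init__(self):
        self.neighbors = {}     # node name -> set of adjacent node names
        self.observations = {}  # node name -> observed state

    def add_node(self, name):
        # Node names must be strings, so integers are rejected up front.
        if not isinstance(name, str):
            raise TypeError("node names must be strings")
        self.neighbors.setdefault(name, set())

    def add_edge(self, u, v):
        # Edges are undirected, so both endpoints record each other.
        self.add_node(u)
        self.add_node(v)
        self.neighbors[u].add(v)
        self.neighbors[v].add(u)

    def remove_edge(self, u, v):
        self.neighbors[u].discard(v)
        self.neighbors[v].discard(u)

    def set_observation(self, name, state):
        if name not in self.neighbors:
            raise KeyError(name)
        self.observations[name] = state

    def get_observation(self, name):
        # Reading an observation that was never set is an error.
        if name not in self.observations:
            raise ValueError(f"no observation set for {name!r}")
        return self.observations[name]
```

With this sketch, `add_node(1)` raises a `TypeError`, and asking for the observation of a node that was never observed raises a `ValueError` instead of silently returning nothing.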

One mistake I made in this process was writing my own class for representing factors, when one already existed in the only package I had overlooked while going through the code. Ankur told me about it, so I had to change my code to work with the existing Factor class.

The next week was supposed to be spent building the independence relations for Markov Networks. However, I felt like working on triangulation and junction trees, and hence started on that module instead. The coding part was very interesting, as it involved implementing algorithms. I implemented a lot of things: checking whether a graph is triangulated, triangulation heuristics, the JunctionTree class, and functions to create the junction tree from the triangulated graph.
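As an illustration of what a triangulation heuristic does, here is a minimal sketch of the well-known min-fill heuristic in plain Python. It is not the code from the PR; the function name and the adjacency-dict representation are assumptions made for this example, and node names are assumed to be sortable (for instance strings).

```python
from itertools import combinations


def min_fill_triangulate(adjacency):
    """Return the fill-in edges chosen by the min-fill heuristic.

    `adjacency` maps each node to the set of its neighbours. Adding the
    returned edges to the original graph makes it triangulated (chordal).
    """
    # Work on a copy so the caller's graph is left untouched.
    work = {node: set(nbrs) for node, nbrs in adjacency.items()}
    fill_edges = set()

    def missing_edges(node):
        # Pairs of neighbours of `node` that are not yet connected.
        return [(u, v) for u, v in combinations(sorted(work[node]), 2)
                if v not in work[u]]

    while work:
        # Eliminate the node whose neighbourhood needs the fewest new edges.
        node = min(sorted(work), key=lambda n: len(missing_edges(n)))
        for u, v in missing_edges(node):
            work[u].add(v)
            work[v].add(u)
            fill_edges.add((u, v))
        # Remove the eliminated node from the working graph.
        for nbr in work[node]:
            work[nbr].discard(node)
        del work[node]
    return fill_edges
```

For a four-node cycle a-b-c-d-a, the sketch eliminates `a` first and adds the single chord between `b` and `d`, which is enough to triangulate the cycle.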

A long series of events followed when I submitted the PR. First, I had not commented the code or added docstrings, because I expected a lot of comments about changing the interface and thought it best to postpone the commenting until the interface and code were fixed. However, I received a (rather angry) mail from Abinash asking me why I hadn't commented the code. I gave him my reasons. He also told me that many of the functions I had written were already in the standard networkx package. Ah, and I was heart-broken. All the effort spent on writing those maximum spanning trees, finding the cliques and checking whether the graph was triangulated had just gone to waste. I might accept my mistake in not using the maximum spanning tree code of networkx (but then it was 15 lines, and I must have written that code at least 3 times before), but who would imagine networkx hiding such specific functionality (finding cliques in chordal graphs, finding the tree width of chordal graphs, etc.) inside its deep seas of code, as standalone functions that you wouldn't even find if you looked through the methods of a graph object. Anyway, all that code had to be replaced by these library functions for maintainability.
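For reference, here is a minimal sketch of how these library functions fit together. It assumes a recent networkx release (the API at the time of writing may have differed), and the helper name `junction_tree_from_chordal` is made up for this example; it is not pgmpy's JunctionTree code.

```python
from itertools import combinations

import networkx as nx


def junction_tree_from_chordal(graph):
    """Build a junction tree (as an nx.Graph of cliques) from a chordal graph."""
    if not nx.is_chordal(graph):
        raise ValueError("graph must be triangulated first")

    # Maximal cliques of the chordal graph become the junction tree nodes.
    cliques = [frozenset(c) for c in nx.chordal_graph_cliques(graph)]

    # Connect cliques that share variables, weighted by the overlap size.
    clique_graph = nx.Graph()
    clique_graph.add_nodes_from(cliques)
    for a, b in combinations(cliques, 2):
        shared = a & b
        if shared:
            clique_graph.add_edge(a, b, weight=len(shared))

    # A maximum-weight spanning tree of the clique graph is a junction tree.
    return nx.maximum_spanning_tree(clique_graph, weight="weight")


# Small chordal example: a triangle a-b-c with a tail c-d.
graph = nx.Graph([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
tree = junction_tree_from_chordal(graph)
treewidth = nx.chordal_graph_treewidth(graph)
```

Note that these are module-level functions such as `nx.chordal_graph_cliques(graph)`, not methods on the graph object, which is why they are easy to miss.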

Once I was done with this, I started commenting the code, adding docstrings and writing test cases. While doing that, I had a big realization: "Copying code was easy (Bayesian Model's framework to Markov Model), but documentation and testing won't be easy", as I would actually have to do all of that myself. (Well, just because I mention copying code, don't start assuming that I copied all of it. I only copied small functions.)

This took up a good part of week 3. Oh man, documentation and testing have two really bad features: 1. They take a lot of time. 2. They are slightly boring to do. But then, as Ankur said, "I will realize the importance of testing and commenting".

Anyway, once I was done with that, I started working on the message passing algorithm and setting up the entire framework, and I am still on it. The next task after this will be to write various inference algorithms using Junction Trees (and yes, documenting and testing too :P ). I hope I have a good time ahead, as I have been having till now.

Signing off for now.
Navin

