Thursday 21 August 2014

The final GSOC post

Since I had delayed the last post, so I am combining two of my blog posts and am going to use this blog post to give you an update of all that has happened in this GSOC (special emphasis on the stuff of the last 4 weeks).

So first question is, where am I right now? In Mumbai. In my college. A stupid answer !
I have finished all the weekly goals of the GSOC project apart from a few things :-

1) I wanted to implement atleast one the optimal triangulation algorithms (using the research paper). The particular algorithm which I had to implement was decided pretty late and I didn't find time to implement it towards the end and hence that is still pending. I hope to do it soon.

2) Alpha - expansion algorithm is due. I have written the code, but there seems to be bug. I will debug it and push the code for alpha-beta sampling soon.
Also, some code reviews are pending and hence I am expecting some work on the code improvement part. But other than that, the work which I was officially expected to do in GSOC is over. 
Now that I am done with my status, let me talk a little bit about what happened since my last blog post (since the virtual time of my blog post, if you may). So for those who had read my last blog post, I was slightly stuck with integrating the factor product code for cython. However, I realized soon that cython could easily take in the python objects directly (duh) and so I easily modified the factor product to work seamlessly with extra data that is stored in factors (this extra data is the assignment of the eliminate variables (the variables which were eliminated while maximizing) corresponding to every combinations of the existing variables). 
Anyway, once I was done with that, the next task was the implementation of specific special-setting inference algorithms. One of them was the "Graph-cut algorithm for MAP in pairwise binary MRFs with submodular potentials ". This particular algorithm requires a max-flow-min-cut algorithm and I was really surprised to see that networkx didn't have a function which returns the exact cut-edges which I needed for this algorithm. I could have used external python libraries for this but this would have created an extra dependency just for one function. So we (I and mentors) decided that it would be best to implement the max-flow algorithm on my own which could then be used for this algorithm. So I implemented this and then implemented the algorithm.

However, I was slightly busy for a week after this with resume submission deadlines and so on in the institute and after that came back to the GSOC work. I implemented the gibbs sampling algorithm first and then went to alpha-expansion algorithm. While implementing I suddenly realized that alpha-expansion algorithm might not be too helpful for practical purposes because there are just too many constraints (All the variables must have equal cardinality. Anyway, I implemented it and have not pushed it yet, as there are a few issues to resolve in this function.
Once I was done with that, I spent some time with code review, examples, test cases and documentation (Well, this is the part that I didn't like. I mean while writing examples, I just had to copy stuff from the examples, change the format etc and it was one of the most boring things which i did as a part of gsoc (which kind of tells you how interesting GSOC was for me, in general ). I don't know if there is a better way of doing this but copy-pasting examples, removing "self." and formatting it doesn't look like the kind of work which doesn't have alternatives in the CS world). I had tried to find out more about it, but couldn't find better methods. If you know of something, please tell me too.

I plan to write another blog about the stuff which I learned from GSOC. However, I think it is best written after a week or so of reflection. :)

That's all for now.
 Navin

Thursday 7 August 2014

Two weeks after mid-term eval

Oh man, Florian Fuchs is really going to kill me for this, but this blog post had completely gone off my mind. Anyway, so here I am writing this post long after it was supposed to be written. So I will just write about the experiences which I had till 13th. (They were interesting experiences for sure). I will also pretend in the blog as if it is 13th today :p

So, starting from where I had finished off in the last post. I was left with a broken factor function and some frustration arising from that. I had posted the issue on our group, but Shashank and Ankur were busy with other stuff and they were not finding time to solve it. So in the last GSOC chat, Shashank told me to write a new factor function itself. He thought that it would be a good exercise in cython to do that on my own and also, it would be very difficult for me to parse the code which Shashank had initially written for the factor product function.

Also, I decided to include a new list to store the MAP values along with the numpy.array which we had for storing the potential values. (Corresponding to each value, there would be the values which the other variables had taken during elimination). There were a lot of tough design decisions here (how to store it : list vs dictionary. Is a list fine? However, I implemented each of the entries as a list of the eliminated variable and stored the entire thing in a list. I am not really sure what to do about the huge expense of the fact that we are storing it in a list. For later.

After this, I pinged shashank again about the product module since it was making testing very difficult for me. So shashank told me to implement it myself since it would be a very good exercise in learning cython for me too. So I implemented this in cython and it was working fine. However, there was this big issue. How to handle the eliminated variable values in the cython function. because they were stored in lists and we could only pass C++ data structures to the cython functions. I am on the lookout for possible ways to handle this.

Anyway, that's all for now. (from the point of view of 13th August).