Portfolio

Developer. Data Analyst.

Download Resume


Outreachy 101 - Initial steps made easy


This article is about how to become an Outreachy participant, what you can gain by contributing to open source, and my experience so far with Mozilla Webcompat through Outreachy.
Outreachy is an open source internship program conducted twice a year (December to March, with an application deadline in mid-October; May to August, with an application deadline in late March). The participating organizations are announced about a month before the application deadline. During this month, applicants need to go through the list of projects offered and find the project that suits their skill set.
If the skill set is an exact match, perfect. But always be ready to learn a thing or two. Then all that has to be done is:

  1. Become familiar with Git commands. Most open source projects use this version control system. You can download the Pro Git book from https://git-scm.com/book/en/v2. You may not have time to complete the book; try going through Chapters 1 and 2, which cover 80% of the commands you will need most often.
  2. If it is a GitHub project, go through the project’s Contributing.md file, then build and run the project. (Good to do: record all the difficulties you encounter while building the project, discuss them with the project members, and add them to the Contributing.md file.)
  3. Once the build is ready, try to use the product. Find a bug or two. File the issues.
  4. Go through the issue list, find a “good first bug”. Submit your patch / pull request.
  5. Usually the Outreachy project page lists the necessary initial task. If the only listed task is to complete a first bug, contact your mentor about the next task; otherwise, work on the listed initial task and submit a pull request. Do not hesitate to contact your mentor or other project team members on the project’s IRC channel when you are stuck at this stage. Lurk in the IRC channels. Subscribe to the project’s mailing list. You can find the list of all Mozilla mailing lists here:
    https://lists.mozilla.org/listinfo. Mozilla has a strong support base for welcoming new contributors to its products. You can simply get started by checking out this cool website:
    http://whatcanidoformozilla.org/
  6. Go through the project folder structure: check how the tests are written, understand how the pull requests you send get merged (Travis), and try to work on ‘good next tasks’.
  7. Final but most important: submit your Outreachy application. Even after submitting your application, keep contributing to the projects. You can contribute to multiple projects. If there are multiple applicants for a project, the project mentor may contact you and point you to another project for which you might be a better fit.

Now to the cool part. The perks:

  1. I got the chance to participate in the Mozilla London All Hands, and I am going to receive a laptop. (Travel and stay are taken care of. You get to meet your mentor and team members, and learn about upcoming product/technology events relevant to Mozilla.)
  2. I get my code reviewed and corrected, with suggestions about best practices. There is a lot of learning.
  3. In addition to the $5,500 stipend, there is a travel allowance of $500. This can be used to attend any conference of your choice and share your experiences. This fund is available for a year after the internship end date.
  4. You have a mentor who spends time guiding you whenever you are stuck. My mentor was very cool; I got the quickest possible responses on IRC to all my queries. This may not be the same for all projects, since mentors can be quite busy. If you are not comfortable, try contributing to a project whose culture you feel you fit into. Most Outreachy projects are very welcoming to newcomers.
  5. You get to pick the tasks you are most interested in. There is no compulsion that the tasks listed on the project page are all you can do. Small efforts that improve the project in subtle ways are also greatly appreciated.
  6. You get to create a Mozillians profile and be vouched for your work. Being vouched means you get access to special content that Mozilla shares only with the community.

Experience with Webcompat, Mozilla:

Webcompat is a platform for getting compatibility issues from all over the internet resolved. Quoting from the wiki:

Webcompat.com allows individuals to easily report site compatibility issues, and allows us to better understand the larger picture of compatibility issues affecting Firefox users on the web.
Web compatibility is about making sure websites work consistently across all browsers and devices. Sometimes, sites have bugs or policies that prevent them from working well in every browser. When Firefox is missing crucial standards features that sites rely on, Webcompat communicates this back to the Gecko Platform.

What I really liked about the team was that they were a close-knit community and extremely supportive during my initial tasks. My initial tasks included a few ‘good first bugs’ and refactoring the error codes in this Flask application.
So far, I have documented the issues specific to my environment that I encountered while building the Flask application. In addition, I am also working with Intern for writing functional tests. More about my work in Webcompat in the next blog article. Happy coding!


How I built this website using Pelican


I thought of documenting the entire process starting from the Pelican installation, but there are already a lot of documents available on the internet for that.

Instead, I decided to document a few issues I encountered while setting up the website.
  • Custom themes made simple:
    I did not like any of the themes available. Most of the themes seemed to add metadata such as date and author to every article I posted, so I wanted to use a custom theme. You can find a lot of custom themes at http://templated.co/. I chose the Transit template. Once I chose the template, my job came down to adding three HTML pages (elements.html, generic.html, and index.html) inside the content folder.

    You can find the theme folder inside the Pelican project. A theme folder consists of two folders: static and templates. The static folder contains the font, css, images, and js folders. I created a new theme folder called my-basic inside the theme folder. Inside my-basic, I created two folders, static and templates. Inside the static folder I put the font, css, images, and js folders I downloaded from the Transit template. Inside the templates folder, I added base.html; this page acts as a holder for all the HTML pages we add into the content.


    Inside the templates folder I also created a folder called include, where I added two files, header.html and footer.html.
    The following are the code changes inside base.html, header.html, and footer.html:

    base.html

    
    
    {% block head %}{% endblock %}

    The above code must be added inside the head tag. The following code must be added to the body:

     {% include "include/header.html" %}
     {% block content %}{% endblock %}
     {% include "include/footer.html" %}

    Now I need to set this theme as my Pelican theme with the following command:
                             pelican-themes --install ~/Dev/Pelican-project-folder/pelican-themes/my-basic --verbose

    You can check if the theme is installed using the command:
    pelican-themes -l
    Now we need to change the theme setting in pelicanconf.py:
                            THEME = '/home/username/dev/Pelican-project-folder/pelican-themes/my-basic/'
    Now run make html and reload the siteurl and voila!

  • How to set a static html page as my default home page in pelican?

    The next problem I encountered was making a static HTML page my home page. If we look at the output folder, we can see that Pelican auto-generates the index.html page. Another important observation is that the names of the files generated in the output folder are based on the titles of the HTML pages in the content folder.


    In pelicanconf.py, add the following settings:
    INDEX_SAVE_AS = 'venkid_index.html'
    DELETE_OUTPUT_DIRECTORY = True
    INDEX_SAVE_AS saves the auto-generated index.html under a different given name (in this case venkid_index.html) in the output folder. Now change the title of whichever HTML page you want to be the home page to index.
    As usual, run make html and reload the siteurl and voila!

  • After executing the make html command, when you try to display your web page you may encounter the error “Forbidden: You don't have permission to access / on this server.”
    This is because Apache doesn’t have access to the output folder. To solve this, follow the steps below:
    • Navigate to your Apache config file (.conf), usually in the path /etc/apache2/.
    • Open the config file (.conf file) and look for the part:

      <Directory …>
      ..
      </Directory>

    • Paste the code below, pointing at your output folder:

      <Directory /home/username/dev/Pelican-project-folder/output/>
                      Options FollowSymLinks
                      AllowOverride None
                      Require all granted
      </Directory>

    • Restart the Apache server: sudo service apache2 restart
Feel free to clone the project from: https://github.com/deepthivenkat/deepthivenkat.github.io.git
Happy personal website building guys!



Experiments with Generative Adversarial Networks using Keras

Why GANs:

Generative models study the probability density distribution of the original data set and try to mimic data drawn from the same distribution. Some cool applications of GANs:

  1. Training a GAN with image frames from a video can help in predicting the next few frames; since it is time series data, it is like predicting the future!
  2. Mixing features from a set of images to produce new sets that never existed: creating scenic landscapes that do not exist from the most beautiful landscapes of the world, or new puppy breeds that do not exist. We are going to see the puppies example in this blog.
  3. Regenerating missing portions of data: damaged pictures, missing lines in a book or article, missing DNA sequences. The greatest advantage here is that the output is not restricted to a single possible outcome; there is a range of multimodal outputs with associated probabilities. This allows producing different imaginative sequences, like auto-creating animations, artwork, stories, etc.
  4. Providing ideas for designing future products, or visualizing how a product could evolve; these can be produced in parallel and are not dependent on the dimensions of the input.


How do they work?

Generative adversarial networks have two models:

  1. A discriminator model that takes an input and determines whether the input belongs to the original dataset or not.
  2. A generator model that trains on making data that is indistinguishable to the discriminator.

The specialty of GANs is that the data need not be labeled. The model can regenerate missing/blurry portions of the data and train on them.

The generative model takes the loss and optimization function from the discriminator as input and tries to improve the data it produces. In general, both the discriminator and the generator are updated after each iteration to make better decisions. However, when both the discriminator and the generator are multi-layer perceptrons, no explicit feedback is necessary, since both models are trained using backpropagation and dropout.
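The alternating update scheme described above can be sketched as a simple training loop. The update and sampling functions here are hypothetical placeholders standing in for the real backpropagation steps, not part of any specific library:

```python
# Sketch of the alternating GAN training loop described above.
# sample_data, sample_noise, train_discriminator, and train_generator are
# hypothetical placeholders for the real sampling and update steps.
def train_gan(num_iterations, batch_size, sample_data, sample_noise,
              train_discriminator, train_generator):
    d_losses, g_losses = [], []
    for _ in range(num_iterations):
        x = sample_data(batch_size)    # minibatch of real data
        z = sample_noise(batch_size)   # minibatch of noise for the generator
        # D is updated to tell real samples from generated ones ...
        d_losses.append(train_discriminator(x, z))
        # ... then G is updated to fool the current D
        g_losses.append(train_generator(z))
    return d_losses, g_losses
```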

The discriminator and generator play a two-player minimax game in the following way, using multi-layer perceptrons as the generator and discriminator models (adversarial networks):

  • θg and θd represent the parameters passed to the multilayer perceptrons used for the generator and discriminator respectively.
  • The generator function G(z; θg) is a differentiable function, where z is a random noise variable.
  • The discriminator function D(x; θd) outputs D(x), the probability that x belongs to the input data.
  • The input to the discriminator is x, where x can come either from the data set or from the generator.

In the minimax two-player game, the goal of the generator is to make the discriminator output true for the data it generates. To make the discriminator accept its data as belonging to the input set, it needs to maximize D(G(z)) and therefore minimize log(1 - D(G(z))).

The goal of the discriminator is to learn to be accurate in its prediction of whether the data belongs to the input set or to the generator. It tries to improve the accuracy of its D(x) predictions.

D tries to make D(G(z)) near 0, while G tries to make D(G(z)) near 1. In this way, the generator and discriminator control each other's loss functions. If the generator's loss is less than the discriminator's loss, the generator is doing better than the discriminator: it is generating images that fool the discriminator.

Here the key is to find the equilibrium point where both learn equally. If one is trained more than the other, the model may not function well.
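The two competing objectives above are what the original GAN paper writes as a single minimax value function:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]
```

The discriminator maximizes V by pushing D(x) toward 1 and D(G(z)) toward 0; the generator minimizes V by pushing D(G(z)) toward 1.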

Challenges in designing generator density function

The generator model should be able to capture the complexity of the data through the maximum likelihood estimate. The generator's loss function should control the changes made to the model after each iteration.
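For reference, the maximum likelihood estimate mentioned above chooses the parameters that make the training data most probable under the model:

```latex
\theta^{*} = \arg\max_{\theta} \;
  \mathbb{E}_{x \sim p_{\mathrm{data}}}\left[\log p_{\mathrm{model}}(x;\, \theta)\right]
```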

Parameter tuning for better fake images:

  1. Minibatch discrimination means feeding the discriminator a batch of samples rather than individual samples. This helps the discriminator distribute its focus evenly over all samples. A minibatch can further be optimized to contain mixed samples of images from both the original data and the generator, which helps improve the discriminator's prediction accuracy in the long run.
  2. Normalizing the data to between -1 and +1, and sampling the noise from a Gaussian distribution instead of a uniform distribution.
  3. Using the Adam optimizer for the generator helps in obtaining better images than other optimizers.
  4. DCGAN works best for most image generation. Here is the output for the puppy data set using DCGAN. The first image is a result from another blog that used the RGB values of a puppy image set. In this blog, we will use a greyscale puppy image dataset; the results are shown in the second image. The code can be found on GitHub: https://github.com/deepthivenkat/DCGAN. The results are better compared to other simple composite generative adversarial models.
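Items 1 and 2 above can be illustrated with a short numpy sketch; the array shapes and variable names here are my own, chosen for illustration:

```python
import numpy as np

# Stand-in for a minibatch of 8-bit greyscale images: (batch, height, width, channels)
images = np.random.randint(0, 256, size=(64, 28, 28, 1)).astype('float32')

# Normalize pixel values from [0, 255] into [-1, 1] (matches a tanh output layer)
images = images / 127.5 - 1.0

# Sample the generator's noise input from a Gaussian instead of a uniform distribution
z = np.random.normal(0.0, 1.0, size=(64, 100))
```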

Image outputs from the generative model of DCGAN: RGB

Image outputs from the generative model of DCGAN: GREYSCALE LOW RESOLUTION IMAGES

In this blog, we are going to look at two different GAN models. First, let us look at a GAN built on a sample 1-D normal distribution and see how the generator model tries to fit the data distribution after every iteration. Later we will discuss how the deep convolutional generative adversarial model works on the MNIST dataset and on a greyscale puppy dataset, along with the results.

Let us look at the generator and input data functions. Here z (the noise fed to the generator) is a minibatch of M random draws from -5 to 5, whereas the input data x is a minibatch of M draws from a one-dimensional normal distribution:

                        import numpy as np

                        mu, sigma = -1, 1
                        xs = np.linspace(-5, 5, 1000)
                        x = np.random.normal(mu, sigma, M)
                        z = np.linspace(-5.0, 5.0, M) + np.random.random(M) * 0.01

where M is the minibatch size.

The normal transformation function makes use of tanh activations. The base learning rate for the SGD optimizer was 0.001. The same optimizer was used for both the generator and discriminator networks.

The following are the data function and the generator function before training of the generator begins:

However, at the end of ten thousand iterations, the shape of the generator function changes to fit the input function, as seen below:

Hyper parameter optimizations

To quicken convergence, the minibatch data from both the input and the generator was sorted before being fed into the discriminator. Now let us look at the DCGAN model we are using for the MNIST dataset.
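The sorting trick can be sketched with the same sampling code used for the 1-D example (M is the minibatch size):

```python
import numpy as np

M = 12  # minibatch size
x = np.random.normal(-1, 1, M)                               # real data minibatch
z = np.linspace(-5.0, 5.0, M) + np.random.random(M) * 0.01   # generator input minibatch

# Sorting both minibatches roughly aligns them by quantile before they
# reach the discriminator, which was observed to speed up convergence
x_sorted = np.sort(x)
z_sorted = np.sort(z)
```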

Generator Model Summary

Discriminator Model Summary

Hyper parameter optimizations

  1. The DCGAN discriminator model performed better when the activation was changed to LeakyReLU instead of ReLU.
  2. Adam was used as the optimizer.
  3. The categorical cross-entropy loss was used for all the models: GAN, discriminator, and generator.
  4. All the input image data and the noise data from the generator were normalized before being fed into the discriminator model.
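The difference between ReLU and LeakyReLU in item 1 is that LeakyReLU lets a small gradient through for negative inputs instead of zeroing them, which helps keep the discriminator from saturating early in training. A minimal numpy illustration:

```python
import numpy as np

def relu(x):
    # Standard ReLU: negative inputs are zeroed out
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.2):
    # LeakyReLU: negative inputs are scaled by a small slope alpha
    return np.where(x >= 0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 1.0])
print(relu(x))        # negative values become 0
print(leaky_relu(x))  # negative values become -0.4 and -0.1
```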

The plot of discriminator and generator model losses with each iteration is as follows:

Some of the generated MNIST images from the generator model after the training is done:

MNIST digits are flat numbers on a 2D plane. Now let us try a slightly more complex example with the same model. The generator and discriminator model loss plots for 100 epochs with a batch size of 256 are:

The generated puppy images for the low-resolution (28×28×1) greyscale puppy dataset were:

Future Work

The greatest advantage of GANs is that the model can handle multidimensional image densities and generate high-resolution images. If high-dimensional input data, e.g. 4096×4096×3, could be maintained and fed into the model, the generated images could also be of high resolution. The number of training epochs for the generator, the discriminator, and the combined GAN model can also be increased.

Conclusion

Deep convolutional networks perform much better than simple composite generative models built on tanh transformations. Batch normalization helps the discriminator focus on the normalized structure of the input data instead of specific examples.

References

  1. https://arxiv.org/pdf/1406.2661.pdf
  2. https://arxiv.org/pdf/1701.00160.pdf
  3. http://blog.evjang.com/2016/06/generative-adversarial-nets-in.html
  4. http://www.iangoodfellow.com/slides/2016-12-04-NIPS.pdf
  5. https://github.com/soumith/ganhacks
  6. https://github.com/goodfeli/adversarial
  7. http://blog.aylien.com/introduction-generative-adversarial-networks-code-tensorflow/
  8. https://arxiv.org/pdf/1606.03498.pdf
  9. https://arxiv.org/pdf/1511.06434v2.pdf