Saturday, December 22, 2018

Building a chatbot using TF-IDF


We want to build a basic chatbot that trains on previous messages and responses. In this tutorial we look at the math we use to convert the messages and their associated responses into weights using term frequency and inverse document frequency (tf-idf).

Once we have the appropriate weights for the words present in the messages and responses, we write each message and response as a vector of those weights. We then measure how similar these vectors are using cosine similarity.

We multiply the term frequency by the inverse document frequency to obtain the final weight of each word, which is then used to construct the vector.
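As a rough illustration of that multiplication, here is a minimal sketch in plain Python. It assumes one common textbook definition (term frequency as count divided by document length, inverse document frequency as log(N / document frequency)); the post does not state the exact formula, and scikit-learn uses a smoothed variant. The function name `tf_idf_weights` and the sample messages are made up for this example.

```python
import math
from collections import Counter

def tf_idf_weights(documents):
    """Return one {word: weight} dict per document (unsmoothed tf-idf)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            # tf = count / document length, idf = log(N / document frequency)
            word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights

# Hypothetical training messages.
messages = ["how are you", "how is the weather", "tell me a joke"]
weights = tf_idf_weights(messages)
print(weights[0])
```

In the first message, "are" ends up with a larger weight than "how", because "how" also occurs in the second message and is therefore less distinctive.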

Cosine Similarity:
This is a measure of orientation and not magnitude. We ignore the magnitude of the vectors because it grows with the length of the query or response, and that length tells us nothing about how similar the query is to the messages in our training data.

The angle tells us the direction in which a vector points. Thus, if the query has only 5 words and the message has 500 words, but the words carry similar relative weights, the two vectors point in the same direction and are considered similar.

We choose cos(theta) because it is a monotonically decreasing function on [0, pi/2]: the smaller the angle between two vectors, the larger the similarity. We use the dot product to calculate cos(theta), that is, cos(theta) = (A . B) / (|A| |B|), as shown in the figure.
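To make the "orientation, not magnitude" point concrete, here is a small sketch in plain Python. The helper name `cosine` and the three toy vectors are assumptions made for this example, not part of the original code.

```python
import math

def cosine(u, v):
    """cos(theta) between two equal-length vectors via the dot product."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero vector has no direction
    return dot / (norm_u * norm_v)

short_query = [1.0, 2.0, 0.0]       # a short query
long_message = [100.0, 200.0, 0.0]  # same direction, 100x the magnitude
unrelated = [0.0, 0.0, 3.0]         # shares no weighted words with the query

print(cosine(short_query, long_message))
print(cosine(short_query, unrelated))
```

The first pair scores close to 1 despite the difference in length, while the second pair scores 0 because the vectors share no non-zero component.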


In this tutorial we walk through the code. The libraries used are scikit-learn and NumPy.

The full code is available on GitHub.

First we import the following libraries.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import TfidfVectorizer
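Here is a minimal sketch of how these imports might fit together. It is not the code from the GitHub repository: the sample messages, the responses, and the `reply` helper are made up for illustration, and it relies on `TfidfVectorizer` with its default settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical training data: each message has one associated response.
messages = ["hello how are you", "what is the weather today", "tell me a joke"]
responses = ["I am fine, thanks.", "It is sunny.", "Knock knock."]

# Learn the vocabulary and tf-idf weights from the previous messages.
vectorizer = TfidfVectorizer()
message_vectors = vectorizer.fit_transform(messages)

def reply(query):
    """Return the response whose message is most similar to the query."""
    query_vector = vectorizer.transform([query])
    scores = cosine_similarity(query_vector, message_vectors)  # shape (1, n_messages)
    return responses[scores[0].argmax()]

print(reply("how are you doing"))
```

Words in the query that never appeared in the training messages are simply ignored by `transform`, so the match is decided by the words the query shares with each stored message.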

Friday, December 21, 2018

(College Education - 1) Data Analytics for Teachers/Students.

Disclaimer: the views presented in this article are personal. 

Initially, I was going to write a rant on how teachers are shit in colleges and continue the age-old blame game. In this game, the teachers think that students are stupid or uninterested, and students think that teachers don't know how to teach. It is true (to a certain extent, ofc), but the problem is that no one ever addresses it. No one thinks of any innovative methods that can be adopted to address what's wrong. Most people (including me, and including both teachers and students) are absorbed in their own self-interest, and to some extent rightfully so.

Before I propose the solution, I would like you to go through my line of thought.

Teachers of today feel inclined to play entertainers rather than knowledge imparters. In this information age, fuelling curiosity is far more important than imparting knowledge. Students need to be introduced to concepts in a way that makes the learning process heuristic, enabling them to relate those concepts to life and to the applications around them, and to have a positive impact using that knowledge.

A lot of students tend to blame the syllabus, but I disagree. I think that the syllabus is well defined and in accordance with the given branch of study. The reason most students feel disengaged from the syllabus is that they are unaware of the possibilities it holds. As students move away from immersive learning and focus only on the parts necessary to get better grades, the whole idea of a model student and a model teacher changes drastically. A model teacher becomes one who makes sure that knowledge (or the method involved in its dissemination) is transmitted to students in a way that helps them remember it for a duration often limited to the exam period. If a teacher can assign tasks that lead to good marks, then they are a model teacher (and hence seen as diligent in their duties), and a model student becomes one who duly completes the tasks assigned to them: the students who are regular, sincere and complete everything on time.

Let us consider the problems that arise because of this.
  • Less than 10% of students/teachers fall into the model student - model teacher zone.
  • There is little accountability, and there are few deliverables, on the teachers' part.
  • Independent thinking by students is not given proper importance.
  • Fuelling and engaging with the community (online forums) is more important than completing the assigned tasks.
  • Holistic development is not treated with the same sincerity as imparting knowledge.

Solution: Proper Data Analytics for students and teachers. 

1.) Actionable insights for teachers and students.

Teachers often do not have time for every student, and students struggle needlessly with things that could be understood quickly. By enabling the collection of proper data (for both students and teachers), the following actionable insights can be generated.

2.) Regular after-class tests instead of end-semester / mid-semester examination patterns.

In order to create real-time data for analysis and actionable insights for teachers and students, it is important to create data points on a short-term basis. This would also allow machine learning techniques such as reinforcement learning to come into play and interact with students, thus reducing the workload for teachers.
Not just that, more data points would result in more accountability on the teachers' part.

3.) Venn-like diagram for multi-discipline projects, and grading on the basis of those projects.

I personally think this would be super cool if implemented. The idea is to use the graphical representation shown below to grade projects.

Here is how it could work.

  • The radius of a circle would be determined by the number of topics covered by the project.
  • The colour of the circle would be determined by how deeply the person understands the topic. A darker shade would imply better understanding.
  • Community comments (feedback) from people who have expertise in that area would also be listed for every project.
  • A deep learning model would estimate the employability value of these projects, using the above data as input parameters.

4.) Rewarding in-depth knowledge and understanding in a unique way.

Instead of assignment submissions (which have been reduced to handwriting practice for the majority of students), assignments should include engaging with an online community (such as StackExchange or Medium) on different topics of interest. The idea is to reinforce students' interests instead of adding work pressure. By communicating with a community, students would feel more appreciated for their work than they do now.

5.) Incorporating extra-curricular activities (sports) as an important part of a system. 

There is nothing more important than sports. Playing a sport consistently should carry some weightage in all educational institutes, in every field, as it teaches teamwork, risk-taking and communication.

Tuesday, December 4, 2018

Installing Anaconda, Running Jupyter on Google Cloud Remotely

I was just using Google Colab when I realised it cannot really replace a remote server with a GPU. It is super awesome if you are trying to collaborate on a notebook with multiple authors, but it does not really give you the flexibility of a terminal. There is only a certain extent to which "!" can go. Had Google Colab provided a virtual instance, it would have been super.
It is already amazing that they provide GPUs and TPUs completely free of cost. It is too much to ask for free shell access too, and if they gave it, it would be hard for them to clamp down on activities such as mining or torrenting, so people would be making money on hardware meant for educational purposes.

This post is about how to set up NGINX along with Jupyter Notebook.

1.) Let us first install Anaconda by downloading it from here.

Once you have installed Anaconda on your virtual machine, it is time to install nginx and make sure it is running.

2.) Start jupyter notebook using the following command. Copy the link

3.) Go to terminal and type the following command.

Wednesday, September 5, 2018

A happy teachers' day! Startups in Colleges.

This post was made by one of my LinkedIn connections (Ramesh Kumar).

Transcript of that post:
Startups by Engineering Students
I am glad to see so many Engineering students getting interested in having their startups while studying. And every University has an Entrepreneur wing! There is huge interest, which is good. But, I do not see many startups coming into existence! Why?

#1. Focus - They seem to lack focus. They get one 'great' idea today and even before they do something on this, they get 'another greater' idea. And it is a loop.

#2. lack of time - They are students and are expected to attend classes, prepare for exams, practicals etc. They may find it difficult to spend more time on their venture. They knew this and can not give this as excuse for shutting down the project.

#3. Unrealistic Projects- If they see and experience the pain point of some one and if they find a solution, they are likely to succeed. Many times they seem to identify projects which they can not complete!

#4. Lack of Industry Support - Few companies are investing in the talent in the colleges to get their projects done by students along with the Professors

#5. Glamour - 'Entrepreneurship' seems to be a glamour word for most of them. They seem to enjoy living in that glamour for a short time and then get back to fun life!

I found this assessment super accurate, and it got me thinking about what can be done to solve these issues and have more students pursue entrepreneurship. Here are the solutions that I think are feasible.

#1 The reason students lack focus is that nothing is at stake for them if the startup does not happen. They can simply write it off as a failure rather than fight an uphill battle. Another reason for the lack of focus is that there are no continuous, gamified benchmark ratings for startups. I think giving stars to startups based on benchmarks agreed upon by the e-cells and the startups would help student startups be more productive. (Hey, are you a 2-star startup or a 3-star startup?)

#2 Lack of time is very real. Some allowance in attendance should be given for project work.

#3 I think proper brainstorming sessions are needed that revolve not only around the novelty of the idea but also around practical things like the amount of time required, the amount of money required, and the importance of generating proper numbers in terms of deliverables. A proper document should be created and updated on a weekly basis.

#4 I think a long-term platform of communication between colleges and companies should exist (instead of speaker sessions), where professors, CEOs and students can share their opinions (a dedicated college site). If that is too hard to accomplish, maybe a group of students can list all the events in a specific niche that are happening in the country and forward the list to students interested in that niche.

#5 When work comes into the picture, glamour shits itself.

In closing, I think it is very important to introduce an X vs. time graph for every startup. X may change over time, but the startup needs to keep updating its progress and the direction in which it is heading.

Wednesday, August 8, 2018

Rick Sanchez on School

I'll tell you how I feel about school, Jerry: it's a waste of time. Bunch of people runnin' around bumpin' into each other, got a guy up front says, '2 + 2,' and the people in the back say, '4.' Then the bell rings and they give you a carton of milk and a piece of paper that says you can go take a dump or somethin'. I mean, it's not a place for smart people, Jerry. I know that's not a popular opinion, but that's my two cents on the issue.

This is one of those shows that you can rewatch a million times and still find something fascinating. The first time you watch it, you are like "Oh! Is that what it meant... I think I get it." The second time you watch it, you are like "Holy crap! This is genius." The third time, you cannot stop laughing.

Here is what I think Rick might say about college:
"It is petty capitalism rolled up to brainwash students' minds, Jerry. It'ssss *burp* a make-believe world with make-believe leaders that hand down scribbled notes or something to mass-produce people with alter egos. Not worth ruining a person's prime years, Jerry. Not at all worth it. What is the opposite of Wubba lubba dub dub?"

Sunday, July 8, 2018

Sacred Games : Short Review

Just completed binge-watching the show. It is a great series with amazing plot twists and surprises. It runs as parallel stories of past and present. It is about one man's sins and another man's redemption. You keep guessing whose sins and whose redemption the plot wants to show you. It tries to play devil's advocate to some measure in its portrayal of Ganesh Gaitonde's character.

Both characters complemented each other, and for some reason skinny, short Nawaz seems completely believable as a mafia head.

I would give it 9/10 for story, 9/10 for acting and 8.5/10 for direction and cinematography. Some of the scenes were thrilling and action-packed. The series would definitely keep you on the edge of your seat. Must watch.

Friday, June 22, 2018

How to use Redis for caching data in Rails

Why is Redis used?

This is a very basic question, and it comes up because, functionally, you could do with PostgreSQL what Redis does. What Redis does is store key-value pairs in memory, and most of its basic operations (such as GET and SET) run in O(1) time (check the Redis documentation for the time complexity of each command). Caching thus enables retrieval of data at much higher speeds, and Redis is well suited to tasks such as pub/sub and queries that need quicker access.

Redis Configuration 

Initialization - redis/config/ini_redis.rb
redis_host = Rails.application.secrets.redis && Rails.application.secrets.redis['host'] || 'localhost'
redis_port = Rails.application.secrets.redis && Rails.application.secrets.redis['port'] || 6379

# The constant below represents ONE connection, available globally in models, controllers, views etc. for this instance, so there is no need to open a new one every time
REDIS = Redis.new(host: redis_host, port: redis_port.to_i)

model.rb file - models/user.rb

def online?
You can then use it in a view like this, views/users/show.html.erb
@users_ol = User.where(:id => @id).select(&:online?)