AI Experiments & Thoughts

Exploring the frontiers of Artificial Intelligence, Data Engineering, and System Design.

12/7/2017
Text Classification – Classifying product titles using Convolutional Neural Network and Word2Vec embedding
Text classification helps us better understand and organize data. I tried building a simple CNN classifier using Keras, with TensorFlow as the backend, to classify products available on eCommerce sites. The data for this experiment are product titles from three distinct categories on a popular eCommerce site. Reference: Tutorial. tl;dr: Python notebook and data. Collecting Data …
5/31/2016
Productionizing a CRF model, Recipe Ingredients Tagger in Action.
Steps involved in productionizing a statistical model.
2/19/2016
Structuring text – Sequence tagging using Conditional Random Field (CRF). Tagging recipe ingredient phrases.
Building a food graph is an interesting problem. Such graphs can be used to mine similar recipes and to analyse relationships between cuisines and food cultures. The NYTimes blog post “Extracting Structured Data From Recipes Using Conditional Random Fields” could be an initial step towards building such graphs. In an attempt to implement the …
8/17/2015
Setting up python development environment with buildout
Attn: Check out Conda before trying this. “Buildout is a Python-based build system for creating, assembling and deploying applications from multiple parts, some of which may be non-Python-based. It lets you create a buildout configuration and reproduce the same software later.” – buildout.org. I’ve documented the steps required to create a simple buildout-based project. Start by …
12/22/2014
Locality sensitive hashing (LSH) – Map-Reduce in Python
I’ll try to explain LSH with the help of Python code and the map-reduce technique. It is said that “There is a remarkable connection between minhashing and Jaccard similarity of the sets that are minhashed.” [Chapter 3, 3.3.3, Mining of Massive Datasets] Jaccard similarity: J(A, B) = |A ∩ B| / |A ∪ B|, where A and B are sets. J = 0 if A and B …
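To make the connection between minhashing and Jaccard similarity concrete, here is a minimal sketch using only the standard library. It is not the code from the post: the function names, the salted-MD5 hash family, the default of 100 hash functions, and the choice to return 0.0 when both sets are empty are all assumptions made for illustration.

```python
import hashlib


def jaccard(a, b):
    """Exact Jaccard similarity: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumption: treat two empty sets as having similarity 0.0.
        return 0.0
    return len(a & b) / len(union)


def _salted_hash(seed, item):
    # Hypothetical hash family: MD5 of "seed:item", read as an integer.
    digest = hashlib.md5(f"{seed}:{item}".encode("utf-8")).hexdigest()
    return int(digest, 16)


def minhash_signature(items, num_hashes=100):
    """One minimum hash value per hash function; items must be non-empty."""
    items = set(items)
    return [min(_salted_hash(seed, x) for x in items) for seed in range(num_hashes)]


def estimate_similarity(sig_a, sig_b):
    """Fraction of signature positions where the two minhashes agree."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)
```

The fraction of positions where two signatures agree approximates the exact Jaccard similarity of the underlying sets, and the approximation tightens as `num_hashes` grows. In a map-reduce setting, signatures would typically be computed in the map step and compared within buckets in the reduce step.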
4/27/2013
Clustering Text – Map Reduce in Python
Here I’m sharing a simple method to cluster text (product titles) based on key collision. Dependencies: python-levenshtein, stemming, NLTK corpora/stopwords. My input file is a list of 20 product titles. The idea is to split the data into meaningful clusters so that each can be given as a small input to various systems (de-duplication or …
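As a rough illustration of key-collision clustering, the sketch below groups titles that normalize to the same key. It is not the post's implementation: it skips stemming and Levenshtein distance, uses a tiny hard-coded stopword set instead of the NLTK corpus, and the names `fingerprint`, `mapper`, and `reducer` are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: a tiny stand-in stopword list, not the NLTK corpus.
STOPWORDS = {"a", "an", "and", "for", "of", "the", "with"}


def fingerprint(title):
    """Build the collision key: lowercase, tokenize, drop stopwords, sort, dedupe."""
    tokens = re.findall(r"[a-z0-9]+", title.lower())
    return " ".join(sorted(set(tokens) - STOPWORDS))


def mapper(titles):
    """Map step: emit a (key, title) pair for every title."""
    for title in titles:
        yield fingerprint(title), title


def reducer(pairs):
    """Reduce step: collect titles that share the same key into one cluster."""
    clusters = defaultdict(list)
    for key, title in pairs:
        clusters[key].append(title)
    return dict(clusters)
```

Calling `reducer(mapper(titles))` returns a dict that maps each key to its cluster of titles. Titles that differ only in word order, case, punctuation, or stopwords collide on the same key; anything fuzzier would need stemming or an edit-distance pass.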