Fine-tuning locality-sensitive hashing

Locality-sensitive hashing (LSH) allows fast retrieval of similar objects from an index. It is orders of magnitude faster than exhaustive search, at the cost of some additional computation and some false positives and false negatives. In the last post I introduced LSH for angular distance. In this one I will show you how to fine-tune it to get the results you expect.

Locality-sensitive hashing for angular distance in Python

Locality-sensitive hashing (LSH) is an important group of techniques that can vastly speed up the task of finding similar sets or vectors.

Text indexing in Python - mapping text to values using finite-state transducers

In the previous posts I wrote about finite-state automata (FSA). Now we’ll cover finite-state transducers (FST), which make it possible to index text together with associated values in libraries such as Elasticsearch.

Text indexing in Python - constructing FSA from unsorted input

In this post we’ll take a closer look at a Python implementation of the algorithm for constructing finite-state automata from an unsorted set of words.

Text indexing in Python with minimal finite-state automata

Have you ever wondered how Lucene/Elasticsearch does its job so well? This post will teach you about an essential part of the Lucene index: the minimal finite-state automaton (FSA).

Ensemble learning - stacking models with scikit-learn

Ensembling is an ML technique in which we combine multiple learning algorithms to get better performance than could be obtained from any of the algorithms alone.

Setting ulimits for Docker processes and containers in Ubuntu (and possibly other distros)

Learn how to set the maximum amount of system resources that can be allocated to running Docker processes and containers in Ubuntu.