Learning to (Retrieve and) Rank — Intuitive Overview — Part I

Michele Trevisiol
Job&Talent Engineering
7 min read · Mar 21, 2017


A real-world candidate/job-opening ranking framework at Jobandtalent (JT)

This is a very high-level introduction to how candidates (documents) are retrieved and ranked given a new job (query). We are going to share and discuss part of the framework we have built at Jobandtalent, along with some of the ongoing projects we are working on to improve it. With this post, we aim to help you build your own system to start solving similar challenges.

For the record, Jobandtalent is a platform that aims to find the best employees for companies and the best employment for candidates, taking care of the whole funnel, from the search to the signing of the contract.

This first article discusses the topic of Text Understanding, in particular how to process the job position (or the query, if we are speaking about search) to gain some insight into its content.

Fig1. Jobandtalent Framework Overview.

Fig1 shows an overview of the framework we're building at Jobandtalent to retrieve and rank candidates. Some steps are already used in production, while others are still under development, and basically everything is under constant improvement. In any case, this post will give you an idea of where we are heading.

Between the moment a job is created and the moment a set of candidates is retrieved, many things happen, and we can summarise them in three main phases:

  • Text Understanding. Among the information that comes with a new job, the textual data is by far the most important, and thus it's critical to understand and process it in the correct way. Here, Natural Language Processing (NLP) techniques come to help.
  • Candidate Retrieval. Once we know what we're looking for, there are multiple ways to search for it, and we need to design the infrastructure so that it can support our needs. Information Retrieval is required.
  • Candidate Ranking. Now that we have the (possibly huge) set of candidates, we want to rank them so that the most relevant ones are placed higher. Learning to Rank techniques are exactly what we need.
Fig2. From Text Understanding to Candidate Ranking.

Let’s discuss these steps with simple explanations and a few examples.

Text Understanding

Let’s assume we receive the following input query (e.g., job search):

“Senior Ruby developer working remotely”

In order to understand a sentence, we need to identify the meaning of its words. A common approach is to work at the term level and automatically identify which category each word fits into. The categories are pre-defined and should be designed to help us structure the information the text contains. For example, let's assume we can extract the following classes:

Fig3. Example of job position terms classified — note that “Ruby developer” can be classified as a job_position on its own if we also consider multiple terms together (n-grams: sequences of n words).

We can then exploit these classes to better retrieve candidates: for example, using the location to filter candidates by their geographic preferences, prog_lang (i.e., programming language) to emphasize developers who have those skills, seniority to select candidates with the right level of experience, and finally job_position as the role that we are looking for.
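
As a rough illustration, the classified query could be represented as a small structured object and then turned into retrieval constraints. The class names and the filter/boost split below are just an assumption for the example, not our actual schema:

```python
# Hypothetical representation of the classified query
# "Senior Ruby developer working remotely".
classified_query = {
    "seniority": ["senior"],
    "prog_lang": ["ruby"],
    "job_position": ["ruby developer"],
    "location": ["remote"],
}

def build_retrieval_constraints(classes):
    """Turn the extracted classes into (illustrative) retrieval constraints."""
    return {
        # Hard filter: only candidates matching the geographic preference.
        "filters": {"location": classes.get("location", [])},
        # Soft signals: boost candidates matching these skills/roles.
        "boosts": {
            "skills": classes.get("prog_lang", []),
            "role": classes.get("job_position", []),
            "experience_level": classes.get("seniority", []),
        },
    }

print(build_retrieval_constraints(classified_query))
```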

There is a set of techniques known as Named-Entity Recognition (NER) that handles this type of task. There are fairly good libraries available on the Web for a quick start; two well-known examples are Apache OpenNLP and Stanford NER. The workflow to train them is a classic one for supervised Machine Learning problems, which we can summarise as follows:

Fig4. Example of a training pipeline for a NER model with OpenNLP.

First, prepare the dataset considering the algorithms and the settings that we are going to use. For example, in order to recognise a specific entity (e.g., a job_position) we need to be sure that it appears enough times in our dataset; otherwise, certain algorithms won't be able to learn how to recognise it.

The annotation step is very time-consuming, but it's the most important one. Here we create the “ground truth” the algorithm will use to learn what to recognise and when. Unfortunately, there are no ready-to-use datasets for our domain, meaning that we need to build the whole annotated set by ourselves. This is a long (and quite boring) manual task that deals with homonyms/heteronyms, typos, ambiguities, incomplete descriptions, etc. Moreover, our domain has its own terminology and often a mixture of languages, e.g., job_position values in Spain often contain English terms as well. All of this makes the classification extremely tricky, and that's the reason why a manual step will guarantee a better ground truth.
Here are some common examples:

• "Retail Events Assistant": this can be identified as a single job_position but it's extremely specific and if we considered it as a single entity it might be hard to find candidates.• "Team Manager": this is a very generic job definition and it doesn't really contain enough information to identify the job_position (unfortunately there are plenty of these cases).• "Pizza Chef": the word "pizza" is critical to distinguish the type of job, this candidates might have very different skills than a "Sushi Chef" for example.• "Pizza Delivery Driver": the term "pizza" in this case is more noisy than helpful.

There are terms that should be classified as part of a job_position in some cases (“pizza” in “pizza chef”), while discarded in others (“Delivery Driver” without “pizza”). These are examples where manual annotation makes the difference. For example, an algorithm should understand that the term chef, when surrounded by certain nouns (e.g., pizza chef, sushi chef, sous chef) or when followed by certain prepositions and nouns (e.g., chef de partie), denotes a different job_position, and it should also recognise when the surrounding nouns should not belong to the same class (e.g., head chef, full time chef, event chef). Manual annotation allows us to reach this level of accuracy; it's an investment of time and resources, but it is definitely worth the effort.
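
To make the annotation step a bit more concrete, here is a tiny, purely illustrative sample of manually annotated sentences in the <START:label> ... <END> style used by OpenNLP's TokenNameFinder (the labels are our own and the sentences are made up):

```python
# Illustrative annotated sentences for NER training (OpenNLP-style tags).
annotated_sentences = [
    # "pizza" is part of the job_position here: it distinguishes the role.
    "We are looking for a <START:job_position> Pizza Chef <END> for our new restaurant .",
    "Experienced <START:job_position> Sushi Chef <END> needed in <START:location> Madrid <END> .",
    # "pizza" adds more noise than signal here, so only "Delivery Driver" is annotated.
    "Pizza <START:job_position> Delivery Driver <END> wanted , part time .",
    # Very generic: still annotated, but hard to match to specific candidates.
    "<START:job_position> Team Manager <END> for the retail area .",
]
```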

Finally, there is the model training step, which, if well designed, can be pretty straightforward. It's always recommended to tune the parameters with cross-validation (i.e., splitting the data in a way that reduces the chances of over-fitting the model) and to use metrics that suit our classification problem well. For example, when dealing with job_position we want to minimise the number of missed terms (otherwise we will have a very hard time finding related candidates), so we are interested in emphasizing Recall over Precision, and the F-beta score (with beta > 1) comes in handy.
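
As a quick sketch of why a beta greater than 1 helps, here is a toy term-level evaluation using scikit-learn (the library and the labels are just an assumption for the example):

```python
from sklearn.metrics import fbeta_score, precision_score, recall_score

# Toy term-level labels: 1 = job_position, 0 = other.
y_true = [1, 1, 1, 1, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]  # the model misses two job_position terms

print("precision:", precision_score(y_true, y_pred))  # 1.00
print("recall:   ", recall_score(y_true, y_pred))     # 0.60
# beta > 1 weights recall more than precision, penalising the missed
# job_position terms that would hurt candidate retrieval downstream.
print("F2 score: ", fbeta_score(y_true, y_pred, beta=2))  # ~0.65
```
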
The methods we use will determine the complexity of the feature engineering task. You can think of a feature as a value that gives us information about some characteristic of the data. In our case, if we want to classify a term as a job_position, the surrounding terms and their types (nouns, adjectives, etc.) might be useful features. These features strongly depend on the algorithm we are going to use. For example, using a method based on Conditional Random Fields (e.g., Stanford NER), we can select the sliding window of terms to consider, design POS-tag sequences (e.g., an adjective followed by a noun), and capture many other characteristics (terms ending in -ing, plurals, etc.). The contribution of each feature needs to be studied and evaluated in order to find a good setup with respect to the final accuracy of the model. For more information, check out this overview of variable and feature selection.
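
To give an idea of what such features look like in practice, here is a minimal sketch using sklearn-crfsuite (a different CRF implementation than Stanford NER, used here purely for illustration); the feature set, labels, and training data are simplified assumptions:

```python
import sklearn_crfsuite  # assumption: pip install sklearn-crfsuite

def word2features(sent, i):
    """Features for the i-th token: the word itself, its shape, and a +/-1 window."""
    word = sent[i]
    features = {
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),
        "word.suffix3": word[-3:],   # captures e.g. terms ending in -ing
        "word.isdigit": word.isdigit(),
    }
    if i > 0:
        features["prev.word.lower"] = sent[i - 1].lower()
    else:
        features["BOS"] = True  # beginning of sentence
    if i < len(sent) - 1:
        features["next.word.lower"] = sent[i + 1].lower()
    else:
        features["EOS"] = True  # end of sentence
    # POS tags (adjective followed by a noun, etc.) could be added here as well.
    return features

def sent2features(sent):
    return [word2features(sent, i) for i in range(len(sent))]

# Tiny, purely illustrative training set (token sequences + per-token labels).
train_sents = [
    ["Senior", "Ruby", "developer", "working", "remotely"],
    ["Pizza", "Chef", "needed", "in", "Madrid"],
]
train_labels = [
    ["B-seniority", "B-prog_lang", "B-job_position", "O", "B-location"],
    ["B-job_position", "I-job_position", "O", "O", "B-location"],
]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
crf.fit([sent2features(s) for s in train_sents], train_labels)
print(crf.predict([sent2features(["Sushi", "Chef", "in", "Barcelona"])]))
```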

We have discussed a possible flow that can be implemented to extract insights (classes) from our initial text (query). In the next article, we are going to see how these insights can be used to retrieve a pool of candidates.

Ensemble of NERs

A little aside about a more advanced approach that we're building at Jobandtalent, based on the votes of different NER models. We basically train multiple models (as we've seen in Fig4) and/or each model multiple times with different parameters. The idea is to build a series of models where each one learns slightly differently from the data, with the assumption that, in the end, the votes cast will lead to better final accuracy.

Once each model classifies the input text with its own “experience”, a simple approach is to take the weighted average of the votes to infer the final classification. Another approach consists of training a model that learns customized weights, which is called meta-modeling. See how Stacking works (a popular approach in Kaggle challenges).
Below is a simple representation of our ensemble of NERs.

Fig5 — Ensemble of NERs. Instead of training a single model as we've shown in Fig4, we train many models and settings: using different parameters, with/without external gazetteers, etc., obtaining a big jump in accuracy.

We basically exploit the diversity of the models, with the assumption that considering many different approaches leads to better overall accuracy than listening to just one of them. Apache OpenNLP, for instance, is based on Maximum Entropy modelling, whereas Stanford NER is built upon a related class of techniques called Conditional Random Fields (CRFs). Other models that are becoming popular are based on word embeddings and RNNs. They are all great models that tend to learn the same information, but in different ways.
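
A minimal sketch of the weighted-vote idea follows; the model names and weights are made up, and in practice the weights could come from each model's validation accuracy or from a stacked meta-model:

```python
from collections import defaultdict

def weighted_vote(predictions, weights):
    """Combine per-token label predictions from several NER models.

    predictions: {model_name: [label_for_token_0, label_for_token_1, ...]}
    weights:     {model_name: vote_weight}
    Returns the highest-weighted label for each token position.
    """
    n_tokens = len(next(iter(predictions.values())))
    combined = []
    for i in range(n_tokens):
        scores = defaultdict(float)
        for model, labels in predictions.items():
            scores[labels[i]] += weights[model]
        combined.append(max(scores, key=scores.get))
    return combined

# Purely illustrative: three models disagree on the second token ("Ruby").
predictions = {
    "opennlp_maxent": ["seniority", "prog_lang", "job_position", "O", "location"],
    "stanford_crf":   ["seniority", "job_position", "job_position", "O", "location"],
    "rnn_embeddings": ["seniority", "prog_lang", "job_position", "O", "location"],
}
weights = {"opennlp_maxent": 0.3, "stanford_crf": 0.4, "rnn_embeddings": 0.3}

print(weighted_vote(predictions, weights))
# -> ['seniority', 'prog_lang', 'job_position', 'O', 'location']
```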

After this little digression, let’s move to the next article to see how to use the information we have inferred from the text in order to retrieve the initial set of candidates.

We are hiring!

If you want to know more about what it's like to work at Jobandtalent, you can read the first impressions of some of our teammates in this blog post, visit our Twitter, or check out my personal one.

Thanks to Sebastián Ortega, Sergio Espeja and Ana Freire for feedback and reviews.
