Artificial Intelligence vs Human Intelligence: we really don’t have to make a choice

Approaches towards synergetic and collaborative interaction

As a researcher and practitioner of machine learning since the '80s, I have a confession to make. I was one of those who often dreamt about the day robots would relieve us of all the monotonous cognitive processing we go through before we get to the end of the day. We would spend our time focussing on grander thoughts, making the world a better place thanks to our boundless creativity. As the years passed and one machine learning project led to another, I had to admit that we had virtually never replaced a human mind in the way I had imagined. In customer care, compliance monitoring or even e-discovery in litigation – you name it – we designed robots, yes, but never ones that took decisions and carried them out all by themselves. What we did do, however, was provide real, tangible support to the people making decisions. My dream has come true – just not in the way I had imagined.

Acknowledging that the Artificial Intelligence engines we create shouldn't work in isolation but instead work with Human Intelligence should be considered a strength, not a weakness. The challenge lies in how these human-machine collaborative systems communicate in order to succeed. Each side needs to express its needs and expectations, and each needs to guide the other. This is my new dream and one of my favourite research goals.

The (disappointing) reality

My team and I put most of our energy into designing and implementing machine-learning based systems that improve the productivity and performance of our business solutions. Our common goal is often to retrieve and organize information that lies in large, unstructured collections of documents, or to propose personalised recommendations in time-varying environments. These are environments where users have changing information needs and where the objects themselves (documents, movies, communities of experts, …) come and go, or may be perceived differently over time. The systems are integrated and rolled out in decision support platforms and workflows. One example is a technology-assisted review (TAR) platform for litigation and compliance monitoring. These platforms typically process millions of documents (emails, memos, plans, …) in preparation for a corporate litigation case and help lawyers or paralegals review them. Another example is improving the product recommendations of a global DVD reseller's website.

Our tools propose things to humans; the humans decide whether to accept or reject the proposals. This distance between the algorithm and the final action is important to keep. In fields such as e-discovery, compliance monitoring and healthcare, the outcomes can have such a huge impact that it is wise, and sometimes even compulsory, to keep a human in the loop. And it would be simply absurd to try to force a customer to buy a product.

The inherent lack of confidence humans have in models and algorithms is another reason. People naturally want to understand what's "inside the box" to increase their confidence in the results, especially in difficult conditions – typically when the input data is far from the kind used to build the model, or even consists of outliers.

Yet another, more subtle reason to prevent the algorithms from having the final word is the time lapse between the moment users express their needs (i.e. what they're looking for, in the form of labelled examples used to train our machine learning models) and the moment the model is applied to real data – a kind of "concept drift" phenomenon. The lapse is usually pretty long, with little interaction between our team – who manages and controls the model – and the end user. So, in the end, the tool doesn't do what the user expected. What users want to retrieve, and the initial training examples they select, are often incomplete, imprecise and subject to change when confronted with the outcome. To complicate things even more, users often don't know exactly what they're looking for; they find that out by interacting with the tools and, in the case of e-discovery, by taking time to explore the document collection. In a nutshell, the ideal situation is to let users interact with the underlying machine learning algorithms in a mutually enriching and "agile" way, with no intermediary.
Up until recently this was practically impossible, but things are changing.

Approaches towards synergetic and collaborative interaction

We took the first step towards this kind of interaction five years ago, in TAR document review for litigation. Traditionally, the attorney or paralegal gives a binary label to each document under review in the TAR platform ("responsive" vs "non-responsive"). It quickly became clear to us that we would get much better machine-learning based classifiers if we allowed the expert user to enter "directional" and "graded" labels. This means that a document that was non-responsive for the case could be graded as "nearly responsive" or "going in the direction of responsiveness"; or, between two non-responsive documents, one could be graded as "a bit more in the responsive direction" than the other. We even allowed conflicting labels when multiple users reviewed the same documents with differing opinions. By adapting the training algorithm to take the conflicting labels into account, as well as the level of expertise of the labellers, we increased the performance of the classifier significantly.
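To make the idea concrete, here is a minimal sketch of how graded labels might feed a learner. The grade-to-value mapping, the function names and the ridge-style least-squares solver are all illustrative assumptions, not the production algorithm:

```python
import numpy as np

# Hypothetical mapping from graded labels to numeric targets (illustrative only)
GRADES = {"responsive": 1.0, "nearly responsive": 0.6,
          "slightly responsive": 0.3, "non-responsive": 0.0}

def fit_graded_scorer(X, grade_labels, l2=1e-2):
    """Fit a linear scorer on graded targets rather than binary ones,
    so 'nearly responsive' documents pull the model in the right direction."""
    y = np.array([GRADES[g] for g in grade_labels])
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])   # add a bias column
    A = Xb.T @ Xb + l2 * np.eye(Xb.shape[1])        # ridge-regularised normal equations
    return np.linalg.solve(A, Xb.T @ y)

def score(w, X):
    """Score documents; higher means closer to 'responsive'."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ w
```

A binary classifier would treat "nearly responsive" and clearly irrelevant documents identically; the graded targets preserve the ordering information the reviewer supplied.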

After this, we extended the ways subject matter experts could introduce other types of prior knowledge, or even finer-grained labels. A typical example is to let the user express "queries", as if they were looking for relevant documents in a collection through a regular search engine. We then use these queries to teach the classifier, simply by using the relevance score of each document in the collection with respect to the query as a highly discriminative feature of the document. This extra feature is added, in an appropriate way, to the other features of the document at both training and test time. In a similar vein, the subject matter expert can introduce a list of terms they consider representative of relevant (or non-relevant) documents. The opposite is possible too: the learning algorithm proposes a list of terms with their "polarity" (relevant vs. non-relevant), and the user confirms or rejects the proposal. The opportunity to highlight text to show what makes a document relevant or not (fine-grained annotation) is another way to improve the machine learning algorithm. In return, at test time, the classifier partly motivates its classification decision by highlighting the passages that contribute most to it. This helps the user understand the underlying model in a simple, accessible way.
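The query-as-feature idea can be sketched as follows. The toy term-overlap scorer stands in for a real search engine's relevance score, and all names here are hypothetical:

```python
import numpy as np

def query_relevance(docs, query_terms):
    """Toy relevance score: fraction of query terms present in each document.
    (A production system would use an actual search engine's score.)"""
    scores = []
    for doc in docs:
        tokens = set(doc.lower().split())
        hits = sum(t in tokens for t in query_terms)
        scores.append(hits / max(len(query_terms), 1))
    return np.array(scores)

def augment_features(X, docs, query_terms, weight=1.0):
    """Append the expert's query-relevance score as one extra feature column,
    used identically at training and test time."""
    rel = query_relevance(docs, query_terms)
    return np.hstack([X, weight * rel[:, None]])
```

Because the expert's query correlates strongly with relevance, the appended column tends to be one of the most discriminative features the classifier sees.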

Giving confidence by motivating a recommendation

Confidence, like trust, comes from understanding, so it's difficult to guide a machine learning system if you don't know what it will do with your guidance. We recently worked on pairing a movie recommender system with profiles extracted from users' past behaviour and with the movie comments of users sharing the same profile. The recommendation algorithm analyses the compatibility points between the user's implicit preference model and the features (both explicit and implicit) of the proposed items, and associates with them the most relevant terms extracted from the comments of users with a similar preference model. These compatibility points are automatically extracted from the users' ratings through a method known as matrix factorisation; they correspond to latent factors that can be interpreted a posteriori as attraction or repulsion towards a movie genre, a movie time period, groups of actors, and so on. This helps the user understand the context and the value of the proposal and, eventually, accept the recommendation.
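A minimal sketch of these "compatibility points": a rank-k factorisation of a toy ratings matrix, where the per-factor products of the user and item latent vectors decompose the predicted rating, and the dominant factor is the one that would later be mapped to interpretable terms. The truncated SVD stands in for the ALS/SGD factorisation typically used on sparse ratings, and the names are assumptions:

```python
import numpy as np

# Toy ratings matrix (users x movies); two clear taste groups
R = np.array([[5., 4., 1., 0.],
              [4., 5., 0., 1.],
              [1., 0., 5., 4.],
              [0., 1., 4., 5.]])

def factorise(R, k=2):
    """Rank-k factorisation via truncated SVD (a stand-in for the matrix
    factorisation methods used on real, sparse rating data)."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    P = U[:, :k] * np.sqrt(s[:k])        # user latent factors
    Q = Vt[:k, :].T * np.sqrt(s[:k])     # item latent factors
    return P, Q

def explain(P, Q, user, item):
    """Per-factor 'compatibility points': the elementwise product of the
    user and item latent vectors. The dominant factor is the one to map,
    a posteriori, to interpretable terms mined from like-minded users'
    comments (genre, time period, groups of actors, ...)."""
    contrib = P[user] * Q[item]
    return contrib, int(np.argmax(np.abs(contrib)))
```

By construction the contributions sum exactly to the predicted rating, so the explanation accounts for the whole score rather than a post-hoc approximation of it.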

A final example of providing users with the rationale behind a prediction is email routing in customer contact centres. By performing a "what-if" analysis we can locally mimic the behaviour of our complex machine learning algorithm and display, in natural language, the rules that led to the routing applied.
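A what-if analysis of this kind can be sketched as a local surrogate: perturb the email's features around the instance, query the black box, and keep the simple threshold rule that best mimics it in that neighbourhood. The routing logic, feature encoding and function names below are all hypothetical stand-ins, not the production system:

```python
import numpy as np

def blackbox_route(x):
    """Stand-in for the complex routing model (hypothetical logic)."""
    return "billing" if x[0] > 0.5 and x[1] < 0.3 else "support"

def local_rule(x0, n=500, radius=0.2, seed=0):
    """Perturb features around x0, query the black box, and fit the
    single-feature threshold rule with the best local fidelity; the rule
    can then be rendered in natural language for the user."""
    rng = np.random.default_rng(seed)
    X = np.clip(x0 + rng.uniform(-radius, radius, size=(n, len(x0))), 0, 1)
    y = np.array([blackbox_route(x) == blackbox_route(x0) for x in X])
    best = None
    for f in range(len(x0)):                    # candidate feature
        for thr in np.linspace(0, 1, 21):       # candidate threshold
            for sign in (1, -1):                # direction of the test
                pred = sign * (X[:, f] - thr) > 0
                acc = (pred == y).mean()
                if best is None or acc > best[0]:
                    best = (acc, f, thr, sign)
    acc, f, thr, sign = best
    op = ">" if sign == 1 else "<="
    return f"routed this way when feature {f} {op} {thr:.2f} (local fidelity {acc:.0%})"
```

A real system would search over richer rule families, but the principle is the same: the explanation is faithful only locally, which is exactly what a reviewer of a single routing decision needs.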


Bi-directional feedback between humans and machines is important to increase users' confidence and is a useful way to improve performance. One of the most important success factors is the design of the graphical user interface that translates and adapts these algorithmic mechanisms into efficient and effective communication tools. This requires a multi-disciplinary approach that takes into account ergonomic, cognitive and psychological factors. Our collaborative touch-based table, DISCO, is a good example of such an implementation. You can learn more about it here. It will be the subject of the next blog post.


Further reading: