In the previous post, Branches of Artificial Intelligence, we discussed the different branches of AI and how they can be classified. If you are new to AI and have not read that post yet, I strongly recommend that you have a look at it, because it gives a valuable overview of AI’s branches that may help you choose your research direction. Trust me, at the time of writing you will hardly find such a comprehensive overview of AI’s branches anywhere else on the Internet.
As you may already know, Machine Learning is nowadays arguably the most important branch of AI: it allows intelligent agents to learn from experience. In this post, I am going to introduce the motivation for Machine Learning and its formal definition.
First of all: What exactly is Machine Learning?
Before answering this question, let’s start with a motivating example: spam email filtering. Most of you probably use at least one email application, for example Google Mail or Yahoo Mail. If you look at your email application, you will find a folder called “Spam”, where all potential spam emails are moved automatically. Concretely, if an incoming email is likely to be spam, it is moved to the spam folder; otherwise it stays in the inbox.
That sounds simple, but the underlying application always has to answer one question: how do we know whether an email is likely to be spam? The following is an example of a spam email, and you have probably seen many similar ones:
Let’s have a look at its body: “Your email address was selected to claim the sum of 500 000 dollars in the 2011 European lottery. To claim your prize, please contact our agent in Lagos, Nigeria.”
That sounds crazy, but you know, sometimes it works. To avoid falling victim to such emails, we need a good filtering strategy. Normally, this kind of spam email can be recognized by certain patterns in its content. Popular patterns include, for example, “discount”, “offer”, “free”, “congratulations, you win something”, or “click here to get something”. Imagine that you were asked to develop an application that filters this kind of spam email: what would you do?
Here is an example of a very simple solution to this problem.
It’s just an example, but I would like to show you the idea of a simple solution that uses neither AI nor Machine Learning. Let’s have a look at this pseudocode: we want to compute the probability that the incoming email is spam, so we first initialize this probability to 0. Then, for each word in the email’s content, we do the following check: if the word is “discount”, we increase the spam probability by, say, 20 percent; if it is “offer”, we increase it by 30 percent; and so on. In general, we prepare a list of spam patterns, then compare each word in the email’s content to each pattern in the list. Finally, we decide that the email is spam if the spam probability is greater than a constant threshold, say 70 percent. So, this is the basic idea of a simple spam filter.
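The idea described above could be sketched in Python roughly as follows. This is only a minimal illustration: the weights for “discount” and “offer” and the 70 percent threshold come from the description, while the other weights and the names SPAM_WEIGHTS, spam_score and is_spam are assumptions made up for this sketch. Strictly speaking, the sum is a score rather than a true probability, so it is capped at 1.0 here.

```python
# Hypothetical pattern list: "discount" and "offer" use the weights from the
# text; the weights for "free" and "win" are invented for illustration.
SPAM_WEIGHTS = {
    "discount": 0.20,
    "offer": 0.30,
    "free": 0.25,
    "win": 0.25,
}
SPAM_THRESHOLD = 0.70  # decide "spam" above 70 percent


def spam_score(content: str) -> float:
    """Add up the weight of every known spam word found in the email."""
    score = 0.0
    for word in content.lower().split():
        word = word.strip(".,!?:;\"'")  # ignore surrounding punctuation
        score += SPAM_WEIGHTS.get(word, 0.0)
    return min(score, 1.0)  # cap the score so it never exceeds 100 percent


def is_spam(content: str) -> bool:
    """Flag the email as spam if its score is above the fixed threshold."""
    return spam_score(content) > SPAM_THRESHOLD


print(is_spam("Free discount offer, click here!"))
print(is_spam("Meeting at noon tomorrow"))
```

The first email contains “free”, “discount” and “offer”, so its score goes above the threshold and it is flagged; the second one contains no known pattern and stays in the inbox.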
However, if we filter spam emails with that kind of solution, we have to update the pattern list manually whenever the spammers modify a pattern. For instance, the phrase “for you” can be varied as “For U”, with the word “you” replaced by the letter “U”; or they can use the number “4” instead of the word “for”; or a combination of the two. In that case, we have to update the pattern list or the code of our simple solution, for example by adding variation checks for each pattern in the list. And we have to repeat this each time the spammers modify or create patterns, which is a really daunting task! So, we can say that this kind of simple solution cannot work effectively.
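To make the maintenance problem concrete, here is a small sketch of a fixed pattern list meeting the variations mentioned above. The names PATTERNS and matches_pattern and the sample sentences are hypothetical, chosen only for this illustration.

```python
# Hypothetical hand-written pattern list containing a single phrase.
PATTERNS = ["for you"]


def matches_pattern(content: str) -> bool:
    """Return True if any known pattern appears in the email text."""
    text = content.lower()
    return any(pattern in text for pattern in PATTERNS)


# The same message written in the four ways described in the text.
variants = [
    "A prize for you",
    "A prize for U",
    "A prize 4 you",
    "A prize 4 U",
]

caught = [v for v in variants if matches_pattern(v)]
missed = [v for v in variants if not matches_pattern(v)]
print("caught:", caught)
print("missed:", missed)
```

Only the original wording is caught. Each of the three variations would need its own entry in PATTERNS, and the list would have to grow again with every new trick.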
Indeed, what we really need in this situation is an intelligent mechanism that can learn spam patterns automatically. With this idea, we no longer have to specify spam patterns manually. And to design such a mechanism, Machine Learning is a great choice.
Now that you have seen the motivation for Machine Learning, I would like to give you its definition. According to Arthur Samuel, Machine Learning is the field of study that gives computers the ability to learn without being explicitly programmed. If you are confused about what “without being explicitly programmed” means, refer to the spam filtering example above: whenever spam patterns change, we have to update the code to deal with the new patterns, but with Machine Learning, the application must be able to learn new patterns without being reprogrammed!
This definition by Arthur Samuel is, however, considered an old definition of Machine Learning.
In academic contexts, people prefer this more formal definition by Tom Mitchell: a computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.
Consider the spam email filtering example above. The problem can be restated as follows: we are given a set of emails that have already been classified as spam or not spam (called the training set). It may contain, say, 1,000, 10,000, or 100,000 emails, depending on your capacity, but the more the better. Then, for any new incoming email, the system must be able to determine whether it is spam or not. In this situation, the task T is detecting spam emails, the performance measure P is the rate of correct decisions, and the experience E is the set of previously classified emails, i.e. the training set.
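To see T, P and E at work, here is a toy sketch in Python. It is not a real spam filter and not a standard algorithm: it is a deliberately naive word-count voter, and all the emails, labels and function names (train, predict, accuracy) are invented for this illustration. The point is only to show the performance P (accuracy on a few new emails) changing as the experience E (the number of labeled training emails) grows.

```python
from collections import Counter

# Experience E: a tiny, made-up training set of labeled emails.
TRAINING_SET = [
    ("win a free prize now", "spam"),
    ("meeting moved to monday", "not spam"),
    ("claim your lottery prize", "spam"),
    ("lunch on monday with the team", "not spam"),
    ("special discount offer for you", "spam"),
    ("project report for the team", "not spam"),
]

# New, unseen emails used to measure the performance P.
TEST_SET = [
    ("free lottery offer", "spam"),
    ("team meeting on monday", "not spam"),
    ("claim your discount", "spam"),
    ("report for the project", "not spam"),
]


def train(examples):
    """Count how often each word occurs in spam and in non-spam emails."""
    spam_words, ham_words = Counter(), Counter()
    for text, label in examples:
        target = spam_words if label == "spam" else ham_words
        target.update(text.lower().split())
    return spam_words, ham_words


def predict(model, text):
    """Task T: each word votes with the counts it had in each class."""
    spam_words, ham_words = model
    words = text.lower().split()
    spam_votes = sum(spam_words[w] for w in words)
    ham_votes = sum(ham_words[w] for w in words)
    return "spam" if spam_votes > ham_votes else "not spam"


def accuracy(model, examples):
    """Performance P: the rate of correct decisions."""
    correct = sum(predict(model, text) == label for text, label in examples)
    return correct / len(examples)


scores = []
for size in (0, 2, 4, 6):
    model = train(TRAINING_SET[:size])  # learn from the first `size` emails
    scores.append(accuracy(model, TEST_SET))
    print(f"E = {size} training emails -> P = {scores[-1]:.2f}")
```

On this toy data, the accuracy is 0.50 with no training emails, 0.75 with two, and 1.00 with four or six. Nobody wrote a pattern list by hand: the word counts were learned from the labeled emails, which is the sense in which the program learns from experience E.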
This spam email filtering example is a Supervised Learning problem, i.e. learning from labeled data (emails that have already been classified). There is also Unsupervised Learning, i.e. learning from unlabeled data. So, how many types of Machine Learning are there, and what are they? These questions will be answered in the next post: Types of Machine Learning.
Thank you for reading & sharing.