ALVANTIA

Introduction to Machine Learning

  • 11/06/2019
  • alvantia (en)
  • Technology

You have almost certainly heard about Machine Learning lately, because this scientific discipline is being applied to more and more fields. There are two main reasons for this: great technological progress, especially in computing capacity, and the large amount of data now available.

Machine Learning is a branch of artificial intelligence that focuses on the study of algorithms that build predictive models from data.

Illustration 1. Structure of Artificial Intelligence

It essentially works as follows: from some training data, a mathematical model is generated based on disciplines such as statistics and algebra. New data are subsequently introduced to this model to make the prediction, as shown in Illustration 2.
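As a minimal sketch of this train-then-predict workflow, the Python below fits a straight line to a few training points using ordinary least squares (a simple model built from statistics and algebra) and then applies it to a new input. The data and the names `fit_line` and `predict` are illustrative assumptions, not part of any particular library.

```python
from statistics import mean


def fit_line(xs, ys):
    """Training: learn slope and intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    # slope = covariance(x, y) / variance(x)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    slope = num / den
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(model, x):
    """Prediction: apply the trained model to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training data that follow y = 2x + 1
train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
train_y = [1.0, 3.0, 5.0, 7.0, 9.0]

model = fit_line(train_x, train_y)  # training phase
print(predict(model, 10.0))         # prediction phase: prints 21.0
```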

In the model training process, the input data are divided into two or three sets. The first, and largest, is used for the training itself; the second is used to test the model; and the third is used for validation. The validation set is optional, since whether it is worth using depends on the amount of data available. As a general rule, more data should be assigned to training than to testing. When an extra validation set is used, it is usually taken from the share reserved for testing. For example, if 60% of the data is allocated to training and 40% to testing, that 40% is split into 20% for testing and the remaining 20% for validation.

Illustration 2. Generic operation of Machine Learning algorithms
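The 60/20/20 split from the example above can be sketched as follows, assuming the data fit in a Python list. `split_dataset` is a hypothetical helper, and shuffling with a fixed seed is just one reasonable choice so that each set is representative and the split is reproducible.

```python
import random


def split_dataset(data, train=0.6, test=0.2, seed=0):
    """Shuffle the data and split it into training, test and validation sets.

    Whatever remains after the training and test shares goes to validation.
    """
    items = list(data)
    random.Random(seed).shuffle(items)  # avoid sets that depend on the original order
    n_train = round(len(items) * train)
    n_test = round(len(items) * test)
    train_set = items[:n_train]
    test_set = items[n_train:n_train + n_test]
    validation_set = items[n_train + n_test:]
    return train_set, test_set, validation_set


train_set, test_set, validation_set = split_dataset(range(100))
print(len(train_set), len(test_set), len(validation_set))  # 60 20 20
```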

There is a great variety of algorithms, which are grouped into three main categories depending on the type of learning that takes place in the creation of the mathematical model. These are explained below.

1. Supervised Learning

In supervised learning, a predictive mathematical model is created by applying certain algorithms to a set of data whose categories are already known. That is to say, each element of the set has an associated label that defines the category it belongs to. The purpose is that, once trained, the model can receive unclassified data and return the category to which they belong.

The generation process is based on classifying the elements of the training set and comparing the result with the label associated with each element. This process is carried out iteratively to adjust the predictive model.
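To illustrate this classify, compare and adjust loop, here is a sketch of a perceptron, a simple supervised classifier chosen only as an example (the article does not discuss it). The AND data set and the names `train_perceptron` and `classify` are hypothetical.

```python
def classify(model, x):
    """Return 1 if the weighted sum of the features is positive, else 0."""
    weights, bias = model
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


def train_perceptron(samples, labels, epochs=20, lr=1.0):
    """Iteratively classify each training element, compare, and adjust."""
    weights, bias = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        errors = 0
        for x, label in zip(samples, labels):
            predicted = classify((weights, bias), x)  # classify
            error = label - predicted                 # compare with the label
            if error != 0:                            # adjust the model
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
                errors += 1
        if errors == 0:  # every training element is now classified correctly
            break
    return weights, bias


# Hypothetical labelled data: the logical AND of two inputs
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 0, 0, 1]

model = train_perceptron(X, y)
print([classify(model, x) for x in X])  # [0, 0, 0, 1]
```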

One of the best-known supervised learning algorithms is based on Decision Trees, which use the “divide-and-conquer” technique to classify input data according to their characteristics/properties. Some decision tree algorithms, such as ID3, are based on probability and use the entropy value, which measures the level of uncertainty or disorder in a set of data. This shows which of the data attributes are most relevant in the decision-making process: the attribute whose split reduces entropy the most is used first. For a problem with two categories, measured in bits, this value is between 0 and 1.

Illustration 3. Representation of a decision tree
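The entropy calculation, and the information gain an ID3-style tree uses to choose the attribute to split on, can be sketched as follows. The client payment data set is invented purely for illustration, and `entropy` and `information_gain` are hypothetical helper names.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy in bits: H = sum(-p * log2(p)) over the categories.

    With two categories it is 0 for a pure set and 1 for an even split.
    """
    total = len(labels)
    probabilities = [count / total for count in Counter(labels).values()]
    return sum(-p * log2(p) for p in probabilities)


def information_gain(rows, labels, attribute):
    """Entropy of the whole set minus the weighted entropy after splitting."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    weighted = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - weighted


# Hypothetical data: does a client pay on time?
rows = [
    {"sector": "retail", "history": "good"},
    {"sector": "retail", "history": "bad"},
    {"sector": "energy", "history": "good"},
    {"sector": "energy", "history": "bad"},
]
labels = ["yes", "no", "yes", "no"]

print(entropy(labels))                            # 1.0
print(information_gain(rows, labels, "history"))  # 1.0 -> best attribute to split on
print(information_gain(rows, labels, "sector"))   # 0.0 -> tells us nothing
```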

2. Unsupervised Learning

In contrast to the previous category, in unsupervised learning the training data set has no labels. Therefore, these algorithms only take into account the attributes/characteristics of the data. Within this category, the best-known algorithms are clustering algorithms, which group similar data together.

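An example of an algorithm belonging to this type of learning is the so-called K-Means, whose purpose is to group data into k groups according to their characteristics/attributes. It measures similarity with the squared Euclidean distance and tries to minimise the sum of squared distances between each element and the centre of its group.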

The algorithm is divided into four phases:

-First, k centres are placed in the vector space in which the algorithm is working. They can be selected in different ways, one of them being at random.

-Once the centres are defined, each data point is assigned to its nearest centre.

-The algorithm then recalculates the position of each centre as the mean of the data assigned to it.

-Phases 2 and 3 are repeated iteratively, either a configured number of times or, ideally, until convergence is reached, that is, until the assignments no longer change.

Illustration 4. Input data
Illustration 5. Grouped data
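The four phases can be sketched in Python as below. This is an illustrative implementation, assuming points are tuples of numbers and that the initial centres are picked at random from the data (one of the options mentioned in phase 1); `k_means` and the sample points are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, max_iterations=100, seed=0):
    # Phase 1: choose k initial centres at random among the data
    centres = random.Random(seed).sample(points, k)
    assignment = None
    for _ in range(max_iterations):
        # Phase 2: assign every point to its nearest centre
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centres[c]))
            for p in points
        ]
        # Phase 4: stop when the assignments no longer change (convergence)
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Phase 3: move each centre to the mean of the points assigned to it
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the previous centre if a group ends up empty
                centres[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centres, assignment


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centres, groups = k_means(points, k=2)
print(groups)  # the first three points share one group, the last three the other
```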

3. Reinforcement Learning

It includes the algorithms that most resemble human learning, since it is based on trial and error. To achieve this behaviour, a reward/punishment function is implemented: instead of labelled examples, the algorithm learns from the values this function returns, which can be discrete or continuous. In this type of learning, the decision-making process that leads to the target matters more than the solution itself.

That is to say, after each decision the algorithm makes on its way to the target, it receives a reward whose value depends on how correct that decision was.

After finding the solution, the quality of the decisions taken is evaluated based on the set of rewards/punishments obtained during the process of solving the problem.

The problems to which this type of algorithm is applied have two main components:

-An agent, which represents the entity that performs the actions/decisions, in this case the algorithm.

-An environment, which represents the context of the problem.
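A minimal sketch of the agent/environment split, assuming a hypothetical one-dimensional corridor rather than a real problem: the environment applies an action and returns a reward (negative for walking into the wall, positive for reaching the exit), and the agent is simply a policy function whose rewards are accumulated over an episode. `CorridorEnvironment`, `run_episode` and the reward values are illustrative, not a standard API.

```python
import random


class CorridorEnvironment:
    """Hypothetical environment: cells 0..length-1, a wall before cell 0, exit at the last cell."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def step(self, action):
        """Apply an action (-1 = left, +1 = right) and return (reward, finished)."""
        target = self.position + action
        if target < 0:
            return -1, False  # punishment: the agent walked into the wall
        self.position = target
        if self.position == self.length - 1:
            return 10, True   # reward: the agent reached the exit
        return 0, False       # neutral move along the corridor


def run_episode(env, policy, max_steps=100):
    """Agent loop: observe the state, decide, receive a reward, accumulate it."""
    total, steps, finished = 0, 0, False
    while not finished and steps < max_steps:
        action = policy(env.position)  # the agent's decision
        reward, finished = env.step(action)
        total += reward
        steps += 1
    return total, steps, finished


rng = random.Random(0)


def random_policy(state):
    return rng.choice([-1, 1])


def always_right(state):
    return 1


print(run_episode(CorridorEnvironment(), always_right))   # (10, 4, True)
print(run_episode(CorridorEnvironment(), random_policy))  # each wall hit lowers the total
```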

In short, this type of learning is intended to optimise the process of finding the solution to a particular problem.

An example of the application of this type of learning can be found in maze-solving algorithms. In this particular case, the environment is the maze itself, with attributes such as its layout, and the agent is the algorithm applied to solve it.

At each step, the agent is rewarded or punished with a value that accumulates. If it moves into a wall, it obtains a negative value as a “punishment”; if it moves along an open path, it is given a positive value that varies depending on how good that path is, as shown in Illustration 7.

When the algorithm reaches the end, the accumulated value of rewards and punishments indicates how good the path it has taken is. Training is considered complete when the optimal path has been found, that is, the path with the highest accumulated reward.

Illustration 6. Maze distribution
Illustration 7. Reward values
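The article does not name a specific algorithm for the maze, so the sketch below uses Q-learning, one standard reinforcement learning method, as a swapped-in example. The maze layout, the reward values (-1 for walking into a wall, -0.1 for an ordinary step so that shorter routes accumulate more reward, +10 for reaching the exit) and the parameters `alpha`, `gamma` and `epsilon` are all assumptions for illustration.

```python
import random

# Hypothetical maze: S = start, E = exit, # = wall, . = free cell
MAZE = [
    "S...",
    ".##.",
    ".#E.",
]
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def find(symbol):
    """Return the (row, column) of a symbol in the maze."""
    for r, row in enumerate(MAZE):
        if symbol in row:
            return r, row.index(symbol)
    raise ValueError(symbol)


START, EXIT = find("S"), find("E")


def step(state, action):
    """Environment: apply an action and return (next_state, reward, finished)."""
    dr, dc = ACTIONS[action]
    r, c = state[0] + dr, state[1] + dc
    if not (0 <= r < len(MAZE) and 0 <= c < len(MAZE[0])) or MAZE[r][c] == "#":
        return state, -1.0, False  # walked into a wall: punishment, stay in place
    if (r, c) == EXIT:
        return (r, c), 10.0, True  # reached the exit: largest reward
    return (r, c), -0.1, False     # ordinary step: small cost so shorter paths score higher


def best_action(q, state):
    """Action with the highest estimated accumulated reward in this state."""
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))


def q_learning(episodes=1000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn Q(state, action) by trial and error."""
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        state = START
        for _ in range(200):  # cap the length of each episode
            if rng.random() < epsilon:
                action = rng.choice(list(ACTIONS))  # explore: try a random move
            else:
                action = best_action(q, state)      # exploit what has been learned
            nxt, reward, finished = step(state, action)
            future = 0.0 if finished else q.get((nxt, best_action(q, nxt)), 0.0)
            old = q.get((state, action), 0.0)
            # Move the estimate toward reward + discounted best future reward
            q[(state, action)] = old + alpha * (reward + gamma * future - old)
            state = nxt
            if finished:
                break
    return q


def greedy_path(q, max_steps=50):
    """Follow the learned policy from the start and record the cells visited."""
    state, path = START, [START]
    for _ in range(max_steps):
        state, _, finished = step(state, best_action(q, state))
        path.append(state)
        if finished:
            break
    return path


q = q_learning()
print(greedy_path(q))  # the learned route from S to E
```

Because each ordinary step costs a little and future rewards are discounted by `gamma`, the route with the highest accumulated reward is the shortest one that avoids walls and the dead end on the left.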

Overfitting and Underfitting

To conclude the article, we would like to mention two problematic situations that can occur during and after model training.

The first problem we will deal with is called Overfitting. It occurs when the model fits the training data so closely that new data of a particular category are not classified correctly unless they have practically the same characteristics as the training data of that category.

For example, suppose the training data of an animal classification algorithm only include the characteristics of one particular dog. If the model overfits, a new dog whose characteristics differ from those in the training data will not be recognised as a dog.

The second case, called Underfitting, is the opposite of the previous one. It happens when the model fits so poorly that it is unable to make an acceptable classification or prediction, even when the new data have characteristics very similar to those of the data used during training.
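Both situations can be illustrated with a deliberately extreme toy example on hypothetical one-dimensional data: a model that simply memorises the training set (overfitting), one that ignores its input and always predicts the most common training label (underfitting), and a simple threshold between the class means for comparison. Real overfitting is usually subtler, for example a decision tree grown until every leaf holds a single training element.

```python
from collections import Counter
from statistics import mean

# Hypothetical data: (feature value, label)
train = [(1.0, 0), (2.0, 0), (3.0, 0), (6.0, 1), (7.0, 1), (8.0, 1)]
test = [(1.5, 0), (2.5, 0), (6.5, 1), (7.5, 1)]


def accuracy(model, data):
    """Fraction of elements whose predicted label matches the real one."""
    return sum(model(x) == label for x, label in data) / len(data)


# Overfitting taken to the extreme: memorise the training set exactly
memory = dict(train)


def overfitted(x):
    return memory.get(x)  # None for anything not seen verbatim during training


# Underfitting: ignore the input and always answer the most common label
majority = Counter(label for _, label in train).most_common(1)[0][0]


def underfitted(x):
    return majority


# A reasonable fit: threshold halfway between the two class means
threshold = (mean(x for x, y in train if y == 0) + mean(x for x, y in train if y == 1)) / 2


def balanced(x):
    return 1 if x > threshold else 0


for name, model in [("overfitted", overfitted), ("underfitted", underfitted), ("balanced", balanced)]:
    print(name, accuracy(model, train), accuracy(model, test))
# overfitted 1.0 0.0
# underfitted 0.5 0.5
# balanced 1.0 1.0
```

Comparing accuracy on the training data with accuracy on unseen test data is what reveals each problem: the memorising model is perfect on the training set but fails on new data, while the underfitted model performs poorly on both.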

If you are interested in knowing more about this scientific discipline, stay tuned! In upcoming articles we will delve a little deeper into the world of Machine Learning.

© 2025 ALVANTIA
