Data Science Glossary
Data Science (DS)
"Doing something with lots of data."
Doing something with data, usually a large amount of it. Note: the term is highly unspecific and covers very different fields.
For example, business analytics with Microsoft Excel or Power BI counts as DS, and ML also has a major DS component. The two are very different, though. Scandio does not do the former, but the latter is very important.
AI (Artificial Intelligence)
"The computer does something that looks smart."
Roughly, the idea of this field is to get a computer to do something that can be perceived as smart. A simple example is a computer that manages to be a very good chess player.
Machine Learning (ML)
"The computer should figure out how to be smart by itself."
ML is a subset of AI. It covers procedures in which the computer becomes smart "by doing some magic itself", that is, by learning from data rather than by following hand-written rules. The aim is to arrive at "artificially intelligent" behaviour.
A main reason why ML is popular is that explicitly telling a computer what "being smart" means is even more difficult than letting it learn.
Data Analytics
Data Analytics denotes any endeavour to investigate a given data set with respect to the information it contains. As a rule of thumb, such data is virtually ubiquitous. Another rule, however, is that statistical findings need to be put into a suitable context to yield real benefit. Thus, expertise in the field where the data originated is just as necessary as statistics.
As a result, Data Analytics is always driven by exchange and dialogue between statistical methods and "local" expert knowledge.
PyTorch and TensorFlow (TF)
Two major software frameworks for machine learning.
PyTorch and TensorFlow are among today's most relevant and popular frameworks for machine learning. They are among our main fields of expertise and thus key tools of choice for our projects.
Model, a.k.a. ML Model
"The thing that makes the computer smart."
A "smart programme" that has learned to do something "smart" in ML fashion. That is, a computer was given a lot of data to learn from.
For instance, an "AI" recognising a face is actually an (ML) model that learned to do so from very large amounts of data.
ML Training
"Feed the data to the computer."
The entire process of developing an ML model, mostly using PyTorch or TensorFlow. ML training easily becomes very extensive as it involves many steps. The aim is to arrive at an ML model that "performs well".
Important: ML training is very data intensive. The more data is available, the better. At the same time, data quality is a huge issue. Data availability and data quality are usually the crucial points of any AI project.
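To make the idea of training a little more concrete, here is a minimal sketch in plain Python. It deliberately uses neither PyTorch nor TensorFlow. It fits a straight line to a handful of invented data points by gradient descent, which is the basic mechanism that real frameworks automate at much larger scale. The function name, the data, and the settings are assumptions chosen for illustration only.

```python
# Toy "training" loop: fit y = w * x + b to example data by gradient descent.
# The data and the hyperparameters are invented for illustration.

def train(xs, ys, learning_rate=0.05, epochs=1000):
    """Return (w, b) that approximately minimise the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Move both parameters a small step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1
w, b = train(xs, ys)
print(f"learned w={w:.2f}, b={b:.2f}")
```

The "model" here is just the pair of numbers w and b. With clean data the loop recovers values close to 2 and 1; with noisy or too little data it would not, which is why data availability and quality matter so much.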
A given ML model is made available to a large group of people.
Training an ML model is a first major step. Making its service available to customers or a large user group is something entirely different. Scandio is particularly good at this transfer. Many people underestimate the importance of making a good model available at large scale.
"Exactly what is going on in production?!"
The performance of ML models needs constant monitoring. This is a major step of its own and it is very important. Work does not stop once you have produced an ML model. Rather, one needs to make sure that it still behaves as expected.
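As a rough illustration of what such monitoring can look like, the following sketch computes the accuracy of a model on successive batches of production data and reports the batches that fall below a minimum. The helper names, the example labels, and the 0.8 threshold are all assumptions made up for this example; real monitoring setups track many more signals.

```python
def accuracy(predictions, labels):
    """Share of predictions that match the true labels."""
    correct = sum(p == t for p, t in zip(predictions, labels))
    return correct / len(labels)

def batches_below(batches, minimum=0.8):
    """Return the indices of batches whose accuracy is below the minimum."""
    return [
        i for i, (predictions, labels) in enumerate(batches)
        if accuracy(predictions, labels) < minimum
    ]

truth = ["cat", "dog", "cat", "dog", "cat"]
# Invented model outputs for three consecutive batches of production data.
batches = [
    (["cat", "dog", "cat", "dog", "cat"], truth),  # 5 of 5 correct
    (["cat", "dog", "cat", "cat", "cat"], truth),  # 4 of 5 correct
    (["dog", "dog", "cat", "cat", "dog"], truth),  # 2 of 5 correct
]
flagged = batches_below(batches)
print(flagged)
```

Only the last batch is flagged here, since the second one sits exactly at the minimum rather than below it.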
"You know, back then it worked."
Use cases and scenarios often change over time. Thus, the resulting data changes, too. ML models usually need to be adapted to such changes.
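One simple way to notice such a change is to compare recent data against the data the model was trained on. The sketch below measures how far the mean of recent values has moved, in units of the standard deviation of the training data. This is only one of many possible checks, and the function names, the sample values, and the threshold of 2.0 are assumptions for illustration.

```python
from statistics import mean, stdev

def mean_shift(reference, recent):
    """Shift of the recent mean, in units of the reference standard deviation."""
    return abs(mean(recent) - mean(reference)) / stdev(reference)

def has_drifted(reference, recent, threshold=2.0):
    """True if the recent data has moved further than the threshold allows."""
    return mean_shift(reference, recent) > threshold

# Invented measurements: the training data and two later samples.
training_values = [10.0, 11.0, 9.0, 10.5, 9.5, 10.0]
similar_values = [10.2, 9.8, 10.1, 9.9]
shifted_values = [14.0, 15.0, 13.5, 14.5]

print(has_drifted(training_values, similar_values))
print(has_drifted(training_values, shifted_values))
```

The first sample looks like the training data and passes; the second has clearly moved and is reported as drifted.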
"Let's make this work again."
If the ML model's performance becomes too poor (for whatever reason), one may decide to retrain the model.
Computer Vision (CV)
"The computer recognises my cat!"
CV is a huge field and one of the "classical" applications of ML, e.g. face recognition. CV offers impressive opportunities but could also have considerable ramifications for daily life, so it is not without controversy in public discourse.
Natural Language Processing (NLP)
Making sense of human language is another major application of ML. The rough idea is to interact with computers via human language, which is surprisingly difficult.
Time Series Forecasting
"Will there be a traffic jam tomorrow when going to work?"
Another major field of ML application. In general, the idea is to take large amounts of historical data and predict future events based on it.
For instance, one may predict traffic jams on Munich's A99 and realise that Monday morning is a bad time to drive to Munich.
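The traffic example can be sketched with a very simple baseline: predict the travel time for a weekday as the average of what was observed on that weekday in the past. This is not an ML model, just a baseline that real forecasting models are typically compared against. The travel times and the function names below are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

def weekday_averages(history):
    """Average the observed values separately for each weekday."""
    by_day = defaultdict(list)
    for weekday, minutes in history:
        by_day[weekday].append(minutes)
    return {day: mean(values) for day, values in by_day.items()}

def forecast(history, weekday):
    """Predict the value for a weekday as the historic average for that day."""
    return weekday_averages(history)[weekday]

# Invented morning travel times in minutes, recorded over two weeks.
history = [
    ("Mon", 55), ("Tue", 35), ("Wed", 30), ("Thu", 32), ("Fri", 40),
    ("Mon", 65), ("Tue", 33), ("Wed", 28), ("Thu", 34), ("Fri", 44),
]
averages = weekday_averages(history)
worst_day = max(averages, key=averages.get)
print(worst_day, forecast(history, "Mon"))
```

With this made-up history, Monday comes out as the slowest day, matching the intuition in the example above.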