Scandio Report: Data Science Edition

Data science is currently on everyone's lips. For us as an IT company, it is not only a very exciting field with great potential for our customers but also a great opportunity to add new talent to our existing expertise.

Scandio is currently growing its data science team, which now consists of 20 experts. Three of them are happy to answer some questions and give insights into the topic: Sana, Flo and Oliver. Find out who exactly these experts are, what data science means to them and what tips they have for anyone interested in the topic.

Since when have you been part of Scandio and what is your role?

Flo: I have been working at Scandio since July 1st, 2022 as a Junior Data Scientist / Machine Learning Engineer.

Sana: I have been part of Scandio since July 1st as well and am also a Junior Data Scientist / Machine Learning Engineer.

Oliver: I have been working at Scandio as a Machine Learning Engineer since November 2020. In my projects, I work a lot with Data Scientists, so I often have an interface role of sorts.

What exactly does the term data science mean?

Sana: A common definition, and in my view the most accurate one, is: "Data science is an AI subset that deals with data methods, scientific analysis, and statistics, all used to gain insight and meaning from data."

Flo: I'll just briefly quote Wikipedia to begin with: "Data science generally refers to the extraction of knowledge from data". But what exactly is this knowledge? It is, for example, the recognition of patterns hidden in data, or of interrelationships from which conclusions can be drawn. To gain this knowledge from data, various methods from mathematics, computer science, statistics, and information science are used, usually combined with knowledge from other fields such as biology, chemistry, or linguistics.

Oliver: Data science is an interdisciplinary field that deals with extracting additional information from existing data and data sources. It is often connected to machine learning, although the two fields are of course distinct and not identical.

What do you find particularly exciting about this topic?

Oliver: In practice, data science covers essential parts that determine the success of a machine learning project. Whatever is missed at the data science stage can be made up for later in the project only with difficulty, if at all. I therefore find data science particularly interesting and exciting as the transition from an actual problem area to tackling it with machine learning methods.

Flo: What fascinates me about the topic is that we can use data science methods to develop models that automatically learn the correlations between data points from a large number of individual examples. The models can then classify enormous amounts of data points they have never seen before within seconds, based on the learned correlations, or even generate language and compose music.

Sana: As a data lover, I find the most exciting parts of being a data scientist to be discovery and innovation. Starting from a complex dataset, you end up solving problems and making decisions using a range of models and functions. It's literally "making discoveries while swimming in data".

Can you give an example?

Flo: One example is the detection of defective products on production lines. A camera films the products passing by on the belt and transfers the images (data) to a model in the background that detects flawed products. These parts can then be sorted out automatically.

Sana: For example, detecting skin anomalies with deep learning, either in a database of medical images or in real time using a camera.

Oliver: Very impressive for me was Timnit Gebru's work, which showed that automatic facial recognition was strongly biased with regard to ethnicity and gender. The underlying discussion about ethics in the context of AI continues to this day. Regardless of how that debate develops, one is well advised to understand the available training data as deeply as possible in the data science sense. This requires both technical understanding, for example of statistics, and subject-specific expertise in the problem area at hand.

What projects are you currently working on in the context of data science?

Oliver: I am currently working as an ML Engineer on a project in the field of household appliances. One goal, for example, is to predict the remaining baking time of a cake as accurately as possible to enable a better user experience. Technically speaking, this is a time series problem that originates from thermodynamic processes. Baking may seem mundane, but it is nevertheless enormously complex and challenging; this makes it a very special project for me.

Flo: Since I've only been at Scandio for a short time, my last project was my master's thesis in deep learning, which is also part of data science. I compared different models to find out which one handles so-called unlabeled data better. The results of my work are meant to be applied to analyze several properties of soil samples so that the impact of climate change can be better understood.

💡

Briefly explained: labels and data
Labels annotate the data. Often, the label is the quantity whose hidden relationship to the data a model learns, so that it can eventually predict the label for unseen data. An example of a labeled datum is an image showing a dog, annotated with the label "dog". Unlabeled data lack these labels, which makes the learning process more complex.
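To make the distinction concrete, here is a minimal Python sketch. The two-number feature vectors and the 1-nearest-neighbour rule are illustrative assumptions, not the models Flo compared in his thesis: the labeled examples are used to predict labels for data points that arrive without one.

```python
import math

# Hypothetical toy data: each datum is a feature vector (e.g. two measurements).
# Labeled data pair each datum with an annotation.
labeled = [
    ((0.9, 0.8), "dog"),
    ((0.8, 0.9), "dog"),
    ((0.1, 0.2), "cat"),
    ((0.2, 0.1), "cat"),
]

# Unlabeled data: the same kind of data points, but without annotations.
unlabeled = [(0.85, 0.75), (0.15, 0.25)]


def predict(point, examples):
    """Return the label of the closest labeled example (1-nearest neighbour)."""
    _, label = min(examples, key=lambda ex: math.dist(point, ex[0]))
    return label


predictions = [predict(p, labeled) for p in unlabeled]
print(predictions)  # ['dog', 'cat']
```

Without the labeled list, this simple approach has nothing to learn from, which is exactly why learning from unlabeled data needs more elaborate methods.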

Sana: I joined Scandio just recently, so my last complete project was academic. Like Flo's, it was a deep learning project, but in my case it was about implementing a fire detection system using a huge image dataset. The project scope ranged from data augmentation and cleaning (resize, reshape, ...) to searching for the best model and parameters (a CNN) and evaluating its performance.
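For readers curious what such preprocessing steps can look like, here is a minimal NumPy sketch. The target size of 64 pixels, the choice of a mirror and a 90-degree rotation, and the helper names `resize_nearest` and `augment` are assumptions for illustration, not details of Sana's project.

```python
from typing import List

import numpy as np


def resize_nearest(image: np.ndarray, height: int, width: int) -> np.ndarray:
    """Resize an (H, W, C) image by nearest-neighbour sampling."""
    rows = np.arange(height) * image.shape[0] // height
    cols = np.arange(width) * image.shape[1] // width
    return image[rows[:, None], cols]


def augment(image: np.ndarray, size: int = 64) -> List[np.ndarray]:
    """Resize to size x size, scale pixels to [0, 1] and add simple variants."""
    base = resize_nearest(image, size, size).astype(np.float32) / 255.0
    return [
        base,
        np.fliplr(base),  # horizontal mirror
        np.rot90(base),   # 90-degree rotation; square images keep their shape
    ]


rng = np.random.default_rng(0)
# Random pixels as a stand-in for a real photo of shape (height, width, channels).
photo = rng.integers(0, 256, size=(120, 160, 3), dtype=np.uint8)
batch = np.stack(augment(photo))
print(batch.shape)  # (3, 64, 64, 3)
```

Each original image thus yields several training samples of the same shape, which is one common way to get more out of a fixed dataset.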

How did you get in touch with the topic of data science?

Sana: I did a bachelor's degree in computer science, and one of the lectures was about artificial intelligence and the general field of data science. Right there, I fell in love with this topic.

Oliver: In the course of my mathematics studies, I dealt a lot with numerical mathematics. The outstanding successes of the machine learning community had not gone unnoticed there, so people became increasingly involved with it. After I decided not to pursue an academic career, I ended up at Scandio, where I have been able to work in this field ever since.

Flo: I had my first contact with the topic during my studies. A lecture called 'data mining', which is related to data science, sparked my enthusiasm for this topic and for Deep Learning in particular.

And why did you choose Scandio?

Sana: I felt that Scandio is the perfect place to exercise my passion as a job and to improve my knowledge at the same time.

Flo: I chose Scandio because I want to collaborate with talented and like-minded colleagues and develop some cool Deep Learning models with them.

Oliver: Basically, at Scandio we discuss how to deal with existing challenges and problems in a factual, open-minded and solution-oriented manner. Everyone's opinion is important here, because everyone, even someone completely new to the topic, can and will make valuable contributions. This way, we can tackle challenging topics and problems confidently without falling into "delusions of grandeur"; and I like that very much ;-).

What advice would you give to someone who would also like to work in the field of data science?

Flo: My tip would be to acquire a solid theoretical understanding of data science and then to specialize in one direction. The former works well in a structured way through a suitable degree program, but there are also many online sources that contain the necessary information. Ultimately, however, you only learn it by trying it out and applying it yourself.

Sana: The best advice that I can give to a new data scientist is to continuously learn from all possible resources available and practice with some Kaggle projects and competitions.

Oliver: Know and understand the basics of your field. Of course, you quickly learn the difference between causality and correlation. Beyond that, it was and is very helpful for me to have a broad mathematical background, for example in stochastics, measure theory, and functional analysis. This doesn't make all problems disappear into thin air, but in my experience it gives me a context that helps me develop a solution.

This way for more data science know-how
For those who want to delve deeper into the topic of data science, there are a number of very good resources online. For example, we recommend deeplearning.ai, Yann LeCun's Deep Learning Course, or classes on Coursera and edX.

And if you want to learn more about our services concerning data science, AI and machine learning, visit the Scandio website or get in touch with our experts.

If you want to know what else is happening at Scandio:
Scandiolife on Instagram.
Connect with us on LinkedIn.
Look what Scandio is tweeting.


Martin Grebner
