The Scandio Data Science team identified five tech trends for 2023 and beyond. Learn what Scalable AI, Foundation Models or Data Sharing can mean for companies. And discover why educating employees across all business units in data literacy is vital for future success.
Data Science (DS) and related fields are among the most promising technical disciplines. Many groundbreaking developments have been enabled by DS, AI and machine learning. At the same time, the potential remains tremendous, and new use cases and developments emerge constantly.
For 2023 (and beyond), my colleagues and I identified five trends, based on our experience from customer projects, scientific papers and exchanges with other experts in the community.
AI & Data Science Trends 2023:
- Scalable AI
- Foundation Models
- Trustworthy AI
- Data Sharing
- Governmental Regulation
But what do all these buzzwords mean, and why should companies care? Well, the answer starts right here.
From a computational perspective, Scalable AI means being capable of scaling Machine Learning-based workloads up or down depending on the current load on the system.
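As a minimal sketch of what scaling with the load can look like, the snippet below computes how many model-serving replicas to run for a given request rate. The function name, the parameters and the numbers are illustrative assumptions, not part of any specific platform; real autoscalers add smoothing and cool-down periods on top of a rule like this.

```python
import math


def replicas_for_load(requests_per_second, capacity_per_replica,
                      min_replicas=1, max_replicas=10):
    """Return the number of serving replicas needed for the observed load.

    All names and defaults are hypothetical and chosen for illustration only.
    """
    if capacity_per_replica <= 0:
        raise ValueError("capacity_per_replica must be positive")
    # Enough replicas to cover the load, rounded up to a whole replica.
    needed = math.ceil(requests_per_second / capacity_per_replica)
    # Clamp to the allowed range so the system never scales to zero
    # and never grows beyond the configured budget.
    return max(min_replicas, min(max_replicas, needed))


# Scale up under load, scale back down when the load drops.
busy = replicas_for_load(450, capacity_per_replica=100)
idle = replicas_for_load(20, capacity_per_replica=100)
```

The upper bound reflects the sustainability concern: scaling is capped by a budget instead of following demand without limit.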
But scalability in this context also encompasses the question of how to store and organize data which the ML models need, so that access to the data can be provided fast, securely, and reliably across a wide range of data sources. As AI models become more elaborate and powerful, they tend to also increase in size, and require a growing amount of computational power and the respective infrastructure which needs to be maintained. Continuing to scale up infrastructure may not be sustainable.
In this context, data fabric and data mesh are two approaches to managing, distributing, and utilizing data in an architectural framework. To achieve these goals, companies transform their organizational structure, enabling and facilitating the best possible utilization of data to maximize the added value. Even more importantly, companies encourage all their employees, not only data scientists, to become more literate in working with and handling data.
Data will become even more abundantly available in the future, so everybody should build skills and competence in managing them.
The first wave of AI washed ashore task-specific models, which have dominated the AI landscape in solving real-world problems in recent years. But for training these models, we need vast amounts of labeled data for the specific task. If no dataset is available, collecting and manually labeling data is a resource-intensive task. Besides that, training large models on vast amounts of data for each new task imposes a significant carbon footprint on the environment, since the computing capacities required for the necessary calculations can consume a great deal of energy.
Riding on the currently arriving second wave of AI are foundation models like Florence, DALL-E 2, or Flamingo. This type of AI model has the advantage of needing only a broad set of unlabeled data for training. A trained model can then be adapted to various tasks with comparatively little fine-tuning. In this way, one AI model can serve as the foundation for many applications. For example, one can first train a computer vision model on the vast number of images available on the internet and later fine-tune it for the specific purpose of driving a car.
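To make the idea of reusing one trained model for a new task more concrete, here is a toy sketch in plain Python. It uses a simplified form of adaptation: the pretrained part stays frozen and only a small task-specific head is trained on a handful of labeled examples. The `pretrained_features` function is a hypothetical stand-in for a real foundation model, and the task (is the absolute value of x greater than 1?) is invented for illustration.

```python
import math


def pretrained_features(x):
    """Hypothetical frozen encoder standing in for a foundation model.

    It is never updated during adaptation; only the head below is trained.
    """
    return [x, x * x]


def train_head(examples, epochs=300, learning_rate=0.1):
    """Fit a small logistic-regression head on top of the frozen features."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, label in examples:
            features = pretrained_features(x)
            z = sum(w * f for w, f in zip(weights, features)) + bias
            prediction = 1.0 / (1.0 + math.exp(-z))
            error = prediction - label
            # Gradient step on the head only.
            weights = [w - learning_rate * error * f
                       for w, f in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


def predict(head, x):
    """Return 1 if the head scores the input as positive, else 0."""
    weights, bias = head
    features = pretrained_features(x)
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return 1 if z > 0 else 0


# A few labeled examples are enough because the features already help.
LABELED_EXAMPLES = [(-2.0, 1), (2.0, 1), (-1.5, 1), (1.5, 1),
                    (-0.5, 0), (0.5, 0), (0.0, 0)]
head = train_head(LABELED_EXAMPLES)
```

In practice the frozen part would be a large pretrained network and fine-tuning may also update some of its weights; the sketch only shows why little labeled data can suffice.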
The data science team at Scandio currently includes around 20 people, ranging from Data Scientists and Data Engineers to Software Engineers as well as MLOps Engineers.
Meet the team and read the latest Scandio Report: Data Science Edition.
Trustworthy AI means establishing and building trust in the decision-making of AI models. AI models are often considered black boxes where we put data in (e.g., an image depicting a Scandio coffee mug) and get a result (e.g., coffee mug marked by bounding boxes on the picture) without knowing how exactly the model reached its decision.
As trivial as it may seem in the case of detecting coffee mugs, this issue escalates quickly in priority and importance when AI models help with medical diagnoses or rank the CVs of job applicants. In these cases, we might want to know what led the model to its conclusion since, for example, a false medical diagnosis could have severe consequences.
Due to this black box property, companies work with academia to research methods capable of giving insights into the decision-making processes of AI models so that customers and users start trusting AI.
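One simple family of such methods perturbs the input and observes how the output changes. The sketch below is a minimal illustration of that idea (often called occlusion sensitivity), not a description of any particular research project; `toy_black_box` and its weights are hypothetical and stand in for a model whose internals we cannot inspect.

```python
def occlusion_importance(model, inputs, baseline=0.0):
    """Score each input feature by how much masking it changes the output."""
    reference = model(inputs)
    scores = []
    for index in range(len(inputs)):
        masked = list(inputs)
        masked[index] = baseline  # hide one feature at a time
        scores.append(abs(reference - model(masked)))
    return scores


def toy_black_box(features):
    """Hypothetical opaque model; the explainer only ever calls it."""
    hidden_weights = [0.1, 2.0, 0.0]
    return sum(w * x for w, x in zip(hidden_weights, features))


scores = occlusion_importance(toy_black_box, [1.0, 1.0, 1.0])
most_influential = scores.index(max(scores))
```

Here the second feature receives the highest score and the third a score of zero, which matches how the toy model actually uses its inputs. For real models such scores are only an approximation and need careful interpretation.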
As the amount of produced and available data steadily increases, data sharing takes on greater significance. In recent years, sharing data meant copying a dataset and sending it to the respective recipient. However, with the growing volume of data, this procedure becomes less and less feasible because of the unnecessary multiplication of data and the increasing capacity needed to store the (redundant) copies. Also, data ownership is not always clear and legally secured, which hinders data sharing in the first place.
Companies have recognized this problem and started initiatives to tackle it by cooperating across enterprise borders. For example, several German automotive companies and Deutsche Telekom, among others, are working to establish an open, scalable network for secure, cross-company exchange of information and data in the automotive industry.
The AI Act, a European regulation on artificial intelligence proposed in April 2021, could be the first legal framework on AI by a significant governmental regulator anywhere in the world. It classifies AI applications into four risk categories. Applications in the highest risk category, 'Unacceptable Risk' (e.g., social scoring by a government), are prohibited by law. Nevertheless, this law also has several loopholes and exceptions that limit its ability to ensure that AI remains a force for good in our lives.
At the moment, big enterprises in particular struggle to find out what implications the AI Act has for their products, especially regarding the interaction and collision of European law with German law and common practice. Small and medium-sized companies that lack a dedicated legal department to analyze the proposed rules seem to be standing on the sidelines, waiting for the final draft.
Focusing on data now
It is estimated that today's data represent a mere 2 percent of the data that will exist in the year 2035, which highlights the ample opportunity ahead. So, focusing on data and their application now is imperative for future rewards.
This data-centric transformation resembles the transition from traditional to agile project management: a change which must be lived, breathed, and internalized to be successfully applied.
The Scandio Data Science team will be happy to advise on your questions.
By bringing together different fields of expertise, we can help companies along the whole project journey:
- Starting with requirements analysis
- Following up with a small focused team on a proof of concept
- Scaling the team to full size, targeting a production-ready MVP, including operations
Get in touch with us and let's talk about how we can support you.