OUR PILOT COURSE FOR DATA SCIENTISTS
The Da.Re. project aims at creating a new way to train Data Scientists.
We are creating an open curriculum to be adopted by the European training institutions.
To refine the contents of this training path, we developed a Pilot Course of about 150 hours aimed at different kinds of participants.
The Pilot Course for Data Scientists, designed on the basis of the extensive research activities of the Da.Re. project, is delivered partly as a MOOC (Massive Open Online Course, 80 hours) and partly (70 hours) as face-to-face learning and practical experience organized in Italy.
The course was piloted in early 2019 and its content is still available in the download section.
The Pilot Course Concept
Our research identified a gap in data science education that neither universities nor boot camps fill: training for what we have called the bridge person. This is someone who combines knowledge of an organisation’s business with sufficient understanding of data science to ‘bridge’ between non-technical people in the business and the highly skilled data scientists who can add value to it.
We believe that the roles of the Bridge Person and Chief Data Analytics Officer identified by the Da.Re. project can significantly improve the application of data science in European businesses and organisations.
Our solution is to provide additional education in data science for:
- Employees who have expert knowledge of their business but know little about data science. Career path: in time, they can become Chief Data Analytics Officer (or similar) in their company.
- Graduate students who want to work as data scientists in business and are motivated to learn new technical topics of value to their future position. Career path: in time, they can become Chief Data Scientist.
- Senior business people who have little time but want to know how data science can add value to their business, and how to take the first steps towards it.
An Outline of the Da.Re. Programme
The Da.Re. programme has two parts: 80 hours of online education followed by 70 hours of face-to-face education.
The idea is that the online education provides students with the technical knowledge and skills needed for the hands-on training at the two-week, 70-hour face-to-face residential school.
The content of the pilot course is still available, even for those who could not participate in the formal course organized in 2019 (go to the download section).
By combining online and face-to-face education, Da.Re. can combine the best of MOOCs and the boot camp approach to provide new, useful and sustainable data science education in Europe.
Case study areas include the following topics:
- Energy signature of an electric microgrid
- Stationarity analysis for industrial systems
- Business Communication Management
- Time series analysis for biomedical analytics
THE PILOT COURSE MODULES
The course consists of 5 modules: 4 of them are online, and 1 was hosted in Italy; its material is still available for download.
In this module you will learn some of the fundamental concepts of data science. This begins with learning how Google Analytics can provide a wide variety of information on the use of websites, including a lot of data on users and how they navigate websites. Following this you will be introduced to Python and the use of standard data science libraries including NumPy, Pandas, Scikit-learn and Matplotlib. Then you will learn about R, a language and environment that provides a wide variety of methods for statistical analysis of data. The module will end with an overview of a number of mathematical topics used in data science.
Data preparation plays a critical role in a Data Scientist’s working life. This module covers several key aspects of preparing data for later analysis, particularly those that are crucial for the applications in the real case studies. Topics include data loading, sampling, feature extraction and the Fourier transform. At the end of this module you are expected to be able to prepare your data successfully for an efficient analysis.
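To give a feel for two of the topics named above, sampling and Fourier-based feature extraction, here is a minimal sketch in plain Python. It is an illustration only, not the course material: the synthetic sine signal, the sampling rate and the function names (`sample_signal`, `dft_magnitudes`, `dominant_frequency`) are all assumptions made for this example.

```python
import cmath
import math


def sample_signal(freq_hz, sample_rate_hz, n_samples):
    """Sample a unit-amplitude sine wave (a hypothetical stand-in for real sensor data)."""
    return [math.sin(2 * math.pi * freq_hz * k / sample_rate_hz) for k in range(n_samples)]


def dft_magnitudes(signal):
    """Return the magnitude of each discrete Fourier transform bin, from 0 up to the Nyquist bin."""
    n = len(signal)
    magnitudes = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        magnitudes.append(abs(coeff))
    return magnitudes


def dominant_frequency(signal, sample_rate_hz):
    """Extract one feature: the frequency (in Hz) of the strongest non-constant component."""
    magnitudes = dft_magnitudes(signal)
    # Skip bin 0, which only captures the constant offset of the signal.
    peak_bin = max(range(1, len(magnitudes)), key=lambda k: magnitudes[k])
    return peak_bin * sample_rate_hz / len(signal)


# Assumed example: a 5 Hz sine sampled at 64 Hz for one second.
signal = sample_signal(freq_hz=5.0, sample_rate_hz=64.0, n_samples=64)
print(dominant_frequency(signal, sample_rate_hz=64.0))
```

The direct sum used here is easy to read but slow for long signals; in practice a fast Fourier transform routine such as `numpy.fft.rfft` would normally be used instead.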
The Data Analysis module is the gateway to the world of machine learning (ML). In this module the many faces of ML, such as dimensionality reduction, classification, prediction and clustering, will be unveiled. They are presented in a simple theoretical way with an eye on practical examples; simple implementations using Python are provided, too. At the end of the module you are expected to be able to apply the various paradigms appropriately and interpret the results correctly.
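As an illustration of one of the paradigms listed above, clustering, here is a minimal sketch of k-means in plain Python. It is not taken from the course material; k-means is chosen here only as a familiar example, and the toy points, the starting centroids and the function names are assumptions.

```python
import math


def assign_clusters(points, centroids):
    """Give each point the index of its nearest centroid (Euclidean distance)."""
    return [
        min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        for p in points
    ]


def update_centroids(points, labels, centroids):
    """Move each centroid to the mean of its members; keep it in place if it has none."""
    new_centroids = []
    for c, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == c]
        if not members:
            new_centroids.append(old)
            continue
        new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
    return new_centroids


def kmeans(points, initial_centroids, max_iterations=20):
    """Alternate assignment and update steps until the labels stop changing."""
    centroids = list(initial_centroids)
    labels = assign_clusters(points, centroids)
    for _ in range(max_iterations):
        centroids = update_centroids(points, labels, centroids)
        new_labels = assign_clusters(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


# Assumed toy data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels, centroids = kmeans(points, initial_centroids=[points[0], points[3]])
print(labels)
print(centroids)
```

The starting centroids are passed in explicitly so that the result is reproducible; library implementations such as the one in Scikit-learn choose them automatically.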
Now that we have the data, what do we do with it? In this module we will focus on ways we can use visualisation to answer this question. In particular, we will learn how to display data to help us gain insights and communicate these insights to others. We will introduce you to the theory that underpins the making of effective visualisations. We’ll do practical exercises using R, visualising both sample data and the real-life data provided by the chosen case studies in different ways.
Prerequisite Knowledge and Background
- Level 6 education or higher, e.g. a bachelor’s degree in any subject
- Numerate and able to read simple equations, graphs and charts
- Literate and able to write reports with illustrative graphics
- Good research skills, finding and synthesising information
- Interest in patterns of data as they impact on business
- Good self-study and time-management skills
- Good teamwork skills
Thus, our typical students are either seeking a job in industry or already employed in companies (typically SMEs), where they have knowledge of their business domain and will acquire the data science competences of the bridge person.
Interest and feedback from university professors are also very welcome!
The Learning Outcomes
The general learning outcomes of the Pilot Course are:
- A clear knowledge of the data lifecycle
  - Data Preparation
  - Data Analysis
  - Data Visualisation
- Practice and ability to solve real problems that companies face
- The capacity to go beyond the data lifecycle by creating added value to the organisation
- Ability to organise and revise a data lifecycle in an organisation. More precisely, to identify existing and missing competences in the organisation, create a team and structure the work to go through an established pipeline:
  - Problem Identification
  - Data Preparation
  - Data Analysis
  - Data Visualisation