FM1AZDP05 Data Pre-Processing

  • Course description
    • Course code
      FM1AZDP05
    • Level of study
      5.1
    • Program of study
      Applied Machine Learning
    • Credits
      5
    • Course coordinator
      Leon Grobbelaar
Teaching term(s)
2024 Autumn
About the Course

The course provides knowledge of and skills in data pre-processing, data standardisation, feature engineering and feature selection for modelling. Candidates gain the knowledge needed to examine and understand data quality issues, and the skills to link pre-processing steps to the data they are applied to. The course also covers the statistical techniques and practical skills that allow candidates to prepare and format data in a structured way.

This course is relevant to the program because data pre-processing is an essential step in any machine learning project. The course will provide students with the knowledge and skills required to standardise data so that it is in the right format for the machine learning model, to create new features that make the best use of the information in the dataset, and to select the features that most improve the model's output.

Course Learning Outcomes
Learning outcomes - Knowledge

The candidate:

  • has knowledge of concepts and processes that are used to review and understand data quality issues
  • has knowledge of methods and tools that are used to standardise data
  • has insight into relevant standards and requirements for optimal data and the relation between data and data pre-processing
  • can update his/her knowledge of feature engineering and feature selection
Learning outcomes - Skills

The candidate:

  • can apply knowledge of feature selection and large-scale data to solve machine learning tasks
  • masters relevant tools and techniques to format and structure data suitable for machine learning
  • masters relevant tools and statistical techniques to clean data and remove incomplete variables
  • can examine datasets to identify issues and determine what data preparation is needed to clean them and expose their information content
General Competence

The candidate:

  • can carry out data pre-processing based on the needs of an overall machine learning methodology
  • can produce summary statistics and basic data visualisations
Teaching and Learning

The teaching and learning methods used in this course may include, but are not limited to, the following:

  • Lecture: Educator-led presentations or activities providing knowledge, skills, or general competencies in the subject area.
  • Group work: Collaborative activities where students work together to solve problems or complete tasks.
  • Tutoring: One-on-one or small-group sessions with an instructor for personalised guidance and support.
  • Student presentations: Opportunities for students to demonstrate their understanding of course material by presenting to peers.
  • Online lessons: Digital content delivered via an online learning platform.
  • Guidance: Individualised advice and direction from instructors to support students in their learning journey.
  • Workshops: Practical sessions focused on hands-on application of theoretical concepts or skills.
  • Self-study: Independent study where students engage with course material on their own without any teacher support.
Reading list

Teaching materials, reading lists and other essential resources will be shared on the learning platform, together with software user manuals where applicable.

Assessments
Form of assessment: Course Assignment
Grading scale: Pass / Fail
Grouping: Group/Individual
Duration of assessment: 4 weeks