Machine Learning and Data Analysis (Strijov's practice)/4th year, fall




The course

Designing operational models

Syllabus

Date | Tasks | Results | Code
September 9 | Introduction to the course, motivation, organisational information. | |
16 | Select a problem and a reviewer. Make a brief presentation (45 seconds) of the project. Sign up using ML. | Report B | Go
23 | Find literature. Find a dataset. Describe it in text and IDEF0 formats. | Reference list. Data description. | Literature
30 | Formulate the problem. Write a mathematical statement in LaTeX; describe the basic algorithm. | Problem statement and basic algorithm. | Algorithm
October 7 | Design the system architecture and core interfaces. | IDEF0 of the system. | Idef
21 | Add details to the interfaces, write code. | Launchable code for real data. | Code
28 | Create unit tests for each module. Make a report justifying the interfaces and IDEF representations. | Unit tests, Report | Unit
November 4 | Collect and preprocess data for testing the system. Complete IDEF0 for data collection. Create and run system tests. | Tests, data, completed IDEF0. | Tests, Data
11 | Optimize the code. | Profiling report. | Profiler
18 | Make a visual report. After the projects are completed, write a review. | Completed technical report, reviews. | Report, Review
25 | Create a web interface and several examples of usage. | Source code is uploaded. | Web
December 2 | Make a presentation, tidy up the documentation and code. | Final report | Slides
9 | - | Discussions |

Reports are denoted by letters B, M, F.

Homework assignments

Homework 1 --

Homework 2

  1. Complete section 1.1.2 "Motivation" of SystemDocs;
  2. Complete section 1.1.3 "Literature";
  3. Prepare a 40-second oral report on your problem.

Homework 3 Compose a problem statement in LaTeX. A "template" of the problem statement is available here: https://svn.code.sf.net/p/mlalgorithms/code/Group074/Kuznetsov2013SSAForecasting/doc/

Here are some examples from the class presentation. It is recommended to review them before starting:

Homework 4

  1. Correct the problem statement, if necessary.
  2. Write the abstract according to the plan (section 1.1.1 of SystemDocs).
  3. Design a two-level IDEF0 diagram (sections 1.2.2 and 1.2.3 of SystemDocs), preferably separating the learning stage from the final utilization stage.
  4. Describe the general data formats and structures (section 1.4 of SystemDocs).
  5. Describe the module interfaces (section 2 of SystemDocs).

Some useful links:

Homework 5 Create launchable source code. To complete this task, you also need to rewrite all module interfaces (section 2 of SystemDocs) and function headers in more detail.

Homework 6

  1. Create the final version of the code for the project core: the launchable code should compute the project results in "one click".
  2. Write unit tests for each module, following the manual that will be announced here soon.


Homework 7

  • Complete IDEF0: work out the data processing module in detail and create the second level of detail. The second level focuses on checking the user's data for validity.
    1. Check for viruses (never execute commands contained in the file body, for example in mpeg files).
    2. Check that the file type is supported.
    3. Check that the file is not too large (processing and analysis should take no more than 15 seconds; preferably, the file should not exceed 200 MB).
    4. Check that the data structure is valid, so that the algorithm reports an error instead of returning inadequate results.


  • Collect real data for demonstration and testing purposes in the 'data' folder. If the data are too large, upload a file with links to the dataset instead. Add a data description to SystemDocs.
  • Create a module for uploading and processing data. The module should upload a single file.
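The validity checks listed above could be combined into one function called by the data-uploading module. Below is a minimal MATLAB sketch under stated assumptions: the function name checkInputFile, the list of supported extensions and the assumption that the data are numeric comma-separated values are hypothetical, chosen only for illustration. The 200 MB limit comes from the assignment; the 15-second time limit is not checked here.

```matlab
function [ok, msg] = checkInputFile(filename)
% CHECKINPUTFILE  Hypothetical validity check for a user's data file (sketch).
% Assumptions: supported types are .csv/.txt with numeric comma-separated data.
ok = false;
msg = '';
allowedExt = {'.csv', '.txt'};   % assumption: list of supported file types
maxBytes = 200 * 1024^2;         % 200 MB limit from the assignment

% The file must exist and be a regular file (exist returns 2 for files).
if exist(filename, 'file') ~= 2
    msg = sprintf('File "%s" not found.', filename);
    return;
end

% Check the file type by its extension; the file is only read, never executed.
[~, ~, ext] = fileparts(filename);
if ~any(strcmpi(ext, allowedExt))
    msg = sprintf('Unsupported file type "%s".', ext);
    return;
end

% Check the file size before reading it.
info = dir(filename);
if info.bytes > maxBytes
    msg = 'File is too large.';
    return;
end

% Check the data structure: a non-empty numeric matrix without NaN or Inf.
try
    data = dlmread(filename, ',');
catch err
    msg = ['Cannot parse data: ' err.message];
    return;
end
if isempty(data) || ~all(isfinite(data(:)))
    msg = 'Data must be a non-empty numeric matrix without NaN or Inf.';
    return;
end
ok = true;
end
```

Returning a message instead of throwing lets the caller show the user why the file was rejected, which matches the requirement that the algorithm report an error rather than return inadequate results.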

NB: basic principles for system testing and error analysis

  1. Check data for validity.
  2. Check models for adequacy (overfitting, complexity, robustness, precision, etc.).
  3. Check your results for adequacy (error analysis).
  4. Check the technical parameters of the system (elapsed time, convergence of the optimization algorithm, robustness: similar inputs should produce similar outputs).


Homework 8

  • Write a review following the plan below and save it in a file named like YourSurname2013ReviewSurname.
  • Prepare a 1-minute oral report.
  • Create system tests: test datasets and a module (script) that launches them. Put a reference to this module in section 5.2 of the SystemDocs file.

Review plan:

  1. Briefly state the main topic, the most important aspects and the aim of the project, and compare it to similar projects. Explain how the results of the project can be applied (are they relevant? important?).
  2. Point out the project's strengths and weaknesses (parts of the problem that should be considered in more detail).
  3. Project details: clarity of the project description in SystemDocs and ProblemStatement; code readability, interface usability, test coverage.
  4. Conclude by summarising your opinion.

Homework 9

Using the profiler, optimise the 'bottleneck' parts of your code. Complete section 5.3 of SystemDocs using the profiling report.

Potential bottlenecks are the parts of the code that take considerable time in computational experiments. The task is to show improvements in computing time (for example, after loops were replaced with matrix computations) or to prove that the code is already sufficiently optimized. Include the most prominent rows (usually the first 10 to 15) of the profiler's report in your documentation.

  • The 'profile' function lets you save the profiler's report in a suitable format.
  • We recommend using 'parfor', the parallel equivalent of 'for'. However, constructions in which the array grows from iteration to iteration, such as x(end+1) = y, cannot be parallelised. To avoid them, preallocate matrices or structures of the required size beforehand.

Example:

>> matlabpool(3)
>> tic; parfor i=1:3, c(:,i) = eig(rand(1000)); end; toc
Elapsed time is 3.712837 seconds.
>> tic; for i=1:3, c(:,i) = eig(rand(1000)); end; toc
Elapsed time is 5.807167 seconds.
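One possible way to obtain a profiler's report and compare a loop with its vectorised version is sketched below. It assumes MATLAB's profile and profsave functions; the computation itself (squaring a random vector) and the output folder name profile_results are placeholders for your own bottleneck.

```matlab
% Sketch: profile a loop and its vectorised equivalent, then save the report.
n = 1e6;
x = rand(n, 1);

profile on

% Loop version; the output is preallocated instead of grown with y1(end+1) = ...
tic;
y1 = zeros(n, 1);
for i = 1:n
    y1(i) = x(i)^2;
end
tLoop = toc;

% Vectorised version of the same computation.
tic;
y2 = x.^2;
tVec = toc;

profile off

% Timings measured while profiling is on are inflated, but still comparable.
fprintf('Loop: %.4f s, vectorised: %.4f s\n', tLoop, tVec);

% Collect the statistics and save them as an HTML report in ./profile_results.
p = profile('info');
profsave(p, 'profile_results');
```

The first rows of the saved report (sorted by total time) are the ones to copy into section 5.3 of SystemDocs.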

Homework 10 Using the results of the system tests and of the computational experiment for error analysis, create plots and tables with brief explanations and put them into section 5.2 of SystemDocs. Divide this report into parts using appropriately named paragraphs.

Required parts of this computational experiment:

  • visualization of the model selection procedure and of the structural parameter optimization;
  • visualization of the resulting model or algorithm;
  • visualization of the applied optimization method;
  • dependence of the loss function or quality criterion on the level of added noise or on other factors;
  • visualization of the obtained error rate in the "web" section (also as a plot or table).
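As an illustration of the noise-dependence item, the sketch below computes how the error of a fitted model grows with the level of added noise. It is a hypothetical example: an ordinary least-squares linear model on synthetic data stands in for the project's model, and the RMSE with respect to the noise-free signal stands in for the quality criterion.

```matlab
% Sketch: error of a fitted model as a function of the noise level.
% Assumption: a linear least-squares model stands in for the project's model.
rng(0);                                  % reproducible noise
m = 200;
X = [ones(m, 1), linspace(0, 1, m)'];    % design matrix with an intercept column
wTrue = [1; 2];                          % hypothetical true parameters
noiseLevels = 0:0.1:1;                   % standard deviations of the added noise
rmse = zeros(size(noiseLevels));         % preallocated result vector

for k = 1:numel(noiseLevels)
    y = X * wTrue + noiseLevels(k) * randn(m, 1);   % noisy targets
    w = X \ y;                                      % least-squares fit
    rmse(k) = sqrt(mean((X * w - X * wTrue).^2));   % error w.r.t. the true signal
end

plot(noiseLevels, rmse, 'o-');
xlabel('Noise standard deviation');
ylabel('RMSE of the fitted model');
title('Error vs. noise level');
```

Replacing the synthetic data and the least-squares fit with your own model and dataset gives the required plot; the same loop can vary other factors instead of the noise level.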

Homework 11 The folder "web" must contain the following files:

  • File "config.json" (the name and extension must be exactly this). Fill in this file following the example in the folder "Group074/Kuznetsov2013SSAForecasting/web/".
  • File "main.m" with one input argument and one output:

html = main(filename), where filename is a text string containing the file name, and html is a text string containing the visual "web" report in HTML format.

  • File "test.csv" (another extension may be used). This file should contain a test object (text, time series, image, sound, video, etc.) for forecasting.
  • (Optional) Other files required by the function "main" (in particular, a file with the parameters and structural parameters of the forecasting model or algorithm).

For testing, it is strongly recommended to run the function writeHTML. It calls main('test.csv') and saves the result to "out.html". This file should contain either the "web" report with the forecasting results or an error message describing the problem with forecasting (the types of errors were considered in the data loading section).
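For orientation, here is a minimal sketch of a main.m that satisfies the interface html = main(filename) and returns an error page instead of failing. The assumptions are significant: the test object is taken to be a numeric time series in a CSV file, and a naive last-value forecast stands in for the project's forecasting model. Replace both with your own loading and forecasting modules.

```matlab
function html = main(filename)
% MAIN  Sketch of the web entry point: html = main(filename).
% Assumptions: numeric time series in a CSV file; naive last-value forecast
% used as a placeholder for the project's forecasting model.
try
    series = dlmread(filename, ',');
    series = series(:);                  % treat the data as one time series
    if isempty(series) || ~all(isfinite(series))
        error('main:badData', 'The file must contain a non-empty numeric series.');
    end
    forecast = series(end);              % placeholder forecast
    html = sprintf(['<html><body><h1>Forecast</h1>' ...
        '<p>Series length: %d</p><p>Next value: %g</p></body></html>'], ...
        numel(series), forecast);
catch err
    % Return an error report instead of failing, so out.html is always produced.
    html = sprintf('<html><body><h1>Error</h1><p>%s</p></body></html>', err.message);
end
end
```

Calling main('test.csv') on a file with the values 1, 2 and 3 (one per line) returns a page reporting a series length of 3 and a next value of 3; a missing or malformed file yields the error page.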
