Machine Learning and Data Analysis (Strijov's practice)/4th year, fall
The course
Designing operational models
- Report on the course results in 2013: Report2013Fall.pdf
- Short URL to this page: http://goo.gl/yV60hk
- Web-interfaces to the projects can be found at http://mvr.jmlda.org
Syllabus
Month | Day | Tasks | Results | Code |
---|---|---|---|---|
September | 9 | Introduction to the course, motivation, organisational information. | | |
| 16 | Select a problem and reviewer. Make a brief presentation (45 seconds) of the project. | Sign-up using ML. Report B | Go |
| 23 | Find literature. Find a dataset. Describe it in text and IDEF0 formats. | Reference list. Data description. | Literature |
| 30 | Formulate the problem. Write a mathematical statement in LaTeX format; describe the basic algorithm. | Problem statement and basic algorithm. | Algorithm |
October | 7 | Design the system architecture and core interfaces. | IDEF0 of the system. | Idef |
| 21 | Add details to the interfaces, write code. | Launchable code for real data. | Code |
| 28 | Create unit tests for each module. Write a report justifying the interfaces and IDEF representations. | Unit tests, Report | Unit |
November | 4 | Collect and preprocess data for testing the system. Complete IDEF0 for data collection. Create and run system tests. | Tests, data, completed IDEF0. | Tests, Data |
| 11 | Optimize the code. | Profiling report. | Profiler |
| 18 | Make a visual report. After the projects are completed, write a review. | Completed technical report, reviews. | Report, Review |
| 25 | Create a web interface and several usage examples. | Source code is uploaded. | Web |
December | 2 | Make a presentation, tidy up documentation and code. | Final report | Slides |
| 9 | - | Discussions | |
Reports are denoted by letters B, M, F.
Homeworks
Homework 1 --
Homework 2
- Complete section 1.1.2 "Motivation" of SysDocs;
- Complete section 1.1.3 "Literature";
- Prepare a 40-second oral report on a problem.
Homework 3
Compose the problem statement (using LaTeX). Here is a "template" of a problem statement: https://svn.code.sf.net/p/mlalgorithms/code/Group074/Kuznetsov2013SSAForecasting/doc/
And here are some examples from the class presentation. It is recommended to review them before starting:
- http://strijov.com/papers/KuzminAduenkoStrijov2012Clustering.pdf
- http://strijov.com/papers/Kuznetsov2012Curvilinear.pdf
- http://strijov.com/papers/Kuznetsov-Strijov2013Concordance.pdf
- You can also review several articles from the JMLDA journal archive: http://jmlda.org/?page_id=35
Homework 4
- Correct the problem statement, if necessary.
- Write the abstract according to the plan (section 1.1.1 Systemdocs).
- Design a two-level IDEF0 diagram (sections 1.2.2, 1.2.3 Systemdocs), preferably separating the learning stage from the final utilization stage.
- Describe the general data formats and structures (section 1.4 Systemdocs).
- Describe the module interfaces (section 2 Systemdocs).
Some useful links:
- MATLAB Programming Style Guidelines http://www.machinelearning.ru/wiki/images/1/18/MatlabStyle1p5.pdf
- IDEF0 http://www.machinelearning.ru/wiki/images/9/99/P_50-IDEF0.pdf
- Function heading style example http://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group074/Kuznetsov2013SSAForecasting/code/
- System of notations http://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group074/Kuznetsov2013SSAForecasting/doc/
Homework 5
Create launchable source code. To complete this task, you also need to describe all module interfaces (section 2 Systemdocs) and function headings in more detail.
Homework 6
- Create the final version of the code that forms the basis of the project: the launchable code should compute the project results in "one click".
- Write unit tests for each module, according to the manual that will be announced here soon.
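The unit-testing manual referred to above is not reproduced on this page, so the following is only a minimal sketch of what a module test might look like, written with plain `assert` calls. The module `normalizeColumns` and the test data are hypothetical stand-ins for a real project module.

```matlab
% Hypothetical module under test: scales each column to zero mean and
% unit standard deviation. It stands in for a real project module.
normalizeColumns = @(X) bsxfun(@rdivide, bsxfun(@minus, X, mean(X, 1)), std(X, 0, 1));

% Small hand-made input, not project data.
X = [1 2; 3 6; 5 10];
Z = normalizeColumns(X);
tol = 1e-12;

% Test 1: the output has the same size as the input.
assert(isequal(size(Z), size(X)), 'Output size must match input size');

% Test 2: every column has zero mean and unit standard deviation.
assert(all(abs(mean(Z, 1)) < tol), 'Column means must be zero');
assert(all(abs(std(Z, 0, 1) - 1) < tol), 'Column standard deviations must be one');

% Test 3: adding a constant to the input must not change the result.
shifted = normalizeColumns(X + 100);
assert(max(max(abs(shifted - Z))) < 1e-9, 'Result must be shift-invariant');

disp('All unit tests passed');
```

Each assertion checks one property of the module, so a failing test points at a specific broken contract rather than at the module as a whole.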
Homework 7
- Complete IDEF0: work out the data processing module in detail and create the second level of detail. The second level focuses on checking the user's data for validity:
  - Guard against viruses (do not execute commands embedded in the file body, for example in an mpeg file).
  - Check that the file type is supported.
  - Check that the file is not too large (processing and analysis should not take more than 15 seconds; preferably, the file should not occupy more than 200 MB).
  - Check that the data structure is valid, so that the algorithm reports an error instead of returning inadequate results.
- Collect real data for demonstration and testing purposes in the 'data' folder. If the data set is too large, upload files with links to the dataset. Add the data description to the systemdocs.
- Create a module for uploading and processing data. The module should upload a single file.
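The validity checks listed above could be sketched as a single function that returns a flag and a message instead of failing silently. The function name `checkUserFile`, the list of supported extensions, and the assumption that the data are a comma-separated numeric matrix are all illustrative; only the 200 MB bound comes from the task text.

```matlab
function [ok, msg] = checkUserFile(filename)
% CHECKUSERFILE  Validate a user-supplied data file before processing.
% Hypothetical helper: the supported extensions and the numeric CSV
% format are assumptions, the size bound follows the homework text.
ok = false;
maxBytes = 200 * 2^20;           % preferred upper bound: 200 MB
supported = {'.csv', '.txt'};    % assumed list of supported file types

% File type check, based on the extension only.
[~, ~, ext] = fileparts(filename);
if ~any(strcmpi(ext, supported))
    msg = sprintf('Unsupported file type "%s".', ext);
    return
end

% Existence and size checks.
if exist(filename, 'file') ~= 2
    msg = 'File not found.';
    return
end
info = dir(filename);
if info(1).bytes > maxBytes
    msg = 'File is too large.';
    return
end

% Structure check: the file is only read as data, never executed.
try
    data = dlmread(filename, ',');
catch
    msg = 'File cannot be parsed as comma-separated numbers.';
    return
end
if isempty(data) || any(isnan(data(:)))
    msg = 'Data are empty or contain NaN entries.';
    return
end

ok = true;
msg = '';
end
```

A caller might use it as `[ok, msg] = checkUserFile('data/test.csv')` and show `msg` to the user when `ok` is false. The 15-second processing limit is not checked here; it would need timing of the actual analysis.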
NB: basic principles for system testing and error analysis
- Check data for validity.
- Check models for adequacy (overfitting, complexity, robustness, precision, etc.).
- Check your results for adequacy (error analysis).
- Check the technical parameters of the system (elapsed time, convergence of the optimization algorithm, robustness: similar inputs should produce similar outputs).
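As an illustration of the robustness principle (similar inputs should produce similar outputs), here is a minimal sketch of a system-level check. The model `fitModel`, the synthetic data, and the tolerance are assumptions chosen for the example, not part of the course material.

```matlab
% Hypothetical model: least-squares fit of a line, returns [intercept; slope].
fitModel = @(x, y) [ones(numel(x), 1), x(:)] \ y(:);

x = (1:50)';
y = 2 * x + 1;                 % synthetic data, not project data

tic;
w0 = fitModel(x, y);
elapsed = toc;                 % elapsed time is one of the technical parameters

% Perturb the targets slightly and refit. The perturbation is
% deterministic so that the check is reproducible.
delta = 1e-3 * sin(x);
w1 = fitModel(x, y + delta);

relInputChange  = norm(delta) / norm(y);
relOutputChange = norm(w1 - w0) / norm(w0);
fprintf('fit time %.4f s, input change %.2e, output change %.2e\n', ...
    elapsed, relInputChange, relOutputChange);

% The threshold is an arbitrary illustrative value.
assert(relOutputChange < 1e-2, 'Model is not robust to small input perturbations');
```

For a real project, the perturbation size and the threshold would have to be chosen from the expected measurement noise of the data.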
Homework 8
- Write a review following the plan below and save it in a file named like YourSurname2013ReviewSurname.
- Prepare a 1-minute oral report.
- Create system tests: test data sets and a module (script) that launches them. Put a reference to this module in section 5.2 of the SystemDocs file.
Review plan:
- Briefly state the main topic, the most important aspects and the aim of the project; compare it to similar projects. Explain how the results of the project can be applied (is it relevant? important?).
- Emphasize the project's strengths and weaknesses (parts of the problem that should be considered in more detail).
- Project details: clarity of the project description in SystemDocs and ProblemStatement; code readability, interface usability, test coverage.
- Conclude by summarising your opinion.
Homework 9
Using the profiler, optimise the 'bottleneck' parts of your code. Complete section 5.3 of systemdocs using the profiling report.
Potential bottlenecks are the parts of the code that take considerable time in computational experiments. The task is to show an improvement in computing time (say, after loops were replaced with matrix computations) or to show that the code is already sufficiently optimized. Include the most prominent rows (usually the first 10 to 15) of the profiler's report in your documentation.
- Using the 'profile' function, you can save the profiler's report in a suitable format.
- We recommend using 'parfor', the parallel equivalent of 'for'. However, constructions that make one iteration depend on another, such as reading a counter updated with x = x+1 or growing an array with x(end+1) = y, cannot be parallelised. To avoid them, create matrices or structures of the required size beforehand.
Example:
>> matlabpool(3)
>> tic; parfor i=1:3, c(:,i) = eig(rand(1000)); end; toc
Elapsed time is 3.712837 seconds.
>> tic; for i=1:3, c(:,i) = eig(rand(1000)); end; toc
Elapsed time is 5.807167 seconds.
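The advice on preallocation and on replacing loops with matrix computations can be illustrated with a small sketch. The workload (square roots of the first n integers) is an arbitrary stand-in for a real bottleneck, and the report printed at the end is a simplified view of the data returned by `profile('info')`, not the full profiler report.

```matlab
n = 2e4;                       % arbitrary problem size for illustration

profile on

% Version 1: the array grows on every iteration (x(end+1) = y pattern).
tic;
slow = [];
for i = 1:n
    slow(end+1) = sqrt(i);
end
tGrow = toc;

% Version 2: preallocate, then fill by index (this form also suits parfor).
tic;
pre = zeros(1, n);
for i = 1:n
    pre(i) = sqrt(i);
end
tPre = toc;

% Version 3: the loop is replaced by a vectorised computation.
tic;
vec = sqrt(1:n);
tVec = toc;

profile off

% All three versions must agree before timings are compared.
assert(max(abs(slow - vec)) < 1e-12 && max(abs(pre - vec)) < 1e-12);
fprintf('growing: %.4f s, preallocated: %.4f s, vectorised: %.4f s\n', tGrow, tPre, tVec);

% Print the most time-consuming entries recorded by the profiler, if any.
info = profile('info');
[~, order] = sort([info.FunctionTable.TotalTime], 'descend');
for k = order(1:min(10, numel(order)))
    f = info.FunctionTable(k);
    fprintf('%-40s %10.4f s %8d calls\n', f.FunctionName, f.TotalTime, f.NumCalls);
end
```

Timings depend on the machine and on the MATLAB version, and the profiler itself adds overhead, so the numbers are best compared within a single run.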
Homework 10
Using the results of the system tests and of the computational experiment aimed at error rate analysis, create plots and tables with brief explanations and put them into section 5.2 of system docs. Please separate the parts of this report into appropriately titled paragraphs.
Required parts of the computational experiment:
- Visualization of the model selection procedure and of the optimization of structural parameters.
- Visualization of the resulting model or algorithm, visualization of the applied optimization method, and the dependence of the loss function or quality criterion on the level of added noise or on other factors.
- Visualization of the obtained error rate in the "web" section (also as a plot or table).
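One of the required parts, the dependence of the quality criterion on the noise level, might be computed along the following lines. The data, the model (a least-squares line), the noise levels and the number of repetitions are all assumptions made for the sake of a self-contained sketch.

```matlab
% Synthetic data and a hypothetical model; both are assumptions.
x = linspace(0, 1, 200)';
yClean = 3 * x - 1;
A = [ones(size(x)), x];

noiseLevels = [0, 0.01, 0.05, 0.1, 0.5];
nRepeats = 20;                        % average over several noise draws
rmse = zeros(size(noiseLevels));      % preallocated result

for k = 1:numel(noiseLevels)
    err = zeros(1, nRepeats);
    for r = 1:nRepeats
        yNoisy = yClean + noiseLevels(k) * randn(size(x));
        w = A \ yNoisy;                               % fit on noisy data
        err(r) = sqrt(mean((A * w - yClean) .^ 2));   % error against the clean signal
    end
    rmse(k) = mean(err);
end

% Table for the report, one row per noise level.
fprintf('%12s %12s\n', 'noise level', 'RMSE');
fprintf('%12.3f %12.5f\n', [noiseLevels; rmse]);
```

Calling `plot(noiseLevels, rmse, '-o')` on the same vectors would give the matching figure for the report.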
Homework 11
The folder "web" must contain the following files:
- File "config.json" (the name and extension must be exactly these). Fill in this file using the example in the folder "Group074/Kuznetsov2013SSAForecasting/web/".
- File "main.m" with one input argument and one output variable:
html = main(filename), where filename is a text string containing the file name, and html is a text string containing the visual "web" report in HTML format.
- File "test.csv" (you can use another extension). This file should contain a test object (text, time series, image, sound, video, etc.) for forecasting.
- (Optionally) Other files that are required by the function "main" (in particular, a file with the parameters and structural parameters of the forecasting model or algorithm).
For testing purposes it is strongly recommended to run the function writeHTML. It calls "main('test.csv')" and saves the result into "out.html". This file should contain either the "web" report on the forecasting results or an error message describing what went wrong (the types of errors were considered in the data loading section).
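A minimal sketch of a "main.m" that satisfies the interface html = main(filename) is given below. The forecasting step is a placeholder (the last observed value is repeated), and the input is assumed to be a numeric CSV time series; a real project would substitute its own model and data format.

```matlab
function html = main(filename)
% MAIN  Build an HTML "web" report for the object stored in FILENAME.
% Sketch only: the forecast is a placeholder and the numeric CSV input
% format is an assumption.
%
% Manual equivalent of the check described in the text:
%   html = main('test.csv');
%   fid = fopen('out.html', 'w'); fprintf(fid, '%s', html); fclose(fid);
try
    if exist(filename, 'file') ~= 2
        error('main:fileNotFound', 'File "%s" was not found.', filename);
    end
    series = dlmread(filename, ',');
    series = series(:);
    if isempty(series)
        error('main:emptyData', 'File "%s" contains no data.', filename);
    end
    forecast = series(end);    % placeholder for the project's real model
    html = sprintf(['<html><body><h1>Forecast report</h1>' ...
        '<p>Points read: %d</p><p>Forecast: %g</p></body></html>'], ...
        numel(series), forecast);
catch err
    % Return the problem as a page instead of failing.
    html = sprintf('<html><body><h1>Error</h1><p>%s</p></body></html>', err.message);
end
end
```

Wrapping the whole body in try/catch means the caller always receives an HTML string, either the report or the error message, which matches the two outcomes expected in "out.html".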