User:Strijov/Drafts

From MachineLearning.

Current revision (22:28, 26 February 2023)
 
{{Main article|Numerical methods of learning by precedents (practice, V.V. Strizhov)}}
 
{{TOCright}}
=2021=
 
* Story [[My first scientific article (practice, V.V. Strizhov) / Groups 774, 794, spring 2020| 2020 (774, 794)]] — [[Automation of scientific research in machine learning (practice, V.V. Strizhov)/Group 674, spring 2019|2019 (674)]] — [[Automation of scientific research in machine learning (practice, V.V. Strizhov)/Group 694, spring 2019|2019 (694)]] — [[Numerical methods of learning by precedents (practice, V.V. Strizhov) / Group 574, spring 2018 | 2018]] — [[Numerical methods of learning by precedents (practice, V.V. Strizhov) / Group 474, spring 2017 | 2017]] — [[Numerical methods of learning by precedents (practice, V.V. Strizhov) / Group 374, spring 2016 | 2016]] — [[Numerical methods of learning by precedents (practice, V.V. Strizhov) / Group 274, spring 2015 | 2015]] — [[Numerical methods of learning by precedents (practice, V.V. Strizhov) / Group 174, spring 2014 | 2014]] — [[Numerical methods of learning by precedents (practice, V.V. Strizhov) / Group 074, spring 2013 | 2013]]
==2023==
===Problem 112===
* '''Title:''' Modeling fMRI readings from the video sequence shown to a person
* '''Problem description:''' It is required to build a model of the dependence between fMRI sensor readings and the video sequence that the person is watching at that moment.
* '''Data:''' The sample for approximation is presented in the work of J. Berezutskaya et al., which contains several types of parallel signals.
* '''Literature:''' Berezutskaya J., et al. Open multimodal iEEG-fMRI dataset from naturalistic stimulation with a short audiovisual film // Sci Data 9, 91, 2022.
* '''Predecessor code:'''
* '''Base algorithm:''' Running code based on transformer models.
* '''Novelty:''' Analysis of the relationship between sensor readings and human perception of the external world. It is required to test the hypothesis that the data are related and to propose a method for approximating fMRI readings from the video sequence being viewed.
* '''Authors:''' Expert Grabovoi Andrey.
===Problem 113===
* '''Title:''' Modeling fMRI readings from the audio sequence that a person hears
* '''Problem description:''' It is required to build a model of the dependence between fMRI sensor readings and the audio track that the person is listening to at that moment.
* '''Data:''' The sample for approximation is presented in the work of J. Berezutskaya et al., which contains several types of parallel signals.
* '''Literature:''' Berezutskaya J., et al. Open multimodal iEEG-fMRI dataset from naturalistic stimulation with a short audiovisual film // Sci Data 9, 91, 2022.
* '''Predecessor code:'''
* '''Base algorithm:''' Running code based on transformer models.
* '''Novelty:''' Analysis of the relationship between sensor readings and human perception of the external world. It is required to test the hypothesis that the data are related and to propose a method for approximating fMRI readings from the audio sequence being heard.
* '''Authors:''' Expert Grabovoi Andrey.

===Problem 114===
* '''Title:''' Simulating the Dynamics of Physical Systems with Physics-Informed Neural Networks
* '''Problem description:''' The problem of choosing the optimal model for predicting the dynamics of a physical system is solved. The dynamics of a system means the change of its parameters over time. Neural networks have no a priori knowledge of the system being modeled, so they cannot obtain optimal parameters that respect physical laws. The Lagrangian neural network takes the law of conservation of energy into account when modeling dynamics. This work proposes a Noetherian Lagrangian neural network that, in addition to conservation of energy, takes into account conservation of momentum and angular momentum. It is shown that for this problem the Noetherian Lagrangian neural network is optimal among a fully connected neural network, a long short-term memory (LSTM) network, and the Lagrangian neural network. The models were compared on artificially generated data for the double pendulum, which is the simplest chaotic system. The experimental results confirm the hypothesis that introducing a priori knowledge about the physics of the system improves the quality of the model.
* '''Problem description:''' Generate a set of convolutions from the available data and choose the best one using order and dimensionality reduction techniques.
* '''Data:''' Biomedical accelerometer and gyroscope data, ocean currents, dune movement, air currents.
* '''Literature:''' The base work contains references.
* '''Base algorithm:''' Neural network, Lagrangian neural networks.
* '''Solution:''' Noetherian neural network.
* '''Novelty:''' The proposed network takes the symmetries of the system into account.
* '''Authors:''' Experts Severilov, Strijov V.V., consultant Panchenko.

===Problem 115===
* '''Title:''' Knowledge distillation in deep networks and alignment of model structures
* '''Problem description:''' It is required to build a network with the simplest possible structure, a student model, using a high-quality teacher model. Show how the student's accuracy and stability change. The result of the experiment is a complexity-accuracy-stability plot in which each model is a point.
* '''Data:''' CIFAR-10. It is assumed that the teacher has a structure that is open for analysis and has a large number of layers.
* '''Literature:''' Hinton's original work on distillation, work by Andrey Grabovoi, work by Maria Gorpinich.
* '''Base algorithm:''' Training (models with a given structure of controlled complexity) without distillation. Training (the same models) with Hinton distillation. Layer-wise training. Neuron-wise transfer learning.
* '''Solution:''' As in item 2 (Hinton distillation), but layer by layer. Build a least-cost path over neurons. Consider the covariance matrices of each neuron of each layer for the teacher and for the student. Propose an error function that includes the cost of the least-cost path, and a way to construct this path. The main idea: the transfer goes from teacher to student through pairs of neurons with the most similar distributions (expectation and covariance matrix).
* '''Novelty:''' The proposed transfer significantly reduces complexity without loss of accuracy and solves the problem of interchangeability of neurons by identifying them.
* '''Authors:''' Experts Bakhteev Oleg, Strijov V.V., consultant Gorpinich Maria.
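
The Hinton distillation named in the base algorithm can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the experts' code: the function names, the temperature value, and the mixing weight <code>alpha</code> are hypothetical choices made for the example.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax of logits divided by a temperature (numerically stabilized)."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """alpha * cross-entropy on the hard label
    + (1 - alpha) * T^2 * KL(teacher || student) on softened outputs."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student_soft = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps))
             for pt, ps in zip(p_teacher, p_student_soft) if pt > 0)
    hard_ce = -math.log(softmax(student_logits)[label])
    return alpha * hard_ce + (1 - alpha) * temperature ** 2 * kl

# Hypothetical logits for a three-class example.
loss = distillation_loss([1.0, 0.5, 2.0], [0.2, 0.1, 3.0], label=2)
```

The soft term vanishes when the student reproduces the teacher's logits, so only the hard-label term remains in that case.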

===Problem 116===
* '''Title:''' Neural differential equations for modeling physical activity: selection and generation of mathematical models
* '''Problem description:''' The problem of choosing the optimal mathematical model is posed as a genetic optimization problem. The optimality criterion is defined in terms of the accuracy, complexity, and stability of the model. The sampling procedure consists of two steps: generating a new structure and rejecting it if it does not satisfy the optimality criterion. It is required to choose the optimal model on 'pendulum'-type data: accelerometer, myogram, pulse wave.
* '''Data:''' WISDM, own collection of biomedical data.
* '''Literature:''' Neural CDE.
* '''Base algorithm:''' Neural ODE/CDE on a two-layer neural network.
* '''Solution:''' A number of experiments have already been performed in which sampling is done by a genetic algorithm. Acceptable results have been obtained. It is proposed to analyze and improve them.
* '''Solution:''' An algorithm for generating mathematical models in the form of ordinary differential equations. Comparison of models and solvers on biomedical data.
* '''Authors:''' Expert Strijov V.V., consultant Eduard Vladimirov.

===Problem 117===
* '''Title:''' Search for dependencies in biomechanical systems (do people dance in pairs or independently?) using the convergent cross mapping method and Takens' theorem
* '''Problem description:''' When forecasting complex time series that depend on exogenous factors and have multiple periodicity, it is required to identify related pairs of series. It is assumed that adding these series to the model improves the quality of the forecast. To detect relationships between time series, this work proposes the convergent cross mapping method. In this approach, two time series are related if there exist trajectory subspaces onto which their projections are related. In turn, the projections of the series onto trajectory subspaces are related if a neighborhood of the phase trajectory of one series is mapped to a neighborhood of the phase trajectory of the other. The problem is to find trajectory subspaces that reveal the relationship between the series.
* '''Literature:''' Everything Sugihara wrote in Science and Nature (ask for the collection). Usmanova K.R., Strijov V.V. Detection of dependencies in time series in the problems of building predictive models // Systems and means of informatics, 2019, 29(2). Neural CDE.
* '''Data:''' Accelerometer, gyroscope, and other data describing dynamic systems.
* '''Solution:''' The baseline is in Karina's work. Our approach is to build a Neural ODE for both signals and decide whether both models belong to the same dynamic system.
* '''Authors:''' Expert Strijov V.V., consultants Vladimirov, Samokhina.
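
The cross-mapping idea described above (neighborhoods on the trajectory of one series map to neighborhoods on the trajectory of the other) can be illustrated with a short sketch. It assumes a Takens-style delay embedding and exponentially weighted nearest neighbours; the function names, the number of neighbours, and the sine/cosine test signals are hypothetical and serve only as an example, not as the method from the cited papers.

```python
import math

def delay_embedding(series, dim, lag=1):
    """Delay vectors (x_t, x_{t-lag}, ..., x_{t-(dim-1)*lag})."""
    start = (dim - 1) * lag
    return [tuple(series[t - k * lag] for k in range(dim))
            for t in range(start, len(series))]

def pearson(a, b):
    """Pearson correlation of two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def cross_map_skill(x, y, dim=2, lag=1):
    """Estimate y from neighbours in the delay embedding of x
    and return the correlation between the estimate and y."""
    emb = delay_embedding(x, dim, lag)
    target = y[(dim - 1) * lag:]          # y aligned with the delay vectors
    k = dim + 1                           # number of neighbours (assumption)
    estimates = []
    for i, point in enumerate(emb):
        nearest = sorted((math.dist(point, q), j)
                         for j, q in enumerate(emb) if j != i)[:k]
        d_min = max(nearest[0][0], 1e-12)
        weights = [math.exp(-d / d_min) for d, _ in nearest]
        estimates.append(sum(w * target[j] for w, (_, j) in zip(weights, nearest))
                         / sum(weights))
    return pearson(estimates, target)

# Two coordinates of the same oscillator: a toy pair of related series.
x = [math.sin(0.3 * t) for t in range(200)]
y = [math.cos(0.3 * t) for t in range(200)]
skill = cross_map_skill(x, y)
```

A skill close to one suggests that the trajectory of <code>x</code> carries information about <code>y</code>; in a full convergent cross mapping analysis one would also check how the skill grows with the length of the series.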

===Problem 118===
* '''Title:''' Continuous time in building a BCI neural interface
* '''Problem description:''' In signal decoding problems, the data are represented as multidimensional time series, and a discrete representation of time is normally used. However, recent work on neural ordinary differential equations shows that the hidden state of a recurrent neural network can be treated as the solution of a differential equation. This allows time series to be considered continuous in time.
* '''Data:''' For classification: the P300 dataset, which was used for the article with Alina; the DEAP dataset, which is similar in record format; find a modern dataset, ask U. Grenoble-Alpes.
* '''Literature:''' Neural CDE.
* '''Base algorithm:''' Alina Samokhina's algorithm.
* '''Solution:''' Using Neural ODE variations to approximate the original signal. Comparative analysis of existing approaches to applying differential equations to EEG classification. (Encoder: tensor decomposition; decoder: Neural CDE.)
* '''Novelty:''' A way to construct a continuous signal representation is proposed. The work is done in the functional space of the signal, not with its discrete representation. The parameters of the resulting function are used as the feature space of the resulting model.
* '''Authors:''' Expert Strijov V.V. (was Problem 109), consultant Tikhonov.

===Problem 119===
* '''Title:''' Analysis of the dynamics of multiple learning
* '''Problem description:''' Consider a supervised multiple learning problem in which the training set is not fixed but is updated depending on the predictions of the trained model on the test set. For the process of repeated training, prediction, and updating of the sample, we build a mathematical model and study the properties of the process based on this model. Let f(x) be a feature distribution density function and let G be the algorithm that trains the model, generates predictions on the test set, and mixes the predictions into the training set, as a result of which the feature distribution changes. Let F be the space of non-negative smooth functions whose integral over R^n equals one. Then f_{t+1}(x) = G(f_t)(x), where G is the evolution operator on F and the initial function f_0(x) is known. In general, G can be an arbitrary operator, not necessarily smooth and/or continuous.
** Question 0. Find conditions on the operator G under which the image of G lies in the same class F of distribution density functions. In particular, must G be bounded, with operator norm ||G|| <= 1, so that G(f) \in F is also a distribution density function for any f from F? Does a unit exist in F with respect to the operator G, and what is the identity function f in such F?
** Question 1. Under what conditions on G does there exist a t_0 such that for all t > t_0 the tail of the sequence {f_t} is bounded?
** Question 2. Under what conditions does the operator G have a fixed point?
* '''Data:''' In a computational experiment, it is proposed to check the significance of the restrictions and conditions under which the answers to Questions 0-2 are obtained. For example, for linear regression and/or regression with a multilayer fully connected neural network, with different proportions of predictions mixed into the training set, on synthetic datasets.
* '''Literature:'''
*# Khritankov A., Hidden Feedback Loops in Machine Learning Systems: A Simulation Model and Preliminary Results, https://doi.org/10.1007/978-3-030-65854-0_5
*# Khritankov A., Pilkevich A. Existence Conditions for Hidden Feedback Loops in Online Recommender Systems, https://doi.org/10.1007/978-3-030-91560-5_19
*# Katok A.B., Hasselblatt B. Introduction to the modern theory of dynamical systems. 1999. 768 p. ISBN 5-88688-042-9.
*# Nemytsky V. V., Stepanov V. V. Qualitative theory of differential equations, published in 1974.
* '''Authors:''' Expert Khritankov A.S., Expert Afanasiev A.P.
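
As a toy illustration of Questions 0 and 2, the iteration f_{t+1} = G(f_t) can be simulated on a discretized density. The sketch below assumes a very special case that the problem statement does not impose: G is linear and given by a column-stochastic matrix (a Markov kernel), which keeps every iterate non-negative and normalized. The matrix and all names are hypothetical.

```python
def apply_operator(kernel, density):
    """One step f_{t+1} = G f_t for a linear operator given as a matrix."""
    n = len(density)
    return [sum(kernel[i][j] * density[j] for j in range(n)) for i in range(n)]

def iterate(kernel, density, steps):
    """Apply the operator repeatedly and return the final density."""
    for _ in range(steps):
        density = apply_operator(kernel, density)
    return density

# Every column sums to one, so total probability mass is preserved.
kernel = [[0.9, 0.2, 0.0],
          [0.1, 0.6, 0.3],
          [0.0, 0.2, 0.7]]
f0 = [1.0, 0.0, 0.0]                 # known initial density on three cells
f_limit = iterate(kernel, f0, 200)   # approaches the fixed point of G
```

For this particular kernel the fixed point can be found by hand from G f = f: it is (6/11, 3/11, 2/11). For a general, possibly nonlinear G the existence of such a point is exactly what Question 2 asks.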

===Problem 120===
* '''Title:''' Differentiable algorithm for searching ensembles of deep learning models with diversity control
* '''Problem description:''' The problem of selecting an ensemble of models is considered. It is required to propose a method for controlling the diversity of the base models at the application stage.
* '''Data:''' Fashion-MNIST, CIFAR-10, CIFAR-100 datasets.
* '''Literature:'''
*# Neural Architecture Search with Structure Complexity Control
*# Neural Ensemble Search via Bayesian Sampling
*# DARTS: Differentiable Architecture Search
* '''Base algorithm:''' It is proposed to use DARTS [3] as the base algorithm.
* '''Solution:''' To control the diversity of the base models, it is proposed to use a hypernetwork [1] that shifts the structural parameters in terms of the Jensen-Shannon divergence. At the application stage, base architectures are sampled with a given shift to build an ensemble.
* '''Novelty:''' The proposed method allows building ensembles with any number of base models without additional computational cost relative to the base algorithm.
* '''Authors:''' K.D. Yakovlev, Bakhteev Oleg.
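
The Jensen-Shannon divergence used in the solution to measure the shift between structural parameters can be computed as in the sketch below. It assumes that the structural parameters are represented as discrete probability vectors (for example, a softmax over candidate operations on an edge); that representation and the function names are assumptions made for illustration.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: mean KL of p and q to their midpoint."""
    mid = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, mid) + 0.5 * kl_divergence(q, mid)

# Hypothetical operation probabilities on one edge of two architectures.
base = [0.7, 0.2, 0.1]
shifted = [0.2, 0.5, 0.3]
diversity = js_divergence(base, shifted)
```

Unlike the KL divergence, this quantity is symmetric and bounded by log 2, which makes a target shift easier to specify.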

===Problem 121===
* '''Title:''' Building predictive analytics for air pollution sensors
* '''Problem description:''' Data (time series) are available for air quality monitoring stations in Moscow and the Moscow region. The problem is to assess the achievable quality of forecasting the time series of station readings from their history, and with additional features: taking the stations into account jointly, together with their location, the time of day, weekend or working day, and weather history and forecast (wind).
* '''Data:''' Real data and simulations for Moscow and the Moscow region.
* '''Authors:''' Artem Mikhailov, Vladimir Vanovsky.

===Problem 122===
* '''Title:''' Reducing the dimension of the space in a generative modeling problem using invertible models
* '''Problem description:''' An example of a generative modeling problem is image generation. Some recent models, such as normalizing flows and diffusion models, define invertible transformations, but they operate in a space of very high dimension. It is proposed to combine two approaches: dimensionality reduction and generative modeling.
* '''Data:''' Any image dataset (MNIST/CIFAR10).
* '''Novelty:''' Reducing the dimension can significantly speed up generative models and reduce their complexity.
* '''Author:''' Roman Isachenko.

===Problem 123===
* '''Title:''' Analysis of distribution bias in the contrastive learning problem
* '''Problem description:''' Consider the problem of representation learning. One of the most popular approaches to it is contrastive learning. However, the data used for training often contain labeling errors: false positives and false negatives. It is proposed to analyze various ways of eliminating the biases caused by these errors and to study the properties of the proposed models.
* '''Data:''' Any image dataset (MNIST/CIFAR10).
* '''Novelty:''' Current models are very sensitive to errors. If the bias in the distributions can be taken into account, many product ranking methods will improve significantly in quality.
* '''Author:''' Roman Isachenko.

===Problem 124===
* '''Title:''' Speeding up sampling from diffusion models using adversarial networks
* '''Problem description:''' The most popular generative model today is the diffusion model. Its main disadvantage is slow sampling: to sample one image, a neural network has to be run 100-1000 times. There are ways to speed up this process, and one of them is to use adversarial networks. It is proposed to develop this method and explore various ways of defining the functional for sampling.
* '''Data:''' Any image dataset (MNIST/CIFAR10).
* '''Novelty:''' Faster diffusion models will become even more popular and easier to use.
* '''Author:''' Roman Isachenko.

===Problem 125===
* '''Title:''' Influence of lockdown on the dynamics of the spread of an epidemic
* '''Problem description:''' Introducing a lockdown is considered an effective measure against an epidemic. However, contrary to intuition, it turned out that under certain conditions a lockdown can lead to growth of the epidemic. This effect is absent in the classical "on average" models of epidemic spread, but it was revealed when modeling the epidemic on a contact graph. The problem is to find analytical and quantitative relationships between the parameters under which a lockdown can lead to growth of the epidemic. It is necessary both to identify such relationships in the SEIRS/SEIR/SIS/etc. models based on the SEIRS+ epidemic spread framework (and its modifications) and to theoretically substantiate the relationships obtained from specific realizations of the epidemic.
* '''Data:''' The problem involves working with model and synthetic data: ready-made data exist, and new data can be generated while solving the problem. The problem belongs to unsupervised learning, since a realization of the epidemic on the contact graph contains a high proportion of random events and therefore requires analysis on average over many synthetically generated realizations.
* '''Literature:''' T. Harko, Francisco S. N. Lobo, and M. Mak. "Exact analytical solutions of the Susceptible-Infected-Recovered (SIR) epidemic model and of the SIR model with equal death and birth rates".
* '''Authors:''' A.Yu. Bishuk, A.V. Zuhba.
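
For reference, the classical "on average" behaviour mentioned in the problem description can be reproduced with a mean-field SIR model. The sketch below is not the SEIRS+ framework and does not model a contact graph: it assumes a lockdown simply multiplies the transmission rate by a factor during a time window, uses explicit Euler steps, and all parameter values are hypothetical.

```python
def simulate_sir(beta, gamma, days, lockdown=None, factor=1.0,
                 i0=0.001, dt=0.1):
    """Mean-field SIR with an optional lockdown window (start, end) in days.
    Returns final fractions (s, i, r) and the peak infected fraction."""
    s, i, r = 1.0 - i0, i0, 0.0
    peak = i
    for step in range(int(days / dt)):
        t = step * dt
        rate = beta
        if lockdown is not None and lockdown[0] <= t < lockdown[1]:
            rate = beta * factor          # reduced contacts during lockdown
        new_infected = rate * s * i * dt
        new_recovered = gamma * i * dt
        s, i, r = s - new_infected, i + new_infected - new_recovered, r + new_recovered
        peak = max(peak, i)
    return s, i, r, peak

free = simulate_sir(beta=0.3, gamma=0.1, days=300)
locked = simulate_sir(beta=0.3, gamma=0.1, days=300,
                      lockdown=(20, 120), factor=0.4)
```

In this averaged model the lockdown lowers the peak, which is the intuitive outcome; the effect studied in the problem appears only when the epidemic is simulated on a contact graph.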

===Problem 126===
* '''Title:''' Detection of writing style changes caused by machine generation
* '''Problem description:''' It is required to propose a detection method.
* '''Data:''' The sample for approximation is presented in the work of J. Berezutskaya et al., which contains several types of parallel signals.
* '''Literature:'''
*# G. Gritsay, A. Grabovoy, Y. Chekhovich. Automatic Detection of Machine Generated Texts: Need More Tokens // Ivannikov Memorial Workshop (IVMEM), 2022.
*# M. Kuznetsov, A. Motrenko, R. Kuznetsova, V. Strijov. Methods for intrinsic plagiarism detection and author diarization // Working Notes of CLEF, 2016, 1609 : 912-919.
*# RuATD competition.
* '''Base algorithm:''' Use the results of the RuATD competition as base models for classifying sentences. Use the method from Kuznetsov et al.
* '''Novelty:''' Propose a method for detecting machine-generated fragments in a text using methods for detecting changes in writing style.
* '''Authors:''' Expert Grabovoi Andrey.

===Problem 128===
* '''Title:''' Building a deep learning model based on the problem data
* '''Problem description:''' The problem of optimizing a deep learning model for a new dataset is considered. It is required to propose a model optimization method that generates new models for a new dataset at low computational cost.
* '''Data:''' CIFAR10, CIFAR100.
* '''Literature:''' Variational inference for neural networks, hypernetworks, similar work on adapting a model to a predetermined complexity.
* '''Base algorithm:''' Retrain the model directly.
* '''Solution:''' The proposed method represents a deep learning model as a hypernetwork (a network that generates the parameters of another network) using a Bayesian approach. Probabilistic assumptions about the parameters of deep learning models are introduced, and a variational lower bound on the Bayesian model evidence is maximized. The variational bound is treated as a conditional quantity that depends on information about the problem data.
* '''Novelty:''' The proposed method generates models in one-shot mode (practically without retraining) for the required problem, which significantly reduces the cost of optimization and retraining.
* '''Authors:''' Olga Grebenkova and Bakhteev Oleg.

===Problem 129===
* '''Title:''' Spatiotemporal Prediction with Convolutional Networks and Tensor Decompositions
* '''Problem description:''' Generate a set of convolutions from the available data and choose the best one using order and dimensionality reduction techniques.
* '''Data:''' Consumption and price of electricity, ocean currents, dune movement, air currents.
* '''Literature:'''
*# [http://irep.ntu.ac.uk/id/eprint/32719/1/PubSub10184_Sanei.pdf Tensor-based Singular Spectrum Analysis for Automatic Scoring of Sleep EEG]
*# [https://ieeexplore.ieee.org/document/6661921 Tensor based singular spectrum analysis for nonstationary source separation]
* '''Base algorithm:''' Caterpillar (singular spectrum analysis, SSA), tensor Caterpillar.
* '''Solution:''' Find a multi-periodic time series, build its tensor representation, decompose it into a spectrum, reassemble it, and show the forecast.
* '''Novelty:''' Show that a multilinear model is a convenient way to construct convolutions over the dimensions of space and time.
* '''Authors:''' Expert Strijov V.V., consultant Nadezhda Alsakhanova.
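
The first and last steps of the Caterpillar (SSA) base algorithm can be sketched as follows: embedding a series into a trajectory (Hankel) matrix and mapping a matrix back to a series by diagonal averaging. The decomposition step (SVD, or a tensor decomposition in the tensor variant) is deliberately omitted, and the function names and the window length are assumptions made for the example.

```python
def trajectory_matrix(series, window):
    """Hankel matrix whose columns are lagged windows of the series."""
    columns = len(series) - window + 1
    return [[series[i + j] for j in range(columns)] for i in range(window)]

def diagonal_average(matrix):
    """Average every anti-diagonal to turn a matrix back into a series."""
    rows, cols = len(matrix), len(matrix[0])
    sums = [0.0] * (rows + cols - 1)
    counts = [0] * (rows + cols - 1)
    for i in range(rows):
        for j in range(cols):
            sums[i + j] += matrix[i][j]
            counts[i + j] += 1
    return [s / c for s, c in zip(sums, counts)]

series = [1.0, 2.0, 3.0, 4.0, 5.0]
hankel = trajectory_matrix(series, window=3)
restored = diagonal_average(hankel)   # equals the original series
```

In a full pipeline the trajectory matrix would be decomposed, a subset of components selected, and only then averaged back into a series.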

===Problem 130===
* '''Title:''' Automatic term extraction for topic modeling
* '''Problem description:''' Build an ATE (Automatic Term Extraction) model that automatically extracts phrases that are domain terms from the texts of scientific articles. It is proposed to use efficient collocation detection methods (TopMine or more modern ones) and topic models to determine the topicality of a phrase. The model must be trained without supervision.
* '''Data:''' A collection of scientific articles in the field of machine learning. Articles annotated with terms, for evaluating the models.
* '''Literature:'''
*# El-Kishky A., Song Y., Wang C., Voss C. R., Han J. Scalable topical phrase mining from text corpora // Proc. VLDB Endowment. 2014. Vol. 8, no. 3. Pp. 305-316.
*# Vorontsov K. V. "Probabilistic thematic modeling: theory, models, algorithms and the BigARTM project" (http://www.machinelearning.ru/wiki/images/d/d5/Voron17survey-artm.pdf)
*# Nikolay Shatalov. Unsupervised learning methods for automatically highlighting compound terms in text collections. 2019. VMK MSU.
*# Vladimir Polushin. Topic models for ranking text content recommendations. 2017. VMK MSU.
*# Hanh Thi Hong Tran, Matej Martinc, Jaya Caporusso, Antoine Doucet, Senja Pollak. The Recent Advances in Automatic Term Extraction: A survey. 2023. https://arxiv.org/abs/2301.06767
* '''Base algorithm:'''
** TopMine collocation search method.
** BigARTM topic modeling library.
** Modern methods based on neural network language models.
* '''Solution:''' Application of the TopMine collocation search algorithm followed by filtering by topic. Selection of topic model hyperparameters and of the topicality criterion. Comparison of this approach with modern methods based on neural network language models.
* '''Novelty:''' Previous studies of the proposed approach have shown good results in terms of both recall and computational efficiency. However, they have not yet been compared with neural network models.
* '''Authors:''' Polina Potapova, Vorontsov K.V.

===Problem 131===
* '''Title:''' Iterative improvement of a topic model with user feedback
* '''Problem description:''' Topic modeling is widely used in social sciences and humanities research to understand the thematic structure of large text collections. In a typical use case, the user rates topics as relevant, irrelevant, or junk. If there are too many junk topics, the user tries to build another model. The problem is to use the user's labels at each such rebuild so that relevant topics are preserved, new relevant topics are separated from irrelevant and junk ones where possible, and as few junk topics as possible remain.
* '''Data:''' Any collection of natural language texts with a known thematic structure (approximately how many topics, how many documents on each topic) is suitable. For example, a collection of Lenta news, a Wikipedia dump, posts from Habrahabr, 20 Newsgroups, Reuters, articles from PostNauka. The subject of the collection should interest the researcher, so that there is motivation to evaluate topics manually.
* '''Literature:'''
*# Vorontsov K. V. "Probabilistic thematic modeling: theory, models, algorithms and the BigARTM project" (http://www.machinelearning.ru/wiki/images/d/d5/Voron17survey-artm.pdf).
*# Alekseev V. et al. "TopicBank: Collection of coherent topics using multiple model training with their further use for topic model validation" (https://www.sciencedirect.com/science/article/pii/S0169023X21000483).
* '''Solution:''' Using the BigARTM topic modeling library. Using smoothing and decorrelation regularizers. Developing initialization methods for rebuilding topic models. Finding a ready-made tool or developing a simple, fast, and convenient way to view and label topics.
* '''Novelty:''' The problem of non-uniqueness and instability of models still has no final solution in probabilistic topic modeling. The proposed study is an important step towards building models with the maximum number of interpretable topics that are useful for humanities research.
* '''Authors:''' Vasily Alekseev, Vorontsov K. V.

===Problem 132===
* '''Title:''' Ranking of scientific articles for semi-automatic summarization
* '''Problem description:''' Build a ranking model that takes a selection of texts of scientific articles as input and outputs the order in which they are mentioned in the review.
* '''Data:'''
** Review sections (for example, Introduction and Related Work) of articles from the S2ORC collection (81.1M English-language articles) are used as the training sample. An object of the training set is the sequence of references to articles from the bibliography mentioned in the review sections. For each document there is a set of metadata: year of publication, journal, number of citations, number of citations of the author, etc. There is also an abstract and, possibly, the full text of the article.
** Kendall's rank correlation coefficient is used as the metric.
* '''Literature:'''
*# Kryzhanovskaya S. Yu. "Technology of semi-automatic summarization of thematic collections of scientific articles".
*# Vlasov A. V. "Methods of semi-automatic summarization of collections of scientific articles".
*# Kryzhanovskaya S. Yu., Vorontsov K. V. "Technology for semi-automatic summarization of thematic collections of scientific articles" (http://www.machinelearning.ru/wiki/images/f/ff/Idp22.pdf, p. 371), S2ORC: The Semantic Scholar Open Research Corpus.
* '''Base algorithm:''' Pair-wise ranking methods. Gradient boosting.
* '''Solution:''' The simplest solution is to rank the articles chronologically, by year of publication. To solve the problem, it is proposed to build a ranking model based on gradient boosting. Possible features include the year of publication, the citation count of the article, the citation counts of its authors, the semantic proximity of the publication to the review and to its local context, etc.
* '''Novelty:''' The problem is the first step towards semi-automatic summarization of thematic collections of scientific publications (machine aided human summarization, MAHS). After the outline of the review is built, the system generates prompt phrases for each article, from which the user selects phrases to continue the review.
* '''Author:''' Kryzhanovskaya Svetlana, Vorontsov K. V.
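
The metric and the chronological baseline mentioned above can be sketched as follows. The function computes Kendall's tau in its simplest form (tau-a, assuming no ties); the toy years and mention positions are invented for illustration and are not taken from S2ORC.

```python
def kendall_tau(predicted, actual):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.
    Both arguments give a rank or score for the same items, in the same order."""
    n = len(predicted)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (predicted[i] - predicted[j]) * (actual[i] - actual[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

# Hypothetical example: four cited articles, their publication years,
# and the positions at which a review actually mentions them.
years = [2014, 2016, 2019, 2021]
mention_position = [1, 2, 4, 3]
baseline_quality = kendall_tau(years, mention_position)  # chronological baseline
```

Here the chronological order disagrees with the review on one pair out of six, so the baseline scores (5 - 1) / 6.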

===Problem 133===
* '''Title:''' Diffusion models in the problem of generating the structure of a molecule with optimal energy
* '''Problem description:''' For a small organic molecule (fewer than 100 atoms), knowing only the topology of the molecular graph is not enough to obtain the spatial structure. A molecule can have many possible configurations (conformers), each of which corresponds to a local minimum of the potential. In practice, the most stable conformers, which have the lowest energy, are of greatest interest. Recent studies show that diffusion models can be applied successfully to the generation of molecular structures. This approach gives state-of-the-art results in generating molecules and their conformers for a small number of heavy atoms (the QM9 dataset, up to 9 heavy atoms per molecule), as well as in assessing the binding of a molecule and a protein. It is proposed to build a model that generates minimum-energy conformers for larger molecules.
* '''Data:''' Base dataset QM9.
* '''Literature:'''
*# Different theoretical approaches to the diffusion model: https://arxiv.org/abs/2011.13456
*# Diffusion in molecular generation: https://arxiv.org/abs/2203.17003
*# Diffusion in the problem of binding a protein and a molecule: https://arxiv.org/abs/2210.01776
*# Diffusion in the problem of conformer generation: https://arxiv.org/abs/2203.02923
*# Tutorial on equivariant neural networks: https://arxiv.org/abs/2207.09453
* '''Base algorithm:''' GeoDiff [4].
* '''Solution:''' Implement conformer generation similar to DiffDock [3] for the QM9 dataset. Check the performance of the model on larger molecules.
* '''Novelty:''' The novelty of the work lies in the design of a model for generating large conformers, which is of great practical importance.
* '''Author:''' Philip Nikitin.
 +
 +
===Problem 134===
 +
* '''Title:''' Combining distillation of models and data
 +
* '''Problem description:''' Knowledge distillation is the transfer of knowledge from a more meaningful representation to a compact, concise representation. There are two kinds of knowledge distillation. The first is the distillation of models. In this case, the large model transfers knowledge (distilled) to the small model. The second is data distillation. In this case, a minimum data set is created, on which, after training the model, it achieves a quality comparable to training on a full sample. At the moment, there is no solution that can implement simultaneous distillation of model and knowledge. Therefore, the goal of The problem is to propose a basic solution for model distillation and compare with approaches to model distillation and data distillation.
 +
* '''Data:''' MNIST handwritten digit sample, CIFAR-10 image sample
 +
* '''Literature:'''
 +
*# A collection of various papers on the distillation of data.
 +
*# Review on methods of distillation models.
 +
*# Basic knowledge distillation solution.
 +
*# Basic solution for model distillation.
 +
* '''Base algorithm:''' Baseline for model distillation: Hinton distillation. Baseline for dataset distillation: Dataset Distillation by Matching Training Trajectories.
 +
* '''Solution:''' It is proposed to implement data distillation as a basic algorithm. Then train a larger model on the data and distill it into a smaller model. Next, compare with the original model and the model trained on distilled data.
 +
* '''Novelty:''' The novelty of the work lies in the combination of two distillation approaches, which has not been implemented before
 +
* '''Authors:''' Andrey Filatov
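The model-distillation baseline (Hinton distillation) combines the usual cross-entropy on hard labels with a cross-entropy between temperature-softened teacher and student distributions. The code below is a minimal NumPy sketch; the temperature <code>T</code>, the mixing weight <code>alpha</code> and the toy logits are illustrative assumptions.

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=1, keepdims=True)   # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    n = len(labels)
    # Hard-label term: ordinary cross-entropy at temperature 1.
    ce = -np.log(softmax(student_logits)[np.arange(n), labels]).mean()
    # Soft-label term: cross-entropy between softened teacher and student outputs.
    p_teacher = softmax(teacher_logits, T)
    log_p_student = np.log(softmax(student_logits, T))
    soft = -(p_teacher * log_p_student).sum(axis=1).mean()
    # The T**2 factor keeps the gradient scale of the soft term comparable.
    return alpha * ce + (1.0 - alpha) * T ** 2 * soft

labels = np.array([0, 1])
teacher = np.array([[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]])
good_student = teacher.copy()
bad_student = np.array([[0.0, 5.0, 0.0], [5.0, 0.0, 0.0]])
loss_good = distillation_loss(good_student, teacher, labels)
loss_bad = distillation_loss(bad_student, teacher, labels)
```

In the combined setting proposed above, the same loss could be evaluated on a distilled dataset instead of the full sample.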
 +
 +
===Problem 135===
 +
* '''Title:''' Proximity measures in self-supervised learning problems
 +
* '''Problem description:''' The idea of self-supervised learning is to solve an artificially chosen (pretext) task in order to obtain useful representations of data without labels. One of the most popular approaches is contrastive learning, in which the model is trained to minimize the distance between representations of augmented copies of the same object. The purpose of the project is to investigate how the quality of the resulting representations depends on the choice of the proximity (similarity) measure used in training, and to propose a new variant of the distance measure.
 +
* '''Data:''' CIFAR-100
 +
* '''Literature:'''
 +
*# Solution using squared Euclidean distance.
 +
*# Solution using cosine similarity.
 +
*# Solution based on the information principle.
 +
* '''Base algorithm:''' VICReg, Barlow Twins, SimSiam
 +
* '''Solution:''' One of the distance options that can be proposed is an analogue of the Wasserstein metric, which would take into account the dependencies between features.
 +
* '''Novelty:''' A new way to define the proximity measure is proposed, one that is theoretically justified or helps to obtain representations with given properties.
 +
* '''Authors:''' Polina Barabanshchikova
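The two standard proximity measures listed in the literature, squared Euclidean distance and cosine similarity, are closely related: for unit-norm embeddings, ||a - b||^2 = 2 - 2 cos(a, b). The sketch below checks this on random embeddings; the array names and sizes are illustrative assumptions.

```python
import numpy as np

def squared_euclidean(a, b):
    return ((a - b) ** 2).sum(axis=1)

def cosine_similarity(a, b):
    a_n = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_n = b / np.linalg.norm(b, axis=1, keepdims=True)
    return (a_n * b_n).sum(axis=1)

rng = np.random.default_rng(0)
z1 = rng.standard_normal((4, 8))                 # embeddings of one augmented view
z2 = z1 + 0.1 * rng.standard_normal((4, 8))      # embeddings of the other view

# After L2 normalization the two measures carry the same information.
z1_unit = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
z2_unit = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
dist = squared_euclidean(z1_unit, z2_unit)
cos = cosine_similarity(z1, z2)
```

A measure that accounts for dependencies between features, as suggested in the solution, would differ from both of these because they treat coordinates independently.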
 +
 +
===Problem 136===
 +
* '''Title:''' Stochastic Newton with Arbitrary Sampling
 +
* '''Problem description:''' We analyze second-order methods for solving the Empirical Risk Minimization problem of the form min f(x) over x in R^d. Here x is the parameter vector of some machine learning model, and f_i(x) is the loss function on the i-th training point (a_i, b_i). We aim to solve the problem with a Newton-type method that requires access to only one data point per iteration. We investigate different sampling strategies for the index i_k at iteration k. See description in PDF.
 +
* '''Data:''' It is proposed to use datasets from the open SVM library for the experimental part of the work.
 +
* '''References:'''
 +
*# Stochastic Newton and Cubic Newton Methods with Simple Local Linear-Quadratic Rates
 +
*# Parallel coordinate descent methods for big data optimization
 +
* '''Base algorithm:''' As a base method it is proposed to use Algorithm 1 from the paper Stochastic Newton and Cubic Newton Methods with Simple Local Linear-Quadratic Rates.
 +
* '''Solution:''' It is proposed to adjust the existing sampling strategies from the paper Parallel coordinate descent methods for big data optimization in this work.
 +
* '''Novelty:''' In the literature on second-order methods there are few works on incremental methods. The idea is to analyze the existing method under different sampling strategies. It is known that a proper sampling strategy may improve the performance of a method.
 +
* '''Authors:''' Islamov Rustem, Vadim Strijov
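The following is a minimal sketch of an incremental Newton-type iteration with a non-uniform sampling of the index i_k. It assumes that f is the average of the losses f_i and that the method stores, for every i, the point w_i at which f_i was last evaluated; this is written in the spirit of the base algorithm, and the exact update should be checked against the paper. The function names, the quadratic test losses and the sampling probabilities are illustrative assumptions.

```python
import numpy as np

def stochastic_newton(grad_i, hess_i, n, d, probs, n_iters, rng):
    w = np.zeros((n, d))                               # w[i]: last point used for f_i
    H = np.stack([hess_i(i, w[i]) for i in range(n)])  # stored Hessians
    g = np.stack([grad_i(i, w[i]) for i in range(n)])  # stored gradients
    x = np.zeros(d)
    for _ in range(n_iters):
        # Newton-type step assembled from the stored per-point information.
        rhs = (np.einsum("nij,nj->ni", H, w) - g).mean(axis=0)
        x = np.linalg.solve(H.mean(axis=0), rhs)
        # Arbitrary sampling: draw one index with probability probs[i]
        # and refresh only that data point.
        i = rng.choice(n, p=probs)
        w[i], H[i], g[i] = x, hess_i(i, x), grad_i(i, x)
    return x

rng = np.random.default_rng(0)
n, d, lam = 50, 4, 0.1
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)
# Hypothetical losses: f_i(x) = 0.5 * (a_i . x - b_i)^2 + 0.5 * lam * ||x||^2
grad_i = lambda i, x: A[i] * (A[i] @ x - b[i]) + lam * x
hess_i = lambda i, x: np.outer(A[i], A[i]) + lam * np.eye(d)
probs = (A ** 2).sum(axis=1)
probs /= probs.sum()                                   # sampling proportional to ||a_i||^2
x_hat = stochastic_newton(grad_i, hess_i, n, d, probs, n_iters=20, rng=rng)
x_star = np.linalg.solve(A.T @ A / n + lam * np.eye(d), A.T @ b / n)
```

For quadratic losses the step is exact regardless of the sampling, which makes the example easy to verify; differences between sampling strategies only show up for non-quadratic losses such as the logistic loss.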
 +
 +
===Problem 139===
 +
* '''Title:''' Distillation of models on multidomain samples.
 +
* '''Problem description:''' The problem of reducing the complexity of the approximating model when it is transferred to new data of smaller size is investigated.
 +
* '''Data:''' Samples MNIST, CIFAR-10, CIFAR-100, Amazon products.
 +
* '''Literature:''' Diploma thesis of Kamil Bayazitov
 +
* '''Base algorithm:''' The basic solution and experiments are presented in the thesis.
 +
* '''Authors:''' Grabovoi Andrey
 +
 +
===Problem 140===
 +
* '''Title:''' Tailoring the architecture of a performance-controlled deep learning model
 +
* '''Problem description:''' The problem of adapting the structure of a trained deep learning model to limited computing resources is considered. It is assumed that the resulting architecture (or several architectures) should work efficiently on several types of computing devices (for example, on different GPU models or different mobile devices). It is required to propose a model search method that controls model complexity while taking the target performance characteristics into account.
 +
* '''Data:''' MNIST, CIFAR
 +
* '''Literature:'''
 +
*# Grebenkova O.S., Bakhteev Oleg O., Strijov V.V. Variational optimization of a deep learning model with complexity control // Informatics and its applications, 2021, 15(2).
 +
*# Yakovlev K. D. et al. Neural Architecture Search with Structure Complexity Control //Recent Trends in Analysis of Images, Social Networks and Texts: 10th International Conference, AIST 2021, Tbilisi, Georgia, December 16–18, 2021, Revised Selected Papers. Cham: Springer International Publishing, 2022. - pp. 207-219.
 +
*# FBNet: choosing a model architecture based on target characteristics
 +
* '''Base algorithm:''' FBNet and random search of model substructure
 +
* '''Solution:''' The proposed method uses a differentiable neural architecture search algorithm (FBNet) with control of parameter complexity by a hypernetwork. A hypernetwork is a model that generates the structure of the model depending on the input parameters. It is proposed to use the normalized running time of basic operations on the target computing resources as the hypernetwork parameters. The resulting model will thus make it possible to adapt the architecture to an arbitrary device.
* '''Novelty:''' The proposed method controls the complexity of the model during architecture search without additional heuristics.
 +
* '''Authors:''' Konstantin Yakovlev, Bakhteev Oleg
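The idea of taking device running times into account during differentiable architecture search can be sketched as an expected-latency term: each layer holds logits over candidate operations, and the latency of the relaxed architecture is the softmax-weighted sum of the measured operation latencies. The sketch below uses a plain softmax and an additive penalty, which are simplifying assumptions (FBNet itself uses a Gumbel-softmax relaxation and a different form of the latency term); the latency table is hypothetical.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def expected_latency(arch_logits, op_latency):
    """arch_logits, op_latency: arrays of shape (layers, candidate operations)."""
    return float((softmax(arch_logits) * op_latency).sum())

def search_objective(task_loss, arch_logits, op_latency, lam):
    # Assumed additive trade-off between quality and latency on the target device.
    return task_loss + lam * expected_latency(arch_logits, op_latency)

# Hypothetical normalized running times of 3 operations in 2 layers on one device.
op_latency = np.array([[1.0, 2.0, 4.0],
                       [3.0, 1.0, 2.0]])
uniform_logits = np.zeros((2, 3))
fast_logits = -50.0 * op_latency        # strongly prefers the fastest operation
lat_uniform = expected_latency(uniform_logits, op_latency)
lat_fast = expected_latency(fast_logits, op_latency)
```

Passing a different <code>op_latency</code> table for each device is one simple way to make the same search procedure device-dependent, in line with the hypernetwork parameters described above.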
 +
 +
==2022==
 +
===Results===
 +
{|class="wikitable"
 +
|-
 +
! Author
 +
! Topic
 +
! Links
 +
! Consultant
 +
! Letters
 +
|-
 +
|[https://github.com/anton39reg Pilkevich Anton]
 +
| Existence conditions for hidden feedback loops in recommender systems
 +
|[https://github.com/Intelligent-Systems-Phystech/2021-Project-74 GitHub], [https://docs.google.com/document/d/1OLCqkmArjqFn8M9pB5C_kLoYOv0l1w9RjHy0y0upPew/edit?usp=sharing LinkReview],
 +
[https://github.com/Intelligent-Systems-Phystech/2021-Project-74/raw/main/docs/Pilkevich2021HiddenFeedbackLoops.pdf Paper], [https://github.com/Intelligent-Systems-Phystech/2021-Project-74/raw/main/docs/Pilkevich2021Presentation/Pilkevich2021Presentation.pdf Slides],
 +
[https://www.youtube.com/watch?v=xW_lXGn1WHs&t=24s Video], [https://youtu.be/9ELhIqjFSE8 Video]
 +
|[https://intelligent-systems-phystech.github.io/ru/people/khritankov_as/index.html Khritankov]
 +
| AILB.P-X+R-B-H1CVO.T-EM.H1WJSF
 +
|-
 +
|[https://github.com/Edyarich Vladimirov Eduard]
 +
|Restoration of the trajectory of hand movement from video
 +
|[https://github.com/Intelligent-Systems-Phystech/2022-Project-90 GitHub], [https://docs.google.com/document/d/1RpWz1sqpgwnf-ewTe4OHI_WODGklx5FBjLfzvHkIUYQ/edit?usp=sharing LinkReview],
 +
[https://github.com/Intelligent-Systems-Phystech/2022-Project-90/raw/master/paper/Vladimirov2022RestoringHandMovement.pdf Paper], [https://github.com/Intelligent-Systems-Phystech/2022-Project-90/blob/master/slides/Vladimirov2022Presentation.pdf Slides]
 +
|[https://github.com/r-isachenko Isachenko]
 +
|(B.O.H1M)ALI+PXRBС+V+TED?
 +
|-
 +
|[https://github.com/pkseniya Petrushina Ksenia]
 +
| Anti-Distillation: Knowledge Transfer from Simple Model to a Complex One
 +
|[https://github.com/Intelligent-Systems-Phystech/2022-Project-97 GitHub], [https://docs.google.com/document/d/1ekpNeQnvnpXP_Jwp07llyZArH85IZO7Bz1UAlTme7Xs/edit?usp=sharing LinkReview],
 +
[https://github.com/Intelligent-Systems-Phystech/2022-Project-97/blob/master/paper/Petrushina2022AntiDistillation.pdf Paper], [https://github.com/Intelligent-Systems-Phystech/2022-Project-97/blob/master/slides/Petrushina2022Presentation.pdf Slides]
 +
|[https://github.com/andriygav Grabovoi]
 +
| (B.O.H1M)ALIPXRBСVTED
 +
|-
 +
|[https://github.com/Jhomanik Kornilov Nikita]
 +
| Winterstorm risk prediction via machine learning methods
 +
|[https://github.com/Intelligent-Systems-Phystech/2022-Project-93-1 GitHub], [https://docs.google.com/document/d/1XAld9YsJ-R7Jv-i5SkIGNxX5Hy8vShPv8BA_jig9XcQ/edit?usp=sharing LinkReview],
 +
[https://github.com/Intelligent-Systems-Phystech/2022-Project-93-1/raw/master/paper/Kornilov2022Winterstorm.pdf Paper], [https://github.com/Intelligent-Systems-Phystech/2022-Project-93-1/raw/master/slides/Winterstorm_presentation.pdf Slides]
 +
| Yuri Maksimov
 +
| (B.O.H1M?)ALIPXRBСV+TE0D
 +
|-
 +
|[https://github.com/AlievAE Aliyev Alen]
 +
| Geometric Deep Learning for Protein-Protein Binding Affinity Prediction
 +
|[https://github.com/Intelligent-Systems-Phystech/2022-Project-103 GitHub], [https://docs.google.com/document/d/1J6nfi3nclsB6TOgcoqokSlli0u0YOqPpKzhZ7h0Xltw LinkReview],
 +
[https://github.com/Intelligent-Systems-Phystech/2022-Project-103/blob/master/docs/Aliev2022PpbAffinityPrediction.pdf Paper], [https://github.com/Intelligent-Systems-Phystech/2022-Project-103/blob/master/slides/Aliev2022Presentation.pdf Slides]
 +
| Ilya Igashov
 +
| (B.O.H1M?)ALIPXRBСVTED?
 +
|-
 +
|[https://github.com/IvanLukianenko Lukyanenko Ivan]
 +
| Hail Prediction Using Graph Neural Networks
 +
|[https://github.com/Intelligent-Systems-Phystech/2022-Project94 GitHub], [https://docs.google.com/document/d/1ntAjEcvUhdgxM4CZCwmWDq8fBXiOrqBKh92rto4C92Q/edit?usp=sharing LinkReview],
 +
[https://github.com/Intelligent-Systems-Phystech/2022-Project-94/blob/master/paper/Hail%20risk%20prediction%20with%20HailNet.pdf Paper], [https://github.com/Intelligent-Systems-Phystech/2022-Project-94/blob/master/slides/Hail%20risk%20prediction%20via%20Graph%20Neural%20Networks%20Slides.pdf Slides]
 +
| Yuri Maksimov
 +
| (B.O.H1M?)ALIPXRBСV+TED?
 +
|-
 +
|[https://github.com/Maxgaponov Gaponov Maxim]
 +
| Choosing Interpretable Recurrent Deep Learning Models
 +
|[https://github.com/Intelligent-Systems-Phystech/2022-Project-99 GitHub], [https://docs.google.com/document/d/1R-IAGa-w5Edc23jfB_68OZ34EiBlRq6Yaoc1XR_mQ9g/edit?usp=sharing LinkReview],
 +
[https://github.com/Intelligent-Systems-Phystech/2022-Project-99/blob/master/paper/Gaponov2022InterpretableRNN.pdf Paper], [https://github.com/Intelligent-Systems-Phystech/2022-Project-99/blob/master/slides/Gaponov2022InterpretableRNNSlides.pdf Slides]
 +
|[https://github.com/bahleg Bakhteev Oleg]
 +
| (B.O.H1M)AL+IPXRBСVT???ED
 +
|-
 +
|[https://github.com/MelnikovIgor1 Melnikov Igor]
 +
| Stochastic Newton with Arbitrary Sampling
 +
|[https://github.com/Intelligent-Systems-Phystech/2022-Project-101 GitHub], [https://docs.google.com/document/d/1wwLvqBrUV3atwJfnlqVRAhSk-KlUzbUpW6K_aaJ8arQ/edit?usp=sharing LinkReview],
 +
[https://github.com/Intelligent-Systems-Phystech/2022-Project-101/raw/master/paper/Melnikov2022StochasticNewtonWithArbitrarySampling.pdf Paper], [https://github.com/Intelligent-Systems-Phystech/2022-Project-101/raw/master/slides/one-slide.pdf Slides]
 +
|[https://github.com/Rustem-Islamov Rustem Islamov]
 +
| (B.O.H1M)ALIPXСRBVTED
 +
|-
 +
|[https://github.com/fzmushko Zmushko Philip]
 +
| Continuous time when building a BCI neural interface
 +
|[https://github.com/Intelligent-Systems-Phystech/2022-Project-109 GitHub], [https://docs.google.com/document/d/1tpH34r2x4vRWgaBeBkf8yp__-qGyDQNST-w7X29qgPg/edit?usp=sharing LinkReview],
 +
[https://github.com/Intelligent-Systems-Phystech/2022-Project-109/blob/master/paper/Zmushko2022ContinuousTime.pdf Paper], [https://github.com/Intelligent-Systems-Phystech/2022-Project-109/blob/master/slides/Zmushko2022Presentation.pdf Slides]
 +
|[https://github.com/Alina-Samokhina Samokhina]
 +
| (B.O.H1M)ALI0P0XR?BСVTE?D?
 +
|-
 +
|[https://github.com/hadingus Tishchenko Evgeny]
 +
| Cross-language duplicate search
 +
|[https://github.com/Intelligent-Systems-Phystech/2022-Project-104 GitHub], [https://docs.google.com/document/d/13bZ_Cs5Q-tAfuSEPXVMw-uqTtZkkvoUxF35pRSfx7bI/edit?usp=sharing LinkReview],
 +
[https://github.com/Intelligent-Systems-Phystech/2022-Project-104/blob/master/paper/Tishchenko2022PlagiatDetecting.pdf Paper], [https://github.com/Intelligent-Systems-Phystech/2022-Project-104/blob/master/slides/Tishchenko2022AntiplagiatDetectionSlides.pdf Slides]
 +
| Konstantin Vorontsov
 +
| (B.O.H1M)ALIPXRB0СV0T?E?D?
 +
|-
 +
|[https://github.com/JustAnotherArchetype Antyshev Tikhon]
 +
| Compression for Federated Random Reshuffling
 +
|[https://github.com/Intelligent-Systems-Phystech/2022-Project107 GitHub], [https://docs.google.com/document/d/1T0bsAXp2P8kWmhCtI2lV0KVi4neEdu6FabkWxrAd3aI/edit?usp=sharing LinkReview],
 +
[https://github.com/Intelligent-Systems-Phystech/2022-Project107/blob/master/paper/Antyshev2022CompressionforFedRR.pdf Paper], [https://github.com/Intelligent-Systems-Phystech/2022-Project107/blob/master/slides/Antyshev2022Presentation.pdf Slides]
 +
|[https://grigory-malinovsky.github.io/ Malinovsky]
 +
| (B.O.H1_M?)ALI-PXRBСVT?
 +
|-
 +
|[https://github.com/vladpyzh Pyzh Vladislav]
 +
| Flood risk prediction via machine learning methods
 +
|[https://github.com/Intelligent-Systems-Phystech/2022-Project-93-2 GitHub], [https://docs.google.com/document/d/1eKr7KS_ONyhj9B5ZupALz_ejm9SgO1rmoTvOAmW10G8/edit?usp=sharing LinkReview],
 +
[https://github.com/Intelligent-Systems-Phystech/2022-Project-93-2/raw/master/docs/Pyzh2022Title.pdf Paper], [https://www.overleaf.com/read/tbrgqmyttnnb Online Draft],
 +
[https://github.com/Intelligent-Systems-Phystech/2022-Project-93-2/raw/master/docs/presentation.pdf Slides]
 +
| Yuri Maksimov
 +
| (B.O.H10M?)ALI0P0XRBСVT0ED?
 +
|-
 +
|[https://github.com/Egor-s-gor Zharov Georgy]
 +
| Forest fire risk assessments using machine learning methods
 +
|[https://github.com/Intelligent-Systems-Phystech/2022-Project-93 GitHub], [https://docs.google.com/document/d/17LqpAAdnIwbVIq9dLdZA7z9eBnaf_nd-Dp0_kBKXxYA/edit?usp=sharing LinkReview],
 +
[https://github.com/Intelligent-Systems-Phystech/2022-Project-93/blob/master/paper/First_paper_Zharov_Wildfires.pdf Paper], [https://github.com/Intelligent-Systems-Phystech/2022-Project-93/blob/master/slides/talk.pdf Slides]
 +
| Yuri Maksimov
 +
| (B.O.H1)ALIPX0R0B0С0V0T?E0D?
 +
|-
 +
|[https://github.com/TimkaMLG Muradov Timur]
 +
| Choosing Interpretable Convolutional Deep Learning Models
 +
|[https://github.com/Intelligent-Systems-Phystech/2022-Project99 GitHub], [https://docs.google.com/document/d/177wuzjmAuY4BpG7325QSH9SkS4SBCgWCKBFXzc68YA0/edit LinkReview],
 +
[https://github.com/Intelligent-Systems-Phystech/2022-Project99/raw/master/paper/Muradov2022InterpretableCNN.pdf Paper], [https://github.com/Intelligent-Systems-Phystech/2022-Project99/raw/master/slides/Muradov2022Presentation.pdf Slides]
 +
|[https://github.com/bahleg Bakhteev]
 +
| (B.O.H1)ALI0P0XRBСV0T0E?D?
 +
|-
 +
|[https://github.com/YHx07 Pavlov Dmitry]
 +
| Machine learning approach to startup success prediction
 +
|[https://github.com/Intelligent-Systems-Phystech/2022-Project-vc GitHub], [https://www.overleaf.com/read/zswjpqgmrcmw Online Draft],
 +
[https://github.com/Intelligent-Systems-Phystech/2022-Project-vc/blob/master/paper/2022_Project_vc.pdf Paper], [https://github.com/Intelligent-Systems-Phystech/2022-Project-vc/blob/master/slides/2022_Project_vc.pdf Slides]
 +
| Anton Moiseev, Yuri Ammosov
 +
| (B.O.H10M?)ALI?P?XRBСV?T0E0D0
 +
|-
 +
|}
 +
 +
===Problem 100.2022 (group)===
 +
* '''Title:''' Multi-model representation of dynamical systems
 +
* '''Problem description:''' A system described by attractors in several phase spaces is considered. Particular models are constructed that approximate measurements of the state of the system in each space. A matching multimodel is built. The parameters of the particular models are specified.
 +
* '''Data:''' Human motion video, accelerometer, gyroscope, electroencephalogram signals
 +
* '''Literature:''' Our work on accelerometers and BCI, dissertations by Motrenko, Isachenko, Grabovoi
 +
* '''Base algorithm:''' The particular models are neural networks; the multimodel is built with canonical correlation analysis and is then distilled.
 +
* '''Solution:''' Generalize canonical correlation analysis and distillation to the case of an arbitrary number of models.
 +
* '''Novelty:''' Alignment space built for a set of heterogeneous models
 +
* '''Authors:''' A.V. Grabovoi, Strijov V.V.
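For two sets of measurements, the canonical correlation analysis mentioned as the base algorithm can be computed from orthonormal bases of the centred data matrices: the singular values of the product of these bases are the canonical correlations. The sketch below assumes that both matrices have full column rank; the variable names and the synthetic data are illustrative.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between the columns of X and Y (rows are samples)."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)     # orthonormal basis of the centred X
    Qy, _ = np.linalg.qr(Yc)     # orthonormal basis of the centred Y
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
Y_linear = X @ rng.standard_normal((3, 3))    # same information in other coordinates
Y_noise = rng.standard_normal((200, 2))       # unrelated measurements
rho_linear = canonical_correlations(X, Y_linear)
rho_noise = canonical_correlations(X, Y_noise)
```

Generalizing this pairwise construction to an arbitrary number of models is the subject of the proposed solution and is not covered by the sketch.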
 +
 +
===Problem 90.2022===
 +
* '''Title:''' Hand movement recovery from video
 +
* '''Problem description:''' A skeletal representation of a person's pose is restored from the video sequence. The trajectory of the movement of human limbs sets the initial phase space. The accelerometer signal from the limbs sets the target phase space. Build a model that connects the attractors of the trajectories of the source and target spaces.
 +
* '''Data:''' The initial sample is collected by the authors of the project. Parts of the sample are in the library examples.
 +
* '''Solution:''' The theoretical part is carried out by the extended team. Perform a theoretical study: show that canonical correlation analysis (and in particular the PLS, NNPLS, seq2seq and Neural ODE methods) is a special case of the Sugihara convergent cross mapping method.
 +
* '''Novelty:''' A reversible model has been introduced that maps the coordinates recovered from the video sequence into the accelerations of the mobile phone's accelerometer.
 +
* '''Authors:''' A.D. Kurdyukova, R.I. Isachenko, Strijov V.V.
 +
 +
===Problem 91.2022===
 +
* '''Title:''' Clustering human movement trajectories
 +
* '''Problem description:''' The paper analyzes periodic signals in time series in order to recognize human activity from a mobile accelerometer. Each point of the timeline corresponds to a segment of the historical time series. These segments form a phase trajectory in the phase space of human activity. The principal components of the segments of the phase trajectory are treated as the feature description of the point of the timeline. The paper introduces a new distance function between points in the new feature space and proposes an algorithm that reveals changes in the type of human activity. The algorithm clusters the points of the timeline using a pairwise distance matrix. It was tested on synthetic data and on real data obtained from a mobile accelerometer.
 +
* '''Data:''' USC-HAD, new accelerometer samples
 +
* '''Literature:''' Grabovoy A.V., Strijov V.V. Quasi-periodic time series clustering for human activity recognition // Lobachevskii Journal of Mathematics, 2020, 41 : 333-339.
 +
* '''Base algorithm:''' Caterpillar
 +
* '''Solution:''' Bring Grabovoi's article from the Lobachevsky Journal of Mathematics to perfection
 +
* '''Novelty:''' Use Neural ODE to build the phase trajectory and classify it
 +
* '''Authors:''' A.V. Grabovoi (ask!!), Strijov V.V.
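The construction described above (segments of the series form a phase trajectory, and principal components of the segments serve as features of a time point) can be sketched as follows. The window length, the number of components and the synthetic sinusoidal signal are illustrative assumptions.

```python
import numpy as np

def trajectory_matrix(series, window):
    """Stack overlapping segments of length `window`; row i describes time point i."""
    n = len(series) - window + 1
    return np.stack([series[i:i + window] for i in range(n)])

def principal_components(segments, n_components):
    centered = segments - segments.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    # Project every segment onto the leading principal directions.
    return centered @ vt[:n_components].T, s

t = np.arange(400)
signal = np.sin(2 * np.pi * t / 25)            # synthetic periodic "accelerometer" signal
segments = trajectory_matrix(signal, window=50)
features, singular_values = principal_components(segments, n_components=2)
explained = (singular_values[:2] ** 2).sum() / (singular_values ** 2).sum()
```

For a single sinusoid two components describe the phase trajectory almost completely; a pairwise distance matrix for clustering could then be computed on <code>features</code>.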
 +
 +
===Problem 97.2022===
 +
* '''Title:''' Anti-distillation or teacher training: knowledge transfer from a simple model to a complex one
 +
* '''Problem description:''' The problem of adapting the model to a new sample with a large amount of information is considered. For adaptation, it is proposed to build a new model of greater complexity with further transfer of information from a simple model to it. When transferring information, it is necessary to take into account not only the quality of the forecast on the original sample, but also the adaptability of the new model to the new sample and the robustness of the solution obtained.
 +
* '''Data:''' MNIST handwritten digit sample, CIFAR-10 image sample
 +
* '''Literature:''' Original distillation problem statement: Hinton G. et al. Distilling the knowledge in a neural network //arXiv preprint arXiv:1503.02531
 +
* '''Base algorithm:''' It is proposed to increase the complexity of the model by adding parameters with constant values close to zero. This approach is only a baseline, because it can reduce the robustness of the model and worsen its adaptability to a new sample.
 +
* '''Solution:''' It is proposed to consider several approaches to increasing the complexity of the model, both probabilistic (adding noise to the new parameters, taking operational requirements into account) and algebraic (expanding the parameter space of the model, taking into account the requirements on robustness and on the Lipschitz constant of the original model).
 +
* '''Novelty:''' A method is obtained that adapts an existing model to a more complex training sample without losing information.
 +
* '''Authors:''' Bakhteev, Grabovoi, Strijov V.V.
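The baseline idea of enlarging a model with parameters close to zero can be sketched for a two-layer perceptron: new hidden units are appended with weights of scale <code>eps</code>, so the enlarged model initially reproduces the simple one. The helper <code>widen</code>, the layer sizes and the use of Gaussian noise for the new weights are illustrative assumptions.

```python
import numpy as np

def mlp(x, W1, b1, W2, b2):
    return np.maximum(x @ W1 + b1, 0.0) @ W2 + b2

def widen(W1, b1, W2, extra, eps, rng):
    """Append `extra` hidden units whose weights have scale `eps`."""
    d_in, d_out = W1.shape[0], W2.shape[1]
    W1_new = np.hstack([W1, eps * rng.standard_normal((d_in, extra))])
    b1_new = np.concatenate([b1, np.zeros(extra)])
    W2_new = np.vstack([W2, eps * rng.standard_normal((extra, d_out))])
    return W1_new, b1_new, W2_new

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((6, 4)), rng.standard_normal(4)
W2, b2 = rng.standard_normal((4, 3)), rng.standard_normal(3)
x = rng.standard_normal((10, 6))

y_small = mlp(x, W1, b1, W2, b2)
W1_z, b1_z, W2_z = widen(W1, b1, W2, extra=8, eps=0.0, rng=rng)     # exact copy of the function
W1_e, b1_e, W2_e = widen(W1, b1, W2, extra=8, eps=1e-3, rng=rng)    # small perturbation
y_zero = mlp(x, W1_z, b1_z, W2_z, b2)
y_eps = mlp(x, W1_e, b1_e, W2_e, b2)
```

With <code>eps=0</code> the outputs coincide exactly but the new units receive no gradient signal through their outgoing weights, which is one reason a small non-zero scale or added noise is of interest.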
 +
 +
===Problem 98.2022===
 +
* '''Title:''' Deep learning model selection with expert model matching control
 +
* '''Problem description:''' The classification problem is considered. An expert model of low complexity is specified. It is required to build a deep learning model that gives a high quality of the forecast and is similar in behavior to the expert model.
 +
* '''Data:''' Sociological samples, CIFAR image sample
 +
* '''Literature:''' Yakovlev Konstantin, Grebenkova Olga, Bakhteev Oleg, Strijov Vadim. Neural architecture search with structure complexity control // Communications in Computer and Information Science (Proceedings of the 10th International Conference on Analysis of Images, Social Networks and Texts), 2021
 +
* '''Base algorithm:''' building an expert model.
 +
* '''Solution:''' The proposed method is based on hypernetworks with control of the consistency of the found model with the expert model. A hypernetwork is a deep learning model that generates the parameters of the target model.
 +
* '''Novelty:''' The proposed method makes it possible to take expert judgment into account during model selection and architecture search.
 +
* '''Authors:''' Grebenkova, Bakhteev, Strijov V.V.
 +
 +
===Problem 99.2022===
 +
* '''Title:''' Selection of interpretable convolutional deep learning models
 +
* '''Problem description:''' The problem of choosing an interpretable deep learning classification model is considered. Interpretability is understood as the ability of the model to: a) return the most significant features of an object for classification, b) determine clusters of objects that are similar from the point of view of the classifier.
 +
* '''Data:''' MNIST handwritten digit sample, CIFAR-10 image sample
 +
* '''Literature:'''
 +
*# [https://arxiv.org/pdf/1802.06259.pdf Exact and Consistent Interpretation for Piecewise Linear Neural Networks: A Closed Form Solution]
 +
*# [https://arxiv.org/abs/1602.04938 "Why Should I Trust You?": Explaining the Predictions of Any Classifier]
 +
* '''Base algorithm:''' The LIME algorithm (2) interprets the model by local approximation.
 +
* '''Solution:''' A solution based on the method described in (1) is proposed. That paper considers the multilayer perceptron model with a piecewise linear activation function. Such an activation function makes it possible to treat the classifier as a locally linear one for each sample object, without using approximation. It is proposed to generalize this approach to the main nonlinear functions used in convolutional neural networks: convolution, pooling and normalization functions.
 +
* '''Novelty:''' A new class of neural models that lend themselves to good interpretation is obtained.
 +
* '''Authors:''' Yakovlev, Bakhteev, Strijov V.V.
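The statement that a network with piecewise linear activations is exactly linear around each sample can be checked directly for a ReLU perceptron: fixing the activation pattern at a point x gives a matrix and a bias that reproduce the network output at x without any approximation. The function names and layer sizes below are illustrative assumptions.

```python
import numpy as np

def relu_mlp(x, weights, biases):
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(W @ h + b, 0.0)
    return weights[-1] @ h + biases[-1]

def local_linear_model(x, weights, biases):
    """Return (W_eff, b_eff) such that the network equals W_eff @ x + b_eff near x."""
    A, c, h = np.eye(len(x)), np.zeros(len(x)), x
    for W, b in zip(weights[:-1], biases[:-1]):
        pre = W @ h + b
        mask = (pre > 0).astype(float)     # activation pattern of this layer at x
        A = mask[:, None] * (W @ A)        # invariant: h == A @ x + c
        c = mask * (W @ c + b)
        h = np.maximum(pre, 0.0)
    return weights[-1] @ A, weights[-1] @ c + biases[-1]

rng = np.random.default_rng(0)
sizes = [3, 5, 4, 2]
weights = [rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal(m) for m in sizes[1:]]
x = rng.standard_normal(3)
W_eff, b_eff = local_linear_model(x, weights, biases)
y = relu_mlp(x, weights, biases)
```

The rows of <code>W_eff</code> can be read as per-class feature importances for this particular object, and objects sharing an activation pattern share the same local classifier.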
 +
 +
===Problem 101.2022===
 +
* '''Title:''' Stochastic Newton with Arbitrary Sampling
 +
* '''Problem:''' We analyze second-order methods for solving the Empirical Risk Minimization problem of the form min f(x) over x in R^d. Here x is the parameter vector of some machine learning model, and f_i(x) is the loss function on the i-th training point (a_i, b_i). We aim to solve the problem with a Newton-type method that requires access to only one data point per iteration. We investigate different sampling strategies for the index i_k at iteration k. See description in [http://www.machinelearning.ru/wiki/images/5/5c/Stochastic_Newton_with_Arbitrary_Sampling.pdf PDF].
 +
* '''Dataset:''' It is proposed to use datasets from the open SVM library for the experimental part of the work.
 +
* '''References:'''
 +
*# Stochastic Newton and Cubic Newton Methods with Simple Local Linear-Quadratic Rates
 +
*# Parallel coordinate descent methods for big data optimization
 +
* '''Base algorithm:''' As a base method it is proposed to use Algorithm 1 from the paper Stochastic Newton and Cubic Newton Methods with Simple Local Linear-Quadratic Rates.
 +
* '''Solution:''' It is proposed to adjust the existing sampling strategies from the paper Parallel coordinate descent methods for big data optimization in this work.
 +
* '''Novelty:''' In the literature on second-order methods there are few works on incremental methods. The idea is to analyze the existing method under different sampling strategies. It is known that a proper sampling strategy may improve the performance of a method.
 +
* '''Authors:''' Islamov Rustem, Vadim Strijov
 +
 +
===Problem 107.2022===
 +
* '''Title:''' Compression for Federated Random Reshuffling
 +
* '''Problem:''' We analyze first-order methods for solving the Empirical Risk Minimization problem of the form min f(x) over x in R^d. Here x is the parameter vector of some machine learning model, and f_i(x) is the loss function on the i-th training point (a_i, b_i). We focus on the distributed setting of this problem. We apply compression techniques to reduce the number of communicated bits and overcome the communication bottleneck, and we want to combine them with server-side updates. We aim to generalize existing results and to improve on them in theory and practice.
 +
* '''Dataset:''' It is proposed to use datasets from the open SVM library for the experimental part of the work.
 +
* '''References:'''
 +
*# [https://fl-icml.github.io/2021/papers/FL-ICML21_paper_34.pdf Federated Random Reshuffling with Compression and Variance Reduction]
 +
*# [https://arxiv.org/pdf/2102.06704.pdf Proximal and Federated Random Reshuffling]
 +
*# [https://arxiv.org/pdf/2201.11066.pdf Server-Side Stepsizes and Sampling Without Replacement Provably Help in Federated Optimization]
 +
* '''Base algorithm:''' As a base method we use Algorithm 3 from [https://arxiv.org/pdf/2102.06704.pdf Proximal and Federated Random Reshuffling].
 +
* '''Solution:''' It is proposed to combine the method with two stepsizes with compression operators.
 +
* '''Novelty:''' This would be the first method combining 4 popular federated learning techniques: local steps, compression, reshuffling of data and two stepsizes.
 +
* '''Authors:''' Grigory Malinovsky
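Two of the ingredients named above, compression and reshuffling, can be illustrated in isolation. The sketch below shows the Rand-k sparsifier, a standard unbiased compression operator that keeps k random coordinates and rescales them by d/k, and a per-epoch permutation of data indices. It is not the method of the cited papers; the function names and the test vector are illustrative assumptions.

```python
import numpy as np

def rand_k(x, k, rng):
    """Unbiased Rand-k compression: keep k random coordinates, scale by d / k."""
    d = x.size
    idx = rng.choice(d, size=k, replace=False)
    out = np.zeros_like(x)
    out[idx] = x[idx] * (d / k)
    return out

def reshuffled_epoch(n, rng):
    """Order in which a client visits its n data points during one epoch."""
    return rng.permutation(n)

rng = np.random.default_rng(0)
update = np.linspace(1.0, 2.0, 10)        # hypothetical local update to be communicated
compressed = rand_k(update, k=3, rng=rng)
order = reshuffled_epoch(8, rng)
```

Only the k retained values and their indices need to be sent to the server, which is where the reduction in communicated bits comes from; the scaling keeps the compressed vector equal to the original one in expectation.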
 +
 +
===Problem 108.2022===
 +
* '''Title:''' Distillation of knowledge using sample representation in the common latent space of models
 +
* '''Problem description:''' The problem of distillation, the transfer of information from one or more teacher models to a student model, is considered. A special case is considered in which the teachers have incomplete information about the sample, and each teacher has useful information only about some subset of it.
 +
* '''Data:''' CIFAR-10 image sample; MNIST handwritten digit sample
 +
* '''Literature:'''
 +
*# Hinton G. et al. Distilling the knowledge in a neural network //arXiv preprint arXiv:1503.02531. - 2015. - Vol. 2. - No. 7.
 +
*# Oki H. et al. Triplet Loss for Knowledge Distillation //2020 International Joint Conference on Neural Networks (IJCNN). - IEEE, 2020. - P. 1-7.
 +
* '''Base algorithm:''' Hinton distillation [1].
 +
* '''Solution:''' It is proposed to consider hidden representations of teachers and students obtained using dimensionality reduction algorithms. To align the model spaces, it is proposed to use the autoencoder model with triplet constraints (see, for example, [2]).
 +
* '''Novelty:''' The proposed method will allow the distillation of heterogeneous models, using information from several teachers.
 +
* '''Authors:''' Gorpinich, Bakhteev, Strijov V.V.
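The triplet constraints used to align the latent spaces can be written as a standard triplet loss: the anchor should be closer to the positive than to the negative by at least a margin. The sketch below assumes, for illustration only, that the anchor is a student representation of an object, the positive is a teacher representation of the same object and the negative is a teacher representation of a different object; this assignment and the margin value are not taken from the cited paper.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    d_pos = ((anchor - positive) ** 2).sum(axis=1)   # squared distance to the positive
    d_neg = ((anchor - negative) ** 2).sum(axis=1)   # squared distance to the negative
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()

rng = np.random.default_rng(0)
student = rng.standard_normal((6, 4))           # hypothetical latent codes of 6 objects
teacher_same = student.copy()                   # perfectly aligned teacher codes
teacher_other = student + 10.0                  # codes of clearly different objects
loss_aligned = triplet_loss(student, teacher_same, teacher_other)
loss_collapsed = triplet_loss(student, teacher_same, teacher_same)
```

The loss is zero when the spaces are aligned and well separated, and equals the margin when positives and negatives cannot be told apart.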
 +
 +
===Problem 93.2022===
 +
* '''Title:''' Estimating the risk of forest fires using machine learning methods.
 +
* '''Problem description:''' Wildfire risk is predicted from climate variables (water/air temperature, atmospheric pressure) recorded since 1991. Forecasting is carried out (a) in the short-term range (2-5 years; stationary time series) and (b) in the long-term range (up to 50 years; non-stationary time series). A feature of long-range forecasting is the (probable) significant change in the behavior of climate variables (CMIP5 scenarios). The key features of the problem are (1) the need for a sufficiently accurate prediction of extreme risk values (the maxima of the time series), while the algorithm may make a significant number of errors in the region of small values of the series; and (2) the spatial structure of the data series.
 +
* '''Data:'''
 +
*# [https://developers.google.com/earth-engine/datasets/catalog/IDAHO_EPSCOR_TERRACLIMATE Google Earth Data] - data on climate variables and landscape available via API (there is a jupyter notebook through which you can download data locally)
 +
*# [https://www.worldclim.org/data/cmip6/cmip6_clim2.5m.html CMIP5] climate scenarios (there is a jupyter notebook through which you can download data locally)
 +
*# [https://daac.ornl.gov/cgi-bin/theme_dataset_lister.pl?theme_id=8 Wildfire Risk Database]
 +
*# [https://www.visualcrossing.com/weather/weather-data-services Severe Weather Dataset]
 +
* '''Literature:'''
 +
*# [http://staff.ustc.edu.cn/~hexn/papers/kdd19-timeseries.pdf Daizong Ding, Mi Zhang, Xudong Pan, Min Yang, Xiangnan He. Modeling Extreme Events in Time Series Prediction. KDD-2019].
 +
*# [https://arxiv.org/abs/2004.09140 Roman Kail, Alexey Zaytsev, Evgeny Burnaev. Recurrent Convolutional Neural Networks help to predict the location of Earthquakes].
 +
*# [http://roseyu.com/time-series-workshop/submissions/TSW2017_paper_3.pdf Nikolay Laptev, Jason Yosinski, Li Erran Li, Slawek Smyl. Time-series Extreme Event Forecasting with Neural Networks at Uber].
 +
* '''Base algorithm:''' (1) the method from article 1; (2) ST-LSTM.
 +
* '''Solution:''' It is proposed to solve the problem in two steps. At the first step, Algorithm 1 (with the addition of a spatial component) restores the behavior of the time series, averaged over a certain range. Next, the discrepancy between the values of the series and the model is analyzed. Based on this, the noise distribution is restored and a probabilistic model is built for reaching a certain level of risk in a given territory within the required time range.
 +
* '''Novelty:''' (Geo)spatial time series prediction is an open area with great potential for theoretical and practical work. In particular, fire risk assessment is necessary for (1) predicting the probability of accidents (electric power industry, gas transport complex); (2) prioritizing fire prevention measures by region; (3) assessing the financial risks of companies operating in the region.
 +
* '''Authors:''' Yuri Maksimov, Alexey Zaitsev
 +
* '''Consultants:''' Yuri Maksimov, Alexey Zaitsev, Alexander Lukashevich.
 +
 +
===Problem 94.2022===
 +
* '''Title:''' Hail forecast using graph neural networks
 +
* '''Problem description:''' Hail risk is predicted from climate variables (water/air temperature, atmospheric pressure) recorded since 1991. Forecasting is carried out (a) in the short-term range (2-5 years; stationary time series) and (b) in the long-term range (up to 50 years; non-stationary time series). A feature of long-range forecasting is the (probable) significant change in the behavior of climate variables (CMIP5 scenarios). The key features of the problem are (1) rare events: fewer than 700 cases of hail were recorded across Russia over the past 30 years; and (2) the spatial structure of the data series.
 +
* '''Data:'''
 +
*# [https://developers.google.com/earth-engine/datasets/catalog/IDAHO_EPSCOR_TERRACLIMATE Google Earth Data] - data on climate variables and landscape available via API (there is a jupyter notebook through which you can download data locally)
 +
*# [https://www.worldclim.org/data/cmip6/cmip6_clim2.5m.html CMIP5] climate scenarios (there is a jupyter notebook through which you can download data locally)
 +
*# [https://www.ncdc.noaa.gov/stormevents/ftp.jsp NOAA Storm Events Database]
 +
*# [https://eswd.eu/cgi-bin/eswd.cgi European Severe Weather Database]
 +
*# [https://www.visualcrossing.com/weather/weather-data-services Severe Weather Dataset]
 +
* '''Literature:'''
 +
*# Ayush, Kumar, et al. "Geography-aware self-supervised learning." [https://openaccess.thecvf.com/content/ICCV2021/papers/Ayush_Geography-Aware_Self-Supervised_Learning_ICCV_2021_paper.pdf Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021].
 +
*# Cachay, Salva Rühling, et al. "Graph Neural Networks for Improved El Niño Forecasting." arXiv preprint arXiv:2012.01598 (2020). [https://arxiv.org/pdf/2012.01598.pdf NeurIPS Clima Workshop].
 +
*# Cai, Lei, et al. "Structural temporal graph neural networks for anomaly detection in dynamic graphs." [https://dl.acm.org/doi/pdf/10.1145/3459637.3481955 Proceedings] of the 30th ACM International Conference on Information & Knowledge Management. 2021.
 +
* '''Base algorithm:''' Classification with extremely rare events; the most basic variant is logistic regression + SMOTE. It is proposed to take a combination of the algorithms from articles 2 and 3 as a basis.
 +
* '''Solution:''' It is suggested that a combination of the algorithms from articles 2 and 3 can improve classification in such problems with exceptionally rare events. In addition, physical information is to be used to regularize the classifier (the combinations of temperature and humidity factors at which hail is most likely).
 +
* '''Novelty:''' (geo)-spatial time series prediction is an open area with great potential for theoretical and practical work. In particular, hail risk assessment is necessary for (1) predicting the probability of damage (agriculture, animal husbandry); (2) assessment of insurance and financial risks.
 +
* '''Authors:''' Yuri Maksimov (point of contact), Alexey Zaitsev
 +
* '''Consultants:''' Yuri Maksimov (point of contact), Alexey Zaitsev, Alexander Bulkin.
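The most basic baseline named above, logistic regression + SMOTE, can be sketched in plain Python. This is a simplified illustration, not the project code: the two features stand in for climate variables, the data are synthetic, and <code>smote</code> and <code>fit_logistic</code> are hypothetical helper names (a real experiment would more likely rely on an existing library implementation).

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority points by interpolating between a
    random minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(p, base))[:k]
        other = rng.choice(neighbours)
        t = rng.random()
        synthetic.append([b + t * (o - b) for b, o in zip(base, other)])
    return synthetic


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Plain batch gradient descent on the logistic loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Synthetic data: 40 "no hail" points and only 4 "hail" points
rng = random.Random(1)
majority = [[rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)] for _ in range(40)]
minority = [[rng.gauss(3.0, 0.3), rng.gauss(3.0, 0.3)] for _ in range(4)]

# Oversample the rare class until both classes have the same size
minority_balanced = minority + smote(minority, len(majority) - len(minority))
X = majority + minority_balanced
y = [0] * len(majority) + [1] * len(minority_balanced)
w, b = fit_logistic(X, y)


def predict_proba(x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
```

Oversampling is applied to the training set only; evaluation should use the original, imbalanced class proportions.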
 +
 +
===Problem 95.2022===
 +
* '''Title:''' Identification of the transmission rate and time-dependent noise for the stochastic SIER disease model with vital rates (Time-dependent parameter identification for a stochastic epidemic model)
 +
* '''Problem description:''' The problem is to find the optimal time-dependent parameters for the known stochastic SIER disease propagation model. The optimal parameters are those parameters of the stochastic equation under which the modeled sample of the rate of spread of the virus in a limited population best matches the reference sample. It is proposed to use the local lagged adapted generalized method of moments (LLGMM), which is based on the generalized method of moments (GMM).
 +
* '''Data:''' Johns Hopkins University coronavirus incidence data are available from various sources. You can also download the data yourself from the link.
 +
* '''Literature:'''
 +
*# Anna Mummert, Olusegun M. Otunuga Parameter identification for a stochastic SEIRS epidemic model: case study influenza PDF
 +
*# David M. Drukker Understanding the generalized method of moments (GMM): A simple example LINK
 +
* '''Keywords:''' Compartment disease model, Stochastic disease model, Local lagged adapted generalized method of moments, Time-dependent transmission rate
 +
* '''Base algorithm:''' Several options are available on the Internet, for example, the article by B. Tseytlin, Actually forecasting COVID-19 LINK. The current program does not converge well because it always uses a fixed number of points for prediction.
 +
* '''Novelty:''' A new method of moments, LLGMM, that increases the accuracy of prediction. The basic idea of the method of moments is to replace the mathematical expectations in the moment conditions (moment functions, or simply moments) with sample means, which, by the law of large numbers, should converge asymptotically to the mathematical expectations under sufficiently weak conditions. Since the number of moment conditions is in general greater than the number of estimated parameters, this system of conditions does not have a unique solution. The generalized method of moments covers the situation where more moment conditions can be obtained than there are estimated parameters. The method constructs moment conditions (moment functions), also called orthogonality conditions, in a more general form as some function of the model parameters and the data. The parameters are estimated by minimizing a certain positive quadratic form of the sample means of the moments (moment functions). The quadratic form is minimized iteratively to the required accuracy. If the model contains more than one parameter to be identified (this is our case), then the second and higher moments are used to construct the moment conditions. LLGMM determines time-dependent parameters by using a limited number of "points" of the data time series to form the moment conditions, rather than the entire series. So the method is lagged. In addition, the number of time series elements used varies for each estimate over time. Thus, the method is local and adaptive.
 +
* '''Author:''' expert Vera Markasheva (Laboratory of Computational Bioinformatics of the Center for Systems Biology)
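The local, lagged, and adaptive character of the estimate described under '''Novelty''' can be illustrated with a heavily simplified sketch. It is not the LLGMM procedure from the cited paper: the model is a scalar process with increments of mean <code>drift * dt</code> and variance <code>volatility**2 * dt</code>, the two moment conditions are solved exactly rather than by minimizing a quadratic form, and the window size is chosen by one-step prediction error. All function names are hypothetical.

```python
import statistics


def local_moment_estimate(increments, dt):
    """Solve two sample moment conditions for drift and volatility:
    mean(dy) = drift * dt and var(dy) = volatility**2 * dt."""
    drift = statistics.fmean(increments) / dt
    volatility = (statistics.pvariance(increments) / dt) ** 0.5
    return drift, volatility


def llgmm_like(series, dt, window_sizes):
    """For each time k, estimate the parameters from the last m increments
    only (local, lagged) and keep the m with the smallest one-step
    prediction error (adaptive)."""
    estimates = []
    for k in range(max(window_sizes) + 1, len(series)):
        best = None
        for m in window_sizes:
            # Increments that end strictly before time k (no look-ahead)
            incs = [series[i] - series[i - 1] for i in range(k - m, k)]
            drift, vol = local_moment_estimate(incs, dt)
            error = abs(series[k] - (series[k - 1] + drift * dt))
            if best is None or error < best[0]:
                best = (error, m, drift, vol)
        estimates.append((k, best[1], best[2], best[3]))
    return estimates


# Noise-free toy series whose drift switches from 2 to -1 halfway through
up = [2 * t for t in range(20)]
down = [38 - t for t in range(1, 21)]
series = up + down
estimates = llgmm_like(series, dt=1.0, window_sizes=[2, 3, 5])
first_k, first_m, first_drift, first_vol = estimates[0]
last_k, last_m, last_drift, last_vol = estimates[-1]
```

Because each estimate uses only a short recent window, the drift estimate follows the regime change, which a single fit over the whole series could not do.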
 +
 +
===Problem 96.2022===
 +
* '''Title:''' Impact of the lockdown on the dynamics of the epidemic
 +
* '''Problem description:''' The introduction of a lockdown is considered an effective measure to combat the epidemic. However, contrary to intuition, it turned out that under certain conditions, a lockdown can lead to an increase in the epidemic. This effect is absent for classical models “on average”, but was revealed when modeling the spread of the epidemic, taking into account the contact graph. The problem is to find formulaic and quantitative relationships between the parameters under which the lockdown can lead to an increase in the epidemic.
 +
* '''Data:''' Real data on the spread of the epidemic on contact graphs, especially considering the need for scenario analysis, is not available. The problem involves working with model and synthetic data: there are ready-made data, and it is also assumed that new ones can be generated in the process of solving the problem.
 +
* '''Authors:''' Anton Bishuk, A.V. Zuhba
 +
 +
===Problem 102.2022===
 +
* '''Title:''' Graph neural networks in the problem of regression of pairs of graphs
 +
* '''Problem description:''' The problem of regression on a pair of graphs is considered. In a pair, each vertex of one graph corresponds to a vertex of the second graph. It is required to find the optimal architecture of the graph neural network, taking into account the correspondence specified on the vertices.
 +
* '''Data:''' It is suggested to use chemical reaction datasets [https://github.com/hesther/reactiondatabase github]. For a given dataset, a pair of graphs is specified in a natural way. These are graphs of molecules of initial substances and products of a chemical reaction.
 +
* '''Literature:'''
 +
*# [https://chemrxiv.org/engage/chemrxiv/article-details/60c74e0f9abda2cf1af8d58a DRACON: disconnected graph neural network for atom mapping in chemical reactions.]
 +
*# [https://chemrxiv.org/engage/api-gateway/chemrxiv/assets/orp/resource/item/6112ac487117507542e68bef/original/machine-learning-of-reaction-properties-via-learned-representations-of-the-condensed-graph-of-reaction.pdf Machine learning of reaction properties via learned representations of the condensed graph of reaction.]
 +
*# [https://ieeexplore.ieee.org/abstract/document/9046288 A comprehensive survey on graph neural networks.]
 +
* '''Base algorithm:''' The relationship between the graphs is set at the level of graph embeddings. That is, a separate embedding vector is built for each graph, and then these vectors are concatenated. In this case, information about the correspondence of vertices in the graphs is not explicitly used.
 +
* '''Novelty:''' Using a graph neural network architecture with fixed hyperparameters as an example, study, from theoretical and practical points of view, ways to add information about the relationship between the graphs to the graph neural network.
 +
* '''Authors:''' Filipp Nikitin, Vadim Strijov V.V., Alexander Isaev.
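The base algorithm (one embedding per graph, then concatenation) can be sketched without any learned weights. The sketch is an assumption-laden illustration: mean aggregation and sum pooling stand in for a trained graph neural network, and the toy graphs and names <code>message_pass</code>, <code>graph_embedding</code>, <code>pair_embedding</code>, and <code>permute</code> are hypothetical.

```python
def message_pass(features, adjacency):
    """One round: each node averages its own and its neighbours' features."""
    dim = len(features[0])
    updated = []
    for node, neighbours in enumerate(adjacency):
        group = [features[node]] + [features[j] for j in neighbours]
        updated.append([sum(v[d] for v in group) / len(group)
                        for d in range(dim)])
    return updated


def graph_embedding(features, adjacency, rounds=2):
    """Message passing followed by sum pooling over all nodes."""
    h = features
    for _ in range(rounds):
        h = message_pass(h, adjacency)
    return [sum(v[d] for v in h) for d in range(len(h[0]))]


def pair_embedding(g1, g2):
    """Baseline: embed each graph separately, then concatenate."""
    return graph_embedding(*g1) + graph_embedding(*g2)


def permute(graph, perm):
    """Relabel vertices: old index i becomes perm[i]."""
    features, adjacency = graph
    new_features = [None] * len(features)
    new_adjacency = [None] * len(adjacency)
    for i, p in enumerate(perm):
        new_features[p] = features[i]
        new_adjacency[p] = [perm[j] for j in adjacency[i]]
    return new_features, new_adjacency


# Toy "reactant" and "product" graphs: (node features, adjacency lists)
reactant = ([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [[1], [0, 2], [1]])
product = ([[1.0, 0.0], [0.0, 1.0], [0.0, 2.0]], [[1, 2], [0], [0]])

emb = pair_embedding(reactant, product)
emb_relabelled = pair_embedding(reactant, permute(product, [2, 0, 1]))
```

Relabelling the vertices of the second graph leaves the pair embedding unchanged, which shows concretely that this baseline cannot use the vertex correspondence between the two graphs.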
 +
 +
===Problem 103.2022===
 +
* '''Requirement:''' Fluent English to collaborate, Python and PyTorch (medium level and higher), Git, Bash, Background in computational biology is a plus
 +
* '''Introduction:''' [http://www.machinelearning.ru/wiki/images/f/fa/M1p_ppis.pdf See full description here]. Proteins are involved in several biological reactions by means of interactions with other proteins or with other molecules such as nucleic acids, carbohydrates, and ligands. Among these interaction types, protein–protein interactions (PPIs) are considered to be one of the key factors as they are involved in most of the cellular processes [1]. The binding of two proteins can be viewed as a reversible and rapid process in an equilibrium that is governed by the law of mass action. Binding affinity is the strength of the interaction between two (or more than two) molecules that bind reversibly (interact). It is translated into physico-chemical terms in the dissociation constant Kd, the latter being the concentration of free protein at which half of all binding sites of the second protein type are occupied [2].
 +
* '''Objectives:''' Three main objectives of this work can be formulated as follows: 1. Refine PDBbind [12] data and a standard binding affinity dataset [3], and compile a novel benchmark of PPIs with known binding affinity values. 2. Employ graph-learning toolset to predict binding affinities of PPIs from the new dataset. 3. Benchmark the resulting method against existing state-of-the-art approaches
 +
* '''Data & Metrics:''' In this work, we will operate on experimentally-observed three-dimensional structures of protein-protein complexes annotated with the binding affinity values. Two main sources of data are the following:
 +
* PDBbind dataset [12] that includes around 2k PPIs
 +
* Standard dataset introduced in [3] that includes 144 PPIs. As the main regression metrics, we suggest considering Mean Squared Error (MSE), Mean Absolute Error (MAE), and Pearson correlation.
 +
* '''Novelty:''' To the best of our knowledge, geometric deep learning methods have never been applied to the protein-protein binding affinity prediction problem so far.
 +
* '''Authors:''' Arne Schneuing, Ilia Igashov
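The three suggested regression metrics can be written out in a few lines. This is a minimal sketch for reference; the affinity values below are made up, and the function names are illustrative only.

```python
import math


def mse(y_true, y_pred):
    """Mean Squared Error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson(y_true, y_pred):
    """Pearson correlation coefficient between targets and predictions."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


# Made-up binding affinities (for example, -log Kd) and predictions
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.5, 3.5, 4.5]
scores = (mse(y_true, y_pred), mae(y_true, y_pred), pearson(y_true, y_pred))
```

The example shows why the metrics are reported together: a constant offset in the predictions gives a perfect Pearson correlation while MSE and MAE remain non-zero.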
 +
 +
===Problem 109.2022===
 +
* '''Title:''' Continuous time when building a BCI neural interface
 +
* '''Problem description:''' In signal decoding problems, data is represented as multivariate time series. These problems are usually solved using a discrete representation of time. However, recent work on neural ordinary differential equations shows that the hidden state of a recurrent neural network can be treated as a solution of a differential equation. This allows us to consider time series as continuous in time.
 +
* '''Data:''' For classification:
 +
*# the P300 dataset, on which the article is based
 +
*# the DEAP dataset, similar to it in record format.
 +
*# Definition of emotions.
 +
*# Same SEED emotion classification
 +
*# Not EEG, but accelerometer data with activity/position classification
 +
*# For regression, you can take the same neurotycho, if you want to complicate life somewhat with respect to classification problems.
 +
* '''Literature:'''
 +
*# Neural Ordinary Differential Equations
 +
*# Neural controlled differential equations for irregular time series
 +
*# Latent ODEs for Irregularly-Sampled Time Series (?)
 +
*# GRU-ODE-Bayes: Continuous modeling of sporadically-observed time series (?)
 +
*# Neural Rough Differential Equations for Long Time Series (?)
 +
*# ODE2VAE: Deep generative second order ODEs with Bayesian neural networks (?)
 +
*# Go with the Flow: Adaptive Control for Neural ODEs
 +
*# Legendre Memory Units: Continuous-Time Representation in Recurrent Neural Networks
 +
*# My master's
 +
* '''Base algorithm:''' Alina Samokhina's algorithm
 +
* '''Solution:''' Using Neural ODE variations (Bayesian, partial derivatives, etc.) to approximate the original signal. Comparative analysis of existing approaches to applying differential equations to EEG classification.
 +
* '''Novelty:''' A way to construct a continuous representation of the signal is proposed. The work is carried out in the functional space of the signal rather than with its discrete representation. The parameters of the resulting function are used as the feature space of the resulting model.
 +
* '''Authors:''' Alina Samokhina, Strijov V.V.
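The idea of treating the hidden state as a solution of a differential equation can be sketched as follows. This is an illustrative assumption, not the base algorithm: the hidden state evolves by a fixed-step Euler scheme between observations and is updated at each observation, the weights <code>w</code> and <code>v</code> are fixed by hand instead of being trained, and all names are hypothetical.

```python
import math


def dynamics(h, w):
    """Hypothetical vector field f(h) = tanh(W h) - h."""
    return [math.tanh(sum(w[i][j] * h[j] for j in range(len(h)))) - h[i]
            for i in range(len(h))]


def evolve(h, w, t0, t1, step=0.01):
    """Integrate dh/dt = f(h) from t0 to t1 with fixed-step Euler."""
    n_steps = max(1, math.ceil((t1 - t0) / step))
    dt = (t1 - t0) / n_steps
    for _ in range(n_steps):
        dh = dynamics(h, w)
        h = [hi + dt * di for hi, di in zip(h, dh)]
    return h


def encode(times, values, w, v):
    """Evolve the state continuously between samples, then update it with
    each observed value; the sampling times may be irregular."""
    h = [0.0] * len(w)
    t_prev = times[0]
    for t, x in zip(times, values):
        h = evolve(h, w, t_prev, t)
        h = [math.tanh(hi + vi * x) for hi, vi in zip(h, v)]
        t_prev = t
    return h


w = [[0.5, -0.3], [0.2, 0.1]]   # hand-picked state weights
v = [1.0, -0.5]                 # hand-picked input weights
values = [0.8, -0.2, 0.4]       # one toy signal channel
h_dense = encode([0.0, 0.1, 0.2], values, w, v)
h_sparse = encode([0.0, 1.0, 2.0], values, w, v)
```

The same three samples give different final states when the gaps between them differ, so the time between samples enters the model explicitly instead of being fixed by a discrete step.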
 +
 +
===Problem 104.2022===
 +
* '''Title:''' (Clarification awaited) Cross-language duplicate search
 +
* '''Problem description:''' The problem of cross-language search for text plagiarism is set. The search for duplicates of the original text is carried out among texts in 100 different languages.
 +
* '''Data:'''
 +
*# A selection of scientific articles from the scientific electronic library eLIBRARY.ru, as well as articles from the Wikipedia online encyclopedia, is used as a training sample.
 +
*# The State Rubricator of Scientific and Technical Information (SRSTI), the Universal Decimal Classifier (UDC) are considered as scientific rubricators.
 +
*# The following are used as search quality metrics:
 +
*# average frequency - the frequency, averaged over the control languages, with which the query document falls into the top 10% of documents among which the search is carried out
 +
*# average percentage - the percentage of documents, averaged over the control languages, that are in the top 10% of translation documents that have the same scientific heading as the query document
 +
* '''Literature:''' Vorontsov K. V. Probabilistic thematic modeling: review of models and additive regularization [http://www.machinelearning.ru/wiki/images/d/d5/Voron17survey-artm.pdf PDF]
 +
* '''Base algorithm:'''
 +
*# Hierarchical topic models
 +
*# Topic models with one-pass document vectorization
 +
* '''Solution:''' To solve the search problem, a multimodal thematic model was built. 100 languages were used as modalities, as well as scientific headings, which included articles from the training data. A series of experiments was carried out to improve search quality metrics, including: selection of the optimal tokenization method, addition of regularizers, selection of thematic vector comparison functions, ranking functions, etc.
 +
* '''Novelty:''' Most systems for finding documents in large collections are based on vectorization of the documents in the collection and the search document in one way or another. The latest ways to vectorize documents are usually limited to one language. In this case, the problem arises of creating a uniform system for obtaining vector embeddings of a multilingual collection of documents. The proposed approach makes it possible to train a topic model that encodes information about the distribution of words in a text, regardless of their language affiliation. Also, the solution is subject to restrictions on the size of the model and training time, due to the possibility of practical use of the described model.
 +
* '''Authors:''' Polina Potapova, Konstantin Vorontsov
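The "average frequency" metric from the '''Data''' item can be sketched as a top-10% hit rate averaged over languages. The sketch assumes that documents are already represented by vectors (for example, topic vectors), that candidate <code>i</code> is the translation of query <code>i</code>, and that ranking is by cosine similarity; these choices and all names are assumptions for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def hit_rate_top_fraction(queries, candidates, fraction=0.1):
    """Share of queries whose own translation (same index) is ranked within
    the top `fraction` of the candidate documents."""
    top_n = max(1, math.ceil(fraction * len(candidates)))
    hits = 0
    for i, q in enumerate(queries):
        ranked = sorted(range(len(candidates)),
                        key=lambda j: cosine(q, candidates[j]), reverse=True)
        if i in ranked[:top_n]:
            hits += 1
    return hits / len(queries)


def average_frequency(per_language):
    """Average the hit rate over the control languages."""
    rates = [hit_rate_top_fraction(q, c) for q, c in per_language.values()]
    return sum(rates) / len(rates)


# Toy vectors: 10 documents, one-hot "topic" vectors
docs = [[1.0 if d == i else 0.0 for d in range(10)] for i in range(10)]
shifted = docs[1:] + docs[:1]   # every query matches the wrong document
per_language = {"lang_a": (docs, docs), "lang_b": (shifted, docs)}
score = average_frequency(per_language)
```

In the toy data one language retrieves every translation and the other retrieves none, so the averaged score is 0.5.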
 +
 +
===Problem 52.2022===
 +
* '''Title:''' (pending clarification) Predicting the quality of protein models using spherical convolutions on 3D graphs.
 +
* '''Problem description:''' The purpose of this work is to create and study a new convolution operation on three-dimensional graphs as part of solving the problem of assessing the quality of three-dimensional protein models (a regression problem on graph nodes).
 +
* '''Data:''' [http://predictioncenter.org Models generated by CASP contestants] are used.
 +
* '''Literature:'''
 +
*# [https://drive.google.com/file/d/1pXCED8XBcxbjwtg_1wZG0oAjvUCxFlua/view?usp=sharing Problem details].
 +
*# [https://arxiv.org/abs/1806.01261 Relational inductive biases, deep learning, and graph networks].
 +
*# [https://arxiv.org/abs/1611.08097 Geometric deep learning: going beyond euclidean data].
 +
* '''Base algorithm:''' As a base algorithm, we will use a neural network based on the graph convolution method, which is generally described in [https://arxiv.org/abs/1806.01261].
 +
* '''Solution:''' The presence of a peptide chain in proteins makes it possible to uniquely introduce local coordinate systems for all graph nodes, which in turn makes it possible to create and apply spherical filters regardless of the graph topology.
 +
* '''Novelty:''' In general, graphs are irregular structures, and in many graph learning problems, sample objects do not have a single topology. Therefore, the existing convolution operations on graphs are greatly simplified or do not generalize to different topologies. In this paper, we propose to consider a new method for constructing a convolution operation on three-dimensional graphs, for which it is possible to uniquely choose local coordinate systems associated with each node.
 +
* '''Author:''' Sergey Grudinin
 +
 +
===Problem 110.2022 (technical)===
 +
* '''Title:''' Detection of defects on the car body
 +
* '''Subproblems:''' Classification of cars by type and brand, Classification of car parts (door, hood, roof, etc.), Segmentation of defective areas on different parts of the car, Classification of defects by type (dent, scratch, glass damage), Assessment of the degree of damage.
 +
* '''Data:'''
 +
*# Coco Car Damage Detection Dataset - 70 photos of damaged cars with bounding boxes, semantic masks and damage type (headlight, front bumper, hood, door, rear bumper)
 +
*# Car_damage - 920 photos of damaged cars with labeled masks
 +
*# CarDent-Detection-Assessment - 100 photos of damaged cars with labeled masks
 +
*# CarAccidentDataset - 52 photos of damaged cars with labeled masks
 +
*# Car damage detection - 950 photos of damaged and 1150 photos of whole cars
 +
*# Car Damage - 1512 photos of damaged cars. Labeled to classify the type of damage
 +
*# Cars Dataset - 16185 photos of whole cars, 196 models. Images taken from different angles, with labels and bounding boxes of car elements for matching angles.
 +
* '''Author:''' Andrey Inyakin
 +
 +
===Problem 111.2022 (technical)===
 +
* '''Title:''' Recognition of named entities in informational Russian-language news
 +
* '''Subproblems:''' Estimating the accuracy of available NER models (up to 2 weeks for data collection and markup)
 +
* '''Base algorithm:''' Development of an algorithm for saturation (augmentation) of the training sample with rare named entities
 +
* '''Data:''' To solve the problem, datasets of news from Interfax with the markup of named entities will be prepared.
 +
 +
==2021==
{|class="wikitable"
{|class="wikitable"
|-
|-
Line 14: Line 652:
! Reviewer
! Reviewer
|-
|-
-
|[[Участник:Magistrkoljan| Grebenkova Olga]] (example)
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Magistrkoljan Grebenkova Olga]
|Variational optimization of deep learning models with model complexity control
|Variational optimization of deep learning models with model complexity control
|[https://docs.google.com/document/d/1gHyVeYgzFgco1vUTZRjxT2FbO03GsB27EVEstLWTzdM/edit?usp=sharing LinkReview]
|[https://docs.google.com/document/d/1gHyVeYgzFgco1vUTZRjxT2FbO03GsB27EVEstLWTzdM/edit?usp=sharing LinkReview]
Line 21: Line 659:
[https://github.com/Intelligent-Systems-Phystech/2020-Project60/raw/master/slides/Grebenkova2020OptimizationSlides.pdf Slides]
[https://github.com/Intelligent-Systems-Phystech/2020-Project60/raw/master/slides/Grebenkova2020OptimizationSlides.pdf Slides]
[https://youtu.be/9ELhIqjFSE8 Video]
[https://youtu.be/9ELhIqjFSE8 Video]
-
|[[Участник:Oleg Bakhteev|Oleg Bakhteev]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Oleg_Bakhteev Oleg Bakhteev]
|AILP+UXBR+HCV+TEDWSS
|AILP+UXBR+HCV+TEDWSS
-
|[[Участник:Oleg Bakhteev|Shokorov Vyacheslav]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Vshokorov Shokorov Vyacheslav]
[https://github.com/Intelligent-Systems-Phystech/2020_Project_9/raw/master/review%20Grebenkova.pdf Review]
[https://github.com/Intelligent-Systems-Phystech/2020_Project_9/raw/master/review%20Grebenkova.pdf Review]
|-
|-
-
|[[Участник:Anton39reg| Pilkevich Anton]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Anton39reg Pilkevich Anton]
|Existence conditions for hidden feedback loops in recommender systems
|Existence conditions for hidden feedback loops in recommender systems
|[https://github.com/Intelligent-Systems-Phystech/2021-Project-74 GitHub]
|[https://github.com/Intelligent-Systems-Phystech/2021-Project-74 GitHub]
Line 33: Line 671:
[https://github.com/Intelligent-Systems-Phystech/2021-Project-74/raw/main/docs/Pilkevich2021Presentation/Pilkevich2021Presentation.pdf Slides]
[https://github.com/Intelligent-Systems-Phystech/2021-Project-74/raw/main/docs/Pilkevich2021Presentation/Pilkevich2021Presentation.pdf Slides]
[https://www.youtube.com/watch?v=xW_lXGn1WHs&t=24s Video]
[https://www.youtube.com/watch?v=xW_lXGn1WHs&t=24s Video]
-
|[[Участник:Khritankov| Khritankov Anton]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Khritankov Khritankov Anton]
|AILB*P-X+R-B-H1CVO*T-EM*H1WJSF
|AILB*P-X+R-B-H1CVO*T-EM*H1WJSF
-
|[[Участник:Gorpinich|Gorpinich Maria]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Gorpinich Gorpinich Maria]
[https://github.com/Intelligent-Systems-Phystech/2021-Project-84/raw/main/docs/Pilkevich2021HiddenFeedbackLoops_review.pdf Review]
[https://github.com/Intelligent-Systems-Phystech/2021-Project-84/raw/main/docs/Pilkevich2021HiddenFeedbackLoops_review.pdf Review]
|-
|-
-
|[[Участник:Antonina Kurdyukova| Antonina Kurdyukova|]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Antonina_Kurdyukova Antonina Kurdyukova]
|Determining the phase and disorder of human movement based on the signals of wearable devices
|Determining the phase and disorder of human movement based on the signals of wearable devices
|[https://docs.google.com/document/d/1ts2i6Cq6CCFf3YWGPhtDxDlfj3OCoQGC3RcXou9bo1I/edit?usp=sharing LinkReview]
|[https://docs.google.com/document/d/1ts2i6Cq6CCFf3YWGPhtDxDlfj3OCoQGC3RcXou9bo1I/edit?usp=sharing LinkReview]
Line 45: Line 683:
[https://github.com/Intelligent-Systems-Phystech/2021-Project77/raw/main/slides/Kurdyukova2021Presentation_ru.pdf Slides]
[https://github.com/Intelligent-Systems-Phystech/2021-Project77/raw/main/slides/Kurdyukova2021Presentation_ru.pdf Slides]
[https://www.youtube.com/watch?v=xW_lXGn1WHs&t=684s Video]
[https://www.youtube.com/watch?v=xW_lXGn1WHs&t=684s Video]
-
|[[Участник:KormakovG| Georgy Kormakov]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:KormakovG Georgy Kormakov]
|AILB*PXBRH1CVO*TEM*WJSF
|AILB*PXBRH1CVO*TEM*WJSF
-
|[[Участник:Anton39reg| Pilkevich Anton]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Anton39reg Pilkevich Anton]
[https://github.com/Intelligent-Systems-Phystech/2021-Project-74/raw/main/docs/review_Kurdyukova.pdf Review]
[https://github.com/Intelligent-Systems-Phystech/2021-Project-74/raw/main/docs/review_Kurdyukova.pdf Review]
|-
|-
-
|[[Участник: Yakovlev kd| Yakovlev Konstantin]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Yakovlev_kd Yakovlev Konstantin]
|A differentiable search algorithm for model architecture with control over its complexity
|A differentiable search algorithm for model architecture with control over its complexity
|[https://docs.google.com/document/d/1cxWRiZ1a4JR83kYvxtXwpiOR-g8ar4_NaQ6E2ealEF0/edit LinkReview]
|[https://docs.google.com/document/d/1cxWRiZ1a4JR83kYvxtXwpiOR-g8ar4_NaQ6E2ealEF0/edit LinkReview]
Line 57: Line 695:
[https://github.com/Intelligent-Systems-Phystech/2021-Project85/raw/main/slides/Yakovlev2021Presentation_ru.pdf Slides]
[https://github.com/Intelligent-Systems-Phystech/2021-Project85/raw/main/slides/Yakovlev2021Presentation_ru.pdf Slides]
[https://www.youtube.com/watch?v=xW_lXGn1WHs&t=1157s Video]
[https://www.youtube.com/watch?v=xW_lXGn1WHs&t=1157s Video]
-
|[[Участник: Magistrkoljan| Grebenkova Olga]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Magistrkoljan Grebenkova Olga]
|AILB*PXBRH1CVO*TEM*WJSF
|AILB*PXBRH1CVO*TEM*WJSF
-
|[[Участник: Vitalii_kondratiuk| Pyrau Vitaly]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Vitalii_kondratiuk Pyrau Vitaly]
[https://github.com/Intelligent-Systems-Phystech/2021-Project-Planning/raw/main/docs/Yakovlev2021DARTS_Review.pdf Review]
[https://github.com/Intelligent-Systems-Phystech/2021-Project-Planning/raw/main/docs/Yakovlev2021DARTS_Review.pdf Review]
|-
|-
-
|[[Участник:Gorpinich|Gorpinich Maria]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Gorpinich Gorpinich Maria]
|Trajectory Regularization of Deep Learning Model Parameters Optimization Based on Knowledge Distillation
|Trajectory Regularization of Deep Learning Model Parameters Optimization Based on Knowledge Distillation
|[https://docs.google.com/document/d/1kQj66GEPv4Dx21A1_zJJKLRR1OujsLgkJrgKO5DCz70/edit?usp=sharing LinkReview]
|[https://docs.google.com/document/d/1kQj66GEPv4Dx21A1_zJJKLRR1OujsLgkJrgKO5DCz70/edit?usp=sharing LinkReview]
Line 69: Line 707:
[https://github.com/Intelligent-Systems-Phystech/2021-Project-84/raw/main/docs/slides/Gorpinich2021DistillingKnowledgeSlides.pdf Slides]
[https://github.com/Intelligent-Systems-Phystech/2021-Project-84/raw/main/docs/slides/Gorpinich2021DistillingKnowledgeSlides.pdf Slides]
[https://www.youtube.com/watch?v=xW_lXGn1WHs&t=1625s Video]
[https://www.youtube.com/watch?v=xW_lXGn1WHs&t=1625s Video]
-
|[[Участник: Oleg Bakhteev|Oleg Bakhteev]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Oleg_Bakhteev Oleg Bakhteev]
|AILB*P+XBRC+VH1O*TEM*WJSF
|AILB*P+XBRC+VH1O*TEM*WJSF
-
|[[Участник:Kulackov| Kulakov Yaroslav]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Kulackov Kulakov Yaroslav]
[https://github.com/Intelligent-Systems-Phystech/2021-Project-17/raw/main/docs/GorpinichMaria2020PaperReview.pdf Review]
[https://github.com/Intelligent-Systems-Phystech/2021-Project-17/raw/main/docs/GorpinichMaria2020PaperReview.pdf Review]
|-
|-
-
||[[Участник: Alexandr Tolmachev| Alexandr Tolmachev]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Alexandr_Tolmachev Alexandr Tolmachev]
|Analysis of the QPFS Feature Selection Method for Generalized Linear Models
|Analysis of the QPFS Feature Selection Method for Generalized Linear Models
|[https://docs.google.com/document/d/1mtJc1ZqMSmPh9nRjdCZCV-zOSfDNp3Sejo3sx8mHw9Q/edit?usp=sharing LinkReview]
|[https://docs.google.com/document/d/1mtJc1ZqMSmPh9nRjdCZCV-zOSfDNp3Sejo3sx8mHw9Q/edit?usp=sharing LinkReview]
Line 81: Line 719:
[https://github.com/Intelligent-Systems-Phystech/2021-Project-87/raw/main/Slides/Tolmachev2021Presentation.pdf Slides]
[https://github.com/Intelligent-Systems-Phystech/2021-Project-87/raw/main/Slides/Tolmachev2021Presentation.pdf Slides]
[https://www.youtube.com/watch?v=xW_lXGn1WHs&t=2201s Video]
[https://www.youtube.com/watch?v=xW_lXGn1WHs&t=2201s Video]
-
|[[Участник:Aduenko| Aduenko Alexander]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Aduenko Aduenko Alexander]
|AILB*PXB-R-H1CVO*TEM*WJSF
|AILB*PXB-R-H1CVO*TEM*WJSF
-
|[[Участник:Antonina Kurdyukova|Antonina Kurdyukova]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Antonina_Kurdyukova Antonina Kurdyukova]
[https://github.com/Intelligent-Systems-Phystech/2021-Project77/raw/main/docs/Tolmachev2021BayesApproach_Review.pdf Review]
[https://github.com/Intelligent-Systems-Phystech/2021-Project77/raw/main/docs/Tolmachev2021BayesApproach_Review.pdf Review]
|-
|-
-
|[[Участник:Kulackov| Kulakov Yaroslav]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Kulackov Kulakov Yaroslav]
|BCI: Selection of consistent models for building a neural interface
|BCI: Selection of consistent models for building a neural interface
|[https://docs.google.com/document/d/1w28UOFRZgXhvt2MZqgdj682vGS9fjP6EUijrQqYoUPs/edit?usp=sharing LinkReview]
|[https://docs.google.com/document/d/1w28UOFRZgXhvt2MZqgdj682vGS9fjP6EUijrQqYoUPs/edit?usp=sharing LinkReview]
Line 93: Line 731:
[https://github.com/Intelligent-Systems-Phystech/2021-Project-17/raw/main/presentation/Kulakov2021Presentation.pdf Slides]
[https://github.com/Intelligent-Systems-Phystech/2021-Project-17/raw/main/presentation/Kulakov2021Presentation.pdf Slides]
[https://www.youtube.com/watch?v=xW_lXGn1WHs&t=2850s Video]
[https://www.youtube.com/watch?v=xW_lXGn1WHs&t=2850s Video]
-
|[[Участник:Isachenkoroma| Isachenko Roman]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Isachenkoroma Isachenko Roman]
|AILB*PXBRH1CVO*TEM*WJ0SF
|AILB*PXBRH1CVO*TEM*WJ0SF
-
|[[Участник:Zverev.eo| Zverev Egor]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Zverev.eo Zverev Egor]
[https://raw.githubusercontent.com/Intelligent-Systems-Phystech/2021-Project-86/main/docs/PeerReviewForKulakov(RUS).pdf Review]
[https://raw.githubusercontent.com/Intelligent-Systems-Phystech/2021-Project-86/main/docs/PeerReviewForKulakov(RUS).pdf Review]
|-
|-
-
|[[Участник:Vitalii_kondratiuk| Pyrau Vitaly]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Vitalii_kondratiuk Pyrau Vitaly]
|Experimental comparison of several problems of operational planning of biochemical production.
|Experimental comparison of several problems of operational planning of biochemical production.
|[https://docs.google.com/document/d/115kv-KWPdX5R_UkEA8UlZV9opw-OmnevRM87R3xrn6k/edit?usp=sharing LinkReview]
|[https://docs.google.com/document/d/115kv-KWPdX5R_UkEA8UlZV9opw-OmnevRM87R3xrn6k/edit?usp=sharing LinkReview]
Line 107: Line 745:
|[https://mipt.ru/education/chairs/dm/staff/trenin.php Trenin Sergey Alekseevich]
|[https://mipt.ru/education/chairs/dm/staff/trenin.php Trenin Sergey Alekseevich]
|AILB*PXBRH1CVO*TEM*WJSF
|AILB*PXBRH1CVO*TEM*WJSF
-
|[[Участник: Yakovlev kd| Yakovlev Konstantin]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Yakovlev_kd Yakovlev Konstantin]
[https://github.com/Intelligent-Systems-Phystech/2021-Project85/raw/main/docs/Pirau2021_Scheduling_Review.pdf Review]
[https://github.com/Intelligent-Systems-Phystech/2021-Project85/raw/main/docs/Pirau2021_Scheduling_Review.pdf Review]
|-
|-
-
|[[Участник:Bazhenov.aa| Bazhenov Andrey]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Bazhenov.aa Bazhenov Andrey]
|Search for the boundaries of the iris by the method of circular projections
|Search for the boundaries of the iris by the method of circular projections
|[https://docs.google.com/document/d/1rmd1MQemJhgHG7W3p3qH2Di7KAxtClImVB_Gx_lOWCY/edit?usp=sharing LinkReview]
|[https://docs.google.com/document/d/1rmd1MQemJhgHG7W3p3qH2Di7KAxtClImVB_Gx_lOWCY/edit?usp=sharing LinkReview]
Line 117: Line 755:
[https://github.com/Intelligent-Systems-Phystech/2021-Project88/raw/master/slides/Bazhenov2021Presentation.pdf Slides]
[https://github.com/Intelligent-Systems-Phystech/2021-Project88/raw/master/slides/Bazhenov2021Presentation.pdf Slides]
[https://www.youtube.com/watch?v=xW_lXGn1WHs&t=4712s Video]
[https://www.youtube.com/watch?v=xW_lXGn1WHs&t=4712s Video]
-
|[[Участник:IvanMatveev| Matveev Ivan Alekseevich]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:IvanMatveev Matveev Ivan Alekseevich]
|AILB*PXB0RH1CVO*TEM*WJ0SF
|AILB*PXB0RH1CVO*TEM*WJ0SF
|
|
|-
|-
-
|[[Участник:Zverev.eo| Zverev Egor]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Zverev.eo Zverev Egor]
|Learning co-evolution information with natural language processing for protein folding problem
|Learning co-evolution information with natural language processing for protein folding problem
|[https://docs.google.com/document/d/1x4TGjGlGjtr2m4hhzY3qGSFU7bZ6wv03eHwpTOVC3-8/edit?usp=sharing LinkReview]
|[https://docs.google.com/document/d/1x4TGjGlGjtr2m4hhzY3qGSFU7bZ6wv03eHwpTOVC3-8/edit?usp=sharing LinkReview]
Line 127: Line 765:
[https://raw.githubusercontent.com/Intelligent-Systems-Phystech/2021-Project-86/main/docs/Zverev2021CoevolutionFromLMs.pdf Paper] [https://raw.githubusercontent.com/Intelligent-Systems-Phystech/2021-Project-86/main/docs/Zverev2021Presentation.pdf Slides]
[https://raw.githubusercontent.com/Intelligent-Systems-Phystech/2021-Project-86/main/docs/Zverev2021CoevolutionFromLMs.pdf Paper] [https://raw.githubusercontent.com/Intelligent-Systems-Phystech/2021-Project-86/main/docs/Zverev2021Presentation.pdf Slides]
[https://www.youtube.com/watch?v=xW_lXGn1WHs&t=4184s Video]
[https://www.youtube.com/watch?v=xW_lXGn1WHs&t=4184s Video]
-
|[https://team.inria.fr/nano-d/team-members/sergei-grudinin/ Sergei Grudinin], [[Участник:Igashov| Ilya Igashov]]
+
|[https://team.inria.fr/nano-d/team-members/sergei-grudinin/ Sergei Grudinin], [http://www.machinelearning.ru/wiki/index.php?title=Участник:Igashov Ilya Igashov]
|AILB*PXBRH1CVO*TEM*WJSF
|AILB*PXBRH1CVO*TEM*WJSF
-
|[[Участник: Alexandr Tolmachev| Alexandr Tolmachev]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Alexandr_Tolmachev Alexandr Tolmachev]
[https://github.com/Intelligent-Systems-Phystech/2021-Project-87/raw/main/docs/Zverev2021Review.pdf Review]
[https://github.com/Intelligent-Systems-Phystech/2021-Project-87/raw/main/docs/Zverev2021Review.pdf Review]
|-
|-
Line 140: Line 778:
|[https://faculty.skoltech.ru/people/yurymaximov Yuri Maksimov]
|[https://faculty.skoltech.ru/people/yurymaximov Yuri Maksimov]
|AILB*PX0B0R0H1C0V0O*0T0E0M*0W0JS0F
|AILB*PX0B0R0H1C0V0O*0T0E0M*0W0JS0F
-
|[[Участник:Bazhenov.aa| Bazhenov Andrey]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Bazhenov.aa Bazhenov Andrey]
[https://github.com/Intelligent-Systems-Phystech/2021-Project88/raw/master/docs/Gorchakov2021_Importance_Sampling_for_Chance_Constrained_Optimization_Review.pdf Review]
[https://github.com/Intelligent-Systems-Phystech/2021-Project88/raw/master/docs/Gorchakov2021_Importance_Sampling_for_Chance_Constrained_Optimization_Review.pdf Review]
|-
|-
-
|[[Участник:NikLin|Lindemann Nikita]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:NikLin Lindemann Nikita]
|Training with an expert for a sample with many domains
|Training with an expert for a sample with many domains
|[https://docs.google.com/document/d/1wL99D7UyY2uJqHwvxTfTKX3REoauyub5L8bFnRnwpJU/edit?usp=sharing LinkReview]
|[https://docs.google.com/document/d/1wL99D7UyY2uJqHwvxTfTKX3REoauyub5L8bFnRnwpJU/edit?usp=sharing LinkReview]
Line 149: Line 787:
[https://github.com/Intelligent-Systems-Phystech/2021-Project-82/raw/main/docs/Lindemann2021DomainAdaptation.pdf Paper]
[https://github.com/Intelligent-Systems-Phystech/2021-Project-82/raw/main/docs/Lindemann2021DomainAdaptation.pdf Paper]
[https://github.com/Intelligent-Systems-Phystech/2021-Project-82/raw/main/Slides/Lindemann2021PresentationDomainAdaptation.pdf Slides]
[https://github.com/Intelligent-Systems-Phystech/2021-Project-82/raw/main/Slides/Lindemann2021PresentationDomainAdaptation.pdf Slides]
-
|[[Участник:Andriygav|Andrey Grabovoi]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Andriygav Andrey Grabovoi]
|AILPXBRH1C0V0O*TE0M*0W0J0SF0
|AILPXBRH1C0V0O*TE0M*0W0J0SF0
|
|
|-
|-
|}
|}
-
===Task 74===
+
 
-
* '''Name:''' Existence conditions for hidden feedback loops in recommender systems
+
===Problem 74.2021===
 +
* '''Title:''' Existence conditions for hidden feedback loops in recommender systems
* '''Problem description:''' Recommender systems are known to inadvertently narrow the user's choice as the model adapts to the user's preferences (the echo chamber / filter bubble effect). This effect is a special case of hidden feedback loops (see Analysis H.F.L.). By repeatedly recommending the same objects of interest to the user, the algorithm maximizes its own quality measure. The resulting problems are a) lack of variety and b) saturation / volatility of the user's interests.
* '''Problem description:''' Recommender systems are known to inadvertently narrow the user's choice as the model adapts to the user's preferences (the echo chamber / filter bubble effect). This effect is a special case of hidden feedback loops (see Analysis H.F.L.). By repeatedly recommending the same objects of interest to the user, the algorithm maximizes its own quality measure. The resulting problems are a) lack of variety and b) saturation / volatility of the user's interests.
-
* '''Task: '''It is clear that the algorithm does not know the interests of the user and the user is not always honest in his choice. Under what conditions, what properties of the learning algorithm and dishonesty (deviation of the user's choice from his interests) will the indicated effect be observed? Clarification. The recommendation algorithm gives the user a_t objects to choose from. The user selects one of them c_t from Bernoulli from the model of interest mu(a_t) . Based on the user's choice, the algorithm changes its internal state w_t and gives the next set of objects to the user. On an infinite horizon, you need to maximize the total reward sum c_t. Find the conditions for the existence of an unlimited growth of user interest in the proposed objects in a recommender system with the Thomson Sampling (TS) MAB algorithm under conditions of noisy user choice c_t. Without noise, it is known that there is always unlimited growth (in the model) [1].
+
* '''Problem statement:''' The algorithm does not know the user's interests, and the user is not always honest in his choice. Under what conditions on the learning algorithm and on the user's dishonesty (the deviation of the user's choice from his interests) will the indicated effect be observed? Clarification: the recommendation algorithm offers the user a set of objects a_t to choose from. The user's choice c_t is drawn from a Bernoulli distribution given by the interest model mu(a_t). Based on the user's choice, the algorithm updates its internal state w_t and offers the next set of objects. Over an infinite horizon, the total reward sum c_t must be maximized. Find the conditions for the existence of unlimited growth of user interest in the proposed objects in a recommender system with the Thompson Sampling (TS) MAB algorithm under noisy user choice c_t. Without noise, it is known that unlimited growth always occurs (in the model) [1].
* '''Data:''' are created as part of the experiment (simulation model) by analogy with the article [1], external data is not required.
* '''Data:''' are created as part of the experiment (simulation model) by analogy with the article [1], external data is not required.
* '''References:'''
* '''References:'''
Line 163: Line 802:
*# Khritankov, A. (2021). Hidden Feedback Loops in Machine Learning Systems: A Simulation Model and Preliminary Results. In International Conference on Software Quality (pp. 54-65). Springer, Cham.
*# Khritankov, A. (2021). Hidden Feedback Loops in Machine Learning Systems: A Simulation Model and Preliminary Results. In International Conference on Software Quality (pp. 54-65). Springer, Cham.
*# Khritankov A. (2021). Hidden feedback loop experiment demo. https://github.com/prog-autom/hidden-demo
*# Khritankov A. (2021). Hidden feedback loop experiment demo. https://github.com/prog-autom/hidden-demo
-
* '''Basic algorithm:''' The initial mathematical model of the phenomenon under study is described in the article [1]. The method of experimental research is in the article [2]. The base source code is available at [3]
+
* '''Base algorithm:''' The initial mathematical model of the phenomenon under study is described in the article [1]. The method of experimental research is in the article [2]. The base source code is available at [3]
-
* '''Solution:''' It is necessary to derive conditions for the existence of positive feedback for the Thomson Sampling Multi-armed Bandit algorithm based on the known theoretical properties of this algorithm. Then check their performance in the simulation model. For verification, a series of experiments is performed with the study of parameter ranges and the estimation of the error (variance) of the simulation. The results are compared with the previously constructed mathematical model of the effect. There is an implementation of the experiment system that can be improved for this task.
+
* '''Solution:''' Derive conditions for the existence of positive feedback for the Thompson Sampling multi-armed bandit algorithm from the known theoretical properties of this algorithm. Then check these conditions in the simulation model. For verification, a series of experiments is performed that explores parameter ranges and estimates the simulation error (variance). The results are compared with the previously constructed mathematical model of the effect. An existing implementation of the experiment system can be extended for this problem.
-
* '''Novelty:''' The studied positive feedback effect is observed in real and model systems and is described in many publications as an undesirable phenomenon. There is his model for the limited case of the absence of noise in the user's actions, which is not implemented in practice. Under the proposed conditions, Task has not previously been posed and not solved for recommender systems. For the regression problem, the solution is known.
+
* '''Novelty:''' The studied positive feedback effect is observed in real and model systems and is described in many publications as an undesirable phenomenon. A model of the effect exists only for the limited case of no noise in the user's actions, which does not occur in practice. Under the proposed conditions, the problem has not previously been posed or solved for recommender systems. For the regression problem, the solution is known.
-
* '''Authors:''' Expert, consultant - Anton Khritankov
+
* '''Authors:''' Expert, consultant Anton Khritankov
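
The interaction loop described in the problem statement can be illustrated with a short sketch. It is not the model of [1] or the code of [3]. It assumes a Beta-Bernoulli Thompson Sampling bandit, a fixed interest vector mu (so it does not reproduce interest growth), one recommended item per step, and noise modelled as flipping the observed choice c_t with probability noise. The function name and all parameters are hypothetical.

```python
import random


def thompson_sampling(mu, noise, steps, seed=0):
    """Beta-Bernoulli Thompson Sampling with a noisy user response.

    mu    -- assumed fixed interest probabilities, one per item
    noise -- assumed probability that the observed choice c_t is flipped
    Returns the list of observed rewards c_t.
    """
    rng = random.Random(seed)
    alpha = [1.0] * len(mu)  # 1 + number of observed successes per item
    beta = [1.0] * len(mu)   # 1 + number of observed failures per item
    rewards = []
    for _ in range(steps):
        # Draw one plausible interest level per item from its posterior
        # and recommend the item with the largest draw.
        samples = [rng.betavariate(a, b) for a, b in zip(alpha, beta)]
        a_t = max(range(len(mu)), key=samples.__getitem__)
        # Honest user response: Bernoulli(mu[a_t]).
        c_t = 1 if rng.random() < mu[a_t] else 0
        # Noisy response: flip the choice with probability `noise`.
        if rng.random() < noise:
            c_t = 1 - c_t
        # Update the internal state w_t (here: the Beta posterior).
        alpha[a_t] += c_t
        beta[a_t] += 1 - c_t
        rewards.append(c_t)
    return rewards
```

Running the function for several values of noise and comparing the mean reward gives a first impression of how response noise changes what the algorithm learns; a faithful experiment would replace the fixed mu with the interest dynamics of [1].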
-
===Task 77===
+
===Problem 77.2021===
-
* '''Name:''' Determining the phase and disorder of human movement by signals from wearable devices
+
* '''Title:''' Determining the phase and disorder of human movement by signals from wearable devices
-
* '''Task:''' A wide class of periodic movements of a person or an animal is investigated. It is required to find the beginning and end of the movement. It is required to understand when one type of movement ends and another begins. For this, the Task of segmentation of time series is solved. The phase trajectory of one movement is constructed and its actual dimension is found. The purpose of the work is to describe a method for finding the minimum dimension of the phase space. By repetition of the phase, segment the periodic actions of a person. It is also necessary to propose a method for extracting the zero phase in a given space for a specific action. Bonus: find the discord in the phase trajectory and indicate the change in the type of movement. Bonus 2: do this for different phone positions by proposing invariant transformation models.
+
* '''Problem description:''' A wide class of periodic movements of a person or an animal is investigated. It is required to find the beginning and end of a movement and to understand when one type of movement ends and another begins. For this, the problem of time series segmentation is solved. The phase trajectory of one movement is constructed and its actual dimension is found. The purpose of the work is to describe a method for finding the minimum dimension of the phase space. Using the repetition of the phase, segment the periodic actions of a person. It is also necessary to propose a method for extracting the zero phase in a given space for a specific action. Bonus: find the discord in the phase trajectory and indicate the change in the type of movement. Bonus 2: do this for different phone positions by proposing invariant transformation models.
* '''Data:''' The data consists of time series read from a three-axis accelerometer with an explicit periodic class (walking, running, walking up and down stairs, etc.). It is possible to get your own data from a mobile device, or get model data from the dataset [https://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones UCI HAR]
* '''Data:''' The data consists of time series read from a three-axis accelerometer with an explicit periodic class (walking, running, walking up and down stairs, etc.). It is possible to get your own data from a mobile device, or get model data from the dataset [https://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones UCI HAR]
* '''References:'''
* '''References:'''
-
*# A. P. Motrenko, V. V. Strijov. Extracting fundamental periods to segment biomedical signals // Journal of Biomedical and Health Informatics, 2015, 20(6).P. 1466–1476
1.(Time series segmentation with periodic actions: the segmentation Task was solved using a phase space of fixed dimension.) [http://strijov.com/papers/MotrenkoStrijov2014RV2.pdf PDF][http://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group874/Motrenko2014TSsegmentation/JBHI/MotrenkoStrijov2014RV2.pdf?format=raw URL]
+
*# A. P. Motrenko, V. V. Strijov. Extracting fundamental periods to segment biomedical signals // Journal of Biomedical and Health Informatics, 2015, 20(6).P. 1466–1476. Time series segmentation with periodic actions: The segmentation problem was solved using a fixed-dimensional phase space. [http://strijov.com/papers/MotrenkoStrijov2014RV2.pdf PDF][http://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group874/Motrenko2014TSsegmentation/JBHI/MotrenkoStrijov2014RV2.pdf?format=raw URL]
-
*# A.D. Ignatov, V. V. Strijov. Human activity recognition using quasi-periodic time series collected from a single triaxial accelerometer. // Multimedia Tools and Applications, 2015, P. 1–14.
( Classification of human activity using time series segmentation
: classifiers over the resulting segments were studied.) [https://rdcu.be/6oBD PDF][http://strijov.com/papers/Ignatov2015HumanActivity.pdf URL]
+
*# A.D. Ignatov, V. V. Strijov. Human activity recognition using quasi-periodic time series collected from a single triaxial accelerometer. // Multimedia Tools and Applications, 2015, P. 1–14. Classification of human activity using time series segmentation: classifiers were studied on the resulting segments. [https://rdcu.be/6oBD PDF][http://strijov.com/papers/Ignatov2015HumanActivity.pdf URL]
-
*# Grabovoy, A.V., Strijov, V.V. Quasi-Periodic Time Series Clustering for Human Activity Recognition. Lobachevskii J Math 41, 333–339 (2020). (Segmentation of time series into quasi-periodic segments
: segmentation methods using principal component analysis and transition to phase space were studied.) [http://www.machinelearning.ru/wiki/images/c/cd/Grabovoy2019BSThesis.pdf Text] [http://www.machinelearning.ru/wiki/images/1/19/Grabovoy2019TimeSeriesClusteringSlides.pdf Slides] [https://doi.org/10.1134/S1995080220030075
1 DOI]
+
*# Grabovoy, A.V., Strijov, V.V. Quasi-Periodic Time Series Clustering for Human Activity Recognition. Lobachevskii J Math 41, 333–339 (2020). Segmentation of time series into quasi-periodic segments: Segmentation methods were explored using principal component analysis and transition to phase space. [http://www.machinelearning.ru/wiki/images/c/cd/Grabovoy2019BSThesis.pdf Text] [http://www.machinelearning.ru/wiki/images/1/19/Grabovoy2019TimeSeriesClusteringSlides.pdf Slides] [https://doi.org/10.1134/S19950802200300751 DOI]
-
* '''Basic algorithm:''' The basic algorithm is described in 1 and 3 works, [https://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group874/Motrenko2014TSsegmentation/ code here], work code 3 author.
+
* '''Base algorithm:''' The base algorithm is described in works 1 and 3, [https://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group874/Motrenko2014TSsegmentation/ code here]; the code for work 3 is available from its author.
* '''Solution:''' It is proposed to consider various dimensionality reduction algorithms and compare different spaces in which the phase trajectory is constructed. Develop an algorithm for finding the minimum dimension of the phase space in which the phase trajectory has no self-intersections up to the standard deviation of the reconstructed trajectory.
* '''Solution:''' It is proposed to consider various dimensionality reduction algorithms and compare different spaces in which the phase trajectory is constructed. Develop an algorithm for finding the minimum dimension of the phase space in which the phase trajectory has no self-intersections up to the standard deviation of the reconstructed trajectory.
* '''Novelty:''' In Motrenko's article, the space dimension is equal to two. This shortcoming must be corrected. The phase trajectory must not intersect itself. And if we can distinguish one type of movement from another within one period (switched from running to a step and realized this within one and a half steps), it will be great.
* '''Novelty:''' In Motrenko's article, the space dimension is equal to two. This shortcoming must be corrected. The phase trajectory must not intersect itself. And if we can distinguish one type of movement from another within one period (switched from running to a step and realized this within one and a half steps), it will be great.
-
* '''Authors:''' 
consultants: Kormakov G.V., Tikhonov D.M., Expert Strizhov V.V.
+
* '''Authors:'''
 +
consultants: Kormakov G.V., Tikhonov D.M., Expert Strijov V.V.
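
The phase trajectory mentioned in the problem description is commonly built by time-delay embedding. The sketch below is a minimal illustration under that assumption; it is not the algorithm of works 1 or 3. The function names and the parameters dim (embedding dimension) and lag (delay in samples) are hypothetical.

```python
import math


def delay_embedding(series, dim, lag):
    """Return the phase trajectory of `series` as a list of dim-tuples.

    Point number t is (x[t], x[t + lag], ..., x[t + (dim - 1) * lag]).
    """
    n_points = len(series) - (dim - 1) * lag
    if n_points <= 0:
        raise ValueError("series is too short for this dim and lag")
    return [
        tuple(series[t + k * lag] for k in range(dim))
        for t in range(n_points)
    ]


def sine_wave(period, n_samples):
    """Synthetic periodic signal standing in for an accelerometer axis."""
    return [math.sin(2.0 * math.pi * t / period) for t in range(n_samples)]
```

For a pure sine wave with a lag of a quarter period, the two-dimensional trajectory is a circle without self-intersections. For real accelerometer data the trajectory in two dimensions may cross itself, which is why the problem asks for the minimum dimension in which it does not.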
-
===Task 78===
+
===Problem 78.2021===
-
* '''Name:''' Importance Sampling for Scenario Approximation of Chance Constrained Optimization
+
* '''Title:''' Importance Sampling for Scenario Approximation of Chance Constrained Optimization
-
* '''Task:''' Optimization problems with probabilistic constraints are often encountered in engineering practice. For example, the Task of minimizing energy generation in energy networks, with (randomly fluctuating) renewable energy sources. In this case, it is necessary to comply with safety restrictions: voltages at generators and consumers, as well as currents on the lines, must be less than certain thresholds. However, even in the simplest situations, the Task cannot be resolved exactly. The best-known approach is the chance constrained optimization methods, which often give a good approximation. An alternative approach is sampling the network operation modes and solving the problem on the data set of the classification problem: separating bad modes from good ones with a given error of the second kind. At the same time, for a sufficiently accurate solution, a very large amount of data is required, which often makes the problem numerically inefficient. We suggest using “importance sampling” to reduce the number of scenarios. Importance sampling consists of substituting a sample from a nominal solution, which often carries no information since all bad events are very rare, with a synthetic distribution that samples the sample in a neighborhood of bad events.            
+
* '''Problem description:''' Optimization problems with probabilistic constraints are often encountered in engineering practice. One example is the problem of minimizing energy generation in energy networks with (randomly fluctuating) renewable energy sources. In this case, safety restrictions must be met: voltages at generators and consumers, as well as currents on the lines, must stay below certain thresholds. However, even in the simplest situations, the problem cannot be solved exactly. The best-known approach is chance constrained optimization, which often gives a good approximation. An alternative approach is to sample the network operation modes and solve the problem on the data set as a classification problem: separating bad modes from good ones with a given error of the second kind. For a sufficiently accurate solution, however, a very large amount of data is required, which often makes the problem numerically inefficient. We suggest using “importance sampling” to reduce the number of scenarios. Importance sampling replaces the sample drawn at the nominal solution, which often carries no information since all bad events are very rare, with a sample from a synthetic distribution concentrated in a neighborhood of bad events.
* '''Problem statement:''' find the minimum of a convex function (price) under probabilistic constraints (the probability of exceeding a certain threshold for a system of linear/quadratic functions is small) and numerically show the effectiveness of sampling in this problem.
* '''Problem statement:''' find the minimum of a convex function (price) under probabilistic constraints (the probability of exceeding a certain threshold for a system of linear/quadratic functions is small) and numerically show the effectiveness of sampling in this problem.
* '''Data:''' Data is available in the pypower and matpower packages as csv files.
* '''Data:''' Data is available in the pypower and matpower packages as csv files.
* '''References:''' The proposed algorithms are based on 3 articles:
* '''References:''' The proposed algorithms are based on 3 articles:
-
*# Owen, Maximov, Chertkov. Importance Sampling for the Union of Rare Events with Applications to Power Systems [https://statistics.sites.stanford.edu/sites/g/files/sbiybj6031/f/2017-10.pdf LINK]
+
*# Owen, Maximov, Chertkov. Importance Sampling for the Union of Rare Events with Applications to Power Systems [https://statistics.sites.stanford.edu/sites/g/files/sbiybj6031/f/2017-10.pdf LINK]
-
*# A. Nemirovski. On safe tractable approximations of chance constraints [https://www2.isye.gatech.edu/~nemirovs/EUROXXIV.pdf LINK]
+
*# A. Nemirovski. On safe tractable approximations of chance constraints [https://www2.isye.gatech.edu/~nemirovs/EUROXXIV.pdf LINK]
-
*# S. Tong, A. Subramanyam, and Vi. Rao. Optimization under rare chance constraints. [https://arxiv.org/pdf/2011.06052.pdf LINK]
+
*# S. Tong, A. Subramanyam, and Vi. Rao. Optimization under rare chance constraints. [https://arxiv.org/pdf/2011.06052.pdf LINK]
*# In addition, the authors of the problem have a draft of the article, in which you need to add a numerical part.
*# In addition, the authors of the problem have a draft of the article, in which you need to add a numerical part.
-
* '''Basic algorithm:''' A list of basic algorithms is provided in this lecture [http://niaohe.ise.illinois.edu/IE598_2020/IE598NH-lecture-10-11-CCP.pdf LINK]
+
* '''Base algorithm:''' A list of basic algorithms is provided in this lecture [http://niaohe.ise.illinois.edu/IE598_2020/IE598NH-lecture-10-11-CCP.pdf LINK]
-
* '''Solution:''' in numerical experiments, you need to compare the sample size requirements for standard methods (scenario approximation) and using importance sampling to obtain a solution of comparable quality (and inverse Task, having equal sample lengths, compare the quality of the solution)
+
* '''Solution:''' in numerical experiments, compare the sample sizes that standard methods (scenario approximation) and importance sampling require to obtain a solution of comparable quality (and, conversely, compare the quality of the solutions for equal sample sizes)
-
* '''Novelty:''' Task has long been known in the community and scenario approximation is one of the main methods. At the same time, importance sampling helps to significantly reduce the number of scenarios. We have recently received a number of interesting results on how to calculate optimal samplers, with their use the complexity of the problem will be significantly reduced
+
* '''Novelty:''' The problem has long been known in the community, and scenario approximation is one of the main methods for it. Importance sampling helps to significantly reduce the number of scenarios. We have recently obtained a number of interesting results on how to compute optimal samplers; using them significantly reduces the complexity of the problem
-
* '''Authors:''' Expert Yuri Maksimov, consultant Yuri Maksimov and Alexander Lukashevich, student.
+
* '''Authors:''' Expert Yuri Maksimov, consultant Yuri Maksimov and Alexander Lukashevich.
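
The idea of sampling in a neighborhood of bad events can be illustrated on a one-dimensional toy case. The sketch below is not the method of the referenced articles. It assumes a standard normal quantity X, a bad event X > threshold, and a proposal distribution N(threshold, 1) chosen for illustration; the function names are hypothetical.

```python
import math
import random


def rare_event_probability(threshold, n_samples, seed=0):
    """Importance sampling estimate of P(X > threshold) for X ~ N(0, 1).

    Samples are drawn from the shifted proposal N(threshold, 1), so about
    half of them hit the rare event. Each hit is reweighted by the density
    ratio phi(x) / phi(x - threshold) = exp(-threshold * x + threshold^2 / 2).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(threshold, 1.0)
        if x > threshold:
            total += math.exp(-threshold * x + 0.5 * threshold ** 2)
    return total / n_samples


def exact_tail(threshold):
    """Exact P(X > threshold) for X ~ N(0, 1), used as a reference value."""
    return 0.5 * math.erfc(threshold / math.sqrt(2.0))
```

For a threshold of 4 the event has probability of about 3e-5, so a plain Monte Carlo sample of a few tens of thousands of points would typically contain no bad events at all, while the reweighted estimate uses roughly half of its samples.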
-
===Task 79===
+
===Problem 79.2021===
-
* '''Name:''' Improving Bayesian Inference in Physics Informed Machine Learning
+
* '''Title:''' Improving Bayesian Inference in Physics Informed Machine Learning
-
* '''Task:''' Machine learning methods are currently widely used in physics, in particular, in solving turbulence problems or analyzing the stability of physical networks. At the same time, the key issue is which modes to choose for training models. A frequent choice is a sequence of points that uniformly covers the admissible set. However, often such sequences are not very informative, especially if analytical methods give a region where the system is guaranteed to be stable. The problem proposes several methods of sampling: allowing to take into account this information. Our goal is to compare them and find the one that requires the smallest sample size (empirical comparison).
+
* '''Problem description:''' Machine learning methods are currently widely used in physics, in particular for solving turbulence problems and analyzing the stability of physical networks. The key issue is which modes to choose for training the models. A frequent choice is a sequence of points that uniformly covers the admissible set. However, such sequences are often not very informative, especially if analytical methods give a region where the system is guaranteed to be stable. Several sampling methods that take this information into account are proposed. Our goal is to compare them and find the one that requires the smallest sample size (empirical comparison).
* '''Data:''' The experiment is proposed to be carried out on model and real data. The simulation experiment consists in analyzing the stability of (slightly non-linear) differential equations (synthetic data is self-generated). The second experiment is to analyze the stability of energy systems (data from matpower, pypower, GridDyn).
* '''Data:''' The experiment is proposed to be carried out on model and real data. The simulation experiment consists in analyzing the stability of (slightly non-linear) differential equations (synthetic data is self-generated). The second experiment is to analyze the stability of energy systems (data from matpower, pypower, GridDyn).
* '''References:'''
* '''References:'''
-
*# Art Owen. Quasi Monte Carlo Sampling. [https://statweb.stanford.edu/~owen/courses/362-1011/readings/siggraph03.pdf LINK ]
+
*# Art Owen. Quasi Monte Carlo Sampling. [https://statweb.stanford.edu/~owen/courses/362-1011/readings/siggraph03.pdf LINK ]
-
*# Jian Cheng & Marek J. Druzdzel. Computational Investigation of Low-Discrepancy Sequences in Simulation Algorithms for Bayesian Networks [https://arxiv.org/pdf/1301.3841.pdf LINK]
+
*# Jian Cheng & Marek J. Druzdzel. Computational Investigation of Low-Discrepancy Sequences in Simulation Algorithms for Bayesian Networks [https://arxiv.org/pdf/1301.3841.pdf LINK]
-
*# A. Owen, Y Maximov, M. Chertkov. Importance Sampling for the Union of Rare Events with Applications to Power Systems [https://statistics.sites.stanford.edu/sites/g/files/sbiybj6031/f/2017-10.pdf LINK]
+
*# A. Owen, Y Maximov, M. Chertkov. Importance Sampling for the Union of Rare Events with Applications to Power Systems [https://statistics.sites.stanford.edu/sites/g/files/sbiybj6031/f/2017-10.pdf LINK]
-
*# Polson and Solokov. Deep Learning: A Bayesian Perspective [https://arxiv.org/pdf/1706.00473.pdf LINK]
+
*# Polson and Sokolov. Deep Learning: A Bayesian Perspective [https://arxiv.org/pdf/1706.00473.pdf LINK]
*# In addition: the authors of the problem have a draft work on this topic
*# In addition: the authors of the problem have a draft work on this topic
-
* '''Basic algorithm:''' The basic algorithm we are improving is Quasi Monte Carlo (QMC, [https://statweb.stanford.edu/~owen/courses/362-1011/readings/siggraph03.pdf LINK ]). Task to construct low discrepancy sequences not covering the polyhedral region and the region given by the intersection of the quadratic constraints. Another algorithm with which we need a comparison:
+
* '''Base algorithm:''' The base algorithm we are improving is Quasi Monte Carlo (QMC, [https://statweb.stanford.edu/~owen/courses/362-1011/readings/siggraph03.pdf LINK ]). The problem is to construct low-discrepancy sequences not covering the polyhedral region and the region given by the intersection of the quadratic constraints. Other algorithms to compare against: E. Gryazina, B. Polyak. Random Sampling: a Billiard Walk Algorithm [https://www.sciencedirect.com/science/article/pii/S1474667016425711 LINK] and Hit and Run type algorithms [https://statweb.stanford.edu/~cgates/PERSI/papers/hitandrun062207.pdf LINK]
-
E. Gryazina, B. Polyak. Random Sampling: a Billiard Walk Algorithm [https://www.sciencedirect.com/science/article/pii/S1474667016425711 LINK] and with Hit and Run type algorithms [https://statweb.stanford.edu/~cgates/PERSI/papers/hitandrun062207.pdf LINK]
+
* '''Solution:''' importance sampling methods, in particular an extension of the approaches of (Boy, Ryi, 2014) and (Owen, Maximov, Chertkov, 2017), and their applications to ML/DL for physical problems
* '''Solution:''' importance sampling methods, in particular an extension of the approaches of (Boy, Ryi, 2014) and (Owen, Maximov, Chertkov, 2017), and their applications to ML/DL for physical problems
* '''Novelty:''' a significant reduction in sample complexity and the explicit use of existing analytical results when learning to solve physical problems; until now, ML approaches and analytical solutions have mostly developed in parallel
* '''Novelty:''' a significant reduction in sample complexity and the explicit use of existing analytical results when learning to solve physical problems; until now, ML approaches and analytical solutions have mostly developed in parallel
* '''Authors:''' Expert Yuri Maksimov, consultant Yuri Maksimov and Alexander Lukashevich, student.
* '''Authors:''' Expert Yuri Maksimov, consultant Yuri Maksimov and Alexander Lukashevich, student.
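
A low-discrepancy sequence of the kind used by the QMC base algorithm can be sketched as follows. This is only an illustration on the unit square using the Halton construction, chosen here as an assumption; it does not handle polyhedral or quadratic constraint regions, which is the actual subject of the problem. The function names are hypothetical.

```python
def radical_inverse(index, base):
    """Van der Corput radical inverse: mirror the base-`base` digits of
    `index` around the radix point, giving a value in [0, 1)."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def halton_points(n_points, bases=(2, 3)):
    """First `n_points` points of the Halton sequence, one base per axis."""
    return [
        tuple(radical_inverse(i, b) for b in bases)
        for i in range(1, n_points + 1)
    ]


def qmc_mean(func, n_points):
    """Quasi Monte Carlo estimate of the mean of `func` on the unit square."""
    points = halton_points(n_points)
    return sum(func(x, y) for x, y in points) / n_points
```

For example, qmc_mean(lambda x, y: x * y, 1024) approximates the integral of x*y over the unit square, whose exact value is 0.25. An empirical comparison of sampling methods, as the problem proposes, would repeat such an estimate for each method and record the sample size needed for a given error.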
-
 
+
-
 
+
===Problem 81.2021===
-
===Task 81 ===
+
* '''Title:''' NAS — Generation and selection of neural network architectures
-
* '''Name:''' NAS — Generation and selection of neural network architectures
+
* '''Problem description:''' The problem of choosing the optimal neural network architecture is posed as the problem of sampling the vector of structural parameters. The optimality criterion is defined in terms of the accuracy, complexity and stability of the model. The sampling procedure itself consists of two steps: generating a new structure and rejecting this structure if it does not satisfy the optimality criterion. It is proposed to explore various methods of sampling. The formulation of the problem of choosing the optimal structure is described in [https://drive.google.com/file/d/1Wn-CEhDKvjyZMvZdBHWUobxpizVF1G8l/view?usp=sharing Potanin-1]
-
* '''Task:''' The task of choosing the optimal neural network architecture is set as the Task of sampling the vector of structural parameters. The optimality criterion is defined in terms of the accuracy, complexity and stability of the model. The sampling procedure itself consists of two steps: generating a new structure and rejecting this structure if it does not satisfy the optimality criterion. It is proposed to explore various methods of sampling. The formulation of the problem of choosing the optimal structure is described in [https://drive.google.com/file/d/1Wn-CEhDKvjyZMvZdBHWUobxpizVF1G8l/view?usp=sharing Potanin-1]
+
* '''Data:''' Two separate sets are offered as data. The first consists of a single element, the popular MNIST dataset. Its advantages: it is a strong and generally accepted baseline, it was used as a benchmark in the WANN article, and it is quite large (multi-class classification). The second is a collection of datasets for the regression problem, ranging in size from very small to quite large. Here is a link to the datasets and a notebook for downloading the data: [https://drive.google.com/file/d/19Cxtf3dg7gHFHyDXYAI0cEoT7PaNl4IR/view?usp=sharing data].
-
* '''Data:''' : Two separate sets are offered as data. The first one consists of one element, this is the popular MNIST dataset. Pros - is a strong and generally accepted baseline, was used as a benchmark for the WANN article, quite large (multi-class classification). The second set is a set of datasets for the regression task. Size varies from very small to quite large. Here is a link to the dataset and laptop to download the data [https://drive.google.com/file/d/19Cxtf3dg7gHFHyDXYAI0cEoT7PaNl4IR/view?usp=sharing data].
+
* '''References:'''
*# [https://drive.google.com/file/d/1Wn-CEhDKvjyZMvZdBHWUobxpizVF1G8l/view?usp=sharing Potanin - 1]
*# Potanin - 2. A second, unpublished work; the text is given to the interested student.
-
*# Strizhov Factory laboratory [http://strijov.com/papers/Strijov2012ErrorFn.pdf Error function]
+
*# Strijov Factory laboratory [http://strijov.com/papers/Strijov2012ErrorFn.pdf Error function]
*# [http://strijov.com/papers/HyperOptimizationEng.pdf Informatica]
*# [https://weightagnostic.github.io/ WANN]
Line 226: Line 864:
*# [https://arxiv.org/pdf/1912.01412.pdf Symbols]
*# [http://nn.cs.utexas.edu/downloads/papers/stanley.cec02.pdf NEAT]
-
* '''Basic algorithm:''' Closest [https://weightagnostic.github.io/ project], and its [https://github.com/google/brain-tokyo-workshop/tree/master/WANNRelease/WANN code]. Actual [https://drive.google.com/file/d/19Cxtf3dg7gHFHyDXYAI0cEoT7PaNl4IR/view?usp=sharing code] from consultant.
+
* '''Base algorithm:''' The closest [https://weightagnostic.github.io/ project] and its [https://github.com/google/brain-tokyo-workshop/tree/master/WANNRelease/WANN code]. Up-to-date [https://drive.google.com/file/d/19Cxtf3dg7gHFHyDXYAI0cEoT7PaNl4IR/view?usp=sharing code] from the consultant.
* '''Solution:''' A number of experiments have already been performed in which sampling is done by a genetic algorithm, and acceptable results have been obtained. It is proposed to analyze and improve them: namely, to separate two modules, generation and rejection, and to compare several types of sampling. The baseline is importance sampling; the desirable option is Metropolis-Hastings (or even Metropolis-Langevin) sampling. Since we treat the genetic algorithm as a process with jumps, this should be taken into account when designing the sampling procedure. A bonus of MH is that it has a Bayesian interpretation. The first level of Bayesian inference as applied to MH is described in [Informatica]. It is required either to rewrite it in terms of the distribution of structural parameters, or to describe both levels in general, moving the structural parameters to the second level (incidentally, approximately the same will appear in the Aduenko problem).
-
* '''Novelty:''' Neural networks excel at the tasks of computer vision, reinforcement learning, and natural language processing. One of the main goals of neural networks is to perform well tasks that are currently solved exclusively by humans, that is, natural human neural networks. Artificial neural networks still work very differently from natural neural networks. One of the main differences is that natural neural networks evolve over time, changing the strength of connections and their architecture. Artificial neural networks can adjust the strength of connections using weights, but cannot change their architecture. Therefore, the task of choosing the optimal structures of neural networks for specific tasks seems to be an important step in the development of the capabilities of neural network models.
+
* '''Novelty:''' Neural networks excel at problems in computer vision, reinforcement learning, and natural language processing. One of the main goals of neural networks is to perform well on problems that are currently solved exclusively by humans, that is, by natural human neural networks. Artificial neural networks still work very differently from natural ones. One of the main differences is that natural neural networks evolve over time, changing both the strength of their connections and their architecture. Artificial neural networks can adjust the strength of connections through weights, but cannot change their architecture. Therefore, the problem of choosing optimal neural network structures for specific problems seems to be an important step in developing the capabilities of neural network models.
-
* '''Authors:''' consultant Mark Potanin, Expert Strizhov V.V.
+
* '''Authors:''' consultant Mark Potanin, Expert Strijov V.V.
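The two-step procedure in the problem description (generate a structure, then reject it if it fails the criterion) can be illustrated with a minimal Metropolis-Hastings sketch over binary structure vectors. Everything below is an assumption made for illustration: `toy_log_score`, its hidden target pattern and its penalty weights stand in for the real accuracy/complexity/stability criterion, and the single-bit-flip proposal is only one possible generator. It is not the consultant's code.

```python
import math
import random


def toy_log_score(structure):
    """Hypothetical log-criterion: reward agreement with a hidden pattern, penalize size."""
    target = (1, 0, 1, 1, 0, 0, 1, 0)  # assumed "good" structure, for illustration only
    accuracy = sum(s == t for s, t in zip(structure, target)) / len(target)
    complexity = sum(structure) / len(structure)
    return 10.0 * accuracy - 2.0 * complexity


def propose(structure, rng):
    """Generation step: flip one randomly chosen structural parameter."""
    i = rng.randrange(len(structure))
    candidate = list(structure)
    candidate[i] = 1 - candidate[i]
    return tuple(candidate)


def metropolis_hastings(log_score, init, n_steps=2000, seed=0):
    """Sample structures; return the best one seen, its score, and the acceptance rate."""
    rng = random.Random(seed)
    current = tuple(init)
    current_score = log_score(current)
    best, best_score = current, current_score
    accepted = 0
    for _ in range(n_steps):
        candidate = propose(current, rng)
        candidate_score = log_score(candidate)
        # Rejection step: the proposal is symmetric, so the acceptance
        # probability is min(1, exp(score difference)).
        accept_prob = math.exp(min(0.0, candidate_score - current_score))
        if rng.random() < accept_prob:
            current, current_score = candidate, candidate_score
            accepted += 1
            if current_score > best_score:
                best, best_score = current, current_score
    return best, best_score, accepted / n_steps


best, best_score, rate = metropolis_hastings(toy_log_score, init=(0,) * 8)
print(best, round(best_score, 3), round(rate, 3))
```

Keeping `propose` and the acceptance test as separate pieces mirrors the suggested split into generation and rejection modules, so a different sampler (for example, importance sampling) could replace either one independently.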
-
===Task 82===
+
===Problem 82.2021===
-
* '''Name:''' Training with an Expert for a sample with many domains.
+
* '''Title:''' Training with an Expert for a sample with many domains.
-
* '''Task:''' The Task of approximating a multi-domain sample by a single multi-model - a mixture of Experts is considered. As data, it is supposed to use a sample that contains several domains. There is no domain label for each object. Each domain is approximated by a local model. The paper considers a two-stage Task optimization based on the EM algorithm.
+
* '''Problem description:''' The problem of approximating a multi-domain sample by a single multi-model, a mixture of experts, is considered. The data are a sample that contains several domains; no domain label is given for the objects. Each domain is approximated by a local model. The work considers a two-stage optimization problem based on the EM algorithm.
* '''Data:''' Samples of Amazon reviews for different types of goods are used as data. A linear model is to be used as the local model, and tf-idf vectors computed within each domain as the feature description of the reviews.
* '''References:'''
Line 240: Line 878:
*# [https://dl.acm.org/doi/pdf/10.1145/3400066 https://dl.acm.org/doi/pdf/10.1145/3400066]
* '''Basic algorithm and Solution:''' The basic solution is presented [https://www.aclweb.org/anthology/D18-1498.pdf here]. That work applies the mixture-of-experts method to the multi-source domain adaptation problem. The code for the article is available at this [https://github.com/jiangfeng1124/transfer link].
-
* '''Novelty:''' At the moment, in machine learning there are more and more tasks related to data that are taken from different sources. In this case, there are samples that consist of a large number of domains. At the moment, there is no complete theoretical justification for constructing mixtures of local models for approximating such types of samples.
+
* '''Novelty:''' Machine learning increasingly deals with problems where the data come from different sources, so that samples consist of a large number of domains. At present, there is no complete theoretical justification for constructing mixtures of local models to approximate such samples.
-
* '''Authors:''' Grabovoi A.V., Strizhov V.V.
+
* '''Authors:''' Grabovoi A.V., Strijov V.V.
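The two-stage EM optimization described above can be sketched on a toy sample. This is a simplified illustration under stated assumptions, not the method of the referenced work: the local models are one-dimensional linear experts `y = w * x`, the gate is reduced to constant mixing weights, the noise scale `sigma` is fixed, and `make_two_domain_sample` generates synthetic data rather than Amazon reviews.

```python
import math
import random


def make_two_domain_sample(n_per_domain=200, seed=1):
    """Toy unlabeled sample: two domains with slopes 2.0 and -1.0 (assumed values)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for slope in (2.0, -1.0):
        for _ in range(n_per_domain):
            x = rng.uniform(-3.0, 3.0)
            xs.append(x)
            ys.append(slope * x + rng.gauss(0.0, 0.1))
    return xs, ys


def em_mixture_of_linear_experts(xs, ys, n_experts=2, n_iter=50, sigma=0.5, seed=0):
    """EM for a mixture of experts y = w_k * x with constant mixing weights."""
    rng = random.Random(seed)
    slopes = [rng.uniform(-1.0, 1.0) for _ in range(n_experts)]
    priors = [1.0 / n_experts] * n_experts
    for _ in range(n_iter):
        # E-step: posterior probability that each object belongs to each expert.
        resp = []
        for x, y in zip(xs, ys):
            logits = [
                math.log(p) - (y - w * x) ** 2 / (2.0 * sigma ** 2)
                for p, w in zip(priors, slopes)
            ]
            top = max(logits)
            unnorm = [math.exp(v - top) for v in logits]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: weighted least squares for each expert, then update mixing weights.
        for k in range(n_experts):
            num = sum(r[k] * x * y for r, x, y in zip(resp, xs, ys))
            den = sum(r[k] * x * x for r, x in zip(resp, xs))
            if den > 0.0:
                slopes[k] = num / den
            priors[k] = max(sum(r[k] for r in resp) / len(xs), 1e-12)
    return slopes, priors


xs, ys = make_two_domain_sample()
slopes, priors = em_mixture_of_linear_experts(xs, ys)
print(sorted(round(w, 2) for w in slopes), [round(p, 2) for p in priors])
```

The domain label never enters the fit; the responsibilities computed in the E-step play its role, which is the point of the "no domain label" setting.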
-
=== Task 17 ===
+
===Problem 17.2021===
-
* '''Name:''' BCI: Selection of consistent models for building a neural interface
+
* '''Title:''' BCI: Selection of consistent models for building a neural interface
-
* '''Task''': When building brain-computer interface systems, simple, stable models are used. An important step in building an interface is such a model is an adequate choice of model. A wide range of models is considered: linear, simple neural networks, recurrent networks, transformers. The peculiarity of the problem is that when making a prediction, it is required to model not only the initial signal taken from the cerebral cortex, but also the target signal taken from the limbs. Thus, two models are required. In order for them to work together, a space of agreements is being built. It is proposed to explore the properties of this space and the properties of the resulting forecast (neural interface) on various pairs of models.
+
* '''Problem description:''' When building brain-computer interface systems, simple, stable models are used. An important step in building such an interface is an adequate choice of model. A wide range of models is considered: linear models, simple neural networks, recurrent networks, transformers. The peculiarity of the problem is that making a prediction requires modeling not only the source signal taken from the cerebral cortex, but also the target signal taken from the limbs. Thus, two models are required. For them to work together, a matching (agreement) space is built. It is proposed to explore the properties of this space and of the resulting forecast (neural interface) for various pairs of models.
* '''Data:''' ECoG/EEG brain signal data sets.
*# Need ECoG (dataset 25 contains EEG, EOG and hand movements) [http://bnci-horizon-2020.eu/database/data-sets http://bnci-horizon-2020.eu/database/data-sets]
*# neyrotycho — our old data.
-
* '''References:''':
+
* '''References:'''
-
*# Yaushev F.Yu., Isachenko R.V., Strizhov V.V. Latent space matching models in the forecasting problem // Systems and Means of Informatics, 2021, 31(1). [http://strijov.com/papers/Isachenko2020CanonicCorrelation.pdf PDF]
+
*# Yaushev F.Yu., Isachenko R.V., Strijov V.V. Latent space matching models in the forecasting problem // Systems and Means of Informatics, 2021, 31(1). [http://strijov.com/papers/Isachenko2020CanonicCorrelation.pdf PDF]
*# Isachenko R.V. Choice of a signal decoding model in high-dimensional spaces. Manuscript, 2021. [https://github.com/r-isachenko/PhDThesis/raw/master/doc/Isachenko2021PhDThesis.pdf PDF]
*# Isachenko R.V. Choice of a signal decoding model in high-dimensional spaces. Slides, 2020. [https://github.com/r-isachenko/PhDThesis/raw/master/pres/Isachenko2020PhDThesisPres.pdf]
-
*# Isachenko R.V., Vladimirova M.R., Strijov V.V. Dimensionality reduction for time series decoding and forecasting problems // DEStech Transactions on Computer Science and Engineering, 2018, 27349 : 286-296. [http://strijov.com/papers/IsachenkoVladimirova2018PLS.pdf PDF]
+
*# Isachenko R.V., Vladimirova M.R., Strijov V.V. Dimensionality reduction for time series decoding and forecasting problems // DEStech Transactions on Computer Science and Engineering, 2018, 27349 : 286-296. [http://strijov.com/papers/IsachenkoVladimirova2018PLS.pdf PDF]
-
*# Isachenko R.V., Strijov V.V. Quadratic Programming Optimization with Feature Selection for Non-linear Models // Lobachevskii Journal of Mathematics, 2018, 39(9) : 1179-1187. [https://rdcu.be/bfR32 PDF]
+
*# Isachenko R.V., Strijov V.V. Quadratic Programming Optimization with Feature Selection for Non-linear Models // Lobachevskii Journal of Mathematics, 2018, 39(9) : 1179-1187. [https://rdcu.be/bfR32 PDF]
-
*# Motrenko A.P., Strijov V.V. Multi-way feature selection for ECoG-based brain-computer interface // Expert Systems with Applications, 2018, 114(30) : 402-413. [http://strijov.com/papers/MotrenkoStrijov2017ECoG_HL_2.pdf PDF]
+
*# Motrenko A.P., Strijov V.V. Multi-way feature selection for ECoG-based brain-computer interface // Expert Systems with Applications, 2018, 114(30) : 402-413. [http://strijov.com/papers/MotrenkoStrijov2017ECoG_HL_2.pdf PDF]
*# Eliseyev A., Aksenova T. Stable and artifact-resistant decoding of 3D hand trajectories from ECoG signals using the generalized additive model //Journal of neural engineering. – 2014.
-
* '''Basic algorithm''': Described in the first work. The code is available. In that work, the data is two parts of an image. In our work, the signal of the brain and the movement of the hands. SuperTask: to finish the first job. Also the code and works [http://www.machinelearning.ru/wiki/index.php?title=BCI here].
+
* '''Basic algorithm:''' Described in the first work; the code is available. In that work, the data are two parts of an image; in ours, the brain signal and the hand movements. Stretch goal: complete the first work. See also the code and works [http://www.machinelearning.ru/wiki/index.php?title=BCI here].
-
* '''Solution:''' The case is considered when the initial data are heterogeneous: the spaces of the independent and target variables are of different nature. It is required to build a predictive model that would take into account the dependence in the source space of the independent variable, as well as in the space of the target variable. It is proposed to investigate the accuracy, complexity and stability of pairs of various models. Since the inverse Task is solved when building a forecast, it is required to build inverse transformations for each model. To do this, you can use both basic techniques (PLS) and streams.
+
* '''Solution:''' The case is considered in which the initial data are heterogeneous: the spaces of the independent and target variables are of different nature. It is required to build a predictive model that takes into account the dependence in the source space of the independent variable as well as in the space of the target variable. It is proposed to investigate the accuracy, complexity and stability of pairs of various models. Since an inverse problem is solved when building a forecast, inverse transformations must be built for each model. For this, both basic techniques (PLS) and flows can be used.
* '''Novelty:''' Analysis of the prediction and latent space obtained by a pair of heterogeneous models.
-
* '''Authors:''' Consultant Roman Isachenko, Expert Strizhov V.V.
+
* '''Authors:''' Consultant Roman Isachenko, Expert Strijov V.V.
-
===Task 69 ===
+
===Problem 69.2021===
-
* '''Name:''' Graph Neural Network in Reaction Yield prediction
+
* '''Title:''' Graph Neural Network in Reaction Yield prediction
-
* '''Task:''' There are disconnected graphs of source molecules and products in a chemical reaction. The yield of the main product in the reaction is known. It is required to design an algorithm that predicts yield by solving the regression task on given disconnected graphs.
+
* '''Problem description:''' There are disconnected graphs of source molecules and products in a chemical reaction. The yield of the main product in the reaction is known. It is required to design an algorithm that predicts the yield by solving the regression problem on the given disconnected graphs.
* '''Data:''' Database of reactions from US patents [https://www.repository.cam.ac.uk/handle/1810/244727]
-
* '''References:''':
+
* '''References:'''
-
** [https://www.ncbi.nlm.nih.gov/pubmed/30046072] A general overview.
+
*# [https://www.ncbi.nlm.nih.gov/pubmed/30046072] A general overview.
-
** [https://pure.uva.nl/ws/files/33146507/1703.06103.pdf] Relational Graph Convolution Neural Network
+
*# [https://pure.uva.nl/ws/files/33146507/1703.06103.pdf] Relational Graph Convolution Neural Network
-
** [https://papers.nips.cc/paper/7181-attention-is-all-you-need] Transformer architecture
+
*# [https://papers.nips.cc/paper/7181-attention-is-all-you-need] Transformer architecture
-
** [http://www.machinelearning.ru/wiki/images/6/6c/NikitinMMPR201927.pdf] Graph neural network learning for chemical compounds synthesis
+
*# [http://www.machinelearning.ru/wiki/images/6/6c/NikitinMMPR201927.pdf] Graph neural network learning for chemical compounds synthesis
-
**
+
* '''Base algorithm:''' Transformer model. The input sequence is a SMILES representation of the source and product molecules.
-
* '''Basic algorithm:''' Transformer model. The input sequence is a SMILES representation of the source and product molecules.
+
* '''Solution:''' A pipeline for working with disconnected graphs is proposed. The pipeline includes the construction of an extended graph with molecule and reaction representations, a Relational Graph Convolution Neural Network, and a Transformer encoder. The method is applied to yield prediction.
-
*'''Solution:''' A pipeline for working with disconnected graphs is proposed. The pipeline includes the construction of extended graph with molecule and reaction representation, Relational Graph Convolution Neural Network, Encoder of Transformer. The method is applied to solve yield predictions.
+
* '''Novelty:''' A solution to the regression problem on given disconnected graphs is constructed; the approach demonstrates better performance than other solutions.
-
*'''Novelty:''' A solution for regression problem on the given disconnected graph is constructed; the approach demonstrates better performance compared with other solutions
+
* '''Authors:''' Nikitin Filipp, Isayev Olexandr, Strijov V.V.
-
*'''Authors:''': Nikitin Filipp, Isayev Olexandr, Strizhov V.V.
+
-
===Task 84===
+
===Problem 84.2021===
-
* '''Name:''' Trajectory Regularization of Deep Learning Model Parameters Optimization Based on Knowledge Distillation
+
* '''Title:''' Trajectory Regularization of Deep Learning Model Parameters Optimization Based on Knowledge Distillation
-
* '''Task:''' The problem of optimizing the parameters of a deep learning model is considered. The case is considered when the responses of a more complex model (teacher model) are available during optimization. The classical approach to solving such a problem is learning based on the responses of a complex model (knowledge distillation). Assignment of hyperparameters is made empirically based on the results of the model on delayed sampling. In this paper, we propose to consider a modification of the approach to knowledge distillation, in which the coefficient of significance of the distilling term, as well as its gradients, act as hyperparameters. Both of these groups of parameters allow you to adjust the optimization of the model parameters. To optimize hyperparameters, it is proposed to consider the optimization problem as a two-level optimization problem, where at the first level of optimization the Task of optimizing the model parameters is solved, and at the second level the Task of optimizing hyperparameters is approximately solved by the value of the loss function on the delayed sample.
+
* '''Problem description:''' The problem of optimizing the parameters of a deep learning model is considered, in the case where the responses of a more complex model (the teacher model) are available during optimization. The classical approach to such a problem is learning from the responses of the complex model (knowledge distillation). Hyperparameters are assigned empirically, based on the model's results on a held-out sample. We propose to consider a modification of knowledge distillation in which the weighting coefficient of the distillation term, as well as its gradients, act as hyperparameters. Both groups of parameters make it possible to adjust the optimization of the model parameters. To optimize the hyperparameters, the optimization is posed as a two-level problem: at the first level the model parameters are optimized, and at the second level the hyperparameters are approximately optimized with respect to the value of the loss function on the held-out sample.
* '''Data:''' The CIFAR-10 image dataset.
* '''References:'''
-
*#[https://arxiv.org/abs/1503.02531 Distillation of knowledge]
+
*# [https://arxiv.org/abs/1503.02531 Distillation of knowledge]
-
*#[https://arxiv.org/abs/1511.06727 Hyperparameter Optimization in a Bilevel Problem: Greedy Method]
+
*# [https://arxiv.org/abs/1511.06727 Hyperparameter Optimization in a Bilevel Problem: Greedy Method]
-
*#[http://strijov.com/papers/Bakhteev2017Hypergrad.pdf Hyperparameter Optimization in a Bilevel Problem: Comparison of Approaches]
+
*# [http://strijov.com/papers/Bakhteev2017Hypergrad.pdf Hyperparameter Optimization in a Bilevel Problem: Comparison of Approaches]
-
*#[https://arxiv.org/abs/1606.04474 Meta Optimization: neural network instead of optimization operator]
+
*# [https://arxiv.org/abs/1606.04474 Meta Optimization: neural network instead of optimization operator]
* '''Basic algorithm:''' Model optimization without distillation and with the standard distillation approach.
* '''Solution:''' Model optimization is posed as a two-level problem. The combination of the gradients of the two terms is processed by a separate model (LSTM).
* '''Novelty:''' A new approach to model distillation will be proposed to significantly improve the performance of models trained in privileged information mode. It is also planned to study the dynamics of changes in hyperparameters in the optimization process.
-
* '''Authors:''' Oleg Bakhteev, Strizhov V.V.
+
* '''Authors:''' Oleg Bakhteev, Strijov V.V.
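The role of the weighting coefficient of the distillation term can be seen in a minimal sketch of the standard distillation loss for one object. The names `beta` and `temperature` and the specific form (cross-entropy on the true label plus cross-entropy against the softened teacher distribution) are assumptions in the spirit of the first reference; the bilevel hyperparameter optimization and the LSTM that combines gradients are not shown.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label, beta=0.5, temperature=2.0):
    """Weighted sum of the hard-label term and the distillation (soft-label) term.

    beta is the weighting coefficient of the distillation term; in the proposed
    setting it would be treated as a hyperparameter tuned on the held-out sample.
    """
    # Hard term: cross-entropy of the student against the true label.
    hard = -math.log(softmax(student_logits)[label])
    # Soft term: cross-entropy of the softened student against the softened teacher.
    teacher_soft = softmax(teacher_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    soft = -sum(q * math.log(p) for q, p in zip(teacher_soft, student_soft))
    return (1.0 - beta) * hard + beta * soft


agree = distillation_loss([2.0, 0.0], [2.0, 0.0], label=0, beta=1.0)
disagree = distillation_loss([2.0, 0.0], [0.0, 2.0], label=0, beta=1.0)
print(round(agree, 3), round(disagree, 3))
```

With `beta=0.0` the loss reduces to ordinary training without distillation, which corresponds to the first of the two baselines named in the basic algorithm.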
-
===Task 85===
+
===Problem 85.2021===
-
* '''Name:''' A differentiable search algorithm for model architecture with control over its complexity
+
* '''Title:''' A differentiable search algorithm for model architecture with control over its complexity
-
* '''Task:''' The problem of choosing the structure of a deep learning model with a predetermined complexity is considered. It is required to propose a method for searching for a model that allows controlling its complexity with low computational costs.
+
* '''Problem description:''' The problem of choosing the structure of a deep learning model with a predetermined complexity is considered. It is required to propose a method for searching for a model that allows controlling its complexity with low computational costs.
* '''Data:''' MNIST, CIFAR
* '''References:'''
-
*# Grebenkova O.S., Oleg Bakhteev, Strizhov V.V.Variational optimization of a deep learning model with complexity control // Informatics and its applications, 2021, 15(2). [http://strijov.com/papers/Grebenkova2020HyperNet.pdf PDF]
+
*# Grebenkova O.S., Oleg Bakhteev, Strijov V.V. Variational optimization of a deep learning model with complexity control // Informatics and its applications, 2021, 15(2). [http://strijov.com/papers/Grebenkova2020HyperNet.pdf PDF]
-
*#[https://arxiv.org/abs/1806.09055 DARTS]
+
*# [https://arxiv.org/abs/1806.09055 DARTS]
-
*#[https://arxiv.org/abs/1609.09106 hypernets]
+
*# [https://arxiv.org/abs/1609.09106 hypernets]
* '''Basic algorithm:''' DARTS
* '''Solution:''' The proposed method is to use a differentiable neural network architecture search algorithm (DARTS) with parameter complexity control using a hypernet.
Line 304: Line 941:
* '''Authors:''' Oleg Bakhteev, Grebenkova O. S.
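The differentiable search in DARTS rests on a continuous relaxation: each choice among candidate operations is replaced by a softmax-weighted sum over all of them. A minimal sketch of that idea follows, assuming three hypothetical scalar operations in `OPERATIONS`; the hypernet-based complexity control proposed in the solution is not shown.

```python
import math


def softmax(values):
    """Numerically stable softmax over structural parameters."""
    top = max(values)
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical candidate operations on a scalar input, for illustration only.
OPERATIONS = {
    "identity": lambda x: x,
    "zero": lambda x: 0.0,
    "double": lambda x: 2.0 * x,
}


def mixed_operation(x, alphas):
    """Continuous relaxation: softmax-weighted sum of all candidate operations."""
    weights = softmax(alphas)
    return sum(w * op(x) for w, op in zip(weights, OPERATIONS.values()))


def discretize(alphas):
    """After the search, keep the operation with the largest structural parameter."""
    names = list(OPERATIONS)
    return names[max(range(len(alphas)), key=lambda i: alphas[i])]


print(mixed_operation(3.0, [0.0, 0.0, 0.0]), discretize([0.1, 0.5, -0.2]))
```

Because `mixed_operation` is smooth in `alphas`, the structural parameters can be optimized by gradient descent together with the ordinary weights, which is what keeps the computational cost of the search low.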
-
===Task 86 ===
+
===Problem 86.2021===
-
* '''Name:''' Learning co-evolution information with natural language processing for protein folding problem
+
* '''Title:''' Learning co-evolution information with natural language processing for protein folding problem
-
* '''Task''': One of the most essential problems in structural bioinformatics is protein fold recognition since the relationship between the protein amino acid sequence and its tertiary structure is revealed by protein folding. A specific protein fold describes the distinctive arrangement of secondary structure elements in the nearly-infinite conformation space, which denotes the structural characteristics of a protein molecule.
+
* '''Problem:''' One of the most essential problems in structural bioinformatics is protein fold recognition since the relationship between the protein amino acid sequence and its tertiary structure is revealed by protein folding. A specific protein fold describes the distinctive arrangement of secondary structure elements in the nearly-infinite conformation space, which denotes the structural characteristics of a protein molecule.
* '''Problem description:''' on request
* '''Authors:''' Sergei Grudinin, Maria Kadukova.
-
===Task 87 ===
+
===Problem 87.2021===
-
* '''Name:''' Bayesian choice of structures of generalized linear models
+
* '''Title:''' Bayesian choice of structures of generalized linear models
-
* '''Task:''' The work is devoted to testing methods for feature selection. It is assumed that the sample under study contains a significant number of multicollinear features. Multicollinearity is a strong correlation between the features selected for analysis that jointly affect the target vector, which makes it difficult to estimate regression parameters and identify the relationship between features and the target vector. There is a set of time series containing the readings of various sensors that reflect the state of the device. The readings of the sensors correlate with each other. It is necessary to choose the optimal set of features for solving the forecasting problem.
+
* '''Problem description:''' The work is devoted to testing methods for feature selection. It is assumed that the sample under study contains a significant number of multicollinear features. Multicollinearity is a strong correlation between the features selected for analysis that jointly affect the target vector, which makes it difficult to estimate regression parameters and identify the relationship between features and the target vector. There is a set of time series containing the readings of various sensors that reflect the state of the device. The readings of the sensors correlate with each other. It is necessary to choose the optimal set of features for solving the forecasting problem.
* '''Novelty:''' One of the most preferred feature selection algorithms has been published. It uses structural parameters, but has no theoretical justification. It is proposed to build a theory by describing and analyzing various prior distributions of the structural parameters. Works on neural network structure search likewise lack a clear theory and a list of prior assumptions.
* '''Data:''' Multivariate time series with readings from various sensors from paper 4, for starters, all samples from paper 1.
* '''References:''' Keywords: bootstrap aggregation, Belsley method, vector autoregression.
-
*# Katrutsa A.M., Strijov V.V. Comprehensive study of feature selection methods to solve multicollinearity problem according to evaluation criteria // Expert Systems with Applications, 2017, 76 : 1-11. [http://strijov.com/papers/Katrutsa2016QPFeatureSelection.pdf PDF]
+
*# Katrutsa A.M., Strijov V.V. Comprehensive study of feature selection methods to solve multicollinearity problem according to evaluation criteria // Expert Systems with Applications, 2017, 76 : 1-11. [http://strijov.com/papers/Katrutsa2016QPFeatureSelection.pdf PDF]
-
*# Katrutsa A.M., Strijov V.V. Stresstest procedure for feature selection algorithms // Chemometrics and Intelligent Laboratory Systems, 2015, 142 : 172-183.  [http://strijov.com/papers/Katrutsa2014TestGenerationEn.pdf PDF]
+
*# Katrutsa A.M., Strijov V.V. Stresstest procedure for feature selection algorithms // Chemometrics and Intelligent Laboratory Systems, 2015, 142 : 172-183. [http://strijov.com/papers/Katrutsa2014TestGenerationEn.pdf PDF]
-
*# Strizhov V.V. Error function in regression recovery problems // Factory laboratory. material diagnostics, 2013, 79(5) : 65-73. [http://strijov.com/papers/Strijov2012ErrorFn.pdf PDF]
+
*# Strijov V.V. Error function in regression recovery problems // Factory laboratory. material diagnostics, 2013, 79(5) : 65-73. [http://strijov.com/papers/Strijov2012ErrorFn.pdf PDF]
-
*# Зайцев А.А., Strizhov V.V., Tokmakova A.A. Estimation of hyperparameters of regression models by the maximum likelihood method // Information technologies, 2013, 2 : 11-15. [http://strijov.com/papers/ZaytsevStrijovTokmakova2012Likelihood_Preprint.pdf PDF]
+
*# Zaitsev A.A., Strijov V.V., Tokmakova A.A. Estimation of hyperparameters of regression models by the maximum likelihood method // Information technologies, 2013, 2 : 11-15. [http://strijov.com/papers/ZaytsevStrijovTokmakova2012Likelihood_Preprint.pdf PDF]
-
*# Kuznetsov M.P., Tokmakova A.A., Strijov V.V. Analytic and stochastic methods of structure parameter estimation // Informatica, 2016, 27(3) : 607-624. [http://strijov.com/papers/HyperOptimizationEng.pdf PDF]
+
*# Kuznetsov M.P., Tokmakova A.A., Strijov V.V. Analytic and stochastic methods of structure parameter estimation // Informatica, 2016, 27(3) : 607-624. [http://strijov.com/papers/HyperOptimizationEng.pdf PDF]
-
*# Катруца А.М., Strizhov V.V. The problem of multicollinearity in the selection of features in regression problems // Information technologies, 2015, : 8-18.  [http://strijov.com/papers/Katrutsa2014TestGeneration.pdf PDF]
+
*# Katrutsa A.M., Strijov V.V. The problem of multicollinearity in the selection of features in regression problems // Information technologies, 2015, 1 : 8-18. [http://strijov.com/papers/Katrutsa2014TestGeneration.pdf PDF]
-
*# Нейчев Р.Г., Катруца А.М., Strizhov V.V. Selection of the optimal set of features from a multicorrelated set in the forecasting problem. Zavodskaya Lab. material diagnostics, 2016, 82(3) : 68-74. [http://strijov.com/papers/Neychev2015FeatureSelection.pdf PDF]
+
*# Neichev R.G., Katrutsa A.M., Strijov V.V. Selection of the optimal set of features from a multicorrelated set in the forecasting problem. Zavodskaya Lab. material diagnostics, 2016, 82(3) : 68-74. [http://strijov.com/papers/Neychev2015FeatureSelection.pdf PDF]
-
* '''Basic algorithm:''' Described in Reference 1: Quadratic Programming for QPFS Feature Selection. Code from Roman Isachenko.
+
* '''Base algorithm:''' Described in Reference 1: Quadratic Programming for QPFS Feature Selection. Code from Roman Isachenko.
* '''Solution:''' It is proposed to consider the structural parameters used in QPFS at the second level of Bayesian inference. Introduce informative a priori distributions of parameters and structural parameters. Compare different a priori assumptions.
* '''Novelty:''' Statistical Analysis of Structural Parameter Space and Visualization
-
* '''Authors:''' Alexander Aduenko — consultant, Strizhov V.V.
+
* '''Authors:''' Alexander Aduenko (consultant), Strijov V.V.
-
===Task 88===
+
===Problem 88.2021===
*'''Name:''' Search for the boundaries of the iris by the method of circular projections
-
*'''Task:''' Given a monochrome bitmap of the eye, [[Media:Matveev2021project.pdf|см. examples]]. The approximate position of the center of the pupil is also known. The word "approximate" means that the calculated center of the pupil is no more than half of its true radius from the true one. It is necessary to determine the approximate positions of the circles approximating the pupil and iris. The algorithm must be very fast.
+
*'''Problem:''' Given a monochrome bitmap of the eye (see [http://www.machinelearning.ru/wiki/images/1/16/Matveev2021project.pdf examples]). The approximate position of the center of the pupil is also known. "Approximate" means that the calculated center of the pupil lies within half of the true pupil radius from the true center. It is necessary to determine the approximate positions of the circles approximating the pupil and the iris. The algorithm must be very fast.
*'''Data:''' About 200 thousand eye images. For each, the position of the true circles is marked - for the purpose of training and testing the method being created.
-
*'''Basic algorithm:''' To speed up work with the image, it is proposed to aggregate data using circular projections of brightness. Circular projection is a function that depends on the radius, the value of which P(r) is equal to the integral of the directed image brightness gradient over a circle of radius r (or along an arc of a circle). Example for one arc (right quadrant) and for four arcs. Having built some circular projections, based on them, you can try to determine the position of the inner and outer borders of the iris (ring) using heuristics and / or a neural network. It is interesting to evaluate the capabilities of the neural network in this task.
+
*'''Basic algorithm:''' To speed up work with the image, it is proposed to aggregate data using circular projections of brightness. A circular projection is a function of the radius: its value P(r) equals the integral of the directed image brightness gradient over a circle of radius r (or along an arc of that circle). Example for one arc (right quadrant) and for four arcs. Having built several circular projections, one can try to determine the positions of the inner and outer borders of the iris (a ring) from them using heuristics and/or a neural network. It is interesting to evaluate the capabilities of a neural network in this problem.
*'''References:''' Matveev I.A. Detection of Iris in Image By Interrelated Maxima of Brightness Gradient Projections // Applied and Computational Mathematics. 2010. V.9. N.2. P.252-257 [https://www.researchgate.net/publication/228396639_Detection_of_iris_in_image_by_interrelated_maxima_of_brightness_gradient_projections PDF]
*'''Author:''' Matveev I.A.
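The circular projection P(r) described in the basic algorithm can be sketched as follows. This is only an illustration under stated assumptions, not the method from the cited paper: the function name <code>circular_projection</code>, the nearest-pixel sampling, and the use of the mean (rather than the integral) of the radial gradient component along the arc are all choices made here for brevity.

```python
import numpy as np

def circular_projection(image, center, radii, arc=(0.0, 2.0 * np.pi), n_samples=180):
    """Return P(r) for each r in radii: the mean radial component of the
    brightness gradient sampled along an arc of radius r around center.

    center is (x, y); arc is (start_angle, end_angle) in radians.
    All names and defaults here are illustrative assumptions.
    """
    img = np.asarray(image, dtype=float)
    grad_y, grad_x = np.gradient(img)  # derivatives along rows (y) and columns (x)
    cx, cy = center
    angles = np.linspace(arc[0], arc[1], n_samples, endpoint=False)
    cos_a, sin_a = np.cos(angles), np.sin(angles)
    projection = np.zeros(len(radii))
    for k, r in enumerate(radii):
        # Nearest-pixel coordinates of the sample points on the arc.
        xs = np.rint(cx + r * cos_a).astype(int)
        ys = np.rint(cy + r * sin_a).astype(int)
        inside = (xs >= 0) & (xs < img.shape[1]) & (ys >= 0) & (ys < img.shape[0])
        if not inside.any():
            continue
        # Gradient projected onto the outward radial direction.
        radial = (grad_x[ys[inside], xs[inside]] * cos_a[inside]
                  + grad_y[ys[inside], xs[inside]] * sin_a[inside])
        projection[k] = radial.mean()
    return projection
```

A dark disk on a bright background gives a maximum of P(r) near the disk radius; restricting <code>arc</code> to, for example, <code>(-np.pi / 4, np.pi / 4)</code> gives the projection for the right quadrant only.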
-
===Task 53 ===
+
===Problem 53.2021===
-
* '''Name:''' Solution of an optimization problem combining classification and regression to estimate the binding energy of a protein and small molecules.
+
* '''Title:''' Solution of an optimization problem combining classification and regression to estimate the binding energy of a protein and small molecules.
-
* '''Task:''' The goal of the problem is to solve an optimization problem with classification and regression loss functions applied to biological data.
+
* '''Problem description:''' The goal of the problem is to solve an optimization problem with classification and regression loss functions applied to biological data.
* '''Data:''' Approximately 12,000 complexes of proteins with small molecules. For classification, each complex has 1 correct position in space and 18 generated incorrect ones; for regression, each complex has a binding constant value (proportional to energy). The main descriptors are histograms of distributions of distances between different atoms.
-
* '''References:''':
+
* '''References:'''
-
** https://www.overleaf.com/read/rjdnyyxpdkyj Task details
+
*# https://www.overleaf.com/read/rjdnyyxpdkyj Problem details
-
** http://cs229.stanford.edu/notes/cs229-notes3.pdf SVM
+
*# http://cs229.stanford.edu/notes/cs229-notes3.pdf SVM
-
** http://scikit-learn.org/stable/modules/linear_model.html#ridge-regression Ridge Regression
+
*# http://scikit-learn.org/stable/modules/linear_model.html#ridge-regression Ridge Regression
-
** https://alex.smola.org/papers/2003/SmoSch03b.pdf SVR
+
*# https://alex.smola.org/papers/2003/SmoSch03b.pdf SVR
-
* '''Basic algorithm:''' In the classification task, we used an algorithm similar to linear SVM, whose relationship with the energy estimate, which is outside the scope of the classification task, is described in the article https://hal.inria.fr/hal-01591154/. For MSE, there is already a formulated dual Task as a regression loss function, with the implementation of which we can start.
+
* '''Base algorithm:''' In the classification problem, we used an algorithm similar to linear SVM; its relationship with the energy estimate, which is outside the scope of the classification problem, is described in the article https://hal.inria.fr/hal-01591154/. For MSE as the regression loss function, the dual problem has already been formulated, and its implementation is a good starting point.
* '''Solution:''' The first step is to solve the problem with MSE in the loss function using any convenient solver. The main difficulty may be the high dimensionality of the data, although the data are sparse. Later the problem statement can be modified.
-
* '''Novelty:''' Many models used to predict the interactions of proteins with ligands are "retrained" for some task. For example, models that are good at predicting binding energies may be poor at selecting a protein-binding molecule from a variety of non-binding ones, and models that are good at determining the correct geometry of the complex may be poor at predicting energies. In this problem, we propose to consider a new approach to combat such overfitting, since the combination of classification and regression loss functions seems to us to be a very natural regularization.
+
* '''Novelty:''' Many models used to predict the interactions of proteins with ligands are overfitted to one particular problem. For example, models that are good at predicting binding energies may be poor at selecting a protein-binding molecule from a variety of non-binding ones, and models that are good at determining the correct geometry of the complex may be poor at predicting energies. In this problem, we propose to consider a new approach to combat such overfitting, since the combination of classification and regression loss functions seems to us to be a very natural regularization.
* '''Authors:''' Sergei Grudinin, Maria Kadukova.
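The idea of combining a classification loss and a regression loss over shared weights can be sketched as below. This is a hedged illustration, not the formulation from the problem details or the dual problem mentioned above: the primal objective (mean hinge loss plus <code>alpha</code> times MSE plus an L2 penalty), the labels in {-1, +1}, the names <code>joint_loss_and_grad</code> and <code>fit</code>, and the plain subgradient descent are all assumptions made here.

```python
import numpy as np

def joint_loss_and_grad(w, X_cls, y_cls, X_reg, y_reg, alpha=1.0, lam=1e-2):
    """Objective: mean hinge loss + alpha * MSE + (lam / 2) * ||w||^2.

    X_cls, y_cls: descriptors and labels (+1 / -1) for the classification part.
    X_reg, y_reg: descriptors and targets for the regression part.
    Returns the loss value and a subgradient with respect to w.
    """
    margins = 1.0 - y_cls * (X_cls @ w)
    active = margins > 0                      # samples violating the margin
    hinge = np.where(active, margins, 0.0).mean()
    residual = X_reg @ w - y_reg
    mse = np.mean(residual ** 2)
    loss = hinge + alpha * mse + 0.5 * lam * (w @ w)
    grad = (-(X_cls[active] * y_cls[active, None]).sum(axis=0) / len(y_cls)
            + alpha * 2.0 * (X_reg.T @ residual) / len(y_reg)
            + lam * w)
    return loss, grad

def fit(X_cls, y_cls, X_reg, y_reg, steps=500, lr=0.05, **kwargs):
    """Plain subgradient descent from zero weights (illustrative only)."""
    w = np.zeros(X_cls.shape[1])
    for _ in range(steps):
        _, grad = joint_loss_and_grad(w, X_cls, y_cls, X_reg, y_reg, **kwargs)
        w -= lr * grad
    return w
```

For the sparse, high-dimensional descriptors mentioned in the data description, the dense arrays here would normally be replaced by sparse matrices and a dedicated solver; the sketch only shows how the two loss terms share one weight vector.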
-
=== Task 75 ===
+
===Problem 75.2021===
-
* '''Name:''' Alignment of image elements using metric models.
+
* '''Title:''' Alignment of image elements using metric models.
-
* '''Task:''' Character set specified. Each symbol is represented by one file - an image. Image pixel size may vary. All images are known to belong to the same class, such as faces, letters, flowers, or cars. (A more complicated option is to one class, which we are studying and noise classes.) It is known that each image can be combined with another with the help of an equalizing transformation up to noise, or up to some average image. (This image may or may not be present in the sample). This leveling transformation is specified in the base case by a neural network, and in the proposed case - by a parametric transformation from some given class (the first is a special case of the second). The aligned image is compared with the original one using the distance function. If the distance between two images is statistically significant, it is concluded that the images belong to the same class. It is required to 1) propose an adequate model of the alignment transformation that takes into account the assumptions about the nature of the image (for example, only rotation and proportional scaling), 2) propose a distance function, 3) propose a method for finding the average image.
+
* '''Problem description:''' A set of symbols is given. Each symbol is represented by one file, an image. The image size in pixels may vary. All images are known to belong to the same class, such as faces, letters, flowers, or cars. (A more complicated variant: one class under study plus noise classes.) It is known that each image can be matched with another one, up to noise, or with some average image by means of an alignment transformation. (The average image may or may not be present in the sample.) This alignment transformation is specified in the base case by a neural network, and in the proposed case by a parametric transformation from some given class (the first is a special case of the second). The aligned image is compared with the original one using a distance function. If the distance between two images is statistically significant, it is concluded that the images belong to the same class. It is required to 1) propose an adequate model of the alignment transformation that takes into account the assumptions about the nature of the image (for example, only rotation and proportional scaling), 2) propose a distance function, 3) propose a method for finding the average image.
* '''Data:''' Synthetic and real 1) pictures - faces and symbols with rotation and stretch transformation, 2) faces and cars with 3D rotation transformation with 2D projection. Synthetic images are proposed to be created manually using 1) photographs of a sheet of paper, 2) photographs of the surface of the drawing on a balloon.
* '''References:'''
Line 359: Line 996:
*# DTW alignment work in 2D,
*# parametric alignment work.
-
* '''Basic algorithm:''' from work 1.
+
* '''Base algorithm:''' from work 1.
* '''Solution:''' See the attached PDF file.
* '''Novelty:''' Instead of multidimensional image alignment, parametric alignment is proposed.
-
* '''Authors:''' Alexey Goncharov, Strizhov V.V.
+
* '''Authors:''' Alexey Goncharov, Strijov V.V.
-
===Task 80===
+
===Problem 80.2021===
-
* '''Name:''' Detection of correlations between activity in social networks and capitalization of companies
+
* '''Title:''' Detection of correlations between activity in social networks and capitalization of companies
-
* '''Task:''' At present, the significant impact on stock quotes, company capitalization and the success or failure of an IPO depends on social factors such as public opinion expressed on social media. A recent notable example is the change in GameStore quotes caused by the surge in activity on Reddit. Our task at the first stage is to identify quotes between the shares of companies in different segments and activity in social networks. That is, it is necessary to identify correlations between significant changes in the company's capitalization and previous bursts (positive or negative) of its discussion in social networks. That is, it is necessary to find the minimum of the loss function when restoring the dependence in various classes of models (parametrics, neural networks, etc.). This Task is part of a large project to analyze the analysis of markets and the impact of social factors on risks (within a team of 5-7 professors), which will lead to a series of publications sufficient to defend a dissertation.
+
* '''Problem description:''' At present, social factors such as public opinion expressed on social media have a significant impact on stock quotes, company capitalization, and the success or failure of an IPO. A recent notable example is the change in GameStop quotes caused by the surge in activity on Reddit. The problem at the first stage is to identify relationships between the shares of companies in different segments and activity in social networks. That is, it is necessary to identify correlations between significant changes in a company's capitalization and preceding bursts (positive or negative) of its discussion in social networks. Formally, it is necessary to find the minimum of the loss function when restoring the dependence in various classes of models (parametric models, neural networks, etc.). This problem is part of a large project on the analysis of markets and the impact of social factors on risks (within a team of 5-7 professors), which will lead to a series of publications sufficient to defend a dissertation.
-
* '''Data:''' Task has a significant engineering context, the data is downloads from quotes on the Moscow Exchange, as well as NYT and reddit data (crawling and parsing is done by standard tools). The student working on this task must have strong engineering skills and a desire to engage in both the practice of machine learning and the engineering parts of the task.
+
* '''Data:''' The problem has a significant engineering component. The data are downloads of quotes from the Moscow Exchange, as well as NYT and Reddit data (crawling and parsing are done with standard tools). The student working on this problem must have strong engineering skills and a desire to engage in both the practice of machine learning and the engineering parts of the problem.
* '''References:'''
-
*# Paul S. Adler and Seok-Woo Kwon. Social Capital: Prospects for a new Concept. [https://journals.aom.org/doi/abs/10.5465/AMR.2002.5922314 LINK]   
+
*# Paul S. Adler and Seok-Woo Kwon. Social Capital: Prospects for a new Concept. [https://journals.aom.org/doi/abs/10.5465/AMR.2002.5922314 LINK]
-
*# Kim and Hastak. Social network analysis: Characteristics of online social networks after a disaster [https://www.sciencedirect.com/science/article/pii/S026840121730525X?casa_token=JzqhHlll56IAAAAA:fQmNqxyErD4-VCCCFdJRA1WX0o4zdifj_zbm-vgwXDcmt26OBbAdu9gvgob0ntnlnCt_Y_ITD_g LINK]
+
*# Kim and Hastak. Social network analysis: Characteristics of online social networks after a disaster [https://www.sciencedirect.com/science/article/pii/S026840121730525X?casa_token=JzqhHlll56IAAAAA:fQmNqxyErD4-VCCCFdJRA1WX0o4zdifj_zbm-vgwXDcmt26OBbAdu9gvgob0ntnlnCt_Y_ITD_g LINK]
-
*# Baumgartner, Jason, et al. "The pushshift reddit dataset." Proceedings of the International AAAI Conference on Web and Social Media. Vol. 14. 2020. [https://ojs.aaai.org/index.php/ICWSM/article/download/7347/7201/ LINK]
+
*# Baumgartner, Jason, et al. "The pushshift reddit dataset." Proceedings of the International AAAI Conference on Web and Social Media. Vol. 14. 2020. [https://ojs.aaai.org/index.php/ICWSM/article/download/7347/7201/ LINK]
-
* '''Basic algorithm:''' The basic algorithms are LSTM and Graph neural networks.
+
* '''Base algorithm:''' The basic algorithms are LSTM and Graph neural networks.
* '''Solution:''' Start with LSTM, then try some of its standard extensions.
* '''Novelty:''' In this area there are many solutions based on economic models, but their accuracy is not always high. The use of modern ML/DL models is expected to significantly improve the quality of the solution.
* '''Authors:''' Expert Yuri Maksimov, consultant Yuri Maksimov, student.
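Before training an LSTM, the first-stage question (whether bursts of discussion precede changes in capitalization) can be probed with a simple lagged correlation. The sketch below is a baseline chosen here for illustration, not the base algorithm of the problem; the function name <code>lagged_correlation</code> and the assumption of two equally sampled series of the same length are hypothetical.

```python
import numpy as np

def lagged_correlation(activity, returns, max_lag):
    """Pearson correlation between activity[t] and returns[t + lag]
    for lag = 0..max_lag. Both series must have the same length and
    the same sampling step (an assumption of this sketch).
    """
    activity = np.asarray(activity, dtype=float)
    returns = np.asarray(returns, dtype=float)
    result = {}
    for lag in range(max_lag + 1):
        a = activity[: len(activity) - lag]   # earlier social activity
        r = returns[lag:]                     # later capitalization change
        result[lag] = float(np.corrcoef(a, r)[0, 1])
    return result
```

A pronounced maximum of the absolute correlation at some positive lag suggests that activity leads the price change by that many time steps, which gives a reference point for the neural models.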
-
===Task 88b ===
+
===Problem 88b.2021===
*'''Name:''' Finding a Pupil in an Eye Image Using the Luminance Projection Method
-
*'''Task:''' Given a monochrome bitmap of the eye, [[Media:Matveev2021project.pdf|examples]]. It is necessary to determine the approximate coordinates of the center of the pupil. The word "approximate" means that the calculated pupil center must lie inside a circle centered at the pupil's true center and half the true radius. The algorithm must be very fast.
+
*'''Problem:''' Given a monochrome bitmap of the eye (see [[Media:Matveev2021project.pdf|examples]]). It is necessary to determine the approximate coordinates of the center of the pupil. "Approximate" means that the calculated pupil center must lie inside a circle centered at the pupil's true center with a radius equal to half the true pupil radius. The algorithm must be very fast.
*'''Data:''' About 200 thousand eye images. For each, the position of the true circle is marked - for the purpose of training and testing the method being created.
-
'''Basic algorithm:''' To speed up work with the image, it is proposed to aggregate data using brightness projections. Image brightness is a function of two discrete arguments. Its projection on the horizontal axis is equal to. Similarly, projections are constructed on axes with an inclination. Having built several projections (two, four), based on them, you can try to determine the position of the pupil (compact dark area) using heuristics and / or a neural network. It is interesting to evaluate the capabilities of the neural network in this task.
+
*'''Basic algorithm:''' To speed up work with the image, it is proposed to aggregate data using brightness projections. Image brightness is a function of two discrete arguments. Its projection on the horizontal axis is obtained by summing the brightness over the vertical coordinate. Similarly, projections are constructed on inclined axes. Having built several projections (two, four), one can try to determine the position of the pupil (a compact dark area) from them using heuristics and/or a neural network. It is interesting to evaluate the capabilities of a neural network in this problem.
*'''References:''' Zhi-Hua Zhou, Xin Geng. Projection functions for eye detection // Pattern Recognition. 2004. V.37. N.5. P.1049-1056. [https://doi.org/10.1016/j.patcog.2003.09.006 PDF]
*'''Author:''' Matveev I.A.
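A minimal heuristic based on brightness projections can be sketched as follows. It is an assumption-laden illustration, not the method from the cited paper: it uses only the two axis-aligned projections (sums over columns and over rows) and takes the darkest column and row as the pupil center; the name <code>projection_pupil_center</code> is hypothetical.

```python
import numpy as np

def projection_pupil_center(image):
    """Estimate the pupil center (x, y) as the position of the minima of
    the brightness projections onto the horizontal and vertical axes.
    """
    img = np.asarray(image, dtype=float)
    col_projection = img.sum(axis=0)   # projection onto the horizontal axis
    row_projection = img.sum(axis=1)   # projection onto the vertical axis
    x = int(np.argmin(col_projection))
    y = int(np.argmin(row_projection))
    return x, y
```

On real images, eyelashes, eyebrows and glare also produce dark or bright bands, so the projections would typically be smoothed and combined with inclined projections or a learned model, as the problem statement suggests.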
-
===Task 88c ===
+
===Problem 88c.2021===
*'''Name:''' Searching for an eyelid in an image as a parabolic contour using the projection method.
-
*'''Task:''' Given a monochrome bitmap of the eye, [[Media:Matveev2021project.pdf|examples]]. It is necessary to find the contour of the upper eyelid as a parabola, that is, to determine the parameters.
+
*'''Problem:''' Given a monochrome bitmap of the eye (see [[Media:Matveev2021project.pdf|examples]]). It is necessary to find the contour of the upper eyelid as a parabola, that is, to determine the parameters of this parabola.
*'''Data:''' About 200 thousand eye images. For some (about 2500), a human expert marked the position of a parabola that approximates the eyelid.
*'''Basic algorithm:''' The first step is pre-processing the image with a vertical gradient filter followed by binarization; below is a typical result. There are various options for the next step. For example, if the coordinates of the pupil are known, one can set a region of interest (above the pupil) and fit a parabola to the selected points in it by the least squares method. An example result is given below. More subtle methods are possible, such as finding a parabola using the Hough transform (see Wikipedia). Another way is to use projective methods (Radon transform). The main idea: for a fixed value of the quadratic coefficient, apply a coordinate transformation to the image under which all parabolas with this coefficient turn into straight lines; then, for a fixed value of the linear coefficient, apply a second coordinate transformation under which the oblique lines with this slope become horizontal. Horizontal lines are easy to detect, for example, by a horizontal projection (summing the values in the rows of the matrix of the resulting image). If the coefficients are guessed correctly, the parabola representing the eyelid gives a clear maximum in the projection. By enumerating the coefficient values (within physically meaningful ranges), one can find those that give the maximum projection value and take the corresponding parabola as the eyelid.
Line 393: Line 1030:
*'''Author:''' Matveev I.A.
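Both options from the basic algorithm, the least squares fit and the projection score, can be sketched on a set of edge points. The sketch assumes the parameterization y = a x^2 + b x + c, which the source does not state explicitly; the function names and the fixed histogram range are likewise assumptions made for illustration.

```python
import numpy as np

def fit_parabola(points):
    """Least squares fit of y = a*x^2 + b*x + c to an array of (x, y) points."""
    pts = np.asarray(points, dtype=float)
    a, b, c = np.polyfit(pts[:, 0], pts[:, 1], deg=2)
    return float(a), float(b), float(c)

def projection_score(points, a, b, y_range, n_bins):
    """Straighten parabolas with coefficients (a, b) into horizontal lines
    and return the height of the strongest peak of the horizontal projection
    together with the corresponding offset c (bin center).
    """
    pts = np.asarray(points, dtype=float)
    # After this shift, every point of y = a*x^2 + b*x + c has ordinate c.
    shifted = pts[:, 1] - a * pts[:, 0] ** 2 - b * pts[:, 0]
    hist, edges = np.histogram(shifted, bins=n_bins, range=y_range)
    k = int(np.argmax(hist))
    return int(hist[k]), 0.5 * (edges[k] + edges[k + 1])
```

Enumerating a grid of plausible <code>(a, b)</code> pairs and keeping the pair with the highest score reproduces the search described above; the least squares fit is cheaper but more sensitive to outlier points such as eyelashes.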
-
=== Task 62===
+
===Problem 62.2021===
-
* '''Name:''' Construction of a method for dynamic alignment of multidimensional time series, resistant to local signal fluctuations.
+
* '''Title:''' Construction of a method for dynamic alignment of multidimensional time series, resistant to local signal fluctuations.
-
* '''Task:''' In the process of working with multidimensional time series, the situation of close proximity of sensors corresponding to different measurement channels is common. As a result, small signal shifts in space can lead to signal peak fixation by neighboring sensors, which leads to significant differences in measurements in terms of L2 distance.<br />Thus, small signal shifts lead to significant fluctuations in the readings of the sensors. The Task of constructing a distance function between points of time series that is resistant to noise generated by small spatial signal shifts is considered. It is necessary to consider the problem in the approximation of the presence of a map of the location of the sensors.
+
* '''Problem description:''' When working with multidimensional time series, sensors corresponding to different measurement channels are often located close to each other. As a result, a small shift of the signal in space can cause its peak to be registered by a neighboring sensor, which leads to significant differences between measurements in terms of L2 distance.<br />Thus, small signal shifts lead to significant fluctuations in the sensor readings. The problem is to construct a distance function between points of time series that is resistant to noise generated by small spatial signal shifts. The problem should be considered under the assumption that a map of sensor locations is available.
* '''Data:'''
-
**[http://neurotycho.org/download Monkey brain activity measurements]
+
*# [http://neurotycho.org/download Monkey brain activity measurements]
-
**Artificially created data (several options must be proposed, for example: signal movement in space clockwise and counterclockwise)
+
*# Artificially created data (several options must be proposed, for example, signal movement in space clockwise and counterclockwise)
-
* '''References:''':
+
* '''References:'''
-
**[https://www.cs.unm.edu/~mueen/DTW.pdf Обзорная презентация о DTW]
+
*# [https://www.cs.unm.edu/~mueen/DTW.pdf Overview presentation on DTW]
-
**[https://www.researchgate.net/publication/228740947_Multi-dimensional_dynamic_time_warping_for_gesture_recognition Multi-Dimensional Dynamic Time Warping for Gesture Recognition]
+
*# [https://www.researchgate.net/publication/228740947_Multi-dimensional_dynamic_time_warping_for_gesture_recognition Multi-Dimensional Dynamic Time Warping for Gesture Recognition]
-
**[https://www.semanticscholar.org/paper/Multiple-Multidimensional-Sequence-Alignment-Using-Sanguansat/76d35bd5a52453ebde80faaa1467d7effd74426f Multiple Multidimensional Sequence Alignment Using Generalized Dynamic Time Warping]
+
*# [https://www.semanticscholar.org/paper/Multiple-Multidimensional-Sequence-Alignment-Using-Sanguansat/76d35bd5a52453ebde80faaa1467d7effd74426f Multiple Multidimensional Sequence Alignment Using Generalized Dynamic Time Warping]
-
* '''Basic algorithm:''' L2 distance between a pair of measurements.
+
* '''Base algorithm:''' L2 distance between a pair of measurements.
* '''Solution:''' Use the DTW distance function between two multidimensional time series. The two time axes are aligned, while inside the DTW functional the distance between the i-th and j-th measurements is chosen so that it is resistant to local “shifts” of the signal. Such a distance function has to be proposed. The basic solution is L2; the improved solution is DTW between the i-th and j-th measurements (DTW inside DTW).<br />Other modifications can be suggested, for example, the distance between the hidden layers of an autoencoder for points i and j.
* '''Novelty:''' A method for aligning multidimensional time series is proposed that takes into account small signal fluctuations in space.
* '''Novelty:''' A method for aligning multidimensional time series is proposed that takes into account small signal fluctuations in space.
-
* '''Authors:''' Expert - Strizhov V.V., consultants - Gleb Morgachev, Alexey Goncharov.
+
* '''Authors:''' Expert Strijov V.V., consultants Gleb Morgachev, Alexey Goncharov.
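A minimal sketch of the “DTW inside DTW” idea from the solution above, under stated assumptions: each measurement is a tuple of channel values, the outer DTW aligns the time axes, and the pointwise distance is pluggable. The names <code>l2</code>, <code>dtw</code> and <code>inner_dtw</code> are illustrative and are not taken from the project code.

```python
import math


def l2(a, b):
    """Baseline: Euclidean distance between two measurement vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(s, t, dist):
    """Classic DTW cost between sequences s and t for a pointwise distance."""
    n, m = len(s), len(t)
    inf = float("inf")
    # d[i][j] is the cost of the best alignment of s[:i] with t[:j].
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(s[i - 1], t[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def inner_dtw(a, b):
    """Assumed variant: DTW across the channels of two measurements."""
    return dtw(a, b, lambda x, y: abs(x - y))


# Two series whose first measurement differs only by a one-channel shift.
series_a = [(0.0, 1.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0)]
series_b = [(0.0, 0.0, 1.0, 0.0), (0.0, 0.0, 0.0, 0.0)]
baseline = dtw(series_a, series_b, l2)         # penalizes the shift
improved = dtw(series_a, series_b, inner_dtw)  # absorbs the shift
```

With the L2 baseline the shifted peak contributes to the cost, while the inner DTW aligns the channels and ignores the shift. The inner distance can be replaced by any other function of two measurements, such as a distance between autoencoder hidden states.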
-
===Task 58 ===
+
===Problem 58.2021===
-
* '''Name:''' Transformation of the Gerchberg-Saxton algorithm using Bayesian neural networks. (or Neural network approach in the problem of phase search for images from the European synchrotron)
+
* '''Title:''' Transformation of the Gerchberg-Saxton algorithm using Bayesian neural networks. (or Neural network approach in the problem of phase search for images from the European synchrotron)
-
* '''Task:''' The aim of the project is to improve the quality of resolution of images of nanosized objects obtained in the laboratories of the European Synchrotron Radiation Foundation.
+
* '''Problem description:''' The aim of the project is to improve the resolution of images of nanosized objects obtained in the laboratories of the European Synchrotron Radiation Facility (ESRF).
* '''Data:''' Contact an advisor for data (3GB).
* '''Data:''' Contact an advisor for data (3GB).
-
'''References:''':
+
* '''References:'''
-
** [https://arxiv.org/pdf/1809.04626.pdf] Iterative phase retrieval in coherent diffractive imaging: practical issues
+
*# [https://arxiv.org/pdf/1809.04626.pdf] Iterative phase retrieval in coherent diffractive imaging: practical issues
-
** [https://www.nature.com/articles/s41467-019-08635-x#Sec15] X-ray nanotomography of coccolithophores reveals that coccolith mass and segment number correlate with grid size
+
*# [https://www.nature.com/articles/s41467-019-08635-x#Sec15] X-ray nanotomography of coccolithophores reveals that coccolith mass and segment number correlate with grid size
-
** [https://www.nature.com/articles/s41598-018-34253-6#Sec14] Lens-free microscopy for 3D + time acquisitions of 3D cell culture
+
*# [https://www.nature.com/articles/s41598-018-34253-6#Sec14] Lens-free microscopy for 3D + time acquisitions of 3D cell culture
-
** [https://arxiv.org/pdf/1904.11301.pdf] DEEP ITERATIVE RECONSTRUCTION FOR PHASE RETRIEVAL
+
*# [https://arxiv.org/pdf/1904.11301.pdf] DEEP ITERATIVE RECONSTRUCTION FOR PHASE RETRIEVAL
-
** https://docs.google.com/document/d/1K7bIzU33MSfeUvg3WITRZX0pe3sibbtH62aw42wxsEI/edit?ts=5e42f70e LinkReview
+
*# [https://docs.google.com/document/d/1K7bIzU33MSfeUvg3WITRZX0pe3sibbtH62aw42wxsEI/edit?ts=5e42f70e LinkReview]
-
* '''Basic algorithm:''' The transition from direct space to reciprocal space occurs using the Fourier transform. The Fourier transform is a linear transformation. Therefore, it is proposed to approximate it with a neural network. For example, an autoencoder for modeling forward and inverse Fourier transforms.
+
* '''Base algorithm:''' The transition from direct space to reciprocal space occurs using the Fourier transform. The Fourier transform is a linear transformation. Therefore, it is proposed to approximate it with a neural network. For example, an autoencoder for modeling forward and inverse Fourier transforms.
*'''Solution:''' Transformation of the Gerchberg-Saxton algorithm using Bayesian neural networks. Use of information on physical limitations and expertise.
*'''Solution:''' Transformation of the Gerchberg-Saxton algorithm using Bayesian neural networks. Use of information on physical limitations and expertise.
*'''Novelty:''' Use of information about physical constraints and expert knowledge in the construction of the error function.
*'''Novelty:''' Use of information about physical constraints and expert knowledge in the construction of the error function.
-
*'''Authors:''': Experts Sergei Grudinin, Yuri Chushkin, Strizhov V.V., consultant Mark Potanin
+
*'''Authors:''' Experts Sergei Grudinin, Yuri Chushkin, Strijov V.V., consultant Mark Potanin
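For orientation, the classic Gerchberg-Saxton iteration that this problem proposes to transform can be sketched as follows. This is a one-dimensional illustration with a naive discrete Fourier transform, assuming the amplitudes are known in both direct and reciprocal space; it is not the Bayesian neural network variant, and the function names are hypothetical.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (a linear map of x)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def with_amplitude(amplitudes, field):
    """Keep the phase of each field value, replace its modulus."""
    return [a * cmath.exp(1j * cmath.phase(z))
            for a, z in zip(amplitudes, field)]


def gerchberg_saxton(source_amp, target_amp, iterations=50):
    """Alternate between the two spaces, imposing the known amplitudes."""
    field = [complex(a) for a in source_amp]
    for _ in range(iterations):
        spectrum = with_amplitude(target_amp, dft(field))
        field = with_amplitude(source_amp, dft(spectrum, inverse=True))
    return field
```

The returned field matches the direct-space amplitudes by construction; its phases are the estimate. Since <code>dft</code> is linear, it is the part that the base algorithm proposes to approximate with a neural network.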
-
=== Task 63===
+
 
-
* '''Name:''' Hierarchical alignment of time sequences.
+
===Problem 63.2021===
-
* '''Task:''' Task of alignment of sequences of difficult events is considered. An example is the complex behavior of a person: when considering data from IMU sensors, one can put forward a hypothesis: there is an initial signal, there are aggregates of “elementary actions” and there are aggregates of “actions” of a person. Each of the indicated levels of abstraction can be distinguished and operated on exactly by it.<br />In order to accurately recognize the sequence of actions, it is possible to use metric methods (for example, DTW, as a method that is resistant to time shifts). For a more accurate quality of timeline alignment, it is possible to carry out alignment at different levels of abstraction.<br />It is proposed to explore such a hierarchical approach to sequence alignment, based on the possibility of applying alignment algorithms to objects of different structures, having a distance function on them.
+
* '''Title:''' Hierarchical alignment of time sequences.
 +
* '''Problem description:''' The problem of aligning sequences of complex events is considered. An example is complex human behavior: for data from IMU sensors, one can hypothesize that there is an initial signal, there are aggregates of “elementary actions”, and there are aggregates of “actions” of a person. Each of these levels of abstraction can be extracted and processed separately.<br />To recognize a sequence of actions accurately, metric methods can be used (for example, DTW, which is robust to time shifts). To align the time axes more accurately, alignment can be carried out at different levels of abstraction.<br />It is proposed to explore such a hierarchical approach to sequence alignment, based on the fact that alignment algorithms can be applied to objects of different structure as long as a distance function is defined on them.
* '''References:'''
* '''References:'''
-
**[https://www.cs.unm.edu/~mueen/DTW.pdf Overview presentation about DTW]
+
*# [https://www.cs.unm.edu/~mueen/DTW.pdf Overview presentation about DTW]
-
**[https://link.springer.com/article/10.1007/s00371-015-1092-0 DTW-based kernel and rank-level fusion for 3D gait recognition using Kinect Multi-Dimensional Dynamic Time Warping for Gesture Recognition]
+
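*# [https://link.springer.com/article/10.1007/s00371-015-1092-0 DTW-based kernel and rank-level fusion for 3D gait recognition using Kinect]
*# [https://www.researchgate.net/publication/228740947_Multi-dimensional_dynamic_time_warping_for_gesture_recognition Multi-Dimensional Dynamic Time Warping for Gesture Recognition]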
-
**[https://ieeexplore.ieee.org/abstract/document/8966048 Time Series Similarity Measure via Siamese Convolutional Neural Network]
+
*# [https://ieeexplore.ieee.org/abstract/document/8966048 Time Series Similarity Measure via Siamese Convolutional Neural Network]
-
**[https://www.semanticscholar.org/paper/Multiple-Multidimensional-Sequence-Alignment-Using-Sanguansat/76d35bd5a52453ebde80faaa1467d7effd74426f Multiple Multidimensional Sequence Alignment Using Generalized Dynamic Time Warping]
+
*# [https://www.semanticscholar.org/paper/Multiple-Multidimensional-Sequence-Alignment-Using-Sanguansat/76d35bd5a52453ebde80faaa1467d7effd74426f Multiple Multidimensional Sequence Alignment Using Generalized Dynamic Time Warping]
-
* '''Basic algorithm:''' classic DTW.
+
* '''Base algorithm:''' classic DTW.
* '''Solution:''' It is proposed to move from one level of abstraction to the next using convolutional and recurrent neural networks. The object at the lower level of abstraction is the original signal. The object at the middle level is a signal from the hidden layer of a model built on the lower-level objects; its dimension is much smaller. The object at the upper level is a signal from the hidden layer of a model built on the middle-level objects.<br />DTW is calculated separately at the lower, middle and upper levels, but the objects for calculating the distance are formed taking into account the alignment path between the objects of the previous level.<br />This method is considered as a way to increase the interpretability of the alignment procedure and the accuracy of action classification through the transition to higher-level patterns. In addition, a significant increase in speed is expected.
* '''Solution:''' It is proposed to move from one level of abstraction to the next using convolutional and recurrent neural networks. The object at the lower level of abstraction is the original signal. The object at the middle level is a signal from the hidden layer of a model built on the lower-level objects; its dimension is much smaller. The object at the upper level is a signal from the hidden layer of a model built on the middle-level objects.<br />DTW is calculated separately at the lower, middle and upper levels, but the objects for calculating the distance are formed taking into account the alignment path between the objects of the previous level.<br />This method is considered as a way to increase the interpretability of the alignment procedure and the accuracy of action classification through the transition to higher-level patterns. In addition, a significant increase in speed is expected.
* '''Novelty:''' The idea of aligning time sequences simultaneously at several levels of abstraction is proposed. The method should significantly improve the interpretability of alignment algorithms and increase their speed.
* '''Novelty:''' The idea of aligning time sequences simultaneously at several levels of abstraction is proposed. The method should significantly improve the interpretability of alignment algorithms and increase their speed.
-
* '''Authors:''' Strizhov V.V. - Expert, Gleb Morgachev, Alexey Goncharov - consultants.
+
* '''Authors:''' Expert Strijov V.V., consultants Gleb Morgachev, Alexey Goncharov.
-
===Task 57 ===
+
===Problem 57.2021===
-
* '''Name:'''Additive Regularization and in the Tasks of Privileged Learning in Solving the Problem of Predicting the State of the Ocean
+
* '''Title:''' Additive regularization in problems of privileged learning for predicting the state of the ocean
-
* '''Task:''' There is a sample of data from ocean buoys, it is required to predict the state of the ocean at different points in time.
+
* '''Problem description:''' Given a sample of data from ocean buoys, it is required to predict the state of the ocean at different points in time.
* '''Data:''' The buoys provide data on wave height, wind speed, wind direction, wave period, sea level pressure, air temperature and sea surface temperature with a resolution of 10 minutes to 1 hour.
* '''Data:''' The buoys provide data on wave height, wind speed, wind direction, wave period, sea level pressure, air temperature and sea surface temperature with a resolution of 10 minutes to 1 hour.
-
*References:
+
* '''References:'''
-
** [https://arxiv.org/pdf/1906.00195.pdf]
+
*# [https://arxiv.org/pdf/1906.00195.pdf]
-
* '''Basic algorithm:''' Using a simple neural network.
+
* '''Base algorithm:''' Using a simple neural network.
* '''Solution:''' Add a system of differential equations to the basic algorithm (a simple neural network). Explore the properties of the parameter spaces of the teacher and the student within the privileged learning approach.
* '''Solution:''' Add a system of differential equations to the basic algorithm (a simple neural network). Explore the properties of the parameter spaces of the teacher and the student within the privileged learning approach.
*'''Novelty:''' Investigation of the parameter space of the teacher and the student and their change. It is possible to set up separate teacher and student models and track the change in their parameters in the optimization process - variance, change in the quality of the student when adding teacher information, complexity.
*'''Novelty:''' Investigation of the parameter space of the teacher and the student and their change. It is possible to set up separate teacher and student models and track the change in their parameters in the optimization process - variance, change in the quality of the student when adding teacher information, complexity.
-
* '''Authors:''': Strizhov V.V., Mark Potanin
+
* '''Authors:''' Strijov V.V., Mark Potanin
-
 
+
===Problem 52.2021===
-
===Task 52 ===
+
* '''Title:''' Predicting the quality of protein models using spherical convolutions on 3D graphs.
-
* '''Name:''' Predicting the quality of protein models using spherical convolutions on 3D graphs.
+
* '''Problem description:''' The purpose of this work is to create and study a new convolution operation on three-dimensional graphs in the framework of assessing the quality of three-dimensional protein models (the problem of regression on graph nodes).
-
* '''Task''': The purpose of this work is to create and study a new convolution operation on three-dimensional graphs in the framework of solving the problem of assessing the quality of three-dimensional protein models (task regression on graph nodes).
+
* '''Data:''' Models generated by CASP competitors are used (http://predictioncenter.org).
* '''Data:''' Models generated by CASP competitors are used (http://predictioncenter.org).
-
* '''References:''':
+
* '''References:'''
-
** [https://drive.google.com/file/d/1pXCED8XBcxbjwtg_1wZG0oAjvUCxFlua/view?usp=sharing] More about the task.
+
*# [https://drive.google.com/file/d/1pXCED8XBcxbjwtg_1wZG0oAjvUCxFlua/view?usp=sharing] More about the problem.
-
** [https://arxiv.org/abs/1806.01261] Relational inductive biases, deep learning, and graph networks.
+
*# [https://arxiv.org/abs/1806.01261] Relational inductive biases, deep learning, and graph networks.
-
** [https://arxiv.org/abs/1611.08097] Geometric deep learning: going beyond euclidean data.
+
*# [https://arxiv.org/abs/1611.08097] Geometric deep learning: going beyond euclidean data.
-
* '''Basic algorithm:''' As a basic algorithm, we will use a neural network based on the graph convolution method, which is generally described in [https://arxiv.org/abs/1806.01261].
+
* '''Base algorithm:''' As a basic algorithm, we will use a neural network based on the graph convolution method, which is generally described in [https://arxiv.org/abs/1806.01261].
* '''Solution:''' The presence of a peptide chain in proteins makes it possible to uniquely introduce local coordinate systems for all graph nodes, which makes it possible to create and apply spherical filters regardless of the graph topology.
* '''Solution:''' The presence of a peptide chain in proteins makes it possible to uniquely introduce local coordinate systems for all graph nodes, which makes it possible to create and apply spherical filters regardless of the graph topology.
-
* '''Novelty:''' In the general case, graphs are irregular structures, and in many graph learning tasks, the sample objects do not have a single topology. Therefore, the existing operations of convolutions on graphs are greatly simplified or do not generalize to different topologies. In this paper, we propose to consider a new method for constructing a convolution operation on three-dimensional graphs, for which it is possible to uniquely choose local coordinate systems associated with each node.
+
* '''Novelty:''' In the general case, graphs are irregular structures, and in many graph learning problems the sample objects do not share a single topology. Therefore, the existing convolution operations on graphs are either greatly simplified or do not generalize to different topologies. In this paper, we propose a new method for constructing a convolution operation on three-dimensional graphs, for which it is possible to uniquely choose local coordinate systems associated with each node.
* '''Authors:''' Sergei Grudinin, Ilya Igashov.
* '''Authors:''' Sergei Grudinin, Ilya Igashov.
-
===Task 44+ ===
+
===Problem 44+.2021===
-
*'''Name:''' Early prediction of sufficient sample size for a generalized linear model.
+
*'''Title:''' Early prediction of sufficient sample size for a generalized linear model.
-
*'''Task''': The problem of experiment planning is investigated. The Task of estimating a sufficient sample size according to the data is solved. The sample is assumed to be simple. It is described by an adequate model. Otherwise, the sample is generated by a fixed probabilistic model from a known class of models. The sample size is considered sufficient if the model is restored with sufficient confidence. It is required, knowing the model, to estimate a sufficient sample size at the early stages of data collection.
+
*'''Problem description:''' The problem of experiment planning is investigated. The problem of estimating a sufficient sample size from the data is solved. The sample is assumed to be simple, that is, described by an adequate model. In other words, the sample is generated by a fixed probabilistic model from a known class of models. The sample size is considered sufficient if the model is recovered with sufficient confidence. It is required, knowing the model, to estimate a sufficient sample size at the early stages of data collection.
-
* '''Цель''': On a small simple iid sample, predict the error on a replenished large one. The predictive model is smooth monotonic in two derivatives. The choice of model is a complete enumeration or genetics. The model depends on the reduced (explore) covariance matrix of the GLM parameters.
+
* '''Goal:''' Using a small simple i.i.d. sample, predict the error on a larger, replenished sample. The predictive model is smooth and monotonic in two derivatives. The model is selected by exhaustive search or by a genetic algorithm. The model depends on the reduced (to be explored) covariance matrix of the GLM parameters.
*'''Data:''' For the computational experiment, it is proposed to use classical samples from the UCI repository. Link to the samples: https://github.com/ttgadaev/SampleSizeEstimation/tree/master/datasets
*'''Data:''' For the computational experiment, it is proposed to use classical samples from the UCI repository. Link to the samples: https://github.com/ttgadaev/SampleSizeEstimation/tree/master/datasets
-
*'''References:''':
+
*'''References:'''
*# [https://docs.google.com/document/d/1o2gtdV3nYeAsfW0JZ5fESlVPhCA4_lfUOVnWhRjg1ck/edit?usp=sharing Overview of Methods, Motivation and Problem Statement for Sample Size Estimation]
*# [https://docs.google.com/document/d/1o2gtdV3nYeAsfW0JZ5fESlVPhCA4_lfUOVnWhRjg1ck/edit?usp=sharing Overview of Methods, Motivation and Problem Statement for Sample Size Estimation]
*# http://svn.code.sf.net/p/mlalgorithms/code/PhDThesis/.
*# http://svn.code.sf.net/p/mlalgorithms/code/PhDThesis/.
*# Bootstrap method. https://projecteuclid.org/download/pdf_1/euclid.aos/1.
*# Bootstrap method. https://projecteuclid.org/download/pdf_1/euclid.aos/1.
*# Bishop, C. 2006. Pattern Recognition and Machine Learning. Berlin: Springer. 758 p.
*# Bishop, C. 2006. Pattern Recognition and Machine Learning. Berlin: Springer. 758 p.
-
*'''Basic algorithm''': We will say that the sample size is sufficient if the log-likelihood has a small variance, on a sample of size m calculated using the bootstrap.
+
*'''Base algorithm:''' The sample size is considered sufficient if the log-likelihood, estimated by the bootstrap on a sample of size m, has a small variance.
We are trying to approximate the dependence of the average value of log-likelihood and its variance on the sample size.
We are trying to approximate the dependence of the average value of log-likelihood and its variance on the sample size.
*'''Solution:''' The methods described in the review are asymptotic or require a sample size that is known in advance to be large. The new method should predict the sufficient sample size at the early stages of experiment design, i.e. when data is scarce.
*'''Solution:''' The methods described in the review are asymptotic or require a sample size that is known in advance to be large. The new method should predict the sufficient sample size at the early stages of experiment design, i.e. when data is scarce.
-
*'''Authors:''' consultant - Malinovsky G., Strizhov V.V. (Expert)
+
*'''Authors:''' expert Strijov V.V., consultant Malinovsky G.
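The base algorithm of Problem 44+ can be illustrated with a small sketch. It assumes the simplest generalized linear model (Gaussian linear regression with one feature) and synthetic data produced by the hypothetical helper <code>make_synthetic</code>. For a given size m it refits the model on bootstrap samples and reports the mean and variance of the log-likelihood evaluated on the available sample. Evaluating on the whole available sample is a design choice of this sketch, not a statement about the original method.

```python
import math
import random
import statistics


def make_synthetic(n, seed=0):
    """Hypothetical stand-in data: y = 1 + 2x + Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    ys = [1.0 + 2.0 * x + rng.gauss(0.0, 0.5) for x in xs]
    return xs, ys


def fit_linear(xs, ys):
    """Least-squares fit of y = w0 + w1 * x; returns (w0, w1, sigma2)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    w1 = sxy / sxx if sxx > 0 else 0.0
    w0 = my - w1 * mx
    sigma2 = sum((y - w0 - w1 * x) ** 2 for x, y in zip(xs, ys)) / n
    return w0, w1, max(sigma2, 1e-12)  # guard against a degenerate fit


def mean_loglik(params, xs, ys):
    """Average Gaussian log-likelihood per object."""
    w0, w1, sigma2 = params
    terms = [-0.5 * math.log(2 * math.pi * sigma2)
             - (y - w0 - w1 * x) ** 2 / (2 * sigma2)
             for x, y in zip(xs, ys)]
    return statistics.fmean(terms)


def bootstrap_loglik(xs, ys, m, n_boot=200, seed=0):
    """Mean and variance of the log-likelihood over bootstrap samples of size m."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_boot):
        idx = [rng.randrange(len(xs)) for _ in range(m)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        values.append(mean_loglik(fit_linear(bx, by), xs, ys))
    return statistics.fmean(values), statistics.variance(values)
```

Calling <code>bootstrap_loglik</code> for a grid of sizes m gives the empirical dependence of the mean and variance on the sample size. A sufficient size can then be read off as the smallest m whose variance falls below a chosen threshold.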
-
 
+
===Problem 12.2021===
-
=== Task 12 ===
+
* '''Title:''' Machine translation training without parallel texts.
-
* '''Name:''' Machine translation training without parallel texts.
+
* '''Problem:''' The problem of building a text translation model without the use of parallel texts, i.e. pairs of identical sentences in different languages, is considered. This problem occurs when building translation models for low-resource languages (that is, languages for which there is not much data in the public domain).
-
* '''Task''': The Task of building a text translation model without the use of parallel texts is considered, i.e. pairs of identical sentences in different languages. This Task occurs when building translation models for low-resource languages (that is, languages for which there is not much data in the public domain).
+
* '''Data:''' A selection of articles from Wikipedia in two languages.
* '''Data:''' A selection of articles from Wikipedia in two languages.
-
* '''References:''':
+
* '''References:'''
-
** [https://arxiv.org/abs/1711.00043] Unsupervised Machine Translation Using Monolingual Corpora Only
+
*# [https://arxiv.org/abs/1711.00043] Unsupervised Machine Translation Using Monolingual Corpora Only
-
** [https://arxiv.org/pdf/1609.08144.pdf] Sequence to sequence.
+
*# [https://arxiv.org/pdf/1609.08144.pdf] Sequence to sequence.
-
** [http://www.cs.toronto.edu/~larocheh/publications/icml-2008-denoising-autoencoders.pdf] Autoencoding.
+
*# [http://www.cs.toronto.edu/~larocheh/publications/icml-2008-denoising-autoencoders.pdf] Autoencoding.
-
** [https://arxiv.org/pdf/1511.06709.pdf] Training with Monolingual Training Data.
+
*# [https://arxiv.org/pdf/1511.06709.pdf] Training with Monolingual Training Data.
* '''Basic algorithm''': Unsupervised Machine Translation Using Monolingual Corpora Only.
* '''Basic algorithm''': Unsupervised Machine Translation Using Monolingual Corpora Only.
* '''Solution:''' As a translation model, it is proposed to consider a combination of two auto-encoders, each of which is responsible for presenting sentences in one of the languages. The models are optimized in such a way that the latent spaces of autoencoders for different languages match. As an initial representation of sentences, it is proposed to consider their graph description obtained using multilingual ontologies.
* '''Solution:''' As a translation model, it is proposed to consider a combination of two auto-encoders, each of which is responsible for presenting sentences in one of the languages. The models are optimized in such a way that the latent spaces of autoencoders for different languages match. As an initial representation of sentences, it is proposed to consider their graph description obtained using multilingual ontologies.
* '''Novelty:''' A method for constructing a translation model is proposed, taking into account graph descriptions of sentences.
* '''Novelty:''' A method for constructing a translation model is proposed, taking into account graph descriptions of sentences.
-
* '''Authors:''' Oleg Bakhteev, Strizhov V.V.,
+
* '''Authors:''' Oleg Bakhteev, Strijov V.V.
-
 
+
===Problem 8.2021===
-
===Task 8 ===
+
* '''Title:''' Generation of features using locally approximating models (Classification of human activities according to measurements of fitness bracelets).
-
* '''Name:''' Generation of features using locally approximating models (Classification of human activities according to measurements of fitness bracelets).
+
* '''Problem:''' It is required to check the feasibility of the hypothesis about the simplicity of sampling for the generated features. Features are the optimal parameters of approximating models. Moreover, the entire sample is not simple and requires a mixture of models to approximate it. Explore the information content of the generated features - the parameters of the approximating models trained on the segments of the original time series. According to the measurements of the accelerometer and gyroscope, it is required to determine the type of activity of the worker. It is assumed that the time series of measurements contain elementary movements that form clusters in the space of time series descriptions. The characteristic duration of the movement is seconds. Time series are labeled with activity type labels: work, leisure. The typical duration of activity is minutes. It is required to restore the type of activity according to the description of the time series and cluster.
-
* '''Task''': It is required to check the feasibility of the hypothesis about the simplicity of sampling for the generated features. Features are the optimal parameters of approximating models. Moreover, the entire sample is not simple and requires a mixture of models to approximate it. Explore the information content of the generated features - the parameters of the approximating models trained on the segments of the original time series. According to the measurements of the accelerometer and gyroscope, it is required to determine the type of activity of the worker. It is assumed that the time series of measurements contain elementary movements that form clusters in the space of time series descriptions. The characteristic duration of the movement is seconds. Time series are labeled with activity type labels: work, leisure. The typical duration of activity is minutes. It is required to restore the type of activity according to the description of the time series and cluster.
+
* '''Data:''' WISDM accelerometer time series ([[Time series (library of examples)]], section Accelerometry).
* '''Data:''' WISDM accelerometer time series ([[Time series (library of examples)]], section Accelerometry).
-
** WISDM (Kwapisz, J.R., G.M. Weiss, and S.A. Moore. 2011. Activity recognition using cell phone accelerometers. ACM SigKDD Explorations Newsletter. 12(2):74–82.), USC-HAD или сложнее. Данные акселерометра (Human activity recognition using smart phone embedded sensors: A Linear Dynamical Systems method, W Wang, H Liu, L Yu, F Sun - Neural Networks (IJCNN), 2014)
+
*# WISDM (Kwapisz, J.R., G.M. Weiss, and S.A. Moore. 2011. Activity recognition using cell phone accelerometers. ACM SigKDD Explorations Newsletter. 12(2):74–82.), USC-HAD. Human activity recognition using smart phone embedded sensors: A Linear Dynamical Systems method, W Wang, H Liu, L Yu, F Sun - Neural Networks (IJCNN), 2014.
-
* '''References:''':
+
* '''References:'''
-
** Motrenko A.P., Strijov V.V. Extracting fundamental periods to segment human motion time series // Journal of Biomedical and Health Informatics, 2016, Vol. 20, No. 6, 1466 - 1476. [http://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group874/Motrenko2014TSsegmentation/JBHI/MotrenkoStrijov2014RV2.pdf?format=raw URL]
+
*# Motrenko A.P., Strijov V.V. Extracting fundamental periods to segment human motion time series // Journal of Biomedical and Health Informatics, 2016, Vol. 20, No. 6, 1466 - 1476. [http://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group874/Motrenko2014TSsegmentation/JBHI/MotrenkoStrijov2014RV2.pdf?format=raw URL]
-
** Карасиков М.Е., Strizhov V.V. Classification of time series in the space of parameters of generating models // Informatics and its applications, 2016.[http://strijov.com/papers/Karasikov2016TSC.pdf URL]
+
*# Karasikov M.E., Strijov V.V. Classification of time series in the space of parameters of generating models // Informatics and its applications, 2016.[http://strijov.com/papers/Karasikov2016TSC.pdf URL]
-
** Kuznetsov M.P., Ivkin N.P. Algorithm for Classifying Accelerometer Time Series by Combined Feature Description // Machine Learning and Data Analysis. 2015. T. 1, No. 11. C. 1471 - 1483. [http://jmlda.org/papers/doc/2015/no11/Ivkin2015TSclassification.pdf URL]
+
*# Kuznetsov M.P., Ivkin N.P. Algorithm for Classifying Accelerometer Time Series by Combined Feature Description // Machine Learning and Data Analysis. 2015. Vol. 1, No. 11. P. 1471 - 1483. [http://jmlda.org/papers/doc/2015/no11/Ivkin2015TSclassification.pdf URL]
-
** Isachenko R.V., Strizhov V.V. Metric learning in Taskx multiclass classification of time series // Informatics and its applications, 2016, 10(2) : 48-57. [http://strijov.com/papers/Isachenko2016MetricsLearning.pdf URL]
+
*# Isachenko R.V., Strijov V.V. Metric learning in the problem of multiclass classification of time series // Informatics and its applications, 2016, 10(2) : 48-57. [http://strijov.com/papers/Isachenko2016MetricsLearning.pdf URL]
-
** Zadayanchuk A.I., Popova M.S., Strizhov V.V. Choosing the optimal model for classifying physical activity based on accelerometer measurements // Information technologies, 2016. [http://strijov.com/papers/Zadayanchuk2015OptimalNN4.pdf URL]
+
*# Zadayanchuk A.I., Popova M.S., Strijov V.V. Choosing the optimal model for classifying physical activity based on accelerometer measurements // Information technologies, 2016. [http://strijov.com/papers/Zadayanchuk2015OptimalNN4.pdf URL]
-
** Ignatov A., Strijov V. Human activity recognition using quasiperiodic time series collected from a single triaxial accelerometer // Multimedia Tools and Applications, 2015, 17.05.2015 : 1-14. [http://strijov.com/papers/Ignatov2015HumanActivity.pdf URL]
+
*# Ignatov A., Strijov V. Human activity recognition using quasiperiodic time series collected from a single triaxial accelerometer // Multimedia Tools and Applications, 2015, 17.05.2015 : 1-14. [http://strijov.com/papers/Ignatov2015HumanActivity.pdf URL]
-
* '''Basic algorithm''': Basic algorithm described in [Karasikov, Strizhov: 2016] and [Kuznetsov, Ivkin: 2014].
+
* '''Basic algorithm''': Basic algorithm described in [Karasikov, Strijov: 2016] and [Kuznetsov, Ivkin: 2014].
* '''Solution:''' It is required to build a set of locally approximating models and choose the most adequate ones. Find the optimal segmentation method and the optimal description of the time series. Construct a metric space of descriptions of elementary motions.
* '''Solution:''' It is required to build a set of locally approximating models and choose the most adequate ones. Find the optimal segmentation method and the optimal description of the time series. Construct a metric space of descriptions of elementary motions.
* '''Novelty:''' A standard for building locally approximating models has been created. The connection of two characteristic times of the description of human life, the combined statement of the problem.
* '''Novelty:''' A standard for building locally approximating models has been created. The connection of two characteristic times of the description of human life, the combined statement of the problem.
-
* '''Authors:''' Expert - Strizhov V.V., consultants - Alexandra Galtseva, Danil Sayranov.
+
* '''Authors:''' Expert Strijov V.V., consultants Alexandra Galtseva, Danil Sayranov.
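A minimal sketch of feature generation with locally approximating models, as described in Problem 8. It assumes a one-dimensional series, non-overlapping segments of fixed width, and a straight line as the local model; the project itself may use other segmentations and models. The fitted parameters (intercept, slope) of each segment serve as its feature vector. The names <code>fit_line</code> and <code>local_features</code> are illustrative.

```python
def fit_line(segment):
    """Least-squares line y = a + b * t over t = 0..len(segment)-1."""
    n = len(segment)
    mean_t = (n - 1) / 2
    mean_y = sum(segment) / n
    stt = sum((t - mean_t) ** 2 for t in range(n))
    sty = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(segment))
    slope = sty / stt
    return mean_y - slope * mean_t, slope


def local_features(series, width):
    """Parameters of the local model for each non-overlapping segment."""
    if width < 2:
        raise ValueError("a line needs at least two points per segment")
    return [fit_line(series[start:start + width])
            for start in range(0, len(series) - width + 1, width)]


# Toy series: a rising segment followed by a falling one.
features = local_features([0, 1, 2, 3, 6, 4, 2, 0], width=4)
```

Each tuple in <code>features</code> describes one elementary segment. Clustering these tuples, or feeding them to a classifier, corresponds to working in the space of time series descriptions.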
-
 
+
-
=2020=
+
-
* Story [[Automation of scientific research in machine learning (practice, Strizhov V.V.)/ Group 674, spring 2019|2019 (674)]] — [[Automation of scientific research in machine learning (practice, Strizhov V.V.)/ Group 694, spring 2019|2019 (694)]] — [[Numerical Methods for Case-Based Learning (practice, Strizhov V.V.)/Group 574, spring 2018 | 2018]] — [[Numerical Methods for Case-Based Learning (practice, Strizhov V.V.)/Group 474, spring 2017 | 2017]] — [[Numerical Methods for Case-Based Learning (practice, Strizhov V.V.)/Group 374, spring 2016 | 2016]] — [[Numerical Methods for Case-Based Learning (practice, Strizhov V.V.)/Group 274, spring 2015 | 2015]] — [[Numerical Methods for Case-Based Learning (practice, Strizhov V.V.)/Group 174, spring 2014 | 2014]] — [[Numerical Methods for Case-Based Learning (practice, Strizhov V.V.)/Group 074, spring 2013 | 2013]]
+
==2020==
{|class="wikitable"
{|class="wikitable"
Line 522: Line 1155:
|-
|-
|-
|-
-
|[[Участник:Magistrkoljan| Grebenkova Olga]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Magistrkoljan Grebenkova Olga]
|Variational optimization of deep learning models with model complexity control
|Variational optimization of deep learning models with model complexity control
|[https://docs.google.com/document/d/1gHyVeYgzFgco1vUTZRjxT2FbO03GsB27EVEstLWTzdM/edit?usp=sharing LinkReview]
|[https://docs.google.com/document/d/1gHyVeYgzFgco1vUTZRjxT2FbO03GsB27EVEstLWTzdM/edit?usp=sharing LinkReview]
Line 529: Line 1162:
[https://github.com/Intelligent-Systems-Phystech/2020-Project60/raw/master/slides/Grebenkova2020OptimizationSlides.pdf Slides]
[https://github.com/Intelligent-Systems-Phystech/2020-Project60/raw/master/slides/Grebenkova2020OptimizationSlides.pdf Slides]
[https://youtu.be/9ELhIqjFSE8 Video]
[https://youtu.be/9ELhIqjFSE8 Video]
-
|[[Участник:Oleg Bakhteev|Oleg Bakhteev]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Oleg_Bakhteev Oleg Bakhteev]
|AILP+UXBR+HCV+TEDWS
|AILP+UXBR+HCV+TEDWS
-
|[[Участник:Vshokorov|Shokorov Vyacheslav]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Vshokorov Shokorov Vyacheslav]
[https://github.com/Intelligent-Systems-Phystech/2020_Project_9/raw/master/review%20Grebenkova.pdf Review]
[https://github.com/Intelligent-Systems-Phystech/2020_Project_9/raw/master/review%20Grebenkova.pdf Review]
|-
|-
-
|[[Участник:Vshokorov|Shokorov Vyacheslav]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Vshokorov Shokorov Vyacheslav]
|Text recognition based on skeletal representation of thick lines and convolutional networks
|Text recognition based on skeletal representation of thick lines and convolutional networks
|[https://docs.google.com/document/d/1zsk-tpd51axWfcYxpa4CWd1QZdOnr0Hv6b1_a34q28Y/edit?usp=sharing LinkReview]
|[https://docs.google.com/document/d/1zsk-tpd51axWfcYxpa4CWd1QZdOnr0Hv6b1_a34q28Y/edit?usp=sharing LinkReview]
Line 543: Line 1176:
|Denis Ozherelkov
|Denis Ozherelkov
|AIL
|AIL
-
|[[Участник:Magistrkoljan| Grebenkova Olga]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Magistrkoljan Grebenkova Olga]
[https://github.com/Intelligent-Systems-Phystech/2020-Project60/raw/master/docs/Shokorov2020ImageClassification_Review.pdf Review]
[https://github.com/Intelligent-Systems-Phystech/2020-Project60/raw/master/docs/Shokorov2020ImageClassification_Review.pdf Review]
|-
|-
-
|[[Участник:Filatov Andrey|Filatov Andrey]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Filatov Filatov Andrey]
|Intention forecasting. Investigation of the properties of local models in the spatial decoding of brain signals
|Intention forecasting. Investigation of the properties of local models in the spatial decoding of brain signals
|[https://docs.google.com/document/d/1UmRq34enjk7RpW2vpF5V88TaHKQd0Ne3LpwyoV0E6nA/edit?usp=sharing LinkReview]
|[https://docs.google.com/document/d/1UmRq34enjk7RpW2vpF5V88TaHKQd0Ne3LpwyoV0E6nA/edit?usp=sharing LinkReview]
Line 555: Line 1188:
|Valery Markin
|Valery Markin
|AILPHUXBRCVTEDWS
|AILPHUXBRCVTEDWS
-
|[[Участник:Hristolubov Maxim|Hristolubov Maxim]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Hristolubov_Maxim Hristolubov Maxim]
[https://github.com/Intelligent-Systems-Phystech/2020_Project8/raw/master/docs/%D0%A0%D0%B5%D1%86%D0%B5%D0%BD%D0%B7%D0%B8%D1%8F.pdf Review]
[https://github.com/Intelligent-Systems-Phystech/2020_Project8/raw/master/docs/%D0%A0%D0%B5%D1%86%D0%B5%D0%BD%D0%B7%D0%B8%D1%8F.pdf Review]
|-
|-
-
|[[Участник:Rustem Messi|Islamov Rustem]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Rustem_Messi Islamov Rustem]
|Analysis of the properties of an ensemble of locally approximating models
|Analysis of the properties of an ensemble of locally approximating models
|[https://docs.google.com/document/d/1wEYR3vXzZsYEv2L51wMCBFmP7UQwIBDPn3Gpz72MIyw/edit LinkReview]
|[https://docs.google.com/document/d/1wEYR3vXzZsYEv2L51wMCBFmP7UQwIBDPn3Gpz72MIyw/edit LinkReview]
Line 565: Line 1198:
[https://github.com/Intelligent-Systems-Phystech/2020_Project-51/raw/master/slides/Islamov2020EnsembleOfModels_Presentation.pdf Slides]
[https://github.com/Intelligent-Systems-Phystech/2020_Project-51/raw/master/slides/Islamov2020EnsembleOfModels_Presentation.pdf Slides]
[https://youtu.be/9yFRWsyj6zo Video]
[https://youtu.be/9yFRWsyj6zo Video]
-
|[[Участник:Andriygav | Andrey Grabovoi]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Andriygav Andrey Grabovoi]
|AILPHUXBRCVTEDWS
|AILPHUXBRCVTEDWS
-
|[[Участник:Gunaev Ruslan| Gunaev Ruslan]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Gunaev_Ruslan Gunaev Ruslan]
[https://github.com/Gunaev/2020-Project-69/raw/master/paper/Islamov2020_Review.docx Review]
[https://github.com/Gunaev/2020-Project-69/raw/master/paper/Islamov2020_Review.docx Review]
|-
|-
-
|[[Участник:Zholobov Vladimir| Zholobov Vladimir]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Zholobov_Vladimir Zholobov Vladimir]
|Early prediction of sufficient sample size for a generalized linear model.
|Early prediction of sufficient sample size for a generalized linear model.
|[https://docs.google.com/document/d/1o2gtdV3nYeAsfW0JZ5fESlVPhCA4_lfUOVnWhRjg1ck/edit LinkReview]
|[https://docs.google.com/document/d/1o2gtdV3nYeAsfW0JZ5fESlVPhCA4_lfUOVnWhRjg1ck/edit LinkReview]
Line 579: Line 1212:
|Grigory Malinovsky
|Grigory Malinovsky
|AILPHUXBRCVTEWSF
|AILPHUXBRCVTEWSF
-
|[[Участник:Vayser Kirill|Vayser Kirill]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Vayser_Kirill Vayser Kirill]
[https://github.com/Intelligent-Systems-Phystech/2020-Project_Regul/raw/master/docs/Zholobov2020EarlyForecast_Review.docx Review]
[https://github.com/Intelligent-Systems-Phystech/2020-Project_Regul/raw/master/docs/Zholobov2020EarlyForecast_Review.docx Review]
|-
|-
-
|[[Участник:Vayser Kirill|Vayser Kirill]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Vayser_Kirill Vayser Kirill]
|Additive regularization and its meta parameters when choosing the structure of deep learning networks
|Additive regularization and its meta parameters when choosing the structure of deep learning networks
|[https://docs.google.com/document/d/1LRVQ8dgRejQx8zdtk6dLMbHXdXwbAju6qD8NNSa1MgE/edit?usp=sharing LinkReview]
|[https://docs.google.com/document/d/1LRVQ8dgRejQx8zdtk6dLMbHXdXwbAju6qD8NNSa1MgE/edit?usp=sharing LinkReview]
Line 591: Line 1224:
|Mark Potanin
|Mark Potanin
|AILP+HUX+BRCV+TEDWS
|AILP+HUX+BRCV+TEDWS
-
|[[Участник:Zholobov Vladimir| Zholobov Vladimir]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Zholobov_Vladimir Zholobov Vladimir]
[https://github.com/Intelligent-Systems-Phystech/2020-Project44/blob/master/doc/review/Vaiser2020review.docx Review]
[https://github.com/Intelligent-Systems-Phystech/2020-Project44/blob/master/doc/review/Vaiser2020review.docx Review]
|-
|-
-
|[[Участник:Bishuk Anton|Bishuk Anton]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Bishuk_Anton Bishuk Anton]
|Solution of an optimization problem combining classification and regression to estimate the binding energy of a protein and small molecules.
|Solution of an optimization problem combining classification and regression to estimate the binding energy of a protein and small molecules.
|[https://drive.google.com/file/d/1NPz05B6HceCdD1Q-P8xYCUkc15bka2Qz/view?usp=sharing LinkReview]
|[https://drive.google.com/file/d/1NPz05B6HceCdD1Q-P8xYCUkc15bka2Qz/view?usp=sharing LinkReview]
Line 601: Line 1234:
[https://github.com/Intelligent-Systems-Phystech/2020_Project53_Class-Reg/blob/master/docs/Bishuk_2020_Cls_Rg_in_Mol_Docking_pres.pdf Slides]
[https://github.com/Intelligent-Systems-Phystech/2020_Project53_Class-Reg/blob/master/docs/Bishuk_2020_Cls_Rg_in_Mol_Docking_pres.pdf Slides]
[https://youtu.be/8sRcvKR2F-0 Video]
[https://youtu.be/8sRcvKR2F-0 Video]
-
 
|Maria Kadukova
|Maria Kadukova
|AILPHUXBRCVTEDH
|AILPHUXBRCVTEDH
-
|[[Участник:Filippova Anastasia|Filippova Anastasia]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Filippova_Anastasia Filippova Anastasia]
|-
|-
-
|[[Участник:Filippova Anastasia|Filippova Anastasia]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Filippova_Anastasia Filippova Anastasia]
|Step detection for IMU navigation via deep learning
|Step detection for IMU navigation via deep learning
|[https://docs.google.com/spreadsheets/d/1XLDBM53bX_7_HwCYbmuZTY8IlbcE0A4B1BQ8EnIXJEo/edit?usp=sharing LinkReview]
|[https://docs.google.com/spreadsheets/d/1XLDBM53bX_7_HwCYbmuZTY8IlbcE0A4B1BQ8EnIXJEo/edit?usp=sharing LinkReview]
Line 616: Line 1248:
|Tamaz Gadaev
|Tamaz Gadaev
|AIL0PUXBRCVSF
|AIL0PUXBRCVSF
-
|[[Участник:Bishuk Anton|Bishuk Anton]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Bishuk_Anton Bishuk Anton]
[https://github.com/Intelligent-Systems-Phystech/2020_Project53_Class-Reg/raw/master/Review.pdf Review]
[https://github.com/Intelligent-Systems-Phystech/2020_Project53_Class-Reg/raw/master/Review.pdf Review]
|-
|-
-
|[[Участник:Savelev Nickolay|Savelev Nickolay]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Savelev_Nickolay Savelev Nickolay]
|Distributed optimization under Polyak-Łojasiewicz conditions
|Distributed optimization under Polyak-Łojasiewicz conditions
|[https://docs.google.com/document/d/1tXEXnjv8F1CFYGSbdlp1Fd0fbU49N1E5bGnwo6XW3CU/edit?usp=sharing LinkReview]
|[https://docs.google.com/document/d/1tXEXnjv8F1CFYGSbdlp1Fd0fbU49N1E5bGnwo6XW3CU/edit?usp=sharing LinkReview]
Line 628: Line 1260:
|A. N. Beznosikov
|A. N. Beznosikov
|AILPHUXBRCVTEDWS
|AILPHUXBRCVTEDWS
-
|[[Участник:Lexakhar|Khary Alexandra]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Lexakhar Khary Alexandra]
[https://github.com/Intelligent-Systems-Phystech/2020-Project59/raw/master/docs/review.pdf Review]
[https://github.com/Intelligent-Systems-Phystech/2020-Project59/raw/master/docs/review.pdf Review]
|-
|-
-
|[[Участник:Lexakhar|Khary Alexandra]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Lexakhar Khary Alexandra]
|Theoretical validity of the application of metric classification methods using dynamic alignment (DTW) to spatiotemporal objects.
|Theoretical validity of the application of metric classification methods using dynamic alignment (DTW) to spatiotemporal objects.
|[https://docs.google.com/document/d/1B2INH2qRFHpUJWBMwn27kyQ6ySMI5i1-N322nzKUApY/edit?usp=sharing LinkReview]
|[https://docs.google.com/document/d/1B2INH2qRFHpUJWBMwn27kyQ6ySMI5i1-N322nzKUApY/edit?usp=sharing LinkReview]
Line 640: Line 1272:
|Gleb Morgachev, Alexey Goncharov
|Gleb Morgachev, Alexey Goncharov
|AILPHUXBRCVTEDCWS
|AILPHUXBRCVTEDCWS
-
|[[Участник:Savelev Nickolay|Savelev Nickolay]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Savelev_Nickolay Savelev Nickolay]
[https://github.com/Intelligent-Systems-Phystech/2020-Project64/raw/master/dosc/Review.pdf Review]
[https://github.com/Intelligent-Systems-Phystech/2020-Project64/raw/master/dosc/Review.pdf Review]
|-
|-
-
|[[Участник:Hristolubov Maxim|Hristolubov Maxim]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Hristolubov_Maxim Hristolubov Maxim]
|Generating features using locally approximating models (Classification of human activities by measurements of fitness bracelets)
|Generating features using locally approximating models (Classification of human activities by measurements of fitness bracelets)
|[https://drive.google.com/open?id=1j9NUd2r3rAmNlt_iobBcxHM8Nc1uXk51gCe4AAr1Evs LinkReview]
|[https://drive.google.com/open?id=1j9NUd2r3rAmNlt_iobBcxHM8Nc1uXk51gCe4AAr1Evs LinkReview]
Line 652: Line 1284:
|Alexandra Galtseva, Danil Sayranov
|Alexandra Galtseva, Danil Sayranov
|AILPH
|AILPH
-
|[[Участник:Filatov Andrey|Filatov Andrey]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Filatov Filatov Andrey]
[https://github.com/Intelligent-Systems-Phystech/2020-Project-17/raw/master/report/Hristolubov2020AccelerometerReview.pdf Review]
[https://github.com/Intelligent-Systems-Phystech/2020-Project-17/raw/master/report/Hristolubov2020AccelerometerReview.pdf Review]
|-
|-
-
|[[Участник:Mamonov|Mamonov Kirill]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Mamonov Mamonov Kirill]
|Nonlinear ranking of exploratory information search results.
|Nonlinear ranking of exploratory information search results.
|[https://docs.google.com/document/d/1PEIvEfvq_2Mo62M5jMN0Fgg_XTuWSoYMvdssnTlSXn4/edit?usp=sharing LinkReview]
|[https://docs.google.com/document/d/1PEIvEfvq_2Mo62M5jMN0Fgg_XTuWSoYMvdssnTlSXn4/edit?usp=sharing LinkReview]
Line 662: Line 1294:
[https://github.com/Intelligent-Systems-Phystech/2020-Project73/raw/master/report/Mamonov2020Project73slides.pdf Slides]
[https://github.com/Intelligent-Systems-Phystech/2020-Project73/raw/master/report/Mamonov2020Project73slides.pdf Slides]
[https://youtu.be/9Gr_YWYriww Video]
[https://youtu.be/9Gr_YWYriww Video]
-
|[[Участник:MEremeev|Maxim Eremeev]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:MEremeev Maxim Eremeev]
|AILPHU+XBRC+V+TEDHWJSF
|AILPHU+XBRC+V+TEDHWJSF
|-
|-
-
|[[Участник: Pavlichenko| Pavlichenko Nikita]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Pavlichenko Pavlichenko Nikita]
| Predicting the quality of protein models using spherical convolutions on 3D graphs.
| Predicting the quality of protein models using spherical convolutions on 3D graphs.
|[https://docs.google.com/document/d/1EaExQN9F94kt_JAJnglX1liuo-qS4C9Hee8pLOUWlL8/edit?usp=sharing LinkReview]
|[https://docs.google.com/document/d/1EaExQN9F94kt_JAJnglX1liuo-qS4C9Hee8pLOUWlL8/edit?usp=sharing LinkReview]
Line 675: Line 1307:
|AILPUXBRHCVTEDH
|AILPUXBRHCVTEDH
|-
|-
-
|[[Участник: Sodikov| Sodikov Mahmud]], [[Участник: Skachkov| Skachkov Daniel]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Sodikov Sodikov Mahmud], [http://www.machinelearning.ru/wiki/index.php?title=Участник:Skachkov Skachkov Daniel]
| Agnostic neural networks
| Agnostic neural networks
|[https://github.com/Intelligent-Systems-Phystech/WeightAgnosticNN/raw/master/WANN_modif.py Code]
|[https://github.com/Intelligent-Systems-Phystech/WeightAgnosticNN/raw/master/WANN_modif.py Code]
Line 683: Line 1315:
| Radoslav Neichev
| Radoslav Neichev
|AILPHUXBRC+VTEDHWJSF
|AILPHUXBRC+VTEDHWJSF
-
|[[Участник:Kulagin|Kulagin Petr]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Kulagin Kulagin Petr]
[https://github.com/petr-kulagin/2020-Project62/blob/master/docs/SodikovSkachkov2020Project66_Review.pdf Review]
[https://github.com/petr-kulagin/2020-Project62/blob/master/docs/SodikovSkachkov2020Project66_Review.pdf Review]
|-
|-
-
|[[Участник:Gunaev Ruslan|Gunaev Ruslan]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Gunaev_Ruslan Gunaev Ruslan]
| Graph Neural Network in Reaction Yield prediction
| Graph Neural Network in Reaction Yield prediction
|[https://docs.google.com/document/d/18-eJP3-bPs-aYGGR2PuD3tjJdaa7CF59JMJanwRQLJM/edit LinkReview]
|[https://docs.google.com/document/d/18-eJP3-bPs-aYGGR2PuD3tjJdaa7CF59JMJanwRQLJM/edit LinkReview]
Line 695: Line 1327:
|Philip Nikitin
|Philip Nikitin
|AILPUXBRHCVTEDHWSF
|AILPUXBRHCVTEDHWSF
-
|[[Участник:Rustem Messi|Islamov Rustem]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Rustem_Messi Islamov Rustem]
[https://github.com/Intelligent-Systems-Phystech/2020_Project-51/raw/master/doc/Gunaev2020Project69_Review.pdf Review]
[https://github.com/Intelligent-Systems-Phystech/2020_Project-51/raw/master/doc/Gunaev2020Project69_Review.pdf Review]
|-
|-
-
|[[Участник:Fyaush|Yaushev Farukh]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Fyaush Yaushev Farukh]
| Investigation of ways to match models by reducing the dimension of space
| Investigation of ways to match models by reducing the dimension of space
|[https://docs.google.com/document/d/14T3fHZycMMtvd-1LROd5gDOtbI-johIPp_RdiW_Qd3c/edit?usp=sharing LinkReview]
|[https://docs.google.com/document/d/14T3fHZycMMtvd-1LROd5gDOtbI-johIPp_RdiW_Qd3c/edit?usp=sharing LinkReview]
Line 707: Line 1339:
|Roman Isachenko
|Roman Isachenko
|AILPUXBRHCVTEDHWJS
|AILPUXBRHCVTEDHWJS
-
|[[Участник:Zholobov Vladimir| Zholobov Vladimir]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Zholobov_Vladimir Zholobov Vladimir]
[https://github.com/Intelligent-Systems-Phystech/2020-Project44/blob/master/doc/review/Yaushev2020review.docx Review]
[https://github.com/Intelligent-Systems-Phystech/2020-Project44/blob/master/doc/review/Yaushev2020review.docx Review]
|}
|}
-
===Task 51 ===
+
===51. 2020===
*'''Name:''' Analysis of the properties of an ensemble of locally approximating models.
*'''Name:''' Analysis of the properties of an ensemble of locally approximating models.
-
*'''Task''': In this paper, we consider the task of constructing a universal approximator, a multimodel that consists of a given finite set of local models. Each local model approximates a connected region in feature space. It is assumed that the set of local models covers the entire space of objects. A convex combination of local models is considered as an aggregating function. The coefficients of the convex combination are given by a function that depends on the object, the gate function.
+
*'''Problem''': In this paper, we consider the problem of constructing a universal approximator, a multimodel that consists of a given finite set of local models. Each local model approximates a connected region in feature space. It is assumed that the set of local models covers the entire space of objects. A convex combination of local models is considered as an aggregating function. The coefficients of the convex combination are given by a function that depends on the object, the gate function.
*'''Required''': To construct an algorithm for optimizing the parameters of local models and parameters of the gate function. It is required to propose a metric in the space of objects, a metric in the space of models.
*'''Required''': To construct an algorithm for optimizing the parameters of local models and parameters of the gate function. It is required to propose a metric in the space of objects, a metric in the space of models.
*'''Data:'''
*'''Data:'''
*# Synthetically generated data.
*# Synthetically generated data.
*# Energy consumption forecasting data. It is proposed to use the following models as local models: working day, day off. (Energy Consumption, Turk Electricity Consumption, German Spot Price).
*# Energy consumption forecasting data. It is proposed to use the following models as local models: working day, day off. (Energy Consumption, Turk Electricity Consumption, German Spot Price).
-
*'''References:''':
+
*'''References:'''
*# [https://github.com/andriygav/EMprior/blob/master/paper/Grabovoy2019MixtureOfExpertEng.pdf Overview of methods for estimating sample size]
*# [https://github.com/andriygav/EMprior/blob/master/paper/Grabovoy2019MixtureOfExpertEng.pdf Overview of methods for estimating sample size]
*# [http://www.machinelearning.ru/wiki/images/2/21/Voron-ML-Compositions-slides2.pdf Vorontsov's lectures on compositions]
*# [http://www.machinelearning.ru/wiki/images/2/21/Voron-ML-Compositions-slides2.pdf Vorontsov's lectures on compositions]
*# [http://www.machinelearning.ru/wiki/images/0/0d/Voron-ML-Compositions.pdf Vorontsov's lectures on compositions]
*# [http://www.machinelearning.ru/wiki/images/0/0d/Voron-ML-Compositions.pdf Vorontsov's lectures on compositions]
*# Yuksel S.E., Wilson J.N., Gader P.D. Twenty Years of Mixture of Experts. IEEE Transactions on Neural Networks and Learning Systems. 2012. Vol. 23. No. 8. P. 1177-1193.
*# Yuksel S.E., Wilson J.N., Gader P.D. Twenty Years of Mixture of Experts. IEEE Transactions on Neural Networks and Learning Systems. 2012. Vol. 23. No. 8. P. 1177-1193.
-
*# [https://sourceforge.net/p/mlalgorithms/code/HEAD/tree/MSThesis/Pavlov2012/ Pavlov K.V. Selection of multilevel models in classification tasks, 2012]
+
*# [https://sourceforge.net/p/mlalgorithms/code/HEAD/tree/MSThesis/Pavlov2012/ Pavlov K.V. Selection of multilevel models in classification problems, 2012]
*'''Basic algorithm''': As a basic algorithm, it is proposed to use a two-level optimization problem, where local models are optimized at one iteration and at the next iteration, the parameters of the gate function are optimized.
*'''Basic algorithm''': As a basic algorithm, it is proposed to use a two-level optimization problem, where local models are optimized at one iteration and at the next iteration, the parameters of the gate function are optimized.
-
*'''Authors:''' Grabovoi A.V. (consultant), Strizhov V.V. (Expert)
+
*'''Authors:''' Grabovoi A.V. (consultant), Strijov V.V. (Expert)
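The alternating scheme named in the basic algorithm can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and not part of the task statement: two local linear models on one-dimensional data, a logistic gate function, a squared-error loss, plain gradient steps, and the names <code>predict</code>, <code>step_models</code>, <code>step_gate</code>. The loop updates the local models with the gate fixed and then the gate with the models fixed.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def predict(x, models, gate):
    """Convex combination of two local linear models weighted by the gate."""
    p = sigmoid(gate[0] * x + gate[1])  # weight of model 0; model 1 gets 1 - p
    f0 = models[0][0] * x + models[0][1]
    f1 = models[1][0] * x + models[1][1]
    return p * f0 + (1.0 - p) * f1, p, f0, f1


def mse(xs, ys, models, gate):
    return sum((predict(x, models, gate)[0] - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def step_models(xs, ys, models, gate, lr):
    """One gradient step on the local model parameters, gate fixed."""
    n = len(xs)
    grad = [[0.0, 0.0], [0.0, 0.0]]
    for x, y in zip(xs, ys):
        f, p, _, _ = predict(x, models, gate)
        r = 2.0 * (f - y) / n
        grad[0][0] += r * p * x
        grad[0][1] += r * p
        grad[1][0] += r * (1.0 - p) * x
        grad[1][1] += r * (1.0 - p)
    return [[w - lr * g for w, g in zip(m, gm)] for m, gm in zip(models, grad)]


def step_gate(xs, ys, models, gate, lr):
    """One gradient step on the gate parameters, local models fixed."""
    n = len(xs)
    grad = [0.0, 0.0]
    for x, y in zip(xs, ys):
        f, p, f0, f1 = predict(x, models, gate)
        r = 2.0 * (f - y) * (f0 - f1) * p * (1.0 - p) / n
        grad[0] += r * x
        grad[1] += r
    return [w - lr * g for w, g in zip(gate, grad)]


# Synthetic piecewise-linear data: slope -1 for x < 0, slope 2 for x >= 0.
xs = [k / 10.0 for k in range(-10, 11)]
ys = [-x if x < 0 else 2.0 * x for x in xs]

models = [[0.0, 0.0], [0.0, 0.0]]  # [slope, intercept] of each local model
gate = [1.0, 0.0]                  # non-zero slope breaks the symmetry between models
initial_loss = mse(xs, ys, models, gate)
for _ in range(300):
    models = step_models(xs, ys, models, gate, lr=0.2)
    gate = step_gate(xs, ys, models, gate, lr=0.1)
final_loss = mse(xs, ys, models, gate)
print(initial_loss, final_loss)
```

With more than two local models the logistic gate would be replaced by a softmax, which keeps the coefficients non-negative and summing to one.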
-
===Task 54 ===
+
===54. 2020===
-
* '''Name:''' Finding the pupil in the eye image using the brightness projection method.
+
* '''Title:''' Finding the pupil in the eye image using the brightness projection method.
-
* '''Task''': Given a monochrome bitmap of the eye, see examples (https://cloud.mail.ru/public/eaou/4JSamfmrh).
+
* '''Problem:''' Given a monochrome bitmap of the eye, see examples (https://cloud.mail.ru/public/eaou/4JSamfmrh).
It is necessary to determine the approximate coordinates of the center of the pupil. The word "approximate" means that the calculated pupil center must lie inside a circle centered at the pupil's true center with a radius equal to half the true pupil radius. The algorithm must be very fast.
It is necessary to determine the approximate coordinates of the center of the pupil. The word "approximate" means that the calculated pupil center must lie inside a circle centered at the pupil's true center with a radius equal to half the true pupil radius. The algorithm must be very fast.
* '''Data:''' About 200 thousand eye images. For each, the position of the true circle is marked - for the purpose of training and testing the method being created.
* '''Data:''' About 200 thousand eye images. For each, the position of the true circle is marked - for the purpose of training and testing the method being created.
-
* '''Basic algorithm:''' To speed up image processing, it is proposed to aggregate the data using brightness projections. Image brightness is a function of two discrete arguments I(x, y). Its projection onto the horizontal axis is P(x)=\sum \limits_y I(x,y). Projections onto inclined axes are constructed similarly. Given several projections (two or four), one can try to determine the position of the pupil (a compact dark area) using heuristics and/or a neural network. It is of interest to evaluate the capabilities of a neural network in this task.
+
* '''Base algorithm:''' To speed up image processing, it is proposed to aggregate the data using brightness projections. Image brightness is a function of two discrete arguments I(x, y). Its projection onto the horizontal axis is P(x)=\sum \limits_y I(x,y). Projections onto inclined axes are constructed similarly. Given several projections (two or four), one can try to determine the position of the pupil (a compact dark area) using heuristics and/or a neural network. It is of interest to evaluate the capabilities of a neural network in this problem.
-
* '''References:''': Zhi-Hua Zhou, Xin Geng. Projection functions for eye detection // Pattern Recognition. 2004. V.37. N.5. P.1049-1056. https://doi.org/10.1016/j.patcog.2003.09.006
+
* '''References:''' Zhi-Hua Zhou, Xin Geng. Projection functions for eye detection // Pattern Recognition. 2004. V.37. N.5. P.1049-1056. https://doi.org/10.1016/j.patcog.2003.09.006
* '''Authors:''' Matveev I.A.
* '''Authors:''' Matveev I.A.
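The brightness projection P(x) defined in the base algorithm can be sketched as follows. This is only an illustration on a tiny synthetic image: the function names and the "darkest column and row" heuristic are assumptions introduced here, not the method prescribed by the task.

```python
def brightness_projections(image):
    """Return (P_x, P_y): column sums and row sums of a 2-D brightness array."""
    height = len(image)
    width = len(image[0])
    # P(x) = sum over y of I(x, y): one value per column.
    p_x = [sum(image[y][x] for y in range(height)) for x in range(width)]
    # Projection onto the vertical axis: one value per row.
    p_y = [sum(row) for row in image]
    return p_x, p_y


def estimate_pupil_center(image):
    """Hypothetical heuristic: take the darkest column and the darkest row."""
    p_x, p_y = brightness_projections(image)
    return p_x.index(min(p_x)), p_y.index(min(p_y))


# Synthetic 8x8 image: bright background with a dark 3x3 block centered at x=5, y=2.
image = [[200] * 8 for _ in range(8)]
for y in range(1, 4):
    for x in range(4, 7):
        image[y][x] = 10
image[2][5] = 0  # the darkest pixel makes the minimum of each projection unique

center = estimate_pupil_center(image)
print(center)
```

Each projection costs one pass over the image, after which the search works on two short one-dimensional arrays instead of the full bitmap, which is where the speed-up comes from.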
-
===Task 55 ===
+
===55. 2020===
-
* '''Name:''' Search for the boundaries of the iris by the method of circular projections
+
* '''Title:''' Search for the boundaries of the iris by the method of circular projections
-
* '''Task''': Given a monochrome bitmap of the eye, see examples (https://cloud.mail.ru/public/2DBu/5c6F6e3LC). The approximate position of the center of the pupil is also known. The word "approximate" means that the calculated center of the pupil is no more than half of its true radius from the true one. It is necessary to determine the approximate positions of the circles approximating the pupil and iris. The algorithm must be very fast.
+
* '''Problem:''' Given a monochrome bitmap of the eye, see examples (https://cloud.mail.ru/public/2DBu/5c6F6e3LC). The approximate position of the center of the pupil is also known. The word "approximate" means that the calculated center of the pupil is no more than half of its true radius from the true one. It is necessary to determine the approximate positions of the circles approximating the pupil and iris. The algorithm must be very fast.
* '''Data:''' About 200 thousand eye images. For each, the position of the true circle is marked - for the purpose of training and testing the method being created.
* '''Data:''' About 200 thousand eye images. For each, the position of the true circle is marked - for the purpose of training and testing the method being created.
-
* '''Basic algorithm:''' To speed up image processing, it is proposed to aggregate the data using circular projections of brightness. A circular projection is a function of the radius whose value P(r) equals the integral of the directed image brightness gradient over a circle of radius r (or along an arc of that circle). Example for one arc (right quadrant) and for four arcs. Given several circular projections, one can try to determine the positions of the inner and outer borders of the iris (a ring) using heuristics and/or a neural network. It is of interest to evaluate the capabilities of a neural network in this task.
+
* '''Base algorithm:''' To speed up image processing, it is proposed to aggregate the data using circular projections of brightness. A circular projection is a function of the radius whose value P(r) equals the integral of the directed image brightness gradient over a circle of radius r (or along an arc of that circle). Example for one arc (right quadrant) and for four arcs. Given several circular projections, one can try to determine the positions of the inner and outer borders of the iris (a ring) using heuristics and/or a neural network. It is of interest to evaluate the capabilities of a neural network in this problem.
-
* '''References:''': Matveev I.A. Detection of Iris in Image By Interrelated Maxima of Brightness Gradient Projections // Applied and Computational Mathematics. 2010. V.9. N.2. P.252-257. https://www.researchgate.net/publication/228396639_Detection_of_iris_in_image_by_interrelated_maxima_of_brightness_gradient_projections
+
* '''References:''' Matveev I.A. Detection of Iris in Image By Interrelated Maxima of Brightness Gradient Projections // Applied and Computational Mathematics. 2010. V.9. N.2. P.252-257. https://www.researchgate.net/publication/228396639_Detection_of_iris_in_image_by_interrelated_maxima_of_brightness_gradient_projections
* '''Authors:''' Matveev I.A.
* '''Authors:''' Matveev I.A.
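A circular projection can be approximated discretely as in the sketch below. The details are assumptions made for illustration: the radial gradient is replaced by a finite difference between brightness just outside and just inside the circle, pixels are sampled by nearest-neighbor rounding, and the synthetic eye (dark pupil of radius 6, iris up to radius 14) and the search windows are invented for the example.

```python
import math


def circular_projection(image, cx, cy, max_radius, n_angles=64):
    """P(r) for r = 1..max_radius-1: mean outward brightness gradient over a circle."""
    height, width = len(image), len(image[0])

    def sample(r, angle):
        # Nearest-pixel sampling, clamped to the image bounds.
        x = min(max(int(round(cx + r * math.cos(angle))), 0), width - 1)
        y = min(max(int(round(cy + r * math.sin(angle))), 0), height - 1)
        return image[y][x]

    projection = []
    for r in range(1, max_radius):
        total = 0.0
        for k in range(n_angles):
            angle = 2.0 * math.pi * k / n_angles
            # Radial finite difference: brightness just outside minus just inside.
            total += sample(r + 1, angle) - sample(r - 1, angle)
        projection.append(total / n_angles)
    return projection


# Synthetic eye: pupil (brightness 20), iris (100), sclera (220), centered at (20, 20).
size, c = 41, 20
image = []
for y in range(size):
    row = []
    for x in range(size):
        d2 = (x - c) ** 2 + (y - c) ** 2
        row.append(20 if d2 <= 36 else 100 if d2 <= 196 else 220)
    image.append(row)

projection = circular_projection(image, c, c, max_radius=19)
# projection[r - 1] holds P(r); search each border in its own radius window.
inner_radius = max(range(1, 10), key=lambda r: projection[r - 1])
outer_radius = max(range(10, 19), key=lambda r: projection[r - 1])
print(inner_radius, outer_radius)
```

Restricting the angle range in the inner loop gives the projection along an arc (for example, the right quadrant) instead of the full circle.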
-
===Task 56 ===
+
===56. 2020===
-
* '''Name:''' Construction of local and universal interpretable scoring models
+
* '''Title:''' Construction of local and universal interpretable scoring models
-
* '''Task''': Build a simple and interpretable scoring system as a superposition of local models, taking into account the requirements for the system to retain knowledge about key customers and features (in other words, take into account new economic phenomena). The model must be a superposition, and each element must be controlled by its own quality criterion. Introduce a schedule for optimizing the structure and parameters of the model: the system must work in a single optimization chain. Propose an algorithm for selecting features and objects.
+
* '''Problem:''' Build a simple and interpretable scoring system as a superposition of local models, taking into account the requirements for the system to retain knowledge about key customers and features (in other words, take into account new economic phenomena). The model must be a superposition, and each element must be controlled by its own quality criterion. Introduce a schedule for optimizing the structure and parameters of the model: the system must work in a single optimization chain. Propose an algorithm for selecting features and objects.
* '''Data:'''
* '''Data:'''
# Data from OTP Bank. The sample contains records of 15,223 clients classified into two classes: 1 - there was a response (1812 clients), 0 - there was no response (13411 clients). Feature descriptions of clients consist of 50 features, which include, in particular, age, gender, social status in relation to work, social status in relation to pension, number of children, number of dependents, education, marital status, branch of work. The data are available at the following addresses: www.machinelearning.ru/wiki/images/2/26/Contest_MMRO15_OTP.rar (sample A), www.machinelearning.ru/wiki/images/5/52/Contest_MMRO15_OTP_(validation).rar (sample B).
# Data from OTP Bank. The sample contains records of 15,223 clients classified into two classes: 1 - there was a response (1812 clients), 0 - there was no response (13411 clients). Feature descriptions of clients consist of 50 features, which include, in particular, age, gender, social status in relation to work, social status in relation to pension, number of children, number of dependents, education, marital status, branch of work. The data are available at the following addresses: www.machinelearning.ru/wiki/images/2/26/Contest_MMRO15_OTP.rar (sample A), www.machinelearning.ru/wiki/images/5/52/Contest_MMRO15_OTP_(validation).rar (sample B).
# Data from Home Credit: https://www.kaggle.com/c/home-credit-default-risk/data
# Data from Home Credit: https://www.kaggle.com/c/home-credit-default-risk/data
-
* '''References:''':
+
* '''References:'''
-
# Strijov V.V. Error function in regression analysis // Factory Laboratory, 2013, 79(5) : 65-73
+
*# Strijov V.V. Error function in regression analysis // Factory Laboratory, 2013, 79(5) : 65-73
-
# Bishop C. M. Linear models for classification / In: Pattern Recognition and Machine Learning. Eds.: M. Jordan, J. Kleinberg, B. Scholkopf. New York: Springer Science+Business Media, 2006, pp. 203-208
+
*# Bishop C. M. Linear models for classification / In: Pattern Recognition and Machine Learning. Eds.: M. Jordan, J. Kleinberg, B. Scholkopf. New York: Springer Science+Business Media, 2006, pp. 203-208
-
# Tokmakova A.A. Obtaining Stable Hyperparameter Estimates for Linear Regression Models // Machine Learning and Data Analysis. 2011. No. 2. P. 140-155
+
*# Tokmakova A.A. Obtaining Stable Hyperparameter Estimates for Linear Regression Models // Machine Learning and Data Analysis. 2011. No. 2. P. 140-155
-
# S. Scitovski and N. Sarlija. Cluster analysis in retail segmentation for credit scoring // CRORR 5. 2014. 235–245
+
*# S. Scitovski and N. Sarlija. Cluster analysis in retail segmentation for credit scoring // CRORR 5. 2014. 235–245
-
# Goncharov A.V. Building Interpretable Deep Learning Models in the Social Ranking Problem
+
*# Goncharov A.V. Building Interpretable Deep Learning Models in the Social Ranking Problem
-
* '''Basic algorithm:''' Iterative weighted least squares (described in (2))
+
* '''Base algorithm:''' Iterative weighted least squares (described in (2))
* '''Solution:''' It is proposed to build a scoring system whose preprocessing stage includes a block that generates metric features. The influence of the unequal value of objects on feature selection is to be investigated, along with the joint selection of features and objects when building a model. It is required to implement a schedule for optimizing the model structure using an algorithm based on the analysis of covariance matrices of model hyperparameters. The schedule includes a phased replenishment of the set of features and objects. The size of the feature set will be determined by controlling the error variance. The main quality criterion for the system is ROC AUC (Gini).
* '''Solution:''' It is proposed to build a scoring system whose preprocessing stage includes a block that generates metric features. The influence of the unequal value of objects on feature selection is to be investigated, along with the joint selection of features and objects when building a model. It is required to implement a schedule for optimizing the model structure using an algorithm based on the analysis of covariance matrices of model hyperparameters. The schedule includes a phased replenishment of the set of features and objects. The size of the feature set will be determined by controlling the error variance. The main quality criterion for the system is ROC AUC (Gini).
* '''Novelty:'''
* '''Novelty:'''
# The model structure optimization schedule must satisfy the requirement to rebuild the model at any time without losing its characteristics.
# The model structure optimization schedule must satisfy the requirement to rebuild the model at any time without losing its characteristics.
# Accounting for the unequal value of objects in the selection of features
# Accounting for the unequal value of objects in the selection of features
-
* '''Authors:''' Pugaeva I.V. (consultant), Strizhov V.V. (Expert)
+
* '''Authors:''' Pugaeva I.V. (consultant), Strijov V.V. (Expert)
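The base algorithm named above, iterative weighted least squares, can be illustrated with a minimal sketch. It assumes a logistic scoring model with an intercept and a single feature, which the problem statement does not specify; the function names and the toy data are hypothetical. Each iteration solves the weighted normal equations, which for this model is the same as a Newton step.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def irls_logistic(xs, ys, n_iter=25):
    """Fit p(y=1|x) = sigmoid(b0 + b1*x) by iteratively reweighted least squares.

    Assumption: one feature plus intercept, so the normal equations are 2x2.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        ps = [sigmoid(b0 + b1 * x) for x in xs]
        ws = [p * (1.0 - p) for p in ps]  # per-object IRLS weights
        # Entries of X^T W X for the design matrix with rows [1, x]
        a00 = sum(ws)
        a01 = sum(w * x for w, x in zip(ws, xs))
        a11 = sum(w * x * x for w, x in zip(ws, xs))
        # Right-hand side X^T (y - p)
        g0 = sum(y - p for y, p in zip(ys, ps))
        g1 = sum((y - p) * x for y, p, x in zip(ys, ps, xs))
        # Solve the 2x2 system by Cramer's rule and update the parameters
        det = a00 * a11 - a01 * a01
        b0 += (g0 * a11 - a01 * g1) / det
        b1 += (a00 * g1 - a01 * g0) / det
    return b0, b1


def score_gradient(xs, ys, b0, b1):
    """Gradient of the log-likelihood; it vanishes at the fitted parameters."""
    ps = [sigmoid(b0 + b1 * x) for x in xs]
    g0 = sum(y - p for y, p in zip(ys, ps))
    g1 = sum((y - p) * x for y, p, x in zip(ys, ps, xs))
    return g0, g1


# Toy, non-separable sample (hypothetical data)
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0, 0, 1, 0, 1, 1]
b0, b1 = irls_logistic(xs, ys)
print(b0, b1, score_gradient(xs, ys, b0, b1))
```

The sketch stops after a fixed number of iterations. On separable data the weights tend to zero and the 2x2 system becomes ill-conditioned, so a real scoring pipeline would need regularization or a convergence check.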
-
===Task 59 ===
+
===59. 2020===
* Name: Distributed optimization under Polyak-Łojasiewicz conditions
* Name: Distributed optimization under Polyak-Łojasiewicz conditions
-
* Task: The task is to efficiently solve large systems of nonlinear equations using a network of computing nodes.
+
* '''Problem description:''' The problem is to efficiently solve large systems of nonlinear equations using a network of computing nodes.
-
* Solution: A new method for the decentralized distributed solution of systems of nonlinear equations under the Polyak-Łojasiewicz condition is proposed. The approach is based on the fact that the distributed optimization problem can be represented as a composite optimization problem (see 2 from the literature), which in turn can be solved by analogs of the similar triangles or sliding method (see 2 from the literature).
+
* '''Solution:''' A new method for the decentralized distributed solution of systems of nonlinear equations under the Polyak-Łojasiewicz condition is proposed. The approach is based on the fact that the distributed optimization problem can be represented as a composite optimization problem (see 2 from the literature), which in turn can be solved by analogs of the similar triangles or sliding method (see 2 from the literature).
* Basic algorithm: The proposed method is compared with gradient descent and accelerated gradient descent
* Basic algorithm: The proposed method is compared with gradient descent and accelerated gradient descent
-
* References:
+
* '''References:'''
-
# Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition https://arxiv.org/pdf/1608.04636.pdf
+
*# Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition https://arxiv.org/pdf/1608.04636.pdf
-
# Linear Convergence for Distributed Optimization Under the Polyak-Łojasiewicz Condition https://arxiv.org/pdf/1912.12110.pdf
+
*# Linear Convergence for Distributed Optimization Under the Polyak-Łojasiewicz Condition https://arxiv.org/pdf/1912.12110.pdf
-
# Optimal Decentralized Distributed Algorithms for Stochastic Convex Optimization https://arxiv.org/pdf/1911.07363.pdf
+
*# Optimal Decentralized Distributed Algorithms for Stochastic Convex Optimization https://arxiv.org/pdf/1911.07363.pdf
-
# Modern numerical optimization methods, universal gradient descent method https://arxiv.org/ftp/arxiv/papers/1711/1711.00394.pdf
+
*# Modern numerical optimization methods, universal gradient descent method https://arxiv.org/ftp/arxiv/papers/1711/1711.00394.pdf
-
* Novelty: Reduction of a distributed optimization problem to a composite optimization problem and its solution under Polyak-Łojasiewicz conditions
+
* '''Novelty:''' Reduction of a distributed optimization problem to a composite optimization problem and its solution under Polyak-Łojasiewicz conditions
-
* Authors: Expert A.V. Gasnikov, consultant A.N. Beznosikov
+
* '''Authors:''' Expert A.V. Gasnikov, consultant A.N. Beznosikov
-
* '''Comment: it is important to set up a computational experiment in this task, otherwise the task will be poorly compatible with the course.'''
+
* '''Comment: it is important to set up a computational experiment in this problem, otherwise the problem will be poorly compatible with the course.'''
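As a starting point for the computational experiment, the gradient descent baseline can be sketched on a single node. A function f with minimum value f* satisfies the Polyak-Łojasiewicz (PL) condition with constant mu if 0.5 * |f'(x)|^2 >= mu * (f(x) - f*) for all x. The test function x^2 + 3 sin^2(x) and the constant mu = 1/32 are quoted from memory of the first reference above and should be treated as assumptions; the code checks the inequality numerically on a grid rather than relying on it. All names are hypothetical.

```python
import math


def f(x):
    # Non-convex test function with a single stationary point at x = 0, f* = 0
    return x * x + 3.0 * math.sin(x) ** 2


def grad_f(x):
    # d/dx [x^2 + 3 sin^2 x] = 2x + 3 sin(2x)
    return 2.0 * x + 3.0 * math.sin(2.0 * x)


L_SMOOTH = 8.0       # |f''(x)| = |2 + 6 cos(2x)| <= 8
MU_PL = 1.0 / 32.0   # assumed PL constant, verified on a grid below


def pl_holds(x):
    """Check the PL inequality 0.5 * f'(x)^2 >= mu * (f(x) - f*) with f* = 0."""
    return 0.5 * grad_f(x) ** 2 >= MU_PL * f(x)


def gradient_descent(x0, n_iter=200):
    """Plain gradient descent with the step size 1/L."""
    x = x0
    values = [f(x)]
    for _ in range(n_iter):
        x -= grad_f(x) / L_SMOOTH
        values.append(f(x))
    return x, values


x_final, values = gradient_descent(3.0)
print(x_final, values[0], values[-1])
print(all(pl_holds(0.01 * i) for i in range(-1000, 1001)))
```

Under the PL condition with step size 1/L, the function values are expected to decrease at least as fast as (1 - mu/L)^k, even though f is not convex. A distributed version would replace `grad_f` with gradients aggregated over the network; that part is not sketched here.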
-
=== Task 17 ===
+
===17. 2020===
-
* '''Name:''' Intention forecasting. Investigation of the properties of local models in the spatial decoding of brain signals
+
* '''Title:''' Intention forecasting. Investigation of the properties of local models in the spatial decoding of brain signals
-
* '''Task''': When building brain-computer interface systems, simple, stable models are used. An important stage in the construction of such a model is the construction of an adequate feature space. Previously, such a Task was solved by extracting features from the frequency characteristics of signals.
+
* '''Problem:''' When building brain-computer interface systems, simple, stable models are used. An important stage in the construction of such a model is the construction of an adequate feature space. Previously, this problem was solved by extracting features from the frequency characteristics of signals.
* '''Data:''' ECoG/EEG brain signal data sets.
* '''Data:''' ECoG/EEG brain signal data sets.
-
* '''References:''':
+
* '''References:'''
*# Motrenko A.P., Strijov V.V. Multi-way feature selection for ECoG-based brain-computer Interface // Expert systems with applications. - 2018.
*# Motrenko A.P., Strijov V.V. Multi-way feature selection for ECoG-based brain-computer Interface // Expert systems with applications. - 2018.
*# Eliseyev A., Aksenova T. Stable and artifact-resistant decoding of 3D hand trajectories from ECoG signals using the generalized additive model //Journal of neural engineering. – 2014.
*# Eliseyev A., Aksenova T. Stable and artifact-resistant decoding of 3D hand trajectories from ECoG signals using the generalized additive model //Journal of neural engineering. – 2014.
Line 787: Line 1419:
* '''Solution:''' In this paper, it is proposed to take into account the spatial dependence between sensors that read data. To do this, it is necessary to locally model the spatial impulse/signal and build a predictive model based on the local description.
* '''Solution:''' In this paper, it is proposed to take into account the spatial dependence between sensors that read data. To do this, it is necessary to locally model the spatial impulse/signal and build a predictive model based on the local description.
* '''Novelty:''' An essentially new way of constructing a feature description in the problem of signal decoding is proposed. Bonus: analysis of changes in the structure of the model, adaptation of the structure when the sample changes.
* '''Novelty:''' An essentially new way of constructing a feature description in the problem of signal decoding is proposed. Bonus: analysis of changes in the structure of the model, adaptation of the structure when the sample changes.
-
* '''Authors:''' Strizhov V.V., Roman Isachenko - Experts, consultants – Valery Markin, Alina Samokhina
+
* '''Authors:''' Strijov V.V., Roman Isachenko - Experts, consultants – Valery Markin, Alina Samokhina
-
===Task 9 ===
+
===9. 2020===
-
* '''Name:''' Text recognition based on skeletal representation of thick lines and convolutional networks
+
* '''Title:''' Text recognition based on skeletal representation of thick lines and convolutional networks
-
* '''Task''': It is required to build two CNNs, one recognizes a raster representation of an image, the other a vector one.
+
* '''Problem:''' It is required to build two CNNs, one recognizes a raster representation of an image, the other a vector one.
* '''Data:''' Fonts in raster representation.
* '''Data:''' Fonts in raster representation.
-
* '''References:''':List of works [http://www.machinelearning.ru/wiki/images/a/a2/Morozov2017Synthesis_of_medicines.pdf], in particular arXiv:1611.03199 and
+
* '''References:''' List of works [http://www.machinelearning.ru/wiki/images/a/a2/Morozov2017Synthesis_of_medicines.pdf], in particular arXiv:1611.03199 and
-
** Goyal P., Ferrara E. Graph embedding techniques, applications, and performance: A survey. arXiv:1705.02801, 2017.
+
*# Goyal P., Ferrara E. Graph embedding techniques, applications, and performance: A survey. arXiv:1705.02801, 2017.
-
** Cai H., Zheng V.W., Chang K.C.-C. A comprehensive survey of graph embedding: Problems, techniques and applications. arXiv:1709.07604, 2017.
+
*# Cai H., Zheng V.W., Chang K.C.-C. A comprehensive survey of graph embedding: Problems, techniques and applications. arXiv:1709.07604, 2017.
-
** Grover A., Leskovec J. node2vec: Scalable Feature Learning for Networks. arXiv:1607.00653, 2016.
+
*# Grover A., Leskovec J. node2vec: Scalable Feature Learning for Networks. arXiv:1607.00653, 2016.
-
** Mestetskiy L., Semenov A. Binary Image Skeleton - Continuous Approach // Proceedings 3rd International Conference on Computer Vision Theory and Applications, VISAPP 2008. P. 251-258. [https://www.researchgate.net/publication/221415333_Binary_Image_Skeleton_-_Continuous_Approach URL]
+
*# Mestetskiy L., Semenov A. Binary Image Skeleton - Continuous Approach // Proceedings 3rd International Conference on Computer Vision Theory and Applications, VISAPP 2008. P. 251-258. [https://www.researchgate.net/publication/221415333_Binary_Image_Skeleton_-_Continuous_Approach URL]
-
** Kushnir O.A., Seredin O.S., Stepanov A.V. Experimental study of regularization parameters and approximation of skeletal graphs of binary images // Machine Learning and Data Analysis. 2014. Vol. 1. No. 7. pp. 817-827. [http://jmlda.org/papers/doc/2014/no7/Kushnir2014ParametersResearch.pdf URL]
+
*# Kushnir O.A., Seredin O.S., Stepanov A.V. Experimental study of regularization parameters and approximation of skeletal graphs of binary images // Machine Learning and Data Analysis. 2014. Vol. 1. No. 7. pp. 817-827. [http://jmlda.org/papers/doc/2014/no7/Kushnir2014ParametersResearch.pdf URL]
-
** Zhukova K.V., Reyer I.A. Basic Skeleton Connectivity and Parametric Shape Descriptor // Machine Learning and Data Analysis. 2014. Vol. 1. No. 10. pp. 1354-1368. [http://jmlda.org/papers/doc/2014/no10/Reyer2014SkeletonConnectivity.pdf URL]
+
*# Zhukova K.V., Reyer I.A. Basic Skeleton Connectivity and Parametric Shape Descriptor // Machine Learning and Data Analysis. 2014. Vol. 1. No. 10. pp. 1354-1368. [http://jmlda.org/papers/doc/2014/no10/Reyer2014SkeletonConnectivity.pdf URL]
-
** Kushnir O., Seredin O. Shape Matching Based on Skeletonization and Alignment of Primitive Chains // Communications in Computer and Information Science. 2015. V. 542. P. 123-136. [https://link.springer.com/chapter/10.1007/978-3-319-26123-2_12 URL]
+
*# Kushnir O., Seredin O. Shape Matching Based on Skeletonization and Alignment of Primitive Chains // Communications in Computer and Information Science. 2015. V. 542. P. 123-136. [https://link.springer.com/chapter/10.1007/978-3-319-26123-2_12 URL]
* '''Basic algorithm''': Convolution network for bitmap.
* '''Basic algorithm''': Convolution network for bitmap.
* '''Solution:''' It is required to propose a method for collapsing graph structures, which allows generating an informative description of the thick line skeleton.
* '''Solution:''' It is required to propose a method for collapsing graph structures, which allows generating an informative description of the thick line skeleton.
* '''Novelty:''' A method is proposed for improving the quality of recognition of thick lines due to a new method for generating their descriptions.
* '''Novelty:''' A method is proposed for improving the quality of recognition of thick lines due to a new method for generating their descriptions.
-
* '''Authors:''' Experts Reyer I.A., Strizhov V.V., Mark Potanin, consultant Denis Ozherelkov
+
* '''Authors:''' Experts Reyer I.A., Strijov V.V., Mark Potanin, consultant Denis Ozherelkov
-
=== Task 60 ===
+
===60. 2020===
-
* '''Name:''' Variational optimization of deep learning models with model complexity control
+
* '''Title:''' Variational optimization of deep learning models with model complexity control
-
* '''Task''': The task of optimizing a deep learning model with a predetermined model complexity is considered. It is required to propose a model optimization method that allows generating new models with a given complexity and low computational costs.
+
* '''Problem:''' The problem of optimizing a deep learning model with a predetermined model complexity is considered. It is required to propose a model optimization method that allows generating new models with a given complexity and low computational costs.
* '''Data:''' MNIST, CIFAR
* '''Data:''' MNIST, CIFAR
* '''References:'''
* '''References:'''
-
** [1] variational inference for neural networks https://papers.nips.cc/paper/4329-practical-variational-inference-for-neural-networks.pdf
+
*# [1] variational inference for neural networks https://papers.nips.cc/paper/4329-practical-variational-inference-for-neural-networks.pdf
-
** [2] hypernets https://arxiv.org/abs/1609.09106
+
*# [2] hypernets https://arxiv.org/abs/1609.09106
-
** [3] neural fabrics https://papers.nips.cc/paper/6304-convolutional-neural-fabrics.pdf
+
*# [3] neural fabrics https://papers.nips.cc/paper/6304-convolutional-neural-fabrics.pdf
-
* '''Basic algorithm:''' Random search
+
* '''Base algorithm:''' Random search
* '''Solution:''' The proposed method is to represent a deep learning model as a hypernet (a network that generates the parameters of another network) using a Bayesian approach. Probabilistic assumptions about the parameters of deep learning models are introduced, and a variational lower bound on the Bayesian model evidence is maximized. The variational bound is treated as a conditional quantity that depends on an external complexity parameter.
* '''Solution:''' The proposed method is to represent a deep learning model as a hypernet (a network that generates the parameters of another network) using a Bayesian approach. Probabilistic assumptions about the parameters of deep learning models are introduced, and a variational lower bound on the Bayesian model evidence is maximized. The variational bound is treated as a conditional quantity that depends on an external complexity parameter.
* '''Novelty:''' The proposed method allows generating models in one-shot mode (practically without retraining) with the required model complexity, which significantly reduces the cost of optimization and retraining.
* '''Novelty:''' The proposed method allows generating models in one-shot mode (practically without retraining) with the required model complexity, which significantly reduces the cost of optimization and retraining.
-
* '''Authors:''' Oleg Bakhteev, Strizhov V.V.
+
* '''Authors:''' Oleg Bakhteev, Strijov V.V.
-
=== Task 61 ===
+
===61. 2020===
-
* '''Name:''' Selecting a deep learning model based on the triplet relationship of model and sample
+
* '''Title:''' Selecting a deep learning model based on the triplet relationship of model and sample
-
* '''Task''': Task one-shot of choosing a deep learning model is considered: choosing a model for a specific sample, issued from some general population, should not be computationally expensive.
+
* '''Problem:''' The problem of one-shot selection of a deep learning model is considered: choosing a model for a specific sample drawn from some general population should not be computationally expensive.
* '''Data:''' MNIST, synthetic data
* '''Data:''' MNIST, synthetic data
* '''References:'''
* '''References:'''
-
** [1] learning model predictions on pairs <sample, model> https://www.ri.cmu.edu/pub_files/2016/10/yuxiongw_eccv16_learntolearn.pdf
+
*# [1] learning model predictions on pairs <sample, model> https://www.ri.cmu.edu/pub_files/2016/10/yuxiongw_eccv16_learntolearn.pdf
-
** [2] Bayesian choice for two domains https://arxiv.org/abs/1806.08672
+
*# [2] Bayesian choice for two domains https://arxiv.org/abs/1806.08672
-
* '''Basic algorithm:''' Random search
+
* '''Base algorithm:''' Random search
* '''Solution:''' It is proposed to consider the space of parameters and the space of models as two domains with their own generative models. To obtain a connection between the domains, a generalization of variational inference to the case of triplet constraints is used.
* '''Solution:''' It is proposed to consider the space of parameters and the space of models as two domains with their own generative models. To obtain a connection between the domains, a generalization of variational inference to the case of triplet constraints is used.
* '''Novelty:''' New one-shot model training method
* '''Novelty:''' New one-shot model training method
-
* '''Authors:''' Oleg Bakhteev, Strizhov V.V.
+
* '''Authors:''' Oleg Bakhteev, Strijov V.V.
-
=== Task 64===
+
===64. 2020===
-
* '''Name:''' Theoretical validity of the application of metric classification methods using dynamic alignment (DTW) to spatiotemporal objects.
+
* '''Title:''' Theoretical validity of the application of metric classification methods using dynamic alignment (DTW) to spatiotemporal objects.
-
* '''Task:''' It is necessary to study the existing theoretical justifications for applying dynamic alignment methods to various objects, and explore the use of such methods for space-time series.<br />To prove that alignment methods are applicable, one shows that the function generated by the dynamic alignment algorithm is a kernel, which in turn justifies the use of metric classification methods.
+
* '''Problem description:''' It is necessary to study the existing theoretical justifications for applying dynamic alignment methods to various objects, and explore the use of such methods for space-time series.<br />To prove that alignment methods are applicable, one shows that the function generated by the dynamic alignment algorithm is a kernel, which in turn justifies the use of metric classification methods.
* '''References:'''
* '''References:'''
-
**[https://www.cs.unm.edu/~mueen/DTW.pdf Overview presentation about DTW]
+
*# [https://www.cs.unm.edu/~mueen/DTW.pdf Overview presentation about DTW]
-
**[http://www.machinelearning.ru/wiki/index.php?title=Теорема_Мерсера Mercer's theorem]
+
*# [http://www.machinelearning.ru/wiki/index.php?title=Теорема_Мерсера Mercer's theorem]
-
**[https://www.researchgate.net/profile/Vincent_Wan/publication/221478420_Polynomial_dynamic_time_warping_kernel_support_vector_machines_for_dysarthric_speech_recognition_with_sparse_training_data/links/09e4150b7256b621ac000000/Polynomial-dynamic-time-warping-kernel-support-vector-machines-for-dysarthric-speech-recognition-with-sparse-training-data.pdf Polynomial dynamic time warping kernel support vector machines for dysarthric speech recognition with sparse training data]
+
*# [https://www.researchgate.net/profile/Vincent_Wan/publication/221478420_Polynomial_dynamic_time_warping_kernel_support_vector_machines_for_dysarthric_speech_recognition_with_sparse_training_data/links/09e4150b7256b621ac000000/Polynomial-dynamic-time-warping-kernel-support-vector-machines-for-dysarthric-speech-recognition-with-sparse-training-data.pdf Polynomial dynamic time warping kernel support vector machines for dysarthric speech recognition with sparse training data]
-
**[https://link.springer.com/content/pdf/10.1007/11608288_67.pdf Online Signature Verification with New Time Series Kernels for Support Vector Machines]
+
*# [https://link.springer.com/content/pdf/10.1007/11608288_67.pdf Online Signature Verification with New Time Series Kernels for Support Vector Machines]
* '''Solution:''' For different formulations of the DTW method (when the internal distance function between time series samples differs), find the proofs that the function is a kernel and collect them in one place.<br />For a basic set of time series datasets (on which the accuracy of distance functions is checked), check that the conditions of Mercer's theorem hold (positive definiteness of the matrix). Do this for various modifications of the DTW distance function (Sakoe-Chiba band, Itakura band, weighted DTW).
* '''Solution:''' For different formulations of the DTW method (when the internal distance function between time series samples differs), find the proofs that the function is a kernel and collect them in one place.<br />For a basic set of time series datasets (on which the accuracy of distance functions is checked), check that the conditions of Mercer's theorem hold (positive definiteness of the matrix). Do this for various modifications of the DTW distance function (Sakoe-Chiba band, Itakura band, weighted DTW).
* '''Novelty:''' Investigation of theoretical justifications for applying the dynamic alignment algorithm (DTW) and its modifications to space-time series.
* '''Novelty:''' Investigation of theoretical justifications for applying the dynamic alignment algorithm (DTW) and its modifications to space-time series.
-
* '''Authors:''' Strizhov V.V. - Expert, [[Участник:Morgachev.gleb|Gleb Morgachev]], Alexey Goncharov - consultants.
+
* '''Authors:''' Strijov V.V. - Expert, [[Участник:Morgachev.gleb|Gleb Morgachev]], Alexey Goncharov - consultants.
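The empirical check described in the solution can be sketched as follows. This is a minimal illustration under stated assumptions: the absolute difference is used as the internal distance, the candidate kernel is exp(-gamma * DTW), and only the Sakoe-Chiba band is implemented (the Itakura band and weighted DTW are omitted). The function names and the parameter `gamma` are hypothetical. Positive definiteness of the Gram matrix is tested by attempting a Cholesky factorization; such a kernel is not guaranteed to pass in general, so the check is empirical and sample-dependent.

```python
import math


def dtw(a, b, window=None):
    """DTW distance with |a_i - b_j| as local cost and an optional Sakoe-Chiba band."""
    n, m = len(a), len(b)
    # The band must be at least |n - m| wide, otherwise no warping path exists
    w = max(n, m) if window is None else max(window, abs(n - m))
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def gram_matrix(series, gamma=1.0, window=None):
    """Candidate kernel matrix K_ij = exp(-gamma * DTW(x_i, x_j))."""
    return [[math.exp(-gamma * dtw(s, t, window)) for t in series] for s in series]


def is_positive_definite(K, tol=1e-10):
    """Return True if a Cholesky factorization of the symmetric matrix K succeeds."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= tol:
                    return False  # non-positive pivot: K is not positive definite
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True


# Toy time series of different lengths (hypothetical data)
series = [[0, 1, 2], [0, 1, 1, 2], [2, 1, 0], [1, 1, 1, 1]]
K = gram_matrix(series, gamma=0.5, window=2)
print(is_positive_definite(K))
```

A failed factorization on any dataset is a counterexample for that DTW variant, while a success on all datasets is only supporting evidence, not a proof.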
-
=== Task 66 ===
+
===66. 2020===
-
* '''Name:''' Agnostic neural networks
+
* '''Title:''' Agnostic neural networks
-
* '''Task:''' Introduce a metric space into the problem of automatic construction (selection) of agnostic networks.
+
* '''Problem description:''' Introduce a metric space into the problem of automatic construction (selection) of agnostic networks.
* '''Data:''' Data from the Reinforcement learning area. Preferably the type of cars on the track.
* '''Data:''' Data from the Reinforcement learning area. Preferably the type of cars on the track.
-
* '''References:''':
+
* '''References:'''
-
** (!) Kulunchakov A.S., Strijov V.V. Generation of simple structured Information Retrieval functions by genetic algorithm without stagnation // [http://strijov.com/papers/Kulunchakov2014RankingBySimpleFun.pdf Expert Systems with Applications, 2017, 85 : 221—230.]
+
*# (!) Kulunchakov A.S., Strijov V.V. Generation of simple structured Information Retrieval functions by genetic algorithm without stagnation // [http://strijov.com/papers/Kulunchakov2014RankingBySimpleFun.pdf Expert Systems with Applications, 2017, 85 : 221—230.]
-
** A. A. Varfolomeeva The choice of features when marking bibliographic lists by methods of structural learning, 2013, [http://www.machinelearning.ru/wiki/images/f/f2/Varfolomeeva2013Diploma.pdf?format=raw]
+
*# A. A. Varfolomeeva The choice of features when marking bibliographic lists by methods of structural learning, 2013, [http://www.machinelearning.ru/wiki/images/f/f2/Varfolomeeva2013Diploma.pdf?format=raw]
-
** Bin Cao, Ying Li and Jianwei Yin Measuring Similarity between Graphs Based on the Levenshtein Distance, 2012, [http://naturalspublishing.com/files/published/92cn7jm44d8wt1.pdf?format=raw]
+
*# Bin Cao, Ying Li and Jianwei Yin Measuring Similarity between Graphs Based on the Levenshtein Distance, 2012, [http://naturalspublishing.com/files/published/92cn7jm44d8wt1.pdf?format=raw]
-
** https://habr.com/ru/post/465369/
+
*# https://habr.com/ru/post/465369/
-
** https://weightagnostic.github.io/
+
*# https://weightagnostic.github.io/
-
* '''Basic algorithm:''' Networks from an archived article. Symbolic regression from an article in ESwA (you need to restore the code).
+
* '''Base algorithm:''' Networks from an archived article. Symbolic regression from an article in ESwA (you need to restore the code).
* '''Solution:''' We create a model generator in the framework of symbolic regression. We create a model generator as a variational autoencoder (we won’t have time during the course). We study the metric properties of sample spaces (Euclidean) and models (Banach). We create a GAN pair - a generator-discriminator for predicting the structures of predictive models.
* '''Solution:''' We create a model generator in the framework of symbolic regression. We create a model generator as a variational autoencoder (we won’t have time during the course). We study the metric properties of sample spaces (Euclidean) and models (Banach). We create a GAN pair - a generator-discriminator for predicting the structures of predictive models.
* '''Novelty:''' So far, no one has succeeded. This was discussed with Tommi Jaakkola when he visited us at Yandex. He hasn't succeeded yet either.
* '''Novelty:''' So far, no one has succeeded. This was discussed with Tommi Jaakkola when he visited us at Yandex. He hasn't succeeded yet either.
-
* '''Authors:''' Expert Strizhov V.V., Radoslav Neichev - consultant
+
* '''Authors:''' Expert Strijov V.V., Radoslav Neichev - consultant
-
=== Task 13 ===
+
===13. 2020===
-
* '''Name:''' Deep learning for RNA secondary structure prediction
+
* '''Title:''' Deep learning for RNA secondary structure prediction
-
* '''Task''': RNA secondary structure is an important feature that defines RNA functional properties. Its importance is illustrated by the fact that it is evolutionarily conserved and that some types of functional RNAs always have the same secondary structure; for example, all tRNAs fold into a cloverleaf. Because secondary structure often defines function, knowing the secondary structure of an RNA may help investigate the functions of novel RNA molecules. RNA folding is not as easy as DNA folding, because RNA is a single-stranded molecule that forms complicated base-pairing interactions, while DNA mostly exists as fully base-paired double helices. Current methods of RNA structure prediction rely on experimentally evaluated thermodynamic rules, but with thermodynamics alone only 80% of structures can be accurately predicted. We propose an AI-driven method for predicting RNA secondary structure inspired by the neural machine translation model.
+
* '''Problem:''' RNA secondary structure is an important feature that defines RNA functional properties. Its importance is illustrated by the fact that it is evolutionarily conserved and that some types of functional RNAs always have the same secondary structure; for example, all tRNAs fold into a cloverleaf. Because secondary structure often defines function, knowing the secondary structure of an RNA may help investigate the functions of novel RNA molecules. RNA folding is not as easy as DNA folding, because RNA is a single-stranded molecule that forms complicated base-pairing interactions, while DNA mostly exists as fully base-paired double helices. Current methods of RNA structure prediction rely on experimentally evaluated thermodynamic rules, but with thermodynamics alone only 80% of structures can be accurately predicted. We propose an AI-driven method for predicting RNA secondary structure inspired by the neural machine translation model.
* '''Data:''' RNA sequences in form of strings of characters
* '''Data:''' RNA sequences in form of strings of characters
-
* '''References:''': https://arxiv.org/abs/1609.08144
+
* '''References:''' https://arxiv.org/abs/1609.08144
-
* '''Basic algorithm:''' https://www.ncbi.nlm.nih.gov/pubmed/16873527
+
* '''Base algorithm:''' https://www.ncbi.nlm.nih.gov/pubmed/16873527
* '''Solution:''' Deep learning recurrent encoder-decoder model with attention
* '''Solution:''' Deep learning recurrent encoder-decoder model with attention
* '''Novelty:''' RNA secondary structure prediction remains an unsolved problem, and to the best of our knowledge a DL approach has never been introduced in the literature before
* '''Novelty:''' RNA secondary structure prediction remains an unsolved problem, and to the best of our knowledge a DL approach has never been introduced in the literature before
-
* '''Authors:''' consultant Maria Popova, Alexander Isaev (we are waiting for a response from them, without a response task is removed)
+
* '''Authors:''' consultant Maria Popova, Alexander Isaev (we are waiting for a response from them; without a response the problem is removed)
-
=== Task 65 ===
+
===65. 2020===
-
* '''Name:''' Approximation of low-dimensional samples by heterogeneous models
+
* '''Title:''' Approximation of low-dimensional samples by heterogeneous models
-
* '''Task:''' The problem of knowledge transfer (Hinton's distillation, Vapnik's privileged learning) from one network to another is investigated.
+
* '''Problem description:''' The problem of knowledge transfer (Hinton's distillation, Vapnik's privileged learning) from one network to another is investigated.
* '''Data:''' UCI samples, see what samples are used in papers on this topic
* '''Data:''' UCI samples, see what samples are used in papers on this topic
-
* '''References:''':
+
* '''References:'''
-
** Neichev's Diploma [http://www.machinelearning.ru/wiki/images/3/36/NeyhevMS_Thesis.pdf Informative a priori assumptions in the privileged learning problem], [http://www.machinelearning.ru/wiki/images/1/1c/NeychevMS_Slides.pdf presentation]
+
*# Neichev's Diploma [http://www.machinelearning.ru/wiki/images/3/36/NeyhevMS_Thesis.pdf Informative a priori assumptions in the privileged learning problem], [http://www.machinelearning.ru/wiki/images/1/1c/NeychevMS_Slides.pdf presentation]
-
** Hinton's works on knowledge distillation; pay attention to the error functions
+
*# Hinton's works on knowledge distillation; pay attention to the error functions
-
* '''Basic algorithm:''' described in the work of Neichev
+
* '''Base algorithm:''' described in the work of Neichev
* '''Novelty:''' Exploring different sampling methods
* '''Novelty:''' Exploring different sampling methods
* '''Solution:''' Try different models that are in the lectures, from non-parametric to deep ones, compare and visualize the likelihood functions
* '''Solution:''' Try different models that are in the lectures, from non-parametric to deep ones, compare and visualize the likelihood functions
-
* '''Authors:''' consultants Mark Potanin, (ask Andrey Grabovoi for help) Strizhov V.V.
+
* '''Authors:''' consultants Mark Potanin, (ask Andrey Grabovoi for help) Strijov V.V.
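The error function used in Hinton-style distillation, which the references above ask to pay attention to, can be sketched as a weighted sum of two cross-entropies: one against the true label and one against the teacher's temperature-softened outputs. This is a minimal sketch for a single object; the function names, the default `temperature` and `alpha`, and the toy logits are hypothetical and are not taken from the cited works.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax with temperature; higher temperature gives a flatter distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target, predicted):
    """Cross-entropy between a target distribution and a predicted distribution."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)


def distillation_loss(student_logits, teacher_logits, label, temperature=2.0, alpha=0.5):
    """alpha * CE(label, student) + (1 - alpha) * T^2 * CE(teacher_T, student_T)."""
    n = len(student_logits)
    hard_target = [1.0 if k == label else 0.0 for k in range(n)]
    hard = cross_entropy(hard_target, softmax(student_logits))
    soft = cross_entropy(
        softmax(teacher_logits, temperature),
        softmax(student_logits, temperature),
    )
    # The T^2 factor keeps the two terms on a comparable gradient scale
    return alpha * hard + (1.0 - alpha) * temperature ** 2 * soft


teacher = [4.0, 1.0, 0.0]
print(distillation_loss([4.0, 1.0, 0.0], teacher, label=0))
print(distillation_loss([0.0, 1.0, 4.0], teacher, label=0))
```

A student that agrees with the teacher and the label gets the smaller loss of the two printed values. Comparing heterogeneous student models, as the solution suggests, only requires that each of them produces logits for the same classes.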
-
=== Task 67 ===
+
===67. 2020===
-
* '''Name:''' Selection of topics in topic models for exploratory information retrieval.
+
* '''Title:''' Selection of topics in topic models for exploratory information retrieval.
-
* '''Task:''' Test the hypothesis that when searching for similar documents by their topic vectors, not all topics are informative, so discarding some topics can increase the precision and recall of the search. Consider the alternative hypothesis that instead of discarding topics, one can compare vectors by a weighted cosine proximity measure with adjustable weights.
+
* '''Problem description:''' Test the hypothesis that when searching for similar documents by their topic vectors, not all topics are informative, so discarding some topics can increase the precision and recall of the search. Consider the alternative hypothesis that instead of discarding topics, one can compare vectors by a weighted cosine proximity measure with adjustable weights.
* '''Data:''' Text collections of sites habr.com and techcrunch.com. Labeled selections: queries and related documents.
* '''Data:''' Text collections of sites habr.com and techcrunch.com. Labeled selections: queries and related documents.
-
* '''References:''':
+
* '''References:'''
-
*# ''Vorontsov K. V.'' [[Media:voron17survey-artm.pdf|Probabilistic Topic Modeling: An Overview of Models and Additive Regularization]].
+
*# Vorontsov K. V. [[Media:voron17survey-artm.pdf|Probabilistic Topic Modeling: An Overview of Models and Additive Regularization]].
-
*# ''Ianina A., Vorontsov K.'' [https://fruct.org/publications/fruct25/files/Ian.pdf Regularized Multimodal Hierarchical Topic Model for Document-by-Document Exploratory Search] // FRUCT ISMW, 2019.
+
*# Ianina A., Vorontsov K. [https://fruct.org/publications/fruct25/files/Ian.pdf Regularized Multimodal Hierarchical Topic Model for Document-by-Document Exploratory Search] // FRUCT ISMW, 2019.
-
* '''Basic algorithm:''' The topic model with regularizers and modalities described in the article (source code available).
+
* '''Base algorithm:''' The topic model with regularizers and modalities described in the article (source code available).
* '''Novelty:''' The question of informativeness of topics for vector search of thematically related documents has not been studied before.
* '''Novelty:''' The question of informativeness of topics for vector search of thematically related documents has not been studied before.
* '''Solution:''' Evaluate the individual informativeness of topics by removing them one at a time; then sort the topics by individual informativeness and determine the threshold for cutting off non-informative topics. The reasoning for why this should work: background topics are not informative, and discarding them is expected to increase search precision and recall by a few percent.
* '''Solution:''' Evaluate the individual informativeness of topics by removing them one at a time; then sort the topics by individual informativeness and determine the threshold for cutting off non-informative topics. The reasoning for why this should work: background topics are not informative, and discarding them is expected to increase search precision and recall by a few percent.
-
* '''Authors:''' [[Участник:Vokov|Vorontsov K. V.в]], consultant Anastasia Yanina.
+
* '''Authors:''' [http://www.machinelearning.ru/wiki/index.php?title=Участник:Vokov Vorontsov K. V.], consultant Anastasia Yanina.
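The leave-one-topic-out evaluation and the weighted cosine measure from problem 67 can be sketched as follows. This is a minimal illustration, not the project's code: the function names, the use of precision@k as the quality measure, and the toy vectors in the usage note are all assumptions made for the example.

```python
import math


def weighted_cosine(u, v, w):
    """Cosine similarity between topic vectors u and v with non-negative topic weights w."""
    dot = sum(wi * ui * vi for wi, ui, vi in zip(w, u, v))
    norm_u = math.sqrt(sum(wi * ui * ui for wi, ui in zip(w, u)))
    norm_v = math.sqrt(sum(wi * vi * vi for wi, vi in zip(w, v)))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def precision_at_k(query, docs, relevant, w, k):
    """Share of relevant documents among the k documents closest to the query."""
    ranked = sorted(
        range(len(docs)),
        key=lambda i: weighted_cosine(query, docs[i], w),
        reverse=True,
    )
    return sum(1 for i in ranked[:k] if i in relevant) / k


def topic_informativeness(query, docs, relevant, n_topics, k):
    """Quality lost when each topic is removed in turn (its weight set to 0).

    A negative value means the search improves without that topic,
    so the topic is a candidate for being discarded.
    """
    base = precision_at_k(query, docs, relevant, [1.0] * n_topics, k)
    scores = []
    for t in range(n_topics):
        w = [1.0] * n_topics
        w[t] = 0.0  # hypothetical hard removal; fractional weights are also possible
        scores.append(base - precision_at_k(query, docs, relevant, w, k))
    return scores
```

For example, with a toy query <code>[1.0, 0.0, 2.0]</code> and documents <code>[[1.0, 0.0, 0.0], [0.0, 1.0, 3.0], [0.0, 1.0, 0.0]]</code>, where only the first document is relevant and the third topic plays the role of a background topic, removing the third topic is the only change that brings the relevant document to the top.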
-
=== Task 68 ===
+
===68. 2020===
-
* '''Name:''' Meta-learning of topic classification models.
+
* '''Title:''' Meta-learning of topic classification models.
-
* '''Task:''' Develop universal heuristics for a priori assignment of modality weights in thematic models of text classification.
+
* '''Problem description:''' Develop universal heuristics for a priori assignment of modality weights in thematic models of text classification.
* '''Data:''' [https://docs.google.com/spreadsheets/d/1dhiz7ecgWH7lWi1wM4OkhlDI2r1D_OvcGUXaP8CDHEI/edit#gid=0 Description of datasets], [https://drive.google.com/drive/folders/1PPnw6aZOJAJoLRYuwdGm437RssV-XQx0?usp=sharing Folder with datasets].
* '''Data:''' [https://docs.google.com/spreadsheets/d/1dhiz7ecgWH7lWi1wM4OkhlDI2r1D_OvcGUXaP8CDHEI/edit#gid=0 Description of datasets], [https://drive.google.com/drive/folders/1PPnw6aZOJAJoLRYuwdGm437RssV-XQx0?usp=sharing Folder with datasets].
-
* '''References:''':
+
* '''References:'''
-
*# ''Vorontsov K. V.'' [[Media:voron17survey-artm.pdf|Probabilistic Topic Modeling: An Overview of Models and Additive Regularization]].
+
*# Vorontsov K. V. [[Media:voron17survey-artm.pdf|Probabilistic Topic Modeling: An Overview of Models and Additive Regularization]].
-
* '''Basic algorithm:''' Thematic classification models for several datasets.
+
* '''Base algorithm:''' Thematic classification models for several datasets.
* '''Novelty:''' In topic modeling, the problem of automatic selection of modality weights has not yet been solved.
* '''Novelty:''' In topic modeling, the problem of automatic selection of modality weights has not yet been solved.
* '''Solution:''' Optimize the weights of modalities according to the quality criterion of text classification. Investigate the dependence of the optimal relative weights of modalities on the dimensional characteristics of the problem. Find formulas for estimating the initial values of modality weights without explicitly solving the problem. To reproduce datasets, apply sampling of fragments of source documents.
* '''Solution:''' Optimize the weights of modalities according to the quality criterion of text classification. Investigate the dependence of the optimal relative weights of modalities on the dimensional characteristics of the problem. Find formulas for estimating the initial values of modality weights without explicitly solving the problem. To reproduce datasets, apply sampling of fragments of source documents.
-
* '''Authors:''' [[Участник:Vokov|Vorontsov K. V.]], consultant Yulian Serdyuk.
+
* '''Authors:''' [http://www.machinelearning.ru/wiki/index.php?title=Участник:Vokov Vorontsov K. V.], consultant Yulian Serdyuk.
-
===Task 70 ===
+
===70. 2020===
* Name: Investigation of the structure of the target space when building a predictive model
* Name: Investigation of the structure of the target space when building a predictive model
-
* Task:The problem of forecasting a complex target variable is studied. Complexity means the presence of dependencies (linear or non-linear). It is assumed that the initial data are heterogeneous: the spaces of the independent and target variables are of different nature. It is required to build a predictive model that would take into account the dependence in the source space of the independent variable, as well as in the space of the target variable.
+
* The problem: The problem of forecasting a complex target variable is studied. Complexity means the presence of dependencies (linear or non-linear). It is assumed that the initial data are heterogeneous: the spaces of the independent and target variables are of different nature. It is required to build a predictive model that would take into account the dependence in the source space of the independent variable, as well as in the space of the target variable.
* Data: Heterogeneous data: picture - text, picture - speech and so on.
* Data: Heterogeneous data: picture - text, picture - speech and so on.
* Basic algorithm: As basic algorithms, it is proposed to use a linear model, as well as a nonlinear neural network model.
* Basic algorithm: As basic algorithms, it is proposed to use a linear model, as well as a nonlinear neural network model.
-
* Authors: Strizhov V.V. - Expert, consultant: Isachenko Roman.
+
* '''Authors:''' Strijov V.V. - Expert, consultant: Isachenko Roman.
-
===Task 71 ===
+
===71. 2020===
* Name: Investigation of ways to match models by reducing the dimension of space
* Name: Investigation of ways to match models by reducing the dimension of space
-
* Task: The task of predicting a complex target variable is investigated. Complexity means the presence of dependencies (linear or non-linear). It is proposed to study ways to take into account dependencies in the space of the target variable, as well as the conditions under which these dependencies affect the quality of the final predictive model.
+
* '''Problem description:''' The problem of predicting a complex target variable is investigated. Complexity means the presence of dependencies (linear or non-linear). It is proposed to study ways to take into account dependencies in the space of the target variable, as well as the conditions under which these dependencies affect the quality of the final predictive model.
* Data: Synthetic data with known data generation hypothesis.
* Data: Synthetic data with known data generation hypothesis.
* Basic algorithm: As basic algorithms, it is proposed to use space dimensionality reduction methods (PCA, PLS, autoencoder) and linear matching models.
* Basic algorithm: As basic algorithms, it is proposed to use space dimensionality reduction methods (PCA, PLS, autoencoder) and linear matching models.
-
* Authors: Strizhov V.V. - Expert, consultant: Isachenko Roman.
+
* '''Authors:''' Strijov V.V. - Expert, consultant: Isachenko Roman.
-
===Task 72 ===
+
===72. 2020===
* Name: Construction of a single latent space in the problem of modeling heterogeneous data.
* Name: Construction of a single latent space in the problem of modeling heterogeneous data.
-
* Task: The task of predicting a complex target variable is investigated. Complexity means the presence of dependencies (linear or non-linear). It is proposed to build a single latent space for the independent and target variables. Model matching is proposed to be carried out in the resulting low-dimensional space.
+
* '''Problem description:''' The problem of predicting a complex target variable is investigated. Complexity means the presence of dependencies (linear or non-linear). It is proposed to build a single latent space for the independent and target variables. Model matching is proposed to be carried out in the resulting low-dimensional space.
* Data: Heterogeneous data: picture - text, picture - speech and so on.
* Data: Heterogeneous data: picture - text, picture - speech and so on.
* Basic algorithm: As basic algorithms, it is proposed to use space dimensionality reduction methods (PCA, PLS, autoencoder) and linear matching models.
* Basic algorithm: As basic algorithms, it is proposed to use space dimensionality reduction methods (PCA, PLS, autoencoder) and linear matching models.
-
* Authors: Strizhov V.V. - Expert, consultant: Isachenko Roman.
+
* '''Authors:''' Strijov V.V. - Expert, consultant: Isachenko Roman.
-
=== Task 73 ===
+
===73. 2020===
-
* '''Name:''' Nonlinear ranking of exploratory information search results.
+
* '''Title:''' Nonlinear ranking of exploratory information search results.
-
* '''Task:''' Develop an algorithm for recommending the reading order of documents (reading order, reading list) found using exploratory information retrieval. Documents should be ranked from simple to complex, from general to specific, that is, in the order in which it will be easier for the user to understand a new subject area for him. The algorithm must build a reading graph - a partial order relation on the set of found documents; in particular, it can be a collection of trees (document forest).
+
* '''Problem description:''' Develop an algorithm for recommending the reading order of documents (reading order, reading list) found using exploratory information retrieval. Documents should be ranked from simple to complex, from general to specific, that is, in the order in which it will be easiest for the user to understand an unfamiliar subject area. The algorithm must build a reading graph, that is, a partial order relation on the set of found documents; in particular, it can be a collection of trees (document forest).
* '''Data:''' Part of Wikipedia and reference reading graph derived from Wikipedia categories.
* '''Data:''' Part of Wikipedia and reference reading graph derived from Wikipedia categories.
-
* '''References:''':
+
* '''References:'''
-
*# ''Vorontsov K. V.'' [[Media:voron17survey-artm.pdf|Probabilistic Topic Modeling: An Overview of Models and Additive Regularization]].
+
*# Vorontsov K. V. [[Media:voron17survey-artm.pdf|Probabilistic Topic Modeling: An Overview of Models and Additive Regularization]].
-
*# ''Georgia Koutrika, Lei Liu, and Steven Simske''. [https://www.hpl.hp.com/techreports/2014/HPL-2014-5R1.pdf Generating reading orders over document collections]. HP Laboratories, 2014.
+
*# Georgia Koutrika, Lei Liu, and Steven Simske. [https://www.hpl.hp.com/techreports/2014/HPL-2014-5R1.pdf Generating reading orders over document collections]. HP Laboratories, 2014.
-
*# ''James G. Jardine''. [https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-848.pdf Automatically generating reading lists]. Cambridge, 2014.
+
*# James G. Jardine. [https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-848.pdf Automatically generating reading lists]. Cambridge, 2014.
-
* '''Basic algorithm:''' described in the article G.Koutrika.
+
* '''Base algorithm:''' Described in the article by G. Koutrika et al.
-
* '''Novelty:''' Task has been little studied in the literature. Regularized multimodal topic models (ARTM, BigARTM) have never been applied to this problem.
+
* '''Novelty:''' The problem has been little studied in the literature. Regularized multimodal topic models (ARTM, BigARTM) have never been applied to this problem.
* '''Solution:''' The use of ARTM topic models in conjunction with estimates of the cognitive complexity of the text.
* '''Solution:''' The use of ARTM topic models in conjunction with estimates of the cognitive complexity of the text.
-
* '''Authors:''' [[Участник:Vokov|Vorontsov K. V.]], consultant Maxim Eremeev.
+
* '''Authors:''' [http://www.machinelearning.ru/wiki/index.php?title=Участник:Vokov Vorontsov K. V.], consultant Maxim Eremeev.
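To make the notion of a reading graph in problem 73 concrete: once a partial order "read a before b" is known, one reading list consistent with it can be produced by topological sorting. The sketch below uses Kahn's algorithm; it is only an illustration of that last step, not the algorithm of Koutrika et al. or of the project, and the function name and document titles are hypothetical.

```python
import heapq


def reading_list(docs, edges):
    """Linearize a reading graph; an edge (a, b) means 'read a before b'."""
    successors = {d: [] for d in docs}
    in_degree = {d: 0 for d in docs}
    for a, b in edges:
        successors[a].append(b)
        in_degree[b] += 1

    # Documents with no prerequisites can be read first; the heap only makes
    # the order among them deterministic (alphabetical).
    ready = [d for d in docs if in_degree[d] == 0]
    heapq.heapify(ready)

    order = []
    while ready:
        d = heapq.heappop(ready)
        order.append(d)
        for nxt in successors[d]:
            in_degree[nxt] -= 1
            if in_degree[nxt] == 0:
                heapq.heappush(ready, nxt)

    if len(order) != len(docs):
        raise ValueError("the reading graph contains a cycle, so it is not a partial order")
    return order
```

In a real system the tie-breaking among documents that are ready at the same time would presumably use an estimate of text complexity instead of alphabetical order.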
-
 
+
-
=2019=
+
-
* Story [[Automation of scientific research in machine learning (practice, Strizhov V.V.)/ Group 694, spring 2019|2019 (694)]] — [[Numerical methods of learning by precedents (practice, Strizhov V.V.)/Group 574, spring 2018 | 2018]] — [[Numerical methods of learning by precedents (practice, Strizhov V.V.)/Group 474, spring 2017 | 2017]] — [[Numerical methods of learning by precedents (practice, Strizhov V.V.)/Group 374, spring 2016 | 2016]] — [[Numerical methods of learning by precedents (practice, Strizhov V.V.)/Group 274, spring 2015 | 2015]] — [[Numerical methods of learning by precedents (practice, Strizhov V.V.)/Group 174, spring 2014 | 2014]] — [[Numerical methods of learning by precedents (practice, Strizhov V.V.)/Group 074, spring 2013 | 2013]]
+
==2019==
{|class="wikitable"
{|class="wikitable"
Line 950: Line 1580:
!
!
|-
|-
-
|[[Участник:Severilov.pa|Severilov Pavel]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Severilov.pa Severilov Pavel]
-
|Task of searching characters in texts
+
|The problem of searching characters in texts
|[https://docs.google.com/document/d/1FljjnPqYXNj9u7zjLCMf8eKYcbTmsSUmZbs0BDvzI84/edit?usp=sharing LinkReview]
|[https://docs.google.com/document/d/1FljjnPqYXNj9u7zjLCMf8eKYcbTmsSUmZbs0BDvzI84/edit?usp=sharing LinkReview]
[https://github.com/Intelligent-Systems-Phystech/2019-Project-46/tree/master/code code]
[https://github.com/Intelligent-Systems-Phystech/2019-Project-46/tree/master/code code]
[https://github.com/Intelligent-Systems-Phystech/2019-Project-46/raw/master/Severilov2019SymbolsInTexts/Severilov2019SymbolsInTexts.pdf paper]
[https://github.com/Intelligent-Systems-Phystech/2019-Project-46/raw/master/Severilov2019SymbolsInTexts/Severilov2019SymbolsInTexts.pdf paper]
[https://github.com/Intelligent-Systems-Phystech/2019-Project-46/raw/master/report/final_slides/Severilov_Pr46.pdf slides] [https://www.youtube.com/watch?v=vaE1vLoPFVk video]
[https://github.com/Intelligent-Systems-Phystech/2019-Project-46/raw/master/report/final_slides/Severilov_Pr46.pdf slides] [https://www.youtube.com/watch?v=vaE1vLoPFVk video]
-
|[[Участник:Mapishev| Murat Apishev]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Mapishev Murat Apishev]
|
|
|
|
|-
|-
-
|[[Участник:Grigorev.ad|Grigoriev Alexey]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Grigorev.ad Grigoriev Alexey]
|Text recognition based on skeletal representation of thick lines and convolutional networks
|Text recognition based on skeletal representation of thick lines and convolutional networks
|[https://github.com/Intelligent-Systems-Phystech/2019-Project-9/blob/master/Grigorev2019Project9/LinkReview.pdf LinkReview]
|[https://github.com/Intelligent-Systems-Phystech/2019-Project-9/blob/master/Grigorev2019Project9/LinkReview.pdf LinkReview]
Line 966: Line 1596:
[https://github.com/Intelligent-Systems-Phystech/2019-Project-9/raw/master/Grigorev2019Project9/report/Image_classification_based_on_skeletonization_and_Graph_NN.pdf paper], [https://github.com/Intelligent-Systems-Phystech/2019-Project-9/raw/master/Grigorev2019Project9/report/skeletons_presentation.pdf slides]
[https://github.com/Intelligent-Systems-Phystech/2019-Project-9/raw/master/Grigorev2019Project9/report/Image_classification_based_on_skeletonization_and_Graph_NN.pdf paper], [https://github.com/Intelligent-Systems-Phystech/2019-Project-9/raw/master/Grigorev2019Project9/report/skeletons_presentation.pdf slides]
[https://www.youtube.com/watch?v=j0I1w8htPZA video]
[https://www.youtube.com/watch?v=j0I1w8htPZA video]
-
|[[Участник:Ilyazharikov| Ilya Zharikov]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Ilyazharikov Ilya Zharikov]
-
|[https://github.com/Intelligent-Systems-Phystech/2019-Project-9/raw/master/Grigorev2019Project9/report/Grigorev_review.docx review] [[Участник:Varenik.nv|Varenyk Natalia]]
+
|[https://github.com/Intelligent-Systems-Phystech/2019-Project-9/raw/master/Grigorev2019Project9/report/Grigorev_review.docx review] [http://www.machinelearning.ru/wiki/index.php?title=Участник:Varenik.nv Varenyk Natalia]
|
|
|-
|-
-
|[[Участник:Grishanov|Grishanov Alexey]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Grishanov Grishanov Alexey]
-
|Automatic configuration of BigARTM parameters for a wide class of tasks
+
|Automatic configuration of BigARTM parameters for a wide class of problems
|[https://docs.google.com/document/d/1UFvURCZloCHlnLTTJmpXFr_-GWCo4t8fTOJl4FygtJk/edit?usp=sharing LinkReview] [https://github.com/Intelligent-Systems-Phystech/2019-Project-4/tree/master/code code], [https://github.com/Intelligent-Systems-Phystech/2019-Project-4/raw/master/Grishanov2019Project4/Grishanov2019Project4.pdf paper][https://github.com/Intelligent-Systems-Phystech/2019-Project-4/raw/master/report/Grishanov2019Presentation.pdf slides]
|[https://docs.google.com/document/d/1UFvURCZloCHlnLTTJmpXFr_-GWCo4t8fTOJl4FygtJk/edit?usp=sharing LinkReview] [https://github.com/Intelligent-Systems-Phystech/2019-Project-4/tree/master/code code], [https://github.com/Intelligent-Systems-Phystech/2019-Project-4/raw/master/Grishanov2019Project4/Grishanov2019Project4.pdf paper][https://github.com/Intelligent-Systems-Phystech/2019-Project-4/raw/master/report/Grishanov2019Presentation.pdf slides]
[https://www.youtube.com/watch?v=OVGUuHUvNjc video]
[https://www.youtube.com/watch?v=OVGUuHUvNjc video]
|Viktor Bulatov
|Viktor Bulatov
-
|[https://github.com/Nikolay-Gerasimenko/Experiment/raw/master/Рецензия%20на%20рукопись.docx review][[Участник:Nikolay-Gerasimenko| Gerasimenko Nikolay]]
+
|[https://github.com/Nikolay-Gerasimenko/Experiment/raw/master/Рецензия%20на%20рукопись.docx review][http://www.machinelearning.ru/wiki/index.php?title=Участник:Nikolay-Gerasimenko Gerasimenko Nikolay]
|
|
|-
|-
-
|[[Участник:Yusupov_igor|Yusupov Igor]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Yusupov_igor Yusupov Igor]
|Dynamic alignment of multivariate time series
|Dynamic alignment of multivariate time series
|[https://docs.google.com/document/d/1RHAdwtvDZU5JS6cTVKEWYkSI-6KgwDd3aefpBAw9Ujw/edit LinkReview] code [https://github.com/igor-yusupov/2018-Project-3/raw/patch-1/Yusupov2019Title/Yusupov2019.pdf paper] [https://github.com/igor-yusupov/2018-Project-3/raw/patch-1/Yusupov2019Title/presentation.pdf slides] [https://www.youtube.com/watch?v=wtnGACpmU8k video]
|[https://docs.google.com/document/d/1RHAdwtvDZU5JS6cTVKEWYkSI-6KgwDd3aefpBAw9Ujw/edit LinkReview] code [https://github.com/igor-yusupov/2018-Project-3/raw/patch-1/Yusupov2019Title/Yusupov2019.pdf paper] [https://github.com/igor-yusupov/2018-Project-3/raw/patch-1/Yusupov2019Title/presentation.pdf slides] [https://www.youtube.com/watch?v=wtnGACpmU8k video]
Line 985: Line 1615:
|
|
|-
|-
-
|[[Участник:Varenik.nv|Varenyk Natalia]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Varenik.nv Varenyk Natalia]
|Spherical CNN for QSAR prediction
|Spherical CNN for QSAR prediction
|[https://docs.google.com/document/d/13L7JHa3H19lSuJKRgq2novuzSnaMv3MpwwGpcW5rRZc/edit LinkReview], [https://github.com/Natalia-Varenik/s2cnn code], [https://github.com/Intelligent-Systems-Phystech/2019-Project-47/raw/master/Varenik2019Project47/Varenik2019Project47.pdf paper], [https://github.com/Intelligent-Systems-Phystech/2019-Project-47/raw/master/report/Varenik2019Project47Presentation.pdf slides] [https://www.youtube.com/watch?v=0kJW898HPqM video]
|[https://docs.google.com/document/d/13L7JHa3H19lSuJKRgq2novuzSnaMv3MpwwGpcW5rRZc/edit LinkReview], [https://github.com/Natalia-Varenik/s2cnn code], [https://github.com/Intelligent-Systems-Phystech/2019-Project-47/raw/master/Varenik2019Project47/Varenik2019Project47.pdf paper], [https://github.com/Intelligent-Systems-Phystech/2019-Project-47/raw/master/report/Varenik2019Project47Presentation.pdf slides] [https://www.youtube.com/watch?v=0kJW898HPqM video]
-
|[[Участник:Mpopova|Maria Popova]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Mpopova Maria Popova]
-
|[https://github.com/Intelligent-Systems-Phystech/2019-Project-47/raw/master/report/review.pdf review] [[Участник:Grigorev.ad|Grigoriev Alexey]]
+
|[https://github.com/Intelligent-Systems-Phystech/2019-Project-47/raw/master/report/review.pdf review] [http://www.machinelearning.ru/wiki/index.php?title=Участник:Grigorev.ad Grigoriev Alexey]
|
|
|-
|-
-
|[[Участник:Beznosikov.an|Beznosikov Alexander]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Beznosikov.an Beznosikov Alexander]
|Z-learning of linearly-solvable Markov Decision Processes
|Z-learning of linearly-solvable Markov Decision Processes
|[https://docs.google.com/document/d/1Ef25ueOxzBkbcAFV24fuCEHAApwxspGRAPq_r2hw0EM/edit?usp=sharing LinkReview]
|[https://docs.google.com/document/d/1Ef25ueOxzBkbcAFV24fuCEHAApwxspGRAPq_r2hw0EM/edit?usp=sharing LinkReview]
Line 1002: Line 1632:
|
|
|-
|-
-
|[[Участник:PanchenkoSviatoslav|Panchenko Svyatoslav]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:PanchenkoSviatoslav Panchenko Svyatoslav]
|Obtaining a simple sample at the output of the neural network layer
|Obtaining a simple sample at the output of the neural network layer
|[https://docs.google.com/document/d/1CPgyqyaM4pv_6jxFio5NwU_Ncgu6tazFxl_jgH4gSWQ/edit?usp=sharing LinkReview],
|[https://docs.google.com/document/d/1CPgyqyaM4pv_6jxFio5NwU_Ncgu6tazFxl_jgH4gSWQ/edit?usp=sharing LinkReview],
[https://github.com/Intelligent-Systems-Phystech/2019-Project-43/tree/master/code code],
[https://github.com/Intelligent-Systems-Phystech/2019-Project-43/tree/master/code code],
[https://github.com/Intelligent-Systems-Phystech/2019-Project-43/raw/master/Panchenko2019Project43/Panchenko2019Project43.pdf paper], slides
[https://github.com/Intelligent-Systems-Phystech/2019-Project-43/raw/master/Panchenko2019Project43/Panchenko2019Project43.pdf paper], slides
-
|[[Участник:Tamaz|Gadaev Tamaz]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Tamaz Gadaev Tamaz]
|
|
|
|
|-
|-
-
|[[Участник:VeselovaER|Veselova Evgeniya]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:VeselovaER Veselova Evgeniya]
|Deep Learning for reliable detection of tandem repeats in 3D protein structures
|Deep Learning for reliable detection of tandem repeats in 3D protein structures
|[https://github.com/Intelligent-Systems-Phystech/2019-Project-14 Code] [https://docs.google.com/document/d/1_BtCiAihPg9ON-2PlxORkcmwL80pgqC4gOE7A03rQjg link review] [https://github.com/Intelligent-Systems-Phystech/2019-Project-14/raw/master/Veselova2019Project14/Veselova2019Project14.pdf paper] [https://github.com/Intelligent-Systems-Phystech/2019-Project-14/raw/master/Veselova2019Project14/Veselova2019Slides.pdf slides] [https://www.youtube.com/watch?v=XGLT5BGYTek video]
|[https://github.com/Intelligent-Systems-Phystech/2019-Project-14 Code] [https://docs.google.com/document/d/1_BtCiAihPg9ON-2PlxORkcmwL80pgqC4gOE7A03rQjg link review] [https://github.com/Intelligent-Systems-Phystech/2019-Project-14/raw/master/Veselova2019Project14/Veselova2019Project14.pdf paper] [https://github.com/Intelligent-Systems-Phystech/2019-Project-14/raw/master/Veselova2019Project14/Veselova2019Slides.pdf slides] [https://www.youtube.com/watch?v=XGLT5BGYTek video]
Line 1018: Line 1648:
|
|
|-
|-
-
|[[Участник:Aminov.tv|Aminov Timur]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Aminov.tv Aminov Timur]
|Quality Prediction for a Feature Selection Procedure
|Quality Prediction for a Feature Selection Procedure
|[https://docs.google.com/document/d/1HLo0fNei0KoTrFQNgkdubFCM39PRpEYOyeF1WilibpY/edit LinkReview] code [https://github.com/Intelligent-Systems-Phystech/2019-Project-40/raw/master/doc/Aminov2019FSPP.pdf paper]
|[https://docs.google.com/document/d/1HLo0fNei0KoTrFQNgkdubFCM39PRpEYOyeF1WilibpY/edit LinkReview] code [https://github.com/Intelligent-Systems-Phystech/2019-Project-40/raw/master/doc/Aminov2019FSPP.pdf paper]
[https://github.com/Intelligent-Systems-Phystech/2019-Project-40/raw/master/doc/pres%20(1).pdf slides]
[https://github.com/Intelligent-Systems-Phystech/2019-Project-40/raw/master/doc/pres%20(1).pdf slides]
-
|[[Участник:Isachenkoroma | Roman Isachenko]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Isachenkoroma Roman Isachenko]
|
|
|
|
|-
|-
-
|[[Участник:Vmarkin|Markin Valery]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Vmarkin Markin Valery]
|Investigation of the properties of local models in the spatial decoding of brain signals
|Investigation of the properties of local models in the spatial decoding of brain signals
|[https://docs.google.com/document/d/17rXnTPT9M6nYEkoxwfv5XDE8LIBt-mR1wv2vzrQSljw/edit?usp=sharing LinkReview]
|[https://docs.google.com/document/d/17rXnTPT9M6nYEkoxwfv5XDE8LIBt-mR1wv2vzrQSljw/edit?usp=sharing LinkReview]
Line 1032: Line 1662:
[https://github.com/Intelligent-Systems-Phystech/ECoG_Project/raw/master/Markin2019SpatialDecoding.pdf paper]
[https://github.com/Intelligent-Systems-Phystech/ECoG_Project/raw/master/Markin2019SpatialDecoding.pdf paper]
[https://github.com/Intelligent-Systems-Phystech/ECoG_Project/raw/master/Markin2019Slides.pdf slides] [https://www.youtube.com/watch?v=l_4AJ-Xb5cs video]
[https://github.com/Intelligent-Systems-Phystech/ECoG_Project/raw/master/Markin2019Slides.pdf slides] [https://www.youtube.com/watch?v=l_4AJ-Xb5cs video]
-
|[[Участник:Isachenkoroma | Roman Isachenko]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Isachenkoroma Roman Isachenko]
|
|
|
|
|-
|-
-
|[[Участник:Sadiev1998| Abdurahmon Sadiev]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Sadiev1998 Abdurahmon Sadiev]
|Generation of features using locally approximating models
|Generation of features using locally approximating models
|[https://docs.google.com/document/d/1A_rWU-2DnvD3ZVCOPLQcAEqB3Iw2YyWOqb9YspByh9o/edit LinkReview]
|[https://docs.google.com/document/d/1A_rWU-2DnvD3ZVCOPLQcAEqB3Iw2YyWOqb9YspByh9o/edit LinkReview]
[https://github.com/Intelligent-Systems-Phystech/2019-Project-8/tree/master/code code], [https://github.com/Intelligent-Systems-Phystech/2019-Project-8/raw/master/paper/Feature_gen.pdf paper],
[https://github.com/Intelligent-Systems-Phystech/2019-Project-8/tree/master/code code], [https://github.com/Intelligent-Systems-Phystech/2019-Project-8/raw/master/paper/Feature_gen.pdf paper],
[https://github.com/Intelligent-Systems-Phystech/2019-Project-8/raw/master/slides_Sadiev.pdf slides] [https://www.youtube.com/watch?v=bDpvKQRZA7w video]
[https://github.com/Intelligent-Systems-Phystech/2019-Project-8/raw/master/slides_Sadiev.pdf slides] [https://www.youtube.com/watch?v=bDpvKQRZA7w video]
-
|[[Участник:Anastasiya | Anastasia Motrenko]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Anastasiya Anastasia Motrenko]
|
|
|
|
|-
|-
-
|[[Участник:Tagirschik| Tagir Sattarov]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Tagirschik Tagir Sattarov]
|Machine translation training without parallel texts.
|Machine translation training without parallel texts.
|[https://docs.google.com/document/d/1ORgDN1bVeIduWTdcmjl9R346MNIgpe0_T3G-aUtrxlo/edit?usp=sharing LinkReview] [https://github.com/Intelligent-Systems-Phystech/2019-project-12/blob/master/monogolingual_mt_example.ipynb code] [https://github.com/Intelligent-Systems-Phystech/2019-project-12/blob/master/paper.pdf paper], [https://github.com/Intelligent-Systems-Phystech/2019-project-12/raw/master/Sattarov_presentation.pdf slides] [https://www.youtube.com/watch?v=wduZgu6ym-0 video]
|[https://docs.google.com/document/d/1ORgDN1bVeIduWTdcmjl9R346MNIgpe0_T3G-aUtrxlo/edit?usp=sharing LinkReview] [https://github.com/Intelligent-Systems-Phystech/2019-project-12/blob/master/monogolingual_mt_example.ipynb code] [https://github.com/Intelligent-Systems-Phystech/2019-project-12/blob/master/paper.pdf paper], [https://github.com/Intelligent-Systems-Phystech/2019-project-12/raw/master/Sattarov_presentation.pdf slides] [https://www.youtube.com/watch?v=wduZgu6ym-0 video]
-
|[[Участник:Oleg_Bakhteev | Oleg Bakhteev]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Oleg_Bakhteev Oleg Bakhteev]
|
|
|
|
|-
|-
-
|[[Участник:Nikolay-Gerasimenko| Gerasimenko Nikolay]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Nikolay-Gerasimenko Gerasimenko Nikolay]
|Thematic search for similar cases in the collection of acts of arbitration courts.
|Thematic search for similar cases in the collection of acts of arbitration courts.
|[https://docs.google.com/document/d/1D1fOYNCne6sU5oqgET4s9WKmSj84-Ra8pSRKoi215kc/edit?usp=sharing LinkReview] [https://github.com/Intelligent-Systems-Phystech/2019-Project-50/tree/master/code code] [https://github.com/Intelligent-Systems-Phystech/2019-Project-50/raw/master/Gerasimenko2019Project50/Russian/Gerasimenko2019Project50.pdf paper] [https://github.com/Intelligent-Systems-Phystech/2019-Project-50/raw/master/report/Gerasimenko2019Project50Presentation.pdf slides] [https://www.youtube.com/watch?v=EhgQexs2yIQ video]
|[https://docs.google.com/document/d/1D1fOYNCne6sU5oqgET4s9WKmSj84-Ra8pSRKoi215kc/edit?usp=sharing LinkReview] [https://github.com/Intelligent-Systems-Phystech/2019-Project-50/tree/master/code code] [https://github.com/Intelligent-Systems-Phystech/2019-Project-50/raw/master/Gerasimenko2019Project50/Russian/Gerasimenko2019Project50.pdf paper] [https://github.com/Intelligent-Systems-Phystech/2019-Project-50/raw/master/report/Gerasimenko2019Project50Presentation.pdf slides] [https://www.youtube.com/watch?v=EhgQexs2yIQ video]
|Ekaterina Artyomova
|Ekaterina Artyomova
-
|[https://github.com/Intelligent-Systems-Phystech/2019-Project-50/raw/master/Gerasimenko2019Project50/Russian/Review.docx review][[Участник:Grishanov|Grishanov Alexey]]
+
|[https://github.com/Intelligent-Systems-Phystech/2019-Project-50/raw/master/Gerasimenko2019Project50/Russian/Review.docx review][http://www.machinelearning.ru/wiki/index.php?title=Участник:Grishanov Grishanov Alexey]
|
|
|}
|}
-
=== Task 40 ===
+
===40. 2019===
-
* '''Name:''' Quality prediction for the feature selection procedure.
+
* '''Title:''' Quality prediction for the feature selection procedure.
-
* '''Task:''' The solution of the feature selection problem is reduced to enumeration of binary cube vertices. This procedure cannot be performed for a sample with a large number of features. It is proposed to reduce this problem to optimization in a linear space.
+
* '''Problem description:''' The solution of the feature selection problem is reduced to enumeration of binary cube vertices. This procedure cannot be performed for a sample with a large number of features. It is proposed to reduce this problem to optimization in a linear space.
* '''Data:''' Synthetic data + simple samples
* '''Data:''' Synthetic data + simple samples
-
* '''References:''':
+
* '''References:'''
*# Bertsimas D. et al. Best subset selection via a modern optimization lens // The Annals of Statistics. – 2016. – Vol. 44. – No. 2. – P. 813-852.
*# Bertsimas D. et al. Best subset selection via a modern optimization lens // The Annals of Statistics. – 2016. – Vol. 44. – No. 2. – P. 813-852.
*# Luo R. et al. Neural architecture optimization // Advances in Neural Information Processing Systems. – 2018. – P. 7827-7838.
*# Luo R. et al. Neural architecture optimization // Advances in Neural Information Processing Systems. – 2018. – P. 7827-7838.
-
* '''Basic algorithm:''' Popular feature selection methods.
+
* '''Base algorithm:''' Popular feature selection methods.
* '''Solution:''' In this paper, it is proposed to build a model that, based on a set of features, predicts the quality on a test sample. To do this, a mapping of a binary cube into a linear space is constructed. After that, the quality of the model in linear space is maximized. To reconstruct the solution of the problem, the model of inverse mapping into a binary cube is used.
* '''Novelty:''' A constructively new approach to solving the problem of choosing models is proposed.
-
* '''Authors:''' Strizhov V.V., Tetiana Aksenova, consultant – Roman Isachenko
+
* '''Authors:''' Strijov V.V., Tetiana Aksenova, consultant – Roman Isachenko
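The enumeration of binary cube vertices mentioned in the problem description can be sketched as follows. This is only an illustration of why exhaustive search does not scale (the number of vertices is 2^n), not the method proposed in the project; the function name <code>enumerate_subsets</code>, the least-squares scorer and the synthetic data are assumptions made for the example.

```python
import itertools
import numpy as np

def enumerate_subsets(X_train, y_train, X_test, y_test):
    """Score every non-empty feature subset (binary cube vertex) by test-set MSE."""
    n_features = X_train.shape[1]
    scores = {}
    for mask in itertools.product([0, 1], repeat=n_features):
        if not any(mask):
            continue  # skip the empty subset
        cols = [j for j, bit in enumerate(mask) if bit]
        # Ordinary least squares on the selected columns only.
        w, *_ = np.linalg.lstsq(X_train[:, cols], y_train, rcond=None)
        residual = X_test[:, cols] @ w - y_test
        scores[mask] = float(np.mean(residual ** 2))
    return scores

# Synthetic data (assumed for illustration): only features 0 and 2 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 2.0 * X[:, 0] - 3.0 * X[:, 2]

scores = enumerate_subsets(X[:100], y[:100], X[100:], y[100:])
two_feature = {m: s for m, s in scores.items() if sum(m) == 2}
best_mask = min(two_feature, key=two_feature.get)
print(len(scores), best_mask)
```

With n features the loop visits 2^n - 1 vertices, which is the reason the project proposes to move the search into a continuous space.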
-
=== Task 42 ===
+
===42. 2019===
-
* '''Name:''' Z-learning of linearly-solvable Markov Decision Processes
+
* '''Title:''' Z-learning of linearly-solvable Markov Decision Processes
-
* '''Task''': Adapt Z-learning from [1] to the case of Markov Decision Process discussed in [2] in the context of energy systems. Compare it with standard (in reinforcement learning) Q-learning.
+
* '''Problem:''' Adapt Z-learning from [1] to the case of Markov Decision Process discussed in [2] in the context of energy systems. Compare it with standard (in reinforcement learning) Q-learning.
* '''Data:''' We consider a Markov process described by a transition probability matrix. Given an initial state vector (the probability of being in each state at time zero), we generate data for the time evolution of the state vector. See [2] for an example process describing the evolution of an ensemble of energy consumers.
-
* '''References:''':
+
* '''References:'''
*# E. Todorov. Linearly-solvable Markov decision problems https://homes.cs.washington.edu/~todorov/papers/TodorovNIPS06.pdf
-
*# Ensemble Control of Cycling Energy Loads: Markov Decision Approach. Michael Chertkov, Vladimir Y. Chernyak, Deepjyoti Deka. https://arxiv.org/abs/1701.04941
+
*# Ensemble Control of Cycling Energy Loads: Markov Decision Approach. Michael Chertkov, Vladimir Y. Chernyak, Deepjyoti Deka. https://arxiv.org/abs/1701.04941
*# Csaba Szepesvári. Algorithms for Reinforcement Learning. https://sites.ualberta.ca/~szepesva/papers/RLAlgsInMDPs.pdf
-
* '''Basic algorithm:''' Principal comparison should be made with Q learning described in [3]
+
* '''Base algorithm:''' The main comparison should be made with Q-learning, described in [3].
* '''Solution:''' We suppose that plugging the algorithm from [1] directly into [2] gives a faster and more reliable solution.
* '''Novelty:''' In the area of power systems there is a huge demand for fast reinforcement learning algorithms, but such algorithms are still lacking (in particular, ones that respect the physics and the underlying graph).
* '''Authors:''' Yury Maximov (consultant, expert), Michael Chertkov (expert)
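A minimal sketch of the Z-learning update from [1] on a toy first-exit problem, compared against the exact solution of the linear Bellman equation. The 5-state chain, the state costs, the step-size schedule and the function names are assumptions made for illustration; they are not taken from [2] and do not model energy loads.

```python
import numpy as np

def exact_desirability(P, q, terminal):
    """Solve z = exp(-q) * (P @ z) on non-terminal states; z = exp(-q) on terminal ones."""
    n = len(q)
    z = np.exp(-q)  # already correct on terminal states
    free = [i for i in range(n) if i not in terminal]
    term = sorted(terminal)
    G = np.diag(np.exp(-q[free]))
    A = np.eye(len(free)) - G @ P[np.ix_(free, free)]
    b = G @ P[np.ix_(free, term)] @ z[term]
    z[free] = np.linalg.solve(A, b)
    return z

def z_learning(P, q, terminal, episodes=3000, seed=0):
    """Stochastic approximation of z from transitions sampled under passive dynamics P."""
    rng = np.random.default_rng(seed)
    n = len(q)
    z = np.ones(n)
    for t in terminal:
        z[t] = np.exp(-q[t])
    visits = np.zeros(n)
    starts = [i for i in range(n) if i not in terminal]
    for _ in range(episodes):
        i = int(rng.choice(starts))
        while i not in terminal:
            j = int(rng.choice(n, p=P[i]))   # sample next state from passive dynamics
            visits[i] += 1
            alpha = visits[i] ** -0.6        # decaying step size (an assumed schedule)
            z[i] = (1 - alpha) * z[i] + alpha * np.exp(-q[i]) * z[j]
            i = j
    return z

# Toy chain (assumed): random walk on states 0..4, state 4 is terminal.
n = 5
P = np.zeros((n, n))
P[0, 0] = P[0, 1] = 0.5
for i in range(1, n - 1):
    P[i, i - 1] = P[i, i + 1] = 0.5
P[n - 1, n - 1] = 1.0
q = np.array([0.1, 0.1, 0.1, 0.1, 0.0])
terminal = {n - 1}

z_exact = exact_desirability(P, q, terminal)
z_hat = z_learning(P, q, terminal)
print(np.round(z_exact, 3), np.round(z_hat, 3))
```

The value function is recovered as v = -log z. Unlike Q-learning, the update needs no maximization over actions, which is the property the project intends to exploit.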
-
=== Task 1 ===
+
===1. 2019===
-
* '''Name:''' Forecasting the direction of movement of the price of exchange instruments according to the news flow.
+
* '''Title:''' Forecasting the direction of movement of the price of exchange instruments according to the news flow.
-
* '''Task:''' Build and explore a model for predicting the direction of price movement. Given a set of news S and a set of timestamps T corresponding to the time of publication of news from S. 2. Time series P, corresponding to the price of an exchange instrument, and time series V, corresponding to the volume of sales for this instrument, for a period of time T'. 3. The set T is a subset of the time period T'. 4. Time intervals w=[w0, w1], l=[l0, l1], d=[d0, d1], where w0 < w1=l0 < l1=d0 < d1. It is required to predict the direction of movement of the price of an exchange instrument at the time t=d0 according to the news released in the period w.
+
* '''Problem description:''' Build and explore a model for predicting the direction of price movement. Given: 1. A set of news S and a set of timestamps T corresponding to the publication times of the news from S. 2. Time series P, corresponding to the price of an exchange instrument, and time series V, corresponding to the sales volume of this instrument, for a period of time T'. 3. The set T is a subset of the time period T'. 4. Time intervals w=[w0, w1], l=[l0, l1], d=[d0, d1], where w0 < w1=l0 < l1=d0 < d1. It is required to predict the direction of movement of the price of an exchange instrument at the time t=d0 from the news released in the period w.
* '''Data:'''
*# Financial data: data on quotes (at one tick interval) of several financial instruments (GAZP, SBER, VTBR, LKOH) for the 2nd quarter of 2017 from the Finam.ru website; for each point of the series, the date, time, price and volume are known.
Line 1096: Line 1726:
*# Aysina Roza Munerovna, Thematic modeling of financial flows of corporate clients of a bank based on transactional data, final qualification work.
*# Lee, Heeyoung, et al. "On the Importance of Text Analysis for Stock Price Prediction." LREC. 2014.
-
* '''Basic algorithm:''' Method used in the article (4).
+
* '''Base algorithm:''' Method used in the article (4).
* '''Solution:''' Using topic modeling (ARTM) and local approximation models to translate a sequence of texts corresponding to different timestamps into a single feature description. Quality criteria: F1-score, ROC AUC, profitability of the strategy used.
* '''Novelty:''' To substantiate the connection between the time series, the convergent cross mapping method is proposed.
-
* '''Authors:''' Ivan Zaputlyaev (consultant), Strizhov V.V., K.V. Vorontsov (Experts)
+
* '''Authors:''' Ivan Zaputlyaev (consultant), Strijov V.V., K.V. Vorontsov (Experts)
-
=== Task 3 ===
+
===3. 2019===
-
* '''Name:''' Dynamic alignment of multidimensional time series.
+
* '''Title:''' Dynamic alignment of multidimensional time series.
-
* '''Task:''' A characteristic multidimensional time series is the trajectory of a point in 3-dimensional space. The two trajectories need to be optimally aligned with each other. For this, the distance DTW between two time series is used. In the classical representation, DTW is built between one-dimensional time series. It is necessary to introduce various modifications of the algorithm for working with high-dimensional time series: trajectories, corticograms.
+
* '''Problem description:''' A characteristic multidimensional time series is the trajectory of a point in 3-dimensional space. The two trajectories need to be optimally aligned with each other. For this, the distance DTW between two time series is used. In the classical representation, DTW is built between one-dimensional time series. It is necessary to introduce various modifications of the algorithm for working with high-dimensional time series: trajectories, corticograms.
* '''Data:''' The data describes 6 classes of time series from the mobile phone's accelerometer. https://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group274/Goncharov2015MetricClassification/data/
* '''References:'''
*# Multidimensional DTW: https://pdfs.semanticscholar.org/76d3/5bd5a52453ebde80faaa1467d7effd74426f.pdf
-
* '''Basic algorithm:''' Using L_p distances between two dimensions of a time series, their modifications.
+
* '''Base algorithm:''' Using L_p distances between two dimensions of a time series, their modifications.
* '''Solution:''' Investigation of distances resistant to change of coordinate order, studies of distances unstable to change of coordinate order. Experiments with other types of distances (cosine, RBF, others).
* '''Novelty:''' There is no complete review and study of methods for working with multivariate time series. The dependence of the quality of the solution on the selected distances between measurements has not been studied.
-
* '''Authors:''' Alexey Goncharov - consultant, Expert, Strizhov V.V. - Expert
+
* '''Authors:''' Alexey Goncharov - consultant, Expert, Strijov V.V. - Expert
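The base algorithm above (DTW with an L_p distance between vector-valued samples) can be sketched as follows. This is a plain dynamic-programming implementation written for illustration; the function name <code>dtw_distance</code> and the toy trajectories are assumptions, not the project's code.

```python
import numpy as np

def dtw_distance(a, b, p=2):
    """DTW cost between two multidimensional series of shapes (n, d) and (m, d)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # Pairwise L_p distances between samples, shape (n, m).
    diff = np.abs(a[:, None, :] - b[None, :, :])
    cost = (diff ** p).sum(axis=2) ** (1.0 / p)
    # acc[i, j] is the cheapest alignment cost of a[:i] with b[:j].
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]
            )
    return float(acc[n, m])

# Toy 3-dimensional trajectory and a time-stretched copy of it (assumed data).
a = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]], dtype=float)
b = np.repeat(a, 2, axis=0)
print(dtw_distance(a, b), dtw_distance(a, a + 1.0))
```

Because the L_p distance sums over coordinates, this particular cost is unchanged when the coordinates of both series are permuted in the same way; distances without this property are the "unstable" case mentioned in the solution.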
-
=== Task 43 ===
+
===43. 2019===
-
* '''Name:''' Getting a simple sample at the output of the neural network layer
+
* '''Title:''' Getting a simple sample at the output of the neural network layer
-
* '''Task''': The output of the neural network is usually a generalized linear model over the outputs of the penultimate layer. It is necessary to propose a way to test the simplicity of the sample and its compliance with the generalized linear model (linear regression, logistic regression) using a system of statistical criteria.
+
* '''Problem:''' The output of the neural network is usually a generalized linear model over the outputs of the penultimate layer. It is necessary to propose a way to test the simplicity of the sample and its compliance with the generalized linear model (linear regression, logistic regression) using a system of statistical criteria.
* '''Data:''' For the computational experiment, it is proposed to use classical samples from the UCI repository. Link to samples https://github.com/ttgadaev/SampleSize/tree/master/datasets
-
* '''References:''': http://www.ccas.ru/avtorefe/0016d.pdf c 49-63 Bishop, C. 2006. Pattern Recognition and Machine Learning. Berlin: Springer. $758
+
* '''References:''' http://www.ccas.ru/avtorefe/0016d.pdf pp. 49-63; Bishop, C. 2006. Pattern Recognition and Machine Learning. Berlin: Springer. $758
-
* '''Basic algorithm:''' White test, Wald test, Goldfeld-Quantum test, Durbin-Watson, Chi-square, Fry-Behr, Shapiro-Wilk
+
* '''Base algorithm:''' White test, Wald test, Goldfeld-Quandt test, Durbin-Watson, Chi-square, Jarque-Bera, Shapiro-Wilk
* '''Solution:''' A system of tests for checking the simplicity of the sample (and the adequacy of the model): the independent variables are not random, the dependent variables are distributed normally or binomially, there are no gaps or outliers, the classes are balanced, the sample is approximated by a single model, and the variance of the error function does not depend on the independent variable. The study is based on synthetic and real data.
-
* '''Authors:''' Gadaev T. T. (consultant) Strizhov V.V., Grabovoi A.V. (Experts)
+
* '''Authors:''' Gadaev T. T. (consultant) Strijov V.V., Grabovoi A.V. (Experts)
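One of the listed criteria, the Durbin-Watson statistic, can be computed directly from the residuals of a linear model. The sketch below is an illustration on synthetic data, not the project's test system; the helper names and the data are assumptions. Values near 2 are consistent with uncorrelated residuals, values near 0 or 4 indicate positive or negative autocorrelation.

```python
import numpy as np

def ols_residuals(X, y):
    """Residuals of a least-squares fit with an intercept term."""
    X1 = np.column_stack([np.ones(len(y)), X])
    w, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return y - X1 @ w

def durbin_watson(residuals):
    """DW = sum_t (e_t - e_{t-1})^2 / sum_t e_t^2."""
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

# Synthetic linear sample with independent noise (assumed for illustration).
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=500)
dw = durbin_watson(ols_residuals(x, y))
print(dw)
```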
-
===Task 14===
+
===14. 2019===
-
* '''Name:''' Deep Learning for reliable detection of tandem repeats in 3D protein structures [[Media:Strijov_3D_CNN.pdf|more in PDF]]
+
* '''Title:''' Deep Learning for reliable detection of tandem repeats in 3D protein structures [[Media:Strijov_3D_CNN.pdf|more in PDF]]
-
* '''Task''': Deep learning algorithms pushed computer vision to a level of accuracy comparable or higher than a human vision. Similarly, we believe that it is possible to recognize the symmetry of a 3D object with a very high reliability, when the object is represented as a density map. The optimization problem includes i) multiclass classification of 3D data. The output is the order of symmetry. The number of classes is ~10-20 ii) multioutput regression of 3D data. The output is the symmetry axis (a 3-vector). The input data are typically 24x24x24 meshes. The total amount of these meshes is of order a million. Biological motivation : Symmetry is an important feature of protein tertiary and quaternary structures that has been associated with protein folding, function, evolution, and stability. Its emergence and ensuing prevalence has been attributed to gene duplications, fusion events, and subsequent evolutionary drift in sequence. Methods to detect these symmetries exist, either based on the structure or the sequence of the proteins, however, we believe that they can be vastly improved.
+
* '''Problem:''' Deep learning algorithms pushed computer vision to a level of accuracy comparable to or higher than that of human vision. Similarly, we believe that it is possible to recognize the symmetry of a 3D object with very high reliability when the object is represented as a density map. The optimization problem includes i) multiclass classification of 3D data, where the output is the order of symmetry and the number of classes is ~10-20; ii) multioutput regression of 3D data, where the output is the symmetry axis (a 3-vector). The input data are typically 24x24x24 meshes. The total number of these meshes is on the order of a million. Biological motivation: Symmetry is an important feature of protein tertiary and quaternary structures that has been associated with protein folding, function, evolution, and stability. Its emergence and ensuing prevalence have been attributed to gene duplications, fusion events, and subsequent evolutionary drift in sequence. Methods to detect these symmetries exist, either based on the structure or the sequence of the proteins; however, we believe that they can be vastly improved.
* '''Data:''' Synthetic data are obtained by ‘symmetrizing’ folds from top8000 library (http://kinemage.biochem.duke.edu/databases/top8000.php).
-
* '''References:''': Our previous 3D CNN: [https://arxiv.org/abs/1801.06252] Invariance of CNNs (and references therein): [https://hal.inria.fr/hal- 01630265/document], [https://arxiv.org/pdf/1706.03078.pdf]
+
* '''References:''' Our previous 3D CNN: [https://arxiv.org/abs/1801.06252] Invariance of CNNs (and references therein): [https://hal.inria.fr/hal-01630265/document], [https://arxiv.org/pdf/1706.03078.pdf]
* '''Base algorithm:''' A prototype has already been created using the TensorFlow framework [4], which is capable of detecting the order of cyclic structures with about 93% accuracy. The main goal of this internship is to optimize the topology of the current neural network prototype and make it rotationally and translationally invariant with respect to input data. [4] [https://www.tensorflow.org/]
* '''Solution:''' The network architecture needs to be modified according to the invariance properties (most importantly, rotational invariance). Please see the links [https://hal.inria.fr/hal-01630265/document], [https://arxiv.org/pdf/1706.03078.pdf]. The code is written using the TensorFlow library, and the current model is trained on a single GPU (Nvidia Quadro 4000) of a desktop machine.
Line 1131: Line 1761:
* '''Authors:''' Expert Sergei Grudinin, consultants Guillaume Pages
-
=== Task 46 ===
+
===46. 2019===
-
* Name: Task of searching characters in texts
+
* '''Title:''' The problem of searching for characters in texts
-
* Task: In the simplest case, this Task is reduced to the Sequence Labeling task on a labeled selection. The difficulty lies in obtaining a sufficient amount of training data, that is, it is required to obtain a larger sample from the existing small Expert markup (automatically by searching for patterns or by compiling a simple and high-quality markup instruction, for example, in Toloka). The presence of markup allows you to start experimenting with the selection of the optimal model, various neural network architectures (BiLSTM, Transformer, etc.) may be of interest here.
+
* '''Problem description:''' In the simplest case, this problem is reduced to a Sequence Labeling task on a labeled sample. The difficulty lies in obtaining a sufficient amount of training data, that is, a larger sample must be obtained from the existing small expert markup (automatically, by searching for patterns, or by compiling simple, high-quality markup instructions, for example, in Toloka). Having markup makes it possible to start experimenting with model selection; various neural network architectures (BiLSTM, Transformer, etc.) may be of interest here.
* '''Data:''' Dictionary of symbols, Marked artistic texts
-
* References: http://www.machinelearning.ru/wiki/images/0/05/Mmta18-rnn.pdf
+
* '''References:''' http://www.machinelearning.ru/wiki/images/0/05/Mmta18-rnn.pdf
* '''Base algorithm:''' HMM, RNN
-
* Solution: It is proposed to compare the work of several state-of-the-art algorithms. Propose a classifier quality metric for characters (character/non-character). Determine applicability of methods.
+
* '''Solution:''' It is proposed to compare the work of several state-of-the-art algorithms. Propose a classifier quality metric for characters (character/non-character). Determine applicability of methods.
-
* Novelty: The proposed approach to text analysis is used by Experts in manual mode and has not been automated
+
* '''Novelty:''' The proposed approach to text analysis is used by Experts in manual mode and has not been automated
-
* Authors: M. Apishev (consultant), D. Lemtyuzhnikova
+
* '''Authors:''' M. Apishev (consultant), D. Lemtyuzhnikova
-
=== Task 47 ===
+
===47. 2019===
-
* '''Name:''' Deep learning for RNA secondary structure prediction
+
* '''Title:''' Deep learning for RNA secondary structure prediction
-
* '''Task''': RNA secondary structure is an important feature which defines RNA functional properties. Its importance can be illustrated by the fact, that it is evolutionary preserved and some types of functional RNAs always * have the same secondary structure, for example all tRNAs fold into cloverleaf. As secondary structure often defines functions, knowing RNAs secondary structure may help investigate functions of novel RNA molecules. RNA folding is not as easy as DNA folding, because RNA is single stranded molecule which forms complicated base-pairing interactions, while DNA mostly exists as fully base paired double helices. Current methods of RNA structure prediction rely on experimentally evaluated thermodynamic rules, but with thermodynamics alone only 80% of structures can be accurately predicted. We propose an AI-driven method for predicting RNA secondary structure inspired by neural machine translation model.
+
* '''Problem:''' RNA secondary structure is an important feature that defines RNA functional properties. Its importance is illustrated by the fact that it is evolutionarily conserved and that some types of functional RNAs always have the same secondary structure; for example, all tRNAs fold into a cloverleaf. As secondary structure often defines function, knowing an RNA's secondary structure may help investigate the functions of novel RNA molecules. RNA folding is not as easy as DNA folding, because RNA is a single-stranded molecule that forms complicated base-pairing interactions, while DNA mostly exists as fully base-paired double helices. Current methods of RNA structure prediction rely on experimentally evaluated thermodynamic rules, but with thermodynamics alone only 80% of structures can be accurately predicted. We propose an AI-driven method for predicting RNA secondary structure inspired by neural machine translation models.
* '''Data:''' RNA sequences in form of strings of characters
-
* '''References:''': https://arxiv.org/abs/1609.08144
+
* '''References:''' https://arxiv.org/abs/1609.08144
-
* '''Basic algorithm:''' https://www.ncbi.nlm.nih.gov/pubmed/16873527
+
* '''Base algorithm:''' https://www.ncbi.nlm.nih.gov/pubmed/16873527
* '''Solution:''' Deep learning recurrent encoder-decoder model with attention
* '''Novelty:''' RNA secondary structure prediction remains an unsolved problem, and to the best of our knowledge a deep learning approach has not been introduced in the literature before.
* '''Authors:''' consultant Maria Popova Chapel-Hill
-
=== Task 4 ===
+
===4. 2019===
-
* '''Name:''' Automatic setting of ARTM parameters for a wide class of tasks.
+
* '''Title:''' Automatic setting of ARTM parameters for a wide class of problems.
-
* '''Task:''' The bigARTM open library allows you to build topical models using a wide class of possible regularizers. However, this flexibility makes the task of setting the coefficients very difficult. This tuning can be greatly simplified by using the relative regularization coefficients mechanism and automatic selection of N-grams. We need to test the hypothesis that there is a universal set of relative regularization coefficients that gives "reasonably good" results on a wide class of problems. Several datasets are given with some external quality criterion (for example, classification of documents into categories or ranking). We find the best parameters for a particular dataset, giving the "locally the best model". We find the bigARTM initialization algorithm that produces thematic models with quality comparable to the "locally best model" on its dataset. Comparability criterion in quality: on this dataset, the quality of the "universal model" is no more than 5% worse than that of the "locally best model".
+
* '''Problem description:''' The bigARTM open library allows you to build topic models using a wide class of possible regularizers. However, this flexibility makes the problem of setting the coefficients very difficult. This tuning can be greatly simplified by using the relative regularization coefficients mechanism and automatic selection of N-grams. We need to test the hypothesis that there is a universal set of relative regularization coefficients that gives "reasonably good" results on a wide class of problems. Several datasets are given with some external quality criterion (for example, classification of documents into categories or ranking). We find the best parameters for a particular dataset, giving the "locally best model". We find the bigARTM initialization algorithm that produces topic models with quality comparable to the "locally best model" on its dataset. Comparability criterion: on this dataset, the quality of the "universal model" is no more than 5% worse than that of the "locally best model".
* '''Data:''' [https://archive.ics.uci.edu/ml/datasets/Victorian+Era+Authorship+Attribution Victorian Era Authorship Attribution Data Set], [https://archive.ics.uci.edu/ml/datasets/Twenty+Newsgroups 20 Newsgroups], ICD-10, search/ranking triplets.
* '''References:'''
Line 1159: Line 1789:
*# Presentation by Viktor Bulatov at a scientific seminar: https://drive.google.com/file/d/19pJ21LRPeeOxY4mkcSnQCRm93zOO4J5b/view
*# Draft with formulas: https://drive.google.com/open?id=1AqS7snUsSJ18ZYBtC-6uP_2dMTDJSGeD
-
* '''Basic algorithm:''' PLSA / LDA / logregression.
+
* '''Base algorithm:''' PLSA / LDA / logistic regression.
* '''Solution:''' bigARTM with background topics and smoothing, sparsing and decorrelation regularizers (coefficients selected automatically), as well as automatically selected N-grams.
* '''Novelty:''' The need for automated tuning of model parameters and the lack of such implementations in the scientific community.
-
* '''Authors:''' consultant Viktor Bulatov, Expert[[Участник:Vokov|Vorontsov K. V.]].
+
* '''Authors:''' consultant Viktor Bulatov, Expert [http://www.machinelearning.ru/wiki/index.php?title=Участник:Vokov Vorontsov K. V.].
-
=== Task 50 ===
+
===50. 2019===
-
* '''Name:''' Thematic search for similar cases in the collection of acts of arbitration courts.
+
* '''Title:''' Thematic search for similar cases in the collection of acts of arbitration courts.
-
* '''Task:''' Build an information retrieval algorithm for a collection of acts of arbitration courts. The request can be an arbitrary document of the collection (the text of the act). The search result should be a list of documents in the collection, ranked in descending order of relevance.
+
* '''Problem description:''' Build an information retrieval algorithm for a collection of acts of arbitration courts. The request can be an arbitrary document of the collection (the text of the act). The search result should be a list of documents in the collection, ranked in descending order of relevance.
* '''Data:''' collection of text documents (acts of arbitration courts): http://kad.arbitr.ru.
* '''References:'''
-
*# ''Anastasia Yanina.'' [[Media:ianina18msc.pdf‎|Thematic exploratory information search]]. 2018. FIVT MIPT.
+
*# Anastasia Yanina. [[Media:ianina18msc.pdf‎|Thematic exploratory information search]]. 2018. FIVT MIPT.
-
*# ''Ianina A., Golitsyn L., Vorontsov K.'' [[Media:ianina17exploratory.pdf|Multi-objective topic modeling for exploratory search in tech news]]. AINL-2017. CCIS, Springer, 2018.
+
*# Ianina A., Golitsyn L., Vorontsov K. [[Media:ianina17exploratory.pdf|Multi-objective topic modeling for exploratory search in tech news]]. AINL-2017. CCIS, Springer, 2018.
-
*# ''Ahmed El-Kishky, Yanglei Song, Chi Wang, Clare Voss, Jiawei Han''. [http://hanj.cs.illinois.edu/pdf/vldb15_ael-kishky.pdf Scalable Topical Phrase Mining from Text Corpora]. 2015.
+
*# Ahmed El-Kishky, Yanglei Song, Chi Wang, Clare Voss, Jiawei Han. [http://hanj.cs.illinois.edu/pdf/vldb15_ael-kishky.pdf Scalable Topical Phrase Mining from Text Corpora]. 2015.
-
* '''Basic algorithm:''' BigARTM with decorrelation, smoothing, sparse regularizers. Search by TF-IDF of words, by TF-IDF of UPA links, by thematic vector representations of documents, using a cosine proximity measure. TopMine algorithm for collocation detection.
+
* '''Base algorithm:''' BigARTM with decorrelation, smoothing, sparse regularizers. Search by TF-IDF of words, by TF-IDF of UPA links, by thematic vector representations of documents, using a cosine proximity measure. TopMine algorithm for collocation detection.
* '''Solution:''' Add modality of links to legal acts. Add modality of legal terms. Choose the optimal number of topics and regularization strategy. Organize the process of marking pairs of documents. Implement the evaluation of the quality of the search for a labeled sample of pairs of documents.
* '''Novelty:''' The first attempt to use ARTM for thematic search of legal texts.
-
* '''Authors:''' consultant Ekaterina Artyomova, Expert [[Участник:Vokov|Vorontsov K. V.]].
+
* '''Authors:''' consultant Ekaterina Artyomova, Expert [http://www.machinelearning.ru/wiki/index.php?title=Участник:Vokov Vorontsov K. V.].
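The base algorithm above mentions search by TF-IDF of words with a cosine proximity measure. A minimal sketch of that one step in plain Python is given below; the function names and the toy documents are hypothetical, and BigARTM, the link modality and TopMine are not modeled here.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per tokenized document."""
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either vector is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query_index, vectors):
    """Indices of all other documents, most similar to the query first."""
    q = vectors[query_index]
    others = [i for i in range(len(vectors)) if i != query_index]
    return sorted(others, key=lambda i: cosine(q, vectors[i]), reverse=True)


# Toy corpus (hypothetical, for illustration only).
docs = [
    "tax code article penalty".split(),
    "tax penalty court ruling".split(),
    "labor contract dismissal".split(),
]
vectors = tfidf_vectors(docs)
ranking = rank(0, vectors)
```

The same `cosine` function could be applied to topic vectors of documents instead of TF-IDF vectors, which is the comparison the project description suggests.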
-
=Group 2=
+
==2019 Group 2==
-
* Story [[Automation of scientific research in machine learning (practice, Strizhov V.V.)/Group 674, spring 2019|2019 (674)]] — [[Numerical methods of learning by precedents (practice, Strizhov V.V.)/Group 574, spring 2018 | 2018]] — [[Numerical methods of learning by precedents (practice, Strizhov V.V.)/Group 474, spring 2017 | 2017]] — [[Numerical methods of learning by precedents (practice, Strizhov V.V.)/Group 374, spring 2016 | 2016]] — [[Numerical methods of learning by precedents (practice, Strizhov V.V.)/Group 274, spring 2015 | 2015]] — [[Numerical methods of learning by precedents (practice, Strizhov V.V.)/Group 174, spring 2014 | 2014]] — [[Numerical methods of learning by precedents (practice, Strizhov V.V.)/Group 074, spring 2013 | 2013]]
+
{|class="wikitable"
{|class="wikitable"
|-
|-
Line 1186: Line 1815:
! Consultant
! Consultant
! Reviewer
! Reviewer
-
!
 
|-
|-
-
|[[Участник:Ninavishn|Vishnyakova Nina]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Ninavishn Vishnyakova Nina]
|Optimal Approximation of Non-linear Power Flow Problem
|Optimal Approximation of Non-linear Power Flow Problem
|[https://docs.google.com/document/d/1TvMgA1ytOMrCm1Fx35UsrnMSASvECnr249x0Nvy7TaY/edit LinkReview] [https://github.com/Intelligent-Systems-Phystech/2019-Project-41/raw/master/report/Optimal_Approximation_of_Non_linear_Power_Flow_Problem.pdf paper] [https://github.com/Intelligent-Systems-Phystech/2019-Project-41 code] [https://github.com/Intelligent-Systems-Phystech/2019-Project-41/raw/master/report/Vishnyakova_nina_2019_41_Talk.pdf presentation] [https://youtu.be/QINA00S1_Bo video]
|[https://docs.google.com/document/d/1TvMgA1ytOMrCm1Fx35UsrnMSASvECnr249x0Nvy7TaY/edit LinkReview] [https://github.com/Intelligent-Systems-Phystech/2019-Project-41/raw/master/report/Optimal_Approximation_of_Non_linear_Power_Flow_Problem.pdf paper] [https://github.com/Intelligent-Systems-Phystech/2019-Project-41 code] [https://github.com/Intelligent-Systems-Phystech/2019-Project-41/raw/master/report/Vishnyakova_nina_2019_41_Talk.pdf presentation] [https://youtu.be/QINA00S1_Bo video]
-
|[[Участник:Yury.maximov|Yury Maximov]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Yury.maximov Yury Maximov]
-
|reviewer [[Участник:Loginov-ra|Loginov Roman]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Loginov-ra Loginov Roman]
[https://github.com/Intelligent-Systems-Phystech/2019-Project-41/raw/master/report/Vishnyakova2019Project41_Review.pdf review]
[https://github.com/Intelligent-Systems-Phystech/2019-Project-41/raw/master/report/Vishnyakova2019Project41_Review.pdf review]
-
|
 
|-
|-
-
|[[Участник:Polinakud|Kudryavtseva Polina]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Polinakud Kudryavtseva Polina]
|Intention forecasting. Building an optimal signal decoding model for modeling a brain-computer interface.
|Intention forecasting. Building an optimal signal decoding model for modeling a brain-computer interface.
|[https://github.com/Intelligent-Systems-Phystech/2019-Project-18/tree/master/code code]
|[https://github.com/Intelligent-Systems-Phystech/2019-Project-18/tree/master/code code]
Line 1202: Line 1829:
[https://github.com/Intelligent-Systems-Phystech/2019-Project-18/raw/master/doc/Kudryavtseva2019Project18.pdf paper] [https://www.youtube.com/watch?v=wo-nJU3uG1I video]
[https://github.com/Intelligent-Systems-Phystech/2019-Project-18/raw/master/doc/Kudryavtseva2019Project18.pdf paper] [https://www.youtube.com/watch?v=wo-nJU3uG1I video]
[https://github.com/Intelligent-Systems-Phystech/2019-Project-18/raw/master/doc/Kudryavtseva2019Slides.pdf presentation]
[https://github.com/Intelligent-Systems-Phystech/2019-Project-18/raw/master/doc/Kudryavtseva2019Slides.pdf presentation]
-
|[[Участник:Isachenkoroma|Roman Isachenko]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Isachenkoroma Roman Isachenko]
|Nechepurenko Ivan
|Nechepurenko Ivan
[https://docs.google.com/document/d/1i6WuDNEozojFYMkJHu5DcaItE5qrsr_Tt3ubBE298DQ/edit review]
[https://docs.google.com/document/d/1i6WuDNEozojFYMkJHu5DcaItE5qrsr_Tt3ubBE298DQ/edit review]
-
|
 
|-
|-
-
|[[Участник:Loginov-ra|Loginov Roman]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Loginov-ra Loginov Roman]
|Multi-simulation as a universal way to describe a general sample
|Multi-simulation as a universal way to describe a general sample
|[https://github.com/Intelligent-Systems-Phystech/2019-Project-28/tree/master/code code]
|[https://github.com/Intelligent-Systems-Phystech/2019-Project-28/tree/master/code code]
Line 1215: Line 1841:
[https://github.com/Intelligent-Systems-Phystech/2019-Project-28/raw/master/report/Loginov2019MultimodellingTime.pdf presentation]
[https://github.com/Intelligent-Systems-Phystech/2019-Project-28/raw/master/report/Loginov2019MultimodellingTime.pdf presentation]
[https://www.youtube.com/watch?v=GCl7VSAz-Xg video]
[https://www.youtube.com/watch?v=GCl7VSAz-Xg video]
-
|[[Участник:Aduenko|Aduenko A. A.]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Aduenko Alexander Aduenko]
-
|Макаров Михаил [http://www.machinelearning.ru/wiki/images/9/92/Loginov2019Project28_Review.rtf review]
+
|Makarov Mikhail [http://www.machinelearning.ru/wiki/images/9/92/Loginov2019Project28_Review.rtf review]
-
|
+
|-
|-
-
|[[Участник:Makarov.mv|Mikhail Makarov]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Makarov.mv Mikhail Makarov]
|Location determination by accelerometer signals
|Location determination by accelerometer signals
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-26/tree/master/code code]
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-26/tree/master/code code]
Line 1226: Line 1851:
[https://github.com/Intelligent-Systems-Phystech/2018-Project-26/raw/master/pres/Project26presentation.pdf presentation]
[https://github.com/Intelligent-Systems-Phystech/2018-Project-26/raw/master/pres/Project26presentation.pdf presentation]
[https://www.youtube.com/watch?v=OEe9xmoNUNQ video]
[https://www.youtube.com/watch?v=OEe9xmoNUNQ video]
-
|[[Участник:Anastasiya|Anastasia Motrenko]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Anastasiya Anastasia Motrenko]
|Cherepkov Anton: [https://github.com/Intelligent-Systems-Phystech/2018-Project-26/raw/master/Makarov2019Project26/Makarov2019_review.pdf review]
|Cherepkov Anton: [https://github.com/Intelligent-Systems-Phystech/2018-Project-26/raw/master/Makarov2019Project26/Makarov2019_review.pdf review]
-
|
 
|-
|-
-
|[[Участник:Alex-kozinov|Kozinov Alexeyй]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Alex-kozinov Kozinov Alexey]
-
|Task of finding characters in images
+
|The problem of finding characters in images
|[https://docs.google.com/document/d/1P_osIW236MTBPe_aMJUI-EEHgUhheQR9bqlKCN97e8M/edit?usp=sharing LinkReview]
|[https://docs.google.com/document/d/1P_osIW236MTBPe_aMJUI-EEHgUhheQR9bqlKCN97e8M/edit?usp=sharing LinkReview]
[https://github.com/Intelligent-Systems-Phystech/2019-Project-45/raw/master/Kozinov2019Project45/Kozinov2019Project45.pdf paper]
[https://github.com/Intelligent-Systems-Phystech/2019-Project-45/raw/master/Kozinov2019Project45/Kozinov2019Project45.pdf paper]
Line 1238: Line 1862:
D. Lemtyuzhnikova
D. Lemtyuzhnikova
|Gracheva Anastasia [https://github.com/Intelligent-Systems-Phystech/2019-Project-15/raw/master/review.pdf review]
|Gracheva Anastasia [https://github.com/Intelligent-Systems-Phystech/2019-Project-15/raw/master/review.pdf review]
-
|
 
|-
|-
-
|[[Участник:Buchnev.valentin|Buchnev Valentin]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Buchnev.valentin Buchnev Valentin]
|Early prediction of sufficient sample size for a generalized linear model.
|Early prediction of sufficient sample size for a generalized linear model.
|[https://docs.google.com/document/d/1-xpsWSbI-hlX8PQXdVZ5gMOQC03LH0oM8u4dpTDMSKs/edit?usp=sharing LinkReview]
|[https://docs.google.com/document/d/1-xpsWSbI-hlX8PQXdVZ5gMOQC03LH0oM8u4dpTDMSKs/edit?usp=sharing LinkReview]
Line 1246: Line 1869:
[https://github.com/Intelligent-Systems-Phystech/2019-Project-44/ code] [https://github.com/Intelligent-Systems-Phystech/2019-Project-44/raw/master/report/Buchnev2019Project44presentation.pdf presentation]
[https://github.com/Intelligent-Systems-Phystech/2019-Project-44/ code] [https://github.com/Intelligent-Systems-Phystech/2019-Project-44/raw/master/report/Buchnev2019Project44presentation.pdf presentation]
[https://www.youtube.com/watch?v=0SJL6Xx5VnU video]
[https://www.youtube.com/watch?v=0SJL6Xx5VnU video]
-
|[[Участник:Andriygav|Grabovoi A.V.]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Andriygav Grabovoi Andrey]
-
|reviewer
+
-
|
+
|-
|-
-
|[[Участник: Ivan.nechepurenco|Nechepurenko Ivan]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Ivan.nechepurenco Nechepurenko Ivan]
|Multisimulation, privileged training
|Multisimulation, privileged training
|[https://github.com/Intelligent-Systems-Phystech/2019-Project-48/tree/master/code code],
|[https://github.com/Intelligent-Systems-Phystech/2019-Project-48/tree/master/code code],
Line 1256: Line 1877:
[https://docs.google.com/document/d/1DJNwFfFXCipPictxTUWd8dBfj_Zv6zrfp86L5p_cfTI/edit?usp=sharing LinkReview]
[https://docs.google.com/document/d/1DJNwFfFXCipPictxTUWd8dBfj_Zv6zrfp86L5p_cfTI/edit?usp=sharing LinkReview]
[https://github.com/Intelligent-Systems-Phystech/2019-Project-48/raw/master/slides/Nechepurenco2019.pdf presentation]
[https://github.com/Intelligent-Systems-Phystech/2019-Project-48/raw/master/slides/Nechepurenco2019.pdf presentation]
-
|[[Участник:Neychev|R. G. Neichev]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Neychev R. G. Neichev]
|Kudryavtseva Polina
|Kudryavtseva Polina
-
|
 
|-
|-
-
|[[Участник: Gracheva.as|Gracheva Anastasia]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Gracheva.as Gracheva Anastasia]
|Estimation of binding energy of protein and small molecules
|Estimation of binding energy of protein and small molecules
|[https://github.com/Intelligent-Systems-Phystech/2019-Project-15 code]
|[https://github.com/Intelligent-Systems-Phystech/2019-Project-15 code]
Line 1269: Line 1889:
|Sergei Grudinin,
|Sergei Grudinin,
Maria Kadukova
Maria Kadukova
-
|reviewer
 
-
|
 
|-
|-
-
|[[Участник: Anthonycherepkov|Cherepkov Anton]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Anthonycherepkov Cherepkov Anton]
|Privileged learning in the problem of iris boundary approximation
|Privileged learning in the problem of iris boundary approximation
|[https://github.com/Intelligent-Systems-Phystech/2019-Project-7/raw/master/report/Cherepkov_2019_Iris_circle_problem.pdf paper], [https://github.com/Intelligent-Systems-Phystech/2019-Project-7/raw/master/slides/Cherepkov_2019_Iris_circle_problem.pdf slides], [https://github.com/Intelligent-Systems-Phystech/2019-Project-7/tree/master/code code], [https://docs.google.com/document/d/140k6Qrf63iOHUqHcG9IO8cCa1PXEypY5zgboQ3S0LoU/edit?usp=sharing LinkReview]
|[https://github.com/Intelligent-Systems-Phystech/2019-Project-7/raw/master/report/Cherepkov_2019_Iris_circle_problem.pdf paper], [https://github.com/Intelligent-Systems-Phystech/2019-Project-7/raw/master/slides/Cherepkov_2019_Iris_circle_problem.pdf slides], [https://github.com/Intelligent-Systems-Phystech/2019-Project-7/tree/master/code code], [https://docs.google.com/document/d/140k6Qrf63iOHUqHcG9IO8cCa1PXEypY5zgboQ3S0LoU/edit?usp=sharing LinkReview]
[https://www.youtube.com/watch?v=cI3x-vjOAIo video]
[https://www.youtube.com/watch?v=cI3x-vjOAIo video]
-
|[[Участник:Neychev|R. G. Neichev]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Neychev R. G. Neichev]
-
|[[Участник: Mlepekhin|Lepekhin Mikhail]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Mlepekhin Lepekhin Mikhail]
[https://github.com/Intelligent-Systems-Phystech/2019-Project-7/raw/master/review/Cherepkov2019_review.pdf preliminary review]
[https://github.com/Intelligent-Systems-Phystech/2019-Project-7/raw/master/review/Cherepkov2019_review.pdf preliminary review]
-
|
 
|-
|-
-
|[[Участник: Mlepekhin|Lepekhin Mikhail]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Mlepekhin Lepekhin Mikhail]
|Creation of ranking models for information retrieval systems. Algorithm for Predicting the Structure of Locally Optimal Models
|Creation of ranking models for information retrieval systems. Algorithm for Predicting the Structure of Locally Optimal Models
|[https://github.com/Intelligent-Systems-Phystech/2019-Project-27/blob/master/code code]
|[https://github.com/Intelligent-Systems-Phystech/2019-Project-27/blob/master/code code]
Line 1289: Line 1906:
[https://www.youtube.com/watch?v=AL6Q7u3daPw video]
[https://www.youtube.com/watch?v=AL6Q7u3daPw video]
|Andrey Kulunchakov
|Andrey Kulunchakov
-
|[[Участник:Ninavishn|Vishnyakova Nina]], [https://github.com/Intelligent-Systems-Phystech/2019-Project-41/raw/master/report/Рецензия%20на%20статью%20Лепехина%20Михаила.pdf review]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Ninavishn Vishnyakova Nina], [https://github.com/Intelligent-Systems-Phystech/2019-Project-41/raw/master/report/Рецензия%20на%20статью%20Лепехина%20Михаила.pdf review]
-
|
+
|-
|-
-
|[[Участник: Gridasovii|Gridasov Ilya]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Gridasovii Gridasov Ilya]
|Automatic construction of a neural network of optimal complexity
|Automatic construction of a neural network of optimal complexity
|[https://docs.google.com/document/d/1RcUfc9dKu-hO9r9sqS9hXUu7QofHeDfvHTuJqM8BgU4/edit?usp=sharing LinkReview]
|[https://docs.google.com/document/d/1RcUfc9dKu-hO9r9sqS9hXUu7QofHeDfvHTuJqM8BgU4/edit?usp=sharing LinkReview]
Line 1298: Line 1914:
[https://github.com/Intelligent-Systems-Phystech/2019-Project-11/raw/master/Gridasov2019Project11/presentation/Gridasov2019Project11Presentation.pdf Presentation]
[https://github.com/Intelligent-Systems-Phystech/2019-Project-11/raw/master/Gridasov2019Project11/presentation/Gridasov2019Project11Presentation.pdf Presentation]
[https://github.com/Intelligent-Systems-Phystech/2019-Project-11/tree/master/Gridasov2019Project11/code code]
[https://github.com/Intelligent-Systems-Phystech/2019-Project-11/tree/master/Gridasov2019Project11/code code]
-
|O. Yu. Bakhteev, Strizhov V.V.
+
|O. Yu. Bakhteev, Strijov V.V.
-
|[[Участник:Buchnev.valentin|Buchnev Valentin]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Buchnev.valentin Buchnev Valentin]
-
|
+
|-
|-
-
|[[Участник: Telenkov-Dmitry|Telenkov Dmitry]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Telenkov-Dmitry Telenkov Dmitry]
|Brain signal decoding and intention prediction
|Brain signal decoding and intention prediction
|[https://docs.google.com/document/d/1pTzCafRueWf1hTYCY2uwatNEAFia_nbZSlsgYGYoWnY LinkReview]
|[https://docs.google.com/document/d/1pTzCafRueWf1hTYCY2uwatNEAFia_nbZSlsgYGYoWnY LinkReview]
Line 1310: Line 1925:
[https://github.com/Intelligent-Systems-Phystech/2019-Project-49/blob/master/report/Experiment.ipynb code]
[https://github.com/Intelligent-Systems-Phystech/2019-Project-49/blob/master/report/Experiment.ipynb code]
|Andrey Zadayanchuk
|Andrey Zadayanchuk
-
|reviewer
 
-
|
 
|-
|-
|}
|}
-
===Task 18 ===
+
===18. 2019===
-
* '''Name:''' Forecasting intentions. Building an optimal signal decoding model for modeling a brain-computer interface.
+
* '''Title:''' Forecasting intentions. Building an optimal signal decoding model for modeling a brain-computer interface.
-
* '''Task''': The Brain Computer Interface (BCI) allows you to help people with disabilities regain their mobility. According to the available description of the device signal, it is necessary to simulate the behavior of the subject.
+
* '''Problem:''' A brain-computer interface (BCI) helps people with disabilities regain their mobility. Given the available description of the device signal, it is necessary to model the behavior of the subject.
* '''Data:''' Data sets of ECoG/EEG brain signals.
* '''Data:''' Data sets of ECoG/EEG brain signals.
-
* '''References:''':
+
* '''References:'''
#* Motrenko A.P., Strijov V.V. Multi-way feature selection for ECoG-based brain-computer Interface // Expert systems with applications. - 2018.
#* Motrenko A.P., Strijov V.V. Multi-way feature selection for ECoG-based brain-computer Interface // Expert systems with applications. - 2018.
* '''Basic algorithm''': It is proposed to compare with the partial least squares algorithm.
* '''Basic algorithm''': It is proposed to compare with the partial least squares algorithm.
* '''Solution:''' In this work, it is proposed to build a single system that solves the problem of signal decoding. As stages of building such a system, it is proposed to solve the problems of data preprocessing, feature space extraction, dimensionality reduction and selection of a model of optimal complexity. It is proposed to use the tensor version of PLS with feature selection.
* '''Solution:''' In this work, it is proposed to build a single system that solves the problem of signal decoding. As stages of building such a system, it is proposed to solve the problems of data preprocessing, feature space extraction, dimensionality reduction and selection of a model of optimal complexity. It is proposed to use the tensor version of PLS with feature selection.
* '''Novelty:''' In the formulation of the problem, the complex nature of the signal is taken into account: a continuous trajectory of movement, the presence of discrete structural variables (fingers or joint movement), the presence of continuous variables (position of a finger or limb).
* '''Novelty:''' In the formulation of the problem, the complex nature of the signal is taken into account: a continuous trajectory of movement, the presence of discrete structural variables (fingers or joint movement), the presence of continuous variables (position of a finger or limb).
-
* '''Authors:''' Strizhov V.V., Tetiana Aksenova, consultant – Roman Isachenko
+
* '''Authors:''' Strijov V.V., Tetiana Aksenova, consultant – Roman Isachenko
-
=== Task 41 ===
+
===41. 2019===
-
* '''Name:''' Optimal Approximation of Non-linear Power Flow Problem
+
* '''Title:''' Optimal Approximation of Non-linear Power Flow Problem
-
* '''Task''': Our goal is to approximate the solution of non-linear non-convex optimal power flow problem by solving a sequence of convex optimization problems (aka trust region approach). On this way we propose to compare various approaches for an approximate solution of this problem with adaptive approximation of the power flow non-linearities with a sequence of quadratic and/or piece-wise linear functions
+
* '''Problem:''' Our goal is to approximate the solution of the non-linear, non-convex optimal power flow problem by solving a sequence of convex optimization problems (a trust-region approach). To this end, we propose to compare various approaches to the approximate solution of this problem, with adaptive approximation of the power flow non-linearities by a sequence of quadratic and/or piecewise linear functions.
* '''Data:''' Matpower module from MATLAB contains all necessary test cases. Start considering IEEE 57 bus case.
* '''Data:''' Matpower module from MATLAB contains all necessary test cases. Start considering IEEE 57 bus case.
-
* '''References:''':
+
* '''References:'''
*# Molzahn, D. K., & Hiskens, I. A. (2019). A survey of relaxations and approximations of the power flow equations. Foundations and Trends in Electric Energy Systems, 4(1-2), 1-221. https://www.nowpublishers.com/article/DownloadSummary/EES-012
*# Molzahn, D. K., & Hiskens, I. A. (2019). A survey of relaxations and approximations of the power flow equations. Foundations and Trends in Electric Energy Systems, 4(1-2), 1-221. https://www.nowpublishers.com/article/DownloadSummary/EES-012
*# The QC Relaxation: A Theoretical and Computational Study on Optimal Power Flow. Carleton Coffrin ; Hassan L. Hijazi; Pascal Van Hentenryck https://ieeexplore.ieee.org/abstract/document/7271127/
*# The QC Relaxation: A Theoretical and Computational Study on Optimal Power Flow. Carleton Coffrin ; Hassan L. Hijazi; Pascal Van Hentenryck https://ieeexplore.ieee.org/abstract/document/7271127/
*# Convex Relaxations in Power System Optimization: A Brief Introduction. Carleton Coffrin and Line Roald. https://arxiv.org/pdf/1807.07227.pdf
*# Convex Relaxations in Power System Optimization: A Brief Introduction. Carleton Coffrin and Line Roald. https://arxiv.org/pdf/1807.07227.pdf
*# Optimal Adaptive Linearizations of the AC Power Flow Equations. Sidhant Misra, Daniel K. Molzahn, and Krishnamurthy Dvijotham https://molzahn.github.io/pubs/misra_molzahn_dvijotham-adaptive_linearizations2018.pdf
*# Optimal Adaptive Linearizations of the AC Power Flow Equations. Sidhant Misra, Daniel K. Molzahn, and Krishnamurthy Dvijotham https://molzahn.github.io/pubs/misra_molzahn_dvijotham-adaptive_linearizations2018.pdf
-
* '''Basic algorithm:''' A set of algorithms described in [1] should be considered to compare with, details behind the proposed method would be shared by the consultant (a draft of the paper)
+
* '''Base algorithm:''' A set of algorithms described in [1] should be considered to compare with, details behind the proposed method would be shared by the consultant (a draft of the paper)
* '''Solution:''' To assess the quality of the solution, we propose to compare it with the solutions given by IPOPT and by numerous relaxations, and to do some reverse engineering with respect to our method.
* '''Solution:''' To assess the quality of the solution, we propose to compare it with the solutions given by IPOPT and by numerous relaxations, and to do some reverse engineering with respect to our method.
* '''Novelty:''' OPF is a truly hot topic in power systems and is of high interest to the discrete optimization community (as a general QCQP problem). Any advance in this area is of high interest to the community.
* '''Novelty:''' OPF is a truly hot topic in power systems and is of high interest to the discrete optimization community (as a general QCQP problem). Any advance in this area is of high interest to the community.
Line 1341: Line 1954:
* '''Notes''': the problem has both the computational and the theoretical focuses, so 2 students are ok to work on this topic
* '''Notes''': the problem has both the computational and the theoretical focuses, so 2 students are ok to work on this topic
-
=== Task 2 ===
+
===2. 2019===
-
* '''Name:''' Investigation of reference objects in the problem of metric classification of time series.
+
* '''Title:''' Investigation of reference objects in the problem of metric classification of time series.
-
* '''Task:''' The DTW function is the distance between two time series that can be non-linearly warped relative to each other. It looks for the best alignment between two objects, so it can be used in a metric object classification problem. One of the methods for solving the problem of metric classification is measuring distances to reference objects and using the vector of these distances as an indicative description of the object. The DBA method is an algorithm for constructing centroids (reference objects) for time series based on the DTW distance. When plotting the distance between the time series and the centroid, different pairs of values (eg peak values) are more specific to one of the classes, and the impact of such coincidences on the distance value should be higher.
+
* '''Problem description:''' The DTW function is a distance between two time series that can be non-linearly warped relative to each other. It looks for the best alignment between two objects, so it can be used in a metric classification problem. One method for solving the metric classification problem is to measure distances to reference objects and use the vector of these distances as the feature description of the object. The DBA method is an algorithm for constructing centroids (reference objects) for time series based on the DTW distance. When computing the distance between a time series and a centroid, some pairs of values (e.g. peak values) are more specific to one of the classes, and the impact of such matches on the distance value should be higher.
-
It is necessary to explore various ways of constructing reference objects, as well as determining their optimal number. The criterion is the quality of the metric classifier in the task. In the DBA method, for each centroid, it is proposed to create a weight vector that demonstrates the "significance" of the measurements of the centroid, and use it in the modified weighted-DTW distance function.
+
It is necessary to explore various ways of constructing reference objects, as well as determining their optimal number. The criterion is the quality of the metric classifier in this problem. In the DBA method, for each centroid, it is proposed to create a weight vector that reflects the "significance" of the centroid's measurements, and to use it in the modified weighted-DTW distance function.
* '''Data:''' The data describes 6 classes of time series from the mobile phone's accelerometer. https://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group274/Goncharov2015MetricClassification/data/
* '''Data:''' The data describes 6 classes of time series from the mobile phone's accelerometer. https://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group274/Goncharov2015MetricClassification/data/
* '''References:'''
* '''References:'''
Line 1350: Line 1963:
*# DBA: https://hal.sorbonne-universite.fr/hal-01630288/document
*# DBA: https://hal.sorbonne-universite.fr/hal-01630288/document
*# weighted DTW: http://www.mathnet.ru/php/archive.phtml?wshow=paper&jrnid=ia&paperid=414&option_lang=rus
*# weighted DTW: http://www.mathnet.ru/php/archive.phtml?wshow=paper&jrnid=ia&paperid=414&option_lang=rus
-
* '''Basic algorithm:''' Implement basic methods:
+
* '''Base algorithm:''' Implement basic methods:
*# Selection of a subset of training sample objects as reference
*# Selection of a subset of training sample objects as reference
*# Pre-processing of anomalous objects
*# Pre-processing of anomalous objects
Line 1359: Line 1972:
Literature research and a combination of up-to-date methods.
Literature research and a combination of up-to-date methods.
* '''Novelty:''' There has not been a comprehensive study of various methods of constructing centroids and reference elements along with the choice of their optimal number.
* '''Novelty:''' There has not been a comprehensive study of various methods of constructing centroids and reference elements along with the choice of their optimal number.
-
* '''Authors:''' Alexey Goncharov - consultant, Expert, Strizhov V.V. - Expert
+
* '''Authors:''' Alexey Goncharov - consultant, Expert, Strijov V.V. - Expert
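The weighted-DTW idea in the problem description above can be sketched as follows. This is a minimal illustration under stated assumptions, not the formulation from the cited weighted DTW paper: the function name is hypothetical, the local cost is a squared difference, and `weights[j]` simply scales the cost of aligning any sample to the j-th centroid point.

```python
def weighted_dtw(series, centroid, weights=None):
    """DTW distance between a series and a centroid.

    weights[j] scales the local cost of aligning any sample to centroid[j];
    with weights=None this is plain DTW with a squared-difference cost.
    """
    n, m = len(series), len(centroid)
    if weights is None:
        weights = [1.0] * m
    inf = float("inf")
    # cost[i][j]: best cumulative cost of aligning series[:i] with centroid[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = weights[j - 1] * (series[i - 1] - centroid[j - 1]) ** 2
            cost[i][j] = local + min(
                cost[i - 1][j],      # series advances, centroid point repeats
                cost[i][j - 1],      # centroid advances, series point repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# A shifted peak aligns at zero cost, which plain Euclidean distance would miss.
shifted = weighted_dtw([0, 0, 1, 0], [0, 1, 0, 0])
# Raising the weight of the centroid's peak makes a peak mismatch cost more.
plain = weighted_dtw([0, 2, 0], [0, 1, 0])
peak_weighted = weighted_dtw([0, 2, 0], [0, 1, 0], weights=[1.0, 3.0, 1.0])
```

The vector of such distances from a series to every centroid can then serve as the feature description passed to a classifier.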
-
===Task 7 ===
+
===7. 2019===
-
* '''Name:''' Privileged learning in the iris boundary approximation problem
+
* '''Title:''' Privileged learning in the iris boundary approximation problem
-
* '''Task''': Based on the image of the human eye, determine the circles approximating the inner and outer border of the iris.
+
* '''Problem:''' Based on the image of the human eye, determine the circles approximating the inner and outer border of the iris.
* '''Data:''' Bitmap monochrome images, typical size 640*480 pixels (however other sizes are possible)[http://www.bath.ac.uk/elec-eng/research/sipg/irisweb/ ], [http://www.cb-sr.ia.ac.cn/IrisDatabase.htm].
* '''Data:''' Bitmap monochrome images, typical size 640*480 pixels (however other sizes are possible)[http://www.bath.ac.uk/elec-eng/research/sipg/irisweb/ ], [http://www.cb-sr.ia.ac.cn/IrisDatabase.htm].
-
* '''References:''':
+
* '''References:'''
-
** Aduenko A.A. Selection of multi-models in Tasks classification (supervisor Strizhov V.V.). Moscow Institute of Physics and Technology, 2017. [http://www.frccsc.ru/sites/default/files/docs/ds/002-073-05/diss/11-aduenko/11-Aduenko_main.pdf?626]
+
*# Aduenko A.A. Selection of multi-models in classification problems (supervisor Strijov V.V.). Moscow Institute of Physics and Technology, 2017. [http://www.frccsc.ru/sites/default/files/docs/ds/002-073-05/diss/11-aduenko/11-Aduenko_main.pdf?626]
-
** K.A. Gankin, A.N. Gneushev, I.A. Matveev Segmentation of the iris image based on approximate methods with subsequent refinements // Izvestiya RAN. Theory and control systems, 2014, no. 2, p. 78–92.
+
*# K.A. Gankin, A.N. Gneushev, I.A. Matveev Segmentation of the iris image based on approximate methods with subsequent refinements // Izvestiya RAN. Theory and control systems, 2014, no. 2, p. 78–92.
-
** Duda, R. O. Use of the Hough transformation to detect lines and curves in pictures / R. O. Duda, P. E. Hart // Communications of the ACM. 1972 Vol. 15, no. 1.Pp.
+
*# Duda, R. O. Use of the Hough transformation to detect lines and curves in pictures / R. O. Duda, P. E. Hart // Communications of the ACM. 1972 Vol. 15, no. 1.Pp.
* '''Basic algorithm''': Efimov Yury. Search for the outer and inner boundaries of the iris in the eye image using the paired gradient method, 2015.
* '''Basic algorithm''': Efimov Yury. Search for the outer and inner boundaries of the iris in the eye image using the paired gradient method, 2015.
* '''Solution:''' See [[Media:Iris_circle_problem.pdf | iris_circle_problem.pdf]]
* '''Solution:''' See [[Media:Iris_circle_problem.pdf | iris_circle_problem.pdf]]
* '''Novelty:''' A fast non-enumerative algorithm for approximating boundaries using linear multimodels is proposed. Additionally, capsule neural networks.
* '''Novelty:''' A fast non-enumerative algorithm for approximating boundaries using linear multimodels is proposed. Additionally, capsule neural networks.
-
* '''consultant''': Radoslav Neichev (by Strizhov V.V., Expert Matveev I.A.)
+
* '''consultant''': Radoslav Neichev (by Strijov V.V., Expert Matveev I.A.)
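To make the circle approximation step concrete, here is a minimal sketch of an algebraic least-squares circle fit. It is an illustration only: it is neither the paired gradient method of the base algorithm nor the multimodel solution, and it assumes that candidate boundary points have already been extracted from the image. The function names and the synthetic points are hypothetical.

```python
import math


def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for k in range(col, 4):
                M[r][k] -= f * M[col][k]
    x = [0.0] * 3
    for i in (2, 1, 0):
        s = M[i][3] - sum(M[i][k] * x[k] for k in range(i + 1, 3))
        x[i] = s / M[i][i]
    return x


def fit_circle(points):
    """Fit x^2 + y^2 = 2*a*x + 2*b*y + c in the least-squares sense.

    Returns (a, b, r): the center (a, b) and the radius r = sqrt(c + a^2 + b^2).
    """
    # Accumulate the normal equations for unknowns (a, b, c)
    # with design rows (2x, 2y, 1) and targets x^2 + y^2.
    A = [[0.0] * 3 for _ in range(3)]
    rhs = [0.0] * 3
    for x, y in points:
        row = (2.0 * x, 2.0 * y, 1.0)
        z = x * x + y * y
        for i in range(3):
            rhs[i] += row[i] * z
            for j in range(3):
                A[i][j] += row[i] * row[j]
    a, b, c = solve3(A, rhs)
    return a, b, math.sqrt(c + a * a + b * b)


# Synthetic boundary points on a circle inside a 640*480 image.
points = [
    (320 + 100 * math.cos(2 * math.pi * k / 12),
     240 + 100 * math.sin(2 * math.pi * k / 12))
    for k in range(12)
]
cx, cy, r = fit_circle(points)
```

The fit is linear in its unknowns, so it needs no enumeration over candidate centers and radii; it fails on collinear points, where the normal equations become singular.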
-
===Task 44 ===
+
===44. 2019===
*'''Name:''' Early prediction of sufficient sample size for a generalized linear model.
*'''Name:''' Early prediction of sufficient sample size for a generalized linear model.
-
*'''Task''': The problem of designing an experiment is being investigated. The Task of estimating a sufficient sample size according to the data is solved. The sample is assumed to be simple. It is described by an adequate model. Otherwise, the sample is generated by a fixed probabilistic model from a known class of models. The sample size is considered sufficient if the model is restored with sufficient confidence. It is required, knowing the model, to estimate a sufficient sample size at the early stages of data collection.
+
*'''Problem''': The problem of experiment design is investigated. The problem of estimating a sufficient sample size from the data is solved. The sample is assumed to be simple. It is described by an adequate model. In other words, the sample is generated by a fixed probabilistic model from a known class of models. The sample size is considered sufficient if the model is recovered with sufficient confidence. Given the model, it is required to estimate a sufficient sample size at the early stages of data collection.
*'''Data:''' For the computational experiment, it is proposed to use classical samples from the UCI repository. Link to samples https://github.com/ttgadaev/SampleSize/tree/master/datasets
*'''Data:''' For the computational experiment, it is proposed to use classical samples from the UCI repository. Link to samples https://github.com/ttgadaev/SampleSize/tree/master/datasets
-
*'''References:''':
+
*'''References:'''
*# [Overview of methods for estimating sample size]
*# [Overview of methods for estimating sample size]
*# http://svn.code.sf.net/p/mlalgorithms/code/PhDThesis/.
*# http://svn.code.sf.net/p/mlalgorithms/code/PhDThesis/.
Line 1386: Line 1999:
We are trying to approximate the dependence of the average value of log-likelihood and its variance on the sample size.
We are trying to approximate the dependence of the average value of log-likelihood and its variance on the sample size.
*'''Solution:''' The methods described in the review are asymptotic or require a deliberately large sample size. The new method should predict the sufficient sample size in the early stages of experiment design, i.e. when data is scarce.
*'''Solution:''' The methods described in the review are asymptotic or require a deliberately large sample size. The new method should predict the sufficient sample size in the early stages of experiment design, i.e. when data is scarce.
-
*'''Authors:''' Grabovoi A.V. (consultant), Gadaev T. T. Strizhov V.V. (Experts)
+
*'''Authors:''' Grabovoi A.V. (consultant), Gadaev T. T., Strijov V.V. (Experts)
-
* Note: to determine the simplicity of the sample, a new definition of complexity is proposed ([http://www.machinelearning.ru/wiki/images/3/37/Ivanychev18BachelorThesis_%28merged%29.pdf Sergey Ivanychev]). This is a separate work, +1 Task 44a (? Katruza).
+
* Note: to determine the simplicity of the sample, a new definition of complexity is proposed ([http://www.machinelearning.ru/wiki/images/3/37/Ivanychev18BachelorThesis_%28merged%29.pdf Sergey Ivanychev]). This is a separate work, +1 Problem 44a (? Katruza).
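The dependence of the average log-likelihood and its variance on the sample size, mentioned in problem 44, can be illustrated with a small bootstrap experiment. This is only a sketch under stated assumptions: a one-dimensional Gaussian model is swapped in for the generalized linear model, and the names <code>gaussian_loglik</code> and <code>loglik_curve</code> as well as the parameters <code>n_boot</code> and <code>seed</code> are hypothetical and do not come from the referenced code.

```python
import math
import random
import statistics


def gaussian_loglik(data, mu, sigma2):
    """Log-likelihood of data under N(mu, sigma2)."""
    n = len(data)
    squared_error = sum((x - mu) ** 2 for x in data)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - squared_error / (2 * sigma2)


def loglik_curve(data, sizes, n_boot=200, seed=0):
    """For each subsample size m, return (mean, variance) of the per-object
    log-likelihood of the available data under a model fitted on a bootstrap
    subsample of size m. Assumption: Gaussian model instead of a GLM."""
    rng = random.Random(seed)
    curve = {}
    for m in sizes:
        values = []
        for _ in range(n_boot):
            sub = [rng.choice(data) for _ in range(m)]
            mu = statistics.fmean(sub)
            # Floor the variance so a degenerate subsample cannot divide by zero.
            sigma2 = max(statistics.pvariance(sub, mu), 1e-12)
            values.append(gaussian_loglik(data, mu, sigma2) / len(data))
        curve[m] = (statistics.fmean(values), statistics.pvariance(values))
    return curve
```

One possible use is to compute the curve for a few small sizes and extrapolate it, stopping data collection once the variance falls below a chosen threshold. The threshold itself is a design choice that the problem statement does not fix.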
-
===Task 15 ===
+
===15. 2019===
-
* '''Name:''' Formulation and solution of an optimization problem combining classification and regression to estimate the binding energy of a protein and small molecules. Task description [https://www.overleaf.com/read/rjdnyyxpdkyj]
+
* '''Title:''' Formulation and solution of an optimization problem combining classification and regression to estimate the binding energy of a protein and small molecules. The problem description [https://www.overleaf.com/read/rjdnyyxpdkyj]
-
* '''Task''': From a bioinformatics point of view, the Task is to estimate the free energy of protein binding to a small molecule (ligand): the best ligand in its best position has the ''lowest free energy'' of interaction with the protein. (Following a large text, see the file at the link above.)
+
* '''Problem:''' From a bioinformatics point of view, the problem is to estimate the free energy of protein binding to a small molecule (ligand): the best ligand in its best position has the ''lowest free energy'' of interaction with the protein. (A longer description follows; see the file at the link above.)
* '''Data:'''
* '''Data:'''
-
** Data for binary classification. Approximately 12,000 protein-ligand complexes: for each of them there is 1 native position and 18 non-native ones. The main descriptors are histograms of distributions of distances between different atoms of the protein and ligand, the dimension of the vector of descriptors is ~ 20,000. In the case of continued research and publication in a specialized journal, the set of descriptors can be expanded. The data will be provided as binary files with a python script to read.
+
*# Data for binary classification. Approximately 12,000 protein-ligand complexes: for each of them there is 1 native position and 18 non-native ones. The main descriptors are histograms of distributions of distances between different atoms of the protein and ligand, the dimension of the vector of descriptors is ~ 20,000. In the case of continued research and publication in a specialized journal, the set of descriptors can be expanded. The data will be provided as binary files with a python script to read.
-
** Data for regression. For each of the presented complexes, the value of the quantity is known, which can be interpreted as the binding energy.
+
*# Data for regression. For each of the presented complexes, the value of the quantity is known, which can be interpreted as the binding energy.
-
* '''References:''':
+
* '''References:'''
-
** SVM [http://cs229.stanford.edu/notes/cs229-notes3.pdf]
+
*# SVM [http://cs229.stanford.edu/notes/cs229-notes3.pdf]
-
** Ridge Regression [http://scikit-learn.org/stable/modules/linear_model.html#ridge-regression]
+
*# Ridge Regression [http://scikit-learn.org/stable/modules/linear_model.html#ridge-regression]
-
** [https://alex.smola.org/papers/2003/SmoSch03b.pdf] (section 1)
+
*# [https://alex.smola.org/papers/2003/SmoSch03b.pdf] (section 1)
* '''Basic algorithm''': [https://hal.inria.fr/hal-01591154/] In the classification problem, we used an algorithm similar to linear SVM, whose relationship with the energy estimate is beyond the scope of the classification problem, described in the above article. Various loss functions can be used in a regression problem.
* '''Basic algorithm''': [https://hal.inria.fr/hal-01591154/] In the classification problem, we used an algorithm similar to linear SVM, whose relationship with the energy estimate is beyond the scope of the classification problem, described in the above article. Various loss functions can be used in a regression problem.
* '''Solution:''' It is necessary to connect the previously used optimization problem with the regression problem and solve it using standard methods. Cross-validation will be used to check the operation of the algorithm. There is a separate test set consisting of (1) 195 complexes of proteins and ligands, for which it is necessary to find the best ligand pose (the algorithm for obtaining ligand positions differs from that used in training), (2) complexes of proteins and ligands, for whose native poses it is necessary to predict the binding energy, and (3) 65 proteins for which the most strongly binding ligand is to be found.
* '''Solution:''' It is necessary to connect the previously used optimization problem with the regression problem and solve it using standard methods. Cross-validation will be used to check the operation of the algorithm. There is a separate test set consisting of (1) 195 complexes of proteins and ligands, for which it is necessary to find the best ligand pose (the algorithm for obtaining ligand positions differs from that used in training), (2) complexes of proteins and ligands, for whose native poses it is necessary to predict the binding energy, and (3) 65 proteins for which the most strongly binding ligand is to be found.
Line 1404: Line 2017:
* '''Authors''' Sergei Grudinin, Maria Kadukova
* '''Authors''' Sergei Grudinin, Maria Kadukova
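One way to read "connecting the classification problem with the regression problem" in problem 15 is a single objective with shared linear weights. The sketch below is an assumption and not the authors' formulation: it sums a hinge loss over native/non-native poses, a squared loss over known binding energies, and a ridge penalty. The function <code>joint_objective</code> and the coefficients <code>c_cls</code>, <code>c_reg</code>, <code>lam</code> are hypothetical.

```python
def dot(w, x):
    """Inner product of two equally long sequences."""
    return sum(wi * xi for wi, xi in zip(w, x))


def joint_objective(w, cls_x, cls_y, reg_x, reg_y, c_cls=1.0, c_reg=1.0, lam=0.1):
    """Value of a combined classification + regression objective.

    cls_x, cls_y: descriptors and labels (+1 native, -1 non-native pose).
    reg_x, reg_y: descriptors and known binding energies.
    Assumption: both parts share the same weight vector w.
    """
    hinge = sum(max(0.0, 1.0 - y * dot(w, x)) for x, y in zip(cls_x, cls_y))
    squared = sum((dot(w, x) - t) ** 2 for x, t in zip(reg_x, reg_y))
    ridge = lam * dot(w, w)
    return c_cls * hinge + c_reg * squared + ridge
```

The objective is convex in <code>w</code>, so it could be minimized with standard methods such as subgradient descent. The relative weights of the two terms would have to be chosen by cross-validation.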
-
=== Task 27 ===
+
===27. 2019===
-
* '''Name:''' Creation of ranking models for information retrieval systems. Algorithm for Predicting the Structure of Locally Optimal Models
+
* '''Title:''' Creation of ranking models for information retrieval systems. Algorithm for Predicting the Structure of Locally Optimal Models
-
* '''Task''': It is required to predict a time series using some parametric superposition of algebraic functions. It is proposed not to cost the prognostic model, but to predict it, that is, to predict the structure of the approximating superposition. A class of considered superpositions is introduced, and on the set of such structural descriptions, a search is made for a locally optimal model for the problem under consideration. The task consists in 1) searching for a suitable structural description of the model 2) describing the search algorithm for the structure that will correspond to the optimal model 3) describing the algorithm for inverse construction of the model according to its structural description. For an already existing example of the answer to questions 1-3, see the works of A. A. Varfolomeeva.
+
* '''Problem:''' It is required to predict a time series using some parametric superposition of algebraic functions. It is proposed not to build the prognostic model, but to predict it, that is, to predict the structure of the approximating superposition. A class of considered superpositions is introduced, and on the set of such structural descriptions, a search is made for a locally optimal model for the problem under consideration. The problem consists of 1) searching for a suitable structural description of the model, 2) describing the search algorithm for the structure that will correspond to the optimal model, and 3) describing the algorithm for inverse construction of the model according to its structural description. For an already existing example of the answer to questions 1-3, see the works of A. A. Varfolomeeva.
* '''Data:'''
* '''Data:'''
-
** Collection of text documents TREC (!)
+
*# Collection of text documents TREC (!)
-
** A set of time series, which implies the restoration of functional dependencies. It is proposed to first use synthetic data or immediately apply the algorithm to forecasting time series 1) electricity consumption 2) physical activity with subsequent analysis of the resulting structures.
+
*# A set of time series, which implies the restoration of functional dependencies. It is proposed to first use synthetic data or immediately apply the algorithm to forecasting time series 1) electricity consumption 2) physical activity with subsequent analysis of the resulting structures.
-
* '''References:''':
+
* '''References:'''
*# (!) Kulunchakov A.S., Strijov V.V. Generation of simple structured Information Retrieval functions by genetic algorithm without stagnation // [http://strijov.com/papers/Kulunchakov2014RankingBySimpleFun.pdf Expert Systems with Applications, 2017, 85: 221–230.]
*# (!) Kulunchakov A.S., Strijov V.V. Generation of simple structured Information Retrieval functions by genetic algorithm without stagnation // [http://strijov.com/papers/Kulunchakov2014RankingBySimpleFun.pdf Expert Systems with Applications, 2017, 85: 221–230.]
-
*# A. A. Varfolomeeva Selection of features when marking up bibliographic lists using structural learning methods, 2013, [http://www.machinelearning.ru/wiki/images/f/f2/Varfolomeeva2013Diploma.pdf?format=raw]
+
*# A. A. Varfolomeeva Selection of features when marking up bibliographic lists using structural learning methods, 2013, [http://www.machinelearning.ru/wiki/images/f/f2/Varfolomeeva2013Diploma.pdf?format=raw]
*# Bin Cao, Ying Li and Jianwei Yin Measuring Similarity between Graphs Based on the Levenshtein Distance, 2012, [http://naturalspublishing.com/files/published/92cn7jm44d8wt1.pdf?format=raw]
*# Bin Cao, Ying Li and Jianwei Yin Measuring Similarity between Graphs Based on the Levenshtein Distance, 2012, [http://naturalspublishing.com/files/published/92cn7jm44d8wt1.pdf?format=raw]
-
* '''Basic algorithm:''' Described in [1]. Developed in the work of the 974 group team. It is proposed to use their code and experiment.
+
* '''Base algorithm:''' Described in [1]. Developed in the work of the 974 group team. It is proposed to use their code and experiment.
* '''Solution:''' It is proposed to try to repeat the experiment of A. A. Varfolomeeva for a different structural description in order to understand what is happening. The superposition of algebraic functions defines an ortree, on the vertices of which the labels of the corresponding algebraic functions or variables are given. Therefore, the structural description of such a superposition can be its DFS-code. This is a string consisting of vertex labels, written in the order in which the tree is traversed by depth-first search. Knowing the arities of the corresponding algebraic functions, we can restore any such DFS-code in O(n) and get back the superposition of functions. On the set of similar string descriptions, it is proposed to search for the string description that will correspond to the optimal model.
* '''Solution:''' It is proposed to try to repeat the experiment of A. A. Varfolomeeva for a different structural description in order to understand what is happening. The superposition of algebraic functions defines an ortree, on the vertices of which the labels of the corresponding algebraic functions or variables are given. Therefore, the structural description of such a superposition can be its DFS-code. This is a string consisting of vertex labels, written in the order in which the tree is traversed by depth-first search. Knowing the arities of the corresponding algebraic functions, we can restore any such DFS-code in O(n) and get back the superposition of functions. On the set of similar string descriptions, it is proposed to search for the string description that will correspond to the optimal model.
-
* '''Authors:''' consultant [https://www.inria.fr/centre/grenoble Andrey Kulunchakov (Inria Montbonnot)], Expert Strizhov V.V.
+
* '''Authors:''' consultant [https://www.inria.fr/centre/grenoble Andrey Kulunchakov (Inria Montbonnot)], Expert Strijov V.V.
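The DFS-code described in the solution of problem 27 can be sketched as follows. A superposition is stored as a nested <code>(label, children)</code> tuple, its DFS-code is the list of labels in depth-first (pre-order) traversal, and the tree is restored in one pass over the code using the known arities. The arity table and the function names are hypothetical and serve only as an illustration.

```python
# Hypothetical arity table: label -> number of arguments (0 for variables).
ARITY = {"add": 2, "mul": 2, "sin": 1, "x": 0, "w": 0}


def encode_dfs(tree):
    """Return the list of vertex labels in depth-first (pre-order) order."""
    label, children = tree
    code = [label]
    for child in children:
        code.extend(encode_dfs(child))
    return code


def decode_dfs(code, arity=ARITY):
    """Rebuild the (label, children) tree from its DFS-code.

    Each label is read exactly once, so the pass is linear in len(code).
    """
    pos = 0

    def build():
        nonlocal pos
        if pos >= len(code):
            raise ValueError("DFS-code ended before the tree was complete")
        label = code[pos]
        pos += 1
        children = [build() for _ in range(arity[label])]
        return (label, children)

    tree = build()
    if pos != len(code):
        raise ValueError("trailing labels after a complete tree")
    return tree
```

For example, the code <code>["add", "sin", "x", "mul", "w", "x"]</code> decodes to the superposition add(sin(x), mul(w, x)). A search over string descriptions can then mutate the label list and discard any string that <code>decode_dfs</code> rejects.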
-
=== Task 26 ===
+
===26. 2019===
-
* '''Name:''' Accelerometer positioning
+
* '''Title:''' Accelerometer positioning
-
* '''Task''': Given initial coordinates, accelerometer signals, additional information (gyroscope, magnetometer signals). Possibly inaccurate map given (Task [https://en.wikipedia.org/wiki/Simultaneous_localization_and_mapping SLAM])
+
* '''Problem:''' Given are the initial coordinates, accelerometer signals, and additional information (gyroscope and magnetometer signals). A possibly inaccurate map may also be given (the [https://en.wikipedia.org/wiki/Simultaneous_localization_and_mapping SLAM] problem).
* '''Data:''' from [1], self-collected data.
* '''Data:''' from [1], self-collected data.
-
* '''References:''':
+
* '''References:'''
*# https://arxiv.org/pdf/1712.09004.pdf
*# https://arxiv.org/pdf/1712.09004.pdf
*# https://ieeexplore.ieee.org/document/1528431
*# https://ieeexplore.ieee.org/document/1528431
Line 1428: Line 2041:
* '''Solution:''' Search for a priori and additional information that improves positioning accuracy.
* '''Solution:''' Search for a priori and additional information that improves positioning accuracy.
* '''Novelty:''' Statement of the problem in terms of Projection to Latent Spaces
* '''Novelty:''' Statement of the problem in terms of Projection to Latent Spaces
-
* '''Authors:''' consultant [http://www.forecsys.ru/ru/site/projects/solut2/ Anastasia Motrenko], Expert [https://www.huawei.com/en/ Ilya Gartseev] , Strizhov V.V.
+
* '''Authors:''' consultant [http://www.forecsys.ru/ru/site/projects/solut2/ Anastasia Motrenko], Expert [https://www.huawei.com/en/ Ilya Gartseev] , Strijov V.V.
-
=== Task 45 ===
+
===45. 2019===
-
* Name: Task of searching characters in images
+
* Name: The problem of searching for symbols in images
-
* Task: This Task in one of the formulation options can be reduced to two sequential operations: 1) searching for objects in the image and determining their class 2) searching the database for information about the symbolic meaning of the found objects. The main difficulty in solving the problem lies in the search for objects in the image. However, the following classification may also be difficult due to the fact that the image of the object may be incomplete, unusually stylized, and the like.
+
* '''Problem description:''' In one of its formulations, this problem can be reduced to two sequential operations: 1) searching for objects in the image and determining their class, 2) searching the database for information about the symbolic meaning of the found objects. The main difficulty in solving the problem lies in the search for objects in the image. However, the subsequent classification may also be difficult because the image of the object may be incomplete, unusually stylized, and the like.
* Data: Dictionary of Symbols Museum Sites Image-net
* Data: Dictionary of Symbols Museum Sites Image-net
-
* References:
+
* '''References:'''
*# http://www.machinelearning.ru/wiki/images/e/e2/IDP18.pdf (p. 116)
*# http://www.machinelearning.ru/wiki/images/e/e2/IDP18.pdf (p. 116)
*# http://www.image-net.org
*# http://www.image-net.org
* Basic algorithm: CNN
* Basic algorithm: CNN
-
* Solution: It is proposed to compare the work of several state-of-the-art algorithms. Suggest a quality metric for searching and classifying objects. Determine applicability of methods.
+
* '''Solution:''' It is proposed to compare the work of several state-of-the-art algorithms. Suggest a quality metric for searching and classifying objects. Determine applicability of methods.
-
* Novelty: The proposed image analysis approach is used by Experts in manual mode and has not been automated
+
* '''Novelty:''' The proposed image analysis approach is used by Experts in manual mode and has not been automated
-
* Authors: M. Apishev (consultant), D. Lemtyuzhnikova
+
* '''Authors:''' M. Apishev (consultant), D. Lemtyuzhnikova
-
=== Task 28 ===
+
===28. 2019===
* Name: Multi-simulation as a universal way to describe a general sample
* Name: Multi-simulation as a universal way to describe a general sample
-
* Task: Build a method for incremental refinement of the multimodel structure when new objects appear. Development and comparison of different algorithms for updating the structure of multimodels. Construction of an optimal scheme for refining the structure of a multimodel depending on the total sample size.
+
* '''Problem description:''' Build a method for incremental refinement of the multimodel structure when new objects appear. Development and comparison of different algorithms for updating the structure of multimodels. Construction of an optimal scheme for refining the structure of a multimodel depending on the total sample size.
* Data: At the initial stage of work, synthetic data with a known statistical structure is used. Testing of the developed methods is carried out on real data from the UCI repository.
* Data: At the initial stage of work, synthetic data with a known statistical structure is used. Testing of the developed methods is carried out on real data from the UCI repository.
-
* References:
+
* '''References:'''
-
# Bishop, Christopher M. "Pattern recognition and machine learning." Springer, New York (2006).
+
*# Bishop, Christopher M. "Pattern recognition and machine learning." Springer, New York (2006).
-
#Gelman, Andrew, et al. Bayesian data analysis, 3rd edition. Chapman and Hall/CRC, 2013.
+
*# Gelman, Andrew, et al. Bayesian data analysis, 3rd edition. Chapman and Hall/CRC, 2013.
-
# MacKay, David JC. "The evidence framework applied to classification networks." Neural computation 4.5 (1992): 720-736.
+
*# MacKay, David JC. "The evidence framework applied to classification networks." Neural computation 4.5 (1992): 720-736.
-
# Aduenko A. A. "Choice of multimodels in Task classification" Ph.D. thesis
+
*# Aduenko A. A. "Choice of multimodels in classification problems" Ph.D. thesis
-
# Motrenko, Anastasiya, Strizhov V.V., and Gerhard-Wilhelm Weber. "Sample size determination for logistic regression." Journal of Computational and Applied Mathematics 255 (2014): 743-752.
+
*# Motrenko, Anastasiya, Strijov V.V., and Gerhard-Wilhelm Weber. "Sample size determination for logistic regression." Journal of Computational and Applied Mathematics 255 (2014): 743-752.
* Basic algorithm: Algorithm for constructing adequate multi-models from #4.
* Basic algorithm: Algorithm for constructing adequate multi-models from #4.
-
* Solution: Bayesian approach to the problem of choosing models based on validity. Analysis of the properties of validity and its relationship with statistical significance.
+
* '''Solution:''' Bayesian approach to the problem of choosing models based on validity. Analysis of the properties of validity and its relationship with statistical significance.
-
* Novelty: A method is proposed for constructing an optimal scheme for updating the structure of a multimodel when new objects appear. The relationship between validity and statistical significance for some classes of models has been studied.
+
* '''Novelty:''' A method is proposed for constructing an optimal scheme for updating the structure of a multimodel when new objects appear. The relationship between validity and statistical significance for some classes of models has been studied.
-
* Authors: Strizhov Vadim Viktorovich, Aduenko Alexander Alexandrovich (GMT-5)
+
* '''Authors:''' Strijov Vadim Viktorovich, Aduenko Alexander Alexandrovich (GMT-5)
-
=== Task 11 ===
+
===11. 2019===
-
* '''Name:''' Automatic construction of a neural network of optimal complexity
+
* '''Title:''' Automatic construction of a neural network of optimal complexity
-
* '''Task''': The Task of finding a stable (and not redundant in terms of parameters) neural network structure is considered. The neural network is considered as a computational graph, the edges of which are primitive functions, and the vertices are intermediate representations of the sample obtained under the action of these functions. It is required to choose a subgraph of the model, in which the final neural network will give an acceptable classification quality with a small number of parameters.
+
* '''Problem:''' The problem of finding a stable (and not redundant in terms of parameters) neural network structure is considered. The neural network is considered as a computational graph, the edges of which are primitive functions, and the vertices are intermediate representations of the sample obtained under the action of these functions. It is required to choose a subgraph of the model, in which the final neural network will give an acceptable classification quality with a small number of parameters.
* '''Data:''' Samples Boston, MNIST, CIFAR-10
* '''Data:''' Samples Boston, MNIST, CIFAR-10
-
* '''References:''':
+
* '''References:'''
-
*# [http://strijov.com/papers/BakhteevEvidenceArticle3.pdf Oleg Bakhteev Yu., Strizhov V.V. Selection of deep learning models of suboptimal complexity using variational likelihood estimation // Avtomatika and telemechanika, 2018.]
+
*# [http://strijov.com/papers/BakhteevEvidenceArticle3.pdf Oleg Bakhteev Yu., Strijov V.V. Selection of deep learning models of suboptimal complexity using variational likelihood estimation // Avtomatika and telemechanika, 2018.]
-
*# [http://strijov.com/papers/SmerdovBakhteevStrijov_Paraphrase2017.pdf Smerdov A.N., Oleg Bakhteev Yu., Strizhov V.V. Choosing the optimal model of the recurrent network in the Paraphrase Search Tasks // Informatics and its applications, 2018.]
+
*# [http://strijov.com/papers/SmerdovBakhteevStrijov_Paraphrase2017.pdf Smerdov A.N., Oleg Bakhteev Yu., Strijov V.V. Choosing the optimal model of the recurrent network in paraphrase search problems // Informatics and its applications, 2018.]
*# [https://papers.nips.cc/paper/4329-practical-variational-inference-for-neural-networks] Variational inference.
*# [https://papers.nips.cc/paper/4329-practical-variational-inference-for-neural-networks] Variational inference.
*# [https://arxiv.org/abs/1611.00712] Relaxation based on variational inference.
*# [https://arxiv.org/abs/1611.00712] Relaxation based on variational inference.
*# [https://arxiv.org/abs/1806.09055] DARTS.
*# [https://arxiv.org/abs/1806.09055] DARTS.
-
* '''Basic algorithm:''' random search and DARTS algorithm (model selection using relaxation without variational inference).
+
* '''Base algorithm:''' random search and DARTS algorithm (model selection using relaxation without variational inference).
* '''Solution:''' It is proposed to choose the structure of the neural network based on variational inference. To select the optimal structure, relaxation is used: instead of a strict choice of one of several considered submodels of the neural network, it is proposed to move to a composition of these models with different weights for each of them.
* '''Solution:''' It is proposed to choose the structure of the neural network based on variational inference. To select the optimal structure, relaxation is used: instead of a strict choice of one of several considered submodels of the neural network, it is proposed to move to a composition of these models with different weights for each of them.
* '''Novelty:''' A method of automatic model building is proposed, which takes into account inaccuracies in the optimization of model parameters and allows finding the most stable models.
* '''Novelty:''' A method of automatic model building is proposed, which takes into account inaccuracies in the optimization of model parameters and allows finding the most stable models.
-
* '''Authors:''' Oleg Bakhteev, Strizhov V.V.
+
* '''Authors:''' Oleg Bakhteev, Strijov V.V.
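The relaxation used in problem 11 can be sketched for a single edge of the computational graph. Instead of picking one primitive function, the edge computes a softmax-weighted mixture of all candidates, in the spirit of DARTS. Only the relaxation is shown, not the variational inference. The candidate set <code>CANDIDATES</code> and the function names are hypothetical, and scalar inputs are used to keep the sketch free of dependencies.

```python
import math


def softmax(alphas):
    """Numerically stable softmax over a list of structure parameters."""
    shift = max(alphas)
    exps = [math.exp(a - shift) for a in alphas]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical primitive functions that may label one edge.
CANDIDATES = {
    "identity": lambda x: x,
    "relu": lambda x: max(0.0, x),
    "tanh": math.tanh,
    "zero": lambda x: 0.0,
}


def mixed_edge(x, alphas, ops=CANDIDATES):
    """Relaxed edge: weighted composition of all candidate operations."""
    weights = softmax(alphas)
    return sum(w * op(x) for w, op in zip(weights, ops.values()))


def discretize(alphas, ops=CANDIDATES):
    """Strict choice: name of the operation with the largest parameter."""
    names = list(ops)
    best = max(range(len(alphas)), key=alphas.__getitem__)
    return names[best]
```

The structure parameters <code>alphas</code> can be optimized by gradient methods together with the model weights, after which <code>discretize</code> recovers a single subgraph.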
-
=== Task 48 ===
+
===48. 2019===
-
* '''Name:''' Multi-simulation, privileged training
+
* '''Title:''' Multi-simulation, privileged training
-
* '''Task''': Considers the Task of learning one model from another
+
* '''Problem:''' The problem of learning one model from another is considered.
* '''Data:''' Time series samples
* '''Data:''' Time series samples
-
* '''References:''':
+
* '''References:'''
*# https://github.com/neychev/distillation_n_privileged_info_torch
*# https://github.com/neychev/distillation_n_privileged_info_torch
-
*# https://github.com/neychev/Multitask_forecast_code
+
*# https://github.com/neychev/Multitask_forecast_code
*# Article by Mixture Experts
*# Article by Mixture Experts
*# Neychev's diploma http://www.machinelearning.ru/wiki/images/3/36/NeyhevMS_Thesis.pdf
*# Neychev's diploma http://www.machinelearning.ru/wiki/images/3/36/NeyhevMS_Thesis.pdf
-
* '''Basic algorithm:''' Blend of Experts, privileged training, distillation
+
* '''Base algorithm:''' Blend of Experts, privileged training, distillation
* '''Solution:''' Run an experiment illustrating these approaches
* '''Solution:''' Run an experiment illustrating these approaches
* '''Novelty:''' A forecasting method is proposed that uses a priori information about the membership of the model sample (publish the results).
* '''Novelty:''' A forecasting method is proposed that uses a priori information about the membership of the model sample (publish the results).
-
* '''Authors:''' R.G. Neichev (consultant), Strizhov V.V.
+
* '''Authors:''' R.G. Neichev (consultant), Strijov V.V.
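Learning one model from another (problem 48) is commonly done by distillation: the student is trained on a mixture of the true labels and the softened outputs of the teacher. The sketch below assumes a classification setting with logits, which is an illustration only since the task data are time series. The names <code>distillation_loss</code>, <code>temperature</code> and <code>mix</code> are hypothetical and are not taken from the referenced repositories.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax with temperature; higher temperature gives softer targets."""
    scaled = [z / temperature for z in logits]
    shift = max(scaled)
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target, predicted):
    """Cross-entropy between a target distribution and a predicted one."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)


def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=2.0, mix=0.5):
    """Weighted sum of the hard-label loss and the teacher-matching loss."""
    hard = [1.0 if i == true_label else 0.0 for i in range(len(student_logits))]
    hard_term = cross_entropy(hard, softmax(student_logits))
    soft_term = cross_entropy(softmax(teacher_logits, temperature),
                              softmax(student_logits, temperature))
    return mix * hard_term + (1.0 - mix) * soft_term
```

In privileged training the teacher would additionally see features that are unavailable to the student at test time, while the loss stays the same.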
-
=== Task 49 ===
+
===49. 2019===
* Name: Brain signal decoding and intention prediction
* Name: Brain signal decoding and intention prediction
-
* Task: It is required to build a model that restores the movement of the limbs according to the corticogram.
+
* '''Problem description:''' It is required to build a model that restores the movement of the limbs according to the corticogram.
* Data: neurotycho.org [9] (or fingers)
* Data: neurotycho.org [9] (or fingers)
-
* References:
+
* '''References:'''
-
** Neichev R.G., Katrutsa A.M., Strizhov V.V. Selection of the optimal set of features from a multicorrelated set in the forecasting problem. Zavodskaya Lab. Materials Diagnostics, 2016, 82(3) : 68-74. [10]
+
*# Neichev R.G., Katrutsa A.M., Strijov V.V. Selection of the optimal set of features from a multicorrelated set in the forecasting problem. Zavodskaya Lab. Materials Diagnostics, 2016, 82(3) : 68-74. [10]
-
** Isachenko R.V., Strijov V.V. Quadratic Programming Optimization with Feature Selection for Non-linear Models // Lobachevskii Journal of Mathematics, 2018, 39(9) : 1179-1187. article
+
*# Isachenko R.V., Strijov V.V. Quadratic Programming Optimization with Feature Selection for Non-linear Models // Lobachevskii Journal of Mathematics, 2018, 39(9) : 1179-1187. article
* Basic algorithm: Partial Least Squares[11]
* Basic algorithm: Partial Least Squares[11]
-
* Solution: Create a feature selection algorithm alternative to PLS and taking into account the non-orthogonal feature interdependence structure.
+
* '''Solution:''' Create a feature selection algorithm alternative to PLS and taking into account the non-orthogonal feature interdependence structure.
-
* Novelty: A feature selection method is proposed that takes into account the regularities of both the and independent variable and the dependent variable. Bonus: Explore changes in model structure as the nature of the sample changes.
+
* '''Novelty:''' A feature selection method is proposed that takes into account the regularities of both the independent variable and the dependent variable. Bonus: Explore changes in model structure as the nature of the sample changes.
-
* Authors: Andrey Zadayanchuk, Strizhov V.V.
+
* '''Authors:''' Andrey Zadayanchuk, Strijov V.V.
-
=2018=
+
==2018==
-
==Autumn 2018 ==
+
{|class="wikitable"
{|class="wikitable"
|-
|-
-
! Number
+
! Title
-
! Project name
+
! Links
-
! materials
+
! Team
! Team
|-
|-
-
|0
+
|(Example) Metric classification of time series
-
|(Example) Metric classification of time series
+
|[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Goncharov2015MetricClassification/code Code],
-
|[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Goncharov2015MetricClassification/code code],
+
[https://docs.google.com/document/d/1fx7fVlmnwdTesElt-lbaHvoGEjJC5t_9e-X0ZpUzEcQ/edit?usp=sharing LinkReview],
[https://docs.google.com/document/d/1fx7fVlmnwdTesElt-lbaHvoGEjJC5t_9e-X0ZpUzEcQ/edit?usp=sharing LinkReview],
[https://t.me/joinchat/Ak0SzkfYN_boA3eRtfPKvg Discussion]
[https://t.me/joinchat/Ak0SzkfYN_boA3eRtfPKvg Discussion]
-
|[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Goncharov2015MetricClassification/doc/Goncharov2015MetricClassification.pdf Alexey Goncharov]*, [http://svn.code.sf.net/p/mlalgorithms/code/Group274/Goncharov2015MetricClassification/doc/Goncharov2015MetricClassification.pdf Maxim Savinov]
+
|[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Goncharov2015MetricClassification/doc/Goncharov2015MetricClassification.pdf Alexey Goncharov]*, [http://svn.code.sf.net/p/mlalgorithms/code/Group274/Goncharov2015MetricClassification/doc/Goncharov2015MetricClassification.pdf Maxim Savinov]
|-
|-
-
|1
+
|Forecasting the direction of movement of the price of exchange instruments according to the news flow
-
|Forecasting the direction of movement of the price of exchange instruments according to the news flow
+
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-1 Code],
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-1 Code],
[https://docs.google.com/document/d/1qa6PO_3AXcXPkJKNjQgihBXWkmBpspFWi3Ct34FYonw/edit LinkReview],
[https://docs.google.com/document/d/1qa6PO_3AXcXPkJKNjQgihBXWkmBpspFWi3Ct34FYonw/edit LinkReview],
[https://github.com/Intelligent-Systems-Phystech/2018-Project-1/raw/master/Presentation.pdf Slides],
[https://github.com/Intelligent-Systems-Phystech/2018-Project-1/raw/master/Presentation.pdf Slides],
[https://github.com/Intelligent-Systems-Phystech/2018-Project-1/raw/master/Paper.pdf Report]
[https://github.com/Intelligent-Systems-Phystech/2018-Project-1/raw/master/Paper.pdf Report]
-
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-1/raw/master/Borisov2018Project1/Borisov2018Project1.pdf Alexander Borisov],
+
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-1/raw/master/Borisov2018Project1/Borisov2018Project1.pdf Alexander Borisov],
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-1/blob/master/Drobin2018Project1/Drobin2018Project1.pdf Drobin Maxim], [https://github.com/Intelligent-Systems-Phystech/2018-Project-1/raw/master/Govorov2018Project1/Govorov2018Project1.pdf Govorov Ivan], [https://github.com/Intelligent-Systems-Phystech/2018-Project-1/raw/master/Mukhitdinova2018Project1/Mukhitdinova2018Project1.pdf Mukhitdinova Sofia], [https://github.com/Intelligent-Systems-Phystech/2018-Project-1/raw/master/Rodionov2018Project1/Rodionov2018Project1.pdf Valentin Rodionov], [https://github.com/Intelligent-Systems-Phystech/2018-Project-1/raw/master/Akhiarov2018Project1/Akhiarov2018Project1.pdf Valentin Akhiyarov]
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-1/blob/master/Drobin2018Project1/Drobin2018Project1.pdf Drobin Maxim], [https://github.com/Intelligent-Systems-Phystech/2018-Project-1/raw/master/Govorov2018Project1/Govorov2018Project1.pdf Govorov Ivan], [https://github.com/Intelligent-Systems-Phystech/2018-Project-1/raw/master/Mukhitdinova2018Project1/Mukhitdinova2018Project1.pdf Mukhitdinova Sofia], [https://github.com/Intelligent-Systems-Phystech/2018-Project-1/raw/master/Rodionov2018Project1/Rodionov2018Project1.pdf Valentin Rodionov], [https://github.com/Intelligent-Systems-Phystech/2018-Project-1/raw/master/Akhiarov2018Project1/Akhiarov2018Project1.pdf Valentin Akhiyarov]
|-
-
|2
+
|Construction of reference objects for a set of multidimensional time series
-
|Построение опорных объектов для множества многомерных временных рядов
+
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-2 Code]
[https://docs.google.com/document/d/1ruVHmEMgBXcULWsy-mYg2KgAV2SyC5si4T4UHVPMu2E/edit LinkReview]
-
|[https://raw.githubusercontent.com/Intelligent-Systems-Phystech/2018-Project-2/master/Iskhakov2018Project2/test.pdf Исхаков Ришат],
+
|[https://raw.githubusercontent.com/Intelligent-Systems-Phystech/2018-Project-2/master/Iskhakov2018Project2/test.pdf Iskhakov Rishat],
-
[https://raw.githubusercontent.com/Intelligent-Systems-Phystech/2018-Project-2/master/Korepanov2018Project2/test.pdf Корепанов Георгий],
+
[https://raw.githubusercontent.com/Intelligent-Systems-Phystech/2018-Project-2/master/Korepanov2018Project2/test.pdf Korepanov Georgy],
-
[https://raw.githubusercontent.com/Intelligent-Systems-Phystech/2018-Project-2/master/Solodnev2018Project2/test.pdf Степан Солоднев]
+
[https://raw.githubusercontent.com/Intelligent-Systems-Phystech/2018-Project-2/master/Solodnev2018Project2/test.pdf Stepan Solodnev]
-
[https://raw.githubusercontent.com/Intelligent-Systems-Phystech/2018-Project-2/master/Solodnev2018Project2/test.pdf Самирханов Данил]
+
[https://raw.githubusercontent.com/Intelligent-Systems-Phystech/2018-Project-2/master/Solodnev2018Project2/test.pdf Samirkhanov Danil]
-
 
+
|-
-
|3
+
|Dynamic alignment of multivariate time series
-
|Динамическое выравнивание многомерных временных рядов
+
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-3 Code]
[https://docs.google.com/document/d/1ruVHmEMgBXcULWsy-mYg2KgAV2SyC5si4T4UHVPMu2E/edit LinkReview]
Line 1541: Line 2147:
[https://github.com/Intelligent-Systems-Phystech/2018-Project-3/raw/master/Morgachev2018Title/paper/Morgachev2018Title.pdf Report]
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-3/raw/master/Morgachev2018Title/Morgachev2018Title.pdf Gleb Morgachev],
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-3/blob/master/Smirnov2018Title/Smirnov2018Title.pdf Владислав Смирнов],
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-3/blob/master/Smirnov2018Title/Smirnov2018Title.pdf Vladislav Smirnov],
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-3/blob/master/Lipnitckaia2018Title/Lipnitckaia2018Title.pdf Татьяна Липницкая]
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-3/blob/master/Lipnitckaia2018Title/Lipnitckaia2018Title.pdf Tatiana Lipnitskaya]
|-
-
|4
+
|Automatic adjustment of ARTM parameters for a wide class of problems
-
|Автоматическая настройка параметров АРТМ под широкий класс задач
+
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-4 Code],
[https://docs.google.com/document/d/1RidglPMH1-Yb1rx7V7QayDDuM-HfL-pF2kkGBWbWrxk/edit LinkReview],
[https://docs.google.com/presentation/d/1WpCbs7Rf9i7oCT25mSTcbBCLlN_tXwdjdv1VQ6Y8bVs/edit#slide=id.p Presentation]
-
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-4/raw/master/Golubeva2018Problem4/Golubeva2018Problem4.pdf Голубева Татьяна],
+
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-4/raw/master/Golubeva2018Problem4/Golubeva2018Problem4.pdf Golubeva Tatiana],
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-4/raw/master/Ivanova2018Problem4/Ivanova2018Problem4.pdf Иванова Екатерина],
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-4/raw/master/Ivanova2018Problem4/Ivanova2018Problem4.pdf Ivanova Ekaterina],
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-4/raw/master/Matveeva2018Problem4/Matveeva2018Problem4.pdf Матвеева Светлана],
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-4/raw/master/Matveeva2018Problem4/Matveeva2018Problem4.pdf Matveeva Svetlana],
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-4/raw/master/Trusov2018Problem4/Trusov2018Problem4.pdf Трусов Антон],
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-4/raw/master/Trusov2018Problem4/Trusov2018Problem4.pdf Trusov Anton],
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-4/raw/master/Tsaritsyn2018Problem4/Tsaritsyn2018Problem4.pdf Царицын Михаил],
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-4/raw/master/Tsaritsyn2018Problem4/Tsaritsyn2018Problem4.pdf Tsaritsyn Mikhail],
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-4/raw/master/Chernonog2018Problem4/Chernonog2018Problem4.pdf Черноног Вячеслав]
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-4/raw/master/Chernonog2018Problem4/Chernonog2018Problem4.pdf Chernonog Vyacheslav]
|-
-
|5
+
|Finding paraphrases
-
|Нахождение парафразов
+
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-5 Code],
[https://docs.google.com/document/d/1rTEFOVCDVNPHss09IRG-C95yovUE4XTyryOnpb8DWFA LinkReview]
Line 1563: Line 2167:
[https://github.com/Intelligent-Systems-Phystech/2018-Project-5/raw/master/Kitashov2018Paraphrases/report.pdf Fedor Kitashov], [https://github.com/Intelligent-Systems-Phystech/2018-Project-5/raw/master/Proskura2018Paraphrases/report.pdf Polina Proskura], [https://github.com/Intelligent-Systems-Phystech/2018-Project-5/raw/master/Basimova2018Paraphrases/report.pdf Natalia Basimova], [https://github.com/Intelligent-Systems-Phystech/2018-Project-5/raw/master/Krasnikov2018Paraphrases/report.pdf Roman Krasnikov], [https://github.com/Intelligent-Systems-Phystech/2018-Project-5/raw/master/Shabanov2018Paraphrases/report.pdf Akhmedkhan Shabanov]
|-
-
|6
 
|On conformational changes of proteins using collective motions in torsion angle space and L1 regularization
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-6 Code],
Line 1570: Line 2173:
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-6/raw/master/Ryabinina2018Project6/report.pdf Ryabinina Raisa], [https://github.com/Intelligent-Systems-Phystech/2018-Project-6/raw/master/Emtsev2018Project6/report.pdf Emtsev Daniil]
|-
-
|7
 
|Privileged training in the problem of approximating the borders of the iris
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-7 Code],
Line 1579: Line 2181:
[https://github.com/Intelligent-Systems-Phystech/2018-Project-7/raw/master/Learning_Pashtet_Crew/Balakin2018Project7/Privileged_training_in_the_problem_of_approximating_the_borders_of_the_iris.pdf Nikolay Balakin]
|-
-
|8
+
|Generation of features using locally approximating models
-
|Порождение признаков с помощью локально-аппроксимирующих моделей
+
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-8/tree/master/code Code],
[https://docs.google.com/document/d/1e65opLey0Yxo_kAZ4cKTcjMIIYxR1jVPCQrpmr4k29w/edit?usp=sharing LinkReview]
-
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-8/raw/master/Kurashov2018Project8/Kurashov2018Project8.pdf Ибрагим Курашов], [https://github.com/Intelligent-Systems-Phystech/2018-Project-8/raw/master/Gilmutdinov2018Project8/Gilmutdinov2018Project8.pdf Наиль Гильмутдинов],
+
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-8/raw/master/Kurashov2018Project8/Kurashov2018Project8.pdf Ibrahim Kurashov], [https://github.com/Intelligent-Systems-Phystech/2018-Project-8/raw/master/Gilmutdinov2018Project8/Gilmutdinov2018Project8.pdf Nail Gilmutdinov],
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-8/raw/master/Mulyukov2018Project8/Mulyukov2018Project8.pdf Альберт Мулюков],
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-8/raw/master/Mulyukov2018Project8/Mulyukov2018Project8.pdf Albert Mulyukov],
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-8/raw/master/Spivak2018Project8/Spivak2018Project8.pdf Валентин Спивак]
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-8/raw/master/Spivak2018Project8/Spivak2018Project8.pdf Valentin Spivak]
|-
-
|9
+
|Text recognition based on skeletal representation of thick lines and convolutional networks
-
|Распознавание текста на основе скелетного представления толстых линий and сверточных сетей
+
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-9 Code], [https://docs.google.com/document/d/1vvOqLwLJSelbKBglc4LKh6XUWS5c72L0XMzyeJ20XBM/edit LiteratureReview], [https://drive.google.com/file/d/1pzfKkjVe1aP1-5ab1ewN0NMF60RJ26IA/view?usp=drivesdk Slides], [https://github.com/Intelligent-Systems-Phystech/2018-Project-9/raw/master/Lukoyanov2018Project9/main.pdf report]
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-9/raw/master/Kutsevol2018Project9/Kutsevol_Article.pdf Kutsevol Polina]
Line 1600: Line 2200:
[https://github.com/Intelligent-Systems-Phystech/2018-Project-9/blob/master/ValukovKolya2018Project9/main.pdf Valyukov Nikolay]
[https://github.com/Intelligent-Systems-Phystech/2018-Project-9/blob/master/Tushin2018Project9/Tushin.pdf Tushin Kirill]
-
 
-
 
|-
-
|10
+
|Comparison of neural network and continuous-morphological methods in the problem of text detection
-
|Сравнение нейросетевых and непрерывно-морфологических методов в задаче детекции текста
+
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-10 Code], [https://docs.google.com/document/d/1Gocn0x-FfYkD_L7ZLZdULxNTBfo25OMMKPBr2-otw-w/edit?usp=sharing LinkReview], [https://t.me/joinchat/DEQDKU-oqyt8FRG4SoFh3w Discussion], [https://docs.google.com/presentation/d/17_7i0KFELxyaL-MtvVmu2ed07sg331hiMagYqNpq9Ek/edit?usp=sharing Presentation]
-
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-10/blob/master/report/Gaiduchenko2018Project10/Gaiduchenko2018Project10.pdf Гайдученко Николай]
+
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-10/blob/master/report/Gaiduchenko2018Project10/Gaiduchenko2018Project10.pdf Gaiduchenko Nikolay]
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-10/tree/master/report/Torlak2018Project10 Торлак Артём ]
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-10/tree/master/report/Torlak2018Project10 Torlak Artyom]
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-10/tree/master/report/Akimov2018Project10 Акимов Кирилл]
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-10/tree/master/report/Akimov2018Project10 Akimov Kirill]
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-10/tree/master/report/Mironova2018Project10 Миронова Лилия]
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-10/tree/master/report/Mironova2018Project10 Mironova Lilia]
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-10/tree/master/report/Gonchar2018Project10 Гончар Даниил]
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-10/tree/master/report/Gonchar2018Project10 Gonchar Daniel]
-
|
+
|-
-
|11
+
|Automatic construction of a neural network of optimal complexity
-
|Автоматическое построение нейросети оптимальной сложности
+
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-11 Code], [https://docs.google.com/document/d/131-9Uxl4tTIMKBh7WNJuZR5MI1pHypvcb5qsYl-bAnI/edit?usp=sharing LinkReview], [https://github.com/Intelligent-Systems-Phystech/2018-Project-11/raw/master/report/report.pdf report], [https://github.com/Intelligent-Systems-Phystech/2018-Project-11/raw/master/report/pres.pdf slides]
-
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-11/blob/master/Goryan2018Project11/Goryan2018Project11.pdf Николай Горян]
+
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-11/blob/master/Goryan2018Project11/Goryan2018Project11.pdf Nikolai Goryan]
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-11/tree/master/Ulitin2018Project11/Ulitin2018Project11.pdf Александр Улитин]
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-11/tree/master/Ulitin2018Project11/Ulitin2018Project11.pdf Alexander Ulitin]
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-11/blob/master/Tovkes2018Project11/Abstract.pdf Товкес Артем]
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-11/blob/master/Tovkes2018Project11/Abstract.pdf Tovkes Artem]
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-11/raw/master/Taranov2018Project-11/Taranov2018Project11.pdf Таранов Сергей]
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-11/raw/master/Taranov2018Project-11/Taranov2018Project11.pdf Taranov Sergey]
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-11/blob/master/Gubanov2018Project11/Gubanov2018Project11.pdf Губанов Сергей]
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-11/blob/master/Gubanov2018Project11/Gubanov2018Project11.pdf Gubanov Sergey]
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-11/blob/master/Krinitskiy2018Project11/Abstract.pdf Криницкий Константин]
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-11/blob/master/Krinitskiy2018Project11/Abstract.pdf Krinitsky Konstantin]
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-11/blob/master/Zabaznov2018Project11/Zabaznov2018Project11.pdf Забазнов Антон]
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-11/blob/master/Zabaznov2018Project11/Zabaznov2018Project11.pdf Zabaznov Anton]
[https://github.com/Intelligent-Systems-Phystech/2018-Project-11/blob/master/Markin2018Project11/Markin2018Project11%20(1).pdf Valery Markin]
|-
-
|12
+
|Machine translation training without parallel texts.
-
|Обучение машинного перевода без параллельных текстов.
+
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-12 Code],
[https://docs.google.com/document/d/1_5lrNNecgpiW3yObDglUAkTepVGj8ucreMhhcDV60qc/edit LinkReview],
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-12/raw/master/report/result.pdf Отчет],
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-12/raw/master/report/result.pdf Report],
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-12/raw/master/report/pres.pdf Слайды]
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-12/raw/master/report/pres.pdf Slides]
-
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-12/raw/master/Artemenkov2018Title/Artemenkov2018Title.pdf Александр Артеменков]
+
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-12/raw/master/Artemenkov2018Title/Artemenkov2018Title.pdf Alexander Artemenkov]
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-12/raw/master/Yaroshenko2018Title/Yaroshenko2018Title.pdf Ангелина Ярошенко]
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-12/raw/master/Yaroshenko2018Title/Yaroshenko2018Title.pdf Angelina Yaroshenko]
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-12/blob/master/Stroganov2018Title/Stroganov2018Title.pdf Андрей Строганов]
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-12/blob/master/Stroganov2018Title/Stroganov2018Title.pdf Andrey Stroganov]
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-12/blob/master/Skidnov2018Title/Skidnov2018Title.pdf Егор Скиднов]
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-12/blob/master/Skidnov2018Title/Skidnov2018Title.pdf Egor Skidnov]
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-12/raw/master/Borisova2018Title/Borisova2018Title.pdf Анастасия Борисова]
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-12/raw/master/Borisova2018Title/Borisova2018Title.pdf Anastasia Borisova]
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-12/blob/master/Ryabov2018Title/Ryabov2018Title.pdf Рябов Федор]
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-12/blob/master/Ryabov2018Title/Ryabov2018Title.pdf Ryabov Fedor]
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-12/tree/master/Mazurov2018Title/Abstract.pdf Мазуров Михаил]
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-12/tree/master/Mazurov2018Title/Abstract.pdf Mazurov Mikhail]
|-
-
|13
+
|Deep learning for RNA secondary structure prediction
-
|Глубокое обучение для предсказания вторичной структуры РНК
+
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-13/tree/master/code Code]
[https://docs.google.com/document/d/1RrIPcrVb0mEdA_hc7Ttk8thIDnDvtBXgyriIxwpYzzM/edit Link Review]
-
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-13/blob/master/Dorokhin2018Problem13/Dorokhin2018Problem13.pdf Дорохин Семён]
+
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-13/blob/master/Dorokhin2018Problem13/Dorokhin2018Problem13.pdf Dorokhin Semyon]
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-13/tree/master/Pastukhov2018Project13 Пастухов Сергей]
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-13/tree/master/Pastukhov2018Project13 Pastukhov Sergey]
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-13/raw/master/Pikunov2018Problem13/first.pdf Пикунов Андрей]
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-13/raw/master/Pikunov2018Problem13/first.pdf Pikunov Andrey]
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-13/blob/master/Nesterova2018Project13/tutorial.pdf Нестерова Ирина]
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-13/blob/master/Nesterova2018Project13/tutorial.pdf Nesterova Irina]
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-13/blob/master/Kurilovich2018Problem13/Kurilovich2018Problem13.pdf Курилович Анна]
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-13/blob/master/Kurilovich2018Problem13/Kurilovich2018Problem13.pdf Kurilovich Anna]
[https://t.me/joinchat/DE_WxRAo9v0lIKxGyc07Kg chat]
|-
-
|14
 
|Deep Learning for reliable detection of tandem repeats in 3D protein structures
|[https://github.com/Intelligent-Systems-Phystech/2019-Project-14 Code]
Line 1656: Line 2248:
|[https://github.com/Intelligent-Systems-Phystech/2019-Project-14/raw/master/Veselova2019Project14/Veselova2019Project14.pdf Veselova Evgeniya]
|-
-
|15
+
|Formulation and solution of an optimization problem combining classification and regression to estimate the binding energy of a protein and small molecules
-
|Формулировка and решение задачи оптимизации, сочетающей классификацию and регрессию, для оценки энергии связывания белка and маленьких молекул
+
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-15/Code Code]
[https://docs.google.com/document/d/1Be2O0My8KWwOKLo8bFMmF8tPMCFGCK4zUVArurrPeNQ/edit Link Review]
-
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-15/tree/master/Merkulova2018Title Меркулова Анастасия]
+
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-15/tree/master/Merkulova2018Title Merkulova Anastasia]
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-15/tree/master/Plumite2018Title Плумите Эльвира]
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-15/tree/master/Plumite2018Title Plumite Elvira]
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-15/tree/master/Zhiboedova2018Title Жибоедова Анастасия]
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-15/tree/master/Zhiboedova2018Title Zhiboyedova Anastasia]
[https://vk.me/join/AJQ1d2J3jQq0jJ50G5VAoioS chat]
|-
-
|16
+
|Estimation of the optimal sample size for research in medicine
-
|Оценка оптимального объема выборки для исследований в медицине
+
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-16 Code]
[https://docs.google.com/document/d/1yqnjgMUheHQUp8AAQPqqy9jTJhhzzd_6wvnHY7GF1Fk/edit?usp=sharing Link Review]
-
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-16/blob/master/report/Kharatyan2018Project16/report.pdf Артемий Харатян],
+
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-16/blob/master/report/Kharatyan2018Project16/report.pdf Artemy Kharatyan],
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-16/raw/master/Mikheev2018Project16 Михаил Михеев],
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-16/raw/master/Mikheev2018Project16 Mikhail Mikheev],
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-16/tree/master/Evgin2018Project16 Евгин Александр],
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-16/tree/master/Evgin2018Project16 Evgin Alexander],
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-16/tree/master/Seppar2018Project16 Сеппар Александр],
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-16/tree/master/Seppar2018Project16 Seppar Alexander],
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-16/tree/master/Konoplev2018Project16 Коноплёв Максим],
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-16/tree/master/Konoplev2018Project16 Konoplyov Maxim],
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-16/tree/master/Murlatov2018Project16 Мурлатов Станислав],
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-16/tree/master/Murlatov2018Project16 Murlatov Stanislav],
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-16/tree/master/Makarenko2018Project16 Макаренко Степан]
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-16/tree/master/Makarenko2018Project16 Makarenko Stepan]
|-
-
|17
+
|Intention forecasting. Investigation of the properties of local models in the spatial decoding of brain signals
-
|Прогнозирование намерений. Исследование свойств локальных моделей при пространственном декодировании сигналов головного мозга
+
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-17/tree/master/code Code],
[https://docs.google.com/document/d/1j6laGt-zTP3lTm1v0Ozev3dKxivYciq9TOWfmn5sAIU/edit?usp=sharing LinkReview],
[https://github.com/Intelligent-Systems-Phystech/2018-Project-17/raw/master/report/Presentation.pdf Presentation]
-
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-17/blob/master/Bolobolova2018Project17/Bolobolova2018Project17.pdf Наталия Болоболова],
+
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-17/blob/master/Bolobolova2018Project17/Bolobolova2018Project17.pdf Natalia Bolobolova],
[https://github.com/Intelligent-Systems-Phystech/2018-Project-17/raw/master/Samokhina2018Project17/Samokhina2018Problem17.pdf Alina Samokhina],
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-17/raw/master/Shiyanov2018Project17/Shiyanov2018Project17.pdf Шиянов Вадим]
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-17/raw/master/Shiyanov2018Project17/Shiyanov2018Project17.pdf Shiyanov Vadim]
|-
-
|18
+
|Intention forecasting. Building an optimal signal decoding model for modeling a brain-computer interface.
-
|Прогнозирование намерений. Построение оптимальной модели декодирования сигналов при моделировании нейрокомпьютерного интерфейса.
+
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-18 Code],
[https://docs.google.com/document/d/1b-CjunKY5nkZUK0Zfur0nKyQPaY2eWqht7kMcMQd-J8/edit LinkReview],
[https://github.com/Intelligent-Systems-Phystech/2018-Project-18/raw/master/Presentation-v1.pdf Presentation],
[https://github.com/Intelligent-Systems-Phystech/2018-Project-18/raw/master/_________________________.pdf Article]
-
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-18/raw/master/Nasedkin2018Project18/Nasedkin2018Project18.pdf Иван Наседкин], [https://github.com/Intelligent-Systems-Phystech/2018-Project-18/raw/master/Latypova2018Project18/Latypova.pdf Галия Латыпова],
+
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-18/raw/master/Nasedkin2018Project18/Nasedkin2018Project18.pdf Ivan Nasedkin], [https://github.com/Intelligent-Systems-Phystech/2018-Project-18/raw/master/Latypova2018Project18/Latypova.pdf Galiya Latypova],
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-18/raw/master/Sukhodolskiy2018Project18/Sukhodolskiy2018Project18.pdf Нестор Суходольский],
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-18/raw/master/Sukhodolskiy2018Project18/Sukhodolskiy2018Project18.pdf Nestor Sukhodolsky],
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-18/raw/master/Shemenev2018Project18/Shemenev2018Project18.pdf Александр Шеменев]
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-18/raw/master/Shemenev2018Project18/Shemenev2018Project18.pdf Alexander Shemenev]
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-18/raw/master/Borodulin2018Project18/Borodulin2018Project18.pdf Иван Бородулин],
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-18/raw/master/Borodulin2018Project18/Borodulin2018Project18.pdf Ivan Borodulin],
|-
-
|19
+
|Investigation of the dependence of the quality of recognition of ontological objects on the depth of hyponymy.
-
| Исследование зависимости качества распознавания онтологических объектов от глубины гипонимии.
+
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-19 Code],
[https://github.com/ddvika/2018-Project-19/raw/master/report/final_report.pdf Report],
[https://docs.google.com/document/d/1OeMPgVMi72AbHOKsKsUDs6ggMdNL2UT0liycgmYrnLk/edit LinkReview], [https://github.com/ddvika/2018-Project-19/raw/master/report/presentation19project.pdf Presentation]
-
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-19/raw/master/Rezyapkin2018Project19/RezyapkinPaper.pdf Вячеслав Резяпкин], [https://github.com/Intelligent-Systems-Phystech/2018-Project-19/raw/master/Russkin2018Project19/Russkin2018Project19.pdf Алексей Русскин],
+
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-19/raw/master/Rezyapkin2018Project19/RezyapkinPaper.pdf Vyacheslav Rezyapkin], [https://github.com/Intelligent-Systems-Phystech/2018-Project-19/raw/master/Russkin2018Project19/Russkin2018Project19.pdf Alexey Russkin],
-
[https://github.com/ddvika/2018-Project-19/raw/master/Dochkina2018Project19/Dochkina2018Project19.pdf Виктория Дочкина],
+
[https://github.com/ddvika/2018-Project-19/raw/master/Dochkina2018Project19/Dochkina2018Project19.pdf Victoria Dochkina],
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-19/raw/master/Kuznetsov2018Project19/KuznetsovMiron.pdf Мирон Кузнецов],
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-19/raw/master/Kuznetsov2018Project19/KuznetsovMiron.pdf Miron Kuznetsov],
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-19/raw/master/Yarmoshik2018Project19/Yarmoshik_article.pdf Ярмошик Демьян]
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-19/raw/master/Yarmoshik2018Project19/Yarmoshik_article.pdf Yarmoshik Demyan]
|-
|-
-
|20
+
|Comparison of the quality of end-to-end trainable models in the problem of answering questions in a dialogue, taking the context into account
-
| Сравнение качества end-to-end обучаемых моделей в задаче ответа на вопросы в диалоге с учетом контекста
+
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-20 Code]
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-20 Code]
[https://docs.google.com/document/d/1GQmJ6I2fIBchikR-44DcmMD4H-58j3_wuIchNK49Zrs/edit LinkReview]
[https://docs.google.com/document/d/1GQmJ6I2fIBchikR-44DcmMD4H-58j3_wuIchNK49Zrs/edit LinkReview]
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-20/blob/master/Ryakin2018problem20/Ryakin2018project20.pdf Отчет],
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-20/blob/master/Ryakin2018problem20/Ryakin2018project20.pdf Report],
[https://github.com/Intelligent-Systems-Phystech/2018-Project-20/blob/master/presentation/QuAC.pdf Presentation]
[https://github.com/Intelligent-Systems-Phystech/2018-Project-20/blob/master/presentation/QuAC.pdf Presentation]
-
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-20/raw/master/Agafonov2018probem20/article/Agafonov2018project20.pdf Агафонов Алексей], [https://github.com/Intelligent-Systems-Phystech/2018-Project-20/blob/master/Ryakin2018problem20/Ryakin2018project20.pdf Рякин Илья],[https://github.com/Intelligent-Systems-Phystech/2018-Project-20/blob/master/Litvinenko2018problem20/Litvinenko2018project20.pdf Литвиенко Владимир],
+
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-20/raw/master/Agafonov2018probem20/article/Agafonov2018project20.pdf Agafonov Alexey], [https://github.com/Intelligent-Systems-Phystech/2018-Project-20/blob/master/Ryakin2018problem20/Ryakin2018project20.pdf Ryakin Ilya],[https://github.com/Intelligent-Systems-Phystech/2018-Project-20/blob/master/Litvinenko2018problem20/Litvinenko2018project20.pdf Litvinenko Vladimir],
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-20/blob/master/Khokhlov2018problem20/Khokhlov2018project20.pdf Хохлов Иван],
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-20/blob/master/Khokhlov2018problem20/Khokhlov2018project20.pdf Khokhlov Ivan],
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-20/blob/master/Velikovsky2018project20/Velikovsky2018project20.pdf Великовский Никита],
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-20/blob/master/Velikovsky2018project20/Velikovsky2018project20.pdf Velikovsky Nikita],
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-20/blob/master/Anufrienko2018project20/Anufrienko2018project20.pdf Ануфриенко Олег]
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-20/blob/master/Anufrienko2018project20/Anufrienko2018project20.pdf Anufrienko Oleg]
|-
|-
-
|21
+
|High order convex optimization methods
-
|Методы выпуклой оптимизации высокого порядка
+
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-21/tree/master/code Code],
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-21/tree/master/code Code],
[https://docs.google.com/document/d/1jF1Hkqbn2e7BnuguTzYuRPp43Y5MbMP36MlWwFVkf6U/edit LinkReview],
[https://docs.google.com/document/d/1jF1Hkqbn2e7BnuguTzYuRPp43Y5MbMP36MlWwFVkf6U/edit LinkReview],
[https://github.com/Intelligent-Systems-Phystech/2018-Project-21/blob/master/report/presentation_results.pdf Slides]
[https://github.com/Intelligent-Systems-Phystech/2018-Project-21/blob/master/report/presentation_results.pdf Slides]
-
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-21/raw/master/Selikhanovych2018Title/Selikhanovych2018Title.pdf Селиханович Даниил],
+
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-21/raw/master/Selikhanovych2018Title/Selikhanovych2018Title.pdf Selikhanovich Daniel],
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-21/blob/master/Sokolov2018Title/Sokolov2018Title.pdf Соколов Игорь]
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-21/blob/master/Sokolov2018Title/Sokolov2018Title.pdf Sokolov Igor]
|-
|-
-
|23
+
|Fractal analysis and synthesis of optical images of sea waves
-
|Фрактальный анализ and синтез оптических изображений морского волнения
+
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-23/tree/master/code code],
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-23/tree/master/code code],
[https://docs.google.com/document/d/1g-8H-i8vyThkWUTvthebbr4-qSd8c-kE4B_bieykF7c/edit LinkReview],
[https://docs.google.com/document/d/1g-8H-i8vyThkWUTvthebbr4-qSd8c-kE4B_bieykF7c/edit LinkReview],
[https://github.com/Intelligent-Systems-Phystech/2018-Project-23/blob/master/Kanygin2018/Projecte23_presentation.pdf Presentation]
[https://github.com/Intelligent-Systems-Phystech/2018-Project-23/blob/master/Kanygin2018/Projecte23_presentation.pdf Presentation]
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-23/raw/master/Kanygin2018/Kanygin2018Project23.pdf report]
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-23/raw/master/Kanygin2018/Kanygin2018Project23.pdf Report]
-
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-23/raw/master/Kanygin2018/Kanygin2018Project23.pdf Каныгин Юрий]
+
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-23/raw/master/Kanygin2018/Kanygin2018Project23.pdf Kanygin Yuri]
|-
|-
-
|24
+
|Entropy maximization for various types of image transformations
-
|Максимизация энтропии при различных видах преобразований над изображением
+
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-24/tree/master/code code],
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-24/tree/master/code code],
[https://docs.google.com/document/d/1FtOjEcx7S0PJ7ASP0V_5zM2nQDTSl0c9I61r0SYAWVc/edit LinkReview],
[https://docs.google.com/document/d/1FtOjEcx7S0PJ7ASP0V_5zM2nQDTSl0c9I61r0SYAWVc/edit LinkReview],
[https://github.com/Intelligent-Systems-Phystech/2018-Project-24/raw/master/report/report2018Project24.pdf report],
[https://github.com/Intelligent-Systems-Phystech/2018-Project-24/raw/master/report/report2018Project24.pdf report],
[https://github.com/Intelligent-Systems-Phystech/2018-Project-24/raw/master/slides/slides2018Project24.pdf slides]
[https://github.com/Intelligent-Systems-Phystech/2018-Project-24/raw/master/slides/slides2018Project24.pdf slides]
-
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-24/raw/master/Voskresenskiy2018Project24/Voskresenskiy2018Project24.pdf Никита Воскресенский],
+
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-24/raw/master/Voskresenskiy2018Project24/Voskresenskiy2018Project24.pdf Nikita Voskresensky],
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-24/raw/master/Shabalina2018Project24/Shabalina2018Project24.pdf Алиса Шабалина],
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-24/raw/master/Shabalina2018Project24/Shabalina2018Project24.pdf Alisa Shabalina],
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-24/raw/master/Murzaev2018Project24/Murzaev2018Project24.pdf Ярослав Мурзаев],
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-24/raw/master/Murzaev2018Project24/Murzaev2018Project24.pdf Yaroslav Murzaev],
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-24/raw/master/Khokhlov2018Project24/Khokhlov2018Project24.pdf Алексей Хохлов],
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-24/raw/master/Khokhlov2018Project24/Khokhlov2018Project24.pdf Alexey Khokhlov],
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-24/raw/master/Kazakov2018Project24/Kazakov2018Project24.pdf Алексей Казаков],
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-24/raw/master/Kazakov2018Project24/Kazakov2018Project24.pdf Alexey Kazakov],
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-24/raw/master/Gribova2018Project24/Gribova2018Project24.pdf Ольга Грибова],
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-24/raw/master/Gribova2018Project24/Gribova2018Project24.pdf Olga Gribova],
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-24/raw/master/Belozertsev2018Project24/Belozertsev2018Project24.pdf Александр Белозерцев]
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-24/raw/master/Belozertsev2018Project24/Belozertsev2018Project24.pdf Alexander Belozertsev]
|-
|-
-
|25
+
|Automatic detection and recognition of objects in images
-
|Автоматическое детектирование and распознавание объектов на изображениях
+
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-25 code],
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-25 code],
[https://github.com/Intelligent-Systems-Phystech/2018-Project-25a code_A],
[https://github.com/Intelligent-Systems-Phystech/2018-Project-25a code_A],
Line 1758: Line 2340:
[https://github.com/Intelligent-Systems-Phystech/2018-Project-25/raw/master/report/slides_last.pdf slides_25_31]
[https://github.com/Intelligent-Systems-Phystech/2018-Project-25/raw/master/report/slides_last.pdf slides_25_31]
[https://docs.google.com/document/d/1s7QlihPkamecuVXXLVc5V76cBQn3HBo47HdbAOD0xBI/edit LinkReview]
[https://docs.google.com/document/d/1s7QlihPkamecuVXXLVc5V76cBQn3HBo47HdbAOD0xBI/edit LinkReview]
-
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-25/raw/master/Demidova2018Title/Demidova2018Project25_31.pdf Юлия Демидова]
+
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-25/raw/master/Demidova2018Title/Demidova2018Project25_31.pdf Julia Demidova]
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-25/blob/master/Razumov2018Title/Razumov2018Project25_30.pdf Иван Разумов]
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-25/blob/master/Razumov2018Title/Razumov2018Project25_30.pdf Ivan Razumov]
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-25/raw/master/report/Report2018Project25_31.pdf Владислав Томинин]
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-25/raw/master/report/Report2018Project25_31.pdf Vladislav Tominin]
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-25/raw/master/TomininY2018Title/final/TomininY2018Project25_31.pdf Ярослав Томинин]
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-25/raw/master/TomininY2018Title/final/TomininY2018Project25_31.pdf Yaroslav Tominin]
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-25/blob/master/Dudorov2018Title/Dudorov2018Project25_31.pdf Никита Дудоров]
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-25/blob/master/Dudorov2018Title/Dudorov2018Project25_31.pdf Nikita Dudorov]
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-25/raw/master/Erlygin2018Title/jmlda-example-students.pdf Леонид Ерлыгин]
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-25/raw/master/Erlygin2018Title/jmlda-example-students.pdf Leonid Erlygin]
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-25/raw/master/Proshutinskii2018/!%20Article/Proshutinskii2018Project25_30.pdf Прошутинский Дмитрий]
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-25/raw/master/Proshutinskii2018/!%20Article/Proshutinskii2018Project25_30.pdf Proshutinsky Dmitry]
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-25/blob/master/Baymakov2018/25_Project.pdf Баймаков Владимир]
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-25/blob/master/Baymakov2018/25_Project.pdf Baimakov Vladimir]
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-25/blob/master/Zubkov2018/Zubkov2018Problem25.pdf Зубков Александр]
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-25/blob/master/Zubkov2018/Zubkov2018Problem25.pdf Zubkov Alexander]
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-25/blob/master/Chernenkova2018/Chernenkova2018Problem25.pdf Черненкова Елена]
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-25/blob/master/Chernenkova2018/Chernenkova2018Problem25.pdf Chernenkova Elena]
-
 
+
|-
|-
-
|26
+
|Location determination by accelerometer signals
-
|Определение местоположения по сигналам акселерометра
+
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-26 Code],
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-26 Code],
[https://docs.google.com/document/d/1er3SgPu9bBBWkLk1yVev-9Ue42BOPapOkLn6sL0GAGA/edit?usp=sharing LinkReview],
[https://docs.google.com/document/d/1er3SgPu9bBBWkLk1yVev-9Ue42BOPapOkLn6sL0GAGA/edit?usp=sharing LinkReview],
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-26/raw/master/Project26.pdf Слайды],
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-26/raw/master/Project26.pdf Slides],
-
[https://github.com/Vitaly-Protasov/Project26/raw/master/text.pdf Текст]
+
[https://github.com/Vitaly-Protasov/Project26/raw/master/text.pdf Text]
-
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-26/raw/master/Zainulina2018Project26/Zainulina2018Project26.pdf Эльвира Зайнулина]
+
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-26/raw/master/Zainulina2018Project26/Zainulina2018Project26.pdf Elvira Zainulina]
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-26/raw/master/Fateev2018Project26/Fateev2018Project26.pdf Фатеев Дмитрий]
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-26/raw/master/Fateev2018Project26/Fateev2018Project26.pdf Fateev Dmitry]
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-26/raw/master/ProtasovKing2018Project26/Article.pdf Виталий Протасов]
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-26/raw/master/ProtasovKing2018Project26/Article.pdf Vitaly Protasov]
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-26/raw/master/Bozhedomov2018Project26/Bozhedomov2018Project26.pdf Никита Божедомов]
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-26/raw/master/Bozhedomov2018Project26/Bozhedomov2018Project26.pdf Nikita Bozhedomov]
|-
|-
-
|28
+
|Multimodelling as a universal way to describe a general sample
-
|Мультимоделирование как универсальный способ описания выборки общего вида
+
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-28 Code],
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-28 Code],
[https://docs.google.com/document/d/1w8KoJqcppcsjjtQ_MNd4JTdxmCgerllRRkqvJHWhpX4/edit Linkreview],
[https://docs.google.com/document/d/1w8KoJqcppcsjjtQ_MNd4JTdxmCgerllRRkqvJHWhpX4/edit Linkreview],
[https://github.com/Intelligent-Systems-Phystech/2018-Project-28/blob/master/Slides.pdf Slides],
[https://github.com/Intelligent-Systems-Phystech/2018-Project-28/blob/master/Slides.pdf Slides],
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-28/blob/master/report/report.pdf report]
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-28/blob/master/report/report.pdf Report]
-
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-28/raw/master/Kachanov2018Project28/Kachanov2018Project28.pdf Владимир Качанов]
+
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-28/raw/master/Kachanov2018Project28/Kachanov2018Project28.pdf Vladimir Kachanov]
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-28/raw/master/Strelkova2018Project28/Strelkova2018Project28.pdf Евгения Стрелкова]
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-28/raw/master/Strelkova2018Project28/Strelkova2018Project28.pdf Evgenia Strelkova]
|-
|-
-
|29
 
|Cross-Language Document Extractive Summarization with Neural Sequence Model
|Cross-Language Document Extractive Summarization with Neural Sequence Model
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-29/tree/master/code Code],
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-29/tree/master/code Code],
-
[https://docs.google.com/spreadsheets/d/1mDOp2KnXI9dH8_QYdj4fY-pMBWnqXfECkFUEg244O38/edit#gid=0 Linkreview], [https://github.com/Intelligent-Systems-Phystech/2018-Project-29/raw/master/report/Task29_Report.pdf Отчет], [https://github.com/Intelligent-Systems-Phystech/2018-Project-29/raw/master/report/CrossLang_Summa.pdf Слайды]
+
[https://docs.google.com/spreadsheets/d/1mDOp2KnXI9dH8_QYdj4fY-pMBWnqXfECkFUEg244O38/edit#gid=0 Linkreview], [https://github.com/Intelligent-Systems-Phystech/2018-Project-29/raw/master/report/Task29_Report.pdf Report], [https://github.com/Intelligent-Systems-Phystech/2018-Project-29/raw/master/report/CrossLang_Summa.pdf Slides]
-
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-29/raw/master/Zakharov2018Title/Zakharov2018Article.pdf Павел Захаров]
+
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-29/raw/master/Zakharov2018Title/Zakharov2018Article.pdf Pavel Zakharov]
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-29/blob/master/Kvasha2018Title/article.pdf Павел Кваша]
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-29/blob/master/Kvasha2018Title/article.pdf Pavel Kvasha]
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-29/tree/master/Dyachkov2018Title/article.pdf Евгений Дьячков]
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-29/tree/master/Dyachkov2018Title/article.pdf Evgeny Dyachkov]
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-29/raw/master/Petrov2018Title/article.pdf Евгений Петров]
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-29/raw/master/Petrov2018Title/article.pdf Evgeny Petrov]
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-29/blob/master/Selnitskiy2018Title/article.pdf Илья Сельницкий]
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-29/blob/master/Selnitskiy2018Title/article.pdf Ilya Selnitsky]
|-
|-
-
|31
 
|Pairwise energy matrix construction for inverse folding problem
|Pairwise energy matrix construction for inverse folding problem
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-31/tree/master/code Code],
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-31/tree/master/code Code],
Line 1806: Line 2383:
[https://github.com/Intelligent-Systems-Phystech/2018-Project-31/blob/master/Rubinstein2018Project31/Rubinstein2018Project31.pdf Report]
[https://github.com/Intelligent-Systems-Phystech/2018-Project-31/blob/master/Rubinstein2018Project31/Rubinstein2018Project31.pdf Report]
[https://github.com/Intelligent-Systems-Phystech/2018-Project-31/raw/master/report/RubinsteinAR.pdf Slides]
[https://github.com/Intelligent-Systems-Phystech/2018-Project-31/raw/master/report/RubinsteinAR.pdf Slides]
-
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-31/raw/master/Rubinstein2018Project31/Rubinstein2018Project31.pdf Рубинштейн Александр]
+
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-31/raw/master/Rubinstein2018Project31/Rubinstein2018Project31.pdf Rubinshtein Alexander]
|-
|-
-
|32
 
|Smooth orientation-dependent scoring function
|Smooth orientation-dependent scoring function
|[https://gitlab.inria.fr/grudinin/sbrod Code]
|[https://gitlab.inria.fr/grudinin/sbrod Code]
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-SBROD Отчёт]
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-SBROD Report]
-
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-SBROD/blob/master/Noskova/report.pdf Носкова Елизавета]
+
|[https://github.com/Intelligent-Systems-Phystech/2018-Project-SBROD/blob/master/Noskova/report.pdf Noskova Elizaveta]
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-SBROD/blob/master/Kachkov/report.pdf Качков Сергей]
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-SBROD/blob/master/Kachkov/report.pdf Kachkov Sergey]
-
[https://github.com/Intelligent-Systems-Phystech/2018-Project-SBROD/blob/master/Sidorenko/report.pdf Сидоренко Антон]
+
[https://github.com/Intelligent-Systems-Phystech/2018-Project-SBROD/blob/master/Sidorenko/report.pdf Sidorenko Anton]
-
 
+
|-
|}
|}
-
=== Task 5 ===
+
===5. 2018===
-
* '''Name:''' Нахождение парафразов.
+
* '''Title:''' Finding paraphrases.
-
* '''Task:''' Парафразы — разные вариации одного and того же текста, одинаковые по смыслу, но отличающиеся лексически and грамматически, например: "Куда поехала машина" and "В каком направлении поехал автомобиль". Task детектирования парафразов заключается в выделении в множестве текстов кластеров, таких что в каждом кластере содержатся только парафразы одного and того же предложения.
+
* '''Problem description:''' Paraphrases are different variations of the same text that are identical in meaning but differ lexically and grammatically, for example: "Where did the car go" and "Which direction did the car go". The problem of detecting paraphrases is to select clusters in a set of texts such that each cluster contains only paraphrases of the same sentence. The easiest way to extract paraphrases is to cluster texts, where each text is represented by a "bag of words".
-
Самый простой способ выделения парафразов — кластеризация текстов, где каждый текст представлен "мешком слов".
+
* '''Data:''' Open question datasets for training and testing are available on kaggle.com; open test data are available from the SemEval conferences.
-
*. '''Data:''' Есть открытые датасеты вопросов для тестирования and обучения на kaggle.com, есть открытые данные для тестирования с конференций semeval.
+
* '''Base algorithm:''' Use one of the document clustering algorithms to extract paraphrases, where each document is represented by a bag of words or tf-idf.
-
* '''References:'''
+
* '''Solution:''' Use neural network architectures to search for paraphrases, use phrases extracted with syntactic parsers as features, use multilevel clustering.
-
*# Будет позже
+
* '''Novelty:''' There are no implementations for the Russian language that use syntactic parsers for this kind of problem; all current solutions are fairly "simple".
-
* '''Basic algorithm:''' Использовать для выделения парафразов какой-нибудь из алгоритмов кластеризации документов, где каждый документ представлен мешком слов или tf-idf.
+
* '''Authors:''' Artyom Popov.
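The base algorithm for this problem (clustering texts represented by a bag of words or tf-idf) can be illustrated with a minimal sketch. It is not the project's code: the whitespace tokenizer, the smoothed idf formula, the single-link merging rule, the similarity threshold of 0.5, and the function names <code>tfidf_vectors</code>, <code>cosine</code> and <code>cluster_paraphrases</code> are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(texts):
    """Return one sparse tf-idf vector (dict: token -> weight) per text."""
    # Assumption: lower-casing and whitespace splitting stand in for a real tokenizer.
    docs = [Counter(text.lower().split()) for text in texts]
    n_docs = len(docs)
    doc_freq = Counter(token for doc in docs for token in doc)
    # Smoothed idf keeps the weight of a token present in every text positive.
    idf = {tok: math.log((1 + n_docs) / (1 + cnt)) + 1.0
           for tok, cnt in doc_freq.items()}
    return [{tok: tf * idf[tok] for tok, tf in doc.items()} for doc in docs]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster_paraphrases(texts, threshold=0.5):
    """Single-link clustering: merge texts whose similarity is >= threshold."""
    vectors = tfidf_vectors(texts)
    parent = list(range(len(texts)))  # union-find forest over text indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(texts)):
        for j in range(i + 1, len(texts)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(texts)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

Calling <code>cluster_paraphrases</code> on a list of sentences returns lists of sentence indices, one list per candidate paraphrase cluster. A bag-of-words baseline of this kind only sees shared tokens, which is why the proposed solution turns to neural architectures and parser-based features.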
-
* '''Solution:''' Использовать нейросетевые архитектуры для поиска парафразов, использовать в качестве признаков словосочетания, выделенные с помощью синтаксических анализаторов, использовать многоуровневую кластеризацию.
+
 
-
* '''Novelty:''' Отсутствие реализаций для русского языка, которые будут использовать синтаксические анализаторы для подобной задачи, все текущие решения достаточно "просты".
+
===6. 2018===
-
* '''Authors:''' Артём Попов.
+
* '''Title:''' On conformational changes of proteins using collective motions in torsion angle space and L1 regularization.
-
=== Task 6 ===
+
* '''Problem description:''' Torsion angles are the most natural degrees of freedom for describing motions of polymers, such as proteins. This is because bond lengths and bond angles are heavily constrained by covalent forces. Thus, multiple attempts have been made to describe protein dynamics in the torsion angle space. For example, one of us has developed an elastic network model (ENM) [1] in torsion angle space called Torsional Network Model (TNM) [2]. Functional conformational changes in proteins can be described in the Cartesian space using just a subset of collective coordinates [3], or even a sparse representation of these [4]. The latter requires a solution of a LASSO optimization problem [5]. The goal of the current project is to study if a sparse subset of collective coordinates in the torsion subspace can describe functional conformational changes in proteins. This will require a solution of a ridge regression problem with an L1 regularization constraint. The starting point will be the LASSO formulation.
-
* '''Name:''' On conformational changes of proteins using collective motions in torsion angle space and L1 regularization.
+
* '''Data:''' Experimental conformations will be extracted from the Protein Docking Benchmark v5 (https://zlab.umassmed.edu/benchmark/) and a few others. The TNM model can be downloaded from https://ub.cbm.uam.es/tnm/tnm_soft_main.php
-
* '''Task:''' Torsion angles are the most natural degrees of freedom for describing motions of polymers, such as proteins. This is because bond lengths and bond angles are heavily constrained by covalent forces. Thus, multiple attempts have been done to describe protein dynamics in the torsion angle space. For example, one of us has developed an elastic network model (ENM) [1] in torsion angle space called Torsional Network Model (TNM) [2]. Functional conformational changes in proteins can be described in the Cartesian space using just a subset of collective coordinates [3], or even a sparse representation of these [4]. The latter requires a solution of a LASSO optimization problem [5]. The goal of the current project is to study if a sparse subset of collective coordinates in the torsion subspace can describe functional conformational changes in proteins. This will require a solution of a ridge regression problem with a L1 regularization constraint. The starting point will be the LASSO formulation.
+
-
*. '''Data:''' Experimental conformations will be extracted from the Protein Docking Benchmark v5 (https://zlab.umassmed.edu/benchmark/) and a few others. The TNM model can be downloaded from https://ub.cbm.uam.es/tnm/tnm_soft_main.php
+
* '''References:'''
* '''References:'''
*# Tirion MM. (1996) Large Amplitude Elastic Motions in Proteins from a Single-Parameter, Atomic Anal- ysis. Phys Rev Lett. 77:1905–1908.
*# Tirion MM. (1996) Large Amplitude Elastic Motions in Proteins from a Single-Parameter, Atomic Anal- ysis. Phys Rev Lett. 77:1905–1908.
Line 1840: Line 2414:
*# https://en.wikipedia.org/wiki/Lasso_(statistics)
*# https://en.wikipedia.org/wiki/Lasso_(statistics)
*# E. Frezza, R. Lavery, Internal normal mode analysis (iNMA) applied to protein conformational flexibility, Journal of Chemical Theory and Computation 11 (2015) 5503–5512.
*# E. Frezza, R. Lavery, Internal normal mode analysis (iNMA) applied to protein conformational flexibility, Journal of Chemical Theory and Computation 11 (2015) 5503–5512.
-
* '''Basic algorithm:''' The starting point will be a combination of methods from references 2 and 4. It has to be a LASSO formulation with the direction vectors reconstructed from the internal coordinates. The quality will be computed based on the RMSD measure between the prediction and the solution on several benchmarks. Results will be presented with statistical plots (see examples in references 3-4.
+
* '''Base algorithm:''' The starting point will be a combination of methods from references 2 and 4. It has to be a LASSO formulation with the direction vectors reconstructed from the internal coordinates. The quality will be computed based on the RMSD measure between the prediction and the solution on several benchmarks. Results will be presented with statistical plots (see examples in references 3-4).
* '''Novelty:''' This is an important and open question in computational structural bioinformatics - how to efficiently represent transitions between protein structures. Not much has been done in the torsional angle subspace (internal coordinates)[6] and nearly nothing has been done using L1 regularization [4].
* '''Novelty:''' This is an important and open question in computational structural bioinformatics - how to efficiently represent transitions between protein structures. Not much has been done in the torsional angle subspace (internal coordinates)[6] and nearly nothing has been done using L1 regularization [4].
* '''Authors:''' Ugo Bastolla on the torsional subspace (https://ub.cbm.uam.es/home/ugo.php), Sergei Grudinin on L1 minimization (https://team.inria.fr/nano-d/team-members/sergei-grudinin/)
* '''Authors:''' Ugo Bastolla on the torsional subspace (https://ub.cbm.uam.es/home/ugo.php), Sergei Grudinin on L1 minimization (https://team.inria.fr/nano-d/team-members/sergei-grudinin/)
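The LASSO formulation and the RMSD quality measure mentioned for this problem can be sketched in a few lines. This is only an illustration under stated assumptions, not the TNM software or the method of references 2 and 4: the columns of <code>X</code> are assumed to hold collective motion directions, <code>y</code> the observed displacement between two conformations, and the objective is taken as (1/2n)||y - Xw||² + alpha·||w||₁, solved by plain cyclic coordinate descent. The function names are hypothetical.

```python
import math


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (proximal operator of the L1 norm)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimize (1/2n)||y - Xw||^2 + alpha * ||w||_1 by cyclic coordinate descent."""
    n_samples, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    residual = list(y)  # equals y - Xw for the initial w = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n_samples)) / n_samples
              for j in range(n_features)]
    for _ in range(n_iter):
        for j in range(n_features):
            if col_sq[j] == 0.0:
                continue  # an all-zero column cannot explain anything
            # Correlation of column j with the residual that excludes feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j])
                      for i in range(n_samples)) / n_samples
            new_wj = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n_samples):
                    residual[i] -= X[i][j] * delta
                w[j] = new_wj
    return w


def rmsd(a, b):
    """Root-mean-square deviation between two equally long coordinate lists."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / len(a))
```

With a sufficiently large <code>alpha</code>, weakly contributing directions receive exactly zero weight, which is the sparsity property the project wants to exploit; <code>rmsd</code> can then compare the displacement reconstructed from the sparse weights with the target one.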
-
=== Task 10 ===
+
===10. 2018===
-
* '''Name:''' Сравнение нейросетевых and непрерывно-морфологических методов в задаче детекции текста (Text Detection).
+
* '''Title:''' Comparison of neural network and continuous-morphological methods in the problem of text detection (Text Detection).
-
* '''Task''': Automatically Detect Text in Natural Images.
+
* '''Problem:''' Automatically Detect Text in Natural Images.
-
* '''Data:''' синтетические сгенерированные данные + подготовленная выборка фотографий + [https://vision.cornell.edu/se3/coco-text-2/ COCO-Text dataset] + [http://www.machinelearning.ru/wiki/index.php?title=%D0%9A%D0%BE%D0%BD%D0%BA%D1%83%D1%80%D1%81_Avito.ru-2014:_%D1%80%D0%B0%D1%81%D0%BF%D0%BE%D0%B7%D0%BD%D0%B0%D0%B2%D0%B0%D0%BD%D0%B8%D0%B5_%D0%BA%D0%BE%D0%BD%D1%82%D0%B0%D0%BA%D1%82%D0%BD%D0%BE%D0%B9_%D0%B8%D0%BD%D1%84%D0%BE%D1%80%D0%BC%D0%B0%D1%86%D0%B8%D0%B8_%D0%BD%D0%B0_%D0%B8%D0%B7%D0%BE%D0%B1%D1%80%D0%B0%D0%B6%D0%B5%D0%BD%D0%B8%D1%8F%D1%85 Конкурс Avito 2014].
+
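* '''Data:''' Synthetic generated data + prepared sample of photos + [https://vision.cornell.edu/se3/coco-text-2/ COCO-Text dataset] + [http://www.machinelearning.ru/wiki/index.php?title=%D0%9A%D0%BE%D0%BD%D0%BA%D1%83%D1%80%D1%81_Avito.ru-2014:_%D1%80%D0%B0%D1%81%D0%BF%D0%BE%D0%B7%D0%BD%D0%B0%D0%B2%D0%B0%D0%BD%D0%B8%D0%B5_%D0%BA%D0%BE%D0%BD%D1%82%D0%B0%D0%BA%D1%82%D0%BD%D0%BE%D0%B9_%D0%B8%D0%BD%D1%84%D0%BE%D1%80%D0%BC%D0%B0%D1%86%D0%B8%D0%B8_%D0%BD%D0%B0_%D0%B8%D0%B7%D0%BE%D0%B1%D1%80%D0%B0%D0%B6%D0%B5%D0%BD%D0%B8%D1%8F%D1%85 Avito 2014 competition].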
-
* '''References:''': [https://vision.cornell.edu/se3/wp-content/uploads/2016/01/1601.07140v1.pdf COCO benchmark], [https://vision.cornell.edu/se3/wp-content/uploads/2016/01/1601.07140v1.pdf One of a state-of-the-art architecture]
+
* '''References:''' [https://vision.cornell.edu/se3/wp-content/uploads/2016/01/1601.07140v1.pdf COCO benchmark], [https://vision.cornell.edu/se3/wp-content/uploads/2016/01/1601.07140v1.pdf One of the state-of-the-art architectures]
-
* '''Basic algorithm:''' [https://github.com/eragonruan/text-detection-ctpn code] + морфологические методы, [http://www.machinelearning.ru/wiki/images/f/f1/Avito.ru-2014_Ulyanov_presentation.pdf Avito 2014 winner’s solution].
+
* '''Base algorithm:''' [https://github.com/eragonruan/text-detection-ctpn code] + morphological methods, [http://www.machinelearning.ru/wiki/images/f/f1/Avito.ru-2014_Ulyanov_presentation.pdf Avito 2014 winner’s solution].
-
* '''Solution:''' Предлагается сравнить работы нескольких state-of-the-art алгоритмов, которым нужна обширная обучающая выборка, с морфологическими методы, требующие небольшого числа данных. Предлагается определить границы применимости тех или иных методов.
+
* '''Solution:''' It is proposed to compare the performance of several state-of-the-art algorithms that need a large training set with morphological methods that require a small amount of data. It is proposed to determine the limits of applicability of certain methods.
-
* '''Novelty:''' предложить алгоритм, основанный на использовании как нейросетевых, так and морфологических методов (решение задачи word detection).
+
* '''Novelty:''' An algorithm is proposed that combines neural network and morphological methods (for the word detection problem).
-
* '''Authors:''' И. Н. Жариков.
+
* '''Authors:''' I. N. Zharikov.
-
* '''Expert''': Л. М. Местецкий (морфологические методы).
+
* '''Expert''': L. M. Mestetsky (morphological methods).
-
===Task 16 ===
+
===16. 2018===
-
* '''Name:''' Оценка оптимального объема выборки для исследований в медицине
+
* '''Title:''' Estimate of the optimal sample size for research in medicine
-
* '''Task''': В условиях недостаточного числа дорогостоящих измерений требуется спрогнозировать оптимальный объем пополняемой выборки.
+
* '''Problem:''' When expensive measurements are scarce, it is required to predict the optimal size of a sample that is being replenished.
-
* '''Data:''' Выборки измерений в медицинской диагностике, в частности, выборка иммунологических маркеров.
+
* '''Data:''' Samples of measurements in medical diagnostics, in particular, a sample of immunological markers.
-
* '''References:''':
+
* '''References:'''
-
** Мотренко А.П. Материалы по алгоритмам оценки оптимального объема выборки в репозитории MLAlgorithms[http://svn.code.sf.net/p/mlalgorithms/code/PhDThesis/Motrenko/doc/], [http://svn.code.sf.net/p/mlalgorithms/code/Group874/Motrenko2014KL/].
+
*# Motrenko A.P. Materials on algorithms for estimating the optimal sample size in the MLAlgorithms repository [http://svn.code.sf.net/p/mlalgorithms/code/PhDThesis/Motrenko/doc/], [http://svn.code.sf.net/p/mlalgorithms/code/Group874/Motrenko2014KL/].
-
* '''Basic algorithm''': Серия эмпирических алгоритмов оценки объема выборки.
+
* '''Basic algorithm''': A series of empirical sample size estimation algorithms.
-
* '''Solution:''' Исследование свойств пространства параметров при пополнении выборки.
+
* '''Solution:''' Investigation of the properties of the parameter space when replenishing the sample.
-
* '''Novelty:''' Предложена новая методология прогнозирования объема выборки, обоснованная с точки зрения классической and байесовской статистики.
+
* '''Novelty:''' A new methodology for sample size forecasting is proposed, justified in terms of classical and Bayesian statistics.
-
* '''Authors:''' А.М. Катруца, Strizhov V.V., координатор Tamaz Gadaev
+
* '''Authors:''' A.M. Katrutsa, Strijov V.V., coordinator Tamaz Gadaev
-
 
+
-
===Task 19 ===
+
-
* Name: Исследование зависимости качества распознавания онтологических объектов от глубины гипонимии.
+
-
* Task: Необходимо исследовать зависимость качества распознавания онтологических объектов на различных уровнях гипонимии понятий. Классическая постановка задачи распознавания именованных сущностей: https://en.wikipedia.org/wiki/Named-entity_recognition
+
-
* Data: Гипонимии из https://wordnet.princeton.edu/ , тексты разных доменов предположительно из WebOfScience.
+
-
* References: Релевантные статьи для классической постановки http://arxiv-sanity.com/search?q=named+entity+recognition
+
-
* Basic algorithm: В качестве алгоритма может использоваться https://arxiv.org/pdf/1709.09686.pdf или упрощенная его версия, исследования производятся с использованием библиотеки DeepPavlov.
+
-
* Solution: Необходимо собрать датасет гипонимии (вложенности понятий) объектов с использованием WordNet, произвести автоматическую разметку онтологических объектов текстов различных доменов для нескольких уровней обобщения понятий, провести ряд экспериментов для определения качества распознавания онтологических объектов для разных уровней вложенности.
+
-
* Novelty: Подобные исследования не производились, готовые датасеты с иерархической разметкой объектов отсутствуют. Распознавание онтологических объектов на различных уровнях гипонимии может быть использовано для производства дополнительных признаков при решении различных NLP (Natural language processing) задач, а также определения являются ли объекты парой гипоним-гипероним.
+
-
* Authors: Бурцев Михаил Сергеевич (Expert), Баймурзина Диляра Римовна (consultant).
+
-
=== Task 20 ===
+
===19. 2018===
-
* Name: Comparison of the quality of end-to-end trainable models for context-aware question answering in a dialogue
+
* '''Title:''' Dependence of the recognition quality of ontological objects on the depth of hyponymy.
-
* Task: A text fragment and several consecutive questions are given. The answers to the first n questions are known. An answer to question n+1 must be produced. The answer must be given as a contiguous span of the given text fragment (the indices of its first and last words). For quality evaluation, the task reduces to classifying the characters of the fragment into class 0 (not part of the answer) and class 1 (part of the answer).
+
* '''Problem description:''' It is necessary to investigate the dependence of the quality of recognition of ontological objects at different levels of concept hyponymy. The classic formulation of the problem of named entity recognition: https://en.wikipedia.org/wiki/Named-entity_recognition
-
* Data: A labeled dataset is provided with text fragments and sets of questions with answers in a dialogue.
+
* Data: Hyponyms from https://wordnet.princeton.edu/ , texts from different domains presumably from WebOfScience.
-
* References: The paper Bi-directional Attention Flow for Machine Comprehension (BiDAF2017) describes an end-to-end model for answering questions about a text fragment without dialogue context. The paper QuAC: Question Answering in Context (QuAC2018) describes the dataset and the basic algorithm that takes dialogue context into account. Papers describing other question answering models (R-Net, DrQA).
+
* '''References:''' Relevant articles for the classical problem statement: http://arxiv-sanity.com/search?q=named+entity+recognition
-
* Basic algorithm: The basic algorithm is described in the papers and implemented (QuAC2018, BiDAF2017).
+
* Basic algorithm: The algorithm from https://arxiv.org/pdf/1709.09686.pdf or a simplified version of it can be used; the experiments are run with the DeepPavlov library.
-
* Solution: It is proposed to study context-handling mechanisms (k-ctx, append, etc.) and investigate whether they can be added to other models (DrQA, R-NET), or to propose new ones that improve the F1 measure. Model behavior is studied through attention visualization, visualization of the learned embeddings, and analysis of erroneous answers. Access to computing resources is provided; the frameworks used are TensorFlow, PyTorch, or Keras.
+
* '''Solution:''' It is necessary to collect a dataset of hyponymy (nesting of concepts) of objects using WordNet, to automatically mark up ontological objects of texts of various domains for several levels of generalization of concepts, to conduct a series of experiments to determine the quality of recognition of ontological objects for different levels of nesting.
-
* Novelty: The study is carried out on a new dataset for which only the basic algorithm is currently available. Confirming that dialogue-context mechanisms improve quality in other models would indicate that the proposed approaches apply to a wider range of problems.
+
* '''Novelty:''' Similar studies have not been carried out, and there are no ready-made datasets with hierarchical markup of objects. Recognition of ontological objects at various levels of hyponymy can be used to produce additional features for various NLP (natural language processing) problems, as well as to determine whether two objects form a hyponym-hypernym pair.
-
* Authors: [https://mipt.ru/education/chairs/parallelcomputing/persons/chritankov.php Anton Sergeevich Khritankov]
+
* '''Authors:''' Burtsev Mikhail Sergeevich (Expert), Baimurzina Dilyara Rimovna (consultant).
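The multi-level markup described in the Solution item can be pictured on a toy taxonomy. The sketch below is not tied to WordNet or DeepPavlov: the `HYPERNYM` dictionary, the function names, and the example words are hypothetical and only illustrate how one label can be generalized to several levels of hyponymy.

```python
# Toy hypernym links (child -> parent). Hypothetical data for illustration;
# a real experiment would read these relations from WordNet.
HYPERNYM = {
    "poodle": "dog",
    "dog": "animal",
    "cat": "animal",
    "animal": "entity",
    "oak": "tree",
    "tree": "plant",
    "plant": "entity",
}


def path_to_root(word):
    """Return the chain [root, ..., word] obtained by following hypernym links."""
    chain = [word]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain[::-1]


def label_at_depth(word, depth):
    """Generalize a word to its ancestor at the given depth (the root has depth 0).

    Words that sit above the requested depth keep their own label.
    """
    chain = path_to_root(word)
    return chain[min(depth, len(chain) - 1)]


def relabel(tokens, depth):
    """Tag tokens found in the taxonomy at the given depth; other tokens get "O"."""
    known = set(HYPERNYM) | set(HYPERNYM.values())
    return [label_at_depth(t, depth) if t in known else "O" for t in tokens]
```

Calling `relabel` on the same sentence with increasing `depth` gives one labeled copy per level of generalization, which is the kind of data needed to compare recognition quality across levels.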
-
=== Task 21 ===
+
===21. 2018===
-
* '''Name:''' Методы выпуклой оптимизации высокого порядка
+
* '''Title:''' High order convex optimization methods
-
* '''Task:''' Для выпуклых задач не очень больших размерностей эффективно (до n ~ 10^3 иногда даже до n ~ 10^4) применяются методы высокого порядка. До недавнего времени принято было считать, что это методы второго порядка (использующие вторые производные оптимизируемой функции). Однако в начале 2018 года Ю.Е. Нестеров [1] предложил в теории эффективный метод третьего порядка, который работает почти по оптимальным оценкам. В пособии [3] в упражнении 1.3 описан пример "плохой" выпуклой функции, предложенной Ю.Е. Нестеровым, на котором хотелось бы сравнить метод Нестерова второго and третьего порядка [1], метод из работы [2] второго and третьего порядка and обычные быстрые градиентные методы (первого порядка). Сравнивать стоит как по числу итераций, так and по общему времени работы.
+
* '''Problem description:''' High-order methods are effective for convex problems of moderate dimension (up to n ~ 10^3, sometimes even up to n ~ 10^4). Until recently, "high-order" was generally taken to mean second-order methods (those using the second derivatives of the objective function). However, at the beginning of 2018 Yu.E. Nesterov [1] proposed a third-order method that is efficient in theory and works with nearly optimal estimates. Exercise 1.3 of the textbook [3] describes a "bad" convex function proposed by Yu.E. Nesterov. On this function we would like to compare the second- and third-order Nesterov methods [1], the second- and third-order methods from [2], and the usual fast gradient methods (first order). The comparison should cover both the number of iterations and the total running time.
* '''References:'''
# https://alfresco.uclouvain.be/alfresco/service/guest/streamDownload/workspace/SpacesStore/aabc2323-0bc1-40d4-9653-1c29971e7bd8/coredp2018_05web.pdf?guest=true
-
# https://arxiv.org/pdf/1809.00382.pdf
+
# https://arxiv.org/pdf/1809.00382.pdf
-
# https://arxiv.org/pdf/1711.00394.pdf
+
# https://arxiv.org/pdf/1711.00394.pdf
-
* '''Author:''' Евгения Алексеевна Воронцова (доцент ДВФУ, Владивосток), Александр Владимирович Гасников
+
* '''Author:''' Evgenia Alekseevna Vorontsova (Associate Professor of Far Eastern Federal University, Vladivostok), Alexander Vladimirovich Gasnikov
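The first-order baseline of this comparison can be sketched as follows. The test function is an assumption: the code uses the classical worst-case quadratic from Nesterov's lectures, f(x) = (L/4)(x^T A x / 2 - x_1) with A = tridiag(-1, 2, -1), which may differ from the function in exercise 1.3 of [3]. The function names, the step size 1/L, and the momentum weight (k-1)/(k+2) are illustrative choices, and the second- and third-order methods of [1], [2] are not implemented here.

```python
def objective(x, L=1.0):
    """f(x) = (L/4) * (0.5 * x^T A x - x[0]) with A = tridiag(-1, 2, -1)."""
    n = len(x)
    quad = 0.0
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        quad += x[i] * (2.0 * x[i] - left - right)
    return L / 4.0 * (0.5 * quad - x[0])


def gradient(x, L=1.0):
    """Gradient of the objective: (L/4) * (A x - e_1)."""
    n = len(x)
    g = []
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        shift = 1.0 if i == 0 else 0.0
        g.append(L / 4.0 * (2.0 * x[i] - left - right - shift))
    return g


def gradient_descent(n, steps, L=1.0):
    """Plain gradient descent from the origin with step 1/L."""
    x = [0.0] * n
    for _ in range(steps):
        g = gradient(x, L)
        x = [xi - gi / L for xi, gi in zip(x, g)]
    return x


def fast_gradient(n, steps, L=1.0):
    """Accelerated gradient method: gradient step at y, then extrapolation."""
    x = [0.0] * n
    y = list(x)
    for k in range(1, steps + 1):
        g = gradient(y, L)
        x_new = [yi - gi / L for yi, gi in zip(y, g)]
        beta = (k - 1) / (k + 2)  # momentum weight
        y = [xn + beta * (xn - xo) for xn, xo in zip(x_new, x)]
        x = x_new
    return x
```

Evaluating `objective` on the outputs of both routines for the same `steps` gives the iteration-count comparison; wrapping the calls in a timer gives the running-time comparison requested in the problem description.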
-
=== Task 22 ===
+
===22. 2018===
-
* '''Name:''' Cutting plane methods for copositive optimization
+
* '''Title:''' Cutting plane methods for copositive optimization
-
* '''Task''': Conic program over the copositive cone (copositive program) min <C,X> : <A_i,X> = b_i, X \in \Pi_i C^k_i, k_i <= 5 A linear function is minimized over the intersection of an affine subspace with a product of copositive cones of orders k_i <= 5. [[Media:Problems.pdf|More details here]]
+
* '''Problem:''' Conic program over the copositive cone (copositive program): min <C,X> subject to <A_i,X> = b_i, X \in \Pi_i C^{k_i}, k_i <= 5. A linear function is minimized over the intersection of an affine subspace with a product of copositive cones of orders k_i <= 5.
* '''Data:''' The algorithm will be tested on randomly generated instances
* '''References:'''
-
** [1] Peter J. C. Dickinson, Mirjam Dür, Luuk Gijben, Roland Hildebrand. Scaling relationship between the copositive cone and Parrilo’s first level approximation. Optim. Lett. 7(8), 1669—1679, 2013.
+
*# [1] Peter J. C. Dickinson, Mirjam Dür, Luuk Gijben, Roland Hildebrand. Scaling relationship between the copositive cone and Parrilo’s first level approximation. Optim. Lett. 7(8), 1669—1679, 2013.
-
** [2] Stefan Bundfuss, Mirjam Dür. Algorithmic copositivity detection by simplicial partition. Linear Alg. Appl. 428, 1511—1523, 2008.
+
*# [2] Stefan Bundfuss, Mirjam Dür. Algorithmic copositivity detection by simplicial partition. Linear Alg. Appl. 428, 1511—1523, 2008.
-
** [3] Mirjam Dür. Copositive programming — a Survey. In Recent advances in Optimization and its Applications in Engineering, Springer, pp. 3-20, 2010.
+
*# [3] Mirjam Dür. Copositive programming — a Survey. In Recent advances in Optimization and its Applications in Engineering, Springer, pp. 3-20, 2010.
-
* '''Basic algorithm:''' The reference algorithm is described in [4] Stefan Bundfuss, Mirjam Dür. An Adaptive Linear Approximation Algorithm for Copositive Programs. SIAM J. Optim., 20(1), 30-53, 2009.
+
* '''Base algorithm:''' The reference algorithm is described in [4] Stefan Bundfuss, Mirjam Dür. An Adaptive Linear Approximation Algorithm for Copositive Programs. SIAM J. Optim., 20(1), 30-53, 2009.
* '''Solution:''' The copositive program will be solved by a cutting plane algorithm. The cutting plane (in the case of an infeasible iterate) will be constructed from the semidefinite representation of the diagonal 1 section of the cone proposed in [1]. The algorithm will be compared to a simplicial division method proposed in [2], [4]. General information about copositive programs and their applications in optimization can be found in [3].
* '''Novelty:''' The proposed algorithm for optimization over copositive cones up to order 5 uses an exact semi-definite representation. In contrast to all other algorithms existing today the generation of cutting planes is non-iterative.
-
* '''Автор''': [http://www-ljk.imag.fr/membres/Roland.Hildebrand/ Roland Hildebrand]
+
* '''Author''': [http://www-ljk.imag.fr/membres/Roland.Hildebrand/ Roland Hildebrand]
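To make the notion of a cutting plane for the copositive cone concrete, here is a minimal sketch of a much simpler, point-based cut. It is not the cut built from the semidefinite representation of [1]: it only uses the definition of copositivity (x^T X x >= 0 for every x >= 0), and the candidate set of 0/1 vectors and all function names are hypothetical. If a candidate x >= 0 gives x^T X x < 0, then the linear inequality <x x^T, Y> >= 0 holds for every copositive Y and is violated by X.

```python
from itertools import product


def quad_form(X, x):
    """Return x^T X x for a square matrix X given as a list of rows."""
    n = len(x)
    return sum(X[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def separating_cut(X, candidates):
    """Search the candidate points for a violated copositivity inequality.

    Returns (x, C) with C = x x^T for the first candidate x >= 0 such that
    x^T X x < 0, or None if no candidate certifies a violation.
    """
    for x in candidates:
        if quad_form(X, x) < 0:
            C = [[xi * xj for xj in x] for xi in x]
            return x, C
    return None


def binary_candidates(n):
    """All nonzero 0/1 vectors of length n (an assumed, very coarse candidate set)."""
    return [v for v in product((0, 1), repeat=n) if any(v)]
```

A `None` result only means that no candidate found a violation; it does not prove copositivity. This is why the task relies on an exact semidefinite representation for orders up to 5.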
-
=== Task 23 ===
+
===23. 2018===
-
* '''Name:''' Фрактальный анализ and синтез оптических изображений морского волнения
+
* '''Title:''' Fractal analysis and synthesis of optical images of sea waves
-
* '''Task:''' Разнообразные физические процессы and явления изучаются с помощью изображений, получаемых дистанционно. Важной задачей является получение адекватной информации об интересующих процессах and явлениях путём измерения определённых характеристик изображений. Линии равной яркости (изолинии) на изображениях многих природных объектов являются фрактальными, то есть представляют собой множества точек, которые не могут быть представлены линиями конечной длины and занимают промежуточное положение между линиями and двумерными плоскими фигурами. Такие множества характеризуются фрактальной размерностью D, которая обобщает классическое понятие размерности множества and может принимать дробные значения. Для уединённой точки на изображении D=0, для гладкой кривой D=1, для плоской фигуры D=2. Фрактальная изолиния имеет размерность 1<D<2. Алгоритм расчёта D приведён, например, в [1]. Фрактальная размерность изолиний морской поверхности, может служить для оценки пространственных спектров морских волн по данным дистанционного зондирования [1]. Task состоит в следующем. Необходимо провести исследование численными методами зависимости между характеристиками пространственных спектров морских волн and фрактальной размерностью спутниковых изображений Земли в области солнечного блика. Для исследования следует использовать метод численного синтеза оптических изображений морского волнения, описанный в [2]. Численное моделирование должно быть при различных характеристиках морских волн, а также при различных положениях Солнца and пространственном разрешении изображений.
+
* '''Problem description:''' A variety of physical processes and phenomena are studied with the help of remotely obtained images. An important problem is to obtain adequate information about the processes and phenomena of interest by measuring certain image characteristics. Lines of equal brightness (isolines) in images of many natural objects are fractal, that is, they are sets of points that cannot be represented by lines of finite length and occupy an intermediate position between lines and two-dimensional flat figures. Such sets are characterized by the fractal dimension D, which generalizes the classical concept of the dimension of a set and can take fractional values. For an isolated point in the image D=0, for a smooth curve D=1, for a flat figure D=2. A fractal isoline has dimension 1<D<2. An algorithm for calculating D is given, for example, in [1]. The fractal dimension of sea surface isolines can be used to estimate the spatial spectra of sea waves from remote sensing data [1]. The problem is as follows. It is necessary to study numerically the relationship between the characteristics of the spatial spectra of sea waves and the fractal dimension of satellite images of the Earth in the sun glint region. The study should use the method of numerical synthesis of optical images of sea waves described in [2]. The numerical modeling should be carried out for different characteristics of sea waves, as well as for different positions of the Sun and different spatial resolutions of the images.
* '''References:'''
-
*# Лупян Е. А., Мурынин А. Б. Возможности фрактального анализа оптических изображений морской поверхности. // Препринт Института Космических исследований АН СССР Пр.-1521, Москва, 1989, 30 с.
+
*# Lupyan E. A., Murynin A. B. Possibilities of fractal analysis of optical images of the sea surface. // Preprint of the Space Research Institute of the Academy of Sciences of the USSR Pr.-1521, Moscow, 1989, 30 p.
-
*# Мурынин А. Б. Восстановление пространственных спектров морской поверхности по оптическим изображениям в нелинейной модели поля яркости // Исследования Земли из космоса, 1990. № 6. С. 60-70.
+
*# Murynin A. B. Reconstruction of the spatial spectra of the sea surface from optical images in a nonlinear model of the brightness field // Research of the Earth from Space, 1990. No. 6. P. 60-70.
-
* '''Author:''' Иван Алексеевич Матвеев
+
* '''Author:''' Ivan Alekseevich Matveev
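The reference values D=0, D=1 and D=2 quoted in the problem description can be checked with a box-counting estimate. This is a minimal sketch under assumptions: box counting is one standard estimator of fractal dimension and is not necessarily the algorithm of [1]; the function names and the choice of box sizes are illustrative.

```python
import math


def box_count(points, size):
    """Number of grid boxes of side `size` that contain at least one point."""
    return len({(math.floor(x / size), math.floor(y / size)) for x, y in points})


def box_counting_dimension(points, sizes):
    """Least-squares slope of log N(size) against log(1 / size)."""
    xs = [math.log(1.0 / s) for s in sizes]
    ys = [math.log(box_count(points, s)) for s in sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den
```

For an isoline extracted from a synthesized sea-surface image, `points` would be the pixel coordinates of the isoline; the estimate is meaningful only over the range of box sizes between the pixel size and the image size.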
-
=== Task 24 ===
+
===24. 2018===
-
* '''Название''' Максимизация энтропии при различных видах преобразований над изображением
+
* '''Title:''' Entropy maximization for various types of image transformations
-
* '''Task:''' Паншарпенинг — это алгоритм повышения разрешения мультиспектральных изображений с использованием опорного изображения. Task паншарпенинга формулируется следующим образом: имея панхроматическое изображение требуемого разрешения and мультиспектральное изображение пониженного разрешения, требуется восстановить мультиспектральное изображение в пространственном разрешении панхроматического. Из эмпирических наблюдений, основанных на большом количестве снимков высокого разрешения, известно, что пространственная вариативность интенсивности отраженного излучения для объектов одной природы гораздо больше, чем вариативность их спектра. Другими словами, можно наблюдать, что спектр отраженного излучения однороден в границах одного объекта, в то время как даже внутри одного объекта интенсивность отраженного излучения варьируется. На практике хороших результатов можно достигнуть, используя упрощенный подход, при котором считается, что если интенсивность соседних областей значительно отличается, то, вероятно, эти области принадлежат разным объектам с разными отраженными спектрами. На этом основан разработанный вероятностный алгоритм повышения разрешения мультиспектральных изображений с использованием опорного изображения [1]
+
* '''Problem description:''' Pansharpening is an algorithm for increasing the resolution of multispectral images using a reference image. The pansharpening problem is formulated as follows: given a panchromatic image of the required resolution and a multispectral image of lower resolution, restore the multispectral image at the spatial resolution of the panchromatic one. Empirical observations based on a large number of high-resolution images show that the spatial variability of the reflected radiation intensity for objects of the same nature is much greater than the variability of their spectrum. In other words, the spectrum of reflected radiation is homogeneous within the boundaries of one object, while the intensity of reflected radiation varies even within one object. In practice, good results can be achieved with a simplified approach: if the intensities of neighboring regions differ significantly, these regions probably belong to different objects with different reflected spectra. This is the basis of the probabilistic algorithm developed for increasing the resolution of multispectral images using a reference image [1].
-
* '''Необходимо''' провести исследование по максимизации энтропии при различных видах преобразований над изображением. Показать, что энтропия может служить индикатором потерь информации, содержащейся в изображении, при преобразованиях над ним. Формулировка обратной задачи по восстановлению изображения: Условие 1: Соответствие интенсивности (в каждой точке) восстановленного изображения интенсивности панхромного изображения. Условие 2: Соответствие низкочастотной составляющей восстановленного изображения исходному мультиспектральному изображению. Условие 3: Однородность (подобность) спектра в пределах одного объекта and допущение скачкообразного изменения спектра на границе двух однородных областей. Условие 4: При соблюдении первых трех условий, локальная энтропия восстановленного изображения должна быть максимизирована.
+
* '''It is necessary''' to study entropy maximization under various types of image transformations, and to show that entropy can serve as an indicator of the loss of information contained in an image when transformations are applied to it. The inverse problem of image restoration is formulated through four conditions:
*# The intensity of the restored image matches the intensity of the panchromatic image at each point.
*# The low-frequency component of the restored image matches the original multispectral image.
*# The spectrum is homogeneous (similar) within one object, and an abrupt change of the spectrum is allowed at the border between two homogeneous regions.
*# Subject to the first three conditions, the local entropy of the restored image is maximized.
* '''References:'''
-
*# Гороховский К. Ю., Игнатьев В. Ю., Мурынин А. Б., Ракова К. О. Поиск оптимальных параметров вероятностного алгоритма повышения пространственного разрешения мультиспектральных спутниковых изображений // Известия РАН. Теория and системы управления, 2017, № 6.
+
*# Gorohovsky K. Yu., Ignatiev V. Yu., Murynin A. B., Rakova K. O. Search for optimal parameters of a probabilistic algorithm for increasing the spatial resolution of multispectral satellite images // Izvestiya RAN. Theory and control systems, 2017, No. 6.
-
* '''Author:''' Иван Алексеевич Матвеев
+
* '''Author:''' Ivan Alekseevich Matveev
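The claim that entropy can indicate information loss under a transformation can be illustrated on a toy example. The sketch assumes the Shannon entropy of the gray-level histogram of the whole image, which is simpler than the local entropy used in the conditions of the task; the quantization transform and the function names are illustrative.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy (in bits) of the empirical distribution of pixel values."""
    counts = Counter(pixels)
    total = len(pixels)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def quantize(pixels, step):
    """A lossy transformation: round every value down to a multiple of `step`."""
    return [(p // step) * step for p in pixels]
```

On an image that uses all 256 gray levels equally often, quantizing to 16 levels lowers the entropy from 8 bits to 4 bits, so the drop in entropy measures how much detail the transformation discarded.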
-
=== Task 25 ===
+
===25. 2018===
-
* '''Name:''' Автоматическое детектирование and распознавание объектов на изображениях
+
* '''Title:''' Automatic detection and recognition of objects in images
-
* '''Task:''' Автоматическое детектирование and распознавание объектов на изображениях and видео является одной из основных задач компьютерного зрения. Как правило, эти задачи разбиваются на несколько подзадач: предобработка, выделение характерных свойств изображения объекта and классификация. Этап предобработки обычно включает некоторые операции с изображением, такие как фильтрация, выравнивание яркости, геометрические корректирующие преобразования для облегчения устойчивого выделения признаков.
+
* '''Problem description:''' Automatic detection and recognition of objects in images and videos is one of the main problems of computer vision. As a rule, it is divided into several subproblems: preprocessing, extraction of characteristic properties of the object image, and classification. The preprocessing stage usually includes image operations such as filtering, brightness equalization, and geometric corrective transformations that facilitate robust feature extraction.
-
Под характерными свойствами изображения объекта понимается некоторый набор признаков, приближённо описывающий интересующий объект. Признаки можно разбить на два класса: локальные and интегральные. Преимуществом локальных признаков является их универсальность, инвариантность по отношению к неравномерным изменениям яркости and освещённости, но они не уникальны. Интегральные признаки, характеризующие изображение объекта в целом, не устойчивы к изменению структуры объекта and сложным условиям освещения. Существует комбинированный подход — использование локальных признаков в качестве элементов интегрального описания, когда искомый объект моделируется набором областей, каждая из которых характеризуется своим набором признаков — локальным текстурным дескриптором. Совокупность таких дескрипторов характеризует объект в целом.
+
The characteristic properties of an object image are a set of features that approximately describe the object of interest. Features can be divided into two classes: local and integral. Local features are universal and invariant to non-uniform changes in brightness and illumination, but they are not unique. Integral features, which characterize the image of the object as a whole, are not robust to changes in the structure of the object or to difficult lighting conditions. A combined approach uses local features as elements of an integral description: the object of interest is modeled by a set of regions, each characterized by its own set of features, a local texture descriptor. The totality of such descriptors characterizes the object as a whole.
-
Под классификацией понимают определение принадлежности объекта к тому или иному классу путём анализа вектора признаков, полученного на предыдущем этапе, разделения признакового пространства на подобласти, указывающие на соответствующий класс. Существует множество подходов к классификации: нейросетевые, статистические (Байеса, регрессия, Фишера and др.), решающие деревья and леса, метрические (ближайшие К-соседей, парзеновские окна и&nbsp;т.&nbsp;д.) and ядерные (SVM, RBF, метод потенциальных функций), композиционные (AdaBoost). Для задачи обнаружения объекта на изображении оценивается принадлежность двум классам — классу изображений, содержащих объект, and классу изображений, не содержащих объект (изображениям фона).
+
Classification means determining whether an object belongs to a particular class by analyzing the feature vector obtained at the previous stage and dividing the feature space into subregions that correspond to the classes. There are many approaches to classification: neural network, statistical (Bayes, regression, Fisher, etc.), decision trees and forests, metric (k-nearest neighbors, Parzen windows, etc.), kernel (SVM, RBF, the method of potential functions), and compositional (AdaBoost). For the problem of detecting an object in an image, membership in two classes is evaluated: the class of images containing the object and the class of images that do not contain it (background images).
-
* [[Media:ThemesIS2018Video.pdf|References: and более подробно тут]]
+
* [[Media:ThemesIS2018Video.pdf| References and more details here]]
-
* '''Author:''' Иван Алексеевич Матвеев
+
* '''Author:''' Ivan Alekseevich Matveev
-
 
+
===29. 2018===
-
=== Task 29 ===
+
* Name: Cross-Language Document Extractive Summarization with Neural Sequence Model.
-
* Task: Предлагается решить задачу переноса обучения для модели сокращения текста выделением предложением (extractive summarization) and исследовать зависимость качества сокращения текста от качества обучения модели перевода. Имея данные для обучения модели сокращения на английском языке and параллельный англо-русский корпус текстов построить модель для сокращения текста на русском языке. Решение задачи оценивается на небольшом наборе данных для тестирования модели на русском языке, качество решения задачи определяется отношением значений критериев ROUGE на английском and русском наборах.
+
* '''Problem description:''' It is proposed to solve a transfer learning problem for an extractive summarization model and to investigate how summarization quality depends on the quality of the trained translation model. Given training data for the summarization model in English and a parallel English-Russian text corpus, build a model for summarizing text in Russian. The solution is evaluated on a small Russian test set; its quality is measured by the ratio of the ROUGE scores on the English and Russian sets.
-
* Data: Данные для обучения модели на английском языке (SummaRuNNer2016), параллельный корпус OPUS, данные для проверки на русском языке.
+
* Data: Training data for the model in English (SummaRuNNer2016), the OPUS parallel corpus, and evaluation data in Russian.
-
* References: В статье (SummaRuNNer2016) дается описание базового алгоритма сокращения текста, в работе Neural machine translation by jointly learning to align and translate.(NMT2016) дается описание модели перевода. Идея совместного использования моделей представлена в статье Cross-Language Document Summarization Based on Machine Translation Quality Prediction (CrossSum2010).
+
* '''References:''' The article (SummaRuNNer2016) describes the basic summarization algorithm; the paper Neural machine translation by jointly learning to align and translate (NMT2016) describes the translation model. The idea of using the two models together is presented in Cross-Language Document Summarization Based on Machine Translation Quality Prediction (CrossSum2010).
-
* Basic algorithm: Одна из идей базового алгоритма представлена в (CrossSum2010), модель перевода реализована (OpenNMT), предоставляется реализация модели сокращения текста (SummaRuNNer2016).
+
* Basic algorithm: One idea for the basic algorithm is presented in (CrossSum2010); the translation model is implemented (OpenNMT), and an implementation of the summarization model is provided (SummaRuNNer2016).
-
* Solution: Предлагается исследовать идею решения, предложенную в статье (CrossSum2010) and варианты объединения моделей сокращения and перевода. Базовые модели and предобработка наборов данных реализованы (OpenNMT), библиотеки PyTorch and Tensorflow. Анализ ошибок по сокращению текста производится, как описано в (SummaRuNNer2016), анализ качества обучения моделей стандартными инструментами библиотек, .
+
* '''Solution:''' It is proposed to explore the solution idea from the article (CrossSum2010) and the options for combining the summarization and translation models. Base models and dataset preprocessing are implemented (OpenNMT) with the PyTorch and TensorFlow libraries. Summarization errors are analyzed as described in (SummaRuNNer2016); the quality of model training is analyzed with the standard tools of the libraries.
-
* Novelty: Для базовой модели применимость исследована на паре наборов данных, подтверждение возможности переноса обучения на набор данных на другом языке and указание условий для этого переноса расширит область применения модели and укажет необходимые новые доработки модели или предобработки данных.
+
* '''Novelty:''' For the base model, applicability has been investigated on a couple of datasets. Confirming that training can be transferred to a dataset in another language, and specifying the conditions for this transfer, will expand the scope of the model and indicate the necessary refinements of the model or of data preprocessing.
-
* Authors: Алексей Романов (consultant), Anton Khritankov (Expert).
+
* '''Authors:''' Alexey Romanov (consultant), Anton Khritankov (Expert).
-
=== Task 30 ===
+
===30. 2018===
-
* Name: Метод построения HG-LBP дескриптора на основе гистограмм градиентов для детектирования пешеходов.
+
* Title: Method for constructing an HG-LBP descriptor based on gradient histograms for pedestrian detection.
-
* Task: Предлагается разработать новый дескриптор, обобщающий LBP дескриптор на основе гистограмм модулей градиентов, имеющий свойства композиции HOG-LBP для задачи детектирования пешеходов на изображении. В качестве анализа качества нового дескриптора предлагается использовать графики ошибок детектирования FAR/FRR на базе INRIA.
+
* '''Problem description:''' It is proposed to develop a new descriptor that generalizes the LBP descriptor using histograms of gradient magnitudes and has the properties of the HOG-LBP composition, for the problem of detecting pedestrians in an image. To assess the quality of the new descriptor, it is proposed to use FAR/FRR detection error plots on the INRIA dataset.
-
* Data: База данных пешеходов INRIA: http://pascal.inrialpes.fr/data/human/
+
* Data: INRIA pedestrian database: http://pascal.inrialpes.fr/data/human/
-
* References:
+
* '''References:'''
-
*# 1. T. Ojala and M. Pietikainen. Multiresolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns, IEEE Trans on Pattern Analysis and Machine Intelligence, Vol. 24. No.7, July, 2002.
+
*# T. Ojala and M. Pietikainen. Multiresolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns, IEEE Trans on Pattern Analysis and Machine Intelligence, Vol. 24. No. 7, July, 2002.
-
*# 2. T. Bouwmans, C. Silva, C. Marghes, M. Zitouni, H. Bhaskar, C. Frelicot,, «On the Role and the Importance of Features for Background Modeling and Foreground Detection», https://arxiv.org/pdf/1611.09099v1.pdf
+
*# T. Bouwmans, C. Silva, C. Marghes, M. Zitouni, H. Bhaskar, C. Frelicot, "On the Role and the Importance of Features for Background Modeling and Foreground Detection", https://arxiv.org/pdf/1611.09099v1.pdf
-
*# 3. N. Dalal and B. Triggs, Histograms of Oriented Gradients for Human Detection // Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2005, pp.886-893
+
*# N. Dalal and B. Triggs, Histograms of Oriented Gradients for Human Detection, Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2005, pp.886-893
-
*# 4. T. Ahonen, A. Hadid, M. Pietikainen Face Description with Local Binary Patterns: Application to Face Recognition \\ IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume:28 , Issue: 121.
+
*# T. Ahonen, A. Hadid, M. Pietikainen. Face Description with Local Binary Patterns: Application to Face Recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 28, Issue 12.
-
*# 5. http://www.magicandlove.com/blog/2011/08/26/people-detection-in-opencv-again/
+
*# http://www.magicandlove.com/blog/2011/08/26/people-detection-in-opencv-again/
-
*# 6. http://www.cse.oulu.fi/CMV/Downloads/LBPMatlab2.
+
*# http://www.cse.oulu.fi/CMV/Downloads/LBPMatlab2
-
*# 7. http://www.mathworks.com/help/vision/ref/extractlbpfeatures.html3.
+
*# http://www.mathworks.com/help/vision/ref/extractlbpfeatures.html3.
-
*# 8. http://www.codeproject.com/Articles/741559/Uniform-LBP-Features-and-Spatial-Histogram-Computa4.
+
*# http://www.codeproject.com/Articles/741559/Uniform-LBP-Features-and-Spatial-Histogram-Computa4.
-
*# 9. http://www.cse.oulu.fi/CMV/Research
+
*# http://www.cse.oulu.fi/CMV/Research
* Basic algorithm: Xiaoyu Wang, Tony X. Han, Shuicheng Yan. An HOG-LBP Human Detector with Partial Occlusion Handling, ICCV 2009
* Basic algorithm: Xiaoyu Wang, Tony X. Han, Shuicheng Yan. An HOG-LBP Human Detector with Partial Occlusion Handling, ICCV 2009
-
* Solution: Одним из вариантов обобщения LBP может быть использование вместо гистограмм распределения точек по LBP-коду, гистограмм распределения модулей градиентов точек в блоке по LBP-коду (HG-LBP). Предлагается для основы экспериментов использовать библиотеку OpenCV, в которой реализованы алгоритмы HOG and LBP. Необходимо модифицировать исходный код реализации LBP and вставить подсчет модулей градиента and накопление соответствующей гистограммы по LBP. Необходимо написать программу чтения базы INRIA, обучения по ней метода линейного SVM на исходных and модифицированных дескрипторах, сбора статистики детектирования and построения DET-графиков FAR/FRR.
+
* '''Solution:''' One way to generalize LBP is to replace the histograms of the distribution of pixels over LBP codes with histograms of the distribution of pixel gradient magnitudes in a block over LBP codes (HG-LBP). It is proposed to base the experiments on the OpenCV library, which implements the HOG and LBP algorithms. The source code of the LBP implementation must be modified to compute gradient magnitudes and accumulate the corresponding histogram over LBP codes. It is also necessary to write a program that reads the INRIA dataset, trains a linear SVM on the original and modified descriptors, collects detection statistics, and builds FAR/FRR DET plots.
-
* Novelty: Разработка вычислительно простых методов для выделения максимально информативных признаков в Taskх распознавания является актуальной в области создания встроенных систем, обладающих малыми вычислительными ресурсами. Замена композиции дескрипторов одним, более информативным, чем каждый по отдельности может упростить решение задачи. Использование значений градиента в гистограммах дескриптора LPB является новым.
+
* '''Novelty:''' The development of computationally simple methods for extracting the most informative features in recognition problems is relevant to building embedded systems with limited computing resources. Replacing a composition of descriptors with a single descriptor that is more informative than each of them individually can simplify the solution of the problem. The use of gradient values in LBP descriptor histograms is new.
-
* Authors: Гнеушев Александр Николаевич
+
* '''Authors:''' Gneushev Alexander Nikolaevich
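The HG-LBP idea described in the Solution item above can be illustrated with a minimal pure-Python sketch. It is not the OpenCV implementation the task refers to: the function names, the basic 8-neighbour LBP variant, the central-difference gradient and the toy 4x4 block are all assumptions made for illustration. The only difference between the two histograms is what is accumulated per LBP code: a pixel count for classic LBP, and the gradient magnitude for HG-LBP.

```python
import math

# Offsets of the 8 neighbours, clockwise from the top-left pixel (assumed ordering).
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img, r, c):
    """Basic 8-neighbour LBP code (0..255) of the pixel at (r, c)."""
    center = img[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(NEIGHBOURS):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def gradient_magnitude(img, r, c):
    """Gradient magnitude from central differences (an illustrative choice)."""
    gx = (img[r][c + 1] - img[r][c - 1]) / 2.0
    gy = (img[r + 1][c] - img[r - 1][c]) / 2.0
    return math.hypot(gx, gy)


def block_histograms(img):
    """Return (lbp_hist, hg_lbp_hist) over the interior pixels of one block."""
    lbp_hist = [0.0] * 256
    hg_lbp_hist = [0.0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            code = lbp_code(img, r, c)
            lbp_hist[code] += 1.0                               # classic LBP: count pixels
            hg_lbp_hist[code] += gradient_magnitude(img, r, c)  # HG-LBP: sum magnitudes
    return lbp_hist, hg_lbp_hist


# Toy grayscale block (hypothetical data).
block = [
    [10, 10, 10, 10],
    [10, 20, 30, 10],
    [10, 40, 50, 10],
    [10, 10, 10, 10],
]
lbp_hist, hg_lbp_hist = block_histograms(block)
```

Both histograms have the same 256 bins, so a linear SVM can be trained on either one without changing the rest of the pipeline.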
-
=== Task 31 ===
+
===31. 2018===
-
* Name: Использование HOG дескриптора для обучения нейронной сети в задаче детектирования пешеходов
+
* Name: Using the HOG descriptor to train a neural network in the pedestrian detection problem
-
* Task: Предлагается заменить линейный SVM классификатор в классическом алгоритме HOG простой сверточной нейронной сетью небольшой глубины, при этом HOG дескриптор должен представляться трехмерным тензором, сохраняющим пространственную структуру локальных блоков. В качестве анализа качества нового дескриптора предлагается использовать графики ошибок детектирования FAR/FRR на базе INRIA.
+
* '''Problem description:''' It is proposed to replace the linear SVM classifier in the classical HOG algorithm with a simple convolutional neural network of small depth; the HOG descriptor should be represented as a three-dimensional tensor that preserves the spatial structure of local blocks. To assess the quality of the new descriptor, it is proposed to use FAR/FRR detection error plots on the INRIA dataset.
-
* Data: База данных пешеходов INRIA: http://pascal.inrialpes.fr/data/human/
+
* Data: INRIA pedestrian database: http://pascal.inrialpes.fr/data/human/
-
* References:
+
* '''References:'''
-
*# 1. N. Dalal and B. Triggs, Histograms of Oriented Gradients for Human Detection // Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2005, pp.886-893
+
*# 1. N. Dalal and B. Triggs, Histograms of Oriented Gradients for Human Detection, Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2005, pp.886-893
-
*# 3. Q. Zhu, S. Avidan, M.-C. Yeh, and K.-T. Cheng. Fast human detection using a cascade of histograms of oriented gradients. In CVPR, pages 1491—1498, 2006 O. Tuzel, F. Porikli, and P. Meer. Human detection via classification on riemannian manifolds. In CVPR, 2007
+
*# 3. Q. Zhu, S. Avidan, M.-C. Yeh, and K.-T. Cheng. Fast human detection using a cascade of histograms of oriented gradients. In CVPR, pages 1491-1498, 2006 O. Tuzel, F. Porikli, and P. Meer. Human detection via classification on riemannian manifolds. In CVPR, 2007
-
*# 4. P. Dollar, C. Wojek, B. Schiele and P. Perona Pedestrian Detection: An Evaluation of the State of the Art / IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), Vol 34. Issue 4, pp. 743—761
+
*# 4. P. Dollar, C. Wojek, B. Schiele and P. Perona Pedestrian Detection: An Evaluation of the State of the Art / IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), Vol 34. Issue 4, pp. 743-761
*# 5. Xiaoyu Wang, Tony X. Han, Shuicheng Yan, An HOG-LBP Human Detector with Partial Occlusion Handling, ICCV 2009 http://www.xiaoyumu.com/s/PDF/Wang_HOG_LBP.pdf
*# 5. Xiaoyu Wang, Tony X. Han, Shuicheng Yan, An HOG-LBP Human Detector with Partial Occlusion Handling, ICCV 2009 http://www.xiaoyumu.com/s/PDF/Wang_HOG_LBP.pdf
*# 6. https://en.wikipedia.org/wiki/Pedestrian_detection
*# 6. https://en.wikipedia.org/wiki/Pedestrian_detection
Line 1976: Line 2539:
*# 9. People Detection in OpenCV http://www.magicandlove.com/blog/2011/08/26/people-detection-in-opencv-again/
*# 9. People Detection in OpenCV http://www.magicandlove.com/blog/2011/08/26/people-detection-in-opencv-again/
*# 10. Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
*# 10. Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
-
* Basic algorithm:
+
*Basic algorithm:
-
*# 1. N. Dalal and B. Triggs, Histograms of Oriented Gradients for Human Detection // Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2005, pp.886-893
+
*# 1. N. Dalal and B. Triggs, Histograms of Oriented Gradients for Human Detection, Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2005, pp.886-893
*# 2. Xiaoyu Wang, Tony X. Han, Shuicheng Yan, An HOG-LBP Human Detector with Partial Occlusion Handling, ICCV 2009
*# 2. Xiaoyu Wang, Tony X. Han, Shuicheng Yan, An HOG-LBP Human Detector with Partial Occlusion Handling, ICCV 2009
-
* Solution: Одним из вариантов обобщения алгоритма HOG может быть использование вместо линейного алгоритма SVM другого классификатора, например какой-либо нейронной сети. Предлагается для основы экспериментов использовать библиотеку OpenCV, в которой реализован алгоритм HOG and классификатор SVM. Нужно проанализировать исходный код реализации HOG, формализовать внутреннюю структуру вектора HOG дескриптора в форме трехмерного тензора — две пространственные and одна спектральная размерности. Необходимо написать программу чтения базы INRIA, обучения по ней метода линейного SVM на HOG-дескрипторах, сбора статистики детектирования and построения DET-графиков FAR/FRR. Необходимо на основе какой-либо системы обучения нейросети (например, mxnet) собрать неглубокую (не более 2-3 сверточных слоев) сверточную нейросеть известной архитектуры, обучить ее на базе INRIA and на тензорных дескрипторах HOG, построить соответствующие графики FAR/FRR.
+
* '''Solution:''' One way to generalize the HOG algorithm is to replace the linear SVM with another classifier, for example a neural network. It is proposed to base the experiments on the OpenCV library, which implements the HOG algorithm and the SVM classifier. It is necessary to analyze the source code of the HOG implementation and formalize the internal structure of the HOG descriptor vector as a three-dimensional tensor with two spatial dimensions and one spectral dimension. It is necessary to write a program that reads the INRIA dataset, trains a linear SVM on HOG descriptors, collects detection statistics, and builds FAR/FRR DET plots. Using some neural network training framework (for example, mxnet), it is necessary to assemble a shallow convolutional neural network (no more than 2-3 convolutional layers) of a known architecture, train it on the INRIA dataset using HOG tensor descriptors, and build the corresponding FAR/FRR plots.
-
* Novelty: Разработка вычислительно простых методов для выделения максимально информативных признаков в Taskх распознавания является актуальной в области создания встроенных систем, обладающих малыми вычислительными ресурсами. Использование небольшого количества наиболее информативных дескрипторов может уменьшить вычислительную сложность, по сравнению с использованием большой композиции простых признаков, например в глубокой сверточной нейросети. Обычно классификаторы используют HOG дескриптор как вектор в целом, однако при этом теряется информация о локальной пространственной структуре and спектре признаков. Новизна заключается в использовании свойства локальности блоков в HOG дескрипторе and представление HOG в виде трехмерного тензора. Использование этой информации позволяет достичь устойчивости детектирования к перекрытию пешехода.
+
* '''Novelty:''' The development of computationally simple methods for extracting the most informative features in recognition problems is relevant to building embedded systems with limited computing resources. Using a small number of the most informative descriptors can reduce computational complexity compared to using a large composition of simple features, as in a deep convolutional neural network. Classifiers typically use the HOG descriptor as a single flat vector, which loses information about the local spatial structure and the feature spectrum. The novelty lies in exploiting the locality of blocks in the HOG descriptor and representing HOG as a three-dimensional tensor. Using this information makes detection robust to partial occlusion of the pedestrian.
-
* Authors: Гнеушев Александр Николаевич
+
* '''Authors:''' Gneushev Alexander Nikolaevich
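Two steps from the Solution item above can be sketched in a few lines of pure Python: viewing a flat HOG vector as a three-dimensional tensor, and computing FAR/FRR points for a DET plot. This is only an illustration under stated assumptions. The function names are hypothetical, and the row-major block order assumed here must be checked against the actual HOG implementation, whose memory layout may differ. The sizes 15 x 7 blocks with 36 values each correspond to the commonly used 64x128 window with 16x16 blocks, a stride of 8 pixels and 9 orientation bins (3780 values in total).

```python
def hog_vector_to_tensor(vec, blocks_y, blocks_x, channels):
    """Reshape a flat HOG vector into a nested [blocks_y][blocks_x][channels] list.

    Assumes row-major block order: all channels of block (0, 0), then (0, 1), ...
    """
    if len(vec) != blocks_y * blocks_x * channels:
        raise ValueError("descriptor length does not match the requested shape")
    tensor = []
    for y in range(blocks_y):
        row = []
        for x in range(blocks_x):
            start = (y * blocks_x + x) * channels
            row.append(list(vec[start:start + channels]))
        tensor.append(row)
    return tensor


def far_frr(scores, labels, threshold):
    """FAR and FRR at one threshold; label 1 = pedestrian, 0 = background.

    A window is accepted as a pedestrian when its score is >= threshold.
    """
    positives = labels.count(1)
    negatives = labels.count(0)
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    false_accepts = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    false_rejects = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    return false_accepts / negatives, false_rejects / positives


def det_points(scores, labels):
    """(threshold, FAR, FRR) for every distinct score used as a threshold."""
    return [(t,) + far_frr(scores, labels, t) for t in sorted(set(scores))]


# Stand-in descriptor: indices instead of real HOG values (hypothetical data).
flat = list(range(15 * 7 * 36))
tensor = hog_vector_to_tensor(flat, blocks_y=15, blocks_x=7, channels=36)

# Hypothetical classifier scores for three pedestrian and three background windows.
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1]
labels = [1, 1, 1, 0, 0, 0]
far, frr = far_frr(scores, labels, threshold=0.5)
det = det_points(scores, labels)
```

Plotting FRR against FAR over the points returned by det_points gives the DET curve; the same function can be applied to the scores of the linear SVM and of the convolutional network to compare them on equal terms.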
-
=YEAR=
+
==2017==
{|class="wikitable"
{|class="wikitable"
|-
|-
-
 
! Author
! Author
! Topic
! Topic
Line 1996: Line 2558:
! Letters
! Letters
! <tex>\Sigma=3+13</tex>
! <tex>\Sigma=3+13</tex>
-
!
 
|-
|-
-
|[[Участник:Goncharovalex|Гончаров Алексей (пример)]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Goncharovalex Goncharov Alexey]
-
|Метрическая классификация временных рядов
+
|Metric classification of time series
|[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Goncharov2015MetricClassification/code code],
|[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Goncharov2015MetricClassification/code code],
[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Goncharov2015MetricClassification/doc/Goncharov2015MetricClassification.pdf paper],
[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Goncharov2015MetricClassification/doc/Goncharov2015MetricClassification.pdf paper],
[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Goncharov2015MetricClassification/doc/GoncharovAlexey2015PresentationMetricClassification.pdf slides]
[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Goncharov2015MetricClassification/doc/GoncharovAlexey2015PresentationMetricClassification.pdf slides]
-
|[[Участник:Mpopova|Maria Popova]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Mpopova Maria Popova]
-
|Задаянчук Андрей
+
|Zadayanchuk Andrey
|BMF
|BMF
|AILSBRCVTDSWH>
|AILSBRCVTDSWH>
-
|
 
-
|
 
|-
|-
-
| [[Участник:AstakhovAnton|Астахов Антон]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:AstakhovAnton Astakhov Anton]
-
| Восстановление структуры прогностической модели по вероятностному представлению
+
| Restoring the structure of a predictive model from a probabilistic representation
-
| [https://svn.code.sf.net/p/mlalgorithms/code/Group574/Astakhov2018RestorePrognosticStructure/ folder]
+
|[https://svn.code.sf.net/p/mlalgorithms/code/Group574/Astakhov2018RestorePrognosticStructure/ folder]
[https://svn.code.sf.net/p/mlalgorithms/code/Group574/Astakhov2018RestorePrognosticStructure/code/ code]
[https://svn.code.sf.net/p/mlalgorithms/code/Group574/Astakhov2018RestorePrognosticStructure/code/ code]
[https://svn.code.sf.net/p/mlalgorithms/code/Group574/Astakhov2018RestorePrognosticStructure/doc/paper/Astakhov2018RestorePrognosticStructure.pdf paper]
[https://svn.code.sf.net/p/mlalgorithms/code/Group574/Astakhov2018RestorePrognosticStructure/doc/paper/Astakhov2018RestorePrognosticStructure.pdf paper]
-
| [[Участник:Katrutsa|Александр Катруца]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Katrutsa Alexander Katrutsa]
-
| [[Участник:KislinskiVadim|Кислинский Вадим]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:KislinskiVadim Kislinsky Vadim]
| BHF
| BHF
|A-I-L0S0B0R0C0V0T0 [A-I-L-S-B0R0C0V0T0E0D0W0S] + [AILSBRCBTEDWS]
|A-I-L0S0B0R0C0V0T0 [A-I-L-S-B0R0C0V0T0E0D0W0S] + [AILSBRCBTEDWS]
|2+4
|2+4
-
|
 
|-
|-
-
| [[Участник:GavYur|Гаврилов Юрий]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:GavYur Gavrilov Yuri]
-
| Выбор интерпретируемых мультимоделей в Taskх кредитного скоринга
+
| Selection of interpretable multimodels in credit scoring problems
|[https://svn.code.sf.net/p/mlalgorithms/code/Group574/Gavrilov2018CreditScoringMultimodels/ folder]
|[https://svn.code.sf.net/p/mlalgorithms/code/Group574/Gavrilov2018CreditScoringMultimodels/ folder]
[https://svn.code.sf.net/p/mlalgorithms/code/Group574/Gavrilov2018CreditScoringMultimodels/code/ code]
[https://svn.code.sf.net/p/mlalgorithms/code/Group574/Gavrilov2018CreditScoringMultimodels/code/ code]
[https://svn.code.sf.net/p/mlalgorithms/code/Group574/Gavrilov2018CreditScoringMultimodels/doc/paper/Gavrilov574CreditScoringMultimodels.pdf paper]
[https://svn.code.sf.net/p/mlalgorithms/code/Group574/Gavrilov2018CreditScoringMultimodels/doc/paper/Gavrilov574CreditScoringMultimodels.pdf paper]
[https://youtu.be/ZOzprVyK8bc video]
[https://youtu.be/ZOzprVyK8bc video]
-
| [[Участник:Goncharovalex|А.В. Гончаров]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Goncharovalex Goncharov Alexey]
-
| [[Участник:Twelveth|Остроухов Петр]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Twelveth Ostroukhov Petr]
| BF
| BF
|A+IL-S0B-R0 [A+ILSBRC-VT0E0D0W0S] + (W)
|A+IL-S0B-R0 [A+ILSBRC-VT0E0D0W0S] + (W)
-
| 2+9+1
+
| 2+9+1
-
|
+
|-
|-
-
| [[Участник:Tamaz|Gadaev Tamaz]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Tamaz Gadaev Tamaz]
-
|Оценка оптимального объема выборки
+
|Estimating the optimal sample size
|[https://svn.code.sf.net/p/mlalgorithms/code/Group574/Gadaev2018OptimalSampleSIze/ folder]
|[https://svn.code.sf.net/p/mlalgorithms/code/Group574/Gadaev2018OptimalSampleSIze/ folder]
[https://svn.code.sf.net/p/mlalgorithms/code/Group574/Gadaev2018OptimalSampleSIze/code/ code]
[https://svn.code.sf.net/p/mlalgorithms/code/Group574/Gadaev2018OptimalSampleSIze/code/ code]
Line 2042: Line 2599:
[https://svn.code.sf.net/p/mlalgorithms/code/Group574/Gadaev2018OptimalSampleSIze/slides/Gadaev2018OptimalSample.pdf slides]
[https://svn.code.sf.net/p/mlalgorithms/code/Group574/Gadaev2018OptimalSampleSIze/slides/Gadaev2018OptimalSample.pdf slides]
[https://youtu.be/N7UnR1cRTOI video]
[https://youtu.be/N7UnR1cRTOI video]
-
|[[Участник:Katrutsa|Александр Катруца]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Katrutsa Alexander Katrutsa]
-
| [[Участник:ShulginEgor|Шульгин Егор]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:ShulginEgor Shulgin Egor]
|BHF
|BHF
|A-IL>SB-R-C0V0T0 [AILSBR0CVT0E-D0W0S]
|A-IL>SB-R-C0V0T0 [AILSBR0CVT0E-D0W0S]
-
| 2+9
+
| 2+9
-
|
+
|-
|-
-
| [[Участник:Egorgladin|Гладин Егор]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Egorgladin Gladin Egor]
-
|Экономия заряда акселерометра на основе прогнозирования временных рядов
+
|Accelerometer Battery Savings Based on Time Series Forecasting
|[https://svn.code.sf.net/p/mlalgorithms/code/Group574/Gladin2018AccelerometerChargeSaving/ folder]
|[https://svn.code.sf.net/p/mlalgorithms/code/Group574/Gladin2018AccelerometerChargeSaving/ folder]
[https://svn.code.sf.net/p/mlalgorithms/code/Group574/Gladin2018AccelerometerChargeSaving/code code]
[https://svn.code.sf.net/p/mlalgorithms/code/Group574/Gladin2018AccelerometerChargeSaving/code code]
[https://svn.code.sf.net/p/mlalgorithms/code/Group574/Gladin2018AccelerometerChargeSaving/doc/paper/Gladin2018AccelerometerChargeSaving.pdf paper]
[https://svn.code.sf.net/p/mlalgorithms/code/Group574/Gladin2018AccelerometerChargeSaving/doc/paper/Gladin2018AccelerometerChargeSaving.pdf paper]
[https://svn.code.sf.net/p/mlalgorithms/code/Group574/Gladin2018AccelerometerChargeSaving/doc/slides slides]
[https://svn.code.sf.net/p/mlalgorithms/code/Group574/Gladin2018AccelerometerChargeSaving/doc/slides slides]
-
|[[Участник:Mvladimirova|Мария Владимирова]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Mvladimirova Maria Vladimirova]
-
|[[Участник:KozlinskyEvg|Козлинский Евгений]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:KozlinskyEvg Kozlinsky Evgeny]
[https://svn.code.sf.net/p/mlalgorithms/code/Group574/Kozlinsky2018WNTMvsTM/review_on_Gladin.docx review]
[https://svn.code.sf.net/p/mlalgorithms/code/Group574/Kozlinsky2018WNTMvsTM/review_on_Gladin.docx review]
|.F
|.F
|AILS [A-I-L-SB0R0C000V0T0E0D0W0S]
|AILS [A-I-L-SB0R0C000V0T0E0D0W0S]
|1+4
|1+4
-
|
 
|-
|-
-
| [[Участник:Andriygav|Грабовой Андрей]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Andriygav Grabovoi Andrey]
-
|Автоматическое определение релевантности параметров нейросети.
+
|Automatic determination of the relevance of neural network parameters.
|[http://svn.code.sf.net/p/mlalgorithms/code/Group574/Grabovoy2018OptimalBrainDamage/ folder]
|[http://svn.code.sf.net/p/mlalgorithms/code/Group574/Grabovoy2018OptimalBrainDamage/ folder]
[http://svn.code.sf.net/p/mlalgorithms/code/Group574/Grabovoy2018OptimalBrainDamage/code/ code]
[http://svn.code.sf.net/p/mlalgorithms/code/Group574/Grabovoy2018OptimalBrainDamage/code/ code]
Line 2070: Line 2625:
[http://svn.code.sf.net/p/mlalgorithms/code/Group574/Grabovoy2018OptimalBrainDamage/doc/slides/Grabovoy2018OptimalBrainDamage.pdf slides]
[http://svn.code.sf.net/p/mlalgorithms/code/Group574/Grabovoy2018OptimalBrainDamage/doc/slides/Grabovoy2018OptimalBrainDamage.pdf slides]
[https://www.youtube.com/watch?v=OnW3t5jk-r0&feature=youtu.be video]
[https://www.youtube.com/watch?v=OnW3t5jk-r0&feature=youtu.be video]
-
|[[Участник:Oleg Bakhteev| Oleg BakhteevЮ. ]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Oleg_Bakhteev Oleg Bakhteev]
-
| [[Участник:Oleksandr Kulkov|Кульков Александр]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Oleksandr_Kulkov Kulkov Alexander]
|BHMF
|BHMF
| A+ILS+BRC+VTE>D> [AILSBRCVTEDWS] [<tex>\emptyset</tex>]
| A+ILS+BRC+VTE>D> [AILSBRCVTEDWS] [<tex>\emptyset</tex>]
|3+13
|3+13
-
|
 
|-
|-
-
| [[Участник:Nurlanov_zh|Нурланов Жакшылык]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Nurlanov_zh Nurlanov Zhakshylyk]
| Deep Learning for reliable detection of tandem repeats in 3D protein structures
| Deep Learning for reliable detection of tandem repeats in 3D protein structures
|[http://svn.code.sf.net/p/mlalgorithms/code/Group574/Nurlanov2018DeepSymmetry/ folder]
|[http://svn.code.sf.net/p/mlalgorithms/code/Group574/Nurlanov2018DeepSymmetry/ folder]
Line 2084: Line 2638:
[https://svn.code.sf.net/p/mlalgorithms/code/Group574/Nurlanov2018DeepSymmetry/doc/slides/Nurlanov2018DeepSymmetry.pdf slides]
[https://svn.code.sf.net/p/mlalgorithms/code/Group574/Nurlanov2018DeepSymmetry/doc/slides/Nurlanov2018DeepSymmetry.pdf slides]
[https://youtu.be/y_HKeBlj45s video]
[https://youtu.be/y_HKeBlj45s video]
-
| [https://team.inria.fr/nano-d/team-members/sergei-grudinin/ С. В. Грудинин], Guillaume Pages
+
|[https://team.inria.fr/nano-d/team-members/sergei-grudinin/ S. V. Grudinin], Guillaume Pages
-
| [[Участник:Nikita_Pletnev|Плетнев Никита]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Nikita_Pletnev Pletnev Nikita]
[https://svn.code.sf.net/p/mlalgorithms/code/Group574/Nurlanov2018DeepSymmetry/feedback/Pletnev2018Recension.pdf Review]
[https://svn.code.sf.net/p/mlalgorithms/code/Group574/Nurlanov2018DeepSymmetry/feedback/Pletnev2018Recension.pdf Review]
|BHF
|BHF
|AILB [A-I-LS-BRC0V0T-E0D0W0S]
|AILB [A-I-LS-BRC0V0T-E0D0W0S]
-
|2+7
+
|2+7
-
|
+
|-
|-
-
| [[Участник:AnnRogozina|Рогозина Анна]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:AnnRogozina Rogozina Anna]
| Deep learning for RNA secondary structure prediction
| Deep learning for RNA secondary structure prediction
|[http://svn.code.sf.net/p/mlalgorithms/code/Group574/Rogozina2018StructurePredictionRNA/ folder]
|[http://svn.code.sf.net/p/mlalgorithms/code/Group574/Rogozina2018StructurePredictionRNA/ folder]
Line 2099: Line 2652:
[http://svn.code.sf.net/p/mlalgorithms/code/Group574/Rogozina2018StructurePredictionRNA/doc/slides/Rogozina2018RNAPredictionsSlides.pdf slides]
[http://svn.code.sf.net/p/mlalgorithms/code/Group574/Rogozina2018StructurePredictionRNA/doc/slides/Rogozina2018RNAPredictionsSlides.pdf slides]
[https://youtu.be/r6S5_5b24hg video]
[https://youtu.be/r6S5_5b24hg video]
-
| [[Участник:Mpopova|Maria Popova]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Mpopova Maria Popova]
-
| [[Участник:Tamaz|Gadaev Tamaz]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Tamaz Gadaev Tamaz]
|BHMF
|BHMF
|AILSBR> [AILSBRC0V0T0E0D0W0S]+CW
|AILSBR> [AILSBRC0V0T0E0D0W0S]+CW
-
|3+9
+
|3+9
-
|
+
|-
|-
-
| [[Участник:Ol terekhov|Терехов Олег]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Ol_terekhov Terekhov Oleg]
-
|Порождение признаков с помощью локально-аппроксимирующих моделей
+
|Generation of features using locally approximating models
|[http://svn.code.sf.net/p/mlalgorithms/code/Group574/Terekhov2018LocallyApproxModels/ folder]
|[http://svn.code.sf.net/p/mlalgorithms/code/Group574/Terekhov2018LocallyApproxModels/ folder]
[http://svn.code.sf.net/p/mlalgorithms/code/Group574/Terekhov2018LocallyApproxModels/code/ code]
[http://svn.code.sf.net/p/mlalgorithms/code/Group574/Terekhov2018LocallyApproxModels/code/ code]
[http://svn.code.sf.net/p/mlalgorithms/code/Group574/Terekhov2018LocallyApproxModels/doc/Terekhov2018LocalApproxModels.pdf paper]
[http://svn.code.sf.net/p/mlalgorithms/code/Group574/Terekhov2018LocallyApproxModels/doc/Terekhov2018LocalApproxModels.pdf paper]
[http://svn.code.sf.net/p/mlalgorithms/code/Group574/Terekhov2018LocallyApproxModels/slides/Terekhov2018LAM_Presentation.pdf slides]
[http://svn.code.sf.net/p/mlalgorithms/code/Group574/Terekhov2018LocallyApproxModels/slides/Terekhov2018LAM_Presentation.pdf slides]
-
|С.Д. Иванычев, [[Участник:Neychev.Г.Нейчев]]
+
|S.D. Ivanychev, [http://www.machinelearning.ru/wiki/index.php?title=Участник:Neychev R.G. Neichev]
-
|[[Участник:Egorgladin|Гладин Егор]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Egorgladin Gladin Egor]
[http://svn.code.sf.net/p/mlalgorithms/code/Group574/Terekhov2018LocallyApproxModels/doc/Gladin2018LAM_Review.pdf review]
[http://svn.code.sf.net/p/mlalgorithms/code/Group574/Terekhov2018LocallyApproxModels/doc/Gladin2018LAM_Review.pdf review]
|BHM
|BHM
|AILSBRCVTDSW [AIL0SB0R0C0V0TE0D0W0S]
|AILSBRCVTDSW [AIL0SB0R0C0V0TE0D0W0S]
-
|2+12
+
|2+12
-
|
+
|-
|-
-
| [[Участник:ShulginEgor|Шульгин Егор]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:ShulginEgor Shulgin Egor]
-
| Порождение признаков, инвариантных к изменению частоты временного ряда
+
| Generation of features that are invariant to changes in the frequency of the time series
|[http://svn.code.sf.net/p/mlalgorithms/code/Group574/Shulgin2018InvariantFeatureGeneration/ folder]
|[http://svn.code.sf.net/p/mlalgorithms/code/Group574/Shulgin2018InvariantFeatureGeneration/ folder]
[http://svn.code.sf.net/p/mlalgorithms/code/Group574/Shulgin2018InvariantFeatureGeneration/code/ code]
[http://svn.code.sf.net/p/mlalgorithms/code/Group574/Shulgin2018InvariantFeatureGeneration/code/ code]
[http://svn.code.sf.net/p/mlalgorithms/code/Group574/Shulgin2018InvariantFeatureGeneration/doc/paper/ paper]
[http://svn.code.sf.net/p/mlalgorithms/code/Group574/Shulgin2018InvariantFeatureGeneration/doc/paper/ paper]
-
| [[Участник:Neychev | Р.Г.Нейчев]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Neychev R.G. Neichev]
-
| [[Участник:Ol terekhov|Терехов Олег]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Ol_terekhov Terekhov Oleg]
| BHM
| BHM
|AIL [AI-LS-BR0CV0T0E0D0W0S]
|AIL [AI-LS-BR0CV0T0E0D0W0S]
-
| 2+5
+
| 2+5
-
|
+
|-
|-
-
| [[Участник:Gmalinovsky|Малиновский Григорий]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Gmalinovsky Malinovsky Grigory]
-
|Предсказание графовой структуры нейросетевой модели
+
|Graph Structure Prediction of a Neural Network Model
|[http://svn.code.sf.net/p/mlalgorithms/code/Group574/Malinovskyi2018StructureCNN/ folder]
|[http://svn.code.sf.net/p/mlalgorithms/code/Group574/Malinovskyi2018StructureCNN/ folder]
[http://svn.code.sf.net/p/mlalgorithms/code/Group574/Malinovskyi2018StructureCNN/code/ code]
[http://svn.code.sf.net/p/mlalgorithms/code/Group574/Malinovskyi2018StructureCNN/code/ code]
Line 2139: Line 2689:
[http://svn.code.sf.net/p/mlalgorithms/code/Group574/Malinovskyi2018StructureCNN/paper/Malinovskyi2018NeuralStructureF_talk.pdf slides]
[http://svn.code.sf.net/p/mlalgorithms/code/Group574/Malinovskyi2018StructureCNN/paper/Malinovskyi2018NeuralStructureF_talk.pdf slides]
[https://youtu.be/GjsJxE6Msbg video]
[https://youtu.be/GjsJxE6Msbg video]
-
|[[Участник:Oleg Bakhteev| Oleg BakhteevЮ. ]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Oleg_Bakhteev Oleg Bakhteev]
-
| [[Участник:Andriygav|Грабовой Андрей]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Andriygav Grabovoi Andrey]
[http://svn.code.sf.net/p/mlalgorithms/code/Group574/Malinovskyi2018StructureCNN/paper/Grabovoy2018GraphStructure_Review.pdf review]
[http://svn.code.sf.net/p/mlalgorithms/code/Group574/Malinovskyi2018StructureCNN/paper/Grabovoy2018GraphStructure_Review.pdf review]
| BHMF
| BHMF
| A+I+L+SBR>C>V>T>E>D> [AILSBRC0VTED0WS]+(C)
| A+I+L+SBR>C>V>T>E>D> [AILSBRC0VTED0WS]+(C)
-
| 3+11
+
| 3+11
-
|
+
|-
|-
-
| [[Участник:Oleksandr Kulkov|Кульков Александр]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Oleksandr_Kulkov Kulkov Alexander]
-
|Декодирование сигналов мозга and прогнозирование намерений
+
|Brain signal decoding and intention prediction
-
| [http://svn.code.sf.net/p/mlalgorithms/code/Group574/Kulkov2018PartialLeastSquares/ folder]
+
|[http://svn.code.sf.net/p/mlalgorithms/code/Group574/Kulkov2018PartialLeastSquares/ folder]
[http://svn.code.sf.net/p/mlalgorithms/code/Group574/Kulkov2018PartialLeastSquares/code/ code]
[http://svn.code.sf.net/p/mlalgorithms/code/Group574/Kulkov2018PartialLeastSquares/code/ code]
[http://svn.code.sf.net/p/mlalgorithms/code/Group574/Kulkov2018PartialLeastSquares/doc/kulkov2018_pls.pdf paper]
[http://svn.code.sf.net/p/mlalgorithms/code/Group574/Kulkov2018PartialLeastSquares/doc/kulkov2018_pls.pdf paper]
[http://svn.code.sf.net/p/mlalgorithms/code/Group574/Kulkov2018PartialLeastSquares/doc/slides/kulkov2018_pls.pdf slides]
[http://svn.code.sf.net/p/mlalgorithms/code/Group574/Kulkov2018PartialLeastSquares/doc/slides/kulkov2018_pls.pdf slides]
[https://youtu.be/7TLzV-oK7mk video]
[https://youtu.be/7TLzV-oK7mk video]
-
| [[Участник:Isachenkoroma.В. Исаченко]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Isachenkoroma R.V. Isachenko]
-
| [[Участник:Gmalinovsky|Малиновский Григорий]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Gmalinovsky Malinovsky Grigory]
[https://sourceforge.net/p/mlalgorithms/code/13746/#diff-1 review]
[https://sourceforge.net/p/mlalgorithms/code/13746/#diff-1 review]
| BHMF
| BHMF
| AILSBR [AILSBRCVTED0W0S]
| AILSBR [AILSBRCVTED0W0S]
| 3+11
| 3+11
-
|
 
|-
|-
-
| [[Участник:Nikita_Pletnev|Плетнев Никита]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Nikita_Pletnev Pletnev Nikita]
-
|Аппроксимация границ радужки глаза
+
|Approximation of the boundaries of the iris
-
| [http://svn.code.sf.net/p/mlalgorithms/code/Group574/Pletnev2018IrisApproximation/paper/Pletnev2018IrisApproximation.pdf paper]
+
|[http://svn.code.sf.net/p/mlalgorithms/code/Group574/Pletnev2018IrisApproximation/paper/Pletnev2018IrisApproximation.pdf paper]
[http://svn.code.sf.net/p/mlalgorithms/code/Group574/Pletnev2018IrisApproximation/slides/Pletnev2018IrisApproximationSlides.pdf slides]
[http://svn.code.sf.net/p/mlalgorithms/code/Group574/Pletnev2018IrisApproximation/slides/Pletnev2018IrisApproximationSlides.pdf slides]
[ video]
[ video]
-
| [[Участник:Aduenko|Alexander Aduenko]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Aduenko Alexander Aduenko]
-
| [[Участник:Nurlanov_zh|Нурланов Жакшылык]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Nurlanov_zh Nurlanov Zhakshylyk]
|BF
|BF
|AILSB>R> [AILSTWS]
|AILSB>R> [AILSTWS]
-
| 2+7
+
| 2+7
-
|
+
|-
|-
-
| [[Участник:Twelveth|Остроухов Петр]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Twelveth Ostroukhov Petr]
|Selection of a superposition of models for identifying a person from a ballistocardiogram
|Selection of a superposition of models for identifying a person from a ballistocardiogram
-
| [https://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group374/Ostroukhov2018BCGIdentification/ folder]
+
|[https://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group374/Ostroukhov2018BCGIdentification/ folder]
[https://svn.code.sf.net/p/mlalgorithms/code/Group374/Ostroukhov2018BCGIdentification/doc/Ostroukhov2018BCGIdentification.pdf paper]
[https://svn.code.sf.net/p/mlalgorithms/code/Group374/Ostroukhov2018BCGIdentification/doc/Ostroukhov2018BCGIdentification.pdf paper]
[https://svn.code.sf.net/p/mlalgorithms/code/Group374/Ostroukhov2018BCGIdentification/code/ code]
[https://svn.code.sf.net/p/mlalgorithms/code/Group374/Ostroukhov2018BCGIdentification/code/ code]
[https://svn.code.sf.net/p/mlalgorithms/code/Group374/Ostroukhov2018BCGIdentification/slides/Ostroukhov2018BCGIdentification_slides.pdf slides]
[https://svn.code.sf.net/p/mlalgorithms/code/Group374/Ostroukhov2018BCGIdentification/slides/Ostroukhov2018BCGIdentification_slides.pdf slides]
-
|Александр Прозоров
+
|Alexander Prozorov
-
|[[Участник:GavYur|Гаврилов Юрий]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:GavYur Gavrilov Yuri]
[http://svn.code.sf.net/p/mlalgorithms/code/Group574/Gavrilov2018CreditScoringMultimodels/ReviewOnOstroukhov.pdf review]
[http://svn.code.sf.net/p/mlalgorithms/code/Group574/Gavrilov2018CreditScoringMultimodels/ReviewOnOstroukhov.pdf review]
|BhF
|BhF
|AIL>S?B?R? [AILSBRCVT-E0D0W0S]
|AIL>S?B?R? [AILSBRCVT-E0D0W0S]
| 2+10
| 2+10
-
|
 
|-
|-
-
| [[Участник:KislinskiVadim|Кислинский Вадим]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:KislinskiVadim Kislinsky Vadim]
-
|Предсказание музыкальных плейлистов пользователей в рекомендательной системе.
+
|Predicting user music playlists in a recommender system.
|[https://svn.code.sf.net/p/mlalgorithms/code/Group574/Kislinskiy2018APContinuation/ folder]
|[https://svn.code.sf.net/p/mlalgorithms/code/Group574/Kislinskiy2018APContinuation/ folder]
[https://svn.code.sf.net/p/mlalgorithms/code/Group574/Kislinskiy2018APContinuation/code code]
[https://svn.code.sf.net/p/mlalgorithms/code/Group574/Kislinskiy2018APContinuation/code code]
Line 2195: Line 2741:
[https://svn.code.sf.net/p/mlalgorithms/code/Group574/Kislinskiy2018APContinuation/doc/paper/Kislinskiy2018APcontinution.pdf paper]
[https://svn.code.sf.net/p/mlalgorithms/code/Group574/Kislinskiy2018APContinuation/doc/paper/Kislinskiy2018APcontinution.pdf paper]
[https://youtu.be/YTqe9dkVgyw video]
[https://youtu.be/YTqe9dkVgyw video]
-
| Евгений Фролов
+
| Evgeny Frolov
-
| [[Участник:AstakhovAnton|Астахов Антон]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:AstakhovAnton Astakhov Anton]
| .F
| .F
| (AIL)------(SB)---(RCVT)-- [AILS-BRCVTED0W0S]
| (AIL)------(SB)---(RCVT)-- [AILS-BRCVTED0W0S]
| 1+11
| 1+11
-
|
 
|-
|-
-
| [[Участник:KozlinskyEvg|Козлинский Евгений]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:KozlinskyEvg Kozlinsky Evgeny]
-
| Анализ банковских транзакционных данных физических лиц для выявления паттернов потребления клиентов.
+
| Analysis of banking transactional data of individuals to identify customer consumption patterns.
-
| [https://svn.code.sf.net/p/mlalgorithms/code/Group574/Kozlinsky2018WNTMvsTM/ folder]
+
|[https://svn.code.sf.net/p/mlalgorithms/code/Group574/Kozlinsky2018WNTMvsTM/ folder]
[https://svn.code.sf.net/p/mlalgorithms/code/Group574/Kozlinsky2018WNTMvsTM/code/ code]
[https://svn.code.sf.net/p/mlalgorithms/code/Group574/Kozlinsky2018WNTMvsTM/code/ code]
[https://svn.code.sf.net/p/mlalgorithms/code/Group574/Kozlinsky2018WNTMvsTM/doc/paper/kozlinsky18wntm-individuals.pdf paper]
[https://svn.code.sf.net/p/mlalgorithms/code/Group574/Kozlinsky2018WNTMvsTM/doc/paper/kozlinsky18wntm-individuals.pdf paper]
[https://svn.code.sf.net/p/mlalgorithms/code/Group574/Kozlinsky2018WNTMvsTM/doc/slides/analiz-tranzaktsii-slash.pdf slides]
[https://svn.code.sf.net/p/mlalgorithms/code/Group574/Kozlinsky2018WNTMvsTM/doc/slides/analiz-tranzaktsii-slash.pdf slides]
[https://youtu.be/0WCyndULNIM video]
[https://youtu.be/0WCyndULNIM video]
-
| Роза Айсина
+
| Rosa Aisina
-
| [[Участник:AnnRogozina|Рогозина Анна]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:AnnRogozina Rogozina Anna]
[https://svn.code.sf.net/p/mlalgorithms/code/Group574/Kozlinsky2018WNTMvsTM/doc/paper/Kozlinsky18wntm-individuals_Review.pdf review]
[https://svn.code.sf.net/p/mlalgorithms/code/Group574/Kozlinsky2018WNTMvsTM/doc/paper/Kozlinsky18wntm-individuals_Review.pdf review]
| BHMF
| BHMF
| AILSBR>CV> [AILSBR0C0V0TE0D0WS]+(С)
| AILSBR>CV> [AILSBR0C0V0TE0D0WS]+(С)
| 3+8+1
| 3+8+1
-
|
 
|-
|-
|}
|}
-
===Task 1 ===
+
===1 ===
-
* '''Name:''' Аппроксимация границ радужки глаза
+
* '''Title:''' Approximation of the boundaries of the iris
-
* '''Task''': По изображению человеческого глаза определить окружности, аппроксимирующие внутреннюю and внешнюю границу радужки.
+
* '''Problem:''' Based on the image of the human eye, determine the circles approximating the inner and outer border of the iris.
-
* '''Data:''' Растровые монохромные изображения, типичный размер 640*480 пикселей (однако, возможны and другие размеры)[http://www.bath.ac.uk/elec-eng/research/sipg/irisweb/], [http://www.cb-sr.ia.ac.cn/IrisDatabase.htm].
+
* '''Data:''' Monochrome bitmap images, typical size 640*480 pixels (other sizes are also possible) [http://www.bath.ac.uk/elec-eng/research/sipg/irisweb/], [http://www.cb-sr.ia.ac.cn/IrisDatabase.htm].
-
* '''References:''':
+
* '''References:'''
-
** Адуенко А.А. Выбор мультимоделей в Taskх классификации (научный руководитель Strizhov V.V.). Московский физико-технический институт, 2017. [http://www.frccsc.ru/sites/default/files/docs/ds/002-073-05/diss/11-aduenko/11-Aduenko_main.pdf?626]
+
*# Aduenko A.A. Selection of multimodels in classification problems (supervisor Strijov V.V.). Moscow Institute of Physics and Technology, 2017. [http://www.frccsc.ru/sites/default/files/docs/ds/002-073-05/diss/11-aduenko/11-Aduenko_main.pdf?626]
-
** К.А.Ганькин, А.Н.Гнеушев, И.А.Матвеев Сегментация изображения радужки глаза, основанная на приближенных методах с последующими уточнениями // Известия РАН. Теория and системы управления, 2014, 2, с. 78–92.
+
*# K.A. Gankin, A.N. Gneushev, I.A. Matveev. Segmentation of the iris image based on approximate methods with subsequent refinements // Izvestiya RAN. Theory and control systems, 2014, no. 2, pp. 78–92.
-
** Duda, R. O. Use of the Hough transformation to detect lines and curves in pictures / R. O. Duda, P. E. Hart // Communications of the ACM. 1972. Vol. 15, no. 1. Pp.
+
*# Duda, R. O. Use of the Hough transformation to detect lines and curves in pictures / R. O. Duda, P. E. Hart // Communications of the ACM. 1972. Vol. 15, no. 1. Pp.
-
* '''Basic algorithm''': Ефимов Юрий. Поиск внешней and внутренней границ радужки на изображении глаза методом парных градиентов, 2015.
+
* '''Basic algorithm''': Efimov Yury. Search for the outer and inner boundaries of the iris in the eye image using the paired gradient method, 2015.
-
* '''Solution:''' См. [[Media:Iris_circle_problem.pdf | Iris_circle_problem.pdf]]
+
* '''Solution:''' See [[Media:Iris_circle_problem.pdf | iris_circle_problem.pdf]]
-
* '''Novelty:''' Предложен быстрый беспереборный алгоритм аппроксимации границ с помощью линейных мультимоделей.
+
* '''Novelty:''' A fast non-enumerative algorithm for approximating boundaries using linear multimodels is proposed.
-
* '''consultant''': Alexander Aduenko (автор Strizhov V.V., Expert Matveev I.A.)
+
* '''Consultant:''' Alexander Aduenko (author Strijov V.V., expert Matveev I.A.)
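The connection between circle approximation and linear models can be illustrated with a minimal sketch. It assumes that boundary points have already been extracted from the image and uses the algebraic circle equation x^2 + y^2 + a*x + b*y + c = 0, which is linear in the unknowns (a, b, c). This is a generic least-squares fit, not the multimodel algorithm from Iris_circle_problem.pdf; the function names <code>fit_circle</code> and <code>solve_linear</code> are hypothetical.

```python
import math


def solve_linear(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(matrix[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        # Pick the row with the largest pivot to limit round-off error.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_circle(points):
    """Algebraic least-squares circle fit (assumed illustration, not the course solution).

    Each point gives one linear equation a*x + b*y + c = -(x^2 + y^2).
    Collinear points make the system singular and are not handled here.
    """
    ata = [[0.0] * 3 for _ in range(3)]  # normal matrix A^T A
    atz = [0.0] * 3                      # right-hand side A^T z
    for x, y in points:
        row = (x, y, 1.0)
        z = -(x * x + y * y)
        for i in range(3):
            atz[i] += row[i] * z
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    a, b, c = solve_linear(ata, atz)
    cx, cy = -a / 2.0, -b / 2.0
    radius = math.sqrt(cx * cx + cy * cy - c)
    return cx, cy, radius
```

Calling <code>fit_circle</code> separately on candidate points of the inner and of the outer boundary would give two circles; how points are assigned to each boundary is exactly the part that the multimodel approach addresses and is not covered by this sketch.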
-
===Task 2 ===
+
===2 ===
-
* '''Name:''' Оценка оптимального объема выборки
+
* '''Title:''' Estimation of the optimal sample size
-
* '''Task''': В условиях недостаточного числа дорогостоящих измерений требуется спрогнозировать оптимальный объем пополняемой выборки.
+
* '''Problem:''' Given an insufficient number of expensive measurements, it is required to predict the optimal size of the sample being replenished.
-
* '''Data:''' Выборки измерений в медицинской диагностике, в частности, выборка иммунологических маркеров.
+
* '''Data:''' Samples of measurements in medical diagnostics, in particular, a sample of immunological markers.
-
* '''References:''':
+
* '''References:'''
-
** Мотренко А.П. Материалы по алгоритмам оценки оптимального объема выборки в репозитории MLAlgorithms[http://svn.code.sf.net/p/mlalgorithms/code/PhDThesis/Motrenko/doc/], [http://svn.code.sf.net/p/mlalgorithms/code/Group874/Motrenko2014KL/].
+
*# Motrenko A.P. Materials on algorithms for estimating the optimal sample size in the MLAlgorithms repository [http://svn.code.sf.net/p/mlalgorithms/code/PhDThesis/Motrenko/doc/], [http://svn.code.sf.net/p/mlalgorithms/code/Group874/Motrenko2014KL/].
-
* '''Basic algorithm''': Алгоритмы оценки объема выборки при .
+
* '''Basic algorithm''': Sample size estimation algorithms for .
-
* '''Solution:''' Исследование свойств пространства параметров при пополнении выборки.
+
* '''Solution:''' Investigation of the properties of the parameter space when replenishing the sample.
-
* '''Novelty:''' Предложена новая методология прогнозирования объема выборки, обоснованная с точки зрения классической and байесовской статистики.
+
* '''Novelty:''' A new methodology for sample size forecasting is proposed, justified in terms of classical and Bayesian statistics.
-
* '''Authors:''' А.М. Катруца, Strizhov V.V., Expert А.П. Мотренко
+
* '''Authors:''' A.M. Katrutsa, Strijov V.V., Expert A.P. Motrenko
-
===Task 3 ===
+
===3 ===
-
* '''Name:''' Восстановление структуры прогностической модели по вероятностному представлению
+
* '''Title:''' Reconstructing the structure of a forecasting model from its probabilistic representation
-
* '''Task''': Требуется восстановить дерево суперпозиции по порожденному графу вероятностей связей.
+
* '''Problem:''' It is required to reconstruct the superposition tree from the generated connection probability graph.
-
* '''Data:''' Сегменты временных, пространственно-временных рядов (и текстовые коллекции).
+
* '''Data:''' Segments of time series, spatio-temporal series (and text collections).
-
* '''References:''':
+
* '''References:'''
-
** Работы Tommy Yakkola and других в LinkReview [https://docs.google.com/document/d/1j-1eZ4Az05yBR3GvgZusqFVIZeE_HcZDawZDzz41zS4/edit?usp=sharing].
+
*# Works by Tommi Jaakkola and others in LinkReview [https://docs.google.com/document/d/1j-1eZ4Az05yBR3GvgZusqFVIZeE_HcZDawZDzz41zS4/edit?usp=sharing].
-
* '''Basic algorithm''': Метод ветвей and границ, динамическое пограммирование при построении полносвязного графа.
+
* '''Basic algorithm''': Branch and bound method, dynamic programming when building a fully connected graph.
-
* '''Solution:''' Построение модели в виде GAN, VAE порождает взвешенный граф, NN аппроксимирует структуру дерева.
+
* '''Solution:''' Building the model as a GAN: a VAE generates a weighted graph, and an NN approximates the tree structure.
-
* '''Novelty:''' Предложен способ оштрафовать граф за то, что он не является деревом. Предложен способ прогнозирования структур прогностических моделей.
+
* '''Novelty:''' A way to penalize a graph for not being a tree is proposed. A method for predicting the structures of forecasting models is proposed.
-
* '''Authors:''' А.М. Катруца, Strizhov V.V.
+
* '''Authors:''' A.M. Katrutsa, Strijov V.V.
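As a point of comparison for the problem statement above, the simplest way to turn a graph of link probabilities into a tree is a maximum spanning tree. The sketch below uses Kruskal's algorithm with a union-find structure. It is only an assumed baseline: it treats the graph as undirected and ignores the root and the direction of the superposition tree, and it is neither the branch-and-bound method nor the neural approach named in the task. The function name <code>max_spanning_tree</code> and the input layout are hypothetical.

```python
def max_spanning_tree(n_vertices, edge_probs):
    """Return the edges of a maximum-weight spanning forest (Kruskal's algorithm).

    edge_probs maps an undirected edge (u, v) to its link probability.
    Because Kruskal's algorithm depends only on the ordering of the weights,
    the same tree also maximizes the product of the probabilities.
    """
    parent = list(range(n_vertices))

    def find(v):
        # Find the component representative with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    ranked = sorted(edge_probs.items(), key=lambda item: item[1], reverse=True)
    for (u, v), _prob in ranked:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            # The edge joins two components, so it cannot create a cycle.
            parent[root_u] = root_v
            tree.append((u, v))
    return tree
```

If the probability graph is connected, the result has exactly n_vertices - 1 edges, which is the edge count of any tree on that vertex set.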
-
===Task 4 ===
+
===4 ===
-
* '''Name:''' Распознавание текста на основе скелетного представления толстых линий and сверточных сетей
+
* '''Title:''' Text recognition based on skeletal representation of thick lines and convolutional networks
-
* '''Task''': Требуется построить две CNN, одна распознает растровое представление изображения, другая векторное.
+
* '''Problem:''' It is required to build two CNNs: one recognizes the bitmap representation of an image, the other recognizes the vector representation.
-
* '''Data:''' Шрифты в растровом представлении.
+
* '''Data:''' Bitmap fonts.
-
* '''References:''': Список работ [http://www.machinelearning.ru/wiki/images/a/a2/Morozov2017Synthesis_of_medicines.pdf], в частности arXiv:1611.03199 and
+
* '''References:''' List of works [http://www.machinelearning.ru/wiki/images/a/a2/Morozov2017Synthesis_of_medicines.pdf], in particular arXiv:1611.03199 and
-
* '''Basic algorithm''': Сверточная сеть для растрового изображения.
+
* '''Basic algorithm''': Convolutional network for bitmap images.
-
* '''Solution:''' Требуется предложить способ свертывания графовых структур, позволяющий породить информативное описание скелета толстой линии.
+
* '''Solution:''' It is required to propose a method for convolving graph structures that yields an informative description of the skeleton of a thick line.
-
* '''Novelty:''' Предложен способ повышения качества распознавания толстых линий за счет нового способа порождения их описаний.
+
* '''Novelty:''' A way to improve the quality of recognition of thick lines due to a new way of generating their descriptions is proposed.
-
* '''Authors:''' Л.М. Местецкий, И.А. Рейер, Strizhov V.V.
+
* '''Authors:''' L.M. Mestetsky, I.A. Reyer, Strijov V.V.
-
===Task 5 ===
+
===5 ===
-
* '''Name:''' Порождение признаков с помощью локально-аппроксимирующих моделей
+
* '''Title:''' Generation of features using locally approximating models
-
* '''Task''': Требуется проверить выполнимость гипотезы о простоте выборки для порожденных признаков. Признаки - оптимальные параметры аппроксимирующих моделей. При этом вся выборка не является простой and требует смеси моделей для ее аппроксимации. Исследовать информативность порожденных признаков - параметров аппроксимирующих моделей, обученных на сегментах исходного временного ряда.
+
* '''Problem:''' It is required to test whether the hypothesis that the sample is simple holds for the generated features. The features are the optimal parameters of approximating models. The entire sample, however, is not simple and requires a mixture of models to approximate it. Explore the informativeness of the generated features - the parameters of the approximating models trained on segments of the original time series.
-
* '''Data:'''
+
* '''Data:'''
-
** WISDM (Kwapisz, J.R., G.M. Weiss, and S.A. Moore. 2011. Activity recognition using cell phone accelerometers. ACM SigKDD Explorations Newsletter. 12(2):74–82.), USC-HAD или сложнее. Данные акселерометра (Human activity recognition using smart phone embedded sensors: A Linear Dynamical Systems method, W Wang, H Liu, L Yu, F Sun - Neural Networks (IJCNN), 2014)
+
*# WISDM (Kwapisz, J.R., G.M. Weiss, and S.A. Moore. 2011. Activity recognition using cell phone accelerometers. ACM SigKDD Explorations Newsletter. 12(2):74–82.), USC-HAD or more complex. Accelerometer data (Human activity recognition using smart phone embedded sensors: A Linear Dynamical Systems method, W Wang, H Liu, L Yu, F Sun - Neural Networks (IJCNN), 2014)
-
** ([[Временной ряд (библиотека примеров)]], раздел Accelerometry).
+
*# ([[Time series (examples library)]], Accelerometry section).
-
* '''References:''':
+
* '''References:'''
-
** Kuznetsov M.P., Ivkin N.P. Алгоритм классификации временных рядов акселерометра по комбинированному признаковому описанию // Машинное обучение and анализ данных. 2015. T. 1, 11. C. 1471-1483.[http://jmlda.org/papers/doc/2015/no11/Ivkin2015TSclassification.pdf]
+
*# Kuznetsov M.P., Ivkin N.P. Algorithm for classifying accelerometer time series by combined feature description // Machine Learning and Data Analysis. 2015. Vol. 1, no. 11. Pp. 1471-1483. [http://jmlda.org/papers/doc/2015/no11/Ivkin2015TSclassification.pdf]
-
** Карасиков М.Е., Strizhov V.V. Классификация временных рядов в пространстве параметров порождающих моделей // Информатика and ее применения, 2016.[http://strijov.com/papers/Karasikov2016TSC.pdf URL]
+
*# Karasikov M.E., Strijov V.V. Classification of time series in the space of parameters of generating models // Informatics and its applications, 2016.[http://strijov.com/papers/Karasikov2016TSC.pdf URL]
-
** Kuznetsov M.P., Ivkin N.P. Алгоритм классификации временных рядов акселерометра по комбинированному признаковому описанию // Машинное обучение and анализ данных. 2015. T. 1, 11. C. 1471 - 1483. [http://jmlda.org/papers/doc/2015/no11/Ivkin2015TSclassification.pdf URL]
+
*# Kuznetsov M.P., Ivkin N.P. Algorithm for classifying accelerometer time series by combined feature description // Machine Learning and Data Analysis. 2015. Vol. 1, no. 11. Pp. 1471-1483. [http://jmlda.org/papers/doc/2015/no11/Ivkin2015TSclassification.pdf URL]
-
** Isachenko R.V., Strizhov V.V. Метрическое обучение в Taskх многоклассовой классификации временных рядов // Информатика and ее применения, 2016, 10(2) : 48-57. [http://strijov.com/papers/Isachenko2016MetricsLearning.pdf URL]
+
*# Isachenko R.V., Strijov V.V. Metric learning in multiclass time series classification problems // Informatics and its applications, 2016, 10(2) : 48-57. [http://strijov.com/papers/Isachenko2016MetricsLearning.pdf URL]
-
** Zadayanchuk A.I., Popova M.S., Strizhov V.V. Выбор оптимальной модели классификации физической активности по измерениям акселерометра // Информационные технологии, 2016. [http://strijov.com/papers/Zadayanchuk2015OptimalNN4.pdf URL]
+
*# Zadayanchuk A.I., Popova M.S., Strijov V.V. Choosing the optimal model for classifying physical activity based on accelerometer measurements // Information technologies, 2016. [http://strijov.com/papers/Zadayanchuk2015OptimalNN4.pdf URL]
-
** Motrenko A.P., Strijov V.V. Extracting fundamental periods to segment human motion time series // Journal of Biomedical and Health Informatics, 2016, Vol. 20, No. 6, 1466 - 1476. [http://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group874/Motrenko2014TSsegmentation/JBHI/MotrenkoStrijov2014RV2.pdf?format=raw URL]
+
*# Motrenko A.P., Strijov V.V. Extracting fundamental periods to segment human motion time series // Journal of Biomedical and Health Informatics, 2016, Vol. 20, no. 6, 1466 - 1476.
-
** Ignatov A., Strijov V. Human activity recognition using quasiperiodic time series collected from a single triaxial accelerometer // Multimedia Tools and Applications, 2015, 17.05.2015 : 1-14. [http://strijov.com/papers/Ignatov2015HumanActivity.pdf URL]
+
*# Ignatov A., Strijov V. Human activity recognition using quasiperiodic time series collected from a single triaxial accelerometer // Multimedia Tools and Applications, 2015, 17.05.2015 : 1-14. [http://strijov.com/papers/Ignatov2015HumanActivity.pdf URL]
-
* '''Basic algorithm''': Описан в работе Кузнецова, Ивкина.
+
* '''Basic algorithm''': Described in the paper by Kuznetsov and Ivkin.
-
* '''Solution:''' Требуется построить набор локально-аппроксимирующих моделей and выбрать наиболее адекватные.
+
* '''Solution:''' It is required to build a set of locally approximating models and choose the most adequate ones.
-
* '''Novelty:''' Создан стандарт построения локально-аппроксимирующих моделей.
+
* '''Novelty:''' A standard for building locally approximating models has been created.
-
* '''Authors:''' С.Д. Иванычев, Р.Г. Нейчев, Strizhov V.V.
+
* '''Authors:''' S.D. Ivanychev, R.G. Neichev, Strijov V.V.
-
===Task 6 ===
+
===6 ===
-
* '''Name:''' Декодирование сигналов мозга and прогнозирование намерений
+
* '''Title:''' Brain signal decoding and intention prediction
-
* '''Task''': Требуется построить модель, восстанавливающую движение конечностей по кортикограмме.
+
* '''Problem:''' It is required to build a model that reconstructs limb movement from the electrocorticogram.
* '''Data:''' neurotycho.org [http://neurotycho.org/]
* '''Data:''' neurotycho.org [http://neurotycho.org/]
-
* '''References:''':
+
* '''References:'''
-
** Нейчев Р.Г., Катруца А.М., Strizhov V.V. Выбор оптимального набора признаков из мультикоррелирующего множества в задаче прогнозирования // Заводская лаборатория. Диагностика материалов, 2016, 82(3) : 68-74. [http://strijov.com/papers/Neychev2015FeatureSelection.pdf]
+
*# Neichev R.G., Katrutsa A.M., Strijov V.V. Selection of the optimal set of features from a multicorrelated set in the forecasting problem // Zavodskaya Laboratoriya. Diagnostika Materialov, 2016, 82(3) : 68-74. [http://strijov.com/papers/Neychev2015FeatureSelection.pdf]
-
** MLAlgorithms: Motrenko, Isachenko (submitted)
+
*# MLAlgorithms: Motrenko, Isachenko (submitted)
* '''Basic algorithm''': Partial Least Squares[https://en.wikipedia.org/wiki/Partial_least_squares_regression]
* '''Basic algorithm''': Partial Least Squares[https://en.wikipedia.org/wiki/Partial_least_squares_regression]
-
* '''Solution:''' Создать алгоритм выбора признаков, альтернативный PLS and учитывающий неортогональную структуру взаимозависимости признаков.
+
* '''Solution:''' Create a feature selection algorithm that is an alternative to PLS and takes into account the non-orthogonal structure of feature interdependence.
-
* '''Novelty:''' Предложен способ выбора признаков, учитывающий закономерности как and независимой, так and в зависимой переменной.
+
* '''Novelty:''' A feature selection method is proposed that takes into account the regularities in both the independent and the dependent variables.
-
* '''Authors:''' Р.В. Исаченко, Strizhov V.V.
+
* '''Authors:''' R.V. Isachenko, Strijov V.V.
-
===Task 7 ===
+
===7 ===
-
* '''Name:''' Автоматическое определение релевантности параметров нейросети.
+
* '''Title:''' Automatic determination of the relevance of neural network parameters.
-
* '''Task''': Рассматривается Task нахождения устойчивой (и не избыточной по параметрам) структуры нейросети. Для отсечения избыточных параметров предлагается ввести априорные вероятностные предположения о распределении параметров and удалить из нейросети неинформативные параметры методом Белсли. Для настройки априорного распределения предлагается использовать градиентные методы.
+
* '''Problem:''' The problem of finding a stable (and not redundant in terms of parameters) neural network structure is considered. To cut off redundant parameters, it is proposed to introduce a priori probabilistic assumptions about the distribution of parameters and remove non-informative parameters from the neural network using the Belsley method. To adjust the prior distribution, it is proposed to use gradient methods.
-
* '''Data:''' Выборка рукописных цифр MNIST
+
* '''Data:''' The MNIST sample of handwritten digits
-
* '''Basic algorithm''': Optimal Brain Damage, прореживание на основе вариацинного вывода. Структуру итоговой модели предлагается сравнивать с моделью, полученной алгоритмом AdaNet.
+
* '''Basic algorithm''': Optimal Brain Damage, pruning based on variational inference. The structure of the final model is proposed to be compared with the model obtained by the AdaNet algorithm.
-
* '''References:''':
+
* '''References:'''
-
** [https://arxiv.org/pdf/1502.03492.pdf] Градиентные методы оптимизации гиперпараметров.
+
*# [https://arxiv.org/pdf/1502.03492.pdf] Gradient hyperparameter optimization methods.
-
** [http://proceedings.mlr.press/v48/luketina16.pdf] Градиентные методы оптимизации гиперпараметров.
+
*# [http://proceedings.mlr.press/v48/luketina16.pdf] Gradient hyperparameter optimization methods.
-
** [http://yann.lecun.com/exdb/publis/pdf/lecun-90b.pdf] Optimal Brain Damage.
+
*# [http://yann.lecun.com/exdb/publis/pdf/lecun-90b.pdf] Optimal Brain Damage.
-
** [https://arxiv.org/abs/1607.01097] AdaNet
+
*# [https://arxiv.org/abs/1607.01097] AdaNet
-
** [http://strijov.com/papers/SanduleanuStrijov2011FeatureSelection_Preprint.pdf] Метод Белсли
+
*# [http://strijov.com/papers/SanduleanuStrijov2011FeatureSelection_Preprint.pdf] Belsley Method
-
* '''Authors:''' Oleg Bakhteev, Strizhov V.V.
+
* '''Authors:''' Oleg Bakhteev, Strijov V.V.
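The pruning criterion of the Optimal Brain Damage base algorithm can be sketched in a few lines. The sketch assumes that the diagonal of the Hessian of the loss with respect to the parameters has already been estimated, and it uses the saliency h_kk * w_k^2 / 2 from the Optimal Brain Damage paper. The function names and the flat list layout of the parameters are hypothetical, and the sketch covers neither the Belsley method nor the gradient-based tuning of the prior.

```python
def obd_saliencies(weights, hessian_diag):
    """Saliency of each parameter: 0.5 * h_kk * w_k^2 (diagonal Hessian approximation)."""
    return [0.5 * h * w * w for w, h in zip(weights, hessian_diag)]


def prune_least_salient(weights, hessian_diag, n_prune):
    """Zero out the n_prune parameters with the smallest saliency.

    Returns a new list; the input weights are left unchanged.
    """
    saliency = obd_saliencies(weights, hessian_diag)
    order = sorted(range(len(weights)), key=lambda i: saliency[i])
    dropped = set(order[:n_prune])
    return [0.0 if i in dropped else w for i, w in enumerate(weights)]
```

Note that a large weight with a near-zero second derivative can be less salient than a small weight with a large one, which is the main difference from magnitude-based pruning.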
-
===Task 8 ===
+
===8 ===
-
* '''Name:''' Предсказание графовой структуры нейросетевой модели.
+
* '''Title:''' Prediction of the graph structure of the neural network model.
-
* '''Task''': Рассматривается Task нахождения устойчивой (и не избыточной по параметрам) структуры сверточной нейросети. Предлагается предсказывать структуру нейросети с использованием doubly-recurrent нейросетей. В качестве обучающей выборки предлагается использовать структуры моделей, показавших хорошее качество на подвыборках небольшой мощности.
+
* '''Problem:''' The problem of finding a stable (and non-redundant in terms of parameters) structure of a convolutional neural network is considered. It is proposed to predict the structure of a neural network using doubly-recurrent neural networks. As a training sample, it is proposed to use the structures of models that have shown good quality on subsamples of small size.
-
* '''Data:''' Выборки MNIST, CIFAR-10
+
* '''Data:''' MNIST and CIFAR-10 samples
-
* '''Basic algorithm''': случайный поиск. Возможно сравнение с работами по обучению с подкреплением.
+
* '''Basic algorithm''': Random search. A comparison with works on reinforcement learning is possible.
-
* '''References:''':
+
* '''References:'''
-
** [https://pdfs.semanticscholar.org/e7bd/0e7a7ee6b0904d5de6e76e095a6a3b88dd12.pdf] doubly-recurrent нейросети.
+
*# [https://pdfs.semanticscholar.org/e7bd/0e7a7ee6b0904d5de6e76e095a6a3b88dd12.pdf] doubly-recurrent neural networks.
-
** [https://arxiv.org/pdf/1707.07012] Схожий подход с использованием обучения с подкреплением.
+
*# [https://arxiv.org/pdf/1707.07012] Similar approach using reinforcement learning.
-
* '''Authors:''' Oleg Bakhteev. Strizhov V.V.
+
* '''Authors:''' Oleg Bakhteev, Strijov V.V.
-
===Task 9===
+
===9===
-
* '''Name:''' Deep Learning for reliable detection of tandem repeats in 3D protein structures [[Media:Strijov_3D_CNN.pdf|подробнее в PDF]]
+
* '''Title:''' Deep Learning for reliable detection of tandem repeats in 3D protein structures [[Media:Strijov_3D_CNN.pdf|more in PDF]]
-
* '''Task''': Deep learning algorithms pushed computer vision to a level of accuracy comparable or higher than a human vision. Similarly, we believe that it is possible to recognize the symmetry of a 3D object with a very high reliability, when the object is represented as a density map. The optimization problem includes i) multiclass classification of 3D data. The output is the order of symmetry. The number of classes is ~10-20 ii) multioutput regression of 3D data. The output is the symmetry axis (a 3-vector). The input data are typically 24x24x24 meshes. The total amount of these meshes is of order a million. Biological motivation : Symmetry is an important feature of protein tertiary and quaternary structures that has been associated with protein folding, function, evolution, and stability. Its emergence and ensuing prevalence has been attributed to gene duplications, fusion events, and subsequent evolutionary drift in sequence. Methods to detect these symmetries exist, either based on the structure or the sequence of the proteins, however, we believe that they can be vastly improved.
+
* '''Problem:''' Deep learning algorithms have pushed computer vision to a level of accuracy comparable to or higher than that of human vision. Similarly, we believe that it is possible to recognize the symmetry of a 3D object with very high reliability when the object is represented as a density map. The optimization problem includes i) multiclass classification of 3D data, where the output is the order of symmetry and the number of classes is ~10-20; ii) multioutput regression of 3D data, where the output is the symmetry axis (a 3-vector). The input data are typically 24x24x24 meshes. The total number of these meshes is of the order of a million. Biological motivation: symmetry is an important feature of protein tertiary and quaternary structures that has been associated with protein folding, function, evolution, and stability. Its emergence and ensuing prevalence have been attributed to gene duplications, fusion events, and subsequent evolutionary drift in sequence. Methods to detect these symmetries exist, based either on the structure or on the sequence of the proteins; however, we believe that they can be vastly improved.
* '''Data:''' Synthetic data are obtained by ‘symmetrizing’ folds from top8000 library (http://kinemage.biochem.duke.edu/databases/top8000.php).
* '''Data:''' Synthetic data are obtained by ‘symmetrizing’ folds from top8000 library (http://kinemage.biochem.duke.edu/databases/top8000.php).
-
* '''References:''': Our previous 3D CNN: [https://arxiv.org/abs/1801.06252] Invariance of CNNs (and references therein): [https://hal.inria.fr/hal-01630265/document], [https://arxiv.org/pdf/1706.03078.pdf]
+
* '''References:''' Our previous 3D CNN: [https://arxiv.org/abs/1801.06252] Invariance of CNNs (and references therein): [https://hal.inria.fr/hal-01630265/document], [https://arxiv.org/pdf/1706.03078.pdf]
-
* '''Basic algorithm:''' A prototype has already been created using the Tensorflow framework [4], which is capable to detect the order of cyclic structures with about 93% accuracy. The main goal of this internship is to optimize the topology of the current neural network prototype and make it rotational and translational invariant with respect to input data. [4] [https://www.tensorflow.org/]
+
* '''Base algorithm:''' A prototype has already been created using the Tensorflow framework [4], which is capable of detecting the order of cyclic structures with about 93% accuracy. The main goal of this internship is to optimize the topology of the current neural network prototype and make it rotationally and translationally invariant with respect to input data. [4] [https://www.tensorflow.org/]
* '''Solution:''' The network architecture needs to be modified according to the invariance properties (most importantly, rotational invariance); see [https://hal.inria.fr/hal-01630265/document], [https://arxiv.org/pdf/1706.03078.pdf]. The code is written using the Tensorflow library, and the current model is trained on a single GPU (Nvidia Quadro 4000) of a desktop machine.
* '''Novelty:''' Applications of convolutional networks to 3D data are still very challenging due to the large amount of data and the specific requirements on the network architecture. More specifically, the models need to be rotationally and translationally invariant, which makes classical 2D augmentation tricks only loosely applicable here. Thus, new models need to be developed for 3D data.
* '''Authors:''' Expert Sergei Grudinin, consultants Guillaume Pages, Strijov V.V.
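The "symmetrizing" step mentioned in the data description can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name <code>symmetrize_cyclic</code> is hypothetical, the fold is replaced by random voxels, and only cyclic orders 1, 2 and 4 about the z axis are supported, because <code>np.rot90</code> rotates by exact multiples of 90 degrees without interpolation. Other orders and arbitrary axes would need interpolated rotations.

```python
import numpy as np


def symmetrize_cyclic(density, order):
    """Average a cubic density map over the cyclic group C_order about the z axis.

    Assumption of this sketch: only orders 1, 2 and 4 are supported,
    because np.rot90 rotates by exact multiples of 90 degrees.
    """
    if order not in (1, 2, 4):
        raise ValueError("this sketch supports orders 1, 2 and 4 only")
    quarter_turns = 4 // order  # number of 90-degree steps per group element
    rotated = [
        np.rot90(density, k * quarter_turns, axes=(0, 1)) for k in range(order)
    ]
    return np.mean(rotated, axis=0)


rng = np.random.default_rng(0)
fold = rng.random((24, 24, 24))         # stand-in for a voxelized protein fold
sym = symmetrize_cyclic(fold, order=4)  # class label: order 4, axis (0, 0, 1)
```

Each symmetrized map then comes with its label for free: the order is the classification target and the rotation axis is the regression target.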
===10===
* '''Title:''' Semi-supervised representation learning with attention
* '''Problem:''' Training vector representations using the attention mechanism, which has significantly improved the quality of machine translation. It is proposed to use attention in an encoder-decoder network to obtain vectors for text fragments of arbitrary length.
* '''Data:''' It is proposed to consider two samples: Microsoft Paraphrase Corpus (a small set of sentences, https://www.microsoft.com/en-us/download/details.aspx?id=52398) and PPDB (a set of short segments whose labeling is not always correct, http://sitem.herts.ac.uk/aeru/ppdb/en/).
* '''References:'''
*# Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin. Attention Is All You Need (https://arxiv.org/abs/1706.03762).
*# John Wieting, Mohit Bansal, Kevin Gimpel, Karen Livescu. Towards Universal Paraphrastic Sentence Embeddings (https://arxiv.org/abs/1511.08198).
*# Ryan Kiros, Yukun Zhu, Ruslan Salakhutdinov, Richard S. Zemel, Antonio Torralba, Raquel Urtasun, Sanja Fidler. Skip-Thought Vectors (https://arxiv.org/abs/1506.06726).
*# Keras seq2seq (https://github.com/farizrahman4u/seq2seq).
* '''Base algorithm:''' the solution from [3] or vector representations obtained using seq2seq [].
* '''Solution:''' It is proposed to train vector representations for phrases using the attention mechanism and semi-supervised learning. As the internal quality functional, it is proposed to use the improved error function from [2]. As applied problems, paraphrase detection and sentiment analysis can be considered. Moreover, based on the results obtained in [1], it can be assumed that the attention mechanism has a greater influence on obtaining universal vectors for phrases than the network architecture does. It is proposed to test this hypothesis using two different architectures: a standard recurrent network and a feed-forward network.
* '''Novelty:''' new method.
* '''Authors:''' Rita Kuznetsova, consultant
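As an illustration of how attention can turn a fragment of arbitrary length into a fixed-size vector, here is a minimal NumPy sketch of scaled dot-product attention as defined in [1], softmax(QK^T / sqrt(d_k))V. The pooling function <code>embed_fragment</code> and the single query vector are assumptions made for this example, not part of the proposed solution; in a real model the query and the token vectors would be learned.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V and return it with the weights."""
    d_k = K.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k))
    return weights @ V, weights


def embed_fragment(token_vectors, query):
    """Hypothetical pooling: one query attends over all tokens of a fragment."""
    pooled, weights = scaled_dot_product_attention(
        query[None, :], token_vectors, token_vectors
    )
    return pooled[0], weights[0]


rng = np.random.default_rng(0)
query = rng.normal(size=8)            # would be a learned parameter
short = rng.normal(size=(3, 8))       # fragment of 3 tokens
long = rng.normal(size=(11, 8))       # fragment of 11 tokens
vec_short, w_short = embed_fragment(short, query)
vec_long, w_long = embed_fragment(long, query)
```

Both fragments are mapped to vectors of the same dimension, so they can be compared directly, for example in a paraphrase detection task.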
===11===
* '''Title:''' Selection of interpretable multimodels in credit scoring problems
* '''Problem:''' The credit scoring problem is to determine the level of creditworthiness of a borrower. For this, a borrower's questionnaire is used, containing both numerical (age, income) and categorical (gender, profession) features. Given historical information on loan repayment by other borrowers, it is required to determine whether the borrower will repay the loan. The data can be heterogeneous (for example, if a country has regions with different income levels), and several models will be needed for adequate classification. It is necessary to determine the optimal number of models. Based on the set of model parameters, a portrait of the borrower is to be drawn up.
* '''Data:''' It is proposed to consider five samples from the UCI and Kaggle repositories, each containing at least 50,000 objects.
* '''References:''' PhD thesis of A.A. Aduenko, \MLAlgorithms\PhDThesis; C. Bishop, Pattern Recognition and Machine Learning, final chapter; Twenty Years of Mixture of Experts.
* '''Base algorithm:''' Clustering followed by independent logistic regression models, AdaBoost, decision forest (with restrictions on complexity), mixture of experts.
* '''Solution:''' An algorithm is proposed for selecting a multimodel (a mixture of models or a mixture of experts) and determining the optimal number of models.
* '''Novelty:''' A distance function is proposed between models whose parameter distributions are defined on different supports.
* '''Authors:''' Goncharov Alexey, Strijov V.V.
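To make the notion of a mixture of experts concrete, the sketch below computes the predicted repayment probability as a gate-weighted sum of logistic regression experts. The function name, the random weights and the synthetic features are assumptions for illustration only; fitting the weights and choosing the number of experts, which is the actual subject of the problem, is not shown.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def mixture_of_experts_proba(X, expert_weights, gate_weights):
    """P(y=1|x) = sum_k g_k(x) * sigmoid(w_k . x), with softmax gates g_k."""
    gate_logits = X @ gate_weights.T                        # shape (n, K)
    gate_logits = gate_logits - gate_logits.max(axis=1, keepdims=True)
    gates = np.exp(gate_logits)
    gates = gates / gates.sum(axis=1, keepdims=True)        # rows sum to 1
    expert_proba = sigmoid(X @ expert_weights.T)            # shape (n, K)
    return (gates * expert_proba).sum(axis=1), gates


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))               # 5 synthetic borrowers, 3 features
expert_weights = rng.normal(size=(2, 3))  # K = 2 logistic experts (illustrative)
gate_weights = rng.normal(size=(2, 3))
proba, gates = mixture_of_experts_proba(X, expert_weights, gate_weights)
```

With a single expert the gate is identically one and the model reduces to ordinary logistic regression, which is why the number of experts acts as the complexity parameter to be selected.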
===12===
* '''Title:''' Generation of features that are invariant to changes in the frequency of the time series.
* '''Problem:''' Informally: there is a set of time series of a certain frequency (s1), and the information of interest is also distinguishable at a lower sampling rate (for example, samples are taken every millisecond, while the events of interest occur on intervals of 0.1 s). These series are integrated, reducing the frequency by a factor of 10 (i.e. every 10 values are simply summed), and a set of time series s2 is obtained. It is proposed to find frequency-dependent transformations of a time series such that the high-frequency series s1 and the lower-frequency series s2 are described in the same way. Formally: given a set of time series <tex>s_1, \dots, s_N \in S</tex> with a high sampling rate <tex>\nu_1</tex>. The target information (for example, hand movement, daily price fluctuation, …) is also distinguishable at a lower sampling rate <tex>\nu_2 < \nu_1</tex>. It is necessary to find a mapping <tex>f_\nu: S \to G</tex>, where <tex>\nu</tex> is the frequency of the series, that generates similar feature descriptions for series of different frequencies. That is,
<tex>f^* = \arg\min_f E\bigl(f_{\nu_1}(s_1) - f_{\nu_2}(s_2)\bigr)</tex>, where <tex>E</tex> is some error function.
* '''Data:''' Sets of time series of people's physical activity from accelerometers; human EEG time series; time series of energy consumption of cities and industrial facilities. Data sources: UCI repository, our EEG and accelerometer samples.
* '''References:''' See the accelerometer references above.
* '''Base algorithm:''' Fourier transform.
* '''Solution:''' Building an autoencoder with a partially fixed internal representation in the form of the same time series at a lower frequency.
* '''Novelty:''' For time series there is no “generally accepted approach” to analysis, in contrast, for example, to image analysis. Viewed abstractly, a cat is now detected just as well as a cat that occupies half as much space in the image. An analogy with time series suggests itself. Moreover, the nature of the data in images and in time series is similar: in images there is a hierarchy between values along two axes (x and y), and in time series along one, the time axis. The hypothesis is that methods similar to those of image analysis will give high-quality results. The resulting feature representation can then be used for classification and prediction of time series.
* '''Authors:''' R. G. Neichev, Strijov V.V.
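The integration step and the Fourier baseline can be sketched as follows. This is an illustrative assumption about what a frequency-aware feature map could look like, not the proposed autoencoder: the series is a synthetic 3 Hz sine, the function names are hypothetical, and the features are the normalized amplitude spectrum restricted to physical frequencies up to 20 Hz, so that the sampling rate enters only through the frequency grid.

```python
import numpy as np


def downsample_by_sum(series, factor):
    """Integrate the series: every `factor` consecutive values are summed."""
    usable = len(series) // factor * factor
    return series[:usable].reshape(-1, factor).sum(axis=1)


def spectral_features(series, sampling_rate_hz, max_hz=20.0):
    """Normalized amplitude spectrum on physical frequencies up to max_hz."""
    spectrum = np.abs(np.fft.rfft(series))
    freqs = np.fft.rfftfreq(len(series), d=1.0 / sampling_rate_hz)
    features = spectrum[freqs <= max_hz + 1e-9]  # tolerance for rounding
    return features / np.linalg.norm(features)


t = np.arange(2000) / 1000.0          # 2 seconds sampled every millisecond
s1 = np.sin(2 * np.pi * 3.0 * t)      # synthetic high-frequency series
s2 = downsample_by_sum(s1, 10)        # same signal at a 10 times lower rate
f1 = spectral_features(s1, 1000.0)
f2 = spectral_features(s2, 100.0)
error = np.linalg.norm(f1 - f2)       # plays the role of the error E
```

For this clean periodic signal the two descriptions coincide up to rounding. Real accelerometer or EEG series are not stationary, which is the motivation for learning the mapping instead of fixing it to a Fourier transform.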
===18===
* '''Title:''' Comparison of neural network and continuous-morphological methods in the text detection problem.
* '''Problem:''' Automatically detect text in natural images.
* '''Data:''' synthetically generated data + a prepared sample of photographs + [https://vision.cornell.edu/se3/coco-text-2/ COCO-Text dataset] + [http://www.machinelearning.ru/wiki/index.php?title=%D0%9A%D0%BE%D0%BD%D0%BA%D1%83%D1%80%D1%81_Avito.ru-2014:_%D1%80%D0%B0%D1%81%D0%BF%D0%BE%D0%B7%D0%BD%D0%B0%D0%B2%D0%B0%D0%BD%D0%B8%D0%B5_%D0%BA%D0%BE%D0%BD%D1%82%D0%B0%D0%BA%D1%82%D0%BD%D0%BE%D0%B9_%D0%B8%D0%BD%D1%84%D0%BE%D1%80%D0%BC%D0%B0%D1%86%D0%B8%D0%B8_%D0%BD%D0%B0_%D0%B8%D0%B7%D0%BE%D0%B1%D1%80%D0%B0%D0%B6%D0%B5%D0%BD%D0%B8%D1%8F%D1%85 Avito Competition 2014].
* '''References:''' [https://vision.cornell.edu/se3/wp-content/uploads/2016/01/1601.07140v1.pdf COCO benchmark], [https://vision.cornell.edu/se3/wp-content/uploads/2016/01/1601.07140v1.pdf one of the state-of-the-art architectures]
* '''Base algorithm:''' [https://github.com/eragonruan/text-detection-ctpn code] + morphological methods, [http://www.machinelearning.ru/wiki/images/f/f1/Avito.ru-2014_Ulyanov_presentation.pdf Avito 2014 winner's solution].
* '''Solution:''' It is proposed to compare the performance of several state-of-the-art algorithms, which need a large training set, with morphological methods, which require only a small amount of data. It is proposed to determine the limits of applicability of each group of methods.
* '''Novelty:''' to propose an algorithm that uses both neural network and morphological methods (a solution of the word detection problem).
* '''Authors:''' I.N. Zharikov.
* '''Expert:''' L.M. Mestetsky (morphological methods).
==2017 Group 2==
{|class="wikitable"
{|class="wikitable"
|-
! Letters
!<tex>\Sigma=3+13</tex>
|-
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Goncharovalex Goncharov Alexey]
|Metric classification of time series
|[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Goncharov2015MetricClassification/code code],
[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Goncharov2015MetricClassification/doc/Goncharov2015MetricClassification.pdf paper],
[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Goncharov2015MetricClassification/doc/GoncharovAlexey2015PresentationMetricClassification.pdf slides]
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Mpopova Maria Popova]
|Zadayanchuk Andrey
|BMF
|AILSBRCVTDSWH>
|-
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:white2302 Belykh Evgeny] [http://www.machinelearning.ru/wiki/index.php?title=Участник:Alladdin Proskurin Alexander]
|Classification of superpositions of movements of physical activity
|[https://github.com/Intelligent-Systems-Phystech/Group594/raw/master/ProskurinBelykh2018ClassificationOfPhysicalActivitySuperposition/ClassificationOfPhysicalActivitySuperposition.pdf paper]
[https://github.com/Intelligent-Systems-Phystech/Group594/raw/master/ProskurinBelykh2018ClassificationOfPhysicalActivitySuperposition/ProskurinBelykh2018Presentation.pdf slides]
[https://github.com/Intelligent-Systems-Phystech/Group594/tree/master/ProskurinBelykh2018ClassificationOfPhysicalActivitySuperposition/code code]
|Maria Vladimirova, Alexandra Malkova
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:IlyaSM Romanenko Ilya], [http://www.machinelearning.ru/wiki/index.php?title=Участник:popovkin Popovkin Andrey], [https://github.com/Intelligent-Systems-Phystech/Group594/raw/master/ProskurinBelykh2018ClassificationOfPhysicalActivitySuperposition/RomanenkoPopovkin2018ClassificationOfPhysicalActivitySuperposition_Review.pdf review]
[https://www.youtube.com/watch?v=QnjOlVVVu2k video]
|MF
|AILSBRC>V> [AILSBRC0VT0E0D0WS] CTD
|2+9
|-
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:zueva.nn Zueva Nadezhda]
|Style Change Detection
|[https://github.com/Intelligent-Systems-Phystech/Group594/raw/master/Zueva2018TextStyleTransfer/StyleChangeDetection%20(10).pdf paper]
[https://github.com/Intelligent-Systems-Phystech/Group594/blob/master/Zueva2018TextStyleTransfer/Zueva_Presentation_Plagiarism%20(2).pdf slides]
[https://www.youtube.com/watch?v=1-GWn5uYvsc video]
|Rita Kuznetsova
|Igashov Ilya, [https://drive.google.com/file/d/1I-IWRxh39VhZuU2FPzbJAwkqfdYRcqRV/view?usp=sharing review]
|BHMF
|AIL-S-B-R- [AILSBRCV0TE0D0WS]
|3+10
|-
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Igashov Igashov Ilya]
|Formulation and solution of an optimization problem combining classification and regression to estimate the binding energy of a protein and small molecules.
|[https://github.com/Intelligent-Systems-Phystech/Group594/raw/master/Igashov2018ProteinLigandComplexes/Igashov2018ProteinLigandComplexes.pdf paper]
[https://github.com/Intelligent-Systems-Phystech/Group594/raw/master/Igashov2018ProteinLigandComplexes/presentation/presentation.pdf slides]
[https://www.youtube.com/watch?v=U0rDFG0-lzE video]
|Sergei Grudinin, Maria Kadukova
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:vanderwardan Manucharyan Vardan], [https://github.com/Intelligent-Systems-Phystech/Group594/raw/master/Igashov2018ProteinLigandComplexes/Igashov2018ProteinLigandComplexes_Review.pdf review], [https://github.com/Intelligent-Systems-Phystech/Group594/raw/master/Igashov2018ProteinLigandComplexes/Igashov2018ProteinLigandComplexes_Correction.pdf correction]
|BHMF
|AILBS+BRHC>V> [AILSBRCVTE0D0WS]
|3+11
|-
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:kalugin_di Kalugin Dmitry]
|Graph Structure Prediction of a Neural Network Model
|[https://drive.google.com/file/d/1ZTP7Uhi622cj5BnItDmlz0k988Twd9UZ/view?usp=sharing paper]
[https://drive.google.com/file/d/1iErLatXyIoqjH9yDXBbATc9vuA_8dmgZ/view?usp=sharing slides]
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Oleg_Bakhteev Oleg Bakhteev]
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:zueva.nn Zueva Nadezhda] [https://drive.google.com/drive/u/1/folders/1SV29oCjnqnrmjZ_pb1iNGgukodwLk-Bf review]
|BHM
|AI-L-S--B0R0C0V0 [A-ILSBR0CVT0ED0WS]
|2+11
|-
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:vanderwardan Manucharyan Vardan]
|Prediction of properties and types of atoms in molecular graphs using convolutional networks
|[https://github.com/Intelligent-Systems-Phystech/Group594/raw/master/Manucharyan2018AtomicTypePredictionInUsingCNN/doc/Manucharyan2018AtomicTypePredictionInUsingCNN.pdf paper],
[https://github.com/Intelligent-Systems-Phystech/Group594/raw/master/Manucharyan2018AtomicTypePredictionInUsingCNN/slides/Manucharyan2018AtomicTypePredictionInUsingCNNPresentation.pdf slides],
[https://github.com/Intelligent-Systems-Phystech/Group594/blob/master/Manucharyan2018AtomicTypePredictionInUsingCNN/code/Manucharyan2018AtomicTypePredictionInUsingCNN.ipynb code]
[https://www.youtube.com/watch?v=sShO-zIbidE video]
|Sergei Grudinin, [http://www.machinelearning.ru/wiki/index.php?title=Участник:Kadukovam Maria Kadukova]
|Fattakhov Artur [https://github.com/Intelligent-Systems-Phystech/Group594/raw/master/Manucharyan2018AtomicTypePredictionInUsingCNN/rev.pdf review]
|BMF
|AILS>B> [AILSB0R0CV0TE0D0WS] VED
|3+7
|-
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:kirill_mouraviev Muraviev Kirill]
|Determination of neural network parameters to be optimized.
|[https://github.com/KirillMouraviev/science_publication/blob/master/doc/Muravyev2018ParameterOptimization.pdf paper],
[https://github.com/KirillMouraviev/science_publication/raw/master/doc/Muravyev2018FinalTalk.pdf slides],
[https://github.com/KirillMouraviev/science_publication/tree/master/code code]
[https://www.youtube.com/watch?v=1KkQnx249rU video]
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Oleg_Bakhteev Oleg Bakhteev]
|Kalugin Dmitry [https://github.com/Intelligent-Systems-Phystech/Group594/blob/master/Muravyev2018ParameterOptimization/Muravyev2018ParameterOptimization_Review.pdf review]
|BHMF
|A+IL-S-B-RCVTED [AILSBRCV0TE0DWS]
|3+12
|-
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:diraria Murzin Dmitry], [http://www.machinelearning.ru/wiki/index.php?title=Участник:andnlv Danilov Andrey]
|Text recognition based on skeletal representation of thick lines and convolutional networks
|[https://rawgit.com/Intelligent-Systems-Phystech/Group594/master/DanilovMurzin2018TextRecognitionUsingSkeletonRepresentationAndCNN/doc/DanilovMurzin2018TextRecognitionUsingSkeletonRepresentationAndCNN.pdf paper], [https://rawgit.com/Intelligent-Systems-Phystech/Group594/master/DanilovMurzin2018TextRecognitionUsingSkeletonRepresentationAndCNN/slides/DanilovMurzin2018TextRecognitionUsingSkeletonRepresentationAndCNN.pdf slides], [https://github.com/Intelligent-Systems-Phystech/Group594/tree/master/DanilovMurzin2018TextRecognitionUsingSkeletonRepresentationAndCNN/code code]
[video]
[video]
-
|[[Участник:Mest. М. Местецкий]], [[Участник:Ivan_Reyer|Иван Рейер]], Жариков И. Н.
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Mest L. M. Mestetsky], [http://www.machinelearning.ru/wiki/index.php?title=Участник:Ivan_Reyer Ivan Reyer], Zharikov I. N.
-
|[[Участник:kirill_mouraviev|Муравьев Кирилл]] [https://github.com/Intelligent-Systems-Phystech/Group594/blob/master/DanilovMurzin2018TextRecognitionUsingSkeletonRepresentationAndCNN/%D0%A0%D0%B5%D1%86%D0%B5%D0%BD%D0%B7%D0%B8%D1%8F.docx?raw=true review]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:kirill_mouraviev Muraviev Kirill] [https://github.com/Intelligent-Systems-Phystech/Group594/blob/master/DanilovMurzin2018TextRecognitionUsingSkeletonRepresentationAndCNN/%D0%A0%D0%B5%D1%86%D0%B5%D0%BD%D0%B7%D0%B8%D1%8F.docx?raw=true review]
|BHMF
|BHMF
|A+IL> [AILSB0R0CV0TE0D0WS]
|A+IL> [AILSB0R0CV0TE0D0WS]
|3+8
|3+8
-
|
 
|-
|-
-
|[[Участник:popovkin|Поповкин Андрей]] [[Участник:IlyaSM|Романенко Илья]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:popovkin Popovkin Andrey] [http://www.machinelearning.ru/wiki/index.php?title=Участник:IlyaSM Romanenko Ilya]
-
|Создание ранжирующих моделей для систем информационного поиска. Алгоритм прогнозирования структуры локально-оптимальных моделей
+
|Creation of ranking models for information retrieval systems. Algorithm for Predicting the Structure of Locally Optimal Models
|[https://github.com/Intelligent-Systems-Phystech/Group594/raw/master/PopovkinRomanenko2018PredictionStructureOfIRFunctions/PredictionStructureOfIRFunctions.pdf paper]
|[https://github.com/Intelligent-Systems-Phystech/Group594/raw/master/PopovkinRomanenko2018PredictionStructureOfIRFunctions/PredictionStructureOfIRFunctions.pdf paper]
[https://github.com/Intelligent-Systems-Phystech/Group594/raw/master/PopovkinRomanenko2018PredictionStructureOfIRFunctions/RomanenkoPopovkin2018Presentation.pdf slides]
[https://github.com/Intelligent-Systems-Phystech/Group594/raw/master/PopovkinRomanenko2018PredictionStructureOfIRFunctions/RomanenkoPopovkin2018Presentation.pdf slides]
[https://github.com/IlRomanenko/Information-retrieval code]
[https://github.com/IlRomanenko/Information-retrieval code]
[https://www.youtube.com/watch?v=wBUt1SIWDBA video]
[https://www.youtube.com/watch?v=wBUt1SIWDBA video]
-
|Кулунчаков Андрей, Strizhov V.V.
+
|Kulunchakov Andrey, Strijov V.V.
-
|[[Участник:Alladdin|Проскурин Александр]], [[Участник:White2302|Белых Евгений]], [https://github.com/Intelligent-Systems-Phystech/Group594/raw/master/PopovkinRomanenko2018PredictionStructureOfIRFunctions/ProskurinBelykh2018PredictionStructureOfIRFunctions_Review.doc review]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Alladdin Proskurin Alexander], [http://www.machinelearning.ru/wiki/index.php?title=Участник:white2302 Belykh Evgeny], [https://github.com/Intelligent-Systems-Phystech/Group594/raw/master/PopovkinRomanenko2018PredictionStructureOfIRFunctions/ProskurinBelykh2018PredictionStructureOfIRFunctions_Review.doc review]
|BHMF
|BHMF
|AILS0BC>V> [AILSBRC0VTED0WS]
|AILS0BC>V> [AILSBRC0VTED0WS]
|3+11
|3+11
-
|
 
|-
|-
-
|[[Участник:fartuk|Фаттахов Артур]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:fartuk Fattakhov Artur]
|Style Change Detection
|Style Change Detection
|[https://github.com/Intelligent-Systems-Phystech/Group594/raw/master/Fattakhov2018TextStyleTransfer/Fattakhov2018.pdf paper]
|[https://github.com/Intelligent-Systems-Phystech/Group594/raw/master/Fattakhov2018TextStyleTransfer/Fattakhov2018.pdf paper]
Line 2555: Line 3038:
[https://github.com/Intelligent-Systems-Phystech/Group594/tree/master/Fattakhov2018TextStyleTransfer/code code]
[https://github.com/Intelligent-Systems-Phystech/Group594/tree/master/Fattakhov2018TextStyleTransfer/code code]
[https://www.youtube.com/watch?v=PM5CmOmlAlw video]
[https://www.youtube.com/watch?v=PM5CmOmlAlw video]
-
|Рита Кузнецова
+
|Rita Kuznetsova
-
|Данилов Андрей, Мурзин Дмитрий, [https://rawgit.com/Intelligent-Systems-Phystech/Group594/master/Fattakhov2018TextStyleTransfer/review/Fattakhov2018_Review.pdf review]
+
|Danilov Andrey, Murzin Dmitry, [https://rawgit.com/Intelligent-Systems-Phystech/Group594/master/Fattakhov2018TextStyleTransfer/review/Fattakhov2018_Review.pdf review]
|BMF
|BMF
|AIL-S-B-R-CVTDSWH [AILSBRCVTE0D0WS]
|AIL-S-B-R-CVTDSWH [AILSBRCVTE0D0WS]
Line 2564: Line 3047:
-
=== Task 1 (1-2) ===
+
===1 (1-2) ===
-
* '''Name:''' Классификация суперпозиций движений физической активности
+
* '''Title:''' Classification of superpositions of physical activity movements
-
* '''Task''': Анализ поведения человека по измерениям датчиков мобильного телефона: по данным акселерометра определить движения человека. Данные акселерометра представляют собой сигнал, не имеющий точной периодики, который содержит неизвестную суперпозицию физических моделей. Будем рассматривать суперпозицию моделей: тело + рука/сумка/рюкзак.
+
* '''Problem:''' Analyze human behavior from mobile phone sensor measurements: determine a person's movements from accelerometer data. The accelerometer signal has no exact periodicity and contains an unknown superposition of physical models. We consider the superposition of models: body + arm/bag/backpack.
-
Классификация видов деятельности человека по измерениям фитнес-браслетов. По измерениям акселерометра and гироскопа требуется определить вид деятельности рабочего. Предполагается, что временные ряды измерений содержат элементарные движения, которые образуют кластеры в пространстве описаний временных рядов. (Развитие: Характерная продолжительность движения — секунды. Временные ряды размечены метками вида деятельности: работа, отдых. Характерная продолжительность деятельности — минуты. Требуется по описанию временного ряда and кластера восстановить вид деятельности.)
+
Classification of human activity types from fitness bracelet measurements. The type of a worker's activity must be determined from accelerometer and gyroscope measurements. It is assumed that the measurement time series contain elementary movements that form clusters in the space of time series descriptions. (Development: The characteristic duration of a movement is seconds. The time series are labeled with activity types: work, rest. The characteristic duration of an activity is minutes. The activity type must be recovered from the description of the time series and the cluster.)
* '''Data:'''
* '''Data:'''
-
** Собираются самостоятельно
+
*# Collected independently by the students
-
** Данные строителей
+
*# Construction workers' data
-
** Временные ряды акселерометра WISDM ([[Временной ряд (библиотека примеров)]], раздел Accelerometry).
+
*# WISDM accelerometer time series ([[Временной ряд (библиотека примеров)|Time series (examples library)]], Accelerometry section).
-
* '''References:''':
+
* '''References:'''
-
** Карасиков М. Е., Стрижов В. В. Классификация временных рядов в пространстве параметров порождающих моделей // Информатика and ее применения, 2016. [[http://strijov.com/papers/Karasikov2016TSC.pdf URL]]
+
*# Karasikov M. E., Strijov V. V. Classification of time series in the space of parameters of generating models // Informatics and its applications, 2016. [[http://strijov.com/papers/Karasikov2016TSC.pdf URL]]
-
** Кузнецов М. П., Ивкин Н. П. Алгоритм классификации временных рядов акселерометра по комбинированному признаковому описанию // Машинное обучение and анализ данных. 2015. T. 1, № 11. C. 1471—1483. [[http://jmlda.org/papers/doc/2015/no11/Ivkin2015TSclassification.pdf URL]]
+
*# Kuznetsov M.P., Ivkin N.P. Algorithm for classification of accelerometer time series by combined feature description // Machine Learning and Data Analysis. 2015. Vol. 1, No. 11. P. 1471-1483. [[http://jmlda.org/papers/doc/2015/no11/Ivkin2015TSclassification.pdf URL]]
-
** Исаченко Р. В., Стрижов В. В. Метрическое обучение в Taskх многоклассовой классификации временных рядов // Информатика and ее применения, 2016, 10(2) : 48-57. [[http://strijov.com/papers/Isachenko2016MetricsLearning.pdf URL]]
+
*# Isachenko R. V., Strijov V. V. Metric learning in multiclass time series classification problems // Informatics and its applications, 2016, 10(2): 48-57. [[http://strijov.com/papers/Isachenko2016MetricsLearning.pdf URL]]
-
** Задаянчук А. И., Попова М. С., Стрижов В. В. Выбор оптимальной модели классификации физической активности по измерениям акселерометра // Информационные технологии, 2016. [[http://strijov.com/papers/Zadayanchuk2015OptimalNN4.pdf URL]]
+
*# Zadayanchuk A.I., Popova M.S., Strijov V.V. Choice of the optimal model for classifying physical activity based on accelerometer measurements // Information technologies, 2016. [[http://strijov.com/papers/Zadayanchuk2015OptimalNN4.pdf URL]]
-
** Motrenko A.P., Strijov V.V. Extracting fundamental periods to segment human motion time series // Journal of Biomedical and Health Informatics, 2016, Vol. 20, No. 6, 1466—1476. [[http://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group874/Motrenko2014TSsegmentation/JBHI/MotrenkoStrijov2014RV2.pdf?format=raw URL]]
+
*# Motrenko A.P., Strijov V.V. Extracting fundamental periods to segment human motion time series // Journal of Biomedical and Health Informatics, 2016, Vol. 20, no. 6, 1466-1476. [[http://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group874/Motrenko2014TSsegmentation/JBHI/MotrenkoStrijov2014RV2.pdf?format=raw URL]]
-
** Ignatov A., Strijov V. Human activity recognition using quasiperiodic time series collected from a single triaxial accelerometer // Multimedia Tools and Applications, 2015, 17.05.2015 : 1-14. [[http://strijov.com/papers/Ignatov2015HumanActivity.pdf URL]]
+
*# Ignatov A., Strijov V. Human activity recognition using quasiperiodic time series collected from a single triaxial accelerometer // Multimedia Tools and Applications, 2015, 17.05.2015 : 1-14. [[http://strijov.com/papers/Ignatov2015HumanActivity.pdf URL]]
-
* '''Basic algorithm:''' Basic algorithm описан в работах [Карасиков, Стрижов: 2016] and [Кузнецов, Ивкин: 2014].
+
* '''Base algorithm:''' The base algorithm is described in [Karasikov, Strijov: 2016] and [Kuznetsov, Ivkin: 2015].
-
* '''Solution:''' Найти оптимальный способ сегментации and оптимальное описание временного ряда. Построить метрическое пространство описаний элементарных движений.
+
* '''Solution:''' Find the optimal segmentation method and optimal description of the time series. Construct a metric space of descriptions of elementary motions.
-
* '''Novelty:''' Предложен способ классификации and анализа сложных движений (Развитие: Соединение двух характеристических времен описания жизни человека, комбинированная постановка задачи.)
+
* '''Novelty:''' A method for classifying and analyzing complex movements is proposed. (Development: linking the two characteristic time scales used to describe a person's activity; a combined problem statement.)
-
* '''Authors:''' Александра Малькова, Мария Владимирова, R. G. Neichev, Strizhov V.V.,
+
* '''Authors:''' Alexandra Malkova, Maria Vladimirova, R. G. Neichev, Strijov V.V.
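
The segmentation and description steps named in the solution can be illustrated with a minimal sketch. It assumes fixed-width sliding windows and three hand-picked features per window; the functions <code>segment</code>, <code>describe</code> and <code>distance</code> are hypothetical names introduced only for this illustration and are not the method of the cited papers.

```python
import math
from statistics import mean, pstdev


def segment(signal, width, step):
    """Split a 1-D signal into fixed-width windows taken every `step` samples."""
    return [signal[i:i + width] for i in range(0, len(signal) - width + 1, step)]


def describe(window):
    """Map one window to a feature vector: mean, standard deviation,
    and the rate at which the signal crosses its own mean."""
    m = mean(window)
    crossings = sum(1 for a, b in zip(window, window[1:]) if (a - m) * (b - m) < 0)
    return (m, pstdev(window), crossings / (len(window) - 1))


def distance(u, v):
    """Euclidean distance between two window descriptions (an assumed metric)."""
    return math.dist(u, v)


# Toy accelerometer-like signal: a slow segment followed by a fast oscillation.
signal = [0.0] * 8 + [1.0, -1.0] * 4
descriptions = [describe(w) for w in segment(signal, width=4, step=4)]
```

Windows whose descriptions lie close to each other under <code>distance</code> would be candidates for the same elementary movement; choosing the window width and the feature set is exactly the open part of the task.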
-
=== Task 2 (1) ===
+
===2 (1) ===
-
* '''Name:''' Сравнение нейросетевых and непрерывно-морфологических методов в задаче детекции текста (Text Detection).
+
* '''Title:''' Comparison of neural network and continuous morphological methods in the text detection problem.
-
* '''Task''': Automatically Detect Text in Natural Images.
+
* '''Problem:''' Automatically Detect Text in Natural Images.
-
* '''Data:''' синтетические сгенерированные данные + подготовленная выборка фотографий + [https://vision.cornell.edu/se3/coco-text-2/ COCO-Text dataset] + [http://www.machinelearning.ru/wiki/index.php?title=%D0%9A%D0%BE%D0%BD%D0%BA%D1%83%D1%80%D1%81_Avito.ru-2014:_%D1%80%D0%B0%D1%81%D0%BF%D0%BE%D0%B7%D0%BD%D0%B0%D0%B2%D0%B0%D0%BD%D0%B8%D0%B5_%D0%BA%D0%BE%D0%BD%D1%82%D0%B0%D0%BA%D1%82%D0%BD%D0%BE%D0%B9_%D0%B8%D0%BD%D1%84%D0%BE%D1%80%D0%BC%D0%B0%D1%86%D0%B8%D0%B8_%D0%BD%D0%B0_%D0%B8%D0%B7%D0%BE%D0%B1%D1%80%D0%B0%D0%B6%D0%B5%D0%BD%D0%B8%D1%8F%D1%85 Конкурс Avito 2014].
+
* '''Data:''' synthetically generated data + prepared sample of photographs + [https://vision.cornell.edu/se3/coco-text-2/ COCO-Text dataset] + [http://www.machinelearning.ru/wiki/index.php?title=%D0%9A%D0%BE%D0%BD%D0%BA%D1%83%D1%80%D1%81_Avito.ru-2014:_%D1%80%D0%B0%D1%81%D0%BF%D0%BE%D0%B7%D0%BD%D0%B0%D0%B2%D0%B0%D0%BD%D0%B8%D0%B5_%D0%BA%D0%BE%D0%BD%D1%82%D0%B0%D0%BA%D1%82%D0%BD%D0%BE%D0%B9_%D0%B8%D0%BD%D1%84%D0%BE%D1%80%D0%BC%D0%B0%D1%86%D0%B8%D0%B8_%D0%BD%D0%B0_%D0%B8%D0%B7%D0%BE%D0%B1%D1%80%D0%B0%D0%B6%D0%B5%D0%BD%D0%B8%D1%8F%D1%85 Avito Competition 2014].
-
* '''References:''': [https://vision.cornell.edu/se3/wp-content/uploads/2016/01/1601.07140v1.pdf COCO benchmark], [https://vision.cornell.edu/se3/wp-content/uploads/2016/01/1601.07140v1.pdf One of a state-of-the-art architecture]
+
* '''References:''' [https://vision.cornell.edu/se3/wp-content/uploads/2016/01/1601.07140v1.pdf COCO benchmark], [https://vision.cornell.edu/se3/wp-content/uploads/2016/01/1601.07140v1.pdf one of the state-of-the-art architectures]
-
* '''Basic algorithm:''' [https://github.com/eragonruan/text-detection-ctpn code] + морфологические методы, [http://www.machinelearning.ru/wiki/images/f/f1/Avito.ru-2014_Ulyanov_presentation.pdf Avito 2014 winner’s solution].
+
* '''Base algorithm:''' [https://github.com/eragonruan/text-detection-ctpn code] + morphological methods, [http://www.machinelearning.ru/wiki/images/f/f1/Avito.ru-2014_Ulyanov_presentation.pdf Avito 2014 winner's solution].
-
* '''Solution:''' Предлагается сравнить работы нескольких state-of-the-art алгоритмов, которым нужна обширная обучающая выборка, с морфологическими методы, требующие небольшого числа данных. Предлагается определить границы применимости тех или иных методов.
+
* '''Solution:''' It is proposed to compare the performance of several state-of-the-art algorithms that need a large training set with morphological methods that require a small amount of data. It is proposed to determine the limits of applicability of certain methods.
-
* '''Novelty:''' предложить алгоритм, основанный на использовании как нейросетевых, так and морфологических методов (решение задачи word detection).
+
* '''Novelty:''' An algorithm that combines neural network and morphological methods is proposed (for the word detection problem).
-
* '''Authors:''' И. Н. Жариков.
+
* '''Authors:''' I. N. Zharikov.
-
* '''Expert''': Л. М. Местецкий (морфологические методы).
+
* '''Expert''': L. M. Mestetsky (morphological methods).
-
=== Task 3 (1-2) ===
+
===3 (1-2) ===
-
* '''Name:''' Распознавание текста на основе скелетного представления толстых линий and сверточных сетей
+
* '''Title:''' Text recognition based on skeletal representation of thick lines and convolutional networks
-
* '''Task''': Требуется построить две CNN, одна распознает растровое представление изображения, другая векторное. (Развитие: порождение толстых линий нейросетями)
+
* '''Problem:''' It is required to build two CNNs, one recognizes a bitmap representation of an image, the other a vector one. (Development: generation of thick lines by neural networks)
-
* '''Data:''' Шрифты в растровом представлении.
+
* '''Data:''' Fonts in raster (bitmap) representation.
-
* '''References:''': Список работ [http://www.machinelearning.ru/wiki/images/a/a2/Morozov2017Synthesis_of_medicines.pdf], в частности arXiv:1611.03199 и
+
* '''References:''' List of works [http://www.machinelearning.ru/wiki/images/a/a2/Morozov2017Synthesis_of_medicines.pdf], in particular arXiv:1611.03199 and
-
* '''Basic algorithm''': Сверточная сеть для растрового изображения.
+
* '''Basic algorithm''': A convolutional network for the raster image.
-
* '''Solution:''' Требуется предложить способ свертывания графовых структур, позволяющий породить информативное описание скелета толстой линии.
+
* '''Solution:''' It is required to propose a method of convolution over graph structures that yields an informative description of the skeleton of a thick line.
-
* '''Novelty:''' Предложен способ повышения качества распознавания толстых линий за счет нового способа порождения их описаний.
+
* '''Novelty:''' A way to improve the recognition quality of thick lines through a new method of generating their descriptions is proposed.
-
* '''Authors:''' Л. М. Местецкий, И. А. Рейер, Strizhov V.V.
+
* '''Authors:''' L. M. Mestetsky, I. A. Reyer, Strijov V.V.
-
=== Task 4 (1-2) ===
+
===4 (1-2) ===
-
* '''Name:''' Создание ранжирующих моделей для систем информационного поиска. Алгоритм прогнозирования структуры локально-оптимальных моделей
+
* '''Title:''' Creation of ranking models for information retrieval systems. Algorithm for Predicting the Structure of Locally Optimal Models
-
* '''Task''': Требуется спрогнозировать временной ряд с помощью некоторой параметрической суперпозицией алгебраических функций. Предлагается не стоить прогностическую модель, а спрогнозировать ее, то есть предсказать структуру аппроксимирующей суперпозиции. Вводится класс рассматриваемых суперпозиций, and на множестве таких структурных описаний проводится поиск локально-оптимальной модели для рассматриваемой задачи. Task состоит в 1) поиске подходящего структурного описания модели 2) описания алгоритма поиска той структуры, которая будет соответствовать оптимальной модели 3) описания алгоритма обратного построения модели по ее структурному описанию. В качестве уже имеющегося примера ответа на вопросы 1-3, смотри работы А. А. Варфоломеевой.
+
* '''Problem:''' It is required to forecast a time series using a parametric superposition of algebraic functions. It is proposed not to build the forecasting model directly but to predict it, that is, to predict the structure of the approximating superposition. A class of admissible superpositions is introduced, and a locally optimal model for the problem is sought over the set of their structural descriptions. The problem consists of 1) finding a suitable structural description of a model, 2) describing an algorithm that searches for the structure corresponding to the optimal model, and 3) describing an algorithm that reconstructs a model from its structural description. For an existing example that answers questions 1-3, see the works of A. A. Varfolomeeva.
* '''Data:'''
* '''Data:'''
-
** Коллекция текстовых документов TREC (!)
+
*# TREC collection of text documents (!)
-
** Набор временных рядов, который подразумевает восстановление функциональных зависимостей. Предлагается сначала использовать синтетические данные или сразу применить алгоритм к прогнозированию временных рядов 1) потребления электроэнергии 2) физической активности с последующим анализом получающихся структур.
+
*# A set of time series for which functional dependencies are to be recovered. It is proposed to start with synthetic data or to apply the algorithm directly to forecasting time series of 1) electricity consumption and 2) physical activity, followed by an analysis of the resulting structures.
-
* '''References:''':
+
* '''References:'''
-
** (!) Kulunchakov A.S., Strijov V.V. Generation of simple structured Information Retrieval functions by genetic algorithm without stagnation // [http://strijov.com/papers/Kulunchakov2014RankingBySimpleFun.pdf Expert Systems with Applications, 2017, 85 : 221—230.]
+
*# (!) Kulunchakov A.S., Strijov V.V. Generation of simple structured Information Retrieval functions by genetic algorithm without stagnation // [http://strijov.com/papers/Kulunchakov2014RankingBySimpleFun.pdf Expert Systems with Applications, 2017, 85: 221–230.]
-
** А. А. Варфоломеева Выбор признаков при разметке библиографических списков методами структурного обучения, 2013, [http://www.machinelearning.ru/wiki/images/f/f2/Varfolomeeva2013Diploma.pdf?format=raw]
+
*# A. A. Varfolomeeva Selection of features when marking up bibliographic lists using structural learning methods, 2013, [http://www.machinelearning.ru/wiki/images/f/f2/Varfolomeeva2013Diploma.pdf?format=raw]
-
** Bin Cao, Ying Li and Jianwei Yin Measuring Similarity between Graphs Based on the Levenshtein Distance, 2012, [http://naturalspublishing.com/files/published/92cn7jm44d8wt1.pdf?format=raw]
+
*# Bin Cao, Ying Li and Jianwei Yin Measuring Similarity between Graphs Based on the Levenshtein Distance, 2012, [http://naturalspublishing.com/files/published/92cn7jm44d8wt1.pdf?format=raw]
-
* '''Basic algorithm:''' Конкретно к предлагаемой проблеме базового алгоритма нет. Предлагается попробовать повторить эксперимент А. А. Варфоломеевой для другого структурного описания, чтобы понять, что происходит.
+
* '''Base algorithm:''' There is no base algorithm specifically for the proposed problem. It is proposed to repeat the experiment of A. A. Varfolomeeva with a different structural description in order to understand what is happening.
-
* '''Solution:''' Суперпозиция алгебраических функций задает ордерево, на вершинах которого заданы метки соответствующих алгебраических функций или переменных. Поэтому структурным описанием такой суперпозиции может являться ее DFS-code. Это строка, состоящая из меток вершин, записанных в порядке обхода дерева поиском в глубину. Зная арности соответствующих алгебраических функций, можем любой такой DFS-code восстановить за O(n) and получить обратно суперпозицию функций. На множестве подобных строковых описаний предлагается искать то строковое описание, которое будет соответствовать оптимальной модели.
+
* '''Solution:''' A superposition of algebraic functions defines a rooted directed tree whose vertices are labeled with the corresponding algebraic functions or variables. Its DFS-code can therefore serve as the structural description of the superposition: a string of vertex labels written in the order of a depth-first traversal of the tree. Knowing the arities of the algebraic functions, any such DFS-code can be decoded in O(n) to recover the superposition. It is proposed to search the set of such string descriptions for the one that corresponds to the optimal model.
-
* '''Authors:''' Кулунчаков Андрей, Strizhov V.V.
+
* '''Authors:''' Kulunchakov Andrey, Strijov V.V.
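
The DFS-code idea from the solution can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function set and its arities in <code>ARITY</code> are hypothetical, and a tree is represented as a nested <code>(label, children)</code> tuple.

```python
# Hypothetical function set: label -> arity (variables and parameters have arity 0).
ARITY = {"+": 2, "*": 2, "sin": 1, "x": 0, "w": 0}


def encode(tree):
    """Return the DFS-code: vertex labels in depth-first (preorder) order."""
    label, children = tree
    code = [label]
    for child in children:
        code.extend(encode(child))
    return code


def decode(code):
    """Rebuild the tree from its DFS-code; each label is consumed exactly once,
    so decoding takes O(n) steps for a code of length n."""
    pos = 0

    def build():
        nonlocal pos
        if pos >= len(code):
            raise ValueError("DFS-code ended before the tree was complete")
        label = code[pos]
        pos += 1
        # The arity tells how many subtrees follow this label.
        children = [build() for _ in range(ARITY[label])]
        return (label, children)

    tree = build()
    if pos != len(code):
        raise ValueError("DFS-code has trailing labels")
    return tree


# sin(x) + w * x
code = ["+", "sin", "x", "*", "w", "x"]
tree = decode(code)
```

Because the arities make the code uniquely decodable, a search over strings (for example, by mutation of labels) can be mapped back to models, provided that invalid strings are rejected as <code>decode</code> does here.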
-
=== Task 5 (1) ===
+
===5 (1) ===
-
* '''Name:''' Определение параметров нейросети, подлежащих оптимизации.
+
* '''Title:''' Identifying the neural network parameters that are subject to optimization.
-
* '''Task''': Рассматривается Task оптимизации нейросети. Требуется разделить параметры модели на две группы:
+
* '''Problem:''' The problem of neural network optimization is considered. It is required to divide the model parameters into two groups:
-
** а) Параметры модели, подлежащие оптимизации
+
*# a) Model parameters to be optimized
-
** б) Параметры модели, оптимизация которых завершилась. Дальнейшая оптимизация данных параметров не даст улучшения качества модели.
+
*# b) Model parameters whose optimization has been completed. Further optimization of these parameters will not improve the quality of the model.
-
Предлагается рассматривать оптимизацию параметров как стохастический процесс. Основываясь на истории процесса найдем те параметры, чья оптимизация больше не требуется.
+
It is proposed to consider the optimization of parameters as a stochastic process. Based on the history of the process, we find those parameters whose optimization is no longer required.
-
* '''Data:''' Выборка рукописных цифр MNIST
+
* '''Data:''' The MNIST dataset of handwritten digits
-
* '''Basic algorithm''': Случайный выбор параметров.
+
* '''Basic algorithm''': Random choice of parameters.
-
* '''References:''':
+
* '''References:'''
-
** [https://arxiv.org/pdf/1704.04289.pdf] SGD как стохастический процесс.
+
*# [https://arxiv.org/pdf/1704.04289.pdf] SGD as a stochastic process.
-
** [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.704.7138&rep=rep1&type=pdf] Вариационный вывод в нейросетях.
+
*# [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.704.7138&rep=rep1&type=pdf] Variational inference in neural networks.
-
* '''Novelty:''' полученный алгоритм позволит существенно снизить вычислительную стоимость оптимизации нейросетей. Возможным дальнейшим развитием метода является получение оценок на параметры сети, полученной из исходной операциями расширения, сжатия, добавления and удаления слоев.
+
* '''Novelty:''' The resulting algorithm will significantly reduce the computational cost of optimizing neural networks. A possible further development of the method is to obtain estimates for the parameters of a network derived from the original one by the operations of expansion, compression, and addition and removal of layers.
-
* '''Authors:''' Бахтеев Олег, Strizhov V.V.
+
* '''Authors:''' Oleg Bakhteev, Strijov V.V.
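
The idea of using the optimization history to split parameters into the two groups can be sketched with a simple heuristic. This is only an assumed stand-in for the stochastic-process analysis the task calls for: a parameter is treated as settled when the mean of its recent values has stopped drifting and their spread is small. The function name, <code>window</code> and <code>tol</code> are hypothetical.

```python
from statistics import mean, pstdev


def settled_parameters(history, window=5, tol=1e-3):
    """Return indices of parameters that look settled.

    history: list of parameter vectors, one per optimization step.
    A parameter is flagged when, over the last `window` steps, its mean differs
    from the mean over the preceding `window` steps by less than `tol`
    and its standard deviation is below `tol`.
    """
    if len(history) < 2 * window:
        return []  # not enough history to compare two windows
    settled = []
    for j in range(len(history[0])):
        previous = [step[j] for step in history[-2 * window:-window]]
        recent = [step[j] for step in history[-window:]]
        drift = abs(mean(recent) - mean(previous))
        if drift < tol and pstdev(recent) < tol:
            settled.append(j)
    return settled


# Parameter 0 is constant, parameter 1 is still moving by 0.1 per step.
history = [[1.0, 0.1 * t] for t in range(10)]
frozen = settled_parameters(history)
```

Parameters returned by such a test could be excluded from further gradient updates; the random choice named as the basic algorithm corresponds to replacing this test with a random subset.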
-
=== Task 6 (1) ===
+
===6 (1) ===
-
* '''Name:''' Предсказание графовой структуры нейросетевой модели.
+
* '''Title:''' Prediction of the graph structure of the neural network model.
-
* '''Task''': Рассматривается Task нахождения устойчивой (и не избыточной по параметрам) структуры сверточной нейросети. Предлагается предсказывать структуру нейросети с использованием doubly-recurrent нейросетей. В качестве обучающей выборки предлагается использовать структуры моделей, показавших хорошее качество на подвыборках небольшой мощности.
+
* '''Problem:''' The problem of finding a stable (and not over-parameterized) structure of a convolutional neural network is considered. It is proposed to predict the structure of the network using doubly-recurrent neural networks. The training sample consists of the structures of models that showed good quality on small subsamples.
-
* '''Data:''' Выборки MNIST, CIFAR-10
+
* '''Data:''' The MNIST and CIFAR-10 datasets
-
* '''Basic algorithm''': случайный поиск. Возможно сравнение с работами по обучению с подкреплением.
+
* '''Basic algorithm''': random search. Comparison with work on reinforcement learning is possible.
-
* '''References:''':
+
* '''References:'''
-
** [https://pdfs.semanticscholar.org/e7bd/0e7a7ee6b0904d5de6e76e095a6a3b88dd12.pdf] doubly-recurrent нейросети.
+
*# [https://pdfs.semanticscholar.org/e7bd/0e7a7ee6b0904d5de6e76e095a6a3b88dd12.pdf] doubly-recurrent neural networks.
-
** [https://arxiv.org/pdf/1707.07012] Схожий подход с использованием обучения с подкреплением.
+
*# [https://arxiv.org/pdf/1707.07012] Similar approach using reinforcement learning.
-
* '''Authors:''' Бахтеев Олег, Strizhov V.V.
+
* '''Authors:''' Oleg Bakhteev, Strijov V.V.
-
=== Task 7 (1) ===
+
===7 (1) ===
-
* '''Name:''' Style Change Detection.
+
* '''Title:''' Style Change Detection.
-
* '''Task''': Дана коллекция документов, требуется определить, написан ли каждый документ одним автором, или несколькими (http://pan.webis.de/clef18/pan18-web/author-identification.html).
+
* '''Problem:''' Given a collection of documents, it is required to determine if each document is written by one author or by several (http://pan.webis.de/clef18/pan18-web/author-identification.html).
* '''Data:''' PAN 2018 (http://pan.webis.de/clef18/pan18-web/author-identification.html)
* '''Data:''' PAN 2018 (http://pan.webis.de/clef18/pan18-web/author-identification.html)
PAN 2017 (http://pan.webis.de/clef17/pan17-web/author-identification.html)
PAN 2017 (http://pan.webis.de/clef17/pan17-web/author-identification.html)
PAN 2016 (http://pan.webis.de/clef16/pan16-web/author-identification.html)
PAN 2016 (http://pan.webis.de/clef16/pan16-web/author-identification.html)
-
* '''References:''':
+
* '''References:'''
-
1. Ian Goodfellow. NIPS 2016 Tutorial: Generative Adversarial Networks (https://arxiv.org/pdf/1701.06547.pdf)
+
*# Ian Goodfellow. NIPS 2016 Tutorial: Generative Adversarial Networks (https://arxiv.org/pdf/1701.00160.pdf)
-
2. Jiwei Li, Will Monroe, Tianlin Shi, Sebastien Jean, Alan Ritter and Dan Jurafsky. Adversarial Learning for Neural Dialogue Generation(https://arxiv.org/pdf/1701.06547.pdf)
+
*# Jiwei Li, Will Monroe, Tianlin Shi, Sebastien Jean, Alan Ritter and Dan Jurafsky. Adversarial Learning for Neural Dialogue Generation (https://arxiv.org/pdf/1701.06547.pdf)
-
3. M. Kuznetsov, A. Motrenko, R. Kuznetsova, V. Strijov. Methods for Intrinsic Plagiarism Detection and Author Diarization (https://pdfs.semanticscholar.org/1011/6d82a8438c78877a8a142be47c4ee8662138.pdf)
+
*# M. Kuznetsov, A. Motrenko, R. Kuznetsova, V. Strijov. Methods for Intrinsic Plagiarism Detection and Author Diarization
-
4. K. Safin, R. Kuznetsova. Style Breach Detection with Neural Sentence Embeddings (https://pdfs.semanticscholar.org/c70e/7f8fbc561520accda7eea2f9bbf254edb255.pdf)
+
*# K. Safin, R. Kuznetsova. Style Breach Detection with Neural Sentence Embeddings (https://pdfs.semanticscholar.org/c70e/7f8fbc561520accda7eea2f9bbf254edb255.pdf)
-
* '''Basic algorithm''': решение, описанное в [3, 4].
+
* '''Basic algorithm''': solution described in [3, 4].
-
* '''Solution:''' предлагается решать задачу, используя generative adversarial networks — генеративная модель порождает тексты в одном авторском стиле, дискриминативная модель — бинарный классификатор.
+
* '''Solution:''' It is proposed to solve the problem using generative adversarial networks: the generative model produces texts in a single author's style, and the discriminative model is a binary classifier.
-
* '''Novelty:''' предполагается, что решение этой задачи предлагаемым методом может дать прирост качества по сравнению с типичными методами решениями этой задачи, а также связанных с ней задач кластеризации авторов.
+
* '''Novelty:''' It is assumed that solving this problem with the proposed method can improve quality compared with typical methods for this problem and for the related author clustering problems.
-
* '''Authors:''' Рита Кузнецова (consultant), Strizhov V.V.
+
* '''Authors:''' Rita Kuznetsova (consultant), Strijov V.V.
-
=== Task 8 (1) ===
+
===8 (1) ===
-
* '''Name:''' Получение оценок правдоподобия с использованием автокодировщиков
+
* '''Title:''' Obtaining likelihood estimates using autoencoders
-
* '''Task''': предполагается, что рассматриваемые объекты подчиняются гипотезе многообразия (manifold learning) — вектора высокий размерности сосредоточились вокруг некоторого подпространства меньшей размерности. Работы [1, 2] показывают, что некоторые модификации автокодировщиков ищут k-мерное многообразие в пространстве объектов, которое наиболее полно передает структуру данных. В работе [2] выводится оценка плотности вероятности данных с помощью автокодировщика. Требуется получить эту оценку на правдоподобие модели.
+
* '''Problem:''' It is assumed that the objects under consideration obey the manifold hypothesis (manifold learning): high-dimensional vectors are concentrated around some subspace of lower dimension. Works [1, 2] show that some modifications of autoencoders search for a k-dimensional manifold in the object space that most fully conveys the structure of the data. In [2], an estimate of the probability density of the data is derived using an autoencoder. It is required to obtain such an estimate for the likelihood of the model.
-
* '''Data:''' предлагается провести эксперимент на коротких текстовых фрагментах Google ngrams (http://storage.googleapis.com/books/ngrams/books/datasetsv2.html)
+
* '''Data:''' it is proposed to experiment on short text fragments of Google ngrams (http://storage.googleapis.com/books/ngrams/books/datasetsv2.html)
-
* '''References:''':
+
* '''References:'''
-
## Pascal Vincent, Hugo Larochelle, Isabelle Lajoie, Yoshua Bengio, Pierre-Antoine Manzagol. Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion (http://www.jmlr.org/papers/volume11/vincent10a/vincent10a.pdf).
+
*# Pascal Vincent, Hugo Larochelle, Isabelle Lajoie, Yoshua Bengio, Pierre-Antoine Manzagol. Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion (http://www.jmlr.org/papers/volume11/vincent10a/vincent10a.pdf).
-
## Guillaume Alain, Yoshua Bengio. What Regularized Auto-Encoders Learn from the Data Generating Distribution (https://arxiv.org/pdf/1211.4246.pdf)
+
*# Guillaume Alain, Yoshua Bengio. What Regularized Auto-Encoders Learn from the Data Generating Distribution (https://arxiv.org/pdf/1211.4246.pdf)
-
## Hanna Kamyshanska, Roland Memisevic. The Potential Energy of an Autoencoder (https://www.iro.umontreal.ca/~memisevr/pubs/AEenergy.pdf)
+
*# Hanna Kamyshanska, Roland Memisevic. The Potential Energy of an Autoencoder (https://www.iro.umontreal.ca/~memisevr/pubs/AEenergy.pdf)
* '''Basic algorithm''':
* '''Basic algorithm''':
-
* '''Solution:''' в задаче предлагается обучить векторные представления для фраз (n-грамм) с использованием автокодировщика, с помощью теоремы 2 в работе [2] получить оценку на правдоподобие выборки и, с помощью этой оценки, вывести правдоподобие модели. С помощью полученных оценок можно также рассмотреть процесс сэмплирования.
+
* '''Solution:''' it is proposed to train vector representations of phrases (n-grams) with an autoencoder, to use Theorem 2 in [2] to obtain an estimate of the likelihood of the sample, and to derive the likelihood of the model from this estimate. The resulting estimates can also be used to study the sampling process.
-
* '''Novelty:''' получение оценок правдоподобия данных and правдоподобия модели, порождение текстов с помощью полученных оценок.
+
* '''Novelty:''' obtaining data and model likelihood estimates, generating texts using the resulting estimates.
-
* '''Authors:''' Рита Кузнецова (consultant).
+
* '''Authors:''' Rita Kuznetsova (consultant).
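The link between an autoencoder and a density estimate used in this problem can be illustrated on a toy case. The sketch below assumes a denoising autoencoder with Gaussian corruption of scale <code>noise_std</code>, for which the reconstruction residual divided by the noise variance approximates the gradient of the log-density, as discussed in [2]. Instead of a trained network it uses the closed-form optimal denoiser for one-dimensional Gaussian data. All function names are hypothetical, and the code does not reproduce the statement of Theorem 2 itself.

```python
import numpy as np


def optimal_gaussian_denoiser(x, data_std, noise_std):
    """Closed-form optimal reconstruction E[clean | noisy = x] for
    clean ~ N(0, data_std^2) corrupted by N(0, noise_std^2) noise.
    It stands in for a trained denoising autoencoder (assumption)."""
    shrink = data_std ** 2 / (data_std ** 2 + noise_std ** 2)
    return shrink * x


def score_from_reconstruction(x, reconstruct, noise_std):
    """Estimate d/dx log p(x) as (r(x) - x) / noise_std^2."""
    return (reconstruct(x) - x) / noise_std ** 2


def log_density_from_score(grid, score_values):
    """Integrate the score along a 1D grid (trapezoid rule) and
    normalise so that the density integrates to one on the grid."""
    steps = np.diff(grid)
    increments = 0.5 * (score_values[1:] + score_values[:-1]) * steps
    log_p = np.concatenate([[0.0], np.cumsum(increments)])
    log_p -= log_p.max()  # numerical stability before exponentiation
    p = np.exp(log_p)
    area = np.sum(0.5 * (p[1:] + p[:-1]) * steps)
    return log_p - np.log(area)
```

In one dimension the score can be integrated directly, as done here. For high-dimensional n-gram embeddings a different route from the score to a likelihood estimate would be needed, which is the subject of the problem.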
-
=== Task 9 (1) ===
+
===Problem 9 (1)===
-
* '''Name:''' Предсказание свойств and типов атомов в молекулярных графах при помощи сверточных сетей.
+
* '''Title:''' Predict properties and types of atoms in molecular graphs using convolutional networks.
-
* '''Task''': Multilabel classification using convolutional neural networks (CNN) on graphs.
+
* '''Problem:''' Multilabel classification using convolutional neural networks (CNN) on graphs.
-
Для предсказания взаимодействия молекул друг с другом зачастую необходимо правильно описать составляющие их атомы, поставив им в соответствие некоторые типы. Для маленьких молекул доступно не так много дескрипторов: координаты and химические элементы атомов, длины связей and величины углов между ними. Используя эти признаки, мы успешно предсказываем гибридизации атомов and типы связей. При таком подходе каждый атом рассматривается «по отдельности», информация о соседних атомах, необходимая для определения типа атома, практически не используется, and типы атомов определяются с помощью проверки большого числа условий. В то же время, молекулы представимы в виде трехмерных молекулярных графов, and было бы интересно использовать это для предсказания их типов методами машинного обучения, например, с помощью CNN.
+
To predict how molecules interact with each other, it is often necessary to describe their constituent atoms correctly by assigning certain types to them. For small molecules, few descriptors are available: the coordinates and chemical elements of the atoms, the bond lengths, and the angles between bonds. Using these features, we successfully predict atomic hybridizations and bond types. In this approach, each atom is considered individually, the information about neighboring atoms needed to determine the atom type is practically not used, and atom types are determined by checking a large number of conditions. At the same time, molecules can be represented as 3D molecular graphs, and it would be interesting to use this representation to predict atom types with machine learning methods, for example, with CNNs.
-
Необходимо предсказать типы вершин and рёбер молекулярных графов :
+
It is necessary to predict the types of vertices and edges of molecular graphs:
-
** тип атома (тип вершины графа, около 150 классов),
+
*# atom type (graph vertex type, about 150 classes),
-
** гибридизацию атома (вспомогательный признак, тип вершины, 4 класса),
+
*# atom hybridization (auxiliary feature, vertex type, 4 classes),
-
** тип связи (вспомогательный признак, тип ребра, 5 классов).
+
*# bond type (auxiliary feature, edge type, 5 classes).
-
Тип атома (вершины графа) основан на информации о его гибридизации and свойствах соседних с ним атомов. Поэтому в случае успешного решения задачи классификации можно провести кластеризацию для поиска других способов определения типов атомов.
+
The type of an atom (graph vertex) is based on information about its hybridization and the properties of neighboring atoms. Therefore, in the case of a successful solution of the classification problem, clustering can be carried out to find other ways to determine the types of atoms.
-
* '''Data:''' Около 15 тысяч молекул, представленных в виде молекулярных графов. Для каждой вершины (атома) известны 3D координаты and химический элемент. Дополнительно посчитаны длины связей, величины углов and двугранных углов между атомами (3D координаты графа), бинарные признаки, отражающие, входит ли атом в цикл and является ли он терминальным. Выборка размечена, однако в размеченных данных может содержаться ~% ошибок.
+
* '''Data:''' About 15 thousand molecules represented as molecular graphs. For each vertex (atom), the 3D coordinates and the chemical element are known. In addition, bond lengths, angles and dihedral angles between atoms (3D graph coordinates) have been computed, together with binary features indicating whether an atom belongs to a cycle and whether it is terminal. The sample is labeled, but the labels may contain ~5% errors.
-
Если данных будет недостаточно, возможно увеличение выборки (до 200 тысяч молекул), сопряженное с увеличением неточности в разметке.
+
If the data are insufficient, the sample can be enlarged (up to 200 thousand molecules), at the cost of less accurate labeling.
-
* '''References:''':
+
* '''References:'''
-
** [http://proceedings.mlr.press/v48/niepert16.pdf]
+
*# [http://proceedings.mlr.press/v48/niepert16.pdf]
-
** [https://arxiv.org/pdf/1603.00856.pdf]
+
*# [https://arxiv.org/pdf/1603.00856.pdf]
-
** [https://arxiv.org/pdf/1204.4539.pdf]
+
*# [https://arxiv.org/pdf/1204.4539.pdf]
-
* '''Basic algorithm:''' Предсказание гибридизаций and порядков связей с помощью мультиклассового нелинейного SVM с небольшим числом дескрипторов. https://hal.inria.fr/hal-01381010/document
+
* '''Base algorithm:''' Prediction of hybridizations and bond orders using a multiclass non-linear SVM with a small number of descriptors. https://hal.inria.fr/hal-01381010/document
-
* '''Solution:''' Предлагаемое решение задачи and способы проведения исследования.
+
* '''Solution:''' Proposed solution to the problem and ways of conducting research.
-
Способы представления and визуализации данных and проведения анализа ошибок, анализа качества алгоритма.
+
Methods for presenting and visualizing data and conducting error analysis, analyzing the quality of the algorithm.
-
На первом этапе нужно будет определить операции на графах, необходимые для построения архитектуры сети. Далее нужно будет обучить сеть для мульти-классовой классификации типов вершин (и ребер) входного графа.
+
At the first stage, the operations on graphs needed to build the network architecture must be defined. Next, the network must be trained for multi-class classification of the types of vertices (and edges) of the input graph.
-
Для оценки качества алгоритма предполагается оценивать точность с помощью кросс-валидации. Для конечной публикации (в профильном журнале) нужно будет сделать специфический тест на качество предсказаний: на основе предсказанных типов связи молекула записывается в виде строки (в формате SMILES) and сравнивается с образцом. В этом случае для каждой молекулы предсказание будет считаться верным, только если типы всех связей в ней были предсказаны без ошибок.
+
To assess the quality of the algorithm, accuracy is to be estimated with cross-validation. For the final publication (in a specialized journal), a specific test of prediction quality will be needed: based on the predicted bond types, the molecule is written as a string (in SMILES format) and compared with a reference. In this case, the prediction for a molecule is considered correct only if the types of all its bonds were predicted without errors.
-
* '''Novelty:''' Предложенные молекулярные графы обладают 3D структурой and внутренней иерархией, что делает их идеальным объектом применения CNN.
+
* '''Novelty:''' The proposed molecular graphs have a 3D structure and an internal hierarchy, which makes them an ideal object for applying CNNs.
-
* '''Authors:''' Sergei Grudinin, Maria Kadukova, Strizhov V.V..
+
* '''Authors:''' Sergei Grudinin, Maria Kadukova, Strijov V.V.
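The strict per-molecule quality test described in the solution above can be sketched as follows. This is a minimal illustration, assuming that each molecule is given as a list of integer bond-type labels in a fixed bond order. It compares the label lists directly and does not perform the SMILES conversion mentioned in the text. The function names are hypothetical.

```python
from typing import Sequence


def bond_accuracy(predicted: Sequence[Sequence[int]],
                  reference: Sequence[Sequence[int]]) -> float:
    """Fraction of individual bonds whose type is predicted correctly."""
    correct = 0
    total = 0
    for pred_bonds, ref_bonds in zip(predicted, reference):
        if len(pred_bonds) != len(ref_bonds):
            raise ValueError("each molecule must have the same number of bonds")
        correct += sum(p == r for p, r in zip(pred_bonds, ref_bonds))
        total += len(ref_bonds)
    return correct / total


def molecule_exact_match(predicted: Sequence[Sequence[int]],
                         reference: Sequence[Sequence[int]]) -> float:
    """Fraction of molecules in which every bond type is correct."""
    hits = sum(list(p) == list(r) for p, r in zip(predicted, reference))
    return hits / len(reference)
```

The second metric is never higher than the first, since a single wrong bond invalidates the whole molecule. This is why it is the stricter test for the final evaluation.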
-
=== Task 10 (1) ===
+
===Problem 10 (1)===
-
* '''Name:''' Формулировка and решение задачи оптимизации, сочетающей классификацию and регрессию, для оценки энергии связывания белка and маленьких молекул. Описание задачи [https://www.overleaf.com/read/rjdnyyxpdkyj]
+
* '''Title:''' Formulation and solution of an optimization problem combining classification and regression to estimate the binding energy of a protein and small molecules. The problem description [https://www.overleaf.com/read/rjdnyyxpdkyj]
-
* '''Task''':
+
* '''Problem:'''
-
С точки зрения биоинформатики, Task заключается в оценке свободной энергии связывания белка с маленькой молекулой (лигандом): наилучший лиганд в своем наилучшем положении имеет \textbf{наименьшую свободную энергию} взаимодействия с белком. (Далее большой текст, см. файл по ссылке вверху.)
+
From the point of view of bioinformatics, the problem is to estimate the free energy of binding of a protein to a small molecule (ligand): the best ligand in its best position has the '''lowest free energy''' of interaction with the protein. (A long description follows; see the file at the link above.)
* '''Data:'''
* '''Data:'''
-
** Данные для бинарной классификации.
+
*# Data for binary classification.
-
Около 12,000 комплексов белков с лигандами: для каждого из них есть 1 нативная поза and 18 ненативных. Основными дескрипторами являются гистограммы распределений расстояний между различными атомами белка and лиганда, размерность вектора дескрипторов ~ 20,000. В случае продолжения исследования and публикации в профильном журнале набор дескрипторов может быть расширен.
+
Approximately 12,000 protein-ligand complexes: for each of them there is 1 native pose and 18 non-native ones. The main descriptors are histograms of the distributions of distances between different atoms of the protein and the ligand; the dimension of the descriptor vector is ~20,000. If the research is continued and published in a specialized journal, the set of descriptors can be expanded.
-
Данные будут предоставлены в виде бинарных файлов со скриптом на python для чтения.
+
The data will be provided as binary files, together with a Python script for reading them.
-
** Данные для регрессии.
+
*# Data for regression.
-
Для каждого из представленных комплексов известно значение величины, которую можно интерпретировать как энергию связывания.
+
For each of the presented complexes, a value is known that can be interpreted as the binding energy.
-
* '''References:''':
+
* '''References:'''
-
** SVM [http://cs229.stanford.edu/notes/cs229-notes3.pdf]
+
*# SVM [http://cs229.stanford.edu/notes/cs229-notes3.pdf]
-
** Ridge Regression [http://scikit-learn.org/stable/modules/linear_model.html#ridge-regression]
+
*# Ridge Regression [http://scikit-learn.org/stable/modules/linear_model.html#ridge-regression]
-
** [https://alex.smola.org/papers/2003/SmoSch03b.pdf] (секция 1)
+
*# [https://alex.smola.org/papers/2003/SmoSch03b.pdf] (section 1)
-
* '''Basic algorithm:''' [https://hal.inria.fr/hal-01591154/]
+
* '''Base algorithm:''' [https://hal.inria.fr/hal-01591154/]
-
В задаче классификации мы использовали алгоритм, похожий на линейный SVM, связь которого с оценкой энергии, выходящей за рамки задачи классификации, описана в указанной выше статье. В задаче регрессии можно использовать различные функции потерь.
+
In the classification problem, we used an algorithm similar to linear SVM, whose relationship with the energy estimate, which is outside the scope of the classification problem, is described in the above article. Various loss functions can be used in a regression problem.
-
* '''Solution:''' Необходимо связать использованную ранее оптимизационную задачу с задачей регрессии and решить стандартными методами. Для проверки работы алгоритма будет использована кросс-валидация.
+
* '''Solution:''' It is necessary to connect the previously used optimization problem with the regression problem and solve it using standard methods. Cross-validation will be used to check the operation of the algorithm.
-
Есть отдельный тестовый сет, состоящий из (1) 195 комплексов белков and лигандов, для которых нужно найти наилучшую позу лиганда (алгоритм получения положений лиганда отличается от используемого при обучении), (2) комплексов белков and лигандов, для нативных поз которых нужно предсказать энергию связывания, and (3) 65 белков, для которых нужно найти наиболее сильно связывающийся лиганд.
+
There is a separate test set consisting of (1) 195 protein-ligand complexes for which the best ligand pose must be found (the algorithm for generating ligand poses differs from the one used in training), (2) protein-ligand complexes for whose native poses the binding energy must be predicted, and (3) 65 proteins for which the most strongly binding ligand must be found.
-
* '''Novelty:''' В первую очередь, интерес представляет ''объединение задач классификации and регрессии'''.
+
* '''Novelty:''' First of all, the interest lies in '''combining classification and regression problems'''.
-
Правильная оценка качества связывания белка and лиганда используется при разработке лекарства для поиска молекул, наиболее сильно взаимодействующих с исследуемым белком. Использование описанной выше задачи классификации для предсказания энергии связывания приводит к недостаточно высокой корреляции предсказаний с экспериментальными значениями, в то время как использование одной лишь задачи регрессии приводит к переобучению.
+
The correct assessment of the quality of protein and ligand binding is used in drug development to search for molecules that interact most strongly with the protein under study. Using the classification problem described above to predict the binding energy results in an insufficiently high correlation of predictions with experimental values, while using the regression problem alone leads to overfitting.
-
* '''Авторы''' Sergei Grudinin, Maria Kadukova, Strizhov V.V..
+
* '''Authors:''' Sergei Grudinin, Maria Kadukova, Strijov V.V.
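One possible way to combine the classification and regression problems described above is a single objective with a shared linear model. The sketch below is only an illustration under stated assumptions and is not the formulation from the base algorithm paper: it sums a hinge loss on pose labels (+1 or -1), a squared error on binding energies weighted by a hypothetical coefficient <code>lam</code>, and an L2 penalty weighted by a hypothetical coefficient <code>mu</code>, and minimises the sum by subgradient descent. All names and default values are assumptions.

```python
import numpy as np


def joint_objective(w, X_cls, y_cls, X_reg, e_reg, lam=1.0, mu=0.1):
    """Mean hinge loss on pose labels + lam * mean squared energy error
    + mu * squared L2 norm of the shared weight vector w."""
    hinge = np.maximum(1.0 - y_cls * (X_cls @ w), 0.0).mean()
    squared = ((X_reg @ w - e_reg) ** 2).mean()
    return hinge + lam * squared + mu * (w @ w)


def joint_subgradient(w, X_cls, y_cls, X_reg, e_reg, lam=1.0, mu=0.1):
    """A subgradient of joint_objective with respect to w."""
    active = (1.0 - y_cls * (X_cls @ w)) > 0.0  # samples violating the margin
    g_hinge = -(X_cls[active] * y_cls[active, None]).sum(axis=0) / len(y_cls)
    g_squared = 2.0 * X_reg.T @ (X_reg @ w - e_reg) / len(e_reg)
    return g_hinge + lam * g_squared + 2.0 * mu * w


def fit(X_cls, y_cls, X_reg, e_reg, lam=1.0, mu=0.1, lr=0.05, steps=500):
    """Plain subgradient descent from the zero vector with a fixed step."""
    w = np.zeros(X_cls.shape[1])
    for _ in range(steps):
        w -= lr * joint_subgradient(w, X_cls, y_cls, X_reg, e_reg, lam, mu)
    return w
```

The coefficient <code>lam</code> controls the trade-off mentioned in the novelty paragraph: a small value leans towards the classification problem, while a large value leans towards pure regression.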
-
 
+
-
=2017=
+
 +
==2017==
{|class="wikitable"
{|class="wikitable"
|-
|-
Line 2731: Line 3213:
! Report
! Report
! Letters
! Letters
-
!
 
-
!
 
|-
|-
-
|Гончаров Алексей (пример)
+
|Goncharov Alexey (example)
-
|Метрическая классификация временных рядов
+
|Metric classification of time series
|[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Goncharov2015MetricClassification/code code],
|[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Goncharov2015MetricClassification/code code],
[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Goncharov2015MetricClassification/doc/Goncharov2015MetricClassification.pdf paper],
[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Goncharov2015MetricClassification/doc/Goncharov2015MetricClassification.pdf paper],
[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Goncharov2015MetricClassification/doc/GoncharovAlexey2015PresentationMetricClassification.pdf slides]
[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Goncharov2015MetricClassification/doc/GoncharovAlexey2015PresentationMetricClassification.pdf slides]
-
|[[Участник:Mpopova|Maria Popova]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Mpopova Maria Popova]
-
|Задаянчук Андрей
+
|Zadayanchuk Andrey
|BMF
|BMF
|AILSBRCVTDSWH>
|AILSBRCVTDSWH>
-
|
 
-
|
 
|-
|-
-
|[[Участник:Alvant|Алексеев Василий]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Alvant Alekseev Vasily]
-
|Внутритекстовая когерентность как мера интерпретируемости тематических моделей текстовых коллекций
+
|Intratext coherence as a measure of interpretability of topic models of text collections
|[http://svn.code.sf.net/p/mlalgorithms/code/Group474/Alekseev2017IntraTextCoherence/code code]
|[http://svn.code.sf.net/p/mlalgorithms/code/Group474/Alekseev2017IntraTextCoherence/code code]
[http://svn.code.sf.net/p/mlalgorithms/code/Group474/Alekseev2017IntraTextCoherence/data/postnauka_original_reduced/postnauka_clean data]
[http://svn.code.sf.net/p/mlalgorithms/code/Group474/Alekseev2017IntraTextCoherence/data/postnauka_original_reduced/postnauka_clean data]
Line 2754: Line 3232:
[https://www.youtube.com/watch?v=6v2dNMJG4iA video]
[https://www.youtube.com/watch?v=6v2dNMJG4iA video]
|Viktor Bulatov
|Viktor Bulatov
-
|Захаренков Антон
+
|Zakharenkov Anton
|BMF
|BMF
|AILSB+RC+V+TDHW
|AILSB+RC+V+TDHW
-
|
 
-
|
 
|-
|-
-
|[[Участник:Dmitriy_Anikeyev|Аникеев Дмитрий]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Dmitriy_Anikeyev Anikeev Dmitry]
-
|Локальная аппроксимация временных рядов для построения прогностических метамоделей
+
|Local approximation of time series for building predictive metamodels
|[https://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group474/Anikeyev_Penkin2017ClassifyingMetamodels/code/ code]
|[https://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group474/Anikeyev_Penkin2017ClassifyingMetamodels/code/ code]
[https://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group474/Anikeyev_Penkin2017ClassifyingMetamodels/paper/AnikeyevPenkin2017Splines.pdf paper]
[https://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group474/Anikeyev_Penkin2017ClassifyingMetamodels/paper/AnikeyevPenkin2017Splines.pdf paper]
[https://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group474/Anikeyev_Penkin2017ClassifyingMetamodels/paper/Anikeev%20F-talk.pdf slides]
[https://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group474/Anikeyev_Penkin2017ClassifyingMetamodels/paper/Anikeev%20F-talk.pdf slides]
-
|[[Участник:strijov|Strizhov V.V.]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:strijov Strijov V.V.]
-
|[https://svn.code.sf.net/p/mlalgorithms/code/Group474/Anikeyev2017ClassifyingMetamodels/paper/Review.pdf Смердов Антон]
+
|[https://svn.code.sf.net/p/mlalgorithms/code/Group474/Anikeyev2017ClassifyingMetamodels/paper/Review.pdf Smerdov Anton]
|BMF
|BMF
|AILS>B0R0C0V0T0D0H0W0
|AILS>B0R0C0V0T0D0H0W0
-
|
 
-
|
 
|-
|-
-
|[[Участник: Гасанов Эльнур|Гасанов Эльнур]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Gasanov_Elnur Gasanov Elnur]
-
|Построение аппроксимирующего описания скалограммы в задаче прогнозирования движений по электрокортикограмме
+
|Construction of an approximating description of a scalogram in the problem of predicting movements using an electrocorticogram
|[https://svn.code.sf.net/p/mlalgorithms/code/Group474/Gasanov2017ECoGAnalysis/Code code] [https://svn.code.sf.net/p/mlalgorithms/code/Group474/Gasanov2017ECoGAnalysis/Paper/Gasanov2017ECoGAnalysis.pdf paper]
|[https://svn.code.sf.net/p/mlalgorithms/code/Group474/Gasanov2017ECoGAnalysis/Code code] [https://svn.code.sf.net/p/mlalgorithms/code/Group474/Gasanov2017ECoGAnalysis/Paper/Gasanov2017ECoGAnalysis.pdf paper]
[https://svn.code.sf.net/p/mlalgorithms/code/Group474/Gasanov2017ECoGAnalysis/Paper/FTalk.pdf slides]
[https://svn.code.sf.net/p/mlalgorithms/code/Group474/Gasanov2017ECoGAnalysis/Paper/FTalk.pdf slides]
-
|[[Участник:Anastasiya|Anastasia Motrenko]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Anastasiya Anastasia Motrenko]
-
|[[Участник: Ковалев_Дмитрий|Ковалев Дмитрий]]
+
|Kovalev Dmitry
|BMF
|BMF
|AILSBRCVTDH0W0
|AILSBRCVTDH0W0
-
|
 
-
|
 
|-
|-
-
|Захаренков Антон
+
|Zakharenkov Anton
-
|Massively multitask deep learning for drug discovery
+
|Massively multitask deep learning for drug discovery
-
|[https://svn.code.sf.net/p/mlalgorithms/code/Group474/Zakharenkov2017MassivelyMultitaskNetworks/code/ code]
+
|[https://svn.code.sf.net/p/mlalgorithms/code/Group474/Zakharenkov2017MassivelyMultitaskNetworks/code/ code]
-
[https://svn.code.sf.net/p/mlalgorithms/code/Group474/Zakharenkov2017MassivelyMultitaskNetworks/doc/Zakharenkov2017MassivelyMultitaskNetworks.pdf paper]
+
[https://svn.code.sf.net/p/mlalgorithms/code/Group474/Zakharenkov2017MassivelyMultitaskNetworks/doc/Zakharenkov2017MassivelyMultitaskNetworks.pdf paper]
-
[https://svn.code.sf.net/p/mlalgorithms/code/Group474/Zakharenkov2017MassivelyMultitaskNetworks/doc/Zakharenkov2016Presentation.pdf slides]
+
[https://svn.code.sf.net/p/mlalgorithms/code/Group474/Zakharenkov2017MassivelyMultitaskNetworks/doc/Zakharenkov2016Presentation.pdf slides]
[https://youtu.be/l6M-CfpkZKQ video]
[https://youtu.be/l6M-CfpkZKQ video]
-
|[[Участник:Mpopova|Maria Popova]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Mpopova Maria Popova]
-
|Алексеев Василий
+
|Alekseev Vasily
|BMF
|BMF
|AILSBRCVT>D>H0W0
|AILSBRCVT>D>H0W0
-
|
 
-
|
 
|-
|-
-
|Ковалев Дмитрий
+
|Kovalev Dmitry
|Unsupervised representation for molecules
|Unsupervised representation for molecules
|[https://svn.code.sf.net/p/mlalgorithms/code/Group474/Kovalev2017MoleculesRepresentation/code/ code]
|[https://svn.code.sf.net/p/mlalgorithms/code/Group474/Kovalev2017MoleculesRepresentation/code/ code]
[https://svn.code.sf.net/p/mlalgorithms/code/Group474/Kovalev2017MoleculesRepresentation/doc/paper/Kovalev2017MoleculesRepresentation.pdf paper]
[https://svn.code.sf.net/p/mlalgorithms/code/Group474/Kovalev2017MoleculesRepresentation/doc/paper/Kovalev2017MoleculesRepresentation.pdf paper]
[https://svn.code.sf.net/p/mlalgorithms/code/Group474/Kovalev2017MoleculesRepresentation/doc/slides/Kovalev2017MoleculesRepresentation.pdf slides]
[https://svn.code.sf.net/p/mlalgorithms/code/Group474/Kovalev2017MoleculesRepresentation/doc/slides/Kovalev2017MoleculesRepresentation.pdf slides]
-
|[[Участник:Mpopova|Maria Popova]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Mpopova Maria Popova]
-
|[[Участник: Гасанов Эльнур|Гасанов Эльнур]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Gasanov_Elnur Gasanov Elnur]
|BMF
|BMF
|AILSBRCVT>D>H0W0
|AILSBRCVT>D>H0W0
-
|
 
-
|
 
|-
|-
-
|Новицкий Василий
+
|Novitsky Vasily
-
|Выбор признаков в Taskх авторегрессионного прогнозирования биомедицинских сигналов
+
|Feature Selection in Problems of Autoregressive Prediction of Biomedical Signals
|[https://svn.code.sf.net/p/mlalgorithms/code/Group474/Novitskiy2017Biosignal/doc/novitskiy.pdf paper]
|[https://svn.code.sf.net/p/mlalgorithms/code/Group474/Novitskiy2017Biosignal/doc/novitskiy.pdf paper]
[https://svn.code.sf.net/p/mlalgorithms/code/Group474/Novitskiy2017Biosignal/code/ code]
[https://svn.code.sf.net/p/mlalgorithms/code/Group474/Novitskiy2017Biosignal/code/ code]
[http://svn.code.sf.net/p/mlalgorithms/code/Group474/Novitskiy2017Biosignal/slides/presentation.pdf slides]
[http://svn.code.sf.net/p/mlalgorithms/code/Group474/Novitskiy2017Biosignal/slides/presentation.pdf slides]
-
|[[Участник:Katrutsa|Александр Катруца]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Katrutsa Alexander Katrutsa]
|
|
|B - F
|B - F
|AILS>B0R0C0V0T0D0H0W0
|AILS>B0R0C0V0T0D0H0W0
-
|
 
-
|
 
|-
|-
-
|Селезнева Мария
+
|Selezneva Maria
-
|Агрегирование гетерогенных текстовых коллекций в иерархической тематической модели русскоязычного научно-популярного контента
+
|Aggregation of heterogeneous text collections in a hierarchical topic model of Russian-language popular science content
|[http://svn.code.sf.net/p/mlalgorithms/code/Group474/Seleznova2017AggregationARTM/paper/Seleznova2017AggregationARTM.pdf paper]
|[http://svn.code.sf.net/p/mlalgorithms/code/Group474/Seleznova2017AggregationARTM/paper/Seleznova2017AggregationARTM.pdf paper]
[https://svn.code.sf.net/p/mlalgorithms/code/Group474/Seleznova2017AggregationARTM/code/ code]
[https://svn.code.sf.net/p/mlalgorithms/code/Group474/Seleznova2017AggregationARTM/code/ code]
[https://svn.code.sf.net/p/mlalgorithms/code/Group474/Seleznova2017AggregationARTM/slides/FinalTalk.pdf slides]
[https://svn.code.sf.net/p/mlalgorithms/code/Group474/Seleznova2017AggregationARTM/slides/FinalTalk.pdf slides]
[https://www.youtube.com/watch?v=eKUJtfGGlTY video]
[https://www.youtube.com/watch?v=eKUJtfGGlTY video]
-
|[[Участник:Iefimova|Ирина Ефимова]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Iefimova Irina Efimova]
-
|[https://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group474/Seleznova2017AggregationARTM/feedback/Selezniova2017_Sholokhov-Feedback.rtf Шолохов Алексей]
+
|[https://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group474/Seleznova2017AggregationARTM/feedback/Selezniova2017_Sholokhov-Feedback.rtf Sholokhov Alexey]
|BMF
|BMF
|A+IL+SBRCVTDHW
|A+IL+SBRCVTDHW
-
|
 
-
|
 
|-
|-
-
|Смердов Антон
+
|Smerdov Anton
-
|Выбор оптимальной модели рекуррентной сети в Taskх поиска парафраза
+
|Choosing the optimal recurrent network model in paraphrase search problems
|[http://svn.code.sf.net/p/mlalgorithms/code/Group474/Smerdov2017Paraphrase/doc/Smerdov2017Paraphrase.pdf paper]
|[http://svn.code.sf.net/p/mlalgorithms/code/Group474/Smerdov2017Paraphrase/doc/Smerdov2017Paraphrase.pdf paper]
[https://svn.code.sf.net/p/mlalgorithms/code/Group474/Smerdov2017Paraphrase/code/ code]
[https://svn.code.sf.net/p/mlalgorithms/code/Group474/Smerdov2017Paraphrase/code/ code]
[http://svn.code.sf.net/p/mlalgorithms/code/Group474/Smerdov2017Paraphrase/doc/Smerdov2017ParaphrasePresentation.pdf slides]
[http://svn.code.sf.net/p/mlalgorithms/code/Group474/Smerdov2017Paraphrase/doc/Smerdov2017ParaphrasePresentation.pdf slides]
[https://www.youtube.com/watch?v=dW_xv2IlhC4 video]
[https://www.youtube.com/watch?v=dW_xv2IlhC4 video]
-
|[[Участник:Oleg Bakhteev|Oleg Bakhteev]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Oleg_Bakhteev Oleg Bakhteev]
-
|[[Участник:Dmitriy_Anikeyev|Дмитрий Аникеев]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Dmitriy_Anikeyev Dmitry Anikeev]
|BMF
|BMF
|AIL+SB+RC>V+M-T>D0H0W0
|AIL+SB+RC>V+M-T>D0H0W0
-
|
 
-
|
 
|-
|-
-
|Уваров Никита
+
|Uvarov Nikita
-
|Оптимальный алгоритм для восстановления динамических моделей
+
|Optimal Algorithm for Reconstruction of Dynamic Models
|[http://svn.code.sf.net/p/mlalgorithms/code/Group474/Uvarov2017DynamicGraphicalModels/doc/Uvarov2017DynamicGraphicalModels.pdf paper]
|[http://svn.code.sf.net/p/mlalgorithms/code/Group474/Uvarov2017DynamicGraphicalModels/doc/Uvarov2017DynamicGraphicalModels.pdf paper]
[http://svn.code.sf.net/p/mlalgorithms/code/Group474/Uvarov2017DynamicGraphicalModels/slides/Uvarov2017DynamicGraphicalModels.pdf slides]
[http://svn.code.sf.net/p/mlalgorithms/code/Group474/Uvarov2017DynamicGraphicalModels/slides/Uvarov2017DynamicGraphicalModels.pdf slides]
Line 2856: Line 3318:
|BMF
|BMF
|AILS0B0R0C0V0T0D0H0W0
|AILS0B0R0C0V0T0D0H0W0
-
|
 
-
|
 
|-
|-
-
|Усманова Карина
+
|Usmanova Karina
|Multiple Manifold Learning (Joint diagonalization for 3D shapes - AJD on Hessian matrices)
|Multiple Manifold Learning (Joint diagonalization for 3D shapes - AJD on Hessian matrices)
|[http://svn.code.sf.net/p/mlalgorithms/code/Group474/Usmanova2017MultipleManifoldLearning/doc/Usmanova2017MultipleManifoldLearning.pdf paper]
|[http://svn.code.sf.net/p/mlalgorithms/code/Group474/Usmanova2017MultipleManifoldLearning/doc/Usmanova2017MultipleManifoldLearning.pdf paper]
Line 2865: Line 3325:
[http://svn.code.sf.net/p/mlalgorithms/code/Group474/Usmanova2017MultipleManifoldLearning/code/ code]
[http://svn.code.sf.net/p/mlalgorithms/code/Group474/Usmanova2017MultipleManifoldLearning/code/ code]
[https://www.youtube.com/watch?v=sqHLmSU-2iM video]
[https://www.youtube.com/watch?v=sqHLmSU-2iM video]
-
|[[Участник:Mkarasikov|Михаил Карасиков]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Mkarasikov Mikhail Karasikov]
-
|[[Участник:IShibaev|Иннокентий Шибаев]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:IShibaev Innokenty Shibaev]
|BMF
|BMF
|AILSBRC+VT+EDH>W
|AILSBRC+VT+EDH>W
-
|
 
-
|
 
|-
|-
-
|Шибаев Иннокентий
+
|Innokenty Shibaev
|Convex relaxations for multiple structure alignment (synchronization problem for SO(3))
|Convex relaxations for multiple structure alignment (synchronization problem for SO(3))
|[http://svn.code.sf.net/p/mlalgorithms/code/Group474/Shibaev2017MultipleStructureAlignment/doc/Shibaev2017MultipleStructureAlignment.pdf paper]
|[http://svn.code.sf.net/p/mlalgorithms/code/Group474/Shibaev2017MultipleStructureAlignment/doc/Shibaev2017MultipleStructureAlignment.pdf paper]
Line 2878: Line 3336:
[https://nbviewer.jupyter.org/urls/svn.code.sf.net/p/mlalgorithms/code/Group474/Shibaev2017MultipleStructureAlignment/code/Shibaev2017MultipleStructureAlignment_different_algs.ipynb code]
[https://nbviewer.jupyter.org/urls/svn.code.sf.net/p/mlalgorithms/code/Group474/Shibaev2017MultipleStructureAlignment/code/Shibaev2017MultipleStructureAlignment_different_algs.ipynb code]
[https://youtu.be/qs1Rchb02C0 video]
[https://youtu.be/qs1Rchb02C0 video]
-
|[[Участник:Mkarasikov|Михаил Карасиков]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Mkarasikov Mikhail Karasikov]
-
|Карина Усманова
+
|Usmanova Karina
|BMF
|BMF
|AILS-BRCVT>D>H>W
|AILS-BRCVT>D>H>W
-
|
 
-
|
 
|-
|-
-
|Шолохов Алексей
+
|Sholokhov Alexey
-
|Помехоустойчивость методов информационного анализа ЭКГ-сигналов
+
|Noise immunity of methods for informational analysis of ECG signals
|
|
[https://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group474/Sholokhov2017NoiseSustainability/doc/Sholokhov2017NoiseSustainability.pdf paper]
[https://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group474/Sholokhov2017NoiseSustainability/doc/Sholokhov2017NoiseSustainability.pdf paper]
Line 2892: Line 3348:
[https://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group474/Sholokhov2017NoiseSustainability/slides/Sholokhov2017NiseSustainability_MidTalk.pdf slides]
[https://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group474/Sholokhov2017NoiseSustainability/slides/Sholokhov2017NiseSustainability_MidTalk.pdf slides]
[https://www.youtube.com/watch?v=5BHIpUiY9VU video]
[https://www.youtube.com/watch?v=5BHIpUiY9VU video]
-
|Влада Бунакова
+
|Vlada Bunakova
-
|[https://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group474/Sholokhov2017NoiseSustainability/feedback/Sholokhov2017NoiseSustainability_SelezniovaFeedback.rtf Селезнева Мария]
+
|[https://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group474/Sholokhov2017NoiseSustainability/feedback/Sholokhov2017NoiseSustainability_SelezniovaFeedback.rtf Selezneva Maria]
|BMF
|BMF
|AILSBRCVTDHW
|AILSBRCVTDHW
-
|
 
-
|
 
|-
|-
|}
|}
-
Академ или новые
+
Risky works
{|class="wikitable"
{|class="wikitable"
|-
|-
Line 2912: Line 3366:
! Report
! Report
! Letters
! Letters
-
!
 
-
!
 
-
|-
 
-
|Кульков Александр
 
-
|Адаптивные релаксации NP трудных задач через машинное обучение
 
-
|[https://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group474/Kulkov2017AdaptiveRelaxations/doc/article.pdf paper]
 
-
|Yuri Maksimov
 
-
|
 
-
|академ
 
-
|A>I>L>B0R0C0V0T0D0H0W0
 
-
|
 
-
|
 
|-
|-
|Kaloshin Pavel
|Using deep learning networks to transfer classification models when data are insufficient.
|
[https://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group474/KaloshinBolotin2017TransferLearning/paper/main.pdf paper]
| - MF
|AIL-SBRC-VT+D>H>W0
|-
|Malinovsky Grigory
|Choice of interpretable multimodels in credit scoring problems
|[https://svn.code.sf.net/p/mlalgorithms/code/Group474/Malinovskiy2017CreditScoring/doc/paper.pdf paper]
[https://svn.code.sf.net/p/mlalgorithms/code/Group474/Malinovskiy2017CreditScoring/code/ code]
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Aduenko Alexander Aduenko]
|
|out B - -
|AILS-B>R>C>V>T0D0H0W0
|-
|Pletnev Nikita
|Internal plagiarism detection
|[https://svn.code.sf.net/p/mlalgorithms/code/Group474/Pletnev2017PlagiarismDetecting/Pletnev2017PlagiarismDetecting.pdf paper]
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Rita_Kuznetsova Rita Kuznetsova]
|
|out - - -
|A-I-L-S>B0R0C0V0T0D0H0W0
|-
|Grevtsev Alexander
|Parallel algorithms for parametric identification of the Tersoff potential for AlN
|
[https://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group474/Grevtsev2017Problem3/doc/Article.pdf paper]
|Karine Abgaryan
|
|
|
|-
|Zaitsev Nikita
|Automatic classification of scientific articles on crystallography
|
[https://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group474/Zaytsev2017ArticlesClassification/report/report.pdf paper]
[https://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group474/Zaytsev2017ArticlesClassification/README.txt readme]
|Evgeny Gavrilov
|
|
|
|-
|Diligul Alexander
|Determination of the optimal potential parameters for the Rosato-Guillope-Legrand (RGL) model from experimental data and the results of quantum mechanical calculations
|
[https://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group474/Diligul2017Problem4/Doc/Article.pdf paper]
|Karine Abgaryan
|
|
|
|-
|Daria Fokina
|Selection of candidates in the problem of finding paraphrased text borrowings, based on the vectorization of text fragments
|
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Fess10 Alexey Romanov]
|
|
|AILSB0R0C0V0T0D0H0W0
|-
|}
===1. 2017===
* '''Title:''' Classification of human activities from fitness bracelet measurements.
* '''Problem:''' Given accelerometer and gyroscope measurements, determine the type of a worker's activity. The measurement time series are assumed to contain elementary movements that form clusters in the space of time series descriptions. The characteristic duration of a movement is seconds. The time series are labeled with activity types: work, rest. The characteristic duration of an activity is minutes. The task is to recover the activity type from the description of the time series and the cluster.
* '''Data:''' WISDM accelerometer time series ([[Time series (examples library)]], Accelerometry section).
* '''References:'''
*# Karasikov M.E., Strijov V.V. Classification of time series in the space of parameters of generating models // Informatics and its applications, 2016. [[http://strijov.com/papers/Karasikov2016TSC.pdf URL]]
*# Kuznetsov M.P., Ivkin N.P. Algorithm for classifying accelerometer time series by combined feature description // Machine Learning and Data Analysis. 2015. Vol. 1, No. 11. P. 1471-1483. [[http://jmlda.org/papers/doc/2015/no11/Ivkin2015TSclassification.pdf URL]]
*# Isachenko R.V., Strijov V.V. Metric learning in multiclass time series classification problems // Informatics and its applications, 2016, 10(2) : 48-57. [[http://strijov.com/papers/Isachenko2016MetricsLearning.pdf URL]]
*# Zadayanchuk A.I., Popova M.S., Strijov V.V. Choosing the optimal model for classifying physical activity based on accelerometer measurements // Information technologies, 2016. [[http://strijov.com/papers/Zadayanchuk2015OptimalNN4.pdf URL]]
*# Motrenko A.P., Strijov V.V. Extracting fundamental periods to segment human motion time series // Journal of Biomedical and Health Informatics, 2016, Vol. 20, No. 6, 1466-1476.
*# Ignatov A., Strijov V. Human activity recognition using quasiperiodic time series collected from a single triaxial accelerometer // Multimedia Tools and Applications, 2015, 17.05.2015 : 1-14. [[http://strijov.com/papers/Ignatov2015HumanActivity.pdf URL]]
* '''Base algorithm:''' The base algorithm is described in [Karasikov, Strijov: 2016] and [Kuznetsov, Ivkin: 2014].
* '''Solution:''' Find the optimal segmentation method and the optimal description of the time series. Construct a metric space of descriptions of elementary movements.
* '''Novelty:''' Two characteristic time scales of human activity (seconds for movements, minutes for activities) are combined in a single problem statement.
* '''Authors:''' Strijov V.V., M.P. Kuznetsov, P.V. Levdik.
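The segmentation and description step mentioned in the solution can be illustrated with a minimal sketch. It assumes fixed-width windows and a two-number description (mean and standard deviation) per window; the window width, the step, the feature choice, and the synthetic signal are all hypothetical and are not taken from the cited papers.

```python
import math
import statistics

def segment(series, width, step):
    """Split a series into fixed-width windows taken every `step` samples."""
    return [series[i:i + width] for i in range(0, len(series) - width + 1, step)]

def describe(window):
    """Describe one window by its mean and standard deviation (hypothetical features)."""
    return (statistics.fmean(window), statistics.pstdev(window))

# Synthetic one-axis signal: a quiet interval followed by an oscillating one.
signal = [0.0] * 20 + [math.sin(0.5 * t) for t in range(20)]

# One description per window; these vectors are what would be clustered.
features = [describe(w) for w in segment(signal, width=10, step=10)]
```

Windows from the quiet interval get a zero standard deviation, while windows from the oscillating interval do not, so even this crude description separates the two regimes.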
===2. 2017===
* '''Title:''' Construction of an approximating description of a scalogram in the problem of predicting movements from an electrocorticogram.
* '''Problem:''' As part of decoding ECoG signals, the problem of classifying movements from the time series of electrode readings is solved. Features are extracted from the ECoG time series as coefficients of the wavelet transform of the signal [Makarchuk 2016]. From these coefficients a scalogram, a two-dimensional array of features in the frequency-time space, is built for each electrode. Combining the scalograms of all electrodes gives the features of the time series in the spatial-frequency-time domain. A feature description constructed in this way necessarily contains mutually correlated features and is redundant. It is required to propose a method for reducing the dimension of the feature space.
* '''Data:''' Measurements of finger positions during simple gestures. [https://purl.stanford.edu/zk881ps0522 Description of experiments] [https://stacks.stanford.edu/file/druid:zk881ps0522/gestures.zip data].
* '''References:'''
*# Makarchuk G.I., Zadayanchuk A.I., Strijov V.V. 2016. Using partial least squares to decode hand movement from ECoG signals in monkeys. [http://svn.code.sf.net/p/mlalgorithms/code/Group374/Makarchuk2016ECoGSignals/doc/Makarchuk2016ECoGSignals.pdf pdf]
*# Karasikov M.E., Strijov V.V. Classification of time series in the space of parameters of generating models // Informatics and its applications, 2016. [[http://strijov.com/papers/Karasikov2016TSC.pdf URL]]
*# Kuznetsov M.P., Ivkin N.P. Algorithm for classifying accelerometer time series by combined feature description // Machine Learning and Data Analysis. 2015. Vol. 1, No. 11. P. 1471-1483.
* '''Base algorithm:''' PLS
Chen C, Shin D, Watanabe H, Nakanishi Y, Kambara H, et al. (2013) Prediction of Hand Trajectory from Electrocorticography Signals in Primary Motor Cortex. PLoS ONE 8(12): e83534.
* '''Solution:''' To reduce the dimension, it is proposed to use the local approximation method proposed in [Kuznetsov 2015] and used to classify accelerometer time series in [Karasikov 2016].
* '''Novelty:''' A new method for reconstructing movements from electrocorticograms is proposed.
* '''Authors:''' Strijov V.V., A.P. Motrenko
===3. 2017===
* '''Title:''' Multiple Manifold Learning (Joint diagonalization for 3D shapes - AJD on Hessian matrices).
* '''Problem:''' Build an optimal algorithm for the Multiple Manifold Learning problem. Two protein conformations (two tertiary structures) are given. In the neighborhood of each state, an elastic body model is specified (vibrations of the structure around that state). The task is to build a common elastic body model for finding intermediate states, one that agrees as closely as possible with the given models in the neighborhoods of the given conformations. The space of motions of an elastic body is spanned by the eigenvectors of the Hessian. It is required to find a common low-rank approximation of the spaces of motions of the two elastic bodies.
* '''Data:''' Protein structures in double conformations from PDB, about 100 sets from the article https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4677049/
* '''References:''' A list of scientific papers, supplemented by 1) the statement of the problem being solved, 2) links to new results (a recent article with similar results), 3) basic information about the problem under study.
Tirion, M. M. (1996). Large amplitude elastic motions in proteins from a single-parameter, atomic analysis. Physical Review Letters, 77(9), 1905.
Moal, I. H., & Bates, P. A. (2010). SwarmDock and the Use of Normal Modes in Protein-Protein Docking. IJMS, 11(10), 3623–3648. https://doi.org/10.3390/ijms11103623
* '''Base algorithm:''' AJD algorithm: http://perso.telecom-paristech.fr/~cardoso/jointdiag.html, AJD algorithms implemented as part of the Shogun ML toolbox http://shogun-toolbox.org, http://shogun-toolbox.org/api/latest/classshogun_1_1CApproxJointDiagonalizer.html.
* '''Solution:''' Compute the Hessians (C++ code is available from Sergey), study and run standard joint diagonalization algorithms for the first n non-trivial eigenvectors, analyze the loss functions, and adapt the standard algorithm to the original problem.
* '''Novelty:''' Simple elasticity models with one or several free parameters can describe thermal fluctuations in proteins. However, such models do not describe transitions between several stable conformations. The purpose of this work is to refine the elastic model so that it also describes the space of conformational changes.
* '''Authors:''' Sergey Grudinin, consultant: Mikhail Karasikov / Yury Maksimov.
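For orientation, approximate joint diagonalization is commonly stated as the minimization of the off-diagonal energy of the transformed matrices. The formula below is a sketch of that standard criterion, not the exact loss used in this project; the symbols <math>H_1, H_2</math> (the two Hessians), <math>V</math> (the common basis) and <math>\mathrm{off}(\cdot)</math> are notation assumed here for illustration.

<math>\min_{V} \; \sum_{k=1}^{2} \mathrm{off}\left(V^{\top} H_k V\right), \qquad \mathrm{off}(A) = \sum_{i \neq j} a_{ij}^{2}.</math>

If both Hessians were exactly diagonalized by the same <math>V</math>, the criterion would equal zero; keeping only a few columns of <math>V</math> would then give a common low-rank description of the two spaces of motions.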
===4. 2017===
* '''Title:''' Convex relaxations for multiple structure alignment (synchronization problem for SO(3)).
* '''Problem:''' Find transformations that align the tertiary structures of proteins simultaneously (in simple words: find orthogonal transformations that superimpose molecules given in R^3 that have the same chemical formula). If the structures are identical (the RMSD after alignment is zero, the structures coincide exactly), they can be aligned pairwise. If they are not, the base algorithm, generally speaking, does not find the optimum of the original problem with a loss function for simultaneous alignment.
* '''Data:''' Protein structures in PDB format in various states and coordinate systems.
* '''References:'''
** Multiple structural alignment:
**# Kearsley S.K. (1990) J. Comput. Chem., 11, 1187-1192.
**# Shapiro A., Botha J.D., Pastore A. and Lesk A.M. (1992) Acta Crystallogr., A48, 11-14.
**# Diamond R. (1992) Protein Sci., 1, 1279-1287.
**# May A.C., Johnson M.S. Improved genetic algorithm-based protein structure comparisons: pairwise and multiple superpositions. Protein Eng. 1995 Sep;8(9):873-82.
** Synchronization problem:
**# O. Özyeşil, N. Sharon, A. Singer, "Synchronization over Cartan motion groups via contraction", available at arXiv.
**# L. Wang, A. Singer, "Exact and Stable Recovery of Rotations for Robust Synchronization", Information and Inference: A Journal of the IMA, 2(2), pp. 145-193 (2013).
**# Semidefinite relaxations for optimization problems over rotation matrices J Saunderson, PA Parrilo… - Decision and Control ( …, 2014 - ieeexplore.ieee.org
**# Spectral synchronization of multiple views in SE(3), F Arrigoni, B Rossi, A Fusiello - SIAM Journal on Imaging Sciences, 2016 - SIAM
**# Robust Rotation Synchronization via Low-rank and Sparse Matrix Decomposition, F Arrigoni, A Fusiello, B Rossi, P Fragneto - arXiv preprint arXiv: …, 2015 - arxiv.org
** Spectral relaxation for SO(2)
**# A. Singer, Angular synchronization by eigenvectors and semidefinite programming, Applied and Computational Harmonic Analysis 30 (1) (2011) 20–36.
** Spectral relaxation for SO(3)
**# M. Arie-Nachimson, S.Z. Kovalsky, I. Kemelmacher-Shlizerman, A. Singer, R. Basri, Global motion estimation from point matches, in: International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission, 2012, pp. 81–88.
**# A. Singer, Y. Shkolnisky, Three-dimensional structure determination from common lines in cryo-EM by eigenvectors and semidefinite programming, SIAM Journal on Imaging Sciences 4 (2) (2011) 543–572.
* '''Base algorithm:''' Local (pairwise) alignment algorithm. Kearsley S.K. (1989) Acta Crystallogr., A45, 208-210; Petr Popov, Sergei Grudinin, Rapid determination of RMSDs corresponding to macromolecular rigid body motions, Journal of Computational Chemistry, Wiley, 2014, 35(12), pp. 950-956. DOI: 10.1002/jcc.23569
* '''Solution:''' Two formulations of the optimization problem (via rotation matrices and via quaternions). Convex relaxation of the resulting problems; comparison of the solutions given by the base algorithm and by the relaxations (spectral relaxation, SDP).
* '''Novelty:''' A method that aligns structures by minimizing a loss function that takes all pairwise losses into account.
* '''Authors:''' Sergey Grudinin, consultant: Mikhail Karasikov.
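The difference between pairwise and simultaneous alignment can be written down explicitly. The following is one possible formalization, given here only as a sketch: it assumes that each structure <math>k</math> is stored as a centered coordinate matrix <math>X_k \in \mathbb{R}^{3 \times N}</math> with atoms listed in the same order, and the symbols <math>X_k</math>, <math>R_k</math>, <math>n</math> and <math>N</math> are notation introduced for this illustration.

Pairwise alignment of structures <math>k</math> and <math>l</math>:

<math>\min_{R \in SO(3)} \; \| R X_k - X_l \|_F^{2}.</math>

Simultaneous alignment of <math>n</math> structures with a loss over all pairs:

<math>\min_{R_1, \dots, R_n \in SO(3)} \; \sum_{k < l} \| R_k X_k - R_l X_l \|_F^{2}.</math>

When all structures are identical up to rotation, composing pairwise solutions gives a zero value of the second objective. Otherwise the pairwise rotations are in general not mutually consistent, which is why the problem is relaxed and solved jointly.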
===5. 2017===
* '''Title:''' Local approximation of time series for building predictive metamodels.
* '''Problem:''' Human physical activity is studied from time series of accelerometer measurements. The aim of the project is to create a tool for analyzing the problem of building models that predict models, that is, metamodels. A segment of a time series is considered, and its class must be predicted. (Variant: predict the end of the segment, the next segment, and its class. The class of the next segment may differ from the class of the previous one.)
* '''Data:''' Use the Santa Fe or WISDM sample as a basis (the samples consist of segments with many elementary movements and class labels corresponding to the segments); alternatively, the OPPORTUNITY Activity Recognition Challenge.
* '''References:'''
*# Karasikov M.E., Strijov V.V. Classification of time series in the space of parameters of generating models // Informatics and its applications, 2016. [[http://strijov.com/papers/Karasikov2016TSC.pdf URL]]
*# Kuznetsov M.P., Ivkin N.P. Algorithm for classifying accelerometer time series by combined feature description // Machine Learning and Data Analysis. 2015. Vol. 1, No. 11. P. 1471-1483. [[http://jmlda.org/papers/doc/2015/no11/Ivkin2015TSclassification.pdf URL]]
* '''Base algorithm:''' [Karasikov 2016]
* '''Solution:''' See [[Media:Local_appr.pdf|the problem description]].
* '''Novelty:''' When building metapredictive models (models that predict predictive models), the question of how to use the parameter values of local models remains open. The purpose of this project is to create a tool for analyzing this question.
* '''Authors:''' Strijov V.V.
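The idea of using parameters of local models as a segment description can be sketched with a second-order autoregressive model fitted by least squares. This is only an illustration under stated assumptions: the choice of AR(2), the function name <code>fit_ar2</code>, and the synthetic series are hypothetical and are not taken from the project description.

```python
def fit_ar2(x):
    """Least-squares fit of x[t] ~ a1 * x[t-1] + a2 * x[t-2].

    Returns (a1, a2); these two numbers serve as the description of the segment.
    """
    s11 = s12 = s22 = b1 = b2 = 0.0
    for t in range(2, len(x)):
        s11 += x[t - 1] * x[t - 1]
        s12 += x[t - 1] * x[t - 2]
        s22 += x[t - 2] * x[t - 2]
        b1 += x[t] * x[t - 1]
        b2 += x[t] * x[t - 2]
    det = s11 * s22 - s12 * s12
    if det == 0.0:
        raise ValueError("segment is degenerate: AR(2) parameters are not identifiable")
    # Solve the 2x2 normal equations by Cramer's rule.
    a1 = (b1 * s22 - s12 * b2) / det
    a2 = (s11 * b2 - s12 * b1) / det
    return a1, a2

# Synthetic segment generated by a known AR(2) recursion (damped oscillation).
series = [1.0, 0.5]
for _ in range(28):
    series.append(1.5 * series[-1] - 0.7 * series[-2])

a1, a2 = fit_ar2(series)
```

On this noise-free segment the fit recovers the generating coefficients up to rounding error. On real accelerometer segments the pair (a1, a2) would be one candidate input to a metamodel.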
-
=== Task 6 ===
+
===6. 2017===
-
* '''Name:''' Выбор оптимальной модели рекуррентной сети в Taskх поиска парафраза
+
* '''Title:''' Choosing the optimal recurrent network model in the Paraphrase Search The problems
-
* '''Task''': Задана выборка пар предложений с метками <<похожие>> and <<непохожие>>. Требуется построить рекуррентную сеть небольшой сложности (т.е. с небольшим количеством параметров), доставляющую минимум ошибке классификации пар предложений.
+
* '''Problem:''' Given a selection of pairs of sentences labeled <<similar>> and <<dissimilar>>. It is required to build a recurrent network of low complexity (that is, with a small number of parameters) that delivers a minimum error in the classification of pairs of sentences.
-
* '''Data:''' Предлагается рассмотреть две выборки: [https://www.microsoft.com/en-us/download/details.aspx?id=52398 Microsoft Paraphrase Corpus] (небольшой набор предложений) and [http://sitem.herts.ac.uk/aeru/ppdb/en/ PPDB] (набор коротких сегментов, не всегда корректная разметка)
+
* '''Data:''' It is proposed to consider two samples: [https://www.microsoft.com/en-us/download/details.aspx?id=52398 Microsoft Paraphrase Corpus] (a small set of sentences) and [http ://sitem.herts.ac.uk/aeru/ppdb/en/ PPDB] (set of short segments, markup not always correct)
-
* '''References:''':
+
* '''References:'''
-
** [http://deeplearning.net/tutorial/lstm.html [1]] Пошаговое описание реализации рекуррентной сети LSTM
+
*# [http://deeplearning.net/tutorial/lstm.html [1]] Step by step description of the implementation of the LSTM recurrent network
-
** [http://www.cs.toronto.edu/~graves/nips_2011.pdf [2]] Алгоритм прореживания, основанный на построении сети, обладающей минимальной длиной описания
+
*# [http://www.cs.toronto.edu/~graves/nips_2011.pdf [2]] Thinning algorithm based on building a network with a minimum description length
-
** [3] [http://papers.nips.cc/paper/250-optimal-brain-damage.pdf Optimal Brain Damage]
+
*# [http://papers.nips.cc/paper/250-optimal-brain-damage.pdf Optimal Brain Damage] [3]
-
* '''Basic algorithm''': В качестве базового алгоритма могут выступать:
+
* '''Basic algorithm''': The basic algorithm can be:
-
*# Решение без прореживания
+
*# Solution without thinning
-
*# Решение, описанное в [3]
+
*# Solution described in [3]
-
*# Otimal Brain Damage
+
*# Optimal Brain Damage
-
* '''Solution:''' Предлагается рассмотреть метод прореживания, описанный в [3] с блочной матрицей ковариаций: в качестве блоков выступают либо нейроны, либо параметры с группировкой по входным признакам.
+
* '''Solution:''' It is proposed to consider the thinning method described in [3] with a block covariance matrix: either neurons or parameters grouped by input features act as blocks.
-
* '''Novelty:''' Предложенный метод позволит эффективно снижать сложность рекуррентной сети с учетом взаимосвязи между нейронами или входными признаками.
+
* '''Novelty:''' The proposed method makes it possible to reduce the complexity of a recurrent network efficiently while taking into account the relationships between neurons or between input features.
* '''Authors:''' Oleg Bakhteev, consultant
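As a point of reference for the Optimal Brain Damage baseline listed above, the sketch below ranks parameters by the OBD saliency <code>s_k = h_kk * w_k^2 / 2</code> and zeroes the least salient ones. It assumes that the diagonal of the Hessian has already been estimated; the function names and the toy numbers are illustrative and are not part of the original task statement.

```python
import numpy as np

def obd_saliency(weights, hessian_diag):
    """OBD saliency of each parameter: 0.5 * h_kk * w_k ** 2."""
    weights = np.asarray(weights, dtype=float)
    hessian_diag = np.asarray(hessian_diag, dtype=float)
    return 0.5 * hessian_diag * weights ** 2

def prune_least_salient(weights, hessian_diag, n_prune):
    """Return a copy of the weights with the n_prune least salient entries set to zero."""
    weights = np.asarray(weights, dtype=float).copy()
    order = np.argsort(obd_saliency(weights, hessian_diag))
    weights[order[:n_prune]] = 0.0
    return weights

# Toy example (hypothetical numbers): the second weight is large,
# but its curvature is small, so it is removed first.
pruned = prune_least_salient([1.0, -2.0, 0.5], [1.0, 0.1, 4.0], n_prune=1)
```

The block variant proposed in the task would replace the diagonal term with a block of the covariance matrix; that extension is not shown here.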
-
=== Task 7 ===
+
===7. 2017===
-
* '''Name:''' Детектирование внутреннего плагиата
+
* '''Title:''' Internal plagiarism detection
-
* '''Task''': Решается Task выявления внутренних заимствований в тексте. Требуется проверить гипотезу о том, что заданный текст написан единственным автором, and в случае ее невыполнения выделить заимствованные части текста. Заимствованием считается часть текста, предположительно написанная другим автором and содержащая характерные отличия от стиля основного автора. Требуется разработать такую стилевую функцию, которая позволяет с высокой степенью достоверности отличить стиль основного автора текста от заимствований.
+
* '''Problem:''' The problem of detecting internal borrowings in a text is considered. It is required to test the hypothesis that a given text was written by a single author and, if the hypothesis is rejected, to highlight the borrowed parts of the text. A borrowing is a part of the text that was presumably written by another author and differs characteristically from the style of the main author. It is required to develop a style function that distinguishes the style of the main author from borrowings with a high degree of confidence.
-
* '''Data:''' Предлагается рассмотреть корпус PAN-2011, PAN-2016
+
* '''Data:''' It is proposed to consider the PAN-2011 and PAN-2016 corpora.
-
* '''References:''':
+
* '''References:'''
-
** [http://deeplearning.net/tutorial/lstm.html [1]] Пошаговое описание реализации рекуррентной сети LSTM
+
*# [http://deeplearning.net/tutorial/lstm.html [1]] Step by step description of the implementation of the LSTM recurrent network
-
** [https://arxiv.org/pdf/1608.04485.pdf [2]] Алгоритм кластеризации авторов
+
*# [https://arxiv.org/pdf/1608.04485.pdf [2]] Author clustering algorithm
-
** [http://www.fit.vutbr.cz/imikolov/rnnlm/thesis.pdf [3]] Statistical Language Models Based on Neural Networks
+
*# [http://www.fit.vutbr.cz/imikolov/rnnlm/thesis.pdf [3]] Statistical Language Models Based on Neural Networks
-
** [https://pdfs.semanticscholar.org/1011/6d82a8438c78877a8a142be47c4ee8662138.pdf [4]] Methods for intrinsic plagiarism detection and author diarization
+
*# [https://pdfs.semanticscholar.org/1011/6d82a8438c78877a8a142be47c4ee8662138.pdf [4]] Methods for intrinsic plagiarism detection and author diarization
-
* '''Basic algorithm''': В качестве базового алгоритма может выступать решение, описанное в [4].
+
* '''Basic algorithm''': The solution described in [4] can serve as the basic algorithm.
-
* '''Solution:''' Предлагается рассмотреть метод, описанный в [2] and строить стилевую функцию, основываясь на выходах нейронной сети.
+
* '''Solution:''' It is proposed to consider the method described in [2] and build a style function based on the neural network outputs.
-
* '''Novelty:''' Предполагается, что построение стилевой функции предлагаемым методом может дать прирост качества по сравнению с типичными решениями этой задачи.
+
* '''Novelty:''' It is assumed that building the style function with the proposed method can improve quality compared to typical solutions to this problem.
-
* '''Authors:''' Рита Кузнецова, consultant
+
* '''Authors:''' Rita Kuznetsova, consultant
-
=== Task 8 ===
+
===8. 2017===
-
* '''Name:''' Адаптивные релаксации NP трудных задач через машинное обучение
+
* '''Title:''' Adaptive relaxations of NP-hard problems through machine learning
-
* '''Task''': Современные задачи оптимизации потоков мощности в энергетических сетях приводят к невыпуклым Taskм оптимизации с большим количеством ограничений. Аналогичные по структуре постановки возникают также в ряде других инженерных задач and в классических Taskх комбинаторной оптимизации. Традиционный подход к решению подобных NP трудных задач состоит в написании их выпуклых релаксаций (semidefinite/SDP, second order conic/SOCP, etc), имеющих как правило существенно большее множество допустимых решений, чем в исходной задаче. and последующей проекцией полученного решения в область, где выполнены ограничения исходной задачи. Во многих практических случаях, качество полученного таким образом решения невелико. Альтернативные подходы, например MILP (mixed integer linear programming) релаксации, существенно более трудоемки по времени, но приводят к более точно у ответу.
+
* '''Problem:''' Modern problems of optimizing power flows in power networks lead to non-convex optimization problems with a large number of constraints. Formulations with a similar structure also arise in a number of other engineering problems and in classical combinatorial optimization problems. The traditional approach to such NP-hard problems is to write their convex relaxations (semidefinite/SDP, second-order conic/SOCP, etc.), which usually have a much larger feasible set than the original problem, and then to project the obtained solution onto the region where the constraints of the original problem are satisfied. In many practical cases the quality of a solution obtained this way is low. Alternative approaches, for example MILP (mixed-integer linear programming) relaxations, are substantially more time-consuming but give a more accurate answer.
-
Основная проблема состоит в невозможности применения известных методов для решения задач большой размерности (сети из 1000 узлов and более). Одним из ключевых препятствий является не столько размерность задачи, сколько большое число ограничений. Вместе с тем, в реальных Taskх можно выделить небольшое множество ограничений такое, что множества допустимых точек в выделенном множестве and в исходном весьма близки. Это позволит заменить задачу на иную, с меньшим числом ограничений, что повысит скорость используемых алгоритмов.
+
The main difficulty is that known methods cannot be applied to large-scale problems (networks of 1000 nodes or more). One of the key obstacles is not so much the dimension of the problem as the large number of constraints. At the same time, in real problems it is possible to single out a small subset of constraints such that the feasible set it defines is very close to the feasible set of the original problem. This makes it possible to replace the problem with another one that has fewer constraints, which speeds up the algorithms used.
-
Предлагается использовать методы машинного обучения для построения указанного множества наиболее важных ограничений.
+
It is proposed to use machine learning methods to build the indicated set of the most important constraints.
-
* '''References:''': Методы семплинга/машинного обучения:
+
* '''References:''' Sampling/machine learning methods:
*# Beygelzimer, A., Dasgupta, S., & Langford, J. (2009, June). Importance weighted active learning. In Proceedings of the 26th annual international conference on machine learning (pp. 49-56). ACM.
*# Tong, S., & Koller, D. (2001). Support vector machine active learning with applications to text classification. Journal of machine learning research, 2(Nov), 45-66.
*# Owen, A., & Zhou, Y. (2000). Safe and effective importance sampling. Journal of the American Statistical Association, 95(449), 135-143.
-
Релаксации: Nagarajan, H., Lu, M., Yamangil, E., & Bent, R. (2016). Tightening McCormick Relaxations for Nonlinear Programs via Dynamic Multivariate Partitioning. arXiv preprint arXiv:1606.05806.
+
Relaxations: Nagarajan, H., Lu, M., Yamangil, E., & Bent, R. (2016). Tightening McCormick Relaxations for Nonlinear Programs via Dynamic Multivariate Partitioning. arXiv preprint arXiv:1606.05806.
-
* '''Data:''' данные ieee + matpower содержащие описания энергетических сетей and режимов их функционирования.
+
* '''Data:''' IEEE + MATPOWER data containing descriptions of power networks and their operating modes.
-
* '''Novelty:''' указанный подход, по видимому, является первым применением методов прикладной статистики/машинного обучения для решения трудных оптимизационных задач. Мы ожидаем существенный выигрыш в трудоемки стиль методов
+
* '''Novelty:''' This approach appears to be the first application of applied statistics/machine learning methods to hard optimization problems. We expect a substantial reduction in the computational cost of the methods.
-
* '''Автор''': consultant: Yuri Maksimov, Expert: Михаил Чертков
+
* '''Author''': consultant: Yuri Maksimov, Expert: Mikhail Chertkov
-
=== Task 9 ===
+
===9. 2017===
-
* '''Name:''' Оптимальный алгоритм для восстановления динамических моделей.
+
* '''Title:''' Optimal Algorithm for Reconstruction of Dynamic Models.
-
* '''Task''': Стандартная постановка задач машинного обучения в контексте обучения без учителя (unsupervised learning) предполагает, что примеры (samples) независимы and получены из одного распределения вероятности. Однако зачастую наблюдаемые данные имеют динамическое происхождение and являются коррелироваными. Task состоит в разработке эффективного метода для восстановления динамической графической модели (графа and параметров модели) по наблюдаемым коррелированным динамическим конфигурациям. Эта Task важна с теоретической точки зрения and имеет массу приложений. Основой алгоритма будет служить адаптация нового оптимального метода экранирования взаимодействий (interaction screening), разработанного для модели Изинга. Процесс решения будет сочетать в себе знакомство с теоретическими методами компьютерных наук / машинного обучения and численные эксперименты.
+
* '''Problem:''' The standard statement of a machine learning problem in the unsupervised setting assumes that the samples are independent and drawn from the same probability distribution. However, observed data often have a dynamic origin and are correlated. The problem is to develop an efficient method for reconstructing a dynamic graphical model (the graph and the model parameters) from observed correlated dynamic configurations. This problem is important from a theoretical point of view and has many applications. The algorithm will be based on an adaptation of the new optimal interaction screening method developed for the Ising model. The solution process will combine learning the theoretical methods of computer science/machine learning with numerical experiments.
-
* '''Data:''' Симулированные динамические конфигурации спинов в кинетической модели Изинга.
+
* '''Data:''' Simulated dynamic configurations of spins in the kinetic Ising model.
-
* '''References:''':
+
* '''References:'''
*# Lokhov et al., "Optimal structure and parameter learning of Ising models", arXiv:1612.05024 (2016) {https://arxiv.org/abs/1612.05024}
*# Vuffray et al., "Interaction screening: efficient and sample-optimal learning of Ising models", NIPS 2016 {https://arxiv.org/abs/1605.07252}
*# Decelle and Zhang, "Inference of the sparse kinetic Ising model using the decimation method", Phys. Rev. E 2016 {https://arxiv.org/abs/1502.01660}
*# Bresler et al., "Learning graphical models from the Glauber dynamics", Allerton 2014 {https://arxiv.org/abs/1410.7659}
-
*# Zeng et al., "Maximum likelihood reconstruction for Ising models with asynchronous updates", Phys. Rev. Lett. 2013 {https://arxiv.org/abs/1209.2401}
+
*# Zeng et al., "Maximum likelihood reconstruction for Ising models with asynchronous updates", Phys. Rev. Lett. 2013
-
* '''Basic algorithm:''' Динамический метод экранирования взаимодействий. Сравнение с методом максимального правдоподобия.
+
* '''Base algorithm:''' Dynamic interaction screening method. Comparison with the maximum likelihood method.
-
* '''Novelty:''' В настоящее время оптимальный (т.е. использующий минимальное возможное количество примеров) алгоритм для данной задачи неизвестен. Динамический метод экранирования взаимодействия имеет хорошие шансы окончательно "закрыть" эту задачу, т.к. является оптимальным для статической задачи.
+
* '''Novelty:''' Currently, no optimal algorithm (i.e., one that uses the minimum possible number of samples) is known for this problem. The dynamic interaction screening method has a good chance of finally "closing" this problem, because it is optimal for the static problem.
-
* '''Автор''': consultants Андрей Лохов, Yuri Maksimov. Expert Михаил Чертков
+
* '''Author''': consultants Andrey Lokhov, Yuri Maksimov. Expert Mikhail Chertkov
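For orientation, the static interaction screening objective from Vuffray et al. can be sketched as follows. This is a minimal illustration for one node without an external field and without the l1 penalty; it is not the dynamic variant that the task asks to develop, and the function names and toy spin configurations are assumptions made for the example.

```python
import numpy as np

def interaction_screening_objective(theta_u, samples, u):
    """Mean of exp(-s_u * sum_j theta_uj * s_j) over the observed spin configurations."""
    spins_u = samples[:, u]
    others = np.delete(samples, u, axis=1)
    return np.mean(np.exp(-spins_u * (others @ theta_u)))

def interaction_screening_gradient(theta_u, samples, u):
    """Gradient of the objective with respect to the couplings of node u."""
    spins_u = samples[:, u]
    others = np.delete(samples, u, axis=1)
    weights = np.exp(-spins_u * (others @ theta_u))
    return -((weights * spins_u) @ others) / len(samples)

# Toy data: four configurations of three +-1 spins (hypothetical).
samples = np.array([[1, -1, 1], [-1, -1, 1], [1, 1, -1], [1, 1, 1]], dtype=float)
theta0 = np.zeros(2)
value_at_zero = interaction_screening_objective(theta0, samples, u=0)
grad_at_zero = interaction_screening_gradient(theta0, samples, u=0)
```

The objective is convex in <code>theta_u</code>, so any first-order method can minimize it; at zero couplings its value is 1 and its gradient is minus the empirical pair correlation.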
-
=== Task 10 ===
+
===10. 2017===
-
* '''Name:''' Выбор интерпретируемых мультимоделей в Taskх кредитного скоринга
+
* '''Title:''' Selection of Interpretable Multimodels in Credit Scoring Problems
-
* '''Task''': Task кредитного скоринга заключается в определении уровня кредитоспособности заемщика. Для этого используется анкета заемщика, содержащая как числовые (возраст, доход), так and категориальные признаки (пол, профессия). Требуется, имея историческую информацию о возвратах кредитов другими заемщиками, определить, вернет ли заемщик кредит. Данные могут быть разнородными (например, в случае наличия в стране разных регионов по доходу), and для адекватной классификации потребуется несколько моделей. Необходимо определить оптимальное число моделей. По набору параметров моделей необходимо составить портрет заемщика.
+
* '''Problem:''' The credit scoring problem is to determine the creditworthiness of a borrower. A borrower's application form is used for this; it contains both numerical features (age, income) and categorical features (gender, profession). Given historical information about loan repayment by other borrowers, it is required to determine whether the borrower will repay the loan. The data can be heterogeneous (for example, if a country has regions with different income levels), so several models are needed for adequate classification. It is necessary to determine the optimal number of models and to draw up a portrait of the borrower from the set of model parameters.
-
* '''Data:''' Предлагается рассмотреть пять выборок из репозиториев UCI and Kaggle, мощностью от 50000 объектов.
+
* '''Data:''' It is proposed to consider five datasets from the UCI and Kaggle repositories, each containing at least 50,000 objects.
-
* '''References:''': Диссертация А.А. Адуенко \MLAlgorithms\PhDThesis; С. Bishop, Pattern recognition and machine learning, последняя глава; 20 years of Mixture experts.
+
* '''References:''' A.A. Aduenko \MLAlgorithms\PhDThesis; C. Bishop, Pattern recognition and machine learning, final chapter; 20 years of Mixture experts.
-
* '''Basic algorithm:''' Кластеризация and построение независимых моделей логистической регрессии, Адабуст, Решающий лес (с ограничениями на сложность), Смесь Expertов.
+
* '''Base algorithm:''' Clustering followed by independent logistic regression models, AdaBoost, decision forest (with restrictions on complexity), mixture of experts.
-
* '''Solution:''' Предлагается алгоритм выбора мультимодели (смеси моделей или смеси Expertов) and определения оптимального числа моделей.
+
* '''Solution:''' An algorithm is proposed for selecting a multimodel (a mixture of models or a mixture of experts) and determining the optimal number of models.
-
* '''Novelty:''' Предлагается функция расстояния между моделями, в которых распределения параметров заданы на разных носителях.
+
* '''Novelty:''' A distance function is proposed between models whose parameter distributions are defined on different supports.
-
* '''Authors:''' А.А. Адуенко, Strizhov V.V..
+
* '''Authors:''' A.A. Aduenko, Strijov V.V.
-
=== Task 11 ===
+
===11. 2017===
-
* '''Name:''' Выбор признаков в Taskх авторегрессионного прогнозирования биомедицинских сигналов.
+
* '''Title:''' Feature Selection in Problems of Autoregressive Prediction of Biomedical Signals.
-
* '''Task''': Решается Task прогнозирования биомедицинских сигналов and сигналов интернета вещей. Требуется спрогнозировать вектор – несколько следующих отсчетов сигнала. Предполагается, что собственную размерность пространства как прогнозируемой переменной, так and независимой переменной можно существенно снизить, увеличив тем самым устойчивость прогноза без существенной потери точности. Для этого используется подход Partial Least Squares в авторегрессионном прогнозировании.
+
* '''Problem:''' The problem of forecasting biomedical signals and Internet of Things (IoT) signals is considered. It is required to predict a vector consisting of the next several samples of the signal. It is assumed that the intrinsic dimension of both the predicted variable and the independent variable can be reduced substantially, which makes the forecast more stable without a significant loss of accuracy. The Partial Least Squares (PLS) approach is used for this in autoregressive forecasting.
-
* '''Data:''' Выборка биомедицинских временных рядов SantaFe, выборка сигналов интернета вещей.
+
* '''Data:''' SantaFe biomedical time series sample, IoT signal sample.
-
* '''References:''': Katrutsa A.M., Strijov V.V. Stresstest procedure for feature selection algorithms // Chemometrics and Intelligent Laboratory Systems, 2015, 142 : 172-183; : Katrutsa A.M., Strijov V.V. Comprehensive study of feature selection methods to solve multicollinearity problem according to evaluation criteria // Expert Systems with applications, 2017; Kee Siong Ng A Simple Explanation of Partial Least Squares keesiong.ng@gopivotal.com Draft, April 27, 2013, http://users.cecs.anu.edu.au/~kee/pls.pdf
+
* '''References:''' Katrutsa A.M., Strijov V.V. Stress test procedure for feature selection algorithms // Chemometrics and Intelligent Laboratory Systems, 2015, 142: 172-183; Katrutsa A.M., Strijov V.V. Comprehensive study of feature selection methods to solve multicollinearity problem according to evaluation criteria // Expert Systems with Applications, 2017; Kee Siong Ng. A Simple Explanation of Partial Least Squares. Draft, April 27, 2013, http://users.cecs.anu.edu.au/~kee/pls.pdf
-
* '''Basic algorithm:''' PLS, алгоритм квадратичной оптимизации для выбора признаков.
+
* '''Base algorithm:''' PLS, quadratic optimization algorithm for feature selection.
-
* '''Solution:''' построить матрицу плана с субоптимальным набором объектов and признаков, предложить функцию ошибки квадратичной оптимизации (по возможности развить на случай тензорного представления матрицы плана).
+
* '''Solution:''' build a design matrix with a suboptimal set of objects and features, propose a quadratic optimization error function (if possible, develop it for the case of a tensor representation of the design matrix).
-
* '''Novelty:''' Обобщен алгоритм выбора признаков (опубликованный две недели назад) для случая PLS.
+
* '''Novelty:''' The feature selection algorithm (published two weeks ago) is generalized to the PLS case.
-
* '''Authors:''' А.М. Катруца, Strizhov V.V..
+
* '''Authors:''' A.M. Katrutsa, Strijov V.V.
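The autoregressive setting above can be illustrated by building the design matrix from a single time series: each row of <code>X</code> holds a window of past samples and the matching row of <code>Y</code> holds the next few samples to forecast. This is a minimal sketch; the function name, the window sizes and the toy series are assumptions made for the example, and the PLS projection and feature selection steps are not shown.

```python
import numpy as np

def autoregressive_design(series, n_lags, horizon):
    """Split a 1-D series into lagged inputs X and multi-step targets Y."""
    series = np.asarray(series, dtype=float)
    n_rows = len(series) - n_lags - horizon + 1
    if n_rows <= 0:
        raise ValueError("series is too short for the requested window sizes")
    # Row i uses samples i .. i+n_lags-1 to predict the following `horizon` samples.
    X = np.stack([series[i:i + n_lags] for i in range(n_rows)])
    Y = np.stack([series[i + n_lags:i + n_lags + horizon] for i in range(n_rows)])
    return X, Y

# Toy series 0, 1, ..., 9 with 4 lags and a 2-step forecast (hypothetical sizes).
X, Y = autoregressive_design(np.arange(10), n_lags=4, horizon=2)
```

Neighboring columns of <code>X</code> and <code>Y</code> built this way are strongly correlated, which is the reason a dimension-reducing method such as PLS is a natural fit.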
-
=== Task 12 ===
+
===12. 2017===
-
* '''Name:''' Massively multitask deep learning for drug discovery
+
* '''Title:''' Massively multitask deep learning for drug discovery
-
* '''Task''': Разработать мультитасковую рекурентную нейронную сеть для предсказания биологической активности. Для каждой пары "молекула-протеин" требуется предсказать бинарную величину 0/1, означающую, что молекула связывается/не связывается с протеином.
+
* '''Problem:''' Develop a multitask recurrent neural network to predict biological activity. For each molecule-protein pair, it is required to predict a binary value 0/1 that indicates whether the molecule binds to the protein.
-
* '''Data:''' разреженные данные биологической активности для ~100K молекул против ~ 1000 протеинов. Молекулы представлены в формате SMILES строк (последовательность символов, кодирующая молекулу)
+
* '''Data:''' sparse biological activity data for ~100K molecules versus ~1000 proteins. Molecules are represented as SMILES strings (sequence of characters encoding a molecule)
-
* '''References:''': https://arxiv.org/pdf/1502.02072
+
* '''References:''' https://arxiv.org/pdf/1502.02072
-
* '''Basic algorithm:''' мультитасковая нейросеть, предсказывающая активность по числовым признакам, однотасковая рекурентная нейросеть
+
* '''Base algorithm:''' A multitask neural network that predicts activity from numerical features; a single-task recurrent neural network.
-
* '''Solution:''' Мультитасковость означает, что требуется построить модель, которая получается на вход молекулу and предсказывает её биологическую активность против всех протеинов в выборке.
+
* '''Solution:''' Multitasking means that the model must take a molecule as input and predict its biological activity against all proteins in the sample.
-
* '''Novelty:''' Существующие методы не показали существенного улучшения качества DL модели по сравнению со стандартными ML моделями
+
* '''Novelty:''' Existing methods did not show a significant improvement in the quality of the DL model compared to standard ML models
* '''Authors:''' Expert -- Alexander Isaev, consultant -- Maria Popova
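To feed SMILES strings to a recurrent network they first have to be turned into numeric sequences. The sketch below shows one simple option, a character-level one-hot encoding with padding. It is an assumption made for illustration: real SMILES contain multi-character tokens (for example <code>Cl</code> and <code>Br</code>) that a production tokenizer would keep together, and the function names are hypothetical.

```python
import numpy as np

def build_vocab(smiles_list):
    """Map every character seen in the data to an index; index 0 is reserved for padding."""
    chars = sorted({ch for smiles in smiles_list for ch in smiles})
    return {ch: i + 1 for i, ch in enumerate(chars)}

def one_hot(smiles, vocab, max_len):
    """Encode a SMILES string as a (max_len, vocab size + 1) one-hot matrix."""
    encoded = np.zeros((max_len, len(vocab) + 1), dtype=np.float32)
    for pos in range(max_len):
        # Positions past the end of the string are marked with the padding index.
        idx = vocab[smiles[pos]] if pos < len(smiles) else 0
        encoded[pos, idx] = 1.0
    return encoded

# Ethanol and formaldehyde as toy inputs.
vocab = build_vocab(["CCO", "C=O"])
ethanol = one_hot("CCO", vocab, max_len=4)
```

Strings longer than <code>max_len</code> are truncated by this sketch, so <code>max_len</code> should be chosen from the length distribution of the dataset.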
-
=== Task 13 ===
+
===13. 2017===
-
* '''Name:''' Unsupervised representation for molecules
+
* '''Title:''' Unsupervised representation for molecules
-
* '''Task''': Разработать unsupervised метод для репрезентации молекул
+
* '''Problem:''' Develop an unsupervised method for representing molecules
-
* '''Data:''' ~1.5M молекул в формате SMILES строк (последовательность символов, кодирующая молекулу)
+
* '''Data:''' ~1.5M molecules in SMILES string format (character sequence encoding the molecule)
-
* '''References:''': https://www.cs.toronto.edu/~hinton/science.pdf
+
* '''References:''' https://www.cs.toronto.edu/~hinton/science.pdf
-
* '''Basic algorithm:''' в настоящее время в качестве такой репрезентации используются выделенные вручную числовые признаки. Качество полученых репрезентаций можно сравнить с датасетом tox21 (10К молекул против 12 протеинов)
+
* '''Base algorithm:''' Hand-crafted numerical features are currently used as such a representation. The quality of the resulting representations can be evaluated on the tox21 dataset (10K molecules versus 12 proteins).
-
* '''Solution:''' использовать свёрточные или рекуррентные сети для построения автоэнкодера.
+
* '''Solution:''' use convolutional or recurrent networks to build an autoencoder.
-
* '''Novelty:''' построение end-to-end модели для получения информативных признаков
+
* '''Novelty:''' building an end-to-end model to get informative features
* '''Authors:''' Expert -- Alexander Isaev, consultant -- Maria Popova
-
=== Task 14 ===
+
===14. 2017===
-
* '''Name:''' Внутритекстовая когерентность как мера интерпретируемости тематических моделей текстовых коллекций.
+
* '''Title:''' Intratext coherence as a measure of interpretability of topic models of text collections.
-
* '''Task''': Интерпретируемость – это субъективная характеристика качества тематических моделей, измеряемая с помощью Expertных оценок. Когерентность – это мера совстречаемости тематических слов, вычислимая по тексту автоматически and хорошо коррелирующая с интерпретируемостью, как показано в серии публикаций Ньюмана and Мимно. Первая Task – оценить репрезентативность последовательности слов текста, по которым оценивается когерентность. Вторая Task – сравнить несколько новых методов измерения интерпретируемости and когерентности, основанных на выделении наиболее репрезентативной последовательности слов в исходном тексте.
+
* '''Problem:''' Interpretability is a subjective characteristic of topic model quality, measured by expert assessments. Coherence is a measure of the co-occurrence of topic words; it is computed from the text automatically and correlates well with interpretability, as shown in a series of publications by Newman and Mimno. The first problem is to evaluate the representativeness of the word sequence in the text from which coherence is estimated. The second problem is to compare several new methods of measuring interpretability and coherence that are based on selecting the most representative word sequence in the source text.
-
* '''Data:''' Коллекция научно-популярного контента ПостНаука, коллекция новостного контента.
+
* '''Data:''' A collection of popular science content PostNauka, a collection of news content.
-
* '''References:''':
+
* '''References:'''
-
*#''Vorontsov K. V.'' [[Media:voron17survey-artm.pdf|Обзор вероятностных тематических моделей]], 2017.
+
*# Vorontsov K. V. [[Media:voron17survey-artm.pdf|Review of probabilistic topic models]], 2017.
-
*#''N.Aletras, M.Stevenson.'' Evaluating Topic Coherence Using Distributional Semantics, 2013.
+
*# N. Aletras, M. Stevenson. Evaluating Topic Coherence Using Distributional Semantics, 2013.
-
*#''D.Newman et al.'' Automatic evaluation of topic coherence, 2010
+
*# D. Newman et al. Automatic evaluation of topic coherence, 2010
-
*#''D.Mimno et al.'' Optimizing semantic coherence in topic models, 2011
+
*# D. Mimno et al. Optimizing semantic coherence in topic models, 2011
-
*#http://palmetto.aksw.org/palmetto-webapp/
+
*# http://palmetto.aksw.org/palmetto-webapp/
-
* '''Basic algorithm:''' Стандартные методы оценивания интерпретируемости and когерентности тем в тематических моделях.
+
* '''Base algorithm:''' Standard methods for estimating the interpretability and coherence of topics in topic models.
-
* '''Solution:''' Новый метод измерения интерпретируемости and когерентности, эксперименты по поиску максимально коррелирующих мер интерпретируемости and когерентности, аналогичные [D.Newman, 2010].
+
* '''Solution:''' A new method for measuring interpretability and coherence, experiments to find the most correlated measures of interpretability and coherence, similar to [D.Newman, 2010].
-
* '''Novelty:''' внутритекстовые меры интерпретируемости and когерентности ранее не предлагались.
+
* '''Novelty:''' Intratext measures of interpretability and coherence have not been proposed before.
-
* '''Authors:''' Vorontsov K. V.. consultants: Viktor Bulatov, Анна Потапенко, Артём Попов.
+
* '''Authors:''' Vorontsov K. V.; consultants: Viktor Bulatov, Anna Potapenko, Artyom Popov.
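As an example of the standard coherence measures used as the baseline, the sketch below computes the document co-occurrence coherence of Mimno et al. (2011): the sum over ordered pairs of top words of <code>log((D(w_m, w_l) + 1) / D(w_l))</code>, where <code>D</code> counts documents containing the given words. The function name and the toy collection are assumptions made for the example; the intratext measures proposed in the task are different and are not shown.

```python
import math

def umass_coherence(top_words, documents):
    """Coherence of a ranked list of topic words from document co-occurrence counts."""
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain all the given words.
        return sum(all(w in doc for w in words) for doc in doc_sets)

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            freq_l = doc_freq(top_words[l])
            if freq_l == 0:
                raise ValueError(f"word {top_words[l]!r} does not occur in the collection")
            score += math.log((doc_freq(top_words[m], top_words[l]) + 1) / freq_l)
    return score

# Toy collection of four tokenized documents (hypothetical).
docs = [["a", "b"], ["a", "b", "c"], ["a"], ["c"]]
coherent = umass_coherence(["a", "b"], docs)
less_coherent = umass_coherence(["a", "c"], docs)
```

Words that appear together more often give a higher score, so in the toy collection the pair <code>a, b</code> scores higher than the pair <code>a, c</code>.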
-
=== Task 15 ===
+
===15. 2017===
-
* '''Name:''' Агрегирование гетерогенных текстовых коллекций в иерархической тематической модели русскоязычного научно-популярного контента.
+
* '''Title:''' Aggregation of heterogeneous text collections in a hierarchical topic model of Russian-language popular science content.
-
* '''Task''': Реализовать and сравнить несколько способов объединения текстовых коллекций из различных источников в одну иерархическую тематическую модель. Построить классификатор, определяющий наличие темы в источнике.
+
* '''Problem:''' Implement and compare multiple ways of combining text collections from different sources into one hierarchical topic model. Build a classifier that determines the presence of a topic in the source.
-
* '''Data:''' Коллекция научно-популярного контента ПостНаука, коллекция Википедии.
+
* '''Data:''' Collection of popular science content PostNauka, Wikipedia collection.
-
* '''References:''':
+
* '''References:'''
-
*#''Vorontsov K. V.'' [[Media:voron17survey-artm.pdf|Обзор вероятностных тематических моделей]], 2017.
+
*# Vorontsov K. V. [[Media:voron17survey-artm.pdf|Review of probabilistic topic models]], 2017.
-
*#''Чиркова Н. А, Vorontsov K. V.'' [http://jmlda.org/papers/doc/2016/no2/Chirkova2016hARTM.pdf Аддитивная регуляризация мультимодальных иерархических тематических моделей] // Машинное обучение and анализ данных, 2016. T. 2. 2.
+
*# Chirkova N. A., Vorontsov K. V. [http://jmlda.org/papers/doc/2016/no2/Chirkova2016hARTM.pdf Additive regularization of multimodal hierarchical topic models] // Machine Learning and Data Analysis, 2016. Vol. 2, No. 2.
-
* '''Basic algorithm:''' Алгоритм построения тематической иерархии в BigARTM, реализованный Надеждой Чирковой. Инструмент для разметки
+
* '''Base algorithm:''' The algorithm for building a topic hierarchy in BigARTM, implemented by Nadezhda Chirkova. A labeling tool.
-
* '''Solution:''' Построить тематическую модель с модальностями источников and выделить темы, характерные только для одного из источников. Подготовить выборку для обучения классификатора, определяющего наличие темы в источнике.
+
* '''Solution:''' Build a topic model with source modalities and highlight topics specific to only one of the sources. Prepare a sample for training a classifier that determines the presence of a topic in the source.
-
* '''Novelty:''' Аддитивная регуляризация тематических моделей к данной задаче ранее не применялась.
+
* '''Novelty:''' Additive regularization of topic models has not been applied to this problem before.
-
* '''Authors:''' Vorontsov K. V.. consultants: Александр Романенко, Ирина Ефимова, Надежда Чиркова.
+
* '''Authors:''' Vorontsov K. V.; consultants: Alexander Romanenko, Irina Efimova, Nadezhda Chirkova.
-
=== Task 16 ===
+
===16. 2017===
-
* '''Name:''' Применение методов символьной динамики в технологии информационного анализа электрокардиосигналов.
+
* '''Title:''' Application of symbolic dynamics methods in the technology of information analysis of electrocardiosignals.
-
* '''Task''': Технология информационного анализа электрокардиосигналов, предложенная В.М.Успенским, предполагает преобразование сырого сигнала в символьную последовательность and поиск паттернов заболеваний в даннйо последовательности. До сих пор для поиска паттернов использовались преимущественно символьные n-граммы. В рамках данной работы предлагается расширить класс шаблонов, в котором производится поиск диагностических признаков заболеваний. Критерий качества -- AUC and MAP ранжирования диагнозов.
+
* '''Problem:''' The technology of information analysis of electrocardiosignals proposed by V.M. Uspensky converts the raw signal into a symbolic sequence and searches this sequence for disease patterns. So far, mostly symbolic n-grams have been used to search for patterns. This work proposes to expand the class of patterns in which diagnostic signs of diseases are sought. Quality criteria: AUC and MAP of the diagnosis ranking.
-
* '''Data:''' Выборка электрокардиограмм с известными диагнозами.
+
* '''Data:''' A selection of electrocardiograms with known diagnoses.
-
* '''References:''':
+
* '''References:'''
-
*#''Успенский В.М.'' Информационная функция сердца. Теория and практика диагностики заболеваний внутренних органов методом информационного анализа электрокардиосигналов.- М.:«Экономика and информация», 2008. - 116с
+
*# Uspensky V.M. Informational function of the heart. Theory and practice of diagnosing diseases of internal organs by the method of informational analysis of electrocardiosignals. Moscow: "Economics and Information", 2008. 116 p.
-
*#[[Технология информационного анализа электрокардиосигналов]].
+
*# Technology of information analysis of electrocardiosignals.
-
* '''Basic algorithm:''' Методы классификации .
+
* '''Base algorithm:''' Classification methods .
-
* '''Solution:''' Поиск логических закономерностей в символьных строках, методы символьной динамики, сравнение алгоритмов по критериям качества AUC and MAP (ранжирования диагнозов).
+
* '''Solution:''' Search for logical patterns in symbolic strings, symbolic dynamics methods, and comparison of algorithms by the quality criteria AUC and MAP (of the diagnosis ranking).
-
* '''Novelty:''' До сих пор для поиска паттернов использовались преимущественно символьные n-граммы.
+
* '''Novelty:''' So far, mostly symbolic n-grams have been used to search for patterns.
-
* '''Authors:''' Vorontsov K. V.. consultants: Влада Целых.
+
* '''Authors:''' Vorontsov K. V.; consultant: Vlada Tselykh.
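The n-gram baseline mentioned in this problem can be illustrated with a minimal sketch. It assumes the electrocardiosignal has already been converted into a string of symbols; the alphabet and the example string <code>codegram</code> are hypothetical, and the function only computes relative n-gram frequencies that could serve as features for a classifier.

```python
from collections import Counter


def ngram_frequencies(symbols, n):
    """Return the relative frequency of every n-gram in a symbol string."""
    total = len(symbols) - n + 1
    if total <= 0:
        return {}  # the string is shorter than n, so there are no n-grams
    counts = Counter(symbols[i:i + n] for i in range(total))
    return {gram: count / total for gram, count in counts.items()}


# Hypothetical symbolic sequence obtained from a raw signal.
codegram = "ABCABCAB"
features = ngram_frequencies(codegram, 3)
```

Richer template classes, as the problem proposes, would replace the fixed-length window with more general patterns.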
-
=== Task Vorontsov +===
+
===Vorontsov+===
* '''Title''': Dynamic hierarchical thematic model of the news flow.
* '''Title''': Dynamic hierarchical thematic model of the news flow.
-
* '''Task''': Develop an algorithm for classifying topics in news flows into new and ongoing ones. Apply the obtained criteria for creating new topics at all levels of the topic model hierarchy when adding the next piece of data to the text collection (for example, all news for one day).
+
* '''Problem:''' Develop an algorithm for classifying topics in news flows into new and ongoing ones. Apply the obtained criteria for creating new topics at all levels of the topic model hierarchy when adding the next piece of data to the text collection (for example, all news for one day).
* '''Data:''' Collection of news in Russian. A subsample of news classified into two classes: new and ongoing topics.
* '''Data:''' Collection of news in Russian. A subsample of news classified into two classes: new and ongoing topics.
* '''Literature''':
* '''Literature''':
*#''Vorontsov K.V.'' [[Media:voron17survey-artm.pdf|Review of probabilistic thematic models]], 2017.
*#''Vorontsov K.V.'' [[Media:voron17survey-artm.pdf|Review of probabilistic thematic models]], 2017.
-
*#''Chirkova N. A, Vorontsov K. V.'' [http://jmlda.org/papers/doc/2016/no2/Chirkova2016hARTM.pdf Additive regularization of multimodal hierarchical topic models] // Machine Learning and Data Analysis , 2016. T. 2. No. 2.
+
*#''Chirkova N. A., Vorontsov K. V.'' [http://jmlda.org/papers/doc/2016/no2/Chirkova2016hARTM.pdf Additive regularization of multimodal hierarchical topic models] // Machine Learning and Data Analysis, 2016. Vol. 2, No. 2.
* '''Basic Algorithm''': An algorithm for constructing a thematic hierarchy in BigARTM, implemented by Nadezhda Chirkova. Known Topic Detection & Tracking algorithms.
* '''Basic Algorithm''': An algorithm for constructing a thematic hierarchy in BigARTM, implemented by Nadezhda Chirkova. Known Topic Detection & Tracking algorithms.
* '''Solution''': Using BigARTM, selecting regularizers and their parameters, using the topic selection regularizer. Building an algorithm for classifying topics into new and ongoing.
* '''Solution''': Using BigARTM, selecting regularizers and their parameters, using the topic selection regularizer. Building an algorithm for classifying topics into new and ongoing.
Line 3236: Line 3664:
* '''Authors''': KV Vorontsov. Consultants: Alexander Romanenko, Artyom Popov.
* '''Authors''': KV Vorontsov. Consultants: Alexander Romanenko, Artyom Popov.
-
=== Task Antiplagiarism + ===
+
===Antiplagiarism + ===
-
* '''Name:''' Отбор кандидатов в задаче поиска текстовых заимствований с перефразированием, основанный на векторизации текстовых фрагментов.
+
* '''Title:''' Selection of Candidates in the Problem of Finding Text Borrowings with Paraphrasing Based on the Vectorization of Text Fragments.
-
* '''Task''': Поиск текстовых заимствований по коллекции документов предполагает отбор небольшого множества кандидатов для последующего детального анализа. Task отбора кандидатов формулируется как поиск оптимального ранжирования документов коллекции по запросу относительно некоторой функции, являющейся оценкой для общей длины заимствований из документа коллекции в документ-запрос.
+
* '''Problem:''' Searching for text borrowings in a collection of documents involves selecting a small set of candidates for subsequent detailed analysis. The candidate selection problem is formulated as finding the optimal ranking of the documents in the collection for a query with respect to some function that estimates the total length of borrowings from a collection document in the query document.
* '''Data:''' [http://pan.webis.de/clef11/pan11-web/plagiarism-detection.html PAN]
* '''Data:''' [http://pan.webis.de/clef11/pan11-web/plagiarism-detection.html PAN]
-
* '''References:''':
+
* '''References:'''
-
*#''Романов А.В., Хританков А.С.'' Отбор кандидатов при поиске заимствований в коллекции документов на иностранном языке [http://www.machinelearning.ru/wiki/images/c/c4/6.Romanov.pdf pdf]
+
*# Romanov A.V., Khritankov A.S. Selection of candidates when searching for borrowings in a collection of documents in a foreign language [http://www.machinelearning.ru/wiki/images/c/c4/6.Romanov.pdf pdf]
-
* '''Basic algorithm''': метод шинглов с построением обратного индекса.
+
* '''Basic algorithm''': the shingling method with construction of an inverted index.
-
* '''Solution:''' Векторизация фрагментов текста (word embeddings + свёрточные / рекуррентные нейронные сети) and последующий поиск ближайших объектов в многомерном метрическом пространстве.
+
* '''Solution:''' Vectorization of text fragments (word embeddings + convolutional / recurrent neural networks) and subsequent search for nearest objects in a multidimensional metric space.
-
* '''Novelty:''' новый подход к решению задачи.
+
* '''Novelty:''' a new approach to solving the problem.
-
* '''Authors:''' Алексей Романов (consultant)
+
* '''Authors:''' Alexey Romanov (consultant)
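The baseline named above (shingling with an inverted index) can be sketched as follows. This is a minimal illustration, not the consultant's implementation: the word-level shingle length <code>k</code>, the function names, and the ranking by the number of shared shingles (a crude proxy for the total borrowing length) are all assumptions.

```python
from collections import defaultdict


def shingles(text, k=3):
    """Return the set of word-level k-shingles of a text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def build_inverted_index(collection, k=3):
    """Map every shingle to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in collection.items():
        for shingle in shingles(text, k):
            index[shingle].add(doc_id)
    return index


def rank_candidates(query, index, k=3, top=10):
    """Rank documents by the number of shingles shared with the query."""
    hits = defaultdict(int)
    for shingle in shingles(query, k):
        for doc_id in index.get(shingle, ()):
            hits[doc_id] += 1
    # Sort by descending hit count, then by id for a stable order.
    return sorted(hits.items(), key=lambda item: (-item[1], item[0]))[:top]
```

Exact shingle matching misses paraphrased borrowings, which is the motivation for the vectorization approach proposed in the solution.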
-
== Additional tasks ==
+
Additional projects
 +
=== Vorontsov+===
 +
* '''Title:''' Thematic modeling of an economic sector based on bank transaction data.
 +
* '''Problem:''' Test the hypothesis that a large sample of transactions between firms is adequately described by a relatively small set of economic activities (that is, topics). The problem reduces to factorizing the matrix of transactional data "buyers × sellers" into the product of three non-negative matrices "buyers × topics", "topics × topics", "topics × sellers", where the middle matrix describes a directed graph of financial flows in the industry. It is required to compare several methods for constructing such factorizations and find the number of topics for which the observed set of transactions is modeled with sufficient accuracy.
 +
* '''Data:''' A sample of transactions between firms of the form "buyer, seller, volume".
 +
* '''References:'''
 +
*# Vorontsov K. V. [[Media:voron17survey-artm.pdf|Review of probabilistic thematic models]], 2017.
 +
* '''Base algorithm:''' Standard methods of non-negative matrix factorization.
 +
* '''Solution:''' Regularized EM algorithm for sparse non-negative matrix factorizations. Visualization of the graph of financial flows. Testing the algorithm on synthetic data, testing the hypothesis about the stability of sparse solutions.
 +
* '''Novelty:''' Thematic modeling has not previously been applied to the analysis of financial transactional data.
 +
* '''Authors:''' Vorontsov K. V.; consultants: Viktor Safronov, Rosa Aisina.
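The three-factor decomposition "buyers × topics", "topics × topics", "topics × sellers" can be sketched with plain multiplicative updates for the squared Frobenius error. This is an assumption-laden illustration of a standard non-negative factorization baseline, not the regularized EM algorithm proposed in the solution; the function names, the random initialization and the iteration count are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(x, a, s, b):
    """Squared Frobenius norm of x - a @ s @ b."""
    approx = matmul(matmul(a, s), b)
    return sum((u - v) ** 2 for ru, rv in zip(x, approx) for u, v in zip(ru, rv))


def multiplicative_step(target, numer, denom, eps=1e-12):
    """Elementwise update target * numer / denom; keeps entries non-negative."""
    return [[t * n / (d + eps) for t, n, d in zip(rt, rn, rd)]
            for rt, rn, rd in zip(target, numer, denom)]


def tri_factorize(x, n_topics, n_iter=200, seed=0):
    """Approximate x (buyers x sellers) by a @ s @ b with non-negative factors."""
    rng = random.Random(seed)

    def rand(rows, cols):
        return [[rng.random() + 0.1 for _ in range(cols)] for _ in range(rows)]

    a = rand(len(x), n_topics)        # buyers x topics
    s = rand(n_topics, n_topics)      # topics x topics (flow graph)
    b = rand(n_topics, len(x[0]))     # topics x sellers
    for _ in range(n_iter):
        sb = matmul(s, b)
        a = multiplicative_step(a, matmul(x, transpose(sb)),
                                matmul(a, matmul(sb, transpose(sb))))
        a_s = matmul(a, s)
        b = multiplicative_step(b, matmul(transpose(a_s), x),
                                matmul(matmul(transpose(a_s), a_s), b))
        at, bt = transpose(a), transpose(b)
        s = multiplicative_step(s, matmul(matmul(at, x), bt),
                                matmul(matmul(matmul(at, a), s), matmul(b, bt)))
    return a, s, b
```

Running the sketch for several values of <code>n_topics</code> and comparing <code>frobenius_error</code> is one simple way to explore how many topics are needed; sparsity regularizers would have to be added separately.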
-
=== Task Vorontsov + ===
+
===scoring+===
-
* '''Name:''' Тематическое моделирование отрасли экономики по транзакционным данным банка.
+
* '''Title:''' Generating and selecting features when building a credit scoring model.
-
* '''Task''': Проверить гипотезу, что большая выборка транзакций между фирмами достаточно хорошо описывается относительно небольшим множеством видов экономической деятельности (они же темы). Task сводится к разложению матрицы транзакционных данных «покупатели × продавцы» в произведение трёх неотрицательных матриц «покупатели × темы», «темы × темы», «темы × продавцы», при этом средняя матрица описывает направленный граф финансовых потоков в отрасли. Требуется сравнить несколько методов построения таких разложений and найти число тем, при котором наблюдаемое множество транзакций моделируется с достаточной точностью.
+
* '''Problem:''' Credit scoring models are built step by step. In particular, a number of independent transformations of individual features are performed, and new features are generated. Each step uses its own quality criterion. It is required to build a scoring model that adequately describes the sample. Maximizing the quality of the model at each step does not guarantee the maximum quality of the resulting model. It is proposed to abandon the step-by-step construction of the scoring model. To do this, the quality criterion must include all the optimized parameters of the model.
-
* '''Data:''' выборка транзакций между фирмами, вида «покупатель, продавец, объём».
+
* '''Data:''' The computational experiment will be performed on 5-7 samples to be found. It is desirable that the samples be of the same nature, for example, the samples of consumer credit questionnaires.
-
* '''References:''':
+
* '''References:''' Siddique N. Constructing scoring models, SAS. Hosmer D., Lemeshow S., Applied logistic regression, Wiley. Katrutsa A.M., Strijov V.V. Comprehensive study of feature selection methods to solve multicollinearity problem according to evaluation criteria // Expert Systems with applications, 2017.
-
*# ''Vorontsov K. V.'' [[Media:voron17survey-artm.pdf|Обзор вероятностных тематических моделей]], 2017.
+
* '''Base algorithm:''' The scoring model construction algorithm recommended by SAS.
-
* '''Basic algorithm:''' Стандартные методы неотрицательных матричных разложений.
+
* '''Solution:''' Each step of the procedure is represented as an optimization problem. The parameters to be optimized are combined, and the feature selection problem is included as a mixed optimization problem.
-
* '''Solution:''' Регуляризованный ЕМ-алгоритм для разреженных неотрицательных матричных разложений. Визуализация графа финансовых потоков. Тестирование алгоритма на синтетических данных, проверка гипотезы об устойчивости разреженных решений.
+
* '''Novelty:''' An error function is proposed under which feature generation and selection, as well as optimization of the model parameters, are performed jointly.
-
* '''Novelty:''' тематическое моделирование ранее не применялось к анализу финансовых транзакционных данных.
+
* '''Authors:''' T.V. Voznesenskaya, Strijov V.V.
-
* '''Authors:''' Vorontsov K. V.. consultants: Виктор Сафронов, Роза Айсина.
+
-
=== Task scoring + ===
+
===Popova+===
-
* '''Name:''' Порождение and выбор признаков при построении модели кредитного скоринга.
+
* '''Title:''' Representation of molecules in 3D
-
* '''Task''': Построение кредитных скоринговых моделей выполняется по шагам. В частности, выполняется ряд независимых преобразований отдельных признаков, порождаются новые признаки. На каждом шаге используется собственный критерий качества. Требуется построить скоринговую модель, адекватно описывающую выборку. Максимизация качества модели на каждом шаге не гарантирует максимального качества полученной модели. Предлагается отказаться от пошагового построения скоринговой модели. Для этого критерий качества должен включать все оптимизируемые параметры модели.
+
* '''Problem:''' Develop representations of the 3D structure of molecules that would have the property of rotational and translational invariance.
-
* '''Data:''' Вычислительный эксперимент будет выполнен на 5-7 выборках, которые требуется найти. Желательно, чтобы выборки имели одну природу, например, выборки анкет потребительского кредита.
+
* '''Data:''' Millions of molecules given by 3D coordinates
-
* '''References:''': Siddique N. Constructing scoring models, SAS. Hosmer D., Lemeshow S., Applied logistic regression, Wiley. Katrutsa A.M., Strijov V.V. Comprehensive study of feature selection methods to solve multicollinearity problem according to evaluation criteria // Expert Systems with applications, 2017.
+
* '''References:''' https://arxiv.org/abs/1610.08935, http://journals.aps.org/prl/abstract/10.1103/PhysRevLett.98.146401
-
* '''Basic algorithm:''' Алгоритм построения скоринговой модели, рекомендуемый SAS.
+
* '''Base algorithm:''' low rank matrix/tensor factorization
-
* '''Solution:''' Каждый шаг процедуры представляется в виде задачи оптимизации. Оптимизируемые параметры объединяются, включается Task выбора признаков как Task смешанной оптимизации.
+
* '''Solution:''' Molecules have a different number of atoms, and therefore the matrix of their 3D coordinates is Nx3. We need to find a mathematical transformation that would be independent of N (N is the number of atoms).
-
* '''Novelty:''' Предложена функция ошибки, при использовании который порождение and выбор признаков, а также оптимизация параметров модели выполняются совместно.
+
* '''Novelty:''' existing algorithms depend on the number of atoms in the molecule
-
* '''Authors:''' Т.В. Вознесенская, Strizhov V.V..
+
-
 
+
-
=== Task Popova + ===
+
-
* '''Name:''' Representation of molecules in 3D
+
-
* '''Task''': Разработать репрезентации 3D структуры молекул, которые обладали бы свойством вращательной and трансляционной инвариантности.
+
-
* '''Data:''' Миллионы молекул, заданные 3D координатами
+
-
* '''References:''': https://arxiv.org/abs/1610.08935, http://journals.aps.org/prl/abstract/10.1103/PhysRevLett.98.146401
+
-
* '''Basic algorithm:''' low rank matrix/tensor factorization
+
-
* '''Solution:''' Молекулы имеют различное число атомов, and поэтому матрица их 3D координат имеет размерность Nx3. Нужно найти математическое преобразование, которое бы независило от N (N - число атомов).
+
-
* '''Novelty:''' существующие алгоритмы зависят от числа атомов в молекуле
+
* '''Authors:''' Expert -- Alexander Isaev, consultant -- Maria Popova
* '''Authors:''' Expert -- Alexander Isaev, consultant -- Maria Popova
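One simple way to obtain a representation that is invariant to rotations and translations and whose length does not depend on N is a normalized histogram of pairwise interatomic distances. The sketch below is only an illustration under stated assumptions: it is not the method of the cited papers, the bin width and cutoff are arbitrary, and such a histogram discards information (different structures can share the same distance distribution).

```python
import math
from itertools import combinations


def distance_histogram(coords, bin_width=0.5, n_bins=10):
    """Fixed-length descriptor: normalized histogram of pairwise distances.

    coords is a list of (x, y, z) tuples; the result has n_bins entries
    regardless of the number of atoms.
    """
    hist = [0.0] * n_bins
    pairs = list(combinations(coords, 2))
    for p, q in pairs:
        dist = math.sqrt(sum((u - v) ** 2 for u, v in zip(p, q)))
        bin_index = int(dist / bin_width)
        if bin_index < n_bins:  # distances beyond the cutoff are ignored
            hist[bin_index] += 1.0
    if pairs:
        hist = [count / len(pairs) for count in hist]
    return hist


# Hypothetical four-atom molecule.
molecule = [(0, 0, 0), (1, 1, 0), (2, 0, 1), (0, 1, 2)]
descriptor = distance_histogram(molecule)
```

Pairwise distances are unchanged by any rigid motion, so the descriptor is invariant by construction; an atom-type-aware variant would keep one histogram per pair of element types.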
-
=== Task Maksimov + ===
+
===Maksimov+===
-
* '''Name:''' Оптимальный алгоритм для восстановления блочных гамильтонианов (моделей XY and Гейзенберга).
+
* '''Title:''' Optimal algorithm for recovering block Hamiltonians (XY and Heisenberg models).
-
* '''Task''': Task состоит в восстановлении блочных гамильтонианов с непрерывными спинами (обощение модели Изинга на двух- and трёхмерные спины) по наблюдаемым данным. Эта постановка представляет собой частный случай области машинного обучения, известной как обучение без учителя (unsupervised learning). Восстановление графической спиновой модели по данным наблюдений является важной задачей в физике. Основой алгоритма будет служить адаптация нового оптимального метода экранирования взаимодействий (interaction screening), разработанного для модели Изинга. Процесс решения будет сочетать в себе знакомство с теоретическими методами компьютерных наук / машинного обучения and численные эксперименты.
+
* '''Problem:''' The problem is to reconstruct block Hamiltonians with continuous spins (a generalization of the Ising model to two- and three-dimensional spins) from observed data. This setting is a special case of unsupervised learning. Reconstructing a graphical spin model from observational data is an important problem in physics. The algorithm will be based on adapting the new optimal interaction screening method developed for the Ising model. The solution process will combine the study of theoretical methods of computer science and machine learning with numerical experiments.
-
* '''Data:''' Симулированные конфигурации блочных спиновых моделей.
+
* '''Data:''' Simulated block spin model configurations.
-
* '''References:''':
+
* '''References:'''
*# Lokhov et al., "Optimal structure and parameter learning of Ising models", arXiv:1612.05024 (2016) {https://arxiv.org/abs/1612.05024}
*# Lokhov et al., "Optimal structure and parameter learning of Ising models", arXiv:1612.05024 (2016) {https://arxiv.org/abs/1612.05024}
*# Vuffray et al., "Interaction screening: efficient and sample-optimal learning of Ising models", NIPS 2016 {https://arxiv.org/abs/1605.07252}
*# Vuffray et al., "Interaction screening: efficient and sample-optimal learning of Ising models", NIPS 2016 {https://arxiv.org/abs/1605.07252}
*# Tyagi et al., "Regularization and decimation pseudolikelihood approaches to statistical inference in XY spin models", Phys. Rev. B 2016 {https://arxiv.org/abs/1603.05101}
*# Tyagi et al., "Regularization and decimation pseudolikelihood approaches to statistical inference in XY spin models", Phys. Rev. B 2016 {https://arxiv.org/abs/1603.05101}
-
* '''Basic algorithm:''' Динамический метод экранирования взаимодействий. Сравнение с методом максимального псевдо-правдоподобия (pseudolikelihood).
+
* '''Base algorithm:''' Dynamic interaction screening method. Comparison with the maximum pseudolikelihood method.
-
* '''Novelty:''' Алгоритм основанный на динамическом методе экранирования взаимодействия имеет хорошие шансы быть оптимальным для данной задачи, т.к. соотествующий метод является оптимальным для обратной задачи Изинга.
+
* '''Novelty:''' An algorithm based on the dynamic interaction screening method has a good chance of being optimal for this problem, because the corresponding method is optimal for the inverse Ising problem.
-
* '''Автор''': consultants Андрей Лохов, Yuri Maksimov. Expert Михаил Чертков
+
* '''Authors:''' consultants: Andrey Lokhov, Yuri Maksimov; expert: Mikhail Chertkov
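For orientation, the interaction screening objective for the binary Ising model (without external fields), as used in the cited works by Vuffray et al. and Lokhov et al., can be sketched for a single node. The code below is a minimal illustration for ±1 spins only; extending it to continuous XY or Heisenberg spins is the subject of the project and is not attempted here. The function names are hypothetical, and the l1 regularization term used in practice is omitted.

```python
import math


def interaction_screening_objective(samples, u, theta_u):
    """Average of exp(-sum_i theta_u[i] * s_u * s_i) over spin configurations.

    samples is a list of tuples of +1/-1 spins, u is the node index and
    theta_u[i] is the candidate coupling between u and i (theta_u[u] is unused).
    """
    total = 0.0
    for sigma in samples:
        local = sum(theta_u[i] * sigma[u] * sigma[i]
                    for i in range(len(sigma)) if i != u)
        total += math.exp(-local)
    return total / len(samples)


def interaction_screening_gradient(samples, u, theta_u):
    """Gradient of the objective with respect to the couplings of node u."""
    grad = [0.0] * len(theta_u)
    for sigma in samples:
        local = sum(theta_u[i] * sigma[u] * sigma[i]
                    for i in range(len(sigma)) if i != u)
        weight = math.exp(-local)
        for i in range(len(sigma)):
            if i != u:
                grad[i] -= sigma[u] * sigma[i] * weight / len(samples)
    return grad
```

Minimizing this convex objective node by node, with an l1 penalty on the couplings, yields estimates of the neighborhood of each spin.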
-
 
+
-
=== Task Khritankova (Transfer Learning) ===
+
-
* '''Name:''' Применение сетей глубокого обучения для переноса моделей классификации в случае недостаточного объема данных.
+
-
* '''Task''':
+
-
*# Разработать алгоритм вычисления набора скрытых признаков в задаче symmetric homogeneous transfer learning , решение задачи классификации в котором не зависит от исходной области, and который не хуже, чем при решении для каждого области отдельно (transfer error) для случая небольших размеров выборки с ошибками в разметке
+
-
*# Разработать алгоритм перехода к скрытому набору признаков без использования разметки (unsupervised domain adaptation)
+
-
* '''Data:''' teraPromise-CK (33 датасета с одинаковыми признаками, но разными распределениями).
+
-
* '''References:''':Базовая статья: Xavier Glorot , Antoine Bordes , Yoshua Bengio. (2011) Domain Adaptation for Large-Scale sentiment classification: A Deep Learning approach / In Proceedings of the Twenty-eight International Conference on Machine Learning, ICML.
+
-
Статьи с идеями по доработкам алгоритма будут выданы на руки (несколько).
+
-
* '''Basic algorithm:''' SDA (Stacked Denoising Autoencoder) – описан в статье базовой статье Glorot et al.
+
-
* '''Solution:''' Взять Basic algorithm, а) попробовать улучшить для применения к небольшим датасетам 100-1000 объектов (когда and применяется transfer learning) путем применения регуляризаторов, корректировкой архитектуры автокодировшика, корректировки алгоритма обучения (например, bootstrapping) б) исследовать модель на устойчивость к ошибкам в разметке (label corruption / noisy labels) and предложить доработку для повышения устойчивости (robustness).
+
-
* '''Novelty:''' Получение устойчивого алгоритма переноса моделей классификации на небольших объемах данных с ошибками в разметке.
+
-
* '''Authors:''' Хританков
+
 +
===Khritankova (Transfer Learning) ===
 +
* '''Title:''' Using deep learning networks to transfer classification models in case of insufficient data.
 +
* '''Problem description:'''
 +
*# Develop an algorithm for computing a set of latent features in the symmetric homogeneous transfer learning problem such that the solution of the classification problem does not depend on the source domain and is no worse than solving for each domain separately (transfer error), for the case of small samples with labeling errors
 +
*# Develop an algorithm for transitioning to the latent set of features without using labels (unsupervised domain adaptation)
 +
* '''Data:''' teraPromise-CK (33 datasets with the same features but different distributions).
 +
* '''References:''' Base article: Xavier Glorot, Antoine Bordes, Yoshua Bengio. (2011) Domain Adaptation for Large-Scale Sentiment Classification: A Deep Learning Approach / In Proceedings of the Twenty-Eighth International Conference on Machine Learning, ICML.
 +
Several articles with ideas for improving the algorithm will be handed out.
 +
* '''Base algorithm:''' SDA (Stacked Denoising Autoencoder), described in the base article by Glorot et al.
 +
* '''Solution:''' Take the base algorithm and a) try to improve it for small datasets of 100-1000 objects (the case where transfer learning is applied) by applying regularizers, adjusting the architecture of the autoencoder, and adjusting the learning algorithm (for example, bootstrapping); b) investigate the robustness of the model to labeling errors (label corruption / noisy labels) and propose improvements that increase robustness.
 +
* '''Novelty:''' Obtaining a robust algorithm for transferring classification models on small amounts of data with labeling errors.
 +
* '''Authors:''' Khritankov
-
=== Task INRIA-МТФИ + ===
+
===INRIA===
-
* '''Name:''' Оценка энергии связывания белка and маленьких молекул.
+
* '''Title:''' Estimation of the binding energy of a protein and small molecules.
-
* '''Task''': Моделирование связывания белка and маленькой молекулы (далее -- лиганда) основывается на том, что наилучший лиганд в своем наилучшем положении имеет наименьшую свободную энергию взаимодействия с белком. Необходимо оценить свободную энергию связывания белка and лиганда. Для обучения могут использоваться комплексы белков с лигандами, причем для каждого белка есть несколько положений лиганда: 1 правильное, "нативное", для которых энергия минимальна, and несколько сгенерированных неправильных. Для трети набора данных известны значения, пропорциональные искомой энергии связывания лигандов в нативных положениях с белком. Есть отдельный тестовый сет, состоящий из 1) комплексов белков and лигандов, для которых нужно найти наилучшую позу лиганда (алгоритм получения положений лиганда отличается от используемого при обучении), 2) комплексов белков and лигандов, для нативных поз которых нужно предсказать энергию связывания, and 3) белков, для которых нужно найти наиболее сильно связывающийся лиганд.
+
* '''Problem:''' Modeling the binding of a protein and a small molecule (hereinafter referred to as a ligand) is based on the fact that the best ligand in its best position has the lowest free energy of interaction with the protein. It is necessary to estimate the free energy of protein and ligand binding. Complexes of proteins with ligands can be used for training, and for each protein there are several positions of the ligand: 1 correct, "native", for which the energy is minimal, and several generated incorrect ones. For a third of the data set, values are known that are proportional to the desired binding energy of ligands in native positions with the protein. There is a separate test set consisting of 1) complexes of proteins and ligands, for which it is necessary to find the best ligand position (the algorithm for obtaining ligand positions differs from that used in training), 2) complexes of proteins and ligands, for whose native positions it is necessary to predict the binding energy, and 3) proteins for which it is necessary to find the most strongly binding ligand.
-
* '''Data:''' Около 10000 комплексов: для каждого из них есть 1 нативная поза and 18 (можно сгенерировать больше) ненативных. Основными дескрипторами являются гистограммы распределений расстояний между различными атомами белка and лиганда, размерность вектора дескрипторов ~ 20,000. Набор дескрипторов может быть расширен (можно генерировать позы с разным отклонением and использовать его как дескриптор, можно добавить свойства маленьких молекул: число связей, вокруг которых в молекуле возможен поворот, площадь ее поверхности, разбиение ее поверхности диаграммой Вороного. Данные будут предоставлены в виде бинарных файлов со скриптом на python для чтения.
+
* '''Data:''' About 10000 complexes: for each of them there is 1 native pose and 18 non-native ones (more can be generated). The main descriptors are histograms of the distributions of distances between different atoms of the protein and the ligand; the dimension of the descriptor vector is ~ 20,000. The set of descriptors can be extended: one can generate poses with different deviations and use the deviation as a descriptor, or add properties of small molecules such as the number of rotatable bonds in the molecule, its surface area, and the partition of its surface by a Voronoi diagram. The data will be provided as binary files with a Python script for reading them.
-
* '''References:''': PEPSI-Dock: a detailed data-driven protein–protein interaction potential accelerated by polar Fourier correlation Predicting Binding Poses and Affinities in the CSAR 2013―2014 Docking Exercises Using the Knowledge-Based Convex-PL Potential
+
* '''References:''' PEPSI-Dock: a detailed data-driven protein–protein interaction potential accelerated by polar Fourier correlation; Predicting Binding Poses and Affinities in the CSAR 2013–2014 Docking Exercises Using the Knowledge-Based Convex-PL Potential
-
* '''Basic algorithm:''' Мы использовали линейный SVM (это просто lecture notes, я не вижу смысла тут давать Вапника, тем более что все это, включая эти lecture notes, гуглится), связь которого с оценкой энергии, выходящей за рамки задачей классификации, описана в перечисленных выше статьях. Для учета известных из эксперимента значений, пропорциональных энергии, предлагается использовать линейную регрессию SVR .
+
* '''Base algorithm:''' We used a linear SVM (these are just lecture notes; I see no reason to cite Vapnik here, especially since all of this, including the lecture notes, can be found online), whose connection with energy estimation, which goes beyond the classification problem, is described in the articles listed above. To take into account the experimentally known values proportional to the energy, it is proposed to use linear support vector regression (SVR).
-
* '''Solution:''' Необходимо свести использованную ранее задачу SVM к задаче регрессии and решить стандартными методами. Для проверки работы алгоритма будет использован как описанный выше тест, так and несколько других тестовых сетов с аналогичными Taskми, но другими данными.
+
* '''Solution:''' It is necessary to reduce the previously used SVM problem to a regression problem and solve it using standard methods. To check the algorithm, both the test described above and several other test sets with similar problems but different data will be used.
-
* '''Novelty:''' Правильная оценка качества связывания белка and лиганда используется при разработке лекарства для поиска молекул, наиболее сильно взаимодействующих с исследуемым белком.
+
* '''Novelty:''' Proper assessment of the quality of protein and ligand binding is used in drug development to find molecules that interact most strongly with the protein under study.
-
Особую важность представляет оценка значений энергии связывания белка с лигандом: определенный разными группами на предложенном тесте коэффициент корреляции (Пирсона) энергии с ее экспериментальными значениями не превышает 0.7. Предсказание наиболее сильно связывающегося лиганда из большого числа не связывающихся с белком молекул также вызывает трудности. Целью данной работы является получение метода, позволяющего достаточно точно оценивать связывание белка с лигандами. С точки зрения машинного обучения and оптимизации интерес представляет объединение задач классификации and регрессии.
+
Of particular importance is the assessment of the values of the binding energy of the protein with the ligand: the coefficient of correlation (Pearson) of the energy with its experimental values determined by different groups on the proposed test does not exceed 0.7. Prediction of the most strongly binding ligand from a large number of non-protein-binding molecules is also difficult. The aim of this work is to obtain a method that allows a fairly accurate assessment of protein binding to ligands. From the point of view of machine learning and optimization, it is of interest to combine classification and regression problems.
-
* '''Добавление''' Даны несколько наборов данных, описывающие атом в молекуле или связь между атомами, с маленьким feature вектором (обычно это 3-10 дескрипторов) and несколькими классами, соответствующими гибридизации атома или порядку связи. Самих данных может быть от ~ 100 до 20,000 векторов в зависимости от типа атома. Нужно протестировать на этом какое-нибудь мультиклассовое машинное обучение (random forests, нейронную сеть, что-то другое), можно что угодно делать с дескрипторами. Мы сейчас используем SVM. Важна не только точность, но and вычислительная сложность предсказания.
+
* '''Appendix''' Several data sets are given that describe an atom in a molecule or a bond between atoms, with a small feature vector (usually 3-10 descriptors) and several classes corresponding to the atom's hybridization or the bond order. Each data set contains from ~100 to 20,000 vectors, depending on the type of atom. Some multiclass machine learning method (random forests, a neural network, or something else) should be tested on these data; the descriptors may be transformed in any way. We are currently using SVM. Not only the accuracy is important, but also the computational complexity of the prediction.
* '''Authors:''' Sergei Grudinin, Maria Kadukova
* '''Authors:''' Sergei Grudinin, Maria Kadukova
-
=== Task Strizhov and Kulunchakov + ===
+
===Strijov and Kulunchakov+===
-
* '''Name:''' Creation of delay-operators for multiscale forecasting by means of symbolic regression
+
* '''Title:''' Creation of delay-operators for multiscale forecasting by means of symbolic regression
-
* '''Task''': Suppose that one needs to build a forecasting machine for a response variable. Given a large set of time series, one can advance a hypothesis that they are related to this variable. Relying upon this hypothesis, we can use given time series as features for the forecasting machine. However, the values of time series could be produced with different frequencies. Therefore, we should take into account not only the values, but the delays as well. The simplest model for forecast is a linear one. In the presence of large set of features this model can approximate the response quite well. To avoid the problem of multiscaling, we introduce a definition of delay-operators. Each delay-operator corresponds to one time series and represents continuous correlation function. This correlation function shows a dependence between the response variable and corresponding time series. Therefore, each delay-operator put weights on the values of corresponding time series depending on the greatness of the delay. Having these delay-operators, we avoid the problem of multiscaling. To find them, we use genetic programming and symbolic regression. If the resulted weighted linear regression model would produce poor approximation, we can use a nonlinear one instead. To find good nonlinear function, we would use symbolic regression as well.
+
* '''Problem:''' Suppose that one needs to build a forecasting machine for a response variable. Given a large set of time series, one can advance a hypothesis that they are related to this variable. Relying upon this hypothesis, we can use the given time series as features for the forecasting machine. However, the values of the time series could be produced with different frequencies. Therefore, we should take into account not only the values, but the delays as well. The simplest forecasting model is a linear one. In the presence of a large set of features this model can approximate the response quite well. To avoid the problem of multiscaling, we introduce a definition of delay-operators. Each delay-operator corresponds to one time series and represents a continuous correlation function. This correlation function shows the dependence between the response variable and the corresponding time series. Therefore, each delay-operator puts weights on the values of the corresponding time series depending on the magnitude of the delay. Having these delay-operators, we avoid the problem of multiscaling. To find them, we use genetic programming and symbolic regression. If the resulting weighted linear regression model produces a poor approximation, we can use a nonlinear one instead. To find a good nonlinear function, we would use symbolic regression as well.
* '''Data:''' Any data from the domain of multiscale time series forecasting. See the [[Media:Kulunchakov2016MultiscaleForecast.pdf|full version]] of this introduction.
* '''References:''' to be provided by V.V.Strijov
* '''Base algorithm:''' to be provided by V.V.Strijov
* '''Solution:''' Use genetic algorithms applied to symbolic regression to create and test delay operators in multiscale forecasting.
* '''Novelty:''' to be provided by V.V.Strijov
* '''Authors:''' supervisor: V.V.Strijov, consultant: A.S. Kulunchakov
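
The delay-operator idea from the problem statement can be illustrated with a minimal sketch. It assumes a fixed exponential decay as the weight function, whereas the problem statement proposes to find these functions by symbolic regression; the names <code>exponential_delay_operator</code>, <code>delayed_feature</code> and <code>linear_forecast</code>, the decay rates and the coefficients are hypothetical and chosen only for illustration.

```python
import math


def exponential_delay_operator(rate):
    """Build a weight function of the delay (exponential decay is an assumption)."""
    def weight(delay):
        return math.exp(-rate * delay)
    return weight


def delayed_feature(times, values, t_now, weight):
    """Weighted average of the observations made at or before t_now.

    Each observation is weighted by weight(t_now - t_i), so series sampled
    at different frequencies are reduced to one comparable feature value.
    """
    numerator = 0.0
    denominator = 0.0
    for t, x in zip(times, values):
        if t <= t_now:
            w = weight(t_now - t)
            numerator += w * x
            denominator += w
    return numerator / denominator if denominator > 0 else 0.0


def linear_forecast(series, operators, coefficients, t_now, bias=0.0):
    """Linear model over one delayed feature per time series."""
    forecast = bias
    for (times, values), weight, theta in zip(series, operators, coefficients):
        forecast += theta * delayed_feature(times, values, t_now, weight)
    return forecast


# Two toy series on different time scales (time unit: hours).
hourly = ([0, 1, 2, 3, 4], [2.0, 2.0, 2.0, 2.0, 2.0])
daily = ([0, 24, 48], [3.0, 3.0, 3.0])
operators = [exponential_delay_operator(0.5), exponential_delay_operator(0.05)]
prediction = linear_forecast([hourly, daily], operators, [0.5, 1.0], t_now=48)
```

Normalizing by the sum of weights keeps the feature on the scale of the original series regardless of how many observations fall before the forecast time; this normalization is a design choice of the sketch, not something stated in the problem.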

==2016==

{|class="wikitable"
{|class="wikitable"
|-
! Letters
! Grade
! Journal
|-
|Goncharov Alexey (example)
|Metric classification of time series
|[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Goncharov2015MetricClassification/code code],
[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Goncharov2015MetricClassification/doc/Goncharov2015MetricClassification.pdf paper],
[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Goncharov2015MetricClassification/doc/GoncharovAlexey2015PresentationMetricClassification.pdf slides]
|[[Участник:Mpopova|Maria Popova]]
|Zadayanchuk Andrey
|BMF
|AILSBRCVTDSWH>
|10
|ИИП
|-
|Bayandina Anastasia
|Topic models of distributional semantics for identifying ethnically relevant topics in social networks
|[https://svn.code.sf.net/p/mlalgorithms/code/Group374/Bayandina2016TopicModeling/doc/Bayandina2016TopicModeling.pdf paper]
[https://svn.code.sf.net/p/mlalgorithms/code/Group374/Bayandina2016TopicModeling/doc/Bayandina2016TopicModelingPresentation.pdf slides]
[https://www.youtube.com/watch?v=7IbYWWO_evY video]
|Anna Potapenko
|Oleg Gorodnitsky
|BF
|AILSB++RCVTDEWHS
|
|-
|Belozerova Anastasia
|Reconciling logical and linear classification models in the information analysis of electrocardiographic signals
|[https://svn.code.sf.net/p/mlalgorithms/code/Group374/Belozerova2016LogicLinearClassificator/code code]
[https://svn.code.sf.net/p/mlalgorithms/code/Group374/Belozerova2016LogicLinearClassificator/doc/Belozerova2016LogicLinearClassificator.pdf paper]
[https://svn.code.sf.net/p/mlalgorithms/code/Group374/Belozerova2016LogicLinearClassificator/doc/Belozerova2016Presentation.pdf slides]
[https://www.youtube.com/watch?v=3XhaIN1bgDI video]
|Vlada Tselykh
|Malygin Vitaly
|BF
|AILSB+RC+VTD>E0WH>S
|
|-
|Maria Vladimirova
|Bagging of neural networks for predicting the biological activity of cell receptors
|[https://svn.code.sf.net/p/mlalgorithms/code/Group374/Vladimirova2016BaggingNN/code code]
[https://svn.code.sf.net/p/mlalgorithms/code/Group374/Vladimirova2016BaggingNN/doc/Vladimirova2016BaggingNN.pdf paper]
[https://www.youtube.com/watch?v=pPumIZ81KU4 video]
|Maria Popova
|Volodin Sergey
|BMF
|AILSBRCVTD>E>WHS
|
|-
|Volodin Sergey
|A probabilistic approach to predicting the biological activity of nuclear receptors
|[https://svn.code.sf.net/p/mlalgorithms/code/Group374/Volodin2016ProbabilisticReceptorPrediction/code code] [https://svn.code.sf.net/p/mlalgorithms/code/Group374/Volodin2016ProbabilisticReceptorPrediction/doc/Volodin2016ProbabilisticReceptorPrediction.pdf paper] [https://svn.code.sf.net/p/mlalgorithms/code/Group374/Volodin2016ProbabilisticReceptorPrediction/doc/Volodin2016ProbabilisticReceptorPredictionSlides.pdf slides]
[https://www.youtube.com/watch?v=TsQ8v778d0s video], [http://itas2016.iitp.ru/pdf/1570303389.pdf itis]
|Maria Popova
|Maria Vladimirova
|BMF
|AILSBRCVTDEWHS
|
|-
|Gorodnitsky Oleg
|An adaptive nonlinear method for recovering a matrix from partial observations
|[http://svn.code.sf.net/p/mlalgorithms/code/Group374/Gorodnitskii2016AdaptiveApproximation/code code]
[http://svn.code.sf.net/p/mlalgorithms/code/Group374/Gorodnitskii2016AdaptiveApproximation/doc/Gorodnitskii2016AdaptiveApproximation2.pdf paper]
[https://svn.code.sf.net/p/mlalgorithms/code/Group374/Gorodnitskii2016AdaptiveApproximation/doc/Gorodnitskii2016NNMF.pdf slides], [http://itas2016.iitp.ru/pdf/1570303466.pdf itis]
|Mikhail Trofimov
|Bayandina Anastasia
|M
|A++I++L++S+B+R+C++VTDE+WH
|
|-
|Ivanychev Sergey
|Synergy of classification algorithms (SVM Multimodelling)
|[https://svn.code.sf.net/p/mlalgorithms/code/Group374/Ivanychev2016SVM_Multimodelling/code/ code]
[https://svn.code.sf.net/p/mlalgorithms/code/Group374/Ivanychev2016SVM_Multimodelling/doc/Ivanychev2016SVM_Multimodelling.pdf paper]
|
|-
|Kovaleva Valeria
|Regular structure of rare macromolecular clusters
|[https://svn.code.sf.net/p/mlalgorithms/code/Group374/Kovaleva2016Spectra/code/ code]
[https://svn.code.sf.net/p/mlalgorithms/code/Group374/Kovaleva2016Spectra/doc/Kovaleva2016Spectra.pdf paper]
[https://svn.code.sf.net/p/mlalgorithms/code/Group374/Kovaleva2016Spectra/doc/Kovaleva2016Spectra_slides.pdf slides]
[https://www.youtube.com/watch?v=JaeyrqJr1KU video], [http://itas2016.iitp.ru/pdf/1570303499.pdf itis]
|Olga Valba, Yuri Maksimov
|Dmitry Fedoryaka
|BM
|A+IL+SBRCVTD0E0WH
|
|-
|Makarchuk Gleb
|Time series transformations for decoding hand motion from electrocorticographic (ECoG) signals in monkeys
|[http://svn.code.sf.net/p/mlalgorithms/code/Group374/Makarchuk2016ECoGSignals/code code],
[http://svn.code.sf.net/p/mlalgorithms/code/Group374/Makarchuk2016ECoGSignals/doc/Makarchuk2016ECoGSignals.pdf paper]
|
|-
|Malygin Vitaly
|Application of combinatorial overfitting bounds for threshold decision rules to feature selection in medical diagnostics by the method of V. M. Uspensky
|[http://svn.code.sf.net/p/mlalgorithms/code/Group374/Malygin2016FeatureSelection/code code],
[http://svn.code.sf.net/p/mlalgorithms/code/Group374/Malygin2016FeatureSelection/doc/Malygin2016FeatureSelection.pdf paper],
[http://svn.code.sf.net/p/mlalgorithms/code/Group374/Malygin2016FeatureSelection/doc/Malygin2016FSPresentation.pdf slides]
|Shaura Ishkina
|Belozerova Anastasia
|B
|AILSBRCVTDEWH
|
|-
|Molibog Igor
|Using dimensionality reduction methods to build a feature space for intrinsic plagiarism detection
|
[https://svn.code.sf.net/p/mlalgorithms/code/Group374/Molybog2016DimReduction/doc/MolybogMotrenkoStrijov2017DimRed.pdf paper],
[https://svn.code.sf.net/p/mlalgorithms/code/Group374/Molybog2016DimReduction/doc/Molybog2016DimReduction_Presentation.pdf slides], [http://itas2016.iitp.ru/pdf/1570303407.pdf itis]
|Anastasia Motrenko
|Safin Kamil
|BMF
|AILSBRCVTDEWHS
|
|-
|Pogodin Roman
|Determining the position of proteins from an electron density map
|[https://svn.code.sf.net/p/mlalgorithms/code/Group374/Pogodin2016ProteinsFitting/code code], [http://svn.code.sf.net/p/mlalgorithms/code/Group374/Pogodin2016ProteinsFitting/doc/Pogodin2016ProteinsFitting.pdf paper], [http://svn.code.sf.net/p/mlalgorithms/code/Group374/Pogodin2016ProteinsFitting/doc/Pogodin2016ProteinsFittingPresentation.pdf slides]
[https://www.youtube.com/watch?v=0DskvHR4waE video], [http://itas2016.iitp.ru/pdf/1570303519.pdf itis]
|Alexander Katrutsa
|Andrey Ryazanov
|BMF
|AILSBRCVTDEWHS
|
|-
|Andrey Ryazanov
|Recovering the primary structure of a protein from the geometry of its main chain
|[https://svn.code.sf.net/p/mlalgorithms/code/Group374/Ryazanov2016InverseFolding/ folder]
[http://svn.code.sf.net/p/mlalgorithms/code/Group374/Ryazanov2016InverseFolding/doc/Ryazanov2016InverseFolding.pdf paper]
[http://svn.code.sf.net/p/mlalgorithms/code/Group374/Ryazanov2016InverseFolding/doc/Ryazanov2016InverseFoldingPresentation.pdf slides]
[https://www.youtube.com/watch?v=ZGx14xat2Jg video], [http://itas2016.iitp.ru/pdf/1570303468.pdf itis]
|Mikhail Karasikov
|Roman Pogodin
|BMF
|AIL+SBRC++VTD+EWHS
|
|-
|Safin Kamil
|Detecting borrowings in a text without a specified source
|[https://svn.code.sf.net/p/mlalgorithms/code/Group374/Safin2016IntrinsicPlagiarism/code code], [https://svn.code.sf.net/p/mlalgorithms/code/Group374/Safin2016IntrinsicPlagiarism/doc/Safin2016IntrinsicPlagiarism.pdf paper]
[https://svn.code.sf.net/p/mlalgorithms/code/Group374/Safin2016IntrinsicPlagiarism/doc/Safin2016Presentation1.pdf slides]
[https://www.youtube.com/watch?v=lHYH1f5kYXU video]
|Mikhail Kuznetsov
|Molibog Igor
|BMF
|AIL+SBRC>V>T>D>E0WHS
|
|-
|Dmitry Fedoryaka
|Mixtures of vector autoregression models for time series forecasting
|[http://svn.code.sf.net/p/mlalgorithms/code/Group374/Fedoriaka2016TimeSeriesPrediction/code code],
[http://svn.code.sf.net/p/mlalgorithms/code/Group374/Fedoriaka2016TimeSeriesPrediction/doc/Fedoriaka2016TSPPresentation.pdf slides],
[http://svn.code.sf.net/p/mlalgorithms/code/Group374/Fedoriaka2016TimeSeriesPrediction/doc/Fedoriaka2016TimeSeriesPrediction.pdf paper]
|Radoslav Neichev
|Kovaleva Valeria
|BM
|AILSBRCV-T>D0E0WH>
|
|-
|Tsvetkova Olga
|Building scoring models in the SAS system
|[https://svn.code.sf.net/p/mlalgorithms/code/Group374/Tsvetkova2016ScoringCards/code code],
[https://svn.code.sf.net/p/mlalgorithms/code/Group374/Tsvetkova2016ScoringCards/doc/ScoringCards.pdf paper]
[https://svn.code.sf.net/p/mlalgorithms/code/Group374/Tsvetkova2016ScoringCards/doc/presentation.pdf slides]
|Raisa Jamtyrova
|Chygrynskiy Viktor
|BF
|A+I+L+S+B+R+C+V0T0D0E0WH>S
|
|-
|Chygrynskiy Viktor
|Approximation of the iris boundaries
|[https://svn.code.sf.net/p/mlalgorithms/code/Group374/Chigrinskiy2016ApproximationOfIrisBoundaries/code code] [https://svn.code.sf.net/p/mlalgorithms/code/Group374/Chigrinskiy2016ApproximationOfIrisBoundaries/doc/Chigrinskiy2016ApproximationOfIrisBoundaries.pdf paper]
[https://svn.code.sf.net/p/mlalgorithms/code/Group374/Chigrinskiy2016ApproximationOfIrisBoundaries/doc/Chigrinskiy2016ApproximationOfIrisBoundariesSlides.pdf slides]
[https://www.youtube.com/watch?v=3kuNMYhVBw4 video]
|Yuri Efimov
|
|B
|-
|}
|}
===1. 2016===
* '''Data:''' Synergy of classification algorithms. Data from the UCI repository, so that results can be compared directly with other works, in particular those of Vapnik.
* '''References:''' There are different approaches to combining SVMs, for example bagging (http://www.ecse.rpiscrews.us/~cvrl/FaceProject/Homepage/Publication/ICPR04_final_cameraready_v4.pdf); boosting has also been tried (http://www.researchgate.net/profile/Hong-Mo_Je/publication/3974309_Pattern_classification_using_support_vector_machine_ensemble/links/09e415091bdc559051000000.pdf).
* '''Base algorithm:''' Described in the problem statement.
* '''Solution:''' A modification of the base algorithm, or simply the base algorithm itself. The main point is to compare it with other methods and draw conclusions, in particular about how an improvement in quality relates to the diversity of the sets of support vectors built by different SVMs.
* '''Novelty:''' It is known (for example, from Konstantin Vyacheslavovich's lectures) that boosting fails to build short compositions of strong classifiers such as SVMs (although attempts are still made, see the references). It is therefore proposed to build a nonlinear combination instead of a linear one. Such a composition is expected to improve quality compared with a single SVM.
* '''consultant''': Alexander Aduenko
===2. 2016===
* '''Title:''' Temporal topic model of a press release collection.
* '''Problem:''' Develop methods for analyzing the topic structure of a large text collection and its dynamics over time. The difficulty is assessing the quality of the constructed structure. It is required to implement stability and completeness criteria for the temporal topic model using manual selection of the found topics by their interpretability, distinctness and eventfulness.
* '''Data:''' A collection of press releases from the foreign ministries of a number of countries over 10 years, in English.
* '''References:'''
*# Doikov N.V. [[Media:2015_417_DoykovNV.pdf|Adaptive regularization of probabilistic topic models]]. Bachelor's thesis, CMC MSU. 2015.
* '''Base algorithm:''' D. Blei's classic LDA with post-hoc time analysis.
* '''Solution:''' Implementation of an additively regularized topic model using the [[BigARTM]] library. Building a series of topic models. Evaluation of their interpretability, stability and completeness.
* '''Novelty:''' The stability and completeness criteria for topic models are new.
* '''consultant''': Nikita Doikov, '''problem author''' Vorontsov K. V.
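
The problem statement does not define the stability criterion, so the following is only one possible reading, given as a minimal sketch: a topic is treated as stable if its set of top words has a close match in a second run of the model. The Jaccard similarity, the threshold of 0.5, the function names and the toy word lists are all assumptions made for illustration and are not taken from the project or from the BigARTM library.

```python
def jaccard(words_a, words_b):
    """Jaccard similarity between two collections of top words."""
    set_a, set_b = set(words_a), set(words_b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def topic_stability(run_a, run_b, threshold=0.5):
    """Share of topics in run_a that have a sufficiently similar topic in run_b.

    Each run is a list of topics; each topic is a list of its top words.
    The threshold is a hypothetical parameter.
    """
    if not run_a:
        return 0.0
    matched = 0
    for topic in run_a:
        best = max((jaccard(topic, other) for other in run_b), default=0.0)
        if best >= threshold:
            matched += 1
    return matched / len(run_a)


# Toy top-word lists from two hypothetical runs of a topic model.
run_a = [["visa", "embassy", "consulate"], ["sanctions", "trade", "tariff"]]
run_b = [["embassy", "visa", "passport"], ["summit", "talks", "peace"]]
stability = topic_stability(run_a, run_b)
```

In this toy example only the first topic of <code>run_a</code> finds a match in <code>run_b</code>, so half of the topics count as stable.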
===3. 2016===
* '''Title:''' Reconciling logical and linear classification models in the information analysis of electrocardiographic signals.
* '''Problem:''' There are logical classifiers based on identifying diagnostic standards for each disease, built by an expert in semi-manual mode. For these classifiers, estimates of disease activity are defined; they have been used in the diagnostic system for many years and satisfy the physicians who use it. We build linear classifiers that are trained fully automatically and outperform the logical ones in classification quality. However, the activity estimation technique could not be transferred directly to linear classifiers. It is required to build a linear activity model, tuned to reproduce the known activity estimates of the logical classifier.
* '''Data:''' A sample of more than 10 thousand electrocardiograms with diagnoses for 32 diseases.
* '''References:''' will be provided :)
* '''Base algorithm:''' Linear classifier.
* '''Solution:''' Methods of linear regression, linear classification, feature selection.
* '''Novelty:''' The problem of reconciling two models of different nature can be viewed as learning with privileged information, a promising direction proposed by the machine learning classic V.N.Vapnik several years ago.
* '''consultant''': Vlada Tselykh, '''problem author''' Vorontsov K. V.
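
Tuning a linear activity model to reproduce known activity estimates can be read as an ordinary least-squares regression. The sketch below fits such a model by plain gradient descent on a toy dataset. The function name <code>fit_linear_activity</code>, the two features, the target values, the learning rate and the number of epochs are hypothetical; the project's actual features and fitting procedure are not given in the problem statement.

```python
def fit_linear_activity(features, targets, learning_rate=0.1, epochs=2000):
    """Fit weights and bias minimizing the mean squared error by gradient descent."""
    n = len(features)
    dim = len(features[0])
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, targets):
            # Residual of the current linear model on one object.
            error = bias + sum(w * xi for w, xi in zip(weights, x)) - y
            for j in range(dim):
                grad_w[j] += 2.0 * error * x[j] / n
            grad_b += 2.0 * error / n
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
    return weights, bias


# Toy data: the targets play the role of the logical classifier's activity
# estimates and were generated here as 1.0 * x1 + 0.5 * x2 + 0.2.
features = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
targets = [0.2, 1.2, 0.7, 1.7, 2.7]
weights, bias = fit_linear_activity(features, targets)
```

On real data a closed-form or library least-squares solver with feature selection would likely be preferable; gradient descent is used here only to keep the sketch dependency-free.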
===4. 2016===
* '''Title:''' Topic classification model for diagnosing diseases from an electrocardiogram.
* '''Problem:''' [[Технология информационного анализа электрокардиосигналов|Technology of information analysis of electrocardiosignals]] according to V.M.Uspensky is based on converting the ECG into a character string and extracting informative sets of words, the diagnostic standards of each disease. A linear classifier builds one diagnostic standard per disease. The Screenfax screening diagnostic system currently uses four standards per disease, built in semi-manual mode. It is required to fully automate the construction of diagnostic standards and to determine their optimal number for each disease. To this end, it is proposed to refine the topic classification model of S. Tsyganova, reimplement it with [[BigARTM]], extend the computational experiments, and improve the classification quality.
* '''Data:''' A sample of more than 10 thousand electrocardiograms with diagnoses for 32 diseases.
* '''References:''' will be provided :)
* '''Base algorithm:''' Classification models by V.Tselykh, topic model by S.Tsyganova.
* '''Solution:''' Topic model implemented using the [[BigARTM]] library.
* '''Novelty:''' Topic models have not previously been used to classify discretized biomedical signals.
* '''consultant''': Svetlana Tsyganova, '''problem author''' Vorontsov K. V.
-
=== Task 5 ===
+
===5. 2016===
-
* '''Name:''' Тематические модели дистрибутивной семантики для выделения этнорелевантных тем в социальных сетях.
+
* '''Title:''' Thematic models of distributive semantics for highlighting ethno-relevant topics in social networks.
-
* '''Task''': Тематическое моделирование текстовых коллекций социальных медиа сталкивается с проблемой сверх-коротких документов. Не всегда ясно, где проводить границы между документами (возможные варианты: отдельный пост, стена пользователя, все сообщения данного пользователя, все сообщения за данный день в данном регионе, and т.д.). Тематические модели дают интерпретируемые векторные представления слов and документов, но их качество зависит от распределения длин документов. Модель word2vec независима от длин документов, так как учитывает лишь локальные контексты слов, но координаты векторных представлений не допускают тематическую интерпретацию. Задачей проекта является построение гибридной модели, объединяющей достоинства and свободной от недостатков обеих моделей.
+
* '''Problem:''' Thematic modeling of social media text collections faces the problem of ultra-short documents. It is not always clear where to draw the boundaries between documents (possible options: a single post, a user's wall, all posts by a given user, all posts for a given day in a given region, and so on). Topic models give interpretable vector representations of words and documents, but their quality depends on the distribution of document lengths. The word2vec model is independent of document lengths, since it takes into account only the local contexts of words, but the coordinates of vector representations do not allow thematic interpretation. The objective of the project is to build a hybrid model that combines the advantages and is free from the disadvantages of both models.
-
* '''Data:''' Коллекции социальных сетей ЖЖ and ВК.
+
* '''Data:''' Collections from the social networks LiveJournal (LJ) and VK.
-
* '''References:''': выдадим :)
+
* '''References:''' will be provided :)
-
* '''Basic algorithm:''' Тематические модели, ранее построенные на этих данных.
+
* '''Base algorithm:''' Topic models previously built on this data.
-
* '''Solution:''' Реализация регуляризатора дистрибутивной семантики, аналогичного языковой модели vord2vec, в библиотеке [[BigARTM]].
+
* '''Solution:''' Implementation of a distributional semantics regularizer, similar to the word2vec language model, in the [[BigARTM]] library.
-
* '''Novelty:''' Пока в литературе нет языковых моделей, объединяющих основные преимущества вероятностных тематических моделей and модели word2vec.
+
* '''Novelty:''' So far, there are no language models in the literature that combine the main advantages of probabilistic topic models and the word2vec model.
-
* '''consultant''': Анна Потапенко, по техническим вопросам Murat Apishev, '''автор задачи''' Vorontsov K. V..
+
* '''consultant''': Anna Potapenko, on technical issues Murat Apishev, '''problem author''' Vorontsov K. V.
-
=== Task 7 ===
+
===7. 2016===
-
* '''Name:''' определение положения белков по электронной карте
+
* '''Title:''' Determining the positions of proteins from an electron density map
-
* '''Task''': неформально --- есть наборы экспериментально определённых карт расположения белков в комплексах, часть из них известна в высоком разрешении, необходимо восстановить всю карту в высоком разрешении; формально --- есть матрицы and вектора энергий соответствующие каждой карте белкового комплекса, нужно определить какой набор белков минимизирует квадратичную форму, образованую матрицей and вектором.
+
* '''Problem:''' Informally: there are sets of experimentally determined maps of protein locations in complexes; some of them are known at high resolution, and the entire map must be reconstructed at high resolution. Formally: for each map of a protein complex there are energy matrices and vectors, and it is required to determine which set of proteins minimizes the quadratic form defined by the matrix and the vector.
-
* '''Data:''' экспериментальные данные с сайта http://www.emdatabank.org/ будуь преобразованы в матрицы в вектора энергий. Понимание биофизической природы не обязательно.
+
* '''Data:''' experimental data from the site http://www.emdatabank.org/ will be converted into energy matrices and vectors. An understanding of the underlying biophysics is not required.
-
* '''References:''': статьи по методам решения задач квадратичного программирования and различным релаксациям
+
* '''References:''' articles on methods for solving quadratic programming problems and various relaxations
-
* '''Basic algorithm:''' методы квадратичного программирования с различными релаксациями
+
* '''Base algorithm:''' quadratic programming methods with various relaxations
-
* '''Solution:''' минимизация суммарной энергии белкового комплекса
+
* '''Solution:''' minimizing the total energy of the protein complex
-
* '''Novelty:''' применение методов квадратичного программирования and исследование их точности в Taskх восстановления электронных карт
+
* '''Novelty:''' the application of quadratic programming methods and the study of their accuracy in problems of reconstructing electron density maps
-
* '''consultant''': Александр Катруца, автор задачи: Sergei Grudinin.
+
* '''consultant''': Alexander Katrutsa, problem author: Sergei Grudinin.
-
* '''Желательные навыки''': понимание and интерес к методам оптимизации, работа с пакетом CVX
+
* '''Desirable skills''': understanding of and interest in optimization methods, experience with the CVX package
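
The base algorithm above (quadratic programming with relaxations) can be illustrated with a minimal sketch. It assumes, purely for illustration, that the choice of proteins is encoded as a binary vector x and that the energy is x^T Q x + b^T x; the matrix Q, the vector b, and the function names are hypothetical and are not taken from the problem data. The binary constraint is relaxed to the box [0, 1]^n, solved by projected gradient descent, and rounded; an exhaustive search serves as the reference on a small instance.

```python
import itertools
import numpy as np

def energy(Q, b, x):
    """Quadratic form x^T Q x + b^T x."""
    return float(x @ Q @ x + b @ x)

def brute_force(Q, b):
    """Exact minimum over all binary vectors (feasible only for small n)."""
    n = len(b)
    candidates = (np.array(bits, dtype=float)
                  for bits in itertools.product([0, 1], repeat=n))
    return min(candidates, key=lambda x: energy(Q, b, x))

def box_relaxation(Q, b, steps=500, lr=0.01):
    """Projected gradient descent on x in [0, 1]^n, followed by rounding."""
    x = np.full(len(b), 0.5)
    for _ in range(steps):
        grad = (Q + Q.T) @ x + b
        x = np.clip(x - lr * grad, 0.0, 1.0)   # projection onto the box
    return np.round(x)

# Hypothetical random instance; Q is symmetric and generally indefinite.
rng = np.random.default_rng(0)
n = 8
A = rng.normal(size=(n, n))
Q = (A + A.T) / 2
b = rng.normal(size=n)

x_exact = brute_force(Q, b)
x_relaxed = box_relaxation(Q, b)
gap = energy(Q, b, x_relaxed) - energy(Q, b, x_exact)   # accuracy loss of the relaxation
```

The value of gap is non-negative by construction and measures how much accuracy the relaxation loses on this instance; other relaxations (for example, semidefinite ones) would be compared in the same way.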
-
=== Task 8 ===
+
===8. 2016===
-
* '''Name:''' Классификация физической активности: исследование изменения пространства параметров при дообучении and модификации моделей глубокого обучения
+
* '''Title:''' Classification of Physical Activity: Investigation of Parameter Space Variation under Fine-Tuning and Modification of Deep Learning Models
-
* '''Task''': Дана модель классификации по выборке временных сегментов, записанных с акселерометра мобильного телефона. Модель представляет собой многослойную нейросеть. Требуется 1) исследовать дисперсию and матрицу ковариаций параметров нейросети при различных расписаниях оптимизации (т.е. при различных подходах к поэтапному обучению). 2) на основе полученной матрицы ковариаций параметров предложить эффективный способ модификации модели глубокого обучении.
+
* '''Problem:''' A classification model is given for a sample of time segments recorded from a mobile phone's accelerometer. The model is a multilayer neural network. It is required (1) to investigate the variance and the covariance matrix of the neural network parameters under different optimization schedules (i.e., under different approaches to staged training), and (2) based on the obtained parameter covariance matrix, to propose an effective way to modify the deep learning model.
-
* '''Data:''' Выборка WISDM http://www.cis.fordham.edu/wisdm/dataset.php.
+
* '''Data:''' WISDM dataset http://www.cis.fordham.edu/wisdm/dataset.php.
-
* '''References:''':
+
* '''References:'''
-
**Zadayanchuk A.I., Popova M.S., Strizhov V.V. Выбор оптимальной модели классификации физической активности по измерениям акселерометра http://strijov.com/papers/Zadayanchuk2015OptimalNN4.pdf
+
*# Zadayanchuk A.I., Popova M.S., Strijov V.V. Choosing the optimal physical activity classification model based on accelerometer measurements http://strijov.com/papers/Zadayanchuk2015OptimalNN4.pdf
-
**Попова М. С., Strizhov V.V. Построение сетей глубокого обучения для классификации временных рядов - http://strijov.com/papers/PopovaStrijov2015DeepLearning.pdf
+
*# Popova M.S., Strijov V.V. Building Deep Learning Networks for Time Series Classification - http://strijov.com/papers/PopovaStrijov2015DeepLearning.pdf
-
**Oleg BakhteevЮ., Popova M.S., Strizhov V.V. Системы and средства глубокого обучения в Taskх классификации
+
*# Bakhteev O.Yu., Popova M.S., Strijov V.V. Deep learning systems and tools in classification problems
-
**LeCun Y. Optimal Brain Damage - yann.lecun.com/exdb/publis/pdf/lecun-90b.pdf
+
*# LeCun Y. Optimal Brain Damage - yann.lecun.com/exdb/publis/pdf/lecun-90b.pdf
-
**Работы по пред-обучению (pre-training) and дообучению (fine-tuning)
+
*# Works on pre-training and fine-tuning
-
* '''Basic algorithm:''' Базовая модель описана в статье "Построение сетей глубокого обучения для классификации временных рядов". Алгоритм можно реализовать как с помощью библиотеки PyLearn или keras (другие библиотеки and языки программирования также допустимы).
+
* '''Base algorithm:''' The base model is described in the article "Building Deep Learning Networks for Time Series Classification". The algorithm can be implemented with either the PyLearn or the keras library (other libraries and programming languages are also acceptable).
-
* '''Solution:''' Анализ матрицы ковариаций, построение add-del метода на основе полученных данных.
+
* '''Solution:''' Analysis of the covariance matrix and construction of an add-del method based on the results obtained.
-
* '''Novelty:''' Методика исследования ковариационной матрицы большой размерности, а также полученный алгоритм модификации модели важны and будут использоваться в дальнейшем при анализе моделей глубокого обучения.
+
* '''Novelty:''' The technique for studying a high-dimensional covariance matrix, as well as the resulting model modification algorithm, are important and will be used in the future when analyzing deep learning models.
* '''consultant''': Oleg Bakhteev
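
The first part of the problem (variance and covariance of the parameters across training runs) can be sketched as follows. This is only an illustration under loud assumptions: a single-layer logistic model stands in for the multilayer network, synthetic data stand in for WISDM, and the "optimization schedules" differ only in the random seed that controls initialization and sample order. All names are hypothetical.

```python
import numpy as np

def train_logistic(X, y, seed, epochs=20, lr=0.1):
    """Stochastic gradient descent for logistic regression; the seed fixes init and sample order."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.1, size=X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            p = 1.0 / (1.0 + np.exp(-X[i] @ w))
            w -= lr * (p - y[i]) * X[i]
    return w

# Synthetic stand-in data (assumption): 200 objects, 5 features, noisy linear labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_w = np.array([2.0, -1.0, 0.0, 0.0, 0.5])
y = (X @ true_w + 0.5 * rng.normal(size=200) > 0).astype(float)

# One parameter vector per training run; rows are runs, columns are parameters.
runs = np.stack([train_logistic(X, y, seed) for seed in range(20)])
param_cov = np.cov(runs, rowvar=False)   # 5 x 5 covariance matrix of the parameters
param_var = np.diag(param_cov)           # per-parameter variance across runs
```

Parameters with small variance and weak covariance with the rest are natural candidates for deletion in an add-del procedure, although the actual criterion is part of the proposed research.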
-
=== Task 9 ===
+
===9. 2016===
-
* '''Name:''' восстановление первичной структуры белка по геометрии его главной цепи
+
* '''Title:''' Recovering the primary structure of a protein from the geometry of its main chain
-
* '''Task''': на основе главной цепи белка, то есть по сути его геометрии, надо восстановить первичную структуру белка, то есть какой последовательности аминокислот соотвествует заданная геометрия главной цепи. Предлагается это делать на основе минимизации суммарной энергии белка, выраженной квадратичной формой скорее всего не положительно определённой.
+
* '''Problem:''' Given the main chain of a protein, that is, essentially its geometry, recover the primary structure of the protein, that is, the amino acid sequence that corresponds to the given main-chain geometry. It is proposed to do this by minimizing the total energy of the protein, expressed as a quadratic form that is most likely not positive definite.
-
* '''Data:''' на выбор studentа: собранные матрицы энергий для различных белков на основе их описаний в формате PDB или сами PDB-файлы; в последнем случае необходимо будет собрать матрицы для дальнейшей работы
+
* '''Data:''' at the choice of the student: collected energy matrices for various proteins based on their descriptions in the PDB format or the PDB files themselves; in the latter case, it will be necessary to collect matrices for further work
-
* '''References:''': статьи по методам решения задач квадратичного программирования and различным релаксациям
+
* '''References:''' articles on methods for solving quadratic programming problems and various relaxations
-
* '''Basic algorithm:''' методы квадратичного программирования с различными релаксациями
+
* '''Base algorithm:''' quadratic programming methods with various relaxations
-
* '''Solution:''' минимизация суммарной энергии белка
+
* '''Solution:''' minimizing the total protein energy
-
* '''Novelty:''' применение методов квадратичного программирования and исследование их точности
+
* '''Novelty:''' application of quadratic programming methods and study of their accuracy
-
* '''consultant''': Михаил Карасиков, автор задачи: Sergei Grudinin.
+
* '''consultant''': Mikhail Karasikov, problem author: Sergei Grudinin.
-
* '''Желательные навыки''': понимание and интерес к методам оптимизации, работа с пакетом CVX
+
* '''Desirable skills''': understanding of and interest in optimization methods, experience with the CVX package
-
=== Task 10 ===
+
===10. 2016===
-
* '''Name:''' Multi-task learning подход для задачи предсказания биологической активности ядерных рецепторов
+
* '''Title:''' A multi-task learning approach to predicting the biological activity of nuclear receptors
-
* '''Task''': В задаче необходимо построить multi-task модель, предсказывающую взаимодействие двух типов молекул: рецепторов and протеинов. Решение этой задачи необходимо для разработки новых лекарств (drug design).
+
* '''Problem:''' It is necessary to build a multi-task model that predicts the interaction of two types of molecules: receptors and proteins. Solving this problem is necessary for the development of new drugs (drug design).
-
* '''Data:''' описание 8500+ протеинов and метки для 12 рецепторов
+
* '''Data:''' description of 8500+ proteins and labels for 12 receptors
-
* '''References:''': будет отправлена studentу
+
* '''References:''' will be sent to the student
-
* '''Basic algorithm:''' multi-task lasso регрессия из библиотеки python scikit-learn
+
* '''Base algorithm:''' multi-task lasso regression from the Python library scikit-learn
-
* '''Solution:''' обобщение линейной регрересси на случай multi-task в вероятностной интерпретации
+
* '''Solution:''' generalization of linear regression to the multi-task case in a probabilistic interpretation
-
* '''Novelty:''' Multi-task learning подход является новаторским в области drug design
+
* '''Novelty:''' The multi-task learning approach is novel in the field of drug design
* '''consultant''': Maria Popova
-
* '''Желательные навыки''': понимание and интерес к теории вероятности, готовность быстро разобраться в различных подходах к регрессии, знание или готовность к освоению Python
+
* '''Desired skills''': understanding of and interest in probability theory, willingness to quickly understand various approaches to regression, knowledge or willingness to learn Python
-
=== Task 11 ===
+
===11. 2016===
-
* '''Name:''' Бэггинг нейронных сетей в задаче предсказания биологической активности ядерных рецепторов.
+
* '''Title:''' Bagging of neural networks in the problem of predicting the biological activity of nuclear receptors.
-
* '''Task''': В задаче необходимо реализовать бэггинг (bootstrap aggregating) для двухслойной нейронной сети. Такая модель будет являться мультитасковой and предсказывать взаимодействие двух типов молекул: рецепторов and протеинов. Решение этой задачи необходимо для разработки новых лекарств (drug design).
+
* '''Problem:''' It is necessary to implement bagging (bootstrap aggregating) for a two-layer neural network. Such a model will be multi-task and will predict the interaction of two types of molecules: receptors and proteins. Solving this problem is necessary for the development of new drugs (drug design).
-
* '''Data:''' описание 8500+ протеинов and метки для 12 рецепторов
+
* '''Data:''' description of 8500+ proteins and labels for 12 receptors
-
* '''References:''': будет отправлена studentу
+
* '''References:''' will be sent to the student
-
* '''Basic algorithm:''' двухслойная нейронная сеть
+
* '''Base algorithm:''' two-layer neural network
-
* '''Solution:''' Композиция базовых классификаторов бэггинг
+
* '''Solution:''' A bagging ensemble of base classifiers
-
* '''Novelty:''' Такой подход является новаторским в области drug design
+
* '''Novelty:''' This approach is innovative in the field of drug design
* '''consultant''': Maria Popova
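
Bagging itself can be sketched in a few lines. The example below is an assumption-laden illustration: a ridge-regularized linear model replaces the two-layer neural network as the base learner, the data are synthetic, and three binary targets stand in for the 12 receptors. Each base model is fitted on a bootstrap sample, and the multi-task prediction is the thresholded average of the base scores.

```python
import numpy as np

def add_bias(X):
    """Append a constant column so that the linear model has an intercept."""
    return np.hstack([X, np.ones((len(X), 1))])

def fit_ridge(X, Y, alpha=1.0):
    """Base learner (stand-in for the neural network): ridge regression on 0/1 labels."""
    Xb = add_bias(X)
    d = Xb.shape[1]
    return np.linalg.solve(Xb.T @ Xb + alpha * np.eye(d), Xb.T @ Y)

def bagging_fit(X, Y, n_models=25, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)        # bootstrap: sample with replacement
        models.append(fit_ridge(X[idx], Y[idx]))
    return models

def bagging_predict(models, X):
    Xb = add_bias(X)
    scores = np.mean([Xb @ W for W in models], axis=0)   # aggregate by averaging
    return (scores > 0.5).astype(int)

# Synthetic stand-in data (assumption): 300 objects, 10 features, 3 binary targets.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 10))
W_true = rng.normal(size=(10, 3))
Y = (X @ W_true > 0).astype(float)

models = bagging_fit(X, Y)
Y_hat = bagging_predict(models, X)
accuracy = (Y_hat == Y).mean()
```

Replacing fit_ridge with a routine that trains a two-layer network leaves the bootstrap and aggregation steps unchanged.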
-
===Task 12 ===
+
===12. 2016===
-
* '''Name:''' Смеси моделей в векторной авторегрессии в задаче прогнозирования (больших) временных рядов.
+
* '''Title:''' Mixtures of models in vector autoregression in the problem of predicting (large) time series.
-
* '''Task''': Имеется набор временных рядов длины T, содержащих показания различных датчиков, отражающих состояние устройства. Необходимо предсказать следующие t показаний датчиков. Практическая значимость: перед поломкой состояние устройства меняется, предсказание "аномального" поведения поможет своевременно принять меры and избежать поломки или минимизировать потери.
+
* '''Problem:''' There is a set of time series of length T containing the readings of various sensors that reflect the state of a device. It is necessary to predict the next t sensor readings. Practical significance: the state of the device changes before a breakdown, so predicting "abnormal" behavior helps to take timely measures and to avoid breakdowns or minimize losses.
-
* '''Data:''' Многомерные временные ряды с показаниями различных датчиков серверов (загрузка ЦП, памяти, температура)
+
* '''Data:''' Multivariate time series with readings from various server sensors (CPU load, memory usage, temperature)
-
* '''References:''': Ключевые слова: mixture models, boosting, Adaboost, векторная авторегрессия.
+
* '''References:''' Keywords: mixture models, boosting, Adaboost, vector autoregression.
-
**Александр Цыплаков. Введение в прогнозирование в классических моделях временных рядов. [http://quantile.ru/01/01-AT.pdf]
+
*# Alexander Tsyplakov. Introduction to forecasting in classical time series models. [http://quantile.ru/01/01-AT.pdf]
-
**Нейчев Р.Г., Катруца А.М., Strizhov V.V. Выбор оптимального набора признаков из мультикоррелирующего множества в задаче прогнозирования[http://strijov.com/papers/Neychev2015FeatureSelection.pdf]
+
*# Neichev R.G., Katrutsa A.M., Strijov V.V. Selection of the optimal set of features from a multicorrelated set in the forecasting problem[http://strijov.com/papers/Neychev2015FeatureSelection.pdf]
-
**Christopher M. Bishop. Pattern Recognition and Machine Learning. Страница 667
+
*# Christopher M. Bishop. Pattern Recognition and Machine Learning. Page 667
-
* '''Basic algorithm''': Бустинг, алгоритм Adaboost.
+
* '''Basic algorithm''': Boosting, Adaboost algorithm.
-
* '''Solution:''' Использовать для построения проноза смесь нескольких линейных моделей вместо одной сложной.
+
* '''Solution:''' Use a mixture of several linear models instead of one complex model to build the forecast.
-
* '''Novelty:''' Доработано пространство параметров для смеси моделей в векторной авторегрессии.
+
* '''Novelty:''' Improved parameter space for mixture of models in vector autoregression.
* '''consultant''': Radoslav Neichev
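
A single vector autoregression, the building block of the proposed mixture, can be sketched as follows. The sketch assumes a plain least-squares VAR(p) with an intercept and uses a synthetic two-sensor series; the function names and the simulated coefficients are hypothetical. The model is fitted on the first part of the series and rolled forward to predict the next t readings.

```python
import numpy as np

def fit_var(series, p):
    """Least-squares VAR(p). series has shape (T, d); returns coefficients of shape (p*d + 1, d)."""
    T, _ = series.shape
    # Each row holds the p previous readings (most recent first) plus a constant term.
    Z = np.array([np.concatenate([series[k - p:k][::-1].ravel(), [1.0]])
                  for k in range(p, T)])
    coef, *_ = np.linalg.lstsq(Z, series[p:], rcond=None)
    return coef

def forecast_var(series, coef, p, horizon):
    """Iterate the fitted model to predict the next `horizon` readings."""
    history = list(series[-p:])
    predictions = []
    for _ in range(horizon):
        z = np.concatenate([np.array(history[-p:])[::-1].ravel(), [1.0]])
        nxt = z @ coef
        predictions.append(nxt)
        history.append(nxt)
    return np.array(predictions)

# Synthetic stand-in for two sensor readings (assumption).
rng = np.random.default_rng(0)
A = np.array([[0.5, 0.1], [0.0, 0.8]])
c = np.array([1.0, -0.5])
series = np.zeros((200, 2))
for k in range(1, 200):
    series[k] = A @ series[k - 1] + c + 0.01 * rng.normal(size=2)

train, test = series[:-5], series[-5:]
coef = fit_var(train, p=2)
pred = forecast_var(train, coef, p=2, horizon=5)
max_error = np.abs(pred - test).max()
```

A mixture would fit several such models on different subsets of the data and combine their forecasts with weights; that step is not shown here.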
-
===Task 13 ===
+
===13. 2016===
-
* '''Name:''' Отбор мультикоррелирующих признаков в задаче векторной авторегрессии.
+
* '''Title:''' Selection of multicorrelated features in the problem of vector autoregression.
-
* '''Task''': Имеется набор временных рядов, содержащих показания различных датчиков, отражающих состояние устройства. Показания датчиков коррелируют между собой. Необходимо отобрать оптимальный набор признаков для решения задачи прогнозирования.
+
* '''Problem:''' There is a set of time series containing the readings of various sensors that reflect the state of the device. The readings of the sensors correlate with each other. It is necessary to select the optimal set of features for solving the forecasting problem.
-
* '''Data:''' Многомерные временные ряды с показаниями различных датчиков серверов (загрузка ЦП, памяти, температура)
+
* '''Data:''' Multivariate time series with readings from various server sensors (CPU load, memory usage, temperature)
-
* '''References:''': Ключевые слова: bootstrap aggreagation, метод Белсли, векторная авторегрессия.
+
* '''References:''' Keywords: bootstrap aggregation, Belsley method, vector autoregression.
-
**Нейчев Р.Г., Катруца А.М., Strizhov V.V. Выбор оптимального набора признаков из мультикоррелирующего множества в задаче прогнозирования[http://strijov.com/papers/Neychev2015FeatureSelection.pdf]
+
*# Neichev R.G., Katrutsa A.M., Strijov V.V. Selection of the optimal set of features from a multicorrelated set in the forecasting problem[http://strijov.com/papers/Neychev2015FeatureSelection.pdf]
-
* '''Basic algorithm''': метод Белсли для одномерной авторегрессии (см. статью из списка литературы).
+
* '''Basic algorithm''': Belsley's method for univariate autoregression (see the paper in the reference list).
-
* '''Solution:''' Применить метод Белсли для обнаружения коррелирующих признаков.
+
* '''Solution:''' Apply the Belsley method to detect correlated features.
-
* '''Novelty:''' Метод Белсли применяется для векторной авторегрессии.
+
* '''Novelty:''' The Belsley method is applied to vector autoregression.
* '''consultant''': Radoslav Neichev
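
The Belsley diagnostics mentioned above can be sketched as follows, assuming the standard formulation: columns of the design matrix are scaled to unit length, condition indexes are ratios of the largest singular value to each singular value, and variance-decomposition proportions show which features are involved in each near-dependency. The data and the function name are hypothetical.

```python
import numpy as np

def belsley(X):
    """Condition indexes and variance-decomposition proportions of a design matrix."""
    Xs = X / np.linalg.norm(X, axis=0)               # scale columns to unit length
    _, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    cond_idx = s.max() / s                           # one index per singular value
    phi = (Vt.T ** 2) / s ** 2                       # phi[j, k] = v_jk^2 / s_k^2
    # proportions[k, j]: share of the variance of coefficient j tied to singular value k
    proportions = (phi / phi.sum(axis=1, keepdims=True)).T
    return cond_idx, proportions

# Synthetic example (assumption): feature 2 is almost the sum of features 0 and 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
X[:, 2] = X[:, 0] + X[:, 1] + 1e-3 * rng.normal(size=500)

cond_idx, proportions = belsley(X)
worst = int(np.argmax(cond_idx))                     # the strongest near-dependency
involved = np.where(proportions[worst] > 0.5)[0]     # features taking part in it
```

A large condition index together with high proportions for two or more features signals a group of correlated features, from which some can be removed before forecasting.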
-
===Task 14 ===
+
===14. 2016===
-
* '''Name:''' Порождение признаков в задаче прогнозирования.
+
* '''Title:''' Generation of features in the prediction problem.
-
* '''Task''': Имеется набор временных рядов, содержащих показания различных датчиков, отражающих состояние устройства. Необходимо расширить пространство признаков с помощью нелинейных параметрический порождающих функций.
+
* '''Problem:''' There is a set of time series containing the readings of various sensors that reflect the state of the device. It is necessary to expand the feature space with the help of non-linear parametric generating functions.
-
* '''Data:''' Многомерные временные ряды с показаниями различных датчиков серверов (загрузка ЦП, памяти, температура)
+
* '''Data:''' Multivariate time series with readings from various server sensors (CPU load, memory usage, temperature)
-
* '''References:''': Ключевые слова: криволинейная регрессия, порождение признаков, нелинейная регрессия, аппроксимация временных рядов.
+
* '''References:''' Keywords: curvilinear regression, feature generation, non-linear regression, time series approximation.
-
*.П. Кузнецов, Strizhov V.V., М.М. Медведникова. Алгоритм многоклассовой классификации объектов, описанных в ранговых шкалах.[http://strijov.com/papers/Kuznetsov2012Curvilinear.pdf]
+
*# M.P. Kuznetsov, Strijov V.V., M.M. Medvednikova. Algorithm for multiclass classification of objects described in rank scales.[http://strijov.com/papers/Kuznetsov2012Curvilinear.pdf]
-
* '''Basic algorithm''': Непараметрические порождающие функициии.
+
* '''Basic algorithm''': Non-parametric generating functions.
-
* '''Solution:''' Применить к признакам квазилинейные and нелинейные преобразования зависящие от параметра.
+
* '''Solution:''' Apply quasi-linear and non-linear parameter-dependent transformations to the features.
-
* '''Novelty:''' Предложен новый набор признаков для решения авторегрессионных задач.
+
* '''Novelty:''' A new set of features for solving autoregressive problems is proposed.
* '''consultant''': Roman Isachenko
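
Feature generation with parametric functions can be sketched as below. The particular set of generating functions, their parameters (a, b), and the synthetic data are illustrative assumptions, not the set proposed in the problem. Each function is applied elementwise to every original feature, and the results are appended to the design matrix.

```python
import numpy as np

# Hypothetical parametric generating functions g(x; a, b).
GENERATORS = {
    "power":   lambda x, a, b: np.sign(x) * np.abs(x) ** a + b,
    "log":     lambda x, a, b: np.log1p(a * np.abs(x)) + b,
    "sigmoid": lambda x, a, b: 1.0 / (1.0 + np.exp(-(a * x + b))),
    "gauss":   lambda x, a, b: np.exp(-((x - b) ** 2) / a),
}

def generate_features(X, params):
    """Append g(X; a, b) for every generating function listed in params."""
    columns = [X]
    for name, (a, b) in params.items():
        columns.append(GENERATORS[name](X, a, b))
    return np.hstack(columns)

def fit_error(F, y):
    """Residual norm of a least-squares fit with an intercept."""
    F1 = np.column_stack([F, np.ones(len(F))])
    w, *_ = np.linalg.lstsq(F1, y, rcond=None)
    return float(np.linalg.norm(F1 @ w - y))

# Synthetic data (assumption): the target depends on the features non-linearly.
rng = np.random.default_rng(0)
X = rng.uniform(0.1, 4.0, size=(100, 2))
y = np.sqrt(X[:, 0]) + np.log1p(2.0 * X[:, 1])

params = {"power": (0.5, 0.0), "log": (2.0, 0.0), "sigmoid": (1.0, 0.0), "gauss": (1.0, 0.0)}
expanded = generate_features(X, params)
err_raw = fit_error(X, y)
err_expanded = fit_error(expanded, y)
```

In the actual problem the parameters (a, b) would be tuned rather than fixed, and the expanded set would then go through feature selection.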
-
===Task 15 ===
+
===15. 2016===
-
* '''Name:''' Преобразования временных рядов для декодирование движения руки с помощью ECoG сигналов (electrocorticographic signals) у обезьян.
+
* '''Title:''' Time series transformations for hand motion decoding using ECoG signals (electrocorticographic signals) in monkeys.
-
* '''Task''': Имеется набор временных рядов, записи ECoG сигналов. Необходимо выделить признаки с помощью преобразований временных рядов (например, оконного преобразования Фурье).
+
* '''Problem:''' There is a set of time series records of ECoG signals. It is necessary to extract the features using time series transformations (for example, the windowed Fourier transform).
-
* '''Data:''' Многомерные временные ряды с показаниями ECOG and данные о движении обезьян [http://neurotycho.org/food-tracking-task]
+
* '''Data:''' Multivariate time series with ECoG readings and monkey movement data [http://neurotycho.org/food-tracking-task]
-
* '''References:''': Ключевые слова: выделение признаков, преобразования временных рядов, ECoG signal processing
+
* '''References:''' Keywords: feature extraction, time series transformations, ECoG signal processing
-
**Zenas C. Chao, Yasuo Nagasaka and Naotaka Fujii. Long-term asynchronous decoding of arm motion using electrocorticographic signals in monkeys[http://journal.frontiersin.org/article/10.3389/fneng.2010.00003/full]
+
*# Zenas C. Chao, Yasuo Nagasaka and Naotaka Fujii. Long-term asynchronous decoding of arm motion using electrocorticographic signals in monkeys
-
* '''Basic algorithm''': Вейвлет-преобразование (англ. Wavelet transform)
+
* '''Basic algorithm''': Wavelet transform
-
* '''Solution:''' Выделение признаков из ECoG различными методами.
+
* '''Solution:''' Feature extraction from ECoG by various methods.
-
* '''Novelty:''' Анализ оптимальности Вейвлет-преобразования в Taskх обработки ECoG сигналов
+
* '''Novelty:''' Analysis of the optimality of the wavelet transform in ECoG signal processing problems
-
* '''consultant''': Задаянчук Андрей
+
* '''consultant''': Andrey Zadayanchuk
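
The windowed Fourier transform mentioned in the problem statement can be sketched as follows. The window length, the hop size, the Hann window, and the synthetic one-channel signal are assumptions made for illustration; real ECoG recordings are multichannel and would be processed channel by channel.

```python
import numpy as np

def stft_features(signal, window=64, hop=32):
    """Magnitude spectrum of overlapping Hann-windowed frames; shape (n_frames, window // 2 + 1)."""
    win = np.hanning(window)
    starts = range(0, len(signal) - window + 1, hop)
    frames = np.array([signal[s:s + window] * win for s in starts])
    return np.abs(np.fft.rfft(frames, axis=1))

# Synthetic stand-in for one channel (assumption): a 10 Hz sine sampled at 128 Hz.
fs = 128
t = np.arange(512) / fs
signal = np.sin(2 * np.pi * 10 * t)

spec = stft_features(signal)
peak_bins = spec.argmax(axis=1)             # dominant frequency bin in each frame
peak_freqs = peak_bins * fs / 64            # bin index converted to Hz
```

Each row of spec is a feature vector for one time window; a wavelet transform would be substituted at the same place to compare the two representations.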
-
===Task 16 ===
+
===16. 2016===
-
* '''Name:''' Адаптивный нелинейный метод восстановления матрицы по частичным наблюдениям
+
* '''Title:''' An adaptive nonlinear method for recovering a matrix from partial observations
-
* '''Task''': Пусть есть неизвестная (возможно многомерная) матрица A, позиция элемента в ней описывается целочисленным вектором p. Известны значения матрицы на некотором подмножестве ее элементов. Требуется найти параметризацию and параметры такие, что на некотором некотором подмножестве элементов минимизируется квадратичное отклонение. Более подробное описание по ссылке [https://www.dropbox.com/s/6xkk3xuzaa4y472/AdaptiveNonlinearMC.pdf?dl=0]
+
* '''Problem:''' Let there be an unknown (possibly multidimensional) matrix A, the position of an element in it is described by an integer vector p. The values of the matrix on some subset of its elements are known. It is required to find a parametrization and parameters such that the quadratic deviation is minimized on some subset of elements. More detailed description at the link [https://www.dropbox.com/s/6xkk3xuzaa4y472/AdaptiveNonlinearMC.pdf?dl=0]
-
* '''Data:''' модельные данные, Netflix Prize Data Set, MovieLens 20M Dataset, Criteo Display Advertising Challenge Dataset
+
* '''Data:''' synthetic data, Netflix Prize Data Set, MovieLens 20M Dataset, Criteo Display Advertising Challenge Dataset
-
* '''References:''':
+
* '''References:'''
-
**"ACCAMS: Additive Co-Clustering to Approximate Matrices Succinctly" (Beutel, Amr Ahmed, Smola)
+
*# "ACCAMS: Additive Co-Clustering to Approximate Matrices Succinctly" (Beutel, Amr Ahmed, Smola)
-
**"Non-linear Matrix Factorization with Gaussian Processes" (Neil D. Lawrence)
+
*# "Non-linear Matrix Factorization with Gaussian Processes" (Neil D. Lawrence)
-
**"Low-rank matrix completion using alternating minimization" (Prateek Jain, Praneeth Netrapalli, Sujay Sanghavi)
+
*# "Low-rank matrix completion using alternating minimization" (Prateek Jain, Praneeth Netrapalli, Sujay Sanghavi)
-
* '''Basic algorithm''': Низкоранговое приближение
+
* '''Basic algorithm''': Low-rank approximation
-
* '''Solution:''' and параметры, and параметризацию искать из данных.
+
* '''Solution:''' Learn both the parameters and the parametrization from the data.
-
* '''Novelty:''' Обобщение работ в данной области; предложена новая модель, эфективность которой предлагается проверить
+
* '''Novelty:''' A generalization of existing work in this area; a new model is proposed, whose effectiveness is to be tested
-
* '''consultant''': Михаил Трофимов
+
* '''consultant''': Mikhail Trofimov
-
* '''Желательные навыки''': python
+
* '''Desirable Skills''': Python
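
The low-rank baseline can be sketched with alternating least squares on the observed entries. This is a generic illustration of low-rank completion, not the adaptive nonlinear method of the problem; the rank, the regularization constant, and the synthetic matrix are assumptions.

```python
import numpy as np

def als_complete(M, mask, rank=2, reg=1e-2, iters=50, seed=0):
    """Fit M ~ U V^T on the observed entries (mask == True) by alternating ridge regressions."""
    rng = np.random.default_rng(seed)
    n, m = M.shape
    U = rng.normal(scale=0.1, size=(n, rank))
    V = rng.normal(scale=0.1, size=(m, rank))
    R = reg * np.eye(rank)
    for _ in range(iters):
        for i in range(n):                       # update row factors with V fixed
            obs = mask[i]
            U[i] = np.linalg.solve(V[obs].T @ V[obs] + R, V[obs].T @ M[i, obs])
        for j in range(m):                       # update column factors with U fixed
            obs = mask[:, j]
            V[j] = np.linalg.solve(U[obs].T @ U[obs] + R, U[obs].T @ M[obs, j])
    return U @ V.T

# Synthetic rank-2 matrix with about 60% of the entries observed (assumption).
rng = np.random.default_rng(1)
truth = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 20))
mask = rng.random(truth.shape) < 0.6
M = np.where(mask, truth, 0.0)

estimate = als_complete(M, mask)
rmse_hidden = np.sqrt(np.mean((estimate - truth)[~mask] ** 2))   # error on unobserved entries
rmse_zero = np.sqrt(np.mean(truth[~mask] ** 2))                  # error of predicting zeros
```

The quadratic deviation on held-out entries, as computed here, is the natural quality measure for comparing this baseline with a learned parametrization.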
-
===Task 17 ===
+
===17. 2016===
-
* '''Name:''' Построение скоринговых моделей в системе SAS (либо MATLAB).
+
* '''Title:''' Building scoring models in the SAS system (or MATLAB).
-
* '''Task''': Описать основные этапы построения скоринговых моделей. На этапе подготовки данных решается Task фильтрации выборов (удаления шумовых объектов). Так как выборка содержит значительное число признаков, не коррелирующих с платежеспособностью, необходимо решать задачу отбора признаков. Кроме того, в силу неоднородности данных (например, по регионам) предлагается строить смесь моделей, в которой каждая модель описывает свое подмножество выборки. При этом различным компонентам смеси могут соответствовать разные наборы признаков.
+
* '''Problem:''' Describe the main steps in building scoring models. At the data preparation stage, the problem of filtering outliers (removing noisy objects) is solved. Since the sample contains a significant number of features that do not correlate with solvency, the feature selection problem must be solved. In addition, because the data are heterogeneous (for example, across regions), it is proposed to build a mixture of models in which each model describes its own subset of the sample. Different components of the mixture may correspond to different feature sets.
-
* '''Data:''' Кредитная Story/анкеты потенциальных заемщиков [http://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/], [http://archive.ics.uci.edu/ml/datasets/Statlog+%28Australian+Credit+Approval%29/].
+
* '''Data:''' Credit histories and questionnaires of potential borrowers [http://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/], [http://archive.ics.uci.edu/ml/datasets/Statlog+%28Australian+Credit+Approval%29/].
-
* '''References:''':
+
* '''References:'''
-
** Хосмер, Лемешов. Логистическая регрессия (англ.)
+
*# Hosmer, Lemeshow. Logistic regression (in English)
-
** Siddiqi. Constructing scorecards
+
*# Siddiqi. Constructing scorecards
-
** [http://svn.code.sf.net/p/mlalgorithms/code/Scoring Материалы по построению скоринговых карт]
+
*# [http://svn.code.sf.net/p/mlalgorithms/code/Scoring Materials on building scorecards]
-
* '''Basic algorithm''': Логистическая регрессия
+
* '''Basic algorithm''': Logistic regression
-
* '''Solution:''' Смесь моделей
+
* '''Solution:''' Mixture of models
-
* '''Novelty:''' Описан способ построения скоринговых карт, в котором в задачу оптимизации включены как порождение признаков, так and мультимоделирование.
+
* '''Novelty:''' A method for constructing scorecards is described, in which both feature generation and multi-modeling are included in the optimization problem.
-
* '''consultant''': Раиса Джамтырова
+
* '''consultant''': Raisa Jamtyrova
-
* '''Желательные навыки''': SAS
+
* '''Desirable Skills''': SAS
-
===Task 18 ===
+
===18. 2016===
-
* '''Name:''' Аппроксимация границ радужки глаза.
+
* '''Title:''' Approximation of the boundaries of the iris.
-
* '''Task''': По изображению человеческого глаза определить окружности, аппроксимирующие внутреннюю and внешнюю границу радужки.
+
* '''Problem:''' Given an image of a human eye, determine the circles that approximate the inner and outer boundaries of the iris.
-
* '''Data:''' Растровые монохромные изображения, типичный размер 640*480 пикселей (однако, возможны and другие размеры)
+
* '''Data:''' Raster monochrome images, typical size 640*480 pixels (however, other sizes are also possible)
[http://www.bath.ac.uk/elec-eng/research/sipg/irisweb/], [http://www.cb-sr.ia.ac.cn/IrisDatabase.htm].
-
* '''References:''':
+
* '''References:'''
-
** К.А.Ганькин, А.Н.Гнеушев, И.А.Матвеев Сегментация изображения радужки глаза, основанная на приближенных методах с последующими уточнениями // Известия РАН. Теория and системы управления, 2014, 2, с. 78–92.
+
*# K.A. Gankin, A.N. Gneushev, I.A. Matveev. Segmentation of the iris image based on approximate methods with subsequent refinements // Izvestiya RAN. Theory and control systems, 2014, no. 2, pp. 78–92.
-
** Duda, R. O. Use of the Hough transformation to detect lines and curves in pictures / R. O. Duda, P. E. Hart // Communications of the ACM. 1972. Vol. 15, no. 1. Pp.
+
*# Duda, R. O. Use of the Hough transformation to detect lines and curves in pictures / R. O. Duda, P. E. Hart // Communications of the ACM. 1972. Vol. 15, no. 1. Pp.
-
* '''Basic algorithm''': Ефимов Юрий. Поиск внешней and внутренней границ радужки на изображении глаза методом парных градиентов, 2015.
+
* '''Basic algorithm''': Efimov Yury. Search for the outer and inner boundaries of the iris in the eye image using the paired gradient method, 2015.
-
* '''Solution:''' См. [[Media:Iris_circle_problem.pdf | Iris_circle_problem.pdf]]
+
* '''Solution:''' See [[Media:Iris_circle_problem.pdf | iris_circle_problem.pdf]]
-
* '''Novelty:''' Предложен быстрый беспереборный алгоритм аппроксимации границ с помощью линейных мультимоделей.
+
* '''Novelty:''' A fast non-enumerative algorithm for approximating boundaries using linear multimodels is proposed.
-
* '''consultant''': Юрий Ефимов (автор Стрижов, Expert Матвеев)
+
* '''consultant''': Yuri Efimov (problem author Strijov V.V., expert Matveev)
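
Approximating a boundary by a circle reduces to a linear least-squares problem once the circle equation is written as x^2 + y^2 = 2ax + 2by + c with c = r^2 - a^2 - b^2. The sketch below uses this algebraic fit on synthetic boundary points; it is a generic illustration and is neither the paired gradient method nor the multimodel algorithm from Iris_circle_problem.pdf. The point generator and its parameters are hypothetical.

```python
import numpy as np

def fit_circle(points):
    """Algebraic least-squares circle fit; returns (center_x, center_y, radius)."""
    x, y = points[:, 0], points[:, 1]
    A = np.column_stack([2 * x, 2 * y, np.ones_like(x)])
    rhs = x ** 2 + y ** 2
    (a, b, c), *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return a, b, np.sqrt(c + a ** 2 + b ** 2)

rng = np.random.default_rng(0)

def noisy_circle(cx, cy, r, n=200, noise=0.5):
    """Synthetic boundary points (assumption): a circle with Gaussian pixel noise."""
    angle = rng.uniform(0, 2 * np.pi, n)
    pts = np.column_stack([cx + r * np.cos(angle), cy + r * np.sin(angle)])
    return pts + noise * rng.normal(size=pts.shape)

# Inner (pupil) and outer (iris) boundaries in a 640*480 image.
pupil = fit_circle(noisy_circle(320, 240, 40))
iris = fit_circle(noisy_circle(320, 240, 110))
```

On real images the boundary points would come from an edge detector, and the two circles would have to be separated from each other and from eyelid edges before fitting.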
-
=== Task 19 ===
+
===19. 2016===
-
* '''Name:''' Аппроксимация комбинаторных оценок переобучения для отбора признаков в задаче медицинской диагностики.
+
* '''Title:''' Approximation of combinatorial overfitting estimates for feature selection in the problem of medical diagnostics.
-
* '''Task''': [[Технология информационного анализа электрокардиосигналов]] по В. М. Успенскому применяется для диагностики заболеваний внутренних органов по электрокардиограмме. Линейный наивный байесовский классификатор с отбором признаков хорошо зарекомендовал себя в этой задаче. Однако для отбора признаков до сих пор использовались только очень простые жадные стратегии. Предлагается использовать более интенсивные переборные стратегии, чтобы найти лучшие and более короткие диагностические наборы признаков. Однако чем интенсивнее перебор, тем выше вероятность переобучения. Для сокращения переобучения предлагается использовать комбинаторные оценки переобучения пороговых решающих правил. Для эффективного вычисления этих оценок предлагается использовать суррогатное моделирование.
+
* '''Problem:''' [[Technology of information analysis of electrocardiosignals]] according to V. M. Uspensky is used to diagnose diseases of internal organs from an electrocardiogram. The linear naive Bayes classifier with feature selection performs well in this problem. However, only very simple greedy strategies have been used so far for feature selection. It is proposed to use more intensive enumeration strategies to find better and shorter diagnostic feature sets. However, the more intensive the search, the higher the probability of overfitting. To reduce overfitting, it is proposed to use combinatorial estimates of overfitting for threshold decision rules. For efficient calculation of these estimates, it is proposed to use surrogate modeling.
-
* '''Data:''' Выборки векторов признаковых описаний ЭКГ, полученные с помощью системы скрининговой диагностики «Скринфакс». Будут выданы.
+
* '''Data:''' Samples of ECG feature vectors obtained with the Screenfax screening diagnostics system. The data will be provided.
-
* '''References:''':
+
* '''References:'''
-
** ''Успенский В. М.'' Информационная функция сердца. Теория and практика диагностики заболеваний внутренних органов методом информационного анализа электрокардиосигналов. – М.: Экономика and информатика, 2008. 116 с.
+
*# ''Uspensky V. M.'' Informational function of the heart. Theory and practice of diagnosing diseases of internal organs by the method of information analysis of electrocardiosignals. - M.: Economics and informatics, 2008. - 116 p.
-
** ''Vorontsov K. V.'' [[Media:Voron-2011-tnop.pdf|Теория надёжности обучения по прецедентам]]. Курс лекций ВМК МГУ and МФТИ. 2011.
+
*# Vorontsov K. V. [[Media:Voron-2011-tnop.pdf|Reliability theory of precedent learning]]. Course of lectures of VMK MSU and MIPT. 2011.
-
** ''Ишкина Ш. Х.'' Комбинаторные оценки обобщающей способности как критерии отбора признаков в синдромном алгоритме. - Тезисы 58-научной конференции МФТИ. URL: http://conf58.mipt.ru/static/reports_pdf/755.pdf
+
*# ''Ishkina Sh. Kh.'' Combinatorial estimates of generalizing ability as criteria for feature selection in the syndromic algorithm. - Abstracts of the 58th scientific conference of the Moscow Institute of Physics and Technology. URL: http://conf58.mipt.ru/static/reports_pdf/755.pdf
-
** MVR Composer http://www.machinelearning.ru/wiki/index.php?title=MVR_Composer
+
*# MVR Composer http://www.machinelearning.ru/wiki/index.php?title=MVR_Composer
-
* '''Basic algorithm:''' линейный наивный байесовский классификатор с отбором признаков.
+
* '''Base algorithm:''' linear naive Bayes classifier with feature selection.
-
* '''Solution:''' Для оценивания переобучения используются точные комбинаторные формулы. Для аппроксимации (суррогатного моделирования) этих формул используется MVR Composer. Для отбора признаков используются эвристические полужадные алгоритмы комбинаторной оптимизации.
+
* '''Solution:''' Exact combinatorial formulas are used to evaluate overfitting. For approximation (surrogate modeling) of these formulas, MVR Composer is used. Heuristic semi-greedy combinatorial optimization algorithms are used for feature selection.
-
* '''Novelty:''' Ранее для отбора признаков комбинаторные оценки переобучения не применялись. Данный метод позволяет сокращать диагностические наборы признаков and улучшать качество классификации.
+
* '''Novelty:''' Combinatorial overfitting estimates have not previously been used for feature selection. This method makes it possible to shorten diagnostic feature sets and improve classification quality.
-
* '''consultant''': Ишкина Шаура, Кулунчаков Андрей (MVR Composer), '''автор задачи''': Vorontsov K. V.
+
* '''consultant''': Ishkina Shaura, Kulunchakov Andrey (MVR Composer), '''problem author''': Vorontsov K. V.
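The selection loop described in the Solution item can be illustrated with a minimal sketch. It is not the project's algorithm: it shows plain greedy forward selection, and the function name <code>greedy_select</code>, the table <code>GAIN</code>, and the toy criterion (empirical error plus a per-feature penalty standing in for a combinatorial overfitting estimate) are all assumptions made for illustration.

```python
def greedy_select(features, criterion, max_size=None):
    """Greedy forward selection: repeatedly add the feature that lowers
    the criterion the most, and stop when no candidate improves it."""
    selected = []
    best = criterion(selected)
    remaining = list(features)
    while remaining and (max_size is None or len(selected) < max_size):
        # Score every one-feature extension of the current subset.
        score, candidate = min(
            ((criterion(selected + [f]), f) for f in remaining),
            key=lambda pair: pair[0],
        )
        if score >= best:
            break  # no extension improves the penalized criterion
        selected.append(candidate)
        remaining.remove(candidate)
        best = score
    return selected, best


# Hypothetical toy criterion: each feature reduces the error by GAIN[f],
# and each selected feature costs 0.05 as a stand-in overfitting penalty.
GAIN = {"a": 0.3, "b": 0.2, "c": 0.0}


def toy_criterion(subset):
    return 1.0 - sum(GAIN[f] for f in subset) + 0.05 * len(subset)


subset, value = greedy_select(["a", "b", "c"], toy_criterion)
print(subset, round(value, 2))
```

In this toy run the uninformative feature <code>c</code> is rejected because its penalty outweighs its gain, which is the effect an overfitting term is meant to have on the length of the selected set.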
-
=== Task 20 ===
+
===20. 2016===
-
* '''Name:''' Модель порождения объектов в задаче прогнозирования временных рядов
+
* '''Title:''' Object generation model in the problem of time series forecasting
-
*'''Task''': Построить модель порождения объектов для задачи прогнозирования, которая будет создавать качественную выборку для последующего решения задачи прогнозирования.
+
*'''Problem''': Build an object generation model for the forecasting problem that produces a high-quality sample for subsequently solving that problem.
-
* '''Data:''' Временные ряды потребления электроэнергии, временные ряды акселерометра мобильного телефона
+
* '''Data:''' Electricity consumption time series, mobile phone accelerometer time series
-
* '''References:''':
+
* '''References:'''
-
**Keogh E. J., Pazzani M. J. Scaling up dynamic time warping to massive datasets
+
*# Keogh E. J., Pazzani M. J. Scaling up dynamic time warping to massive datasets
-
**Salvador S., Chan P. Fastdtw: Toward accurate dynamic time warping in linear time and space
+
*# Salvador S., Chan P. Fastdtw: Toward accurate dynamic time warping in linear time and space
-
**Kuznetsov M.P., Ivkin N.P. Алгоритм классификации временных рядов акселерометра по комбинированному признаковому описанию
+
*# Kuznetsov M.P., Ivkin N.P. Algorithm for classification of accelerometer time series by combined feature description
-
**Карасиков М. Е. Классификация временных рядов в пространстве параметров порождающих моделей [https://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group174/Karasikov2015TimeSeriesClassification/doc/Karasikov2015TimeSeriesClassification.pdf?format=raw]
+
*# Karasikov M. E. Classification of time series in the space of parameters of generating models
-
* '''Basic algorithm:''' Различные эвристики
+
* '''Base algorithm:''' Various heuristics
-
* '''Постановка задачи''': Формулировка and подробное описание задачи приведено по ссылке [https://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group274/Goncharov2016Essays/Goncharov2016Consult.pdf?format=raw]
+
* '''Problem Statement''': The formulation and a detailed description of the problem are given at [https://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group274/Goncharov2016Essays/Goncharov2016Consult.pdf?format=raw]
-
* '''Novelty:''' рассмотрение модели порождения данных в подобной задаче
+
* '''Novelty:''' consideration of a data generation model in this kind of problem
-
* '''consultant''': Гончаров Алексей
+
* '''consultant''': Alexey Goncharov
-
=== Task 21 ===
+
===21. 2016===
-
* '''Name:''' Алгоритм прогнозирования структуры локально-оптимальных моделей
+
* '''Title:''' Algorithm for predicting the structure of locally optimal models
-
*'''Task''': Требуется спрогнозировать временной ряд с помощью некоторой параметрической суперпозицией алгебраических функций. Предлагается не стоить прогностическую модель, а спрогнозировать ее, то есть предсказать структуру аппроксимирующей суперпозиции. Вводится класс рассматриваемых суперпозиций, and на множестве таких структурных описаний проводится поиск локально-оптимальной модели для рассматриваемой задачи. Task состоит в 1) поиске подходящего структурного описания модели 2) описания алгоритма поиска той структуры, которая будет соответствовать оптимальной модели 3) описания алгоритма обратного построения модели по ее структурному описанию. В качестве уже имеющегося примера ответа на вопросы 1-3, смотри работы А. А. Варфоломеевой.
+
*'''Problem''': It is required to forecast a time series using a parametric superposition of algebraic functions. It is proposed not to build the forecasting model directly but to predict it, that is, to predict the structure of the approximating superposition. A class of admissible superpositions is introduced, and a locally optimal model for the problem is sought over the set of their structural descriptions. The problem consists of (1) finding a suitable structural description of a model, (2) describing an algorithm that searches for the structure corresponding to the optimal model, and (3) describing an algorithm that reconstructs the model from its structural description. For an existing example that answers questions 1-3, see the work of A. A. Varfolomeeva.
-
* '''Data:''' Набор временных рядов, который подразумевает восстановление функциональных зависимостей. Предлагается сначала использовать синтетические данные или сразу применить алгоритм к прогнозированию временных рядов 1) потребления электроэнергии 2) физической активности с последующим анализом получающихся структур.
+
* '''Data:''' A set of time series for which functional dependencies are to be recovered. It is proposed either to start with synthetic data or to apply the algorithm directly to forecasting time series of (1) electricity consumption and (2) physical activity, followed by an analysis of the resulting structures.
-
* '''References:''':
+
* '''References:'''
-
**A. A. Varfolomeeva Выбор признаков при разметке библиографических списков методами структурного обучения, 2013, [http://www.machinelearning.ru/wiki/images/f/f2/Varfolomeeva2013Diploma.pdf?format=raw]
+
*# A. A. Varfolomeeva Selection of features when marking up bibliographic lists using structural learning methods, 2013, [http://www.machinelearning.ru/wiki/images/f/f2/Varfolomeeva2013Diploma.pdf?format=raw]
-
**Bin Cao, Ying Li and Jianwei Yin Measuring Similarity between Graphs Based on the Levenshtein Distance, 2012, [http://naturalspublishing.com/files/published/92cn7jm44d8wt1.pdf?format=raw]
+
*# Bin Cao, Ying Li and Jianwei Yin Measuring Similarity between Graphs Based on the Levenshtein Distance, 2012, [http://naturalspublishing.com/files/published/92cn7jm44d8wt1.pdf?format=raw]
-
* '''Basic algorithm:''' Конкретно к предлагаемой проблеме базового алгоритма нет. Предлагается попробовать повторить эксперимент А. А. Варфоломеевой для другого структурного описания, чтобы понять, что происходит.
+
* '''Base algorithm:''' There is no base algorithm specifically for the proposed problem. It is proposed to repeat the experiment of A. A. Varfolomeeva with a different structural description in order to understand how it behaves.
-
* '''Solution:''' Суперпозиция алгебраических функций задает ордерево, на вершинах которого заданы метки соответствующих алгебраических функций или переменных. Поэтому структурным описанием такой суперпозиции может являться ее DFS-code. Это строка, состоящая из меток вершин, записанных в порядке обхода дерева поиском в глубину. Зная арности соответствующих алгебраических функций, можем любой такой DFS-code восстановить за O(n) and получить обратно суперпозицию функций. На множестве подобных строковых описаний предлагается искать то строковое описание, которое будет соответствовать оптимальной модели.
+
* '''Solution:''' A superposition of algebraic functions defines a rooted ordered tree whose vertices are labeled with the corresponding algebraic functions or variables. The structural description of such a superposition can therefore be its DFS-code: a string of vertex labels written in the order of a depth-first traversal of the tree. Knowing the arities of the algebraic functions, any such DFS-code can be decoded in O(n) to recover the superposition of functions. It is proposed to search the set of such string descriptions for the one that corresponds to the optimal model.
-
* '''consultant''': Кулунчаков Андрей
+
* '''consultant''': Kulunchakov Andrey
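The encoding and the O(n) decoding described in the Solution item can be sketched as follows. This is an illustrative sketch rather than the project code: the <code>Node</code> class, the function names, and the arity table <code>ARITY</code> are assumptions, and labels are kept as a list of strings instead of a single string.

```python
from dataclasses import dataclass, field

# Hypothetical arity table; variables have arity 0.
ARITY = {"plus": 2, "times": 2, "sin": 1, "exp": 1, "x": 0, "y": 0}


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)


def encode(node):
    """Return the DFS-code: vertex labels in depth-first (pre-order) order."""
    code = [node.label]
    for child in node.children:
        code.extend(encode(child))
    return code


def decode(code, arity=ARITY):
    """Rebuild the tree from its DFS-code in a single pass over the labels."""
    root = None
    stack = []  # (node, number of children still missing)
    for label in code:
        if root is not None and not stack:
            raise ValueError("extra labels after a complete tree")
        node = Node(label)
        if stack:
            parent, missing = stack[-1]
            parent.children.append(node)
            if missing == 1:
                stack.pop()  # parent is now complete
            else:
                stack[-1] = (parent, missing - 1)
        else:
            root = node
        if arity[label] > 0:
            stack.append((node, arity[label]))
    if root is None or stack:
        raise ValueError("incomplete DFS-code")
    return root


def to_expr(node):
    """Render the tree as a nested functional expression."""
    if not node.children:
        return node.label
    return f"{node.label}({', '.join(to_expr(c) for c in node.children)})"


tree = decode(["plus", "sin", "x", "times", "x", "y"])
print(to_expr(tree))
```

Each label is pushed and popped at most once, so decoding is linear in the length of the code. Not every label sequence is a valid DFS-code, so a search over strings has to either restrict moves to valid codes or reject invalid ones, as <code>decode</code> does here by raising <code>ValueError</code>.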
-
=== Task 22 ===
+
===22. 2016===
-
* '''Name:''' Определение заимствований в тексте без указания источника
+
* '''Title:''' Detection of borrowings in a text without a specified source
-
*'''Task''': Решается Task выявления внутренних заимствований в тексте. Требуется проверить гипотезу о том, что заданный текст написан единственным автором, and в случае ее невыполнения выделить заимствованные части текста. Заимствованием считается часть текста, предположительно написанная другим автором and содержащая характерные отличия от стиля основного автора. Требуется разработать такую стилевую функцию, которая позволяет с высокой степенью достоверности отличить стиль основного автора текста от заимствований.
+
*'''Problem''': The problem of detecting internal borrowings in a text is addressed. It is required to test the hypothesis that the given text was written by a single author and, if the hypothesis is rejected, to highlight the borrowed parts of the text. A borrowing is a part of the text presumably written by another author that differs characteristically from the style of the main author. It is required to develop a style function that distinguishes the style of the main author from borrowings with a high degree of confidence.
-
* '''Data:''' Коллекция конкурса PAN-2011.
+
* '''Data:''' PAN-2011 contest collection.
-
* '''References:''':
+
* '''References:'''
-
*# Oberreuter, G., L’Huillier, G., Rıos, S. A., & Velásquez, J. D. (2011). Approaches for intrinsic and external plagiarism detection. Proceedings of the PAN.
+
*# Oberreuter, G., L'Huillier, G., Ríos, S. A., & Velásquez, J. D. (2011). Approaches for intrinsic and external plagiarism detection. Proceedings of the PAN.
-
* '''Basic algorithm, решение''': На текущий момент реализован базовый метод выявления зависимостей, основанный на анализе частотностей слов and символьных n-грамм в предложении. Для каждого текста формируется словарь, в котором каждому слову (n-грамме) поставлено в соответствие значение его встречаемости в тексте. На основе значений встречаемости формируется признаковое описание каждого сегмента-предложения. Выполняется классификация сегментов текста на основе Expertной разметки заимствований. Качество базового алгоритма составляет 0.29 по F1-мере (Pladget 0.21) на коллекции PAN-2011, в то время как качество лучшего алгоритма, принимавшего участие в соревновании 2011 года [Oberreuter], составляет 0.32 по F1-мере (Pladget 0.32). Предлагается реализовать этот алгоритм and сравнить его с базовым методом.
+
* '''Basic algorithm, solution''': A basic method for detecting borrowings is currently implemented, based on the analysis of word frequencies and character n-gram frequencies in a sentence. For each text, a dictionary is built in which each word (n-gram) is assigned its frequency of occurrence in the text. A feature description of each sentence segment is formed from these frequencies. Text segments are classified using expert markup of borrowings. The quality of the base algorithm is 0.29 in F1-measure (Plagdet 0.21) on the PAN-2011 collection, while the quality of the best algorithm in the 2011 competition [Oberreuter] is 0.32 in F1-measure (Plagdet 0.32). It is proposed to implement that algorithm and compare it with the base method.
-
* '''consultant''': [[Участник:mikethehuman|Михаил Кузнецов]]
+
* '''consultant''': [[User:mikethehuman|Mikhail Kuznetsov]]
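The frequency-based feature description of the base method can be sketched roughly as follows. The source does not specify the exact features, so this sketch makes several assumptions: a naive regular-expression sentence splitter, character 3-grams, and two features per sentence (the mean text-wide frequency of its words and of its n-grams). All function names are hypothetical.

```python
import re
from collections import Counter


def words(text):
    """Lowercased word tokens."""
    return re.findall(r"\w+", text.lower())


def char_ngrams(text, n=3):
    """All overlapping character n-grams of the lowercased text."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def split_sentences(text):
    """Naive splitter on ., ! and ? followed by whitespace (an assumption)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def sentence_features(text, n=3):
    """Describe each sentence by how common its words and n-grams are
    within the whole text; unusually low values may indicate another style."""
    word_freq = Counter(words(text))
    ngram_freq = Counter(char_ngrams(text, n))
    sentences = split_sentences(text)
    features = []
    for sentence in sentences:
        w = words(sentence)
        g = char_ngrams(sentence, n)
        mean_word = sum(word_freq[t] for t in w) / len(w) if w else 0.0
        mean_gram = sum(ngram_freq[t] for t in g) / len(g) if g else 0.0
        features.append((mean_word, mean_gram))
    return sentences, features


sample = "The cat sat. The cat ran. Quantum flux!"
sentences, features = sentence_features(sample)
for s, f in zip(sentences, features):
    print(s, f)
```

The resulting feature vectors would then be passed to a classifier trained on the expert markup; that step is omitted here.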
-
=== Task 23 ===
+
===23. 2016===
-
* '''Name:''' Использование методов снижения размерности при построении признакового пространства в задаче обнаружения внутреннего плагиата
+
* '''Title:''' Using Dimension Reduction Methods When Building a Feature Space in the Problem of Internal Plagiarism Detection
-
*'''Task''': Для более эффективного решения задачи обнаружения внутреннего плагиата использовать методы снижения размерности, сохраняющие расстояние между объектами. Требуется доработать метод tSNE [2], включив в модель информацию о разметке данных and возможность добавления ранее не рассмотренных объектов в пространство сниженной размерности. Подробнее см. [1]
+
*'''Problem''': To solve the problem of internal plagiarism detection more efficiently, use dimensionality reduction methods that preserve distances between objects. It is required to refine the tSNE method [2] by incorporating information about data labels into the model and by making it possible to add previously unseen objects to the reduced-dimension space. For details see [1]
-
* '''Data:''' Коллекция конкурса PAN-2011.
+
* '''Data:''' PAN-2011 contest collection.
-
* '''References:''':
+
* '''References:'''
-
*# [[Media:Problem_statement_dim_reduce.pdf| Problem_statement_dim_reduce.pdf‎]]
+
*# [[Media:Problem_statement_dim_reduce.pdf| Problem_statement_dim_reduce.pdf‎]]
*# Laurens van der Maaten. Visualizing Data using t-SNE Journal of Machine Learning Research, 9 (2008) 2579-2605.
*# Laurens van der Maaten. Visualizing Data using t-SNE Journal of Machine Learning Research, 9 (2008) 2579-2605.
*# Julian Brooke and Graeme Hirst. Paragraph Clustering for Intrinsic Plagiarism Detection using a Stylistic Vector-Space Model with Extrinsic Features, 2012.
*# Julian Brooke and Graeme Hirst. Paragraph Clustering for Intrinsic Plagiarism Detection using a Stylistic Vector-Space Model with Extrinsic Features, 2012.
-
* '''Базовой алгоритм, решение''': См. [1]
+
* '''Basic algorithm, solution''': See [1]
-
* '''consultant''': Мотренко Анастасия
+
* '''consultant''': Anastasia Motrenko
-
=== Task 26 ===
+
===26. 2016===
-
* '''Name:''' Построение отображений с минимальной деформацией для сравнения изображений с эталоном.
+
* '''Title:''' Construction of mappings with minimal deformation for comparing images with a reference.
-
* '''Task''': Применить вариационный метод построения квазиизометрических отображений для решения классической задачи геометрической морфологии and регистрации изображений - построения двумерной или трехмерной деформации для сравнения с эталоном.
+
* '''Problem:''' Apply the variational method of constructing quasi-isometric mappings to the classical problem of geometric morphology and image registration: constructing a two-dimensional or three-dimensional deformation for comparison with a reference.
-
* '''Data:''' Изображения в формате bmp. На первом этапе можно задавать простые тела посредством ч/б раскраски декартовой решетки.
+
* '''Data:''' Images in bmp format. At the first stage, simple bodies can be defined by a black-and-white coloring of a Cartesian grid.
-
* '''References:''':
+
* '''References:'''
-
*# Michael I. Miller, Alain Trouve, Laurent Younes. ON THE METRICS AND EULER-LAGRANGE EQUATIONS OF COMPUTATIONAL ANATOMY. Annu. Rev. Biomed. Eng. 2002. 4:375–405
+
*# Michael I. Miller, Alain Trouve, Laurent Younes. ON THE METRICS AND EULER-LAGRANGE EQUATIONS OF COMPUTATIONAL ANATOMY. Annu. Rev. Biomed. Eng. 2002. 4:375–405
*# Beg MF, Miller MI, Trouve A, Younes L. Computing large deformation metric mappings via geodesics flows of diffeomorphisms. International Journal of Computer Vision. 2005; V.61(2):139-157.
*# Beg MF, Miller MI, Trouve A, Younes L. Computing large deformation metric mappings via geodesics flows of diffeomorphisms. International Journal of Computer Vision. 2005; V.61(2):139-157.
*# Trouve A. An approach of pattern recognition through infinite dimensional group action. Research report LMENS-95-9. 1995.
*# Trouve A. An approach of pattern recognition through infinite dimensional group action. Research report LMENS-95-9. 1995.
-
*# Garanzha VA. Maximum norm optimization of quasi-isometric mappings. Num. Linear Algebra Appl. 2002; V.9(6-7):493--510.
+
*# Garanzha VA. Maximum norm optimization of quasi-isometric mappings. Num. Linear Algebra Appl. 2002; V.9(6-7):493-510.
-
*# Garanzha V.A., Kudryavtseva L.N., Utyzhnikov S.V. Untangling and optimization of spatial meshes // Journal of Computational and Applied Mathematics. -- 2014. -- October. -- V. 269 -- P. 24--41.
+
*# Garanzha V.A., Kudryavtseva L.N., Utyzhnikov S.V. Untangling and optimization of spatial meshes // Journal of Computational and Applied Mathematics. -- 2014. -- October. -- V. 269 -- P. 24--41.
-
* '''Basic algorithm:''' Использовать вариационный метод построения отображений, который ранее был предложен для построения пространственных отображений с заданным отображением границы [4], [5], в случае, когда задается мера близости функций, описывающих геометрические тела, например, как среднеквадратичная мера близости функций яркости.
+
* '''Base algorithm:''' Use the variational method for constructing mappings, previously proposed for constructing spatial mappings with a given boundary mapping [4], [5], in the case where a measure of proximity between the functions describing the geometric bodies is given, for example, the mean-square proximity of the brightness functions.
-
* '''Solution:''' Для существующего кода, который реализует вариационный метод построения двумерных отображений с минимальным искажением, необходимо дописать модуль, реализующий добавку к функционалу, являющуюся мерой близости геометрических тел. Это включает вычисление самого функционала, его градиента, and поправки к предобусловливателю.
+
* '''Solution:''' The existing code, which implements the variational method for constructing two-dimensional mappings with minimal distortion, must be extended with a module that implements an additional term in the functional measuring the proximity of the geometric bodies. This includes computing the term itself, its gradient, and the correction to the preconditioner.
-
* '''Novelty:''' Сравнить полученный метод с методом геодезического потока диффеоморфизмов, предложенного в работах Алэна Труве (см. ссылки [1]-[3]). Оценить качество приближения and быстродействие полученного алгоритма.
+
* '''Novelty:''' Compare the obtained method with the method of geodesic flow of diffeomorphisms proposed in the works of Alain Trouvé (see references [1]-[3]). Estimate the quality of the approximation and the performance of the resulting algorithm.
-
* '''consultant''': Владимир Анатольевич Гаранжа (ВЦ РАН).
+
* '''consultant''': Vladimir Anatolyevich Garanzha (CC RAS).
-
=== Task 27 ===
+
===27. 2016===
-
* '''Name:''' Кросс-язычный тематический поиск научных публикаций.
+
* '''Title:''' Cross-language thematic search for scientific publications.
-
* '''Task''': Содание прототипа поискового сервиса, который принимает в качестве запроса текст научной статьи на русском языке and выдаёт в качестве результата поиска тематически близкие статьи на английском языке из коллекции arXiv.org.
+
* '''Problem:''' Create a prototype search service that accepts the text of a scientific article in Russian as a query and returns thematically related articles in English from the arXiv.org collection as the search result.
-
* '''Data:''' Коллекция текстов arXiv.org, двуязычная коллекция текстов Википедии.
+
* '''Data:''' The arXiv.org text collection, Wikipedia's bilingual text collection.
-
* '''References:''': выдадим.
+
* '''References:''' will be provided.
-
* '''Basic algorithm:''' Тематическая модель, построенная по объединённой коллекции англоязычного arXiv and двуязычной англо-русской Википедии.
+
* '''Base algorithm:''' Topic model built from the combined collection of the English-language arXiv and the bilingual English-Russian Wikipedia.
-
* '''Solution:''' Построение регуляризованной тематической модели средствами библиотеки [[BigARTM]]. Применение стандартных средств построения инвертированных индексов.
+
* '''Solution:''' Building a regularized topic model using the [[BigARTM]] library. Applying standard tools for building inverted indexes.
-
* '''Novelty:''' Такого сервиса в русскоязычном интернете пока нет.
+
* '''Novelty:''' There is no such service on the Russian Internet yet.
-
* '''consultant''': Марина Суворова.
+
* '''Consultant''': Marina Suvorova.
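The retrieval step can be illustrated with a small sketch of an inverted index over topics. It assumes that topic vectors for the English documents and for the Russian query have already been computed by a shared bilingual topic model; no BigARTM calls are shown, and the names <code>build_index</code> and <code>search</code>, the threshold, and the document identifiers are hypothetical.

```python
from collections import defaultdict


def build_index(doc_topics, threshold=0.1):
    """Inverted index: topic number -> list of (doc_id, topic weight).
    Only topics with weight at or above the threshold are indexed."""
    index = defaultdict(list)
    for doc_id, theta in doc_topics.items():
        for topic, weight in enumerate(theta):
            if weight >= threshold:
                index[topic].append((doc_id, weight))
    return index


def search(index, query_theta, threshold=0.1, top_k=5):
    """Rank documents by the dot product over the query's significant topics,
    visiting only the posting lists of those topics."""
    scores = defaultdict(float)
    for topic, q in enumerate(query_theta):
        if q < threshold:
            continue
        for doc_id, weight in index.get(topic, []):
            scores[doc_id] += q * weight
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


# Hypothetical topic distributions of three English documents.
doc_topics = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.0, 0.2, 0.8],
    "doc_c": [0.5, 0.5, 0.0],
}
index = build_index(doc_topics)
# Hypothetical topic distribution of a Russian query article.
results = search(index, [0.8, 0.2, 0.0])
print(results)
```

Because the query and the documents are compared only through topic weights, the query language does not matter once the topic model is shared across languages. Thresholding keeps the posting lists short at the cost of ignoring weak topics.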
-
=== Task 28 ===
+
===28. 2016===
-
* '''Name:''' Поиск резонансных частот в растворах полимеров.
+
* '''Title:''' Search for resonant frequencies in polymer solutions.
-
* '''Task''': Математически Task сводиться к поиску спектральной плотности случайных графов в окрестности точки перколяции.
+
* '''Problem:''' Mathematically, the problem reduces to finding the spectral density of random graphs in the vicinity of the percolation point.
-
* '''Data:''' Симуляционные данные (графы Эрдеша-Реньи в окрестности точки перколяции).
+
* '''Data:''' Simulation data (Erdős-Rényi graphs in the vicinity of the percolation point).
-
* '''References:''': Nazarov L. I. et al. A statistical model of intra-chromosome contact maps //Soft matter. 2015. – Т. 11. – №. 5. – С. 1019-1025.
+
* '''References:''' Nazarov L. I. et al. A statistical model of intra-chromosome contact maps // Soft Matter. 2015. Vol. 11, No. 5. P. 1019-1025.
-
* '''Basic algorithm:''' Монте-Карло.
+
* '''Base algorithm:''' Monte Carlo.
-
* '''Novelty:''' В настоящее известен алгоритм оценка спектральной плотности линейных цепочек, вопрос с оценкой спектральной плотности ансамблей деревьев открытый.
+
* '''Novelty:''' An algorithm for estimating the spectral density of linear chains is currently known, whereas estimating the spectral density of tree ensembles remains an open question.
-
* '''consultant''': Ольга Вальба, Yuri Maksimov, '''Автор задачи''': Нечаев Сергей.
+
* '''Consultant''': Olga Valba, Yuri Maksimov, '''Problem Author''': Nechaev Sergey.
-
=YEAR=
+
==2016 Group 2==
{|class="wikitable"
{|class="wikitable"
Строка 3853: Строка 4265:
! Journal
! Journal
|-
|-
-
|Гончаров Алексей (пример)
+
|Akhtyamov Pavel
-
|Метрическая классификация временных рядов
+
|Selection of multicorrelating features in the problem of vector autoregression
-
|[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Goncharov2015MetricClassification/code code],
+
-
[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Goncharov2015MetricClassification/doc/Goncharov2015MetricClassification.pdf?format=raw paper],
+
-
[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Goncharov2015MetricClassification/doc/GoncharovAlexey2015PresentationMetricClassification.pdf?format=raw slides]
+
-
|[[Участник:Mpopova|Maria Popova]]
+
-
|Задаянчук Андрей
+
-
|BMF
+
-
|AILSBRCVTDSW
+
-
|10
+
-
|ИИП
+
-
|-
+
-
|Ахтямов Павел
+
-
|Отбор мультикоррелирующих признаков в задаче векторной авторегрессии
+
|[http://svn.code.sf.net/p/mlalgorithms/code/GroupYAD16/Akhtyamov2016FeatureSelectionVAR/code/ code],
|[http://svn.code.sf.net/p/mlalgorithms/code/GroupYAD16/Akhtyamov2016FeatureSelectionVAR/code/ code],
[http://svn.code.sf.net/p/mlalgorithms/code/GroupYAD16/Akhtyamov2016FeatureSelectionVAR/doc/Akhtyamov2016FeatureSelectionVAR.pdf?format=raw paper],
[http://svn.code.sf.net/p/mlalgorithms/code/GroupYAD16/Akhtyamov2016FeatureSelectionVAR/doc/Akhtyamov2016FeatureSelectionVAR.pdf?format=raw paper],
[http://svn.code.sf.net/p/mlalgorithms/code/GroupYAD16/Akhtyamov2016FeatureSelectionVAR/doc/Akhtyamov2016PresentationFeatureSelectionVAR.pdf?format=raw slides]
[http://svn.code.sf.net/p/mlalgorithms/code/GroupYAD16/Akhtyamov2016FeatureSelectionVAR/doc/Akhtyamov2016PresentationFeatureSelectionVAR.pdf?format=raw slides]
-
|[[Участник:Neychev|Radoslav Neichev]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Neychev Radoslav Neichev]
-
|Медведева Анна
+
|Medvedeva Anna
|BF
|BF
|AI+LSB++R+CVTDEH
|AI+LSB++R+CVTDEH
Строка 3877: Строка 4277:
|
|
|-
|-
-
|Батаев Владислав
+
|Bataev Vladislav
-
|Тематическая модель классификации для диагностики заболеваний по электрокардиограмме
+
|Topic classification model for diagnosing diseases by electrocardiogram
|[https://svn.code.sf.net/p/mlalgorithms/code/GroupYAD16/Bataev2016CardiogramARTM/code/ code],
|[https://svn.code.sf.net/p/mlalgorithms/code/GroupYAD16/Bataev2016CardiogramARTM/code/ code],
[https://svn.code.sf.net/p/mlalgorithms/code/GroupYAD16/Bataev2016CardiogramARTM/doc/Bataev2016CardiogramARTM.pdf?format=raw paper]
[https://svn.code.sf.net/p/mlalgorithms/code/GroupYAD16/Bataev2016CardiogramARTM/doc/Bataev2016CardiogramARTM.pdf?format=raw paper]
-
|Светлана Цыганова
+
|Svetlana Tsyganova
|
|
|B
|B
Строка 3888: Строка 4288:
|
|
|-
|-
-
|Иванов Илья
+
|Ivanov Ilya
-
|Классификация физической активности: исследование изменения пространства параметров при дообучении and модификации моделей глубокого обучения
+
|Classification of physical activity: a study of how the parameter space changes during fine-tuning and modification of deep learning models
|[http://svn.code.sf.net/p/mlalgorithms/code/GroupYAD16/Ivanov2016Covariance/code/ code],
|[http://svn.code.sf.net/p/mlalgorithms/code/GroupYAD16/Ivanov2016Covariance/code/ code],
[http://svn.code.sf.net/p/mlalgorithms/code/GroupYAD16/Ivanov2016Covariance/doc/Ivanov2016Covariance.pdf?format=raw paper],
[http://svn.code.sf.net/p/mlalgorithms/code/GroupYAD16/Ivanov2016Covariance/doc/Ivanov2016Covariance.pdf?format=raw paper],
Строка 3900: Строка 4300:
|
|
|-
|-
-
|Медведева Анна
+
|Medvedeva Anna
-
|Модель порождения объектов в задаче прогнозирования временных рядов
+
|Object generation model in the problem of time series forecasting
|[http://svn.code.sf.net/p/mlalgorithms/code/GroupYAD16/Medvedeva2016GenerationModelTS/code/ code]
|[http://svn.code.sf.net/p/mlalgorithms/code/GroupYAD16/Medvedeva2016GenerationModelTS/code/ code]
[http://svn.code.sf.net/p/mlalgorithms/code/GroupYAD16/Medvedeva2016GenerationModelTS/doc/Medvedeva2016ObjectGenerationTS.pdf?format=raw paper]
[http://svn.code.sf.net/p/mlalgorithms/code/GroupYAD16/Medvedeva2016GenerationModelTS/doc/Medvedeva2016ObjectGenerationTS.pdf?format=raw paper]
[http://svn.code.sf.net/p/mlalgorithms/code/GroupYAD16/Medvedeva2016GenerationModelTS/doc/presentation/Medvedeva2016ObjectGeneration_presentation.pdf?format=raw slides]
[http://svn.code.sf.net/p/mlalgorithms/code/GroupYAD16/Medvedeva2016GenerationModelTS/doc/presentation/Medvedeva2016ObjectGeneration_presentation.pdf?format=raw slides]
-
|Гончаров Алексей
+
|Goncharov Alexey
-
|Ахтямов Павел
+
|Akhtyamov Pavel
|BF
|BF
|AILS-BRCVTD0EWS
|AILS-BRCVTD0EWS
Строка 3912: Строка 4312:
|
|
|-
|-
-
|Персиянов Дмитрий
+
|Persianov Dmitry
-
|Темпоральная тематическая модель коллекции пресс-релизов
+
|Temporal topic model of a press release collection
|[https://svn.code.sf.net/p/mlalgorithms/code/GroupYAD16/Persiyanov2016TemporalModelARTM/code/ code]
|[https://svn.code.sf.net/p/mlalgorithms/code/GroupYAD16/Persiyanov2016TemporalModelARTM/code/ code]
[https://svn.code.sf.net/p/mlalgorithms/code/GroupYAD16/Persiyanov2016TemporalModelARTM/doc/Persiyanov2016TemporalModelARTM.pdf?format=raw paper]
[https://svn.code.sf.net/p/mlalgorithms/code/GroupYAD16/Persiyanov2016TemporalModelARTM/doc/Persiyanov2016TemporalModelARTM.pdf?format=raw paper]
[http://svn.code.sf.net/p/mlalgorithms/code/GroupYAD16/Persiyanov2016TemporalModelARTM/doc/PersiyanovPresentationTemporalModelARTM.pdf?format=raw slides]
[http://svn.code.sf.net/p/mlalgorithms/code/GroupYAD16/Persiyanov2016TemporalModelARTM/doc/PersiyanovPresentationTemporalModelARTM.pdf?format=raw slides]
-
|Никита Дойков
+
|Nikita Doikov
|
|
|BF
|BF
Строка 3924: Строка 4324:
|
|
|-
|-
-
|Семененко Денис
+
|Semenenko Denis
-
|Алгоритм прогнозирования структуры локально-оптимальных моделей
+
|Algorithm for Predicting the Structure of Locally Optimal Models
|[https://svn.code.sf.net/p/mlalgorithms/code/GroupYAD16/Semenenko2016StructureLearning/code/ code]
|[https://svn.code.sf.net/p/mlalgorithms/code/GroupYAD16/Semenenko2016StructureLearning/code/ code]
[https://svn.code.sf.net/p/mlalgorithms/code/GroupYAD16/Semenenko2016StructureLearning/doc/Semenenko2016StructureLearning.pdf?format=raw paper]
[https://svn.code.sf.net/p/mlalgorithms/code/GroupYAD16/Semenenko2016StructureLearning/doc/Semenenko2016StructureLearning.pdf?format=raw paper]
-
|Кулунчаков Андрей
+
|Kulunchakov Andrey
|
|
|B
|B
Строка 3935: Строка 4335:
|
|
|-
|-
-
|Софиенко Александр
+
|Sofienko Alexander
-
|Согласование логических and линейных моделей классификации в информационном анализе электрокардиосигналов
+
|Coordination of logical and linear classification models in the information analysis of electrocardiosignals
||[https://svn.code.sf.net/p/mlalgorithms/code/GroupYAD16/Sofienko2016LinearClassificationVAR/code/ code],
||[https://svn.code.sf.net/p/mlalgorithms/code/GroupYAD16/Sofienko2016LinearClassificationVAR/code/ code],
[https://svn.code.sf.net/p/mlalgorithms/code/GroupYAD16/Sofienko2016LinearClassificationVAR/doc/Sofienko2016LinearClassification.pdf?format=raw paper]
[https://svn.code.sf.net/p/mlalgorithms/code/GroupYAD16/Sofienko2016LinearClassificationVAR/doc/Sofienko2016LinearClassification.pdf?format=raw paper]
-
|Влада Целых
+
|Vlada Tselykh
|
|
|B
|B
Строка 3946: Строка 4346:
|
|
|-
|-
-
|Яронская Любовь
+
|Yaronskaya Lyubov
|Sparse Regularized Regression on Protein Complex Data
|Sparse Regularized Regression on Protein Complex Data
|[http://svn.code.sf.net/p/mlalgorithms/code/GroupYAD16/Yaronskaya2016SparseRegression/code/ code]
|[http://svn.code.sf.net/p/mlalgorithms/code/GroupYAD16/Yaronskaya2016SparseRegression/code/ code]
[http://svn.code.sf.net/p/mlalgorithms/code/GroupYAD16/Yaronskaya2016SparseRegression/doc/yaronskayaRegressionOnProtein.pdf?format=raw paper]
[http://svn.code.sf.net/p/mlalgorithms/code/GroupYAD16/Yaronskaya2016SparseRegression/doc/yaronskayaRegressionOnProtein.pdf?format=raw paper]
[http://svn.code.sf.net/p/mlalgorithms/code/GroupYAD16/Yaronskaya2016SparseRegression/slides/YaronskayaPresentation.pdf?format=raw slides]
[http://svn.code.sf.net/p/mlalgorithms/code/GroupYAD16/Yaronskaya2016SparseRegression/slides/YaronskayaPresentation.pdf?format=raw slides]
-
|Александр Катруца
+
|Alexander Katrutsa
|
|
|
|
Line 3958: Line 4358:
|
|
|-
|-
-
|Аксенов Сергей
+
|Aksenov Sergey
-
|Кросс-язычный тематический поиск научных публикаций.
+
|Cross-language topic-based search for scientific publications.
|[http://svn.code.sf.net/p/mlalgorithms/code/GroupYAD16/Aksenov2016CrosslangARTM/code/ code]
|[http://svn.code.sf.net/p/mlalgorithms/code/GroupYAD16/Aksenov2016CrosslangARTM/code/ code]
[http://svn.code.sf.net/p/mlalgorithms/code/GroupYAD16/Aksenov2016CrosslangARTM/doc/Aksenov_CrossLang.pdf?format=raw paper]
[http://svn.code.sf.net/p/mlalgorithms/code/GroupYAD16/Aksenov2016CrosslangARTM/doc/Aksenov_CrossLang.pdf?format=raw paper]
[http://svn.code.sf.net/p/mlalgorithms/code/GroupYAD16/Aksenov2016CrosslangARTM/slides/Aksenov.pdf?format=raw slides]
[http://svn.code.sf.net/p/mlalgorithms/code/GroupYAD16/Aksenov2016CrosslangARTM/slides/Aksenov.pdf?format=raw slides]
-
|Марина Суворова
+
|Marina Suvorova
|
|
|
|
Line 3970: Line 4370:
|
|
|-
|-
-
|Хисматуллин Тимур
+
|Khismatullin Timur
-
|Анализ and классификация интерфейса комплекса ДНК-белок
+
|Analysis and classification of the DNA-protein complex interface
|[http://svn.code.sf.net/p/mlalgorithms/code/GroupYAD16/Khismatullin2016ProteinDNA/code/ code]
|[http://svn.code.sf.net/p/mlalgorithms/code/GroupYAD16/Khismatullin2016ProteinDNA/code/ code]
[http://svn.code.sf.net/p/mlalgorithms/code/GroupYAD16/Khismatullin2016ProteinDNA/paper/Khismatullin2016ProteinDNA.pdf?format=raw paper]
[http://svn.code.sf.net/p/mlalgorithms/code/GroupYAD16/Khismatullin2016ProteinDNA/paper/Khismatullin2016ProteinDNA.pdf?format=raw paper]
[http://svn.code.sf.net/p/mlalgorithms/code/GroupYAD16/Khismatullin2016ProteinDNA/slides/Khismatullin2016ProteinDNA.pdf?format=raw slides]
[http://svn.code.sf.net/p/mlalgorithms/code/GroupYAD16/Khismatullin2016ProteinDNA/slides/Khismatullin2016ProteinDNA.pdf?format=raw slides]
-
|Владимир Гаранжа
+
|Vladimir Garanzha
|
|
|F
|F
Line 3983: Line 4383:
|}
|}
-
=== Task 6 ===
+
===Problem 6===
-
* '''Name:''' Sparse Regularized Regression on Protein Complex Data
+
* '''Title:''' Sparse Regularized Regression on Protein Complex Data
-
* '''Task''': найти лучшую модель регрессии на данных связывания белковых комплексов
+
* '''Problem:''' find the best regression model on protein complex binding data
-
* '''Data:''' признаковое описание белковых комплексов and константы связывания для них
+
* '''Data:''' feature description of protein complexes and binding constants for them
-
* '''References:''': статьи по регрессии and сравнению методов на схожих данных
+
* '''References:''' articles on regression and comparing methods on similar data
-
* '''Basic algorithm:''' регуляризованная линейная регрессия (Lasso, Ridge, ..), SVR, kernel methods, etc.
+
* '''Base algorithm:''' regularized linear regression (Lasso, Ridge, ..), SVR, kernel methods, etc.
-
* '''Solution:''' сравнение различных алгоритмов регрессии на данных, выбор оптимальной модели and оптимизация параметров
+
* '''Solution:''' comparison of various regression algorithms on data, selection of the optimal model and parameter optimization
-
* '''Novelty:''' получение лучшей модели регрессии для данных связывания белковых комплексов
+
* '''Novelty:''' getting the best regression model for protein complex binding data
-
* '''consultant''': Александр Катруца, автор задачи: Sergei Grudinin.
+
* '''consultant''': Alexander Katrutsa, problem author: Sergei Grudinin.
-
* '''Желательные навыки''': готовность быстро разобраться в различных подходах к регрессии, знание или готовность к освоению С++ на среднем уровне (для более полного исследования нужно будет попробовать библиотеки на С++)
+
* '''Desirable Skills''': willingness to quickly understand various approaches to regression, knowledge or willingness to master C++ at an intermediate level (for a more complete study, you will need to try C++ libraries)
-
=== Task 8 ===
+
===Problem 8===
-
* '''Name:''' Классификация физической активности: исследование изменения пространства параметров при дообучении and модификации моделей глубокого обучения
+
* '''Title:''' Classification of physical activity: a study of how the parameter space changes during fine-tuning and modification of deep learning models
-
* '''Task''': Дана модель классификации по выборке временных сегментов, записанных с акселерометра мобильного телефона. Модель представляет собой многослойную нейросеть. Требуется 1) исследовать дисперсию and матрицу ковариаций параметров нейросети при различных расписаниях оптимизации (т.е. при различных подходах к поэтапному обучению). 2) на основе полученной матрицы ковариаций параметров предложить эффективный способ модификации модели глубокого обучении.
+
* '''Problem:''' A classification model is given for a sample of time segments recorded from a mobile phone's accelerometer. The model is a multilayer neural network. It is required 1) to investigate the variance and the covariance matrix of the neural network parameters under different optimization schedules (i.e., under different approaches to staged training); 2) based on the obtained parameter covariance matrix, to propose an effective way to modify the deep learning model.
-
* '''Data:''' Выборка WISDM http://www.cis.fordham.edu/wisdm/dataset.php.
+
* '''Data:''' WISDM dataset http://www.cis.fordham.edu/wisdm/dataset.php.
-
* '''References:''':
+
* '''References:'''
-
**Zadayanchuk A.I., Popova M.S., Strizhov V.V. Выбор оптимальной модели классификации физической активности по измерениям акселерометра http://strijov.com/papers/Zadayanchuk2015OptimalNN4.pdf
+
*# Zadayanchuk A.I., Popova M.S., Strijov V.V. Choosing the optimal physical activity classification model based on accelerometer measurements http://strijov.com/papers/Zadayanchuk2015OptimalNN4.pdf
-
**Попова М. С., Strizhov V.V. Построение сетей глубокого обучения для классификации временных рядов - http://strijov.com/papers/PopovaStrijov2015DeepLearning.pdf
+
*# Popova M.S., Strijov V.V. Building Deep Learning Networks for Time Series Classification - http://strijov.com/papers/PopovaStrijov2015DeepLearning.pdf
-
**Oleg BakhteevЮ., Popova M.S., Strizhov V.V. Системы and средства глубокого обучения в Taskх классификации
+
*# Bakhteev O.Yu., Popova M.S., Strijov V.V. Deep Learning Systems and Tools in Classification Problems
-
**LeCun Y. Optimal Brain Damage - yann.lecun.com/exdb/publis/pdf/lecun-90b.pdf
+
*# LeCun Y. Optimal Brain Damage - yann.lecun.com/exdb/publis/pdf/lecun-90b.pdf
-
**Работы по пред-обучению (pre-training) and дообучению (fine-tuning)
+
*# Works on pre-training and fine-tuning
-
* '''Basic algorithm:''' Базовая модель описана в статье "Построение сетей глубокого обучения для классификации временных рядов". Алгоритм можно реализовать как с помощью библиотеки PyLearn или keras (другие библиотеки and языки программирования также допустимы).
+
* '''Base algorithm:''' The basic model is described in the article "Building Deep Learning Networks for Time Series Classification". The algorithm can be implemented either using the PyLearn library or keras (other libraries and programming languages are also acceptable).
-
* '''Solution:''' Анализ матрицы ковариаций, построение add-del метода на основе полученных данных.
+
* '''Solution:''' Analysis of the covariance matrix and construction of an add-del method based on the obtained data.
-
* '''Novelty:''' Методика исследования ковариационной матрицы большой размерности, а также полученный алгоритм модификации модели важны and будут использоваться в дальнейшем при анализе моделей глубокого обучения.
+
* '''Novelty:''' The technique for studying a high-dimensional covariance matrix, as well as the resulting model modification algorithm, are important and will be used in the future when analyzing deep learning models.
* '''consultant''': Oleg Bakhteev
* '''consultant''': Oleg Bakhteev
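The first step of the problem above (estimating the variance and covariance matrix of network parameters over several training runs) can be sketched roughly as follows. This is only an illustration: the function name `parameter_covariance` and the toy two-parameter "snapshots" are assumptions made for this example and do not come from the referenced papers or code.

```python
def parameter_covariance(snapshots):
    """Unbiased sample covariance of parameter vectors.

    `snapshots` is a list of equally long parameter vectors,
    one vector per training run (hypothetical toy input).
    """
    n = len(snapshots)
    dim = len(snapshots[0])
    # Per-parameter mean over all runs.
    means = [sum(s[j] for s in snapshots) / n for j in range(dim)]
    cov = [[0.0] * dim for _ in range(dim)]
    for s in snapshots:
        for i in range(dim):
            for j in range(dim):
                cov[i][j] += (s[i] - means[i]) * (s[j] - means[j]) / (n - 1)
    return cov


# Three toy "runs" of a model with two parameters.
snapshots = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]]
cov = parameter_covariance(snapshots)
# The diagonal holds the per-parameter variances.
variances = [cov[i][i] for i in range(len(cov))]
```

For a real multilayer network the full matrix grows quadratically with the number of parameters, so in practice one might look only at the diagonal or at blocks per layer; that choice is left open here.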
-
=== Task 25 ===
+
===Problem 25===
-
* '''Name:''' Устойчивость дискретизации электрокардиосигналов относительно частотной фильтрации.
+
* '''Title:''' Stability of electrocardiosignal discretization with respect to frequency filtering.
-
* '''Task''': [[Технология информационного анализа электрокардиосигналов]] по В.М.Успенскому основана на преобразовании электрокардиограммы в символьную строку (кодограмму) and выделении информативных наборов слов — диагностических эталонов каждого заболевания. Проблема в том, что для дискретизации необходимо достаточно точно определять амплитуду R-пиков. На амплитуду может влиять частотная фильтрация сигнала, которая производится электрокардиографом на аппаратном или программном уровне. Task заключается в том, чтобы оценить, насколько сильно различные частотные фильтры (например, фильтр 50.4Гц, подавляющий воздействие электрической сети, высокочастотный фильтр) могут влиять на частоты слов в кодограмме and на качество классификации.
+
* '''Problem:''' [[Technology of information analysis of electrocardiosignals]] according to V.M.Uspensky is based on transforming the electrocardiogram into a character string (codogram) and selecting informative sets of words, which serve as diagnostic standards for each disease. The difficulty is that discretization requires the amplitude of the R-peaks to be determined with sufficient accuracy. The amplitude can be affected by frequency filtering of the signal, which the electrocardiograph performs at the hardware or software level. The problem is to evaluate how strongly different frequency filters (for example, a 50.4 Hz filter that suppresses mains interference, or a high-pass filter) can affect the word frequencies in the codogram and the quality of classification.
-
* '''Data:''' электрокардиограммы в формате KDM.
+
* '''Data:''' electrocardiograms in KDM format.
-
* '''References:''': выдадим :)
+
* '''References:''' will be provided :)
-
* '''Basic algorithm:''' Линейный классификатор.
+
* '''Base algorithm:''' Linear classifier.
-
* '''Solution:''' Прямое and обратное преобразование Фурье, алгоритм детекции R-пиков на электрокардиограмме, алгоритм определения амплитуды R-пиков.
+
* '''Solution:''' Direct and inverse Fourier transform, algorithm for detecting R-peaks on an electrocardiogram, algorithm for determining the amplitude of R-peaks.
-
* '''Novelty:''' Исследование устойчивости кодограмм по отношению к частотной фильтрации с различными параметрами ранее не проводилось в информационном анализе электрокардиосигналов.
+
* '''Novelty:''' The study of the stability of codograms in relation to frequency filtering with different parameters has not previously been carried out in the information analysis of electrocardiosignals.
-
* '''consultant''': Виктор Сафронов (Научный центр им. В.И.Кулакова)
+
* '''consultant''': Victor Safronov (Scientific Center named after V.I.Kulakov)
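The effect described in the problem above, a frequency filter changing the measured R-peak amplitude, can be illustrated with a small sketch built on the forward and inverse Fourier transform. Everything in it is an assumption made for illustration: the synthetic Gaussian "R-peak", the 250 Hz sampling rate, the 50 Hz interference, and the helper names `dft`, `idft` and `notch`. Real KDM electrocardiograms and the filters of an actual electrocardiograph are not modeled.

```python
import cmath
import math


def dft(x):
    """Naive forward discrete Fourier transform (fine for short signals)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse transform; returns the real part of each sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def notch(signal, fs, f0, half_width):
    """Zero all frequency bins within half_width Hz of f0 (and their mirrors)."""
    n = len(signal)
    spectrum = dft(signal)
    for k in range(n):
        freq = min(k, n - k) * fs / n  # frequency of bin k in Hz
        if abs(freq - f0) <= half_width:
            spectrum[k] = 0
    return idft(spectrum)


fs = 250  # assumed sampling rate, Hz
n = 250   # one second of signal
# Synthetic "R-peak": a narrow Gaussian of amplitude 1 centred at 0.5 s.
clean = [math.exp(-0.5 * ((i / fs - 0.5) / 0.01) ** 2) for i in range(n)]
# Add mains-like interference of amplitude 0.2 at 50 Hz.
noisy = [c + 0.2 * math.cos(2 * math.pi * 50 * i / fs)
         for i, c in enumerate(clean)]
filtered = notch(noisy, fs, 50.0, 1.0)

peak_clean = max(clean)
peak_noisy = max(noisy)
peak_filtered = max(filtered)
```

In this toy setting the interference inflates the measured peak, while the notch filter brings it back close to the clean value at the cost of a small distortion, because the narrow peak itself carries some energy near 50 Hz. How large that distortion is for real signals and filters is exactly what the problem asks to study.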
-
=2015=
+
==2015==
{|class="wikitable"
{|class="wikitable"
Line 4031: Line 4431:
! DZ-2 (Problem number)
! DZ-2 (Problem number)
! Letters
! Letters
-
! Sum
 
-
! Grade
 
|-
|-
-
|Бернштейн Юлия
+
|Bernstein Julia
-
|Методы определения характеристик фибринолиза по последовательности изображений крови in vitro
+
|Methods for determining fibrinolysis characteristics from a sequence of in vitro blood images
-
 
+
| Matveev I. A.
-
|Матвеев И. А.
+
|Solomatin
-
|Соломатин
+
|1
|1
|3 (8)
|3 (8)
|AILSBRCVTDE
|AILSBRCVTDE
-
|11
 
-
|10
 
|-
|-
-
|Бочкарев Артем
+
|Bochkarev Artem
-
|Структурное обучение при порождении моделей
+
|Structural learning when generating models
|[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Bochkarev2015StructuredLearning/] (no code), [http://svn.code.sf.net/p/mlalgorithms/code/Group274/Bochkarev2015StructuredLearning/doc/Bochakrev2015StructuredLearning.pdf?format=raw paper], [http://svn.code.sf.net/p/mlalgorithms/code/Group274/Bochkarev2015StructuredLearning/doc/presentation.pdf?format=raw slides]
|[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Bochkarev2015StructuredLearning/] (no code), [http://svn.code.sf.net/p/mlalgorithms/code/Group274/Bochkarev2015StructuredLearning/doc/Bochakrev2015StructuredLearning.pdf?format=raw paper], [http://svn.code.sf.net/p/mlalgorithms/code/Group274/Bochkarev2015StructuredLearning/doc/presentation.pdf?format=raw slides]
-
|[[Участник:Varf_Ann|Варфоломеева Анна]], [[Участник:Oleg_Bakhteev|Бахтеев Олег]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Varf_Ann Varfolomeeva Anna], [http://www.machinelearning.ru/wiki/index.php?title=Участник:Oleg_Bakhteev Oleg Bakhteev]
-
|Исаченко
+
|Isachenko
|2
|2
|2 (7)
|2 (7)
|A+I++LS+BRCVT+DS
|A+I++LS+BRCVT+DS
-
|9.25
+
|-
-
|10
+
|Goncharov Alexey
-
|Гончаров Алексей
+
|Metric classification of time series
-
|Метрическая классификация временных рядов
+
|[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Goncharov2015MetricClassification/code code],
|[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Goncharov2015MetricClassification/code code],
[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Goncharov2015MetricClassification/doc/Goncharov2015MetricClassification.pdf?format=raw paper],
[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Goncharov2015MetricClassification/doc/Goncharov2015MetricClassification.pdf?format=raw paper],
[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Goncharov2015MetricClassification/doc/GoncharovAlexey2015PresentationMetricClassification.pdf?format=raw slides]
[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Goncharov2015MetricClassification/doc/GoncharovAlexey2015PresentationMetricClassification.pdf?format=raw slides]
-
|[[Участник:Mpopova|Maria Popova]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Mpopova Maria Popova]
-
|Задаянчук
+
|Zadayanchuk
|1.5
|1.5
|1 (4)
|1 (4)
|AILSBRCVTDSW
|AILSBRCVTDSW
-
|12
 
-
|10
 
|-
|-
-
|Двинских Дарина
+
|Dvinskikh Darina
-
|Повышение качества прогнозирования с использованием групп товаров
+
|Improving the quality of forecasting using product groups
|[https://svn.code.sf.net/p/mlalgorithms/code/Group274/Dvinskikh2015DemandForecasting/code code],
|[https://svn.code.sf.net/p/mlalgorithms/code/Group274/Dvinskikh2015DemandForecasting/code code],
[https://svn.code.sf.net/p/mlalgorithms/code/Group274/Dvinskikh2015DemandForecasting/doc/DvinskikhDemandForecasting.pdf paper],
[https://svn.code.sf.net/p/mlalgorithms/code/Group274/Dvinskikh2015DemandForecasting/doc/DvinskikhDemandForecasting.pdf paper],
[https://svn.code.sf.net/p/mlalgorithms/code/Group274/Dvinskikh2015DemandForecasting/doc/Dvinskikh.Presentation.pdf slides]
[https://svn.code.sf.net/p/mlalgorithms/code/Group274/Dvinskikh2015DemandForecasting/doc/Dvinskikh.Presentation.pdf slides]
-
|Каневский Д. Ю.
+
|Kanevsky D. Yu.
-
|Смирнов
+
|Smirnov
|0.5
|0.5
|3 (7)
|3 (7)
|AILSBRCVTDEHS
|AILSBRCVTDEHS
-
|14
 
-
|10
 
|-
|-
-
|Ефимов Юрий
+
|Efimov Yuri
-
|Поиск внешней and внутренней границ радужки на изображении глаза методом парных градиентов
+
|Search for the outer and inner boundaries of the iris in the eye image using the paired gradient method
|[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Efimov2015IrisBorderRecognition/code code],
|[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Efimov2015IrisBorderRecognition/code code],
[https://svn.code.sf.net/p/mlalgorithms/code/Group274/Efimov2015IrisBorderRecognition/doc/Efimov2015IrisBorderRecognition.pdf?format=raw paper],
[https://svn.code.sf.net/p/mlalgorithms/code/Group274/Efimov2015IrisBorderRecognition/doc/Efimov2015IrisBorderRecognition.pdf?format=raw paper],
[https://svn.code.sf.net/p/mlalgorithms/code/Group274/Efimov2015IrisBorderRecognition/doc/15_presentation.pdf?format=raw slides]
[https://svn.code.sf.net/p/mlalgorithms/code/Group274/Efimov2015IrisBorderRecognition/doc/15_presentation.pdf?format=raw slides]
-
|Матвеев И. А.
+
|Matveev I. A.
-
|Нейчев
+
|Neichev
|
|
|
|
|AILSBRCVTDEW
|AILSBRCVTDEW
-
|12
 
-
|10
 
|-
|-
-
|Жариков Илья
+
|Zharikov Ilya
-
|Проверка соответствия электрокардиографа требованиям диагностической системы «Скринфакс» and оценка качества электрокардиограмм.
+
|Checking the compliance of the electrocardiograph with the requirements of the diagnostic system "Screenfax" and assessing the quality of electrocardiograms.
|[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Zharikov2015ECGVerification/code code], [http://svn.code.sf.net/p/mlalgorithms/code/Group274/Zharikov2015ECGVerification/doc/Zharikov2015ECGVerification.pdf?format=raw paper], [http://svn.code.sf.net/p/mlalgorithms/code/Group274/Zharikov2015ECGVerification/doc/Zharikov2015Presentation.pdf?format=raw slides]
|[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Zharikov2015ECGVerification/code code], [http://svn.code.sf.net/p/mlalgorithms/code/Group274/Zharikov2015ECGVerification/doc/Zharikov2015ECGVerification.pdf?format=raw paper], [http://svn.code.sf.net/p/mlalgorithms/code/Group274/Zharikov2015ECGVerification/doc/Zharikov2015Presentation.pdf?format=raw slides]
-
|Ишкина Шаура
+
|Shaura Ishkina
-
|Бочкарев
+
|Bochkarev
|3.5
|3.5
|3 (5)
|3 (5)
|AIL+SBRCVTDEHSW
|AIL+SBRCVTDEHSW
-
|14.25
 
-
|10
 
|-
|-
-
|Задаянчук Андрей
+
|Zadayanchuk Andrey
-
|Выбор оптимальной модели классификации физической активности
+
|Choosing the optimal physical activity classification model
|[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Zadayanchuk2015OptimalNN/code code],
|[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Zadayanchuk2015OptimalNN/code code],
[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Zadayanchuk2015OptimalNN/doc/Zadayanchuk2015OptimalNN.pdf paper],
[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Zadayanchuk2015OptimalNN/doc/Zadayanchuk2015OptimalNN.pdf paper],
[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Zadayanchuk2015OptimalNN/doc/Zadayanchuk2015OptimalNNpresentation.pdf slides]
[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Zadayanchuk2015OptimalNN/doc/Zadayanchuk2015OptimalNNpresentation.pdf slides]
-
|[[Участник:Mpopova|Maria Popova]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Mpopova Maria Popova]
-
|Гончаров
+
|Goncharov
|2
|2
|0 (17)
|0 (17)
|AI-LSB+RCVTD
|AI-LSB+RCVTD
-
|10
 
-
|10
 
|-
|-
-
|Златов Александр
+
|Zlatov Alexander
-
|Построение иерархической модели крупной конференции
+
|Building a hierarchical model of a large conference
||[https://svn.code.sf.net/p/mlalgorithms/code/Group274/Zlatov2015ConferenceModel/code code],
||[https://svn.code.sf.net/p/mlalgorithms/code/Group274/Zlatov2015ConferenceModel/code code],
[https://svn.code.sf.net/p/mlalgorithms/code/Group274/Zlatov2015ConferenceModel/doc/ConferenceModel.pdf?format=raw paper],
[https://svn.code.sf.net/p/mlalgorithms/code/Group274/Zlatov2015ConferenceModel/doc/ConferenceModel.pdf?format=raw paper],
[https://svn.code.sf.net/p/mlalgorithms/code/Group274/Zlatov2015ConferenceModel/doc/Zlatov2015ConferenceModelPresentation.pdf?format=raw slides]
[https://svn.code.sf.net/p/mlalgorithms/code/Group274/Zlatov2015ConferenceModel/doc/Zlatov2015ConferenceModelPresentation.pdf?format=raw slides]
-
|Арсентий Кузьмин
+
|Arsenty Kuzmin
-
|Двинских
+
|Dvinskikh
|1.5
|1.5
|3 (14)
|3 (14)
|AI+L+SBRC++V+TDESW
|AI+L+SBRC++V+TDESW
-
|14.25
 
-
|10
 
|-
|-
|Isachenko Roman
|Isachenko Roman
-
|Метрическое обучение and снижение размерности пространства в Taskх кластеризации временных рядов
+
|Metric Learning and Dimensionality Reduction in Time Series Clustering Problems
|[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Isachenko2015MetricLearning/code code], [http://svn.code.sf.net/p/mlalgorithms/code/Group274/Isachenko2015MetricLearning/doc/Isachenko2015MetricLearning.pdf?format=raw paper], [http://svn.code.sf.net/p/mlalgorithms/code/Group274/Isachenko2015MetricLearning/doc/Isachenko2015MLPresentation.pdf?format=raw slides]
|[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Isachenko2015MetricLearning/code code], [http://svn.code.sf.net/p/mlalgorithms/code/Group274/Isachenko2015MetricLearning/doc/Isachenko2015MetricLearning.pdf?format=raw paper], [http://svn.code.sf.net/p/mlalgorithms/code/Group274/Isachenko2015MetricLearning/doc/Isachenko2015MLPresentation.pdf?format=raw slides]
-
|[[Участник:Katrutsa|Катруца Александр]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Katrutsa Alexander Katrutsa]
-
|Жариков
+
|Zharikov
|3.5
|3.5
|3 (14)
|3 (14)
|A-I+L+S-BR+CVTDEHSW
|A-I+L+S-BR+CVTDEHSW
-
|14.25
 
-
|10
 
|-
|-
-
|Нейчев Радослав
+
|Radoslav Neichev
-
|Отбор признаков в прогнозировании временных рядов c использованием экзогенных факторов
+
|Feature Selection in Time Series Forecasting Using Exogenous Factors
|[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Neychev2015FeatureSelection/code code], [http://svn.code.sf.net/p/mlalgorithms/code/Group274/Neychev2015FeatureSelection/doc/Neychev2015FeatureSelection.pdf?format=raw paper], [http://svn.code.sf.net/p/mlalgorithms/code/Group274/Neychev2015FeatureSelection/doc/Neychev2015FSPresentation.pdf?format=raw slides]
|[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Neychev2015FeatureSelection/code code], [http://svn.code.sf.net/p/mlalgorithms/code/Group274/Neychev2015FeatureSelection/doc/Neychev2015FeatureSelection.pdf?format=raw paper], [http://svn.code.sf.net/p/mlalgorithms/code/Group274/Neychev2015FeatureSelection/doc/Neychev2015FSPresentation.pdf?format=raw slides]
-
|[[Участник:Katrutsa|Катруца Александр]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Katrutsa Alexander Katrutsa]
-
|Ефимов
+
|Efimov
|1
|1
|3 (9)
|3 (9)
|AI-L-SBRCVTDEHSW
|AI-L-SBRCVTDEHSW
-
|13.5
 
-
|10
 
|-
|-
-
|Подкопаев Александр
+
|Podkopaev Alexander
-
|Прогнозирование четвертичных структур белков
+
|Prediction of Quaternary Structures of Proteins
|[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Podkopaev2015ProteinStructures/code code],
|[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Podkopaev2015ProteinStructures/code code],
[https://svn.code.sf.net/p/mlalgorithms/code/Group274/Podkopaev2015ProteinStructures/doc/Podkopaev2015ProteinStructures.pdf?format=raw paper],
[https://svn.code.sf.net/p/mlalgorithms/code/Group274/Podkopaev2015ProteinStructures/doc/Podkopaev2015ProteinStructures.pdf?format=raw paper],
[https://svn.code.sf.net/p/mlalgorithms/code/Group274/Podkopaev2015ProteinStructures/doc/Podkopaev2015ProteinStructuresPresentation.pdf?format=raw slides]
[https://svn.code.sf.net/p/mlalgorithms/code/Group274/Podkopaev2015ProteinStructures/doc/Podkopaev2015ProteinStructuresPresentation.pdf?format=raw slides]
-
|Ю. В. Максимов
+
|Maksimov Yu. V.
-
|Решетова
+
|Reshetova
|3.5
|3.5
|3 (11)
|3 (11)
|AILS+B+RCVTDEHS
|AILS+B+RCVTDEHS
-
|13.5
 
-
|10
 
|-
|-
-
|Решетова Дарья
+
|Reshetova Daria
-
|Методы многоклассовой классификации с улучшенными оценками сходимости в Taskх частичного обучения
+
|Multiclass Classification Methods with Improved Convergence Estimates in Semi-Supervised Learning Problems
|[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Reshetova2015MetricLearning/code code],
|[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Reshetova2015MetricLearning/code code],
[https://svn.code.sf.net/p/mlalgorithms/code/Group274/Reshetova2015MetricLearning/doc/Reshetova2015MulticlussClussification.pdf?format=raw paper],
[https://svn.code.sf.net/p/mlalgorithms/code/Group274/Reshetova2015MetricLearning/doc/Reshetova2015MulticlussClussification.pdf?format=raw paper],
[https://svn.code.sf.net/p/mlalgorithms/code/Group274/Reshetova2015MetricLearning/doc/presentation.pdf?format=raw slides]
[https://svn.code.sf.net/p/mlalgorithms/code/Group274/Reshetova2015MetricLearning/doc/presentation.pdf?format=raw slides]
-
|Максимов Юрий
+
|Maksimov Yu. V.
-
|Камзолов
+
|Kamzolov
|2.5
|2.5
|3 (10)
|3 (10)
|AIL++SB+RCVT++DEHS-
|AIL++SB+RCVT++DEHS-
-
|14
 
-
|10
 
|-
|-
-
|Смирнов Евгений
+
|Smirnov Evgeniy
-
|Тематическая модель интересов постоянных пользователей мобильного приложения
+
|Topic model of the interests of regular users of a mobile application
|[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Smirnov2015TopicModeling/Code code], [http://svn.code.sf.net/p/mlalgorithms/code/Group274/Smirnov2015TopicModeling/doc/Smirnov2015TopicModeling.pdf?format=raw paper], [http://svn.code.sf.net/p/mlalgorithms/code/Group274/Smirnov2015TopicModeling/doc/Smirnov2015Presentation.pdf?format=raw slides]
|[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Smirnov2015TopicModeling/Code code], [http://svn.code.sf.net/p/mlalgorithms/code/Group274/Smirnov2015TopicModeling/doc/Smirnov2015TopicModeling.pdf?format=raw paper], [http://svn.code.sf.net/p/mlalgorithms/code/Group274/Smirnov2015TopicModeling/doc/Smirnov2015Presentation.pdf?format=raw slides]
-
|Виктор Сафронов
+
|Victor Safronov
-
|Златов
+
|Zlatov
|1
|1
|1 (4)
|1 (4)
|AILSBRCVTWDE
|AILSBRCVTWDE
-
|11.25
 
-
|10
 
|-
|-
-
|Соломатин Иван
+
|Solomatin Ivan
-
|Определение области затенения радужки классификатором локальных текстурных признаков
+
|Determination of the iris shading area by the classifier of local textural features
|[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Solomatin2015EESLocalization/code code], [http://svn.code.sf.net/p/mlalgorithms/code/Group274/Solomatin2015EESLocalization/doc/article.pdf?format=raw paper], [http://svn.code.sf.net/p/mlalgorithms/code/Group274/Solomatin2015EESLocalization/doc/Solomatin.EESLocalisation.Presentation.pdf?format=raw slides]
|[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Solomatin2015EESLocalization/code code], [http://svn.code.sf.net/p/mlalgorithms/code/Group274/Solomatin2015EESLocalization/doc/article.pdf?format=raw paper], [http://svn.code.sf.net/p/mlalgorithms/code/Group274/Solomatin2015EESLocalization/doc/Solomatin.EESLocalisation.Presentation.pdf?format=raw slides]
-
|Матвеев И. А.
+
|Matveev I. A.
-
|Бернштейн
+
|Bernstein Julia
|
|
|3 (9)
|3 (9)
|AILSBRCVTDE
|AILSBRCVTDE
-
|11
 
-
|10
 
|-
|-
-
|Черных Владимир
+
|Chernykh Vladimir
-
|Тестирование непараметрических алгоритмов прогнозирования временных рядов в условиях нестационарности
+
|Testing nonparametric algorithms for time series forecasting under nonstationary conditions
|[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Chernykh2015TimeSeriesPrediction/code code],
|[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Chernykh2015TimeSeriesPrediction/code code],
[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Chernykh2015TimeSeriesPrediction/doc/SteninaChernykh2015ArimaHistForecast.pdf?format=raw paper],
[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Chernykh2015TimeSeriesPrediction/doc/SteninaChernykh2015ArimaHistForecast.pdf?format=raw paper],
[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Chernykh2015TimeSeriesPrediction/doc/presentation/Chernykh2015Presentation.pdf?format=raw slides]
[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Chernykh2015TimeSeriesPrediction/doc/presentation/Chernykh2015Presentation.pdf?format=raw slides]
-
|[[Участник:Medvmasha|Стенина Мария]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Medvmasha Stenina Maria]
-
|Шишковец
+
|Shishkovets Svetlana
|3.5
|3.5
|3 (4)
|3 (4)
|A+I+LSBRCVT+DE++H++
|A+I+LSBRCVT+DE++H++
-
|13.75
 
-
|10
 
|-
|-
-
|Шишковец Светлана
+
|Shishkovets Svetlana
-
|Регуляризация линейного наивного байесовского классификатора.
+
|Regularization of a linear naive Bayes classifier.
|[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Shishkovets2015NaivBayes/code code],
|[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Shishkovets2015NaivBayes/code code],
[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Shishkovets2015NaivBayes/doc/Shishkovets2015NaivBayes.pdf?format=raw paper], [http://svn.code.sf.net/p/mlalgorithms/code/Group274/Shishkovets2015NaivBayes/doc/Shishkovets_Presentation.pdf?format=raw slides]
[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Shishkovets2015NaivBayes/doc/Shishkovets2015NaivBayes.pdf?format=raw paper], [http://svn.code.sf.net/p/mlalgorithms/code/Group274/Shishkovets2015NaivBayes/doc/Shishkovets_Presentation.pdf?format=raw slides]
-
|[[Участник:Uskov Mikhail|Михаил Усков]], [[Участник:Vokov|Константин Воронцов]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Uskov_Mikhail Uskov Mikhail], [http://www.machinelearning.ru/wiki/index.php?title=Участник:Vokov Vorontsov K. V.]
-
|Черных
+
|Chernykh Vladimir
|3.5
|3.5
|2 (9)
|2 (9)
|A+I+L+SBR+CV+TD+E+H+S
|A+I+L+SBR+CV+TD+E+H+S
-
|15
 
-
|10
 
|-
|-
-
|Камзолов Дмитрий
+
|Kamzolov Dmitri
-
|Новые алгоритмы для задачи ранжирования веб-страниц
+
|New algorithms for the problem of ranking web pages
|—
|—
-
|Александр Гасников, Yuri Maksimov
+
|Alexander Gasnikov, Yuri Maksimov
-
|Подкопаев
+
|Podkopaev
|
|
|
|
|AILSB+RCVT+DEHS--
|AILSB+RCVT+DEHS--
-
|13
 
-
|8
 
|-
|-
-
|Сухарева Анжелика
+
|Sukhareva Angelica
-
|Классификация научных текстов по отраслям знаний
+
|Classification of scientific texts by branches of knowledge
|[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Sukhareva2015TextClassification/code code],
|[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Sukhareva2015TextClassification/code code],
[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Sukhareva2015TextClassification/doc/Sukhareva2015TextClassification.pdf?format=raw paper],
[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Sukhareva2015TextClassification/doc/Sukhareva2015TextClassification.pdf?format=raw paper],
[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Sukhareva2015TextClassification/doc/Sukhareva_Presentation.pdf?format=raw slides]
[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Sukhareva2015TextClassification/doc/Sukhareva_Presentation.pdf?format=raw slides]
-
| [[Участник:Sidious|Сергей Царьков]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Sidious Sergei Tsarkov]
|
|
|0.5
|0.5
|
|
|AILSBRCVTDEH
|AILSBRCVTDEH
-
|
 
-
|9
 
|-
|-
|}
|}
-
=== Task 1 ===
+
===1. 2015===
-
* '''Name:''' Повышение качества прогнозирования спроса с использованием групп товаров
+
* '''Title:''' Improving the quality of demand forecasting using product groups
-
* '''Task:'''
+
* '''Problem description:'''
-
Дано:
+
Given:
-
*# Временные ряды продаж нескольких группам товаров в одном гипермаркете. Также для каждого товара известны периоды дефицита, периоды воздействия на спрос календарных праздников and периоды проведения. маркетинговых акций. Также известен товарный классификатор: дерево групп товаров, где сами товары являются листьями.
+
*# Time series of sales for several product groups in one hypermarket. For each product, the periods of shortage, the periods when calendar holidays affect demand, and the periods of marketing promotions are also known. A product classifier is known as well: a tree of product groups whose leaves are the products themselves.
-
*# Алгоритм прогнозирования, который используется для построения прогнозов спроса по этим товарам: самоадаптивное экспоненциальное сглаживание (модель Тригга-Лича, см. [1])
+
*# Forecasting algorithm that is used to generate demand forecasts for these products: self-adaptive exponential smoothing (Trigg-Leach model, see [1])
-
*# Функция потерь, по которой измеряется качество прогнозов: MAPE.
+
*# Loss function by which the quality of forecasts is measured: MAPE.
-
*# Требования к построению прогнозов: прогнозы требуется строить понедельно на 4 недели вперёд (в начале текущей недели требуется построить прогноз суммарного спроса на следующую неделю, неделю через одну, через две, через 3).
+
*# Requirements for building forecasts: forecasts must be built weekly for 4 weeks ahead (at the beginning of the current week, a forecast of total demand is required for the next week and for each of the three weeks after it).
-
Гипотеза: спрос на отдельные товары слишком неустойчив, чтобы выявить характерную для них сезонность. Предлагается использовать данные о группах товаров, чтобы точнее определить параметры сезонности.
+
Hypothesis: Demand for individual goods is too volatile to reveal their characteristic seasonality. It is proposed to use data on product groups in order to more accurately determine the parameters of seasonality.
-
Замечание: возможны and другие варианты повышения качества прогнозирования за счёт работы с группами товаров.
+
Note: there are other options for improving the quality of forecasting by working with groups of goods.
-
Task заключается в повышении качества прогнозирования в рамках поставленной задачи путём учёта эффекта взаимозаменяемости товаров, по сравнению с базовым алгоритмом.
+
The problem is to improve forecasting quality, within the setting described above and relative to the base algorithm, by taking into account the interchangeability of goods.
-
Результат можно считать достигнутым, если показано статистически значимое повышение качества при построении серии прогнозов (не менее 20) по каждому временному ряду скользящим контролем.
+
The result can be considered achieved if a statistically significant improvement in quality is shown on a series of forecasts (at least 20) built for each time series with rolling cross-validation.
-
* '''Data:'''
+
* '''Data:'''
-
*# Данные о продажах нескольких товарных групп в гипермаркете крупной торговой сети: https://drive.google.com/file/d/0B5YjPespcL83X3pHaE1aRzBUaDg/view?usp=sharing
+
*# Data on sales of several product groups in a hypermarket of a large retail chain: https://drive.google.com/file/d/0B5YjPespcL83X3pHaE1aRzBUaDg/view?usp=sharing
-
* '''References:'''
+
* '''References:'''
-
*# Лукашин Ю. П. Адаптивные методы краткосрочного прогнозирования временных рядов. — М.: Финансы and статистика, 2003.
+
*# Lukashin Yu. P. Adaptive methods of short-term forecasting of time series. - M .: Finance and statistics, 2003.
-
*# http://www.machinelearning.ru/wiki/index.php?title=%D0%9C%D0%BE%D0%B4%D0%B5%D0%BB%D1%8C_%D0%A2%D1%80%D0%B8%D0%B3%D0%B3%D0%B0-%D0%9B%D0%B8%D1%87%D0%B0
+
*# http://www.machinelearning.ru/wiki/index.php?title=%D0%9C%D0%BE%D0%B4%D0%B5%D0%BB%D1%8C_%D0%A2%D1%80%D0%B8%D0%B3%D0%B3%D0%B0-%D0%9B%D0%B8%D1%87%D0%B0
*# Nitin Patel, Mahesh Kumar, Rama Ramakrishnan. Clustering models to improve forecasts in retail merchandising. http://www.cytel.com/Papers/INFORMS_Prac_%2004.pdf
*# Kumar M., Error-based Clustering and Its Application to Sales Forecasting in Retail Merchandising. PhD Thesis. http://books.google.ru/books/about/Error_based_Clustering_and_Its_Applicati.html?id=6252NwAACAAJ&redir_esc=y
-
* '''Basic algorithm:''' Предлагется использовать модель сезонности [3] в сочетании с моделью Тригга-Лича в качестве алгоритма прогнозирования ряда без сезонности ([1] and [2]). При этом возможны 3 варианта алгоритма, в зависимости от способа оценки сезонности:
+
* '''Base algorithm:''' It is proposed to use the seasonality model [3] in combination with the Trigg-Leach model as a non-seasonal series prediction algorithm ([1] and [2]). In this case, 3 variants of the algorithm are possible, depending on the method of assessing seasonality:
-
*# Сезонность оценивается по самому ряду продаж. Для товаров с "короткой" историей оценка сезонности не выполняется.
+
*# Seasonality is estimated from the product's own sales series. For products with a "short" history, seasonality is not estimated.
-
*# Сезонность оценивается по группе товаров, исходя из классификатора товарных групп (нижний уровень классификатора)
+
*# Seasonality is estimated for a group of goods, based on the classifier of commodity groups (lower level of the classifier)
-
*# Сезонность оценивается по кластерам, исходя из методики [3], [4].
+
*# Seasonality is estimated by clusters, based on the methodology [3], [4].
-
* '''Solution:''' Требуется реализовать объединение модели сезонности [3] and модели Тригга-Лича в качестве алгоритма прогнозирования ряда без сезонности ([1] and [2]), с 3-мя вариантами анализа сезонности, описанными выше. При построение сезонных профилей необходимо исключать периоды маркетинговых акций (иначе может быть существенное искажение сезонности). Дальше понадобится серия экспериментов с анализом качества на реальных данных. При анализе качества можно исключать периоды проведения праздников and маркетинговых акций. По итогам экспериментов, возможно, потребуется адаптация алгоритма кластеризации.
+
* '''Solution:''' It is required to implement the combination of the seasonality model [3] and the Trigg-Leach model as a non-seasonal series prediction algorithm ([1] and [2]), with the 3 variants of seasonality analysis described above. When constructing seasonal profiles, it is necessary to exclude periods of marketing campaigns (otherwise, there may be a significant distortion of seasonality). Next, you need a series of experiments with quality analysis on real data. When analyzing quality, you can exclude periods of holidays and marketing campaigns. Based on the results of the experiments, it may be necessary to adapt the clustering algorithm.
-
* '''Novelty:''' Построение самоадаптивного алгоритма прогнозирования с учётом сезонности, выявляемой путём кластерного анализа.
+
* '''Novelty:''' Building a self-adaptive forecasting algorithm taking into account seasonality, identified by cluster analysis.
-
* '''consultant:''' Каневский Д.Ю.
+
* '''consultant:''' Kanevsky D.Yu.
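The base forecaster and the loss named in this problem can be sketched as follows. This is a minimal illustration, not the implementation used in the project: it assumes the textbook form of Trigg-Leach smoothing, in which the smoothing coefficient is set to the absolute tracking signal (smoothed error divided by smoothed absolute error). The function names, the parameter `gamma`, and the initial values are assumptions made for the example; seasonality and promotion periods are not handled.

```python
def trigg_leach_forecast(series, gamma=0.2, init_alpha=0.2):
    """One-step-ahead forecasts with Trigg-Leach adaptive exponential smoothing.

    gamma and init_alpha are illustrative defaults, not values from the source.
    """
    level = float(series[0])
    smoothed_err = 0.0       # exponentially smoothed forecast error
    smoothed_abs_err = 0.0   # exponentially smoothed absolute error
    alpha = init_alpha
    forecasts = []
    for y in series[1:]:
        forecasts.append(level)              # forecast issued before y is seen
        err = y - level
        smoothed_err = gamma * err + (1 - gamma) * smoothed_err
        smoothed_abs_err = gamma * abs(err) + (1 - gamma) * smoothed_abs_err
        if smoothed_abs_err > 0:
            # Tracking signal: close to 1 after a level shift, close to 0 for noise.
            alpha = abs(smoothed_err / smoothed_abs_err)
        level = level + alpha * err
    return forecasts


def mape(actual, predicted):
    """Mean absolute percentage error; zero actual values are skipped."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)
```

A seasonal variant would divide the series by a seasonal profile (estimated per product, per classifier group, or per cluster) before smoothing and multiply the forecast back afterwards.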
-
=== Task 2 ===
+
===2. 2015===
-
* '''Name:''' Исследование связи онкологических заболеваний and экологической ситуации по пространственно-временной выборке
+
* '''Title:''' Study of the relationship between oncological diseases and the ecological situation by spatio-temporal sampling
-
* '''Task:''' Дана матрица с оценками экологической обстановки and данными по средней заболеваемости онкологией для каждого района Ростовской области за несколько лет. Оценки экологической обстановки содержат значительное количество шума. Оценки экологической обстановки выполнены в ранговых шкалах. Требуется построить регрессионную модель для оценки количества онкозаболеваний, которая бы учитывала экологическую обстановку в районе, соседство с другими районами and тенденцию изменения параметров на протяжении временного ряда.
+
* '''Problem description:''' A matrix is given that contains estimates of the environmental situation and data on the average cancer incidence for each district of the Rostov region over several years. The estimates of the environmental situation contain a significant amount of noise and are expressed on rank scales. It is required to build a regression model for estimating the number of cancer cases that takes into account the environmental situation in a district, its adjacency to other districts, and the trend of parameter changes over the time series.
-
* '''Data:''' таблица с данными об экологической ситуации and количестве онкологических заболеваний в Ростовской области.
+
* '''Data:''' table with data on the environmental situation and the number of oncological diseases in the Rostov region.
-
* '''References:'''
+
* '''References:'''
-
** http://www.scielosp.org/pdf/aiss/v47n2/v47n2a10.pdf - Ecological studies of cancer incidence in an area interested by dumping waste sites in Campania (Italy)
+
*# http://www.scielosp.org/pdf/aiss/v47n2/v47n2a10.pdf - Ecological studies of cancer incidence in an area interested by dumping waste sites in Campania (Italy)
-
** http://lasi.lynchburg.edu/shahady_t/public/Breast%20Cancer.pdf - Incidence of human cancer in correlation with ecological integrity in a metropolitan population
+
*# http://lasi.lynchburg.edu/shahady_t/public/Breast%20Cancer.pdf - Incidence of human cancer in correlation with ecological integrity in a metropolitan population
-
** http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/SUBBARAO1/HeivReview.pdf - Heteroscedastic Errors-in-Variables Regression
+
*# http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/SUBBARAO1/HeivReview.pdf - Heteroscedastic Errors-in-Variables Regression
-
** http://en.wikipedia.org/wiki/Errors-in-variables_models - википедия: модели с ошибками в независимых переменных
+
*# http://en.wikipedia.org/wiki/Errors-in-variables_models - wikipedia: models with errors in independent variables
-
** http://www.cardiff.ac.uk/maths/resources/Gillard_Tech_Report.pdf - An Historical Overview of Linear Regression with Errors in both Variables
+
*# http://www.cardiff.ac.uk/maths/resources/Gillard_Tech_Report.pdf - An Historical Overview of Linear Regression with Errors in both Variables
-
** http://arxiv.org/pdf/1212.5049v1.pdf - A Partial Least Squares Algorithm Handling Ordinal Variables Also In Presence Of A Small Number Of Categories
+
*# http://arxiv.org/pdf/1212.5049v1.pdf - A Partial Least Squares Algorithm Handling Ordinal Variables Also In Presence Of A Small Number Of Categories
-
** [https://ru.wikipedia.org/wiki/%D0%A0%D0%B0%D1%81%D1%81%D1%82%D0%BE%D1%8F%D0%BD%D0%B8%D0%B5_%D0%9C%D0%B0%D1%85%D0%B0%D0%BB%D0%B0%D0%BD%D0%BE%D0%B1%D0%B8%D1%81%D0%B0] - википедия: Расстояние Махаланобиса
+
*# [https://ru.wikipedia.org/wiki/%D0%A0%D0%B0%D1%81%D1%81%D1%82%D0%BE%D1%8F%D0%BD%D0%B8%D0%B5_%D0%9C%D0%B0%D1%85%D0%B0%D0%BB%D0%B0%D0%BD%D0%BE%D0%B1%D0%B8%D1%81%D0%B0] - wikipedia: Mahalanobis distance
-
** http://see.stanford.edu/materials/aimlcs229/cs229-hmm.pdf - Hidden Markov Models Fundamentals
+
*# http://see.stanford.edu/materials/aimlcs229/cs229-hmm.pdf - Hidden Markov Models Fundamentals
-
* '''Basic algorithm:''' Сравнений с базовым алгоритмом проводить не предполагается
+
* '''Base algorithm:''' No comparison with a base algorithm is planned.
-
* '''Solution:''' Один из алгоритмов регрессии из обзора (3-й пункт литературы). Трансформацию порядковых признаков в линейные можно найти в пункте 4 литературы
+
* '''Solution:''' One of the regression algorithms from the review (item 3 of the reference list). The transformation of ordinal features into linear ones can be found in item 4 of the reference list.
-
* '''Novelty:''' В отличие от существующих работ, в основном использующих только наборы признаков, но не географическое соседство с загрязненными районами and динамику изменения окружающей среды, в данной работе предлагается провести анализ проблемы с учетом этих факторов.
+
* '''Novelty:''' In contrast to existing works, which mainly use only sets of features, but not geographic proximity to contaminated areas and the dynamics of environmental changes, this paper proposes to analyze the problem taking into account these factors.
* '''consultant:''' Oleg Bakhteev.
-
=== Task 3 ===
+
===3. 2015===
-
* '''Name:''' Получение оценки разреженной ковариационной матрицы для нелинейных моделей (нейросетей).
+
* '''Title:''' Obtaining an estimate of the sparse covariance matrix for nonlinear models (neural networks).
-
* '''Task''': Предложить метод оценки ковариационной матрицы параметров модели общего вида для случая линейной регрессии, логистической регрессии, общих нелинейных моделей, включая нейросети. Предложить способ учета структуры матрицы (разреженность, зависимости между коэффициентами and т.д.)
+
* '''Problem:''' Suggest a method for estimating the covariance matrix of parameters of a general model for the case of linear regression, logistic regression, general non-linear models, including neural networks. Suggest a way to take into account the structure of the matrix (sparseness, dependencies between coefficients, etc.)
-
* '''Data:''' Синтетические данные and тесты.
+
* '''Data:''' Synthetic data and tests.
-
* '''References:''':
+
* '''References:'''
-
** Зайцев А.А., Strizhov V.V., Tokmakova A.A. [http://strijov.com/papers/ZaytsevStrijovTokmakova2012Likelihood_Preprint.pdf Оценка гиперпараметров регрессионных моделей методом максимального правдоподобия] // Информационные технологии, 2013, 2 11-15.
+
*# Zaitsev A.A., Strijov V.V., Tokmakova A.A. [http://strijov.com/papers/ZaytsevStrijovTokmakova2012Likelihood_Preprint.pdf Maximum Likelihood Estimation of Hyperparameters of Regression Models] // Information Technologies, 2013, 2 - 11-15.
-
** Kuznetsov M.P., Tokmakova A.A., Strijov V.V. [http://strijov.com/papers/HyperOptimizationEng.pdf Analytic and stochastic methods of structure parameter estimation] // Preprint, 2015.
+
*# Kuznetsov M.P., Tokmakova A.A., Strijov V.V. [http://strijov.com/papers/HyperOptimizationEng.pdf Analytic and stochastic methods of structure parameter estimation] // Preprint, 2015.
-
** Aduenko A. A. Презентация по Evidence, 2015. [[Медиа:aduenko_presentation_russian.pdf|aduenko_presentation_russian.pdf]]
+
*# Aduenko A. A. Presentation on Evidence, 2015. [[Media:aduenko_presentation_russian.pdf|aduenko_presentation_russian.pdf]]
-
** Bishop C. M. Pattern Recognition and Machine Learning, pp. 161-172, 2006.
+
*# Bishop C. M. Pattern Recognition and Machine Learning, pp. 161-172, 2006.
-
* '''Basic algorithm:''' Оценка диагональной матрицы, см. папку MLAlgorithms/HyperOptimization.
+
* '''Base algorithm:''' Diagonal matrix estimation, see MLAlgorithms/HyperOptimization folder.
-
* '''Solution:'''
+
* '''Solution:'''
-
* '''Novelty:''' Предложен быстрый алгоритм получения оценок ковариационной матрицы общего вида для нелинейных моделей, исследованы свойства разреженных матриц.
+
* '''Novelty:''' A fast algorithm for obtaining estimates of the general covariance matrix for nonlinear models is proposed, the properties of sparse matrices are investigated.
* '''consultant:''' Alexander Aduenko.
-
=== Task 4 ===
+
===4. 2015===
-
* '''Name:''' Отбор признаков в прогнозировании временных рядов c использованием экзогенных факторов
+
* '''Title:''' Feature selection in time series forecasting using exogenous factors
-
* '''Task''': постановка задачи из [http://www.swissquant.net/files/pdf/Robust%20Calculation%20and%20Parameter%20Estimation%20of%20the%20Hourly%20Price%20Forward%20Curve.pdf] формула (32)
+
* '''Problem:''' The problem statement from [http://www.swissquant.net/files/pdf/Robust%20Calculation%20and%20Parameter%20Estimation%20of%20the%20Hourly%20Price%20Forward%20Curve.pdf], formula (32)
-
* '''Data:''' временные ряды с ценами на электроэнергию.
+
* '''Data:''' time series with electricity prices.
-
* '''References:''':
+
* '''References:'''
-
** Ключевые слова: Hourly Price Forward Curve, краткосрочное прогнозирование временных рядов, выбор признаков, метод Add-Del, (не)линейная регрессия.
+
*# Keywords: Hourly Price Forward Curve, short-term time series forecasting, feature selection, Add-Del method, (non)linear regression.
-
**Основные статьи:
+
*# Main Articles:
-
*# [http://scl.hanyang.ac.kr/scl/database/papers/PESGM/PESGM2014/files/PESGM2014-000294.PDF] - исследование влияния цен в одной стране на цену в другой and как это учесть при прогнозировании.
+
*# [http://scl.hanyang.ac.kr/scl/database/papers/PESGM/PESGM2014/files/PESGM2014-000294.PDF] - a study of how prices in one country influence prices in another and how to take this into account when forecasting.
-
*# [http://www.eeh.ee.ethz.ch/uploads/tx_ethpublications/hildmann_EEM_2013.pdf] - обзор терминов and процессов, всплывающих в прогнозировании HPFC + мотивация
+
*# [http://www.eeh.ee.ethz.ch/uploads/tx_ethpublications/hildmann_EEM_2013.pdf] - overview of terms and processes emerging in HPFC forecasting + motivation
-
*# [http://www1.vwa.unisg.ch/RePEc/usg/sfwpfi/WPF-1311.pdf] - тоже про прогнозирование цен, но тут про спотовые цены
+
*# [http://www1.vwa.unisg.ch/RePEc/usg/sfwpfi/WPF-1311.pdf] - also about price forecasting, but here about spot prices
-
* '''Basic algorithm:'''
+
* '''Base algorithm:'''
-
*# LAD-Lasso estimation из [http://www.swissquant.net/files/pdf/Robust%20Calculation%20and%20Parameter%20Estimation%20of%20the%20Hourly%20Price%20Forward%20Curve.pdf]
+
*# LAD-Lasso estimation from [http://www.swissquant.net/files/pdf/Robust%20Calculation%20and%20Parameter%20Estimation%20of%20the%20Hourly%20Price%20Forward%20Curve.pdf]
-
*# Статья Сандуляну про модификацию Add-Del: [http://strijov.com/papers/SanduleanuStrijov2011FeatureSelection_Preprint.pdf].
+
*# Sanduleanu's article about the Add-Del modification: [http://strijov.com/papers/SanduleanuStrijov2011FeatureSelection_Preprint.pdf].
-
* '''Solution:''' применить в качестве метода отбора признаков модифицрованный метод Add-Del.
+
* '''Solution:''' apply the modified Add-Del method as a feature selection method.
-
* '''Novelty:''' сравнение базвого and предложенного методов, анализ свойств предложенного метода.
+
* '''Novelty:''' comparison of the base and proposed methods, analysis of the properties of the proposed method.
-
* '''consultant:''' Александр Катруца.
+
* '''consultant:''' Alexander Katrutsa.
-
=== Task 5 ===
+
===5. 2015===
-
* '''Name:''' Разработка алгоритма распознавания изображений при поиске параметров фибринолиза.
+
* '''Title:''' Development of an image recognition algorithm for the search for fibrinolysis parameters.
-
* '''Task''': Задан набор снимков роста фибринового сгустка, полученных в процессе исследования тромбодинамики and [https://ru.wikipedia.org/wiki/%D0%A4%D0%B8%D0%B1%D1%80%D0%B8%D0%BD%D0%BE%D0%BB%D0%B8%D0%B7|фибринолиза]. Требуется разработать алгоритм поиска координат отрезка and угла наклона линии активатора по серии снимков. Протестировать разработанный алгоритм на разных видах фибринолиза and примерах, где данный процесс отсутствует.
+
* '''Problem:''' Given a set of images of fibrin clot growth obtained while studying thrombodynamics and [https://ru.wikipedia.org/wiki/%D0%A4%D0%B8%D0%B1%D1%80%D0%B8%D0%BD%D0%BE%D0%BB%D0%B8%D0%B7|fibrinolysis], develop an algorithm that finds the coordinates of the segment and the inclination angle of the activator line from a series of images. Test the developed algorithm on different types of fibrinolysis and on examples where this process is absent.
-
* '''Data:''' Массив снимков для каждого исследования формата tiff 16 бит c моментами времени от начала в сек.
+
* '''Data:''' For each study, an array of 16-bit TIFF images with time stamps in seconds counted from the start of the study.
-
* '''References:'''
+
* '''References:'''
-
** Описание прикладной задачи and техническое задание: по запросу.
+
*# Description of the applied problem and the technical specification: available on request.
-
* '''Basic algorithm:''' Преобразование Хафа [https://www.cs.sfu.ca/~hamarneh/ecopy/compvis1999_hough.pdf|pdf], обсуждается.
+
* '''Base algorithm:''' Hough transform [https://www.cs.sfu.ca/~hamarneh/ecopy/compvis1999_hough.pdf|pdf], under discussion.
-
* '''consultant:''' И.А. Матвеев
+
* '''consultant:''' I.A. Matveev
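The Hough transform mentioned as the base algorithm can be illustrated with a small NumPy sketch. It assumes that each frame has already been reduced to a binary edge mask, which the problem statement does not specify, and the function names `hough_accumulator` and `strongest_line` are hypothetical. Lines are parameterized in normal form, x cos(theta) + y sin(theta) = rho.

```python
import numpy as np


def hough_accumulator(edge_mask, n_theta=180):
    """Vote in (rho, theta) space for every non-zero pixel of a binary mask."""
    ys, xs = np.nonzero(edge_mask)
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    diag = int(np.ceil(np.hypot(*edge_mask.shape)))   # bound on |rho|
    acc = np.zeros((2 * diag + 1, n_theta), dtype=np.int64)
    cols = np.arange(n_theta)
    for x, y in zip(xs, ys):
        # One vote per theta; rho is shifted by diag so indices are non-negative.
        rho = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int) + diag
        acc[rho, cols] += 1
    return acc, thetas, diag


def strongest_line(edge_mask):
    """Return (rho, theta in degrees) of the accumulator cell with most votes."""
    acc, thetas, diag = hough_accumulator(edge_mask)
    rho_idx, theta_idx = np.unravel_index(np.argmax(acc), acc.shape)
    return int(rho_idx) - diag, float(np.degrees(thetas[theta_idx]))
```

Here theta is the angle of the line's normal, so the inclination of the line itself is theta minus 90 degrees; the end points of the segment would have to be recovered separately from the pixels that voted for the winning cell.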
-
=== Task 6 ===
+
===6. 2015===
-
* '''Name:''' Прогнозирование четвертичных структур белков: нивелирование
+
* '''Title:''' Prediction of quaternary structures of proteins: leveling
-
* '''Task:''' Task заключается в предсказании упаковки белковых молекул в мультимерный комплекс в приближении жестких тел. Одна из формклировок задачи записывается как невыпуклая оптимизация.
+
* '''Problem description:''' The problem is to predict the packing of protein molecules into a multimeric complex in the rigid body approximation. One of the formulations of the problem is written as a non-convex optimization.
-
Нужно исследовать эту формулировку and предложить алгоритм решения. Suppose we have <tex>N</tex> proteins in an assembly, such that each protein <tex>i</tex> can be located in one of <tex>P</tex> positions <tex>x_{p}^{i}</tex>. <tex>N</tex> is ~ 10, <tex>P</tex> ~ 100. To each two vectors <tex>x_{i}^{p}</tex> and <tex>x_{j}^{q}</tex>, we can assign an energy function <tex>q_{0}</tex>, which is the overlap integral in the simplest approximation. Each protein position also has an associated score <tex>b_{0}</tex>.
+
It is necessary to study this formulation and propose a solution algorithm. Suppose we have <tex>N</tex> proteins in an assembly, such that each protein <tex>i</tex> can be located in one of <tex>P</tex> positions <tex>x_{p}^{i}</tex>. <tex>N</tex> is ~ 10, <tex>P</tex> ~ 100. To each two vectors <tex>x_{i}^{p}</tex> and <tex>x_{j}^{q}</tex>, we can assign an energy function <tex>q_{0}</tex>, which is the overlap integral in the simplest approximation. Each protein position also has an associated score <tex>b_{0}</tex>.
-
Thus, the optimal packing problem can be formulated as
+
* '''Data:''' Collected using one of the standard complexes resolved by electron microscopy. The energy values and overlap integrals are calculated by modifying one of the standard packages, for example, [http://nano-d.inrialpes.fr/software/hermitefit/ HermiteFit]. Data is generated in ~1 minute; code modification and data preparation will take ~1 week.
-
<tex>
+
* '''References:''' Yu.E. Nesterov Introduction to Convex Optimization (available at PreMoLab website)
-
\begin{align}
+
* '''Code notes:''' [[Media:MaximovProgrammingRequiremets.pdf|Implementation notes]]
-
x^{T}Q_{0}x+b_{0}^{T}x &\rightarrow& \textrm{min}\\
+
* '''Base algorithm:''' I would like to try convex relaxations.
-
\textrm{w.r.t}. &&\left\Vert x^{k}\right\Vert _{\infty}=1\;\forall k \\
+
* '''Novelty:''' Convex relaxations have not previously been applied to such problems on protein data.
-
&& x_{i}^{k}\geq0\;\forall i,k
+
* '''consultant:''' Yu.V. Maksimov
-
\end{align}
+
-
</tex>
+
-
* '''Data:''' Собираются при помощи одного из стандартных комплексов решенных при помощи электронной микроскопии. Значения энергий and интегралов перекрытия вычисляются при помощи модификации одного из стандартных пакетов, например, [http://nano-d.inrialpes.fr/software/hermitefit/ HermiteFit]. Данные генерируются за ~ 1 минуту, модификация кода and подготовка данных займет ~ 1 неделю.
+
-
* '''References:''' Ю.Е. Нестеров Введение в выпуклую оптимизацию (доступна на сайте PreMoLab)
+
-
* '''Замечания по коду:''' [[Медиа:MaximovProgrammingRequiremets.pdf|Замечания по программной реализации]]
+
-
* '''Basic algorithm:''' Хочется попробовать выпуклые релаксации.
+
-
* '''Novelty:''' Выпуклые релаксации не применялись ранее в таких Taskх на данных белков
+
-
* '''consultant:''' Ю.В. Максимов
+
-
=== Task 7 ===
+
===7. 2015===
-
* '''Name:''' Метрическое обучение and снижение размерности пространства в Taskх классификации временных рядов
+
* '''Title:''' Metric learning and dimensionality reduction in time series classification problems
-
* '''Task''': постановка задачи из базовой статьи, возможна некоторая модификация функции ошибки из-за специфики временных рядов
+
* '''Problem:''' The problem statement from the base article, some modification of the error function is possible due to the specifics of the time series
-
* '''Data:''' временные ряды цен на электроэнергию
+
* '''Data:''' electricity price time series
-
* '''References:''':
+
* '''References:'''
-
*# [http://perso.telecom-paristech.fr/~abellet/papers/aistats15.pdf] - базовая статья
+
*# [http://perso.telecom-paristech.fr/~abellet/papers/aistats15.pdf] - the base article
-
*# [http://arxiv.org/pdf/1306.6709.pdf] - отличный обзор методов Metric Learning
+
*# [http://arxiv.org/pdf/1306.6709.pdf] - excellent overview of Metric Learning methods
-
*# [http://www.cs.cmu.edu/~liuy/frame_survey_v2.pdf] - ещё обзор
+
*# [http://www.cs.cmu.edu/~liuy/frame_survey_v2.pdf] - another survey
-
* '''Basic algorithm:''' алгоритм Франка-Вольфа (условного градиентного спуска)
+
* '''Base algorithm:''' Frank-Wolfe algorithm (conditional gradient method)
-
* '''Solution:''' применить прореживание целевой матрицы с помощью метода Belsley для удаления мультиколлинерности
+
* '''Solution:''' sparsify the target matrix using the Belsley method to remove multicollinearity
-
* '''Novelty:''' применение методов Metric Learning в задаче кластеризации временных рядов, анализ свойств предложенного метода
+
* '''Novelty:''' application of Metric Learning methods in the problem of time series clustering, analysis of the properties of the proposed method
-
* '''consultant:''' Александр Катруца
+
* '''consultant:''' Alexander Katrutsa
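The Frank-Wolfe (conditional gradient) method named as the base algorithm can be sketched on a toy problem. The feasible set used here, the probability simplex, and the function name `frank_wolfe_simplex` are assumptions chosen for illustration; the base article works with its own feasible set, for which only the linear minimization step would change.

```python
import numpy as np


def frank_wolfe_simplex(grad, dim, n_iter=200):
    """Minimize a smooth convex function over the probability simplex.

    grad: callable returning the gradient at a point (assumed interface).
    """
    x = np.full(dim, 1.0 / dim)           # start at the simplex center
    for t in range(n_iter):
        g = grad(x)
        # Linear minimization oracle: on the simplex the minimizer of <g, s>
        # is the vertex with the smallest gradient coordinate.
        vertex = np.zeros(dim)
        vertex[np.argmin(g)] = 1.0
        step = 2.0 / (t + 2.0)            # standard step size, no line search
        x = (1.0 - step) * x + step * vertex
    return x
```

Each iterate is a convex combination of at most t + 1 vertices, which is why the method is attractive when sparse solutions are wanted.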
-
=== Task 8 ===
+
===8. 2015===
-
* '''Name:''' Структурное обучение при порождении моделей
+
* '''Title:''' Structural learning when generating models
-
* '''Task''': Решается Task поиска ранжирующей функции в Taskх информационного поиска. Поиск проводится среди непараметрических функций (структур), сгенерированныx грамматикой вида G: g---> B(g, g) | U(g) | S, где B - набор бинарных операций {+, -, *, /}, U - унарных {-(), sqrt, log, exp}, S - переменных and параметров {x, y, k}. Предлагается решать задачу порождения ранжирующей модели в два этапа, используя в качестве обучающей выборки историю восстановления структуры модели.
+
* '''Problem:''' The problem of finding a ranking function for information retrieval is solved. The search is carried out among non-parametric functions (structures) generated by a grammar of the form G: g---> B(g, g) | U(g) | S, where B is a set of binary operations {+, -, *, /}, U is a set of unary operations {-(), sqrt, log, exp}, and S is a set of variables and parameters {x, y, k}. It is proposed to generate the ranking model in two stages, using the history of recovering the model structure as the training sample.
-
* '''Data:''' Подколлекции TREC.
+
* '''Data:''' TREC subcollections.
-
* Описание коллекции данных, используемых для оценки функций, and процедуры оценки. [http://svn.code.sf.net/p/mlalgorithms/code/Group174/Kulunchakov2014RankinBySimpleFun/doc/Kulunchakov2014RankingBySimpleFun.pdf?format=raw|pdf]
+
* Description of the data collection used to evaluate the functions, and of the evaluation procedure. [http://svn.code.sf.net/p/mlalgorithms/code/Group174/Kulunchakov2014RankinBySimpleFun/doc/Kulunchakov2014RankingBySimpleFun.pdf?format=raw|pdf]
* '''References:'''
-
** Jaakkola T. Scaled structured prediction.
+
*# Jaakkola T. Scaled structured prediction.
-
** [http://www.youtube.com/watch?v=LbsBguCUFEc|Лекция Tommi Jaakkola “Scaling structured prediction”]
+
*# [http://www.youtube.com/watch?v=LbsBguCUFEc|Tommi Jaakkola lecture “Scaling structured prediction”]
-
** ''Найти все работы учеников TJ по данной тематике.''
+
*# Find all the work of TJ students on a given topic.
-
** Варфоломеева А.А. Дипломная работа бакалавра в MLAlgorithms/BSThesis/Varfolomeeva
+
*# Varfolomeeva A.A. Bachelor's thesis in MLAlgorithms/BSThesis/Varfolomeeva
-
* '''Basic algorithm:''' Парантапа, BM25 - модели для сравнения.
+
* '''Base algorithm:''' Parantapa, BM25 - models for comparison.
-
* '''Solution:''' Предлагается кластеризовать коллекцию and породить модели для кластеров документов. Затем методом структурного обучения найти модели, обобщающие объединения кластеров вплоть до самой коллекции.
+
* '''Solution:''' It is proposed to cluster the collection and generate models for document clusters. Then, using the structural learning method, find models that generalize the unions of clusters up to the collection itself.
-
* '''Novelty:''' Обнаружены ранжирующие функции, не уступающие по качеству используемым на практике.
+
* '''Novelty:''' Ranking functions are found that match the quality of those used in practice.
-
* * '''consultant:''' Анна Варфоломеева, Oleg Bakhteev
+
* '''consultant:''' Anna Varfolomeeva, Oleg Bakhteev
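The grammar G: g---> B(g, g) | U(g) | S from the problem statement can be sampled with a few lines of code. The sketch below only generates candidate structures as strings; the uniform choice between the three rules, the depth limit, and the function name `generate` are assumptions made for the example and are not taken from the referenced works.

```python
import random

BINARY = ["+", "-", "*", "/"]
UNARY = ["-", "sqrt", "log", "exp"]
SYMBOLS = ["x", "y", "k"]


def generate(rng, max_depth):
    """Sample one expression string from G: g -> B(g, g) | U(g) | S."""
    if max_depth == 0:
        return rng.choice(SYMBOLS)        # force a terminal at the depth limit
    rule = rng.choice(["B", "U", "S"])
    if rule == "B":
        op = rng.choice(BINARY)
        left = generate(rng, max_depth - 1)
        right = generate(rng, max_depth - 1)
        return f"({left} {op} {right})"
    if rule == "U":
        op = rng.choice(UNARY)
        return f"{op}({generate(rng, max_depth - 1)})"
    return rng.choice(SYMBOLS)
```

For example, `generate(random.Random(0), 3)` returns one candidate structure; evaluating it on a document collection would additionally require guarding against division by zero and invalid arguments of sqrt and log.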
-
=== Task 9 ===
+
===9. 2015===
-
* '''Name:''' Проверка соответствия электрокардиографа требованиям диагностической системы «Скринфакс» and оценка качества электрокардиограмм.
+
* '''Title:''' Checking the compliance of the electrocardiograph with the requirements of the diagnostic system "Screenfax" and assessing the quality of electrocardiograms.
-
* '''Task:''' Решается Task проверки соответствия произвольного электрокардиографа требованиям системы диагностики «Скринфакс» [1—4] на основе сравнения электрокардиограмм (ЭКГ) одних and тех же пациентов, зарегистрированных обоими приборами по схеме АВАВ, где А – первый прибор, В – второй. Также решается Task автоматического выявления некачественных электрокардиограмм, не удовлетворяющих требованиям диагностической системы.
+
* '''Problem description:''' The problem is to check whether an arbitrary electrocardiograph meets the requirements of the "Screenfax" diagnostic system [1-4] by comparing electrocardiograms (ECG) of the same patients recorded by both devices according to the ABAB scheme, where A is the first device and B is the second. The problem of automatically detecting low-quality electrocardiograms that do not meet the requirements of the diagnostic system is also solved.
-
* '''Data:''' Выборка состоит из записей со значениями ЭКГ, зарегистрированными прибором, для которого проводится проверка, and прибором, используемым в системе диагностики «Скринфакс» (данные с подробным описанием формата записей будут предоставлены выбравшему задачу). Для тестирования алгоритмов обнаружения R-пиков and оценивания уровня шума можно использовать http://www.physionet.org/physiobank/database/ptbdb/
+
* '''Data:''' The sample consists of records with ECG values recorded by the device under test and by the device used in the "Screenfax" diagnostic system (the data, with a detailed description of the record format, will be provided to whoever selects this problem). You can use http://www.physionet.org/physiobank/database/ptbdb/ to test algorithms for R-peak detection and noise level estimation.
-
* '''References:'''
+
* '''References:'''
-
*# Информационный портал Диагностической системы «Скринфакс». URL: http://skrinfax.ru/автор-метода/
+
*# Information portal of the "Screenfax" diagnostic system. URL: http://skrinfax.ru/автор-метода/
-
*# [[Технология информационного анализа электрокардиосигналов]]
+
*# [[Технология информационного анализа электрокардиосигналов|Technology for information analysis of electrocardiosignals]]
-
*# Успенский В.М. Информационная функция сердца. Теория and практика диагностики заболеваний внутренних органов методом информационного анализа электрокардиосигналов. М.: Экономика and информатика, 2008. 116с.
+
*# Uspensky V.M. Information function of the heart. Theory and practice of diagnosing diseases of internal organs by the method of information analysis of electrocardiosignals. M.: Economics and informatics, 2008. 116p.
-
*# Успенский В.М. Информационная функция сердца. // Клиническая медицина. 2008. Т.86. №5. С.4–13.
+
*# Uspensky V.M. Information function of the heart. // Clinical medicine. 2008. V.86. No. 5. pp.4–13.
-
*# Naseri H., Homaeinezhad M.R. Electrocardiogram signal quality assessment using an artificially reconstructed target lead // Computer Methods in Biomechanics and Biomedical Engineering. 2015. Vol.18, No. 10. Pp. 1126-1141.
+
*# Naseri H., Homaeinezhad M.R. Electrocardiogram signal quality assessment using an artificially reconstructed target lead // Computer Methods in Biomechanics and Biomedical Engineering. 2015. Vol. 18, No. 10. Pp. 1126-1141.
-
*# Zidelmal Z., Amirou A., Ould-Abdeslam D., Moukadem A., Dieterlen A. QRS detection using S-Transform and Shannon energy. // Comput Methods Programs Biomed. 2014. Vol. 116, No. 1. Pp. 1-9. URL: https://yadi.sk/i/-kD00y1VepB3q
+
*# Zidelmal Z., Amirou A., Ould-Abdeslam D., Moukadem A., Dieterlen A. QRS detection using S-Transform and Shannon energy. // Comput Methods Programs Biomed. 2014. Vol. 116, No. 1. Pp. 1-9. URL: https://yadi.sk/i/-kD00y1VepB3q
-
*# Sarfraz M., Li F. F., Khan A. A. Independent Component Analysis Methods to Improve Electrocardiogram Patterns Recognition in the Presence of Non-Trivial Artifacts // Journal of Medical and Bioengineering. 2015. Vol. 4, No. 3. Pp. 221—226. URL: https://yadi.sk/i/-kD00y1VepB3q
+
*# Sarfraz M., Li F. F., Khan A. A. Independent Component Analysis Methods to Improve Electrocardiogram Patterns Recognition in the Presence of Non-Trivial Artifacts // Journal of Medical and Bioengineering. 2015. Vol. 4, No. 3. Pp. 221-226. URL: https://yadi.sk/i/-kD00y1VepB3q
-
*# Meziane N. et al. Simultaneous comparison of 1 gel with 4 dry electrode types for electrocardiography // Physiol. Meas. 2015. Vol. 36, No. 513.
+
*# Meziane N. et al. Simultaneous comparison of 1 gel with 4 dry electrode types for electrocardiography // Physiol. Meas. 2015. Vol. 36, No. 513.
-
*# Allana S., Aversa J., Varghese C., et al. Poor quality electrocardiograms negatively affect the diagnostic accuracy of ST segment elevation myocardial infarction. // J Am Coll Cardiol. 2014. Vol. 63, No. 12_S. doi:10.1016/S0735-1097(14)60172-8.
+
*# Allana S., Aversa J., Varghese C., et al. Poor quality electrocardiograms negatively affect the diagnostic accuracy of ST segment elevation myocardial infarction // J Am Coll Cardiol. 2014. Vol. 63, No. 12_S. doi:10.1016/S0735-1097(14)60172-8.
-
* '''Basic algorithm:''' Оценивание качества ЭКГ – [4], обнаружение R-пиков – [5], оценивание уровня шума в данных – [6].
+
* '''Base algorithm:''' ECG quality estimation – [4], R-peak detection – [5], noise level estimation in data – [6].
-
* '''Solution:''' Задачу проверки соответствия произвольного электрокардиографа требованиям системы диагностики «Скринфакс» предлагается решать путем построения перестановочных статистических тестов по сравнению значений RR-интервалов and R-амплитуд and выявленных кодовых последовательностей (вычисляются по амплитудам and интервалам) для каждого заболевания. Здесь возникает Task обнаружения R-пиков. В задаче обнаружения некачественных электрокардиограмм возникает Task оценивания уровня шума. Кроме того, необходимо научиться отсеивать ЭКГ с неинформативными значениями амплитуд или большим разбросом значений интервалов, поскольку методика анализа электрокардиосигналов неприменима к диагностике аритмии.
+
* '''Solution:''' To check whether an arbitrary electrocardiograph meets the requirements of the "Screenfax" diagnostic system, it is proposed to construct permutation statistical tests that compare the values of RR-intervals and R-amplitudes, as well as the detected code sequences (computed from the amplitudes and intervals), for each disease. This gives rise to the problem of detecting R-peaks. Detecting low-quality electrocardiograms in turn requires estimating the noise level. In addition, it is necessary to learn to filter out ECGs with non-informative amplitude values or a large spread of interval values, since the method of analyzing electrocardiosignals is not applicable to the diagnosis of arrhythmia.
-
* '''Novelty:''' Задачу проверки соответствия электрокардиографа требованиям диагностической системы можно рассматривать как задачу сравнения приборов регистрации ЭКГ, возникающей, например, при сравнении различных видов электродов, and в качестве критериев выбираются уровень шума в значениях электрокардиосигналов, наличие дрейфа базовой линии and некоторые другие признаки [7].
+
* '''Novelty:''' Checking the compliance of an electrocardiograph with the requirements of the diagnostic system can be viewed as a problem of comparing ECG recording devices, which arises, for example, when comparing different types of electrodes; the comparison criteria are the noise level in the electrocardiosignal values, the presence of baseline drift, and some other features [7].
-
* '''consultant:''' Ишкина Шаура
+
* '''consultant:''' Shaura Ishkina
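The permutation test mentioned in the solution can be sketched as follows. This is a minimal illustration, not the project's implementation: it assumes the compared quantity is the mean RR-interval of two devices, and the function name <code>permutation_test</code>, its parameters, and the sample values are hypothetical.

```python
import random


def permutation_test(sample_a, sample_b, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns an estimated p-value under the null hypothesis that both
    samples come from the same distribution.
    """
    rng = random.Random(seed)
    n_a = len(sample_a)
    n_b = len(sample_b)
    observed = abs(sum(sample_a) / n_a - sum(sample_b) / n_b)
    pooled = list(sample_a) + list(sample_b)
    hits = 0
    for _ in range(n_permutations):
        # Randomly reassign the pooled values to the two devices.
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:n_a], pooled[n_a:]
        diff = abs(sum(perm_a) / n_a - sum(perm_b) / n_b)
        if diff >= observed:
            hits += 1
    # The add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical RR-intervals (seconds) from a reference and a candidate device.
rr_reference = [0.80, 0.82, 0.79, 0.81, 0.83, 0.80, 0.78, 0.82, 0.81, 0.80]
rr_candidate = [0.81, 0.80, 0.82, 0.79, 0.80, 0.83, 0.81, 0.79, 0.80, 0.82]
print(f"p-value: {permutation_test(rr_reference, rr_candidate):.3f}")
```

A small p-value would suggest that the candidate device produces systematically different RR-intervals; the same scheme applies to R-amplitudes by passing amplitude samples instead.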
-
=== Task 10 ===
+
===10. 2015===
-
* '''Name:''' Simplification of the IR models structure
+
* '''Title:''' Simplification of the IR models structure
-
* '''Task''': To achieve the acceptable quality of the information retrieval models, modern search engines use models of very complex structure. In current research we propose to simplify the model structure and make it interpretable without decreasing the model accuracy. To do this, we follow the idea from (Goswami et al., 2014) of constructing the set of nonlinear IR functions of simple structure and admissible accuracy. However, each of this functions is expected to have lower accuracy while comparing with the best IR model of complex structure. Thus, we propose to approximate this complex model with the linear combination of simple nonlinear functions and expect to obtain the comparable quality of solution.
+
* '''Problem:''' To achieve acceptable quality, modern search engines use information retrieval (IR) models of very complex structure. In this research we propose to simplify the model structure and make it interpretable without decreasing the model accuracy. To do this, we follow the idea from (Goswami et al., 2014) of constructing a set of nonlinear IR functions of simple structure and admissible accuracy. However, each of these functions is expected to have lower accuracy than the best IR model of complex structure. Thus, we propose to approximate this complex model with a linear combination of simple nonlinear functions and expect to obtain a comparable quality of solution.
* '''Data:''' TREC collections.
* '''Data:''' TREC collections.
-
* '''References:'''
+
* '''References:'''
-
** P. Goswami et Al. Exploring the Space of IR Functions // Advances in Information Retrieval. Lecture Notes in Computer Science. 8416:372-384, 2014.
+
*# P. Goswami et al. Exploring the Space of IR Functions // Advances in Information Retrieval. Lecture Notes in Computer Science. 8416:372-384, 2014.
-
** [https://www.dropbox.com/s/yw7xczcnm8fbymk/StructureSimplification.pdf?dl=0| Problem statement]
+
*# [https://www.dropbox.com/s/yw7xczcnm8fbymk/StructureSimplification.pdf?dl=0 Problem statement]
-
* '''Basic algorithm:''' Gradient boosting machine for constructing a model of high complexity. Exaustive search of superpositions from a set of elementary functions for approximation and simplification.
+
* '''Base algorithm:''' Gradient boosting machine for constructing a model of high complexity. Exhaustive search of superpositions from a set of elementary functions for approximation and simplification.
* '''Solution:''' The optimal functions for the linear combination can be found by the greedy algorithm.
* '''Solution:''' The optimal functions for the linear combination can be found by the greedy algorithm.
* '''Novelty:''' A new ranking function of simple structure competitive with traditional ones.
* '''Novelty:''' A new ranking function of simple structure competitive with traditional ones.
* '''consultant:''' Mikhail Kuznetsov.
* '''consultant:''' Mikhail Kuznetsov.
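One possible reading of the greedy step is sketched below. It is a forward stagewise (matching-pursuit style) procedure, which is an assumption: the source only says that a greedy algorithm is used. The candidate functions, the toy "complex model" scores, and the name <code>greedy_fit</code> are all hypothetical.

```python
import math


def greedy_fit(candidates, xs, target, n_terms):
    """Greedily build a linear combination of simple functions.

    At each step, pick the candidate whose best single-coefficient fit
    reduces the squared error of the current residual the most.
    Returns the chosen (name, coefficient) pairs and the error history.
    """
    residual = list(target)
    model = []
    errors = [sum(r * r for r in residual)]
    for _ in range(n_terms):
        best = None
        for name, func in candidates.items():
            values = [func(x) for x in xs]
            norm = sum(v * v for v in values)
            if norm == 0.0:
                continue
            # Optimal coefficient for a one-dimensional least-squares fit.
            coef = sum(v * r for v, r in zip(values, residual)) / norm
            err = sum((r - coef * v) ** 2 for v, r in zip(values, residual))
            if best is None or err < best[0]:
                best = (err, name, coef, values)
        err, name, coef, values = best
        residual = [r - coef * v for v, r in zip(values, residual)]
        model.append((name, coef))
        errors.append(err)
    return model, errors


# Hypothetical simple nonlinear functions of a term-frequency feature x.
candidates = {
    "log1p": math.log1p,
    "sqrt": math.sqrt,
    "saturation": lambda x: x / (1.0 + x),
    "identity": lambda x: x,
}
xs = [0.5 * i for i in range(1, 21)]
# Stand-in for the scores of a complex model (e.g. a boosted ensemble).
complex_scores = [2.0 * math.log1p(x) + 0.5 * x for x in xs]
model, errors = greedy_fit(candidates, xs, complex_scores, n_terms=2)
print(model, errors)
```

Because previously chosen coefficients are not refitted, the result is generally not the least-squares optimum over the selected functions; refitting all coefficients after each selection is a natural variant.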
-
=== Task 11 ===
+
===11. 2015===
-
* '''Name:''' Тестирование непараметрических алгоритмов прогнозирования временных рядов в условиях нестационарности
+
* '''Title:''' Testing non-parametric time series forecasting algorithms under non-stationary conditions
-
* '''Task''': Одним из ключевых предположений о распределении данных при непараметрическом является предположение о стационарности временного ряда. Адекватность прогнозов при невыполнении этого требования не гарантируется. Требуется разработать метод определения выполнения условия локальной стационарности временного ряда исследовать применимость основных алгоритмов непараметрического прогнозирования в отсутствии стационарности. Рассмотреть основные методы непараметрической регрессии, такие как ядерное сглаживание, сглаживание сплайнами, авторегрессия, скользящее среднее and др.
+
* '''Problem:''' One of the key assumptions about the data distribution in non-parametric forecasting is that the time series is stationary. If this requirement is not met, the adequacy of the forecasts is not guaranteed. It is required to develop a method for checking whether the condition of local stationarity of the time series holds, and to study the applicability of the main non-parametric forecasting algorithms in the absence of stationarity. Consider the main methods of non-parametric regression, such as kernel smoothing, spline smoothing, autoregression, moving average, etc.
-
* '''Data:''' Данные о грузовых железнодорожных перевозках (РЖД)
+
* '''Data:''' Data on freight rail transportation (RZD)
-
* '''References:''':
+
* '''References:'''
-
**Вальков А.С., Кожанов Е.М., Медведникова М.М., Хусаинов Ф.И. Непараметрическое прогнозирование загруженности системы железнодорожных узлов по историческим данным // Машинное обучение and анализ данных. 2012. — № 4.
+
*# Valkov A.S., Kozhanov E.M., Medvednikova M.M., Khusainov F.I. Nonparametric forecasting of railway junction system load based on historical data // Machine Learning and Data Analysis. - 2012. - No. 4.
-
** Dickey D. A. and Fuller W. A. Distribution of the Estimators for Autoregressive Time Series with a Unit Root / Journal of the American Statistical Association. 74. 1979. p. 427—-431.
+
*# Dickey D. A., Fuller W. A. Distribution of the Estimators for Autoregressive Time Series with a Unit Root // Journal of the American Statistical Association. 1979. Vol. 74. Pp. 427-431.
-
* '''Basic algorithm:''' ARMA, Hist.
+
* '''Base algorithm:''' ARMA, Hist.
-
* '''Solution:''' В качестве базового метода для проверки рядов на нестационарность использовать тест Дики-Фуллера. Предлагается также рассмотреть такие источники нестационарности, как тренд and сезонность.
+
* '''Solution:''' Use the Dickey-Fuller test as a basic method for checking series for non-stationarity. It is also proposed to consider such sources of non-stationarity as trend and seasonality.
-
* '''Novelty:''' Разработан and обоснован метод определения выполнения условия локальной стационарности временного ряда.
+
* '''Novelty:''' A method for determining the fulfillment of the condition of local stationarity of a time series has been developed and substantiated.
-
* '''consultant:''' Стенина Мария
+
* '''consultant:''' Stenina Maria
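A windowed Dickey-Fuller check can be sketched as follows. This is only an illustration under stated assumptions: it uses the simplest test regression with an intercept and no lagged differences, non-overlapping windows, and the approximate asymptotic 5% critical value of -2.86; the function names and the synthetic series are hypothetical and are not the RZD data.

```python
import math
import random


def dickey_fuller_stat(series):
    """t-statistic of gamma in the regression dy_t = a + gamma * y_{t-1} + e_t."""
    lagged = series[:-1]
    diffs = [series[i + 1] - series[i] for i in range(len(series) - 1)]
    n = len(lagged)
    mean_x = sum(lagged) / n
    mean_d = sum(diffs) / n
    sxx = sum((x - mean_x) ** 2 for x in lagged)
    sxd = sum((x - mean_x) * (d - mean_d) for x, d in zip(lagged, diffs))
    gamma = sxd / sxx
    intercept = mean_d - gamma * mean_x
    rss = sum((d - intercept - gamma * x) ** 2 for x, d in zip(lagged, diffs))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return gamma / std_err


def locally_stationary(series, window, critical_value=-2.86):
    """For each non-overlapping window, report whether the unit root is rejected."""
    return [
        dickey_fuller_stat(series[start:start + window]) < critical_value
        for start in range(0, len(series) - window + 1, window)
    ]


rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(400)]  # stationary series
walk = []                                          # random walk (unit root)
level = 0.0
for step in noise:
    level += step
    walk.append(level)

print(dickey_fuller_stat(noise), dickey_fuller_stat(walk))
print(locally_stationary(noise, window=100))
```

The statistic does not follow a Student distribution under the null hypothesis, so it must be compared with Dickey-Fuller critical values; for short windows the finite-sample values differ slightly from the asymptotic one used here.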
-
=== Task 12 ===
+
===12. 2015===
-
* '''Name:''' Обучение метрик в Taskх полного and частичного обучения
+
* '''Title:''' Metric learning in supervised and semi-supervised learning problems
-
* '''Task:''' состоит в программной реализации комплекса методов выпуклой and DC-оптимизации для задачи выбора оптимальной метрики в Taskх распознавания. Иными словами, в построении метрики такой, что классификация методом ближайших соседей дает высокую точность.
+
* '''Problem description:''' The task is to implement in software a set of convex and DC-optimization methods for choosing the optimal metric in recognition problems, in other words, to construct a metric such that nearest neighbor classification gives high accuracy.
-
* '''Data:''' Birds and Fungus коллекции ImageNet с извлеченными Deep features(предоставляется consultantом). Первичные тесты можно проводить на данных представленных [http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html здесь]
+
* '''Data:''' Birds and Fungus collections of ImageNet with extracted deep features (provided by the consultant). Initial tests can be run on the data available [http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html here].
-
* '''References:''' Список литературы and описание подробное задачи приведены [[Медиа:Maximov_Metric_Learning%28Strijov_Course%29.pdf| в файле]]
+
* '''References:''' The reference list and a detailed description of the problem are given [[Media:Maximov_Metric_Learning%28Strijov_Course%29.pdf| in the file]].
-
* '''Замечания к коду:''' [[Медиа:MaximovProgrammingRequiremets.pdf|Замечания по программной реализации]]
+
* '''Code notes:''' [[Media:MaximovProgrammingRequiremets.pdf|Implementation notes]]
-
* '''Basic algorithm:''' 1) выпуклая релаксация задачи решаемая внутренней точкой через CVX 2) SVM на модифицированной выборке, состоящей из пар объектов
+
* '''Base algorithm:''' 1) a convex relaxation of the problem, solved by an interior-point method via CVX; 2) SVM on a modified sample consisting of pairs of objects.
-
* '''consultant:''' Ю.В. Максимов
+
* '''consultant:''' Yu.V. Maksimov
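To make the quality criterion concrete, the sketch below evaluates leave-one-out nearest neighbor accuracy under a Mahalanobis-type metric matrix. It does not learn the metric (the convex and DC-optimization methods of the task are not reproduced); the toy points and the matrix called <code>learned</code> are hypothetical and hand-picked for illustration.

```python
def mahalanobis_sq(x, z, metric):
    """Squared distance (x - z)^T M (x - z) for a metric matrix M."""
    diff = [a - b for a, b in zip(x, z)]
    size = len(diff)
    return sum(diff[i] * metric[i][j] * diff[j]
               for i in range(size) for j in range(size))


def loo_nn_accuracy(points, labels, metric):
    """Leave-one-out accuracy of the 1-nearest-neighbor classifier."""
    correct = 0
    for i, x in enumerate(points):
        nearest = min(
            (j for j in range(len(points)) if j != i),
            key=lambda j: mahalanobis_sq(x, points[j], metric),
        )
        correct += labels[nearest] == labels[i]
    return correct / len(points)


# Toy data: the first feature separates the classes,
# the second is large-scale noise.
points = [(0.0, 0.0), (0.1, 10.0), (0.2, -10.0),
          (1.0, 0.0), (1.1, 10.0), (0.9, -10.0)]
labels = [0, 0, 0, 1, 1, 1]
euclidean = [[1.0, 0.0], [0.0, 1.0]]
learned = [[1.0, 0.0], [0.0, 0.0]]  # hypothetical metric ignoring feature 2

print(loo_nn_accuracy(points, labels, euclidean))
print(loo_nn_accuracy(points, labels, learned))
```

On this toy sample the Euclidean metric misclassifies every point, while the metric that suppresses the noisy feature classifies all of them correctly, which is the effect a learned metric is meant to achieve.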
-
=== Task 13 ===
+
===13. 2015===
-
* '''Name:''' Построение иерархической тематической модели крупной конференции
+
* '''Title:''' Building a hierarchical topic model of a large conference
-
* '''Task''': Ежегодно, программный комитет крупной конференции EURO (более 2000 докладов) сталкивается с задачей построения иерархической модели тезисов конференции. В силу того, что структура конференции слабо меняется из года в год, предлагается построить тематическую модель будущей конференции, используя экспертные модели конференций прошлых лет. При этом возникают следующие подзадачи:
+
* '''Problem:''' Every year, the program committee of the major EURO conference (more than 2000 talks) faces the problem of building a hierarchical model of the conference abstracts. Since the structure of the conference changes little from year to year, it is proposed to build a thematic model of the upcoming conference using expert models of conferences of previous years. This raises the following subproblems:
-
# Классификация тезисов новой конференции.
+
# Classification of abstracts of the new conference.
-
# Прогнозирование изменений структуры конференции.
+
# Predicting changes in the structure of the conference.
 +
* '''Data:''' Abstracts and expert models of EURO 2010, 2012, 2013 conferences.
 +
* '''References:''' Alexander A. Aduenko, Arsentii A. Kuzmin, Vadim V. Strijov. Adaptive thematic forecasting of major conference proceedings [http://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group974/KuzminAduenkoStrijov2013AdoptiveTextClustering/doc/TextClustering_english_5.pdf?format=raw text of the article]
 +
* '''Base algorithm:'''
 +
* '''Solution:''' To solve the subproblems:
 +
# it is proposed to combine the expert models of conferences of previous years into one and, for each abstract of the new conference, to find the most suitable cluster in the resulting combined model, for example, using a weighted cosine measure of proximity.
 +
# explore changes in the structure of conferences from year to year and determine the threshold of intra-cluster similarity at which, for a certain set of abstracts, experts create a new cluster rather than adding these abstracts to existing clusters.
 +
* '''Novelty:''' A weighted cosine proximity measure that takes into account the hierarchical structure of clusters. Forecasting changes in the hierarchical structure/topics of the conference.
 +
* '''consultant:''' Arsenty Kuzmin
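The cluster assignment in the first subproblem can be illustrated with a weighted cosine similarity. The sketch assumes abstracts and clusters are represented by term-count vectors and that the weights are per-term; how the weights should reflect the cluster hierarchy is the subject of the project and is not modeled here. All names and numbers are hypothetical.

```python
import math


def weighted_cosine(u, v, weights):
    """Cosine similarity with a non-negative weight for every term."""
    dot = sum(w * a * b for w, a, b in zip(weights, u, v))
    norm_u = math.sqrt(sum(w * a * a for w, a in zip(weights, u)))
    norm_v = math.sqrt(sum(w * b * b for w, b in zip(weights, v)))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def assign_cluster(abstract, centroids, weights):
    """Return the name of the cluster whose centroid is most similar."""
    return max(
        centroids,
        key=lambda name: weighted_cosine(abstract, centroids[name], weights),
    )


# Hypothetical term-count vectors over a three-word vocabulary.
centroids = {
    "optimization": [2.0, 0.0, 1.0],
    "scheduling": [0.0, 3.0, 1.0],
}
term_weights = [1.0, 1.0, 0.5]  # assumed: the third term is less informative
new_abstract = [1.0, 0.0, 1.0]
print(assign_cluster(new_abstract, centroids, term_weights))
```

For the second subproblem, the maximum similarity returned for an abstract could be compared against a threshold to decide whether a new cluster is needed; the threshold itself would have to be estimated from the expert models.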
-
* '''Data:''' Тезисы and экспертные модели конференций EURO 2010, 2012, 2013.
+
===14. 2015===
-
* '''References:''': Alexander A. Aduenko, Arsentii A. Kuzmin, Vadim V. Strijov. Adaptive thematic forecasting of major conference proceedings [http://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group974/KuzminAduenkoStrijov2013AdoptiveTextClustering/doc/TextClustering_english_5.pdf?format=raw текст статьи]
+
* '''Title:''' Regularization of the linear naive Bayes classifier.
-
* '''Basic algorithm:'''
+
* '''Problem:''' Building a linear classifier is one of the classic and best-studied machine learning problems. The linear naive Bayes (LNB) classifier has a strong advantage: it is built in time linear in the sample size; and a strong limitation: its derivation assumes that the features are independent. On some data, LNB performs surprisingly well despite a clear violation of the feature independence hypothesis. The linear support vector machine (SVM) is considered a very successful method, but it is slow on large samples. Both methods work in the same space of linear classifiers. The idea of the study is to bring LNB closer to SVM in quality, without loss of efficiency, by means of minor corrections to LNB.
-
* '''Solution:''' Для решения подзадач
+
* '''Data:''' One of three data sets, at your choice: classification of texts into scientific and non-scientific, classification of dissertation abstracts by field of science, classification of ECG codograms into sick and healthy.
-
# предлагается объединить экспертные модели конференций прошлых лет в одну, and для каждого тезиса новой конференции найти в полученной объединенной модели наиболее подходящий кластер, например, с помощью взвешенной косинусной меры близости.
+
* '''References:'''
-
# исследовать изменения в структуре конференций из года в год and определить порог значений внутрикластерного сходства, при котором для некоторого набора тезисов Experts создают новый кластер, а не добавляют эти тезисы в уже существующие кластеры.
+
*# Larsen (2005) Generalized Naive Bayes Classifiers.
 +
*# Abraham, Simha, Iyengar (2009) Effective Discretization and Hybrid feature selection using Naïve Bayesian classifier for Medical datamining.
 +
*# Lutu (2013) Fast Feature Selection for Naive Bayes Classification in Data Stream Mining.
 +
*# Zaidi, Carman, Cerquides, Webb (2014) Naive-Bayes Inspired Effective Pre-Conditioner for Speeding-up Logistic Regression.
 +
*# + ask [[User:Vokov|Vorontsov K. V.]].
 +
* '''Base algorithm:''' any ready-made LNB and SVM implementations. Plus naive feature selection for LNB.
 +
* '''Solution:''' Derive correction formulas for the LNB weights when using a margin-maximization regularizer similar to that of SVM. Build an iterative process in which a correction is computed at each step, bringing LNB a little closer to SVM. Plot ROC curves and the dependence of hold-out AUC on the iteration number.
 +
* '''Novelty:''' The ML community still hasn't realized that any linear classifier is equivalent to some kind of Naive Bayesian classifier.
 +
* '''consultant:''' Mikhail Uskov. '''Hyperconsultant:''' [http://www.machinelearning.ru/wiki/index.php?title=Участник:Vokov Vorontsov K. V.].
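The starting point of the task, a naive Bayes classifier written as a linear rule and fitted in a single pass over the sample, can be sketched as follows. The sketch assumes the multinomial variant with additive smoothing and two classes labeled 0 and 1; the SVM-like correction of the weights is not shown, and all names and data are hypothetical.

```python
import math


def fit_linear_naive_bayes(rows, labels, smoothing=1.0):
    """Fit multinomial naive Bayes and return it as (weights, bias).

    The decision rule is: predict class 1 if bias + <weights, x> > 0.
    Fitting is one pass over the data, i.e. linear in the sample size.
    """
    n_features = len(rows[0])
    counts = {0: [0.0] * n_features, 1: [0.0] * n_features}
    class_sizes = {0: 0, 1: 0}
    for row, label in zip(rows, labels):
        class_sizes[label] += 1
        for j, value in enumerate(row):
            counts[label][j] += value
    log_theta = {}
    for c in (0, 1):
        total = sum(counts[c]) + smoothing * n_features
        log_theta[c] = [math.log((counts[c][j] + smoothing) / total)
                        for j in range(n_features)]
    # Log-likelihood ratios become the weights of the linear classifier.
    weights = [log_theta[1][j] - log_theta[0][j] for j in range(n_features)]
    bias = math.log(class_sizes[1] / class_sizes[0])
    return weights, bias


def predict(weights, bias, row):
    score = bias + sum(w * x for w, x in zip(weights, row))
    return 1 if score > 0 else 0


# Hypothetical word-count features for four documents.
rows = [[3, 0, 1], [2, 1, 0], [0, 3, 1], [0, 2, 2]]
labels = [1, 1, 0, 0]
weights, bias = fit_linear_naive_bayes(rows, labels)
print(weights, bias, predict(weights, bias, [4, 0, 0]))
```

An iterative correction scheme of the kind described in the solution would start from these weights and adjust them step by step; the form of that adjustment is left to the derivation required by the task.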
-
* '''Novelty:''' Взвешенная косинусная мера близости, учитывающая иерархичность структуры кластеров. Прогнозирование изменений иерархической структуры/тематики конференции
+
===15. 2015===
-
* '''consultant:''' Арсентий Кузьмин
+
* '''Title:''' Thematic model of the interests of regular users of the mobile application.
-
 
+
* '''Problem:''' A mobile app for learning English words offers the user words one by one. The user can either add a word to the list being studied or discard it. To start learning, the user needs to collect at least 10 words. It is required to build a probabilistic word generation model that adapts to the interests of the user.
-
=== Task 14 ===
+
* '''Data:''' Lists of added and discarded words are available for each user. In addition, a large external text collection, for example Wikipedia, is expected to be used for robust topic identification.
-
* '''Name:''' Регуляризация линейного наивного байесовского классификатора.
+
-
* '''Task''': Построение линейного классификатора является одной из классических and самых хорошо изученных задач машинного обучения. Линейный наивный байесовский (LNB) классификатор имеет сильное преимущество — он строится за время, линейное по длине выборки, and сильное ограничение — при его выводе предполагается, что признаки независимы. На некоторых данных LNB работает удивительно хорошо, несмотря на явное нарушение гипотезы о независимости признаков. Линейная машина опорных векторов (SVM) считается очень успешным методом, но на больших выборках работает долго. Оба эти метода работают в одном and том же пространстве линейных классификаторов. Идея исследования состоит в том, чтобы путём незначительных поправок LNB приблизить его к SVM по качеству, но без утраты эффективности.
+
-
* '''Data:''' Один из трёх наборов данных, по выбору: классификация текстов на научные and ненаучные, классификация авторефератов по областям науки, классификация кодограмм ЭКГ на больных and здоровых.
+
-
* '''References:''':
+
-
*# ''Larsen'' (2005) Generalized Naive Bayes Classifiers.
+
-
*# ''Abraham, Simha, Iyengar'' (2009) Effective Discretization and Hybrid feature selection using Naïve Bayesian classifier for Medical datamining.
+
-
*# ''Lutu'' (2013) Fast Feature Selection for Naive Bayes Classification in Data Stream Mining.
+
-
*# ''Zaidi, Carman, Cerquides, Webb'' (2014) Naive-Bayes Inspired Effective Pre-Conditioner for Speeding-up Logistic Regression.
+
-
*# + спросить у [[Участник:Vokov|Vorontsov K. V.а]].
+
-
* '''Basic algorithm:''' любые готовые реализации LNB and SVM. Плюс наивный отбор признаков для LNB.
+
-
* '''Solution:''' Выводим поправочные формулы для весов LNB при использовании margin-maximization регуляризатора, аналогичного SVM. Строим итерационный процесс, в котором на каждом шаге вычисляется поправка, ещё немного приближающая LNB к SVM. Строятся ROC-кривые and зависимости Hold-out AUC от номера итерации.
+
-
* '''Novelty:''' Сообщество ML до сих пор не осознало, что любой линейный классификатор эквивалентен какому-то наивному байесовскому.
+
-
* '''consultant:''' Михаил Усков. '''Гиперconsultant:''' [[Участник:Vokov|Vorontsov K. V.]].
+
-
 
+
-
=== Task 15 ===
+
-
* '''Name:''' Тематическая модель интересов постоянных пользователей мобильного приложения.
+
-
* '''Task''': Мобильное приложение для изучения английских слов предлагает пользователю слова одно за другим. Пользователь может либо добавить слово к изучаемым, либо откинуть. Чтобы начать учить слова, нужно набрать, как минимум, 10 слов. Требуется построить вероятностную модель генерации слов, адаптирующуюся под интересы пользователя.
+
-
* '''Data:''' Для каждого пользователя имеются списки добавленных and откинутых слов. Кроме того, предполагается использовать большую внешнюю коллекцию текстов, например, Википедию, для устойчивого определения тематики.
+
-
* '''References:''':
+
-
*# ''Vorontsov K. V., Potapenko A. A.'' [[Media:Voron14mlj.pdf|Additive Regularization of Topic Models]] // Machine Learning. Special Issue “Data Analysis and Intelligent Optimization with Applications”. 2014. [[Media:Voron14mlj-rus.pdf|Русский перевод]]
+
-
*# + попросить у Vorontsov K. V.а
+
-
* '''Basic algorithm:''' Алгоритм случайного отбора слов.
+
-
* '''Solution:''' Тематическая модель для каждого пользователя определяет тематический профиль его интересов p(t|u). Для генерации слов используются распределения слов из распределений p(w|t) тем данного пользователя. Строятся зависимости функционалов качества тематической модели от номера итерации. Основной функционал качества — способность модели предсказывать, какие слова пользователь оставит, а какие откинет.
+
-
* '''Novelty:''' Особенностью модели является наличие откинутых слов. Разработанные методы могут быть также применены в рекомендательных системах с лайками and дизлайками.
+
-
* '''consultant:''' Виктор Сафронов. '''Гиперconsultant:''' [[Участник:Vokov|Vorontsov K. V.]].
+
-
 
+
-
=2015=
+
-
 
+
-
{|class="wikitable"
+
-
|-
+
-
! Author
+
-
! Topic
+
-
! Link
+
-
! Consultant
+
-
! Reviewer
+
-
! DZ-1
+
-
! DZ-2 (Problem number)
+
-
! Letters
+
-
! Sum
+
-
! Grade
+
-
|-
+
-
|Бернштейн Юлия
+
-
|Methods for determining fibrinolysis characteristics from a sequence of in vitro blood images
+
-
 
+
-
|Матвеев И. А.
+
-
|Соломатин
+
-
|1
+
-
|3 (8)
+
-
|AILSBRCVTDE
+
-
|11
+
-
|10
+
-
|-
+
-
|Бочкарев Артем
+
-
|Structured learning for model generation
+
-
|[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Bochkarev2015StructuredLearning/] (no code), [http://svn.code.sf.net/p/mlalgorithms/code/Group274/Bochkarev2015StructuredLearning/doc/Bochakrev2015StructuredLearning.pdf?format=raw paper], [http://svn.code.sf.net/p/mlalgorithms/code/Group274/Bochkarev2015StructuredLearning/doc/presentation.pdf?format=raw slides]
+
-
|[[Участник:Varf_Ann|Варфоломеева Анна]], [[Участник:Oleg_Bakhteev|Бахтеев Олег]]
+
-
|Исаченко
+
-
|2
+
-
|2 (7)
+
-
|A+I++LS+BRCVT+DS
+
-
|9.25
+
-
|10
+
-
|-
+
-
|Гончаров Алексей
+
-
|Metric classification of time series
+
-
|[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Goncharov2015MetricClassification/code code],
+
-
[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Goncharov2015MetricClassification/doc/Goncharov2015MetricClassification.pdf?format=raw paper],
+
-
[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Goncharov2015MetricClassification/doc/GoncharovAlexey2015PresentationMetricClassification.pdf?format=raw slides]
+
-
|[[Участник:Mpopova|Maria Popova]]
+
-
|Задаянчук
+
-
|1.5
+
-
|1 (4)
+
-
|AILSBRCVTDSW
+
-
|12
+
-
|10
+
-
|-
+
-
|Двинских Дарина
+
-
|Improving forecasting quality using product groups
+
-
|[https://svn.code.sf.net/p/mlalgorithms/code/Group274/Dvinskikh2015DemandForecasting/code code],
+
-
[https://svn.code.sf.net/p/mlalgorithms/code/Group274/Dvinskikh2015DemandForecasting/doc/DvinskikhDemandForecasting.pdf paper],
+
-
[https://svn.code.sf.net/p/mlalgorithms/code/Group274/Dvinskikh2015DemandForecasting/doc/Dvinskikh.Presentation.pdf slides]
+
-
|Каневский Д. Ю.
+
-
|Смирнов
+
-
|0.5
+
-
|3 (7)
+
-
|AILSBRCVTDEHS
+
-
|14
+
-
|10
+
-
|-
+
-
|Ефимов Юрий
+
-
|Finding the outer and inner iris boundaries in an eye image by the paired-gradient method
+
-
|[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Efimov2015IrisBorderRecognition/code code],
+
-
[https://svn.code.sf.net/p/mlalgorithms/code/Group274/Efimov2015IrisBorderRecognition/doc/Efimov2015IrisBorderRecognition.pdf?format=raw paper],
+
-
[https://svn.code.sf.net/p/mlalgorithms/code/Group274/Efimov2015IrisBorderRecognition/doc/15_presentation.pdf?format=raw slides]
+
-
|Матвеев И. А.
+
-
|Нейчев
+
-
|
+
-
|
+
-
|AILSBRCVTDEW
+
-
|12
+
-
|10
+
-
|-
+
-
|Жариков Илья
+
-
|Checking the compliance of an electrocardiograph with the requirements of the "Screenfax" diagnostic system and assessing the quality of electrocardiograms.
+
-
|[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Zharikov2015ECGVerification/code code], [http://svn.code.sf.net/p/mlalgorithms/code/Group274/Zharikov2015ECGVerification/doc/Zharikov2015ECGVerification.pdf?format=raw paper], [http://svn.code.sf.net/p/mlalgorithms/code/Group274/Zharikov2015ECGVerification/doc/Zharikov2015Presentation.pdf?format=raw slides]
+
-
|Ишкина Шаура
+
-
|Бочкарев
+
-
|3.5
+
-
|3 (5)
+
-
|AIL+SBRCVTDEHSW
+
-
|14.25
+
-
|10
+
-
|-
+
-
|Задаянчук Андрей
+
-
|Selecting an optimal model for physical activity classification
+
-
|[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Zadayanchuk2015OptimalNN/code code],
+
-
[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Zadayanchuk2015OptimalNN/doc/Zadayanchuk2015OptimalNN.pdf paper],
+
-
[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Zadayanchuk2015OptimalNN/doc/Zadayanchuk2015OptimalNNpresentation.pdf slides]
+
-
|[[Участник:Mpopova|Maria Popova]]
+
-
|Гончаров
+
-
|2
+
-
|0 (17)
+
-
|AI-LSB+RCVTD
+
-
|10
+
-
|10
+
-
|-
+
-
|Златов Александр
+
-
|Building a hierarchical model of a large conference
+
-
||[https://svn.code.sf.net/p/mlalgorithms/code/Group274/Zlatov2015ConferenceModel/code code],
+
-
[https://svn.code.sf.net/p/mlalgorithms/code/Group274/Zlatov2015ConferenceModel/doc/ConferenceModel.pdf?format=raw paper],
+
-
[https://svn.code.sf.net/p/mlalgorithms/code/Group274/Zlatov2015ConferenceModel/doc/Zlatov2015ConferenceModelPresentation.pdf?format=raw slides]
+
-
|Арсентий Кузьмин
+
-
|Двинских
+
-
|1.5
+
-
|3 (14)
+
-
|AI+L+SBRC++V+TDESW
+
-
|14.25
+
-
|10
+
-
|-
+
-
|Isachenko Roman
+
-
|Metric learning and dimensionality reduction in time series clustering problems
+
-
|[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Isachenko2015MetricLearning/code code], [http://svn.code.sf.net/p/mlalgorithms/code/Group274/Isachenko2015MetricLearning/doc/Isachenko2015MetricLearning.pdf?format=raw paper], [http://svn.code.sf.net/p/mlalgorithms/code/Group274/Isachenko2015MetricLearning/doc/Isachenko2015MLPresentation.pdf?format=raw slides]
+
-
|[[Участник:Katrutsa|Катруца Александр]]
+
-
|Жариков
+
-
|3.5
+
-
|3 (14)
+
-
|A-I+L+S-BR+CVTDEHSW
+
-
|14.25
+
-
|10
+
-
|-
+
-
|Нейчев Радослав
+
-
|Feature selection in time series forecasting with exogenous factors
+
-
|[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Neychev2015FeatureSelection/code code], [http://svn.code.sf.net/p/mlalgorithms/code/Group274/Neychev2015FeatureSelection/doc/Neychev2015FeatureSelection.pdf?format=raw paper], [http://svn.code.sf.net/p/mlalgorithms/code/Group274/Neychev2015FeatureSelection/doc/Neychev2015FSPresentation.pdf?format=raw slides]
+
-
|[[Участник:Katrutsa|Катруца Александр]]
+
-
|Ефимов
+
-
|1
+
-
|3 (9)
+
-
|AI-L-SBRCVTDEHSW
+
-
|13.5
+
-
|10
+
-
|-
+
-
|Подкопаев Александр
+
-
|Prediction of protein quaternary structures
+
-
|[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Podkopaev2015ProteinStructures/code code],
+
-
[https://svn.code.sf.net/p/mlalgorithms/code/Group274/Podkopaev2015ProteinStructures/doc/Podkopaev2015ProteinStructures.pdf?format=raw paper],
+
-
[https://svn.code.sf.net/p/mlalgorithms/code/Group274/Podkopaev2015ProteinStructures/doc/Podkopaev2015ProteinStructuresPresentation.pdf?format=raw slides]
+
-
|Ю. В. Максимов
+
-
|Решетова
+
-
|3.5
+
-
|3 (11)
+
-
|AILS+B+RCVTDEHS
+
-
|13.5
+
-
|10
+
-
|-
+
-
|Решетова Дарья
+
-
|Multiclass classification methods with improved convergence bounds in semi-supervised learning problems
+
-
|[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Reshetova2015MetricLearning/code code],
+
-
[https://svn.code.sf.net/p/mlalgorithms/code/Group274/Reshetova2015MetricLearning/doc/Reshetova2015MulticlussClussification.pdf?format=raw paper],
+
-
[https://svn.code.sf.net/p/mlalgorithms/code/Group274/Reshetova2015MetricLearning/doc/presentation.pdf?format=raw slides]
+
-
|Максимов Юрий
+
-
|Камзолов
+
-
|2.5
+
-
|3 (10)
+
-
|AIL++SB+RCVT++DEHS-
+
-
|14
+
-
|10
+
-
|-
+
-
|Смирнов Евгений
+
-
|Thematic model of the interests of regular users of the mobile application
+
-
|[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Smirnov2015TopicModeling/Code code], [http://svn.code.sf.net/p/mlalgorithms/code/Group274/Smirnov2015TopicModeling/doc/Smirnov2015TopicModeling.pdf?format=raw paper], [http://svn.code.sf.net/p/mlalgorithms/code/Group274/Smirnov2015TopicModeling/doc/Smirnov2015Presentation.pdf?format=raw slides]
+
-
|Виктор Сафронов
+
-
|Златов
+
-
|1
+
-
|1 (4)
+
-
|AILSBRCVTWDE
+
-
|11.25
+
-
|10
+
-
|-
+
-
|Соломатин Иван
+
-
|Detecting the occluded region of the iris with a classifier of local texture features
+
-
|[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Solomatin2015EESLocalization/code code], [http://svn.code.sf.net/p/mlalgorithms/code/Group274/Solomatin2015EESLocalization/doc/article.pdf?format=raw paper], [http://svn.code.sf.net/p/mlalgorithms/code/Group274/Solomatin2015EESLocalization/doc/Solomatin.EESLocalisation.Presentation.pdf?format=raw slides]
+
-
|Матвеев И. А.
+
-
|Бернштейн
+
-
|
+
-
|3 (9)
+
-
|AILSBRCVTDE
+
-
|11
+
-
|10
+
-
|-
+
-
|Черных Владимир
+
-
|Testing non-parametric time series forecasting algorithms under non-stationary conditions
+
-
|[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Chernykh2015TimeSeriesPrediction/code code],
+
-
[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Chernykh2015TimeSeriesPrediction/doc/SteninaChernykh2015ArimaHistForecast.pdf?format=raw paper],
+
-
[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Chernykh2015TimeSeriesPrediction/doc/presentation/Chernykh2015Presentation.pdf?format=raw slides]
+
-
|[[Участник:Medvmasha|Стенина Мария]]
+
-
|Шишковец
+
-
|3.5
+
-
|3 (4)
+
-
|A+I+LSBRCVT+DE++H++
+
-
|13.75
+
-
|10
+
-
|-
+
-
|Шишковец Светлана
+
-
|Regularization of the linear naive Bayes classifier.
+
-
|[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Shishkovets2015NaivBayes/code code],
+
-
[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Shishkovets2015NaivBayes/doc/Shishkovets2015NaivBayes.pdf?format=raw paper], [http://svn.code.sf.net/p/mlalgorithms/code/Group274/Shishkovets2015NaivBayes/doc/Shishkovets_Presentation.pdf?format=raw slides]
+
-
|[[Участник:Uskov Mikhail|Михаил Усков]], [[Участник:Vokov|Константин Воронцов]]
+
-
|Черных
+
-
|3.5
+
-
|2 (9)
+
-
|A+I+L+SBR+CV+TD+E+H+S
+
-
|15
+
-
|10
+
-
|-
+
-
|Камзолов Дмитрий
+
-
|New algorithms for the web page ranking problem
+
-
|—
+
-
|Александр Гасников, Yuri Maksimov
+
-
|Подкопаев
+
-
|
+
-
|
+
-
|AILSB+RCVT+DEHS--
+
-
|13
+
-
|8
+
-
|-
+
-
|Sukhareva Anzhelika
+
-
|Classification of scientific texts by field of knowledge
+
-
|[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Sukhareva2015TextClassification/code code],
+
-
[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Sukhareva2015TextClassification/doc/Sukhareva2015TextClassification.pdf?format=raw paper],
+
-
[http://svn.code.sf.net/p/mlalgorithms/code/Group274/Sukhareva2015TextClassification/doc/Sukhareva_Presentation.pdf?format=raw slides]
+
-
| [[Участник:Sidious|Sergey Tsarkov]]
+
-
|
+
-
|0.5
+
-
|
+
-
|AILSBRCVTDEH
+
-
|
+
-
|9
+
-
|-
+
-
|}
+
-
 
+
-
 
+
-
=== Task 1 ===
+
-
* '''Name:''' Improving demand forecasting quality using product groups
+
-
* '''Task:'''
+
-
Given:
+
-
*# Time series of sales for several groups of products in one hypermarket. For each product, the periods of shortage, the periods when calendar holidays affect demand, and the periods of marketing promotions are also known. A product classifier is also known: a tree of product groups whose leaves are the products themselves.
+
-
*# The forecasting algorithm used to build demand forecasts for these products: self-adaptive exponential smoothing (the Trigg-Leach model, see [1])
+
-
*# The loss function used to measure forecast quality: MAPE.
+
-
*# Forecasting requirements: forecasts must be built weekly, 4 weeks ahead (at the beginning of the current week, forecast the total demand for the next week and for the weeks one, two, and three weeks after it).
+
-
 
+
-
Hypothesis: demand for individual products is too unstable to reveal their characteristic seasonality. It is proposed to use data on product groups to determine the seasonality parameters more accurately.
+
-
Note: other ways of improving forecast quality by working with product groups are also possible.
+
-
The task is to improve forecast quality within the stated problem, relative to the basic algorithm, by accounting for the effect of product substitutability.
+
-
The result is considered achieved if a statistically significant improvement in quality is shown when a series of forecasts (at least 20) is built for each time series using cross-validation.
+
-
* '''Data:'''
+
-
*# Sales data for several product groups in a hypermarket of a large retail chain: https://drive.google.com/file/d/0B5YjPespcL83X3pHaE1aRzBUaDg/view?usp=sharing
+
-
* '''References:'''
+
-
*# Lukashin Yu. P. Adaptive methods of short-term time series forecasting. Moscow: Finansy i Statistika, 2003 (in Russian).
+
-
*# http://www.machinelearning.ru/wiki/index.php?title=%D0%9C%D0%BE%D0%B4%D0%B5%D0%BB%D1%8C_%D0%A2%D1%80%D0%B8%D0%B3%D0%B3%D0%B0-%D0%9B%D0%B8%D1%87%D0%B0
+
-
*# Nitin Patel, Mahesh Kumar, Rama Ramakrishnan. Clustering models to improve forecasts in retail merchandising. http://www.cytel.com/Papers/INFORMS_Prac_%2004.pdf
+
-
*# Kumar M., Error-based Clustering and Its Application to Sales Forecasting in Retail Merchandising. PhD Thesis. http://books.google.ru/books/about/Error_based_Clustering_and_Its_Applicati.html?id=6252NwAACAAJ&redir_esc=y
+
-
* '''Basic algorithm:''' It is proposed to use the seasonality model [3] in combination with the Trigg-Leach model as the forecasting algorithm for the deseasonalized series ([1] and [2]). Three variants of the algorithm are possible, depending on how seasonality is estimated:
+
-
*# Seasonality is estimated from the sales series itself. For products with a "short" history, seasonality is not estimated.
+
-
*# Seasonality is estimated from the product group, based on the product group classifier (the lowest level of the classifier)
+
-
*# Seasonality is estimated from clusters, following the methodology of [3], [4].
+
-
* '''Solution:''' It is required to implement the combination of the seasonality model [3] and the Trigg-Leach model as the forecasting algorithm for the deseasonalized series ([1] and [2]), with the three variants of seasonality analysis described above. When building seasonal profiles, the periods of marketing promotions must be excluded (otherwise seasonality may be substantially distorted). A series of experiments with quality analysis on real data will then be needed. In the quality analysis, the periods of holidays and marketing promotions may be excluded. Depending on the results of the experiments, the clustering algorithm may need to be adapted.
+
-
* '''Novelty:''' Construction of a self-adaptive forecasting algorithm that accounts for seasonality identified by cluster analysis.
+
-
* '''consultant:''' Kanevskiy D. Yu.
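
The basic algorithm and the loss function named in Task 1 can be sketched as follows. This is a minimal illustration of one common form of the Trigg-Leach adaptive-response-rate smoothing, not the course's reference implementation; the function names, the smoothing constant <code>gamma</code>, and the choice to initialise the forecast with the first observation are assumptions made for the example.

```python
def trigg_leach_forecasts(series, gamma=0.2):
    """One-step-ahead forecasts; forecasts[t] is the forecast made for series[t].

    gamma (assumed value) smooths the error and the absolute error; the
    adaptive rate alpha is the absolute value of their ratio (tracking signal).
    """
    forecast = series[0]          # assumption: start from the first observation
    smoothed_err = 0.0
    smoothed_abs_err = 0.0
    forecasts = []
    for y in series:
        forecasts.append(forecast)
        err = y - forecast
        smoothed_err = gamma * err + (1 - gamma) * smoothed_err
        smoothed_abs_err = gamma * abs(err) + (1 - gamma) * smoothed_abs_err
        # Tracking signal lies in [0, 1]; fall back to gamma while no error has been seen.
        alpha = abs(smoothed_err / smoothed_abs_err) if smoothed_abs_err > 0 else gamma
        forecast = forecast + alpha * err
    return forecasts


def mape(actual, predicted):
    """Mean absolute percentage error, skipping zero actual values."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when all actual values are zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)
```

A seasonal variant would first divide the series by a seasonal profile estimated by one of the three variants above and apply the smoothing to the deseasonalized series.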
+
-
 
+
-
=== Task 2 ===
+
-
* '''Name:''' Study of the relationship between cancer incidence and the environmental situation using a spatio-temporal sample
+
-
* '''Task:''' Given is a matrix with assessments of the environmental situation and data on the average cancer incidence for each district of the Rostov region over several years. The environmental assessments contain a significant amount of noise and are made on rank scales. It is required to build a regression model for estimating the number of cancer cases that accounts for the environmental situation in the district, its adjacency to other districts, and the trend of the parameters over the time series.
+
-
* '''Data:''' A table with data on the environmental situation and the number of cancer cases in the Rostov region.
+
-
* '''References:'''
+
-
** http://www.scielosp.org/pdf/aiss/v47n2/v47n2a10.pdf - Ecological studies of cancer incidence in an area interested by dumping waste sites in Campania (Italy)
+
-
** http://lasi.lynchburg.edu/shahady_t/public/Breast%20Cancer.pdf - Incidence of human cancer in correlation with ecological integrity in a metropolitan population
+
-
** http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/SUBBARAO1/HeivReview.pdf - Heteroscedastic Errors-in-Variables Regression
+
-
** http://en.wikipedia.org/wiki/Errors-in-variables_models - Wikipedia: errors-in-variables models
+
-
** http://www.cardiff.ac.uk/maths/resources/Gillard_Tech_Report.pdf - An Historical Overview of Linear Regression with Errors in both Variables
+
-
** http://arxiv.org/pdf/1212.5049v1.pdf - A Partial Least Squares Algorithm Handling Ordinal Variables Also In Presence Of A Small Number Of Categories
+
-
** [https://ru.wikipedia.org/wiki/%D0%A0%D0%B0%D1%81%D1%81%D1%82%D0%BE%D1%8F%D0%BD%D0%B8%D0%B5_%D0%9C%D0%B0%D1%85%D0%B0%D0%BB%D0%B0%D0%BD%D0%BE%D0%B1%D0%B8%D1%81%D0%B0] - Wikipedia: Mahalanobis distance
+
-
** http://see.stanford.edu/materials/aimlcs229/cs229-hmm.pdf - Hidden Markov Models Fundamentals
+
-
* '''Basic algorithm:''' No comparison with a basic algorithm is planned.
+
-
* '''Solution:''' One of the regression algorithms from the review (item 3 of the references). A transformation of ordinal features into linear ones can be found in item 4 of the references.
+
-
* '''Novelty:''' Existing works mostly use only sets of features and ignore geographic adjacency to polluted districts and the dynamics of environmental change; this work proposes to analyze the problem with these factors taken into account.
+
-
* '''consultant:''' Oleg Bakhteev.
+
-
 
+
-
=== Task 3 ===
+
-
* '''Name:''' Estimating a sparse covariance matrix for nonlinear models (neural networks).
+
-
* '''Task''': Propose a method for estimating the covariance matrix of the parameters of a general model for the cases of linear regression, logistic regression, and general nonlinear models, including neural networks. Propose a way to account for the structure of the matrix (sparsity, dependencies between coefficients, etc.)
+
-
* '''Data:''' Synthetic data and tests.
+
-
* '''References:'''
+
-
** Zaitsev A. A., Strijov V. V., Tokmakova A. A. [http://strijov.com/papers/ZaytsevStrijovTokmakova2012Likelihood_Preprint.pdf Estimation of regression model hyperparameters by the maximum likelihood method] // Informatsionnye Tekhnologii, 2013, 2. Pp. 11-15 (in Russian).
+
-
** Kuznetsov M.P., Tokmakova A.A., Strijov V.V. [http://strijov.com/papers/HyperOptimizationEng.pdf Analytic and stochastic methods of structure parameter estimation] // Preprint, 2015.
+
-
** Aduenko A. A. Presentation on Evidence, 2015. [[Медиа:aduenko_presentation_russian.pdf|aduenko_presentation_russian.pdf]]
+
-
** Bishop C. M. Pattern Recognition and Machine Learning, pp. 161-172, 2006.
+
-
* '''Basic algorithm:''' Estimation of a diagonal matrix, see the folder MLAlgorithms/HyperOptimization.
+
-
* '''Solution:'''
+
-
* '''Novelty:''' A fast algorithm is proposed for estimating a general covariance matrix for nonlinear models; the properties of sparse matrices are studied.
+
-
* '''consultant:''' Alexander Aduenko.
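
As a reference point for Task 3, the standard Bayesian result for the linear case (see the Bishop reference above) can be written out. The notation is an assumption introduced here and is not part of the task statement: <tex>X</tex> is the design matrix, <tex>\mathbf{y}</tex> the target vector, <tex>\beta</tex> the noise precision, and <tex>\mathbf{w}\sim\mathcal{N}(\mathbf{0},A^{-1})</tex> the prior on the parameters.

<tex>
\Sigma_{\mathrm{post}} = \left(A + \beta X^{T}X\right)^{-1}, \qquad \mathbf{w}_{\mathrm{MP}} = \beta\,\Sigma_{\mathrm{post}}X^{T}\mathbf{y}.
</tex>

For a nonlinear model the Laplace approximation gives the analogous estimate <tex>\Sigma_{\mathrm{post}}^{-1}\approx A + H</tex>, where <tex>H</tex> is the Hessian of the negative log-likelihood at <tex>\mathbf{w}_{\mathrm{MP}}</tex>. A diagonal <tex>A</tex> presumably corresponds to the basic algorithm named above, while a sparse or otherwise structured <tex>A</tex> is the case the task asks to study.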
+
-
 
+
-
=== Task 6 ===
+
-
* '''Name:''' Prediction of protein quaternary structures: leveling
+
-
* '''Task:''' The task is to predict the packing of protein molecules into a multimeric complex in the rigid-body approximation. One formulation of the problem is a nonconvex optimization problem.
+
-
This formulation should be studied and a solution algorithm proposed.
+
-
 
+
-
Suppose we have <tex>N</tex> proteins in an assembly, such that each protein <tex>i</tex> can be located in one of <tex>P</tex> positions <tex>x_{i}^{p}</tex>, where <tex>N \sim 10</tex> and <tex>P \sim 100</tex>. To each pair of vectors <tex>x_{i}^{p}</tex> and <tex>x_{j}^{q}</tex> we can assign an energy, which in the simplest approximation is the overlap integral; these pairwise energies form the matrix <tex>Q_{0}</tex>. Each protein position also has an associated score, and the scores form the vector <tex>b_{0}</tex>.
+
-
Thus, the optimal packing problem can be formulated as
+
-
 
+
-
<tex>
+
-
\begin{align}
+
-
x^{T}Q_{0}x+b_{0}^{T}x &\rightarrow& \textrm{min}\\
+
-
\textrm{s.t.} &&\left\Vert x^{k}\right\Vert _{\infty}=1\;\forall k \\
+
-
&& x_{i}^{k}\geq0\;\forall i,k
+
-
\end{align}
+
-
</tex>
+
-
+
-
* '''Data:''' Assembled from one of the standard complexes solved by electron microscopy. Energies and overlap integrals are computed with a modification of one of the standard packages, for example [http://nano-d.inrialpes.fr/software/hermitefit/ HermiteFit]. The data are generated in about 1 minute; modifying the code and preparing the data will take about 1 week.
+
-
* '''References:''' Yu. E. Nesterov. Introduction to Convex Optimization (available on the PreMoLab website)
+
-
* '''Notes on the code:''' [[Медиа:MaximovProgrammingRequiremets.pdf|Notes on the software implementation]]
+
-
* '''Basic algorithm:''' We would like to try convex relaxations.
+
-
* '''Novelty:''' Convex relaxations have not previously been applied to such problems on protein data.
+
-
* '''consultant:''' Yu. V. Maksimov
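
The objective of Task 6 can be made concrete with a small sketch. It assumes, for illustration only, that each block <tex>x^{k}</tex> is a one-hot indicator of the chosen position (which satisfies both stated constraints) and that <tex>Q_{0}</tex> and <tex>b_{0}</tex> are indexed by <code>protein * n_positions + position</code>; the function names are hypothetical. The exhaustive search is feasible only for toy sizes and is shown as a correctness baseline for a relaxation, not as a proposed method.

```python
from itertools import product


def packing_energy(Q, b, choice, n_positions):
    """Value of x^T Q x + b^T x for a one-hot x encoded by `choice`.

    choice[i] is the position index selected for protein i.
    """
    idx = [i * n_positions + p for i, p in enumerate(choice)]
    quadratic = sum(Q[r][c] for r in idx for c in idx)
    linear = sum(b[r] for r in idx)
    return quadratic + linear


def brute_force_packing(Q, b, n_proteins, n_positions):
    """Enumerate all n_positions ** n_proteins assignments (toy sizes only)."""
    best = min(
        product(range(n_positions), repeat=n_proteins),
        key=lambda choice: packing_energy(Q, b, choice, n_positions),
    )
    return best, packing_energy(Q, b, best, n_positions)
```

With <tex>N \sim 10</tex> and <tex>P \sim 100</tex> the search space has about <tex>100^{10}</tex> points, which is why the task looks at relaxations instead.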
+
-
 
+
-
=== Task 8 ===
+
-
* '''Name:''' Structural learning in model generation
+
-
* '''Task''': The problem of finding a ranking function for information retrieval is solved. The search is performed among nonparametric functions (structures) generated by a grammar of the form G: g → B(g, g) | U(g) | S, where B is a set of binary operations {+, -, *, /}, U is a set of unary operations {-(), sqrt, log, exp}, and S is a set of variables and parameters {x, y, k}. It is proposed to generate the ranking model in two stages, using the history of model structure recovery as the training sample.
+
-
* '''Data:''' Subcollections of TREC.
+
-
* Description of the data collection used to evaluate the functions and of the evaluation procedure: [http://svn.code.sf.net/p/mlalgorithms/code/Group174/Kulunchakov2014RankinBySimpleFun/doc/Kulunchakov2014RankingBySimpleFun.pdf?format=raw pdf]
+
* '''References:'''
* '''References:'''
-
** Jaakkola T. Scaled structured prediction.
+
*# Vorontsov K. V., Potapenko A. A. [[Media:Voron14mlj.pdf|Additive Regularization of Topic Models]] // Machine Learning. Special Issue “Data Analysis and Intelligent Optimization with Applications”. 2014. [[Media:Voron14mlj-rus.pdf|Russian translation]]
-
** [http://www.youtube.com/watch?v=LbsBguCUFEc Lecture by Tommi Jaakkola, “Scaling structured prediction”]
+
* '''Base algorithm:''' Random word selection algorithm.
-
** ''Find all works by TJ's students on this topic.''
+
* '''Solution:''' The topic model for each user determines the topic profile of the user's interests p(t|u). Words are generated from the distributions p(w|t) of the topics of the given user. The quality functionals of the topic model are plotted against the iteration number. The main quality functional is the ability of the model to predict which words the user will keep and which ones they will discard.
-
** Varfolomeeva A. A. Bachelor's thesis, in MLAlgorithms/BSThesis/Varfolomeeva
+
* '''Novelty:''' A feature of the model is the presence of discarded words. The developed methods can also be applied in recommender systems with likes and dislikes.
-
* '''Basic algorithm:''' Parantapa, BM25: models for comparison.
+
* '''consultant:''' Viktor Safronov. '''Hyperconsultant:''' [[User:Vokov|Vorontsov K. V.]].
-
* '''Solution:''' It is proposed to cluster the collection and generate models for the clusters of documents, and then to use structural learning to find models that generalize unions of clusters, up to the whole collection.
+
-
* '''Novelty:''' Ranking functions are found that are not inferior in quality to those used in practice.
+
-
* '''consultant:''' Anna Varfolomeeva, Oleg Bakhteev
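
The grammar G: g → B(g, g) | U(g) | S from Task 8 can be illustrated by a random generator of model structures. This is a sketch only: the nested-tuple tree representation, the uniform choice between rules, and the depth cap are assumptions made for the example, and the function names are hypothetical.

```python
import random

BINARY_OPS = ["+", "-", "*", "/"]        # the set B
UNARY_OPS = ["-", "sqrt", "log", "exp"]  # the set U
SYMBOLS = ["x", "y", "k"]                # the set S


def generate(rng, max_depth):
    """Return a random expression tree: a symbol or a tuple (op, child, ...)."""
    if max_depth == 0:
        return rng.choice(SYMBOLS)
    rule = rng.choice(["B", "U", "S"])
    if rule == "B":
        return (rng.choice(BINARY_OPS),
                generate(rng, max_depth - 1),
                generate(rng, max_depth - 1))
    if rule == "U":
        return (rng.choice(UNARY_OPS), generate(rng, max_depth - 1))
    return rng.choice(SYMBOLS)


def depth(tree):
    """Number of operation levels above the deepest symbol."""
    if isinstance(tree, str):
        return 0
    return 1 + max(depth(child) for child in tree[1:])


def to_string(tree):
    """Render a tree as an infix/prefix formula string."""
    if isinstance(tree, str):
        return tree
    if len(tree) == 3:
        op, left, right = tree
        return "(" + to_string(left) + " " + op + " " + to_string(right) + ")"
    op, arg = tree
    return op + "(" + to_string(arg) + ")"
```

For example, <code>to_string(generate(random.Random(1), 3))</code> yields one candidate ranking formula over x, y, and k; evaluating such formulas needs guards for division by zero and for the domains of sqrt and log, which are omitted here.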
+
-
 
+
-
=== Task 9 ===
+
-
* '''Name:''' Checking the compliance of an electrocardiograph with the requirements of the Skrinfax diagnostic system and assessing the quality of electrocardiograms.
+
-
* '''Task:''' The task is to check whether an arbitrary electrocardiograph meets the requirements of the Skrinfax diagnostic system [1-4] by comparing electrocardiograms (ECG) of the same patients recorded by both devices in an ABAB scheme, where A is the first device and B is the second. The task of automatically detecting low-quality electrocardiograms that do not meet the requirements of the diagnostic system is also addressed.
+
-
* '''Data:''' The sample consists of records with ECG values recorded by the device under verification and by the device used in the Skrinfax diagnostic system (the data, with a detailed description of the record format, will be provided to whoever chooses the task). To test the R-peak detection and noise-level estimation algorithms, one can use http://www.physionet.org/physiobank/database/ptbdb/
+
-
* '''References:'''
+
-
*# Information portal of the Skrinfax diagnostic system. URL: http://skrinfax.ru/автор-метода/
+
-
*# [[Технология информационного анализа электрокардиосигналов|Technology of information analysis of electrocardiosignals]]
+
-
*# Uspenskiy V. M. Information function of the heart. Theory and practice of diagnosing diseases of internal organs by information analysis of electrocardiosignals. Moscow: Ekonomika i Informatika, 2008. 116 p. (in Russian).
+
-
*# Uspenskiy V. M. Information function of the heart // Klinicheskaya Meditsina. 2008. Vol. 86, No. 5. Pp. 4-13 (in Russian).
+
-
*# Naseri H., Homaeinezhad M.R. Electrocardiogram signal quality assessment using an artificially reconstructed target lead // Computer Methods in Biomechanics and Biomedical Engineering. 2015. Vol.18, No. 10. Pp. 1126-1141.
+
-
*# Zidelmal Z., Amirou A., Ould-Abdeslam D., Moukadem A., Dieterlen A. QRS detection using S-Transform and Shannon energy. // Comput Methods Programs Biomed. 2014. Vol. 116, No. 1. Pp. 1-9. URL: https://yadi.sk/i/-kD00y1VepB3q
+
-
*# Sarfraz M., Li F. F., Khan A. A. Independent Component Analysis Methods to Improve Electrocardiogram Patterns Recognition in the Presence of Non-Trivial Artifacts // Journal of Medical and Bioengineering. 2015. Vol. 4, No. 3. Pp. 221—226. URL: https://yadi.sk/i/-kD00y1VepB3q
+
-
*# Meziane N. et al. Simultaneous comparison of 1 gel with 4 dry electrode types for electrocardiography // Physiol. Meas. 2015. Vol. 36, No. 513.
+
-
*# Allana S., Aversa J., Varghese C., et al. Poor quality electrocardiograms negatively affect the diagnostic accuracy of ST segment elevation myocardial infarction. // J Am Coll Cardiol. 2014. Vol. 63, No. 12_S. doi:10.1016/S0735-1097(14)60172-8.
+
-
* '''Basic algorithm:''' ECG quality assessment: [4]; R-peak detection: [5]; noise-level estimation in the data: [6].
+
-
* '''Solution:''' It is proposed to check whether an arbitrary electrocardiograph meets the requirements of the Skrinfax diagnostic system by constructing permutation statistical tests that compare the values of RR intervals and R amplitudes, as well as the code sequences detected for each disease (these are computed from the amplitudes and intervals). This raises the problem of R-peak detection. Detecting low-quality electrocardiograms raises the problem of noise-level estimation. In addition, ECGs with uninformative amplitude values or a large spread of interval values must be filtered out, because the method of electrocardiosignal analysis is not applicable to the diagnosis of arrhythmia.
+
-
* '''Novelty:''' Checking whether an electrocardiograph meets the requirements of the diagnostic system can be viewed as a problem of comparing ECG recording devices, which arises, for example, when comparing different types of electrodes; the criteria used are the noise level in the electrocardiosignal values, the presence of baseline drift, and some other features [7].
+
-
* '''consultant:''' Ishkina Shaura
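
The permutation tests mentioned in the solution of Task 9 can be sketched as follows. This is a minimal unpaired two-sample version on the difference of mean RR intervals; the test statistic, the number of permutations, and the function name are assumptions made for the example, and a design that respects the ABAB recording scheme would permute device labels within each patient instead.

```python
import random


def permutation_test(sample_a, sample_b, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for the difference of sample means."""
    def mean(values):
        return sum(values) / len(values)

    rng = random.Random(seed)
    observed = abs(mean(sample_a) - mean(sample_b))
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random relabelling of the pooled observations
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed - 1e-12:
            at_least_as_extreme += 1
    # The +1 terms count the observed labelling itself, so p is never zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)
```

A small p-value for RR intervals recorded by devices A and B would indicate that the device under verification measures systematically different intervals.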
+
-
 
+
-
=== Task 12 ===
+
-
* '''Name:''' Metric learning in supervised and semi-supervised learning problems
+
-
* '''Task:''' To implement in software a set of convex and DC optimization methods for the problem of choosing an optimal metric in recognition problems; in other words, to construct a metric such that nearest-neighbor classification achieves high accuracy.
+
-
* '''Data:''' The Birds and Fungus collections of ImageNet with extracted deep features (provided by the consultant). Initial tests can be run on the data available [http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html here]
+
-
* '''References:''' The reference list and a detailed description of the problem are given [[Медиа:Maximov_Metric_Learning%28Strijov_Course%29.pdf| in the file]]
+
-
* '''Notes on the code:''' [[Медиа:MaximovProgrammingRequiremets.pdf|Notes on the software implementation]]
+
-
* '''Basic algorithm:''' 1) a convex relaxation of the problem, solved by an interior-point method via CVX; 2) an SVM on a modified sample consisting of pairs of objects
+
-
* '''consultant:''' Yu. V. Maksimov
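
The role of the metric in Task 12 can be shown with a small sketch of nearest-neighbor classification under a quadratic-form distance <tex>(x-y)^{T}M(x-y)</tex>. The parametrization by a matrix <tex>M</tex> and the function names are assumptions made for the example; how <tex>M</tex> is learned (the convex or DC optimization itself) is not shown.

```python
def metric_distance(x, y, M):
    """Squared distance (x - y)^T M (x - y) for a symmetric PSD matrix M."""
    diff = [xi - yi for xi, yi in zip(x, y)]
    n = len(diff)
    return sum(diff[i] * M[i][j] * diff[j] for i in range(n) for j in range(n))


def nearest_neighbor_label(query, train, M):
    """Label of the training object closest to `query` under the metric M.

    `train` is a list of (feature_vector, label) pairs.
    """
    return min(train, key=lambda item: metric_distance(query, item[0], M))[1]
```

In the toy case of training objects <code>([0, 0], "a")</code> and <code>([1, 3], "b")</code> with query <code>[0, 2]</code>, the identity matrix assigns label "b", while <code>M = [[10, 0], [0, 0.1]]</code> assigns "a"; choosing <tex>M</tex> so that such decisions agree with the true classes is the learning problem.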
+
-
 
+
-
== Plans for next year: ==
+
-
# Expand the MATLAB test and assign it, together with the trial programming exercise, as the first task.
+
-
=2014=
+
 +
==2014==
{|class="wikitable"
{|class="wikitable"
|-
|-
Строка 4856: Строка 4843:
! Grade
! Grade
|-
|-
-
|[[Участник:rgazizullina|Газизуллина Римма]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:rgazizullina Gazizullina Rimma]
-
|Прогнозирование объемов железнодорожных грузоперевозок по парам веток
+
|Forecasting the volume of rail freight traffic by pairs of branches
|[http://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group174/Gazizullina2014RailwayForecasting/], [http://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group174/Gazizullina2014RailwayForecasting/doc/Gazizullina2014RailwayForecasting.pdf?format=raw pdf]
|[http://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group174/Gazizullina2014RailwayForecasting/], [http://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group174/Gazizullina2014RailwayForecasting/doc/Gazizullina2014RailwayForecasting.pdf?format=raw pdf]
-
|[[Участник:Medvmasha|Стенина Мария]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Medvmasha Stenina Maria]
|<tex>\frac{15}{15}+\frac{10}{16}</tex>
|<tex>\frac{15}{15}+\frac{10}{16}</tex>
|[MF]TAI+L+SBR+CV+T>DEH(J)
|[MF]TAI+L+SBR+CV+T>DEH(J)
Строка 4865: Строка 4852:
|10
|10
|-
|-
-
|[[Участник:Agrinchuk|Гринчук Алексей]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Agrinchuk Grinchuk Alexey]
-
|Выбор оптимальных структур прогностических моделей методами структурного обучения
+
|Selection of Optimal Structures of Predictive Models by Structural Learning Methods
|[http://svn.code.sf.net/p/mlalgorithms/code/Group174/Grinchuk2014StructuredPrediction/], [http://svn.code.sf.net/p/mlalgorithms/code/Group174/Grinchuk2014StructuredPrediction/doc/Grinchuk2014StructuredPrediction.pdf?format=raw pdf]
|[http://svn.code.sf.net/p/mlalgorithms/code/Group174/Grinchuk2014StructuredPrediction/], [http://svn.code.sf.net/p/mlalgorithms/code/Group174/Grinchuk2014StructuredPrediction/doc/Grinchuk2014StructuredPrediction.pdf?format=raw pdf]
-
|Варфоломеева Анна
+
|Varfolomeeva Anna
|<tex>\frac{7}{15}+\frac{2}{16}</tex>
|<tex>\frac{7}{15}+\frac{2}{16}</tex>
|[F]TA+I+LSBR+СV+T+D+E(F)
|[F]TA+I+LSBR+СV+T+D+E(F)
Строка 4874: Строка 4861:
|9
|9
|-
|-
-
|[[Участник:Aguschin|Гущин Александр]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Aguschin Gushchin Alexander]
-
|Последовательное порождение существенно нелинейных моделей в Taskх ранжирования документов
+
|Sequential Generation of Essentially Nonlinear Models in Document Ranking Problems
|[http://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group174/Guschin2014FeaturesGeneration/], [http://svn.code.sf.net/p/mlalgorithms/code/Group174/Guschin2014FeaturesGeneration/doc/Guschin2014DocumentRetrieval.pdf?format=raw pdf]
|[http://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group174/Guschin2014FeaturesGeneration/], [http://svn.code.sf.net/p/mlalgorithms/code/Group174/Guschin2014FeaturesGeneration/doc/Guschin2014DocumentRetrieval.pdf?format=raw pdf]
-
|[[Участник:Mikethehuman|Кузнецов Михаил]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Mikethehuman Kuznetsov Mikhail]
|<tex>\frac{5}{15}+\frac{2}{16}</tex>
|<tex>\frac{5}{15}+\frac{2}{16}</tex>
|[F]TAI+L+SBRCVTDEHS(F)
|[F]TAI+L+SBRCVTDEHS(F)
Строка 4883: Строка 4870:
|9
|9
|-
|-
-
|[[Участник:Iefimova|Ефимова Ирина]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Iefimova Efimova Irina]
-
|Дифференциальная диагностика заболеваний по электрокардиограмме
+
|Differential diagnosis of diseases by electrocardiogram
|[http://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group174/Efimova2014DiagnosticsOfDiseases/], [http://svn.code.sf.net/p/mlalgorithms/code/Group174/Efimova2014DiagnosticsOfDiseases/doc/Efimova2014DiagnosticsOfDiseases.pdf?format=raw pdf]
|[http://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group174/Efimova2014DiagnosticsOfDiseases/], [http://svn.code.sf.net/p/mlalgorithms/code/Group174/Efimova2014DiagnosticsOfDiseases/doc/Efimova2014DiagnosticsOfDiseases.pdf?format=raw pdf]
-
|[[Участник:Celyh|Целых Влада]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Celyh Vlada Tselykh]
|<tex>\frac{15}{15}+\frac{12}{16}</tex>
|<tex>\frac{15}{15}+\frac{12}{16}</tex>
|[MF]T+A+I+L+SB++R+CV+TDE+H(J ed)
|[MF]T+A+I+L+SB++R+CV+TDE+H(J ed)
Строка 4892: Строка 4879:
|10
|10
|-
|-
-
|[[Участник:Azhukov|Жуков Андрей]]
+
|[[Участник:Azhukov|Zhukov Andrey]]
-
|Построение рейтингов вузов: панельный анализ and оценка устойчивости
+
|Building University Rankings: Panel Analysis and Stability Assessment
|[http://svn.code.sf.net/p/mlalgorithms/code/Group174/Zhukov2014UniversityRanking/], [http://svn.code.sf.net/p/mlalgorithms/code/Group174/Zhukov2014UniversityRanking/doc/Zhukov2014UniversityRanking.pdf?format=raw pdf]
|[http://svn.code.sf.net/p/mlalgorithms/code/Group174/Zhukov2014UniversityRanking/], [http://svn.code.sf.net/p/mlalgorithms/code/Group174/Zhukov2014UniversityRanking/doc/Zhukov2014UniversityRanking.pdf?format=raw pdf]
-
|[[Участник:Mikethehuman|Кузнецов Михаил]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Mikethehuman Kuznetsov Mikhail]
|<tex>\frac{8}{15}+0</tex>
|<tex>\frac{8}{15}+0</tex>
|[F]TAIL+SBRCVTDEHS(F)
|[F]TAIL+SBRCVTDEHS(F)
Строка 4901: Строка 4888:
|9
|9
|-
|-
-
|[[Участник:Aignatov|Игнатов Андрей]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Aignatov Ignatov Andrey]
-
|Обучение многообразий для прогнозирования наборов квазипериодических временных рядов
+
|Manifold learning for forecasting sets of quasi-periodic time series
|[http://svn.code.sf.net/p/mlalgorithms/code/Group174/Ignatov2014ManifoldsTraining/], [http://svn.code.sf.net/p/mlalgorithms/code/Group174/Ignatov2014ManifoldsTraining/doc/Ignatov2014ManifoldsTraining.pdf?format=raw pdf]
|[http://svn.code.sf.net/p/mlalgorithms/code/Group174/Ignatov2014ManifoldsTraining/], [http://svn.code.sf.net/p/mlalgorithms/code/Group174/Ignatov2014ManifoldsTraining/doc/Ignatov2014ManifoldsTraining.pdf?format=raw pdf]
-
|Ивкин Никита
+
|Ivkin Nikita
|<tex>0+\frac{7}{16}</tex>
|<tex>0+\frac{7}{16}</tex>
|[MF]TA+I+L+S+B+R+C+VTD>E+HS (J if ed)
|[MF]TA+I+L+S+B+R+C+VTD>E+HS (J if ed)
Строка 4910: Строка 4897:
|10
|10
|-
|-
-
|[[Участник:Mkarasikov|Карасиков Михаил]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Mkarasikov Karasikov Mikhail]
-
|Поиск эффективных методов снижения размерности при решении задач мультиклассовой классификации путем её сведения к решению бинарных задач
+
|Search for effective methods of dimensionality reduction in solving problems of multiclass classification by reducing it to solving binary problems
|[http://svn.code.sf.net/p/mlalgorithms/code/Group174/Karasikov2014MulticlassClassification/], [http://svn.code.sf.net/p/mlalgorithms/code/Group174/Karasikov2014MulticlassClassification/doc/Karasikov2014MulticlassClassification.pdf?format=raw pdf]
|[http://svn.code.sf.net/p/mlalgorithms/code/Group174/Karasikov2014MulticlassClassification/], [http://svn.code.sf.net/p/mlalgorithms/code/Group174/Karasikov2014MulticlassClassification/doc/Karasikov2014MulticlassClassification.pdf?format=raw pdf]
-
|Ю.В. Максимов
+
|Yu.V. Maksimov
|<tex>0+0</tex>
|<tex>0+0</tex>
|[MF]TAI+L+SBRC+V+TDESH(J)
|[MF]TAI+L+SBRC+V+TDESH(J)
Строка 4919: Строка 4906:
|10
|10
|-
|-
-
|[[Участник:Кулунчаков|Кулунчаков Андрей]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=%D0%A3%D1%87%D0%B0%D1%81%D1%82%D0%BD%D0%B8%D0%BA:%D0%9A%D1%83%D0%BB%D1%83%D0%BD%D1%87%D0%B0%D0%BA%D0%BE%D0%B2 Kulunchakov Andrey]
-
|Обнаружение изоморфных структур существенно нелинейных прогностических моделей
+
|Detecting Isomorphic Structures of Essentially Nonlinear Predictive Models
|[http://svn.code.sf.net/p/mlalgorithms/code/Group174/Kulunchakov2014IsomorphicStructures/], [http://svn.code.sf.net/p/mlalgorithms/code/Group174/Kulunchakov2014IsomorphicStructures/doc/Kulunchakov2014IsomorphicStructures.pdf?format=raw pdf]
|[http://svn.code.sf.net/p/mlalgorithms/code/Group174/Kulunchakov2014IsomorphicStructures/], [http://svn.code.sf.net/p/mlalgorithms/code/Group174/Kulunchakov2014IsomorphicStructures/doc/Kulunchakov2014IsomorphicStructures.pdf?format=raw pdf]
-
|Сологуб Роман, [[Участник:Mikethehuman|Кузнецов Михаил]]
+
|Sologub Roman, [http://www.machinelearning.ru/wiki/index.php?title=Участник:Mikethehuman Kuznetsov Mikhail]
|<tex>\frac{10}{15}+\frac{14}{16}</tex>
|<tex>\frac{10}{15}+\frac{14}{16}</tex>
|[F]T+AI+L+S+BR+CVT++D+EHS(J ed-ed)
|[F]T+AI+L+S+BR+CVT++D+EHS(J ed-ed)
Строка 4928: Строка 4915:
|10
|10
|-
|-
-
|[[Участник:Alipatova|Липатова Анна]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Alipatova Lipatova Anna]
-
|Обнаружение закономерностей в наборе временных рядов методами структурного обучения
+
|Detecting Patterns in a Set of Time Series by Structural Learning Methods
|[http://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group174/Lipatova2014StructureLearning/], [http://svn.code.sf.net/p/mlalgorithms/code/Group174/Lipatova2014StructureLearning/doc/Lipatova2014StructureLearning.pdf?format=raw pdf]
|[http://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group174/Lipatova2014StructureLearning/], [http://svn.code.sf.net/p/mlalgorithms/code/Group174/Lipatova2014StructureLearning/doc/Lipatova2014StructureLearning.pdf?format=raw pdf]
-
|А. П. Мотренко
+
|A. P. Motrenko
|<tex>\frac{8}{15}+\frac{6}{16}</tex>
|<tex>\frac{8}{15}+\frac{6}{16}</tex>
|[MF]TA+I+LSBR-CVTDE (J when ed)
|[MF]TA+I+LSBR-CVTDE (J when ed)
Строка 4937: Строка 4924:
|10
|10
|-
|-
-
|[[Участник:Nmakarova|Макарова Анастасия]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Nmakarova Makarova Anastasia]
-
|Использование нелинейного прогнозирования при поиске зависимостей между временными рядами
+
|Using non-linear forecasting when looking for dependencies between time series
|[http://svn.code.sf.net/p/mlalgorithms/code/Group174/Makarova2014DynamicTS/], [http://svn.code.sf.net/p/mlalgorithms/code/Group174/Makarova2014DynamicTS/doc/Makarova2014DynamicTS.pdf?format=raw pdf]
|[http://svn.code.sf.net/p/mlalgorithms/code/Group174/Makarova2014DynamicTS/], [http://svn.code.sf.net/p/mlalgorithms/code/Group174/Makarova2014DynamicTS/doc/Makarova2014DynamicTS.pdf?format=raw pdf]
-
|Мотренко Анастасия
+
|A. P. Motrenko
|<tex>0+0</tex>
|<tex>0+0</tex>
|[F]TAI-LSB+R-CVTD>E>(F)
|[F]TAI-LSB+R-CVTD>E>(F)
Строка 4946: Строка 4933:
|9
|9
|-
|-
-
|[[Участник:Aplavin|Плавин Александр]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Aplavin Plavin Alexander]
-
|Оптимизация числа тем в вероятностных тематических моделях с помощью регуляризатора строкового разреживания
+
|Optimizing the Number of Topics in Probabilistic Topic Models with a Row-Sparsifying Regularizer
|[http://svn.code.sf.net/p/mlalgorithms/code/Group174/Plavin2014TopicsNumberOptimization/], [http://svn.code.sf.net/p/mlalgorithms/code/Group174/Plavin2014TopicsNumberOptimization/doc/Plavin2014TopicsNumberOptimization.pdf?format=raw pdf]
|[http://svn.code.sf.net/p/mlalgorithms/code/Group174/Plavin2014TopicsNumberOptimization/], [http://svn.code.sf.net/p/mlalgorithms/code/Group174/Plavin2014TopicsNumberOptimization/doc/Plavin2014TopicsNumberOptimization.pdf?format=raw pdf]
-
|[[Участник:AnyaP|Потапенко Анна]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:AnyaP Potapenko Anna]
|<tex>\frac{13}{15}+\frac{14}{16}</tex>
|<tex>\frac{13}{15}+\frac{14}{16}</tex>
|[F]T+A+I+L+S+BR++CVTD+>>(?)
|[F]T+A+I+L+S+BR++CVTD+>>(?)
Строка 4955: Строка 4942:
|10
|10
|-
|-
-
|[[Участник:Mpopova|Попова Мария]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Mpopova Maria Popova]
-
|Выбор оптимальной модели прогнозирования физической активности человека по измерениям акселерометра
+
|Choosing the optimal model for predicting human physical activity based on accelerometer measurements
|[http://svn.code.sf.net/p/mlalgorithms/code/Group174/Popova2014OptimalModelSelection/], [http://svn.code.sf.net/p/mlalgorithms/code/Group174/Popova2014OptimalModelSelection/doc/Popova2014OptimalModelSelection.pdf?format=raw pdf]
|[http://svn.code.sf.net/p/mlalgorithms/code/Group174/Popova2014OptimalModelSelection/], [http://svn.code.sf.net/p/mlalgorithms/code/Group174/Popova2014OptimalModelSelection/doc/Popova2014OptimalModelSelection.pdf?format=raw pdf]
-
|[[Участник:Aleksandra.Tokmakova|Токмакова Александра]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Aleksandra.Tokmakova Tokmakova Alexandra]
|<tex>\frac{11}{15}+\frac{6}{16}</tex>
|<tex>\frac{11}{15}+\frac{6}{16}</tex>
|[MF]T+AI+L++SB++R+CV+TD+(JV ed)
|[MF]T+AI+L++SB++R+CV+TD+(JV ed)
Строка 4964: Строка 4951:
|10
|10
|-
|-
-
|[[Участник:Mshvets|Швец Михаил]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Mshvets Shvets Mikhail]
-
|Интерпретация мультимоделей при обработке социологических данных
+
|Interpretation of multimodels in the processing of sociological data
|[http://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group174/Shvets2014MultimodelInterpretation/], [http://svn.code.sf.net/p/mlalgorithms/code/Group174/Shvets2014MultimodelInterpretation/doc/Shvets2014MultimodelInterpretation.pdf?format=raw pdf]
|[http://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group174/Shvets2014MultimodelInterpretation/], [http://svn.code.sf.net/p/mlalgorithms/code/Group174/Shvets2014MultimodelInterpretation/doc/Shvets2014MultimodelInterpretation.pdf?format=raw pdf]
-
|[[Участник:Aduenko|Адуенко Александр]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Aduenko Alexander Aduenko]
|<tex>\frac{11}{15}+\frac{4}{16}</tex>
|<tex>\frac{11}{15}+\frac{4}{16}</tex>
|[M+F]T+A+I+L+S+B+R+CVTD+E(F)
|[M+F]T+A+I+L+S+B+R+CVTD+E(F)
Строка 4973: Строка 4960:
|9
|9
|-
|-
-
|[[Участник:Mshinkevich|Шинкевич Михаил]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:Mshinkevich Shinkevich Mikhail]
-
|Влияние регуляризаторов разреживания, сглаживания and декорреляции на устойчивость вероятностной тематической модели
+
|Influence of sparsing, smoothing and decorrelation regularizers on the stability of a probabilistic topic model
|[http://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group174/Shinkevich2014RegularizatorsCombination/], [http://svn.code.sf.net/p/mlalgorithms/code/Group174/Shinkevich2014RegularizatorsCombination/doc/Shinkevich2014RegularizatorsCombination.pdf?format=raw pdf]
|[http://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group174/Shinkevich2014RegularizatorsCombination/], [http://svn.code.sf.net/p/mlalgorithms/code/Group174/Shinkevich2014RegularizatorsCombination/doc/Shinkevich2014RegularizatorsCombination.pdf?format=raw pdf]
-
| Дударенко Марина
+
| Dudarenko Marina
|<tex>\frac{15}{15}+\frac{9}{16}</tex>
|<tex>\frac{15}{15}+\frac{9}{16}</tex>
|[MF]T+AIL+S+BR+CV+T+D+E+H(J ed)
|[MF]T+AIL+S+BR+CV+T+D+E+H(J ed)
Строка 4984: Строка 4971:
|}
|}
-
===1. Оптимизация числа тем в вероятностных тематических моделях с помощью регуляризатора строкового разреживания===
+
===1. 2014===
 +
* Optimizing the Number of Topics in Probabilistic Topic Models with a Row Sparsing Regularizer
 +
* '''Problem:''' The probabilistic topic model describes the probabilities of occurrence of words <tex>w\in W</tex> in documents <tex>d\in D</tex> through latent topics <tex>t\in T</tex>:
 +
<tex> p(w|d) = \sum_{t\in T} p(w|t)p(t|d) = \sum_{t\in T} \phi_{wt}\theta_{td}. </tex> It is required to test the hypothesis that, by imposing constraints on the matrix <tex>\Theta</tex> with a row sparsing regularizer, it is possible to determine the optimal number of topics.
 +
* '''Data:''' The collection of documents is specified by word frequencies. Since solving the problem requires knowing the "true" number of topics, experiments are performed on realistic model or semi-model data.
 +
* '''References:'''
 +
*# [[Media:Task-PTM-Potapenko.pdf| Description of the problem and proposed solutions]]
 +
*# Vorontsov K. V. Additive regularization of thematic models of collections of text documents // Reports of the Russian Academy of Sciences. 2014. - V. 455, No. 3 (in press).
 +
*# Vorontsov K. V. Probabilistic thematic modeling. — 2014. http://www.MachineLearning.ru/wiki/images/2/22/Voron-2013-ptm.pdf
 +
*# Teh Y. W., Jordan M. I., Beal M. J., Blei D. M. Hierarchical Dirichlet processes // Journal of the American Statistical Association. - 2006. - Vol. 101, no. 476. - Pp. 1566–1581.
 +
* '''Basic algorithm:''' Regularized EM-algorithm [2014: Vorontsov] is used to solve the optimization problem. A rational, stochastic or online version of the EM algorithm can be used.
 +
* '''Novelty:''' The hierarchical Dirichlet process (HDP) model [2006: Teh et al.] is commonly used to optimize the number of topics. It determines the number of topics unstably and is difficult both to understand and to implement. Additive Regularization of Topic Models (ARTM) is a new approach to topic modeling that combines versatility, flexibility and simplicity. The problem of optimizing the number of topics has not yet been considered in the framework of ARTM.
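The factorization and the role of row sparsity in <tex>\Theta</tex> can be illustrated with a toy numerical sketch. It is not the regularized EM-algorithm itself: the matrix sizes, the random matrices and the manually zeroed row are assumptions made only for illustration, standing in for the effect the regularizer is hypothesized to produce.

```python
import numpy as np

rng = np.random.default_rng(0)
W, T, D = 6, 4, 5  # toy vocabulary size, number of topics, number of documents

# Columns of phi are p(w|t); columns of theta are p(t|d).
phi = rng.random((W, T))
phi /= phi.sum(axis=0, keepdims=True)

theta = rng.random((T, D))
theta[2, :] = 0.0  # assumption: the regularizer has zeroed out the whole row of topic 2
theta /= theta.sum(axis=0, keepdims=True)

# p(w|d) = sum_t phi_wt * theta_td, written as a matrix product.
p_wd = phi @ theta

# Topics whose row of theta is not entirely zero are the ones still in use.
active_topics = np.flatnonzero(theta.sum(axis=1) > 0)
n_topics = active_topics.size
```

Under these assumptions the number of non-zero rows of <tex>\Theta</tex> serves as the estimate of the number of topics.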
-
'''consultant:''' А.А. Потапенко
+
===2. 2014===
 +
* Differential diagnosis of diseases by electrocardiogram
 +
* '''Problem:''' It is proposed to solve a typical classification problem. The features are 216 characteristics calculated from the electrocardiogram. The classification quality must be evaluated on a held-out control sample by computing the rates of type I and type II errors. A type I error is the assignment of a healthy person to the class of patients; a type II error is the assignment of a patient to the class of healthy people. Preference is given to minimizing type II errors.
 +
* '''Data:''' For each of the 5 diseases, there are 2 types of samples. Reference samples contain more reliable, specially selected cases. The remaining samples contain cases in which the diagnoses were established by doctors less reliably; these samples are proposed to be used for control.
 +
* '''References:'''
 +
*# Vorontsov K. V. Metric classification algorithms. Lectures on machine learning. — 2014. http://www.MachineLearning.ru/wiki/images/c/c3/Voron-ML-Metric-slides.pdf
 +
*# Uspensky V. M. Information function of the heart // Clinical Medicine, 2008. - V. 86, No. 5. - P. 4–13.
 +
*# Uspensky V. M. Information function of the heart. Theory and practice of diagnosing diseases of internal organs by the method of information analysis of electrocardiosignals. - M .: "Economy and information", 2008. - 116 p.
 +
* '''Basic algorithm:''' To solve the problem, it is proposed to use a metric algorithm with greedy feature selection.
 +
* '''Novelty:''' The data were prepared using a unique technology for information analysis of electrocardiosignals, developed by prof. MD V.M.Uspensky. A classification algorithm is proposed and its generalizing ability is investigated.
 +
* '''consultant:''' Vlada Tselykh
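The two error rates described in the problem statement can be computed as in the following minimal sketch. The label convention (1 for a patient, 0 for a healthy person), the function name and the example labels are assumptions made for illustration only.

```python
def error_rates(y_true, y_pred):
    """Return (type I rate, type II rate); label 1 = patient, 0 = healthy."""
    healthy = [p for t, p in zip(y_true, y_pred) if t == 0]
    sick = [p for t, p in zip(y_true, y_pred) if t == 1]
    # Type I: a healthy person is assigned to the class of patients.
    type1 = sum(p == 1 for p in healthy) / len(healthy)
    # Type II: a patient is assigned to the class of healthy people.
    type2 = sum(p == 0 for p in sick) / len(sick)
    return type1, type2


# Hypothetical control sample: four healthy people followed by four patients.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1, 0, 1]
type1, type2 = error_rates(y_true, y_pred)
```

Since preference is given to type II errors, a classifier threshold would normally be chosen to lower `type2` at the cost of a higher `type1`.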
-
'''Task:''' Вероятностная тематическая модель описывает вероятности появления слов <tex>w\in W</tex> в документах <tex>d\in D</tex> через латентные темы <tex>t\in T</tex>:
+
===3. 2014===
 +
* Influence of sparsing, smoothing and decorrelation regularizers on the stability of a probabilistic topic model
 +
* '''Problem:''' The probabilistic topic model describes the probabilities of occurrence of words <tex>w\in W</tex> in documents <tex>d\in D</tex> through latent topics <tex>t\in T</tex>: <tex> p(w|d) = \sum_{t\in T} p(w|t)p(t|d) = \sum_{t\in T} \phi_{wt}\theta_{td}. </tex> The representation of the matrix <tex>\|p(w|d)\|_{W\times D}</tex>
 +
as a product of two smaller matrices <tex>{\Phi=\|\phi_{wt}\|_{W\times T}}</tex> and <tex>{\Theta=\|\theta_{td}\|_{T\times D}}</tex> is not unique: <tex>\Phi \Theta = (\Phi S)(S^{-1}\Theta) = \Phi'\Theta'</tex> for some non-degenerate <tex>S</tex>. It is required to test the hypothesis that, by imposing restrictions on the matrices <tex>\Phi, \Theta</tex> using regularizers,
 +
it is possible to increase the stability of their recovery.
 +
* '''Data:''' The collection of documents is specified by word frequencies. Since solving the problem requires knowing the "true" matrices <tex>\Phi, \Theta</tex>, experiments are performed on realistic model or semi-model data that satisfy the hypotheses of sparseness, weak correlation of topics and the presence of background topics.
 +
* '''References:'''
 +
*# Vorontsov K. V. Additive regularization of thematic models of collections of text documents // Reports of the Russian Academy of Sciences. 2014. - V. 455, No. 3 (in press).
 +
*# Vorontsov K. V. Probabilistic thematic modeling. - 2014. http://www.MachineLearning.ru/wiki/images/2/22/Voron-2013-ptm.pdf.
 +
* '''Basic algorithm:''' Regularized EM-algorithm [2014: Vorontsov] is used to solve the optimization problem. A rational, stochastic or online version of the EM algorithm can be used.
 +
* '''Novelty:''' Additive Regularization of Topic Models (ARTM) was proposed in [2014: Vorontsov] as a universal way to improve the stability and interpretability of topic models. However, the question of which particular combination of regularizers increases stability remains open. This study is aimed at solving this problem.
 +
* '''consultant:''' Marina Dudarenko
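The non-uniqueness of the factorization can be checked numerically. The sketch below uses a permutation matrix as one particular non-degenerate <tex>S</tex> that also keeps both factors stochastic; the sizes and the random matrices are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
W, T, D = 5, 3, 4  # toy vocabulary size, number of topics, number of documents

phi = rng.random((W, T))
phi /= phi.sum(axis=0, keepdims=True)      # columns are p(w|t)
theta = rng.random((T, D))
theta /= theta.sum(axis=0, keepdims=True)  # columns are p(t|d)

# A permutation of topics: a simple non-degenerate S.
S = np.eye(T)[:, [1, 2, 0]]

phi2 = phi @ S                     # Phi' = Phi S
theta2 = np.linalg.inv(S) @ theta  # Theta' = S^{-1} Theta

same_product = np.allclose(phi @ theta, phi2 @ theta2)
different_factors = not np.allclose(phi, phi2)
```

Both pairs of factors reproduce the same <tex>\|p(w|d)\|</tex>, so the data alone cannot distinguish them; this is the instability that the regularizers are hypothesized to reduce.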
-
<tex> p(w|d) = \sum_{t\in T} p(w|t)p(t|d) = \sum_{t\in T} \phi_{wt}\theta_{td}. </tex>
+
===4. 2014===
 +
* Building University Rankings: Panel Analysis and Sustainability Assessment
 +
* '''consultant:''' Kuznetsov Mikhail
 +
* '''Problem:''' University rankings change from year to year. A change may be caused by a poor ranking methodology, by random fluctuations in the institution's performance, or by purposeful changes in the state of the institution. It is required to propose a ranking method that is robust to random changes and makes it possible to interpret a change in the state of the university.
 +
* '''Data:''' Eight years of data for the world's top 100 universities.
 +
* '''References:'''
 +
*# Strijov V.V. Refinement of expert assessments using measured data. Zavodskaya lab. Diagnostics of materials, 2006, 72(7) - 59-64.
 +
*# Strijov V.V. Refinement of Expert assessments in rank scales using measured data. Zavodskaya lab. Diagnostics of materials, 2011, 77(7) - 72-78.
 +
*# Kuznetsov M.P., Strijov V.V. Methods of expert estimations concordance for integral quality estimation // Expert Systems with Applications, 2014.
 +
*# ''Draft PDF article on request.''
 +
* '''Basic algorithm:''' A method for constructing the RUR rating and one of the redundantly stable algorithms for ranking scales.
 +
* '''Novelty:''' The concept of interpretability of a change in rating position is introduced. The problem of selection and optimal locally monotonic correction of indicators is solved. A technique for constructing a rating is proposed that allows interpreting the change in the state of a university for the purpose of monitoring. Option: the inverse control problem is solved, that is, how to change the indicators of the university in order to achieve a given goal.
-
Требуется проверить гипотезу, что,
+
===5. 2014===
-
накладывая ограничения на матрицу <tex>\Theta</tex> с помощью регуляризатора строкового разреживания,
+
* Detecting Patterns in a Set of Time Series by Structural Learning Methods
-
возможно определить оптимальное число тем.
+
* '''consultant:''' A. P. Motrenko
 +
* '''Problem:''' To improve the quality of the time series forecast, it is desirable to use expert statements about the presence of a causal relationship between events. To do this, it is necessary to be able to assess the reliability of expert statements. It is impossible to prove the existence of a causal relationship by statistical methods; the researcher can only check for the presence of a certain structure of relationships. The goal is, based on expert statements about the presence of a connection between events, to examine the time series for various structural relationships and to find the structure that is most consistent with the expert's opinion.
 +
* '''References:'''
 +
*# R. B. Kline, Principles and Practice of Structural Equation Modeling. New York: Guilford. 2005.
 +
*# J. Pearl, Graphs, Causality and Structural Equation Models. Sociological Methods and Research, 27-2(1998), 226-284.
 +
*# J. Pearl, E. Bareinboim, Transportability of Causal and Statistical Relations: A Formal Approach // Proceedings of the 25th AAAI Conference on Artificial Intelligence, August 7-11, 2011, San Francisco. 247-254
 +
*# Valkov A.S., Kozhanov E.M., Motrenko A.P., Khusainov F.I. Construction of cross-correlation dependences in the forecast of load of the railway junction // Machine learning and data analysis. 2013. T. 1, No. 5. C. 505-518.
 +
*# Valkov A.S., Kozhanov E.M., Medvednikova M.M., Khusainov F.I. Nonparametric forecasting of railway junction system load based on historical data // Machine Learning and Data Analysis. 2012. T. 1, No. 4. C. 448-465.
 +
* '''Basic algorithm:''' structural equation modeling, SEM
 +
* '''Novelty:''' A method is proposed for assessing the reliability of expert statements about the impact of exchange prices of major instruments on the volume of rail freight traffic. Various structures of links between time series are proposed. The concept of structure complexity is introduced. The relationship between the complexity of the structure and the assessment of the reliability of the statement is investigated.
-
'''Data:''' Коллекция документов задаётся частотами слов. Поскольку для решения задачи необходимо знать <<истинное>> число тем, эксперименты производятся на реалистичных модельных или полумодельных данных.
+
===18. 2014===
 +
* Using non-linear forecasting when looking for dependencies between time series
 +
* '''consultant:''' A. P. Motrenko
 +
* '''Problem:''' (As part of a study devoted to the discovery of patterns in time series sets) It is proposed to abandon the standard assumptions about the stationarity of the time series when searching for dependencies between time series and to study time series from the point of view of dynamical systems theory, within which irregular time dependences determined by the structure of the phase space are considered. It is required to study a set of approaches to the analysis of dynamic data and the identification of relationships between them; describe the limits of applicability of the basic algorithm and propose new options for the revealed structural relationships.
 +
* '''Data:''' Synthetic data, historical stock prices for major instruments and rail freight data.
 +
* '''References:'''
 +
*# Tools for the Analysis of Chaotic Data. HENRY D. I. ABARBANEL
 +
*# Nonlinear forecasting as a way of distinguishing chaos from measurement error in time series, G. Sugihara, R.M. May.
 +
*# George Sugihara et al. Detecting Causality in Complex Ecosystems. Science 338, 496 (2012);
 +
*# Valkov A.S., Kozhanov E.M., Motrenko A.P., Khusainov F.I. Construction of cross-correlation dependences in the forecast of load of the railway junction // Machine learning and data analysis. 2013. T. 1, No. 5. C. 505-518.
 +
*# Valkov A.S., Kozhanov E.M., Medvednikova M.M., Khusainov F.I. Nonparametric forecasting of railway junction system load based on historical data // Machine Learning and Data Analysis. 2012. T. 1, No. 4. C. 448-465.
 +
* '''Basic algorithm:''' convergent cross mapping
 +
* '''Novelty:''' Different structures of relationships between time series and a method for checking the existence of relationships are proposed.
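Methods that study a series from the point of view of dynamical systems, including convergent cross mapping, start from a reconstruction of the phase space by time-delay embedding. The sketch below shows only this first step; the function name, the forward-lag convention and the parameter values are assumptions made for illustration, and it is not a full convergent cross mapping implementation.

```python
import numpy as np


def delay_embedding(x, dim, lag):
    """Stack lagged copies of x: row i is (x[i], x[i+lag], ..., x[i+(dim-1)*lag])."""
    x = np.asarray(x, dtype=float)
    n = x.size - (dim - 1) * lag
    if n <= 0:
        raise ValueError("series too short for this dim and lag")
    return np.column_stack([x[k * lag : k * lag + n] for k in range(dim)])


# Hypothetical scalar series; each row of `points` is one reconstructed phase-space point.
series = np.sin(0.3 * np.arange(50))
points = delay_embedding(series, dim=3, lag=2)
```

Nearest neighbors among the rows of `points` are what a cross-mapping procedure would then use to predict one series from the reconstructed states of another.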
-
'''References:'''
+
===6. 2014 ===
-
* [[Медиа:Task-PTM-Potapenko.pdf| Описание задачи and предлагаемые пути решения]]
+
* Sequential Generation of Essentially Nonlinear Models in Document Ranking Problems
-
* Vorontsov K. V. Аддитивная регуляризация тематических моделей коллекций текстовых доку-
+
* '''consultant:''' Kuznetsov Mikhail
-
ментов // Доклады РАН. 2014. — Т. 455, №3 (в печати).
+
* '''Problem:''' Propose an algorithm for generating essentially non-linear models and test it on synthetic and real data. The algorithm should 1) generate a complete set of models and 2) choose the optimal step for a fixed model structure (adding a superposition element).
-
* Vorontsov K. V. Вероятностное тематическое моделирование. — 2014.
+
* '''Data:''' Synthetic data, data for LIG text collections.
-
http://www.MachineLearning.ru/wiki/images/2/22/Voron-2013-ptm.pdf
+
* '''References:'''
-
* Teh Y. W., Jordan M. I., Beal M. J., Blei D. M. Hierarchical Dirichlet processes // Journal of the
+
*# Goswami P., Moura S., Gaussier E., Amini M.R. Exploring the Space of IR Functions //
-
American Statistical Association. — 2006. — Vol. 101, no. 476. — Pp. 1566–1581.
+
*# Rudoy G.I., Strijov V.V. Algorithms for the inductive generation of superpositions for the approximation of measured data // Informatics and its applications, 2013, 7(1) - 17-26.
 +
*# Rudoy G.I., Strijov V.V. Simplification of superpositions of elementary functions with the help of rule-based graph transformations // Intellectualization of information processing. Reports of the 9th international conference, 2012 - 140-143.
 +
*# Vladislavleva E., Smith G., Hertog D., Order of Nonlinearity as a Complexity Measure for Models Generated by Symbolic Regression via Pareto Genetic Programming // IEEE Transactions on Evolutionary Computation, 2009. Vol. 13(2). pp. 333-349.
 +
*# Vladislavleva E. Model-based Problem Solving through Symbolic Regression via Pareto Genetic Programming: PhD thesis, Tilburg University, Tilburg, the Netherlands, 2008.
 +
* '''Basic algorithm:''' An exhaustive enumeration algorithm for admissible superpositions of generating functions.
 +
* '''Novelty:''' An algorithm for sequential addition of superposition elements is proposed. A function of the distance between superpositions is proposed and its properties are investigated. The notion of superposition complexity and the notion of adjacent superpositions that differ in complexity by one are introduced. An algorithm for generating adjacent superpositions is proposed.
-
'''Basic algorithm:''' Для решения оптимизационной задачи используется регуляризованный EM-алгоритм [2014: Воронцов]. Может быть использована рациональная, стохастическая или онлайновая версия EM-алгоритма.
+
===7. 2014===
 +
* Detecting Isomorphic Structures of Essentially Nonlinear Predictive Models
 +
* '''consultant:''' Sologub Roman, Kuznetsov Mikhail
 +
* '''Problem:''' Develop an algorithm for finding isomorphic subgraphs for trees (a variant - for directed acyclic graphs). Compare the complexity of the algorithm for checking the isomorphism of two superpositions for the proposed algorithm and for the algorithm for element-by-element comparison of mappings.
 +
* '''Data:''' Data on exchange options: dependence of option volatility on the price and time of its execution.
 +
* '''References:'''
 +
*# Rudoy G.I., Strijov V.V. Algorithms for the inductive generation of superpositions for the approximation of measured data // Informatics and its applications, 2013, 7(1) - 17-26.
 +
*# Rudoy G.I., Strijov V.V. Simplification of superpositions of elementary functions with the help of rule-based graph transformations // Intellectualization of information processing. Reports of the 9th international conference, 2012 - 140-143.
 +
*# Ehrig H., Ehrig G., Prange U., Taentzer G. Fundamentals of Algebraic Graph Transformation. Springer, 2006.
 +
*# Ehrig H., Engels G. Handbook of Graph Grammars and Computing by Graph Transformation. World Scientific Publishing, 1997.
 +
*# Strijov V.V., Sologub R.A. Inductive generation of regression models of implied volatility for option trading // Computational technologies, 2009, 14(5) — 102-113.
 +
* '''Basic algorithm:''' Algorithm for element-by-element comparison of mappings.
 +
* '''Novelty:''' A fast algorithm for simplifying superpositions and searching for isomorphic models is proposed. The incidence matrix of the set of generating functions is used.
-
'''Novelty:''' Для оптимизации числа тем обычно используется модель иерархического процесса Дирихле HDP [2006: Teh et Al]. Она определяет число тем неустойчиво, and при этом сложна как для понимания, так and для реализации. Аддитивная регуляризация тематических моделей (ARTM) --- это новый подход к тематическому моделированию, сочетающий универсальность, гибкость and простоту. Task оптимизации числа тем ещё не рассматривалась в рамках ARTM.
+
===8. 2014===
 +
* Building predictive models as superpositions of expert-specified functions
 +
* '''consultant:''' Ivkin Nikita
 +
* '''Problem:''' It is required to assign each time series in a set to one of several classes. It is proposed to do this using an automated feature generation procedure. An expert specifies a set of generating functions that 1) transform the time series (for example, smooth it or decompose it into principal components) and 2) extract aggregated descriptions from the time series (for example, mean, variance, number of extrema). A significant number of features can be generated by constructing superpositions of generating functions. The resulting features are used to classify the time series (for example, by the nearest neighbor method).
 +
* '''Data:''' data from the mobile phone's accelerometer.
 +
* '''References:'''
 +
*# Problem statement \MLAlgorithms\Group074\Kuznetsov2013SSAForecasting\doc
 +
*# Haykin S. Neural networks. Williams, 2006.
 +
* '''Basic algorithm:''' neural network (option: deep learning neural network).
 +
* '''Novelty:''' A method for extracting features using automatically constructed superpositions of expert-specified functions is proposed. Structural and topological complexity are compared in the classification problem.
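The feature generation procedure from the problem statement can be sketched as follows. The particular transforms and aggregates, their names and the toy series are assumptions chosen for illustration; an expert would supply the actual set of generating functions, and deeper superpositions could be built in the same way.

```python
import numpy as np
from itertools import product


def smooth(x, width=3):
    """Moving average: a transform that maps a series to a shorter series."""
    return np.convolve(x, np.ones(width) / width, mode="valid")


def n_extrema(x):
    """Count interior points where the sign of the increment changes."""
    d = np.sign(np.diff(x))
    return float(np.sum(d[1:] * d[:-1] < 0))


# Hypothetical expert-specified generating functions.
transforms = {"identity": lambda x: x, "smooth": smooth, "diff": np.diff}
aggregates = {"mean": np.mean, "var": np.var, "n_extrema": n_extrema}


def features(x):
    """Build one feature per superposition aggregate(transform(x))."""
    x = np.asarray(x, dtype=float)
    return {
        f"{a_name}({t_name})": float(agg(tr(x)))
        for (t_name, tr), (a_name, agg) in product(transforms.items(), aggregates.items())
    }


feats = features([0, 1, 0, 1, 0, 1, 0])
```

The resulting dictionary can be turned into a feature vector and passed to any classifier, for example the nearest neighbor method mentioned above.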
-
===2. Дифференциальная диагностика заболеваний по электрокардиограмме===
+
===9. 2014===
 +
* Manifold learning for predicting sets of quasi-periodic time series
 +
* '''consultant:''' Ivkin Nikita
 +
* '''Problem:''' The problem of classifying human activity based on data from the mobile phone's accelerometer is solved. Data from the accelerometer are represented by quasi-periodic time series. It is required to attribute the time series to one of the types of activity: running, walking, etc. To solve the problem of classifying series, a method based on nearest neighbors in the space of manifolds is proposed.
 +
* '''Data:''' data from the mobile phone's accelerometer.
 +
* '''References:'''
 +
*# Mi Zhang; Sawchuk, A.A., "Manifold Learning and Recognition of Human Activity Using Body-Area Sensors," Machine Learning and Applications and Workshops (ICMLA), 2011 10th International Conference on , vol.2, no., pp.7,13, 18- 21 Dec. 2011
 +
* '''Basic algorithm:''' neural network
 +
* '''Novelty:''' A method for classifying quasi-periodic time series based on manifolds is proposed.
-
'''consultant:''' В.Р. Целых
+
=== 10. 2014===
 +
* Interpretation of multimodels in the processing of sociological data
 +
* '''consultant:''' Alexander Aduenko
 +
* '''Problem:''' The problem of credit scoring is to determine the level of creditworthiness of the borrower who applied for a loan. To do this, a borrower's questionnaire is used, containing both numerical data (age, income, time of residence in the country) and categorical features (gender, profession). It is required, having historical information on loan repayments by other borrowers, to determine whether the client in question will return the loan. Thus, it is required to solve the problem of classification. Since the data can be heterogeneous (for example, if there are different income regions in the country), the data can be described not by one, but by several models. In this paper, we propose to compare two methods for constructing multimodels: mixtures of logistic models and gradient boosting.
 +
* '''Data:''' data on consumer loans (\mlalgorithms\BSThesis\Aduenko2013\data).
 +
* '''References:'''
 +
*# mixtures of models (\mlalgorithms\BSThesis\Aduenko2013\doc, Bishop)
 +
*# boosting (lecture "Compositional methods of classification and regression" by Vorontsov)
 +
* '''Basic algorithm:''' boosting.
 +
* '''Novelty:''' Identification and explanation of similarities and differences between the solutions obtained by the two specified algorithms.
-
'''Task:''' Предлагается решить типичную задачу классификации. Признаками являются 216 характеристик, вычисляемых по электрокардиограмме. Необходимо провести оценку качества классификации по отложенной контрольной выборке. Для этого вычисляются доли ошибок первого and второго рода. Под ошибкой первого рода подразумевается отнесение здоровых к классу больных, второго рода – отнесение больных к классу здоровых. Предпочтение отдается минимизации ошибок второго рода.
+
=== 11. 2014===
 +
* Selection of Optimal Structures of Predictive Models by Structural Learning Methods
 +
* '''consultant:''' Varfolomeeva Anna
 +
* '''Problem:''' It is proposed to solve the forecasting problem in two stages. First, the structure of the predictive model is restored using the history of constructing successful forecasts. Then the model parameters are optimized and the model is used to build a time series forecast.
 +
* '''Data:''' synthetic sample, biomedical time series, accelerometer measurements.
 +
* '''References:'''
 +
*# Jaakkola T. Scaled structured prediction.
 +
*# URL: http://video.yandex.ru/users/ya-events/view/486/user-tag/scientific%20seminar/
 +
*# ''Find all the work of TJ students on the given topic.''
 +
*# Varfolomeeva A.A. Bachelor's thesis in MLAlgorithms/BSThesis/Varfolomeeva
 +
* '''Basic algorithm:''' the metaprediction algorithm described in the thesis.
 +
* '''Novelty:''' A method for restoring model structures using a priori assumptions about these structures is proposed.
-
'''Data:''' Для каждой из 5 болезней есть 2 типа выборок. Эталонные – более надежные, специально отобранные случаи. Остальные – случаи, когда диагнозы устанавливались врачами менее надежно, эти выборки предлагается использовать для контроля.
+
===12. 2014 ===
 +
* Invariants in Predicting Quasi-Periodic Series
 +
* '''consultant:''' Arsenty Kuzmin
 +
* '''Problem:''' The problem of day-ahead hourly forecasting of electricity prices and consumption is solved. When constructing the design matrix, it is proposed to use not the original segment of the time series but its invariant representation.
 +
* '''Data:''' hourly data on electricity prices and volumes (insert link).
 +
* '''References:'''
 +
*# Sanduleanu L.N., Strijov V.V. Feature Selection in Autoregressive Forecasting Problems // Information Technologies, 2012, 7 - 11-15.
 +
*# ''(taken from Fadeev's last article)''
 +
* '''Basic algorithm:''' autoregressive prediction described in Sanduleanu's work.
 +
* '''Novelty:''' An algorithm for joint estimation of the parameters of the invariants and autoregressive model is proposed, which makes it possible to significantly improve the accuracy of forecasting.
-
'''References:'''
+
=== 13. 2014 ===
-
* Vorontsov K. V. Метрические алгоритмы классификации. Лекции по машинному обучению. — 2014. http://www.MachineLearning.ru/wiki/images/c/c3/Voron-ML-Metric-slides.pdf
+
* Forecasting the volume of rail freight traffic by pairs of branches
-
* Успенский В. М. Информационная функция сердца // Клиническая медицина, 2008. — Т. 86, № 5. — С. 4–13.
+
* '''consultant:''' Stenina Maria (Medvednikova)
-
* Успенский В. М. Информационная функция сердца. Теория and практика диагностики заболеваний внутренних органов методом информационного анализа электрокардиосигналов. — М.: «Экономика and информация», 2008. — 116 с.
+
* '''Problem:''' Predict traffic volumes from branch to branch and compare with the basic algorithm for predicting the departure of wagons from a branch. Test the hypothesis that the branch-to-branch traffic forecast is more accurate than the forecast of the basic algorithm. Examine the series for trend and periodicity; if present, include them in the model. Prepare the prediction algorithm for use.
-
'''Basic algorithm:''' Для решения задачи предлагается использовать метрический алгоритм с жадным отбором признаков.
+
* '''Data:''' daily data for a year and a half on the transportation of 38 types of cargo in the Omsk region.
 +
* '''References:'''
 +
*# Valkov A.S., Kozhanov E.M., Medvednikova M.M., Khusainov F.I. Nonparametric forecasting of railway junction system load based on historical data // Machine Learning and Data Analysis. - 2012. - No. 4.
 +
* '''Basic algorithm:''' histogram prediction described in the article.
 +
* '''Novelty:''' It is proposed to improve the quality of the forecast by dividing the data into smaller parts and forecasting traffic for specific branches instead of forecasting the departure of wagons.
-
'''Novelty:''' Данные подготовлены по уникальной технологии информационного анализа электрокардиосигналов, разработанной проф. д.м.н. В.М.Успенским. Предложен алгоритм классификации and исследована его обобщающая способность.
+
===14. 2014===
 +
* Choosing the optimal model for predicting human physical activity based on accelerometer measurements
 +
* '''consultant:''' Tokmakova Alexandra
 +
* '''Problem:''' Propose an algorithm for sequential modification of a neural network. The goal is to find the simplest, most stable and most accurate network configuration that solves the problem of two-class (variant: multi-class) physical activity prediction.
 +
* '''Data:''' Set of time series of accelerometer measurements.
 +
* '''References:'''
 +
*# Pruning of neural networks on Machinelearning.ru.
 +
*# Haykin S. Neural networks. Williams, 2006.
 +
* '''Basic algorithm:''' Optimal Brain Damage/Optimal Brain Surgeon.
 +
* '''Novelty:''' A method for sequential generation of neural networks of optimal complexity is proposed. The stability of generated models is studied.
-
===3. Влияние регуляризаторов разреживания, сглаживания and декорреляции на устойчивость вероятностной тематической модели===
+
=== 15. 2014===
 +
* Time Series Metaprediction
 +
* '''consultant:''' A.S. Inyakin, Ivkin Nikita
 +
* '''Problem:''' A set of time series forecasting algorithms is given. For a presented time series, it is required to indicate the algorithm that delivers the most accurate forecast, without executing the algorithms themselves. To solve this problem, it is proposed to build a set of features that describe the time series. To do this, an expert creates a set of generating functions that 1) transform the time series (for example, smooth it or decompose it into principal components) and 2) extract aggregated descriptions from the time series (for example, mean, variance, number of extrema). A significant number of features can be generated by constructing superpositions of generating functions.
 +
* '''Data:''' Library of quasi-periodic and aperiodic time series
 +
* '''References:'''
 +
*# Kuznetsov M.P., Mafusalov A.A., Zhivotovsky N.K., Zaitsev E., Sungurov D.S. Smoothing forecasting algorithms // Machine learning and data analysis. 2011. T. 1, No. 1. C. 104-112.
 +
*# Fadeev I.V., Ivkin N.P., Savinov N.A., Kornienko A.I., Kononenko D.S., Dzhamtyrova R.B. Autoregressive forecasting algorithms // Machine learning and data analysis. 2011. T. 1, No. 1. C. 92-103.
 +
* '''Basic algorithm:''' Use the SAS/SPSS algorithm.
 +
* '''Novelty:''' A method for fast selection of the optimal predictive algorithm based on the description of the time series is proposed.
-
'''consultant:''' М.A. Дударенко
+
=== 16. 2014===
 +
* Identification of a person by the image of the iris
 +
* '''consultant:''' Matveev I. A.
 +
* '''Problem:''' In the problem of identifying a person by an image of the iris, the most important role is played by selecting the iris region in the original image (iris segmentation). However, the iris is usually partially occluded by eyelids, eyelashes and highlights, so part of the iris cannot be used for recognition; moreover, using data from occluded areas can generate false features and reduce accuracy. Therefore, one of the important steps in iris segmentation is the rejection of occluded areas.
 +
* '''Data:''' bitmap monochrome image, typical size 640*480 pixels (however, other sizes are possible) and coordinates of centers and radii of two circles approximating pupil and iris.
 +
* '''References:'''
 +
*# [[Media:The problemIris.pdf |Problem description and proposed solutions]]
 +
*# Monro D. University of Bath Iris Image Database // http:// www.bath.ac.uk/ elec-eng/ research/ sipg/ irisweb/
 +
*# Chinese academy of sciences institute of automation (CASIA) CASIA Iris image database // http://www.cb-sr.ia.ac.cn/IrisDatabase.htm, 2005.
 +
*# MMU Iris Image Database: Multimedia University // http://pesonna.mmu.edu.my/ccteo/
 +
*# Phillips P.J., Scruggs W.T., O'Toole A.J. et al. Frvt2006 and ice2006 large-scale experimental results // IEEE PAMI. 2010. V. 32. No. 5. P. 831–846.
 +
*# G.Xu, Z.Zhang, Y.Ma Improving the performance of iris recognition system using eyelids and eyelashes detection and iris image enhancement // Proc. 5Th Int. Conf. Cognitive Informatics. 2006. P.871-876.
 +
* '''Basic algorithm:''' method using sliding window and texture features [2006: Xu, Zhang, Ma].
 +
* '''Novelty:''' A mask of the visible (unoccluded) area of the iris is constructed.
-
'''Task:'''Вероятностная тематическая модель описывает вероятности появления слов <tex>w\in W</tex> в документах <tex>d\in D</tex> через латентные темы <tex>t\in T</tex>:
+
=== 17. 2014 ===
 +
* Search for effective methods of dimensionality reduction in solving problems of multiclass classification by reducing it to solving binary problems
 +
* '''consultant:''' Yu.V. Maksimov
 +
* '''Problem:''' Explore different approaches to solving multi-class classification problems and compare their performance.
 +
* '''Data:''' Data with a different number of classes.
 +
*# Toy example: Shuttle dataset. http://archive.ics.uci.edu/ml/datasets/Statlog+(Shuttle). Small sample, 7 classes. No data preparation is needed.
 +
*# Reuters collection text data http://www.daviddlewis.com/resources/testcollections/reuters21578/.
 +
*# Data from our LIG Kaggle contest http://www.kaggle.com/c/lshtc
 +
* '''References:'''
 +
*# [[Media:LearningEmbedding.pdf |Problem description and proposed solutions]]
 +
*# Xia lecture. http://courses.washington.edu/ling572/winter2012/slides/ling572_class13_multiclass.pdf
 +
*# Rifkin lecture http://www.mit.edu/~9.520/spring08/Classes/multiclass.pdf
 +
*# Tax, Duin. Using two-class classifiers for multiclass classification. Pattern Recognition, 2002. Proceedings. 16th International Conference on (Volume:2). http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.19.7063&rep=rep1&type=pdf
 +
*# Dietterich, Bakiri. Solving Multiclass Learning Problems via Error-Correcting Output Codes. 1995. http://arxiv.org/pdf/cs/9501101
 +
*# Allwein, Schapire, Singer. Reducing Multiclass to Binary:A Unifying Approach for Margin Classifiers. Journal of Machine Learning Research 1 (2000) 113-141. http://machinelearning.wustl.edu/mlpapers/paper_files/AllweinSS00.pdf
 +
* '''Basic algorithms:''' SVM with different kernels, AdaBoost. Basic approaches: one vs all (combined), one vs one (uncombined)
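The two basic reductions named above can be sketched as follows. This is only an illustration of how a multiclass label vector is split into binary problems; the function names and the toy label list are assumptions made for this example and are not part of the course materials. Any binary classifier (for instance an SVM) would then be trained on each relabeled problem.

```python
from itertools import combinations


def one_vs_all_tasks(labels):
    """One binary problem per class: the class itself (+1) against all others (-1)."""
    classes = sorted(set(labels))
    return {c: [1 if y == c else -1 for y in labels] for c in classes}


def one_vs_one_tasks(labels):
    """One binary problem per pair of classes, restricted to objects of these two classes."""
    classes = sorted(set(labels))
    tasks = {}
    for a, b in combinations(classes, 2):
        idx = [i for i, y in enumerate(labels) if y in (a, b)]
        tasks[(a, b)] = (idx, [1 if labels[i] == a else -1 for i in idx])
    return tasks


# Hypothetical toy labels with 7 classes (the same number as in the Shuttle data).
labels = [1, 2, 3, 1, 2, 3, 4, 5, 6, 7]
ova = one_vs_all_tasks(labels)
ovo = one_vs_one_tasks(labels)
print(len(ova), len(ovo))
```

With K classes, one vs all produces K binary problems on the full sample, while one vs one produces K(K-1)/2 smaller problems; which trade-off is better is exactly what the comparison in this topic is meant to establish.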
-
<tex> p(w|d) = \sum_{t\in T} p(w|t)p(t|d) = \sum_{t\in T} \phi_{wt}\theta_{td}.</tex>
+
=== Trial Programming ===
-
 
+
-
The representation of the matrix <tex>\|p(w|d)\|_{W\times D}</tex>
+
-
as a product of two smaller matrices <tex>{\Phi=\|\phi_{wt}\|_{W\times T}}</tex> and <tex>{\Theta=\|\theta_{td}\|_{T\times D}}</tex> is not unique:
+
-
<tex>\Phi \Theta = (\Phi S)(S^{-1}\Theta) = \Phi'\Theta'</tex>
+
-
for some nonsingular <tex>S</tex>.
+
-
The hypothesis to be tested is that imposing constraints on the matrices <tex>\Phi, \Theta</tex> by means of regularizers
+
-
can make their recovery more stable.
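The non-uniqueness of the factorization can be checked numerically. The sketch below uses made-up matrices (3 words, 2 topics, 2 documents) and a hand-picked <tex>S</tex> whose columns sum to one, so that <tex>\Phi S</tex> and <tex>S^{-1}\Theta</tex> stay column-stochastic; all numbers and helper names are assumptions chosen for illustration.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def inv2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


# Columns of phi are p(w|t); columns of theta are p(t|d).
phi = [[0.5, 0.1], [0.3, 0.3], [0.2, 0.6]]
theta = [[0.6, 0.3], [0.4, 0.7]]
s = [[0.9, 0.1], [0.1, 0.9]]  # nonsingular, columns sum to 1

phi2 = matmul(phi, s)            # Phi' = Phi S
theta2 = matmul(inv2(s), theta)  # Theta' = S^{-1} Theta

p1 = matmul(phi, theta)
p2 = matmul(phi2, theta2)
same_product = all(abs(x - y) < 1e-12
                   for r1, r2 in zip(p1, p2) for x, y in zip(r1, r2))
print(same_product, phi2 != phi)
```

Both factor pairs are valid stochastic matrices and reproduce the same <tex>p(w|d)</tex>, so the data alone cannot distinguish them; this is the ambiguity that regularizers are expected to reduce.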
+
-
 
+
-
'''Data:''' A document collection is specified by word frequencies. Since solving
+
-
the problem requires knowing the "true" matrices <tex>\Phi, \Theta,</tex> the experiments are run on realistic synthetic or semi-synthetic data satisfying the hypotheses of sparsity, weak correlation between topics, and the presence of background topics.
+
-
 
+
-
'''References:'''
+
-
* Vorontsov K. V. Additive regularization of topic models for collections of text documents // Doklady RAN. 2014. Vol. 455, No. 3 (in press).
+
-
* Vorontsov K. V. Probabilistic topic modeling. 2014. http://www.MachineLearning.ru/wiki/images/2/22/Voron-2013-ptm.pdf.
+
-
 
+
-
'''Basic algorithm:''' The optimization problem is solved with the regularized EM algorithm [2014: Vorontsov]. The rational, stochastic, or online version of the EM algorithm may be used.
+
-
 
+
-
'''Novelty:''' Additive regularization of topic models (ARTM) was proposed in [2014: Vorontsov] as a universal way to improve the stability and interpretability of topic models. However, which combination of regularizers actually improves stability remains an open question. This study addresses that question.
+
-
 
+
-
===4. Constructing university rankings: panel analysis and stability estimation===
+
-
+
-
'''consultant:''' M.P. Kuznetsov
+
-
 
+
-
'''Task:''' A university's ranking changes from year to year. The change may be caused by a poor ranking methodology, by random fluctuations in the university's indicators, or by a deliberate change in the state of the university. The goal is to propose a ranking methodology that is robust to random fluctuations and makes changes in the state of a university interpretable.
+
-
 
+
-
'''Data:''' Data on the top one hundred world universities over eight years.
+
-
 
+
-
'''References:'''
+
-
* Strizhov V.V. Refining expert estimates using measured data // Zavodskaya Laboratoriya. Diagnostika Materialov, 2006, 72(7): 59-64.
+
-
* Strizhov V.V. Refining expert estimates given in rank scales using measured data // Zavodskaya Laboratoriya. Diagnostika Materialov, 2011, 77(7): 72-78.
+
-
* Kuznetsov M.P., Strijov V.V. Methods of expert estimations concordance for integral quality estimation // Expert Systems with Applications, 2014.
+
-
* ''Draft of the POF paper available on request.''
+
-
'''Basic algorithm:''' The RUR ranking methodology and one of the excessively robust algorithms for rank scales.
+
-
 
+
-
'''Novelty:''' The notion of interpretability of a change in ranking position is introduced. The problem of selecting indicators and of their optimal locally monotone correction is solved. A ranking methodology is proposed that makes it possible to interpret changes in the state of a university for monitoring purposes. Variant: the inverse control problem is solved, that is, how to change the university's indicators to reach a given goal.
+
-
 
+
-
===5. Detecting patterns in a set of time series using structure learning methods===
+
-
 
+
-
'''consultant:''' A.P. Motrenko
+
-
 
+
-
'''Task:''' To improve the quality of time series forecasts, one would like to use expert statements about causal relations between events. This requires a way to assess the reliability of expert statements. A causal relation cannot be proved by statistical methods; the researcher can only check for the presence of a particular structure of relations. The goal is, based on expert statements about relations between events, to examine the time series for various structural relations and to find the structure most consistent with the expert's opinion.
+
-
 
+
-
'''References:'''
+
-
* R. B. Kline, Principles and Practice of Structural Equation Modeling. New York: Guilford. 2005.
+
-
* J. Pearl, Graphs, Causality and Structural Equation Models. Sociological Methods and Research, 27-2(1998), 226-284.
+
-
* J. Pearl, E. Bareinboim, Transportability of Causal and Statistical Relations: A Formal Approach // Proceedings of the 25th AAAI Conference on Artificial Intelligence, August 7-11, 2011, San Francisco. 247-254
+
-
* Valkov A.S., Kozhanov E.M., Motrenko A.P., Khusainov F.I. Constructing cross-correlation dependencies in forecasting the load of a railway junction // Machine Learning and Data Analysis. 2013. Vol. 1, No. 5. P. 505-518.
+
-
* Valkov A.S., Kozhanov E.M., Medvednikova M.M., Khusainov F.I. Nonparametric forecasting of the load of a system of railway junctions from historical data // Machine Learning and Data Analysis. 2012. Vol. 1, No. 4. P. 448-465.
+
-
'''Basic algorithm:''' structural equation modeling (SEM).
+
-
 
+
-
'''Novelty:''' A method is proposed for assessing the reliability of expert statements about the influence of exchange prices for the main instruments on the volume of railway freight traffic. Various structures of relations between time series are proposed. The notion of structure complexity is introduced. The relationship between structure complexity and the reliability estimate of a statement is studied.
+
-
 
+
-
===18. Using nonlinear forecasting to search for dependencies between time series===
+
-
 
+
-
'''consultant:''' A.P. Motrenko
+
-
 
+
-
'''Task:''' (As part of a study on detecting patterns in sets of time series.) When searching for dependencies between time series, it is proposed to drop the standard assumption of stationarity and to study the series from the viewpoint of dynamical systems theory, which considers irregular temporal dependencies determined by the structure of the phase space. The task is to study a set of approaches to analyzing dynamical data and detecting relations between them, to describe the limits of applicability of the basic algorithm, and to propose new kinds of detectable structural relations.
+
-
'''Data:''' Synthetic data, historical exchange prices for the main instruments, and data on railway freight traffic.
+
-
 
+
-
'''References:'''
+
-
* Tools for the Analysis of Chaotic Data. HENRY D. I. ABARBANEL
+
-
* Nonlinear forecasting as a way of distinguishing chaos from measurement error in time series, G. Sugihara, R.M. May.
+
-
* George Sugihara et al. Detecting Causality in Complex Ecosystems. Science 338, 496 (2012);
+
-
* Valkov A.S., Kozhanov E.M., Motrenko A.P., Khusainov F.I. Constructing cross-correlation dependencies in forecasting the load of a railway junction // Machine Learning and Data Analysis. 2013. Vol. 1, No. 5. P. 505-518.
+
-
* Valkov A.S., Kozhanov E.M., Medvednikova M.M., Khusainov F.I. Nonparametric forecasting of the load of a system of railway junctions from historical data // Machine Learning and Data Analysis. 2012. Vol. 1, No. 4. P. 448-465.
+
-
'''Basic algorithm:''' convergent cross mapping
+
-
 
+
-
'''Novelty:''' Various structures of relations between time series are proposed, together with a method for testing whether the relations are present.
+
-
 
+
-
===6. Sequential generation of essentially nonlinear models in document ranking problems===
+
-
 
+
-
'''consultant:''' M.P. Kuznetsov
+
-
+
-
'''Task:''' Propose an algorithm for generating essentially nonlinear models and test it on test and real data. The algorithm must 1) generate the complete set of models and 2) choose the optimal step for a fixed model structure (adding an element of the superposition).
+
-
 
+
-
'''Data:''' Synthetic data, data on the LIG text collections.
+
-
 
+
-
'''References:'''
+
-
* Goswami P., Moura S., Gaussier E., Amini M.R. Exploring the Space of IR Functions //
+
-
* Rudoy G.I., Strizhov V.V. Algorithms for inductive generation of superpositions for approximation of measured data // Informatics and Applications, 2013, 7(1): 17-26.
+
-
* Rudoy G.I., Strizhov V.V. Simplification of superpositions of elementary functions via rule-based graph transformations // Intellectualization of Information Processing. Proceedings of the 9th International Conference, 2012: 140-143.
+
-
* Vladislavleva E.,Smith G., Hertog D., Order of Nonlinearity as a Complexity Measure for Models Generated by Symbolic Regression via Pareto Genetic Programming // IEEE Transactions on Evolutionary Computation, 2009. Vol. 13(2). Pp. 333-349.
+
-
* Vladislavleva E. Model-based Problem Solving through Symbolic Regression via Pareto Genetic Programming: PhD thesis, Tilburg University, Tilburg, the Netherlands, 2008.
+
-
'''Basic algorithm:''' Exhaustive search over the admissible superpositions of generating functions.
+
-
 
+
-
'''Novelty:''' An algorithm for sequentially adding elements of superpositions is proposed. A distance function between superpositions is proposed and its properties are studied. The notions of superposition complexity and of adjacent superpositions, whose complexity differs by one, are introduced. An algorithm for generating adjacent superpositions is proposed.
+
-
 
+
-
===7. Detecting isomorphic structures of essentially nonlinear forecasting models===
+
-
 
+
-
'''consultant:''' R.A. Sologub, M.P. Kuznetsov
+
-
 
+
-
'''Task:''' Develop an algorithm for finding isomorphic subgraphs in trees (variant: in directed acyclic graphs). Compare the complexity of checking whether two superpositions are isomorphic for the proposed algorithm and for the algorithm of element-wise comparison of mappings.
+
-
 
+
-
'''Data:''' Data on exchange-traded options: the dependence of an option's volatility on its strike price and time to expiration.
+
-
 
+
-
'''References:'''
+
-
* Rudoy G.I., Strizhov V.V. Algorithms for inductive generation of superpositions for approximation of measured data // Informatics and Applications, 2013, 7(1): 17-26.
+
-
* Rudoy G.I., Strizhov V.V. Simplification of superpositions of elementary functions via rule-based graph transformations // Intellectualization of Information Processing. Proceedings of the 9th International Conference, 2012: 140-143.
+
-
* Ehrig H., Ehrig K., Prange U., Taentzer G. Fundamentals of Algebraic Graph Transformation. Springer, 2006.
+
-
* Ehrig H., Engels G. Handbook of Graph Grammars and Computing by Graph Transformation. World Scientific Publishing, 1997.
+
-
* Strizhov V.V., Sologub R.A. Inductive generation of regression models of implied volatility for option trading // Computational Technologies, 2009, 14(5): 102-113.
+
-
'''Basic algorithm:''' Element-wise comparison of mappings.
+
-
 
+
-
'''Novelty:''' A fast algorithm for simplifying superpositions and finding isomorphic models is proposed. It uses the incidence matrix of the set of generating functions.
+
-
 
+
-
===8. Constructing forecasting models as superpositions of expert-defined functions===
+
-
 
+
-
'''consultant:''' N.P. Ivkin
+
-
 
+
-
'''Task:''' Each time series in a set must be assigned to one of several classes. It is proposed to do this by automated feature generation. An expert defines a set of generating functions that 1) transform a time series (for example, smooth it or decompose it into principal components) and 2) extract aggregated descriptions from a time series (for example, the mean, the variance, the number of extrema). A large number of features can be generated by building superpositions of the generating functions. The resulting features are used to classify the time series (for example, by the nearest neighbors method).
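The feature-generation scheme described above can be sketched as follows. The particular transforms, aggregates, and the toy series are assumptions chosen for illustration; the actual expert-defined set of generating functions is not specified in the task.

```python
from itertools import product
from statistics import mean, pvariance


def identity(x):
    return list(x)


def smooth(x, k=3):
    """Moving average with window k (one possible smoothing transform)."""
    return [mean(x[i:i + k]) for i in range(len(x) - k + 1)]


def diff(x):
    """First differences of the series."""
    return [b - a for a, b in zip(x, x[1:])]


def n_extrema(x):
    """Number of strict local extrema (sign changes of consecutive differences)."""
    return sum(1 for a, b, c in zip(x, x[1:], x[2:]) if (b - a) * (c - b) < 0)


# Type 1: series -> series; type 2: series -> number.
TRANSFORMS = {"identity": identity, "smooth": smooth, "diff": diff}
AGGREGATES = {"mean": mean, "var": pvariance, "n_extrema": n_extrema}


def generate_features(series):
    """Every superposition aggregate(transform(series)) becomes one feature."""
    return {f"{g}({t})": agg(tr(series))
            for (t, tr), (g, agg) in product(TRANSFORMS.items(), AGGREGATES.items())}


series = [0, 1, 0, -1, 0, 1, 0, -1, 0]  # toy quasi-periodic series
features = generate_features(series)
print(len(features), features["n_extrema(identity)"])
```

Deeper superpositions (for example, smoothing followed by differencing) enlarge the feature set multiplicatively, which is why the task mentions a "large number of features"; the feature vectors can then be passed to any classifier, such as nearest neighbors.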
+
-
 
+
-
'''Data:''' mobile phone accelerometer data.
+
-
 
+
-
'''References:'''
+
-
* Problem statement: \MLAlgorithms\Group074\Kuznetsov2013SSAForecasting\doc
+
-
* Haykin S. Neural Networks. Williams, 2006.
+
-
'''Basic algorithm:''' a neural network (variant: a deep learning neural network).
+
-
 
+
-
'''Novelty:''' A method is proposed for extracting features using automatically constructed superpositions of expert-defined functions.
+
-
 
+
-
Comparison of structural and topological complexity in classification problems.
+
-
 
+
-
===9. Manifold learning for forecasting sets of quasi-periodic time series ===
+
-
 
+
-
'''consultant:''' N.P. Ivkin
+
-
 
+
-
'''Task:''' The problem is to classify human activity from mobile phone accelerometer data. The accelerometer data are represented as quasi-periodic time series. Each time series must be assigned to one of the activity types: running, walking, etc. A method based on nearest neighbors in the space of manifolds is proposed for classifying the series.
+
-
 
+
-
'''Data:''' mobile phone accelerometer data.
+
-
 
+
-
'''References:'''
+
-
* Mi Zhang; Sawchuk, A.A., "Manifold Learning and Recognition of Human Activity Using Body-Area Sensors," Machine Learning and Applications and Workshops (ICMLA), 2011 10th International Conference on , vol.2, no., pp.7,13, 18-21 Dec. 2011
+
-
'''Basic algorithm:''' a neural network
+
-
 
+
-
'''Novelty:''' a manifold-based method for classifying quasi-periodic time series is proposed
+
-
 
+
-
=== 10. Interpreting multimodels in the analysis of sociological data ===
+
-
'''consultant:''' A.A. Aduenko
+
-
 
+
-
'''Task:''' Credit scoring consists in determining the creditworthiness of a borrower who has applied for a loan. It uses the borrower's application form, which contains both numerical data (age, income, time of residence in the country) and categorical features (gender, profession). Given historical information on loan repayment by other borrowers, one must determine whether the client in question will repay the loan. This is a classification problem. Because the data may be heterogeneous (for example, when a country has regions with different income levels), they may be described by several models rather than one. This work compares two methods of building multimodels: mixtures of logistic models and gradient boosting.
+
-
 
+
-
'''Data:''' consumer loan data (\mlalgorithms\BSThesis\Aduenko2013\data).
+
-
 
+
-
'''References:'''
+
-
* mixtures of models (\mlalgorithms\BSThesis\Aduenko2013\doc, Bishop)
+
-
* boosting (Vorontsov's lecture "Compositional methods of classification and regression")
+
-
 
+
-
'''Basic algorithm:''' boosting.
+
-
 
+
-
'''Novelty:''' Identifying and explaining the similarities and differences between the solutions obtained by the two algorithms.
+
-
 
+
-
=== 11. Selecting optimal structures of forecasting models using structure learning methods ===
+
-
'''consultant:''' A.A. Varfolomeeva
+
-
 
+
-
'''Task:''' The forecasting problem is solved in two stages. First, the structure of the forecasting model is recovered from the histories of constructing successful forecasts. Then the model parameters are optimized, and the model is used to forecast the time series.
+
-
 
+
-
'''Data:''' a synthetic sample, biomedical time series, accelerometer measurements.
+
-
 
+
-
'''References:'''
+
-
* Jaakkola T. Scaled structured prediction.
+
-
* URL: http://video.yandex.ru/users/ya-events/view/486/user-tag/научный%20семинар/
+
-
* ''Find all papers by TJ's students on this topic.''
+
-
* Varfolomeeva A.A. Bachelor's thesis in MLAlgorithms/BSThesis/Varfolomeeva
+
-
 
+
-
'''Basic algorithm:''' the meta-forecasting algorithm described in the bachelor's thesis.
+
-
 
+
-
'''Novelty:''' A method is proposed for recovering model structures using prior assumptions about these structures.
+
-
 
+
-
===12. Invariants in forecasting quasi-periodic series ===
+
-
'''consultant:''' A.A. Kuzmin
+
-
 
+
-
'''Task:''' The problem is day-ahead hourly forecasting of electricity prices/consumption. When constructing the design matrix, it is proposed to use an invariant representation of the time series segment instead of the original segment.
+
-
 
+
-
'''Data:''' hourly data on electricity prices and consumption volumes (insert link).
+
-
 
+
-
'''References:'''
+
-
* Sanduleanu L.N., Strizhov V.V. Feature selection in autoregressive forecasting problems // Information Technologies, 2012, 7: 11-15.
+
-
*''(take from Fadeev's latest paper)''
+
-
 
+
-
'''Basic algorithm:''' autoregressive forecasting described in the paper by Sanduleanu.
+
-
 
+
-
'''Novelty:''' An algorithm is proposed for jointly estimating the parameters of the invariants and of the autoregressive model, which substantially improves forecasting accuracy.
+
-
 
+
-
=== 13. Forecasting railway freight volumes for pairs of branches ===
+
-
'''consultant:''' M.M. Stenina (Medvednikova)
+
-
 
+
-
'''Task:''' Forecast the volumes of shipments from branch to branch and compare with the basic algorithm, which forecasts car departures from a branch. Test the hypothesis that the branch-to-branch forecast is more accurate than the forecast of the basic algorithm. Examine the series for trend and periodicity; if present, include them in the model. Prepare the forecasting algorithm for practical use.
+
-
 
+
-
'''Data:''' daily data for a year and a half on shipments of 38 types of cargo in the Omsk region.
+
-
 
+
-
'''References:'''
+
-
*Valkov A.S., Kozhanov E.M., Medvednikova M.M., Khusainov F.I. Nonparametric forecasting of the load of a system of railway junctions from historical data // Machine Learning and Data Analysis. 2012. No. 4.
+
-
 
+
-
'''Basic algorithm:''' histogram forecasting described in the paper.
+
-
 
+
-
'''Novelty:''' it is proposed to improve forecast quality by splitting the data into smaller parts and forecasting shipments for specific branches instead of forecasting car departures.
+
-
 
+
-
===14. Selecting an optimal model for forecasting human physical activity from accelerometer measurements ===
+
-
'''consultant:''' A.A. Tokmakova
+
-
 
+
-
'''Task:''' Propose an algorithm for sequential modification of a neural network. The goal is to find the simplest, most stable, and most accurate network configuration that solves the two-class (variant: multiclass) problem of forecasting physical activity.
+
-
 
+
-
'''Data:''' A set of time series of accelerometer measurements.
+
-
 
+
-
'''References:'''
+
-
* Neural network pruning on the Machinelearning.ru site.
+
-
* Haykin S. Neural Networks. Williams, 2006.
+
-
'''Basic algorithm:''' Optimal Brain Damage/Optimal Brain Surgeon.
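For orientation, Optimal Brain Damage ranks each weight <tex>w_k</tex> by the saliency <tex>s_k = h_{kk} w_k^2 / 2</tex>, where <tex>h_{kk}</tex> is the corresponding diagonal element of the Hessian of the error function, and removes the weights with the smallest saliency. The sketch below shows only this ranking step; the weights and Hessian diagonal are made-up numbers, and the function names are hypothetical. In practice the Hessian diagonal would be computed from the trained network and the network would be retrained after pruning.

```python
def obd_saliencies(weights, hessian_diag):
    """OBD saliency of each weight: 0.5 * h_kk * w_k^2."""
    return [0.5 * h * w * w for w, h in zip(weights, hessian_diag)]


def prune_least_salient(weights, hessian_diag, n_prune):
    """Set the n_prune weights with the smallest saliency to zero."""
    s = obd_saliencies(weights, hessian_diag)
    order = sorted(range(len(weights)), key=lambda k: s[k])
    pruned = list(weights)
    for k in order[:n_prune]:
        pruned[k] = 0.0
    return pruned


# Illustrative values only: the third weight is large but lies in a flat
# direction of the error surface, so its saliency is small.
weights = [0.8, -0.05, 1.2, 0.3]
hessian_diag = [2.0, 4.0, 0.01, 1.0]
pruned = prune_least_salient(weights, hessian_diag, n_prune=2)
print(pruned)
```

The example illustrates why saliency-based pruning differs from magnitude-based pruning: the largest weight is removed because the error is almost insensitive to it.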
+
-
 
+
-
'''Novelty:''' A method is proposed for sequentially generating neural networks of optimal complexity. The stability of the generated models is studied.
+
-
 
+
-
=== 15. Meta-forecasting of time series ===
+
-
'''consultant:''' A.S. Inyakin, N.P. Ivkin
+
-
 
+
-
'''Task:''' A set of time series forecasting algorithms is given. For a presented time series, one must indicate the algorithm that yields the most accurate forecast, without actually running the algorithms. To solve this problem, it is proposed to build a set of features describing the time series. An expert defines a set of generating functions that 1) transform a time series (for example, smooth it or decompose it into principal components) and 2) extract aggregated descriptions from a time series (for example, the mean, the variance, the number of extrema). A large number of features can be generated by building superpositions of the generating functions.
+
-
 
+
-
'''Data:''' A library of quasi-periodic and aperiodic time series
+
-
 
+
-
'''References:'''
+
-
* Kuznetsov M.P., Mafusalov A.A., Zhivotovskiy N.K., Zaitsev E., Sungurov D.S. Smoothing forecasting algorithms // Machine Learning and Data Analysis. 2011. Vol. 1, No. 1. P. 104-112.
+
-
* Fadeev I.V., Ivkin N.P., Savinov N.A., Kornienko A.I., Kononenko D.S., Dzhamtyrova R.B. Autoregressive forecasting algorithms // Machine Learning and Data Analysis. 2011. Vol. 1, No. 1. P. 92-103.
+
-
'''Basic algorithm:''' Use the SAS/SPSS algorithm.
+
-
 
+
-
'''Novelty:''' A method is proposed for quickly selecting the optimal forecasting algorithm from a description of the time series.
+
-
 
+
-
=== 16. Identifying a person from an iris image ===
+
-
'''consultant:''' I.A. Matveev
+
-
 
+
-
'''Task:''' In identifying a person from an iris image, the key step is locating the iris region in the original image (iris segmentation). However, the iris is usually partially occluded by eyelids, eyelashes, and specular highlights, so part of it cannot be used for recognition; moreover, data from occluded areas can produce false features and reduce accuracy. Rejecting the occluded areas is therefore an important stage of iris segmentation.
+
-
 
+
-
'''Data:''' a monochrome raster image, typically 640*480 pixels (other sizes are possible), together with the center coordinates and radii of two circles approximating the pupil and the iris.
+
-
 
+
-
'''References:'''
+
-
* [[Медиа:TaskIris.pdf |Problem description and proposed solutions]]
+
-
* Monro D. University of Bath Iris Image Database // http:// www.bath.ac.uk/ elec-eng/ research/ sipg/ irisweb/
+
-
* Chinese academy of sciences institute of automation (CASIA) CASIA Iris image database // http://www.cb-sr.ia.ac.cn/IrisDatabase.htm, 2005.
+
-
* MMU Iris Image Database: Multimedia University // http:// pesonna.mmu.edu.my/ ccteo/
+
-
* Phillips P.J., Scruggs W.T., O'Toole A.J. et al. FRVT 2006 and ICE 2006 large-scale experimental results // IEEE PAMI. 2010. V. 32. No. 5. P. 831–846.
+
-
* Xu G., Zhang Z., Ma Y. Improving the performance of iris recognition system using eyelids and eyelashes detection and iris image enhancement // Proc. 5th Int. Conf. Cognitive Informatics. 2006. P. 871-876.
+
-
'''Basic algorithm:''' a method using a sliding window and texture features [2006: Xu, Zhang, Ma].
+
-
 
+
-
'''Novelty:''' a mask of the unoccluded (visible) region of the iris is constructed.
+
-
 
+
-
=== 17. Search for effective dimensionality reduction methods in multiclass classification by reducing it to binary problems ===
+
-
'''consultant:''' Yu.V. Maksimov
+
-
 
+
-
'''Task:''' Explore different approaches to multiclass classification problems and compare their performance.
+
-
 
+
-
'''Data:''' Data with different numbers of classes.
+
-
0. Toy example: Shuttle dataset. http://archive.ics.uci.edu/ml/datasets/Statlog+(Shuttle). Small sample, 7 classes. No data preparation is needed.
+
-
1. Text data of the Reuters collection http://www.daviddlewis.com/resources/testcollections/reuters21578/.
+
-
2. Data from our LIG Kaggle contest http://www.kaggle.com/c/lshtc
+
-
 
+
-
'''References:'''
+
-
* [[Медиа:LearningEmbedding.pdf |Problem description and proposed solutions]]
+
-
* Xia lecture. http://courses.washington.edu/ling572/winter2012/slides/ling572_class13_multiclass.pdf
+
-
* Rifkin lecture http://www.mit.edu/~9.520/spring08/Classes/multiclass.pdf
+
-
* Tax, Duin. Using two-class classifiers for multiclass classification. Pattern Recognition, 2002. Proceedings. 16th International Conference on (Volume:2). http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.19.7063&rep=rep1&type=pdf
+
-
* Dietterich, Bakiri. Solving Multiclass Learning Problems via Error-Correcting Output Codes. 1995. http://arxiv.org/pdf/cs/9501101
+
-
* Allwein, Schapire, Singer. Reducing Multiclass to Binary:A Unifying Approach for Margin Classifiers. Journal of Machine Learning Research 1 (2000) 113-141. http://machinelearning.wustl.edu/mlpapers/paper_files/AllweinSS00.pdf
+
-
 
+
-
'''Basic algorithms:''' SVM with different kernels, AdaBoost. Basic approaches: one vs all (combined), one vs one (uncombined)
+
-
 
+
-
== Homework 2: trial programming ==
+
{|class="wikitable"
{|class="wikitable"
-
! Task
+
! The problem
-
! Кто делает
+
! Who is doing
-
! Номер
+
! Number
|-
|-
-
|Дана выборка [http://archive.ics.uci.edu/ml/datasets/Wine "Вина различных регионов"]. Требуется определить кластеры (регионы происхождения вин) and нарисовать результат: цветной точкой обозначен объект кластера; цветным кружком обозначен класс этого объекта, взятый из выборки. Вариант задания: определить число кластеров. Вариант задания: использовать два алгоритма, например k-means and EM, and показать сравнение результатов кластеризации на графике.
+
|A sample is given: [http://archive.ics.uci.edu/ml/datasets/Wine "Wines of different regions"]. Determine the clusters (regions of origin of the wines) and plot the result: a colored dot marks the cluster of an object, and a colored circle marks the class of that object taken from the sample. Variant: determine the number of clusters. Variant: use two algorithms, for example k-means and EM, and compare the clustering results on a plot.
-
|Плавин
+
|Plavin
| 1
| 1
|-
|-
-
|Предложить способы визуализации наборов четырехмерных векторов, например для [http://archive.ics.uci.edu/ml/datasets/Iris Fisher's iris data].
+
|Suggest ways to visualize sets of four-dimensional vectors, for example for [http://archive.ics.uci.edu/ml/datasets/Iris Fisher's iris data].
-
|Записать свою фамилию тут.
+
|Write down your last name here.
| 2
| 2
|-
|-
-
|Дан временной [http://archive.ics.uci.edu/ml/datasets/Individual+household+electric+power+consumption ряд], описывающий потребление электричества. Приблизить ряд несколькими [[Линейная регрессия (пример)| криволинейными моделями]] and нарисовать спрогнозированные and исходный ряды на одном графике.
+
|A time [http://archive.ics.uci.edu/ml/datasets/Individual+household+electric+power+consumption series] describing electricity consumption is given. Approximate the series with several [[Linear regression (example)| curvilinear models]] and plot the predicted and original series on the same graph.
-
|Кулунчаков Андрей.
+
|Kulunchakov Andrey.
| 3
| 3
|-
|-
-
|Сгладить временной ряд [[Временной ряд (библиотека примеров)|Цены (объемы) на основные биржевые инструменты]] методом [[Экспоненциальное сглаживание| экспоненциального сглаживания]]. Нарисовать цветные графики сглаженных с различным <tex> \alpha </tex> рядов and исходного ряда.
+
|Smooth the time series [[Time series (library of examples)|Prices (volumes) for the main exchange instruments]] using [[Exponential smoothing| exponential smoothing]]. Plot, in color, the series smoothed with different values of <tex> \alpha </tex> together with the original series.
-
|Авдюхов
+
|Avdyukhov
| 4
| 4
|-
|-
-
|Аппроксимация выборки замкнутой кривой [http://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group874/Group874Essay/Group874Essay.pdf?format=raw]: проверить, лежат ли точки на окружности? Сгенерировать данные самостоятельно.
+
|Fitting a sample with a closed curve [http://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group874/Group874Essay/Group874Essay.pdf?format=raw]: check whether the points lie on a circle. Generate the data yourself.
-
| Газизуллина Римма
+
| Gazizullina Rimma
| 5
| 5
|-
|-
-
|Дан временной ряд с пропусками, например [http://archive.ics.uci.edu/ml/datasets/Gas+Sensor+Array+Drift+Dataset+at+Different+Concentrations]. Предложить способы заполнения пропусков в данных, заполнить пропуски. Для каждого способа построить гистограмму. Вариант: взять выборку без пропусков, удалить случайным образом часть данных, заполнить пропуски, сравнить с гистограммой исходной выборки.
+
|A time series with missing values is given, for example [http://archive.ics.uci.edu/ml/datasets/Gas+Sensor+Array+Drift+Dataset+at+Different+Concentrations]. Suggest ways to fill in the missing values and apply them. For each method, build a histogram. Variant: take a sample without missing values, randomly remove part of the data, fill in the gaps, and compare with the histogram of the original sample.
-
|Игнатов Андрей
+
|Ignatov Andrey
| 6
| 6
|-
|-
-
|Дана выборка [http://archive.ics.uci.edu/ml/datasets/Wine "Вина различных регионов"]. Выбрать два признака. Рассмотреть различные функции расстояния при классификации с помощью [[Метод ближайших соседей| метода ближайшего соседа]]. Для каждой изобразить результат классификации в пространстве выбранных признаков.
+
|A sample is given: [http://archive.ics.uci.edu/ml/datasets/Wine "Wines of different regions"]. Choose two features. Consider different distance functions for classification with the [[Nearest Neighbors| nearest neighbor method]]. For each of them, show the classification result in the space of the selected features.
-
|Попова Мария
+
|Maria Popova
| 7
| 7
|-
|-
-
|Для различных видов зависимости <tex> y = f(x) + \epsilon </tex> (линейная, квадратичная, логарифмическая) построить [[Линейная регрессия (пример)| линейную регрессию]] and нарисовать на графике SSE-отклонения (среднеквадратичные отклонения-?). Данные сгенерировать самостоятельно или взять данные "Цена на хлеб".
+
|For various types of dependence <tex> y = f(x) + \epsilon </tex> (linear, quadratic, logarithmic), fit a [[Linear regression (example)| linear regression]] and plot the SSE deviations (standard deviations?). Generate the data yourself or use the "Bread prices" data.
-
|Ефимова Ирина
+
|Efimova Irina
| 8
| 8
|-
|-
-
|Оценить площадь единичного круга методом Монте-Карло. Построить график зависимости результата от размера выборки.
+
|Estimate the area of a unit circle using the Monte Carlo method. Plot the result against the sample size.
-
|Шинкевич Михаил
+
|Shinkevich Mikhail
| 9
| 9
|-
|-
-
|Построить выпуклую оболочку точек на плоскости. Нарисовать график: точки and их выпуклая оболочка – замкнутая ломаная линия.
+
|Construct the convex hull of a set of points on a plane. Draw a plot showing the points and their convex hull (a closed polyline).
-
|Макарова Анастасия
+
|Makarova Anastasia
| 10
| 10
|-
|-
-
|Дана выборка: [http://archive.ics.uci.edu/ml/datasets/Iris ирисы Фишера]. Реализовать процедуру классификации методом решающего дерева. Проиллюстрировать результаты классификации на плоскости в пространстве двух признаков.
+
|A sample is given: [http://archive.ics.uci.edu/ml/datasets/Iris Fisher's irises]. Implement classification with a decision tree. Illustrate the classification results on a plane in the space of two features.
-
|Жуков Андрей
+
|Zhukov Andrey
| 11
| 11
|-
|-
-
|Задан временной ряд – [https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/TSForecasting/TimeSeries/Sources/tsEnergyConsumption.csv объемы почасового потребления электроэнергии] (выбрать любые два дня). Аппроксимировать ряд полиномиальными моделями различных степеней (1-7). *Предложить метод определения оптимальной степени полинома.
+
|A time series is given: [https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/TSForecasting/TimeSeries/Sources/tsEnergyConsumption.csv hourly electricity consumption volumes] (select any two days). Approximate the series with polynomial models of various degrees (1-7). *Suggest a method for determining the optimal degree of the polynomial.
-
|Карасиков Михаил
+
|Karasikov Mikhail
| 12
| 12
|-
|-
-
|Задано два одномерных [[Временной ряд (библиотека примеров) | временных ряда]] различной длины. Вычислить расстояние между рядами методом динамического выравнивания.
+
|Two one-dimensional [[Time series (library examples) | time series]] of different lengths are given. Compute the distance between the series using dynamic time warping (dynamic alignment).
-
|Гринчук Алексей
+
|Grinchuk Alexey
| 13
| 13
|-
|-
-
|Сгенерировать набор точек на плоскости. Выделить and визуализировать главные компоненты.
+
|Generate a set of points on the plane. Extract and visualize the principal components.
-
| Липатова
+
| Lipatova
| 14
| 14
|-
|-
-
|Аппроксимировать выборку [https://dmba.svn.sourceforge.net/svnroot/dmba/Data/WhiteBreadPrices.csv цены на хлеб] полиномиальной моделью. Нарисовать график. Пометить объекты, являющиеся выбросами, используя правило трех сигм.
+
|Approximate the sample [https://dmba.svn.sourceforge.net/svnroot/dmba/Data/WhiteBreadPrices.csv bread prices] with a polynomial model. Plot the result. Mark the outliers using the three-sigma rule.
-
|Швец Михаил
+
|Shvets Mikhail
| 15
| 15
|-
|-
-
|Разделить выборку [http://archive.ics.uci.edu/ml/datasets/Iris ирисы Фишера] на кластеры. Проиллюстрировать на графике результаты кластеризации, выделить кластеры разными цветами.
+
|Divide the sample [http://archive.ics.uci.edu/ml/datasets/Iris Fisher's Iris] into clusters. Plot the clustering results, showing the clusters in different colors.
-
| Гущин Александр
+
| Gushchin Alexander
| 16
| 16
|-
|-
-
|'''И еще задания на выбор'''
+
|'''More problems to choose from'''
|
|
|
|
|-
|-
-
|Дана выборка из нескольких признаков, без целевого вектора Y. Например, эта https://dmba.svn.sourceforge.net/svnroot/dmba/Data/Diabets_LARS.csv Требуется указать тот признак, который хорошо описывается (в терминах линейной регрессии) остальными (такой признак обычно исключают из выборки).
+
|A sample of several features is given, without a target vector Y. For example: https://dmba.svn.sourceforge.net/svnroot/dmba/Data/Diabets_LARS.csv Identify the feature that is well described (in terms of linear regression) by the rest (such a feature is usually excluded from the sample).
|
|
|17
|17
|-
|-
-
|Сгладить временной ряд [[Временной ряд (библиотека примеров)|(см. библиотеку)]] скользящим средним. Взять несколько окон разной длины and наложить результат на графике друг на друга.
+
|Smooth a time series [[Time series (examples library)|(see library)]] with a moving average. Use several windows of different lengths and overlay the results on one plot.
-
|Костюк
+
|Kostyuk
|18
|18
|-
|-
-
|Дан временной ряд [[Временной ряд (библиотека примеров)|(см. библиотеку)]]. По его вариационному ряду построить гистограмму из <tex>n</tex> перцентилей, нарисовать ее. Какое значение временного ряда встречается чаще всего?
+
|Given a time series [[Time series (examples library)|(see library)]]. From its ordered values (variational series), build a histogram of <tex>n</tex> percentiles and plot it. Which value of the time series occurs most often?
-
|Гиззатуллин Анвар
+
|Gizzatullin Anvar
|19
|19
|-
|-
-
|Показать разницу в скорости выполнения матричных операций and операций в цикле. Можно использовать в качестве примера [[Сингулярное разложение]] and другие методы линейной алгебры. Показать эффективность параллельных вычислений (parfor).
+
|Show the difference in the speed of performing matrix operations and operations in a loop. You can use [[Singular value decomposition]] and other linear algebra methods as an example. Show the efficiency of parallel computing (parfor).
|
|
|20
|20
|-
|-
-
|Разобраться как работает суперпозиция функций. С помощью функции @ породить все возможные полиномы от n переменных степени не более p. Вариант: приблизить полученными полиномами временной ряд цен на хлеб [[Линейная регрессия (пример)|(данные)]].
+
|Understand how function superposition (composition) works. Using the @ operator, generate all possible polynomials in n variables of degree at most p. Variant: use the resulting polynomials to approximate the time series of bread prices [[Linear regression (example)|(data)]].
|
|
|
|
Line 5393: Line 5293:
|}
|}
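The Monte Carlo problem listed in the table above (estimating the area of a unit circle and tracking the estimate against the sample size) can be illustrated with a minimal Python sketch. The function name `estimate_unit_circle_area` and the fixed random seed are assumptions made for this illustration; they are not part of the course materials.

```python
import random


def estimate_unit_circle_area(n_samples, seed=0):
    """Estimate the area of the unit circle from uniform points in [-1, 1]^2."""
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    inside = 0
    for _ in range(n_samples):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            inside += 1
    # The enclosing square has area 4, so scale the fraction of hits by 4.
    return 4.0 * inside / n_samples


if __name__ == "__main__":
    # Print the estimate for growing sample sizes; these pairs can be plotted.
    for n in (100, 1_000, 10_000, 100_000):
        print(n, estimate_unit_circle_area(n))
```

The estimate should approach pi as the sample size grows, with an error that shrinks roughly as the inverse square root of the number of samples.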
-
=2013=
+
==2013==
-
 
+
-
==Моя первая публикация с кросс-рецензированием==
+
-
 
+
-
== Задачи ==
+
{|class="wikitable"
{|class="wikitable"
|-
|-
-
! Task name
+
! Title
! Author
! Author
! Link
! Link
!MAIPVTDCHSJ
!MAIPVTDCHSJ
|-
|-
-
|Определение напечатанного изображения
+
|Identification of a printed image
-
|Пушняков Алексей
+
|Pushnyakov Alexey
|[http://svn.code.sf.net/p/mlalgorithms/code/Group074Spring2013/Pushnyakov2013SpectrumImage/doc]
|[http://svn.code.sf.net/p/mlalgorithms/code/Group074Spring2013/Pushnyakov2013SpectrumImage/doc]
|MAIPVTDCHSJ
|MAIPVTDCHSJ
|-
|-
-
|Сравнение быстрых алгоритмов кластеризации
+
|Comparison of Fast Clustering Algorithms
-
|Катруца Александр
+
|Alexander Katrutsa
|[http://svn.code.sf.net/p/mlalgorithms/code/Group074Spring2013/Katrutsa2013RhoNets/Spring/doc]
|[http://svn.code.sf.net/p/mlalgorithms/code/Group074Spring2013/Katrutsa2013RhoNets/Spring/doc]
|MAIPVTDCHS
|MAIPVTDCHS
|-
|-
-
|Векторная авторегрессия and управление макроэкономическими показателями
+
|Vector autoregression and management of macroeconomic indicators
-
|Кащеева Мария
+
|Kashcheeva Maria
|[http://svn.code.sf.net/p/mlalgorithms/code/Group074Spring2013/Kashcheeva2013InverseVAR/doc]
|[http://svn.code.sf.net/p/mlalgorithms/code/Group074Spring2013/Kashcheeva2013InverseVAR/doc]
|MAIPVTDCHS
|MAIPVTDCHS
|-
|-
-
|Разметка библиографических записей с помощью логических алгоритмов
+
|Marking up bibliographic records using logical algorithms
-
|Рыскина Мария
+
|Ryskina Maria
|[http://svn.code.sf.net/p/mlalgorithms/code/Group074Spring2013/Ryskina2013Txt2Bib/doc]
|[http://svn.code.sf.net/p/mlalgorithms/code/Group074Spring2013/Ryskina2013Txt2Bib/doc]
|MAIPVTDCHS
|MAIPVTDCHS
|-
|-
-
|Определение точной границы зрачка
+
|Determination of the exact border of the pupil
-
|Чинаев Николай
+
|Chinaev Nikolai
|[http://svn.code.sf.net/p/mlalgorithms/code/Group074Spring2013/Chinaev2013PupilBoundary/doc]
|[http://svn.code.sf.net/p/mlalgorithms/code/Group074Spring2013/Chinaev2013PupilBoundary/doc]
|MAIPV.DCHS
|MAIPV.DCHS
|-
|-
-
|Векторная авторегрессия and управление макроэкономическими показателями
+
|Vector autoregression and management of macroeconomic indicators
-
|Гринчук Олег
+
|Grinchuk Oleg
|[http://svn.code.sf.net/p/mlalgorithms/code/Group074Spring2013/Grinchuk2013InverseVAR/doc]
|[http://svn.code.sf.net/p/mlalgorithms/code/Group074Spring2013/Grinchuk2013InverseVAR/doc]
|MAIPVTD.HS
|MAIPVTD.HS
|-
|-
-
|Порождение нейронных сетей с Expertно-заданными функциями активации
+
|Generating Neural Networks with Expert-Defined Activation Functions
-
|Перекрестенко Дмитрий
+
|Perekrestenko Dmitry
|[http://svn.code.sf.net/p/mlalgorithms/code/Group074Spring2013/Perekrestenko2013DeepLearning/doc]
|[http://svn.code.sf.net/p/mlalgorithms/code/Group074Spring2013/Perekrestenko2013DeepLearning/doc]
|MAIPVTDCHS
|MAIPVTDCHS
|-
|-
-
|Сравнительный анализ алгоритмов выбора признаков: точность, устойчивость, сложность регрессионных моделей
+
|Comparative analysis of feature selection algorithms: accuracy, stability, complexity of regression models
-
|Яшков Даниил
+
|Yashkov Daniel
|[http://svn.code.sf.net/p/mlalgorithms/code/Group074Spring2013/Yashkov2013FeatureSelection/doc]
|[http://svn.code.sf.net/p/mlalgorithms/code/Group074Spring2013/Yashkov2013FeatureSelection/doc]
|MAI.VTD.HS
|MAI.VTD.HS
|-
|-
-
|Инвариантные преобразования в Taskх локального прогнозирования
+
|Invariant transformations in local forecasting problems
-
|Костин Александр
+
|Kostin Alexander
|[http://svn.code.sf.net/p/mlalgorithms/code/Group074Spring2013/Kostin2013Invariant4LocalForecast/doc]
|[http://svn.code.sf.net/p/mlalgorithms/code/Group074Spring2013/Kostin2013Invariant4LocalForecast/doc]
|MAI.VT.HS
|MAI.VT.HS
|-
|-
-
|Алгоритм генетического программирования для решения задачи прогнозирования
+
|Genetic Programming Algorithm for Solving the Prediction Problem
-
|Воронов Сергей
+
|Voronov Sergey
|[http://svn.code.sf.net/p/mlalgorithms/code/Group074Spring2013/Voronov2013GeneticProg/doc]
|[http://svn.code.sf.net/p/mlalgorithms/code/Group074Spring2013/Voronov2013GeneticProg/doc]
|MAIPVTDC.S
|MAIPVTDC.S
|-
|-
-
|Группировка номинальных переменных в Taskх банковского кредитного скоринга
+
|Grouping of Nominal Variables in Bank Credit Scoring Problems
-
|Митяшов Андрей
+
|Mityashov Andrey
|[http://svn.code.sf.net/p/mlalgorithms/code/Group074Spring2013/Mityashov2013ScoringFeatureSelection/doc]
|[http://svn.code.sf.net/p/mlalgorithms/code/Group074Spring2013/Mityashov2013ScoringFeatureSelection/doc]
|MAIPVTDCHS
|MAIPVTDCHS
|-
|-
-
| Моделирование процесса обучения and забывания при оценке качества производства
+
| Modeling the process of learning and forgetting when assessing the quality of production
-
|Неклюдов Кирилл
+
|Neklyudov Kirill
|[http://svn.code.sf.net/p/mlalgorithms/code/Group074Spring2013/Neklyudov2013LearnForget/doc]
|[http://svn.code.sf.net/p/mlalgorithms/code/Group074Spring2013/Neklyudov2013LearnForget/doc]
|MAI..DC.S
|MAI..DC.S
|-
|-
-
|Обзор алгоритмов упрощения алгебраических выражений
+
|Overview of Algorithms for Simplifying Algebraic Expressions
-
|Шубин Андрей
+
|Shubin Andrey
|[http://svn.code.sf.net/p/mlalgorithms/code/Group074Spring2013/Shubin2013Simplify/doc]
|[http://svn.code.sf.net/p/mlalgorithms/code/Group074Spring2013/Shubin2013Simplify/doc]
|MAIPVTD.S
|MAIPVTD.S
|-
|-
-
|Алгоритмы переборного поиска наиболее информативных объектов and признаков в логистической регрессии
+
|Exhaustive search algorithms for the most informative objects and features in logistic regression
-
|Ибраимова Айжан
+
|Ibraimova Aizhan
|[http://svn.code.sf.net/p/mlalgorithms/code/Group074Spring2013/Ibraimova2013ScoringSelection/doc]
|[http://svn.code.sf.net/p/mlalgorithms/code/Group074Spring2013/Ibraimova2013ScoringSelection/doc]
|MAIP.TD..
|MAIP.TD..
|-
|-
-
|Интерпретация Expertных оценок видов Красной книги РФ путем отбора эталонных (представительных) объектов
+
|Interpretation of expert assessments of species of the Red Book of the Russian Federation by selecting reference (representative) objects
-
|Бырдин Александр
+
|Byrdin Alexander
|[http://svn.code.sf.net/p/mlalgorithms/code/Group074Spring2013/Byrdin2013RedBook/doc]
|[http://svn.code.sf.net/p/mlalgorithms/code/Group074Spring2013/Byrdin2013RedBook/doc]
|MAI.TD.S
|MAI.TD.S
|-
|-
-
|Визуализация матрицы парных расстояний в тематическом моделировании
+
|Visualization of Pair Distance Matrix in Topic Modeling
-
|Вдовина Евгения
+
|Vdovina Evgenia
|[http://svn.code.sf.net/p/mlalgorithms/code/Group074Spring2013/Vdovina2013DistanceVisualizing/doc]
|[http://svn.code.sf.net/p/mlalgorithms/code/Group074Spring2013/Vdovina2013DistanceVisualizing/doc]
|MAI.TDC.S
|MAI.TDC.S
|-
|-
-
|Алгоритм оценивания достоверности Expertных суждений о взаимосвязи временных рядов
+
|Algorithm for Estimating the Reliability of Expert Judgments on the Relationship of Time Series
-
|Антипова Наташа
+
|Antipova Natasha
|[http://svn.code.sf.net/p/mlalgorithms/code/Group074Spring2013/Antipova2013PlausibleExpert]
|[http://svn.code.sf.net/p/mlalgorithms/code/Group074Spring2013/Antipova2013PlausibleExpert]
|MAIP.T..S
|MAIP.T..S
|}
|}
-
===Task 2. Surname2013MassProduction (*eng)===
+
===2. 2013 MassProduction===
-
*'''Название.''' Порождение and оптимизация логических описаний при построении производственных линий.
+
*'''Name''' Generation and optimization of logical descriptions when building production lines.
-
*'''Проблема.''' Требуется поставить задачу синтеза допустимых суперпозиций, разработать алгоритм and протестировать его на синтетических данных.
+
*'''Problem''' It is required to state the problem of synthesizing admissible superpositions, develop an algorithm, and test it on synthetic data.
-
*'''Данные.''' Требуется создать.
+
*'''Data''' To be created.
-
*'''References:.''' Нужен поиск (скорее всего немецких публикаций).
+
*'''References:''' A search is needed (most likely among German publications).
-
*'''Предлагаемый алгоритм.''' Обсуждается.
+
*'''Proposed algorithm''' Under discussion.
-
*'''Basic algorithm.''' Нет.
+
*'''Basic algorithm''' None.
-
===Task 3. Surname2013LearnForget (eng)===
+
===3. 2013 LearnForget===
-
*'''Название.''' Моделирование процесса обучения and забывания при оценке качества производства.
+
*'''Name''' Modeling the process of learning and forgetting when assessing the quality of production.
-
*'''Проблема.''' Найти адекватную регрессионную модель, описывающую деятельность группы людей.
+
*'''Problem''' Find an adequate regression model that describes the activities of a group of people.
-
*'''Данные.''' Данные по скорости and качеству сборки бумажных самолетиков.
+
*'''Data''' Data on the speed and quality of the assembly of paper airplanes.
-
*'''References:.''' Нужно искать.
+
* '''References:''' To be found.
-
*'''Предлагаемый алгоритм.''' Процедура анализа регрессионных остатков.
+
*'''Proposed algorithm''' The procedure for analyzing regression residuals.
-
*'''Basic algorithm.''' Регрессионная модель в прилагаемой статье.
+
*'''Basic algorithm''' Regression model in the attached article.
-
===Task 4. Surname2013GeneticProg===
+
===4. 2013 GeneticProg===
-
*'''Название.''' Алгоритм генетического программирования для решения задачи прогнозирования.
+
*'''Name''' Genetic Programming Algorithm for Solving the Prediction Problem.
-
*'''Проблема.''' Создать алгоритм генетического программирования, решающий проблемы, названные Иваном Зелинкой. Предложить способ тестирования получаемых моделей, организовать скользящий контроль. Сравнить работу его на тестовом наборе задач с работой других алгоритмов ГП and с нейронными сетями.
+
*'''Problem''' Create a genetic programming algorithm that solves the problems named by Ivan Zelinka. Suggest a way to test the resulting models and set up cross-validation. Compare its performance on a benchmark set of problems with that of other GP algorithms and with neural networks.
-
*'''Данные.''' Тестовый набор задач, взять на UCI или на Полигоне.
+
*'''Data''' A benchmark set of problems; take it from UCI or from the Polygon.
-
*'''References:.''' Zelinka, Oplatkova, Vladislavleva; найти работы последних лет по этой теме. Особенно по тестированию этих алгоритмов.
+
* '''References:''' Zelinka, Oplatkova, Vladislavleva; find recent works on this topic, especially on testing these algorithms.
-
*'''Предлагаемый алгоритм.''' ГП.
+
*'''Proposed algorithm''' GP (genetic programming).
-
*'''Basic algorithm.''' ГП, нейронные сети.
+
*'''Basic algorithm''' GP, neural networks.
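The cross-validation step requested in the problem above (rendered as "sliding control" in the literal translation) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the course's procedure: the helper names `k_fold_indices`, `cross_validate`, `fit_mean`, and `predict_mean` are hypothetical, and the constant-mean predictor stands in for a genetic programming model or a neural network.

```python
import random


def k_fold_indices(n_objects, n_folds, seed=0):
    """Split indices 0..n_objects-1 into n_folds disjoint test folds."""
    indices = list(range(n_objects))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def cross_validate(xs, ys, fit, predict, n_folds=5):
    """Return the mean squared test error averaged over the folds."""
    fold_errors = []
    for test_idx in k_fold_indices(len(xs), n_folds):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        # Fit only on the training part, then score on the held-out fold.
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        errors = [(predict(model, xs[i]) - ys[i]) ** 2 for i in test_idx]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / len(fold_errors)


# Hypothetical baseline model: always predict the training mean of y.
def fit_mean(train_x, train_y):
    return sum(train_y) / len(train_y)


def predict_mean(model, x):
    return model
```

Any model that exposes a `fit`/`predict` pair of this shape can be passed to `cross_validate`, which makes it possible to compare several algorithms on the same folds.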
-
=== Task 5. Surname2013Simplify===
+
===5. 2013 Simplify===
-
*'''Название.''' Обзор алгоритмов упрощения алгебраических выражений.
+
*'''Name''' Overview of Algorithms for Simplifying Algebraic Expressions.
-
*'''Проблема.''' Требуется найти литературу по алгоритмам, упрощающим выражения, сравнить алгоритмы, запрограммировать алгоритм, предложенный в работе Рудой/Стрижов.
+
*'''Problem''' It is required to find literature on expression-simplification algorithms, compare the algorithms, and implement the algorithm proposed in the work by Rudoy/Strijov.
-
*'''Данные.''' Собрать тестовую коллекцию выражений.
+
*'''Data''' Collect a test collection of expressions.
-
*'''References:.''' Graph rewriting.
+
* '''References:''' Graph rewriting.
-
*'''Предлагаемый алгоритм.''' Р/С, сравнение алгоритмов.
+
*'''Proposed algorithm''' Rudoy/Strijov (R/S), comparison of algorithms.
-
===Task 6. Surname2013RedListExplanation===
+
===6. 2013 RedListExplanation===
-
*'''Название.''' Интерпретация Expertных оценок видов Красной книги РФ путем отбора эталонных (представительных) объектов.
+
*'''Name''' Interpretation of expert assessments of species of the Red Book of the Russian Federation by selecting reference (representative) objects.
-
*'''Проблема.''' Отбор эталонных объектов (алгоритм STOLP). Этот алгоритм может быть интересен для Expertов: он быстро находит шумовые объекты, которых в наших терминах считаются противоречащими Expertным данным and "лежащими не в своем классе", а также отбирает эталонные объекты, которые также любопытно интерпретируются. С математической точки зрения интересно, во-первых, понаблюдать за разными метриками (обобщениями расстояния Хэмминга) и, самое главное, надо обобщить формулу отступа (margin) на случай монотонных классов, видимо, введя весовую функцию объектов.
+
*'''Problem''' Selection of reference objects (STOLP algorithm). This algorithm may be of interest to experts: it quickly finds noise objects, which in our terms are considered inconsistent with the expert data and "lying outside their own class", and it also selects reference objects, which are likewise interesting to interpret. From a mathematical point of view, it is interesting, first, to examine different metrics (generalizations of the Hamming distance) and, most importantly, to generalize the margin formula to the case of monotone classes, apparently by introducing a weight function on objects.
-
*'''Данные.''' экспертные оценки краснокнижных видов.
+
*'''Data''' Expert assessments of Red Data Book species.
-
*'''References:.''' References: по алгоритмам метрической классификации.
+
* '''References:''' Literature on metric classification algorithms.
-
*'''Предлагаемый алгоритм.''' Метод или алгоритм, который сообщает Expertу почему (sic!) объект не попал в предполагаемый Expertом класс.
+
*'''Proposed algorithm''' A method or algorithm that tells the expert why (sic!) an object did not fall into the class the expert expected.
-
===Task 7. Surname2013RedListClassification===
+
===7. 2013 RedListClassification===
-
*'''Название.''' Алгоритм монотонной классификации объектов, описанных в ранговых шкалах.
+
*'''Name''' Algorithm for monotonic classification of objects described in rank scales.
-
*'''Проблема.''' Применить решающее дерево к Expertным оценкам угрожаемости краснокнижных видов. Сравнить с ранее предложенными алгоритмами. Обосновывать операции с ранговыми признаками, ввести обобщение понятия информативности на случай монотонных классов, видимо, сделать обобщение гипергеометрического распределения.
+
*'''Problem''' Apply a decision tree to expert assessments of the threat status of Red Data Book species. Compare with previously proposed algorithms. Justify the operations on rank features and introduce a generalization of the notion of informativeness for the case of monotone classes, apparently by generalizing the hypergeometric distribution.
-
*'''Данные.''' экспертные оценки краснокнижных видов.
+
*'''Data''' Expert assessments of Red Data Book species.
-
*'''References:.''' Нужно постараться избежать ссылок на тривиальные источники. Поискать похожие работы в иностранных журналах.
+
* '''References:''' Try to avoid citing trivial sources. Look for similar works in international journals.
-
===Task 11. Surname2013Invaraint4LocalForecast ===
+
===11. 2013 Invaraint4LocalForecast ===
-
*'''Название.''' Инвариантные преобразования в Taskх локального прогнозирования.
+
*'''Name''' Invariant transformations in local forecasting problems.
-
*'''Проблема.''' Совместить алгоритмы инвариантного преобразования времени and амплитуды прогнозируемых временных рядов.
+
*'''Problem''' Combine algorithms for invariant transformation of time and amplitude of predicted time series.
-
*'''Данные.''' Временные ряды измерения пульсовой волны.
+
*'''Data''' Time series of pulse wave measurement.
-
*'''References:.''' Найти, избежать тривиальных ссылок.
+
* '''References:''' To be found; avoid trivial references.
-
===Task 8. Surname2013PlausibleExpert===
+
===8. 2013 PlausibleExpert===
-
*'''Название.''' Алгоритм оценивания достоверности Expertных суждений о взаимосвязи временных рядов.
+
*'''Name''' Algorithm for Estimating the Reliability of Expert Judgments on the Relationship of Time Series.
-
*'''Проблема.''' Исследование взаимосвязи биржевых цен на основные инструменты and железнодорожных грузоперевозок.
+
*'''Problem''' Study of the relationship between exchange prices for the main instruments and rail freight.
-
*'''Данные.''' Временные ряды за 1.5 года. Но лучше подобрать синтетический пример.
+
*'''Data''' Time series for 1.5 years. But it is better to choose a synthetic example.
-
*'''References:.''' Публикации по CCM.
+
* '''References:''' Publications on CCM.
-
*'''Предлагаемый алгоритм.''' Модификации ССМ.
+
*'''Proposed algorithm''' CCM modifications.
-
=== Task 9. Surname2013DeepLearning===
+
===9. 2013 DeepLearning===
-
*'''Название.''' Порождение нейронных сетей с Expertно-заданными функциями активации.
+
*'''Name''' Generating Neural Networks with Expert-Defined Activation Functions.
-
*'''Проблема.''' Требуется поднять современное состояние области DeepLearning, запрограммировать алгоритм, протестировать на задаче прогнозирования объемов потребления and цен на электроэнергию.
+
*'''Problem''' It is required to review the current state of the deep learning field, implement the algorithm, and test it on the problem of forecasting electricity consumption volumes and prices.
-
*'''Данные.''' Посуточные данные за три года.
+
*'''Data''' Daily data for three years.
-
*'''References:.''' Deep Learning.
+
* '''References:''' Deep Learning.
-
*'''Предлагаемый алгоритм.''' Построение нейронной сети and оценка ее параметров.
+
*'''Proposed algorithm''' Building a neural network and estimating its parameters.
-
===Task 16. Surname2013ScoringSelection===
+
===16. 2013 ScoringSelection===
-
*'''Название.''' Алгоритмы переборного поиска наиболее информативных объектов and признаков в логистической регрессии.
+
*'''Name''' Exhaustive search algorithms for the most informative objects and features in logistic regression.
-
*'''Проблема.''' С помощью генетического алгоритма найти информативные объекты and признаки.
+
*'''Problem''' Use a genetic algorithm to find informative objects and features.
-
*'''Данные.''' Данные по потребительским кредитам.
+
*'''Data''' Consumer credit data.
-
*'''References:.''' -
+
* '''References:''' -
-
===Task 10. Surname2013ScoringFeatureSelection===
+
===10. 2013 ScoringFeatureSelection===
-
*'''Название.''' Группировка номинальных переменных в Taskх банковского кредитного скоринга.
+
*'''Name''' Grouping of Nominal Variables in Bank Credit Scoring Problems.
-
*'''Проблема.''' Создать генетический алгоритм снижения размерности признакового пространства.
+
*'''Problem''' Create a genetic algorithm for reducing the dimension of a feature space.
-
*'''Данные.''' Исторические данные по кредитам наличностью.
+
*'''Data''' Historical data on cash loans.
-
*'''References:.''' SAS, найти еще.
+
* '''References:''' SAS, find more.
-
===Task 15. Surname2013InverseVAR===
+
===15. 2013 InverseVAR===
-
*'''Название.''' Векторная авторегрессия and управление макроэкономическими показателями.
+
*'''Name''' Vector autoregression and management of macroeconomic indicators.
-
*'''Проблема.''' Решить обратную задачу прогнозирования. По заданному состоянию экономики задать такое значение управляемых макроэкономических показателей, которое бы привело экономику в желаемое состояние.
+
*'''Problem''' Solve the inverse forecasting problem. Given the current state of the economy, find values of the controllable macroeconomic indicators that would bring the economy to the desired state.
-
*'''Данные.''' Макроэкономические показатели России за последние 16 лет.
+
*'''Data''' Macroeconomic indicators of Russia over the past 16 years.
-
*'''References:.''' Работы С.А. Айвазяна.
+
* '''References:''' Works of S.A. Ayvazyan.
-
===Task 12. Surname2013DistanceVisualizing===
+
===12. 2013 DistanceVisualizing===
-
*'''Название.''' Визуализация матрицы парных расстояний в тематическом моделировании.
+
*'''Name''' Visualization of Pair Distance Matrix in Topic Modeling.
-
*'''Проблема.''' Отобразить тезисы конференции на плоскости с сохранением кластеров.
+
*'''Problem''' Map the conference abstracts onto a plane while preserving the clusters.
-
*'''Данные.''' Тезисы конференции EURO.
+
*'''Data''' EURO conference abstracts.
-
*'''References:.''' Зиновьев на ML, References: по теме.
+
*'''References:''' Zinoviev on ML, references on the topic.
-
*'''Предлагаемый алгоритм.''' PCA.
+
*'''Proposed algorithm''' PCA.
-
*'''Basic algorithm.''' Алгоритм с минимизацией энергетического критерия.
+
*'''Basic algorithm''' Algorithm with minimization of the energy criterion.
-
===Task 13. Surname2013RhoNets===
+
===13. 2013 RhoNets===
-
*'''Название.''' Сравнение быстрых алгоритмов кластеризации.
+
*'''Name''' Comparison of Fast Clustering Algorithms.
-
*'''Проблема.''' Сравнить алгоритм кластеризации с использованием $\rho$-сетей and быстрый алгоритм $k$-средних.
+
*'''Problem''' Compare a clustering algorithm based on $\rho$-nets with the fast $k$-means algorithm.
-
*'''Данные.''' Была выборка аминокислотных последовательностей. Нужна тестовая выборка из UCI или из работ по сравнению.
+
*'''Data''' Previously, a sample of amino acid sequences was used. A test sample from UCI or from comparison papers is needed.
-
*'''References:.''' $k$-средних, $\varepsilon$-сети.
+
*'''References:''' $k$-means, $\varepsilon$-nets.
-
*'''Предлагаемый алгоритм.''' $\rho$-сети.
+
*'''Proposed algorithm''' $\rho$-nets.
-
*'''Basic algorithm.''' $k$-средних.
+
*'''Basic algorithm''' $k$-means.
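For reference, the $k$-means baseline named above can be sketched as the plain Lloyd iteration. This is a minimal illustration only: it is not the fast $k$-means variant the problem refers to, nor the $\rho$-based algorithm, and the names `squared_distance` and `k_means`, as well as passing the initial centers explicitly, are assumptions made for this example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def k_means(points, initial_centers, max_iter=100):
    """Plain Lloyd iteration; returns (centers, labels)."""
    centers = [tuple(c) for c in initial_centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        # Update step: move every center to the mean of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(m[d] for m in members) / len(members)
                          for d in range(len(old)))
                )
            else:
                new_centers.append(old)  # an empty cluster keeps its center
        if new_centers == centers:
            break  # converged: the centers no longer move
        centers = new_centers
    return centers, labels
```

Passing the initial centers explicitly keeps the sketch deterministic; a comparison study would normally repeat the run over several random initializations.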
-
===Task 17. Surname2013FeatureSelection===
+
===17. 2013 FeatureSelection===
-
*'''Название.''' Сравнительный анализ алгоритмов выбора признаков: точность, устойчивость, сложность регрессионных моделей.
+
*'''Name''' Comparative analysis of feature selection algorithms: accuracy, stability, complexity of regression models.
-
*'''Проблема.''' Построить ряд тестовых задач для сравнения алгоритмов. Предложить алгоритм выбора признаков с анализом ковариационных матриц, основанных на методе Белсли.
+
*'''Problem''' Build a series of test problems to compare algorithms. Propose a feature selection algorithm with the analysis of covariance matrices based on the Belsley method.
-
*'''Данные.''' Синтетические.
+
*'''Data''' Synthetic.
-
*'''References:.''' Леонтьева/Стрижов, поискать современные обзоры.
+
* '''References:''' Leontieva/Strijov; look for recent surveys.
-
===Task 1. Surname2013Txt2Bib===
+
===1. 2013 Txt2Bib===
-
*'''Название.''' Разметка библиографических записей с помощью логических алгоритмов.
+
*'''Name''' Marking up bibliographic records using logical algorithms.
-
*'''Проблема.''' Требуется создать алгоритм разметки текста. Новизна в постановке задачи. Актуальность в том, что будет создана более полная библиотека логических выражений and выбран адекватный алгоритм.
+
*'''Problem''' It is required to create a text markup algorithm. The novelty lies in the problem statement. The relevance is that a more complete library of logical expressions will be created and an adequate algorithm will be selected.
-
*'''Данные.''' В MLAlgorithms.
+
*'''Data''' In MLAlgorithms.
-
*'''References:.''' Работа А. Ивановой and все, что есть по теме за последние два года.
+
* '''References:''' The work of A. Ivanova and everything that is on the topic over the past two years.
-
*'''Предлагаемый алгоритм.''' Выбрать из логических алгоритмов классификации; дополнительно кластеризация.
+
*'''Proposed algorithm''' Choose from logical classification algorithms; optional clustering.
-
*'''Basic algorithm.''' Тупиковые покрытия.
+
*'''Basic algorithm''' Dead-end (irredundant) covers.
-
===Task 14. Surname2013FindTheFormula (Risky)===
+
===14. 2013 FindTheFormula (Risky)===
-
*'''Название.''' Алгоритм поиска текстовых структур в документе.
+
*'''Name''' Algorithm for searching text structures in a document.
-
*'''Проблема.''' Предложить алгоритм, который бы в документе TeX искал бы формулы, эквивалентные заданной.
+
*'''Problem''' Suggest an algorithm that would look for formulas in a TeX document that are equivalent to a given one.
-
*'''Данные.''' Синтетические, коллекция MLAlgorithms.
+
*'''Data''' Synthetic, MLAlgorithms collection.
-
*'''References:.''' Надо искать. Поиск по химическим соединениям в WoK работает неплохо.
+
*'''References:''' To be found. Search by chemical compounds in WoK works well.
-
===Task 18. Surname2013ScannedImage (Image)===
+
===18. 2013 ScannedImage (Image)===
-
*'''Название.''' Определение типа бланка.
+
*'''Name''' Form type identification.
-
*'''Проблема.''' Определить тип бланка по скану.
+
*'''Problem''' Determine the type of form from the scan.
-
*'''Данные.''' Набор изображений в TIF.
+
*'''Data''' A set of images in TIF.
-
===Task 19. Surname2013SpectrumImage (Image)===
+
===19. 2013 SpectrumImage (Image)===
-
*'''Название.''' Определение напечатанного изображения.
+
*'''Name''' Identification of a printed image.
-
*'''Проблема.''' Сделать спектральное преобразование изображения, исследовать спектр.
+
*'''Problem''' Apply a spectral transform to the image and examine the spectrum.
-
*'''Данные.''' Набор изображений в JPG, отнесенных в два класса.
+
*'''Data''' A set of JPG images classified into two classes.
{|class="wikitable"
{|class="wikitable"
-
! Task
+
! Problem
-
! Кто делает
+
! Assigned to
|-
|-
-
|Дан набор трехэлементных векторов. Первые два элемента нарисовать по осям абсцисс and ординат. Третий элемент отобразить как круг с пропорциональным радиусом. Пропорции подобрать исходя из чувства прекрасного. Сравнить полученный график с plot3. Что лучше?
+
|A set of three-element vectors is given. Plot the first two elements along the abscissa and ordinate axes. Display the third element as a circle with a proportional radius. Choose the proportions by aesthetic judgment. Compare the resulting plot with plot3. Which is better?
-
|Митяшов Андрей
+
|Mityashov Andrey
|-
|-
-
|Дан пятиэлементный вектор. Нарисовать [http://ru.wikipedia.org/wiki/%D0%9B%D0%B8%D1%86%D0%B0_%D0%A7%D0%B5%D1%80%D0%BD%D0%BE%D0%B2%D0%B0 лицо Чернова]. Что лучше - лицо Чернова или [https://www.google.com/search?q=%D0%9B%D0%B5%D0%BF%D0%B5%D1%81%D1%82%D0%BA%D0%BE%D0%B2%D0%B0%D1%8F+%D0%B4%D0%B8%D0%B0%D0%B3%D1%80%D0%B0%D0%BC%D0%BC%D0%B0%3F&aq=f&oq=%D0%9B%D0%B5%D0%BF%D0%B5%D1%81%D1%82%D0%BA%D0%BE%D0%B2%D0%B0%D1%8F+%D0%B4%D0%B8%D0%B0%D0%B3%D1%80%D0%B0%D0%BC%D0%BC%D0%B0%3F&aqs=chrome.0.57j0l3.7857&sourceid=chrome&ie=UTF-8 диаграмма]?
+
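|Given a five-element vector. Draw a [http://ru.wikipedia.org/wiki/%D0%9B%D0%B8%D1%86%D0%B0_%D0%A7%D0%B5%D1%80%D0%BD%D0%BE%D0%B2%D0%B0 Chernoff face]. Which is better: a Chernoff face or a [https://www.google.com/search?q=%D0%9B%D0%B5%D0%BF%D0%B5%D1%81%D1%82%D0%BA%D0%BE%D0%B2%D0%B0%D1%8F+%D0%B4%D0%B8%D0%B0%D0%B3%D1%80%D0%B0%D0%BC%D0%BC%D0%B0%3F&aq=f&oq=%D0%9B%D0%B5%D0%BF%D0%B5%D1%81%D1%82%D0%BA%D0%BE%D0%B2%D0%B0%D1%8F+%D0%B4%D0%B8%D0%B0%D0%B3%D1%80%D0%B0%D0%BC%D0%BC%D0%B0%3F&aqs=chrome.0.57j0l3.7857&sourceid=chrome&ie=UTF-8 petal diagram]?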
-
|Неклюдов Кирилл
+
|Neklyudov Kirill
|-
|-
-
|Разобраться как работает regexp в Матлабе. Сделать код, который выделяет все, что находится внутри скобок некоторого арифметического выражения.
+
|Understand how regexp works in Matlab. Write code that extracts everything inside the parentheses of an arithmetic expression.
-
|Рыскина Мария
+
|Ryskina Maria
|-
|-
-
|Разобраться как работает суперпозиция функций. С помощью функции @ породить все возможные полиномы от n переменных степени не более p.
+
|Understand how function superposition (composition) works. Using the @ operator, generate all possible polynomials in n variables of degree at most p.
-
|Шубин Андрей
+
|Shubin Andrey
|-
|-
-
|Разобраться как работает web-соединение and regexp. Сделать поисковый запрос по теме and сверстать из нее запись BibTeX.
+
|Understand how a web connection and regexp work. Run a search query on a topic and compose a BibTeX entry from the result.
|
|
|-
|-
-
|Дан временной ряд из m + 1 (случайных) точек. Приблизить m его первых точек полиномами степени от 1 до m. Вычислить среднюю ошибку в точках. Какая степень дает наибольшую ошибку?
+
|Given a time series of m + 1 (random) points. Approximate its first m points by polynomials of degree from 1 to m. Compute the mean error over the points. Which degree gives the largest error?
-
|Воронов Сергей
+
|Voronov Sergey
|-
|-
-
|Повернуть and увеличить плоскую фигуру, сделать эффект приближения с вращением по кадрам.
+
|Rotate and zoom in on a flat figure, make a zoom effect with frame-by-frame rotation.
-
|Антипова Наташа
+
|Antipova Natasha
|-
|-
-
|Заданы две матрицы. Проверить, есть ли в них пересечение – подматрица?
+
|Two matrices are given. Check whether they have an intersection, that is, a common submatrix.
-
|Вдовина Евгения
+
|Vdovina Evgenia
|-
|-
-
|Дана выборка из нескольких признаков, без целевого вектора Y. Например, эта https://dmba.svn.sourceforge.net/svnroot/dmba/Data/Diabets_LARS.csv Требуется указать тот признак, который хорошо описывается (в терминах линейной регрессии) остальными (такой признак обычно исключают из выборки).
+
|A sample of several features is given, without a target vector Y. For example, this one: https://dmba.svn.sourceforge.net/svnroot/dmba/Data/Diabets_LARS.csv Identify the feature that is well described (in terms of linear regression) by the rest (such a feature is usually excluded from the sample).
-
|Гринчук Олег
+
|Grinchuk Oleg
|-
|-
-
|Дана выборка, в которой есть несколько выбросов. Известно, что она может быть описана одномерной линейной регрессией. Требуется переборным путем найти выбросы. Показать их на графике.
+
|Given a sample that has several outliers. It is known that it can be described by univariate linear regression. Find the outliers by exhaustive search. Show them on a plot.
-
|Пушняков Алексей
+
|Pushnyakov Alexey
|-
|-
-
|Дана выборка из двух классов на плоскости. Требуется найти все объекты, которые залезли в чужой класс. Показать их на графике.
+
|Given a sample of two classes on a plane. Find all the objects that fall inside the region of the other class. Show them on a plot.
-
|Кащеева Мария
+
|Kashcheeva Maria
|-
|-
-
|На вход подается матрица инцидентности дерева. Функция возвращает список (вектор) вершин в порядке их посещения.
+
|The input is the incidence matrix of the tree. The function returns a list (vector) of vertices in the order they were visited.
-
|Ибраимова Айжан
+
|Ibraimova Aizhan
|-
|-
-
|Классифицировать цветы ириса произвольным алгоритмом, нарисовать на плоскости «самую наглядную» пару признаков, указать, что классифицировалось правильно, а что – нет.
+
|Classify iris flowers with an arbitrary algorithm, plot the most illustrative pair of features on the plane, and indicate what was classified correctly and what was not.
-
|Яшков Даниил
+
|Yashkov Daniel
|-
|-
-
|Дан временной ряд. По его вариационному ряду построить гистограмму из n перцентилей, нарисовать ее. Какое значение временного ряда встречается чаще всего?
+
|Given a time series. From its sorted values (order statistics), build a histogram with n percentile bins and plot it. Which value of the time series occurs most often?
|
|
|-
|-
-
|Создать несколько групп точек на плоскости and выполнить их кластеризацию, используя любой алгоритм на выбор. Визуализировать полученные кластеры. Посчитать среднее внутрикластерное расстояние для одного кластера.
+
|Create several groups of points on the plane and perform their clustering using any algorithm of your choice. Visualize the resulting clusters. Calculate the average intracluster distance for one cluster.
-
|Перекрестенко Дмитрий
+
|Perekrestenko Dmitry
|-
|-
-
|Загрузить звуковой ряд, желательно несколько нот фортепиано. Выделить and проиграть определенную ноту.
+
|Load an audio signal, preferably a few piano notes. Isolate and play a specific note.
|
|
|-
|-
-
|Загрузить видеоряд. Удалить каждый второй кадр. Обработать по вкусу. Записать обратно.
+
|Load a video. Delete every second frame. Process it to taste. Write it back.
-
|Бырдин Александр
+
|Byrdin Alexander
|-
|-
-
|Показать разницу в скорости выполнения матричных операций and операций в цикле. Показать эффективность параллельных вычислений (parfor and другие).
+
|Show the difference in the speed of performing matrix operations and operations in a loop. Show the efficiency of parallel computing (parfor and others).
-
|Катруца Александр
+
|Alexander Katrutsa
|-
|-
-
|Предложить варианты визуализации четырехмерных векторов and пространств. Сравнить их со встроенной функцией.
+
|Suggest options for visualization of four-dimensional vectors and spaces. Compare them to a built-in function.
|
|
|-
|-
-
|Сгладить временной ряд скользящим средним. Взять несколько окон разной длины and наложить результат на графике друг на друга.
+
|Smooth the time series with a moving average. Take several windows of different lengths and overlay the results on a single plot.
-
|Чинаев Николай
+
|Chinaev Nikolai
|-
|-
-
|Нарисовать поверхность. Каждую точку поверхности заменить медианой от n соседей. Нарисовать результат.
+
|Draw a surface. Replace each point of the surface with a median of n neighbors. Draw the result.
-
|Костин Александр
+
|Kostin Alexander
|-
|-
|}
|}
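As an illustration of one task from the table above (smoothing a time series with a moving average over windows of several lengths), here is a minimal sketch. It is written in Python rather than Matlab, and the function name `moving_average` and the sample series are assumptions made for the example, not part of the original assignment.

```python
def moving_average(series, window):
    """Trailing moving average; returns len(series) - window + 1 values."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    total = sum(series[:window])          # sum of the first window
    out = [total / window]
    for i in range(window, len(series)):
        # slide the window: add the new point, drop the oldest one
        total += series[i] - series[i - window]
        out.append(total / window)
    return out


# Hypothetical sample data; each window length gives one curve to overlay.
series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
smoothed = {w: moving_average(series, w) for w in (2, 3)}
```

Each entry of `smoothed` can then be drawn on the same axes; longer windows give shorter and smoother curves.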
-
=2012=
+
==2012==
 +
Topic modeling: publication in a journal on the Higher Attestation Commission (VAK) list
-
==Тематическое моделирование: публикация в журнале ВАК==
 
-
 
-
{{tip|Статус публикации работ см. внизу страницы, раздел "Публикация работ". Ожидается публикация всех работ до конца мая 2013.}}
 
-
 
-
== Список задач ==
 
{|class="wikitable"
{|class="wikitable"
|-
|-
-
! Task name
+
! Title
! Author
! Author
-
! Link to work
+
! Link
! Comments
! Comments
|-
|-
-
|Вычисление интегральных индикаторов в ранговых шкалах методами ко-кластеризации
+
|Calculation of integral indicators in rank scales by co-clustering methods
-
|Медведникова Мария
+
|Medvednikova Maria
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group974/Medvednikova2012CoIndicator]
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group974/Medvednikova2012CoIndicator]
-
|Опубликовано
+
|Published
|-
|-
-
|Иерархическая тематическая кластеризация тезисов and визуализация
+
|Hierarchical thematic abstract clustering and visualization
-
|Кузьмин Арсентий
+
|Arsenty Kuzmin
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group974/Kuzmin2012ThematicClustering]
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group974/Kuzmin2012ThematicClustering]
-
|Опубликовано
+
|Published
|-
|-
-
|Совместный выбор объектов and признаков в Taskх многоклассовой классификации.
+
|Joint selection of objects and features in multiclass classification problems.
-
|Адуенко Александр
+
|Alexander Aduenko
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group974/Aduenko2012CovSelection]
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group974/Aduenko2012CovSelection]
-
|Опубликовано
+
|Published
|-
|-
-
|Построение иерархических тематических моделей
+
|Building hierarchical topic models
-
|Цыганова Светлана
+
|Tsyganova Svetlana
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group974/Tsyganova2012TopicIerarhy]
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group974/Tsyganova2012TopicIerarhy]
-
|Опубликовано
+
|Published
|-
|-
-
|Выбор признаков в Taskх структурной регрессии
+
|Feature selection in structural regression problems
-
|Варфоломеева Анна
+
|Varfolomeeva Anna
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group974/Varfolomeeva2012StructureLearning]
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group974/Varfolomeeva2012StructureLearning]
-
|Принято
+
|Accepted
|-
|-
-
|Статистические критерии однородности and согласия для сильно разреженных дискретных распределений
+
|Statistical tests for homogeneity and goodness of fit for highly sparse discrete distributions
-
|Целых Влада
+
|Vlada Tselykh
|
|
[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group974/Celyh2012SparceDistribution]
[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group974/Celyh2012SparceDistribution]
-
|Опубликовано
+
|Published
|-
|-
-
|Построение логических правил при разметке текстов
+
|Building logical rules when marking up texts
-
|Иванова Алина
+
|Ivanova Alina
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group974/Ivanova2012LogicStructure]
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group974/Ivanova2012LogicStructure]
-
|Принято
+
|Accepted
|-
|-
-
|Проверка адекватности тематической модели
+
|Checking the adequacy of the topic model
-
|Степан Лобастов
+
|Stepan Lobastov
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group974/Lobastov2012LatentModels]
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group974/Lobastov2012LatentModels]
-
|Редакция
+
|Editing
|-
|-
|}
|}
 +
===1. 2012===
 +
*'''Name:''' CoRegression. Calculation of integral indicators in rank scales by co-clustering methods.
 +
*'''Teaser:''' Construction of an integral assessment of the effectiveness of scientific activity.
 +
*'''Data:''' Synthetic. PRND of employees. An author-journal table with the number of articles by the selected authors in each journal.
 +
*'''References:''' [[Media:Voron-2008-11-10-cf.pdf|Vorontsov K. V. «Collaborative filtering»]].
 +
*'''Keywords:''' h-index, co-clustering, collaborative filtering.
 +
*'''Proposed algorithm:''' Joint regression (invent or find ready-made).
 +
*'''Basic algorithm:''' The computed impact factor (IF) of journals and the h-index of authors. (Co-clustering or adaptive filtering is not suitable for comparison.)
 +
*'''Problem:''' [[Media:Strijov2012SciRating.pdf‎|Description in file.]] Additionally: when building a rating, the set of authors and journals has to be split into clusters. The cluster size needs to be related to the "Assessment of the involvement of the author/journal in the scientific community". This assessment should be included in the rating (or, as a last resort, presented separately).
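The baseline above relies on the h-index of authors. A minimal sketch of that indicator, assuming the input is simply a list of per-article citation counts for one author (the function name `h_index` is chosen for the example):

```python
def h_index(citations):
    """Largest h such that at least h articles have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank      # the top `rank` articles all have >= rank citations
        else:
            break
    return h
```

For example, an author with citation counts 10, 8, 5, 4, 3 has four articles with at least four citations each, but not five with at least five.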
-
===1. 2012CoRegression===
+
===2. 2012===
-
*'''Name:''' Вычисление интегральных индикаторов в ранговых шкалах методами ко-кластеризации.
+
*'''Name:''' ExpertRanking. Reconciling rank-based expert estimates.
-
*'''Тизер:''' Построение интегральной оценки эффективности научной деятельности.
+
*'''Teaser:''' Voting ranking methods (selection of literary works, selection of a limited committee).
-
*'''Data:''' Синтетические. ПРНД сотрудников. Таблица авторы-журналы and число статей выбранных авторов в журналах.
+
*'''Data:''' Internet voting for a list of books, voting without co-optation.
-
*'''References:''' [[Media:Voron-2008-11-10-cf.pdf|Vorontsov K. V. «Коллаборативная фильтрация»]].
+
*'''References:''' Article in Notices of the AMS, 2008, 55(4). A literature review on this problem will be needed.
-
*'''Ключевые слова:''' индекс Хирша, ко-кластеризация, коллаборативная фильтрация.
+
*'''Proposed algorithm:''' Finding the intersection of cones and estimating the effective dimension of the space, or another algorithm.
-
*'''Предлагаемый алгоритм''' Совместная регрессия (придумать или найти готовую).
+
*'''Basic algorithm:''' Kemeny Median and other algorithms.
-
*'''Basic algorithm:''' Вычисленный IF журналов and h-index авторов. (Кокластеризация или адаптивная фильтрация для сравнения на годится).
+
*'''Problem:''' It is required to illustrate and study the properties of the committee selection algorithm. In particular, address the following problem: the ranking of ''n'' selected candidates differs from the ranking of ''n+k'' selected candidates in a single vote choosing from ''N'' candidates. It may also be necessary to discuss Arrow's paradox.
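The Kemeny median named as the basic algorithm can be sketched as an exhaustive search for the ranking that minimizes the total Kendall tau distance to the voters' rankings. This is only an illustration under the assumption that every voter ranks the same small set of candidates without ties; the search is factorial in the number of candidates, and the function names are chosen for the example.

```python
from itertools import combinations, permutations


def kendall_tau(r1, r2):
    """Number of candidate pairs that the two rankings order differently."""
    pos1 = {c: i for i, c in enumerate(r1)}
    pos2 = {c: i for i, c in enumerate(r2)}
    return sum(
        1
        for a, b in combinations(r1, 2)
        if (pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) < 0
    )


def kemeny_median(rankings):
    """Brute-force Kemeny median: feasible only for a handful of candidates."""
    candidates = sorted(rankings[0])
    return min(
        permutations(candidates),
        key=lambda perm: sum(kendall_tau(perm, r) for r in rankings),
    )
```

When several rankings tie for the minimum, this sketch returns the first one in lexicographic order, which is an arbitrary choice.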
-
*'''Проблема:''' [[Media:Strijov2012SciRating.pdf‎|Описание в файле.]] Дополнительно: при создании рейтинга встает проблема разбиения множества авторов and журналов на кластеры. Размер кластера требуется соотнести с "Оценкой вовлеченности автора/журнала в научное сообщество". Эта оценка должна войти в рейтинг (в крайнем случае, должна быть представлена отдельно).
+
-
===2. 2012ExpertRanking===
+
===3. 2012===
-
*'''Name:''' Согласование ранговых Expertных оценок.
+
*'''Name:''' StructureRegression. Feature selection in structural regression problems
-
*'''Тизер:''' Методы ранжирования при голосовании (выборе литературных произведений, выборе ограниченного комитета).
+
*'''Teaser:''' Structural regression algorithm for tagging bibliographic lists, abstracts and other structured texts.
-
*'''Data:''' Интернет-голосование за список книг, голосование без кооптации.
+
*'''Data:''' bibliographic records from the BibTeX collection on CS.
-
*'''References:''' Статья в Notices AMS, 2008, 55(4). Нужно будет сделать обзор литературы по этой проблеме.
+
*'''References:''' works by Jaakkola and his team, possibly code.
-
*'''Предлагаемый алгоритм:''' Нахождение пересечения конусов and оценка эффективной размерности пространства или другой алгоритм.
+
*'''Proposed algorithm:''' Structural regression.
-
*'''Basic algorithm:''' Медиана Кемени and другие алгоритмы.
+
*'''Basic algorithm:''' described by Valentin.
-
*'''Проблема:''' Требуется проиллюстрировать and изучить свойства алгоритма выбора комитета. В частности, осветить следующую проблему. Рейтинг ''n'' выбранных кандидатов отличается от рейтинга ''n+k'' выбранных кандидатов, при единственном голосовании с выбором из ''N'' кандидатов. Возможно, требуется осветить парадокс Эрроу.
+
*'''Required:''' segment the input text and assign each segment a field and each group of fields a bibliographic record type.
-
===3. 2012StructureRegression===
+
===4. 2012===
-
*'''Name:''' Выбор признаков в Taskх структурной регрессии
+
*'''Name:''' LogicClassification. Building logical rules when marking up texts
-
*'''Тизер:''' Алгоритм структурной регрессии для разметки библиографических списков, тезисов and других структурированных текстов.
+
*'''Teaser:''' Structural regression algorithm for tagging bibliographic lists, abstracts and other structured texts.
-
*'''Data:''' библиографические записи из BibTeX collection on CS.
+
*'''Data:''' bibliographic records from BibTeX collection on CS / conference abstracts, other marked up texts.
-
*'''References:''' работы Jaakkola and его команды, возможно, код.
+
*'''References:''' works by Inyakin, Chuvilin, Kudinov.
-
*'''Предлагаемый алгоритм:''' Структурная регрессия.
+
*'''Proposed algorithm:''' Decision trees, dead-end coverings.
-
*'''Basic algorithm:''' описан Валентином.
+
*'''Basic algorithm:''' described by Valentin.
-
*'''Требуется:''' сегментировать входной текст and поставить в соответствие каждому сегменту поле, а каждой группе полей - тип библиографической записи.
+
*'''Required:''' train a text markup model using decision rules over RegExp strings.
-
===4. 2012LogicClassification===
+
=== 5. 2012===
-
*'''Name:''' Построение логических правил при разметке текстов
+
* '''Title:''' RankClustering. Rank clustering and dynamic alignment algorithms.
-
*'''Тизер:''' Алгоритм структурной регрессии для разметки библиографических списков, тезисов and других структурированных текстов.
+
* '''Teaser:''' Search for duplicates in bibliographic records. Dynamic alignment when finding duplicate bibliographic records.
-
*'''Data:''' библиографические записи из BibTeX collection on CS / тезисы конференций, другие размеченные тексты.
+
* '''Data:''' Corrupted and incorrect bibliographic records (databases of student essays). [https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Data2012TextMining Over 1000 bibliographic entries from data mining articles/books.]
-
*'''References:''' работы Инякина, Чувилина, Кудинова.
+
* '''References:''' [http://www.matbio.org/2012/Strijov2012(7_345).pdf Strijov V.V. et al. "Metric Sequence Clustering"], work on fast k-Means clustering.
-
*'''Предлагаемый алгоритм:''' Решающие деревья, тупиковые покрытия.
+
* '''Keywords:''' DTW — modifications, k-Means.
-
*'''Basic algorithm:''' описан Валентином.
+
* '''Proposed algorithm:''' Rank clustering algorithm.
-
*'''Требуется:''' обучить модель, разметки текста, используя решающие правила над RegExp - строками.
+
* '''Base algorithm:''' k-Means and its high performance variations.
 +
* '''Problem:''' It is required to modify the procedure for calculating the cost of the alignment path in such a way as to detect and take into account the invariants of permutations (and allowable modifications) of parts of the bibliographic record.
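For reference, the unmodified cost of a dynamic alignment path (classic DTW) can be sketched as follows. This is the baseline only: the permutation-invariant modification requested above is not implemented, and the function name `dtw_cost` and the default element distance are assumptions made for the example.

```python
def dtw_cost(a, b, dist=lambda x, y: abs(x - y)):
    """Minimal total cost of a monotone alignment path between sequences a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the best cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = dist(a[i - 1], b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a[i-1] is matched to b[j-1] again
                cost[i][j - 1],      # b[j-1] is matched to a[i-1] again
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]
```

For bibliographic strings, a character mismatch indicator such as `lambda x, y: 0 if x == y else 1` can be passed as `dist`; repeated characters then align at zero cost, while swapped fields of a record are still penalized, which is exactly what the requested modification is meant to address.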
-
=== 5. 2012RankClustering ===
+
===6. 2012===
-
* '''Name:''' Ранговая кластеризация and алгоритмы динамического выравнивания.
+
*'''Name:''' ThematicClustering. Checking the adequacy of the topic model.
-
* '''Тизер:''' Поиск дубликатов в библиографических записях. Динамическое выравнивание при нахождении дубликатов библиографических записей.
+
*'''Teaser:''' Methods for detecting incorrect thematic classification on conference materials. Methods for constructing a thematic model similar to the given one. Article clustering, hierarchical topic models with topic interpretability. Hierarchical thematic clustering of abstracts.
-
* '''Data:''' Испорченные and некорректные библиографические записи (базы студенческих рефератов). [https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Data2012TextMining Более 1000 библиографических записей из статей/книг по анализу данных.]
+
*'''Data:''' [https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Data2012TextMining Texts of Euro 2012 conference abstracts, 1862 abstracts.]
-
* '''References:''' [http://www.matbio.org/2012/Strijov2012(7_345).pdf Стрижов et al. «Метрическая кластеризация последовательностей»], работы по быстрой кластеризации k-Means.
+
*'''References:''' works on clustering and on defining distances between texts treated as bags of words.
-
* '''Ключевые слова:''' DTW — модификации, k-Means.
+
*'''Keywords:''' hierarchical clustering, text similarity metrics.
-
* '''Предлагаемый алгоритм:''' Алгоритм ранговой кластеризации.
+
*'''Proposed algorithm:''' k-means hierarchical clustering algorithm + k-NN classification.
-
* '''Basic algorithm:''' k-Means and его высокопроизводительные вариации.
+
-
* '''Проблема:''' Требуется модифицировать процедуру вычисления стоимости пути выравнивания так, чтобы обнаруживать and учитывать инварианты перестановок (и допустимых модицикаций) частей библиографической записи.
+
-
 
+
-
===6. 2012ThematicClustering===
+
-
*'''Name:''' Проверка адекватности тематической модели.
+
-
*'''Тизер:''' Методы выявления некорректной тематической классификации на материалах конференции. Методы построения тематической модели, сходной с заданной. Кластеризация статей, иерархические тематические модели с тематической интерпретируемостью. Иерархическая тематическая кластеризация тезисов.
+
-
*'''Data:''' [https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Data2012TextMining Тексты тезисов конференции Евро-2012, 1862 тезиса.]
+
-
*'''References:''' по кластеризации, and введению расстояний между текстами как мешками слов.
+
-
*'''Ключевые слова:''' иерархическая кластеризация, метрики сходства текстов.
+
-
*'''Предлагаемый алгоритм:''' алгоритм иерархической кластеризации k-means + классификация k-NN.
+
*'''Basic algorithm:''' k-Means
*'''Basic algorithm:''' k-Means
-
*'''Проблема:''' Требуется построить тематическую модель методом кластеризации and проверить корректность текущей классификации текстов. Для этого выполняется (иерархическая) кластеризация текстов, каждому кластеру ставится в соответствие название темы, соответствующее большинству статей из кластера. После построения модели каждая статья проверяется and относится к своей или к чужой теме.
+
*'''Problem:''' It is required to build a thematic model by clustering and to check the correctness of the current text classification. To do this, (hierarchical) clustering of texts is performed, and each cluster is assigned the topic name shared by the majority of its articles. After the model is built, each article is checked and assigned either to its own topic or to a different one.
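The checking step described above (label each cluster with its majority topic, then flag articles whose declared topic disagrees) can be sketched as follows. The clustering itself is assumed to be done already, and the names `flag_mismatched`, `cluster_ids` and `declared_topics` are hypothetical.

```python
from collections import Counter


def flag_mismatched(cluster_ids, declared_topics):
    """Return indices of articles whose declared topic differs from the
    majority topic of the cluster they were assigned to."""
    counts = {}
    for cid, topic in zip(cluster_ids, declared_topics):
        counts.setdefault(cid, Counter())[topic] += 1
    # name each cluster after its most frequent declared topic
    majority = {cid: c.most_common(1)[0][0] for cid, c in counts.items()}
    return [
        i
        for i, (cid, topic) in enumerate(zip(cluster_ids, declared_topics))
        if majority[cid] != topic
    ]
```

If two topics are equally frequent in a cluster, the sketch keeps the one seen first; a real study would need an explicit rule for such ties.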
-
===7. 2012ThematicHierarchy===
+
===7. 2012===
-
*'''Name:''' Построение иерархических тематических моделей.
+
*'''Name:''' ThematicHierarchy. Building hierarchical topic models.
-
*'''Тизер:''' Иерархическая тематическая кластеризация тезисов. Построение тематической модели на материалах конференции.
+
*'''Teaser:''' Hierarchical thematic clustering of abstracts. Building a thematic model based on the materials of the conference.
-
*'''Data:''' [https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Data2012TextMining Тексты тезисов.]
+
*'''Data:''' [https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Data2012TextMining Texts of abstracts.]
-
*'''References:''' иерархические модели, [http://www.cs.princeton.edu/~mimno/topics.html topic modelling].
+
*'''References:''' hierarchical models, [http://www.cs.princeton.edu/~mimno/topics.html topic modeling].
-
*'''Ключевые слова:''' иерархическое тематическое моделирование.
+
*'''Keywords:''' hierarchical topic modeling.
-
*'''Предлагаемый алгоритм:''' иерархические модели, оценка распределения по темам.
+
*'''Proposed algorithm:''' hierarchical models, estimation of the distribution over topics.
-
*'''Basic algorithm:''' PLSA--LDA.
+
*'''Basic algorithm:''' PLSA--LDA.
-
*'''Проблема:''' Требуется построить иерархическую тематическую модель путем вычисления статистических оценок функций распределения слов по темам.
+
*'''Problem:''' It is required to build a hierarchical topic model by calculating statistical estimates of the distribution functions of words by topic.
-
===8. 2012ThematicVisualizing===
+
===8. 2012===
-
*'''Name:''' Визуализация иерархических тематических моделей.
+
*'''Name:''' ThematicVisualizing. Visualization of hierarchical thematic models.
-
*'''Тизер:''' На материалах конференции EURO.
+
*'''Teaser:''' Based on the materials of the EURO conference.
-
*'''Data:''' [https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Data2012TextMining Тексты тезисов конференции Евро-2012.]
+
*'''Data:''' [https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Data2012TextMining Texts of Euro 2012 conference abstracts.]
-
*'''References:''' многомерное шкалирование, кластеризация.
+
*'''References:''' multidimensional scaling, clustering.
-
*'''Ключевые слова:''' визуализация графов.
+
*'''Keywords:''' graph visualization.
-
*'''Предлагаемый алгоритм:'''
+
*'''Proposed algorithm:'''
*'''Basic algorithm:''' --
*'''Basic algorithm:''' --
-
*'''Проблема:''' Требуется визуализировать матрицу парных расстояний таким образом, чтобы можно было принять решение о
+
*'''Problem:''' It is required to visualize the matrix of pairwise distances so that one can decide on
-
** корректировки названий тем/подтем конференции,
+
*# correction of the names of topics/subtopics of the conference,
-
** переносе тезиса из одной темы в другую,
+
*# moving an abstract from one topic to another,
-
** адекватности соответствия модельной and фактический кластеризации.
+
*# the adequacy of the correspondence between the model clustering and the actual one.
-
===9. 2012CovSelection===
+
===9. 2012===
-
*'''Name:''' Совместный выбор объектов and признаков в Taskх многоклассовой классификации.
+
*'''Name:''' CovSelection. Joint selection of objects and features in multiclass classification problems.
-
*'''Тизер:''' Ранжирование поисковых выдач Яндекса.
+
*'''Teaser:''' Yandex search results ranking.
-
*'''Data:''' Яндекс – математика.
+
*'''Data:''' Yandex - mathematics.
-
*'''References:''' Бишоп, Стрижов.
+
*'''References:''' Bishop; Strijov V.V.
-
*'''Ключевые слова:''' логистическая регрессия, выбор признаков, фильтрация объектов.
+
*'''Keywords:''' logistic regression, feature selection, object filtering.
-
*'''Предлагаемый алгоритм:''' Совместный выбор путем анализа ковариационных матриц.
+
*'''Proposed algorithm:''' Joint selection by analysis of covariance matrices.
*'''Basic algorithm:''' SVM.
*'''Basic algorithm:''' SVM.
-
*'''Проблема:''' Взять матрицу '''T''', с. 209 Бишопа, сделать многоклассовую классификацию (с. 208). Проверить на синтетической выборке того же формата, что and данные Яндекса. (Для сравнения запустить алгоритм SVM на этой же выборке.Связать с выбором признаков.) Оценить матрицы гиперпараметров многоклассовой регрессионной модели. Предложить пошаговый алгоритм совместного выбора с максимизацией правдоподобия модели.
+
*'''Problem:''' Take the matrix '''T''' (Bishop, p. 209) and build a multiclass classifier (p. 208). Test it on a synthetic sample of the same format as the Yandex data. (For comparison, run an SVM on the same sample and relate the result to feature selection.) Estimate the hyperparameter matrices of the multiclass regression model. Propose a stepwise algorithm for joint selection that maximizes the model likelihood.
-
===10. 2012ThematicMatching===
+
===10. 2012===
-
*'''Name:''' Определение соответствия документа тематике на основе выделения ключевых фраз.
+
*'''Name:''' ThematicMatching. Determining whether a document matches the topic based on the selection of key phrases.
-
*'''Тизер:''' Соответствует ли диссертация объявленному паспорту диссертации? Какова фактическая специальность диссертации?
+
*'''Teaser:''' Does the dissertation match the declared dissertation passport? What is the actual specialty of the dissertation?
-
*'''Data:''' Авторефераты диссертаций (SugarSync). [http://www.aspirantura.spb.ru/pasport/05.html Паспорта специальностей].
+
*'''Data:''' Abstracts of dissertations (SugarSync). [http://www.aspirantura.spb.ru/pasport/05.html Passports of specialties].
-
*'''References:''' (Статья С. Царькова «Морфологические and статистические методы выделения ключевых фраз для построения вероятностных тематических моделей коллекций текстовых документов» - проверить).
+
*'''References:''' (Article by S. Tsarkov "Morphological and statistical methods for extracting key phrases for building probabilistic thematic models of collections of text documents" - check).
-
*'''Ключевые слова:''' ключевые фразы, тематические модели, N-граммы, морфологические and статистические признаки.
+
*'''Keywords:''' key phrases, topic models, N-grams, morphological and statistical features.
-
*'''Предлагаемый алгоритм:'''
+
*'''Proposed algorithm:'''
*'''Basic algorithm:''' C-Value and TF-IDF.
*'''Basic algorithm:''' C-Value and TF-IDF.
-
*'''Проблема:''' Требуется проверить каждый автореферат из коллекции на формальное соответствие паспорту декларируемой в автореферате специальности. При этом пункты паспорта рассматриваются как описания тем. Реферат считается соответствующим данной теме, если в совокупная вероятность принадлежности заданного числа терминов к одному из описаний темы данной специальности выше, чем принадлежность описаниям темы других специальностей.
+
*'''Problem:''' It is required to check each dissertation abstract in the collection for formal compliance with the passport of the specialty declared in the abstract. The items of the passport are treated as descriptions of topics. An abstract is considered to match a topic if the combined probability that a given number of its terms belong to one of the topic descriptions of that specialty is higher than the probability that they belong to the topic descriptions of other specialties.
-
*'''Проблема, еще раз:''' Выделяем ключевые слова из документа. Считаем, что паспорт специальности состоит из ключевых слов. Находим расстояния от одного набора ключевых слов до другого. В итоге
+
*'''Problem, again:''' Extract keywords from the document. Assume that the specialty passport also consists of keywords. Compute the distances from one set of keywords to the other. As a result, either
-
** пополняем паспорт известной специальности новыми ключевыми словами, либо
+
*# we add new keywords to the passport of a known specialty, or
-
** находим ближайший паспорт специальности.
+
*# find the nearest specialty passport.
-
*'''Варианты решения:''' Введение функции расстояния от совокупности терминов до описания темы, построение матрицы таких расстояний.
+
*'''Solution options:''' Introduce a distance function from a set of terms to the description of a topic and build a matrix of such distances.
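One possible reading of this solution option is sketched below. The Jaccard distance between keyword sets is an assumption made for illustration (the page does not fix a particular distance), and the function names are hypothetical.

```python
def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two keyword collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(a & b) / len(a | b)


def distance_matrix(documents, passports):
    """Rows: documents; columns: specialty passports."""
    return [[jaccard_distance(d, p) for p in passports] for d in documents]


def nearest_passport(doc_keywords, passports):
    """Index of the passport closest to the document's keyword set."""
    return min(
        range(len(passports)),
        key=lambda j: jaccard_distance(doc_keywords, passports[j]),
    )
```

A dissertation abstract would then be flagged when the nearest passport differs from the declared one; the keyword extraction step (for example C-Value or TF-IDF, as in the basic algorithm) is outside this sketch.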
-
===11. 2012FeatureGen===
+
===11. 2012===
-
*'''Name:''' Последовательное порождение and выбор признаков в задаче многоклассовой классификации
+
*'''Name:''' FeatureGen. Sequential generation and selection of features in a multiclass classification problem
-
*'''Тизер:''' Научно ли данное произведение? Определение типа произведения (определение научной области произведения). Определение социальной роли автора текста.
+
*'''Teaser:''' Is this work scientific? Determination of the type of work (definition of the scientific field of the work). Definition of the social role of the author of the text.
-
*'''Data:''' синтетические, интернет-коллекция.
+
*'''Data:''' synthetic, internet collection.
-
*'''References:''' Стрижов, Рудой.
+
*'''References:''' Strijov V.V., Rudoy.
-
*'''Ключевые слова:''' порождение признаков, поиск изоморфных моделей.
+
*'''Keywords:''' generation of features, search for isomorphic models.
-
*'''Предлагаемый алгоритм:''' алгоритм последовательного порождения суперпозиций.
+
*'''Proposed algorithm:''' Algorithm for sequential generation of superpositions.
-
*'''Basic algorithm:''' решающие деревья.
+
*'''Basic algorithm:''' decision trees.
-
*'''Проблема:''' Требуется построить набор признаков, по которым можно классифицировать текст.
+
*'''Problem:''' It is required to build a set of features by which the text can be classified.
-
===12. 2012TypeDetection===
+
===12. 2012===
-
*'''Name:''' Методы извлечения признаков из текстовой информации
+
*'''Name:''' TypeDetection. Methods for extracting features from text information
-
*'''Тизер:''' Научно ли данное произведение? Определение типа произведения (определение научной области произведения). Определение социальной роли автора текста.
+
*'''Teaser:''' Is this work scientific? Determination of the type of work (definition of the scientific field of the work). Definition of the social role of the author of the text.
-
*'''Data:''' синтетические, интернет-коллекция.
+
*'''Data:''' synthetic, internet collection.
-
*'''References:''' Найти.
+
*'''References:''' To be found.
-
*'''Ключевые слова:''' иерархическая кластеризация, structural learning, метрики сходства текстов.
+
*'''Keywords:''' hierarchical clustering, structural learning, text similarity metrics.
-
*'''Предлагаемый алгоритм.'''
+
*'''Proposed algorithm'''
-
*'''Basic algorithm.'''
+
*'''Basic algorithm'''
-
*'''Проблема:''' Требуется построить набор признаков, по которым можно классифицировать текст.
+
*'''Problem:''' It is required to build a set of features by which the text can be classified.
-
 
+
-
===Темы К.В. Воронцова===
+
===13. 2012===
-
* '''2012SparceDistribution''' Статистические критерии однородности and согласия для сильно разреженных дискретных распределений (В.Ц.)
+
*'''Name:''' LatentModels. Checking the adequacy of the topic model.
-
 
+
*'''Teaser:''' Methods for detecting incorrect thematic classification on conference materials. Methods for constructing a thematic model similar to the given one. Article clustering, hierarchical topic models with topic interpretability. Hierarchical thematic clustering of abstracts.
-
=== 2012LatentModels===
+
*'''Data:''' [https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Data2012TextMining Texts of Euro 2012 conference abstracts, 1862 abstracts.]
-
*'''Name:''' Проверка адекватности тематической модели.
+
*'''References:''' for latent models.
-
*'''Тизер:''' Методы выявления некорректной тематической классификации на материалах конференции. Методы построения тематической модели, сходной с заданной. Кластеризация статей, иерархические тематические модели с тематической интерпретируемостью. Иерархическая тематическая кластеризация тезисов.
+
*'''Keywords:''' soft clustering, latent models.
-
*'''Data:''' [https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Data2012TextMining Тексты тезисов конференции Евро-2012, 1862 тезиса.]
+
*'''Proposed algorithm:''' hHDP.
-
*'''References:''' по латентным моделям.
+
*'''Basic algorithm:''' HDP.
-
*'''Ключевые слова:''' мягкая кластеризация, латентные модели.
+
*'''Problem:''' Build a thematic model by clustering and check the correctness of the current text classification. To do this, (hierarchical) clustering of the texts is performed, and each cluster is assigned the topic name shared by the majority of its articles. After the model is built, each article is checked to see whether it falls under its own topic or under a different one.
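The checking step from the Problem item can be sketched as follows. This is a minimal illustration and not the hHDP or HDP algorithm named above: it assumes that cluster labels have already been produced by some clustering method, and the names `check_topics`, `cluster_ids`, and `declared_topics` are hypothetical.

```python
from collections import Counter, defaultdict


def check_topics(cluster_ids, declared_topics):
    """Name each cluster by its majority topic and flag mismatching articles.

    cluster_ids[i] is the cluster of article i (assumed to come from any
    clustering method); declared_topics[i] is its current topic label.
    """
    members = defaultdict(list)
    for idx, cluster in enumerate(cluster_ids):
        members[cluster].append(idx)

    # Majority vote inside every cluster.
    cluster_topic = {}
    for cluster, idxs in members.items():
        counts = Counter(declared_topics[i] for i in idxs)
        cluster_topic[cluster] = counts.most_common(1)[0][0]

    # Articles whose declared topic differs from the topic of their cluster.
    suspicious = [i for i, c in enumerate(cluster_ids)
                  if declared_topics[i] != cluster_topic[c]]
    return cluster_topic, suspicious


topics, flagged = check_topics(
    [0, 0, 0, 1, 1, 1],
    ["opt", "opt", "ml", "ml", "ml", "opt"],
)
print(topics, flagged)
```

The flagged indices are only candidates for manual review; a mismatch can also mean that the clustering, and not the declared topic, is wrong.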
-
*'''Предлагаемый алгоритм:''' hHDP.
+
-
*'''Basic algorithm:''' HDP.
+
-
*'''Проблема:''' Требуется построить тематическую модель методом кластеризации and проверить корректность текущей классификации текстов. Для этого выполняется (иерархическая) кластеризация текстов, каждому кластеру ставится в соответствие название темы, соответствующее большинству статей из кластера. После построения модели каждая статья проверяется and относится к своей или к чужой теме.
+
-
 
+
-
== Links ==
+
-
https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/utilities
+
-
SugarSync/remarks contains a document describing one possible distance function between texts.
+
-
 
+
-
==References:==
+
-
https://www.sugarsync.com
+
-
File storage holding the project materials. Access to the corresponding folder is granted by email address. The materials include publications on each topic.
+
-
==Publication of works==
 
-
Legend: Editing >> To submit (formatting for the journal) >> Submitted >> Accepted (by reviewers) >> Layout (reviewer and editor comments addressed) >> Published (issue released).
 
{|class="wikitable"
{|class="wikitable"
|-
|-
-
! Task name
+
! Title
! Author
! Author
! Link to the journal
! Link to the journal
Line 5903: Line 5781:
! State
! State
|-
|-
-
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/KuzminAduenkoStrijov2012ThematicClustering/aduenko_kuzmin_strijov.pdf Выбор признаков and оптимизация метрики при кластеризации коллекции документов]
+
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/KuzminAduenkoStrijov2012ThematicClustering/aduenko_kuzmin_strijov.pdf Feature selection and metric optimization when clustering a collection of documents]
-
|Адуенко А.А., Кузьмин А.А., Strizhov V.V.
+
|Aduenko A.A., Kuzmin A.A., Strijov V.V.
-
|[http://publishing.tsu.tula.ru/EstestvNauki.html Известия ТулГу]
+
|[http://publishing.tsu.tula.ru/EstestvNauki.html Izvestiya TulGu]
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/KuzminAduenkoStrijov2012ThematicClustering/KuzminAduenkoStrijov2012Clustering.tex]
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/KuzminAduenkoStrijov2012ThematicClustering/KuzminAduenkoStrijov2012Clustering.tex]
|12.10.2012
|12.10.2012
-
|Опубликовано
+
|Published
|-
|-
-
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/BudnikovStrijov2012StringProbabilities/budnikov_strijov.pdf Оценивание вероятностей появления строк в коллекции документов]
+
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/BudnikovStrijov2012StringProbabilities/budnikov_strijov.pdf Estimating the Probabilities of Strings in a Collection of Documents]
-
|Будников Е.А., Strizhov V.V.
+
|Budnikov E.A., Strijov V.V.
-
|[http://novtex.ru/IT/ Информационные технологии]
+
|[http://novtex.ru/IT/ Information Technology]
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/BudnikovStrijov2012StringProbabilities/BudnikovStrijov2012StringProbabilities.docx]
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/BudnikovStrijov2012StringProbabilities/BudnikovStrijov2012StringProbabilities.docx]
|24.09.2012
|24.09.2012
-
|Опубликовано
+
|Published
|-
|-
-
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Kuzmin2012ThematicClustering/kuzmin_strijov.pdf Проверка адекватности тематических моделей коллекции документов]
+
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Kuzmin2012ThematicClustering/kuzmin_strijov.pdf Checking the adequacy of the topic models of a collection of documents]
-
|Кузьмин А.А., Strizhov V.V.
+
|Kuzmin A.A., Strijov V.V.
-
|[http://novtex.ru/pi.html Программная инженерия]
+
|[http://novtex.ru/pi.html Software engineering]
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Kuzmin2012ThematicClustering/ThematicClusteringAndVisualizing.tex]
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Kuzmin2012ThematicClustering/ThematicClusteringAndVisualizing.tex]
|17.12.2012
|17.12.2012
-
|Опубликовано
+
|Published
|-
|-
-
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/AduenkoStrijov2012TextVisualizingII/aduenko_strijov2.pdf Алгоритм оптимального расположения названий коллекции документов]
+
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/AduenkoStrijov2012TextVisualizingII/aduenko_strijov2.pdf Algorithm for the optimal location of the names of a collection of documents]
-
|Адуенко А.А., Strizhov V.V.
+
|Aduenko A.A., Strijov V.V.
-
|[http://novtex.ru/pi.html Программная инженерия]
+
|[http://novtex.ru/pi.html Software engineering]
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/AduenkoStrijov2012TextVisualizingII/AduenkoStrijov2012TextVisualizing.tex]
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/AduenkoStrijov2012TextVisualizingII/AduenkoStrijov2012TextVisualizing.tex]
|13.11.2012
|13.11.2012
-
|Опубликовано
+
|Published
|-
|-
-
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/AduenkoStrijov2012TextVisualizing/aduenko_strijov1.pdf Визуализация матрицы парных расстояний между документами]
+
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/AduenkoStrijov2012TextVisualizing/aduenko_strijov1.pdf Visualization of the matrix of pairwise distances between documents]
-
|Адуенко А.А., Strizhov V.V.
+
|Aduenko A.A., Strijov V.V.
-
|[http://ntv.spbstu.ru/index4.html Научно-технические ведомости С.-Пб.ПГУ]
+
|[http://ntv.spbstu.ru/index4.html Scientific and technical statements of S.-Pb.PSU]
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/AduenkoStrijov2012TextVisualizing/AduenkoStrijov2012TextVisualizing.tex]
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/AduenkoStrijov2012TextVisualizing/AduenkoStrijov2012TextVisualizing.tex]
|29.10.2012
|29.10.2012
-
|Подано
+
|Submitted
|-
|-
-
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Medvednikova2012CoIndicator/doc/medvednikova_strijov.pdf Построение интегрального индикатора качества научных публикаций методами ко-кластеризации]
+
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Medvednikova2012CoIndicator/doc/medvednikova_strijov.pdf Construction of an integral indicator of the quality of scientific publications by co-clustering methods]
-
|Медведникова М.М., Strizhov V.V.
+
|Medvednikova M.M., Strijov V.V.
-
|[http://publishing.tsu.tula.ru/EstestvNauki.html Известия ТулГу]
+
|[http://publishing.tsu.tula.ru/EstestvNauki.html Izvestiya TulGu]
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Medvednikova2012CoIndicator/doc/Medvednikova2012CoIndicator.tex]
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Medvednikova2012CoIndicator/doc/Medvednikova2012CoIndicator.tex]
|15.11.2012
|15.11.2012
-
|Опубликовано
+
|Published
|-
|-
-
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Aduenko2012CovSelection/aduenko_strijov3.pdf Совместный выбор объектов and признаков в Taskх многоклассовой классификации коллекции документов]
+
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Aduenko2012CovSelection/aduenko_strijov3.pdf Joint selection of objects and features in problems of multiclass classification of a collection of documents]
-
|Адуенко А.А., Strizhov V.V.
+
|Aduenko A.A., Strijov V.V.
-
| [http://ikt.psuti.ru/rules/ Инфокоммуникационные технологии]
+
|[http://ikt.psuti.ru/rules/ Infocommunication technologies]
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Aduenko2012CovSelection/abstract_modified.tex]
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Aduenko2012CovSelection/abstract_modified.tex]
|18.12.2012
|18.12.2012
-
|Опубликовано
+
|Published
|-
|-
-
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Ivanova2012LogicStructure/ivanova_aduenko_strijov.pdf Алгоритм построения логических правил при разметке текстов]
+
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Ivanova2012LogicStructure/ivanova_aduenko_strijov.pdf Algorithm for constructing logical rules when marking up texts]
-
|Иванова А.В., Адуенко А.А., Strizhov V.V.
+
|Ivanova A.V., Aduenko A.A., Strijov V.V.
-
|[http://novtex.ru/pi.html Программная инженерия]
+
|[http://novtex.ru/pi.html Software engineering]
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Ivanova2012LogicStructure]
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Ivanova2012LogicStructure]
|24.01.2013
|24.01.2013
-
|Принято
+
|Accepted
|-
|-
-
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Tsyganova2012TopicIerarhy/tsyganova_strijov.pdf Построение иерархических тематических моделей коллекции документов]
+
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Tsyganova2012TopicIerarhy/tsyganova_strijov.pdf Building hierarchical topic models of document collections]
-
|Цыганова С.В., Strizhov V.V.
+
|Tsyganova S.V., Strijov V.V.
-
|[http://www.appliedinformatics.ru/r/authors/ Прикладная информатика]
+
|[http://www.appliedinformatics.ru/r/authors/ Applied Informatics]
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Tsyganova2012TopicIerarhy/Tsyganova2012TopicIerarhy_copy.tex]
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Tsyganova2012TopicIerarhy/Tsyganova2012TopicIerarhy_copy.tex]
|27.01.2013
|27.01.2013
-
|Опубликовано
+
|Published
|-
|-
-
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Varfolomeeva2012StructureLearning/doc/varfolomeeva_strijov.pdf Выбор признаков при разметке библиографических списков методами структурного обучения]
+
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Varfolomeeva2012StructureLearning/doc/varfolomeeva_strijov.pdf Choice of features when marking bibliographic lists by methods of structured learning]
-
|Варфоломеева А.А., Strizhov V.V.
+
|Varfolomeeva A.A., Strijov V.V.
-
|[http://ntv.spbstu.ru/index4.html Научно-технические ведомости С.-Пб.ПГУ]
+
|[http://ntv.spbstu.ru/index4.html Scientific and technical statements of S.-Pb.PSU]
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Varfolomeeva2012StructureLearning/doc/Varfolomeeva2012StrcLearning.tex]
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Varfolomeeva2012StructureLearning/doc/Varfolomeeva2012StrcLearning.tex]
|27.01.2013
|27.01.2013
-
|Отрецензировано
+
|Reviewed
|-
|-
-
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Celyh2012SparceDistribution/doc/doc/celyh_vorontsov.pdf Критерии согласия для разреженных дискретных распределений and их применение в тематическом моделировании]
+
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Celyh2012SparceDistribution/doc/doc/celyh_vorontsov.pdf Goodness-of-fit criteria for sparse discrete distributions and their application in topic modeling]
-
|Целых В.Р., Воронцов К.В.
+
|Tselykh V.R., Vorontsov K. V.
-
|[http://jmlda.org Машинное обучение and анализ данных]
+
|[http://jmlda.org Machine learning and data analysis]
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Celyh2012SparceDistribution/doc/doc/CelyhVorontsov2013sparse.tex]
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Celyh2012SparceDistribution/doc/doc/CelyhVorontsov2013sparse.tex]
|17.12.2012
|17.12.2012
-
|Опубликовано
+
|Published
|-
|-
-
|Проверка адекватности тематической модели
+
|Checking the adequacy of the topic model
-
|Степан Лобастов
+
|Stepan Lobastov
|
|
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Lobastov2012LatentModels/Doc/LatentModels.tex]
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Lobastov2012LatentModels/Doc/LatentModels.tex]
|
|
-
|Редакция
+
|Editing
|}
|}
-
== Список принятых к публикации работ ==
+
'''List of works accepted for publication'''
-
* 1. Aduenko A. A., Стрижов В. В. Визуализация матрицы парных расстояний между документами // Научно-технический вестник С.-Пб. ПГУ. Информатика. Телекоммуникации. Управление, 2013, 1 — ?.
+
* 1. Aduenko A. A., Strijov V. V. Visualization of the matrix of pairwise distances between documents // Scientific and technical bulletin of St. Petersburg. PGU. Computer science. Telecommunications. Management, 2013, 1 - ?.
-
* 2. Aduenko A. A., Кузьмин А. А., Стрижов В. В. Выбор признаков and оптимизация метрики при кластеризации коллекции документов // Известия Тульского государственного университета, Естественные науки, 2012, 3. С. 119-132.
+
* 2. Aduenko A. A., Kuzmin A. A., Strijov V. V. Feature selection and metric optimization when clustering a collection of documents // Proceedings of the Tula State University, Natural Sciences, 2012, No. 3. P. 119-132.
-
* 3. Aduenko A. A., Стрижов В. В. Алгоритм оптимального расположения названий коллекции документов // Программная инженерия, 2013. 3. С.21-25.
+
* 3. Aduenko A. A., Strijov V. V. Algorithm for the optimal location of the names of a collection of documents // Software engineering, 2013. No. 3. P. 21-25.
-
* 4. Будников Е. А., Стрижов В. В. Оценивание вероятностей появления строк в коллекции документов // Информационные технологии, 2013. 4.
+
* 4. Budnikov E. A., Strijov V. V. Estimating the Probabilities of Strings in a Collection of Documents // Information Technology, 2013. No. 4.
-
* 5. Кузьмин А. А., Strizhov V.V. Проверка адекватности тематических моделей коллекции документов // Программная инженерия, 2013. 4.
+
* 5. Kuzmin A. A., Strijov V. V. Checking the adequacy of the topic models of a collection of documents // Software engineering, 2013. No. 4.
-
* 6. Медведникова М. М., Strizhov V.V. Построение интегрального индикатора качества научных публикаций методами ко-кластеризации // Известия Тульского государственного университета, Естественные науки, 2013. №1.
+
* 6. Medvednikova M. M., Strijov V.V. Construction of an integral indicator of the quality of scientific publications by co-clustering methods // Proceedings of the Tula State University, Natural Sciences, 2013. No. 1.
-
* 7. Aduenko A. A., Стрижов В. В. Совместный выбор объектов and признаков в Taskх многоклассовой классификации коллекции документов // Инфокоммуникационные технологии, 2013. 2.
+
* 7. Aduenko A. A., Strijov V. V. Joint selection of objects and features in problems of multiclass classification of a collection of documents // Infocommunication technologies, 2013. No. 2.
-
* 8. Иванова А.В., Aduenko A. A., Стрижов В. В. Алгоритм построения логических правил при разметке текстов // Программная инженерия, 2013. 4(5).
+
* 8. Ivanova A.V., Aduenko A.A., Strijov V.V. Algorithm for constructing logical rules when marking up texts // Software engineering, 2013. No. 4(5).
-
* 9. Цыганова С.В., Стрижов В. В. Построение иерархических тематических моделей коллекции документов // Прикладная информатика, 2013. 1.
+
* 9. Tsyganova S.V., Strijov V.V. Building hierarchical topic models of document collections // Applied Informatics, 2013. No. 1.
-
* 10. Варфоломеева А.А., Стрижов В. В. Выбор признаков при разметке библиографических списков методами структурного обучения // Научно-технический вестник С.-Пб. ПГУ. Информатика. Телекоммуникации. Управление, 2013.
+
* 10. Varfolomeeva A.A., Strijov V.V. Choice of features when marking bibliographic lists by methods of structured learning // Scientific and Technical Bulletin of St. Petersburg. PGU. Computer science. Telecommunications. Management, 2013.
-
* 11. Целых В.Р., Воронцов К.В. Критерии согласия для разреженных дискретных распределений and их применение в тематическом моделировании // JMLDA, 2012. №4. С. 432-442.
+
* 11. Tselykh V.R., Vorontsov K. V. Goodness-of-fit criteria for sparse discrete distributions and their application in topic modeling // JMLDA, 2012. No. 4. pp. 432-442.
-
[[Категория:Учебные курсы]]
+
-
==My first publication with cross-review==
 
-
 
-
== List of problems ==
 
{|class="wikitable"
{|class="wikitable"
|-
|-
-
! Task name
+
! Title
! Author
! Author
! Reviewer
! Reviewer
-
! Link to work
+
! Link
! Comments
! Comments
|-
|-
-
|CMARS: аппроксимация сплайнами
+
|CMARS: spline approximation
-
|Влада Целых
+
|Vlada Tselykh
-
|Татьяна Шпакова
+
|Tatiana Shpakova
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group974/Celyh2012CMARS/ Celyh2012CMARS]
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group974/Celyh2012CMARS/ Celyh2012CMARS]
|[.]сaipvdstrj(10)
|[.]сaipvdstrj(10)
|-
|-
-
|Алгоритмические основы построения банковских скоринговых карт
+
|Algorithmic foundations for constructing bank scoring cards
|Alexander Aduenko
|Alexander Aduenko
-
|Алина Иванова
+
|Alina Ivanova
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group974/Aduenko2012economics/ Aduenko2012economics]
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group974/Aduenko2012economics/ Aduenko2012economics]
|[.]сaipvdstrj(10)
|[.]сaipvdstrj(10)
|-
|-
-
|Использование метода главных компонент при построении интегральных индикаторов
+
|Using the method of principal components in the construction of integral indicators
-
|Мария Медведникова
+
|Maria Medvednikova
-
|Светлана Цыганова
+
|Svetlana Tsyganova
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group974/Medvednikova2012PCA/ Medvednikova2012PCA]
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group974/Medvednikova2012PCA/ Medvednikova2012PCA]
|[r]сaipvdstrj(10)
|[r]сaipvdstrj(10)
|-
|-
-
|Многоуровневая классификация при обнаружении движения цен
+
|Multi-level classification for price movement detection
-
|Арсентий Кузьмин
+
|Arsenty Kuzmin
-
|Анна Варфоломеева
+
|Anna Varfolomeeva
-
| [https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group974/Kuzmin2012TimeRows/ Kuzmin2012TimeRows]
+
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group974/Kuzmin2012TimeRows/ Kuzmin2012TimeRows]
|[r]сaipvdstjr(10)
|[r]сaipvdstjr(10)
|-
|-
-
|Локальные методы прогнозирования с выбором инвариантного преобразования
+
|Local forecasting methods with the choice of an invariant transformation
-
|Светлана Цыганова
+
|Svetlana Tsyganova
-
|Мария Медведникова
+
|Maria Medvednikova
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group974/Tsyganova2012LocalForecast/ Tsyganova2012 LocalForecast]
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group974/Tsyganova2012LocalForecast/ Tsyganova2012 LocalForecast]
|[r]сaipvdstjr(10)
|[r]сaipvdstjr(10)
|-
|-
-
|Прогноз квазипериодических многомерных временных рядов непараметрическими методами (пример)
+
|Prediction of Quasi-Periodic Multivariate Time Series by Non-Parametric Methods (example)
-
|Егор Клочков
+
|Egor Klochkov
-
|Александр Шульга
+
|Alexander Shulga
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group974/Klochkov2012Goods4Cast Klochkov2012Goods4Cast]
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group974/Klochkov2012Goods4Cast Klochkov2012Goods4Cast]
|[r]сaipvdstj.(10)
|[r]сaipvdstj.(10)
|-
|-
-
|Алгоритмы переборного поиска наиболее информативных объектов and признаков в логистической регрессии (пример)
+
|Search algorithms for the most informative objects and features in logistic regression (example)
-
|Степан Лобастов
+
|Stepan Lobastov
-
|Егор Клочков
+
|Egor Klochkov
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group974/Lobastov2012FOSelection/ Lobastov2012FOSelection]
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group974/Lobastov2012FOSelection/ Lobastov2012FOSelection]
|[r]сaipvdstrj(10)
|[r]сaipvdstrj(10)
|-
|-
-
|Локальные методы прогнозирования с выбором метрики
+
|Local forecasting methods with the choice of metric
-
|Анна Варфоломеева
+
|Anna Varfolomeeva
-
|Арсентий Кузьмин
+
|Arsenty Kuzmin
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group974/Varfolomeeva2012LocForecastMetrics/ Varfolomeeva2012 LocForecastMetrics]
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group974/Varfolomeeva2012LocForecastMetrics/ Varfolomeeva2012 LocForecastMetrics]
|[r]сaipvdstjr(10)
|[r]сaipvdstjr(10)
|-
|-
-
|Полиномы Чебышева and прогнозирование временных рядов
+
|Chebyshev polynomials and time series forecasting
-
|Валерия Бочкарева
+
|Valeria Bochkareva
-
|Степан Лобастов
+
|Stepan Lobastov
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group974/Bochkareva2012TimeSeriesPrediction Bochkareva2012TimeSeriesPrediction]
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group974/Bochkareva2012TimeSeriesPrediction Bochkareva2012TimeSeriesPrediction]
|[.]сaipvdst-r(9)
|[.]сaipvdst-r(9)
|-
|-
-
|Кластеризация and составление словаря аминокислотных последовательностей
+
|Clustering and compiling a dictionary of amino acid sequences
-
|Татьяна Шпакова
+
|Tatiana Shpakova
-
|Влада Целых
+
|Vlada Tselykh
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group974/Shpakova2012Clustering/ Shpakova2012Clustering]
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group974/Shpakova2012Clustering/ Shpakova2012Clustering]
|[.]сaipvdst.(9)
|[.]сaipvdst.(9)
|-
|-
-
|Векторная авторегрессия and управление макроэкономическими показателями
+
|Vector autoregression and management of macroeconomic indicators
-
|Александр Шульга
+
|Alexander Shulga
|
|
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group974/Shulga2012VAR Shulga2012VAR]
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group974/Shulga2012VAR Shulga2012VAR]
|[.]сaipvds..(9)
|[.]сaipvds..(9)
|-
|-
-
|Аппроксимация эмпирических функций распределения
+
|Approximation of empirical distribution functions
-
|Алина Иванова
+
|Alina Ivanova
|Alexander Aduenko
|Alexander Aduenko
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group974/Ivanova2012ApproximateFunc/ Ivanova2012 ApproximateFunc]
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group974/Ivanova2012ApproximateFunc/ Ivanova2012 ApproximateFunc]
Line 6087: Line 5961:
|}
|}
-
== Abstracts ==
+
===1===
-
=== Алгоритмы переборного поиска наиболее информативных объектов and признаков в логистической регрессии ===
+
* Search algorithms for the most informative objects and features in logistic regression
-
Логистическая регрессия – это статистическая модель, которая применяется для предсказания вероятности возникновения некоторого события по значениям множества признаков. Она находит применение, например, в медицине [http://math.tntech.edu/machida/MSD/lecture7.pdf] and кредитном скроллинге. В реальных условиях число признаков обычно велико, and важнейшей задачей является выбор только существенных признаков , а также поиск объектов, которые по тем или иным причинам являются атипичными.
+
* Logistic regression is a statistical model used to predict the probability of an event from the values of a set of features. It is applied, for example, in medicine [http://math.tntech.edu/machida/MSD/lecture7.pdf] and in credit scoring. In practice the number of features is usually large, so the key problem is to select only the essential features and to find objects that are atypical for one reason or another.
 +
* Keywords: logit model, feature selection, boosting.
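The feature-selection part of this abstract can be illustrated with an exhaustive search over feature subsets. The sketch below is an assumption-laden toy, not the authors' procedure: the logit model is fitted by plain gradient descent, subsets are compared by an AIC-style score (2k plus twice the negative log-likelihood), and the names `fit_nll` and `best_subset` as well as the data are hypothetical. Boosting, mentioned in the keywords, is not covered.

```python
import math
from itertools import combinations


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_nll(X, y, cols, steps=500, lr=0.5):
    """Fit a logit model on the chosen columns by gradient descent and
    return the total negative log-likelihood on the same data."""
    n = len(y)
    w = [0.0] * (len(cols) + 1)  # w[0] is the intercept
    for _ in range(steps):
        grad = [0.0] * len(w)
        for row, target in zip(X, y):
            feats = [1.0] + [row[c] for c in cols]
            err = sigmoid(sum(wi * fi for wi, fi in zip(w, feats))) - target
            for j, fj in enumerate(feats):
                grad[j] += err * fj / n
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    nll = 0.0
    for row, target in zip(X, y):
        feats = [1.0] + [row[c] for c in cols]
        p = sigmoid(sum(wi * fi for wi, fi in zip(w, feats)))
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        nll -= target * math.log(p) + (1 - target) * math.log(1.0 - p)
    return nll


def best_subset(X, y):
    """Try every feature subset and keep the one with the lowest score."""
    n_features = len(X[0])
    best = None
    for k in range(n_features + 1):
        for cols in combinations(range(n_features), k):
            score = 2 * k + 2 * fit_nll(X, y, cols)
            if best is None or score < best[0]:
                best = (score, cols)
    return best[1]


# Toy data: column 0 separates the classes, column 1 is pure noise.
X = [[-2.0, 1.0], [-1.5, -1.0], [-1.0, 1.0], [-0.5, -1.0],
     [0.5, 1.0], [1.0, -1.0], [1.5, 1.0], [2.0, -1.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
print(best_subset(X, y))
```

Exhaustive search visits 2^m subsets for m features, so it is only practical for small m; for larger problems a greedy or stochastic search would have to replace the double loop.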
-
Ключевые слова: logit model, feature selection, boosting.
+
===2===
 +
* Using the method of principal components in the construction of integral indicators
 +
* This paper considers the use of the principal component method in the construction of integral indicators. The results are compared with those of the Pareto stratification method. An integral indicator is built for Russian universities, using the biographies of the 30 richest businessmen in Russia according to Forbes magazine for 2011.
 +
* ''Keywords:'' integral indicator, expert estimates, parameter weights, principal component method, Pareto stratification method.
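One common way to turn principal components into an integral indicator is to score each object by its projection onto the first principal component. The sketch below assumes that reading; it uses power iteration on the sample covariance matrix, and the function name `first_pc_indicator` and the toy data are hypothetical. Pareto stratification is not shown.

```python
import math


def first_pc_indicator(rows, iters=100):
    """Score each object by its projection onto the first principal component."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]

    # Sample covariance matrix of the centered features.
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(m)]
           for a in range(m)]

    # Power iteration for the leading eigenvector of the covariance matrix.
    v = [1.0] * m
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]

    # Fix the sign so that larger feature values give larger scores.
    if sum(v) < 0:
        v = [-x for x in v]

    scores = [sum(c[j] * v[j] for j in range(m)) for c in centered]
    return v, scores


weights, indicator = first_pc_indicator([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
print(weights, indicator)
```

The sign convention here is a simplification: when features point in opposite directions, the orientation of the indicator usually has to be fixed by expert judgment.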
-
===Использование метода главных компонент при построении интегральных индикаторов===
+
===3===
-
В данной работе рассматривается использование метода главных компонент при построении интегральных индикаторов. Полученные результаты сравниваются с результатами, даваемыми методом расслоения Парето. Строится интегральный индикатор для российских вузов. Для этого используются биографии 30 богатейших бизнесменов России по версии журнала "Forbes" за 2011 год.
+
* Approximation of empirical distribution functions
 +
* The work is devoted to methods of function approximation for the efficient calculation of integrals. In practical problems, data are usually available only at certain points in time or space. To make assumptions about the remaining points, one has to approximate the distribution function of the quantity under study and estimate the corresponding error. The error can be calculated by methods of different accuracy.
 +
* Keywords: Monte Carlo method, calculation of distribution functions, empirical distribution functions.
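As a small illustration of approximating a distribution function together with an error estimate, the sketch below builds the empirical distribution function of a sample and a uniform error band from the Dvoretzky-Kiefer-Wolfowitz inequality. The choice of this inequality is an assumption made for the example, not a method taken from the abstract, and the names `ecdf` and `dkw_band` are hypothetical.

```python
import bisect
import math


def ecdf(sample):
    """Return the empirical distribution function F_n of the sample."""
    xs = sorted(sample)
    n = len(xs)

    def F(x):
        # Fraction of sample points that are <= x.
        return bisect.bisect_right(xs, x) / n

    return F


def dkw_band(n, alpha=0.05):
    """Half-width eps such that sup|F_n - F| <= eps with probability
    at least 1 - alpha (Dvoretzky-Kiefer-Wolfowitz inequality)."""
    return math.sqrt(math.log(2.0 / alpha) / (2.0 * n))


F = ecdf([3.0, 1.0, 2.0, 2.0])
print(F(2.0), dkw_band(100))
```

The band shrinks as one over the square root of the sample size, which is the same rate as the Monte Carlo error mentioned in the keywords.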
-
''Ключевые слова'': интегральный индикатор, экспертные оценки, веса параметров, метод главных компонент, метод расслоения Парето.
+
===4===
 +
* Local prediction methods with choice of transformation
 +
* Time series forecasting problems have many applications in fields such as economics, physics, and medicine. Their solution is a forecast for the near future based on the known values of the series at previous points in time. This work builds a local forecasting algorithm that takes transformations into account and can identify visually similar sections of the time series without human intervention.
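A minimal sketch of local forecasting with a transformation is given below. It assumes the simplest invariant transformation, a vertical shift: every window is centered by its own mean, the earlier window closest to the most recent one is found, and its continuation is shifted back to the current level. The function name `local_forecast` and the data are hypothetical, and the work itself may use richer transformations.

```python
def local_forecast(series, window):
    """Forecast the next value from the most similar earlier window.

    Windows are compared after subtracting their means, so segments that
    differ only by a vertical shift count as identical.
    """
    n = len(series)
    if window < 1 or n < window + 1:
        raise ValueError("series is too short for the chosen window")

    def centered(segment):
        mean = sum(segment) / len(segment)
        return mean, [x - mean for x in segment]

    last_mean, last = centered(series[n - window:])

    best_dist, best_delta = None, None
    # Every candidate window must have a known successor value.
    for start in range(n - window):
        mean, segment = centered(series[start:start + window])
        dist = sum((a - b) ** 2 for a, b in zip(segment, last))
        if best_dist is None or dist < best_dist:
            best_dist = dist
            # Successor of the neighbor, relative to the neighbor's level.
            best_delta = series[start + window] - mean

    # Map the neighbor's continuation back to the level of the last window.
    return last_mean + best_delta


print(local_forecast([0, 1, 3, 0, 1, 3, 10, 11], window=2))
```

In this toy series the pattern 0, 1, 3 repeats, and the last window 10, 11 has the same shape as 0, 1 at a higher level, so the forecast continues the pattern at that level.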
-
===Аппроксимация эмпирических функций распределения===
+
==2011==
-
Работа посвящена методам аппроксимации функций для эффективного вычисления интегралов. В практических Taskх обычно имеются данные в определенных точках времени или пространства. При построении предположений об остальных точках возникает необходимость аппроксимации функции распределения исследуемой величины, а также оценка соответствующей ошибки. Для ее расчета есть возможность использовать методы разной точности.
+
-
 
+
-
Ключевые слова: метод Монте-Карло, вычисление функцй распределения, эмпирические функции распределения.
+
-
 
+
-
===Методы локального прогнозирования с выбором преобразования===
+
-
Задачи прогнозирования временных рядов имеют множество приложений в различных областях, таких как экономика, физика, медицина. Их решением является прогноз на недалекое будущее по уже известным значениям прогнозируемого ряда в предыдущие моменты времени. В работе будет построен алгоритм локального прогнозирования с учетом преобразований, позволяющий без участия человека выявить визуально похожие участки временного ряда.
+
-
 
+
-
Keywords: local forecasting, transformation
+
-
 
+
-
== Draft list of problems ==
+
-
# Clustering and compiling a dictionary of amino acid sequences
+
-
# Oblivious decision trees: the Yandex algorithm for the Poligon system
+
-
# Comparative analysis of regression residuals in SVN regression
+
-
# Algorithms for fitting Gaussian mixtures
+
-
# Forecasting quasi-periodic multivariate time series by nonparametric methods
+
-
# Multi-level classification for price movement detection
+
-
# CMARS: spline approximation
+
-
# Chebyshev polynomials and the sweep (tridiagonal) method in time series forecasting
+
-
# Comparison of the ARMA and FLS methods in retrospective forecasting
+
-
# Local forecasting methods with the choice of metric
+
-
# Local forecasting methods with the choice of an invariant transformation
+
-
# Exhaustive search algorithms for the most informative objects and features in logistic regression
+
-
# Vector autoregression and management of macroeconomic indicators
+
-
# Building a ranking of Russian universities from open data on the career success of their graduates
+
-
 
+
-
== More problems ==
+
-
# Text analysis by structured learning methods
+
-
# Approximation of empirical distribution functions
+
-
# Algorithmic foundations for constructing bank scoring cards
+
-
# Singular value decomposition and a search engine
+
-
# Comparison of multi-objective optimization algorithms
+
-
# Refinement of expert estimates on rank-scale data (interval, cones, expert weights, copulas)
+
-
# Refinement of expert estimates in the analysis of the mechanism of sustainable energy development
+
-
# Visualization of the parameter space of regression models
+
-
# Regression by the principal component method
+
-
# Hyperparameter estimation by sampling
+
-
# Pruning of essentially nonlinear models using hyperparameters
+
-
# Occam factor for parametric models with a known parameter domain
+
-
# Creating algorithms for sequential model modification
+
-
# Generation and selection of classification models
+
-
 
+
-
== And more problems ==
+
-
* Distance function between formulas and search.
+
-
* Object search (technical work).
+
-
 
+
-
== + ==
+
-
* Autoregression
+
-
* Vector autoregression
+
-
* Exponential smoothing
+
-
* Local methods, metric search
+
-
* Local methods with invariants, fixed metric
+
-
* ARIMA
+
-
* Multivariate Caterpillar (SSA), choice of the caterpillar length
+
-
* Multivariate Caterpillar (SSA), choice of series
+
-
* Forecasting using DTW
+
-
* Moving average, choice of kernels
+
-
* Moving average with forgetting of history
+
-
* Moving average of time series with a periodic component
+
-
* Forecasting with neural networks
+
-
* Forecast quality analysis
+
-
* Meta-description of time series
+
-
* Logical forecasting
+
-
* SVN regression
+
-
* Discrete forecasting, music.
+
-
 
+
-
== To compile ==
+
-
* List of typical typographical errors
+
-
* List of BibTeX errors
+
-
 
+
-
=2011=
+
-
 
+
-
==Publication in the JMLDA journal==
+
-
 
+
-
Recommended reading before starting the assignments
+
-
* [[Численные методы обучения по прецедентам (практика, Strizhov V.V.)|Numerical methods of learning by precedents]]
+
-
* [[Отчет о выполнении исследовательского проекта (практика, Strizhov V.V.)|Report on the research project]]
+
-
* [[Автоматизация and стандартизация научных исследований (практика, Strizhov V.V.)|Automation and standardization of scientific research]]
+
-
 
+
-
== Problems ==
+
{|class="wikitable"
{|class="wikitable"
|-
|-
-
! Название задачи
+
! Name
-
! Работу выполняет
+
! Author
-
! Рецензент
+
! Reviewer
-
! Ссылка на работу
+
! Link
-
! Comments
+
|-
|-
-
| Устойчивость and сходимость оценок гиперпараметров линейных регрессионных моделей (пример)|Оценивание гиперпараметров линейных регрессионных моделей при отборе шумовых and коррелирующих признаков
+
| Stability and convergence of estimates of hyperparameters of linear regression models (example)|Estimation of hyperparameters of linear regression models in the selection of noise and correlated features
-
| Токмакова Александра
+
| Tokmakova Alexandra
-
| Мотренко Анастасия
+
| Motrenko Anastasia
-
| [https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group874/Tokmakova2011HyperPar Tokmakova2011HyperPar]
+
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group874/Tokmakova2011HyperPar Tokmakova2011HyperPar]
-
|
+
|-
|-
-
| Choice of forecasting models for electricity consumption and prices (example)|Choice of forecasting models for electricity prices
+
| Choice of forecasting models for electricity consumption and prices (example)|Choice of forecasting models for electricity prices
-
| Leontieva Lyubov
+
| Leontieva Lyubov
-
| Grebennikov Evgeny
+
| Grebennikov Evgeny
-
| [https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group874/Leonteva2011ElectricityConsumption Leonteva2011ElectricityConsumption]
+
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group874/Leonteva2011ElectricityConsumption Leonteva2011ElectricityConsumption]
-
|
+
|-
|-
-
| Multiclass prediction of the probability of myocardial infarction and estimation of the required sample size of patients (example)
+
| Multiclass prediction of the probability of myocardial infarction and estimation of the required sample size of patients (example)
-
| Motrenko Anastasia
+
| A. P. Motrenko
-
| Tokmakova Alexandra
+
| Tokmakova Alexandra
-
| [https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group874/Motrenko2011HAPrediction Motrenko2011HAPrediction]
+
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group874/Motrenko2011HAPrediction Motrenko2011HAPrediction]
-
|
+
|-
|-
-
| Algorithms for generating essentially non-linear models
+
| Algorithms for generating essentially non-linear models
-
| Georgy Rudoy
+
| Georgy Rudoy
-
| Nikolai Baldin
+
| Nikolai Baldin
-
| [https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group874/Rudoy2011Generation/ Rudoy2012Generation]
+
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group874/Rudoy2011Generation/ Rudoy2012Generation]
-
|
+
|-
|-
-
| Event modeling and sugar price forecast|Event modeling and financial time series forecast
+
| Event Modeling and Sugar Price Forecast|Event Modeling and Financial Time Series Forecast
-
| Alexander Romanenko
+
| Alexander Romanenko
-
| Egor Budnikov
+
| Budnikov E. A.
-
| [https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group874/Romanenko2011Event/ Romanenko2011Event]
+
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group874/Romanenko2011Event/ Romanenko2011Event]
-
|
+
|-
|-
-
| Statistical models of natural languages|Overview of some statistical models of natural language
+
| Statistical models of natural languages|Overview of some statistical models of natural language
-
| Egor Budnikov
+
| Budnikov E. A.
-
| Alexander Romanenko
+
| Alexander Romanenko
-
| [https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group874/Budnikov2011Statistical Budnikov2011Statistical]
+
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group874/Budnikov2011Statistical Budnikov2011Statistical]
-
|
+
|}
|}
-
==My first publication in the JMLDA journal==
+
'''Practical part'''
-
Recommended reading before starting the assignments
 
-
* [[Numerical methods of learning by precedents (practice, V.V. Strizhov)|Numerical methods of learning by precedents]]
 
-
* [[Research project report (practice, V.V. Strizhov)|Research project report]]
 
-
* [[Automation and standardization of scientific research (practice, V.V. Strizhov)|Automation and standardization of scientific research]]
 
-
 
-
See also
 
-
* [[Time series (library of examples)]]
 
-
 
-
== Problems ==
 
{| class="wikitable"
{| class="wikitable"
|-
|-
-
! Problem name
+
! Name
-
! Performed by
+
! Author
-
! Reviewed by
+
! Reviewer
-
! Link to the work
+
! Link
-
! Comments
+
! Comments
|-
|-
-
| Using the Granger test in time series forecasting
+
| Using the Granger Test in Time Series Forecasting
| Anastasia Motrenko
| Anastasia Motrenko
-
| Lyubov Leontieva
+
| Leontieva Lyubov
-
| [https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group874/Motrenko2011GrangerForc Motrenko2011GrangerForc]
+
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group874/Motrenko2011GrangerForc Motrenko2011GrangerForc]
-
| Published in JMLDA
+
| Published in JMLDA
|-
|-
-
| Choosing an activation function for forecasting with neural networks
+
| Choosing an activation function for forecasting with neural networks
-
| Georgy Rudoy
+
| Georgy Rudoy
-
| Nikolai Baldin
+
| Nikolai Baldin
-
| [https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group874/Rudoy2011NNForecasting Rudoy2011NNForecasting]
+
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group874/Rudoy2011NNForecasting Rudoy2011NNForecasting]
-
| Published in JMLDA
+
| Published in JMLDA
|-
|-
-
| [[Multidimensional caterpillar, choice of length and number of caterpillar components (example)]]
+
|Multidimensional caterpillar, choice of length and number of caterpillar components
-
| Lyubov Leontieva
+
| Leontieva Lyubov
-
| Mikhail Burmistrov
+
| Mikhail Burmistrov
-
| [https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group874/Leonteva2011GaterpillarLearning Leonteva2011GaterpillarLearning]
+
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group874/Leonteva2011GaterpillarLearning Leonteva2011GaterpillarLearning]
-
| Published in JMLDA
+
| Published in JMLDA
|-
|-
-
| [[Prediction by Discrete Argument Functions (example)]]
+
|[[Prediction by Discrete Argument Functions (example)]]
-
| Egor Budnikov
+
| Budnikov E. A.
-
| Alexander Romanenko
+
| Alexander Romanenko
-
| [https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group874/Budnikov2011DiscreteForecasting Budnikov2011DiscreteForecasting]
+
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group874/Budnikov2011DiscreteForecasting Budnikov2011DiscreteForecasting]
-
| Published in JMLDA
+
| Published in JMLDA
|-
|-
-
| Investigation of convergence in prediction by neural networks with feedback
+
| Investigation of Convergence in Prediction by Neural Networks with Feedback
-
| [[Участник:nkgrin|Nikolai Baldin]]
+
|[http://www.machinelearning.ru/wiki/index.php?title=Участник:nkgrin Nikolai Baldin]
-
| Georgy Rudoy
+
| Georgy Rudoy
-
| [https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group874/Baldin2011FNNForecasting Baldin2011FNNForecasting]
+
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group874/Baldin2011FNNForecasting Baldin2011FNNForecasting]
-
| Published in JMLDA
+
| Published in JMLDA
|-
|-
-
| Time series alignment: forecasting with DTW (example)|Time series alignment: forecasting with DTW
+
| Time series alignment: Forecasting with DTW (example)|Time series alignment: Forecasting with DTW
-
| Alexander Romanenko
+
| Alexander Romanenko
-
| Egor Budnikov
+
| Budnikov E. A.
-
| [https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group874/Romanenko2011DTWForecasting Romanenko2011DTWForecasting]
+
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group874/Romanenko2011DTWForecasting Romanenko2011DTWForecasting]
-
| Published in JMLDA
+
| Published in JMLDA
|-
|-
-
|[[Isolation of the periodic component of the time series (example)]]
+
|[[Isolation of the periodic component of the time series (example)]]
-
| Alexandra Tokmakova
+
| Tokmakova Alexandra
-
| Egor Budnikov
+
| Budnikov E. A.
-
| [https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group874/Tokmakova2011Periodic Tokmakova2011Periodic]
+
|[https://mlalgorithms.svn.sourceforge.net/svnroot/mlalgorithms/Group874/Tokmakova2011Periodic Tokmakova2011Periodic]
-
| Published in JMLDA
+
| Published in JMLDA
|-
|-
|}
|}
-
 
-
==Brief description of the problems==
 
-
 
-
=== Task 1: Non-parametric forecasting: kernel selection, parameter tuning ===
 
-
The paper describes kernel smoothing of a time series as a form of nonparametric regression. The essence of the method
 
-
consists in reconstructing the function of time as a weighted linear combination of points from some neighborhood. A continuous, bounded, symmetric real-valued weight function is called a kernel. The resulting kernel estimate is used to forecast the next point of the series. The dependence of forecast quality on the kernel parameters and on the superimposed noise is investigated.
 
-
=== Task 2: Exponential smoothing and forecasting ===
+
===1. 2011===
-
The paper investigates the application of exponential smoothing to time series forecasting. The algorithm takes into account previous values of the series with weights that decrease with distance from the segment of the series under study. The behavior of the algorithm is studied on synthetic data under various weight models. The algorithm is also analyzed on real data, namely stock indices.
+
* Non-parametric forecasting: kernel selection, parameter tuning
-
 
+
* The paper describes kernel smoothing of a time series as a form of nonparametric regression. The essence of the method
-
=== Task 3: [[Isolation of the periodic component of the time series (example)]] ===
+
consists in reconstructing the function of time as a weighted linear combination of points from some neighborhood. A continuous, bounded, symmetric real-valued weight function is called a kernel. The resulting kernel estimate is used to predict the next point in the series. The dependence of prediction quality on the kernel parameters and on the superimposed noise is investigated.
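The kernel estimate described above can be sketched as follows. This is only a minimal illustration, not the implementation from the paper: the Epanechnikov kernel, the one-sided window that ends at the forecast point, and the names `epanechnikov`, `kernel_forecast` and `bandwidth` are assumptions made for the example.

```python
def epanechnikov(u):
    """Bounded symmetric kernel: nonzero only for |u| < 1 (assumed choice)."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def kernel_forecast(series, bandwidth, kernel=epanechnikov):
    """Forecast the next point as a kernel-weighted average of past points.

    The point at time t gets weight kernel((T - t) / bandwidth),
    where T = len(series) is the index of the point being forecast.
    """
    T = len(series)
    weights = [kernel((T - t) / bandwidth) for t in range(T)]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("bandwidth too small: no points in the neighborhood")
    return sum(w * x for w, x in zip(weights, series)) / total


series = [1.0, 2.0, 3.0, 4.0, 5.0]
print(kernel_forecast(series, bandwidth=3.0))
```

Varying `bandwidth` and swapping the `kernel` argument is one way to reproduce the kind of parameter study the description mentions.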
-
 
+
-
The project examines a time series for the presence of a periodic component and builds a trigonometric interpolation of the given time series by the least squares method. The parameters of the least squares function are estimated with respect to forecast quality. The computational experiment reports the results of the correlation function and the least squares method on a noisy synthetic sine and on a real electrocardiogram time series.
+
===2. 2011===
-
 
+
* Exponential Smoothing and Prediction
-
===Task 4: Multivariate caterpillar, choice of length and number of caterpillar components (comparison of smoothed and unsmoothed time series) (example)===
+
* The paper investigates the application of exponential smoothing to time series forecasting. The algorithm takes into account previous values of the series with weights that decrease with distance from the segment of the series under study. The behavior of the algorithm is studied on synthetic data under various weight models. The algorithm is also analyzed on real data, namely stock indices.
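A minimal sketch of simple exponential smoothing, assuming the most common weight model in which a value k steps in the past gets a weight proportional to alpha * (1 - alpha)^k. The function name `exponential_smoothing_forecast` and the parameter `alpha` are illustrative and not taken from the paper.

```python
def exponential_smoothing_forecast(series, alpha):
    """One-step-ahead forecast by simple exponential smoothing.

    The smoothed level is updated as level = alpha * x + (1 - alpha) * level,
    so older values contribute with geometrically decreasing weights.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1.0 - alpha) * level
    return level


print(exponential_smoothing_forecast([10.0, 12.0, 11.0, 13.0], alpha=0.5))
```

With `alpha` close to 1 the forecast follows the latest value; with `alpha` close to 0 it stays near the long-run level of the series.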
-
 
+
-
The paper describes the caterpillar method and its application to time series forecasting. The algorithm extracts the informative components of the studied time series and then builds a forecast. The dependence of forecast accuracy on the choice of the caterpillar length and the number of its components is investigated. The computational experiment reports the results of the algorithm on periodic series with different patterns within a period, on series with broken periodicity, and on real series of hourly temperature.
+
-
 
+
-
===Task 5: [[Prediction by Discrete Argument Functions (example)]] ===
+
-
 
+
-
The paper investigates short time series using monophonic musical melodies as an example. A single note is predicted by exponential smoothing, by a local method, and by a method of searching for constant patterns.
+
-
The computational experiment is carried out on two melodies, one of which has exactly repeating fragments.
+
-
 
+
-
===Task 7: Local forecasting methods, search for a metric ===
+
-
The time series is divided into separate segments, each of which is mapped to a point in an n-dimensional feature space. The local model is computed in three successive stages.
+
-
The first finds the k nearest neighbors of the observed point.
+
-
The second builds a simple model using only these k neighbors.
+
-
The third uses this model to predict the next point from the observed one.
+
-
Many researchers use the Euclidean metric to measure distances between points.
+
-
This work aims to compare forecasting accuracy under different metrics.
+
-
In particular, it is required to investigate the optimal set of weights in a weighted metric that maximizes forecasting accuracy.
+
-
 
+
-
===Task 8: Local forecasting methods, search for an invariant transformation ===
+
-
The project uses local methods of forecasting
+
-
time series. These methods do not look for a representation of the time
+
-
series in a class of given functions of time. Instead, the forecast is made on
+
-
the basis of data about some segment of the time series (local information is used).
+
-
This work studies the following method in detail (a generalization of the classical
+
-
"nearest neighbor").
+
-
 
+
-
Suppose we are given a time series and the task is to continue it. It is assumed that such a continuation is determined by the
+
-
prehistory, i.e. the series must contain a part that, after
+
-
some transformation A, becomes similar to the part we are trying to forecast. Finding such a transformation A is the goal of this project. To measure the degree of similarity, a function B is used: the proximity function of two segments
+
-
of the time series (for details see [http://www.machinelearning.ru/wiki/index.php?title=%D0%9B%D0%BE%D0%BA%D0%B0%D0%BB%D1%8C%D0%BD%D1%8B%D0%B5_%D0%BC%D0%B5%D1%82%D0%BE%D0%B4%D1%8B_%D0%BF%D1%80%D0%BE%D0%B3%D0%BD%D0%BE%D0%B7%D0%B8%D1%80%D0%BE%D0%B2%D0%B0%D0%BD%D0%B8%D1%8F%2C%D0%BF%D0%BE%D0%B8%D1%81%D0%BA_%D0%BC%D0%B5%D1%82%D1%80%D0%B8%D0%BA%D0%B8_%28%D0%BF%D1%80%D0%B8%D0%BC%D0%B5%D1%80%29&action=edit here]). This gives the nearest neighbor of our prehistory. In the general case we look for several
+
-
nearest neighbors. The continuation is written as their linear combination.
+
-
 
+
-
=== Task 9: Time series alignment: forecasting with DTW (example) ===
+
-
 
+
-
[[Time series|A time series]] is a time-ordered sequence of values of some real variable <tex>$\mathbf{x}=\{x_{t}\}_{t=1}^T\in\mathbb{R}^T$</tex>. A problem that naturally arises with time series is the comparison of one data sequence with another. Comparing sequences becomes much easier after warping the time series along one of the axes and aligning it. Dynamic time warping (DTW) is a technique for efficient alignment of time series. DTW methods are used in speech recognition, in information analysis in robotics, in industry, in medicine and in other fields.
+
-
 
+
-
The aim of the work is to give an example of alignment and to introduce a functional for comparing two time series that has the natural properties of commutativity, reflexivity and transitivity. The functional takes two time series as input and returns a number characterizing the degree of their "similarity".
+
-
 
+
-
=== Task 10: Choosing an activation function for forecasting with neural networks===
+
-
 
+
-
The aim of the project is to study how the forecast quality of neural networks without feedback (single-layer and multilayer perceptrons) depends on the chosen neuron activation function and on the parameters of this function.
+
-
 
+
-
The result of the project is an assessment of the forecast quality of neural networks depending on the type and parameters of the activation function.
+
-
 
+
-
===Task 12: Investigation of convergence in forecasting by neural networks with feedback===
+
-
 
+
-
The dependence of the convergence rate in time series forecasting on the parameters of a neural network with feedback is investigated. The concept of feedback is typical of dynamic systems in which the output signal of some element of the system
+
-
affects the input signal of that element. The output signal can be represented as an infinite weighted
+
-
sum of the current and previous input signals. A Jordan network is used as the neural network model.
+
-
It is proposed to investigate the convergence rate as a function of the choice of activation function (sigmoid,
+
-
hyperbolic tangent), of the number of neurons in the intermediate layer, and of the width of the sliding window.
+
-
A way of increasing the convergence rate using the generalized delta rule is also investigated.
+
-
===Task 13: [[Multidimensional caterpillar, choice of length and number of caterpillar components (example)]]===
+
===3. 2011 ===
 +
* Isolation of the periodic component of the time series
 +
* The project examines a time series for the presence of a periodic component and builds a trigonometric interpolation of the given time series by the least squares method. The parameters of the least squares function are estimated with respect to forecast quality. The computational experiment reports the results of the correlation function and the least squares method on a noisy synthetic sine and on a real electrocardiogram time series.
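One way the correlation function can reveal a periodic component is sketched below. It is an illustrative assumption rather than the project's procedure: the period is taken to be the first lag at which the sample autocorrelation has a positive local maximum, and the names `autocorrelation` and `estimate_period` are hypothetical.

```python
import math


def autocorrelation(series, lag):
    """Sample autocorrelation of the series at the given lag (biased estimate)."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    covariance = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return covariance / variance


def estimate_period(series, max_lag):
    """Return the first lag with a positive local maximum of the autocorrelation.

    Returns None if no such lag exists in the range 2 .. max_lag - 1.
    """
    acf = [autocorrelation(series, lag) for lag in range(max_lag + 1)]
    for lag in range(2, max_lag):
        if acf[lag] > 0 and acf[lag] > acf[lag - 1] and acf[lag] >= acf[lag + 1]:
            return lag
    return None


# Synthetic sine with a period of 20 samples.
sine = [math.sin(2 * math.pi * t / 20) for t in range(200)]
print(estimate_period(sine, max_lag=60))
```

On noisy data the local-maximum rule may need smoothing or a threshold; once a period is found, a trigonometric model with that period can be fitted by least squares.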
-
The work studies one of the methods for analyzing multivariate time series, the "caterpillar" method, also known as Singular Spectrum Analysis (SSA). The method can be divided into four stages: representing the time series as a matrix by means of a shift procedure, computing the sample covariance matrix and its singular value decomposition, selecting the principal components related to different constituents of the series (from slowly varying and periodic to noise), and, finally, reconstructing the series.
+
===4. 2011 ===
 +
* Multivariate caterpillar, choice of length and number of caterpillar components (comparison of smoothed and unsmoothed time series)
 +
* The paper describes the caterpillar method and its application for time series forecasting. The algorithm is based on the selection of its informative components from the studied time series and the subsequent construction of a forecast. The dependence of the accuracy of forecasts on the choice of the caterpillar length and the number of its components is investigated. In a computational experiment, the results of the algorithm's operation on periodic series with different patterns within a period, on series with violation of periodicity, as well as on real time series of hourly temperature, are presented.
-
The algorithm is applied to problems in meteorology and geophysics as well as in economics and medicine. The aim of this work is to determine how the efficiency of the algorithm depends on the choice of time series used in it.
+
===5. 2011===
 +
* Prediction by Discrete Argument Functions
 +
* The paper investigates short time series using monophonic musical melodies as an example. A single note is predicted by exponential smoothing, by a local method, and by a method of searching for constant patterns. The computational experiment is carried out on two melodies, one of which has exactly repeating fragments.
-
===Task 14: Using the Granger test in time series forecasting===
+
===7. 2011===
 +
* Local forecasting methods, search for metrics
 +
* The time series is divided into separate segments, each of which is mapped to a point in an n-dimensional feature space. The local model is computed in three successive stages. The first finds the k nearest neighbors of the observed point. The second builds a simple model using only these k neighbors. The third uses this model to predict the next point from the observed one. Many researchers use the Euclidean metric to measure distances between points. This work compares forecasting accuracy under different metrics. In particular, it is required to investigate the optimal set of weights in a weighted metric that maximizes forecasting accuracy.
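The three stages above can be sketched as follows. This is a minimal illustration under stated assumptions: segments are sliding windows of fixed length, the "simple model" is the mean of the neighbors' successors, and the names `weighted_distance`, `local_forecast`, `window` and `weights` are hypothetical.

```python
import math


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two segments of equal length."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def local_forecast(series, window, k, weights=None):
    """Forecast the next point from the k segments closest to the last window."""
    if weights is None:
        weights = [1.0] * window  # plain Euclidean metric
    query = series[-window:]
    # Stage 1: distances from every earlier segment (with a known successor).
    candidates = []
    for start in range(len(series) - window):
        segment = series[start:start + window]
        successor = series[start + window]
        candidates.append((weighted_distance(segment, query, weights), successor))
    if not candidates:
        raise ValueError("series is too short for the chosen window")
    candidates.sort(key=lambda pair: pair[0])
    nearest = candidates[:k]
    # Stages 2 and 3: the simple model is the mean successor of the neighbors.
    return sum(successor for _, successor in nearest) / len(nearest)


series = [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1]
print(local_forecast(series, window=2, k=3))
```

Comparing metrics then amounts to passing different `weights` vectors and measuring the forecast error on held-out points.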
-
When forecasting a series, it can be useful to determine whether the series "depends" on some other series. The Granger test, which is based on statistical tests, helps to reveal such a relationship (the method does not guarantee an exact result: an error is possible when comparing two series that both depend on a third series). The method is used in forecasting economic and natural phenomena (for example, earthquakes).
+
===8. 2011===
 +
* Local prediction methods, search for invariant transformation
 +
* The project uses local methods of time series forecasting. These methods do not look for a representation of the time series in a class of given functions of time. Instead, the forecast is made on the basis of data about some segment of the time series (local information is used). This work studies the following method in detail (a generalization of the classical "nearest neighbor").
 +
* Suppose we are given a time series and the task is to continue it. It is assumed that such a continuation is determined
 +
by the prehistory, i.e. the series must contain a part that, after some transformation A, becomes similar to the part we are trying to forecast. Finding such a transformation A is the goal of this project. To measure the degree of similarity, a function B is used: the proximity function of two segments of the time series. This gives the nearest neighbor of our prehistory. In the general case we look for several nearest neighbors. The continuation is written as their linear combination.
-
The aim of the work is to propose an algorithm that makes the best use of this method and to investigate the effectiveness of the method depending on the series being forecast.
+
===9. 2011 ===
 +
* Time series alignment: forecasting with DTW
 +
* A time series is a time-ordered sequence of values of some real variable <tex>$\mathbf{x}=\{x_{t}\}_{t=1}^T\in\mathbb{R}^T$</tex>. A problem that naturally arises with time series is the comparison of one data sequence with another. Comparing sequences becomes much easier after warping the time series along one of the axes and aligning it. Dynamic time warping (DTW) is a technique for efficient alignment of time series. DTW methods are used in speech recognition, information analysis in robotics, industry, medicine and other areas.
 +
* The purpose of the work is to give an example of alignment and to introduce a functional for comparing two time series that has the natural properties of commutativity, reflexivity and transitivity. The functional takes two time series as input and returns a number characterizing the degree of their "similarity".
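A minimal sketch of the classical DTW distance computed by dynamic programming, assuming the absolute difference as the local cost and no window constraint. The name `dtw_distance` is illustrative. This plain variant is symmetric and returns zero for identical series, but it is not guaranteed to satisfy the triangle inequality, so a functional with all the properties listed above would need additional construction.

```python
def dtw_distance(a, b):
    """Cost of the cheapest monotone alignment between series a and b."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the cheapest alignment of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] is matched to b[j-1] again
                cost[i][j - 1],      # b[j-1] is matched to a[i-1] again
                cost[i - 1][j - 1],  # both series advance
            )
    return cost[n][m]


print(dtw_distance([1, 2, 3], [1, 1, 2, 2, 3]))
```

The second series is a time-stretched copy of the first, so the alignment cost is zero even though the series have different lengths.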
-
===Task 15: Forecasting and spline approximation===
+
===10. 2011===
-
Description.
+
* Choosing an activation function for forecasting with neural networks
 +
* The aim of the project is to study the dependence of the quality of prediction by neural networks without feedback (single- and multilayer perceptrons) on the chosen activation function of neurons in the network, as well as on the parameters of this function.
 +
* The result of the project is to evaluate the quality of forecasting by neural networks depending on the type and parameters of the activation function.
-
===Task 16: ARIMA and GARCH in forecasting highly volatile series ===
+
===12. 2011===
-
Description.
+
* Investigation of Convergence in Prediction by Neural Networks with Feedback
 +
* The dependence of the convergence rate in time series forecasting on the parameters of a neural network with feedback is investigated. The concept of feedback is typical of dynamic systems in which the output signal of some element of the system affects the input signal of that element. The output signal can be represented as an infinite weighted sum of the current and previous input signals. A Jordan network is used as the neural network model. It is proposed to investigate the convergence rate as a function of the choice of activation function (sigmoid, hyperbolic tangent), of the number of neurons in the intermediate layer, and of the width of the sliding window. A way of increasing the convergence rate using the generalized delta rule is also explored.
-
===Task 17: Forecasting and SVM regression ===
+
===13. 2011===
-
Description.
+
* Multidimensional caterpillar, choice of length and number of caterpillar components
 +
* The work studies one of the methods for analyzing multivariate time series, the "caterpillar" method, also known as Singular Spectrum Analysis (SSA). The method can be divided into four stages: representing the time series as a matrix by means of a shift procedure, computing the sample covariance matrix and its singular value decomposition, selecting the principal components related to different constituents of the series (from slowly varying and periodic to noise), and, finally, reconstructing the series.
 +
* The algorithm is applied to problems in meteorology and geophysics as well as in economics and medicine. The purpose of this work is to determine how the efficiency of the algorithm depends on the choice of time series used in it.
-
== Talks and exam (subject to change) ==
+
===14. 2011===
-
* Talk 1: April 6
+
* Using the Granger Test in Time Series Forecasting
-
* Checkpoint: May 12
+
* When forecasting a series, it can be useful to determine whether the series "depends" on some other series. The Granger test, which is based on statistical tests, helps to reveal such a relationship (the method does not guarantee an exact result: an error is possible when comparing two series that both depend on a third series). The method is used in forecasting economic and natural phenomena (for example, earthquakes).
-
* Exam: May 19
+
* The purpose of the work is to propose an algorithm that makes the best use of this method and to investigate the effectiveness of the method depending on the series being forecast.

Current version


2023

Problem 112

  • Title: Modeling fMRI readings from the video a person is watching
  • Problem description: It is required to build a model of the dependence between the readings of fMRI sensors and the video sequence that a person is watching at that moment.
  • Data: The sample for approximation is presented in the work of J. Berezutskaya, which contains various types of parallel signals.
  • Literature: Berezutskaya J., et al. Open multimodal iEEG-fMRI dataset from naturalistic stimulation with a short audiovisual film // Sci Data 9, 91, 2022.
  • Predecessor code:
  • Base algorithm: Running code based on transformer models.
  • Novelty: Analysis of the relationship between sensor readings and human perceptions of the external world. It is required to test the hypothesis of the relationship between the data, as well as to propose a method for approximating FMRI readings based on the video sequence being viewed.
  • Authors: Expert Grabovoi Andrey.

Problem 113

  • Title: Modeling fMRI readings from the audio a person hears
  • Problem description: It is required to build a model of the dependence between the readings of fMRI sensors and the audio track that a person is listening to at that moment.
  • Data: The sample for approximation is presented in the work of J. Berezutskaya, which contains various types of parallel signals.
  • Literature: Berezutskaya J., et al. Open multimodal iEEG-fMRI dataset from naturalistic stimulation with a short audiovisual film // Sci Data 9, 91, 2022.
  • Predecessor code:
  • Base algorithm: Running code based on transformer models.
  • Novelty: Analysis of the relationship between sensor readings and human perceptions of the external world. It is required to test the hypothesis of the relationship between the data, as well as to propose a method for approximating the fMRI readings from the audio sequence being listened to.
  • Authors: Expert Grabovoi Andrey.

Problem 114

  • Title: Simulating the Dynamics of Physical Systems with Physics-Informed Neural Networks
  • Problem description: The problem of choosing the optimal model for predicting the dynamics of a physical system is solved. The dynamics of a system means the change of its parameters over time. Neural networks have no a priori knowledge of the system being modeled, which prevents them from obtaining optimal parameters that respect physical laws. The Lagrangian neural network takes the law of conservation of energy into account when modeling dynamics. In this work, a Noetherian Lagrangian neural network is proposed that takes into account the laws of conservation of momentum and angular momentum in addition to the law of conservation of energy. It is shown that for this problem the Noetherian Lagrangian neural network is optimal among a fully connected neural network, a long short-term memory (LSTM) network and a Lagrangian neural network. The models were compared on artificially generated data for the double pendulum, which is the simplest chaotic system. The experimental results confirm the hypothesis that introducing a priori knowledge about the physics of the system improves the quality of the model.
  • Problem description: Generate a set of convolutions from the available data and choose the best one using order and dimensionality reduction techniques.
  • Data: Biomedical accelerometer and gyroscope data, ocean currents, dune movement, air currents.
  • Literature: The base work contains references.
  • Base algorithm: Neural network, Lagrangian neural networks.
  • Solution: Noetherian Lagrangian neural network.
  • Novelty: The proposed network takes into account the symmetry.
  • Authors: Experts Severilov, Strijov V.V., consultant - Panchenko.

Problem 115

  • Title: Knowledge distillation in deep networks and alignment of model structures
  • Problem description: It is required to build a network of the simplest structure, a student model, using a high-quality teacher model. Show how the student's accuracy and stability change. The result of the experiment is a complexity-accuracy-stability plot in which each model is a point.
  • Data: CIFAR-10. It is assumed that the teacher has a structure open for analysis with a large number of layers.
  • Literature: Hinton's original work on distillation, work by Andrei Grabovoi, work by Maria Gorpinich
  • Base algorithm: Training (models with a given structure of controlled complexity) without distillation. Teaching (ditto) with Hinton distillation. Layered learning. Neuronal transfer learning.
  • Solution: As in paragraph 2, only in layers. Building the path of least cost over neurons. We consider the covariance matrices of each neuron of each layer for the teacher and for the student. We propose an error function that includes the cost of the least cost path. We propose a way to construct the path of the least cost. The main idea: the transfer goes through pairs of neurons and the most similar distributions (expectation and covariance matrix) from teacher to student.
  • Novelty: The proposed transfer significantly reduces complexity without loss of accuracy and solves the problem of interchangeability of neurons by identifying them.
  • Authors: Experts Bakhteev Oleg, Strijov V.V., Consultant Gorpinich Maria.
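The baseline "Hinton distillation" mentioned above can be sketched for a single example as follows. This is an illustration under assumptions, not the project's code: the loss mixes the cross-entropy on the true label with the cross-entropy between temperature-softened teacher and student distributions, scaled by the squared temperature; the names `softmax`, `distillation_loss`, `temperature` and `alpha` are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax of logits divided by the temperature (numerically stable)."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of the hard-label loss and the soft-target loss."""
    # Hard part: cross-entropy of the student against the true label.
    hard = -math.log(softmax(student_logits)[label])
    # Soft part: cross-entropy between softened teacher and student outputs.
    teacher_soft = softmax(teacher_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    soft = -sum(t * math.log(s) for t, s in zip(teacher_soft, student_soft))
    return alpha * hard + (1.0 - alpha) * temperature ** 2 * soft


teacher = [5.0, 1.0, 0.0]
print(distillation_loss([4.0, 1.5, 0.2], teacher, label=0))
```

A layer-wise or neuron-wise transfer, as proposed in the solution above, would add further terms to this loss; those terms are not sketched here.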

Problem 116

  • Title: Neural differential equations for modeling physical activity - selection and generation of mathematical models
  • Problem description: The problem of choosing the optimal mathematical model is posed as a genetic optimization problem. The optimality criterion is defined in terms of the accuracy, complexity, and stability of the model. The sampling procedure itself consists of two steps: generating a new structure and rejecting this structure if it does not satisfy the optimality criterion. It is required to choose the optimal model on 'pendulum'-type data: accelerometer, myogram, pulse wave.
  • Data: WISDM, own collection of biomedical data
  • Literature: Neural CDE
  • Base algorithm: Neural ODE/CDE on a two-layer neural network.
  • Solution: A number of experiments have already been performed, where sampling is performed by a genetic algorithm. Acceptable results have been obtained. It is proposed to analyze and improve them.
  • Solution: Algorithm for generating mathematical models in the form of ordinary differential equations. Comparison of models and solvers on biomedical data.
  • Authors: Expert Strijov V.V., consultant Eduard Vladimirov
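
The base algorithm, a neural ODE whose right-hand side is a two-layer network, can be sketched as below. This is only a forward pass under stated assumptions: the parameter names, the tanh activation and the fixed-step RK4 solver are choices made for the illustration, and training (for example by the adjoint method) is not shown.

```python
import numpy as np

def init_params(state_dim, hidden_dim, rng):
    """Random weights of a two-layer network (hypothetical initialisation)."""
    return {
        "W1": 0.1 * rng.standard_normal((hidden_dim, state_dim)),
        "b1": np.zeros(hidden_dim),
        "W2": 0.1 * rng.standard_normal((state_dim, hidden_dim)),
        "b2": np.zeros(state_dim),
    }

def vector_field(z, params):
    """Right-hand side dz/dt = W2 tanh(W1 z + b1) + b2."""
    return params["W2"] @ np.tanh(params["W1"] @ z + params["b1"]) + params["b2"]

def integrate_rk4(z0, params, t_grid):
    """Integrate the ODE on t_grid with the classical fourth-order Runge-Kutta scheme."""
    trajectory = [np.asarray(z0, dtype=float)]
    for t0, t1 in zip(t_grid[:-1], t_grid[1:]):
        h = t1 - t0
        z = trajectory[-1]
        k1 = vector_field(z, params)
        k2 = vector_field(z + 0.5 * h * k1, params)
        k3 = vector_field(z + 0.5 * h * k2, params)
        k4 = vector_field(z + h * k3, params)
        trajectory.append(z + h / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4))
    return np.stack(trajectory)
```

Swapping the solver (Euler, RK4, adaptive schemes) while keeping the same vector field is one simple way to set up the comparison of models and solvers mentioned in the solution.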

Problem 117

  • Title: Search for dependencies of biomechanical systems (do people dance in pairs or independently?) (Convergent Cross Mapping method, Takens' theorem)
  • Problem description: When forecasting complex time series that depend on exogenous factors and have multiple periodicity, it is required to solve the problem of identifying related pairs of series. It is assumed that the addition of these series to the model improves the quality of the forecast. In this paper, to detect relationships between time series, it is proposed to use the convergent cross-mapping method. With this approach, two time series are connected if their trajectory subspaces exist, the projections onto which are connected. In turn, the projections of series onto trajectory subspaces are related if the neighborhood of the phase trajectory of one series is mapped to the neighborhood of the phase trajectory of another series. The problem of finding trajectory subspaces that reveal the connection of series is set.
  • Literature: Everything Sugihara wrote in Science and Nature (ask for the collection). Usmanova K.R., Strijov V.V. Detection of dependencies in time series in the problems of building predictive models // Systems and means of informatics, 2019, 29(2). Neural CDE.
  • Data: Accelerometer, gyroscope, and other data describing dynamic systems
  • Solution: The basic solution is in Karina's work. Our approach is to build a Neural ODE for both signals and decide whether both models belong to the same dynamical system.
  • Authors: Expert Strijov V.V., consultants Vladimirov, Samokhina
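
The convergent cross-mapping idea described above (neighbourhoods on the phase trajectory of one series are mapped to neighbourhoods of the other) can be sketched in NumPy. This is a simplified illustration, assuming a fixed embedding dimension and delay and using the whole series as the library; the function names are hypothetical.

```python
import numpy as np

def delay_embed(x, dim, tau):
    """Rows are delay vectors (x[t], x[t - tau], ..., x[t - (dim - 1) * tau])."""
    x = np.asarray(x, dtype=float)
    start = (dim - 1) * tau
    return np.column_stack(
        [x[start - k * tau: len(x) - k * tau] for k in range(dim)]
    )

def cross_map_skill(x, y, dim=3, tau=1):
    """Correlation between y and its estimate from neighbours on the shadow manifold of x."""
    mx = delay_embed(x, dim, tau)
    y_aligned = np.asarray(y, dtype=float)[(dim - 1) * tau:]
    dists = np.linalg.norm(mx[:, None, :] - mx[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)            # a point is not its own neighbour
    nn = np.argsort(dists, axis=1)[:, :dim + 1]
    d = np.take_along_axis(dists, nn, axis=1)
    # Exponential weights relative to the distance to the nearest neighbour.
    w = np.exp(-d / np.maximum(d[:, :1], 1e-12))
    w /= w.sum(axis=1, keepdims=True)
    y_hat = (w * y_aligned[nn]).sum(axis=1)
    return np.corrcoef(y_aligned, y_hat)[0, 1]
```

In the full method the skill is tracked as the library length grows; convergence of the skill with library size is what indicates a link between the series.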

Problem 118

  • Title: Continuous time when building a BCI neural interface
  • Problem description: In signal decoding problems, data are represented as multidimensional time series. A discrete representation of time is usually used to solve them. However, recent work on neural ordinary differential equations illustrates the ability to work with the hidden state of recurrent neural networks as with solutions to differential equations. This allows us to consider time series as continuous in time.
  • Data: For classification: the P300 dataset, which was used to write an article with Alina; the DEAP dataset, similar to it in the format of records. Find a modern dataset; ask U. Grenoble-Alpes.
  • Literature: Neural CDE
  • Base algorithm: Alina Samokhina's algorithm
  • Solution: Using Neural ODE variations to approximate the original signal. Comparative analysis of existing approaches to the application of differential equations for EEG classification. (Encoder: tensor decomposition; decoder: Neural CDE.)
  • Novelty: A way to construct a continuous signal representation is proposed. Working with the functional space of the signal, not its discrete representation. Using the parameters of the resulting function as a feature space of the resulting model.
  • Authors: Expert Strijov V.V. (was Problem 109), consultant Tikhonov

Problem 119

  • Title: Analysis of the dynamics of multiple learning
  • Problem description: Consider a supervised multiple learning problem in which the training set is not fixed but is updated depending on the predictions of the trained model on the test set. We build a mathematical model of the repeated process of training, prediction, and sample updating, and study the properties of this process based on the constructed model. Let f(x) be the density function of the feature distribution, and let G be an algorithm that trains the model, generates predictions on the test set, and mixes the predictions into the training set, as a result of which the feature distribution changes. Let F be the space of non-negative smooth functions whose integral over R^n is equal to one. Then f_{t+1}(x) = G(f_{t})(x), where G is the evolution operator on the space F and the initial function f_0(x) is known. In general, G can be an arbitrary operator, not necessarily smooth and/or continuous.
    Question 0. Find conditions on the operator G under which the image of G lies in the same class F of distribution density functions. In particular, must G be bounded, with operator norm ||G|| <= 1, so that G(f) \in F is also a distribution density function for any f from F? Does there exist a unit in the space F with respect to the operator G, and what is the identity function f in such F?
    Question 1. Under what conditions on G does there exist a t_0 such that for all t > t_0 the tail of the sequence {f_t} is bounded?
    Question 2. Under what conditions does the operator G have a fixed point?
  • Data: In a computational experiment, it is proposed to check the significance of the conditions under which the answers to questions 0-2 are obtained, for example, for linear regression and/or regression with a multilayer fully connected neural network with different proportions of predictions mixed into the training set, on synthetic datasets.
  • Literature:
    1. Khritankov A., Hidden Feedback Loops in Machine Learning Systems: A Simulation Model and Preliminary Results, https://doi.org/10.1007/978-3-030-65854-0_5
    2. Khritankov A., Pilkevich A. Existence Conditions for Hidden Feedback Loops in Online Recommender Systems, https://doi.org/10.1007/978-3-030-91560-5_19
    3. Katok A.B., Hasselblatt B. Introduction to the modern theory of dynamical systems. 1999. 768 p. ISBN 5-88688-042-9.
    4. Nemytsky V. V., Stepanov V. V. Qualitative theory of differential equations, published in 1974.
  • Authors: Expert Khritankov A.S., Expert Afanasiev A.P.
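
The computational experiment for linear regression can be sketched as follows. This is one possible instance of the operator G, assuming a sliding training window and a fixed proportion `mix` of labels replaced by model predictions; all names and constants are hypothetical.

```python
import numpy as np

def simulate_feedback_loop(n_steps=20, batch=200, mix=0.5, noise=1.0,
                           window=1000, seed=0):
    """Repeatedly fit linear regression and mix its predictions into the training set.

    Returns the standard deviation of training residuals at every step,
    a simple statistic of the evolving data distribution f_t.
    """
    rng = np.random.default_rng(seed)
    w_true = np.array([2.0, -1.0])
    X = rng.standard_normal((window, 2))
    y = X @ w_true + noise * rng.standard_normal(window)
    residual_std = []
    for _ in range(n_steps):
        w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)      # training
        residual_std.append(float(np.std(y - X @ w_hat)))
        X_new = rng.standard_normal((batch, 2))            # new objects
        y_new = X_new @ w_true + noise * rng.standard_normal(batch)
        replaced = rng.random(batch) < mix                 # labels taken from the model
        y_new[replaced] = (X_new @ w_hat)[replaced]
        X = np.vstack([X, X_new])[-window:]                # sliding training window
        y = np.concatenate([y, y_new])[-window:]
    return residual_std
```

With mix = 1.0 the residual spread collapses towards zero, while with mix = 0.0 it stays near the noise level; intermediate proportions are the interesting regime for questions 1 and 2.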

Problem 120

  • Title: Differentiable algorithm for searching ensembles of deep learning models with diversity control
  • Problem description: The problem of selecting an ensemble of models is considered. It is required to propose a method for controlling the diversity of basic models at the stage of application.
  • Data: Fashion-MNIST, CIFAR-10, CIFAR-100 datasets
  • Literature:
    1. Neural Architecture Search with Structure Complexity Control
    2. Neural Ensemble Search via Bayesian Sampling
    3. DARTS: Differentiable Architecture Search
  • Base algorithm: It is proposed to use DARTS [3] as the basic algorithm.
  • Solution: To control the diversity of basic models, it is proposed to use a hypernet [1], which shifts the structural parameters in terms of the Jensen-Shannon divergence. At the application stage, base architectures are sampled with a given offset to build an ensemble.
  • Novelty: The proposed method allows building ensembles with any number of base models without additional computational costs relative to the base algorithm.
  • Authors: K.D. Yakovlev, Bakhteev Oleg
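
The Jensen-Shannon divergence used to measure the shift between structural parameters can be computed as below. The sketch assumes that the structural parameters of one edge are logits over candidate operations, as in DARTS; how the hypernet produces the shifted logits is not shown.

```python
import numpy as np

def softmax(a):
    """Turn structure logits into a distribution over candidate operations."""
    e = np.exp(a - np.max(a))
    return e / e.sum()

def kl(p, q):
    """KL(p || q) over the support of p."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded by log 2."""
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

For example, `js_divergence(softmax(alpha_base), softmax(alpha_shifted))` gives a diversity score between a base architecture and a shifted one, where both logit vectors are hypothetical inputs.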

Problem 121

  • Title: Building predictive analytics for air pollution sensors
  • Problem description: Time series data are available from air quality monitoring stations in Moscow and the Moscow region. The problem is to assess the achievable quality of forecasting the station readings from their history and from additional features: the stations taken in aggregate with their locations, the time of day, weekend / working day, and the weather history and forecast (wind).
  • Data: Real data and simulations for Moscow and Moscow Region
  • Authors: Artem Mikhailov, Vladimir Vanovsky

Problem 122

  • Title: Reducing the dimension of space in a generative modeling problem using reversible models
  • Problem description: An example of a generative modeling problem is image generation. Some kinds of new models, such as normalizing flows or diffusion models, define reversible transformations. But at the same time they work in a space of very high dimension. It is proposed to combine two approaches: dimensionality reduction and generative modeling.
  • Data: Any image dataset (MNIST/CIFAR10).
  • Novelty: By reducing the dimension, you can achieve a significant acceleration of generative models, which will reduce the complexity of such models.
  • Author: Roman Isachenko

Problem 123

  • Title: Analysis of distribution bias in the contrastive learning problem
  • Problem description: Consider the problem of representation learning. One of the most popular approaches to solving this problem is contrastive learning. At the same time, the data we learn from often contain labeling errors: false positives and false negatives. It is proposed to analyze various ways to eliminate the biases caused by these errors, and also to explore the properties of the proposed models.
  • Data: Any image dataset (MNIST/CIFAR10).
  • Novelty: Current models are very error sensitive. If you manage to take into account the bias in the distributions, many methods of ranking products will greatly increase in quality.
  • Author: Roman Isachenko

Problem 124

  • Title: Speed up sampling from diffusion models using adversarial networks
  • Problem description: The most popular generative model today is the diffusion model. Its main disadvantage is the speed of sampling. To sample one image, a neural network has to be run 100-1000 times. There are ways to speed up this process. One such way is to use adversarial networks. It is proposed to develop this method and explore various ways to set the functional for sampling.
  • Data: Any image dataset (MNIST/CIFAR10).
  • Novelty: By speeding up diffusion models, they will become even more popular and easier to use.
  • Author: Roman Isachenko
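
The sampling cost mentioned above (one network call per step) is visible in a sketch of standard DDPM ancestral sampling. The linear beta schedule and the stand-in `CountingModel` are assumptions for the illustration; a real setup would use a trained noise-prediction network.

```python
import numpy as np

class CountingModel:
    """Stand-in for a trained noise-prediction network; it only counts its calls."""
    def __init__(self):
        self.calls = 0

    def __call__(self, x, t):
        self.calls += 1
        return np.zeros_like(x)

def ddpm_sample(eps_model, shape, n_steps=1000, seed=0):
    """Ancestral sampling: n_steps sequential evaluations of eps_model."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, n_steps)     # assumed linear noise schedule
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.standard_normal(shape)               # start from pure noise
    for t in range(n_steps - 1, -1, -1):
        eps = eps_model(x, t)                    # one network evaluation per step
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        noise = rng.standard_normal(shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x
```

Acceleration methods, including the adversarial ones discussed here, aim to reduce `n_steps` while keeping sample quality.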

Problem 125

  • Title: Influence of the lockdown on the dynamics of the spread of the epidemic
  • Problem description: The introduction of a lockdown is considered an effective measure to combat the epidemic. However, contrary to intuition, it turned out that under certain conditions a lockdown can lead to an increase in the epidemic. This effect is absent in the classical models of the spread of the epidemic "on average", but was revealed when modeling the epidemic on the contact graph. The problem is to find formulaic and quantitative relationships between the parameters under which the lockdown can lead to an increase in the epidemic. It is necessary both to identify such relationships in the SEIRS/SEIR/SIS/etc. models based on the SEIRS+ epidemiological framework (and its modifications), and to theoretically substantiate the relationships obtained from specific realizations of the epidemic.
  • Data: The problem involves working with model and synthetic data: there are ready-made data, and it is also possible to generate new ones in the process of solving the problem. This problem belongs to unsupervised learning, since a realization of the epidemic on the contact graph has a high proportion of random events and therefore requires analysis on average over many synthetically generated realizations of the epidemic.
  • Literature: T. Harko, Francisco S. N. Lobo, and M. Mak. "Exact analytical solutions of the Susceptible-Infected-Recovered (SIR) epidemic model and of the SIR model with equal death and birth rates"
  • Authors: A.Yu. Bishuk, A.V. Zuhba
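
A single realization of an epidemic on a contact graph with a lockdown can be generated as sketched below. This is a plain NumPy discrete-time SIR model written for illustration; it does not use the SEIRS+ framework, and the random graph, the per-contact infection probability and the lockdown rule (keeping a random fraction of edges) are all assumptions.

```python
import numpy as np

def random_contact_graph(n, p_edge, rng):
    """Symmetric boolean adjacency matrix of a random contact graph."""
    upper = np.triu(rng.random((n, n)) < p_edge, k=1)
    return upper | upper.T

def simulate_sir(adj, beta, gamma, n_steps, lockdown_at=None,
                 keep_fraction=1.0, seed=0):
    """Return an array of (S, I, R) counts per step for one realization."""
    rng = np.random.default_rng(seed)
    n = adj.shape[0]
    state = np.zeros(n, dtype=int)               # 0 = S, 1 = I, 2 = R
    state[rng.choice(n, size=5, replace=False)] = 1
    history = []
    for step in range(n_steps):
        if lockdown_at is not None and step == lockdown_at:
            # Lockdown: keep only a random fraction of the contacts.
            keep = np.triu(rng.random((n, n)) < keep_fraction, k=1)
            adj = adj & (keep | keep.T)
        infected_neighbours = adj[:, state == 1].sum(axis=1)
        p_infect = 1.0 - (1.0 - beta) ** infected_neighbours
        new_inf = (state == 0) & (rng.random(n) < p_infect)
        new_rec = (state == 1) & (rng.random(n) < gamma)
        state[new_inf] = 1
        state[new_rec] = 2
        history.append(np.bincount(state, minlength=3))
    return np.array(history)
```

Averaging `history` over many seeds, with and without `lockdown_at`, gives the kind of "on average" comparison the data description calls for.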

Problem 126

  • Title: Machine generation style change detection
  • Problem description: It is required to propose a detection method.
  • Data: The sample for approximation is presented in the work of J. Berezutskay, in which there are various types of parallel signals.
  • Literature:
    1. G. Gritsay, A. Grabovoy, Y. Chekhovich. Automatic Detection of Machine Generated Texts: Need More Tokens // Ivannikov Memorial Workshop (IVMEM), 2022.
    2. M. Kuznetsov, A. Motrenko, R. Kuznetsova, V. Strijov. Methods for intrinsic plagiarism detection and author diarization // Working Notes of CLEF, 2016, 1609 : 912-919.
    3. RuATD competition.
  • Base algorithm: Use the results of the RuATD competition as base models for classifying sentences. Use the method from Kuznetsov et al.
  • Novelty: Propose a method for detecting machine-generated fragments in a text using methods for detecting changes in writing style.
  • Authors: Expert Grabovoi Andrey

Problem 128

  • Title: Build a deep learning model based on the problem data
  • Problem description: The problem of optimizing a deep learning model for a new dataset is considered. It is required to propose a model optimization method that allows generating new models for a new dataset with low computational costs.
  • Data: CIFAR10, CIFAR100
  • Literature: variational inference for neural networks, hypernets, similar work tailored to change the model depending on a predetermined complexity
  • Base algorithm: Retrain the model directly.
  • Solution: The proposed method is to represent a deep learning model as a hypernet (a network that generates the parameters of another network) using a Bayesian approach. Probabilistic assumptions about the parameters of deep learning models are introduced, and a variational lower bound on the Bayesian model evidence is maximized. The variational bound is considered as a conditional quantity that depends on the information about the problem data.
  • Novelty: The proposed method allows you to generate models in one-shot mode (practically without retraining) for the required problem, which significantly reduces the cost of optimization and retraining.
  • Authors: Olga Grebenkova and Bakhteev Oleg

Problem 129

  • Title: Spatiotemporal Prediction with Convolutional Networks and Tensor Decompositions
  • Problem description: Generate a set of convolutions from the available data and choose the best one using order and dimensionality reduction techniques.
  • Data: Consumption and price of electricity, ocean currents, dune movement, air currents
  • Literature:
    1. Tensor-based Singular Spectrum Analysis for Automatic Scoring of Sleep EEG
    2. Tensor based singular spectrum analysis for nonstationary source separation
  • Base algorithm: Caterpillar (singular spectrum analysis, SSA), tensor Caterpillar (tensor SSA).
  • Solution: Find a multi-periodic time series, build its tensor representation, decompose into a spectrum, collect, show the forecast.
  • Novelty: Show that a multilinear model is a convenient way to construct convolutions for dimensions in space and time.
  • Authors: Expert Strijov V.V., consultant Nadezhda Alsakhanova
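
The basic (non-tensor) Caterpillar step, "decompose into a spectrum, collect", can be sketched as follows. This is a minimal single-series SSA in NumPy with hypothetical function names; the tensor variant and the forecasting step are not shown.

```python
import numpy as np

def ssa_components(series, window):
    """Decompose a series into elementary SSA components.

    Returns (components, singular_values); the components sum to the series.
    """
    x = np.asarray(series, dtype=float)
    n, k = len(x), len(x) - window + 1
    # Trajectory (Hankel) matrix: column i is the segment x[i : i + window].
    traj = np.column_stack([x[i:i + window] for i in range(k)])
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    components = []
    for j in range(len(s)):
        elem = s[j] * np.outer(u[:, j], vt[j])
        # Diagonal averaging: average each anti-diagonal (row + column = d).
        comp = np.array([np.mean(elem[::-1].diagonal(d - window + 1))
                         for d in range(n)])
        components.append(comp)
    return np.array(components), s
```

Summing the first few components gives a smoothed trend-plus-periodic reconstruction; which components to group is a modeling decision made from the singular value spectrum.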

Problem 130

  • Title: Automatic term extraction for topic modeling
  • Problem description: Build an ATE (Automatic Term Extraction) model for automatic extraction of phrases that are terms of the subject area from the texts of scientific articles. It is proposed to use effective collocation detection methods (TopMine or more modern ones) and topic models to determine the "topicality" of a phrase. The model must be trained without a teacher (unsupervised).
  • Data: Collection of scientific articles in the field of machine learning. Marked up articles with highlighted terms for evaluating models.
  • Literature:
    1. El-Kishky A., Song Y., Wang C., Voss C. R., Han J. Scalable topical phrase mining from text corpora // Proc. VLDB Endowment. 2014. Vol. 8, no. 3. Pp. 305-316.
    2. Vorontsov K. V. "Probabilistic thematic modeling: theory, models, algorithms and the BigARTM project" (http://www.machinelearning.ru/wiki/images/d/d5/Voron17survey-artm.pdf)
    3. Nikolay Shatalov. Unsupervised learning methods for automatically highlighting compound terms in text collections. 2019. VMK MSU.
    4. Vladimir Polushin. Topic models for ranking text content recommendations. 2017. VMK MSU.
    5. Hanh Thi Hong Tran, Matej Martinc, Jaya Caporusso, Antoine Doucet, Senja Pollak. The Recent Advances in Automatic Term Extraction: A survey. 2023. https://arxiv.org/abs/2301.06767
  • Base algorithm: TopMine collocation search method; BigARTM topic modeling library; modern methods based on neural network language models.
  • Solution: Application of the TopMine collocation search algorithm followed by filtering by topic. Selection of topic model hyperparameters and a topicality criterion. Comparison of this approach with modern methods based on neural network language models.
  • Novelty: Previous studies of the proposed approach have shown good results both in terms of recall and computational efficiency. However, they have not yet been compared with neural network models.
  • Authors: Polina Potapova, Vorontsov K.V.
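
The collocation step can be illustrated with a simplified significance score for adjacent word pairs, in the spirit of the score used in TopMine (reference 1): the observed count of a pair is compared with the count expected if the two words were independent. The full algorithm merges phrases iteratively and is not reproduced here; the function name is hypothetical.

```python
import math
from collections import Counter

def bigram_significance(tokens):
    """Score adjacent pairs by (observed - expected) / sqrt(observed)."""
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens[:-1], tokens[1:]))
    scores = {}
    for (a, b), observed in bigrams.items():
        # Count expected if a and b occurred independently of each other.
        expected = unigrams[a] * unigrams[b] / n
        scores[(a, b)] = (observed - expected) / math.sqrt(observed)
    return scores
```

Pairs with a high score are candidate terms, which can then be filtered by how concentrated they are in a few topics of the topic model.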

Problem 131

  • Title: Iterative improvement of the topic model with user feedback
  • Problem description: Topic modeling is widely used in socio-humanitarian research to understand the thematic structure of large text collections. A typical use case involves the user rating topics as relevant, irrelevant, or junk. If the number of junk topics is too large, the user tries to build another model. The problem is to use the user's labels at each such rebuild so that relevant topics are preserved, new relevant topics are separated from irrelevant and junk topics where possible, and as few junk topics as possible remain.
  • Data: Any collection of natural language texts about which the thematic structure is known (about how many topics, how many documents on different topics) is suitable as data. For example, you can take a collection of Lenta news, a Wikipedia dump, posts from Habrahabr, 20 Newsgroups, Reuters, articles from PostNauka. The subject of the collection should be of interest to the researcher himself, so that there is motivation to evaluate topics manually.
  • Literature:
    1. Vorontsov K. V. "Probabilistic thematic modeling: theory, models, algorithms and the BigARTM project" (http://www.machinelearning.ru/wiki/images/d/d5/Voron17survey-artm.pdf ).
    2. Alekseev V. et al. "TopicBank: Collection of coherent topics using multiple model training with their further use for topic model validation" (https://www.sciencedirect.com/science/article/pii/S0169023X21000483).
  • Solution: Using the BigARTM topic modeling library. Use of smoothing and decorrelation regularizers. Development of initialization methods for rebuilding topic models. Finding a ready-made tool or developing a simple, fast, convenient way to view and label topics.
  • Novelty: The problem of non-uniqueness and instability of models still does not have a final solution in probabilistic thematic modeling. The proposed study is an important step towards building models with the maximum number of interpretable topics that are meaningfully useful from the point of view of humanitarian research.
  • Authors: Vasily Alekseev, Vorontsov K. V.

Problem 132

  • Title: Ranking of scientific articles for semi-automatic summarization
  • Problem description: Build a ranking model that takes a selection of texts of scientific articles as input and outputs the sequence of their mention in the abstract.
  • Data: Overview sections (for example, Introduction and Related Work) of articles from the S2ORC collection (81.1M English-language articles) are used as the training sample. An object of the training set is the sequence of references to articles from the bibliography mentioned in the review sections. For each document there is a set of metadata: year of publication, journal, number of citations, number of citations of the author, etc. The abstract and, possibly, the full text of the article are also available. Kendall's rank correlation coefficient is used as the metric.
  • Literature:
    1. Kryzhanovskaya S. Yu. "Technology of semi-automatic summarization of thematic collections of scientific articles".
    2. Vlasov A. V. "Methods of semi-automatic summarization of collections of scientific articles".
    3. Kryzhanovskaya S. Yu., Vorontsov K. V "Technology for semi-automatic summarization of thematic collections of scientific articles" (http://www.machinelearning.ru/wiki/images/f/ff/Idp22.pdf, p. 371), S2ORC: The Semantic Scholar Open Research Corpus.
  • Base algorithm: Pair-wise ranking methods. Gradient boosting.
  • Solution: The simplest solution is to rank the articles in chronological order, according to the year they were published. To solve the problem, it is proposed to build a ranking model based on gradient boosting. As features, you can use the year of publication, the citation count of the article, the citation count of its authors, the semantic proximity of the publication to the review, to its local context, etc.
  • Novelty: The problem is the first step for semi-automatic summarization of thematic collections of scientific publications (machine aided human summarization, MAHS). After the outline of the abstract is built, the system generates prompt phrases for each article, from which the user selects phrases to continue the abstract.
  • Authors: Kryzhanovskaya Svetlana, Vorontsov K. V.
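
The chronological baseline and the Kendall metric can be sketched in a few lines. The sketch assumes rankings without ties and uses hypothetical paper identifiers; for real data with ties a library implementation of Kendall's tau-b would be more appropriate.

```python
from itertools import combinations

def kendall_tau(order_true, order_pred):
    """Kendall rank correlation between two orderings of the same items (no ties)."""
    pos_true = {item: i for i, item in enumerate(order_true)}
    pos_pred = {item: i for i, item in enumerate(order_pred)}
    concordant = discordant = 0
    for a, b in combinations(order_true, 2):
        # A pair is concordant if both orderings place a and b the same way round.
        if (pos_true[a] - pos_true[b]) * (pos_pred[a] - pos_pred[b]) > 0:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)

def chronological_baseline(papers):
    """papers: list of (paper_id, year); rank by year of publication."""
    return [pid for pid, _ in sorted(papers, key=lambda p: p[1])]
```

A ranking model is then useful only if its `kendall_tau` against the order of mention in the review exceeds that of `chronological_baseline`.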

Problem 133

  • Title: Diffusion models in the problem of generating the structure of a molecule with optimal energy
  • Problem description: For an organic small molecule (the number of atoms is less than 100), knowing only the topology of the molecular graph is not enough to obtain the spatial structure. A molecule can have many possible configurations (conformers), each of which corresponds to a local minimum of the potential. In practice, of greatest interest are the most stable conformers, which have the lowest energy. Recent studies show the success of the application of diffusion models for the generation of molecular structures. This approach shows advanced results in the problem of generating molecules and their conformers for a small number of heavy atoms (QM9 dataset up to 9 heavy atoms in a molecule), as well as in assessing the binding of a molecule and a protein. It is proposed to build a model for the generation of conformers with minimum energy for larger molecules.
  • Data: Base dataset QM9
  • Literature:
    1. Different theoretical approaches to the diffusion model: https://arxiv.org/abs/2011.13456
    2. Diffusion in molecular generation: https://arxiv.org/abs/2203.17003
    3. Diffusion in the problem of binding a protein and a molecule: https://arxiv.org/abs/2210.01776
    4. Diffusion in the problem of conformer generation: https://arxiv.org/abs/2203.02923
    5. Tutorial on equivariant neural networks: https://arxiv.org/abs/2207.09453
  • Base algorithm: GeoDiff[4].
  • Solution: Implement conformer generation similar to DiffDock[3] for QM9 dataset. Check the performance of the model for larger molecules.
  • Novelty: The novelty of the work lies in the design of a model for generating large conformers, which is of great practical importance.
  • Author: Philip Nikitin

Problem 134

  • Title: Combining distillation of models and data
  • Problem description: Knowledge distillation is the transfer of knowledge from a more meaningful representation to a compact, concise representation. There are two kinds of knowledge distillation. The first is the distillation of models. In this case, the large model transfers (distills) its knowledge to the small model. The second is data distillation. In this case, a minimal dataset is created such that a model trained on it achieves a quality comparable to training on the full sample. At the moment, there is no solution that implements simultaneous distillation of the model and the data. Therefore, the goal of the problem is to propose a basic solution for combined distillation and compare it with approaches to model distillation and data distillation.
  • Data: MNIST handwritten digit dataset, CIFAR-10 image dataset
  • Literature:
    1. A collection of various papers on the distillation of data.
    2. Review on methods of distillation models.
    3. Basic knowledge distillation solution.
    4. Basic solution for model distillation.
  • Base algorithm: Basic model distillation solution: Hinton distillation. Basic dataset distillation solution: Dataset Distillation by Matching Training Trajectories.
  • Solution: It is proposed to implement data distillation as a basic algorithm. Then train a larger model on the data and distill it into a smaller model. Next, compare with the original model and the model trained on distilled data.
  • Novelty: The novelty of the work lies in the combination of two distillation approaches, which has not been implemented before
  • Authors: Andrey Filatov

Problem 135

  • Title: Proximity measures in self-supervised learning problems
  • Problem description: The idea of self-supervised learning is to solve an artificially selected problem to get useful representations of data without labels. One of the most popular approaches is contrastive learning, during which the model is trained to minimize the distance between representations of augmented copies of the same object. The purpose of the problem is to investigate the quality of the resulting representations depending on the choice of the proximity measure (similarity measure) used in training, and to propose our own version of distance measurement.
  • Data: CIFAR-100
  • Literature:
    1. Solution using squared Euclidean distance.
    2. Solution using cosine similarity.
    3. Solution based on the information principle.
  • Base algorithm: VICReg, Barlow Twins, SimSiam
  • Solution: One of the distance options that can be proposed is an analogue of the Wasserstein metric, which would allow taking into account the dependencies between features.
  • Novelty: Propose a new way to determine the measure of proximity, which would be theoretically justified / contributed to obtaining representations with given properties
  • Authors: Polina Barabanshchikova
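
The proximity measures under comparison can be written down explicitly. Squared Euclidean distance and cosine similarity are taken from the literature list above; the third function is only one possible reading of "an analogue of the Wasserstein metric", namely the closed-form squared 2-Wasserstein distance between two Gaussians fitted to batches of embeddings, and is an assumption of this sketch.

```python
import numpy as np

def squared_euclidean(a, b):
    """Squared Euclidean distance between two embedding vectors."""
    return float(np.sum((a - b) ** 2))

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def psd_sqrt(m):
    """Square root of a symmetric positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(m)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

def gaussian_w2_squared(mu1, cov1, mu2, cov2):
    """Squared 2-Wasserstein distance between N(mu1, cov1) and N(mu2, cov2)."""
    root2 = psd_sqrt(cov2)
    cross = psd_sqrt(root2 @ cov1 @ root2)
    return float(np.sum((mu1 - mu2) ** 2) + np.trace(cov1 + cov2 - 2.0 * cross))
```

Unlike the first two measures, the Gaussian distance depends on the full covariance matrices, so correlations between features change its value.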

Problem 136

  • Title: Stochastic Newton with Arbitrary Sampling
  • Problem description: We analyze second-order methods for solving the Empirical Risk Minimization problem min_{x in R^d} f(x), where f(x) = (1/n) sum_{i=1}^n f_i(x). Here x is the parameter vector of some machine learning model and f_i(x) is the loss function on the i-th training point (a_i, b_i). We aim to solve it using a Newton-type method that requires access to only one data point per iteration. We investigate different sampling strategies for the index i_k at iteration k. See description in PDF.
  • Data: It is proposed to use the open SVM library as a source of data for the experimental part of the work.
  • References:
    1. Stochastic Newton and Cubic Newton Methods with Simple Local Linear-Quadratic Rates
    2. Parallel coordinate descent methods for big data optimization
  • Base algorithm: As a base method it is proposed to use Algorithm 1 from the paper Stochastic Newton and Cubic Newton Methods with Simple Local Linear-Quadratic Rates.
  • Solution: It is proposed to adapt the existing sampling strategies from Parallel coordinate descent methods for big data optimization to this setting.
  • Novelty: In the literature of Second Order methods there are a few works on incremental methods. The idea is to analyze the existing method by applying different sampling strategies. It is known that the proper sampling strategies may improve the performance of a method.
  • Authors: Islamov Rustem, Vadim Strijov
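
A Newton-type method that touches one data point per iteration can be sketched as below. The update follows our reading of the Stochastic Newton scheme in reference 1 (per-point anchors w_i whose gradients and Hessians are refreshed one at a time) and should be checked against the paper; the function names, the ridge test problem and the uniform sampling rule are assumptions of the sketch.

```python
import numpy as np

def stochastic_newton(grad_i, hess_i, n, x0, n_iters, sample_index, seed=0):
    """Newton-type iteration that refreshes one data point per step."""
    rng = np.random.default_rng(seed)
    w = np.tile(np.asarray(x0, dtype=float), (n, 1))   # anchor point w_i per data point
    H = np.array([hess_i(i, w[i]) for i in range(n)])
    g = np.array([grad_i(i, w[i]) for i in range(n)])
    x = w[0].copy()
    for _ in range(n_iters):
        rhs = np.mean([H[i] @ w[i] - g[i] for i in range(n)], axis=0)
        x = np.linalg.solve(H.mean(axis=0), rhs)       # Newton-type step
        i = sample_index(rng, n)                       # sampling strategy under study
        w[i], H[i], g[i] = x, hess_i(i, x), grad_i(i, x)
    return x

def uniform_sampling(rng, n):
    """Baseline strategy: pick the index uniformly at random."""
    return int(rng.integers(n))

def make_ridge_problem(A, b, lam):
    """f_i(x) = 0.5 * (a_i^T x - b_i)^2 + 0.5 * lam * ||x||^2."""
    d = A.shape[1]
    grad = lambda i, x: A[i] * (A[i] @ x - b[i]) + lam * x
    hess = lambda i, x: np.outer(A[i], A[i]) + lam * np.eye(d)
    return grad, hess
```

Other strategies (for example, sampling proportional to some per-point importance) can be plugged in through `sample_index` without changing the rest of the loop.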

Problem 139

  • Title: Distillation of models on multidomain samples
  • Problem description: The problem of reducing the complexity of the approximating model when transferring it to new data of smaller size is investigated.
  • Data: Samples MNIST, CIFAR-10, CIFAR-100, Amazon products.
  • Literature: Diploma thesis by Kamil Bayazitov
  • Base algorithm: The basic solution and experiments are presented in the thesis.
  • Authors: Grabovoi Andrey

Problem 140

  • Title: Tailoring the architecture of a performance-controlled deep learning model
  • Problem description: The problem of adapting the structure of a trained deep learning model to limited computing resources is considered. It is assumed that the resulting architecture (or several architectures) should work efficiently on several types of computing servers (for example, on different GPU models or different mobile devices). It is required to propose a model search method that allows controlling its complexity taking into account the target performance characteristics.
  • Data: MNIST, CIFAR
  • Literature:
    1. Grebenkova O.S., Bakhteev O.Y., Strijov V.V. Variational optimization of a deep learning model with complexity control // Informatics and its applications, 2021, 15(2). PDF
    2. Yakovlev K. D. et al. Neural Architecture Search with Structure Complexity Control //Recent Trends in Analysis of Images, Social Networks and Texts: 10th International Conference, AIST 2021, Tbilisi, Georgia, December 16–18, 2021, Revised Selected Papers. Cham: Springer International Publishing, 2022. - pp. 207-219.
    3. FBNet: choosing a model architecture based on target characteristics
  • Base algorithm: FBNet and random search of model substructure
  • Solution: The proposed method is to use a differentiable neural network architecture search algorithm (FBNet) with parameter complexity control using a hypernet. A hypernet is a model that generates the structure of the model depending on the input parameters. It is proposed to use the normalized running time of basic operations on target computing resources as hypernet parameters. Thus, the resulting model will allow adapting the architecture of the model for an arbitrary device.
  • Novelty: The proposed method allows you to control the complexity of the model in the process of searching for an architecture without additional heuristics.
  • Authors: Konstantin Yakovlev, Bakhteev Oleg
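
The role of per-operation running times in a differentiable architecture search can be illustrated with an expected-latency term. The sketch assumes a table of measured latencies per layer and candidate operation for one target device and an additive penalty with a hypothetical weight; FBNet itself combines the task loss and latency differently, and the hypernet that consumes the latency table is not shown.

```python
import numpy as np

def softmax(theta):
    """Row-wise softmax over candidate operations."""
    e = np.exp(theta - theta.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def expected_latency(theta, latency_table):
    """Sum over layers of the expected latency of the operation sampled from softmax(theta).

    theta, latency_table: arrays of shape (n_layers, n_ops).
    """
    return float(np.sum(softmax(theta) * latency_table))

def latency_aware_loss(task_loss, theta, latency_table, weight=0.1):
    """Task loss plus a penalty on the expected latency (assumed additive form)."""
    return task_loss + weight * expected_latency(theta, latency_table)
```

Because the latency table differs between devices, passing it (normalized) as an input lets one search procedure serve several target devices.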

2022

Results

Author | Topic | Links | Consultant | Letters
Pilkevich Anton | Existence conditions for hidden feedback loops in recommender systems | GitHub, LinkReview, Paper, Slides, Video, Video | Khritankov | AILB.P-X+R-B-H1CVO.T-EM.H1WJSF
Vladimirov Eduard | Restoration of the trajectory of hand movement from video | GitHub, LinkReview, Paper, Slides | Isachenko | (B.O.H1M)ALI+PXRBС+V+TED?
Petrushina Ksenia | Anti-Distillation: Knowledge Transfer from Simple Model to a Complex One | GitHub, LinkReview, Paper, Slides | Grabovoi | (B.O.H1M)ALIPXRBСVTED
Kornilov Nikita | Winterstorm risk prediction via machine learning methods | GitHub, LinkReview, Paper, Slides | Yuri Maksimov | (B.O.H1M?)ALIPXRBСV+TE0D
Aliyev Alen | Geometric Deep Learning for Protein-Protein Binding Affinity Prediction | GitHub, LinkReview, Paper, Slides | Ilya Igashov | (B.O.H1M?)ALIPXRBСVTED?
Lukyanenko Ivan | Hail Prediction Using Graph Neural Networks | GitHub, [3], Paper, Slides | Yuri Maksimov | (B.O.H1M?)ALIPXRBСV+TED?
Gaponov Maxim | Choosing Interpretable Recurrent Deep Learning Models | GitHub, LinkReview, Paper, Slides | Bakhteev Oleg | (B.O.H1M)AL+IPXRBСVT???ED
Melnikov Igor | Stochastic Newton with Arbitrary Sampling | GitHub, LinkReview, Paper, Slides | Rustem Islamov | (B.O.H1M)ALIPXСRBVTED
Zmushko Philip | Continuous time when building a BCI neural interface | GitHub, LinkReview, Paper, Slides | Samokhina | (B.O.H1M)ALI0P0XR?BСVTE?D?
Tishchenko Evgeny | Cross-language duplicate search | GitHub, LinkReview, Paper, Slides | Konstantin Vorontsov | (B.O.H1M)ALIPXRB0СV0T?E?D?
Antyshev Tikhon | Compression for Federated Random Reshuffling | GitHub, LinkReview, Paper, Slides | Malinovsky | (B.O.H1_M?)ALI-PXRBСVT?
Pyzh Vladislav | Flood risk prediction via machine learning methods | GitHub, LinkReview, Paper, Online Draft, Slides | Yuri Maksimov | (B.O.H10M?)ALI0P0XRBСVT0ED?
Zharov Georgy | Forest fire risk assessments using machine learning methods | GitHub, LinkReview, Paper, Slides | Yuri Maksimov | (B.O.H1)ALIPX0R0B0С0V0T?E0D?
Muradov Timur | Choosing Interpretable Convolutional Deep Learning Models | GitHub, LinkReview, Paper, Slides | Bakhteev | (B.O.H1)ALI0P0XRBСV0T0E?D?
Pavlov Dmitry | Machine learning approach to startup success prediction | GitHub, Online Draft, Paper, Slides | Anton Moiseev, Yuri Ammosov | (B.O.H10M?)ALI?P?XRBСV?T0E0D0

Problem 100.2022 (group)

  • Title: Multi-model representation of dynamical systems
  • Problem description: A system described by attractors in several phase spaces is considered. Particular models are constructed that approximate measurements of the state of the system in each space. A matching multimodel is built. The parameters of the particular models are refined.
  • Data: Human motion video, accelerometer, gyroscope, electroencephalogram signals
  • Literature: Our work on accelerometers and BCI, dissertations by Motrenko, Isachenko, Grabovoi
  • Base algorithm: The particular models are neural networks; the multimodel is built with canonical correlation analysis and then distilled.
  • Solution: Generalize canonical correlation analysis and distillation to the case of an arbitrary number of models.
  • Novelty: Alignment space built for a set of heterogeneous models
  • Authors: A.V. Grabovoi, Strijov V.V.

Problem 90.2022

  • Title: Hand movement recovery from video
  • Problem description: A skeletal representation of a person's pose is restored from the video sequence. The trajectory of the movement of human limbs sets the initial phase space. The accelerometer signal from the limbs sets the target phase space. Build a model that connects the attractors of the trajectories of the source and target spaces.
  • Data: The initial sample is collected by the authors of the project. Parts of the sample are in the library examples.
  • Solution: The theoretical part is carried out by an extended team. Perform a theoretical study: show that canonical correlation analysis (and in particular the PLS, NNPLS, seq2seq and Neural ODE methods) is a special case of Sugihara's convergent cross mapping method.
  • Novelty: A reversible model has been introduced that maps the coordinates recovered from the video sequence into the accelerations of the mobile phone's accelerometer.
  • Authors: A.D. Kurdyukova, R.I. Isachenko, Strijov V.V.

Problem 91.2022

  • Title: Clustering human movement trajectories
  • Problem description: This paper analyzes periodic signals in time series to recognize human activity from a mobile accelerometer. Each point in the timeline corresponds to a segment of the historical time series. These segments form a phase trajectory in the phase space of human activity. The principal components of the segments of the phase trajectory are treated as feature descriptions of the point in the timeline. The paper introduces a new distance function between points in the new feature space. To reveal changes in the type of human activity, the paper proposes an algorithm that clusters points of the timeline using a pairwise distance matrix. The algorithm was tested on synthetic and real data; the real data were obtained from a mobile accelerometer.
  • Data: USC-HAD, new accelerometer samples
  • Literature: Grabovoy A.V., Strijov V.V. Quasi-periodic time series clustering for human activity recognition // Lobachevskii Journal of Mathematics, 2020, 41 : 333-339.
  • Base algorithm: Caterpillar (singular spectrum analysis, SSA)
  • Solution: Refine and extend Grabovoi's article from the Lobachevskii Journal of Mathematics
  • Novelty: Use Neural ODE to construct the phase trajectory and classify it
  • Authors: A.V. Grabovoi (ask!!), Strijov V.V.
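The feature construction described in this problem (segments of the series as points of a phase trajectory, then principal components of those segments) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the cited article: the function names, the segment length of 50 and the synthetic sine signal standing in for an accelerometer axis are all hypothetical.

```python
import numpy as np


def phase_trajectory(series, segment_length):
    """Stack overlapping segments of a 1-D series into a trajectory matrix.

    Row t holds series[t : t + segment_length], so each point of the
    timeline is described by a short piece of the signal.
    """
    series = np.asarray(series, dtype=float)
    n_points = len(series) - segment_length + 1
    if n_points < 1:
        raise ValueError("segment_length exceeds the series length")
    return np.stack([series[t:t + segment_length] for t in range(n_points)])


def principal_components(trajectory, n_components):
    """Project trajectory points onto their leading principal components."""
    centered = trajectory - trajectory.mean(axis=0)
    # Rows of vt are principal directions, sorted by singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# Synthetic quasi-periodic signal (an assumption made for illustration).
t = np.arange(400)
signal = np.sin(2 * np.pi * t / 25)
trajectory = phase_trajectory(signal, segment_length=50)
features = principal_components(trajectory, n_components=2)
```

Each row of `features` describes one point of the timeline; a pairwise distance matrix over these rows is what a clustering step would consume.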

Problem 97.2022

  • Title: Anti-distillation or teacher training: knowledge transfer from a simple model to a complex one
  • Problem description: The problem of adapting the model to a new sample with a large amount of information is considered. For adaptation, it is proposed to build a new model of greater complexity with further transfer of information from a simple model to it. When transferring information, it is necessary to take into account not only the quality of the forecast on the original sample, but also the adaptability of the new model to the new sample and the robustness of the solution obtained.
  • Data: MNIST handwritten digit sample, CIFAR-10 image sample
  • Literature: Original distillation problem statement: Hinton G. et al. Distilling the knowledge in a neural network //arXiv preprint arXiv:1503.02531
  • Base algorithm: It is proposed to increase the complexity of the model by adding parameters with constant values close to zero. This approach serves only as a baseline because it can reduce the robustness of the model and worsen its adaptability to a new sample.
  • Solution: It is proposed to consider several approaches to increasing the complexity of the model, both probabilistic (adding noise to new parameters, taking into account operational requirements) and algebraic (expanding the parametric space of the model, taking into account the requirements for robustness and the Lipschitz constant of the original model).
  • Novelty: A method that adapts an existing model to a more complex training sample without losing information.
  • Authors: Bakhteev, Grabovoi, Strijov V.V.

Problem 98.2022

  • Title: Deep learning model selection with expert model matching control
  • Problem description: The classification problem is considered. An expert model of low complexity is specified. It is required to build a deep learning model that gives a high forecast quality and behaves similarly to the expert model.
  • Data: Sociological samples, CIFAR image sample
  • Literature: Yakovlev Konstantin, Grebenkova Olga, Bakhteev Oleg, Strijov Vadim. Neural architecture search with structure complexity control // Communications in Computer and Information Science (Proceedings of the 10th International Conference on Analysis of Images, Social Networks and Texts), 2021
  • Base algorithm: building an expert model.
  • Solution: The proposed method uses hypernetworks and controls the consistency of the found model with the expert model. A hypernetwork is a deep learning model that generates the parameters of the target model.
  • Novelty: The proposed method makes it possible to take expert judgment into account during model selection and architecture search.
  • Authors: Grebenkova, Bakhteev, Strijov V.V.

Problem 99.2022

  • Title: Selection of interpretable convolutional deep learning models
  • Problem description: The problem of choosing an interpretable deep learning classification model is considered. Interpretability is understood as the ability of the model to: a) return the most significant features of an object for classification, b) determine clusters of objects that are similar from the point of view of the classifier.
  • Data: MNIST handwritten digit sample, CIFAR-10 image sample
  • Literature:
    1. Exact and Consistent Interpretation for Piecewise Linear Neural Networks: A Closed Form Solution
    2. "Why Should I Trust You?": Explaining the Predictions of Any Classifier
  • Base algorithm: The LIME algorithm (2) interprets the model by local approximation.
  • Solution: A solution based on the method described in (1) is proposed. That paper proposed a generalization of the multilayer perceptron model with a piecewise linear activation function. Such an activation function allows the classifier to be treated as locally linear for each sample object, without using approximation. It is proposed to generalize this approach to the main nonlinear functions used in convolutional neural networks: convolution, pooling and normalization functions.
  • Novelty: A new class of neural models that lend themselves to good interpretation.
  • Authors: Yakovlev, Bakhteev, Strijov V.V.
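The claim that a network with piecewise linear activations is exactly linear around each object can be checked directly. The sketch below is an illustration written for this page, not the implementation from either cited paper: it takes a small ReLU perceptron with random weights (an assumption) and extracts the matrix A and offset c for which the network output equals A @ x + c in the activation region of x.

```python
import numpy as np


def relu_mlp(x, weights, biases):
    """Forward pass of an MLP with ReLU on every hidden layer."""
    h = x
    for w, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(w @ h + b, 0.0)
    return weights[-1] @ h + biases[-1]


def local_linear_model(x, weights, biases):
    """Return (A, c) such that the network equals A @ z + c near x."""
    a_eff = np.eye(len(x))
    c_eff = np.zeros(len(x))
    h = x
    for w, b in zip(weights[:-1], biases[:-1]):
        pre = w @ h + b
        mask = (pre > 0).astype(float)      # ReLU units active at x
        # Zeroing the rows of inactive units gives the exact linear form
        # of this layer inside the current activation region.
        a_eff = (w * mask[:, None]) @ a_eff
        c_eff = mask * (w @ c_eff + b)
        h = np.maximum(pre, 0.0)
    return weights[-1] @ a_eff, weights[-1] @ c_eff + biases[-1]


rng = np.random.default_rng(0)
sizes = [4, 8, 8, 3]                        # hypothetical layer widths
weights = [rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal(m) for m in sizes[1:]]
x = rng.standard_normal(4)
A, c = local_linear_model(x, weights, biases)
```

Row j of `A` can be read as the per-feature contribution to class score j for this particular object, which is the kind of interpretation item a) of the problem statement asks for.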

Problem 01.2022

  • Title: Stochastic Newton with Arbitrary Sampling
  • Problem: We analyze second-order methods for the Empirical Risk Minimization problem min f(x) over x in R^d, where f is the average of the per-sample losses f_i. Here x is the parameter vector of a machine learning model and f_i(x) is the loss on the i-th training point (a_i, b_i). We want to solve it with a Newton-type method that requires access to only one data point per iteration. We investigate different strategies for sampling the index i_k at iteration k. See the description in the PDF.
  • Dataset: It is proposed to use the open SVM library as the data source for the experimental part of the work.
  • References:
    1. Stochastic Newton and Cubic Newton Methods with Simple Local Linear-Quadratic Rates
    2. Parallel coordinate descent methods for big data optimization
  • Base algorithm: As a base method it is proposed to use Algorithm 1 from the paper Stochastic Newton and Cubic Newton Methods with Simple Local Linear-Quadratic Rates.
  • Solution: It is proposed to adapt the existing sampling strategies from Parallel coordinate descent methods for big data optimization to this setting.
  • Novelty: In the literature on second-order methods there are few works on incremental methods. The idea is to analyze the existing method under different sampling strategies. It is known that a proper sampling strategy may improve the performance of a method.
  • Authors: Islamov Rustem, Vadim Strijov
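A Newton-type iteration that touches only one data point per step can be sketched as below. This is a simplified illustration written from the problem statement, not a verified transcription of Algorithm 1 from the cited paper: it keeps, for every point i, the Hessian and gradient evaluated at the last iterate w_i at which that point was sampled. The regularized logistic loss, the uniform sampler and all names are assumptions; `sample_index` is the hook where other sampling strategies would be plugged in.

```python
import numpy as np


def logistic_grad_hess(x, a, b, lam):
    """Gradient and Hessian of log(1 + exp(-b a^T x)) + lam/2 |x|^2, b in {-1, 1}."""
    s = 1.0 / (1.0 + np.exp(b * (a @ x)))        # sigmoid(-b a^T x)
    grad = -b * s * a + lam * x
    hess = s * (1.0 - s) * np.outer(a, a) + lam * np.eye(len(x))
    return grad, hess


def stochastic_newton(A, b, lam, n_iters, sample_index):
    n, d = A.shape
    x = np.zeros(d)
    # Memory per point: H_i(w_i) and the vector H_i(w_i) w_i - grad_i(w_i).
    hessians = np.empty((n, d, d))
    vectors = np.empty((n, d))
    for i in range(n):
        g, h = logistic_grad_hess(x, A[i], b[i], lam)
        hessians[i], vectors[i] = h, h @ x - g
    h_sum, v_sum = hessians.sum(axis=0), vectors.sum(axis=0)
    for k in range(n_iters):
        x = np.linalg.solve(h_sum, v_sum)        # Newton-type step
        i = sample_index(k)                      # the only point touched
        g, h = logistic_grad_hess(x, A[i], b[i], lam)
        new_vector = h @ x - g
        h_sum += h - hessians[i]
        v_sum += new_vector - vectors[i]
        hessians[i], vectors[i] = h, new_vector
    return np.linalg.solve(h_sum, v_sum)


rng = np.random.default_rng(0)
A = 0.5 * rng.standard_normal((30, 5))           # synthetic data
b = np.where(rng.random(30) < 0.5, -1.0, 1.0)


def uniform_sampler(k):
    """Uniform sampling of the index i_k; replace to test other strategies."""
    return int(rng.integers(30))


x_hat = stochastic_newton(A, b, lam=1.0, n_iters=1500, sample_index=uniform_sampler)
full_grad = np.mean(
    [logistic_grad_hess(x_hat, A[i], b[i], 1.0)[0] for i in range(30)], axis=0
)
```

Keeping running sums of the stored Hessians and vectors means each iteration evaluates the loss derivatives at a single data point, at the cost of O(n d^2) memory.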

Problem 107.2022

  • Title: Compression for Federated Random Reshuffling
  • Problem: We analyze first-order methods for the Empirical Risk Minimization problem min f(x) over x in R^d. Here x is the parameter vector of a machine learning model and f_i(x) is the loss on the i-th training point (a_i, b_i). We focus on the distributed setting of this problem. We are going to apply compression techniques to reduce the number of communicated bits and overcome the communication bottleneck. We also want to combine compression with server-side updates. We aim to generalize existing results and obtain improvements in both theory and practice.
  • Dataset: It is proposed to use the open SVM library as the data source for the experimental part of the work.
  • References:
    1. Federated Random Reshuffling with Compression and Variance Reduction
    2. Proximal and Federated Random Reshuffling
    3. Server-Side Stepsizes and Sampling Without Replacement Provably Help in Federated Optimization
  • Base algorithm: As a base method we use Algorithm 3 from Proximal and Federated Random Reshuffling.
  • Solution: It is proposed to combine the method with two stepsizes with compression operators.
  • Novelty: This would be the first method combining 4 popular federated learning techniques: local steps, compression, reshuffling of data and two stepsizes.
  • Authors: Grigory Malinovsky
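To make "compression operators" concrete, here is one common unbiased compressor, random-k sparsification: only k of the d coordinates are transmitted, rescaled by d/k so that the expectation equals the original vector. It is given as an illustrative assumption; the problem statement does not say which compressors the project uses, and the function name is hypothetical.

```python
import numpy as np


def rand_k(vector, k, rng):
    """Keep k random coordinates, rescaled by d/k so that E[C(x)] = x."""
    d = vector.size
    kept = rng.choice(d, size=k, replace=False)
    compressed = np.zeros(d)
    compressed[kept] = vector[kept] * (d / k)
    return compressed


rng = np.random.default_rng(0)
update = np.arange(1.0, 9.0)            # a client's model update, d = 8
message = rand_k(update, k=2, rng=rng)  # only 2 of 8 coordinates are sent

# Random reshuffling: each epoch visits the local data in a fresh random
# order, without replacement.
epoch_order = rng.permutation(10)
```

The price of sending fewer coordinates is extra variance in the received update, which is why such compressors are usually analyzed together with the stepsize choice.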

Problem 108.2022

  • Title: Distillation of knowledge using sample representation in the common latent space of models
  • Problem description: The problem of distillation is considered: the transfer of information from one or more teacher models to the student. A special case is considered in which the teachers have incomplete information about the sample, and each model has useful information only about some subset.
  • Data: CIFAR-10 image sample; MNIST handwritten digit sample
  • Literature:
    1. Hinton G. et al. Distilling the knowledge in a neural network //arXiv preprint arXiv:1503.02531. - 2015. - Vol. 2. - No. 7.
    2. Oki H. et al. Triplet Loss for Knowledge Distillation //2020 International Joint Conference on Neural Networks (IJCNN). - IEEE, 2020. - P. 1-7.
  • Base algorithm: Hinton distillation [1].
  • Solution: It is proposed to consider hidden representations of teachers and students obtained using dimensionality reduction algorithms. To align the model spaces, it is proposed to use the autoencoder model with triplet constraints (see, for example, [2]).
  • Novelty: The proposed method will allow the distillation of heterogeneous models, using information from several teachers.
  • Authors: Gorpinich, Bakhteev, Strijov V.V.
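The base algorithm, Hinton distillation [1], trains the student on a mixture of the usual cross-entropy with the labels and a cross-entropy with the teacher's temperature-softened outputs. A minimal sketch follows; the mixing weight `alpha`, the T^2 scaling of the soft term and the random logits are choices made for this illustration and should be checked against [1] before use.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)       # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def distillation_loss(student_logits, teacher_logits, labels, temperature, alpha):
    """alpha * CE(labels, student) + (1 - alpha) * T^2 * CE(teacher_T, student_T)."""
    n = len(labels)
    hard = -np.log(softmax(student_logits)[np.arange(n), labels]).mean()
    soft_targets = softmax(teacher_logits, temperature)
    log_student = np.log(softmax(student_logits, temperature))
    soft = -(soft_targets * log_student).sum(axis=-1).mean()
    return alpha * hard + (1.0 - alpha) * temperature ** 2 * soft


rng = np.random.default_rng(0)
teacher_logits = rng.standard_normal((5, 3))    # 5 objects, 3 classes
student_logits = rng.standard_normal((5, 3))
labels = np.array([0, 2, 1, 1, 0])
loss = distillation_loss(student_logits, teacher_logits, labels,
                         temperature=2.0, alpha=0.5)
```

With several teachers that each know only part of the sample, the soft term is the part that would have to change, for example by using each teacher only on the objects it has information about.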

Problem 93.2022

  • Title: Estimating the risk of forest fires using machine learning methods.
  • Problem description: Wildfire risk prediction based on climate variables (water/air temperature, atmospheric pressure) since 1991. Forecasting is carried out (a) in the short-term range (2-5 years; stationary time series) and (b) in the long-term range (up to 50 years; non-stationary time series). A feature of long-range forecasting is the (probable) significant change in the behavior of climate variables (CMIP5 scenarios). The key features of the problem are (1) the need for a sufficiently accurate prediction of extreme risk values (maxima of the time series), while the algorithm may make a significant number of errors in the region of small values of the series; and (2) the spatial structure of the data series.
  • Data:
    1. Google Earth Data - data on climate variables and landscape available via API (there is a jupyter notebook through which you can download data locally)
    2. CMIP5 climate scenarios (there is a jupyter notebook through which you can download data locally)
    3. Wildfire Risk Database
    4. Severe Weather Dataset
  • Literature:
    1. Daizong Ding, Mi Zhang, Xudong Pan, Min Yang, Xiangnan He. Modeling Extreme Events in Time Series Prediction. KDD-2019.
    2. Roman Kail, Alexey Zaytsev, Evgeny Burnaev. Recurrent Convolutional Neural Networks help to predict the location of Earthquakes.
    3. Nikolay Laptev, Jason Yosinski, Li Erran Li, Slawek Smyl. Time-series Extreme Event Forecasting with Neural Networks at Uber.
  • Base algorithm: (1) the method from article 1; (2) ST-LSTM
  • Solution: It is proposed to solve the problem in two steps. In the first step, Algorithm 1 (with the addition of a spatial component) restores the behavior of the time series, averaged over a certain range. Next, the discrepancy between the values of the series and the model is analyzed. Based on this, the noise distribution is restored and a probabilistic model is built for reaching a certain level of risk in a given territory in the required time range.
  • Novelty: (Geo)spatial time series prediction is an open area with great potential for theoretical and practical work. In particular, fire risk assessment is necessary for (1) predicting the probability of accidents (electric power industry, gas transport complex); (2) prioritization of fire prevention measures by region; (3) assessing the financial risks of companies operating in the region.
  • Authors: Yuri Maksimov, Alexey Zaitsev
  • Consultants: Yuri Maksimov, Alexey Zaitsev, Alexander Lukashevich.

Problem 94.2022

  • Title: Hail forecast using graph neural networks
  • Problem description: Hail risk prediction based on climate variables (water/air temperature, atmospheric pressure) since 1991. Forecasting is carried out (a) in the short-term range (2-5 years; stationary time series) and (b) in the long-term range (up to 50 years; non-stationary time series). A feature of long-range forecasting is the (probable) significant change in the behavior of climate variables (CMIP5 scenarios). The key features of the problem are (1) rare events: fewer than 700 cases of hail were recorded across Russia over the past 30 years; and (2) the spatial structure of the data series.
  • Data:
    1. Google Earth Data - data on climate variables and landscape available via API (there is a jupyter notebook through which you can download data locally)
    2. CMIP5 climate scenarios (there is a jupyter notebook through which you can download data locally)
    3. NOAA Storm Events Database
    4. European Severe Weather Database
    5. Severe Weather Dataset
  • Literature:
    1. Ayush, Kumar, et al. "Geography-aware self-supervised learning." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021.
    2. Cachay, Salva Rühling, et al. "Graph Neural Networks for Improved El Niño Forecasting." arXiv preprint arXiv:2012.01598 (2020). NeurIPS Clima Workshop.
    3. Cai, Lei, et al. "Structural temporal graph neural networks for anomaly detection in dynamic graphs." Proceedings of the 30th ACM International Conference on Information & Knowledge Management. 2021.
  • Base algorithm: Classification with extremely rare events; the most basic variant is logistic regression + SMOTE. It is proposed to take a combination of the algorithms from articles 2 and 3 as a basis.
  • Solution: It is expected that a combination of the algorithms from articles 2 and 3 can improve classification in such problems with exceptionally rare events. In addition, physical information is to be used to regularize the classifier (the combination of temperature and humidity factors at which hail is most likely).
  • Novelty: (Geo)spatial time series prediction is an open area with great potential for theoretical and practical work. In particular, hail risk assessment is necessary for (1) predicting the probability of damage (agriculture, animal husbandry); (2) assessment of insurance and financial risks.
  • Authors: Yuri Maksimov (point of contact), Alexey Zaitsev
  • Consultants: Yuri Maksimov (point of contact), Alexey Zaitsev, Alexander Bulkin.
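The SMOTE part of the baseline creates extra minority-class objects (here, hail events) by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below shows the idea in plain NumPy; it is not the reference implementation, and the feature matrix is random data assumed for illustration.

```python
import numpy as np


def smote(minority, n_synthetic, k_neighbors, rng):
    """Create synthetic minority samples by interpolating between neighbours.

    Requires k_neighbors < len(minority).
    """
    minority = np.asarray(minority, dtype=float)
    # Pairwise squared distances between minority samples.
    diff = minority[:, None, :] - minority[None, :, :]
    dist = (diff ** 2).sum(axis=-1)
    np.fill_diagonal(dist, np.inf)               # a point is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k_neighbors]
    base = rng.integers(len(minority), size=n_synthetic)
    picked = neighbours[base, rng.integers(k_neighbors, size=n_synthetic)]
    gap = rng.random((n_synthetic, 1))           # position along the segment
    return minority[base] + gap * (minority[picked] - minority[base])


rng = np.random.default_rng(0)
hail_events = rng.standard_normal((10, 2))       # hypothetical rare-class features
synthetic = smote(hail_events, n_synthetic=25, k_neighbors=3, rng=rng)
```

The synthetic rows would be appended to the training set before fitting the logistic regression. Oversampling should be applied to the training split only, so that validation still reflects the true event rate.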

Problem 95.2022

  • Title: Identification of the transmission rate and time-dependent noise for the stochastic SIER disease model with vital rates (Time-dependent parameter identification for a stochastic epidemic model)
  • Problem description: The problem is to find the optimal time-dependent parameters of the known stochastic SIER disease propagation model. The optimal parameters are those parameters of the stochastic equation under which the modeled sample of the virus spread rate in a limited population best matches the observed sample. It is proposed to use the local lagged adapted generalized method of moments (LLGMM), which is based on the generalized method of moments (GMM).
  • Data: Johns Hopkins University data on the growth of coronavirus cases is available from various sources. You can also download the data yourself from the link.
  • Literature:
    1. Anna Mummert, Olusegun M. Otunuga Parameter identification for a stochastic SEIRS epidemic model: case study influenza PDF
    2. David M. Drukker Understanding the generalized method of moments (GMM): A simple example LINK
  • Keywords: Compartment disease model, Stochastic disease model, Local lagged adapted generalized method of moments, Time-dependent transmission rate
  • Base algorithm: There are several options available online, for example the article by B. Tseytlin, Actually forecasting COVID-19 (LINK). The current program does not converge well because it always uses a fixed number of points for prediction.
  • Novelty: A new LLGMM method of moments that increases the accuracy of prediction. The basic idea of the method of moments is to use sample means in the moment conditions (moment functions, or simply moments) instead of mathematical expectations; by the law of large numbers, under sufficiently weak conditions the sample means converge asymptotically to the expectations. The generalized method of moments covers the situation where more moment conditions are available than there are estimated parameters. Since the number of moment conditions is then greater than the number of estimated parameters, this system of conditions does not have a unique solution. The method constructs moment conditions (moment functions), also called orthogonality conditions, in a more general form as some function of the model parameters and the data. The parameters are estimated by minimizing a certain positive quadratic form of the sample means of the moments (moment functions). The quadratic form is minimized iteratively to the required accuracy. If the model contains more than one parameter to be identified (which is our case), then the second and higher moments are used to construct the moment conditions. LLGMM defines time-dependent parameters by using a limited number of "points" of the data time series to form the moment conditions, rather than the entire series; hence the method is lagged. In addition, the number of time series elements used varies for each estimate over time; thus the method is local and adaptive.
  • Author: expert Vera Markasheva (Laboratory of Computational Bioinformatics of the Center for Systems Biology)
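The GMM idea described above (more moment conditions than parameters, estimation by minimizing a quadratic form of the sample moments) and the "local" idea of using only recent points can be shown on a toy example. The sketch is far simpler than LLGMM: it estimates the rate of an exponential law from two moment conditions by grid search, and the window length is fixed rather than adaptive. The distribution, the identity weight matrix and all names are assumptions made for illustration.

```python
import numpy as np


def sample_moments(theta, x):
    """Two moment conditions for an exponential law with rate theta."""
    return np.array([x.mean() - 1.0 / theta,
                     (x ** 2).mean() - 2.0 / theta ** 2])


def gmm_estimate(x, grid, weight=None):
    """Minimize g(theta)^T W g(theta) over a grid of candidate rates."""
    weight = np.eye(2) if weight is None else weight
    objective = []
    for theta in grid:
        g = sample_moments(theta, x)
        objective.append(g @ weight @ g)
    return grid[int(np.argmin(objective))]


def local_gmm(x, window, grid):
    """Re-estimate the rate from the last `window` points only."""
    return np.array([gmm_estimate(x[t - window:t], grid)
                     for t in range(window, len(x) + 1, window)])


rng = np.random.default_rng(0)
# The true rate changes from 2 to 5 halfway through the series.
series = np.concatenate([rng.exponential(1 / 2.0, 2000),
                         rng.exponential(1 / 5.0, 2000)])
grid = np.linspace(0.5, 10.0, 951)
rates = local_gmm(series, window=2000, grid=grid)
```

A single estimate over the whole series would blur the two regimes; estimating on windows of recent points is what lets the parameter depend on time.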

Problem 96.2022

  • Title: Impact of the lockdown on the dynamics of the epidemic
  • Problem description: The introduction of a lockdown is considered an effective measure to combat the epidemic. However, contrary to intuition, it turned out that under certain conditions, a lockdown can lead to an increase in the epidemic. This effect is absent for classical models “on average”, but was revealed when modeling the spread of the epidemic, taking into account the contact graph. The problem is to find formulaic and quantitative relationships between the parameters under which the lockdown can lead to an increase in the epidemic.
  • Data: Real data on the spread of the epidemic on contact graphs, especially considering the need for scenario analysis, is not available. The problem involves working with model and synthetic data: there are ready-made data, and it is also assumed that new ones can be generated in the process of solving the problem.
  • Authors: Anton Bishuk, A.V. Zuhba

Problem 102.2022

  • Title: Graph neural networks in the problem of regression of pairs of graphs
  • Problem description: The problem of regression on a pair of graphs is considered. In a pair, each vertex of one graph corresponds to a vertex of the second graph. It is required to find the optimal architecture of the graph neural network, taking into account the given correspondence between the vertices.
  • Data: It is suggested to use chemical reaction datasets (GitHub). For such a dataset, a pair of graphs is specified in a natural way: these are the graphs of the molecules of the initial substances and of the products of a chemical reaction.
  • Literature:
    1. DRACON: disconnected graph neural network for atom mapping in chemical reactions.
    2. Machine learning of reaction properties via learned representations of the condensed graph of reaction.
    3. A comprehensive survey on graph neural networks.
  • Base algorithm: The relationship between the graphs is set at the level of graph embeddings. That is, a separate embedding vector is built for each graph, and then these vectors are concatenated. In this case, information about the correspondence of vertices in the graphs is not explicitly used.
  • Novelty: Using a graph neural network architecture with fixed hyperparameters as an example, study, from both theoretical and practical points of view, ways to add information about the relationship between the graphs to a graph neural network.
  • Authors: Filipp Nikitin, Vadim Strijov, Alexander Isaev.
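The base algorithm can be illustrated with a toy model: each graph is embedded separately, and the two embeddings are concatenated and passed to a linear head. Everything below is an assumption made for illustration (one mean-aggregation layer, sum pooling, random weights, a three-vertex graph); it is not the architecture from the cited papers.

```python
import numpy as np


def graph_embedding(adjacency, features, weight):
    """One mean-aggregation message-passing layer followed by sum pooling."""
    a_hat = adjacency + np.eye(len(adjacency))           # add self-loops
    a_hat = a_hat / a_hat.sum(axis=1, keepdims=True)     # average over neighbours
    hidden = np.maximum(a_hat @ features @ weight, 0.0)
    return hidden.sum(axis=0)


def pair_regression(graph_a, graph_b, weight, readout):
    """Baseline: embed each graph separately, concatenate, apply a linear head."""
    z = np.concatenate([graph_embedding(*graph_a, weight),
                        graph_embedding(*graph_b, weight)])
    return float(readout @ z)


rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
reactant = (adj, rng.standard_normal((3, 4)))
product = (adj, rng.standard_normal((3, 4)))
weight = rng.standard_normal((4, 5))
readout = rng.standard_normal(10)

# Relabel the vertices of the second graph only.
perm = [2, 0, 1]
shuffled = (product[0][perm][:, perm], product[1][perm])
y = pair_regression(reactant, product, weight, readout)
y_shuffled = pair_regression(reactant, shuffled, weight, readout)
```

The two predictions coincide: relabeling the vertices of one graph does not change the output, so this baseline cannot use the vertex-to-vertex correspondence, which is the gap the problem sets out to study.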

Problem 103.2022

  • Requirement: Fluent English to collaborate, Python and PyTorch (medium level and higher), Git, Bash, Background in computational biology is a plus
  • Introduction: See full description here. Proteins are involved in several biological reactions by means of interactions with other proteins or with other molecules such as nucleic acids, carbohydrates, and ligands. Among these interaction types, protein–protein interactions (PPIs) are considered to be one of the key factors as they are involved in most of the cellular processes [1]. The binding of two proteins can be viewed as a reversible and rapid process in an equilibrium that is governed by the law of mass action. Binding affinity is the strength of the interaction between two (or more than two) molecules that bind reversibly (interact). It is translated into physico-chemical terms in the dissociation constant Kd, the latter being the concentration of free protein at which half of all binding sites of the second protein type are occupied [2].
  • Objectives: Three main objectives of this work can be formulated as follows: 1. Refine PDBbind [12] data and a standard binding affinity dataset [3], and compile a novel benchmark of PPIs with known binding affinity values. 2. Employ graph-learning toolset to predict binding affinities of PPIs from the new dataset. 3. Benchmark the resulting method against existing state-of-the-art approaches
  • Data & Metrics: In this work, we will operate on experimentally observed three-dimensional structures of protein-protein complexes annotated with binding affinity values. As the main regression metrics, we suggest considering Mean Squared Error (MSE), Mean Absolute Error (MAE) and Pearson correlation. The two main sources of data are the following:
    1. PDBbind dataset [12], which includes around 2k PPIs
    2. Standard dataset introduced in [3], which includes 144 PPIs
  • Novelty: To the best of our knowledge, geometric deep learning methods have never been applied to the protein-protein binding affinity prediction problem so far.
  • Authors: Arne Schneuing, Ilia Igashov
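The three suggested regression metrics are easy to compute side by side. A minimal NumPy sketch follows; the function name and the toy affinity values are hypothetical.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """MSE, MAE and Pearson correlation between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    dt, dp = y_true - y_true.mean(), y_pred - y_pred.mean()
    pearson = (dt @ dp) / np.sqrt((dt @ dt) * (dp @ dp))
    return {"mse": float((err ** 2).mean()),
            "mae": float(np.abs(err).mean()),
            "pearson": float(pearson)}


metrics = regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
```

MSE and MAE measure absolute error in the units of the target, while Pearson correlation only measures whether the predicted ranking of complexes follows the measured one linearly, so a model can score well on one and poorly on the others.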

Problem 109.2022

  • Title: Continuous time when building a BCI neural interface
  • Problem description: In signal decoding problems, data are represented as multivariate time series. A discrete representation of time is usually used when solving them. However, recent work on neural ordinary differential equations illustrates the ability to treat the hidden state of recurrent neural networks as the solution of a differential equation. This allows us to consider time series as continuous in time.
  • Data: For classification:
    1. P300 dataset, on which the article was written
    2. DEAP dataset, similar to it in record format (emotion recognition)
    3. SEED dataset, also for emotion classification
    4. Not EEG, but accelerometer data with activity/position classification
    5. For regression, you can take the same Neurotycho data if you want a somewhat harder problem than classification.
  • Literature:
    1. Neural Ordinary Differential Equations
    2. Neural controlled differential equations for irregular time series
    3. Latent ODEs for Irregularly-Sampled Time Series (?)
    4. GRU-ODE-Bayes: Continuous modeling of sporadically-observed time series (?)
    5. Neural Rough Differential Equations for Long Time Series (?)
    6. ODE2VAE: Deep generative second order ODEs with Bayesian neural networks (?)
    7. Go with the Flow: Adaptive Control for Neural ODEs
    8. Legendre Memory Units: Continuous-Time Representation in Recurrent Neural Networks
    9. My master's
  • Base algorithm: Alina Samokhina's algorithm
  • Solution: Use Neural ODE variations (Bayes, partial derivatives, etc.) to approximate the original signal. Carry out a comparative analysis of existing approaches to applying differential equations to EEG classification.
  • Novelty: A way to construct a continuous signal representation is proposed: working with the functional space of the signal rather than its discrete representation, and using the parameters of the resulting function as the feature space of the resulting model.
  • Authors: Alina Samokhina, Strijov V.V.
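Treating the hidden state as the solution of a differential equation dh/dt = f(h, t) means it can be evaluated at arbitrary, irregularly spaced times. The sketch below integrates such a state with a fixed-step Runge-Kutta scheme in NumPy. It shows only the forward pass, with random untrained weights; the dynamics function, the step count and all names are assumptions, and a real Neural ODE would train f through an ODE solver.

```python
import numpy as np


def rk4_step(f, h, t, dt):
    """One classical fourth-order Runge-Kutta step for dh/dt = f(h, t)."""
    k1 = f(h, t)
    k2 = f(h + 0.5 * dt * k1, t + 0.5 * dt)
    k3 = f(h + 0.5 * dt * k2, t + 0.5 * dt)
    k4 = f(h + dt * k3, t + dt)
    return h + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0


def hidden_state_at(f, h0, times, substeps=20):
    """Integrate from t = 0 and return h at each of the sorted `times`."""
    h, t, out = np.asarray(h0, dtype=float), 0.0, []
    for t_next in times:
        dt = (t_next - t) / substeps
        for _ in range(substeps):
            h = rk4_step(f, h, t, dt)
            t += dt
        out.append(h.copy())
    return np.stack(out)


rng = np.random.default_rng(0)
W = 0.5 * rng.standard_normal((3, 3))            # untrained dynamics weights


def dynamics(h, t):
    """A small neural vector field; time is unused in this toy example."""
    return np.tanh(W @ h)


irregular_times = np.array([0.3, 1.0, 2.5])      # no fixed sampling rate
states = hidden_state_at(dynamics, np.ones(3), irregular_times)
```

Because the state is defined for every t, signals recorded at different or irregular sampling rates can share one model, and the parameters of f can serve as features of the signal.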

Problem 104.2022

  • Title: (Clarification awaited) Cross-language duplicate search
  • Problem description: The problem of cross-language search for text plagiarism is set. The search for duplicates of the original text is carried out among texts in 100 different languages.
  • Data:
    1. A selection of scientific articles from the scientific electronic library eLIBRARY.ru, as well as articles from the Wikipedia online encyclopedia, is used as a training sample.
    2. The State Rubricator of Scientific and Technical Information (SRSTI), the Universal Decimal Classifier (UDC) are considered as scientific rubricators.
  • Metrics: The following are used as search quality metrics:
    1. average frequency: the frequency, averaged over the control languages, with which the query document falls into the top 10% of documents among which the search is carried out
    2. average percentage: the percentage of documents, averaged over the control languages, that are in the top 10% of translation documents and have the same scientific heading as the query document
  • Literature: Vorontsov K. V. Probabilistic thematic modeling: review of models and additive regularization PDF
  • Base algorithm:
    1. Hierarchical topic models
    2. Topic models with one-pass document vectorization
  • Solution: To solve the search problem, a multimodal thematic model was built. 100 languages were used as modalities, as well as scientific headings, which included articles from the training data. A series of experiments was carried out to improve search quality metrics, including: selection of the optimal tokenization method, addition of regularizers, selection of thematic vector comparison functions, ranking functions, etc.
  • Novelty: Most systems for finding documents in large collections are based on some form of vectorization of the documents in the collection and of the query document. The latest ways to vectorize documents are usually limited to one language. This raises the problem of creating a uniform system for obtaining vector embeddings of a multilingual collection of documents. The proposed approach makes it possible to train a topic model that encodes information about the distribution of words in a text regardless of their language. In addition, the solution must satisfy constraints on model size and training time so that the described model can be used in practice.
  • Authors: Polina Potapova, Konstantin Vorontsov
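The first search quality metric (how often the true duplicate lands in the top 10% of the ranked collection) can be computed for one language as sketched below; averaging the result over the control languages gives the "average frequency". Cosine similarity between document vectors is assumed as the ranking function, which is only one of the comparison functions the project experiments with, and the vectors here are random stand-ins for topic-model embeddings.

```python
import numpy as np


def top_fraction_hit_rate(query_vectors, collection_vectors, true_index, fraction=0.1):
    """Share of queries whose true duplicate ranks within the top `fraction`."""
    q = query_vectors / np.linalg.norm(query_vectors, axis=1, keepdims=True)
    c = collection_vectors / np.linalg.norm(collection_vectors, axis=1, keepdims=True)
    similarity = q @ c.T                                  # cosine similarity
    true_sim = similarity[np.arange(len(q)), true_index]
    # Rank of the true duplicate = number of documents scored strictly higher.
    rank = (similarity > true_sim[:, None]).sum(axis=1)
    cutoff = max(1, int(np.ceil(fraction * c.shape[0])))
    return float((rank < cutoff).mean())


rng = np.random.default_rng(0)
collection = rng.standard_normal((50, 16))                # hypothetical topic vectors
true_index = np.array([3, 7, 11])
# Queries are slightly perturbed copies of their true duplicates.
queries = collection[true_index] + 0.01 * rng.standard_normal((3, 16))
hit_rate = top_fraction_hit_rate(queries, collection, true_index)
```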

Problem 52.2022

  • Title: (pending clarification) Predicting the quality of protein models using spherical convolutions on 3D graphs.
  • Problem description: The purpose of this work is to create and study a new convolution operation on three-dimensional graphs as part of solving the problem of assessing the quality of three-dimensional protein models (a regression problem on graph nodes).
  • Data: Models generated by CASP contestants are used.
  • Literature:
    1. Problem details.
    2. Relational inductive biases, deep learning, and graph networks.
    3. Geometric deep learning: going beyond euclidean data.
  • Base algorithm: As a base algorithm, we will use a neural network based on the graph convolution method, which is generally described in [4].
  • Solution: The presence of a peptide chain in proteins makes it possible to uniquely define local coordinate systems for all graph nodes, which in turn makes it possible to create and apply spherical filters regardless of the graph topology.
  • Novelty: In general, graphs are irregular structures, and in many graph learning problems the sample objects do not share a single topology. Therefore, the existing convolution operations on graphs are greatly simplified or do not generalize to different topologies. In this paper, we propose to consider a new method for constructing a convolution operation on three-dimensional graphs for which local coordinate systems associated with each node can be chosen uniquely.
  • Author: Sergey Grudinin

Problem 110.2022 (technical)

  • Title: Detection of defects on the car body
  • Subproblems: Classification of cars by type and brand, classification of car parts (door, hood, roof, etc.), segmentation of defective areas on different parts of the car, classification of defects by type (dent, scratch, glass damage), assessment of the degree of damage
  • Data:
    1. Coco Car Damage Detection Dataset - 70 photos of damaged cars with bounding boxes, semantic masks and damage type (headlight, front bumper, hood, door, rear bumper)
    2. Car_damage - 920 photos of damaged cars with labeled masks
    3. CarDent-Detection-Assessment - 100 photos of damaged cars with labeled masks
    4. CarAccidentDataset - 52 photos of damaged cars with labeled masks
    5. Car damage detection - 950 photos of damaged and 1150 photos of whole cars
    6. Car Damage - 1512 photos of damaged cars. Labeled to classify the type of damage
    7. Cars Dataset - 16185 photos of whole cars, 196 models. Images from different angles, with labels and bounding boxes of car parts for matching the angles.
  • Author: Andrey Inyakin

Problem 111.2022 (technical)

  • Title: Recognition of named entities in informational Russian-language news
  • Subproblems: Estimating the accuracy of available NER models (up to 2 weeks for data collection and markup)
  • Base algorithm: Development of an algorithm for saturation (augmentation) of the training sample with rare named entities
  • Data: To solve the problem, datasets of news from Interfax with the markup of named entities will be prepared.

2021

Author | Topic | Links | Consultant | Letters | Reviewer
Grebenkova Olga | Variational optimization of deep learning models with model complexity control | LinkReview, GitHub, Paper, Slides, Video | Oleg Bakhteev | AILP+UXBR+HCV+TEDWSS | Shokorov Vyacheslav, Review
Pilkevich Anton | Existence conditions for hidden feedback loops in recommender systems | GitHub, LinkReview, Paper, Slides, Video | Khritankov Anton | AILB*P-X+R-B-H1CVO*T-EM*H1WJSF | Gorpinich Maria, Review
Antonina Kurdyukova | Determining the phase and disorder of human movement based on the signals of wearable devices | LinkReview, GitHub, Paper, Slides, Video | Georgy Kormakov | AILB*PXBRH1CVO*TEM*WJSF | Pilkevich Anton, Review
Yakovlev Konstantin | A differentiable search algorithm for model architecture with control over its complexity | LinkReview, GitHub, Paper, Slides, Video | Grebenkova Olga | AILB*PXBRH1CVO*TEM*WJSF | Pyrau Vitaly, Review
Gorpinich Maria | Trajectory Regularization of Deep Learning Model Parameters Optimization Based on Knowledge Distillation | LinkReview, GitHub, Paper, Slides, Video | Oleg Bakhteev | AILB*P+XBRC+VH1O*TEM*WJSF | Kulakov Yaroslav, Review
Alexandr Tolmachev | Analysis of the QPFS Feature Selection Method for Generalized Linear Models | LinkReview, GitHub, Paper, Slides, Video | Aduenko Alexander | AILB*PXB-R-H1CVO*TEM*WJSF | Antonina Kurdyukova, Review
Kulakov Yaroslav | BCI: Selection of consistent models for building a neural interface | LinkReview, GitHub, Paper, Slides, Video | Isachenko Roman | AILB*PXBRH1CVO*TEM*WJ0SF | Zverev Egor, Review
Pyrau Vitaly | Experimental comparison of several problems of operational planning of biochemical production. | LinkReview, GitHub, Paper, Slides, Video | Trenin Sergey Alekseevich | AILB*PXBRH1CVO*TEM*WJSF | Yakovlev Konstantin, Review
Bazhenov Andrey | Search for the boundaries of the iris by the method of circular projections | LinkReview, GitHub, Paper, Slides, Video | Matveev Ivan Alekseevich | AILB*PXB0RH1CVO*TEM*WJ0SF |
Zverev Egor | Learning co-evolution information with natural language processing for protein folding problem | LinkReview, GitHub, Paper, Slides, Video | Ilya Igashov | AILB*PXBRH1CVO*TEM*WJSF | Alexandr Tolmachev, Review

Gorchakov Vyacheslav Importance Sampling for Chance Constrained Optimization LinkReview

Github Paper Video

Yuri Maksimov AILB*PX0B0R0H1C0V0O*0T0E0M*0W0JS0F Bazhenov Andrey

Review

Lindemann Nikita Training with an expert for a sample with many domains LinkReview

Github Paper Slides

Andrey Grabovoi AILPXBRH1C0V0O*TE0M*0W0J0SF0

Problem 74.2021

  • Title: Existence conditions for hidden feedback loops in recommender systems
  • Problem description: Recommender systems are known to unintentionally narrow the user's choice as the model adapts to the user's preferences (the echo chamber / filter bubble effect). This effect is a special case of hidden feedback loops (see Analysis H.F.L.). It arises because the algorithm maximizes its quality by recommending the same objects of interest to the user. The problems are a) lack of variety and b) saturation / volatility of the user's interests.
  • Problem description: The algorithm does not know the user's interests, and the user is not always honest in his choice. Under what conditions, and for what properties of the learning algorithm and of the dishonesty (deviation of the user's choice from his interests), will the indicated effect be observed? Clarification: the recommendation algorithm gives the user a set of objects a_t to choose from. The user's choice c_t is drawn from a Bernoulli distribution governed by the interest model mu(a_t). Based on the user's choice, the algorithm changes its internal state w_t and gives the user the next set of objects. On an infinite horizon, the total reward sum c_t must be maximized. Find the conditions for the existence of unlimited growth of user interest in the proposed objects in a recommender system with the Thompson Sampling (TS) MAB algorithm under noisy user choice c_t. Without noise, it is known that unlimited growth always occurs (in the model) [1].
  • Data: Created as part of the experiment (simulation model) by analogy with the article [1]; external data is not required.
  • References:
    1. Jiang, R., Chiappa, S., Lattimore, T., György, A. and Kohli, P., 2019, January. Degenerate feedback loops in recommender systems. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society (pp. 383-390).
    2. Khritankov, A. (2021). Hidden Feedback Loops in Machine Learning Systems: A Simulation Model and Preliminary Results. In International Conference on Software Quality (pp. 54-65). Springer, Cham.
    3. Khritankov A. (2021). Hidden feedback loop experiment demo. https://github.com/prog-autom/hidden-demo
  • Base algorithm: The initial mathematical model of the phenomenon under study is described in the article [1]. The method of experimental research is in the article [2]. The base source code is available at [3]
  • Solution: It is necessary to derive conditions for the existence of positive feedback for the Thompson Sampling multi-armed bandit algorithm based on the known theoretical properties of this algorithm, and then check them in the simulation model. For verification, a series of experiments is performed that explores parameter ranges and estimates the error (variance) of the simulation. The results are compared with the previously constructed mathematical model of the effect. There is an implementation of the experiment system that can be improved for this problem.
  • Novelty: The studied positive feedback effect is observed in real and model systems and is described in many publications as an undesirable phenomenon. A model of it exists for the restricted case of no noise in the user's actions, which does not occur in practice. Under the proposed conditions, the problem has not previously been posed or solved for recommender systems. For the regression problem, the solution is known.
  • Authors: Expert, consultant Anton Khritankov
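
The interaction loop described above can be sketched as a small simulation. This is only an illustrative sketch under simplifying assumptions that are not part of the problem statement: one object is recommended per step, user interests mu(a) are fixed, and "noise" is modeled as a random flip of the observed choice c_t. The function name and parameters are hypothetical and are not taken from the repository [3].

```python
import random

def thompson_sampling(true_interest, horizon, noise=0.0, seed=0):
    """Beta-Bernoulli Thompson Sampling with a noisy user response.

    true_interest: probability mu(a) that the user accepts object a.
    noise: probability that the observed choice c_t is flipped
           (an assumed model of the user's "dishonesty").
    Returns the total reward and the number of times each object was shown.
    """
    rng = random.Random(seed)
    n_arms = len(true_interest)
    alpha = [1.0] * n_arms  # Beta posterior parameter: successes + 1
    beta = [1.0] * n_arms   # Beta posterior parameter: failures + 1
    pulls = [0] * n_arms
    total_reward = 0
    for _ in range(horizon):
        # Draw a plausible interest value per object, recommend the best one.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        arm = max(range(n_arms), key=lambda a: samples[a])
        # c_t ~ Bernoulli(mu(a_t)), then flipped with probability `noise`.
        c = 1 if rng.random() < true_interest[arm] else 0
        if rng.random() < noise:
            c = 1 - c
        # Update the internal state w_t (here: the Beta posterior).
        alpha[arm] += c
        beta[arm] += 1 - c
        pulls[arm] += 1
        total_reward += c
    return total_reward, pulls
```

Running the function for several values of `noise` and comparing how strongly `pulls` concentrates on one object gives a first, informal look at the feedback effect; the actual experiment would also need a model of how interests change over time.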

Problem 77.2021

  • Title: Determining the phase and disorder of human movement by signals from wearable devices
  • Problem description: A wide class of periodic movements of a person or an animal is investigated. It is required to find the beginning and end of a movement and to understand when one type of movement ends and another begins. For this, the problem of time series segmentation is solved. The phase trajectory of one movement is constructed and its actual dimension is found. The purpose of the work is to describe a method for finding the minimum dimension of the phase space and to segment the periodic actions of a person by the repetition of the phase. It is also necessary to propose a method for extracting the zero phase in a given space for a specific action. Bonus: find the discord in the phase trajectory and indicate the change in the type of movement. Bonus 2: do this for different phone positions by proposing invariant transformation models.
  • Data: The data consists of time series read from a three-axis accelerometer with an explicit periodic class (walking, running, walking up and down stairs, etc.). It is possible to collect your own data from a mobile device, or to take model data from the UCI HAR dataset.
  • References:
    1. A. P. Motrenko, V. V. Strijov. Extracting fundamental periods to segment biomedical signals // Journal of Biomedical and Health Informatics, 2015, 20(6). P. 1466–1476. Time series segmentation with periodic actions: The segmentation problem was solved using a fixed-dimensional phase space. PDF URL
    2. A.D. Ignatov, V. V. Strijov. Human activity recognition using quasi-periodic time series collected from a single triaxial accelerometer // Multimedia Tools and Applications, 2015, P. 1–14. Classification of human activity using time series segmentation: classifiers were studied on the resulting segments. PDF URL
    3. Grabovoy, A.V., Strijov, V.V. Quasi-Periodic Time Series Clustering for Human Activity Recognition. Lobachevskii J Math 41, 333–339 (2020). Segmentation of time series into quasi-periodic segments: Segmentation methods were explored using principal component analysis and transition to phase space. Text Slides DOI
  • Base algorithm: The basic algorithm is described in works 1 and 3; the code is here, and the code for work 3 is available from its author.
  • Solution: It is proposed to consider various dimensionality reduction algorithms and compare different spaces in which the phase trajectory is constructed. Develop an algorithm for finding the minimum dimension of the phase space in which the phase trajectory has no self-intersections up to the standard deviation of the reconstructed trajectory.
  • Novelty: In Motrenko's article, the dimension of the phase space is fixed at two. This shortcoming must be corrected: the phase trajectory must not intersect itself. It would also be valuable to distinguish one type of movement from another within one period (for example, to detect a switch from running to walking within one and a half steps).
  • Authors: consultants Kormakov G.V., Tikhonov D.M.; Expert Strijov V.V.
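
A phase trajectory of the kind discussed in this problem can be built with a time-delay embedding. The sketch below is a minimal illustration on a synthetic sine wave, not the algorithm from works 1 and 3; the function names, the embedding dimension and the lag are assumptions chosen for the example.

```python
import math

def delay_embedding(series, dim, lag):
    """Map a scalar series to points (x[t], x[t+lag], ..., x[t+(dim-1)*lag])."""
    n_points = len(series) - (dim - 1) * lag
    if n_points <= 0:
        raise ValueError("series is too short for this dim and lag")
    return [tuple(series[t + k * lag] for k in range(dim))
            for t in range(n_points)]

def phase_angles(trajectory):
    """Phase of each point of a two-dimensional trajectory, in (-pi, pi]."""
    return [math.atan2(p[0], p[1]) for p in trajectory]

# Synthetic "accelerometer" signal with a period of 40 samples.
series = [math.sin(2.0 * math.pi * t / 40.0) for t in range(200)]
# A lag of a quarter period turns the sine into a circle (sin, cos).
trajectory = delay_embedding(series, dim=2, lag=10)
phases = phase_angles(trajectory)
```

For a clean sine the two-dimensional trajectory is a circle without self-intersections, and the zero phase is the point where `phases` crosses zero. Real accelerometer signals generally need a higher `dim`, which is exactly the quantity the problem asks to minimize.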

Problem 78. 2021

  • Title: Importance Sampling for Scenario Approximation of Chance Constrained Optimization
  • Problem description: Optimization problems with probabilistic constraints are often encountered in engineering practice. An example is the problem of minimizing energy generation in power networks with (randomly fluctuating) renewable energy sources. In this case, safety restrictions must be satisfied: voltages at generators and consumers, as well as currents on the lines, must stay below certain thresholds. However, even in the simplest situations, the problem cannot be solved exactly. The best-known approach is chance constrained optimization, which often gives a good approximation. An alternative approach is to sample the network operation modes and solve a classification problem on the resulting data set: separating bad modes from good ones with a given type II error. However, a sufficiently accurate solution requires a very large amount of data, which often makes the problem numerically inefficient. We suggest using importance sampling to reduce the number of scenarios. Importance sampling replaces the sample from the nominal solution, which often carries no information since all bad events are very rare, with a sample from a synthetic distribution concentrated in a neighborhood of bad events.
  • Problem statement: find the minimum of a convex function (price) under probabilistic constraints (the probability of exceeding a certain threshold for a system of linear/quadratic functions is small) and numerically show the effectiveness of sampling in this problem.
  • Data: Data is available in the pypower and matpower packages as csv files.
  • References: The proposed algorithms are based on 3 articles:
    1. Owen, Maximov, Chertkov. Importance Sampling for the Union of Rare Events with Applications to Power Systems LINK
    2. A. Nemirovski. On safe tractable approximations of chance constraints LINK
    3. S. Tong, A. Subramanyam, and Vi. Rao. Optimization under rare chance constraints. LINK
    4. In addition, the authors of the problem have a draft of the article, in which you need to add a numerical part.
  • Base algorithm: A list of basic algorithms is provided in this lecture LINK
  • Solution: In numerical experiments, compare the sample size required by the standard method (scenario approximation) and by importance sampling to obtain a solution of comparable quality (and, for the inverse problem, compare the quality of the solution at equal sample sizes).
  • Novelty: The problem has long been known in the community, and scenario approximation is one of the main methods. At the same time, importance sampling helps to significantly reduce the number of scenarios. We have recently obtained a number of interesting results on how to compute optimal samplers; using them will significantly reduce the complexity of the problem.
  • Authors: Expert Yuri Maksimov, consultant Yuri Maksimov and Alexander Lukashevich.
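
The idea of importance sampling for rare events can be illustrated on a one-dimensional toy case. The sketch below estimates P(X > threshold) for a standard normal X, once by naive sampling and once by sampling from a normal distribution shifted to the threshold and reweighting by the likelihood ratio. It is an assumed textbook example, not the samplers from the referenced articles, and it involves no power-system data.

```python
import math
import random

def tail_prob_naive(threshold, n, rng):
    """Plain Monte Carlo: fraction of N(0, 1) samples above the threshold."""
    hits = sum(1 for _ in range(n) if rng.gauss(0.0, 1.0) > threshold)
    return hits / n

def tail_prob_importance(threshold, n, rng):
    """Importance sampling: draw from N(threshold, 1) and reweight."""
    total = 0.0
    for _ in range(n):
        y = rng.gauss(threshold, 1.0)
        if y > threshold:
            # Likelihood ratio phi(y) / phi(y - threshold).
            total += math.exp(-threshold * y + 0.5 * threshold ** 2)
    return total / n

def tail_prob_exact(threshold):
    """Exact Gaussian tail probability via the complementary error function."""
    return 0.5 * math.erfc(threshold / math.sqrt(2.0))
```

With a threshold of 4 the event has probability of about 3e-5, so a naive sample of 20000 points usually contains zero or one bad event, while the shifted sampler places about half of its points in the bad region.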

Problem 79.2021

  • Title: Improving Bayesian Inference in Physics Informed Machine Learning
  • Problem description: Machine learning methods are currently widely used in physics, in particular for solving turbulence problems and analyzing the stability of physical networks. The key issue is which modes to choose for training the models. A frequent choice is a sequence of points that uniformly covers the admissible set. However, such sequences are often not very informative, especially if analytical methods give a region where the system is guaranteed to be stable. Several sampling methods that take this information into account are proposed. Our goal is to compare them and find the one that requires the smallest sample size (empirical comparison).
  • Data: The experiment is proposed to be carried out on model and real data. The simulation experiment consists in analyzing the stability of (slightly non-linear) differential equations (synthetic data is self-generated). The second experiment is to analyze the stability of energy systems (data from matpower, pypower, GridDyn).
  • References:
    1. Art Owen. Quasi Monte Carlo Sampling. LINK
    2. Jian Cheng & Marek J. Druzdzel. Computational Investigation of Low-Discrepancy Sequences in Simulation Algorithms for Bayesian Networks LINK
    3. A. Owen, Y Maximov, M. Chertkov. Importance Sampling for the Union of Rare Events with Applications to Power Systems LINK
    4. Polson and Sokolov. Deep Learning: A Bayesian Perspective LINK
    5. In addition: the authors of the problem have a draft work on this topic
  • Base algorithm: The basic algorithm we are improving is Quasi Monte Carlo (QMC, LINK). The problem is to construct low-discrepancy sequences not covering the polyhedral region and the region given by the intersection of the quadratic constraints. Other algorithms needed for comparison: E. Gryazina, B. Polyak. Random Sampling: a Billiard Walk Algorithm LINK, and the Hit and Run algorithm LINK.
  • Solution: Importance sampling methods, in particular the extension of the approaches of (Boy, Ryi, 2014) and (Owen, Maximov, Chertkov, 2017), and their applications to ML/DL for physical problems.
  • Novelty: A significant reduction in sample complexity and the explicit use of existing analytical results when learning to solve physical problems; until now, ML approaches and analytical solutions have mostly developed in parallel.
  • Authors: Expert Yuri Maksimov, consultant Yuri Maksimov and Alexander Lukashevich, student.
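
A low-discrepancy sequence of the kind used in Quasi Monte Carlo can be generated in a few lines. The sketch below builds a Halton sequence from van der Corput radical inverses and filters it by a user-supplied region test; the helper `points_in_region` and the example region are illustrative assumptions, not the construction proposed in the problem.

```python
def radical_inverse(index, base):
    """Van der Corput radical inverse of a positive integer in a given base."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result

def halton(n_points, bases=(2, 3)):
    """First n_points of the Halton sequence in the unit cube."""
    return [tuple(radical_inverse(i, b) for b in bases)
            for i in range(1, n_points + 1)]

def points_in_region(n_points, inside):
    """Keep only the Halton points that satisfy the region test."""
    return [p for p in halton(n_points) if inside(p)]

# Example region: the triangle x + y <= 1 inside the unit square.
triangle_points = points_in_region(100, lambda p: p[0] + p[1] <= 1.0)
```

Filtering a cube-filling sequence is the simplest baseline; it wastes points when the admissible region is small, which is one motivation for region-aware samplers such as Billiard Walk or Hit and Run mentioned above.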

Problem 81.2021

  • Title: NAS — Generation and selection of neural network architectures
  • Problem description: The problem of choosing the optimal neural network architecture is posed as the problem of sampling the vector of structural parameters. The optimality criterion is defined in terms of the accuracy, complexity and stability of the model. The sampling procedure itself consists of two steps: generating a new structure and rejecting this structure if it does not satisfy the optimality criterion. It is proposed to explore various methods of sampling. The formulation of the problem of choosing the optimal structure is described in Potanin-1.
  • Data: Two separate sets are offered as data. The first consists of a single element, the popular MNIST dataset. Its advantages: it is a strong and generally accepted baseline, it was used as a benchmark in the WANN article, and it is quite large (multi-class classification). The second is a set of datasets for the regression problem, with sizes varying from very small to quite large. Here is a link to the dataset and a notebook to download the data.
  • References:
    1. Potanin - 1
    2. Potanin - 2. One more work, the text is given to the interested student, but without publication.
    3. Strijov Factory laboratory Error function
    4. Informatica
    5. WANN
    6. DARTS
    7. Symbols
    8. NEAT
  • Base algorithm: Closest project, and its code. Actual code from consultant.
  • Solution: A number of experiments have already been performed in which sampling is done by a genetic algorithm, and acceptable results have been obtained. It is proposed to analyze and improve them. Namely, separate two modules, generation and rejection, and compare several types of sampling. Basic: importance sampling; desirable: Metropolis-Hastings (or even Metropolis-Langevin) sampling. Since we regard the genetic algorithm as a process with jumps, it is proposed to take this into account when designing the sampling procedure. The bonus of MH is that it has a Bayesian interpretation. The first level of Bayesian inference as applied to MH is described in [Informatica]. It is required either to rewrite it in terms of the distribution of structural parameters, or to describe both levels in general, moving the structural parameters to the second level (approximately the same will be done in the Aduenko problem).
  • Novelty: Neural networks excel at tasks in computer vision, reinforcement learning, and natural language processing. One of the main goals of neural networks is to perform well on tasks that are currently solved exclusively by humans, that is, by natural neural networks. Artificial neural networks still work very differently from natural ones. One of the main differences is that natural neural networks evolve over time, changing both the strength of connections and their architecture. Artificial neural networks can adjust the strength of connections using weights, but cannot change their architecture. Therefore, choosing the optimal structure of a neural network for a specific task is an important step in developing the capabilities of neural network models.
  • Authors: consultant Mark Potanin, Expert Strijov V.V.
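
The generate-and-reject scheme described in this problem can be sketched as a Metropolis-Hastings sampler over binary structure vectors. This is a toy sketch: the score function below is a made-up stand-in for the accuracy/complexity criterion, and the single-bit-flip proposal is an assumption rather than the generation module of the project.

```python
import math
import random

def metropolis_hastings_structures(log_score, n_params, n_steps, seed=0):
    """Sample binary structures with probability proportional to exp(log_score)."""
    rng = random.Random(seed)
    current = [rng.randint(0, 1) for _ in range(n_params)]
    current_score = log_score(current)
    chain = []
    for _ in range(n_steps):
        # Generation step: flip one randomly chosen structural parameter.
        proposal = list(current)
        j = rng.randrange(n_params)
        proposal[j] = 1 - proposal[j]
        proposal_score = log_score(proposal)
        # Rejection step: the proposal is symmetric, so accept with
        # probability min(1, exp(proposal_score - current_score)).
        log_u = math.log(rng.random() + 1e-300)
        if log_u < proposal_score - current_score:
            current, current_score = proposal, proposal_score
        chain.append(tuple(current))
    return chain

def toy_log_score(structure):
    """Hypothetical criterion: two useful connections, a penalty per connection."""
    accuracy = 2.0 * (structure[0] + structure[1])
    complexity = 0.5 * sum(structure)
    return accuracy - complexity
```

In this toy criterion the first two structural parameters are worth keeping and the rest are not, so the chain should spend most of its time in structures where the first two bits are on. Replacing `toy_log_score` with a validation-based estimate is where the real cost of the procedure would lie.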

Problem 82.2021

  • Title: Training with an Expert for a sample with many domains.
  • Problem description: The problem of approximating a multi-domain sample by a single multi-model, a mixture of experts, is considered. The data is a sample that contains several domains, with no domain label for each object. Each domain is approximated by a local model. The paper considers a two-stage optimization problem based on the EM algorithm.
  • Data: Samples of reviews from the Amazon site for different types of goods are used as data. A linear model is used as the local model, and tf-idf vectors within each domain are used as the feature description of the reviews.
  • References:
    1. https://arxiv.org/pdf/1806.00258.pdf
    2. http://www.mysmu.edu/faculty/jingjiang/papers/da_survey.pdf
    3. https://dl.acm.org/doi/pdf/10.1145/3400066
  • Basic algorithm and Solution: The basic solution is presented here. The work uses the mixture of experts method for the multi-source domain adaptation problem. The code for the article is available at the link.
  • Novelty: Machine learning increasingly deals with problems where the data are taken from different sources, so that samples consist of a large number of domains. At the moment, there is no complete theoretical justification for constructing mixtures of local models to approximate such samples.
  • Authors: Grabovoi A.V., Strijov V.V.

Problem 17.2021

  • Title: BCI: Selection of consistent models for building a neural interface
  • Problem: Brain-computer interface systems use simple, stable models. An important step in building such an interface is an adequate choice of model. A wide range of models is considered: linear models, simple neural networks, recurrent networks, transformers. The peculiarity of the problem is that making a prediction requires modeling not only the source signal recorded from the cerebral cortex but also the target signal recorded from the limbs. Thus, two models are required. For them to work together, an agreement space is constructed. It is proposed to explore the properties of this space and the properties of the resulting forecast (neural interface) on various pairs of models.
  • Data: ECoG/EEG brain signal data sets.
    1. Need ECoG (dataset 25 contains EEG, EOG and hand movements) http://bnci-horizon-2020.eu/database/data-sets
    2. Neurotycho: our old data.
  • References:
    1. Yaushev F.Yu., Isachenko R.V., Strijov V.V. Latent space matching models in the forecasting problem // Systems and Means of Informatics, 2021, 31(1). PDF
    2. Isachenko R.V. Choice of a signal decoding model in high-dimensional spaces. Manuscript, 2021. PDF
    3. Isachenko R.V. Choice of a signal decoding model in high-dimensional spaces. Slides, 2020. [5]
    4. Isachenko R.V., Vladimirova M.R., Strijov V.V. Dimensionality reduction for time series decoding and forecasting problems // DEStech Transactions on Computer Science and Engineering, 2018, 27349 : 286-296. PDF
    5. Isachenko R.V., Strijov V.V. Quadratic Programming Optimization with Feature Selection for Non-linear Models // Lobachevskii Journal of Mathematics, 2018, 39(9) : 1179-1187. PDF
    6. Motrenko A.P., Strijov V.V. Multi-way feature selection for ECoG-based brain-computer interface // Expert Systems with Applications, 2018, 114(30) : 402-413. PDF
    7. Eliseyev A., Aksenova T. Stable and artifact-resistant decoding of 3D hand trajectories from ECoG signals using the generalized additive model //Journal of neural engineering. – 2014.
  • Basic algorithm: Described in the first work; the code is available. In that work, the data are two parts of an image; in our work, they are the brain signal and the hand movements. Super-task: complete the first work. The code and works are also here.
  • Solution: The case is considered when the initial data are heterogeneous: the spaces of the independent and target variables are of different nature. It is required to build a predictive model that takes into account the dependence in the source space of the independent variable as well as in the space of the target variable. It is proposed to investigate the accuracy, complexity and stability of pairs of various models. Since the inverse problem is solved when building a forecast, inverse transformations must be built for each model. To do this, one can use both basic techniques (PLS) and flows.
  • Novelty: Analysis of the prediction and latent space obtained by a pair of heterogeneous models.
  • Authors: Consultant Roman Isachenko, Expert Strijov V.V.

Problem 69.2021

  • Title: Graph Neural Network in Reaction Yield prediction
  • Problem description: There are disconnected graphs of source molecules and products in a chemical reaction. The yield of the main product in the reaction is known. It is required to design an algorithm that predicts yield by solving the regression problem on given disconnected graphs.
  • Data: Database of reaction from US patents [6]
  • References:
    1. [7] A general overview.
    2. [8] Relational Graph Convolution Neural Network
    3. [9] Transformer architecture
    4. [10] Graph neural network learning for chemical compounds synthesis
  • Base algorithm: Transformer model. The input sequence is a SMILES representation of the source and product molecules.
  • Solution: A pipeline for working with disconnected graphs is proposed. The pipeline includes the construction of an extended graph with molecule and reaction representations, a Relational Graph Convolution Neural Network, and a Transformer encoder. The method is applied to yield prediction.
  • Novelty: A solution for the regression problem on a given disconnected graph is constructed; the approach demonstrates better performance compared with other solutions.
  • Authors: Nikitin Filipp, Isayev Olexandr, Strijov V.V.

Problem 84.2021

  • Title: Trajectory Regularization of Deep Learning Model Parameters Optimization Based on Knowledge Distillation
  • Problem description: The problem of optimizing the parameters of a deep learning model is considered, in the case when the responses of a more complex model (teacher model) are available during optimization. The classical approach to such a problem is learning based on the responses of the complex model (knowledge distillation). Hyperparameters are assigned empirically based on the results of the model on a held-out sample. In this paper, we propose to consider a modification of knowledge distillation in which the significance coefficient of the distillation term, as well as its gradients, act as hyperparameters. Both groups of parameters allow adjusting the optimization of the model parameters. To optimize the hyperparameters, it is proposed to treat the problem as a two-level optimization problem: at the first level the model parameters are optimized, and at the second level the hyperparameters are approximately optimized by the value of the loss function on the held-out sample.
  • Data: A sample of CIFAR-10 images
  • References:
    1. Distillation of knowledge
    2. Hyperparameter Optimization in a Bilevel Problem: Greedy Method
    3. Hyperparameter Optimization in a Bilevel Problem: Comparison of Approaches
    4. Meta Optimization: neural network instead of optimization operator
  • Basic algorithm: Model optimization without distillation and with standard distillation approach
  • Solution: Using a two-level problem for model optimization. The combination of gradients for both terms is processed by a separate model (LSTM)
  • Novelty: A new approach to model distillation will be proposed to significantly improve the performance of models trained in privileged information mode. It is also planned to study the dynamics of changes in hyperparameters in the optimization process.
  • Authors: Oleg Bakhteev, Strijov V.V.
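
The standard distillation objective that serves as the baseline here can be written out explicitly. The sketch below combines the cross-entropy with the true label and the KL divergence to the temperature-softened teacher distribution, weighted by a coefficient `lam`; the name `lam`, the default temperature and the omission of the usual temperature-squared scaling are assumptions made for brevity, and the per-gradient hyperparameters of the proposed method are not modeled.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a temperature."""
    scaled = [z / temperature for z in logits]
    shift = max(scaled)
    exps = [math.exp(z - shift) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, label, lam,
                      temperature=2.0):
    """(1 - lam) * cross-entropy(label) + lam * KL(teacher || student)."""
    # Hard term: negative log-probability of the true class.
    hard = -math.log(softmax(student_logits)[label])
    # Soft term: divergence between the softened teacher and student outputs.
    q_teacher = softmax(teacher_logits, temperature)
    q_student = softmax(student_logits, temperature)
    soft = sum(t * math.log(t / s)
               for t, s in zip(q_teacher, q_student) if t > 0.0)
    return (1.0 - lam) * hard + lam * soft
```

In the two-level formulation, `lam` would be tuned at the upper level against the loss on the held-out sample, while the student parameters that produce `student_logits` are optimized at the lower level.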

Problem 85.2021

  • Title: A differentiable search algorithm for model architecture with control over its complexity
  • Problem description: The problem of choosing the structure of a deep learning model with a predetermined complexity is considered. It is required to propose a method for searching for a model that allows controlling its complexity with low computational costs.
  • Data: MNIST, CIFAR
  • References:
    1. Grebenkova O.S., Oleg Bakhteev, Strijov V.V. Variational optimization of a deep learning model with complexity control // Informatics and its applications, 2021, 15(2). PDF
    2. DARTS
    3. hypernets
  • Basic algorithm: DARTS
  • Solution: The proposed method is to use a differentiable neural network architecture search algorithm (DARTS) with parameter complexity control using a hypernet.
  • Novelty: The proposed method allows controlling the complexity of the model during the architecture search, without additional heuristics.
  • Authors: Oleg Bakhteev, Grebenkova O. S.

Problem 86. 2021

  • Title: Learning co-evolution information with natural language processing for protein folding problem
  • Problem: One of the most essential problems in structural bioinformatics is protein fold recognition since the relationship between the protein amino acid sequence and its tertiary structure is revealed by protein folding. A specific protein fold describes the distinctive arrangement of secondary structure elements in the nearly-infinite conformation space, which denotes the structural characteristics of a protein molecule.
  • Problem description: on request
  • Authors: Sergei Grudinin, Maria Kadukova.

Problem 87.2021

  • Title: Bayesian choice of structures of generalized linear models
  • Problem description: The work is devoted to testing methods for feature selection. It is assumed that the sample under study contains a significant number of multicollinear features. Multicollinearity is a strong correlation between the features selected for analysis that jointly affect the target vector, which makes it difficult to estimate regression parameters and identify the relationship between features and the target vector. There is a set of time series containing the readings of various sensors that reflect the state of the device. The readings of the sensors correlate with each other. It is necessary to choose the optimal set of features for solving the forecasting problem.
  • Novelty: One of the most preferred feature selection algorithms has been published. It uses structural parameters. But there is no theoretical justification. It is proposed to build a theory by describing and analyzing various functions of a priori distribution of structural parameters. In works on the search for structures of neural networks, there is also no clear theory and a list of a priori assumptions.
  • Data: Multivariate time series with readings from various sensors from paper 4, for starters, all samples from paper 1.
  • References: Keywords: bootstrap aggregation, Belsley method, vector autoregression.
    1. Katrutsa A.M., Strijov V.V. Comprehensive study of feature selection methods to solve multicollinearity problem according to evaluation criteria // Expert Systems with Applications, 2017, 76 : 1-11. PDF
    2. Katrutsa A.M., Strijov V.V. Stresstest procedure for feature selection algorithms // Chemometrics and Intelligent Laboratory Systems, 2015, 142 : 172-183. PDF
    3. Strijov V.V. Error function in regression recovery problems // Factory laboratory. material diagnostics, 2013, 79(5) : 65-73. PDF
    4. Zaitsev A.A., Strijov V.V., Tokmakova A.A. Estimation of hyperparameters of regression models by the maximum likelihood method // Information technologies, 2013, 2 : 11-15. PDF
    5. Kuznetsov M.P., Tokmakova A.A., Strijov V.V. Analytic and stochastic methods of structure parameter estimation // Informatica, 2016, 27(3) : 607-624. PDF
    6. Katrutsa A.M., Strijov V.V. The problem of multicollinearity in the selection of features in regression problems // Information technologies, 2015, 1 : 8-18. PDF
    7. Neichev R.G., Katrutsa A.M., Strijov V.V. Selection of the optimal set of features from a multicorrelated set in the forecasting problem. Zavodskaya Lab. material diagnostics, 2016, 82(3) : 68-74. PDF
  • Base algorithm: Described in Reference 1: Quadratic Programming for QPFS Feature Selection. Code from Roman Isachenko.
  • Solution: It is proposed to consider the structural parameters used in QPFS at the second level of Bayesian inference. Introduce informative a priori distributions of parameters and structural parameters. Compare different a priori assumptions.
  • Novelty: Statistical Analysis of Structural Parameter Space and Visualization
  • Authors: consultant Alexander Aduenko, Strijov V.V.

Problem 88.2021

  • Title: Search for the boundaries of the iris by the method of circular projections
  • Problem: A monochrome bitmap of the eye is given (see examples). The approximate position of the center of the pupil is also known. The word "approximate" means that the calculated center of the pupil lies no more than half of the true pupil radius away from the true center. It is necessary to determine the approximate positions of the circles approximating the pupil and the iris. The algorithm must be very fast.
  • Data: About 200 thousand eye images. For each, the position of the true circles is marked - for the purpose of training and testing the method being created.
  • Basic algorithm: To speed up work with the image, it is proposed to aggregate data using circular projections of brightness. A circular projection is a function of the radius whose value P(r) is equal to the integral of the directed image brightness gradient over a circle of radius r (or along an arc of a circle). Example for one arc (right quadrant) and for four arcs. Having built several circular projections, one can try to determine the position of the inner and outer borders of the iris (ring) from them using heuristics and / or a neural network. It is interesting to evaluate the capabilities of the neural network in this problem.
  • References: Matveev I.A. Detection of Iris in Image By Interrelated Maxima of Brightness Gradient Projections // Applied and Computational Mathematics. 2010. V.9. N.2. P.252-257 PDF
  • Author: Matveev I.A.
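The circular projection described above can be illustrated with a short sketch. It is only an assumption-laden illustration, not the method from the cited paper: the gradient is taken with `np.gradient`, the "directed" gradient is interpreted as the component along the outward radial direction, and the integral is replaced by a mean over sampled angles. The function name `circular_projection` and the synthetic image are hypothetical.

```python
import numpy as np

def circular_projection(image, center, radii, angles):
    """Mean outward radial brightness gradient along arcs of the given radii."""
    gy, gx = np.gradient(image.astype(float))  # derivatives along rows (y) and columns (x)
    cx, cy = center
    cos_t, sin_t = np.cos(angles), np.sin(angles)
    projection = np.zeros(len(radii))
    for k, r in enumerate(radii):
        xs = np.rint(cx + r * cos_t).astype(int)
        ys = np.rint(cy + r * sin_t).astype(int)
        inside = (xs >= 0) & (xs < image.shape[1]) & (ys >= 0) & (ys < image.shape[0])
        if not inside.any():
            continue
        # Project the gradient onto the outward radial direction at each sampled point.
        radial = (gx[ys[inside], xs[inside]] * cos_t[inside]
                  + gy[ys[inside], xs[inside]] * sin_t[inside])
        projection[k] = radial.mean()
    return projection

# Synthetic "eye": a dark pupil of radius 10 on a brighter background.
yy, xx = np.mgrid[0:101, 0:101]
image = np.where(np.hypot(xx - 50, yy - 50) < 10, 0.0, 1.0)

radii = np.arange(2, 40)
angles = np.linspace(0.0, 2.0 * np.pi, 180, endpoint=False)  # full circle; restrict for a single arc
projection = circular_projection(image, (50, 50), radii, angles)
best_radius = int(radii[np.argmax(projection)])
print("radius with the strongest dark-to-bright transition:", best_radius)
```

Restricting `angles` to a quadrant gives the single-arc variant; the maxima of several such projections could then be passed to heuristics or to a neural network.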

Problem 53.2021

  • Title: Solution of an optimization problem combining classification and regression to estimate the binding energy of a protein and small molecules.
  • Problem description: The goal of the problem is to solve an optimization problem with classification and regression loss functions applied to biological data.
  • Data: Approximately 12,000 complexes of proteins with small molecules. For classification, each complex has 1 correct position in space and 18 generated incorrect ones; for regression, each complex has a value of the binding constant (proportional to the energy). The main descriptors are histograms of the distributions of distances between different atoms.
  • References:
    1. https://www.overleaf.com/read/rjdnyyxpdkyj The problem details
    2. http://cs229.stanford.edu/notes/cs229-notes3.pdf SVM
    3. http://scikit-learn.org/stable/modules/linear_model.html#ridge-regression Ridge Regression
    4. https://alex.smola.org/papers/2003/SmoSch03b.pdf SVR
  • Base algorithm: In the classification problem, we used an algorithm similar to linear SVM; its relationship to the energy estimate, which lies outside the scope of the classification problem, is described in the article https://hal.inria.fr/hal-01591154/. For MSE as the regression loss function, a dual problem has already been formulated, and its implementation is a possible starting point.
  • Solution: The first step is to solve the problem with MSE in the loss function using any convenient solver. The main difficulty may be the high dimensionality of the data, although the data are sparse. After that, the formulation of the problem can be changed.
  • Novelty: Many models used to predict protein-ligand interactions are overfitted to one particular problem. For example, models that are good at predicting binding energies may be poor at selecting a protein-binding molecule from a variety of non-binding ones, and models that are good at determining the correct geometry of the complex may be poor at predicting energies. In this problem, we propose a new approach to combat such overfitting, since the combination of classification and regression loss functions seems to be a very natural regularization.
  • Authors: Sergei Grudinin, Maria Kadukova.
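The idea of combining a classification loss and a regression loss in one objective can be sketched as follows. This is a generic illustration under stated assumptions, not the formulation from the linked problem details: a single shared weight vector, a hinge loss for the classification part, MSE for the regression part, an L2 penalty, plain subgradient descent, and dense synthetic data. All names (`combined_loss`, `reg_weight`, `l2`) are hypothetical.

```python
import numpy as np

def combined_loss(w, X_cls, y_cls, X_reg, t_reg, reg_weight=1.0, l2=1e-3):
    """Hinge loss on classification pairs + weighted MSE on regression targets + L2 penalty."""
    hinge = np.maximum(0.0, 1.0 - y_cls * (X_cls @ w)).mean()
    mse = ((X_reg @ w - t_reg) ** 2).mean()
    return hinge + reg_weight * mse + l2 * (w @ w)

def combined_subgradient(w, X_cls, y_cls, X_reg, t_reg, reg_weight=1.0, l2=1e-3):
    """A subgradient of combined_loss with respect to w."""
    active = y_cls * (X_cls @ w) < 1.0  # samples that violate the margin
    g_hinge = -(X_cls[active] * y_cls[active, None]).sum(axis=0) / len(y_cls)
    g_mse = 2.0 * X_reg.T @ (X_reg @ w - t_reg) / len(t_reg)
    return g_hinge + reg_weight * g_mse + 2.0 * l2 * w

# Synthetic data: one "true" scoring vector generates both labels and targets.
rng = np.random.default_rng(0)
w_true = rng.normal(size=5)
X_cls = rng.normal(size=(200, 5))
y_cls = np.sign(X_cls @ w_true)                      # +1 / -1 labels
X_reg = rng.normal(size=(100, 5))
t_reg = X_reg @ w_true + 0.1 * rng.normal(size=100)  # noisy regression targets

w = np.zeros(5)
initial = combined_loss(w, X_cls, y_cls, X_reg, t_reg)
for _ in range(500):
    w -= 0.05 * combined_subgradient(w, X_cls, y_cls, X_reg, t_reg)
final = combined_loss(w, X_cls, y_cls, X_reg, t_reg)
accuracy = float(np.mean(np.sign(X_cls @ w) == y_cls))
print(f"loss {initial:.3f} -> {final:.3f}, classification accuracy {accuracy:.2f}")
```

For the real data, which are high-dimensional but sparse, the dense arrays would presumably be replaced by sparse matrices and the subgradient loop by a dedicated solver.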

Problem 75.2021

  • Title: Alignment of image elements using metric models.
  • Problem description: A set of symbols is given. Each symbol is represented by one file, an image. The image size in pixels may vary. All images are known to belong to the same class, such as faces, letters, flowers, or cars. (A more complicated option: images belong either to the one class under study or to noise classes.) It is known that each image can be matched to another by an aligning transformation, up to noise, or matched to some average image (which may or may not be present in the sample). In the base case this aligning transformation is specified by a neural network, and in the proposed case by a parametric transformation from some given class (the first is a special case of the second). The aligned image is compared with the original one using a distance function. If the distance between two images is statistically significant, it is concluded that the images belong to the same class. It is required to 1) propose an adequate model of the aligning transformation that takes into account assumptions about the nature of the images (for example, only rotation and proportional scaling), 2) propose a distance function, 3) propose a method for finding the average image.
  • Data: Synthetic and real: 1) pictures of faces and symbols with rotation and stretch transformations, 2) faces and cars with a 3D rotation transformation followed by 2D projection. Synthetic images are proposed to be created manually using 1) photographs of a sheet of paper, 2) photographs of a drawing on the surface of a balloon.
  • References:
    1. support work - alignment of images using 2D DTW,
    2. support work - alignment of images using neural networks,
    3. DTW alignment work in 2D,
    4. parametric alignment work.
  • Base algorithm: from work 1.
  • Solution: In the attached file pdf.
  • Novelty: Instead of multidimensional image alignment, parametric alignment is proposed.
  • Authors: Alexey Goncharov, Strijov V.V.

Problem 80.2021

  • Title: Detection of correlations between activity in social networks and capitalization of companies
  • Problem description: At present, stock quotes, company capitalization, and the success or failure of an IPO depend significantly on social factors such as public opinion expressed on social media. A recent notable example is the change in GameStop quotes caused by the surge in activity on Reddit. The problem at the first stage is to identify relationships between the share prices of companies in different segments and activity in social networks, that is, to identify correlations between significant changes in a company's capitalization and preceding bursts (positive or negative) of its discussion in social networks. Formally, it is necessary to find the minimum of the loss function when restoring the dependence in various classes of models (parametric models, neural networks, etc.). This problem is part of a large project on the analysis of markets and the impact of social factors on risks (within a team of 5-7 professors), which will lead to a series of publications sufficient to defend a dissertation.
  • Data: The problem has a significant engineering context: the data are downloads of quotes from the Moscow Exchange, as well as NYT and Reddit data (crawling and parsing are done with standard tools). The student working on this problem must have strong engineering skills and a desire to engage in both the practice of machine learning and the engineering parts of the problem.
  • References:
    1. Paul S. Adler and Seok-Woo Kwon. Social Capital: Prospects for a new Concept. LINK
    2. Kim and Hastak. Social network analysis: Characteristics of online social networks after a disaster LINK
    3. Baumgartner, Jason, et al. "The pushshift reddit dataset." Proceedings of the International AAAI Conference on Web and Social Media. Vol. 14. 2020. LINK
  • Base algorithm: The basic algorithms are LSTM and Graph neural networks.
  • Solution: Start with LSTM, then try some of its standard extensions.
  • Novelty: There are many solutions in this area based on economic models, but their accuracy is not always high. The use of modern ML/DL models is expected to significantly improve the quality of the solution.
  • Authors: Expert Yuri Maksimov, consultant Yuri Maksimov, student.
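Before training LSTM or graph models, the stated goal of relating capitalization changes to preceding bursts of discussion can be checked with a simple lagged correlation. The sketch below is an illustrative baseline under assumptions, not part of the proposed solution: both series are synthetic, daily sampling is assumed, and the function name `lagged_correlation` is hypothetical.

```python
import numpy as np

def lagged_correlation(activity, returns, max_lag):
    """Pearson correlation between activity at time t - lag and returns at time t."""
    result = {}
    for lag in range(max_lag + 1):
        past_activity = activity[: len(activity) - lag]
        later_returns = returns[lag:]
        result[lag] = float(np.corrcoef(past_activity, later_returns)[0, 1])
    return result

# Synthetic example: returns follow social activity with a delay of two steps.
rng = np.random.default_rng(1)
activity = rng.normal(size=300)
returns = 0.3 * rng.normal(size=300)
returns[2:] += 0.8 * activity[:-2]

correlations = lagged_correlation(activity, returns, max_lag=5)
best_lag = max(correlations, key=lambda lag: abs(correlations[lag]))
print("lag with the strongest correlation:", best_lag)
```

A pronounced peak at a positive lag would suggest that activity precedes price changes, which is the kind of dependence the neural models are then expected to capture more accurately.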

Problem 88b.2021

  • Name: Finding a Pupil in an Eye Image Using the Luminance Projection Method
  • Problem: Given a monochrome bitmap image of the eye (see examples). It is necessary to determine the approximate coordinates of the center of the pupil. The word "approximate" means that the calculated pupil center must lie inside a circle centered at the pupil's true center with half the true radius. The algorithm must be very fast.
  • Data: About 200 thousand eye images. For each, the position of the true circle is marked - for the purpose of training and testing the method being created.

  • Basic algorithm: To speed up work with the image, it is proposed to aggregate the data using brightness projections. Image brightness is a function of two discrete arguments I(x, y). Its projection onto the horizontal axis is P(x)=\sum \limits_y I(x,y). Projections onto inclined axes are constructed similarly. Given several projections (two, four), one can try to determine the position of the pupil (a compact dark area) using heuristics and/or a neural network. It is interesting to evaluate the capabilities of a neural network in this problem.

  • References: Zhi-Hua Zhou, Xin Geng. Projection functions for eye detection // Pattern Recognition. 2004. V.37. N.5. P.1049-1056. PDF
  • Author: Matveev I.A.
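A minimal sketch of the projection heuristic: the brightness is summed along columns and rows, the two profiles are smoothed, and the darkest position in each is taken as the pupil center estimate. The smoothing window, the edge padding, and the synthetic test image are assumptions made for illustration; the function names are hypothetical.

```python
import numpy as np

def smooth(profile, window):
    """Moving average with edge padding, so image borders are not artificially darkened."""
    padded = np.pad(profile, window // 2, mode="edge")
    return np.convolve(padded, np.ones(window) / window, mode="valid")

def pupil_center_by_projections(image, window=15):
    """Estimate (x, y) of a compact dark area from horizontal and vertical projections."""
    img = image.astype(float)
    proj_x = img.sum(axis=0)  # P(x) = sum over y of I(x, y)
    proj_y = img.sum(axis=1)  # P(y) = sum over x of I(x, y)
    return int(np.argmin(smooth(proj_x, window))), int(np.argmin(smooth(proj_y, window)))

# Synthetic image: bright background with a dark disc of radius 12 at (x=90, y=40).
yy, xx = np.mgrid[0:120, 0:160]
image = np.ones((120, 160))
image[np.hypot(xx - 90, yy - 40) < 12] = 0.0

cx, cy = pupil_center_by_projections(image)
print("estimated pupil center:", (cx, cy))
```

On real images eyelashes, eyebrows, and shadows also produce dark regions, so the raw minima would be only a starting point for the heuristics or the neural network mentioned above.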

Problem 88c.2021

  • Name: Searching for an eyelid in an image as a parabolic contour using the projection method.
  • Problem: Given a monochrome bitmap image of the eye (see examples). It is necessary to find the contour of the upper eyelid as a parabola, that is, to determine its parameters.
  • Data: About 200 thousand eye images. For some (about 2500), a human expert marked the position of a parabola that approximates the eyelid.
  • Basic algorithm: The first step is pre-processing the image with a vertical gradient filter followed by binarization. There are various options for the next step. For example, if the coordinates of the pupil are known, one can set a region of interest (above the pupil) and fit a parabola to the selected points in it by the least squares method. More subtle methods are possible, such as finding a parabola using the Hough transform (see Wikipedia). Another way is to use projective methods (Radon transform). The main idea: after fixing one coefficient, apply a coordinate transformation to the image under which all parabolas with that coefficient turn into straight lines; then, after fixing a second coefficient, apply another coordinate transformation under which these oblique lines become horizontal. Horizontal lines are easy to detect, for example, by a horizontal projection (summing the values in the rows of the resulting image matrix). If the coefficients are guessed correctly, the parabola representing the eyelid gives a clear maximum in the projection. By enumerating the coefficients over a physically meaningful range, one can find those that give the maximum projection value and take the corresponding parabola as the eyelid.
  • References: Wikipedia, articles "Hough Transform", "Radon Transform".
  • Author: Matveev I.A.
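The projective search can be sketched under an explicit assumption about the parabola's form, since the formulas are not present in the text above: here the eyelid is modeled as y = a·u² + b·u + c, where u is the column index measured from the image center. For each candidate pair (a, b) the edge points are shifted by a·u² + b·u, which turns a matching parabola into the horizontal line y = c, and the row histogram plays the role of the horizontal projection. The parameterization, the grids, and the name `find_parabola` are hypothetical.

```python
import numpy as np

def find_parabola(edge_mask, a_values, b_values):
    """Search (a, b, c) of y = a*u**2 + b*u + c that collects the most edge points in one row."""
    height, width = edge_mask.shape
    ys, xs = np.nonzero(edge_mask)
    u = xs - width // 2  # column coordinate relative to the image center (assumption)
    best_votes, best_params = 0, None
    for a in a_values:
        for b in b_values:
            # After removing the a*u^2 and b*u terms, a matching parabola becomes a horizontal line.
            rows = np.rint(ys - a * u**2 - b * u).astype(int)
            rows = rows[(rows >= 0) & (rows < height)]
            if rows.size == 0:
                continue
            profile = np.bincount(rows, minlength=height)  # horizontal projection of the warped points
            c = int(profile.argmax())
            if profile[c] > best_votes:
                best_votes, best_params = int(profile[c]), (float(a), float(b), c)
    return best_params

# Synthetic binarized gradient image: one parabola plus random noise points.
rng = np.random.default_rng(0)
mask = np.zeros((80, 120), dtype=bool)
cols = np.arange(120)
u = cols - 60
mask[np.rint(0.01 * u**2 + 0.1 * u + 20).astype(int), cols] = True
mask[rng.integers(0, 80, 60), rng.integers(0, 120, 60)] = True

a_hat, b_hat, c_hat = find_parabola(mask, np.linspace(0.0, 0.02, 9), np.linspace(-0.2, 0.2, 9))
print("estimated parabola parameters:", a_hat, b_hat, c_hat)
```

Working on the coordinates of edge points instead of resampling the whole image is a simplification chosen for brevity; the grids for a and b would in practice be limited to physically plausible eyelid shapes.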

Problem 62.2021

  • Title: Construction of a method for dynamic alignment of multidimensional time series, resistant to local signal fluctuations.
  • Problem description: When working with multidimensional time series, sensors corresponding to different measurement channels are often located close to each other. As a result, a small spatial shift of the signal can cause its peak to be registered by a neighboring sensor, which leads to significant differences between measurements in terms of the L2 distance.
    Thus, small signal shifts lead to significant fluctuations in the sensor readings. The problem is to construct a distance function between points of time series that is resistant to the noise generated by small spatial signal shifts. The problem should be considered under the assumption that a map of the sensor locations is available.
  • Data:
    1. Monkey brain activity measurements
    2. Artificially created data (several options must be proposed, for example signal movement in space clockwise and counterclockwise)
  • References:
    1. Review of DTW
    2. Multi-Dimensional Dynamic Time Warping for Gesture Recognition
    3. Multiple Multidimensional Sequence Alignment Using Generalized Dynamic Time Warping
  • Base algorithm: L2 distance between a pair of measurements.
  • Solution: Use the DTW distance function between two multidimensional time series. The two time axes are aligned, while inside the DTW functional the distance between the i-th and j-th measurements is chosen so that it is resistant to local "shifts" of the signal. It is required to propose such a distance function. The basic solution is L2; the improved solution is DTW between the i-th and j-th measurements (DTW inside DTW).
    Other modifications can be suggested, for example, the distance between the hidden-layer representations of an autoencoder for points i and j.
  • Novelty: A method for aligning multidimensional time series is proposed that takes into account small signal fluctuations in space.
  • Authors: Expert Strijov V.V., consultants Gleb Morgachev, Alexey Goncharov.
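The "DTW inside DTW" idea can be sketched with a DTW routine that accepts an arbitrary pointwise distance. The outer DTW aligns the time axes; the inner distance is either plain L2 or a DTW over the channel axis. Treating the channels as an ordered one-dimensional line of sensors is a simplifying assumption made here (the task itself presumes a general sensor map), and the function names are hypothetical.

```python
import numpy as np

def dtw(series_a, series_b, distance):
    """Classic dynamic time warping cost with a user-supplied pointwise distance."""
    n, m = len(series_a), len(series_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = distance(series_a[i - 1], series_b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

def l2(x, y):
    """Baseline distance between two measurement vectors."""
    return float(np.linalg.norm(np.asarray(x, dtype=float) - np.asarray(y, dtype=float)))

def channel_dtw(x, y):
    """Inner DTW across the sensor axis: tolerant to a peak moving to a neighboring sensor."""
    return dtw(x, y, lambda u, v: abs(float(u) - float(v)))

# Two recordings (6 time steps, 5 sensors): the same peak seen by sensor 2 or by sensor 3.
a = np.zeros((6, 5))
a[:, 2] = 1.0
b = np.zeros((6, 5))
b[:, 3] = 1.0

baseline = dtw(a, b, l2)
robust = dtw(a, b, channel_dtw)
print(f"DTW with L2: {baseline:.3f}, DTW with inner channel DTW: {robust:.3f}")
```

In this toy case the L2-based distance reports a large difference while the nested variant reports none, which is the robustness property the task asks for; whether zero cost for such shifts is always desirable is a design question left open here.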

Problem 58.2021

  • Title: Transformation of the Gerchberg-Saxton algorithm using Bayesian neural networks (or: A neural network approach to the phase retrieval problem for images from the European synchrotron).
  • Problem description: The aim of the project is to improve the resolution of images of nanosized objects obtained in the laboratories of the European Synchrotron Radiation Facility.
  • Data: Contact an advisor for data (3GB).

  • References:

    1. [11] Iterative phase retrieval in coherent diffractive imaging: practical issues
    2. [12] X-ray nanotomography of coccolithophores reveals that coccolith mass and segment number correlate with grid size
    3. [13] Lens-free microscopy for 3D + time acquisitions of 3D cell culture
    4. [14] DEEP ITERATIVE RECONSTRUCTION FOR PHASE RETRIEVAL
    5. https://docs.google.com/document/d/1K7bIzU33MSfeUvg3WITRZX0pe3sibbtH62aw42wxsEI/edit?ts=5e42f70e LinkReview
  • Base algorithm: The transition from direct space to reciprocal space occurs using the Fourier transform. The Fourier transform is a linear transformation. Therefore, it is proposed to approximate it with a neural network. For example, an autoencoder for modeling forward and inverse Fourier transforms.
  • Solution: Transformation of the Gerchberg-Saxton algorithm using Bayesian neural networks. Use of information about physical constraints and expert knowledge.
  • Novelty: Use of information about physical constraints and expert knowledge in the construction of the error function.
  • Authors: Experts Sergei Grudinin, Yuri Chushkin, Strijov V.V., consultant Mark Potanin
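For orientation, the classical Gerchberg-Saxton iteration that the project proposes to modify can be sketched as follows. This is the textbook scheme, not the project's Bayesian variant: the field is moved between direct and reciprocal space with the FFT, and in each space the known amplitude is imposed while the current phase is kept. The synthetic data and the function name `gerchberg_saxton` are assumptions made for the illustration.

```python
import numpy as np

def gerchberg_saxton(source_amplitude, fourier_amplitude, iterations=100, seed=0):
    """Recover a phase consistent with known amplitudes in direct and reciprocal space."""
    rng = np.random.default_rng(seed)
    phase = rng.uniform(0.0, 2.0 * np.pi, source_amplitude.shape)
    field = source_amplitude * np.exp(1j * phase)
    errors = []
    for _ in range(iterations):
        spectrum = np.fft.fft2(field)
        # Mismatch between the current and the measured Fourier amplitudes.
        errors.append(float(np.linalg.norm(np.abs(spectrum) - fourier_amplitude)))
        # Impose the measured amplitude in reciprocal space, keep the phase.
        spectrum = fourier_amplitude * np.exp(1j * np.angle(spectrum))
        field = np.fft.ifft2(spectrum)
        # Impose the known amplitude in direct space, keep the phase.
        field = source_amplitude * np.exp(1j * np.angle(field))
    return np.angle(field), errors

# Synthetic experiment: amplitudes generated from a hidden random phase.
rng = np.random.default_rng(1)
source_amplitude = np.ones((32, 32))
true_phase = rng.uniform(-np.pi, np.pi, (32, 32))
fourier_amplitude = np.abs(np.fft.fft2(source_amplitude * np.exp(1j * true_phase)))

phase, errors = gerchberg_saxton(source_amplitude, fourier_amplitude)
print(f"Fourier-amplitude error: {errors[0]:.2f} -> {errors[-1]:.2f}")
```

The recovered phase is in general not unique (for example, it is defined up to a global offset), which is one place where physical constraints and expert knowledge can enter the error function.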

Problem 63.2021

  • Title: Hierarchical alignment of time sequences.
  • Problem description: The problem of aligning sequences of complex events is considered. An example is complex human behavior: when considering data from IMU sensors, one can put forward the hypothesis that there is an initial signal, there are aggregates of "elementary actions", and there are aggregates of "actions" of a person. Each of these levels of abstraction can be distinguished and operated on separately.
    In order to accurately recognize a sequence of actions, metric methods can be used (for example, DTW, as a method that is resistant to time shifts). To improve the quality of time-axis alignment, the alignment can be carried out at different levels of abstraction.
    It is proposed to explore such a hierarchical approach to sequence alignment, based on the possibility of applying alignment algorithms to objects of different structures, having a distance function on them.
  • References:
    1. Overview presentation about DTW
    2. DTW-based kernel and rank-level fusion for 3D gait recognition using Kinect; Multi-Dimensional Dynamic Time Warping for Gesture Recognition
    3. Time Series Similarity Measure via Siamese Convolutional Neural Network
    4. Multiple Multidimensional Sequence Alignment Using Generalized Dynamic Time Warping
  • Base algorithm: classic DTW.
  • Solution: It is proposed to perform the transition from one level of abstraction to another by using convolutional and recurrent neural networks. Then the object at the lower level of abstraction is the original signal. At the second level - a signal from the hidden layer of the model (built on the objects of the lower level), the dimension of which is much less, and the upper layer - a signal from the hidden layer of the model (built on the objects of the middle level).
    In this case, DTW is calculated separately between objects of the lower level, of the middle level, and of the upper level, but the objects for which the distance is calculated are formed taking into account the alignment path between the objects of the previous level.
    This method is considered as a way to increase the interpretability of the alignment procedure and the accuracy of the action classification in connection with the transition to higher-level patterns. In addition, a significant increase in speed is expected.
  • Novelty: The idea of aligning time sequences simultaneously at several levels of abstraction is proposed. The method should significantly improve the interpretability of alignment algorithms and increase their speed.
  • Authors: Expert Strijov V.V., consultants Gleb Morgachev, Alexey Goncharov.

Problem 57.2021

  • Title: Additive regularization in problems of privileged learning for predicting the state of the ocean
  • Problem description: There is a sample of data from ocean buoys, it is required to predict the state of the ocean at different points in time.
  • Data: The buoys provide data on wave height, wind speed, wind direction, wave period, sea level pressure, air temperature and sea surface temperature with a resolution of 10 minutes to 1 hour.
  • References:
    1. [15]
  • Base algorithm: Using a simple neural network.
  • Solution: Add a system of differential equations to the basic algorithm (a simple neural network). Explore the properties of the parameter spaces of the teacher and the student within the privileged learning approach.
  • Novelty: Investigation of the parameter space of the teacher and the student and their change. It is possible to set up separate teacher and student models and track the change in their parameters in the optimization process - variance, change in the quality of the student when adding teacher information, complexity.
  • Authors: Strijov V.V., Mark Potanin

Problem 52.2021

  • Title: Predicting the quality of protein models using spherical convolutions on 3D graphs.
  • Problem: The purpose of this work is to create and study a new convolution operation on three-dimensional graphs for the problem of assessing the quality of three-dimensional protein models (a regression problem on graph nodes).
  • Data: Models generated by CASP competitors are used (http://predictioncenter.org).
  • References:
    1. [16] More about the problem.
    2. [17] Relational inductive biases, deep learning, and graph networks.
    3. [18] Geometric deep learning: going beyond euclidean data.
  • Base algorithm: As a basic algorithm, we will use a neural network based on the graph convolution method, which is generally described in [19].
  • Solution: The presence of a peptide chain in proteins makes it possible to uniquely introduce local coordinate systems for all graph nodes, which makes it possible to create and apply spherical filters regardless of the graph topology.
  • Novelty: In the general case, graphs are irregular structures, and in many graph learning problems the sample objects do not share a single topology. Therefore, the existing convolution operations on graphs are greatly simplified or do not generalize to different topologies. In this work, we propose a new method for constructing a convolution operation on three-dimensional graphs for which local coordinate systems associated with each node can be chosen uniquely.
  • Authors: Sergei Grudinin, Ilya Igashov.

Problem 44+.2021

  • Title: Early prediction of sufficient sample size for a generalized linear model.
  • Problem description: The problem of experiment planning is investigated. The problem of estimating a sufficient sample size from the data is solved. The sample is assumed to be simple and described by an adequate model. In other words, the sample is generated by a fixed probabilistic model from a known class of models. The sample size is considered sufficient if the model is restored with sufficient confidence. It is required, knowing the model, to estimate a sufficient sample size at the early stages of data collection.
  • Goal: On a small simple i.i.d. sample, predict the error on a larger, replenished one. The predictive model is smooth and monotonic in two derivatives. The model is chosen by exhaustive search or by a genetic algorithm. The model depends on the reduced (explore) covariance matrix of the GLM parameters.
  • Data: For the computational experiment, it is proposed to use classical samples from the UCI repository. Link to the samples: https://github.com/ttgadaev/SampleSizeEstimation/tree/master/datasets
  • References:
    1. Overview of Methods, Motivation and Problem Statement for Sample Size Estimation
    2. http://svn.code.sf.net/p/mlalgorithms/code/PhDThesis/.
    3. Bootstrap method. https://projecteuclid.org/download/pdf_1/euclid.aos/1.

    4. Bishop, C. 2006. Pattern Recognition and Machine Learning. Berlin: Springer. 758 p.

  • Basic algorithm: We will say that the sample size is sufficient if the log-likelihood has a small variance on a sample of size m calculated using the bootstrap.

    The dependence of the mean log-likelihood and of its variance on the sample size is then approximated.

  • Solution: The methods described in the review are asymptotic or require a deliberately large sample size. The new method should predict the sufficient sample size at the early stages of experiment design, i.e. when data are scarce.
  • Authors: expert Strijov V.V., consultant Malinovsky G.
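The bootstrap criterion from the basic algorithm can be sketched for the simplest generalized linear model, linear regression with Gaussian noise. For each candidate size m, subsamples of size m are drawn with replacement, the model is refitted, and the mean and variance of the per-observation log-likelihood are recorded. The choice of model, the number of bootstrap replicas, and the function name `bootstrap_loglik` are assumptions made for the illustration.

```python
import numpy as np

def bootstrap_loglik(X, y, m, n_boot=200, seed=0):
    """Bootstrap mean and variance of the per-observation Gaussian log-likelihood at size m."""
    rng = np.random.default_rng(seed)
    values = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, len(y), size=m)  # resample m objects with replacement
        Xb, yb = X[idx], y[idx]
        w, *_ = np.linalg.lstsq(Xb, yb, rcond=None)
        residual = yb - Xb @ w
        sigma2 = max(float(residual @ residual) / m, 1e-12)  # maximum likelihood noise variance
        # Mean log-likelihood of a Gaussian model evaluated at its maximum likelihood estimate.
        values[b] = -0.5 * (np.log(2.0 * np.pi * sigma2) + 1.0)
    return float(values.mean()), float(values.var())

# Synthetic sample from a linear model with three parameters.
rng = np.random.default_rng(42)
X = np.column_stack([np.ones(500), rng.normal(size=(500, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.5, size=500)

sizes = [20, 50, 100, 200]
stats = {m: bootstrap_loglik(X, y, m) for m in sizes}
for m in sizes:
    print(f"m={m:4d}  mean log-likelihood={stats[m][0]:.3f}  variance={stats[m][1]:.5f}")
```

The resulting pairs (m, variance) are the points to which a smooth monotonic curve would be fitted in order to extrapolate from small m to the size at which the variance falls below a chosen threshold.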

Problem 12.2021

  • Title: Machine translation training without parallel texts.
  • Problem: The problem of building a text translation model without the use of parallel texts, i.e. pairs of identical sentences in different languages, is considered. This problem arises when building translation models for low-resource languages (that is, languages for which little data is publicly available).
  • Data: A selection of articles from Wikipedia in two languages.
  • References:
    1. [20] Unsupervised Machine Translation Using Monolingual Corpora Only
    2. [21] Sequence to sequence.
    3. [22] Autoencoding.
    4. [23] Training with Monolingual Training Data.
  • Basic algorithm: Unsupervised Machine Translation Using Monolingual Corpora Only.
  • Solution: As a translation model, it is proposed to consider a combination of two auto-encoders, each of which is responsible for presenting sentences in one of the languages. The models are optimized in such a way that the latent spaces of autoencoders for different languages match. As an initial representation of sentences, it is proposed to consider their graph description obtained using multilingual ontologies.
  • Novelty: A method for constructing a translation model is proposed, taking into account graph descriptions of sentences.
  • Authors: Oleg Bakhteev, Strijov V.V.

Problem 8.2021

  • Title: Generation of features using locally approximating models (Classification of human activities according to measurements of fitness bracelets).
  • Problem: It is required to test the hypothesis that the sample of generated features is simple. The features are the optimal parameters of approximating models. The sample as a whole is not simple and requires a mixture of models to approximate it. Explore the informativeness of the generated features, i.e. the parameters of the approximating models trained on segments of the original time series. From accelerometer and gyroscope measurements, it is required to determine the type of activity of a worker. It is assumed that the time series of measurements contain elementary movements that form clusters in the space of time series descriptions. The characteristic duration of a movement is seconds. Time series are labeled with activity types: work, leisure. The characteristic duration of an activity is minutes. It is required to recover the type of activity from the description of the time series and the cluster.
  • Data: WISDM accelerometer time series (Time series (library of examples), section Accelerometry).
    1. WISDM (Kwapisz, J.R., G.M. Weiss, and S.A. Moore. 2011. Activity recognition using cell phone accelerometers. ACM SigKDD Explorations Newsletter. 12(2):74–82.), USC-HAD. Human activity recognition using smart phone embedded sensors: A Linear Dynamical Systems method, W Wang, H Liu, L Yu, F Sun - Neural Networks (IJCNN), 2014.
  • References:
    1. Motrenko A.P., Strijov V.V. Extracting fundamental periods to segment human motion time series // Journal of Biomedical and Health Informatics, 2016, Vol. 20, No. 6, 1466 - 1476. URL
    2. Karasikov M.E., Strijov V.V. Classification of time series in the space of parameters of generating models // Informatics and its applications, 2016.URL
    3. Kuznetsov M.P., Ivkin N.P. Algorithm for Classifying Accelerometer Time Series by Combined Feature Description // Machine Learning and Data Analysis. 2015. Vol. 1, No. 11. P. 1471 - 1483. URL
    4. Isachenko R.V., Strijov V.V. Metric learning in the problem of multiclass classification of time series // Informatics and its applications, 2016, 10(2) : 48-57. URL
    5. Zadayanchuk A.I., Popova M.S., Strijov V.V. Choosing the optimal model for classifying physical activity based on accelerometer measurements // Information technologies, 2016. URL
    6. Ignatov A., Strijov V. Human activity recognition using quasiperiodic time series collected from a single triaxial accelerometer // Multimedia Tools and Applications, 2015, 17.05.2015 : 1-14. URL
  • Basic algorithm: Basic algorithm described in [Karasikov, Strijov: 2016] and [Kuznetsov, Ivkin: 2014].
  • Solution: It is required to build a set of locally approximating models and choose the most adequate ones. Find the optimal segmentation method and the optimal description of the time series. Construct a metric space of descriptions of elementary motions.
  • Novelty: A standard for building locally approximating models has been created. The connection of two characteristic times of the description of human life, the combined statement of the problem.
  • Authors: Expert Strijov V.V., consultants Alexandra Galtseva, Danil Sayranov.
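Feature generation with locally approximating models can be illustrated with one simple choice of local model, an autoregressive model fitted by least squares on each segment; its coefficients become the feature vector of the segment. The AR(2) model, the fixed segment length, and the function names are assumptions for this sketch and are not taken from the cited papers.

```python
import numpy as np

def ar_features(segment, order=2):
    """Least-squares parameters of an AR(order) model: [intercept, coef of x[t-1], coef of x[t-2], ...]."""
    rows = [segment[t - order:t][::-1] for t in range(order, len(segment))]
    design = np.column_stack([np.ones(len(rows)), np.array(rows)])
    coef, *_ = np.linalg.lstsq(design, segment[order:], rcond=None)
    return coef

def segment_features(series, segment_length, order=2):
    """Cut the series into consecutive segments and describe each by its local model parameters."""
    n_segments = len(series) // segment_length
    return np.array([
        ar_features(series[k * segment_length:(k + 1) * segment_length], order)
        for k in range(n_segments)
    ])

# Synthetic signal with two "activities": a slow and a fast periodic movement.
t = np.arange(400)
series = np.where(t < 200, np.sin(0.2 * t), np.sin(0.6 * t))

features = segment_features(series, segment_length=50)
print(np.round(features, 3))
```

For a noise-free sinusoid of angular frequency ω the fitted coefficient of x[t-1] equals 2·cos(ω), so segments of the two activities fall into two separate clusters in the feature space; real accelerometer data would additionally require choosing the segmentation and the family of local models.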

2020

Author | Topic | Links | Consultant | Letters | Reviewer
Grebenkova Olga | Variational optimization of deep learning models with model complexity control | LinkReview, GitHub, Paper, Slides, Video | Oleg Bakhteev | AILP+UXBR+HCV+TEDWS | Shokorov Vyacheslav (Review)
Shokorov Vyacheslav | Text recognition based on skeletal representation of thick lines and convolutional networks | LinkReview, GitHub, Paper, Slides, Video | Denis Ozherelkov | AIL | Grebenkova Olga (Review)
Filatov Andrey | Intention forecasting. Investigation of the properties of local models in the spatial decoding of brain signals | LinkReview, GitHub, Paper, Slides, Video | Valery Markin | AILPHUXBRCVTEDWS | Hristolubov Maxim (Review)
Islamov Rustem | Analysis of the properties of an ensemble of locally approximating models | LinkReview, GitHub, Paper, Slides, Video | Andrey Grabovoi | AILPHUXBRCVTEDWS | Gunaev Ruslan (Review)
Zholobov Vladimir | Early prediction of sufficient sample size for a generalized linear model. | LinkReview, GitHub, Paper, Slides, Video | Grigory Malinovsky | AILPHUXBRCVTEWSF | Vayser Kirill (Review)
Vayser Kirill | Additive regularization and its meta parameters when choosing the structure of deep learning networks | LinkReview, GitHub, Paper, Slides, Video | Mark Potanin | AILP+HUX+BRCV+TEDWS | Zholobov Vladimir (Review)
Bishuk Anton | Solution of an optimization problem combining classification and regression to estimate the binding energy of a protein and small molecules. | LinkReview, GitHub, Paper, Slides, Video | Maria Kadukova | AILPHUXBRCVTEDH | Filippova Anastasia
Filippova Anastasia | Step detection for IMU navigation via deep learning | LinkReview, GitHub, Paper, Slides, EnglishPaper, Video | Tamaz Gadaev | AIL0PUXBRCVSF | Bishuk Anton (Review)
Savelev Nickolay | Distributed optimization under Polyak-Lojasiewicz conditions | LinkReview, GitHub, Paper, Slides, Video | A. N. Beznosikov | AILPHUXBRCVTEDWS | Khary Alexandra (Review)
Khary Alexandra | Theoretical validity of the application of metric classification methods using dynamic alignment (DTW) to spatiotemporal objects. | LinkReview, GitHub, Paper, Slides, Video | Gleb Morgachev, Alexey Goncharov | AILPHUXBRCVTEDCWS | Savelev Nickolay (Review)
Hristolubov Maxim | Generating features using locally approximating models (Classification of human activities by measurements of fitness bracelets) | LinkReview, GitHub, Paper, Slides, Video | Alexandra Galtseva, Danil Sayranov | AILPH | Filatov Andrey (Review)
Mamonov Kirill | Nonlinear ranking of exploratory information search results. | LinkReview, GitHub, Paper, Slides, Video | Maxim Eremeev | AILPHU+XBRC+V+TEDHWJSF |
Pavlichenko Nikita | Predicting the quality of protein models using spherical convolutions on 3D graphs. | LinkReview, GitHub, Paper, Slides, Video | Sergei Grudinin, Ilya Igashov | AILPUXBRHCVTEDH |
Sodikov Mahmud, Skachkov Daniel | Agnostic neural networks | Code, Paper, Slides, Video | Radoslav Neichev | AILPHUXBRC+VTEDHWJSF | Kulagin Petr (Review)
Gunaev Ruslan | Graph Neural Network in Reaction Yield prediction | LinkReview, Github, Paper, Slides, Video | Philip Nikitin | AILPUXBRHCVTEDHWSF | Islamov Rustem (Review)
Yaushev Farukh | Investigation of ways to match models by reducing the dimension of space | LinkReview, Github, Paper, Slides, Video | Roman Isachenko | AILPUXBRHCVTEDHWJS | Zholobov Vladimir (Review)

51. 2020

  • Name: Analysis of the properties of an ensemble of locally approximating models.
  • Problem: We consider the problem of constructing a universal approximator, a multimodel that consists of a given finite set of local models. Each local model approximates a connected region in the feature space. It is assumed that the set of local models covers the entire space of objects. A convex combination of local models is considered as the aggregating function. The coefficients of the convex combination are given by a function of the object, the gate function.
  • Required: To construct an algorithm for optimizing the parameters of local models and parameters of the gate function. It is required to propose a metric in the space of objects, a metric in the space of models.
  • Data:
    1. Synthetically generated data.
    2. Energy consumption forecasting data. It is proposed to use the following models as local models: working day, day off. (Energy Consumption, Turk Electricity Consumption German Spot Price).
  • References:
    1. Overview of methods for estimating sample size
    2. Vorontsov's lectures on compositions
    3. Vorontsov's lectures on compositions
    4. Yuksel S.E., Wilson J.N., Gader P.D. Twenty Years of Mixture of Experts. IEEE Transactions on Neural Networks and Learning Systems. 2012. Vol. 23. No. 8. P. 1177-1193.
    5. Pavlov K.V. Selection of multilevel models in classification problems, 2012
  • Basic algorithm: As a basic algorithm, it is proposed to use a two-level optimization problem, where local models are optimized at one iteration and at the next iteration, the parameters of the gate function are optimized.
  • Authors: Grabovoi A.V. (consultant), Strijov V.V. (Expert)
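The alternating scheme from the basic algorithm can be sketched for a small multimodel. Under the assumptions made here (linear local models, a softmax gate that is linear in the features, squared error, synthetic data), one half-step fits the local models by least squares with the gate fixed, and the other half-step takes gradient steps on the gate parameters with the local models fixed. The function names and hyperparameters are hypothetical.

```python
import numpy as np

def gate(X, V):
    """Softmax gate: convex combination coefficients for every object."""
    logits = X @ V
    logits -= logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def fit_experts(X, y, P):
    """With the gate fixed, the multimodel is linear in the local model parameters."""
    d, K = X.shape[1], P.shape[1]
    design = np.hstack([P[:, [k]] * X for k in range(K)])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef.reshape(K, d).T  # column k holds the parameters of local model k

def predict(X, W, V):
    return (gate(X, V) * (X @ W)).sum(axis=1)

def fit_mixture(X, y, n_models=2, outer=30, inner=10, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    V = rng.normal(size=(X.shape[1], n_models))
    for _ in range(outer):
        W = fit_experts(X, y, gate(X, V))      # step 1: local models
        for _ in range(inner):                 # step 2: gate function
            P, E = gate(X, V), X @ W
            f = (P * E).sum(axis=1)
            # d f / d logit_k = p_k * (e_k - f) for a softmax gate.
            grad = X.T @ (2.0 * (f - y)[:, None] * P * (E - f[:, None])) / len(y)
            V -= lr * grad
    return fit_experts(X, y, gate(X, V)), V

# Synthetic data with two linear regimes: y = |x| plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, 400)
X = np.column_stack([np.ones_like(x), x])
y = np.abs(x) + 0.05 * rng.normal(size=400)

W, V = fit_mixture(X, y)
moe_mse = float(np.mean((predict(X, W, V) - y) ** 2))
w_lin, *_ = np.linalg.lstsq(X, y, rcond=None)
linear_mse = float(np.mean((X @ w_lin - y) ** 2))
print(f"single linear model MSE: {linear_mse:.4f}, multimodel MSE: {moe_mse:.4f}")
```

Because the local models are refitted exactly at the end, the multimodel is never worse on the training sample than a single linear model; how well the gate separates the regimes depends on the initialization, which is one of the properties the task proposes to analyze.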

54. 2020

  • Problem: It is necessary to determine the approximate coordinates of the center of the pupil. The word "approximate" means that the calculated pupil center must lie inside a circle centered at the pupil's true center with half the true radius. The algorithm must be very fast.

  • Data: About 200 thousand eye images. For each, the position of the true circle is marked - for the purpose of training and testing the method being created.
  • Base algorithm: To speed up work with the image, it is proposed to aggregate the data using brightness projections. Image brightness is a function of two discrete arguments I(x, y). Its projection onto the horizontal axis is P(x)=\sum \limits_y I(x,y). Projections onto inclined axes are constructed similarly. Given several projections (two, four), one can try to determine the position of the pupil (a compact dark area) using heuristics and/or a neural network. It is interesting to evaluate the capabilities of a neural network in this problem.
  • References: Zhi-Hua Zhou, Xin Geng. Projection functions for eye detection // Pattern Recognition. 2004. V.37. N.5. P.1049-1056. https://doi.org/10.1016/j.patcog.2003.09.006
  • Authors: Matveev I.A.
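
The projection P(x) and a simple heuristic on top of it can be sketched as follows. This is an illustration under stated assumptions: the image is synthetic (a dark disk on a bright background), and taking the minimum of each projection as the pupil coordinate is a hypothetical baseline heuristic, not the method of the cited paper.

```python
import numpy as np

def brightness_projections(image):
    """Return P(x) = sum_y I(x, y) and the analogous projection onto the vertical axis."""
    p_x = image.sum(axis=0)  # sum over rows (y) for every column x
    p_y = image.sum(axis=1)  # sum over columns (x) for every row y
    return p_x, p_y

def estimate_pupil_center(image):
    """Heuristic: the pupil is the darkest compact area, so take the projection minima."""
    p_x, p_y = brightness_projections(image)
    return int(np.argmin(p_x)), int(np.argmin(p_y))

# Synthetic eye image: bright background, dark disk (pupil) of radius r at (cx, cy).
h, w = 64, 64
cx, cy, r = 40, 25, 8
yy, xx = np.mgrid[0:h, 0:w]
image = np.full((h, w), 200.0)
image[(xx - cx) ** 2 + (yy - cy) ** 2 <= r ** 2] = 20.0

est_x, est_y = estimate_pupil_center(image)
print("estimated center:", (est_x, est_y))
```

On real images the projections would need smoothing, and eyelashes or shadows can produce competing minima, which is where inclined projections or a neural network may help.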

55. 2020

  • Title: Search for the boundaries of the iris by the method of circular projections
  • Problem: Given a monochrome bitmap of the eye, see examples (https://cloud.mail.ru/public/2DBu/5c6F6e3LC). The approximate position of the center of the pupil is also known. The word "approximate" means that the calculated center of the pupil is no more than half of its true radius from the true one. It is necessary to determine the approximate positions of the circles approximating the pupil and iris. The algorithm must be very fast.
  • Data: About 200 thousand eye images. For each, the position of the true circle is marked - for the purpose of training and testing the method being created.
  • Base algorithm: To speed up image processing, it is proposed to aggregate the data using circular projections of brightness. A circular projection is a function of the radius whose value P(r) equals the integral of the directed image brightness gradient over a circle of radius r (or along an arc of that circle). Example for one arc (right quadrant) and for four arcs. Having built several circular projections, one can try to determine the positions of the inner and outer borders of the iris (a ring) from them using heuristics and/or a neural network. It is interesting to evaluate the capabilities of a neural network in this problem.
  • References: Matveev I.A. Detection of Iris in Image By Interrelated Maxima of Brightness Gradient Projections // Applied and Computational Mathematics. 2010. V.9. N.2. P.252-257. https://www.researchgate.net/publication/228396639_Detection_of_iris_in_image_by_interrelated_maxima_of_brightness_gradient_projections
  • Authors: Matveev I.A.
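
A circular projection can be approximated as in the sketch below. It makes several assumptions: the radial gradient integral is replaced by the difference of mean brightness on neighbouring circles, full circles are used instead of arcs, the center is taken as known exactly, and the eye image is synthetic. Function names are hypothetical.

```python
import numpy as np

def circle_mean(image, cx, cy, r, n_points=360):
    """Mean brightness over pixels sampled along the circle of radius r."""
    t = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
    xs = np.rint(cx + r * np.cos(t)).astype(int)
    ys = np.rint(cy + r * np.sin(t)).astype(int)
    return image[ys, xs].mean()

def circular_projection(image, cx, cy, r_max):
    """Entry i approximates P(r) for r = i + 1 as mean(r + 1) - mean(r)."""
    means = np.array([circle_mean(image, cx, cy, r) for r in range(1, r_max + 2)])
    return np.diff(means)

# Synthetic eye: dark pupil (radius 10), grey iris (radius 25), bright sclera.
size, cx, cy = 80, 40, 40
yy, xx = np.mgrid[0:size, 0:size]
dist2 = (xx - cx) ** 2 + (yy - cy) ** 2
image = np.full((size, size), 200.0)
image[dist2 <= 25 ** 2] = 100.0
image[dist2 <= 10 ** 2] = 20.0

r_max = 35
radii = np.arange(1, r_max + 1)
proj = circular_projection(image, cx, cy, r_max)

# Heuristic: look for the strongest dark-to-bright transition in two radius ranges.
pupil_r = int(radii[:17][np.argmax(proj[:17])])
iris_r = int(radii[17:][np.argmax(proj[17:])])
print("pupil radius:", pupil_r, "iris radius:", iris_r)
```

Splitting the radius range at a fixed index is only for illustration; the cited paper relates the two maxima to each other instead.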

56. 2020

  • Title: Construction of local and universal interpretable scoring models
  • Problem: Build a simple and interpretable scoring system as a superposition of local models, taking into account the requirements for the system to retain knowledge about key customers and features (in other words, take into account new economic phenomena). The model must be a superposition, and each element must be controlled by its own quality criterion. Introduce a schedule for optimizing the structure and parameters of the model: the system must work in a single optimization chain. Propose an algorithm for selecting features and objects.
  • Data:
  1. Data from OTP Bank. The sample contains records of 15,223 clients classified into two classes: 1 - there was a response (1812 clients), 0 - there was no response (13411 clients). Feature descriptions of clients consist of 50 features, which include, in particular, age, gender, social status in relation to work, social status in relation to pension, number of children, number of dependents, education, marital status, branch of work. The data are available at the following addresses: www.machinelearning.ru/wiki/images/2/26/Contest_MMRO15_OTP.rar (sample A), www.machinelearning.ru/wiki/images/5/52/Contest_MMRO15_OTP_(validation).rar (sample B).
  2. Data from Home Credit: https://www.kaggle.com/c/home-credit-default-risk/data
  • References:
    1. Strijov V.V. Error function in regression analysis // Factory Laboratory, 2013, 79(5) : 65-73
    2. Bishop C. M. Linear models for classification / In: Pattern Recognition and Machine Learning. Eds.: M. Jordan, J. Kleinberg, B. Scholkopf. New York: Springer Science+Business Media, 2006, pp. 203-208
    3. Tokmakova A.A. Obtaining Stable Hyperparameter Estimates for Linear Regression Models // Machine Learning and Data Analysis. 2011. No. 2. P. 140-155
    4. S. Scitovski and N. Sarlija. Cluster analysis in retail segmentation for credit scoring // CRORR 5. 2014. 235–245
    5. Goncharov A.V. Building Interpretable Deep Learning Models in the Social Ranking Problem
  • Base algorithm: Iteratively reweighted least squares (described in (2))
  • Solution: It is proposed to build a scoring system containing such a preprocessing block as a block for generating metric features. It is proposed to investigate the influence of the non-equivalence of objects on the selection of features for the model, to investigate the joint selection of features and objects when building a model. It is required to implement a schedule for optimizing the model structure using an algorithm based on the analysis of covariance matrices of model hyperparameters. The schedule includes a phased replenishment of the set of features and objects. The feature sample size will be determined by controlling the error variance. The main criterion for the quality of the system: ROC AUC (Gini).
  • Novelty:
  1. The model structure optimization schedule must satisfy the requirement to rebuild the model at any time without losing its characteristics.
  2. Accounting for the unequal value of objects in the selection of features
  • Authors: Pugaeva I.V. (consultant), Strijov V.V. (Expert)
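
For reference, the base algorithm (iteratively reweighted least squares for logistic regression) can be sketched as below. This is a generic textbook version on synthetic data, not the project's scoring system; the small `ridge` term is an assumption added for numerical stability.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def irls_logistic(X, y, n_iter=25, ridge=1e-6, tol=1e-8):
    """Newton (IRLS) iterations for logistic regression."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        p = sigmoid(X @ w)
        s = p * (1.0 - p)                                # diagonal of the weight matrix
        H = X.T @ (X * s[:, None]) + ridge * np.eye(d)   # weighted Gram matrix
        g = X.T @ (y - p) - ridge * w                    # gradient of the log-likelihood
        step = np.linalg.solve(H, g)
        w = w + step
        if np.linalg.norm(step) < tol:
            break
    return w

# Synthetic two-class sample with a known parameter vector (last column is the intercept).
rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([rng.normal(size=(n, 2)), np.ones(n)])
w_true = np.array([1.5, -2.0, 0.5])
y = (rng.random(n) < sigmoid(X @ w_true)).astype(float)

w_hat = irls_logistic(X, y)
accuracy = np.mean((sigmoid(X @ w_hat) > 0.5) == (y == 1.0))
print("estimated parameters:", w_hat, "accuracy:", accuracy)
```

Each iteration solves a weighted least squares problem, which is why the covariance-like matrix `H` is available at no extra cost for the hyperparameter analysis mentioned in the solution.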

59. 2020

  • Name: Distributed optimization under Polyak-Łojasiewicz conditions
  • Problem description: The problem is to efficiently solve large systems of nonlinear equations using a network of computing nodes.
  • Solution: A new method is proposed for the decentralized distributed solution of systems of nonlinear equations under the Polyak-Łojasiewicz condition. The approach is based on the fact that the distributed optimization problem can be represented as a composite optimization problem (see 2 from the literature), which in turn can be solved by analogs of the similar triangles method or the sliding method (see 2 from the literature).
  • Basic algorithm: The proposed method is compared with gradient descent and accelerated gradient descent
  • References:
    1. Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition https://arxiv.org/pdf/1608.04636.pdf
    2. Linear Convergence for Distributed Optimization Under the Polyak-Łojasiewicz Condition https://arxiv.org/pdf/1912.12110.pdf
    3. Optimal Decentralized Distributed Algorithms for Stochastic Convex Optimization https://arxiv.org/pdf/1911.07363.pdf
    4. Modern numerical optimization methods, universal gradient descent method https://arxiv.org/ftp/arxiv/papers/1711/1711.00394.pdf
  • Novelty: Reduction of a distributed optimization problem to a composite optimization problem and its solution under Polyak-Łojasiewicz conditions
  • Authors: Expert A.B. Gasnikov, consultant A.N. Beznosikov
  • Comment: it is important to set up a computational experiment in this problem, otherwise the problem will be poorly compatible with the course.
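
For orientation, the Polyak-Łojasiewicz condition and the linear rate it gives for plain gradient descent (the first baseline above) can be written as follows. This is the standard statement as in reference 1; the reformulation of a nonlinear system g(x) = 0 through f(x) = ½‖g(x)‖² is an assumption made here for illustration, not taken from the project description.

```latex
% f is L-smooth with minimum value f^*; \mu > 0 is the PL constant.
% Illustrative choice for a system g(x) = 0:  f(x) = \tfrac{1}{2}\,\|g(x)\|^2 .
\frac{1}{2}\,\|\nabla f(x)\|^2 \;\ge\; \mu\,\bigl(f(x) - f^*\bigr)
\quad \text{for all } x,
\qquad
f(x_k) - f^* \;\le\; \Bigl(1 - \frac{\mu}{L}\Bigr)^{k}\,\bigl(f(x_0) - f^*\bigr)
\quad \text{for } x_{k+1} = x_k - \tfrac{1}{L}\,\nabla f(x_k).
```

The condition does not require convexity, which is why it suits nonlinear systems.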

17. 2020

  • Title: Intention forecasting. Investigation of the properties of local models in the spatial decoding of brain signals
  • Problem: When building brain-computer interface systems, simple, stable models are used. An important stage in the construction of such a model is the construction of an adequate feature space. Previously, this problem was solved by extracting features from the frequency characteristics of signals.
  • Data: ECoG/EEG brain signal data sets.
  • References:
    1. Motrenko A.P., Strijov V.V. Multi-way feature selection for ECoG-based brain-computer Interface // Expert systems with applications. - 2018.
    2. Eliseyev A., Aksenova T. Stable and artifact-resistant decoding of 3D hand trajectories from ECoG signals using the generalized additive model //Journal of neural engineering. – 2014.
  • Basic algorithm: The comparison is proposed to be made with the partial least squares algorithm.
  • Solution: In this paper, it is proposed to take into account the spatial dependence between sensors that read data. To do this, it is necessary to locally model the spatial impulse/signal and build a predictive model based on the local description.
  • Novelty: An essentially new way of constructing a feature description in the problem of signal decoding is proposed. Bonus: analysis of changes in the structure of the model, adaptation of the structure when the sample changes.
  • Authors: Strijov V.V., Roman Isachenko - Experts, consultants – Valery Markin, Alina Samokhina

9. 2020

  • Title: Text recognition based on skeletal representation of thick lines and convolutional networks
  • Problem: It is required to build two CNNs, one recognizes a raster representation of an image, the other a vector one.
  • Data: Fonts in raster representation.
  • References: List of works [24], in particular arXiv:1611.03199, and
    1. Goyal P., Ferrara E. Graph embedding techniques, applications, and performance: A survey. arXiv:1705.02801, 2017.
    2. Cai H., Zheng V.W., Chang K.C.-C. A comprehensive survey of graph embedding: Problems, techniques and applications. arXiv:1709.07604, 2017.
    3. Grover A., Leskovec J. node2vec: Scalable Feature Learning for Networks. arXiv:1607.00653, 2016.
    4. Mestetskiy L., Semenov A. Binary Image Skeleton - Continuous Approach // Proceedings 3rd International Conference on Computer Vision Theory and Applications, VISAPP 2008. P. 251-258.
    5. Kushnir O.A., Seredin O.S., Stepanov A.V. Experimental study of regularization parameters and approximation of skeletal graphs of binary images // Machine Learning and Data Analysis. 2014. V. 1. N. 7. P. 817-827.
    6. Zhukova K.V., Reyer I.A. Basic Skeleton Connectivity and Parametric Shape Descriptor // Machine Learning and Data Analysis. 2014. V. 1. N. 10. P. 1354-1368.
    7. Kushnir O., Seredin O. Shape Matching Based on Skeletonization and Alignment of Primitive Chains // Communications in Computer and Information Science. 2015. V. 542. P. 123-136.
  • Basic algorithm: Convolution network for bitmap.
  • Solution: It is required to propose a method for collapsing graph structures, which allows generating an informative description of the thick line skeleton.
  • Novelty: A method is proposed for improving the quality of recognition of thick lines due to a new method for generating their descriptions.
  • Authors: Experts Reyer I.A., Strijov V.V., Mark Potanin, consultant Denis Ozherelkov

60. 2020

  • Title: Variational optimization of deep learning models with model complexity control
  • Problem: The problem of optimizing a deep learning model with a predetermined model complexity is considered. It is required to propose a model optimization method that allows generating new models with a given complexity and low computational costs.
  • Data: MNIST, CIFAR
  • References:
    1. [1] variational inference for neural networks https://papers.nips.cc/paper/4329-practical-variational-inference-for-neural-networks.pdf
    2. [2] hypernets https://arxiv.org/abs/1609.09106
    3. [3] neural fabrics https://papers.nips.cc/paper/6304-convolutional-neural-fabrics.pdf
  • Base algorithm: Random search
  • Solution: The proposed method is to represent a deep learning model as a hypernetwork (a network that generates the parameters of another network) using a Bayesian approach. Probabilistic assumptions about the parameters of deep learning models are introduced, and a variational lower bound on the Bayesian model evidence is maximized. The variational bound is considered as a conditional value depending on an external complexity parameter.
  • Novelty: The proposed method allows generating models in one-shot mode (practically without retraining) with the required model complexity, which significantly reduces the cost of optimization and retraining.
  • Authors: Oleg Bakhteev, Strijov V.V.
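
The variational lower bound mentioned in the solution has the standard form below. The second line is only one possible reading of "conditional on an external complexity parameter": the symbol λ and the dependence of the variational distribution on it are assumptions for illustration, not the authors' formulation.

```latex
% Standard evidence lower bound for parameters w, data (X, y), prior p(w):
\log p(\mathbf{y} \mid \mathbf{X})
\;\ge\;
\mathbb{E}_{q(\mathbf{w})} \log p(\mathbf{y} \mid \mathbf{X}, \mathbf{w})
\;-\;
\mathrm{KL}\bigl(q(\mathbf{w}) \,\|\, p(\mathbf{w})\bigr).

% Hypothetical conditional variant: a hypernetwork maps \lambda to q_\lambda(w).
\mathcal{L}(\lambda)
\;=\;
\mathbb{E}_{q_\lambda(\mathbf{w})} \log p(\mathbf{y} \mid \mathbf{X}, \mathbf{w})
\;-\;
\mathrm{KL}\bigl(q_\lambda(\mathbf{w}) \,\|\, p(\mathbf{w})\bigr).
```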

61. 2020

  • Title: Selecting a deep learning model based on the triplet relationship of model and sample
  • Problem: The problem of one-shot selection of a deep learning model is considered: choosing a model for a specific sample drawn from some general population should not be computationally expensive.
  • Data: MNIST, synthetic data
  • References:
    1. [1] learning model predictions on pairs <sample, model> https://www.ri.cmu.edu/pub_files/2016/10/yuxiongw_eccv16_learntolearn.pdf
    2. [2] Bayesian choice for two domains https://arxiv.org/abs/1806.08672
  • Base algorithm: Random search
  • Solution: It is proposed to consider the space of parameters and the space of models as two domains with their own generative models. To obtain a connection between the domains, a generalization of variational inference to the case of triplet constraints is used.
  • Novelty: New one-shot model training method
  • Authors: Oleg Bakhteev, Strijov V.V.

64. 2020

  • Title: Theoretical validity of the application of metric classification methods using dynamic alignment (DTW) to spatiotemporal objects.
  • Problem description: It is necessary to study the existing theoretical justifications for applying dynamic alignment methods to various objects, and to explore the use of such methods for spatiotemporal series.
    To prove that alignment methods are applicable, one shows that the function generated by the dynamic alignment algorithm is a kernel, which in turn justifies the use of metric classification methods.
  • References:
    1. Overview presentation about DTW
    2. Mercer's theorem
    3. Polynomial dynamic time warping kernel support vector machines for dysarthric speech recognition with sparse training data
    4. Online Signature Verification with New Time Series Kernels for Support Vector Machines
  • Solution: For different formulations of the DTW method (with different internal distance functions between time series samples), find and collect in one place the proofs that the function is a kernel.
    For a basic set of time series datasets (on which the accuracy of distance functions is checked), check that the conditions of Mercer's theorem hold (positive definiteness of the matrix). Do this for various modifications of the DTW distance function (Sakoe-Chiba band, Itakura band, weighted DTW).
  • Novelty: Investigation of theoretical justifications for applying the dynamic alignment algorithm (DTW) and its modifications to space-time series.
  • Authors: Strijov V.V. - Expert, Gleb Morgachev, Alexey Goncharov - consultants.
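
The empirical part of the check (positive definiteness of the matrix on a given dataset) can be sketched as follows. Assumptions: classical DTW with absolute difference as the internal distance, an exponential kernel exp(-gamma * DTW) as one candidate construction, and random-walk series in place of a real dataset. A negative smallest eigenvalue shows that the Mercer condition fails on this sample; a non-negative one proves nothing in general.

```python
import numpy as np

def dtw_distance(a, b):
    """Classical DTW distance between two one-dimensional series."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])   # internal distance between samples
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

def dtw_kernel_matrix(series, gamma=1.0):
    """Gram matrix K_ij = exp(-gamma * DTW(s_i, s_j)), a candidate kernel."""
    n = len(series)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = dtw_distance(series[i], series[j])
    return np.exp(-gamma * D)

def min_eigenvalue(K):
    """Smallest eigenvalue of a symmetric matrix."""
    return float(np.linalg.eigvalsh(K).min())

# Synthetic random-walk series of different lengths.
rng = np.random.default_rng(0)
series = [np.cumsum(rng.normal(size=rng.integers(15, 25))) for _ in range(12)]

K = dtw_kernel_matrix(series, gamma=0.1)
lam_min = min_eigenvalue(K)
print("smallest eigenvalue of the Gram matrix:", lam_min)
```

Band constraints (Sakoe-Chiba, Itakura) would restrict which cells `(i, j)` of `cost` are filled; weighted DTW would change `d`.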

66. 2020

  • Title: Agnostic neural networks
  • Problem description: Introduce a metric space into the problem of automatic construction (selection) of agnostic networks.
  • Data: Data from the reinforcement learning area, preferably of the cars-on-a-track type.
  • References:
    1. (!) Kulunchakov A.S., Strijov V.V. Generation of simple structured Information Retrieval functions by genetic algorithm without stagnation // Expert Systems with Applications, 2017, 85 : 221—230.
    2. A. A. Varfolomeeva The choice of features when marking bibliographic lists by methods of structural learning, 2013, [25]
    3. Bin Cao, Ying Li and Jianwei Yin Measuring Similarity between Graphs Based on the Levenshtein Distance, 2012, [26]
    4. https://habr.com/ru/post/465369/
    5. https://weightagnostic.github.io/
  • Base algorithm: Networks from the arXiv article. Symbolic regression from the article in ESwA (the code needs to be restored).
  • Solution: We create a model generator in the framework of symbolic regression. We create a model generator as a variational autoencoder (we won't have time during the course). We study the metric properties of sample spaces (Euclidean) and model spaces (Banach). We create a GAN pair, a generator and a discriminator, for predicting the structures of predictive models.
  • Novelty: So far, no one has succeeded. This was discussed with Tommi Jaakkola when he visited us at Yandex; he has not succeeded yet either.
  • Authors: Expert Strijov V.V., Radoslav Neichev - consultant

13. 2020

  • Title: Deep learning for RNA secondary structure prediction
  • Problem: RNA secondary structure is an important feature that defines the functional properties of RNA. Its importance is illustrated by the fact that it is evolutionarily conserved and that some types of functional RNAs always have the same secondary structure; for example, all tRNAs fold into a cloverleaf. Since secondary structure often defines function, knowing the secondary structure of an RNA may help to investigate the functions of novel RNA molecules. RNA folding is not as easy as DNA folding, because RNA is a single-stranded molecule that forms complicated base-pairing interactions, while DNA mostly exists as fully base-paired double helices. Current methods of RNA structure prediction rely on experimentally evaluated thermodynamic rules, but with thermodynamics alone only 80% of structures can be accurately predicted. We propose an AI-driven method for predicting RNA secondary structure inspired by the neural machine translation model.
  • Data: RNA sequences in form of strings of characters
  • References: https://arxiv.org/abs/1609.08144
  • Base algorithm: https://www.ncbi.nlm.nih.gov/pubmed/16873527
  • Solution: Deep learning recurrent encoder-decoder model with attention
  • Novelty: RNA secondary structure prediction currently remains an unsolved problem, and to the best of our knowledge a DL approach has never been introduced in the literature before
  • Authors: consultant Maria Popova, Alexander Isaev (we are waiting for a response from them; without a response the problem will be removed)

65. 2020

  • Title: Approximation of low-dimensional samples by heterogeneous models
  • Problem description: The problem of knowledge transfer (Hinton's distillation, Vapnik's privileged learning) from one network to another is investigated.
  • Data: UCI samples, see what samples are used in papers on this topic
  • References:
    1. Neichev's diploma thesis "Informative a priori assumptions in the privileged learning problem", presentation
    2. Hinton's works on knowledge distillation; pay attention to the error functions
  • Base algorithm: described in the work of Neichev
  • Novelty: Exploring different sampling methods
  • Solution: Try the different models covered in the lectures, from non-parametric to deep ones; compare and visualize the likelihood functions
  • Authors: consultants Mark Potanin, (ask Andrey Grabovoi for help) Strijov V.V.
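
Since the references ask to pay attention to the error functions, here is a sketch of a commonly used form of the distillation loss: a weighted sum of cross-entropy with the true labels and cross-entropy with the teacher's temperature-softened outputs, the latter scaled by T². The weighting `alpha`, the temperature `T`, and the toy logits are assumptions for illustration; Neichev's base algorithm may use a different error function.

```python
import numpy as np

def softmax(logits, T=1.0):
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """alpha * CE(labels, student) + (1 - alpha) * T^2 * CE(teacher_T, student_T)."""
    n = student_logits.shape[0]
    # Hard part: usual cross-entropy with the true labels.
    p_student = softmax(student_logits)
    hard = -np.log(p_student[np.arange(n), labels] + 1e-12).mean()
    # Soft part: cross-entropy with the teacher's softened distribution.
    soft_targets = softmax(teacher_logits, T)
    log_soft_student = np.log(softmax(student_logits, T) + 1e-12)
    soft = -(soft_targets * log_soft_student).sum(axis=1).mean()
    return float(alpha * hard + (1.0 - alpha) * T ** 2 * soft)

# Toy example: a student that copies the teacher versus one that contradicts it.
teacher = np.array([[4.0, 1.0, 0.0], [0.0, 3.0, 1.0]])
labels = np.array([0, 1])
good_student = teacher.copy()
bad_student = teacher[:, ::-1].copy()

loss_good = distillation_loss(good_student, teacher, labels)
loss_bad = distillation_loss(bad_student, teacher, labels)
print("loss (agrees with teacher):", loss_good, "loss (disagrees):", loss_bad)
```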

67. 2020

  • Title: Selection of topics in topic models for exploratory information retrieval.
  • Problem description: Test the hypothesis that when searching for similar documents by their topic vectors, not all topics are informative, so discarding some topics can increase the precision and recall of the search. Consider the alternative hypothesis that instead of discarding topics, one can compare vectors by a weighted cosine similarity measure with adjustable weights.
  • Data: Text collections of sites habr.com and techcrunch.com. Labeled selections: queries and related documents.
  • References:
    1. Vorontsov K. V. Probabilistic Topic Modeling: An Overview of Models and Additive Regularization.
    2. Ianina A., Vorontsov K. Regularized Multimodal Hierarchical Topic Model for Document-by-Document Exploratory Search // FRUCT ISMW, 2019.
  • Base algorithm: The topic model with regularizers and modalities described in the article (source code available).
  • Novelty: The question of the informativeness of topics for vector search of thematically related documents has not been studied before.
  • Solution: Evaluate the individual informativeness of topics by discarding them one at a time; then sort the topics by individual informativeness and determine the threshold for cutting off non-informative topics. A suggestion as to why this should work: background topics are not informative, and discarding them increases search precision and recall by a few percent.
  • Authors: Vorontsov K. V., consultant Anastasia Yanina.
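
The two hypotheses can be related through one formula: a weighted cosine similarity with weight 0 on a topic is the same as discarding that topic. A minimal sketch follows; the topic vectors and weights are made-up numbers, and `weighted_cosine` is a hypothetical helper name.

```python
import numpy as np

def weighted_cosine(a, b, w):
    """Cosine similarity of topic vectors a and b with non-negative topic weights w."""
    num = np.sum(w * a * b)
    den = np.sqrt(np.sum(w * a * a)) * np.sqrt(np.sum(w * b * b))
    return float(num / den) if den > 0 else 0.0

# Topic vectors of two documents over three topics (illustrative values).
doc_a = np.array([0.5, 0.3, 0.2])
doc_b = np.array([0.1, 0.3, 0.6])

# Unit weights give the ordinary cosine similarity.
plain = weighted_cosine(doc_a, doc_b, np.ones(3))

# Zero weight on the third topic is equivalent to discarding it.
dropped = weighted_cosine(doc_a, doc_b, np.array([1.0, 1.0, 0.0]))
reference = weighted_cosine(doc_a[:2], doc_b[:2], np.ones(2))
print("plain:", plain, "third topic discarded:", dropped)
```

With adjustable weights, topic selection becomes a continuous optimization over `w` instead of a search over subsets.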

68. 2020

  • Title: Meta-learning of topic classification models.
  • Problem description: Develop universal heuristics for a priori assignment of modality weights in topic models for text classification.
  • Data: Description of datasets, Folder with datasets.
  • References:
    1. Vorontsov K. V. Probabilistic Topic Modeling: An Overview of Models and Additive Regularization.
  • Base algorithm: Topic classification models for several datasets.
  • Novelty: In topic modeling, the problem of automatic selection of modality weights has not yet been solved.
  • Solution: Optimize the weights of modalities according to the quality criterion of text classification. Investigate the dependence of the optimal relative weights of modalities on the dimensional characteristics of the problem. Find formulas for estimating the initial values of modality weights without explicitly solving the problem. To reproduce datasets, apply sampling of fragments of source documents.
  • Authors: Vorontsov K. V., consultant Yulian Serdyuk.

70. 2020

  • Name: Investigation of the structure of the target space when building a predictive model
  • Problem description: The problem of forecasting a complex target variable is studied. Complexity means the presence of dependencies (linear or non-linear). It is assumed that the initial data are heterogeneous: the spaces of the independent and target variables are of different nature. It is required to build a predictive model that takes into account the dependencies in the source space of the independent variable as well as in the space of the target variable.
  • Data: Heterogeneous data: picture - text, picture - speech and so on.
  • Basic algorithm: As basic algorithms, it is proposed to use a linear model, as well as a nonlinear neural network model.
  • Authors: Strijov V.V. - Expert, consultant: Isachenko Roman.

71. 2020

  • Name: Investigation of ways to match models by reducing the dimension of space
  • Problem description: The problem of predicting a complex target variable is investigated. Complexity means the presence of dependencies (linear or non-linear). It is proposed to study ways to take into account dependencies in the space of the target variable, as well as the conditions under which these dependencies affect the quality of the final predictive model.
  • Data: Synthetic data with known data generation hypothesis.
  • Basic algorithm: As basic algorithms, it is proposed to use space dimensionality reduction methods (PCA, PLS, autoencoder) and linear matching models.
  • Authors: Strijov V.V. - Expert, consultant: Isachenko Roman.

72. 2020

  • Name: Construction of a single latent space in the problem of modeling heterogeneous data.
  • Problem description: The problem of predicting a complex target variable is investigated. Complexity means the presence of dependencies (linear or non-linear). It is proposed to build a single latent space for the independent and target variables. Model matching is proposed to be carried out in the resulting low-dimensional space.
  • Data: Heterogeneous data: picture - text, picture - speech and so on.
  • Basic algorithm: As basic algorithms, it is proposed to use space dimensionality reduction methods (PCA, PLS, autoencoder) and linear matching models.
  • Authors: Strijov V.V. - Expert, consultant: Isachenko Roman.

73. 2020

  • Title: Nonlinear ranking of exploratory information search results.
  • Problem description: Develop an algorithm for recommending the reading order of documents (reading order, reading list) found using exploratory information retrieval. Documents should be ranked from simple to complex, from general to specific, that is, in the order in which it will be easier for the user to understand a new subject area for him. The algorithm must build a reading graph - a partial order relation on the set of found documents; in particular, it can be a collection of trees (document forest).
  • Data: Part of Wikipedia and reference reading graph derived from Wikipedia categories.
  • References:
    1. Vorontsov K. V. Probabilistic Topic Modeling: An Overview of Models and Additive Regularization.
    2. Georgia Koutrika, Lei Liu, and Steven Simske. Generating reading orders over document collections. HP Laboratories, 2014.
    3. James G. Jardine. Automatically generating reading lists. Cambridge, 2014.
  • Base algorithm: described in the article G.Koutrika.
  • Novelty: The problem has been little studied in the literature. Regularized multimodal topic models (ARTM, BigARTM) have never been applied to this problem.
  • Solution: The use of ARTM topic models in conjunction with estimates of the cognitive complexity of the text.
  • Authors: Vorontsov K. V., consultant Maxim Eremeev.

2019

Author | Topic | Links | Consultant | Reviewer
Severilov Pavel | The problem of searching characters in texts | LinkReview, code, paper, slides, video | Murat Apishev |
Grigoriev Alexey | Text recognition based on skeletal representation of thick lines and convolutional networks | LinkReview, code, paper, slides, video | Ilya Zharikov | review Varenyk Natalia
Grishanov Alexey | Automatic configuration of BigARTM parameters for a wide class of problems | LinkReview, code, paper, slides, video | Viktor Bulatov | review Gerasimenko Nikolay
Yusupov Igor | Dynamic alignment of multivariate time series | LinkReview, code, paper, slides, video | Alexey Goncharov |
Varenyk Natalia | Spherical CNN for QSAR prediction | LinkReview, code, paper, slides, video | Maria Popova | review Grigoriev Alexey
Beznosikov Alexander | Z-learning of linearly-solvable Markov Decision Processes | LinkReview, paper, code, slides, video | Yury Maximov |
Panchenko Svyatoslav | Obtaining a simple sample at the output of the neural network layer | LinkReview, code, paper, slides | Gadaev Tamaz |
Veselova Evgeniya | Deep Learning for reliable detection of tandem repeats in 3D protein structures | Code, link review, paper, slides, video | Guillaume Pages, Sergei Grudinin |
Aminov Timur | Quality Prediction for a Feature Selection Procedure | LinkReview, code, paper, slides | Roman Isachenko |
Markin Valery | Investigation of the properties of local models in the spatial decoding of brain signals | LinkReview, code, paper, slides, video | Roman Isachenko |
Abdurahmon Sadiev | Generation of features using locally approximating models | LinkReview, code, paper, slides, video | Anastasia Motrenko |
Tagir Sattarov | Machine translation training without parallel texts. | LinkReview, code, paper, slides, video | Oleg Bakhteev |
Gerasimenko Nikolay | Thematic search for similar cases in the collection of acts of arbitration courts. | LinkReview, code, paper, slides, video | Ekaterina Artyomova | review Grishanov Alexey

40. 2019

  • Title: Quality prediction for the feature selection procedure.
  • Problem description: The solution of the feature selection problem is reduced to enumeration of binary cube vertices. This procedure cannot be performed for a sample with a large number of features. It is proposed to reduce this problem to optimization in a linear space.
  • Data: Synthetic data + simple samples
  • References:
    1. Bertsimas D. et al. Best subset selection via a modern optimization lens // The Annals of Statistics. 2016. V. 44. N. 2. P. 813-852.
    2. Luo R. et al. Neural architecture optimization // Advances in Neural Information Processing Systems. 2018. P. 7827-7838.
  • Base algorithm: Popular feature selection methods.
  • Solution: In this paper, it is proposed to build a model that, based on a set of features, predicts the quality on a test sample. To do this, a mapping of a binary cube into a linear space is constructed. After that, the quality of the model in linear space is maximized. To reconstruct the solution of the problem, the model of inverse mapping into a binary cube is used.
  • Novelty: A constructively new approach to solving the problem of choosing models is proposed.
  • Authors: Strijov V.V., Tetiana Aksenova, consultant – Roman Isachenko

42. 2019

  • Title: Z-learning of linearly-solvable Markov Decision Processes
  • Problem: Adapt Z-learning from [1] to the case of Markov Decision Process discussed in [2] in the context of energy systems. Compare it with standard (in reinforcement learning) Q-learning.
  • Data: We consider a Markov Process described via transition probability matrix. Given initial state vector (probability of being in a state at time zero), we generate data for the time evolution of the state vector. See [2] for an exemplary process describing evolution of an ensemble of energy consumers.
  • References:
    1. E. Todorov. Linearly-solvable Markov decision problems https://homes.cs.washington.edu/~todorov/papers/TodorovNIPS06.pdf
    2. Ensemble Control of Cycling Energy Loads: Markov Decision Approach. Michael Chertkov, Vladimir Y. Chernyak, Deepjyoti Deka. https://arxiv.org/abs/1701.04941
    3. Csaba Szepesvári. Algorithms for Reinforcement Learning. https://sites.ualberta.ca/~szepesva/papers/RLAlgsInMDPs.pdf
  • Base algorithm: The principal comparison should be made with Q-learning described in [3]
  • Solution: We suppose that plugging the algorithm from [1] directly into [2] gives a faster and more reliable solution.
  • Novelty: In the area of power systems there is a huge demand for fast reinforcement learning algorithms, but such algorithms are still lacking (in particular, ones that respect the physics and the underlying graph)
  • Authors: Yury Maximov (consultant, expert), Michael Chertkov (expert)
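
A tabular Z-learning loop in the spirit of [1] can be sketched as below. Assumptions: a first-exit formulation with a single terminal state, a toy random-walk chain as passive dynamics (not the energy-consumer ensemble of [2]), state costs `q`, and a decaying step size chosen for illustration. The update `z(i) <- (1 - alpha) z(i) + alpha * exp(-q(i)) * z(j)` uses transitions sampled from the passive dynamics; the result is compared with the exact desirability function obtained from the linear equation z = exp(-q) P z.

```python
import numpy as np

# Toy chain: states 0..n-1, the last state is terminal.
n = 6
terminal = n - 1
P = np.zeros((n, n))                 # passive (uncontrolled) dynamics
for i in range(terminal):
    P[i, max(i - 1, 0)] += 0.5       # step left (or stay at the left border)
    P[i, i + 1] += 0.5               # step right
P[terminal, terminal] = 1.0
q = np.full(n, 0.2)                  # state costs
q[terminal] = 0.0                    # so that z(terminal) = exp(-0) = 1

# Exact desirability z for non-terminal states from the linear system.
G = np.diag(np.exp(-q[:terminal]))
A = np.eye(terminal) - G @ P[:terminal, :terminal]
b = G @ P[:terminal, terminal]
z_exact = np.append(np.linalg.solve(A, b), 1.0)

def z_learning(P, q, terminal, n_episodes=3000, seed=0):
    """Stochastic approximation of z from transitions of the passive dynamics."""
    rng = np.random.default_rng(seed)
    n_states = len(q)
    z = np.ones(n_states)
    visits = np.zeros(n_states)
    for _ in range(n_episodes):
        i = int(rng.integers(terminal))          # start in a random non-terminal state
        while i != terminal:
            j = int(rng.choice(n_states, p=P[i]))
            visits[i] += 1
            alpha = 20.0 / (20.0 + visits[i])    # decaying step size (assumption)
            z[i] = (1.0 - alpha) * z[i] + alpha * np.exp(-q[i]) * z[j]
            i = j
    return z

z_hat = z_learning(P, q, terminal)
print("exact z:  ", np.round(z_exact, 3))
print("learned z:", np.round(z_hat, 3))
```

The optimal cost-to-go is recovered as v = -log z; a Q-learning baseline on the same chain would need an explicit action set, which this sketch does not define.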

1. 2019

  • Title: Forecasting the direction of movement of the price of exchange instruments according to the news flow.
  • Problem description: Build and explore a model for predicting the direction of price movement. Given: 1. A set of news S and a set of timestamps T corresponding to the publication times of the news from S. 2. A time series P corresponding to the price of an exchange instrument, and a time series V corresponding to the sales volume of this instrument, over a period of time T'. 3. The set T is a subset of the time period T'. 4. Time intervals w=[w0, w1], l=[l0, l1], d=[d0, d1], where w0 < w1=l0 < l1=d0 < d1. It is required to predict the direction of movement of the price of an exchange instrument at the time t=d0 from the news released in the period w.
  • Data:
    1. Financial data: data on quotes (at one tick interval) of several financial instruments (GAZP, SBER, VTBR, LKOH) for the 2nd quarter of 2017 from the Finam.ru website; for each point of the series, the date, time, price and volume are known.
    2. Text data: economic news for the 2nd quarter of 2017 from Forexis; each news is a separate html file.
  • References:
    1. Usmanova K.R., Kudiyarov S.P., Martyshkin R.V., Zamkovoy A.A., Strijov V.V. Analysis of relationships between indicators in forecasting cargo transportation // Systems and Means of Informatics, 2018, 28(3).
    2. Kuznetsov M.P., Motrenko A.P., Kuznetsova M.V., Strijov V.V. Methods for intrinsic plagiarism detection and author diarization // Working Notes of CLEF, 2016, 1609 : 912-919.
    3. Aysina Roza Munerovna, Thematic modeling of financial flows of corporate clients of a bank based on transactional data, final qualification work.
    4. Lee, Heeyoung, et al. "On the Importance of Text Analysis for Stock Price Prediction." LREC. 2014.
  • Base algorithm: Method used in the article (4).
  • Solution: Using topic modeling (ARTM) and local approximation models to translate a sequence of texts corresponding to different timestamps into a single feature description. Quality criterion: F1-score, ROC AUC, profitability of the strategy used.
  • Novelty: To substantiate the relationship between the time series, the convergent cross mapping method is proposed.
  • Authors: Ivan Zaputlyaev (consultant), Strijov V.V., K.V. Vorontsov (Experts)

3. 2019

  • Title: Dynamic alignment of multidimensional time series.
  • Problem description: A characteristic multidimensional time series is the trajectory of a point in 3-dimensional space. The two trajectories need to be optimally aligned with each other. For this, the distance DTW between two time series is used. In the classical representation, DTW is built between one-dimensional time series. It is necessary to introduce various modifications of the algorithm for working with high-dimensional time series: trajectories, corticograms.
  • Data: The data describes 6 classes of time series from the mobile phone's accelerometer. https://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group274/Goncharov2015MetricClassification/data/
  • References:
    1. Multidimensional DTW: https://pdfs.semanticscholar.org/76d3/5bd5a52453ebde80faaa1467d7effd74426f.pdf
  • Base algorithm: Using L_p distances between two dimensions of a time series, their modifications.
  • Solution: Study distances that are invariant to a change of coordinate order and distances that are sensitive to it. Experiment with other types of distances (cosine, RBF, others).
  • Novelty: There is no complete review and study of methods for working with multivariate time series. The dependence of the quality of the solution on the selected distances between measurements has not been studied.
  • Authors: Alexey Goncharov - consultant, Expert, Strijov V.V. - Expert
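
The base algorithm above can be sketched as follows: a classical DTW recursion in which the cost of matching two measurements is an L_p distance between their coordinate vectors. This is a minimal illustration, not the project's code; the function names `lp_distance` and `dtw_distance` and the default p = 2 are assumptions made here.

```python
def lp_distance(x, y, p=2.0):
    """L_p distance between two measurements (points of equal dimension)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def dtw_distance(s, t, p=2.0):
    """DTW cost between two series of d-dimensional points.

    cost[i][j] holds the cheapest alignment of the first i points of s
    with the first j points of t.
    """
    n, m = len(s), len(t)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = lp_distance(s[i - 1], t[j - 1], p)
            # Allowed steps: stretch s, stretch t, or advance both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Two 3-dimensional trajectories; the second one repeats its first point.
a = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (2.0, 2.0, 2.0)]
b = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (2.0, 2.0, 2.0)]
print(dtw_distance(a, b))
```

An L_p pointwise distance does not change when the coordinates of both series are permuted in the same way, so it belongs to the order-invariant family mentioned in the Solution item.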

43. 2019

  • Title: Getting a simple sample at the output of the neural network layer
  • Problem: The output of the neural network is usually a generalized linear model over the outputs of the penultimate layer. It is necessary to propose a way to test the simplicity of the sample and its compliance with the generalized linear model (linear regression, logistic regression) using a system of statistical criteria.
  • Data: For the computational experiment, it is proposed to use classical samples from the UCI repository. Link to samples https://github.com/ttgadaev/SampleSize/tree/master/datasets
  • References: http://www.ccas.ru/avtorefe/0016d.pdf, pp. 49-63; Bishop, C. 2006. Pattern Recognition and Machine Learning. Berlin: Springer.
  • Base algorithm: White test, Wald test, Goldfeld-Quandt test, Durbin-Watson, Chi-square, Jarque-Bera, Shapiro-Wilk
  • Solution: A system of tests that checks the simplicity of the sample (and the adequacy of the model): the independent variables are not random, the dependent variables are distributed normally or binomially, there are no gaps or outliers, the classes are balanced, the sample is approximated by a single model, and the variance of the error function does not depend on the independent variable. The study is based on synthetic and real data.
  • Authors: Gadaev T. T. (consultant) Strijov V.V., Grabovoi A.V. (Experts)
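
As an illustration of one criterion from the list above, the sketch below computes the Durbin-Watson statistic on the residuals of a one-dimensional least-squares fit. The helper names `ols_residuals` and `durbin_watson` and the toy data are assumptions made for this example; it computes only the statistic, not the critical values needed for a formal decision.

```python
def ols_residuals(x, y):
    """Residuals of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    of the residuals divided by their sum of squares.

    Values near 2 suggest no first-order autocorrelation; values near 0
    or 4 suggest positive or negative autocorrelation. Undefined when
    all residuals are zero.
    """
    num = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.1, 0.9, 2.2, 2.8, 4.1]
print(durbin_watson(ols_residuals(x, y)))
```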

14. 2019

  • Title: Deep Learning for reliable detection of tandem repeats in 3D protein structures more in PDF
  • Problem: Deep learning algorithms pushed computer vision to a level of accuracy comparable to or higher than that of human vision. Similarly, we believe that it is possible to recognize the symmetry of a 3D object with a very high reliability, when the object is represented as a density map. The optimization problem includes i) multiclass classification of 3D data. The output is the order of symmetry. The number of classes is ~10-20 ii) multioutput regression of 3D data. The output is the symmetry axis (a 3-vector). The input data are typically 24x24x24 meshes. The total number of these meshes is on the order of a million. Biological motivation: Symmetry is an important feature of protein tertiary and quaternary structures that has been associated with protein folding, function, evolution, and stability. Its emergence and ensuing prevalence have been attributed to gene duplications, fusion events, and subsequent evolutionary drift in sequence. Methods to detect these symmetries exist, based either on the structure or on the sequence of the proteins; however, we believe that they can be vastly improved.
  • Data: Synthetic data are obtained by ‘symmetrizing’ folds from top8000 library (http://kinemage.biochem.duke.edu/databases/top8000.php).
  • References: Our previous 3D CNN: [27] Invariance of CNNs (and references therein): 01630265/document, [28]
  • Basic algorithm: A prototype has already been created using the Tensorflow framework [4], which is capable of detecting the order of cyclic structures with about 93% accuracy. The main goal of this internship is to optimize the topology of the current neural network prototype and make it rotational and translational invariant with respect to input data. [4] [29]
  • Solution: The network architecture needs to be modified according to the invariance properties (most importantly, rotational invariance). Please see the links below [30], [31]. The code is written using the Tensorflow library, and the current model is trained on a single GPU (Nvidia Quadro 4000) of a desktop machine.
  • Novelty: Applications of convolutional networks to 3D data are still very challenging due to the large amount of data and specific requirements to the network architecture. More specifically, the models need to be rotationally and translationally invariant, which makes classical 2D augmentation tricks loosely applicable here. Thus, new models need to be developed for 3D data.
  • Authors: Expert Sergei Grudinin, consultants Guillaume Pages

46. 2019

  • Name: The problem of searching characters in texts
  • Problem description: In the simplest case, the problem reduces to sequence labeling on a labeled sample. The difficulty lies in obtaining a sufficient amount of training data: a larger sample has to be obtained from the existing small Expert markup (automatically by searching for patterns, or by compiling a simple, high-quality markup instruction, for example, for Toloka). Once markup is available, one can experiment with selecting the optimal model; various neural network architectures (BiLSTM, Transformer, etc.) may be of interest here.
  • Data: Dictionary of symbols, Marked artistic texts
  • References: http://www.machinelearning.ru/wiki/images/0/05/Mmta18-rnn.pdf
  • Basic algorithm: HMM, RNN
  • Solution: It is proposed to compare the work of several state-of-the-art algorithms. Propose a classifier quality metric for characters (character/non-character). Determine applicability of methods.
  • Novelty: The proposed approach to text analysis is used by Experts in manual mode and has not been automated
  • Authors: M. Apishev (consultant), D. Lemtyuzhnikova

47. 2019

  • Title: Deep learning for RNA secondary structure prediction
  • Problem: RNA secondary structure is an important feature which defines RNA functional properties. Its importance can be illustrated by the fact that it is evolutionarily conserved and some types of functional RNAs always have the same secondary structure; for example, all tRNAs fold into a cloverleaf. As secondary structure often defines function, knowing an RNA's secondary structure may help investigate the functions of novel RNA molecules. RNA folding is not as easy as DNA folding, because RNA is a single-stranded molecule which forms complicated base-pairing interactions, while DNA mostly exists as fully base-paired double helices. Current methods of RNA structure prediction rely on experimentally evaluated thermodynamic rules, but with thermodynamics alone only 80% of structures can be accurately predicted. We propose an AI-driven method for predicting RNA secondary structure inspired by the neural machine translation model.
  • Data: RNA sequences in form of strings of characters
  • References: https://arxiv.org/abs/1609.08144
  • Base algorithm: https://www.ncbi.nlm.nih.gov/pubmed/16873527
  • Solution: Deep learning recurrent encoder-decoder model with attention
  • Novelty: Currently, RNA secondary structure prediction remains an unsolved problem, and to the best of our knowledge a DL approach has not been introduced in the literature before.
  • Authors: consultant Maria Popova Chapel-Hill

4. 2019

  • Title: Automatic setting of ARTM parameters for a wide class of problems.
  • Problem description: The bigARTM open library allows you to build topic models using a wide class of possible regularizers. However, this flexibility makes the problem of setting the coefficients very difficult. This tuning can be greatly simplified by using the relative regularization coefficients mechanism and automatic selection of N-grams. We need to test the hypothesis that there is a universal set of relative regularization coefficients that gives "reasonably good" results on a wide class of problems. Several datasets are given with some external quality criterion (for example, classification of documents into categories or ranking). We find the best parameters for a particular dataset, giving the "locally best model". We then find a bigARTM initialization algorithm that produces topic models with quality comparable to the "locally best model" on its dataset. Comparability criterion: on this dataset, the quality of the "universal model" is no more than 5% worse than that of the "locally best model".
  • Data: Victorian Era Authorship Attribution Data Set, uci.edu/ml/datasets/Twenty+Newsgroups 20 Newsgroups, ICD-10, search/ranking triplets.
  • References:
    1. WRC by Nikita Doykov: http://www.machinelearning.ru/wiki/images/9/9f/2015_417_DoykovNV.pdf
    2. Presentation by Viktor Bulatov at a scientific seminar: https://drive.google.com/file/d/19pJ21LRPeeOxY4mkcSnQCRm93zOO4J5b/view
    3. Draft with formulas: https://drive.google.com/open?id=1AqS7snUsSJ18ZYBtC-6uP_2dMTDJSGeD
  • Base algorithm: PLSA / LDA / logregression.
  • Solution: bigARTM with background themes and smoothing, sparseness and decorrelation regularizers (coefficients picked up automatically), as well as automatically selected N-grams.
  • Novelty: The need for automated tuning of model parameters and the lack of such implementations in the scientific community.
  • Authors: consultant Viktor Bulatov, Expert Vorontsov K. V.

50. 2019

  • Title: Thematic search for similar cases in the collection of acts of arbitration courts.
  • Problem description: Build an information retrieval algorithm for a collection of acts of arbitration courts. The request can be an arbitrary document of the collection (the text of the act). The search result should be a list of documents in the collection, ranked in descending order of relevance.
  • Data: collection of text documents — acts of arbitration courts http://kad.arbitr.ru.
  • References:
    1. Anastasia Yanina. Thematic exploratory information search. 2018. FIVT MIPT.
    2. Ianina A., Golitsyn L., Vorontsov K. Multi-objective topic modeling for exploratory search in tech news. AINL-2017. CCIS, Springer, 2018.
    3. Ahmed El-Kishky, Yanglei Song, Chi Wang, Clare Voss, Jiawei Han. Scalable Topical Phrase Mining from Text Corpora. 2015.
  • Base algorithm: BigARTM with decorrelation, smoothing, sparse regularizers. Search by TF-IDF of words, by TF-IDF of UPA links, by thematic vector representations of documents, using a cosine proximity measure. TopMine algorithm for collocation detection.
  • Solution: Add modality of links to legal acts. Add modality of legal terms. Choose the optimal number of topics and regularization strategy. Organize the process of marking pairs of documents. Implement the evaluation of the quality of the search for a labeled sample of pairs of documents.
  • Novelty: The first attempt to use ARTM for thematic search of legal texts.
  • Authors: consultant Ekaterina Artyomova, Expert Vorontsov K. V.
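
One ingredient of the base algorithm above, ranking documents by cosine proximity of their TF-IDF vectors, can be sketched as follows. This is only an illustration under assumptions: documents are taken to be non-empty token lists, the smoothed IDF formula is one common variant chosen here, and the names `tfidf_vectors`, `cosine` and `rank_similar` are hypothetical. It does not involve BigARTM or TopMine.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Sparse TF-IDF vectors (dicts) for a list of tokenized documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            # Smoothed IDF keeps every weight strictly positive.
            term: (count / len(doc)) * (math.log((1 + n) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v)


def rank_similar(query_index, vectors):
    """Indices of the other documents, most similar to the query first."""
    scored = [(cosine(vectors[query_index], v), i)
              for i, v in enumerate(vectors) if i != query_index]
    return [i for _, i in sorted(scored, key=lambda pair: (-pair[0], pair[1]))]


docs = [
    ["court", "ruling", "contract"],
    ["contract", "dispute", "court"],
    ["tax", "audit", "penalty"],
]
vecs = tfidf_vectors(docs)
print(rank_similar(0, vecs))
```

The same ranking function could be applied to topic vectors of documents instead of TF-IDF vectors, since it only needs a sparse vector per document.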

2019 Group 2

Author | Topic | Links | Consultant | Reviewer
Vishnyakova Nina | Optimal Approximation of Non-linear Power Flow Problem | LinkReview paper code presentation video | Yury Maximov | Loginov Roman review
Kudryavtseva Polina | Intention forecasting. Building an optimal signal decoding model for modeling a brain-computer interface. | code LinkReview paper video presentation | Roman Isachenko | Nechepurenko Ivan review
Loginov Roman | Multi-simulation as a universal way to describe a general sample | code LinkReview paper ChatInvite presentation video | Alexander Aduenko | Makarov Mikhail review
Mikhail Makarov | Location determination by accelerometer signals | code LinkReview paper presentation video | Anastasia Motrenko | Cherepkov Anton: review
Kozinov Alexey | The problem of finding characters in images | LinkReview paper code | M. Apishev, D. Lemtyuzhnikova | Gracheva Anastasia review
Buchnev Valentin | Early prediction of sufficient sample size for a generalized linear model. | LinkReview paper code presentation video | Grabovoi Andrey |
Nechepurenko Ivan | Multisimulation, privileged training | code, paper, LinkReview presentation | R. G. Neichev | Kudryavtseva Polina
Gracheva Anastasia | Estimation of binding energy of protein and small molecules | code paper LinkReview presentation video | Sergei Grudinin, Maria Kadukova |
Cherepkov Anton | Privileged learning in the problem of iris boundary approximation | paper, slides, code, LinkReview video | R. G. Neichev | Lepekhin Mikhail preliminary review
Lepekhin Mikhail | Creation of ranking models for information retrieval systems. Algorithm for Predicting the Structure of Locally Optimal Models | code LinkReview paper presentation video | Andrey Kulunchakov | Vishnyakova Nina, review
Gridasov Ilya | Automatic construction of a neural network of optimal complexity | LinkReview paper Presentation code | O. Yu. Bakhteev, Strijov V.V. | Buchnev Valentin
Telenkov Dmitry | Brain signal decoding and intention prediction | LinkReview git The paper Presentation code | Andrey Zadayanchuk |

18. 2019

  • Title: Forecasting intentions. Building an optimal signal decoding model for modeling a brain-computer interface.
  • Problem: The Brain Computer Interface (BCI) allows you to help people with disabilities regain their mobility. According to the available description of the device signal, it is necessary to simulate the behavior of the subject.
  • Data: Data sets of ECoG/EEG brain signals.
  • References:
    • Motrenko A.P., Strijov V.V. Multi-way feature selection for ECoG-based brain-computer Interface // Expert systems with applications. - 2018.
  • Basic algorithm: It is proposed to compare with the partial least squares algorithm.
  • Solution: In this work, it is proposed to build a single system that solves the problem of signal decoding. As stages of building such a system, it is proposed to solve the problems of data preprocessing, feature space extraction, dimensionality reduction and selection of a model of optimal complexity. It is proposed to use the tensor version of PLS with feature selection.
  • Novelty: In the formulation of the problem, the complex nature of the signal is taken into account: a continuous trajectory of movement, the presence of discrete structural variables (fingers or joint movement), the presence of continuous variables (position of a finger or limb).
  • Authors: Strijov V.V., Tetiana Aksenova, consultant – Roman Isachenko

41. 2019

  • Title: Optimal Approximation of Non-linear Power Flow Problem
  • Problem: Our goal is to approximate the solution of the non-linear non-convex optimal power flow problem by solving a sequence of convex optimization problems (the trust region approach). To this end, we propose to compare various approaches to the approximate solution of this problem, with adaptive approximation of the power flow non-linearities by a sequence of quadratic and/or piece-wise linear functions.
  • Data: Matpower module from MATLAB contains all necessary test cases. Start considering IEEE 57 bus case.
  • References:
    1. Molzahn, D. K., & Hiskens, I. A. (2019). A survey of relaxations and approximations of the power flow equations. Foundations and Trends in Electric Energy Systems, 4(1-2), 1-221. https://www.nowpublishers.com/article/DownloadSummary/EES-012
    2. The QC Relaxation: A Theoretical and Computational Study on Optimal Power Flow. Carleton Coffrin ; Hassan L. Hijazi; Pascal Van Hentenryck https://ieeexplore.ieee.org/abstract/document/7271127/
    3. Convex Relaxations in Power System Optimization: A Brief Introduction. Carleton Coffrin and Line Roald. https://arxiv.org/pdf/1807.07227.pdf
    4. Optimal Adaptive Linearizations of the AC Power Flow Equations. Sidhant Misra, Daniel K. Molzahn, and Krishnamurthy Dvijotham https://molzahn.github.io/pubs/misra_molzahn_dvijotham-adaptive_linearizations2018.pdf
  • Base algorithm: A set of algorithms described in [1] should be used for comparison; details of the proposed method will be shared by the consultant (a draft of the paper).
  • Solution: To assess the quality of the solution, we propose to compare it with the ones given by IPOPT and by numerous relaxations, and to do some reverse engineering of our method.
  • Novelty: The OPF is a truly hot topic in power systems and is of great interest to the discrete optimization community (as a general QCQP problem). Any advance in this area is of great interest to the community.
  • Authors: Yury Maximov (consultant and expert), Michael Chertkov (expert)
  • Notes: the problem has both the computational and the theoretical focuses, so 2 students are ok to work on this topic

2. 2019

  • Title: Investigation of reference objects in the problem of metric classification of time series.
  • Problem description: The DTW function is the distance between two time series that can be non-linearly warped relative to each other. It looks for the best alignment between two objects, so it can be used in a metric object classification problem. One of the methods for solving the problem of metric classification is measuring distances to reference objects and using the vector of these distances as an indicative description of the object. The DBA method is an algorithm for constructing centroids (reference objects) for time series based on the DTW distance. When plotting the distance between the time series and the centroid, different pairs of values (eg peak values) are more specific to one of the classes, and the impact of such coincidences on the distance value should be higher.

It is necessary to explore various ways of constructing reference objects, as well as determining their optimal number. The criterion is the quality of the metric classifier in the problem. In the DBA method, for each centroid, it is proposed to create a weight vector that reflects the "significance" of the measurements of the centroid, and to use it in the modified weighted-DTW distance function.

Literature research and a combination of up-to-date methods.

  • Novelty: There has not been a comprehensive study of various methods of constructing centroids and reference elements along with the choice of their optimal number.
  • Authors: Alexey Goncharov - consultant, Expert, Strijov V.V. - Expert
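
The weighted-DTW idea described above can be sketched as follows for one-dimensional series: every measurement of a centroid gets a weight, and the pointwise cost of matching that measurement is multiplied by it. The vector of such distances to several centroids then serves as the feature description of a series. The names `weighted_dtw` and `distance_features`, the absolute-difference cost and the toy weights are assumptions made for this illustration; how the weights are learned is not shown.

```python
def weighted_dtw(centroid, series, weights):
    """DTW cost where matching centroid[i] costs weights[i] * |difference|."""
    n, m = len(centroid), len(series)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = weights[i - 1] * abs(centroid[i - 1] - series[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def distance_features(series, centroids, weight_vectors):
    """Feature description of a series: its distances to all centroids."""
    return [weighted_dtw(c, series, w) for c, w in zip(centroids, weight_vectors)]


centroid = [0.0, 5.0, 0.0]   # reference object with a peak in the middle
series = [0.0, 4.0, 0.0]
uniform = [1.0, 1.0, 1.0]    # plain DTW
peaked = [1.0, 3.0, 1.0]     # the peak is three times more "significant"
print(distance_features(series, [centroid, centroid], [uniform, peaked]))
```

With uniform weights the function coincides with ordinary DTW, so the effect of the weight vector can be checked against the unweighted baseline.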

7. 2019

  • Title: Privileged learning in the iris boundary approximation problem
  • Problem: Based on the image of the human eye, determine the circles approximating the inner and outer border of the iris.
  • Data: Bitmap monochrome images, typical size 640*480 pixels (however other sizes are possible)[32], [33].
  • References:
    1. Aduenko A.A. Selection of multimodels in classification problems (supervisor Strijov V.V.). Moscow Institute of Physics and Technology, 2017. [34]
    2. K.A. Gankin, A.N. Gneushev, I.A. Matveev Segmentation of the iris image based on approximate methods with subsequent refinements // Izvestiya RAN. Theory and control systems, 2014, no. 2, p. 78–92.
    3. Duda, R. O. Use of the Hough transformation to detect lines and curves in pictures / R. O. Duda, P. E. Hart // Communications of the ACM. 1972 Vol. 15, no. 1.Pp.
  • Basic algorithm: Efimov Yury. Search for the outer and inner boundaries of the iris in the eye image using the paired gradient method, 2015.
  • Solution: See iris_circle_problem.pdf
  • Novelty: A fast non-enumerative algorithm for approximating boundaries using linear multimodels is proposed. Additionally, capsule neural networks.
  • consultant: Radoslav Neichev (by Strijov V.V., Expert Matveev I.A.)

44. 2019

  • Name: Early prediction of sufficient sample size for a generalized linear model.
  • Problem: The problem of experiment design is investigated: a sufficient sample size has to be estimated from the data. The sample is assumed to be simple, that is, described by an adequate model; in other words, the sample is generated by a fixed probabilistic model from a known class of models. The sample size is considered sufficient if the model is restored with sufficient confidence. Given the model, it is required to estimate a sufficient sample size at the early stages of data collection.
  • Data: For the computational experiment, it is proposed to use classical samples from the UCI repository. Link to samples https://github.com/ttgadaev/SampleSize/tree/master/datasets
  • References:
    1. [Overview of methods for estimating sample size]
    2. http://svn.code.sf.net/p/mlalgorithms/code/PhDThesis/.
    3. Bootstrap method. https://projecteuclid.org/download/pdf_1/euclid.aos/1.

    4. Bishop, C. 2006. Pattern Recognition and Machine Learning. Berlin: Springer.

  • Basic algorithm: We will say that the sample size is sufficient if the log-likelihood has a small variance, on a sample of size m calculated using bootstrap.

We are trying to approximate the dependence of the average value of log-likelihood and its variance on the sample size.

  • Solution: The methods described in the review are asymptotic or require a deliberately large sample size. The new method should predict the sufficient sample size at the early stages of experiment design, i.e. when data is scarce.
  • Authors: Grabovoi A.V. (consultant), Gadaev T. T., Strijov V.V. (Experts)
  • Note: to determine the simplicity of the sample, a new definition of complexity is proposed (Sergey Ivanychev). This is a separate work, +1 problem 44a (? Katruza).
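
The basic algorithm of this problem, a bootstrap estimate of the mean and variance of the log-likelihood at a given sample size m, can be sketched as follows. The sketch rests on several assumptions that are not in the description: a univariate normal model fitted by maximum likelihood, the average log-likelihood per object instead of the total, and the hypothetical names `mean_loglik_normal` and `bootstrap_loglik`.

```python
import math
import random


def mean_loglik_normal(sub):
    """Average log-likelihood per object of a normal model at its MLE.

    At the maximum-likelihood estimates it has the closed form
    -0.5 * log(2 * pi * var) - 0.5; the variance is floored to avoid log(0).
    """
    m = len(sub)
    mu = sum(sub) / m
    var = max(sum((v - mu) ** 2 for v in sub) / m, 1e-12)
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5


def bootstrap_loglik(sample, m, n_boot=200, seed=0):
    """Mean and variance of the log-likelihood over bootstrap subsamples of size m."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_boot):
        sub = [rng.choice(sample) for _ in range(m)]  # draw with replacement
        values.append(mean_loglik_normal(sub))
    mean = sum(values) / n_boot
    var = sum((v - mean) ** 2 for v in values) / (n_boot - 1)
    return mean, var


data_rng = random.Random(1)
sample = [data_rng.gauss(0.0, 1.0) for _ in range(300)]
for m in (5, 20, 80, 200):
    print(m, bootstrap_loglik(sample, m))
```

Plotting the two returned values against m gives the dependence that the problem proposes to approximate; a sample size could then be called sufficient once the variance falls below a chosen threshold, which is left unspecified here.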

15. 2019

  • Title: Formulation and solution of an optimization problem combining classification and regression to estimate the binding energy of a protein and small molecules. The problem description [35]
  • Problem: From a bioinformatics point of view, the problem is to estimate the free energy of protein binding to a small molecule (ligand): the best ligand in its best position has the lowest free energy of interaction with the protein. (A longer description follows; see the file at the link above.)
  • Data:
    1. Data for binary classification. Approximately 12,000 protein-ligand complexes: for each of them there is 1 native position and 18 non-native ones. The main descriptors are histograms of distributions of distances between different atoms of the protein and ligand, the dimension of the vector of descriptors is ~ 20,000. In the case of continued research and publication in a specialized journal, the set of descriptors can be expanded. The data will be provided as binary files with a python script to read.
    2. Data for regression. For each of the presented complexes, the value of the quantity is known, which can be interpreted as the binding energy.
  • References:
    1. SVM [36]
    2. Ridge Regression [37]
    3. [38] (section 1)
  • Basic algorithm: [39] In the classification problem, we used an algorithm similar to linear SVM; its relationship with the energy estimate lies beyond the scope of the classification problem and is described in the article above. Various loss functions can be used in the regression problem.
  • Solution: It is necessary to connect the previously used optimization problem with the regression problem and solve it using standard methods. Cross-validation will be used to check the operation of the algorithm. There is a separate test set consisting of (1) 195 complexes of proteins and ligands, for which it is necessary to find the best ligand pose (the algorithm for obtaining ligand positions differs from that used in training), (2) complexes of proteins and ligands, for whose native poses it is necessary to predict the binding energy, and (3) 65 proteins for which the most strongly binding ligand is to be found.
  • Novelty: First of all, the interest lies in combining classification and regression problems. The correct assessment of the quality of protein and ligand binding is used in drug development to search for molecules that interact most strongly with the protein under study. Using the classification problem described above to predict the binding energy results in an insufficiently high correlation of predictions with experimental values, while using the regression problem alone leads to overfitting.
  • Authors: Sergei Grudinin, Maria Kadukova

27. 2019

  • Title: Creation of ranking models for information retrieval systems. Algorithm for Predicting the Structure of Locally Optimal Models
  • Problem: It is required to predict a time series using some parametric superposition of algebraic functions. It is proposed not to build the forecasting model but to predict it, that is, to predict the structure of the approximating superposition. A class of considered superpositions is introduced, and on the set of such structural descriptions, a search is made for a locally optimal model for the problem under consideration. The problem consists in 1) searching for a suitable structural description of the model, 2) describing the search algorithm for the structure that corresponds to the optimal model, 3) describing the algorithm for the inverse construction of the model from its structural description. For an existing example of answers to questions 1-3, see the works of A. A. Varfolomeeva.
  • Data:
    1. Collection of text documents TREC (!)
    2. A set of time series, which implies the restoration of functional dependencies. It is proposed to first use synthetic data or immediately apply the algorithm to forecasting time series 1) electricity consumption 2) physical activity with subsequent analysis of the resulting structures.
  • References:
    1. (!) Kulunchakov A.S., Strijov V.V. Generation of simple structured Information Retrieval functions by genetic algorithm without stagnation // Expert Systems with Applications, 2017, 85: 221–230.
    2. A. A. Varfolomeeva Selection of features when marking up bibliographic lists using structural learning methods, 2013, [40]
    3. Bin Cao, Ying Li and Jianwei Yin Measuring Similarity between Graphs Based on the Levenshtein Distance, 2012, [41]
  • Base algorithm: Described in [1]. Developed in the work of the 974 group team. It is proposed to use their code and experiment.
  • Solution: It is proposed to repeat the experiment of A. A. Varfolomeeva for a different structural description in order to understand what is happening. The superposition of algebraic functions defines a directed tree whose vertices carry the labels of the corresponding algebraic functions or variables. Therefore, the structural description of such a superposition can be its DFS-code: a string of vertex labels written in the order in which the tree is traversed by depth-first search. Knowing the arities of the corresponding algebraic functions, we can decode any such DFS-code in O(n) and get back the superposition of functions. On the set of such string descriptions, it is proposed to search for the string description that corresponds to the optimal model.
  • Authors: consultant Andrey Kulunchakov (Inria Montbonnot), Expert Strijov V.V.
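
The DFS-code described in the Solution item can be illustrated with a short sketch: a tree is written as the list of its vertex labels in depth-first order and restored in one pass using the arities of the functions. The label names, the arity table and the tuple representation of the tree are assumptions made for this example; labels absent from the arity table are treated as variables.

```python
def decode_dfs(code, arity):
    """Rebuild a tree (label, child, child, ...) from its DFS-code in O(n)."""
    pos = 0

    def build():
        nonlocal pos
        if pos >= len(code):
            raise ValueError("DFS-code ended before the tree was complete")
        label = code[pos]
        pos += 1
        # A function consumes as many subtrees as its arity; variables consume none.
        children = tuple(build() for _ in range(arity.get(label, 0)))
        return (label,) + children

    tree = build()
    if pos != len(code):
        raise ValueError("DFS-code has trailing labels")
    return tree


def encode_dfs(tree):
    """DFS-code of a tree: labels in depth-first (pre-order) traversal."""
    label, *children = tree
    code = [label]
    for child in children:
        code.extend(encode_dfs(child))
    return code


arity = {"plus": 2, "times": 2, "sin": 1}            # hypothetical function set
code = ["plus", "sin", "x", "times", "x", "y"]       # sin(x) + x * y
tree = decode_dfs(code, arity)
print(tree)
print(encode_dfs(tree) == code)
```

Because every label is read exactly once, decoding is linear in the length of the code, and a string that does not describe a complete tree is rejected with an error.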

26. 2019

  • Title: Accelerometer positioning
  • Problem: Given are the initial coordinates, accelerometer signals, and additional information (gyroscope and magnetometer signals). A possibly inaccurate map may also be given (the SLAM problem).
  • Data: from [1], self-collected data.
  • References:
    1. https://arxiv.org/pdf/1712.09004.pdf
    2. https://ieeexplore.ieee.org/document/1528431
  • Basic algorithm: from [1].
  • Solution: Search for a priori and additional information that improves positioning accuracy.
  • Novelty: Statement of the problem in terms of Projection to Latent Spaces
  • Authors: consultant Anastasia Motrenko, Expert Ilya Gartseev , Strijov V.V.

45. 2019

  • Name: The problem of searching characters in images
  • Problem description: In one of its formulations, this problem can be reduced to two sequential operations: 1) searching for objects in the image and determining their class, 2) searching the database for information about the symbolic meaning of the found objects. The main difficulty in solving the problem lies in the search for objects in the image. However, the subsequent classification may also be difficult because the image of the object may be incomplete, unusually stylized, and the like.
  • Data: Dictionary of Symbols, Museum Sites, Image-net
  • References:
    1. http://www.machinelearning.ru/wiki/images/e/e2/IDP18.pdf (p. 116)
    2. http://www.image-net.org
  • Basic algorithm: CNN
  • Solution: It is proposed to compare the work of several state-of-the-art algorithms. Suggest a quality metric for searching and classifying objects. Determine applicability of methods.
  • Novelty: The proposed image analysis approach is used by Experts in manual mode and has not been automated
  • Authors: M. Apishev (consultant), D. Lemtyuzhnikova

28. 2019

  • Name: Multi-simulation as a universal way to describe a general sample
  • Problem description: Build a method for incremental refinement of the multimodel structure when new objects appear. Development and comparison of different algorithms for updating the structure of multimodels. Construction of an optimal scheme for refining the structure of a multimodel depending on the total sample size.
  • Data: At the initial stage of work, synthetic data with a known statistical structure is used. Testing of the developed methods is carried out on real data from the UCI repository.
  • References:
    1. Bishop, Christopher M. "Pattern recognition and machine learning." Springer, New York (2006).
    2. Gelman, Andrew, et al. Bayesian data analysis, 3rd edition. Chapman and Hall/CRC, 2013.
    3. MacKay, David JC. "The evidence framework applied to classification networks." Neural computation 4.5 (1992): 720-736.
    4. Aduenko A. A. "Choice of multimodels in classification problems" Ph.D. thesis
    5. Motrenko, Anastasiya, Strijov V.V., and Gerhard-Wilhelm Weber. "Sample size determination for logistic regression." Journal of Computational and Applied Mathematics 255 (2014): 743-752.
  • Basic algorithm: Algorithm for constructing adequate multi-models from #4.
  • Solution: Bayesian approach to the problem of choosing models based on evidence. Analysis of the properties of evidence and its relationship with statistical significance.
  • Novelty: A method is proposed for constructing an optimal scheme for updating the structure of a multimodel when new objects appear. The relationship between evidence and statistical significance for some classes of models has been studied.
  • Authors: Strijov Vadim Viktorovich, Aduenko Alexander Alexandrovich (GMT-5)

11. 2019

48. 2019

49. 2019

  • Name: Brain signal decoding and intention prediction
  • Problem description: It is required to build a model that restores the movement of the limbs according to the corticogram.
  • Data: neurotycho.org [9] (or fingers)
  • References:
    1. Neichev R.G., Katrutsa A.M., Strijov V.V. Selection of the optimal set of features from a multicorrelated set in the forecasting problem. Zavodskaya Lab. Materials Diagnostics, 2016, 82(3) : 68-74. [10]
    2. Isachenko R.V., Strijov V.V. Quadratic Programming Optimization with Feature Selection for Non-linear Models // Lobachevskii Journal of Mathematics, 2018, 39(9) : 1179-1187. article
  • Basic algorithm: Partial Least Squares[11]
  • Solution: Create a feature selection algorithm alternative to PLS and taking into account the non-orthogonal feature interdependence structure.
  • Novelty: A feature selection method is proposed that takes into account the regularities of both the independent variable and the dependent variable. Bonus: Explore changes in model structure as the nature of the sample changes.
  • Authors: Andrey Zadayanchuk, Strijov V.V.

2018

Title Links Team
(Example) Metric classification of time series Code,

LinkReview, Discussion

Alexey Goncharov*, Maxim Savinov
Forecasting the direction of movement of the price of exchange instruments according to the news flow Code,

LinkReview, Slides, Report

Alexander Borisov,

Drobin Maxim, Govorov Ivan, Mukhitdinova Sofia, Valentin Rodionov, Valentin Akhiyarov

Construction of reference objects for a set of multidimensional time series Code

LinkReview

Iskhakov Rishat,

Korepanov Georgy, Solodnev Samirkhanov Danil

Dynamic alignment of multivariate time series Code

LinkReview Slides Report

Gleb Morgachev,

Vladislav Smirnov, Tatiana Lipnitskaya

Automatic adjustment of ARTM parameters for a wide class of problems Code,

LinkReview, Presentation

Golubeva Tatiana,

Ivanova Ekaterina, Matveeva Svetlana, Trusov Anton, Tsaritsyn Mikhail, Chernonog Vyacheslav

Finding paraphrases Code,

LinkReview

Stas Okrug, Nikita Mokrov

Fedor Kitashov, Polina Proskura, Natalia Basimova, Roman Krasnikov, Akhmedkhan Shabanov

On conformational changes of proteins using collective motions in torsion angle space and L1 regularization Code,

LinkReview Presentation

Ryabinina Raisa, Emtsev Daniil
Privileged training in the problem of approximating the borders of the iris Code,

LinkReview

Pavel Fedosov, Alexey Gladkov,

Genrikh Kenigsberger, Ivan Korostelev, Nikolay Balakin

Generation of features using locally approximating models Code,

LinkReview

Ibrahim Kurashov, Nail Gilmutdinov,

Albert Mulyukov, Valentin Spivak

Text recognition based on skeletal representation of thick lines and convolutional networks Code, LiteratureReview, Slides, report Kutsevol Polina

Lukoyanov Artem Korobov Nikita Boyko Alexander Litovchenko Leonid Valukov Alexandr Badrutdinov Kamil Yakushevskiy Nikita Valyukov Nikolay Tushin Kirill

Comparison of neural network and continuous-morphological methods in the problem of text detection Code, LinkReview, Discussion, Presentation Gaiduchenko Nikolay

Torlak Artyom Akimov Kirill Mironova Lilia Gonchar Daniel

Automatic construction of a neural network of optimal complexity Code, LinkReview, report, slides Nikolai Goryan

Alexander Ulitin Tovkes Artem Taranov Sergey Gubanov Sergey Krinitsky Konstantin Zabaznov Anton Valery Markin

Machine translation training without parallel texts. Code,

LinkReview, Report, Slides

Alexander Artemenkov

Angelina Yaroshenko Andrey Stroganov Egor Skidnov Anastasia Borisova Ryabov Fedor Mazurov Mikhail

Deep learning for RNA secondary structure prediction Code

Link Review

Dorokhin Semyon

Pastukhov Sergey Pikunov Andrey Nesterova Irina Anna chat

Deep Learning for reliable detection of tandem repeats in 3D protein structures Code

Link Review

Veselova Evgeniya
Formulation and solution of an optimization problem combining classification and regression to estimate the binding energy of a protein and small molecules Code

Link Review

Merkulova Anastasia

Plumite Elvira Zhiboyedova Anastasia chat

Estimation of the optimal sample size for research in medicine Code

Link Review

Artemy Kharatyan,

Mikhail Mikheev, Evgin Alexander, Seppar Alexander, Konoplyov Maxim, Murlatov Stanislav, Makarenko Stepan

Intention forecasting. Investigation of the properties of local models in the spatial decoding of brain signals Code,

LinkReview, Presentation

Natalia Bolobolova,

Alina Samokhina, Shiyanov Vadim

Intention forecasting. Building an optimal signal decoding model for modeling a brain-computer interface. Code,

LinkReview, Presentation, Article

Ivan Nasedkin, Galiya Latypova,

Nestor Sukhodolsky, Alexander Shemenev Ivan Borodulin,

Investigation of the dependence of the quality of recognition of ontological objects on the depth of hyponymy. Code,

Report, LinkReview, Presentation

Vyacheslav Rezyapkin, Alexey Russkin,

Victoria Dochkina, Miron Kuznetsov, Yarmoshyk Demyan

Comparison of the quality of end-to-end trainable models in the problem of answering questions in a dialogue, taking into account the context Code

LinkReview Report, Presentation

Agafonov Alexey, Ryakin Ilya,Litvinenko Vladimir,

Khokhlov Ivan, Velikovsky Nikita, Anufrienko Oleg

High order convex optimization methods Code,

LinkReview, Slides

Selikhanovich Daniel,

Sokolov Igor

Fractal analysis and synthesis of optical images of sea waves code,

LinkReview, Presentation Report

Kanygin Yuri
Entropy maximization for various types of image transformations code,

LinkReview, report, slides

Nikita Voskresensky,

Alisa Shabalina, Yaroslav Murzaev, Alexey Khokhlov, Alexey Kazakov, Olga Gribova, Alexander Belozertsev

Automatic detection and recognition of objects in images code,

code_A, Slides_for_demo, Report2018Project25_30 Report2018Project25_31 slides_30 slides_25_31 LinkReview

Julia Demidova

Ivan Razumov Vladislav Tominin Yaroslav Tominin Nikita Dudorov Leonid Erlygin Proshutinsky Dmitry Baimakov Vladimir Zubkov Alexander Chernenkova Elena

Location determination by accelerometer signals Code,

LinkReview, Slides, Text

Elvira Zainulina

Fateev Dmitry Vitaly Protasov Nikita Bozhedomov

Multimodelling as a universal way to describe a general sample Code,

Linkreview, Slides, Report

Vladimir Kachanov

Evgenia Strelkova

Cross-Language Document Extractive Summarization with Neural Sequence Model Code,

Linkreview, problem29_Report.pdf Report, Slides

Pavel Zakharov

Pavel Kvasha Evgeny Dyachkov Evgeny Petrov Ilya Selnitsky

Pairwise energy matrix construction for inverse folding problem Code,

LinkReview Report Slides

Rubinshtein Alexander
Smooth orientation-dependent scoring function Code

https://github.com/Intelligent-Systems-Phystech/2018-Project-SBROD

Noskova Elizaveta

Kachkov Sergey Sidorenko Anton

5. 2018

  • Title: Finding paraphrases.
  • Problem description: Paraphrases are different variations of one and the same text, identical in meaning but differing lexically and grammatically, for example: "Where did the car go" and "Which direction did the car go". The problem of detecting paraphrases is to select clusters in a set of texts such that each cluster contains only paraphrases of one and the same sentence. The easiest way to extract paraphrases is to cluster texts, where each text is represented by a "bag of words".
  • Data: There are open datasets of questions for testing and training on kaggle.com, there are open datasets for testing from semeval conferences.
  • Base algorithm: Use one of the document clustering algorithms to extract paraphrases, where each document is represented by a bag of words or tf-idf.
  • Solution: Use neural network architectures to search for paraphrases, use phrases extracted with parsers as features, use multilevel clustering.
  • Novelty: There are no implementations for the Russian language that use parsers for this problem; all current solutions are quite "simple".
  • Authors: Artyom Popov.
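
The bag-of-words baseline described above can be sketched as follows. This is an illustration under stated assumptions: the greedy single-link grouping, the similarity threshold of 0.5 and the function names are choices made for the example, not part of the project statement.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-cased word counts of a text."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    num = sum(a[w] * b[w] for w in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0


def cluster_paraphrases(texts, threshold=0.5):
    """Greedy grouping: a text joins the first cluster that already holds
    a text whose similarity to it is at least the threshold."""
    bags = [bag_of_words(t) for t in texts]
    clusters = []
    for i, bag in enumerate(bags):
        for cluster in clusters:
            if any(cosine(bag, bags[j]) >= threshold for j in cluster):
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


texts = [
    "Where did the car go",
    "Which direction did the car go",
    "What is the capital of France",
]
print(cluster_paraphrases(texts))
```

The two example sentences from the problem description share four of their words and land in one cluster, while the unrelated question stays alone. Replacing raw counts with tf-idf weights changes only `bag_of_words`.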

6. 2018

  • Title: On conformational changes of proteins using collective motions in torsion angle space and L1 regularization.
  • Problem description: Torsion angles are the most natural degrees of freedom for describing motions of polymers, such as proteins. This is because bond lengths and bond angles are heavily constrained by covalent forces. Thus, multiple attempts have been made to describe protein dynamics in the torsion angle space. For example, one of us has developed an elastic network model (ENM) [1] in torsion angle space called Torsional Network Model (TNM) [2]. Functional conformational changes in proteins can be described in the Cartesian space using just a subset of collective coordinates [3], or even a sparse representation of these [4]. The latter requires a solution of a LASSO optimization problem [5]. The goal of the current project is to study if a sparse subset of collective coordinates in the torsion subspace can describe functional conformational changes in proteins. This will require a solution of a ridge regression problem with an L1 regularization constraint. The starting point will be the LASSO formulation.
  • Data: Experimental conformations will be extracted from the Protein Docking Benchmark v5 (https://zlab.umassmed.edu/benchmark/) and a few others. The TNM model can be downloaded from https://ub.cbm.uam.es/tnm/tnm_soft_main.php
  • References:
    1. Tirion MM. (1996) Large Amplitude Elastic Motions in Proteins from a Single-Parameter, Atomic Analysis. Phys Rev Lett. 77:1905–1908.
    2. Mendez R, Bastolla U. (2010) Torsional network model: normal modes in torsion angle space better correlate with conformation changes in proteins. Phys Rev Lett. 104:228103.
    3. SwarmDock and the use of normal modes in protein-protein docking. IH Moal, PA Bates - International journal of molecular sciences, 2010
    4. Modeling protein conformational transition pathways using collective motions and the LASSO method. TW Hayes, IH Moal - Journal of chemical theory and computation, 2017
    5. https://en.wikipedia.org/wiki/Lasso_(statistics)
    6. E. Frezza, R. Lavery, Internal normal mode analysis (iNMA) applied to protein conformational flexibility, Journal of Chemical Theory and Computation 11 (2015) 5503–5512.
  • Base algorithm: The starting point will be a combination of methods from references 2 and 4. It has to be a LASSO formulation with the direction vectors reconstructed from the internal coordinates. The quality will be computed based on the RMSD measure between the prediction and the solution on several benchmarks. Results will be presented with statistical plots (see examples in references 3-4).
  • Novelty: This is an important and open question in computational structural bioinformatics - how to efficiently represent transitions between protein structures. Not much has been done in the torsional angle subspace (internal coordinates)[6] and nearly nothing has been done using L1 regularization [4].
  • Authors: Ugo Bastolla on the torsional subspace (https://ub.cbm.uam.es/home/ugo.php), Sergei Grudinin on L1 minimization (https://team.inria.fr/nano-d/team-members/sergei-grudinin/)
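
The LASSO formulation mentioned above can be sketched with coordinate descent and soft thresholding. This is a hedged illustration, not the method of references 2 or 4: the objective 0.5 * ||d - sum_j c_j m_j||^2 + lam * ||c||_1, the names `modes` and `displacement`, and the toy orthonormal modes are all assumptions made for the example.

```python
import math


def soft_threshold(value, lam):
    """Shrink value toward zero by lam; values within [-lam, lam] become 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_coordinate_descent(modes, displacement, lam, n_sweeps=200):
    """Minimize 0.5 * ||displacement - sum_j c_j * modes[j]||^2 + lam * ||c||_1."""
    coeffs = [0.0] * len(modes)
    residual = list(displacement)
    norms = [sum(v * v for v in m) for m in modes]
    for _ in range(n_sweeps):
        for j, mode in enumerate(modes):
            if norms[j] == 0.0:
                continue
            # Correlation of mode j with the residual, excluding its own contribution.
            rho = sum(m * r for m, r in zip(mode, residual)) + norms[j] * coeffs[j]
            new_c = soft_threshold(rho, lam) / norms[j]
            delta = new_c - coeffs[j]
            if delta != 0.0:
                residual = [r - delta * m for r, m in zip(residual, mode)]
                coeffs[j] = new_c
    return coeffs, residual


def rmsd(residual, n_atoms):
    """Root-mean-square deviation per atom of a flattened coordinate residual."""
    return math.sqrt(sum(r * r for r in residual) / n_atoms)


# Toy example (assumption): one atom, three orthonormal collective modes.
modes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
displacement = [2.0, 0.05, 0.0]
coeffs, residual = lasso_coordinate_descent(modes, displacement, lam=0.1)
print(coeffs, rmsd(residual, n_atoms=1))
```

With the penalty `lam = 0.1` the small second amplitude is set exactly to zero, which is the sparsity effect the project wants to test for torsion-space modes.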

10. 2018

  • Title: Comparison of neural network and continuous-morphological methods in the problem of text detection (Text Detection).
  • Problem: Automatically Detect Text in Natural Images.
  • Data: Synthetic generated data + prepared sample of photos + COCO-Text dataset + Competition Avito 2014.
  • References: COCO benchmark, one of the state-of-the-art architectures
  • Base algorithm: code + morphological methods, Avito 2014 winner’s solution.
  • Solution: It is proposed to compare the performance of several state-of-the-art algorithms that need a large training set with morphological methods that require a small amount of data. It is proposed to determine the limits of applicability of certain methods.
  • Novelty: propose an algorithm based on the use of both neural network and morphological methods (solution of the word detection problem).
  • Authors: I. N. Zharikov.
  • Expert: L. M. Mestetsky (morphological methods).

16. 2018

  • Title: Estimate of the optimal sample size for research in medicine
  • Problem: In conditions of an insufficient number of expensive measurements, it is required to predict the optimal size of the replenished sample.
  • Data: Samples of measurements in medical diagnostics, in particular, a sample of immunological markers.
  • References:
    1. Motrenko A.P. Materials on algorithms for estimating the optimal sample size in the MLAlgorithms repository [45], p/mlalgorithms/code/Group874/Motrenko2014KL/.
  • Basic algorithm: A series of empirical sample size estimation algorithms.
  • Solution: Investigation of the properties of the parameter space when replenishing the sample.
  • Novelty: A new methodology for sample size forecasting is proposed, justified in terms of classical and Bayesian statistics.
  • Authors: A.M. Katrutsa, Strijov V.V., coordinator Tamaz Gadaev

19. 2018

  • Name: Study of the dependence of the quality of recognition of ontological objects on the depth of hyponymy.
  • Problem description: It is necessary to investigate the dependence of the quality of recognition of ontological objects at different levels of concept hyponymy. The classic formulation of the problem of named entity recognition: https://en.wikipedia.org/wiki/Named-entity_recognition
  • Data: Hyponyms from https://wordnet.princeton.edu/ , texts from different domains presumably from WebOfScience.
  • References: Relevant articles for the classical problem statement: http://arxiv-sanity.com/search?q=named+entity+recognition
  • Basic algorithm: https://arxiv.org/pdf/1709.09686.pdf or its simplified version can be used as an algorithm, studies are performed using the DeepPavlov library.
  • Solution: It is necessary to collect a dataset of hyponymy (nesting of concepts) of objects using WordNet, to automatically mark up ontological objects of texts of various domains for several levels of generalization of concepts, to conduct a series of experiments to determine the quality of recognition of ontological objects for different levels of nesting.
  • Novelty: Similar studies have not been carried out, and there are no ready-made datasets with a hierarchical markup of objects. Recognition of ontological objects at various levels of hyponymy can be used to produce additional features when solving various NLP (natural language processing) problems, as well as to determine whether two objects form a hyponym-hypernym pair.
  • Authors: Burtsev Mikhail Sergeevich (Expert), Baimurzina Dilyara Rimovna (consultant).

21. 2018

  • Title: High order convex optimization methods
  • Problem description: High-order methods are used effectively for convex problems of moderate dimension (up to n ~ 10^3, sometimes even up to n ~ 10^4). Until recently, it was generally accepted that these are second-order methods (using the second derivatives of the function being optimized). However, at the beginning of 2018 Yu.E. Nesterov [1] proposed a third-order method that is efficient in theory and works according to almost optimal estimates. Exercise 1.3 of the manual [3] gives an example of a "bad" convex function proposed by Yu.E. Nesterov. On this function it is proposed to compare the Nesterov methods of the second and third order [1], the methods from [2] of the second and third order, and the usual fast gradient methods (of the first order). It is worth comparing both the number of iterations and the total running time.
  • References:
  1. https://alfresco.uclouvain.be/alfresco/service/guest/streamDownload/workspace/SpacesStore/aabc2323-0bc1-40d4-9653-1c29971e7bd8/coredp2018_05web.pdf?guest=true
  2. https://arxiv.org/pdf/1809.00382.pdf
  3. https://arxiv.org/pdf/1711.00394.pdf
  • Author: Evgenia Alekseevna Vorontsova (Associate Professor of Far Eastern Federal University, Vladivostok), Alexander Vladimirovich Gasnikov
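
The kind of comparison requested above can be sketched by counting iterations of a first-order and a second-order method on the same convex function. This is only an illustration under assumptions: the test function f(x) = sum_i (exp(x_i) + exp(-x_i)) is a toy choice and is not the "bad" function from [3], plain gradient descent stands in for the fast gradient methods, and the third-order methods of [1] and [2] are not implemented.

```python
import math


def grad(x):
    """Gradient of f(x) = sum_i (exp(x_i) + exp(-x_i))."""
    return [2.0 * math.sinh(v) for v in x]


def hess_diag(x):
    """Diagonal of the (diagonal) Hessian of the same function."""
    return [2.0 * math.cosh(v) for v in x]


def norm(v):
    return math.sqrt(sum(a * a for a in v))


def gradient_descent(x0, step=0.3, tol=1e-8, max_iter=10000):
    """First-order method with a fixed step; returns (point, iterations)."""
    x = list(x0)
    for k in range(max_iter):
        g = grad(x)
        if norm(g) < tol:
            return x, k
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x, max_iter


def newton(x0, tol=1e-8, max_iter=100):
    """Second-order method; the Hessian is diagonal, so the step is elementwise."""
    x = list(x0)
    for k in range(max_iter):
        g = grad(x)
        if norm(g) < tol:
            return x, k
        h = hess_diag(x)
        x = [xi - gi / hi for xi, gi, hi in zip(x, g, h)]
    return x, max_iter


x0 = [1.0, -0.5, 0.25]
x_gd, iters_gd = gradient_descent(x0)
x_nt, iters_nt = newton(x0)
print("gradient descent iterations:", iters_gd)
print("Newton iterations:", iters_nt)
```

For a timing comparison, each call can be wrapped with `time.perf_counter()`; the cost per iteration grows with the order of the method, so iteration counts alone do not decide the comparison.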

22. 2018

  • Title: Cutting plane methods for copositive optimization
  • Problem: Conic program over the copositive cone (copositive program): min <C,X> subject to <A_i,X> = b_i, X \in \Pi_i C^{k_i}, k_i <= 5. A linear function is minimized over the intersection of an affine subspace with a product of copositive cones of orders k_i <= 5.
  • Data: The algorithm will be tested on randomly generated instances
  • References:
    1. Peter J. C. Dickinson, Mirjam Dür, Luuk Gijben, Roland Hildebrand. Scaling relationship between the copositive cone and Parrilo’s first level approximation. Optim. Lett. 7(8), 1669-1679, 2013.
    2. Stefan Bundfuss, Mirjam Dür. Algorithmic copositivity detection by simplicial partition. Linear Alg. Appl. 428, 1511-1523, 2008.
    3. Mirjam Dür. Copositive programming - a Survey. In Recent advances in Optimization and its Applications in Engineering, Springer, pp. 3-20, 2010.
  • Base algorithm: The reference algorithm is described in [4] Stefan Bundfuss, Mirjam Dür. An Adaptive Linear Approximation Algorithm for Copositive Programs. SIAM J. Optim., 20(1), 30-53, 2009.
  • Solution: The copositive program will be solved by a cutting plane algorithm. The cutting plane (in the case of an infeasible iterate) will be constructed from the semidefinite representation of the diagonal 1 section of the cone proposed in [1]. The algorithm will be compared to a simplicial division method proposed in [2], [4]. General information about copositive programs and their applications in optimization can be found in [3].
  • Novelty: The proposed algorithm for optimization over copositive cones up to order 5 uses an exact semi-definite representation. In contrast to all other algorithms existing today the generation of cutting planes is non-iterative.
  • Author: Roland Hildebrand
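
To make the notion of a cutting plane concrete: a symmetric matrix X is copositive when v^T X v >= 0 for every nonnegative vector v, so any nonnegative v with v^T X v < 0 yields the linear inequality <v v^T, X> >= 0, which holds for all copositive matrices and is violated by X. The sketch below searches for such a v on a grid of the standard simplex. It is a heuristic illustration only and an assumption of this example; the project constructs its cuts from the semidefinite representation in [1], which is not reproduced here.

```python
from itertools import product


def quad_form(matrix, v):
    """Value of v^T X v."""
    n = len(v)
    return sum(matrix[i][j] * v[i] * v[j] for i in range(n) for j in range(n))


def simplex_grid(n, resolution):
    """Points of the standard simplex whose coordinates are multiples of 1/resolution."""
    for combo in product(range(resolution + 1), repeat=n - 1):
        rest = resolution - sum(combo)
        if rest >= 0:
            yield [c / resolution for c in combo] + [rest / resolution]


def find_cut(matrix, resolution=10, tol=1e-12):
    """Return the grid point v >= 0 with the most negative v^T X v, or None.

    None means only that no violation was found on the grid; it is not a
    proof of copositivity.
    """
    best_v, best_val = None, -tol
    for v in simplex_grid(len(matrix), resolution):
        val = quad_form(matrix, v)
        if val < best_val:
            best_v, best_val = v, val
    return best_v


X = [[1.0, -2.0], [-2.0, 1.0]]
v = find_cut(X)
print(v)
```

Because the grid has (resolution + 1)^(n - 1) candidate points, this brute-force search is only practical for small orders such as the k_i <= 5 considered in the project.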

23. 2018

  • Title: Fractal analysis and synthesis of optical images of sea waves
  • Problem description: A variety of physical processes and phenomena are studied with the help of images obtained remotely. An important problem is to obtain adequate information about the processes and phenomena of interest by measuring certain image characteristics. Lines of equal brightness (isolines) on the images of many natural objects are fractal, that is, they are sets of points that cannot be represented by lines of finite length and occupy an intermediate position between lines and two-dimensional flat figures. Such sets are characterized by the fractal dimension D, which generalizes the classical concept of the dimension of a set and can take fractional values. For a solitary point on the image D=0, for a smooth curve D=1, for a flat figure D=2. The fractal isoline has the dimension 1<D<2. The algorithm for calculating D is given, for example, in [1]. The fractal dimension of the sea surface isolines can serve to estimate the spatial spectra of sea waves according to remote sensing data [1]. The problem is as follows. It is necessary to conduct a numerical study of the relationship between the characteristics of the spatial spectra of sea waves and the fractal dimension of satellite images of the Earth in the solar glare region. For the study, the method of numerical synthesis of optical images of sea waves, described in [2], should be used. Numerical modeling should be done with different characteristics of sea waves, as well as with different positions of the Sun and spatial resolution of images.
  • References:
    1. Lupyan E. A., Murynin A. B. Possibilities of fractal analysis of optical images of the sea surface. // Preprint of the Space Research Institute of the Academy of Sciences of the USSR Pr.-1521, Moscow, 1989, 30 p.
    2. Murynin A. B. Reconstruction of the spatial spectra of the sea surface from optical images in a nonlinear model of the brightness field // Research of the Earth from Space, 1990. No. 6. P. 60-70.
  • Author: Ivan Alekseevich Matveev
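
One common way to estimate the fractal dimension D of a point set is box counting: count the boxes N(eps) of side eps that contain at least one point and fit the slope of log N(eps) against log(1/eps). The sketch below is an assumption of this example and is not necessarily the algorithm of [1]; it reproduces the reference values quoted above, D=1 for a smooth curve and D=2 for a flat figure.

```python
import math


def box_count(points, box_size):
    """Number of grid boxes of side box_size that contain at least one point."""
    return len({(int(x // box_size), int(y // box_size)) for x, y in points})


def box_counting_dimension(points, box_sizes):
    """Least-squares slope of log N(eps) versus log(1 / eps)."""
    xs = [math.log(1.0 / s) for s in box_sizes]
    ys = [math.log(box_count(points, s)) for s in box_sizes]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope_num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope_den = sum((x - mx) ** 2 for x in xs)
    return slope_num / slope_den


box_sizes = [0.25, 0.125, 0.0625, 0.03125]

# A straight horizontal segment sampled densely: a smooth curve.
segment = [(i / 4096, 0.5) for i in range(4096)]
# A filled unit square sampled on a 64 x 64 grid: a flat figure.
square = [(i / 64, j / 64) for i in range(64) for j in range(64)]

print(box_counting_dimension(segment, box_sizes))
print(box_counting_dimension(square, box_sizes))
```

For an isoline extracted from a sea-surface image, `points` would be the pixel coordinates of the isoline, and the box sizes should stay well above the pixel size and well below the image size.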

24. 2018

  • Name: Entropy maximization for various types of image transformations
  • Problem description: Pansharpening is an algorithm for upscaling multispectral images using a reference image. The problem of pansharpening is formulated as follows: having a panchromatic image of the required resolution and a multispectral image of reduced resolution, it is required to restore the multispectral image in the spatial resolution of the panchromatic one. From empirical observations based on a large number of high-resolution images, it is known that the spatial variability of the reflected radiation intensity for objects of the same nature is much greater than the variability of their spectrum. In other words, one can observe that the spectrum of reflected radiation is homogeneous within the boundaries of one object, while even within one object the intensity of reflected radiation varies. In practice, good results can be achieved using a simplified approach, in which it is assumed that if the intensities of neighboring regions differ significantly, then these regions probably belong to different objects with different reflected spectra. This is the basis for the developed probabilistic algorithm for increasing the resolution of multispectral images using a reference image [1].
  • It is necessary to conduct a study on maximizing the entropy for various types of image transformations. Show that entropy can serve as an indicator of the loss of information contained in the image under transformations applied to it. Formulation of the inverse problem for image restoration: Condition 1: Correspondence of the intensity (at each point) of the restored image with the intensity of the panchromatic image. Condition 2: Correspondence of the low-frequency component of the reconstructed image with the original multispectral image. Condition 3: Homogeneity (similarity) of the spectrum within one object and the assumption of an abrupt change in the spectrum at the border of two homogeneous regions. Condition 4: Under the first three conditions, the local entropy of the reconstructed image must be maximized.
  • References:
    1. Gorohovsky K. Yu., Ignatiev V. Yu., Murynin A. B., Rakova K. O. Search for optimal parameters of a probabilistic algorithm for increasing the spatial resolution of multispectral satellite images // Izvestiya RAN. Theory and control systems, 2017, No. 6.
  • Author: Ivan Alekseevich Matveev
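
The claim that entropy can indicate information loss under a transformation can be illustrated with the Shannon entropy of the intensity histogram. The sketch below is an assumption of this example: intensity quantization is chosen as the transformation, and the window-based `local_entropy` is one simple reading of "local entropy" in Condition 4.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Entropy in bits of the empirical intensity distribution."""
    total = len(pixels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(pixels).values())


def quantize(pixels, step):
    """Lossy transformation: round intensities down to multiples of step."""
    return [p // step * step for p in pixels]


def local_entropy(image, row, col, radius=1):
    """Entropy of the intensities in a square window around (row, col)."""
    window = [
        image[r][c]
        for r in range(max(0, row - radius), min(len(image), row + radius + 1))
        for c in range(max(0, col - radius), min(len(image[0]), col + radius + 1))
    ]
    return shannon_entropy(window)


pixels = list(range(256))          # every 8-bit intensity exactly once
coarse = quantize(pixels, 16)      # only 16 distinct levels remain

print(shannon_entropy(pixels))     # 8.0
print(shannon_entropy(coarse))     # 4.0

image = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
print(local_entropy(image, 1, 1))
```

Quantization merges intensity levels and the entropy drops from 8 to 4 bits, while a constant window has zero local entropy.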

25. 2018

  • Title: Automatic detection and recognition of objects in images
  • Problem description: Automatic detection and recognition of objects in images and videos is one of the main problems of computer vision. As a rule, these problems are divided into several subproblems: preprocessing, extraction of the characteristic properties of the object image, and classification. The preprocessing stage usually includes some operations on the image such as filtering, brightness equalization, and geometric corrective transformations to facilitate robust feature extraction.

The characteristic properties of an image of an object are understood as a set of features that approximately describe the object of interest. Features can be divided into two classes: local and integral. The advantage of local features is their versatility and invariance with respect to uneven changes in brightness and illumination, but they are not unique. Integral features that characterize the image of the object as a whole are not resistant to changes in the structure of the object and difficult lighting conditions. There is a combined approach: the use of local features as elements of an integral description, when the desired object is modeled by a set of areas, each of which is characterized by its own set of features, a local texture descriptor. The totality of such descriptors characterizes the object as a whole. Classification is understood as determining whether an object belongs to a particular class by analyzing the feature vector obtained at the previous stage, dividing the feature space into subdomains indicating the corresponding class. There are many approaches to classification: neural network, statistical (Bayesian, regression, Fisher, etc.), decision trees and forests, metric (K nearest neighbors, Parzen windows, etc.), kernel (SVM, RBF, method of potential functions), and compositional (AdaBoost). For the problem of detecting an object in an image, membership in two classes is evaluated: the class of images containing the object, and the class of images that do not contain the object (background images).

29. 2018

  • Name: Cross-Language Document Extractive Summarization with Neural Sequence Model.
  • Problem description: It is proposed to solve the transfer learning problem for an extractive text summarization model and to investigate how the quality of summarization depends on the quality of training of the translation model. Given data for training the summarization model in English and a parallel English-Russian corpus of texts, build a model for summarizing text in Russian. The solution is evaluated on a small test set in Russian; its quality is determined by the ratio of the values of the ROUGE criteria on the English and Russian sets.
  • Data: Data for training the model in English (SummaRuNNer2016), OPUS parallel corpus, data for verification in Russian.
  • References: The article (SummaRuNNer2016) describes the basic text summarization algorithm; the work Neural machine translation by jointly learning to align and translate (NMT2016) describes the translation model. The idea of sharing models is presented in Cross-Language Document Summarization Based on Machine Translation Quality Prediction (CrossSum2010).
  • Basic algorithm: One idea of the basic algorithm is presented in (CrossSum2010); an implementation of the translation model (OpenNMT) and an implementation of the text summarization model (SummaRuNNer2016) are available.
  • Solution: It is suggested to explore the solution idea proposed in the article (CrossSum2010) and options for combining the summarization and translation models. Basic models and dataset preprocessing are implemented (OpenNMT) with the PyTorch and TensorFlow libraries. Summarization errors are analyzed as described in (SummaRuNNer2016); the quality of model training is analyzed with standard library tools.
  • Novelty: For the base model, applicability has been investigated on only a couple of datasets. Confirming that training can be transferred to a dataset in another language, and specifying the conditions for this transfer, will expand the scope of the model and indicate the necessary refinements of the model or data preprocessing.
  • Authors: Alexey Romanov (consultant), Anton Khritankov (Expert).
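
The quality measure described above, a ratio of ROUGE values on the two languages, can be sketched as follows. This is a simplified illustration under assumptions: only ROUGE-N recall with whitespace tokenization is computed, and the function names and example sentences are made up for the sketch. An established ROUGE implementation should be used for reported results.

```python
from collections import Counter


def rouge_n_recall(candidate, reference, n=1):
    """Share of reference n-grams that also occur in the candidate (clipped counts)."""
    def ngrams(text):
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


def transfer_ratio(rouge_target, rouge_source):
    """Ratio of the score on the target language to the score on the source language."""
    return rouge_target / rouge_source if rouge_source else float("nan")


candidate = "the cat sat on the mat"
reference = "the cat is on the mat"
print(rouge_n_recall(candidate, reference, n=1))
print(rouge_n_recall(candidate, reference, n=2))
```

A ratio close to 1 would mean that the summarization quality is largely preserved after the transfer to the second language.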

30. 2018

  • Title: Method for constructing an HG-LBP descriptor based on gradient histograms for pedestrian detection.
  • Problem description: It is proposed to develop a new descriptor that generalizes the LBP descriptor based on histograms of gradient modules, having HOG-LBP composition properties for the problem of detecting pedestrians in an image. To analyze the quality of the new descriptor, it is proposed to use FAR/FRR detection error plots based on INRIA.
  • Data: INRIA pedestrian database: http://pascal.inrialpes.fr/data/human/
  • References:
    1. T. Ojala and M. Pietikainen. Multiresolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns, IEEE Trans on Pattern Analysis and Machine Intelligence, Vol. 24. No. 7, July, 2002.
    2. T. Bouwmans, C. Silva, C. Marghes, M. Zitouni, H. Bhaskar, C. Frelicot, "On the Role and the Importance of Features for Background Modeling and Foreground Detection", https://arxiv.org/pdf/1611.09099v1.pdf
    3. N. Dalal and B. Triggs, Histograms of Oriented Gradients for Human Detection, Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2005, pp.886-893
    4. T. Ahonen, A. Hadid, M. Pietikainen. Face Description with Local Binary Patterns: Application to Face Recognition // IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume: 28, Issue: 12.
    5. http://www.magicandlove.com/blog/2011/08/26/people-detection-in-opencv-again/
    6. http://www.cse.oulu.fi/CMV/Downloads/LBPMatlab
    7. http://www.mathworks.com/help/vision/ref/extractlbpfeatures.html
    8. http://www.codeproject.com/Articles/741559/Uniform-LBP-Features-and-Spatial-Histogram-Computa
    9. http://www.cse.oulu.fi/CMV/Research
  • Basic algorithm: Xiaoyu Wang, Tony X. Han, Shuicheng Yan. An HOG-LBP Human Detector with Partial Occlusion Handling // ICCV 2009
  • Solution: One of the options for generalizing LBP is to use, instead of histograms of the distribution of points by LBP code, histograms of the distribution of gradient modules of the points in a block by LBP code (HG-LBP). It is proposed to use the OpenCV library as the basis of experiments, in which the HOG and LBP algorithms are implemented. It is necessary to modify the source code of the LBP implementation and insert the calculation of the gradient modules and the accumulation of the corresponding histogram over the LBP codes. It is necessary to write a program for reading the INRIA database, training a linear SVM on the original and modified descriptors, collecting detection statistics and plotting FAR/FRR DET plots.
  • Novelty: The development of computationally simple methods for extracting the most informative features in recognition problems is relevant in the field of creating embedded systems with low computing resources. Replacing the composition of descriptors with one that is more informative than each individually can simplify the solution of the problem. The use of gradient values in LBP descriptor histograms is new.
  • Authors: Gneushev Alexander Nikolaevich
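
The difference between the ordinary LBP histogram and the proposed HG-LBP histogram can be sketched in a few lines: the first adds 1 to the bin of each pixel's LBP code, the second adds the pixel's gradient modulus to that bin. This is an illustration under assumptions (basic 8-neighbour LBP, central-difference gradients, one block, no normalization); it is not the OpenCV implementation, and the function names are hypothetical.

```python
import math

# Offsets of the 8 neighbours, clockwise from the top-left pixel.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, r, c):
    """Basic 8-bit LBP code: bit k is set if neighbour k is >= the centre pixel."""
    center = image[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(NEIGHBOURS):
        if image[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def gradient_magnitude(image, r, c):
    """Gradient modulus from central differences."""
    gx = (image[r][c + 1] - image[r][c - 1]) / 2.0
    gy = (image[r + 1][c] - image[r - 1][c]) / 2.0
    return math.hypot(gx, gy)


def block_histograms(image):
    """Return (LBP histogram, HG-LBP histogram) over the interior pixels of a block."""
    lbp_hist = [0] * 256
    hg_lbp_hist = [0.0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            code = lbp_code(image, r, c)
            lbp_hist[code] += 1                                   # counts points
            hg_lbp_hist[code] += gradient_magnitude(image, r, c)  # accumulates moduli
    return lbp_hist, hg_lbp_hist


# Toy 3 x 3 block with a bright right column.
block = [[0, 0, 9],
         [0, 5, 9],
         [0, 0, 9]]
lbp_hist, hg_lbp_hist = block_histograms(block)
print(lbp_code(block, 1, 1), hg_lbp_hist[lbp_code(block, 1, 1)])
```

In a full detector the per-block histograms would be concatenated over the detection window and passed to the linear SVM.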

31. 2018

  • Name: Using the HOG descriptor to train a neural network in a pedestrian detection problem
  • Problem description: It is proposed to replace the linear SVM classifier in the classical HOG algorithm with a simple convolutional neural network of small depth, while the HOG descriptor should be represented by a three-dimensional tensor that preserves the spatial structure of local blocks. As an analysis of the quality of a new descriptor, it is proposed to use FAR/FRR detection error plots based on INRIA.
  • Data: INRIA pedestrian database: http://pascal.inrialpes.fr/data/human/
  • References:
    1. N. Dalal and B. Triggs, Histograms of Oriented Gradients for Human Detection, Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2005, pp.886-893
    2. Q. Zhu, S. Avidan, M.-C. Yeh, and K.-T. Cheng. Fast human detection using a cascade of histograms of oriented gradients. In CVPR, pages 1491-1498, 2006
    3. O. Tuzel, F. Porikli, and P. Meer. Human detection via classification on riemannian manifolds. In CVPR, 2007
    4. P. Dollar, C. Wojek, B. Schiele and P. Perona. Pedestrian Detection: An Evaluation of the State of the Art // IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), Vol. 34, Issue 4, pp. 743-761
    5. Xiaoyu Wang, Tony X. Han, Shuicheng Yan, An HOG-LBP Human Detector with Partial Occlusion Handling, ICCV 2009 http://www.xiaoyumu.com/s/PDF/Wang_HOG_LBP.pdf
    6. https://en.wikipedia.org/wiki/Pedestrian_detection
    7. HOG person detector tutorial https://chrisjmccormick.wordpress.com/2013/05/09/hog-person-detector-tutorial/
    8. Navneet Dalal. Finding People in Images and Videos. PhD Thesis. Institut National Polytechnique de Grenoble / INRIA Rhone-Alpes, Grenoble, July 2006 (NavneetDalalThesis.pdf)
    9. People Detection in OpenCV http://www.magicandlove.com/blog/2011/08/26/people-detection-in-opencv-again/
    10. Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
  • Basic algorithm:
    1. N. Dalal and B. Triggs, Histograms of Oriented Gradients for Human Detection, Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2005, pp.886-893
    2. Xiaoyu Wang, Tony X. Han, Shuicheng Yan, An HOG-LBP Human Detector with Partial Occlusion Handling, ICCV 2009
  • Solution: One of the options for generalizing the HOG algorithm is to use another classifier instead of the linear SVM, for example, some kind of neural network. It is proposed to use the OpenCV library as the basis of experiments, which implements the HOG algorithm and the SVM classifier. It is necessary to analyze the source code of the HOG implementation and formalize the internal structure of the HOG descriptor vector in the form of a three-dimensional tensor (two spatial dimensions and one spectral dimension). It is necessary to write a program for reading the INRIA database, training a linear SVM on HOG descriptors extracted from it, collecting detection statistics and plotting FAR/FRR DET plots. Based on some neural network training system (for example, mxnet), it is necessary to assemble a shallow (no more than 2-3 convolutional layers) convolutional neural network of known architecture, train it on the INRIA database using HOG tensor descriptors, and build the corresponding FAR/FRR graphs.
  • Novelty: The development of computationally simple methods for extracting the most informative features in recognition problems is relevant in the field of creating embedded systems with low computing resources. Using a small number of the most informative descriptors can reduce computational complexity compared to using a large composition of simple features, such as in a deep convolutional neural network. Typically, classifiers use the HOG descriptor as a single flat vector, so information about the local spatial structure and feature spectrum is lost. The novelty lies in the use of the block locality property in the HOG descriptor and the representation of the HOG as a 3D tensor. The use of this information makes it possible to achieve detection that is robust to pedestrian occlusion.
  • Authors: Gneushev Alexander Nikolaevich
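The FAR/FRR statistics mentioned in the solution can be collected from detector scores alone. The sketch below is a minimal illustration in plain Python; the function name `far_frr_curve`, the acceptance rule `score >= threshold`, and the toy scores are assumptions and are not taken from OpenCV or from the INRIA data.

```python
def far_frr_curve(scores, labels):
    """Return a list of (threshold, FAR, FRR), one entry per distinct score.

    scores: detector outputs, higher means "pedestrian".
    labels: 1 for a true pedestrian window, 0 for background.
    A window is accepted when score >= threshold.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    curve = []
    for t in sorted(set(scores)):
        # False accept: a background window scored at or above the threshold.
        false_accepts = sum(1 for s in negatives if s >= t)
        # False reject: a pedestrian window scored below the threshold.
        false_rejects = sum(1 for s in positives if s < t)
        curve.append((t, false_accepts / len(negatives), false_rejects / len(positives)))
    return curve


# Toy scores (hypothetical, for illustration only).
scores = [0.9, 0.8, 0.4, 0.35, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
curve = far_frr_curve(scores, labels)
```

Plotting FRR against FAR over all thresholds, usually on logarithmic axes, gives the DET curve used to compare the SVM and the convolutional network.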

2017

Author | Topic | Links | Consultant | Reviewer | Report | Letters | \Sigma=3+13
Goncharov Alexey | Metric classification of time series | code, paper, slides | Maria Popova | Zadayanchuk Andrey | BMF | AILSBRCVTDSWH> |
Astakhov Anton | Restoring the structure of a predictive model from a probabilistic representation | folder, code, paper | Alexander Katrutsa | Kislinsky Vadim | BHF | A-I-L0S0B0R0C0V0T0 [A-I-L-S-B0R0C0V0T0E0D0W0S] + [AILSBRCBTEDWS] | 2+4
Gavrilov Yuri | Choice of Interpreted Multimodels in Credit Scoring Problems | folder, code, paper, video | Goncharov Alexey | Ostroukhov Petr | BF | A+IL-S0B-R0 [A+ILSBRC-VT0E0D0W0S] + (W) | 2+9+1
Gadaev Tamaz | Estimating the optimal sample size | folder, code, paper, slides, video | Alexander Katrutsa | Shulgin Egor | BHF | A-IL>SB-R-C0V0T0 [AILSBR0CVT0E-D0W0S] | 2+9
Gladin Egor | Accelerometer Battery Savings Based on Time Series Forecasting | folder, code, paper, slides | Maria Vladimirova | Kozlinsky Evgeny, review | .F | AILS [A-I-L-SB0R0C000V0T0E0D0W0S] | 1+4
Grabovoi Andrey | Automatic determination of the relevance of neural network parameters. | folder, code, paper, slides, video | Oleg Bakhteev | Kulkov Alexander | BHMF | A+ILS+BRC+VTE>D> [AILSBRCVTEDWS] [\emptyset] | 3+13
Nurlanov Zhakshylyk | Deep Learning for reliable detection of tandem repeats in 3D protein structures | folder, code, paper, slides, video | S. V. Grudinin, Guillaume Pages | Pletnev Nikita, review | BHF | AILB [A-I-LS-BRC0V0T-E0D0W0S] | 2+7
Rogozina Anna | Deep learning for RNA secondary structure prediction | folder, code, paper, slides, video | Maria Popova | Gadaev Tamaz | BHMF | AILSBR> [AILSBRC0V0T0E0D0W0S]+CW | 3+9
Terekhov Oleg | Generation of features using locally approximating models | folder, code, paper, slides | S.D. Ivanychev, R.G. Neichev | Gladin Egor, review | BHM | AILSBRCVTDSW [AIL0SB0R0C0V0TE0D0W0S] | 2+12
Shulgin Egor | Generation of features that are invariant to changes in the frequency of the time series | folder, code, paper | R.G. Neichev | Terekhov Oleg | BHM | AIL [AI-LS-BR0CV0T0E0D0W0S] | 2+5
Malinovsky Grigory | Graph Structure Prediction of a Neural Network Model | folder, code, paper, slides, video | Oleg Bakhteev | Grabovoi Andrey, review | BHMF | A+I+L+SBR>C>V>T>E>D> [AILSBRC0VTED0WS]+(C) | 3+11
Kulkov Alexander | Brain signal decoding and intention prediction | folder, code, paper, slides, video | R.V. Isachenko | Malinovsky Grigory, review | BHMF | AILSBR [AILSBRCVTED0W0S] | 3+11
Pletnev Nikita | Approximation of the boundaries of the iris | paper, slides, video | Alexander Aduenko | Nurlanov Zhakshylyk | BF | AILSB>R> [AILSTWS] | 2+7
Ostroukhov Petr | Selection of models superposition for identification of a person on the basis of a ballistocardiogram | folder, paper, code, slides | Alexander Prozorov | Gavrilov Yuri, review | BhF | AIL>S?B?R? [AILSBRCVT-E0D0W0S] | 2+10
Kislinsky Vadim | Predicting user music playlists in a recommender system. | folder, code, slides, paper, video | Evgeny Frolov | Astakhov Anton | .F | (AIL)------(SB)---(RCVT)-- [AILS-BRCVTED0W0S] | 1+11
Kozlinsky Evgeny | Analysis of banking transactional data of individuals to identify customer consumption patterns. | folder, code, paper, slides, video | Rosa Aisina | Rogozina Anna, review | BHMF | AILSBR>CV> [AILSBR0C0V0TE0D0WS]+(С) | 3+8+1


1

  • Title: Approximation of the boundaries of the iris
  • Problem: Based on the image of the human eye, determine the circles approximating the inner and outer border of the iris.
  • Data: Bitmap monochrome images, typical size 640*480 pixels (however other sizes are possible)[46], [47].
  • References:
    1. Aduenko A.A. Selection of multimodels in classification problems (supervisor Strijov V.V.). Moscow Institute of Physics and Technology, 2017. [48]
    2. K.A. Gankin, A.N. Gneushev, I.A. Matveev Segmentation of the iris image based on approximate methods with subsequent refinements // Izvestiya RAN. Theory and control systems, 2014, no. 2, p. 78–92.
    3. Duda, R. O. Use of the Hough transformation to detect lines and curves in pictures / R. O. Duda, P. E. Hart // Communications of the ACM. 1972 Vol. 15, no. 1.Pp.
  • Basic algorithm: Efimov Yury. Search for the outer and inner boundaries of the iris in the eye image using the paired gradient method, 2015.
  • Solution: See iris_circle_problem.pdf
  • Novelty: A fast non-enumerative algorithm for approximating boundaries using linear multimodels is proposed.
  • Consultant: Alexander Aduenko (problem by Strijov V.V., expert Matveev I.A.)
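A circle can be approximated without enumerating candidate centers and radii, because the equation x² + y² + D·x + E·y + F = 0 is linear in (D, E, F). The sketch below shows this standard algebraic least-squares fit in plain Python as an illustration of a linear model for a boundary. It is not the multimodel algorithm proposed in this problem, and the function names and synthetic boundary points are assumptions.

```python
import math


def solve_linear(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point set (for example, collinear points)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def fit_circle(points):
    """Least-squares fit of x^2 + y^2 + D x + E y + F = 0; returns (cx, cy, radius)."""
    rows = [(x, y, 1.0) for x, y in points]
    rhs = [-(x * x + y * y) for x, y in points]
    # Normal equations (A^T A) p = A^T b for p = (D, E, F).
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atb = [sum(r[i] * v for r, v in zip(rows, rhs)) for i in range(3)]
    d, e, f = solve_linear(ata, atb)
    cx, cy = -d / 2.0, -e / 2.0
    radius = math.sqrt(cx * cx + cy * cy - f)
    return cx, cy, radius


# Synthetic boundary points on a circle inside a 640*480 image (illustration only).
points = [
    (320 + 100 * math.cos(2 * math.pi * k / 12), 240 + 100 * math.sin(2 * math.pi * k / 12))
    for k in range(12)
]
cx, cy, r = fit_circle(points)
```

In practice the boundary points would come from an edge or gradient detector, and the inner and outer boundaries would each get their own fit.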

2

  • Title: Estimating the optimal sample size
  • Problem: In conditions of an insufficient number of expensive measurements, it is required to predict the optimal size of the replenished sample.
  • Data: Samples of measurements in medical diagnostics, in particular, a sample of immunological markers.
  • References:
    1. Motrenko A.P. Materials on algorithms for estimating the optimal sample size in the MLAlgorithms repository [49], p/mlalgorithms/code/Group874/Motrenko2014KL/.
  • Basic algorithm: Sample size estimation algorithms for .
  • Solution: Investigation of the properties of the parameter space when replenishing the sample.
  • Novelty: A new methodology for sample size forecasting is proposed, justified in terms of classical and Bayesian statistics.
  • Authors: A.M. Katrutsa, Strijov V.V., Expert A.P. Motrenko

3

  • Title: Restoring the structure of the prognostic model from a probabilistic representation
  • Problem: It is required to reconstruct the superposition tree from the generated connection probability graph.
  • Data: Segments of time series, spatio-temporal series (and text collections).
  • References:
    1. Works by Tommi Jaakkola and others at LinkReview [50].
  • Basic algorithm: Branch and bound method, dynamic programming when building a fully connected graph.
  • Solution: Build the model as a GAN: a VAE generates a weighted graph, and a neural network approximates the tree structure.
  • Novelty: A way to penalize a graph for not being a tree is proposed, together with a method for predicting the structures of prognostic models.
  • Authors: A.M. Katrutsa, Strijov V.V.

4

  • Title: Text recognition based on skeletal representation of thick lines and convolutional networks
  • Problem: It is required to build two CNNs, one recognizes a bitmap representation of an image, the other a vector one.
  • Data: Bitmap fonts.
  • References: List of works [51], in particular arXiv:1611.03199 and
  • Basic algorithm: Convolutional network for bitmap images.
  • Solution: It is required to propose a method for collapsing graph structures, which allows generating an informative description of the skeleton of a thick line.
  • Novelty: A way to improve the quality of recognition of thick lines due to a new way of generating their descriptions is proposed.
  • Authors: L.M. Mestetsky, I.A. Reyer, Strijov V.V.

5

  • Title: Generation of features using locally approximating models
  • Problem: It is required to test the hypothesis that the sample is simple in the space of the generated features. The features are the optimal parameters of approximating models. The sample as a whole is not simple and requires a mixture of models to approximate it. Explore the informativeness of the generated features, that is, of the parameters of the approximating models trained on segments of the original time series.
  • Data:
    1. WISDM (Kwapisz, J.R., G.M. Weiss, and S.A. Moore. 2011. Activity recognition using cell phone accelerometers. ACM SigKDD Explorations Newsletter. 12(2):74–82.), USC-HAD or higher. Accelerometer data (Human activity recognition using smart phone embedded sensors: A Linear Dynamical Systems method, W Wang, H Liu, L Yu, F Sun - Neural Networks (IJCNN), 2014)
    2. (Time series (examples library), Accelerometry section).
  • References:
    1. Kuznetsov M.P., Ivkin N.P. Algorithm for Classifying Accelerometer Time Series by Combined Feature Description // Machine Learning and Data Analysis. 2015. V. 1, No. 11. C. 1471-1483. [52]
    2. Karasikov M.E., Strijov V.V. Classification of time series in the space of parameters of generating models // Informatics and its applications, 2016.URL
    3. Kuznetsov M.P., Ivkin N.P. Algorithm for Classifying Accelerometer Time Series by Combined Feature Description // Machine Learning and Data Analysis. 2015. V. 1, No. 11. C. 1471 - 1483. URL
    4. Isachenko R.V., Strijov V.V. Metric learning in problems of multiclass classification of time series // Informatics and its applications, 2016, 10(2) : 48-57. URL
    5. Zadayanchuk A.I., Popova M.S., Strijov V.V. Choosing the optimal model for classifying physical activity based on accelerometer measurements // Information technologies, 2016. URL
    6. Motrenko A.P., Strijov V.V. Extracting fundamental periods to segment human motion time series // Journal of Biomedical and Health Informatics, 2016, Vol. 20, no. 6, 1466 - 1476.
    7. Ignatov A., Strijov V. Human activity recognition using quasiperiodic time series collected from a single triaxial accelerometer // Multimedia Tools and Applications, 2015, 17.05.2015 : 1-14. URL
  • Basic algorithm: Described by Kuznetsov, Ivkin.
  • Solution: It is required to build a set of locally approximating models and choose the most adequate ones.
  • Novelty: A standard for building locally approximating models has been created.
  • Authors: S.D. Ivanychev, R.G. Neichev, Strijov V.V.
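The idea of using the parameters of locally approximating models as features can be sketched as follows. A straight line is fitted to each segment of the series and its coefficients become the feature vector. The choice of a linear model, the fixed segment length, and the function names are assumptions made for illustration; the problem statement leaves the family of local models open.

```python
def linear_fit(segment):
    """Least-squares line y = intercept + slope * t over t = 0..n-1."""
    n = len(segment)
    if n < 2:
        raise ValueError("a segment needs at least two points")
    t_mean = (n - 1) / 2.0
    y_mean = sum(segment) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(segment))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept, slope


def local_model_features(series, segment_length):
    """Concatenate (intercept, slope) of the line fitted to each full segment."""
    features = []
    for start in range(0, len(series) - segment_length + 1, segment_length):
        features.extend(linear_fit(series[start:start + segment_length]))
    return features


# Toy series: a rising segment followed by a falling one.
series = [0, 1, 2, 3, 10, 8, 6, 4]
features = local_model_features(series, 4)
```

Richer local models (autoregressive, Fourier, splines) plug in the same way: each contributes its fitted parameters, and a classifier is then trained on the concatenated description.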

6

  • Title: Brain signal decoding and intention prediction
  • Problem: It is required to build a model that restores the movement of the limbs from the corticogram.
  • Data: neurotycho.org [53]
  • References:
    1. Neichev R.G., Katrutsa A.M., Strijov V.V. Selection of the optimal set of features from a multicorrelated set in the forecasting problem. Zavodskaya Lab. Diagnostics of materials, 2016, 82(3) : 68-74. [54]
    2. MLAlgorithms: Motrenko, Isachenko (submitted)
  • Basic algorithm: Partial Least Squares[55]
  • Solution: Create a feature selection algorithm alternative to PLS and taking into account the non-orthogonal structure of feature interdependence.
  • Novelty: A feature selection method is proposed that takes into account the regularities of both the independent and the dependent variables.
  • Authors: R.V. Isachenko, Strijov V.V.

7

  • Title: Automatic determination of the relevance of neural network parameters.
  • Problem: The problem of finding a stable (and not redundant in terms of parameters) neural network structure is considered. To cut off redundant parameters, it is proposed to introduce a priori probabilistic assumptions about the distribution of parameters and remove non-informative parameters from the neural network using the Belsley method. To adjust the prior distribution, it is proposed to use gradient methods.
  • Data: The MNIST sample of handwritten digits
  • Basic algorithm: Optimal Brain Damage, pruning based on variational inference. The structure of the final model is proposed to be compared with the model obtained by the AdaNet algorithm.
  • References:
    1. [56] Gradient hyperparameter optimization methods.
    2. [57] Gradient hyperparameter optimization methods.
    3. [58] Optimal Brain Damage.
    4. [59] AdaNet
    5. [60] Belsley Method
  • Authors: Oleg Bakhteev, Strijov V.V.
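For orientation, the Optimal Brain Damage baseline ranks each parameter by the saliency h·w²/2, where h is the corresponding diagonal element of the loss Hessian, and removes the parameters with the lowest saliency. The sketch below illustrates only this baseline, not the Belsley-based method proposed in the problem. The weights and Hessian diagonal are made-up numbers, and the function names are assumptions.

```python
def obd_saliencies(weights, hessian_diag):
    """Optimal Brain Damage saliency of each weight: 0.5 * h_kk * w_k^2."""
    return [0.5 * h * w * w for w, h in zip(weights, hessian_diag)]


def prune_least_salient(weights, hessian_diag, n_prune):
    """Zero out the n_prune weights with the smallest saliency."""
    saliencies = obd_saliencies(weights, hessian_diag)
    order = sorted(range(len(weights)), key=lambda i: saliencies[i])
    pruned = set(order[:n_prune])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


# Hypothetical weights and diagonal Hessian entries.
weights = [0.5, -2.0, 0.1, 1.0]
hessian_diag = [4.0, 0.01, 10.0, 1.0]
pruned_weights = prune_least_salient(weights, hessian_diag, 2)
```

Note that the weight with the largest magnitude (-2.0) is removed here because the loss is almost flat along it, which is where this criterion differs from plain magnitude pruning.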

8

  • Title: Prediction of the graph structure of the neural network model.
  • Problem: The problem of finding a stable (and non-redundant in terms of parameters) structure of a convolutional neural network is considered. It is proposed to predict the structure of a neural network using doubly-recurrent neural networks. As a training sample, it is proposed to use the structures of models that have shown good quality on small subsamples.
  • Data: Samples MNIST, CIFAR-10
  • Basic algorithm: random search. Comparison with work on reinforcement learning is possible.
  • References:
    1. [61] doubly-recurrent neural networks.
    2. [62] Similar approach using reinforcement learning.
  • Authors: Oleg Bakhteev, Strijov V.V.

9

  • Title: Deep Learning for reliable detection of tandem repeats in 3D protein structures more in PDF
  • Problem: Deep learning algorithms have pushed computer vision to a level of accuracy comparable to or higher than that of human vision. Similarly, we believe that it is possible to recognize the symmetry of a 3D object with very high reliability when the object is represented as a density map. The optimization problem includes i) multiclass classification of 3D data, where the output is the order of symmetry and the number of classes is ~10-20; ii) multioutput regression of 3D data, where the output is the symmetry axis (a 3-vector). The input data are typically 24x24x24 meshes. The total number of these meshes is on the order of a million. Biological motivation: Symmetry is an important feature of protein tertiary and quaternary structures that has been associated with protein folding, function, evolution, and stability. Its emergence and ensuing prevalence have been attributed to gene duplications, fusion events, and subsequent evolutionary drift in sequence. Methods to detect these symmetries exist, based either on the structure or on the sequence of the proteins; however, we believe that they can be vastly improved.
  • Data: Synthetic data are obtained by ‘symmetrizing’ folds from top8000 library (http://kinemage.biochem.duke.edu/databases/top8000.php).
  • References: Our previous 3D CNN: [63] Invariance of CNNs (and references therein): 01630265/document, [64]
  • Base algorithm: A prototype has already been created using the Tensorflow framework [4], which is capable of detecting the order of cyclic structures with about 93% accuracy. The main goal of this internship is to optimize the topology of the current neural network prototype and make it rotational and translational invariant with respect to input data. [4] [65]
  • Solution: The network architecture needs to be modified according to the invariance properties (most importantly, rotational invariance). Please see the links [66], [67]. The code is written using the Tensorflow library, and the current model is trained on a single GPU (Nvidia Quadro 4000) of a desktop machine.

  • Novelty: Applications of convolutional networks to 3D data are still very challenging due to the large amount of data and specific requirements on the network architecture. More specifically, the models need to be rotationally and translationally invariant, which makes classical 2D augmentation tricks only loosely applicable here. Thus, new models need to be developed for 3D data.
  • Authors: Expert Sergei Grudinin, consultants Guillaume Pages, Strijov V.V.

10

  • Title: Semi-supervised representation learning with attention
  • Problem: Train vector representations using the attention mechanism, which has significantly improved the quality of machine translation. It is proposed to use it in an encoder-decoder network to obtain vectors of text fragments of arbitrary length.
  • Data: It is proposed to consider two samples: Microsoft Paraphrase Corpus (a small set of sentences, https://www.microsoft.com/en-us/download/details.aspx?id=52398) and PPDB (a set of short segments whose markup is not always correct, http://sitem.herts.ac.uk/aeru/ppdb/en/)
  • References:
    1. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin. Attention Is All You Need (https://arxiv.org/abs/1706.03762).
    2. John Wieting, Mohit Bansal, Kevin Gimpel, Karen Livescu. Towards Universal Paraphrastic Sentence Embeddings (https://arxiv.org/abs/1511.08198).
    3. Ryan Kiros, Yukun Zhu, Ruslan Salakhutdinov, Richard S. Zemel, Antonio Torralba, Raquel Urtasun, Sanja Fidler. Skip Thought Vectors (https://arxiv.org/abs/1506.06726).
    4. Keras seq2seq (https://github.com/farizrahman4u/seq2seq).
  • Basic algorithm: solution [3] or vector representations obtained using seq2seq [4].
  • Solution: It is proposed to train vector representations for phrases using the attention mechanism and semi-supervised learning. As an intrinsic quality criterion, it is proposed to use the improved error function from [2]. As applied problems, paraphrase detection and sentiment analysis can be considered. Moreover, based on the results obtained in [1], it can be assumed that the attention mechanism has a greater influence on obtaining universal vectors for phrases than the network architecture. It is proposed to test this hypothesis using two different architectures: a standard recurrent network and a feed-forward network.
  • Novelty: new method.
  • Authors: Rita Kuznetsova, consultant
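The attention mechanism from [1] computes softmax(QKᵀ/√d_k)·V. The sketch below implements this scaled dot-product attention in plain Python and then averages the attended token vectors to get a fixed-length phrase vector. The averaging step and the name `phrase_vector` are assumptions made for illustration, not the architecture proposed in this problem.

```python
import math


def softmax(values):
    """Numerically stable softmax of a list of floats."""
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Return (outputs, weights) for softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        w = softmax(scores)
        weights.append(w)
        outputs.append(
            [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(len(values[0]))]
        )
    return outputs, weights


def phrase_vector(token_vectors):
    """Hypothetical pooling: self-attention over tokens, then the mean over positions."""
    attended, _ = scaled_dot_product_attention(token_vectors, token_vectors, token_vectors)
    n = len(attended)
    return [sum(row[c] for row in attended) / n for c in range(len(attended[0]))]


# Phrases of different lengths map to vectors of the same dimension.
short_phrase = phrase_vector([[1.0, 0.0]])
long_phrase = phrase_vector([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

The output dimension depends only on the embedding size, which is the property needed for text fragments of arbitrary length.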

11

  • Title: Selection of Interpretable Multimodels in Credit Scoring Problems
  • Problem: The credit scoring problem is to determine the level of creditworthiness of a borrower. For this, a borrower's questionnaire is used, containing both numerical features (age, income) and categorical features (gender, profession). Given historical information about loan repayment by other borrowers, it is required to determine whether the borrower will repay the loan. The data can be heterogeneous (for example, if a country has regions with different income levels), and several models will be needed for adequate classification. It is necessary to determine the optimal number of models. Based on the set of model parameters, it is necessary to draw up a portrait of the borrower.
  • Data: It is proposed to consider five samples from the UCI and Kaggle repositories, each with 50,000 objects or more.
  • References: A.A. Aduenko \MLAlgorithms\PhDThesis; C. Bishop, Pattern recognition and machine learning, final chapter; Twenty Years of Mixture of Experts.
  • Base algorithm: Clustering and building independent logistic regression models, Adaboost, Decision Forest (with restrictions on complexity), Mixture of Experts.
  • Solution: An algorithm is proposed for selecting a multi-model (a mixture of models or a mixture of Experts) and determining the optimal number of models.
  • Novelty: A distance function is proposed between models whose parameter distributions are defined on different supports.
  • Authors: Goncharov Alexey, Strijov V.V.

12

  • Title: Generation of features that are invariant to changes in the frequency of the time series.
  • Problem: Informally: there is a set of time series of a certain frequency (s1), and the information we are interested in is also distinguishable at a lower sampling rate (for example, the samples are taken every millisecond, while the events of interest occur at intervals of 0.1 s). These series are integrated, reducing the frequency by a factor of 10 (i.e. every 10 consecutive values are simply summed), and a set of time series s2 is obtained. The generated features should describe s1 and s2 in the same way. Formally: given a set of time series $s_1, \dots, s_{N_S}$ with a high sampling rate $\nu_1$. The target information (for example, hand movement/daily price fluctuation/…) is also distinguishable at a lower sampling rate $\nu_2 < \nu_1$. It is necessary to find a mapping $f: S \times \nu \to G$, where $\nu$ is the frequency of the series, that generates similar feature descriptions for series of different frequencies, i.e.

$f^* = \arg\min_f E\bigl(f_{\nu_1}(s_1) - f_{\nu_2}(s_2)\bigr)$, where $E$ is some error function.

  • Data: Sets of time series of people's physical activity from accelerometers; human EEG time series; time series of energy consumption of cities/industrial facilities. Sample link: UCI repository, our EEG and accelerometer samples.
  • References: See above for Accelerometers
  • Base algorithm: Fourier transform.
  • Solution: Building an autoencoder with a partially fixed internal representation as the same time series with a lower frequency.
  • Novelty: For time series there is no common approach to analysis, in contrast to, for example, image analysis. Viewed abstractly, a cat is now recognized just as well as a cat that takes up half the space in the image. An analogy with time series suggests itself. Moreover, the nature of data in images and in time series is similar: in images there is a hierarchy between values along two axes (x and y), and in time series along one axis, the time axis. The hypothesis is that methods similar to those of image analysis will give good results. The resulting feature representation can be further used for classification and prediction of time series.
  • Authors: R. G. Neichev, Strijov V.V.
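The integration step described in the problem (summing every 10 consecutive values) and the requirement that features agree across sampling rates can be illustrated with a toy example. The feature used below, the amount of signal accumulated per second, is a deliberately trivial assumption chosen because it is exactly invariant under this kind of integration; the function names are likewise hypothetical.

```python
def integrate(series, factor):
    """Sum every `factor` consecutive values; a trailing incomplete block is dropped."""
    n_blocks = len(series) // factor
    return [sum(series[i * factor:(i + 1) * factor]) for i in range(n_blocks)]


def accumulated_per_second(series, samples_per_second):
    """Toy rate-aware feature: total signal divided by the duration in seconds."""
    return sum(series) * samples_per_second / len(series)


# s1 is sampled at 1000 Hz, s2 is its integrated version at 100 Hz.
s1 = list(range(100))
s2 = integrate(s1, 10)

f1 = accumulated_per_second(s1, 1000)
f2 = accumulated_per_second(s2, 100)
error = (f1 - f2) ** 2  # squared difference as the error function E
```

A learned mapping, such as the autoencoder suggested in the solution, would aim for the same near-zero discrepancy while keeping far more information than a single number.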

18

2017 Group 2

Author | Topic | Link | Consultant | Reviewer | Report | Letters | \Sigma=3+13
Goncharov Alexey | Metric classification of time series | code, paper, slides | Maria Popova | Zadayanchuk Andrey | BMF | AILSBRCVTDSWH> |
Belykh Evgeny, Proskurin Alexander | Classification of superpositions of movements of physical activity | paper, slides, code | Maria Vladimirova, Alexandra Malkova | Romanenko Ilya, Popovkin Andrey, review, video | MF | AILSBRC>V> [AILSBRC0VT0E0D0WS] CTD | 2+9
Zueva Nadezhda | Style Change Detection | paper, slides, video | Rita Kuznetsova | Igashov Ilya, review | BHMF | AIL-S-B-R- [AILSBRCV0TE0D0WS] | 3+10
Igashov Ilya | Formulation and solution of an optimization problem combining classification and regression to estimate the binding energy of a protein and small molecules. | paper, slides, video | Sergei Grudinin, Maria Kadukova | Manucharyan Vardan, review, correction | BHMF | AILBS+BRHC>V> [AILSBRCVTE0D0WS] | 3+11
Kalugin Dmitry | Graph Structure Prediction of a Neural Network Model | paper, slides | Oleg Bakhteev | Zueva Nadezhda, review | BHM | AI-L-S--B0R0C0V0 [A-ILSBR0CVT0ED0WS] | 2+11
Manucharyan Vardan | Prediction of properties and types of atoms in molecular graphs using convolutional networks | paper, slides, code, video | Sergei Grudinin, Maria Kadukova | Fattakhov Artur, review | BMF | AILS>B> [AILSB0R0CV0TE0D0WS] VED | 3+7
Muraviev Kirill | Determination of neural network parameters to be optimized. | paper, slides, code, video | Oleg Bakhteev | Kalugin Dmitry, review | BHMF | A+IL-S-B-RCVTED [AILSBRCV0TE0DWS] | 3+12
Murzin Dmitry, Danilov Andrey | Text recognition based on skeletal representation of thick lines and convolutional networks | paper, slides, code, video | L. M. Mestetsky, Ivan Reyer, Zharikov I. N. | Muraviev Kirill, review | BHMF | A+IL> [AILSB0R0CV0TE0D0WS] | 3+8
Popovkin Andrey, Romanenko Ilya | Creation of ranking models for information retrieval systems. Algorithm for Predicting the Structure of Locally Optimal Models | paper, slides, code, video | Kulunchakov Andrey, Strijov V.V. | Proskurin Alexander, Belykh Evgeny, review | BHMF | AILS0BC>V> [AILSBRC0VTED0WS] | 3+11
Fattakhov Artur | Style Change Detection | paper, slides, code, video | Rita Kuznetsova | Danilov Andrey, Murzin Dmitry, review | BMF | AIL-S-B-R-CVTDSWH [AILSBRCVTE0D0WS] | 3+11


1 (1-2)

  • Title: Classification of superpositions of movements of physical activity
  • Problem: Human behavior analysis by mobile phone sensor measurements: detect human movements from accelerometer data. The accelerometer data is a signal without precise periodicity, which contains an unknown superposition of physical models. We will consider the superposition of models: body + arm/bag/backpack.

Classification of human activities according to measurements of fitness bracelets. According to the measurements of the accelerometer and gyroscope, it is required to determine the type of activity of the worker. It is assumed that the time series of measurements contain elementary movements that form clusters in the space of time series descriptions. (Development: The characteristic duration of movement is seconds. Time series are marked with activity type marks: work, rest. The characteristic duration of activity is minutes. It is required to restore the type of activity by the description of the time series and cluster.)

  • Data:
    1. Self-collected data
    2. Builders data
    3. WISDM accelerometer time series (Time series (examples library), Accelerometry section).
  • References:
    1. Karasikov M. E., Strijov V. V. Classification of time series in the space of parameters of generating models // Informatics and its applications, 2016. [URL]
    2. Kuznetsov M.P., Ivkin N.P. Algorithm for classification of accelerometer time series by combined feature description // Machine Learning and Data Analysis. 2015. T. 1, No. 11. C. 1471-1483. [URL]
    3. Isachenko R. V., Strijov V. V. Metric learning in problems of multiclass classification of time series // Informatics and its applications, 2016, 10(2): 48-57. [URL]
    4. Zadayanchuk A.I., Popova M.S., Strijov V.V. Choice of the optimal model for classifying physical activity based on accelerometer measurements // Information technologies, 2016. [URL]
    5. Motrenko A.P., Strijov V.V. Extracting fundamental periods to segment human motion time series // Journal of Biomedical and Health Informatics, 2016, Vol. 20, no. 6, 1466-1476. [URL]
    6. Ignatov A., Strijov V. Human activity recognition using quasiperiodic time series collected from a single triaxial accelerometer // Multimedia Tools and Applications, 2015, 17.05.2015 : 1-14. [URL]
  • Base algorithm: Basic algorithm is described in [Karasikov, Strijov: 2016] and [Kuznetsov, Ivkin: 2014].
  • Solution: Find the optimal segmentation method and optimal description of the time series. Construct a metric space of descriptions of elementary motions.
  • Novelty: A method for classifying and analyzing complex movements is proposed (Development: Connection of two characteristic times of a description of a person's life, combined problem statement.)
  • Authors: Alexandra Malkova, Maria Vladimirova, R. G. Neichev, Strijov V.V.

2 (1)

3 (1-2)

  • Title: Text recognition based on skeletal representation of thick lines and convolutional networks
  • Problem: It is required to build two CNNs, one recognizes a bitmap representation of an image, the other a vector one. (Development: generation of thick lines by neural networks)
  • Data: Bitmap fonts.
  • References: List of works [68], in particular arXiv:1611.03199 and
  • Basic algorithm: Convolutional network for bitmap images.
  • Solution: It is required to propose a method for collapsing graph structures, which allows generating an informative description of the skeleton of a thick line.
  • Novelty: A way to improve the quality of recognition of thick lines due to a new way of generating their descriptions is proposed.
  • Authors: L. M. Mestetsky, I. A. Reyer, Strijov V.V.

4 (1-2)

  • Title: Creation of ranking models for information retrieval systems. Algorithm for Predicting the Structure of Locally Optimal Models
  • Problem: It is required to predict a time series using some parametric superposition of algebraic functions. It is proposed not to construct the forecasting model directly, but to predict it, that is, to predict the structure of the approximating superposition. A class of admissible superpositions is introduced, and on the set of such structural descriptions a search is made for a locally optimal model for the problem under consideration. The problem consists of 1) searching for a suitable structural description of the model; 2) describing the search algorithm for the structure that corresponds to the optimal model; 3) describing the algorithm for the inverse construction of the model from its structural description. For an existing example of answers to questions 1-3, see the works of A. A. Varfolomeeva.
  • Data:
    1. Collection of text documents TREC (!)
    2. A set of time series, which implies the restoration of functional dependencies. It is proposed to first use synthetic data or immediately apply the algorithm to forecasting time series 1) electricity consumption 2) physical activity with subsequent analysis of the resulting structures.
  • References:
    1. (!) Kulunchakov A.S., Strijov V.V. Generation of simple structured Information Retrieval functions by genetic algorithm without stagnation // Expert Systems with Applications, 2017, 85: 221–230.
    2. A. A. Varfolomeeva Selection of features when marking up bibliographic lists using structural learning methods, 2013, [69]
    3. Bin Cao, Ying Li and Jianwei Yin Measuring Similarity between Graphs Based on the Levenshtein Distance, 2012, [70]
  • Base algorithm: Specifically, there is no basic algorithm for the proposed problem. It is proposed to try to repeat the experiment of A.A. Varfolomeeva for a different structural description in order to understand what is happening.
  • Solution: The superposition of algebraic functions defines a directed tree whose vertices carry the labels of the corresponding algebraic functions or variables. Therefore, the structural description of such a superposition can be its DFS-code: a string of vertex labels written in the order in which the tree is traversed by depth-first search. Knowing the arities of the corresponding algebraic functions, we can decode any such DFS-code in O(n) and recover the superposition of functions. On the set of such string descriptions, it is proposed to search for the description that corresponds to the optimal model.
  • Authors: Kulunchakov Andrey, Strijov V.V.
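The DFS-code described in the solution can be sketched as follows. A superposition is stored as nested tuples, encoded as a list of labels in depth-first order, and restored in a single pass using the arity of each label. The tuple representation, the label names, and the `ARITY` table are assumptions made for illustration.

```python
# Hypothetical label set: arity 0 marks variables, arity > 0 marks functions.
ARITY = {"plus": 2, "times": 2, "sin": 1, "x": 0, "y": 0}


def dfs_code(tree):
    """Encode a tree as the list of its labels in depth-first (pre-order) traversal."""
    if isinstance(tree, str):
        return [tree]
    label, *children = tree
    code = [label]
    for child in children:
        code.extend(dfs_code(child))
    return code


def restore_tree(code, arity=ARITY):
    """Rebuild the tree from its DFS-code; each label is read exactly once."""
    position = 0

    def parse():
        nonlocal position
        if position >= len(code):
            raise ValueError("code ends before the tree is complete")
        label = code[position]
        position += 1
        children = [parse() for _ in range(arity[label])]
        return (label, *children) if children else label

    tree = parse()
    if position != len(code):
        raise ValueError("trailing labels after a complete tree")
    return tree


# sin(x) + x * y
tree = ("plus", ("sin", "x"), ("times", "x", "y"))
code = dfs_code(tree)
restored = restore_tree(code)
```

Not every string of labels is a valid code, so a search over string descriptions needs the validity check that `restore_tree` performs or a generator that respects arities.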

5 (1)

  • Title: Determination of neural network parameters to be optimized.
  • Problem: The problem of neural network optimization is considered. It is required to divide the model parameters into two groups:
    1. a) Model parameters to be optimized
    2. b) Model parameters whose optimization has been completed. Further optimization of these parameters will not improve the quality of the model.

It is proposed to consider the optimization of parameters as a stochastic process. Based on the history of the process, we find those parameters whose optimization is no longer required.

  • Data: The MNIST sample of handwritten digits
  • Basic algorithm: Random choice of parameters.
  • References:
    1. [71] SGD as a stochastic process.
    2. [72] Variational inference in neural networks.
  • Novelty: The resulting algorithm will significantly reduce the computational cost of optimizing neural networks. A possible further development of the method is to obtain estimates for the parameters of the network obtained from the original operations of expansion, compression, adding and removing layers.
  • Authors: Oleg Bakhteev, Strijov V.V.
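Splitting parameters by their optimization history can be illustrated with a simple heuristic. A parameter is treated as finished when the mean of its recent update steps is indistinguishable from zero relative to their noise. This z-score rule, the window length, and the function name are assumptions chosen for illustration; they are not the criterion from the cited works.

```python
import math


def converged_mask(history, window, z_threshold=2.0):
    """Flag parameters whose recent steps show no systematic drift.

    history: list of parameter vectors, one per optimization step.
    Returns a list of booleans, True meaning "optimization completed".
    """
    if window < 2 or len(history) < window + 1:
        raise ValueError("need window >= 2 and at least window + 1 recorded steps")
    recent = history[-(window + 1):]
    mask = []
    for j in range(len(recent[0])):
        steps = [recent[t + 1][j] - recent[t][j] for t in range(window)]
        mean = sum(steps) / window
        var = sum((s - mean) ** 2 for s in steps) / (window - 1)
        std_err = math.sqrt(var / window)
        # Converged if the mean step lies within the noise band around zero.
        mask.append(abs(mean) <= z_threshold * std_err)
    return mask


# Toy history: parameter 0 still drifts, parameter 1 oscillates, parameter 2 is frozen.
history = [[0.1 * t, 1.0 + 0.1 * (t % 2), 5.0] for t in range(9)]
mask = converged_mask(history, window=8)
```

Parameters flagged True could be excluded from further gradient updates, which is the source of the computational savings mentioned under novelty.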

6 (1)

  • Title: Prediction of the graph structure of the neural network model.
  • Problem: The problem of finding a stable (and non-redundant in terms of parameters) structure of a convolutional neural network is considered. It is proposed to predict the structure of a neural network using doubly-recurrent neural networks. As a training sample, it is proposed to use the structures of models that have shown good quality on small subsamples.
  • Data: Samples MNIST, CIFAR-10
  • Basic algorithm: random search. Comparison with work on reinforcement learning is possible.
  • References:
    1. [73] doubly-recurrent neural networks.
    2. [74] Similar approach using reinforcement learning.
  • Authors: Oleg Bakhteev, Strijov V.V.

7 (1)

PAN 2017 (http://pan.webis.de/clef17/pan17-web/author-identification.html) PAN 2016 (http://pan.webis.de/clef16/pan16-web/author-identification.html)

  • References:
    1. Ian Goodfellow. NIPS 2016 Tutorial: Generative Adversarial Networks (https://arxiv.org/pdf/1701.00160.pdf)
    2. Jiwei Li, Will Monroe, Tianlin Shi, Sebastien Jean, Alan Ritter and Dan Jurafsky. Adversarial Learning for Neural Dialogue Generation (https://arxiv.org/pdf/1701.06547.pdf)
    3. M. Kuznetsov, A. Motrenko, R. Kuznetsova, V. Strijov. Methods for Intrinsic Plagiarism Detection and Author Diarization
    4. K. Safin, R. Kuznetsova. Style Breach Detection with Neural Sentence Embeddings (https://pdfs.semanticscholar.org/c70e/7f8fbc561520accda7eea2f9bbf254edb255.pdf)
  • Basic algorithm: solution described in [3, 4].
  • Solution: It is proposed to solve the problem using generative adversarial networks: the generative model generates texts in the same author's style, and the discriminative model is a binary classifier.
  • Novelty: It is assumed that solving this problem by the proposed method can improve quality compared to typical methods for this problem, as well as for related author clustering problems.
  • Authors: Rita Kuznetsova (consultant), Strijov V.V.

8 (1)

  • Title: Obtaining likelihood estimates using autoencoders
  • Problem: It is assumed that the objects under consideration obey the manifold hypothesis (manifold learning): high-dimensional vectors are concentrated around some subspace of lower dimension. Works [1, 2] show that some modifications of autoencoders look for a k-dimensional manifold in the object space that conveys the data structure most fully. In [2], an estimate of the probability density of the data is derived using an autoencoder. It is required to obtain a similar estimate for the likelihood of the model.
  • Data: it is proposed to experiment on short text fragments of Google ngrams (http://storage.googleapis.com/books/ngrams/books/datasetsv2.html)
  • References:
    1. Pascal Vincent, Hugo Larochelle, Isabelle Lajoie, Yoshua Bengio, Pierre-Antoine Manzagol. Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion (http://www.jmlr.org/papers/volume11/vincent10a/vincent10a.pdf).
    2. Guillaume Alain, Yoshua Bengio. What Regularized Auto-Encoders Learn from the Data Generating Distribution (https://arxiv.org/pdf/1211.4246.pdf)
    3. Hanna Kamyshanska, Roland Memisevic. The Potential Energy of an Autoencoder (https://www.iro.umontreal.ca/~memisevr/pubs/AEenergy.pdf)
  • Basic algorithm:
  • Solution: It is proposed to train vector representations of phrases (n-grams) using an autoencoder, to use Theorem 2 in [2] to obtain an estimate of the likelihood of the sample and, using this estimate, to derive the likelihood of the model. Using the estimates obtained, one can also consider the sampling process.
  • Novelty: obtaining data and model likelihood estimates, generating texts using the resulting estimates.
  • Authors: Rita Kuznetsova (consultant).

9 (1)

  • Title: Predict properties and types of atoms in molecular graphs using convolutional networks.
  • Problem: Multilabel classification using convolutional neural networks (CNN) on graphs.

To predict the interaction of molecules with each other, it is often necessary to correctly describe their constituent atoms by assigning certain types to them. For small molecules, not many descriptors are available: the coordinates and chemical elements of atoms, the lengths of bonds and the magnitude of the angles between them. Using these features, we successfully predict atomic hybridizations and bond types. In this approach, each atom is considered "individually", the information about neighboring atoms necessary to determine the type of an atom is practically not used, and the types of atoms are determined by checking a large number of conditions. At the same time, molecules are represented as 3D molecular graphs, and it would be interesting to use this to predict their types with machine learning methods, for example, using CNNs. It is necessary to predict the types of vertices and edges of molecular graphs:

    1. atom type (graph vertex type, about 150 classes),
    2. atom hybridization (auxiliary feature, vertex type, 4 classes),
    3. bond type (auxiliary feature, edge type, 5 classes).

The type of an atom (graph vertex) is based on information about its hybridization and the properties of neighboring atoms. Therefore, in the case of a successful solution of the classification problem, clustering can be carried out to find other ways to determine the types of atoms.

  • Data: About 15 thousand molecules represented as molecular graphs. For each vertex (atom), 3D coordinates and a chemical element are known. Additionally, bond lengths, angles and dihedral angles between atoms (3D graph coordinates) are calculated, as well as binary features indicating whether an atom belongs to a cycle and whether it is terminal. The sample is labeled, but the labels may contain ~5% errors.

If there is not enough data, the sample can be increased (up to 200 thousand molecules), at the cost of more labeling inaccuracies.

  • References:
    1. [75]
    2. [76]
    3. [77]
  • Base algorithm: Prediction of hybridizations and bond orders using a multiclass non-linear SVM with a small number of descriptors. https://hal.inria.fr/hal-01381010/document
  • Solution: At the first stage, it will be necessary to determine the operations on graphs needed to build the network architecture. Next, the network must be trained for multi-class classification of the types of vertices (and edges) of the input graph. To assess the quality of the algorithm, accuracy is to be estimated using cross-validation. For the final publication (in a specialized journal), a specific test of prediction quality will be needed: based on the predicted bond types, the molecule is written as a string (in SMILES format) and compared with a reference. In this case, the prediction for a molecule is considered correct only if the types of all bonds in it were predicted without errors.

  • Novelty: The proposed molecular graphs have a 3D structure and internal hierarchy, making them an ideal CNN application.
  • Authors: Sergei Grudinin, Maria Kadukova, Strijov V.V.
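
The molecule-level test described in the solution (a molecule counts as correct only if every bond type is right) can be sketched as follows. This is a simplified illustration: it compares lists of bond labels directly instead of SMILES strings, and the function name `molecule_level_accuracy` is an assumption.

```python
def molecule_level_accuracy(true_bonds, predicted_bonds):
    """Share of molecules whose bond types are all predicted correctly.

    true_bonds, predicted_bonds: lists with one sequence of bond-type labels
    per molecule, in the same molecule and bond order.
    """
    if len(true_bonds) != len(predicted_bonds):
        raise ValueError("both inputs must describe the same molecules")
    if not true_bonds:
        raise ValueError("at least one molecule is required")
    # A molecule is counted only when its whole label sequence matches.
    correct = sum(
        1 for t, p in zip(true_bonds, predicted_bonds) if list(t) == list(p)
    )
    return correct / len(true_bonds)
```

This criterion is much stricter than per-bond accuracy: a single wrong bond makes the whole molecule count as an error, so it drops quickly as molecules get larger.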

10 (1)

  • Title: Formulation and solution of an optimization problem combining classification and regression to estimate the binding energy of a protein and small molecules. The problem description [78]
  • Problem:

From the point of view of bioinformatics, the problem is to estimate the free energy of protein binding to a small molecule (ligand): the best ligand in its best position has the lowest free energy of interaction with the protein. (The full description is long; see the file at the link above.)

  • Data:
    1. Data for binary classification.

Approximately 12,000 protein-ligand complexes: for each of them there is 1 native position and 18 non-native ones. The main descriptors are histograms of distributions of distances between different atoms of the protein and ligand, the dimension of the vector of descriptors is ~ 20,000. In the case of continued research and publication in a specialized journal, the set of descriptors can be expanded. The data will be provided as binary files with a python script to read.

    2. Data for regression.

For each of the presented complexes, the value of the quantity is known, which can be interpreted as the binding energy.

  • References:
    1. SVM [79]
    2. Ridge Regression [80]
    3. [81] (section 1)
  • Base algorithm: [82]

In the classification problem, we used an algorithm similar to linear SVM, whose relationship with the energy estimate, which is outside the scope of the classification problem, is described in the above article. Various loss functions can be used in a regression problem.

  • Solution: It is necessary to connect the previously used optimization problem with the regression problem and solve it using standard methods. Cross-validation will be used to check the operation of the algorithm.

There is a separate test set consisting of (1) 195 protein-ligand complexes for which the best ligand pose must be found (the algorithm for generating ligand positions differs from that used in training), (2) protein-ligand complexes for whose native poses the binding energy must be predicted, and (3) 65 proteins for which the most strongly binding ligand is to be found.

  • Novelty: The main interest is in combining the classification and regression problems.

The correct assessment of the quality of protein and ligand binding is used in drug development to search for molecules that interact most strongly with the protein under study. Using the classification problem described above to predict the binding energy results in an insufficiently high correlation of predictions with experimental values, while using the regression problem alone leads to overfitting.

  • Authors: Sergei Grudinin, Maria Kadukova, Strijov V.V.
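
A minimal sketch of what a combined objective could look like, assuming a linear energy model E(x) = w·x shared by both parts: a hinge term asks each native pose to score lower than its non-native poses, and a squared term fits the known binding energies. The function name, the unit margin and the weights `alpha` and `lam` are assumptions for illustration; the actual formulation is given in the problem description [78].

```python
import numpy as np

def joint_loss(w, native, decoys, energies, alpha=1.0, lam=1e-3):
    """Ranking (classification-like) loss plus regression loss for one linear model.

    w:        (d,) weight vector of the linear energy model
    native:   (n, d) descriptors of native poses
    decoys:   (n, k, d) descriptors of k non-native poses per complex
    energies: (n,) reference binding energies of the native poses
    """
    e_native = native @ w    # (n,) predicted energies of native poses
    e_decoys = decoys @ w    # (n, k) predicted energies of non-native poses
    # Hinge: penalize decoys that are not at least 1 above the native energy.
    hinge = np.maximum(0.0, 1.0 + e_native[:, None] - e_decoys)
    ranking = hinge.mean()
    regression = np.mean((e_native - energies) ** 2)
    return ranking + alpha * regression + 0.5 * lam * (w @ w)
```

The weight `alpha` trades off the two tasks: `alpha = 0` recovers a pure ranking problem, while a very large `alpha` approaches plain ridge regression on the native poses.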

2017

Author | Topic | Link | Consultant | Reviewer | Report | Letters
Goncharov Alexey (example) | Metric classification of time series | code, paper, slides | Maria Popova | Zadayanchuk Andrey | BMF | AILSBRCVTDSWH>
Alekseev Vasily | Intratext coherence as a measure of interpretability of thematic models of text collections | code, data, paper, slides, video | Viktor Bulatov | Zakharenkov Anton | BMF | AILSB+RC+V+TDHW
Anikeev Dmitry | Local approximation of time series for building predictive metamodels | code, paper, slides | Strijov V.V. | Smerdov Anton | BMF | AILS>B0R0C0V0T0D0H0W0
Gasanov Elnur | Construction of an approximating description of a scalogram in the problem of predicting movements using an electrocorticogram | code, paper, slides | Anastasia Motrenko | Kovalev Dmitry | BMF | AILSBRCVTDH0W0
Zakharenkov Anton | Massively multitask deep learning for drug discovery | code, paper, slides, video | Maria Popova | Alekseev Vasily | BMF | AILSBRCVT>D>H0W0
Kovalev Dmitry | Unsupervised representation for molecules | code, paper, slides | Maria Popova | Gasanov Elnur | BMF | AILSBRCVT>D>H0W0
Novitsky Vasily | Feature Selection in Problems of Autoregressive Prediction of Biomedical Signals | paper, code, slides | Alexander Katrutsa | | B - F | AILS>B0R0C0V0T0D0H0W0
Selezneva Maria | Aggregation of heterogeneous text collections in a hierarchical thematic model of Russian-language popular science content | paper, code, slides, video | Irina Efimova | Sholokhov Alexey | BMF | A+IL+SBRCVTDHW
Smerdov Anton | Choosing the optimal recurrent network model in the paraphrase search problem | paper, code, slides, video | Oleg Bakhteev | Dmitry Anikeev | BMF | AIL+SB+RC>V+M-T>D0H0W0
Uvarov Nikita | Optimal Algorithm for Reconstruction of Dynamic Models | paper, slides, code, video | Yuri Maksimov | | BMF | AILS0B0R0C0V0T0D0H0W0
Usmanova Karina | Multiple Manifold Learning (Joint diagonalization for 3D shapes - AJD on Hessian matrices) | paper, slides, code, video | Mikhail Karasikov | Innokenty Shibaev | BMF | AILSBRC+VT+EDH>W
Innokenty Shibaev | Convex relaxations for multiple structure alignment (synchronization problem for SO(3)) | paper, slides, code, video | Mikhail Karasikov | Usmanova Karina | BMF | AILS-BRCVT>D>H>W
Sholokhov Alexey | Noise immunity of methods for informational analysis of ECG signals | paper, code, slides, video | Vlada Bunakova | Selezneva Maria | BMF | AILSBRCVTDHW


Risky works

Author | Topic | Link | Consultant | Reviewer | Report | Letters
Kaloshin Pavel | Using deep learning networks to transfer classification models in case of insufficient data | paper, code, data | Anton Khritankov | | - MF | AIL-SBRC-VT+D>H>W0
Malinovsky Grigory | Choice of Interpreted Multimodels in Credit Scoring Problems | paper, code | Alexander Aduenko | out | B - - | AILS-B>R>C>V>T0D0H0W0
Pletnev Nikita | Internal plagiarism detection | paper | Rita Kuznetsova | out | - - - | A-I-L-S>B0R0C0V0T0D0H0W0
Grevtsev Alexander | Parallel Algorithms for Parametric Identification of the Tersoff Potential for AlN | paper | Karine Abgaryan | | |
Zaitsev Nikita | Automatic classification of scientific articles on crystallography | paper, readme | Evgeny Gavrilov | | |
Diligul Alexander | Determination of the optimal potential parameters for the Rosato-Guillope-Legrand (RGL) model from experimental data and the results of quantum mechanical calculations | paper | Karine Abgaryan | | |
Daria Fokina | Selection of Candidates in the Problem of Finding Text Borrowings with Paraphrasing Based on the Vectorization of Text Fragments | | Alexey Romanov | | | AILSB0R0C0V0T0D0H0W0

1. 2017

  • Title: Classification of human activities according to fitness bracelet measurements.
  • Problem: According to the accelerometer and gyroscope measurements, it is required to determine the type of worker's activity. It is assumed that the time series of measurements contain elementary movements that form clusters in the space of time series descriptions. The characteristic duration of the movement is seconds. Time series are labeled with activity type labels: work, leisure. The typical duration of activity is minutes. It is required to restore the type of activity according to the description of the time series and cluster.
  • Data: WISDM accelerometer time series (Time series (examples library), Accelerometry section).
  • References:
    1. Karasikov M.E., Strijov V.V. Classification of time series in the space of parameters of generating models // Informatics and its applications, 2016. [URL]
    2. Kuznetsov M.P., Ivkin N.P. Algorithm for Classifying Accelerometer Time Series by Combined Feature Description // Machine Learning and Data Analysis. 2015. V. 1, No. 11. C. 1471 - 1483. [URL]
    3. Isachenko R.V., Strijov V.V. Metric learning in the problem of multiclass classification of time series // Informatics and its applications, 2016, 10(2) : 48-57. [URL]
    4. Zadayanchuk A.I., Popova M.S., Strijov V.V. Choosing the optimal model for classifying physical activity based on accelerometer measurements // Information technologies, 2016. [URL]
    5. Motrenko A.P., Strijov V.V. Extracting fundamental periods to segment human motion time series // Journal of Biomedical and Health Informatics, 2016, Vol. 20, no. 6, 1466 - 1476.
    6. Ignatov A., Strijov V. Human activity recognition using quasiperiodic time series collected from a single triaxial accelerometer // Multimedia Tools and Applications, 2015, 17.05.2015 : 1-14. [URL]
  • Base algorithm: The basic algorithm is described in [Karasikov, Strijov: 2016] and [Kuznetsov, Ivkin: 2015].
  • Solution: Find the optimal segmentation method and optimal description of the time series. Construct a metric space of descriptions of elementary motions.
  • Novelty: Two characteristic time scales of human activity (seconds for elementary movements, minutes for activity types) are linked in a combined problem statement.
  • Authors: Strijov V.V., M.P. Kuznetsov, P.V. Levdik.

2. 2017

  • Title: Construction of an approximating description of a scalogram in the problem of predicting movements using an electrocorticogram.
  • Problem: As part of decoding ECoG signals, the problem of classifying movements from time series of electrode readings is solved. Features are extracted from ECoG time series as coefficients of the wavelet transform of the signal [Makarchuk 2016]; from these, a scalogram is built for each electrode: a two-dimensional array of features in frequency-time space. Combining the scalograms of all electrodes gives a feature description of the time series in the spatial-frequency-time domain. The feature description constructed in this way contains multicollinear features and is redundant. It is required to propose a method for reducing the dimension of the feature space.
  • Data: Measurements of the positions of the fingers when performing simple gestures. Description of experiments data.
  • References:
    1. Makarchuk G.I., Zadayanchuk A.I., Strijov V.V. Using partial least squares to decode hand movement using ECoG cues in monkeys. 2016. pdf
    2. Karasikov M.E., Strijov V.V. Classification of time series in the space of parameters of generating models // Informatics and its applications, 2016. [URL]
    3. Kuznetsov M.P., Ivkin N.P. Algorithm for Classifying Accelerometer Time Series by Combined Feature Description // Machine Learning and Data Analysis. 2015. V. 1, No. 11. C. 1471 - 1483.
  • Base algorithm: PLS

Chen C, Shin D, Watanabe H, Nakanishi Y, Kambara H, et al. (2013) Prediction of Hand Trajectory from Electrocorticography Signals in Primary Motor Cortex. PLoS ONE 8(12): e83534.

  • Solution: To reduce the dimension, it is proposed to use the local approximation method of [Kuznetsov 2015], which has been used to classify accelerometer time series [Karasikov 2016].
  • Novelty: A new method of movement recovery based on electrocorticograms is proposed.
  • Authors: Strijov V.V., A.P. Motrenko
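
To make the object being compressed concrete, here is a minimal NumPy sketch of a scalogram for a single channel, computed with a complex Morlet wavelet. It only illustrates the "scale by time" feature array mentioned in the problem; the wavelet choice, the parameter `omega0` and the function name are assumptions, not the preprocessing used in [Makarchuk 2016].

```python
import numpy as np

def morlet_scalogram(signal, scales, omega0=6.0):
    """Return an array (n_scales, n_samples) of wavelet power |W(s, t)|^2.

    Assumes the signal is longer than the widest wavelet (about 8 * max(scales)
    samples), so that np.convolve(..., mode="same") keeps the signal length.
    """
    signal = np.asarray(signal, dtype=float)
    rows = []
    for s in scales:
        half = int(np.ceil(4 * s))                 # support of +-4 scale units
        t = np.arange(-half, half + 1) / s
        wavelet = np.exp(1j * omega0 * t) * np.exp(-0.5 * t ** 2) / np.sqrt(s)
        # Correlation with the wavelet = convolution with its reversed conjugate.
        coeffs = np.convolve(signal, np.conj(wavelet)[::-1], mode="same")
        rows.append(np.abs(coeffs) ** 2)
    return np.vstack(rows)
```

Stacking such arrays over all electrodes gives the redundant feature description discussed above; neighboring scales and time points are strongly correlated, which is what motivates dimension reduction.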

3. 2017

  • Title: Multiple Manifold Learning (Joint diagonalization for 3D shapes - AJD on Hessian matrices).
  • Problem: Building an optimal algorithm for the Multiple Manifold Learning problem. Two protein conformations (two tertiary structures) are given. In the vicinity of each state, a model of an elastic body is specified (oscillations of the structure in the vicinity of these states). The problem is to build a general model of an elastic body to find intermediate states with the maximum match with these models in the vicinity of the given conformations. The space of motion of an elastic body is given by the Hessian eigenvectors. It is required to find a common low-rank approximation of the spaces of motions of the two elastic bodies.
  • Data: Protein structures in double conformations from PDB, about 100 sets from the article https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4677049/
  • References:
    1. Tirion, M. M. (1996). Large amplitude elastic motions in proteins from a single-parameter, atomic analysis. Physical Review Letters, 77(9), 1905.
    2. Moal, I. H., & Bates, P. A. (2010). SwarmDock and the Use of Normal Modes in Protein-Protein Docking. IJMS, 11(10), 3623–3648. https://doi.org/10.3390/ijms11103623

  • Base algorithm: AJD algorithm: http://perso.telecom-paristech.fr/~cardoso/jointdiag.html, AJD algorithms implemented as part of Shogun ML toolbox http://shogun-toolbox.org , http://shogun-toolbox.org/api/latest/classshogun_1_1CApproxJointDiagonalizer.html.
  • Solution: Computing Hessians (C++ code from Sergey), learning and running standard joint diagonalization algorithms for the first n non-trivial eigenvectors, analyzing loss functions, adapting the standard algorithm to solve the original problem.
  • Novelty: Using simple elasticity models with one or more free parameters, thermal fluctuations in proteins can be described. However, such models do not describe transitions between several stable conformations in proteins. The purpose of this work is to refine the elastic model so that it also describes the space of conformational changes.
  • Authors: Sergey Grudinin, consultant: Mikhail Karasikov / Yury Maksimov.
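
A common way to measure how well one basis diagonalizes several matrices at once is the total off-diagonal energy; approximate joint diagonalization (AJD) algorithms minimize a criterion of this kind. The sketch below only evaluates such a criterion for a given basis. The function name is an assumption, and this is not the Shogun API.

```python
import numpy as np

def off_diagonal_energy(matrices, basis):
    """Sum of squared off-diagonal entries of B^T C B over all matrices C.

    matrices: iterable of symmetric (n, n) arrays, e.g. Hessians of the two
              elastic models.
    basis:    (n, r) array whose columns are candidate common modes (r <= n).
    A value of zero means the basis diagonalizes every matrix exactly.
    """
    total = 0.0
    for c in matrices:
        d = basis.T @ c @ basis
        total += np.sum(d ** 2) - np.sum(np.diag(d) ** 2)
    return total
```

For two Hessians that do not commute no exact common eigenbasis exists, so the minimum of this criterion over orthonormal bases is positive; its value indicates how far the two motion spaces are from sharing a basis.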

4. 2017

  • Title: Convex relaxations for multiple structure alignment (synchronization problem for SO(3)).
  • Problem: Find transformations that align protein tertiary structures simultaneously (in simple words: find orthogonal transformations that align, in R^3, molecules that have the same chemical formula). If the structures are the same (the RMSD is equal to zero after alignment, so the structures are aligned exactly), then they can be aligned in pairs. However, if this is not the case, then the basic algorithm, generally speaking, does not find the optimum of the original problem with a loss function for simultaneous alignment.
  • Data: Protein structures in PDB format in various states and coordinate systems.
  • References:
    Multiple structural alignment:
    1. Kearsley S.K. (1990) J. Comput. Chem., 11, 1187-1192.
    2. Shapiro A., Botha J.D., Pastore A. and Lesk A.M. (1992) Acta Crystallogr., A48, 11-14.
    3. Diamond R. (1992) Protein Sci., 1, 1279-1287.
    4. May A.C., Johnson M.S. Improved genetic algorithm-based protein structure comparisons: pairwise and multiple superpositions. Protein Eng., 1995, 8(9): 873-882.
    Synchronization problem:
    5. O. Özyeşil, N. Sharon, A. Singer. "Synchronization over Cartan motion groups via contraction". Available at arXiv.
    6. L. Wang, A. Singer. "Exact and Stable Recovery of Rotations for Robust Synchronization". Information and Inference: A Journal of the IMA, 2(2), pp. 145-193 (2013).
    7. J. Saunderson, P.A. Parrilo, et al. Semidefinite relaxations for optimization problems over rotation matrices. Decision and Control (…), 2014.
    8. F. Arrigoni, B. Rossi, A. Fusiello. Spectral synchronization of multiple views in SE(3). SIAM Journal on Imaging Sciences, 2016.
    9. F. Arrigoni, A. Fusiello, B. Rossi, P. Fragneto. Robust Rotation Synchronization via Low-rank and Sparse Matrix Decomposition. arXiv preprint arXiv: …, 2015.
    Spectral relaxation for SO(2):
    10. A. Singer. Angular synchronization by eigenvectors and semidefinite programming. Applied and Computational Harmonic Analysis 30 (1) (2011) 20–36.
    Spectral relaxation for SO(3):
    11. M. Arie-Nachimson, S.Z. Kovalsky, I. Kemelmacher-Shlizerman, A. Singer, R. Basri. Global motion estimation from point matches. In: International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission, 2012, pp. 81–88.
    12. A. Singer, Y. Shkolnisky. Three-dimensional structure determination from common lines in cryo-EM by eigenvectors and semidefinite programming. SIAM Journal on Imaging Sciences 4 (2) (2011) 543–572.
  • Base algorithm: Local (pairwise) alignment algorithm: Kearsley S.K. (1989) Acta Crystallogr., A45, 208-210; Petr Popov, Sergei Grudinin. Rapid determination of RMSDs corresponding to macromolecular rigid body motions. Journal of Computational Chemistry, Wiley, 2014, 35(12), pp. 950-956. DOI: 10.1002/jcc.23569

  • Solution: Two ways of stating the optimization problem (through rotation matrices and through quaternions). Relaxation of the resulting problems to convex ones, and comparison of the solutions given by the basic algorithm and by the relaxations (spectral relaxation, SDP).
  • Novelty: A method that aligns structures by minimizing a loss function that takes into account all pairwise losses.
  • Authors: Sergey Grudinin, consultant: Mikhail Karasikov.
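
As an illustration of the spectral relaxation mentioned in the solution, the sketch below recovers absolute rotations from a full set of pairwise relative rotations R_ij = R_i R_j^T: the top three eigenvectors of the block matrix are taken and each 3x3 block is projected onto SO(3). It assumes all pairs are observed and is a textbook-style sketch, not the project's algorithm; the function names are assumptions.

```python
import numpy as np

def project_to_so3(m):
    """Closest rotation matrix to a 3x3 matrix m (via SVD, with det fixed to +1)."""
    u, _, vt = np.linalg.svd(m)
    d = np.sign(np.linalg.det(u @ vt))
    return u @ np.diag([1.0, 1.0, d]) @ vt

def spectral_synchronization(pairwise):
    """pairwise: array (n, n, 3, 3) with pairwise[i, j] ~ R_i @ R_j.T.

    Returns an array (n, 3, 3) of rotations, defined up to one common
    rotation applied on the right of every R_i.
    """
    pairwise = np.asarray(pairwise, dtype=float)
    n = pairwise.shape[0]
    # Assemble the 3n x 3n block matrix and symmetrize against noise.
    big = pairwise.transpose(0, 2, 1, 3).reshape(3 * n, 3 * n)
    big = 0.5 * (big + big.T)
    _, vecs = np.linalg.eigh(big)          # eigenvalues in ascending order
    top = vecs[:, -3:] * np.sqrt(n)        # blocks are close to R_i @ Q
    if np.linalg.det(top[:3]) < 0:         # make the common factor Q a rotation
        top[:, 0] *= -1.0
    return np.stack([project_to_so3(top[3 * i:3 * i + 3]) for i in range(n)])
```

With exact input the relative rotations are reproduced exactly; with noisy input the eigenvector step averages over all pairs, which is the property that pairwise alignment lacks.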

5. 2017

  • Title: Local approximation of time series for building predictive metamodels.
  • Problem: The physical activity of a person is studied through time series of accelerometer measurements. The aim of the project is to create a tool for analyzing the problem of building metamodels, that is, models that predict models. A segment of the time series is studied, and it is required to predict the class of the segment. (Option: predict the end of the segment, the next segment, and its class. In this case, the class of the next segment may differ from the class of the previous one.)
  • Data: Based on a Santa Fe or WISDM sample (samples consist of segments with many elementary movements and class labels corresponding to the segments), a variant of the OPPORTUNITY Activity Recognition Challenge.
  • References:
    1. Karasikov M.E., Strijov V.V. Classification of time series in the space of parameters of generating models // Informatics and its applications, 2016. [URL]
    2. Kuznetsov M.P., Ivkin N.P. Algorithm for Classifying Accelerometer Time Series by Combined Feature Description // Machine Learning and Data Analysis. 2015. V. 1, No. 11. C. 1471 - 1483. [URL]
  • Base algorithm: [Karasikov 2016]
  • Solution: See the problem description.
  • Novelty: When creating meta-prognostic models (predictive models of predictive models), the question of how to use the parameter values of local models remains open. The purpose of this project is to create a tool to analyze this problem.
  • Authors: Strijov V.V.
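
A minimal sketch of a local approximation model: each segment is fitted with an autoregressive model by least squares, and the fitted coefficients serve as the segment's feature vector for a higher-level (meta) model. The AR form, its order and the function name are assumptions for illustration; [Karasikov 2016] considers a broader family of generating models.

```python
import numpy as np

def ar_features(segment, order=3):
    """Least-squares AR(order) coefficients of one time series segment.

    Fits x[t] ~ a[0] * x[t - order] + ... + a[order - 1] * x[t - 1]
    and returns the coefficient vector a of shape (order,).
    """
    x = np.asarray(segment, dtype=float)
    if len(x) <= order:
        raise ValueError("segment must be longer than the AR order")
    # Each row holds `order` consecutive samples; the target is the next sample.
    rows = np.stack([x[i:i + order] for i in range(len(x) - order)])
    targets = x[order:]
    coeffs, *_ = np.linalg.lstsq(rows, targets, rcond=None)
    return coeffs
```

Applying `ar_features` to every segment turns a set of variable-length segments into fixed-length vectors, which any standard classifier can then consume.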

6. 2017

  • Title: Choosing the optimal recurrent network model in the paraphrase search problem
  • Problem: Given a sample of pairs of sentences labeled "similar" and "dissimilar". It is required to build a recurrent network of low complexity (that is, with a small number of parameters) that delivers a minimum error in the classification of pairs of sentences.
  • Data: It is proposed to consider two samples: Microsoft Paraphrase Corpus (a small set of sentences) and PPDB (http://sitem.herts.ac.uk/aeru/ppdb/en/) (a set of short segments; the labeling is not always correct)
  • References:
    1. [1] Step by step description of the implementation of the LSTM recurrent network
    2. [2] Thinning algorithm based on building a network with a minimum description length
    3. Optimal Brain Damage [3]
  • Basic algorithm: The basic algorithm can be:
    1. Solution without thinning
    2. Solution described in [3]
    3. Optimal Brain Damage
  • Solution: It is proposed to consider the thinning method described in [3] with a block covariance matrix: either neurons or parameters grouped by input features act as blocks.
  • Novelty: The proposed method will effectively reduce the complexity of the recurrent network, taking into account the relationship between neurons or input features.
  • Authors: Oleg Bakhteev, consultant
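
For the Optimal Brain Damage baseline, the usual saliency of a weight is half the diagonal Hessian entry times the squared weight, and the weights with the smallest saliency are removed. The sketch below assumes the Hessian diagonal has already been estimated; the function name and the `keep_fraction` parameter are assumptions.

```python
import numpy as np

def obd_prune_mask(weights, hessian_diag, keep_fraction=0.5):
    """Boolean mask of weights to keep under the OBD saliency criterion.

    weights, hessian_diag: arrays of the same shape; hessian_diag holds the
    diagonal second derivatives of the loss with respect to each weight.
    """
    weights = np.asarray(weights, dtype=float)
    hessian_diag = np.asarray(hessian_diag, dtype=float)
    saliency = 0.5 * hessian_diag * weights ** 2   # estimated loss increase
    n_keep = max(1, int(round(keep_fraction * weights.size)))
    order = np.argsort(saliency.ravel())[::-1]     # most salient first
    mask = np.zeros(weights.size, dtype=bool)
    mask[order[:n_keep]] = True
    return mask.reshape(weights.shape)
```

This criterion treats weights independently; the block covariance variant proposed in the solution instead groups weights by neuron or by input feature.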

7. 2017

  • Title: Internal plagiarism detection
  • Problem: The problem of identifying internal borrowings in a text is solved. It is required to test the hypothesis that the given text was written by a single author and, if the hypothesis is rejected, to highlight the borrowed parts of the text. A borrowing is a part of the text presumably written by another author and containing characteristic differences from the style of the main author. It is required to develop a style function that distinguishes the style of the main author from borrowings with a high degree of certainty.
  • Data: It is proposed to consider the corpus PAN-2011, PAN-2016
  • References:
    1. [1] Step by step description of the implementation of the LSTM recurrent network
    2. [2] Author clustering algorithm
    3. [3] Statistical Language Models Based on Neural Networks
    4. [4] Methods for intrinsic plagiarism detection and author diarization
  • Basic algorithm: The solution described in [4] can be used as the basic algorithm.
  • Solution: It is proposed to consider the method described in [2] and build a style function based on the neural network outputs.
  • Novelty: It is assumed that the construction of a style function by the proposed method can give an increase in quality compared to typical solutions to this problem.
  • Authors: Rita Kuznetsova, consultant

8. 2017

  • Title: Adaptive relaxations of NP hard problems through machine learning
  • Problem: Modern problems of optimizing power flows in power networks lead to non-convex optimization problems with a large number of constraints. Problems of similar structure also arise in a number of other engineering applications and in classical combinatorial optimization. The traditional approach to such NP-hard problems is to write their convex relaxations (semidefinite/SDP, second-order cone/SOCP, etc.), which usually have a much larger feasible set than the original problem, and then to project the obtained solution onto the region where the constraints of the original problem are satisfied. In many practical cases, the quality of the solution obtained in this way is low. Alternative approaches, for example MILP (mixed-integer linear programming) relaxation, are substantially more time-consuming but give a more accurate answer.

The main difficulty is that known methods cannot be applied to large-scale problems (networks of 1000 nodes or more). One of the key obstacles is not so much the dimension of the problem as the large number of constraints. At the same time, in real problems it is possible to single out a small set of constraints such that the feasible sets defined by the selected set and by the original one are very close. This allows the problem to be replaced with another one with fewer constraints, which increases the speed of the algorithms used. It is proposed to use machine learning methods to build this set of the most important constraints.

  • References: Sampling/machine learning methods:
    1. Beygelzimer, A., Dasgupta, S., & Langford, J. (2009, June). Importance weighted active learning. In Proceedings of the 26th annual international conference on machine learning (pp. 49-56). ACM.
    2. Tong, S., & Koller, D. (2001). Support vector machine active learning with applications to text classification. Journal of machine learning research, 2(Nov), 45-66.
    3. Owen, A., & Zhou, Y. (2000). Safe and effective importance sampling. Journal of the American Statistical Association, 95(449), 135-143.

Relaxations: Nagarajan, H., Lu, M., Yamangil, E., & Bent, R. (2016). Tightening McCormick Relaxations for Nonlinear Programs via Dynamic Multivariate Partitioning. arXiv preprint arXiv:1606.05806.

  • Data: ieee + matpower data containing descriptions of energy networks and their modes of operation.
  • Novelty: This approach seems to be the first application of applied statistics/machine learning methods to solve difficult optimization problems. We expect substantial gains in labor-intensive style methods
  • Author: consultant: Yuri Maksimov, Expert: Mikhail Chertkov

9. 2017

  • Title: Optimal Algorithm for Reconstruction of Dynamic Models.
  • Problem: A standard machine learning problem statement in unsupervised learning assumes that the examples are independent and come from the same probability distribution. However, observed data are often of dynamic origin and are correlated. The problem is to develop an efficient method for reconstructing a dynamic graphical model (graph and model parameters) from observed correlated dynamic configurations. This problem is theoretically important and has many applications. The algorithm will be based on an adaptation of interaction screening, a new optimal method developed for the Ising model. The work will combine theoretical methods of computer science and machine learning with numerical experiments.
  • Data: Simulated dynamic configurations of spins in the kinetic Ising model.
  • References:
    1. Lokhov et al., "Optimal structure and parameter learning of Ising models", arXiv:1612.05024 (2016) (https://arxiv.org/abs/1612.05024)
    2. Vuffray et al., "Interaction screening: efficient and sample-optimal learning of Ising models", NIPS 2016 (https://arxiv.org/abs/1605.07252)
    3. Decelle and Zhang, "Inference of the sparse kinetic Ising model using the decimation method", Phys. Rev. E 2016 (https://arxiv.org/abs/1502.01660)
    4. Bresler et al., "Learning graphical models from the Glauber dynamics", Allerton 2014 (https://arxiv.org/abs/1410.7659)
    5. Zeng et al., "Maximum likelihood reconstruction for Ising models with asynchronous updates", Phys. Rev. Lett. 2013
  • Base algorithm: Dynamic interaction screening method. Comparison with the maximum likelihood method.
  • Novelty: Currently, the optimal (i.e., using the minimum possible number of examples) algorithm for this problem is unknown. The dynamic interaction screening method has a good chance of finally "closing" this problem, because it is optimal for the static problem.
  • Author: consultants Andrey Lokhov, Yuri Maksimov. Expert Mikhail Chertkov
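
For orientation, the interaction screening objective for the static Ising model without external fields (as in reference 2) is an empirical average of exponentials of the negative local energy of one spin. The sketch below evaluates that static objective only; the adaptation to kinetic (dynamic) configurations is the open part of the project and is not shown. The function name and argument layout are assumptions.

```python
import numpy as np

def interaction_screening_objective(theta_i, samples, i):
    """Static interaction screening objective for spin i.

    theta_i: (N,) candidate couplings between spin i and all spins
             (the entry theta_i[i] is ignored).
    samples: (M, N) array of observed spin configurations with entries +-1.
    Returns (1/M) * sum_m exp(-s_i^m * sum_{j != i} theta_ij * s_j^m).
    """
    samples = np.asarray(samples, dtype=float)
    theta = np.array(theta_i, dtype=float)   # copy, so the caller's array is untouched
    theta[i] = 0.0                           # no self-coupling
    local_field = samples @ theta            # (M,) field acting on spin i
    return np.mean(np.exp(-samples[:, i] * local_field))
```

The objective is convex in `theta_i`, so the couplings of each spin can be estimated independently by any convex solver, typically with an added l1 penalty to recover a sparse graph.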

10. 2017

  • Title: Choice of Interpreted Multimodels in Credit Scoring Problems
  • Problem: The problem of credit scoring is to determine the level of creditworthiness of the borrower. For this, a borrower's questionnaire is used, containing both numerical (age, income) and categorical features (gender, profession). Given historical information about the repayment of loans by other borrowers, it is required to determine whether the borrower will repay the loan. The data can be heterogeneous (for example, if there are regions with different income levels in a country), and several models will be needed for adequate classification. It is necessary to determine the optimal number of models. Based on the set of model parameters, it is necessary to draw up a portrait of the borrower.
  • Data: It is proposed to consider five samples from the UCI and Kaggle repositories, with a capacity of 50,000 objects or more.
  • References: A.A. Aduenko \MLAlgorithms\PhDThesis; C. Bishop, Pattern recognition and machine learning, final chapter; 20 years of Mixture experts.
  • Base algorithm: Clustering and building independent logistic regression models, Adaboost, Decision Forest (with restrictions on complexity), Blend of Experts.
  • Solution: An algorithm is proposed for selecting a multi-model (a mixture of models or a mixture of Experts) and determining the optimal number of models.
  • Novelty: Proposed function of distance between models in which parameter distributions are given on different media.
  • Authors: A.A. Aduenko, Strijov V.V.
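
The mixture-of-experts baseline named above can be illustrated with a forward pass only. This is a minimal sketch, assuming a softmax gate and logistic-regression experts; the weights, the feature vector and the function names are hypothetical and are not taken from the project.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def mixture_predict(x, gate_weights, expert_weights):
    """P(y = 1 | x) = sum_k gate_k(x) * sigmoid(w_k . x)."""
    gates = softmax([dot(v, x) for v in gate_weights])
    experts = [sigmoid(dot(w, x)) for w in expert_weights]
    return sum(g * p for g, p in zip(gates, experts))

# Hypothetical borrower features (bias term first) and hand-picked weights.
x = [1.0, 0.3, -1.2]
gate_weights = [[0.0, 2.0, 0.0], [0.0, -2.0, 0.0]]
expert_weights = [[-1.0, 0.5, 0.8], [0.5, -0.2, 1.5]]
p = mixture_predict(x, gate_weights, expert_weights)
```

With a single expert the gate is identically 1 and the model reduces to ordinary logistic regression, which is one way to see why the number of experts is the quantity to be selected.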

11. 2017

  • Title: Feature Selection in Problems of Autoregressive Prediction of Biomedical Signals.
  • Problem: The problem of predicting biomedical signals and IoT signals is being solved. It is required to predict a vector: the next few signal samples. It is assumed that the intrinsic dimension of the space of both the predicted variable and the independent variable can be significantly reduced, thereby increasing the stability of the forecast without significant loss of accuracy. For this, the Partial Least Squares approach in autoregressive forecasting is used.
  • Data: SantaFe biomedical time series sample, IoT signal sample.
  • References: Katrutsa A.M., Strijov V.V. Stress test procedure for feature selection algorithms // Chemometrics and Intelligent Laboratory Systems, 2015, 142 : 172-183; Katrutsa A.M., Strijov V.V. Comprehensive study of feature selection methods to solve multicollinearity problem according to evaluation criteria // Expert Systems with applications, 2017; Kee Siong Ng. A Simple Explanation of Partial Least Squares. Draft, April 27, 2013, http://users.cecs.anu.edu.au/~kee/pls.pdf
  • Base algorithm: PLS, quadratic optimization algorithm for feature selection.
  • Solution: build a design matrix with a suboptimal set of objects and features, propose a quadratic optimization error function (if possible, develop it for the case of a tensor representation of the design matrix).
  • Novelty: Generalized feature selection algorithm (published two weeks ago) for the PLS case.
  • Authors: A.M. Katrutsa, Strijov V.V.
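
The autoregressive setting described above (predict the next few samples from the preceding ones) starts from a design matrix of lagged windows. The following is a minimal sketch of that step only; the function name and the toy signal are hypothetical, and PLS or feature selection would be applied to the resulting X and Y afterwards.

```python
def autoregressive_design(series, n_lags, n_ahead):
    """Split a 1-D series into lagged inputs X and multi-step targets Y."""
    X, Y = [], []
    last_start = len(series) - n_lags - n_ahead
    for start in range(last_start + 1):
        X.append(series[start:start + n_lags])
        Y.append(series[start + n_lags:start + n_lags + n_ahead])
    return X, Y

signal = [0, 1, 2, 3, 4, 5, 6]
X, Y = autoregressive_design(signal, n_lags=3, n_ahead=2)
# The window X[0] == [0, 1, 2] is followed by the target Y[0] == [3, 4].
```

Both X and Y have strongly correlated columns by construction, which is the reason the problem statement expects that their dimension can be reduced.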

12. 2017

  • Title: Massively multitask deep learning for drug discovery
  • Problem: Develop a multitask recurrent neural network to predict biological activity. For each molecule-protein pair, it is required to predict a binary value 1/0, which means that the molecule binds/does not bind to the protein.
  • Data: sparse biological activity data for ~100K molecules versus ~1000 proteins. Molecules are represented as SMILES strings (sequence of characters encoding a molecule)
  • References: https://arxiv.org/pdf/1502.02072
  • Base algorithm: a multitask neural network that predicts activity from numerical features; a single-task recurrent neural network
  • Solution: Multitasking means building a model that takes a molecule as input and predicts its biological activity against all proteins in the sample.
  • Novelty: Existing methods did not show a significant improvement in the quality of the DL model compared to standard ML models
  • Authors: Expert -- Alexander Isaev, consultant -- Maria Popova
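
Because the activity matrix is sparse, a multitask model is usually trained only on the molecule-protein pairs that were actually measured. A minimal sketch of such a masked loss is shown below, assuming binary cross-entropy and `None` as the marker for an unmeasured pair; both choices are illustrative assumptions, not the project's specification.

```python
import math

def masked_bce(probs, labels):
    """Mean binary cross-entropy over observed (non-None) labels only."""
    total, count = 0.0, 0
    for p_row, y_row in zip(probs, labels):
        for p, y in zip(p_row, y_row):
            if y is None:
                continue  # unmeasured molecule-protein pair: no contribution
            p = min(max(p, 1e-12), 1 - 1e-12)  # avoid log(0)
            total -= y * math.log(p) + (1 - y) * math.log(1 - p)
            count += 1
    return total / count if count else 0.0

# Rows are molecules, columns are proteins (tasks).
probs = [[0.9, 0.2, 0.5], [0.1, 0.7, 0.5]]
labels = [[1, None, None], [None, 1, 0]]
loss = masked_bce(probs, labels)
```

Only three of the six entries contribute to the loss here, so predictions for unmeasured pairs are neither rewarded nor penalized.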

13. 2017

  • Title: Unsupervised representation for molecules
  • Problem: Develop an unsupervised method for representing molecules
  • Data: ~1.5M molecules in SMILES string format (character sequence encoding the molecule)
  • References: https://www.cs.toronto.edu/~hinton/science.pdf
  • Base algorithm: currently hand-selected numerical features are used as such representation. The quality of the resulting representations can be compared with the tox21 dataset (10K molecules versus 12 proteins)
  • Solution: use convolutional or recurrent networks to build an autoencoder.
  • Novelty: building an end-to-end model to get informative features
  • Authors: Expert -- Alexander Isaev, consultant -- Maria Popova

14. 2017

  • Title: Intratext coherence as a measure of interpretability of thematic models of text collections.
  • Problem: Interpretability is a subjective measure of the quality of topic models, measured by expert scores. Coherence is a measure of the co-occurrence of topic words; it is calculated automatically from the text and correlates well with interpretability, as shown in a series of papers by Newman and Mimno. The first task is to evaluate the representativeness of the sequence of words in the text from which coherence is estimated. The second task is to compare several new methods for measuring interpretability and coherence based on the selection of the most representative sequence of words in the source text.
  • Data: A collection of popular science content PostNauka, a collection of news content.
  • References:
    1. Vorontsov K. V. Review of probabilistic thematic models, 2017.
    2. N.Aletras, M.Stevenson. Evaluating Topic Coherence Using Distributional Semantics, 2013.
    3. D. Newman et al. Automatic evaluation of topic coherence, 2010
    4. D.Mimno et al. Optimizing semantic coherence in topic models, 2011
    5. http://palmetto.aksw.org/palmetto-webapp/
  • Base algorithm: Standard methods for estimating the interpretability and coherence of topics in topic models.
  • Solution: A new method for measuring interpretability and coherence, experiments to find the most correlated measures of interpretability and coherence, similar to [D.Newman, 2010].
  • Novelty: Intratext measures of interpretability and coherence have not been proposed before.
  • Authors: Vorontsov K. V. Consultants: Viktor Bulatov, Anna Potapenko, Artyom Popov.
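
For reference, a standard document-level coherence of the kind the project takes as a baseline can be computed as follows. This is a sketch of the co-occurrence score from Mimno et al. (2011), the sum of log((D(w_m, w_l) + 1) / D(w_l)) over pairs of top words, on a toy hypothetical corpus; it is not the intratext measure the project proposes.

```python
import math

def umass_coherence(top_words, documents):
    """Sum of log((D(w_m, w_l) + 1) / D(w_l)) over pairs l < m of top words."""
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain all the given words.
        return sum(all(w in d for w in words) for d in doc_sets)

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            d_l = doc_freq(top_words[l])
            if d_l == 0:
                continue  # word never occurs; skip the pair
            score += math.log((doc_freq(top_words[m], top_words[l]) + 1) / d_l)
    return score

docs = [["gene", "dna", "cell"], ["gene", "dna"], ["star", "galaxy"], ["cell", "star"]]
coherent = umass_coherence(["gene", "dna"], docs)
mixed = umass_coherence(["gene", "galaxy"], docs)
```

The pair of words that co-occur in documents scores higher than the pair that never appears together, which is the behaviour an automatic proxy for interpretability is expected to have.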

15. 2017

  • Title: Aggregation of heterogeneous text collections in a hierarchical thematic model of Russian-language popular science content.
  • Problem: Implement and compare multiple ways of combining text collections from different sources into one hierarchical topic model. Build a classifier that determines the presence of a topic in the source.
  • Data: Collection of popular science content PostNauka, Wikipedia collection.
  • References:
    1. Vorontsov K. V. Review of probabilistic thematic models, 2017.
    2. Chirkova N. A, Vorontsov K. V. Additive regularization of multimodal hierarchical topic models // Machine Learning and Data Analysis, 2016. T. 2. No. 2.
  • Base algorithm: An algorithm for constructing a thematic hierarchy in BigARTM, implemented by Nadezhda Chirkova. An annotation (markup) tool.
  • Solution: Build a topic model with source modalities and highlight topics specific to only one of the sources. Prepare a sample for training a classifier that determines the presence of a topic in the source.
  • Novelty: Additive regularization of topic models has not been applied to this problem before.
  • Authors: Vorontsov K. V. Consultants: Alexander Romanenko, Irina Efimova, Nadezhda Chirkova.

16. 2017

  • Title: Application of the methods of symbolic dynamics in the technology of informational analysis of electrocardiosignals.
  • Problem: The technology of informational analysis of electrocardiosignals, proposed by V.M.Uspensky, involves converting a raw signal into a character sequence and searching for disease patterns in this sequence. So far, symbolic n-grams have been predominantly used to search for patterns. In the framework of this work, it is proposed to expand the class of templates in which the search for diagnostic signs of diseases is performed. Quality criterion -- AUC and MAP ranking of diagnoses.
  • Data: A selection of electrocardiograms with known diagnoses.
  • References:
    1. Uspensky V.M. Informational function of the heart. Theory and practice of diagnosing diseases of internal organs by the method of information analysis of electrocardiosignals. Moscow: "Economics and Information", 2008. 116 p.
    2. Technology of information analysis of electrocardiosignals.
  • Base algorithm: Classification methods.
  • Solution: Search for logical patterns in character strings, methods of symbolic dynamics, comparison of algorithms according to the quality criteria AUC and MAP (diagnosis ranking).
  • Novelty: So far, character n-grams have been used predominantly to search for patterns.
  • Authors: Vorontsov K. V. Consultant: Vlada Tselykh.
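
The n-gram features mentioned as the current practice can be sketched as relative frequencies of overlapping substrings of the symbol sequence. The alphabet and the example string below are hypothetical and serve only to show the feature construction.

```python
from collections import Counter

def ngram_frequencies(codegram, n):
    """Relative frequencies of all overlapping n-grams in a symbol string."""
    counts = Counter(codegram[i:i + n] for i in range(len(codegram) - n + 1))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()}

# Hypothetical symbol sequence obtained from an electrocardiosignal.
features = ngram_frequencies("ABCABCAB", n=3)
```

Richer pattern classes, such as those the project proposes to explore, would replace the fixed-length substring in the generator expression with a more general template.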

Vorontsov's problems +

  • Title: Dynamic hierarchical thematic model of the news flow.
  • Problem: Develop an algorithm for classifying topics in news flows into new and ongoing ones. Apply the obtained criteria for creating new topics at all levels of the topic model hierarchy when adding the next piece of data to the text collection (for example, all news for one day).
  • Data: Collection of news in Russian. A subsample of news classified into two classes: new and ongoing topics.
  • Literature:
    1. Vorontsov K.V. Review of probabilistic thematic models, 2017.
    2. Chirkova N. A, Vorontsov K. V. Additive regularization of multimodal hierarchical topic models // Machine Learning and Data Analysis, 2016. T. 2. No. 2.
  • Base algorithm: An algorithm for constructing a thematic hierarchy in BigARTM, implemented by Nadezhda Chirkova. Known Topic Detection & Tracking algorithms.
  • Solution: Using BigARTM, selecting regularizers and their parameters, using the topic selection regularizer. Building an algorithm for classifying topics into new and ongoing.
  • Novelty: Additive regularization of topic models has not been applied to this problem before.
  • Authors: KV Vorontsov. Consultants: Alexander Romanenko, Artyom Popov.

Antiplagiarism +

  • Title: Selection of Candidates in the Problem of Finding Text Borrowings with Paraphrasing Based on the Vectorization of Text Fragments.
  • Problem: Searching for text borrowings in a collection of documents involves selecting a small set of candidates for subsequent detailed analysis. The candidate selection problem is formulated as finding the optimal ranking of the documents in a collection for a query with respect to some function that estimates the total length of borrowings from a collection document in the query document.
  • Data: PAN
  • References:
    1. Romanov A.V., Khritankov A.S. Selection of candidates when searching for borrowings in a collection of documents in a foreign language.
  • Base algorithm: shingling method with construction of an inverted index.
  • Solution: Vectorization of text fragments (word embeddings + convolutional / recurrent neural networks) and subsequent search for nearest objects in a multidimensional metric space.
  • Novelty: a new approach to solving the problem.
  • Authors: Alexey Romanov (consultant)
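
The base algorithm (shingles with an inverted index) can be sketched in a few lines. This is an illustrative version that assumes word 3-grams as shingles and ranks by the raw number of shared shingles; the toy collection and all function names are hypothetical.

```python
from collections import defaultdict

def shingles(text, k=3):
    """Set of word k-grams (shingles) of a text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def build_index(collection, k=3):
    """Inverted index: shingle -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in collection.items():
        for sh in shingles(text, k):
            index[sh].add(doc_id)
    return index

def rank_candidates(query, index, k=3):
    """Rank documents by the number of shingles shared with the query."""
    hits = defaultdict(int)
    for sh in shingles(query, k):
        for doc_id in index.get(sh, ()):
            hits[doc_id] += 1
    return sorted(hits.items(), key=lambda item: (-item[1], item[0]))

collection = {
    "d1": "the quick brown fox jumps over the lazy dog",
    "d2": "a quick brown fox appears in many typing examples",
    "d3": "topic models describe collections of documents",
}
index = build_index(collection)
ranking = rank_candidates("we saw the quick brown fox jumps today", index)
```

Exact shingle matching fails as soon as the borrowed fragment is paraphrased, which is the motivation for replacing shingles with vector representations and nearest-neighbor search in the proposed solution.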

Additional projects

Vorontsov+

  • Title: Thematic modeling of an economic sector based on bank transaction data.
  • Problem: Test the hypothesis that a large sample of transactions between firms is adequately described by a relatively small set of economic activities (aka topics). The problem is reduced to factorizing the matrix of transactional data "buyers × sellers" into the product of three non-negative matrices "buyers × topics", "topics × topics", "topics × sellers", where the middle matrix describes a directed graph of financial flows in the industry. It is required to compare several methods for constructing such factorizations and to find the number of topics for which the observed set of transactions is modeled with sufficient accuracy.
  • Data: selection of transactions between firms, such as "buyer, seller, volume".
  • References:
    1. Vorontsov K. V. Review of probabilistic thematic models, 2017.
  • Base algorithm: Standard methods for non-negative matrix factorization.
  • Solution: Regularized EM-algorithm for sparse non-negative matrix factorization. Visualization of the graph of financial flows. Testing the algorithm on synthetic data, testing the hypothesis about the stability of sparse solutions.
  • Novelty: Thematic modeling has not previously been applied to the analysis of financial transactional data.
  • Authors: Vorontsov K. V. Consultants: Viktor Safronov, Rosa Aisina.
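
The three-matrix factorization in the problem statement can be written out explicitly. The formulation below is only a sketch: the symbols, the Frobenius loss and the sparsity penalty with weight \lambda are assumptions chosen for illustration, and the project may use a different (for example, likelihood-based) criterion.

```latex
% X: buyers x sellers transaction volumes (n x m), k topics
X \approx B\,\Phi\,S, \qquad
B \in \mathbb{R}_{\ge 0}^{n \times k}, \quad
\Phi \in \mathbb{R}_{\ge 0}^{k \times k}, \quad
S \in \mathbb{R}_{\ge 0}^{k \times m}

% One possible sparse objective (assumed, not taken from the source)
\min_{B,\,\Phi,\,S \,\ge\, 0} \;
\lVert X - B\,\Phi\,S \rVert_F^2
+ \lambda \bigl( \lVert B \rVert_1 + \lVert \Phi \rVert_1 + \lVert S \rVert_1 \bigr)
```

Here the entry \Phi_{st} plays the role of the flow from topic s to topic t, so the nonzero pattern of \Phi is the directed graph of financial flows mentioned in the problem statement.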

scoring+

  • Title: Generating and selecting features when building a credit scoring model.
  • Problem: Credit scoring models are built step by step. In particular, a number of independent transformations of individual features are performed, and new features are generated. Each step uses its own quality criterion. It is required to build a scoring model that adequately describes the sample. Maximizing the quality of the model at each step does not guarantee the maximum quality of the resulting model. It is proposed to abandon the step-by-step construction of the scoring model. To do this, the quality criterion must include all the optimized parameters of the model.
  • Data: The computational experiment will be performed on 5-7 samples to be found. It is desirable that the samples be of the same nature, for example, the samples of consumer credit questionnaires.
  • References: Siddique N. Constructing scoring models, SAS. Hosmer D., Lemeshow S., Applied logistic regression, Wiley. Katrutsa A.M., Strijov V.V. Comprehensive study of feature selection methods to solve multicollinearity problem according to evaluation criteria // Expert Systems with applications, 2017.
  • Base algorithm: The scoring model construction algorithm recommended by SAS.
  • Solution: Each step of the procedure is represented as an optimization problem. The parameters to be optimized are combined, and the feature selection problem is included as a mixed optimization problem.
  • Novelty: An error function is proposed, when using which the generation and selection of features, as well as the optimization of model parameters, are performed together.
  • Authors: T.V. Voznesenskaya, Strijov V.V.

Popova+

  • Title: Representation of molecules in 3D
  • Problem: Develop representations of the 3D structure of molecules that would have the property of rotational and translational invariance.
  • Data: Millions of molecules given by 3D coordinates
  • References: https://arxiv.org/abs/1610.08935, http://journals.aps.org/prl/abstract/10.1103/PhysRevLett.98.146401
  • Base algorithm: low rank matrix/tensor factorization
  • Solution: Molecules have a different number of atoms, and therefore the matrix of their 3D coordinates is Nx3. We need to find a mathematical transformation that would be independent of N (N is the number of atoms).
  • Novelty: existing algorithms depend on the number of atoms in the molecule
  • Authors: Expert -- Alexander Isaev, consultant -- Maria Popova
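
One simple descriptor with the required invariances is a histogram of pairwise interatomic distances: distances do not change under rotation or translation, and the histogram length does not depend on the number of atoms N. This is a sketch of that idea only, not the low-rank factorization named as the base algorithm; the bin settings and the toy coordinates are hypothetical, and the descriptor is lossy.

```python
import math
from itertools import combinations

def distance_histogram(coords, bin_width=0.5, n_bins=8):
    """Fixed-length histogram of pairwise interatomic distances."""
    hist = [0] * n_bins
    for a, b in combinations(coords, 2):
        # Distances beyond the last bin edge are counted in the last bin.
        bin_id = min(int(math.dist(a, b) / bin_width), n_bins - 1)
        hist[bin_id] += 1
    return hist

def rotate_z_and_shift(coords, angle, shift):
    """Rotate points about the z axis, then translate them."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
            for x, y, z in coords]

# Hypothetical three-atom molecule and a rigidly moved copy of it.
molecule = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
moved = rotate_z_and_shift(molecule, angle=1.0, shift=(3.0, -2.0, 5.0))
```

Comparing `distance_histogram(molecule)` with `distance_histogram(moved)` gives identical vectors, while molecules with different numbers of atoms still map to vectors of the same length.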

Maksimov+

  • Title: Optimal algorithm for recovering block Hamiltonians (XY and Heisenberg models).
  • Problem: The problem is to reconstruct block Hamiltonians with continuous spins (a generalization of the Ising model to two- and three-dimensional spins) from the observed data. This setting is a special case of a field of machine learning known as unsupervised learning. Reconstruction of a graphical spin model from observational data is an important problem in physics. The basis of the algorithm will be the adaptation of a new optimal method of screening interactions (interaction screening), developed for the Ising model. The solution process will combine familiarity with computer science/machine learning theoretical methods and numerical experiments.
  • Data: Simulated block spin model configurations.
  • References:
    1. Lokhov et al., "Optimal structure and parameter learning of Ising models", arXiv:1612.05024 (2016) {https://arxiv.org/abs/1612.05024}
    2. Vuffray et al., "Interaction screening: efficient and sample-optimal learning of Ising models", NIPS 2016 {https://arxiv.org/abs/1605.07252}
    3. Tyagi et al., "Regularization and decimation pseudolikelihood approaches to statistical inference in XY spin models", Phys. Rev. B 2016 {https://arxiv.org/abs/1603.05101}
  • Base algorithm: Dynamic interaction screening method. Comparison with the maximum pseudolikelihood method.
  • Novelty: An algorithm based on the dynamic interaction screening method has a good chance of being optimal for this problem, because the corresponding method is optimal for the inverse Ising problem.
  • Author: consultants Andrey Lokhov, Yuri Maksimov. Expert Mikhail Chertkov
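
For orientation, the interaction screening objective for the ordinary (static) Ising model from Vuffray et al. is the sample average of exp(-sigma_i * sum_j theta_ij * sigma_j) for a chosen node i; the estimator minimizes it with an l1 penalty on the couplings. The sketch below evaluates the unpenalized objective on toy hypothetical samples. The extension to XY and Heisenberg spins is the subject of the project and is not shown.

```python
import math

def screening_objective(samples, node, couplings):
    """Interaction screening objective for one node of an Ising model.

    samples: list of spin configurations with entries +1 or -1.
    couplings: dict {neighbor index: trial coupling theta_ij}.
    """
    total = 0.0
    for sigma in samples:
        local_field = sum(theta * sigma[j] for j, theta in couplings.items())
        total += math.exp(-sigma[node] * local_field)
    return total / len(samples)

# Toy data in which spins 0 and 1 always agree.
samples = [(1, 1, -1), (1, 1, 1), (-1, -1, 1), (-1, -1, -1)]
aligned = screening_objective(samples, node=0, couplings={1: 0.5, 2: 0.0})
opposed = screening_objective(samples, node=0, couplings={1: -0.5, 2: 0.0})
```

A trial coupling whose sign matches the observed correlation lowers the objective below its value of 1 at zero couplings, and the opposite sign raises it. On such perfectly correlated toy data the unpenalized minimum is not attained, which is one reason the penalty term matters.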

Khritankov (Transfer Learning)

  • Title: Using deep learning networks to transfer classification models in case of insufficient data.
  • Problem description:
    1. Develop an algorithm for computing a set of latent features in the symmetric homogeneous transfer learning problem such that the solution of the classification problem in this feature space does not depend on the source domain and is no worse than solving for each domain separately (transfer error), for the case of small sample sizes with labeling errors.
    2. Develop an algorithm for moving to the latent set of features without using labels (unsupervised domain adaptation).
  • Data: teraPromise-CK (33 datasets with the same features but different distributions).
  • References: Base article: Xavier Glorot, Antoine Bordes, Yoshua Bengio. (2011) Domain Adaptation for Large-Scale Sentiment Classification: A Deep Learning Approach / In Proceedings of the Twenty-Eighth International Conference on Machine Learning, ICML.

Articles with ideas for improving the algorithm will be handed out (several).

  • Base algorithm: SDA (Stacked Denoising Autoencoder), described in Glorot et al.
  • Solution: Take the base algorithm and a) try to improve it for application to small datasets of 100-1000 objects (when transfer learning is applied) by applying regularizers, adjusting the architecture of the autoencoder, and adjusting the learning algorithm (for example, bootstrapping); b) investigate the robustness of the model to labeling errors (label corruption / noisy labels) and propose improvements to increase robustness.
  • Novelty: Obtaining a stable algorithm for transferring classification models on small amounts of data with markup errors.
  • Authors: Khritankov

INRIA

  • Title: Estimated binding energy of protein and small molecules.
  • Problem: Modeling the binding of a protein and a small molecule (hereinafter referred to as a ligand) is based on the fact that the best ligand in its best position has the lowest free energy of interaction with the protein. It is necessary to estimate the free energy of protein and ligand binding. Complexes of proteins with ligands can be used for training, and for each protein there are several positions of the ligand: 1 correct, "native", for which the energy is minimal, and several generated incorrect ones. For a third of the data set, values are known that are proportional to the desired binding energy of ligands in native positions with the protein. There is a separate test set consisting of 1) complexes of proteins and ligands, for which it is necessary to find the best ligand position (the algorithm for obtaining ligand positions differs from that used in training), 2) complexes of proteins and ligands, for whose native positions it is necessary to predict the binding energy, and 3) proteins for which it is necessary to find the most strongly binding ligand.
  • Data: About 10,000 complexes: for each of them there is 1 native pose and 18 non-native ones (more can be generated). The main descriptors are histograms of distributions of distances between different atoms of the protein and ligand; the dimension of the vector of descriptors is ~20,000. The set of descriptors can be extended (you can generate poses with different deviations and use the deviation as a descriptor, or add properties of small molecules: the number of bonds around which rotation is possible in a molecule, its surface area, its surface division by a Voronoi diagram). The data will be provided in the form of binary files with a Python script to read them.
  • References: PEPSI-Dock: a detailed data-driven protein–protein interaction potential accelerated by polar Fourier correlation; Predicting Binding Poses and Affinities in the CSAR 2013–2014 Docking Exercises Using the Knowledge-Based Convex-PL Potential
  • Base algorithm: We used a linear SVM (these are just lecture notes; I see no reason to cite Vapnik here, especially since all of this, including these lecture notes, can be found by searching online). Its connection with an energy estimate, which goes beyond the scope of the classification problem, is described in the articles listed above. To take into account experimentally known values proportional to the energy, it is proposed to use linear support vector regression (SVR).
  • Solution: It is necessary to reduce the previously used SVM problem to a regression problem and solve it using standard methods. To check the operation of the algorithm, both the test described above and several other test sets with similar problems but different data will be used.
  • Novelty: Proper assessment of the quality of protein and ligand binding is used in drug development to find molecules that interact most strongly with the protein under study.

Of particular importance is the assessment of the values of the binding energy of the protein with the ligand: the coefficient of correlation (Pearson) of the energy with its experimental values determined by different groups on the proposed test does not exceed 0.7. Prediction of the most strongly binding ligand from a large number of non-protein-binding molecules is also difficult. The aim of this work is to obtain a method that allows a fairly accurate assessment of protein binding to ligands. From the point of view of machine learning and optimization, it is of interest to combine classification and regression problems.

  • Appendix: Several data sets are given, each describing an atom in a molecule or a bond between atoms, with a small feature vector (usually 3-10 descriptors) and several classes corresponding to the atom's hybridization or bond order. Each data set contains from ~100 to 20,000 vectors, depending on the type of atom. Some multiclass machine learning method should be tested on this data (random forests, a neural network, or something else); the descriptors can be transformed in any way. We are currently using SVM. Not only accuracy matters, but also the computational complexity of prediction.
  • Authors: Sergei Grudinin, Maria Kadukova

Strijov and Kulunchakov+

  • Title: Creation of delay-operators for multiscale forecasting by means of symbolic regression
  • Problem: Suppose that one needs to build a forecasting machine for a response variable. Given a large set of time series, one can advance a hypothesis that they are related to this variable. Relying upon this hypothesis, we can use the given time series as features for the forecasting machine. However, the values of the time series could be produced with different frequencies. Therefore, we should take into account not only the values, but the delays as well. The simplest forecasting model is a linear one. In the presence of a large set of features this model can approximate the response quite well. To avoid the problem of multiscaling, we introduce delay-operators. Each delay-operator corresponds to one time series and represents a continuous correlation function. This correlation function shows the dependence between the response variable and the corresponding time series. Therefore, each delay-operator puts weights on the values of the corresponding time series depending on the magnitude of the delay. Having these delay-operators, we avoid the problem of multiscaling. To find them, we use genetic programming and symbolic regression. If the resulting weighted linear regression model produces a poor approximation, we can use a nonlinear one instead. To find a good nonlinear function, we would use symbolic regression as well.
  • Data: Any data from the domain of multiscale forecasting of time series. See the full version of this introduction.
  • References: to be handed by V.V.Strijov
  • Base algorithm: to be handed by V.V.Strijov
  • Solution: Use genetic algorithms applied to symbolic regression to create and test delay-operators in multiscale forecasting.
  • Novelty: to be handed by V.V.Strijov
  • Authors: supervisor: V.V.Strijov, consultant: A.S. Kulunchakov
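
The idea of a delay-operator can be sketched as a weighted sum of past observations in which the weight depends only on the delay, so that series sampled at different rates yield comparable features. The exponential weight below is a hypothetical stand-in: in the project the weight functions are to be found by symbolic regression rather than fixed by hand.

```python
import math

def delay_operator(times, values, now, weight):
    """Weighted sum of past observations; the weight depends only on the delay."""
    return sum(weight(now - t) * v for t, v in zip(times, values) if t <= now)

def exp_weight(delay, scale=2.0):
    # Hypothetical weight function chosen for illustration only.
    return math.exp(-delay / scale)

# Two series sampled at different rates become two scalar features at time 4.
hourly = delay_operator([0, 1, 2, 3, 4], [1.0, 1.0, 1.0, 1.0, 1.0], now=4, weight=exp_weight)
sparse = delay_operator([0, 4], [1.0, 1.0], now=4, weight=exp_weight)
```

Each series contributes one feature regardless of its sampling frequency, and a linear model over such features is the simplest forecaster described in the problem statement.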

2016

Author | Topic | Link | Consultant | Reviewer | Report | Letters | Grade | Journal
Bayandina Anastasia | Thematic models of distributive semantics for highlighting ethno-relevant topics in social networks | paper, slides, video | Anna Potapenko | Oleg Gorodnitsky | BF | AILSB++RCVTDEWHS | 10 |
Belozerova Anastasia | Coordination of logical and linear classification models in the information analysis of electrocardiosignals | code, paper, slides, video | Vlada Tselykh | Malygin Vitaly | BF | AILSB+RC+VTD>E0WH>S | 10 |
Maria Vladimirova | Bagging of neural networks in the problem of predicting the biological activity of cell receptors | code, paper, slides, video | Maria Popova | Volodin Sergey | BMF | AILSBRCVTD>E>WHS | 10 |
Volodin Sergey | A probabilistic approach to the problem of predicting the biological activity of nuclear receptors | code, paper, slides, video, itis | Maria Popova | Maria Vladimirova | BMF | AILSBRCVTDEWHS | 10 |
Gorodnitsky Oleg | An Adaptive Nonlinear Method for Recovering a Matrix from Partial Observations | code, paper, slides, itis | Mikhail Trofimov | Bayandina Anastasia | M | A++I++L++S+B+R+C++VTDE+WH | 10 |
Ivanychev Sergey | Synergy of classification algorithms (SVM Multimodelling) | code, paper, slides | Alexander Aduenko | | BM | A+I+L++S+BRCVTDEW+H | 10 |
Kovaleva Valeria | Regular structure of rare macromolecular clusters | code, paper, slides, video, itis | Olga Valba, Yuri Maksimov | Dmitry Fedoryaka | BM | A+IL+SBRCVTD0E0WH | 10 |
Makarchuk Gleb | Time series transformations for hand motion decoding using ECoG signals (electrocorticographic signals) of monkeys | code, paper, slides, video | Andrey Zadayanchuk | | BF | AI+L+S+BRC>V>T+D>E0WH>S | 10 |
Malygin Vitaly | Application of combinatorial estimates of retraining of threshold decision rules for feature selection in the problem of medical diagnostics by the method of V. M. Uspensky | code, paper, slides | Shaura Ishkina | Belozerova Anastasia | B | AILSBRCVTDEWH | 10 |
Molibog Igor | Using Dimension Reduction Methods When Building a Feature Space in the Problem of Internal Plagiarism Detection | paper, doc, slides, itis | Anastasia Motrenko | Safin Kamil | BMF | AILSBRCVTDEWHS | 10 |
Pogodin Roman | Determining the position of proteins using an electronic map | code, paper, slides, video, itis | Alexander Katrutsa | Andrey Ryazanov | BMF | AILSBRCVTDEWHS | 10 |
Andrey Ryazanov | Restoration of the primary structure of a protein according to the geometry of its main chain | folder, paper, slides, video, itis | Mikhail Karasikov | Roman Pogodin | BMF | AIL+SBRC++VTD+EWHS | 10 |
Safin Kamil | Definition of borrowings in the text without indicating the source | code, paper, slides, video | Mikhail Kuznetsov | Molibog Igor | BMF | AIL+SBRC>V>T>D>E0WHS | 10 |
Dmitry Fedoryaka | Mixtures of vector autoregression models in the problem of time series forecasting | code, slides, paper | Radoslav Neichev | Kovaleva Valeria | BM | AILSBRCV-T>D0E0WH> | 10 |
Tsvetkova Olga | Building scoring models in the SAS system | code, paper, slides | Raisa Jamtyrova | Chygrynskiy Viktor | BF | A+I+L+S+B+R+C+V0T0D0E0WH>S | 10 |
Chygrynskiy Viktor | Approximation of the boundaries of the iris | code, paper, slides, video | Yuri Efimov | | B | AI+L+SBRCV+TDEHFS | 10 |

1. 2016

  • Title: Synergy of classification algorithms.
  • Data: Data from the UCI repository, so that results can be compared directly with other works, in particular the work of Vapnik.
  • References: There are different approaches to combining SVMs: for example, bagging (http://www.ecse.rpiscrews.us/~cvrl/FaceProject/Homepage/Publication/ICPR04_final_cameraready_v4.pdf); boosting has also been tried (http://www.researchgate.net/profile/Hong-Mo_Je/publication/3974309_Pattern_classification_using_support_vector_machine_ensemble/links/09e415091bdc559051000000.pdf).
  • Base algorithm: Described in the problem statement
  • Solution: A modification of the base algorithm, or simply the base algorithm itself. The main thing is to compare with other methods and draw conclusions, in particular about the relationship between an improvement in quality and the diversity of the sets of support objects built by different SVMs.
  • Novelty: It is known (for example, from Konstantin Vyacheslavovich's lectures) that it is not possible to build short compositions from strong classifiers (for example, SVM) using boosting (although they still try (see literature)). Therefore, it is proposed to build a nonlinear combination instead of a linear one. It is assumed that such a composition can give an increase in quality compared to a single SVM.
  • consultant: Alexander Aduenko

2. 2016

  • Title: Temporal theme model of the press release collection.
  • Problem: Development of methods for analyzing the thematic structure of a large text collection and its dynamics over time. The problem is the assessment of the quality of the constructed structure. It is required to implement the criteria of stability and completeness of the temporal thematic model using manual selection of the found topics according to their interpretability, difference and eventfulness.
  • Data: A collection of press releases from the foreign ministries of a number of countries over 10 years, in English.
  • References:
    1. Doikov N.V. Adaptive regularization of probabilistic topic models. Bachelor's thesis, CMC MSU. 2015.
  • Base algorithm: Blei's classic LDA with post-hoc time analysis.
  • Solution: Implementation of an additively regularized topic model using the BigARTM library. Building a series of thematic models. Evaluation of their interpretability, stability and completeness.
  • Novelty: Criteria for stability and completeness of thematic models are new.
  • consultant: Nikita Doikov, problem author Vorontsov K. V.

3. 2016

  • Title: Coordination of logical and linear classification models in the information analysis of electrocardiosignals.
  • Problem: There are logical classifiers based on the identification of diagnostic standards for each disease and built by the Expert in semi-manual mode. For these classifiers, estimates of disease activities are determined, which have been used in the diagnostic system for many years and satisfy physician users. We build linear classifiers that are trained completely automatically and are ahead of logical classifiers in terms of classification quality. However, a direct transfer of the activity estimation technique to linear classifiers turned out to be impossible. It is required to build a linear activity model, setting it to reproduce the known activity estimates of the logical classifier.
  • Data: A selection of more than 10 thousand electrocardiograms with diagnoses for 32 diseases.
  • References: to be provided :)
  • Base algorithm: Linear classifier.
  • Solution: Methods of linear regression, linear classification, feature selection.
  • Novelty: The problem of matching two models of different nature can be considered as learning with privileged information - a promising direction proposed by the machine learning classic VN Vapnik several years ago.
  • consultant: Vlada Tselykh, problem author Vorontsov K. V.

4. 2016

  • Title: Thematic classification model for diagnosing diseases by electrocardiogram.
  • Problem: Technology of information analysis of electrocardiosignals according to V.M.Uspensky is based on ECG conversion into a character string and selection of informative sets of words - diagnostic standards for each disease. The linear classifier builds one diagnostic standard for each disease. The Screenfax screening diagnostic system now uses four standards for each disease, built in a semi-manual mode. It is required to fully automate the process of constructing diagnostic standards and to determine their optimal number for each disease. To do this, it is supposed to finalize the thematic classification model of S. Tsyganova, to perform a new implementation under BigARTM, to expand computational experiments, to improve the quality of classification.
  • Data: A selection of more than 10 thousand electrocardiograms with diagnoses for 32 diseases.
  • References: to be provided.
  • Base algorithm: Classification models by V. Tselykh, topic model by S. Tsyganova.
  • Solution: Topic model implemented using the BigARTM library.
  • Novelty: Topic models have not previously been used to classify sampled biomedical signals.
  • consultant: Svetlana Tsyganova, problem author Vorontsov K. V.

5. 2016

  • Title: Topic models of distributional semantics for identifying ethnically relevant topics in social networks.
  • Problem: Topic modeling of social media text collections faces the problem of ultra-short documents. It is not always clear where to draw the boundaries between documents (possible options: a single post, a user's wall, all posts by a given user, all posts for a given day in a given region, and so on). Topic models give interpretable vector representations of words and documents, but their quality depends on the distribution of document lengths. The word2vec model is independent of document lengths, since it takes into account only the local contexts of words, but the coordinates of its vector representations do not allow a topical interpretation. The objective of the project is to build a hybrid model that combines the advantages of both models and is free from their disadvantages.
  • Data: Collections of social networks LJ and VK.
  • References: to be provided.
  • Base algorithm: Topic models previously built on this data.
  • Solution: Implementation of a distributional semantics regularizer, similar to the word2vec language model, in the BigARTM library.
  • Novelty: So far, there are no language models in the literature that combine the main advantages of probabilistic topic models and the word2vec model.
  • consultant: Anna Potapenko, on technical issues Murat Apishev, problem author Vorontsov K. V.

7. 2016

  • Title: Determining the position of proteins using an electronic map
  • Problem: Informally: there are sets of experimentally determined maps of the location of proteins in complexes, some of which are known at high resolution; it is required to restore the entire map at high resolution. Formally: there are matrices and energy vectors corresponding to each map of the protein complex; it is required to determine which set of proteins minimizes the quadratic form defined by the matrix and the vector.
  • Data: experimental data from the site http://www.emdatabank.org/ will be converted into matrices and energy vectors. Understanding the biophysical nature is not necessary.
  • References: articles on methods for solving quadratic programming problems and various relaxations
  • Base algorithm: quadratic programming methods with various relaxations
  • Solution: minimizing the total energy of the protein complex
  • Novelty: the application of quadratic programming methods and the study of their accuracy in problems of restoring electronic maps
  • consultant: Alexander Katrutsa, problem author: Sergei Grudinin.
  • Desirable skills: understanding and interest in optimization methods, working with CVX package
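
The formal statement above (choose the set of proteins that minimizes a quadratic form) can be illustrated on a toy instance. The sketch below assumes a binary selection vector x and the energy E(x) = x^T Q x + b^T x; the matrix Q, the vector b, and the function names are invented for illustration, and exhaustive search stands in for the quadratic programming relaxations that the project would actually study.

```python
from itertools import product

def energy(x, Q, b):
    """Quadratic form E(x) = x^T Q x + b^T x for a 0/1 selection vector x."""
    n = len(x)
    quad = sum(x[i] * Q[i][j] * x[j] for i in range(n) for j in range(n))
    lin = sum(b[i] * x[i] for i in range(n))
    return quad + lin

def brute_force_min(Q, b):
    """Enumerate all 2^n binary vectors; feasible only for very small n."""
    n = len(b)
    return min(product((0, 1), repeat=n), key=lambda x: energy(x, Q, b))

# Hypothetical toy data: negative linear terms reward selecting a protein,
# positive off-diagonal terms penalise selecting two clashing proteins.
Q = [[0.0, 2.0, 0.0],
     [2.0, 0.0, 0.0],
     [0.0, 0.0, 0.0]]
b = [-1.0, -1.5, -0.5]

best = brute_force_min(Q, b)
best_energy = energy(best, Q, b)
```

Such an enumeration is useful mainly as ground truth on small instances, against which the accuracy of a relaxation can be measured.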

8. 2016

  • Title: Classification of Physical Activity: Investigation of Parameter Space Variation in Retraining and Modification of Deep Learning Models
  • Problem: Given a classification model for a sample of time segments recorded from a mobile phone's accelerometer. The model is a multilayer neural network. It is required 1) to investigate the variance and covariance matrix of the neural network parameters under different optimization schedules (i.e., under different approaches to staged learning). 2) based on the obtained parameter covariance matrix, propose an effective way to modify the deep learning model.
  • Data: WISDM Sample http://www.cis.fordham.edu/wisdm/dataset.php.
  • References:
    1. Zadayanchuk A.I., Popova M.S., Strijov V.V. Choosing the optimal physical activity classification model based on accelerometer measurements http://strijov.com/papers/Zadayanchuk2015OptimalNN4.pdf
    2. Popova M.S., Strijov V.V. Building Deep Learning Networks for Time Series Classification - http://strijov.com/papers/PopovaStrijov2015DeepLearning.pdf
    3. Bakhteev O. Yu., Popova M.S., Strijov V.V. Deep Learning Systems and Tools in the Classification Problem
    4. LeCun Y. Optimal Brain Damage - yann.lecun.com/exdb/publis/pdf/lecun-90b.pdf
    5. Works on pre-training and fine-tuning
  • Base algorithm: The basic model is described in the article "Building Deep Learning Networks for Time Series Classification". The algorithm can be implemented either using the PyLearn library or keras (other libraries and programming languages are also acceptable).
  • Solution: Analysis of the covariance matrix, building an add-del method based on the received data.
  • Novelty: The technique for studying a high-dimensional covariance matrix, as well as the resulting model modification algorithm, are important and will be used in the future when analyzing deep learning models.
  • consultant: Oleg Bakhteev
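
As a minimal sketch of the first step, the empirical covariance matrix of the parameters can be estimated from parameter vectors collected over several training runs. The three runs below are invented numbers, and the source does not specify how the covariance is then used to add or delete parameters, so only the estimate itself is shown.

```python
def covariance_matrix(runs):
    """Unbiased sample covariance of parameter vectors (one vector per run)."""
    m, d = len(runs), len(runs[0])
    mean = [sum(r[k] for r in runs) / m for k in range(d)]
    return [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in runs) / (m - 1)
             for j in range(d)] for i in range(d)]

# Hypothetical parameter vectors obtained from three training runs
runs = [[1.0, 0.0, 2.0],
        [2.0, 0.0, 4.0],
        [3.0, 0.0, 6.0]]
cov = covariance_matrix(runs)
variances = [cov[k][k] for k in range(len(cov))]  # diagonal of the matrix
```

For a real network the matrix is high-dimensional, so in practice one would probably restrict it to a layer or to its diagonal.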

9. 2016

  • Title: Restoration of the primary structure of a protein according to the geometry of its main chain
  • Problem: Given the main chain of a protein, that is, essentially its geometry, it is required to restore the primary structure of the protein: the sequence of amino acids that corresponds to the given main-chain geometry. It is proposed to do this by minimizing the total energy of the protein, expressed as a quadratic form that is most likely not positive definite.
  • Data: at the choice of the student: collected energy matrices for various proteins based on their descriptions in the PDB format or the PDB files themselves; in the latter case, it will be necessary to collect matrices for further work
  • References: articles on methods for solving quadratic programming problems and various relaxations
  • Base algorithm: quadratic programming methods with various relaxations
  • Solution: minimizing the total protein energy
  • Novelty: application of quadratic programming methods and study of their accuracy
  • consultant: Mikhail Karasikov, problem author: Sergei Grudinin.
  • Desirable skills: understanding and interest in optimization methods, working with CVX package

10. 2016

  • Title: Multi-task learning approach to the problem of predicting the biological activity of nuclear receptors
  • Problem: It is required to build a multi-task model that predicts the interaction of two types of molecules: receptors and proteins. Solving this problem is necessary for the development of new drugs (drug design).
  • Data: description of 8500+ proteins and labels for 12 receptors
  • References: will be sent to the student
  • Base algorithm: multi-task lasso regression from the scikit-learn Python library
  • Solution: generalization of linear regression to the multi-task case in a probabilistic interpretation
  • Novelty: The multi-task learning approach is pioneering in drug design
  • consultant: Maria Popova
  • Desired skills: understanding of and interest in probability theory, willingness to quickly understand various approaches to regression, knowledge or willingness to learn Python

11. 2016

  • Title: Bagging of neural networks in the problem of predicting the biological activity of nuclear receptors.
  • Problem: It is required to implement bagging (bootstrap aggregating) for a two-layer neural network. Such a model will be multi-task and will predict the interaction of two types of molecules: receptors and proteins. Solving this problem is necessary for the development of new drugs (drug design).
  • Data: description of 8500+ proteins and labels for 12 receptors
  • References: will be sent to the student
  • Base algorithm: two-layer neural network
  • Solution: A composition of base classifiers built by bagging
  • Novelty: This approach is innovative in the field of drug design
  • consultant: Maria Popova
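
Bagging itself is independent of the base model, so it can be sketched with a much simpler learner. In the example below a one-dimensional nearest-class-mean classifier stands in for the two-layer neural network; the data, the function names, and the number of models are assumptions made only for illustration.

```python
import random

def fit_centroids(sample):
    """Base learner (a stand-in for the neural network): class means in 1-D."""
    means = {}
    for label in {y for _, y in sample}:
        xs = [x for x, y in sample if y == label]
        means[label] = sum(xs) / len(xs)
    return means

def predict_centroids(means, x):
    """Predict the label whose class mean is closest to x."""
    return min(means, key=lambda label: abs(x - means[label]))

def bagging_fit(data, n_models=25, seed=0):
    """Train each base model on a bootstrap resample (drawn with replacement)."""
    rng = random.Random(seed)
    return [fit_centroids([rng.choice(data) for _ in data])
            for _ in range(n_models)]

def bagging_predict(models, x):
    """Aggregate the base models by majority vote."""
    votes = [predict_centroids(m, x) for m in models]
    return max(set(votes), key=votes.count)

# Hypothetical 1-D data: label 0 near 0, label 1 near 5
data = [(k / 10, 0) for k in range(10)] + [(5 + k / 10, 1) for k in range(10)]
models = bagging_fit(data)
```

Replacing `fit_centroids` and `predict_centroids` with the training and inference routines of a neural network leaves the resampling and voting code unchanged.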

12. 2016

  • Title: Mixtures of models in vector autoregression in the problem of predicting (large) time series.
  • Problem: There is a set of time series of length T containing the readings of various sensors that reflect the state of the device. It is necessary to predict the next t sensor readings. Practical significance: before a breakdown, the state of the device changes, the prediction of "abnormal" behavior will help to take timely measures and avoid breakdowns or minimize losses.
  • Data: Multivariate time series with readings of various server sensors (CPU, memory, temperature)
  • References: Keywords: mixture models, boosting, Adaboost, vector autoregression.
    1. Alexander Tsyplakov. Introduction to forecasting in classical time series models. [83]
    2. Neichev R.G., Katrutsa A.M., Strijov V.V. Selection of the optimal set of features from a multicorrelated set in the forecasting problem[84]
    3. Christopher M. Bishop. Pattern Recognition and Machine Learning. Page 667
  • Basic algorithm: Boosting, Adaboost algorithm.
  • Solution: Use a mixture of several linear models instead of one complex model to build the forecast.
  • Novelty: Improved parameter space for mixture of models in vector autoregression.
  • consultant: Radoslav Neichev

13. 2016

  • Title: Selection of multicorrelated features in the problem of vector autoregression.
  • Problem: There is a set of time series containing the readings of various sensors that reflect the state of the device. The readings of the sensors correlate with each other. It is necessary to select the optimal set of features for solving the forecasting problem.
  • Data: Multivariate time series with readings of various server sensors (CPU, memory, temperature)
  • References: Keywords: bootstrap aggregation, Belsley method, vector autoregression.
    1. Neichev R.G., Katrutsa A.M., Strijov V.V. Selection of the optimal set of features from a multicorrelated set in the forecasting problem[85]
  • Basic algorithm: Belsley's method for univariate autoregression (see bibliography article).
  • Solution: Apply the Belsley method to detect correlated features.
  • Novelty: The Belsley method is used for vector autoregression.
  • consultant: Radoslav Neichev

14. 2016

  • Title: Generation of features in the prediction problem.
  • Problem: There is a set of time series containing the readings of various sensors that reflect the state of the device. It is necessary to expand the feature space with the help of non-linear parametric generating functions.
  • Data: Multivariate time series with readings of various server sensors (CPU, memory, temperature)
  • References: Keywords: curvilinear regression, feature generation, non-linear regression, time series approximation.
    1. M.P. Kuznetsov, Strijov V.V., M.M. Medvednikov. Algorithm for multiclass classification of objects described in rank scales.[86]
  • Basic algorithm: Non-parametric generating functions.
  • Solution: Apply quasi-linear and non-linear parameter dependent transformations to features.
  • Novelty: A new set of features for solving autoregressive problems is proposed.
  • consultant: Roman Isachenko
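
Expanding the feature space with parametric generating functions can be sketched as follows. The three functions and their parameter values are hypothetical (the source does not list the generating functions); each maps a raw reading x to a new feature g(x; a, b).

```python
import math

# Hypothetical generating functions g(x; a, b); the source does not list them.
GENERATORS = {
    "linear": lambda x, a, b: a * x + b,
    "power": lambda x, a, b: a * abs(x) ** b,
    "log": lambda x, a, b: a * math.log(1.0 + b * abs(x)),
}

def generate_features(series, params):
    """Return one new feature column per generating function."""
    return {name: [g(x, *params[name]) for x in series]
            for name, g in GENERATORS.items()}

series = [0.0, 1.0, 4.0]
params = {"linear": (2.0, 1.0), "power": (1.0, 0.5), "log": (1.0, 1.0)}
features = generate_features(series, params)
```

In the project setting the parameters (a, b) would presumably be fitted together with the forecasting model rather than fixed by hand as here.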

15. 2016

  • Title: Time series transformations for hand motion decoding using ECoG signals (electrocorticographic signals) in monkeys.
  • Problem: There is a set of time series records of ECoG signals. It is necessary to extract the features using time series transformations (for example, the windowed Fourier transform).
  • Data: Multivariate time series with ECoG readings and monkey movement data
  • References: Keywords: feature extraction, time series transformations, ECoG signal processing
    1. Zenas C. Chao, Yasuo Nagasaka and Naotaka Fujii. Long-term asynchronous decoding of arm motion using electrocorticographic signals in monkeys
  • Basic algorithm: Wavelet transform
  • Solution: Feature extraction from ECoG by various methods.
  • Novelty: Wavelet Transform Optimality Analysis in ECoG Signal Processing Problems
  • consultant: Zadayanchuk Andrey
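
The windowed Fourier transform mentioned in the problem statement can be sketched in pure Python. The signal below is a synthetic sine wave, not ECoG data, and the window length, hop size, and Hann weighting are assumptions chosen for illustration.

```python
import cmath
import math

def stft_magnitudes(signal, window, hop):
    """Magnitude spectra of Hann-weighted frames of length `window`."""
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / window) for n in range(window)]
    frames = []
    for start in range(0, len(signal) - window + 1, hop):
        frame = [signal[start + n] * hann[n] for n in range(window)]
        # Direct DFT of the frame, non-negative frequency bins only
        spectrum = [abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / window)
                            for n in range(window)))
                    for k in range(window // 2 + 1)]
        frames.append(spectrum)
    return frames

# Synthetic stand-in for one channel: 4 cycles per 32-sample window
window, hop = 32, 16
signal = [math.sin(2 * math.pi * 4 * n / window) for n in range(128)]
features = stft_magnitudes(signal, window, hop)
```

Each frame yields one feature vector; for real recordings an FFT routine would replace the quadratic-time DFT used here for self-containment.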

16. 2016

  • Title: An adaptive nonlinear method for recovering a matrix from partial observations
  • Problem: Let there be an unknown (possibly multidimensional) matrix A, the position of an element in it is described by an integer vector p. The values of the matrix on some subset of its elements are known. It is required to find a parametrization and parameters such that the quadratic deviation is minimized on some subset of elements. More detailed description at the link [87]
  • Data: model data, Netflix Prize Data Set, MovieLens 20M Dataset, Criteo Display Advertising Challenge Dataset
  • References:
    1. "ACCAMS: Additive Co-Clustering to Approximate Matrices Succinctly" (Beutel, Amr Ahmed, Smola)
    2. "Non-linear Matrix Factorization with Gaussian Processes" (Neil D. Lawrence)
    3. "Low-rank matrix completion using alternating minimization" (Prateek Jain, Praneeth Netrapalli, Sujay Sanghavi)
  • Basic algorithm: Low-rank approximation
  • Solution: Search for both the parametrization and its parameters from the data.
  • Novelty: Works in this area are generalized; a new model is proposed, and its effectiveness is to be tested
  • consultant: Mikhail Trofimov
  • Desirable Skills: python
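
The base algorithm, low-rank approximation from partial observations, can be sketched with alternating minimization in the simplest rank-1 case. The 3x3 matrix and the set of hidden entries are toy data, and the function name is an assumption; this is the baseline, not the adaptive nonlinear method that the project proposes.

```python
def rank1_complete(observed, shape, iters=200):
    """Alternating least squares for a rank-1 model A[i][j] ~ u[i] * v[j].

    `observed` maps (i, j) -> value for the known entries only.
    """
    rows, cols = shape
    u, v = [1.0] * rows, [1.0] * cols
    for _ in range(iters):
        for i in range(rows):  # fix v, solve for each u[i] in closed form
            num = sum(a * v[j] for (r, j), a in observed.items() if r == i)
            den = sum(v[j] ** 2 for (r, j) in observed if r == i)
            u[i] = num / den
        for j in range(cols):  # fix u, solve for each v[j] in closed form
            num = sum(a * u[i] for (i, c), a in observed.items() if c == j)
            den = sum(u[i] ** 2 for (i, c) in observed if c == j)
            v[j] = num / den
    return [[u[i] * v[j] for j in range(cols)] for i in range(rows)]

# Toy rank-1 matrix [1, 2, 3]^T [1, 2, 3] with two hidden entries
truth = [[1, 2, 3], [2, 4, 6], [3, 6, 9]]
hidden = {(0, 2), (2, 0)}
observed = {(i, j): float(truth[i][j])
            for i in range(3) for j in range(3) if (i, j) not in hidden}
completed = rank1_complete(observed, (3, 3))
```

The sketch assumes that every row and every column contains at least one observed entry; otherwise the corresponding denominator is zero.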

17. 2016

  • Title: Building scoring models in the SAS system (or MATLAB).
  • Problem: Describe the main steps in building scoring models. At the data preparation stage, the problem of sample filtering (removing noisy objects) is solved. Since the sample contains a significant number of features that do not correlate with solvency, feature selection is required. In addition, because the data are heterogeneous (for example, by region), it is proposed to build a mixture of models in which each model describes its own subset of the sample. Different components of the mixture may use different sets of features.
  • Data: Credit histories and potential borrower questionnaires [88], .uci.edu/ml/datasets/Statlog+%28Australian+Credit+Approval%29/.
  • References:
    1. Hosmer, Lemeshow. Logistic regression
    2. Siddiqi. Constructing scorecards
    3. Scoring Mapping Materials
  • Basic algorithm: Logistic regression
  • Solution: Mixture of models
  • Novelty: A method for constructing scorecards is described in which both feature generation and multi-modeling are included in the optimization problem.
  • consultant: Raisa Jamtyrova
  • Desirable Skills: SAS

18. 2016

  • Title: Approximation of the boundaries of the iris.
  • Problem: Based on the image of the human eye, determine the circles approximating the inner and outer border of the iris.
  • Data: Raster monochrome images, typical size 640*480 pixels (other sizes are also possible) [89], [90].
  • References:
    1. K.A. Gankin, A.N. Gneushev, I.A. Matveev Segmentation of the iris image based on approximate methods with subsequent refinements // Izvestiya RAN. Theory and control systems, 2014, no. 2, p. 78–92.
    2. Duda, R. O. Use of the Hough transformation to detect lines and curves in pictures / R. O. Duda, P. E. Hart // Communications of the ACM. 1972. Vol. 15, no. 1. Pp.
  • Basic algorithm: Efimov Yury. Search for the outer and inner boundaries of the iris in the eye image using the paired gradient method, 2015.
  • Solution: See iris_circle_problem.pdf
  • Novelty: A fast non-enumerative algorithm for approximating boundaries using linear multimodels is proposed.
  • consultant: Yuri Efimov (by Strijov V.V., Expert Matveev)
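
To make the notion of an approximating circle concrete, the sketch below fits a circle to boundary points by algebraic least squares. This is a swapped-in textbook technique, not the paired gradient method of the base algorithm nor the multimodel method proposed here, and the boundary points are synthetic rather than extracted from an eye image.

```python
import math

def solve3(M, y):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    A = [row[:] + [rhs] for row, rhs in zip(M, y)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(3):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][3] / A[i][i] for i in range(3)]

def fit_circle(points):
    """Least-squares fit of x^2 + y^2 + D x + E y + F = 0 to edge points."""
    rows = [(x, y, 1.0) for x, y in points]
    rhs = [-(x * x + y * y) for x, y in points]
    # Normal equations (R^T R) p = R^T rhs for p = (D, E, F)
    M = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    v = [sum(r[i] * t for r, t in zip(rows, rhs)) for i in range(3)]
    D, E, F = solve3(M, v)
    cx, cy = -D / 2, -E / 2
    return cx, cy, math.sqrt(cx * cx + cy * cy - F)

# Synthetic boundary points on a circle centred at (320, 240), radius 100
pts = [(320 + 100 * math.cos(t), 240 + 100 * math.sin(t))
       for t in [k * math.pi / 6 for k in range(12)]]
cx, cy, r = fit_circle(pts)
```

On real images the edge points are noisy and contaminated by eyelids and glare, which is why the cited works rely on more robust procedures.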

19. 2016

  • Title: Approximation of combinatorial overfitting estimates for feature selection in the problem of medical diagnostics.
  • Problem: V. M. Uspensky's technology of information analysis of electrocardiosignals is used to diagnose diseases of internal organs from an electrocardiogram. A linear naive Bayes classifier with feature selection performs well in this problem. However, only very simple greedy strategies have been used for feature selection so far. It is proposed to use more intensive search strategies to find better and shorter diagnostic feature sets. However, the more intensive the search, the higher the probability of overfitting. To reduce overfitting, it is proposed to use combinatorial overfitting estimates for threshold decision rules. To compute these estimates efficiently, it is proposed to use surrogate modeling.
  • Data: Samples of vectors of ECG feature descriptions obtained using the Screenfax screening diagnostics system. Will be issued.
  • References:
    1. Uspensky V. M. Informational function of the heart. Theory and practice of diagnosing diseases of internal organs by the method of information analysis of electrocardiosignals. - M.: Economics and informatics, 2008. - 116 p.
    2. Vorontsov K. V. Reliability theory of precedent learning. Course of lectures of VMK MSU and MIPT. 2011.
    3. Ishkina Sh. Kh. Combinatorial estimates of generalizing ability as criteria for feature selection in the syndromic algorithm. - Abstracts of the 58th scientific conference of the Moscow Institute of Physics and Technology. URL: http://conf58.mipt.ru/static/reports_pdf/755.pdf
    4. MVR Composer http://www.machinelearning.ru/wiki/index.php?title=MVR_Composer
  • Base algorithm: linear naive Bayes classifier with feature selection.
  • Solution: Exact combinatorial formulas are used to evaluate overfitting. For approximation (surrogate modeling) of these formulas, MVR Composer is used. Heuristic semi-greedy combinatorial optimization algorithms are used for feature selection.
  • Novelty: Combinatorial overfitting estimates have not previously been used for feature selection. This method makes it possible to reduce diagnostic sets of features and improve the quality of classification.
  • consultant: Ishkina Shaura, Kulunchakov Andrey (MVR Composer), problem author: Vorontsov K. V.

20. 2016

  • Title: Object generation model in the problem of time series forecasting
  • Problem: Build an object generation model for the forecasting problem that creates a high-quality sample for the subsequent solution of that problem.
  • Data: Electricity consumption time series, mobile phone accelerometer time series
  • References:
    1. Keogh E. J., Pazzani M. J. Scaling up dynamic time warping to massive datasets
    2. Salvador S., Chan P. Fastdtw: Toward accurate dynamic time warping in linear time and space
    3. Kuznetsov M.P., Ivkin N.P. Algorithm for classification of accelerometer time series by combined feature description
    4. Karasikov M. E. Classification of time series in the space of parameters of generating models
  • Base algorithm: Various heuristics
  • Problem Statement: The formulation and detailed description of the problem is given at [91]
  • Novelty: consideration of the data generation model in a similar problem
  • consultant: Alexey Goncharov

21. 2016

  • Title: Algorithm for predicting the structure of locally optimal models
  • Problem: It is required to forecast a time series using a parametric superposition of algebraic functions. It is proposed not to construct the forecasting model directly but to predict it, that is, to predict the structure of the approximating superposition. A class of admissible superpositions is introduced, and a locally optimal model for the problem is sought on the set of their structural descriptions. The problem consists of 1) finding a suitable structural description of the model, 2) describing an algorithm that searches for the structure corresponding to the optimal model, and 3) describing an algorithm that reconstructs the model from its structural description. For an existing example that answers questions 1-3, see the work of A. A. Varfolomeeva.
  • Data: A set of time series, which implies the restoration of functional dependencies. It is proposed to first use synthetic data or immediately apply the algorithm to forecasting time series 1) electricity consumption 2) physical activity with subsequent analysis of the resulting structures.
  • References:
    1. A. A. Varfolomeeva Selection of features when marking up bibliographic lists using structural learning methods, 2013, [92]
    2. Bin Cao, Ying Li and Jianwei Yin Measuring Similarity between Graphs Based on the Levenshtein Distance, 2012, [93]
  • Base algorithm: Specifically, there is no basic algorithm for the proposed problem. It is proposed to try to repeat the experiment of A. A. Varfolomeeva for a different structural description in order to understand what is happening.
  • Solution: A superposition of algebraic functions defines an oriented tree whose vertices are labeled with the corresponding algebraic functions or variables. The structural description of such a superposition can therefore be its DFS-code: a string of vertex labels written in the order in which depth-first search traverses the tree. Knowing the arities of the algebraic functions, any such DFS-code can be decoded in O(n) to recover the superposition. It is proposed to search the set of such string descriptions for the one that corresponds to the optimal model.
  • consultant: Kulunchakov Andrey
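
The decoding step described in the solution can be sketched as follows. The arity table and the example DFS-code are hypothetical; the point is that a single left-to-right pass over the labels, guided by the arities, rebuilds the tree and hence the superposition.

```python
# Hypothetical arities; variables and parameters have arity 0.
ARITY = {"+": 2, "*": 2, "sin": 1, "x": 0, "w": 0}

def decode(dfs_code):
    """Rebuild a nested (label, children) tree from a DFS-code in O(n)."""
    pos = 0

    def build():
        nonlocal pos
        label = dfs_code[pos]
        pos += 1
        children = [build() for _ in range(ARITY[label])]
        return (label, children)

    tree = build()
    if pos != len(dfs_code):
        raise ValueError("DFS-code has trailing labels")
    return tree

def to_expression(tree):
    """Render the tree as a readable formula (arities 0, 1, 2 only)."""
    label, children = tree
    if not children:
        return label
    if len(children) == 1:
        return f"{label}({to_expression(children[0])})"
    left, right = (to_expression(c) for c in children)
    return f"({left} {label} {right})"

code = ["+", "*", "w", "x", "sin", "x"]   # labels in depth-first order
expression = to_expression(decode(code))
```

Each label is read exactly once, which gives the linear running time; a code that ends too early fails with an `IndexError`, and one with extra labels raises `ValueError`.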

22. 2016

  • Title: Detection of borrowings in a text without indicating the source
  • Problem: The problem of detecting internal borrowings in a text is considered. It is required to test the hypothesis that the given text was written by a single author and, if the hypothesis is rejected, to highlight the borrowed parts of the text. A borrowing is a part of the text that was presumably written by another author and differs characteristically from the style of the main author. It is required to develop a style function that distinguishes the style of the main author from borrowings with a high degree of certainty.
  • Data: PAN-2011 contest collection.
  • References:
    1. Oberreuter, G., L'Huillier, G., Ríos, S. A., & Velásquez, J. D. (2011). Approaches for intrinsic and external plagiarism detection. Proceedings of the PAN.
  • Basic algorithm, solution: A basic method for identifying dependencies is currently implemented; it analyzes the frequencies of words and character n-grams in a sentence. For each text, a dictionary is built in which each word (n-gram) is assigned its number of occurrences in the text. From these occurrence values, a feature description of each sentence segment is formed. Text segments are classified using expert markup of borrowings. The base algorithm achieves an F1-measure of 0.29 (Plagdet 0.21) on the PAN-2011 collection, while the best algorithm of the 2011 competition [Oberreuter] achieves an F1-measure of 0.32 (Plagdet 0.32). It is proposed to implement the latter algorithm and compare it with the base method.
  • consultant: Mikhail Kuznetsov
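
A minimal sketch of the occurrence-based description is given below. It assumes character trigrams and a single feature per sentence, the mean relative frequency of its n-grams within the whole text; the actual feature set of the base method is not specified in the source, so both choices are illustrative.

```python
from collections import Counter

def char_ngrams(text, n=3):
    """All overlapping character n-grams of a lowercased string."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def segment_features(sentences, n=3):
    """Describe each sentence by the mean relative frequency of its n-grams.

    A low value means the sentence uses n-grams that are rare in the text,
    which is one hypothetical signal of a different author's style.
    """
    vocab = Counter(g for s in sentences for g in char_ngrams(s, n))
    total = sum(vocab.values())
    feats = []
    for s in sentences:
        grams = char_ngrams(s, n)
        mean = sum(vocab[g] / total for g in grams) / len(grams) if grams else 0.0
        feats.append(mean)
    return feats

sentences = ["abcabc", "abcabc", "xyz"]
features = segment_features(sentences)
```

Here the third sentence shares no trigrams with the rest of the text and therefore receives the lowest score.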

23. 2016

  • Title: Using Dimension Reduction Methods When Building a Feature Space in the Problem of Internal Plagiarism Detection
  • Problem: To solve the problem of detecting internal plagiarism more efficiently, use dimensionality reduction methods that preserve the distances between objects. It is required to refine the t-SNE method [2] by including in the model information about data markup and the possibility of adding previously unseen objects to the space of reduced dimension. For details see [1]
  • Data: PAN-2011 contest collection.
  • References:
    1. Problem_statement_dim_reduce.pdf
    2. Laurens van der Maaten. Visualizing Data using t-SNE Journal of Machine Learning Research, 9 (2008) 2579-2605.
    3. Julian Brooke and Graeme Hirst. Paragraph Clustering for Intrinsic Plagiarism Detection using a Stylistic Vector-Space Model with Extrinsic Features, 2012.
  • Basic algorithm, solution: See [1]
  • consultant: Anastasia Motrenko

26. 2016

  • Title: Construction of mappings with minimal deformation to compare images with the standard.
  • Problem: Apply the variational method of constructing quasi-isometric mappings to solve the classical problem of geometric morphology and image registration - constructing a two-dimensional or three-dimensional deformation for comparison with the standard.
  • Data: Images in bmp format. At the first stage, simple bodies can be defined by means of a b/w coloring of the Cartesian lattice.
  • References:
    1. Michael I. Miller, Alain Trouve, Laurent Younes. ON THE METRICS AND EULER-LAGRANGE EQUATIONS OF COMPUTATIONAL ANATOMY. Annu. Rev. Biomed. Eng. 2002. 4:375–405
    2. Beg MF, Miller MI, Trouve A, Younes L. Computing large deformation metric mappings via geodesics flows of diffeomorphisms. International Journal of Computer Vision. 2005; V.61(2):139-157.
    3. Trouve A. An approach of pattern recognition through infinite dimensional group action. Research report LMENS-95-9. 1995.
    4. Garanzha VA. Maximum norm optimization of quasi-isometric mappings. Num. Linear Algebra Appl. 2002; V.9(6-7):493-510.
    5. Garanzha V.A., Kudryavtseva L.N., Utyzhnikov S.V. Untangling and optimization of spatial meshes // Journal of Computational and Applied Mathematics. -- 2014. -- October. -- V. 269 -- P. 24--41.
  • Base algorithm: Use the variational method for constructing mappings, previously proposed for spatial mappings with a given boundary mapping [4], [5], in the case where a measure of proximity between the functions describing the geometric bodies is given, for example, as an RMS measure of the proximity of brightness functions.
  • Solution: For the existing code that implements the variational method for constructing two-dimensional mappings with minimal distortion, it is necessary to add a module that implements an additive to the functional, which is a measure of the proximity of geometric bodies. This includes calculating the functional itself, its gradient, and adjusting the preconditioner.
  • Novelty: Compare the obtained method with the method of geodesic flow of diffeomorphisms proposed in the works of Alain Trouvé (see references [1]-[3]). Estimate the quality of the approximation and the performance of the resulting algorithm.
  • consultant: Vladimir Anatolyevich Garanzha (CC RAS).

27. 2016

  • Title: Cross-language thematic search for scientific publications.
  • Problem: Creation of a prototype search service that accepts the text of a scientific article in Russian as a request and returns thematically related articles in English from the arXiv.org collection as a search result.
  • Data: The arXiv.org text collection, Wikipedia's bilingual text collection.
  • References: to be provided.
  • Base algorithm: Topic model built from the combined collection of the English-language arXiv and the bilingual English-Russian Wikipedia.
  • Solution: Building a regularized topic model using the BigARTM library. Application of standard means of constructing inverted indexes.
  • Novelty: There is no such service on the Russian Internet yet.
  • Consultant: Marina Suvorova.
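
The inverted-index part of the solution can be sketched on sparse topic vectors. The document identifiers, topic weights, and scoring rule (dot product over shared topics) are assumptions for illustration; in the project the vectors would come from the topic model, so that a Russian query and English documents share one topic space.

```python
from collections import defaultdict

def build_inverted_index(doc_topics):
    """Map each topic id to the documents in which it has non-zero weight."""
    index = defaultdict(list)
    for doc_id, topics in doc_topics.items():
        for topic, weight in topics.items():
            index[topic].append((doc_id, weight))
    return index

def search(index, query_topics, top_k=2):
    """Score documents by the dot product of shared topic weights."""
    scores = defaultdict(float)
    for topic, q_weight in query_topics.items():
        for doc_id, d_weight in index.get(topic, []):
            scores[doc_id] += q_weight * d_weight
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Hypothetical sparse topic vectors of three documents
doc_topics = {
    "arxiv:A": {0: 0.9, 1: 0.1},
    "arxiv:B": {1: 0.8, 2: 0.2},
    "arxiv:C": {2: 1.0},
}
index = build_inverted_index(doc_topics)
results = search(index, {1: 0.7, 2: 0.3})
```

Only documents that share at least one topic with the query are touched, which is what makes the index suitable for a large collection.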

28. 2016

  • Title: Search for resonant frequencies in polymer solutions.
  • Problem: Mathematically, the problem reduces to finding the spectral density of random graphs in the vicinity of the percolation point.
  • Data: Simulation data (Erdős–Rényi graphs around the percolation point).
  • References: Nazarov L. I. et al. A statistical model of intra-chromosome contact maps // Soft Matter. 2015. Vol. 11, no. 5. Pp. 1019–1025.
  • Base algorithm: Monte Carlo.
  • Novelty: An algorithm for estimating the spectral density of linear chains is currently known; estimating the spectral density of tree ensembles remains an open question.
  • Consultant: Olga Valba, Yuri Maksimov, Problem Author: Nechaev Sergey.

2016 Group 2

Author Topic Link Consultant Reviewer Report Letters Grade Magazine
Akhtyamov Pavel Selection of multicorrelating features in the problem of vector autoregression code, paper, slides Radoslav Neichev Medvedeva Anna BF AI+LSB++R+CVTDEH 10
Bataev Vladislav Thematic classification model for diagnosing diseases by electrocardiogram code, paper Svetlana Tsyganova B AIL-S++B>R>C0V0T0D0E0W0H> >26.05 (7)
Ivanov Ilya Classification of physical activity: study of parameter space change during retraining and modification of deep learning models code, paper, slides Oleg Bakhteev BF A+ILS+B+R++C+VT+DEW0H 10
Medvedeva Anna Object generation model in the problem of time series forecasting code, paper, slides Goncharov Alexey Akhtyamov Pavel BF AILS-BRCVTD0EWS 10
Persianov Dmitry Temporal theme model of press release collection code, paper, slides Nikita Doikov BF A+I+L+S++B+R+C+V+T0DEW0H 10
Semenenko Denis Algorithm for Predicting the Structure of Locally Optimal Models code, paper Kulunchakov Andrey B AI+L+SB0R0C0V0T0D0E0W0H0
Sofienko Alexander Coordination of logical and linear classification models in the information analysis of electrocardiosignals code, paper Vlada Tselykh B A-I-L-S-C0V0T0D0E0W0H> >26.05
Yaronskaya Lyubov Sparse Regularized Regression on Protein Complex Data code, paper, slides Alexander Katrutsa A-I-L-SB-R-CVT--D-EW0H> >26.05
Aksenov Sergey Cross-language thematic search for scientific publications. code, paper, slides Marina Suvorova AILS0B0R0C0V0T0D0E0W0H> >26.05 (7)
Khismatullin Timur Analysis and classification of the DNA-protein complex interface code, paper, slides Vladimir Garanzha F AILSBRCVT>H> >26.05 (7)

6

  • Title: Sparse Regularized Regression on Protein Complex Data
  • Problem: find the best regression model on protein complex binding data
  • Data: feature description of protein complexes and binding constants for them
  • References: articles on regression and comparing methods on similar data
  • Base algorithm: regularized linear regression (Lasso, Ridge, ..), SVR, kernel methods, etc.
  • Solution: comparison of various regression algorithms on data, selection of the optimal model and parameter optimization
  • Novelty: getting the best regression model for protein complex binding data
  • consultant: Alexander Katrutsa, problem author: Sergei Grudinin.
  • Desirable Skills: willingness to quickly understand various approaches to regression, knowledge or willingness to master C++ at an intermediate level (for a more complete study, you will need to try C++ libraries)

8

  • Title: Classification of physical activity: study of parameter space change during retraining and modification of deep learning models
  • Problem: Given a classification model for a sample of time segments recorded from a mobile phone's accelerometer. The model is a multilayer neural network. It is required 1) to investigate the variance and covariance matrix of the neural network parameters under different optimization schedules (i.e., under different approaches to staged learning). 2) based on the obtained parameter covariance matrix, propose an effective way to modify the deep learning model.
  • Data: WISDM Sample http://www.cis.fordham.edu/wisdm/dataset.php.
  • References:
    1. Zadayanchuk A.I., Popova M.S., Strijov V.V. Choosing the optimal physical activity classification model based on accelerometer measurements http://strijov.com/papers/Zadayanchuk2015OptimalNN4.pdf
    2. Popova M.S., Strijov V.V. Building Deep Learning Networks for Time Series Classification - http://strijov.com/papers/PopovaStrijov2015DeepLearning.pdf
    3. Bakhteev O. Yu., Popova M.S., Strijov V.V. Deep Learning Systems and Tools in the Classification Problem
    4. LeCun Y. Optimal Brain Damage - yann.lecun.com/exdb/publis/pdf/lecun-90b.pdf
    5. Works on pre-training and fine-tuning
  • Base algorithm: The basic model is described in the article "Building Deep Learning Networks for Time Series Classification". The algorithm can be implemented either using the PyLearn library or keras (other libraries and programming languages are also acceptable).
  • Solution: Analysis of the covariance matrix, building an add-del method based on the received data.
  • Novelty: The technique for studying a high-dimensional covariance matrix, as well as the resulting model modification algorithm, are important and will be used in the future when analyzing deep learning models.
  • consultant: Oleg Bakhteev
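
The first step of the problem, estimating the variance and covariance of the network parameters, can be sketched as follows. This is only an illustration under assumptions: `parameter_covariance` is a hypothetical helper, and the parameter vectors are assumed to be flattened and collected from several training runs or optimization stages. Synthetic numbers stand in for real network weights.

```python
import numpy as np

def parameter_covariance(snapshots):
    """Empirical mean and covariance of parameter vectors.

    snapshots: array of shape (n_runs, n_params), one flattened
    parameter vector per training run or optimization stage.
    """
    W = np.asarray(snapshots, dtype=float)
    mean = W.mean(axis=0)
    centered = W - mean
    # Unbiased estimate. Its rank is at most n_runs - 1, so for a large
    # network the matrix is singular unless it is regularized.
    cov = centered.T @ centered / (W.shape[0] - 1)
    return mean, cov

# Synthetic stand-in for network parameters (assumption, not real data).
rng = np.random.default_rng(0)
snapshots = rng.normal(size=(50, 4))
mean, cov = parameter_covariance(snapshots)
per_parameter_variance = np.diag(cov)
```

The diagonal of the matrix gives per-parameter variances; how these are turned into a rule for modifying the model is the subject of the study and is not fixed here.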

25

  • Title: Stability of sampling of electrocardiosignals relative to frequency filtering.
  • Problem: The technology of information analysis of electrocardiosignals according to V.M. Uspensky is based on transforming the electrocardiogram into a character string (codogram) and selecting informative sets of words, the diagnostic standards for each disease. For discretization, the amplitude of the R-peaks must be determined accurately. The amplitude can be affected by the frequency filtering of the signal, which the electrocardiograph performs at the hardware or software level. The problem is to evaluate how much different frequency filters (for example, a 50.4Hz mains suppression filter or a high-pass filter) affect the word frequencies in the codogram and the quality of the classification.
  • Data: electrocardiograms in KDM format.
  • References: to be provided.
  • Base algorithm: Linear classifier.
  • Solution: Direct and inverse Fourier transform, algorithm for detecting R-peaks on an electrocardiogram, algorithm for determining the amplitude of R-peaks.
  • Novelty: The study of the stability of codograms in relation to frequency filtering with different parameters has not previously been carried out in the information analysis of electrocardiosignals.
  • consultant: Victor Safronov (Scientific Center named after V.I.Kulakov)
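
The filtering step built on the direct and inverse Fourier transform can be sketched as below. The function name `fft_notch`, the sampling rate, the 50 Hz interference frequency, and the synthetic signal are all assumptions made for the illustration; real electrocardiograph filters may be implemented differently.

```python
import numpy as np

def fft_notch(signal, fs, f0, half_width=1.0):
    """Suppress frequencies within half_width Hz of f0 using the real FFT."""
    x = np.asarray(signal, dtype=float)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    spectrum[np.abs(freqs - f0) <= half_width] = 0.0
    return np.fft.irfft(spectrum, n=x.size)

# Synthetic illustration (not ECG data): a slow wave plus mains interference.
fs = 500.0                      # sampling rate in Hz (assumed)
t = np.arange(1000) / fs        # two seconds of samples
clean = np.sin(2 * np.pi * 1.0 * t)
noisy = clean + 0.3 * np.sin(2 * np.pi * 50.0 * t)
filtered = fft_notch(noisy, fs, f0=50.0)

# Peak amplitude before and after filtering, the quantity that
# matters for R-peak based discretization.
peak_before, peak_after = noisy.max(), filtered.max()
```

Comparing the peak amplitude before and after filtering for different filter parameters is one way to quantify how filtering could shift the codogram.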

2015

Author | Topic | Link | Consultant | Reviewer | DZ-1 | DZ-2 (Problem number) | Letters
Bernstein Julia | Methods for characterizing fibrinolysis by in vitro blood imaging sequence | | Matveev I. A. | Solomatin | 1 | 3 (8) | AILSBRCVTDE
Bochkarev Artem | Structural learning when generating models | [94] (no code), paper, slides | Varfolomeeva Anna, Oleg Bakhteev | Isachenko | 2 | 2 (7) | A+I++LS+BRCVT+DS
Goncharov Alexey | Metric classification of time series | code, paper, slides | Maria Popova | Zadayanchuk | 1.5 | 1 (4) | AILSBRCVTDSW
Dvinskikh Darina | Improving the quality of forecasting using product groups | code, paper, slides | Kanevsky D. Yu. | Smirnov | 0.5 | 3 (7) | AILSBRCVTDEHS
Efimov Yuri | Search for the outer and inner boundaries of the iris in the eye image using the paired gradient method | code, paper, slides | Matveev I. A. | Neichev | | | AILSBRCVTDEW
Zharikov Ilya | Checking the compliance of the electrocardiograph with the requirements of the diagnostic system "Screenfax" and assessing the quality of electrocardiograms | code, paper, slides | Shaura Ishkina | Bochkarev | 3.5 | 3 (5) | AIL+SBRCVTDEHSW
Zadayanchuk Andrey | Choosing the optimal physical activity classification model | code, paper, slides | Maria Popova | Goncharov | 2 | 0 (17) | AI-LSB+RCVTD
Zlatov Alexander | Building a hierarchical model of a large conference | code, paper, slides | Arsenty Kuzmin | Dvinskikh | 1.5 | 3 (14) | AI+L+SBRC++V+TDESW
Isachenko Roman | Metric learning and space dimension reduction in time series clustering problems | code, paper, slides | Alexander Katrutsa | Zharikov | 3.5 | 3 (14) | A-I+L+S-BR+CVTDEHSW
Radoslav Neichev | Feature selection in time series forecasting using exogenous factors | code, paper, slides | Alexander Katrutsa | Efimov | 1 | 3 (9) | AI-L-SBRCVTDEHSW
Podkopaev Alexander | Prediction of quaternary structures of proteins | code, paper, slides | Maksimov Yu. V. | Reshetova | 3.5 | 3 (11) | AILS+B+RCVTDEHS
Reshetova Daria | Multiclass classification methods with improved convergence estimators in partial learning problems | code, paper, slides | Maksimov Yu. V. | Kamzolov | 2.5 | 3 (10) | AIL++SB+RCVT++DEHS-
Smirnov Evgeniy | Thematic model of interests of permanent users of the mobile application | code, paper, slides | Victor Safronov | Zlatov | 1 | 1 (4) | AILSBRCVTWDE
Solomatin Ivan | Determination of the iris shading area by the classifier of local textural features | code, paper, slides | Matveev I. A. | Bernstein Julia | | 3 (9) | AILSBRCVTDE
Chernykh Vladimir | Testing nonparametric algorithms for time series forecasting under nonstationary conditions | code, paper, slides | Stenina Maria | Shishkovets Svetlana | 3.5 | 3 (4) | A+I+LSBRCVT+DE++H++
Shishkovets Svetlana | Regularization of a linear naive Bayes classifier | code, paper, slides | Uskov Mikhail, Vorontsov K. V. | Chernykh Vladimir | 3.5 | 2 (9) | A+I+L+SBR+CV+TD+E+H+S
Kamzolov Dmitri | New algorithms for the problem of ranking web pages | | Alexander Gasnikov, Yuri Maksimov | Podkopaev | | | AILSB+RCVT+DEHS--
Sukhareva Angelica | Classification of scientific texts by branches of knowledge | code, paper, slides | Sergei Tsarkov | | 0.5 | | AILSBRCVTDEH

1. 2015

  • Title: Improving the quality of demand forecasting using product groups
  • Problem description:

Given:

    1. Time series of sales for several product groups in one hypermarket. For each product, the periods of shortage, the periods when calendar holidays influence demand, and the periods of marketing promotions are also known. A product classifier is also known: a tree of product groups, where the products themselves are leaves.
    2. Forecasting algorithm that is used to generate demand forecasts for these products: self-adaptive exponential smoothing (Trigg-Leach model, see [1])
    3. Loss function by which the quality of forecasts is measured: MAPE.
    4. Requirements for building forecasts: forecasts must be built weekly for 4 weeks ahead (at the beginning of the current week, you need to build a forecast of total demand for the next week and for each of the three weeks after it).

Hypothesis: Demand for individual goods is too volatile to reveal their characteristic seasonality. It is proposed to use data on product groups to determine the seasonality parameters more accurately. Note: there are other ways to improve forecasting quality by working with groups of goods. The problem is to improve forecasting quality, compared with the base algorithm, by taking into account the effect of the interchangeability of goods. The result can be considered achieved if a statistically significant increase in quality is shown when building a series of forecasts (at least 20) for each time series using sliding control (cross-validation).

  • Data:
    1. Data on sales of several product groups in a hypermarket of a large retail chain: https://drive.google.com/file/d/0B5YjPespcL83X3pHaE1aRzBUaDg/view?usp=sharing
  • References:
    1. Lukashin Yu. P. Adaptive methods of short-term forecasting of time series. - M .: Finance and statistics, 2003.
    2. http://www.machinelearning.ru/wiki/index.php?title=%D0%9C%D0%BE%D0%B4%D0%B5%D0%BB%D1%8C_%D0%A2%D1%80%D0%B8%D0%B3%D0%B3%D0%B0-%D0%9B%D0%B8%D1%87%D0%B0
    3. Nitin Patel, Mahesh Kumar, Rama Ramakrishnan. Clustering models to improve forecasts in retail merchandising. http://www.cytel.com/Papers/INFORMS_Prac_%2004.pdf
    4. Kumar M., Error-based Clustering and Its Application to Sales Forecasting in Retail Merchandising. PhD Thesis. http://books.google.ru/books/about/Error_based_Clustering_and_Its_Applicati.html?id=6252NwAACAAJ&redir_esc=y
  • Base algorithm: It is proposed to use the seasonality model [3] in combination with the Trigg-Leach model as a non-seasonal series prediction algorithm ([1] and [2]). In this case, 3 variants of the algorithm are possible, depending on the method of assessing seasonality:
    1. Seasonality is estimated by the very series of sales. For products with a "short" history, seasonality is not assessed.
    2. Seasonality is estimated for a group of goods, based on the classifier of commodity groups (lower level of the classifier)
    3. Seasonality is estimated by clusters, based on the methodology [3], [4].
  • Solution: It is required to implement the combination of the seasonality model [3] and the Trigg-Leach model as a non-seasonal series prediction algorithm ([1] and [2]), with the 3 variants of seasonality analysis described above. When constructing seasonal profiles, it is necessary to exclude periods of marketing campaigns (otherwise, there may be a significant distortion of seasonality). Next, you need a series of experiments with quality analysis on real data. When analyzing quality, you can exclude periods of holidays and marketing campaigns. Based on the results of the experiments, it may be necessary to adapt the clustering algorithm.
  • Novelty: Building a self-adaptive forecasting algorithm taking into account seasonality, identified by cluster analysis.
  • consultant: Kanevsky D.Yu.
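
A minimal sketch of the non-seasonal part of the base algorithm, assuming the commonly cited form of the Trigg-Leach model, in which the smoothing rate equals the absolute tracking signal. The details in [1] and [2] may differ. The function names and the default values of `gamma` and `alpha0` are illustrative, and seasonality handling is omitted.

```python
def trigg_leach_forecast(series, gamma=0.2, alpha0=0.3):
    """One-step-ahead forecasts by exponential smoothing with the
    adaptive rate alpha_t = |E_t / M_t| (the tracking signal)."""
    level = series[0]
    smoothed_err = 0.0   # E_t: smoothed forecast error
    smoothed_abs = 0.0   # M_t: smoothed absolute forecast error
    alpha = alpha0
    forecasts = []
    for y in series[1:]:
        forecasts.append(level)          # forecast made before y is seen
        err = y - level
        smoothed_err = gamma * err + (1 - gamma) * smoothed_err
        smoothed_abs = gamma * abs(err) + (1 - gamma) * smoothed_abs
        if smoothed_abs > 0:
            alpha = abs(smoothed_err / smoothed_abs)
        level = level + alpha * err
    return forecasts

def mape(actual, predicted):
    """Mean absolute percentage error; actual values must be nonzero."""
    terms = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(terms) / len(terms)

# Synthetic sales with a level shift (assumption, not the hypermarket data).
sales = [10.0] * 5 + [20.0] * 10
forecasts = trigg_leach_forecast(sales)
error = mape(sales[1:], forecasts)
```

On the level shift the tracking signal jumps to 1, so the smoothed level moves to the new value in a single step; this is the self-adaptive behaviour the model is chosen for.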

2. 2015

3. 2015

  • Title: Obtaining an estimate of the sparse covariance matrix for nonlinear models (neural networks).
  • Problem: Suggest a method for estimating the covariance matrix of parameters of a general model for the case of linear regression, logistic regression, general non-linear models, including neural networks. Suggest a way to take into account the structure of the matrix (sparseness, dependencies between coefficients, etc.)
  • Data: Synthetic data and tests.
  • References:
    1. Zaitsev A.A., Strijov V.V., Tokmakova A.A. Maximum Likelihood Estimation of Hyperparameters of Regression Models // Information Technologies, 2013, 2 - 11-15.
    2. Kuznetsov M.P., Tokmakova A.A., Strijov V.V. Analytic and stochastic methods of structure parameter estimation // Preprint, 2015.
    3. Aduenko A. A. Presentation on Evidence, 2015. aduenko_presentation_russian.pdf
    4. Bishop C. M. Pattern Recognition and Machine Learning, pp. 161-172, 2006.
  • Base algorithm: Diagonal matrix estimation, see MLAlgorithms/HyperOptimization folder.
  • Solution:
  • Novelty: A fast algorithm for obtaining estimates of the general covariance matrix for nonlinear models is proposed, the properties of sparse matrices are investigated.
  • consultant: Alexander Aduenko.

4. 2015

  • Title: Feature selection in time series forecasting using exogenous factors
  • Problem: The problem statement from [95] formula (32)
  • Data: time series with electricity prices.
  • References:
    1. Keywords: Hourly Price Forward Curve, short-term time series forecasting, feature selection, Add-Del method, (non)linear regression.
    2. Main Articles:
    3. [96] - study of the influence of prices in one country on the price in another and how to take this into account when forecasting .
    4. [97] - overview of terms and processes emerging in HPFC forecasting + motivation
    5. [98] - also about price forecasting, but here about spot prices
  • Base algorithm:
    1. LAD-Lasso estimation from [99]
    2. Sanduleanu's article about the Add-Del modification: [100].
  • Solution: apply the modified Add-Del method as a feature selection method.
  • Novelty: comparison of basic and proposed methods, analysis of properties of the proposed method.
  • consultant: Alexander Katrutsa.

5. 2015

  • Title: Development of an image recognition algorithm for the search for fibrinolysis parameters.
  • Problem: Given a set of images of fibrin clot growth obtained during the study of thrombodynamics and fibrinolysis. It is required to develop an algorithm for finding the coordinates of the segment and the angle of inclination of the activator line from a series of images. Test the developed algorithm on different types of fibrinolysis and examples where this process is absent.
  • Data: An array of images for each study in tiff format 16 bits with time points from the beginning in seconds.
  • References:
    1. Description of the applied problem and terms of reference: on request.
  • Base algorithm: Hough Transform [101], discussed.
  • consultant: I.A. Matveev
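
The base algorithm, the Hough transform for straight lines, can be sketched as follows. This is a plain NumPy illustration on a synthetic binary image; the function names are hypothetical, and edge detection on the actual 16-bit TIFF images is assumed to have been done beforehand.

```python
import numpy as np

def hough_lines(binary_image, n_theta=180):
    """Accumulate votes in (rho, theta) space for a binary edge image.

    A line is parameterized as rho = x*cos(theta) + y*sin(theta).
    Returns the accumulator, the rho offset, and the theta grid.
    """
    img = np.asarray(binary_image, dtype=bool)
    ys, xs = np.nonzero(img)
    thetas = np.deg2rad(np.arange(n_theta) * 180.0 / n_theta)
    diag = int(np.ceil(np.hypot(*img.shape)))
    acc = np.zeros((2 * diag + 1, n_theta), dtype=int)
    cols = np.arange(n_theta)
    for x, y in zip(xs, ys):
        rhos = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        acc[rhos + diag, cols] += 1      # one vote per angle for this pixel
    return acc, diag, thetas

def strongest_line(binary_image):
    """Return (rho, theta in degrees) of the most voted line."""
    acc, diag, thetas = hough_lines(binary_image)
    r, t = np.unravel_index(np.argmax(acc), acc.shape)
    return int(r) - diag, float(np.rad2deg(thetas[t]))

# Synthetic image with a single horizontal line at y = 20.
image = np.zeros((50, 50), dtype=bool)
image[20, :] = True
rho, theta_deg = strongest_line(image)
```

The angle of the strongest line corresponds to the inclination of the activator line; locating the segment end points along that line would need an additional step.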

6. 2015

  • Title: Prediction of Quaternary Structures of Proteins: leveling
  • Problem description: The problem is to predict the packing of protein molecules into a multimeric complex in the rigid body approximation. One of the formulations of the problem is written as a non-convex optimization.

It is necessary to study this formulation and propose a solution algorithm. Suppose we have N proteins in an assembly, such that each protein i can be located in one of P positions x_{i}^{p}, where N ~ 10 and P ~ 100. To each pair of vectors x_{i}^{p} and x_{j}^{q}, we can assign an energy function q_{0}, which is the overlap integral in the simplest approximation. Each protein position also has an associated score b_{0}.

  • Data: Collected using one of the standard complexes resolved using electron microscopy. The energy values and overlap integrals are calculated by modifying one of the standard packages, for example, HermiteFit. Data is generated in ~1 minute; code modification and data preparation will take ~1 week.
  • References: Yu.E. Nesterov Introduction to Convex Optimization (available at PreMoLab website)
  • Code notes: Implementation notes
  • Base algorithm: It is proposed to try convex relaxations.
  • Novelty: Convex relaxations have not been used before in such problems on these proteins
  • consultant: Yu.V. Maksimov

7. 2015

  • Title: Metric learning and space dimensionality reduction in time series classification problems
  • Problem: The problem statement from the base article, some modification of the error function is possible due to the specifics of the time series
  • Data: electricity price time series
  • References:
    1. [102] - basic article
    2. [103] - excellent overview of Metric Learning methods
    3. [104] - more overview
  • Base algorithm: Frank-Wolfe algorithm (conditional gradient method)
  • Solution: apply target matrix decimation with Belsley method to remove multicollinearity
  • Novelty: application of Metric Learning methods in the problem of time series clustering, analysis of the properties of the proposed method
  • consultant: Alexander Katrutsa
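
A minimal sketch of the Frank-Wolfe (conditional gradient) iteration. The feasible set here is the probability simplex and the objective is a simple quadratic; both are assumptions chosen to keep the example short, and the constraint set and error function of the base article may differ.

```python
import numpy as np

def frank_wolfe_simplex(grad, x0, n_iter=200):
    """Minimize a smooth convex function over the probability simplex."""
    x = np.asarray(x0, dtype=float)
    for k in range(n_iter):
        g = grad(x)
        # Linear minimization oracle: the best vertex of the simplex.
        s = np.zeros_like(x)
        s[np.argmin(g)] = 1.0
        step = 2.0 / (k + 2.0)           # standard step-size schedule
        x = x + step * (s - x)           # convex combination stays feasible
    return x

# Toy objective ||x - c||^2 with its minimizer c inside the simplex.
c = np.array([0.2, 0.5, 0.3])
x_hat = frank_wolfe_simplex(lambda x: 2.0 * (x - c), x0=[1.0, 0.0, 0.0], n_iter=2000)
```

Each iterate is a convex combination of vertices, so feasibility is kept without any projection step, which is the usual reason for choosing this method when projections are expensive.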

8. 2015

  • Title: Structural learning when generating models
  • Problem: The problem is to find a ranking function for information retrieval. The search is carried out among non-parametric functions (structures) generated by a grammar of the form G: g -> B(g, g) | U(g) | S, where B is a set of binary operations {+, -, *, /}, U is a set of unary operations {-(), sqrt, log, exp}, and S is a set of variables and parameters {x, y, k}. It is proposed to solve the problem of generating a ranking model in two stages, using the history of restoring the model structure as a training sample.
  • Data: TREC subcollections.
  • Description of the collection of data used to evaluate the features, and the evaluation procedure. [105]
  • References:
    1. Jaakkola T. Scaled structured prediction.
    2. Jaakkola lecture “Scaling structured prediction”
    3. Find all the work of TJ students on a given topic.
    4. Varfolomeeva A.A. Bachelor's thesis in MLAlgorithms/BSThesis/Varfolomeeva
  • Base algorithm: Parantap, BM25 - models for comparison.
  • Solution: It is proposed to cluster the collection and generate models for document clusters. Then, using the structural learning method, find models that generalize the unions of clusters up to the collection itself.
  • Novelty: Ranking functions found that are as good as those used in practice.
  • consultant: Anna Varfolomeeva, Oleg Bakhteev
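
The grammar G from the problem statement can be sampled with a short recursive generator. The sketch below is illustrative only: the depth limit, the uniform choice among rules, and the tuple representation of trees are assumptions, not part of the original method.

```python
import random

BINARY = ["+", "-", "*", "/"]
UNARY = ["-", "sqrt", "log", "exp"]
SYMBOLS = ["x", "y", "k"]

def generate(rng, max_depth):
    """Sample one expression tree from g -> B(g, g) | U(g) | S."""
    if max_depth == 0:
        return rng.choice(SYMBOLS)
    rule = rng.choice(["B", "U", "S"])
    if rule == "B":
        left = generate(rng, max_depth - 1)
        right = generate(rng, max_depth - 1)
        return (rng.choice(BINARY), left, right)
    if rule == "U":
        return (rng.choice(UNARY), generate(rng, max_depth - 1))
    return rng.choice(SYMBOLS)

def to_string(tree):
    """Render a tree as an infix / functional expression."""
    if isinstance(tree, str):
        return tree
    if len(tree) == 3:
        return "(" + to_string(tree[1]) + " " + tree[0] + " " + to_string(tree[2]) + ")"
    return tree[0] + "(" + to_string(tree[1]) + ")"

def depth(tree):
    """Number of rule applications on the longest path to a leaf."""
    if isinstance(tree, str):
        return 0
    return 1 + max(depth(child) for child in tree[1:])

rng = random.Random(0)
candidates = [generate(rng, max_depth=3) for _ in range(5)]
expressions = [to_string(t) for t in candidates]
```

The sequence of rules applied while building a tree is the kind of generation history that could serve as a training sample for the structural learning stage.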

9. 2015

  • Title: Checking the compliance of the electrocardiograph with the requirements of the diagnostic system "Screenfax" and assessing the quality of electrocardiograms.
  • Problem description: The problem of checking the compliance of an arbitrary electrocardiograph with the requirements of the "Screenfax" diagnostic system [1-4] is solved by comparing electrocardiograms (ECG) of the same patients recorded by both devices according to the ABAB scheme, where A is the first device and B is the second. The problem of automatically detecting low-quality electrocardiograms that do not meet the requirements of the diagnostic system is also solved.
  • Data: The sample consists of records with ECG values recorded by the device under test and by the device used in the Screenfax diagnostic system (data with a detailed description of the recording format will be provided to the person who selects this problem). You can use http://www.physionet.org/physiobank/database/ptbdb/ to test algorithms for R-peak detection and noise level estimation.
  • References:
    1. Information portal of the Diagnostic system "Screenfax". URL: http://skrinfax.ru/method-author/
    2. Technology for information analysis of electrocardiosignals
    3. Uspensky V.M. Information function of the heart. Theory and practice of diagnosing diseases of internal organs by the method of information analysis of electrocardiosignals. M.: Economics and informatics, 2008. 116p.
    4. Uspensky V.M. Information function of the heart. // Clinical medicine. 2008. V.86. No. 5. pp.4–13.
    5. Naseri H., Homainezhad M.R. Electrocardiogram signal quality assessment using an artificially reconstructed target lead // Computer Methods in Biomechanics and Biomedical Engineering. 2015. Vol.18, No. 10.Pp. 1126-1141.
    6. Zidelmal Z., Amirou A., Ould-Abdeslam D., Moukadem A., Dieterlen A. QRS detection using S-Transform and Shannon energy. // Comput Methods Programs Biomed. 2014. Vol. 116, no. 1.Pp. 1-9. URL: https://yadi.sk/i/-kD00y1VepB3q
    7. Sarfraz M., Li F. F., Khan A. A. Independent Component Analysis Methods to Improve Electrocardiogram Patterns Recognition in the Presence of Non-Trivial Artifacts // Journal of Medical and Bioengineering. 2015. Vol. 4, no. 3.Pp. 221-226. URL: https://yadi.sk/i/-kD00y1VepB3q
    8. Meziane N. et al. Simultaneous comparison of 1 gel with 4 dry electrode types for electrocardiography // Physiol. Meas. 2015. Vol. 36, no. 513.
    9. Allana S., Aversa J., Varghese C., et al. Poor quality electrocardiograms negatively affect the diagnostic accuracy of ST segment elevation myocardial infarction. // J Am Coll Cardiol. 2014. Vol. 63, no. 12_S. doi:10.1016/S0735-1097(14)60172-8.
  • Base algorithm: ECG quality estimation – [4], R-peak detection – [5], noise level estimation in data – [6].
  • Solution: It is proposed to check the compliance of an arbitrary electrocardiograph with the requirements of the "Screenfax" diagnostic system by constructing permutation statistical tests that compare the values of RR-intervals and R-amplitudes and the detected code sequences (calculated from amplitudes and intervals) for each disease. This requires solving the problem of detecting R-peaks. Detecting low-quality electrocardiograms requires estimating the noise level. In addition, ECGs with non-informative amplitude values or a large spread of interval values must be filtered out, since the method of analyzing electrocardiographic signals is not applicable to the diagnosis of arrhythmia.
  • Novelty: The problem of checking the compliance of the electrocardiograph with the requirements of the diagnostic system can be considered as a problem of comparing ECG recording devices, which arises, for example, when comparing different types of electrodes; the noise level in the electrocardiosignal values, the presence of baseline drift, and some other features are selected as criteria [7].
  • consultant: Shaura Ishkina
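
A permutation test of the kind mentioned in the solution can be sketched as follows. The two-sample difference of means is an assumed test statistic, the function name is hypothetical, and the RR-interval values are synthetic. A paired design that matches the ABAB recording scheme might be more appropriate in practice.

```python
import numpy as np

def permutation_test(a, b, n_perm=10000, seed=0):
    """Two-sided permutation p-value for the difference of sample means."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)   # random relabeling of devices
        diff = abs(perm[:a.size].mean() - perm[a.size:].mean())
        if diff >= observed:
            count += 1
    # Add-one correction keeps the p-value strictly positive.
    return (count + 1) / (n_perm + 1)

# Synthetic RR-intervals in seconds for devices A and B (not real data).
rng = np.random.default_rng(42)
rr_a = rng.normal(0.80, 0.05, size=40)
rr_b = rng.normal(0.80, 0.05, size=40)
p_value = permutation_test(rr_a, rr_b, n_perm=2000)
```

A small p-value would indicate a systematic difference between the devices; the same scheme applies to R-amplitudes or to frequencies of code sequences with a suitable statistic.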

10. 2015

  • Title: Simplification of the IR models structure
  • Problem: To achieve the acceptable quality of the information retrieval models, modern search engines use models of very complex structure. In current research we propose to simplify the model structure and make it interpretable without decreasing the model accuracy. To do this, we follow the idea from (Goswami et al., 2014) of constructing the set of nonlinear IR functions of simple structure and admissible accuracy. However, each of these functions is expected to have lower accuracy while comparing with the best IR model of complex structure. Thus, we propose to approximate this complex model with the linear combination of simple nonlinear functions and expect to obtain the comparable quality of solution.
  • Data: TREC collections.
  • References:
    1. P. Goswami et al. Exploring the Space of IR Functions // Advances in Information Retrieval. Lecture Notes in Computer Science. 8416:372-384, 2014.
    2. problem statement
  • Base algorithm: Gradient boosting machine for constructing a model of high complexity. Exhaustive search of superpositions from a set of elementary functions for approximation and simplification.
  • Solution: The optimal functions for the linear combination can be found by the greedy algorithm.
  • Novelty: A new ranking function of simple structure competitive with traditional ones.
  • consultant: Mikhail Kuznetsov.

11. 2015

  • Title: Testing non-parametric time series forecasting algorithms under non-stationary conditions
  • Problem: One of the key assumptions about the data distribution in nonparametric forecasting is that the time series is stationary. If this requirement is not met, the adequacy of forecasts is not guaranteed. It is required to develop a method for checking the condition of local stationarity of a time series, in order to study the applicability of the main nonparametric forecasting algorithms in the absence of stationarity. Consider the main methods of nonparametric regression, such as kernel smoothing, spline smoothing, autoregression, moving average, etc.
  • Data: Data on freight rail transportation (RZD)
  • References:
    1. Valkov A.S., Kozhanov E.M., Medvednikova M.M., Khusainov F.I. Nonparametric forecasting of railway junction system load based on historical data // Machine Learning and Data Analysis. - 2012. - No. 4.
    2. Dickey D. A. and Fuller W. A. Distribution of the Estimators for Autoregressive Time Series with a Unit Root / Journal of the American Statistical Association. - 74. - 1979. - p. 427--431.
  • Base algorithm: ARMA, Hist.
  • Solution: Use the Dickey-Fuller test as a basic method for checking series for non-stationarity. It is also proposed to consider such sources of non-stationarity as trend and seasonality.
  • Novelty: A method for determining the fulfillment of the condition of local stationarity of a time series has been developed and substantiated.
  • consultant: Stenina Maria
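
The Dickey-Fuller statistic used in the solution can be computed with an ordinary least-squares regression. The sketch below covers only the simplest variant, with a constant and no lagged differences, and the function name is hypothetical; a library implementation should be preferred for real analysis.

```python
import numpy as np

def dickey_fuller_stat(y):
    """t-statistic of rho in the regression dy_t = c + rho * y_{t-1} + e_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(dy.size), y[:-1]])
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    sigma2 = resid @ resid / (dy.size - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])

# Synthetic stationary AR(1) series (assumption, not the RZD data).
rng = np.random.default_rng(0)
noise = rng.normal(size=500)
series = np.zeros(500)
for t in range(1, 500):
    series[t] = 0.5 * series[t - 1] + noise[t]
stat = dickey_fuller_stat(series)
```

The statistic is compared with Dickey-Fuller critical values rather than with the usual t-distribution; for this variant the asymptotic 5% critical value is approximately -2.86, and values below it reject the unit-root hypothesis. Applying the statistic in a sliding window is one possible way to look at local stationarity.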

12. 2015

  • Title: Metric learning in full and partial learning problems
  • Problem description: The problem is to implement in software a set of convex and DC-optimization methods for choosing the optimal metric in recognition problems, that is, to construct a metric such that nearest neighbor classification gives high accuracy.
  • Data: Birds and Fungus ImageNet collection with Deep features extracted (provided by consultant). Primary tests can be done on the data provided here
  • References: References and a detailed description of the problem are given in file
  • Code notes: Implementation notes
  • Base algorithm: 1) convex relaxation of the problem, solved by an interior-point method via CVX; 2) SVM on a modified sample consisting of pairs of objects
  • consultant: Yu.V. Maksimov

13. 2015

  • Title: Building a hierarchical topic model of a large conference
  • Problem: Every year, the program committee of the major EURO conference (more than 2000 reports) faces the problem of building a hierarchical model of conference abstracts. Because the structure of the conference changes little from year to year, it is proposed to build a thematic model of the future conference using expert models of conferences of previous years. This raises the following subproblems:
  1. Classification of abstracts of the new conference.
  2. Predicting changes in the structure of the conference.
  • Data: Abstracts and expert models of EURO 2010, 2012, 2013 conferences.
  • References: Alexander A. Aduenko, Arsentii A. Kuzmin, Vadim V. Strijov. Adaptive thematic forecasting of major conference proceedings text of the article
  • Base algorithm:
  • Solution: To solve the subproblems:
  1. it is proposed to combine the expert models of conferences of previous years into one, and for each abstract of a new conference to find the most suitable cluster in the resulting combined model, for example, using a weighted cosine measure of proximity.
  2. explore changes in the structure of conferences from year to year and determine the threshold of intra-cluster similarity at which, for a certain set of abstracts, experts create a new cluster rather than adding these abstracts to existing clusters.
  • Novelty: A weighted cosine proximity measure that takes into account the hierarchical structure of clusters. Forecasting changes in the hierarchical structure/topics of the conference
  • consultant: Arsenty Kuzmin
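
The first subproblem, assigning a new abstract to the closest cluster with a weighted cosine measure, can be sketched as follows. The function names are hypothetical and the weights here are arbitrary; how the weights should reflect the hierarchical structure of clusters is the novelty of the problem and is not reproduced in this illustration.

```python
import numpy as np

def weighted_cosine(u, v, w):
    """Cosine similarity with nonnegative per-term weights w."""
    u, v, w = (np.asarray(a, dtype=float) for a in (u, v, w))
    num = np.sum(w * u * v)
    den = np.sqrt(np.sum(w * u * u)) * np.sqrt(np.sum(w * v * v))
    return float(num / den) if den > 0 else 0.0

def assign_cluster(doc, centroids, w):
    """Index of the most similar cluster centroid and all similarities."""
    sims = [weighted_cosine(doc, c, w) for c in centroids]
    return int(np.argmax(sims)), sims

# Toy term-frequency vectors over a three-word vocabulary.
centroids = [[1, 0, 0], [1, 1, 0], [0, 0, 1]]
weights = [1.0, 1.0, 1.0]
best, sims = assign_cluster([2, 2, 0], centroids, weights)
```

With unit weights the measure reduces to the ordinary cosine similarity; increasing the weight of a term makes agreement on that term count for more.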

14. 2015

  • Title: Regularization of the linear naive bayes classifier.
  • Problem: Building a linear classifier is one of the classic and best studied machine learning problems. A linear naive Bayes (LNB) classifier has the strong advantage that it is trained in time linear in the sample size, and the strong limitation that its derivation assumes independent features. On some data, LNB performs surprisingly well despite a clear violation of the feature independence hypothesis. The linear Support Vector Machine (SVM) is considered a very successful method, but it takes a long time on large samples. Both methods work in the same space of linear classifiers. The idea of the study is to bring LNB closer to SVM in terms of quality, without loss of efficiency, by means of minor corrections.
  • Data: One of the three data sets, optional: classification of texts into scientific and non-scientific, classification of abstracts by fields of science, classification of ECG codograms for sick and healthy.
  • References:
    1. Larsen (2005) Generalized Naive Bayes Classifiers.
    2. Abraham, Simha, Iyengar (2009) Effective Discretization and Hybrid feature selection using Naïve Bayesian classifier for Medical datamining.
    3. Lutu (2013) Fast Feature Selection for Naive Bayes Classification in Data Stream Mining.
    4. Zaidi, Carman, Cerquides, Webb (2014) Naive-Bayes Inspired Effective Pre-Conditioner for Speeding-up Logistic Regression.
    5. For further references, ask Vorontsov K. V.
  • Base algorithm: any ready-made LNB and SVM implementations. Plus naive feature selection for LNB.
  • Solution: Derive correction formulas for LNB weights when using a margin-maximization regularizer similar to SVM. We build an iterative process in which a correction is calculated at each step, bringing the LNB closer to the SVM a little more. ROC-curves and dependences of Hold-out AUC on the iteration number are built.
  • Novelty: The ML community still hasn't realized that any linear classifier is equivalent to some kind of Naive Bayesian classifier.
  • consultant: Mikhail Uskov. Hyperconsultant: Vorontsov K. V.
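
To show why naive Bayes lives in the space of linear classifiers, here is a minimal sketch of a two-class multinomial naive Bayes written as a linear rule. The multinomial model, the Laplace smoothing, and the function names are assumptions for the illustration; the SVM-like correction of the weights proposed in the problem is not shown.

```python
import numpy as np

def fit_linear_naive_bayes(X, y, smoothing=1.0):
    """Multinomial naive Bayes expressed as the linear rule w @ x + b > 0."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # One pass over the data: per-class feature counts with smoothing.
    counts_pos = X[y == 1].sum(axis=0) + smoothing
    counts_neg = X[y == 0].sum(axis=0) + smoothing
    log_theta_pos = np.log(counts_pos / counts_pos.sum())
    log_theta_neg = np.log(counts_neg / counts_neg.sum())
    w = log_theta_pos - log_theta_neg            # log-likelihood ratios
    b = np.log(np.mean(y == 1)) - np.log(np.mean(y == 0))
    return w, b

def predict(X, w, b):
    """Class 1 where the log-odds are positive, class 0 otherwise."""
    return (np.asarray(X, dtype=float) @ w + b > 0).astype(int)

# Toy word-count data with two features and two classes.
X = [[3, 0], [2, 1], [0, 3], [1, 2]]
y = [1, 1, 0, 0]
w, b = fit_linear_naive_bayes(X, y)
labels = predict(X, w, b)
```

The weights are differences of log-probabilities, so training needs only class-wise sums of the features, which is what makes the method linear in the sample size.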

15. 2015

  • Title: Thematic model of the interests of regular users of the mobile application.
  • Problem: The mobile app for learning English words offers the user words one by one. The user can either add a word to the studied ones or discard it. To start learning words, the user needs to collect at least 10 words. It is required to build a probabilistic word generation model that adapts to the interests of the user.
  • Data: There are lists of added and dropped words for each user. In addition, it is intended to use a large external collection of texts, for example, Wikipedia, for sustainable topic definition.
  • References:
    1. Vorontsov K. V., Potapenko A. A. Additive Regularization of Topic Models // Machine Learning. Special Issue “Data Analysis and Intelligent Optimization with Applications”. 2014. Russian translation
  • Base algorithm: Random word selection algorithm.
  • Solution: The topic model for each user determines the topic profile of the user's interests p(t|u). To generate words, the word distributions p(w|t) of the given user's topics are used. The dependence of the quality functionals of the topic model on the iteration number is plotted. The main quality functional is the ability of the model to predict which words the user will keep and which will be discarded.
  • Novelty: A feature of the model is the presence of discarded words. The developed methods can also be applied in recommender systems with likes and dislikes.
  • consultant: Viktor Safronov. Hyperconsultant: Vorontsov K. V.
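
The word generation step from the solution can be sketched as a mixture of topic distributions weighted by the user profile. The matrix layout, the exclusion of already seen words, and the function name are assumptions for the illustration; fitting p(w|t) and p(t|u) is not shown.

```python
import numpy as np

def word_distribution(phi, theta_u, excluded=()):
    """p(w|u) = sum_t p(w|t) p(t|u), renormalized after removing
    words the user has already added or discarded.

    phi: array (n_words, n_topics), each column is p(w|t).
    theta_u: array (n_topics,), the user's topic profile p(t|u).
    """
    p = np.asarray(phi, dtype=float) @ np.asarray(theta_u, dtype=float)
    p[np.asarray(list(excluded), dtype=int)] = 0.0
    return p / p.sum()

# Toy model: three words, two topics.
phi = [[0.5, 0.0],
       [0.5, 0.5],
       [0.0, 0.5]]
theta_u = [0.5, 0.5]
p_all = word_distribution(phi, theta_u)
p_rest = word_distribution(phi, theta_u, excluded=[1])

# Offer the next word by sampling from the remaining distribution.
rng = np.random.default_rng(0)
next_word = int(rng.choice(len(p_rest), p=p_rest))
```

Sampling from this distribution offers words from the topics the user prefers, while the random baseline would draw words uniformly.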

2014

Author | Topic | Link | Consultant | DZ-1 | Letters | Sum | Grade
Gazizullina Rimma | Forecasting the volume of rail freight traffic by pairs of branches | [106], pdf | Stenina Maria | 15/15 + 10/16 | [MF]TAI+L+SBR+CV+T>DEH(J) | 16 | 10
Grinchuk Alexey | Selection of Optimal Structures of Predictive Models by Structural Learning Methods | [107], pdf | Varfolomeeva Anna | 7/15 + 2/16 | [F]TA+I+LSBR+CV+T+D+E(F) | 14.5 | 9
Gushchin Alexander | Sequential Generation of Essentially Nonlinear Models in Document Ranking Problems | [108], pdf | Kuznetsov Mikhail | 5/15 + 2/16 | [F]TAI+L+SBRCVTDEHS(F) | 15.5 | 9
Efimova Irina | Differential diagnosis of diseases by electrocardiogram | [109], pdf | Vlada Tselykh | 15/15 + 12/16 | [MF]T+A+I+L+SB++R+CV+TDE+H(J ed) | 17.25 | 10
Zhukov Andrey | Building University Rankings: Panel Analysis and Sustainability Assessment | [110], pdf | Kuznetsov Mikhail | 8/15 + 0 | [F]TAIL+SBRCVTDEHS(F) | 15.25 | 9
Ignatov Andrey | Manifold training for predicting sets of quasi-periodic time series | [111], pdf | Ivkin Nikita | 0 + 7/16 | [MF]TA+I+L+S+B+R+C+VTD>E+HS (J if ed) | 18 | 10
Karasikov Mikhail | Search for effective methods of dimensionality reduction in solving problems of multiclass classification by reducing it to solving binary problems | [112], pdf | Yu.V. Maksimov | 0 + 0 | [MF]TAI+L+SBRC+V+TDESH(J) | 15 | 10
Kulunchakov Andrey | Detecting Isomorphic Structures of Essentially Nonlinear Predictive Models | [113], pdf | Sologub Roman, Kuznetsov Mikhail | 10/15 + 14/16 | [F]T+AI+L+S+BR+CVT++D+EHS(J ed-ed) | 17 | 10
Lipatova Anna | Detecting Patterns in a Set of Time Series by Structural Learning Methods | [114], pdf | A. P. Motrenko | 8/15 + 6/16 | [MF]TA+I+LSBR-CVTDE (J when ed) | 14.25 | 10
Makarova Anastasia | Using non-linear forecasting when looking for dependencies between time series | [115], pdf | A. P. Motrenko | 0 + 0 | [F]TAI-LSB+R-CVTD>E>(F) | 12.75 | 9
Plavin Alexander | Optimizing the Number of Topics in Probabilistic Topic Models with a Row Sparsity Regularizer | [116], pdf | Potapenko Anna | 13/15 + 14/16 | [F]T+A+I+L+S+BR++CVTD+>>(?) | 14 | 10
Maria Popova | Choosing the optimal model for predicting human physical activity based on accelerometer measurements | [117], pdf | Tokmakova Alexandra | 11/15 + 6/16 | [MF]T+AI+L++SB++R+CV+TD+(JV ed) | 15.25 | 10
Shvets Mikhail | Interpretation of multimodels in the processing of sociological data | [118], pdf | Alexander Aduenko | 11/15 + 4/16 | [M+F]T+A+I+L+S+B+R+CVTD+E(F) | 16.25 | 9
Shinkevich Mikhail | Influence of sparse, smoothing and decorrelation regularizers on the stability of a probabilistic topic model | [119], pdf | Dudarenko Marina | 15/15 + 9/16 | [MF]T+AIL+S+BR+CV+T+D+E+H(J ed) | 17 | 10

1. 2014

  • Optimizing the Number of Topics in Probabilistic Topic Models with a Row Sparsity Regularizer
  • Problem: The probabilistic topic model describes the probabilities of occurrence of words w\in W in documents d\in D through latent topics t\in T:
p(w|d) = \sum_{t\in T} p(w|t)p(t|d) = \sum_{t\in T} \phi_{wt}\theta_{td}.
We need to test the hypothesis that by imposing constraints on the \Theta matrix using the row sparsity regularizer, it is possible to determine the optimal number of topics.

    • Data: The collection of documents is specified by word frequencies. Since solving the problem requires knowing the "true" number of topics, experiments are performed on realistic synthetic or semi-synthetic data.
    • References:
      1. Description of the problem and proposed solutions
      2. Vorontsov K. V. Additive regularization of thematic models of collections of text documents // Reports of the Russian Academy of Sciences. 2014. - V. 455, No. 3 (in press).
      3. Vorontsov K. V. Probabilistic thematic modeling. — 2014. http://www.MachineLearning.ru/wiki/images/2/22/Voron-2013-ptm.pdf
      4. Teh Y. W., Jordan M. I., Beal M. J., Blei D. M. Hierarchical Dirichlet processes // Journal of the American Statistical Association. - 2006. - Vol. 101, no. 476.-Pp. 1566–1581
    • Basic algorithm: Regularized EM-algorithm [2014: Vorontsov] is used to solve the optimization problem. A rational, stochastic or online version of the EM algorithm can be used.
    • Novelty: The hierarchical Dirichlet process (HDP) model [2006: Teh et al.] is commonly used to optimize the number of topics. It determines the number of topics unstably, and it is difficult both to understand and to implement. Additive Regularization of Topic Models (ARTM) is a new approach to topic modeling that combines versatility, flexibility and simplicity. The problem of optimizing the number of topics has not yet been considered in the framework of ARTM.
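The factorization p(w|d) = \sum_{t\in T} \phi_{wt}\theta_{td} above can be fitted with an EM algorithm. The following is a minimal sketch of plain, unregularized EM for this model, not the regularized algorithm of [2014: Vorontsov]; the function name `plsa_em`, the count matrix `n_wd` and all parameters are illustrative assumptions. A regularizer would change the M-step and is deliberately left out here.

```python
import numpy as np

def plsa_em(n_wd, n_topics, n_iter=50, seed=0):
    """Plain (unregularized) EM for p(w|d) = sum_t phi[w,t] * theta[t,d].

    n_wd is a W x D matrix of word counts (hypothetical input format).
    """
    rng = np.random.default_rng(seed)
    W, D = n_wd.shape
    phi = rng.random((W, n_topics))
    phi /= phi.sum(axis=0, keepdims=True)      # each column p(.|t) sums to 1
    theta = rng.random((n_topics, D))
    theta /= theta.sum(axis=0, keepdims=True)  # each column p(.|d) sums to 1
    log_lik = []
    for _ in range(n_iter):
        p_wd = phi @ theta + 1e-12             # current model p(w|d)
        log_lik.append(float((n_wd * np.log(p_wd)).sum()))
        ratio = n_wd / p_wd
        # E-step and counting fused: n_wt = sum_d n_wd * p(t|d,w)
        n_wt = phi * (ratio @ theta.T)
        n_td = theta * (phi.T @ ratio)
        # M-step: normalize the expected counts
        phi = n_wt / n_wt.sum(axis=0, keepdims=True)
        theta = n_td / n_td.sum(axis=0, keepdims=True)
    return phi, theta, log_lik
```

Running it for several values of `n_topics` on model data with a known number of topics gives a baseline against which a sparsing regularizer on \Theta could be compared.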

    2. 2014

    • Differential diagnosis of diseases by electrocardiogram
    • Problem: It is proposed to solve a typical classification problem. The features are 216 characteristics calculated from the electrocardiogram. It is necessary to evaluate the quality of the classification on a held-out control sample. To do this, the rates of type I and type II errors are calculated. A type I error is the assignment of a healthy person to the class of patients; a type II error is the assignment of a patient to the class of healthy people. Preference is given to minimizing type II errors.
    • Data: For each of the 5 diseases, there are 2 types of samples. Reference - more reliable, specially selected cases. The rest are cases when the diagnoses were established by doctors less reliably; these samples are proposed to be used for control.
    • References:
      1. Vorontsov K. V. Metric classification algorithms. Lectures on machine learning. — 2014. http://www.MachineLearning.ru/wiki/images/c/c3/Voron-ML-Metric-slides.pdf
      2. Uspensky V. M. Information function of the heart // Clinical Medicine, 2008. - V. 86, No. 5. - P. 4–13.
      3. Uspensky V. M. Information function of the heart. Theory and practice of diagnosing diseases of internal organs by the method of information analysis of electrocardiosignals. - M .: "Economy and information", 2008. - 116 p.
    • Basic algorithm: To solve the problem, it is proposed to use a metric algorithm with greedy feature selection.
    • Novelty: The data were prepared using a unique technology for information analysis of electrocardiosignals, developed by prof. MD V.M.Uspensky. A classification algorithm is proposed and its generalizing ability is investigated.
    • consultant: Vlada Tselykh
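The two error rates defined in the problem statement can be computed as in the following sketch. The label encoding (1 for a patient, 0 for a healthy person) and the function name `error_rates` are assumptions made for illustration.

```python
def error_rates(y_true, y_pred):
    """Return (type I rate, type II rate); labels: 1 = patient, 0 = healthy."""
    pairs = list(zip(y_true, y_pred))
    false_pos = sum(1 for t, p in pairs if t == 0 and p == 1)  # healthy called sick
    false_neg = sum(1 for t, p in pairs if t == 1 and p == 0)  # sick called healthy
    n_healthy = sum(1 for t in y_true if t == 0)
    n_sick = sum(1 for t in y_true if t == 1)
    type1 = false_pos / n_healthy if n_healthy else 0.0
    type2 = false_neg / n_sick if n_sick else 0.0
    return type1, type2
```

Since type II errors are the ones to be minimized, a decision threshold would typically be chosen on the reference sample to bound the second value first.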

    3. 2014

    • Influence of sparse, smoothing and decorrelation regularizers on the stability of a probabilistic topic model
    • Problem: The probabilistic topic model describes the probabilities of occurrence of words w\in W in documents d\in D through latent topics t\in T: p(w|d) = \sum_{t\in T} p(w|t)p(t|d) = \sum_{t\in T} \phi_{wt}\theta_{td}. The representation of the matrix \|p(w|d)\|_{W\times D} as a product of two smaller matrices \Phi=\|\phi_{wt}\|_{W\times T} and \Theta=\|\theta_{td}\|_{T\times D} is not unique: \Phi \Theta = (\Phi S)(S^{-1}\Theta) = \Phi'\Theta' for some nonsingular S. It is required to test the hypothesis that, by imposing restrictions on the matrices \Phi, \Theta using regularizers, it is possible to increase the stability of their recovery.

    • Data: The collection of documents is specified by word frequencies. Since solving the problem requires knowing the "true" matrices \Phi, \Theta, experiments are performed on realistic model or semi-model data that satisfy the hypotheses of sparseness, weak correlation of topics and the presence of background topics.
    • References:
      1. Vorontsov K. V. Additive regularization of thematic models of collections of text documents // Reports of the Russian Academy of Sciences. 2014. - V. 455, No. 3 (in press).
      2. Vorontsov K. V. Probabilistic thematic modeling. - 2014. http://www.MachineLearning.ru/wiki/images/2/22/Voron-2013-ptm.pdf.
    • Basic algorithm: Regularized EM-algorithm [2014: Vorontsov] is used to solve the optimization problem. A rational, stochastic or online version of the EM algorithm can be used.
    • Novelty: Additive Regularization of Topic Models (ARTM) was proposed in [2014: Vorontsov] as a universal way to improve the stability and interpretability of topic models. However, the question of which particular combination of regularizers increases stability remains open. This study is aimed at solving this problem.
    • consultant: Marina Dudarenko
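The non-uniqueness \Phi \Theta = (\Phi S)(S^{-1}\Theta) can be checked numerically. The sketch below uses the simplest admissible S, a permutation of topics, because it keeps both factors stochastic; the matrix sizes and the random data are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
W, T, D = 6, 3, 4

# Random stochastic factors: columns of phi and theta sum to 1.
phi = rng.random((W, T))
phi /= phi.sum(axis=0)
theta = rng.random((T, D))
theta /= theta.sum(axis=0)

# S is a permutation matrix, i.e. a relabeling of the topics.
S = np.eye(T)[:, [2, 0, 1]]
phi2 = phi @ S
theta2 = np.linalg.inv(S) @ theta

same_product = np.allclose(phi @ theta, phi2 @ theta2)
still_stochastic = (np.allclose(phi2.sum(axis=0), 1.0)
                    and np.allclose(theta2.sum(axis=0), 1.0))
different_factors = not np.allclose(phi, phi2)
```

A stability measure for recovered matrices therefore has to be invariant at least to topic permutations, for example by matching recovered topics to the "true" ones before comparing.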

    4. 2014

    • Building University Rankings: Panel Analysis and Sustainability Assessment
    • consultant: Kuznetsov Mikhail
    • Problem: A university's ranking changes from year to year. This change may be due to the poor quality of the ranking calculation methodology, random changes in the institution's performance, or purposeful changes in the state of the institution. It is required to propose a rating method that is robust to random changes and allows the change in the state of the university to be interpreted.
    • Data: Eight years of data for the world's top 100 universities.
    • References:
      1. Strijov V.V. Refinement of expert assessments using measured data. Zavodskaya lab. Diagnostics of materials, 2006, 72(7) - 59-64.
      2. Strijov V.V. Refinement of Expert assessments in rank scales using measured data. Zavodskaya lab. Diagnostics of materials, 2011, 77(7) - 72-78.
      3. Kuznetsov M.P., Strijov V.V. Methods of expert estimations concordance for integral quality estimation // Expert Systems with Applications, 2014.
      4. Draft POF article on request.
    • Basic algorithm: A method for constructing the RUR rating and one of the redundantly stable algorithms for ranking scales.
    • Novelty: The concept of interpretability of a change in rating position is introduced. The problem of choosing an optimal locally monotone correction of indicators is solved. A technique for constructing a rating is proposed that allows interpreting the change in the state of a university for the purpose of monitoring. Option: the inverse control problem is solved: how to change the indicators of the university in order to achieve a given goal.

    5. 2014

    • Detecting Patterns in a Set of Time Series by Structural Learning Methods
    • consultant: A. P. Motrenko
    • Problem: To improve the quality of the time series forecast, it is desirable to use expert statements about the presence of a causal relationship between events. To do this, it is necessary to be able to assess the reliability of expert statements. It is impossible to prove the existence of a causal relationship by statistical methods; the researcher can only check for the presence of a certain structure of relationships. The goal is, based on expert statements about the presence of a connection between events, to examine the time series for the presence of various structural relationships and find the structure that is most consistent with the expert's opinion.
    • References:
      1. R. B. Kline, Principles and Practice of Structural Equation Modeling. New York: Guilford. 2005.
      2. J. Pearl, Graphs, Causality and Structural Equation Models. Sociological Methods and Research, 27-2(1998), 226-284.
      3. J. Pearl, E. Bareinboim, Transportability of Causal and Statistical Relations: A Formal Approach // Proceedings of the 25th AAAI Conference on Artificial Intelligence, August 7-11, 2011, San Francisco. 247-254
      4. Valkov A.S., Kozhanov E.M., Motrenko A.P., Khusainov F.I. Construction of cross-correlation dependences in the forecast of load of the railway junction // Machine learning and data analysis. 2013. T. 1, No. 5. C. 505-518.
      5. Valkov A.S., Kozhanov E.M., Medvednikova M.M., Khusainov F.I. Nonparametric forecasting of railway junction system load based on historical data // Machine Learning and Data Analysis. 2012. T. 1, No. 4. C. 448-465.
    • Basic algorithm: structural equation modeling, SEM
    • Novelty: A method is proposed for assessing the reliability of expert statements about the impact of exchange prices of major instruments on the volume of rail freight traffic. Various structures of links between time series are proposed. The concept of structure complexity is introduced. The relationship between the complexity of the structure and the assessment of the reliability of the statement is investigated.

    18. 2014

    • Using non-linear forecasting when looking for dependencies between time series
    • consultant: A. P. Motrenko
    • Problem: (As part of a study devoted to the discovery of patterns in time series sets) It is proposed to abandon the standard assumptions about the stationarity of the time series when searching for dependencies between time series and to study time series from the point of view of dynamical systems theory, within which irregular time dependences determined by the structure of the phase space are considered. It is required to study a set of approaches to the analysis of dynamic data and the identification of relationships between them; describe the limits of applicability of the basic algorithm and propose new options for the revealed structural relationships.
    • Data: Synthetic data, historical stock prices for major instruments and rail freight data.
    • References:
      1. Tools for the Analysis of Chaotic Data. HENRY D. I. ABARBANEL
      2. Nonlinear forecasting as a way of distinguishing chaos from measurement error in time series, G. Sugihara, R.M. May.
      3. George Sugihara et al. Detecting Causality in Complex Ecosystems. Science 338, 496 (2012);
      4. Valkov A.S., Kozhanov E.M., Motrenko A.P., Khusainov F.I. Construction of cross-correlation dependences in the forecast of load of the railway junction // Machine learning and data analysis. 2013. T. 1, No. 5. C. 505-518.
      5. Valkov A.S., Kozhanov E.M., Medvednikova M.M., Khusainov F.I. Nonparametric forecasting of railway junction system load based on historical data // Machine Learning and Data Analysis. 2012. T. 1, No. 4. C. 448-465.
    • Basic algorithm: convergent cross mapping
    • Novelty: Proposed different structures of relationships between time series and a method for checking the existence of relationships

    6. 2014

    • Sequential Generation of Essentially Nonlinear Models in Document Ranking Problems
    • consultant: Kuznetsov Mikhail
    • Problem: Propose an algorithm for generating essentially nonlinear models and evaluate it on test and real data. The algorithm should 1) generate a complete set of models and 2) choose the optimal step (adding a superposition element) for a fixed model structure.
    • Data: Synthetic data, data for LIG text collections.
    • References:
      1. Goswami P., Moura S., Gaussier E., Amini M.R. Exploring the Space of IR Functions //
      2. Rudoy G.I., Strijov V.V. Algorithms for the inductive generation of superpositions for the approximation of measured data // Informatics and its applications, 2013, 7(1) - 17-26.
      3. Rudoy G.I., Strijov V.V. Simplification of superpositions of elementary functions with the help of graph transformations according to the rules // Intellectualization of information processing. Reports of the 9th international conference, 2012 - 140-143.
      4. Vladislavleva E., Smith G., Hertog D., Order of Nonlinearity as a Complexity Measure for Models Generated by Symbolic Regression via Pareto Genetic Programming // IEEE Transactions on Evolutionary Computation, 2009. Vol. 13(2). pp. 333-349.
      5. Vladislavleva E. Model-based Problem Solving through Symbolic Regression via Pareto Genetic Programming: PhD thesis, Tilburg University, Tilburg, the Netherlands, 2008.
    • Basic algorithm: An exhaustive enumeration algorithm for admissible superpositions of generating functions.
    • Novelty: An algorithm for sequential addition of superposition elements is proposed. A function of the distance between superpositions is proposed and its properties are investigated. The notion of superposition complexity and the notion of adjacent superpositions that differ in complexity by one are introduced. An algorithm for generating adjacent superpositions is proposed.

    7. 2014

    • Detecting Isomorphic Structures of Essentially Nonlinear Predictive Models
    • consultant: Sologub Roman, Kuznetsov Mikhail
    • Problem: Develop an algorithm for finding isomorphic subgraphs for trees (a variant - for directed acyclic graphs). Compare the complexity of the algorithm for checking the isomorphism of two superpositions for the proposed algorithm and for the algorithm for element-by-element comparison of mappings.
    • Data: Data on exchange options: dependence of option volatility on the price and time of its execution.
    • References:
      1. Rudoy G.I., Strijov V.V. Algorithms for the inductive generation of superpositions for the approximation of measured data // Informatics and its applications, 2013, 7(1) - 17-26.
      2. Rudoy G.I., Strijov V.V. Simplification of superpositions of elementary functions with the help of graph transformations according to the rules // Intellectualization of information processing. Reports of the 9th international conference, 2012 - 140-143.
      3. Ehrig H., Ehrig K., Prange U., Taentzer G. Fundamentals of Algebraic Graph Transformation. Springer, 2006.
      4. Ehrig H., Engels G. Handbook of Graph Grammars and Computing by Graph Transformation. World Scientific Publishing, 1997.
      5. Strijov V.V., Sologub R.A. Inductive generation of regression models of implied volatility for option trading // Computational technologies, 2009, 14(5) — 102-113.
    • Basic algorithm: Algorithm for element-by-element comparison of mappings.
    • Novelty: A fast algorithm for simplifying superpositions and searching for isomorphic models is proposed. The incidence matrix of the set of generating functions is used.
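For comparison with the element-by-element baseline, isomorphism of two superposition trees can also be tested through a canonical encoding. The sketch below is a standard canonical-form approach for rooted trees, not the algorithm proposed in this project. It assumes that a superposition is stored as a nested tuple `(label, [children])`, that labels contain no parentheses or commas, and that the order of arguments never matters, which is false for non-commutative functions such as division.

```python
def canonical(tree):
    """Canonical string of a rooted tree given as (label, [children]).

    Children are sorted by their own canonical strings, so two trees that
    differ only in the order of children get the same encoding.
    """
    label, children = tree
    return label + "(" + ",".join(sorted(canonical(c) for c in children)) + ")"

def isomorphic(a, b):
    """True if the two labeled rooted trees coincide up to child order."""
    return canonical(a) == canonical(b)
```

For example, `("+", [("x", []), ("sin", [("y", [])])])` and the same tree with the two arguments swapped receive identical encodings, so they are reported as isomorphic.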

    8. 2014

    • Building predictive models as superpositions of expert-specified functions
    • consultant: Ivkin Nikita
    • Problem: It is required to assign each time series from a set to one of several classes. It is proposed to do this using an automated feature generation procedure. To do this, an expert creates a set of generating functions that 1) transform the time series (for example, smooth it or decompose it into principal components) and 2) extract aggregated descriptions from the time series (for example, mean, variance, number of extrema). A significant number of features can be generated by constructing superpositions of generating functions. The resulting features are used to classify the set of time series (for example, by the nearest neighbor method).
    • Data: data from the mobile phone's accelerometer.
    • References:
      1. Problem statement \MLAlgorithms\Group074\Kuznetsov2013SSAForecasting\doc
      2. Haykin S. Neural networks. Williams, 2006.
    • Basic algorithm: neural network (option: deep learning neural network).
    • Novelty: A method for extracting features using automatically constructed superpositions of expert-specified functions is proposed. Structural and topological complexity are compared in the classification problem.
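The feature generation procedure described in the problem statement can be sketched as follows: every feature is a superposition "aggregate of a transform" of the series. The particular generating functions (`smooth`, `diff`, `n_extrema`) and the depth-two superpositions are assumptions chosen for illustration; the project's actual function set is specified by the expert.

```python
import numpy as np
from itertools import product

def smooth(x, window=3):
    """Moving-average smoothing (a transforming function)."""
    return np.convolve(x, np.ones(window) / window, mode="valid")

def diff(x):
    """First differences (a transforming function)."""
    return np.diff(x)

def n_extrema(x):
    """Number of local extrema, counted as sign changes of the increments."""
    d = np.sign(np.diff(x))
    d = d[d != 0]
    return float(np.sum(d[1:] != d[:-1]))

transforms = {"id": lambda x: x, "smooth": smooth, "diff": diff}
aggregates = {"mean": np.mean, "var": np.var, "n_extrema": n_extrema}

def generate_features(x):
    """All superpositions aggregate(transform(x)) as a name -> value dict."""
    x = np.asarray(x, dtype=float)
    return {f"{a}({t})": float(agg(tr(x)))
            for (t, tr), (a, agg) in product(transforms.items(), aggregates.items())}
```

The resulting dictionary can be turned into a feature vector per series and passed to any classifier, for example a nearest neighbor method as mentioned above.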

    9. 2014

    • Manifold learning for predicting sets of quasi-periodic time series
    • consultant: Ivkin Nikita
    • Problem: The problem of classifying human activity based on data from the mobile phone's accelerometer is solved. Data from the accelerometer are represented by quasi-periodic time series. It is required to attribute the time series to one of the types of activity: running, walking, etc. To solve the problem of classifying series, a method based on nearest neighbors in the space of manifolds is proposed.
    • Data: data from the mobile phone's accelerometer.
    • References:
      1. Mi Zhang; Sawchuk, A.A., "Manifold Learning and Recognition of Human Activity Using Body-Area Sensors," Machine Learning and Applications and Workshops (ICMLA), 2011 10th International Conference on , vol.2, no., pp.7,13, 18- 21 Dec. 2011
    • Basic algorithm: neural network
    • Novelty: proposed a method for classifying quasi-periodic time series based on manifolds

    10. 2014

    • Interpretation of multimodels in the processing of sociological data
    • consultant: Alexander Aduenko
    • Problem: The problem of credit scoring is to determine the level of creditworthiness of the borrower who applied for a loan. To do this, a borrower's questionnaire is used, containing both numerical data (age, income, time of residence in the country) and categorical features (gender, profession). It is required, having historical information on loan repayments by other borrowers, to determine whether the client in question will return the loan. Thus, it is required to solve the problem of classification. Since the data can be heterogeneous (for example, if there are different income regions in the country), the data can be described not by one, but by several models. In this paper, we propose to compare two methods for constructing multimodels: mixtures of logistic models and gradient boosting.
    • Data: data on consumer loans (\mlalgorithms\BSThesis\Aduenko2013\data).
    • References:
      1. mixtures of models (\mlalgorithms\BSThesis\Aduenko2013\doc, Bishop)
      2. boosting (lecture "Compositional methods of classification and regression" by Vorontsov)
    • Basic algorithm: boosting.
    • Novelty: Identification and explanation of similarities and differences between the solutions obtained by the two specified algorithms.

    11. 2014

    • Selection of Optimal Structures of Predictive Models by Structural Learning Methods
    • consultant: Varfolomeeva Anna
    • Problem: It is proposed to solve the forecasting problem in two stages. First, the structure of the predictive model is restored using the history of successful forecasts. Then the model parameters are optimized, and a time series forecast is built using the model.
    • Data: synthetic sample, biomedical time series, accelerometer measurements.
    • References:
      1. Jaakkola T. Scaled structured prediction.
      2. URL: http://video.yandex.ru/users/ya-events/view/486/user-tag/scientific%20seminar/
      3. Find all the work of TJ students on the given topic.
      4. Varfolomeeva A.A. Bachelor's thesis in MLAlgorithms/BSThesis/Varfolomeeva
    • Basic algorithm: the metaprediction algorithm described in the thesis.
    • Novelty: A method for restoring model structures using a priori assumptions about these structures is proposed.

    12. 2014

    • Invariants in Predicting Quasi-Periodic Series
    • consultant: Arsenty Kuzmin
    • Problem: The problem of day-ahead hourly forecasting of electricity prices/consumption is solved. When constructing the design matrix, it is proposed to use not the original segment of the time series, but its invariant representation.
    • Data: hourly data on electricity prices and volumes (insert link).
    • References:
      1. Sanduleanu L.N., Strijov V.V. Feature Selection in Autoregressive Forecasting Problems // Information Technologies, 2012, 7 - 11-15.
      2. (taken from Fadeev's last article)
    • Basic algorithm: autoregressive prediction described in Sanduleanu's work.
    • Novelty: An algorithm for joint estimation of the parameters of the invariants and autoregressive model is proposed, which makes it possible to significantly improve the accuracy of forecasting.

    13. 2014

    • Forecasting the volume of rail freight traffic by pairs of branches
    • consultant: Stenina Maria (Medvednikova)
    • Problem: Predict traffic volumes from branch to branch and compare with the basic algorithm for predicting the departure of wagons from a branch. Test the hypothesis that the branch-to-branch traffic forecast is more accurate than the forecast using the basic algorithm. Examine the series for trend/periodicity; if there is a trend/periodicity, include it in the model. Prepare the prediction algorithm for use.
    • Data: daily data for a year and a half on the transportation of 38 types of cargo in the Omsk region.
    • References:
      1. Valkov A.S., Kozhanov E.M., Medvednikova M.M., Khusainov F.I. Nonparametric forecasting of railway junction system load based on historical data // Machine Learning and Data Analysis. - 2012. - No. 4.
    • Basic algorithm: histogram prediction described in the article.
    • Novelty: it is proposed to improve the quality of the forecast by dividing the data into smaller parts and forecast traffic for specific branches instead of forecasting the departure of wagons.

    14. 2014

    • Choosing the optimal model for predicting human physical activity based on accelerometer measurements
    • consultant: Tokmakova Alexandra
    • Problem: Suggest an algorithm for sequential modification of the neural network. The goal is to find the most simple, stable and accurate network configuration that allows solving the problem of two-class (variant: multi-class) physical activity prediction.
    • Data: Set of time series of accelerometer measurements.
    • References:
      1. Pruning of neural networks on Machinelearning.ru.
      2. Haykin S. Neural networks. Williams, 2006.
    • Basic algorithm: Optimal Brain Damage/Optimal Brain Surgeon.
    • Novelty: A method for sequential generation of neural networks of optimal complexity is proposed. The stability of generated models is studied.

    15. 2014

    • Time Series Metaprediction
    • consultant: A.S. Inyakin, Ivkin Nikita
    • Problem: A set of time series forecasting algorithms is specified. For a given time series, it is required to indicate the algorithm that delivers the most accurate forecast, without executing the algorithms themselves. To solve this problem, it is proposed to build a set of features that describe the time series. An expert creates a set of generating functions that 1) transform the time series (for example, smooth it or decompose it into principal components) and 2) extract aggregated descriptions from the time series (for example, mean, variance, number of extrema). A significant number of features can be generated by constructing superpositions of generating functions.
    • Data: Library of quasi-periodic and aperiodic time series
    • References:
      1. Kuznetsov M.P., Mafusalov A.A., Zhivotovsky N.K., Zaitsev E., Sungurov D.S. Smoothing forecasting algorithms // Machine learning and data analysis. 2011. T. 1, No. 1. C. 104-112.
      2. Fadeev I.V., Ivkin N.P., Savinov N.A., Kornienko A.I., Kononenko D.S., Dzhamtyrova R.B. Autoregressive forecasting algorithms // Machine learning and data analysis. 2011. T. 1, No. 1. C. 92-103.
    • Basic algorithm: Use the SAS/SPSS algorithm.
    • Novelty: A method for fast selection of the optimal predictive algorithm based on the description of the time series is proposed.

    16. 2014

    • Identification of a person by the image of the iris
    • consultant: Matveev I. A.
    • Problem: In the problem of identifying a person by the image of the iris, the most important role is played by the selection of the iris region in the original image (iris segmentation). However, the iris image is usually partially obscured (shaded) by eyelids, eyelashes and highlights, so part of the iris cannot be used for recognition. Moreover, the use of data from shaded areas can generate false features and reduce accuracy. Therefore, one of the important steps in the segmentation of the iris image is the rejection of shaded areas.
    • Data: bitmap monochrome image, typical size 640*480 pixels (however, other sizes are possible) and coordinates of centers and radii of two circles approximating pupil and iris.
    • References:
      1. Problem description and proposed solutions
      2. Monro D. University of Bath Iris Image Database // http://www.bath.ac.uk/elec-eng/research/sipg/irisweb/
      3. Chinese academy of sciences institute of automation (CASIA) CASIA Iris image database // http://www.cb-sr.ia.ac.cn/IrisDatabase.htm, 2005.
      4. MMU Iris Image Database: Multimedia University // http://pesonna.mmu.edu.my/ccteo/
      5. Phillips P.J., Scruggs W.T., O'Toole A.J. et al. Frvt2006 and ice2006 large-scale experimental results // IEEE PAMI. 2010. V. 32. No. 5. P. 831–846.
      6. Xu G., Zhang Z., Ma Y. Improving the performance of iris recognition system using eyelids and eyelashes detection and iris image enhancement // Proc. 5th Int. Conf. Cognitive Informatics. 2006. P. 871-876.
    • Basic algorithm: method using sliding window and texture features [2006: Xu, Zhang, Ma].
    • Novelty: the mask of the open area of the iris has been built.

    17. 2014

    Trial Programming

    Problem | Assignee | Number
    A sample "Wine of different regions" is given. It is required to determine the clusters (regions of origin of wines) and draw the result: the cluster of an object is marked with a colored dot; a colored circle indicates the class of this object taken from the sample. Option: determine the number of clusters. Option: use two algorithms, for example k-means and EM, and compare the clustering results on a graph. | Plavin | 1
    Suggest ways to visualize sets of 4D vectors, see the example for Fisher's iris data. | (write your last name here) | 2
    A time series describing electricity consumption is given. Approximate the series by several curvilinear models and plot the predicted and original series on the same graph. | Kulunchakov Andrey | 3
    Smooth the time series of prices (volumes) for the main exchange instruments using exponential smoothing. Draw color plots of the smoothed series for different \alpha together with the original series. | Avdyukhov | 4
    Closed Curve Sample Fit [120]: check whether the points lie on a circle. Generate the data yourself. | Gazizullina Rimma | 5
    A time series with gaps is given, using the example [121]. Suggest ways to fill in the gaps in the data and fill them in. For each method, construct a histogram. Option: take a sample without gaps, randomly remove part of the data, fill in the gaps, and compare with the histogram of the original sample. | Ignatov Andrey | 6
    A sample "Wine of different regions" is given. Choose two features. Consider different distance functions when classifying with the nearest neighbor method. For each, depict the classification result in the space of the selected features. | Maria Popova | 7
    For various types of dependence y = f(x) + \epsilon (linear, quadratic, logarithmic), build a linear regression and plot the SSE deviations (standard deviations-?). Generate the data yourself or take the data "Price for bread". | Efimova Irina | 8
    Estimate the area of a unit circle using the Monte Carlo method. Plot the result against the sample size. | Shinkevich Mikhail | 9
    Construct the convex hull of points on a plane. Draw a graph: the points and their convex hull as a closed polyline. | Makarova Anastasia | 10
    A sample is given: Iris. Implement the decision tree classification procedure. Illustrate the results of classification on a plane in the space of two features. | Zhukov Andrey | 11
    A time series is given: volumes of hourly electricity consumption (select any two days). Approximate the series with polynomial models of various degrees (1-7). *Suggest a method for determining the optimal degree of the polynomial. | Karasikov Mikhail | 12
    Two one-dimensional time series of different lengths are given (see Time series (library examples)). Calculate the distance between the series using dynamic alignment. | Grinchuk Alexey | 13
    Generate a set of points on the plane. Select and visualize the principal components. | Lipatova | 14
    Approximate the bread prices sample with a polynomial model. Draw a graph. Mark objects that are outliers using the three sigma rule. | Shvets Mikhail | 15
    Divide the Iris sample into clusters. Illustrate the results of clustering on a graph, highlighting the clusters in different colors. | Gushchin Alexander | 16
    More problems to choose from
    A sample of several features is given, without a target vector Y, for example https://dmba.svn.sourceforge.net/svnroot/dmba/Data/Diabets_LARS.csv. Specify the feature that is well described (in terms of linear regression) by the rest (such a feature is usually excluded from the sample). |  | 17
    Smooth a time series (see library) with a moving average. Take several windows of different lengths and superimpose the results on one graph. | Kostyuk | 18
    A time series is given (see library). Based on its variational series, construct a histogram of n percentiles and draw it. What is the most common time series value? | Gizzatullin Anvar | 19
    Show the difference in speed between matrix operations and operations in a loop. You can use singular value decomposition and other linear algebra methods as an example. Show the efficiency of parallel computing (parfor). |  | 20
    Understand how function superposition works. Using the @ function, generate all possible polynomials in n variables of degree at most p. Option: use the obtained polynomials to approximate the time series of bread prices (data). |  |
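As an illustration of the level of these trial problems, problem 9 (the area of a unit circle by the Monte Carlo method) can be sketched as follows. The function name `circle_area_mc`, the seed and the list of sample sizes are assumptions made for the example; plotting is omitted.

```python
import numpy as np

def circle_area_mc(n, seed=0):
    """Monte Carlo estimate of the unit circle area from n random points."""
    rng = np.random.default_rng(seed)
    # Points are uniform in the square [-1, 1] x [-1, 1], whose area is 4.
    pts = rng.uniform(-1.0, 1.0, size=(n, 2))
    inside = (pts ** 2).sum(axis=1) <= 1.0
    # The fraction of points inside the circle times the square's area.
    return 4.0 * inside.mean()

# Estimates for growing sample sizes; these pairs can be plotted
# against each other to show convergence towards pi.
sizes = [10 ** k for k in range(2, 6)]
estimates = [circle_area_mc(n) for n in sizes]
```

The error of the estimate decreases roughly as the inverse square root of the sample size, which is what the requested plot against sample size should show.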

    2013

    Title | Author | Link | MAIPVTDCHSJ
    Definition of the printed image | Pushnyakov Alexey | [122] | MAIPVTDCHSJ
    Comparison of Fast Clustering Algorithms | Alexander Katrutsa | [123] | MAIPVTDCHS
    Vector autoregression and management of macroeconomic indicators | Kashcheeva Maria | [124] | MAIPVTDCHS
    Marking up bibliographic records using logical algorithms | Ryskina Maria | [125] | MAIPVTDCHS
    Determination of the exact border of the pupil | Chinaev Nikolai | [126] | MAIPV.DCHS
    Vector autoregression and management of macroeconomic indicators | Grinchuk Oleg | [127] | MAIPVTD.HS
    Generating Neural Networks with Expert-Defined Activation Functions | Perekrestenko Dmitry | [128] | MAIPVTDCHS
    Comparative analysis of feature selection algorithms: accuracy, stability, complexity of regression models | Yashkov Daniel | [129] | MAI.VTD.HS
    Invariant transformations in local forecasting problems | Kostin Alexander | [130] | MAI.VT.HS
    Genetic Programming Algorithm for Solving the Prediction Problem | Voronov Sergey | [131] | MAIPVTDC.S
    Grouping of Nominal Variables in Bank Credit Scoring Problems | Mityashov Andrey | [132] | MAIPVTDCHS
    Modeling the process of learning and forgetting when assessing the quality of production | Neklyudov Kirill | [133] | MAI..DC.S
    Overview of Algorithms for Simplifying Algebraic Expressions | Shubin Andrey | [134] | MAIPVTD.S
    Search algorithms for the most informative objects and features in logistic regression | Ibraimova Aizhan | [135] | MAIP.TD..
    Interpretation of expert assessments of species of the Red Book of the Russian Federation by selecting reference (representative) objects | Byrdin Alexander | [136] | MAI.TD.S
    Visualization of Pair Distance Matrix in Topic Modeling | Vdovina Evgenia | [137] | MAI.TDC.S
    Algorithm for Estimating the Reliability of Expert Judgments on the Relationship of Time Series | Antipova Natasha | [138] | MAIP.T..S

    2. 2013 MassProduction

    • Name: Generation and optimization of logical descriptions when building production lines.
    • Problem: It is required to state the problem of synthesizing admissible superpositions, develop an algorithm and test it on synthetic data.
    • Data: To be created.
    • References: A search is needed (most likely German publications).
    • Proposed algorithm: Under discussion.
    • Basic algorithm: None.

    3. 2013 LearnForget

    • Name Modeling the process of learning and forgetting when assessing the quality of production.
    • Problem Find an adequate regression model that describes the activities of a group of people.
    • Data Data on the speed and quality of the assembly of paper airplanes.
    • References: Need to find.
    • Proposed algorithm The procedure for analyzing regression residuals.
    • Basic algorithm Regression model in the attached article.

    4. 2013 GeneticProg

    • Name Genetic Programming Algorithm for Solving the Prediction Problem.
    • Problem Create a genetic programming algorithm that solves the problems named by Ivan Zelinka. Suggest a way to test the resulting models and set up cross-validation. Compare its performance on a test set of problems with the performance of other GP algorithms and with neural networks.
    • Data A test set of problems, taken from the UCI repository or from the Polygon.
    • References: Zelinka, Oplatkova, Vladislavleva; find works of recent years on this topic, especially on testing these algorithms.
    • Proposed algorithm GP.
    • Basic algorithm GP, neural networks.

    5. 2013 Simplify

    • Name Overview of Algorithms for Simplifying Algebraic Expressions.
    • Problem It is required to find literature on algorithms that simplify expressions, compare algorithms, program the algorithm proposed in the work by Ruda/Strijov V.V.
    • Data Collect a test collection of expressions.
    • References: Graph rewriting.
    • Proposed algorithm R/S, comparison of algorithms.

    6. 2013 RedListExplanation

    • Name Interpretation of expert assessments of species of the Red Book of the Russian Federation by selecting reference (representative) objects.
    • Problem Selection of reference objects (STOLP algorithm). This algorithm can be of interest to experts: it quickly finds noise objects, which in our terms are inconsistent with the expert data and fall "out of their class", and it also selects reference objects, which have an interesting interpretation of their own. From a mathematical point of view it is interesting, first, to examine different metrics (generalizations of the Hamming distance) and, most importantly, to generalize the margin formula to the case of monotone classes, apparently by introducing a weight function over objects.
    • Data Expert assessments of Red Data Book species.
    • References: On metric classification algorithms.
    • Proposed algorithm A method or algorithm that tells the expert why (sic!) an object is not in the expert's intended class.

    7. 2013 RedListClassification

    • Name Algorithm for monotonic classification of objects described in rank scales.
    • Problem Apply a decision tree to the expert assessments of threatened species in the Red Data Book. Compare with previously proposed algorithms. Justify operations on rank-scale features and introduce a generalization of the concept of informativeness for the case of monotone classes, apparently by generalizing the hypergeometric distribution.
    • Data Expert assessments of Red Data Book species.
    • References: Avoid citing trivial sources. Search for similar works in international journals.

    11. 2013 Invaraint4LocalForecast

    • Name Invariant transformations in problems of local forecasting.
    • Problem Combine algorithms for invariant transformation of time and amplitude of predicted time series.
    • Data Time series of pulse wave measurement.
    • References: Find, avoid trivial references.

    8. 2013 PlausibleExpert

    • Name Algorithm for Estimating the Reliability of Expert Judgments on the Relationship of Time Series.
    • Problem Study of the relationship between exchange prices for the main instruments and rail freight.
    • Data Time series for 1.5 years. But it is better to choose a synthetic example.
    • References: Publications on CCM.
    • Proposed algorithm CCM modifications.

    9. 2013 DeepLearning

    • Name Generating Neural Networks with Expert-Defined Activation Functions.
    • Problem Survey the current state of the deep learning field, implement the algorithm, and test it on the problem of forecasting electricity consumption volumes and prices.
    • Data Daily data for three years.
    • References: Deep Learning.
    • Proposed algorithm Building a neural network and estimating its parameters.

    16. 2013 ScoringSelection

    • Name Search algorithms for the most informative objects and features in logistic regression.
    • Problem Using a genetic algorithm to find informative objects and features.
    • Data Consumer credit data.
    • References: -

    10. 2013 ScoringFeatureSelection

    • Name Grouping of Nominal Variables in Bank Credit Scoring Problems.
    • Problem Create a genetic algorithm for reducing the dimension of a feature space.
    • Data Historical data on cash loans.
    • References: SAS, find more.

    15. 2013 InverseVAR

    • Name Vector autoregression and management of macroeconomic indicators.
    • Problem Solve the inverse forecasting problem. Given the current state of the economy, set the values of the controllable macroeconomic indicators so as to bring the economy to the desired state.
    • Data Macroeconomic indicators of Russia over the past 16 years.
    • References: S.A. Ayvazyan works.
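    One possible way to write the inverse forecasting problem down is sketched here. The first-order model, the matrices $A$ and $B$, the control vector $u_t$ and the target state $x^{*}$ are all illustrative assumptions and are not taken from the project description.

```latex
% Assumed VAR(1) model with controllable inputs u_t (illustrative only).
% x_t: vector of macroeconomic indicators, x^*: desired state,
% B^+: Moore-Penrose pseudoinverse of B.
\begin{align}
  x_{t+1} &= A x_t + B u_t + \varepsilon_{t+1}, \\
  \hat{u}_t &= \arg\min_{u} \left\| x^{*} - A x_t - B u \right\|_2^2
             = B^{+} \left( x^{*} - A x_t \right).
\end{align}
```

    Under these assumptions the forward problem estimates $A$ and $B$ from the historical series, and the inverse problem is a least-squares fit of the controls to the gap between the desired state and the uncontrolled forecast.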

    12. 2013 DistanceVisualizing

    • Name Visualization of Pair Distance Matrix in Topic Modeling.
    • Problem Display abstracts of the conference on the plane with the preservation of clusters.
    • Data EURO conference abstracts.
    • References: Zinoviev on ML, references on the topic.
    • Proposed algorithm PCA.
    • Basic algorithm Algorithm with minimization of the energy criterion.

    13. 2013 RhoNets

    • Name Comparison of Fast Clustering Algorithms.
    • Problem Compare clustering algorithm using $\rho$-networks and a fast $k$-means algorithm.
    • Data A selection of amino acid sequences. We need a test sample from the UCI or from comparison papers.
    • References: $k$-means, $\varepsilon$-networks.
    • Proposed algorithm $\rho$-networks.
    • Basic algorithm $k$-means.
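    The baseline named above can be sketched as plain Lloyd-style $k$-means. This is a minimal illustration on Euclidean points with explicitly supplied starting centers; the function names and the toy data are hypothetical, and clustering amino acid sequences would first need a vector representation or a different distance, which this sketch does not cover.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, initial_centers, max_iterations=50):
    """Lloyd's algorithm: alternate assignment and center update steps."""
    centers = [tuple(float(v) for v in c) for c in initial_centers]
    labels = []
    for _ in range(max_iterations):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda k: squared_distance(p, centers[k]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for k, old_center in enumerate(centers):
            members = [p for p, label in zip(points, labels) if label == k]
            if not members:
                new_centers.append(old_center)  # keep an empty cluster in place
                continue
            dim = len(old_center)
            new_centers.append(
                tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
            )
        if new_centers == centers:
            break  # converged: centers no longer move
        centers = new_centers
    return centers, labels


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centers, labels = kmeans(points, initial_centers=[(0, 0), (10, 10)])
```

    Passing the starting centers in explicitly keeps the run deterministic, which makes timing and quality comparisons against another algorithm easier to repeat.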

    17. 2013 FeatureSelection

    • Name Comparative analysis of feature selection algorithms: accuracy, stability, complexity of regression models.
    • Problem Build a series of test problems to compare algorithms. Propose a feature selection algorithm with the analysis of covariance matrices based on the Belsley method.
    • Data Synthetic.
    • References: Leontieva/Strijov V.V., search for modern reviews.

    1. 2013 Txt2Bib

    • Name Marking up bibliographic records using logical algorithms.
    • Problem It is required to create a text markup algorithm. Novelty in the formulation of the problem. The relevance is that a more complete library of logical expressions will be created and an adequate algorithm will be selected.
    • Data MLAlgorithms.
    • References: The work of A. Ivanova and everything that is on the topic over the past two years.
    • Proposed algorithm Choose from logical classification algorithms; optional clustering.
    • Basic algorithm Dead-end (irredundant) covers.

    14. 2013 FindTheFormula (Risky)

    • Name Algorithm for searching text structures in a document.
    • Problem Suggest an algorithm that would look for formulas in a TeX document that are equivalent to a given one.
    • Data Synthetic, MLAlgorithms collection.
    • References: A search is needed. Search by chemical compounds in WoK works well.

    18. 2013 ScannedImage (Image)

    • Name Form type definition.
    • Problem Determine the type of form from the scan.
    • Data A set of images in TIF.

    19. 2013 SpectrumImage (Image)

    • Name Definition of the printed image.
    • Problem Make a spectral transformation of the image, explore the spectrum.
    • Data A set of JPG images classified into two classes.


    Problem Assigned to
    A set of three-element vectors is given. Draw the first two elements along the abscissa and ordinate axes. The third element is displayed as a circle with a proportional radius. Choose proportions based on a sense of beauty. Compare the resulting graph with plot3. What's better? Mityashov Andrey
    Given a five-element vector. Neklyudov Kirill
    Understand how regexp works in Matlab. Make code that highlights everything that is inside the brackets of some arithmetic expression. Ryskina Maria
    Understand how function superposition works. Using the @ function, generate all possible polynomials in n variables of degree at most p. Shubin Andrey
    Understand how a web connection and regexp works. Make a search query on a topic and make up a BibTeX entry from it.
    Given a time series of m + 1 (random) points. Approximate its first m points by polynomials of degree from 1 to m. Calculate the mean error in points. Which degree gives the largest error? Voronov Sergey
    Rotate and zoom in on a flat figure, make a zoom effect with frame-by-frame rotation. Antipova Natasha
    Two matrices are given. Check if they have an intersection - a submatrix? Vdovina Evgenia
    A sample of several features is given, without a target vector Y, for example https://dmba.svn.sourceforge.net/svnroot/dmba/Data/Diabets_LARS.csv Identify the feature that is well described by the rest in terms of linear regression (such a feature is usually excluded from the sample). Grinchuk Oleg
    Given a sample that has several outliers. It is known that it can be described by one-dimensional linear regression. It is required to find the outliers by enumeration. Show them on a chart. Pushnyakov Alexey
    Given a sample of two classes on a plane. It is required to find all the objects that got into a foreign class. Show them on a chart. Kashcheeva Maria
    The input is the incidence matrix of the tree. The function returns a list (vector) of vertices in the order they were visited. Ibraimova Aizhan
    Classify iris flowers with an arbitrary algorithm, draw the “most visual” pair of features on the plane, indicate what was classified correctly and what was not. Yashkov Daniel
    Given a time series. Based on its variational series, build a histogram of n percentiles, draw it. What is the most common time series value?
    Create several groups of points on the plane and perform their clustering using any algorithm of your choice. Visualize the resulting clusters. Calculate the average intracluster distance for one cluster. Perekrestenko Dmitry
    Upload a sound sequence, preferably a few piano notes. Select and play a specific note.
    Download video. Delete every second frame. Process to taste. Write back. Byrdin Alexander
    Show the difference in the speed of performing matrix operations and operations in a loop. Show the efficiency of parallel computing (parfor and others). Alexander Katrutsa
    Suggest options for visualization of four-dimensional vectors and spaces. Compare them to a built-in function.
    Smooth the time series with a moving average. Take several windows of different lengths and superimpose the result on the graph on top of each other. Chinaev Nikolai
    Draw a surface. Replace each point of the surface with a median of n neighbors. Draw the result. Kostin Alexander
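    As an illustration of one exercise from the table above (smoothing a time series with moving averages of several window lengths), here is a minimal sketch. The exercises were set for Matlab; Python is used here only for illustration, the function name and sample data are hypothetical, and plotting is left out.

```python
def moving_average(series, window):
    """Simple moving average; returns len(series) - window + 1 values."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    running_total = sum(series[:window])
    smoothed = [running_total / window]
    for i in range(window, len(series)):
        # Slide the window: add the new point, drop the oldest one.
        running_total += series[i] - series[i - window]
        smoothed.append(running_total / window)
    return smoothed


series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
# One smoothed curve per window length, ready to be overlaid on a single plot.
smoothed_by_window = {w: moving_average(series, w) for w in (2, 3)}
```

    Each smoothed curve is shorter than the input by `window - 1` points, so the curves need to be aligned (for example, by their last point) before they are drawn on top of each other.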

    2012

    Thematic Modeling: paper in the Higher Attestation Commission journal

    Title Author Link Comments
    Calculation of integral indicators in rank scales by co-clustering methods Medvednikova Maria [139] Published
    Hierarchical thematic abstract clustering and visualization Arsenty Kuzmin [140] Published
    Joint selection of objects and features in problems of multiclass classification. Alexander Aduenko [141] Published
    Building hierarchical topic models Tsyganova Svetlana [142] Published
    Feature Selection in Structural Regression Problems Varfolomeeva Anna [143] Accepted
    Statistical tests for homogeneity and goodness of fit for highly sparse discrete distributions Vlada Tselykh [144] Published
    Building logical rules when marking up texts Ivanova Alina [145] Accepted
    Checking the adequacy of the topic model Stepan Lobastov [146] Redaction

    1. 2012

    • Name: CoRegression. Calculation of integral indicators in rank scales by co-clustering methods.
    • Teaser: Construction of an integral assessment of the effectiveness of scientific activity.
    • Data: Synthetic. PRND employees. Table authors-journals and number of articles of selected authors in journals.
    • References: Vorontsov K. V. «Collaborative filtering».
    • Keywords: h-index, co-clustering, collaborative filtering.
    • Proposed algorithm: Joint regression (invent or find ready-made).
    • Basic algorithm: Calculated IF of journals and h-index of authors. (Coclustering or adaptive filtering is not good for comparison).
    • Problem: Description in file. Additionally: when creating a rating, there is a problem of splitting the set of authors and journals into clusters. The size of the cluster needs to be correlated with the "Assessment of the involvement of the author/journal in the scientific community". This assessment should be included in the rating (as a last resort, it should be presented separately).

    2. 2012

    • Name: ExpertRanking. Coordination of rank Expert estimates.
    • Teaser: Voting ranking methods (selection of literary works, selection of a limited committee).
    • Data: Internet voting for a list of books, voting without co-optation.
    • References: Article in Notices AMS, 2008, 55(4). It will be necessary to review the literature on this issue.
    • Proposed algorithm: Finding the intersection of cones and estimating the effective space dimension or another algorithm.
    • Basic algorithm: Kemeny Median and other algorithms.
    • Problem: It is required to illustrate and study the properties of the committee selection algorithm. In particular, highlight the following problem: in a single vote with a choice of N candidates, the ranking of the n selected candidates differs from the ranking of the n+k selected candidates. It may be necessary to shed light on Arrow's paradox.
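    The Kemeny median named as the basic algorithm can be sketched by brute force: it is the ranking that minimizes the total number of pairwise disagreements (Kendall tau distance) with the submitted rankings. The function names and the three sample votes are hypothetical, and the exhaustive search over permutations is only practical for a handful of candidates.

```python
from itertools import combinations, permutations


def kendall_tau_distance(ranking_a, ranking_b):
    """Number of candidate pairs that the two rankings order differently."""
    pos_a = {c: i for i, c in enumerate(ranking_a)}
    pos_b = {c: i for i, c in enumerate(ranking_b)}
    return sum(
        1
        for x, y in combinations(ranking_a, 2)
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )


def kemeny_median(rankings):
    """Exhaustive search for a ranking with minimal total Kendall tau distance."""
    candidates = sorted(rankings[0])
    return min(
        permutations(candidates),
        key=lambda perm: sum(kendall_tau_distance(perm, r) for r in rankings),
    )


votes = [("a", "b", "c"), ("a", "c", "b"), ("b", "a", "c")]
consensus = kemeny_median(votes)  # ("a", "b", "c") for these three votes
```

    When several rankings tie for the minimum, `min` returns the first one in lexicographic order, so a study of the committee selection properties would need to treat ties explicitly.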

    3. 2012

    • Name: StructureRegression. Feature Selection in Structural Regression Problems
    • Teaser: Structural regression algorithm for tagging bibliographic lists, abstracts and other structured texts.
    • Data: bibliographic records from the BibTeX collection on CS.
    • References: by Jaakkola and his team, possibly code.
    • Proposed algorithm: Structural regression.
    • Basic algorithm: is described by Valentin.
    • Required: segment the input text and assign each segment a field and each group of fields a bibliographic record type.

    4. 2012

    • Name: LogicClassification. Building logical rules when marking up texts
    • Teaser: Structural regression algorithm for tagging bibliographic lists, abstracts and other structured texts.
    • Data: bibliographic records from BibTeX collection on CS / conference abstracts, other marked up texts.
    • References: works by Inyakin, Chuvilin, Kudinov.
    • Proposed algorithm: Decision trees, dead-end (irredundant) covers.
    • Basic algorithm: is described by Valentin.
    • Required: train a text markup model using decision rules over RegExp strings.

    5. 2012

    • Title: RankClustering. Rank clustering and dynamic alignment algorithms.
    • Teaser: Search for duplicates in bibliographic records. Dynamic alignment when finding duplicate bibliographic records.
    • Data: Corrupted and incorrect bibliographic records (bases of student abstracts). Over 1000 bibliographic entries from data mining articles/books.
    • References: Strijov V.V. et al. "Metric Sequence Clustering", work on fast k-Means clustering.
    • Keywords: DTW — modifications, k-Means.
    • Proposed algorithm: Rank clustering algorithm.
    • Base algorithm: k-Means and its high performance variations.
    • Problem: It is required to modify the procedure for calculating the cost of the alignment path in such a way as to detect and take into account the invariants of permutations (and allowable modifications) of parts of the bibliographic record.
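    For reference, the unmodified cost of the alignment path that this problem asks to change can be sketched as classic dynamic time warping (DTW). The function name and the default element cost are hypothetical choices; the permutation-invariant modification requested in the problem statement is not implemented here.

```python
def dtw_distance(a, b, cost=lambda x, y: abs(x - y)):
    """Classic DTW: minimal cumulative cost of a monotone alignment of a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    # table[i][j] is the best cost of aligning a[:i] with b[:j].
    table = [[inf] * (m + 1) for _ in range(n + 1)]
    table[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = cost(a[i - 1], b[j - 1])
            table[i][j] = step + min(
                table[i - 1][j],      # a[i-1] is matched to an already used b[j-1]
                table[i][j - 1],      # b[j-1] is matched to an already used a[i-1]
                table[i - 1][j - 1],  # both sequences advance
            )
    return table[n][m]


# Numeric sequences with the default absolute-difference cost.
numeric = dtw_distance([1, 2, 3], [1, 2, 2, 3])

# Token sequences with a 0/1 mismatch cost, closer to bibliographic fields.
tokens = dtw_distance(
    ["author", "title", "year"],
    ["author", "title", "title", "year"],
    cost=lambda x, y: 0.0 if x == y else 1.0,
)
```

    Because the alignment is monotone, swapping two fields of a record (for example, year before title) raises the cost, which is the behaviour the problem statement proposes to relax.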

    6. 2012

    • Name: ThematicClustering. Checking the adequacy of the topic model.
    • Teaser: Methods for detecting incorrect thematic classification on conference materials. Methods for constructing a thematic model similar to the given one. Article clustering, hierarchical topic models with topic interpretability. Hierarchical thematic clustering of abstracts.
    • Data: Texts of Euro 2012 conference abstracts, 1862 abstracts.
    • References: on clustering, and introducing distances between texts as bags of words.
    • Keywords: hierarchical clustering, text similarity metrics.
    • Proposed algorithm: k-means hierarchical clustering algorithm + k-NN classification.
    • Basic algorithm: k-Means
    • Problem: It is required to build a thematic model using the clustering method and check the correctness of the current text classification. To do this, (hierarchical) clustering of texts is performed, and each cluster is assigned the topic name shared by the majority of articles in the cluster. After the model is built, each article is checked to see whether it falls under its declared topic or a different one.
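    The checking step described above (name each cluster after its majority topic, then flag articles whose declared topic differs) can be sketched as follows. The function name and the toy cluster assignments are hypothetical, and the clustering itself is assumed to have been done already.

```python
from collections import Counter


def flag_misclassified(cluster_ids, declared_topics):
    """Return indices of articles whose declared topic differs from the
    majority topic of the cluster they were assigned to."""
    topics_by_cluster = {}
    for cluster, topic in zip(cluster_ids, declared_topics):
        topics_by_cluster.setdefault(cluster, []).append(topic)
    # Majority topic per cluster; ties go to the topic seen first.
    cluster_topic = {
        cluster: Counter(topics).most_common(1)[0][0]
        for cluster, topics in topics_by_cluster.items()
    }
    return [
        index
        for index, (cluster, topic) in enumerate(zip(cluster_ids, declared_topics))
        if cluster_topic[cluster] != topic
    ]


cluster_ids = [0, 0, 0, 1, 1, 1]
declared = ["opt", "opt", "stats", "stats", "stats", "opt"]
suspects = flag_misclassified(cluster_ids, declared)
```

    The flagged indices are only candidates for review: a mismatch may point to a wrong declared topic or to a poor clustering.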

    7. 2012

    • Name: ThematicHierarchy. Building hierarchical topic models.
    • Teaser: Hierarchical thematic clustering of abstracts. Building a thematic model based on the materials of the conference.
    • Data: Abstract text.
    • References: hierarchical models, topic modeling.
    • Keywords: hierarchical topic modeling.
    • Proposed algorithm: hierarchical models, evaluation of topic distribution.
    • Basic algorithm: PLSA--LDA.
    • Problem: It is required to build a hierarchical topic model by calculating statistical estimates of the distribution functions of words by topic.

    8. 2012

    • Name: ThematicVisualizing. Visualization of hierarchical thematic models.
    • Teaser: On the materials of the EURO conference.
    • Data: Texts of Euro 2012 conference abstracts.
    • References: multidimensional scaling, clustering.
    • Keywords: graph visualization.
    • Proposed algorithm:
    • Basic algorithm: --
    • Problem: It is required to visualize the matrix of paired distances in such a way that it is possible to make a decision about
      1. correction of the names of topics/subtopics of the conference,
      2. transferring the thesis from one topic to another,
      3. adequacy of correspondence between model and actual clustering.

    9. 2012

    • Name: CovSelection. Joint selection of objects and features in problems of multiclass classification.
    • Teaser: Yandex search results ranking.
    • Data: Yandex - mathematics.
    • References: Bishop, Strijov V.V..
    • Keywords: logistic regression, feature selection, feature filtering.
    • Proposed algorithm: Joint selection by analysis of covariance matrices.
    • Basic algorithm: SVM.
    • Problem: Get matrix T, p. 209 Bishop, make a multi-class classification (p. 208). Check on a synthetic sample of the same format as Yandex data. (For comparison, run the SVM algorithm on the same sample. Associate with feature selection.) Estimate the hyperparameter matrices of the multiclass regression model. Propose a step-by-step algorithm for joint selection with maximization of the likelihood of the model.

    10. 2012

    • Name: ThematicMatching. Determining whether a document matches the topic based on the selection of key phrases.
    • Teaser: Does the dissertation match the declared dissertation passport? What is the actual specialty of the dissertation?
    • Data: Abstracts of dissertations (SugarSync). Passports of specialties.
    • References: (Article by S. Tsarkov "Morphological and statistical methods for extracting key phrases for building probabilistic thematic models of collections of text documents" - check).
    • Keywords: key phrases, topic patterns, N-grams, morphological and statistical features.
    • Proposed algorithm:
    • Basic algorithm: C-Value and TF-IDF.
    • Problem: It is required to check each abstract from the collection for formal compliance with the passport of the specialty declared in the abstract. At the same time, passport items are considered as descriptions of topics. An abstract is considered relevant to a given topic if the total probability of a given number of terms belonging to one of the topic descriptions of this specialty is higher than belonging to topic descriptions of other specialties.
    • Problem, again: Extracting the keywords from the document. We believe that the specialty passport consists of keywords. Finding distances from one set of keywords to another. Eventually
      1. we fill up the passport of a known specialty with new keywords, or
      2. find the nearest specialty passport.
    • Solution options: Introduce a distance function from a set of terms to the description of a topic and construct a matrix of such distances.
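    A minimal sketch of the "nearest specialty passport" step follows, assuming keywords have already been extracted. The Jaccard distance is a stand-in chosen for illustration, since the project text does not fix a particular distance, and the passport names and keyword sets are made up.

```python
def jaccard_distance(keywords_a, keywords_b):
    """1 - |A & B| / |A | B| for two keyword collections."""
    a, b = set(keywords_a), set(keywords_b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def nearest_passport(abstract_keywords, passports):
    """Name of the passport whose keyword set is closest to the abstract."""
    return min(
        passports,
        key=lambda name: jaccard_distance(abstract_keywords, passports[name]),
    )


# Hypothetical passports, each described by a set of keywords.
passports = {
    "specialty_A": {"regression", "classification", "clustering"},
    "specialty_B": {"thermodynamics", "entropy", "heat"},
}
abstract_keywords = {"regression", "clustering", "forecasting"}
best = nearest_passport(abstract_keywords, passports)
```

    Computing `jaccard_distance` for every abstract against every passport gives the distance matrix mentioned in the solution options.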

    11. 2012

    • Name: FeatureGen. Sequential generation and selection of features in a multiclass classification problem
    • Teaser: Is this work scientific? Determination of the type of work (definition of the scientific field of the work). Definition of the social role of the author of the text.
    • Data: synthetic, internet collection.
    • References: Strijov V.V., Ore.
    • Keywords: generation of features, search for isomorphic models.
    • Proposed algorithm: Algorithm for sequential generation of superpositions.
    • Basic algorithm: decision trees.
    • Problem: It is required to build a set of features by which the text can be classified.

    12. 2012

    • Name: TypeDetection. Methods for extracting features from text information
    • Teaser: Is this work scientific? Determination of the type of work (definition of the scientific field of the work). Definition of the social role of the author of the text.
    • Data: synthetic, internet collection.
    • References: Find.
    • Keywords: hierarchical clustering, structural learning, text similarity metrics.
    • Proposed algorithm
    • Basic algorithm
    • Problem: It is required to build a set of features by which the text can be classified.

    13. 2012

    • Name: Checking the adequacy of the topic model.
    • Teaser: Methods for detecting incorrect thematic classification on conference materials. Methods for constructing a thematic model similar to the given one. Article clustering, hierarchical topic models with topic interpretability. Hierarchical thematic clustering of abstracts.
    • Data: Texts of Euro 2012 conference abstracts, 1862 abstracts.
    • References: for latent models.
    • Keywords: soft clustering, latent models.
    • Proposed algorithm: hHDP.
    • Basic algorithm: HDP.
    • Problem: It is required to build a thematic model using the clustering method and check the correctness of the current text classification. To do this, (hierarchical) clustering of texts is performed, and each cluster is assigned the topic name shared by the majority of articles in the cluster. After the model is built, each article is checked to see whether it falls under its declared topic or a different one.

    Title Author Link to the journal The original text of the work Date of application State
    Feature selection and metric optimization when clustering a collection of documents Aduenko A.A., Kuzmin A.A., Strijov V.V. Izvestiya TulGu [147] 12.10.2012 Published
    Estimating the Probabilities of Strings in a Collection of Documents Budnikov E.A., Strijov V.V. Information Technology [148] 24.09.2012 Published
    Checking the adequacy of the topic models of a collection of documents Kuzmin A.A., Strijov V.V. Software engineering [149] 17.12.2012 Published
    Algorithm for the optimal location of the names of a collection of documents Aduenko A.A., Strijov V.V. Software engineering [150] 13.11.2012 Published
    Visualization of the matrix of paired distances between documents Aduenko A.A., Strijov V.V. Scientific and technical statements of S.-Pb.PSU [151] 29.10.2012 Submitted
    Construction of an integral indicator of the quality of scientific publications by co-clustering methods Medvednikova M.M., Strijov V.V. Izvestiya TulGu [152] 15.11.2012 Published
    Joint selection of objects and features in problems of multiclass classification of a collection of documents Aduenko A.A., Strijov V.V. Infocommunication technologies [153] 18.12.2012 Published
    Algorithm for constructing logical rules when marking up texts Ivanova A.B., Aduenko A.A., Strijov V.V. Software engineering [154] 24.01.2013 Accepted
    Building hierarchical topic models of document collections Tsyganova S.V., Strijov V.V. Applied Informatics [155] 27.01.2013 Published
    Choice of features when marking bibliographic lists by methods of structured learning Varfolomeeva A.A., Strijov V.V. Scientific and technical statements of S.-Pb.PSU [156] 27.01.2013 Reviewed
    Goodness-of-fit criteria for sparse discrete distributions and their application in topic modeling Tselykh V.R., Vorontsov K. V. Machine learning and data analysis [157] 17.12.2012 Published
    Checking the adequacy of the topic model Stepan Lobastov [158] Redaction

    List of works accepted for publication

    • 1. Aduenko A. A., Strijov V. V. Visualization of the matrix of paired distances between documents // Scientific and technical bulletin of St. Petersburg. PGU. Computer science. Telecommunications. Management, 2013, 1 - ?.
    • 2. Aduenko A. A., Kuzmin A. A., Strijov V. V. Feature selection and metric optimization when clustering a collection of documents // Proceedings of the Tula State University, Natural Sciences, 2012, No. 3. P. 119-132.
    • 3. Aduenko A. A., Strijov V. V. Algorithm for the optimal location of the names of a collection of documents // Software engineering, 2013. No. 3. P. 21-25.
    • 4. Budnikov E. A., Strijov V. V. Estimating the Probabilities of Strings in a Collection of Documents // Information Technology, 2013. No. 4.
    • 5. Kuzmin A. A., Strijov V. V. Checking the adequacy of the topic models of a collection of documents // Software engineering, 2013. No. 4.
    • 6. Medvednikova M. M., Strijov V.V. Construction of an integral indicator of the quality of scientific publications by co-clustering methods // Proceedings of the Tula State University, Natural Sciences, 2013. No. 1.
    • 7. Aduenko A. A., Strijov V. V. Joint selection of objects and features in problems of multiclass classification of a collection of documents // Infocommunication technologies, 2013. No. 2.
    • 8. Ivanova A.V., Aduenko A.A., Strijov V.V. Algorithm for constructing logical rules when marking up texts // Software engineering, 2013. No. 4(5).
    • 9. Tsyganova S.V., Strijov V.V. Building hierarchical topic models of document collections // Applied Informatics, 2013. No. 1.
    • 10. Varfolomeeva A.A., Strijov V.V. Choice of features when marking bibliographic lists by methods of structured learning // Scientific and Technical Bulletin of St. Petersburg. PGU. Computer science. Telecommunications. Management, 2013.
    • 11. Tselykh V.R., Vorontsov K. V. Goodness-of-fit criteria for sparse discrete distributions and their application in topic modeling // JMLDA, 2012. No. 4. pp. 432-442.

    Title Author Reviewer Link Comments
    CMARS: spline approximation Vlada Tselykh Tatiana Shpakova Celyh2012CMARS [.]caipvdstrj(10)
    Algorithmic foundations for constructing bank scoring cards Alexander Aduenko Alina Ivanova Aduenko2012economics [.]caipvdstrj(10)
    Using the method of principal components in the construction of integral indicators Maria Medvednikova Svetlana Tsyganova Medvednikova2012PCA [r]caipvdstrj(10)
    Multi-level classification for price movement detection Arsenty Kuzmin Varfolomeeva A.A. Kuzmin2012TimeRows [r]caipvdstjr(10)
    Local forecasting methods with the choice of an invariant transformation Svetlana Tsyganova Maria Medvednikova Tsyganova2012 LocalForecast [r]caipvdstjr(10)
    Prediction of Quasi-Periodic Multivariate Time Series by Non-Parametric Methods (example) Egor Klochkov Alexander Shulga Klochkov2012Goods4Cast [r]caipvdstj.(10)
    Search algorithms for the most informative objects and features in logistic regression (example) Stepan Lobastov Egor Klochkov Lobastov2012FOSelection [r]caipvdstrj(10)
    Local forecasting methods with the choice of metric Varfolomeeva A.A. Arsenty Kuzmin Varfolomeeva2012 LocForecastMetrics [r]caipvdstjr(10)
    Chebyshev polynomials and time series forecasting Valeria Bochkareva Stepan Lobastov Bochkareva2012TimeSeriesPrediction [.]caipvdst-r(9)
    Clustering and compiling a dictionary of amino acid sequences Tatiana Shpakova Vlada Tselykh Shpakova2012Clustering [.]caipvdst.(9)
    Vector autoregression and management of macroeconomic indicators Alexander Shulga Shulga2012VAR [.]caipvds..(9)
    Approximation of empirical distribution functions Alina Ivanova Alexander Aduenko Ivanova2012 ApproximateFunc [r]caipvd..(9)

    1

    • Search algorithms for the most informative objects and features in logistic regression
    • Logistic regression is a statistical model that is used to predict the probability of an event occurring based on the values of a set of features. It has applications, for example, in medicine [159] and credit scoring. In real conditions the number of features is usually large, and the most important task is to select only the essential features, as well as to find objects that are atypical for one reason or another.
    • Keywords: logit model, feature selection, boosting.
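    To make the model concrete, here is a minimal sketch of one-feature logistic regression fitted by gradient ascent on the log-likelihood. The function names, learning rate, step count and toy data are illustrative assumptions; the feature and object selection studied in the paper is not shown.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, steps=500):
    """Fit P(y=1 | x) = sigmoid(w * x + b) by batch gradient ascent."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = y - sigmoid(w * x + b)  # gradient of the log-likelihood
            grad_w += error * x
            grad_b += error
        w += learning_rate * grad_w
        b += learning_rate * grad_b
    return w, b


# Toy data with overlapping classes, so the maximum-likelihood fit is finite.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 1, 0, 1, 1]
w, b = fit_logistic(xs, ys)
probability_at_2 = sigmoid(w * 2.0 + b)
```

    With many features the same update applies per coordinate, which is where selecting only the essential features starts to matter.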

    2

    • Using the method of principal components in the construction of integral indicators
    • This paper considers the use of the principal component method in the construction of integral indicators. The results obtained are compared with the results given by the Pareto stratification method. An integral indicator is built for Russian universities. For this, biographies of the 30 richest businessmen in Russia according to Forbes magazine for 2011 are used.
    • Keywords: integral indicator, expert estimates, parameter weights, principal component method, Pareto stratification method.

    3

    • Approximation of empirical distribution functions
    • The work is devoted to methods of approximating functions for the efficient calculation of integrals. Practical problems usually provide data only at certain points in time or space. When making assumptions about the remaining points, it becomes necessary to approximate the distribution function of the quantity under study and to estimate the corresponding error. Methods of different accuracy can be used to calculate it.
    • Keywords: Monte Carlo method, calculation of distribution functions, empirical distribution functions.
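    As a small illustration of the objects involved, the sketch below builds an empirical distribution function from a sample and computes a uniform error band from the Dvoretzky-Kiefer-Wolfowitz inequality. The DKW band is a standard textbook bound brought in here for illustration and is not claimed to be the method of the paper; the function names are hypothetical.

```python
import math
from bisect import bisect_right


def make_ecdf(sample):
    """Return F_n(x) = (number of sample points <= x) / n as a function."""
    ordered = sorted(sample)
    n = len(ordered)
    if n == 0:
        raise ValueError("sample must not be empty")

    def ecdf(x):
        return bisect_right(ordered, x) / n

    return ecdf


def dkw_epsilon(n, alpha):
    """Half-width eps such that P(sup |F_n - F| > eps) <= alpha (DKW bound)."""
    return math.sqrt(math.log(2.0 / alpha) / (2.0 * n))


F = make_ecdf([3, 1, 4, 2])
value = F(2.5)                       # half of the sample is <= 2.5
band = dkw_epsilon(n=100, alpha=0.05)
```

    The band shrinks as one over the square root of the sample size, which gives a rough guide to how many points are needed for a target accuracy.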

    4

    • Local prediction methods with choice of transformation
    • Time series forecasting problems have many applications in various fields such as economics, physics, and medicine. Their solution is a forecast for the near future based on the already known values of the predicted series at previous points in time. This work builds a local forecasting algorithm that accounts for transformations and can identify visually similar segments of the time series without human intervention.

    2011

    Name | Author | Reviewer | Link
    Estimation of hyperparameters of linear regression models in the selection of noise and correlated features | Tokmakova Alexandra | A. P. Motrenko | Tokmakova2011HyperPar
    Choice of forecasting models for electricity prices | Leontieva Lyubov | Grebennikov Evgeny | Leonteva2011ElectricityConsumption
    Multiclass prediction of the probability of myocardial infarction and estimation of the required sample size of patients (example) | A. P. Motrenko | Tokmakova Alexandra | Motrenko2011HAPrediction
    Algorithms for generating essentially non-linear models | Georgy Rudoy | Nikolai Baldin | Rudoy2012Generation
    Event Modeling and Financial Time Series Forecast | Alexander Romanenko | Budnikov E. A. | Romanenko2011Event
    Overview of some statistical models of natural language | Budnikov E. A. | Alexander Romanenko | Budnikov2011Statistical

    Practical part

    Name | Author | Reviewer | Link | Comments
    Using the Granger Test in Time Series Forecasting | Anastasia Motrenko | Leontieva Lyubov | Motrenko2011GrangerForc | Published at JMLDA
    Choosing an Activation Function for Predicting Neural Networks | Georgy Rudoy | Nikolai Baldin | Rudoy2011NNForecasting | Published at JMLDA
    Multidimensional caterpillar, choice of length and number of caterpillar components | Leontieva Lyubov | Mikhail Burmistrov | Leonteva2011GaterpillarLearning | Published at JMLDA
    Prediction by Discrete Argument Functions (example) | Budnikov E. A. | Alexander Romanenko | Budnikov2011DiscreteForecasting | Published at JMLDA
    Investigation of Convergence in Prediction by Neural Networks with Feedback | Nikolai Baldin | Georgy Rudoy | Baldin2011FNNForecasting | Published at JMLDA
    Time series alignment: Forecasting with DTW | Alexander Romanenko | Budnikov E. A. | Romanenko2011DTWForecasting | Published at JMLDA
    Isolation of the periodic component of the time series (example) | Tokmakova Alexandra | Budnikov E. A. | Tokmakova2011Periodic | Published at JMLDA

    1. 2011

    • Non-parametric forecasting: kernel selection, parameter tuning
    • The paper describes kernel smoothing of a time series as a type of nonparametric regression. The essence of the method consists in restoring the function of time as a weighted linear combination of points from some neighborhood. A continuous bounded symmetric real weight function is called a kernel. The resulting kernel estimate is used to predict the next point in the series. The dependence of the prediction quality on the kernel parameters and on the superimposed noise is investigated.
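
A minimal sketch of such a kernel estimate, assuming a Gaussian kernel and the Nadaraya-Watson form of the weighted combination. The function names and the default `bandwidth` are illustrative assumptions, not taken from the project.

```python
import math

def gaussian_kernel(u):
    """A continuous, bounded, symmetric weight function."""
    return math.exp(-0.5 * u * u)

def kernel_estimate(times, values, t, bandwidth, kernel=gaussian_kernel):
    """Weighted average of the observed values; weights depend on distance in time."""
    weights = [kernel((t - ti) / bandwidth) for ti in times]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total

def predict_next(values, bandwidth=2.0):
    """Predict the point that follows the last observation."""
    times = list(range(len(values)))
    return kernel_estimate(times, values, len(values), bandwidth)
```

Because the estimate is a weighted average of observed values, the prediction always stays within the range of the series, which is one reason the bandwidth and kernel shape matter for trending data.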

    2. 2011

    • Exponential Smoothing and Prediction
    • The paper investigates the application of the exponential smoothing algorithm to time series forecasting. The algorithm takes into account the previous values of the series with weights that decrease with distance from the studied section of the time series. The behavior of the algorithm on synthetic data is studied for various weighting schemes. The operation of the algorithm is also analyzed on real data, namely stock indices.
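
The simplest variant can be sketched as follows, assuming a single smoothing parameter `alpha` in (0, 1] and initialization with the first observation. Both choices are assumptions for illustration.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: older values get geometrically smaller weights."""
    level = series[0]
    smoothed = [level]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
        smoothed.append(level)
    return smoothed

def forecast_next(series, alpha):
    """One-step-ahead forecast: the last smoothed level."""
    return exponential_smoothing(series, alpha)[-1]
```

A larger `alpha` makes the forecast follow recent values more closely, while a smaller one averages over a longer history.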

    3. 2011

    • Isolation of the periodic component of the time series
    • The project examines a time series for the presence of a periodic component and builds a trigonometric interpolation of the series using the least squares method. The parameters of the least squares fit are estimated depending on the quality of forecasting. The computational experiment presents the results of the correlation function and the least squares method on a noisy synthetic sine wave and on a real electrocardiogram time series.
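
The correlation-function part can be sketched as below: the period is taken as the lag with the largest autocorrelation after the autocorrelation has first become negative. This selection rule and the function names are assumptions for illustration; the least squares fit of the trigonometric model is not shown.

```python
import math

def autocorrelation(series, lag):
    """Unnormalized sample autocovariance of the series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    return sum((series[t] - mean) * (series[t + lag] - mean)
               for t in range(n - lag))

def estimate_period(series):
    """Return the estimated period in samples, or None if none is found."""
    max_lag = len(series) // 2
    acf = [autocorrelation(series, lag) for lag in range(max_lag + 1)]
    # Skip the trivial peak around lag 0: start where the ACF first turns negative.
    start = next((lag for lag, r in enumerate(acf) if r < 0), None)
    if start is None:
        return None
    return max(range(start, max_lag + 1), key=lambda lag: acf[lag])

# Example: a clean sine wave with a period of 20 samples.
sine = [math.sin(2 * math.pi * t / 20) for t in range(200)]
period = estimate_period(sine)
```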

    4. 2011

    • Multivariate caterpillar, choice of length and number of caterpillar components (comparison of smoothed and unsmoothed time series)
    • The paper describes the caterpillar method and its application to time series forecasting. The algorithm extracts the informative components of the studied time series and then builds a forecast. The dependence of the forecast accuracy on the choice of the caterpillar length and the number of its components is investigated. The computational experiment presents the results of the algorithm on periodic series with different patterns within a period, on series with broken periodicity, and on a real time series of hourly temperature.

    5. 2011

    • Prediction by Discrete Argument Functions
    • The paper investigates short time series on the example of monophonic musical melodies. One note is predicted by exponential smoothing, by a local method, and by a method of searching for constant patterns. The computational experiment is carried out on two melodies, one of which has exactly repeating fragments.

    7. 2011

    • Local forecasting methods, search for metrics
    • The time series is divided into separate sections, each of which is associated with a point in the n-dimensional feature space. The local model is calculated in three successive stages. The first stage finds the k nearest neighbors of the observed point. The second builds a simple model using only these k neighbors. The third uses this model to predict the next value from the observed point. Many researchers use the Euclidean metric to measure distances between points. This work compares the forecasting accuracy obtained with different metrics. In particular, it is required to find the set of weights in the weighted metric that maximizes the prediction accuracy.
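
The three stages can be sketched as follows, assuming a weighted Euclidean metric and, as the "simple model", the mean of the values that followed the k nearest sections. The function names and this choice of local model are assumptions for illustration.

```python
import math

def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two sections of equal length."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))

def knn_forecast(series, n, k, weights=None):
    """Predict the next value from the last n observations."""
    weights = weights or [1.0] * n
    query = series[-n:]
    # Stage 1: distances from the query to every earlier section of length n
    # whose following value is known.
    candidates = []
    for start in range(len(series) - n):
        section = series[start:start + n]
        following = series[start + n]
        candidates.append((weighted_distance(section, query, weights), following))
    candidates.sort(key=lambda c: c[0])
    nearest = candidates[:k]
    # Stages 2 and 3: the local model is the mean of the neighbors' next values.
    return sum(following for _, following in nearest) / len(nearest)
```

Passing, for example, `weights=[0, 0, 1]` makes only the most recent value of each section count in the distance, which is one way to explore how the weights affect accuracy.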

    8. 2011

    • Local prediction methods, search for invariant transformation
    • The project uses local methods for time series forecasting. These methods do not represent the time series in a given class of functions of time. Instead, the prediction is based on data about some part of the time series (local information is used). In this paper, we study in detail the following method, a generalization of the classical "nearest neighbour" method.
    • Suppose a time series is given and the problem is to continue it. It is assumed that the continuation is determined by the history of the series, i.e. in the series one needs to find a part that, after some transformation A, becomes similar to the part we are trying to predict. Finding such a transformation A is the goal of this project. The degree of similarity is measured by the function B, a proximity function for two segments of the time series. This is how we find the nearest neighbor of the current history. In general, we look for several nearest neighbors, and the continuation is written as their linear combination.

    9. 2011

    • Time Series Alignment: Forecasting with DTW
    • A time series is a sequence of time-ordered values of some real variable $\mathbf{x}=\{x_{t}\}_{t=1}^T\in\mathbb{R}^T$. A problem that commonly arises with time series is the comparison of one data sequence with another. Comparing sequences is greatly simplified once a time series has been warped along one of its axes and aligned. Dynamic time warping (DTW) is a technique for efficiently aligning time series. DTW methods are used in speech recognition, information analysis in robotics, industry, medicine and other areas.
    • The purpose of the work is to give an example of alignment and to introduce a comparison functional for two time series that has the natural properties of commutativity, reflexivity and transitivity. The functional should take two time series as input and return a number characterizing the degree of their "similarity".
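
A minimal sketch of the classical DTW cost computed by dynamic programming, assuming the absolute difference as the pointwise cost and no window constraint. This is the textbook recurrence, not necessarily the functional proposed in the work.

```python
def dtw_distance(a, b):
    """Cost of the best monotone alignment between series a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the best alignment cost of a[:i] and b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # repeat b[j-1]
                                 cost[i][j - 1],      # repeat a[i-1]
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]
```

This cost is symmetric in its arguments and equals zero for identical series and for series that differ only by repeated values. It is not guaranteed to satisfy a triangle inequality.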

    10. 2011

    • Choosing an Activation Function for Predicting Neural Networks
    • The aim of the project is to study the dependence of the quality of prediction by neural networks without feedback (single- and multilayer perceptrons) on the chosen activation function of neurons in the network, as well as on the parameters of this function.
    • The result of the project is to evaluate the quality of forecasting by neural networks depending on the type and parameters of the activation function.
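
To make "type and parameters of the activation function" concrete, the sketch below defines two common activations with a slope parameter and a single neuron that forecasts from a window of past values. The names, the slope parameterization and the single-neuron model are assumptions for illustration.

```python
import math

def sigmoid(x, slope=1.0):
    """Logistic activation; the slope controls how sharp the transition is."""
    return 1.0 / (1.0 + math.exp(-slope * x))

def scaled_tanh(x, slope=1.0):
    """Hyperbolic tangent activation with the same kind of slope parameter."""
    return math.tanh(slope * x)

def perceptron_forecast(window, weights, bias, activation, **params):
    """Output of one neuron applied to a window of past values."""
    s = sum(w * x for w, x in zip(weights, window)) + bias
    return activation(s, **params)
```

Swapping `activation` or changing `slope` while keeping the rest fixed is the kind of comparison the project describes.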

    12. 2011

    • Investigation of Convergence in Prediction by Neural Networks with Feedback
    • The dependence of the convergence rate in time series forecasting on the parameters of a neural network with feedback is investigated. The concept of feedback is typical for dynamic systems in which the output signal of some element of the system affects the input signal of this element. The output signal can be represented as an infinite weighted sum of the current and previous input signals. The Jordan network is used as the neural network model. It is proposed to investigate the rate of convergence depending on the choice of the activation function (sigmoid, hyperbolic tangent), on the number of neurons in the intermediate layer and on the width of the sliding window. We also explore a way to increase the rate of convergence using the generalized delta rule.
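
The statement about the infinite weighted sum can be illustrated on the simplest case, assuming a single linear element with a scalar feedback weight $w$. This element and the symbols $x_t$, $y_t$, $w$ are assumptions for illustration, not the Jordan network studied in the project. Unrolling the recurrence gives

```latex
y_t = x_t + w\,y_{t-1}
\quad\Longrightarrow\quad
y_t = \sum_{k=0}^{\infty} w^{k}\, x_{t-k}
```

For bounded inputs the series converges when $|w| < 1$, so the influence of older inputs decays geometrically.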

    13. 2011

    • Multidimensional caterpillar, choice of length and number of caterpillar components
    • The work is devoted to the study of one of the methods for analyzing multivariate time series, the "caterpillar" method, also known as Singular Spectrum Analysis or SSA. The method can be divided into four stages: the representation of the time series in the form of a matrix using a shift procedure, the calculation of the covariance matrix of the sample and its singular value decomposition, the selection of principal components related to various components of the series (from slowly changing and periodic to noise), and, finally, the reconstruction of the series.
    • The algorithm is applied to problems in meteorology, geophysics, economics and medicine. The purpose of this work is to find out how the efficiency of the algorithm depends on the choice of the time series it is applied to.
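
The four stages can be sketched for a single (univariate) series as follows. As simplifying assumptions, the sketch uses the uncentered matrix X Xᵀ of the trajectory matrix instead of a centered covariance matrix, and it finds the leading eigenvectors by power iteration with deflation instead of a library SVD. All function names are hypothetical.

```python
import math

def leading_eigenvectors(matrix, count, iterations=1000):
    """Leading eigenvectors of a symmetric PSD matrix (power iteration + deflation)."""
    size = len(matrix)
    work = [row[:] for row in matrix]
    tol = 1e-10 * sum(matrix[i][i] for i in range(size))
    vectors = []
    for _ in range(count):
        v = [float(i + 1) for i in range(size)]  # arbitrary start vector
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        eigenvalue = 0.0
        for _step in range(iterations):
            w = [sum(work[i][j] * v[j] for j in range(size)) for i in range(size)]
            eigenvalue = math.sqrt(sum(x * x for x in w))
            if eigenvalue <= tol:
                break
            v = [x / eigenvalue for x in w]
        if eigenvalue <= tol:
            break  # the remaining matrix is numerically zero
        vectors.append(v)
        for i in range(size):  # deflation: remove the component just found
            for j in range(size):
                work[i][j] -= eigenvalue * v[i] * v[j]
    return vectors

def ssa_reconstruct(series, window, components):
    """Reconstruct the series from its leading SSA components."""
    n = len(series)
    k = n - window + 1
    # Stage 1: trajectory matrix whose columns are lagged windows.
    X = [[series[i + j] for j in range(k)] for i in range(window)]
    # Stage 2: matrix X X^T and its leading eigenvectors.
    S = [[sum(X[a][j] * X[b][j] for j in range(k)) for b in range(window)]
         for a in range(window)]
    U = leading_eigenvectors(S, components)
    # Stage 3: keep only the projection onto the selected components.
    X_hat = [[0.0] * k for _ in range(window)]
    for u in U:
        proj = [sum(u[i] * X[i][j] for i in range(window)) for j in range(k)]
        for i in range(window):
            for j in range(k):
                X_hat[i][j] += u[i] * proj[j]
    # Stage 4: diagonal averaging turns the matrix back into a series.
    sums, counts = [0.0] * n, [0] * n
    for i in range(window):
        for j in range(k):
            sums[i + j] += X_hat[i][j]
            counts[i + j] += 1
    return [s / c for s, c in zip(sums, counts)]
```

The `window` argument corresponds to the caterpillar length and `components` to the number of retained components, the two choices whose effect the project investigates.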

    14. 2011

    • Using the Granger Test in Time Series Forecasting
    • When predicting a series, it can be useful to determine whether the series is "dependent" on some other series. The Granger test, based on statistical tests, helps to identify such a relationship. The method does not guarantee an accurate result: when comparing two series that both depend on a third series, an error is possible. The method is used in forecasting economic and natural phenomena (for example, earthquakes).
    • The purpose of the work is to propose an algorithm that makes the best use of this method and to investigate the effectiveness of the method depending on the predicted series.
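
A minimal sketch of the test statistic, assuming a single lag: a model that predicts y from its own previous value is compared with a model that also uses the previous value of x, and the F statistic measures how much the residual sum of squares drops. The lag order, the function names and the toy data are assumptions for illustration. Converting F to a p-value requires the F distribution and is not shown.

```python
import random

def least_squares_rss(rows, targets):
    """Residual sum of squares of an ordinary least squares fit (normal equations)."""
    m = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(m)] for i in range(m)]
    b = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda idx: abs(A[idx][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, m):
            factor = A[row][col] / A[col][col]
            for c in range(col, m):
                A[row][c] -= factor * A[col][c]
            b[row] -= factor * b[col]
    beta = [0.0] * m
    for i in reversed(range(m)):
        s = b[i] - sum(A[i][j] * beta[j] for j in range(i + 1, m))
        beta[i] = s / A[i][i]
    return sum((y - sum(bi * ri for bi, ri in zip(beta, r))) ** 2
               for r, y in zip(rows, targets))

def granger_f_statistic(x, y):
    """F statistic for the hypothesis 'x helps to predict y' with one lag."""
    targets = y[1:]
    restricted = [[1.0, y[t - 1]] for t in range(1, len(y))]
    unrestricted = [[1.0, y[t - 1], x[t - 1]] for t in range(1, len(y))]
    rss_r = least_squares_rss(restricted, targets)
    rss_u = least_squares_rss(unrestricted, targets)
    n = len(targets)
    # One restriction is tested; the unrestricted model has 3 parameters.
    return (rss_r - rss_u) / (rss_u / (n - 3))

# Toy data: y follows x with a one-step delay plus small noise.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(200)]
y = [0.0] + [0.8 * x[t - 1] + 0.1 * rng.gauss(0, 1) for t in range(1, 200)]
f_xy = granger_f_statistic(x, y)  # does x help to predict y?
f_yx = granger_f_statistic(y, x)  # does y help to predict x?
```

On this toy data the statistic for "x helps to predict y" is far larger than the one for the reverse direction, which matches how the data were generated.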