Time series (library of examples)

From MachineLearning.

Revision as of 11:47, 10 July 2015

A time series is a set of measurements taken at equal time intervals. This page collects example time series intended for testing forecasting algorithms.
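The definition above requires equally spaced measurements. A minimal sketch of how that property might be checked is given below; the function name `is_regularly_sampled` and the tolerance `tol` are illustrative assumptions, not part of the library, and the time stamps are assumed to be already converted to numbers.

```python
from typing import Sequence


def is_regularly_sampled(times: Sequence[float], tol: float = 1e-9) -> bool:
    """Return True if consecutive time stamps are separated by a constant step."""
    if len(times) < 3:
        # Fewer than two intervals: there is nothing to compare.
        return True
    step = times[1] - times[0]  # the first interval serves as the reference step
    return all(abs((b - a) - step) <= tol for a, b in zip(times[1:], times[2:]))


print(is_regularly_sampled([0.0, 0.5, 1.0, 1.5]))  # True
print(is_regularly_sampled([0.0, 0.5, 1.5]))       # False
```

A series that fails this check would need resampling or interpolation before it fits the definition used here.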

File structure

Each data file is named tsName.csv; the values in each row are separated by commas, and the decimal separator is a period. The first column is time. The second column is the time series to be forecast; the remaining columns are an auxiliary set of time series. Each data file is accompanied by a file tsNameReadme.txt, which specifies:

  • the data source (or the problem that had to be solved),
  • the format of the time stamps,
  • the (meaningful) column names,
  • the scale type of each column,
  • the periods, if any,
  • other information.
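The column layout described above can be read with a few lines of Python. This is a minimal sketch under stated assumptions: the function name `read_ts_csv` is hypothetical, the file is assumed to have no header row (column names live in the readme), all non-time columns are assumed to be numeric, and time stamps are kept as text because their format is defined per file in tsNameReadme.txt.

```python
import csv
import io
from typing import List, TextIO, Tuple


def read_ts_csv(handle: TextIO) -> Tuple[List[str], List[float], List[List[float]]]:
    """Split rows into time stamps, the target series, and auxiliary series."""
    times: List[str] = []
    target: List[float] = []
    aux: List[List[float]] = []
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        times.append(row[0].strip())             # column 1: time, kept as text
        target.append(float(row[1]))             # column 2: series to forecast
        aux.append([float(v) for v in row[2:]])  # remaining columns: auxiliary series
    return times, target, aux


# Hypothetical in-memory sample standing in for a real tsName.csv file.
sample = io.StringIO("1,10.5,0.1,7\n2,11.0,0.2,8\n3,11.5,0.3,9\n")
times, target, aux = read_ts_csv(sample)
print(times)
print(target)
print(aux)
```

For a file on disk, the same function can be called on `open(path, newline="")`, which is the mode the `csv` module documentation recommends.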

Examples

Synthetic series (in ts format; see TSForecastingInterfaces)

Highly periodic

  • Electricity consumption EnergyConsumption
  • Operation of machines and mechanisms
  • Sound
  • Music LedZeppelin

Periodic, noisy

  • Electricity prices
  • Consumer goods prices
  • Retail sales volume RetailSalesItems
  • Sugar prices SugarPrice
  • Bread prices WhiteBreadPrices
  • Beverage consumption volume
  • Weather: temperature, humidity, wind force GermanWeather
  • Passenger (and freight) traffic volume

With a complex period

  • Electrocardiogram ECG
  • Pulse wave
  • Encephalogram
  • Reflected waves

Aperiodic

  • Spread of influenza FluUSA
  • Population migration
  • Bird migration

Highly noisy

  • Prices (volumes) of major exchange-traded instruments Cisco
  • Stock market indicators DowJonesIndustrialAverage
  • Option prices (on a grid)

Event-based

Accelerometer

  • http://smartlab.ws/component/content/article?id=60 (2013) 30 subjects, 6 activities, fixed set of features from
  • http://llmpp.nih.gov/lymphoma/: classification of DLBCL (diffuse large B-cell lymphoma) patients into curable and noncurable groups (pdf: http://www.broadinstitute.org/mpr/publications/projects/Lymphoma/Shipp_et_al_2002.pdf). Raw data for all Lymphochip microarrays are available at http://llmpp.nih.gov/lymphoma/data/rawdata/. For each microarray, two scan files were generated, one for each fluorescence emission wavelength corresponding to the fluorophore used in the reverse transcription labeling reaction.

Kaggle competitions

  • https://www.kaggle.com/c/seizure-prediction: predict seizures in intracranial EEG recordings. Intracranial EEG was recorded from dogs with naturally occurring epilepsy using an ambulatory monitoring system. EEG was sampled from 16 electrodes at 400 Hz, and recorded voltages were referenced to the group average. These are long duration recordings, spanning multiple months up to a year and recording up to a hundred seizures in some dogs. Preictal training and testing data segments are provided covering one hour prior to seizure with a five minute seizure horizon.
  • https://www.kaggle.com/c/belkin-energy-disaggregation-competition/data: SmartHouse energy consumption prediction. Electromagnetic Interference (EMI) is measured using a special sensor built at the Ubicomp Lab to identify what appliance is being used and how much energy it is consuming. The data is available from 4 homes (H1-H4) consisting of both training datasets and testing datasets. The training set includes information about which appliance was turned ON or OFF and at what timestamps.
  • https://www.kaggle.com/c/predicting-parkinson-s-disease-progression-with-smartphone-data: measure the symptoms of Parkinson’s disease with a smartphone. The data was collected from 9 PD patients, at varying stages of the disease, and 7 healthy controls within a 4-month period. The data includes the following streams: audio, accelerometry (3D, for each of the 3 axes: mean, absolute central moment, standard deviation, maximum deviation, power spectral density across four separate bands), GPS (latitude, longitude, altitude), compass (for each of the 3 axes: mean, absolute central moment, standard deviation, maximum deviation).
  • https://www.kaggle.com/c/accelerometer-biometric-competition/data: recognize users of mobile devices from accelerometer data. The dataset contains approximately 60 million unique samples of accelerometer data collected from 387 different devices. These are split into equal sets for training and test. Samples in the training set are labeled with the unique device from which the data was collected. The test set is demarcated into 90k sequences of consecutive samples from one device.


Databases

  • http://www.physionet.org/ contains collections of recorded physiologic signals (accelerometry, ECG, EEG, EHG, EMG, blood pressure, heart rate, auditory brainstem response, etc.).
  • http://www.ebi.ac.uk/arrayexpress/experiments/browse.html is a database of genomic data. Data can be searched by a number of parameters, such as molecule (DNA, RNA, amplicon, metabolite, protein) or experimental technology (array, high-throughput sequencing, mass spectrometry).
  • https://www.ieeg.org/ includes a large database of scientific data and tools to analyze epilepsy datasets.

See also

The library is used in the following projects:
