Numerical methods of learning from examples (practice, V.V. Strijov)/Collection of real data
The collection of real data for assignments in the course Numerical methods of learning from examples is available at https://svn.code.sf.net/p/mvr/code/data/
The collection contains measured data for regression, classification, ranking, and clustering problems.
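A file name taken from any Link field below can be combined with the collection address to form a download URL. The following is a minimal sketch, assuming the files are served directly under that address; the helper names `dataset_url` and `download` are hypothetical and are not part of the collection.

```python
from urllib.parse import urljoin
from urllib.request import urlretrieve

# Address of the collection, as stated on this page.
BASE_URL = "https://svn.code.sf.net/p/mvr/code/data/"


def dataset_url(filename: str) -> str:
    """Return the full URL of a file named in a Link field (hypothetical helper)."""
    return urljoin(BASE_URL, filename)


def download(filename: str) -> str:
    """Save the file into the current directory; requires network access."""
    local_path, _headers = urlretrieve(dataset_url(filename), filename)
    return local_path


# Building the URL needs no network access.
print(dataset_url("IrisFromUCI.csv"))
```

Calling `download("IrisFromUCI.csv")` would then fetch the file, provided the server allows plain HTTP downloads from that path.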
Title: Croatian power plants waste measurements
- Link: CroatianPPs.csv, CroatianPPs.xls
- Abstract:
- Characteristics: multivariate
- Attributes: see the columns of the data file
- Scales: real
- Problem: alternatives ranking
- Number of samples: 8
- Number of features: 12
- Source: DOI: 10.1016/j.energy.2011.04.030
- Citation: Strijov V. et al. Integral indicator of ecological impact of the Croatian thermal power plants // Energy, 2011. Vol. 36(7). P. 4144-4149.
Title: Diabetes study
- Link: Diabets_LARS.csv
- Abstract:
- Characteristics: multivariate
- Attributes: see the columns of the data file
- Scales: real
- Problem: regression
- Number of samples: 442
- Number of features: 10
- Source: http://www-stat.stanford.edu/~imj/WEBLIST/2004/LarsAnnStat04.pdf
- Citation: Efron B., Hastie T., Johnstone I., Tibshirani R. Least angle regression // Annals of Statistics, 2004. Vol. 32(2). P. 407-499.
Title: Electric cars comparisons
- Link: ElectricCarsComparisonNamed.csv, ElectricCarsComparison.xls
- Abstract:
- Characteristics: multivariate
- Attributes: see the columns of the data file
- Scales: real, nominal
- Problem: alternatives ranking
- Number of samples: 20
- Number of features: 7
- Source: http://www.electriccarsite.co.uk/uk-electric-car-comparisons
- Citation:
Title: Energy consumption
- Link: EnergyConsumption.xls
- Abstract:
- Characteristics: multivariate, time series
- Attributes: see the columns of the data file
- Scales: real, nominal
- Problem: regression, time series forecasting
- Number of samples: 8760
- Number of features: 4
- Source:
- Citation: Sanduleanu L.N., Strijov V.V. Feature selection in autoregressive forecasting problems // Informatsionnye Tekhnologii (Information Technologies), 2012. No. 7. P. 11-15.
Title: Hungarian Cash Loan
- Link: HungarianCashLoan.csv, HungarianCashLoan.xls, HungarianCashLoanUSD.csv
- Abstract:
- Characteristics: multivariate
- Attributes: see the columns of the data file
- Scales: real, nominal, binary
- Problem: classification (last column is the target variable)
- Number of samples: 6269
- Number of features: 49
- Source:
- Citation:
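Since the last column of this data set is the target variable, a loader can split each row into features and a label. The sketch below works on a small in-memory stand-in; the column names, the values, the presence of a header row, and the numeric coding of all features are assumptions made for illustration, not properties confirmed for HungarianCashLoan.csv.

```python
import csv
import io

# Hypothetical stand-in data; the real file is much wider.
SAMPLE = """f1,f2,f3,target
0.5,1,0,1
1.2,0,1,0
"""


def split_features_target(text, has_header=True):
    """Split CSV text into feature rows X and the last-column target y."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    if has_header:
        rows = rows[1:]
    # Assumption: every feature is numerically coded and the target is an integer.
    X = [[float(value) for value in row[:-1]] for row in rows]
    y = [int(row[-1]) for row in rows]
    return X, y


X, y = split_features_target(SAMPLE)
print(len(X), "objects,", len(X[0]), "features")
```

If the real file stores nominal features as text, the `float` conversion would have to be replaced by an explicit encoding step.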
Title: MMRO 15 contest OTP
- Link: Contest_MMRO15_OTP.xls
- Abstract:
- Characteristics: multivariate
- Attributes: see the columns of the data file
- Scales: real, nominal, binary
- Problem: classification
- Number of samples: 15223
- Number of features: 52
- Source: http://poligon.machinelearning.ru/Contests/Card.aspx?synonim=otp
- Citation:
Title: Hybrid cars comparisons
- Link: HybridCarsComparison.xls
- Abstract:
- Characteristics: multivariate
- Attributes: see the columns of the data file
- Scales: real, nominal
- Problem: alternatives ranking
- Number of samples: 20
- Number of features: 7
- Source: http://www.buzzle.com/articles/hybrid-cars-comparison.html
- Citation:
Title: Iris Data Set
- Link: IrisClassification.txt, IrisFromUCI.csv
- Abstract: This is perhaps the best-known database in the pattern recognition literature. The data set is available from the UCI Machine Learning Repository.
- Characteristics: multivariate
- Attributes: see description in http://archive.ics.uci.edu/ml/datasets/Iris
- Scales: real
- Problem: multiclass classification
- Number of samples: 150
- Number of features: 4
- Source: http://archive.ics.uci.edu/ml/datasets/Iris
- Citation:
Title: Wine Data Set
- Link: ItalianWines3Classes.csv, ItalianWines3Classes_Readme.txt
- Abstract: These data are the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents found in each of the three types of wines.
- Characteristics: multivariate
- Attributes: see description in ItalianWines3Classes_Readme.txt
- Scales: real
- Problem: multiclass classification
- Number of samples: 178
- Number of features: 13
- Source: http://archive.ics.uci.edu/ml/datasets/Wine
- Citation:
Title: Mortgage and auto loan
- Link: MortgageAndAutoLoan.csv, MortgageAndAutoLoan.xls, MortgageAndAutoLoan_Readme.txt
- Abstract:
- Characteristics: multivariate
- Attributes: see description in MortgageAndAutoLoan_Readme.txt
- Scales: real
- Problem: classification
- Number of samples: 1891
- Number of features: 18
- Source:
- Citation:
Title: Sugar prices
- Link: SuigarPriceNYBOT.csv, SuigarPriceNYBOT.xls
- Abstract:
- Characteristics: univariate, time series
- Attributes: time
- Scales: datetime
- Problem: regression, time series forecasting
- Number of samples: 1903
- Number of features: 1
- Source:
- Citation:
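A univariate series such as this one can be cast as a regression problem by using the preceding values as features. The sketch below shows one common framing, a lagged design matrix; it is not claimed to be the method used in the course, and the function name `lagged_design` and the price values are hypothetical.

```python
def lagged_design(series, n_lags):
    """Return (X, y), where each row of X holds the n_lags values preceding y."""
    if n_lags < 1 or n_lags >= len(series):
        raise ValueError("n_lags must be between 1 and len(series) - 1")
    X = [list(series[t - n_lags:t]) for t in range(n_lags, len(series))]
    y = [series[t] for t in range(n_lags, len(series))]
    return X, y


# Hypothetical price values, for illustration only.
prices = [10.0, 10.5, 10.2, 10.8, 11.0]
X, y = lagged_design(prices, n_lags=2)
for features, target in zip(X, y):
    print(features, "->", target)
```

Any regression model can then be fitted to `X` and `y`; the number of lags is a modelling choice left to the reader.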
Title: Turkish electricity consumption
- Link: TurkElectricityConsumption.csv, TurkElectricityConsumption.xls, TurkElectricityConsumption_Readme.txt
- Abstract:
- Characteristics: multivariate, time series
- Attributes: see description in TurkElectricityConsumption_Readme.txt
- Scales: real
- Problem: regression, time series forecasting
- Number of samples: 1903
- Number of features: 7
- Source:
- Citation:
Title: White bread prices
- Link: WhiteBreadPrices.csv, WhiteBreadPricesSep.xls, WhiteBreadPrices_Readme.txt
- Abstract:
- Characteristics: univariate, time series
- Attributes: see description in WhiteBreadPrices_Readme.txt
- Scales: real
- Problem: regression, time series forecasting
- Number of samples: 195
- Number of features: 1
- Source:
- Citation:
Title: Face profile multiclass collection
- Link: FaceProfile directory
- Abstract:
- Characteristics: images
- Attributes: each file in the FaceProfile directory is one object. The class label (A, B, C, D, E, F, or G) is a substring of the file name
- Scales: real
- Problem: classification
- Number of samples: 45
- Number of features:
- Source:
- Citation:
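Because the class label is encoded in the file name, it can be recovered with a pattern match. The exact naming scheme is not documented here, so the sketch below assumes the label is a single capital letter A-G that is not part of a longer word; the pattern, the helper `label_from_filename`, and the example file names are all hypothetical and should be adapted to the actual files.

```python
import re
from collections import Counter
from pathlib import Path

# Assumption: the label is a lone capital letter A-G, not embedded in a word.
LABEL_PATTERN = re.compile(r"(?<![A-Za-z])([A-G])(?![A-Za-z])")


def label_from_filename(path):
    """Return the class label found in the file name, or None if there is none."""
    match = LABEL_PATTERN.search(Path(path).stem)
    return match.group(1) if match else None


# Hypothetical file names, for illustration only.
names = [
    "FaceProfile/A_01.png",
    "FaceProfile/G_03.png",
    "FaceProfile/B_12.png",
    "FaceProfile/A_02.png",
]
labels = [label_from_filename(name) for name in names]
counts = Counter(labels)
print(labels, dict(counts))
```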
Title: European option prices and volatility smile
- Link: OptionDataRS.xls
- Abstract:
- Characteristics: multivariate
- Attributes: see the columns of the data file
- Scales: real
- Problem: regression
- Number of samples: 26
- Number of features: 18
- Source:
- Citation:
Title: Sales dynamics in a supermarket
- Link: SupermarketSales.txt
- Abstract:
- Characteristics: multivariate
- Attributes: columns: ProductID, time, volume, price
- Scales: real
- Problem: regression
- Number of samples: 167289
- Number of features: 4
- Source:
- Citation:
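With the columns ProductID, time, volume, and price, the rows can be grouped into one time series per product before fitting a model. The sketch below assumes comma-separated rows without a header and numeric time values; both are assumptions about SupermarketSales.txt, and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows in the column order ProductID, time, volume, price.
SAMPLE = """101,1,5,2.50
102,1,3,1.20
101,2,7,2.40
"""


def series_by_product(text, delimiter=","):
    """Group rows into {ProductID: [(time, volume, price), ...]} sorted by time."""
    series = defaultdict(list)
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        if not row:
            continue  # skip blank lines
        product_id, time, volume, price = row
        series[product_id].append((float(time), float(volume), float(price)))
    for points in series.values():
        points.sort(key=lambda point: point[0])  # order each series by time
    return dict(series)


sales = series_by_product(SAMPLE)
print(sorted(sales))
```

If the real file uses tabs or semicolons, the `delimiter` argument can be changed accordingly.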
Title: KNN Forecasting
- Link: k-NNForecasting.xls
- Abstract: documentation needed
- Characteristics:
- Attributes:
- Scales:
- Problem:
- Number of samples:
- Number of features:
- Source:
- Citation:
Title: South African heart disease
- Link: SAheart.txt, SAHD.txt, SAheart.info
- Abstract:
- Characteristics: multivariate
- Attributes: see description in SAheart.info
- Scales: real, nominal
- Problem: classification
- Number of samples: 462
- Number of features: 10
- Source: http://www-stat.stanford.edu/~tibs/ElemStatLearn/data.html
- Citation: http://strijov.com/papers/Skipor10Method.pdf
Title: Combustion pressure in diesel engine
- Link: PressureCurves.csv
- Abstract:
- Characteristics: multivariate
- Attributes: see description in http://svn.code.sf.net/p/mlalgorithms/code/Ivkin2010ForecastingO_2/doc/ivkin10forecastingO_2.pdf
- Scales: real, nominal
- Problem: regression
- Number of samples: 122
- Number of features: 7220
- Source:
- Citation:
Title: Income vs Lifespan
- Link: IncomeVsLifespan.xls
- Abstract:
- Characteristics: multivariate
- Attributes: see the columns of the data file
- Scales: real
- Problem: regression
- Number of samples: 193
- Number of features: 7
- Source:
- Citation: