load_japanese_vowels
- load_japanese_vowels(split=None, return_X_y=True, return_type=None)
Load the JapaneseVowels time series classification problem.
Example of a multivariate problem with unequal length series.
- Parameters:
- split: None or one of “TRAIN”, “TEST”, optional (default=None)
Whether to load the train or test instances of the problem. By default it loads both train and test instances (in a single container).
- return_X_y: bool, optional (default=True)
If True, returns (features, target) separately instead of a single dataframe with columns for features and the target.
- return_type: valid Panel mtype str or None, optional (default=None, equivalent to “nested_univ”)
Memory data format specification to return X in; None is equivalent to “nested_univ”. The str can be any supported sktime Panel mtype.
For the list of mtypes, see datatypes.MTYPE_REGISTER; for specifications, see examples/AA_datatypes_and_datasets.ipynb.
- commonly used specifications:
“nested_univ”: nested pd.DataFrame, pd.Series in cells
“numpy3D”/“numpy3d”/“np3D”: 3D np.ndarray (instance, variable, time index)
“numpy2d”/“np2d”/“numpyflat”: 2D np.ndarray (instance, time index)
“pd-multiindex”: pd.DataFrame with 2-level (instance, time) MultiIndex
Exception is raised if the data cannot be stored in the requested type.
- Returns:
- X: pd.DataFrame with m rows and c columns
The time series data for the problem with m cases and c dimensions
- y: numpy array
The class labels for each case in X
Notes
Dimensionality: multivariate, 12
Series length: 7-29
Train cases: 270
Test cases: 370
Number of classes: 9
A UCI Archive dataset. Nine Japanese male speakers were recorded saying the vowels ‘a’ and ‘e’. A ‘12-degree linear prediction analysis’ is applied to the raw recordings to obtain time series with 12 dimensions and series lengths between 7 and 29. The classification task is to predict the speaker. Each instance is therefore a transformed utterance of up to 12*29 values, with a single class label in [1…9] attached.
The given training set comprises 30 utterances for each speaker. The test set has a varied distribution, between 24 and 88 instances per speaker, owing to external factors of timing and experimental availability.
Reference: M. Kudo, J. Toyama and M. Shimbo. (1999). “Multidimensional Curve Classification Using Passing-Through Regions”. Pattern Recognition Letters, Vol. 20, No. 11–13, pages 1103–1111.
Dataset details: http://timeseriesclassification.com/description.php?Dataset=JapaneseVowels
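Because the series lengths vary between 7 and 29, code that expects a fixed 12 by 29 layout has to equalise the lengths first. The following is a minimal sketch of right-padding one instance, written in plain Python and independent of sktime. The function name `pad_instance`, the zero fill value, and the toy data are assumptions made for illustration; they are not part of the loader or of the dataset.

```python
def pad_instance(channels, target_len=29, fill=0.0):
    """Right-pad every channel of one multivariate series to target_len.

    channels: list of per-dimension value lists, all of the same length.
    The fill value of 0.0 is an illustrative choice, not a recommendation.
    """
    padded = []
    for channel in channels:
        if len(channel) > target_len:
            raise ValueError("channel is longer than target_len")
        missing = target_len - len(channel)
        padded.append(list(channel) + [fill] * missing)
    return padded


# Toy instance (not real data): 12 dimensions, 7 time points each.
toy = [[float(t) for t in range(7)] for _ in range(12)]
padded = pad_instance(toy)
print(len(padded), len(padded[0]))  # prints "12 29"
```

Padding changes the data seen by a classifier, so whether to pad, truncate, or use an estimator that handles unequal lengths directly is a modelling decision.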
Examples
>>> from sktime.datasets import load_japanese_vowels
>>> X, y = load_japanese_vowels()