.ts File Format v1.0#
Overview#
This document has the following content:
- Introduction: What a .ts file is, and when and why to use it.
- Description: What the individual components of a .ts file are.
- Instructions: How to create your own .ts file.
- Illustrations: A running example that ties the above sections together.
Version History#
v1.0 - 2022-10-08 - author: Sagar Mishra
Introduction#
This document formalizes the string identifiers used in the .ts file format.
Encoded in UTF-8, .ts files store a time-series dataset and its corresponding
metadata (specified via string identifiers) and can be opened in any basic text editor, such as Notepad, for visual inspection.
String identifiers are strings in the file that begin with @.
.ts files contain information blocks in the following order:
- A description block.
It contains any number of contiguous lines starting with #. Each # is followed by an arbitrary (UTF-8) sequence of symbols. The .ts specification does not prescribe any content for the description block, but it is common to include a description of the dataset contained in the file, e.g. a full data dictionary, citations, etc. See the subsection on the description block for more details.
- A metadata block.
It contains contiguous lines starting with @. Each @ is directly followed by a string identifier without whitespace (@<identifier>), followed by an appropriate value for the identifier, where the value depends on the type of identifier. There is no strict order of occurrence for the string identifiers, except @data, which must be at the end of this block. The number of lines in this block depends on certain properties of the dataset (e.g. if the dataset is multidimensional, an additional line is required to specify the number of dimensions). See the subsection on the metadata block for further details.
- A dataset block.
It contains a list of float values that represent the dataset. In the simplest case (when timestamps are absent), the values of a series are expressed as a comma-separated list, and the index of each value is given by its position in the list (0, 1, …, m). An instance may contain one or more dimensions, where instances are line-delimited and dimensions within an instance are colon-delimited (:). If timestamps are present, each data point of the series is enclosed in round brackets as (YYYY-MM-DD HH:mm:ss,<value>). The response variable is at the end of each instance and is separated by a colon. To understand data representation, visit loading data.
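The layout of a data line without timestamps can be sketched in a few lines of Python. This is a minimal illustration, not the sktime loader: `parse_instance` is a hypothetical helper, and it assumes that timestamps are absent, that a response variable is present, and that every value parses as a float.

```python
def parse_instance(line: str):
    """Split one data line (no timestamps) into dimensions and a response."""
    # Colons delimit dimensions; the last colon-separated field is the response.
    *dimension_parts, response = line.strip().split(":")
    # Within a dimension, values are comma-separated and indexed by position.
    dimensions = [
        [float(value) for value in part.split(",")]
        for part in dimension_parts
    ]
    return dimensions, response


dims, label = parse_instance("1.0,2.5,3.0:4.0,5.5,6.0:Walking")
print(dims)   # [[1.0, 2.5, 3.0], [4.0, 5.5, 6.0]]
print(label)  # Walking
```

Splitting on every colon only works because there are no timestamps here; a `HH:mm:ss` time of day contains colons of its own.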
Here is an extract from Basic Motion.ts that shows all three blocks:
#The data was generated as part of a student project where four students performed four activities whilst wearing a smart watch.
#The watch collects 3D accelerometer and a 3D gyroscope It consists of four classes, which are walking, resting, running and
#badminton.
...
@problemName BasicMotions
@timeStamps false
@missing false
...
@data
-0.740653,-0.740653,10.208449,2.867009,-0.194301,-0.194301,-0.249618,0.516079,-0.255552:Standing
-0.247409,-0.247409,-0.77129,-0.576154,-0.368484,-0.020851,-0.020851,-0.465607,-0.382975,-0.382975:Walking
...
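To make the three-block order concrete, the following sketch separates the text of a .ts file into its description, metadata, and data lines. It is only an illustration under simple assumptions: `split_blocks` is a hypothetical helper rather than part of sktime, and the toy content is made up for the example.

```python
def split_blocks(text: str):
    """Return (description_lines, metadata_lines, data_lines) of .ts text."""
    description, metadata, data = [], [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # tolerate stray blank lines
        if in_data:
            data.append(line)  # everything after @data is data
        elif line.startswith("#"):
            description.append(line[1:].strip())
        elif line.startswith("@"):
            metadata.append(line)
            if line.lower() == "@data":
                in_data = True
        # Any other line before @data is ignored in this sketch.
    return description, metadata, data


sample = """#A toy dataset for illustration.
@problemName Toy
@timeStamps false
@data
1.0,2.0,3.0:Walking
4.0,5.0,6.0:Standing
"""
description, metadata, data = split_blocks(sample)
print(description)  # ['A toy dataset for illustration.']
print(metadata)     # ['@problemName Toy', '@timeStamps false', '@data']
print(len(data))    # 2
```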
Description#
This section describes the components of a .ts file.
Description Block#
This is an optional block that provides context for the dataset. All lines in it are ignored by the sktime loader functions. We recommend adding information that gives context about the dataset, such as
how the dataset was collected, the type of license associated with it, citations, etc.
Metadata Block#
A metadata block consists of various string identifiers that hold metadata for the dataset.
sktime’s core loader/writer functions rely on their existence to correctly load data into memory.
They also help convey information about the dataset to a user who is not familiar with it.
The format of an individual string identifier is @<identifier> [value],
except for @data, where there is no trailing information.
Information that is included in the metadata:
- The name of the dataset
- Whether it includes timestamps
- Whether it includes missing values
- Whether it contains only one dimension
- The number of dimensions, in case of a multivariate problem
- Whether all instances have the same length
- Labels for the class
String identifiers must be written at the start of a line, and each identifier must appear on its own line.
Note
Since these datasets often come from different sources (see tsregression and timeseriesclassification.com),
there may be minor conflicts in their naming conventions (lowercase vs. camelCase).
sktime internally takes care of such inconsistencies.
For this document, we will only use lowercase to represent the identifier.
However, if you run into an inconsistency that isn’t already taken care of, kindly consider opening an issue.
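As a rough sketch of the @<identifier> [value] format and of the naming note above, the snippet below splits a metadata line into an identifier and its values. `parse_metadata_line` is a hypothetical helper, and lowercasing the identifier is just one possible way to smooth over lowercase vs. camelCase spellings; it is not a description of how sktime does it internally.

```python
def parse_metadata_line(line: str):
    """Split '@<identifier> [value ...]' into (identifier, list_of_values)."""
    if not line.startswith("@"):
        raise ValueError("a metadata line must begin with '@'")
    # The identifier follows '@' directly; values are whitespace-separated.
    identifier, *values = line[1:].split()
    # Assumption: treat identifiers case-insensitively by lowercasing them.
    return identifier.lower(), values


print(parse_metadata_line("@problemName BasicMotions"))
# ('problemname', ['BasicMotions'])
print(parse_metadata_line("@classlabel true good bad neutral"))
# ('classlabel', ['true', 'good', 'bad', 'neutral'])
print(parse_metadata_line("@data"))
# ('data', [])
```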
Here is a short description of every column found in the table:
- Identifier: The name of the identifier, preceded by @ without any spaces.
- Description: The purpose of the identifier.
- Value: All possible values that the identifier can take.
- Additional Comments: A few peculiarities to remember when writing this identifier.
- Example: An illustrative value of the given identifier.
Identifier | Description | Value | Additional Comments | Example |
---|---|---|---|---|
@problemname | The name of the dataset. | any | Value cannot be space separated | |
@timestamps | Whether timestamps are present. | true, false | | |
@missing | Whether there are missing values. | true, false | | |
@univariate | Whether there is only one dimension for the time series. | true, false | | |
@dimension | The number of variables. | integer > 0 | Only present when @univariate is false | 6 |
@equallength | Whether each instance has equal length. | true, false | | |
@serieslength | Number of timestamps in each instance. | integer > 0 | Only present if @equallength is true | 100 |
@targetlabel | Whether there is a target label. | true, false | Exclusive to regression data | |
@classlabel | Whether class labels are present. | true, false | Exclusive to classification data; when true, the class labels may follow, separated by spaces | |
@data | Marks the beginning of data. | - | The data begins from the next line. | - |
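The dependencies between identifiers can be checked mechanically. The sketch below assumes, as the instructions in this document describe, that @dimension accompanies @univariate false and that @serieslength accompanies @equallength true. `check_metadata` is a hypothetical helper that works on a plain dictionary of lowercase identifier names and string values; it is not an sktime function.

```python
def _is_positive_int(text: str) -> bool:
    """True if text parses as an integer greater than zero."""
    try:
        return int(text) > 0
    except ValueError:
        return False


def check_metadata(meta: dict) -> list:
    """Return a list of problems found in an {identifier: value} mapping."""
    problems = []
    if meta.get("univariate") == "false" and "dimension" not in meta:
        problems.append("@dimension is required when @univariate is false")
    if meta.get("univariate") == "true" and "dimension" in meta:
        problems.append("@dimension should be absent when @univariate is true")
    if meta.get("equallength") == "true" and "serieslength" not in meta:
        problems.append("@serieslength is required when @equallength is true")
    if meta.get("equallength") == "false" and "serieslength" in meta:
        problems.append("@serieslength should be absent when @equallength is false")
    for name in ("dimension", "serieslength"):
        if name in meta and not _is_positive_int(meta[name]):
            problems.append(f"@{name} must be an integer > 0")
    return problems


print(check_metadata({"univariate": "false", "dimension": "3",
                      "equallength": "true", "serieslength": "5"}))
# []
print(check_metadata({"univariate": "false", "equallength": "true",
                      "serieslength": "5"}))
# ['@dimension is required when @univariate is false']
```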
Instructions#
This section provides a full set of instructions for creating a .ts file for your dataset that is compatible with sktime.
It assumes that you already have the dataset available in the expected format.
A few points to keep in mind while creating the file:
- The general order of identifiers does not matter, with the exception that @data should be the last string identifier.
- One line should contain only one identifier-value pair.
- Lines containing an identifier must begin with it.
- The only place a space is allowed is between an identifier and its corresponding value.
- Avoid blank lines in between lines.
- Follow the “comments, identifiers, data” order.
- Create an empty file
Open your favorite text editor (even Notepad works). We’ll add contents to this file before finally saving it as a .ts file.
- Write a descriptive comment
The first few lines of the file should ideally describe the dataset. This is optional but gives context about the dataset. A comment line begins with #.
- Add the metadata that is common to both classification and regression data
  - Add the problem name, e.g.: @problemName Example
  - Add whether the dataset has missing values, e.g.: @missing false
  - Add whether the dataset has timestamps, e.g.: @timestamps true
  - Add whether the dataset has only one dimension, e.g.: @univariate false
  - Since @univariate is false here, add the number of dimensions (skip this otherwise), e.g.: @dimension 3
  - Add whether all instances have equal length, e.g.: @equallength true
  - If the above is true, add the length of an instance (skip this otherwise), e.g.: @serieslength 5
- Add the identifier for the response variable, depending on whether your dataset is:
  - Regression-based: add the @targetlabel identifier followed by true if the response variable exists, otherwise false.
  - Classification-based: add the @classlabel identifier. If there is no response variable, its value is false. If it is true, you can optionally provide the class labels, separated by spaces:
    - e.g. three string labels: @classlabel true good bad neutral
    - e.g. two integer labels: @classlabel true 0 1
- Add the identifier @data, followed by the values starting on the next line.
- Finally, save the file as <CHOSEN_NAME>.ts. The encoding should be UTF-8.
Tip
File still showing as <CHOSEN_NAME>.ts.txt? Rename it to <CHOSEN_NAME>.txt, then open your terminal in that directory and run mv <CHOSEN_NAME>.txt <CHOSEN_NAME>.ts.
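The same steps can also be scripted instead of typed into an editor. The sketch below assembles the file content in the "comments, identifiers, data" order and writes it as UTF-8. `build_ts_text` is a hypothetical helper, not an sktime writer function, and the toy classification data is invented for the example.

```python
import tempfile
from pathlib import Path


def build_ts_text(description, metadata, instances):
    """Assemble .ts content in 'comments, identifiers, data' order."""
    lines = [f"# {text}" for text in description]
    # One identifier-value pair per line, with a single space in between.
    lines += [f"@{name} {value}" for name, value in metadata.items()]
    lines.append("@data")  # always the last identifier
    lines += instances     # one instance per line
    return "\n".join(lines) + "\n"


text = build_ts_text(
    description=["Toy univariate classification data."],
    metadata={
        "problemName": "Example",
        "missing": "false",
        "timestamps": "false",
        "univariate": "true",
        "equallength": "true",
        "serieslength": "3",
        "classlabel": "true good bad",
    },
    instances=["1.0,2.0,3.0:good", "4.0,5.0,6.0:bad"],
)

# A temporary directory keeps the example free of side effects.
with tempfile.TemporaryDirectory() as folder:
    target = Path(folder) / "example.ts"
    target.write_text(text, encoding="utf-8")
    print(target.read_text(encoding="utf-8"))
```

Writing the file from code gives it the .ts extension directly, so the renaming step from the tip is not needed.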
Illustration#
Here, we provide a running example showing what your file will look like after performing each step in the instructions.
The sample dataset that we will use for this is shown below (a single instance of multidimensional regression data, with timestamps):
(2004-08-10 18:00:00,1130.0),(2004-08-10 19:00:00,1217.75),(2004-08-10 20:00:00,1134.75),(2004-08-10 21:00:00,1155.5),
(2004-08-10 22:00:00,1151.0):(2004-08-10 18:00:00,1144.24),(2004-08-11 19:00:00,1111.25),(2004-08-11 20:00:00,1065.75),
(2004-08-11 21:00:00,992.5),(2004-08-11 22:00:00,905.76):(2004-08-11 18:00:00,903.35),(2004-08-11 19:00:00,941.0),
(2004-08-11 20:00:00,1073.6666666667),(2004-08-11 21:00:00,1113.5),(2004-08-11 22:00:00,1100.6):3.2
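When timestamps are present, the time of day itself contains colons, so a plain split on ":" no longer separates the dimensions. One possible way to read such an instance is sketched below, on a shortened version of the sample with two dimensions. `parse_timestamped_instance` is a hypothetical helper, and the sketch assumes a regression response (a float) at the end of the line and the whole instance on a single line.

```python
import re
from datetime import datetime

# Matches one "(timestamp,value)" pair.
PAIR = re.compile(r"\(([^,()]+),([^()]+)\)")


def parse_timestamped_instance(line: str):
    """Split a timestamped data line into dimensions and a float response."""
    # A colon directly after ')' separates dimensions (and the response);
    # colons inside 'HH:mm:ss' are never preceded by ')'.
    *dimension_parts, response = re.split(r"(?<=\)):", line.strip())
    dimensions = [
        [
            (datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"), float(value))
            for stamp, value in PAIR.findall(part)
        ]
        for part in dimension_parts
    ]
    return dimensions, float(response)


line = (
    "(2004-08-10 18:00:00,1130.0),(2004-08-10 19:00:00,1217.75):"
    "(2004-08-11 18:00:00,903.35),(2004-08-11 19:00:00,941.0):3.2"
)
dims, target = parse_timestamped_instance(line)
print(len(dims))   # 2
print(dims[0][0])  # (datetime.datetime(2004, 8, 10, 18, 0), 1130.0)
print(target)      # 3.2
```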
Let’s add some comments to give some context about the dataset:
# The following dataset is generated using sensor S in the apparatus A as shown in the following
# link: https://example.com/. We receive three individual variables, collected within the time duration of 4 hours.
# There are no missing values in the dataset and timestamps are also included.
# For more information about how data was collected, visit the above mentioned link.
Now, let’s add the metadata that is common to both classification and regression datasets:
# The following dataset is generated using sensor S in the apparatus A as shown in the following
# link: https://example.com/. We receive three individual variables, collected within the time duration of 4 hours.
# There are no missing values in the dataset and timestamps are also included.
# For more information about how data was collected, visit the above mentioned link.
@problemName Example
@missing false
@timestamps true
@univariate false
@dimension 3
@equallength true
@serieslength 5
Since we have a regression dataset, let’s add @targetlabel as true:
# The following dataset is generated using sensor S in the apparatus A as shown in the following
# link: https://example.com/. We receive three individual variables, collected within the time duration of 4 hours.
# There are no missing values in the dataset and timestamps are also included.
# For more information about how data was collected, visit the above mentioned link.
@problemName Example
@missing false
@timestamps true
@univariate false
@dimension 3
@equallength true
@serieslength 5
@targetlabel true
Finally, let’s mark the beginning of the dataset by adding @data, followed by the data starting on the next line.
# The following dataset is generated using sensor S in the apparatus A as shown in the following
# link: https://example.com/. We receive three individual variables, collected within the time duration of 4 hours.
# There are no missing values in the dataset and timestamps are also included.
# For more information about how data was collected, visit the above mentioned link.
@problemName Example
@missing false
@timestamps true
@univariate false
@dimension 3
@equallength true
@serieslength 5
@targetlabel true
@data
(2004-08-10 18:00:00,1130.0),(2004-08-10 19:00:00,1217.75),(2004-08-10 20:00:00,1134.75),(2004-08-10 21:00:00,1155.5),
(2004-08-10 22:00:00,1151.0):(2004-08-10 18:00:00,1144.24),(2004-08-11 19:00:00,1111.25),(2004-08-11 20:00:00,1065.75),
(2004-08-11 21:00:00,992.5),(2004-08-11 22:00:00,905.76):(2004-08-11 18:00:00,903.35),(2004-08-11 19:00:00,941.0),
(2004-08-11 20:00:00,1073.6666666667),(2004-08-11 21:00:00,1113.5),(2004-08-11 22:00:00,1100.6):3.2
After saving it as sample.ts, the file is ready to be loaded via sktime.
This concludes the guide to writing string identifiers for the .ts format. To learn more about sktime, visit the tutorials page.