Views: 77
A tidy data approach for digital ledgers
General Ledger journal entry
Consider how to define tidy data for General Ledger journal entry. The following is a journal entry example for purchasing paper with cash.
Date | Line Description | Debit (JPY) |
Credit (JPY) |
Account Number | Account Name | C | N | T | ID |
---|---|---|---|---|---|---|---|---|---|
2009-04-03 |
用紙 |
14,858 |
754 |
事務用消耗品費 |
10 |
札幌 |
部門 |
943-8-477 |
|
2009-04-03 |
742 |
191 |
仮払消費税等 |
10 |
札幌 |
部門 |
943-8-477 |
||
2009-04-03 |
用紙 |
15,600 |
111 |
現金 |
0 |
共通部門 |
部門 |
943-8-477 |
|
Key C: Business Segment Code, N: Business Segment Name, T: Business Segment Type |
Tidy data examples in Wickham
Our vocabulary of rows and columns is simply not rich enough to describe why the two tables represent the same data. In addition to appearance, we need a way to describe the underlying semantics, or meaning, of the values displayed in tables
Tidy Data Journal of Statistical Software August 2014 Volume 59 Issue 10
treatment A | treatment B | |
---|---|---|
John Smith |
– |
2 |
Jane Doe |
16 |
11 |
Mary Johnson |
3 |
1 |
John Smith | Jane Doe | Mary Johnson | |
---|---|---|---|
treatment A |
– |
16 |
3 |
treatment B |
2 |
11 |
1 |
A dataset is a collection of values. Values are organized in two ways. Every value belongs to a variable and an obserbation.
Table 4 reorganizes Table 1 to make the values, variables and observations more clear. The variables are:
1. person, with three possible values (John Smith, Mary Johnson, and Jane Doe)
2. treatment, with two possible values (A and B)
3. result, with five or six values depending on how you think of the missing value (-, 16, 3, 2, 11, 1).
Tidy Data Journal of Statistical Software August 2014 Volume 59 Issue 10
person | treatment | result value |
---|---|---|
John Smith |
A |
– |
Jane Doe |
A |
16 |
Mary Johnson |
A |
3 |
John Smith |
B |
2 |
Jane Doe |
B |
11 |
Mary Johnson |
B |
1 |
In tidy data:
— Each variable forms a column.
— Each observation forms a row.
— Each type of observational unit forms a table.
Tidy Data Journal of Statistical Software August 2014 Volume 59 Issue 10
Semantic structure of General Ledger
Table 5 shows Semantic structure of General Ledger. This hierarhical data structure represeents how variables are organized in digital ledger, which we are going to reorganize to a single tidy table.
ID | H | Element | O | R |
---|---|---|---|---|
A25 |
0 |
General Ledger |
0..n |
Aggregation |
A25-01 |
1 |
•Journal ID |
1..1 |
Identifier |
A25-02 |
1 |
•Journal Number |
0..1 |
Code |
A25-09 |
1 |
•Header Description |
0..1 |
Text |
A25-10 (A25-A26) |
1 |
•General Ledger Line |
1..n |
Aggregation |
A26-03 (A26-A09) |
2 |
••Chart Of Accounts |
1..1 |
Aggregation |
A26-A09-01 |
3 |
•••GL Account Number |
1..1 |
Identifier |
A26-A09-02 |
3 |
•••GL Account Name |
0..1 |
Text |
A26-A09-05 |
3 |
•••Account Type |
0..1 |
Code |
A26-A09-06 |
3 |
•••Account Subtype |
0..1 |
Code |
A25-A26-06 |
2 |
••Effective Date |
0..1 |
Date |
A25-A26-08 |
2 |
••Journal Entry Type Code |
0..1 |
Code |
A25-A26-07 |
2 |
••Line Number |
0..1 |
Code |
A25-A26-10 |
2 |
••Line Description |
0..1 |
Text |
A25-A26-11 |
2 |
••Source Code |
0..1 |
Code |
A25-A26-12 |
2 |
••Bill Number |
0..1 |
Code |
A25-A26-14 |
2 |
••Bill Date |
0..1 |
Date |
A25-A26-27 |
2 |
••Credit Debit Indicator |
0..1 |
Code |
A26-18 (A26-A89) |
2 |
••Multi Currency Amount |
1..n |
Aggregation |
A26-A89-01 |
3 |
•••Amount |
1..1 |
Amount |
A26-A89-02 |
3 |
•••Currency |
1..1 |
Code |
A26-41 (A26-A01) |
2 |
••Business Segment |
1..n |
Aggregation |
A26-A01-03 |
3 |
•••Segment Reference Level |
1..1 |
Code |
A26-A01-01 |
3 |
•••Business Segment Code |
1..1 |
Code |
A26-A01-02 |
3 |
•••Business Segment Name |
1..1 |
Text |
A26-A01-04 |
3 |
•••Business Segment Type |
0..1 |
Text |
Key H: Hierarchy Level, O: Occurence, R: Representation |
Codd’s 3rd normal form tables for General Ledger
Traditional approach for representing above semantic structure is defining many Codd’s 3rd normal form tables and connecting these table with relational data base. Tidy data puts focus on a single data set instead of the many connected datasets.
Table 6, Table 7, Table 8, Table 9, and Table 10 are tables for the relational data base.
ID | H | Element | O | R |
---|---|---|---|---|
A25 |
0 |
General Ledger |
0..n |
Aggregation |
A25-01 |
1 |
•Journal ID |
1..1 |
Identifier |
A25-02 |
1 |
•Journal Number |
0..1 |
Code |
A25-09 |
1 |
•Header Description |
0..1 |
Text |
A25-10 (A25-A26) |
1 |
•General Ledger Line |
1..n |
Aggregation |
Key H: Hierarchy Level, O: Occurence, R: Representation |
ID | H | Element | O | R |
---|---|---|---|---|
A26 |
0 |
General Ledger Line |
0..n |
Aggregation |
A26-03 (A26-A09) |
1 |
•Chart Of Accounts |
1..1 |
Aggregation |
A26-06 |
1 |
•Effective Date |
0..1 |
Date |
A26-08 |
1 |
•Journal Entry Type Code |
0..1 |
Code |
A26-07 |
1 |
•Line Number |
0..1 |
Code |
A26-10 |
1 |
•Line Description |
0..1 |
Text |
A26-11 |
1 |
•Source Code |
0..1 |
Code |
A26-12 |
1 |
•Bill Number |
0..1 |
Code |
A26-14 |
1 |
•Bill Date |
0..1 |
Date |
A26-27 |
1 |
•Credit Debit Indicator |
0..1 |
Code |
A26-18 (A26-A89) |
1 |
•Multi-Currency Amount |
1..1 |
Aggregation |
A26-41 (A26-A01) |
1 |
•Business Segment |
1..n |
Aggregation |
Key H: Hierarchy Level, O: Occurence, R: Representation |
ID | H | Element | O | R |
---|---|---|---|---|
A09 |
0 |
Chart Of Accounts |
0..n |
Aggregation |
A09-01 |
1 |
•GL Account Number |
1..1 |
Identifier |
A09-02 |
1 |
•GL Account Name |
0..1 |
Text |
A09-05 |
1 |
•Account Type |
0..1 |
Code |
A09-06 |
1 |
•Account Subtype |
0..1 |
Code |
Key H: Hierarchy Level, O: Occurence, R: Representation |
ID | H | Element | O | R |
---|---|---|---|---|
A89 |
0 |
Multi-Currency Amount |
0..n |
Aggregation |
A89-01 |
1 |
•Amount |
1..1 |
Amount |
A89-02 |
1 |
•Currency |
1..1 |
Code |
Key H: Hierarchy Level, O: Occurence, R: Representation |
ID | H | Element | O | R |
---|---|---|---|---|
A01 |
0 |
Business Segment |
0..n |
Aggregation |
A01-03 |
1 |
•Segment Reference Level |
1..1 |
Code |
A01-01 |
1 |
•Business Segment Code |
1..1 |
Code |
A01-02 |
1 |
•Business Segment Name |
1..1 |
Text |
A01-04 |
1 |
•Business Segment Type |
0..1 |
Text |
Key H: Hierarchy Level, O: Occurence, R: Representation |
— Column headers are values, not variable names.
— Multiple variables are stored in one column.
— Variables are stored in both rows and columns.
— Multiple types of observational units are stored in the same table.
— A single observational unit is stored in multiple tables.
Tidy Data Journal of Statistical Software August 2014 Volume 59 Issue 10
Tidy data for General Ledger
The first step is detecting variable which is included aggregation table within a table. These variable requires are used for reporting the value if its occurrence is specified as multiple. A General Ledger table includes variables ,which are multiple occurrence aggregation tables, namely General Ledger Line, Multi Currency Amount, and Business Segment.
In our example we know that General Ledger is recorded in a single currency, Local Currency is Reporting Currency. We don’t need variable for multi-currency amount.
There are following three variables .
— d_A25 General Ledger
— d_A26 General Ledger Line
— d_A01 Business Segment
Following table is defined using these three variables and variable for concept as a column header and value column.
d_A26 column corresponding to the three entry lines.
d_A25 | d_A26 | d_A01 | ID | Element | value |
---|---|---|---|---|---|
943-8-477 |
A25-01 |
Journal ID |
943-8-477 |
||
943-8-477 |
A25-A26-08 |
JE Type Code |
entries |
||
943-8-477 |
A25-A26-07 |
Journal Line Number |
8-477 |
||
943-8-477 |
A25-A26-14 |
Bill Date |
2009-04-03 |
||
943-8-477 |
A25-A26-12 |
Bill Number |
manual |
||
943-8-477 |
A25-A26-11 |
Source Code |
gj |
||
943-8-477 |
A25-09 |
Header Description |
General |
||
943-8-477 |
A25-02 |
Journal Number |
134 |
||
943-8-477 |
1 |
A26-A09-01 |
GL Account Number |
754 |
|
943-8-477 |
1 |
A26-A09-02 |
GL Account Name |
事務用消耗品費 |
|
943-8-477 |
1 |
A26-A09-05 |
Account Type |
account |
|
943-8-477 |
1 |
A25-A26-27 |
Credit Debit Indicator |
debit |
|
943-8-477 |
1 |
A26-A89-01 |
Amount |
14858 |
|
943-8-477 |
1 |
A26-A89-02 |
Currency |
ISO4217:JPY |
|
943-8-477 |
1 |
A25-A26-06 |
Effective Date |
2009-04-03 |
|
943-8-477 |
1 |
A26-A09-06 |
Account Subtype |
proposed |
|
943-8-477 |
1 |
A25-A26-10 |
Line Description |
用紙 |
|
943-8-477 |
1 |
1 |
A26-A01-03 |
Segment Reference Level |
account sub |
943-8-477 |
1 |
1 |
A26-A01-01 |
Business Segment Code |
10 |
943-8-477 |
1 |
1 |
A26-A01-02 |
Business Segment Name |
札幌 |
943-8-477 |
1 |
1 |
A26-A01-04 |
Business Segment Type |
部門 |
943-8-477 |
2 |
A26-A09-01 |
GL Account Number |
191 |
|
943-8-477 |
2 |
A26-A09-02 |
GL Account Name |
仮払消費税等 |
|
943-8-477 |
2 |
A26-A09-05 |
Account Type |
account |
|
943-8-477 |
2 |
A25-A26-27 |
Credit Debit Indicator |
debit |
|
943-8-477 |
2 |
A26-A89-01 |
Amount |
742 |
|
943-8-477 |
2 |
A26-A89-02 |
Currency |
ISO4217:JPY |
|
943-8-477 |
2 |
A25-A26-06 |
Effective Date |
2009-04-03 |
|
943-8-477 |
2 |
A26-A09-06 |
Account Subtype |
tax |
|
943-8-477 |
2 |
1 |
A26-A01-03 |
Segment Reference Level |
account sub |
943-8-477 |
2 |
1 |
A26-A01-01 |
Business Segment Code |
10 |
943-8-477 |
2 |
1 |
A26-A01-02 |
Business Segment Name |
札幌 |
943-8-477 |
2 |
1 |
A26-A01-04 |
Business Segment Type |
部門 |
943-8-477 |
3 |
A26-A09-01 |
GL Account Number |
111 |
|
943-8-477 |
3 |
A26-A09-02 |
GL Account Name |
現金 Cash |
|
943-8-477 |
3 |
A26-A09-05 |
Account Type |
account |
|
943-8-477 |
3 |
A25-A26-27 |
Credit Debit Indicator |
credit |
|
943-8-477 |
3 |
A26-A89-01 |
Amount |
15600 |
|
943-8-477 |
3 |
A26-A89-02 |
Currency |
ISO4217:JPY |
|
943-8-477 |
3 |
A25-A26-06 |
Effective Date |
2009-04-03 |
|
943-8-477 |
3 |
A26-A09-06 |
Account Subtype |
proposed |
|
943-8-477 |
3 |
A25-A26-10 |
Line Description |
用紙 |
|
943-8-477 |
3 |
1 |
A26-A01-03 |
Segment Reference Level |
account sub |
943-8-477 |
3 |
1 |
A26-A01-01 |
Business Segment Code |
0 |
943-8-477 |
3 |
1 |
A26-A01-02 |
Business Segment Name |
共通部門 |
943-8-477 |
3 |
1 |
A26-A01-04 |
Business Segment Type |
部門 |
Table 11 is mostly tidy, but observed value contains value of different data type. This is the result of the column Element containd different representation. Note that Tidy data doesn’t limit the number of columns for measured variables. Providing additional columns for each element’s variable results Table 12.
d_A25 General Ledger |
d_A26 General Ledger Line |
d_A01 Business Segment |
A25-01 Journal ID |
A25-A26-08 Type Code |
A25-A26-07 Journal Line Number |
A25-A26-14 Bill Date |
A25-A26-12 Bill Number |
A25-A26-11 Source Code |
A25-09 Header Description |
A25-02 Journal Number |
A26-A09-01 GL Account Number |
A26-A09-02 GL Account Name |
A26-A09-05 Account Type |
A25-A26-27 Credit Debit Indicator |
A26-A89-01 Amount |
A26-A89-02 Currency |
A25-A26-06 Effective Date |
A26-A09-06 Account Subtype |
A25-A26-10 Line Description |
A26-A01-03 Segment Reference Level |
A26-A01-01 Business Segment Code |
A26-A01-02 Business Segment Name |
A26-A01-04 Business Segment Type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
943-8-477 |
943-8-477 |
entries |
8-477 |
2009-04-03 |
manual |
gj |
General |
134 |
|||||||||||||||
943-8-477 |
1 |
754 |
事務用消耗品費 |
account |
debit |
14858 |
JPY |
2009-04-03 |
proposed |
用紙 |
|||||||||||||
943-8-477 |
1 |
1 |
account sub |
10 |
札幌 |
部門 |
|||||||||||||||||
943-8-477 |
2 |
191 |
仮払消費税等 |
account |
debit |
742 |
JPY |
2009-04-03 |
tax |
||||||||||||||
943-8-477 |
2 |
1 |
account sub |
10 |
札幌 |
部門 |
|||||||||||||||||
943-8-477 |
3 |
111 |
現金 |
account |
credit |
15600 |
JPY |
2009-04-03 |
proposed |
用紙 |
|||||||||||||
943-8-477 |
3 |
1 |
account sub |
0 |
共通部門 |
部門 |
xBRL taxonomy and xBRL-CSV
xBRL taxonomy
On the left side of Figure 1, there is a piece of primitive numeric data: “1234.” On its own, that data point only relays information to a reader if that recipient is already familiar with its context and purpose. For the purposes of this discussion, that data point can be referred to as a fact. In layers, meaning can be added to the fact by defining precision, scaling, units, and other similar information. Contextual meaning can also be supplied through another layer containing information such as time and location.
Finally, the fact can be identified with a concept. Depending on the semantic component of the standard being used to represent the data, this conceptual information could be quite detailed and include a wealth of additional meaning. As mentioned previously, structured data standards offer defined systems to represent data in this way, by adding layers of meaningful context that provide useful, pre-determined ways to interpret that data. XBRL is one such standard.
XBRL Taxonomy Development Handbook A Guide for XBRL Taxonomy

In XBRL, a fact is the unique intersection of a set of XBRL dimensions with a data point. Figure 2 illustrates the basic structure of a fact. Arbitrary information, such as a number or a name or even a short section of text, has no semantic or contextual information in and of itself. Once XBRL dimensions, which add semantic information, intersect with that data point, it now becomes an XBRL fact.
XBRL Taxonomy Development Handbook A Guide for XBRL Taxonomy

Table 13 shows the relationship between Tidy data terms and xBRL terms.
.Relation among Tidy data and xBRL
Tidy data | xBRL |
---|---|
value |
fact |
variable |
dimension |
observation |
data point |
The concepts would be the line items in Figure 3 (such as Food Expenses, Rent Expenses, and Utilities Expenses). The periods would be represented by the columns (January, February, and so forth). Again, each place where a concept dimension and period dimension intersect is a fact (in this case, a monetary amount located in the cell), and the combination of the contextual information provided by the concept and other dimensions (the period) create the XBRL dimensions of that fact.
XBRL Taxonomy Development Handbook A Guide for XBRL Taxonomy

xBRL-CSV
xBRL-CSV[_2] supports this shape of Tidy data.
The data I tried this time was converted from PCA Accounting 9V.2 journal entry data to XBRL-GL in 2003 (2829records 31MB).
Example xbrlgl/instances/0001-20091231-145-2109-1-4679.xml
The article page for XBRL-GL is https://www.sambuichi.jp/?p=345.
xBRL-CSV data adc-instances2-facts.csv (18MB)
xBRL-CSV meta data adc-instances2-metadata.json (1.1KB)
xBRL-CSV data adc-instances.csv (2.4MB) 1/12 or more reduction
xBRL-CSV meta data adc-instances.json (4.6KB)
Schema core.xsd
Definition Linkbase core-def.xml
Presentation Linkbase core-pre.xml
Label Linkbase core-label-en.xml
Bibliography
[1] Tidy Data, Journal of Statistical Software, August 2014, Volume 59, Issue 10, Headley Wickham
https://www.jstatsoft.org/article/view/v059i10
[2] xBRL-CSV: CSV representation of XBRL data 1.0, Recommendation 13 October 2021
https://www.xbrl.org/Specification/xbrl-csv/REC-2021-10-13/xbrl-csv-REC-2021-10-13.html
[3] XBRL Taxonomy Development Handbook A Guide for XBRL Taxonomy, Developers First Edition (July 2020), XBRL US
https://xbrlus.github.io/docs/tdh.html#a_Toc45794887