XBRL GL Palette Taxonomy Parser
2025-04-02
This article introduces a Python-based parser to extract a logical hierarchical model (LHM) structure from the XBRL GL taxonomy. The parser also retrieves multilingual labels and documentation from the label linkbase. The output is a structured CSV file useful for semantic analysis, implementation, and documentation.
1. Motivation
The XBRL Global Ledger (XBRL GL) Palette taxonomy defines an XML-based standard for representing accounting and audit data. However, its hierarchical structure—especially when modularised—can be difficult to navigate, particularly when multilingual labels are defined using labelArc.
This script provides a bridge between raw schema definitions and a friendly CSV format enriched with English and localised labels (e.g., Japanese).
2. What This Script Does
- Loads all gl-*.xsd and gl-*-content.xsd schemas
- Detects complexType definitions with anyType base as tuples
- Extracts all element names, types, and cardinality
- Extracts labels from label.xml and label-ja.xml via labelArc
- Supports fallback resolution of label identifiers
- Outputs a fully annotated CSV representing the logical structure defined by complexType and complexContent/xs:sequence declarations in the schema
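The label step above can be pictured as a three-way join: each locator points at a schema element, each resource carries the text, and each labelArc connects one to the other. The following is a minimal sketch of that idea, not the script itself. It uses the standard-library xml.etree.ElementTree instead of lxml so it runs without extra installs, and the linkbase fragment, the locator and resource names (loc_1, res_1), and the function name resolve_labels are all made up for illustration.

```python
import xml.etree.ElementTree as ET

LINK = "http://www.xbrl.org/2003/linkbase"
XLINK = "http://www.w3.org/1999/xlink"

# Hypothetical, heavily trimmed linkbase fragment for illustration only.
SAMPLE = f"""
<link:linkbase xmlns:link="{LINK}" xmlns:xlink="{XLINK}">
  <link:labelLink xlink:type="extended">
    <link:loc xlink:type="locator" xlink:label="loc_1"
              xlink:href="gl-cor-2016-12-01.xsd#gl-cor_documentInfo"/>
    <link:label xlink:type="resource" xlink:label="res_1"
                xlink:role="http://www.xbrl.org/2003/role/label">Document Information</link:label>
    <link:labelArc xlink:type="arc" xlink:from="loc_1" xlink:to="res_1"/>
  </link:labelLink>
</link:linkbase>
"""


def resolve_labels(xml_text):
    """Return {schema anchor: label text} by following loc -> labelArc -> label."""
    root = ET.fromstring(xml_text)
    # Locator label -> fragment after '#' in the href (the element id in the schema).
    anchors = {}
    for loc in root.iter(f"{{{LINK}}}loc"):
        href = loc.get(f"{{{XLINK}}}href", "")
        if "#" in href:
            anchors[loc.get(f"{{{XLINK}}}label")] = href.split("#", 1)[1]
    # Resource label -> text content.
    texts = {}
    for res in root.iter(f"{{{LINK}}}label"):
        texts[res.get(f"{{{XLINK}}}label")] = (res.text or "").strip()
    # Each arc joins one locator to one resource.
    resolved = {}
    for arc in root.iter(f"{{{LINK}}}labelArc"):
        anchor = anchors.get(arc.get(f"{{{XLINK}}}from"))
        text = texts.get(arc.get(f"{{{XLINK}}}to"))
        if anchor is not None and text is not None:
            resolved[anchor] = text
    return resolved


labels = resolve_labels(SAMPLE)
print(labels)
```

The anchor gl-cor_documentInfo has the same shape as the key the script builds from an element QName by replacing the colon with an underscore, which is why the join can be done on plain strings.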
3. Requirements
- Python 3.7 or later
- lxml library: pip install lxml
4. Usage Instructions
4.1. Command-Line Execution
python xbrl_gl_palette_parser.py --base-dir XBRL-GL-PWD-2016-12-01
4.2. Optional Parameters
Argument | Description |
---|---|
--base-dir | (Required) Path to the root directory of your XBRL GL taxonomy |
--palette | Subdirectory name of the palette folder (default: case-c-b-m-u-e-t-s) |
--lang | Language code for labels (default: ja) |
--debug | Enable detailed debug logging |
--trace | Enable top-level trace output |
--output | Output CSV filename (default: XBRL_GL_Parsed_LHM_Structure.csv) |
4.3. Example (in launch.json for VSCode)
"args": [
"--base-dir", "XBRL-GL-PWD-2016-12-01",
"--palette", "case-c-b",
"--lang", "ja",
"--debug",
"--trace",
"--output", "XBRL_GL_case-c-b_Structure.csv"
]
5. Input Directory Structure
Your XBRL GL taxonomy should be structured like this:
XBRL-GL-PWD-2016-12-01/
├── gl/
│   ├── cor/
│   │   ├── gl-cor-2016-12-01.xsd
│   │   └── lang/
│   │       ├── gl-cor-2016-12-01-label.xml
│   │       └── gl-cor-2016-12-01-label-ja.xml
│   ├── bus/
│   ├── muc/
│   └── ...
├── gl/plt/case-c-b/
│   ├── gl-cor-content-2016-12-01.xsd
│   └── ...
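Before running the parser it can help to confirm that the files it expects are where the layout above puts them. The sketch below builds the four paths for one module and reports which are missing. The function names taxonomy_paths and missing_files and the VERSION constant are hypothetical helpers introduced here, not part of the script; the path patterns mirror the ones used in the source listing.

```python
import os

VERSION = "2016-12-01"  # date suffix used in the file names shown above


def taxonomy_paths(base_dir, palette, mod, lang="ja"):
    """Build the file paths the parser would look for, for one module."""
    stem = f"gl-{mod}-{VERSION}"
    return {
        "schema": os.path.join(base_dir, "gl", mod, f"{stem}.xsd"),
        "content": os.path.join(base_dir, "gl", "plt", palette,
                                f"gl-{mod}-content-{VERSION}.xsd"),
        "label_en": os.path.join(base_dir, "gl", mod, "lang", f"{stem}-label.xml"),
        "label_local": os.path.join(base_dir, "gl", mod, "lang",
                                    f"{stem}-label-{lang}.xml"),
    }


def missing_files(paths):
    """Return the keys whose files do not exist on disk."""
    return sorted(key for key, path in paths.items() if not os.path.exists(path))


paths = taxonomy_paths("XBRL-GL-PWD-2016-12-01", "case-c-b", "cor")
for key, path in paths.items():
    print(f"{key:12} {path}")
print("missing:", missing_files(paths))
```

The script skips files that do not exist without reporting them, so a check like this is one way to tell an empty module from a mistyped --base-dir or --palette.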
6. Output
The script generates a CSV file:
Level,Element,Type,Path,isTuple,minOccurs,maxOccurs,BaseType,Label,Documentation,LocalLabel,LocalDocumentation
1,accountingEntries,gl-cor:accountingEntriesComplexType,/gl-cor:accountingEntries,True,1,unbounded,,Accounting Entries,Root for XBRL GL. No entry made here.,【会計仕訳】,XBRL GLのルート要素。 この要素にはデータは登録されない。
2,gl-cor:documentInfo,gl-cor:documentInfoComplexType,/gl-cor:accountingEntries/gl-cor:documentInfo,True,1,1,,Document Information,Parent for descriptive information about the accountingEntries section in which it is contained.,【文書情報】,この会計仕訳に関する情報の親タグ。
3,gl-cor:entriesType,gl-gen:entriesTypeItemType,/gl-cor:accountingEntries/gl-cor:documentInfo/gl-cor:entriesType,False,1,1,xbrli:tokenItemType,Document Type,"account: information to fill in a chart of accounts file.
balance: the results of accumulation of a complete and validated list of entries for an account (or a list of account) in a specific period - sometimes called general ledger
entries: a list of individual accounting entries, which might be posted/validated or nonposted/validated
journal: a self-balancing (Dr = Cr) list of entries for a specific period including beginning balance for that period.
ledger: a complete list of entries for a specific account (or list of accounts) for a specific period; note - debits do not have to equal credits.
assets: a listing of open receivables, payables, inventory, fixed assets or other information that can be extracted from but are not necessarily included as part of a journal entry.
trialBalance: the self-balancing (Dr = Cr) result of accumulation of a complete and validated list of entries for the entity in a complete list of accounts in a specific period.
Google Drive: XBRL_GL_Parsed_LHM_Structure.csv
6.1. CSV Columns
Column | Meaning |
---|---|
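Level | Depth level in the hierarchy |
Element | QName (e.g. gl-cor:documentInfo) |
Type | Schema type (e.g. gl-cor:documentInfoComplexType) |
Path | Hierarchy path |
isTuple | True if the type is a tuple |
minOccurs | Minimum cardinality |
maxOccurs | Maximum cardinality |
BaseType | Underlying XBRL base type (e.g. xbrli:tokenItemType) |
Label | English label from label.xml |
Documentation | English description |
LocalLabel | Localised label (e.g. Japanese) |
LocalDocumentation | Localised description |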
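As a rough illustration of consuming these columns, the sketch below reads rows with csv.DictReader and converts Level and isTuple to native types. The two sample rows follow the layout shown in section 6 with the documentation columns left empty for brevity; the function name load_rows is an assumption made for this example.

```python
import csv
import io

# Two rows in the layout shown above (documentation columns left empty for brevity).
SAMPLE_CSV = (
    "Level,Element,Type,Path,isTuple,minOccurs,maxOccurs,BaseType,"
    "Label,Documentation,LocalLabel,LocalDocumentation\n"
    "1,accountingEntries,gl-cor:accountingEntriesComplexType,/gl-cor:accountingEntries,"
    "True,1,unbounded,,Accounting Entries,,,\n"
    "3,gl-cor:entriesType,gl-gen:entriesTypeItemType,"
    "/gl-cor:accountingEntries/gl-cor:documentInfo/gl-cor:entriesType,"
    "False,1,1,xbrli:tokenItemType,Document Type,,,\n"
)


def load_rows(handle):
    """Parse the CSV and convert Level and isTuple to native Python types."""
    rows = []
    for row in csv.DictReader(handle):
        row["Level"] = int(row["Level"])
        row["isTuple"] = row["isTuple"] == "True"
        rows.append(row)
    return rows


rows = load_rows(io.StringIO(SAMPLE_CSV))
tuples = [r["Element"] for r in rows if r["isTuple"]]
leaves = [r["Element"] for r in rows if not r["isTuple"]]
print("tuples:", tuples)
print("leaves:", leaves)
```

For the real file, open it with encoding="utf-8-sig" and newline="" so the byte-order mark written by the script is stripped and multi-line documentation fields are parsed as single cells.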
6.2. Notes
- Tuples are determined by checking if complexType is based on anyType.
- Localised labels (e.g. ja) can be extracted by using --lang ja.
- The script is modular and extensible to support other taxonomies.
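The tuple rule in the first note can be tried on a small schema. This is a sketch under stated assumptions: it uses the standard-library xml.etree.ElementTree rather than lxml, the two type definitions are hypothetical stand-ins written only to exercise the check, and unlike the script it compares the local name of the base so that a prefixed xs:anyType would also match.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
NS = {"xs": XS}

# Hypothetical type definitions, written only to exercise the check.
SAMPLE_XSD = f"""
<xs:schema xmlns:xs="{XS}">
  <xs:complexType name="documentInfoComplexType">
    <xs:complexContent>
      <xs:restriction base="anyType">
        <xs:sequence/>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
  <xs:complexType name="entriesTypeItemType">
    <xs:simpleContent>
      <xs:restriction base="xbrli:tokenItemType"/>
    </xs:simpleContent>
  </xs:complexType>
</xs:schema>
"""


def is_tuple_type(complex_type):
    """True when the type derives from anyType through xs:complexContent."""
    content = complex_type.find("xs:complexContent", NS)
    if content is None:
        return False
    for tag in ("xs:restriction", "xs:extension"):
        derived = content.find(tag, NS)
        if derived is not None:
            # Compare the local name so a prefixed "xs:anyType" also matches.
            return derived.get("base", "").split(":")[-1] == "anyType"
    return False


schema = ET.fromstring(SAMPLE_XSD)
result = {t.get("name"): is_tuple_type(t) for t in schema.findall("xs:complexType", NS)}
print(result)
```

Types with xs:simpleContent carry a value and are treated as items, which is why only the first definition is reported as a tuple.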
7. Related Links
- https://www.xbrl.org/the-standard/what/global-ledger/ (XBRL Global Ledger: Transactional Reporting)
- https://specifications.xbrl.org/spec-group-index-xbrl-gl.html (XBRL Specification)
- https://www.xbrl.org/int/gl/2015-03-25/GLTFTA-REC-2015-03-25.html (XBRL GL Taxonomy Framework Technical Architecture 2015)
8. Questions or Feedback?
If you have suggestions, encounter issues, or need support adapting the script to other taxonomies, feel free to comment on this page. Contributions and improvements are always welcome.
You can also fork the script or submit enhancements by referencing the source file:
SOURCE: xbrl_gl_palette_parser.py (Google Drive)
#!/usr/bin/env python3
# coding: utf-8
"""
xbrl_gl_palette_parser.py
Parses XBRL Global Ledger (XBRL GL) taxonomy and extracts labeled hierarchical element structures into CSV format.
Designed by SAMBUICHI, Nobuyuki (Sambuichi Professional Engineers Office)
Written by SAMBUICHI, Nobuyuki (Sambuichi Professional Engineers Office)
Creation Date: 2025-04-02
MIT License
(c) 2025 SAMBUICHI, Nobuyuki (Sambuichi Professional Engineers Office)
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

Usage:
    python xbrl_gl_palette_parser.py --base-dir <taxonomy-root-directory> [--palette <palette-subdir>] [--lang <language-code>] [--debug] [--trace] [--output <filename>]

Arguments:
    --base-dir  Required. Path to the root of the XBRL GL taxonomy (e.g., XBRL-GL-PWD-2016-12-01).
    --palette   Optional. Subdirectory name of the palette folder (default: case-c-b-m-u-e-t-s).
    --lang      Optional. Language code for multilingual labels. Default is 'ja'.
    --debug     Optional. Enables detailed debug output.
    --trace     Optional. Enables trace messages.
    --output    Optional. Filename for the output CSV (default: XBRL_GL_Parsed_LHM_Structure.csv).

Example:
    python xbrl_gl_palette_parser.py --base-dir XBRL-GL-PWD-2016-12-01 --palette case-c-b --lang ja --debug --output my_labels.csv
"""
import lxml.etree as ET
import os
import re
import csv
import argparse
from collections import defaultdict

# Defaults; both flags are overwritten from the command line below.
TRACE = True
DEBUG = True


def trace_print(text):
    if TRACE or DEBUG:
        print(text)


def debug_print(text):
    if DEBUG:
        print(text)


# Helper to clean label IDs
def clean_label_id(label_id):
    label_id = re.sub(r"^label_", "", label_id)
    label_id = re.sub(r"(_lbl|_\d+(_\d+)?)$", "", label_id)
    return label_id

# Argument parser for base directory
parser = argparse.ArgumentParser(description="Parse XBRL-GL schemas and extract labeled hierarchy.")
parser.add_argument("--palette", type=str, default="case-c-b-m-u-e-t-s", help="Palette subdirectory under gl/plt/ (e.g. case-c-b or case-c-b-m-u-e-t-s)")
parser.add_argument("--base-dir", type=str, required=True, help="Base directory path to XBRL GL taxonomy, e.g. XBRL-GL-PWD-2016-12-01")
parser.add_argument("--debug", action="store_true", help="Enable debug output")
parser.add_argument("--trace", action="store_true", help="Enable trace output")
parser.add_argument("--lang", type=str, default="ja", help="Language code for local labels (e.g. 'ja', 'en')")
parser.add_argument("--output", type=str, default="XBRL_GL_Parsed_LHM_Structure.csv", help="Output CSV filename")
args = parser.parse_args()
base_dir = args.base_dir
palette = args.palette
DEBUG = args.debug
TRACE = args.trace
LANG = args.lang
output_filename = args.output
xsd_path = os.path.join(base_dir, f"gl/plt/{palette}/gl-cor-content-2016-12-01.xsd")
namespaces = {
'xs': "http://www.w3.org/2001/XMLSchema",
'xbrli': "http://www.xbrl.org/2003/instance"
}
modules = ['gen', 'cor', 'bus', 'muc', 'usk', 'ehm', 'taf', 'srcd']

# Load base schemas and build type maps
element_type_map = {}
type_base_map = {}
type_base_lookup = {}
complex_type_lookup = {}


def register_schema(root, mod):
    """Record element types, type definitions, and base types from one schema."""
    for el in root.xpath("//xs:element", namespaces=namespaces):
        name, type_ = el.get("name"), el.get("type")
        if name and type_:
            element_type_map[f"gl-{mod}:{name}"] = type_
    for tdef in root.xpath("//xs:simpleType | //xs:complexType", namespaces=namespaces):
        name = tdef.get("name")
        if not name:
            continue
        complex_type_lookup[name] = tdef
        # An xs:extension base, when present, overrides an xs:restriction base.
        for tag in ("xs:restriction", "xs:extension"):
            derived = tdef.find(f".//{tag}", namespaces)
            if derived is not None and derived.get("base"):
                type_base_map[name] = derived.get("base")
                type_base_lookup[name] = derived.get("base")


for mod in modules:
    path = os.path.join(base_dir, f"gl/{mod}/gl-{mod}-2016-12-01.xsd")
    if os.path.exists(path):
        register_schema(ET.parse(path).getroot(), mod)

# Load content schemas (each parsed once; the roots are kept for traversal)
content_roots = {}
for mod in modules:
    path = os.path.join(base_dir, f"gl/plt/{palette}/gl-{mod}-content-2016-12-01.xsd")
    if os.path.exists(path):
        content_roots[mod] = ET.parse(path).getroot()
        register_schema(content_roots[mod], mod)


# Load label linkbases (English plus the language selected with --lang)
def load_labels(mod, lang):
    label_map = defaultdict(dict)
    suffix = "label.xml" if lang == "en" else f"label-{lang}.xml"
    path = os.path.join(base_dir, f"gl/{mod}/lang/gl-{mod}-2016-12-01-{suffix}")
    if not os.path.exists(path):
        return label_map
    tree = ET.parse(path)
    root = tree.getroot()
    ns = {'link': 'http://www.xbrl.org/2003/linkbase', 'xlink': 'http://www.w3.org/1999/xlink'}
    locator_map = {}
    label_resources = {}
    # Map locator label -> href target
    for loc in root.xpath(".//link:loc", namespaces=ns):
        label_id = loc.get("{http://www.w3.org/1999/xlink}label")
        href = loc.get("{http://www.w3.org/1999/xlink}href")
        # Validate href before splitting; split only on the first '#'.
        if label_id and href and '#' in href:
            locator_map[label_id] = href.split("#", 1)[1]
    # Collect label resources
    for label in root.xpath(".//link:label", namespaces=ns):
        label_id = label.get("{http://www.w3.org/1999/xlink}label")
        role = label.get("{http://www.w3.org/1999/xlink}role") or ""
        label_text = label.text.strip() if label.text else ""
        if label_id not in label_resources:
            label_resources[label_id] = {}
        if role.endswith("label"):
            label_resources[label_id]["label"] = label_text
        elif role.endswith("documentation"):
            label_resources[label_id]["documentation"] = label_text
    # Resolve labelArcs and map labels to href anchors
    key_suffix = "" if lang == "en" else f"_{lang}"
    for arc in root.xpath(".//link:labelArc", namespaces=ns):
        from_label = arc.get("{http://www.w3.org/1999/xlink}from")
        to_label = arc.get("{http://www.w3.org/1999/xlink}to")
        href = locator_map.get(from_label)
        label = label_resources.get(to_label)
        if href and label is not None:
            # English uses the plain keys; other languages get a "_<lang>" suffix.
            if "label" in label:
                label_map[href][f"label{key_suffix}"] = label["label"]
            if "documentation" in label:
                label_map[href][f"documentation{key_suffix}"] = label["documentation"]
    return label_map


label_texts = defaultdict(dict)
for mod in modules:
    labels = [load_labels(mod, "en")]
    if LANG != "en":
        labels.append(load_labels(mod, LANG))
    for label_map in labels:
        for k, v in label_map.items():
            label_texts[k].update(v)


# Helpers
def is_tuple_type(complex_type_element):
    if complex_type_element is None:
        return False
    if complex_type_element.find("xs:simpleContent", namespaces) is not None:
        return False
    complex_content = complex_type_element.find("xs:complexContent", namespaces)
    if complex_content is not None:
        for tag in ["xs:restriction", "xs:extension"]:
            inner = complex_content.find(tag, namespaces)
            if inner is not None:
                base = inner.get("base")
                return base == "anyType"
    return False


def resolve_base_type(type_str):
    type_name = type_str.split(":")[-1]
    return type_base_lookup.get(type_name, "")


# Traversal
records = []


def process_sequence(seq, _type, module, path, base, namespaces):
    debug_print(f" - Processing xs:sequence in path: /{path}")
    for el in seq.findall("xs:element", namespaces=namespaces):
        ref = el.get("ref")
        name = el.get("name")
        el_name = ref or name
        el_type = element_type_map.get(el_name, "")
        type_name = el_type.split(":")[-1]
        complex_type = complex_type_lookup.get(type_name)
        is_tuple = False
        if complex_type is not None:
            is_tuple = is_tuple_type(complex_type)
        path_str = f"gl-{module}:{path}" if "gl-" not in path else path
        new_path = f"{path_str}/{el_name}"
        min_occurs = el.get("minOccurs", "1")
        max_occurs = el.get("maxOccurs", "1")
        base_type = resolve_base_type(el_type) if not is_tuple and el_type else ""
        level = 1 + new_path.count("/")
        raw_key = el_name.replace(":", "_")
        label_info = label_texts.get(raw_key, {})
        record = {
            "Level": level,
            "Element": el_name,
            "Type": el_type,
            "Path": f"/{new_path}",
            "isTuple": is_tuple,
            "minOccurs": min_occurs,
            "maxOccurs": max_occurs,
            "BaseType": base_type,
            "Label": label_info.get("label", ""),
            "Documentation": label_info.get("documentation", ""),
            # Local columns follow --lang instead of a hard-coded "ja".
            "LocalLabel": label_info.get(f"label_{LANG}", ""),
            "LocalDocumentation": label_info.get(f"documentation_{LANG}", "")
        }
        records.append(record)
        if not el_type:
            continue
        if is_tuple:
            # "gl-cor:..." -> "cor"
            mod = el_type.split(":")[0][3:]
            for _path in [
                os.path.join(base_dir, f"gl/{mod}/gl-{mod}-2016-12-01.xsd"),
                os.path.join(base_dir, f"gl/plt/{palette}/gl-{mod}-content-2016-12-01.xsd")
            ]:
                if os.path.exists(_path):
                    tree = ET.parse(_path)
                    nested = tree.xpath(f".//xs:complexType[@name='{type_name}']", namespaces=namespaces)
                    if nested:
                        walk_complex_type(type_name, nested[0], "tuple", mod, new_path, namespaces)
                        break


def walk_complex_type(name, element, _type, module, path, namespaces):
    if ":" not in path:
        trace_print(f"Walking {_type} type '{name}' at path: /gl-{module}:{path}")
    else:
        trace_print(f"Walking {_type}: '{name}' at path: /{path}")
    sequence = element.find("xs:sequence", namespaces)
    if sequence is not None:
        process_sequence(sequence, _type, module, path, name, namespaces)
        return
    complex_content = element.find("xs:complexContent", namespaces)
    if complex_content is not None:
        for tag in ["xs:restriction", "xs:extension"]:
            inner = complex_content.find(tag, namespaces)
            if inner is not None:
                base = inner.get("base")
                seq = inner.find("xs:sequence", namespaces)
                if seq is not None:
                    process_sequence(seq, _type, module, path, base, namespaces)
                return


# Start with root complexType
root = content_roots.get("cor")
if root is None:
    # Without the core content schema there is nothing to traverse.
    raise SystemExit(f"❌ Not found: {xsd_path}")
complex_type_list = root.xpath(".//xs:complexType[@name='accountingEntriesComplexType']", namespaces=namespaces)
if complex_type_list:
    href = "gl-cor_accountingEntries"
    record = {
        "Level": 1,
        "Element": "accountingEntries",
        "Type": "gl-cor:accountingEntriesComplexType",
        "Path": "/gl-cor:accountingEntries",
        "isTuple": True,
        "minOccurs": "1",
        "maxOccurs": "unbounded",
        "BaseType": "",
        "Label": label_texts[href].get("label", ""),
        "Documentation": label_texts[href].get("documentation", ""),
        # Local columns follow --lang instead of a hard-coded "ja".
        "LocalLabel": label_texts[href].get(f"label_{LANG}", ""),
        "LocalDocumentation": label_texts[href].get(f"documentation_{LANG}", "")
    }
    records.append(record)
    walk_complex_type("accountingEntriesComplexType", complex_type_list[0], "tuple", "cor", "accountingEntries", namespaces)
else:
    print("❌ Not found: accountingEntriesComplexType")

# Output to CSV
output_dir = "XBRL-GL-2025"
os.makedirs(output_dir, exist_ok=True)
output_file = os.path.join(output_dir, output_filename)
with open(output_file, mode='w', newline='', encoding='utf-8-sig') as f:
    if records:
        writer = csv.DictWriter(f, fieldnames=records[0].keys())
        writer.writeheader()
        writer.writerows(records)
    else:
        print("⚠️ No records to write.")
print(f"\n✅ Saved parsed structure to: {output_file}")
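
The listing defines clean_label_id but never calls it, so the fallback resolution of label identifiers mentioned in section 2 is not wired in as published. The sketch below shows one way such a fallback might work: try the identifier as given, then retry with the cleaned form. The regular expressions are copied from the listing; the function lookup_with_fallback, its lookup order, and the sample data are assumptions made for illustration.

```python
import re


def clean_label_id(label_id):
    """Same regular expressions as the helper in the listing."""
    label_id = re.sub(r"^label_", "", label_id)
    return re.sub(r"(_lbl|_\d+(_\d+)?)$", "", label_id)


def lookup_with_fallback(label_map, key):
    """Try the key as-is, then its cleaned form (hypothetical fallback order)."""
    if key in label_map:
        return label_map[key]
    return label_map.get(clean_label_id(key), "")


labels = {"gl-cor_documentInfo": "Document Information"}

for candidate in ("gl-cor_documentInfo",
                  "label_gl-cor_documentInfo",
                  "gl-cor_documentInfo_lbl",
                  "label_gl-cor_documentInfo_2"):
    print(candidate, "->", lookup_with_fallback(labels, candidate))
```

The cleaning strips a leading label_ prefix and a trailing _lbl or numeric suffix, so identifiers that differ only in that decoration resolve to the same entry.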