XBRL GL Palette Taxonomy Parser
2025-04-02
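This article introduces a Python-based parser that extracts the Logical Hierarchical Model (LHM) structure from the XBRL GL (XBRL Global Ledger) taxonomy. The parser also retrieves the multilingual labels and documentation defined through labelArc. Its output is a structured CSV file that is useful for semantic analysis, implementation, and documentation.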
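1. Background
The XBRL Global Ledger (XBRL GL) palette taxonomy defines an XML-based standard for representing accounting and audit data. Its hierarchical structure is hard to grasp, however, particularly in modularized configurations, and the use of multilingual labels through labelArc adds further complexity.
This script bridges the raw schema definitions and an easy-to-read CSV format that includes English and localized labels (for example, Japanese).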
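2. Main features
- Loads all gl-*.xsd and gl-*-content.xsd schemas
- Recognizes complexType definitions based on anyType as tuples
- Extracts element names, types, and occurrence counts (cardinality)
- Retrieves labels from label.xml and label-ja.xml via labelArc
- Supports fallback resolution of label identifiers
- Outputs an annotated CSV showing the logical structure defined by complexType and complexContent/xs:sequence in the schemas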
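The tuple rule and the cardinality extraction listed above can be sketched on a tiny schema. This is a minimal illustration under stated assumptions, not the script itself: it uses the standard library's xml.etree.ElementTree instead of lxml so that it runs without extra installs, the schema fragment is hand-written for the example, and children_with_cardinality is a hypothetical helper name. Unlike the script, it also accepts a prefixed base such as xs:anyType.

```python
import xml.etree.ElementTree as ET

NS = {"xs": "http://www.w3.org/2001/XMLSchema"}

# Hypothetical, hand-written fragment (not copied from the taxonomy files).
SAMPLE_XSD = """\
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <complexType name="documentInfoComplexType">
    <complexContent>
      <restriction base="anyType">
        <sequence>
          <element ref="gl-cor:entriesType"/>
          <element ref="gl-cor:uniqueID" minOccurs="0"/>
        </sequence>
      </restriction>
    </complexContent>
  </complexType>
  <complexType name="entriesTypeItemType">
    <simpleContent>
      <restriction base="xbrli:tokenItemType"/>
    </simpleContent>
  </complexType>
</schema>
"""


def is_tuple_type(complex_type):
    """Return True when complexContent restricts or extends anyType."""
    content = complex_type.find("xs:complexContent", NS)
    if content is None:
        return False
    for tag in ("xs:restriction", "xs:extension"):
        inner = content.find(tag, NS)
        if inner is not None:
            # Assumption: accept both "anyType" and a prefixed "xs:anyType".
            return inner.get("base", "").split(":")[-1] == "anyType"
    return False


def children_with_cardinality(complex_type):
    """List (ref-or-name, minOccurs, maxOccurs); XSD defaults both to 1."""
    rows = []
    for el in complex_type.iterfind(".//xs:sequence/xs:element", NS):
        rows.append((el.get("ref") or el.get("name"),
                     el.get("minOccurs", "1"),
                     el.get("maxOccurs", "1")))
    return rows


root = ET.fromstring(SAMPLE_XSD)
types = {t.get("name"): t for t in root.iterfind("xs:complexType", NS)}
for name, tdef in types.items():
    print(name, is_tuple_type(tdef), children_with_cardinality(tdef))
```

Treating a missing minOccurs or maxOccurs as "1" mirrors the XML Schema default, which is why most rows in the output CSV show 1 and 1.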
3. Requirements
- Python 3.7 or later
- lxml library: pip install lxml
4. How to run
4.1. Running from the command line
python xbrl_gl_label_parser.py --base-dir XBRL-GL-PWD-2016-12-01
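4.2. Arguments
Argument | Description |
---|---|
--base-dir | (Required) Path to the root directory of the XBRL GL taxonomy |
--palette | Subdirectory name of the palette folder (default: case-c-b-m-u-e-t-s) |
--lang | Language code used for labels (default: ja) |
--debug | Enable detailed debug output |
--trace | Enable higher-level trace output |
--output | Output CSV filename (default: XBRL_GL_Parsed_LHM_Structure.csv); the file is written under the XBRL-GL-2025 directory |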
4.3. Example for VSCode launch.json
"args": [
"--base-dir", "XBRL-GL-PWD-2016-12-01",
"--palette", "case-c-b",
"--lang", "ja",
"--debug",
"--trace",
"--output", "XBRL_GL_case-c-b_Structure.csv"
]
5. Input directory structure
The taxonomy folder must be laid out as follows:
XBRL-GL-PWD-2016-12-01/
├── gl/
│   ├── cor/
│   │   ├── gl-cor-2016-12-01.xsd
│   │   └── lang/
│   │       ├── gl-cor-2016-12-01-label.xml
│   │       └── gl-cor-2016-12-01-label-ja.xml
│   ├── bus/
│   ├── muc/
│   └── ...
├── gl/plt/case-c-b/
│   ├── gl-cor-content-2016-12-01.xsd
│   └── ...
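Before running the parser it can help to confirm that the layout above is in place. The following is a minimal sketch, assuming the 2016-12-01 file naming shown in the tree; expected_paths and missing_files are hypothetical helper names that are not part of the script, and the demo runs against a throwaway temporary directory rather than a real taxonomy.

```python
import os
import tempfile


def expected_paths(base_dir, palette, module="cor", lang="ja", version="2016-12-01"):
    """Build the file paths one module is expected to provide."""
    stem = f"gl-{module}-{version}"
    return {
        "schema": os.path.join(base_dir, "gl", module, f"{stem}.xsd"),
        "label_en": os.path.join(base_dir, "gl", module, "lang", f"{stem}-label.xml"),
        "label_local": os.path.join(base_dir, "gl", module, "lang",
                                    f"{stem}-label-{lang}.xml"),
        "content": os.path.join(base_dir, "gl", "plt", palette,
                                f"gl-{module}-content-{version}.xsd"),
    }


def missing_files(base_dir, palette, **kwargs):
    """Return the sorted keys of expected files that do not exist."""
    paths = expected_paths(base_dir, palette, **kwargs)
    return sorted(key for key, path in paths.items() if not os.path.isfile(path))


# Demo: create only the two schema files, then report what is still missing.
with tempfile.TemporaryDirectory() as tmp:
    paths = expected_paths(tmp, "case-c-b")
    for key in ("schema", "content"):
        os.makedirs(os.path.dirname(paths[key]), exist_ok=True)
        open(paths[key], "w").close()
    demo_missing = missing_files(tmp, "case-c-b")
    print(demo_missing)
```

The script itself skips files that do not exist without reporting them, so a check like this makes a misspelled --palette value easier to spot.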
6. Output
The script generates a CSV file like the following:
Level,Element,Type,Path,isTuple,minOccurs,maxOccurs,BaseType,Label,Documentation,LocalLabel,LocalDocumentation
1,accountingEntries,gl-cor:accountingEntriesComplexType,/gl-cor:accountingEntries,True,1,unbounded,,Accounting Entries,Root for XBRL GL. No entry made here.,【会計仕訳】,XBRL GLのルート要素。 この要素にはデータは登録されない。
2,gl-cor:documentInfo,gl-cor:documentInfoComplexType,/gl-cor:accountingEntries/gl-cor:documentInfo,True,1,1,,Document Information,Parent for descriptive information about the accountingEntries section in which it is contained.,【文書情報】,この会計仕訳に関する情報の親タグ。
3,gl-cor:entriesType,gl-gen:entriesTypeItemType,/gl-cor:accountingEntries/gl-cor:documentInfo/gl-cor:entriesType,False,1,1,xbrli:tokenItemType,Document Type,"account: information to fill in a chart of accounts file.
balance: the results of accumulation of a complete and validated list of entries for an account (or a list of account) in a specific period - sometimes called general ledger
entries: a list of individual accounting entries, which might be posted/validated or nonposted/validated
journal: a self-balancing (Dr = Cr) list of entries for a specific period including beginning balance for that period.
ledger: a complete list of entries for a specific account (or list of accounts) for a specific period; note - debits do not have to equal credits.
assets: a listing of open receivables, payables, inventory, fixed assets or other information that can be extracted from but are not necessarily included as part of a journal entry.
trialBalance: the self-balancing (Dr = Cr) result of accumulation of a complete and validated list of entries for the entity in a complete list of accounts in a specific period.
Google Drive: XBRL_GL_Parsed_LHM_Structure.csv
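6.1. CSV columns
Column | Content |
---|---|
Level | Depth in the hierarchy (level counted from the root) |
Element | Element name (QName, such as gl-cor:documentInfo in the sample above) |
Type | Type in the schema (such as gl-cor:documentInfoComplexType in the sample above) |
Path | Hierarchical path |
isTuple | True if the element is a tuple |
minOccurs | Minimum number of occurrences |
maxOccurs | Maximum number of occurrences |
BaseType | Base type (such as xbrli:tokenItemType in the sample above) |
Label | English label |
Documentation | English documentation text |
LocalLabel | Localized label (for example, Japanese) |
LocalDocumentation | Localized documentation text |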
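The columns above can be consumed with the standard csv module. The sketch below is an illustration only: the first two data rows are taken from the sample output in section 6, the third row has its documentation and localized columns left blank for brevity, and load_rows is a hypothetical helper name. It converts Level and isTuple to Python types and prints an indented outline.

```python
import csv
import io

# Excerpt of the sample output; the third row is abbreviated for the example.
SAMPLE_CSV = """\
Level,Element,Type,Path,isTuple,minOccurs,maxOccurs,BaseType,Label,Documentation,LocalLabel,LocalDocumentation
1,accountingEntries,gl-cor:accountingEntriesComplexType,/gl-cor:accountingEntries,True,1,unbounded,,Accounting Entries,Root for XBRL GL. No entry made here.,【会計仕訳】,XBRL GLのルート要素。 この要素にはデータは登録されない。
2,gl-cor:documentInfo,gl-cor:documentInfoComplexType,/gl-cor:accountingEntries/gl-cor:documentInfo,True,1,1,,Document Information,Parent for descriptive information about the accountingEntries section in which it is contained.,【文書情報】,この会計仕訳に関する情報の親タグ。
3,gl-cor:entriesType,gl-gen:entriesTypeItemType,/gl-cor:accountingEntries/gl-cor:documentInfo/gl-cor:entriesType,False,1,1,xbrli:tokenItemType,Document Type,,,
"""


def load_rows(stream):
    """Read the parser's CSV and convert Level and isTuple to Python types."""
    rows = []
    for row in csv.DictReader(stream):
        row["Level"] = int(row["Level"])
        row["isTuple"] = row["isTuple"] == "True"
        rows.append(row)
    return rows


rows = load_rows(io.StringIO(SAMPLE_CSV))
for row in rows:
    indent = "  " * (row["Level"] - 1)
    kind = "tuple" if row["isTuple"] else row["BaseType"]
    print(f'{indent}{row["Element"]} [{row["minOccurs"]}..{row["maxOccurs"]}] ({kind})')
```

When reading the real file, open it with newline="" and encoding="utf-8-sig", because the script writes the CSV with a UTF-8 byte order mark. In the sample rows, Level equals the number of "/" separators in Path.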
6.2. Notes
- A tuple is identified by its complexType having anyType as its base.
- Localized labels can be retrieved by specifying the language, as in --lang ja.
- The script can be extended to support other taxonomies.
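Label retrieval follows the chain of an XBRL label linkbase: a loc points at a schema element, a labelArc connects that locator to a label resource, and the resource's role tells a label from its documentation. The sketch below shows that resolution on a hand-written linkbase fragment. It is an illustration under assumptions rather than the script's code: it uses xml.etree.ElementTree from the standard library instead of lxml, the xlink:label identifiers in the fragment are invented, and resolve_labels is a hypothetical name.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"
NS = {"link": "http://www.xbrl.org/2003/linkbase"}

# Hypothetical linkbase fragment; identifiers are invented for the example.
SAMPLE_LINKBASE = """\
<linkbase xmlns="http://www.xbrl.org/2003/linkbase"
          xmlns:xlink="http://www.w3.org/1999/xlink">
  <labelLink xlink:type="extended" xlink:role="http://www.xbrl.org/2003/role/link">
    <loc xlink:type="locator" xlink:label="loc_documentInfo"
         xlink:href="../gl-cor-2016-12-01.xsd#gl-cor_documentInfo"/>
    <label xlink:type="resource" xlink:label="lbl_documentInfo" xml:lang="en"
           xlink:role="http://www.xbrl.org/2003/role/label">Document Information</label>
    <label xlink:type="resource" xlink:label="doc_documentInfo" xml:lang="en"
           xlink:role="http://www.xbrl.org/2003/role/documentation">Parent tag.</label>
    <labelArc xlink:type="arc" xlink:from="loc_documentInfo" xlink:to="lbl_documentInfo"
              xlink:arcrole="http://www.xbrl.org/2003/arcrole/concept-label"/>
    <labelArc xlink:type="arc" xlink:from="loc_documentInfo" xlink:to="doc_documentInfo"
              xlink:arcrole="http://www.xbrl.org/2003/arcrole/concept-label"/>
  </labelLink>
</linkbase>
"""


def xl(name):
    """Qualified attribute name in the XLink namespace."""
    return f"{{{XLINK}}}{name}"


def resolve_labels(root):
    """Map each href anchor to {'label': ..., 'documentation': ...}."""
    anchors = {}  # locator label -> anchor after '#'
    for loc in root.iterfind(".//link:loc", NS):
        href = loc.get(xl("href"), "")
        if "#" in href:
            anchors[loc.get(xl("label"))] = href.split("#", 1)[1]

    resources = {}  # resource label -> list of (role suffix, text)
    for res in root.iterfind(".//link:label", NS):
        kind = res.get(xl("role"), "").rsplit("/", 1)[-1]
        text = (res.text or "").strip()
        resources.setdefault(res.get(xl("label")), []).append((kind, text))

    resolved = {}
    for arc in root.iterfind(".//link:labelArc", NS):
        anchor = anchors.get(arc.get(xl("from")))
        if anchor is None:
            continue  # arc starts from an unknown locator
        for kind, text in resources.get(arc.get(xl("to")), []):
            resolved.setdefault(anchor, {})[kind] = text
    return resolved


labels = resolve_labels(ET.fromstring(SAMPLE_LINKBASE))
print(labels)
```

The resulting keys, such as gl-cor_documentInfo, are element QNames with the colon replaced by an underscore, which is the form the script uses when it looks up a label for an element.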
7. Related links
- https://www.xbrl.org/the-standard/what/global-ledger/ (XBRL Global Ledger: a standard for reporting transactional data)
- https://specifications.xbrl.org/spec-group-index-xbrl-gl.html (List of XBRL GL specifications)
- https://www.xbrl.org/int/gl/2015-03-25/GLTFTA-REC-2015-03-25.html (XBRL GL Taxonomy Framework Technical Architecture, 2015)
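8. Feedback and questions
If you have questions or requests about applying the script or supporting other taxonomies, feel free to leave them in the comments on this page. Suggestions for improvement and contributions are also welcome.
If you would like to review the script, you can refer to the source file below: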
SOURCE
Google Drive xbrl_gl_palette_parser.py
#!/usr/bin/env python3
# coding: utf-8
"""
xbrl_gl_palette_parser.py
Parses XBRL Global Ledger (XBRL GL) taxonomy and extracts labeled hierarchical element structures into CSV format.
Designed by SAMBUICHI, Nobuyuki (Sambuichi Professional Engineers Office)
Written by SAMBUICHI, Nobuyuki (Sambuichi Professional Engineers Office)
Creation Date: 2025-04-02
MIT License
(c) 2025 SAMBUICHI, Nobuyuki (Sambuichi Professional Engineers Office)
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
Usage:
python xbrl_gl_label_parser.py --base-dir <taxonomy-root-directory> [--palette <palette-subdir>] [--lang <language-code>] [--debug] [--trace] [--output <filename>]
Arguments:
--base-dir Required. Path to the root of the XBRL GL taxonomy (e.g., XBRL-GL-PWD-2016-12-01).
--palette Optional. Subdirectory name of the palette folder (default: case-c-b-m-u-e-t-s).
--lang Optional. Language code for multilingual labels. Default is 'ja'.
--debug Optional. Enables detailed debug output.
--trace Optional. Enables trace messages.
--output Optional. Filename for the output CSV (default: XBRL_GL_Parsed_LHM_Structure.csv).
Example:
python xbrl_gl_label_parser.py --base-dir XBRL-GL-PWD-2016-12-01 --palette case-c-b --lang ja --debug --output my_labels.csv
"""
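import lxml.etree as ET
import os
import re
import csv
import argparse
from collections import defaultdict

# Verbosity flags; both are overwritten from --trace/--debug after argument parsing.
TRACE = True
DEBUG = True


def trace_print(text):
    """Print high-level progress messages when tracing or debugging."""
    if TRACE or DEBUG:
        print(text)


def debug_print(text):
    """Print detailed messages when debugging."""
    if DEBUG:
        print(text)


# Helper to clean label IDs: strips a leading "label_" and a trailing
# "_lbl" or numeric suffix such as "_2" or "_2_1".
def clean_label_id(label_id):
    label_id = re.sub(r"^label_", "", label_id)
    label_id = re.sub(r"(_lbl|_\d+(_\d+)?)$", "", label_id)
    return label_id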
# Argument parser for base directory
parser = argparse.ArgumentParser(description="Parse XBRL-GL schemas and extract labeled hierarchy.")
parser.add_argument("--palette", type=str, default="case-c-b-m-u-e-t-s", help="Palette subdirectory under gl/plt/ (e.g. case-c-b or case-c-b-m-u-e-t-s)")
parser.add_argument("--base-dir", type=str, required=True, help="Base directory path to XBRL GL taxonomy, e.g. XBRL-GL-PWD-2016-12-01")
parser.add_argument("--debug", action="store_true", help="Enable debug output")
parser.add_argument("--trace", action="store_true", help="Enable trace output")
parser.add_argument("--lang", type=str, default="ja", help="Language code for local labels (e.g. 'ja', 'en')")
parser.add_argument("--output", type=str, default="XBRL_GL_Parsed_LHM_Structure.csv", help="Output CSV filename")
args = parser.parse_args()
base_dir = args.base_dir
palette = args.palette
DEBUG = args.debug
TRACE = args.trace
LANG = args.lang
output_filename = args.output
xsd_path = os.path.join(base_dir, f"gl/plt/{palette}/gl-cor-content-2016-12-01.xsd")
namespaces = {
    'xs': "http://www.w3.org/2001/XMLSchema",
    'xbrli': "http://www.xbrl.org/2003/instance"
}
modules = ['gen', 'cor', 'bus', 'muc', 'usk', 'ehm', 'taf', 'srcd']
# Load base schemas and build type maps
element_type_map = {}     # "gl-<mod>:<element>" -> declared type
type_base_map = {}        # type name -> base type
type_base_lookup = {}     # same content, used by resolve_base_type()
complex_type_lookup = {}  # type name -> simpleType/complexType element


def index_schema(root, mod):
    """Record the element types and type bases declared in one schema."""
    for el in root.xpath("//xs:element", namespaces=namespaces):
        name, type_ = el.get("name"), el.get("type")
        if name and type_:
            # debug_print(f"gl-{mod}:{name}")
            element_type_map[f"gl-{mod}:{name}"] = type_
    for tdef in root.xpath("//xs:simpleType | //xs:complexType", namespaces=namespaces):
        name = tdef.get("name")
        if not name:
            continue
        # debug_print(name)
        complex_type_lookup[name] = tdef
        # xs:extension is checked last, so it overrides xs:restriction as before.
        for tag in ("xs:restriction", "xs:extension"):
            derived = tdef.find(f".//{tag}", namespaces)
            if derived is not None and derived.get("base"):
                type_base_map[name] = derived.get("base")
                type_base_lookup[name] = derived.get("base")


for mod in modules:
    path = os.path.join(base_dir, f"gl/{mod}/gl-{mod}-2016-12-01.xsd")
    if os.path.exists(path):
        index_schema(ET.parse(path).getroot(), mod)

# Load content schemas (each file is parsed once; the roots are reused for the traversal)
content_roots = {}
for mod in modules:
    path = os.path.join(base_dir, f"gl/plt/{palette}/gl-{mod}-content-2016-12-01.xsd")
    if os.path.exists(path):
        content_roots[mod] = ET.parse(path).getroot()
        index_schema(content_roots[mod], mod)
# Load label linkbases (EN and the language selected with --lang)
def load_labels(mod, lang):
    label_map = defaultdict(dict)
    suffix = "label.xml" if lang == "en" else f"label-{lang}.xml"
    path = os.path.join(base_dir, f"gl/{mod}/lang/gl-{mod}-2016-12-01-{suffix}")
    if not os.path.exists(path):
        return label_map
    tree = ET.parse(path)
    root = tree.getroot()
    ns = {'link': 'http://www.xbrl.org/2003/linkbase', 'xlink': 'http://www.w3.org/1999/xlink'}
    locator_map = {}
    label_resources = {}
    # Map locator label -> href target
    for loc in root.xpath(".//link:loc", namespaces=ns):
        label_id = loc.get("{http://www.w3.org/1999/xlink}label")
        href = loc.get("{http://www.w3.org/1999/xlink}href")
        # Check before splitting so a missing or anchorless href is skipped.
        if label_id and href and '#' in href:
            locator_map[label_id] = href.split("#", 1)[1]
    # Collect label resources
    for label in root.xpath(".//link:label", namespaces=ns):
        label_id = label.get("{http://www.w3.org/1999/xlink}label")
        role = label.get("{http://www.w3.org/1999/xlink}role") or ""
        label_text = label.text.strip() if label.text else ""
        if label_id not in label_resources:
            label_resources[label_id] = {}
        if role.endswith("label"):
            label_resources[label_id]["label"] = label_text
        elif role.endswith("documentation"):
            label_resources[label_id]["documentation"] = label_text
    # Resolve labelArcs and map labels to href anchors
    for arc in root.xpath(".//link:labelArc", namespaces=ns):
        from_label = arc.get("{http://www.w3.org/1999/xlink}from")
        to_label = arc.get("{http://www.w3.org/1999/xlink}to")
        href = locator_map.get(from_label)
        label = label_resources.get(to_label)
        if href and label is not None:
            # English uses plain keys; other languages get a "_<lang>" suffix.
            key_suffix = "" if lang == "en" else f"_{lang}"
            for kind in ("label", "documentation"):
                if kind in label:
                    label_map[href][f"{kind}{key_suffix}"] = label[kind]
    return label_map


label_texts = defaultdict(dict)
for mod in modules:
    labels = [load_labels(mod, "en")]
    if LANG != "en":
        labels.append(load_labels(mod, LANG))
    for label_map in labels:
        for k, v in label_map.items():
            label_texts[k].update(v)
# Helpers
def is_tuple_type(complex_type_element):
    """A tuple is a complexType whose complexContent is based on anyType."""
    if complex_type_element is None:
        return False
    if complex_type_element.find("xs:simpleContent", namespaces) is not None:
        return False
    complex_content = complex_type_element.find("xs:complexContent", namespaces)
    if complex_content is not None:
        for tag in ["xs:restriction", "xs:extension"]:
            inner = complex_content.find(tag, namespaces)
            if inner is not None:
                base = inner.get("base")
                return base == "anyType"
    return False


def resolve_base_type(type_str):
    """Look up the base type of a (possibly prefixed) type name."""
    type_name = type_str.split(":")[-1]
    return type_base_lookup.get(type_name, "")
# Traversal
records = []


def process_sequence(seq, _type, module, path, base, namespaces):
    debug_print(f" - Processing xs:sequence in path: /{path}")
    for el in seq.findall("xs:element", namespaces=namespaces):
        ref = el.get("ref")
        name = el.get("name")
        el_name = ref or name
        el_type = element_type_map.get(el_name, "")
        type_name = el_type.split(":")[-1]
        complex_type = complex_type_lookup.get(type_name)
        is_tuple = False
        if complex_type is not None:
            is_tuple = is_tuple_type(complex_type)
        path_str = f"gl-{module}:{path}" if "gl-" not in path else path
        new_path = f"{path_str}/{el_name}"
        min_occurs = el.get("minOccurs", "1")
        max_occurs = el.get("maxOccurs", "1")
        base_type = resolve_base_type(el_type) if not is_tuple and el_type else ""
        level = 1 + new_path.count("/")
        raw_key = el_name.replace(":", "_")
        label_info = label_texts.get(raw_key, {})
        record = {
            "Level": level,
            "Element": el_name,
            "Type": el_type,
            "Path": f"/{new_path}",
            "isTuple": is_tuple,
            "minOccurs": min_occurs,
            "maxOccurs": max_occurs,
            "BaseType": base_type,
            "Label": label_info.get("label", ""),
            "Documentation": label_info.get("documentation", ""),
            # Use the language selected with --lang instead of a fixed "ja".
            "LocalLabel": label_info.get(f"label_{LANG}", ""),
            "LocalDocumentation": label_info.get(f"documentation_{LANG}", "")
        }
        records.append(record)
        if not el_type:
            continue
        if is_tuple:
            # Module of the type prefix, e.g. "gl-cor:..." -> "cor"
            mod = el_type.split(":")[0][3:]
            for _path in [
                os.path.join(base_dir, f"gl/{mod}/gl-{mod}-2016-12-01.xsd"),
                os.path.join(base_dir, f"gl/plt/{palette}/gl-{mod}-content-2016-12-01.xsd")
            ]:
                if os.path.exists(_path):
                    tree = ET.parse(_path)
                    nested = tree.xpath(f".//xs:complexType[@name='{type_name}']", namespaces=namespaces)
                    if nested:
                        walk_complex_type(type_name, nested[0], "tuple", mod, new_path, namespaces)
                        break


def walk_complex_type(name, element, _type, module, path, namespaces):
    if ":" not in path:
        trace_print(f"Walking {_type} type '{name}' at path: /gl-{module}:{path}")
    else:
        trace_print(f"Walking {_type}: '{name}' at path: /{path}")
    # Case 1: xs:sequence directly under the complexType
    sequence = element.find("xs:sequence", namespaces)
    if sequence is not None:
        process_sequence(sequence, _type, module, path, name, namespaces)
        return
    # Case 2: xs:sequence inside complexContent/restriction or extension
    complex_content = element.find("xs:complexContent", namespaces)
    if complex_content is not None:
        for tag in ["xs:restriction", "xs:extension"]:
            inner = complex_content.find(tag, namespaces)
            if inner is not None:
                base = inner.get("base")
                seq = inner.find("xs:sequence", namespaces)
                if seq is not None:
                    process_sequence(seq, _type, module, path, base, namespaces)
                    return
# Start with root complexType
root = content_roots.get("cor")
if root is None:
    raise SystemExit(f"❌ Content schema not found: {xsd_path}")
complex_type_list = root.xpath(".//xs:complexType[@name='accountingEntriesComplexType']", namespaces=namespaces)
if complex_type_list:
    href = "gl-cor_accountingEntries"
    record = {
        "Level": 1,
        "Element": "accountingEntries",
        "Type": "gl-cor:accountingEntriesComplexType",
        "Path": "/gl-cor:accountingEntries",
        "isTuple": True,
        "minOccurs": "1",
        "maxOccurs": "unbounded",
        "BaseType": "",
        "Label": label_texts[href].get("label", ""),
        "Documentation": label_texts[href].get("documentation", ""),
        # Use the language selected with --lang instead of a fixed "ja".
        "LocalLabel": label_texts[href].get(f"label_{LANG}", ""),
        "LocalDocumentation": label_texts[href].get(f"documentation_{LANG}", "")
    }
    records.append(record)
    walk_complex_type("accountingEntriesComplexType", complex_type_list[0], "tuple", "cor", "accountingEntries", namespaces)
else:
    print("❌ Not found: accountingEntriesComplexType")

# Output to CSV (UTF-8 with BOM so that spreadsheet tools detect the encoding)
output_dir = "XBRL-GL-2025"
os.makedirs(output_dir, exist_ok=True)
output_file = os.path.join(output_dir, output_filename)
if records:
    with open(output_file, mode='w', newline='', encoding='utf-8-sig') as f:
        writer = csv.DictWriter(f, fieldnames=records[0].keys())
        writer.writeheader()
        writer.writerows(records)
    # Report success only when a file was actually written.
    print(f"\n✅ Saved parsed structure to: {output_file}")
else:
    print("⚠️ No records to write.")