XBRL GL Palette Taxonomy Parser

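This article introduces a Python-based parser that extracts the Logical Hierarchical Model (LHM) structure from the XBRL GL (XBRL Global Ledger) taxonomy. The parser also retrieves the multilingual labels and documentation defined through labelArc elements. Its output is a structured CSV file that is useful for semantic analysis, implementation, and documentation.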

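1. Background

The XBRL Global Ledger (XBRL GL) palette taxonomy defines an XML-based standard for representing accounting and audit data. Its hierarchical structure is hard to grasp, however, especially in modularized configurations, and the use of multilingual labels through labelArc adds further complexity.

This script bridges the gap between the raw schema definitions and an easy-to-read CSV format that includes English and localized (for example, Japanese) labels.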

2. Main features

  • Loads all gl-*.xsd and gl-*-content.xsd schemas

  • Recognizes complexType definitions based on anyType as tuples

  • Extracts element names, types, and cardinality

  • Retrieves labels from label.xml and label-ja.xml via labelArc

  • Supports fallback resolution of label identifiers

  • Outputs an annotated CSV showing the logical structure defined by complexType and complexContent/xs:sequence in the schemas
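
The labelArc lookup listed above can be sketched as follows. This is a minimal illustration, not the script's own code: it uses the standard library's xml.etree.ElementTree instead of lxml so that it runs without extra packages, the linkbase fragment is a hypothetical, heavily trimmed sample, and the function name resolve_labels is an assumption made for this example. It also ignores label roles, so it keeps one text per label resource.

```python
import xml.etree.ElementTree as ET

LINK = "http://www.xbrl.org/2003/linkbase"
XLINK = "http://www.w3.org/1999/xlink"

# Hypothetical, trimmed label linkbase fragment (illustration only).
SAMPLE = f"""
<link:linkbase xmlns:link="{LINK}" xmlns:xlink="{XLINK}">
  <link:labelLink xlink:type="extended">
    <link:loc xlink:type="locator" xlink:label="loc_1"
              xlink:href="../gl-cor-2016-12-01.xsd#gl-cor_accountingEntries"/>
    <link:label xlink:type="resource" xlink:label="lab_1"
                xlink:role="http://www.xbrl.org/2003/role/label"
                xml:lang="en">Accounting Entries</link:label>
    <link:labelArc xlink:type="arc" xlink:from="loc_1" xlink:to="lab_1"/>
  </link:labelLink>
</link:linkbase>
"""

def resolve_labels(xml_text):
    """Join loc -> labelArc -> label and return {element id: label text}."""
    root = ET.fromstring(xml_text)

    # Locator label -> element id (the fragment after '#' in the href).
    locators = {}
    for loc in root.iter(f"{{{LINK}}}loc"):
        href = loc.get(f"{{{XLINK}}}href", "")
        if "#" in href:
            locators[loc.get(f"{{{XLINK}}}label")] = href.split("#", 1)[1]

    # Resource label -> label text.
    resources = {}
    for lab in root.iter(f"{{{LINK}}}label"):
        resources[lab.get(f"{{{XLINK}}}label")] = (lab.text or "").strip()

    # Each arc connects one locator to one label resource.
    resolved = {}
    for arc in root.iter(f"{{{LINK}}}labelArc"):
        target = locators.get(arc.get(f"{{{XLINK}}}from"))
        text = resources.get(arc.get(f"{{{XLINK}}}to"))
        if target is not None and text is not None:
            resolved[target] = text
    return resolved

print(resolve_labels(SAMPLE))
```

The element id recovered from the href is what ties a label back to a schema element, which is why the locator map is built first.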

3. Requirements

  • Python 3.7 or later

  • lxml library:

    pip install lxml

4. How to run

4.1. Running from the command line

python xbrl_gl_palette_parser.py --base-dir XBRL-GL-PWD-2016-12-01

4.2. Arguments

Argument     Description
--base-dir   (Required) Path to the root directory of the XBRL GL taxonomy
--palette    Subdirectory name of the palette folder (default: case-c-b-m-u-e-t-s)
--lang       Language code used for labels (default: ja), e.g. en, ja
--debug      Enable detailed debug output
--trace      Enable high-level trace output
--output     Output CSV filename (default: XBRL_GL_Parsed_LHM_Structure.csv)

4.3. Example for VSCode launch.json

"args": [
  "--base-dir", "XBRL-GL-PWD-2016-12-01",
  "--palette", "case-c-b",
  "--lang", "ja",
  "--debug",
  "--trace",
  "--output", "XBRL_GL_case-c-b_Structure.csv"
]

5. Input directory structure

The taxonomy folders must be laid out as follows:

XBRL-GL-PWD-2016-12-01/
├── gl/
│   ├── cor/
│   │   ├── gl-cor-2016-12-01.xsd
│   │   └── lang/
│   │       ├── gl-cor-2016-12-01-label.xml
│   │       └── gl-cor-2016-12-01-label-ja.xml
│   ├── bus/
│   ├── muc/
│   └── ...
├── gl/plt/case-c-b/
│   ├── gl-cor-content-2016-12-01.xsd
│   └── ...
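
Before running the parser, the layout above can be checked with a short script. The sketch below is an illustration only: the helper names expected_files and missing_files are assumptions, and it checks just one module (cor by default) using the file-name pattern shown in the tree.

```python
from pathlib import Path

def expected_files(base_dir, palette, module="cor", version="2016-12-01"):
    """Return the files the tree above implies for one module."""
    base = Path(base_dir)
    return [
        base / "gl" / module / f"gl-{module}-{version}.xsd",
        base / "gl" / module / "lang" / f"gl-{module}-{version}-label.xml",
        base / "gl" / "plt" / palette / f"gl-{module}-content-{version}.xsd",
    ]

def missing_files(base_dir, palette, module="cor"):
    """Return the expected files that do not exist on disk."""
    return [p for p in expected_files(base_dir, palette, module) if not p.exists()]

# Report anything missing for the case-c-b palette.
for path in missing_files("XBRL-GL-PWD-2016-12-01", "case-c-b"):
    print(f"missing: {path}")
```

An empty result means the core schema, its English label linkbase, and the palette content schema are all where the parser will look for them.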

6. Output

The script generates a CSV file like the following:

Level,Element,Type,Path,isTuple,minOccurs,maxOccurs,BaseType,Label,Documentation,LocalLabel,LocalDocumentation
1,accountingEntries,gl-cor:accountingEntriesComplexType,/gl-cor:accountingEntries,True,1,unbounded,,Accounting Entries,Root for XBRL GL. No entry made here.,【会計仕訳】,XBRL GLのルート要素。 この要素にはデータは登録されない。
2,gl-cor:documentInfo,gl-cor:documentInfoComplexType,/gl-cor:accountingEntries/gl-cor:documentInfo,True,1,1,,Document Information,Parent for descriptive information about the accountingEntries section in which it is contained.,【文書情報】,この会計仕訳に関する情報の親タグ。
3,gl-cor:entriesType,gl-gen:entriesTypeItemType,/gl-cor:accountingEntries/gl-cor:documentInfo/gl-cor:entriesType,False,1,1,xbrli:tokenItemType,Document Type,"account: information to fill in a chart of accounts file.  
balance: the results of accumulation of a complete and validated list of entries for an account (or a list of account) in a specific period - sometimes called general ledger  
entries: a list of individual accounting entries, which might be posted/validated or nonposted/validated   
journal: a self-balancing (Dr = Cr) list of entries for a specific period including beginning balance for that period.  
ledger: a complete list of entries for a specific account (or list of accounts) for a specific period; note - debits do not have to equal credits.   
assets: a listing of open receivables, payables, inventory, fixed assets or other information that can be extracted from but are not necessarily included as part of a journal entry.  
trialBalance: the self-balancing (Dr = Cr) result of accumulation of a complete and validated list of entries for the entity in a complete list of accounts in a specific period. 

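6.1. CSV columns

Column              Content
Level               Depth in the hierarchy (level counted from the root)
Element             Element name (QName, e.g. gl-cor:uniqueID)
Type                Type in the schema (e.g. gl-cor:uniqueIDItemType)
Path                Hierarchical path
isTuple             True if the element is a tuple
minOccurs           Minimum number of occurrences
maxOccurs           Maximum number of occurrences
BaseType            Base type (e.g. xbrli:stringItemType)
Label               English label from label.xml
Documentation       English documentation text
LocalLabel          Localized label (e.g. Japanese)
LocalDocumentation  Localized documentation text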
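
The columns above can be consumed with the standard csv module. The following is a minimal sketch under stated assumptions: the helper names load_rows and tuple_paths are made up for this example, and the two data rows are shortened versions of the sample output with the documentation columns left empty. When reading the real file, open it with encoding="utf-8-sig" and newline="", since the script writes it with a BOM.

```python
import csv
import io

def load_rows(stream):
    """Read the parser's CSV into a list of dicts keyed by column name."""
    return list(csv.DictReader(stream))

def tuple_paths(rows):
    """Return the Path of every row flagged as a tuple."""
    # The csv module stores the boolean as the text "True" or "False".
    return [row["Path"] for row in rows if row["isTuple"] == "True"]

# Shortened sample rows (documentation columns left empty on purpose).
sample = io.StringIO(
    "Level,Element,Type,Path,isTuple,minOccurs,maxOccurs,BaseType,"
    "Label,Documentation,LocalLabel,LocalDocumentation\n"
    "1,accountingEntries,gl-cor:accountingEntriesComplexType,"
    "/gl-cor:accountingEntries,True,1,unbounded,,Accounting Entries,,,\n"
    "3,gl-cor:entriesType,gl-gen:entriesTypeItemType,"
    "/gl-cor:accountingEntries/gl-cor:documentInfo/gl-cor:entriesType,"
    "False,1,1,xbrli:tokenItemType,Document Type,,,\n"
)

rows = load_rows(sample)
print(tuple_paths(rows))
```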

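6.2. Notes

  • An element is treated as a tuple when its complexType is based on anyType.

  • Localized labels are retrieved by specifying the language, for example --lang ja.

  • The script can be extended to support other taxonomies.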
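
The tuple rule in the first note can be illustrated on a small schema. This sketch is not the script's code: it uses the standard library's xml.etree.ElementTree rather than lxml, the schema fragment is hypothetical, and accepting a prefixed base such as xs:anyType is a relaxation added here as an assumption (the script compares against the bare string anyType).

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
NS = {"xs": XS}

# Hypothetical schema fragment: one tuple-like type, one item type.
SCHEMA = f"""
<schema xmlns="{XS}">
  <complexType name="documentInfoComplexType">
    <complexContent>
      <restriction base="anyType">
        <sequence/>
      </restriction>
    </complexContent>
  </complexType>
  <complexType name="entriesTypeItemType">
    <simpleContent>
      <restriction base="xbrli:tokenItemType"/>
    </simpleContent>
  </complexType>
</schema>
"""

def is_tuple_type(complex_type):
    """True when the type has complexContent derived from anyType."""
    content = complex_type.find("xs:complexContent", NS)
    if content is None:
        return False
    for tag in ("xs:restriction", "xs:extension"):
        inner = content.find(tag, NS)
        if inner is not None:
            # Assumption: also accept a prefixed form such as "xs:anyType".
            return (inner.get("base") or "").split(":")[-1] == "anyType"
    return False

root = ET.fromstring(SCHEMA)
result = {ct.get("name"): is_tuple_type(ct)
          for ct in root.findall("xs:complexType", NS)}
print(result)
```

Types with simpleContent carry a value and are therefore reported as non-tuples, which matches the isTuple column being False for item elements.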

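7. Related links

8. Feedback and questions

If you have questions or requests about applying the script or supporting other taxonomies, feel free to leave them in the comment section of this page. Suggestions for improvement and contributions are also welcome.

To review the script itself, see the source file below: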
SOURCE
Google Drive xbrl_gl_palette_parser.py

#!/usr/bin/env python3
# coding: utf-8
"""
xbrl_gl_palette_parser.py
Parses XBRL Global Ledger (XBRL GL) taxonomy and extracts labeled hierarchical element structures into CSV format.

Designed by SAMBUICHI, Nobuyuki (Sambuichi Professional Engineers Office)
Written by SAMBUICHI, Nobuyuki (Sambuichi Professional Engineers Office)

Creation Date: 2025-04-02

MIT License

(c) 2025 SAMBUICHI, Nobuyuki (Sambuichi Professional Engineers Office)

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

Usage:
    python xbrl_gl_palette_parser.py --base-dir <taxonomy-root-directory> [--palette <palette-subdir>] [--lang <language-code>] [--debug] [--trace] [--output <filename>]

Arguments:
    --base-dir     Required. Path to the root of the XBRL GL taxonomy (e.g., XBRL-GL-PWD-2016-12-01).
    --palette      Optional. Subdirectory name of the palette folder (default: case-c-b-m-u-e-t-s).
    --lang         Optional. Language code for multilingual labels. Default is 'ja'.
    --debug        Optional. Enables detailed debug output.
    --trace        Optional. Enables trace messages.
    --output       Optional. Filename for the output CSV (default: XBRL_GL_Parsed_LHM_Structure.csv).

Example:
    python xbrl_gl_palette_parser.py --base-dir XBRL-GL-PWD-2016-12-01 --palette case-c-b --lang ja --debug --output my_labels.csv
"""

import lxml.etree as ET
import os
import re
import csv
import argparse
from collections import defaultdict

TRACE = True
DEBUG = True

def trace_print(text):
    if TRACE or DEBUG:
        print(text)

def debug_print(text):
    if DEBUG:
        print(text)

# Helper to clean label IDs
def clean_label_id(label_id):
    label_id = re.sub(r"^label_", "", label_id)
    label_id = re.sub(r"(_lbl|_\d+(_\d+)?)$", "", label_id)
    return label_id

# Argument parser for base directory
parser = argparse.ArgumentParser(description="Parse XBRL-GL schemas and extract labeled hierarchy.")
parser.add_argument("--palette", type=str, default="case-c-b-m-u-e-t-s", help="Palette subdirectory under gl/plt/ (e.g. case-c-b or case-c-b-m-u-e-t-s)")
parser.add_argument("--base-dir", type=str, required=True, help="Base directory path to XBRL GL taxonomy, e.g. XBRL-GL-PWD-2016-12-01")
parser.add_argument("--debug", action="store_true", help="Enable debug output")
parser.add_argument("--trace", action="store_true", help="Enable trace output")
parser.add_argument("--lang", type=str, default="ja", help="Language code for local labels (e.g. 'ja', 'en')")
parser.add_argument("--output", type=str, default="XBRL_GL_Parsed_LHM_Structure.csv", help="Output CSV filename")

args = parser.parse_args()
base_dir = args.base_dir
palette = args.palette
DEBUG = args.debug
TRACE = args.trace
LANG = args.lang
output_filename = args.output

xsd_path = os.path.join(base_dir, f"gl/plt/{palette}/gl-cor-content-2016-12-01.xsd")
namespaces = {
    'xs': "http://www.w3.org/2001/XMLSchema",
    'xbrli': "http://www.xbrl.org/2003/instance"
}
modules = ['gen', 'cor', 'bus', 'muc', 'usk', 'ehm', 'taf', 'srcd']

# Load base schemas and build type maps
element_type_map = {}
type_base_map = {}
type_base_lookup = {}
complex_type_lookup = {}
for mod in modules:
    path = os.path.join(base_dir, f"gl/{mod}/gl-{mod}-2016-12-01.xsd")
    if os.path.exists(path):
        tree = ET.parse(path)
        root = tree.getroot()
        for el in root.xpath("//xs:element", namespaces=namespaces):
            name, type_ = el.get("name"), el.get("type")
            if name and type_:
                # debug_print(f"gl-{mod}:{name}")
                element_type_map[f"gl-{mod}:{name}"] = type_
        for tdef in root.xpath("//xs:simpleType | //xs:complexType", namespaces=namespaces):
            name = tdef.get("name")
            if name:
                # debug_print(name)
                complex_type_lookup[name] = tdef
                restriction = tdef.find(".//xs:restriction", namespaces)
                if restriction is not None:
                    base = restriction.get("base")
                    if base:
                        type_base_map[name] = base
                        type_base_lookup[name] = base
                extension = tdef.find(".//xs:extension", namespaces)
                if extension is not None:
                    base = extension.get("base")
                    if base:
                        type_base_map[name] = base
                        type_base_lookup[name] = base

# Load content schemas
content_roots = {}
for mod in modules:
    path = os.path.join(base_dir, f"gl/plt/{palette}/gl-{mod}-content-2016-12-01.xsd")
    if os.path.exists(path):
        # Parse each content schema once and reuse the root
        root = ET.parse(path).getroot()
        content_roots[mod] = root
        for el in root.xpath("//xs:element", namespaces=namespaces):
            name, type_ = el.get("name"), el.get("type")
            if name and type_:
                # debug_print(f"gl-{mod}:{name}")
                element_type_map[f"gl-{mod}:{name}"] = type_
        for tdef in root.xpath("//xs:simpleType | //xs:complexType", namespaces=namespaces):
            name = tdef.get("name")
            if name:
                # debug_print(name)
                complex_type_lookup[name] = tdef
                restriction = tdef.find(".//xs:restriction", namespaces)
                if restriction is not None:
                    base = restriction.get("base")
                    if base:
                        type_base_map[name] = base
                        type_base_lookup[name] = base
                extension = tdef.find(".//xs:extension", namespaces)
                if extension is not None:
                    base = extension.get("base")
                    if base:
                        type_base_map[name] = base
                        type_base_lookup[name] = base

# Load label linkbases (EN and JA)
def load_labels(mod, lang):
    label_map = defaultdict(dict)
    suffix = "label.xml" if lang == "en" else f"label-{lang}.xml"
    path = os.path.join(base_dir, f"gl/{mod}/lang/gl-{mod}-2016-12-01-{suffix}")
    if not os.path.exists(path):
        return label_map
    tree = ET.parse(path)
    root = tree.getroot()
    ns = {'link': 'http://www.xbrl.org/2003/linkbase', 'xlink': 'http://www.w3.org/1999/xlink'}

    locator_map = {}
    label_resources = {}

    # Map locator label -> href target
    for loc in root.xpath(".//link:loc", namespaces=ns):
        label_id = loc.get("{http://www.w3.org/1999/xlink}label")
        href = loc.get("{http://www.w3.org/1999/xlink}href")
        # Check before splitting: href may be missing or lack a fragment
        if label_id and href and '#' in href:
            locator_map[label_id] = href.split("#", 1)[1]

    # Collect label resources
    for label in root.xpath(".//link:label", namespaces=ns):
        label_id = label.get("{http://www.w3.org/1999/xlink}label")
        # Fall back to an empty string so endswith() is safe when the role is absent
        role = label.get("{http://www.w3.org/1999/xlink}role") or ""
        label_text = label.text.strip() if label.text else ""
        if label_id not in label_resources:
            label_resources[label_id] = {}
        if role.endswith("label"):
            label_resources[label_id]["label"] = label_text
        elif role.endswith("documentation"):
            label_resources[label_id]["documentation"] = label_text


    # Resolve labelArcs and map labels to href anchors
    for arc in root.xpath(".//link:labelArc", namespaces=ns):
        from_label = arc.get("{http://www.w3.org/1999/xlink}from")
        to_label = arc.get("{http://www.w3.org/1999/xlink}to")
        href = locator_map.get(from_label)
        label = label_resources.get(to_label)
        if href and label is not None:
            # 'label' is a plain dict of collected texts keyed by 'label' / 'documentation'
            if lang == "en":
                if "label" in label:
                    label_map[href]["label"] = label["label"]
                if "documentation" in label:
                    label_map[href]["documentation"] = label["documentation"]
            else:
                if "label" in label:
                    label_map[href][f"label_{lang}"] = label["label"]
                if "documentation" in label:
                    label_map[href][f"documentation_{lang}"] = label["documentation"]

    return label_map

label_texts = defaultdict(dict)
for mod in modules:
    labels = [load_labels(mod, "en")]
    if LANG != "en":
        labels.append(load_labels(mod, LANG))
    for label_map in labels:
        for k, v in label_map.items():
            label_texts[k].update(v)

# Helpers
def is_tuple_type(complex_type_element):
    if complex_type_element is None:
        return False
    if complex_type_element.find("xs:simpleContent", namespaces) is not None:
        return False
    complex_content = complex_type_element.find("xs:complexContent", namespaces)
    if complex_content is not None:
        for tag in ["xs:restriction", "xs:extension"]:
            inner = complex_content.find(tag, namespaces)
            if inner is not None:
                base = inner.get("base")
                return base == "anyType"
    return False

def resolve_base_type(type_str):
    type_name = type_str.split(":")[-1]
    return type_base_lookup.get(type_name, "")

# Traversal
records = []
def process_sequence(seq, _type, module, path, base, namespaces):
    debug_print(f" - Processing xs:sequence in path: /{path}")
    for el in seq.findall("xs:element", namespaces=namespaces):
        ref = el.get("ref")
        name = el.get("name")
        el_name = ref or name
        el_type = element_type_map.get(el_name, "")
        type_name = el_type.split(":")[-1]
        complex_type = complex_type_lookup.get(type_name)
        is_tuple = False
        if complex_type is not None:
            is_tuple = is_tuple_type(complex_type)

        path_str = f"gl-{module}:{path}" if "gl-" not in path else path
        new_path = f"{path_str}/{el_name}"
        min_occurs = el.get("minOccurs", "1")
        max_occurs = el.get("maxOccurs", "1")
        base_type = resolve_base_type(el_type) if not is_tuple and el_type else ""
        level = 1 + new_path.count("/")

        raw_key = el_name.replace(":", "_")
        label_info = label_texts.get(raw_key, {})

        record = {
            "Level": level,
            "Element": el_name,
            "Type": el_type,
            "Path": f"/{new_path}",
            "isTuple": is_tuple,
            "minOccurs": min_occurs,
            "maxOccurs": max_occurs,
            "BaseType": base_type,
            "Label": label_info.get("label", ""),
            "Documentation": label_info.get("documentation", ""),
            # Use the language selected with --lang instead of a fixed 'ja' key
            "LocalLabel": label_info.get(f"label_{LANG}", ""),
            "LocalDocumentation": label_info.get(f"documentation_{LANG}", "")
        }
        records.append(record)
        if not el_type:
            continue
        type_name = el_type.split(":")[-1]
        if is_tuple:
            mod = el_type.split(":")[0][3:]
            for _path in [
                os.path.join(base_dir, f"gl/{mod}/gl-{mod}-2016-12-01.xsd"),
                os.path.join(base_dir, f"gl/plt/{palette}/gl-{mod}-content-2016-12-01.xsd")
            ]:
                if os.path.exists(_path):
                    tree = ET.parse(_path)
                    nested = tree.xpath(f".//xs:complexType[@name='{type_name}']", namespaces=namespaces)
                    if nested:
                        walk_complex_type(type_name, nested[0], "tuple", mod, new_path, namespaces)
                        break

def walk_complex_type(name, element, _type, module, path, namespaces):
    if ":" not in path:
        trace_print(f"Walking {_type} type '{name}' at path: /gl-{module}:{path}")
    else:
        trace_print(f"Walking {_type}: '{name}' at path: /{path}")
    sequence = element.find("xs:sequence", namespaces)
    if sequence is not None:
        process_sequence(sequence, _type, module, path, name, namespaces)
        return
    complex_content = element.find("xs:complexContent", namespaces)
    if complex_content is not None:
        for tag in ["xs:restriction", "xs:extension"]:
            inner = complex_content.find(tag, namespaces)
            if inner is not None:
                base = inner.get("base")
                seq = inner.find("xs:sequence", namespaces)
                if seq is not None:
                    process_sequence(seq, _type, module, path, base, namespaces)
                return

# Start with root complexType
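# Guard against a missing gl-cor content schema instead of raising KeyError
root = content_roots.get("cor")
complex_type_list = root.xpath(".//xs:complexType[@name='accountingEntriesComplexType']", namespaces=namespaces) if root is not None else []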
if complex_type_list:
    href = "gl-cor_accountingEntries"
    record = {
        "Level": 1,
        "Element": "accountingEntries",
        "Type": "gl-cor:accountingEntriesComplexType",
        "Path": "/gl-cor:accountingEntries",
        "isTuple": True,
        "minOccurs": "1",
        "maxOccurs": "unbounded",
        "BaseType": "",
        "Label": label_texts[href].get("label", ""),
        "Documentation": label_texts[href].get("documentation", ""),
        # Use the language selected with --lang instead of a fixed 'ja' key
        "LocalLabel": label_texts[href].get(f"label_{LANG}", ""),
        "LocalDocumentation": label_texts[href].get(f"documentation_{LANG}", "")
    }
    records.append(record)
    
    walk_complex_type("accountingEntriesComplexType", complex_type_list[0], "tuple", "cor", "accountingEntries", namespaces)
else:
    print("❌ Not found: accountingEntriesComplexType")

# Output to CSV
output_dir = "XBRL-GL-2025"
os.makedirs(output_dir, exist_ok=True)
output_file = os.path.join(output_dir, output_filename)

with open(output_file, mode='w', newline='', encoding='utf-8-sig') as f:
    if records:
        writer = csv.DictWriter(f, fieldnames=records[0].keys())
        writer.writeheader()
        writer.writerows(records)
    else:
        print("⚠️ No records to write.")

print(f"\n✅ Saved parsed structure to: {output_file}")
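
The listing defines clean_label_id but does not call it anywhere, so its effect is easiest to see in isolation. The sketch below copies the helper's two substitutions unchanged and applies them to hypothetical identifiers, which are assumptions chosen to exercise each rule and are not taken from the taxonomy files.

```python
import re

def clean_label_id(label_id):
    """Strip a 'label_' prefix and a trailing '_lbl' or numeric suffix."""
    # Same two substitutions as the helper in the listing.
    label_id = re.sub(r"^label_", "", label_id)
    label_id = re.sub(r"(_lbl|_\d+(_\d+)?)$", "", label_id)
    return label_id

# Hypothetical identifiers, one per rule.
samples = [
    "label_gl-cor_accountingEntries",   # prefix only
    "gl-cor_accountingEntries_lbl",     # '_lbl' suffix
    "gl-cor_entryNumber_2_1",           # numeric suffix with two parts
]
cleaned = [clean_label_id(s) for s in samples]
for raw, out in zip(samples, cleaned):
    print(raw, "->", out)
```

Each cleaned value has the module_element form that the script uses as its label lookup key (for example gl-cor_accountingEntries), which suggests the helper was meant as a fallback for label identifiers that do not match directly.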

