Search Posts

Visits: 275

“デジタル会計帳簿の Tidy data (整然としたデータ)アプローチ” で紹介したTidy dataを使用するとXMLと互換性のあるCSVファイルが作成可能です。
今回は、UBL 2.1のXML文書をTidy data形式のCSVファイルと相互変換するPythonプログラムも紹介します。
使用している自作ライブラリについては、”JP PINT 0.9.3 スキーマトロンファイルの解析プログラム” の”XMLとPython dictのデータ変換ライブラリ”をお読みください。dic2etreeライブラリは、 stackoverflow converting xml to dictionary using elementtree を参考に作成しました。
CSVは、Tidy data表現を使用すると構造を含めたデータを一枚のシートで表現することができます。
リレーショナルデータベースのテーブルの形式にとらわれないTidy data形式は、すべてのデータをメモリ上で処理する際にも効果的な形式です。

JP PINT 0.9.3からダウンロードしたデジタルインボイスのTidy data表現

JP PINT 0.9.3のDownload resourcesに含まれているデジタルインボイスの例をTidy data形式にしたものをExample.xlsxにまとめました。
今回紹介するJapan PINT Invoice UBL Example1-minimum.xmlは、UBL 2.1のデジタルインボイスです。
が、よく見ると記載された金額に誤りがあります。エラー箇所については最後に紹介しています。

<?xml version="1.0" encoding="UTF-8"?>
<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2"
  xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
  xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
  xmlns:ccts="urn:un:unece:uncefact:documentation:2"
  xmlns:ext="urn:oasis:names:specification:ubl:schema:xsd:CommonExtensionComponents-2"
  xmlns:qdt="urn:oasis:names:specification:ubl:schema:xsd:QualifiedDatatypes-2"
  xmlns:udt="urn:un:unece:uncefact:data:specification:UnqualifiedDataTypesSchemaModule:2"
  xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2
  http://docs.oasis-open.org/ubl/os-UBL-2.1/xsd/maindoc/UBL-Invoice-2.1.xsd">
  <!-- Japan common commercial invoice, example1-minimum -->
  <cbc:UBLVersionID>2.1</cbc:UBLVersionID>
  <cbc:CustomizationID>urn:fdc:peppol:jp:billing:3.0</cbc:CustomizationID> <!--IBT-024 - Specification identifier -->
  <cbc:ProfileID>urn:fdc:peppol.eu:2017:poacc:billing:01:1.0</cbc:ProfileID> <!--IBT-023 - Business process type -->
  <cbc:ID>156</cbc:ID> <!--IBT-001 - Invoice number -->
  <cbc:IssueDate>2023-10-24</cbc:IssueDate> <!--IBT-002 - Invoice issue date -->
  <cbc:InvoiceTypeCode>380</cbc:InvoiceTypeCode> <!--IBT-003 - Invoice type code -->
  <cbc:DocumentCurrencyCode>JPY</cbc:DocumentCurrencyCode> <!--IBT-005 - Invoice currency code -->
  <cac:InvoicePeriod> <!--IBG-14 - INVOICING PERIOD -->
    <cbc:StartDate>2023-10-18</cbc:StartDate> <!--IBT-073 - Invoicing period start date -->
    <cbc:EndDate>2023-10-18</cbc:EndDate> <!--IBT-074 - Invoicing period end date -->
  </cac:InvoicePeriod>
  <cac:AccountingSupplierParty> <!--IBG-04 - SELLER -->
    <cac:Party>
      <cbc:EndpointID schemeID="0188">1234567890123</cbc:EndpointID> <!--IBT-034 - Seller electronic address, IBT-034-1 - Scheme identifier -->
      <cac:PostalAddress> <!--IBG-05 - SELLER POSTAL ADDRESS -->
        <cac:Country>
          <cbc:IdentificationCode>JP</cbc:IdentificationCode> <!--IBT-040 - Seller country code -->
        </cac:Country>
      </cac:PostalAddress>
      <cac:PartyTaxScheme>
        <cbc:CompanyID>T1234567890123</cbc:CompanyID> <!--IBT-031 - Seller TAX identifier -->
        <cac:TaxScheme>
          <cbc:ID>VAT</cbc:ID> <!--IBT-031, qualifier -->
        </cac:TaxScheme>
      </cac:PartyTaxScheme>
      <cac:PartyLegalEntity>
        <cbc:RegistrationName>株式会社 〇〇商事</cbc:RegistrationName> <!--IBT-027 - Seller name -->
      </cac:PartyLegalEntity>
    </cac:Party>
  </cac:AccountingSupplierParty>
  <cac:AccountingCustomerParty> <!--IBG-07 - BUYER -->
    <cac:Party>
      <cbc:EndpointID schemeID="0188">3210987654321</cbc:EndpointID> <!--IBT-049 - Buyer electronic address, IBT-049-1 - Scheme identifier -->
      <cac:PostalAddress> <!--IBG-08 - BUYER POSTAL ADDRESS -->
        <cac:Country>
          <cbc:IdentificationCode>JP</cbc:IdentificationCode> <!--IBT-055 - Buyer country code -->
        </cac:Country>
      </cac:PostalAddress>
      <cac:PartyLegalEntity>
        <cbc:RegistrationName>株式会社 〇〇物産</cbc:RegistrationName> <!--IBT-044 - Buyer name -->
      </cac:PartyLegalEntity>
    </cac:Party>
  </cac:AccountingCustomerParty>
  <cac:TaxTotal>
    <cbc:TaxAmount currencyID="JPY"> <!--26000-->20000</cbc:TaxAmount> <!--IBT-110 - Invoice total TAX amount -->
    <cac:TaxSubtotal> <!--IBG-23 - TAX BREAKDOWN -->
      <cbc:TaxableAmount currencyID="JPY"> <!--260000-->200000</cbc:TaxableAmount> <!--IBT-116 - TAX category taxable amount -->
      <cbc:TaxAmount currencyID="JPY"> <!--26000-->20000</cbc:TaxAmount> <!--IBT-117 - TAX category tax amount -->
      <cac:TaxCategory>
        <cbc:ID>S</cbc:ID> <!--IBT-118 - TAX category code -->
        <cbc:Percent>10</cbc:Percent> <!--IBT-119 - TAX category rate -->
        <cac:TaxScheme>
          <cbc:ID>VAT</cbc:ID> <!--IBT-118, qualifier -->
        </cac:TaxScheme>
      </cac:TaxCategory>
    </cac:TaxSubtotal>
    <cac:TaxSubtotal> <!--IBG-23 - TAX BREAKDOWN -->
      <cbc:TaxableAmount currencyID="JPY">3490</cbc:TaxableAmount> <!--IBT-116 - TAX category taxable amount -->
      <cbc:TaxAmount currencyID="JPY">0</cbc:TaxAmount> <!--IBT-117 - TAX category tax amount -->
      <cac:TaxCategory>
        <cbc:ID>E</cbc:ID> <!--IBT-118 - TAX category code -->
        <cbc:Percent>0</cbc:Percent> <!--IBT-119 - TAX category rate -->
        <cac:TaxScheme>
          <cbc:ID>VAT</cbc:ID> <!--IBT-118, qualifier -->
        </cac:TaxScheme>
      </cac:TaxCategory>
    </cac:TaxSubtotal>
  </cac:TaxTotal>
  <cac:LegalMonetaryTotal> <!--IBG-22 - DOCUMENT TOTALS -->
    <cbc:LineExtensionAmount currencyID="JPY">255990</cbc:LineExtensionAmount> <!--IBT-106 - Sum of Invoice line net amount -->
    <cbc:TaxExclusiveAmount currencyID="JPY">255990</cbc:TaxExclusiveAmount> <!--IBT-109 - Invoice total amount without TAX -->
    <cbc:TaxInclusiveAmount currencyID="JPY">281990</cbc:TaxInclusiveAmount> <!--IBT-112 - Invoice total amount with TAX -->
    <cbc:AllowanceTotalAmount currencyID="JPY">0</cbc:AllowanceTotalAmount> <!--IBT-107 - Sum of allowances on document level -->
    <cbc:ChargeTotalAmount currencyID="JPY">0</cbc:ChargeTotalAmount> <!--IBT-108 - Sum of charges on document level -->
    <cbc:PrepaidAmount currencyID="JPY">0</cbc:PrepaidAmount> <!--IBT-113 - Paid amount -->
    <cbc:PayableRoundingAmount currencyID="JPY">0</cbc:PayableRoundingAmount> <!--IBT-114 - Rounding amount -->
    <cbc:PayableAmount currencyID="JPY">281990</cbc:PayableAmount> <!--IBT-115 - Amount due for payment -->
  </cac:LegalMonetaryTotal>
  <cac:InvoiceLine> <!--IBG-25 - INVOICE LINE -->
    <cbc:ID>1</cbc:ID> <!--IBT-126 - Invoice line identifier -->
    <cbc:InvoicedQuantity unitCode="H87">5</cbc:InvoicedQuantity> <!--IBT-129 - Invoiced quantity, IBT-130 - Invoiced quantity unit of measure code -->
    <cbc:LineExtensionAmount currencyID="JPY">250000</cbc:LineExtensionAmount> <!--IBT-131 - Invoice line net amount -->
    <cac:InvoicePeriod> <!--IBG-26 - INVOICE LINE PERIOD -->
      <cbc:StartDate>2023-10-18</cbc:StartDate> <!--IBT-134 - Invoice line period start date -->
      <cbc:EndDate>2023-10-18</cbc:EndDate> <!--IBT-135 - Invoice line period end date -->
    </cac:InvoicePeriod>
    <cac:Item> <!--IBG-31 - ITEM INFORMATION -->
      <cbc:Name>デスクチェア</cbc:Name> <!--IBT-153 - Item name -->
      <cac:ClassifiedTaxCategory> <!--IBG-30 - LINE TAX INFORMATION -->
        <cbc:ID>S</cbc:ID> <!--IBT-151 - Invoiced item TAX category code -->
        <cbc:Percent>10</cbc:Percent> <!--IBT-152 - Invoiced item TAX rate -->
        <cac:TaxScheme>
          <cbc:ID>VAT</cbc:ID> <!--IBT-167 - Tax Scheme -->
        </cac:TaxScheme>
      </cac:ClassifiedTaxCategory>
    </cac:Item>
    <cac:Price> <!--IBG-29 - PRICE DETAILS -->
      <cbc:PriceAmount currencyID="JPY">50000</cbc:PriceAmount> <!--IBT-146 - Item net price -->
      <cbc:BaseQuantity unitCode="H87">1</cbc:BaseQuantity> <!--IBT-149 - Item price base quantity, IBT-150 - Item price base quantity unit of measure code -->
    </cac:Price>
  </cac:InvoiceLine>
  <cac:InvoiceLine> <!--IBG-25 - INVOICE LINE -->
    <cbc:ID>2</cbc:ID> <!--IBT-126 - Invoice line identifier -->
    <cbc:InvoicedQuantity unitCode="H87">5</cbc:InvoicedQuantity> <!--IBT-130 - Invoiced quantity unit of measure code, IBT-129 - Invoiced quantity -->
    <cbc:LineExtensionAmount currencyID="JPY">2500</cbc:LineExtensionAmount> <!--IBT-131 - Invoice line net amount -->
    <cac:InvoicePeriod> <!--IBG-26 - INVOICE LINE PERIOD -->
      <cbc:StartDate>2023-10-18</cbc:StartDate> <!--IBT-134 - Invoice line period start date -->
      <cbc:EndDate>2023-10-18</cbc:EndDate> <!--IBT-135 - Invoice line period end date -->
    </cac:InvoicePeriod>
    <cac:Item> <!--IBG-31 - ITEM INFORMATION -->
      <cbc:Name>コピー用紙(A4)</cbc:Name> <!--IBT-153 - Item name -->
      <cac:ClassifiedTaxCategory> <!--IBG-30 - LINE TAX INFORMATION -->
        <cbc:ID>S</cbc:ID> <!--IBT-151 - Invoiced item TAX category code -->
        <cbc:Percent>10</cbc:Percent> <!--IBT-152 - Invoiced item TAX rate -->
        <cac:TaxScheme>
          <cbc:ID>VAT</cbc:ID> <!--IBT-167 - Tax Scheme -->
        </cac:TaxScheme>
      </cac:ClassifiedTaxCategory> <!--IBG-32 - ITEM ATTRIBUTES -->
    </cac:Item>
    <cac:Price> <!--IBG-29 - PRICE DETAILS -->
      <cbc:PriceAmount currencyID="JPY">500</cbc:PriceAmount> <!--IBT-146 - Item net price -->
      <cbc:BaseQuantity unitCode="H87">1</cbc:BaseQuantity> <!--IBT-149 - Item price base quantity, IBT-150 - Item price base quantity unit of measure code -->
    </cac:Price>
  </cac:InvoiceLine>
  <cac:InvoiceLine> <!--IBG-25 - INVOICE LINE -->
    <cbc:ID>3</cbc:ID> <!--IBT-126 - Invoice line identifier -->
    <cbc:InvoicedQuantity unitCode="H87">10</cbc:InvoicedQuantity> <!--IBT-130 - Invoiced quantity unit of measure code, IBT-129 - Invoiced quantity -->
    <cbc:LineExtensionAmount currencyID="JPY">3490</cbc:LineExtensionAmount> <!--IBT-131 - Invoice line net amount -->
    <cac:InvoicePeriod> <!--IBG-26 - INVOICE LINE PERIOD -->
      <cbc:StartDate>2023-10-18</cbc:StartDate> <!--IBT-134 - Invoice line period start date -->
      <cbc:EndDate>2023-10-18</cbc:EndDate> <!--IBT-135 - Invoice line period end date -->
    </cac:InvoicePeriod>
    <cac:Item> <!--IBG-31 - ITEM INFORMATION -->
      <cbc:Name>検定済教科書(算数)</cbc:Name> <!--IBT-153 - Item name -->
      <cac:ClassifiedTaxCategory> <!--IBG-30 - LINE TAX INFORMATION -->
        <cbc:ID>E</cbc:ID> <!--IBT-151 - Invoiced item TAX category code -->
        <cbc:Percent>0</cbc:Percent> <!--IBT-152 - Invoiced item TAX rate -->
        <cac:TaxScheme>
          <cbc:ID>VAT</cbc:ID> <!--IBT-167 - Tax Scheme -->
        </cac:TaxScheme>
      </cac:ClassifiedTaxCategory>
    </cac:Item>
    <cac:Price> <!--IBG-29 - PRICE DETAILS -->
      <cbc:PriceAmount currencyID="JPY">349</cbc:PriceAmount> <!--IBT-146 - Item net price -->
      <cbc:BaseQuantity unitCode="H87">1</cbc:BaseQuantity> <!--IBT-149 - Item price base quantity, IBT-150 - Item price base quantity unit of measure code -->
    </cac:Price>
  </cac:InvoiceLine>
</Invoice>

このUBL2.1で定義されたXML文書に対応したTidy data形式の表を次に示します。

G00

G23

G25

ibt-001

ibt-002

ibt-003

ibt-005

ibt-023

ibt-024

ibt-027

ibt-031

ibt-034

ibt-034-1

ibt-040

ibt-044

ibt-049

ibt-049-1

ibt-055

ibt-073

ibt-074

ibt-106

ibt-107

ibt-108

ibt-109

ibt-110

ibt-112

ibt-113

ibt-114

ibt-115

ibt-116

ibt-117

ibt-118

ibt-119

ibt-126

ibt-129

ibt-130

ibt-131

ibt-134

ibt-135

ibt-146

ibt-149

ibt-150

ibt-151

ibt-152

ibt-167

ibt-153

156

156

2023-10-24

380

JPY

urn:fdc:peppol.eu:2017:poacc:billing:01:1.0

urn:fdc:peppol:jp:billing:3.0

株式会社 〇〇商事

T1234567890123

1234567890123

0188

JP

株式会社 〇〇物産

3210987654321

0188

JP

2023-10-18

2023-10-18

255990

0

0

255990

20000

281990

0

0

281990

156

S10

200000

20000

S

10

156

E0

3490

0

E

0

156

1

1

5

H87

250000

2023-10-18

2023-10-18

50000

1

H87

S

10

VAT

デスクチェア

156

2

2

5

H87

2500

2023-10-18

2023-10-18

500

1

H87

S

10

VAT

コピー用紙(A4)

156

3

3

10

H87

3490

2023-10-18

2023-10-18

349

1

H87

E

0

VAT

検定済教科書(算数)

G00は、インボイスに対応しておりインボイス番号156を記載しています。
G23は、TAX BREAKDOWNに対応して標準税率(S)の10%をS10、非課税(E)の0%をE0と区別できるように記載しています。
G25には、明細行の番号が1,2,3と記載されています。それ以降の欄には、項目に対応する値を記載しています。

目視確認のために、縦と横を転置した表を次にしまします。
こちらのほうが人には分かりやすいと思いますが、コンピュータ処理には上のTidy data形式が最適です。

この他のDownload resourcesに含まれているデジタルインボイスの例も併せてExample.xlsxにまとめましたのでご確認ください。

ibg-23

ibg-23

ibg-25

ibg-25

ibg-25

TAX BREAKDOWN

TAX BREAKDOWN

INVOICE LINE

INVOICE LINE

INVOICE LINE

0

1

0

1

2

ibt-001

1

1..1

Invoice number

156

ibt-002

1

1..1

Invoice issue date

2023-10-24

ibt-003

1

1..1

Invoice type code

380

ibt-005

1

1..1

Invoice currency code

JPY

ibt-023

2

1..1

Business process type

urn:fdc:peppol.eu:2017:poacc:billing:01:1.0

ibt-024

2

1..1

Specification identifier

urn:fdc:peppol:jp:billing:3.0

ibg-04

1

1..1

SELLER

ibt-027

2

1..1

Seller name

株式会社 〇〇商事

ibt-031

2

0..1

Seller TAX identifier

T1234567890123

ibt-034

2

1..1

Seller electronic address

1234567890123

ibt-034-1

3

1..1

Scheme identifier

0188

ibg-05

2

1..1

SELLER POSTAL ADDRESS

ibt-040

3

1..1

Seller country code

JP

ibg-07

1

1..1

BUYER

ibt-044

2

1..1

Buyer name

株式会社 〇〇物産

ibt-049

2

1..1

Buyer electronic address

3210987654321

ibt-049-1

3

1..1

Scheme identifier

0188

ibg-08

2

1..1

BUYER POSTAL ADDRESS

ibt-055

3

1..1

Buyer country code

JP

ibg-13

1

0..1

DELIVERY INFORMATION

ibg-14

2

0..1

INVOICING PERIOD

ibt-073

3

0..1

Invoicing period start date

2023-10-18

ibt-074

3

0..1

Invoicing period end date

2023-10-18

ibg-22

1

1..1

DOCUMENT TOTALS

ibt-106

2

1..1

Sum of Invoice line net amount

255990

ibt-107

2

0..1

Sum of allowances on document level

0

ibt-108

2

0..1

Sum of charges on document level

0

ibt-109

2

1..1

Invoice total amount without TAX

255990

ibt-110

2

1..1

Invoice total TAX amount

20000

ibt-112

2

1..1

Invoice total amount with TAX

281990

ibt-113

2

0..1

Paid amount

0

ibt-114

2

0..1

Rounding amount

0

ibt-115

2

1..1

Amount due for payment

281990

ibg-23

1

1..n

TAX BREAKDOWN

ibt-116

2

1..1

TAX category taxable amount

200000

3490

ibt-117

2

1..1

TAX category tax amount

20000

0

ibt-118

2

1..1

TAX category code

S

E

ibt-119

2

0..1

TAX category rate

10

0

ibg-25

1

1..n

INVOICE LINE

ibt-126

2

1..1

Invoice line identifier

1

2

3

ibt-129

2

1..1

Invoiced quantity

5

5

10

ibt-130

2

1..1

Invoiced quantity unit of measure code

H87

H87

H87

ibt-131

2

1..1

Invoice line net amount

250000

2500

3490

ibg-26

2

0..1

INVOICE LINE PERIOD

ibt-134

3

0..1

Invoice line period start date

2023-10-18

2023-10-18

2023-10-18

ibt-135

3

0..1

Invoice line period end date

2023-10-18

2023-10-18

2023-10-18

ibg-29

2

1..1

PRICE DETAILS

ibt-146

3

1..1

Item net price

50000

500

349

ibt-149

3

0..1

Item price base quantity

1

1

1

ibt-150

3

0..1

Item price base quantity unit of measure code

H87

H87

H87

ibg-30

2

1..n

LINE TAX INFORMATION

ibt-151

3

1..1

Invoiced item TAX category code

S

S

E

ibt-152

3

0..1

Invoiced item TAX rate

10

10

0

ibt-167

3

0..1

Tax Scheme

VAT

VAT

VAT

ibg-31

2

1..1

ITEM INFORMATION

ibt-153

3

1..1

Item name

デスクチェア

コピー用紙(A4)

検定済教科書(算数)

デジタルインボイスXML文書からTidy data形式のCSVを生成するプログラム

プログラムの処理の流れは、次の順序です。
1. JP PINT 0.9.3の定義表(Excel)からPINT変換辞書データを作成しておく。
2. Element Treeライブラリを使用してXML文書を読み込み、Element Treeデータ形式に展開する。
3. PINT変換辞書データをSemanticソート番号順に読み込み、項目ごとに定義されたXPathを使ってElement Treeの関数を使用して該当データの値を取得する。
4. データの値をTidy data形式のPythonの辞書データに登録する。
5. Tidy data形式のPythonの辞書データを転置形式のCSVとして出力する。
6. Tidy data形式のPythonの辞書データをxBRL-GD定義のメタデータファイル(JSON)およびCSVデータファイルに出力する。

invoice2oim.py
#!/usr/bin/env python3
#coding: utf-8
#
# generate CSV and OIM-CSV from Open Peoopl e-Invoice (UBL 2.1)
#
# designed by SAMBUICHI, Nobuyuki (Sambuichi Professional Engineers Office)
# written by SAMBUICHI, Nobuyuki (Sambuichi Professional Engineers Office)
#
# MIT License
#
# (c) 2021-2022 SAMBUICHI Nobuyuki (Sambuichi Professional Engineers Office)
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
from termios import VERASE
import xml.etree.ElementTree as ET
import csv
import re
import sys
import json
import os
import argparse
import collections

from dic2etree import *

DEBUG = None

ET.register_namespace('', ns[''])
ET.register_namespace('xsd', ns['xsd'])
ET.register_namespace('xsi', ns['xsi'])
ET.register_namespace('cac', ns['cac'])
ET.register_namespace('cbc', ns['cbc'])
ET.register_namespace('qdt', ns['qdt'])
ET.register_namespace('udt', ns['udt'])
ET.register_namespace('ccts', ns['ccts'])
ET.register_namespace('cn', ns['cn'])
ET.register_namespace('ubl', ns['ubl'])

SEP = os.sep

invoiceNumber = ''
DocumentCurrencyCode = ''
TaxCurrencyCode = ''
SupplierTaxScheme = ''
pintList = []
pintDict = {}
pintSemSort = {}
pintMap = {}

root = None

bough0 = {'level': 0, 'id': 'ibg-00', 'count': ''}
boughs = {}
boughs[0] = [bough0]

def file_path(pathname):
    if '/' == pathname[0:1]:
        return pathname
    else:
        dir = os.path.dirname(__file__)
        new_path = os.path.join(dir, pathname)
        return new_path

def formatXPath(xpath):
    global id
    if '[not(@schemeID="SEPA")]' in xpath: # ETree doesn't supprt 'not'. Assume we don't use SEPA in Japan
        xpath = xpath.replace('[not(@schemeID="SEPA")]','')
    if re.match(r'.*\[cac:TaxScheme/cbc:ID.*\]', xpath):
        xpath = re.sub(r'\[cac:TaxScheme/cbc:ID[ ]*=[ ]*(.*)\]','/cac:TaxScheme[cbc:ID=\\1]/..', xpath)
        xpath = re.sub(r'\[cac:TaxScheme/cbc:ID[ ]*!=[ ]*(.*)\]','/cac:TaxScheme[cbc:ID!=\\1]/..', xpath)
        # if VERBOSE: print(f'formatXPath xpath={xpath}')
    if re.match(r'.*\[cbc:TaxAmount/@currencyID', xpath):
        xpath = re.sub(r'\[cbc:TaxAmount/@currencyID[ ]*=[ ]*(.*)\]','/cbc:TaxAmount[@currencyID=\\1]/..', xpath)
    xpath = re.sub(r'/Invoice/cbc:DocumentCurrencyCode(/text\(\))?','"'+DocumentCurrencyCode+'"', xpath)
    if TaxCurrencyCode and re.match(r'.*/Invoice/cbc:TaxCurrencyCode(/text\(\))?', xpath):
        xpath = re.sub(r'/Invoice/cbc:TaxCurrencyCode(/text\(\))?','"'+TaxCurrencyCode+'"', xpath)
    xpath = re.sub(r'/Invoice', '/{'+ns['ubl']+'}Invoice', xpath)
    xpath = re.sub(r'cac:', '{'+ns['cac']+'}', xpath)
    xpath = re.sub(r'cbc:', '{'+ns['cbc']+'}', xpath)
    if re.match(r'^.*/@[a-zA-Z]*$', xpath):
        path = re.sub(r'(.*)/@.*$', '\\1', xpath)
        attr = re.sub(r'.*/@(.*)$', '\\1', xpath)
        xpath = [path, attr]
    else:
        xpath = re.sub(r'false\(\)', "'false'", xpath)
        xpath = re.sub(r'true\(\)', "'true'", xpath)
        # if DEBUG: print(f'{getframeinfo(currentframe()).lineno:03d}: xpath={xpath}')
    return xpath

def updateBough(i, count):
    global rows
    global boughs
    global n
    data = pintList[i]
    if '1'==data['card'][-1:]:
        return
    id = data['id']
    BT = data['BT']
    level = boughLevel[id]
    # if DEBUG:
    #     print(f'{getframeinfo(currentframe()).lineno:03d}: n={n:02d} i={i:03d} updateBough rows[{n}] {id} {rows[n]["0001"]}({rows[n]["0000"]})\n boughs[{n}] {boughs[n]}')
    n = len(rows) - 1
    bough = boughs[n]
    bough_id = bough[-1:][0]['id']
    parentList = parents[id]  # [1:]
    if parentList and len(parentList) > 0:
        if '-' == id[-2:-1]:
            parent_id = parentList[1:2][0]
            parent_BT = [v['BT'] for v in list(pintList) if parent_id == v['id']][0]
        else:
            parent_id = id
            parent_BT = BT
    else:
        parent_id = id
        parent_BT = BT
    if id in L1multipleBG or 'ibg-38' == id or n > 0:
        n = n + 1
    if n > 0:
        idx = n - 1
    else:
        idx = 0
    _boughs = boughs[idx]
    if 0 == len(parentList):
        _boughs = _boughs[:1]
    elif level < len(_boughs):
        _boughs = _boughs[:level]
    boughs[n] = _boughs + [{'level': len(_boughs), 'id': parent_id, 'count': count}]
    rows[n] = {'0000': parent_id, '0001': parent_BT}
    # if DEBUG:
    #     print(f'{getframeinfo(currentframe()).lineno:03d}: n={n:02d} i={i:03d} updateBough {id} rows[{n}] {rows[n]["0001"]}({rows[n]["0000"]}) boughs[{n}] {boughs[n]}')

def getAttribute(xpath,element):
    text = None
    if xpath[1]:
        attrib = xpath[1]
        if attrib and attrib in element.attrib:
            text = element.attrib[attrib]
    return text

def fillData(parent, parentXPath, i, count):
    global n
    global id
    global boughs
    if None == parent:
        parent = root
    data = pintList[i]
    semSort = data['semSort']
    id = data['id']
    level = data['level']
    BT = data['BT']
    xpath = data['xpath']
    if not xpath or len(xpath) < 9:
        return
    xpath = xpath[9:]
    xpath = formatXPath(xpath)
    text = None
    try:
        if isinstance(xpath, list): # xpath contains attribute
            element = None
            text = None
            if parentXPath:
                _xpath = xpath[0].replace(parentXPath+'/', '')
            else:
                _xpath = xpath[0]
            if re.match(r'^.*\[not\(.*\)\]', _xpath):
                # e.g. .*[not({urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2}DocumentTypeCode='130')]
                xpath_ = re.sub(r'^(.*)\[not\(.*\)\]', r'\1', _xpath)
                __xpath = re.sub(r'^(.*)\[not\((.*)\)\]', r'\1[\2]', _xpath)
                if parentXPath:
                    elements_ = parent.findall(xpath_)
                else:
                    elements_ = root.findall(xpath_)
                if len(elements_) > 0:
                    _elements = root.findall(__xpath)
                    elements = list(set(elements_) - set(_elements))
                    if len(elements) > 0:
                        element = elements[0]
                        text = getAttribute(xpath,element)
            else:
                element = parent.find(_xpath)
                if not None == element and element.tag:
                    text = getAttribute(xpath,element)
        else:
            if re.match(r'^.*\[not\(.*\)\]', xpath):  # and re.match(r''):
                element = None
                text = None
                # e.g. .*[not({urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2}DocumentTypeCode='130')]
                xpath = formatXPath(xpath)
                if parentXPath:
                    _xpath = xpath.replace(parentXPath, '')[1:]
                else:
                    _xpath = xpath
                xpath_ = re.sub(r'^(.*)\[not\(.*\)\]', r'\1', _xpath)
                __xpath = re.sub(r'^(.*)\[not\((.*)\)\]', r'\1[\2]', _xpath)
                if parentXPath:
                    elements_ = parent.findall(xpath_)
                else:
                    elements_ = root.findall(xpath_)
                if len(elements_) > 0:
                    _elements = root.findall(__xpath)
                    elements = list(set(elements_)-set(_elements))
                    if len(elements) > 0:
                        text = elements[0].text
            else:
                if parentXPath and not 'TaxTotal' in xpath:
                    _xpath = xpath.replace(parentXPath+'/', '')
                else:
                    _xpath = xpath
                if 'TaxTotal' in _xpath and not 'TaxSubtotal' in _xpath:
                    text = root.findtext(_xpath)
                elif 'TaxSubtotal' in _xpath:
                    _xpath = xpath.replace(parentXPath+'/', '')
                    text = parent.findtext(_xpath)
                    if not text:
                        text = root.findtext(parentXPath+'['+str(n)+']/'+_xpath)
                else:
                    text = parent.findtext(_xpath)
    except SyntaxError:
        print(f'-- SyntaxError {_xpath}')
    if text:
        if re.match(r'ibg-', id):
            text = ''
        text = text.strip().replace('\n', '\\n')
        rows[n][semSort] = {'id': id, 'level': level, 'BT': BT, 'text': text}
        if DEBUG:
            print(f'{getframeinfo(currentframe()).lineno:03d}: n={n:02d} i={i:03d} * {rows[n][semSort]}')
    return i + 1


def fillGroup(parent, parentPath, i, parentCount):
    global n
    global id
    # if DEBUG:
    #   print(f'{getframeinfo(currentframe()).lineno:03d}: n={n:02d} i={i:03d} - fillGroup boughs[{n}]={boughs[n]}')
    data = pintList[i]
    id = data['id']
    level = data['level']
    BT = data['BT']
    xpath = data['xpath']
    if DEBUG:
        print(f'{getframeinfo(currentframe()).lineno:03d}: n={n:02d} i={i:03d} - fillGroup {BT}({id})')
    if not xpath or len(xpath) < 9:
        return i + 1
    while i < len(pintList):
        data = pintList[i]
        id = data['id']
        level = data['level']
        BT = data['BT']
        xpath = data['xpath']
        xpath = xpath[9:]
        xpath = formatXPath(xpath)
        if parentPath:
            _xpath = xpath.replace(parentPath+'/', '')
        else:
            _xpath = xpath
        if re.match(r'ibt-', id):
            return None
        elif re.match(r'ibg-', id):
            if re.match(r'^.*\[not\(.*\)\]', _xpath):
                # e.g. .*[not({urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2}DocumentTypeCode='130')]
                xpath_ = re.sub(r'^(.*)\[not\(.*\)\]', r'\1', _xpath)
                __xpath = re.sub(r'^(.*)\[not\((.*)\)\]', r'\1[\2]', _xpath)
                if parentPath:
                    elements_ = parent.findall(xpath_)
                    _elements = parent.findall(__xpath)
                else:
                    elements_ = root.findall(xpath_)
                    _elements = root.findall(__xpath)
                elements = list(set(elements_)-set(_elements))
            else:
                if 'InvoicePeriod' in xpath:
                    if 'InvoiceLine' in xpath and parent:
                        elements = parent.findall(_xpath)
                    else:
                        elements = root.findall(_xpath)
                elif 'Contact' in xpath:
                    elements = root.findall(_xpath)
                elif parentPath:
                    elements = parent.findall(_xpath)
                else:
                    elements = root.findall(_xpath)
            if not elements or 0 == len(elements):
                return i + 1
            next_i = None
            # ibg-30:LINE TAX INFORMATION
            if len(elements) > 1 or id in ['ibg-23', 'ibg-38', 'ibg-25', 'ibg-27', 'ibg-28', 'ibg-32']:
                # ibg-23:TAX BREAK DOWN
                # ibg-38:TAX BREAKDOWN IN ACCOUNTING CURRENCY
                # ibg-25:INVOICE LINE
                # ibg-27:INVOICE LINE ALLOWANCE
                # ibg-28:INVOICE LINE CHARGE
                # ibg-32:ITEM ATTRIBUTES
                # print(boughs[n][-1])
                count = 0
                updateBough(i, count)
            else:
                count = ''
                if not id in ['ibg-32', 'ibt-160', 'ibt-161']:
                    # ibg-32:ITEM ATTRIBUTES
                    # ibt-160:Item attribute name
                    # ibt-161:Item attribute value
                    n = len(rows) - 1
                if id in L12_single_BTG + L1multipleBG:
                    n = 0
                if parents[id][-1] != [x['id'] for x in boughs[n]][-1]:
                    current_bough = [[x['id'], x['level']] for x in boughs[n]]  # [1:-1][-1]
                    if len(current_bough) > 1:
                        if len(current_bough) == 2:
                            current_bough_ = current_bough[-1]
                        else:
                            current_bough_ = current_bough[1:-1][-1]
                        for index, bough_ in boughs.items():
                            if len(bough_) == 2:
                                bough_1 = bough_[1]
                                # print(current_bough_,bough_1)
                                if current_bough_[0] == bough_1['id'] and current_bough_[1]-1 == bough_1['count']:
                                    n = index
            for element in elements:
                if isinstance(count, int) and count > 0 and count < len(elements):
                    updateBough(i, count)
                _i = i + 1
                _data = pintList[_i]
                _id = _data['id']
                _level = _data['level']
                _BT = _data['BT']
                if DEBUG:
                    id = pintList[i]['id']
                    print(f'{getframeinfo(currentframe()).lineno:03d}: n={n:02d} i={_i:03d} - fillGroup -check- {BT}({id})[{count}] -> {_BT}({_id})')
                while _level > level and _i < len(pintList):
                    next_i = None
                    if re.match(r'^ibg-', _id):
                        next_i = fillGroup(element, xpath, _i, count)
                    elif re.match(r'^ibt-', _id):
                        if _id in L12_single_BTG:
                            n = 0
                        next_i = fillData(element, xpath, _i, count)
                    if next_i:
                        if next_i < len(pintList):
                            next_data = pintList[next_i]
                            _id = next_data['id']
                            _level = next_data['level']
                            _i = next_i
                        if next_i == len(pintList):
                            _i = next_i
                    else:
                        _i += 1
                if isinstance(count, int):
                    count += 1
            if next_i:
                i = next_i
                next_i = None
            else:
                i = _i + 1
            return i


if __name__ == '__main__':
    # Create the parser
    parser = argparse.ArgumentParser(prog='invoice2oim.py',
                                     usage='%(prog)s infile -s PINT -o outfile -e encoding [options]',
                                     description='電子インボイスXMLファイルをOIM-CSVファイルに変換')
    # Add the arguments
    parser.add_argument('inFile', metavar='infile', type=str, help='入力XMLファイル')
    parser.add_argument('-c', '--csvfile')
    parser.add_argument('-o', '--oimfile')
    parser.add_argument('-s', '--source')
    parser.add_argument('-e', '--encoding')  # 'Shift_JIS' 'cp932' 'utf_8'
    parser.add_argument('-v', '--verbose', action='store_true')
    parser.add_argument('-d', '--debug', action='store_true')

    args = parser.parse_args()
    ncdng = args.encoding
    if ncdng:
        ncdng = ncdng.lstrip()
    else:
        ncdng = 'UTF-8'
    in_file = None
    if args.inFile:
        in_file = args.inFile.strip()
        in_file = in_file.replace('/', SEP)
        in_file = file_path(args.inFile)
    # Check if infile exists
    if not in_file or not os.path.isfile(in_file):
        print('入力ファイルがありません')
        sys.exit()
    pint_file = None
    if args.source:
        pint_file = args.source.lstrip()
        pint_file = pint_file.replace('/', SEP)
        pint_file = file_path(pint_file)
    # Check if PINT exists
    if not pint_file or not os.path.isfile(pint_file):
        print('PINTファイルがありません')
        sys.exit()
    pre, ext = os.path.splitext(in_file)
    if args.csvfile:
        csv_file = args.csvfile.lstrip()
        csv_file = csv_file.replace('/', SEP)
        csv_file = file_path(csv_file)
    else:
        csv_file = pre+'.csv'
    pre, ext = os.path.splitext(csv_file)
    if args.oimfile:
        oim_file = args.oimfile.lstrip()
        oim_file = oim_file.replace('/', SEP)
        oim_file = file_path(oim_file)
    else:
        oim_file = pre+'-oim.csv'
    pre, ext = os.path.splitext(oim_file)
    metadata_file = pre[:-4]+'-metadata.json'
    VERBOSE = False
    if args.verbose:
        VERBOSE = args.verbose
    DEBUG = False
    if args.debug:
        DEBUG = args.debug
        from inspect import currentframe, getframeinfo

    if VERBOSE:
        print(f'** START ** {__file__}')
        print(f'-- Input file {in_file}')

    L12_single_BTG = ['ibt-006', 'ibt-007', 'ibt-008', 'ibt-009', 'ibt-010', 'ibt-011', 'ibt-012', 'ibt-013', 'ibt-014', 'ibt-015', 'ibt-016', 'ibt-017', 'ibt-018', 'ibt-018-1', 'ibt-019', 'ibg-02', 'ibt-023', 'ibt-024', 'ibg-04', 'ibt-027', 'ibt-028', 'ibt-029', 'ibt-029-1', 'ibt-090', 'ibt-090-1', 'ibt-030', 'ibt-030-1', 'ibt-031', 'ibt-032', 'ibt-032-1', 'ibt-033', 'ibt-034', 'ibt-034-1', 'ibg-05', 'ibt-035', 'ibt-036', 'ibt-162', 'ibt-037', 'ibt-038', 'ibt-039', 'ibt-040', 'ibg-06', 'ibt-041', 'ibt-042', 'ibt-043', 'ibg-07', 'ibt-044', 'ibt-045', 'ibt-046', 'ibt-046-1', 'ibt-047', 'ibt-047-1', 'ibt-048', 'ibt-048-1', 'ibt-049', 'ibt-049-1', 'ibg-08', 'ibt-050', 'ibt-051', 'ibt-163', 'ibt-052', 'ibt-053', 'ibt-054', 'ibt-055', 'ibg-09', 'ibt-056', 'ibt-057', 'ibt-058', 'ibg-10', 'ibt-059', 'ibt-060', 'ibt-060-1', 'ibt-061', 'ibt-061-1', 'ibg-11', 'ibt-062', 'ibt-063', 'ibt-063-1', 'ibg-12', 'ibt-064', 'ibt-065', 'ibt-164', 'ibt-066', 'ibt-067', 'ibt-068', 'ibt-069', 'ibg-13', 'ibt-070', 'ibt-071', 'ibt-071-1', 'ibt-072', 'ibg-14', 'ibt-073', 'ibt-074', 'ibg-15', 'ibt-075', 'ibt-076', 'ibt-165', 'ibt-077', 'ibt-078', 'ibt-079', 'ibt-080', 'ibg-22', 'ibt-106', 'ibt-107', 'ibt-108', 'ibt-109', 'ibt-110', 'ibt-111', 'ibt-112', 'ibt-113', 'ibt-114', 'ibt-115']
    L2_multiple_BG = ['ibg-17', 'ibt-084', 'ibt-085', 'ibt-086']

    TaxAccounting = ['ibg-37', 'ibt-111', 'ibg-38', 'ibt-190', 'ibt-192', 'ibt-193', 'ibt-194', 'ibt-195', 'ibt-199', 'igb-23', 'ibt-116', 'ibt-117', 'ibt-118', 'ibt-119']

    tree = ET.parse(in_file)
    root = tree.getroot()
    invoiceNumber = root.findtext(".//{"+ns['cbc']+"}ID")
    invoiceIssueDate = root.findtext(".//{"+ns['cbc']+"}IssueDate")
    DocumentCurrencyCode = root.findtext(".//{"+ns['cbc']+"}DocumentCurrencyCode")
    TaxCurrencyCode = root.findtext(".//{"+ns['cbc']+"}TaxCurrencyCode")
    SupplierTaxScheme = root.findtext(".//{"+ns['cac']+"}AccountingSupplierParty/{"+ns['cac']+"}Party/{"+ns['cac']+"}PartyTaxScheme/{"+ns['cac']+"}TaxScheme/{"+ns['cbc']+"}ID")

    # SemSort,ID,Section,PINTCard,Aligned,AlignedCard,Level,BT,BT_ja,DT,Desc,Desc_ja,Explanation,Explanation2,Example,SyntSort,element,UBLdatatype,SyntaxBinding,selectors,XPath,SyntaxCard,UBLOccurrence
    # 0       1  2       3        4       5           6     7  8     9  10   11      12          13           14      15       16      17          18            19        20    21         22
    # 
    COL_SemanticSort = 0
    COL_ID = 1
    COL_card = 5
    COL_level = 6
    COL_BT = 7
    COL_datatype = 9
    COL_SyntaxSort = 15
    COL_xpath = 20
    if VERBOSE:
        print(f'*** JP PINT file {pint_file}')
    with open(pint_file, encoding='utf_8', newline='') as f0:
        reader = csv.reader(f0, delimiter=',')
        header = next(reader)
        for v in reader:
            id = v[COL_ID].strip()
            if id:
                xpath = v[COL_xpath]
                xpath = xpath.replace('/ubl:','/')
                syntaxSort = v[COL_SyntaxSort]
                if not syntaxSort:
                    syntaxSort = '9999'
                if not xpath:
                    continue
                card = ''+v[COL_card].strip()
                maxcard = card[-1]
                if '0'==maxcard:
                    continue
                if len(v) > COL_xpath and '/' in xpath:
                    if re.match(r'.*\[cac:TaxScheme/cbc:ID.*\]', xpath):
                        xpath = re.sub(r'\[cac:TaxScheme/cbc:ID[ ]*=[ ]*(.*)\]','/cac:TaxScheme[cbc:ID=\\1]/..', xpath)
                        xpath = re.sub(r'\[cac:TaxScheme/cbc:ID[ ]*!=[ ]*(.*)\]','/cac:TaxScheme[cbc:ID!=\\1]/..', xpath)
                    if re.match(r'.*\[cbc:TaxAmount/@currencyID.*\]', xpath):
                        xpath = re.sub(r'\[cbc:TaxAmount/@currencyID[ ]*=[ ]*(.*)\]','/cbc:TaxAmount[@currencyID=\\1]/..', xpath)
                        if re.match(r'^.*@currencyID=/Invoice/cbc:DocumentCurrencyCode',xpath):
                            xpath = re.sub(r'^(.*@currencyID=)/Invoice/cbc:DocumentCurrencyCode(/text\(\))?(\].*)$', r'\1"' + DocumentCurrencyCode + r'"\3', xpath)
                        elif re.match(r'^.*@currencyID=/Invoice/cbc:TaxCurrencyCode',xpath):
                            if TaxCurrencyCode and id in TaxAccounting:
                                xpath = re.sub(r'^(.*@currencyID=)/Invoice/cbc:DocumentCurrencyCode(/text\(\))?(\].*)$', r'\1"' + TaxCurrencyCode + r'"\3', xpath)
                    semanticSort = v[COL_SemanticSort]
                    if v[COL_BT]:
                        BT = v[COL_BT]
                    else:
                        BT = ''
                    level = v[COL_level]
                    if '' == level:
                        level = 0
                    try:
                        level = int(level)
                    except:
                        print(f'-- Error {id} "{level}"')
                        level = 0
                    # IF ibt-001 HAS level 0
                    level = 1 + int(level)
                    v[COL_level] = level
                    datatype = ''+v[COL_datatype].strip()
                    data = {'semSort': semanticSort, 'id': id, 'level': level, 'BT': BT, 'card': card, 'datatype': datatype, 'xpath': xpath}
                    pintList.append(data)
                    pintDict[semanticSort] = data
                    pintSemSort[id] = semanticSort
                    pintMap[id] = data

    sorted_rows = sorted(pintList, key=lambda x: x['semSort'])
    pintList = sorted_rows

    idxLevel = {}
    idxLevel = []
    parents = {}
    for i in range(len(pintList)):
        data = pintList[i]
        level = int(data['level'])
        id = data['id']
        num = id
        parent = id
        if level > 0:
            parent = idxLevel[:level]
        while level > len(idxLevel) - 1:
            idxLevel.append('')
        idxLevel[level] = num
        parents[id] = parent

    L1multipleBG = [x['id'] for x in pintList if 1 == x['level'] and re.match(r'ibg-', x['id']) and 'n' == x['card'][3:]]
    L2multipleBG = [x['id'] for x in pintList if 2 == x['level'] and re.match(r'ibg-', x['id']) and 'n' == x['card'][3:]]
    bLevel = [{'id': v['id'], 'level':v['level']} for v in pintList if 'ibg' == v['id'][:3] and 'n' == v['card'][-1:]]
    boughLevel = {}
    for d in bLevel:
        boughLevel[d['id']] = int(d['level'])
    boughLevel['ibg-32'] = 2

    rows = {}
    rows[0] = {'0000': 'ibg-00', '0001': 'CONTENT'}
    boughs[0] = [{'level': 0, 'id': 'ibg-00', 'count': ''}]
    parent_xpath = None
    count = ''
    n = 0
    i = 0
    next_i = None
    boughLvl = 1
    while i < len(pintList):
        data = pintList[i]
        semSort = data['semSort']
        id = data['id']
        if not TaxCurrencyCode and id in TaxAccounting:
            i += 1
            continue
        BT = data['BT']
        level = data['level']
        xpath = data['xpath']
        if DEBUG:
            print(f'{getframeinfo(currentframe()).lineno:03d}: n={n:02d} pintList[{i}] {BT}({id})')
        if re.match(r'^ibt-', id):
            i = fillData(None, '', i, None)
        elif re.match(r'^ibg-', id):
            i = fillGroup(None, '', i, None)

    checkedParents = []
    for id in rows[0].keys():
        if re.match(r'^ibg-', id):
            if not id in checkedParents:
                checkedParents.append(id)

    checkedData = []
    for k, v in rows.items():
        for semanticSort, data in v.items():
            if semanticSort < '1000':
                continue
            id = data['id']
            parentIDs = parents[id]  # [1:]
            for parentID in parentIDs:
                if re.match(r'ibg-', parentID) and not parentID in checkedData and not parentID in checkedParents:
                    semanticSort = pintSemSort[parentID]
                    data = {'id': parentID, 'semanticSort': semanticSort}
                    if not data in checkedData:
                        checkedData.append(data)

    for parent in checkedData:
        id = parent['id']
        semanticSort = parent['semanticSort']
        data = pintDict[semanticSort]
        if not semanticSort in rows[0].keys():
            rows[0][semanticSort] = {'id': data['id'], 'level': data['level'], 'BT': data['BT'], 'text': ''}

    maxBoughLvl = 1
    for n, data in boughs.items():
        for bough in data:
            level = int(bough['level'])
            if level > maxBoughLvl:
                maxBoughLvl = level

    checkedParents = []
    for id in rows[0].keys():
        if re.match(r'^ibg-', id):
            if not id in checkedParents:
                checkedParents.append(id)

    checkedData = []
    for k, v in rows.items():
        for semanticSort, data in v.items():
            if semanticSort < '1000':
                continue
            id = data['id']
            parentIDs = parents[id]
            for parentID in parentIDs:
                if re.match(r'ibg-', parentID) and not parentID in checkedData and not parentID in checkedParents:
                    semanticSort = pintSemSort[parentID]
                    data = {'id': parentID, 'semanticSort': semanticSort}
                    if not data in checkedData:
                        checkedData.append(data)

    for parent in checkedData:
        id = parent['id']
        semanticSort = parent['semanticSort']
        data = pintDict[semanticSort]
        if not semanticSort in rows[0].keys():
            rows[0][semanticSort] = {'id': data['id'], 'level': data['level'], 'BT': data['BT'], 'text': ''}

    maxBoughLvl = 1
    for n, data in boughs.items():
        for bough in data:
            level = int(bough['level'])
            if level > maxBoughLvl:
                maxBoughLvl = level

    bough_rows = {}
    for n in rows.keys():
        bough_rows[n] = {}
        m = 0
        for bough in boughs[n]:
            level = int(bough['level'])
            id = bough['id']
            if 'ibg-00' == id:
                BT = 'INVOICE'
            else:
                semSort = pintSemSort[id]
                data = pintDict[semSort]
                BT = data['BT']
            count = str(bough['count'])
            bough_rows[n]['000'+str(3*level)] = id
            bough_rows[n]['000'+str(1 + 3*level)] = BT
            bough_rows[n]['000'+str(2 + 3*level)] = count
            m += 1
        for i in range(maxBoughLvl - m):
            bough_rows[n]['000'+str(3*(m + i))] = ''
            bough_rows[n]['000'+str(1 + 3*(m + i))] = ''
            bough_rows[n]['000'+str(2 + 3*(m + i))] = ''
        # if DEBUG: print(f'{getframeinfo(currentframe()).lineno:03d}: n={n:02d} bough_rows[{n}] {bough_rows[n]}')

    for n, row in rows.items():
        for semSort, data in row.items():
            if int(semSort) < 1000:
                continue
            bough_rows[n][semSort] = data

    max_col = len(bough_rows)
    transposed = {}
    lookup = {}
    index = 0
    for col, row in bough_rows.items():
        for semSort, data in row.items():
            if not semSort in transposed:
                transposed[semSort] = {}
                if semSort in pintDict:
                    d = pintDict[semSort]
                    transposed[semSort][0] = d['id']
                    transposed[semSort][1] = d['level']
                    transposed[semSort][2] = d['card']
                    transposed[semSort][3] = d['BT']
                else:
                    transposed[semSort][0] = ''
                    transposed[semSort][1] = ''
                    transposed[semSort][2] = ''
                    transposed[semSort][3] = ''
            if int(semSort) < 1000:  # when'0000'==semSort or '0001'==semSort:
                transposed[semSort][4+col] = data
            elif data['text']:
                transposed[semSort][4+col] = data['text']

    od_transposed = collections.OrderedDict(sorted(transposed.items()))

    del_list = []
    for semSort, data in od_transposed.items():
        check = ''
        for k, v in data.items():
            if k > 3:
                check += v
        if not check and not re.match(r'^ibg-', data[0]):
            del_list.append(semSort)

    for semSort in del_list:
        del transposed[semSort]

    max_col += 4
    lst = []
    i = 0
    for row in od_transposed:
        lst.append(['']*max_col)
    i = 0
    for semSort, data in od_transposed.items():
        for j in range(max_col):
            if j in od_transposed[semSort]:
                lst[i][j] = od_transposed[semSort][j]
        i += 1

    with open(csv_file, 'w', encoding='utf-16') as f:
        writer = csv.writer(f)  # ,delimiter='\t')
        n = 0
        for l in lst:
            if n < 3:
                n += 1
                continue
            writer.writerow(l)

    #
    # xBRL-CSV
    #
    max_x = len(lst[0])
    h = 0
    while '' == lst[h][0]:
        h += 1
    max_h = int(h/3)
    max_y = len(lst)
    num_d = 0
    header = []
    for x in range(4, max_x):
        for h in range(max_h):
            ibg_id = lst[3*h][x]
            if ibg_id and not ibg_id in header:
                header.append(ibg_id)
                num_d += 1
    for y in range(3*max_h, max_y):
        ibt_id = lst[y][0]
        if re.match(r'^ibt-[0-9]*(-[0-9]*)?',ibt_id) and not ibt_id in header:
            header.append(ibt_id)

    records = []
    for x in range(4, max_x):
        record = {}
        for i in range(len(header)):
            record[header[i]] = ''
        for h in range(max_h):
            ibg_id = lst[3*h][x]
            if ibg_id:
                if 'ibg-00'==ibg_id:
                    v = invoiceNumber
                else:
                    v = lst[3*h + 2][x]
                record[ibg_id] = v
        for y in range(3*max_h, max_y):
            ibt_id = lst[y][0]
            v = lst[y][x]
            if v:
                record[ibt_id] = v
        records.append(record)

    lineID = ''
    for record in records:
        if 'ibg-23' in record and len(record['ibg-23'])>0:
            record['ibg-23'] = f"{record['ibt-118']}{record['ibt-119']}"
        if ('ibg-25' in record and len(record['ibg-25']) > 0) or \
                ('ibg-27' in record and len(record['ibg-27']) > 0) or \
                ('ibg-28' in record and len(record['ibg-28']) > 0) or \
                ('ibg-32' in record and len(record['ibg-32']) > 0):
            if 'ibt-126' in record and len(record['ibt-126']) > 0:
                lineID= record['ibt-126']
        record['ibg-25'] = lineID

    header = [x.replace('ibg-','G') for x in header]
    with open(oim_file,'w',encoding=ncdng,newline='') as oimfile:
        writer = csv.DictWriter(oimfile, fieldnames=header)
        writer.writeheader()
        for record in records:
            data = {}
            for k,v in record.items():
                k = k.replace('ibg-','G')
                data[k] = v
            writer.writerow(data)

    if VERBOSE:
        print(f'-- xBRL-CSV {oim_file}')

    metadata = {
        "documentInfo": {
            "documentType": "https://xbrl.org/2021/xbrl-csv",
            "namespaces": {
                "pint": "http://www.xbrl.jp/eipa/peppol/0.9",
                "ns0": "http://www.example.com",
                "link": "http://www.xbrl.org/2003/linkbase",
                "iso4217": "http://www.xbrl.org/2003/iso4217",
                "xsi": "http://www.w3.org/2001/XMLSchema-instance",
                "xbrli": "http://www.xbrl.org/2003/instance",
                "xbrldi": "http://xbrl.org/2006/xbrldi",
                "xlink": "http://www.w3.org/1999/xlink"
            },
            "taxonomy": [
                "core.xsd"
            ],
        },
        "tableTemplates": {
            "pint": {
                "columns": {

                },
                "dimensions": {
                    "pint:_380": "$G00",
                    "period": "",
                    "entity": "ns0:Example co."
                }
            }
        },
        "tables": {
            "pint": {
                "url": ""
            }
        }
    }

    n = 4
    for id in header:
        if re.match(r'^G[0-9]*$',id) and n < max_x:
            ID = id
            metadata['tableTemplates']['pint']['columns'][ID] = {}
            if 'G00'!=id:
                metadata['tableTemplates']['pint']['dimensions'][f"pint:{ID}"] = f"${ID}"
        elif re.match(r'^ibt-[0-9]*',id):
            d = pintMap[id]
            if d['datatype'] in ['Amount', 'Unit Price Amount']:
                metadata['tableTemplates']['pint']['columns'][id] = {
                    'decimals': 0,
                    'dimensions': {
                        'concept': f'pint:{id}',
                        'unit': f'iso4217:{DocumentCurrencyCode}' # TODO TaxCurrencyCode
                    }
                }
            else:
                metadata['tableTemplates']['pint']['columns'][id] = {
                    'dimensions': {
                        'concept': f'pint:{id}',
                    }
                }
                
    metadata['tableTemplates']['pint']['dimensions']['period'] = f'{invoiceIssueDate}T00:00:00'
    metadata['tables']['pint']['url'] = os.path.basename(oim_file)

    with open(metadata_file, 'w') as metadatafile:
        json.dump(metadata, metadatafile, indent=4)

    if VERBOSE:
        print(f'-- metadata {metadata_file}')
    
    print('** END')

デジタルインボイスXML文書からTidy data形式のCSVを生成するプログラム

プログラムの処理の流れは、次の順序です。
1. JP PINT 0.9.3の定義表(Excel)からPINT変換辞書データを作成しておく。
2. xBRL-GD定義のCSVデータファイルを読み込み、PINT変換辞書データのSyntaxソート順に辞書データに展開する。
3. Syntaxソート順辞書データ展開したCSVレコードからPINT変換辞書データに項目ごとに定義されたXPathを使ってElement
Treeの関数を使用して該当データの値をXML文書に設定する。
4. Element Treeライブラリを使用してXML文書をファイルに出力する。

oim2invoice.py
#!/usr/bin/env python3
#coding: utf-8
#
# generate Open Peoopl e-Invoice (UBL 2.1) from xBRL-CSV file
#
# designed by SAMBUICHI, Nobuyuki (Sambuichi Professional Engineers Office)
# written by SAMBUICHI, Nobuyuki (Sambuichi Professional Engineers Office)
#
# MIT License
#
# Copyright (c) 2021 SAMBUICHI Nobuyuki (Sambuichi Professional Engineers Office)
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
import xml.etree.ElementTree as ET
from collections import defaultdict
import csv
import re
import sys
import os
import argparse

from dic2etree import *

ET.register_namespace('', ns[''])
ET.register_namespace('xsd', ns['xsd'])
ET.register_namespace('xsi', ns['xsi'])
ET.register_namespace('cac', ns['cac'])
ET.register_namespace('cbc', ns['cbc'])
ET.register_namespace('qdt', ns['qdt'])
ET.register_namespace('udt', ns['udt'])
ET.register_namespace('ccts', ns['ccts'])
# if use ubl: namespace prefix
# ET.register_namespace('ubl', ns['ubl'])
# ET.register_namespace('cn', ns['cn'])

SEP = os.sep

TaxAccounting = ['ibg-37', 'ibt-111', 'ibg-38', 'ibt-190', 'ibt-192', 'ibt-193', 'ibt-194', 'ibt-195']
Allowances = []
Charges = []

allowance_exist = False
charge_exist = False
document_allowance_count = 0
line_allowance_count = 0
# index_AllowanceCharge = -1
# new_AllowanceCharge = False

def file_path(pathname):
    if '/' == pathname[0:1]:
        return pathname
    else:
        dir = os.path.dirname(__file__)
        new_path = os.path.join(dir, pathname)
        return new_path

def set_path_value(base, path, value, datatype):
    global CurrencyCode
    if DEBUG:
        base_tag = re.sub(r'{.*}', '', str(base.tag))
        print(f'{base_tag} - {"/".join(path)}  {value}')
    if len(path) > 1 and path[1].isdigit():
        n = int(path[1])
        path = path[:1] + path[2:]
    else:
        n = 0
    elements = None
    if len(path) > 1 and '@' == path[1][:1]:
        _attr = path[1][1:]
        p = path[0].split(':')
        el = ET.QName(ns[p[0]], p[1])
        elements = base.findall(str(el))
        if len(elements) > 0:
            element = elements[0]
        else:
            element = ET.SubElement(base, el)
        element.set(_attr, value)
    else:
        if path[0] == 'Invoice':
            p = path[1].split(':')
        else:
            p = path[0].split(':')
        if not p[0] in ns:
            print(p)
            return
        elif 2 == len(p):
            el = ET.QName(ns[p[0]], p[1])
            elements = base.findall(str(el))
        else:
            return
        if elements and len(elements) > 0:
            if len(elements) > n:
                element = elements[n]
            else:
                element = ET.SubElement(base, el)
            if 'cbc' == p[0]:
                element.text = value
            else:
                _path = path[1:]
                set_path_value(element, _path, value, datatype)
        else:
            element = ET.SubElement(base, el)
            if 'cbc' == p[0]:
                element.text = value
                if 'Amount' == datatype or 'Unit Price Amount' == datatype:
                    element.set('currencyID', CurrencyCode)
                elif 'PartyTaxScheme' == str(base.tag)[-14:] and 'CompanyID' == p[1]:
                    el1 = ET.QName(ns['cac'], 'TaxScheme')
                    element1 = ET.SubElement(base, el1)
                    el2 = ET.QName(ns['cbc'], 'ID')
                    element2 = ET.SubElement(element1, el2)
                    element2.text = 'VAT'
                elif 'TaxCategory' == str(base.tag)[-11:] and 'Percent' == p[1]:
                    el1 = ET.QName(ns['cac'], 'TaxScheme')
                    element1 = ET.SubElement(base, el1)
                    el2 = ET.QName(ns['cbc'], 'ID')
                    element2 = ET.SubElement(element1, el2)
                    element2.text = 'VAT'
            else:
                _path = path[1:]
                set_path_value(element, _path, value, datatype)

def set_record(record):
    global root
    global sorted_header
    global CurrencyCode
    global allowance_exist
    global charge_exist
    global document_allowance_count
    global line_allowance_count
    # global new_AllowanceCharge
    # global index_AllowanceCharge
    i = 0
    head = []
    seq1 = '0'
    seq2 = '0'
    head1_pathList = None
    head2_pathList = None
    base_path = None
    while re.match(r'^G[0-9]*$',header[i]):
        if record[i]:
            data = {'id':'ibg-'+header[i][1:], 'value':record[i]}
            head.append(data)
        i += 1
    head0_id = head[0]['id']
    if len(head) > 0:
        if head0_id and head0_id in pintDict and 'xpath' in pintDict[head0_id]:
            head0_xpath = pintDict[head0_id]['xpath']
            if re.match(r'^(.*)\[.*\](.*)$', head0_xpath):
                head0_xpath = re.sub(r'^(.*)\[.*\](.*)$', r'\1\2', head0_xpath)
            head0_pathList = head0_xpath[1:].split('/')[1:]
            if 'ibt' == head0_id[:3]:
                head0_pathList = head0_pathList[-1:]
                base0_path = head0_pathList[:-1]
            elif len(head0_pathList) > 0:
                head0_pathList = []
                base0_path = head0_pathList
        if len(head0_xpath[1:].split('/')) > 1:
            base = '/'+'/'.join(base0_path[:-1])  # head_xpath
        else:
            base = ''
            base_path = []
    if len(head) > 1:
        head1_id = head[1]['id']
        seq1 = head[1]['value']
        if head1_id and head1_id in pintDict and 'xpath' in pintDict[head1_id]:
            head1_xpath = pintDict[head1_id]['xpath']
            if re.match(r'^(.*)\[.*\](.*)$', head1_xpath):
                head1_xpath = re.sub(r'^(.*)\[.*\](.*)$', r'\1\2', head1_xpath)
            head1_xpath = head1_xpath.replace(head0_xpath,'')
            head1_pathList = head1_xpath[1:].split('/')
            if 'ibt' == head1_id[:3]:
                head1_pathList = head1_pathList[:-1] + [seq1] + head1_pathList[-1:]
                base_path = head1_pathList[:-1]
            elif len(head1_pathList) > 0:
                head1_pathList = head1_pathList + [seq1]
                base_path = head1_pathList
    if len(head) > 2:
        head2_id = head[2]['id']
        seq2 = head[2]['value']
        if head2_id and head2_id in pintDict and 'xpath' in pintDict[head2_id]:
            head2_xpath = pintDict[head2_id]['xpath']
            if re.match(r'^(.*)\[.*\](.*)$', head2_xpath):
                head2_xpath = re.sub(r'^(.*)\[.*\](.*)$', r'\1\2', head2_xpath)
            head2_xpath = head2_xpath.replace(head0_xpath,'')
            head2_pathList = head2_xpath[1:].split('/')
            if 'ibt' == head2_id[:3]:
                head2_pathList = head1_pathList[:-1] + [seq1] + head2_pathList[:-1] + [seq2] + head2_pathList[-1:]
                base_path = head2_pathList[:-1]
            elif len(head2_pathList) > 0:
                head2_pathList = head1_pathList + [seq1] + head2_pathList + [seq2]
                base_path = head1_pathList
    # traverse field in a record
    for n in range(len(record)): 
        cell = record[n]
        h = sorted_header[n]
        if re.match(r'^G[0-9]*$',h):
            continue
        if len(cell) > 0:
            id = sorted_header[n]
            if not id in pintDict:
                continue
            if 'ibt-126' == id: # Invoice line identifier
                allowance_exist = False
                charge_exist = False
                line_allowance_count = 0
                # index_AllowanceCharge = -1
                # new_AllowanceCharge = False
                # if DEBUG:
                #     print(f'- id:{id} index_AllowanceCharge:{index_AllowanceCharge} allowance_exist:{allowance_exist}')
            data = pintDict[id]
            datatype = data['datatype']
            xpath = data['xpath']
            xpath_ = None
            # remove [...] from xpath
            if re.match(r'^(.*)\[.*\](.*)$', xpath):
                xpath_ = re.sub(r'^(.*)\[.*\](.*)$', r'\1\2', xpath)
                pathList = xpath_[1:].split('/')
            else:
                pathList = xpath[1:].split('/')
            if "[cbc:DocumentTypeCode='130']" in xpath:
                xpath = xpath.replace("[cbc:DocumentTypeCode='130']", '')
                if 'cac:InvoiceLine' in pathList:
                    idxT188 = sorted_header.index('ibt-188')
                    index = 1 + pathList.index('cac:DocumentReference')
                    if record[idxT188]: # 1bt-188 and ibt-128,ibt-128-1 are cac:DocumentReference/cbc:ID
                        pathList = pathList[:2] + [seq1] + pathList[2:index] + ['1'] + pathList[index:]
                    else:
                        pathList = pathList[:2] + [seq1] + pathList[2:index] + ['0'] + pathList[index:]
                path = pathList[1:]
                CurrencyCode = DocumentCurrencyCode
                set_path_value(root, path, cell, datatype)
                if not '@schemeID' in pathList:
                    path = pathList[:-1] + ['cbc:DocumentTypeCode']
                    path = path[1:]
                    CurrencyCode = DocumentCurrencyCode
                    set_path_value(root, path, '130', datatype)
            elif "[not(cbc:DocumentTypeCode='130')]" in xpath:
                if 'cac:AdditionalDocumentReference' in xpath:
                    pathList = pathList[:2] + [seq1] + pathList[2:]
                    path = pathList
                elif 'cac:InvoiceLine' in xpath:
                    pathList = pathList[:2] + [seq1] + pathList[2:]
                    path = pathList
                else:
                    path = pathList
                path = path[1:]
                CurrencyCode = DocumentCurrencyCode
                set_path_value(root, path, cell, datatype)
            elif '/Invoice/cac:TaxTotal' in xpath:
                if 'DocumentCurrencyCode' in xpath:
                    _xpath = re.sub(r'^(.*)\[.*\](.*)$', r'\1\2', xpath)
                    path = _xpath[1:].split('/')
                    if 'cac:TaxSubtotal' in path:
                        path = ['cac:TaxTotal', '0', 'cac:TaxSubtotal'] + [seq1] + path[3:]
                        CurrencyCode = DocumentCurrencyCode
                        set_path_value(root, path, cell, datatype)
                    else:
                        path = ['cac:TaxTotal', '0'] + path[2:]
                        CurrencyCode = DocumentCurrencyCode
                        set_path_value(root, path, cell, datatype)
                elif 'TaxCurrencyCode' in xpath:
                    _xpath = re.sub(r'^(.*)\[.*\](.*)$', r'\1\2', xpath)
                    path = _xpath[1:].split('/')
                    if 'cac:TaxSubtotal' in path:
                        path = ['cac:TaxTotal', '1', 'cac:TaxSubtotal'] + [str(seq1)] + path[3:]
                        CurrencyCode = TaxCurrencyCode
                        set_path_value(root, path, cell, datatype)
                    else:
                        path = ['cac:TaxTotal', '1'] + path[2:]
                        CurrencyCode = TaxCurrencyCode
                        set_path_value(root, path, cell, datatype)
                else:
                    if 'cac:TaxTotal' in Dic['Invoice'] and isinstance(Dic['Invoice']['cac:TaxTotal'], list):
                        if 'ibt-117-1' in header:
                            if DocumentCurrencyCode == record[header.index('ibt-117-1')]:
                                loc = '0'
                            else:
                                loc = '1'
                        else:
                            loc = '0'
                        pathList = xpath[1:].split('/')
                        path = pathList[:2] + [loc] + pathList[2:]
                        path = path[:4] + [seq1] + path[4:]
                        path = path[1:]
                        CurrencyCode = DocumentCurrencyCode
                        set_path_value(root, path, cell, datatype)
                    else:
                        pathList = re.sub(base, '', xpath)[1:].split('/')
                        path = base_path + pathList
                        path = path[1:]
                        CurrencyCode = DocumentCurrencyCode
                        set_path_value(root, path, cell, datatype)
            elif 'ChargeIndicator=' in xpath:
                _xpath = xpath[9:]
                # remove [] condition from xpath
                allowanceCharge = re.sub(r'(.*)\[cbc:ChargeIndicator=(true|false)\(\)\](.*)', r'\1', _xpath)
                chargeIndicator = re.sub(r'.*\[cbc:ChargeIndicator=(true|false)\(\)\](.*)', r'\1', _xpath)
                pathList = allowanceCharge.split('/')
                if 'cac:InvoiceLine' in _xpath:
                    if len(pathList) > 1:
                        pathList = pathList[:1] + [seq1] + pathList[1:]
                    else:
                        pathList = pathList + [seq1]
                if 'cac:Price' in pathList:
                    path = pathList
                    path.append('cbc:ChargeIndicator')
                    set_path_value(root, path, chargeIndicator, 'Indicator')
                    path = re.sub(r'(.*)\[cbc:ChargeIndicator=false\(\)\](.*)', r'\1\2', xpath)[1:].split('/')
                    path = path[:2] + [seq1] + path[2:]
                    path = path[1:]
                    CurrencyCode = DocumentCurrencyCode
                    set_path_value(root, path, cell, datatype)
                else:
                    # if 'ibt-146'==id:
                    if id in [v['id'] for k, v in semDict.items() if 'ibg-29' == v['parent']]:  # PRICE DETAILS
                        path = pathList + ['0']
                        path.append('cbc:ChargeIndicator')
                        set_path_value(root, path, chargeIndicator, 'Indicator')
                    elif id in [v['id'] for k, v in semDict.items() if 'ibg-20' == v['parent']] and not allowance_exist:
                        # children of DOCUMENT LEVEL ALLOWANCES
                        allowance_exist = True
                        document_allowance_count += 1
                        path = pathList + [seq1]
                        path.append('cbc:ChargeIndicator')
                        set_path_value(root, path, chargeIndicator, 'Indicator')
                    elif id in [v['id'] for k, v in semDict.items() if 'ibg-21' == v['parent']] and not charge_exist:
                        # children of DOCUMENT LEVEL CHARGES
                        charge_exist = True
                        seq1 = str(document_allowance_count + int(seq1))
                        path = pathList + [seq1]
                        path.append('cbc:ChargeIndicator')
                        set_path_value(root, path, chargeIndicator, 'Indicator')
                    elif id in [v['id'] for k, v in semDict.items() if 'ibg-27' == v['parent']] and not allowance_exist:
                        # children of INVOICE LINE ALLOWANCES
                        allowance_exist = True
                        line_allowance_count += 1
                        path = pathList + [seq2]
                        path.append('cbc:ChargeIndicator')
                        set_path_value(root, path, chargeIndicator, 'Indicator')
                    elif id in [v['id'] for k, v in semDict.items() if 'ibg-28' == v['parent']] and not charge_exist:
                        # children of INVOICE LINE CHARGES
                        charge_exist = True
                        seq2 = str(line_allowance_count + int(seq2))
                        path = pathList + [seq2]
                        path.append('cbc:ChargeIndicator')
                        set_path_value(root, path, chargeIndicator, 'Indicator')
                    # seq_ = str(index_AllowanceCharge)
                    # if new_AllowanceCharge:
                    #     path = pathList + [seq_]
                    #     path.append('cbc:ChargeIndicator')
                    #     if 'Invoice' == path[0]:
                    #         path = path[1:]
                    #     set_path_value(root, path, chargeIndicator, 'Indicator')
                    #     new_AllowanceCharge = False
                    xpath = re.sub(r'(.*)\[cbc:ChargeIndicator=(true|false)\(\)\](.*)', r'\1\3', xpath)
                    pathList = xpath[1:].split('/')
                    if 'cac:InvoiceLine' in pathList:
                        path = pathList[:2] + [seq1] + pathList[2:3] + [seq2] + pathList[3:]
                    else:
                        path = pathList[:2] + [seq1] + pathList[2:]
                    path = path[1:]
                    CurrencyCode = DocumentCurrencyCode
                    set_path_value(root, path, cell, datatype)
            else:
                if 'cac:InvoiceLine' in pathList:
                    if 'cac:AdditionalItemProperty' in pathList:
                        pathList = pathList[:2] + [seq1] + pathList[2:4] + [seq2 or '0'] + pathList[4:]
                    else:
                        pathList = pathList[:2] + [seq1] + pathList[2:]
                    if 'cac:DespatchLineReference' in pathList:
                        path_ = pathList[1:4] + ['cbc:LineID']
                        set_path_value(root, path_, '0000', 'Identifier')
                path = pathList[1:]
                CurrencyCode = DocumentCurrencyCode
                set_path_value(root, path, cell, datatype)
                if 'ibt-087' == id: 
                    # UBL requires cbc:NetworkID when cbc:PrimaryAccountNumberIDPayment card primary account number (4 to 6 digits) is used.
                    path[-1] = 'cbc:NetworkID'
                    set_path_value(root, path, 'NA', datatype)
                
            

def writeET(root,out_file):
    element = ET.ElementTree(root)
    ET.indent(element,space='\t')
    element.write(out_file, xml_declaration=True, encoding='utf-8')

if __name__ == '__main__':
    # Create the parser
    parser = argparse.ArgumentParser(prog='csv2invoice',
                                     usage='%(prog)s [options] infile -o outfile',
                                     description='OIM-CSVファイルから電子インボイスXMLを作成')
    # Add the arguments
    parser.add_argument('inFile', metavar='infile', type=str, help='入力OIM-CSVファイル')
    parser.add_argument('-s', '--source')
    parser.add_argument('-o', '--out')
    parser.add_argument('-e', '--encoding')  # e.g. utf-16 Shift_JIS cp932
    parser.add_argument('-t', '--transpose', action='store_true')
    parser.add_argument('-v', '--verbose', action='store_true')
    parser.add_argument('-d', '--debug', action='store_true')

    args = parser.parse_args()
    in_file = None
    if args.inFile:
        in_file = args.inFile
        in_file = in_file.replace('/', SEP)
        in_file = file_path(in_file)
    # Check if infile exists
    if not in_file or not os.path.isfile(in_file):
        print('入力ファイルがありません')
        sys.exit()
    pre, ext = os.path.splitext(in_file)
    pint_file = None
    if args.source:
        pint_file = args.source.lstrip()
        pint_file = pint_file.replace('/', SEP)
        pint_file = file_path(pint_file)
    # Check if PINT exists
    if not pint_file or not os.path.isfile(pint_file):
        print('PINTファイルがありません')
        sys.exit()
    if args.out:
        out_file = args.out.lstrip()
        out_file = file_path(out_file)
    else:
        out_file = pre+'.xml'
    # tmp_file = pre+'.txt'
    ncdng = args.encoding.lstrip()
    if not ncdng:
        ncdng = 'UTF-8'
    TRANSPOSE = args.verbose
    VERBOSE = args.verbose
    DEBUG = args.debug

    if VERBOSE:
        print('** START ** ', __file__)

    # initialize globals
    pintDict = {}
    semDict = {}
    syntaxDict = {}
    header_id = None
    header_count = 0
    DocumentCurrencyCode = None
    TaxCurrencyCode = None
    CurrencyCode = None
    Dic = defaultdict(type(''))
    Dic['Invoice'] = {}
    sortedDic = defaultdict(type(''))
    sortedDic['Invoice'] = {}
    pintDict = defaultdict(type(''))
    pintL1 = []
    multipleBG = []
    # SemSort,ID,Section,PINTCard,Aligned,AlignedCard,Level,BT,BT_ja,DT,Desc,Desc_ja,Explanation,Explanation2,Example,SyntSort,element,UBLdatatype,SyntaxBinding,selectors,XPath,SyntaxCard,UBLOccurrence,SharedRule,AlignedRule
    # 0       1  2       3        4       5           6     7  8     9  10   11      12          13           14      15       16      17          18            19        20    21         22
    COL_SemanticSort = 0
    COL_ID = 1
    COL_card = 5
    COL_level = 6
    COL_BT = 7
    COL_datatype = 9
    COL_syntSort = 15
    COL_xpath = 20
    if VERBOSE:
        print(f'*** XPath file {pint_file}')
    with open(pint_file, encoding='utf-8', newline='') as f0:
        reader = csv.reader(f0)  # , delimiter='\t')
        header = next(reader)
        for v in reader:
            id = v[COL_ID].strip()
            if id:
                semSort = v[COL_SemanticSort]
                xpath = v[COL_xpath]
                xpath = xpath.replace('/ubl:','/')
                syntSort = v[COL_syntSort]
                if not syntSort:
                    syntSort = '9999'
                if not xpath:
                    continue
                if len(v) > COL_xpath and '/' in xpath:
                    if v[COL_BT]:
                        BT = v[COL_BT]
                    else:
                        BT = None
                    level = v[COL_level]
                    if level:
                        level = 1+int(level)
                        level = str(level)
                    else:
                        level = '0'
                    card = ''+v[COL_card].strip()
                    datatype = ''+v[COL_datatype].strip()
                    data = {'syntSort': syntSort, 'id': id, 'level': level, 'BT': BT, 'card': card, 'datatype': datatype, 'xpath': xpath}
                    pintDict[id] = data
                    semDict[semSort] = {'id': id, 'level': int(level)}

    sorted_semDict = sorted(semDict.items(), key=lambda x: x[0])

    level = 0
    parent = ['ibg-00']
    for k, v in dict(sorted_semDict).items():
        if v['level'] == level:
            semDict[k]['parent'] = parent[level-1]
            parent[level] = v['id']
        elif v['level'] == level+1:
            semDict[k]['parent'] = parent[level]
            level = v['level']
            if level == len(parent):
                parent.append(None)
            parent[level] = v['id']
        else:
            level = v['level']
            semDict[k]['parent'] = parent[level-1]
            parent[level] = v['id']
            for i in range(len(parent)):
                if i > level:
                    parent[i] = None

    if VERBOSE:
        print(f'*** Input file {in_file}')
    with open(in_file, encoding=ncdng, newline='') as f:
        reader = csv.reader(f)  # , delimiter='\t')
        rows = []
        count = 0
        for record in reader:
            if 0 == count:
                header = []
                for field in record:
                    header.append(field)
            elif 1==count:
                for n in range(0,len(record)):
                    id = header[n]
                    data = pintDict[id]
                    if 'ibt-005' == id:
                        DocumentCurrencyCode = record[n]
                    elif 'ibt-006' == id:
                        TaxCurrencyCode = record[n]
            count += 1
            rows.append(record)

    # change to occurence sequence
    idxG00 = header.index('G00')
    idxG23 = header.index('G23')
    idxG25 = header.index('G25')
    currentG00 = ''
    currentG23 = ''
    currentG25 = ''
    countG00 = 0
    countG23 = 0
    countG25 = 0
    for i in range(1,len(rows)):
        row = rows[i]
        # G00
        if not currentG00:
            if len(row[idxG00]) > 0:
                currentG00 = row[idxG00]
                countG00 = 0
        else:
            if len(row[idxG00]) > 0 and currentG00 != row[idxG00]:
                currentG00 = row[idxG00]
                countG00 += 1
        if row[idxG00]:
            row[idxG00] = str(countG00)
        # G23
        if not currentG23:
            if len(row[idxG23]) > 0:
                currentG23 = row[idxG23]
                countG23 = 0
        else:
            if len(row[idxG23]) > 0 and currentG23 != row[idxG23]:
                currentG23 = row[idxG23]
                countG23 += 1
        if row[idxG23]:
            row[idxG23] = str(countG23)
        # G25
        if not currentG25:
            if len(row[idxG25]) > 0:
                currentG25 = row[idxG25]
                countG25 = 0
        else:
            if len(row[idxG25]) > 0 and currentG25 != row[idxG25]:
                currentG25 = row[idxG25]
                countG25 += 1
        if row[idxG25]:
            row[idxG25] = str(countG25)
        rows[i] = row

    if 'ibt-018' in header:
        index = rows[0].index('G24')
        for row in rows:
            if row[index] and 'G24'!=row[index]:
                g24 = row[index]
                row[index] = str(1 + int(g24))

    # sort order by stntax sort
    header = rows[0]
    dim_n = 0
    while re.match(r'^G[0-9]*$',header[dim_n]):
        dim_n += 1
        
    dim_header = []
    for n in range(dim_n):
        dim_header.append({'id':header[n],'num':n})
    sorted_dimHeader = sorted(dim_header, key=lambda x: pintDict[f"ibg-{x['id'][-2:]}"]['syntSort'])

    element_header = []
    for n in range(dim_n,len(header)):
        element_header.append({'id':header[n],'num':n})
    sorted_elementHeader = sorted(element_header, key=lambda x: pintDict[x['id']]['syntSort'])

    sorted_header = []
    for d in sorted_dimHeader:
        sorted_header.append(d['id'])
    for d in sorted_elementHeader:
        sorted_header.append(d['id'])

    sorted_rows = []
    sorted_rows.append(sorted_header)

    for j in range(1,len(rows)):
        row = rows[j]
        sorted_row = ['']*len(row)
        for k in range(dim_n):
            num = sorted_dimHeader[k]['num']
            sorted_row[k] = row[num]
        for k in range(dim_n,len(row)):
            num = sorted_elementHeader[k - dim_n]['num']
            sorted_row[k] = row[num]
        sorted_rows.append(sorted_row)

    idxG24 = None
    idxG25 = None
    idxG27 = None
    idxG28 = None
    idxG32 = None
    if 'G24' in sorted_rows[0]:
        idxG24 = sorted_rows[0].index('G24')
    if 'G25' in sorted_rows[0]:
        idxG25 = sorted_rows[0].index('G25')
    if 'G27' in sorted_rows[0]:
        idxG27 = sorted_rows[0].index('G27')
    if 'G28' in sorted_rows[0]:
        idxG28 = sorted_rows[0].index('G28')
    if 'G32' in sorted_rows[0]:
        idxG32 = sorted_rows[0].index('G32')

    dimsDocument = [
        {'id':'G03', 'BT':'PRECEDING INVOICE REFERENCE'},
        {'id':'G24', 'BT':'ADDITIONAL SUPPORTING DOCUMENTS'},
        {'id':'G16', 'BT':'PAYMENT INSTRUCTIONS'},
        {'id':'G33', 'BT':'INVOICE TERMS'},
        {'id':'G35', 'BT':'Paid amounts'},
        {'id':'G20', 'BT':'DOCUMENT LEVEL ALLOWANCES'},
        {'id':'G21', 'BT':'DOCUMENT LEVEL CHARGES'},
        {'id':'G23', 'BT':'TAX BREAKDOWN'},
        {'id':'G38', 'BT':'TAX BREAKDOWN IN ACCOUNTING CURRENCY'}]
    dimsLine = [
        {'id':'G25', 'BT':'INVOICE LINE'},
        {'id':'G27', 'BT':'INVOICE LINE ALLOWANCES'},
        {'id':'G28', 'BT':'INVOICE LINE CHARGES'},
        {'id':'G30', 'BT':'LINE TAX INFORMATION'},
    ]

    leading_ibt = ['ibt-024','ibt-023','ibt-001','ibt-002','ibt-168','ibt-009','ibt-003','ibt-022','ibt-007','ibt-005','ibt-006','ibt-019','ibt-010','ibg-14','ibt-073','ibt-074','ibt-008','ibt-013','ibt-014','ibg-03','ibt-025','ibt-026','ibt-016','ibt-015','ibt-017','ibt-012','ibt-018','ibt-018-1']
    item_ibt = ['ibt-154','ibt-153','ibt-156','ibt-155','ibt-157','ibt-157-1','ibt-159','ibt-158','ibt-158-1','ibt-158-2','ibt-151','ibt-152','ibt-166','ibt-167','ibt-160','ibt-161']
    price_ibt = ['ibt-146','ibt-149','ibt-147','ibt-148','ibt-150']

    l = len(row)
    reviced_rows = []
    reviced_rows.append(sorted_rows[0])
    row = sorted_rows[1]
    leading = [sorted_rows[0][n] in leading_ibt for n in range(l)]
    trailing = [not leading[n] for n in range(l)]
    leading_row = ['']*l
    for n in range(l):
        if n < dim_n or leading[n]:
            leading_row[n] = row[n]
    reviced_rows.append(leading_row)
    trailing_row = ['']*l
    for n in range(l):
        if n < dim_n or trailing[n]:
            trailing_row[n] = row[n]
    reviced_rows.append(trailing_row)

    sorted_rows2 = sorted_rows[2:]

    for dim in dimsDocument:
        id = dim['id']
        if id in sorted_rows[0]:
            index = sorted_rows[0].index(id) # id is dimension of document level
            for row in sorted_rows2:
                if row[index]:
                    if 'G24'==id:
                        last_i = len(reviced_rows)-1
                        previous_row = reviced_rows[last_i]
                        reviced_rows[last_i] = row
                        reviced_rows.append(previous_row)
                    else:
                        reviced_rows.append(row)
    
    l = len(sorted_header)
    # countG27 = 0
    for i in range(len(sorted_rows2)):
        row = sorted_rows2[i]
        if idxG25 and row[idxG25]:
            if idxG27 or idxG28:
                if not((idxG27 and row[idxG27]) or (idxG28 and row[idxG28]) or (idxG32 and row[idxG32])):
                    before = [not sorted_header[n] in item_ibt + price_ibt for n in range(l)]
                    after = [sorted_header[n] in item_ibt + price_ibt for n in range(l)]
                    row1 = ['']*l
                    for n in range(l):
                        if n < dim_n or before[n]:
                            row1[n] = row[n]
                    reviced_rows.append(row1)
                    row2 = ['']*l
                    for n in range(l):
                        if n < dim_n or after[n]:
                            row2[n] = row[n]
                    reviced_rows.append(row2)
                elif (idxG27 and row[idxG27]):
                    print(row[idxG27])
                    last_i = len(reviced_rows) - 1
                    previous_row = reviced_rows[last_i]
                    reviced_rows[last_i] = row
                    reviced_rows.append(previous_row)
                elif (idxG28 and row[idxG28]):
                    print(row[idxG28])
                    last_i = len(reviced_rows) - 1
                    previous_row = reviced_rows[last_i]
                    reviced_rows[last_i] = row
                    reviced_rows.append(previous_row)
                else:
                    reviced_rows.append(row)
            else:
                reviced_rows.append(row)
        
    invoiceText = '''
<Invoice \n
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2"
    xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
    xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
	xmlns:ccts="urn:un:unece:uncefact:documentation:2"
	xmlns:ext="urn:oasis:names:specification:ubl:schema:xsd:CommonExtensionComponents-2"
	xmlns:qdt="urn:oasis:names:specification:ubl:schema:xsd:QualifiedDatatypes-2"
	xmlns:udt="urn:un:unece:uncefact:data:specification:UnqualifiedDataTypesSchemaModule:2"
	xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xsi:schemaLocation="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2 http://docs.oasis-open.org/ubl/os-UBL-2.1/xsd/maindoc/UBL-Invoice-2.1.xsd" /> 
  '''
    root = ET.XML(invoiceText)
    set_path_value(root, ['cbc:UBLVersionID'], '2.1', 'Code')

    for i in range(1, len(reviced_rows)):
        record = reviced_rows[i]
        if DEBUG:
            elements = [f"{reviced_rows[0][i]}:{pintDict[reviced_rows[0][i].replace('G','ibg-')]['BT']} {record[i]}"
                        for i in range(len(record)) if record[i]]
            print('\n'.join(elements))
        set_record(record)

    writeET(root,out_file)

    if VERBOSE:
        print(f'** END ** {out_file}')

__init__.py

__init__.py
# print("Load pint_ja/generate_ubl/dic2etree/__init__.py")
from .dic2etree import dict_to_etree
from .dic2etree import etree_to_dict

ns = {
  '': 'urn:oasis:names:specification:ubl:schema:xsd:Invoice-2',
  'xsd': 'http://www.w3.org/2001/XMLSchema',
  'xsi': 'http://www.w3.org/2001/XMLSchema-instance',
  'cac': 'urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2',
  'cbc': 'urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2',
  'qdt': 'urn:oasis:names:specification:ubl:schema:xsd:QualifiedDataTypes-2',
  'udt': 'urn:oasis:names:specification:ubl:schema:xsd:UnqualifiedDataTypes-2',
  'ccts': 'urn:un:unece:uncefact:documentation:2',
  'cn': 'urn:oasis:names:specification:ubl:schema:xsd:CreditNote-2',
  'ubl': 'urn:oasis:names:specification:ubl:schema:xsd:Invoice-2',
  'sch': 'http://purl.oclc.org/dsdl/schematron'
}

__all__ = ['dict_to_etree', 'etree_to_dict', 'ns']
                    

XML文書とPython辞書データの変換プログラム

dic2etree.py
# print("Load dic2etree.py")
import xml.etree.ElementTree as ET
# import defusedxml.ElementTree as ET
from collections import defaultdict
# import csv
import pprint

ET.register_namespace('cac', 'urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2')
ET.register_namespace('cbc', 'urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2')
ET.register_namespace('qdt', 'urn:oasis:names:specification:ubl:schema:xsd:QualifiedDataTypes-2')
ET.register_namespace('udt', 'urn:oasis:names:specification:ubl:schema:xsd:UnqualifiedDataTypes-2')
ET.register_namespace('ccts', 'urn:un:unece:uncefact:documentation:2')
ET.register_namespace('', 'urn:oasis:names:specification:ubl:schema:xsd:Invoice-2')

# https://stackoverflow.com/questions/7684333/converting-xml-to-dictionary-using-elementtree
def etree_to_dict(t):
  d = {t.tag: {} if t.attrib else None}
  children = list(t)
  if children:
    dd = defaultdict(list)
    for dc in map(etree_to_dict, children):
      for k, v in dc.items():
        dd[k].append(v)
    d = {t.tag: {k: v[0] if len(v) == 1 else v
      for k, v in dd.items()}}
  if t.attrib:
    d[t.tag].update(('@' + k, v)
      for k, v in t.attrib.items())
  if t.text:
    text = t.text.strip()
    if children or t.attrib:
      if text:
        d[t.tag]['#text'] = text
    else:
      d[t.tag] = text
  return d

def dict_to_etree(d, root):
  def _to_etree(d, root):
    if not d:
      pass
    elif isinstance(d, str):
      root.text = d
    elif isinstance(d, dict):
      for k,v in d.items():
        assert isinstance(k, str)
        if k.startswith('#'):
          try:
            assert k == '#text' and isinstance(v, str)
            root.text = v
          except (Exception, ValueError, TypeError) as e:
            print(e, v)
        elif k.startswith('@'):
          if isinstance(v, str): # 2021-06-05
            root.set(k[1:], v)
          else:
            pass
        elif isinstance(v, list):
          for e in v:
            _to_etree(e, ET.SubElement(root, k))
        else:
          _to_etree(v, ET.SubElement(root, k))
    else:
      assert d == 'invalid type', (type(d), d)
  assert isinstance(d, dict) and len(d) == 1
  tag, body = next(iter(d.items()))
  _to_etree(body, root)
  return root

エラー箇所

PINT-UBL-validation-preprocessed.schでスキーマトロンチェックしたところ、次のメッセージでした。
[ibr-co-15]-Invoice total amount with Tax (ibt-112) = Invoice total amount without Tax (ibt-109) + Invoice total Tax amount (ibt-110).
これは、税込合計金額の計算が間違っているということですから、検算してみると。
ibt-109 + ibt-110 = 255990 + 20000 = 275990 ですから、ibt-112に記載された281990が間違っていました。
ibt-112を275990に訂正したところ、今度は、
[ibr-co-16]-Amount due for payment (ibt-115) = Invoice total amount with Tax (ibt-112) – Paid amount (ibt-113) + Rounding amount (ibt-114).
ibt-112だけでなくibt-115も同様の訂正が必要でした。
今度は、Validation sucsessfulです。
さて、他の箇所は大丈夫でしょうか?税額計算は、ibt-116課税対象金額 200000に標準税率10%をかけた金額20000ですから正しそうです。
標準税率10%の明細行は、250000 + 2500 = 252500 ですので、ibt-116課税対象金額は、 200000でなく252500でなければなりません。
JP PINT 0.9.3では、BIS Billing 3.0で提供されていた、税率ごとの明細行の請求金額の合計計算についてのチェックが提供されていません。
このため、C4での検算が不可欠となっています。
詳しくは、BIS Billing 3.0からJP PINT 0.9.3に引き継がれなかった検証ルールをお読みください。