Pythonで構造化CSVを自動生成するスクリプトの解説

本記事では、複数のCSVファイル（請求書・買手・売手・明細・税情報など）をもとに、構造化された1枚または複数の請求書情報を出力するPythonスクリプトについて解説します。
中小企業共通EDIやXBRL-GLなどの仕訳帳・インボイス処理に応用可能です。

1. 背景

電子インボイスや構造化会計データの普及により、請求データをCSV形式で標準化し、税務・会計ソフト間で連携可能にすることが求められています。本スクリプトは、そのためのテンプレートとなります。

2. スクリプトの目的

請求書ごとに構造化された行形式（ヘッダ行、税情報行、明細行）を出力
単一または複数の請求書を対象に、柔軟な出力
Python＋Pandasによるシンプルな実装

3. ファイル構成

ファイル名

内容

Invoice.csv

請求書ヘッダ情報

Buyer.csv

買手（請求先）マスタ

Seller.csv

売手（供給者）マスタ

InvoiceLine.csv

明細行情報

Item.csv

商品マスタ（品目名、税率など）

TaxBreakdown.csv

税区分ごとの税情報

4. 出力されるCSV構造

出力CSVは以下の3階層の情報を含みます：

請求書ヘッダ（dInvoice）
税明細（dTaxBreakdown）
明細行（dInvoiceLine）

これにより、タブ形式の構造化CSV（tidy data）が実現されます。

5. 実行コマンド例

単一請求書（例：INV002）を出力：

python structured_invoice_csv.py --base_dir 2025/202506/20250601/ \
       --invoice_id INV002 \
       --single_csv_path 2025/202506/20250601/invoice_INV002.csv

全請求書を出力：

python structured_invoice_csv.py --base_dir 2025/202506/20250601/ \
       --all_csv_path 2025/202506/20250601/invoice_all.csv

6. 主な処理の流れ

6.1. 1. 各CSVファイルの読み込み

invoice_df = pd.read_csv(...Invoice.csv)
buyer_df = pd.read_csv(...Buyer.csv)
...

6.2. 2. カラム構造定義

出力CSVの列構造を固定し、複数インボイスにも対応。

columns = [
    "dInvoice", "dTaxBreakdown", "dInvoiceLine", ...
]

6.3. 3. 単一インボイス出力（`export_single_invoice`）

ヘッダ情報の抽出と出力
税区分ごとにユニークキーで並べ替え、dTaxBreakdown 番号を付与
明細行を ID.factorize() で順番に番号付けし出力

6.4. 4. 全インボイス出力（`export_all_invoices`）

請求書をID順にソートし dInvoice を連番付与
税と明細は各インボイスごとに繰り返し出力

6.5. 5. 書き出し処理

NA/空白を適切に処理し、utf-8-sig でCSV保存。

df.to_csv(path, index=False, encoding="utf-8-sig")

7. launch.json例（VS Code）

{
  "name": "構造化インボイスCSV出力",
  "type": "python",
  "request": "launch",
  "program": "${workspaceFolder}/structured_invoice_csv.py",
  "args": [
    "--base_dir", "2025/202506/20250601/",
    "--invoice_id", "INV002",
    "--single_csv_path", "2025/202506/20250601/invoice_INV002.csv",
    "--all_csv_path", "2025/202506/20250601/invoice_all.csv"
  ]
}

8. おわりに

このスクリプトは、実務での請求書やインボイスの構造化処理にそのまま活用できます。

#!/usr/bin/env python3
# coding: utf-8
"""
Universal Audit Data Adaptor: structured_csv.py

This script processes CSV files dumped from the RDB and converts them into a hierarchically organized structured CSV format.

designed by SAMBUICHI, Nobuyuki (Sambuichi Professional Engineers Office)
written by SAMBUICHI, Nobuyuki (Sambuichi Professional Engineers Office)

Creation Date: 2025-05-30
Last Modified: 2025-06-02

MIT License

(c) 2024 SAMBUICHI Nobuyuki (Sambuichi Professional Engineers Office)

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
"""
import pandas as pd
import os
import argparse

def main(base_dir, invoice_id=None, single_csv_path=None, all_csv_path=None):
    # 出力ファイル名のデフォルト設定（指定がなければ標準のファイル名を使用）
    if single_csv_path is None:
        single_csv_path = os.path.join(base_dir, "structured_invoice_export0.csv")
    if all_csv_path is None:
        all_csv_path = os.path.join(base_dir, "structured_invoice_export1.csv")

    # --- CSV 読み込み ---
    # 各種マスタファイルと取引データを読み込む
    invoice_df = pd.read_csv(os.path.join(base_dir, "Invoice.csv"), dtype=str)
    buyer_df = pd.read_csv(os.path.join(base_dir, "Buyer.csv"), dtype=str)
    seller_df = pd.read_csv(os.path.join(base_dir, "Seller.csv"), dtype=str)
    tax_df = pd.read_csv(os.path.join(base_dir, "TaxBreakdown.csv"), dtype=str)
    line_df = pd.read_csv(os.path.join(base_dir, "InvoiceLine.csv"), dtype=str)
    item_df = pd.read_csv(os.path.join(base_dir, "Item.csv"), dtype=str)

    # --- カラム定義 ---
    # 出力CSVの列順と構造を定義する
    columns = [
        "dInvoice", "dTaxBreakdown", "dInvoiceLine",
        "Invoice.ID", "issueDate", "typeCode", "dueDate",
        "Buyer.name", "Seller.name", "Seller.taxID",
        "sumOfLineNetAmount", "totalAmountWithoutTax", "totalTaxAmount", "totalAmountWithTax",
        "TaxBreakdown.taxCategoryCode", "TaxBreakdown.taxCategoryRate", "TaxBreakdown.taxCategoryTaxAmount",
        "InvoiceLine.ID", "InvoiceLine.netAmount", "InvoiceLine.quantity", "InvoiceLine.uom",
        "Item.ID", "Item.name", "Item.price", "Item.baseQuantity", "Item.uom",
        "Item.taxCategoryCode", "Item.taxCategoryRate"
    ]

    # --- 単一請求書の出力処理 ---
    def export_single_invoice(inv_id):
        rows = []
        # 対象の請求書レコードを抽出
        i = invoice_df[invoice_df["ID"] == inv_id]
        if i.empty:
            raise ValueError(f"Invoice {inv_id} not found.")
        i = i.iloc[0]
        # BuyerおよびSeller情報を取得
        b = buyer_df[buyer_df["ID"] == i["buyerID"]].iloc[0]
        s = seller_df[seller_df["ID"] == i["sellerID"]].iloc[0]

        # 請求書ヘッダ行を追加
        rows.append({
            "dInvoice": 1, "dTaxBreakdown": pd.NA, "dInvoiceLine": pd.NA,
            "Invoice.ID": i["ID"], "issueDate": i.get("issueDate", ""), "typeCode": i.get("typeCode", ""),
            "dueDate": i.get("dueDate", ""), "Buyer.name": b.get("name", ""), "Seller.name": s.get("name", ""),
            "Seller.taxID": s.get("taxID", ""), "sumOfLineNetAmount": i.get("sumOfLineNetAmount", ""),
            "totalAmountWithoutTax": i.get("totalAmountWithoutTax", ""), "totalTaxAmount": i.get("totalTaxAmount", ""),
            "totalAmountWithTax": i.get("totalAmountWithTax", ""),
            **{k: "" for k in columns[14:]}
        })

        # 税情報の出力（ユニークなカテゴリごとに並べ替えて順番を振る）
        tb = tax_df[tax_df["invoiceID"] == inv_id].copy()
        tb["key"] = tb["taxCategoryCode"].fillna('') + "|" + tb["taxCategoryRate"].fillna('')
        tb["dTaxBreakdown"] = tb["key"].factorize()[0] + 1

        for _, t in tb.iterrows():
            row = {k: "" for k in columns}
            row.update({
                "dInvoice": 1,
                "dTaxBreakdown": t["dTaxBreakdown"],
                "TaxBreakdown.taxCategoryCode": t.get("taxCategoryCode", ""),
                "TaxBreakdown.taxCategoryRate": t.get("taxCategoryRate", ""),
                "TaxBreakdown.taxCategoryTaxAmount": t.get("taxCategoryTaxAmount", "")
            })
            rows.append(row)

        # 明細行情報の出力
        il = line_df[line_df["invoiceID"] == inv_id].copy()
        il["dInvoiceLine"] = il["ID"].factorize()[0] + 1

        for _, l in il.iterrows():
            item = item_df[item_df["ID"] == l["itemID"]].iloc[0]
            row = {k: "" for k in columns}
            row.update({
                "dInvoice": 1,
                "dInvoiceLine": l["dInvoiceLine"],
                "InvoiceLine.ID": l.get("ID", ""),
                "InvoiceLine.netAmount": l.get("netAmount", ""),
                "InvoiceLine.quantity": l.get("quantity", ""),
                "InvoiceLine.uom": l.get("uom", ""),
                "Item.ID": item.get("ID", ""), "Item.name": item.get("name", ""),
                "Item.price": item.get("price", ""), "Item.baseQuantity": item.get("baseQuantity", ""),
                "Item.uom": item.get("uom", ""), "Item.taxCategoryCode": item.get("taxCategoryCode", ""),
                "Item.taxCategoryRate": item.get("taxCategoryRate", "")
            })
            rows.append(row)

        return pd.DataFrame(rows, columns=columns)

    # --- 全請求書の出力処理 ---
    def export_all_invoices():
        rows = []
        invoice_df_sorted = invoice_df.sort_values("ID").copy()
        invoice_df_sorted["dInvoice"] = range(1, len(invoice_df_sorted) + 1)

        for _, i in invoice_df_sorted.iterrows():
            d_inv = i["dInvoice"]
            b = buyer_df[buyer_df["ID"] == i["buyerID"]].iloc[0]
            s = seller_df[seller_df["ID"] == i["sellerID"]].iloc[0]

            # 請求書ヘッダ行
            rows.append({
                "dInvoice": d_inv, "dTaxBreakdown": pd.NA, "dInvoiceLine": pd.NA,
                "Invoice.ID": i["ID"], "issueDate": i.get("issueDate", ""), "typeCode": i.get("typeCode", ""),
                "dueDate": i.get("dueDate", ""), "Buyer.name": b.get("name", ""), "Seller.name": s.get("name", ""),
                "Seller.taxID": s.get("taxID", ""), "sumOfLineNetAmount": i.get("sumOfLineNetAmount", ""),
                "totalAmountWithoutTax": i.get("totalAmountWithoutTax", ""), "totalTaxAmount": i.get("totalTaxAmount", ""),
                "totalAmountWithTax": i.get("totalAmountWithTax", ""),
                **{k: "" for k in columns[14:]}
            })

            # 税ブレークダウン行
            tb = tax_df[tax_df["invoiceID"] == i["ID"]].copy()
            tb["key"] = tb["taxCategoryCode"].fillna('') + "|" + tb["taxCategoryRate"].fillna('')
            tb["dTaxBreakdown"] = tb["key"].factorize()[0] + 1

            for _, t in tb.iterrows():
                row = {k: "" for k in columns}
                row.update({
                    "dInvoice": d_inv,
                    "dTaxBreakdown": t["dTaxBreakdown"],
                    "TaxBreakdown.taxCategoryCode": t.get("taxCategoryCode", ""),
                    "TaxBreakdown.taxCategoryRate": t.get("taxCategoryRate", ""),
                    "TaxBreakdown.taxCategoryTaxAmount": t.get("taxCategoryTaxAmount", "")
                })
                rows.append(row)

            # 明細行
            il = line_df[line_df["invoiceID"] == i["ID"]].copy()
            il["dInvoiceLine"] = il["ID"].factorize()[0] + 1

            for _, l in il.iterrows():
                item = item_df[item_df["ID"] == l["itemID"]].iloc[0]
                row = {k: "" for k in columns}
                row.update({
                    "dInvoice": d_inv,
                    "dInvoiceLine": l["dInvoiceLine"],
                    "InvoiceLine.ID": l.get("ID", ""),
                    "InvoiceLine.netAmount": l.get("netAmount", ""),
                    "InvoiceLine.quantity": l.get("quantity", ""),
                    "InvoiceLine.uom": l.get("uom", ""),
                    "Item.ID": item.get("ID", ""), "Item.name": item.get("name", ""),
                    "Item.price": item.get("price", ""), "Item.baseQuantity": item.get("baseQuantity", ""),
                    "Item.uom": item.get("uom", ""), "Item.taxCategoryCode": item.get("taxCategoryCode", ""),
                    "Item.taxCategoryRate": item.get("taxCategoryRate", "")
                })
                rows.append(row)

        return pd.DataFrame(rows, columns=columns)

    # --- 処理分岐 ---
    if invoice_id:
        # 単一請求書出力
        if single_csv_path:
            df0 = export_single_invoice(invoice_id)
            for col in ["dInvoice", "dTaxBreakdown", "dInvoiceLine"]:
                df0[col] = df0[col].apply(lambda x: pd.NA if not str(x).strip().isdigit() else int(x)).astype("Int64")
            df0.to_csv(single_csv_path, index=False, encoding="utf-8-sig")
            print(f"✅ Exported: {single_csv_path}")
        # 全請求書も出力
        if all_csv_path:
            df1 = export_all_invoices()
            for col in ["dInvoice", "dTaxBreakdown", "dInvoiceLine"]:
                df1[col] = df1[col].apply(lambda x: pd.NA if not str(x).strip().isdigit() else int(x)).astype("Int64")
            df1.to_csv(all_csv_path, index=False, encoding="utf-8-sig")
            print(f"✅ Exported: {all_csv_path}")
    else:
        # 全請求書出力（単体請求書ID未指定時）
        df1 = export_all_invoices()
        for col in ["dInvoice", "dTaxBreakdown", "dInvoiceLine"]:
            df1[col] = df1[col].apply(lambda x: pd.NA if not str(x).strip().isdigit() else int(x)).astype("Int64")
        df1.to_csv(all_csv_path, index=False, encoding="utf-8-sig")
        print(f"✅ Exported: {all_csv_path}")

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="構造化インボイスCSVエクスポート")
    parser.add_argument("--base_dir", required=True, help="CSVファイル格納ディレクトリ")
    parser.add_argument("--invoice_id", help="単一請求書ID（指定時はその請求書を出力）")
    parser.add_argument("--single_csv_path", help="単一請求書の出力先ファイルパス")
    parser.add_argument("--all_csv_path", help="全請求書の出力先ファイルパス")
    args = parser.parse_args()
    main(
        args.base_dir,
        invoice_id=args.invoice_id,
        single_csv_path=args.single_csv_path,
        all_csv_path=args.all_csv_path
    )

1. 背景
2. スクリプトの目的
3. ファイル構成
4. 出力されるCSV構造
5. 実行コマンド例
6. 主な処理の流れ
7. launch.json例（VS Code）
8. おわりに

三分一技術士事務所

Pythonで構造化CSVを自動生成するスクリプトの解説

Pythonで構造化CSVを自動生成するスクリプトの解説

1. 背景

2. スクリプトの目的

3. ファイル構成

4. 出力されるCSV構造

5. 実行コマンド例

6. 主な処理の流れ

6.1. 1. 各CSVファイルの読み込み

6.2. 2. カラム構造定義

6.3. 3. 単一インボイス出力（`export_single_invoice`）

6.4. 4. 全インボイス出力（`export_all_invoices`）

6.5. 5. 書き出し処理

7. launch.json例（VS Code）

8. おわりに

コメント

コメントを残すコメントをキャンセル

2025年6月
月	火	水	木	金	土	日
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28	29
30

Pythonで構造化CSVを自動生成するスクリプトの解説

Pythonで構造化CSVを自動生成するスクリプトの解説

1. 背景

2. スクリプトの目的

3. ファイル構成

4. 出力されるCSV構造

5. 実行コマンド例

6. 主な処理の流れ

6.1. 1. 各CSVファイルの読み込み

6.2. 2. カラム構造定義

6.3. 3. 単一インボイス出力（export_single_invoice）

6.4. 4. 全インボイス出力（export_all_invoices）

6.5. 5. 書き出し処理

7. launch.json例（VS Code）

8. おわりに

コメント

コメントを残す コメントをキャンセル

6.3. 3. 単一インボイス出力（`export_single_invoice`）

6.4. 4. 全インボイス出力（`export_all_invoices`）

コメントを残すコメントをキャンセル