Spring Batch Partitioning example

Photo credit: Spring Source

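In Spring Batch, "partitioning" means "multiple threads, each processing its own range of data". For example, assume a table holds 100 records with a "primary id" assigned from 1 to 100, and you want to process all 100 records.

Normally, the process runs from 1 to 100 in a single thread. That process is estimated to take 10 minutes to finish.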

Single Thread - Process from 1 to 100

With "partitioning", we can start 10 threads that each process 10 records, split by range of "id". The process may now take only about 1 minute to finish.

Thread 1 - Process from 1 to 10
Thread 2 - Process from 11 to 20
Thread 3 - Process from 21 to 30
......
Thread 9 - Process from 81 to 90
Thread 10 - Process from 91 to 100
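
To make the idea concrete before any Spring Batch code appears, here is a minimal sketch that uses only the JDK's ExecutorService. It is an illustration, not how Spring Batch implements partitioning; the class RangeThreadsSketch and its methods worker and processAll are hypothetical names, and the "work" done per record is just a counter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Plain-JDK illustration only; Spring Batch is not involved here.
final class RangeThreadsSketch {

    // Hypothetical worker: "processes" ids fromId..toId and returns how many it handled.
    static Callable<Integer> worker(final int fromId, final int toId) {
        return new Callable<Integer>() {
            @Override
            public Integer call() {
                int handled = 0;
                for (int id = fromId; id <= toId; id++) {
                    handled++; // real work on record `id` would go here
                }
                return handled;
            }
        };
    }

    // Starts one task per range of `range` ids and returns the total processed.
    static int processAll(int totalRecords, int range) {
        int threads = (totalRecords + range - 1) / range; // one thread per range
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<Future<Integer>>();
            for (int fromId = 1; fromId <= totalRecords; fromId += range) {
                int toId = Math.min(fromId + range - 1, totalRecords);
                futures.add(pool.submit(worker(fromId, toId)));
            }
            int total = 0;
            for (Future<Integer> f : futures) {
                total += f.get(); // wait for each range to finish
            }
            return total;
        } catch (Exception e) {
            throw new IllegalStateException("a range failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // 100 records, 10 per range: ten tasks run in parallel.
        System.out.println("Processed " + processAll(100, 10) + " records");
    }
}
```

Running main prints "Processed 100 records". Each task only touches its own id range, which is why the ranges can run in parallel without coordination.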

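To implement the "partitioning" technique, you must understand the structure of the input data you want to process, so that you can plan the "range of data" properly.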
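
One reason the data structure matters: if ids have gaps (for example after deletions), fixed-width id ranges no longer hold equal numbers of records, and some threads end up with far more work than others. The sketch below counts records per range for a made-up id list; the class RangeSkewCheck and the sample ids are assumptions for illustration only.

```java
import java.util.Arrays;

final class RangeSkewCheck {

    // Counts how many of the given ids fall into each fixed-width range
    // [1..range], [range+1..2*range], ... for `partitions` ranges.
    static int[] countPerRange(int[] ids, int range, int partitions) {
        int[] counts = new int[partitions];
        for (int id : ids) {
            int index = (id - 1) / range;
            // ids below 1 or beyond the last range are ignored
            if (id >= 1 && index < partitions) {
                counts[index]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical ids with a gap: 11..29 are missing.
        int[] ids = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 30, 31, 32};
        System.out.println(Arrays.toString(countPerRange(ids, 10, 4)));
    }
}
```

This prints [10, 0, 1, 2]: the first range carries ten records while the second carries none. A quick count like this before choosing ranges shows whether id-based ranges will balance the load.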

1. Tutorial

This tutorial shows how to create a "Partitioner" job with 10 threads. Each thread reads records from the database based on the range of "id" it is given.

Tools and libraries used

Maven 3
Eclipse 4.2
JDK 1.6
Spring Core 3.2.2.RELEASE
Spring Batch 2.2.0.RELEASE
MySQL Java Driver 5.1.25

P.S. Assume the "users" table has 100 records.

users table structure

id, user_login, user_pass, age

1,user_1,pass_1,20
2,user_2,pass_2,40
3,user_3,pass_3,70
4,user_4,pass_4,5
5,user_5,pass_5,52
......
99,user_99,pass_99,89
100,user_100,pass_100,76

2. Project Directory Structure

Review the final project structure, a standard Maven project.

(Figure: spring-batch-partitioner-before, project directory structure)

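3. Partitioner

First, create a Partitioner implementation and put the "partitioning range" into the ExecutionContext. Later, you will reference the same fromId and toId in the batch job XML file.

In this case, the partitioning ranges look like this: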

Thread 1 = 1 - 10
Thread 2 = 11 - 20
Thread 3 = 21 - 30
......
Thread 10 = 91 - 100

RangePartitioner.java

package com.example.partition;

import java.util.HashMap;
import java.util.Map;

import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.batch.item.ExecutionContext;

public class RangePartitioner implements Partitioner {

    @Override
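    public Map<String, ExecutionContext> partition(int gridSize) {

        // partition name -> context holding that partition's fromId, toId and name
        Map<String, ExecutionContext> result
                       = new HashMap<String, ExecutionContext>();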

        int range = 10;
        int fromId = 1;
        int toId = range;

        for (int i = 1; i <= gridSize; i++) {
            ExecutionContext value = new ExecutionContext();

            System.out.println("\nStarting : Thread" + i);
            System.out.println("fromId : " + fromId);
            System.out.println("toId : " + toId);

            value.putInt("fromId", fromId);
            value.putInt("toId", toId);

            // give each thread a name, thread 1,2,3
            value.putString("name", "Thread" + i);

            result.put("partition" + i, value);

            fromId = toId + 1;
            toId += range;

        }

        return result;
    }

}
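
Note that the loop above fixes range at 10, so it covers exactly 100 records only when gridSize is 10. If the grid size might change, the width could instead be derived from the id span. The following is a sketch of that calculation in plain Java, assuming the minimum and maximum id are known up front; RangeCalculator and split are hypothetical names, not part of Spring Batch. Each returned pair could then be stored with putInt("fromId", ...) and putInt("toId", ...) just as RangePartitioner does.

```java
import java.util.ArrayList;
import java.util.List;

final class RangeCalculator {

    // Splits the inclusive id span [minId, maxId] into at most gridSize
    // contiguous {fromId, toId} pairs of near-equal width.
    static List<int[]> split(int minId, int maxId, int gridSize) {
        if (gridSize < 1 || maxId < minId) {
            throw new IllegalArgumentException("invalid span or gridSize");
        }
        int total = maxId - minId + 1;
        int range = (total + gridSize - 1) / gridSize; // ceiling division
        List<int[]> result = new ArrayList<int[]>();
        for (int fromId = minId; fromId <= maxId; fromId += range) {
            // the last range is clipped so it never runs past maxId
            int toId = Math.min(fromId + range - 1, maxId);
            result.add(new int[] { fromId, toId });
        }
        return result;
    }

    public static void main(String[] args) {
        for (int[] r : split(1, 100, 8)) {
            System.out.println(r[0] + " - " + r[1]);
        }
    }
}
```

With a grid size of 8, the ranges are 13 ids wide and the last one is shortened to 92 - 100; with a grid size of 10, the result matches the 1 - 10 through 91 - 100 ranges listed earlier.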

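4. Batch Jobs

Review the batch job XML file; it should be self-explanatory. A few points to highlight:

For the partitioner, grid-size = number of threads.
For the pagingItemReader bean, a JDBC reader example, the #{stepExecutionContext[fromId]} and #{stepExecutionContext[toId]} values are injected from the ExecutionContext populated by rangePartitioner.
For the itemProcessor bean, the #{stepExecutionContext[name]} value is injected from the ExecutionContext populated by rangePartitioner.
For the writer, each thread outputs its records to a different CSV file, named in the format users.processed[fromId]-[toId].csv.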
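
The naming scheme in the last point can be sketched in plain Java. In the job itself the name would be assembled in the writer's configuration from the step execution context values; the class OutputFileNames and the method forRange below are hypothetical and only illustrate which file names the ten partitions produce.

```java
final class OutputFileNames {

    // Builds the per-partition file name in the
    // users.processed[fromId]-[toId].csv format.
    static String forRange(int fromId, int toId) {
        return "users.processed" + fromId + "-" + toId + ".csv";
    }

    public static void main(String[] args) {
        int range = 10;
        // One name per partition: 1-10, 11-20, ..., 91-100.
        for (int fromId = 1; fromId <= 100; fromId += range) {
            System.out.println(forRange(fromId, fromId + range - 1));
        }
    }
}
```

Because every partition has a distinct id range, the file names never collide and no two threads write to the same file.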

job-partitioner.xml

The item processor class is used only to print the item being processed and the name of the "thread" currently running it.

UserProcessor.java - item processor

package com.example.processor;

import org.springframework.batch.item.ItemProcessor;
import com.example.User;

public class UserProcessor implements ItemProcessor<User, User> {

    private String threadName;

    @Override
    public User process(User item) throws Exception {

        System.out.println(threadName + " processing : "
            + item.getId() + " : " + item.getUsername());

        return item;
    }

    public String getThreadName() {
        return threadName;
    }

    public void setThreadName(String threadName) {
        this.threadName = threadName;
    }

}

5. Run It

Load everything and run it. 10 threads will be started to process the provided ranges of data.

package com.example;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.context.ApplicationContext;
import org.springframework.context.support.ClassPathXmlApplicationContext;

public class PartitionApp {

  public static void main(String[] args) {
    PartitionApp obj = new PartitionApp ();
    obj.runTest();
  }

  private void runTest() {

    String[] springConfig = { "spring/batch/jobs/job-partitioner.xml" };

    ApplicationContext context = new ClassPathXmlApplicationContext(springConfig);

    JobLauncher jobLauncher = (JobLauncher) context.getBean("jobLauncher");
    Job job = (Job) context.getBean("partitionJob");

    try {

      JobExecution execution = jobLauncher.run(job, new JobParameters());
      System.out.println("Exit Status : " + execution.getStatus());
      System.out.println("Failure Exceptions : " + execution.getAllFailureExceptions());

    } catch (Exception e) {
        e.printStackTrace();
    }

      System.out.println("Done");

  }
}

Console output

Starting : Thread1
fromId : 1
toId : 10

Starting : Thread2
fromId : 11
toId : 20

Starting : Thread3
fromId : 21
toId : 30

Starting : Thread4
fromId : 31
toId : 40

Starting : Thread5
fromId : 41
toId : 50

Starting : Thread6
fromId : 51
toId : 60

Starting : Thread7
fromId : 61
toId : 70

Starting : Thread8
fromId : 71
toId : 80

Starting : Thread9
fromId : 81
toId : 90

Starting : Thread10
fromId : 91
toId : 100

Thread8 processing : 71 : user_71
Thread2 processing : 11 : user_11
Thread3 processing : 21 : user_21
Thread10 processing : 91 : user_91
Thread4 processing : 31 : user_31
Thread6 processing : 51 : user_51
Thread5 processing : 41 : user_41
Thread1 processing : 1 : user_1
Thread9 processing : 81 : user_81
Thread7 processing : 61 : user_61
Thread2 processing : 12 : user_12
Thread7 processing : 62 : user_62
Thread6 processing : 52 : user_52
Thread1 processing : 2 : user_2
Thread9 processing : 82 : user_82
......

After the process completes, 10 CSV files are created.

(Figure: spring-batch-partitioner-after, project directory with the generated CSV files)

users.processed1-10.csv

1,user_1,pass_1,20
2,user_2,pass_2,40
3,user_3,pass_3,70
4,user_4,pass_4,5
5,user_5,pass_5,52
6,user_6,pass_6,69
7,user_7,pass_7,48
8,user_8,pass_8,34
9,user_9,pass_9,62
10,user_10,pass_10,21

6. Misc

6.1 Alternatively, you can inject the #{stepExecutionContext[name]} via annotation.

UserProcessor.java - Annotation version

package com.example.processor;

import org.springframework.batch.item.ItemProcessor;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Scope;
import org.springframework.stereotype.Component;
import com.example.User;

@Component("itemProcessor")
@Scope(value = "step")
public class UserProcessor implements ItemProcessor<User, User> {

    @Value("#{stepExecutionContext[name]}")
    private String threadName;

    @Override
    public User process(User item) throws Exception {

        System.out.println(threadName + " processing : "
                     + item.getId() + " : " + item.getUsername());

        return item;
    }

}

Remember to enable Spring component auto-scanning.

6.2 Database partitioner reader - MongoDB example.

job-partitioner.xml

Done.

Download Source Code

Download: SpringBatch-Partitioner-Example.zip (31 KB)
