In-house SE × Programmer × Big Data

I'm interested in programming and other IT topics.

Watching multiple directories with Spark Streaming's textFileStream

Source code

All it takes is creating two DStreams, one per directory.

// create one DStream per monitored directory
// (jssc is an already-constructed JavaStreamingContext)
String logDir = "/tmp/logs";
String logDir2 = "/tmp/logs2";
JavaDStream<String> logData = jssc.textFileStream(logDir);
JavaDStream<String> logData2 = jssc.textFileStream(logDir2);

// output: each print() registers its own output operation
logData.print();
logData2.print();

// start streaming
jssc.start();

// block until the streaming job terminates
jssc.awaitTermination();
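
To try the code above, you need files to show up in /tmp/logs and /tmp/logs2 while the job is running. As far as I understand from the Spark Streaming guide, files should appear in a monitored directory atomically (moved or renamed into it) rather than being written in place. The following is only a minimal sketch of that idea using the Java standard library. The class name LogFileDropper, the drop method, and the staging directory are hypothetical names I made up for illustration, and the staging directory is assumed to be on the same filesystem as the monitored directory so that the atomic move is supported.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

// Hypothetical helper (not part of Spark): drops a finished file into a monitored directory.
public class LogFileDropper {

    /**
     * Writes the lines to a file under stagingDir first, then atomically
     * moves the finished file into monitoredDir and returns its new path.
     * Assumes both directories live on the same filesystem.
     */
    public static Path drop(Path stagingDir, Path monitoredDir,
                            String fileName, List<String> lines) throws IOException {
        Files.createDirectories(stagingDir);
        Files.createDirectories(monitoredDir);

        // Write the complete content outside the monitored directory.
        Path staged = stagingDir.resolve(fileName);
        Files.write(staged, lines, StandardCharsets.UTF_8);

        // Move it in one step so a reader never sees a half-written file.
        return Files.move(staged, monitoredDir.resolve(fileName),
                          StandardCopyOption.ATOMIC_MOVE);
    }
}
```

For example, calling LogFileDropper.drop with a staging directory such as /tmp/staging, the monitored directory /tmp/logs, a file name, and a list of log lines should leave one complete file in /tmp/logs. The /tmp/staging path is just an example, not something from the original setup.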

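Output
In the print() output, each batch interval (one second in this run) now shows two result blocks,
one per DStream, because each print() call is a separate output operation.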

-------------------------------------------
Time: 1498053561000 ms
-------------------------------------------

-------------------------------------------
Time: 1498053561000 ms
-------------------------------------------

-------------------------------------------
Time: 1498053562000 ms
-------------------------------------------
2017-06-21T22:58:19+09:00	test.access1	{"message":"66.249.69.97 - - [24/Sep/2014:22:25:44 +0000] \"GET /071300/242153 HTTP/1.1\" 404 514 \"-\" \"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)\""}
2017-06-21T22:58:19+09:00	test.access1	{"message":"71.19.157.174 - - [24/Sep/2014:22:26:12 +0000] \"GET /error HTTP/1.1\" 404 505 \"-\" \"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.94 Safari/537.36\""}
2017-06-21T22:58:19+09:00	test.access1	{"message":"71.19.157.174 - - [24/Sep/2014:22:26:12 +0000] \"GET /favicon.ico HTTP/1.1\" 200 1713 \"-\" \"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.94 Safari/537.36\""}
2017-06-21T22:58:19+09:00	test.access1	{"message":"71.19.157.174 - - [24/Sep/2014:22:26:37 +0000] \"GET / HTTP/1.1\" 200 18785 \"-\" \"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.94 Safari/537.36\""}
2017-06-21T22:58:19+09:00	test.access1	{"message":"71.19.157.174 - - [24/Sep/2014:22:26:37 +0000] \"GET /jobmineimg.php?q=m HTTP/1.1\" 200 222 \"http://www.holdenkarau.com/\" \"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.94 Safari/537.36\""}

-------------------------------------------
Time: 1498053562000 ms
-------------------------------------------

-------------------------------------------
Time: 1498053571000 ms
-------------------------------------------

-------------------------------------------
Time: 1498053571000 ms
-------------------------------------------

-------------------------------------------
Time: 1498053572000 ms
-------------------------------------------
2017-06-21T22:58:29+09:00	test.access2	{"message":"71.19.157.174 - - [24/Sep/2014:22:26:12 +0000] \"GET /error78978 HTTP/1.1\" 404 505 \"-\" \"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.94 Safari/537.36\""}

-------------------------------------------
Time: 1498053572000 ms
-------------------------------------------
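
The Time values in the headers above are batch times in epoch milliseconds, which are hard to compare with the timestamps in the log lines by eye. Here is a small sketch that converts them to the same +09:00 notation the log lines use. BatchTime and toJst are names I picked for this example, and the fixed +09:00 offset is an assumption based on the log timestamps.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for reading the "Time: ... ms" headers printed by print().
public class BatchTime {

    /** Converts a batch time in epoch milliseconds to an ISO-8601 string at UTC+09:00. */
    public static String toJst(long epochMillis) {
        OffsetDateTime t = Instant.ofEpochMilli(epochMillis)
                                  .atOffset(ZoneOffset.ofHours(9));
        return DateTimeFormatter.ISO_OFFSET_DATE_TIME.format(t);
    }

    public static void main(String[] args) {
        // First batch header in the output above.
        System.out.println(toJst(1498053561000L)); // prints 2017-06-21T22:59:21+09:00
    }
}
```

With this, the batch at 1498053562000 ms that printed the test.access1 lines corresponds to 2017-06-21T22:59:22+09:00, and the batch at 1498053572000 ms that printed the test.access2 line corresponds to 2017-06-21T22:59:32+09:00.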