This article demonstrates an example that uses Spark as the analytics engine, Cassandra as the data store, and Spring Boot to develop the driver program.
1. Prerequisites
- Install Spark (this article uses Spark 1.5.1, installed under /opt/spark)
- Install Cassandra (3.0+)
Create a keyspace:

```sql
CREATE KEYSPACE hfcb WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 };
```
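A replication factor of 3 assumes a cluster of at least three Cassandra nodes. If you are following along on a single-node development install, a factor of 1 avoids unavailable-replica errors; a possible single-node variant:

```sql
-- Development-only variant (assumption: a single Cassandra node,
-- not the 3-node cluster implied above).
CREATE KEYSPACE hfcb WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
```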
Create a table:

```sql
CREATE TABLE person (
    id text PRIMARY KEY,
    first_name text,
    last_name text
);
```
Insert test data:

```sql
insert into person (id, first_name, last_name) values ('1', 'wang', 'yunfei');
insert into person (id, first_name, last_name) values ('2', 'peng', 'chao');
insert into person (id, first_name, last_name) values ('3', 'li', 'jian');
insert into person (id, first_name, last_name) values ('4', 'zhang', 'jie');
insert into person (id, first_name, last_name) values ('5', 'liang', 'wei');
```
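To confirm the rows landed, a quick check in cqlsh:

```sql
-- Should return the five rows inserted above (Cassandra returns rows
-- in partition-token order, so it will not match insertion order).
SELECT * FROM hfcb.person;
```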
2. Installing spark-cassandra-connector
To let Spark 1.5.1 use Cassandra as a data store, add the following jar dependencies (this example places them under /opt/spark/managed-lib/, but any directory works):
```
cassandra-clientutil-3.0.2.jar
cassandra-driver-core-3.1.4.jar
guava-16.0.1.jar
cassandra-thrift-3.0.2.jar
joda-convert-1.2.jar
joda-time-2.9.9.jar
libthrift-0.9.1.jar
spark-cassandra-connector_2.10-1.5.1.jar
```
In the /opt/spark/conf directory, create a new spark-env.sh file with the following content:

```sh
SPARK_CLASSPATH=/opt/spark/managed-lib/*
```
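Note that `SPARK_CLASSPATH` is deprecated in Spark 1.x in favor of the `extraClassPath` settings. A sketch of the equivalent setup in `/opt/spark/conf/spark-defaults.conf`, assuming the same jar directory:

```properties
# Equivalent classpath setup via spark-defaults.conf instead of spark-env.sh
spark.driver.extraClassPath    /opt/spark/managed-lib/*
spark.executor.extraClassPath  /opt/spark/managed-lib/*
```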
3. Spring Boot application development
Add the spark-cassandra-connector and Spark dependencies:
```xml
<dependency>
    <groupId>com.datastax.spark</groupId>
    <artifactId>spark-cassandra-connector_2.10</artifactId>
    <version>1.5.1</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.5.1</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.5.1</version>
</dependency>
```
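All three artifacts must share the same Scala suffix (`_2.10`), and all three versions must stay compatible with the Spark install. A Maven property block keeps them in sync (a sketch; the property names here are my own, not from the original project):

```xml
<properties>
    <!-- Hypothetical property names; reference them from the three
         dependencies above so the versions cannot drift apart. -->
    <scala.binary.version>2.10</scala.binary.version>
    <spark.version>1.5.1</spark.version>
</properties>
```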
Configure the Spark and Cassandra endpoints in application.yml:

```yaml
spark.master: spark://master:7077
cassandra.host: 192.168.1.140
cassandra.keyspace: hfcb
```
Note in particular that spark://master:7077 uses a hostname, not an IP address; you can edit the local hosts file to map master to the right IP.
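For example, an `/etc/hosts` entry like the one below on the driver machine makes `spark://master:7077` resolvable (the IP here is an example; substitute your Spark master's actual address):

```text
# /etc/hosts — map the Spark master hostname to its IP (example address)
192.168.1.130   master
```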
Configure the SparkContext and CassandraSQLContext:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.cassandra.CassandraSQLContext;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class SparkCassandraConfig {

    @Value("${spark.master}")
    String sparkMasterUrl;

    @Value("${cassandra.host}")
    String cassandraHost;

    @Value("${cassandra.keyspace}")
    String cassandraKeyspace;

    @Bean
    public JavaSparkContext javaSparkContext() {
        SparkConf conf = new SparkConf(true)
                .set("spark.cassandra.connection.host", cassandraHost)
                // .set("spark.cassandra.auth.username", "cassandra")
                // .set("spark.cassandra.auth.password", "cassandra")
                .set("spark.submit.deployMode", "client");
        return new JavaSparkContext(sparkMasterUrl, "SparkDemo", conf);
    }

    @Bean
    public CassandraSQLContext sqlContext() {
        CassandraSQLContext cassandraSQLContext =
                new CassandraSQLContext(javaSparkContext().sc());
        cassandraSQLContext.setKeyspace(cassandraKeyspace);
        return cassandraSQLContext;
    }
}
```
A simple call:

```java
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.cassandra.CassandraSQLContext;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Repository;

@Repository
public class PersonRepository {

    @Autowired
    CassandraSQLContext cassandraSQLContext;

    public Long countPerson() {
        // Spark SQL (not plain CQL), so an arbitrary ORDER BY is allowed here
        DataFrame people = cassandraSQLContext.sql("select * from person order by id");
        return people.count();
    }
}
```
Start the application and it runs like any ordinary Spring Boot program.
Source code: https://github.com/wiselyman/spring-spark-cassandra.git
Summary
The above is an example of integrating Spring Boot with Spark and Cassandra. I hope it is helpful; if you have any questions, please leave a comment.
Original article: http://www.wisely.top/2018/02/01/spring_boot-spark-cassandra-integration/