solr
solr的安装
简介
Solr是Apache Lucene专案的开源企业搜寻平台。
其主要功能包括全文检索、命中标示、分面搜寻、动态聚类、资料库整合,以及富文字的处理。
Solr是高度可延伸的,并提供了分散式搜寻和索引复制。
Solr是最流行的企业级搜寻引擎,Solr 4还增加了NoSQL支援。
资源
依赖环境
目录结构
bin:solr的运行脚本
contrib:solr的一些软件/插件,用于增强solr的功能。
dist:该目录包含jar文件,以及相关的依赖文件。
docs:solr的API文档
example:solr工程示例:
licenses:solr相关引用的一些许可信息
Server:solr的核心,可以看成是一个数据库里面有多个实例
按照示例启动solr
执行启动命令
./bin/solr start -e cloud
提示选择节点数量
*** [WARN] *** Your Max Processes Limit is currently 63068.
It should be set to 65000 to avoid operational disruption.
If you no longer wish to see this warning, set SOLR_ULIMIT_CHECKS to false in your profile or solr.in.sh
Welcome to the SolrCloud example!
This interactive session will help you launch a SolrCloud cluster on your local workstation.
To begin, how many Solr nodes would you like to run in your local cluster? (specify 1-4 nodes) [2]:
这个提示是询问要运行多少个节点。注意最后一行末尾的[2];这是默认的节点数。若是不需要修改默认数据,按 enter
键即可
3
Ok, let's start up 3 Solr nodes for your example SolrCloud cluster.
Please enter the port for node1 [8983]:
Please enter the port for node2 [7574]:
Please enter the port for node3 [8984]:
若是想要3各节点则输入3即可,后续会提示输入每个阶段的端口,后面的数字是默认端口,不需要修改则按 enter
键即可。
但是启动会报错
*** [WARN] *** Your Max Processes Limit is currently 63068.
It should be set to 65000 to avoid operational disruption.
If you no longer wish to see this warning, set SOLR_ULIMIT_CHECKS to false in your profile or solr.in.sh
WARNING: Starting Solr as the root user is a security risk and not considered best practice. Exiting.
Please consult the Reference Guide. To override this check, start with argument '-force'
这是建议我们不直接使用root用户启动,要是确定用root用户启动则在后面添加参数 -force
即可
- 建立新用户
添加用户
useradd -m solr -p solr
把solr-8.8.2
相关文件夹赋权给solr
用户
chown -R solr /var/local/solr-8.8.2/
切换用户
su solr
root
用户切换时不需要输入密码,切换的solr
用户有部分权限
再次启动solr
Started Solr server on port 8984 (pid=23742). Happy searching!
INFO - 2021-04-13 19:45:19.968; org.apache.solr.common.cloud.ConnectionManager; Waiting for client to connect to ZooKeeper
INFO - 2021-04-13 19:45:19.987; org.apache.solr.common.cloud.ConnectionManager; zkClient has connected
INFO - 2021-04-13 19:45:19.988; org.apache.solr.common.cloud.ConnectionManager; Client is connected to ZooKeeper
INFO - 2021-04-13 19:45:20.006; org.apache.solr.common.cloud.ZkStateReader; Updated live nodes from ZooKeeper... (0) -> (3)
INFO - 2021-04-13 19:45:20.023; org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider; Cluster at localhost:9983 ready
Now let's create a new collection for indexing documents in your 3-node cluster.
Please provide a name for your new collection: [gettingstarted]
启动完成后,系统将提示您创建一个用于索引数据的集合
How many shards would you like to split solrtest into? [2]
How many replicas per shard would you like to create? [2]
Please choose a configuration for the solrtest collection, available options are:
_default or sample_techproducts_configs [_default]
Created collection 'solrtest' with 2 shard(s), 2 replica(s) with config-set 'solrtest'
Enabling auto soft-commits with maxTime 3 secs using the Config API
POSTing request to Config API: http://localhost:8983/solr/solrtest/config
{"set-property":{"updateHandler.autoSoftCommit.maxTime":"3000"}}
Successfully set-property updateHandler.autoSoftCommit.maxTime to 3000
此时我们创建了一个3
实例,2
分片(平均分配索引数据),2
副本(用于故障转移)的solr集群
若是不想要集群,直接单机启动即可
./bin/solr start
查看启动的进程,发现启动参数如下
java -server -Xms512m -Xmx512m -XX:+UseG1GC -XX:+PerfDisableSharedMem -XX:+ParallelRefProcEnabled
-XX:MaxGCPauseMillis=250 -XX:+UseLargePages -XX:+AlwaysPreTouch -XX:+ExplicitGCInvokesConcurrent
-verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps
-XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime
-Xloggc:/var/local/solr-8.8.2/example/cloud/node1/solr/../logs/solr_gc.log
-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=9 -XX:GCLogFileSize=20M
-Dsolr.jetty.inetaccess.includes= -Dsolr.jetty.inetaccess.excludes=
-DzkClientTimeout=30000 -DzkRun -Dsolr.log.dir=/var/local/solr-8.8.2/example/cloud/node1/solr/../logs
-Djetty.port=8983 -DSTOP.PORT=7983 -DSTOP.KEY=solrrocks -Duser.timezone=UTC
-XX:-OmitStackTraceInFastThrow -Djetty.home=/var/local/solr-8.8.2/server
-Dsolr.solr.home=/var/local/solr-8.8.2/example/cloud/node1/solr
-Dsolr.data.home= -Dsolr.install.dir=/var/local/solr-8.8.2
-Dsolr.default.confdir=/var/local/solr-8.8.2/server/solr/configsets/_default/conf
-Dlog4j.configurationFile=/var/local/solr-8.8.2/server/resources/log4j2.xml -Xss256k
-Dsolr.log.muteconsole -XX:OnOutOfMemoryError=/var/local/solr-8.8.2/bin/oom_solr.sh 8983
/var/local/solr-8.8.2/example/cloud/node1/solr/../logs -jar start.jar --module=http
此数据可以在solr的控制台中查看 http://localhost:8983/solr/#/
关闭solr
bin/solr stop -p 8983
or
bin/solr stop -all
自定义启动
Creating Solr home directory /var/local/solr-8.8.2/example/cloud/node1/solr
Cloning /var/local/solr-8.8.2/example/cloud/node1 into
/var/local/solr-8.8.2/example/cloud/node2
Cloning /var/local/solr-8.8.2/example/cloud/node1 into
/var/local/solr-8.8.2/example/cloud/node3
Starting up Solr on port 8888 using command:
"bin/solr" start -cloud -p 8888 -s "example/cloud/node1/solr"
Starting up Solr on port 7574 using command:
"bin/solr" start -cloud -p 7574 -s "example/cloud/node2/solr" -z localhost:9888
Starting up Solr on port 8984 using command:
"bin/solr" start -cloud -p 8984 -s "example/cloud/node3/solr" -z localhost:9888
关于ZooKeeper
solr内部集了ZooKeeper,当solr启动的时候ZooKeeper也会启动,solr关闭的时候ZooKeeper也会关闭,它不提供任何故障转移,依赖于它的任何分片或Solr实例也无法彼此通信,因此生产中最好使用外部的ZooKeeper
若是单机测试时不想自己启动ZooKeeper,可以修改配置文件下的zoo.cfg
文件,放开指定端口的配置,这样所有的solr启动实例都会注册到这个zookeeper中(启动第二次会报错,然后注册已启动的zk实例中)
同步数据库数据
- 导包
导入 dist/solr-dataimporthandler-*.jar 两个jar包及下载mysql驱动包,放置在 server/solr-webapp/webapp/WEB-INF/lib 路径
- 修改配置文件
添加数据库配置文件 data-config.xml
<dataConfig>
<dataSource type="JdbcDataSource"
driver="com.mysql.cj.jdbc.Driver"
url="jdbc:mysql://cdb-ilzz1jt1.gz.tencentcdb.com:10068/writing_helper?userSSL=true&useUnicode=true&characterEncoding=UTF8&serverTimezone=UTC"
user="root"
password="{password}"/>
<document>
<entity name="test_collection" pk="id"
query="select id,idiom,classify_id,description,derivation,content,ctime from tbl_idiom_info"
deltaImportQuery="select id,idiom,classify_id,description,derivation,content,ctime from tbl_idiom_info where id='${dataimporter.delta.id}'"
deltaQuery="select id,idiom,classify_id,description,derivation,content,ctime from tbl_idiom_info where ctime > '${dataimporter.last_index_time}'">
<field column="id" name="index" />
<field column="idiom" name="idiom" />
<field column="classify_id" name="classify_id" />
<field column="description" name="description" />
<field column="derivation" name="derivation" />
<field column="content" name="content" />
<field column="ctime" name="ctime" />
</entity>
</document>
</dataConfig>
在solrconfig中添加配置
<requestHandler name="/dataimport"
class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
<str name="config">data-config.xml</str>
</lst>
</requestHandler>
在manage-schame中添加配置
<field name="index" type="plong" indexed="true" stored="true" />
<field name="idiom" type="text_tr" indexed="true" stored="true" />
<field name="classify_id" type="text_tr" indexed="true" stored="true" />
<field name="description" type="text_tr" indexed="true" stored="true" />
<field name="derivation" type="text_tr" indexed="true" stored="true" />
<field name="content" type="text_tr" indexed="true" stored="true" />
<field name="ctime" type="string" indexed="true" stored="true" />
- 重启
添加插件
- 添加拼音分词器
<!-- ik分词器 -->
<fieldType name="text_ik" class="solr.TextField">
<analyzer type="index">
<tokenizer class="org.wltea.analyzer.lucene.IKTokenizerFactory" useSmart="false" conf="ik.conf"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="org.wltea.analyzer.lucene.IKTokenizerFactory" useSmart="true" conf="ik.conf"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
作者:admin 创建时间:2023-04-20 16:46
最后编辑:admin 更新时间:2024-05-14 10:08
最后编辑:admin 更新时间:2024-05-14 10:08