Steps to connect MongoDB and Solr using DataImportHandler

chetna agarwal · Jan 30, 2014 · Viewed 8.9k times

I am new to Solr and MongoDB.

I am trying to index data from MongoDB into Solr using DataImportHandler, but I could not find the exact steps I need to follow.

Could you please help me with the exact steps to index MongoDB data into Solr using DataImportHandler?

Solr version: 4.6.0

MongoDB version: 2.2.7

Answer

Manjunath H · Jan 22, 2015

This is a late answer, but I thought people might find it useful.

Below are the steps for importing data from MongoDB into Solr 4.7.0 using DataImportHandler.

Step 1:

Assume that your MongoDB instance has the following database and collection:

Database Name: Test
Collection Name: sample

The sample collection has the following documents:

db.sample.find()
{ "_id" : ObjectId("54c0c6666ee638a21198793b"), "Name" : "Rahul", "EmpNumber" : 452123 }
{ "_id" : ObjectId("54c0c7486ee638a21198793c"), "Name" : "Manohar", "EmpNumber" : 784521 }

Step 2:

Create a lib folder in your Solr home folder (the one that contains the bin and collection1 folders).

Add the jar files below to the lib folder. solr-mongo-importer is not part of the Solr distribution and has to be downloaded separately.

- solr-dataimporthandler-4.7.0.jar
- solr-mongo-importer-1.0.0.jar 
- mongo-java-driver-2.10.1.jar (this is the mongo java driver)

Step 3:

Declare the Solr fields in schema.xml (assuming that id is already defined by default).

Add the fields below to schema.xml inside the <fields> </fields> tag.

 <field name="Name" type="text_general" indexed="true" stored="true"/>
 <field name="EmployeeNumber" type="int" indexed="true" stored="true"/>

Step 4:

Declare the data-config file in solrconfig.xml by adding the code below inside the <config> </config> tag.

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>

Step 5:

Create a data-config.xml file in collection1\conf\ (the folder that by default holds solrconfig.xml and schema.xml).

data-config.xml:

<?xml version="1.0"?>
<dataConfig>
  <dataSource name="MyMongo" type="MongoDataSource" database="Test" />
  <document name="import">
    <!-- if query="" then it imports everything -->
    <entity processor="MongoEntityProcessor"
            query="{Name:'Rahul'}"
            collection="sample"
            datasource="MyMongo"
            transformer="MongoMapperTransformer" name="sample_entity">

      <!-- If the mongoField name and the field declared in schema.xml are the same, there is no need to declare it below.
           If they differ, map the mongoField to the field in schema.xml
           (e.g. mongoField="EmpNumber" to name="EmployeeNumber"). -->

      <field column="_id" name="id"/>
      <field column="EmpNumber" name="EmployeeNumber" mongoField="EmpNumber"/>
    </entity>
  </document>
</dataConfig>
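
To make the field mapping in this config easier to picture, here is a small Python sketch of the renaming it asks for. It is only an illustration and not the importer's actual code: `FIELD_MAP` and `to_solr_doc` are hypothetical names, and the ObjectId is shown as a plain string for simplicity.

```python
# Hypothetical illustration: mimics the renaming requested in data-config.xml.
# Mongo field name -> Solr field name, taken from the <field> elements above.
FIELD_MAP = {"_id": "id", "EmpNumber": "EmployeeNumber"}


def to_solr_doc(mongo_doc, field_map=FIELD_MAP):
    """Rename mapped fields; fields not in the map keep their Mongo name."""
    return {field_map.get(key, key): value for key, value in mongo_doc.items()}


# One document from the sample collection (ObjectId written as a string here).
sample = {"_id": "54c0c6666ee638a21198793b", "Name": "Rahul", "EmpNumber": 452123}
solr_doc = to_solr_doc(sample)
```

Name is not listed in the map because, as the comment in the config says, a Mongo field whose name matches the schema.xml field needs no explicit declaration.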

Step 6:

Assuming Solr (I have used port 8080) and MongoDB are running, open the following link in your browser to import the data from MongoDB into Solr: http://localhost:8080/solr/dataimport?command=full-import
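
If you would rather trigger the import from a script than from the browser, a Python 3 sketch along these lines should work. It assumes the same base URL as this step; the `wt=json` parameter, the helper names (`dataimport_url`, `run_command`, `wait_until_idle`) and the polling interval are illustrative choices, not part of the original setup.

```python
import json
import time
import urllib.parse
import urllib.request

# Assumption: same host, port and default core as in the answer.
SOLR_BASE = "http://localhost:8080/solr"


def dataimport_url(command, base=SOLR_BASE, **params):
    """Build a DataImportHandler URL such as .../dataimport?command=full-import."""
    query = {"command": command, "wt": "json"}
    query.update(params)
    return base + "/dataimport?" + urllib.parse.urlencode(query)


def run_command(command, **params):
    """Send one DataImportHandler command and return the decoded JSON response."""
    with urllib.request.urlopen(dataimport_url(command, **params)) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_until_idle(poll_seconds=2.0):
    """Poll command=status until the handler no longer reports 'busy'."""
    while True:
        status = run_command("status")
        if status.get("status") != "busy":
            return status
        time.sleep(poll_seconds)
```

Calling `run_command("full-import")` and then `wait_until_idle()` starts the import and returns the last status response once the handler is no longer busy.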

The fields imported are _id, Name and EmpNumber (MongoDB) as id, Name and EmployeeNumber (Solr).

You can see the result at http://localhost:8080/solr/query?q=*
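
The same check can be scripted. The Python 3 sketch below assumes the `/query` handler from the URL above and a JSON response (`wt=json`); `query_url` and `fetch_docs` are hypothetical helper names, and `*:*` is used as the match-all query.

```python
import json
import urllib.parse
import urllib.request

# Assumption: same host, port and default core as in the answer.
SOLR_BASE = "http://localhost:8080/solr"


def query_url(q="*:*", rows=10, base=SOLR_BASE):
    """Build a URL for the /query handler that asks for a JSON response."""
    params = {"q": q, "rows": rows, "wt": "json"}
    return base + "/query?" + urllib.parse.urlencode(params)


def fetch_docs(q="*:*", rows=10):
    """Run the query and return the list of matching Solr documents."""
    with urllib.request.urlopen(query_url(q, rows)) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body["response"]["docs"]
```

With the config from Step 5, `fetch_docs()` should return only the document for Rahul, since the entity query restricts the import to `{Name:'Rahul'}`.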