
All times are UTC - 5 hours [ DST ]



Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 12 posts ] 
 Post subject: Hibernate Search 5 - MassIndexer performance
PostPosted: Fri Sep 26, 2014 10:10 am 
Beginner

Joined: Wed Aug 06, 2014 10:53 am
Posts: 30
Hi,
I'm experiencing a performance problem during the batch indexing process.
I'm trying to index about 50 million documents. When the process starts it handles about 1000 docs/s; after 2 hours it is down to about 600 docs/s, and it keeps slowing down.
Could this be a memory leak in the MassIndexer? I noticed that the Java process reaches the maximum heap size!
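For scale, here is a rough estimate of the total indexing time at the two observed rates. It assumes a constant rate, which is optimistic given the slowdown described above; the class name `IndexingEta` is just illustrative.

```java
// Rough total-time estimate, assuming a constant indexing rate.
class IndexingEta {
    // Hours needed to index `documents` at `docsPerSecond`.
    static double hoursFor(long documents, double docsPerSecond) {
        return documents / docsPerSecond / 3600.0;
    }

    public static void main(String[] args) {
        long total = 50_000_000L;
        System.out.printf("at 1000 docs/s: %.1f h%n", hoursFor(total, 1000));
        System.out.printf("at  600 docs/s: %.1f h%n", hoursFor(total, 600));
    }
}
```

At 1000 docs/s the full load would take roughly 14 hours; at 600 docs/s roughly 23 hours, so the slowdown alone adds about 9 hours.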

I run the MassIndexer with the following parameters:
batchSizeToLoadObjects=800
threadsToLoadObjects=32
idFetchSize=100
cacheMode=IGNORE

and I use the following Hibernate Search configuration properties:
hibernate.search.default.directory_provider=filesystem
hibernate.search.default.filesystem_access_type=mmap
hibernate.search.default.exclusive_index_use=true
hibernate.search.worker.execution=sync
hibernate.search.worker.thread_pool.size=32
hibernate.search.enable_dirty_check=false
hibernate.search.default.indexwriter.max_merge_docs=100
hibernate.search.default.indexwriter.ram_buffer_size=4096
hibernate.search.default.indexwriter.merge_factor=1000

connection pool: c3p0
hibernate search: 5.0.0.Alpha6
hibernate: 4.3.6.Final
spring: 4.0.6.RELEASE

OS: Win 2008R2 64bit
System RAM: 64GB
SQLServer RAM: 16GB
JRE: 1.7 server 64 bit (-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:MaxPermSize=4096m)
WebServer: Tomcat (MaxPoolSize: 4 GB)


Last edited by darioc on Fri Sep 26, 2014 10:57 am, edited 1 time in total.

 Post subject: Re: Hibernate Search 5 - MassIndexer performance
PostPosted: Fri Sep 26, 2014 10:19 am 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Hi,
it is normal for indexing to slow down a bit, because write performance depends on the index size: the index starts empty, so it is fastest at the beginning.

You should monitor your JVM and system metrics to identify which part is slowing you down the most. I can't tell much from these parameters alone, but here are some suggestions:

batchSizeToLoadObjects is rather high -> this might be correct for some domain models (it depends on the shape of your data graphs, for example whether you have many additional elements which get reloaded by @IndexedEmbedded), but I'd initially experiment with lower values like 30 or 100.

idFetchSize -> is rather low, try 1000
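A rough count of database round trips helps put these two settings in perspective. This sketch assumes ids are streamed idFetchSize rows per JDBC fetch and entities are loaded batchSizeToLoadObjects at a time with one query per batch; it ignores the extra queries needed for associations, and the class name is hypothetical.

```java
// Round-trip counts for the MassIndexer settings in this thread.
// Assumption: one JDBC fetch per idFetchSize ids and one load query per
// batchSizeToLoadObjects entities; association loading is ignored.
class MassIndexerRoundTrips {
    // Integer division rounded up.
    static long ceilDiv(long a, long b) {
        return (a + b - 1) / b;
    }

    static long idFetches(long documents, int idFetchSize) {
        return ceilDiv(documents, idFetchSize);
    }

    static long loadQueries(long documents, int batchSizeToLoadObjects) {
        return ceilDiv(documents, batchSizeToLoadObjects);
    }

    public static void main(String[] args) {
        long total = 50_000_000L;
        System.out.println("id fetches (idFetchSize=100):  " + idFetches(total, 100));
        System.out.println("id fetches (idFetchSize=1000): " + idFetches(total, 1000));
        System.out.println("load queries (batch=800): " + loadQueries(total, 800));
        System.out.println("load queries (batch=100): " + loadQueries(total, 100));
    }
}
```

With 50 million documents, raising idFetchSize from 100 to 1000 cuts id fetches from 500,000 to 50,000. Lowering batchSizeToLoadObjects from 800 to 100 raises load queries from 62,500 to 500,000, so any benefit of smaller batches has to come from something other than fewer queries, such as smaller object graphs held in memory per batch.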

cacheMode: IGNORE is the best option if you have almost no relations to load. If you have many indexed relations which need to be loaded, make sure you enable the 2nd level cache, mark the related entities as cacheable, and change this MassIndexer option accordingly.
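The effect of a cache on shared relations can be illustrated with a toy model in plain Java (not Hibernate's actual 2nd level cache): when many indexed entities point to a small set of shared related rows, a cache turns one load per reference into one load per distinct row. The class name and the numbers in `main` (1,000,000 facts, 50 distinct rows) are made up for illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model: counts "database loads" of related rows with and without a cache.
class SharedRelationLoads {
    // Without a cache, every reference triggers one load of its related row.
    static int loadsWithoutCache(int[] relatedIds) {
        return relatedIds.length;
    }

    // With a cache, each distinct related row is loaded only once.
    static int loadsWithCache(int[] relatedIds) {
        Set<Integer> cached = new HashSet<>();
        int loads = 0;
        for (int id : relatedIds) {
            if (cached.add(id)) { // true only the first time this id is seen
                loads++;
            }
        }
        return loads;
    }

    public static void main(String[] args) {
        // Hypothetical data: 1,000,000 facts referencing only 50 distinct rows.
        int[] related = new int[1_000_000];
        for (int i = 0; i < related.length; i++) {
            related[i] = i % 50;
        }
        System.out.println("without cache: " + loadsWithoutCache(related));
        System.out.println("with cache:    " + loadsWithCache(related));
    }
}
```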

ram_buffer_size=4096 -> the value is in MB, so that means 4 GB. Your JVM heap needs to be significantly larger than that, or indexing will slow down massively (or even run out of memory).
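A quick startup check for this mismatch could look like the sketch below. It reads the ram_buffer_size property (in MB) and compares it with `Runtime.maxMemory()`, which roughly reflects the -Xmx setting. The class name and the 2x headroom factor are arbitrary illustrative choices, not a Hibernate Search recommendation.

```java
import java.util.Properties;

// Compares the Lucene RAM buffer (configured in MB) with the JVM max heap.
// The headroom factor is an illustrative threshold, not a documented rule.
class RamBufferCheck {
    static final String KEY = "hibernate.search.default.indexwriter.ram_buffer_size";

    // Reads ram_buffer_size (in MB) and converts it to bytes.
    static long ramBufferBytes(Properties props) {
        String value = props.getProperty(KEY);
        if (value == null) {
            throw new IllegalArgumentException(KEY + " is not set");
        }
        return Long.parseLong(value.trim()) * 1024L * 1024L;
    }

    // True if the max heap is at least `factor` times the RAM buffer.
    static boolean hasHeadroom(long ramBufferBytes, long maxHeapBytes, double factor) {
        return maxHeapBytes >= ramBufferBytes * factor;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(KEY, "4096");
        long buffer = ramBufferBytes(props);
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.printf("ram buffer %d MB, max heap %d MB, headroom ok: %b%n",
                buffer / (1024 * 1024), maxHeap / (1024 * 1024),
                hasHeadroom(buffer, maxHeap, 2.0));
    }
}
```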

_________________
Sanne
http://in.relation.to/


 Post subject: Re: Hibernate Search 5 - MassIndexer performance
PostPosted: Fri Sep 26, 2014 10:21 am 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
P.S.
Some slowdown is normal, but you should be able to get a much higher speed anyway once you figure out how your system is behaving.
Enabling sharding might also help, but I'd look at the 2nd level cache first.
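For reference, the idea behind hash-based sharding: each document is routed to one of N smaller indexes by hashing its id, so each index writer works on a fraction of the data. A minimal sketch, assuming a hash of the id string modulo the shard count; this is similar in spirit to Hibernate Search's default id-hash strategy, but the exact hash used there may differ, and the class name and shard count of 4 are hypothetical.

```java
import java.util.Arrays;
import java.util.UUID;

// Sketch of id-hash sharding: each document goes to one of N shards
// based on a hash of its id.
class IdHashSharding {
    static int shardFor(String id, int numberOfShards) {
        // floorMod keeps the result in [0, numberOfShards) even for negative hashes
        return Math.floorMod(id.hashCode(), numberOfShards);
    }

    public static void main(String[] args) {
        int shards = 4;
        int[] perShard = new int[shards];
        for (int i = 0; i < 100_000; i++) {
            perShard[shardFor(UUID.randomUUID().toString(), shards)]++;
        }
        // Random UUIDs should spread roughly evenly across the shards.
        System.out.println(Arrays.toString(perShard));
    }
}
```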

_________________
Sanne
http://in.relation.to/


 Post subject: Re: Hibernate Search 5 - MassIndexer performance
PostPosted: Fri Sep 26, 2014 11:41 am 
Beginner

Joined: Wed Aug 06, 2014 10:53 am
Posts: 30
Hi Sanne,
Now I've run the process with these changes:

batchSizeToLoadObjects=100
idFetchSize=1000
cacheMode=NORMAL

and I've enabled the 2nd level cache on IndexedEmbedded classes!

I kept ram_buffer_size at 4 GB and raised the JVM max heap size to 8 GB.

I'm going to let the process run over the weekend.
On Monday I'll restart the process to check the metrics you asked for (with JConsole).

About sharding: I already use it! Do you think I need to shard the index further? Currently the index for 2 million documents is about 700 MB.
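A quick linear extrapolation from these figures (700 MB for 2 million documents) gives an idea of what 50 million documents might need. This assumes the size grows proportionally with the number of documents and that the 700 MB covers all shards; the shard counts 4, 8 and 16 are arbitrary examples, and the class name is hypothetical.

```java
// Linear extrapolation of index size; only a rough approximation for Lucene,
// where merges and term dictionaries do not scale exactly linearly.
class IndexSizeEstimate {
    static double estimateMb(double measuredMb, long measuredDocs, long targetDocs) {
        return measuredMb * targetDocs / measuredDocs;
    }

    public static void main(String[] args) {
        double total = estimateMb(700, 2_000_000L, 50_000_000L);
        System.out.printf("estimated size for 50M docs: %.0f MB%n", total);
        // Hypothetical shard counts, only to show the size per shard.
        for (int shards : new int[] {4, 8, 16}) {
            System.out.printf("%2d shards: ~%.0f MB each%n", shards, total / shards);
        }
    }
}
```

Under that assumption the full index would be about 17,500 MB (roughly 17 GB) in total.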


 Post subject: Re: Hibernate Search 5 - MassIndexer performance
PostPosted: Fri Sep 26, 2014 11:50 am 
Beginner

Joined: Wed Aug 06, 2014 10:53 am
Posts: 30
P.S. About the IndexedEmbedded fields: they are added to the documents by a custom ClassBridge, without using the @IndexedEmbedded annotation.


 Post subject: Re: Hibernate Search 5 - MassIndexer performance
PostPosted: Mon Sep 29, 2014 12:15 pm 
Beginner

Joined: Wed Aug 06, 2014 10:53 am
Posts: 30
MassIndexer performance has improved: now I index twice as many documents in the same time!
Lowering batchSizeToLoadObjects and increasing idFetchSize gave me better performance.

Now I'm moving my attention to search.
My faceted search is relatively slow: on a small dataset of 1.5 million documents, faceting on my "indexed embedded" field takes 10 seconds.

Is there some parameter I can set to improve it?
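To see why faceting cost grows with the dataset, here is a simplified model of discrete facet counting in plain Java. It is only an illustration, assuming each matching document contributes one value to a counter; Hibernate Search's real implementation works on Lucene data structures and is more involved. The class name is hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model of discrete faceting: for every matching document, read its
// facet value and increment a counter. Work grows linearly with the
// number of matching documents.
class FacetCountSketch {
    // fieldValues[doc] is the facet value of document `doc`, or null if the
    // related entity is missing.
    static Map<String, Integer> countFacets(String[] fieldValues, int[] matchingDocs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int doc : matchingDocs) {
            String value = fieldValues[doc];
            if (value != null) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] values = {"red", "blue", null, "red", "green"};
        int[] matching = {0, 1, 2, 3};
        System.out.println(countFacets(values, matching)); // prints {blue=1, red=2}
    }
}
```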


 Post subject: Re: Hibernate Search 5 - MassIndexer performance
PostPosted: Wed Oct 01, 2014 10:57 am 
Beginner

Joined: Wed Aug 06, 2014 10:53 am
Posts: 30
I have not found a way to improve performance: on my dataset (about 50 million documents) the system spends four minutes computing the facets!
So I'm going to cache the results of the faceting methods with Ehcache.


 Post subject: Re: Hibernate Search 5 - MassIndexer performance
PostPosted: Mon Oct 06, 2014 8:06 am 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Hi,
Sorry for the delay, I've been away. Glad you improved the indexing performance; twice the speed is good, but remember that if you need it you should be able to get much more, although it takes some trial and error with the fetch and caching options for all relations.

Regarding faceting: I don't think we ever tested with 50 million documents, so you might need to explore some new tricks. Caching sounds like a good idea: I would be interested to hear your results, and maybe we can integrate this as a feature?
But please profile it first, so we can see if there is a better solution - maybe we can significantly improve the faceting code without the need of a cache.

Feel free to open a JIRA issue at https://hibernate.atlassian.net/browse/HSEARCH so you can attach profiling output, screenshots and other useful files there, and discuss what can be done.
Ideally, if you could attach a simple test case that generates test data of the right size and reproduces the problem, I'd be happy to profile it myself and improve Search.

_________________
Sanne
http://in.relation.to/


 Post subject: Re: Hibernate Search 5 - MassIndexer performance
PostPosted: Mon Oct 06, 2014 10:47 am 
Beginner

Joined: Wed Aug 06, 2014 10:53 am
Posts: 30
The cache system is built around the methods I use to perform the faceted search!

I have a method named "getFacetedFor(parameters...)". When the parameters and the Lucene index data are unchanged, the system returns the cached results, so further calls are resolved without any processing on the server.
That's all. I haven't found a better quick solution.
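The approach described above can be sketched as a small cache keyed by the query parameters plus an index version. This is a minimal in-memory illustration, not the Ehcache setup used here: `FacetResultCache`, its `getFacetedFor` signature, the `indexChanged()` hook and the version counter are all assumptions, and the application would have to call `indexChanged()` whenever the index is modified.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

// Caches facet results per (index version, query parameters).
// Hypothetical helper; not part of Hibernate Search or Ehcache.
class FacetResultCache<R> {
    private final Map<List<Object>, R> cache = new ConcurrentHashMap<>();
    private final AtomicLong indexVersion = new AtomicLong();
    private final Function<List<Object>, R> compute;

    // `compute` runs the actual (slow) faceted query for a parameter list.
    FacetResultCache(Function<List<Object>, R> compute) {
        this.compute = compute;
    }

    // Call after every index modification. Bumping the version makes later
    // lookups use new keys; clearing frees memory held by stale entries.
    void indexChanged() {
        indexVersion.incrementAndGet();
        cache.clear();
    }

    // Returns the cached result if parameters and index version are unchanged.
    R getFacetedFor(Object... parameters) {
        List<Object> params = Arrays.asList(parameters);
        List<Object> key = Arrays.<Object>asList(indexVersion.get(), params);
        return cache.computeIfAbsent(key, k -> compute.apply(params));
    }
}
```

Keeping the version in the key means a slow computation that started before an index change is stored under the old version and never served afterwards.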

My main beans structure is similar to:
Code:
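import java.util.Date;
import java.util.UUID;
import javax.persistence.Cacheable;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.ManyToOne;
import javax.persistence.MappedSuperclass;
import org.hibernate.search.annotations.ClassBridge;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Indexed;
import org.hibernate.search.annotations.IndexedEmbedded;

@Entity
@Indexed
@ClassBridge(name = "customClassBridge", impl = MyCustomClassBridge.class)
class MyFactBean {
  @Id
  UUID id;
  Date from;
  Date to;

  // @ManyToOne is an assumed mapping; any of these relations may be null
  @ManyToOne @IndexedEmbedded
  BeanType1 sourceItemType1;
  @ManyToOne @IndexedEmbedded
  BeanType2 sourceItemType2;
  @ManyToOne @IndexedEmbedded
  BeanType3 sourceItemType3;

  @ManyToOne @IndexedEmbedded
  BeanType1 destinationItemType1;
  @ManyToOne @IndexedEmbedded
  BeanType2 destinationItemType2;
  @ManyToOne @IndexedEmbedded
  BeanType3 destinationItemType3;

  // ... other fields are not indexed
}

@MappedSuperclass
abstract class BeanTypeBase {
  @Id
  UUID id;
  @Field // variable length, max length = 16
  String value;
}

@Entity @Cacheable
class BeanType1 extends BeanTypeBase {}
@Entity @Cacheable
class BeanType2 extends BeanTypeBase {}
@Entity @Cacheable
class BeanType3 extends BeanTypeBase {}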


Any of the linked entities may be null.
I would like to improve performance for the "indexed embedded" fields.

P.S. I tried to create an improvement issue on JIRA (I've just registered), but it seems I cannot create issues.


 Post subject: Re: Hibernate Search 5 - MassIndexer performance
PostPosted: Mon Oct 06, 2014 1:28 pm 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
About JIRA: you should be able to create issues; let me check whether something is wrong with the permissions. Which account did you create?

_________________
Sanne
http://in.relation.to/


 Post subject: Re: Hibernate Search 5 - MassIndexer performance
PostPosted: Tue Oct 07, 2014 6:54 am 
Beginner

Joined: Wed Aug 06, 2014 10:53 am
Posts: 30
Now it works!
I just opened an improvement issue: HSEARCH-1686


 Post subject: Re: Hibernate Search 5 - MassIndexer performance
PostPosted: Tue Oct 07, 2014 7:00 am 
Hibernate Team
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
OK, thanks. Strange; I'll see if it happens again.

_________________
Sanne
http://in.relation.to/


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.