
All times are UTC - 5 hours [ DST ]



 Post subject: Hibernate Search - Infinispan - CheckIndex - How to recover?
PostPosted: Fri Jan 13, 2017 7:35 am 
Beginner

Joined: Sun Aug 16, 2015 3:21 am
Posts: 27
Hi,

If we use Infinispan and store the index in a DB, can we run CheckIndex when segments get corrupted?

Secondly, if we are able to remove the corrupted segments, can we see which keys we need to re-index?

And which versions, if any, have this feature?

The problem arises when you have millions of documents and some segment gets corrupted because of an EOFException or something similar. If you then restart your node (master or slave), it will load the index from the underlying cache store and fail even to start.

With an RDBMS, if some records are faulty we can at least get up and running and then fix them. How can we achieve something similar here, so that we go live first and then fix the segments somehow, e.g. with Lucene's CheckIndex?


I might be missing something very fundamental; please explain if this is the case.

Thanks in advance!


 Post subject: Re: Hibernate Search - Infinispan - CheckIndex - How to recover?
PostPosted: Sat Jan 14, 2017 7:01 pm 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Hello,

unlike an RDBMS, the index needs to be strictly in sync with your database to be useful. Data loss in the index is acceptable, as it can be rebuilt from the RDBMS: that puts it in a very different light from the database, whose job is to work hard to avoid data loss.

The best strategy to maintain this synchronization is to rebuild the index from the database on errors: recovering a corrupt index would still imply you had some downtime in which updates might have been missed.

So I recommend wiping your index and rebuilding it using the MassIndexer. Something I can't stress enough is to make sure you have tuned the MassIndexer so that recovery can be performed quickly. Dedicating a couple of engineering days to tuning the MassIndexer properly is always very useful, both for further development (you need to rebuild the index when updating your indexing options) and as a proper plan for disaster recovery.

HTH

_________________
Sanne
http://in.relation.to/


 Post subject: Re: Hibernate Search - Infinispan - CheckIndex - How to recover?
PostPosted: Mon Jan 16, 2017 6:09 am 
Beginner

Joined: Sun Aug 16, 2015 3:21 am
Posts: 27
Hmmmm

I am sure the MassIndexer does not always have to rebuild everything from scratch. For example, if we have a backup of the indexes taken when we last restarted the server (let's say a month ago), then we can restore the backup and run the MassIndexer only for the changes since that restart date, on top of the existing index...

We can do that easily by restricting the data load queries to consider only changes since that date.

This means we can be up and running more quickly, because we don't need to wipe the whole index, which can hold a billion documents. Reloading a month's changes would involve 100K-300K documents at most.

I look forward to your reply, as it will help us design this properly; a quick response would be appreciated if possible. Many thanks in advance.


 Post subject: Re: Hibernate Search - Infinispan - CheckIndex - How to recover?
PostPosted: Mon Jan 16, 2017 11:02 am 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Quote:
For example, if we have a backup of the indexes taken when we last restarted the server (let's say a month ago), then we can restore the backup and run the MassIndexer only for the changes since that restart date, on top of the existing index...


I'm not sure which MassIndexer can do that. Not the one I implemented which is now in Hibernate Search, unless you're restricting what Hibernate can see, or mapping a filtered view for these purposes?

That might be an interesting experiment, but I'm not sure how to help you to guarantee that the resulting index will be in sync, unless you can prevent other changes from happening concurrently. We've had some discussions on the mailing list about using changeset ids, timestamps or transaction ids, but no single solution is safe for general-purpose usage. I agree you might be able to build something which works fine for your specific requirements.

Back to your original question about using CheckIndex: sure, you can run it, but you'll have to restore your backup if you find non-recoverable issues, as it's not possible to identify which keys need to be reindexed from either the segment id or the filename. I suspect one possible solution would be to check, for each key in the database, whether there's a matching document in any of the remaining segments, and skip the key if there is, as there should never be a duplicate. I'm not sure you can implement this efficiently enough to be faster than reindexing everything, though, as you'll still need to iterate over all ids at least. This pre-filtering approach could be a good idea if the indexer has to produce complex Lucene Documents and/or load complex object graphs.
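
The pre-filtering idea can be sketched in plain Java as follows. This is only a rough illustration under stated assumptions: `MissingKeyFinder`, `findKeysToReindex` and the `existsInIndex` predicate are hypothetical names, not part of Hibernate Search or Lucene, and the in-memory set stands in for whatever lookup you would run against the segments that survived.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical helper for illustration; not a Hibernate Search or Lucene class.
final class MissingKeyFinder {

    // Walks every database id and keeps those with no matching document
    // in the surviving index, i.e. the keys that still need reindexing.
    static <K> List<K> findKeysToReindex(Iterable<K> databaseIds, Predicate<K> existsInIndex) {
        List<K> missing = new ArrayList<>();
        for (K id : databaseIds) {
            if (!existsInIndex.test(id)) {
                missing.add(id);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Assumption: stand-in for the ids still present in the healthy segments.
        Set<Long> indexedIds = new HashSet<>(Arrays.asList(1L, 2L, 4L));
        List<Long> databaseIds = Arrays.asList(1L, 2L, 3L, 4L, 5L);
        // Prints [3, 5]
        System.out.println(findKeysToReindex(databaseIds, indexedIds::contains));
    }
}
```

The cost is still one lookup per database id, so this only pays off when building each document is much more expensive than checking for its presence.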

HTH

_________________
Sanne
http://in.relation.to/


 Post subject: Re: Hibernate Search - Infinispan - CheckIndex - How to recover?
PostPosted: Mon Jan 16, 2017 1:44 pm 
Beginner

Joined: Sun Aug 16, 2015 3:21 am
Posts: 27
Quote:
I'm not sure which MassIndexer can do that. Not the one I implemented which is now in Hibernate Search, unless you're restricting what Hibernate can see, or mapping a filtered view for these purposes?

Yes, we are using DB views, and those can restrict Hibernate to seeing only the data within the given date range.

Quote:
That might be an interesting experiment, but I'm not sure how to help you to guarantee that the resulting index will be in sync, unless you can prevent other changes from happening concurrently

Hmmmm... In this case it doesn't matter, as the translog table (which we created to keep the index log, i.e. the JMS messages) only contains the keys of changed documents. Regardless of whether a new change comes in or not, the indexer will load up-to-date data from the transactional database. If someone changes a document while the MassIndexer is running, it will be indexed twice, but with the same data. I hope the MassIndexer can run in parallel with normal indexing and search operations...
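
To make the idea concrete, here is a minimal sketch of how the keys to reload could be derived from such a translog. All names here (`TranslogEntry`, `DeltaReindexPlanner`, `keysChangedSince`) are hypothetical and only model our own table; nothing in it is Hibernate Search API.

```java
import java.time.Instant;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of one translog row: the key of a changed document
// and the time the change message was recorded.
final class TranslogEntry {
    final long documentKey;
    final Instant changedAt;

    TranslogEntry(long documentKey, Instant changedAt) {
        this.documentKey = documentKey;
        this.changedAt = changedAt;
    }
}

final class DeltaReindexPlanner {

    // Collects the distinct keys changed at or after the backup time,
    // in first-seen order. Repeated changes to one key collapse to one entry.
    static Set<Long> keysChangedSince(List<TranslogEntry> translog, Instant backupTime) {
        Set<Long> keys = new LinkedHashSet<>();
        for (TranslogEntry entry : translog) {
            if (!entry.changedAt.isBefore(backupTime)) {
                keys.add(entry.documentKey);
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        List<TranslogEntry> translog = Arrays.asList(
                new TranslogEntry(7L, Instant.parse("2016-12-01T00:00:00Z")),
                new TranslogEntry(42L, Instant.parse("2017-01-05T00:00:00Z")),
                new TranslogEntry(7L, Instant.parse("2017-01-06T00:00:00Z")),
                new TranslogEntry(42L, Instant.parse("2017-01-07T00:00:00Z")));
        // Prints [42, 7]
        System.out.println(keysChangedSince(translog, Instant.parse("2017-01-01T00:00:00Z")));
    }
}
```

Because only keys are kept and the current row is always loaded from the transactional database, indexing a key twice produces the same document both times.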


 Post subject: Re: Hibernate Search - Infinispan - CheckIndex - How to recover?
PostPosted: Mon Jan 16, 2017 2:30 pm 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Yes, that sounds good. And yes, the MassIndexer can run in parallel with normal operations. Just make sure to set the option so that the index is not cleared on job start (purgeAllOnStart(false)).

Interesting setup. Thanks for the feedback!

_________________
Sanne
http://in.relation.to/


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.