All times are UTC - 5 hours [ DST ]



 Post subject: Avoiding mix and match fuzzy searches on collections
Posted: Mon Jan 16, 2017 5:51 pm
Hi,

In our project, a Client entity has a one-to-many relationship to Name entities and a one-to-many relationship to UsAddress entities. This is what the Client entity looks like, with its names and addresses collections and their annotations:

@Entity
@Indexed
public class Client extends BaseEntity {
    ...

    @IndexedEmbedded(depth = 1)
    @Fetch(FetchMode.SUBSELECT)
    @OneToMany(mappedBy = "client", fetch = FetchType.LAZY, cascade = {CascadeType.MERGE, CascadeType.REMOVE, CascadeType.REFRESH})
    public Set<Name> getNames() {
        return names;
    }
    ...

    @IndexedEmbedded(depth = 1)
    @Fetch(FetchMode.SUBSELECT)
    @OneToMany(mappedBy = "client", fetch = FetchType.LAZY, cascade = {CascadeType.MERGE, CascadeType.REMOVE, CascadeType.REFRESH})
    public Set<UsAddress> getUsAddresses() {
        return usAddresses;
    }

    ...
}

This is what the Name entity looks like, with its analyzer applied:

@AnalyzerDef(name = "nameanalyzer",
    charFilters = {
        @CharFilterDef(factory = MappingCharFilterFactory.class, params = {
            @Parameter(name = "mapping", value = "mapping-chars.properties")
        })
    },
    tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
    filters = {
        @TokenFilterDef(factory = ASCIIFoldingFilterFactory.class),
        @TokenFilterDef(factory = LowerCaseFilterFactory.class),
        @TokenFilterDef(factory = StopFilterFactory.class, params = {
            @Parameter(name = "words", value = "stoplist.properties"),
            @Parameter(name = "ignoreCase", value = "true")
        }),
        @TokenFilterDef(factory = SynonymFilterFactory.class, params = {
            @Parameter(name = "synonyms", value = "nicknames.txt"),
            @Parameter(name = "ignoreCase", value = "true"),
            @Parameter(name = "expand", value = "false")
        })
    })
@Entity
public class Name extends BaseEntity {
    ...
    @Basic
    @Field(index = Index.YES, analyze = Analyze.YES, store = Store.NO)
    @Analyzer(definition = "nameanalyzer")
    @Column(name = "BUSINESS_NAME", nullable = true, insertable = true, updatable = true, length = 370, precision = 0)
    public String getBusinessName() {
        return businessName;
    }
    ...
}

This is what the UsAddress entity looks like, with its analyzer applied:

@AnalyzerDef(name = "usAddressAnalyzer",
    charFilters = {
        @CharFilterDef(factory = MappingCharFilterFactory.class, params = {
            @Parameter(name = "mapping", value = "mapping-chars.properties")
        })
    },
    tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
    filters = {
        @TokenFilterDef(factory = LowerCaseFilterFactory.class),
        @TokenFilterDef(factory = StopFilterFactory.class, params = {
            @Parameter(name = "words", value = "stoplist.properties"),
            @Parameter(name = "ignoreCase", value = "true")
        }),
        @TokenFilterDef(factory = SynonymFilterFactory.class, params = {
            @Parameter(name = "synonyms", value = "street_synonyms.txt"),
            @Parameter(name = "ignoreCase", value = "true"),
            @Parameter(name = "expand", value = "false")
        })
    })
@Entity
public class UsAddress extends BaseEntity {
    ...
    @Basic
    @Field(index = Index.YES, analyze = Analyze.YES, store = Store.NO)
    @Analyzer(definition = "usAddressAnalyzer")
    @Column(name = "DELIVERY_LINE", nullable = false, insertable = true, updatable = true, length = 256, precision = 0)
    public String getDeliveryLine() {
        return deliveryLine;
    }
    ...
}

We index only Client, because every fuzzy search on the businessName field and/or the deliveryLine field should return a list of clients. As far as I understand, the index document for a client therefore holds all businessName values from its names collection under a single indexed businessName field, and likewise all deliveryLine values under a single deliveryLine field.
Suppose a client has the following two names:

<client>
    <names>
        <name>
            <id>efd2173d-d7d4-4449-b100-92ae373fedb1</id>
            <clientId>ca46c8e1-69a9-4bf9-ab7d-15938f7a459d</clientId>
            <businessName>Tom Raulston Co.</businessName>
        </name>
        <name>
            <id>970cb247-195b-406d-937a-dc8399a8e0e9</id>
            <clientId>ca46c8e1-69a9-4bf9-ab7d-15938f7a459d</clientId>
            <businessName>Mike Jason Incorp</businessName>
        </name>
    </names>
    ...

</client>

If I run a fuzzy search for 'Tom AND Raulston' or for 'Mike AND Jason', I find this client, which is what I expect.
The problem is that a search for 'Tom AND Jason' also finds this client, even though none of its business names contains both terms: there is no <businessName>Tom Jason</businessName>.
So my question is: how can I make sure a fuzzy search such as 'Tom AND Jason' does not return this client at all, given that the two terms never occur together in the same business name?
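
To make the behaviour concrete, here is a toy sketch in plain Java. It is not Hibernate Search or Lucene code. It only assumes what is described above: that the terms of every business name of a client end up under one field of one index document. The class and method names (FlattenedIndexDemo, tokenize, flatten, matchesAll) are made up for this illustration, and the tokenizer is a rough stand-in for the real analyzer (no stop words, synonyms, or fuzziness).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model only: mimics how the values of an embedded collection could be
// merged into the terms of a single field of a single index document.
class FlattenedIndexDemo {

    // Lower-case and split on non-alphanumeric characters (stand-in for the analyzer).
    static List<String> tokenize(String value) {
        List<String> tokens = new ArrayList<>();
        for (String t : value.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // All business names of one client contribute terms to the same set,
    // so the information "which term came from which name" is lost.
    static Set<String> flatten(List<String> businessNames) {
        Set<String> terms = new HashSet<>();
        for (String name : businessNames) {
            terms.addAll(tokenize(name));
        }
        return terms;
    }

    // Boolean AND over exact terms: every query term must be present in the document.
    static boolean matchesAll(Set<String> documentTerms, String... queryTerms) {
        for (String q : queryTerms) {
            if (!documentTerms.contains(q.toLowerCase(Locale.ROOT))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Set<String> clientDoc = flatten(Arrays.asList("Tom Raulston Co.", "Mike Jason Incorp"));
        System.out.println(matchesAll(clientDoc, "Tom", "Raulston")); // true
        System.out.println(matchesAll(clientDoc, "Tom", "Jason"));    // true, the unwanted hit
    }
}
```

Under that assumption, 'Tom AND Jason' matches because "tom" and "jason" are both terms of the same client document, regardless of which name each one came from.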

I do not want to use a phrase query with slop, because that loses the fuzziness, and the consumers of our system require fuzzy matching.
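
The desired semantics can be pinned down with another plain-Java sketch: every query term must fuzzily match some token of the same business name, and a client is a hit only if at least one of its names satisfies all terms. This is a specification of the expected results, not a Hibernate Search or Lucene solution. The names PerValueFuzzyDemo, valueMatches and clientMatches are hypothetical, and fuzziness is approximated here by a maximum Levenshtein edit distance per term.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy model only: evaluates the AND of fuzzy terms per business name
// instead of against all names of a client at once.
class PerValueFuzzyDemo {

    // Lower-case and split on non-alphanumeric characters (stand-in for the analyzer).
    static List<String> tokenize(String value) {
        List<String> tokens = new ArrayList<>();
        for (String t : value.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Levenshtein distance, computed row by row with dynamic programming.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            int[] curr = new int[b.length() + 1];
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            prev = curr;
        }
        return prev[b.length()];
    }

    // True if every query term is within maxEdits of some token of this ONE value.
    static boolean valueMatches(String value, int maxEdits, String... queryTerms) {
        List<String> tokens = tokenize(value);
        for (String q : queryTerms) {
            String needle = q.toLowerCase(Locale.ROOT);
            boolean found = false;
            for (String t : tokens) {
                if (editDistance(needle, t) <= maxEdits) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                return false;
            }
        }
        return true;
    }

    // A client is a hit only if a single business name satisfies all query terms.
    static boolean clientMatches(List<String> businessNames, int maxEdits, String... queryTerms) {
        for (String name : businessNames) {
            if (valueMatches(name, maxEdits, queryTerms)) {
                return true;
            }
        }
        return false;
    }
}
```

With the two names from the example and a maximum of one edit per term, clientMatches accepts 'Tom AND Raulston' and a misspelt 'Tomm AND Raulstan', but rejects 'Tom AND Jason'. How best to obtain the same result from the index itself is the open question here.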

Thanks

