
StringSearch Class Reference

StringSearch is a SearchIterator that provides language-sensitive text searching based on the comparison rules defined in a RuleBasedCollator object.

#include <stsearch.h>

Inherits SearchIterator.

Public Member Functions

 StringSearch (const UnicodeString &pattern, const UnicodeString &text, const Locale &locale, BreakIterator *breakiter, UErrorCode &status)
 Creates a StringSearch instance using the language rule set of the given locale.
 StringSearch (const UnicodeString &pattern, const UnicodeString &text, RuleBasedCollator *coll, BreakIterator *breakiter, UErrorCode &status)
 Creates a StringSearch instance using the language rule set of the given collator.
 StringSearch (const UnicodeString &pattern, CharacterIterator &text, const Locale &locale, BreakIterator *breakiter, UErrorCode &status)
 Creates a StringSearch instance using the language rule set of the given locale.
 StringSearch (const UnicodeString &pattern, CharacterIterator &text, RuleBasedCollator *coll, BreakIterator *breakiter, UErrorCode &status)
 Creates a StringSearch instance using the language rule set of the given collator.
 StringSearch (const StringSearch &that)
 Copy constructor that creates a StringSearch instance with the same behavior, and iterating over the same text.
virtual ~StringSearch (void)
 Destructor.
StringSearch & operator= (const StringSearch &that)
 Assignment operator.
virtual UBool operator== (const SearchIterator &that) const
 Equality operator.
virtual void setOffset (int32_t position, UErrorCode &status)
 Sets the index to point to the given position, and clears any state that's affected.
virtual int32_t getOffset (void) const
 Return the current index in the text being searched.
virtual void setText (const UnicodeString &text, UErrorCode &status)
 Set the target text to be searched.
virtual void setText (CharacterIterator &text, UErrorCode &status)
 Set the target text to be searched.
RuleBasedCollator * getCollator () const
 Gets the collator used for the language rules.
void setCollator (RuleBasedCollator *coll, UErrorCode &status)
 Sets the collator used for the language rules.
void setPattern (const UnicodeString &pattern, UErrorCode &status)
 Sets the pattern used for matching.
const UnicodeString & getPattern () const
 Gets the search pattern.
virtual void reset ()
 Reset the iteration.
virtual SearchIterator * safeClone (void) const
 Returns a copy of StringSearch with the same behavior, and iterating over the same text, as this one.

Protected Member Functions

virtual int32_t handleNext (int32_t position, UErrorCode &status)
 Search forward for matching text, starting at a given location.
virtual int32_t handlePrev (int32_t position, UErrorCode &status)
 Search backward for matching text, starting at a given location.

Detailed Description

StringSearch is a SearchIterator that provides language-sensitive text searching based on the comparison rules defined in a RuleBasedCollator object.

StringSearch ensures that language-specific peculiarities are handled; for example, with the German collator, the characters ß and SS match if case is chosen to be ignored. See the "ICU Collation Design Document" for more information.

The algorithm implemented is a modified form of the Boyer-Moore search. For further information on the algorithm, see "Efficient Text Searching in Java", published in Java Report in February 1999.
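To give a feel for the shift-table idea behind Boyer-Moore searching, the sketch below implements the simpler Horspool variant over plain bytes. It is only an illustration under stated assumptions: it is not ICU's implementation, which works on collation elements rather than raw characters, and the names horspoolFind and kNoMatch are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// Sentinel for "no match"; plays the role that USEARCH_DONE plays in this API.
constexpr std::ptrdiff_t kNoMatch = -1;

// Boyer-Moore-Horspool over bytes (hypothetical helper, not part of ICU).
// Returns the index of the first match of `pattern` in `text` that starts at
// or after `from`, or kNoMatch if there is none.
std::ptrdiff_t horspoolFind(const std::string &text, const std::string &pattern,
                            std::size_t from = 0) {
    assert(!pattern.empty());  // a zero-length pattern is treated as an error
    const std::size_t m = pattern.size();
    const std::size_t n = text.size();
    if (from > n || n - from < m) return kNoMatch;

    // Bad-character table: how far the window may slide, keyed by the text
    // byte currently aligned with the last pattern position.
    std::array<std::size_t, 256> shift;
    shift.fill(m);
    for (std::size_t i = 0; i + 1 < m; ++i) {
        shift[static_cast<unsigned char>(pattern[i])] = m - 1 - i;
    }

    std::size_t pos = from;
    while (pos + m <= n) {
        // Compare the window against the pattern from right to left.
        std::size_t j = m;
        while (j > 0 && text[pos + j - 1] == pattern[j - 1]) --j;
        if (j == 0) return static_cast<std::ptrdiff_t>(pos);
        pos += shift[static_cast<unsigned char>(text[pos + m - 1])];
    }
    return kNoMatch;
}
```

The table depends only on the pattern, which is why changing the pattern or the comparison rules forces it to be rebuilt while the text position can stay where it is.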

There are 2 match options for selection:
Let S' be the sub-string of a text string S between the offsets start and end <start, end>.
A pattern string P matches a text string S at the offsets <start, end> if

 
 option 1. Some canonical equivalent of P matches some canonical equivalent 
           of S'
 option 2. P matches S' and if P starts or ends with a combining mark, 
           there exists no non-ignorable combining mark before or after S'
           in S respectively.
 
Option 2 is the default.
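The extra condition in option 2 can be sketched as a check on the characters just outside a candidate match. This is a simplified illustration, not the ICU logic: it assumes that only U+0300 to U+036F count as combining marks, it ignores the "non-ignorable" qualification (which depends on the collator), and the names isCombiningMark and passesOption2 are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Assumption: only the Combining Diacritical Marks block (U+0300..U+036F) is
// treated as combining; real code would consult Unicode character properties.
bool isCombiningMark(char32_t c) { return c >= 0x0300 && c <= 0x036F; }

// Checks the extra condition of option 2 for a candidate match S' occupying
// text[start, end), where `pattern` is P. Hypothetical helper.
bool passesOption2(const std::u32string &text, std::size_t start, std::size_t end,
                   const std::u32string &pattern) {
    assert(!pattern.empty() && start <= end && end <= text.size());
    // P starts with a combining mark: no combining mark may precede S' in S.
    if (isCombiningMark(pattern.front()) && start > 0 &&
        isCombiningMark(text[start - 1])) {
        return false;
    }
    // P ends with a combining mark: no combining mark may follow S' in S.
    if (isCombiningMark(pattern.back()) && end < text.size() &&
        isCombiningMark(text[end])) {
        return false;
    }
    return true;
}
```

For example, with the text "a" + U+0301 + U+0300 + "b" and the pattern "a" + U+0301, the candidate at <0, 2> is rejected by this check because another combining mark follows it.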

This search has APIs similar to that of other text iteration mechanisms such as the break iterators in BreakIterator. Using these APIs, it is easy to scan through text looking for all occurrences of a given pattern. This search iterator allows changing of direction by calling a reset followed by a next or previous. Though a direction change can occur without calling reset first, this operation comes with some speed penalty. Matches found in the forward direction are the same as those found in the backward direction, in reverse order.
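The iteration protocol described above (reset, then repeated next or previous until a "done" value comes back) can be sketched with a toy iterator. Everything here is a stand-in: ToySearch does exact substring matching with std::string, kDone takes the place of USEARCH_DONE, and unlike the real class this toy requires reset() before a change of direction.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

constexpr int kDone = -1;  // stands in for USEARCH_DONE

// Hypothetical stand-in for a search iterator; exact substring matching only.
class ToySearch {
public:
    ToySearch(std::string pattern, std::string text)
        : pattern_(std::move(pattern)), text_(std::move(text)) {
        assert(!pattern_.empty());
    }

    // Forget the current match so the next call starts from an end of the text.
    void reset() { fresh_ = true; current_ = kDone; }

    // Next match after the current one (or the first match after reset()).
    int next() {
        if (!fresh_ && current_ == kDone) return kDone;
        std::size_t from = fresh_ ? 0 : static_cast<std::size_t>(current_) + 1;
        fresh_ = false;
        return record(text_.find(pattern_, from));
    }

    // Previous match before the current one (or the last match after reset()).
    int previous() {
        if (!fresh_ && current_ <= 0) return record(std::string::npos);
        std::size_t upTo = fresh_ ? std::string::npos
                                  : static_cast<std::size_t>(current_) - 1;
        fresh_ = false;
        return record(text_.rfind(pattern_, upTo));
    }

private:
    int record(std::size_t hit) {
        current_ = (hit == std::string::npos) ? kDone : static_cast<int>(hit);
        return current_;
    }

    std::string pattern_;
    std::string text_;
    int current_ = kDone;
    bool fresh_ = true;
};

// Collects every match start in one direction, beginning with a reset().
std::vector<int> collect(ToySearch &search, bool forward) {
    std::vector<int> out;
    search.reset();
    for (int i = forward ? search.next() : search.previous(); i != kDone;
         i = forward ? search.next() : search.previous()) {
        out.push_back(i);
    }
    return out;
}
```

Collecting in both directions over the same text yields the same match positions, one list being the reverse of the other.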

SearchIterator provides APIs to specify the starting position within the text string to be searched, e.g. setOffset, preceding and following. Since the starting position will be set as it is specified, note that there are some danger points at which the search may return incorrect results:


Constructor & Destructor Documentation

StringSearch::StringSearch ( const UnicodeString & pattern,
const UnicodeString & text,
const Locale & locale,
BreakIterator * breakiter,
UErrorCode & status
)

Creates a StringSearch instance using the language rule set of the given locale.

A collator is created in the process; it is owned by this instance and is deleted when the instance is destroyed.

Parameters:
pattern The text for which this object will search.
text The text in which to search for the pattern.
locale A locale which defines the language-sensitive comparison rules used to determine whether text in the pattern and target matches.
breakiter A BreakIterator object used to constrain the matches that are found. Matches whose start and end indices in the target text are not boundaries as determined by the BreakIterator are ignored. If this behavior is not desired, NULL can be passed in instead.
status for errors if any. If pattern or text is NULL, or if the length of either pattern or text is 0, then a U_ILLEGAL_ARGUMENT_ERROR is returned. ICU 2.0
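The effect of the breakiter parameter can be pictured as a filter on candidate matches: a match is kept only if both its start and its end fall on a boundary. The sketch below uses a crude, hypothetical word-boundary rule (a change between alphanumeric and non-alphanumeric bytes) in place of a real BreakIterator, and plain substring search in place of collation-based matching.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a word BreakIterator: result[i] is true when
// offset i is a boundary. Offsets 0 and text.size() are always boundaries.
std::vector<bool> wordBoundaries(const std::string &text) {
    std::vector<bool> boundary(text.size() + 1, false);
    boundary[0] = true;
    boundary[text.size()] = true;
    for (std::size_t i = 1; i < text.size(); ++i) {
        bool left = std::isalnum(static_cast<unsigned char>(text[i - 1])) != 0;
        bool right = std::isalnum(static_cast<unsigned char>(text[i])) != 0;
        boundary[i] = (left != right);
    }
    return boundary;
}

// Returns the start offsets of all occurrences of `pattern` whose start and
// end offsets are both boundaries; other occurrences are ignored.
std::vector<std::size_t> findOnBoundaries(const std::string &text,
                                          const std::string &pattern) {
    assert(!pattern.empty());
    const std::vector<bool> boundary = wordBoundaries(text);
    std::vector<std::size_t> hits;
    for (std::size_t at = text.find(pattern); at != std::string::npos;
         at = text.find(pattern, at + 1)) {
        if (boundary[at] && boundary[at + pattern.size()]) hits.push_back(at);
    }
    return hits;
}
```

Searching "cat concat cat." for "cat" this way keeps the occurrences at offsets 0 and 11 and drops the one inside "concat". Passing NULL for breakiter corresponds to skipping the boundary test altogether.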

StringSearch::StringSearch ( const UnicodeString & pattern,
const UnicodeString & text,
RuleBasedCollator * coll,
BreakIterator * breakiter,
UErrorCode & status
)

Creates a StringSearch instance using the language rule set of the given collator.

Note: the user retains ownership of this collator; it is not destroyed when this instance is destroyed.

Parameters:
pattern The text for which this object will search.
text The text in which to search for the pattern.
coll A RuleBasedCollator object which defines the language-sensitive comparison rules used to determine whether text in the pattern and target matches. User is responsible for the clearing of this object.
breakiter A BreakIterator object used to constrain the matches that are found. Matches whose start and end indices in the target text are not boundaries as determined by the BreakIterator are ignored. If this behavior is not desired, NULL can be passed in instead.
status for errors if any. If the length of either pattern or text is 0, then a U_ILLEGAL_ARGUMENT_ERROR is returned. ICU 2.0
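The two ownership rules (a collator created from a locale is owned by the instance, a collator passed in stays with the caller) can be sketched with a pointer plus an ownership flag. The types below are hypothetical stand-ins, not ICU classes, and the global counter exists only to make object lifetimes observable.

```cpp
#include <cassert>
#include <string>
#include <utility>

int gLiveCollators = 0;  // number of live ToyCollator objects, for demonstration

// Hypothetical stand-in for a collator.
struct ToyCollator {
    std::string locale;
    explicit ToyCollator(std::string loc) : locale(std::move(loc)) { ++gLiveCollators; }
    ToyCollator(const ToyCollator &) = delete;
    ToyCollator &operator=(const ToyCollator &) = delete;
    ~ToyCollator() { --gLiveCollators; }
};

// Hypothetical stand-in for the search object, reduced to collator handling.
class OwnershipSketch {
public:
    // Locale-based construction: a collator is created and owned here.
    explicit OwnershipSketch(const std::string &locale)
        : coll_(new ToyCollator(locale)), owns_(true) {}

    // Collator-based construction: the caller keeps ownership.
    explicit OwnershipSketch(ToyCollator *coll) : coll_(coll), owns_(false) {
        assert(coll != nullptr);
    }

    OwnershipSketch(const OwnershipSketch &) = delete;
    OwnershipSketch &operator=(const OwnershipSketch &) = delete;

    // Only a collator created by this object is deleted by it.
    ~OwnershipSketch() {
        if (owns_) delete coll_;
    }

    const ToyCollator *getCollator() const { return coll_; }

private:
    ToyCollator *coll_;
    bool owns_;
};
```

With this layout, a caller-supplied collator must outlive every search object that borrows it, which matches the warning given for getCollator.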

StringSearch::StringSearch ( const UnicodeString & pattern,
CharacterIterator & text,
const Locale & locale,
BreakIterator * breakiter,
UErrorCode & status
)

Creates a StringSearch instance using the language rule set of the given locale.

A collator is created in the process; it is owned by this instance and is deleted when the instance is destroyed.

Note: No parsing of the text within the CharacterIterator will be done during searching for this version. The block of text in CharacterIterator will be used as it is.

Parameters:
pattern The text for which this object will search.
text The text iterator in which to search for the pattern.
locale A locale which defines the language-sensitive comparison rules used to determine whether text in the pattern and target matches. User is responsible for the clearing of this object.
breakiter A BreakIterator object used to constrain the matches that are found. Matches whose start and end indices in the target text are not boundaries as determined by the BreakIterator are ignored. If this behavior is not desired, NULL can be passed in instead.
status for errors if any. If the length of either pattern or text is 0, then a U_ILLEGAL_ARGUMENT_ERROR is returned. ICU 2.0

StringSearch::StringSearch ( const UnicodeString & pattern,
CharacterIterator & text,
RuleBasedCollator * coll,
BreakIterator * breakiter,
UErrorCode & status
)

Creates a StringSearch instance using the language rule set of the given collator.

Note: the user retains ownership of this collator; it is not destroyed when this instance is destroyed.

Note: No parsing of the text within the CharacterIterator will be done during searching for this version. The block of text in CharacterIterator will be used as it is.

Parameters:
pattern The text for which this object will search.
text The text in which to search for the pattern.
coll A RuleBasedCollator object which defines the language-sensitive comparison rules used to determine whether text in the pattern and target matches. User is responsible for the clearing of this object.
breakiter A BreakIterator object used to constrain the matches that are found. Matches whose start and end indices in the target text are not boundaries as determined by the BreakIterator are ignored. If this behavior is not desired, NULL can be passed in instead.
status for errors if any. If the length of either pattern or text is 0, then a U_ILLEGAL_ARGUMENT_ERROR is returned. ICU 2.0

StringSearch::StringSearch ( const StringSearch & that )
 

Copy constructor that creates a StringSearch instance with the same behavior, and iterating over the same text.

Parameters:
that StringSearch instance to be copied. ICU 2.0

virtual StringSearch::~StringSearch ( void ) [virtual]
 

Destructor.

Cleans up the search iterator data struct. If a collator is created in the constructor, it will be destroyed here. ICU 2.0


Member Function Documentation

RuleBasedCollator* StringSearch::getCollator ( ) const
 

Gets the collator used for the language rules.

Deleting the returned RuleBasedCollator before calling the destructor would cause the string search to fail. The destructor will delete the collator if this instance owns it.

Returns:
collator used for string search ICU 2.0

virtual int32_t StringSearch::getOffset ( void ) const [virtual]
 

Return the current index in the text being searched.

If the iteration has gone past the end of the text (or past the beginning for a backwards search), USEARCH_DONE is returned.

Returns:
current index in the text being searched. ICU 2.0

Implements SearchIterator.

const UnicodeString& StringSearch::getPattern ( ) const
 

Gets the search pattern.

Returns:
pattern used for matching ICU 2.0

virtual int32_t StringSearch::handleNext ( int32_t position,
UErrorCode & status
) [protected, virtual]
 

Search forward for matching text, starting at a given location.

Clients should not call this method directly; instead they should call SearchIterator::next.

If a match is found, this method returns the index at which the match starts and calls SearchIterator::setMatchLength with the number of characters in the target text that make up the match. If no match is found, the method returns USEARCH_DONE.

The StringSearch is adjusted so that its current index (as returned by getOffset) is the match position if one was found. If a match is not found, USEARCH_DONE will be returned and the StringSearch will be adjusted to the index USEARCH_DONE.

Parameters:
position The index in the target text at which the search starts
status for errors, if any occur
Returns:
The index at which the matched text in the target starts, or USEARCH_DONE if no match was found.

Implements SearchIterator.
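The division of labour described here, with a public next that delegates to a protected handleNext which reports the match length through setMatchLength, can be sketched as follows. The classes IteratorSketch and ExactSearchSketch are hypothetical, use plain substring matching, and omit the status parameter; they only illustrate the calling pattern and are not the ICU implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

constexpr int kSearchDone = -1;  // stands in for USEARCH_DONE

// Hypothetical base class: owns the iteration state and the public entry point.
class IteratorSketch {
public:
    virtual ~IteratorSketch() = default;

    // Clients call this; the subclass hook does the actual searching.
    int next() {
        if (offset_ == kSearchDone) return kSearchDone;
        int from = offset_ + matchLength_;  // continue after the previous match
        matchLength_ = 0;
        offset_ = handleNext(from);         // match start, or kSearchDone
        return offset_;
    }

    int getOffset() const { return offset_; }
    int getMatchedLength() const { return matchLength_; }

protected:
    void setMatchLength(int length) { matchLength_ = length; }

    // Returns the start of the first match at or after `position`, calling
    // setMatchLength() on success, or kSearchDone if there is no match.
    virtual int handleNext(int position) = 0;

private:
    int offset_ = 0;
    int matchLength_ = 0;
};

// Hypothetical subclass: exact substring matching.
class ExactSearchSketch : public IteratorSketch {
public:
    ExactSearchSketch(std::string pattern, std::string text)
        : pattern_(std::move(pattern)), text_(std::move(text)) {
        assert(!pattern_.empty());
    }

protected:
    int handleNext(int position) override {
        std::size_t hit = text_.find(pattern_, static_cast<std::size_t>(position));
        if (hit == std::string::npos) return kSearchDone;
        setMatchLength(static_cast<int>(pattern_.size()));
        return static_cast<int>(hit);
    }

private:
    std::string pattern_;
    std::string text_;
};
```

Once the hook reports no match, the sketch leaves its current index at the "done" value, mirroring the behaviour described for getOffset.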

virtual int32_t StringSearch::handlePrev ( int32_t position,
UErrorCode & status
) [protected, virtual]
 

Search backward for matching text, starting at a given location.

Clients should not call this method directly; instead they should call SearchIterator::previous.

If a match is found, this method returns the index at which the match starts and calls SearchIterator::setMatchLength with the number of characters in the target text that make up the match. If no match is found, the method returns USEARCH_DONE.

The StringSearch is adjusted so that its current index (as returned by getOffset) is the match position if one was found. If a match is not found, USEARCH_DONE will be returned and the StringSearch will be adjusted to the index USEARCH_DONE.

Parameters:
position The index in the target text at which the search starts.
status for errors, if any occur
Returns:
The index at which the matched text in the target starts, or USEARCH_DONE if no match was found.

Implements SearchIterator.

StringSearch& StringSearch::operator= ( const StringSearch & that )
 

Assignment operator.

Sets this iterator to have the same behavior, and iterate over the same text, as the one passed in.

Parameters:
that instance to be copied. ICU 2.0

virtual UBool StringSearch::operator== ( const SearchIterator & that ) const [virtual]
 

Equality operator.

Parameters:
that instance to be compared.
Returns:
TRUE if both instances have the same attributes, breakiterators, collators and iterate over the same text while looking for the same pattern. ICU 2.0

Reimplemented from SearchIterator.

virtual void StringSearch::reset ( ) [virtual]
 

Reset the iteration.

Search will begin at the start of the text string if a forward iteration is initiated before a backwards iteration. Otherwise if a backwards iteration is initiated before a forwards iteration, the search will begin at the end of the text string. ICU 2.0

Reimplemented from SearchIterator.

virtual SearchIterator* StringSearch::safeClone ( void ) const [virtual]
 

Returns a copy of StringSearch with the same behavior, and iterating over the same text, as this one.

Note that all data will be replicated, except for the user-specified collator and the breakiterator.

Returns:
cloned object ICU 2.0

Implements SearchIterator.

void StringSearch::setCollator ( RuleBasedCollator * coll,
UErrorCode & status
)
 

Sets the collator used for the language rules.

User retains the ownership of this collator, thus the responsibility of deletion lies with the user. This method causes internal data such as Boyer-Moore shift tables to be recalculated, but the iterator's position is unchanged.

Parameters:
coll collator
status for errors if any. ICU 2.0

virtual void StringSearch::setOffset ( int32_t position,
UErrorCode & status
) [virtual]
 

Sets the index to point to the given position, and clears any state that's affected.

This method takes the argument index and sets the position in the text string accordingly without checking if the index is pointing to a valid starting point to begin searching.

Parameters:
position within the text to be set
status for errors, if any occur. ICU 2.0

Implements SearchIterator.
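Because the position is taken as given, a caller may want to check an offset before passing it in. The sketch below shows one such caller-side check on UTF-16 text: it rejects offsets that are out of range or that fall between the two halves of a surrogate pair. Treating that case as an invalid starting point is an assumption made for illustration, the helper isPlausibleStart is hypothetical, and the check does not cover collation-specific cases.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

bool isLeadSurrogate(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }
bool isTrailSurrogate(char16_t c) { return c >= 0xDC00 && c <= 0xDFFF; }

// Hypothetical caller-side check: true if `offset` is within [0, size] and
// does not split a UTF-16 surrogate pair.
bool isPlausibleStart(const std::u16string &text, std::size_t offset) {
    if (offset > text.size()) return false;
    if (offset == 0 || offset == text.size()) return true;
    return !(isLeadSurrogate(text[offset - 1]) && isTrailSurrogate(text[offset]));
}
```

A caller could run a check of this kind first and only then hand the offset to setOffset, since the method itself performs no such validation.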

void StringSearch::setPattern ( const UnicodeString & pattern,
UErrorCode & status
)
 

Sets the pattern used for matching.

Internal data like the Boyer Moore table will be recalculated, but the iterator's position is unchanged.

Parameters:
pattern search pattern to be found
status for errors if any. If the pattern length is 0, then a U_ILLEGAL_ARGUMENT_ERROR is returned. ICU 2.0

virtual void StringSearch::setText ( CharacterIterator & text,
UErrorCode & status
) [virtual]
 

Set the target text to be searched.

Text iteration will hence begin at the start of the text string. This method is useful if you want to re-use an iterator to search for the same pattern within a different body of text.

Note: No parsing of the text within the CharacterIterator will be done during searching for this version. The block of text in CharacterIterator will be used as it is.

Parameters:
text text string to be searched
status for errors if any. If the text length is 0, then a U_ILLEGAL_ARGUMENT_ERROR is returned. ICU 2.0

Reimplemented from SearchIterator.

virtual void StringSearch::setText ( const UnicodeString & text,
UErrorCode & status
) [virtual]
 

Set the target text to be searched.

Text iteration will hence begin at the start of the text string. This method is useful if you want to re-use an iterator to search for the same pattern within a different body of text.

Parameters:
text text string to be searched
status for errors if any. If the text length is 0, then a U_ILLEGAL_ARGUMENT_ERROR is returned. ICU 2.0

Reimplemented from SearchIterator.


The documentation for this class was generated from the following file:
Generated on Sun May 22 18:49:57 2005 for ICU 2.1 by  doxygen 1.4.2