Friday 16 August 2013

SHOBHIT-Improved String Search Algorithm



In Computer Science, SHOBHIT-Improved String Search Algorithm is a string searching algorithm created by SHOBHIT UPADHYAYA in August, 2013. He is working as a Software Developer in Bangalore India.


“This algorithm uses the first, last, mid1 and mid2 index of the substring for a pattern search.”


The main reason for publishing this new algorithm is, there was bug in my last search algorithm SHOBHIT-Advance String Search Algorithm . That bug had been raised by my blog follower.

I would like to give thanks to that guy.

My new algorithm is faster than the previous one.



PESUDOCODE OF ALGORITHM:

 Shobhit_Improved_String_Search(text, text_len, substr, substr_len)
 {
         If substr_len is greater than text_len
                return -1      
                       
         sub_end = substr_len -1
                       
         if substr_len is even
         then
                  sub_mid1 = (substr_len /2 ) – 1
                  sub_mid2 = (substr_len/2)
         else
                  sub_mid1 = (substr_len/2)
                  sub_mid2 = (substr_len/2)

          sub_start = 0
          t_end = sub_end
          t_mid1 = sub_mid1
          t_mid2 = sub_mid2

          m1 = sub_mid1
          m2 = sub_mid2
          index=-1

          for ( t_start = 0;  t_end < text_len; t_start ++)
          {
                 If text[t_start] is equal to substr[sub_start] AND
                             text[t_end] is equal to substr[sub_end]

                  then
                          if index is equal to -1
                          then
                                index = t_start
                                               
                                if  ( sub_start + 1 is equal to sub_mid1 AND
                                                 sub_mid2 +1 is equal to sub_end )
                                         OR
                                    ( sub_start is equal to sub_mid1 AND
                                                 sub_mid2 is equal to sub_end )
           
                                then
                                         return index;           //Pattern Found
                                   
                                if text[t_mid1] is equal to substr[sub_mid1] AND
                                              text[t_mid2] is equal to substr[sub_mid2]
                                then
                                        increment sub_start by 1
                                        decrement sub_end by 1

                                         decrement sub_mid1 by 1
                                         increment sub_mid2 by 1

                                         decrement t_end by 1
                                         decrement t_mid1 by 1
                                         increment t_mid2 by 1
                                else
                                         set sub_start to zero
                                         set sub_end to substr_len -1
                                         set sub_mid1 to m1
                                         set sub_mid2 to m2

                                         set t_end to t_start + sub_end + 1
                                         set t_mid1 to sub_mid1 + t_start + 1
                                         set t_mid2 to sub_mid2 + t_start + 1

                                         set index to -1

                     else
                               set sub_start to zero
                               set sub_end to substr_len -1
                               set sub_mid1 to m1
                               set sub_mid2 to m2

                               set t_end to t_start + sub_end
                               set t_mid1 to sub_mid1 + t_start
                               set t_mid2 to sub_mid2 + t_start

                               set index to -1
                                   
                               If text[t_start] is equal to substr[sub_start] AND
                                         text[t_end] is equal to substr[sub_end]

                                then
                                         if index is equal to -1
                                         then
                                               index = t_start
                                               
                                          if  ( sub_start + 1 is equal to sub_mid1 AND
                                                            sub_mid2 +1 is equal to sub_end )
                                                 OR
                                                  ( sub_start is equal to sub_mid1 AND
                                                             sub_mid2 is equal to sub_end )
           
                                           then
                                                  return index;           //Pattern Found
                                   
                                          if text[t_mid1] is equal to substr[sub_mid1] AND
                                                         text[t_mid2] is equal to substr[sub_mid2]
                                          then
                                                increment sub_start by 1
                                                decrement sub_end by 1

                                                decrement sub_mid1 by 1
                                                 increment sub_mid2 by 1

                                                 decrement t_end by 1
                                                 decrement t_mid1 by 1
                                                  increment t_mid2 by 1
                                                           
                            else
                                   increment t_end by 1
                                   increment t_mid1 by 1
                                   increment t_mid2 by 1
                                                           
 }


TIME and SPACE Complexity:

For a text of length n and substring of length m.

Its best case, time complexity is O(m/4) and in worst case, time complexity is O(n – (m/4) ).

In all the cases best, average and worst its space complexity is O(1).

This algorithm checks four characters at a time.

In best case where the substring is at the start of the text, we can check all the character of substring in only m/4 time.
In worst case where the substring is at the last of the text, we need to iterate till n – m


SOURCE CODE AND LICENSE:-

          The code of this algorithm is free and it’s under
GNU GENERAL PUBLIC LICENSE
Version 3, 29 June 2007

Please find out the source code from the following link.

The source code is having the algorithm api and a test application code to test the api.
If you found any bug or if you want to communicate related to this algorithm drop me a mail at < sho.upadhyaya@gmail.com > or do comment on my blog.



Monday 5 August 2013

SHOBHIT-Advance String Search Algorithm



In Computer Science, SHOBHIT-Advance String Search Algorithm is a string searching algorithm created by SHOBHIT UPADHYAYA in 2013. He is working as a Software Developer in Bangalore India.

“This algorithm uses the first and last index of the substring for a pattern search.”


PESUDOCODE OF ALGORITHM:

Shobhit_Advance_String_Search( str1 , length_of_str1, str2 , length_of_str2 )
{
      i =  0;
      j =  0;
      k = length_of_str2 -1
      last = length_of_str2 -1;
      index = -1
     
      if length_of_str2 is greater than length_of_str1
            return index

      for(  ; k is less than length_of_str1 ; increment i by one )
      {
           if str2[ j ] equals to str1[ i ] and str2[ last ] equals to str1[ k ]
           then
                   if index equals to -1
                         set index to i

                   if j equals to last or j + 1 equals to last
                   then
                            do break; // pattern found
                    else
                           increment j by one
                           decrement last by one
                            decrement k by one
         else
                 set index to -1
                 set j to zero
                 set last to length_of_str2 -1              
                 set k to i + last

                 if str2[ j ] equals to str1[ i ] and str2[last] equals to str1[k]
                 then
                            if index equals to -1
                                    set index to i

                            if j equals to last or j + 1 equals to last
                           then
                                    do break; // pattern found
                           else
                                    increment j by one
                                   decrement last by one
                                   decrement k by one
                 else
                          increment k by one
      }
      return index
}

 Please click on the following link to understand better this Algorithm.


TIME and SPACE Complexity:

For a text of length n and substring of length m.

Its best case, time complexity is O(m/2) and in worst case, time complexity is O(n – (m/2) ).

In all the cases best, average and worst its space complexity is O(1).
This algorithm checks two characters at a time.
In best case where the substring is at the start of the text, we can check all the character of substring in only m/2 time.
In worst case where the substring is at the last of the text, we need to iterate till n – m


SOURCE CODE AND LICENSE:-

          The code of this algorithm is free and it’s under

GNU GENERAL PUBLIC LICENSE

Version 3, 29 June 2007
Please find out the code from the following link.
The source code is having the algorithm api and a test application code to test the api.
If you found any bug or if you want to communicate related to this algorithm send me a mail < sho.upadhyaya@gmail.com > or do comment on my blog.

           

Other Good String Search ALGORITHM:-

1)     Knuth-Morris-Pratt Algorithm
The running time of Knuth-Morris-Pratt algorithm is proportional to the time needed to read the characters in text and pattern. In other words, the worst-case running time of the algorithm is O(m + n) and it requires O(m) extra space. It is important to note that these quantities are independent of the size of the underlying alphabet.

2)     Rabin-Karp Algorithm
 For text of length n and p patterns of combined length m, its average and best case running time is O(n+m) in space O(p), but its worst-case time is O(nm)

3)     Boyer-Moore String search  Algorithm
Worst Case runtime when pattern occur in the text is O(nm)
Worst Case runtime when pattern does not occur in the text is O(n+m)

4)     Aho-Corasick string matching algorithm
 Aho–Corasick string matching algorithm has asymptotic worst-time complexity O(n+m) in space O(m).
                                   

THANKS FOR READING

SHOBHIT-Advance String Search Algorithm in Graphical Way




SHOBHIT-Advance String Search Algorithm in Graphical Way






back to main page of Algorithm