Friday, 16 August 2013

SHOBHIT-Improved String Search Algorithm



In Computer Science, SHOBHIT-Improved String Search Algorithm is a string searching algorithm created by SHOBHIT UPADHYAYA in August, 2013. He is working as a Software Developer in Bangalore India.


“This algorithm uses the first, last, mid1 and mid2 index of the substring for a pattern search.”


The main reason for publishing this new algorithm is, there was bug in my last search algorithm SHOBHIT-Advance String Search Algorithm . That bug had been raised by my blog follower.

I would like to give thanks to that guy.

My new algorithm is faster than the previous one.



PESUDOCODE OF ALGORITHM:

 Shobhit_Improved_String_Search(text, text_len, substr, substr_len)
 {
         If substr_len is greater than text_len
                return -1      
                       
         sub_end = substr_len -1
                       
         if substr_len is even
         then
                  sub_mid1 = (substr_len /2 ) – 1
                  sub_mid2 = (substr_len/2)
         else
                  sub_mid1 = (substr_len/2)
                  sub_mid2 = (substr_len/2)

          sub_start = 0
          t_end = sub_end
          t_mid1 = sub_mid1
          t_mid2 = sub_mid2

          m1 = sub_mid1
          m2 = sub_mid2
          index=-1

          for ( t_start = 0;  t_end < text_len; t_start ++)
          {
                 If text[t_start] is equal to substr[sub_start] AND
                             text[t_end] is equal to substr[sub_end]

                  then
                          if index is equal to -1
                          then
                                index = t_start
                                               
                                if  ( sub_start + 1 is equal to sub_mid1 AND
                                                 sub_mid2 +1 is equal to sub_end )
                                         OR
                                    ( sub_start is equal to sub_mid1 AND
                                                 sub_mid2 is equal to sub_end )
           
                                then
                                         return index;           //Pattern Found
                                   
                                if text[t_mid1] is equal to substr[sub_mid1] AND
                                              text[t_mid2] is equal to substr[sub_mid2]
                                then
                                        increment sub_start by 1
                                        decrement sub_end by 1

                                         decrement sub_mid1 by 1
                                         increment sub_mid2 by 1

                                         decrement t_end by 1
                                         decrement t_mid1 by 1
                                         increment t_mid2 by 1
                                else
                                         set sub_start to zero
                                         set sub_end to substr_len -1
                                         set sub_mid1 to m1
                                         set sub_mid2 to m2

                                         set t_end to t_start + sub_end + 1
                                         set t_mid1 to sub_mid1 + t_start + 1
                                         set t_mid2 to sub_mid2 + t_start + 1

                                         set index to -1

                     else
                               set sub_start to zero
                               set sub_end to substr_len -1
                               set sub_mid1 to m1
                               set sub_mid2 to m2

                               set t_end to t_start + sub_end
                               set t_mid1 to sub_mid1 + t_start
                               set t_mid2 to sub_mid2 + t_start

                               set index to -1
                                   
                               If text[t_start] is equal to substr[sub_start] AND
                                         text[t_end] is equal to substr[sub_end]

                                then
                                         if index is equal to -1
                                         then
                                               index = t_start
                                               
                                          if  ( sub_start + 1 is equal to sub_mid1 AND
                                                            sub_mid2 +1 is equal to sub_end )
                                                 OR
                                                  ( sub_start is equal to sub_mid1 AND
                                                             sub_mid2 is equal to sub_end )
           
                                           then
                                                  return index;           //Pattern Found
                                   
                                          if text[t_mid1] is equal to substr[sub_mid1] AND
                                                         text[t_mid2] is equal to substr[sub_mid2]
                                          then
                                                increment sub_start by 1
                                                decrement sub_end by 1

                                                decrement sub_mid1 by 1
                                                 increment sub_mid2 by 1

                                                 decrement t_end by 1
                                                 decrement t_mid1 by 1
                                                  increment t_mid2 by 1
                                                           
                            else
                                   increment t_end by 1
                                   increment t_mid1 by 1
                                   increment t_mid2 by 1
                                                           
 }


TIME and SPACE Complexity:

For a text of length n and substring of length m.

Its best case, time complexity is O(m/4) and in worst case, time complexity is O(n – (m/4) ).

In all the cases best, average and worst its space complexity is O(1).

This algorithm checks four characters at a time.

In best case where the substring is at the start of the text, we can check all the character of substring in only m/4 time.
In worst case where the substring is at the last of the text, we need to iterate till n – m


SOURCE CODE AND LICENSE:-

          The code of this algorithm is free and it’s under
GNU GENERAL PUBLIC LICENSE
Version 3, 29 June 2007

Please find out the source code from the following link.

The source code is having the algorithm api and a test application code to test the api.
If you found any bug or if you want to communicate related to this algorithm drop me a mail at < sho.upadhyaya@gmail.com > or do comment on my blog.