Find the original string, without the repetition without the repetition in the middle

Question

Sometimes it happens that while typing a sentence, I am distracted and I end up typing the same couple of words twice couple of words twice in succession.

To make sure make sure other people are not bothered by this, your task is to write a program that resolves this problem!

Task

Given an input string (if it matters for your language, you may assume ASCII-only input that does not contain linefeeds.) str, that contains somewhere in its middle a substring that occurs twice in immediate succession, return the string with one instance of this substring removed.

In the case of multiple possibilities, return the shortest answer possible (that is, pick the longest consecutive repeating substring and remove that one).

In the case of multiple, equally-long consecutive repeating substrings, remove the first (that is, the first one encountered when reading through the string from front to back) one.

You may assume that the input is correct (i.e. always contains a consecutive repeating substring), which might help to golf it down.

Examples

Input: hello hello world -> Output: hello world.
Input: foofoo -> Output: foo. (So: Yes, the string might only consist of the repeating part twice).
Input: aaaaa -> Output: aaa, as the longest repeating consecutive substring is here aa.
Input: Slartibartfast -> This is not a valid input, as it does not contain a consecutive repeating substring, so you do not need to handle this case.
Input: the few the bar -> This is another invalid input, since the repeating part should immediately follow the original part. In this case, the and the are separated by something else in-between, so this input is invalid.
Input: ababcbc -> Output: abcbc. The two possible longest consecutive repeating substrings are ab and bc. As ab is encountered earlier in the string, this one is the correct answer.
Input: Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo. Output: Buffalo buffalo buffalo buffalo Buffalo buffalo. (The performed replacement should be case-sensitive).
Input: Sometimes it happens that while typing a sentence, I am distracted and I end up typing the same couple of words twice couple of words twice in succession. -> Output: Sometimes it happens that while typing a sentence, I am distracted and I end up typing the same couple of words twice in succession.. Only the longest consecutive repeating substring is removed.

Your code should be as short as possible, since this is code-golf, so the shortest answer in bytes wins. Good luck!

@manatwork When taking the first sentence, that is Sometimes it happens that while typing a sentence, I am distracted and I end up typing the same couple of words twice couple of words twice in succession. as input, the output should be Sometimes it happens that while typing a sentence, I am distracted and I end up typing the same couple of words twice in succession.. Only the longest found duplication is removed. — Qqwy, Commented Mar 4, 2017 at 19:38
I suggest adding a test that has two possible replacements, where the second one is longer than the first. I suspect most answers won't pass that one :) — aross, Commented Mar 6, 2017 at 12:45
Unless me and my test code are mistaken, there is only one repeated string in there. — aross, Commented Mar 8, 2017 at 9:08

Martin Ender · Accepted Answer · 2017-03-05 12:03:44Z

Retina, 35 33 bytes

Byte count assumes ISO 8859-1 encoding.

(?=(.+)(\1.*))
$2¶$`
O$#`
$.&
G1`

Try it online!

Explanation

Since regex engines look for matches from left to right, it's not trivial to find the longest match regardless of position. It can be done with .NET's balancing groups, but the result is rather unpleasantly long:

1`((.)+)\1(?<=(?!.*((?>(?<-2>.)+).+)\3)^.*)
$1

So I figured I'd try to avoid that by making use of some other Retina features.

(?=(.+)(\1.*))
$2¶$`

We start by essentially applying all possible substitutions, one on each line. To do this, we match the position in front of a match (instead of the match itself), to allow for overlapping matches. This is done by putting the real regex into a lookahead. That lookahead then captures the remaining except the duplicate we want to remove in group 2. We write back group 2 (deleting the duplicate), a linefeed, and then then the entire input up to the match, which gives us basically a fresh line to be substituted.

At the end we'll have one line for each match, with the corresponding duplicate removed. At the end there'll also be the full input again without any substitutions done.

Now that we have all the possible substitutions, we want the shortest result (which corresponds to the longest removed repetition).

O$#`
$.&

So we first sort the lines by length.

G1`

And then we only keep the first line.

\$\begingroup\$ Wow, that replacement technique is really clever! \$\endgroup\$
– Leo
Commented Mar 5, 2017 at 15:45 — Leo, Commented Mar 5, 2017 at 15:45

Brad Gilbert b2gills · Accepted Answer · 2017-03-05 00:52:33Z

8

Perl 6, 40 bytes

{.subst: m:ex/(.*))>$0/.max(*.chars),''}

Try it

{
  .subst:             # substitute


    m                 # match
    :exhaustive
    /
      ( .* )          # any number of chars

      )>              # don't include the following in what is returned

      $0              # the first match again
    /.max( *.chars ), # find the first longest submatch


    ''                # substitute it with nothing
}

answered Mar 5, 2017 at 0:52

Brad Gilbert b2gills

13.2k1 gold badge18 silver badges33 bronze badges

Add a comment |

Jonathan Allan · Accepted Answer · 2017-03-05 07:18:42Z

Jelly, 22 19 bytes

-2 bytes thanks to Dennis (avoid an argument reversal, remove subtly redundant increment)

ẋ2³wȧ+¥J
ẆÇ€LÐṀḢṬœp

Try it online!

Full program (a bug has been found with ÐṀ not acting with the correct arity over dyads, which will be fixed soon; although I'm not sure it can make for shorter code here).

How?

Finds the first of the longest slices of the input such that a repetition exists in the input and removes it from the input.

ẋ2³wȧ+¥J - Link 1, removal indices for given slice if valid, else 0: slice, x
ẋ2       - repeat x twice, say y
  ³      - program input: s
   w     - index of first occurrence of y in s (1-based) or 0, say i
       J - range(length(x)): [1,2,3,...,length(x)]
      ¥  - last two links as a dyad
    ȧ    -     and (non-vectorising)
     +   -     addition: [1+i,2+i,3+i,...,length(x)+i] or 0
         - note: no need to decrement these since the last index will be the 1st index
         - of the repetition (thanks to Dennis for spotting that!)

ẆÇ€LÐṀḢṬœp - Main link: string, s
Ẇ          - all sublists of s (order is short to long, left to right, e.g. a,b,c,ab,bc,abc)
 Ç€        - call the last link (1) as a monad for €ach
    ÐṀ     - filter by maximal
   L       -     length
      Ḣ    - head: get the first (and hence left-most) one
       Ṭ   - untruth: make a list with 1s at the indexes given and 0s elsewhere
        œp - partition s at truthy indexes of that, throwing away the borders
           - implicit print

Ørjan Johansen · Accepted Answer · 2017-03-05 06:26:34Z

Haskell, 101 bytes

Main function is f, it takes and returns a String.

l=length
a=splitAt
f s|i<-[0..l s-1]=[p++t|n<-i,(p,(r,t))<-fmap(a$l s-n).(`a`s)<$>i,r==take(l r)t]!!0

Try it online!

When I started this, I imported Data.List and used maximum, tails, inits and isPrefixOf. Somehow that got turned into this. But I still only managed to shave off 11 bytes...

Notes

splitAt/a splits a string at a given index.
s is the input string.
i is the list of numbers [0 .. length s - 1], the -1 is to work around that splitAt splits at the end if given a too large index.
n is length s minus the current length goal for the repeated part, it's chosen that way so we don't have to use two number lists and/or the verbose decreasing list syntax.
p,r, and t are a threeway split of s, with r the intended repeated part. The fmap there uses the (,) String Functor to avoid a variable for an intermediate split.
!!0 selects the first element of the list of matches.

Neil · Accepted Answer · 2017-03-05 10:22:25Z

6

JavaScript (ES6), 81 74 bytes

f=
s=>s.replace(/(?=(.+)\1)/g,(_,m)=>r=m[r.length]?m:r,r='')&&s.replace(r,'')

<input oninput=o.textContent=f(this.value)><pre id=o>

Edit: Saved 7 bytes by stealing @Arnauld's m[r.length] trick.

edited Mar 5, 2017 at 10:22

answered Mar 4, 2017 at 20:56

Neil

173k12 gold badges72 silver badges276 bronze badges

Add a comment |

briantist · Accepted Answer · 2017-03-04 21:22:52Z

PowerShell, 87 bytes

param($s)([regex](([regex]'(.+)\1'|% *hes $s|sort L*)[-1]|% Gr*|% V*)[1])|% Re* $s '' 1

Try it online! (all test cases)

Explanation

Starting from the inside basically, we run Matches with the (.+)\1 regex, to return all match objects for the specified string. The regex matches any sequence of characters that is followed by itself.

Then the resulting match objects are piped into sort to be sorted by their Length property (shortened to wildcard). This results in an array of matches sorted by length, ascending, so index with [-1] to get the last element (the longest). That match's value is the match though, not the group, so it includes the repetition, so we retrieve the Group object (|% Gr*) and then the value of that (|% V*) to get the largest repeated string. Thing is the group object is actually an array because group 0 is always the match, but I want the actual group (1), so the resulting value is actually values, hence indexing to get the second element [1]. This value is casted to a regex object itself and then the Replace method is called against the original string, replacing with nothing, and only the first match is replaced (|% Re* $s '' 1).

Dennis · Accepted Answer · 2017-03-05 05:56:58Z

5

Jelly, 23 21 bytes

ṚẆUẋ€2ẇÐf¹ṪðLHḶ+w@Ṭœp

Thanks to @JonathanAllan for his Ṭœp idea which saved 2 bytes.

Try it online!

edited Mar 5, 2017 at 5:56

answered Mar 4, 2017 at 23:30

Dennis

210k41 gold badges370 silver badges824 bronze badges

Add a comment |

Community · Accepted Answer · 2017-04-13 12:39:16Z

4

Mathematica, 63 60 59 bytes

^{4 bytes saved due to Martin Ender.}

#&@@StringReplaceList[#,a__~~a__->a]~SortBy~{StringLength}&

Anonymous function. Takes a string as input and returns a string as output.

edited Apr 13, 2017 at 12:39

CommunityBot

1

answered Mar 4, 2017 at 21:24

LegionMammal978

18k4 gold badges24 silver badges57 bronze badges

\$\begingroup\$ This doesn't seem to work on example 6 — ~SortBy~StringLength sorts strings alphabetically if their lengths are the same... \$\endgroup\$
– Not a tree
Commented Mar 6, 2017 at 10:55
1

\$\begingroup\$ @LegionMammal978 The shorter fix is keep SortBy and wrap StringLength in a list to get a stable sort. \$\endgroup\$
– Martin Ender
Commented Mar 6, 2017 at 13:51

Add a comment |

Arnauld · Accepted Answer · 2017-03-05 00:11:36Z

3

JavaScript (ES6), 70 bytes

s=>s.replace(s.match(/(.+)(?=\1)/g).reduce((p,c)=>c[p.length]?c:p),'')

Test cases

let f =

s=>s.replace(s.match(/(.+)(?=\1)/g).reduce((p,c)=>c[p.length]?c:p),'')

console.log(f("hello hello world"))
console.log(f("foofoo"))
console.log(f("aaaaa"))
console.log(f("ababcbc"))
console.log(f("Sometimes it happens that while typing a sentence, I am distracted and I end up typing the same couple of words twice couple of words twice in succession."))

answered Mar 5, 2017 at 0:11

Arnauld

194k20 gold badges176 silver badges639 bronze badges

\$\begingroup\$ Fails on aaaabaaab, but nice use of reduce. \$\endgroup\$
– Neil
Commented Mar 5, 2017 at 10:19

Add a comment |

TRОLL · Accepted Answer · 2017-03-04 22:06:40Z

2

This should be a comment, but I don't have enough reputations to comment. I just want to tell @Neil that his code can be reduced to 77 bytes. You don't need to use forward assertion in regex. Here is reduced version:

s=>s.replace(/(.+)\1/g,(_,m)=>(n=m.length)>l&&(l=n,r=m),l=0)&&s.replace(r,'')

answered Mar 4, 2017 at 22:06

TRОLL

212 bronze badges

2

\$\begingroup\$ Hello, and welcome to PPCG! You can submit this as your own JavaScript answer! If you'd like, I can edit your post and show you how it should look. \$\endgroup\$
– NoOneIsHere
Commented Mar 4, 2017 at 22:22
2

\$\begingroup\$ I need to use the forward assertion to deal with the case of overlapping matches. aabab is the shortest example of where your suggestion fails. \$\endgroup\$
– Neil
Commented Mar 5, 2017 at 10:16

Add a comment |

Extragorey · Accepted Answer · 2017-03-07 04:51:19Z

C#, 169 bytes

(s)=>{var x="";for(int i=0;i<s.Length-2;i++){for(int l=1;l<=(s.Length-i)/2;l++){var y=s.Substring(i,l);if(s.Contains(y+y)&l>x.Length)x=y;}}return s.Replace(x+x,x);}

Explanation

(s) => {                // Anonymous function declaration    
    var x = "";         // String to store the longest repeating substring found
    for (int i = 0; i < s.Length - 2; i++) {               // Loop through the input string
        for (int l = 1; l <= (s.Length - i) / 2; l++) {    // Loop through all possible substring lengths
            var y = s.Substring(i, l);
            if (s.Contains(y + y) & l > x.Length) x = y;   // Check if the substring repeats and is longer than any previously found
        }
    }
    return s.Replace(x + x, x);    // Perform the replacement
}

This is the brute-force approach: try every possible substring until we find the longest repeating substring. Undoubtedly Regex is more efficient, but dealing with Regex in C# tends to be quite verbose.

Welcome to PPCG! All answers need to be either full programs or callable functions, not sure snippets with inputs in hardcoded variables. Also, please show the version of the code that you actually counted with all unnecessary whitespace removed. You can always include the more readable version with indentation in addition to the fully golfed one. — Martin Ender, Commented Mar 6, 2017 at 13:53

aross · Accepted Answer · 2017-03-09 09:35:07Z

PHP, 84 82 bytes

Note: uses IBM-850 encoding.

for($l=strlen($argn);--$l&&!$r=preg_filter("#(.{0$l})\g-1#",~█╬,$argn,1););echo$r;

Run like this:

echo 'hello hello world' | php -nR 'for($l=strlen($argn);--$l&&!$r=preg_filter("#(.{0$l})\g-1#",~█╬,$argn,1););echo$r;';echo
> hello world

Explanation

for(
  $l=strlen($argn);   # Set $l to input length.
  --$l   &&           # Decrement $l each iteration until it becomes 0.
  !$r=preg_filter(    # Stop looping when preg_filter has a result
                      # (meaning a successful replace).
    "#(.{0$l})\g-1#", # Find any character, $l times (so the longest
                      # match is tried first), repeated twice.
    ~█╬,              # Replace with $1: first capture group, removing the
                      # duplicate.
    $argn,
    1                 # Only replace 1 match.
  );
);
echo$r;               # Print the result of the (only) successful
                      # search/replace, if any.

Tweaks

Saved 2 bytes because there is no minimum length of the repeated substring

Stack Exchange Network

Find the original string, without the repetition without the repetition in the middle

Task

Examples

12 Answers 12

Retina, 35 33 bytes

Explanation

Perl 6, 40 bytes

Jelly, 22 19 bytes

How?

Haskell, 101 bytes

Notes

JavaScript (ES6), 81 74 bytes

PowerShell, 87 bytes

Explanation

Jelly, 23 21 bytes

Mathematica, 63 60 59 bytes

JavaScript (ES6), 70 bytes

Test cases

C#, 169 bytes

Explanation

PHP, 84 82 bytes

Explanation

Tweaks

Not the answer you're looking for? Browse other questions tagged
code-golf
string
sequence
subsequence
or ask your own question.

Hot Network Questions

Find the original string, without the repetition without the repetition in the middle

Task

Examples

12 Answers 12

Retina, 35 33 bytes

Explanation

Perl 6, 40 bytes

Jelly, 22 19 bytes

How?

Haskell, 101 bytes

Notes

JavaScript (ES6), 81 74 bytes

PowerShell, 87 bytes

Explanation

Jelly, 23 21 bytes

Mathematica, 63 60 59 bytes

JavaScript (ES6), 70 bytes

Test cases

C#, 169 bytes

Explanation

PHP, 84 82 bytes

Explanation

Tweaks

Not the answer you're looking for? Browse other questions tagged code-golfstringsequencesubsequence or ask your own question.

Related

Hot Network Questions

Not the answer you're looking for? Browse other questions tagged
code-golf
string
sequence
subsequence
or ask your own question.