25
\$\begingroup\$

Sometimes it happens that while typing a sentence, I am distracted and I end up typing the same couple of words twice couple of words twice in succession.

To make sure make sure other people are not bothered by this, your task is to write a program that resolves this problem!

Task

Given an input string (if it matters for your language, you may assume ASCII-only input that does not contain linefeeds.) str, that contains somewhere in its middle a substring that occurs twice in immediate succession, return the string with one instance of this substring removed.

In the case of multiple possibilities, return the shortest answer possible (that is, pick the longest consecutive repeating substring and remove that one).

In the case of multiple, equally-long consecutive repeating substrings, remove the first (that is, the first one encountered when reading through the string from front to back) one.

You may assume that the input is correct (i.e. always contains a consecutive repeating substring), which might help to golf it down.


Examples

  1. Input: hello hello world -> Output: hello world.
  2. Input: foofoo -> Output: foo. (So: Yes, the string might only consist of the repeating part twice).
  3. Input: aaaaa -> Output: aaa, as the longest repeating consecutive substring is here aa.
  4. Input: Slartibartfast -> This is not a valid input, as it does not contain a consecutive repeating substring, so you do not need to handle this case.
  5. Input: the few the bar -> This is another invalid input, since the repeating part should immediately follow the original part. In this case, the and the are separated by something else in-between, so this input is invalid.
  6. Input: ababcbc -> Output: abcbc. The two possible longest consecutive repeating substrings are ab and bc. As ab is encountered earlier in the string, this one is the correct answer.
  7. Input: Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo. Output: Buffalo buffalo buffalo buffalo Buffalo buffalo. (The performed replacement should be case-sensitive).
  8. Input: Sometimes it happens that while typing a sentence, I am distracted and I end up typing the same couple of words twice couple of words twice in succession. -> Output: Sometimes it happens that while typing a sentence, I am distracted and I end up typing the same couple of words twice in succession.. Only the longest consecutive repeating substring is removed.

Your code should be as short as possible, since this is , so the shortest answer in bytes wins. Good luck!

\$\endgroup\$
6
  • \$\begingroup\$ @manatwork When taking the first sentence, that is Sometimes it happens that while typing a sentence, I am distracted and I end up typing the same couple of words twice couple of words twice in succession. as input, the output should be Sometimes it happens that while typing a sentence, I am distracted and I end up typing the same couple of words twice in succession.. Only the longest found duplication is removed. \$\endgroup\$
    – Qqwy
    Commented Mar 4, 2017 at 19:38
  • 1
    \$\begingroup\$ I suggest adding a test that has two possible replacements, where the second one is longer than the first. I suspect most answers won't pass that one :) \$\endgroup\$
    – aross
    Commented Mar 6, 2017 at 12:45
  • \$\begingroup\$ @aross test case 8 is exactly that :) \$\endgroup\$
    – Qqwy
    Commented Mar 8, 2017 at 7:50
  • \$\begingroup\$ Unless me and my test code are mistaken, there is only one repeated string in there. \$\endgroup\$
    – aross
    Commented Mar 8, 2017 at 9:08
  • \$\begingroup\$ @aross there is a double p in happens \$\endgroup\$
    – Qqwy
    Commented Mar 9, 2017 at 9:19

12 Answers 12

9
\$\begingroup\$

Retina, 35 33 bytes

Byte count assumes ISO 8859-1 encoding.

(?=(.+)(\1.*))
$2¶$`
O$#`
$.&
G1`

Try it online!

Explanation

Since regex engines look for matches from left to right, it's not trivial to find the longest match regardless of position. It can be done with .NET's balancing groups, but the result is rather unpleasantly long:

1`((.)+)\1(?<=(?!.*((?>(?<-2>.)+).+)\3)^.*)
$1

So I figured I'd try to avoid that by making use of some other Retina features.

(?=(.+)(\1.*))
$2¶$`

We start by essentially applying all possible substitutions, one on each line. To do this, we match the position in front of a match (instead of the match itself), to allow for overlapping matches. This is done by putting the real regex into a lookahead. That lookahead then captures the remaining except the duplicate we want to remove in group 2. We write back group 2 (deleting the duplicate), a linefeed, and then then the entire input up to the match, which gives us basically a fresh line to be substituted.

At the end we'll have one line for each match, with the corresponding duplicate removed. At the end there'll also be the full input again without any substitutions done.

Now that we have all the possible substitutions, we want the shortest result (which corresponds to the longest removed repetition).

O$#`
$.&

So we first sort the lines by length.

G1`

And then we only keep the first line.

\$\endgroup\$
1
  • \$\begingroup\$ Wow, that replacement technique is really clever! \$\endgroup\$
    – Leo
    Commented Mar 5, 2017 at 15:45
8
\$\begingroup\$

Perl 6, 40 bytes

{.subst: m:ex/(.*))>$0/.max(*.chars),''}

Try it

{
  .subst:             # substitute


    m                 # match
    :exhaustive
    /
      ( .* )          # any number of chars

      )>              # don't include the following in what is returned

      $0              # the first match again
    /.max( *.chars ), # find the first longest submatch


    ''                # substitute it with nothing
}
\$\endgroup\$
0
7
\$\begingroup\$

Jelly, 22 19 bytes

-2 bytes thanks to Dennis (avoid an argument reversal, remove subtly redundant increment)

ẋ2³wȧ+¥J
ẆÇ€LÐṀḢṬœp

Try it online!

Full program (a bug has been found with ÐṀ not acting with the correct arity over dyads, which will be fixed soon; although I'm not sure it can make for shorter code here).

How?

Finds the first of the longest slices of the input such that a repetition exists in the input and removes it from the input.

ẋ2³wȧ+¥J - Link 1, removal indices for given slice if valid, else 0: slice, x
ẋ2       - repeat x twice, say y
  ³      - program input: s
   w     - index of first occurrence of y in s (1-based) or 0, say i
       J - range(length(x)): [1,2,3,...,length(x)]
      ¥  - last two links as a dyad
    ȧ    -     and (non-vectorising)
     +   -     addition: [1+i,2+i,3+i,...,length(x)+i] or 0
         - note: no need to decrement these since the last index will be the 1st index
         - of the repetition (thanks to Dennis for spotting that!)

ẆÇ€LÐṀḢṬœp - Main link: string, s
Ẇ          - all sublists of s (order is short to long, left to right, e.g. a,b,c,ab,bc,abc)
 Ç€        - call the last link (1) as a monad for €ach
    ÐṀ     - filter by maximal
   L       -     length
      Ḣ    - head: get the first (and hence left-most) one
       Ṭ   - untruth: make a list with 1s at the indexes given and 0s elsewhere
        œp - partition s at truthy indexes of that, throwing away the borders
           - implicit print
\$\endgroup\$
0
6
\$\begingroup\$

Haskell, 101 bytes

Main function is f, it takes and returns a String.

l=length
a=splitAt
f s|i<-[0..l s-1]=[p++t|n<-i,(p,(r,t))<-fmap(a$l s-n).(`a`s)<$>i,r==take(l r)t]!!0

Try it online!

When I started this, I imported Data.List and used maximum, tails, inits and isPrefixOf. Somehow that got turned into this. But I still only managed to shave off 11 bytes...

Notes

  • splitAt/a splits a string at a given index.
  • s is the input string.
  • i is the list of numbers [0 .. length s - 1], the -1 is to work around that splitAt splits at the end if given a too large index.
  • n is length s minus the current length goal for the repeated part, it's chosen that way so we don't have to use two number lists and/or the verbose decreasing list syntax.
  • p,r, and t are a threeway split of s, with r the intended repeated part. The fmap there uses the (,) String Functor to avoid a variable for an intermediate split.
  • !!0 selects the first element of the list of matches.
\$\endgroup\$
6
\$\begingroup\$

JavaScript (ES6), 81 74 bytes

f=
s=>s.replace(/(?=(.+)\1)/g,(_,m)=>r=m[r.length]?m:r,r='')&&s.replace(r,'')
<input oninput=o.textContent=f(this.value)><pre id=o>

Edit: Saved 7 bytes by stealing @Arnauld's m[r.length] trick.

\$\endgroup\$
0
5
\$\begingroup\$

PowerShell, 87 bytes

param($s)([regex](([regex]'(.+)\1'|% *hes $s|sort L*)[-1]|% Gr*|% V*)[1])|% Re* $s '' 1

Try it online! (all test cases)

Explanation

Starting from the inside basically, we run Matches with the (.+)\1 regex, to return all match objects for the specified string. The regex matches any sequence of characters that is followed by itself.

Then the resulting match objects are piped into sort to be sorted by their Length property (shortened to wildcard). This results in an array of matches sorted by length, ascending, so index with [-1] to get the last element (the longest). That match's value is the match though, not the group, so it includes the repetition, so we retrieve the Group object (|% Gr*) and then the value of that (|% V*) to get the largest repeated string. Thing is the group object is actually an array because group 0 is always the match, but I want the actual group (1), so the resulting value is actually values, hence indexing to get the second element [1]. This value is casted to a regex object itself and then the Replace method is called against the original string, replacing with nothing, and only the first match is replaced (|% Re* $s '' 1).

\$\endgroup\$
0
5
\$\begingroup\$

Jelly, 23 21 bytes

ṚẆUẋ€2ẇÐf¹ṪðLHḶ+w@Ṭœp

Thanks to @JonathanAllan for his Ṭœp idea which saved 2 bytes.

Try it online!

\$\endgroup\$
4
\$\begingroup\$

Mathematica, 63 60 59 bytes

4 bytes saved due to Martin Ender.

#&@@StringReplaceList[#,a__~~a__->a]~SortBy~{StringLength}&

Anonymous function. Takes a string as input and returns a string as output.

\$\endgroup\$
2
  • \$\begingroup\$ This doesn't seem to work on example 6 — ~SortBy~StringLength sorts strings alphabetically if their lengths are the same... \$\endgroup\$
    – Not a tree
    Commented Mar 6, 2017 at 10:55
  • 1
    \$\begingroup\$ @LegionMammal978 The shorter fix is keep SortBy and wrap StringLength in a list to get a stable sort. \$\endgroup\$ Commented Mar 6, 2017 at 13:51
3
\$\begingroup\$

JavaScript (ES6), 70 bytes

s=>s.replace(s.match(/(.+)(?=\1)/g).reduce((p,c)=>c[p.length]?c:p),'')

Test cases

let f =

s=>s.replace(s.match(/(.+)(?=\1)/g).reduce((p,c)=>c[p.length]?c:p),'')

console.log(f("hello hello world"))
console.log(f("foofoo"))
console.log(f("aaaaa"))
console.log(f("ababcbc"))
console.log(f("Sometimes it happens that while typing a sentence, I am distracted and I end up typing the same couple of words twice couple of words twice in succession."))

\$\endgroup\$
1
  • \$\begingroup\$ Fails on aaaabaaab, but nice use of reduce. \$\endgroup\$
    – Neil
    Commented Mar 5, 2017 at 10:19
2
\$\begingroup\$

This should be a comment, but I don't have enough reputations to comment. I just want to tell @Neil that his code can be reduced to 77 bytes. You don't need to use forward assertion in regex. Here is reduced version:

s=>s.replace(/(.+)\1/g,(_,m)=>(n=m.length)>l&&(l=n,r=m),l=0)&&s.replace(r,'')
\$\endgroup\$
2
  • 2
    \$\begingroup\$ Hello, and welcome to PPCG! You can submit this as your own JavaScript answer! If you'd like, I can edit your post and show you how it should look. \$\endgroup\$ Commented Mar 4, 2017 at 22:22
  • 2
    \$\begingroup\$ I need to use the forward assertion to deal with the case of overlapping matches. aabab is the shortest example of where your suggestion fails. \$\endgroup\$
    – Neil
    Commented Mar 5, 2017 at 10:16
0
\$\begingroup\$

C#, 169 bytes

(s)=>{var x="";for(int i=0;i<s.Length-2;i++){for(int l=1;l<=(s.Length-i)/2;l++){var y=s.Substring(i,l);if(s.Contains(y+y)&l>x.Length)x=y;}}return s.Replace(x+x,x);}

Explanation

(s) => {                // Anonymous function declaration    
    var x = "";         // String to store the longest repeating substring found
    for (int i = 0; i < s.Length - 2; i++) {               // Loop through the input string
        for (int l = 1; l <= (s.Length - i) / 2; l++) {    // Loop through all possible substring lengths
            var y = s.Substring(i, l);
            if (s.Contains(y + y) & l > x.Length) x = y;   // Check if the substring repeats and is longer than any previously found
        }
    }
    return s.Replace(x + x, x);    // Perform the replacement
}

This is the brute-force approach: try every possible substring until we find the longest repeating substring. Undoubtedly Regex is more efficient, but dealing with Regex in C# tends to be quite verbose.

\$\endgroup\$
1
  • \$\begingroup\$ Welcome to PPCG! All answers need to be either full programs or callable functions, not sure snippets with inputs in hardcoded variables. Also, please show the version of the code that you actually counted with all unnecessary whitespace removed. You can always include the more readable version with indentation in addition to the fully golfed one. \$\endgroup\$ Commented Mar 6, 2017 at 13:53
0
\$\begingroup\$

PHP, 84 82 bytes

Note: uses IBM-850 encoding.

for($l=strlen($argn);--$l&&!$r=preg_filter("#(.{0$l})\g-1#",~█╬,$argn,1););echo$r;

Run like this:

echo 'hello hello world' | php -nR 'for($l=strlen($argn);--$l&&!$r=preg_filter("#(.{0$l})\g-1#",~█╬,$argn,1););echo$r;';echo
> hello world

Explanation

for(
  $l=strlen($argn);   # Set $l to input length.
  --$l   &&           # Decrement $l each iteration until it becomes 0.
  !$r=preg_filter(    # Stop looping when preg_filter has a result
                      # (meaning a successful replace).
    "#(.{0$l})\g-1#", # Find any character, $l times (so the longest
                      # match is tried first), repeated twice.
    ~█╬,              # Replace with $1: first capture group, removing the
                      # duplicate.
    $argn,
    1                 # Only replace 1 match.
  );
);
echo$r;               # Print the result of the (only) successful
                      # search/replace, if any.

Tweaks

  • Saved 2 bytes because there is no minimum length of the repeated substring
\$\endgroup\$

Not the answer you're looking for? Browse other questions tagged or ask your own question.