\$\begingroup\$

(Literally: "Does this follow/realize the gismu-form?")

Premise

The language Lojban is a constructed language, meaning in part that all of its words have been created rather than allowed to develop naturally. The semantic base of Lojban is its gismu, or root words, which were synthesized by combining roots from widely spoken natural languages like Chinese, Hindi, and English. All gismu are 5 letters long and follow a certain strict form.

Information

For our purposes, the Lojban alphabet is:

abcdefgijklmnoprstuvxz

That is, the Roman alphabet without hqwy.

This alphabet can be divided into four categories:

  • Vowels aeiou

  • Sonorant consonants lmnr

  • Unvoiced consonants ptkfcsx. When voiced, these become respectively the...

  • Voiced consonants bdgvjz (No voiced consonant corresponds to x.)
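The four categories can be written down directly as data. Here is a minimal Python sketch (the language is an arbitrary choice, since the challenge accepts any). The names `VOWELS`, `SONORANTS`, `UNVOICED`, `VOICED` and `VOICE` are illustrative and not part of the challenge. Pairing the unvoiced and voiced strings position by position reproduces the "respectively" correspondence, and x is left without a partner.

```python
# Hypothetical names; the letters are exactly the four categories listed above.
VOWELS = "aeiou"
SONORANTS = "lmnr"
UNVOICED = "ptkfcsx"
VOICED = "bdgvjz"  # voiced partners of p, t, k, f, c, s, in that order

ALPHABET = "abcdefgijklmnoprstuvxz"

# zip() stops at the shorter string, so x (the 7th unvoiced letter) gets no partner.
VOICE = dict(zip(UNVOICED, VOICED))

# The four categories partition the 22-letter alphabet.
assert sorted(VOWELS + SONORANTS + UNVOICED + VOICED) == sorted(ALPHABET)

print(VOICE)  # {'p': 'b', 't': 'd', 'k': 'g', 'f': 'v', 'c': 'j', 's': 'z'}
```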

To be a valid gismu, a 5-char-long string must:

  1. Be in one of the consonant-vowel patterns CVCCV or CCVCV, where C represents a consonant, and V represents a vowel.

  2. Follow consonant-matching rules.
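Rule 1 depends only on where the vowels are. A minimal Python sketch, assuming the input is already a 5-letter string over the Lojban alphabet; `cv_shape` and `has_gismu_shape` are hypothetical helper names:

```python
VOWELS = "aeiou"

def cv_shape(word: str) -> str:
    """Replace each vowel with 'V' and every other letter with 'C'."""
    return "".join("V" if ch in VOWELS else "C" for ch in word)

def has_gismu_shape(word: str) -> bool:
    """Rule 1 only; the consonant-matching rules still have to be checked."""
    return cv_shape(word) in ("CVCCV", "CCVCV")

print(cv_shape("gismu"), cv_shape("cfipu"), cv_shape("xitot"))  # CVCCV CCVCV CVCVC
```

Note that a word such as rcare passes this shape test and is only rejected by rule 2.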

Consonant-matching rules for CCVCV words:

The first two characters must constitute one of the following 48 pairs (source):

ml mr
pl pr
bl br
   tr                   tc ts
   dr                   dj dz
kl kr
gl gr
fl fr
vl vr
cl cr cm cn cp ct ck cf
      jm    jb jd jg jv
sl sr sm sn sp st sk sf
      zm    zb zd zg zv
xl xr

Note that this looks rather nicer when separated into voiced and unvoiced pairs. In particular, every voiced-voiced pair is valid iff the corresponding unvoiced-unvoiced pair is valid. This does not extend to pairs with a sonorant consonant; cl is valid but jl is not.
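The voicing symmetry can be checked mechanically. The sketch below (Python, with hypothetical names) copies the 48 pairs from the table, voices every pair made of two unvoiced consonants, and confirms that the result is exactly the set of pairs made of two voiced consonants. x is omitted from the translation because it has no voiced partner; it only appears in xl and xr anyway.

```python
# The 48 permitted initial pairs for CCVCV words, copied from the table above.
CCVCV_PAIRS = set("""
ml mr pl pr bl br tr tc ts dr dj dz kl kr gl gr fl fr vl vr
cl cr cm cn cp ct ck cf jm jb jd jg jv
sl sr sm sn sp st sk sf zm zb zd zg zv xl xr
""".split())

UNVOICED, VOICED = "ptkfcs", "bdgvjz"       # x left out: it has no voiced partner
TO_VOICED = str.maketrans(UNVOICED, VOICED)

def obstruent_pairs(letters: str) -> set:
    """Permitted pairs whose two letters both come from `letters`."""
    return {p for p in CCVCV_PAIRS if p[0] in letters and p[1] in letters}

unvoiced_pairs = obstruent_pairs(UNVOICED)   # tc ts cp ct ck cf sp st sk sf
voiced_pairs = obstruent_pairs(VOICED)       # dj dz jb jd jg jv zb zd zg zv

assert len(CCVCV_PAIRS) == 48
# Voicing every unvoiced-unvoiced pair gives exactly the voiced-voiced pairs.
assert {p.translate(TO_VOICED) for p in unvoiced_pairs} == voiced_pairs
# The symmetry does not extend to sonorants: cl is allowed, jl is not.
assert "cl" in CCVCV_PAIRS and "jl" not in CCVCV_PAIRS
```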

Consonant-matching rules for CVCCV words (source):

The third and fourth characters must obey the following rules:

  1. It is forbidden for both consonants to be the same [...]

  2. It is forbidden for one consonant to be voiced and the other unvoiced. The consonants “l”, “m”, “n”, and “r” are exempt from this restriction. As a result, “bf” is forbidden, and so is “sd”, but both “fl” and “vl”, and both “ls” and “lz”, are permitted.

  3. It is forbidden for both consonants to be drawn from the set “c”, “j”, “s”, “z”.

  4. The specific pairs “cx”, “kx”, “xc”, “xk”, and “mz” are forbidden.

Note that there are 179 possible pairs.
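The count of 179 follows directly from the four rules. Of the 17 × 16 = 272 ordered pairs of distinct consonants, rule 2 removes 2 × 7 × 6 = 84. Rule 3 removes the 4 remaining sibilant pairs (cs, sc, jz, zj), and rule 4 removes 5 more: 272 − 84 − 4 − 5 = 179. A Python sketch of the rules that enumerates the pairs (the function name `cvccv_pair_ok` is hypothetical):

```python
from itertools import permutations

SONORANTS = set("lmnr")
UNVOICED = set("ptkfcsx")
VOICED = set("bdgvjz")
CONSONANTS = SONORANTS | UNVOICED | VOICED
SIBILANTS = set("cjsz")
FORBIDDEN = {"cx", "kx", "xc", "xk", "mz"}

def cvccv_pair_ok(a: str, b: str) -> bool:
    """Rules 1-4 for the third and fourth letters of a CVCCV word."""
    if a == b:                                                   # rule 1
        return False
    if (a in VOICED and b in UNVOICED) or (a in UNVOICED and b in VOICED):
        return False                                             # rule 2
    if a in SIBILANTS and b in SIBILANTS:                        # rule 3
        return False
    return a + b not in FORBIDDEN                                # rule 4

# permutations() only yields pairs of distinct letters, so rule 1 never fires here.
valid = [a + b for a, b in permutations(sorted(CONSONANTS), 2) if cvccv_pair_ok(a, b)]
print(len(valid))  # 179
```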

Challenge

Determine if the given string follows the gismu formation rules. This is code-golf, so the shortest solution in bytes wins.

Input: A string of length 5 from the Lojban alphabet.

Output: A truthy value if the string can be a gismu and a falsey value otherwise.

Test cases

Valid:

gismu
cfipu
ranxi
mupno
rimge
zosxa

Invalid:

ejram
xitot
dtpno
rcare
pxuja
cetvu

More test cases: this text file contains all valid gismu, one per line.
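Combining the shape rule with the two sets of consonant rules gives a reference checker that can be run against the test cases above. This is an ungolfed Python sketch, not taken from any of the answers; `is_gismu` and `cvccv_pair_ok` are hypothetical names.

```python
VOWELS = set("aeiou")
SONORANTS = set("lmnr")
UNVOICED = set("ptkfcsx")
VOICED = set("bdgvjz")
LETTERS = VOWELS | SONORANTS | UNVOICED | VOICED

# The 48 permitted initial pairs for CCVCV words.
CCVCV_PAIRS = set(
    "ml mr pl pr bl br tr tc ts dr dj dz kl kr gl gr fl fr vl vr "
    "cl cr cm cn cp ct ck cf jm jb jd jg jv "
    "sl sr sm sn sp st sk sf zm zb zd zg zv xl xr".split()
)

def cvccv_pair_ok(a: str, b: str) -> bool:
    """Rules 1-4 for the third and fourth letters of a CVCCV word."""
    if a == b:                                                   # rule 1
        return False
    if (a in VOICED and b in UNVOICED) or (a in UNVOICED and b in VOICED):
        return False                                             # rule 2
    if a in "cjsz" and b in "cjsz":                              # rule 3
        return False
    return a + b not in {"cx", "kx", "xc", "xk", "mz"}           # rule 4

def is_gismu(word: str) -> bool:
    if len(word) != 5 or not set(word) <= LETTERS:
        return False
    shape = "".join("V" if ch in VOWELS else "C" for ch in word)
    if shape == "CCVCV":
        return word[:2] in CCVCV_PAIRS
    if shape == "CVCCV":
        return cvccv_pair_ok(word[2], word[3])
    return False

# The test cases from the question.
for w in ["gismu", "cfipu", "ranxi", "mupno", "rimge", "zosxa"]:
    assert is_gismu(w), w
for w in ["ejram", "xitot", "dtpno", "rcare", "pxuja", "cetvu"]:
    assert not is_gismu(w), w
```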

I don't really know Lojban, so I suspect the title translation is wrong. Help is appreciated.

\$\endgroup\$
  • Note that Lojban pronunciation is phonetic, so gismu is pronounced with a hard g, like in GIF.
    – lirtosiast
    Commented Dec 8, 2015 at 20:27
  • I don't know if that's a good example, because the official pronunciation of GIF is like Jiff. :p
    – geokavel
    Commented Dec 8, 2015 at 21:16
  • Side question: Since both s and k are part of the language, what pronunciation does c have?
    – Fatalize
    Commented Dec 9, 2015 at 11:07
  • @Fatalize: It's "sh".
    – Deusovi
    Commented Dec 9, 2015 at 11:43
  • @Deusovi it seems you are right. The reason I got it wrong is because j is not pronounced as English J, but rather as French J (without the plosive at the beginning). From one of the linked pages: "The regular English pronunciation of “James”, which is [dʒɛjmz], would Lojbanize as “djeimz.”, which contains a forbidden consonant pair......[additional rule to avoid this] so we see that the plosive D needs to be added in." The unvoiced version of French J is indeed SH. The IPA symbols (for those who understand them) are on the Wikipedia page.
    Commented Dec 9, 2015 at 12:12

4 Answers

\$\begingroup\$

Ruby, 252 bytes (down from 302)

c='[cjsztdpbfvkgxmnlr]'
v=c+'[aeiou]'
z=!1
n=/#{c+v+v}/=~(s=gets.chop)*2
(n==0||n==2)&&289.times{|k|q=[i=k%17,j=k/17].max
z||=(x=s[n,2])==c[j+1]+c[i+1]&&['UeUeJOJOJOJOETJ
:'[i].ord-69>>j&1-j/14>0,i!=j&&q>3&&(k%2<1||q>12)&&!'mzxcxkx'.index(x)][n/2]}
p z

A few more bytes could be saved as follows:

Initialize z to false using z=!c='[cjsztdpbfvkgxmnlr]'. This works but produces the warning "warning: found = in conditional, should be ==".

Change from a program to a function (I left it as a program because according to the question, shortest "program" in bytes wins.)

Summary of changes from first post

Major overhaul of regex/matching part.

Constant 72 changed to 69 so that the lowest ASCII code in the magic string is 10 instead of 13. This enables a literal newline to be used in the golfed version instead of an escape sequence.

Magic string 'mzxcxkx' replaces arithmetic rules for the 5 prohibited pairs in the CVCCV type table.

Ungolfed version

Added whitespace and replaced the literal newline in the magic string with \n.

c='[cjsztdpbfvkgxmnlr]'                                   #c=consonants
v=c+'[aeiou]'                                             #v=consonant+vowel
z=!1                                                      #Set z to false (everything is truthy in Ruby except nil and false.)
n=/#{c+v+v}/=~(s=gets.chop)*2                             #Get input and duplicate it. Do regex match; n becomes the index of the double consonant.
(n==0||n==2)&&                                            #If n==0 (ccvcv) or n==2 (cvccv) 
   289.times{|k|                                          #iterate 17*17 times
     q=[i=k%17,j=k/17].max                                #generate row and column, find their maximum.
     z||=                                                 #OR z with the following expression:
     (x=s[n,2])==c[j+1]+c[i+1]&&                          #double consonant == the pair corresponding to j,i AND either 
       ["UeUeJOJOJOJOETJ\n:"[i].ord-69>>j&1-j/14>0,       #this expression or
       i!=j&&q>3&&(k%2<1||q>12)&&!'mzxcxkx'.index(x)][n/2]#this expression, depending on the value of n/2
   }
p z                                                       #print output
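The regex line packs in a neat trick. In the Ruby code v is c+'[aeiou]', so c+v+v expands to the pattern C C V C V. Searching for that pattern in the input written twice finds it at index 0 for a CCVCV word and at index 2 for a CVCCV word, because the trailing C V is supplied by the start of the second copy. Any other index, or no match, means the word has neither shape. The following is a Python transliteration for illustration only. Python's re.search, like Ruby's =~, reports the leftmost match.

```python
import re

C = "[cjsztdpbfvkgxmnlr]"          # the Ruby answer's consonant class
V = C + "[aeiou]"                   # the Ruby answer's v: a consonant, then a vowel
PATTERN = re.compile(C + V + V)     # expands to C C V C V

def match_index(word: str):
    """Index of the first C C V C V match in the doubled word, or None."""
    m = PATTERN.search(word * 2)
    return m.start() if m else None

for w in ["cfipu", "gismu", "ejram", "xitot", "dtpno"]:
    print(w, match_index(w))
# cfipu 0
# gismu 2
# ejram 1
# xitot 4
# dtpno None
```

Only indices 0 and 2 are accepted, which is why the Ruby code tests n==0||n==2 and then uses n/2 to select the rule set.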

Explanation of matching

The two characters in the input string s[n,2] are compared with the character pair of the iterating loop. If they match and the consonant-vowel regex pattern is correct, the row and column values i,j are checked for validity. Careful ordering of the consonants helps here.

For CVCCV:

i!=j                        It is forbidden for both consonants to be the same
(k%2<1||q>12)               It is forbidden for one consonant to be voiced and the other unvoiced. The consonants “l”, “m”, “n”, and “r” are exempt from this restriction. As a result, “bf” is forbidden, and so is “sd”, but both “fl” and “vl”, and both “ls” and “lz”, are permitted.
q>3                         It is forbidden for both consonants to be drawn from the set “c”, “j”, “s”, “z”.
!'mzxcxkx'.index(x)         The specific pairs “cx”, “kx”, “xc”, “xk”, and “mz” are forbidden.
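The index tests work because of the consonant order cjsztdpbfvkgxmnlr. Unvoiced letters sit at even indices and voiced letters at odd indices below 13, and the sonorants occupy indices 13 to 16. Since 17 is odd, k = 17j + i is even exactly when i and j have the same parity. The substring "zx" inside 'mzxcxkx' is harmless because z and x already differ in voicing. The following Python transliteration, an illustrative sketch rather than part of the answer, reproduces the count of 179:

```python
ORDER = "cjsztdpbfvkgxmnlr"  # the Ruby answer's consonant order, without the brackets

def ruby_cvccv_ok(k: int) -> bool:
    """Transliteration of the CVCCV condition for loop counter k in 0..288."""
    i, j = k % 17, k // 17          # i: second letter (column), j: first letter (row)
    q = max(i, j)
    pair = ORDER[j] + ORDER[i]
    return (
        i != j                      # rule 1: not the same consonant
        and q > 3                   # rule 3: not both from c, j, s, z (indices 0-3)
        and (k % 2 < 1 or q > 12)   # rule 2: same parity, or a sonorant (index >= 13)
        and pair not in "mzxcxkx"   # rule 4: substring test, like Ruby's String#index
    )

valid = [ORDER[k // 17] + ORDER[k % 17] for k in range(289) if ruby_cvccv_ok(k)]
print(len(valid))  # 179
```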

For CCVCV:

A bitmap for each column of the table below is encoded into the magic string, from which 69 is subtracted. For all columns except the last two, only 6 bits are required. For the last two, the higher order bits need to be 1, so a negative number is generated (characters \n and :) in order to have leading 1's instead of leading zeroes. We don't want to include the last three rows of the table though, so instead of rightshift and ANDing by 1, we rightshift and AND by 1-j/14 which normally evaluates to 1, but evaluates to 0 for the last 3 rows.
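Decoding the magic string by hand shows how the 48 pairs fall out. Python's >> on a negative integer is an arithmetic shift, just like Ruby's, so the negative values produced by \n and : keep supplying 1 bits for the higher rows. This Python transliteration is only an illustration of the expression used in the answer:

```python
ORDER = "cjsztdpbfvkgxmnlr"
MAGIC = "UeUeJOJOJOJOETJ\n:"   # one character per column i (the second letter)

def ruby_ccvcv_ok(i: int, j: int) -> bool:
    """Bit j of (ord(MAGIC[i]) - 69), masked to 0 for rows j >= 14 (n, l, r)."""
    bits = ord(MAGIC[i]) - 69        # "\n" -> -59 and ":" -> -11, i.e. leading 1 bits
    return ((bits >> j) & (1 - j // 14)) > 0

pairs = {ORDER[j] + ORDER[i] for j in range(17) for i in range(17) if ruby_ccvcv_ok(i, j)}
print(len(pairs))  # 48
```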

The following program (with the same expressions as the submission) was used to generate the tables below (uncomment whichever if line is required for the table you want).

c='[cjsztdpbfvkgxmnlr]'
z=0
289.times{|k|
  q=[i=k%17,j=k/17].max
  r=c[j+1]+c[i+1]
  #if i!=j && q>3 && (k%2<1||q>12) && !'mzxcxkx'.index(r)
  #if "UeUeJOJOJOJOETJ\n:"[i].ord-69>>j&1-j/14>0
    print r,' '
    z+=1
  else
    print '   '
  end
  i==16&&puts 
}
puts z


            ct    cp    cf    ck       cm cn cl cr
               jd    jb    jv    jg    jm jn jl jr
            st    sp    sf    sk    sx sm sn sl sr
               zd    zb    zv    zg    zm zn zl zr
tc    ts          tp    tf    tk    tx tm tn tl tr
   dj    dz          db    dv    dg    dm dn dl dr
pc    ps    pt          pf    pk    px pm pn pl pr
   bj    bz    bd          bv    bg    bm bn bl br
fc    fs    ft    fp          fk    fx fm fn fl fr
   vj    vz    vd    vb          vg    vm vn vl vr
kc    ks    kt    kp    kf             km kn kl kr
   gj    gz    gd    gb    gv          gm gn gl gr
      xs    xt    xp    xf             xm xn xl xr
mc mj ms    mt md mp mb mf mv mk mg mx    mn ml mr
nc nj ns nz nt nd np nb nf nv nk ng nx nm    nl nr
lc lj ls lz lt ld lp lb lf lv lk lg lx lm ln    lr
rc rj rs rz rt rd rp rb rf rv rk rg rx rm rn rl 
179

            ct    cp    cf    ck       cm cn cl cr
               jd    jb    jv    jg    jm
            st    sp    sf    sk       sm sn sl sr
               zd    zb    zv    zg    zm
tc    ts                                        tr
   dj    dz                                     dr
                                             pl pr
                                             bl br
                                             fl fr
                                             vl vr
                                             kl kr
                                             gl gr
                                             xl xr
                                             ml mr


48
\$\endgroup\$
  • I changed the wording to allow functions; sorry it took so long.
    – lirtosiast
    Commented Dec 18, 2015 at 1:15
\$\begingroup\$

JavaScript (ES6), 352 bytes (down from 366)

g=>((q=3,w=2,r=0,f="mzcscjzjxcxkx",c="bdgvjzptkfcsxlmnr",d=[...c],v="aeiou")[m="match"](g[1])?d.map((a,i)=>d.map((b,j)=>a==b|(i<6&j>5&j<13|j<6&i>5&i<13)||f[m](a+b)||(p+=","+a+b)),p="",q=0,r=w--)&&p:"jbl,zbr,tstcl,cmr,cn,cr,jdr,cfl,sfr,jgl,zgr,zdjml,ckl,skr,cpl,spr,sl,sm,sn,sr,ctr,jvl,zvr,xl,xr,dzm")[m](g[r]+g[r+1])&&c[m](g[q])&&v[m](g[w])&&v[m](g[4])

Explanation

Returns an array containing the last letter (truthy) if it is a valid gismu or null if it is not.

A lot of the size comes from the hard-coded CCVCV pairs (even after condensing them). It might be possible to find a pattern to generate them but I've spent way too much time on this already! xD

g=>
  (
    // Save the positions to check for the consonant, vowel and pair respectively
    (q=3,w=2,r=0,                       // default = CCVCV format
    f="mzcscjzjxcxkx",                  // f = all forbidden pairs for CVCCV pairs
    c="bdgvjzptkfcsxlmnr",              // c = consonants
    d=[...c],                           // d = array of consonants
    v="aeiou")                          // v = vowels
    [m="match"](g[1])?                  // if the second character is a vowel

      // Generate CC pairs of CVCCV
      d.map((a,i)=>                     // iterate over every possible pair of consonants
        d.map((b,j)=>
          a==b|                         // rule 1: consonants cannot be the same
          (i<6&j>5&j<13|j<6&i>5&i<13)|| // rule 2: pair cannot be voiced and unvoiced
          f[m](a+b)||                   // rule 3 & 4: certain pairs are forbidden
            (p+=","+a+b)                // if it follows all the rules add the pair
        ),
        p="",                           // p = comma-delimited valid CVCCV pairs
        q=0,r=w--                       // update the match positions to CVCCV format
      )&&p
    :
      // CC pairs of CCVCV (condensed so that valid pairs like "jb", "bl" and
      //     "zb" can be matched in this string but invalid pairs like "lz" cannot)
      "jbl,zbr,tstcl,cmr,cn,cr,jdr,cfl,sfr,jgl,zgr,zdjml,ckl,skr,cpl,spr,sl,sm,sn,sr,ctr,jvl,zvr,xl,xr,dzm"

  // Match the required format
  )[m](g[r]+g[r+1])&&c[m](g[q])&&v[m](g[w])&&v[m](g[4])
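The condensed CCVCV string is claimed to contain every valid pair and no invalid one. Entries are separated by commas, so a two-letter match can only succeed on two adjacent letters within one chunk. The following small Python check tests that claim. It is a sketch: windows is a hypothetical helper, and Python's substring test stands in for JavaScript's match, which behaves the same here because the two-letter patterns contain no regex metacharacters.

```python
CONDENSED = ("jbl,zbr,tstcl,cmr,cn,cr,jdr,cfl,sfr,jgl,zgr,zdjml,ckl,skr,"
             "cpl,spr,sl,sm,sn,sr,ctr,jvl,zvr,xl,xr,dzm")

# The 48 CCVCV pairs from the question's table.
CCVCV_PAIRS = set(
    "ml mr pl pr bl br tr tc ts dr dj dz kl kr gl gr fl fr vl vr "
    "cl cr cm cn cp ct ck cf jm jb jd jg jv "
    "sl sr sm sn sp st sk sf zm zb zd zg zv xl xr".split()
)

def windows(s: str) -> set:
    """Every two-letter substring that does not straddle a comma."""
    return {s[k:k + 2] for k in range(len(s) - 1) if "," not in s[k:k + 2]}

assert windows(CONDENSED) == CCVCV_PAIRS
print(len(windows(CONDENSED)))  # 48
```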

Test

var solution = g=>((q=3,w=2,r=0,f="mzcscjzjxcxkx",c="bdgvjzptkfcsxlmnr",d=[...c],v="aeiou")[m="match"](g[1])?d.map((a,i)=>d.map((b,j)=>a==b|(i<6&j>5&j<13|j<6&i>5&i<13)||f[m](a+b)||(p+=","+a+b)),p="",q=0,r=w--)&&p:"jbl,zbr,tstcl,cmr,cn,cr,jdr,cfl,sfr,jgl,zgr,zdjml,ckl,skr,cpl,spr,sl,sm,sn,sr,ctr,jvl,zvr,xl,xr,dzm")[m](g[r]+g[r+1])&&c[m](g[q])&&v[m](g[w])&&v[m](g[4])
<input type="text" id="input" value="gismu" />
<button onclick="result.textContent=solution(input.value)">Go</button>
<pre id="result"></pre>

\$\endgroup\$
\$\begingroup\$

JavaScript (ES6), 247 bytes

x=>eval(`/${(c='[bcdfgjklmnprstvxz]')+c+(v='[aeiou]')+c+v}/${t='.test(x)'}?/^[bfgkmpvx][lr]|[cs][fklmnprt]|t[csr]|d[jrz]|[jz][bdgmv]/${t}:/${c+v+c+c+v}/${t}?!/^..((.)\\2|${V='[bdgvjz]'}${U='[ptkfcsx]'}|${U+V}|[cjsz][cjsz]|cx|kx|xc|xk|mz)/${t}:!1`)

I guess this is my work now.

\$\endgroup\$
\$\begingroup\$

Haskell + hgl, 186 bytes

c="[^aeiou]"
v="[aeiou]"
cP$rXw(yS hdS)$fo["([mpbkgfvcsx][lr]|[cs][mnptkf]|[jz][mbdgv]|t[csr]|d[jzr])",v,c,v,"|",c,v,"([cjsz]{2}|x[kc]|[kc]x|mz|/p)!([ptkfcsxlmnr]{2}|[bdgvjzlmnr]{2})",v]


Without regex, 189 bytes

y=jzW(p2**).*wR
v=xys W5
c=xys d
d="ptkfcsxlmnr bdgvjzlmnr"
cP$fo[asy$y"mpbkgfvcsx cs jz t d""lr mnptkf mbdgv csr jzr",v,c,v]++fo[c,v,asy$cX(y d d)$zW p2 d d<>y"cjsz x kc m""cjsz kc x z",v]


Explanation

First we define some helpers. y=jzW(p2**).*wR is the most complex. It's a weird synthetic operation that happens to be useful in this use case, so it doesn't really have a simple explanation. But you can break it down part by part:

  • (p2**) takes two lists of characters and creates all strings formed by taking one character from the first and one from the second, so

    >>> (p2**) "abc" "ae"
    ["aa","ae","ba","be","ca","ce"]
    
  • jzW applies it pairwise to two lists. This allows us to get a bunch of pairs made from combinations of specific sets of characters.

  • (.*wR) makes it so it takes the lists as space separated strings. This is just a denser format than lists of strings.

v is relatively simple. It's a parser that parses any vowel, aeiou. Similarly c parses any Lojban consonant (or a space). d is our list of all consonants; it contains a space and some repeats because this format is useful for other things. We can do this since xys basically treats its input as a set and we know the input will never have a space.

Specifically the string is <voiceless consonants><sonorants> <voiced consonants><sonorants>.

Now we get to the body, which has the form:

cP$fo[??,v,c,v]++fo[c,v,??,v]

with the two ??s each being replaced with the clustering rules.
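To make the two cluster rules concrete, here is a rough Python analogue of the non-regex version. This is not hgl. The function y imitates the described behaviour of jzW(p2**).*wR: split both strings on spaces, zip the words, and take every letter pairing within each zipped pair. The set difference is my assumed reading of what cX and zW p2 d d contribute (removing doubled letters and the listed exceptions).

```python
from itertools import product

def y(left: str, right: str) -> list:
    """Zip the space-separated words pairwise; pair every letter of one with every letter of the other."""
    return [a + b
            for lw, rw in zip(left.split(), right.split())
            for a, b in product(lw, rw)]

d = "ptkfcsxlmnr bdgvjzlmnr"  # <voiceless><sonorants> <voiced><sonorants>

# CCVCV: the 48 allowed initial pairs.
ccvcv = set(y("mpbkgfvcsx cs jz t d", "lr mnptkf mbdgv csr jzr"))

# CVCCV: same-voicing pairs, minus doubled letters, minus the cjsz / x / mz exceptions.
same_voicing = set(y(d, d))
doubles = {ch + ch for ch in d if ch != " "}
exceptions = set(y("cjsz x kc m", "cjsz kc x z"))
cvccv = same_voicing - doubles - exceptions

print(len(ccvcv), len(cvccv))  # 48 179
```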

Reflection

The non-regex answer is so close to beating the regex answer. I have some improvements:

  • While jzW(p2**).*wR is overly synthetic, jzW(p2**) and zW p2 are both probably useful.
  • "[aeiou]" is shorter than '[':W5<>"]". If there were a function, ekQ, to enclose a string in square brackets ekQ W5 would be shorter than both.
  • There should probably be constants for consonants of the ISO alphabet as well as the ones for vowels.
  • Back-referencing would be useful in the regex answer. It's already a planned feature, there are just technical hurdles on the way to implementing it.
  • There could be more versatile ways to handle user input parsers. v and c could potentially have been user input parsers if there existed the ability to supply more than one.
  • Along the lines of that, I could make a way for the user to assign escape sequences in some sort of header string, like rwh"v[aeiou];c[^aeiou]""/c/v/c/c/v".

  • Although I didn't think of this at the time, we could save a couple of bytes if we could use ranges in character classes. [ptkfcsxlmnr] could be [cfk-np-tx] (q isn't in the input so it doesn't matter that this matches q). These were implemented in 51806e570.
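The claim about [cfk-np-tx] is easy to check. The sketch below uses Python's re module as a stand-in for hgl's character classes, which may differ in detail. Over the Lojban alphabet the ranged class matches exactly the same letters as [ptkfcsxlmnr]; over the full a-z alphabet it additionally accepts q.

```python
import re

LOJBAN = "abcdefgijklmnoprstuvxz"
LISTED = set("ptkfcsxlmnr")
RANGED = "[cfk-np-tx]"   # c, f, k-n, p-t, x

ranged_lojban = {ch for ch in LOJBAN if re.fullmatch(RANGED, ch)}
assert ranged_lojban == LISTED

ranged_az = {chr(o) for o in range(ord("a"), ord("z") + 1) if re.fullmatch(RANGED, chr(o))}
print(ranged_az - LISTED)  # {'q'}
```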
\$\endgroup\$
