The first obstacle is tower height. Factoring in standard refraction and surface curvature, you'll need a tower around 150m tall at each end of the span. On a clear day that leaves around 115m of the far tower hidden below the horizon, so the top 35m of each tower is visible from the other side.
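For anyone who wants to play with the numbers, here is a rough sketch of the horizon geometry. It assumes a spherical Earth and models "standard refraction" by scaling the Earth's radius by 7/6, which is a common convention but an assumption on my part. The span itself isn't restated here, so the 88 km used below is a hypothetical value picked only for illustration, and the function names are made up for this example.

```python
import math

EARTH_RADIUS_M = 6_371_000.0
# Assumption: standard refraction approximated by an effective radius of 7/6 R.
EFFECTIVE_RADIUS_M = EARTH_RADIUS_M * 7.0 / 6.0


def horizon_distance(height_m, radius_m=EFFECTIVE_RADIUS_M):
    """Distance to the horizon from height_m (valid for heights << radius)."""
    return math.sqrt(2.0 * radius_m * height_m)


def hidden_height(span_m, observer_height_m, radius_m=EFFECTIVE_RADIUS_M):
    """Height of a distant object that sits below the observer's horizon."""
    beyond_horizon = span_m - horizon_distance(observer_height_m, radius_m)
    if beyond_horizon <= 0:
        return 0.0  # the target is closer than the horizon, nothing is hidden
    return beyond_horizon ** 2 / (2.0 * radius_m)


span_m = 88_000.0   # hypothetical span, for illustration only
tower_m = 150.0     # tower height from the text
hidden = hidden_height(span_m, tower_m)
print(f"hidden: {hidden:.0f} m, visible: {tower_m - hidden:.0f} m")
```

With those assumed inputs the result lands in the same ballpark as the figures above. Dropping the refraction factor or changing the span shifts it noticeably, so treat it as a sanity check rather than a survey.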
At that range you'll need either a good-sized signal panel (flags, shutters, whatever) or a fairly bright light to be visible without a telescope. If you're using signal panels they'll need to be at least 30m across to be barely visible to the naked eye, which has a minimum resolution of about 1 arcminute (1/60th of a degree). If you can build a pair of 150m-tall towers with an array of signal panels on top, you are probably capable of building some fixed telescopes on the towers to allow you to see smaller signal panels.
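The panel size follows from simple trigonometry: an object is only resolvable if it subtends at least the eye's angular resolution. A minimal sketch, assuming the 1 arcminute figure above and ignoring contrast, haze and atmospheric shimmer (the function names are mine, purely for illustration):

```python
import math

ARCMINUTE_RAD = math.radians(1.0 / 60.0)  # naked-eye resolution from the text


def min_panel_size(distance_m, resolution_rad=ARCMINUTE_RAD):
    """Smallest panel width that still subtends resolution_rad at distance_m."""
    return distance_m * math.tan(resolution_rad)


def max_distance(panel_m, resolution_rad=ARCMINUTE_RAD):
    """Farthest distance at which a panel of width panel_m is still resolvable."""
    return panel_m / math.tan(resolution_rad)


# A 30 m panel sits right at the 1 arcminute limit at roughly 100 km.
print(f"{max_distance(30.0) / 1000:.0f} km")
```

The same function shows why a telescope helps so much: improving the effective resolution by a factor of N shrinks the required panel by the same factor.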
As an aside, a shutter mechanism with bright lights behind it is more visible, because we can see a light source whose angular size is smaller than our eye's resolution. You still need the lights separated far enough apart to resolve which light you are seeing.
Moving on, let's consider the coding.
While you can use a single light to transmit all of the information you need to, if you're relying on human perception your bandwidth is going to be quite low, since each signal has to last long enough to still be seen if the observer blinks. Having multiple signal bits is harder from an engineering perspective, but it greatly increases the bandwidth available... to a point. You need to keep the total number of signal panels low enough that a human can unambiguously recognize each valid combination of states in a short amount of time. I honestly can't tell you where the break point is, but I expect that you should be able to get upwards of 5 bits of data (32 symbols) into each frame without too many problems.
No, that's not 5 signal panels, that's 32 combinations of 6-7 panels. Yes, that leaves 50-75% of the possible combinations unused, but a lot of combinations are difficult to read accurately because they're too similar.
Consider a 3x2 array of signal panels. Single-bit combinations are mostly out, since they rely too much on the observer being able to figure out which particular bit is lit, and to do so quickly. So that's 6 possible combinations out of the way. Same deal with two adjacent bits - which two are they? There's another 7 bad ones we need to get rid of. And so on.
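Those two counts are easy to check by brute force. This sketch assumes the array is 3 columns wide by 2 rows high and that "adjacent" means sharing an edge (diagonals don't count), which is the reading that gives 7:

```python
from itertools import combinations

ROWS, COLS = 2, 3
cells = [(row, col) for row in range(ROWS) for col in range(COLS)]


def adjacent(a, b):
    """True when two cells share an edge (Manhattan distance of 1)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1


single_lit = len(cells)  # one pattern per individual panel
adjacent_pairs = [pair for pair in combinations(cells, 2) if adjacent(*pair)]

print(single_lit, len(adjacent_pairs))  # prints: 6 7
```

The 7 pairs break down as 4 horizontal neighbours (2 per row) plus 3 vertical ones (1 per column).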
So how many good combinations are there for a 3x2 array? Not as many as you might think. There are a couple of rules to consider:
- All combinations must have at least one light on in both rows.
- No combination may be shifted on the grid, which means:
  - No empty left column.
  - No empty right column.
Seems pretty simple. We can enumerate those fairly quickly (there's only 6 bits of data here after all). Here's the full list, split by common bottom row values:
☒☐☐ ☒☐☒ ☒☒☐ ☒☒☒
☐☐☒ ☐☐☒ ☐☐☒ ☐☐☒
☒☐☒ ☒☒☒
☐☒☐ ☐☒☐
☒☐☐ ☒☐☒ ☒☒☐ ☒☒☒
☐☒☒ ☐☒☒ ☐☒☒ ☐☒☒
☐☐☒ ☐☒☒ ☒☐☒ ☒☒☒
☒☐☐ ☒☐☐ ☒☐☐ ☒☐☐
☐☐☒ ☐☒☐ ☐☒☒ ☒☐☐ ☒☐☒ ☒☒☐ ☒☒☒
☒☐☒ ☒☐☒ ☒☐☒ ☒☐☒ ☒☐☒ ☒☐☒ ☒☐☒
☐☐☒ ☐☒☒ ☒☐☒ ☒☒☒
☒☒☐ ☒☒☐ ☒☒☐ ☒☒☐
☐☐☒ ☐☒☐ ☐☒☒ ☒☐☐ ☒☐☒ ☒☒☐ ☒☒☒
☒☒☒ ☒☒☒ ☒☒☒ ☒☒☒ ☒☒☒ ☒☒☒ ☒☒☒
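If you'd rather not trust my hand enumeration, the list above can be regenerated with a few lines of code. This is just a sketch of the rules as stated (at least one light per row, no empty left or right column), with ☒ taken to mean "lit"; the helper names are made up for the example.

```python
from itertools import product

COLS = 3


def is_valid(top, bottom):
    """Apply the readability rules to one 3x2 pattern (1 = lit, 0 = dark)."""
    if not any(top) or not any(bottom):
        return False  # every row needs at least one light
    if not (top[0] or bottom[0]):
        return False  # no empty left column
    if not (top[-1] or bottom[-1]):
        return False  # no empty right column
    return True


def render(row):
    """Draw one row using the same glyphs as the list above."""
    return "".join("☒" if lit else "☐" for lit in row)


rows = list(product((0, 1), repeat=COLS))
# Outer loop over the bottom row so patterns come out grouped like the list.
patterns = [(top, bottom) for bottom in rows for top in rows if is_valid(top, bottom)]

for top, bottom in patterns:
    print(render(top), "/", render(bottom))
print(len(patterns), "valid patterns")
```

Swapping in stricter rules (for example, also rejecting the adjacent-pair patterns) is just a matter of adding another check to `is_valid`, at the cost of dropping below 32 symbols.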
With just those two rules we've whittled it down to 32 viable combinations - 50% of the 64 possible combinations of 6 lights. With training I think an average human operator could recognize 2 of these symbols per second, maybe more if they're using a keyboard of some sort to enter the symbols. At 5 bits per symbol, that's upwards of 10 bits per second of bandwidth... which is slow, but far from useless.
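The bandwidth figure is just the information per symbol times the symbol rate. A tiny sketch, assuming every symbol is equally likely and that the 2 symbols per second guess holds (`bit_rate` is a name I made up for this example):

```python
import math


def bit_rate(num_symbols, symbols_per_second):
    """Raw bits per second for an alphabet of equally likely symbols."""
    return math.log2(num_symbols) * symbols_per_second


print(bit_rate(32, 2.0))  # prints: 10.0
```

Note that this is the raw rate. Any framing, error checking or repeat requests you layer on top will eat into it.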
To put this into perspective, highly skilled amateur enthusiasts can transcribe Morse code from audio (via keyboard) at about the same rate - 60 words per minute, or approximately 10 bits per second (assuming 5 characters per word, ~5 symbols per character, and factoring in inter-character and inter-word gaps). While there are people who can go higher, first-class licensing requirements are much, much lower: 25 WPM (~4 bps) for text.