Shalin Says...

I learned some important life lessons in the last year. Many of these are interrelated:

  • Work intentionally - It is too easy to live life on auto-pilot. Be thoughtful about your relationships, work, and learning.
  • Train the monkey before building the pedestal - We focus our attention on the easy parts, the things we know how to do. Work on hard problems first. In other words, be ambitious and optimize for growth.
  • Invert, always invert - Write down your anti-goals and either avoid them or ruthlessly de-prioritize time spent on them.
  • Build a learning loop - Learn, reflect, apply, evaluate. There’s no point reading if you are not learning!
  • Walk, exercise, sleep - A tested cure for mental blocks.

We have a new start in this new year. I wish you all well. Happy 2021!

Bangalore just got a new search meetup group, started by folks from eBay. This one is a bit more general than our existing Bangalore Solr/Lucene Meetup group, which, by the way, is close to 600 members and running strong!

Today was the first meetup, and I presented a session on the upcoming Solr 6 release, which has some juicy new features!

It is called “What’s cooking in Solr 6”. Hope you find it useful!


Someone asked me how to ensure that Solr is exposed exclusively on a server’s internal IP address so I thought this bit of information would be useful more generally.

On Linux, edit the solr.in.sh file, find the property called SOLR_HOST (it is commented out by default), and set its value to the IP address or host name on which you want Solr to listen for requests.

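For example, the relevant line in solr.in.sh might look like this; the address below is a placeholder for your server's internal IP:

```shell
# solr.in.sh -- bind Solr to a specific host/IP for inter-node communication.
# 10.0.0.5 is a placeholder; substitute your server's internal IP or host name.
SOLR_HOST="10.0.0.5"
```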

The procedure is similar on Windows, except that the file to be edited is solr.in.cmd.

By default, Solr auto-detects the IP address of the node on which it is running. If you are curious about how this works behind the scenes, take a look at the normalizeHostName(String host) method in ZkController.java.

Edit (15 September 2015) - It was pointed out to me later that setting SOLR_HOST is not enough, because the host/IP set by that property is only used by SolrCloud for making inter-shard requests. We also need to set the bind address used by Jetty, again in solr.in.sh or solr.in.cmd.


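A sketch of what that looks like in solr.in.sh; the solr.jetty.host system property name is my assumption based on the jetty.xml shipped with recent versions, so check the jetty.xml of your Solr release for the exact property it reads:

```shell
# solr.in.sh -- make Jetty itself bind only to the internal address.
# 10.0.0.5 is a placeholder. Assumption: your jetty.xml reads the
# solr.jetty.host property; older releases may use jetty.host instead.
SOLR_OPTS="$SOLR_OPTS -Dsolr.jetty.host=10.0.0.5"
```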

So yeah, I wrote a blog post after a long time, and no, it’s not this one but a mammoth post over at the Lucidworks blog called “Call me maybe: SolrCloud, Jepsen and flaky networks”.

Some interesting excerpts:

TL;DR
We tested SolrCloud against bridge, random transitive, and fixed transitive network partitions using Jepsen and found no data loss issues for both compare-and-set operations and inserts. One major and a few minor bugs were found. Most have been fixed in Solr 4.10.2 and others will be fixed soon. We’re working on writing better Jepsen tests for Solr and to integrate them into Solr’s build servers. This is a process and it’s not over yet.

What is Jepsen?

Jepsen is a tool that simulates network partitions and tests how distributed data stores behave under them. Put simply, Jepsen cuts off one or more nodes from talking to the other nodes in a cluster while continuing to insert, update, and look up data, both during the partition and after it heals, to find out whether the stores lose data, read inconsistent data, or become unavailable.

Why is this important? Because networks are unreliable. It’s not just the network; garbage collection pauses in the JVM, or heavy activity by a neighbour on a shared cloud server, can also manifest as slowdowns that are virtually indistinguishable from a network partition. Even in the best managed data centers, things go wrong: disks fail, switches malfunction, power supplies short out, RAM modules die, and a distributed system that runs at large scale should strive to work around such issues as much as possible.

We found some bugs as well:

SOLR-6530 – Commits under network partition can put any node into ‘down’ state

SOLR-6583 – Resuming connection with ZooKeeper causes log replay

SOLR-6610 – Slow cluster startup

Where’s the code?

All our changes to Jepsen are available in the solr-jepsen branch of our Jepsen fork on GitHub.

Conclusion?

While Solr may require a little extra work to set up ZooKeeper in the beginning, as you can see from these tests, this allows us to create a significantly more stable environment when it matters most: production. The Solr community made a conscious decision to trade a tiny bit of ease of getting started in exchange for a whole lot of “get finished”. This should result in significantly less data loss and more reliable operations in general.

What’s next?

  1. Integrate Jepsen with Solr’s build servers, get these tests running on each change
  2. Test harder; write more tests against more scenarios/topologies. Break Solr, then fix it again.

It’s been a lot of fun! Until next time.

Read more at the Lucidworks blog.

I recorded an episode of the SolrCluster podcast with Yann Yu about the same topic which you may also be interested in.


Shalin Says… turned 4 today!

I received this notification from tumblr today and I figured that it was an appropriate occasion to check in to this oft neglected blog. There are many things to say but too little time to say them.

The biggest item that I am working on right now is SOLR-5308 which allows users to split all documents with a particular route key into their own collection. So if you have a multi-tenant collection then this feature will allow you to migrate a tenant into their own collection seamlessly and without downtime. I’d appreciate help in the form of testing and suggestions on this feature. Please vote or watch the Jira issue if you think this is interesting or useful to you.
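As a sketch of how the feature might be invoked through the Collections API once it lands (the collection names and route key below are placeholders, and the parameter names could still change before release):

```shell
# Hypothetical example: migrate all documents with route key "tenant1!"
# from the multi-tenant collection "shared" into the pre-created
# collection "tenant1_coll", forwarding writes during the switch-over.
curl 'http://localhost:8983/solr/admin/collections?action=MIGRATE&collection=shared&split.key=tenant1!&target.collection=tenant1_coll&forward.timeout=60'
```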


The first Bangalore Lucene/Solr meetup was organized on Saturday, 8th June 2013, courtesy of the initiative of Anshum Gupta and Varun Thacker. Although I joined in as a co-organizer, honestly I did nothing except tweet about it and show up with some slides.

I must say that I was pleasantly surprised at the rate at which the group went from zero to a hundred members (it stands at 132 members as of writing this post). Our initial limit was fifty, but it was increased to seventy-five once the size of the venue was confirmed. Microsoft Accelerator was gracious enough to provide a conference room and refreshments for the attendees. 50+ people showed up, which is pretty good considering that the meetup schedule clashed with some other popular meetups. The agenda was dominated by presentations, but quite a bit of time was spent in Q&A.

Vinodh Kumar R (Head of BloomReach India) gave a talk on ranking models (adversarial vs implicit vs real-time news) applicable to different kinds of search applications.

Varun Thacker from Unbxd talked about Faceted search and result reordering in Solr focusing on e-commerce applications. He introduced term range facets, multi-select faceting and then delved into reordering documents using function queries and query elevation along with examples and use-cases for each.

Dikchant Sahi presented Apache Solr's DataImportHandler for indexing databases and XML files in Solr. He also gave a live demo of full and delta imports of a sample music data set into Solr.
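For readers unfamiliar with DataImportHandler, a minimal data-config.xml looks roughly like the following; the JDBC driver, connection URL, table, and field names are all placeholders for illustration:

```xml
<!-- Sketch of a data-config.xml for importing a music database into Solr. -->
<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/music" user="solr" password="secret"/>
  <document>
    <!-- Full imports run the query; delta imports use deltaQuery to find
         rows changed since the last import. -->
    <entity name="track" query="SELECT id, title, artist FROM tracks"
            deltaQuery="SELECT id FROM tracks WHERE updated_at &gt; '${dataimporter.last_index_time}'">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
      <field column="artist" name="artist"/>
    </entity>
  </document>
</dataConfig>
```

A full import is then triggered via the handler with command=full-import, and a delta import with command=delta-import.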

I gave a presentation on SolrCloud and Shard Splitting which is something that Anshum Gupta and I have been working on the past few months.
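Shard splitting is driven through the Collections API's SPLITSHARD action; a minimal invocation (the collection and shard names are placeholders) looks roughly like:

```shell
# Hypothetical example: split shard1 of collection "mycoll" into two
# sub-shards. Solr creates shard1_0 and shard1_1, each covering half of
# shard1's hash range, and then routes new documents to the sub-shards.
curl 'http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=mycoll&shard=shard1'
```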

Here are the slides that I presented:

SolrCloud and Shard Splitting from Shalin Mangar

At the start of my presentation, I took an informal poll of the audience to gauge their interest:

  1. Everyone in attendance was familiar with the projects
  2. Most of the attendees were already using Solr for search
  3. Everyone in attendance was using Solr 3.x and no one was on SolrCloud yet
  4. Almost everyone was evaluating, prototyping or testing SolrCloud

I met Umesh Prasad from Flipkart at the meetup, and we chatted quite a bit about Solr’s performance under heavy bulk re-indexing workloads and about accommodating large elevation and synonym files in SolrCloud. I’m happy to know that Flipkart uses Apache Solr for their excellent search. I also met a couple of search enthusiasts who have used Solr in the past and want to contribute back to the community.

All in all, I think it was a good first step towards establishing a strong Lucene/Solr community in Bangalore. I hope the next meetup allows more time for one-on-one interactions and focused conversations around search problems. It’d be nice to have more about Lucene in the next meetup too. A lot of people inquired about training on Apache Solr, so we may organize a workshop for the next meetup.

Drop me a line if you are interested in attending a Solr training in Bangalore. Also, if you’re in Bangalore and interested in Lucene/Solr or search in general, do join the Bangalore Lucene/Solr Meetup group.

I realize that it’s been a long time since I posted something here. About three years ago, I started working on projects at AOL, which did not have anything to do with Apache Solr. Increasingly I found myself having nothing to say publicly about my work. Though I did get back to working with Apache Solr for the AOL WebMail team, I didn’t resume blogging due to sheer laziness, I guess. (Yes, they use Solr! In fact, AOL WebMail just upgraded to Apache Solr 4.2.1 with impressive results!)

A lot has happened in the meantime. In December 2012, I joined the impressive team at LucidWorks, the Lucene/Solr company, to work (almost) full-time on open source search. In January this year, I found my life partner and got married.

Now that I work on Lucene/Solr again, I think it is time to resume blogging regularly. I intend to write about new features, tutorials, tips & tricks and perhaps also explain the internals of Lucene/Solr features in greater depth. Here’s to a new start!

This is the first release to bring the Lucene and Solr release versions in sync. There are numerous bug fixes, optimizations, and new features. Download from here.

I had been holding out on buying a more internet-friendly phone for some time now, waiting for 3G service to start in India. After my iPad experience, it was clear to me that I couldn’t be happy with an iPhone, but it was also obvious that none of the available Android phones were good enough.

Enter the new Samsung Galaxy S with Android 2.1, an awesome 4" display, light weight, and good (enough) battery life. A little market research showed that Samsung’s Super AMOLED technology was best in class, and that a better phone, the Samsung Galaxy S2, would only be available the next year. With 3G to be introduced (supposedly) around October, the stage was set. So, one fine July evening, I bought the Samsung Galaxy S.

Samsung Galaxy S

First of all, a message to people who are surprised upon hearing the price - the Samsung Galaxy S is not a phone; it is a pocket-sized computer that also happens to be a phone. And what have I been up to with this device? Here’s a list of ten things (in no particular order) that I have used it for:

  1. Get email alerts via K9 Mail, read blogs, check twitter, buzz and facebook feeds
  2. Take photographs and upload them to Facebook (darn you Airtel, I could never get MMS to work properly)
  3. Watch movies on my desktop monitor and phone through VLC Remote and Gmote.
  4. Stream music from GrooveShark - Yay! Freedom from syncing my music.
  5. Play Asphalt5 while sitting at the back of an autorickshaw - there’s something ironic about that.
  6. Use AIM to answer questions from our QA team while having lunch. Note to self: use a spoon next time if you want to use the phone.
  7. SMS while walking down the road without the fear of banging into an obstacle.
  8. Read a book on the Kindle app.
  9. Navigate to Kolar and back with Google Maps.
  10. Impress people with the live wallpapers, gesture search, and some not-so-useful tidbits such as a lie detector and a sky map.

Google integration means that all my contacts are backed up on their servers. I am still surprised that something as simple as backing up contacts has been so hard for phones until now. Another cool feature that I loved was that I could link phone numbers with Facebook contacts. Now when a friend calls, I see their Facebook photo automatically. The device has a decent battery life; with on-and-off wifi use, it lasts for a little more than a day. The device has a 5 megapixel camera and can record 720p video. And, by the way, Flash sites work too.

Before I bought the phone, I wasn’t sure how easy it would be to use a touch keyboard. Well, it wasn’t too hard with the default keyboard but ever since I switched to the Swype keyboard, I’m insanely fast. I’d recommend Swype to everybody. Good thing that Samsung provides this keyboard application for free.

There are a few downsides though. The phone feels sluggish when more than a couple of applications are running. The “Advanced Task Killer” application is essential for a decent experience. There is no camera flash, so one must depend on an external light source being available to avoid dark photos. Sometimes (and this happens rarely), after moving out of a wifi zone, it won’t automatically switch to a GPRS/EDGE connection unless I restart the phone.

All in all, I’m happy with the device and eagerly waiting for an Android 2.2 (froyo) upgrade and Airtel’s 3G service.

From the mailing list announcement:

Apache Solr 1.4.1 has been released and is now available for public
download!
http://www.apache.org/dyn/closer.cgi/lucene/solr/

Solr 1.4.1 is a bug fix release. See the change log for more details.