Sunday, October 7, 2018

I saw Brett Hazen speak on Elasticsearch at TCDNUG Thursday night.

As best as I could tell from listening to Brett talk, in Elasticsearch you have indexes that hold data. In version 6.4, the most recent, you can jam two different bits of data into an index, but that is going away in version 7 to make the indexes and the goodies they store a bit more dictionaryesque. Elasticsearch doesn't want you to jam a table full of stuff into an index. Elasticsearch makes up the E in the ELK stack, wherein the L is for Logstash, which puts data into Elasticsearch, and the K is for Kibana, which is sort of a UI for Elasticsearch that includes some charting for SSRS-flavored stuff.

Brett is working for General Mills doing big data work to help others determine what parts of their big data they should care about. Using Impala to crawl a data lake (a repository of NoSQL stored data) with Elasticsearch, Brett found it desirable to have autocomplete on a search field. There are analyzers to help with this, which have zero or more character filters, and there are four kinds of suggesters. The completion suggester works only on prefixes, using mathematical formulas wearing the "finite state transducer" moniker. What is a "finite state transducer" exactly? Brett said that he had looked into it and that it's just too much to understand. At some point things become magic, and if Brett can't explain it I'm honestly not going to go looking into it either. I am betting that I wouldn't get it either. The context suggester is an extension of the completion suggester. It allows for further filtering either by category or by geolocation. The term suggester lets you set a numeric value for how many characters can be off in a single word while still matching on an index. There is basically a one-character mix-up between viola and voilà, so if you type in voilà I think that you'd find viola assuming an "edit distance" of 1 (or greater), and then you'd be saying "Voilà, there's the viola." aloud in a happy tune.
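Brett's viola/voilà example is easy to check by hand. Below is a minimal Python sketch of an edit distance that counts swapping two adjacent letters as a single edit (the "optimal string alignment" variant of Damerau-Levenshtein). It is an illustration, not Elasticsearch's own implementation. The accent-folding step and the `suggest` helper are hypothetical; in Elasticsearch, similar folding would come from an analyzer that uses the `asciifolding` token filter.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Hypothetical helper: strip combining marks so 'à' compares equal to 'a'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insertions, deletions, substitutions,
    and swaps of two adjacent characters each cost 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete every character of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert every character of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Swapping two adjacent characters counts as a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def suggest(term: str, dictionary: list[str], max_edits: int = 1) -> list[str]:
    """Hypothetical term-suggester stand-in: words within max_edits after folding."""
    folded = fold_accents(term.lower())
    return [w for w in dictionary
            if osa_distance(folded, fold_accents(w.lower())) <= max_edits]


if __name__ == "__main__":
    words = ["viola", "violin", "cello"]
    print(osa_distance("voilà", "viola"))                # 2: the i/o swap plus à -> a
    print(osa_distance(fold_accents("voilà"), "viola"))  # 1: just the i/o swap
    print(suggest("voilà", words))                       # ['viola']
```

Two choices in this sketch decide whether "voilà" really is one edit away from "viola". Plain Levenshtein distance has no swap operation, so it would score "voila" against "viola" as 2 (two substitutions). Without accent folding, the à costs a second edit on top of the swap.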
Don't strum up an edit distance of 10 or 20 or you'll match on everything. Be careful how you play. The phrase suggester lets you specify a number of n-grams for matching against a specific word, and in this regard it really isn't much different from how the term suggester behaves. However, you may instead specify a number of shingles (runs of adjacent words) for matching against a phrase, and therein lies its distinction. Amazon offers Elasticsearch as a service. Azure doesn't, but it used to have "Azure Search," which just wrapped Elasticsearch. It may still have this. Elasticsearch, which is built on top of Lucene, was portrayed as superior to working with Lucene directly, which is less easy.

The Twin Cities .NET User Group meets at ILM Professional Services (wherein ILM stands for imagination, learning, mentoring), and after Brett Hazen spoke, William Austin demoed a product called OzCode, which is a new sponsor of TCDNUG. OzCode helps you debug in Visual Studio. If you are stopped at a breakpoint and there is a collection of thousands of things to iterate through, it will break the collection up into chunks for you. You may search across all chunks with a textbox. You may also pick properties of the object to reveal with the Reveal feature, and the revealed properties will appear as you browse the collection, before you drill into a specific object to look at its properties. Basically, you have a name for an object from the outside looking in. I guess this is a variant of this trick. What is more, when the current line at a breakpoint is highlighted in the familiar yellow, OzCode highlights the things that resolve to true in green and the things that resolve to false in red.
