Elasticsearch _routing for Parent-Child Documents


Parent-child and nested documents are among the most powerful features of Elasticsearch (ES), and a key reason it ranks higher than Solr. Unfortunately, there isn't enough documentation on design guidelines and usage. In this blog I explore the importance of defining "_routing" when working with multi-level child documents. When defining a parent-child relationship, it is important to consider the following.

1. For a search to return the desired results, parent and child documents must reside in the same shard. This is really the secret sauce of ES parent-child: the join between parent and child is evaluated inside each shard, so you simply can't have a parent and its children in shards that can end up on different nodes.

2. Elasticsearch automatically routes a child to the same shard as its parent, based on the parent's id, only when there is a single level of parent-child relationship.

If you create a relationship with a multi-level hierarchy (parent, child, grandchild and so on), you must define the routing yourself. Elasticsearch does not automatically ensure that more than one level of the hierarchy is stored in the same shard: by default a grandchild is routed by the id of its own parent (the child), which may hash to a different shard than the top-level parent. To solve this issue we need to use the special field "_routing".
The value of this field should match the unique identifier of the top-level parent document.
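
As a rough illustration of how such a three-level hierarchy might be declared, here is a minimal sketch of an index mapping. It assumes the ES 1.x mapping syntax (the same generation as the "filtered" query used later in this post) and the type names product, attributes and attrvalues that appear in the test query. The "not_analyzed" setting on catentry_id is my assumption, added so that a term query on an upper-case id can match. The script only prints the curl command so it can be reviewed before it is run.

```shell
ES="http://localhost:9200"   # same host as the other examples in this post
INDEX="parentchild"

# Mapping body for the hierarchy product > attributes > attrvalues.
# "_parent" names the type one level up.
# "_routing.required" rejects documents of that type that arrive without
# any routing value (neither "routing" nor "parent" on the request).
MAPPING='{
  "mappings": {
    "product": {
      "properties": {"catentry_id": {"type": "string", "index": "not_analyzed"}}
    },
    "attributes": {"_parent": {"type": "product"},    "_routing": {"required": true}},
    "attrvalues": {"_parent": {"type": "attributes"}, "_routing": {"required": true}}
  }
}'

# Print a ready-to-paste command instead of sending the request.
printf "curl -XPUT '%s/%s' -d '%s'\n" "$ES" "$INDEX" "$MAPPING"
```

Note that the mapping alone does not force the grandchild onto the top-level parent's shard; it only declares the relationships. The routing value itself still has to be supplied when each document is indexed.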

3. Another cool feature of ES parent-child is that you can load children independently of the parent: the parent does not have to exist when its child documents are loaded.

With _routing set to the root parent's id, every child and sub-child document of a root parent document is guaranteed to reside within the same shard.

4. To test that your related documents are ending up in the same shard, run the index status query after you load your first set of documents for a single parent:
 curl -XGET 'http://localhost:9200/parentchild/_status?pretty=1'

You should see all the documents loading into one particular shard. This can be validated by checking num_docs for each shard before and after loading the documents.

  docs: {
    num_docs: 2
    max_doc: 2
    deleted_docs: 0
  }

Here are a bad and a good example of the issues noticed with search when _routing is not defined.

1. Here is an example of the incorrect setup (without _routing defined).

Now run the test query. It first filters the immediate parents (attributes) by their leaf-level children (attrvalues), then applies that result to filter the top-level parent documents (product).

curl -XPOST 'http://localhost:9200/parentchild/_search?pretty=true' -d '{
    "query": {
        "filtered": {
            "query": {
                "bool": {
                    "must": [{"term": {"catentry_id" :"EI01Y" }}]
                }
            },
            "filter": {
                "has_child": {
                    "type": "attributes",
                    "query": {
                        "filtered": {
                            "query": {
                                "bool": {
                                    "must": [{"term": {"attribute_id": "0001"}}]
                                }
                            },
                            "filter": {
                                "has_child": {
                                    "type": "attrvalues",
                                    "query": {
                                        "term": {
                                           "stringvalue": "test"
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}'


2. Here is an example of the correct setup (with _routing defined).
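
The original listing for this case is not available here, so what follows is only a minimal sketch of what indexing with explicit routing might look like, not the exact commands used for this post. It assumes the ES 1.x index API, where "parent" and "routing" are passed as URL parameters, and it assumes the product's document id is the same as its catentry_id. The document ids A1 and V1 and the helper functions doc_url and put are hypothetical. The script prints the curl commands rather than sending them.

```shell
ES="http://localhost:9200"
INDEX="parentchild"
ROOT_ID="EI01Y"   # id of the top-level product, reused as routing for all descendants

# Build the index URL for one document. Every level gets routing=ROOT_ID;
# "parent" is appended only for child and grandchild documents.
doc_url() {  # usage: doc_url TYPE ID [PARENT_ID]
  url="$ES/$INDEX/$1/$2?routing=$ROOT_ID"
  if [ -n "$3" ]; then url="$url&parent=$3"; fi
  echo "$url"
}

# Print a ready-to-paste curl command rather than sending the request.
put() {  # usage: put URL JSON_BODY
  printf "curl -XPUT '%s' -d '%s'\n" "$1" "$2"
}

# Top-level parent, child, and grandchild (A1 and V1 are made-up ids).
put "$(doc_url product "$ROOT_ID")"          '{"catentry_id": "EI01Y"}'
put "$(doc_url attributes A1 "$ROOT_ID")"    '{"attribute_id": "0001"}'
put "$(doc_url attrvalues V1 A1)"            '{"stringvalue": "test"}'
```

The important part is the last command: the grandchild names A1 as its parent but is still routed by EI01Y. If the routing parameter were left off, ES would route it by A1 instead, which can land it in a different shard from the product.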


