Abstract

JSON array gives you flexibility in the type of elements, number of elements, size of the elements, and the depth of the elements. This adds to the flexibility of operational JSON databases like Couchbase and MongoDB. The performance of queries with an array predicate in operational databases depends on the array indexes. However, the array indexes in these databases come with significant limitations. E.g. only one array key is allowed per index. The array indexes, even when created, can only process AND predicates efficiently. The upcoming Couchbase 6.6 release removes these limitations by using a built-in inverted index to be used to index and query arrays in N1QL. This article explains the background and the workings of this novel implementation.

Introduction

An array is a basic type built into JSON defined as _An array is an ordered collection of values. An array begins with [left bracket and ends with ]right bracket. Values are separated by ,comma. _An array gives you flexibility because it can contain an arbitrary number of scalar, vector, and object values. A user profile can have an array of hobbies, a customer profile an array of cars, a member profile an array of friends. Couchbase N1QL provides a set of operators to manipulate arrays; MongoDB has a list of operators to handle arrays as well.

Before you start querying, you need to model your data in arrays. All JSON document databases like Couchbase, MongoDB recommend you to denormalize your data model to improve your performance and appdev. What that means is, transform your 1:N relationship into a single document by embedded the N into 1. In JSON, you’d do that by using an array. The example below, the document(1) contains 8 (N) likes. Instead of storing a foreign key reference to another table, in JSON, we store data inline.

JSON

1

"public_likes": [

2

    "Julius Tromp I",

3

    "Corrine Hilll",

4

    "Jaeden McKenzie",

5

    "Vallie Ryan",

6

    "Brian Kilback",

7

    "Lilian McLaughlin",

8

    "Ms. Moses Feeney",

9

    "Elnora Trantow"

10

],

Values here are arrays of strings. In JSON, each element can be any valid JSON type: scalars (numeric, string, etc), or objects or vectors (arrays). Each hotel document contains an array of reviews. This is the process of denormalization. Converting multiple 1:N relationships to a single hotel object that contains N public_likes and M reviews. With this, the hotel object embeds two arrays: public_likes and reviews. There can be any number of values of any type under these arrays. This is the key contributing factor to JSON Schema flexibility. When you need to add new likes or reviews, you simply add a new value or an object to this.

JSON

1

"reviews": [

2

    {

3

        "author": "Ozella Sipes",

4

        "content": "This was our 2nd trip here...",

5

        "date": "2013-06-22 18:33:50 +0300",

6

        "ratings": {

7

            "Cleanliness": 5,

8

            "Location": 4,

9

            "Overall": 4,

10

            "Rooms": 3,

11

            "Service": 5,

12

            "Value": 4

13

        }

14

    },

15

    {

16

        "author": "Barton Marks",

17

        "content": "We found the hotel ...",

18

        "date": "2015-03-02 19:56:13 +0300",

19

        "ratings": {

20

            "Business service (e.g., internet access)": 4,

21

            "Check in / front desk": 4,

22

            "Cleanliness": 4,

23

            "Location": 4,

24

            "Overall": 4,

25

            "Rooms": 3,

26

            "Service": 3,

27

            "Value": 5

28

        }

29

    }

30

],

Like the hotel object above, you denormalize your data model into JSON, there could be many arrays for each object. Profiles have arrays for hobbies, cars, credit cards, preferences, etc. each of these can be scalars (simple numeric/string/boolean values) or vectors (arrays of other scalars, arrays of objects, etc).

Once you model and store the data, you need to process it — select, join, project. Couchbase N1QL (SQL for JSON) provides an expressive language to do these and more. Here are common use cases.

SQL

1

1\. Find all the documents with a simple value can be done by either of the following queries.

2

3

SELECT *

4

FROM `travel-sample`

5

WHERE type = "hotel"

6

    AND ANY p IN public_likes SATISFIES p = "Vallie Ryan" END

7

8

9

SELECT t

10

FROM `travel-sample` t

11

      UNNEST t.public_likes AS p

12

WHERE t.type = "hotel"

13

    AND p = "Vallie Ryan"

SQL

1

2\. Find all the documents that match a range.  In this case, we try to find all the documents that have atleast one rating has “Overall” > 4

2

3

SELECT COUNT(1)

4

FROM `travel-sample`

5

WHERE type = "hotel"

6

    AND ANY r IN reviews SATISFIES r.ratings.Overall > 4 END

7

8

SELECT COUNT(1) 

9

FROM `travel-sample` t

10

UNNEST reviews AS r

11

WHERE t.type = "hotel"

12

    AND r.ratings.Overall > 4

13

GROUP BY t.type

SQL

1

3\. Find all the documents where every rating for  “Overall” > 4

2

3

SELECT * FROM `travel-sample` WHERE type = ‘hotel’ AND ANY AND EVERY r in reviews SATISFIES r.Overall > 4 END

4

5

SELECT COUNT(1)

6

FROM `travel-sample`

7

WHERE type = "hotel"

8

    AND ANY

9

    AND EVERY r IN reviews SATISFIES r.ratings.Overall > 4 END

10

11

SELECT reviews[*].ratings[*].Overall

12

FROM `travel-sample`

13

WHERE type = "hotel"

14

    AND ANY

15

    AND EVERY r IN reviews SATISFIES r.ratings.Overall > 4 END

16

limit 10;

17

18

    [

19

    {

20

        "Overall": [

21

            5

22

        ],

23

        "name": "The Bulls Head"

24

    },

25

    {

26

        "Overall": [

27

            5,

28

            5,

29

            5,

30

            5,

31

            5

32

        ],

33

        "name": "La Pradella"

34

    },

35

    {

36

        "Overall": [

37

            5,

38

            5,

39

            5

40

        ],

41

        "name": "Culloden House Hotel"

42

    },

43

    {

44

        "Overall": [

45

            5

46

        ],

47

        "name": "Auberge-Camping Bagatelle"

48

    },

49

    {

50

        "Overall": [

51

            5,

52

            5

53

        ],

54

        "name": "Avignon Hotel Monclar"

55

    }

56

    ]
Array Indexing:

Indexing arrays are a challenge for B-tree based indexes. However, the JSON database has to do it to meet the performance requirements: MongoDB does itCouchbase does it. However, both come with limitations. You can only have one array key within an index. This is true of MongoDB; this is true of Couchbase N1QL. The core reason for this limitation is when you index elements of an array, you need separate index entries.

Plain Text

1

Consider the array:

2

3

Document key: “bob”

4

{

5

   “Id”: “bob123”

6

   “A”: [1, 2, 3, 4]

7

   “B”: [521, 4892, 284]

8

}

9

10

Indexing of the field “id”  simply requires 1 entry in the index:   

11

         “bob123”:bob

12

13

Indexing of the field “a” requires 4 entries in the index: 

14

        “1”:”bob”, 2:”bob”, 3:”bob”, 4:”bob”

15

16

Indexing of the composite index (id, a) requires 4 entries:

17

   “bob123”, 1: bob

18

   “bob123”, 2: bob

19

   “bob123”, 3: bob

20

   “bob123”, 4: bob

21

22

Indexing of the composite index (id, a, b) requires the following 12 entries:

23

   “bob123”, 1, 521: bob

24

   “bob123”, 1,4982: bob

25

   “bob123”, 1, 284: bob

26

   “bob123”, 2, 521: bob

27

   “bob123”, 2,4982: bob

28

   “bob123”, 2, 284: bob

29

   “bob123”, 3, 521: bob

30

   “bob123”, 3,4982: bob

31

   “bob123”, 3, 284: bob

32

   “bob123”, 4, 521: bob

33

   “bob123”, 4,4982: bob

34

   “bob123”, 4, 284: bob

The size of the index grows exponentially as the number of array keys in the index and the number of array elements in the index. Hence the limitation. Implications of this limitation are:

  • Push only one array predicate to the index scan and handle other predicates after the index scan.
  • This means queries with multiple array predicates could be slow
  • Avoid composite indexes with array keys to avoid huge indexes.
  • This means queries with complex predicates on array keys will be slow

#nosql #performance #mongodb #json #couchbase #index #arrays #modeling and querying

Arrays in JSON: Modeling, Querying and Indexing Performance
3.95 GEEK