Datalookup documentation

The Datalookup library makes it easier to filter and manipulate your data. The module is inspired by the Django QuerySet API and its lookups.

Note

This documentation borrows some material from the Django documentation. All lookups have the same names as in Django, and some QuerySet method names are also the same. However, it is important to note that Datalookup is not Django. It is just a simple module for filtering deeply nested data.

Installation

$ pip install datalookup

Example

Throughout the examples below (and in the reference), we’ll refer to the following data, which comprises a list of authors with the books they wrote.

data = [
    {
        "id": 1,
        "author": "J. K. Rowling",
        "books": [
            {
                "name": "Harry Potter and the Chamber of Secrets",
                "genre": "Fantasy",
                "published": "1998",
                "sales": 77000000,
                "info": {
                    "pages": 251,
                    "language": "English"
                }
            },
            {
                "name": "Harry Potter and the Prisoner of Azkaban",
                "genre": "Fantasy",
                "published": "1999",
                "sales": 65000000,
                "info": {
                    "pages": 317,
                    "language": "English"
                }
            }
        ],
        "genres": [
            "Fantasy",
            "Drama",
            "Crime fiction"
        ]
    },
    {
        "id": 2,
        "author": "Agatha Christie",
        "books": [
            {
                "name": "And Then There Were None",
                "genre": "Mystery",
                "published": "1939",
                "sales": 100000000,
                "info": {
                    "pages": 272,
                    "language": "English"
                }
            }
        ],
        "genres": [
            "Murder mystery",
            "Detective story",
            "Crime fiction",
            "Thriller"
        ]
    }
]

Datalookup makes it easy to find an author by calling one of the methods of the Dataset class like filter() or exclude(). There are multiple ways to retrieve an author.

Basic filtering

Use one of the fields of your author dictionary to filter your data.

Note

Datalookup considers a dictionary to be a Node. Each key of the dictionary is converted to a Field, so each dictionary can contain one or multiple fields. Some fields are ValueFields and others are RelatedFields. These fields are what make it possible to filter your dataset.

from datalookup import Dataset

# Use Dataset to manipulate and filter your data. The following examples
# assume that this line has already been run.
books = Dataset(data)
assert len(books) == 2

# Retrieve an author using one of the fields of the author,
# such as 'id' or 'author'.
authors = books.filter(author="J. K. Rowling")
assert len(authors) == 1
assert authors[0].author == "J. K. Rowling"

AND, OR - filtering

Keyword argument queries - in filter(), etc. - are “AND”ed together. If you need to execute more complex queries (for example, queries with OR statements), you can combine two filter requests with “|”.

# Retrieve an author using multiple filters in a single request (AND). This
# filter uses the '__icontains' lookup, which is the same as '__contains' but
# case-insensitive.
authors = books.filter(books__name__icontains="and", books__genre="Fantasy")
assert len(authors) == 1
assert authors[0].author == "J. K. Rowling"

# Retrieve an author by combining filters (OR)
authors = books.filter(author="Stephane Capponi") | books.filter(
    author="J. K. Rowling"
)
assert len(authors) == 1
assert authors[0].author == "J. K. Rowling"
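For readers who think in plain Python, the two queries above correspond roughly to the comprehensions below. This is only an illustration of the AND/OR semantics on a trimmed copy of the sample data. It does not use Datalookup and is not how the library is implemented. It also assumes that each keyword condition may be satisfied by any of the author's books, which agrees with the result shown above but is not confirmed beyond it.

```python
# Trimmed copy of the sample data: only the keys used below.
sample = [
    {
        "author": "J. K. Rowling",
        "books": [
            {"name": "Harry Potter and the Chamber of Secrets", "genre": "Fantasy"},
            {"name": "Harry Potter and the Prisoner of Azkaban", "genre": "Fantasy"},
        ],
    },
    {
        "author": "Agatha Christie",
        "books": [{"name": "And Then There Were None", "genre": "Mystery"}],
    },
]

# AND: an author is kept only if every condition holds.
# Assumption: each condition is checked against any of the author's books.
and_result = [
    a for a in sample
    if any("and" in b["name"].lower() for b in a["books"])
    and any(b["genre"] == "Fantasy" for b in a["books"])
]

# OR: an author is kept if at least one of the two conditions holds.
or_result = [
    a for a in sample
    if a["author"] == "Stephane Capponi" or a["author"] == "J. K. Rowling"
]
```

Both lists end up containing only the J. K. Rowling entry, matching the assertions in the Datalookup examples.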

Cascade filtering

Sometimes you will want to filter the author but also the related books. It is possible to do that by calling the on_cascade() method before filtering.

# Filter the author but also the books of the author
authors = books.on_cascade().filter(
    books__name="Harry Potter and the Chamber of Secrets"
)
assert len(authors) == 1
assert authors[0].author == "J. K. Rowling"

# The books are also filtered
assert len(authors[0].books) == 1
assert authors[0].books[0].name == "Harry Potter and the Chamber of Secrets"
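The effect of cascade filtering can be pictured with a small plain-Python sketch. The helper name cascade_filter_by_book_name is hypothetical and the code is only an illustration of the behaviour described above, not Datalookup's implementation: authors without a matching book are dropped, and the surviving authors keep only their matching books.

```python
# Trimmed copy of the sample data: only the keys used below.
sample = [
    {
        "author": "J. K. Rowling",
        "books": [
            {"name": "Harry Potter and the Chamber of Secrets"},
            {"name": "Harry Potter and the Prisoner of Azkaban"},
        ],
    },
    {"author": "Agatha Christie", "books": [{"name": "And Then There Were None"}]},
]


def cascade_filter_by_book_name(authors, name):
    """Keep authors with a matching book, and keep only the matching books."""
    result = []
    for author in authors:
        matching = [dict(b) for b in author["books"] if b["name"] == name]
        if matching:
            # Build a new author record so the input data is left untouched.
            result.append({**author, "books": matching})
    return result


cascaded = cascade_filter_by_book_name(
    sample, "Harry Potter and the Chamber of Secrets"
)
```

Without the cascade step, the matching author would still carry both books. With it, only the matching book remains.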

Lookup filtering

You might have seen in the previous examples the use of lookups to retrieve authors or books. Here are a couple more examples:

# Use of the '__contains' lookup to look into the 'genres' field
authors = books.filter(genres__contains="Fantasy")

# Use of the '__gt' lookup to get all authors that wrote a book with
# more than 'X' pages
authors = books.filter(books__info__pages__gt=280)

# Same as above but with '__range'. Find authors that wrote a book
# with a number of pages between 'X' and 'Y'
authors = books.filter(books__info__pages__range=(250, 350))
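A keyword such as books__info__pages__gt names a path through nested dictionaries and lists. The sketch below shows one way to follow such a path in plain Python. The collect helper is hypothetical, and the "any book matches" rule is an assumption made for the illustration rather than a documented guarantee of the library.

```python
# Trimmed copy of the sample data: only the keys used below.
sample = [
    {
        "author": "J. K. Rowling",
        "books": [{"info": {"pages": 251}}, {"info": {"pages": 317}}],
    },
    {"author": "Agatha Christie", "books": [{"info": {"pages": 272}}]},
]


def collect(node, path):
    """Return every value reached by following 'path' through dicts and lists."""
    if not path:
        return [node]
    if isinstance(node, list):
        # Fan out over list items, keeping the same remaining path.
        return [value for item in node for value in collect(item, path)]
    if isinstance(node, dict) and path[0] in node:
        return collect(node[path[0]], path[1:])
    return []


pages = collect(sample[0], ["books", "info", "pages"])

# Assumption: an author matches when any of their books satisfies the test.
long_book_authors = [
    a["author"]
    for a in sample
    if any(p > 280 for p in collect(a, ["books", "info", "pages"]))
]
```

Here pages holds the page counts of both Rowling books, and only Rowling has a book longer than 280 pages.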

Dataset API

Here’s the formal declaration of a Dataset:

class Dataset(data: Union[dict, list])

A Dataset is the entry point for manipulating and filtering your data. Usually, when you interact with a Dataset, you’ll use it by chaining filters. To make this work, most methods return a new Dataset. These methods are covered in detail later in this section.

Note

A Dataset will also accept a single dictionary for the data parameter. Bear in mind that if you call values() on this kind of dataset, it will still return a list of dictionaries.
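One way to picture this note is a normalisation step that wraps a single dictionary in a list. The as_records helper below is hypothetical and written in plain Python. It only illustrates why a list comes back and makes no claim about what Datalookup does internally.

```python
def as_records(data):
    """Wrap a single dictionary in a list; copy a list as-is."""
    return [data] if isinstance(data, dict) else list(data)


single = as_records({"id": 1, "author": "J. K. Rowling"})
several = as_records([{"id": 1}, {"id": 2}])
```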

Class methods

from_json()

Dataset.from_json(file: str)

Return a Dataset from a json file.


from_nodes()

Dataset.from_nodes(nodes: list[Node])

Return a Dataset from a list of Nodes. This is mostly used internally.

Methods that return new Dataset

filter()

Dataset.filter(**kwargs)

Returns a new Dataset containing objects that match the given filter parameters.

The filter parameters (**kwargs) should be in the format described in the Field lookups below. Multiple parameters will be ANDed together.



exclude()

Dataset.exclude(**kwargs)

Returns a new Dataset containing objects that do not match the given filter parameters.

The filter parameters (**kwargs) should be in the format described in the Field lookups below. Multiple parameters will be ANDed together.


distinct()

Dataset.distinct()

Returns a new Dataset without duplicate entries.


values()

Dataset.values()

Returns a list of dictionaries instead of a Dataset.


on_cascade()

Dataset.on_cascade()

Must be followed by filter(), exclude() or another filtering method (as in books.on_cascade().filter(…)). This method filters not only the current dataset but also the datasets of its related fields. Example:

# Filter the author but also the books of the author
authors = books.on_cascade().filter(
    books__name="Harry Potter and the Chamber of Secrets"
)
assert len(authors) == 1
assert authors[0].author == "J. K. Rowling"

# The books are also filtered
assert len(authors[0].books) == 1
assert authors[0].books[0].name == "Harry Potter and the Chamber of Secrets"

Field lookups

Field lookups specify how the dataset should query the results it returns. They’re specified as keyword arguments to the Dataset methods filter() and exclude(). Basic lookup keyword arguments take the form “field__lookuptype=value”. (That’s a double underscore.)

As a convenience, when no lookup type is provided (as in books.filter(id=1)), the lookup type is assumed to be exact.
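The “field__lookuptype” convention can be illustrated with a short parser. This is a sketch under stated assumptions: split_lookup and LOOKUPS are hypothetical names, the set simply lists the lookup names documented on this page, and the library's real parsing may differ (for example in how it treats a field whose name happens to equal a lookup name).

```python
# Lookup names documented in the "Field lookups" and "ArrayFields lookups" sections.
LOOKUPS = {
    "exact", "iexact", "contains", "icontains", "in",
    "gt", "gte", "lt", "lte",
    "startswith", "istartswith", "endswith", "iendswith",
    "range", "isnull", "regex", "iregex",
    "contained_by", "overlap", "len",
}


def split_lookup(keyword):
    """Split 'a__b__gt' into (['a', 'b'], 'gt'); default the lookup to 'exact'."""
    parts = keyword.split("__")
    if len(parts) > 1 and parts[-1] in LOOKUPS:
        return parts[:-1], parts[-1]
    # No trailing lookup name: every part belongs to the field path.
    return parts, "exact"


plain = split_lookup("id")
nested = split_lookup("books__info__pages__gt")
implicit = split_lookup("books__name")
```

The first and third calls fall back to exact, while the second separates the nested path books, info, pages from the gt lookup.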

exact

Exact match.

Example:

books.filter(id__exact=1)

iexact

Case-insensitive exact match.

Example:

books.filter(author__iexact='j. k. rowling')

contains

Case-sensitive containment test. The value can also be a list.

Example:

books.filter(books__name__contains='And')
books.filter(books__name__contains=['And', 'Potter'])

icontains

Case-insensitive containment test. The value can also be a list.

Example:

books.filter(books__name__icontains='and')
books.filter(books__name__icontains=['and', 'potter'])

in

In a given iterable; often a list, tuple, or dataset. It’s not a common use case, but strings (being iterables) are accepted.

Examples:

books.filter(id__in=[1, 3, 4])
books.filter(author__in='abc')

gt

Greater than.

Example:

books.filter(id__gt=1)

gte

Greater than or equal to.


lt

Less than.


lte

Less than or equal to.
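The gte, lt and lte lookups have no example of their own; presumably they are used in the same way as gt (for instance id__gte=1). The plain-Python sketch below mirrors the four comparison lookups with the standard operator module and applies them to the id values of the sample data. The COMPARISONS mapping is illustrative and not part of Datalookup.

```python
import operator

# Plain-Python counterparts of the four comparison lookups.
COMPARISONS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}

ids = [1, 2]  # the 'id' values in the sample data

# For each lookup, the ids that compare successfully against the value 1.
matches = {
    name: [i for i in ids if compare(i, 1)]
    for name, compare in COMPARISONS.items()
}
```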


startswith

Case-sensitive starts-with.

Example:

books.filter(author__startswith='J.')

istartswith

Case-insensitive starts-with.

Example:

books.filter(author__istartswith='j.')

endswith

Case-sensitive ends-with.

Example:

books.filter(books__name__endswith='Azkaban')

iendswith

Case-insensitive ends-with.

Example:

books.filter(books__name__iendswith='azkaban')

range

Range test (inclusive).

Example:

books.filter(books__info__pages__range=(250, 350))

isnull

Takes either True or False. True matches fields whose value is None; False matches fields whose value is not None.

Example:

books.filter(books__sales__isnull=True)

regex

Case-sensitive regular expression match. The regular expression syntax is that of Python’s re module.

Example:

books.filter(author__regex=r'.*Row.*')

iregex

Case-insensitive regular expression match.

Example:

books.filter(author__iregex=r'.*row.*')

ArrayFields lookups

There are special lookups for an ArrayField, such as:

"genres": [
    "Fantasy",
    "Drama",
    "Crime fiction"
]

contained_by

This is the opposite of the contains lookup - the objects returned will be those where the data is a subset of the values passed. For example:

authors = books.filter(
    genres__contained_by=["Fantasy", "Drama", "Crime fiction"]
)
assert len(authors) == 1

overlap

Returns objects where the data shares any results with the values passed.

authors = books.filter(genres__overlap=["Fantasy"])
assert len(authors) == 1

authors = books.filter(genres__overlap=["Fantasy", "Thriller"])
assert len(authors) == 2

len

Matches on the length of the array. For example:

authors = books.filter(genres__len=3)
assert len(authors) == 1
assert authors[0].author == "J. K. Rowling"
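The three array lookups can be read as set and length tests. The sketch below reproduces the results of the examples above in plain Python on the genres of the sample data. The helper names are hypothetical, and treating the arrays as sets (ignoring order and duplicates) is an assumption made for the illustration.

```python
genres_by_author = {
    "J. K. Rowling": ["Fantasy", "Drama", "Crime fiction"],
    "Agatha Christie": [
        "Murder mystery", "Detective story", "Crime fiction", "Thriller",
    ],
}


def contained_by(data, values):
    """True when every item of the data appears in the values passed."""
    return set(data) <= set(values)


def overlap(data, values):
    """True when the data shares at least one item with the values passed."""
    return bool(set(data) & set(values))


def select(predicate):
    """Names of the authors whose genres satisfy the predicate."""
    return [name for name, genres in genres_by_author.items() if predicate(genres)]


subset_authors = select(
    lambda g: contained_by(g, ["Fantasy", "Drama", "Crime fiction"])
)
fantasy_authors = select(lambda g: overlap(g, ["Fantasy"]))
either_authors = select(lambda g: overlap(g, ["Fantasy", "Thriller"]))
three_genre_authors = select(lambda g: len(g) == 3)
```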

Node class

Here’s the formal declaration of a Node:

class Node(data: dict)

A Node represents a dictionary where the value of each key is a specific Field. There are currently two categories of fields: ValueField and RelatedField. The first is used for int, float and str values, and the second for dictionaries and lists of dictionaries.

Thanks to these fields, it is possible to filter and find every node that matches the user’s query.
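The mapping from values to field categories can be sketched as a small classifier. This is an illustration only: classify_field is a hypothetical helper that returns category names as strings, the "ArrayField" branch for other lists follows the ArrayFields lookups section, and how the library treats types not mentioned in this documentation (such as None) is unknown, so they are reported as "unknown" here.

```python
def classify_field(value):
    """Name the field category that a dictionary value would fall into."""
    if isinstance(value, dict):
        return "RelatedField"
    if isinstance(value, list):
        # A non-empty list made only of dictionaries is a related field.
        if value and all(isinstance(item, dict) for item in value):
            return "RelatedField"
        # Assumption: any other list is handled as an array of plain values.
        return "ArrayField"
    if isinstance(value, (int, float, str)):
        return "ValueField"
    return "unknown"


author = {
    "id": 1,
    "author": "J. K. Rowling",
    "books": [{"name": "Harry Potter and the Chamber of Secrets"}],
    "genres": ["Fantasy", "Drama", "Crime fiction"],
}

categories = {key: classify_field(value) for key, value in author.items()}
```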

Methods

filter()

Node.filter(**kwargs)

Returns the current Node or raises an ObjectNotFound exception.

The filter parameters (**kwargs) should be in the format described in the Field lookups section. Multiple parameters will be ANDed together.


values()

Node.values()

Returns a dictionary, with the keys corresponding to the field names of the node object.