Pythonic Implementations of Trie (Prefix Tree)
This post proposes a Pythonic implementation of Trie data structure, with insert
, search
, and starts_with
methods. With careful design, the Pythonic improvements reduced the lines of code from 27 lines to 8 lines.
class Trie(dict): |
See this LeetCode problem for more info.
Basic Implementation
The most basic implementation has separated TrieNode
and Trie
classes, and marks the end of the word with the is_end_of_word
attribute in TrieNode
.
class TrieNode: |
Improved
There are multiple improvements we can make to this basic implementation.
- We can mark the end of word by inserting a character out of the alphabet into the dictionary.
- If end-of-word marker is used, searching a full word is becomes to searching the word with end-of-word marker suffixed.
- We can merge the
Trie
andTrieNode
classes into one class, since each of theTrieNode
is also the root of a "sub-trie". - We can simplify the searching and inserting code with higher-order functions such as
reduce
,all
, andany
. - Since each trie is a
dict
, we can makeTrie
extendsdict
.
class Trie(dict): |
Now let's go into details of the implementation.
insert
reduce(lambda node, c: node.setdefault(c, {}), word, self)['$'] = True |
When inserting a word into the trie, we start from the root, iterate over the word and insert the characters into the dictionary at each level. This can be done in a Pythonic functional programming style using reduce
. We leverage dict.setdefault
to create a new level of trie for unseen word sequences before.
The final return value of reduce
is the leave of trie. By inserting {'$': True}
in this dictionary, we marked it as a full word.
The whole inserting process takes only one line, isn't it beautiful?
starts_with
node = self |
Let's go over starts_with
first since search
relies on this method. Recall that in Python, many built-in data structures have default boolean value, which is True
when they are non-empty. We can utilize this property to convert the searching process into a series of boolean and
operations.
To avoid the redundant declarative style, we leverage the walrus operator :=
to update the node as we iterate. When a character isn't found, the expression is short-circuited and False
is returned immediately.
search
return self.starts_with(word + '$') |
With the use of end-of-word marker, searching 'word'
is now equivalent to searching a word that starts with 'word$'
!
Bonus: Implementation of WordDictionary
https://leetcode.com/problems/design-add-and-search-words-data-structure/
class WordDictionary(dict): |