Tutorial

BitArray in Python

14 min read

In this article, we’ll explore how to work with bitarray in Python using the bitarray library.

Computers store data in the form of bits.

But what exactly are bits?
Bits hold binary values: 0 or 1.

We can easily make use of a specialized data structure called a bit array to make it easy to manipulate collections of bits. Harnessing the power of individual bits often requires specialized data structures, like bit arrays.

First understand what are bits?

The term “bit” is a portmanteau of “binary digit“.

In the binary system, digits are used in base 2, rather than the base 10 system we use in everyday life.

This means that a bit, which is the smallest unit of data in a computer, can have only one of two values: 0 or 1.

These two values can also represent other binary opposites such as true/false, on/off, yes/no, and so on. The choice of what each value represents is arbitrary and can be chosen to suit the situation.

Computer information is stored in the form of bits.

For example, a text character is typically stored as a sequence of 8 bits (known as a byte), with each bit representing a binary value.

In terms of data storage and transmission, bits are often grouped into larger units such as bytes (8 bits), kilobytes (1024 bytes), megabytes (1024 kilobytes), and so on.

Thus, a single bit represents the most basic unit of data a computer can process. However, when we need to manage numerous bits efficiently, bit arrays come into play.

What are BitArrays in Python?

Bit arrays (also known as bitmaps, bit sets, bit strings, or bit vectors), provide an efficient way to store sequences of bits.

Rather than using traditional data types like integers or strings, bit arrays allow for compact representation, especially when dealing with large datasets or sets of boolean values.

The Python library, `bitarray`, allows easy manipulation of bit arrays in Python.

In the following sections, we will see how to install, and make use of bitarray library.

How to use Python bitarray library?

While Python offers us built-in collection data structures like lists, the `bitarray` library provides the specific functionalities for working with bit arrays.

It makes it easy to create, manipulate, and perform other operations on bit arrays.

To make use of bit arrays in Python, simply install the library using pip:

pip install bitarray

In addition, conda packages are available (both the default Anaconda repository as well as conda-forge support bitarray):

conda install bitarray

Now that we have the bitarray library installed, we can now start exploring how to work with bit arrays in Python.

Getting started with BitArrays in Python

The bitarray library allows us to create and manipulate bit arrays.

In the sections that follow, we will explore-

  1. How to create bit arrays
  2. Perform logical operations on bitarrays
  3. Slice bitarrays
  4. Concatenate bit arrays
  5. count the vales in bitarrays
  6. Perform search within a bit array

So, let’s get started!

Create BitArrays in Python

To make use of a bit array, we have to create it first.

The code below shows how we can create bit arrays in code:

from bitarray import bitarray

empty_array = bitarray()  # Creates an empty bitarray
print("empty array:",empty_array)

bit_array = bitarray(10)  # Creates a bitarray of size 10
print("bit array:", bit_array)

Our code example creates two bit array instances.

The first one, empty_array, is an empty bit array.

We used the bitarray() constructor without any arguments to create an empty bit array.

We might need such bit arrays when we do not know in advance how many values we will have.

If we know in advance how many bits we need to store, we can specify it while constructing the bitarray instance.

To do this, we pass the desired size as an argument to the bitarray() constructor.

This is what our code did when creating the second bitarray instance. In this example, bit_array is an instance of the bitarray class with a size of 10.

output for creating bitarrays

When a bit array is created with a fixed size, all bits are initially set to 0 (as seen in the output).

As we will discuss later, we can modify the individual bits in the array using indexing.

Initializing bit arrays using setall method

bitarray has a method called setall.

This method allows you to set all bits in a bit array to either 1 or 0.

This can be particularly useful when you want to initialize a bit array with a specific value.

Here’s how you can use the setall method:

from bitarray import bitarray
bit_array1 = bitarray(10)
bit_array2 = bitarray(10)

bit_array1.setall(0)
bit_array2.setall(1)

print("bit_array1:", bit_array1)
print("bit_array2:", bit_array2)

The code snippet defines two variables, bit_array1 and bit_array2, both having the same size of 10 bits.

The setall method takes a single argument, which is the value to be set for all bits. bit_array1 has all bits set to False (or 0), and bit_array2 is set to all True (or 1).

output for initializing bitarray with setall

This method modifies the bit array in-place, meaning that it doesn’t return a new bit array, but instead changes the values of the bits in the existing bit array.

Initializing bit arrays from list

Another way to initialize a bit array is to use a list of bits, or booleans.

This can be particularly useful when you have a sequence of boolean values that you want to store efficiently.

See the example in the code snippet below:

from bitarray import bitarray

bit_array1 = bitarray([1, 0, 1])
bit_array2 = bitarray([True, False, True])

print("bit_array1:", bit_array1)
print("bit_array2:", bit_array2)

If we run the code, we see from the output that both bit_array1 and bit_array2 have the same contents:

output for initializing bitarray with a list of bits or booleans

In this example, bit_array is a bitarray that is initialized with the boolean values True, False, and True.

These boolean values are represented as bits in the bitarray, with True represented as 1 and False represented as 0.

This method of initializing a bit array is very intuitive, as you simply provide a list of the bit values you need.

The `bitarray` constructor will then create a bitarray with the corresponding bit values.

Note:  The order of the bit, or boolean, values in the list corresponds to the order of the bits in the bitarray. That is, the first value in the list corresponds to the first bit in the bitarray, the second boolean value corresponds to the second bit, and so on.

Initializing bit arrays from binary string

Another interesting way in which we can initialize a bitarray is by supplying a binary string.

This can be particularly useful when we have a binary representation of data that we want to store.

Here’s how we can initialize a bit array from a binary string:

from bitarray import bitarray

str_array = bitarray('101010')  # Initializes from binary string

print(str_array)

In this example, the bitarray instance is initialized with a string of bits, or a binary string, 101010.

When the code prints out str_array, we get the following output:

output for initializing bitarray with binary string

The constructor interpretes the string as a sequence of bits., with 1 representing True and 0 representing False.

Like the case of using a list to initialize a bit array, the order of the bits in the string also sorresponds to the order of the bits in the bitarray.

Adding more bits to bit arrays

It is also possible to add more values to a bitarray.

We can do so using either the append, extend, or insert methods.

See the code for how to add bits using these methods.

from bitarray import bitarray

bit_array = bitarray('1001') 
print("Initial bit array:", bit_array)

# Append values True, True
bit_array.append(True)
bit_array.append(True)
print("bit array after appending 'True, True':", bit_array)

# Extend bitarray with string '1100'
bit_array.extend('1100')
print("bit array after extending with '1100':", bit_array)

# Extend bitarray with list [True, False, True]
bit_array.extend([True, False, True])
print("bit array after extending with the list '[True, False, True]':", bit_array)

# Insert Boolean value True
bit_array.insert(0, True)
print("bit array after inserting 'True' at the first position:", bit_array)

# Insert bit value 0
bit_array.insert(1, 0)
print("bit array after inserting '1' at the second position:", bit_array)

Our code has a series of calls to print so show what happens when each method is called.

The printouts are shown below:

output for appending, extending, and inserting values into a bit array

bit_array.append expects a single parameter; the bit to append.

Then it appends the bit to the end of the bit_array.

With bit_array.extend, we can append a binary string, or a list of bits to an existing bit array.
All we need to do is to specify the value we want to use in extending the existing bit array.

Finally, the method insert requires an argument for the position to insert in the bit array, and another argument for the bit to insert.

Any method can be selected based on the needs of the code we are writing.

Concatenating bit arrays into one

Concatenation merges two or more bit arrays to get a single, larger array.

Concatenation in bit arrays is straightforward and follows the same principles as concatenating strings or lists in Python.

from bitarray import bitarray

# Initialize bit arrays
bit_array1 = bitarray('1010')
bit_array2 = bitarray('0101')

# Perform concatenation
combined_array = bit_array1 + bit_array2

print(combined_array)

The printout for the code execution is shown below:

output for concatenating two bitarrays

From the output, we can see that the two bitarrays have been joined end-to-end to form a new bitarray.

Accessing values from a bit array

Accessing values from a bit array is similar to list indexing.

We just need to specify the index within square brackets as seen in the following code:

from bitarray import bitarray

# Create a bitarray
bit_array = bitarray('1100')

# Get the first and last values
first_value, last_value = bit_array[0], bit_array[-1]

print("The first value is:", first_value)
print("The last value is:", last_value)

From our code example, to access the first bit of bit_array we indexed bit_array at position 0.

Accessing the last bit in bit_array was done in a way similar to accessing the last item in a Python list; we indexed bit_array at position -1.

See the output below for the printout of our Python script.
output for getting values from a bitarray

With this indexing capability, we can also modify bits in specific positions, just like in the following code:

from bitarray import bitarray

# Create a bitarray
bit_array = bitarray('11101100')

# Set the value at index 3 and 6 to True
bit_array[3] = 1
bit_array[6] = 1

print("The new bit array is:", bit_array)

The printout shows that we successfully modified the bits at indexes `3` and `6`.

output for setting values in a bitarray

To access multiple bits at once, we can slice the bit array just as we would with a list.

Slicing allows us to access a range of bits in the bitarray.

Here’s an example:

from bitarray import bitarray

# Initialize bit array
bit_array = bitarray('1010111000')

# Perform slicing
subset = bit_array[2:8]

print(subset)

In this example, we used slicing (`[start:stop]`), which returns all elements from the `start`-th element up to but not including the `stop`-th element. In our case, the sart index was `2`, while the stop index was at position `8`.

The output below shows the result of our slicing.

output for slicing a bitarray to get a range of bits

Logical operations on bitarrays

The bitarray supports a number of bitwise logical operations.

Here’s a brief overview of these operations:

  • Bitwise AND (&): Returns 1 if both the bits are 1, else it returns 0.
  • Bitwise OR (|): Returns 1 if either of the bit is 1, else it returns 0.
  • Bitwise NOT (~): Returns 0 if the bit is 1, or 1 if the bit is 0.
  • Bitwise XOR (^): Returns 1 if one of the bits is 1 and the other is 0, else it returns 0.
  • Bitwise right shift (>>): Returns the bit array (on the left of the operator) shifted to the right by the quantity specified (on the right of the operator), and fills 0 in the voids left as a result.
  • Bitwise left shift (<<): Returns the bit array (on the left of the operator) shifted to the left by the quantity specified (on the right of the operator), and fills 0 in the voids left as a result.

Here’s a code example of how you can use these operations:

from bitarray import bitarray

# Creating bitarrays
bit_array1 = bitarray([False, False, True, True])
bit_array2 = bitarray([False, True, False, True])

# Logical operators
bit_and = bit_array1 & bit_array2   # Bitwise AND
bit_or = bit_array1 | bit_array2    # Bitwise OR
bit_not = ~bit_array1               # Bitwise NOT
bit_xor = bit_array1 ^ bit_array2   # Bitwise XOR

# Shift operators
bit_rshift = bit_array1 >> 1        # Bitwise right shift
bit_lshift = bit_array1 << 2        # Bitwise left shift

# Print the results
print("AND: ", bit_and)
print("OR: ", bit_or)
print("NOT: ", bit_not)
print("XOR: ", bit_xor)
print(f"Right Shift of {bit_array1}: ", bit_rshift)
print(f"Left Shift of {bit_array1}: ", bit_lshift)

In this code snippet, bit_array1 and bit_array2 are initialized with bit arrays '0011' and '0101', respectively.

Subsequently, logical operations such as AND, OR, XOR, and NOT are performed between these bit arrays, showcasing their utility in bitwise manipulation.

The results of these operations are then printed out as seen below:

output for bit-wise logical and shift operations on bit arrays

Counting set and unset bits

Counting the number of set bits, or ones, and unset bits, or zeros, within a bit array is a common operation.

The bitarray library simplifies this task by providing the count method to count bits that are 1 or 0.

See the example below for how this can be done:

from bitarray import bitarray

# Initialize bit array
bit_array = bitarray('100011110000')

# Count set and unset bits
set_bits_count = bit_array.count(True)
unset_bits_count = bit_array.count(0)

print("Number of set bits: ", set_bits_count)
print("Number of unset bits: ", unset_bits_count)

The first call to the count method specifies that we want to count the bits that are set to 1 or True.

From the output we see that we have 5 set bits. The second call to the count method counts the number of unset bits (which is 7 in this case).

See the output for the printouts:

output counting the set and unset bits in a bit array

Its time to put what we now know about Python bit arrays, it is time to put it to use by writing a simple prime sieve algorithm.

Example: Sieve of Eratosthenes

The Sieve of Eratosthenes is an algorithm used to find all prime numbers from the smallest, 2, to a specified integer.

In this section, we will explore how to make use of bit arrays to make the algorithm more efficient.

The sieve algorithm iteratively marks the multiples of each prime number as composite, gradually sieving out non-prime numbers until only primes remain.

This method efficiently identifies primes by eliminating multiples of each prime starting from the smallest prime, 2.

In the context of this algorithm, each index in the bit array represents a number, and its corresponding value indicates whether the number is prime or composite.

Let’s delve into a Python implementation of the Sieve of Eratosthenes algorithm using bit arrays:

from bitarray import bitarray

def sieve_of_eratosthenes(n):
    primes = bitarray(n + 1)
    primes.setall(True)
    primes[:2] = False

    for i in range(2, int(n ** 0.5) + 1):
        if primes[i]:
            primes[i * i::i] = False

    return [i for i in range(n + 1) if primes[i]]

# Example usage
print(sieve_of_eratosthenes(100))

From the script we wrote, we defined a function sieve_of_eratosthenes that accepts a single parameter, n, which represents the upper limit, and returns a list of primes up to that limit.

The algorithm initializes a bit array primes with a size of n + 1, setting all values initially to True.

Then, it marks index 0 and 1 as non-prime by setting their corresponding indices in the bit array to False.

Then, it iterates over each number from 2 to the square root of n, marking multiples of each prime as non-prime (by setting those bit positions to False).

Finally, our function returns a list of prime numbers up to n based on the values in the bit array. The list of primes is seen in the output below:

output for using bit arrays to implement the sieve of eratosthenes bitarray

In conclusion, understanding bit arrays can significantly enhance the performance and efficiency of various algorithms and data manipulation tasks in Python.

With the `bitarray` library, handling boolean data becomes intuitive and streamlined, opening doors to a wide range of applications in computer science and programming.