
How to Truncate a Python String


Python offers several ways to truncate strings efficiently. This article walks through them.

String truncation is the process of shortening a string to fit it within a specified width or character limit. It is an important technique in programming for formatting text data to be cleanly displayed in interfaces and reports.

We will look at why truncation is needed, walk through several techniques, and finish with a practical example that truncates blog post bodies for display. So, without further ado, let’s dive in and see what the problem is all about.

Why do we need to truncate strings?

The challenge we are addressing when truncating strings is: how can we display a long string within a shorter available space?

Solving this problem might result in the string being shortened, or truncated, to fit in the available width.

Truncating strings becomes particularly important when, for instance, we are dealing with display limitations or when we are trying to ensure the privacy of information.

Done well, truncation presents the information in the string concisely while keeping it recognizable.

Consider a scenario where a program needs to display a list of available posts for a blog. The post body is what we are now concerned about, as we will see when we run the following Python code:

from datetime import datetime

posts = [
    {
        "Author": "Brett Leonard",
        "Title": "Executive others like blue.",
        "Date": datetime.strptime("2022-08-26", '%Y-%m-%d'),
        "Body": """Create culture campaign by anyone but choose. Keep total majority until their increase side. 
Expert often draw try series move southern be. Film that side population response two.
Fight have game report painting. Whose argue after thus.
Stay develop left with building. Large court marriage allow customer catch. Budget result south require management necessary. Prove rule everyone ok few.
Our be century story seven southern however long. Prevent sister pass the people. Serious keep meeting effect author international help.
Deep daughter little prepare. Collection movie no. Eight final most human letter sister project.
Building mention black approach find fund.
Generation he even. Run what south television on. Increase responsibility certain however sell."""
    },
    {
        "Author": "Cheryl Smith",
        "Title": "Even lawyer investment between body either friend.",
        "Date": datetime.strptime("2021-11-25", '%Y-%m-%d'),
        "Body": """Three social magazine name.
Debate good try end. Father four site truth eight. Tonight positive right music.
Drop factor training success machine.
Wonder house think. Because property key point visit.
Reality despite song value blue. Kind outside wind. Carry four dog fear action who.
Pay artist attention. School wife single system sell view bring reveal. Travel stop southern attorney sing door fill always. Offer leg market.
Foot per water spring beyond tree.
Cut language drive detail pass yeah."""
    },
    {
        "Author": "Kendra Mitchell",
        "Title": "Tonight number assume face.",
        "Date": datetime.strptime("2023-09-23", '%Y-%m-%d'),
        "Body": """Minute care wind cost. If possible behind season off manager. Woman court drug nor.
Live painting hit popular. Understand everybody student where.
Instead term full site. Ability than country low along for.
Political base water task evidence meeting. Yeah range site day action right necessary senior. Remember magazine increase decade. Image energy responsibility economic town war machine understand.
Fund feel hospital population pass pattern. Item cell good stage need.
Final industry personal sit where. Share physical author campaign.
Mind major and network. Sing magazine democratic week side everything.
Specific save hot owner oil. Serious course market feeling simply."""
    }
]

# Format codes to allow colors, bold, and italic on the console
RED_START = "\033[91m"
GREEN_START = "\033[32m"
GREY_START = "\033[90m"
BOLD_START = "\u001b[1m"
ITALIC_START = "\033[3m"
END = "\033[0m"

# Display the blog posts
for post in posts:
    print(f"{BOLD_START}{post['Title']}{END}")
    print(f"{RED_START}{BOLD_START}{post['Author']}{END} - {GREEN_START}{post['Date'].strftime('%Y/%m/%d')}{END}")
    print(f"{GREY_START}{ITALIC_START}{post['Body']}{END}")
    print("\n")

The code displays the properties of each blog post: the title, author, date posted, and body.

The post body has a lot of text, and as seen below, it is unsightly to display it completely.

[Image: console output of the blog posts with untruncated bodies]

Our goal in this article is to explore the different ways we can truncate long strings. We can then use one of the techniques to format the display of the posts in a more appealing way.

Different ways to truncate string in Python

There are several ways we can truncate strings in Python.

Let’s dive into various approaches, beginning with the basic approach: slicing strings.

Truncate a string using Python string slicing

The most basic form of truncation involves string slicing, specifying the end index in order to extract the prefix portion of the string. Consider the following code:

name = "John Christopher Smith"
print(name[:15])

This slices name down to its first 15 characters; no ellipsis is appended yet. We then get the following output:

John Christophe

Real string truncations append an ellipsis, so we will modify our code to append it. The new string truncation code with ellipsis looks like what we have below:

name = "John Christopher Smith"
print(name[:15] + "...")

The output produced now has an ellipsis added to the end of the truncated string:

John Christophe...

But, there’s one problem: our string has 22 characters. What happens when we set the truncation limit at 22?

name = "John Christopher Smith"
print(name[:22] + "...")

We asked our code to truncate after character 22. Our dumb code happily obliged to give us this silly-looking output:

John Christopher Smith...

That isn’t what we intended: an ellipsis should be added only when truncation actually happens. So we have to modify the script with a conditional expression (an if statement would work equally well):

char_limit = 20

name = "John Christopher Smith"
print((name[:char_limit] + "...") if char_limit<len(name) else name)

The condition in the if expression makes sure truncation only happens when the value of char_limit is smaller than the length of name. The output is as follows:

John Christopher Smi...
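Note that the conditional above still lets the result grow past char_limit once the ellipsis is appended (20 characters plus three dots). As a minimal sketch, assuming we would rather cap the total length, the logic could be wrapped in a helper. The name truncate_slice and its suffix parameter are hypothetical and not part of the original script, and the sketch assumes limit is not negative:

```python
def truncate_slice(s, limit, suffix="..."):
    """Slice-based truncation; assumes limit >= 0 (hypothetical helper)."""
    if len(s) <= limit:
        # Nothing to cut, so no suffix is added
        return s
    # Reserve room for the suffix so the total stays within limit
    keep = max(limit - len(suffix), 0)
    # The final slice covers limits shorter than the suffix itself
    return (s[:keep] + suffix)[:limit]


name = "John Christopher Smith"
print(truncate_slice(name, 22))  # John Christopher Smith
print(truncate_slice(name, 15))  # John Christo...
```

This variant trades three visible characters for a predictable width, which matters when the text has to fit a fixed-size column. It still cuts mid-word, just like the plain slice.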

As mentioned earlier, string slicing is a simple and effective method for truncating strings in Python.

By specifying the end index, we can extract a portion of a string based on the desired length. However, it has a glaring flaw: it can slice mid-word, which looks very unnatural.

This flaw underscores the necessity for a method that takes word boundaries into account. That is what we will explore in the next section.

Truncate string at word boundary

To maintain readability, we need a way to truncate strings at the nearest word boundary. This will make sure the truncated string remains meaningful.

There are three approaches we will explore:

  1. The split method

  2. Regular expressions

  3. textwrap module

So, let’s explore these approaches one after the other, beginning with the split method.

The split method to truncate string in Python

This approach uses the str.split method to extract the individual words that make up the string, then recombines the words one by one until adding another would exceed the length limit.

See the following code:

name = "John Christopher Smith"
words = name.split() 

char_limit = 20
output = []

for word in words:
    if len(' '.join(output + [word])) > char_limit:
        break

    output.append(word)
    
print(' '.join(output))

The script splits the name "John Christopher Smith" into individual words, and then sets the character limit to be 20 characters.

The script produces the following output:

John Christopher

The for loop repeatedly adds the words one by one into the output list until we reach the length limitation. Then, the loop is exited using the break statement.
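To make the loop easier to follow, here is a small tracing sketch that reports each candidate string the loop considers, its length, and whether it fits. The generator name trace_split_truncation is made up for this illustration; the logic mirrors the script above:

```python
def trace_split_truncation(s, limit):
    """Yield (candidate, length, fits) for each word the loop considers."""
    output = []
    for word in s.split():
        candidate = ' '.join(output + [word])
        fits = len(candidate) <= limit
        yield candidate, len(candidate), fits
        if not fits:
            # Same exit point as the break statement in the script
            break
        output.append(word)


for candidate, length, fits in trace_split_truncation("John Christopher Smith", 20):
    print(f"{candidate!r:26} {length:2}  {'keep' if fits else 'stop'}")
```

With a limit of 20, "John" (4 characters) and "John Christopher" (16) are kept, while "John Christopher Smith" (22) triggers the stop.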

How do we add an ellipsis to our result? Well, we have to make sure a truncation actually happened before we add it. The code example below shows us how:

def truncate(s, limit):
    words = s.split() 
    output = []

    for word in words:
        if len(' '.join(output + [word])) > limit:
            break

        output.append(word)

    truncated = ' '.join(output)
    if len(truncated) < len(s):
        truncated += "..."

    return truncated

# Test the code
name = "John Christopher Smith"

print(truncate(name, 22))
print(truncate(name, 20))
print(truncate(name, 15))

This time, we refactored the truncate functionality into a truncate function. truncate defines two parameters:

  • s:str – representing the string to be truncated, and
  • limit:int – representing the character limit.

The function works similarly to our previous script, except that we have added a code segment to check if the length of the final truncated string is less than that of the original string.

If that is the case, then an ellipsis is appended to the returned value. Testing the truncate function produces the output:

John Christopher Smith
John Christopher...
John...

The next technique makes use of regular expressions to truncate strings. Let’s discuss it.

Python truncate string using Regular expressions

Using regex in Python, we can truncate by matching on word boundaries instead of blindly slicing.

We search with a negative lookahead, (?!\w), to select the longest prefix within the limit that does not end in the middle of a word.

import re

# Define the character limit
char_limit = 20

# Original string
name = "John Christopher Smith"

# Use a regular expression with an f-string to extract a portion of the name string
name = re.search(fr'(.{{0,{char_limit}}})(?!\w)', name).group(1)

# Display the result
print(name)

The part of the code doing the most work is the call to re.search. The regex pattern, (.{{0,{char_limit}}})(?!\w), looks a little dense, so we will break it down below:

  • . – matches any character (except for a newline).
  • {{0,{char_limit}}} – inside the f-string, the doubled braces {{ and }} produce literal braces and {char_limit} is substituted, so with char_limit = 20 this becomes the quantifier {0,20}: between 0 and 20 repetitions of the preceding element, as many as possible.
  • The outer parentheses ( ... ) define a capturing group, indicating the portion of the string to extract.
  • (?!\w) – a negative lookahead assertion. It requires that the character immediately after the matched portion is not a word character (\w). In practical terms, the match cannot stop in the middle of a word, so the regex engine backs off to the end of the last complete word.

The re.search(...).group(1) part retrieves the matched substring as the first (and only) capturing group, providing us with the desired result.
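As a quick check of the breakdown above, the sketch below prints the pattern after the f-string has been expanded and then shows, for each prefix length from 20 down to 16, which character follows the prefix and whether the lookahead would reject it. The loop is only an illustration of the backtracking and is not how the regex engine is implemented:

```python
import re

char_limit = 20
name = "John Christopher Smith"

# After f-string expansion, {{ and }} become literal braces
pattern = fr'(.{{0,{char_limit}}})(?!\w)'
print(pattern)  # (.{0,20})(?!\w)

# Mimic the backtracking by hand: shrink the prefix until the
# character that follows it is not a word character
for n in range(char_limit, 15, -1):
    next_char = name[n]
    is_word_char = re.match(r'\w', next_char) is not None
    verdict = "rejected" if is_word_char else "accepted"
    print(f"{name[:n]!r} followed by {next_char!r}: {verdict}")

match = re.search(pattern, name)
print(repr(match.group(1)), match.span(1))  # 'John Christopher' (0, 16)
```

The prefixes of length 20, 19, 18 and 17 are rejected because they are followed by "t", "i", "m" and "S"; the prefix of length 16 is followed by a space, so it is accepted.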

Our regex script produces the output below, without ellipses:

John Christopher

So, we can refactor this into a function that also adds an ellipsis:

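import re

def truncate(s, limit):
    # Search the argument s (not the global name) for the longest prefix that ends on a word boundary
    truncated = re.search(fr'(.{{0,{limit}}})(?!\w)', s).group(1)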

    if len(truncated) < len(s):
        truncated += "..."

    return truncated

# Test the code
name = "John Christopher Smith"

print(truncate(name, 22))
print(truncate(name, 20))
print(truncate(name, 15))

Now we get a better output:

John Christopher Smith
John Christopher...
John...
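Two edge cases are worth knowing about with the regex version. Because re.search scans forward through the string, a limit shorter than the first word can match a fragment from the middle of it (with a limit of 2, the name above yields "hn"). And because . does not match newlines, a multi-line string is cut at its first line break. The sketch below is one possible hardening, not the only one: truncate_regex and its suffix parameter are hypothetical names, and falling back to a plain slice is an assumption about the desired behaviour.

```python
import re


def truncate_regex(s, limit, suffix="..."):
    """Regex truncation anchored at the start of the string (sketch)."""
    # Collapse newlines, tabs and repeated spaces into single spaces
    flat = " ".join(s.split())
    # re.match only matches at index 0, so the prefix cannot start mid-string
    match = re.match(fr'(.{{0,{limit}}})(?!\w)', flat)
    truncated = match.group(1).rstrip() if match else ""
    if not truncated:
        # The first word is longer than limit: fall back to a plain slice
        truncated = flat[:limit]
    if len(truncated) < len(flat):
        truncated += suffix
    return truncated


name = "John Christopher Smith"
print(re.search(r'(.{0,2})(?!\w)', name).group(1))  # hn
print(truncate_regex(name, 2))                      # Jo...
print(truncate_regex(name, 20))                     # John Christopher...

body = "Three social magazine name.\nDebate good try end."
print(truncate_regex(body, 40))
```

Flattening the whitespace first means the length comparison is made against the flattened text, so a string that only differs by line breaks is not reported as truncated.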

Now, let’s continue our discussion with the next technique, the textwrap module.

textwrap module to truncate string in Python

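This handy standard-library module has a shorten() function that truncates a string to fit a given width without breaking words. It first collapses runs of whitespace into single spaces; if the result still does not fit, it drops whole words from the end and appends a placeholder, so that the final string, placeholder included, fits within the width.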

Here is an example of Python truncate string using the textwrap module:

import textwrap
name = "John Christopher Smith"
short_name = textwrap.shorten(name, width=15) # John [...]
print(short_name)

The output produced when making use of textwrap.shorten on our input ("John Christopher Smith") is shown below:

John [...]

Notice that textwrap.shorten takes care of adding an ellipsis for us.

However, this is not the kind of ellipsis we are looking for. We need not worry, because we can specify the ellipsis we want.

textwrap.shorten accepts a placeholder argument, whose default value is the string " [...]" (with a leading space).

The next code example modifies the placeholder parameter to the string value ..., in order to give us the kind of ellipsis we have been making use of:

import textwrap

def truncate(s, limit):    
    truncated = textwrap.shorten(s, width=limit, placeholder="...")
    return truncated

# Test the code
name = "John Christopher Smith"

print(truncate(name, 22))
print(truncate(name, 20))
print(truncate(name, 15))

We also created a truncate function similar to the one in the previous examples.

In this implementation, we made use of textwrap.shorten and specified our ellipsis placeholder.

This produced the output:

John Christopher Smith
John Christopher...
John...
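One difference from the split and regex versions is that textwrap.shorten counts the placeholder against the width, whereas our earlier functions append the ellipsis after the limit has been applied. The short sketch below illustrates this, along with the whitespace collapsing; the messy input string is made up for the example:

```python
import textwrap

messy = "John   Christopher\nSmith"

# Whitespace is collapsed first, so the text fits in 22 characters untouched
full = textwrap.shorten(messy, width=22, placeholder="...")
print(full)  # John Christopher Smith

# "John Christopher" alone is 16 characters, but adding "..." would make 19,
# so with width=16 the second word is dropped as well
tight = textwrap.shorten(messy, width=16, placeholder="...")
print(tight, len(tight))  # John... 7

# With width=20 there is room for two words plus the placeholder
short = textwrap.shorten(messy, width=20, placeholder="...")
print(short, len(short))  # John Christopher... 19
```

For comparison, the split-based truncate with a limit of 16 returns "John Christopher...", which is 19 characters long.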

Now that we have discussed the string truncation techniques, we can consider a practical example that makes use of them.

Python truncate string example: Displaying all blog posts

Now we are ready to display our blog posts in a more readable form using string truncation.

We want to use one of our previous implementations to shorten, or truncate, the body of each blog post. In the code below, the posts variable is still the list of post dictionaries used previously.

The next code listing implements the functionality to truncate the main post body.

import textwrap

def truncate(s, limit):    
    truncated = textwrap.shorten(s, width=limit, placeholder="...")
    return truncated

RED_START = "\033[91m"
GREEN_START = "\033[32m"
GREY_START = "\033[90m"
BOLD_START = "\u001b[1m"
ITALIC_START = "\033[3m"
END = "\033[0m"

char_limit = 50     # initialize the char limit

for post in posts:
    print(f"{BOLD_START}{post['Title']}{END}")
    print(f"{RED_START}{BOLD_START}{post['Author']}{END} - {GREEN_START}{post['Date'].strftime('%Y/%m/%d')}{END}")

    # make use of the truncate function here
    print(f"{GREY_START}{ITALIC_START}{truncate(post['Body'], char_limit)}{END}")
    print("\n")

The script wraps textwrap.shorten in the truncate function. Then, within the for loop, truncate is applied to each post body before it is printed. Because textwrap.shorten collapses whitespace, the line breaks inside each body are replaced by single spaces.

The output we get now is more readable:

[Image: console output of the blog posts with truncated bodies]
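If the summary needs to be reused outside the console (in a log file or a test, for instance), one option is to build it as a plain string and leave the colour codes to the caller. This is only a sketch: summarize_post is a hypothetical helper, and the sample post is a shortened copy of one of the posts above.

```python
import textwrap
from datetime import datetime


def summarize_post(post, limit=50):
    """Return title, byline and truncated body as three plain-text lines."""
    body = textwrap.shorten(post["Body"], width=limit, placeholder="...")
    date = post["Date"].strftime("%Y/%m/%d")
    return f"{post['Title']}\n{post['Author']} - {date}\n{body}"


sample = {
    "Author": "Cheryl Smith",
    "Title": "Even lawyer investment between body either friend.",
    "Date": datetime.strptime("2021-11-25", "%Y-%m-%d"),
    "Body": "Three social magazine name.\nDebate good try end. "
            "Father four site truth eight.",
}

print(summarize_post(sample))
```

The body always ends up on a single line of at most limit characters, because textwrap.shorten collapses the line breaks before truncating.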

In conclusion, string truncation improves readability and visual appeal, especially when dealing with lengthy content.

While basic string slicing might result in awkward mid-word truncations, our exploration has introduced more sophisticated methods.

We’ve delved into techniques such as the str.split method, regular expressions, and the textwrap.shorten function to intelligently abbreviate strings while preserving word boundaries.