Python split() string: by character, multiple separators, to list, regex

One of the most common tasks in programming, especially when dealing with text-based data, is splitting strings. Python makes working with strings easier and more intuitive than most programming languages.

The same is true for string splitting. Python provides a number of built-in methods and functions for splitting strings.

In this article, I'll summarize how to work with strings in Python by going through the various techniques for splitting and manipulating strings, with examples of their use.

1. Create a list of characters with the list() function

In Python, a string is a sequence of characters enclosed in single or double quotes. Strings are immutable, meaning that their contents cannot be changed once they are created. Python provides built-in methods for working with strings, such as splitting, joining, and formatting them. Of these, we'll focus on how to split a string.

First, we'll use the list() function to split a string into a list of its individual characters, spaces included.

text = "Python is great language!"
text_list = list(text)
print(text_list)
# Output: ['P', 'y', 't', 'h', 'o', 'n', ' ', 'i', 's', ' ', 'g', 'r', 'e', 'a', 't', ' ', 'l', 'a', 'n', 'g', 'u', 'a', 'g', 'e', '!']

# To create a list without spaces, first remove the spaces with the replace() method, then call list().
text_list = list(text.replace(' ', ''))
print(text_list)
# Output: ['P', 'y', 't', 'h', 'o', 'n', 'i', 's', 'g', 'r', 'e', 'a', 't', 'l', 'a', 'n', 'g', 'u', 'a', 'g', 'e', '!']
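
Note that replace(' ', '') removes only the space character. If the text may also contain tabs or newlines, a list comprehension with str.isspace() is one way to drop every kind of whitespace. The following is a minimal sketch; the sample string is made up for illustration.

```python
text = "Python is\tgreat\nlanguage!"

# Keep only the characters that are not whitespace (space, tab, newline, ...).
chars = [ch for ch in text if not ch.isspace()]

print("".join(chars))  # Pythonisgreatlanguage!
print(len(chars))      # 22
```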

2. Splitting a string with the split() method

The split() method is the most common and simplest way to split a string in Python. Called without arguments, it splits the string on runs of whitespace (spaces, tabs, newlines), ignores leading and trailing whitespace, and returns a list of substrings.

Here's an example using split():

text = "Python is great language!"
words = text.split()
 
print(words)

Here's the result of running the example.

['Python', 'is', 'great', 'language!']
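
Calling split() with no argument is not the same as calling split(" "). The sketch below, using a made-up string with extra spaces, shows the difference: with an explicit separator, every single space counts, so consecutive spaces produce empty strings.

```python
text = "  Python   is great  "

# No argument: runs of whitespace act as one separator,
# and leading/trailing whitespace is ignored.
default_split = text.split()

# Explicit " ": every single space is a separator,
# so consecutive spaces produce empty strings.
space_split = text.split(" ")

print(default_split)  # ['Python', 'is', 'great']
print(space_split)    # ['', '', 'Python', '', '', 'is', 'great', '', '']
```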

You can also pass a separator (delimiter) string as the first argument to split(). The string is then split wherever that separator occurs, and the separator itself is not included in the result.

text = "Python-is-great-language!"
words = text.split("-")
 
print(words)

Here's the result of running the example.

['Python', 'is', 'great', 'language!']
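
The separator does not have to be a single character, and two separators in a row produce an empty string in the result. A small sketch with a made-up comma-separated row:

```python
row = "Python, is, , great"

# The separator can be longer than one character; here it is ", ".
fields = row.split(", ")
print(fields)  # ['Python', 'is', '', 'great']

# Empty fields can be filtered out afterwards if they are not wanted.
non_empty = [field for field in fields if field]
print(non_empty)  # ['Python', 'is', 'great']
```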

The split() method also has a maxsplit parameter, which specifies the maximum number of splits to perform. The string is split from the left at most maxsplit times, and whatever remains is returned unsplit as the last element, so the result has at most maxsplit + 1 elements.

text = "Python is great language! It's easy-to-use."
words = text.split(" ", maxsplit=2)
 
print(words)

Here's the result of running the example.

['Python', 'is', "great language! It's easy-to-use."]
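
A typical use of maxsplit is separating a key from a value that may itself contain the separator. The related rsplit() method does the same thing but counts splits from the right. The sample strings below are hypothetical.

```python
# Split only at the first "=", so the "=" inside the value is preserved.
line = "url=https://example.com/?q=python"
key, value = line.split("=", maxsplit=1)
print(key)    # url
print(value)  # https://example.com/?q=python

# rsplit() counts splits from the right: split only at the last ".".
filename = "archive.tar.gz"
stem, extension = filename.rsplit(".", maxsplit=1)
print(stem)       # archive.tar
print(extension)  # gz
```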

3. Splitting lines at line breaks with the splitlines() method

When dealing with multi-line strings, use the splitlines() method. It splits a string at line boundaries and returns a list of the lines. Besides the newline character (\n), it also recognizes other line boundaries such as \r and \r\n, and by default the line breaks themselves are not included in the result.

An example of using the splitlines() method:

multiline_text = "Python is great language!\nIt's easy-to-use.\nJust try it today!"
lines = multiline_text.splitlines()
 
print(lines)

Here's the result of running the example.

['Python is great language!', "It's easy-to-use.", 'Just try it today!']
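
To see why splitlines() is usually preferable to split("\n") for this job, here is a short comparison on a made-up string that mixes Windows-style (\r\n) and Unix-style (\n) line endings and ends with a newline.

```python
text = "first\r\nsecond\nthird\n"

# splitlines() understands both \r\n and \n and adds no trailing empty string.
print(text.splitlines())  # ['first', 'second', 'third']

# split("\n") leaves the \r behind and adds an empty string at the end.
print(text.split("\n"))   # ['first\r', 'second', 'third', '']
```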

The splitlines() method also takes an optional keepends parameter. If it is set to True, the line break at the end of each line is kept in the returned list:

multiline_text = "Python is great language!\nIt's easy-to-use.\nJust try it today!"
lines = multiline_text.splitlines(keepends=True)
 
print(lines)

Here's the result of running the example.

['Python is great language!\n', "It's easy-to-use.\n", 'Just try it today!']
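
One consequence of keepends=True is that no characters are lost, so joining the pieces reproduces the original string exactly. A brief sketch with a made-up string that mixes line endings:

```python
text = "a\nb\r\nc"

lines = text.splitlines(keepends=True)
print(lines)  # ['a\n', 'b\r\n', 'c']

# Because the line breaks are kept, joining the parts restores the original.
restored = "".join(lines)
print(restored == text)  # True
```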

4. Split a string with multiple delimiters using re.split() and regular expressions

In some cases, you may need more advanced split functions to split a string based on multiple delimiters or patterns. Python's re module provides a powerful split() function that allows you to split strings using regular expressions.

Here's an example of splitting a string using multiple delimiters:

import re
 
text = "Python is;great:language! It's,easy-to-use."
words = re.split(r"[;:,\s]\s*", text)
 
print(words)

Here's the result of running the example.

['Python', 'is', 'great', 'language!', "It's", 'easy-to-use.']

The regular expression pattern used in the example above is r"[;:,\s]\s*". It matches a single semicolon (;), colon (:), comma (,), or whitespace character (\s), followed by zero or more whitespace characters (\s*). The re.split() function splits the string wherever this pattern matches.
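
If you also need to know which delimiter was found at each position, re.split() keeps the matched text in the result when the pattern contains a capturing group. A minimal sketch, using a shortened version of the sample text:

```python
import re

text = "Python is;great:language!"

# The parentheses form a capturing group, so each matched delimiter
# is returned in the list between the pieces it separated.
parts = re.split(r"([;:\s])", text)

print(parts)  # ['Python', ' ', 'is', ';', 'great', ':', 'language!']
```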

Another way to use re.split() is to split a string based on a pattern rather than a specific delimiter:

import re
 
text = "Python is;great:language!1234It's,easy-to-use."
words = re.split(r"\d+", text)
 
print(words)

Here's the result of running the example.

['Python is;great:language!', "It's,easy-to-use."]

In the example above, the regular expression pattern r"\d+" matches one or more consecutive digits. The re.split() function splits the string each time this pattern occurs.
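
Two details are worth knowing here. A match at the very start or end of the string produces an empty string in the result, and re.split() accepts a maxsplit argument just like str.split(). The strings below are made up for illustration.

```python
import re

# Matches at the start and end of the string yield empty strings.
parts = re.split(r"\d+", "12apples34oranges56")
print(parts)  # ['', 'apples', 'oranges', '']

# Drop the empty strings if they are not wanted.
cleaned = [part for part in parts if part]
print(cleaned)  # ['apples', 'oranges']

# maxsplit limits the number of splits, counted from the left.
first_only = re.split(r"\d+", "apples34oranges56pears", maxsplit=1)
print(first_only)  # ['apples', 'oranges56pears']
```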

5. Best Practices for Splitting Strings in Python

  • For simple string splitting, use Python's built-in split() method. It is efficient and easy to use for most splitting tasks, and it is usually the best choice when you are splitting on a single delimiter.
  • To split multi-line strings, use the splitlines() method. It is a convenient way to split a multiline string line by line and get a list with the lines as elements, and it handles different line-ending styles for you.
  • For more advanced splitting, use regular expressions with re.split(). If you need to split a string based on multiple delimiters, patterns, or complex rules, the re module's re.split() function provides powerful and flexible splitting capabilities.
  • Prefer the built-in methods when they are sufficient. Regular expressions are powerful, but they can be slower than the built-in methods, so use them only when necessary.
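
If you want to check the performance point on your own data, the standard timeit module gives a rough comparison. The sketch below uses a made-up input string and an arbitrary repetition count; the actual numbers depend on your machine, Python version, and input, so no expected timings are shown.

```python
import re
import timeit

# Hypothetical input: a long comma-separated string.
text = "a,b,c,d,e," * 1000

# For a single literal delimiter, both approaches return the same list.
same = text.split(",") == re.split(",", text)
print(same)  # True

# Time each approach; 200 repetitions is an arbitrary choice.
builtin_time = timeit.timeit(lambda: text.split(","), number=200)
regex_time = timeit.timeit(lambda: re.split(",", text), number=200)

print(f"str.split: {builtin_time:.4f} s")
print(f"re.split:  {regex_time:.4f} s")
```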

6. Conclusion

In this article, we've covered several techniques for splitting strings in Python: the list() function, the built-in split() and splitlines() methods, and the re module's re.split() function. Understanding and mastering these techniques will allow you to efficiently manipulate and process text-based data in Python.

From simple text processing to complex data extraction and transformation, these string manipulation tools are at your disposal. We hope you find this post helpful in handling a variety of tasks.
