Python String Splitting Guide: split, rsplit, splitlines, partition, and re.split
String splitting is one of the most common text-processing tasks in Python. Most people start with split(), but once you need right-side splitting, line-aware splitting, fixed three-part results, or regex-based splitting, one method is no longer enough.
This article puts the 6 most useful Python string splitting methods in one place:
- split
- rsplit
- splitlines
- partition
- rpartition
- re.split
Below are detailed introductions to each method.
A quick rule for choosing the right method
If you just want a practical shortcut, use this rule:
- Use split for normal fixed-separator splitting
- Use rsplit when you want limited splits starting from the right
- Use splitlines when you are processing line breaks
- Use partition or rpartition when you need a fixed three-part result
- Use re.split only when the split rule is complex enough to need regex
split(sep=None, maxsplit=-1)
This is the most common method. It splits a string using the separator given by sep and returns a list. You can also control how many times the string should be split with maxsplit.
Two details matter in practice:
- The first parameter is the separator sep, and the second parameter is maxsplit
- If you want to keep the default separator behavior but still limit the number of splits, you should write split(maxsplit=2)
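One more detail is easy to miss: the default separator (sep=None) does not behave like an explicit single space. The sketch below uses a made-up input string to show the difference.

```python
# Illustrative input with leading, repeated, and trailing spaces.
text = "  a  b "

# sep=None (the default): a run of whitespace counts as one separator,
# and leading/trailing whitespace produces no empty strings.
default_parts = text.split()

# Explicit sep=" ": every single space is a separator,
# so adjacent spaces produce empty strings.
explicit_parts = text.split(" ")

print(default_parts)   # ['a', 'b']
print(explicit_parts)  # ['', '', 'a', '', 'b', '']
```

If you see unexpected empty strings in a result, an explicit separator is usually the reason.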
split example 1
Without a separator, Python splits on runs of any whitespace (spaces, tabs, and newlines, including Windows-style \r\n) and ignores leading and trailing whitespace.
## Normal space splitting
normal = "this is a string"
normal.split()
#Out[1]: ['this', 'is', 'a', 'string']
## Multiple spaces, automatically merged
twospace = "string  two  space  one"
twospace.split()
#Out[2]: ['string', 'two', 'space', 'one']
## Newline splitting
line = "another\nstring"
line.split()
#Out[5]: ['another', 'string']
## Windows style newline
wline = "windows\r\nstring"
wline.split()
#Out[7]: ['windows', 'string']
split example 2
Splitting on a custom separator string
## Using common separators like commas or periods
normal = "this,is,a,string"
normal.split(",")
# Out[1]: ['this', 'is', 'a', 'string']
## You can also use words or other strings
words = u"我是春江暮客博客博主"
words.split(u"博客")
#Out[1]: ['我是春江暮客', '博主']
split example 3
Limit the number of splits with maxsplit. The result has at most maxsplit + 1 parts.
## maxsplit=2 performs at most two splits, so the result has at most three parts
spe_len = "this,is,a,string"
spe_len.split(",", 2)
#['this', 'is', 'a,string']
## To keep the default separator and set only maxsplit, pass maxsplit by keyword
spe_len = "this is a string"
spe_len.split(maxsplit=2)
#['this', 'is', 'a string']
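A common use of maxsplit is separating a key from a value that may itself contain the separator. The following is a minimal sketch; the config line is a made-up example, not something from a real file format.

```python
# Hypothetical config line; the value itself contains "=".
line = "url=https://example.com/?a=1"

# maxsplit=1 performs at most one split, so at most two parts come back.
key, value = line.split("=", 1)

print(key)    # url
print(value)  # https://example.com/?a=1
```

Note that the two-name unpacking raises ValueError when the line contains no "=" at all, so this form suits input you already know is well formed.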
rsplit(sep=None, maxsplit=-1)
This function is almost the same as split, but the r means “right”. When the number of splits is limited, it starts counting from the right side.
## Normal space splitting, same as split
normal = "this is a string"
normal.rsplit()
#Out[1]: ['this', 'is', 'a', 'string']
## Specifying separator, same as split
normal = "this,is,a,string"
normal.rsplit(",")
#Out[2]: ['this', 'is', 'a', 'string']
## Specifying split count, different from split
### split
spe_len = "this is a string"
spe_len.split(maxsplit=2)
#['this', 'is', 'a string']
### rsplit
spe_len = "this is a string"
spe_len.rsplit(maxsplit=2)
#Out[2]: ['this is', 'a', 'string']
In rsplit, the leftover part stays on the left. In split, the leftover part stays on the right.
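A typical case for rsplit is cutting off the last segment of a string, such as the final extension of a file name. This is only a sketch with a made-up file name; for real paths, the standard library's os.path.splitext or pathlib is usually the better tool.

```python
# Hypothetical file name with several dots.
filename = "archive.backup.tar.gz"

# One split from the right: everything before the last dot, then the rest.
stem, ext = filename.rsplit(".", 1)

# One split from the left, for comparison.
first, rest = filename.split(".", 1)

print(stem, ext)    # archive.backup.tar gz
print(first, rest)  # archive backup.tar.gz
```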
splitlines(keepends=False)
This method is designed specifically for line-based splitting. Unlike plain split(), it understands multiple line separators and can optionally keep those separators with keepends=True.
The line break characters supported include the following (\v and \x0b are two spellings of the same character, as are \f and \x0c):
["\n","\r","\r\n","\v","\x0b","\f","\x0c","\x1c","\x1d","\x1e","\x85","\u2028","\u2029"]
Examples:
# Line splitting
s = "我是\n春江暮客\r博客\r\n博主"
s.splitlines()
#Out[1]: ['我是', '春江暮客', '博客', '博主']
# With keepends=True, each line keeps its trailing line break
s = "我是\n春江暮客\r博客\r\n博主"
s.splitlines(keepends=True)
#Out[2]: ['我是\n', '春江暮客\r', '博客\r\n', '博主']
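It may help to see how splitlines() differs from split("\n"), since the two are often treated as interchangeable. The sketch below uses made-up strings to compare them on a trailing newline, Windows line endings, and empty input.

```python
# Illustrative text that ends with a newline, as most files do.
text = "first\nsecond\n"

lines_a = text.splitlines()  # no empty string for the final newline
lines_b = text.split("\n")   # the final newline leaves a trailing ''

print(lines_a)  # ['first', 'second']
print(lines_b)  # ['first', 'second', '']

# Windows line endings: split("\n") leaves the "\r" behind.
win = "first\r\nsecond"
print(win.splitlines())  # ['first', 'second']
print(win.split("\n"))   # ['first\r', 'second']

# Empty input differs too.
print("".splitlines())   # []
print("".split("\n"))    # ['']
```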
partition(sep)
Splits the string at the first occurrence of the separator, returning a tuple of three elements: the part before the separator, the separator itself, and the part after.
sep has no default value, so omitting it raises an error. If the separator does not appear in the string, the result is the original string followed by two empty strings.
Examples:
# Space partitioning
s = "我是 春江暮客 博客博主"
s.partition(" ")
# Out[3]: ('我是', ' ', '春江暮客 博客博主')
# Carriage return partitioning
s2 = "我是\n春江暮客\r博客\r\n博主"
s2.partition("\r")
# Out[2]: ('我是\n春江暮客', '\r', '博客\r\n博主')
# No separator specified error
s3 = "我是\n春江暮客\r博客\r\n博主"
s3.partition()
#TypeError: partition() takes exactly one argument (0 given)
# Separator not found
s4 = "我是\n春江暮客\r博客\r\n博主"
s4.partition(",")
#Out[2]: ('我是\n春江暮客\r博客\r\n博主', '', '')
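Because partition always returns three elements, the middle element doubles as a "was the separator found?" flag. Here is a small sketch of that pattern; parse_header is a hypothetical helper written for this illustration, not a standard function.

```python
def parse_header(line):
    """Hypothetical helper: split 'Name: value' into (name, value).

    Returns None when the line contains no ':' separator.
    """
    name, sep, value = line.partition(":")
    if not sep:  # empty middle element means ':' was not found
        return None
    return name.strip(), value.strip()


print(parse_header("Content-Type: text/html"))  # ('Content-Type', 'text/html')
print(parse_header("Host: example.com:8080"))   # ('Host', 'example.com:8080')
print(parse_header("no separator here"))        # None
```

Unlike unpacking the result of split(":", 1), this never raises on a line without the separator, which is one reason to prefer partition for this kind of parsing.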
rpartition(sep)
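This is similar to partition, but it searches from the right side, so it splits at the last occurrence of the separator. If the separator is not found, it returns two empty strings followed by the original string.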
## partition
s3 = "我是\n春江暮客\r博客\r\n博主"
s3.partition("\r")
#Out[2]: ('我是\n春江暮客', '\r', '博客\r\n博主')
## rpartition
s3 = "我是\n春江暮客\r博客\r\n博主"
s3.rpartition("\r")
#Out[3]: ('我是\n春江暮客\r博客', '\r', '\n博主')
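The two methods also differ in where the original string lands when the separator is missing. A brief sketch with made-up strings:

```python
# Separator not found: the original string ends up at opposite ends.
s = "no-comma-here"
print(s.partition(","))   # ('no-comma-here', '', '')
print(s.rpartition(","))  # ('', '', 'no-comma-here')

# Typical use: peel off the last dotted component of a name.
dotted = "package.module.ClassName"  # hypothetical dotted name
module_path, _, class_name = dotted.rpartition(".")
print(module_path)  # package.module
print(class_name)   # ClassName
```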
re.split(pattern, string, maxsplit=0, flags=0)
The last option is re.split from the re module. Its strength is that it can split using regex rules instead of one fixed separator, but it is also more expensive and more complex than the built-in string methods.
- pattern is the regex pattern to match.
- string is the string to split.
- maxsplit is the maximum number of splits; it defaults to 0, meaning no limit.
- flags modifies how the pattern matches; for example, re.IGNORECASE ignores case.
Examples:
# Import re module
import re
# Split by space directly
s = "我是 春江暮客 博客 博主"
re.split(' ', s)
#Out[1]: ['我是', '春江暮客', '博客', '博主']
# Use regex \s for any whitespace character
s = "我是 春江暮客 博客 博主"
re.split(r'\s', s)
#Out[2]: ['我是', '春江暮客', '博客', '博主']
# \s matches one character at a time, so adjacent whitespace produces an empty string
s = "我是 春江暮客\n 博客 博主"
re.split(r'\s', s)
#Out[3]: ['我是', '春江暮客', '', '博客', '博主']
# Specifying max split count
s = "我是 春江暮客 博客 博主"
re.split(r'\s', s, maxsplit=2)
#Out[2]: ['我是', '春江暮客', '博客 博主']
# Ignore case
## Case-sensitive match: only the lowercase o splits
s = "hello World pythOn"
re.split('o', s)
#Out[6]: ['hell', ' W', 'rld pythOn']
## Case-insensitive match: the uppercase O splits too
s = "hello World pythOn"
re.split('o', s, flags=re.IGNORECASE)
#Out[7]: ['hell', ' W', 'rld pyth', 'n']
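Where re.split really earns its place is with several separators at once, or when the separators themselves must be kept. The following is a minimal sketch using made-up input strings; the patterns are just examples of the idea, not the only way to write them.

```python
import re

# Several different separators, possibly repeated (illustrative input).
s = "apple, banana;cherry  date"

# A character class plus "+" treats any run of commas, semicolons,
# or whitespace as a single separator.
parts = re.split(r"[,;\s]+", s)
print(parts)  # ['apple', 'banana', 'cherry', 'date']

# "\s+" avoids the empty strings that a bare "\s" produces
# when whitespace characters are adjacent.
words = re.split(r"\s+", "a b\n c")
print(words)  # ['a', 'b', 'c']

# A capturing group in the pattern keeps the separators in the result.
tokens = re.split(r"([+-])", "10+20-3")
print(tokens)  # ['10', '+', '20', '-', '3']
```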
How to choose among these 6 methods
The main differences come down to how complex the separator rule is and what kind of result shape you want.
| Method | Best use case | Return value |
|---|---|---|
| split | Ordinary fixed-separator splitting | list |
| rsplit | Limited splitting from the right | list |
| splitlines | Line-based text processing | list |
| partition | Split only at the first match | tuple |
| rpartition | Split only at the last match | tuple |
| re.split | Regex-based split rules | list |
As a practical rule, prefer the built-in string methods first. Reach for re.split only when the separator rule is clearly complex enough to justify regex.
Summary
This article summarized the 6 most useful Python string splitting methods, the differences between them, and the situations where each one fits best.
In day-to-day code, one rule is usually enough: use split for ordinary cases, splitlines for line-oriented text, partition or rpartition when you need a fixed three-part result, and re.split only when the split rule is genuinely complex.
References
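- Original author: 春江暮客
- Original link: https://www.bobobk.com/en/852.html
- Copyright notice: This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. Non-commercial reposts must credit the source (author and original link); for commercial reposts, contact the author for permission.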