Solution preview: removing punctuation is fastest and simplest with the built-in str.translate() and a translation table built from string.punctuation. For full Unicode punctuation, use unicodedata or a Unicode-aware regex.
Method 1: Delete ASCII punctuation with str.translate (fast, built-in)
In CPython, str.translate() is implemented in C, so it is typically the most efficient option for ASCII punctuation. See the str.translate documentation for details.
Step 1: Import string.
import string
Step 2: Build a translation table that deletes punctuation.
table = str.maketrans('', '', string.punctuation)
Step 3: Apply the table with translate.
s = "Hello, world! Python 3.12—fun?"
clean = s.translate(table)
print(clean) # Hello world Python 312—fun
Step 4: Optionally replace punctuation with spaces instead of deleting.
space_table = str.maketrans({ch: ' ' for ch in string.punctuation})
spaced = s.translate(space_table)
normalized = ' '.join(spaced.split())
print(normalized) # Hello world Python 3 12—fun
Notes:
- string.punctuation covers ASCII punctuation only. It does not include characters such as “ ” — 。 and the fullwidth !. See the string module reference.
- To handle non-ASCII punctuation, use Method 3 or Method 4.
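A quick check makes the ASCII-only limit concrete. This is only a sketch; the sample characters below are chosen for illustration and are not an exhaustive list.

```python
import string

# string.punctuation holds exactly the 32 ASCII punctuation characters.
print(len(string.punctuation))  # 32

# Illustrative non-ASCII marks: curly quotes, em dash, ideographic full stop,
# fullwidth exclamation mark (written as escapes to avoid ambiguity).
samples = ["\u201c", "\u201d", "\u2014", "\u3002", "\uff01"]
non_ascii_in_table = [ch for ch in samples if ch in string.punctuation]
print(non_ascii_in_table)  # []

# The ASCII exclamation mark, by contrast, is covered.
ascii_bang_covered = "!" in string.punctuation
print(ascii_bang_covered)  # True
```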
Method 2: Remove ASCII punctuation with re.sub
Regular expressions are concise and flexible. Escape the punctuation class once, then substitute. See re.sub in the documentation.
Step 1: Import re and string.
import re, string
Step 2: Compile a pattern that matches any ASCII punctuation.
pattern = re.compile(r'[%s]' % re.escape(string.punctuation))
Step 3: Substitute matches with empty strings (or a space).
s = "A test: regex-only, please!"
clean = pattern.sub('', s)
print(clean) # A test regexonly please
Tip: \w includes letters, digits, and underscore, and \s matches whitespace; both are described in the re documentation. If you prefer a whitelist, you can keep word and space characters: re.sub(r'[^\w\s]', '', s).
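Because \w includes the underscore, the whitelist pattern leaves underscores in place. The sketch below, using a made-up sample string, shows that behavior and one possible way to drop underscores too.

```python
import re

# Sample input chosen for illustration.
s = "snake_case, kebab-case, done!"

# The whitelist keeps \w characters, so the underscore survives.
keep_word_chars = re.sub(r"[^\w\s]", "", s)
print(keep_word_chars)  # snake_case kebabcase done

# Adding "_" as an alternative removes underscores as well.
no_underscore = re.sub(r"[^\w\s]|_", "", s)
print(no_underscore)  # snakecase kebabcase done
```

For str patterns in Python 3, \w and \s are Unicode-aware by default, so this whitelist also strips non-ASCII punctuation and symbols, not only the characters in string.punctuation.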
Method 3: Remove all Unicode punctuation with unicodedata (built-in)
This approach removes any character whose Unicode category begins with 'P' (punctuation), not just ASCII. See the unicodedata reference.
Step 1: Import unicodedata and sys.
import unicodedata, sys
Step 2: Build a deletion map for all code points in the Unicode range whose category starts with 'P'.
delete_punct = dict.fromkeys(
    i for i in range(sys.maxunicode + 1)
    if unicodedata.category(chr(i)).startswith('P')
)
Step 3: Translate the string using the map.
s = "Unicode: 「quotes」 — dashes… 你好,世界!"
clean = s.translate(delete_punct)
print(clean) # Unicode quotes  dashes 你好世界 (two spaces remain where the dash was)
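Building the full map scans every code point up to sys.maxunicode, which is roughly 1.1 million lookups. For a one-off string, a possible alternative is to build the map only from the characters that actually occur in the text. The helper name strip_unicode_punct is hypothetical, and this is a sketch rather than a tuned implementation.

```python
import unicodedata

def strip_unicode_punct(text: str) -> str:
    """Remove characters whose Unicode category starts with 'P' (hypothetical helper)."""
    # Only inspect the distinct characters present in this text.
    delete_map = {
        ord(ch): None
        for ch in set(text)
        if unicodedata.category(ch).startswith("P")
    }
    return text.translate(delete_map)

print(strip_unicode_punct("«Bonjour», le monde…"))  # Bonjour le monde
```

If the same cleanup runs over many strings, the prebuilt full map from Step 2 is likely the better trade-off, since its cost is paid once.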
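Tip: If you also want to drop symbols like currency signs, extend the filter to include category 'S'. Note that several characters in string.punctuation ($ + < = > ^ ` | ~) are classified as symbols rather than punctuation, so the 'P' filter alone leaves them in place.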
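Extending the filter to symbols can be sketched as follows. The variable name delete_punct_sym and the sample string are illustrative assumptions; the construction mirrors the deletion map from Step 2.

```python
import sys
import unicodedata

# Deletion map covering categories P (punctuation) and S (symbols).
# category() always returns a two-letter code, so indexing [0] is safe.
delete_punct_sym = dict.fromkeys(
    i for i in range(sys.maxunicode + 1)
    if unicodedata.category(chr(i))[0] in ("P", "S")
)

s = "Cost: €5, tax: $1!"
clean = s.translate(delete_punct_sym)
print(clean)  # Cost 5 tax 1
```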
Method 4: Use the third-party “regex” module for Unicode properties
Python’s built-in re does not support \p{...} Unicode properties. The regex package supports them and can target punctuation precisely using \p{P}. Install it from its package page on PyPI.
Step 1: Install the package.
pip install regex
Step 2: Import and compile a Unicode property pattern.
import regex
pattern = regex.compile(r'\p{P}+')
Step 3: Substitute punctuation with an empty string or a space.
s = "Mix: ASCII, Unicode… and 「symbols」!"
clean = pattern.sub('', s)
print(clean) # Mix ASCII Unicode and symbols
Tip: To remove punctuation and symbols together, use r'[\p{P}\p{S}]+'.
Method 5: Quick comprehension/filter (simple, slower)
This pure-Python option is easy to read for small inputs, but is slower than the methods above.
Step 1: Import string.
import string
Step 2: Keep only non-punctuation characters.
s = "Keep it simple, okay?"
clean = ''.join(ch for ch in s if ch not in string.punctuation)
print(clean) # Keep it simple okay
Note: This also relies on ASCII-only string.punctuation.
Practical tips:
- Decide whether to delete punctuation or replace it with spaces; replacing then normalizing whitespace keeps word boundaries intact.
- When using regex, remember \w includes underscore; if underscores should be removed, target them explicitly.
- For very large texts or performance-critical code, prefer str.translate() with a prebuilt table.
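To check the performance claim on your own data, a rough timing sketch with timeit can help. The sample text, repeat count, and function names here are arbitrary assumptions, and the numbers will vary by machine and Python version.

```python
import string
import timeit

# Synthetic input; swap in a representative sample of your own text.
text = "Hello, world! It's a test; isn't it? " * 10_000
table = str.maketrans("", "", string.punctuation)
punct = set(string.punctuation)

def with_translate():
    # Prebuilt table, single pass over the string.
    return text.translate(table)

def with_comprehension():
    # Pure-Python character filter.
    return "".join(ch for ch in text if ch not in punct)

# Both approaches should agree before their speed is compared.
assert with_translate() == with_comprehension()

for fn in (with_translate, with_comprehension):
    seconds = timeit.timeit(fn, number=5)
    print(f"{fn.__name__}: {seconds:.3f} s for 5 runs")
```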
That’s it: use str.translate for speed on ASCII, and unicodedata or a Unicode-aware regex when you need to cover all punctuation across languages.