In Python, how to check if a string only contains certain characters?
I need to check a string containing only a..z, 0..9, and . (period) and no other character.
I could iterate over each character and check the character is a..z or 0..9, or . but that would be slow.
I am not clear now how to do it with a regular expression.
Is this correct? Can you suggest a simpler regular expression or a more efficient approach.
#Valid chars . a-z 0-9
def check(test_str):
import re
#http://docs.python.org/library/re.html
#re.search returns None if no position in the string matches the pattern
#pattern to search for any character other then . a-z 0-9
pattern = r'[^\.a-z0-9]'
if re.search(pattern, test_str):
#Character other then . a-z 0-9 was found
print 'Invalid : %r' % (test_str,)
else:
#No character other then . a-z 0-9 was found
print 'Valid : %r' % (test_str,)
check(test_str='abcde.1')
check(test_str='abcde.1#')
check(test_str='ABCDE.12')
check(test_str='_-/>"!@#12345abcde<')
'''
Output:
>>>
Valid : "abcde.1"
Invalid : "abcde.1#"
Invalid : "ABCDE.12"
Invalid : "_-/>"!@#12345abcde<"
'''
Here's a simple, pure-Python implementation. It should be used when performance is not critical (included for future Googlers).
import string
allowed = set(string.ascii_lowercase + string.digits + '.')
def check(test_str):
set(test_str) <= allowed
Regarding performance, iteration will probably be the fastest method. Regexes have to iterate through a state machine, and the set equality solution has to build a temporary set. However, the difference is unlikely to matter much. If performance of this function is very important, write it as a C extension module with a switch statement (which will be compiled to a jump table).
Here's a C implementation, which uses if statements due to space constraints. If you absolutely need the tiny bit of extra speed, write out the switch-case. In my tests, it performs very well (2 seconds vs 9 seconds in benchmarks against the regex).
#define PY_SSIZE_T_CLEAN
#include <Python.h>
static PyObject *check(PyObject *self, PyObject *args)
{
const char *s;
Py_ssize_t count, ii;
char c;
if (0 == PyArg_ParseTuple (args, "s#", &s, &count)) {
return NULL;
}
for (ii = 0; ii < count; ii++) {
c = s[ii];
if ((c < '0' && c != '.') || c > 'z') {
Py_RETURN_FALSE;
}
if (c > '9' && c < 'a') {
Py_RETURN_FALSE;
}
}
Py_RETURN_TRUE;
}
PyDoc_STRVAR (DOC, "Fast stringcheck");
static PyMethodDef PROCEDURES[] = {
{"check", (PyCFunction) (check), METH_VARARGS, NULL},
{NULL, NULL}
};
PyMODINIT_FUNC
initstringcheck (void) {
Py_InitModule3 ("stringcheck", PROCEDURES, DOC);
}
Include it in your setup.py:
from distutils.core import setup, Extension
ext_modules = [
Extension ('stringcheck', ['stringcheck.c']),
],
Use as:
>>> from stringcheck import check
>>> check("abc")
True
>>> check("ABC")
False