I want to create a sane/safe filename (i.e. somewhat readable, no "strange" characters, etc.) from some random Unicode string (mich might contain just anything).
(It doesn't matter for me wether the function is Cocoa, ObjC, Python, etc.)
Of course, there might be infinite many characters which might be strange. Thus, it is not really a solution to have a blacklist and to add more and more to that list over the time.
I could have a whitelist. However, I don't really know how to define it. [a-zA-Z0-9 .]
is a start but I also want to accept unicode chars which can be displayed in a normal way.
Python:
"".join([c for c in filename if c.isalpha() or c.isdigit() or c==' ']).rstrip()
this accepts Unicode characters but removes line breaks, etc.
example:
filename = u"ad\nbla'{-+\)(ç?"
gives: adblaç
edit str.isalnum() does alphanumeric on one step. – comment from queueoverflow below. danodonovan hinted on keeping a dot included.
keepcharacters = (' ','.','_')
"".join(c for c in filename if c.isalnum() or c in keepcharacters).rstrip()