Qt Turkish characters in regular expressions

onurozcelik · Jun 5, 2013

I want to validate a QLineEdit's text with a regular expression. It should allow the letters a to z and A to Z, the Turkish characters (ğüşöçİĞÜŞÖÇ), and the digits 0 to 9. I searched for my problem and found two suggested solutions, but neither worked for me. One says to "include Turkish characters in the regexp" and the other says to "use the Unicode code points of the Turkish characters".

Below are the two regular expressions:

QRegExp exp = QRegExp("^[a-zA-Z0-9ğüşöçİĞÜŞÖÇ]+$");

QRegExp exp = QRegExp("^[a-zA-Z0-9\u00E7\u011F\u0131\u015F\u00F6\u00FC\u00C7\u011E\u0130\u015E\u00D6\u00DC]+$");

Neither of the regular expressions above validates the name 'İSMAİL'. I also tried a text that contains only Turkish characters ('ğüşöçİĞÜŞÖÇ'), but it is not validated either. When I remove the 'İ' character from both texts, they are validated. I guess the problem is related to the 'İ' character.

How can I solve the problem?

Note: We are using Qt 4.6.3 in our project.

Answer

Pavel Strakhov · Jun 5, 2013

I think this is an encoding problem. You rely on the implicit conversion from const char* to QString, which uses QString::fromAscii. In Qt 4 that function treats the bytes as Latin-1 unless a codec has been set. If you want to use a non-Latin-1 encoding here, you need to call QTextCodec::setCodecForCStrings with the encoding your source files are saved in. I'd use UTF-8, so the initialization of the app should look like this:

QTextCodec::setCodecForCStrings(QTextCodec::codecForName("utf-8"));
QRegExp exp = QRegExp("^[a-zA-Z0-9ğüşöçİĞÜŞÖÇ]+$");
qDebug() << exp.exactMatch("İSMAİL"); // <= true
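To illustrate why the decoding step matters, here is a minimal standalone sketch in plain C++ (not Qt code). It assumes the literal is stored as UTF-8, where 'İ' (U+0130) is the two bytes 0xC4 0xB0. The helpers decodeLatin1 and decodeUtf8Basic are hypothetical names made up for this illustration; they only mimic what a Latin-1 and a UTF-8 conversion do with those bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: Latin-1 decoding maps every byte to one character.
std::vector<std::uint32_t> decodeLatin1(const std::string& bytes) {
    std::vector<std::uint32_t> out;
    for (unsigned char b : bytes) {
        out.push_back(b);
    }
    return out;
}

// Hypothetical helper: simplified UTF-8 decoding that handles only 1- and
// 2-byte sequences (enough for the Turkish letters) and does no validation.
std::vector<std::uint32_t> decodeUtf8Basic(const std::string& bytes) {
    std::vector<std::uint32_t> out;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        unsigned char b = static_cast<unsigned char>(bytes[i]);
        if (b < 0x80) {
            out.push_back(b);  // plain ASCII
        } else if ((b & 0xE0) == 0xC0 && i + 1 < bytes.size()) {
            unsigned char c = static_cast<unsigned char>(bytes[i + 1]);
            out.push_back(((b & 0x1Fu) << 6) | (c & 0x3Fu));
            ++i;  // consumed the continuation byte
        } else {
            out.push_back(0xFFFD);  // anything else: replacement character
        }
    }
    return out;
}
```

With these helpers, decodeLatin1("\xC4\xB0") yields two characters (0xC4 and 0xB0), while decodeUtf8Basic("\xC4\xB0") yields the single code point 0x130. A character class built from the Latin-1 result therefore never contains U+0130, which is the character the line edit actually delivers.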

To check whether this is your problem, I suggest a more explicit solution. Save your code in UTF-8 and use QString::fromUtf8 to convert your string literals to QString explicitly:

QRegExp exp = QRegExp(QString::fromUtf8("^[a-zA-Z0-9ğüşöçİĞÜŞÖÇ]+$"));
qDebug() << exp.exactMatch(QString::fromUtf8("İSMAİL")); // <= true