I need a regex in a C# program.
I have to capture a file name that follows a specific structure.
I used the \w character class, but the problem is that it doesn't match accented characters.
So how can I do this? I don't want to just list the most common accented letters in my pattern, because in theory any accent can appear on any letter.
So I thought there might be a syntax to say we want an accent-insensitive match (or a character class that takes accents into account), or a Regex option that lets me be accent-insensitive.
Do you know of anything like this?
Thank you very much
You could simply replace the diacritics with their (near-)equivalent unaccented letters, and then use your current regex.
See for example:
How do I remove diacritics (accents) from a string in .NET?
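    using System.Globalization;
    using System.Text;

    static string RemoveDiacritics(string input)
    {
        // FormD splits each accented letter into a base letter plus combining marks.
        string normalized = input.Normalize(NormalizationForm.FormD);
        var builder = new StringBuilder();
        foreach (char ch in normalized)
        {
            // Keep everything except the combining (non-spacing) marks.
            if (CharUnicodeInfo.GetUnicodeCategory(ch) != UnicodeCategory.NonSpacingMark)
            {
                builder.Append(ch);
            }
        }
        // Recompose whatever is left into the usual precomposed form.
        return builder.ToString().Normalize(NormalizationForm.FormC);
    }

    string s1 = "Renato Núñez David DeJesús Edwin Encarnación";
    string s2 = RemoveDiacritics(s1);
    // s2 = "Renato Nunez David DeJesus Edwin Encarnacion"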
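
To tie this back to the question, here is a minimal sketch of stripping the diacritics first and then running a regex over the result. The question doesn't show the file name structure, so the pattern below ("<name>_<4-digit year>.txt") and the names AccentInsensitiveMatch and ExtractName are only assumptions for illustration; substitute your own pattern.

```csharp
using System;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

public static class AccentInsensitiveMatch
{
    // Hypothetical structure: "<name>_<4-digit year>.txt", unaccented letters only.
    private static readonly Regex FileNamePattern =
        new Regex(@"^(?<name>[A-Za-z]+)_(?<year>\d{4})\.txt$");

    // Same approach as RemoveDiacritics above: decompose, drop combining marks, recompose.
    public static string RemoveDiacritics(string input)
    {
        string normalized = input.Normalize(NormalizationForm.FormD);
        var builder = new StringBuilder();
        foreach (char ch in normalized)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(ch) != UnicodeCategory.NonSpacingMark)
            {
                builder.Append(ch);
            }
        }
        return builder.ToString().Normalize(NormalizationForm.FormC);
    }

    // Returns the captured (accent-stripped) name, or null when the file name does not match.
    public static string ExtractName(string fileName)
    {
        Match m = FileNamePattern.Match(RemoveDiacritics(fileName));
        return m.Success ? m.Groups["name"].Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(ExtractName("Núñez_2012.txt") ?? "no match");
    }
}
```

Note that the captured text comes from the stripped string, so you get "Nunez" rather than "Núñez". If you would rather keep the accents in the captured text, a Unicode category class such as \p{L} (any letter) is another option worth testing against your file names.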