I have a Tab-delimited String (representing a table) that is passed to my method. When I print it to the command line, it appears like a table with rows:
https://i.stack.imgur.com/2fAyq.gif
The command window is correctly buffered. My thinking is that there is definitely a new line character before or after each row.
My problem is that I want to split up the incoming string into individual strings representing the rows of the table. So far I have:
private static final String newLine = System.getProperty("line.separator").toString();
private static final String tab = "\t";
private static String[] rows;
...
rows = tabDelimitedTable.split(newLine); //problem is here
System.out.println();
System.out.println("################### start debug ####################");
System.out.println((tabDelimitedTable.contains(newLine)) ? "True" : "False");
System.out.println("#################### end debug###################");
System.out.println();
output:
################### start debug ####################
False
#################### end debug###################
Obviously there is something in the string telling the OS to start a new line. Yet it apparently contains no newline characters.
Running the latest JDK on Windows XP SP3.
Any ideas?
You must NOT assume that an arbitrary input text file uses the "correct" platform-specific newline separator. This seems to be the source of your problem; it has little to do with regex.
To illustrate, on the Windows platform, System.getProperty("line.separator") is "\r\n" (CR+LF). However, when you run your Java code on this platform, you may very well have to deal with an input file whose line separator is simply "\n" (LF). Maybe the file was originally created on a Unix platform and then transferred to Windows in binary (instead of text) mode. There are many scenarios like this, in which you must parse an input text file that does not use the current platform's newline separator.
(Coincidentally, when a Windows text file is transferred to Unix in binary mode, many editors display ^M, which confuses people who don't understand what is going on.)
When you are producing a text file as output, you should probably prefer the platform-specific newline separator, but when you are consuming a text file as input, it is probably not safe to assume that it uses the platform-specific newline separator.
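If you do want to keep the split-based approach from the question, one option is to split on a pattern that accepts any of the three common terminators instead of line.separator. The following is only a minimal sketch: the class name NewlineSplitDemo, the method splitRows, and the constant ANY_NEWLINE are hypothetical names chosen for illustration, and the sample table is made up.

```java
import java.util.Arrays;

public class NewlineSplitDemo {
    // Hypothetical constant: matches CR+LF first, then a lone CR or a lone LF,
    // so a Windows line ending is treated as one separator, not two.
    static final String ANY_NEWLINE = "\\r\\n|\\r|\\n";

    // Splits a table into rows regardless of which newline style it uses.
    static String[] splitRows(String table) {
        return table.split(ANY_NEWLINE);
    }

    public static void main(String[] args) {
        // Made-up input mixing LF, CR+LF and CR terminators.
        String table = "a\tb\nc\td\r\ne\tf\rg\th";
        String[] rows = splitRows(table);
        System.out.println(rows.length);                            // 4
        System.out.println(Arrays.toString(rows[1].split("\t")));   // [c, d]
    }
}
```

Note that String.split drops trailing empty strings, so a table that ends with a newline does not produce an extra empty row.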
One way to solve the problem is to use e.g. java.util.Scanner. It has a nextLine() method that can return the next line (if one exists), correctly handling any inconsistency between the platform's newline separator and the input text file.
You can also combine two Scanner instances, one to scan the file line by line and another to scan the tokens of each line. Here's a simple usage example that breaks each line into a List<String>. The entire file therefore becomes a List<List<String>>.
This is probably a better approach than reading the entire file into one huge String and then using split to break it into lines (which are then split into parts).
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Four rows, each terminated differently: LF, CR+LF, CR, and no terminator.
String text
    = "row1\tblah\tblah\tblah\n"
    + "row2\t1\t2\t3\t4\r\n"
    + "row3\tA\tB\tC\r"
    + "row4";
System.out.println(text);
// row1 blah blah blah
// row2 1 2 3 4
// row3 A B C
// row4

List<List<String>> input = new ArrayList<List<String>>();
Scanner sc = new Scanner(text);
while (sc.hasNextLine()) {
    // The inner Scanner splits one line into its tab-separated tokens.
    Scanner lineSc = new Scanner(sc.nextLine()).useDelimiter("\t");
    List<String> line = new ArrayList<String>();
    while (lineSc.hasNext()) {
        line.add(lineSc.next());
    }
    input.add(line);
}
System.out.println(input);
// [[row1, blah, blah, blah], [row2, 1, 2, 3, 4], [row3, A, B, C], [row4]]
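The question asked for the rows themselves as individual strings rather than fully tokenized lists. A minimal sketch of that variant, again relying on nextLine() to handle the different terminators, might look like the following. The class name ScannerRowsDemo and the method readRows are hypothetical, introduced only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ScannerRowsDemo {
    // Returns each row of the table as its own String, with the
    // line terminator (LF, CR+LF or CR) already stripped by nextLine().
    static List<String> readRows(String table) {
        List<String> rows = new ArrayList<>();
        // try-with-resources closes the Scanner when the loop is done.
        try (Scanner sc = new Scanner(table)) {
            while (sc.hasNextLine()) {
                rows.add(sc.nextLine());
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        String text
            = "row1\tblah\tblah\tblah\n"
            + "row2\t1\t2\t3\t4\r\n"
            + "row3\tA\tB\tC\r"
            + "row4";
        List<String> rows = readRows(text);
        System.out.println(rows.size());   // 4
        System.out.println(rows.get(3));   // row4
    }
}
```

Each returned row still contains its tabs, so it can be split into columns later with split("\t") or a second Scanner, as in the example above.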
java.util.Scanner - has many examples of usage