How to read and parse a CSV file in Java
A comma-separated values (CSV) file is just a normal plain-text file. It stores data row by row, one record per line, and splits the columns of each record with a separator (normally a comma ",").
Par exemple :
1,US,United States
2,MY,Malaysia
3,AU,Australia
or
"1","US","United States"
"2","MY","Malaysia"
"3","AU","Australia"
Note
Read RFC 4180 for the specification of the CSV (comma-separated values) format.
In a CSV file, there are normally two problems:
1. A field contains the separator. For example, the separator is a comma, and the field contains commas:
"aaa","b,bb","ccc"
2. Double quotes are used to enclose fields, and a field contains double quotes. To solve this, a double quote appearing inside a field must be escaped by preceding it with another double quote (RFC 4180):
"aaa","b""bb","ccc"
In this tutorial, we show you three examples to read, parse and print the values from a CSV file:
- A simple solution to parse a simply formatted CSV file.
- An advanced solution to parse an oddly formatted CSV file (fields containing the separator or double quotes).
- A third-party solution, using OpenCSV.
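To see why the two problems above rule out a plain split for anything but simple files, here is a minimal sketch (the class NaiveSplitDemo and its naiveSplit helper are hypothetical names made up for illustration) that runs String.split(",") on the two example lines. The first line comes back as four fields instead of three, because the comma inside "b,bb" is treated as a separator; the second keeps the escaped "b""bb" as raw text, quotes included.

```java
public class NaiveSplitDemo {

    // Plain String.split knows nothing about quoting rules.
    static String[] naiveSplit(String line) {
        return line.split(",");
    }

    public static void main(String[] args) {
        String[] lines = {
            "\"aaa\",\"b,bb\",\"ccc\"",    // problem 1: separator inside a field
            "\"aaa\",\"b\"\"bb\",\"ccc\""  // problem 2: escaped double quote
        };
        for (String line : lines) {
            String[] fields = naiveSplit(line);
            // Print the field count, then each raw field between brackets.
            StringBuilder sb = new StringBuilder(fields.length + " fields:");
            for (String f : fields) {
                sb.append(" [").append(f).append(']');
            }
            System.out.println(sb);
        }
    }
}
```

Running it prints `4 fields: ["aaa"] ["b] [bb"] ["ccc"]` and then `3 fields: ["aaa"] ["b""bb"] ["ccc"]`.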
1. Simple solution
If you are sure the CSV files don't contain the separator or double quotes, just use the standard String.split() method to parse the CSV file.
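Even for simple files, String.split() has two behaviors worth knowing: by default it drops trailing empty strings, so a record that ends with empty fields loses columns, and its argument is a regular expression, so a separator such as "|" must be escaped. A minimal sketch, assuming you want to keep every field; the class SplitLimitDemo and the helper splitKeepEmpty are hypothetical names:

```java
import java.util.regex.Pattern;

public class SplitLimitDemo {

    // Splits on a literal separator and keeps trailing empty fields.
    static String[] splitKeepEmpty(String line, String separator) {
        // Pattern.quote treats the separator literally, so "|" or "." work too;
        // a negative limit tells split() not to discard trailing empty strings.
        return line.split(Pattern.quote(separator), -1);
    }

    public static void main(String[] args) {
        String line = "10,AU,,"; // the last two fields are empty

        System.out.println(line.split(",").length);                         // 2: trailing empties dropped
        System.out.println(splitKeepEmpty(line, ",").length);               // 4: every field kept
        System.out.println(splitKeepEmpty("10|AU|Australia", "|").length);  // 3
    }
}
```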
1.1 Review a simple CSV file
/Users/example/csv/country.csv
"1.0.0.0","1.0.0.255","16777216","16777471","AU","Australia"
"1.0.1.0","1.0.3.255","16777472","16778239","CN","China"
"1.0.4.0","1.0.7.255","16778240","16779263","AU","Australia"
"1.0.8.0","1.0.15.255","16779264","16781311","CN","China"
"1.0.16.0","1.0.31.255","16781312","16785407","JP","Japan"
"1.0.32.0","1.0.63.255","16785408","16793599","CN","China"
"1.0.64.0","1.0.127.255","16793600","16809983","JP","Japan"
"1.0.128.0","1.0.255.255","16809984","16842751","TH","Thailand"
1.2 No magic, just read the above text file and split each line by a comma separator.
CSVReader.java
package com.example.csv;

import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

public class CSVReader {

    public static void main(String[] args) {

        String csvFile = "/Users/example/csv/country.csv";
        BufferedReader br = null;
        String line = "";
        String cvsSplitBy = ",";

        try {

            br = new BufferedReader(new FileReader(csvFile));
            while ((line = br.readLine()) != null) {

                // use comma as separator
                String[] country = line.split(cvsSplitBy);

                System.out.println("Country [code= " + country[4] + " , name=" + country[5] + "]");

            }

        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (br != null) {
                try {
                    br.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }

    }

}
1.3 For JDK 7 and above, use try-with-resources.
CSVReader.java
package com.example.csv;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class CSVReader {

    public static void main(String[] args) {

        String csvFile = "/Users/example/csv/country.csv";
        String line = "";
        String cvsSplitBy = ",";

        try (BufferedReader br = new BufferedReader(new FileReader(csvFile))) {

            while ((line = br.readLine()) != null) {

                // use comma as separator
                String[] country = line.split(cvsSplitBy);

                System.out.println("Country [code= " + country[4] + " , name=" + country[5] + "]");

            }

        } catch (IOException e) {
            e.printStackTrace();
        }

    }

}
Output
Country [code= "AU" , name="Australia"]
Country [code= "CN" , name="China"]
Country [code= "AU" , name="Australia"]
Country [code= "CN" , name="China"]
Country [code= "JP" , name="Japan"]
Country [code= "CN" , name="China"]
Country [code= "JP" , name="Japan"]
Country [code= "TH" , name="Thailand"]
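The output above keeps the surrounding double quotes ("AU"), because split() only cuts on commas. If your file is as simple as country.csv, a small variation can strip one pair of surrounding quotes per field and read the file through java.nio.file with an explicit UTF-8 charset instead of the platform default. This is a sketch under that assumption (no separator or quote inside any field); the class SimpleCsvNio and the helper parseSimpleLine are hypothetical, and the demo writes two rows from country.csv to a temporary file so it runs anywhere.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class SimpleCsvNio {

    // Splits a simple line (no separator or quote inside any field) and
    // removes one pair of surrounding double quotes from each field.
    static String[] parseSimpleLine(String line) {
        String[] fields = line.split(",", -1);
        for (int i = 0; i < fields.length; i++) {
            String f = fields[i];
            if (f.length() >= 2 && f.startsWith("\"") && f.endsWith("\"")) {
                fields[i] = f.substring(1, f.length() - 1);
            }
        }
        return fields;
    }

    public static void main(String[] args) throws IOException {
        // Two rows from country.csv, written to a temporary file for the demo.
        Path csv = Files.createTempFile("country", ".csv");
        Files.write(csv, Arrays.asList(
                "\"1.0.0.0\",\"1.0.0.255\",\"16777216\",\"16777471\",\"AU\",\"Australia\"",
                "\"1.0.1.0\",\"1.0.3.255\",\"16777472\",\"16778239\",\"CN\",\"China\""),
                StandardCharsets.UTF_8);

        // Read with an explicit charset; try-with-resources closes the reader.
        try (BufferedReader br = Files.newBufferedReader(csv, StandardCharsets.UTF_8)) {
            String line;
            while ((line = br.readLine()) != null) {
                String[] country = parseSimpleLine(line);
                System.out.println("Country [code= " + country[4] + " , name=" + country[5] + "]");
            }
        } finally {
            Files.deleteIfExists(csv);
        }
    }
}
```

With the two sample rows, this prints `Country [code= AU , name=Australia]` and `Country [code= CN , name=China]`.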
2. Advanced solution
This solution handles fields that contain the separator or double quotes, and also supports a custom separator and a custom enclosing (quote) character. Review the following CSV parsing example and the JUnit test cases to understand how it works.
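Parsing is only one direction. When you write CSV, the same rules apply in reverse: a field is enclosed in double quotes only when it contains the separator, a double quote or a line break, and every embedded double quote is doubled (see RFC 4180, section 2). A minimal sketch assuming a comma separator; the class CsvEscape and its methods are hypothetical names, not part of CSVUtils:

```java
public class CsvEscape {

    // Quotes a field when it contains a comma, a double quote, CR or LF,
    // doubling every embedded double quote.
    static String escapeField(String field) {
        boolean needsQuotes = field.indexOf(',') >= 0 || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0 || field.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Escapes each field and joins them with commas into one record.
    static String toRecord(String... fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(escapeField(fields[i]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toRecord("aaa", "b\"bb", "ccc")); // aaa,"b""bb",ccc
        System.out.println(toRecord("aaa", "b,bb", "ccc"));  // aaa,"b,bb",ccc
    }
}
```

Quoting only when needed keeps simple values readable; quoting every field would also be valid CSV.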
Note
Again, a double quote appearing inside a field must be escaped by preceding it with another double quote, for example:
"aaa","b""bb","ccc"
2.1 Review another CSV file
/Users/example/csv/country2.csv
10,AU,Australia
11,AU,Aus""tralia
"12","AU","Australia"
"13","AU","Aus""tralia"
"14","AU","Aus,tralia"
2.2 The example below is inspired by the article "Only Class You Need for CSV Files" (https://agiletribe.wordpress.com/2012/11/23/the-only-class-you-need-for-csv-files/), with some fixes to support other features (see the "Fixed" comments in the code), and also by the third-party OpenCSV library.
CSVUtils.java
package com.example.utils;

import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class CSVUtils {

    private static final char DEFAULT_SEPARATOR = ',';
    private static final char DEFAULT_QUOTE = '"';

    public static void main(String[] args) throws Exception {

        String csvFile = "/Users/example/csv/country2.csv";

        // try-with-resources closes the Scanner even if parsing fails
        try (Scanner scanner = new Scanner(new File(csvFile))) {
            while (scanner.hasNext()) {
                List<String> line = parseLine(scanner.nextLine());
                System.out.println("Country [id= " + line.get(0) + ", code= " + line.get(1) + " , name=" + line.get(2) + "]");
            }
        }

    }

    public static List<String> parseLine(String cvsLine) {
        return parseLine(cvsLine, DEFAULT_SEPARATOR, DEFAULT_QUOTE);
    }

    public static List<String> parseLine(String cvsLine, char separators) {
        return parseLine(cvsLine, separators, DEFAULT_QUOTE);
    }

    public static List<String> parseLine(String cvsLine, char separators, char customQuote) {

        List<String> result = new ArrayList<>();

        // if null or empty, return an empty list
        if (cvsLine == null || cvsLine.isEmpty()) {
            return result;
        }

        if (customQuote == ' ') {
            customQuote = DEFAULT_QUOTE;
        }

        if (separators == ' ') {
            separators = DEFAULT_SEPARATOR;
        }

        StringBuilder curVal = new StringBuilder();
        boolean inQuotes = false;
        boolean startCollectChar = false;
        boolean doubleQuotesInColumn = false;

        char[] chars = cvsLine.toCharArray();

        for (char ch : chars) {

            if (inQuotes) {
                startCollectChar = true;
                if (ch == customQuote) {
                    inQuotes = false;
                    doubleQuotesInColumn = false;
                } else {

                    //Fixed : allow "" in custom quote enclosed
                    if (ch == '\"') {
                        if (!doubleQuotesInColumn) {
                            curVal.append(ch);
                            doubleQuotesInColumn = true;
                        }
                    } else {
                        curVal.append(ch);
                    }
                }
            } else {
                if (ch == customQuote) {

                    inQuotes = true;

                    //Fixed : allow "" in empty quote enclosed
                    if (chars[0] != '"' && customQuote == '\"') {
                        curVal.append('"');
                    }

                    //double quotes in column will hit this!
                    if (startCollectChar) {
                        curVal.append('"');
                    }

                } else if (ch == separators) {

                    result.add(curVal.toString());

                    curVal = new StringBuilder();
                    startCollectChar = false;

                } else if (ch == '\r') {
                    //ignore CR characters
                    continue;
                } else if (ch == '\n') {
                    //the end, break!
                    break;
                } else {
                    curVal.append(ch);
                }
            }

        }

        result.add(curVal.toString());

        return result;
    }

}
Output
Country [id= 10, code= AU , name=Australia]
Country [id= 11, code= AU , name=Aus"tralia]
Country [id= 12, code= AU , name=Australia]
Country [id= 13, code= AU , name=Aus"tralia]
Country [id= 14, code= AU , name=Aus,tralia]
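One limitation of this approach: main() reads the file with Scanner.nextLine(), so a quoted field that contains a line break (which RFC 4180 allows) is cut in two before parseLine() ever sees it. The sketch below is a hypothetical alternative, not part of the original CSVUtils: the class Rfc4180Parser and its parse method are made-up names, and it walks the whole text as a small state machine so quoted commas, doubled quotes and line breaks all stay inside their field. It follows RFC 4180 strictly, assuming double quotes appear only in quoted fields, so the non-standard row 11,AU,Aus""tralia from country2.csv would be read differently than by CSVUtils.

```java
import java.util.ArrayList;
import java.util.List;

public class Rfc4180Parser {

    // Parses CSV text into records. Quoted fields may contain commas,
    // escaped quotes ("") and line breaks; CRLF, LF or CR ends a record.
    static List<List<String>> parse(String text) {
        List<List<String>> records = new ArrayList<>();
        List<String> record = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        boolean pending = false; // true while the current record has unflushed input

        int i = 0;
        while (i < text.length()) {
            char ch = text.charAt(i);
            pending = true;
            if (inQuotes) {
                if (ch == '"') {
                    if (i + 1 < text.length() && text.charAt(i + 1) == '"') {
                        field.append('"'); // "" is an escaped double quote
                        i += 2;
                        continue;
                    }
                    inQuotes = false; // closing quote
                } else {
                    field.append(ch); // commas and line breaks are data here
                }
            } else if (ch == '"') {
                inQuotes = true; // opening quote
            } else if (ch == ',') {
                record.add(field.toString());
                field.setLength(0);
            } else if (ch == '\r' || ch == '\n') {
                // Treat CRLF as a single line break.
                if (ch == '\r' && i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    i++;
                }
                record.add(field.toString());
                field.setLength(0);
                records.add(record);
                record = new ArrayList<>();
                pending = false;
            } else {
                field.append(ch);
            }
            i++;
        }

        // Flush the last record when the text does not end with a line break.
        if (pending) {
            record.add(field.toString());
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        String text = "\"14\",\"AU\",\"Aus,tralia\"\r\n"
                + "\"15\",\"AU\",\"Aus\"\"tra\nlia\"\r\n";
        for (List<String> record : parse(text)) {
            System.out.println(record.size() + " fields, name=" + record.get(2));
        }
    }
}
```

To use it on a file, you could load the whole content first, for example with new String(Files.readAllBytes(path), StandardCharsets.UTF_8); that is simple but keeps the entire file in memory.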
2.3 Review the following unit test; it tests the “commas and double quotes” issue.
CSVUtilsTest.java
package com.example.csv;

import com.example.utils.CSVUtils;
import org.hamcrest.core.IsNull;
import org.junit.Test;

import java.util.List;

import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.core.Is.is;

public class CSVUtilsTest {

    @Test
    public void test_no_quote() {
        String line = "10,AU,Australia";
        List<String> result = CSVUtils.parseLine(line);
        assertThat(result, IsNull.notNullValue());
        assertThat(result.size(), is(3));
        assertThat(result.get(0), is("10"));
        assertThat(result.get(1), is("AU"));
        assertThat(result.get(2), is("Australia"));
    }

    @Test
    public void test_no_quote_but_double_quotes_in_column() throws Exception {
        String line = "10,AU,Aus\"\"tralia";
        List<String> result = CSVUtils.parseLine(line);
        assertThat(result, IsNull.notNullValue());
        assertThat(result.size(), is(3));
        assertThat(result.get(0), is("10"));
        assertThat(result.get(1), is("AU"));
        assertThat(result.get(2), is("Aus\"tralia"));
    }

    @Test
    public void test_double_quotes() {
        String line = "\"10\",\"AU\",\"Australia\"";
        List<String> result = CSVUtils.parseLine(line);
        assertThat(result, IsNull.notNullValue());
        assertThat(result.size(), is(3));
        assertThat(result.get(0), is("10"));
        assertThat(result.get(1), is("AU"));
        assertThat(result.get(2), is("Australia"));
    }

    @Test
    public void test_double_quotes_but_double_quotes_in_column() {
        String line = "\"10\",\"AU\",\"Aus\"\"tralia\"";
        List<String> result = CSVUtils.parseLine(line);
        assertThat(result, IsNull.notNullValue());
        assertThat(result.size(), is(3));
        assertThat(result.get(0), is("10"));
        assertThat(result.get(1), is("AU"));
        assertThat(result.get(2), is("Aus\"tralia"));
    }

    @Test
    public void test_double_quotes_but_comma_in_column() {
        String line = "\"10\",\"AU\",\"Aus,tralia\"";
        List<String> result = CSVUtils.parseLine(line);
        assertThat(result, IsNull.notNullValue());
        assertThat(result.size(), is(3));
        assertThat(result.get(0), is("10"));
        assertThat(result.get(1), is("AU"));
        assertThat(result.get(2), is("Aus,tralia"));
    }

}
2.4 Review another unit test; it tests the custom separator and the custom enclosing character.
CSVUtilsTestCustom.java
package com.example.csv;

import com.example.utils.CSVUtils;
import org.hamcrest.core.IsNull;
import org.junit.Test;

import java.util.List;

import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.core.Is.is;

public class CSVUtilsTestCustom {

    @Test
    public void test_custom_separator() {
        String line = "10|AU|Australia";
        List<String> result = CSVUtils.parseLine(line, '|');
        assertThat(result, IsNull.notNullValue());
        assertThat(result.size(), is(3));
        assertThat(result.get(0), is("10"));
        assertThat(result.get(1), is("AU"));
        assertThat(result.get(2), is("Australia"));
    }

    @Test
    public void test_custom_separator_and_quote() {
        String line = "'10'|'AU'|'Australia'";
        List<String> result = CSVUtils.parseLine(line, '|', '\'');
        assertThat(result, IsNull.notNullValue());
        assertThat(result.size(), is(3));
        assertThat(result.get(0), is("10"));
        assertThat(result.get(1), is("AU"));
        assertThat(result.get(2), is("Australia"));
    }

    @Test
    public void test_custom_separator_and_quote_but_custom_quote_in_column() {
        String line = "'10'|'AU'|'Aus|tralia'";
        List<String> result = CSVUtils.parseLine(line, '|', '\'');
        assertThat(result, IsNull.notNullValue());
        assertThat(result.size(), is(3));
        assertThat(result.get(0), is("10"));
        assertThat(result.get(1), is("AU"));
        assertThat(result.get(2), is("Aus|tralia"));
    }

    @Test
    public void test_custom_separator_and_quote_but_double_quotes_in_column() {
        String line = "'10'|'AU'|'Aus\"\"tralia'";
        List<String> result = CSVUtils.parseLine(line, '|', '\'');
        assertThat(result, IsNull.notNullValue());
        assertThat(result.size(), is(3));
        assertThat(result.get(0), is("10"));
        assertThat(result.get(1), is("AU"));
        assertThat(result.get(2), is("Aus\"tralia"));
    }

}
3. OpenCSV example
If you are not comfortable with the simple and advanced solutions above, try the third-party CSV library OpenCSV.
3.1 Maven.
pom.xml
<dependency>
    <groupId>com.opencsv</groupId>
    <artifactId>opencsv</artifactId>
    <version>3.8</version>
</dependency>
3.2 Review a CSV file.
/Users/example/csv/country3.csv
10,AU,Australia
11,AU,Aus""tralia
"12","AU","Australia"
"13","AU","Aus""tralia"
"14","AU","Aus,tralia"
3.3 OpenCSV example to parse the above CSV file.
CSVReaderExample.java
package com.example.csv;

import com.opencsv.CSVReader;

import java.io.FileReader;
import java.io.IOException;

public class CSVReaderExample {

    public static void main(String[] args) {

        String csvFile = "/Users/example/csv/country3.csv";

        // try-with-resources closes the CSVReader and the underlying FileReader
        try (CSVReader reader = new CSVReader(new FileReader(csvFile))) {
            String[] line;
            while ((line = reader.readNext()) != null) {
                System.out.println("Country [id= " + line[0] + ", code= " + line[1] + " , name=" + line[2] + "]");
            }
        } catch (IOException e) {
            e.printStackTrace();
        }

    }

}
Output
Country [id= 10, code= AU , name=Australia]
Country [id= 11, code= AU , name=Aus"tralia]
Country [id= 12, code= AU , name=Australia]
Country [id= 13, code= AU , name=Aus"tralia]
Country [id= 14, code= AU , name=Aus,tralia]
Note
Please refer to the official OpenCSV documentation for more examples.
Done.