The CSV (Comma Separated Values) file format is a very common way of storing datasets in a simple and portable format. Also, since it is plain text, it is very easy to create such a file programmatically in an application. There are libraries for such tasks, but if you want to customize the output or create something specific, writing your own implementation gives you full control.
In this guide, I will explain how to write a generic CSV Writer/Reader that will automatically pick data from the public properties of the input objects and generate a CSV file.
Before we dive into the code, let me explain what the CSV file format is, so we have a better understanding of what we are dealing with.
A CSV file is a plain text file which holds data in table format. Each line is like a row in a table, and columns are separated by a comma. Hence, the name comma separated value.
Below is an example of such a file, containing a table of numbers, first names, and last names of people:
1,murat,aykanat
2,john,smith
CSV format is very useful because it is a text file and any operating system can read it. For example, if your application creates the CSV file on a Windows machine, you can open it up and use it on a Linux machine. So it is a very portable and easily readable file format.
In some cases, the file may not be comma separated. Especially if you are using a third-party program (e.g. Microsoft Excel), the separator might be a different character, such as a semicolon (;), depending on the culture of your machine. This is because the decimal separator differs between cultures. In some cultures the decimal separator is a period (.), so the CSV separator can be a comma (,). But in other cultures the decimal separator is a comma (,), so the CSV file has to use a semicolon (;) as the separator.
For example, if your locale is set to a European culture such as fr-FR, the default decimal separator becomes a comma (,) and you need to use a semicolon (;) as the column separator in the CSV file:
3,5;2,5;5,4
4,5;6,7;8,9
However, on a machine which has en-US set as default, the decimal separator is a period (.), so the same CSV file would look like this:
3.5,2.5,5.4
4.5,6.7,8.9
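The rule described above can be sketched in C# as follows. This is only an illustration, not part of the writer we build later: SeparatorSketch, ChooseColumnSeparator and FormatRow are hypothetical names, and the sketch assumes that looking at the culture's decimal separator is enough to pick the column separator.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class SeparatorSketch
{
    // Hypothetical helper: use ';' between columns when the culture's
    // decimal separator is ',', otherwise use ','.
    public static char ChooseColumnSeparator(CultureInfo culture)
    {
        return culture.NumberFormat.NumberDecimalSeparator == "," ? ';' : ',';
    }

    // Formats one row of numbers with the culture's decimal separator
    // and the column separator chosen above.
    public static string FormatRow(double[] values, CultureInfo culture)
    {
        var separator = ChooseColumnSeparator(culture).ToString();
        return string.Join(separator, values.Select(v => v.ToString(culture)));
    }

    public static void Main()
    {
        var row = new[] { 3.5, 2.5, 5.4 };
        Console.WriteLine(FormatRow(row, new CultureInfo("fr-FR")));
        Console.WriteLine(FormatRow(row, new CultureInfo("en-US")));
    }
}
```

If the file is meant to be exchanged between machines with different settings, it is usually safer to pick one culture explicitly than to rely on the machine default.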
The most critical rule is that every row must contain the same number of data fields. Otherwise the table would not make sense and a CSV reader could not parse it reliably.
If you have an empty data field, you can just use an empty string.
1,,aykanat
2,john,smith
If you have text values in your CSV file, you might run into a problem when one of the values contains a comma: the field would be split at that comma and you would end up with an extra column in that row.
1,Hello, world!
In the above example, our first column is 1 and the second is Hello, world!, but a CSV reader would divide the row into three columns: 1, Hello and world!.
To solve this issue we must use quotation marks:
1,"Hello, world!"
This way we indicate that the string Hello, world! is a single data field.
You can also use quotation marks on single-word strings, but it is not necessary.
1,"murat","aykanat"
1,"john","smith"
We can also have actual quotation marks in our data fields. In this case, the field must be enclosed in quotation marks and every quotation mark inside it must be doubled, indicating that it is part of the data.
1,murat,"""aykanat"""
2,john,"""smith"""
That would read as: number is 1, name is murat, lastname is "aykanat".
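To make the quoting rules concrete from the reader's side, here is a minimal sketch of splitting one line into fields while honouring quoted fields and doubled quotation marks. CsvLineSketch and SplitCsvLine are hypothetical names, and the sketch assumes that a record never spans more than one line.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CsvLineSketch
{
    // Splits one CSV line into fields. Commas inside quoted fields are kept,
    // and a doubled quote inside a quoted field becomes one literal quote.
    public static List<string> SplitCsvLine(string line, char separator = ',')
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        var inQuotes = false;

        for (var i = 0; i < line.Length; i++)
        {
            var c = line[i];
            if (inQuotes)
            {
                if (c == '"' && i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"'); // doubled quote -> one literal quote
                    i++;                 // skip the second quote of the pair
                }
                else if (c == '"')
                {
                    inQuotes = false;    // closing quote
                }
                else
                {
                    current.Append(c);
                }
            }
            else if (c == '"')
            {
                inQuotes = true;         // opening quote
            }
            else if (c == separator)
            {
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString()); // the last field has no trailing separator
        return fields;
    }

    public static void Main()
    {
        foreach (var field in SplitCsvLine("1,\"Hello, world!\",\"\"\"aykanat\"\"\""))
        {
            Console.WriteLine(field);
        }
    }
}
```

A plain Split(',') call is simpler, but it only works as long as no field contains a comma or a quotation mark.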
You can also add headers to the columns.
id,name,lastname
1,murat,aykanat
1,john,smith
This is useful when the meaning of the columns is not clear from the data. For example, if all columns are numbers and you send the file to a person who does not know what the columns mean, he or she has no way of telling what those numbers represent. In those scenarios it is better to include headers that indicate what each column of values means.
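Since this guide is about picking data from public properties, a header row could be produced from the property names in the same spirit. The following is a rough sketch only: SampleRow, HeaderSketch and BuildHeader are hypothetical names, and the column order relies on reflection returning the properties in declaration order, which is the usual behaviour but is not guaranteed.

```csharp
using System;
using System.Linq;

// Hypothetical class, used only for this illustration.
public class SampleRow
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Lastname { get; set; }
}

public static class HeaderSketch
{
    // Builds a header line from the public property names of T.
    public static string BuildHeader<T>()
    {
        return string.Join(",", typeof(T).GetProperties().Select(p => p.Name));
    }

    public static void Main()
    {
        Console.WriteLine(BuildHeader<SampleRow>());
    }
}
```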
If you would like to know more about CSV file format, you can use the Wikipedia article about CSV and its resources.
Our idea is simple: we want to input an array of objects into our CSV writer and output a CSV file. For the reading part, we want to input the file path and get an array of objects back in memory.
Input
public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Lastname { get; set; }
}
Output
1,murat,aykanat
2,john,smith
However there are some considerations:
Optional considerations:
According to our considerations above, each class that we want to turn into CSV format needs a ToCsv() method. This can be done in three ways:
- Implementing an interface ICsvable with a method ToCsv()
- Overriding ToString()
- Deriving from an abstract base class CsvableBase having a virtual ToCsv() method
I didn't want to use the ToString() override because we may need it somewhere else in our code, and I want ToCsv() to be a separate method so it is clear what it does. Also, since the code will be the same for each class, I will go with the abstract class approach. However, if your particular class already inherits from another base class, you must go with the interface implementation, since a class can't inherit from more than one base class in C#. In that case you need to copy the ToCsv() code into each class you want to turn into CSV.
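For the interface route, one possible way to keep the copied code small is to move the reflection logic into a shared static helper and have each class delegate to it. This is a sketch under assumptions, not the approach the rest of this guide follows: CsvFormatter, LegacyEntity and Customer are hypothetical names, and the helper leaves out the pre-processing of commas and quotation marks.

```csharp
using System;
using System.Linq;

public interface ICsvable
{
    string ToCsv();
}

// Hypothetical shared helper, so implementers do not repeat the reflection code.
public static class CsvFormatter
{
    public static string Format(object source)
    {
        // Convert.ToString returns an empty string for null values.
        var values = source.GetType().GetProperties()
            .Select(p => Convert.ToString(p.GetValue(source)));
        return string.Join(",", values);
    }
}

// Stands in for a base class that the entity already has to inherit from.
public class LegacyEntity { }

public class Customer : LegacyEntity, ICsvable
{
    public int Id { get; set; }
    public string Name { get; set; }

    // The only code each class has to repeat is this one-line delegation.
    public string ToCsv()
    {
        return CsvFormatter.Format(this);
    }
}

public static class InterfaceSketch
{
    public static void Main()
    {
        Console.WriteLine(new Customer { Id = 1, Name = "murat" }.ToCsv());
    }
}
```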
What we need to do here is use reflection to get all the values of public properties of the class.
public abstract class CsvableBase
{
    public virtual string ToCsv()
    {
        string output = "";

        var properties = GetType().GetProperties();

        for (var i = 0; i < properties.Length; i++)
        {
            output += properties[i].GetValue(this).ToString();
            if (i != properties.Length - 1)
            {
                output += ",";
            }
        }

        return output;
    }
}
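So what we do here is simple:
- Get all public properties of the class as an array of PropertyInfo.
- Iterate over them and append the value of each property to the output.
- Add a comma after every value except the last one.
Let's test this on the Person class we created earlier in a console application.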
public class Person : CsvableBase
{
    public Person(int id, string name, string lastname)
    {
        Id = id;
        Name = name;
        Lastname = lastname;
    }

    public int Id { get; set; }
    public string Name { get; set; }
    public string Lastname { get; set; }
}

class Program
{
    static void Main(string[] args)
    {
        var p = new Person(1, "murat", "aykanat");
        Console.WriteLine(p.ToCsv());
        Console.ReadLine();
    }
}
Output of above code:
1,murat,aykanat
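One thing to keep in mind with this first version is that calling ToString() on a property whose value is null throws a NullReferenceException. A minimal sketch of a null-tolerant variant is shown below. It writes null as an empty field, in line with the empty-field rule described earlier. Contact and NullSafeCsvSketch are hypothetical names, and the logic lives in a static helper only to keep the example self-contained.

```csharp
using System;
using System.Linq;

// Hypothetical class, used only for this illustration.
public class Contact
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Lastname { get; set; }
}

public static class NullSafeCsvSketch
{
    // Same idea as ToCsv(), but a null property value becomes an empty field.
    public static string ToCsv(object source)
    {
        var values = source.GetType().GetProperties()
            .Select(p => p.GetValue(source))
            .Select(v => v == null ? "" : v.ToString());
        return string.Join(",", values);
    }

    public static void Main()
    {
        // Name is never assigned, so it stays null.
        var contact = new Contact { Id = 1, Lastname = "aykanat" };
        Console.WriteLine(ToCsv(contact));
    }
}
```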
To pre-process commas, quotation marks, and special characters, let's add a pre-processing method to our base class.
public abstract class CsvableBase
{
    public virtual string ToCsv()
    {
        string output = "";

        var properties = GetType().GetProperties();

        for (var i = 0; i < properties.Length; i++)
        {
            output += PreProcess(properties[i].GetValue(this).ToString());
            if (i != properties.Length - 1)
            {
                output += ",";
            }
        }

        return output;
    }

    private string PreProcess(string input)
    {
        // Replace Turkish characters with ASCII equivalents
        // and double any quotation marks.
        input = input.Replace('ı', 'i')
            .Replace('ç', 'c')
            .Replace('ö', 'o')
            .Replace('ş', 's')
            .Replace('ü', 'u')
            .Replace('ğ', 'g')
            .Replace('İ', 'I')
            .Replace('Ç', 'C')
            .Replace('Ö', 'O')
            .Replace('Ş', 'S')
            .Replace('Ü', 'U')
            .Replace('Ğ', 'G')
            .Replace("\"", "\"\"")
            .Trim();
        // A field that contains a comma or a quotation mark
        // must be enclosed in quotation marks.
        if (input.Contains(",") || input.Contains("\""))
        {
            input = "\"" + input + "\"";
        }
        return input;
    }
}
Let's change our Person object into something that can test our pre-processing method.
class Program
{
    static void Main(string[] args)
    {
        var p = new Person(1, "\"Hello\", world!", "İĞÜÇÖıüşöç");
        Console.WriteLine(p.ToCsv());
        Console.ReadLine();
    }
}
Output of this code:
1,"""Hello"", world!",IGUCOiusoc
Sometimes you may not need all the public properties in a class. So we need to ignore or filter properties either by their name or their index in the list of properties we acquire by reflection.
But what happens in the edge cases such as the beginning or the end? How do we manage commas?
We just need to modify our code a bit to achieve what we want. Let's add the following methods to our CsvableBase class.
// Both methods use Contains() on arrays, which requires: using System.Linq;
public virtual string ToCsv(string[] propertyNames, bool isIgnore)
{
    string output = "";
    bool isFirstPropertyWritten = false;

    var properties = GetType().GetProperties();

    for (var i = 0; i < properties.Length; i++)
    {
        if (isIgnore)
        {
            if (!propertyNames.Contains(properties[i].Name))
            {
                if (isFirstPropertyWritten)
                {
                    output += ",";
                }
                output += PreProcess(properties[i].GetValue(this).ToString());
                isFirstPropertyWritten = true;
            }
        }
        else
        {
            if (propertyNames.Contains(properties[i].Name))
            {
                if (isFirstPropertyWritten)
                {
                    output += ",";
                }
                output += PreProcess(properties[i].GetValue(this).ToString());
                isFirstPropertyWritten = true;
            }
        }
    }

    return output;
}

public virtual string ToCsv(int[] propertyIndexes, bool isIgnore)
{
    string output = "";
    bool isFirstPropertyWritten = false;

    var properties = GetType().GetProperties();

    for (var i = 0; i < properties.Length; i++)
    {
        if (isIgnore)
        {
            if (!propertyIndexes.Contains(i))
            {
                if (isFirstPropertyWritten)
                {
                    output += ",";
                }
                output += PreProcess(properties[i].GetValue(this).ToString());
                isFirstPropertyWritten = true;
            }
        }
        else
        {
            if (propertyIndexes.Contains(i))
            {
                if (isFirstPropertyWritten)
                {
                    output += ",";
                }
                output += PreProcess(properties[i].GetValue(this).ToString());
                isFirstPropertyWritten = true;
            }
        }
    }

    return output;
}
Here, we:
- Define a flag isFirstPropertyWritten to see if we have written the first property yet.
- Get the properties as an array of PropertyInfo using reflection.
- Decide whether to write each property according to the isIgnore flag.
- Once the first property is written, set isFirstPropertyWritten so that we keep adding commas before the following values.
class Program
{
    static void Main(string[] args)
    {
        var p = new Person(1, "murat", "aykanat");
        Console.WriteLine("Ignore by property name:");
        Console.WriteLine("Ignoring Id property: "
            + p.ToCsv(new[] { "Id" }, true));
        Console.WriteLine("Ignoring Name property: "
            + p.ToCsv(new[] { "Name" }, true));
        Console.WriteLine("Ignoring Lastname property: "
            + p.ToCsv(new[] { "Lastname" }, true));
        Console.WriteLine("Ignore by property index:");
        Console.WriteLine("Ignoring 0->Id and 2->Lastname: "
            + p.ToCsv(new[] { 0, 2 }, true));
        Console.WriteLine("Ignoring everything but Id: "
            + p.ToCsv(new[] { "Id" }, false));
        Console.ReadLine();
    }
}
Output is:
Ignore by property name:
Ignoring Id property: murat,aykanat
Ignoring Name property: 1,aykanat
Ignoring Lastname property: 1,murat
Ignore by property index:
Ignoring 0->Id and 2->Lastname: murat
Ignoring everything but Id: 1
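The two branches of the isIgnore check can also be read as a single comparison: a property is written exactly when "it is in the list" and isIgnore disagree. The sketch below illustrates only that decision on plain property names. PropertyFilterSketch and SelectNames are hypothetical names, and joining the result with string.Join takes care of the commas at the beginning and the end.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PropertyFilterSketch
{
    // Returns the property names that would be written for the given filter.
    public static List<string> SelectNames(string[] allNames, string[] propertyNames, bool isIgnore)
    {
        // isIgnore == true  -> keep names that are NOT listed
        // isIgnore == false -> keep names that ARE listed
        return allNames.Where(name => propertyNames.Contains(name) != isIgnore).ToList();
    }

    public static void Main()
    {
        var all = new[] { "Id", "Name", "Lastname" };
        Console.WriteLine(string.Join(",", SelectNames(all, new[] { "Id" }, true)));
        Console.WriteLine(string.Join(",", SelectNames(all, new[] { "Id" }, false)));
    }
}
```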
So far we only used simple types such as int and string as our properties. However, what happens if we have a property whose type is also derived from CsvableBase?
Below is an example of such a scenario:
public class Address : CsvableBase
{
    public Address(string city, string country)
    {
        City = city;
        Country = country;
    }

    public string City { get; set; }
    public string Country { get; set; }
}

public class Person : CsvableBase
{
    public Person(int id, string name, string lastname, Address address)
    {
        Id = id;
        Name = name;
        Lastname = lastname;
        Address = address;
    }

    public int Id { get; set; }
    public string Name { get; set; }
    public string Lastname { get; set; }
    public Address Address { get; set; }
}
The idea is that, while iterating over the properties in our code, we need to detect whether the type of the property derives from CsvableBase. If it does, we call the ToCsv() method of that object instance by using reflection.
To achieve this, we need to modify our ToCsv() methods:
public virtual string ToCsv()
{
    string output = "";

    var properties = GetType().GetProperties();

    for (var i = 0; i < properties.Length; i++)
    {
        if (properties[i].PropertyType.IsSubclassOf(typeof(CsvableBase)))
        {
            // Nested object: call its own ToCsv() through reflection.
            var m = properties[i].PropertyType.GetMethod("ToCsv", new Type[0]);
            output += m.Invoke(properties[i].GetValue(this), new object[0]);
        }
        else
        {
            output += PreProcess(properties[i].GetValue(this).ToString());
        }
        if (i != properties.Length - 1)
        {
            output += ",";
        }
    }

    return output;
}

public virtual string ToCsv(string[] propertyNames, bool isIgnore)
{
    string output = "";
    bool isFirstPropertyWritten = false;

    var properties = GetType().GetProperties();

    for (var i = 0; i < properties.Length; i++)
    {
        if (isIgnore)
        {
            if (!propertyNames.Contains(properties[i].Name))
            {
                if (isFirstPropertyWritten)
                {
                    output += ",";
                }

                if (properties[i].PropertyType.IsSubclassOf(typeof(CsvableBase)))
                {
                    var m = properties[i].PropertyType.GetMethod("ToCsv", new Type[0]);
                    output += m.Invoke(properties[i].GetValue(this), new object[0]);
                }
                else
                {
                    output += PreProcess(properties[i].GetValue(this).ToString());
                }

                isFirstPropertyWritten = true;
            }
        }
        else
        {
            if (propertyNames.Contains(properties[i].Name))
            {
                if (isFirstPropertyWritten)
                {
                    output += ",";
                }

                if (properties[i].PropertyType.IsSubclassOf(typeof(CsvableBase)))
                {
                    var m = properties[i].PropertyType.GetMethod("ToCsv", new Type[0]);
                    output += m.Invoke(properties[i].GetValue(this), new object[0]);
                }
                else
                {
                    output += PreProcess(properties[i].GetValue(this).ToString());
                }

                isFirstPropertyWritten = true;
            }
        }
    }

    return output;
}

public virtual string ToCsv(int[] propertyIndexes, bool isIgnore)
{
    string output = "";
    bool isFirstPropertyWritten = false;

    var properties = GetType().GetProperties();

    for (var i = 0; i < properties.Length; i++)
    {
        if (isIgnore)
        {
            if (!propertyIndexes.Contains(i))
            {
                if (isFirstPropertyWritten)
                {
                    output += ",";
                }

                if (properties[i].PropertyType.IsSubclassOf(typeof(CsvableBase)))
                {
                    var m = properties[i].PropertyType.GetMethod("ToCsv", new Type[0]);
                    output += m.Invoke(properties[i].GetValue(this), new object[0]);
                }
                else
                {
                    output += PreProcess(properties[i].GetValue(this).ToString());
                }

                isFirstPropertyWritten = true;
            }
        }
        else
        {
            if (propertyIndexes.Contains(i))
            {
                if (isFirstPropertyWritten)
                {
                    output += ",";
                }

                if (properties[i].PropertyType.IsSubclassOf(typeof(CsvableBase)))
                {
                    var m = properties[i].PropertyType.GetMethod("ToCsv", new Type[0]);
                    output += m.Invoke(properties[i].GetValue(this), new object[0]);
                }
                else
                {
                    output += PreProcess(properties[i].GetValue(this).ToString());
                }

                isFirstPropertyWritten = true;
            }
        }
    }

    return output;
}
Now let's try our modified code:
class Program
{
    static void Main(string[] args)
    {
        var p = new Person(1, "murat", "aykanat",
            new Address("city1", "country1"));
        Console.WriteLine(p.ToCsv());
        Console.ReadLine();
    }
}
Output is displayed as:
1,murat,aykanat,city1,country1
Ignoring properties of nested objects would work the same way as before, but you would need to pass the parameters along when invoking the method through reflection. Since I want to keep things simple at this point, I will not add the ignore feature for properties that derive from CsvableBase.
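For readers who want to try it anyway, passing parameters through reflection could look roughly like the sketch below. It is not part of the CsvableBase class built in this guide: Location and ReflectionInvokeSketch are hypothetical names, and Location carries a simplified ToCsv(string[], bool) of its own so that the example is self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical stand-in for a nested class with a filtering ToCsv overload.
public class Location
{
    public string City { get; set; }
    public string Country { get; set; }

    public string ToCsv(string[] propertyNames, bool isIgnore)
    {
        var parts = new List<string>();
        foreach (var property in GetType().GetProperties())
        {
            var isListed = Array.IndexOf(propertyNames, property.Name) >= 0;
            if (isListed != isIgnore)
            {
                parts.Add(Convert.ToString(property.GetValue(this)));
            }
        }
        return string.Join(",", parts);
    }
}

public static class ReflectionInvokeSketch
{
    public static string InvokeToCsv(object instance, string[] propertyNames, bool isIgnore)
    {
        // Select the overload by its parameter types...
        MethodInfo method = instance.GetType().GetMethod(
            "ToCsv", new[] { typeof(string[]), typeof(bool) });
        // ...and pass the arguments in the same order.
        return (string)method.Invoke(instance, new object[] { propertyNames, isIgnore });
    }

    public static void Main()
    {
        var location = new Location { City = "city1", Country = "country1" };
        Console.WriteLine(InvokeToCsv(location, new[] { "Country" }, true));
    }
}
```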
Since we laid the groundwork in CsvableBase, CsvWriter itself is very simple:
using System.Collections.Generic;
using System.IO;
using System.Linq;

public class CsvWriter<T> where T : CsvableBase
{
    public void Write(IEnumerable<T> objects, string destination)
    {
        var objs = objects as IList<T> ?? objects.ToList();
        if (objs.Any())
        {
            using (var sw = new StreamWriter(destination))
            {
                foreach (var obj in objs)
                {
                    sw.WriteLine(obj.ToCsv());
                }
            }
        }
    }

    public void Write(IEnumerable<T> objects, string destination,
        string[] propertyNames, bool isIgnore)
    {
        var objs = objects as IList<T> ?? objects.ToList();
        if (objs.Any())
        {
            using (var sw = new StreamWriter(destination))
            {
                foreach (var obj in objs)
                {
                    sw.WriteLine(obj.ToCsv(propertyNames, isIgnore));
                }
            }
        }
    }

    public void Write(IEnumerable<T> objects, string destination,
        int[] propertyIndexes, bool isIgnore)
    {
        var objs = objects as IList<T> ?? objects.ToList();
        if (objs.Any())
        {
            using (var sw = new StreamWriter(destination))
            {
                foreach (var obj in objs)
                {
                    sw.WriteLine(obj.ToCsv(propertyIndexes, isIgnore));
                }
            }
        }
    }
}
Now let's try our code with our initial example:
class Program
{
    static void Main(string[] args)
    {
        var people = new List<Person>
        {
            new Person(1, "murat", "aykanat",
                new Address("city1", "country1")),
            new Person(2, "john", "smith",
                new Address("city2", "country2"))
        };

        var cw = new CsvWriter<Person>();
        cw.Write(people, "example.csv");
    }
}
If we check our application folder, we can see our newly created file.
If you open the file, you can see the output as expected:
1,murat,aykanat,city1,country1
2,john,smith,city2,country2
Now, we have to reverse the process to read from our CSV file. To do this, we will follow what we did while writing. Just as we implemented ToCsv(), we need another method in the CsvableBase class that reverses the process. Let's call this method AssignValuesFromCsv(). Maybe not the most creative name, but we will go with that for now.
In this method we will mirror what we did in the ToCsv() method. First, we will check whether the current property is derived from CsvableBase or not. After that, we will save the data back to the public properties.
public virtual void AssignValuesFromCsv(string[] propertyValues)
{
    var properties = GetType().GetProperties();

    // Position in propertyValues. It is tracked separately from i because
    // a nested object consumes more than one value.
    var valueIndex = 0;

    for (var i = 0; i < properties.Length; i++)
    {
        if (properties[i].PropertyType.IsSubclassOf(typeof(CsvableBase)))
        {
            var instance = Activator.CreateInstance(properties[i].PropertyType);
            var instanceProperties = instance.GetType().GetProperties();
            var propertyList = new List<string>();

            for (var j = 0; j < instanceProperties.Length; j++)
            {
                propertyList.Add(propertyValues[valueIndex + j]);
            }
            var m = instance.GetType().GetMethod("AssignValuesFromCsv",
                new Type[] { typeof(string[]) });
            m.Invoke(instance, new object[] { propertyList.ToArray() });
            properties[i].SetValue(this, instance);

            valueIndex += instanceProperties.Length;
        }
        else
        {
            var type = properties[i].PropertyType.Name;
            switch (type)
            {
                case "Int32":
                    properties[i].SetValue(this,
                        int.Parse(propertyValues[valueIndex]));
                    break;
                default:
                    properties[i].SetValue(this, propertyValues[valueIndex]);
                    break;
            }
            valueIndex++;
        }
    }
}
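Here we:
- Check whether the type of the property derives from CsvableBase.
- If it does, create an instance of that type, collect the values that belong to it, and call its AssignValuesFromCsv() with its properties.
- If the property does not derive from CsvableBase, just assign the value to the property according to the switch.
You may notice we don't have float, double or char in our switch statement. That's because in this example we only have int and string, so I didn't want to make the class bigger.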
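If you do need more types, one option is to let the framework do the conversion instead of adding a case per type. The following is a minimal sketch that assumes Convert.ChangeType with the invariant culture is acceptable for your data. ValueConversionSketch and ConvertValue are hypothetical names, and types that do not implement IConvertible conversions from string (for example enums, Guid or nullable types) would still need their own handling.

```csharp
using System;
using System.Globalization;

public static class ValueConversionSketch
{
    // Converts the text of one CSV field to the type of the target property.
    public static object ConvertValue(string text, Type targetType)
    {
        if (targetType == typeof(string))
        {
            return text; // nothing to convert
        }

        // Handles int, double, float, char, bool and similar simple types.
        // The invariant culture makes "3.5" parse the same on every machine.
        return Convert.ChangeType(text, targetType, CultureInfo.InvariantCulture);
    }

    public static void Main()
    {
        Console.WriteLine(ConvertValue("42", typeof(int)));
        Console.WriteLine(ConvertValue("3.5", typeof(double)));
        Console.WriteLine(ConvertValue("x", typeof(char)));
    }
}
```

Inside AssignValuesFromCsv(), the switch could then be replaced by a single SetValue call that passes the converted value and the property's PropertyType.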
So, now we have to iterate over the lines of the file and create the objects via our CsvReader class.
using System.Collections.Generic;
using System.IO;

public class CsvReader<T> where T : CsvableBase, new()
{
    public IEnumerable<T> Read(string filePath, bool hasHeaders)
    {
        var objects = new List<T>();
        using (var sr = new StreamReader(filePath))
        {
            // If the file has no header row, treat the headers as already read
            // so that the first line is parsed as data.
            bool headersRead = !hasHeaders;
            string line;
            do
            {
                line = sr.ReadLine();

                if (line != null && headersRead)
                {
                    var obj = new T();
                    // Note: a plain Split does not handle quoted fields
                    // that contain commas.
                    var propertyValues = line.Split(',');
                    obj.AssignValuesFromCsv(propertyValues);
                    objects.Add(obj);
                }
                if (!headersRead)
                {
                    headersRead = true;
                }
            } while (line != null);
        }

        return objects;
    }
}
Remember when I wrote that we may need the ToString() override somewhere? Well, now we need it to print Person and Address objects. Also, we need to add a parameterless constructor to each class for CsvReader to work.
public class Address : CsvableBase
{
    public Address()
    {
    }

    public Address(string city, string country)
    {
        City = city;
        Country = country;
    }

    public string City { get; set; }
    public string Country { get; set; }

    public override string ToString()
    {
        return " " + City + " / " + Country;
    }
}

public class Person : CsvableBase
{
    public Person()
    {
    }

    public Person(int id, string name, string lastname, Address address)
    {
        Id = id;
        Name = name;
        Lastname = lastname;
        Address = address;
    }

    public int Id { get; set; }
    public string Name { get; set; }
    public string Lastname { get; set; }
    public Address Address { get; set; }

    public override string ToString()
    {
        return Name + " " + Lastname + " " + Address;
    }
}
Let's try our code:
class Program
{
    static void Main(string[] args)
    {
        var people = new List<Person>
        {
            new Person(1, "murat", "aykanat", new Address("city1", "country1")),
            new Person(2, "john", "smith", new Address("city2", "country2"))
        };

        var cw = new CsvWriter<Person>();
        cw.Write(people, "example.csv");

        // The writer above does not write a header row, so hasHeaders is false.
        var cr = new CsvReader<Person>();
        var csvPeople = cr.Read("example.csv", false);
        foreach (var person in csvPeople)
        {
            Console.WriteLine(person);
        }

        Console.ReadLine();
    }
}
Output:
murat aykanat  city1 / country1
john smith  city2 / country2
In this guide, I explained how you can develop your very own CSV writer and reader classes. We used reflection to extract properties from classes and process them as needed, so we can plug any class we want into our reader and writer. One of the benefits of writing your own CSV processing classes is that you can modify them whenever you need different features, so you don't get stuck with third-party libraries.
I hope this guide will be useful for your projects.
Happy coding!