Murat Aykanat

Building a Generic CSV Writer/Reader Using Reflection

  • Dec 15, 2018
  • 29 Min read
  • 42,591 Views
Microsoft.NET

Introduction

The CSV (Comma-Separated Values) file format is a very common way of storing datasets in a simple and portable form. Since it is plain text, it is also very easy to create such a file programmatically. There are libraries for this task, but if you want to customize the behavior or create something specific, writing your own implementation gives you full control.

In this guide, I will explain how to write a generic CSV Writer/Reader that will automatically pick data from the public properties of the input objects and generate a CSV file.

Before we dive into the code, let me explain what the CSV file format is so that we have a better understanding of what we are dealing with.

What is a CSV File?

A CSV file is a plain text file that holds data in table format. Each line is like a row in a table, and columns are separated by commas, hence the name comma-separated values.

Below is an example of such a file, containing a table of numbers, first names, and last names of people:

1,murat,aykanat
2,john,smith
csv

CSV format is very useful because it is a text file and any operating system can read it. For example, if your application creates the CSV file on a Windows machine, you can open it up and use it on a Linux machine. So it is a very portable and easily readable file format.

Formatting of a CSV File

Separator Issues

In some cases, the file may not be comma-separated. Especially if you are using a third-party program (e.g. Microsoft Excel), the separator might be a different character, such as ;, depending on the culture of your machine. This is because the decimal separator differs between cultures. In some cultures, the decimal separator is . so the CSV separator can be ,. But in other cultures, the decimal separator is , so the CSV file has to use ; as the separator.

For example, if your locale is set to a European culture such as fr-FR, the default decimal separator becomes , and you need to use ; as the column separator in the CSV file:

3,5;2,5;5,4
4,5;6,7;8,9
csv

However, on a machine that has en-US set as the default, the decimal separator is . so the same CSV file would look like this:

3.5,2.5,5.4
4.5,6.7,8.9
csv
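
As a rough illustration of the separator rule, the sketch below picks the column separator from the decimal separator of a number format. CsvRowFormatter and FormatRow are hypothetical names that are not part of the code developed later in this guide, and the rule "use ; whenever the decimal separator is ," is a simplifying assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class CsvRowFormatter
{
    // Hypothetical helper: formats each number with the given number format
    // and joins the results with ';' when the decimal separator is ',',
    // otherwise with ','.
    public static string FormatRow(IEnumerable<double> values, NumberFormatInfo format)
    {
        string separator = format.NumberDecimalSeparator == "," ? ";" : ",";
        return string.Join(separator, values.Select(v => v.ToString(format)));
    }
}
```

Passing a NumberFormatInfo whose NumberDecimalSeparator is "," should give a row in the first style above, while NumberFormatInfo.InvariantInfo gives the second. The NumberFormat property of a CultureInfo can be passed in the same way.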

Number of Data Fields in Each Row

The most critical rule is that every row must contain the same number of data fields. Otherwise, the file would not make sense as a table, and a CSV reader could not reliably read it.

If you have an empty data field, you can just use an empty string.

1,,aykanat
2,john,smith
csv
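
To make this rule concrete, here is a minimal sketch of a check that every line has the same number of fields. CsvShapeCheck is a hypothetical name, and the check uses a plain Split(','), so it assumes that no field contains a comma inside quotation marks.

```csharp
using System;
using System.Linq;

public static class CsvShapeCheck
{
    // Hypothetical helper: returns true when every line splits into the same
    // number of fields as the first line. An empty field still counts as a field.
    public static bool HasConsistentFieldCount(string[] lines)
    {
        if (lines.Length == 0)
        {
            return true;
        }

        int expected = lines[0].Split(',').Length;
        return lines.All(line => line.Split(',').Length == expected);
    }
}
```

For the two rows above, the empty name still counts as a field, so both rows have three fields and the check passes.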

Comma in a Data Field

If you have text values in your CSV file, you might run into a problem when one of them contains a comma: the field would be split at that comma, and you would end up with an extra column in that row.

1, Hello, world!
csv

In the above example, our first column is 1 and the second is Hello, world!. However, a CSV reader would divide the row into 3 columns: 1, Hello and world!.

To solve this issue, we must use quotation marks:

1,"Hello, world!"
csv

This way we indicate that the string Hello, world! is a single data field.

You can also use quotation marks around single-word strings, but it is not necessary.

1,"murat","aykanat"
1,"john","smith"
csv

Quotation Marks in a Data Field

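We can also have actual quotation marks in our data fields. In this case, we need to double each quotation mark to indicate that it is part of the data field.

1,murat,""aykanat""
2,john,""smith""
csv

This reads as: the number is 1, the name is murat, and the last name is "aykanat". Note that stricter parsers, such as those following RFC 4180, also expect a field that contains quotation marks to be wrapped in quotation marks as a whole, for example """aykanat""".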
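
The quoting rules from the last two sections can be collected into one small function. The sketch below is only an illustration, not the pre-processing method used later in this guide: CsvField.Escape is a hypothetical name, and it assumes the stricter convention (as in RFC 4180) of also wrapping a field in quotation marks when the field itself contains a quotation mark or a line break.

```csharp
using System;

public static class CsvField
{
    // Hypothetical helper: doubles every quotation mark, then wraps the field
    // in quotation marks when it contains a comma, a quotation mark, or a
    // line break. Other fields are returned unchanged.
    public static string Escape(string field)
    {
        bool needsQuotes = field.Contains(",") || field.Contains("\"")
                           || field.Contains("\n") || field.Contains("\r");
        string escaped = field.Replace("\"", "\"\"");
        return needsQuotes ? "\"" + escaped + "\"" : escaped;
    }
}
```

With this sketch, a plain value such as murat is left as it is, while Hello, world! is wrapped in quotation marks because of its comma.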

Headers

You can also add headers to the columns.

id,name,lastname
1,murat,aykanat
1,john,smith
csv

This is useful when the meaning of the columns is not clear from the data itself. For example, if the columns are all numbers and you send the file to someone who does not know what the columns mean, it would be very confusing for that person. In such scenarios, it is better to include headers that indicate what each column of values means.

More Details

If you would like to know more about CSV file format, you can use the Wikipedia article about CSV and its resources.

The Theory

Our idea is simple: we want to input an array of objects into our CSV Writer and output a CSV file. For the reading part, we want to input the file path and get an array of objects back in memory.

Input

public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Lastname { get; set; }
}
csharp

Output

1,murat,aykanat
2,john,smith
csv

However, there are some considerations:

  • The CSV Writer and Reader must abide by the rules of the CSV file format mentioned above.
  • The process of writing must be automated. If a class has 2 public properties, it is fairly easy to write them to a file by hand. But what if we have 100 or 10,000 public properties? We can't write them one by one.
  • Every object should output its public properties via a method.

Optional considerations:

  • Since some CSV readers out there do not support UTF-8 encoding, all "special" characters must be converted into their ASCII counterparts where possible. This is not possible in every language; however, since I will use Turkish special characters as the example, it is possible in this guide.
  • Unless explicitly enclosed in quotation marks, leading and trailing spaces must be trimmed. In my experience, such spaces are usually a mistake by the person who entered the values, especially if the values were copied from another application. If we absolutely need leading or trailing spaces, we can use quotation marks.
  • Properties can be ignored by their index or their name.

CSV Writer

According to the considerations above, every class that we want to turn into CSV format needs a ToCsv() method. This can be done in 3 ways:

  • Implementing an interface, ICsvable, with a ToCsv() method
  • Overriding ToString()
  • Inheriting from an abstract base class, CsvableBase, that has a virtual ToCsv() method

I didn't want to use the ToString() override because we may need it somewhere else in our code, and I want ToCsv() to be a separate method so it is clear what it does. Also, since the code will be the same for each class, I will go with the abstract class. However, if your class already inherits from another base class, you must go with the interface implementation, since a class can't inherit from more than one base class in C#. In that case, you need to copy and paste the ToCsv() implementation into each class you want to turn into CSV.
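
If you do end up on the interface route, one possible variation is to keep the reflection logic in a shared static helper so that each class only forwards to it. This is a sketch under my own assumptions rather than the approach used in the rest of this guide: CsvFormatter, Entity and Customer are hypothetical names, and the helper has none of the pre-processing we add later.

```csharp
using System;
using System.Linq;

// Interface for classes that already have a base class.
public interface ICsvable
{
    string ToCsv();
}

public static class CsvFormatter
{
    // Hypothetical shared helper: joins the values of all public properties
    // with commas. A null value becomes an empty field.
    public static string Format(object source)
    {
        var values = source.GetType().GetProperties()
            .Select(p => p.GetValue(source)?.ToString() ?? "");
        return string.Join(",", values);
    }
}

// Stands in for some existing base class that cannot be changed.
public class Entity { }

public class Customer : Entity, ICsvable
{
    public int Id { get; set; }
    public string Name { get; set; }

    // The only code each implementing class has to repeat.
    public string ToCsv()
    {
        return CsvFormatter.Format(this);
    }
}
```

The trade-off is that the formatting logic lives outside the class hierarchy, so it cannot be overridden per class the way a virtual method can.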

Basic Implementation

What we need to do here is use reflection to get all the values of public properties of the class.

public abstract class CsvableBase
{
    public virtual string ToCsv()
    {
        string output = "";

        // Reflection: all public properties of the concrete class.
        var properties = GetType().GetProperties();

        for (var i = 0; i < properties.Length; i++)
        {
            // A null value is written as an empty field.
            output += properties[i].GetValue(this)?.ToString() ?? "";
            if (i != properties.Length - 1)
            {
                output += ",";
            }
        }

        return output;
    }
}
csharp

So what we do here is simple:

  • Get all public properties of this class as a collection of PropertyInfo.
  • Iterate over them, get their value and add it to the output and put a comma.
  • If we reach the end, do not put a comma.

Let's test this on the Person class we created earlier in a console application.

using System;

public class Person : CsvableBase
{
    public Person(int id, string name, string lastname)
    {
        Id = id;
        Name = name;
        Lastname = lastname;
    }

    public int Id { get; set; }
    public string Name { get; set; }
    public string Lastname { get; set; }
}
class Program
{
    static void Main(string[] args)
    {
        var p = new Person(1, "murat", "aykanat");
        Console.WriteLine(p.ToCsv());
        Console.ReadLine();
    }
}
csharp

Output of above code:

1,murat,aykanat
text

Comma, Quotation Marks and Special Characters

To pre-process commas, quotation marks, and special characters, let's add a pre-processing method to our base class.

public abstract class CsvableBase
{
    public virtual string ToCsv()
    {
        string output = "";

        var properties = GetType().GetProperties();

        for (var i = 0; i < properties.Length; i++)
        {
            // A null value is written as an empty field.
            output += PreProcess(properties[i].GetValue(this)?.ToString() ?? "");
            if (i != properties.Length - 1)
            {
                output += ",";
            }
        }

        return output;
    }
    private string PreProcess(string input)
    {
        // Map Turkish special characters to ASCII, double any quotation
        // marks, and trim leading and trailing spaces.
        input = input.Replace('ı', 'i')
            .Replace('ç', 'c')
            .Replace('ö', 'o')
            .Replace('ş', 's')
            .Replace('ü', 'u')
            .Replace('ğ', 'g')
            .Replace('İ', 'I')
            .Replace('Ç', 'C')
            .Replace('Ö', 'O')
            .Replace('Ş', 'S')
            .Replace('Ü', 'U')
            .Replace('Ğ', 'G')
            .Replace("\"", "\"\"")
            .Trim();
        // A field that contains a comma must be wrapped in quotation marks.
        if (input.Contains(","))
        {
            input = "\"" + input + "\"";
        }
        return input;
    }
}
csharp

Let's change our Person object into something that can test our pre-processing method.

class Program
{
    static void Main(string[] args)
    {
        // The name contains quotation marks and a comma; the last name
        // contains Turkish special characters.
        var p = new Person(1, "\"Hello\", world!", "İĞÜÇÖıüşöç");
        Console.WriteLine(p.ToCsv());
        Console.ReadLine();
    }
}
csharp

Output of this code:

1,"""Hello"", world!",IGUCOiusoc
text

Ignoring Properties

Sometimes you may not need all the public properties of a class. So we need to ignore or filter properties, either by their name or by their index in the list of properties we acquire through reflection.

But what happens in edge cases such as the first or the last property? How do we manage the commas?

We just need to modify our code a bit to achieve what we want. Let's add the following methods to our CsvableBase class.

// Requires: using System.Linq; (for Contains on arrays)
public virtual string ToCsv(string[] propertyNames, bool isIgnore)
{
    string output = "";
    bool isFirstPropertyWritten = false;

    var properties = GetType().GetProperties();

    for (var i = 0; i < properties.Length; i++)
    {
        // Ignore mode skips listed properties; filter mode skips unlisted ones.
        bool isListed = propertyNames.Contains(properties[i].Name);
        if (isListed == isIgnore)
        {
            continue;
        }

        // Every value after the first one is preceded by a comma.
        if (isFirstPropertyWritten)
        {
            output += ",";
        }

        output += PreProcess(properties[i].GetValue(this)?.ToString() ?? "");
        isFirstPropertyWritten = true;
    }

    return output;
}

public virtual string ToCsv(int[] propertyIndexes, bool isIgnore)
{
    string output = "";
    bool isFirstPropertyWritten = false;

    var properties = GetType().GetProperties();

    for (var i = 0; i < properties.Length; i++)
    {
        // Same logic as above, but properties are selected by index.
        bool isListed = propertyIndexes.Contains(i);
        if (isListed == isIgnore)
        {
            continue;
        }

        if (isFirstPropertyWritten)
        {
            output += ",";
        }

        output += PreProcess(properties[i].GetValue(this)?.ToString() ?? "");
        isFirstPropertyWritten = true;
    }

    return output;
}
csharp

Here, we:

  • Define a boolean isFirstPropertyWritten to track whether we have written the first property yet.
  • Get all the properties as PropertyInfo using reflection.
  • Iterate over the properties, checking whether the current property name or index is in the list, which acts as an ignore list or a filter list depending on the isIgnore flag.
  • If the property should be written, we check whether the first property has already been written.
  • If it has, we add a comma to output before the new value.
  • Preprocess and add the current property's value to output.
  • Set isFirstPropertyWritten so that we will keep adding commas.

Test our code

class Program
{
    static void Main(string[] args)
    {
        var p = new Person(1, "murat", "aykanat");
        Console.WriteLine("Ignore by property name:");
        Console.WriteLine("Ignoring Id property: "
            + p.ToCsv(new[] { "Id" }, true));
        Console.WriteLine("Ignoring Name property: "
            + p.ToCsv(new[] { "Name" }, true));
        Console.WriteLine("Ignoring Lastname property: "
            + p.ToCsv(new[] { "Lastname" }, true));
        Console.WriteLine("Ignore by property index:");
        Console.WriteLine("Ignoring 0->Id and 2->Lastname: "
            + p.ToCsv(new[] { 0, 2 }, true));
        Console.WriteLine("Ignoring everything but Id: "
            + p.ToCsv(new[] { "Id" }, false));
        Console.ReadLine();
    }
}
csharp

Output is:

Ignore by property name:
Ignoring Id property: murat,aykanat
Ignoring Name property: 1,aykanat
Ignoring Lastname property: 1,murat
Ignore by property index:
Ignoring 0->Id and 2->Lastname: murat
Ignoring everything but Id: 1
text
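
Passing names or indexes at every call site is one way to ignore properties. Another option, which this guide does not use, is to mark the properties on the class itself with a custom attribute. The sketch below assumes such an attribute: CsvIgnoreAttribute, AttributeFilter and Account are hypothetical names, and the helper leaves out the pre-processing step to stay short.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical marker attribute; it is not part of CsvableBase in this guide.
[AttributeUsage(AttributeTargets.Property)]
public sealed class CsvIgnoreAttribute : Attribute { }

public static class AttributeFilter
{
    // Joins the values of all public properties that are not marked with
    // [CsvIgnore]. A null value becomes an empty field.
    public static string ToCsv(object source)
    {
        var values = source.GetType().GetProperties()
            .Where(p => !p.IsDefined(typeof(CsvIgnoreAttribute), true))
            .Select(p => p.GetValue(source)?.ToString() ?? "");
        return string.Join(",", values);
    }
}

public class Account
{
    public int Id { get; set; }

    [CsvIgnore]
    public string Password { get; set; }

    public string Owner { get; set; }
}
```

The attribute keeps the decision next to the property, but it is fixed at compile time, whereas the name and index lists above can change from call to call.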

Properties that Derive from CsvableBase

So far we have only used simple types such as int and string for our properties. However, what happens if we have a property whose type also derives from CsvableBase? Below is an example of such a scenario:

public class Address : CsvableBase
{
    public Address(string city, string country)
    {
        City = city;
        Country = country;
    }
    public string City { get; set; }
    public string Country { get; set; }
}
public class Person : CsvableBase
{
    public Person(int id, string name, string lastname, Address address)
    {
        Id = id;
        Name = name;
        Lastname = lastname;
        Address = address;
    }

    public int Id { get; set; }
    public string Name { get; set; }
    public string Lastname { get; set; }
    public Address Address { get; set; }
}
csharp

The idea is that, while iterating over the properties in our code, we need to detect whether the type of the property derives from CsvableBase. If it does, we call the ToCsv() method of that object instance by using reflection.

To achieve this, we need to modify our ToCsv() methods:

// Requires: using System; using System.Linq; using System.Reflection;

// Converts one property to CSV text. The three ToCsv() overloads share this
// helper instead of repeating the same branch. A property whose type derives
// from CsvableBase is expanded by calling its own ToCsv() through reflection.
private string PropertyToCsv(PropertyInfo property)
{
    if (property.PropertyType.IsSubclassOf(typeof(CsvableBase)))
    {
        var m = property.PropertyType.GetMethod("ToCsv", new Type[0]);
        return (string)m.Invoke(property.GetValue(this), new object[0]);
    }

    // A null value is written as an empty field.
    return PreProcess(property.GetValue(this)?.ToString() ?? "");
}

public virtual string ToCsv()
{
    string output = "";

    var properties = GetType().GetProperties();

    for (var i = 0; i < properties.Length; i++)
    {
        output += PropertyToCsv(properties[i]);
        if (i != properties.Length - 1)
        {
            output += ",";
        }
    }

    return output;
}

public virtual string ToCsv(string[] propertyNames, bool isIgnore)
{
    string output = "";
    bool isFirstPropertyWritten = false;

    var properties = GetType().GetProperties();

    for (var i = 0; i < properties.Length; i++)
    {
        // Ignore mode skips listed properties; filter mode skips unlisted ones.
        bool isListed = propertyNames.Contains(properties[i].Name);
        if (isListed == isIgnore)
        {
            continue;
        }

        if (isFirstPropertyWritten)
        {
            output += ",";
        }

        output += PropertyToCsv(properties[i]);
        isFirstPropertyWritten = true;
    }

    return output;
}

public virtual string ToCsv(int[] propertyIndexes, bool isIgnore)
{
    string output = "";
    bool isFirstPropertyWritten = false;

    var properties = GetType().GetProperties();

    for (var i = 0; i < properties.Length; i++)
    {
        // Same logic as above, but properties are selected by index.
        bool isListed = propertyIndexes.Contains(i);
        if (isListed == isIgnore)
        {
            continue;
        }

        if (isFirstPropertyWritten)
        {
            output += ",";
        }

        output += PropertyToCsv(properties[i]);
        isFirstPropertyWritten = true;
    }

    return output;
}
csharp

Now let's try our modified code:

class Program
{
    static void Main(string[] args)
    {
        var p = new Person(1, "murat", "aykanat",
                            new Address("city1", "country1"));
        Console.WriteLine(p.ToCsv());
        Console.ReadLine();
    }
}
csharp

Output is displayed as:

1,murat,aykanat,city1,country1
text

Property ignoring works the same way as before; however, you need to pass the parameters to the nested ToCsv() overload through reflection. Since I want to keep things simple at this point, I will not add the ignore feature for properties that derive from CsvableBase.
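
For readers who want to try it anyway, the core of the idea is to look up the overload by its parameter types and pass the arguments in an object array. The sketch below shows only that step, with assumed stand-ins: CsvNode is a simplified replacement for CsvableBase (no nesting, no pre-processing), and Location and NestedInvoke are hypothetical names.

```csharp
using System;
using System.Linq;

public abstract class CsvNode
{
    // Simplified stand-in for CsvableBase.ToCsv(string[], bool).
    public virtual string ToCsv(string[] propertyNames, bool isIgnore)
    {
        var values = GetType().GetProperties()
            .Where(p => propertyNames.Contains(p.Name) != isIgnore)
            .Select(p => p.GetValue(this)?.ToString() ?? "");
        return string.Join(",", values);
    }
}

public class Location : CsvNode
{
    public string City { get; set; }
    public string Country { get; set; }
}

public static class NestedInvoke
{
    // Finds the (string[], bool) overload by its parameter types and
    // invokes it on the nested object with the given arguments.
    public static string ToCsvViaReflection(object nested, string[] propertyNames, bool isIgnore)
    {
        var m = nested.GetType()
            .GetMethod("ToCsv", new[] { typeof(string[]), typeof(bool) });
        return (string)m.Invoke(nested, new object[] { propertyNames, isIgnore });
    }
}
```

What the sketch leaves open is the harder design question: how the caller says which names or indexes apply to the nested object rather than to the outer one.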

Generic Writer

Since we laid the groundwork in CsvableBase, CsvWriter itself is very simple:

using System.Collections.Generic;
using System.IO;
using System.Linq;

public class CsvWriter<T> where T : CsvableBase
{
    public void Write(IEnumerable<T> objects, string destination)
    {
        var objs = objects as IList<T> ?? objects.ToList();
        if (objs.Any())
        {
            using (var sw = new StreamWriter(destination))
            {
                foreach (var obj in objs)
                {
                    sw.WriteLine(obj.ToCsv());
                }
            }
        }
    }

    public void Write(IEnumerable<T> objects, string destination,
                        string[] propertyNames, bool isIgnore)
    {
        var objs = objects as IList<T> ?? objects.ToList();
        if (objs.Any())
        {
            using (var sw = new StreamWriter(destination))
            {
                foreach (var obj in objs)
                {
                    sw.WriteLine(obj.ToCsv(propertyNames, isIgnore));
                }
            }
        }
    }

    public void Write(IEnumerable<T> objects, string destination,
                        int[] propertyIndexes, bool isIgnore)
    {
        var objs = objects as IList<T> ?? objects.ToList();
        if (objs.Any())
        {
            using (var sw = new StreamWriter(destination))
            {
                foreach (var obj in objs)
                {
                    sw.WriteLine(obj.ToCsv(propertyIndexes, isIgnore));
                }
            }
        }
    }
}
csharp

Now let's try our code with our initial example:

using System.Collections.Generic;

class Program
{
    static void Main(string[] args)
    {
        var people = new List<Person>
        {
            new Person(1, "murat", "aykanat",
                        new Address("city1", "country1")),
            new Person(2, "john", "smith",
                        new Address("city2", "country2"))
        };

        var cw = new CsvWriter<Person>();
        // Write is the method defined on CsvWriter<T> above.
        cw.Write(people, "example.csv");
    }
}
csharp

If we check our application folder we can see our newly created file.

If you open the file you can see output as expected:

1,murat,aykanat,city1,country1
2,john,smith,city2,country2
text
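
The Headers section earlier described header rows, but the writer above does not emit one. As a minimal sketch, assuming the property names are acceptable as column titles, a header line could be built with reflection. CsvHeader and Contact are hypothetical names used only for this illustration.

```csharp
using System;
using System.Linq;

public static class CsvHeader
{
    // Hypothetical helper: builds a header line from the public property
    // names of T, in the order reported by reflection. It does not expand
    // nested CsvableBase properties into their own columns.
    public static string For<T>()
    {
        return string.Join(",", typeof(T).GetProperties().Select(p => p.Name));
    }
}

public class Contact
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Lastname { get; set; }
}
```

Calling sw.WriteLine(CsvHeader.For<T>()) before the foreach loop in Write would put this line at the top of the file. Keep in mind that a nested property such as Address would show up as a single Address column here, while ToCsv() expands it into two values, so the sketch only lines up for classes without nested CsvableBase properties.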

CSV Reader

Now we have to reverse the process to read from our CSV file, following the same approach we used for writing. Just as we implemented ToCsv(), we need another method in the CsvableBase class that reverses the process. Let's call this method AssignValuesFromCsv(). Maybe not the most creative name, but we will go with it for now. The method mirrors ToCsv(): first, we check whether the current property is derived from CsvableBase; after that, we save the data back to the public properties.

// Requires: using System; using System.Collections.Generic;
public virtual void AssignValuesFromCsv(string[] propertyValues)
{
    var properties = GetType().GetProperties();
    // Position in propertyValues. It is tracked separately from the property
    // index because a nested CsvableBase property consumes several columns.
    var valueIndex = 0;
    for (var i = 0; i < properties.Length; i++)
    {
        if (properties[i].PropertyType
            .IsSubclassOf(typeof(CsvableBase)))
        {
            var instance = Activator.CreateInstance(properties[i].PropertyType);
            var instanceProperties = instance.GetType().GetProperties();
            var propertyList = new List<string>();

            // Assumes the nested object has only simple properties,
            // so it uses one column per property.
            for (var j = 0; j < instanceProperties.Length; j++)
            {
                propertyList.Add(propertyValues[valueIndex + j]);
            }
            var m = instance.GetType().GetMethod("AssignValuesFromCsv",
                                                 new Type[] { typeof(string[]) });
            m.Invoke(instance, new object[] { propertyList.ToArray() });
            properties[i].SetValue(this, instance);

            valueIndex += instanceProperties.Length;
        }
        else
        {
            var type = properties[i].PropertyType.Name;
            switch (type)
            {
                case "Int32":
                    properties[i].SetValue(this,
                                    int.Parse(propertyValues[valueIndex]));
                    break;
                default:
                    properties[i].SetValue(this, propertyValues[valueIndex]);
                    break;
            }
            valueIndex++;
        }
    }
}
csharp

Here we:

  • Get all public properties of the object.
  • Iterate over the properties.
  • Check if the current property is derived from CsvableBase.
  • If so, create a temporary instance of that object.
  • Get its properties.
  • Call AssignValuesFromCsv() with its properties.
  • If the property is not derived from CsvableBase, just assign it to the property value according to the switch.

You may notice we don't have float, double or char in our switch statement. That's because in this example we only have int and string so I didn't want to make the class bigger.
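
If you do need more types, one option is to let the framework do the conversion instead of growing the switch. The sketch below is an assumption on my part rather than part of the class above: CsvValueConverter is a hypothetical name, and it relies on Convert.ChangeType with the invariant culture so that decimal values are always read with . as the decimal separator.

```csharp
using System;
using System.Globalization;

public static class CsvValueConverter
{
    // Hypothetical replacement for the switch: converts the text of one
    // field to the given type. Convert.ChangeType handles the built-in
    // simple types such as int, double, float, char, bool and string.
    public static object Parse(string text, Type targetType)
    {
        return Convert.ChangeType(text, targetType, CultureInfo.InvariantCulture);
    }
}
```

In AssignValuesFromCsv(), the switch could then be replaced by a single SetValue call that passes the current field and properties[i].PropertyType to CsvValueConverter.Parse. As far as I know, nullable types and enums are not covered by Convert.ChangeType and would need extra handling.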

So, now we have to iterate over the objects via our CsvReader class.

using System.Collections.Generic;
using System.IO;

public class CsvReader<T> where T : CsvableBase, new()
{
    public IEnumerable<T> Read(string filePath, bool hasHeaders)
    {
        var objects = new List<T>();
        using (var sr = new StreamReader(filePath))
        {
            // When the file has no header line, treat the headers as already
            // read so that the first line is parsed as data.
            bool headersRead = !hasHeaders;
            string line;
            do
            {
                line = sr.ReadLine();

                if (line != null && headersRead)
                {
                    var obj = new T();
                    var propertyValues = line.Split(',');
                    obj.AssignValuesFromCsv(propertyValues);
                    objects.Add(obj);
                }
                if (!headersRead)
                {
                    headersRead = true;
                }
            } while (line != null);
        }

        return objects;
    }
}
csharp

Remember when I wrote that we may need the ToString() override somewhere? Well, now we need it to print Person and Address objects. Also, we need to add a parameterless constructor to each class, because both CsvReader and AssignValuesFromCsv() create instances without arguments.

public class Address : CsvableBase
{
    public Address()
    {

    }
    public Address(string city, string country)
    {
        City = city;
        Country = country;
    }
    public string City { get; set; }
    public string Country { get; set; }
    public override string ToString()
    {
        return " " + City + " / " + Country;
    }
}
public class Person : CsvableBase
{
    public Person()
    {

    }
    public Person(int id, string name, string lastname, Address address)
    {
        Id = id;
        Name = name;
        Lastname = lastname;
        Address = address;
    }

    public int Id { get; set; }
    public string Name { get; set; }
    public string Lastname { get; set; }
    public Address Address { get; set; }
    public override string ToString()
    {
        return Name + " " + Lastname + " " + Address;
    }
}
csharp

Let's try our code:

using System;
using System.Collections.Generic;

class Program
{
    static void Main(string[] args)
    {
        var people = new List<Person>
        {
            new Person(1, "murat", "aykanat", new Address("city1", "country1")),
            new Person(2, "john", "smith", new Address("city2", "country2"))
        };

        var cw = new CsvWriter<Person>();
        cw.Write(people, "example.csv");

        var cr = new CsvReader<Person>();
        // The writer above does not emit a header line, so hasHeaders is false.
        var csvPeople = cr.Read("example.csv", false);
        foreach (var person in csvPeople)
        {
            Console.WriteLine(person);
        }

        Console.ReadLine();
    }
}
csharp

Output:

murat aykanat  city1 / country1
john smith  city2 / country2
text
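
One limitation of the reader is that line.Split(',') also splits on commas inside quoted fields, so a value such as Hello, world! from the pre-processing example would not survive a round trip. The sketch below shows one way to split a line while respecting quotation marks. CsvLine is a hypothetical name, and the sketch assumes that any field containing a comma or a quotation mark is wrapped in quotation marks and that no field spans multiple lines.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CsvLine
{
    // Hypothetical helper: splits one line on commas that are outside
    // quotation marks and turns doubled quotation marks inside a quoted
    // field back into a single one.
    public static string[] Split(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"' && i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"');   // escaped quotation mark
                    i++;
                }
                else if (c == '"')
                {
                    inQuotes = false;      // closing quotation mark
                }
                else
                {
                    current.Append(c);
                }
            }
            else if (c == '"')
            {
                inQuotes = true;           // opening quotation mark
            }
            else if (c == ',')
            {
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString());
        return fields.ToArray();
    }
}
```

Replacing line.Split(',') with CsvLine.Split(line) in CsvReader would be the only change needed on the reading side. The ASCII conversion done by the pre-processing step is still not reversible, so the original special characters are lost either way.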

Conclusion

In this guide, I explained how to develop your very own CSV writer and reader classes. We used reflection to extract properties from classes and process them as needed, so we can plug any class we want into our reader and writer. One of the benefits of writing your own CSV processing classes is that you can modify them whenever you need different features, so you are not limited by what third-party libraries offer.

I hope this guide will be useful for your projects.

Happy coding!