Skip to main content

Copying data from a spreadsheet into a SQL table

Long before I even knew what SSIS was, I was doing what some might call ad-hoc programming when I was given a spreadsheet and told that they (the business, user) needed this data saved in a database. I looked at the spreadsheet and saw that there were 5,000 rows and I knew I didn't want to do this manually, so I did the next best thing. Regex. Here is how I did it.

Copied the data into a text editor

These days I like to use VS Code as my text editor as it has a lot of plugins and built-in support for source control (which is honestly the big reason why I use it). So, I took my data, which looks like this:

User data
User data

and pasted it in VS Code. The data in VS Code doesn't look very nice, but trust me that the data is indeed in a usable format for us.

Data in VS Code
The data in VS Code

Select the data with regex

In order to move this data into a SQL table, I need to INSERT the data in a SQL table. The easiest way to do this is to pull out each of the values in the data and insert the values into a string like so:

INSERT INTO SQLDBTable VALUES ('value1', 'value2', 'value3', .....)

In order to do this correctly, we need the correct regex to pull each of the values out of our data. Let's tackle each piece of data (first name, last name, birthday, etc.) one at a time and we'll wrap it up once we've matched everything.

First name

From the looks of the sample data we are given, the first names of each of our people are consistent - just letters. We can use simple regex to capture the first name from the data.

^(\w+)\t

This regex looks at the beginning of the string ^, for any number of letters \w+, followed by a tab character \t. 

But wait! How do we know the first name ends in a tab character? In VS Code, we can turn on visibility for whitespace characters (which includes tab characters) by holding down ctrl/cmd + shift and clicking p and typing "Preferences" and selecting Preferences: Open Settings (VS Code version 1.28.2). Turn Render Whitespace to All:


Render Whitespace set to All
Turning on Render Whitespace

Now, if you look at our data you will see a very small right arrow, which indicates the character is a tab:

Tab character present
The arrow means this is a tab character

Note - the small dot means a space character.

Last name

A little bit more complicated than the first name, I see from our data sample that a few last names have spaces, as well as apostrophes and hyphens. Luckily, this isn't hard for us in regex.

^(\w+)\t([\w'\- ]+)\t

This regex is matching letters, apostrophes, hyphens and spaces within a character set (also called a character class). We use the + (which is a quantifier) to essentially say "match one or more of the characters as you can within the character set (all the tokens within the brackets [ ])." We follow this by another tab character.

Note - we are using the parentheses to capture our matches to save them for later. If we didn't capture the characters that match the regex, we would not be able to pull out the values from our data!

Birthday

Unfortunately, the data in the birthday column is not consistent - which isn't that uncommon in real-world scenarios at all. We will need to handle this somehow in regex, and I feel the best way in this case is to capture each of the data points (month, day and year) separately and we can combine them together later.

^(\w+)\t([\w'\- ]+)\t(\d+)[\-/](\d+)[\-/](\d+)\t

This regex is matching one or more digits, \d, followed by a hyphen or forward slash - repeated. You notice that we have to escape the hyphen because the hyphen can be used in ranges and we do not want that functionality here. 

Income

Again, inconsistent data here too. We ended up speaking with the "business" and looking at our existing SQL table and only need to insert the numeric value into the SQL table. We will be dropping the commas, dollar sign and any decimal found in the data.

^(\w+)\t([\w'\- ]+)\t(\d+)[\-/](\d+)[\-/](\d+)\t\$?(\d+),?(\d+)\.?\d?\d?\t

This regex uses a little bit of the optional quantifier (the question mark ?) to make certain characters match 0 or 1 time (effectively what that means is the character or token before the question mark is optional). We mark the dollar sign optional, which also needs to be escaped because the dollar sign is an end-of-string anchor). 

The decimal value is also marked as optional, by adding a question mark after each of the tokens.

Note - the period in regex stands for a wildcard, meaning any character. Since we just want to match against a literal period, we need to escape it with a preceding backslash. This is true for any character we want to escape, the character must be preceded by a backslash.

Email

The email field is simpler from the last two fields, so we get a little bit of relief! The regex for the email field will be very similar to the last name field.

^(\w+)\t([\w'\- ]+)\t(\d+)[\-/](\d+)[\-/](\d+)\t\$?(\d+),?(\d+)\.?\d?\d?\t([\w@\-!]+)\.?c?o?m?\t

This regex matches one or more letters, @ signs, hyphens and exclamation marks followed by optional ".com". 

Address 1

I spoke with the business, and they assured me there would be no tab characters in the address 1 field, so we are able to write very simple regex to capture the address 1 field.

^(\w+)\t([\w'\- ]+)\t(\d+)[\-/](\d+)[\-/](\d+)\t\$?(\d+),?(\d+)\.?\d?\d?\t([\w@\-!]+)\.?c?o?m?\t([^\t]+)\t

Our regex captures one or more characters that is not a tab character. It does this through the use of a negated character set. A negated character set matches any character that is not within the set. 

I held off showing this regex to introduce us to other types of regex capturing for the first or last name, when in reality the above regex for address 1 would suffice for first and/or last name.

Zip

We do not want the optional 4 digits for zip code, so all we need to do to finish off our regex is just capture the 5 digits and ignore anything at the end of the string.

^(\w+)\t([\w'\- ]+)\t(\d+)[\-/](\d+)[\-/](\d+)\t\$?(\d+),?(\d+)\.?\d?\d?\t([\w@\-!]+)\.?c?o?m?\t([^\t]+)\t(\d{5}).*$

This regex matches 5 digits and then we have a wildcard match .* to the very end of the string $. The asterisk is a quantifier, just like the +, that matches 0 or more characters of the preceding token. Again, because the period is a wildcard token, the following tokens .* will match everything.

Reformatted the data

Now that we have our full regex, we can select everything in our text editor and do a regex find/replace. Before you do that, do delete the column headers! We do not want to cause issues with the presence of the column headers in our file. We can do a regex find/replace by opening up the find window by using ctrl/cmd + f and selecting the .* option in VS Code to enable regex matching.

Next, in our replace input box, we will use the following template to create our SQL INSERT statements. The numbers after the $ indicate capture group, that is $1 will be replaced by the first capture group, which is the first name. $2 will be replaced by the second capture group, which is the last name, and so on. You can see what value I used for the replace input box below

INSERT INTO Data VALUES ('$1', '$2', '$5-$3-$4', '$6$7', '$8.com', '$9', '$10')


Regex find/replacing
About to get our SQL INSERT statement!

Hit replace all or ctrl/cmd + alt + enter. This is our result:

Our new SQL INSERT statements
Our new SQL INSERT statements

Finishing touch

Here are a few tips to make your SQL a little more professional, add the following touches in order to wrap the SQL in a transaction and make it log output in case someone else has to run your SQL in a different environment / server.

BEGIN TRANSACTION
GO

INSERT INTO SQLDBTable VALUES ('Ray', 'Smith', '1987-5-2', '120000', 'rsm@aol.com', '123 Pleasant St.', '42511')
INSERT INTO SQLDBTable VALUES ('Richard', 'O' Charles', '2000-1-1', '10000', 'r_0charlz@ff.com', 'Benjamin road', '98292')
INSERT INTO SQLDBTable VALUES ('Rick', 'Xavier', '1994-10-22', '15000', 'rxavier@hotmail.com', 'Los Gatores', '77261')
INSERT INTO SQLDBTable VALUES ('Richardson', 'Chesterton', '1958-7-30', '80000', 'chestertonrichard@gmail.com', 'P.O. box 561', '23443')
INSERT INTO SQLDBTable VALUES ('Ricky', 'Lemonz', '1999-11-3', '22000', 'lemonz@sc.com', 'Mary ln.', '58273')
INSERT INTO SQLDBTable VALUES ('Rebecca', 'Fiddler', '2006-9-14', '500', 'fiddlerontheroof@ilovemusic.com', '2867 Bear Creek', '84271')
INSERT INTO SQLDBTable VALUES ('Rachel', 'Neber-Julius', '1977-4-21', '80000', 'neber-juliusthequeen!@me.com', 'Apt Complex', '13331')
GO

PRINT '******* Transaction rolled back - commit if there are no errors'
GO
ROLLBACK TRANSACTION
--COMMIT TRANSACTION

The GO keyword is used for usability and readability.

Comments

  1. I was diagnosed as HEPATITIS B carrier in 2013 with fibrosis of the
    liver already present. I started on antiviral medications which
    reduced the viral load initially. After a couple of years the virus
    became resistant. I started on HEPATITIS B Herbal treatment from
    ULTIMATE LIFE CLINIC (www.ultimatelifeclinic.com) in March, 2020. Their
    treatment totally reversed the virus. I did another blood test after
    the 6 months long treatment and tested negative to the virus. Amazing
    treatment! This treatment is a breakthrough for all HBV carriers.

    ReplyDelete

Post a Comment

Popular posts from this blog

UI redesigns are mostly a waste of time

To preface the article, I primarily work on, and prefer, back-end code. I've been involved in both web and software development for over 4 years now and worked with many front-end and back-end frameworks. New Twitter UI Before all of the UI designers that read this go out and riot and champion against me for saying UI redesigns are a waste of time, let me say that I do value design . I think at the bare minimum, a product or website needs to be usable , and if you possess a good eye and steady hand , you should feel compelled to create something that looks pleasing. David Just stop redesigning the UI all the time . UI redesigns, in my opinion, are a waste of time 95% of the time. Let me explain further. No one cares Come see our fresh new look ! What about our new  material design , come see! I'm sorry, but besides fixing the UI where it impacts the usability of your application, no one is raving about how a redesign makes the application any better. ...

[Fix] - ASUS PCE-AC68 adapter (no internet)

There seem to be a lot of problems with this adapter, even with such strong performance . Why so many issues? I'm not quite sure, but I needed to find a fix because I kept on losing wifi. The ASUS PCE-AC68 The fix Keeping it short - this is how I fixed the issue: Downloaded the driver for my OS from ASUS's support page -  https://www.asus.com/us/Networking/PCEAC68/HelpDesk_Download/ (in my case it was Windows 10 64-bit). Open Device Manager by holding the Windows key and pressing R, then typing "devmgmt.msc" and hitting Enter. (Don't worry, this isn't a scam . We are simply opening Window's Device Manager through the Microsoft Management Console snap-in .) Navigate to the yellow warning sign sitting under Network adapters and right click it. Select Update driver . Select Browse my computer for driver software  and choose the following path of the OS that you have installed on your computer. (The path for the driver on my computer was C...

Logging into a website with Powershell

Powershell is great, and it's lately been my go-to shell while I'm working on Windows. Sorry command prompt I really don't do a lot of work in the shell, but I do like to play with low-level interfaces from time to time. The article is about Linux shells, but goes into good explanation about what a shell is if you don't know. Log into a website Today, I wanted to do something that I have never really tried before and that is logging into a website using Powershell . The concepts behind this are quite simple really, as Powershell has support to send HTTP requests  and that's usually all we need, unless the server has CSRF protections in place (which it should). We are going to attempt  to log in to my favorite website for buying socks, Absolute Socks : Is that a turkey?! In order to do that, we need to have a login. So if you don't already have an account on www.absolutesocks.com , go make one now. Viewing the login request On websites,...