How to Build a Data Lab with Little or No Money
At a publishing company I worked for several years ago, we managed thousands of products that were exported in more than 20 categories. Each product had its own pricing, with special conditions depending on the destination country or carrier. The publisher wanted us to design an application in which he could check all prices on a single screen and decide which ones needed adjusting because of discounts or other policies.
My boss asked me to develop the system so that he would only have to fill in a few fields for each category, export the data with Excel macros, and start experimenting with it to see how much money the new strategy would put in his pocket. The project involved millions of records (originally stored in Excel), and our main objective was to add new information to the application promptly.
Our first step was to create a database containing all that information, but we soon realized that handling it manually would be easier than setting up all those tables and relationships. My boss didn’t even want Excel as an option; instead, he asked me to set everything up so he could work directly in Access. Although I had never worked with macros before, I found several programs online that made our lives much easier.
After analyzing my boss’s requests over an entire year, I learned how important it is for people who need daily reports on large data sets to receive them on time and at low cost. People who are unfamiliar with database software will almost always look for an online tool to request reports, but without the necessary background they often cannot find one that works for them.
I have worked on projects where the main requirement was to deliver Excel files every morning by 8 AM sharp, even while it was still unclear what information had to be included or how many categories each row should show. It took us a long time to work out how to meet those requirements without much effort, and I decided this would make a useful article for anyone who needs daily reports on large data sets at low cost.
First idea: Extracting data from an existing application
- The first step is to determine whether you already have access to the source of the information. If you can reach an existing database or Excel file, use it as your source, because that involves fewer steps. If this is not an option, you will need to build everything from scratch.
- If no databases or data files already exist for your application, the only way forward is to build a custom solution. We tried many kinds of applications before we found one that met our needs without consuming too many resources.
- You could use PHP with MySQL on Linux servers, Java on Tomcat on Windows servers, Ruby on Rails with PostgreSQL or Oracle on Linux servers, and so on. What matters most is that your solution meets the application’s requirements at all times, even if you later have to switch to a different one.
- The next step is to make sure your data is properly organized. If it isn’t, take the time to restructure it so it can be processed without issues and exported in the form your application requires. The last thing you want is to run into problems because nobody thought about how each field should be named or whether the fields are stored in the correct order.
- Whatever technology you choose for extracting and processing your data, the tools you pick will most likely include an example of the same pattern. A SQL SELECT statement retrieves records from a table and processes them into another table or view. An INSERT INTO…SELECT FROM statement then inserts the processed data into the destination table.
- For example, you can retrieve all the records from a table named teleport, transform them into another table (TB processed), and then insert that new data into the required destination tables.
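Here is a minimal sketch of that teleport → intermediate table → destination pipeline. It uses Python's built-in sqlite3 module rather than any of the server stacks listed above. The source table name `teleport` comes from the text. Everything else is hypothetical and chosen only for illustration: the column names, the intermediate table `tb_processed` (standing in for "TB processed"), the destination table `price_report`, and the discount calculation used as the transformation.

```python
import sqlite3


def build_price_report(conn: sqlite3.Connection) -> None:
    """Extract rows from `teleport`, transform them, and load the result.

    Assumption: `teleport` has the hypothetical columns product_id,
    category, country, base_price and discount_pct, and the destination
    table `price_report` already exists.
    """
    # Step 1: SELECT every record from the source table and materialize a
    # transformed copy in an intermediate table (hypothetical name).
    conn.execute("DROP TABLE IF EXISTS tb_processed")
    conn.execute(
        """
        CREATE TABLE tb_processed AS
        SELECT product_id,
               category,
               country,
               ROUND(base_price * (1 - discount_pct / 100.0), 2) AS final_price
        FROM teleport
        """
    )
    # Step 2: INSERT INTO ... SELECT FROM copies the processed rows into the
    # destination table in one statement, with no round trip through Python.
    conn.execute(
        """
        INSERT INTO price_report (product_id, category, country, final_price)
        SELECT product_id, category, country, final_price
        FROM tb_processed
        """
    )
    conn.commit()


if __name__ == "__main__":
    # Usage: an in-memory database with two hypothetical products.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE teleport (product_id INTEGER, category TEXT, "
        "country TEXT, base_price REAL, discount_pct REAL)"
    )
    conn.execute(
        "CREATE TABLE price_report (product_id INTEGER, category TEXT, "
        "country TEXT, final_price REAL)"
    )
    conn.executemany(
        "INSERT INTO teleport VALUES (?, ?, ?, ?, ?)",
        [(1, "Fiction", "FR", 20.0, 10.0), (2, "Travel", "DE", 35.0, 0.0)],
    )
    build_price_report(conn)
    print(conn.execute("SELECT * FROM price_report ORDER BY product_id").fetchall())
```

The transformation stays inside the database, so only the final rows are ever written. The same two statements should port to MySQL, PostgreSQL or Oracle with at most minor syntax changes.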
Conclusion:
If you already have the source data, building your own solution is easier, because you won’t have to spend time writing reports from scratch. You can simply use an existing database or file as the source of information and transform it into another format using macros, VBA code, custom applications, and so on.
If, however, no existing sources are available for your daily reports, the best solution is a custom application that generates them on demand or at set times each day, so that report generation does not cause performance issues in your production environment.
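For the "Excel file on the desk by 8 AM" requirement, the final step can be as small as the following sketch. It dumps a report query to a dated CSV file that Excel can open directly. It assumes a hypothetical `price_report` table with the columns shown. The function name and file-naming scheme are illustrative only.

```python
import csv
import sqlite3
from datetime import date
from pathlib import Path


def export_daily_report(conn: sqlite3.Connection, out_dir: Path, day: date) -> Path:
    """Write the (hypothetical) price_report table to a dated CSV file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"price_report_{day.isoformat()}.csv"
    cursor = conn.execute(
        "SELECT product_id, category, country, final_price "
        "FROM price_report ORDER BY category, product_id"
    )
    # cursor.description holds one tuple per column; item 0 is the name.
    header = [col[0] for col in cursor.description]
    # newline="" lets the csv module control line endings; the utf-8-sig
    # byte-order mark helps Excel detect the encoding when the file is opened.
    with path.open("w", newline="", encoding="utf-8-sig") as fh:
        writer = csv.writer(fh)
        writer.writerow(header)
        writer.writerows(cursor)
    return path
```

The scheduling itself is best left to the operating system rather than the script. On Linux, a cron entry such as `0 7 * * *` runs a job at 7:00 every day. Windows Task Scheduler serves the same purpose. This keeps the heavy query out of business hours and away from the production workload.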

