A Growth Hacker's Journey – At the right place at the right time

When I was in high school in the 1970's, I discovered that some of my cleverer math classmates were skipping lunch. No, they were not skipping school. But they were spending time in a small office next to the math classroom. What were they doing that induced them to miss lunch and their free period? They were learning to program computers. Specifically, they were learning the FORTRAN computer language, and they were submitting their simple codes remotely via a 110-bps (bits per second) teletype in that small office over phone lines to the IBM mainframe computer on the campus of the University of Nebraska at Omaha. I was drawn into this amazing world like a bee to spring flowers. I was soon coding with the best of them -- implementing a variety of math formulae, generating trig, log, and other math tables, completing homework problems, and exploring a brave new world of numerical method algorithms for the very first time. I was hooked. I was at the right place at the right time. 

I continued this journey through college and into astronomy grad school at Caltech, where I carried out astrophysical calculations, astronomical data analysis, and computational astrophysics for my Ph.D. dissertation. Some of my more computer-savvy classmates were doing more than writing high-level programs in FORTRAN. They were actually creating their own reusable informal commands at the machine level that controlled the computer in more direct ways than the formal mathematical syntax of FORTRAN. My classmates were using an amazingly fun and powerful programming language called FORTH. "Fun?" you may be asking! For anyone who has ever programmed in FORTH, no explanation is necessary. For those who haven't, no explanation is possible. I was hooked. I was at the right place at the right time. 

After grad school, I continued my astronomical data analysis and computational astrophysics research projects in postdoctoral positions. One of those fellowships was as an instructor at the University of Michigan. It was at that time (early 1980's) that the PC revolution began. Everyone was talking about getting one of the IBM PC's or an IBM clone. While it was great to use the powerful university computers for my day job (and for my "night job" -- remember, I am an astronomer!), I really wanted to try some fun things on my own computer at home. Though I was a poor university instructor, my wife and I saved enough funds to take advantage of a deep-discount purchase opportunity that was offered by the university for staff members to buy a home PC. I took the bait. I was hooked. I was at the right place at the right time. 

After Michigan, we were off to Washington, DC, where I received a Carnegie Fellowship within the Carnegie Institution of Washington. Soon, children came into our lives, which changed a lot of priorities. If our little family had started sooner, then maybe that home PC purchase would have been delayed several years. As it turned out, when our twin baby daughters became more aware of their world, I started writing little game programs (with interactive graphics) for them to play on that PC. I was using some of those hacking skills that I learned from FORTH and FORTRAN for fun. I also used those skills for personal money management -- I wrote my own very simple spreadsheet program to balance our tight family budget. I used simple screen-hacking skills and wrote "database v.0.01" routines to make data entry, access, and analysis easy and extensible. It was this combination of skills that spilled over into my day job, which got the attention of a senior manager of the then-new (in 1985) Hubble Space Telescope Science Institute in Baltimore, Maryland. She needed to hire someone with such skills right away, immediately, even instantly -- especially since the Hubble Telescope launch was about one year away (according to NASA's Shuttle launch schedule at that time). She called my Fellowship mentor at Carnegie, asking her if she knew any candidates. My mentor named me. So, I went to the interview a couple of days later, and before I left the building that day, I was offered a job as a scientist on the world's premier scientific instrument at one of the world's premier research institutes. I was hooked. I was at the right place at the right time. 

A few months after my arrival at the Hubble Telescope Science Institute, tragedy struck! In January 1986, the Shuttle Challenger exploded 73 seconds after launch, killing all 7 astronauts on board. As a young person who dreamed of working in astronomy and space sciences since I was 9 years old, I was devastated. It took weeks for the staff to recover from the trauma of that horrific day. To this day, I still get choked up when I watch the recorded video footage of the event. Three things became very clear during the months that followed:

  1. The Shuttle launches would not resume for several years (hence, the Hubble Telescope would be grounded for all those years) while NASA fixed the problems that led to the Challenger catastrophe, which meant that the Hubble team of scientists and engineers had a lot of years to evaluate and improve all of the telescope systems.
  2. One of the systems that was in significant need of improvement was the administrator-oriented Hubble Data Management System, which was previously designed primarily for data management (and curation) and not so much for scientist-friendly data access, exploration, and discovery -- hence, during those post-Challenger years, fresh designs and plans were developed for a new "top of the line" end-user-oriented Hubble Science Data Archive.
  3. Another system was identified as needing total overhaul, even rewriting the entire code base from scratch, and that was the scientific proposal entry, processing, and reporting system -- they needed someone new to do the job, someone with a fresh perspective, with database skills, user interface skills, programming skills, and strong familiarity with astronomy. Guess who satisfied all of those constraints? Yes, they tasked me to do it. I learned SQL (along with all those other programming languages) and spent the next few years designing, code-writing, integrating, testing, deploying, and managing the entire system (every single one of those million lines of code -- at $100 per line of code at industry rates, I estimated that my boss owed me a small private island as my bonus!). The Hubble Telescope scientific proposal entry, processing, and reporting system worked perfectly in 1990 when "all systems were go" for the Hubble project, and I received a major award in 1991 for my efforts. It was unbelievable to realize that my system (built upon simple things that I had learned to do since high school) was used by thousands of astronomers worldwide. That database stuff was awesome. I was hooked. I was at the right place at the right time. 

After those achievements, I was promoted to NASA Project Scientist for the Hubble Science Data Archive, tasked with bringing the new data system online and validating its use for scientific data access, exploration, and discovery. My group received another major award in 1994 after successful completion of that project. I gave a presentation that year outlining our comprehensive science data verification plans and the database system. A person in the audience for that talk was a NASA senior manager seeking a contractor employee manager for the Astrophysics Data Facility (ADF) within NASA's Space Science Data Operations Office at the Goddard Space Flight Center in Greenbelt, Maryland. We met, we talked, I interviewed, and I was hired in 1995. My new job was to oversee the acquisition, ingest, management, and public dissemination of the scientific data sets from all of NASA's thousands of astronomy and space science experiments since the birth of NASA. I was hooked. I was at the right place at the right time.

In 1998, I was attending a conference when an astronomer that I knew from across the country sought me out and asked if his group could send the data from their large astronomy experiment to NASA's ADF. It was two Terabytes in total. That seemed big (like the birth of "Astronomy Big Data") to me, especially for 1998, but I didn't know how big until I went back to work a few days later. When I mentioned this opportunity to the NASA facility senior managers, they looked at me like I was unaware of something really obvious and important. They were right! They "reminded" me that the data facility was the home for 15,000 NASA space science mission data sets, and the aggregate sum total data volume for all of those data sets combined(!) was less than one Terabyte! They couldn't possibly accept one single experiment's data that single-handedly eclipsed the total volume of all of the other 15,000 experiments' data sets combined. Well, this was embarrassing! What could we do? I was told that ADF could accept the data if I would write a research grant proposal and win some funds to pay for all of the new I.T. resources that would be required. "What kind of research proposal would pay for such a thing?" I asked myself. This led me to investigate a field of research that I had only briefly heard in conversation once or twice previously -- Data Mining (= Machine Learning applied to large data sets). The more I read about this topic (now called Data Science), the more I became convinced that this is what I wanted to do for the rest of my research career. I was hooked. I was at the right place at the right time.

So, I began learning about data mining, more and more, and I wrote research proposals to pursue this dream. Some of those proposals were funded! This allowed me to spend a little of my time (apart from my management function) to create, build, and maintain a comprehensive website "NASA’s Data Mining Resources for Space Science", which you can still find on the Internet Archive Wayback Machine. I was not a data mining expert, but I was learning who the experts were. My website was a big long curated list of cool data mining resources that I found on the Internet: algorithms, methods, software packages, tutorials, research papers, white papers, popular articles, lectures, conferences, expert interviews, and more (plus links to some of my own papers, including some papers in 2000 where I introduced the concept of using data mining to discover "unknown unknowns"). I did not consider myself to be an expert (yet!), but I sure knew how to curate other experts' contributions. I didn't recognize the reach and influence of my website until October 2001 (one month after the tragic day September 11). On that day in October, my office phone rang -- the unknown voice at the other end of the line asked me "Can you brief the President tomorrow morning on data mining?" My first reaction was "shock and awe." I remember clearly what I said next to the mystery caller: "Do you mean THE President?" Yes, they did! I asked how they ever thought to call me. They said, "We realized that we need some expertise in this area, so we asked folks at NASA HQ, and they said you were NASA's data mining expert." I never thought of myself that way, but I then realized that even the little bit I was doing was already considered "expert". I also realized after that phone call that what I was doing (now called Big Data Analytics) was going to change the world (not just in science, but in all domains). I was hooked. I was at the right place at the right time.

It was not too many months after those events that I learned that NASA was planning to shut down the ADF. Month after month, I had to let members of my staff go, usually helping them find good jobs elsewhere. It soon became apparent that the last remaining staff were the barebones staff needed to be on duty until the final day of ADF operations. So, to satisfy the latest round of budget cuts, I decided to "fire" myself. After discussing this at one of our advisory committee meetings, one of the members of the committee contacted me. He was the Dean of the School of Computational Sciences at George Mason University. He told me they have had a PhD program in Computational Science and Informatics (= Data Science) since 1991, and he liked what I was doing and wanted to offer me a faculty position in his program. This information astonished me! So, I revealed to him a secret that I had kept to myself for years -- that I wanted to create a Data Science undergraduate degree program (the first in the world) to teach data science and promote data literacy skills to the next generation workforce. He said "Come on! Let's do it together!" So I left NASA, joined GMU in 2003, and established that B.S. program a few years later. Guess what? You guessed it -- I was hooked. I was at the right place at the right time.

As the years went by, I tried to maintain my own personal log of news stories, articles, and events surrounding those topics of my old NASA website: data mining, BI, analytics! Then the floodgates opened in 2012: (1) the U.S. President announced the National Big Data Initiative; and (2) a couple of bona fide data scientists wrote an earth-shaking article for the prestigious Harvard Business Review: "Data Scientist: The Sexiest Job of the 21st Century." As a consequence of these events (and some others), the number of related news articles, websites, and blogs skyrocketed from a couple per week to a couple per hour (at least!). I decided to join Twitter in March 2012, and began tweeting regularly and systematically in late northern summer 2012. My Twitter obsession was personal: I wanted to do the best I could to curate and provide a log of all the new big data, analytics, and data science stories, articles, events, and major players that were popping up hourly. For me, Twitter was like my old personal log, but now on steroids! I did it for me initially, and then also for my growing follower community. But a funny thing happened in May 2013: one of my new Twitter friends, Carla Gentry (@data_nerd), messaged me and asked "How does it feel to be the #2 big data influencer on Twitter in the world?" I said "WHAT?!?" (or something like that). Soon after that, a couple of other lists came out, placing me at number one! I realized that what I started doing for myself was also resonating with a lot of other big data and data science believers who were coming out into the bright sunlight of social media. I was hooked. I was at the right place at the right time.

As my social media involvement grew, I began getting many(!) offers to write blog posts (including at MapR), to give invited talks at conferences, to be on companies' advisory boards, to tweet for sponsors, to do Twitter chats, to do webinars, and... to consider new job opportunities! I have received such job requests from countless firms (at least two requests per week since 2013)! Though some of these job opportunities were very attractive, I was not sufficiently tempted to leave my dream job at GMU (tenured full professor, "teaching data science and promoting data literacy skills to the next generation workforce"). Furthermore, relocation was not acceptable (now that my own children, who live nearby, are giving us grandchildren). It was therefore with secret admiration that I became aware of an outstanding, well-known, established corporation (just a few miles from my home) that was making all the right moves:

  1. They created a Strategic Innovation Group, with one of the focus areas being data science and big data analytics, with big corporate buy-in (at least one Executive VP and two other VPs overseeing it all);
  2. they now employ several hundred data scientists;
  3. they have established a world class data-driven business culture;
  4. they wrote a fabulous and freely available "Field Guide to Data Science";
  5. they are passionately dedicated to "teaching data science and promoting data literacy skills to the next generation workforce";
  6. they created an online Data Science Training Program at https://exploredatascience.com (which even has an astronomy-based theme);
  7. they funded and sponsored (with Kaggle) the first National Data Science Bowl;
  8. they are actively growing their data science capabilities with many new positions opening weekly;
  9. they have given data science jobs to several of my students; and
  10. they have been seeking to hire a Principal Data Scientist to join their leadership team. 

That data science champion firm is Booz Allen Hamilton, and their new Principal Data Scientist will be yours truly (me!) – growth-hacking my "next career" with this truly amazing organization starting in late May 2015.

I am hooked. I am at the right place at the right time. And I am immeasurably grateful to every person who helped me to reach this awesome place.
